Switching from MySQL to Cassandra - Pros/Cons?

Problem description

For a bit of background - this question deals with a project running on a single small EC2 instance, which is about to migrate to a medium one. The main components are Django, MySQL and a large number of custom analysis tools written in Python and Java, which do the heavy lifting. The same machine is running Apache as well.

The data model looks like the following - a large amount of real time data comes in streamed from various networked sensors, and ideally, I'd like to establish a long-poll approach rather than the current poll every 15 minutes approach (a limitation of computing stats and writing into the database itself). Once the data comes in, I store the raw version in MySQL, let the analysis tools loose on this data, and store statistics in another few tables. All of this is rendered using Django.

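The relational features I'd need -

Order by [SliceRange in Cassandra's API seems to satisfy this]
Group by
Many-to-many relations between multiple tables [Cassandra SuperColumns seem to do well for one-to-many]
Sphinx gives me a nice full-text engine on this, so that's a necessity too. [On Cassandra, the Lucandra project seems to satisfy this need]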
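To make the "order by" point concrete, here is a minimal sketch of how I understand a slice over sorted columns to work. It is a toy in-memory model, not the Cassandra API: the `WideRow` class and its `insert`/`slice` methods are hypothetical names, and it assumes column names (e.g. timestamps) are kept in sorted order so a range read comes back already ordered.

```python
import bisect


class WideRow:
    """Toy stand-in for one row whose columns are kept sorted by name.

    Hypothetical model for illustration only; not a Cassandra client.
    """

    def __init__(self):
        self._names = []   # column names, always sorted (e.g. timestamps)
        self._values = {}  # column name -> value

    def insert(self, name, value):
        # Keep the name list sorted on write, so reads need no sorting.
        if name not in self._values:
            bisect.insort(self._names, name)
        self._values[name] = value

    def slice(self, start, finish, count=100, reverse=False):
        # Return up to `count` columns with start <= name <= finish.
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_right(self._names, finish)
        names = self._names[lo:hi]
        if reverse:
            names = names[::-1]
        return [(n, self._values[n]) for n in names[:count]]


row = WideRow()
for ts, reading in [(30, "c"), (10, "a"), (20, "b"), (40, "d")]:
    row.insert(ts, reading)

ascending = row.slice(10, 30)
newest_two = row.slice(10, 40, count=2, reverse=True)
```

In this model the ordering is fixed at write time by the column name, so an "order by" on any other field would need a second row keyed differently.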

My major problem is that data reads are extremely slow (and writes aren't that hot either). I don't want to throw a lot of money and hardware at it right now, and I'd prefer something that can scale easily with time. Vertically scaling MySQL is not trivial in that sense (or cheap).

So essentially, after having read a lot about NoSQL and experimented with things like MongoDB, Cassandra and Voldemort, my questions are:

On a medium EC2 instance, would I gain any benefits in reads/writes by shifting to something like Cassandra? This article (pdf) definitely seems to suggest that. Currently, I'd say a few hundred writes per minute would be the norm. For reads - since the data changes every 5 minutes or so, cache invalidation has to happen pretty quickly. At some point, it should be able to handle a large number of concurrent users as well. The app performance currently gets killed on MySQL doing some joins on large tables even if indexes are created - something on the order of 32k rows takes more than a minute to render. (This may be an artifact of EC2 virtualized I/O as well.) Size of tables is around 4-5 million rows, and there are about 5 such tables.

Everyone talks about using Cassandra on multiple nodes, given the CAP theorem and eventual consistency. But, for a project that is just beginning to grow, does it make sense to deploy a one-node Cassandra server? Are there any caveats? For instance, can it replace MySQL as a backend for Django? [Is this recommended?]

If I do shift, I'm guessing I'll have to rewrite parts of the app to do a lot more "administrivia" since I'd have to do multiple lookups to fetch rows.
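By "multiple lookups" I mean something like the following rough sketch, where the join that MySQL does for me today moves into application code. Plain dicts stand in for the key-value store, and `fetch_with_stats`, `sensors` and `stats_by_sensor` are hypothetical names used only for illustration.

```python
def fetch_with_stats(sensor_ids, sensors, stats_by_sensor):
    """Join sensors to their stats in application code.

    `sensors` and `stats_by_sensor` are plain dicts standing in for two
    key-value lookups; each result costs one get per store.
    """
    joined = []
    for sid in sensor_ids:
        sensor = sensors.get(sid)              # lookup 1: the sensor row
        if sensor is None:
            continue                           # unknown key, nothing to join
        stats = stats_by_sensor.get(sid, [])   # lookup 2: its statistics
        joined.append({**sensor, "stats": stats})
    return joined


sensors = {"s1": {"name": "roof"}, "s2": {"name": "basement"}}
stats_by_sensor = {"s1": [{"mean": 21.5}]}

result = fetch_with_stats(["s1", "s2", "s3"], sensors, stats_by_sensor)
```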

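Would it make more sense to just use MySQL as a key-value store rather than a relational engine, and stick with it? That way I can utilize the large number of stable APIs available, as well as a stable engine (and go relational as needed). (Bret Taylor's post on how FriendFeed does this - http://bret.appspot.com/entry/how-friendfeed-uses-mysql)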
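Roughly what I have in mind for the key-value option, as a minimal sketch loosely in the spirit of the linked post: one table of opaque serialized entities plus a separate index table per field I need to query. The table names, the JSON serialization and the helper functions are my own assumptions, and `sqlite3` is used here only as a runnable stand-in for MySQL.

```python
import json
import sqlite3
import uuid


def open_store(path=":memory:"):
    # sqlite3 is a stand-in for MySQL so the sketch runs anywhere.
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS entities ("
        "id TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    # One narrow index table per queryable field (hypothetical: sensor_id).
    db.execute(
        "CREATE TABLE IF NOT EXISTS index_sensor ("
        "sensor_id TEXT NOT NULL, entity_id TEXT NOT NULL, "
        "PRIMARY KEY (sensor_id, entity_id))"
    )
    return db


def put(db, entity):
    # Store the whole entity as an opaque JSON blob keyed by id.
    entity_id = entity.setdefault("id", uuid.uuid4().hex)
    with db:  # commit both writes together
        db.execute(
            "INSERT OR REPLACE INTO entities (id, body) VALUES (?, ?)",
            (entity_id, json.dumps(entity)),
        )
        if "sensor_id" in entity:
            db.execute(
                "INSERT OR REPLACE INTO index_sensor (sensor_id, entity_id) "
                "VALUES (?, ?)",
                (entity["sensor_id"], entity_id),
            )
    return entity_id


def get(db, entity_id):
    row = db.execute(
        "SELECT body FROM entities WHERE id = ?", (entity_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None


def find_by_sensor(db, sensor_id):
    rows = db.execute(
        "SELECT e.body FROM index_sensor i "
        "JOIN entities e ON e.id = i.entity_id WHERE i.sensor_id = ?",
        (sensor_id,),
    ).fetchall()
    return [json.loads(r[0]) for r in rows]


db = open_store()
eid = put(db, {"sensor_id": "s1", "value": 3})
```

This sketch does not clean up stale index rows when an entity's `sensor_id` changes, so a real version would have to handle that.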

Any insights from people who've done a shift would be greatly appreciated!

Thanks.
