Redis

Partitioning: how to split data among multiple Redis instances.

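Partitioning is the process of splitting your data across multiple Redis instances, so that every instance contains only a subset of your keys. The first part of this document introduces the concept of partitioning; the second part shows the alternatives for Redis partitioning.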

Why partitioning is useful

Partitioning in Redis serves two main goals:

Partitioning basics

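There are different partitioning criteria. Suppose we have four Redis instances R0, R1, R2, R3, and many keys representing users, like user:1, user:2, ... and so forth. There are different ways to select the instance in which a given key is stored. In other words, there are different systems to map a given key to a given Redis server.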

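One of the simplest ways to perform partitioning is range partitioning, which maps ranges of objects to specific Redis instances. For example, I could say that users with IDs 0 to 10000 go into instance R0, users with IDs 10001 to 20000 go into instance R1, and so forth.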

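This system works and is used in practice. Its disadvantage is that it requires a table that maps ranges to instances. This table has to be managed, and it has to cover every kind of object we store. For Redis this is usually not a good approach.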
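The range table described above can be sketched as follows. This is a minimal illustration, not a recommended design: the ranges for R0 and R1 come from the example in the text, while the ranges for R2 and R3, the table layout, and the function name `instance_for_user` are assumptions made for the sketch.

```python
import bisect

# Hypothetical range table: (inclusive upper bound of the user ID, instance).
# The R0 and R1 ranges follow the text; R2 and R3 are assumed continuations.
RANGE_TABLE = [
    (10000, "R0"),
    (20000, "R1"),
    (30000, "R2"),
    (40000, "R3"),
]
UPPER_BOUNDS = [upper for upper, _ in RANGE_TABLE]


def instance_for_user(user_id: int) -> str:
    """Return the name of the instance whose range contains user_id."""
    if user_id < 0:
        raise ValueError("user ID must be non-negative")
    # bisect_left finds the first range whose upper bound is >= user_id.
    index = bisect.bisect_left(UPPER_BOUNDS, user_id)
    if index == len(RANGE_TABLE):
        raise KeyError(f"no range covers user ID {user_id}")
    return RANGE_TABLE[index][1]
```

The sketch makes the drawback visible: `RANGE_TABLE` has to be extended by hand whenever IDs grow past the last range, and a separate table would be needed for every other kind of object.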

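An alternative to range partitioning is hash partitioning. This scheme works with any key, without requiring keys of the form object_name:<id>, and it is simple: a hash function turns the key name into a number, and that number is then mapped to one of the instances.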
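A minimal sketch of hash partitioning, assuming CRC32 as the hash function and a modulo operation to pick one of the four instances R0 to R3. The choice of CRC32 and the name `instance_for_key` are assumptions for illustration; any stable hash function would serve.

```python
import zlib

# The four instances from the running example.
INSTANCES = ["R0", "R1", "R2", "R3"]


def instance_for_key(key: str) -> str:
    """Map any key to an instance by hashing it and taking the remainder."""
    # Step 1: turn the key name into a number with a hash function.
    number = zlib.crc32(key.encode("utf-8"))
    # Step 2: reduce the number to an index between 0 and len(INSTANCES) - 1.
    return INSTANCES[number % len(INSTANCES)]
```

Because the result depends on `len(INSTANCES)`, changing the number of instances remaps most keys, which is the weakness that consistent hashing tries to reduce.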

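There are many other ways to perform partitioning, but these two examples should give you the idea. One advanced form of hash partitioning is called consistent hashing, and it is implemented by a few Redis clients and proxies.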
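To make the idea of consistent hashing concrete, here is a toy hash ring. It is a sketch only and does not reproduce the algorithm of any particular Redis client or proxy: the class name `HashRing`, the use of SHA-256, and the default of 100 points per node are all assumptions chosen for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring (illustrative, not any client's algorithm)."""

    def __init__(self, nodes, replicas=100):
        self._replicas = replicas  # points placed on the ring per node
        self._ring = []            # sorted list of (point, node) tuples
        for node in nodes:
            self.add(node)

    @staticmethod
    def _point(label: str) -> int:
        # Use the first 8 bytes of a SHA-256 digest as a position on the ring.
        digest = hashlib.sha256(label.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, node: str) -> None:
        for i in range(self._replicas):
            bisect.insort(self._ring, (self._point(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def node_for(self, key: str) -> str:
        if not self._ring:
            raise LookupError("the ring has no nodes")
        # First point at or after the key's position, wrapping around the end.
        index = bisect.bisect_left(self._ring, (self._point(key), ""))
        return self._ring[index % len(self._ring)][1]
```

With this structure, removing a node only reassigns the keys that lived on that node; every other key keeps its instance, unlike plain modulo hashing.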

Different implementations of partitioning

Partitioning can be the responsibility of different parts of a software stack.

Disadvantages of partitioning

Some features of Redis do not play well with partitioning:

Data store or cache?

Partitioning is conceptually the same whether Redis is used as a data store or as a cache, but there is one important difference. When Redis is used as a data store, a given key must always map to the same instance. When Redis is used as a cache, it is not a big problem if a given node is unavailable and we start using a different node, altering the key-instance map as we wish in order to improve the availability of the system (that is, its ability to reply to our queries).

Consistent hashing implementations are often able to switch to other nodes if the preferred node for a given key is not available. Similarly, if you add a new node, part of the new keys will start to be stored on the new node.

The main concept here is the following:

Presharding

We learned that a problem with partitioning is that, unless we are using Redis as a cache, adding and removing nodes can be tricky, and it is much simpler to use a fixed keys-instances map.

However, data storage needs may vary over time. Today I can live with 10 Redis nodes (instances), but tomorrow I may need 50 nodes.

Since Redis has an extremely small footprint and is lightweight (a spare instance uses 1 MB of memory), a simple approach to this problem is to start with a lot of instances from the beginning. Even if you start with just one server, you can decide to live in a distributed world from day one and run multiple Redis instances on that single server, using partitioning.

You can make this number of instances quite large from the start. For example, 32 or 64 instances could do the trick for most users and will provide enough room for growth.

In this way, as your data storage needs increase and you need more Redis servers, you simply move instances from one server to another. Once you add the first additional server, you will need to move half of the Redis instances from the first server to the second, and so forth.
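The presharding idea can be sketched as two separate maps: a fixed key-to-instance map that never changes, and an instance-to-server layout that changes when instances are moved. The code below is a hedged illustration: the count of 64 instances comes from the text, while the host addresses, the base port, the one-port-per-instance convention, and the function names are assumptions.

```python
import zlib

NUM_INSTANCES = 64  # chosen once, up front, and never changed


def instance_index(key: str) -> int:
    """Fixed key-to-instance map: depends only on the key and NUM_INSTANCES."""
    return zlib.crc32(key.encode("utf-8")) % NUM_INSTANCES


def layout(servers, base_port=6379):
    """Spread the fixed set of instances evenly across the given servers.

    Returns a dict: instance index -> (host, port). Each instance keeps its
    own port (assumed convention), so only the host changes when it moves.
    """
    return {
        i: (servers[i % len(servers)], base_port + i)
        for i in range(NUM_INSTANCES)
    }


def address_for_key(key: str, current_layout) -> tuple:
    """Resolve a key to the (host, port) of the instance that owns it."""
    return current_layout[instance_index(key)]


# Day one: all 64 instances run on a single (hypothetical) server.
one_server = layout(["10.0.0.1"])
# Later: half of the instances have been moved to a second server.
two_servers = layout(["10.0.0.1", "10.0.0.2"])
```

Moving from `one_server` to `two_servers` changes where 32 of the instances run, but `instance_index` returns the same value for every key, so no key has to be rehashed.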

Using Redis replication you will likely be able to do the move with minimal or no downtime for your users:

Implementations of Redis partitioning

So far we covered Redis partitioning in theory, but what about practice? What system should you use?

Redis Cluster

Redis Cluster is the preferred way to get automatic sharding and high availability. It is currently not production ready, but it has finally entered the beta stage, so we recommend that you start experimenting with it. You can get more information about Redis Cluster in the Cluster tutorial.

Once Redis Cluster is available, and if a Redis Cluster compliant client is available for your language, Redis Cluster will be the de facto standard for Redis partitioning.

Redis Cluster is a mix between query routing and client side partitioning.

Twemproxy

Twemproxy is a proxy developed at Twitter for the Memcached ASCII and the Redis protocol. It is single threaded, it is written in C, and is extremely fast. It is open source software released under the terms of the Apache 2.0 license.

Twemproxy supports automatic partitioning among multiple Redis instances, with optional node ejection if a node is not available (this will change the keys-instances map, so you should use this feature only if you are using Redis as a cache).

It is not a single point of failure since you can start multiple proxies and instruct your clients to connect to the first that accepts the connection.

Basically, Twemproxy is an intermediate layer between clients and Redis instances that reliably handles partitioning for us with minimal additional complexity. Currently it is the suggested way to handle partitioning with Redis.

You can read more about Twemproxy in this antirez blog post.

Clients supporting consistent hashing

An alternative to Twemproxy is to use a client that implements client side partitioning via consistent hashing or other similar algorithms. There are multiple Redis clients with support for consistent hashing, notably Redis-rb and Predis.

Please check the full list of Redis clients to see whether there is a mature client with a consistent hashing implementation for your language.