Horizontal scaling with ElastiCache Redis: Stop getting burned by hot keys and shards
Solutions to avoid performance issues and balance traffic.
Cloud computing thrives on the promise that workloads of enormous scope can be served by horizontally scaling architectures. When a task that cannot be easily subdivided between hosts arises, however, horizontal scaling cannot help us – and when working with Redis, Hot Keys and Hot Shards are an all-too-common example.
What are hot keys?
In a theoretical perfect world, traffic would be evenly divided across all keys in a cache. In the real world, it is often the case that traffic is unevenly concentrated on a small set of keys. These are often referred to as “Hot Keys”. When this occurs, we often encounter bottlenecks in even scalable systems, particularly when dealing with high write loads on a small number of keys.
What are hot shards?
When running Redis in Cluster mode, as is the default “Production” configuration in ElastiCache Redis, data is divided over multiple “Shards”. Each shard must fit on a single host node. Shards may be replicated, providing redundancy and additional resources for serving reads, but writes are still the sole responsibility of the “primary” node, and shard capacity is dictated by the size of the node used.
To decide which shard should be responsible for a given key, Redis takes the CRC16 of the key and uses it to assign a numerical “hash slot”. Each shard in a cluster is responsible for an independent subset of these slots.
The condition we call a “Hot Shard” arises when one of these shards receives a disproportionate share of the total traffic. This may be due to the presence of hot keys within that shard, or due to unfortunate slot placement that results in a greater-than-fair portion of the actual data being assigned to a shard.
Why do hot keys/hot shards matter?
Hot keys and hot shards stymy our ability to scale. This typically manifests as resource exhaustion on overworked hosts.
CPU saturation
Hot keys and shards often load the serving hosts to capacity, saturating the available compute time and resulting in elevated latency and errors due to overload.
This is particularly exacerbated in Redis, which performs all operations within a single thread and is unable to effectively leverage multi-core hosts.
Memory saturation
Hot keys and shards can also stress the memory utilization of the serving host.
In Redis, the “best” case scenario is that the configuration is appropriate and evicts data or rejects new writes. In worse cases, Redis may exceed host memory limits and be killed, resulting in downtime and data loss.
Network saturation
Hot keys and shards can also result in network overload. The bandwidth required to serve the requests can simply exceed what is available to the serving hosts.
In Redis, it is important to remember that when replication is enabled, writes incur multiplicatively more bandwidth usage, as in addition to the bandwidth used by the primary to serve the request, it must also send that data to downstream replicas.
How can we mitigate the impact of hot keys in Redis?
If the hot keys are experiencing issues due to write performance in Redis, the best available option is often to instead attempt to reduce the volume of writes to those keys via changes in the client application. As writes are ultimately restricted by the performance of a single host, the only means available to address issues within Redis is to vertically scale the hosts – though that has marginal returns, as Redis is unable to make use of multiple cores.
If the hot keys are experiencing issues due to read performance, a common first mitigation would be to distribute the read load across additional replicas by increasing the replication factor of the shard. Alternatively, read load can often be reduced by introducing cache layers in downstream Redis consumers.
How can we mitigate the impact of hot shards in Redis?
Reassign hash slots to balance load
Redis allows for specifying which shards are responsible for which hash slots. By reassigning hash slots, we can help to balance traffic between shards.
However, this method has significant downsides. It is highly manual and requires ongoing maintenance to keep balanced. Additionally, it can only be performed offline, requiring downtime. While ElastiCache has developed an impressive slot rebalancing feature which can act on online clusters, it is only able to distribute slots evenly between shards – it doesn’t take into consideration whether that will produce balanced load.
If your load is predictably distributed between keys and is not expected to fluctuate significantly, doing the one-time work to balance your load may pay dividends.
Scale the entire cluster
A less work-intensive but more costly alternative, hot shards are often mitigated by simply adding additional shards to a Redis cluster. While splitting traffic to additional shards will reduce the work hot shards are required to do and frequently resolve the issue, doing so hurts our efficiency, as now the same amount of traffic requires more nodes to serve, many of which are likely to be relatively idle.
In conclusion
Hot keys and hot shards are a difficult challenge when working with Redis. We are often left stuck between a rock and a hard place – we can overprovision our clusters to support a small portion of the keyspace, or rework our client applications to balance traffic more evenly.
At Momento, we optimize our infrastructure to handle hot keys (and hot shards) behind the scenes. As traffic to your caches changes, you can leave handling the scaling to us–no downtime or maintenance windows required.