
How to Tune Redis to Handle Big Spikes in Traffic

Configure memory allocation, connection settings, data structure usage and more to optimize your distributed caching layer to withstand wildly fluctuating workloads.

Jakkie Koekemoer, Software Engineer

A distributed caching system is a great tool for enhancing application performance and scalability. These systems temporarily store frequently accessed data in memory, reducing the latency and load on primary databases. This ensures that applications can sustain optimal performance during peak traffic periods.

Redis is a source-available, in-memory data store that was initially designed as a caching layer. Over time, it has evolved into a platform that can also function as a database, message broker and streaming engine. Besides strings it supports rich data structures such as hashes, lists, sets, sorted sets, HyperLogLogs and bitmaps, as well as streams and geospatial indexes.

Alternatives to Redis exist, but they have different use cases. Memcached is also a high-performance, in-memory caching system, but it's simpler and more lightweight than Redis and focuses solely on key-value storage without support for the complex data types or persistence features found in Redis. Hazelcast is an in-memory data grid solution that shares a lot of similarities with Redis, including data structures and caching capabilities. However, Hazelcast is more suited for use cases that require high-performance processing for real-time streaming data. While Redis also has primitives like Redis Streams, it's mainly focused on querying data at rest.


Even with a high-performance data store like Redis, issues manifest when the application is running under heavy load. Careful planning is required to avoid poor user experience and system downtime. This article explores some best practices and strategies for effective performance tuning to handle high-traffic scenarios.

The following are methods you can use to fine-tune Redis to handle load spikes.

Adjusting Memory Allocation Settings

Because Redis is an in-memory database, efficient memory handling is a crucial factor in performance tuning.

Tune Your Active Expire Cycle

By default, Redis expires keys both passively and actively. Passive expiration is lazy: an expired key is only reclaimed when a client tries to access it. The active expiration cycle complements this by periodically sampling keys with a TTL and reclaiming those that have already expired.

If you increase the active-expire-effort value (from its default of 1, up to a maximum of 10), the active expire cycle becomes more aggressive, using more CPU but tolerating fewer already expired keys lingering in memory:

active-expire-effort 5
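
In recent Redis versions this directive can also be changed at runtime with CONFIG SET, so you can experiment during a spike without restarting the server. Here's a minimal redis-py sketch, assuming a local instance on the default port:

import redis

client = redis.Redis(host='localhost', port=6379, db=0)

# Temporarily raise the expire effort, then confirm the new value
client.config_set('active-expire-effort', 5)
print(client.config_get('active-expire-effort'))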

Manage Memory Fragmentation

Memory fragmentation occurs when free memory is scattered in small blocks, making it difficult for the system to allocate large contiguous memory blocks. In Redis, this can happen due to frequent allocation and deallocation of memory because of the dynamic nature of its data structures.

To monitor memory fragmentation in Redis, use the INFO memory command, which provides detailed statistics about memory usage. A key metric to observe is the mem_fragmentation_ratio, calculated as the ratio between used_memory_rss (the number of bytes that Redis allocates as seen by the operating system) and used_memory (the total number of bytes allocated by Redis using its allocator):

INFO memory

After running this command, your output would look something like this:

# Memory
used_memory:1024000
used_memory_human:1000.00K
used_memory_rss:2048000
used_memory_rss_human:2.00M
mem_fragmentation_ratio:2.00
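
You can also read these metrics programmatically and alert when fragmentation climbs. Here's a minimal redis-py sketch; the 1.5 threshold is only an illustrative value:

import redis

client = redis.Redis(host='localhost', port=6379, db=0)

# INFO memory is returned as a dictionary of memory statistics
mem = client.info('memory')
ratio = mem['mem_fragmentation_ratio']
if ratio > 1.5:
    print(f"Memory fragmentation is high (ratio {ratio})")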

You can also use the MEMORY PURGE command to help reduce memory fragmentation by attempting to release memory back to the operating system:

MEMORY PURGE

Another technique is to turn on active defragmentation in the configuration file (it's disabled by default). This process runs in the background and helps defragment memory over time:

activedefrag yes

Set Max Memory for Load Management

Configuring maxmemory is essential for preventing Redis from exhausting system resources during load spikes. It also ensures predictable memory usage. By setting a limit on the maximum memory Redis can use, you can avoid out-of-memory issues that can lead to system instability or crashes:

maxmemory 10gb

When you reach the memory limit, Redis removes keys as per the configured eviction policy specified by the maxmemory-policy attribute:

maxmemory-policy <policy name>

Redis provides several eviction policies to manage memory effectively when it reaches the maxmemory limit:

  • noeviction disallows any evictions. When the memory limit is reached, write operations return an error. This is suitable for read-heavy workloads where data persistence in memory is critical, but it can cause write operations to fail during load spikes.
  • allkeys-lru evicts the least recently used (LRU) keys across the entire data set. allkeys-lru is useful for workloads where all data is considered equally important, ensuring that frequently accessed data remains in memory.
  • volatile-lru is similar to allkeys-lru, but it only considers keys with a time to live (TTL). This is effective when you want to prioritize keeping non-expiring keys in memory while evicting less critical, expiring data.
  • allkeys-random evicts random keys from the entire data set. It can be useful in scenarios where data access patterns are highly unpredictable, but it can lead to less optimal performance compared to LRU policies.
  • volatile-random is similar to allkeys-random but only considers keys with a TTL. This policy is useful when you want to evict expiring keys randomly, providing a balance between randomness and prioritizing non-expiring data.
  • volatile-ttl evicts keys with the shortest TTL first. It's ideal for scenarios where it's preferable to remove soon-to-expire data during load spikes, thus preserving more recently added data.
  • volatile-lfu is a least frequently used (LFU) policy that evicts keys with the lowest access frequency among those with a TTL. It's best for workloads where access patterns show frequent use of a subset of keys, helping to retain the most frequently accessed data.

Monitoring eviction events and memory usage is crucial to ensure your eviction policy works as intended. You can do this with a combination of INFO memory and INFO stats; the latter provides a range of statistics about the Redis server, including eviction events:

INFO stats

Example output:

# Stats
total_connections_received:123456
total_commands_processed:234567
instantaneous_ops_per_sec:789
total_net_input_bytes:3456789
total_net_output_bytes:4567890
rejected_connections:0
expired_keys:1234
evicted_keys:567
keyspace_hits:123456
keyspace_misses:7890

In this output, the following metrics are important:

  • evicted_keys is the total number of keys evicted due to the maxmemory limit.
  • keyspace_hits counts the number of successful key lookups in the database and increments each time a key is found when requested. In contrast, keyspace_misses counts the number of failed key lookups and increments each time a key is requested but not found in the database. Together, these metrics help you understand the efficiency of your data access patterns.
  • expired_keys is the total number of keys that have expired.

During load testing or an actual production load spike, pay close attention to the evicted_keys metric. If the eviction policy is working as intended, you should see evictions increase as memory usage approaches the maxmemory limit. In addition, monitor keyspace_hits and keyspace_misses to ensure that the eviction policy is not adversely affecting the hit rate.
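
As a rough sketch of what that check might look like with redis-py (the 80 percent hit rate threshold is a placeholder you'd tune for your workload):

import redis

client = redis.Redis(host='localhost', port=6379, db=0)

stats = client.info('stats')
hits, misses = stats['keyspace_hits'], stats['keyspace_misses']
hit_rate = hits / (hits + misses) if (hits + misses) else 1.0

print(f"evicted_keys={stats['evicted_keys']} hit_rate={hit_rate:.2%}")
if hit_rate < 0.80:
    print("Hit rate is dropping; the eviction policy may be removing hot keys")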

You may need to adjust Redis configuration parameters to balance memory management and performance based on observed load behavior.

To set the eviction policy dynamically, you can use the following:

CONFIG SET maxmemory-policy volatile-ttl

Tweaking Connection Settings

When it comes to preventing Redis from being overwhelmed during load spikes, it's easy to focus on memory and overlook connection settings. Let's explore some connection strategies that can help.

Set Connection Limits

Set appropriate connection limits using the maxclients directive, which specifies the maximum number of simultaneous connections Redis will accept:

maxclients 10000

If Redis reaches the maxclients limit, it starts rejecting new connections with an error message.

Each client connection consumes resources (memory and CPU). To estimate an appropriate maxclients value, consider the number of concurrent connections you expect during peak usage and the connection behavior of your applications, and evaluate the available system resources.

For example, if the default memory overhead per connection is estimated to be around 1 KB and the expected number of connections is 10,000, then the total memory overhead would be:

10,000 connections * 1 KB = 10 MB

Use the INFO command to monitor connections:

INFO clients

Your output would look like this:

# Clients
connected_clients:5000
blocked_clients:0
maxclients:10000

Use Timeout Settings

The timeout directive manages idle connections by setting a timeout value in seconds. Connections that remain idle for longer than the specified timeout are closed.

For example, to set the timeout to 120 seconds, add the following to the Redis configuration file:

timeout 120

A shorter timeout value can prevent idle connections from consuming resources during peak traffic, while a longer timeout may be beneficial for applications with intermittent communication patterns.

Enable TCP Keepalive

Enabling TCP keepalive helps detect dead client connections, ensuring resources aren't wasted on clients that are no longer connected.

To configure TCP keepalive, set the tcp-keepalive directive to an appropriate value in seconds. The default setting is 300 seconds, but this can be adjusted based on your network conditions and requirements. For instance, in an environment with high latency or unstable network conditions, you might want to reduce the tcp-keepalive value to detect broken connections more quickly. A value of 60 to 120 seconds can help Redis identify and close idle connections faster, preventing issues caused by lingering dead connections:

tcp-keepalive 60

Set Connection Buffer Limits

Redis allows setting limits for different classes of clients (normal, replica, pubsub) using the client-output-buffer-limit directive. Configuring client output buffer limits prevents a single client type from using excessive memory:

client-output-buffer-limit normal 0 0 0
client-output-buffer-limit replica 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60

The syntax is as follows:

client-output-buffer-limit <class> <hard limit> <soft limit> <soft seconds>

A client is disconnected immediately once the hard limit is reached, or disconnected after the soft limit has been exceeded continuously for the specified number of seconds. For example, the replica setting above disconnects a replica whose output buffer exceeds 256 MB, or stays above 64 MB for 60 consecutive seconds.

Optimize the TCP Backlog

The tcp-backlog setting determines the size of the TCP backlog queue, which is the queue of pending connections waiting to be accepted by Redis. During load spikes, increasing the backlog size can help Redis handle a higher number of incoming connection requests without dropping them. Note that the effective backlog is also capped by the kernel's net.core.somaxconn setting, so you may need to raise that value as well for a larger backlog to take effect:

tcp-backlog 511

Use Connection Pooling

Connection pooling reuses existing connections, which significantly reduces the overhead of establishing new connections during load spikes. Many Redis client libraries support connection pooling out of the box.

Make sure you configure the connection pool size based on your application's concurrency requirements and Redis server capacity:

import redis

# Size the pool for your expected concurrency; 50 is an illustrative value
pool = redis.ConnectionPool(host='localhost', port=6379, db=0, max_connections=50)

# Use the connection pool to create a Redis client
client = redis.Redis(connection_pool=pool)

client.set('key', 'value')
print(client.get('key'))

Monitor Connection Metrics

Use the INFO command to track active connections, rejected connections and connection spikes. This helps detect capacity issues and optimize resource allocation. INFO also enables quick identification of performance bottlenecks and ensures efficient handling of client connections:

INFO clients

Key metrics to monitor include:

  • connected_clients: The number of currently connected clients.
  • rejected_connections: The number of connections rejected due to the maxclients limit.
  • total_connections_received: The total number of connections accepted since the server started (reported under INFO stats rather than INFO clients).

Many of these connection settings can be adjusted at runtime with CONFIG SET (for example, maxclients, timeout and tcp-keepalive), so you can react to a load spike without restarting Redis.
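
As an illustration, a small redis-py polling loop could watch these numbers during a spike (the 10-second interval is arbitrary):

import time
import redis

client = redis.Redis(host='localhost', port=6379, db=0)

while True:
    clients = client.info('clients')
    stats = client.info('stats')
    print(f"connected={clients['connected_clients']} "
          f"rejected={stats['rejected_connections']}")
    if stats['rejected_connections'] > 0:
        print("Connections are being rejected; consider raising maxclients")
    time.sleep(10)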

Optimizing Data Structure Usage

Redis data structures are versatile and fast, but you can tune how you use and encode them to squeeze out even more performance, which is crucial under heavy load.

Use Common Data Structures

The following data structures are commonly used in Redis applications.

Hashes

Hashes in Redis can be memory-efficient when certain optimizations are applied. Redis uses two internal encodings for hashes: a compact encoding called ziplist (renamed listpack in Redis 7) and hashtable. By default, small hashes use the compact encoding to save memory.

Configuring these parameters helps ensure that small hashes use ziplist encoding, which is more memory-efficient:

  • hash-max-ziplist-entries: Maximum number of entries in a hash to use ziplist encoding.
  • hash-max-ziplist-value: Maximum size of values in a hash to use ziplist encoding.

hash-max-ziplist-entries 512
hash-max-ziplist-value 64
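
To confirm which encoding a key actually uses, check it at runtime with OBJECT ENCODING; depending on your Redis version the compact encoding is reported as ziplist or listpack. A quick redis-py sketch with a hypothetical user:1000 key:

import redis

client = redis.Redis(host='localhost', port=6379, db=0)

client.hset('user:1000', mapping={'name': 'Ada', 'email': 'ada@example.com'})
# Small hashes report the compact encoding; large ones report 'hashtable'
print(client.execute_command('OBJECT', 'ENCODING', 'user:1000'))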

Lists

Lists are used for sequential storage, job queues and task lists. They support efficient operations at both ends of the list, making them suitable for first in, first out (FIFO) and last in, first out (LIFO) patterns:

LPUSH task_queue "task1"
RPOP task_queue

In the following example, LPUSH adds a new task to the queue and LTRIM caps the list at a certain number of items (1,000 in this case). Trimming prevents unbounded list growth, ensuring consistent memory usage and performance. It's ideal for scenarios where only the most recent items matter, like log management or real-time data streams, where older data becomes less relevant over time.

LPUSH task_queue "task1"
LTRIM task_queue 0 999

Blocking operations like BRPOP are excellent for creating efficient, real-time task queues. BRPOP waits for an item to appear in the list, blocking until one is available or a timeout occurs. This approach eliminates constant polling, reducing CPU usage and network traffic. 

In the following example, a timeout of 0 means it will block indefinitely until an element is available:

BRPOP task_queue 0
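
A worker built on this pattern could look like the following redis-py sketch; the queue name and processing step are placeholders:

import redis

client = redis.Redis(host='localhost', port=6379, db=0)

while True:
    # Blocks until a task arrives; returns a (queue_name, task) pair
    queue, task = client.brpop('task_queue', timeout=0)
    print(f"Processing {task!r} from {queue!r}")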

Sorted Sets

Sorted sets are used for scenarios requiring ordered data with scores, such as leaderboards or ranked queues. They maintain order and allow for a range of queries based on scores:

ZADD leaderboard 100 "user1" 200 "user2"
ZRANGE leaderboard 0 -1 WITHSCORES

When a sorted set is small and its members are short strings, Redis uses a compact encoding called ziplist, similar to hashes. The conditions for using ziplist encoding are controlled by two configuration parameters:

  • zset-max-ziplist-entries: Maximum number of members (default 128)
  • zset-max-ziplist-value: Maximum length of member string (default 64 bytes)

For sorted sets, keep the number of members and their string lengths below the configured thresholds.

HyperLogLogs

HyperLogLogs are useful for approximating the cardinality of a set (counting unique items) with minimal memory. They're ideal for large-scale analytics, such as counting unique visitors or events.

PFADD visitors "user1" "user2" "user3" "user1" "user4" "user3"
PFCOUNT visitors

The encoding of HyperLogLog (HLL) structures switches between SPARSE and DENSE based on their size. A new HyperLogLog starts with the SPARSE encoding, which is more memory-efficient for small cardinalities; once the sparse representation grows past the configured threshold (hll-sparse-max-bytes), Redis automatically converts it to the DENSE encoding, which is more efficient to update at larger sizes.
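
In practice, HyperLogLogs pair well with per-period keys that you merge for aggregate counts. A brief redis-py sketch using hypothetical visitors:<date> keys:

import redis

client = redis.Redis(host='localhost', port=6379, db=0)

# Track daily unique visitors with a small, bounded amount of memory per key
client.pfadd('visitors:2024-08-12', 'user1', 'user2', 'user3')
client.pfadd('visitors:2024-08-13', 'user2', 'user4')

# Merge the daily counters to estimate uniques across both days
client.pfmerge('visitors:week', 'visitors:2024-08-12', 'visitors:2024-08-13')
print(client.pfcount('visitors:week'))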

Use Memory Optimization Commands

You can use several commands that only modify data structures when certain conditions are met:

  • HSETNX sets a field in a hash only if it does not already exist, preventing redundant data writes.
  • LPUSHX adds an element to a list only if the list exists, avoiding the creation of new lists with a single element.
  • The NX option for SET sets the value of a key only if the key does not already exist, and the XX option sets the value of a key only if the key already exists.

Here are a few examples:

HSETNX user:42 email "foo@bar.com"

In this example, the output of HSETNX is 1 if the field email was set because it did not exist, and 0 if the field email was not set because it already existed.

LPUSHX existing_list "new_element"

In this example, LPUSHX returns the length of the list after the push operation if the list exists, or 0 if the list does not exist.

SET key1 value1 NX

In this example, the output is OK if the key was set because it did not exist, or nil if the key was not set because it already existed.
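
These conditional writes are especially handy during load spikes, for example to ensure only one worker refreshes an expensive cache entry at a time. A redis-py sketch with a placeholder key name and TTL:

import redis

client = redis.Redis(host='localhost', port=6379, db=0)

# Only the first caller sets the key; others get None back until it expires
acquired = client.set('cache:refresh-lock', 'worker-1', nx=True, ex=30)
if acquired:
    print("This worker refreshes the cache entry")
else:
    print("Another worker is already on it")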

Compress Data Structures

In your application, use libraries like gzip or snappy to compress large strings or binary data before storing it in Redis:

import gzip
import redis

client = redis.Redis(host='localhost', port=6379, db=0)

# gzip.compress expects bytes; decompress with gzip.decompress after reading
data = b"Large data to compress"
client.set("key", gzip.compress(data))

In addition to hashes, small lists and sorted sets can also use the compact ziplist encoding to save memory. For lists, list-max-ziplist-size controls how many entries each list node may hold before Redis switches to the less compact representation.

Incorporate Regular Data Structure Cleanup

Using TTL in Redis allows for automatic key removal after a specified period, which is useful for managing temporary data, optimizing memory usage and ensuring data freshness in caching scenarios.

SET session:123 "data" EX 3600  # Expires after 1 hour

Implement background tasks to clean up stale or unnecessary data periodically. For example, you can use Redis SCAN to find and delete keys matching certain patterns:

import redis

client = redis.Redis(host='localhost', port=6379, db=0)

# Iterate the keyspace incrementally and delete keys matching the pattern
cursor = 0
while True:
    cursor, keys = client.scan(cursor, match="temp:*", count=100)
    if keys:
        client.delete(*keys)
    if cursor == 0:
        break

Implement Benchmarking

To identify the most efficient data structures and configurations for specific scenarios, you can use the redis-benchmark CLI tool to simulate load spikes.

For instance, in the following example, the benchmarks perform 100,000 operations of the specified command and report the performance metrics:

redis-benchmark -n 100000 -q -t hset
redis-benchmark -n 100000 -q -t lpush
redis-benchmark -n 100000 -q -t zadd

The output for the HSET benchmark looks like this (the other commands produce output in the same format):

====== HSET ======
100000 requests completed in 1.52 seconds
50 parallel clients
3 bytes payload
keep alive: 1
99.99% <= 1 milliseconds
100.00% <= 1 milliseconds
65789.47 requests per second

This output provides a detailed performance analysis of specific Redis commands. In this case, the HSET command benchmark completed 100,000 requests in 1.52 seconds using 50 parallel clients with a 3-byte payload. The latency distribution shows that 99.99 percent of requests were completed in less than or equal to 1 millisecond, and 100 percent within 1 millisecond. The throughput was 65,789.47 requests per second, indicating the efficiency of Redis in handling these operations.

The results indicate excellent performance for the HSET command, showcasing very low latency and high throughput, with nearly all requests completed in under one millisecond and a throughput of over 65,000 requests per second.

Adjustments may be necessary if benchmarks reveal higher latency, lower throughput, high resource utilization, or concurrency issues. For example, if latency exceeds 1-2 milliseconds for a substantial percentage of requests or if throughput falls significantly below expectations, it suggests potential bottlenecks. High CPU or memory usage during benchmarks or performance degradation with increased parallel clients are also indicators that optimizations or scaling may be required to maintain desired performance levels.

Other Criteria

In addition to the best practices discussed so far, there are a few other areas to keep in mind when optimizing Redis, including networking and distributed Redis architectures.

Monitor Network Latency

Redis provides several commands and tools for monitoring latency:

  • LATENCY LATEST returns the latest latency samples for all events.
  • LATENCY DOCTOR provides a human-readable report of detected latency issues.
  • LATENCY HISTORY <event name> provides raw data of the event's latency spikes in time series format.
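
Note that these commands only report samples once the latency monitor is enabled by setting latency-monitor-threshold to a value in milliseconds (0, the default, disables it). It can be turned on at runtime; here's a redis-py sketch with an illustrative 100 ms threshold:

import redis

client = redis.Redis(host='localhost', port=6379, db=0)

# Record any event that takes 100 ms or longer
client.config_set('latency-monitor-threshold', 100)
print(client.execute_command('LATENCY', 'DOCTOR'))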

External tools such as Prometheus, Grafana, New Relic or Datadog can also be used for monitoring and managing the performance of systems, applications and infrastructure components to identify bottlenecks.

Implement Sharding and Replication

Redis sharding distributes data across multiple instances, reducing the load and network traffic on any single Redis node during high-traffic periods. Sharding can be implemented using Redis Cluster or client-side sharding.

Redis Cluster automatically handles data distribution and provides a built-in sharding mechanism. It offers high availability and horizontal scalability, and read requests can be offloaded to replica nodes to reduce the load on the primary node and balance the network traffic.

Client-side sharding can be used to manually distribute data across multiple Redis instances using client libraries that support sharding. This approach gives more control over the sharding logic.

The following example uses the go-redis client's Ring, which distributes keys across multiple Redis shards using consistent hashing:

import "github.com/redis/go-redis/v9"

rdb := redis.NewRing(&redis.RingOptions{
   Addrs: map[string]string{
       "shard1": "localhost:7000",
       "shard2": "localhost:7001",
       "shard3": "localhost:7002",
   },
})

Configure replica-serve-stale-data to control whether replicas can serve stale data during a disconnection. Setting this to yes helps reduce the load on the primary node during spikes, as replicas can handle read traffic even if they are slightly out of sync:

replica-serve-stale-data yes

min-replicas-to-write tells the primary to stop accepting writes if fewer than the specified number of replicas are connected and within the allowed replication lag. This improves write safety, at the cost of rejecting writes when too few healthy replicas are available:

min-replicas-to-write 1

Set Up Dynamic Scaling

Redis clusters can be autoscaled by dynamically adding or removing nodes based on traffic conditions. Monitoring tools track key performance indicators such as CPU usage, memory consumption and request rates. When a traffic spike is detected, additional Redis nodes can be provisioned and added to the cluster, distributing the load more evenly and preventing any single node from becoming a bottleneck.

Conclusion

Even with a high-performance data store like Redis, issues often remain hidden when the system is running at low to moderate traffic. In this article, you explored Redis performance tuning best practices, such as adequate memory allocation, using the right eviction policies and optimizing data structure usage. Implementing these best practices ensures optimal resource utilization, enabling Redis to handle sudden surges in demand while maintaining fast response times and overall system stability.

If you are implementing a distributed caching system to support a high-stakes application serving a global user base, Equinix provides a set of robust network and cloud infrastructure primitives that are fully programmable and can be used to design exactly the architecture your app needs. Automated bare metal compute and storage in 31 metros, a global backbone and the ability to link privately to all major clouds and thousands of service providers and enterprises gives you the freedom to combine just the components your solution requires.

Published on 13 August 2024
