Stage 1: Single Instance

A single Redis instance can handle 100,000+ operations per second on commodity hardware. For most small-to-medium applications — caching, session storage, simple queues, rate limiting — this is more than enough. Before adding replicas, Sentinel, or Cluster, understand why one instance is so fast and what it guarantees.

The Anti-Over-Engineering Rule

If your app has one or two servers, your Redis fits in memory, and you can tolerate losing up to one second of data on crash — stay here. Everything in the next two stages adds operational complexity that you don't need yet.


The Event Loop

Redis processes every command on a single thread. This sounds like a bottleneck, but it's actually a performance advantage.

How It Works

Redis implements its own event library called ae, which wraps the fastest available OS-level I/O multiplexer: epoll on Linux, kqueue on macOS/BSD, or select as a fallback. The core loop is minimal:

  1. Register all client sockets with the multiplexer
  2. Wait for any socket to have data ready (non-blocking)
  3. Process ready connections one at a time, to completion
  4. Repeat
graph TD
    A[Wait for ready sockets<br/><i>epoll/kqueue</i>] --> B{Any socket<br/>has data?}
    B -->|Yes| C[Read request<br/><i>network I/O</i>]
    C --> D[Parse command<br/><i>CPU</i>]
    D --> E[Execute against<br/>data structures<br/><i>CPU + memory</i>]
    E --> F[Write response<br/><i>network I/O</i>]
    F --> A
    B -->|No| G[Process timer events<br/><i>expiry, background tasks</i>]
    G --> A

Each command runs to completion before the next one starts. There is no interleaving, no preemption, no concurrent access to data structures.
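The loop can be sketched with Python's selectors module, which wraps the same OS multiplexers (epoll/kqueue) that ae does. Everything here is a toy stand-in, not Redis code: the socket pair plays the client, and the PING handling plays command execution.

```python
import selectors
import socket

# Toy version of the ae loop: register sockets, wait for readiness,
# serve each ready connection to completion, repeat.
sel = selectors.DefaultSelector()   # epoll on Linux, kqueue on macOS/BSD

def serve(conn: socket.socket) -> None:
    request = conn.recv(4096)        # read request (network I/O)
    if request.startswith(b"PING"):  # parse + execute (CPU)
        conn.sendall(b"+PONG\r\n")   # write response (network I/O)

def event_loop(iterations: int) -> None:
    for _ in range(iterations):
        for key, _events in sel.select(timeout=1):  # wait for ready sockets
            serve(key.fileobj)       # one connection at a time, to completion

# Demo: an in-process socket pair stands in for a real client connection
server_end, client_end = socket.socketpair()
sel.register(server_end, selectors.EVENT_READ)      # step 1: register
client_end.sendall(b"PING\r\n")
event_loop(1)
resp = client_end.recv(64)           # b"+PONG\r\n"
```

The important property is inside event_loop: each ready connection is served to completion before the next is touched, which is exactly why no locking is ever needed.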

Why Single-Threaded Is Fast

| Factor | Multi-threaded cost | How Redis avoids it |
| --- | --- | --- |
| Context switching | Thousands of CPU cycles per switch (kernel transition, register save/restore, cache invalidation) | Zero — one thread, no switching |
| Cache locality | Cores fight over cache lines ("ping-ponging"), adding hundreds of cycles per operation | Hot data stays in L1/L2 cache (~1ns access vs ~100ns for main memory) |
| Synchronization | Mutexes, spin locks, CAS loops — all add latency and complexity | No locks anywhere; impossible to deadlock |
| Correctness | Race conditions, data corruption, hard-to-reproduce bugs | Every operation is inherently atomic |

The key insight: since Redis operations are microsecond-scale (data is in memory, not on disk), the bottleneck is network I/O, not CPU. A single core can process commands far faster than the network can deliver them.

Redis 6.0+: Threaded I/O

Redis 6.0 introduced multi-threaded I/O — but with a critical distinction:

| What | Threaded? |
| --- | --- |
| Reading requests from sockets | Yes (I/O threads) |
| Writing responses to sockets | Yes (I/O threads) |
| Parsing commands | No (main thread) |
| Executing commands | No (main thread) |
| Data structure manipulation | No (main thread) |

The I/O threads do not run concurrently with command execution. The main thread dispatches read/write work to I/O threads, waits for them to finish, then processes commands. This preserves the single-threaded execution guarantee while parallelizing the socket reads/writes that were the actual bottleneck.

Impact

Threaded I/O is disabled by default. When enabled with 8 I/O threads, benchmarks show 37–112% throughput improvement depending on workload. The gains come from parallelizing network serialization/deserialization, not from executing commands faster.


Data Structures Under the Hood

When you use a Redis Hash, List, or Sorted Set, you're not directly using a hashtable, linked list, or tree. Redis chooses an internal encoding based on the data's size and shape, silently switching encodings as data grows.

SDS (Simple Dynamic Strings)

Redis never uses raw C strings. Every string value is an SDS (Simple Dynamic String):

[Header: len | alloc | flags] [string data...] [\0]

| Property | C string | SDS |
| --- | --- | --- |
| Get length | O(n) — scan for \0 | O(1) — read the len field |
| Binary safe | No — \0 terminates | Yes — length-tracked, can contain \0 |
| Buffer overflow protection | None | Checked against alloc |
| Resize cost | Always a full realloc | Pre-allocated space reduces reallocs |

SDS uses multiple header types (sdshdr8, sdshdr16, sdshdr32, sdshdr64) — the smallest header that can represent the string's length. A 200-byte string uses a 3-byte header; a 1GB string uses a 9-byte header.

Growth strategy: strings under 1MB double their allocation; strings over 1MB grow by 1MB at a time. This amortizes reallocation cost.
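The growth rule can be written down directly. This is a sketch of the logic in sds.c; the function name sds_new_alloc is invented here, but the 1MB threshold is the real SDS_MAX_PREALLOC constant.

```python
SDS_MAX_PREALLOC = 1024 * 1024  # 1MB threshold used by sds.c

def sds_new_alloc(needed: int) -> int:
    """Allocation size chosen when an SDS must grow to hold `needed`
    bytes: double below 1MB, add a flat 1MB above it."""
    if needed < SDS_MAX_PREALLOC:
        return needed * 2
    return needed + SDS_MAX_PREALLOC

# A 100-byte string grows to a 200-byte allocation...
print(sds_new_alloc(100))                                  # 200
# ...but a 10MB string grows by just 1MB, not to 20MB.
print(sds_new_alloc(10 * 1024 * 1024) // (1024 * 1024))    # 11
```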

Listpack (Replacing Ziplist)

For small hashes, lists, and sorted sets, Redis uses a listpack — a single contiguous block of memory with all entries stored sequentially.

graph LR
    subgraph "Listpack (one contiguous allocation)"
        A["Total<br/>bytes"] --> B["Entry 1<br/><i>encoding + data + backlen</i>"]
        B --> C["Entry 2<br/><i>encoding + data + backlen</i>"]
        C --> D["Entry 3<br/><i>encoding + data + backlen</i>"]
        D --> E["EOF<br/>marker"]
    end

Why contiguous memory matters: a hashtable with 10 entries requires ~10 separate allocations (node structs, key/value pointers). A listpack stores all 10 entries in one allocation. This means:

  • ~2 bytes overhead per entry vs ~21 bytes for a hashtable node
  • Sequential memory access (cache-friendly) vs pointer chasing (cache-hostile)
  • Zero pointer overhead

The ziplist problem: Redis 7 replaced ziplists with listpacks because of a design flaw. Ziplists stored each entry's previous entry's length. If you inserted a large entry, the next entry's "previous length" field might need to grow from 1 byte to 5 bytes, which could cascade through the entire structure — an O(n) chain reaction from a single insert. Listpacks store each entry's own length instead, eliminating cascading updates entirely.
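The cascade is easy to demonstrate with a simplified model: assume every entry starts with a 1-byte prevlen field, which must grow to 5 bytes whenever the previous entry's total size reaches 254. The function cascade_updates and the entry sizes are illustrative, not ziplist.c code, though 254 is the real cutoff.

```python
def cascade_updates(payloads: list[int], inserted_payload: int) -> int:
    """Simplified ziplist: each entry stores the previous entry's total
    length in 1 byte if it is < 254, else 5 bytes. All entries start
    with 1-byte prevlen fields; count how many must grow after a new
    entry is inserted at the head."""
    grown = 0
    prev_total = 1 + inserted_payload        # the new head entry's size
    for payload in payloads:
        need = 1 if prev_total < 254 else 5  # bytes needed for prevlen
        prev_total = need + payload          # this entry's new total size
        if need == 5:
            grown += 1                       # field grew: entry got bigger
        else:
            break                            # prevlen still fits: cascade stops
    return grown

# A 300-byte insert in front of a run of 250-byte entries: every entry's
# new total (5 + 250 = 255) stays >= 254, so the update cascades through all.
print(cascade_updates([250] * 8, 300))   # 8
# Small entries absorb the change after a single update.
print(cascade_updates([10, 10, 10], 300))   # 1
```

A listpack entry records its own length instead, so inserting at the head never touches any neighbor: the equivalent cascade count is always zero.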

Skiplist

Used for sorted sets when they outgrow the listpack threshold. A skiplist is a probabilistic data structure — a linked list with multiple "express lanes":

graph LR
    subgraph "Level 3"
        L3_1["1"] --> L3_9["9"]
    end
    subgraph "Level 2"
        L2_1["1"] --> L2_5["5"] --> L2_9["9"]
    end
    subgraph "Level 1"
        L1_1["1"] --> L1_3["3"] --> L1_5["5"] --> L1_7["7"] --> L1_9["9"]
    end

| Property | Skiplist | Balanced tree (e.g., AVL, red-black) |
| --- | --- | --- |
| Search | O(log n) | O(log n) |
| Insert/delete | O(log n), simpler | O(log n), complex rotations |
| Range queries | Very efficient (follow forward pointers) | Requires in-order traversal |
| Memory per node | ~1.33 pointers average (p=1/4) | 2 pointers (left/right) |
| Implementation | ~100 lines | ~300+ lines |

Redis chose skiplists over balanced trees because they're simpler to implement, naturally efficient for range queries (ZRANGEBYSCORE), and have comparable performance with slightly less memory.
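A minimal score-only skiplist shows the "express lane" mechanics. Redis's zskiplist in t_zset.c also stores members, span counts, and backward pointers; this sketch keeps only what a ZRANGEBYSCORE-style query needs.

```python
import random

P = 0.25        # level promotion probability, as in Redis
MAX_LEVEL = 8   # plenty for a toy example

class Node:
    def __init__(self, score: float, level: int):
        self.score = score
        self.forward = [None] * level   # one forward pointer per level

class SkipList:
    def __init__(self):
        self.head = Node(float("-inf"), MAX_LEVEL)
        self.level = 1

    @staticmethod
    def random_level() -> int:
        level = 1
        while random.random() < P and level < MAX_LEVEL:
            level += 1                   # each extra level is 4x rarer
        return level

    def insert(self, score: float) -> None:
        update = [self.head] * MAX_LEVEL
        node = self.head
        for i in range(self.level - 1, -1, -1):   # top lane downwards
            while node.forward[i] and node.forward[i].score < score:
                node = node.forward[i]
            update[i] = node             # rightmost node before `score`
        level = self.random_level()
        self.level = max(self.level, level)
        new = Node(score, level)
        for i in range(level):           # splice into each level
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new

    def range_by_score(self, lo: float, hi: float) -> list:
        """Locate the first match via the express lanes, then walk
        level-0 forward pointers."""
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] and node.forward[i].score < lo:
                node = node.forward[i]
        node = node.forward[0]
        out = []
        while node and node.score <= hi:
            out.append(node.score)
            node = node.forward[0]
        return out

sl = SkipList()
for s in [5, 1, 9, 3, 7]:
    sl.insert(s)
print(sl.range_by_score(3, 8))   # [3, 5, 7]
```

Note how the range query does the log-time search once, then streams results by following forward pointers, which is exactly why skiplists suit ZRANGEBYSCORE.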

Dict (Hashtable) with Incremental Rehashing

The core key-value store uses a dict — a hash table with a critical design choice: incremental rehashing.

A naive hash table doubles in size when the load factor gets too high. This means allocating a new table and moving all entries — a sudden, blocking operation that causes a latency spike proportional to the table size.

Redis avoids this by maintaining two hash tables simultaneously:

graph TD
    subgraph "During Rehashing"
        HT0["ht[0] — old table<br/><i>gradually emptying</i>"]
        HT1["ht[1] — new table (2x size)<br/><i>gradually filling</i>"]
    end

    OP["Every dict operation<br/>(lookup, insert, delete)"] --> M["Migrate 1 bucket<br/>from ht[0] → ht[1]"]
    M --> HT0
    M --> HT1

  1. When a resize triggers, ht[1] is allocated at double the size
  2. A rehashidx counter starts at 0
  3. On every normal operation (GET, SET, DEL), Redis migrates one bucket from ht[0] to ht[1] (visiting up to 10 empty buckets to bound work per step)
  4. Lookups check both tables; inserts go only to ht[1]
  5. When migration completes, ht[1] becomes ht[0]

The result: rehashing is invisible to clients. There is no sudden latency spike, just a tiny constant overhead spread across millions of operations. The trade-off is temporarily using ~2x the hash table memory during the migration window.
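The two-table dance can be sketched in a few dozen lines. This is a toy model with chaining and unique keys; the real dict.c additionally bounds how many empty buckets one step may visit, as noted above.

```python
class IncrementalDict:
    """Toy Redis-style dict: during a resize, every operation migrates
    one bucket from ht[0] to ht[1]."""

    def __init__(self, size: int = 4):
        self.ht = [[[] for _ in range(size)], None]   # ht[0], ht[1]
        self.rehashidx = -1       # -1 means "not rehashing"
        self.count = 0

    def _rehash_step(self) -> None:
        if self.rehashidx == -1:
            return
        ht0, ht1 = self.ht
        while self.rehashidx < len(ht0) and not ht0[self.rehashidx]:
            self.rehashidx += 1   # skip empty buckets
        if self.rehashidx < len(ht0):
            for k, v in ht0[self.rehashidx]:          # migrate one bucket
                ht1[hash(k) % len(ht1)].append((k, v))
            ht0[self.rehashidx] = []
            self.rehashidx += 1
        if self.rehashidx >= len(ht0):                # migration complete:
            self.ht = [ht1, None]                     # ht[1] becomes ht[0]
            self.rehashidx = -1

    def set(self, key, value) -> None:
        self._rehash_step()
        if self.rehashidx == -1 and self.count >= len(self.ht[0]):
            self.ht[1] = [[] for _ in range(len(self.ht[0]) * 2)]
            self.rehashidx = 0    # resize triggered: allocate at 2x size
        table = self.ht[1] if self.rehashidx != -1 else self.ht[0]
        table[hash(key) % len(table)].append((key, value))  # inserts -> ht[1]
        self.count += 1

    def get(self, key):
        self._rehash_step()
        for table in self.ht:     # lookups check both tables
            if table is None:
                continue
            for k, v in table[hash(key) % len(table)]:
                if k == key:
                    return v
        return None

d = IncrementalDict()
for i in range(100):
    d.set(f"key:{i}", i)          # resizes happen invisibly along the way
```

The caller never sees a resize: no single set or get ever pays more than one extra bucket migration.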

Encoding Thresholds

Redis automatically switches encodings when data outgrows compact representations:

| Type | Compact encoding | Condition to stay compact | Full encoding |
| --- | --- | --- | --- |
| Hash | Listpack | ≤ 128 entries AND all values ≤ 64 bytes | Hashtable |
| Sorted Set | Listpack | ≤ 128 entries AND all values ≤ 64 bytes | Skiplist + hashtable |
| Set | Intset | All integers AND ≤ 512 entries | Hashtable |
| List | Quicklist (a linked list of listpacks) | Always | — |

These thresholds are configurable (hash-max-listpack-entries, hash-max-listpack-value, etc.). The defaults are tuned for a balance between memory efficiency and CPU cost of linear scans in compact encodings.
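As a sketch, the hash-encoding decision under the default thresholds looks like this. The function hash_encoding is a name invented here (the real check lives inside Redis); note the length limit applies to field names as well as values.

```python
# Defaults from redis.conf
HASH_MAX_LISTPACK_ENTRIES = 128
HASH_MAX_LISTPACK_VALUE = 64

def hash_encoding(fields: dict) -> str:
    """Which encoding a Redis hash would get under the default
    thresholds: listpack while small, hashtable once any limit breaks."""
    compact = (len(fields) <= HASH_MAX_LISTPACK_ENTRIES and
               all(len(s) <= HASH_MAX_LISTPACK_VALUE
                   for pair in fields.items() for s in pair))
    return "listpack" if compact else "hashtable"

print(hash_encoding({"name": "alice", "age": "30"}))       # listpack
print(hash_encoding({"blob": "x" * 100}))                  # hashtable
print(hash_encoding({f"f{i}": "v" for i in range(200)}))   # hashtable
```

To check a live key rather than predict, run OBJECT ENCODING against it in redis-cli.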

BullMQ: How Data Structures Map

BullMQ stores each job as a Redis Hash (bull:<queue>:<jobId>) with fields like data, opts, progress, returnvalue. For a queue with thousands of small jobs, each job hash likely uses the compact listpack encoding (few fields, small values).

The queue itself is a List (bull:<queue>:wait) — FIFO order via LPUSH/BRPOPLPUSH. Delayed jobs sit in a Sorted Set (bull:<queue>:delayed) scored by their target timestamp, leveraging the skiplist's efficient ZRANGEBYSCORE to find jobs ready to execute.


Memory Model

Jemalloc and Fragmentation

Redis uses jemalloc as its memory allocator. Jemalloc requests large chunks from the OS and subdivides them into size classes for smaller allocations — which is efficient, but creates a fundamental tension.

When Redis creates and deletes keys of varying sizes, the freed slots in jemalloc's arenas can't always be reused for differently-sized allocations. An arena can only be returned to the OS when all allocations within it are freed. If even one small allocation remains, the entire arena stays resident.

Fragmentation ratio:

\[ \text{fragmentation\_ratio} = \frac{\text{used\_memory\_rss}}{\text{used\_memory}} \]

| Ratio | Meaning |
| --- | --- |
| < 1.0 | Redis is swapping to disk — very bad, immediate action needed |
| 1.0 – 1.5 | Normal, healthy |
| 1.5 – 2.0 | Moderate fragmentation — monitor |
| > 2.0 | High fragmentation — consider active defrag |
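Mapped to code, the ratio and its bands read as follows. The helper fragmentation_status is invented here; the two inputs come straight from INFO memory.

```python
def fragmentation_status(used_memory_rss: int, used_memory: int) -> str:
    """Classify mem_fragmentation_ratio = rss / used into the bands."""
    ratio = used_memory_rss / used_memory
    if ratio < 1.0:
        return "swapping"           # RSS below logical usage: paged out
    if ratio <= 1.5:
        return "healthy"
    if ratio <= 2.0:
        return "monitor"
    return "consider active defrag"

print(fragmentation_status(1_200_000, 1_000_000))   # healthy
print(fragmentation_status(2_500_000, 1_000_000))   # consider active defrag
print(fragmentation_status(900_000, 1_000_000))     # swapping
```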

Active defragmentation (Redis 4.0+): Redis can proactively move allocations to consolidate fragmented memory. It scans key-value pairs and reallocates them to contiguous regions. This trades CPU for reduced memory waste — configurable via activedefrag yes with tunable thresholds for when to start and how much CPU to spend.

Eviction Policies

When Redis hits maxmemory, it must evict keys to make room. The choice of eviction policy has significant implications:

| Policy | Scope | Algorithm |
| --- | --- | --- |
| noeviction | — | Return errors on writes (safe but disruptive) |
| allkeys-lru | All keys | Approximated LRU |
| allkeys-lfu | All keys | Approximated LFU |
| allkeys-random | All keys | Random eviction |
| volatile-lru | Keys with TTL only | Approximated LRU |
| volatile-lfu | Keys with TTL only | Approximated LFU |
| volatile-ttl | Keys with TTL only | Shortest TTL first |
| volatile-random | Keys with TTL only | Random eviction |

Redis LRU Is Approximated, Not Exact

Redis does not maintain a true LRU linked list — that would require per-key pointer overhead (16 bytes per key on 64-bit systems). Instead:

  1. Each key stores a 24-bit LRU clock (last access timestamp)
  2. On eviction, Redis samples maxmemory-samples (default 5) random keys
  3. Sampled keys populate a 16-entry eviction pool sorted by idle time
  4. The key with the longest idle time in the pool is evicted

With 5 samples, the approximation is surprisingly close to true LRU. With 10 samples, it's nearly indistinguishable. This is a deliberate engineering trade-off: exact LRU would cost two extra pointers on every key, while the approximation costs only a fixed-size pool, with accuracy tunable via maxmemory-samples.
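The sampling procedure can be simulated in a few lines. This sketch skips the 16-entry eviction pool and uses plain integers in place of the 24-bit clock; pick_victim is an invented name.

```python
import random

def pick_victim(lru_clock: dict, samples: int = 5) -> str:
    """Approximated LRU: sample a few random keys and evict the one
    with the oldest access time."""
    candidates = random.sample(list(lru_clock), samples)
    return min(candidates, key=lru_clock.get)

random.seed(42)
clock = {f"key:{i}": i for i in range(1000)}   # lower = accessed longer ago
victims = [pick_victim(clock) for _ in range(200)]
avg_age = sum(clock[v] for v in victims) / len(victims)
# Taking the oldest of 5 uniform samples puts the expected victim at
# roughly 1/6 of the timestamp range, a strong skew toward cold keys.
```

Raising samples from 5 to 10 pushes the expected victim even closer to the true oldest key, which is the knob maxmemory-samples exposes.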

LFU (Least Frequently Used) — added in Redis 4.0, reuses the same 24 bits differently:

  • 16 bits: decay timestamp (when the counter was last decremented)
  • 8 bits: a Morris counter — a logarithmic probabilistic counter

The Morris counter's increment probability decreases as the count grows: p = 1 / (counter * lfu_log_factor + 1). This means frequently accessed keys quickly reach a high count and stay there, while the decay mechanism periodically reduces counts so old popular keys don't persist forever.
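The counter's behavior is easy to see in simulation. This is a simplified model of the increment rule quoted above (the real code also offsets the counter by an initial value before computing the probability); both function names are invented here.

```python
import random

LFU_LOG_FACTOR = 10   # redis.conf default lfu-log-factor

def lfu_increment(counter: int) -> int:
    """One access: bump the 8-bit counter with probability
    1 / (counter * lfu_log_factor + 1)."""
    if counter >= 255:
        return counter              # 8-bit counter saturates at 255
    p = 1.0 / (counter * LFU_LOG_FACTOR + 1)
    return counter + 1 if random.random() < p else counter

def accesses_to_reach(target: int) -> int:
    """Accesses needed to push a fresh counter up to `target`."""
    counter = hits = 0
    while counter < target:
        counter = lfu_increment(counter)
        hits += 1
    return hits

random.seed(7)
hits_needed = accesses_to_reach(5)   # on the order of 100 accesses, not 5:
                                     # each further increment is rarer
```

Because each increment at count c succeeds only about once per 10c + 1 accesses, hot keys climb quickly at first and then plateau, exactly the behavior an 8-bit frequency counter needs.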


Persistence: What Can You Lose?

Redis is an in-memory store. Without persistence, a crash means total data loss. Redis offers two persistence mechanisms, each with distinct trade-offs.

RDB (Snapshots)

Redis periodically saves the entire dataset to a binary file (.rdb):

  1. Redis calls fork() to create a child process
  2. Parent continues serving clients
  3. Child writes the entire dataset to a temp file
  4. Child atomically replaces the old .rdb file

Copy-on-Write (CoW): after fork(), parent and child share the same memory pages. Only pages modified by the parent (from new writes during the snapshot) are physically copied. The memory overhead is proportional to write volume during the snapshot, not dataset size.

fork() Is Redis's Achilles' Heel

fork() must duplicate the page table — a kernel data structure that maps virtual addresses to physical memory. With a 25GB dataset, this can take 100ms+, during which Redis is completely blocked.

Transparent Huge Pages (THP) make this dramatically worse. THP increases the CoW granularity from 4KB to 2MB pages. A single byte write during a snapshot now copies 2MB instead of 4KB. Redis's official documentation explicitly recommends disabling THP:

echo never > /sys/kernel/mm/transparent_hugepage/enabled

AOF (Append-Only File)

Every write command is appended to a log file. Three fsync policies control the durability-performance trade-off:

| appendfsync | Data loss window | How it works |
| --- | --- | --- |
| always | None | fsync after every write — safest, slowest |
| everysec (default) | ~1 second | fsync once per second in a background thread |
| no | ~30 seconds (OS-dependent) | Let the OS flush when it wants |

AOF Rewrite: the AOF grows indefinitely because it logs every write. Periodically, Redis rewrites it to the minimal set of commands that recreate the current dataset. Redis 7.0+ uses Multi-Part AOF — a base file (in RDB format) plus incremental files plus a manifest — which eliminates the old double-write penalty and memory spike during rewrites.

AOF rewrite also uses fork()

The same CoW memory concerns from RDB apply here. The rewrite happens less frequently, but the fork() latency hit is the same.

What You Can Lose

| Persistence mode | Data loss on crash | Recovery speed | Best for |
| --- | --- | --- | --- |
| None | Everything | N/A (no recovery) | Pure cache |
| RDB only | Minutes (since last snapshot) | Fast (binary load) | Backups, disaster recovery |
| AOF only (everysec) | ~1 second | Slower (replay commands) | Durability-sensitive apps |
| RDB + AOF (recommended) | ~1 second (loads AOF) | Medium | Production — best of both |
| AOF (always) | Nothing (theoretically) | Slowest | Extremely rare — significant perf cost |

Talking to Redis: The Wire Protocol

Before understanding why pipelining and connection pooling matter, you need to understand how clients actually communicate with Redis.

RESP (REdis Serialization Protocol)

Every Redis client — whether it's redis-cli, ioredis, Jedis, or Lettuce — talks to the server over TCP using RESP, a simple text-based protocol designed to be human-readable and machine-parseable.

A client sends a command as a RESP Array of Bulk Strings:

*3\r\n        ← Array of 3 elements
$3\r\n        ← Bulk String, 3 bytes
SET\r\n       ← "SET"
$4\r\n        ← Bulk String, 4 bytes
name\r\n      ← "name"
$5\r\n        ← Bulk String, 5 bytes
alice\r\n     ← "alice"

The server responds with a type-prefixed reply:

| Prefix | Type | Example |
| --- | --- | --- |
| + | Simple String | +OK\r\n |
| - | Error | -ERR unknown command\r\n |
| : | Integer | :1\r\n |
| $ | Bulk String | $5\r\nalice\r\n |
| * | Array | *2\r\n$3\r\nfoo\r\n$3\r\nbar\r\n |

The protocol is synchronous by default: send one command, wait for one reply. This is the bottleneck that pipelining solves.
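Both directions are simple enough to implement in a few lines: a sketch of the request encoding shown above, plus a decoder for the three simplest reply types. Both function names are invented here.

```python
def encode_command(*args: str) -> bytes:
    """Client side: a command is a RESP Array of Bulk Strings."""
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode()
        parts.append(b"$" + str(len(data)).encode() + b"\r\n" + data + b"\r\n")
    return b"".join(parts)

def decode_simple_reply(buf: bytes):
    """Decode the three simplest reply types by their prefix byte."""
    body = buf[1:].split(b"\r\n", 1)[0]
    if buf.startswith(b"+"):
        return body.decode()                  # Simple String
    if buf.startswith(b":"):
        return int(body)                      # Integer
    if buf.startswith(b"-"):
        raise RuntimeError(body.decode())     # Error reply
    raise ValueError("bulk strings and arrays are not handled here")

print(encode_command("SET", "name", "alice"))
# b'*3\r\n$3\r\nSET\r\n$4\r\nname\r\n$5\r\nalice\r\n'
print(decode_simple_reply(b"+OK\r\n"))   # OK
```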

Why This Matters

Understanding RESP explains why Redis is fast at the protocol level: there is no schema negotiation, no handshake overhead per command, no binary encoding/decoding complexity. It's plain text over TCP. The simplicity is the speed.

Connection Pooling

Every Redis command travels over a TCP connection. Creating a new TCP connection involves a three-way handshake (~1 RTT), and if TLS is enabled, a TLS handshake on top (~1-2 more RTTs). Paying that cost per request is wasteful.

graph LR
    subgraph "Without Pool"
        A1["Request 1"] -->|"new TCP conn"| R1["Redis"]
        A2["Request 2"] -->|"new TCP conn"| R1
        A3["Request 3"] -->|"new TCP conn"| R1
    end
graph LR
    subgraph "With Pool"
        A1["Request 1"] --> P["Connection Pool<br/><i>N persistent connections</i>"]
        A2["Request 2"] --> P
        A3["Request 3"] --> P
        P -->|"reuse conn A"| R1["Redis"]
        P -->|"reuse conn B"| R1
    end

How it works:

  1. On startup, the client library creates N persistent TCP connections to Redis
  2. When a thread needs to send a command, it borrows a connection from the pool
  3. After the command completes, the connection is returned to the pool
  4. Connections are kept alive and reused across thousands of requests
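The borrow/return cycle above can be sketched with a thread-safe queue and stand-in connection objects. No real sockets are involved: connect is any factory callable, and the fake connections just echo their command.

```python
import queue

class ConnectionPool:
    """Minimal borrow/return connection pool."""

    def __init__(self, size: int, connect):
        self._idle = queue.Queue()
        for _ in range(size):           # step 1: N persistent connections
            self._idle.put(connect())

    def execute(self, command: str):
        conn = self._idle.get()         # step 2: borrow (blocks if exhausted)
        try:
            return conn(command)        # run the command on a reused conn
        finally:
            self._idle.put(conn)        # step 3: always return to the pool

connections_made = 0
def fake_connect():
    global connections_made
    connections_made += 1               # count the handshakes we'd have paid
    return lambda cmd: f"+OK ({cmd})"

pool = ConnectionPool(size=3, connect=fake_connect)
replies = [pool.execute("PING") for _ in range(100)]
# 100 commands, but only 3 connections were ever created.
```

The try/finally matters: a connection must go back to the pool even when a command fails, or the pool slowly leaks capacity.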

BullMQ: Connection Pooling via ioredis

BullMQ uses ioredis as its Redis client. A BullMQ Worker maintains its own dedicated connection (separate from the Queue producer's connection) because the worker uses BRPOPLPUSH/BLMOVE — blocking commands that monopolize a connection until a job arrives. If the worker shared a connection with non-blocking operations, they would all stall behind the block.

This is why BullMQ requires separate Redis connections for the Queue (producer), Worker (consumer), and QueueEvents (event listener) — each has a fundamentally different connection usage pattern.

Pool Sizing

Too few connections: threads wait for a free connection, adding latency. Too many connections: each connection consumes memory on both client and server (Redis tracks each connection's output buffer, query buffer, and state). A good starting point is number of application threads that concurrently access Redis, plus a small buffer.

Check connected_clients in the INFO clients section to see how many connections your application is actually using.

Pipelining

In the default request-response model, each command pays a full network round-trip (RTT). If RTT is 0.5ms and you need to run 100 commands, that's 50ms of pure network waiting — even though Redis can process those 100 commands in under 1ms.

sequenceDiagram
    participant Client
    participant Redis

    rect rgb(255, 230, 230)
        Note over Client,Redis: Without pipelining (3 × RTT)
        Client->>Redis: SET key1 "a"
        Redis->>Client: +OK
        Client->>Redis: SET key2 "b"
        Redis->>Client: +OK
        Client->>Redis: SET key3 "c"
        Redis->>Client: +OK
    end

    rect rgb(230, 255, 230)
        Note over Client,Redis: With pipelining (1 × RTT)
        Client->>Redis: SET key1 "a"<br/>SET key2 "b"<br/>SET key3 "c"
        Redis->>Client: +OK<br/>+OK<br/>+OK
    end

How it works: The client sends multiple commands without waiting for replies, then reads all replies in one batch. This is possible because RESP is a simple stream protocol — Redis processes commands in order and writes replies in order. The client can match replies to commands by position.

Impact: Pipelining can improve throughput by 5-10x for bulk operations. The improvement is proportional to RTT — the higher the latency between client and server, the bigger the gain.

| Scenario | RTT | 100 commands without pipeline | 100 commands with pipeline |
| --- | --- | --- | --- |
| Same machine | ~0.05ms | ~5ms | ~0.1ms |
| Same datacenter | ~0.5ms | ~50ms | ~1ms |
| Cross-region | ~50ms | ~5,000ms | ~50ms |
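Those numbers fall out of a simple cost model: without pipelining every command pays a full RTT, with pipelining the whole batch pays one RTT plus server processing time. This is back-of-envelope arithmetic, not a benchmark; per_cmd_ms is an assumed ~5µs of server-side work per command.

```python
def total_time_ms(n: int, rtt_ms: float, per_cmd_ms: float = 0.005,
                  pipelined: bool = False) -> float:
    """Rough cost of n commands: per-command RTTs unpipelined,
    one shared RTT plus n processing slices when pipelined."""
    if pipelined:
        return rtt_ms + n * per_cmd_ms
    return n * (rtt_ms + per_cmd_ms)

# Same datacenter (0.5ms RTT), 100 commands:
print(total_time_ms(100, 0.5))                   # ~50ms unpipelined
print(total_time_ms(100, 0.5, pipelined=True))   # ~1ms pipelined
```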

Pipelining Is Not Atomic

Unlike MULTI/EXEC, pipelined commands are not transactional. Other clients' commands can interleave between your pipelined commands. Pipelining is a throughput optimization, not a consistency mechanism. If you need atomicity, use MULTI/EXEC or Lua scripts.

BullMQ: Move and Fetch as Implicit Pipelining

BullMQ's moveToCompleted Lua script is a form of server-side pipelining: it completes the current job and fetches the next job in a single round-trip. Without this optimization, every job completion would cost 2 RTTs (one to complete, one to fetch). Under high throughput, this nearly doubles the effective processing rate.


Production Tuning

A default Redis installation works well for development. Production workloads need deliberate tuning of both Redis configuration and the operating system.

Redis Configuration

| Directive | Default | Production guidance |
| --- | --- | --- |
| maxmemory | 0 (no limit) | Always set this. Without it, Redis will consume all available RAM and the OS will OOM-kill it. Leave ~25% overhead for fork() CoW, replication buffers, and fragmentation. |
| maxclients | 10,000 | Increase if you have many application instances or microservices. Past this limit, Redis rejects new connections with an error. |
| tcp-backlog | 511 | The queue size for incoming connections waiting to be accepted. Under burst traffic, increase this alongside the OS somaxconn and tcp_max_syn_backlog. |
| timeout | 0 (disabled) | Seconds before idle connections are closed. Setting this (e.g., 300) prevents connection leaks from crashed clients. |
| maxmemory-policy | noeviction | Choose an eviction policy (see Eviction Policies above). noeviction returns errors on writes when memory is full — safe but disruptive. |

Operating System Tuning

Transparent Huge Pages (THP)

THP increases CoW page size from 4KB to 2MB, causing massive memory amplification during fork(). Redis will log a warning at startup if THP is enabled:

echo never > /sys/kernel/mm/transparent_hugepage/enabled

Make this persistent across reboots by adding it to /etc/rc.local or a systemd unit.

File descriptor limits: each Redis client connection uses one file descriptor. The default limit (often 1024) is too low for production. Redis will log errors like "Redis can't set maximum open files".

# /etc/security/limits.conf (or systemd override)
redis soft nofile 65536
redis hard nofile 65536

Network stack tuning for high-connection environments:

| Kernel parameter | What it does | Suggested value |
| --- | --- | --- |
| net.core.somaxconn | Max queued connections | 65535 |
| net.ipv4.tcp_max_syn_backlog | SYN queue size | 65535 |
| vm.overcommit_memory | Allow fork() to succeed even if memory looks tight | 1 |

vm.overcommit_memory = 1

Redis's fork() for persistence creates a child process that theoretically could use as much memory as the parent (due to CoW). With the default overcommit setting (0), Linux may refuse the fork if it thinks there isn't enough memory — even though CoW means the child will only use a fraction. Setting overcommit_memory = 1 tells Linux to always allow the fork. Redis logs a warning if this is not set.

CPU affinity and RPS: on multi-core machines, prevent Redis from competing with network interrupt handlers for the same CPU cores:

  1. Enable RPS (Receive Packet Steering) on network interfaces, pinned to cores 0-1
  2. Set Redis's CPU affinity to cores 2+

This ensures the network stack and Redis's event loop don't contend for the same L1/L2 cache.


Observability

You can't fix what you can't measure. A single Redis instance is simple to monitor — and you should monitor it from day one, not after the first incident.

The INFO Command

INFO is the single most important diagnostic command. It returns metrics across multiple sections:

redis-cli INFO [section]

| Section | Key metrics | What to watch |
| --- | --- | --- |
| server | redis_version, uptime_in_seconds | Unexpected restarts (uptime < 300 means something crashed it) |
| clients | connected_clients, blocked_clients | Connection leaks (growing connected_clients); workers stuck on blocking commands (blocked_clients) |
| memory | used_memory, used_memory_rss, mem_fragmentation_ratio | RSS growing faster than used_memory = fragmentation; ratio < 1.0 = swapping |
| stats | keyspace_hits, keyspace_misses, evicted_keys, instantaneous_ops_per_sec | Hit ratio (hits / (hits + misses)), eviction pressure, throughput |
| persistence | rdb_last_save_time, aof_rewrite_in_progress | Stale snapshots; concurrent rewrite during high load |
| replication | connected_slaves, master_link_down_since | Replication broken if master_link_down_since > 0 |
| keyspace | keys, expires, avg_ttl per database | Key count trends, TTL distribution |

Cache hit ratio is the single most important application-level metric:

\[ \text{hit\_ratio} = \frac{\text{keyspace\_hits}}{\text{keyspace\_hits} + \text{keyspace\_misses}} \]

A healthy cache typically has a hit ratio above 0.90 (90%). If it drops, either your TTLs are too short, your cache is too small, or your access patterns have shifted.
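A few lines turn raw INFO stats output into the ratio. The helper hit_ratio is invented here, but keyspace_hits and keyspace_misses are the real field names INFO emits.

```python
def hit_ratio(info_stats: str) -> float:
    """Compute the cache hit ratio from INFO stats text."""
    fields = dict(line.split(":", 1)
                  for line in info_stats.strip().splitlines() if ":" in line)
    hits = int(fields["keyspace_hits"])
    misses = int(fields["keyspace_misses"])
    return hits / (hits + misses)

# A fragment of INFO stats output, as returned over RESP:
sample = """\
keyspace_hits:9500
keyspace_misses:500
"""
print(hit_ratio(sample))   # 0.95
```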

Latency Diagnostics

Quick latency check — measures round-trip time to Redis from the client machine:

redis-cli --latency              # continuous, 1-second intervals
redis-cli --latency-history      # rolling 15-second windows
redis-cli --latency-dist         # visual latency distribution

Latency monitoring framework — tracks latency spikes inside Redis (command processing time, not network):

CONFIG SET latency-monitor-threshold 5    # log events > 5ms
LATENCY LATEST                            # most recent spikes
LATENCY HISTORY command                   # history for a specific event
LATENCY DOCTOR                            # automated diagnosis

| Event type | What it measures |
| --- | --- |
| command | Regular command execution time |
| fast-command | O(1) and O(log n) commands |
| fork | Time spent in fork() for persistence |
| aof-write | AOF write latency |
| expire-cycle | Key expiration processing |
| eviction-cycle | Key eviction processing |

Slow Log

Captures commands whose execution time (excluding I/O) exceeds a threshold:

# redis.conf
slowlog-log-slower-than 10000    # microseconds (10ms)
slowlog-max-len 256              # keep last 256 entries
SLOWLOG LEN                      # number of entries
SLOWLOG GET 10                   # last 10 slow commands
SLOWLOG RESET                    # clear the log

Set the threshold low initially

Start with slowlog-log-slower-than 10000 (10ms). In a healthy Redis instance, almost nothing should take 10ms. If the slow log fills up, you're either running O(n) commands on large keys (KEYS *, SMEMBERS on a huge set) or experiencing fork() latency from persistence.

Finding Problematic Keys

redis-cli --bigkeys              # scan for largest keys by type
redis-cli --memkeys              # scan with memory usage per key
redis-cli --hotkeys              # most frequently accessed keys (requires LFU policy)

These use SCAN internally — safe to run in production (non-blocking, cursor-based).

MONITOR: Use With Extreme Caution

redis-cli MONITOR streams every command processed by Redis in real time. It is invaluable for debugging but can reduce throughput by up to 50%. Never leave it running in production. Use it briefly to diagnose issues, then disconnect.

Alert Thresholds to Start With

| Metric | Condition | Severity |
| --- | --- | --- |
| uptime_in_seconds | < 300 | Critical — Redis restarted unexpectedly |
| connected_clients | < expected minimum | Warning — application connection issue |
| mem_fragmentation_ratio | > 2.0 or < 1.0 | Warning — high fragmentation or swapping |
| evicted_keys (rate) | > 0 when unexpected | Warning — memory pressure causing data loss |
| instantaneous_ops_per_sec | > 80% of benchmark baseline | Warning — approaching throughput limit |
| rdb_last_save_time | > max acceptable interval | Warning — persistence falling behind |
| keyspace_misses / (hits + misses) | > 0.20 | Warning — cache effectiveness degrading |

When to Stay Here

You don't need replication, Sentinel, or Cluster if:

  • [x] Your application runs on a single server (or a small number of servers all in one region)
  • [x] Redis fits comfortably in memory on one machine
  • [x] You can tolerate ~1 second of data loss (AOF everysec)
  • [x] Your throughput is well under 100K ops/sec
  • [x] You don't need high availability (brief downtime on crash/restart is acceptable)
  • [x] You're not doing multi-key atomic operations that span multiple services

The Over-Engineering Trap

Adding replication "just in case" means managing replication lag, monitoring replica health, handling failover (manual or Sentinel), and debugging split-brain scenarios. If your app doesn't need HA yet, that complexity will create more incidents than it prevents.