Redis: From Single Instance to Distributed Scale¶
What Is Redis?¶
Redis (REmote DIctionary Server) is an in-memory data structure store. At its core, it holds all data in RAM and processes every command on a single thread — which makes it fundamentally different from what most people think of as a "database."
Redis vs Traditional Databases¶
A traditional database (MySQL, PostgreSQL, MongoDB) is designed around durability first: data goes to disk, indexes are built for query flexibility, and the system is optimized for complex reads across large datasets. The cost is latency — every operation involves disk I/O, query planning, locking, and transaction coordination.
Redis inverts this priority. Speed first, durability second.
| | Traditional Database | Redis |
|---|---|---|
| Where data lives | Disk (with memory caching) | Memory (with optional disk persistence) |
| Access time | Milliseconds (disk seek + query plan) | Microseconds (direct memory access) |
| Data model | Tables/documents with rich query languages (SQL, MQL) | Key-value with typed data structures (strings, lists, sets, hashes, sorted sets, streams) |
| Concurrency model | Multi-threaded with locks, MVCC, or optimistic concurrency | Single-threaded — commands execute sequentially, no locking needed |
| Durability | Writes are durable by default (WAL, fsync) | Writes are in-memory by default; persistence is opt-in and lossy |
| Query capability | Joins, aggregations, subqueries, indexes on any column | Get by key, operate on data structure (no joins, no ad-hoc queries) |
| Dataset size | Limited by disk (terabytes+) | Limited by RAM (typically gigabytes) |
| Failure cost | Data survives crashes (designed for it) | Data can be lost on crash (depends on persistence config) |
Why Not Just Use a Database for Everything?¶
Because some operations don't need what a database provides — and pay a heavy price for it.
Consider a session store. You write a session on login, read it on every request, and delete it on logout. You don't need joins, transactions, or complex queries. You don't need the data to survive a server reboot (the user just logs in again). What you need is speed — sub-millisecond reads on every single HTTP request.
A database handles this at ~1-5ms per read (network + query plan + disk/cache). Redis handles it at ~0.1ms. At 10,000 requests per second, that's the difference between 10-50 seconds of cumulative latency and 1 second.
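To make the session-store pattern concrete, here is a minimal sketch. `FakeRedis` is a hypothetical in-memory stand-in used only so the example runs without a live server; against real Redis, the equivalent redis-py calls are `setex` and `get`.

```python
import time

# Minimal in-memory stand-in for a Redis client so the sketch runs without
# a live server. With redis-py the equivalent calls are r.setex / r.get.
class FakeRedis:
    def __init__(self):
        self._data = {}  # key -> (value, expires_at)

    def setex(self, key, ttl_seconds, value):
        self._data[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # lazy expiration, as Redis does on access
            return None
        return value

r = FakeRedis()

# Login: store the session under a key with a 30-minute TTL.
r.setex("session:abc123", 1800, "user:1")

# Every request: one key lookup -- no query planning, no joins, no disk.
assert r.get("session:abc123") == "user:1"

# Logout (or TTL expiry): the key simply disappears.
assert r.get("session:missing") is None
```

The whole "schema" is one key per session with a TTL — exactly the shape of workload where a relational database's machinery is pure overhead.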
The same logic applies to:
- Caching: store computed results to avoid re-querying the database
- Rate limiting: count requests per user per time window
- Leaderboards: sorted sets give you rank operations in O(log n)
- Queues: lists with blocking pop give you a job queue with no polling
- Pub/Sub: real-time message fanout without a message broker
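As an illustration of the rate-limiting item above, here is a minimal sketch of the classic `INCR` + `EXPIRE` fixed-window counter. `FakeRedis` and `allow_request` are illustrative stand-ins, not a library API; the point is that the counter resets itself via its TTL.

```python
import time

# In-memory stand-in for INCR/EXPIRE so the sketch runs without a server.
class FakeRedis:
    def __init__(self):
        self._data = {}
        self._expiry = {}

    def incr(self, key):
        if key in self._expiry and time.monotonic() >= self._expiry[key]:
            self._data.pop(key, None)   # lazy expiry of the old window
            self._expiry.pop(key, None)
        self._data[key] = self._data.get(key, 0) + 1
        return self._data[key]

    def expire(self, key, seconds):
        self._expiry[key] = time.monotonic() + seconds

def allow_request(r, user_id, limit=5, window_seconds=60):
    # Classic fixed-window limiter: INCR a per-user counter and attach a
    # TTL on the first hit so the window resets itself.
    key = f"rate:{user_id}"
    count = r.incr(key)
    if count == 1:
        r.expire(key, window_seconds)
    return count <= limit

r = FakeRedis()
results = [allow_request(r, "u1", limit=3) for _ in range(5)]
assert results == [True, True, True, False, False]
```

Against real Redis, `INCR` is atomic, so this works unchanged with many application processes hammering the same counter — no locks required.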
When Redis Is the Wrong Choice¶
Redis is not a replacement for your database. It's a complement:
Redis Is Not a Database Replacement
- Your source of truth should be a database. Redis can lose data on crash (even with persistence, the default `appendfsync everysec` has a ~1 second data loss window).
- If your data doesn't fit in RAM, Redis is not the right tool. A 500GB dataset needs 500GB of RAM — expensive and impractical for most use cases.
- If you need complex queries, Redis can't help. There is no `SELECT * FROM users WHERE age > 30 AND city = 'Tokyo'`. Redis retrieves by key, not by query.
- If you need strong durability guarantees (financial transactions, legal records), Redis's persistence model is too weak. Use a database with WAL and synchronous replication.
The Mental Model¶
Think of Redis as your application's working memory — fast, limited in size, and optimized for data you need right now. Your database is long-term storage — slower, vast, and designed to never lose anything. The two work together:
```mermaid
graph LR
    App["Application"] -->|"fast path<br/><i>~0.1ms</i>"| Redis["Redis<br/><i>working memory</i>"]
    App -->|"slow path<br/><i>~1-5ms</i>"| DB["Database<br/><i>long-term storage</i>"]
    Redis -.->|"cache miss"| DB
    DB -.->|"populate cache"| Redis
```
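The cache-aside flow in this diagram can be sketched in a few lines. Both `cache` and `db` here are plain dictionaries standing in for Redis and the database, and `get_user` is an illustrative helper, not a specific client API.

```python
# Cache-aside read path: try the cache first, fall back to the database on
# a miss, then populate the cache for subsequent readers.
cache = {}                           # stands in for Redis GET/SET
db = {"user:1": {"name": "Alice"}}   # stands in for the slow database

def get_user(user_id):
    key = f"user:{user_id}"
    value = cache.get(key)   # fast path (~0.1ms against real Redis)
    if value is not None:
        return value
    value = db[key]          # slow path (~1-5ms): cache miss hits the DB
    cache[key] = value       # populate the cache for the next reader
    return value

assert get_user(1) == {"name": "Alice"}   # miss: reads DB, fills cache
assert "user:1" in cache
assert get_user(1) == {"name": "Alice"}   # hit: served from the cache
```

In production you would also set a TTL on the cached entry and decide how writes invalidate it — that invalidation question is where most caching bugs live.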
The rest of this guide dives into how Redis achieves this speed, what happens when you outgrow a single instance, and the hard distributed systems problems that emerge at scale.
Why Not Just Use In-Memory Data Structures?¶
Every language has hashmaps, dictionaries, and in-process caches. A Python dict or a Node.js Map lives in the same process, requires zero network calls, and is faster than Redis for raw access time (~50ns vs ~100μs). So why add an external system?
Because in-process memory has fundamental limitations that appear the moment your application grows beyond a single process.
```mermaid
graph TD
    subgraph "In-Process Cache"
        P1["Process 1<br/><i>cache: {user:1 → Alice}</i>"]
        P2["Process 2<br/><i>cache: {user:1 → ???}</i>"]
        P3["Process 3<br/><i>cache: {user:1 → ???}</i>"]
    end
    subgraph "Redis (Shared)"
        R["Redis<br/><i>{user:1 → Alice}</i>"]
        P4["Process 1"] --> R
        P5["Process 2"] --> R
        P6["Process 3"] --> R
    end
```
| Concern | In-Process (`Map`, `dict`) | Redis |
|---|---|---|
| Shared across processes | No — each process has its own copy. Update in one, others are stale. | Yes — single source of truth for all processes, servers, and services. |
| Survives process restart | No — process dies, cache is gone. Cold start on every deploy. | Yes — data persists across restarts (with AOF/RDB). Deploys don't flush your cache. |
| Shared across servers | No — Server A's cache is invisible to Server B. | Yes — any server can read/write the same keys. |
| Memory limit | Bounded by the process's heap. Competes with your application for RAM. Large caches cause GC pressure (Java, Node.js, Go). | Dedicated memory. Doesn't affect application GC. Can be on a separate machine with more RAM. |
| Eviction policies | You build your own (or use a library). | Built-in LRU, LFU, TTL-based eviction with tunable sampling. |
| Atomic operations | Thread-safety is your problem (mutexes, locks, concurrent maps). | Single-threaded — every command is atomic. No race conditions by design. |
| Data structures | Basic (maps, lists, sets). Sorted sets, HyperLogLog, streams — you'd build these yourself. | Native sorted sets, streams, bitmaps, HyperLogLog, pub/sub, geospatial indexes. |
| Expiration | Manual — you track timestamps and purge stale entries. | Built-in TTL per key, with lazy + periodic expiration. |
When In-Process Cache Is the Right Choice¶
In-process caching is not wrong — it's a different tool for a different problem:
- Single-process application that will never scale horizontally — a `Map` is simpler and faster
- Immutable reference data (country codes, config, feature flags) that rarely changes — load once, use forever
- Hot-path microsecond optimization where even Redis's ~100us RTT is too slow — use an in-process L1 cache in front of Redis
BullMQ: Why a HashMap Can't Replace Redis
Consider implementing a job queue with in-process data structures. You'd need:
- A list for the queue (easy)
- Atomic pop-and-push across processes (impossible without shared state)
- Blocking wait for new jobs without polling (requires OS-level primitives)
- Job locking across multiple workers on different machines (requires distributed state)
- Persistence so jobs survive restarts (requires your own serialization/storage)
BullMQ uses Redis because every one of these requirements demands shared, persistent, atomic state that no in-process data structure can provide. The `BRPOPLPUSH` command alone — atomically pop from one list and push to another, blocking until data is available — has no equivalent in a `Map`. (Since Redis 6.2 it is superseded by `BLMOVE`, which does the same job.)
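To see why this primitive matters, here is a single-process simulation of the reliable-queue move that `BRPOPLPUSH` performs. A lock and condition variable fake what Redis provides for free across machines: atomicity of the pop-and-push, and blocking without polling. The class and method names are illustrative, not BullMQ's API.

```python
from collections import deque
from threading import Condition

# Single-process simulation of the reliable-queue pattern: atomically move
# a job from the waiting list to a processing list, blocking until one
# exists. In Redis this works across processes and machines; here a lock
# stands in for that shared, atomic state.
class ReliableQueue:
    def __init__(self):
        self.waiting = deque()
        self.processing = deque()
        self._cond = Condition()

    def push(self, job):
        with self._cond:
            self.waiting.appendleft(job)   # LPUSH onto the waiting list
            self._cond.notify()

    def brpoplpush(self, timeout=None):
        with self._cond:
            while not self.waiting:
                if not self._cond.wait(timeout=timeout):
                    return None            # timed out with no job
            job = self.waiting.pop()       # RPOP from waiting...
            self.processing.appendleft(job)  # ...LPUSH to processing,
            return job                       # as one atomic step

q = ReliableQueue()
q.push("job-1")
assert q.brpoplpush() == "job-1"
assert list(q.processing) == ["job-1"]     # job is tracked, never unowned
assert q.brpoplpush(timeout=0.01) is None  # empty queue times out
```

The processing list is the crash-safety half of the pattern: if a worker dies mid-job, the job is still sitting in its processing list and can be re-queued, rather than vanishing between a pop and a push.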
The Hybrid Approach: L1 + Redis¶
In high-throughput systems, the best architecture often combines both:
```mermaid
graph LR
    App["Application"] -->|"~50ns"| L1["L1: In-Process Cache<br/><i>small, hot data</i>"]
    L1 -->|"miss → ~100μs"| Redis["L2: Redis<br/><i>shared, larger</i>"]
    Redis -->|"miss → ~1-5ms"| DB["L3: Database<br/><i>source of truth</i>"]
```
The in-process L1 cache holds a small set of frequently accessed keys (hundreds, not millions). Redis serves as the shared L2. The database is the durable source of truth. Each tier is 10-100x slower than the one above, but holds progressively more data.
The challenge with L1 is invalidation: when a key changes in Redis, all processes with a stale L1 copy need to know. Redis 6.0+ provides server-assisted client-side caching (`CLIENT TRACKING`) with invalidation messages — Redis tracks which clients cached which keys and pushes invalidation notices when those keys change.
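Here is a minimal sketch of the two-tier read path, using a short L1 TTL to bound staleness — the simplest fallback when server-assisted invalidation isn't available. The L2 dict stands in for Redis, and all names are illustrative.

```python
import time

# Two-tier read path: a tiny in-process L1 dict in front of a shared L2.
# The L2 dict stands in for Redis; a short L1 TTL bounds staleness.
L1_TTL = 5.0
l1 = {}                          # key -> (value, cached_at)
l2 = {"config:theme": "dark"}    # stands in for Redis

def cached_get(key):
    entry = l1.get(key)
    if entry is not None:
        value, cached_at = entry
        if time.monotonic() - cached_at < L1_TTL:
            return value         # L1 hit: no network round-trip at all
    value = l2.get(key)          # L1 miss: go to the shared tier
    l1[key] = (value, time.monotonic())
    return value

assert cached_get("config:theme") == "dark"   # L1 miss, L2 hit, L1 filled
assert "config:theme" in l1
assert cached_get("config:theme") == "dark"   # served from L1
```

With a 5-second TTL, a process can serve at most 5 seconds of stale data after a write — a deliberate trade of freshness for the ~50ns hit path.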
The Journey¶
This guide follows the lifecycle of a growing application. At each stage, we teach the Redis internals that matter, the failure modes that appear, and — critically — why you shouldn't jump ahead. Over-engineering your Redis setup is just as dangerous as under-engineering it.
```mermaid
graph LR
    A["<b>Single Instance</b><br/>Event loop, data structures,<br/>persistence, memory model"] --> B["<b>Growing Pains</b><br/>Replication, background jobs,<br/>Sentinel, cache stampede"]
    B --> C["<b>Large Scale</b><br/>Cluster, CAP trade-offs,<br/>race conditions, Redlock"]
    style A fill:#4CAF50,color:#fff,stroke:#388E3C
    style B fill:#FF9800,color:#fff,stroke:#F57C00
    style C fill:#F44336,color:#fff,stroke:#D32F2F
```
How to Read This Guide
Each stage explains three things:
- What's appropriate at this scale
- What's overkill (and why adding it hurts more than it helps)
- The internals behind why things work — or break
BullMQ is used as a running example throughout, showing which Redis primitives power real-world job queue infrastructure.
- **Single Instance** — A single Redis instance is surprisingly powerful. Learn the event loop, internal data structures, memory model, and persistence trade-offs that make it work.
- **Growing Pains** — Your app is scaling. Replication, background jobs with BullMQ, Sentinel for HA — and the failure modes that come with each.
- **Large Scale** — Distributed Redis. Cluster architecture, the CAP theorem, race conditions, and why hardware clocks break distributed locks.