Understanding Redis Cache Topology

This page covers the decision that shapes every other caching choice you make: whether to run a replicated single-shard Redis or a hash-slot sharded Redis Cluster, and how that topology dictates routing, failure domains, memory limits, and invalidation cost.

Treating Redis as a monolithic key-value store is an architectural mistake at scale. A production deployment is a distributed, fault-tolerant routing fabric where topology fixes your horizontal scaling ceiling, data locality, latency profile, and failure-domain boundaries. The two shapes that matter are a replicated single-shard deployment — one primary, one or more replicas, promoted by Redis Sentinel — and a sharded Redis Cluster that partitions the keyspace across many primaries. Which one you run constrains how the access patterns, eviction policy, and invalidation strategy established in the parent guide to Redis Caching Architecture & Invalidation Fundamentals actually behave under load.

Architectural Trade-offs

Both topologies survive a node failure and both accelerate reads. They diverge on how the keyspace is distributed: a replicated single shard keeps the entire dataset on one primary, with a full copy on each replica, so memory is bounded by one machine; a sharded cluster splits the keyspace so capacity scales horizontally at the cost of routing and colocation complexity. In the table below, write amplification means redundant work under invalidation and repopulation (fan-out DEL traffic and cross-node reads), not the SSD-level term.

Axis	Replicated Single-Shard (Sentinel)	Sharded Redis Cluster
Consistency	Single writer for every key and no cross-slot semantics; replica reads can lag because replication is asynchronous	Per-shard consistency only; multi-key ops need hash-tag colocation
Latency	One hop to the primary (or a replica for reads); no client-side slot resolution	Client resolves `slot = CRC16(key) % 16384` and may follow `MOVED`/`ASK` redirects
Write Amplification	Low — invalidation touches one primary and replicates once	Higher — broadcast invalidation and `SCAN` sweeps fan out across every shard
Operational Complexity	Sentinel quorum, one `maxmemory` ceiling, vertical scaling only	Slot management, per-node `maxmemory`, reshard/rebalance automation, gossip health

Neither column wins outright. The single-shard topology keeps semantics simple and fits datasets that live comfortably in one machine's RAM; the sharded topology is the only way past that memory ceiling but pushes routing, colocation, and rebalancing into your clients and your automation. The rest of this page implements both, then ties the choice to signals you can measure.

Approach A — Replicated Single-Shard Topology

A single primary holds the whole keyspace and asynchronously replicates to one or more replicas. Sentinel processes monitor the primary, agree via quorum that it is down, and promote a replica — clients discover the new address by asking Sentinel rather than hard-coding it. Because every key lives on one primary, there are no hash slots, no cross-slot restrictions, and multi-key operations (MGET, transactions, Lua) work without colocation. The hard limit is memory: the working set must fit on the largest machine you can provision, and you scale reads by adding replicas, not capacity.

Production Implementation (Python 3.10+ / redis-py 5.x, async)

The client below resolves the current primary through Sentinel on every failover, routes reads to replicas, and fails open so a Redis outage degrades latency rather than availability.

import json
import logging
from redis.asyncio.sentinel import Sentinel
from redis.exceptions import ConnectionError, TimeoutError

logger = logging.getLogger(__name__)

class ReplicatedCache:
    def __init__(self, sentinels: list[tuple[str, int]], service: str):
        self.sentinel = Sentinel(
            sentinels, socket_timeout=2.0, decode_responses=True
        )
        # Build each client once. master_for()/slave_for() return a client backed
        # by a Sentinel-aware pool that re-resolves the node address whenever it
        # reconnects, so calling them per request would only leak pools.
        self.primary = self.sentinel.master_for(service, socket_timeout=2.0)
        self.replica = self.sentinel.slave_for(service, socket_timeout=2.0)

    async def get_profile(self, user_id: str) -> dict:
        key = f"usr:profile:{user_id}"
        try:
            cached = await self.replica.get(key)
            # Compare against None so only a genuine miss falls through.
            if cached is not None:
                return json.loads(cached)
        except (ConnectionError, TimeoutError) as e:
            logger.warning("Redis read failed, falling back to DB: %s", e)

        data = await self._fetch_from_primary_db(user_id)
        if data:
            try:
                # setex writes value + TTL atomically so a crash can't strand a
                # never-expiring key on the primary.
                await self.primary.setex(key, 3600, json.dumps(data))
            except (ConnectionError, TimeoutError):
                logger.error("Failed to populate cache for %s", key)
        return data or {}

    async def _fetch_from_primary_db(self, user_id: str) -> dict | None:
        # Placeholder for the real database lookup.
        return {"user_id": user_id, "status": "active", "tier": "premium"}

The whole dataset shares one maxmemory ceiling, so eviction and fragmentation are a single-node concern. Watch used_memory_rss against maxmemory: RSS includes fragmentation overhead, so it can exceed the limit even when used_memory does not. When mem_fragmentation_ratio stays above 1.2, enable activedefrag yes or scale the machine vertically; you cannot relieve the pressure by splitting the keyspace across nodes in this topology.
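The memory check described above can be sketched as a small pure function. This is a minimal illustration, not part of redis-py: memory_report is a hypothetical helper, its input is assumed to be a dict shaped like the output of INFO memory, the 1.2 limit is the threshold quoted above, and the sample numbers are invented.

```python
def memory_report(info: dict, frag_limit: float = 1.2) -> dict:
    """Summarise memory pressure from an INFO-memory-style dict (hypothetical helper)."""
    used = info["used_memory"]
    rss = info["used_memory_rss"]
    maxmem = info.get("maxmemory", 0)  # 0 means no limit is configured
    # Same definition Redis uses for mem_fragmentation_ratio: RSS over used_memory.
    frag = rss / used if used else 0.0
    return {
        "fragmentation_ratio": round(frag, 2),
        "rss_over_maxmemory": round(rss / maxmem, 2) if maxmem else None,
        "defrag_candidate": frag > frag_limit,
    }


# Invented sample values for illustration only.
sample = {
    "used_memory": 6_000_000_000,
    "used_memory_rss": 7_600_000_000,
    "maxmemory": 8_000_000_000,
}
report = memory_report(sample)
print(report)
```

Keeping the calculation separate from the Redis call makes it easy to unit-test and to reuse against whatever client returns the INFO fields.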

Approach B — Sharded Redis Cluster Topology

Redis Cluster partitions the keyspace into exactly 16,384 hash slots. Each primary owns a subset of those slots (usually one or more contiguous ranges), and clients resolve slot-to-node mappings before executing a command. The mapping is deterministic, slot = CRC16(key) % 16384, which removes the single memory ceiling but pushes routing into the client layer. When a slot moves, the cluster answers MOVED (the slot now lives permanently on another node) or ASK (a migration is in progress, so only the next query goes to the target node); redis-py follows both automatically, but custom routing layers must refresh their slot cache on MOVED to avoid a redirect storm during mass topology shifts.
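To make the slot mapping concrete, here is a minimal sketch of the calculation in plain Python. It assumes the CRC16 variant is the XMODEM one (polynomial 0x1021, initial value 0) and applies the hash-tag rule, where only the text between the first { and the next } is hashed when that text is non-empty. The names crc16_xmodem and key_slot are illustrative, not redis-py APIs.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Return the hash slot for a key, honouring {hash tags}."""
    raw = key.encode()
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        # Only a non-empty tag counts; "{}" falls back to hashing the whole key.
        if end != -1 and end > start + 1:
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384


# Keys that share a hash tag land in the same slot, so multi-key
# commands across them stay on one shard.
same_slot = key_slot("usr:{123}:profile") == key_slot("usr:{123}:prefs")
print(same_slot)  # True
```

In practice the client library performs this calculation for you; the sketch is only meant to show why a shared hash tag is enough to colocate related keys.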

Production Implementation (Python 3.10+ / redis-py 5.x, async)

Use the async RedisCluster client from redis-py 5.x. Enable replica read routing and configure retry-with-backoff so transient topology changes do not surface as request failures.

from redis.asyncio.cluster import RedisCluster, ClusterNode
from redis.asyncio.retry import Retry
from redis.backoff import FullJitterBackoff

startup_nodes = [
    ClusterNode("10.0.1.10", 6379),
    ClusterNode("10.0.1.11", 6379),
    ClusterNode("10.0.1.12", 6379),
]

# Full-jitter exponential backoff spreads retries so a topology change
# doesn't trigger a synchronized reconnect stampede.
retry = Retry(FullJitterBackoff(cap=2, base=0.1), retries=3)

rc = RedisCluster(
    startup_nodes=startup_nodes,
    read_from_replicas=True,
    retry=retry,
    cluster_error_retry_attempts=5,
    socket_connect_timeout=2,
    socket_timeout=2,
    decode_responses=True,
)

Setting cluster-require-full-coverage no in redis.conf lets the Redis cluster keep serving reachable slots while marking unreachable ones offline — it shifts the consistency guarantee to the application layer, which must handle a partial keyspace instead of a hard cluster-down.

Per-Node Memory and Eviction Calibration

The defining difference from Approach A is that memory limits are enforced per node, not globally. A 12 GB cluster of three primaries does not give you 12 GB of contiguous free space; each shard independently enforces its own maxmemory, and a misaligned eviction policy on one shard causes cascading misses and uneven latency. For skewed access distributions (session stores, leaderboards, hot config keys), allkeys-lfu usually beats LRU because it tracks frequency rather than recency, so a burst of one-off reads is less likely to push out hot keys; raising maxmemory-samples from the default of 5 to 10 or higher improves eviction accuracy for a small CPU cost. Wrap writes with a memory-aware guard so a hot shard relieves pressure before it runs out of memory:

from redis.asyncio.cluster import RedisCluster

async def safe_cluster_set(client: RedisCluster, key: str, value: str, ttl: int):
    # info() on a cluster returns per-node dicts; target the node that owns
    # this key so the memory reading reflects the right shard.
    node = client.get_node_from_key(key)
    info = await client.info("memory", target_nodes=node)
    max_mem = info.get("maxmemory") or 0  # 0 == unlimited
    usage_ratio = info["used_memory"] / max_mem if max_mem else 0.0

    if usage_ratio > 0.85:
        # Relieve pressure on this shard before writing rather than after.
        random_key = await client.randomkey(target_nodes=node)
        if random_key:
            await client.unlink(random_key)

    await client.setex(key, ttl, value)

Scaling this topology is itself a workflow: automated pipelines should provision a new empty primary, assign it a contiguous slot range, and rebalance load onto it when used_memory/maxmemory exceeds 0.75 across three consecutive polling intervals. The slot handoff is incremental and must not drop keys — the mechanics of that transition are covered in depth in zero-downtime slot migration, and the underlying slot model in Redis Cluster slot allocation basics.

# 1. Add the new node as an empty primary.
redis-cli --cluster add-node 10.0.1.13:6379 10.0.1.10:6379

# 2. Move a contiguous slot range onto it (≈1/4 of a 5461-slot shard).
redis-cli --cluster reshard 10.0.1.10:6379 \
  --cluster-from <source-node-id> \
  --cluster-to <new-node-id> \
  --cluster-slots 1365 \
  --cluster-yes
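The scaling trigger described above (usage above 0.75 for three consecutive polling intervals) can be sketched as a small state machine. This is an illustrative sketch only: ScaleTrigger is a hypothetical class, the caller is assumed to feed it used_memory and maxmemory readings for one node per poll, and the sample readings are invented.

```python
from collections import deque


class ScaleTrigger:
    """Fires once usage stays above a threshold for N consecutive polls."""

    def __init__(self, threshold: float = 0.75, consecutive: int = 3):
        self.threshold = threshold
        # Sliding window holding the last N over-threshold verdicts.
        self.window: deque[bool] = deque(maxlen=consecutive)

    def observe(self, used_memory: int, maxmemory: int) -> bool:
        # maxmemory == 0 means "unlimited", so there is no ratio to evaluate.
        if maxmemory <= 0:
            self.window.clear()
            return False
        self.window.append(used_memory / maxmemory > self.threshold)
        return len(self.window) == self.window.maxlen and all(self.window)


GB = 1024 ** 3
trigger = ScaleTrigger()
readings_gb = [3.1, 3.2, 2.9, 3.3, 3.4, 3.5]  # invented readings on a 4 GB shard
fired = [trigger.observe(int(r * GB), 4 * GB) for r in readings_gb]
print(fired)  # [False, False, False, False, False, True]
```

Requiring consecutive breaches means the single dip to 2.9 GB resets the count, which keeps a brief spike from kicking off a reshard.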

Cross-Node Invalidation Cost

Invalidation is where the sharded topology's write amplification shows up. Broadcasting DEL across every shard adds network overhead and opens races with concurrent reads; relying purely on TTL expiration defers consistency but risks stale reads. The workable middle ground pairs targeted deletes with a short TTL safety net, driven over Pub/Sub so each shard purges only the keys it owns:

async def invalidate_across_shards(client: RedisCluster, pattern: str):
    # Announce the invalidation so per-shard workers can act on their slice.
    await client.publish("cache:invalidation", pattern)

    # UNLINK frees memory in a background thread on the server instead of
    # blocking it the way DEL can; scan_iter walks each shard.
    async for key in client.scan_iter(match=pattern, count=1000):
        await client.unlink(key)

For a full event-driven fan-out across services rather than a single client sweep, route invalidation through Pub/Sub routing for cross-service cache invalidation so every consumer converges on the same keyspace state.

When to Choose Which

Map the topology to signals you can measure, not to preference. Each criterion points at a concrete threshold.

Working-set size. If the dataset fits on one machine's RAM, with headroom for fragmentation and growth, the replicated single shard is simpler and keeps every write on one primary. Once the working set exceeds what one node can hold (or its maxmemory forces constant eviction churn), you need horizontal sharding.
Multi-key operation reliance. Heavy use of transactions, Lua over many keys, or MGET/MSET across unrelated keys argues for the single shard, where everything is colocated. In a sharded cluster the same operations require hash-tag colocation (usr:{123}:profile and usr:{123}:prefs share a slot) and break across slot boundaries.
Read throughput vs. write/capacity growth. Read-heavy load that fits in memory scales cheaply by adding replicas to a single shard. Write and capacity growth that outstrips one node's memory or CPU is the signal to shard.
Invalidation fan-out tolerance. If invalidation is frequent and pattern-based, the single shard purges in one place; a sharded cluster pays a SCAN/broadcast cost on every shard, so weigh that against the capacity you gain.
Operational maturity. A sharded cluster demands slot automation, per-node memory monitoring, and reshard tooling. If the team cannot yet operate that reliably, stay single-shard until capacity genuinely forces the move.

Signal	Lean Single-Shard	Lean Sharded Cluster
Working set	Fits one machine's RAM with headroom	Exceeds one node's memory ceiling
Multi-key ops	Frequent transactions / cross-key Lua	Rare, or already hash-tag colocated
Growth axis	Read throughput (add replicas)	Capacity + write throughput (add shards)
Invalidation	Frequent pattern-based purges	Tolerant of per-shard fan-out cost
Ops burden	Sentinel quorum is enough	Team can automate reshard + per-node alerts

Failure Modes and Diagnostics

Three topology-specific failure modes dominate incident channels. Each has a distinct signature and a targeted diagnosis.

Partial Slot Coverage Loss

When a primary and all of its replicas go down, the slots they owned become unreachable. With cluster-require-full-coverage yes (the default) the whole cluster stops accepting queries, reads as well as writes; with no it keeps serving the covered slots while requests for the lost ones fail. The signature is cluster_state:fail or a non-zero cluster_slots_fail alongside client errors scoped to a key range.

# cluster_state and the ok/fail slot counts reveal a coverage gap immediately.
redis-cli -c CLUSTER INFO | grep -E "cluster_state|cluster_slots_assigned|cluster_slots_ok|cluster_slots_fail"

# Map which node owns the affected range to confirm the blast radius.
redis-cli -c CLUSTER SLOTS

Remediate by restoring or replacing the failed shard and re-checking coverage; if you run cluster-require-full-coverage no, ensure the application degrades gracefully on the missing key range instead of retrying into a wall.

Split-Brain After a Partition

A network partition can leave two primaries believing they own the same slots. When the partition heals, the higher configEpoch wins and the loser's writes are discarded — silent data loss. The signature is divergent CLUSTER NODES views across nodes and mismatched configuration epochs.

# Compare epochs across nodes; a mismatch means gossip hasn't converged.
redis-cli -h <node-a> CLUSTER NODES | awk '{print $1, $3, $7}'
redis-cli -h <node-b> CLUSTER NODES | awk '{print $1, $3, $7}'

Set cluster-node-timeout high enough (e.g. 15000 ms) to avoid needless failovers on transient blips, and keep cluster-migration-barrier 1 so a primary always retains a replica.
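The epoch comparison above can be automated by parsing the CLUSTER NODES output from two nodes and diffing the views. The sketch below assumes the standard field order (id, address, flags, primary id, ping-sent, pong-recv, config-epoch, link-state, then slots); parse_view and diverging_nodes are hypothetical helper names, and the node IDs and addresses in the sample are shortened, invented values.

```python
ROLE_FLAGS = {"master", "slave"}


def parse_view(cluster_nodes_output: str) -> dict[str, dict]:
    """Parse CLUSTER NODES text into {node_id: role, epoch, slots}."""
    view = {}
    for line in cluster_nodes_output.strip().splitlines():
        fields = line.split()
        view[fields[0]] = {
            # Keep only the role; "myself" legitimately differs between views.
            "role": set(fields[2].split(",")) & ROLE_FLAGS,
            "epoch": int(fields[6]),
            # Entries in [brackets] mark in-flight migrations, not ownership.
            "slots": [s for s in fields[8:] if not s.startswith("[")],
        }
    return view


def diverging_nodes(view_a: dict, view_b: dict) -> list[str]:
    """Node IDs that two views describe differently, or that one view lacks."""
    return sorted(
        node_id
        for node_id in view_a.keys() | view_b.keys()
        if view_a.get(node_id) != view_b.get(node_id)
    )


# Invented output: node bbb2 thinks it was promoted, node aaa1 disagrees.
VIEW_A = """
aaa1 10.0.1.10:6379@16379 myself,master - 0 0 1 connected 0-5460
bbb2 10.0.1.11:6379@16379 slave aaa1 0 1700000000000 1 connected
"""
VIEW_B = """
aaa1 10.0.1.10:6379@16379 master - 0 1700000000000 1 connected 0-5460
bbb2 10.0.1.11:6379@16379 myself,master - 0 0 7 connected 0-5460
"""
suspects = diverging_nodes(parse_view(VIEW_A), parse_view(VIEW_B))
print(suspects)  # ['bbb2']
```

A non-empty result right after a partition heals is expected for a short while; it is a persistent difference that suggests gossip has not converged.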

Per-Node Eviction Churn and OOM

Because each shard enforces its own maxmemory, an unbalanced keyspace can push one node into constant eviction while others sit idle. The signature is a rising evicted_keys delta on a single node and a decaying hit ratio for the keys it owns.

# Per-node eviction and memory pressure; run against the suspect shard.
redis-cli -h <hot-node> INFO stats | grep -E "evicted_keys|keyspace_misses"
redis-cli -h <hot-node> INFO memory | grep -E "used_memory_rss|maxmemory|mem_fragmentation_ratio"

Rebalance the slot distribution, switch the shard to allkeys-lfu, or add a primary and reshard to relieve the hotspot — see the eviction tuning in LRU vs LFU eviction policies.
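One way to spot the signature described above is to compare evicted_keys deltas across nodes between two polls. The sketch below is a heuristic under stated assumptions, not an established rule: hot_shards is a hypothetical helper, the factor of 5 over the median delta is an arbitrary illustrative cut-off, and the counters are invented.

```python
from statistics import median


def hot_shards(previous: dict[str, int], current: dict[str, int],
               factor: float = 5.0) -> list[str]:
    """Nodes whose evicted_keys delta is at least `factor` times the median delta."""
    # A restart resets the counter, so clamp negative deltas to zero.
    deltas = {n: max(0, current[n] - previous.get(n, 0)) for n in current}
    if not deltas:
        return []
    # Floor the baseline at 1 so an idle cluster does not flag every eviction.
    baseline = max(median(deltas.values()), 1)
    return sorted(n for n, d in deltas.items() if d >= factor * baseline)


# Invented evicted_keys counters from two consecutive polls.
before = {"10.0.1.10:6379": 100, "10.0.1.11:6379": 50, "10.0.1.12:6379": 70}
after = {"10.0.1.10:6379": 9100, "10.0.1.11:6379": 55, "10.0.1.12:6379": 70}
hot = hot_shards(before, after)
print(hot)  # ['10.0.1.10:6379']
```

Comparing against the median rather than a fixed number keeps the check quiet when every shard is evicting at a similar rate, which points at overall capacity rather than a hotspot.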

Verification

Confirm the topology is healthy with observable signals, not just a successful connection. Instrument per-node memory and slot state, then watch them under load.

# 1. Cluster-wide health: state must be ok and all 16384 slots covered.
redis-cli --cluster check 10.0.1.10:6379

# 2. Slot ownership and node roles — audit that shards are balanced.
redis-cli -c CLUSTER SLOTS
redis-cli -c CLUSTER NODES | grep master | awk '{print $2, $9}' | sort

# 3. Per-node hit ratio — misses should trail hits on every shard.
redis-cli -h <node> INFO stats | grep -E "keyspace_hits|keyspace_misses"

# 4. Per-node memory headroom — used_memory over maxmemory is the scale signal.
redis-cli -h <node> INFO memory | grep -E "used_memory:|maxmemory:|mem_fragmentation_ratio"

For continuous telemetry, scrape per-node metrics into Prometheus and alert on the signals that predict incidents: redis_cluster_slots_fail > 0 (page immediately), used_memory / maxmemory > 0.80 (scale trigger), a keyspace miss ratio above 0.3 (eviction misconfiguration), and cluster_state != ok (topology degradation). Multi-tenant deployments need real failure-domain isolation on top of this — key prefixes alone will not contain a noisy neighbor, so enforce ACLs, network segmentation, and per-tier separation as detailed in Redis security boundaries for multi-tenant applications. When Redis is unreachable entirely, a topology-aware fallback routing strategy keeps the miss path from overrunning the datastore.
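The alert thresholds above can be expressed as a small evaluation function, which is handy for testing the rules before encoding them in an alerting system. This is a sketch under assumptions: evaluate_alerts and the alert names are hypothetical, the input is a flat dict of per-node fields named after the INFO and CLUSTER INFO keys, and the miss ratio is taken to mean misses divided by total lookups.

```python
def evaluate_alerts(sample: dict) -> list[str]:
    """Return the names of the alerts a single per-node sample would trigger."""
    alerts = []
    if sample["cluster_state"] != "ok":
        alerts.append("topology_degraded")
    if sample["cluster_slots_fail"] > 0:
        alerts.append("slots_failed")  # page immediately
    maxmem = sample["maxmemory"]  # 0 means unlimited, so skip the ratio
    if maxmem and sample["used_memory"] / maxmem > 0.80:
        alerts.append("scale_trigger")
    lookups = sample["keyspace_hits"] + sample["keyspace_misses"]
    if lookups and sample["keyspace_misses"] / lookups > 0.3:
        alerts.append("high_miss_ratio")
    return alerts


# Invented sample: healthy topology, but memory and miss ratio are both high.
sample = {
    "cluster_state": "ok",
    "cluster_slots_fail": 0,
    "used_memory": 6_800_000_000,
    "maxmemory": 8_000_000_000,
    "keyspace_hits": 600,
    "keyspace_misses": 400,
}
triggered = evaluate_alerts(sample)
print(triggered)  # ['scale_trigger', 'high_miss_ratio']
```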

Operational Checklist

Topology matched to working-set size — sharded only once one node's memory is the ceiling
cluster-require-full-coverage set deliberately, with app-side handling if no
read_from_replicas=True configured in the Redis cluster client
allkeys-lfu (or a workload-appropriate policy) applied per shard, maxmemory-samples >= 10
Automated scaling triggers at 75% per-node memory utilization
Pub/Sub or SCAN-targeted invalidation replaces broadcast DEL
Prometheus scraping cluster_slots_fail, used_memory, and keyspace_* per node
cluster-node-timeout and cluster-migration-barrier set to survive partitions
Network segmentation and ACLs enforce tenant isolation

A topology-aware deployment turns Redis from a volatile store into a predictable, horizontally scalable routing fabric: aligning client routing, per-node memory policy, and scaling automation with the shape you chose gives you consistent latency, graceful degradation on failure, and capacity expansion without downtime.
