Implementing the Cache-Aside Pattern in Microservices

You run a fleet of stateless Python services that each read from Redis and fall back to a primary datastore, and you need every service to handle a cache miss, hydrate the entry, and invalidate it correctly without sharing memory or leaking connection state across the network. This page walks through a production implementation of cache-aside — where the application, not a middleware layer, owns the read path, the write ordering, and the failure isolation. It builds on the boundary and operational trade-offs covered in Cache-Aside vs Read-Through Patterns; here we focus strictly on the code and the failure surfaces you hit once real traffic arrives.

Prerequisites

Redis 7.x reachable from every service instance, with maxmemory and an explicit eviction policy set (see Configuring LRU Eviction for High-Throughput APIs).
Python 3.10+ with redis-py 5.x (redis.asyncio) and tenacity 8.x for retry policy.
An asyncio service runtime (FastAPI, aiohttp, or a Celery worker with an async bridge).
A primary source of truth (Postgres, DynamoDB, an upstream service) whose writes you can order relative to cache updates.
Agreement on a per-domain TTL policy — decide early using How to Choose Between TTL and Explicit Invalidation.

Step-by-Step Implementation

Step 1 — Establish a bounded connection pool

Create one shared, size-bounded connection pool per process so that concurrent coroutines multiplex over a fixed number of sockets instead of opening a connection per request.

import redis.asyncio as redis

def build_pool(redis_url: str, max_connections: int = 50) -> redis.Redis:
    pool = redis.ConnectionPool.from_url(
        redis_url,
        max_connections=max_connections,
        decode_responses=True,
        socket_timeout=2.0,        # cap read latency so a slow node fails fast
        socket_connect_timeout=1.0,
        retry_on_timeout=True,
    )
    return redis.Redis(connection_pool=pool)

Step 2 — Implement the get-or-hydrate read path

Read from the cache first, and on a miss (or a Redis read error) call the fallback loader, then write the value back with an explicit expiry — never an unbounded key.

import json, logging
from typing import Any, Awaitable, Callable, Optional

logger = logging.getLogger(__name__)

async def get_or_hydrate(
    r: redis.Redis,
    key: str,
    loader: Callable[[], Awaitable[Any]],
    ttl: int = 300,
) -> Optional[Any]:
    try:
        cached = await r.get(key)          # GET on the hot path
        if cached is not None:
            return json.loads(cached)
    except redis.RedisError as e:          # base class: covers ConnectionError and TimeoutError
        logger.warning("cache read failed, serving from primary: %s", e)

    value = await loader()                 # miss → source of truth
    if value is None:
        return None
    try:
        await r.setex(key, ttl, json.dumps(value))   # SETEX = value + expiry atomically
    except Exception:
        logger.exception("cache write failed for %s", key)  # never fail the request on a write
    return value

Step 3 — Enforce write ordering on mutations

On any update, commit to the primary store first and only then invalidate or refresh the cache, so a rolled-back transaction can never leave a value in the cache that the primary store never committed.

async def update_entity(r: redis.Redis, db, entity_id: str, patch: dict) -> None:
    async with db.transaction():           # 1. durable commit first
        await db.update(entity_id, patch)
    # 2. only after commit succeeds do we touch the cache
    await r.unlink(f"entity:{entity_id}")  # UNLINK frees the value on a Redis background thread

Step 4 — Coalesce concurrent misses on a hot key

Guard hydration with a per-key lock and re-check the cache inside the lock, so that when a popular key expires only one coroutine hits the primary while its siblings wait and read the freshly warmed value.

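import asyncio
from contextlib import asynccontextmanager

_locks: dict[str, asyncio.Lock] = {}
_users: dict[str, int] = {}                # coroutines holding or waiting on each lock

@asynccontextmanager
async def coalesce(key: str):
    lock = _locks.setdefault(key, asyncio.Lock())
    _users[key] = _users.get(key, 0) + 1
    try:
        async with lock:
            yield
    finally:
        _users[key] -= 1
        if _users[key] == 0:               # last user out: drop the lock so the dict stays bounded
            del _users[key]
            _locks.pop(key, None)

async def get_coalesced(r, key, loader, ttl=300):
    try:
        cached = await r.get(key)          # fast path: a hit never takes the lock
        if cached is not None:
            return json.loads(cached)
    except redis.RedisError:
        pass                               # get_or_hydrate below logs and falls back
    async with coalesce(key):
        # get_or_hydrate re-checks the cache first: a sibling may have filled it
        return await get_or_hydrate(r, key, loader, ttl)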

Step 5 — Wrap Redis calls in bounded retry with jitter

Retry only transient, recoverable errors using capped exponential backoff with jitter, and reraise everything else immediately so blind retries never amplify a partial outage into a thundering herd.

from tenacity import (
    retry, stop_after_attempt, wait_exponential_jitter, retry_if_exception,
)
import redis.exceptions as rex

def _retryable(err: Exception) -> bool:
    return isinstance(err, (rex.ConnectionError, rex.TimeoutError, rex.BusyLoadingError))

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential_jitter(initial=0.1, max=2.0, jitter=0.1),
    retry=retry_if_exception(_retryable),   # predicate form takes a callable
    reraise=True,
)
async def safe_get(r: redis.Redis, key: str):
    return await r.get(key)
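To make the retry schedule concrete, here is a dependency-free sketch of roughly the same policy: capped exponential backoff with additive jitter, a fixed attempt budget, and an immediate re-raise for anything the predicate rejects. It is an illustration, not tenacity's implementation. retry_transient and its parameters are hypothetical names, and the sleep and rng arguments exist only so the schedule can be observed without waiting.

```python
import asyncio
import random

async def retry_transient(op, is_transient, attempts=3, initial=0.1, cap=2.0,
                          jitter=0.1, sleep=asyncio.sleep, rng=random):
    """Call op() up to `attempts` times with capped exponential backoff plus jitter."""
    for attempt in range(1, attempts + 1):
        try:
            return await op()
        except Exception as err:
            if attempt == attempts or not is_transient(err):
                raise                      # budget spent, or not worth retrying
            # 0.1s, 0.2s, 0.4s ... plus up to `jitter` seconds, never above `cap`
            delay = min(cap, initial * 2 ** (attempt - 1) + rng.uniform(0, jitter))
            await sleep(delay)
```

With three attempts there are at most two waits, so under these assumed defaults the worst-case added latency stays well under a second.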

Step 6 — Gate deployments on cache behavior in CI

Add a pipeline stage that runs a synthetic load test against staging and fails the build when the hit ratio or tail latency regresses past your SLO.

- name: Cache performance gate
  run: |
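    # assumes cache_load_test.js defines a custom Rate metric named cache_hit_ratio
    k6 run --summary-export=cache_metrics.json \
      -e TARGET_URL=${{ secrets.API_STAGING_URL }} cache_load_test.js
    python3 - <<'EOF'
    import json, sys
    # --summary-export writes one JSON document with aggregates under "metrics";
    # --out json would stream one data point per line, which json.load cannot parse
    m = json.load(open("cache_metrics.json"))["metrics"]
    hit = m.get("cache_hit_ratio", {}).get("value", 0)
    p95 = m.get("http_req_duration", {}).get("p(95)", 0)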
    if hit < 0.85:  print(f"FAIL hit ratio {hit:.2%} < 85%"); sys.exit(1)
    if p95 > 150:   print(f"FAIL p95 {p95}ms > 150ms"); sys.exit(1)
    print("PASS cache within SLO")
    EOF

Critical Path

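The per-key lock from Step 4 collapses a burst of concurrent misses into a single primary-store call. The first coroutine to miss takes the lock, loads from the primary, and writes the entry. Each sibling waits on the lock, re-checks the cache once it acquires it, and returns the freshly warmed value without touching the primary.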
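As a rough illustration, the sketch below replays a burst of ten concurrent misses against an in-memory stand-in for Redis and counts how many reach the loader. FakeCache, slow_loader, read_with_lock, and simulate_burst are hypothetical names for this demo only, and a single asyncio.Lock stands in for the per-key lock registry.

```python
import asyncio
import json

class FakeCache:
    """In-memory stand-in for Redis (assumption: only get/setex are needed)."""
    def __init__(self):
        self.data = {}

    async def get(self, key):
        await asyncio.sleep(0)             # yield control, like a network round trip
        return self.data.get(key)

    async def setex(self, key, ttl, value):
        await asyncio.sleep(0)
        self.data[key] = value             # the TTL is ignored in this demo

async def slow_loader():
    await asyncio.sleep(0.01)              # pretend the primary store is slow
    return {"id": 42, "name": "widget"}

async def read_with_lock(cache, lock, key, loader, counter):
    cached = await cache.get(key)          # fast path: no lock on a hit
    if cached is not None:
        return json.loads(cached)
    async with lock:
        cached = await cache.get(key)      # re-check: a sibling may have filled it
        if cached is not None:
            return json.loads(cached)
        counter["loads"] += 1              # only the first coroutine gets here
        value = await loader()
        await cache.setex(key, 300, json.dumps(value))
        return value

async def simulate_burst(n: int = 10):
    cache, lock, counter = FakeCache(), asyncio.Lock(), {"loads": 0}
    results = await asyncio.gather(
        *(read_with_lock(cache, lock, "entity:42", slow_loader, counter) for _ in range(n))
    )
    return results, counter["loads"]
```

Running asyncio.run(simulate_burst(10)) returns ten identical results while the load counter stays at one. Without the re-check inside the lock, every waiter would call the loader in turn.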

Failure Modes

Cache stampede on a hot key

When a popular key expires under load, every concurrent request misses at once and stampedes the primary store. Diagnose it by watching latency and miss counters spike together during cold starts:

redis-cli --latency-history -h <redis-host> -p 6379
redis-cli INFO stats | grep -E "keyspace_hits|keyspace_misses|evicted_keys"

Fix it with the Step 4 coalescing wrapper, and add jittered TTLs so a batch of keys written together does not expire on the same tick.
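One way to add that jitter is a small helper that spreads each TTL around its base value. This is a minimal sketch: jittered_ttl and the 10% default spread are assumptions, not values from a benchmark.

```python
import random
from typing import Optional

def jittered_ttl(base_ttl: int, spread: float = 0.1,
                 rng: Optional[random.Random] = None) -> int:
    """Return base_ttl shifted by up to +/- spread (a fraction), never below 1 second."""
    if not 0 <= spread < 1:
        raise ValueError("spread must be in [0, 1)")
    source = rng if rng is not None else random
    offset = source.uniform(-spread, spread) * base_ttl
    return max(1, round(base_ttl + offset))
```

In the Step 2 write-back this would replace the fixed expiry, for example r.setex(key, jittered_ttl(ttl), json.dumps(value)), so keys hydrated in the same burst expire across a window instead of on one tick.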

Partial-write inconsistency

Writing the cache before the database commits leaves a value the primary store never accepted advertised as truth if the transaction rolls back. Detect it by comparing a suspect key against the source of truth:

redis-cli GET entity:42
# compare against the primary row for id=42

Fix it by enforcing the Step 3 order (commit, then UNLINK), and for cross-service fan-out route the purge through Redis Pub/Sub so dependent caches drop the key too.

Connection pool exhaustion

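Under sustained load an undersized pool surfaces as redis.exceptions.ConnectionError: Too many connections with the ConnectionPool from Step 1 (a BlockingConnectionPool instead waits for a free connection and reports No connection available. once its timeout elapses). Inspect live client usage and pool internals: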

redis-cli INFO clients | grep connected_clients

p = r.connection_pool
# _in_use_connections / _available_connections are private redis-py attributes: debug use only
print("in_use", len(p._in_use_connections), "idle", len(p._available_connections))

Fix it by sizing max_connections near expected_rps * avg_latency_s * 1.5 and keeping the socket_timeout from Step 1 so a stalled node returns connections to the pool instead of pinning them. When Redis itself is unreachable, degrade through a defined fallback route for cache misses rather than failing the request.
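The sizing rule is Little's law with headroom: connections in flight equal arrival rate times the time each connection is held. A worked sketch follows. suggested_pool_size is a hypothetical helper and the traffic numbers are illustrative, not measurements.

```python
import math

def suggested_pool_size(expected_rps: float, avg_latency_s: float,
                        headroom: float = 1.5) -> int:
    """Estimate max_connections as rate x hold time x headroom, rounded up."""
    if expected_rps < 0 or avg_latency_s < 0:
        raise ValueError("rate and latency must be non-negative")
    in_flight = expected_rps * avg_latency_s * headroom
    # round() first so float noise such as 30.000000000000004 does not bump the ceiling
    return max(1, math.ceil(round(in_flight, 6)))

# 4,000 Redis calls/s per process at 5 ms each: 20 in flight on average, 30 with headroom
print(suggested_pool_size(4000, 0.005))
```

Use the per-process call rate, not the fleet-wide rate, since each process owns its own pool.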

Verification

Confirm the read path warms correctly and expires on schedule:

# 1. Cold miss then warm hit — key should exist with a bounded TTL
redis-cli TTL entity:42        # expect a value between 1 and 300, never -1

# 2. Hit ratio should climb under repeated reads
redis-cli INFO stats | grep -E "keyspace_hits|keyspace_misses"

# 3. Eviction policy is actually enforced
redis-cli CONFIG GET maxmemory-policy   # expect allkeys-lru or volatile-ttl

A TTL of -1 means an entry was written without expiry — grep your code for a bare SET/set() that should be SETEX/setex(). A keyspace_misses rate that stays high after warmup points at a key-naming mismatch between the read and write paths, not a capacity problem.

FAQ

Should cache-aside services write to Redis synchronously in the request path?

Write with setex after computing the value, but never let a cache-write failure fail the user request — wrap it in a try/except as in Step 2. The cache is an optimization; the primary store is the source of truth.

Why UNLINK instead of DEL for invalidation?

UNLINK removes the key from the keyspace immediately but reclaims its memory on a background thread inside Redis, so deleting a large value or collection does not stall the server's main thread and every other client queued behind it. Use DEL only for tiny keys where the reclaim cost is negligible.

Does request coalescing need a distributed lock across service instances?

The in-process asyncio.Lock in Step 4 only serializes coroutines within one process. If dozens of pods can independently stampede the same key, promote coalescing to a short-lived distributed lock or add probabilistic early expiration so instances refresh at staggered times rather than all at once.
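Probabilistic early expiration can be sketched as a per-read coin flip whose odds rise as the entry nears expiry. The version below follows the commonly used form in which the early-refresh window scales with recompute cost. It assumes you store the expiry timestamp and the last recompute duration alongside the value. should_refresh_early and its parameters are hypothetical names.

```python
import math
import random
import time
from typing import Optional

def should_refresh_early(expires_at: float, recompute_seconds: float,
                         beta: float = 1.0, now: Optional[float] = None,
                         rng: Optional[random.Random] = None) -> bool:
    """Return True when this reader should refresh the entry before it expires.

    The closer `now` is to `expires_at`, and the costlier the recompute,
    the more likely a reader volunteers, so instances refresh at staggered times.
    """
    current = time.time() if now is None else now
    source = rng if rng is not None else random
    # 1.0 - random() lies in (0, 1], so log() is defined and never positive
    early_by = -recompute_seconds * beta * math.log(1.0 - source.random())
    return current + early_by >= expires_at
```

A beta above 1.0 makes readers refresh earlier, and a beta below 1.0 makes them wait closer to expiry. A reader that gets True reloads from the primary and rewrites the key while the others keep serving the cached value.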

How do I keep hydration logic consistent across many microservices?

Publish the get_or_hydrate, coalescing, and retry helpers as a shared internal package so every service uses identical TTL defaults, key naming, and failure handling. Divergent per-service copies are the most common source of subtle coherence bugs.

What hit ratio should the CI gate enforce?

Set the threshold from your own steady-state baseline, not a universal number — 85% is a reasonable floor for read-heavy APIs, but a low-reuse workload may legitimately sit lower. Gate on a regression from your measured baseline plus a tail-latency SLO, so the check catches real drift rather than an arbitrary target.
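A baseline-relative check might look like the sketch below. check_gate, the 3-point allowed drop, and the 150 ms SLO are assumptions to adapt. The baseline itself would come from your own steady-state measurements.

```python
def check_gate(hit_ratio: float, p95_ms: float, baseline_hit: float,
               max_hit_drop: float = 0.03, p95_slo_ms: float = 150.0) -> list[str]:
    """Return failure messages; an empty list means the gate passes."""
    failures = []
    floor = baseline_hit - max_hit_drop
    if hit_ratio < floor:
        failures.append(
            f"hit ratio {hit_ratio:.2%} fell below baseline {baseline_hit:.2%} "
            f"minus allowed drop {max_hit_drop:.2%}"
        )
    if p95_ms > p95_slo_ms:
        failures.append(f"p95 {p95_ms:.0f}ms exceeds SLO {p95_slo_ms:.0f}ms")
    return failures
```

A low-reuse service with a 64% baseline passes at 62% under this check, while a read-heavy service that drifts from 92% to 85% fails even though it still clears a fixed 85% floor.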
