Configuring LRU Eviction for High-Throughput APIs

Your API tier sits behind a Redis cache that is permanently full: the working set is larger than maxmemory, so every write that lands a new key forces the eviction of an old one. Under a sustained write burst (a deploy that warms thousands of fragments, a crawler hammering cold endpoints), an untuned LRU configuration starts dropping hot keys, the cache hit ratio collapses, and the miss traffic slams your primary database. Worse, eviction runs on the same event loop that serves GET and SET, so a badly tuned reclaim cycle shows up directly as p99 latency. This page tunes approximate LRU eviction for exactly this scenario: keep the genuinely recent keys resident, keep reclaim cheap and off the critical path, and fail safe on the client when a miss does happen. It is the hands-on companion to the recency-versus-frequency decision covered in the parent guide, and it assumes you have already decided that LRU, not LFU, is the right signal for your access pattern.

Prerequisites

Redis 6.0+ (7.x recommended for improved activedefrag and lazyfree defaults).
A cache whose working set genuinely exceeds maxmemory — if you never hit the ceiling, eviction never runs and none of this applies.
redis-py 5.x and Python 3.10+ for the client-side resilience step (this page uses the synchronous client, which is the common shape for WSGI API workers; asyncio-based ASGI services would use redis.asyncio instead).
Ability to set parameters via CONFIG SET at runtime and persist them in redis.conf.
A load-generation tool (k6, Locust, or redis-benchmark) to reproduce write amplification before you ship.

Step-by-Step Implementation

Each step is independently runnable. Apply them with CONFIG SET first to validate against live traffic, then persist the surviving values to redis.conf so they outlast a restart.

1. Pin maxmemory and select the correct LRU variant. Choose allkeys-lru when every key is a disposable cache entry, or volatile-lru when the instance also holds keys that must never be evicted and you mark the evictable ones with a TTL.

# allkeys-lru: any key is a candidate (pure cache).
# volatile-lru: only keys with an expiry set are candidates.
redis-cli CONFIG SET maxmemory 8gb
redis-cli CONFIG SET maxmemory-policy allkeys-lru

2. Raise the sampling depth so eviction picks a genuinely cold victim. Redis approximates LRU by sampling a handful of keys and evicting the least-recently-used among them; with the default maxmemory-samples 5, the coldest key in such a small sample is often still fairly warm, so raise it to 10 (or 15 for latency-sensitive read-heavy tiers, at the cost of more CPU per eviction).

redis-cli CONFIG SET maxmemory-samples 10
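
To see why sample depth matters, here is a minimal simulation sketch of sample-and-evict. It is a simplification, not Redis's actual implementation (which also keeps a pool of candidates between evictions), and the function names and the 10,000-key population are illustrative assumptions. Keys are numbered by recency, so the lower the victim's rank, the colder the key that was evicted.

```python
import random


def sampled_victim_rank(num_keys: int, samples: int, rng: random.Random) -> float:
    """Sample `samples` random keys and evict the least recently used of them.

    Keys are numbered by recency: key 0 is the coldest, key num_keys - 1 the hottest.
    Returns the victim's recency rank as a fraction (0.0 means the true LRU key).
    """
    candidates = rng.sample(range(num_keys), samples)
    victim = min(candidates)  # lowest number == longest idle
    return victim / num_keys


def mean_victim_rank(samples: int, trials: int = 2000,
                     num_keys: int = 10_000, seed: int = 42) -> float:
    """Average victim rank over many independent evictions."""
    rng = random.Random(seed)
    total = sum(sampled_victim_rank(num_keys, samples, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    for samples in (5, 10, 15):
        print(f"samples={samples:2d}  mean victim rank={mean_victim_rank(samples):.3f}")
```

In this model the expected victim sits roughly 1/(samples + 1) of the way up the recency ordering, so doubling the sample depth roughly halves how warm the typical victim is.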

3. Offload large-object deallocation to a background thread. Freeing a big value inline stalls the event loop; lazyfree-lazy-eviction moves that free off the critical path so eviction of a multi-megabyte value no longer spikes p99.

redis-cli CONFIG SET lazyfree-lazy-eviction yes

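4. Enable active defragmentation to stop phantom memory pressure. Under high write/evict churn the jemalloc allocator holds on to partially used pages, so used_memory_rss (what the OS sees) grows well beyond used_memory (what Redis has allocated). Eviction is driven by used_memory, so fragmentation does not evict keys by itself, but it pushes the process's real footprint past the maxmemory you budgeted for and invites swapping or the OOM killer; active defrag compacts those pages continuously. It requires a Redis build that uses jemalloc.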

redis-cli CONFIG SET activedefrag yes
redis-cli CONFIG SET active-defrag-threshold-lower 10
redis-cli CONFIG SET active-defrag-cycle-min 1

5. Persist the surviving values. Once the settings hold under load, write them into redis.conf so a restart does not silently revert to defaults.

maxmemory 8gb
maxmemory-policy allkeys-lru
maxmemory-samples 10
lazyfree-lazy-eviction yes
activedefrag yes
active-defrag-threshold-lower 10
active-defrag-cycle-min 1

6. Make the API client survive the misses eviction will still cause. No eviction policy is perfect, so the backend must absorb a miss surge without cascading; wrap reads in retry-with-backoff and a source-of-truth fallback so an evicted key becomes one extra database read, not a request failure.

import json
from redis import Redis
from redis.retry import Retry
from redis.backoff import ExponentialBackoff
from redis.exceptions import ConnectionError, TimeoutError

# One shared, pooled client per process. Short timeouts fail fast so a
# stalled reclaim cycle never becomes a hung request thread.
client = Redis(
    host="cache-primary.internal",
    port=6379,
    retry=Retry(ExponentialBackoff(), retries=3),
    retry_on_error=[ConnectionError, TimeoutError],
    socket_timeout=0.5,
    socket_connect_timeout=0.5,
    decode_responses=True,
)

def get_with_fallback(key: str, db_fetch_fn, ttl: int = 300):
    """Return the value for key, reading the source of truth on a cache miss."""
    cached = client.get(key)       # cache hit fast-path
    if cached is not None:
        return json.loads(cached)  # decode so hits and misses return the same type
    val = db_fetch_fn()            # miss: read source of truth
    if val is not None:
        # Always JSON-encode so the hit path can decode unambiguously.
        client.setex(key, ttl, json.dumps(val))  # repopulate with a bounded TTL
    return val

Pair this client with a bounded connection pool and a circuit breaker so an eviction storm cannot exhaust worker threads. If you need the miss path to degrade to a stale copy or a secondary store, layer in graceful fallback routing around this call.
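
As a rough illustration of the circuit-breaker idea, the sketch below uses only the standard library. The CircuitBreaker class, the guarded_read helper, and the thresholds are hypothetical names and values chosen for this example, not part of redis-py; in real code the except clause would catch redis.exceptions.RedisError rather than Exception.

```python
import time


class CircuitBreaker:
    """Open after `max_failures` consecutive failures; let calls through
    again once `reset_after` seconds have passed since the circuit opened."""

    def __init__(self, max_failures: int = 5, reset_after: float = 10.0,
                 clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        # Half-open: permit trial calls once the cool-down has elapsed.
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()  # (re)start the cool-down


def guarded_read(breaker: CircuitBreaker, cache_get, db_fetch):
    """Use the cache only while the breaker allows it; otherwise read the database."""
    if breaker.allow():
        try:
            value = cache_get()
            breaker.record_success()
            if value is not None:
                return value
        except Exception:  # assumption: narrow this to RedisError in real code
            breaker.record_failure()
    return db_fetch()
```

While the circuit is open, workers skip the cache call entirely instead of each waiting out a socket timeout, which is what keeps a struggling cache from tying up every thread.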

7. Gate deploys on eviction health. Reproduce write amplification in CI and refuse to promote a build whose fragmentation or eviction-to-ops ratio breaches budget.

#!/usr/bin/env bash
set -euo pipefail
REDIS_HOST="${REDIS_HOST:-127.0.0.1}"
WINDOW="${WINDOW:-10}"   # seconds between the two counter samples

# Print one field from an INFO section, e.g. info_field stats evicted_keys.
info_field() {
  redis-cli -h "$REDIS_HOST" INFO "$1" | awk -F: -v k="$2" '$1 == k {print $2}' | tr -d '\r '
}

FRAG=$(info_field memory mem_fragmentation_ratio)
if (( $(echo "$FRAG > 1.5" | bc -l) )); then
  echo "FAIL: memory fragmentation exceeds threshold. Aborting deploy."
  exit 1
fi

# evicted_keys and total_commands_processed are cumulative counters in the
# Stats section (not Memory), so compare their growth over the same window.
EVICTED_START=$(info_field stats evicted_keys)
CMDS_START=$(info_field stats total_commands_processed)
sleep "$WINDOW"
EVICTED=$(( $(info_field stats evicted_keys) - EVICTED_START ))
CMDS=$(( $(info_field stats total_commands_processed) - CMDS_START ))

if [ "$CMDS" -gt 0 ]; then
  EVICT_RATE=$(echo "scale=3; $EVICTED / $CMDS" | bc)
  if (( $(echo "$EVICT_RATE > 0.05" | bc -l) )); then
    echo "FAIL: eviction-to-ops ratio exceeds 5%. Review maxmemory or sampling depth."
    exit 1
  fi
fi

echo "PASS: cache health metrics within acceptable bounds."

Failure Modes

Hot keys evicted during a write burst (sampling too shallow). A traffic spike floods the sample pool with brand-new keys, and with maxmemory-samples 5 the algorithm evicts long-lived hot entries by mistake. Diagnose by watching evicted_keys climb while the hit ratio drops:

redis-cli INFO stats | grep -E "evicted_keys:|keyspace_hits:|keyspace_misses:"

Fix by raising maxmemory-samples to 10–15; if the working set is fundamentally frequency-skewed rather than recency-skewed, reconsider LFU instead of LRU.

Latency spikes from synchronous deallocation. Evicting a large value frees it inline on the event loop, blocking every other command for the duration. It surfaces as periodic p99 spikes with no network cause:

redis-cli SLOWLOG GET 10
redis-cli --latency-history

Fix by confirming lazyfree-lazy-eviction yes is active (CONFIG GET lazyfree-lazy-eviction) so large frees move to a background thread.

Memory pressure from allocator fragmentation. used_memory_rss reports far more than used_memory, so the process occupies much more RAM than maxmemory budgets for and the host risks swapping or an OOM kill even though Redis itself is within its limit. A mem_fragmentation_ratio above 1.5 is the tell:

redis-cli INFO memory | grep -E "used_memory:|used_memory_rss:|mem_fragmentation_ratio:"

Fix by enabling activedefrag yes for continuous reclaim, or run a one-off MEMORY PURGE to release freed pages immediately.
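
If you want the same check in application code, the sketch below parses raw INFO text and recomputes the ratio. mem_fragmentation_ratio is the ratio of used_memory_rss to used_memory; the helper names and the sample numbers are made up for illustration and do not come from a real instance.

```python
def parse_info(raw: str) -> dict[str, str]:
    """Parse the text returned by INFO into a field -> value mapping."""
    fields = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and section headers
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def fragmentation_ratio(info: dict[str, str]) -> float:
    """used_memory_rss / used_memory, the quantity INFO reports as mem_fragmentation_ratio."""
    return int(info["used_memory_rss"]) / int(info["used_memory"])


# Made-up sample in the shape of `INFO memory` output; not from a real instance.
SAMPLE = """# Memory
used_memory:4000000000
used_memory_rss:6400000000
"""

if __name__ == "__main__":
    ratio = fragmentation_ratio(parse_info(SAMPLE))
    print(f"fragmentation ratio: {ratio:.2f}", "BREACH" if ratio > 1.5 else "ok")
```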

Verification

Confirm the policy and tunables actually took effect:

redis-cli CONFIG GET maxmemory-policy
redis-cli CONFIG GET maxmemory-samples
redis-cli CONFIG GET lazyfree-lazy-eviction
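
The same check can be automated. The sketch below compares a settings dictionary against the values chosen on this page; EXPECTED and config_drift are hypothetical names for this example, and the input is assumed to be a plain name-to-string mapping such as the one redis-py's config_get returns with decode_responses=True.

```python
EXPECTED = {
    "maxmemory-policy": "allkeys-lru",
    "maxmemory-samples": "10",
    "lazyfree-lazy-eviction": "yes",
}


def config_drift(actual: dict, expected: dict = EXPECTED) -> dict:
    """Return {name: (expected, actual)} for every setting that does not match.

    A setting missing from `actual` is reported with None as its actual value.
    """
    return {
        name: (want, actual.get(name))
        for name, want in expected.items()
        if str(actual.get(name)) != want
    }


if __name__ == "__main__":
    # Illustrative input: the policy was never changed from noeviction.
    live = {"maxmemory-policy": "noeviction", "maxmemory-samples": "10"}
    for name, (want, got) in config_drift(live).items():
        print(f"{name}: expected {want!r}, found {got!r}")
```

An empty result means every tunable matches; anything else is a setting that silently reverted or was never applied.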

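Drive a write burst and watch throughput and memory in real time with --stat, sampling evicted_keys from INFO stats alongside it (--stat itself does not show evictions). A healthy tier keeps evicted_keys growth well under 5% of ops/sec:

redis-cli --stat
redis-cli INFO stats | grep -E "evicted_keys:|instantaneous_ops_per_sec:"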

Assert the hit ratio holds after the burst; hits / (hits + misses), computed from the two counters below, should stay close to its pre-burst baseline:

redis-cli INFO stats | awk -F: '/keyspace_hits|keyspace_misses/ {print}'
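
Because keyspace_hits and keyspace_misses are cumulative counters, the ratio for the burst alone comes from the difference between two snapshots. A minimal sketch, assuming each snapshot is a dictionary with integer counters (redis-py's info("stats") returns that shape); hit_ratio is a hypothetical helper name.

```python
from typing import Optional


def hit_ratio(before: dict, after: dict) -> Optional[float]:
    """Hit ratio for the lookups that happened between two INFO stats snapshots.

    Returns None when no lookups occurred in the window.
    """
    hits = after["keyspace_hits"] - before["keyspace_hits"]
    misses = after["keyspace_misses"] - before["keyspace_misses"]
    lookups = hits + misses
    return hits / lookups if lookups else None


if __name__ == "__main__":
    # Illustrative counters, not measurements from a real instance.
    before = {"keyspace_hits": 1000, "keyspace_misses": 100}
    after = {"keyspace_hits": 1900, "keyspace_misses": 200}
    print(hit_ratio(before, after))
```

Take one snapshot before the burst and one after, then compare the result with the ratio from an equally long quiet window.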

FAQ

Should I use allkeys-lru or volatile-lru? Use allkeys-lru when the instance is a pure cache and every key is safe to drop. Use volatile-lru only when the same instance also stores keys that must never be evicted — then set an expiry on the disposable keys so they, and only they, are eviction candidates. On a mixed instance with no TTLs set, volatile-lru has nothing to evict and errors on write once full.

Does raising maxmemory-samples hurt performance? Barely. Going from 5 to 10 roughly doubles the sampling work per eviction, but that work is trivial next to the accuracy gain, and on modern cores the CPU cost is negligible. The far larger performance risk is a low sample count evicting hot keys and multiplying database load.

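Why is my used_memory higher than the sum of my values? Part of that gap is per-key overhead (key names, dictionary entries, object headers, expiry metadata) plus allocator size-class rounding, all of which count toward used_memory and therefore toward maxmemory. Allocator fragmentation is a separate effect: jemalloc holding freed pages instead of returning them to the OS inflates used_memory_rss and mem_fragmentation_ratio rather than used_memory. Enable activedefrag yes, or issue MEMORY PURGE for an immediate one-off reclaim, to bring RSS back down.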

Is approximate LRU the same as textbook LRU? No. Redis does not maintain a full linked list of access order — it samples maxmemory-samples keys and evicts the least-recently-used of that sample. A higher sample count gets you closer to true LRU; the default trades accuracy for speed.

Do I still need TTLs if LRU already evicts under pressure? Yes. Eviction only fires at the maxmemory ceiling and says nothing about staleness — an LRU-resident key can be hot and wrong. A bounded TTL caps how long stale data can serve regardless of memory pressure, and the two mechanisms are complementary, not redundant.

Related

LRU vs LFU Eviction Policies in Redis — the recency-versus-frequency decision this page assumes you have already made.
How to Choose Between TTL and Explicit Invalidation — the staleness controls that sit above eviction.
Designing Graceful Fallback Routing for Cache Misses — where evicted-key misses go when the database alone is not enough.
Redis Caching Architecture & Invalidation Fundamentals — how eviction fits among the other core caching decisions.
