Production Guide: LRU vs LFU Eviction Policies in Redis

Memory pressure is an operational certainty in distributed caching systems. When proactive data lifecycle controls such as TTLs and explicit invalidation (see TTL vs Explicit Invalidation) fail to constrain working-set growth, Redis evicts keys once used memory reaches maxmemory; under noeviction, it rejects writes with an OOM error instead. Eviction is therefore not a fallback mechanism but a deliberate capacity-control layer. As Redis Caching Architecture & Invalidation Fundamentals establishes, the choice of algorithm directly affects cache hit ratios, downstream database load, and p99 latency during traffic spikes.

This guide details the operational mechanics of approximate LRU and LFU, provides production-grade configuration playbooks, and integrates observability pipelines for continuous policy validation.

flowchart TD
    W{Dominant access pattern?} -->|recency, fairly uniform| LRU[allkeys-lru]
    W -->|frequency, stable hot set| LFU[allkeys-lfu]
    W -->|only keys with a TTL evictable| VOL[volatile-lru / volatile-lfu]
    W -->|must never drop data| NO[noeviction + client-side TTL]

Approximate LRU: Mechanics & High-Throughput Tuning

Redis deliberately avoids strict LRU to avoid the memory and CPU overhead of maintaining a global doubly linked list. Instead, each key stores a 24-bit LRU clock, and when eviction is needed Redis samples maxmemory-samples keys and evicts the one that has been idle longest. Since Redis 3.0, a small pool of good candidates is also carried across eviction rounds to tighten the approximation. The default sample size is 5, which is adequate for roughly uniform access patterns but less precise under skewed or bursty workloads.

For high-throughput API gateways or ephemeral payload caches, increasing the sample size improves eviction precision at a modest CPU cost per eviction. As detailed in Configuring LRU Eviction for High-Throughput APIs, raising maxmemory-samples to 10–15 brings eviction close to true LRU; the stock redis.conf notes that 10 already approximates true LRU very closely.
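A toy simulation helps show why sample size matters. The sketch below is a simplified model, not Redis code: it gives each of 10,000 keys a distinct idle time, evicts the idlest of `samples` randomly chosen keys, and reports where the victim falls on the idleness scale (1.0 means the true LRU key was always chosen). The function names are illustrative.

```python
import random


def approx_lru_victim(idle: list[int], samples: int, rng: random.Random) -> int:
    """Return the index of the key evicted by one round of sampled LRU.

    idle[i] is key i's idle time. The victim is the idlest key among `samples`
    keys drawn at random; Redis's cross-round candidate pool is NOT modeled.
    """
    candidates = rng.sample(range(len(idle)), samples)
    return max(candidates, key=lambda i: idle[i])


def mean_victim_percentile(n_keys: int, samples: int, rounds: int, seed: int = 0) -> float:
    """Average idleness percentile of evicted keys (1.0 = always the true LRU key)."""
    rng = random.Random(seed)
    idle = list(range(n_keys))  # key i has idle time i, so its idleness rank is i
    total = 0.0
    for _ in range(rounds):
        victim = approx_lru_victim(idle, samples, rng)
        total += idle[victim] / (n_keys - 1)
    return total / rounds


if __name__ == "__main__":
    for k in (3, 5, 10, 15):
        print(f"samples={k}: mean victim percentile {mean_victim_percentile(10_000, k, 2_000):.3f}")
```

For k samples the result should land near k/(k+1), the expected maximum of k uniform draws, so moving from 5 to 10 samples shifts victims from roughly the 83rd to the 91st percentile of idleness. Because the model omits the candidate pool, real Redis accuracy is likely somewhat better than these numbers suggest.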

Production Configuration

# Runtime adjustment (non-persistent)
redis-cli CONFIG SET maxmemory-policy allkeys-lru
redis-cli CONFIG SET maxmemory-samples 10

# Persistent configuration (redis.conf)
maxmemory-policy allkeys-lru
maxmemory-samples 10

Python Integration & Validation

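import time

import redis
from redis.exceptions import ResponseError

client = redis.Redis(host="cache-primary.internal", port=6379, decode_responses=True)

def tune_lru_sampling(target_samples: int = 12) -> None:
    """Set maxmemory-samples at runtime if needed (not persisted across restarts)."""
    try:
        current = client.config_get("maxmemory-samples")
        if int(current["maxmemory-samples"]) != target_samples:
            client.config_set("maxmemory-samples", target_samples)
            print(f"Updated maxmemory-samples to {target_samples}")
    except ResponseError as e:
        # Raised e.g. when CONFIG is disabled or renamed, as on many managed services
        print(f"Configuration locked or unsupported: {e}")

# Validate eviction pressure: evicted_keys is cumulative since startup,
# so sample it twice and divide by the interval to get keys evicted per second
def check_eviction_velocity(interval_s: float = 5.0) -> float:
    before = client.info("stats").get("evicted_keys", 0)
    time.sleep(interval_s)
    after = client.info("stats").get("evicted_keys", 0)
    return max(after - before, 0) / interval_s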

LFU: Logarithmic Counters & Session Preservation

Least Frequently Used (LFU) shifts the eviction heuristic from recency to access frequency. Under an LFU policy, Redis reuses each key's 24-bit LRU field: 16 bits hold the last decrement time in minutes and 8 bits hold a logarithmic access counter, so OBJECT FREQ ranges from 0 to 255. New keys start with a counter of 5 so they are not evicted immediately. On access, the counter is first decayed and then incremented probabilistically, with the increment probability shrinking as the counter grows and as lfu-log-factor rises. Decay subtracts one for every lfu-decay-time minutes (default: 1) elapsed since the last decrement, so a key that was hot an hour ago but is idle now ranks below one that is accessed steadily.
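The counter mechanics can be sketched in a few lines. This is a simplified Python rendering modeled on the LFULogIncr and LFUDecrAndReturn routines in Redis's evict.c (verify against the source for your version); the Python function names are illustrative, and the 16-bit minute clock with its wraparound is reduced to a plain elapsed-minutes argument.

```python
import random

LFU_INIT_VAL = 5  # counter value Redis assigns to newly created keys


def lfu_log_incr(counter: int, log_factor: int, rng: random.Random) -> int:
    """Probabilistic increment: the higher the counter, the less likely a bump."""
    if counter == 255:
        return 255  # 8-bit counter is saturated
    baseval = max(counter - LFU_INIT_VAL, 0)
    p = 1.0 / (baseval * log_factor + 1)
    return counter + 1 if rng.random() < p else counter


def lfu_decay(counter: int, minutes_since_last_decr: int, decay_time: int) -> int:
    """Subtract one per elapsed lfu-decay-time period; decay_time == 0 disables decay."""
    if decay_time == 0:
        return counter
    periods = minutes_since_last_decr // decay_time
    return max(counter - periods, 0)


if __name__ == "__main__":
    rng = random.Random(42)
    c = LFU_INIT_VAL
    for _ in range(1_000):
        # Redis decays before incrementing; with no idle time between hits, decay is a no-op
        c = lfu_log_incr(lfu_decay(c, 0, decay_time=1), log_factor=10, rng=rng)
    print("counter after 1000 hits:", c)
    print("after 30 idle minutes:", lfu_decay(c, 30, decay_time=1))
```

Because the increment probability is 1/((counter - 5) * lfu-log-factor + 1), each additional point costs more hits than the last, which is what lets an 8-bit counter span millions of accesses.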

LFU excels in stateful workloads where temporal spikes would otherwise flush long-lived, high-value objects. For authentication layers, Comparing LFU vs LRU for User Session Caching demonstrates that LFU preserves active user contexts during flash traffic, provided lfu-log-factor suits the application's access rate. The default factor is 10. Lower values (1–5) make counters grow faster for hot sessions and saturate at 255 sooner; higher values (50–100) delay saturation, preserving granularity across tiered access patterns. Per the table in the stock redis.conf, a factor of 100 needs on the order of 10 million hits to saturate the counter.
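To illustrate why frequency-based eviction protects hot sessions from a flash of one-off keys, the sketch below runs exact LRU and exact LFU over a fixed-capacity cache. It deliberately ignores Redis's sampling, logarithmic counter, and decay, and the workload (10 sessions hit 20 times each, then 100 single-use keys into a 50-slot cache) is hypothetical.

```python
from collections import OrderedDict


def survivors_after_flash(policy: str, capacity: int = 50, sessions: int = 10,
                          hits_per_session: int = 20, flash_keys: int = 100) -> int:
    """Count hot session keys still cached after a burst of one-off keys.

    Uses exact LRU/LFU on a fixed-capacity cache, not Redis's sampled versions.
    """
    if policy not in ("lru", "lfu"):
        raise ValueError("policy must be 'lru' or 'lfu'")
    cache = OrderedDict()  # key -> access count; insertion order tracks recency

    def access(key: str) -> None:
        if key in cache:
            cache[key] += 1
            cache.move_to_end(key)  # most recently used moves to the end
            return
        if len(cache) >= capacity:
            if policy == "lru":
                cache.popitem(last=False)  # evict the least recently used key
            else:
                victim = min(cache, key=cache.__getitem__)  # evict the least frequently used key
                del cache[victim]
        cache[key] = 1

    for _ in range(hits_per_session):
        for s in range(sessions):
            access(f"session:{s}")
    for i in range(flash_keys):
        access(f"flash:{i}")
    return sum(1 for k in cache if k.startswith("session:"))


if __name__ == "__main__":
    print("lru survivors:", survivors_after_flash("lru"))
    print("lfu survivors:", survivors_after_flash("lfu"))
```

Under LRU every session is pushed out once the burst exceeds the free space, while LFU sacrifices burst keys instead. The flip side is visible too: this exact LFU never forgets, so a session that goes idle keeps its high count indefinitely, which is the failure mode lfu-decay-time is meant to address.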

Production Configuration

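# Enable LFU with a faster-growing counter (lfu-log-factor default: 10)
redis-cli CONFIG SET maxmemory-policy allkeys-lfu
redis-cli CONFIG SET lfu-log-factor 5
# Decay period in minutes (1 is already the default; 0 disables decay)
redis-cli CONFIG SET lfu-decay-time 1

# Inspect frequency counter for a specific key (errors unless an LFU policy is active)
redis-cli OBJECT FREQ session:user:8a3f9c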
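Checking one key at a time does not scale to a distribution view. A minimal sketch, assuming a redis-py-style client that exposes scan_iter and execute_command, walks matching keys with SCAN and buckets their OBJECT FREQ values; the function name, bucket edges, and the session:* pattern are illustrative choices.

```python
from bisect import bisect_right
from collections import Counter

# Bucket edges for OBJECT FREQ values (0..255); chosen for illustration only.
EDGES = [0, 5, 10, 20, 50, 100, 256]


def freq_histogram(client, match: str = "session:*", scan_count: int = 500) -> dict:
    """Bucket OBJECT FREQ values for keys matching `match`.

    `client` is assumed to be a redis-py client (anything with scan_iter and
    execute_command works); Redis only answers OBJECT FREQ under an LFU policy.
    """
    buckets = Counter()
    for key in client.scan_iter(match=match, count=scan_count):
        freq = client.execute_command("OBJECT", "FREQ", key)
        if freq is None:  # key expired or was evicted between SCAN and OBJECT
            continue
        i = bisect_right(EDGES, int(freq)) - 1
        buckets[f"{EDGES[i]}-{EDGES[i + 1] - 1}"] += 1
    return dict(buckets)
```

Pass a redis.Redis instance as client. SCAN is incremental, but OBJECT FREQ still costs one round trip per key, so large keyspaces are better sampled than walked in full. For a quick interactive look, redis-cli --hotkeys performs a similar scan when an LFU policy is active.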

Topology-Aware Scaling Considerations

Eviction policies apply per node in Redis Cluster: each shard enforces its own maxmemory and evicts only from its local keyspace. Misaligned policies across nodes make eviction behavior depend on which shard a key hashes to, producing uneven hit ratios and unpredictable cache miss rates. As outlined in Understanding Redis Cache Topology, consistent maxmemory-policy, maxmemory-samples, and lfu-* parameters must be deployed via configuration management (Ansible, Terraform, or Kubernetes ConfigMaps) before cluster rebalancing.

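# Apply a uniform eviction policy across all cluster nodes (runtime only)
redis-cli --cluster call <cluster-node-ip>:6379 CONFIG SET maxmemory-policy allkeys-lfu
redis-cli --cluster call <cluster-node-ip>:6379 CONFIG SET lfu-log-factor 5
# Persist to each node's redis.conf when the file is not managed externally
redis-cli --cluster call <cluster-node-ip>:6379 CONFIG REWRITE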
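Before rebalancing, it helps to verify that the parameters actually match on every node. The sketch below compares CONFIG GET output across nodes and reports any eviction parameter that has more than one value; the function name and parameter list are illustrative, and collecting the per-node dicts (for example with redis-py's config_get on one client per node) is left to the caller.

```python
# Parameters that should match on every node; extend as needed.
EVICTION_PARAMS = ("maxmemory-policy", "maxmemory-samples", "lfu-log-factor", "lfu-decay-time")


def find_config_drift(node_configs: dict[str, dict[str, str]]) -> dict[str, dict[str, str]]:
    """Return {param: {node: value}} for each eviction parameter that differs across nodes.

    node_configs maps a node address to the dict that CONFIG GET returned for it.
    """
    drift = {}
    for param in EVICTION_PARAMS:
        values = {node: cfg.get(param) for node, cfg in node_configs.items()}
        if len(set(values.values())) > 1:  # more than one distinct value means drift
            drift[param] = values
    return drift
```

An empty result means the nodes agree on every listed parameter; anything else names the parameter and the value each node reports.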

Observability & Database Load Correlation

Eviction is a leading indicator of capacity misalignment. Sudden spikes in evicted_keys typically precede higher origin database read load, greater cache stampede risk, and elevated p99 latency. Monitoring should therefore track eviction velocity alongside hit/miss ratios and memory fragmentation.
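Because evicted_keys, keyspace_hits, and keyspace_misses are cumulative counters in INFO stats, velocity and hit ratio have to be derived from two snapshots. A minimal sketch, assuming snapshot dicts shaped like redis-py's info("stats") output; the function name is hypothetical.

```python
def eviction_and_hit_rates(prev: dict, curr: dict, interval_s: float) -> dict:
    """Derive evictions per second and the windowed hit ratio from two INFO stats snapshots.

    Only evicted_keys, keyspace_hits and keyspace_misses are read. These counters
    reset on restart (or CONFIG RESETSTAT), so a negative delta is treated as a
    reset and the current value is used as the delta instead.
    """
    def delta(field: str) -> int:
        a, b = prev.get(field, 0), curr.get(field, 0)
        return b - a if b >= a else b

    hits, misses = delta("keyspace_hits"), delta("keyspace_misses")
    lookups = hits + misses
    return {
        "evictions_per_s": delta("evicted_keys") / interval_s,
        "hit_ratio": hits / lookups if lookups else None,  # None: no reads in the window
    }
```

Reading both values from the same pair of snapshots keeps eviction velocity and hit ratio aligned in time, which makes it easier to see whether a burst of evictions is what drove a hit-ratio drop.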

Prometheus Integration

Deploy the community Redis exporter (oliver006/redis_exporter) and scrape its standard metrics. Key queries for alerting:

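# Eviction rate over 5m window (for caches that evict routinely, alert on a baseline-derived threshold instead of 0)
rate(redis_evicted_keys_total[5m]) > 0

# Memory utilization vs configured limit (excludes instances with maxmemory 0, i.e. unlimited)
redis_memory_used_bytes / redis_memory_max_bytes > 0.85 and redis_memory_max_bytes > 0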

# Cache hit ratio degradation
rate(redis_keyspace_hits_total[5m]) / (rate(redis_keyspace_hits_total[5m]) + rate(redis_keyspace_misses_total[5m])) < 0.90

Python Telemetry Hook

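import time

import redis
from prometheus_client import Counter, Gauge, start_http_server

# Exposed as redis_hook_evicted_keys_total (prometheus_client appends _total)
EVICTED_KEYS = Counter("redis_hook_evicted_keys", "Keys evicted by Redis, observed via INFO stats")
MEM_USAGE = Gauge("redis_memory_usage_ratio", "Used vs max memory ratio")

_last_evicted = None  # first sample only sets the baseline

def emit_eviction_metrics(r: redis.Redis) -> None:
    global _last_evicted
    stats = r.info("stats")
    mem = r.info("memory")
    evicted = stats.get("evicted_keys", 0)
    if _last_evicted is not None:
        # evicted_keys resets on restart or CONFIG RESETSTAT; count from zero in that case,
        # because Prometheus counters reject negative increments
        delta = evicted - _last_evicted if evicted >= _last_evicted else evicted
        if delta > 0:
            EVICTED_KEYS.inc(delta)
    _last_evicted = evicted
    # maxmemory is 0 when unlimited; fall back to system memory to avoid ZeroDivisionError
    max_mem = mem.get("maxmemory") or mem.get("total_system_memory", 0)
    if max_mem:
        MEM_USAGE.set(mem["used_memory"] / max_mem)

if __name__ == "__main__":
    start_http_server(8000)  # illustrative port for this hook's /metrics endpoint
    client = redis.Redis(host="cache-primary.internal", port=6379)
    while True:
        emit_eviction_metrics(client)
        time.sleep(15)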

The relationship between eviction frequency and origin load is non-linear. Aggressive LRU eviction on read-heavy endpoints can trigger thundering herd patterns, while an LFU decay time that is too long (or 0, which disables decay) can retain formerly hot objects that consume memory without serving traffic. A systematic evaluation of Eviction Policy Impact on Database Read Load should precede any production policy migration.

Decision Matrix & Operational Playbook

| Workload Profile | Recommended Policy | Key Tunables | Monitoring Focus |
|---|---|---|---|
| High-throughput REST/GraphQL APIs | allkeys-lru | maxmemory-samples: 10–15 | evicted_keys/sec, instantaneous_ops_per_sec |
| User sessions, auth tokens, shopping carts | allkeys-lfu | lfu-log-factor: 5–10, lfu-decay-time: 1 | OBJECT FREQ distribution, session miss rate |
| Mixed read/write with predictable hot sets | volatile-lru / volatile-lfu | TTL alignment, maxmemory-samples | expired_keys vs evicted_keys ratio |
| Strict memory isolation per tenant | noeviction + client-side TTL | Application-level cache routing | OOM errors, used_memory ceiling |

The volatile-* policies consider only keys with an expiry set; when no such keys remain, they behave like noeviction and reject writes with OOM errors.

Rollout Checklist

  1. Baseline: Record 7-day evicted_keys, keyspace_misses, and used_memory under current policy.
  2. Staging Validation: Apply new maxmemory-policy and tunables to a shadow cluster replaying production traffic.
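  3. Gradual Promotion: Apply the change with CONFIG SET on a single primary shard (or a canary standalone instance), verify hit ratio stability, then roll out to the remaining primaries. Testing on a replica proves little: replicas ignore maxmemory by default (replica-ignore-maxmemory yes) and mirror the primary's evictions rather than evicting on their own.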
  4. Alerting: Set PagerDuty/OpsGenie thresholds on rate(redis_evicted_keys_total[5m]) and redis_memory_used_bytes / redis_memory_max_bytes.
  5. Fallback: Maintain maxmemory-policy snapshots in version control; revert via CONFIG SET within 60s if p99 latency exceeds SLO.
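Steps 3 and 5 can be combined into a guarded change. The sketch below records the current policy, applies the new one at runtime, and reverts if a health check fails during a watch window. The client is assumed to be redis-py-like (config_get/config_set), and healthy is a hypothetical caller-supplied callable, for example one that compares p99 latency to the SLO.

```python
import time
from typing import Callable


def guarded_policy_change(client, new_policy: str, healthy: Callable[[], bool],
                          watch_s: float = 60.0, poll_s: float = 5.0) -> bool:
    """Apply maxmemory-policy at runtime and revert if healthy() fails within watch_s.

    Returns True if the new policy was kept, False if it was rolled back.
    """
    previous = client.config_get("maxmemory-policy")["maxmemory-policy"]
    client.config_set("maxmemory-policy", new_policy)
    deadline = time.monotonic() + watch_s
    while time.monotonic() < deadline:
        if not healthy():
            client.config_set("maxmemory-policy", previous)  # roll back to the snapshot
            return False
        time.sleep(poll_s)
    return True
```

Because CONFIG SET is not persistent, a kept policy still needs to be written back through configuration management or CONFIG REWRITE; in a cluster, run the guard against one primary before the rest.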

Eviction policy selection is a continuous optimization loop, not a one-time configuration. By aligning algorithmic behavior with access patterns, enforcing topology-wide consistency, and instrumenting eviction velocity, engineering teams transform Redis from a passive cache into a predictable, self-regulating capacity layer.