Redis Security Boundaries for Multi-Tenant Applications

Running many tenants against one Redis deployment changes the threat model completely: the failure you must engineer against is no longer slow reads but cross-tenant data leakage, unauthorized key mutation, and eviction-driven denial of service by a noisy neighbor. Single-tenant habits — a shared requirepass and a hopeful tenant:* key prefix — enforce naming, not isolation; any client that authenticates can SCAN the whole keyspace and read every tenant's data. This page shows how to build real boundaries inside a shared instance by treating the tenant identifier as a security principal: scoped Redis ACL users, command lockdown, a client that verifies its own identity before serving traffic, per-tenant eviction isolation, and a CI/CD gate that fails the build if any tenant can reach another's keys. It builds directly on the cache topology you have chosen, because isolation is enforced differently on a replicated single shard than across a sharded Redis Cluster.

Prerequisites

Redis 7.0+: ACL selectors (parenthesized (...) rule groups) and ACL DRYRUN both arrived in 7.0 and are required; the older rename-command approach is deprecated and does not scope per tenant.
redis-py 5.x on Python 3.10+, using redis.asyncio so the identity check and reads never block the event loop.
A deliberate key-naming contract — every tenant key is prefixed tenant:<id>:* — so ACL key patterns can bind to it.
TLS enabled on the instance (tls-port, client certs) so ACL passwords are never sent in cleartext between services.
ACL LOG metrics (recorded server-side by Redis) exported to your observability stack so NOPERM and failed-AUTH spikes are alertable.

Step-by-Step Implementation

1. Baseline the current boundary with a direct ACL audit. Before changing anything, query Redis itself rather than application logs: ACL LOG records each rejected command, the user that issued it, the key or command that was denied, and the age of the entry, so you can separate a routing bug from configuration drift.

# Recent ACL violations (default: last 10 entries)
redis-cli ACL LOG

# Map offending connections to IPs / service accounts
# (-o prints only the addr= and user= fields instead of every full line)
redis-cli CLIENT LIST | grep -oE " (addr|user)=[^ ]+"

# Spot bulk-enumeration leaks: heavy SCAN/KEYS/DEL against the shared keyspace
redis-cli INFO COMMANDSTATS | grep -E "cmdstat_keys|cmdstat_scan|cmdstat_del"
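
The audit above is easier to act on once violations are grouped by user and target. The following is a minimal sketch, assuming entries shaped like the dictionaries that redis-py's acl_log() returns (with username, reason, object, and count fields); the helper name summarize_acl_log and the sample entries are hypothetical and exist only for illustration.

```python
from collections import Counter
from typing import Iterable, Mapping


def summarize_acl_log(entries: Iterable[Mapping[str, object]]) -> Counter:
    """Count ACL LOG violations per (username, reason, object) triple."""
    summary: Counter = Counter()
    for entry in entries:
        key = (
            str(entry.get("username", "?")),
            str(entry.get("reason", "?")),
            str(entry.get("object", "?")),
        )
        # Redis folds repeated identical violations into one entry and
        # tracks how often it fired in "count".
        summary[key] += int(entry.get("count", 1))
    return summary


# Made-up sample entries, not real log output.
sample = [
    {"username": "tenant_123", "reason": "key", "object": "tenant:999:cart", "count": 4},
    {"username": "tenant_123", "reason": "command", "object": "keys", "count": 1},
    {"username": "tenant_123", "reason": "key", "object": "tenant:999:cart", "count": 2},
]

for (user, reason, obj), hits in summarize_acl_log(sample).most_common():
    print(f"{user} {reason} {obj} x{hits}")
```

A triple with reason "key" and an object in another tenant's namespace is the signal worth alerting on; a "command" reason usually points at a client still calling a verb you removed.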

2. Create a namespace-scoped ACL user per tenant. Give each tenant its own user whose keys are pinned to ~tenant:<id>:*, granting broad read on the namespace but confining writes to a narrower slice with a selector — the (...) group applies its rules only within that scope.

# Cryptographically random password — never hand-pick tenant secrets
TENANT_PASS=$(redis-cli ACL GENPASS)

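# Read across the tenant namespace; write only to its session keys.
# "reset" makes the command idempotent (SETUSER is otherwise additive).
# The selector is an independent permission set checked after the root
# rules, so strip @dangerous inside it as well. +ping and +acl|whoami let
# the client run health checks and verify its own identity.
redis-cli ACL SETUSER tenant_123 reset on ">$TENANT_PASS" \
  "+@read" "+ping" "+acl|whoami" "~tenant:123:*" \
  "(+@write -@dangerous ~tenant:123:session:*)"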

redis-cli ACL GETUSER tenant_123
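
Because the tenant id is interpolated into a glob pattern, an id containing * or ? could quietly widen the grant. The sketch below shows one way to generate the ACL SETUSER arguments from a validated id. The helper name tenant_acl_rules, the allowed-character policy, and the exact rule list are assumptions for illustration; keep the rules in sync with whatever you actually deploy.

```python
import re

# Assumption: tenant ids are short slugs. Glob characters (*, ?, [, ]) or ':'
# in an id could widen or escape the key pattern, so reject them outright.
_TENANT_ID = re.compile(r"[A-Za-z0-9_-]{1,64}")


def tenant_acl_rules(tenant_id: str, password: str) -> list[str]:
    """Return ACL SETUSER arguments for one tenant (hypothetical helper)."""
    if not _TENANT_ID.fullmatch(tenant_id):
        raise ValueError(f"unsafe tenant id: {tenant_id!r}")
    namespace = f"tenant:{tenant_id}"
    return [
        f"tenant_{tenant_id}",
        "reset",                      # start from a clean slate; SETUSER is additive
        "on",
        f">{password}",
        "+@read",
        f"~{namespace}:*",
        # Assumption: strip @dangerous inside the selector too, since
        # root-level removals do not reach into selectors.
        f"(+@write -@dangerous ~{namespace}:session:*)",
    ]
```

With redis-py the list can be passed straight through, for example `await client.execute_command("ACL", "SETUSER", *tenant_acl_rules("123", password))`, run from an account that is allowed to manage ACLs.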

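3. Disable enumeration and flush commands globally. The strongest namespace pin still leaks if a tenant can run KEYS * or FLUSHALL, so remove those verbs from every tenant user and force cursor-based SCAN in application code. Key patterns only constrain keys passed as command arguments: SCAN, RANDOMKEY, and DBSIZE take none, so a tenant holding +@read can still enumerate other tenants' key names (though not read their values) unless you remove those commands as well.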

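# Strip dangerous verbs from every tenant user. Root-level removals do not
# reach into selectors, so rebuild the selector without @dangerous too.
redis-cli ACL SETUSER tenant_123 -keys -flushall -flushdb -@admin \
  clearselectors "(+@write -@dangerous ~tenant:123:session:*)"

# Lock down the default user LAST: ACL SETUSER and ACL DRYRUN are themselves
# @dangerous, so after this line they must be run as a separate admin user.
redis-cli ACL SETUSER default -@dangerous -keys -flushall -flushdb

# Validate a would-be cross-tenant write is refused BEFORE deploying (7.0+)
redis-cli ACL DRYRUN tenant_123 SET tenant:other:leak "blocked"
# Expected: a refusal containing "has no permissions" (exact wording varies by Redis version)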
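
A single DRYRUN proves one command; a boundary is better tested with a list of probes that should all be refused. The sketch below assumes a redis-py style async client exposing execute_command and relies on ACL DRYRUN replying OK only when the command would be permitted. The function name find_leaks and the probe list are hypothetical.

```python
import asyncio


async def find_leaks(client, user: str, probes: list[list[str]]) -> list[list[str]]:
    """Return every probe command that ACL DRYRUN would ALLOW for `user`.

    `probes` should contain only commands the user must NOT be able to run,
    so a non-empty result means the boundary is broken.
    """
    leaks = []
    for probe in probes:
        reply = await client.execute_command("ACL", "DRYRUN", user, *probe)
        if isinstance(reply, bytes):
            reply = reply.decode()
        # DRYRUN answers OK when the command would be permitted; any other
        # reply is a refusal message.
        if reply == "OK":
            leaks.append(probe)
    return leaks


# Example probe set for tenant_123: every entry targets a forbidden action.
CROSS_TENANT_PROBES = [
    ["GET", "tenant:999:profile"],
    ["SET", "tenant:999:leak", "x"],
    ["KEYS", "*"],
    ["FLUSHALL"],
]
```

Running `asyncio.run(find_leaks(client, "tenant_123", CROSS_TENANT_PROBES))` from an admin connection and failing the deploy on a non-empty list keeps the check independent of the exact refusal wording.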

4. Build a tenant-scoped async client that verifies its own identity. Dynamic ACL changes can outpace connection pooling and surface as NOAUTH/NOPERM on cold start, so authenticate as the tenant user and confirm ACL WHOAMI before the client is allowed to serve traffic.

import logging
from redis.asyncio import Redis
from redis.exceptions import AuthenticationError, NoPermissionError, ConnectionError
from tenacity import (
    retry, stop_after_attempt, wait_exponential, retry_if_exception_type,
)

logger = logging.getLogger(__name__)

async def get_tenant_client(tenant_id: str, password: str) -> Redis:
    client = Redis(
        host="cache.internal.svc",
        port=6379,
        username=f"tenant_{tenant_id}",   # authenticate AS the tenant, not default
        password=password,
        ssl=True,                         # ACL secrets must not cross the wire in clear
        decode_responses=True,
        socket_timeout=2.0,
        socket_connect_timeout=1.0,
        health_check_interval=15,
    )
    # Refuse to hand back a client whose ACL identity is wrong — this catches
    # propagation lag right after an ACL SETUSER rollout.
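    # The tenant user needs +acl|whoami for this call to be permitted.
    try:
        whoami = await client.acl_whoami()
    except Exception:
        await client.aclose()  # redis-py >= 5.0.1; don't leak the pool on failure
        raise
    if whoami != f"tenant_{tenant_id}":
        await client.aclose()
        raise RuntimeError(f"ACL mismatch: expected tenant_{tenant_id}, got {whoami}")
    return client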

5. Fail closed on permission errors and retry only transient faults. A NoPermissionError (NOPERM) is a permanent boundary violation, not a blip, so exclude it from the retry policy and re-raise it immediately while still retrying genuine connection faults.

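# redis-py's TimeoutError is not a subclass of its ConnectionError, so list it explicitly.
from redis.exceptions import TimeoutError as RedisTimeoutError

@retry(
    # NoPermissionError is deliberately absent: tenacity re-raises anything
    # not listed here without retrying.
    retry=retry_if_exception_type((AuthenticationError, ConnectionError, RedisTimeoutError)),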
    wait=wait_exponential(multiplier=0.5, min=0.5, max=5),
    stop=stop_after_attempt(3),
    reraise=True,
)
async def tenant_cache_get(client: Redis, key: str) -> str | None:
    try:
        return await client.get(key)
    except NoPermissionError as e:
        # NOPERM == crossed a tenant boundary. Never retry it; surface it loud.
        logger.error("ACL boundary violation on key %s: %s", key, e)
        raise

6. Isolate eviction so a noisy tenant cannot evict a neighbor's keys. maxmemory-policy is instance-global — you cannot give one tenant allkeys-lfu and another volatile-ttl on the same instance — so route high-churn tenants onto dedicated instances and tune the shared eviction policy deliberately, the same eviction policy decision that governs single-tenant capacity.

# Shared-instance eviction is one global lever — set it, don't assume per-tenant
redis-cli CONFIG GET maxmemory-policy
redis-cli CONFIG SET maxmemory-policy allkeys-lfu   # frequency-aware, protects hot tenants

# Watch eviction pressure that would cross tenant lines under memory contention
redis-cli INFO stats | grep -E "evicted_keys|keyspace_misses"
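
Routing high-churn tenants onto dedicated instances needs a single place that decides which host a tenant talks to. The following is a minimal sketch of such a lookup; the CachePlacement class and the dedicated hostname are hypothetical, and only cache.internal.svc comes from the client example earlier on this page.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CachePlacement:
    """Maps tenants to a Redis host: dedicated if listed, shared otherwise."""

    shared_host: str
    dedicated_hosts: dict[str, str] = field(default_factory=dict)

    def host_for(self, tenant_id: str) -> str:
        # Tenants without an explicit entry stay on the shared instance,
        # where they compete under the one global maxmemory-policy.
        return self.dedicated_hosts.get(tenant_id, self.shared_host)


placement = CachePlacement(
    shared_host="cache.internal.svc",
    # Hypothetical dedicated instance for a high-churn tenant.
    dedicated_hosts={"456": "cache-tenant-456.internal.svc"},
)

print(placement.host_for("123"))
print(placement.host_for("456"))
```

Keeping the mapping in configuration rather than code means promoting a tenant off the shared instance is a data change, not a deploy.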

How Tenant Isolation Routes a Command

The ACL layer, not the key prefix, is the boundary: every command is checked against the authenticated user's command rules and key patterns before it touches the keyspace, and a request outside the tenant's namespace is rejected with NOPERM rather than served.
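
The decision can be pictured as a small model: a command is allowed if the user's root rules, or any one selector, permit both the command and every key it names. The sketch below is a deliberately simplified illustration and not Redis's implementation. It assumes Python's fnmatchcase is close enough to Redis glob matching for these patterns, and it ignores read-only versus write-only key permissions and pub/sub channels; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class PermissionSet:
    """One set of rules: the root permissions or a single selector."""

    commands: frozenset[str]
    key_patterns: tuple[str, ...]

    def allows(self, command: str, keys: list[str]) -> bool:
        if command.lower() not in self.commands:
            return False
        # Every key named by the command must match at least one pattern.
        # Commands that name no keys pass this check trivially.
        return all(
            any(fnmatchcase(key, pattern) for pattern in self.key_patterns)
            for key in keys
        )


def is_allowed(root: PermissionSet, selectors: list[PermissionSet],
               command: str, keys: list[str]) -> bool:
    """Root rules are tried first, then each selector; any match permits."""
    return any(ps.allows(command, keys) for ps in (root, *selectors))


root = PermissionSet(frozenset({"get", "mget"}), ("tenant:123:*",))
session_writes = PermissionSet(frozenset({"set", "del"}), ("tenant:123:session:*",))

print(is_allowed(root, [session_writes], "GET", ["tenant:123:profile"]))
print(is_allowed(root, [session_writes], "SET", ["tenant:123:profile"]))
```

The model also makes one subtlety visible: because keyless commands pass the key check, a selector that grants a broad command category also grants the keyless commands in that category.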

Failure Modes

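Additive rules silently widen a grant. ACL SETUSER appends to a user's existing rules and key patterns are cumulative: a later ~* or +@all adds to an earlier namespace pin rather than being limited by it, so a copy-pasted rule can hand one tenant the whole keyspace. Selectors are evaluated independently of the root rules, so a root-level removal such as -flushall does not restrict a selector that grants +@write. Diagnose by replaying the exact command through redis-cli ACL DRYRUN tenant_123 GET tenant:other:secret (an OK where you expected a refusal is the bug) and by reading redis-cli ACL GETUSER tenant_123 for a stray ~*. Fix by starting every ACL SETUSER with reset (or resetkeys and clearselectors) so stale grants cannot survive, and by never granting allkeys.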

A global-eviction noisy neighbor evicts another tenant's hot keys. Because eviction is instance-wide, one tenant filling memory forces Redis to evict any key, cratering a co-located tenant's hit ratio during a burst. Diagnose with redis-cli INFO stats | grep -E "evicted_keys|keyspace_misses" showing an eviction spike uncorrelated with the victim tenant's own writes; fix by moving the high-churn tenant to a dedicated instance or promoting it off the shared shard, and by setting a per-tenant memory budget upstream so no single tenant can consume the whole ceiling.
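
An eviction spike is easier to spot as a rate than as a raw counter. The sketch below computes evictions per second from two samples of the stats section, assuming dictionaries shaped like what redis-py's info("stats") returns; the function name eviction_rate is hypothetical, and any alert threshold is yours to choose.

```python
def eviction_rate(prev: dict, curr: dict, elapsed_s: float) -> float:
    """Evictions per second between two INFO stats samples."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    delta = int(curr["evicted_keys"]) - int(prev["evicted_keys"])
    if delta < 0:
        # The counter went backwards, which happens after a restart or
        # CONFIG RESETSTAT; count only what accumulated since the reset.
        delta = int(curr["evicted_keys"])
    return delta / elapsed_s
```

Redis does not attribute evictions to a tenant, so pair this instance-wide rate with each tenant's own write volume to see whose burst lines up with the spike.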

An unscoped bulk invalidation traverses tenant namespaces. A SCAN/DEL sweep during bulk invalidation that forgets the tenant:<id>: prefix walks every tenant's keys, causing unauthorized misses or exposing stale references — and if the operation runs during a MOVED redirection on a sharded cluster, slot movement without tenant-aware pinning compounds it. Diagnose with redis-cli INFO COMMANDSTATS | grep -E "cmdstat_scan|cmdstat_del" showing volume far above one tenant's footprint; fix by binding every invalidation sweep to a MATCH tenant:<id>:* cursor and pairing targeted deletes with a short TTL safety net.
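
A sweep that cannot be called without a tenant id removes the "forgot the prefix" failure by construction. The following is a minimal sketch, assuming a redis-py asyncio client (scan_iter and unlink are its method names) connected to a single instance; on a cluster, keys in one batch may span slots, so batching would need to be per slot or per key. The function name invalidate_tenant is hypothetical.

```python
async def invalidate_tenant(client, tenant_id: str, batch_size: int = 500) -> int:
    """UNLINK every key under tenant:<id>:* and return how many were removed."""
    # Refuse ids that are empty or contain glob characters: either would
    # turn the MATCH pattern into a cross-tenant sweep.
    if not tenant_id or any(ch in tenant_id for ch in "*?[]\\"):
        raise ValueError(f"unsafe tenant id: {tenant_id!r}")

    pattern = f"tenant:{tenant_id}:*"
    removed = 0
    batch: list = []
    # SCAN with MATCH walks the keyspace incrementally instead of blocking
    # the server the way KEYS would.
    async for key in client.scan_iter(match=pattern, count=batch_size):
        batch.append(key)
        if len(batch) >= batch_size:
            removed += await client.unlink(*batch)
            batch.clear()
    if batch:
        removed += await client.unlink(*batch)
    return removed
```

UNLINK is used rather than DEL so that reclaiming large values happens off the main thread; the connection running the sweep needs both commands granted within the tenant's namespace.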

Verification

Confirm the boundary holds by trying to cross it, not just by connecting successfully.

# 1. Cross-tenant write must be refused for every tenant user
redis-cli ACL DRYRUN tenant_123 SET tenant:999:leak "x" | grep -qi "no permissions" \
  && echo "PASS: cross-tenant write blocked" || echo "FAIL"

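# 2. Enumeration verbs are gone from tenant users
#    (grepping ACL GETUSER for "keys" always matches its "keys" field name,
#    so ask Redis whether the command would actually be permitted)
redis-cli ACL DRYRUN tenant_123 KEYS '*' | grep -qi "no permissions" \
  && echo "PASS: KEYS disabled" || echo "FAIL: KEYS still allowed"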

# 3. The client authenticates AS the tenant, not default
#    (rediss:// because TLS is enabled; add --cacert/--cert/--key if client certs are enforced)
redis-cli -u "rediss://tenant_123:$TENANT_PASS@cache.internal.svc:6379" ACL WHOAMI

# 4. No fresh boundary violations are accumulating
redis-cli ACL LOG RESET && sleep 60 && redis-cli ACL LOG | grep -c "reason"

Wire the same checks into CI/CD so a policy regression fails the merge instead of shipping: stand up a throwaway redis:7.2 container, create a tenant user scoped to ~tenant:99:*, assert that an out-of-namespace ACL DRYRUN returns a permission error, then run an integration test that drives concurrent tenant workloads and asserts zero cross-tenant key reads.

# .github/workflows/redis-acl-gate.yml
name: Redis ACL Policy Validation
on:
  pull_request:
    paths: ['infra/redis/acl/**']
jobs:
  validate-acl:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
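      - name: Start Redis 7.2
        run: |
          docker run -d --name redis-test -p 6379:6379 redis:7.2-alpine
          # Wait until the server answers before issuing ACL commands
          for i in $(seq 1 30); do docker exec redis-test redis-cli PING | grep -q PONG && break; sleep 1; done
      - name: Assert cross-tenant write is blocked
        run: |
          # Grant writes inside the namespace so the refusal below proves the key
          # boundary, not merely the absence of any write permission.
          docker exec redis-test redis-cli ACL SETUSER test_user on ">testpass" "+@read" "+@write" "~tenant:99:*"
          docker exec redis-test redis-cli ACL DRYRUN test_user SET tenant:99:ok "allowed" | grep -q "OK" || { echo "FAIL: in-namespace write refused"; exit 1; }
          RESULT=$(docker exec redis-test redis-cli ACL DRYRUN test_user SET tenant:other:leak "blocked")
          echo "$RESULT" | grep -qi "no permissions" || { echo "FAIL: cross-tenant write not blocked"; exit 1; }
          echo "PASS: cross-tenant isolation verified"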
      - name: Integration isolation test
        run: python -m pytest tests/redis_isolation.py --redis-url=redis://localhost:6379

A correct deployment shows every out-of-namespace ACL DRYRUN refused, no keys/flushall verb on any tenant user, ACL WHOAMI returning the tenant identity rather than default, and a flat ACL LOG under steady-state load.

FAQ

Isn't a tenant:<id>: key prefix enough to isolate tenants? No. A prefix is a naming convention, not an access boundary — any client that authenticates with a shared password can SCAN, GET, or DEL across every prefix. Isolation requires an ACL user whose ~tenant:<id>:* key pattern makes Redis itself reject out-of-namespace commands with NOPERM.

Can I give each tenant its own maxmemory-policy on a shared instance? No. maxmemory-policy and maxmemory are instance-global in open-source Redis. To give a tenant an independent eviction profile or memory budget you must route it to a dedicated instance (or a separate Redis Enterprise database); on a shared instance the policy is one lever for everyone.

Should the client ever retry a NoPermissionError? Never. NOPERM signals a permanent policy decision — the tenant crossed a boundary — so retrying only hammers the instance and hides the violation. Retry only ConnectionError/TimeoutError and AuthenticationError during propagation lag; re-raise NOPERM immediately and alert on it.

Do ACL rules survive a restart or failover? Only if persisted. Rules created with ACL SETUSER live in memory until you run ACL SAVE (with an aclfile configured) or bake them into redis.conf; otherwise a restart or a promoted replica without the same aclfile comes up with no tenant users. Manage the ACL file as versioned infrastructure and load it identically on every node.
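
Loading rules identically on every node can be scripted when you apply changes at runtime rather than only through the file. The sketch below assumes one redis-py style async client per node, each authenticated as an account allowed to manage ACLs, and an aclfile configured on every node so that ACL SAVE succeeds. The function name apply_acl_to_nodes and the node labels are hypothetical.

```python
import asyncio


async def apply_acl_to_nodes(nodes: dict, rules: list[str]) -> dict[str, bool]:
    """Apply one ACL SETUSER rule list to every node; report success per node."""

    async def apply(client) -> None:
        await client.execute_command("ACL", "SETUSER", *rules)
        # Persist to the node's aclfile so the user survives a restart.
        # Assumption: aclfile is configured; otherwise this call errors.
        await client.execute_command("ACL", "SAVE")

    results = await asyncio.gather(
        *(apply(client) for client in nodes.values()),
        return_exceptions=True,  # one unreachable node must not hide the others
    )
    return {name: not isinstance(res, Exception) for name, res in zip(nodes, results)}
```

Any node reported as False is a node where a tenant user may be missing or stale, so treat a partial result as a failed rollout rather than a warning.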

How does isolation differ on a sharded cluster versus a single shard? The rules are the same, but open-source Redis does not propagate ACL users between nodes: ACL SETUSER affects only the node that receives it, so every primary and replica must load the same ACL definitions. A tenant's keys can also land on any shard, and enumeration or invalidation now fans out per node. Keep the boundary at the ACL layer and pin bulk operations to MATCH tenant:<id>:* so a per-shard SCAN never returns a neighbor's keys.
