Your Agent Doesn't Need a Database. It Needs Redis.

Every agent framework tutorial starts the same way. "First, set up your database." Then they show you a Postgres schema with tables for agent runs, task states, conversation history, and tool call logs. It's a lot of infrastructure before you've written a single line of agent logic.
We did the same thing with our first agent system. Full Postgres schema. Migrations. An ORM. Connection pooling. The works.
Then we watched the database during a production run and realized the access patterns were nothing like a traditional application. The agent was writing a checkpoint every few seconds, reading it back immediately on the next iteration, and never querying historical checkpoints. The conversation history was append-only, read-forward, and aged out after a few hours. The tool call logs were written once and read once — during the same pipeline run — for error recovery.
We were using Postgres like a very expensive, very slow key-value store.
Agent State Isn't Application State
Traditional applications have long-lived, richly relational data. Users have orders. Orders have line items. Line items reference products. You need joins, transactions, constraints, indexes. Postgres is built for this.
Agent state is different:
Checkpoints are single key-value writes that get overwritten on the next iteration. The agent saves "here's where I am" so it can resume after a crash. You don't need ACID transactions for this. You need fast writes and fast reads on a single key.
Ephemeral coordination happens when multiple agents need to share state during a pipeline run. Agent A produces a result. Agent B needs to read it. Agent C is waiting for both to finish. This is pub/sub, not a database query.
Rate limiting and deduplication require atomic counters and set membership checks. "Have I processed this ID before?" is a SISMEMBER call, not a SELECT WHERE id = ?.
TTL-based expiration is native to agent workflows. A checkpoint from yesterday's run is useless. Conversation context older than the current session is useless. You want data to disappear automatically, not accumulate until someone remembers to run a cleanup job.
Redis handles all of these patterns natively. Postgres can be made to handle them, but you're fighting the tool instead of using it.
The Performance Gap Is Enormous
We benchmarked this when we built Agent Runner. Same agent, same workload, Postgres vs. Redis for state management:
- Checkpoint write: Postgres 4-8ms, Redis 0.1-0.3ms
- Checkpoint read: Postgres 2-5ms, Redis 0.1-0.2ms
- Pub/sub notification: Postgres LISTEN/NOTIFY ~5ms, Redis Pub/Sub ~0.2ms
- Set membership check: Postgres indexed query ~3ms, Redis SISMEMBER ~0.1ms
For a single operation, the difference is negligible. For an agent that checkpoints every 3 seconds and makes dozens of coordination calls per pipeline stage, the difference adds up. On a busy day, we estimated the Postgres-backed agent was spending roughly 15% of its wall-clock time on state management. The Redis-backed version spends less than 1%.
What We Store in Redis
Here's our actual Redis usage across Agent Runner and the other systems:
Agent checkpoints (SET agent:{id}:checkpoint {json} EX 86400). Key-value with a 24-hour TTL. When the agent restarts, it reads its last checkpoint and resumes. If there's no checkpoint (first run or expired), it starts fresh. No migration needed. No schema to manage.
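The checkpoint pattern can be sketched in a few lines. This is an illustrative sketch, not Agent Runner's actual code: it assumes a redis-py-style client `r`, and the function names are hypothetical.

```python
import json

CHECKPOINT_TTL = 86400  # 24-hour TTL, matching the EX value above

def save_checkpoint(r, agent_id, state):
    # Overwrite the agent's single checkpoint key and refresh its TTL.
    r.set(f"agent:{agent_id}:checkpoint", json.dumps(state), ex=CHECKPOINT_TTL)

def restore_checkpoint(r, agent_id):
    # Returns None on a first run or after the TTL expired: start fresh.
    raw = r.get(f"agent:{agent_id}:checkpoint")
    return json.loads(raw) if raw else None
```

Note there is no schema anywhere: the checkpoint is whatever JSON the agent chooses to write, and expiry is handled by Redis itself.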
Pipeline coordination (PUBLISH pipeline:{id}:stage_complete {json}). When a pipeline stage finishes, it publishes a message. Downstream stages subscribe and start when their dependencies are met. This replaces polling loops and database flag-checking.
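A minimal sketch of that coordination, assuming a redis-py-style client and `PubSub` object; the function names and message shape are made up for illustration. The publisher fires one message per finished stage; a downstream stage blocks until all of its dependencies have announced.

```python
import json

def announce_stage_complete(r, pipeline_id, stage, payload):
    # One publish per finished stage; live subscribers react immediately.
    channel = f"pipeline:{pipeline_id}:stage_complete"
    r.publish(channel, json.dumps({"stage": stage, "result": payload}))

def wait_for_stages(pubsub, needed):
    # Block until every dependency stage has announced completion.
    remaining = set(needed)
    for msg in pubsub.listen():
        if msg["type"] != "message":
            continue  # skip subscribe/unsubscribe confirmations
        remaining.discard(json.loads(msg["data"])["stage"])
        if not remaining:
            return True
```

One caveat: plain Pub/Sub is fire-and-forget, so a subscriber that connects after the publish misses the message. That's acceptable for ephemeral pipeline runs; if delivery must survive restarts, a Redis Stream is the sturdier choice.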
Deduplication sets (SADD processed:{date} {item_id}). Each pipeline run checks whether it's already processed an item. The set expires at the end of the day. This prevents reprocessing without maintaining a permanent history table.
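A sketch of the dedup check (hypothetical function name, redis-py-style client; the `day` parameter is only there to make the sketch testable). The trick is that SADD returns the number of members actually added, so the insert doubles as the membership check in a single atomic call:

```python
import datetime

def seen_before(r, item_id, day=None):
    # SADD returns 1 when the member is new, 0 when it already existed,
    # so insert and membership check are one atomic round-trip.
    day = day or datetime.date.today().isoformat()
    key = f"processed:{day}"
    added = r.sadd(key, item_id)
    r.expire(key, 86400)  # let the whole day's set age out on its own
    return added == 0
```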
Rate limit counters (INCR ratelimit:{service}:{window} / EXPIRE). When agents call external APIs, we rate-limit per service using atomic counters with expiring windows. Redis does this in a single round-trip.
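A fixed-window version of that counter, sketched with a hypothetical function name. For clarity the INCR and EXPIRE are shown as two calls here; in production you'd pipeline them or use a small Lua script to get the single round-trip described above.

```python
def allow_call(r, service, limit, window_seconds, now):
    # Fixed-window counter: one key per (service, time window).
    window = int(now // window_seconds)
    key = f"ratelimit:{service}:{window}"
    count = r.incr(key)
    if count == 1:
        # First hit in this window: expire the key shortly after
        # the window closes so stale counters vanish automatically.
        r.expire(key, window_seconds + 1)
    return count <= limit
```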
Ephemeral context (HSET context:{run_id} {field} {value} / EXPIRE). During a pipeline run, stages share context — extracted entities, intermediate calculations, flags. The hash expires when the run completes. No cleanup jobs needed.
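The shared-context hash looks like this in sketch form (hypothetical function names, redis-py-style client; the one-hour TTL is an assumed upper bound on a run, not a value from Agent Runner):

```python
RUN_TTL = 3600  # assumed: generous upper bound on a single pipeline run

def put_context(r, run_id, field, value):
    key = f"context:{run_id}"
    r.hset(key, field, value)
    r.expire(key, RUN_TTL)  # refreshed on every write; vanishes when the run goes idle

def get_context(r, run_id, field):
    # Returns None if the field was never written or the run expired.
    return r.hget(f"context:{run_id}", field)
```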
When You Still Need Postgres
We're not anti-database. We use Postgres extensively — just not for agent state. Here's what still lives in Postgres:
Permanent records. The final output of a pipeline run — the structured results, the audit trail, the compliance records. These need to persist for months or years. They need to be queryable. They benefit from relational structure.
Configuration. Pipeline definitions, tool configurations, client settings. These are long-lived, relationally structured, and rarely written. Classic Postgres territory.
Analytics. Aggregate metrics, trend data, usage reporting. This data is queried in complex ways (joins, GROUP BY, window functions) and benefits from Postgres's query planner.
The pattern is: Redis for hot, ephemeral, agent-lifecycle state. Postgres for cold, permanent, business-lifecycle data. They serve different purposes, and trying to use one for both leads to compromises in both directions.
The Operations Argument
There's a common objection: "Adding Redis means another service to operate." Fair point. But in 2026, Redis is table stakes for production infrastructure. If you're running any kind of web application, you probably already have Redis for caching, session storage, or job queues. Adding agent state to an existing Redis instance is a config change, not a new deployment.
If you're starting from scratch and really don't want to run Redis, the managed offerings (ElastiCache, Upstash, Redis Cloud) reduce the operational overhead to nearly zero. You're not managing a Redis cluster. You're calling an endpoint.
And honestly, if the alternative is overloading your Postgres instance with thousands of rapid-fire checkpoint writes, Redis is the thing that reduces your operational burden. A Postgres instance that's 15% occupied with ephemeral agent state is harder to operate than a Postgres instance that's focused on what it's good at, plus a Redis instance doing what it's good at.
How We Set It Up in Agent Runner
Agent Runner ships with Redis as the default state backend. The setup is one environment variable:
AGENT_RUNNER_REDIS_URL=redis://localhost:6379/0
All state operations go through an abstraction layer, so if you really want to use Postgres (or DynamoDB, or SQLite for local development), you can write a custom backend. But the Redis backend is what we use in production, what we test against, and what we recommend.
The abstraction layer exposes five operations: checkpoint, restore, publish, subscribe, and exists. That's it. If your state management needs are more complex than those five operations, you're probably storing things that belong in a database, not in agent state.
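One way a five-operation interface like that could look, sketched as a Python `Protocol` with an in-memory implementation for local tests. The signatures are guesses at the shape, not Agent Runner's actual API:

```python
from typing import Callable, Optional, Protocol

class StateBackend(Protocol):
    # Hypothetical signatures for the five operations named above.
    def checkpoint(self, key: str, state: dict) -> None: ...
    def restore(self, key: str) -> Optional[dict]: ...
    def publish(self, channel: str, message: dict) -> None: ...
    def subscribe(self, channel: str, handler: Callable[[dict], None]) -> None: ...
    def exists(self, key: str) -> bool: ...

class InMemoryBackend:
    # Toy backend that satisfies the protocol without Redis,
    # handy for unit tests or local development.
    def __init__(self):
        self._kv: dict = {}
        self._subs: dict = {}

    def checkpoint(self, key, state):
        self._kv[key] = state

    def restore(self, key):
        return self._kv.get(key)

    def publish(self, channel, message):
        for handler in self._subs.get(channel, []):
            handler(message)

    def subscribe(self, channel, handler):
        self._subs.setdefault(channel, []).append(handler)

    def exists(self, key):
        return key in self._kv
```

The narrow surface is the point: if a backend can be swapped for a dict and two lists, the state it holds is genuinely ephemeral.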
The Point
Don't reach for Postgres by default just because it's the tool you know. Look at your access patterns. If your agent is writing and reading single keys, coordinating with pub/sub, and expiring data on a TTL — that's Redis. Use the right tool for the pattern, not the tool you're most comfortable with.
Your agent will be faster. Your database will be happier. And you'll spend less time writing cleanup jobs for data that should have disappeared on its own.