
2026-01-22

Serverless Agents Are Underrated

#serverless #aws #architecture #data-pipelines

Serverless architecture — cloud to containers

When people hear "serverless," they think of simple webhook handlers and CRUD endpoints. Fair enough — that's where most serverless usage lives. But our data pipeline — the one that processes large heterogeneous data packages with 65k+ pattern definitions — is fully serverless. Lambda functions, ECS Fargate tasks, S3 event triggers, no EC2 instances anywhere.

People are often surprised by this. "Can you really run complex agent workloads on serverless?" Yes, but not the way most people try to build them.

Why We Went Serverless

The workload profile made the decision for us. Our pipeline processes data packages that arrive irregularly. Some days we get 200. Some days we get 5. Some packages are tiny. Some are enormous. The processing for each package can take anywhere from 30 seconds to 20 minutes depending on size and complexity.

On EC2, you have two bad options: provision for peak load and pay for idle capacity 90% of the time, or provision for average load and watch your system fall over during spikes.

Serverless gives you a third option: provision for nothing and let the platform scale to whatever the current load requires. When 200 packages arrive in an hour, 200 Lambda functions spin up. When nothing arrives for three hours, you're paying for nothing.

The cost math was decisive. Our estimated EC2 bill for equivalent throughput was roughly 4x what we pay on serverless, primarily because of all the idle time between bursts.
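A back-of-the-envelope version of that comparison looks like the sketch below. Every number here is an illustrative assumption (instance price, fleet size, package volume), not our actual bill; only the Lambda GB-second price is a published AWS rate, and the real ratio depends heavily on instance sizing and how much Fargate time you consume.

```python
# Illustrative cost comparison. All workload numbers are hypothetical
# assumptions; the point is the shape of the math, not the exact figures.

EC2_HOURLY = 0.17          # assumed on-demand price for a mid-size instance
INSTANCES_FOR_PEAK = 4     # assumed fleet sized to survive burst load
HOURS_PER_MONTH = 730

LAMBDA_GB_SECOND = 0.0000166667   # Lambda price per GB-second (us-east-1)
PACKAGES_PER_MONTH = 3000         # assumed arrival volume
AVG_SECONDS_PER_PACKAGE = 300     # assumed average processing time
AVG_MEMORY_GB = 2                 # assumed average memory allocation

# EC2 bills for every hour the fleet exists, busy or idle.
ec2_monthly = EC2_HOURLY * INSTANCES_FOR_PEAK * HOURS_PER_MONTH

# Lambda bills only for GB-seconds actually consumed.
lambda_monthly = (LAMBDA_GB_SECOND * AVG_MEMORY_GB
                  * AVG_SECONDS_PER_PACKAGE * PACKAGES_PER_MONTH)

print(f"EC2 (always on):          ${ec2_monthly:.0f}/month")
print(f"Serverless (pay per use): ${lambda_monthly:.0f}/month")
```

The idle-time term dominates: the EC2 figure is the same whether 200 packages arrive or 5, while the serverless figure scales linearly with actual work done.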

The Architecture

The pipeline is event-driven end to end:

  1. Data package lands in S3. This fires an S3 event notification.
  2. Intake Lambda triggers. Validates the package, registers it in our tracking system, and decomposes it into individual components. Each component gets written back to S3 with metadata.
  3. Classification Lambda triggers (per component). Examines each component and determines which analysis chains it needs to go through. Writes a processing plan to a coordination queue.
  4. Analysis workers execute (Fargate tasks for heavy work, Lambda for light work). Each worker runs a specific analysis pass — pattern matching, entity extraction, structural analysis. Results go back to S3.
  5. Aggregation Lambda triggers when all analysis workers for a package complete. Correlates results across components, applies priority scoring, and produces the final output.

Every step is triggered by an event. There's no orchestrator process sitting in a loop polling for work. The architecture is the orchestration.
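Step 2 above, the intake Lambda, can be sketched as follows. This is a simplified illustration, not our production handler: the helper names and the returned fields are invented for the example, and the real validation and decomposition logic is elided. The S3 event shape is the standard notification payload Lambda receives.

```python
# Sketch of the intake Lambda (step 2). Handler structure is illustrative;
# validation, registration, and decomposition are stubbed out.
import json

def parse_s3_event(event: dict) -> list[tuple[str, str]]:
    """Extract (bucket, key) pairs from an S3 event notification."""
    return [
        (rec["s3"]["bucket"]["name"], rec["s3"]["object"]["key"])
        for rec in event.get("Records", [])
    ]

def handler(event, context=None):
    """Intake Lambda: validate, register, and decompose each package."""
    results = []
    for bucket, key in parse_s3_event(event):
        # In the real pipeline: validate the package, register it in the
        # tracking system, and write each component back to S3 with
        # metadata -- which in turn fires the classification Lambda.
        results.append({"bucket": bucket, "key": key, "status": "registered"})
    return {"statusCode": 200, "body": json.dumps(results)}
```

Note that the handler never calls the next stage directly; writing components back to S3 is what triggers classification, which is what "the architecture is the orchestration" means in practice.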

Lambda vs. Fargate: When to Use Which

This is the decision that trips people up. Lambda has a 15-minute execution limit and a 10GB memory ceiling. Fargate has neither constraint but takes longer to start and costs more per second of compute.

Our rule of thumb:

  • Lambda for anything that completes in under 5 minutes and needs less than 3GB of memory. This covers classification, lightweight analysis passes, result aggregation, and all coordination logic.
  • Fargate for heavy processing that might take 10-20 minutes or needs significant memory. Deep pattern matching against 65k+ patterns, large file parsing, and any analysis that loads substantial reference data into memory.

The key insight is that you don't have to choose one or the other. A single pipeline execution might involve 15 Lambda invocations and 3 Fargate tasks, all triggered by events, all running in parallel where possible.
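The rule of thumb above reduces to a small routing function. The thresholds mirror the numbers in the text; the function name and signature are illustrative.

```python
# Routing sketch for the Lambda-vs-Fargate rule of thumb.

LAMBDA_MAX_SECONDS = 5 * 60    # prefer Lambda under 5 minutes...
LAMBDA_MAX_MEMORY_GB = 3       # ...and under 3GB of memory

def choose_runtime(est_seconds: float, est_memory_gb: float) -> str:
    """Pick a compute target for one analysis pass based on its
    estimated duration and memory footprint."""
    if est_seconds <= LAMBDA_MAX_SECONDS and est_memory_gb <= LAMBDA_MAX_MEMORY_GB:
        return "lambda"
    return "fargate"
```

So a 45-second classification pass routes to Lambda, while a 15-minute deep pattern match that loads large reference data routes to Fargate. In our pipeline this decision is effectively baked in per analysis type rather than computed at runtime, but the logic is the same.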

Cold Starts: The Real Story

Everyone asks about cold starts. Here's the truth: they matter less than you think for batch processing workloads, and more than you think for real-time ones.

Our pipeline processes data packages, not real-time requests. If the first Lambda in the chain takes 2 extra seconds to cold start, that's 2 seconds on a process that takes minutes. Nobody notices.

Where cold starts do bite us is the Fargate tasks. A Fargate task can take 30-60 seconds to pull the container image and start. For a task that runs for 15 minutes, that's acceptable. For a task that runs for 30 seconds, the startup overhead is significant.

Our mitigation: we keep a small pool of Fargate tasks warm during business hours using a simple keep-alive mechanism. After hours, we let them scale to zero and accept the cold start penalty. The cost of keeping a few tasks warm during peak hours is trivial compared to running full EC2 instances 24/7.
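The keep-alive policy amounts to a scheduled job that sets the desired count of warm tasks by hour. The sketch below is illustrative: the pool size and business-hours window are assumptions, and in practice the returned value would feed an ECS service's desired count rather than be used directly.

```python
# Sketch of the warm-pool policy. Pool size and hours are assumptions.
from datetime import datetime

BUSINESS_HOURS = range(8, 18)   # assumed 08:00-18:00 local time
WARM_POOL_SIZE = 2              # assumed small warm pool

def desired_warm_tasks(now: datetime) -> int:
    """Keep a few Fargate tasks warm during business hours;
    scale to zero (and eat the cold start) after hours."""
    return WARM_POOL_SIZE if now.hour in BUSINESS_HOURS else 0
```

A scheduled EventBridge rule invoking something like this every few minutes is enough; there is no always-on process making the decision.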

State Management Without Servers

This was the trickiest part. Serverless functions are stateless by design, but our pipeline needs to track state: which components have been processed, which analysis passes are complete, whether the full package is done.

We use a combination of:

  • S3 object metadata for component-level state. Each component's processing status is stored as metadata on the S3 object itself.
  • DynamoDB for package-level tracking. A lightweight record per package that tracks overall progress, timing, and the final status.
  • SQS for coordination. When all components of a package have been analyzed, a completion message triggers the aggregation step.

No Redis, no Postgres, no state server. Everything is managed through AWS-native serverless services. The entire infrastructure is defined in Terraform and can be torn down and rebuilt from scratch in minutes.
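The package-level completion check is the piece worth sketching, since it is where people usually reach for a stateful server. In DynamoDB this is an atomic counter update; below, a plain dict stands in for the table so the logic is visible, and the field names are illustrative.

```python
# Sketch of package-level completion tracking. In production this would be
# a DynamoDB UpdateItem with an atomic ADD on the `completed` counter; a
# dict stands in for the table here. Field names are illustrative.

def record_component_done(table: dict, package_id: str, total: int) -> bool:
    """Mark one component of a package complete.

    Returns True only for the caller whose update makes the counter
    reach `total`, so the completion message (and therefore the
    aggregation Lambda) fires exactly once per package.
    """
    item = table.setdefault(package_id, {"completed": 0, "total": total})
    item["completed"] += 1
    return item["completed"] == item["total"]
```

Whichever worker gets `True` back drops the completion message on SQS, and the aggregation Lambda takes it from there. The atomicity of the counter update is what makes this safe when analysis workers finish concurrently.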

When Serverless Doesn't Work for Agents

We'd be dishonest if we said serverless works for everything. Here's where we wouldn't use it:

Long-running conversational agents. If your agent maintains a conversation over minutes or hours, serverless doesn't make sense. You need a persistent process. This is why Beacon runs on traditional infrastructure — a Slack bot needs to be always-on and maintain conversation state.

GPU workloads. If your agent needs GPU access for local model inference, serverless options are limited and expensive. Run your own instances.

Sub-second latency requirements. If the user is waiting for a response and cold starts are unacceptable, serverless adds unpredictable latency. Pre-warming helps but adds complexity.

High-frequency, steady-state workloads. If you're processing a constant, predictable stream of data 24/7, reserved EC2 instances are probably cheaper. Serverless wins when the workload is bursty.

The Terraform Factor

One underappreciated benefit of serverless: your entire infrastructure is code. Our Torrent pipeline is defined in ~800 lines of Terraform. Lambda functions, Fargate task definitions, S3 buckets, event triggers, IAM roles, CloudWatch alarms — all of it.
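To give a flavor of what those lines look like, here is the shape of an S3-to-Lambda trigger in Terraform. This fragment is illustrative only: the resource names are hypothetical, not our actual configuration, and a real setup also needs an `aws_lambda_permission` granting S3 the right to invoke the function.

```hcl
# Illustrative fragment; resource names are hypothetical.
resource "aws_s3_bucket_notification" "intake" {
  bucket = aws_s3_bucket.packages.id

  lambda_function {
    lambda_function_arn = aws_lambda_function.intake.arn
    events              = ["s3:ObjectCreated:*"]
  }
}
```

A handful of resources like this one is the entire wiring of the pipeline; there is no deploy script that glues stages together at runtime.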

A new engineer can read those 800 lines and understand the complete architecture. They can deploy a complete copy to a test account with a single command. They can diff the infrastructure between staging and production.

Try doing that with a fleet of EC2 instances, load balancers, auto-scaling groups, and configuration management tools. Serverless doesn't just simplify your runtime — it simplifies your operations.

The Bottom Line

Serverless isn't the right choice for every agent workload. But for bursty, event-driven processing — the kind where you're handling data packages, documents, or batch jobs that arrive irregularly — it's hard to beat. You pay for exactly what you use, you scale without intervention, and your infrastructure is a text file.

We've been running Torrent in production for months. The pipeline has processed thousands of data packages. Our operational burden is approximately zero — there are no servers to patch, no instances to monitor, no capacity to plan. The closest thing we have to an operations task is reviewing the CloudWatch dashboards once a week to make sure cost trends are where we expect them.

If your workload fits the pattern, give serverless a serious look. The limitations are real, but the benefits — for the right use case — are transformative.