2026-06-27

9 min read

Streaming an AI Agent Without a Function Timeout

An AI agent and a serverless function want different things. The agent wants to think, call a tool, stream some tokens, call another tool, and keep the connection open the whole time, which can be tens of seconds or more. A lot of serverless tiers want the opposite: do your work quickly and return, because the invocation has an execution cap. Put them together and you get the failure everyone who has shipped an agent has seen at least once: the response is still streaming when the platform decides time is up and closes the socket.

This is the second post in our series on Neon Functions. The first was about where your compute runs relative to your data; this one is about how long it is allowed to keep talking. Neon Functions are built to hold long-lived streaming connections, so a slow agent or a long stream is a normal request, not a fight with a timeout. To show it rather than assert it, I deployed two endpoints and measured them.

(Companion repo, deploy it yourself: The-DevOps-Daily/neon-streaming-demo.)

Two endpoints, one config

The whole backend is a single Hono function with the AI Gateway switched on in neon.ts:

import { defineConfig } from '@neondatabase/config/v1';

export default defineConfig({
  preview: {
    aiGateway: true,
    functions: {
      stream: { name: 'streaming demo', source: 'src/index.ts' },
    },
  },
});

The streaming itself is ordinary Hono. The first endpoint holds a server-sent-events connection open and emits a tick every second, for as many seconds as you ask:

import { streamSSE } from 'hono/streaming';

app.get('/long-stream', (c) => {
  const seconds = Math.min(600, Math.max(1, Number(c.req.query('seconds') ?? '90')));
  const start = Date.now();
  return streamSSE(c, async (stream) => {
    for (let i = 1; i <= seconds; i++) {
      await stream.writeSSE({ event: 'tick', data: JSON.stringify({ tick: i, elapsed_ms: Date.now() - start }) });
      await stream.sleep(1000);
    }
    await stream.writeSSE({ event: 'done', data: JSON.stringify({ ticks: seconds }) });
  });
});

It streamed for 90 seconds without being asked twice

I called /long-stream?seconds=90 and let it run. It ticked once a second, on the second, for a minute and a half, and closed cleanly on its own terms:

Ninety seconds is not a magic number; I picked it because it is comfortably past the execution cap a lot of serverless functions ship with by default, and the function did not care. No special mode, no config flag, no "streaming response" opt-in. The handler just held the connection.

Note

To be precise about the comparison: this is about defaults and design, not "infinite versus finite." Traditional serverless functions cap a single invocation low by default (Vercel's Hobby tier at 10 seconds, Pro at 60), which is exactly where a slow agent gets cut off. Platforms do offer longer runs when you reach for them: Vercel's Fluid Compute extends to 300 to 1800 seconds, and AWS Lambda allows up to 15 minutes. The point is that long-lived streaming is the default behaviour of a Neon Function, not a setting you discover after your agent times out in production.

Now stream an actual agent

A ticking clock proves the connection lasts. The real workload is a model streaming tokens. The second endpoint sends the prompt to the Neon AI Gateway with stream: true and relays each token to the caller as it arrives:

const upstream = await fetch(`${process.env.NEON_AI_GATEWAY_BASE_URL}/ai-gateway/mlflow/v1/chat/completions`, {
  method: 'POST',
  headers: { Authorization: `Bearer ${process.env.NEON_AI_GATEWAY_TOKEN}`, 'content-type': 'application/json' },
  body: JSON.stringify({ model: 'gpt-5-nano', stream: true, messages }),
});
// ...parse the upstream SSE and re-emit each delta as it lands
await stream.writeSSE({ event: 'token', data: JSON.stringify({ delta }) });

Calling it with a small prompt, the first token came back at 466 ms and the full 62-token reply finished at about 2.0 seconds. The reader sees the answer forming almost immediately instead of waiting two seconds for a wall of text:

Two seconds is short because the model and the prompt are small. The reason this matters is that real agents are not short: they make several model calls, run tools between them, and a full run is routinely tens of seconds. On a platform that caps invocations at 10 or 60 seconds, that run is a gamble against the clock. On a function built to hold the stream, it is just a request that takes a while.

What it is, and what it is not

Warning

Private preview, one region, new projects only. Everything is in AWS us-east-2 and only works on projects created inside the preview. Plan accordingly before building on it.

Two more things worth knowing before you reach for this:

It is request/response, even when the response is long. These functions answer a caller and can keep streaming to it for a long time, including over WebSockets and SSE. They are not a background job runner. Work that should outlive the request (queues, retries, scheduled tasks) belongs to something like Inngest or QStash.
Idle functions can be evicted. A long active stream is fine; a function sitting idle may be scaled to zero and cold-start on the next call. That is the usual serverless tradeoff, not a streaming-specific one.

Who this is for

If you are shipping anything agentic (a chat assistant, a tool-using agent, a long generation, an MCP server holding a session), the timeout is the wall you hit first, and the usual workaround is to learn your platform's extended-duration mode and hope you configured it right. A function that holds the stream by default removes that whole category of "why did my response get cut off" debugging.

The full demo, both endpoints, is here. The streaming logic is about 80 lines:

Next in the series: a Postgres-backed MCP server in about twenty lines, and preview environments that include the backend, not just the frontend. The strategy behind all of it is in Neon is becoming a backend platform, not just Postgres.

Proudly Sponsored By

We earn commissions when you shop through the links below.

DigitalOcean

Cloud infrastructure for developers

Simple, reliable cloud computing designed for developers

Learn more

DevDojo

Developer community & tools

Join a community of developers sharing knowledge and tools

Learn more

SMTPfast

Developer-first email API

Send transactional and marketing email through a clean REST API. Detailed logs, webhooks, and embeddable signup forms in one dashboard.

Learn more

QuizAPI

Developer-first quiz platform

Build, generate, and embed quizzes with a powerful REST API. AI-powered question generation and live multiplayer.

Learn more

Want to support DevOps Daily and reach thousands of developers?

Become a Sponsor

Published: 2026-06-27|Last updated: 2026-06-27T14:00:00Z

Also worth your time on this topic

Article

I Gave an AI Agent a Database, Compute, Storage, and Models From One CLI

An AI agent usually needs four accounts: a database, somewhere to run, object storage, and a model provider. I wired all four from a single Neon credential and had a deployed image-generating agent in a few minutes. Here is the actual build log, the config that ties it together, and the honest caveats.

Exercise

Complete Web Server Automation with Ansible

Build a comprehensive Ansible playbook to automate web server deployment, configuration, and security hardening across multiple environments.

75 minutes

Streaming an AI Agent Without a Function Timeout

Two endpoints, one config

It streamed for 90 seconds without being asked twice

Now stream an actual agent

What it is, and what it is not

Who this is for

DigitalOcean

DevDojo

SMTPfast

QuizAPI

Tags

Related Posts

Compute That Lives on Your Database Branch

I Gave an AI Agent a Database, Compute, Storage, and Models From One CLI

Neon Is Becoming a Backend Platform, Not Just Postgres

Also worth your time on this topic

I Gave an AI Agent a Database, Compute, Storage, and Models From One CLI

Complete Web Server Automation with Ansible