Streaming an AI Agent Without a Function Timeout

An AI agent and a serverless function want different things. The agent wants to think, call a tool, stream some tokens, call another tool, and keep the connection open the whole time, which can be tens of seconds or more. A lot of serverless tiers want the opposite: do your work quickly and return, because the invocation has an execution cap. Put them together and you get the failure everyone who has shipped an agent has seen at least once: the response is still streaming when the platform decides time is up and closes the socket.

This is the second post in our series on Neon Functions. The first was about where your compute runs relative to your data; this one is about how long it is allowed to keep talking. Neon Functions are built to hold long-lived streaming connections, so a slow agent or a long stream is a normal request, not a fight with a timeout. To show it rather than assert it, I deployed two endpoints and measured them.

(Companion repo, deploy it yourself: The-DevOps-Daily/neon-streaming-demo.)

Two endpoints, one config

The whole backend is a single Hono function with the AI Gateway switched on in neon.ts:

import { defineConfig } from '@neondatabase/config/v1';

export default defineConfig({
  preview: {
    aiGateway: true,
    functions: {
      stream: { name: 'streaming demo', source: 'src/index.ts' },
    },
  },
});

The streaming itself is ordinary Hono. The first endpoint holds a server-sent-events connection open and emits a tick every second, for as many seconds as you ask:

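import { Hono } from 'hono';
import { streamSSE } from 'hono/streaming';

const app = new Hono();

// GET /long-stream?seconds=N holds one SSE connection open and emits a "tick" every second.
app.get('/long-stream', (c) => {
  // Clamp to 1..600 whole seconds; fall back to 90 if the query is missing or not a number.
  const requested = Number(c.req.query('seconds') ?? '90');
  const seconds = Number.isFinite(requested) ? Math.min(600, Math.max(1, Math.floor(requested))) : 90;
  const start = Date.now();
  return streamSSE(c, async (stream) => {
    for (let i = 1; i <= seconds; i++) {
      await stream.writeSSE({ event: 'tick', data: JSON.stringify({ tick: i, elapsed_ms: Date.now() - start }) });
      await stream.sleep(1000);
    }
    // A final event tells the client the stream ended on purpose, not because of a timeout.
    await stream.writeSSE({ event: 'done', data: JSON.stringify({ ticks: seconds }) });
  });
});

// Standard Hono entry point; assumed to be what the function runtime picks up from src/index.ts.
export default app;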
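For reference, this is roughly what each writeSSE call puts on the wire. The function below is a hand-rolled sketch of the event framing described in the server-sent events spec (an optional event: line, one data: line per line of payload, then a blank line that ends the event); formatSSE is a hypothetical helper, not Hono's source.

```typescript
// Hypothetical helper: frames one server-sent event as the SSE spec describes it.
function formatSSE(event: string | undefined, data: string): string {
  const lines: string[] = [];
  // The "event:" field is optional; clients treat unnamed events as "message".
  if (event) lines.push(`event: ${event}`);
  // Multi-line payloads become several "data:" lines; the client rejoins them with "\n".
  for (const line of data.split('\n')) lines.push(`data: ${line}`);
  // The trailing blank line is what tells the client the event is complete.
  return lines.join('\n') + '\n\n';
}

// A tick from /long-stream, with fixed values for illustration.
console.log(JSON.stringify(formatSSE('tick', JSON.stringify({ tick: 1, elapsed_ms: 0 }))));
```

Because the framing is this small, nothing about a long stream is special on the wire: a 90-second run is just 90 of these blocks plus a done event.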

It streamed for 90 seconds without being asked twice

I called /long-stream?seconds=90 and let it run. It ticked once a second, on the second, for a minute and a half, and closed cleanly on its own terms.

Ninety seconds is not a magic number; I picked it because it is comfortably past the execution cap a lot of serverless functions ship with by default, and the function did not care. No special mode, no config flag, no "streaming response" opt-in. The handler just held the connection.
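To watch the stream from code rather than a browser tab, the client has to cope with network chunks that split an event anywhere. A minimal sketch, assuming "\n" line endings (which is what the handler emits) and ignoring the id and retry fields; SSEParser, watchLongStream and the baseUrl parameter are hypothetical names, and fetch needs Node 18+ or a browser.

```typescript
type SSEEvent = { event: string; data: string };

// Incremental SSE parser: buffers text until a blank line marks the end of an event.
class SSEParser {
  private buffer = '';

  push(chunk: string): SSEEvent[] {
    this.buffer += chunk;
    const events: SSEEvent[] = [];
    let boundary: number;
    while ((boundary = this.buffer.indexOf('\n\n')) !== -1) {
      const block = this.buffer.slice(0, boundary);
      this.buffer = this.buffer.slice(boundary + 2);
      let event = 'message'; // spec default when no "event:" line is sent
      const data: string[] = [];
      for (const line of block.split('\n')) {
        if (line.startsWith(':')) continue; // comment line, e.g. a keep-alive
        const colon = line.indexOf(':');
        const field = colon === -1 ? line : line.slice(0, colon);
        let value = colon === -1 ? '' : line.slice(colon + 1);
        if (value.startsWith(' ')) value = value.slice(1); // one leading space is dropped
        if (field === 'event') event = value;
        else if (field === 'data') data.push(value);
      }
      // Events with no data lines are not dispatched.
      if (data.length > 0) events.push({ event, data: data.join('\n') });
    }
    return events;
  }
}

// Hypothetical client: baseUrl is wherever you deployed the function.
async function watchLongStream(baseUrl: string, seconds: number): Promise<void> {
  const res = await fetch(`${baseUrl}/long-stream?seconds=${seconds}`);
  if (!res.ok || !res.body) throw new Error(`HTTP ${res.status}`);
  const parser = new SSEParser();
  const decoder = new TextDecoder();
  const reader = res.body.getReader();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    for (const ev of parser.push(decoder.decode(value, { stream: true }))) {
      console.log(ev.event, ev.data);
    }
  }
}
```

Parsing by hand keeps the client independent of the browser-only EventSource, which also cannot send custom headers.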

Note

To be precise about the comparison: this is about defaults and design, not "infinite versus finite." Traditional serverless functions set a low cap on a single invocation by default (Vercel's Hobby tier at 10 seconds, Pro at 60), which is exactly where a slow agent gets cut off. Platforms do offer longer runs when you reach for them: Vercel's Fluid Compute extends to 300 to 1800 seconds, and AWS Lambda allows up to 15 minutes. The point is that long-lived streaming is the default behaviour of a Neon Function, not a setting you discover after your agent times out in production.

Now stream an actual agent

A ticking clock proves the connection lasts. The real workload is a model streaming tokens. The second endpoint sends the prompt to the Neon AI Gateway with stream: true and relays each token to the caller as it arrives:

const upstream = await fetch(`${process.env.NEON_AI_GATEWAY_BASE_URL}/ai-gateway/mlflow/v1/chat/completions`, {
  method: 'POST',
  headers: { Authorization: `Bearer ${process.env.NEON_AI_GATEWAY_TOKEN}`, 'content-type': 'application/json' },
  body: JSON.stringify({ model: 'gpt-5-nano', stream: true, messages }),
});
// ...parse the upstream SSE and re-emit each delta as it lands
await stream.writeSSE({ event: 'token', data: JSON.stringify({ delta }) });
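The elided parsing step might look like the sketch below. It assumes the gateway speaks the OpenAI-compatible chat-completions streaming format (suggested by the /chat/completions path, but an assumption): each upstream data: payload is a JSON chunk with its text at choices[0].delta.content, and the stream ends with [DONE]. extractDelta and relay are hypothetical names; in the handler, emit would wrap stream.writeSSE.

```typescript
// Assumed shape of one upstream streaming chunk (OpenAI-compatible).
type ChatChunk = { choices?: { delta?: { content?: string | null } }[] };

// Text carried by one upstream "data:" payload: "" for chunks without text
// (role-only or finish chunks), null at the "[DONE]" sentinel.
function extractDelta(data: string): string | null {
  if (data.trim() === '[DONE]') return null;
  const chunk = JSON.parse(data) as ChatChunk;
  return chunk.choices?.[0]?.delta?.content ?? '';
}

// Forwards each non-empty delta in arrival order and stops at [DONE].
// Returns how many chunks were forwarded.
async function relay(
  payloads: AsyncIterable<string>,
  emit: (delta: string) => Promise<void>,
): Promise<number> {
  let forwarded = 0;
  for await (const data of payloads) {
    const delta = extractDelta(data);
    if (delta === null) break;
    if (delta === '') continue;
    await emit(delta);
    forwarded++;
  }
  return forwarded;
}
```

Awaiting emit before reading the next payload means a slow reader applies backpressure instead of buffering the whole reply in memory.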

With a small prompt, the first token came back at 466 ms and the full 62-token reply finished at about 2.0 seconds, so the reader sees the answer forming almost immediately instead of waiting two seconds for a wall of text.
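Numbers like these come from timing the stream on the client: note when the first delta arrives and when the stream ends. A minimal sketch that wraps any async stream of text deltas; timeStream and StreamTiming are hypothetical names, and the clock is injectable so the logic can be checked without a network. Streamed chunks are not guaranteed to map one-to-one onto tokens, so an exact token count should come from the model's usage data rather than this chunk count.

```typescript
type StreamTiming = { firstTokenMs: number | null; totalMs: number; chunks: number; text: string };

// Measures time to first delta and total duration of a streamed reply.
async function timeStream(
  deltas: AsyncIterable<string>,
  now: () => number = () => performance.now(),
): Promise<StreamTiming> {
  const start = now();
  let firstTokenMs: number | null = null;
  let chunks = 0;
  let text = '';
  for await (const delta of deltas) {
    // Only the first arrival sets time-to-first-token.
    if (firstTokenMs === null) firstTokenMs = now() - start;
    chunks++;
    text += delta;
  }
  return { firstTokenMs, totalMs: now() - start, chunks, text };
}
```

The gap between firstTokenMs and totalMs is the part streaming hides from the reader: the total stays the same, but the wait before anything appears shrinks to the first value.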

Two seconds is short because the model and the prompt are small. The reason this matters is that real agents are not short: they make several model calls, run tools between them, and a full run is routinely tens of seconds. On a platform that caps invocations at 10 or 60 seconds, that run is a gamble against the clock. On a function built to hold the stream, it is just a request that takes a while.
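The shape of such a run is what makes the duration add up: one request, several model calls and tool calls, all reported on the same open stream. A sketch under heavy assumptions; the Step, Model and Tools types, runAgent, and the event names are hypothetical, not part of the demo repo. In the handler, emit would wrap stream.writeSSE.

```typescript
// Hypothetical shapes: a model step either requests a tool or returns a final answer.
type Step = { kind: 'tool'; name: string; args: string } | { kind: 'final'; text: string };
type Model = (history: string[]) => Promise<Step>;
type Tools = Record<string, (args: string) => Promise<string>>;

// Every step is emitted on the same stream, so the request lasts as long as
// all model calls and tool runs combined.
async function runAgent(
  model: Model,
  tools: Tools,
  prompt: string,
  emit: (event: string, data: string) => Promise<void>,
  maxSteps = 8,
): Promise<string> {
  const history = [prompt];
  for (let step = 1; step <= maxSteps; step++) {
    const next = await model(history);
    if (next.kind === 'final') {
      await emit('final', next.text);
      return next.text;
    }
    const tool = tools[next.name];
    if (!tool) throw new Error(`unknown tool: ${next.name}`);
    // Progress events keep the caller informed while a slow tool runs.
    await emit('tool_call', JSON.stringify({ step, name: next.name }));
    const result = await tool(next.args);
    await emit('tool_result', JSON.stringify({ step, result }));
    history.push(`tool ${next.name}: ${result}`);
  }
  throw new Error(`no final answer after ${maxSteps} steps`);
}
```

The maxSteps bound matters more once the platform no longer cuts the run off for you: the loop, not a timeout, decides when to stop.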

What it is, and what it is not

Warning

Private preview, one region, new projects only. Everything is in AWS us-east-2 and only works on projects created inside the preview. Plan accordingly before building on it.

Two more things worth knowing before you reach for this:

  • It is request/response, even when the response is long. These functions answer a caller and can keep streaming to it for a long time, including over WebSockets and SSE. They are not a background job runner. Work that should outlive the request (queues, retries, scheduled tasks) belongs to something like Inngest or QStash.
  • Idle functions can be evicted. A long active stream is fine; a function sitting idle may be scaled to zero and cold-start on the next call. That is the usual serverless tradeoff, not a streaming-specific one.

Who this is for

If you are shipping anything agentic (a chat assistant, a tool-using agent, a long generation, an MCP server holding a session), the timeout is the wall you hit first, and the usual workaround is to learn your platform's extended-duration mode and hope you configured it right. A function that holds the stream by default removes that whole category of "why did my response get cut off" debugging.

The full demo, with both endpoints, is in the companion repo, The-DevOps-Daily/neon-streaming-demo. The streaming logic is about 80 lines.

Next in the series: a Postgres-backed MCP server in about twenty lines, and preview environments that include the backend, not just the frontend. The strategy behind all of it is laid out in "Neon is becoming a backend platform, not just Postgres."

Published: 2026-06-27 | Last updated: 2026-06-27T14:00:00Z