Search Autocomplete System Design
Design the backend for a search autocomplete system that returns suggestions within 100ms as the user types.
The core data structure is a trie (prefix tree) that stores popular search queries. Each node represents a character, and nodes are annotated with frequency scores so you can return the top-k suggestions for any prefix.

To meet the 100ms target at scale, you partition the trie across multiple servers by prefix range (a-f on server 1, g-m on server 2, etc.) and keep the working set in memory. A separate offline pipeline aggregates search logs, computes query frequencies, and rebuilds the trie periodically.

When the user types, the frontend debounces keystrokes (typically 150-300ms delay) and sends the prefix to an API gateway, which routes it to the correct shard. The shard walks the trie to the prefix node and returns the top-k results. A CDN or edge cache sits in front for the most common prefixes, since queries like 'how' or 'what' are nearly identical across users.
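The trie described above can be sketched in a few dozen lines. This is a minimal illustration, not a production design: the class and method names are hypothetical, and it precomputes a top-k cache at every node during the offline rebuild so that serving is just a prefix walk plus a cache read.

```python
import heapq


class TrieNode:
    def __init__(self):
        self.children = {}
        self.freq = 0    # > 0 only for nodes that end a complete query
        self.top_k = []  # cached (freq, query) pairs for this prefix


class AutocompleteTrie:
    """Prefix trie with per-node top-k caches, rebuilt offline from query logs."""

    def __init__(self, k=5):
        self.root = TrieNode()
        self.k = k

    def insert(self, query, freq):
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
        node.freq = freq

    def build_top_k(self):
        # Post-order walk: each node's cache merges its children's caches,
        # so this runs once per offline rebuild, not per request.
        def dfs(node, prefix):
            candidates = []
            if node.freq:
                candidates.append((node.freq, prefix))
            for ch, child in node.children.items():
                candidates.extend(dfs(child, prefix + ch))
            node.top_k = heapq.nlargest(self.k, candidates)
            return node.top_k

        dfs(self.root, "")

    def suggest(self, prefix):
        # Serving path: walk to the prefix node, return its precomputed cache.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        return [q for _, q in node.top_k]
```

The tradeoff is the one the offline pipeline implies: suggestions are as stale as the last rebuild, but each request does O(len(prefix)) work instead of scanning a subtree.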
Autocomplete is one of the most common system design interview questions because it touches on data structures, caching, ranking, and latency budgets all at once. A strong answer addresses the full pipeline: data collection, offline processing, serving infrastructure, and client-side optimizations. Senior candidates should discuss how to handle personalization, freshness of trending queries, and multi-language support. The interviewer is looking for the ability to make tradeoffs between serving speed and data freshness.
- Proposing a simple SQL LIKE query against a database, which cannot meet the latency requirement at scale
- Forgetting client-side optimizations like debouncing and local caching, which reduce backend load significantly
- Not discussing how the trie data stays fresh or how you handle trending queries
- Ignoring the ranking component and treating all prefix matches as equally relevant
- Overcomplicating the design with ML ranking models before addressing the basic infrastructure
- Not mentioning how to partition or replicate the trie when it outgrows a single server
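To address the last pitfall concretely, routing by prefix range can be as simple as a lookup table keyed on the first character. This is a hedged sketch with made-up shard names, mirroring the a-f / g-m split mentioned earlier; a real system would also handle replication and rebalancing as ranges grow unevenly.

```python
# Hypothetical prefix-range shard map. Hot ranges can be split further
# (e.g. "h" alone on its own shard) when traffic is skewed.
SHARD_RANGES = [
    ("a", "f", "shard-1"),
    ("g", "m", "shard-2"),
    ("n", "s", "shard-3"),
    ("t", "z", "shard-4"),
]


def route(prefix):
    """Pick the shard responsible for a prefix by its first character."""
    first = prefix[:1].lower()
    for lo, hi, shard in SHARD_RANGES:
        if lo <= first <= hi:
            return shard
    return "shard-default"  # digits, punctuation, non-Latin scripts
```

A range map like this is easy to reason about but skews under hot prefixes; consistent hashing of prefixes is the usual alternative when load balance matters more than range locality.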
- How would you handle trending or breaking-news queries that spike suddenly?
- How do you personalize autocomplete results per user without adding latency?
- What happens if one of the trie shards goes down? How do you ensure availability?
- How would you filter out offensive or inappropriate suggestions?
- How does the system handle multiple languages and character sets?