Search Autocomplete System Design
Design the backend for a search autocomplete system that returns suggestions within 100ms as the user types.
The core data structure is a trie (prefix tree) that stores popular search queries. Each node represents a character, and nodes are annotated with frequency scores so you can return the top-k suggestions for any prefix.

To meet the 100ms target at scale, you partition the trie across multiple servers by prefix range (a-f on server 1, g-m on server 2, etc.) and keep the working set in memory. A separate offline pipeline aggregates search logs, computes query frequencies, and rebuilds the trie periodically.

When the user types, the frontend debounces keystrokes (typically 150-300ms delay) and sends the prefix to an API gateway, which routes it to the correct shard. The shard walks the trie to the prefix node and returns the top-k results. A CDN or edge cache sits in front for the most common prefixes, since queries like 'how' or 'what' are nearly identical across users.
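The trie described above can be sketched in a few dozen lines. This is a minimal illustration, not a production design: the class and method names are hypothetical, and it precomputes a top-k cache at every node during the offline rebuild so that serving is just a prefix walk plus a cache read.

```python
import heapq


class TrieNode:
    def __init__(self):
        self.children = {}
        self.freq = 0    # > 0 only for nodes that end a complete query
        self.top_k = []  # cached (freq, query) pairs for this prefix


class AutocompleteTrie:
    """Prefix trie with per-node top-k caches, rebuilt offline from query logs."""

    def __init__(self, k=5):
        self.root = TrieNode()
        self.k = k

    def insert(self, query, freq):
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
        node.freq = freq

    def build_top_k(self):
        # Post-order walk: each node's cache merges its children's caches,
        # so this runs once per offline rebuild, not per request.
        def dfs(node, prefix):
            candidates = []
            if node.freq:
                candidates.append((node.freq, prefix))
            for ch, child in node.children.items():
                candidates.extend(dfs(child, prefix + ch))
            node.top_k = heapq.nlargest(self.k, candidates)
            return node.top_k

        dfs(self.root, "")

    def suggest(self, prefix):
        # Serving path: walk to the prefix node, return its precomputed cache.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        return [q for _, q in node.top_k]
```

The tradeoff is the one the offline pipeline implies: suggestions are as stale as the last rebuild, but each request does O(len(prefix)) work instead of scanning a subtree.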
Autocomplete is one of the most common system design interview questions because it touches on data structures, caching, ranking, and latency budgets all at once. A strong answer addresses the full pipeline: data collection, offline processing, serving infrastructure, and client-side optimizations. Senior candidates should discuss how to handle personalization, freshness of trending queries, and multi-language support. The interviewer is looking for the ability to make tradeoffs between serving speed and data freshness.
- Proposing a simple SQL LIKE query against a database, which cannot meet the latency requirement at scale
- Forgetting client-side optimizations like debouncing and local caching, which reduce backend load significantly
- Not discussing how the trie data stays fresh or how you handle trending queries
- Ignoring the ranking component and treating all prefix matches as equally relevant
- Overcomplicating the design with ML ranking models before addressing the basic infrastructure
- Not mentioning how to partition or replicate the trie when it outgrows a single server
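To address the last pitfall concretely, routing by prefix range can be as simple as a lookup table keyed on the first character. This is a hedged sketch with made-up shard names, mirroring the a-f / g-m split mentioned earlier; a real system would also handle replication and rebalancing as ranges grow unevenly.

```python
# Hypothetical prefix-range shard map. Hot ranges can be split further
# (e.g. "h" alone on its own shard) when traffic is skewed.
SHARD_RANGES = [
    ("a", "f", "shard-1"),
    ("g", "m", "shard-2"),
    ("n", "s", "shard-3"),
    ("t", "z", "shard-4"),
]


def route(prefix):
    """Pick the shard responsible for a prefix by its first character."""
    first = prefix[:1].lower()
    for lo, hi, shard in SHARD_RANGES:
        if lo <= first <= hi:
            return shard
    return "shard-default"  # digits, punctuation, non-Latin scripts
```

A range map like this is easy to reason about but skews under hot prefixes; consistent hashing of prefixes is the usual alternative when load balance matters more than range locality.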
- How would you handle trending or breaking-news queries that spike suddenly?
- How do you personalize autocomplete results per user without adding latency?
- What happens if one of the trie shards goes down? How do you ensure availability?
- How would you filter out offensive or inappropriate suggestions?
- How does the system handle multiple languages and character sets?