Username Availability with Bloom Filters
Explain how you'd check username availability for a service with billions of users without hitting the database on every keystroke.
Explain how you'd check username availability for a service with billions of users without hitting the database on every keystroke.
Use a Bloom filter as the first layer. A Bloom filter is a space-efficient probabilistic data structure that can tell you with certainty that a username is available (not in the set) or with high probability that it is taken. You load all existing usernames into the filter at startup. When a user types a username, you check the Bloom filter first. If the filter says 'not present,' the username is definitely available and you skip the database entirely. If the filter says 'possibly present,' you then query the database to confirm. For billions of usernames, a Bloom filter with a 1% false positive rate needs roughly 1.2 GB of memory, which fits comfortably on a single server. This eliminates the vast majority of database lookups.
This question tests whether candidates can identify the right data structure for a probabilistic membership check at scale. Bloom filters appear in many production systems: Chrome uses one to check malicious URLs, databases use them to skip unnecessary disk reads, and CDNs use them for cache filtering. The interviewer wants to see that you understand the false-positive tradeoff, can reason about memory usage, and know when an approximate answer is good enough to avoid an expensive exact lookup.
Simple Bloom filter implementation
Username availability API with Bloom filter layer
- Saying you would query the database on every keystroke for real-time feedback
- Confusing false positives and false negatives: Bloom filters never produce false negatives, only false positives
- Not knowing how to estimate the memory required for a given number of items and false positive rate
- Forgetting that Bloom filters do not support deletion without switching to a counting variant
- Proposing a simple HashSet, which would require far more memory for billions of entries
- What happens when a new user registers? How do you keep the Bloom filter in sync?
- Can you remove an element from a Bloom filter? If not, what alternatives exist?
- How would you handle the filter across multiple application servers?
- What is a counting Bloom filter, and when would you use one instead?
- How does a cuckoo filter compare to a Bloom filter for this use case?
More System Design interview questions
Also worth your time on this topic
Instant Credit Card Validation
How does a credit card form validate numbers instantly, before even contacting the bank?
junior
How Does It Work So Fast? The Engineering Behind Instant UI Responses
Credit card validation, username checks, autocomplete, URL shorteners - they all feel instant. Here is what is actually happening under the hood in each case.
Redis Caching Strategies for Scalable Applications
Implement production-ready caching patterns with Redis to dramatically improve application performance and scalability.
70 minutes