Skip to main content

Username Availability with Bloom Filters

Explain how you'd check username availability for a service with billions of users without hitting the database on every keystroke.

mid
intermediate
System Design
Question

Explain how you'd check username availability for a service with billions of users without hitting the database on every keystroke.

Answer

Use a Bloom filter as the first layer. A Bloom filter is a space-efficient probabilistic data structure that can tell you with certainty that a username is available (not in the set) or with high probability that it is taken. You load all existing usernames into the filter at startup. When a user types a username, you check the Bloom filter first. If the filter says 'not present,' the username is definitely available and you skip the database entirely. If the filter says 'possibly present,' you then query the database to confirm. For billions of usernames, a Bloom filter with a 1% false positive rate needs roughly 1.2 GB of memory, which fits comfortably on a single server. This eliminates the vast majority of database lookups.

Why This Matters

This question tests whether candidates can identify the right data structure for a probabilistic membership check at scale. Bloom filters appear in many production systems: Chrome uses one to check malicious URLs, databases use them to skip unnecessary disk reads, and CDNs use them for cache filtering. The interviewer wants to see that you understand the false-positive tradeoff, can reason about memory usage, and know when an approximate answer is good enough to avoid an expensive exact lookup.

Code Examples

Simple Bloom filter implementation

python

Username availability API with Bloom filter layer

python
Common Mistakes
  • Saying you would query the database on every keystroke for real-time feedback
  • Confusing false positives and false negatives: Bloom filters never produce false negatives, only false positives
  • Not knowing how to estimate the memory required for a given number of items and false positive rate
  • Forgetting that Bloom filters do not support deletion without switching to a counting variant
  • Proposing a simple HashSet, which would require far more memory for billions of entries
Follow-up Questions
Interviewers often ask these as follow-up questions
  • What happens when a new user registers? How do you keep the Bloom filter in sync?
  • Can you remove an element from a Bloom filter? If not, what alternatives exist?
  • How would you handle the filter across multiple application servers?
  • What is a counting Bloom filter, and when would you use one instead?
  • How does a cuckoo filter compare to a Bloom filter for this use case?
Tags
system-design
bloom-filters
data-structures
scalability
performance
Sponsored
Carbon Ads

More System Design interview questions

Also worth your time on this topic