Username Availability with Bloom Filters

Explain how you'd check username availability for a service with billions of users without hitting the database on every keystroke.

Back to mid questions

mid

intermediate

System Design

Question

Explain how you'd check username availability for a service with billions of users without hitting the database on every keystroke.

Answer

Use a Bloom filter as the first layer. A Bloom filter is a space-efficient probabilistic data structure that can tell you with certainty that a username is available (not in the set) or with high probability that it is taken. You load all existing usernames into the filter at startup. When a user types a username, you check the Bloom filter first. If the filter says 'not present,' the username is definitely available and you skip the database entirely. If the filter says 'possibly present,' you then query the database to confirm. For billions of usernames, a Bloom filter with a 1% false positive rate needs roughly 1.2 GB of memory, which fits comfortably on a single server. This eliminates the vast majority of database lookups.

Why This Matters

This question tests whether candidates can identify the right data structure for a probabilistic membership check at scale. Bloom filters appear in many production systems: Chrome uses one to check malicious URLs, databases use them to skip unnecessary disk reads, and CDNs use them for cache filtering. The interviewer wants to see that you understand the false-positive tradeoff, can reason about memory usage, and know when an approximate answer is good enough to avoid an expensive exact lookup.

Code Examples

Simple Bloom filter implementation

python

Username availability API with Bloom filter layer

python

Common Mistakes

Saying you would query the database on every keystroke for real-time feedback
Confusing false positives and false negatives: Bloom filters never produce false negatives, only false positives
Not knowing how to estimate the memory required for a given number of items and false positive rate
Forgetting that Bloom filters do not support deletion without switching to a counting variant
Proposing a simple HashSet, which would require far more memory for billions of entries

Follow-up Questions

Interviewers often ask these as follow-up questions

What happens when a new user registers? How do you keep the Bloom filter in sync?
Can you remove an element from a Bloom filter? If not, what alternatives exist?
How would you handle the filter across multiple application servers?
What is a counting Bloom filter, and when would you use one instead?
How does a cuckoo filter compare to a Bloom filter for this use case?

Also worth your time on this topic

Quiz

Back-of-the-Envelope Estimation Quiz

The napkin-math skill that makes or breaks a system design interview: turning users into QPS, data into storage, traffic into bandwidth, and knowing the latency numbers every engineer should have memorised.

16-22 minutes

Interview

Instant Credit Card Validation

How does a credit card form validate numbers instantly, before even contacting the bank?

junior

Article

How Does It Work So Fast? The Engineering Behind Instant UI Responses

Credit card validation, username checks, autocomplete, URL shorteners - they all feel instant. Here is what is actually happening under the hood in each case.

Username Availability with Bloom Filters

More System Design interview questions

Also worth your time on this topic

Back-of-the-Envelope Estimation Quiz

Instant Credit Card Validation

How Does It Work So Fast? The Engineering Behind Instant UI Responses