My reference for system design interviews. Three sections: core concepts (the building blocks), common patterns (recurring solutions), and key technologies (when to reach for what).

Scan the tables to refresh. Read the bold text for the key decisions and tradeoffs.


Core Concepts

Networking Essentials

Topic Key Points
TCP vs UDP TCP: reliable, ordered, connection-oriented (HTTP, databases). UDP: fast, no guarantees (video streaming, DNS).
HTTP/HTTPS Request-response over TCP. HTTP/2 adds multiplexing. HTTP/3 uses QUIC (UDP-based). TLS for encryption.
WebSockets Persistent bidirectional connection over TCP. Use for real-time (chat, live updates). Initiated via HTTP upgrade.
DNS Domain to IP resolution. TTL controls caching. Can route by geography for latency.
Load Balancers L4 (transport): routes by IP/port, fast, no inspection. L7 (application): routes by URL/headers, can do auth and rate limiting.

The key distinction in interviews is L4 vs L7. L4 is faster but dumb. L7 is slower but can make smart routing decisions based on content.

API Design

Style Use When Tradeoffs
REST Public APIs, CRUD, simple resource models Well-understood, cacheable. Overfetching/underfetching common.
gRPC Service-to-service, low latency, streaming Binary protobuf, fast, typed contracts. Not browser-friendly without proxy.
GraphQL Client-driven queries, mobile apps needing flexible data Single endpoint, no overfetching. Complexity on server, caching harder.

Key decisions: pagination (cursor-based for real-time data, offset for static), idempotency (POST with client-generated ID), versioning (URL path is simplest: /v1/resource).
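
A minimal sketch of cursor-based pagination, assuming records sorted by an increasing id; the function and field names are illustrative, not from any specific framework:

  # Cursor-based pagination: the cursor is the last id the client saw,
  # so new inserts never shift page boundaries (unlike offset pagination).
  def paginate(records, cursor=None, limit=3):
      # records are assumed sorted by ascending "id"
      page = [r for r in records if cursor is None or r["id"] > cursor]
      page = page[:limit]
      next_cursor = page[-1]["id"] if page else None
      return page, next_cursor

  records = [{"id": i, "text": f"item {i}"} for i in range(1, 8)]
  page1, cur = paginate(records)        # items 1-3, cursor = 3
  page2, cur = paginate(records, cur)   # items 4-6, cursor = 6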

Data Modeling

Approach Strengths Weaknesses
Relational (PostgreSQL) ACID, joins, complex queries, strong consistency Schema rigidity, harder to shard
Document (MongoDB) Flexible schema, nested data, horizontal scaling No joins, denormalized data can diverge
Wide-column (Cassandra) Massive write throughput, time-series, multi-DC Limited query patterns, must design around partition key
Key-value (Redis, DynamoDB) Sub-ms latency, simple access patterns No complex queries

Design your schema around how data is read, not how it’s logically organized. If reads far outnumber writes, denormalize. If you need joins, use relational.

Caching

Strategy How It Works Use When
Cache-aside App checks cache first, falls back to DB on miss, fills cache after General purpose, most common
Write-through Write to cache and DB synchronously on every write Need cache and DB always in sync
Write-back Write to cache only, async flush to DB Write-heavy, can tolerate data loss risk
Read-through Cache itself fetches from DB on miss Simplify app logic, cache acts as proxy

Cache-aside read path:

  App → Cache → HIT  → return
              → MISS → read DB → fill cache → return
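
The same read path as a sketch, with a plain dict standing in for Redis and a hypothetical fetch_user_from_db call; a real cache would also set a TTL:

  cache = {}   # stand-in for Redis

  def fetch_user_from_db(user_id):
      # hypothetical database read
      return {"id": user_id, "name": "example"}

  def get_user(user_id):
      user = cache.get(user_id)              # 1. check cache first
      if user is not None:
          return user                        # 2. hit: return
      user = fetch_user_from_db(user_id)     # 3. miss: read DB
      cache[user_id] = user                  # 4. fill cache (add a TTL in Redis)
      return user                            # 5. return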

Eviction: LRU (most common), TTL (simplest to reason about), LFU (frequency-based, good for skewed access).

Cache invalidation is the hard part. TTL is simplest: accept staleness up to N seconds. Event-driven invalidation (publish on write, subscribers evict) is more precise but more complex. When in doubt, start with TTL.

Sharding

Strategy How It Works Tradeoffs
Hash-based Hash the partition key, mod by shard count Even distribution, but range queries hit all shards
Range-based Assign contiguous key ranges to shards Efficient range queries, but hot ranges cause imbalance

Choose a partition key with high cardinality and even distribution. Bad key: country (skewed). Good key: user ID (uniform). Cross-shard queries are expensive. Resharding is painful without consistent hashing.
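
A hash-based routing sketch (shard count and key are illustrative). Note the plain mod-N mapping: when the shard count changes, most keys move, which is exactly the resharding pain consistent hashing avoids:

  import hashlib

  NUM_SHARDS = 8

  def shard_for(user_id: str) -> int:
      # stable hash (don't use Python's built-in hash(): it's salted per process)
      digest = hashlib.md5(user_id.encode()).hexdigest()
      return int(digest, 16) % NUM_SHARDS

  shard_for("user-12345")   # always the same shard for the same key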

Replication

Strategy How It Works Tradeoffs
Leader-Follower One leader handles writes, followers replicate and serve reads Simple, but leader is a bottleneck. Replication lag means stale reads.
Multi-Leader Multiple nodes accept writes, replicate to each other Better write availability across regions. Conflict resolution is hard.
Leaderless (Quorum) Read/write to multiple nodes. Consistency when W + R > N. High availability, tunable consistency. More complex client logic.

Leader-follower is the default. Most SQL databases use it. Go multi-leader for multi-region writes. Go leaderless (Dynamo-style) when you need high availability and can handle eventual consistency.
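
A tiny check of why W + R > N guarantees a read sees the latest write: with N = 3, W = 2, R = 2, every possible read quorum shares at least one node with every write quorum (numbers are illustrative):

  from itertools import combinations

  N, W, R = 3, 2, 2
  nodes = range(N)

  # At least one replica in any read set holds the newest value,
  # because read and write quorums always overlap when W + R > N.
  assert all(set(w) & set(r)
             for w in combinations(nodes, W)
             for r in combinations(nodes, R))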

Consistent Hashing

Hash Ring (linear view):

  0 ─── NA ─── NB ─── NC ─── ND ─── 0
       ↑      ↑      ↑
       k1     k2     k3

  Keys route to the next node clockwise:
  k1 → NA    k2 → NB    k3 → NC

Concept Detail
Hash ring Both nodes and keys are hashed to positions on a ring. Each key routes to the next node clockwise.
Adding a node Only keys between the new node and its predecessor move. Minimal redistribution.
Virtual nodes Each physical node maps to multiple ring positions. Improves balance.
Used in Distributed caches, DynamoDB partitioning, Cassandra, CDN routing.

The point is minimal disruption. Adding or removing a node only affects its immediate neighbors, not a full reshuffle.
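
A minimal ring sketch with virtual nodes, using bisect over sorted hash positions; node names and the vnode count are illustrative:

  import bisect, hashlib

  def _hash(key: str) -> int:
      return int(hashlib.md5(key.encode()).hexdigest(), 16)

  class Ring:
      def __init__(self, nodes, vnodes=100):
          # each physical node gets `vnodes` positions on the ring for better balance
          self._points = sorted((_hash(f"{node}#{i}"), node)
                                for node in nodes for i in range(vnodes))
          self._hashes = [h for h, _ in self._points]

      def node_for(self, key: str) -> str:
          # route to the next point clockwise, wrapping past the end of the ring
          i = bisect.bisect(self._hashes, _hash(key)) % len(self._points)
          return self._points[i][1]

  ring = Ring(["NA", "NB", "NC", "ND"])
  ring.node_for("k1")   # adding or removing a node only remaps nearby keys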

CAP Theorem

Choice Behavior During Partition Examples
CP (Consistency) Reject requests rather than serve stale data ZooKeeper, HBase, etcd
AP (Availability) Serve requests, accept eventual consistency Cassandra, DynamoDB (default), CouchDB

CAP only forces a choice during network partitions. When the network is healthy, you get both C and A. Most production systems choose AP with tunable consistency: DynamoDB lets you choose strong or eventual per read, Cassandra lets you set consistency level per query.

Rate Limiting

Algorithm How It Works Tradeoffs
Token bucket Tokens added at fixed rate, each request costs one Allows bursts up to bucket size. Most common.
Sliding window log Store timestamp of each request, count within window Precise, but high memory at scale
Sliding window counter Weighted count from current + previous window Memory-efficient approximation. Good enough for most cases.
Fixed window counter Count requests per fixed time window (e.g., per minute) Simplest. Spike at window boundaries (double rate across boundary).

Token bucket is the standard choice. It handles bursts naturally and is what most API gateways implement. Rate limit by IP for anonymous traffic, by API key or user ID for authenticated traffic.
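
A token bucket sketch; the rate and capacity are illustrative, and a production limiter would keep this state in Redis keyed by user or IP:

  import time

  class TokenBucket:
      def __init__(self, rate: float, capacity: float):
          self.rate = rate            # tokens added per second
          self.capacity = capacity    # burst size
          self.tokens = capacity
          self.last = time.monotonic()

      def allow(self) -> bool:
          now = time.monotonic()
          # refill based on elapsed time, capped at capacity
          self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
          self.last = now
          if self.tokens >= 1:
              self.tokens -= 1
              return True
          return False

  bucket = TokenBucket(rate=5, capacity=10)   # 5 req/s steady, bursts of 10
  bucket.allow()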

Database Indexing

Index Type Best For How It Works
B-tree Range queries, sorted access Balanced tree, O(log n). Default in most databases.
Hash Exact-match lookups O(1) lookup. No range queries.
Composite Multi-column queries Leftmost prefix rule: index on (a, b, c) supports (a), (a, b), (a, b, c).
Covering Avoiding table lookups Index includes all columns the query needs, no row fetch required.

Indexes speed reads but slow writes. Every insert and update must update every relevant index. Don’t over-index. Start with the queries you need to optimize, add indexes for those.
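
One way to see the leftmost prefix rule, using the stdlib sqlite3 planner; the table and column names are made up:

  import sqlite3

  db = sqlite3.connect(":memory:")
  db.execute("CREATE TABLE events (a INT, b INT, c INT)")
  db.execute("CREATE INDEX idx_abc ON events (a, b, c)")

  # The first two queries can seek on idx_abc; the last one skips the leftmost
  # column, so expect the plan to fall back to a scan.
  for sql in ["SELECT * FROM events WHERE a = 1",
              "SELECT * FROM events WHERE a = 1 AND b = 2",
              "SELECT * FROM events WHERE b = 2"]:
      plan = db.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
      print(sql, "->", plan)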

Numbers to Know

Latency:

Operation Time
L1 cache reference ~1 ns
L2 cache reference ~4 ns
RAM access ~100 ns
SSD random read ~100 μs
HDD seek ~10 ms
Same-datacenter round trip ~0.5 ms
Cross-continent round trip ~150 ms

Throughput (order of magnitude):

Component Ballpark
Single web server ~10K QPS
Single SQL database ~10K QPS
Redis ~100K QPS
Kafka (per broker) ~100K msgs/sec

Storage:

Unit Size
1 tweet (text + metadata) ~1 KB
1 image (compressed) ~200 KB
1 minute of video (720p) ~5 MB

Time conversions for estimation:

Period Seconds
1 day ~100K
1 month ~2.5M
1 year ~30M

Back-of-Envelope Estimation

The goal is order of magnitude, not precision. 2x off is fine. 100x off means you picked the wrong architecture.

Method:

  1. Start with users or requests (DAU, writes/day, reads/day)
  2. Estimate per-unit size or per-unit cost
  3. Multiply out: per second, per day, per year
  4. Round aggressively

Worked example: URL shortener storage

  • 100M new URLs per month
  • Each record: short code (7 B) + long URL (~200 B) + metadata (~50 B) ≈ 250 B
  • Monthly storage: 100M x 250 B = 25 GB/month
  • 5-year retention: 25 GB x 60 = 1.5 TB total
  • Read QPS: 10:1 read/write → 1B reads/month → 1B / 2.5M sec ≈ 400 QPS

1.5 TB fits on a single machine. 400 QPS is trivially handled. This tells you the bottleneck isn’t storage or throughput, it’s latency (caching helps) and availability (replication helps).
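
The same arithmetic as a quick script; all inputs are the assumptions listed above:

  new_urls_per_month = 100e6
  bytes_per_record   = 250        # 7 B code + ~200 B URL + ~50 B metadata
  seconds_per_month  = 2.5e6

  monthly_storage_gb = new_urls_per_month * bytes_per_record / 1e9   # ≈ 25 GB
  five_year_tb       = monthly_storage_gb * 60 / 1e3                 # ≈ 1.5 TB
  read_qps           = 10 * new_urls_per_month / seconds_per_month   # ≈ 400 QPS
  print(monthly_storage_gb, five_year_tb, read_qps)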


Common Patterns

Real-time Updates

Approach How It Works Use When
WebSockets Persistent bidirectional connection Chat, collaborative editing, gaming
Server-Sent Events Server pushes over HTTP, one-directional Live feeds, notifications, dashboards
Long polling Client sends request, server holds until data available Fallback when WebSockets not supported

For feeds and timelines, the key decision is fan-out strategy:

Strategy How It Works Use When
Fan-out on write Push updates to all subscriber inboxes at write time Most users have small follower counts
Fan-out on read Pull and merge updates at read time Some users have millions of followers (celebrities)
Hybrid Fan-out on write for normal users, fan-out on read for high-follower users Twitter-scale systems
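
A fan-out on write sketch; followers and inboxes are in-memory stand-ins for what would be a social graph store and a per-user timeline cache:

  followers = {"alice": ["bob", "carol"]}   # who follows whom (stand-in)
  inboxes = {}                              # per-user precomputed timelines

  def post(author, post_id):
      # fan-out on write: push the new post into every follower's inbox now,
      # so a timeline read later is a single cheap lookup
      for follower in followers.get(author, []):
          inboxes.setdefault(follower, []).append(post_id)

  def timeline(user, limit=50):
      return inboxes.get(user, [])[-limit:]

  post("alice", "p1")
  timeline("bob")   # ["p1"]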

Dealing with Contention

Approach How It Works Use When
Optimistic locking Read version, write with version check, retry on conflict Low contention (most writes succeed)
Pessimistic locking Acquire lock before read-modify-write High contention (conflicts are expensive)
CAS (Compare-and-Swap) Atomic conditional update at the DB level Counters, inventory, simple state transitions
Queue writes Serialize concurrent writes through a queue Ordering matters, or writes need complex processing

Default to optimistic locking. Switch to pessimistic or queuing when conflict rates are high enough that retries become wasteful.
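
An optimistic locking sketch against a dict standing in for a row with a version column; in SQL this is the UPDATE ... WHERE id = ? AND version = ? pattern, retried when zero rows match:

  record = {"balance": 100, "version": 1}   # stand-in for a DB row

  def debit(amount, retries=3):
      for _ in range(retries):
          snapshot = dict(record)                   # read value + version
          new_balance = snapshot["balance"] - amount
          # conditional write: only apply if nobody bumped the version meanwhile
          if record["version"] == snapshot["version"]:
              record.update(balance=new_balance, version=snapshot["version"] + 1)
              return True
          # conflict: someone else wrote first, loop to re-read and retry
      return False

  debit(30)   # record -> {"balance": 70, "version": 2}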

Multi-step Processes

Pattern How It Works Use When
Saga (choreography) Each service emits events, next service reacts Loosely coupled services, simple flows
Saga (orchestration) Central coordinator directs each step Complex flows, need visibility into process state
Two-phase commit Coordinator asks all to prepare, then commit/abort Strong consistency across services. Avoid if possible (slow, fragile).

Idempotency is the foundation. Every step must be safe to retry. Use client-generated UUIDs as idempotency keys so retries don’t create duplicates. Every compensating action (undo) must also be idempotent.
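
A sketch of idempotency keys for one saga step; processed stands in for a durable store of handled keys, and charge_card is a hypothetical side effect:

  processed = {}   # idempotency key -> result (stand-in for a durable store)

  def charge_card(order):
      # hypothetical external side effect (payment provider call)
      return {"status": "charged", "order": order}

  def handle_payment(idempotency_key, order):
      if idempotency_key in processed:
          return processed[idempotency_key]    # retry: return saved result, no double charge
      result = charge_card(order)
      processed[idempotency_key] = result      # record before acknowledging
      return result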

Scaling Reads

Technique How It Works Tradeoff
Read replicas Route reads to follower replicas Replication lag (stale reads)
Caching Cache hot data in Redis or Memcached Invalidation complexity
CDN Cache static/semi-static content at the edge Only for cacheable content
Denormalization Pre-join data at write time Faster reads, harder writes
Materialized views Precomputed query results, refreshed periodically Stale between refreshes

Scaling Writes

Technique How It Works Tradeoff
Sharding Partition data across nodes Cross-shard queries, resharding pain
Write-ahead log Append-only log, apply changes asynchronously Sequential I/O is fast, crash recovery
Batching Buffer writes, flush in bulk Higher throughput, higher per-write latency
Async processing Accept write into queue, process later Fast ack to client, eventual consistency
Event sourcing Store events as source of truth, derive state Full audit trail, complex to query current state

Handling Large Blobs

Concern Approach
Storage Object store (S3, GCS). Never store blobs in your database.
Uploads Presigned URLs for direct client-to-S3 upload. Chunked uploads for large files (resumable).
Serving CDN in front of object store. Signed URLs for access control.
Processing Async pipeline triggered by upload event (thumbnails, transcoding, virus scan).
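
A presigned-upload sketch with boto3; the bucket and key names are hypothetical:

  import boto3

  s3 = boto3.client("s3")

  # The client PUTs directly to S3 with this URL; the app server never touches the bytes.
  url = s3.generate_presigned_url(
      "put_object",
      Params={"Bucket": "my-uploads-bucket", "Key": "user-123/avatar.png"},  # hypothetical names
      ExpiresIn=900,   # URL valid for 15 minutes
  )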

Managing Long-Running Tasks

Concern Approach
Dispatch Message queue (SQS, Kafka) decouples producer from worker
Execution Worker pool pulls from queue, processes independently
Reliability Checkpointing for progress. Idempotent retries on failure.
Failure Dead letter queue for messages that repeatedly fail. Alert on DLQ depth.
Visibility Status tracking in DB. Status endpoint for clients to poll.
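
A worker-loop sketch with a stdlib queue standing in for SQS/Kafka; real brokers track retry counts and dead-lettering for you rather than in-process lists:

  import queue

  tasks, dead_letter = queue.Queue(), []
  MAX_ATTEMPTS = 3

  def process(task):
      ...   # hypothetical work: transcode a video, send an email, etc.

  def worker():
      while not tasks.empty():
          task = tasks.get()
          try:
              process(task)                  # must be idempotent: it may run more than once
          except Exception:
              task["attempts"] = task.get("attempts", 0) + 1
              if task["attempts"] >= MAX_ATTEMPTS:
                  dead_letter.append(task)   # park it and alert; don't retry forever
              else:
                  tasks.put(task)            # requeue for another attempt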

Key Technologies

Technology What It Is Reach For It When
Redis In-memory key-value store Caching, rate limiting, leaderboards, pub/sub, session storage. Sub-ms reads.
Elasticsearch Distributed search engine Full-text search, log aggregation (ELK), faceted search, autocomplete.
Kafka Distributed event streaming Event-driven architecture, decoupling services, high-throughput messaging, replay.
API Gateway Reverse proxy at the edge Rate limiting, auth, routing, SSL termination, versioning.
Cassandra Wide-column distributed DB High write throughput, time-series, multi-DC replication.
DynamoDB Managed key-value / document DB Predictable single-digit ms latency at any scale, serverless backends.
PostgreSQL Relational database ACID, complex queries, joins. Default choice until you outgrow it.
Flink Stream processing framework Real-time aggregations, windowed computations, complex event processing.
ZooKeeper Distributed coordination Leader election, distributed locks, config management. Being replaced by etcd in newer systems.


Resources: