System Design: Concepts, Patterns, Technologies
My reference for system design interviews. Three sections: core concepts (the building blocks), common patterns (recurring solutions), and key technologies (when to reach for what).
Scan the tables to refresh. Read the notes beneath each table for the key decisions and tradeoffs.
Core Concepts
Networking Essentials
| Topic | Key Points |
|---|---|
| TCP vs UDP | TCP: reliable, ordered, connection-oriented (HTTP, databases). UDP: fast, no guarantees (video streaming, DNS). |
| HTTP/HTTPS | Request-response over TCP. HTTP/2 adds multiplexing. HTTP/3 uses QUIC (UDP-based). TLS for encryption. |
| WebSockets | Persistent bidirectional connection over TCP. Use for real-time (chat, live updates). Initiated via HTTP upgrade. |
| DNS | Domain to IP resolution. TTL controls caching. Can route by geography for latency. |
| Load Balancers | L4 (transport): routes by IP/port, fast, no inspection. L7 (application): routes by URL/headers, can do auth and rate limiting. |
The key distinction in interviews is L4 vs L7. L4 is faster but can only see IPs and ports. L7 is slower but can make smart routing decisions based on request content (URLs and headers).
API Design
| Style | Use When | Tradeoffs |
|---|---|---|
| REST | Public APIs, CRUD, simple resource models | Well-understood, cacheable. Overfetching/underfetching common. |
| gRPC | Service-to-service, low latency, streaming | Binary protobuf, fast, typed contracts. Not browser-friendly without proxy. |
| GraphQL | Client-driven queries, mobile apps needing flexible data | Single endpoint, no overfetching. Complexity on server, caching harder. |
Key decisions: pagination (cursor-based for real-time data, offset for static), idempotency (POST with client-generated ID), versioning (URL path is simplest: /v1/resource).
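A minimal sketch of cursor-based pagination, assuming an opaque base64 cursor that encodes the last-seen position; the field names here are illustrative:

```python
import base64
import json

def encode_cursor(created_at: str, item_id: int) -> str:
    """Pack the last-seen position into an opaque, URL-safe cursor."""
    raw = json.dumps({"created_at": created_at, "id": item_id}).encode()
    return base64.urlsafe_b64encode(raw).decode()

def decode_cursor(cursor: str) -> dict:
    return json.loads(base64.urlsafe_b64decode(cursor))

# A page response carries the cursor of its last item, so the client asks for
# "items after this position" instead of "page 7" -- stable even when new
# rows are inserted between requests.
last_item = {"created_at": "2024-05-01T12:00:00Z", "id": 9182}
next_cursor = encode_cursor(last_item["created_at"], last_item["id"])
print(decode_cursor(next_cursor))  # {'created_at': '2024-05-01T12:00:00Z', 'id': 9182}
```

Because the cursor points at a position rather than an offset, newly inserted rows don't shift later pages, which is why it suits real-time data.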
Data Modeling
| Approach | Strengths | Weaknesses |
|---|---|---|
| Relational (PostgreSQL) | ACID, joins, complex queries, strong consistency | Schema rigidity, harder to shard |
| Document (MongoDB) | Flexible schema, nested data, horizontal scaling | No joins, denormalized data can diverge |
| Wide-column (Cassandra) | Massive write throughput, time-series, multi-DC | Limited query patterns, must design around partition key |
| Key-value (Redis, DynamoDB) | Sub-ms latency, simple access patterns | No complex queries |
Design your schema around how data is read, not how it's logically organized. If reads far outnumber writes, denormalize. If you need joins, use relational.
Caching
| Strategy | How It Works | Use When |
|---|---|---|
| Cache-aside | App checks cache first, falls back to DB on miss, fills cache after | General purpose, most common |
| Write-through | Write to cache and DB synchronously on every write | Need cache and DB always in sync |
| Write-back | Write to cache only, async flush to DB | Write-heavy, can tolerate data loss risk |
| Read-through | Cache itself fetches from DB on miss | Simplify app logic, cache acts as proxy |
Cache-aside read path:
App → Cache → HIT → return
App → Cache → MISS → read DB → fill cache → return
Eviction: LRU (most common), TTL (simplest to reason about), LFU (frequency-based, good for skewed access).
Cache invalidation is the hard part. TTL is simplest: accept staleness up to N seconds. Event-driven invalidation (publish on write, subscribers evict) is more precise but more complex. When in doubt, start with TTL.
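A minimal cache-aside sketch, assuming the redis-py client and a hypothetical `db_lookup` function standing in for the real database read; TTL handles invalidation as described above:

```python
import json
import redis  # assumes the redis-py client is installed and Redis is running locally

cache = redis.Redis(host="localhost", port=6379)
CACHE_TTL_SECONDS = 300  # accept up to 5 minutes of staleness

def db_lookup(user_id: str) -> dict:
    """Stand-in for the real database read on a cache miss."""
    return {"id": user_id, "name": "example"}

def get_user(user_id: str) -> dict:
    key = f"user:{user_id}"
    cached = cache.get(key)          # 1. check the cache first
    if cached is not None:
        return json.loads(cached)    # HIT: return without touching the DB
    user = db_lookup(user_id)        # 2. MISS: fall back to the DB
    cache.setex(key, CACHE_TTL_SECONDS, json.dumps(user))  # 3. fill cache with a TTL
    return user
```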
Sharding
| Strategy | How It Works | Tradeoffs |
|---|---|---|
| Hash-based | Hash the partition key, mod by shard count | Even distribution, but range queries hit all shards |
| Range-based | Assign contiguous key ranges to shards | Efficient range queries, but hot ranges cause imbalance |
Choose a partition key with high cardinality and even distribution. Bad key: country (skewed). Good key: user ID (uniform). Cross-shard queries are expensive. Resharding is painful without consistent hashing.
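A hash-based shard lookup sketch; the shard count is illustrative, and a stable hash is used because Python's built-in `hash()` is salted per process:

```python
import hashlib

NUM_SHARDS = 8  # illustrative; plain modulo means resharding moves most keys

def shard_for(partition_key: str) -> int:
    # Use a stable hash: the built-in hash() would route the same key to
    # different shards on different hosts or restarts.
    digest = hashlib.md5(partition_key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

print(shard_for("user:42"))   # the same key always lands on the same shard
print(shard_for("user:43"))   # a high-cardinality key spreads evenly
```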
Replication
| Strategy | How It Works | Tradeoffs |
|---|---|---|
| Leader-Follower | One leader handles writes, followers replicate and serve reads | Simple, but leader is a bottleneck. Replication lag means stale reads. |
| Multi-Leader | Multiple nodes accept writes, replicate to each other | Better write availability across regions. Conflict resolution is hard. |
| Leaderless (Quorum) | Read/write to multiple nodes. Consistency when W + R > N. | High availability, tunable consistency. More complex client logic. |
Leader-follower is the default. Most SQL databases use it. Go multi-leader for multi-region writes. Go leaderless (Dynamo-style) when you need high availability and can handle eventual consistency.
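A quick illustration of the quorum rule; the N, W, and R values here are just examples:

```python
def quorum_is_consistent(n: int, w: int, r: int) -> bool:
    """W + R > N guarantees the read set and write set overlap in at least one node."""
    return w + r > n

# N=3 with W=2, R=2: every read overlaps every write, so reads see the latest value.
print(quorum_is_consistent(3, 2, 2))  # True
# N=3 with W=1, R=1: fast, but a read may miss the one node that took the write.
print(quorum_is_consistent(3, 1, 1))  # False
```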
Consistent Hashing
Hash Ring (linear view):
0 ── NA ── NB ── NC ── ND ── 0
   ↑     ↑     ↑
   k1    k2    k3
Keys route to the next node clockwise:
k1 → NA   k2 → NB   k3 → NC
| Concept | Detail |
|---|---|
| Hash ring | Both nodes and keys are hashed to positions on a ring. Each key routes to the next node clockwise. |
| Adding a node | Only keys between the new node and its predecessor move. Minimal redistribution. |
| Virtual nodes | Each physical node maps to multiple ring positions. Improves balance. |
| Used in | Distributed caches, DynamoDB partitioning, Cassandra, CDN routing. |
The point is minimal disruption. Adding or removing a node only affects its immediate neighbors, not a full reshuffle.
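A sketch of a hash ring with virtual nodes, using `bisect` for the clockwise lookup; the node names and vnode count are illustrative:

```python
import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, nodes, vnodes=100):
        # Each physical node gets `vnodes` positions on the ring to improve balance.
        self.ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        # Route to the first ring position at or after the key's hash (clockwise),
        # wrapping back to the start of the ring if we fall off the end.
        idx = bisect.bisect(self.keys, self._hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["NA", "NB", "NC", "ND"])
print(ring.node_for("k1"), ring.node_for("k2"), ring.node_for("k3"))
```

Removing a node only re-routes the keys that mapped to its ring positions; every other key stays where it was.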
CAP Theorem
| Choice | Behavior During Partition | Examples |
|---|---|---|
| CP (Consistency) | Reject requests rather than serve stale data | ZooKeeper, HBase, etcd |
| AP (Availability) | Serve requests, accept eventual consistency | Cassandra, DynamoDB (default), CouchDB |
CAP only forces a choice during network partitions. When the network is healthy, you get both C and A. Most production systems choose AP with tunable consistency: DynamoDB lets you choose strong or eventual per read, Cassandra lets you set consistency level per query.
Rate Limiting
| Algorithm | How It Works | Tradeoffs |
|---|---|---|
| Token bucket | Tokens added at fixed rate, each request costs one | Allows bursts up to bucket size. Most common. |
| Sliding window log | Store timestamp of each request, count within window | Precise, but high memory at scale |
| Sliding window counter | Weighted count from current + previous window | Memory-efficient approximation. Good enough for most cases. |
| Fixed window counter | Count requests per fixed time window (e.g., per minute) | Simplest. Spike at window boundaries (double rate across boundary). |
Token bucket is the standard choice. It handles bursts naturally and is what most API gateways implement. Rate limit by IP for anonymous traffic, by API key or user ID for authenticated traffic.
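A minimal in-process token bucket sketch; production limiters usually live in the gateway or in Redis so all instances share state. The rate and capacity are illustrative:

```python
import time

class TokenBucket:
    def __init__(self, rate: float, capacity: int):
        self.rate = rate               # tokens added per second
        self.capacity = capacity       # maximum burst size
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill based on elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1           # each request costs one token
            return True
        return False

limiter = TokenBucket(rate=5, capacity=10)   # 5 req/s steady, bursts up to 10
print([limiter.allow() for _ in range(12)])  # typically the first 10 pass, then rejections
```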
Database Indexing
| Index Type | Best For | How It Works |
|---|---|---|
| B-tree | Range queries, sorted access | Balanced tree, O(log n). Default in most databases. |
| Hash | Exact-match lookups | O(1) lookup. No range queries. |
| Composite | Multi-column queries | Leftmost prefix rule: index on (a, b, c) supports (a), (a, b), (a, b, c). |
| Covering | Avoiding table lookups | Index includes all columns the query needs, no row fetch required. |
Indexes speed reads but slow writes. Every insert and update must update every relevant index. Don't over-index. Start with the queries you need to optimize, add indexes for those.
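A small demonstration of the leftmost prefix rule, using SQLite (stdlib `sqlite3`) as a stand-in; the table and column names are illustrative, and the exact `EXPLAIN QUERY PLAN` wording varies by SQLite version:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (user_id INT, status TEXT, created_at TEXT)")
conn.execute("CREATE INDEX idx_user_status ON orders (user_id, status)")

def plan(sql: str) -> str:
    # The last column of each EXPLAIN QUERY PLAN row is the plan detail.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Leftmost prefix (user_id) and full prefix (user_id, status) can use the index:
print(plan("SELECT * FROM orders WHERE user_id = 1"))
print(plan("SELECT * FROM orders WHERE user_id = 1 AND status = 'paid'"))
# Filtering on status alone skips the leading column, so it falls back to a scan:
print(plan("SELECT * FROM orders WHERE status = 'paid'"))
```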
Numbers to Know
Latency:
| Operation | Time |
|---|---|
| L1 cache reference | ~1 ns |
| L2 cache reference | ~4 ns |
| RAM access | ~100 ns |
| SSD random read | ~100 ΞΌs |
| HDD seek | ~10 ms |
| Same-datacenter round trip | ~0.5 ms |
| Cross-continent round trip | ~150 ms |
Throughput (order of magnitude):
| Component | Ballpark |
|---|---|
| Single web server | ~10K QPS |
| Single SQL database | ~10K QPS |
| Redis | ~100K QPS |
| Kafka (per broker) | ~100K msgs/sec |
Storage:
| Unit | Size |
|---|---|
| 1 tweet (text + metadata) | ~1 KB |
| 1 image (compressed) | ~200 KB |
| 1 minute of video (720p) | ~5 MB |
Time conversions for estimation:
| Period | Seconds |
|---|---|
| 1 day | ~100K |
| 1 month | ~2.5M |
| 1 year | ~30M |
Back-of-Envelope Estimation
The goal is order of magnitude, not precision. 2x off is fine. 100x off means you picked the wrong architecture.
Method:
- Start with users or requests (DAU, writes/day, reads/day)
- Estimate per-unit size or per-unit cost
- Multiply out: per second, per day, per year
- Round aggressively
Worked example: URL shortener storage
- 100M new URLs per month
- Each record: short code (7 B) + long URL (~200 B) + metadata (~50 B) ≈ 250 B
- Monthly storage: 100M x 250 B = 25 GB/month
- 5-year retention: 25 GB x 60 = 1.5 TB total
- Read QPS: 10:1 read/write → 1B reads/month → 1B / 2.5M sec ≈ 400 QPS
1.5 TB fits on a single machine. 400 QPS is trivially handled. This tells you the bottleneck isn't storage or throughput, it's latency (caching helps) and availability (replication helps).
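The same estimate as straight arithmetic, mirroring the rounded numbers in the bullets above:

```python
# URL shortener estimate, mirroring the worked example.
new_urls_per_month = 100_000_000
bytes_per_record = 250                                # ~7 + ~200 + ~50, rounded

monthly_storage_gb = new_urls_per_month * bytes_per_record / 1e9   # 25 GB/month
five_year_tb = monthly_storage_gb * 60 / 1e3                        # 1.5 TB total

reads_per_month = new_urls_per_month * 10             # 10:1 read/write ratio
read_qps = reads_per_month / 2_500_000                # ~2.5M seconds per month

print(f"{monthly_storage_gb:.0f} GB/month, {five_year_tb:.1f} TB over 5 years, ~{read_qps:.0f} read QPS")
```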
Common Patterns
Real-time Updates
| Approach | How It Works | Use When |
|---|---|---|
| WebSockets | Persistent bidirectional connection | Chat, collaborative editing, gaming |
| Server-Sent Events | Server pushes over HTTP, one-directional | Live feeds, notifications, dashboards |
| Long polling | Client sends request, server holds until data available | Fallback when WebSockets not supported |
For feeds and timelines, the key decision is fan-out strategy:
| Strategy | How It Works | Use When |
|---|---|---|
| Fan-out on write | Push updates to all subscriber inboxes at write time | Most users have small follower counts |
| Fan-out on read | Pull and merge updates at read time | Some users have millions of followers (celebrities) |
| Hybrid | Fan-out on write for normal users, fan-out on read for high-follower users | Twitter-scale systems |
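A toy sketch of both fan-out strategies, using in-memory dicts as stand-ins for the follower lists, per-user inboxes, and post store (a real system would use something like Redis lists or a wide-column store):

```python
from collections import defaultdict

followers = {"alice": ["bob", "carol"], "celebrity": [f"fan{i}" for i in range(5)]}
inboxes = defaultdict(list)          # precomputed timelines (fan-out on write)
posts_by_author = defaultdict(list)  # raw posts, pulled at read time

def post_fanout_on_write(author: str, post: str) -> None:
    # Push to every follower's inbox at write time: O(followers) per post,
    # but reading a timeline is a single inbox lookup.
    for follower in followers[author]:
        inboxes[follower].append(post)

def timeline_fanout_on_read(user: str, following: list[str]) -> list[str]:
    # Pull and merge at read time: cheap writes, expensive reads --
    # used for accounts with huge follower counts.
    return [p for author in following for p in posts_by_author[author]]

post_fanout_on_write("alice", "alice: hello")
posts_by_author["celebrity"].append("celebrity: big announcement")
print(inboxes["bob"])                                 # ['alice: hello']
print(timeline_fanout_on_read("bob", ["celebrity"]))  # ['celebrity: big announcement']
```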
Dealing with Contention
| Approach | How It Works | Use When |
|---|---|---|
| Optimistic locking | Read version, write with version check, retry on conflict | Low contention (most writes succeed) |
| Pessimistic locking | Acquire lock before read-modify-write | High contention (conflicts are expensive) |
| CAS (Compare-and-Swap) | Atomic conditional update at the DB level | Counters, inventory, simple state transitions |
| Queue writes | Serialize concurrent writes through a queue | Ordering matters, or writes need complex processing |
Default to optimistic locking. Switch to pessimistic or queuing when conflict rates are high enough that retries become wasteful.
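An optimistic-locking sketch with a version column, using SQLite (stdlib `sqlite3`) as a stand-in; the table and retry count are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (sku TEXT PRIMARY KEY, qty INT, version INT)")
conn.execute("INSERT INTO inventory VALUES ('widget', 10, 0)")

def reserve_one(sku: str, max_retries: int = 3) -> bool:
    for _ in range(max_retries):
        qty, version = conn.execute(
            "SELECT qty, version FROM inventory WHERE sku = ?", (sku,)).fetchone()
        if qty < 1:
            return False
        # Conditional update: succeeds only if nobody changed the row since we read it.
        cur = conn.execute(
            "UPDATE inventory SET qty = ?, version = ? WHERE sku = ? AND version = ?",
            (qty - 1, version + 1, sku, version))
        if cur.rowcount == 1:
            return True   # our version check held; the write committed
        # Another writer won the race; re-read and retry.
    return False

print(reserve_one("widget"))  # True: qty is now 9, version is 1
```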
Multi-step Processes
| Pattern | How It Works | Use When |
|---|---|---|
| Saga (choreography) | Each service emits events, next service reacts | Loosely coupled services, simple flows |
| Saga (orchestration) | Central coordinator directs each step | Complex flows, need visibility into process state |
| Two-phase commit | Coordinator asks all to prepare, then commit/abort | Strong consistency across services. Avoid if possible (slow, fragile). |
Idempotency is the foundation. Every step must be safe to retry. Use client-generated UUIDs as idempotency keys so retries don't create duplicates. Every compensating action (undo) must also be idempotent.
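A server-side idempotency-key sketch; the in-memory dict stands in for a persistent store, and `charge_card` is a hypothetical side-effecting step:

```python
import uuid

processed = {}  # idempotency_key -> stored result; a real system persists this

def charge_card(amount_cents: int) -> dict:
    """Stand-in for a side-effecting step (payment, email, shipment...)."""
    return {"charge_id": str(uuid.uuid4()), "amount": amount_cents}

def handle_payment(idempotency_key: str, amount_cents: int) -> dict:
    # If this key was already processed, return the saved result instead of
    # charging again -- a retried request becomes a harmless no-op.
    if idempotency_key in processed:
        return processed[idempotency_key]
    result = charge_card(amount_cents)
    processed[idempotency_key] = result
    return result

key = str(uuid.uuid4())            # client-generated, reused on every retry
first = handle_payment(key, 500)
retry = handle_payment(key, 500)
print(first == retry)              # True: no duplicate charge
```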
Scaling Reads
| Technique | How It Works | Tradeoff |
|---|---|---|
| Read replicas | Route reads to follower replicas | Replication lag (stale reads) |
| Caching | Cache hot data in Redis or Memcached | Invalidation complexity |
| CDN | Cache static/semi-static content at the edge | Only for cacheable content |
| Denormalization | Pre-join data at write time | Faster reads, harder writes |
| Materialized views | Precomputed query results, refreshed periodically | Stale between refreshes |
Scaling Writes
| Technique | How It Works | Tradeoff |
|---|---|---|
| Sharding | Partition data across nodes | Cross-shard queries, resharding pain |
| Write-ahead log | Append-only log, apply changes asynchronously | Sequential I/O is fast, crash recovery |
| Batching | Buffer writes, flush in bulk | Higher throughput, higher per-write latency |
| Async processing | Accept write into queue, process later | Fast ack to client, eventual consistency |
| Event sourcing | Store events as source of truth, derive state | Full audit trail, complex to query current state |
Handling Large Blobs
| Concern | Approach |
|---|---|
| Storage | Object store (S3, GCS). Never store blobs in your database. |
| Uploads | Presigned URLs for direct client-to-S3 upload. Chunked uploads for large files (resumable). |
| Serving | CDN in front of object store. Signed URLs for access control. |
| Processing | Async pipeline triggered by upload event (thumbnails, transcoding, virus scan). |
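A presigned-upload sketch with boto3; the bucket name, key, and expiry are illustrative, and AWS credentials must already be configured in the environment:

```python
import boto3  # assumes AWS credentials are available (env vars, profile, or role)

s3 = boto3.client("s3")

def presigned_upload_url(bucket: str, key: str, expires_seconds: int = 900) -> str:
    # The server signs the URL; the client then PUTs the file bytes straight
    # to S3, so large blobs never pass through the application servers.
    return s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_seconds,
    )

url = presigned_upload_url("my-media-bucket", "uploads/video-123.mp4")
# Client side: PUT the raw bytes to `url` before it expires (15 minutes here).
```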
Managing Long Running Tasks
| Concern | Approach |
|---|---|
| Dispatch | Message queue (SQS, Kafka) decouples producer from worker |
| Execution | Worker pool pulls from queue, processes independently |
| Reliability | Checkpointing for progress. Idempotent retries on failure. |
| Failure | Dead letter queue for messages that repeatedly fail. Alert on DLQ depth. |
| Visibility | Status tracking in DB. Status endpoint for clients to poll. |
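A worker-loop sketch with bounded retries and a dead-letter queue, using `queue.Queue` as a stand-in for SQS or Kafka; `process` is a hypothetical, idempotent handler:

```python
import queue

tasks: queue.Queue = queue.Queue()        # stand-in for SQS/Kafka
dead_letter: queue.Queue = queue.Queue()  # messages that repeatedly fail land here
MAX_ATTEMPTS = 3

def process(task: dict) -> None:
    """Hypothetical handler; must be idempotent so retries are safe."""
    if task.get("poison"):
        raise RuntimeError("cannot process this message")

def worker_loop() -> None:
    while not tasks.empty():
        task = tasks.get()
        try:
            process(task)
        except Exception:
            task["attempts"] = task.get("attempts", 0) + 1
            if task["attempts"] >= MAX_ATTEMPTS:
                dead_letter.put(task)     # alert on dead-letter queue depth
            else:
                tasks.put(task)           # requeue for another attempt
        finally:
            tasks.task_done()

tasks.put({"id": 1})
tasks.put({"id": 2, "poison": True})
worker_loop()
print(dead_letter.qsize())  # 1 -- the poison message ended up in the DLQ
```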
Key Technologies
| Technology | What It Is | Reach For It When |
|---|---|---|
| Redis | In-memory key-value store | Caching, rate limiting, leaderboards, pub/sub, session storage. Sub-ms reads. |
| Elasticsearch | Distributed search engine | Full-text search, log aggregation (ELK), faceted search, autocomplete. |
| Kafka | Distributed event streaming | Event-driven architecture, decoupling services, high-throughput messaging, replay. |
| API Gateway | Reverse proxy at the edge | Rate limiting, auth, routing, SSL termination, versioning. |
| Cassandra | Wide-column distributed DB | High write throughput, time-series, multi-DC replication. |
| DynamoDB | Managed key-value / document DB | Predictable single-digit ms latency at any scale, serverless backends. |
| PostgreSQL | Relational database | ACID, complex queries, joins. Default choice until you outgrow it. |
| Flink | Stream processing framework | Real-time aggregations, windowed computations, complex event processing. |
| ZooKeeper | Distributed coordination | Leader election, distributed locks, config management. Being replaced by etcd in newer systems. |
Resources:
- Hello Interview - System Design - Structured course with question breakdowns
- System Design Primer - Comprehensive open-source reference
- Designing Data-Intensive Applications - The foundational book on distributed systems