Scalable System Design

Core Concepts for Building at Scale


Distributed Systems   High Availability   Performance


From 10 users to 10 million — the engineering decisions that matter.

Agenda

  1. Scalability Fundamentals — Vertical vs. Horizontal
  2. Load Balancing — Distributing Traffic
  3. Caching Strategies — Speed at Every Layer
  4. Database Scaling — Sharding & Replication
  5. CAP Theorem — Tradeoffs in Distributed Systems
  6. Message Queues — Async & Decoupling
  7. Rate Limiting & Throttling
  8. CDNs & Edge Caching
  9. Microservices vs. Monolith
  10. Observability — Metrics, Logs, Traces

Scalability Fundamentals

The Two Axes of Scale

Strategy                 Description                     Best For
Vertical (Scale Up)      Bigger machine, more CPU/RAM    Simpler ops, stateful services
Horizontal (Scale Out)   More machines, same size        Stateless services, web tiers

Key Definitions

  • Throughput — Requests processed per second (RPS)
  • Latency — Time to handle a single request (p50, p99)
  • Availability — uptime / total time — e.g., 99.99% = ~52 min/year downtime
  • Elasticity — Ability to auto-scale in response to load

A system is scalable if adding resources increases capacity proportionally.
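
The availability math generalizes: downtime budget = (1 − availability) × period. A minimal Python sketch that reproduces the ~52 min/year figure above:

```python
def allowed_downtime_minutes(availability: float, period_minutes: float) -> float:
    """Downtime budget for a given availability target over a period."""
    return (1.0 - availability) * period_minutes

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

# 99.99% availability -> roughly 52.6 minutes of downtime per year
print(round(allowed_downtime_minutes(0.9999, MINUTES_PER_YEAR), 1))
```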

Load Balancing

Why It Matters

A load balancer distributes incoming traffic across multiple servers to prevent any single node from becoming a bottleneck.


Algorithms

Algorithm           How It Works                                      Use Case
Round Robin         Cycles through servers in order                   Equal-capacity servers
Least Connections   Routes to server with fewest active connections   Varying request durations
IP Hash             Hashes client IP → sticky sessions                Session-sensitive apps
Weighted RR         Routes proportional to server weight              Mixed server capacities
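
Two of the algorithms above can be sketched in a few lines each. This is an illustrative single-process model (server names are placeholders), not a production balancer:

```python
import itertools

class RoundRobin:
    """Cycle through servers in order (assumes roughly equal capacity)."""
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)

class LeastConnections:
    """Route to the server with the fewest active connections."""
    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def pick(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1   # caller should decrement on completion
        return server

rr = RoundRobin(["s1", "s2", "s3"])
print([rr.pick() for _ in range(4)])   # ['s1', 's2', 's3', 's1']
```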

Layers

  • L4 (Transport) — TCP/UDP level, fast, no content inspection
  • L7 (Application) — HTTP-aware, supports routing by path/header (e.g., NGINX, HAProxy, AWS ALB)

Load Balancing — Architecture

                    ┌──────────────────┐
  Clients  ──────►  │   Load Balancer  │  ◄── Health Checks
                    └────────┬─────────┘
              ┌──────────────┼──────────────┐
              ▼              ▼              ▼
        ┌──────────┐  ┌──────────┐  ┌──────────┐
        │ Server 1 │  │ Server 2 │  │ Server 3 │
        └──────────┘  └──────────┘  └──────────┘
              │              │              │
              └──────────────┼──────────────┘
                             ▼
                    ┌─────────────────┐
                    │   Shared Cache  │
                    │   + Database    │
                    └─────────────────┘

Active-Passive failover vs. Active-Active — both are valid HA strategies.

Caching Strategies

The Cache Hierarchy

Browser Cache  →  CDN  →  API Gateway Cache  →  App Cache  →  DB Cache
  (client)       (edge)     (L7 proxy)         (Redis)       (query)

Write Strategies

Pattern                Behavior                                  Tradeoff
Write-Through          Write to cache and DB simultaneously      Consistent, higher write latency
Write-Behind (Async)   Write to cache first, DB asynchronously   Faster writes, risk of data loss
Write-Around           Skip cache, write directly to DB          Avoids cache pollution

Read Strategies

  • Cache-Aside (Lazy Loading) — App checks cache first, fills on miss (most common)
  • Read-Through — Cache layer handles DB reads transparently
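
Cache-aside is straightforward to sketch. Below, `cache` is an in-process dict standing in for Redis, and `db_read` is a hypothetical stand-in for a real database query:

```python
import time

cache = {}          # in-process dict standing in for Redis
TTL_SECONDS = 60

def db_read(key):
    """Hypothetical stand-in for a real database query."""
    return f"value-for-{key}"

def get(key):
    """Cache-aside: check the cache first, fill it on a miss."""
    entry = cache.get(key)
    if entry is not None:
        value, expires_at = entry
        if time.time() < expires_at:
            return value                       # hit
    value = db_read(key)                       # miss (or expired): read the DB
    cache[key] = (value, time.time() + TTL_SECONDS)  # populate for next time
    return value

print(get("user:1"))   # first call misses and fills the cache
print(get("user:1"))   # second call is served from the cache
```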

Caching — Eviction & Pitfalls

Eviction Policies

  • LRU — Least Recently Used (general purpose)
  • LFU — Least Frequently Used (for popularity-skewed data)
  • TTL — Time To Live expiry (freshness guarantee)
  • FIFO — First In, First Out (simple, less optimal)

Common Problems

Problem             Description                                       Solution
Cache Stampede      Many requests hit DB on simultaneous cache miss   Mutex lock / probabilistic early expiry
Cache Penetration   Requests for non-existent keys bypass cache       Bloom filter + cache null results
Cache Avalanche     Many keys expire at the same time                 Jitter TTL values

Rule of thumb: Cache data that is read-heavy, expensive to compute, and tolerant of slight staleness.
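
One of those mitigations, jittering TTLs to prevent an avalanche, is nearly a one-liner. A sketch with ±10% jitter (the base TTL and spread are arbitrary choices):

```python
import random

def jittered_ttl(base=300, spread=0.1):
    """TTL with +/-10% random jitter, so keys that were cached together
    don't all expire at the same moment."""
    return base * (1 + random.uniform(-spread, spread))

# each key written at the same time still gets a slightly different lifetime
print([round(jittered_ttl()) for _ in range(3)])
```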

Database Scaling

Replication

Primary-Replica (Leader-Follower):

  • All writes go to Primary
  • Reads distributed to Replicas
  • Enables read scaling + failover

Watch out for replication lag — replicas may serve stale reads.


Sharding (Horizontal Partitioning)

Split data across multiple DB nodes by a shard key.

Strategy          How                               Risk
Range-based       Shard by ID range                 Hot spots on sequential writes
Hash-based        hash(key) % N shards              Rebalancing on resize
Directory-based   Lookup table maps keys → shards   Lookup overhead
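
A minimal hash-based sharding sketch. `md5` is used only because it is stable across processes and restarts, unlike Python's built-in `hash()`:

```python
import hashlib

NUM_SHARDS = 4

def shard_for(key, num_shards=NUM_SHARDS):
    """Hash-based sharding: stable hash of the key, modulo shard count."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_shards

print(shard_for("user:42"))   # same key always lands on the same shard
```

Note that changing `num_shards` remaps almost every key — the rebalancing risk from the table; consistent hashing reduces how many keys move on resize.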

Database Scaling — Patterns

Indexing Tips

  • Index high-cardinality columns used in WHERE, JOIN, ORDER BY
  • Composite indexes — column order matters (leftmost prefix rule)
  • Covering indexes — include all columns needed by the query

SQL vs. NoSQL at Scale

               SQL (RDBMS)                  NoSQL
Schema         Rigid, relational            Flexible, document/KV/graph
Scaling        Vertical (primarily)         Horizontal (built-in)
Transactions   ACID                         Eventual consistency (usually)
Best for       Complex queries, integrity   High write throughput, flexibility

Don't abandon SQL prematurely. PostgreSQL + read replicas handles enormous scale.

CAP Theorem

You Can Only Guarantee Two of Three

               Consistency
                  /   \
                CP     CA
                /       \
   Partition ─────AP───── Availability
   Tolerance

Guarantee                 Meaning
Consistency (C)           Every read returns the most recent write
Availability (A)          Every request receives a response (may be stale)
Partition Tolerance (P)   System continues despite network partitions

In practice, network partitions are inevitable in distributed systems. You're choosing between CP (strong consistency) or AP (high availability).

CAP — Real World Examples

CP Systems (Consistency over Availability)

  • HBase, Zookeeper, etcd — refuse to serve stale data
  • Trade: may reject reads/writes during a partition

AP Systems (Availability over Consistency)

  • Cassandra, DynamoDB, CouchDB — always respond, may be stale
  • Trade: eventual consistency, conflict resolution needed

PACELC Extension

Even when there's no partition, there's a tradeoff between latency and consistency.

System       Partition   Else
DynamoDB     AP          EL (low latency)
PostgreSQL   CP          EC (consistent)
Cassandra    AP          EL

Message Queues & Event Streaming

Why Async Matters

Synchronous request chains create fragile dependencies — one slow service degrades everything.

Async messaging decouples producers from consumers:

Producer  ──►  Queue / Topic  ──►  Consumer(s)
  (API)        (Kafka, SQS)         (workers)

Use Cases

  • Task queues — Email sending, image processing, report generation
  • Event streaming — Audit logs, analytics pipelines, CDC
  • Fan-out — One event → many consumers (pub/sub)
  • Backpressure — Queue absorbs traffic spikes, consumers process at their own pace
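
The decoupling and backpressure ideas can be modeled in-process with a bounded queue. Here `queue.Queue` stands in for Kafka/SQS, and the email task is hypothetical:

```python
import queue
import threading

tasks = queue.Queue(maxsize=100)   # bounded queue: a full queue is backpressure
results = []

def worker():
    """Consumer: drains the queue at its own pace."""
    while True:
        job = tasks.get()
        if job is None:            # sentinel tells the worker to stop
            break
        results.append(f"sent email to {job}")   # hypothetical task
        tasks.task_done()

t = threading.Thread(target=worker)
t.start()

for user in ["alice", "bob"]:      # producer: enqueue and move on
    tasks.put(user)                # blocks only if the queue is full
tasks.put(None)
t.join()
print(results)
```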

Message Queues — Kafka vs. SQS

Feature          Apache Kafka                          AWS SQS
Model            Log-based streaming                   Queue (pull-based)
Retention        Days/weeks (configurable)             14 days max
Consumer model   Consumer groups, offset tracking      Competing consumers
Ordering         Ordered per partition                 FIFO queue option
Throughput       Extremely high                        High
Best for         Event sourcing, streaming pipelines   Task queues, decoupling

Delivery Guarantees

  • At-most-once — May lose messages, never duplicates
  • At-least-once — May duplicate, never loses (most common)
  • Exactly-once — Hard, requires idempotency on consumer side
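
Under at-least-once delivery, consumers must be idempotent. A minimal dedup sketch (in production, `processed_ids` would live in a database or Redis, not process memory):

```python
processed_ids = set()   # in production: a DB table or Redis set

def handle(message_id, payload):
    """Idempotent consumer: a redelivered message is recognized and skipped."""
    if message_id in processed_ids:
        return False                # duplicate delivery: already handled
    # ... do the actual work with payload here ...
    processed_ids.add(message_id)   # record only after the work succeeds
    return True

print(handle("msg-1", "charge $10"))   # True: first delivery does the work
print(handle("msg-1", "charge $10"))   # False: the retry is safely ignored
```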

Rate Limiting & Throttling

Why Rate Limit?

  • Protect backend services from abuse or traffic spikes
  • Enforce fair usage in multi-tenant APIs
  • Prevent DDoS amplification

Algorithms

Algorithm                How It Works                                        Pros / Cons
Token Bucket             Tokens refill at fixed rate; request consumes one   Allows bursts; smooth average rate
Leaky Bucket             Requests drain at fixed rate                        Smooth output; drops bursts
Fixed Window             Count reqs per time window                          Simple; boundary burst problem
Sliding Window Log       Log timestamps, count within rolling window         Accurate; memory-intensive
Sliding Window Counter   Hybrid of fixed windows with interpolation          Efficient + accurate

Redis + atomic INCR + EXPIRE is a common distributed rate limiter implementation.
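
A single-process token bucket sketch; a distributed Redis variant follows the same shape, with the token state kept server-side:

```python
import time

class TokenBucket:
    """Tokens refill at a fixed rate; each request consumes one.
    Bursts up to `capacity` are allowed; the long-run average is `rate`."""

    def __init__(self, rate, capacity):
        self.rate = rate                  # tokens added per second
        self.capacity = capacity
        self.tokens = capacity            # start full: allow an initial burst
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=5, capacity=10)
print(sum(bucket.allow() for _ in range(20)))   # ~10 in a tight loop: the burst capacity
```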

CDNs & Edge Caching

Content Delivery Networks

A CDN is a globally distributed network of edge servers that caches content close to users.

User (Tokyo) ──► CDN Edge (Tokyo) ──► Origin Server (US-East)
                   Cache HIT ✓            (only on miss)

What to Put on a CDN

  • ✅ Static assets: JS, CSS, images, fonts
  • ✅ Video/media streaming
  • ✅ API responses with proper Cache-Control headers
  • ❌ Authenticated, user-specific responses
  • ❌ Real-time data (unless edge computing is used)

Cache Invalidation Strategies

  • Versioned URLs — app.v3.2.1.js (recommended)
  • Purge API — Explicit CDN cache invalidation on deploy
  • Short TTL — Frequent but automatic freshness

Microservices vs. Monolith

The Spectrum

Monolith ──────────────────────────────────► Microservices
  │                   │                             │
Simple deploy    Modular Monolith           Independent services
Single codebase  (recommended starting pt)  per domain/team

When to Split

Signal                                  Action
Teams blocked by shared code            Split the bounded context
One component needs different scaling   Extract as a service
Independent deployment needed           Service boundary
Distinct data ownership                 Separate DB + service

Start with a well-structured monolith. Premature decomposition creates distributed complexity without the scale to justify it.

Microservices — Communication Patterns

Synchronous (Request/Response)

  • REST — Simple, widely supported, stateless
  • gRPC — Binary, strongly typed, efficient (ideal for internal services)
  • GraphQL — Flexible queries, reduces over-fetching

Asynchronous (Event-Driven)

  • Services emit events; others react independently
  • Enables loose coupling and temporal decoupling

Service Discovery

Service A  ──►  Service Registry  ──►  Service B
               (Consul / Eureka)       (resolved IP)

  • Client-side discovery — Service A queries registry directly
  • Server-side discovery — Load balancer queries registry
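
Client-side discovery reduces to "ask the registry, then pick an instance". A toy in-memory sketch (the addresses are made up for illustration):

```python
import random

registry = {}   # service name -> list of healthy instance addresses

def register(service, address):
    """Instances call this on startup (a real registry also health-checks)."""
    registry.setdefault(service, []).append(address)

def resolve(service):
    """Client-side discovery: ask the registry, then pick an instance."""
    return random.choice(registry[service])

register("billing", "10.0.0.5:8080")
register("billing", "10.0.0.6:8080")
print(resolve("billing"))   # one of the two registered addresses
```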

Observability: Metrics, Logs, Traces

The Three Pillars

┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│   METRICS    │  │     LOGS     │  │   TRACES     │
│              │  │              │  │              │
│ Aggregated   │  │ Timestamped  │  │ Request path │
│ numeric data │  │ event records│  │ across svcs  │
│              │  │              │  │              │
│ Prometheus   │  │ ELK / Loki   │  │ Jaeger /     │
│ Datadog      │  │ Datadog      │  │ Zipkin / OTEL│
└──────────────┘  └──────────────┘  └──────────────┘

Key Metrics to Track (USE Method)

  • Utilization — % time resource is busy (CPU, disk)
  • Saturation — Work queued but not yet served
  • Errors — Rate of failed requests

Observability — SLOs & Alerting

SLI / SLO / SLA

Term              Meaning                           Example
SLI (Indicator)   The metric being measured         Request success rate
SLO (Objective)   Internal target for the SLI       99.9% success over 30 days
SLA (Agreement)   External contractual commitment   99.5% with penalty clauses

Error Budget

Error Budget = 100% - SLO
At a 99.9% SLO over 30 days → ~43.2 minutes/month of allowed downtime.

Spend your error budget on:

  • Planned maintenance windows
  • Risky deployments
  • Experiments & chaos engineering

Putting It All Together

A Scalable Web System

         ┌─────────────────────────────────────────────────┐
         │                    CDN / Edge                   │
         └────────────────────┬────────────────────────────┘
                              ▼
                   ┌─────────────────┐
                   │  Load Balancer  │  ◄── Rate Limiting
                   └────────┬────────┘
             ┌──────────────┼──────────────┐
             ▼              ▼              ▼
       ┌──────────┐   ┌──────────┐   ┌──────────┐
       │ App Svc  │   │ App Svc  │   │ App Svc  │
       └──────────┘   └──────────┘   └──────────┘
             │              │              │
             ▼              ▼              ▼
      ┌─────────┐    ┌────────────┐  ┌──────────────┐
      │  Redis  │    │  Primary   │  │ Message Queue│
      │  Cache  │    │  DB + Reps │  │   (Kafka)    │
      └─────────┘    └────────────┘  └──────────────┘

Key Takeaways


  • Scale horizontally — stateless services + shared state layer
  • Cache aggressively — but understand eviction & invalidation
  • Decouple with queues — absorb spikes, enable async workflows
  • Choose consistency tradeoffs — CAP is real; design explicitly
  • Observe everything — metrics, logs, traces before you scale
  • Start simpler than you think — premature optimization is costly

"The goal is not to build the most complex system. It's to build the simplest system that meets the current and near-future requirements."

Reference: Back-of-Envelope Estimation

Useful Numbers

Resource                 Approximate Latency
L1 cache read            ~1 ns
L2 cache read            ~10 ns
RAM read                 ~100 ns
SSD random read          ~100 µs
HDD random read          ~10 ms
Network (same DC)        ~0.5 ms
Network (cross-region)   ~100–300 ms

Storage Units

1 KB = 10³ B  |  1 MB = 10⁶ B  |  1 GB = 10⁹ B  |  1 TB = 10¹² B

Traffic Estimation

  • 1M DAU × 10 req/day = ~116 RPS average
  • Apply 2–3× peak multiplier for capacity planning
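
The estimate above as code, so the constants are explicit:

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400

def average_rps(dau, requests_per_user_per_day):
    """Back-of-envelope: total daily requests spread evenly over a day."""
    return dau * requests_per_user_per_day / SECONDS_PER_DAY

avg = average_rps(1_000_000, 10)
print(round(avg))        # ~116 RPS average
print(round(avg * 3))    # ~347 RPS to provision for, with a 3x peak multiplier
```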