SYSTEM DESIGN
System design is about architecting large-scale software systems. It’s how you build applications that handle millions of users and massive data volumes while staying reliable.
Core Principles
- Scalability – Handle growing load (users, data, traffic)
- Reliability – System works correctly even when things fail
- Availability – System is operational and accessible
- Maintainability – Easy to update, debug, and extend
- Performance – Fast response times, efficient resource use
Scalability Strategies
Vertical Scaling (Scale Up)
Add more power to existing machines (CPU, RAM, disk)
Pros: Simple, no code changes
Cons: Hardware limits, expensive, single point of failure
Horizontal Scaling (Scale Out)
Add more machines to distribute load
Pros: Nearly unlimited, cost-effective with commodity hardware
Cons: Complexity, need load balancing, data consistency challenges
Key System Design Concepts
1. Load Balancing
Distributes incoming requests across multiple servers
Algorithms:
- Round Robin: Request 1 → Server A, Request 2 → Server B, Request 3 → Server C
- Least Connections: Send to server with fewest active connections
- IP Hash: Same client always goes to same server (session affinity)
- Weighted: Send more traffic to more powerful servers
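A minimal sketch of two of the algorithms above, Round Robin and Least Connections. The class names and the server labels "A", "B", "C" are illustrative assumptions, not part of any real load balancer's API.

```python
import itertools


class RoundRobinBalancer:
    """Hands out servers in a fixed rotation."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastConnectionsBalancer:
    """Picks the server with the fewest active connections."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def acquire(self):
        # Ties go to the server listed first
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1


rr = RoundRobinBalancer(["A", "B", "C"])
rr_order = [rr.pick() for _ in range(4)]  # wraps back to "A" after "C"

lc = LeastConnectionsBalancer(["A", "B"])
first = lc.acquire()   # "A" (tie, first listed)
second = lc.acquire()  # "B" (A already has one connection)
lc.release(first)
third = lc.acquire()   # "A" again, since it is now idle
```

A weighted variant could be approximated by listing a more powerful server several times in the rotation.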
Types:
- Layer 4 (Transport): Routes based on IP/port (fast, simple)
- Layer 7 (Application): Routes based on content/cookies (flexible, slower)
Example Architecture:
               Load Balancer
                     |
    +----------------+----------------+
    |                |                |
Server 1         Server 2         Server 3
2. Caching
Store frequently accessed data in fast storage
Cache Levels:
- Client-side: Browser cache
- CDN: Cache static assets (images, CSS, JS) near users
- Application: Redis, Memcached
- Database: Query result cache
Cache Strategies:
Cache-Aside (Lazy Loading):
1. App checks cache
2. If miss, fetch from database
3. Store in cache
4. Return data
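The four cache-aside steps can be sketched roughly as follows. The in-memory dict stands in for Redis or Memcached, and `fake_db`, `ttl_seconds`, and the `CacheAside` class are hypothetical names chosen for illustration.

```python
import time


class CacheAside:
    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)                 # 1. App checks cache
        if entry is not None and entry[1] > self._clock():
            return entry[0]                          # hit: skip the database
        value = self._load(key)                      # 2. Miss: fetch from database
        self._store[key] = (value, self._clock() + self._ttl)  # 3. Store in cache
        return value                                 # 4. Return data


db_calls = []


def fake_db(key):
    # Stand-in for a real database query; records each call
    db_calls.append(key)
    return f"row-{key}"


users = CacheAside(fake_db)
first_read = users.get(1)   # miss, goes to the database
second_read = users.get(1)  # hit, served from the cache
```

The TTL here is one simple way to bound staleness; real deployments often also invalidate the key when the underlying row changes.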
Write-Through:
1. Write to cache
2. Cache writes to database
3. Return success
Write-Behind:
1. Write to cache
2. Return success immediately
3. Cache writes to database asynchronously
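To make the difference between the two write strategies concrete, here is a small sketch. Plain dicts stand in for both the cache and the database, and the explicit `flush()` call stands in for the background writer a real write-behind cache would run; all names are illustrative assumptions.

```python
from collections import deque


class WriteThroughCache:
    def __init__(self, db):
        self.db = db
        self.cache = {}

    def put(self, key, value):
        self.cache[key] = value
        self.db[key] = value  # database is updated before put() returns


class WriteBehindCache:
    def __init__(self, db):
        self.db = db
        self.cache = {}
        self.pending = deque()

    def put(self, key, value):
        self.cache[key] = value
        self.pending.append((key, value))  # returns without touching the database

    def flush(self):
        # In practice this would run asynchronously on a timer or worker
        while self.pending:
            key, value = self.pending.popleft()
            self.db[key] = value


db1 = {}
WriteThroughCache(db1).put("k", 1)

db2 = {}
wb = WriteBehindCache(db2)
wb.put("k", 1)
before_flush = dict(db2)  # still empty: the write is only queued
wb.flush()
```

The trade-off shows up in `before_flush`: write-behind acknowledges faster, but queued writes are lost if the cache node fails before flushing.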
Cache Eviction Policies:
- LRU (Least Recently Used): Remove items that have gone longest without being accessed
- LFU (Least Frequently Used): Remove items accessed least often
- FIFO: Remove oldest items first
- TTL (Time To Live): Expire after set time
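As an illustration of one eviction policy, LRU can be sketched with the standard library's `OrderedDict`. The `LRUCache` class name and the capacity of 2 are assumptions for the example.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()  # oldest access first, newest last

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used


lru = LRUCache(capacity=2)
lru.put("a", 1)
lru.put("b", 2)
lru.get("a")     # "a" is now more recent than "b"
lru.put("c", 3)  # over capacity: "b" is evicted
```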
3. Database Strategies
Replication:
- Master-Slave: Master handles writes, slaves handle reads
- Master-Master: Multiple masters, both read and write
Partitioning/Sharding: Split data across multiple databases
Horizontal Sharding:
Users 1 to 1,000,000 → Database 1
Users 1,000,001 to 2,000,000 → Database 2
Users 2,000,001 to 3,000,000 → Database 3
Sharding Strategies:
- Range-based: User IDs 1-1000, 1001-2000, etc.
- Hash-based: hash(user_id) % num_shards
- Geographic: US users, EU users, Asia users
- Directory-based: Lookup table maps keys to shards
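A rough sketch of how three of these strategies pick a shard. The function names and the directory contents are hypothetical. The hash-based version uses `hashlib` rather than Python's built-in `hash()`, because string hashes from the built-in vary between processes and would route the same key to different shards.

```python
import hashlib


def hash_shard(key, num_shards):
    """Hash-based: stable digest of the key, modulo the shard count."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def range_shard(user_id, shard_size):
    """Range-based: zero-based shard index for IDs starting at 1."""
    return (user_id - 1) // shard_size


def directory_shard(key, directory):
    """Directory-based: explicit lookup table from key to shard."""
    return directory[key]


tenant_directory = {"tenant-a": 0, "tenant-b": 2}  # hypothetical lookup table

same_shard = hash_shard("user-7", 4) == hash_shard("user-7", 4)
last_of_first_range = range_shard(1_000_000, 1_000_000)   # 0
first_of_second_range = range_shard(1_000_001, 1_000_000)  # 1
tenant_b_shard = directory_shard("tenant-b", tenant_directory)  # 2
```

Note that with plain modulo hashing, changing `num_shards` remaps most keys, which is one reason resharding is costly.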
SQL vs NoSQL:
SQL (Relational):
- Structured data with relationships
- ACID transactions
- Complex queries (JOINs)
- Examples: PostgreSQL, MySQL
NoSQL:
- Document: MongoDB (JSON documents)
- Key-Value: Redis, DynamoDB (fast lookups)
- Column-family: Cassandra (wide tables, high write throughput)
- Graph: Neo4j (relationships are first-class)
Use SQL by default; choose NoSQL when you need a specific benefit (horizontal scalability, flexible schemas, low-latency key lookups)
4. CAP Theorem
A distributed system can guarantee at most 2 of these 3 at once:
- Consistency: All nodes see the same data at the same time
- Availability: Every request gets a response
- Partition Tolerance: System keeps working despite network failures between nodes
Real-world choices:
- CP: MongoDB, HBase (consistent, may be unavailable during partitions)
- AP: Cassandra, DynamoDB (available, eventual consistency)
- CA: Traditional RDBMS (but networks always partition, so this is theoretical)
5. Message Queues
Asynchronous communication between services
Benefits:
- Decouple services
- Handle traffic spikes
- Retry failed operations
- Process tasks in background
Popular Options:
- RabbitMQ
- Apache Kafka (high-throughput streaming)
- AWS SQS
- Redis (simple pub/sub)
Example:
User uploads video → Queue → Worker processes video → Queue → Worker sends notification
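The pipeline above can be sketched in-process with the standard library's `queue` and `threading` modules. This only models the flow; a real system would use a broker such as those listed above. The stage function, the sentinel convention, and the file names are assumptions for illustration.

```python
import queue
import threading

SENTINEL = None  # signals that no more work is coming


def stage(inbox, handle, outbox):
    """Consume items from inbox until the sentinel, pushing results to outbox."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # pass the shutdown signal downstream
            return
        outbox.put(handle(item))


uploads = queue.Queue()
processed = queue.Queue()
notifications = queue.Queue()

workers = [
    threading.Thread(target=stage, args=(uploads, lambda v: f"processed:{v}", processed)),
    threading.Thread(target=stage, args=(processed, lambda v: f"notified:{v}", notifications)),
]
for w in workers:
    w.start()

# The "user" only enqueues work and does not wait for processing
for video in ["a.mp4", "b.mp4"]:
    uploads.put(video)
uploads.put(SENTINEL)

for w in workers:
    w.join()

sent = []
while True:
    item = notifications.get()
    if item is SENTINEL:
        break
    sent.append(item)
```

Because each stage only talks to its queues, either worker can be slowed down, restarted, or scaled out without the uploader noticing.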
6. Microservices vs Monolith
Monolith:
- Single codebase, deployed as one unit
- Simple to develop initially
- Harder to scale, deploy, and maintain as the codebase and team grow
Microservices:
- Multiple small services, each with specific responsibility
- Independent deployment and scaling
- More complex (need service discovery, API gateway, monitoring)
When to use microservices:
- Large team
- Need to scale parts independently
- Different tech stacks make sense
- Want faster deployment cycles
7. API Design
REST:
GET /users - List users
GET /users/123 - Get user
POST /users - Create user
PUT /users/123 - Update user
DELETE /users/123 - Delete user
GraphQL:
query {
  user(id: 123) {
    name
    email
    posts {
      title
    }
  }
}
Client requests exactly what it needs, single endpoint
gRPC:
- Binary serialization over HTTP/2 (typically more compact and faster to parse than JSON)
- Strong typing with Protocol Buffers
- Bi-directional streaming
8. Rate Limiting
Prevent abuse and ensure fair usage
Algorithms:
Token Bucket:
- Bucket holds tokens
- Request consumes token
- Tokens refill at fixed rate
- If no tokens, request denied
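A minimal token bucket sketch following the four bullets above. The `clock` parameter is an assumption added so the example can be driven by a fake clock instead of real time; `capacity=2` and `refill_rate=1.0` are arbitrary illustration values.

```python
import time


class TokenBucket:
    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate  # tokens added per second
        self._clock = clock
        self._tokens = float(capacity)  # bucket starts full
        self._last = clock()

    def allow(self):
        # Refill lazily based on time elapsed since the last call
        now = self._clock()
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_rate)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1  # request consumes a token
            return True
        return False           # no tokens: request denied


fake_now = [0.0]
bucket = TokenBucket(capacity=2, refill_rate=1.0, clock=lambda: fake_now[0])
burst = [bucket.allow(), bucket.allow(), bucket.allow()]  # third is denied
fake_now[0] = 1.0             # one second later, one token has refilled
after_refill = bucket.allow()
```

The capacity controls how large a burst is tolerated, while the refill rate sets the sustained request rate.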
Leaky Bucket:
- Requests enter queue
- Process at fixed rate
- Overflow requests rejected
Fixed Window:
- 100 requests per minute
- Counter resets every minute
- Problem: A burst straddling a window boundary can let through up to twice the limit in a short span
Sliding Window:
- Smooths out fixed window issues
- Considers requests in rolling time period
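One way to implement a sliding window is a log of request timestamps, sketched below. This is only one variant (others approximate the window with weighted counters); the class name, the injectable `clock`, and the limit of 2 per 60 seconds are assumptions for the example.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self._clock = clock
        self._timestamps = deque()  # accepted requests, oldest first

    def allow(self):
        now = self._clock()
        # Drop timestamps that have fallen out of the rolling window
        while self._timestamps and self._timestamps[0] <= now - self.window:
            self._timestamps.popleft()
        if len(self._timestamps) < self.limit:
            self._timestamps.append(now)
            return True
        return False


fake_now = [0.0]
limiter = SlidingWindowLimiter(limit=2, window_seconds=60, clock=lambda: fake_now[0])
decisions = []
for t in (0, 30, 59, 60, 61):
    fake_now[0] = t
    decisions.append(limiter.allow())
# decisions == [True, True, False, True, False]
```

Unlike a fixed window, the request at t=61 is still denied because the requests at t=30 and t=60 both fall inside the previous 60 seconds.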
System Design Example: URL Shortener (like bit.ly)
Requirements:
- Shorten URLs
- Redirect to original URLs
- Handle 100M URLs, 1000 requests/second
- URLs don’t expire
Design:
1. API:
POST /shorten
Body: { "url": "https://example.com/very/long/url" }
Response: { "short_url": "abc123" }
GET /{short_code}
Redirects to original URL
2. Generate Short Codes:
- Base62 encoding (a-z, A-Z, 0-9) = 62 characters
- 7 characters = 62^7 = 3.5 trillion possibilities
- Use auto-increment ID, convert to base62
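The ID-to-code conversion can be sketched as below, using the a-z, A-Z, 0-9 alphabet order given above. The function names are illustrative, and the alphabet ordering is an arbitrary choice as long as encode and decode agree.

```python
import string

# a-z, A-Z, 0-9 = 62 characters
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits


def encode_base62(number):
    """Convert a non-negative integer ID to a base62 short code."""
    if number < 0:
        raise ValueError("ID must be non-negative")
    if number == 0:
        return ALPHABET[0]
    chars = []
    while number:
        number, remainder = divmod(number, 62)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def decode_base62(code):
    """Convert a base62 short code back to its integer ID."""
    number = 0
    for ch in code:
        number = number * 62 + ALPHABET.index(ch)
    return number


code = encode_base62(123456789)
round_trip = decode_base62(code)  # 123456789
keyspace = 62 ** 7                # 3,521,614,606,208 (about 3.5 trillion)
```

Sequential IDs produce guessable codes, so a design that needs unpredictability would have to scramble or randomize the ID first.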
3. Database Schema:
CREATE TABLE urls (
    id BIGINT PRIMARY KEY AUTO_INCREMENT,
    original_url TEXT NOT NULL,
    short_code VARCHAR(10) NOT NULL UNIQUE,  -- UNIQUE already creates an index
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
4. Architecture:
User → Load Balancer → App Servers → Cache (Redis) → Database
                            ↓
                         Metrics
5. Optimization:
- Cache popular URLs in Redis (80/20 rule)
- Use CDN for static assets
- Database read replicas for redirects
- Rate limiting per IP
6. Scalability:
- Shard database by short_code range
- Multiple app servers behind load balancer
- Pre-generate and cache short codes
DESIGN PATTERNS
Design patterns are reusable solutions to common programming problems. They’re templates, not finished code.
Categories
- Creational – Object creation
- Structural – Object composition
- Behavioral – Object interaction
Creational Patterns
1. Singleton
Ensure only one instance of a class exists
Use case: Database connection, logger, configuration manager
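def create_connection():
    # Placeholder: stands in for a real database driver's connect call
    return object()

class Database:
    _instance = None

    def __new__(cls):
        # Create the instance (and its connection) only on first use
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.connection = create_connection()
        return cls._instance

# Usage
db1 = Database()
db2 = Database()
print(db1 is db2)  # True: db1 and db2 are the same instance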
Pros: Single point of control, lazy initialization
Cons: Global state (testing issues), thread-safety concerns
2. Factory
Create objects without specifying exact class
Use case: Creating different types of objects based on input
class Dog:
    def speak(self):
        return "Woof!"

class Cat:
    def speak(self):
        return "Meow!"

class AnimalFactory:
    def create_animal(self, animal_type):
        if animal_type == "dog":
            return Dog()
        elif animal_type == "cat":
            return Cat()
        # Fail loudly instead of silently returning None
        raise ValueError(f"Unknown animal type: {animal_type}")

# Usage
factory = AnimalFactory()
pet = factory.create_animal("dog")
print(pet.speak())  # "Woof!"
3. Builder
Construct complex objects step by step
Use case: Creating objects with many optional parameters
class Pizza:
    def __init__(self):
        self.size = None
        self.cheese = False
        self.pepperoni = False
        self.mushrooms = False

class PizzaBuilder:
    def __init__(self):
        self.pizza = Pizza()

    def set_size(self, size):
        self.pizza.size = size
        return self  # Returning self allows method chaining

    def add_cheese(self):
        self.pizza.cheese = True
        return self

    def add_pepperoni(self):
        self.pizza.pepperoni = True
        return self

    def add_mushrooms(self):
        self.pizza.mushrooms = True
        return self

    def build(self):
        return self.pizza

# Usage
pizza = (
    PizzaBuilder()
    .set_size("large")
    .add_cheese()
    .add_pepperoni()
    .build()
)
Structural Patterns
4. Adapter
Make incompatible interfaces work together
Use case: Integrating third-party libraries, legacy code
# Old interface
class OldPaymentSystem:
    def make_payment(self, amount):
        print(f"Old system: Processing ${amount}")

# New interface expected by our app
class PaymentProcessor:
    def process(self, amount):
        pass

# Adapter
class PaymentAdapter(PaymentProcessor):
    def __init__(self, old_system):
        self.old_system = old_system

    def process(self, amount):
        self.old_system.make_payment(amount)

# Usage
old_system = OldPaymentSystem()
adapter = PaymentAdapter(old_system)
adapter.process(100)  # Works with new interface!
5. Decorator
Add behavior to objects dynamically
Use case: Adding features without modifying original class
class Coffee:
    def cost(self):
        return 5

class MilkDecorator:
    def __init__(self, coffee):
        self.coffee = coffee

    def cost(self):
        return self.coffee.cost() + 2

class SugarDecorator:
    def __init__(self, coffee):
        self.coffee = coffee

    def cost(self):
        return self.coffee.cost() + 1

# Usage
coffee = Coffee()
coffee_with_milk = MilkDecorator(coffee)
coffee_with_milk_and_sugar = SugarDecorator(coffee_with_milk)
print(coffee_with_milk_and_sugar.cost())  # 8
6. Facade
Simplified interface to complex subsystem
Use case: Hide complexity, provide simple API
# Complex subsystems
class CPU:
    def freeze(self): pass
    def execute(self): pass

class Memory:
    def load(self): pass

class HardDrive:
    def read(self): pass

# Facade
class ComputerFacade:
    def __init__(self):
        self.cpu = CPU()
        self.memory = Memory()
        self.hard_drive = HardDrive()

    def start(self):
        self.cpu.freeze()
        self.memory.load()
        self.hard_drive.read()
        self.cpu.execute()

# Usage
computer = ComputerFacade()
computer.start()  # Simple interface!
Behavioral Patterns
7. Observer
Objects notify subscribers about changes
Use case: Event systems, pub/sub, MVC
class Subject:
    def __init__(self):
        self._observers = []

    def attach(self, observer):
        self._observers.append(observer)

    def notify(self, message):
        for observer in self._observers:
            observer.update(message)

class EmailObserver:
    def update(self, message):
        print(f"Email: {message}")

class SMSObserver:
    def update(self, message):
        print(f"SMS: {message}")

# Usage
subject = Subject()
subject.attach(EmailObserver())
subject.attach(SMSObserver())
subject.notify("New order received!")
# Email: New order received!
# SMS: New order received!
8. Strategy
Define family of algorithms, make them interchangeable
Use case: Different sorting algorithms, payment methods, compression
class PaymentStrategy:
    def pay(self, amount):
        pass

class CreditCardPayment(PaymentStrategy):
    def pay(self, amount):
        print(f"Paid ${amount} with credit card")

class PayPalPayment(PaymentStrategy):
    def pay(self, amount):
        print(f"Paid ${amount} with PayPal")

class ShoppingCart:
    def __init__(self, payment_strategy):
        self.payment_strategy = payment_strategy

    def checkout(self, amount):
        self.payment_strategy.pay(amount)

# Usage
cart = ShoppingCart(CreditCardPayment())
cart.checkout(100)
cart = ShoppingCart(PayPalPayment())
cart.checkout(50)
9. Command
Encapsulate requests as objects
Use case: Undo/redo, task queues, transactions
class Light:
    def turn_on(self):
        print("Light ON")

    def turn_off(self):
        print("Light OFF")

class Command:
    def execute(self):
        pass

class TurnOnCommand(Command):
    def __init__(self, light):
        self.light = light

    def execute(self):
        self.light.turn_on()

class TurnOffCommand(Command):
    def __init__(self, light):
        self.light = light

    def execute(self):
        self.light.turn_off()

class RemoteControl:
    def __init__(self):
        self.history = []

    def execute(self, command):
        command.execute()
        self.history.append(command)

# Usage
light = Light()
remote = RemoteControl()
remote.execute(TurnOnCommand(light))
remote.execute(TurnOffCommand(light))
10. Template Method
Define algorithm skeleton, let subclasses override steps
Use case: Frameworks, game engines, workflows
from abc import ABC, abstractmethod

class DataProcessor(ABC):
    def process(self):
        self.read_data()
        self.process_data()
        self.save_data()

    @abstractmethod
    def read_data(self):
        pass

    @abstractmethod
    def process_data(self):
        pass

    def save_data(self):
        print("Saving to database")  # Common implementation

class CSVProcessor(DataProcessor):
    def read_data(self):
        print("Reading CSV")

    def process_data(self):
        print("Processing CSV data")

class JSONProcessor(DataProcessor):
    def read_data(self):
        print("Reading JSON")

    def process_data(self):
        print("Processing JSON data")

# Usage
processor = CSVProcessor()
processor.process()
Anti-Patterns (What to Avoid)
- God Object: One class does everything
- Spaghetti Code: Tangled control flow
- Golden Hammer: Using same solution for every problem
- Premature Optimization: Optimizing before measuring
- Copy-Paste Programming: Duplicating code instead of abstracting
When to Use What
System Design Focus Areas:
- Read-heavy: Caching, read replicas, CDN
- Write-heavy: Message queues, sharding, async processing
- Real-time: WebSockets, push notifications, streaming
- Analytics: Data warehouses, batch processing, OLAP
Design Pattern Selection:
- Need one instance? → Singleton
- Complex object creation? → Builder or Factory
- Add features dynamically? → Decorator
- Different algorithms? → Strategy
- Notify multiple objects? → Observer
- Simplify complex system? → Facade