Database design goes well beyond just tables and normalization.

Indexes

Purpose: Speed up data retrieval (like a book’s index)

Types:

Primary Index: Automatically created on primary keys
Unique Index: Ensures uniqueness (like email addresses)
Composite Index: Multiple columns together (lastName, firstName)
Full-text Index: For text searching

Example:

CREATE INDEX idx_customer_email ON Customers(email);
CREATE INDEX idx_order_date ON Orders(order_date, customer_id);

Trade-offs: Indexes speed up reads but slow down writes (inserts/updates). Don’t index everything—only columns you frequently search, filter, or join on.

Constraints

Data Integrity Rules:

NOT NULL: Field must have a value

email VARCHAR(255) NOT NULL

UNIQUE: No duplicate values allowed

username VARCHAR(50) UNIQUE

CHECK: Validates data against conditions

age INT CHECK (age >= 0 AND age <= 150)
price DECIMAL(10,2) CHECK (price > 0)

DEFAULT: Provides automatic values

created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
status VARCHAR(20) DEFAULT 'pending'

FOREIGN KEY with actions:

customer_id INT REFERENCES Customers(customer_id)
  ON DELETE CASCADE  -- Delete orders when customer deleted
  ON UPDATE CASCADE  -- Update order's customer_id if customer_id changes

Other FK options: SET NULL, SET DEFAULT, RESTRICT

Data Types

Choose appropriately for performance and storage:

Numeric:

INT vs BIGINT vs SMALLINT (storage size matters)
DECIMAL(10,2) for money (exact precision)
FLOAT/DOUBLE for scientific data (approximate)

Strings:

CHAR(10) – Fixed length, faster but wastes space
VARCHAR(255) – Variable length, more flexible
TEXT – Large text (don’t index these)

Dates:

DATE – Just the date
TIMESTAMP – Date + time with timezone awareness
DATETIME – Date + time without timezone

Boolean:

BOOLEAN or TINYINT(1)

Bad example: Storing dates as strings "2024-12-06" – can’t do date math or proper sorting

Naming Conventions

Consistency is critical:

Table names:

Plural vs singular (choose one): Users or User
PascalCase, snake_case, or lowercase (be consistent)

Column names:

Descriptive: created_at not dt
Avoid reserved words: Don’t name a column order or user
Foreign keys: customer_id clearly shows it references Customers

Prefixes to avoid:

Don’t prefix tables with tbl_
Don’t use Hungarian notation: str_name, int_age

Soft Deletes vs Hard Deletes

Hard Delete: Actually remove the row

DELETE FROM Users WHERE user_id = 123;

Soft Delete: Mark as deleted but keep the data

-- Add column
deleted_at TIMESTAMP NULL

-- "Delete"
UPDATE Users SET deleted_at = NOW() WHERE user_id = 123;

-- Query active users
SELECT * FROM Users WHERE deleted_at IS NULL;

Use soft deletes when:

You need audit trails
Data has dependencies
Regulatory compliance requires retention
Users might want to “undelete”

Audit Trails and History

Track changes for compliance/debugging:

Simple approach: Add tracking columns

created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
created_by INT REFERENCES Users(user_id)
updated_at TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
updated_by INT

Advanced approach: Separate audit table

-- Main table
Products (product_id, name, price)

-- Audit table
Product_History (
  history_id,
  product_id,
  name,
  price,
  action, -- 'INSERT', 'UPDATE', 'DELETE'
  changed_by,
  changed_at
)

Use database triggers to automatically populate audit tables.

Partitioning

Split large tables for performance:

Range Partitioning: By date ranges

-- Orders table partitioned by year
Orders_2023
Orders_2024
Orders_2025

Hash Partitioning: Distribute data evenly List Partitioning: By specific values (regions, categories)

Benefits: Faster queries, easier maintenance, can archive old partitions

Denormalization Patterns

When performance trumps perfect normalization:

Calculated fields:

-- Store order_total instead of always calculating from OrderItems
Orders (
  order_id,
  customer_id,
  order_total -- Denormalized!
)

Redundant data for performance:

-- Store customer_name in Orders to avoid joins
Orders (
  order_id,
  customer_id,
  customer_name -- Duplicates data from Customers table
)

Materialized views: Pre-computed query results updated periodically

Scalability Considerations

Vertical Partitioning: Split wide tables

-- Instead of one huge Users table
Users (user_id, username, email)
User_Profiles (user_id, bio, avatar, preferences) -- Rarely accessed columns

Horizontal Sharding: Split by rows across servers

User IDs 1-1M on Server A
User IDs 1M-2M on Server B

Read Replicas: Copy databases for read-only queries

Caching: Redis/Memcached for frequently accessed data

Security Design

Encryption:

Encrypt sensitive columns (passwords, SSN, credit cards)
Use bcrypt/argon2 for passwords, never plain text
Consider encryption at rest and in transit

Access Control:

Principle of least privilege
Different database users for different applications
Row-level security for multi-tenant applications

SQL Injection Prevention:

Use parameterized queries/prepared statements
Never concatenate user input into SQL

Performance Optimization

Query Optimization:

Use EXPLAIN to analyze slow queries
Avoid SELECT * (fetch only needed columns)
Limit result sets with pagination
Use proper JOIN types

Connection Pooling: Reuse database connections

Batch Operations: Insert/update many rows at once instead of one-by-one

Caching Strategies:

Application-level caching
Query result caching
Object caching

Backup and Recovery

Strategy:

Regular automated backups
Test restore procedures
Point-in-time recovery capability
Geographic redundancy

Types:

Full backups (weekly)
Incremental backups (daily)
Transaction log backups (hourly)

Schema Versioning / Migrations

Track database changes like code:

Use migration tools (Flyway, Liquibase, Alembic):

-- Migration: V001__create_users_table.sql
CREATE TABLE Users (...);

-- Migration: V002__add_phone_to_users.sql
ALTER TABLE Users ADD COLUMN phone VARCHAR(20);

Never modify migrations after they’re deployed—create new ones

Design Patterns

UUID vs Auto-increment IDs:

Auto-increment: Simple, sequential, can expose scale
UUID: Globally unique, harder to guess, larger storage

Polymorphic Associations: One table relates to multiple tables

Comments (
  comment_id,
  commentable_id,   -- Could be post_id or photo_id
  commentable_type  -- 'Post' or 'Photo'
)

EAV (Entity-Attribute-Value): Flexible but slow

-- For products with varying attributes
Product_Attributes (
  product_id,
  attribute_name,  -- 'color', 'size', 'weight'
  attribute_value
)

Use sparingly—usually better to use JSONB or separate tables

Documentation

Essential to maintain:

ER diagrams showing table relationships
Data dictionary explaining each table/column
Business rules and constraints
Index strategy and rationale
Backup/recovery procedures

Testing

Database testing:

Test constraints actually prevent bad data
Performance testing with realistic data volumes
Test backup/restore procedures
Load testing for concurrent users

Database design goes well beyond just tables and normalization.

Indexes

Constraints

Data Types

Naming Conventions

Soft Deletes vs Hard Deletes

Audit Trails and History

Partitioning

Denormalization Patterns

Scalability Considerations

Security Design

Performance Optimization

Backup and Recovery

Schema Versioning / Migrations

Design Patterns

Documentation

Testing

Backend developer roadmap 2026.

SYSTEM DESIGN AND DESIGN PATTERNS

Leave a Reply Cancel reply

Indexes

Constraints

Data Types

Naming Conventions

Soft Deletes vs Hard Deletes

Audit Trails and History

Partitioning

Denormalization Patterns

Scalability Considerations

Security Design

Performance Optimization

Backup and Recovery

Schema Versioning / Migrations

Design Patterns

Documentation

Testing

Similar Posts

Leave a Reply Cancel reply