Database design goes well beyond just tables and normalization.

Indexes

Purpose: Speed up data retrieval (like a book’s index)

Types:

  • Primary Index: Automatically created on primary keys
  • Unique Index: Ensures uniqueness (like email addresses)
  • Composite Index: Multiple columns together (lastName, firstName)
  • Full-text Index: For text searching

Example:

CREATE INDEX idx_customer_email ON Customers(email);
CREATE INDEX idx_order_date ON Orders(order_date, customer_id);

Trade-offs: Indexes speed up reads but slow down writes (inserts/updates). Don’t index everything—only columns you frequently search, filter, or join on.


Constraints

Data Integrity Rules:

NOT NULL: Field must have a value

email VARCHAR(255) NOT NULL

UNIQUE: No duplicate values allowed

username VARCHAR(50) UNIQUE

CHECK: Validates data against conditions

age INT CHECK (age >= 0 AND age <= 150)
price DECIMAL(10,2) CHECK (price > 0)

DEFAULT: Provides automatic values

created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
status VARCHAR(20) DEFAULT 'pending'

FOREIGN KEY with actions:

customer_id INT REFERENCES Customers(customer_id)
  ON DELETE CASCADE  -- Delete orders when customer deleted
  ON UPDATE CASCADE  -- Update order's customer_id if customer_id changes

Other FK options: SET NULL, SET DEFAULT, RESTRICT


Data Types

Choose appropriately for performance and storage:

Numeric:

  • INT vs BIGINT vs SMALLINT (storage size matters)
  • DECIMAL(10,2) for money (exact precision)
  • FLOAT/DOUBLE for scientific data (approximate)

Strings:

  • CHAR(10) – Fixed length, faster but wastes space
  • VARCHAR(255) – Variable length, more flexible
  • TEXT – Large text (don’t index these)

Dates:

  • DATE – Just the date
  • TIMESTAMP – Date + time with timezone awareness
  • DATETIME – Date + time without timezone

Boolean:

  • BOOLEAN or TINYINT(1)

Bad example: Storing dates as strings "2024-12-06" – can’t do date math or proper sorting


Naming Conventions

Consistency is critical:

Table names:

  • Plural vs singular (choose one): Users or User
  • PascalCase, snake_case, or lowercase (be consistent)

Column names:

  • Descriptive: created_at not dt
  • Avoid reserved words: Don’t name a column order or user
  • Foreign keys: customer_id clearly shows it references Customers

Prefixes to avoid:

  • Don’t prefix tables with tbl_
  • Don’t use Hungarian notation: str_name, int_age

Soft Deletes vs Hard Deletes

Hard Delete: Actually remove the row

DELETE FROM Users WHERE user_id = 123;

Soft Delete: Mark as deleted but keep the data

-- Add column
deleted_at TIMESTAMP NULL

-- "Delete"
UPDATE Users SET deleted_at = NOW() WHERE user_id = 123;

-- Query active users
SELECT * FROM Users WHERE deleted_at IS NULL;

Use soft deletes when:

  • You need audit trails
  • Data has dependencies
  • Regulatory compliance requires retention
  • Users might want to “undelete”

Audit Trails and History

Track changes for compliance/debugging:

Simple approach: Add tracking columns

created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
created_by INT REFERENCES Users(user_id)
updated_at TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
updated_by INT

Advanced approach: Separate audit table

-- Main table
Products (product_id, name, price)

-- Audit table
Product_History (
  history_id,
  product_id,
  name,
  price,
  action, -- 'INSERT', 'UPDATE', 'DELETE'
  changed_by,
  changed_at
)

Use database triggers to automatically populate audit tables.


Partitioning

Split large tables for performance:

Range Partitioning: By date ranges

-- Orders table partitioned by year
Orders_2023
Orders_2024
Orders_2025

Hash Partitioning: Distribute data evenly List Partitioning: By specific values (regions, categories)

Benefits: Faster queries, easier maintenance, can archive old partitions


Denormalization Patterns

When performance trumps perfect normalization:

Calculated fields:

-- Store order_total instead of always calculating from OrderItems
Orders (
  order_id,
  customer_id,
  order_total -- Denormalized!
)

Redundant data for performance:

-- Store customer_name in Orders to avoid joins
Orders (
  order_id,
  customer_id,
  customer_name -- Duplicates data from Customers table
)

Materialized views: Pre-computed query results updated periodically


Scalability Considerations

Vertical Partitioning: Split wide tables

-- Instead of one huge Users table
Users (user_id, username, email)
User_Profiles (user_id, bio, avatar, preferences) -- Rarely accessed columns

Horizontal Sharding: Split by rows across servers

  • User IDs 1-1M on Server A
  • User IDs 1M-2M on Server B

Read Replicas: Copy databases for read-only queries

Caching: Redis/Memcached for frequently accessed data


Security Design

Encryption:

  • Encrypt sensitive columns (passwords, SSN, credit cards)
  • Use bcrypt/argon2 for passwords, never plain text
  • Consider encryption at rest and in transit

Access Control:

  • Principle of least privilege
  • Different database users for different applications
  • Row-level security for multi-tenant applications

SQL Injection Prevention:

  • Use parameterized queries/prepared statements
  • Never concatenate user input into SQL

Performance Optimization

Query Optimization:

  • Use EXPLAIN to analyze slow queries
  • Avoid SELECT * (fetch only needed columns)
  • Limit result sets with pagination
  • Use proper JOIN types

Connection Pooling: Reuse database connections

Batch Operations: Insert/update many rows at once instead of one-by-one

Caching Strategies:

  • Application-level caching
  • Query result caching
  • Object caching

Backup and Recovery

Strategy:

  • Regular automated backups
  • Test restore procedures
  • Point-in-time recovery capability
  • Geographic redundancy

Types:

  • Full backups (weekly)
  • Incremental backups (daily)
  • Transaction log backups (hourly)

Schema Versioning / Migrations

Track database changes like code:

Use migration tools (Flyway, Liquibase, Alembic):

-- Migration: V001__create_users_table.sql
CREATE TABLE Users (...);

-- Migration: V002__add_phone_to_users.sql
ALTER TABLE Users ADD COLUMN phone VARCHAR(20);

Never modify migrations after they’re deployed—create new ones


Design Patterns

UUID vs Auto-increment IDs:

  • Auto-increment: Simple, sequential, can expose scale
  • UUID: Globally unique, harder to guess, larger storage

Polymorphic Associations: One table relates to multiple tables

Comments (
  comment_id,
  commentable_id,   -- Could be post_id or photo_id
  commentable_type  -- 'Post' or 'Photo'
)

EAV (Entity-Attribute-Value): Flexible but slow

-- For products with varying attributes
Product_Attributes (
  product_id,
  attribute_name,  -- 'color', 'size', 'weight'
  attribute_value
)

Use sparingly—usually better to use JSONB or separate tables


Documentation

Essential to maintain:

  • ER diagrams showing table relationships
  • Data dictionary explaining each table/column
  • Business rules and constraints
  • Index strategy and rationale
  • Backup/recovery procedures

Testing

Database testing:

  • Test constraints actually prevent bad data
  • Performance testing with realistic data volumes
  • Test backup/restore procedures
  • Load testing for concurrent users

Similar Posts

  • SYSTEM DESIGN AND DESIGN PATTERNS

    SYSTEM DESIGN System design is about architecting large-scale software systems. It’s how you build applications that handle millions of users, massive data, and stay reliable. Core Principles Scalability – Handle growing load (users, data, traffic) Reliability – System works correctly even when things fail Availability – System is operational and accessible Maintainability – Easy to…

  • Backend developer roadmap 2026.

    In 2025, companies expect backend developers to be strong in Core Java, Spring Boot, Microservices, and CI/CD Deployment. Whether you’re preparing for interviews or upgrading your skills, this is the most complete roadmap you’ll find online. This guide covers all important topics—Java 8 features, Stream API, multithreading, design patterns, microservices architecture, Docker, Kubernetes, CI/CD pipelines,…

Leave a Reply

Your email address will not be published. Required fields are marked *