Database design goes well beyond just tables and normalization.

0
(0)

Indexes

Purpose: Speed up data retrieval (like a book’s index)

Types:

  • Primary Index: Automatically created on primary keys
  • Unique Index: Ensures uniqueness (like email addresses)
  • Composite Index: Multiple columns together (lastName, firstName)
  • Full-text Index: For text searching

Example:

CREATE INDEX idx_customer_email ON Customers(email);
CREATE INDEX idx_order_date ON Orders(order_date, customer_id);

Trade-offs: Indexes speed up reads but slow down writes (inserts/updates). Don’t index everything—only columns you frequently search, filter, or join on.


Constraints

Data Integrity Rules:

NOT NULL: Field must have a value

email VARCHAR(255) NOT NULL

UNIQUE: No duplicate values allowed

username VARCHAR(50) UNIQUE

CHECK: Validates data against conditions

age INT CHECK (age >= 0 AND age <= 150)
price DECIMAL(10,2) CHECK (price > 0)

DEFAULT: Provides automatic values

created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
status VARCHAR(20) DEFAULT 'pending'

FOREIGN KEY with actions:

customer_id INT REFERENCES Customers(customer_id)
  ON DELETE CASCADE  -- Delete orders when customer deleted
  ON UPDATE CASCADE  -- Update order's customer_id if customer_id changes

Other FK options: SET NULL, SET DEFAULT, RESTRICT


Data Types

Choose appropriately for performance and storage:

Numeric:

  • INT vs BIGINT vs SMALLINT (storage size matters)
  • DECIMAL(10,2) for money (exact precision)
  • FLOAT/DOUBLE for scientific data (approximate)

Strings:

  • CHAR(10) – Fixed length, faster but wastes space
  • VARCHAR(255) – Variable length, more flexible
  • TEXT – Large text (don’t index these)

Dates:

  • DATE – Just the date
  • TIMESTAMP – Date + time with timezone awareness
  • DATETIME – Date + time without timezone

Boolean:

  • BOOLEAN or TINYINT(1)

Bad example: Storing dates as strings "2024-12-06" – can’t do date math or proper sorting


Naming Conventions

Consistency is critical:

Table names:

  • Plural vs singular (choose one): Users or User
  • PascalCase, snake_case, or lowercase (be consistent)

Column names:

  • Descriptive: created_at not dt
  • Avoid reserved words: Don’t name a column order or user
  • Foreign keys: customer_id clearly shows it references Customers

Prefixes to avoid:

  • Don’t prefix tables with tbl_
  • Don’t use Hungarian notation: str_name, int_age

Soft Deletes vs Hard Deletes

Hard Delete: Actually remove the row

DELETE FROM Users WHERE user_id = 123;

Soft Delete: Mark as deleted but keep the data

-- Add column
deleted_at TIMESTAMP NULL

-- "Delete"
UPDATE Users SET deleted_at = NOW() WHERE user_id = 123;

-- Query active users
SELECT * FROM Users WHERE deleted_at IS NULL;

Use soft deletes when:

  • You need audit trails
  • Data has dependencies
  • Regulatory compliance requires retention
  • Users might want to “undelete”

Audit Trails and History

Track changes for compliance/debugging:

Simple approach: Add tracking columns

created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
created_by INT REFERENCES Users(user_id)
updated_at TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
updated_by INT

Advanced approach: Separate audit table

-- Main table
Products (product_id, name, price)

-- Audit table
Product_History (
  history_id,
  product_id,
  name,
  price,
  action, -- 'INSERT', 'UPDATE', 'DELETE'
  changed_by,
  changed_at
)

Use database triggers to automatically populate audit tables.


Partitioning

Split large tables for performance:

Range Partitioning: By date ranges

-- Orders table partitioned by year
Orders_2023
Orders_2024
Orders_2025

Hash Partitioning: Distribute data evenly List Partitioning: By specific values (regions, categories)

Benefits: Faster queries, easier maintenance, can archive old partitions


Denormalization Patterns

When performance trumps perfect normalization:

Calculated fields:

-- Store order_total instead of always calculating from OrderItems
Orders (
  order_id,
  customer_id,
  order_total -- Denormalized!
)

Redundant data for performance:

-- Store customer_name in Orders to avoid joins
Orders (
  order_id,
  customer_id,
  customer_name -- Duplicates data from Customers table
)

Materialized views: Pre-computed query results updated periodically


Scalability Considerations

Vertical Partitioning: Split wide tables

-- Instead of one huge Users table
Users (user_id, username, email)
User_Profiles (user_id, bio, avatar, preferences) -- Rarely accessed columns

Horizontal Sharding: Split by rows across servers

  • User IDs 1-1M on Server A
  • User IDs 1M-2M on Server B

Read Replicas: Copy databases for read-only queries

Caching: Redis/Memcached for frequently accessed data


Security Design

Encryption:

  • Encrypt sensitive columns (passwords, SSN, credit cards)
  • Use bcrypt/argon2 for passwords, never plain text
  • Consider encryption at rest and in transit

Access Control:

  • Principle of least privilege
  • Different database users for different applications
  • Row-level security for multi-tenant applications

SQL Injection Prevention:

  • Use parameterized queries/prepared statements
  • Never concatenate user input into SQL

Performance Optimization

Query Optimization:

  • Use EXPLAIN to analyze slow queries
  • Avoid SELECT * (fetch only needed columns)
  • Limit result sets with pagination
  • Use proper JOIN types

Connection Pooling: Reuse database connections

Batch Operations: Insert/update many rows at once instead of one-by-one

Caching Strategies:

  • Application-level caching
  • Query result caching
  • Object caching

Backup and Recovery

Strategy:

  • Regular automated backups
  • Test restore procedures
  • Point-in-time recovery capability
  • Geographic redundancy

Types:

  • Full backups (weekly)
  • Incremental backups (daily)
  • Transaction log backups (hourly)

Schema Versioning / Migrations

Track database changes like code:

Use migration tools (Flyway, Liquibase, Alembic):

-- Migration: V001__create_users_table.sql
CREATE TABLE Users (...);

-- Migration: V002__add_phone_to_users.sql
ALTER TABLE Users ADD COLUMN phone VARCHAR(20);

Never modify migrations after they’re deployed—create new ones


Design Patterns

UUID vs Auto-increment IDs:

  • Auto-increment: Simple, sequential, can expose scale
  • UUID: Globally unique, harder to guess, larger storage

Polymorphic Associations: One table relates to multiple tables

Comments (
  comment_id,
  commentable_id,   -- Could be post_id or photo_id
  commentable_type  -- 'Post' or 'Photo'
)

EAV (Entity-Attribute-Value): Flexible but slow

-- For products with varying attributes
Product_Attributes (
  product_id,
  attribute_name,  -- 'color', 'size', 'weight'
  attribute_value
)

Use sparingly—usually better to use JSONB or separate tables


Documentation

Essential to maintain:

  • ER diagrams showing table relationships
  • Data dictionary explaining each table/column
  • Business rules and constraints
  • Index strategy and rationale
  • Backup/recovery procedures

Testing

Database testing:

  • Test constraints actually prevent bad data
  • Performance testing with realistic data volumes
  • Test backup/restore procedures
  • Load testing for concurrent users

How useful was this post?

Click on a star to rate it!

Average rating 0 / 5. Vote count: 0

No votes so far! Be the first to rate this post.

As you found this post useful...

Follow us on social media!

We are sorry that this post was not useful for you!

Let us improve this post!

Tell us how we can improve this post?

Comments

No comments yet. Why don’t you start the discussion?

Leave a Reply

Your email address will not be published. Required fields are marked *