Skip to content

Relational Data Stores (P05)

This document introduces relational data stores and explains why they matter for data analytics.

Relational databases are one of the most important technologies in data work. They appear in enterprise systems, analytics pipelines, and everyday software.


What Is a Relational Data Store?

A relational data store: - organizes data into tables - each table has rows (records) and columns (fields) - uses keys to relate tables to one another

Relationships allow data to be: - structured - queried efficiently - joined across tables


Tables, Rows, and Columns

Key vocabulary:

  • Table: a collection of related records
  • Row: one record (one entity or event)
  • Column: one attribute of that record

Example: - a customers table - one row per customer - columns like customer_id, name, email


Keys and Relationships

Relational databases rely on keys:

  • Primary key: uniquely identifies a row
  • Foreign key: references a primary key in another table

These keys make it possible to: - connect data across tables - avoid duplication - maintain consistency


File-Based vs Hosted Databases

Relational databases come in two broad forms:

File-Based

  • stored in a single file
  • run locally
  • easy to copy and version
  • no server required

Examples: - SQLite - DuckDB

Hosted / Server-Based

  • run as services
  • support many users
  • common in production systems

Examples: - PostgreSQL - MySQL - SQL Server - Oracle

File-based systems may act as proxies for hosted systems.


Why File-Based Databases Are Used

File-based databases: - remove setup complexity - behave like real databases - support standard SQL - work well with Python

This makes them ideal for: - analytics - experimentation - local projects