Relational Data Stores (P05)

This document introduces relational data stores and explains why they matter for data analytics.

Relational databases are one of the most important technologies in data work. They appear in enterprise systems, analytics pipelines, and everyday software.

What Is a Relational Data Store?

A relational data store: - organizes data into tables - each table has rows (records) and columns (fields) - uses keys to relate tables to one another

Relationships allow data to be: - structured - queried efficiently - joined across tables

Tables, Rows, and Columns

Key vocabulary:

Table: a collection of related records
Row: one record (one entity or event)
Column: one attribute of that record

Example: - a customers table - one row per customer - columns like customer_id, name, email

Keys and Relationships

Relational databases rely on keys:

Primary key: uniquely identifies a row
Foreign key: references a primary key in another table

These keys make it possible to: - connect data across tables - avoid duplication - maintain consistency

File-Based vs Hosted Databases

Relational databases come in two broad forms:

File-Based

stored in a single file
run locally
easy to copy and version
no server required

Examples: - SQLite - DuckDB

Hosted / Server-Based

run as services
support many users
common in production systems

Examples: - PostgreSQL - MySQL - SQL Server - Oracle

File-based systems may act as proxies for hosted systems.

Why File-Based Databases Are Used

File-based databases: - remove setup complexity - behave like real databases - support standard SQL - work well with Python

This makes them ideal for: - analytics - experimentation - local projects