Chapter 6

Data

This chapter introduces key aspects of data management.

Data At Rest

This section covers data at rest: storage technologies and architectures, including data lakes and lakehouses.

Data In Motion

This section covers data in motion: pipeline technologies and architectures, including stream processing and ETL.

Data Formats

This section covers common data formats used in modern analytics, including CSV, JSON, and Parquet.
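As a rough illustration of how two of these formats differ, the sketch below round-trips the same records through CSV and JSON using only the Python standard library. The record fields (`sensor`, `reading`) and helper names are made up for this example. Parquet is left out because reading and writing it needs a third-party library.

```python
import csv
import io
import json

# Hypothetical sample records, invented for this illustration.
records = [
    {"sensor": "a1", "reading": 21.5},
    {"sensor": "b2", "reading": 19.0},
]

def to_csv(rows):
    """Serialize a list of dicts to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["sensor", "reading"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

def from_csv(text):
    """Parse CSV text back into a list of dicts (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))

csv_rows = from_csv(to_csv(records))
json_rows = json.loads(json.dumps(records))

# CSV carries no type information, so numbers come back as strings.
print(type(csv_rows[0]["reading"]).__name__)
# JSON distinguishes numbers from strings, so the float survives.
print(type(json_rows[0]["reading"]).__name__)
```

The point of the comparison is that CSV is plain text with no schema, so the reader has to reapply types, while JSON keeps basic types such as numbers and strings intact.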

Data Stores

This section covers popular data storage systems, including relational (SQL) databases and NoSQL systems such as document, graph, and key-value stores. Examples include PostgreSQL, MongoDB, Neo4j, and Redis.

Message Queues

This section covers message queue systems, which route information between decoupled applications and services. It introduces popular options, including RabbitMQ and Apache Kafka.
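The idea of decoupling can be sketched with Python's in-process `queue.Queue` standing in for a broker. This is only an analogy under that assumption: it does not use the RabbitMQ or Kafka client APIs, and the function names and messages are invented for the example.

```python
import queue
import threading

def producer(q, messages):
    """Publish messages without knowing who will consume them."""
    for message in messages:
        q.put(message)
    q.put(None)  # sentinel: tells the consumer there is nothing more to read

def consumer(q, out):
    """Read messages until the sentinel arrives, processing each one."""
    while True:
        message = q.get()
        if message is None:
            break
        out.append(message.upper())  # placeholder for real processing

def run_demo(messages):
    """Run one producer and one consumer that share only the queue."""
    q = queue.Queue()
    out = []
    threads = [
        threading.Thread(target=producer, args=(q, messages)),
        threading.Thread(target=consumer, args=(q, out)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out

processed = run_demo(["order created", "order paid"])
print(processed)
```

The producer and consumer never call each other directly; the queue is their only point of contact, which is the property that a real message broker provides across processes and machines.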

Data Processing Engines

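This section covers popular data processing technologies, including the Apache Spark and Apache Flink engines and Apache Beam, a unified programming model whose pipelines run on engines such as Spark and Flink.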