Project Instructions (Module 4: Exploratory Data Analysis with Notebooks)¶

WEDNESDAY: Complete Workflow Phases 1-3¶

Follow the instructions in ⭐ Workflow: Apply Example to complete:

Phase 1. Start & Run - copy the project and confirm it runs
Phase 2. Change Authorship - update the project to your name and GitHub account
Phase 3. Read & Understand - review the project structure and code

Open notebooks/eda_case.ipynb in VS Code, select the kernel (.venv), and run all cells. Confirm the notebook executes without errors and commit it with output visible.

FRIDAY/SUNDAY: Complete Workflow Phases 4-5¶

Again, follow the instructions above to complete:

Phase 4. Make a Technical Modification - make a change and verify it still runs
Phase 5. Apply the Skills to a New Problem

Phase 4 Suggestions¶

Make a small technical change to the example notebook that does not break it. Choose any one of these (or a different modification as you like):

Add a new Markdown cell with a section heading and one observation about the data
Change the color palette or chart style in a seaborn plot
Add a second chart of a different type (e.g., add a box plot alongside a histogram)
Add a cell that prints the number of missing values per column using df.isnull().sum()
Add a cell that filters the DataFrame to a subset of rows and re-runs a chart on the subset

Re-run all cells after your change and confirm the notebook executes cleanly. Commit with output visible.

Phase 5 Suggestions¶

Phase 5 Suggestion 1. EDA on a Built-in Dataset (Directed)¶

Use seaborn's built-in datasets to perform EDA on a dataset different from the example. Being able to conduct an EDA on your own is a critical skill. You will do this again in the capstone. Seek to understand the process and be able to explore any data that interests you.

Steps:

Choose a dataset from seaborn (see list below)
Create a new notebook file: notebooks/eda_yourname.ipynb
Follow the same numbered section structure as the example
Include: shape, dtypes, missing values, descriptive statistics, and at least two charts
Add Markdown narrative cells explaining your observations after each section

Good seaborn datasets for practice:

tips — restaurant tipping data (244 rows, 7 columns)
iris — flower measurements (150 rows, 5 columns)
mpg — car fuel efficiency (398 rows, 9 columns)
titanic — passenger survival data (891 rows, 15 columns)
diamonds — diamond prices and attributes (53940 rows, 10 columns)

Load with: df = sns.load_dataset('dataset_name')

Then:

All column names will change.
Update the notebook to reflect the columns and content of your data.
Describe the dataset: what each column represents and where the data comes from
Identify one surprising or interesting pattern you found
Explain what a next analytical step might be (e.g., grouping, filtering, modeling)

Phase 5 Suggestion 2. EDA on Your Own Dataset (Original)¶

Perform EDA on a dataset you bring yourself. Being able to get value out of data is one of key skills of a data analyst. Try it on any data you find interesting.

Steps:

Find a tabular dataset (CSV) relevant to your field or interests (e.g., from https://www.kaggle.com, https://data.gov, or your own work)
Place the file in data/raw/
Create a new notebook: notebooks/eda_yourname.ipynb
Load the data with pd.read_csv()
All column names will change.
Update the notebook to reflect the columns and content of your data.
Follow the same numbered section structure as the example
Include: shape, dtypes, missing values, descriptive statistics, and at least two charts
Add Markdown narrative explaining your observations

Then:

Cite the data source and describe what it contains
Identify at least one data quality issue you found (missing values, outliers, wrong types)
Describe what question you would investigate next if you had more time

Key Skill Focus¶

As you work, focus on:

how notebooks combine narrative and code for exploratory work
how df.info(), df.describe(), and df.isnull() give a quick dataset overview
how distributions reveal shape, spread, and outliers
how grouping and filtering expose patterns within subsets
how Markdown narrative turns a notebook into a readable analysis

Your goal is to produce a notebook that tells a clear story about a dataset.

Professional Communication¶

Make sure the title and narrative reflect your work. Verify key files:

README.md
docs/ (source and hosted on GitHub Pages)
notebooks/ (executed notebooks committed with output visible)

Ensure your project clearly demonstrates:

a complete EDA with all standard sections
at least two meaningful charts with narrative
a readable notebook someone else could follow