Project Instructions (Module 4: Exploratory Data Analysis with Notebooks)¶
WEDNESDAY: Complete Workflow Phases 1-3¶
Follow the instructions in ⭐ Workflow: Apply Example to complete:
- Phase 1. Start & Run - copy the project and confirm it runs
- Phase 2. Change Authorship - update the project to your name and GitHub account
- Phase 3. Read & Understand - review the project structure and code
Open notebooks/eda_case.ipynb in VS Code, select the kernel (.venv), and run all cells.
Confirm the notebook executes without errors and commit it with output visible.
FRIDAY/SUNDAY: Complete Workflow Phases 4-5¶
Again, follow the instructions above to complete:
- Phase 4. Make a Technical Modification - make a change and verify it still runs
- Phase 5. Apply the Skills to a New Problem
Phase 4 Suggestions¶
Make a small technical change to the example notebook that does not break it. Choose any one of these (or a different modification as you like):
- Add a new Markdown cell with a section heading and one observation about the data
- Change the color palette or chart style in a seaborn plot
- Add a second chart of a different type (e.g., add a box plot alongside a histogram)
- Add a cell that prints the number of missing values per column using
df.isnull().sum() - Add a cell that filters the DataFrame to a subset of rows and re-runs a chart on the subset
Re-run all cells after your change and confirm the notebook executes cleanly. Commit with output visible.
Phase 5 Suggestions¶
Phase 5 Suggestion 1. EDA on a Built-in Dataset (Directed)¶
Use seaborn's built-in datasets to perform EDA on a dataset different from the example. Being able to conduct an EDA on your own is a critical skill. You will do this again in the capstone. Seek to understand the process and be able to explore any data that interests you.
Steps:
- Choose a dataset from seaborn (see list below)
- Create a new notebook file:
notebooks/eda_yourname.ipynb - Follow the same numbered section structure as the example
- Include: shape, dtypes, missing values, descriptive statistics, and at least two charts
- Add Markdown narrative cells explaining your observations after each section
Good seaborn datasets for practice:
tips— restaurant tipping data (244 rows, 7 columns)iris— flower measurements (150 rows, 5 columns)mpg— car fuel efficiency (398 rows, 9 columns)titanic— passenger survival data (891 rows, 15 columns)diamonds— diamond prices and attributes (53940 rows, 10 columns)
Load with: df = sns.load_dataset('dataset_name')
Then:
- All column names will change.
- Update the notebook to reflect the columns and content of your data.
- Describe the dataset: what each column represents and where the data comes from
- Identify one surprising or interesting pattern you found
- Explain what a next analytical step might be (e.g., grouping, filtering, modeling)
Phase 5 Suggestion 2. EDA on Your Own Dataset (Original)¶
Perform EDA on a dataset you bring yourself. Being able to get value out of data is one of key skills of a data analyst. Try it on any data you find interesting.
Steps:
- Find a tabular dataset (CSV) relevant to your field or interests (e.g., from https://www.kaggle.com, https://data.gov, or your own work)
- Place the file in
data/raw/ - Create a new notebook:
notebooks/eda_yourname.ipynb - Load the data with
pd.read_csv() - All column names will change.
- Update the notebook to reflect the columns and content of your data.
- Follow the same numbered section structure as the example
- Include: shape, dtypes, missing values, descriptive statistics, and at least two charts
- Add Markdown narrative explaining your observations
Then:
- Cite the data source and describe what it contains
- Identify at least one data quality issue you found (missing values, outliers, wrong types)
- Describe what question you would investigate next if you had more time
Key Skill Focus¶
As you work, focus on:
- how notebooks combine narrative and code for exploratory work
- how
df.info(),df.describe(), anddf.isnull()give a quick dataset overview - how distributions reveal shape, spread, and outliers
- how grouping and filtering expose patterns within subsets
- how Markdown narrative turns a notebook into a readable analysis
Your goal is to produce a notebook that tells a clear story about a dataset.
Professional Communication¶
Make sure the title and narrative reflect your work. Verify key files:
- README.md
- docs/ (source and hosted on GitHub Pages)
- notebooks/ (executed notebooks committed with output visible)
Ensure your project clearly demonstrates:
- a complete EDA with all standard sections
- at least two meaningful charts with narrative
- a readable notebook someone else could follow