Seaborn Built-in Datasets
Seaborn includes several clean, ready-to-use datasets for practice.
These are ideal for EDA and regression projects because they load instantly
with no file management required.
List Available Datasets
import seaborn as sns
print(sns.get_dataset_names())
Load a Dataset
df = sns.load_dataset("dataset_name")
Good for EDA (Module 4)
| Dataset | Rows | Columns | Good for |
|---|---|---|---|
| penguins | 344 | 7 | Grouping, scatter, missing values, course example |
| tips | 244 | 7 | Numeric and categorical, distributions |
| iris | 150 | 5 | Classic grouping, clean data, no missing values |
| mpg | 398 | 9 | Mixed types, missing values, real-world context |
| diamonds | 53940 | 10 | Large dataset, skewed distributions |
| titanic | 891 | 15 | Survival analysis, many missing values, good challenge |
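As a first EDA step, it can help to confirm the row count, column count, and missing values for whichever dataset you pick. The sketch below is one possible way to do that; `summarize` and the small `demo` frame are hypothetical names and illustrative values, not part of seaborn or the course materials.

```python
import numpy as np
import pandas as pd


def summarize(df: pd.DataFrame) -> dict:
    """Return row count, column count, and missing-value counts per column."""
    missing = df.isna().sum()
    return {
        "rows": df.shape[0],
        "columns": df.shape[1],
        # Keep only the columns that actually contain missing values.
        "missing": {col: int(n) for col, n in missing.items() if n > 0},
    }


# Tiny hand-made frame standing in for a real dataset (illustrative values only).
demo = pd.DataFrame(
    {
        "species": ["Adelie", "Gentoo", "Adelie", None],
        "body_mass_g": [3750.0, np.nan, 3800.0, 5000.0],
        "island": ["Torgersen", "Biscoe", "Dream", "Biscoe"],
    }
)
print(summarize(demo))

# With seaborn installed, the same helper can be applied to a built-in dataset:
# import seaborn as sns
# print(summarize(sns.load_dataset("penguins")))
```

The helper takes any DataFrame, so it can be reused unchanged when switching from one dataset in the table to another.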
Good for Regression (Module 7)
| Dataset | Rows | Predictor | Outcome | Notes |
|---|---|---|---|---|
| mpg | 398 | horsepower | mpg | Natural pair, missing values to handle |
| tips | 244 | total_bill | tip | Intuitive real-world relationship |
| penguins | 344 | flipper_length_mm | body_mass_g | Strong R² (~0.76) |
| diamonds | 53940 | carat | price | Nonlinear, stretch goal |
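Each predictor and outcome pair in the table can be fed to a simple one-variable regression. The following is a minimal sketch, assuming an ordinary least-squares line fitted with `np.polyfit`; the helper name `fit_line` and the synthetic `demo` data are hypothetical and only illustrate the workflow, including dropping rows with missing values first.

```python
import numpy as np
import pandas as pd


def fit_line(df: pd.DataFrame, predictor: str, outcome: str):
    """Fit outcome ~ predictor by least squares; return (slope, intercept, r_squared)."""
    # Drop rows where either variable is missing before fitting.
    clean = df[[predictor, outcome]].dropna()
    x = clean[predictor].to_numpy(dtype=float)
    y = clean[outcome].to_numpy(dtype=float)

    # Degree-1 polynomial fit returns the slope first, then the intercept.
    slope, intercept = np.polyfit(x, y, deg=1)

    # R^2 = 1 - SS_res / SS_tot
    residuals = y - (slope * x + intercept)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(slope), float(intercept), float(1 - ss_res / ss_tot)


# Synthetic stand-in data: y = 2x + 1 exactly, with one missing predictor value.
demo = pd.DataFrame(
    {"x": [1.0, 2.0, np.nan, 4.0, 5.0], "y": [3.0, 5.0, 7.0, 9.0, 11.0]}
)
slope, intercept, r2 = fit_line(demo, "x", "y")
print(f"slope={slope:.2f}, intercept={intercept:.2f}, R^2={r2:.2f}")

# With seaborn installed, the same call works on a pair from the table:
# import seaborn as sns
# print(fit_line(sns.load_dataset("mpg"), "horsepower", "mpg"))
```

Because the column names are passed as arguments, swapping in another row of the table only changes the call, not the function.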
Notes
- mpg is recommended for Module 7 because it is a different dataset from the one used in Module 4, which reinforces that the regression workflow transfers to new data.
- The array method x.ptp() was removed in NumPy 2.0. Use np.max(x) - np.min(x) for the range; the function form np.ptp(x) is still available.
- Seaborn datasets require an internet connection on first load; they are cached locally after that.
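The range calculation mentioned in the notes can be sketched as follows. The arrays are made-up values for illustration; the NaN-aware variant is an added suggestion for columns with missing values rather than something the notes prescribe.

```python
import numpy as np

# Range of a complete array: maximum minus minimum.
x = np.array([4.0, 9.5, 1.5, 7.0])
value_range = np.max(x) - np.min(x)
print(value_range)  # 8.0

# With missing values, np.max and np.min would return nan,
# so use the NaN-ignoring versions instead.
y = np.array([4.0, np.nan, 1.5, 7.0])
nan_safe_range = np.nanmax(y) - np.nanmin(y)
print(nan_safe_range)  # 5.5
```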