API Reference¶
This page is auto-generated from Python docstrings.
datafun.app_case ¶
app_case.py - Project script (example).
Author: Denise Case
Date: 2026-01

Practice key Python skills:

- pathlib for cross-platform paths
- logging (preferred over print)
- calling functions from modules
- clear ETVL pipeline stages:
    - E = Extract (read: get data from a source into memory)
    - T = Transform (process: change data in memory)
    - V = Verify (check: validate data in memory)
    - L = Load (write: save results to data/processed or another destination)
Terminal command to run this file from the root project folder:
uv run python -m datafun.app_case
Note
Don't edit this file - it should remain a working example. Copy it, rename it, and modify your copy.
main ¶
Entry point for the script.
Entry point: run four simple ETVL pipelines.
log_header() logs a standard run header. log_path() logs repo-relative paths (privacy-safe).
Arguments: None. Returns: None.
Source code in src/datafun/app_case.py
datafun.case_csv_pipeline ¶
case_csv_pipeline.py - CSV ETVL pipeline.
Author: Denise Case
Date: 2026-04

Practice key Python skills related to:

- ETVL pipeline structure (Extract, Transform, Verify, Load)
- reading CSV files using the csv module
- keyword-only function arguments
- error handling with raise
- calculating statistics with the statistics module
- writing results to a text file
Paths (relative to repo root):
INPUT FILE: data/raw/2020_happiness.csv
OUTPUT FILE: data/processed/csv_ladder_score_stats.txt
Terminal command to run this file from the root project folder:
uv run python -m datafun.case_csv_pipeline
Note
Don't edit this file - it should remain a working example. Copy it, rename it, and modify your copy.
extract_csv_scores ¶
E: Read CSV and extract one numeric column as floats.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `file_path` | `Path` | Path to input CSV file. | *required* |
| `column_name` | `str` | Name of the column to extract. | *required* |

Returns:

| Type | Description |
|---|---|
| `list[float]` | List of float values from the specified column. |
Source code in src/datafun/case_csv_pipeline.py
load_stats_report ¶
L: Write stats to a text file in data/processed.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `stats` | `dict[str, float]` | Dictionary with statistics to write. | *required* |
| `out_path` | `Path` | Path to output text file. | *required* |

Returns:

| Type | Description |
|---|---|
| `None` | None |
Source code in src/datafun/case_csv_pipeline.py
run_csv_pipeline ¶
Run the full ETVL pipeline.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `raw_dir` | `Path` | Path to data/raw directory. | *required* |
| `processed_dir` | `Path` | Path to data/processed directory. | *required* |
| `logger` | `Any` | Logger for logging messages. | *required* |

Returns:

| Type | Description |
|---|---|
| `None` | None |
Source code in src/datafun/case_csv_pipeline.py
transform_scores_to_stats ¶
T: Calculate basic statistics for a list of floats.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `scores` | `list[float]` | List of float values. | *required* |

Returns:

| Type | Description |
|---|---|
| `dict[str, float]` | Dictionary with keys: count, min, max, mean, stdev. |
Source code in src/datafun/case_csv_pipeline.py
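The transform step likely leans on the statistics module, as the skills list above suggests. A sketch matching the documented return keys (the empty-list and single-value handling are assumptions):

```python
import statistics


def transform_scores_to_stats(*, scores: list[float]) -> dict[str, float]:
    """T: basic descriptive statistics for a list of floats (illustrative sketch)."""
    if not scores:
        raise ValueError("scores must not be empty")
    return {
        "count": float(len(scores)),
        "min": min(scores),
        "max": max(scores),
        "mean": statistics.mean(scores),
        # statistics.stdev needs at least two values.
        "stdev": statistics.stdev(scores) if len(scores) > 1 else 0.0,
    }
```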
verify_stats ¶
V: Sanity-check the stats dictionary.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `stats` | `dict[str, float]` | Dictionary with statistics to verify. | *required* |

Returns:

| Type | Description |
|---|---|
| `None` | None |
Source code in src/datafun/case_csv_pipeline.py
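A verify step typically returns nothing and raises on bad data. The specific checks below are assumptions consistent with the keys documented for `transform_scores_to_stats`:

```python
def verify_stats(*, stats: dict[str, float]) -> None:
    """V: sanity-check the stats dictionary (illustrative sketch)."""
    required = {"count", "min", "max", "mean", "stdev"}
    missing = required - stats.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if stats["count"] <= 0:
        raise ValueError("count must be positive")
    # The mean of any dataset lies between its min and max.
    if not stats["min"] <= stats["mean"] <= stats["max"]:
        raise ValueError("mean must lie between min and max")
```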
datafun.case_xlsx_pipeline ¶
case_xlsx_pipeline.py - XLSX ETVL pipeline.
Author: Denise Case
Date: 2026-04

Practice key Python skills related to:

- ETVL pipeline structure (Extract, Transform, Verify, Load)
- reading Excel files using the openpyxl package
- accessing cells by column letter
- keyword-only function arguments
- runtime type checking with isinstance()
- counting word occurrences across strings
- writing results to a text file
Paths (relative to repo root):
INPUT FILE: data/raw/Feedback.xlsx
OUTPUT FILE: data/processed/xlsx_feedback_github_count.txt
Terminal command to run this file from the root project folder:
uv run python -m datafun.case_xlsx_pipeline
Note
Don't edit this file - it should remain a working example. Copy it, rename it, and modify your copy.
extract_xlsx_column_strings ¶
E: Read an Excel file and extract string values from a column.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `file_path` | `Path` | Path to input XLSX file. | *required* |
| `column_letter` | `str` | Letter of the column to extract (e.g., 'A'). | *required* |

Returns:

| Type | Description |
|---|---|
| `list[str]` | List of non-empty string values from the specified column. |
Source code in src/datafun/case_xlsx_pipeline.py
load_count_report ¶
L: Write the word count result to a text file in data/processed.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `count` | `int` | The word count to write. | *required* |
| `out_path` | `Path` | Path to output text file. | *required* |
| `word` | `str` | The word that was counted. | *required* |
| `column_letter` | `str` | The column letter that was processed. | *required* |

Returns:

| Type | Description |
|---|---|
| `None` | None |
Source code in src/datafun/case_xlsx_pipeline.py
run_xlsx_pipeline ¶
Run the full ETVL pipeline.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `raw_dir` | `Path` | Path to data/raw directory. | *required* |
| `processed_dir` | `Path` | Path to data/processed directory. | *required* |
| `logger` | `Any` | Logger for logging messages. | *required* |

Returns:

| Type | Description |
|---|---|
| `None` | None |
Source code in src/datafun/case_xlsx_pipeline.py
transform_count_word ¶
T: Count occurrences of a word across all strings (case-insensitive).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `values` | `list[str]` | List of strings to search. | *required* |
| `word` | `str` | Word to count. | *required* |

Returns:

| Type | Description |
|---|---|
| `int` | Total count of occurrences of the word across all strings. |
Source code in src/datafun/case_xlsx_pipeline.py
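A sketch of a case-insensitive word count. This interprets "word" as a whole whitespace-separated token; the actual module may instead count substrings, so treat this as one plausible reading:

```python
def transform_count_word(*, values: list[str], word: str) -> int:
    """T: count whole-token occurrences of a word, case-insensitively (illustrative sketch)."""
    target = word.lower()
    # Lowercase both sides so 'GitHub' and 'github' match.
    return sum(
        token == target
        for value in values
        for token in value.lower().split()
    )
```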
verify_count ¶
V: Verify the count is valid.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `count` | `int` | The count to verify. | *required* |

Returns:

| Type | Description |
|---|---|
| `None` | None |
Source code in src/datafun/case_xlsx_pipeline.py
datafun.case_json_pipeline ¶
case_json_pipeline.py - JSON ETVL pipeline.
Author: Denise Case
Date: 2026-04

Practice key Python skills related to:

- ETVL pipeline structure (Extract, Transform, Verify, Load)
- reading JSON files using the json module
- walking JSON: dictionaries, lists, and nested structures
- keyword-only function arguments
- defensive programming for untrusted input
- runtime type checking with isinstance()
- writing results to a text file
Paths (relative to repo root):
INPUT FILE: data/raw/astros.json
OUTPUT FILE: data/processed/json_astronauts_by_craft.txt
Terminal command to run this file from the root project folder:
uv run python -m datafun.case_json_pipeline
Note
Don't edit this file - it should remain a working example. Copy it, rename it, and modify your copy.
extract_people_list ¶
E/V: Read JSON file and extract a list of dictionaries under list_key.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `file_path` | `Path` | Path to input JSON file. | *required* |
| `list_key` | `str` | Top-level key expected to map to a list (default: "people"). | `'people'` |

Returns:

| Type | Description |
|---|---|
| `list[dict[str, Any]]` | A list of dictionaries from the JSON file. |
Source code in src/datafun/case_json_pipeline.py
load_counts_report ¶
L: Write craft counts to a text file in data/processed.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `counts` | `dict[str, int]` | Dictionary mapping craft names to counts. | *required* |
| `out_path` | `Path` | Path to output text file. | *required* |

Returns:

| Type | Description |
|---|---|
| `None` | None |
Source code in src/datafun/case_json_pipeline.py
run_json_pipeline ¶
Run the full ETVL pipeline.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `raw_dir` | `Path` | Path to data/raw directory. | *required* |
| `processed_dir` | `Path` | Path to data/processed directory. | *required* |
| `logger` | `Any` | Logger for logging messages. | *required* |

Returns:

| Type | Description |
|---|---|
| `None` | None |
Source code in src/datafun/case_json_pipeline.py
transform_count_by_craft ¶
```python
transform_count_by_craft(
    *,
    people_list: list[dict[str, Any]],
    craft_key: str = 'craft',
) -> dict[str, int]
```
T: Count people by craft.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `people_list` | `list[dict[str, Any]]` | List of person dictionaries. | *required* |
| `craft_key` | `str` | Key to read craft name from (default: "craft"). | `'craft'` |

Returns:

| Type | Description |
|---|---|
| `dict[str, int]` | Dictionary mapping craft names to counts. |
Source code in src/datafun/case_json_pipeline.py
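A sketch matching the signature shown above: tally people per craft, skipping records that lack a usable craft value. Whether the real code skips or raises on bad records is an assumption:

```python
from typing import Any


def transform_count_by_craft(
    *,
    people_list: list[dict[str, Any]],
    craft_key: str = "craft",
) -> dict[str, int]:
    """T: count people by craft, skipping malformed records (illustrative sketch)."""
    counts: dict[str, int] = {}
    for person in people_list:
        craft = person.get(craft_key)
        # Only count non-empty string craft names.
        if isinstance(craft, str) and craft:
            counts[craft] = counts.get(craft, 0) + 1
    return counts
```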
verify_counts ¶
V: Verify counts are non-negative and craft names are not empty.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `counts` | `dict[str, int]` | Dictionary mapping craft names to counts. | *required* |

Returns:

| Type | Description |
|---|---|
| `None` | None |
Source code in src/datafun/case_json_pipeline.py
datafun.case_text_pipeline ¶
case_text_pipeline.py - Text ETVL pipeline.
Author: Denise Case
Date: 2026-04

Practice key Python skills related to:

- ETVL pipeline structure (Extract, Transform, Verify, Load)
- reading text files line by line
- counting lines, words, and characters
- keyword-only function arguments
- error handling with raise
- writing results to a text file
Paths (relative to repo root):
INPUT FILE: data/raw/romeo_and_juliet.txt
OUTPUT FILE: data/processed/txt_summary.txt
Terminal command to run this file from the root project folder:
uv run python -m datafun.case_text_pipeline
Note
Don't edit this file - it should remain a working example. Copy it, rename it, and modify your copy.
extract_lines ¶
E: Read a text file into a list of lines.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `file_path` | `Path` | Path to input text file. | *required* |

Returns:

| Type | Description |
|---|---|
| `list[str]` | List of lines from the text file. |
Source code in src/datafun/case_text_pipeline.py
load_summary_report ¶
L: Write summary to a text file in data/processed.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `summary` | `dict[str, int]` | Dictionary with counts for 'lines', 'words', and 'chars'. | *required* |
| `out_path` | `Path` | Path to output text file. | *required* |

Returns:

| Type | Description |
|---|---|
| `None` | None |
Source code in src/datafun/case_text_pipeline.py
run_text_pipeline ¶
Run the full ETVL pipeline.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `raw_dir` | `Path` | Path to data/raw directory. | *required* |
| `processed_dir` | `Path` | Path to data/processed directory. | *required* |
| `logger` | `Any` | Logger for logging messages. | *required* |

Returns:

| Type | Description |
|---|---|
| `None` | None |
Source code in src/datafun/case_text_pipeline.py
transform_line_word_char_counts ¶
T: Summarize a list of lines: line count, word count, character count.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `lines` | `list[str]` | List of lines from the text file. | *required* |

Returns:

| Type | Description |
|---|---|
| `dict[str, int]` | Dictionary with counts for 'lines', 'words', and 'chars'. |
Source code in src/datafun/case_text_pipeline.py
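A sketch of the summary transform. Whether "chars" includes line-ending characters is an assumption; this version counts only the characters within each line:

```python
def transform_line_word_char_counts(*, lines: list[str]) -> dict[str, int]:
    """T: count lines, whitespace-separated words, and characters (illustrative sketch)."""
    return {
        "lines": len(lines),
        # str.split() with no arguments splits on any run of whitespace.
        "words": sum(len(line.split()) for line in lines),
        "chars": sum(len(line) for line in lines),
    }
```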
verify_summary ¶
V: Verify the summary has expected keys and non-negative values.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `summary` | `dict[str, int]` | Dictionary with counts for 'lines', 'words', and 'chars'. | *required* |

Returns:

| Type | Description |
|---|---|
| `None` | None |