These are commonly used third-party Python packages that extend core functionality.
They are not included in the Python Standard Library and must be installed as needed.
Package Management and Core Utilities
| Package | Description | Links |
| --- | --- | --- |
| pip | Python’s package installer (standard tool for managing packages). | Docs |
| setuptools | Build system and packaging library for Python. | Docs |
| wheel | Builds .whl distribution files for faster installs. | Docs |
| loguru | Simple, powerful logging with colorized output and rotation support. | Docs |
| httpx | Modern, async-capable HTTP client for web requests and API calls. | Docs |
| python-dotenv | Loads environment variables from .env files. | Docs |
| pre-commit | Automates linting, formatting, and quality checks before commits. | Docs |
| uv | Fast Python package manager and environment tool (replaces pip + venv). | Docs |
Note: httpx replaces requests as the modern, async-capable HTTP client. Most requests examples need minimal or no changes.
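A minimal httpx sketch showing the requests-style call pattern (the URL is a placeholder):

```python
import httpx

# Synchronous call; the API mirrors requests closely.
response = httpx.get("https://api.example.com/items", timeout=10.0)
response.raise_for_status()
print(response.json())

# For concurrency, httpx.AsyncClient offers the same interface with async/await.
```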
Documentation
| Package | Description | Links |
| --- | --- | --- |
| mkdocs | Fast, lightweight documentation site generator using Markdown. Often used with the Material for MkDocs theme. | Docs |
Text-to-Speech
| Package | Description | Links |
| --- | --- | --- |
| pyttsx3 | Offline text-to-speech library for Python (works without internet). | Docs |
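A minimal pyttsx3 sketch; speech is synthesized locally with the operating system's voices, so no network access is needed:

```python
import pyttsx3

# Initialize the offline text-to-speech engine.
engine = pyttsx3.init()
engine.setProperty("rate", 160)  # speaking rate in words per minute

engine.say("Data pipeline finished successfully.")
engine.runAndWait()  # blocks until speech playback completes
```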
Jupyter and Interactive Development
These packages provide notebook and interactive shell capabilities.
In most cases, VS Code already integrates Jupyter support, so you can work with .ipynb files directly - without installing the full JupyterLab environment.
| Package | Description | Links |
| --- | --- | --- |
| ipython | Enhanced interactive Python shell with colorized output and %magic commands. | Docs |
| ipykernel | Kernel interface used by VS Code’s Jupyter extension to execute notebook cells. | Docs |
| jupyter | Core metapackage that ties together IPython and notebook execution; recommended for compatibility. | Docs |
| nbdime | Tools for diffing and merging Jupyter notebooks - useful with Git. | Docs |
Optional Jupyter
| Package | Description | Links |
| --- | --- | --- |
| ipywidgets | Adds interactive widgets (sliders, dropdowns) for richer notebooks and dashboards. | Docs |
NOTE: Notebooks using ipywidgets will not render on GitHub; they can be displayed using MyBinder or another platform.
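A minimal ipywidgets sketch for a notebook cell (the function and slider range are arbitrary):

```python
import ipywidgets as widgets
from IPython.display import display

def square(x: int) -> int:
    return x * x

# interact() builds a slider from the (min, max) tuple and re-runs
# the function whenever the slider moves.
widgets.interact(square, x=(0, 10))

# Widgets can also be created and displayed explicitly.
slider = widgets.IntSlider(value=5, min=0, max=10, description="x")
display(slider)
```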
Optional JupyterLab Environment (instead of VS Code)
| Package | Description | Links |
| --- | --- | --- |
| jupyterlab | Full-featured, browser-based IDE for notebooks, code, and data. Use only if running JupyterLab outside VS Code (e.g., remote server, Binder, JupyterHub). | Docs |
| jupyterlab-git | Git integration panel for the JupyterLab web interface. | Docs |
Excel File Reading and Writing
| Package | Description | Links |
| --- | --- | --- |
| openpyxl | Primary library for .xlsx workbooks; handles formulas, charts, formatting (~8 MB). | Docs |
| xlsxwriter | Advanced Excel writer supporting formatting and charts. | Docs |
| xlrd | Reads legacy .xls Excel files (for backward compatibility). | Docs |
| pyexcel | Unified access to multiple spreadsheet formats. | Docs |
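A minimal sketch of round-tripping a spreadsheet with pandas, which uses openpyxl to read .xlsx and can use xlsxwriter to write (file and sheet names are placeholders):

```python
import pandas as pd

# Read a modern .xlsx workbook (openpyxl is the engine pandas uses here).
df = pd.read_excel("sales.xlsx", sheet_name="Q1", engine="openpyxl")

# Write the cleaned data back out; xlsxwriter handles formatting-heavy output.
df.to_excel("sales_clean.xlsx", index=False, engine="xlsxwriter")
```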
| Package | Description | Links |
| --- | --- | --- |
| duckdb | In-process analytical database optimized for OLAP workloads. | Docs |
| pyarrow | Apache Arrow - in-memory columnar format for efficient data exchange across Pandas, Polars, and DuckDB. | Docs |
| sqlalchemy | SQL toolkit and ORM for relational databases. | Docs |
| dbt-core | SQL-based data transformation framework. | Docs |
| dbt-duckdb | dbt adapter for DuckDB back-ends. | Docs |
| sqlmesh | Declarative data transformations in SQL and Python. | Docs |
| prefect | Modern workflow orchestration and dataflow automation. | Docs |
| gx | Data validation and quality framework for pipelines (Great Expectations / GX Core). | Docs |
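A minimal DuckDB sketch that queries a Parquet file with SQL and hands the result to pandas (the file and column names are placeholders):

```python
import duckdb

# DuckDB runs in-process; no server to start. connect() with no
# arguments gives an in-memory database.
con = duckdb.connect()

# Query a Parquet file directly and fetch the result as a pandas DataFrame.
result = con.execute(
    "SELECT category, AVG(amount) AS avg_amount "
    "FROM read_parquet('orders.parquet') "
    "GROUP BY category"
).fetchdf()

print(result)
```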
Data Analysis and Manipulation
| Package | Description | Links |
| --- | --- | --- |
| numpy | Core numerical array and matrix library (20–30 MB). | Docs |
| pandas | Data manipulation and analysis built on NumPy (10–20 MB). | Docs |
| polars | High-performance DataFrame library (Rust-based, ~5–10 MB). | Docs |
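A minimal sketch of the same aggregation in pandas and Polars (column names are arbitrary; the group_by spelling applies to recent Polars releases):

```python
import pandas as pd
import polars as pl

records = {"team": ["a", "a", "b"], "score": [10, 12, 7]}

# pandas: eager, NumPy-backed DataFrame.
pdf = pd.DataFrame(records)
print(pdf.groupby("team")["score"].mean())

# Polars: Rust-backed DataFrame with an expression API.
pldf = pl.DataFrame(records)
print(pldf.group_by("team").agg(pl.col("score").mean()))
```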
Visualization
| Package | Description | Links |
| --- | --- | --- |
| matplotlib | Foundation plotting library (~30 MB). | Docs |
| seaborn | Statistical visualization built on matplotlib (~2–5 MB). | Docs |
| altair | Declarative statistical visualization library built on Vega-Lite. | Docs |
| plotly | Interactive plotting and dashboards (~20–25 MB). | Docs |
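A minimal seaborn-on-matplotlib sketch using one of seaborn's bundled example datasets (downloaded on first use):

```python
import matplotlib.pyplot as plt
import seaborn as sns

# "tips" is a small sample dataset shipped with seaborn's loader.
tips = sns.load_dataset("tips")

# Statistical scatter plot drawn on top of matplotlib's figure machinery.
sns.scatterplot(data=tips, x="total_bill", y="tip", hue="time")
plt.title("Tip vs. total bill")
plt.tight_layout()
plt.show()
```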
Continuous Intelligence and Interactive Analytics
| Package | Description | Links |
| --- | --- | --- |
| shiny | Interactive web applications for data analytics in Python. | Docs |
| streamlit | Simplified web app framework for data dashboards. | Docs |
| dash | Analytical web application framework by Plotly. | Docs |
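A minimal Streamlit sketch, run with `streamlit run app.py` (the file name and data are arbitrary):

```python
# app.py
import pandas as pd
import streamlit as st

st.title("Sales dashboard")

# Widgets return plain Python values each time the script reruns.
n = st.slider("Number of points", min_value=10, max_value=100, value=50)

df = pd.DataFrame({"x": range(n), "y": [i * i for i in range(n)]})
st.line_chart(df.set_index("x"))
```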
Distributed and Parallel Computing
| Package | Description | Links |
| --- | --- | --- |
| dask | Parallel and distributed computing for analytics (~50 MB). Stable, but no longer under rapid development. | Docs |
| ray | Distributed computing framework for ML training, data processing, and serving. | Docs |
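A minimal Dask sketch that parallelizes a pandas-style groupby across many CSV files (the glob pattern and column names are placeholders):

```python
import dask.dataframe as dd

# Lazily read many CSV files as one partitioned DataFrame.
df = dd.read_csv("logs/2024-*.csv")

# Operations build a task graph; .compute() executes it in parallel.
daily_totals = df.groupby("date")["bytes"].sum().compute()
print(daily_totals.head())
```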
Kafka and Stream Processing
| Package | Description | Links |
| --- | --- | --- |
| kafka-python-ng | Kafka client for Python 3.5+ supporting KRaft mode (~1 MB). | Docs |
| pyspark | Distributed computation and structured streaming (heavy, 200+ MB). | Docs |
| streamz | Lightweight streaming and reactive data pipelines. | Docs |
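A minimal kafka-python-ng sketch, assuming the package keeps the original kafka-python import path, a broker on localhost:9092, and a topic named events:

```python
import json
from kafka import KafkaConsumer, KafkaProducer

# Producer: serialize dicts to JSON bytes and send them to a topic.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("events", {"user": "alice", "action": "login"})
producer.flush()

# Consumer: read messages from the beginning of the topic.
consumer = KafkaConsumer(
    "events",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
for message in consumer:
    print(message.value)
    break
```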
Email and SMS Alerts
| Package | Description | Links |
| --- | --- | --- |
| dc-mailer | Send email alerts from Python (requires Gmail configuration). | Docs |
| dc-texter | Send SMS text alerts using Gmail (requires Gmail configuration). | Docs |
Machine Learning and Optimization
These libraries provide classical and modern tools for regression, classification, forecasting, and inference.
They form the foundation for applied analytics and machine learning pipelines.
| Package | Description | Links |
| --- | --- | --- |
| statsmodels | Classical statistics, regression, and inference. | Docs |
| scikit-learn | Core ML library for supervised/unsupervised learning. | Docs |
| optuna | Hyperparameter optimization framework. | Docs |
| xgboost | Gradient boosting algorithm used in production ML. | Docs |
| lightgbm | Fast, memory-efficient gradient boosting by Microsoft. | Docs |
| catboost | Gradient boosting with categorical feature support. | Docs |
Guidance
- Use Statsmodels for statistical inference and regression diagnostics.
- Use Scikit-learn for supervised and unsupervised ML, pipelines, and evaluation.
- Use XGBoost or LightGBM for structured/tabular predictive modeling.
- Use Optuna for hyperparameter tuning and optimization.
These frameworks remain core even as deep learning and LLMs expand - they form the quantitative foundation of data science.
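A minimal sketch of the scikit-learn + Optuna combination described above (the synthetic data and search range are illustrative):

```python
import optuna
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=10, random_state=42)

def objective(trial: optuna.Trial) -> float:
    # Optuna proposes a regularization strength on a log scale.
    c = trial.suggest_float("C", 1e-3, 10.0, log=True)
    model = LogisticRegression(C=c, max_iter=1000)
    # Cross-validated accuracy is the value Optuna maximizes.
    return cross_val_score(model, X, y, cv=5).mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print(study.best_params, study.best_value)
```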
Natural Language Processing (NLP)
Text processing and language understanding in Python can range from simple keyword analysis to advanced generative models.
For most analytics projects, focus on lightweight tools first, then explore classical and modern NLP frameworks as needed.
| Package | Description | Links |
| --- | --- | --- |
| beautifulsoup4 | Parse and extract text or tags from HTML or XML - standard tool for web data cleanup. | Docs |
| regex | Enhanced regular expression engine (a more powerful alternative to Python’s built-in re). | Docs |
| textblob | Easy-to-use text analysis library for tokenization, sentiment, and tagging (built on NLTK). | Docs |
| wordcloud | Generate visual word clouds from text data for exploratory analysis. | Docs |
| nltk | Classic NLP library with tokenization, stemming, tagging, and linguistic corpora (~10 MB + corpora ~1 GB). | Docs |
| spacy | Industrial-strength NLP with pretrained models for tokenization, NER, and dependency parsing (~50 MB + models ~300 MB). | Docs |
| sentence-transformers | Modern library for semantic embeddings and text similarity; compact and LLM-compatible. | Docs |
| transformers | Hugging Face Transformers for pretrained and generative language models (large, ~500 MB+ with models). | Docs |
Guidance
- For web and text extraction, start with BeautifulSoup and regex.
- For simple analysis and sentiment, use TextBlob or NLTK.
- For modern semantic tasks (similarity, clustering, embeddings), use Sentence Transformers.
- For advanced or generative NLP, move to Transformers or hosted LLM APIs.
Traditional NLP libraries (NLTK, spaCy) remain valuable for learning language structure and preprocessing, but for summarization, classification, and semantic tasks, LLMs and embedding models now outperform classical pipelines.
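A minimal sentence-transformers sketch for semantic similarity (the model name is a common public checkpoint, downloaded on first use):

```python
from sentence_transformers import SentenceTransformer, util

# Small general-purpose embedding model; fetched on the first run.
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "The invoice was paid last week.",
    "Payment for the bill went through recently.",
    "The hiking trail closes at sunset.",
]
embeddings = model.encode(sentences, convert_to_tensor=True)

# Cosine similarity between the first sentence and the other two.
scores = util.cos_sim(embeddings[0], embeddings[1:])
print(scores)
```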
Large Language Models (LLMs) and Generative AI
| Package | Description | Links |
| --- | --- | --- |
| openai | Official OpenAI client for GPT and embedding models. | Docs |
| anthropic | Client for Claude models by Anthropic. | Docs |
| datasets | Large-scale dataset management and loading (Hugging Face). | Docs |
| langchain | Framework for LLM applications, orchestration, and retrieval. | Docs |
| llama-index | Data framework for context-aware retrieval and LLM apps. | Docs |
| faiss-cpu | Efficient vector similarity search for embeddings (Facebook AI). | Docs |
| chromadb | Lightweight open-source vector database for embeddings (Chroma). | Docs |
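A minimal sketch of a chat call with the official openai client (the model name is illustrative; OPENAI_API_KEY is read from the environment):

```python
from openai import OpenAI

# The client picks up OPENAI_API_KEY from the environment by default.
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[
        {"role": "system", "content": "You are a concise data assistant."},
        {"role": "user", "content": "Summarize why DuckDB suits local analytics."},
    ],
)
print(response.choices[0].message.content)
```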
API Development and Validation
| Package | Description | Links |
| --- | --- | --- |
| fastapi | High-performance web API framework. | Docs |
| pydantic | Data validation and settings management using type hints (v2). | Docs |
| uvicorn | ASGI server used to run FastAPI apps. | Docs |
| slowapi | Simple rate limiting for FastAPI/Starlette. | Docs |
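A minimal FastAPI + Pydantic sketch, served with `uvicorn main:app --reload` (module, route, and field names are arbitrary):

```python
# main.py
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Item(BaseModel):
    # Pydantic validates the request body against these type hints.
    name: str
    price: float

@app.post("/items")
def create_item(item: Item) -> dict:
    # FastAPI parses and validates JSON into the Item model automatically.
    return {"name": item.name, "price": item.price}
```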
Cloud, Deployment, and Hosting
| Package | Description | Links |
| --- | --- | --- |
| modal | Cloud platform for running Python functions serverlessly. | Docs |
| gradio | Build and share ML/LLM web interfaces easily. | Docs |
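A minimal Gradio sketch that wraps a plain Python function in a shareable web UI:

```python
import gradio as gr

def greet(name: str) -> str:
    # Any plain Python function can back the interface.
    return f"Hello, {name}!"

demo = gr.Interface(fn=greet, inputs="text", outputs="text")
demo.launch()  # pass share=True for a temporary public link
```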
Summary
These libraries represent the most common ecosystem used in professional data, analytics, and AI projects.
Select only what your project requires. Combine with the Common Standard Library Modules list for a complete overview of Python’s built-in and external tooling.