These are commonly used third-party Python packages that extend core functionality.
They are not included in the Python Standard Library and must be installed as needed.
Package Management and Core Utilities
Package | Description | Links |
pip | Python’s package installer (standard tool for managing packages). | Docs |
setuptools | Build system and packaging library for Python. | Docs |
wheel | Builds .whl distribution files for faster installs. | Docs |
loguru | Simple, powerful logging with colorized output and rotation support. | Docs |
httpx | Modern, async-capable HTTP client for making web requests and calling APIs. | Docs |
python-dotenv | Loads environment variables from .env files. | Docs |
pre-commit | Automates linting, formatting, and quality checks before commits. | Docs |
uv | Fast Python package manager and virtual environment tool (replaces pip + venv). | Docs |
Note: httpx replaces requests as the modern, async-capable HTTP client. Most requests examples need minimal or no changes.
Documentation
Package | Description | Links |
mkdocs | Fast, lightweight documentation site generator using Markdown. Often used with the Material for MkDocs theme. | Docs |
Text-to-Speech
Package | Description | Links |
pyttsx3 | Offline text-to-speech library for Python (works without internet). | Docs |
Jupyter and Interactive Development
These packages provide notebook and interactive shell capabilities.
In most cases, VS Code already integrates Jupyter support, so you can work with .ipynb files directly, without installing the full JupyterLab environment.
Package | Description | Links |
ipython | Enhanced interactive Python shell with colorized output and %magic commands. | Docs |
ipykernel | Kernel interface used by VS Code’s Jupyter extension to execute notebook cells. | Docs |
jupyter | Core metapackage that ties together IPython and notebook execution; recommended for compatibility. | Docs |
nbdime | Tools for diffing and merging Jupyter notebooks; useful with Git. | Docs |
Optional Jupyter
Package | Description | Links |
ipywidgets | Adds interactive widgets (sliders, dropdowns) for richer notebooks and dashboards. | Docs |
NOTE: Notebooks that use ipywidgets will not render on GitHub. They can be displayed using MyBinder or another hosting platform.
Optional JupyterLab Environment (instead of VS Code)
Package | Description | Links |
jupyterlab | Full-featured, browser-based IDE for notebooks, code, and data. Use only if running JupyterLab outside VS Code (e.g., remote server, Binder, JupyterHub). | Docs |
jupyterlab-git | Git integration panel for the JupyterLab web interface. | Docs |
Excel File Reading and Writing
Package | Description | Links |
openpyxl | Primary library for .xlsx / .xlsm; handles formulas, charts, formatting (~8 MB). Does not read legacy .xls files. | Docs |
xlsxwriter | Advanced Excel writer supporting formatting and charts. | Docs |
xlrd | Reads legacy .xls Excel files (for backward compatibility). | Docs |
pyexcel | Unified access to multiple spreadsheet formats. | Docs |
Package | Description | Links |
duckdb | In-process analytical database optimized for OLAP workloads. | Docs |
pyarrow | Apache Arrow: shared memory format for efficient data exchange across Pandas, Polars, and DuckDB. | Docs |
sqlalchemy | SQL toolkit and ORM for relational databases. | Docs |
dbt-core | SQL-based data transformation framework. | Docs |
dbt-duckdb | dbt adapter for DuckDB back-ends. | Docs |
sqlmesh | Declarative data transformations in SQL and Python. | Docs |
prefect | Modern workflow orchestration and dataflow automation. | Docs |
great_expectations | Data validation and quality framework for pipelines (Great Expectations, conventionally imported as gx). | Docs |
Data Analysis and Manipulation
Package | Description | Links |
numpy | Core numerical array and matrix library (20–30 MB). | Docs |
pandas | Data manipulation and analysis built on NumPy (10–20 MB). | Docs |
polars | High-performance DataFrame library (Rust-based, ~5–10 MB). | Docs |
Visualization
Package | Description | Links |
matplotlib | Foundation plotting library (~30 MB). | Docs |
seaborn | Statistical visualization built on matplotlib (~2–5 MB). | Docs |
altair | Declarative statistical visualization library built on Vega-Lite. | Docs |
plotly | Interactive plotting and dashboards (~20–25 MB). | Docs |
Continuous Intelligence and Interactive Analytics
Package | Description | Links |
shiny | Interactive web applications for data analytics in Python. | Docs |
streamlit | Simplified web app framework for data dashboards. | Docs |
dash | Analytical web application framework by Plotly. | Docs |
Distributed and Parallel Computing
Package | Description | Links |
dask | Parallel and distributed computing for analytics (~50 MB). Stable, but no longer under rapid development. | Docs |
ray | Distributed computing framework for ML training, data processing, and serving. | Docs |
Kafka and Stream Processing
Package | Description | Links |
kafka-python-ng | Kafka client for Python 3.5+ supporting KRaft mode (~1 MB). | Docs |
pyspark | Distributed computation and structured streaming (heavy, 200+ MB). | Docs |
streamz | Lightweight streaming and reactive data pipelines. | Docs |
Email and SMS Alerts
Package | Description | Links |
dc-mailer | Send email alerts from Python (requires Gmail configuration). | Docs |
dc-texter | Send SMS text alerts using Gmail (requires Gmail configuration). | Docs |
Machine Learning and Optimization
These libraries provide classical and modern tools for regression, classification, forecasting, and inference.
They form the foundation for applied analytics and machine learning pipelines.
Package | Description | Links |
statsmodels | Classical statistics, regression, and inference. | Docs |
scikit-learn | Core ML library for supervised/unsupervised learning. | Docs |
optuna | Hyperparameter optimization framework. | Docs |
xgboost | Gradient boosting library widely used in production ML. | Docs |
lightgbm | Fast, memory-efficient gradient boosting by Microsoft. | Docs |
catboost | Gradient boosting with categorical feature support. | Docs |
Guidance
- Use Statsmodels for statistical inference and regression diagnostics.
- Use Scikit-learn for supervised and unsupervised ML, pipelines, and evaluation.
- Use XGBoost or LightGBM for structured/tabular predictive modeling.
- Use Optuna for hyperparameter tuning and optimization.
- These frameworks remain core even as deep learning and LLMs expand — they form the quantitative foundation of data science.
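To make the Scikit-learn guidance concrete, the following is a minimal sketch of a pipeline with cross-validated evaluation. The synthetic dataset, the choice of logistic regression, and all parameter values are assumptions made for illustration, not recommendations from this list.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic tabular data (assumed sizes) so the example needs no external files.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)

# The pipeline scales features, then fits the classifier; scaling is re-fit
# inside each cross-validation fold, which avoids leaking test-fold statistics.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 5-fold cross-validation; the default score for a classifier is accuracy.
scores = cross_val_score(model, X, y, cv=5)
print(f"mean accuracy: {scores.mean():.3f}")
```

The same pipeline object could be handed to a tuner such as Optuna, or the final estimator swapped for a gradient boosting model, without changing the evaluation code.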
Natural Language Processing (NLP)
Text processing and language understanding in Python can range from simple keyword analysis to advanced generative models.
For most analytics projects, focus on lightweight tools first, then explore classical and modern NLP frameworks as needed.
Package | Description | Links |
beautifulsoup4 | Parse and extract text or tags from HTML or XML; standard tool for web data cleanup. | Docs |
regex | Enhanced regular expression engine (a more powerful alternative to Python’s built-in re). | Docs |
textblob | Easy-to-use text analysis library for tokenization, sentiment, and tagging (built on NLTK). | Docs |
wordcloud | Generate visual word clouds from text data for exploratory analysis. | Docs |
nltk | Classic NLP library with tokenization, stemming, tagging, and linguistic corpora (~10 MB + corpora ~1 GB). | Docs |
spacy | Industrial-strength NLP with pretrained models for tokenization, NER, and dependency parsing (~50 MB + models ~300 MB). | Docs |
sentence-transformers | Modern library for semantic embeddings and text similarity; compact and LLM-compatible. | Docs |
transformers | Hugging Face Transformers for pretrained and generative language models (large, ~500 MB+ with models). | Docs |
Guidance
- For web and text extraction, start with BeautifulSoup and regex.
- For simple analysis and sentiment, use TextBlob or NLTK.
- For modern semantic tasks (similarity, clustering, embeddings), use Sentence Transformers.
- For advanced or generative NLP, move to Transformers or hosted LLM APIs.
Traditional NLP libraries (NLTK, spaCy) remain valuable for learning language structure and preprocessing,
but for summarization, classification, and semantic tasks, LLMs and embedding models now often outperform classical pipelines.
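The first guidance step (extract text from HTML, then clean it with regular expressions) might look like the sketch below. The HTML snippet is invented for illustration, and the example uses the standard library `re` module rather than the third-party `regex` package; for a simple pattern like this one, `regex` should work as a drop-in replacement.

```python
import re
from collections import Counter

from bs4 import BeautifulSoup

# Invented sample page; a real project would load this from a file or response.
html = """
<html><body>
  <h1>Release notes</h1>
  <p>The parser is <b>fast</b>. The parser is small.</p>
  <script>console.log("ignored");</script>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Remove non-content tags so their text does not pollute the word counts.
for tag in soup(["script", "style"]):
    tag.decompose()

# Collapse the remaining markup into plain text, then tokenize lowercase words.
text = soup.get_text(separator=" ", strip=True)
words = re.findall(r"[a-z]+", text.lower())
counts = Counter(words)

print(counts.most_common(3))
```

The resulting word counts are the kind of input that wordcloud or a simple TextBlob analysis would take next.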
Large Language Models (LLMs) and Generative AI
Package | Description | Links |
openai | Official OpenAI client for GPT and embedding models. | Docs |
anthropic | Client for Claude models by Anthropic. | Docs |
datasets | Large-scale dataset management and loading (Hugging Face). | Docs |
langchain | Framework for LLM applications, orchestration, and retrieval. | Docs |
llama-index | Data framework for context-aware retrieval and LLM apps. | Docs |
faiss-cpu | Efficient vector similarity search for embeddings (Facebook AI). | Docs |
chromadb | Chroma, a lightweight open-source vector database for embeddings. | Docs |
API Development and Validation
Package | Description | Links |
fastapi | High-performance web API framework. | Docs |
pydantic | Data validation and settings management using type hints (v2). | Docs |
uvicorn | ASGI server used to run FastAPI apps. | Docs |
slowapi | Simple rate limiting for FastAPI/Starlette. | Docs |
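As a small illustration of validation driven by type hints in pydantic v2, consider the sketch below. The `Alert` model, its fields, and the sample values are hypothetical.

```python
from pydantic import BaseModel, Field, ValidationError


class Alert(BaseModel):
    """Hypothetical request payload, used only for this example."""

    recipient: str
    priority: int = Field(default=1, ge=1, le=5)  # allowed range 1..5 (assumed)


# In pydantic's default (lax) mode the numeric string "3" is coerced to int.
ok = Alert.model_validate({"recipient": "ops@example.com", "priority": "3"})
print(ok.priority)

# An out-of-range value raises ValidationError with per-field details.
error_count = 0
error_fields = []
try:
    Alert.model_validate({"recipient": "ops@example.com", "priority": 9})
except ValidationError as exc:
    error_count = exc.error_count()
    error_fields = [err["loc"] for err in exc.errors()]

print(error_count, error_fields)
```

FastAPI applies the same kind of model to request bodies when it is used as a parameter type in a route function.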
Cloud, Deployment, and Hosting
Package | Description | Links |
modal | Cloud platform for running Python functions serverlessly. | Docs |
gradio | Build and share ML/LLM web interfaces easily. | Docs |
Summary
These libraries represent the most common ecosystem used in professional data, analytics, and AI projects.
Select only what your project requires. Combine with the Common Standard Library Modules list for a complete overview of Python’s built-in and external tooling.