ROADMAP: Languages to Know
For Python Data Analytics, Business Intelligence, Machine Learning, and more.
These languages and formats appear throughout data analytics from early scripts to advanced machine learning and data products. Check the boxes as you add skills.
Documentation & Communication
- [ ] Markdown - Format notes, README files, Jupyter content, Sphinx reports
- [ ] LaTeX - (pronounced "LAH-tek" or "LAY-tek") Typesetting system used in academic writing, especially for math-heavy documents
Programming & Scripting
- [ ] Python - Main language for data analytics, ML, BI, and automation
- [ ] Excel (Formulas + Functions + Python) - Recent Microsoft 365 versions of Excel support Python through the =PY() function, with the code running in the Microsoft Cloud. This allows more powerful data analysis and visualization directly in workbooks. It can take over some analysis tasks that once relied on VBA, although VBA is still used for workbook automation.
- [ ] PowerShell - Powerful cross-platform scripting tool for automation
- [ ] Bash / Shell - Command-line scripting for Linux/macOS/WSL workflows
- [ ] R - Optional, mainly for heavy statistical modeling and some legacy analytics projects
Query & Data Definition
- [ ] SQL - Essential for querying, joining, and aggregating data in relational databases
- [ ] DuckDB SQL - Great for local analytics and in-process querying (becoming increasingly popular)
- [ ] NoSQL / MongoDB Query Language - Optional, mainly for document-based storage
- [ ] Graph Query Languages (Cypher / GQL) - Optional unless working specifically with graph databases such as Neo4j (Cypher) or TigerGraph (which uses its own language, GSQL)
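As a minimal sketch of the querying, joining, and aggregating that the SQL item above refers to, the following uses Python's built-in `sqlite3` module with an in-memory database. The table names, columns, and values are invented for illustration; the same `SELECT ... JOIN ... GROUP BY` pattern should carry over to DuckDB and most other relational engines with little or no change.

```python
import sqlite3

# In-memory database; tables and values are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'East'), (2, 'West');
    INSERT INTO orders VALUES (1, 1, 100.0), (2, 1, 50.0), (3, 2, 75.0);
""")

# Join each order to its customer, then aggregate per region.
query = """
    SELECT c.region, COUNT(*) AS n_orders, SUM(o.amount) AS total
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    GROUP BY c.region
    ORDER BY c.region
"""
rows = conn.execute(query).fetchall()
for region, n_orders, total in rows:
    print(region, n_orders, total)

conn.close()
```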
Data Formats & Configuration
- [ ] CSV / TSV - Common text-based data formats for tabular data
- [ ] JSON - Lightweight data format used in APIs, configs, and logs
- [ ] YAML - Common in configuration files (e.g., GitHub Actions workflows, Docker Compose, CI/CD pipelines)
- [ ] Parquet - Essential for big data work and columnar storage (used in Spark, DuckDB, etc.)
- [ ] Pickle (.pkl) - Python-specific binary format for saving/loading objects. Use with caution: it is not human-readable or cross-language, and loading a pickle from an untrusted source can execute arbitrary code. Parquet or JSON is preferred for sharing data.
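To make the difference between the text formats above concrete, here is a small sketch that reads CSV and round-trips the result through JSON using only the Python standard library. The sample data and variable names are hypothetical; a real project would usually read from files rather than an in-memory string.

```python
import csv
import io
import json

# Hypothetical sample data standing in for a CSV file on disk.
csv_text = "name,score\nAda,91\nLin,78\n"

# csv.DictReader yields one dict per row; every value arrives as a string.
records = list(csv.DictReader(io.StringIO(csv_text)))
for record in records:
    record["score"] = int(record["score"])  # CSV has no types, so convert explicitly

# JSON preserves the number/string distinction and supports nesting.
json_text = json.dumps(records, indent=2)
restored = json.loads(json_text)
print(restored)
```

CSV carries no type information, which is why the explicit `int()` conversion is needed; JSON keeps the integer as an integer across the round trip.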
Web, Text, and App Interfaces
- [ ] HTML - Essential for dashboards, Flask, and web-based reporting
- [ ] CSS - Optional, mainly for web-based visual customization
- [ ] JavaScript - Optional, unless customizing interactive visualizations or web dashboards
- [ ] Streamlit - User-friendly tool for creating quick, interactive UIs for data projects. Minimal code, fast deployment.
- [ ] Dash - More customizable than Streamlit, ideal for complex web applications with deeper HTML integration.
- [ ] PyShiny - A Python port of R's Shiny that allows reactive, event-driven UIs without frontend coding. Great for interactive dashboards.
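For a sense of why basic HTML matters for web-based reporting, the sketch below builds a small HTML table from Python data using only the standard library. The function name `to_html_table` and the sample rows are hypothetical; frameworks such as Flask, Dash, or Streamlit would normally generate or template this markup for you.

```python
from html import escape

# Hypothetical report data: the first row is the header.
rows = [("Region", "Total"), ("East", "150"), ("West & Central", "75")]

def to_html_table(rows):
    """Render a list of string tuples as a simple HTML table."""
    header, *body = rows
    head = "".join(f"<th>{escape(cell)}</th>" for cell in header)
    lines = [f"<tr>{head}</tr>"]
    for row in body:
        # escape() converts characters such as & and < so they display as text.
        cells = "".join(f"<td>{escape(cell)}</td>" for cell in row)
        lines.append(f"<tr>{cells}</tr>")
    return "<table>\n" + "\n".join(lines) + "\n</table>"

html_table = to_html_table(rows)
print(html_table)
```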
Machine Learning & AI
- [ ] Python with scikit-learn - Core for data science and machine learning
- [ ] TensorFlow / PyTorch code syntax - For deep learning projects
- [ ] ONNX / SavedModel formats - For sharing ML models and deploying models in production environments
- [ ] LangChain - A modern framework for building applications with LLMs (Large Language Models). It enables advanced text processing, conversational AI, and document-based search.
Streaming & Web Mining
- [ ] Kafka message formats (JSON, Avro) - For real-time streaming analytics
- [ ] Scrapy / BeautifulSoup - Python-based tools for web scraping
- [ ] Regex - Critical for text mining and data cleaning
- [ ] Fugue - Extends native Python, Pandas, and SQL code to distributed computing (Spark, Dask).
- [ ] Prefect - Modern workflow orchestration tool for data pipelines. Handles scheduling, retries, logging, and monitoring for ETL and ELT processes.
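As an illustration of the Regex item in the list above, this sketch uses Python's built-in `re` module for two common cleaning tasks: normalizing inconsistent phone formats and pulling email-like tokens out of free text. The sample values, the `clean_phone` helper, and the deliberately simple email pattern are all assumptions for demonstration, not production-grade validation.

```python
import re

# Hypothetical messy values, as they might appear in a scraped column.
raw_phones = ["(555) 123-4567", "555.123.4567", " 555 123 4567 ", "n/a"]

def clean_phone(value):
    """Return the number as 'XXX-XXX-XXXX', or None if it does not have 10 digits."""
    digits = re.sub(r"\D", "", value)  # drop every character that is not a digit
    if len(digits) != 10:
        return None
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"

cleaned = [clean_phone(v) for v in raw_phones]
print(cleaned)

# Extract email-like tokens from free text (a simple, permissive pattern).
text = "Contact ada@example.com or lin@example.org for details."
emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.-]+", text)
print(emails)
```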
BI & Automation Expression Languages
- [ ] VBA (Visual Basic for Applications) - Legacy; still useful for Office automation but diminishing. Python in Excel now covers some of its data analysis use cases.
- [ ] DAX (Data Analysis Expressions) - Critical for Power BI, measures, and complex filters
- [ ] Power Query M - Essential for data transformation in Power BI and Excel