Chapter 1

Languages

This chapter provides an introduction to popular languages for analytics.

Languages in Analytics

Data analysts often work with a variety of languages, which can be broadly categorized into programming languages and markup languages.

Programming languages are further classified into compiled and scripting languages.

Compiled languages, such as Go, Rust, Java, and C# (“C-sharp”), require a separate compilation step that converts source code into machine code or bytecode before the program runs, which typically yields faster execution and more opportunity for performance optimization.

Scripting languages, such as Python, R, and JavaScript, are interpreted at runtime, which provides more flexibility and ease of use and makes them popular choices for data analysis tasks.

Markup languages, like Markdown, HTML, and CSS, are used to structure and present data, rather than performing computations.

Data analysts often use markup languages to store, exchange, and visualize data, in conjunction with programming languages for data manipulation and analysis.

Familiarity with various languages across these categories enables data analysts to effectively handle diverse data sources, perform complex analyses, and communicate results in a clear, accessible manner.

In alphabetical order, some of the languages you may encounter include the following.

CSS

Markup Language, Web Development

CSS (Cascading Style Sheets) is a stylesheet language used for describing the look and formatting of a document or web page written in HTML. While not directly related to data analytics, it’s essential for creating visually appealing dashboards and reports.

Go

Programming Language, Compiled Language

Go is a statically typed, compiled language with strong support for concurrent programming. While not as popular for data analytics as Python or R, Go is gaining traction for developing high-performance data processing tools.

HTML

Markup Language, Web Development

HTML (Hypertext Markup Language) is the standard markup language used to create web pages. It is useful for structuring and formatting web content, including data visualizations and interactive analytics applications.

JavaScript

Programming Language, Scripting Language, Web Development

JavaScript is a widely-used programming language that enables interactivity and dynamic content on the web. In data analytics, JavaScript is commonly used with libraries like D3.js to create interactive visualizations and web-based applications.

Julia

Programming Language, Scripting Language, Jupyter Support

Julia is a high-level, high-performance programming language for technical computing. It is gaining popularity in data analytics due to its speed, ease of use, and extensive library ecosystem, including packages for data manipulation, statistical analysis, and machine learning. It can be used in Jupyter notebooks along with Python.

LaTeX

Markup Language, Typesetting

LaTeX (“la-TECH”) is a markup language used for creating professional-looking documents, including academic papers, capstone reports, theses, and presentations. It is widely used in the scientific and technical communities due to its ability to handle complex equations and symbols with ease.

Markdown

Markup Language, Jupyter Support

Markdown is a lightweight markup language used to create formatted text documents. While not specific to data analytics, it is commonly used to document code, write README files, and create reports in a simple and human-readable format. It is commonly used in Jupyter notebooks along with Python.

PowerShell

Programming Language, Scripting Language

PowerShell is a powerful scripting language and shell designed for automating tasks and managing configurations in Windows environments. While not commonly used for data analytics, it can be employed for data extraction, transformation, and automation tasks.

Python

Programming Language, Scripting Language, Jupyter Support

Python is a popular programming language for data science and machine learning. It offers extensive libraries and tools for data analysis, visualization, and machine learning, making it an excellent choice for data analytics tasks.

R

Programming Language, Scripting Language, Jupyter Support

R is a programming language and software environment for statistical computing and graphics. It is widely used in data analytics for statistical analysis, data manipulation, and visualization. R can be used in Jupyter notebooks along with Python.

Rust

Programming Language, Compiled Language

Rust is a systems programming language focused on safety, concurrency, and performance. While not as widely used for data analytics, it can be employed for building high-performance data processing tools and libraries.

SQL

Programming Language, Declarative Language

SQL is a domain-specific programming language used to manage and manipulate relational databases.

Typst

Markup Language, Typesetting

Typst is a new typesetting option that aims to simplify the document creation process. It provides an intuitive markup language for formatting text, with support for mathematical equations, tables, and figures. It can be compiled into various document formats, including PDF and HTML.

Subsections of Languages

CSS

CSS is a powerful styling language used to add visual effects to web pages.

Why CSS?

For web developers and designers, Cascading Style Sheets (CSS) is an essential skill for creating attractive and engaging websites.

  • CSS helps to create visually appealing layouts and designs that enhance user experience.
  • It allows for consistent styling across all pages of a website, making it easier to maintain and update.

CSS Syntax

  • CSS uses a set of rules and declarations to style HTML elements.
  • Selectors are used to target specific HTML elements, while properties define the styling rules.

Free Resources for Learning CSS

  • CSS Tricks: A website with a wide range of articles, tutorials, and resources for learning CSS
  • MDN Web Docs - CSS: A comprehensive guide to CSS, with documentation and examples
  • W3Schools CSS Tutorial: A free, interactive tutorial for learning CSS, with practical examples and exercises
  • Codecademy CSS Course: An interactive course that covers the basics of CSS, with hands-on coding exercises
  • CSS Zen Garden: A showcase of creative CSS designs, with source code available for learning

File Extensions

  • .css

Using CSS

There is no installation needed to begin using CSS to style web pages. CSS is understood by web browsers such as Chrome, Firefox, Safari, and Edge. To use CSS, you can define styles in a separate CSS file or in the head section of an HTML file using the <style> element.

To add a .css file to an HTML file, include a link in the head section of the HTML file.

<head>
  <link rel="stylesheet" href="styles.css">
</head>

An example of styles.css is shown below.

body {
  font-family: Arial, sans-serif;
  background-color: #f0f0f0;
}

h1 {
  color: #333;
  font-size: 2em;
  margin-bottom: 1em;
}

p {
  color: #666;
  font-size: 1.2em;
  line-height: 1.5;
  margin-bottom: 1.5em;
}

Responsive Design

Responsive design is an approach to web design that aims to create websites that adapt to different screen sizes and devices. With responsive design, web developers can ensure that their websites look and function well on desktops, laptops, tablets, and smartphones, and provide a consistent user experience across all devices.

To create responsive websites, CSS is used to define media queries that specify different styles and layouts for different screen sizes. By using media queries, web developers can adjust the design of their websites based on the width of the viewport, the orientation of the device, and other factors.

Design Skills

Good design is an essential aspect of creating effective and engaging websites and dashboards. CSS plays a crucial role in web design, as it allows users to control the visual presentation of their sites and create attractive and user-friendly interfaces.

To create good design with CSS, it’s important to have a solid understanding of typography, color theory, layout principles, and user experience design. Web developers can use CSS to define fonts, colors, spacing, positioning, and other visual elements, and use design principles to create a cohesive and appealing look and feel.

Because getting dashboards and web pages to look good on all possible screen sizes and orientations is difficult, many of us prefer to use professionally created CSS rather than writing our own.

CSS frameworks are pre-built libraries of CSS and JavaScript code used to streamline web development and create consistent displays. These frameworks are designed to be responsive and look good on screens ranging from mobile devices like smartphones to large, wall-mounted screens. They provide a range of pre-designed elements, such as navigation bars, forms, and buttons, that can be easily customized and incorporated into dashboards and web projects.

Popular CSS frameworks include Bootstrap, Material Design Bootstrap (MDB), Foundation, and Bulma. These frameworks offer a wide range of design options, robust documentation, and support from their communities.

CSS in Dashboarding Frameworks

Many popular data analytics dashboarding frameworks allow customization for analysts with a knowledge of CSS.

CSS in Tableau

Tableau provides a range of built-in formatting options for dashboard styling, including workbook-level and worksheet-level settings for fonts, colors, and backgrounds. Tableau dashboards themselves are not styled with custom CSS; CSS knowledge is most useful when embedding Tableau views in a web page, where the surrounding layout is under your control.

CSS in Power BI

Power BI allows users to customize the appearance of their reports and dashboards using report themes, which are defined in JSON files rather than CSS. Themes control the styling of individual elements such as charts, tables, and cards. CSS knowledge applies mainly when embedding reports in a web page or developing custom visuals.

CSS in Plotly

Plotly is a web-based data visualization platform that provides a range of customization options for dashboard styling, including the ability to use custom CSS code to modify the appearance of charts and graphs. Users can modify the styling of individual elements such as colors, fonts, and backgrounds, and can apply custom CSS classes to elements for more granular control over styling.

Plotly supports multiple programming languages including Python, R, and JavaScript.

CSS in Metabase

Metabase is an open-source business intelligence and data analytics platform that allows users to create interactive dashboards and reports. It provides a range of customization options for dashboard styling, including the ability to use custom CSS code to modify the appearance of dashboards and reports. Users can modify the styling of individual elements such as fonts, colors, and backgrounds, and can apply custom CSS classes to elements for greater control over styling.

Metabase supports SQL queries and has a web-based interface.

CSS in Redash

Redash is an open-source data visualization and dashboarding platform that allows users to connect to various data sources and create interactive dashboards and reports. It provides a range of customization options for dashboard styling, including the ability to use custom CSS code to modify the appearance of dashboards and reports. Users can modify the styling of individual elements such as fonts, colors, and backgrounds, and can apply custom CSS classes to elements for more granular control over styling.

Redash connects to SQL databases, MongoDB, and APIs, and includes support for Python scripts.


Go

Powerful and Efficient Programming Language

Go, often called Golang, is an open-source programming language developed at Google. It is designed for simplicity, efficiency, and strong support for concurrent programming.

Why Go?

For developers, Golang offers several advantages over other programming languages:

  • Go is designed for simplicity, making it easy to learn and write.
  • It has strong support for concurrent programming, allowing for efficient performance in multi-core environments.
  • Go has a garbage collector, which automatically manages memory allocation and deallocation.
  • It has a growing ecosystem and community, with a range of libraries and frameworks available.

Go Syntax

  • Golang has a clean and straightforward syntax, influenced by C but with some improvements.
  • It uses static typing and supports various data types, including integers, floats, strings, and arrays.
  • Go has built-in support for concurrent programming with goroutines and channels.

Free Resources for Learning Go

  • The Go Programming Language: The official Go website, with documentation, tutorials, and downloads.
  • A Tour of Go: An interactive introduction to Golang, with hands-on coding exercises.
  • Go by Example: A collection of practical examples and snippets for learning Golang.
  • Effective Go: A guide to writing efficient and idiomatic Golang code.
  • The Go Playground: An online environment for writing and testing Golang code.

Golang Frameworks and Libraries

  • Golang has a growing ecosystem of libraries and frameworks, catering to various use cases such as web development, data processing, and networking.
  • Popular Golang frameworks and libraries include Gin, Revel, and Gorilla.

File Extensions

  • .go

HTML

HTML is a markup language used for creating web pages and applications.

Why HTML?

HTML is essential for data analysts and developers who want to create web-based applications and documents.

  • HTML skills allow you to create and publish web content and applications.
  • HTML can be used with other languages like CSS and JavaScript to create dynamic web pages and applications.

HTML Syntax

  • HTML is a markup language that uses tags to define elements on a web page.
  • Tags are used to define headings, paragraphs, links, images, and other elements.
  • HTML documents are typically saved with the .html file extension.

Free Resources for Learning HTML

File Extensions

  • .html

JavaScript

JavaScript is a popular programming language used for web development and beyond. This page covers some basics of JavaScript.

Why JavaScript?

JavaScript is widely used for building web applications, and it’s a vital skill for web developers. Some of the reasons to learn JavaScript include:

  • Interactivity: JavaScript makes websites more interactive and engaging, allowing for features such as animations, user input validation, and dynamic content updates.

  • Front-end web development: JavaScript is used heavily in front-end web development, enabling developers to build user interfaces and dynamic web pages.

  • Back-end web development: JavaScript can also be used for back-end web development, allowing developers to build server-side applications and APIs.

  • Cross-platform development: With runtimes like Node.js and frameworks such as Electron and React Native, JavaScript can be used to build cross-platform applications for desktop and mobile devices.

Free Resources for Learning JavaScript

  • JavaScript Tutorial for Beginners: A comprehensive tutorial covering the basics of JavaScript syntax, data types, operators, functions, and more.

  • MDN Web Docs: JavaScript: Mozilla’s guide to JavaScript, including a reference guide, tutorials, and examples.

  • Eloquent JavaScript: A free online book that covers the basics of JavaScript programming, including control structures, functions, objects, and more.

  • JavaScript30: A free 30-day JavaScript coding challenge that covers different aspects of the language and helps build real-world projects.

  • Codecademy: JavaScript: An interactive online course that teaches the basics of JavaScript programming.

Free Resources for Advanced JavaScript

  • You Don’t Know JS: A series of books that covers advanced JavaScript topics, including closures, prototypes, asynchronous programming, and more.

  • JSBooks: A collection of free JavaScript books covering advanced topics such as functional programming, design patterns, and algorithms.

  • Node.js: A JavaScript runtime built on Chrome’s V8 JavaScript engine that allows developers to build scalable network applications.

File Extensions

  • .js

Julia

High-Performance Dynamic Programming Language

Julia is a high-level, high-performance dynamic programming language designed for numerical and scientific computing, data analysis, and machine learning.

Why Julia?

For developers, Julia offers several advantages over other programming languages:

  • Julia has a just-in-time (JIT) compiler, which means that well-written Julia code can approach the speed of statically compiled languages like C and Fortran.
  • It has a simple and expressive syntax, making it easy to learn and write.
  • Julia supports multiple dispatch, which allows for flexible and efficient handling of functions with different argument types.
  • It has a growing ecosystem and community, with a range of libraries and frameworks available.

Julia Syntax

  • Julia has a simple and readable syntax, with support for multiple dispatch and type inference.
  • It supports various data types, including integers, floats, strings, and arrays.
  • Julia has built-in support for parallel and distributed computing.

Project Management

Project.toml is a configuration file used in Julia projects to specify the project’s dependencies and other metadata. It is part of the Julia package management system, which provides a standardized way to manage packages and their dependencies.

Project.toml is used by the Julia package manager to create and manage project environments. When a Project.toml file is present in a project directory, the package manager can use this file to create a dedicated environment for the project, separate from the user’s global environment or other project environments.

Through its [compat] section, Project.toml lets developers declare which versions of each dependency the project supports, while the exact resolved versions are recorded in the companion Manifest.toml file. This helps ensure that the project is compatible with specific versions of each package, and can help avoid conflicts or unexpected behavior caused by incompatible package versions.

Project.toml can also include other metadata about the project, such as its name, version number, and author information. This makes it easy to share and distribute the project with others.

Free Resources for Learning Julia

Julia Frameworks and Libraries

  • Julia has a growing ecosystem of libraries and frameworks, catering to various use cases such as data processing, scientific computing, and machine learning.
  • Popular Julia frameworks and libraries include Flux, DifferentialEquations.jl, and JuMP.

File Extensions

  • .jl

LaTeX

LaTeX, pronounced “la-TECH”, is a high-quality typesetting system designed for the production of technical and scientific documents. It is widely used in academia, industry, and publishing, and is known for its ability to produce professional-looking documents with complex mathematical formulas and graphics.

Why LaTeX?

LaTeX offers several advantages over traditional word processors such as Microsoft Word or Google Docs:

  • Precision: LaTeX is designed to produce high-quality, precise documents with consistent formatting, layout, and typesetting.

  • Flexibility: LaTeX allows users to easily create and format complex mathematical equations, symbols, and diagrams.

  • Portability: LaTeX documents can be easily converted to a variety of formats, including PDF, HTML, and other document types.

LaTeX for Scientific Writing

  • LaTeX is a preferred tool for writing scientific documents such as research papers, technical reports, capstone project reports, and theses.

  • LaTeX provides powerful tools for creating and formatting complex equations and symbols, making it ideal for scientific writing.

LaTeX for Presentations

  • LaTeX can be used to create professional-looking presentations using the Beamer class.

  • Beamer provides a variety of presentation templates and themes, and allows users to easily incorporate mathematical equations and graphics.

Basic LaTeX Syntax

LaTeX uses markup syntax to create formatted text, equations, and graphics.

Here are some basic syntax elements of LaTeX.

Math Mode

Math mode is used to create mathematical equations and symbols. To enter math mode, use the $ symbol to enclose your equation or symbol.

$f(x) = x^2$

Commands

LaTeX uses commands to perform various formatting and typesetting tasks. Commands are preceded by a backslash (\).

\section{Introduction}

Environments

Environments are used to apply formatting or styles to a block of text or content. Environments are enclosed by the \begin{environment} and \end{environment} commands.

\begin{itemize}
\item Item 1
\item Item 2
\item Item 3
\end{itemize}

Integration

LaTeX can be used in combination with other tools, such as BibTeX for managing bibliographic references and citations.

Free Resources for Learning LaTeX

  • LaTeX Project: The official website for LaTeX, with documentation, tutorials, and resources.

  • Overleaf: A cloud-based LaTeX editor with templates, tutorials, and collaboration tools.

  • ShareLaTeX: A cloud-based LaTeX editor that has since merged into Overleaf.

  • LaTeX Wikibook: A community-driven LaTeX guide with tutorials, examples, and reference materials.

  • LaTeX Tutorial by Overleaf: A beginner-friendly LaTeX tutorial by Overleaf, with examples and exercises.

File Extensions

Here are some common file extensions used in LaTeX.

  • .tex: The main file extension for LaTeX documents.

  • .bib: The file extension for bibliographic data files, used with BibTeX to manage references and citations in reports and documents.

See Also

Markdown

Markdown is a lightweight markup language for formatting text on the web.

Why Markdown?

Markdown is an essential tool for data analysts and developers. With its simple syntax and powerful features, Markdown is easy to learn, widely used, and perfect for creating structured documents and web content.

For data analysts and developers:

  • Markdown is an invaluable skill for creating clear and concise documentation of our work.
  • Markdown skills help communicate our findings more effectively to colleagues and stakeholders, and make our work more accessible and engaging to others.

Markdown for READMEs

  • Markdown can be used to create professional README.md files to introduce our project repositories on GitHub.
  • README.md files help others understand the purpose of our project, its features, and how to use it.

Markdown for Jupyter Notebooks

  • Markdown is widely used in Jupyter Notebooks, a popular tool for data analysis and scientific computing.
  • With Markdown, we can create rich and informative narratives alongside our code and visualizations.

Basic Markdown Syntax

Markdown uses plain text formatting to create headers, lists, links, and other formatting elements. Here are some basic syntax elements of Markdown:

Headers

Headers are used to create headings or subheadings in your document. To create a header, use the # symbol followed by a space and the text for your heading. Markdown supports up to six levels of headers.

# This is a level one header
## This is a level two header
### This is a level three header
#### This is a level four header
##### This is a level five header
###### This is a level six header

Lists

Lists are used to create ordered and unordered lists in your document. To create a list, use either the * symbol or the - symbol for an unordered list, or use numbers for an ordered list.

An unordered list in Markdown is created by using the “- ” syntax (“dash space”), followed by the list item.

- Item 1
- Item 2
- Item 3

An ordered list in Markdown is created by using the “1. ” syntax (“one dot space”), followed by the list item. Markdown automatically increments the number of each item when the page is rendered, so every item can be written as “1.” and the list still displays correctly. This makes it easy to add, remove, or reorder items without manually adjusting the numbers.

1. Item 1
1. Item 2
1. Item 3

Links

Links are used to create hyperlinks in your document. To create a link, use square brackets to enclose the link text, followed by the link URL in parentheses.

[Markdown: Getting Started](https://www.markdownguide.org/getting-started/)

Images

Images are used to display images in your document. To add an image, use an exclamation point, followed by square brackets to enclose the alt text, and the image URL in parentheses.

![Alt Text](image.url)

Advanced Markdown Syntax

Markdown also supports more advanced syntax, such as tables, code blocks, and inline code. Here are some examples of advanced Markdown syntax.

Tables

Tables are used to display data in rows and columns. To create a table, use pipes or vertical bars (|) to separate the columns, and place a row of hyphens (-) between the header row and the data rows.
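As a rough illustration of the table layout, the following Python sketch builds the text of a Markdown table from a list of headers and rows. The function name markdown_table and the sample data are hypothetical, and the pipe-and-hyphen layout shown is the one used by common Markdown extensions such as GitHub-Flavored Markdown.

```python
def markdown_table(headers, rows):
    """Build a Markdown table string from a header list and a list of rows."""
    # Header row: cells separated by pipes
    lines = ["| " + " | ".join(headers) + " |"]
    # Separator row: hyphens mark the boundary between header and body
    lines.append("| " + " | ".join("---" for _ in headers) + " |")
    # One line per data row; every value is converted to text
    for row in rows:
        lines.append("| " + " | ".join(str(cell) for cell in row) + " |")
    return "\n".join(lines)


# Hypothetical sample data, for illustration only
table = markdown_table(
    ["Language", "Extension"],
    [["Python", ".py"], ["Markdown", ".md"]],
)
print(table)
```

Pasting the printed text into a Markdown file or a Jupyter Markdown cell should render it as a two-column table.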

Code Blocks

Code blocks are used to display code in your document. To create a code block, use triple backticks followed by the language name, and then your code. End your code block with triple backticks.


```python
print("Hello, world!")
```

Inline Code

Inline code is used to display code within a paragraph. To create inline code, use single backticks (`) to enclose your code.


Use the `print()` function to print a message to the console.

Free Resources for Learning Markdown

Free Resources for Learning GitHub-Flavored Markdown

File Extensions

  • .md
  • .markdown

PowerShell

Cross-Platform Automation and Configuration

PowerShell Core is an open-source automation and configuration programming language for Windows, Linux, and macOS. It provides a powerful command-line interface for managing and automating systems and processes.

Why PowerShell Core?

For developers and system administrators, PowerShell Core offers several advantages over other automation and scripting tools:

  • PowerShell Core is cross-platform.
  • It’s a powerful and flexible scripting language.
  • It has a large and active community of users and contributors, with many resources and tutorials available.

PowerShell Core Syntax

  • PowerShell Core uses a command-line interface and scripting language that is similar to Unix shell scripting.
  • It supports various data types, including strings, numbers, arrays, and objects.
  • PowerShell Core has built-in support for remote management and automation.

Free Resources for Learning PowerShell Core

PowerShell Core Modules and Libraries

  • PowerShell Core has a large and growing collection of modules and libraries, catering to various automation and system management use cases.

  • Popular PowerShell modules include Pester and PSReadLine, both available from the PowerShell Gallery, the central repository for PowerShell modules.

File Extensions

  • .ps1

See Also

There is more information about PowerShell in the Terminals Chapter.

Python

Python is a high-level programming language used for a wide range of applications, from data analysis to web development.

Why Python?

Python is an essential tool for data analysts and developers. With its easy-to-learn syntax, vast library of modules, and robust community support, Python is perfect for:

  • Data analysis, including statistical analysis, data visualization, and machine learning.
  • Web development, including server-side programming, web scraping, and automation.
  • Scripting, including system administration, text processing, and task automation.
  • Scientific computing, including simulations, modeling, and optimization.

Learning Python can be a valuable investment in your career.

Installation

The installation process for Python depends on your operating system. See the Python: Installation section for instructions for your platform.

Python Resources

  • Python.org - The official website of the Python programming language. Includes documentation, tutorials, and downloads for the latest versions of Python
  • Python for Data Analysis, 3E Open Edition or 2E Print - A comprehensive guide to using Python for data analysis, written by Wes McKinney, the creator of pandas
  • Python Data Science Handbook - A free online book that covers the fundamentals of data science using Python
  • Real Python - A collection of tutorials, courses, and articles on Python programming, web development, and data science
  • Python Crash Course - A beginner-friendly guide to Python programming, with examples and exercises covering key topics such as variables, functions, and control flow
  • Python Lingo from Luciano Ramalho, author of the advanced book Fluent Python.

File Extensions

  • .py - Python source code files
  • .ipynb - Jupyter Notebook files (“interactive Python notebook”)
  • .pyc - Compiled Python bytecode files
  • .pyd - Python extension modules on Windows
  • .pyo - Optimized bytecode files (used by Python versions before 3.5)
  • .whl - Python package distribution files (“wheels”)

Subsections of Python

Python: Basics

Python is a popular high-level programming language that is easy to learn and widely used in data analysis, machine learning, web development, and many other fields.

Defining Variables

In Python, we can define a variable and assign a value to it using the “=” operator. For example:

x = 10

Here, we’ve defined a variable x and assigned it the value of 10.

Performing Operations

We can also perform mathematical operations on variables:

y = 5
z = x + y

Here, we’ve defined a variable y and added it to x to create a new variable z.

Expressions

Expressions are combinations of operators and operands that can be evaluated to produce a value.

Python allows us to use expressions to perform operations on variables. For example:

a = 2
b = 3
c = a * b + 1

Here, we’ve defined three variables: a, b, and c. We’ve used the * operator to multiply a and b, and then added 1 to the result.

Expressions can also include functions:

import math
d = math.sqrt(a**2 + b**2)

Here, we’ve imported the math module and used the sqrt() function to calculate the square root of a^2 + b^2.

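The results of expressions can be stored in variables and printed:

x = 10
y = 5
total = x + y
difference = x - y
product = x * y
quotient = x / y

# Print statements
print("x =", x)
print("y =", y)
print("x + y =", total)
print("x - y =", difference)
print("x * y =", product)
print("x / y =", quotient)

A shorter example:

x = 10
y = 5
z = x + y
print(z)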

This code will create two variables, x and y, assign them the values 10 and 5, respectively, and then add them together to create a new variable z with the value 15. Finally, the code prints the value of z.

Statements

In Python, a statement is a line of code that performs an action or task.

Statements are the smallest unit of code that can be executed and they represent an action or command. Each statement performs a specific task, such as defining a variable, calling a function, or creating a loop.

x = 10
print("Hello, world!")
def add_numbers(a, b):
    return a + b

In the above example, the first line (x = 10) is a statement that assigns the value 10 to the variable x. The second line (print("Hello, world!")) is a statement that prints the message “Hello, world!” to the console. The third and fourth lines together form a single statement that defines a function add_numbers, which takes two arguments and returns their sum.

Statements vs Expressions

Any expression can be used as a statement. For example, a function call such as print("Hello, world!") is an expression that can stand on a line by itself.

However, not all statements are expressions. For example, an assignment such as x = 10 or a function definition does not evaluate to a value and cannot be used as part of a larger expression.
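One way to explore the difference is to ask Python's own parser whether a piece of source code is a single expression. The sketch below uses the standard ast module for this; the helper name is_expression is hypothetical and exists only for illustration.

```python
import ast


def is_expression(source):
    """Return True if `source` parses as a single Python expression."""
    try:
        # mode="eval" accepts only one expression and rejects statements
        ast.parse(source, mode="eval")
        return True
    except SyntaxError:
        return False


print(is_expression("2 * 3 + 1"))    # arithmetic expression
print(is_expression("print('hi')"))  # a function call is an expression
print(is_expression("x = 10"))       # assignment is a statement
```

The first two checks report True and the last reports False, because an assignment statement does not produce a value.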

Script

A Python script is simply a collection of statements executed in order to achieve a desired outcome.
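A minimal sketch of a script is shown below, assuming it is saved in a file with the hypothetical name greet.py and run with python3 greet.py. The function names and sample data are also assumptions made for illustration.

```python
# greet.py (hypothetical file name)

def build_greeting(name):
    """Return a greeting for the given name."""
    return f"Hello, {name}!"


def main():
    # Sample data, assumed for illustration
    names = ["Ada", "Grace"]
    # Statements run in order, one greeting per name
    for name in names:
        print(build_greeting(name))


# Run main() only when the file is executed as a script,
# not when it is imported by another module
if __name__ == "__main__":
    main()
```

Placing the work inside main() is a common convention that keeps the script importable, so its functions can be reused or tested elsewhere.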

Python: Installation

Python is a high-level programming language used for a wide range of applications, from data analysis to web development.

Mac/Linux Users

  • Option 1: Official installation instructions. Follow instructions on the official Python website. This is the most up-to-date and comprehensive guide to installing Python on your system.

  • Option 2: Step-by-step installation guide. Check out our installation instructions for a step-by-step guide.

Windows Users

  • Option 1: Official installation instructions. Follow instructions on the official Python website. This is the most up-to-date and comprehensive guide to installing Python on your system.

  • Option 2: Step-by-step installation guide. Check out our detailed installation instructions for a step-by-step guide.

Subsections of Python: Installation

Python: Mac/Linux

Task 1 - Install Python

  1. Open a terminal window
  2. Run the following command to install Python:
    • sudo apt-get install python3 (for Debian/Ubuntu-based systems), or
    • brew install python3 (for macOS)

Task 2 - Install pip

pip is the default package manager for Python used to install, update, and manage Python packages and dependencies.

  1. Open a terminal window
  2. Run the following command to install pip:
    • sudo apt-get install python3-pip (for Debian/Ubuntu-based systems), or
    • python3 -m ensurepip --upgrade (for macOS)

Task 3 - Verify

  1. Open a terminal window
  2. Run the following commands to verify installation:
python3 --version
pip3 --version

or

python --version
pip --version

If you see version information, the installation was successful.

You may need multiple Python versions available on your machine, depending on the requirements of your project and the external tools and libraries required.
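
When several Python versions are installed, it can help to confirm from inside Python which interpreter is running. The following is a minimal sketch using only the standard library; the variable name pip_available is just for illustration.

```python
import importlib.util
import sys

# The interpreter that is actually running this script
print("Python version:", sys.version.split()[0])
print("Interpreter path:", sys.executable)

# find_spec returns None when a module cannot be found
pip_available = importlib.util.find_spec("pip") is not None
print("pip importable:", pip_available)
```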

Python: Windows

Task 1 - Install Python (includes pip)

  1. Go to the Python download page at https://www.python.org/downloads/windows/
  2. Click the “Download Python” button for the latest version of Python
  3. Read and follow the official instructions here (things change; adapting is key!): https://docs.python.org/3/using/windows.html
  4. Run the installer file that you downloaded as an Administrator, checking both options: 
    1. The first checkbox is checked - keep it checked.
    2. Also check “Add Python to PATH” during the installation process
  5. Click “Install Now” to install Python (which will include pip)

Task 2 - Activate the New Environment

Close and reopen the command prompt or PowerShell window to activate the new environment.

Task 3 - Verify Installation

  1. Open a command prompt or PowerShell window
  2. Run the following commands to verify installation:
python3 --version
pip3 --version

or

python --version
pip --version

If you see version information, the installation was successful.

Python Libraries

Python libraries are collections of pre-written code that can be imported and used in your own programs, saving time and effort when developing complex applications.

Python Standard Library

Any installation of Python includes the standard library, a rich set of modules that provide operating system interfaces, file I/O, network programming, data manipulation, and much more.

External Libraries

In addition, Python has a vast ecosystem of external libraries for various purposes, including data analysis, scientific computing, web development, machine learning, artificial intelligence, and more.

Subsections of Python Libraries

Python Standard Library

Python comes with a vast library of modules that are included in any installation of Python, known as the Python Standard Library.

These modules offer a wide range of functionality that can be used for various tasks such as working with data, networking, file handling, and much more.

Here is a brief introduction to some of the commonly used modules in the Python Standard Library:

os

This module provides a way of interacting with the operating system, allowing you to access system files and directories, work with environment variables, and much more.

sys

This module provides access to some variables used or maintained by the interpreter and to functions that interact strongly with the interpreter. It allows you to manipulate the Python runtime environment and perform system-specific operations.

datetime

This module provides classes for working with dates and times. It allows you to create, manipulate, and format dates and times and perform calculations with them.

math

This module provides mathematical functions such as trigonometric functions, logarithmic functions, and many others. It also includes constants such as pi and e.

random

This module provides functions for generating pseudo-random numbers. It can be used for simulating random events, creating games, and much more.

re

This module provides support for regular expressions, a powerful tool for text processing. It allows you to search for patterns in text, extract specific parts of text, and perform various operations on text.

urllib

This module provides a high-level interface for working with URLs and URIs. It allows you to retrieve data from web pages, download files, and much more.

json

This module provides support for working with JSON (JavaScript Object Notation), a lightweight data interchange format. It allows you to encode and decode JSON data, convert JSON data to Python objects, and vice versa.

argparse

This module provides a way of creating command-line interfaces. It allows you to specify arguments and options for your program and provides help messages and error handling.
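
As a quick tour, the sketch below touches each of the modules described above in a line or two. The file name, URL, sample text, and option name are made up for illustration, and each call shown is only one small part of what the module offers.

```python
import argparse
import datetime
import json
import math
import os
import random
import re
import sys
from urllib.parse import urlparse

# os: build a path and read an environment variable (with a fallback)
data_path = os.path.join("data", "sales.csv")  # hypothetical file name
home = os.environ.get("HOME", "not set")

# sys: inspect the running interpreter
major_version = sys.version_info.major

# datetime: date arithmetic
start = datetime.date(2024, 1, 1)
end = start + datetime.timedelta(days=30)

# math: functions and constants
circle_area = math.pi * 2**2

# random: seed the generator so results are repeatable
random.seed(42)
roll = random.randint(1, 6)

# re: extract every run of digits from text
numbers = re.findall(r"\d+", "Order 66 shipped 3 items")

# urllib: split a URL into its parts
host = urlparse("https://www.python.org/downloads/").netloc

# json: convert between Python objects and JSON text
text = json.dumps({"name": "Ada", "score": 95})
record = json.loads(text)

# argparse: define an option and parse a sample argument list
parser = argparse.ArgumentParser(description="Example command-line tool")
parser.add_argument("--count", type=int, default=1, help="number of rows to show")
args = parser.parse_args(["--count", "3"])

print(end, numbers, host, record["score"], args.count)
```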

Years of Experience

For the most part, teams assume analysts can master basic Python syntax in a matter of weeks.

It’s learning and using the vast array of libraries available that can take many years of experience.

Learning how to use this freely available code can be very valuable.

Official Documentation

Python External Libraries

Python has a vast ecosystem of external libraries for data analytics, visualization, and statistical processing. Here are some of the most popular and widely used libraries:

NumPy

NumPy is a powerful library for numerical computing in Python. It provides a high-performance multidimensional array object, along with tools for working with these arrays. NumPy is widely used in scientific computing and data analysis, and is the foundation for many other Python libraries.

pandas

pandas is a library for data manipulation and analysis. It provides a high-performance DataFrame object for working with structured data, and includes tools for data cleaning, merging, and reshaping. pandas is widely used in data science and machine learning, and is a key component of the PyData ecosystem.

Matplotlib

Matplotlib is a library for creating static, animated, and interactive visualizations in Python. It provides a wide range of plotting tools and options, and can create a variety of charts, plots, and graphs. Matplotlib is widely used in scientific computing, data analysis, and machine learning.

Seaborn

Seaborn is a library for creating statistical visualizations in Python. It provides a high-level interface for creating a variety of statistical charts, plots, and graphs, including heatmaps, bar plots, and scatter plots. Seaborn is built on top of Matplotlib and integrates well with pandas data structures.

Scikit-learn

Scikit-learn is a library for machine learning in Python. It provides tools for data preprocessing, feature selection, model selection, and evaluation, and includes a wide range of supervised and unsupervised learning algorithms. Scikit-learn is widely used in data science and machine learning, and is the foundation for many other Python machine learning libraries.

TensorFlow

TensorFlow is a library for machine learning and deep learning in Python. It provides tools for building and training deep neural networks, and includes a wide range of pre-built models for image recognition, natural language processing, and more. TensorFlow is widely used in artificial intelligence, data science, and machine learning.

PyTorch

PyTorch is a library for machine learning and deep learning in Python. It provides tools for building and training deep neural networks, and includes a wide range of pre-built models for image recognition, natural language processing, and more. PyTorch is known for its dynamic computational graph, which enables flexible and efficient model building.

More

These are just a few of the many external libraries available for data analytics, visualization, and statistical processing in Python.

Each library has its own strengths and use cases, so it’s important to know enough about the major options to be able to choose the right tool for the job.

Python Tools

When you first install Python, you have access to the Standard Library.

However, to expand your capabilities and work with various Python projects, you want to install additional packages and dependencies.

Python offers a range of tools that make it easy to install, manage, and maintain these packages and dependencies.

We introduce just some of the popular tools along with recommendations for new personal projects.

Recommended Approach

Since our goal is to get you started quickly, here’s the recommended way to help maximize these benefits from the beginning.

In each new project, create a pyproject.toml file that’s configured to use the following tools. We provide an example file that you can customize.

Don’t get too attached to preferences - each workplace will likely have their own standard set of preferred tools and processes.

⭐ Configure with pyproject.toml ⭐

Use pyproject.toml for configuration. It can help remind us to set our virtual environment, install our dependencies, and format and lint our files for correctness and ease of use.

The pyproject.toml file can be used to configure these recommended tools.

  • build-system: Hatch is a dependency management tool that can be used to publish Python packages to PyPI. Hatchling is a build backend for Hatch that is used to build packages. Both tools use the pyproject.toml file to configure the package’s metadata and dependencies.

  • tool.black: Black is a Python code formatter that uses a pyproject.toml file to configure its behavior. You can specify options such as line length and whether to leave existing string quotes unchanged in the pyproject.toml file.

  • tool.pyright: Pyright is a static type checker for Python that can use a pyproject.toml file to configure its behavior. You can specify options such as which files to include or exclude from type checking, and which Python version to use in the pyproject.toml file.

  • tool.ruff: Ruff is a Python linter that can read its configuration from a pyproject.toml file. You can specify options such as which rules to enable, the line length, and which files to exclude in the tool.ruff section of the pyproject.toml file.

Package Managers

Package managers allow you to fetch and install packages from the internet into your Python environment. Two widely used package managers in Python are pip and conda.

⭐ pip ⭐

pip is the default package manager for Python and makes it easy to install, update, and manage Python packages and dependencies. It is an essential tool for working with Python projects.

conda

conda is another popular package manager for Python, often used with the Anaconda or Miniconda distributions. It can be used alongside or as an alternative to pip.

Environment Managers

Python projects often require different versions of Python and different packages, making it essential to maintain and activate different environments as we work. Two widely used environment managers are venv and conda.

⭐ venv ⭐

venv is the default environment manager for Python and allows you to create and manage virtual environments within a Python project.

conda

conda can also be used as an environment manager in addition to its role as a package manager. It automatically activates a base environment upon installation and allows you to create and manage other environments as needed.

Python Formatters

Formatters are tools that help ensure consistent and readable code by automatically formatting Python code according to predefined styles and standards.

Many work environments will specify the tools and formats they prefer. Some may automatically apply formatting rules when code is pushed to a repository. The following recommendations are for personal projects.

⭐ Format with black ⭐

Black is a popular and highly-regarded Python formatter that aims to provide a simple and opinionated approach to code formatting. It reformats entire files in place, making it easy to integrate into automated workflows.

isort

isort is a Python library and command-line tool that helps ensure Python imports are properly sorted and formatted. It can automatically group imports by type and optimize the order of imports to reduce conflicts and improve readability. The Ruff linter includes isort functionality.

Python Linters

Linters are tools that analyze code and report on potential errors, style violations, and other issues. These tools help ensure that code is well-written, maintainable, and conforms to best practices.

⭐ Lint with Ruff ⭐

Ruff is a newer Python linter that is gaining popularity. It is written in Rust and aims to be much faster than traditional linters like Pylint and Flake8, while covering many of the same rules.

Ruff offers several features that make it a promising option for Python developers, including integration with editors like VS Code, support for custom rule sets, and an easy-to-use command-line interface.

Ruff is configured using the standard pyproject.toml file and includes isort functionality.

Python Type Checkers

Type checkers are tools that analyze your code and attempt to find type-related errors before code runs. This helps catch errors earlier in the development process and can improve the overall quality of your code.

⭐ Typecheck with Pyright ⭐

Pyright is a popular type-checking tool for Python that uses static analysis to identify type-related errors in your code. Pyright supports Python 3.6 and above, and can be used in a variety of development environments, including VS Code and other editors.

When used in VS Code, Pyright provides real-time feedback and suggestions as you code, helping you catch errors and improve the overall quality of your code. Pyright also supports type annotations, allowing you to provide additional information about the types of variables and function arguments in your code.
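
As a small illustration of type annotations, the sketch below annotates a function's parameter and return value. The function name average and the sample scores are made up for this example.

```python
def average(values: list[float]) -> float:
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


scores: list[float] = [90.0, 85.5, 77.0]
print(average(scores))

# A type checker would flag the call below before the code runs,
# because a str is not a list[float]. It is left commented out here.
# average("90, 85.5, 77")
```

The annotations do not change how the code runs; they give the type checker and the reader more information to work with.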

Package Development and Distribution

⭐ Hatch and Hatchling ⭐

Hatch is a command-line tool for managing dependencies and environment isolation for Python developers. It allows developers to easily configure, version, specify dependencies for, and publish packages to PyPI. Hatch can be used to create new Python packages, add dependencies, and manage virtual environments.

Hatchling is the build backend used by Hatch to build Python packages. It is configured declaratively in pyproject.toml, where developers specify the dependencies, entry points, and other package metadata. Hatchling can build packages in different formats, including source distributions and wheels, which Hatch can then upload to PyPI or other package repositories.

Setuptools

Setuptools is a package development and distribution tool for Python that provides features such as package metadata, package installation, and dependency management. Setuptools is widely used and integrates with many other Python tools and frameworks, making it a popular choice for package development and distribution.

Flit

Flit is a lightweight tool for building and publishing Python packages. Flit provides features such as dependency declaration and metadata management, and is designed to be simple and easy to use. Flit also supports building wheels for distribution, making it a good choice for creating packages that can be easily installed on different systems.

Python Build Tools

Using a build tool or command line runner in Python can help new analysts and developers automate repetitive tasks, streamline their workflow, and avoid having to retype complex commands.

There are several build tools available for Python projects that help automate the build process and manage dependencies.

Make

One such tool is Make, an older and widely used build tool that automates the building and testing of software. It is a powerful and flexible tool that can be used to manage complex build processes with many dependencies.

⭐ Build with Just ⭐

Another popular tool is Just, a newer command runner written in Rust. It uses a configuration file called justfile, with a syntax inspired by Make, to define tasks (called recipes) and their dependencies, and is designed to be fast and easy to use. Just is particularly useful for smaller projects that don’t require a full-fledged build system.

Python: AI and ML

Python is a popular programming language that has gained a lot of traction in the fields of artificial intelligence (AI) and machine learning (ML).

Python offers a range of libraries and frameworks that make it easier to develop and deploy AI and ML applications, including:

  • NumPy: A library for numerical computing in Python, NumPy provides support for large, multi-dimensional arrays and matrices, as well as a range of mathematical functions for working with this data.

  • pandas: A library for data manipulation and analysis in Python, pandas provides support for working with structured data in a variety of formats, including CSV, Excel, SQL databases, and more.

  • Scikit-learn: A library for machine learning in Python, Scikit-learn provides a range of algorithms for classification, regression, and clustering, as well as tools for model selection and evaluation.

  • TensorFlow: A popular library for machine learning and deep learning in Python, TensorFlow provides support for building and training neural networks, as well as tools for deploying models on a variety of platforms.

  • Keras: A high-level neural networks API in Python, Keras provides a simple and intuitive interface for building and training deep learning models, as well as support for a range of backends, including TensorFlow.

Python: Environments

Python environments can be confusing at first, but they are essential for developing and deploying Python applications.

Overview

At a high level, you can think of Python environments as isolated “containers” that provide a controlled environment for your code to run in.

They are similar in some ways to operating systems, in that they provide a layer of abstraction between the code and the underlying system, and allow multiple applications to run independently without interfering with each other.

Python Environments

In the case of Python environments, the “container” is a self-contained installation of the Python interpreter and associated packages and dependencies.

Environments allow you to install and manage different versions of Python and packages without affecting other environments or your system Python installation.

By creating separate environments for each project, you can ensure that each project has access to the correct versions of Python and packages, and that packages do not conflict with each other.

This can help ensure that your code works consistently across different machines and environments, and can make it easier to manage and deploy your code.

Importance

There are several reasons why Python environments are important:

Version management

Different projects may require different versions of Python or packages. By creating separate environments for each project, you can ensure that each project has access to the correct versions of Python and packages.

Dependency management

Python packages often have complex dependencies on other packages. By isolating each project in its own environment, you can avoid conflicts between different packages and ensure that each project has the correct dependencies installed.

Reproducibility

By using environments, you can ensure that your code works consistently across different machines and environments. This is important when collaborating with others or when deploying your code to a production environment.

Tools

There are several tools available for managing Python environments, including:

  • virtualenv
  • conda
  • pipenv

These tools make it easy to create, manage, and switch between environments, and can be integrated with development tools like IDEs and text editors.

Create / Activate / Install

In practice, creating a new environment involves using a tool like virtualenv or conda to create a new environment directory, activating the environment, and then installing the required packages and dependencies using pip or conda.

Using The Active Environment

Once the environment is set up, you can run your code within that environment, and any packages you install will be isolated to that environment.
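
On the command line, the usual sequence is python3 -m venv .venv to create the environment, source .venv/bin/activate (macOS/Linux) or .venv\Scripts\activate (Windows) to activate it, and then pip install to add packages. The same creation step can be sketched from Python with the standard-library venv module. The directory name .venv is a common convention, and the temporary directory is used only so the example cleans up after itself.

```python
import sys
import tempfile
import venv
from pathlib import Path

# Create a throwaway environment in a temporary directory.
# with_pip=False keeps the sketch fast; real projects usually want pip.
with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / ".venv"
    venv.create(env_dir, with_pip=False)

    # Every virtual environment contains a pyvenv.cfg marker file
    created = (env_dir / "pyvenv.cfg").exists()
    print("Environment created:", created)

# Inside an activated virtual environment, sys.prefix differs from sys.base_prefix
print("Running inside a venv:", sys.prefix != sys.base_prefix)
```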

Python: Fundamentals

Here is a quick summary of some basic concepts to get started programming with Python.

Human Languages

Python introductions are available in many human languages. See https://wiki.python.org/moin/Languages for more.

Syntax

Python has a simple and consistent syntax which makes it easy to learn and read.

Indentation is used to indicate a block of code, as opposed to curly braces or keywords like ‘begin’ and ‘end’ in other languages.

Indentation matters! (a tab is not the same as spaces)

Comments

Comments are denoted by the hash or pound sign (#). Any text that follows the hash on the same line is considered a comment and is ignored by the Python interpreter. Comments can be used to provide additional information about the code or to temporarily disable parts of the code during development or testing.
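
For example, the short sketch below shows a full-line comment, a trailing comment, and a line of code disabled by a comment. The variable names and values are made up for illustration.

```python
# This whole line is a comment and is ignored by the interpreter
tax_rate = 0.07  # a comment can also follow code on the same line

price = 20.00
# price = 25.00  (temporarily disabled during testing)

total = price * (1 + tax_rate)
print(round(total, 2))
```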

Variables

Variables are used to store values, like numbers or text strings. In Python, you can create a variable by assigning a value to it, like this:

x = 5

Data Types

Python has several built-in data types, including integers (whole numbers), floating point numbers, and strings (text). For example:

x = 5         # an integer
y = 3.14      # a floating-point number
z = "hello"   # a string

Basic Operations

Python supports basic mathematical operations like addition, subtraction, multiplication, and division, using the operators +, -, *, and / respectively:

x = 5
y = 3
print(x + y)   # prints 8

Conditional Statements

Conditional statements allow you to check if certain conditions are true, and then run different code depending on the result. In Python, elif is used as the keyword for “else if”. For example:

x = 5
if x > 0:
    print("x is positive")
elif x == 0:
    print("x is zero")
else:
    print("x is negative")

Functions

Functions are blocks of code that can be reused throughout your program. They can take input values called parameters, and return one or more output values. For example:

def double(x):
    return x * 2

result = double(5)
print(result)  # prints 10

Loops

Loops are used to repeat a block of code multiple times. Python has two types of loops: for loops and while loops. For example:

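# for loop: repeat once for each value in a sequence
for i in range(5):
    print(i)  # prints 0, 1, 2, 3, 4

# while loop: repeat as long as the condition is true
x = 0
while x < 5:
    print(x)  # prints 0, 1, 2, 3, 4
    x += 1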

How To Learn

The best way to learn is by doing - experiment, type code, and build personal projects to gain skills. This is the only course in the program where we work through all the foundational topics. Other courses will jump right in to Python programming by example. It’s best to take this course early in the program and master these basics early.

Big Wins

Reddit comment on suggested “big wins in Python” - when you see these, know they’re considered pretty useful skills.

https://www.reddit.com/r/learnpython/comments/10ka2dm/comment/j5pciik/

  • extended libraries (e.g. pandas, NumPy)
  • context managers (with open() as file:)
  • lambda functions, zip(), map(), filter(), enumerate()
  • comprehensions (concise and very impressive/useful)
  • regular expressions
  • sorting (faker is pretty useful, too)
  • type-hinting - easy and recommended! No more wondering if x is a string or an int - make it so! This is a fairly new and valuable skill. It looks a lot like Swift.
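
Several of these features can be sketched together in a few lines using only the standard library. The names and scores are made up, and io.StringIO stands in for a real file so the example needs nothing on disk.

```python
import io
import re

names = ["ada", "grace", "linus"]
scores = [95, 88, 72]

# Context manager: the file-like object is closed automatically.
# io.StringIO stands in for open("scores.txt") so no real file is needed.
with io.StringIO("ada,95\ngrace,88\n") as file:
    first_line = file.readline().strip()

# zip() pairs items; enumerate() adds a counter
pairs = list(zip(names, scores))
numbered = [f"{i}: {name}" for i, name in enumerate(names, start=1)]

# Comprehensions build lists and dicts concisely
passing = [name for name, score in pairs if score >= 80]
lookup = {name: score for name, score in pairs}

# lambda functions with map(), filter(), and sorted()
doubled = list(map(lambda s: s * 2, scores))
low = list(filter(lambda s: s < 90, scores))
by_score = sorted(pairs, key=lambda pair: pair[1])

# Regular expressions: find every run of digits
digits = re.findall(r"\d+", first_line)


# Type hints document what a function expects and returns
def initials(full_names: list[str]) -> str:
    return "".join(name[0].upper() for name in full_names)


print(passing, by_score[0], digits, initials(names))
```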

Python: Organization

This page provides an overview of the fundamental building blocks of Python code organization.

Variable

A variable is a named memory location that holds a value.

Expression

An expression is a combination of operators (e.g. +) and operands (e.g. 1 or age) that resolves to a value.

Statement

A statement is the smallest unit of a Python script. All scripts are made up of statements. Some statements are expressions (e.g. print("hello")), while others are not (e.g. the assignment x = 5).

Function

A function is a reusable block of code that performs a specific task. Functions help to break up large programs into smaller, more manageable pieces, and can be reused across different parts of a program or across different programs.

Class

A class is a blueprint for creating objects in Python. Classes define the attributes (data fields) and methods (functions) that all objects of a certain type will have. Using classes, we can create multiple instances (objects) of a certain type, each with their own unique attributes.

In Python, classes are optional, but many modules use an object-oriented approach to organizing code.

Object

An object is a specific instance of a class that holds real data. For example, if we have a Dog class, we can create two objects, sam = Dog("Sam", 3) and fido = Dog("Fido", 4), each with their own name and age attributes.
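
The Dog example might look like the following minimal sketch. The describe method is an assumption added here to show how a method uses an object's own attributes.

```python
class Dog:
    """A blueprint for dog objects (hypothetical example class)."""

    def __init__(self, name: str, age: int) -> None:
        # Attributes: data stored on each individual object
        self.name = name
        self.age = age

    def describe(self) -> str:
        # Method: a function that works with the object's own data
        return f"{self.name} is {self.age} years old"


sam = Dog("Sam", 3)
fido = Dog("Fido", 4)

print(sam.describe())
print(fido.describe())
```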

File / Module

In Python, a module is a file containing Python definitions and statements. Each .py file is a Python script and, by definition, a Python file is also a module. The name of the module is the same as the name of the file, without the .py extension.

Package

A package is a way of organizing related modules together. Packages allow us to group together related functionality in a way that is easy to import and use. A package is simply a directory that contains one or more Python modules.
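
The difference between a module and a package can be seen with two standard-library examples: textwrap is a single module, while email is a package containing several modules. This is a small sketch; the choice of these two is arbitrary.

```python
import email          # a package in the standard library (a directory of modules)
import email.message  # one module inside that package
import textwrap       # a single module (textwrap.py)

# A module's name matches its file name without the .py extension
print(textwrap.__name__)
print(email.message.__name__)

# Packages have a __path__ attribute pointing at their directory; plain modules do not
print(hasattr(email, "__path__"))
print(hasattr(textwrap, "__path__"))
```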

Library

A library is a collection of packages and modules that provides a set of pre-written code for specific tasks. For example, the Python Standard Library is a large collection of modules and packages that are included with Python and provide a wide range of functionality, from file input/output to regular expressions to networking.

Python Distribution Methods

Python also has some special entities related to distributing Python code to users.

Python Distributions

A distribution is a bundle of Python software, which typically includes the Python interpreter, the Python standard library, and various additional packages and tools.

There are several popular Python distributions available, such as Anaconda, which includes many data science packages and tools, and Python(x,y), which is geared towards scientific computing.

Python distributions can make it easier to set up and manage a Python environment, especially for beginners, and come with many pre-installed packages and tools.

Python Wheels

A wheel is a pre-built package format that can be used to easily distribute and install Python packages across different systems.

Wheels are different from source distributions or packages, which are typically distributed as source code and must be compiled or built before they can be installed.

A wheel is essentially a ZIP archive that contains the files needed for a Python package to be installed on a system, along with metadata that lists its dependencies. It makes installation faster and easier, as the package does not need to be built from source code each time it is installed on a new system.

Wheels that contain compiled code are platform-specific, meaning that a wheel built on one operating system or architecture may not work on another system with a different operating system or architecture. Pure-Python wheels can work on any platform. To address this, Python has a system of tags to identify which platforms a wheel is compatible with, so the correct version of the wheel can be downloaded and installed on each system.
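
The tags appear in the wheel's file name, which ends with a Python tag, an ABI tag, and a platform tag. The sketch below splits a file name into those parts. The function name parse_wheel_filename and the example_pkg file names are hypothetical, and the sketch assumes there is no optional build tag after the version.

```python
def parse_wheel_filename(filename: str) -> dict[str, str]:
    """Split a wheel file name into its name, version, and compatibility tags."""
    stem = filename.removesuffix(".whl")
    # The last three dash-separated fields are the compatibility tags
    rest, python_tag, abi_tag, platform_tag = stem.rsplit("-", 3)
    # Assumes no optional build tag follows the version
    name, version = rest.split("-", 1)
    return {
        "name": name,
        "version": version,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


# A pure-Python wheel that works on any platform
pure = parse_wheel_filename("example_pkg-1.0.0-py3-none-any.whl")

# A wheel with compiled code for one Python version and platform
compiled = parse_wheel_filename(
    "example_pkg-1.0.0-cp311-cp311-manylinux_2_17_x86_64.whl"
)

print(pure["platform"], compiled["platform"])
```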

Understanding Organization

Understanding the fundamental building blocks of Python organization and distribution is essential for employing available Python tools and writing clean, well-structured code that is easy to read, maintain, and reuse.

Python: pandas

pandas is a popular open-source library for data analytics in Python. It provides powerful tools for working with tabular data, such as data frames and series. With pandas, you can easily read, manipulate, and analyze data in a variety of formats, including CSV, Excel, SQL databases, and more.

One of the key features of pandas is its ability to handle missing data. pandas provides a number of methods for filling in missing data, interpolating values, and dropping missing data altogether. This is a critical feature for data analytics, as real-world data is often incomplete or inconsistent.

pandas also supports complex data transformations and aggregations. With pandas, you can group data by one or more columns, apply functions to subsets of data, and pivot data to reshape it in different ways.

pandas provides tools for merging and joining data from multiple sources, making it easy to combine data from different sources into a single data set.
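
The three ideas above (missing data, grouping, and merging) can be sketched on a tiny table. This assumes pandas is installed; the column names, regions, and manager names are made up for illustration.

```python
import pandas as pd

# Hypothetical sales data with one missing value
sales = pd.DataFrame(
    {
        "region": ["East", "East", "West", "West"],
        "amount": [100.0, None, 250.0, 150.0],
    }
)

# Handle missing data: fill the gap with the column mean
sales["amount"] = sales["amount"].fillna(sales["amount"].mean())

# Group and aggregate: total amount per region
totals = sales.groupby("region")["amount"].sum().reset_index()

# Merge: attach a manager name from a second (hypothetical) table
managers = pd.DataFrame({"region": ["East", "West"], "manager": ["Kim", "Lee"]})
report = totals.merge(managers, on="region", how="left")

print(report)
```

Filling with the mean is only one option; dropping the row with dropna() or interpolating may suit other data better.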

Being good with pandas is a valuable skill.

Faster Options

pandas can be a bit slow. Options include:

  • Moving to the faster pandas 2.0
  • Trying Polars

New! Read More about this important 2.0 update

pandas 2.0

pandas 2.0 is a significant update to the beloved pandas.

Learn more at:

Polars

Polars is a data manipulation library written in Rust that aims to provide a fast, memory-efficient alternative to pandas for large-scale data processing. It’s still a relatively new library, having been first released in 2019, and its user base and ecosystem are still growing.

Polars has a lot of potential as a fast and memory-efficient data manipulation library for large datasets, but it’s still a relatively new library and may not have the same level of maturity and ecosystem as pandas.

Python: Project Management

There are several ways to manage dependencies and project metadata in Python. While they differ in their syntax and capabilities, they can all be used to specify the dependencies required for a Python project.

pyproject.toml

pyproject.toml is a configuration file used in modern Python projects to specify various aspects of the project, including its dependencies, build settings, and package metadata. It is part of the Python Packaging ecosystem, which provides a standardized way to manage Python packages and their distribution.

pyproject.toml is similar to Project.toml, used in Julia projects.

pyproject.toml is used by the Poetry package manager, a popular tool for managing dependencies and building Python projects. Poetry relies on the pyproject.toml file to define the project’s dependencies, and uses this information to create a virtual environment for the project and install the necessary dependencies.

It provides a simple, declarative way to manage project dependencies, without the need for separate requirements.txt or setup.py files. It also allows developers to specify other project metadata, such as its version number, author, and license.

With PEP 621, pyproject.toml is the standard place to declare project metadata, and it is increasingly used by many popular Python packages and tools.

By adopting pyproject.toml and the Python Packaging ecosystem, developers ensure that their projects are well-organized, maintainable, and easily sharable with others in the Python community.

Poetry

Poetry is a modern Python packaging and dependency management tool that helps simplify the process of managing dependencies and building projects. It allows developers to define their project dependencies in a declarative way using a simple pyproject.toml file, rather than relying on separate requirements.txt or setup.py files.

One of the key advantages of using Poetry is that it provides a streamlined workflow for managing dependencies and virtual environments. It can automatically create and manage virtual environments for each project, isolating project dependencies and avoiding conflicts with system-level packages. Poetry also provides powerful tools for managing dependencies, including automatic dependency resolution, dependency locking, and the ability to publish and install packages from both PyPI and private repositories.

Another advantage of using Poetry is that it provides a simple, consistent interface for managing all aspects of a Python project, from dependency management to building and publishing packages. This makes it easier for developers to focus on writing code and building their projects, without getting bogged down in the details of project management.

Legacy Project Management

Although pyproject.toml is the new standard for managing dependencies and metadata in modern Python projects, you may still encounter older projects that use requirements.txt and setup.py.

requirements.txt is a file used to specify a project’s dependencies in a simple, text-based format. Each line typically lists a package name, optionally followed by a version specifier such as ==2.1.0 (a double equals sign pins an exact version). This format is easy to read and edit, and is supported by many Python tools and frameworks. However, it lacks some of the features provided by pyproject.toml, such as the ability to specify package metadata and build settings.
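
To make the format concrete, here is a minimal sketch that reads a hypothetical requirements.txt (embedded as a string) and collects the packages pinned to an exact version with ==. The file contents and the helper name pinned_versions are assumptions made for illustration; real tools such as pip handle many more cases than this simplified parser does.

```python
# Hypothetical contents of a requirements.txt file.
REQUIREMENTS_TEXT = """\
# data libraries
pandas==2.1.0
numpy>=1.24
requests
"""

def pinned_versions(text):
    """Return {package: version} for lines pinned with '=='."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins

print(pinned_versions(REQUIREMENTS_TEXT))
```

Only pandas is reported, because the other two lines do not pin an exact version.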

setup.py is a script used to build and distribute Python packages. It includes metadata about the package, such as its name, version, and author information, as well as instructions for building and installing the package. Although setup.py is still used in many projects, it has some limitations: because it is executable code, tools cannot read the package’s metadata without running the script, whereas pyproject.toml is purely declarative.

Recommendations

If you’re starting a new Python project from scratch, it’s generally not recommended to use requirements.txt or setup.py as the primary method for managing dependencies and metadata. Instead, you should use pyproject.toml, which is the modern standard for these tasks.

However, if you’re working with an existing project that uses requirements.txt or setup.py, it’s often necessary to keep these files around for compatibility reasons. For example, if you’re working on a project that is already deployed to production and relies on requirements.txt to specify its dependencies, you may not want to switch to pyproject.toml right away, since this could cause compatibility issues or require a significant amount of testing.

Python: Scripts

A Python script is a file containing Python code that can be executed by the Python interpreter.

Scripts can be used to automate tasks, perform calculations, or interact with other software systems.

Run A Script

To run a Python script, you need to have the Python interpreter installed on your system. Once you have installed Python, you can run a script by opening a terminal or command prompt, navigating to the directory containing the script, and typing python myscript.py (replacing myscript with the name of your script).

For example, if you have a script named myscript.py in a directory called myproject, you can run it by opening a terminal or command prompt, navigating to the myproject directory, and typing python myscript.py.

If your script requires any command-line arguments, you can pass them to the script by including them after the script name. For example, if your script requires a filename as an argument, you could run it like this: python myscript.py myfile.txt.

When you run a Python script, the interpreter reads the code in the file and executes it. Any output produced by the script is printed to the console.
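
As a small sketch of how a script can read the command-line arguments described above, the example below uses sys.argv from the standard library. The function name describe_arguments and the messages it returns are made up for this illustration.

```python
import sys

def describe_arguments(argv):
    """Return a message describing the arguments a script received."""
    script_name = argv[0] if argv else "myscript.py"
    if len(argv) < 2:
        # No filename was supplied, so explain how to call the script.
        return f"Usage: python {script_name} <filename>"
    return f"{script_name} will process {argv[1]}"

if __name__ == "__main__":
    # sys.argv[0] is the script name; the remaining items are the arguments.
    print(describe_arguments(sys.argv))
```

If this were saved as myscript.py, running python myscript.py myfile.txt would pass myfile.txt to the script as its first argument.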

Python: Try/Except

Code Might Fail

It’s important to use try/except/finally whenever your application could fail through no fault of your own.

Why Plan for Errors?

People ask:

  • Why plan for errors?
  • Shouldn’t we fix all errors in our code before we release it?
  • Why do we need try/except/finally?

Perfect Code Can Still Have Exceptions

We should always strive to fix all coding and logic errors. However, even when our code is perfect, exceptions can still happen. try/except/finally is a way to gracefully handle unexpected errors and prevent our program from crashing.

Example

Suppose you write a script to read baseball_game_results.csv each night at midnight.

It runs fine until someone changes the filename to rslts.csv.

Now, your code terminates with an ugly error because the necessary file can’t be found.

To code professionally, we can use try/except to handle this error gracefully.

try:
    # Attempt to open the file
    with open('baseball_game_results.csv', 'r') as f:
        # Do something with the file, e.g. read all of its lines
        lines = f.readlines()
        print(f'Read {len(lines)} lines.')
except FileNotFoundError:
    # Handle the case where the file is not found
    print('ERROR: File not found. Please rename the file to baseball_game_results.csv')
finally:
    # Runs whether or not an error occurred; the with block has already closed the file
    print('Nightly read attempt finished.')

Other Programming Languages

Other programming languages use something very similar, but might use the keywords try/catch/finally, as in “try this, and if you catch an exception, do this.”

Throwing Exceptions

Exceptions raised (or “thrown”) inside nested functions travel up the chain of calling functions until some level “catches” the exception and deals with it, or the program terminates with an ugly error.

It’s important to handle exceptions gracefully and prevent our programs from crashing.
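
The following sketch shows an exception travelling up through nested function calls until one level catches it. The function names (parse_score, total_runs, report) and the messages are hypothetical, chosen to echo the baseball example.

```python
def parse_score(text):
    # int() raises ValueError when the text is not a whole number.
    return int(text)

def total_runs(scores):
    # No try/except here, so any ValueError passes straight through.
    return sum(parse_score(score) for score in scores)

def report(scores):
    try:
        return f"Total runs: {total_runs(scores)}"
    except ValueError as error:
        # The exception raised two levels down is caught at this level.
        return f"ERROR: could not read scores ({error})"

print(report(["3", "5"]))
print(report(["3", "five"]))
```

The first call succeeds. In the second, the error starts in parse_score, passes through total_runs, and is handled in report, so the program keeps running.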

Python: Uninstalling

Python can be installed in many different ways, and each installation can leave traces in several places on your system.

At times, removing an old version of Python can be challenging.

Cleaning up unneeded Python installations can help avoid conflicts between different Python versions and packages.

Using package managers and virtual environments can help.

Uninstalling

Installations can leave traces on your system that may no longer be needed. Here are some recommendations for cleaning up old Python installations:

  1. Uninstall Python from the Control Panel: If you have installed Python using the official installer on a Windows machine, you can uninstall it from the Control Panel. Simply search for “Add or Remove Programs” in the Start menu, then find the Python installation you want to remove and click “Uninstall”.

  2. Delete Python folders: Python installations typically create folders on your system that can be deleted to remove the installation. The main folders are typically located in C:\Python or C:\Users\{user}\AppData\Local\Programs\Python. Be careful when deleting folders to ensure you are only deleting the correct installation.

  3. Clean up environment variables: Python installations can add environment variables to your system that may no longer be needed. You can clean these up by going to the System Properties window, selecting “Advanced System Settings”, then clicking the “Environment Variables” button. Here you can remove any Python-related environment variables that are no longer needed.
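
Before removing anything, it can help to check which installation a given python command actually refers to. The sketch below is one way to do that using only the standard library; the function name describe_python is an assumption for this example.

```python
import shutil
import sys

def describe_python():
    """Collect basic facts about the interpreter running this code."""
    return {
        "executable": sys.executable,          # full path of this interpreter
        "version": sys.version.split()[0],     # e.g. a string like "3.x.y"
        "prefix": sys.prefix,                  # installation or environment folder
        "on_path": shutil.which("python"),     # first "python" found on PATH, or None
    }

if __name__ == "__main__":
    for key, value in describe_python().items():
        print(f"{key}: {value}")
```

If the executable and on_path values point to different folders, more than one Python installation is probably present.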

Management Tools

Managing Python well can help avoid issues. The following recommendations can help.

  1. Use a package manager: Using a package manager like conda or pipenv can help keep track of Python installations and dependencies. These package managers allow you to create isolated environments for specific projects, so you can avoid installing unnecessary packages and versions of Python.

  2. Use virtual environments: Another way to manage multiple Python installations is to use virtual environments. Virtual environments allow you to create isolated environments for specific projects, so you can avoid conflicts between different Python versions and packages. You can create virtual environments using the venv module or third-party tools like virtualenv or conda.
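
As a minimal sketch of the venv module mentioned above, the example below creates a virtual environment from Python code. In everyday use, most people run python -m venv .venv in a terminal instead. The helper name create_environment and the use of a temporary folder are assumptions made so the example cleans up after itself.

```python
import tempfile
import venv
from pathlib import Path

def create_environment(env_dir):
    """Create a virtual environment and return the path to its config file."""
    # with_pip=False keeps the example fast; set it to True to install pip too.
    venv.create(env_dir, with_pip=False)
    return Path(env_dir) / "pyvenv.cfg"

if __name__ == "__main__":
    # Build the environment in a temporary folder that is deleted afterwards.
    with tempfile.TemporaryDirectory() as folder:
        config_file = create_environment(Path(folder) / ".venv")
        print("Created:", config_file.exists())
```

Each environment gets its own pyvenv.cfg file and its own folder for installed packages, which is what keeps projects isolated from one another.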

R

Data Analysis and Statistics

R is a programming language designed for data analysis and statistical computing. It is widely used by data scientists, statisticians, and researchers for various purposes.

Why R?

For developers, R offers several advantages over other programming languages:

  • R has a focus on data analysis and statistical computing, with a range of built-in functions and libraries for these tasks.
  • It provides a high-level interface for data manipulation and visualization, making it easy to explore and analyze complex data sets.
  • R has a large and active community of users and contributors, with many resources and tutorials available.

R Syntax

  • R has a simple and intuitive syntax, with a focus on data manipulation and analysis.
  • It supports various data types, including vectors, matrices, data frames, and lists.
  • R has built-in support for statistical functions and libraries.

Free Resources for Learning R

  • R Project: The official R website, with downloads, documentation, and resources.
  • R Tutorial: A comprehensive tutorial for learning R, covering the basics of data analysis and visualization.
  • R for Data Science: A book by Hadley Wickham and Garrett Grolemund, covering the fundamentals of data science with R.
  • Coursera: Various online courses on R programming and data science.

R Frameworks and Libraries

  • R has a large and diverse ecosystem of libraries and packages, catering to various data science use cases such as data manipulation, visualization, machine learning, and more.
  • Popular R libraries include ggplot2, dplyr, and tidyr.

File Extensions

  • .R

Rust

Powerful and Safe Programming Language

Rust is an open-source programming language originally developed at Mozilla and now stewarded by the Rust Foundation. It aims to provide a fast and safe alternative to C and C++, with a focus on memory safety and concurrency.

Why Rust?

For developers, Rust offers several advantages over other programming languages:

  • Rust has a focus on safety, with memory and thread-safety enforced at compile-time.
  • It provides low-level control like C and C++, but with compile-time checks that rule out many memory errors and vulnerabilities.
  • Rust’s ownership rules and borrow checker prevent data races in safe code.
  • It has a growing ecosystem and community, with a range of libraries and frameworks available.

Rust Syntax

  • Rust has a clean and modern syntax, influenced by C and other systems programming languages.
  • It uses static typing and supports various data types, including integers, floats, strings, and arrays.
  • Rust has built-in support for concurrent programming with threads and channels.

Free Resources for Learning Rust

  • The Rust Programming Language: The official Rust website, with documentation, tutorials, and downloads.
  • Rust By Example: An interactive introduction to Rust, with hands-on coding exercises.
  • Rustlings: A collection of small exercises to get started with Rust.
  • Rust Cookbook: A collection of practical examples and snippets for learning Rust.
  • Rust Playground: An online environment for writing and testing Rust code.

Rust Frameworks and Libraries

  • Rust has a growing ecosystem of libraries and frameworks, catering to various use cases such as web development, game development, and systems programming.
  • Popular Rust frameworks and libraries include Actix, Rocket, and Serde.

File Extensions

  • .rs

Typst

Typst is a modern typesetting system designed for creating professional-looking documents, with a focus on simplicity and ease of use.

It offers several advantages over traditional word processors and other typesetting systems, such as:

  • Ease of Use: Typst is designed to be easy to learn and use, even for beginners.

  • Flexibility: Typst provides powerful tools for creating and formatting complex mathematical equations, symbols, tables, and figures.

  • Portability: Typst documents can be exported to PDF and to image formats such as PNG and SVG, with HTML export available as an experimental feature.

Typst for Scientific Writing

Typst is well suited to writing scientific documents such as research papers, technical reports, capstone project reports, and theses. It provides powerful tools for creating and formatting complex equations and symbols, making it a good fit for scientific writing.

Typst Syntax

Typst uses dollar signs ($) for math mode, like LaTeX.

Integration

Typst can read bibliography files in the BibTeX (.bib) format, which lets you manage bibliographic references and citations alongside your document.

Free Resources for Learning Typst

File Extensions

Here are some common file extensions used with Typst.

  • .typ: The main file extension for Typst documents.

  • .bib: The file extension for bibliographic data files in BibTeX format, which Typst can read to manage references and citations in reports and documents.

See Also