Chapter 2

Tools

This chapter introduces some popular tools.

Chocolatey

Chocolatey is a popular package manager for Windows that makes it easy to install, update, and manage software packages. It offers a large selection of packages and advanced features. See also Winget.

Docker

Docker is a platform for building, shipping, and running applications in containers.

Git

Git is a popular version control system that allows developers to track changes to their code and collaborate with others on a project. It provides a way to manage and organize code, and allows for easy branching and merging. Git is widely used in software development, and is an essential tool for any developer’s toolkit.

GitHub

GitHub is a web-based platform that provides a range of features for managing Git repositories. It allows developers to host their code online, collaborate with others on a project, and track issues and bugs. GitHub is widely used in the open-source community and is a popular tool for managing software development projects.

Homebrew

Homebrew is a package manager for macOS that makes it easy to install, update, and manage software packages.

Jupyter

Jupyter is a popular web-based interactive computing environment that allows data analysts to create and share documents containing live code, visualizations, and narrative text.

PowerShell

PowerShell is a command line shell and scripting language developed by Microsoft. It is designed to automate system administration tasks and provide an extensible platform for developers to write their own scripts and tools. PowerShell is widely used on Windows systems, and is becoming increasingly popular as a cross-platform tool for managing and automating IT infrastructure.

VS Code

Visual Studio Code, often referred to as VS Code, is a lightweight but powerful source code editor that is popular among developers. It is highly customizable and supports a wide range of programming languages, making it a versatile tool for developers of all skill levels. VS Code also has a large ecosystem of extensions that can be used to extend its functionality.

Winget

Winget is a newer, lightweight package manager developed by Microsoft for Windows 10 and Windows 11 that makes it easy to install, update, and manage software packages. See also Chocolatey.

Language-Specific Tools

In addition, there are several important language-specific tools.

conda

Conda is a popular package manager for Python often used with the Anaconda or Miniconda distributions. See also pip.

pip

pip is the default and widely-used package manager for Python that makes it easy to install, update, and manage Python packages and dependencies. It is an essential tool for working with Python projects. See also conda.

Subsections of Tools

Chocolatey

Chocolatey is a package manager for Windows, similar to Homebrew for macOS. It simplifies the installation, updating, and management of Windows software, including command-line tools, applications, and libraries. Chocolatey uses NuGet infrastructure and PowerShell to manage packages, making it a powerful tool for Windows users.

Alternatives

Microsoft develops its own package manager, Winget, which is designed to be the native package manager for Windows. It is gaining new features and improvements over time.

To choose the best package manager for your needs, consider the following.

Community adoption

  • Both Chocolatey and Winget have growing communities.
  • Chocolatey has been around for longer and has a larger repository of packages.
  • As Winget gains traction, its community and package offerings will likely grow.

Official support

  • As an official Microsoft product, Winget may receive better long-term support and integration with the Windows ecosystem.
  • This could make it a more future-proof choice.

Features and functionality

  • Chocolatey has more mature features and a comprehensive set of tools.
  • However, Winget is expected to gain more features and improvements over time.

Docker

Docker is an open-source platform that automates the deployment, scaling, and management of applications by using containerization technology. It allows developers to package an application and its dependencies (libraries, configuration files, etc.) into a single, lightweight, and portable container. These containers can run consistently across different environments, simplifying application development, testing, and deployment.

Docker provides the following features.

Containerization

Docker uses containerization to isolate applications and their dependencies into separate, self-contained units. This approach ensures that each application runs in a consistent environment, reducing conflicts and improving security.

Image Management

Docker images are templates used to create containers. They are lightweight and can be easily shared, stored, and versioned. Docker Hub, the official public registry, hosts thousands of pre-built images for various programming languages, frameworks, and tools.

Portability

Docker containers can run on any system that supports Docker, regardless of the underlying infrastructure or platform. This makes it easy to deploy and migrate applications across different environments, such as development, testing, and production.

Scalability

Docker enables horizontal scaling of applications by allowing you to deploy multiple instances of the same container. This approach can help distribute the load across multiple resources and improve application performance.

Version Control

Docker images can be versioned and stored in registries, making it easy to rollback, upgrade, or downgrade applications as needed. This also facilitates collaboration among team members, as they can share and use the same image versions.

Ecosystem

Docker has a rich ecosystem of tools and services, and many third-party tools and plugins integrate with it to extend its functionality.

Managing Containers

Docker containers can be managed with Kubernetes, a popular open-source container orchestration platform. Kubernetes is designed to automate the deployment, scaling, and management of containerized applications, including Docker containers.

Kubernetes provides features such as automatic scaling, self-healing, and load balancing. Kubernetes can manage Docker containers running on a single host or across a cluster of hosts, abstracting away the underlying infrastructure and providing a consistent and scalable platform for running containerized workloads.

Technologies such as Docker Swarm, Apache Mesos, Nomad, and OpenShift perform similar functions to Kubernetes.

Installation

The installation process for Docker depends on your operating system. Follow the instructions below based on your platform.

Common Files

When working with Docker, you’ll encounter several common files.

Dockerfile

File used to define the steps required to build a Docker image.

A Dockerfile contains instructions such as:

  • FROM - specifies the base image to use
  • RUN - runs commands to install dependencies and set up the environment
  • COPY - copies files from the host machine into the image
  • CMD - specifies the command to run when the container is started
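The four instructions above can be seen together in a minimal sketch. It assumes a hypothetical one-file Python application (app.py), the python:3.12-slim base image, and an illustrative dependency (requests); none of these come from this guide. WORKDIR is one extra instruction, used here only to set the folder the later instructions work in. The commands write the files into a scratch folder and print the Dockerfile.

```shell
# Create a scratch project folder (the location is arbitrary).
demo="$(mktemp -d)"
cd "$demo"

# A tiny, hypothetical application for the image to run.
cat > app.py <<'EOF'
print("Hello from inside a container")
EOF

# Write the Dockerfile. Instructions run top to bottom when the image is built.
cat > Dockerfile <<'EOF'
# FROM: start from a base image (assumed here: an official Python image).
FROM python:3.12-slim
# WORKDIR: set the folder used by the instructions that follow.
WORKDIR /app
# RUN: execute a command at build time (illustrative dependency).
RUN pip install --no-cache-dir requests
# COPY: copy a file from the host into the image.
COPY app.py .
# CMD: the default command when a container starts.
CMD ["python", "app.py"]
EOF

cat Dockerfile
```

With Docker running, docker build -t hello-app . would build an image from this folder, and docker run --rm hello-app would start a container from it.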

docker-compose.yml

Defines and runs multi-container Docker applications.

docker-compose.yml allows developers to define the services that make up the application, their dependencies, and how they are connected. This file can be used to start, stop, and manage containers in a multi-container application.
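As a rough illustration, the sketch below writes a docker-compose.yml with two services. The service names (web, cache), the port number, and the redis:7 image are assumptions chosen for the example, not part of this guide.

```shell
# Create a scratch project folder (the location is arbitrary).
demo="$(mktemp -d)"
cd "$demo"

# Two hypothetical services: a web app built locally and a Redis cache.
cat > docker-compose.yml <<'EOF'
services:
  web:
    build: .            # build the image from the Dockerfile in this folder
    ports:
      - "8000:8000"     # host port:container port
    depends_on:
      - cache           # start the cache service before web
  cache:
    image: redis:7      # use a pre-built image from Docker Hub
EOF

cat docker-compose.yml
```

With Docker running and a Dockerfile in the same folder, docker compose up -d would start both services in the background, and docker compose down would stop and remove them.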

.dockerignore

Like .gitignore in Git repositories, .dockerignore is used to specify files and directories that should be excluded from the Docker build context.

By excluding unnecessary files and directories, the Docker build process is faster and more efficient.
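A small sketch of what such a file might contain is shown here. The entries assume a typical Python project and are only examples; adjust them to whatever your own project produces.

```shell
# Create a scratch project folder (the location is arbitrary).
demo="$(mktemp -d)"
cd "$demo"

# One pattern per line; lines starting with # are comments.
cat > .dockerignore <<'EOF'
# Version control history is not needed inside the image.
.git
# Python caches and virtual environments (assumed folder names).
**/__pycache__
.venv
# Local logs and secrets should not be sent to the build.
*.log
.env
EOF

cat .dockerignore
```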

Dockerfile.dev

Dockerfile.dev is a Dockerfile variant for development environments.

It contains additional instructions for setting up a development environment, such as installing development tools and enabling debugging.

See Also

Learn more about Docker and the associated tools.

Subsections of Docker

Docker: Installation

Docker is an open-source platform that automates the deployment, scaling, and management of applications by using containerization technology.

Use Docker to create, manage, and deploy containerized applications.

Mac/Linux Users

  • Option 1: Official installation instructions. Follow instructions on the official Docker website. This is the most up-to-date and comprehensive guide to installing Docker on your system.

  • Option 2: Step-by-step installation guide. Check out our installation instructions for a step-by-step guide.

Windows Users

  • Option 1: Official installation instructions. Follow the instructions on the official Docker website. This is the most up-to-date and comprehensive guide to installing Docker Desktop on your Windows system.

  • Option 2: Step-by-step installation guide. Check out our installation instructions for a step-by-step guide.

Subsections of Docker: Installation

Docker: Mac/Linux

The best way to install Docker for Mac and Linux is by using Docker Desktop (for Mac) and Docker Engine (for Linux). Docker provides a complete development environment for containerized applications.

Warning: Docker is a resource-intensive application that may consume a significant amount of disk space, memory, and CPU resources. Installing and running Docker on your system may slow down your machine, especially if it has limited resources. Make sure your system meets the minimum requirements before installing Docker, and consider monitoring resource usage to ensure optimal performance.

Follow these steps to install Docker on Mac and Linux.

For Mac:

  1. Ensure your system meets the requirements:

    • macOS 10.14 (Mojave) or later
  2. Download Docker Desktop for Mac from the official Docker website.

  3. Run the installer:

    • Double-click the downloaded Docker Desktop Installer.dmg file and follow the on-screen instructions.
  4. Start Docker Desktop:

    • After the installation is complete, Docker Desktop should start automatically. If it doesn’t, you can launch it from the Applications folder.
    • You will see the Docker icon in the menu bar, indicating that Docker is running.
  5. Verify the installation:

    • Open a Terminal window.
    • Run the following command to check the Docker version.

    docker --version

  6. Run a test container to ensure that Docker is working correctly.

    docker run hello-world

For Linux:

  1. Choose the appropriate installation instructions for your Linux distribution from the official Docker Engine documentation.

  2. Follow the provided instructions to install Docker Engine on your system.

  3. Verify the installation.

    • Open a Terminal window.
    • Run the following command to check the Docker version.

    docker --version

  4. Run a test container to ensure that Docker is working correctly.

    docker run hello-world

Save Resources

Docker takes a lot of resources. You may want to stop Docker when you are not using it.

For Mac

  1. Locate the Docker icon in the menu bar, which is typically located in the upper-right corner of the screen.
  2. Click on the Docker icon to open the dropdown menu.
  3. Click on “Quit Docker Desktop” or “Exit” to stop Docker Desktop.

For Linux

  1. Open a Terminal window.
  2. Run the following command to stop the Docker daemon.

sudo systemctl stop docker

To start Docker again, simply launch the application from the Applications folder (Mac) or run the following command in a Terminal window (Linux):

sudo systemctl start docker

Docker: Windows

The best way to install Docker for Windows is by using Docker Desktop. Docker Desktop is an easy-to-use application that allows you to run containers on your Windows machine. It includes both Docker Engine and Docker Compose, providing a complete development environment for containerized applications.

Warning: Docker is a resource-intensive application that may consume a significant amount of disk space, memory, and CPU resources. Installing and running Docker on your system may slow down your machine, especially if it has limited resources. Make sure your system meets the minimum requirements before installing Docker, and consider monitoring resource usage to ensure optimal performance.

Follow these steps to install Docker Desktop for Windows.

  1. Ensure your system meets the requirements:

    • Windows 10 64-bit: Pro, Enterprise, or Education (Build 16299 or later) or Windows 11.
    • Virtualization must be enabled in the BIOS. You can usually find this setting under “CPU Configuration,” “Virtualization,” or “VT-x” settings.
  2. Download Docker Desktop for Windows from the official Docker website (600+ MB).

  3. Run the installer:

    • Double-click on the downloaded Docker Desktop Installer.exe file to start the installation process.
    • Follow the on-screen instructions, accepting the default settings or customizing them as needed.
  4. Start Docker Desktop:

    • After the installation is complete, Docker Desktop should start automatically.
    • If it doesn’t, you can launch it from the Start menu.
    • You will see the Docker icon in the system tray, indicating that Docker is running.
    • Right-click on the icon and select “Dashboard” to open the Docker Desktop dashboard.
  5. Verify the installation:

    • Open a command prompt or PowerShell window.
    • Run the following command to check the Docker version.

    docker --version

  6. Run a test container to ensure that Docker is working correctly.

    docker run hello-world

Save Resources

To stop Docker Desktop when you are not using it:

  1. Locate the Docker icon in the system tray, which is typically located in the lower-right corner of the screen.

  2. Right-click on the Docker icon to open the context menu.

  3. Click on “Quit Docker Desktop” or “Exit” to stop Docker Desktop.

Git

Git is a popular tool used to help collaborate with others and keep track of code changes over time.

At a high level, Git is a version control system for tracking changes in evolving code projects. Using Git allows you to easily revert to an earlier version of code if you make a mistake or if a change causes unexpected problems.

Git makes it easy to collaborate with others on code. You can use Git to share your code with others, track changes that they make, and merge their changes back into your codebase. This makes it a great tool for open source development, where many people may be working on the same codebase at the same time.

In this Git introduction, we’ll start with the basics of using Git, including setting up your Git environment, creating a repository, and making commits. We’ll also cover more advanced topics like branching, merging, and collaborating with others.

Installation

The installation process for Git depends on your operating system. Follow the instructions below based on your platform:

Configuration

After installing, configure Git with your name and email address.

Using Git

When it comes to using Git, you have a few options for how to interact with it. One option is to use Git in the terminal, which involves typing out commands and working with the Git command line interface. Another option is to use a Git integration in your Integrated Development Environment (IDE), such as Visual Studio Code (VS Code).

Using Git in the terminal can be a bit intimidating, as it requires memorizing and typing out specific commands. However, it can be a useful skill to have, especially if you work on projects that require using Git outside of an IDE.

On the other hand, using a Git integration in your IDE can make the process of working with Git more user-friendly and intuitive, as you can often perform Git actions with a few clicks or keystrokes. For example, VS Code has built-in Git support and provides a visual interface for common Git actions such as committing changes, creating branches, and merging changes.

Git Crash Course (Video)

Check out the recommended Git Crash Course (Video).

Free ProGit (Book)

Check out the free ProGit book for a comprehensive guide to using Git.

See Also

Subsections of Git

Git: Installation

Git is a widely-used version control system that helps data analysts and developers track changes to their code and collaborate with others.

Mac/Linux Users

  • Option 1: Official installation instructions. Follow instructions on the official Git website. This is the most up-to-date and comprehensive guide to installing Git on your system.

  • Option 2: Step-by-step installation guide. Check out our installation instructions for a step-by-step guide.

Windows Users

  • Option 1: Official installation instructions. Follow instructions on the official Git website. This is the most up-to-date and comprehensive guide to installing Git on your system.

  • Option 2: Step-by-step installation guide. Check out our detailed installation instructions for a step-by-step guide.

Use Git to manage your code and collaborate with others.

Subsections of Git: Installation

Git: Mac/Linux

Task 1 - Download and install Git

  1. Open a terminal window
  2. Run the command for your platform to install Git:
    • Debian/Ubuntu-based systems: sudo apt-get install git
    • macOS (with Homebrew): brew install git

Task 2 - Configure Git

  1. Open a terminal window
  2. Run the following commands to configure Git with your name (your real name, e.g. “Denise Case”) and the email address you used for GitHub.

    git config --global user.name "Your Name"
    git config --global user.email "your.email@example.com"

  3. Important: Replace “Your Name” with your name and “your.email@example.com” with the email address associated with your GitHub account
  4. This configuration will be used for all of your Git repositories

Task 3 - Verify

  1. Run the following command to verify your Git configuration:

    git config --list

  2. You should see your name and email address listed as user.name and user.email
  3. If the information is not correct, you can run the git config command again to update it

Git: Windows

Task 1 - Download and install Git

  1. Go to the Git download page at https://git-scm.com/download/win
  2. Click the “Download” button to download the Git installer
  3. Run the installer file that you downloaded
  4. Step through the installer, accepting the default options unless you have a reason to change them (for example, line ending conversion or terminal emulator), then click “Install”

Task 2 - Configure Git

  1. Open a command prompt or PowerShell window
  2. Run the following commands to configure Git with your name (your real name, e.g. “Denise Case”) and the email address you used for GitHub:

    git config --global user.name "Your Name"
    git config --global user.email "your.email@example.com"

  3. Important: Replace “Your Name” with your name and “your.email@example.com” with the email address associated with your GitHub account
  4. This configuration will be used for all of your Git repositories

Task 3 - Verify

  1. Run the following command to verify your Git configuration:

    git config --list

  2. You should see your name and email address listed as user.name and user.email
  3. If the information is not correct, run the git config command again to update it

Git: Basics

Git is a widely-used version control system that helps you track changes to your code and collaborate with others. With Git, you can create a complete history of your work, from the initial commit to the latest changes. This makes it easy to work on a project with others, keep track of your progress, and recover from mistakes.

Creating a Repository

To get started with Git, you need to create a repository. This is where you’ll store your code and track changes to it. There are several ways to create a repository:

  • Clone an existing repository: If you want to work on code that’s already been created and shared by someone else, you can clone their repository to your local machine. To do this, you’ll need the repository’s URL and you can use the git clone command to create a local copy of the code.

  • Fork an existing repository: If you want to make changes to someone else’s code and contribute those changes back to their repository, you can fork their repository. This creates a copy of their repository in your GitHub account, which you can then clone to your local machine and work on.

  • Create a new repository: If you want to start a new project from scratch, you can create a new repository by clicking the “+” sign in the top right corner of any GitHub page and choosing “New repository”.

Getting Code onto Your Machine

Once you have a repository set up, you’ll want to get the code onto your local machine so you can work on it. To do this, clone the repository using the git clone sourceurl command. Change sourceurl to the address shown in the browser when viewing the root folder of the repository. This will create a local copy of the repository on your machine that you can work with.
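To try git clone without touching a real project, the sketch below builds a small local repository to stand in for one hosted on GitHub and then clones it. The folder names (source, my-copy) and the name and email values are placeholders; with a real project, the source address would be the repository URL instead of a local path.

```shell
# Work in a throwaway folder so nothing else on your machine is touched.
demo="$(mktemp -d)"

# Stand-in for a hosted repository: a small local repo with one commit.
git init -q "$demo/source"
git -C "$demo/source" config user.name "Your Name"
git -C "$demo/source" config user.email "your.email@example.com"
echo "# Demo project" > "$demo/source/README.md"
git -C "$demo/source" add README.md
git -C "$demo/source" commit -q -m "add readme"

# git clone <sourceurl> <folder> copies the files and full history into a new folder.
git clone -q "$demo/source" "$demo/my-copy"

# The clone remembers where it came from as the remote named "origin".
git -C "$demo/my-copy" remote -v
git -C "$demo/my-copy" log --oneline
```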

Saving Changes with Git

Once you have a copy of the repository on your machine, you can make changes to the code and save those changes to the repository using Git. The basic workflow for this is:

  1. Add changes: Use the git add . command to add the changes you’ve made to the staging area. It’s said “git add dot”. See the dot at the end? It means the current folder, so the command stages every new and modified file in that folder and its subfolders.

  2. Commit changes: Use the git commit -m "add feature n" command to save the changes to the local repository with a descriptive commit message.

  3. Push changes: Use the git push origin main command to push the changes from your local repository back up to the remote repository on GitHub.

This sequence of commands is very common:

git add .
git commit -m "tell us what you did"
git push origin main
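The same sequence can be rehearsed safely in a scratch folder. This is only a sketch: a local bare repository (remote.git) plays the role of GitHub, and the file name, commit message, name, and email are placeholders. The git status --short calls are added to show what each step changes.

```shell
# Work in a throwaway folder.
demo="$(mktemp -d)"

# A bare repository stands in for the remote on GitHub.
git init -q --bare "$demo/remote.git"

# The local project, linked to that remote under the name "origin".
git init -q "$demo/project"
cd "$demo/project"
git config user.name "Your Name"
git config user.email "your.email@example.com"
git remote add origin "$demo/remote.git"

# Make a change: create a new file.
echo "print('hello')" > hello.py
git status --short          # ?? hello.py  (untracked)

# 1. Add: move the change to the staging area.
git add .
git status --short          # A  hello.py  (staged)

# 2. Commit: save the staged change to the local repository.
git commit -q -m "add hello script"
git branch -M main          # make sure the branch is named main

# 3. Push: send the commit to the remote.
git push -q origin main
git ls-remote --heads origin
```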

Editing on your Machine

We typically like to edit files on our machine using editors like VS Code or IDEs like PyCharm and Spyder. These local tools provide advanced features including syntax highlighting, code completion, and debugging, which can make our work more efficient.

Editing in the Cloud

However, the power of our local editors and IDEs is increasingly becoming available in the cloud, and we can make many updates to our repositories right from the GitHub web interface. For example, you can use the github.dev web-based editor to edit files and commit your changes.

It’s important to note that if we edit files both on our machine and in the cloud, we can end up with conflicts when trying to merge our changes. Therefore, it’s important to ensure that we always pull down the latest changes from the cloud before making any local edits, and that we push our changes back up to the cloud as soon as we’re finished with them.

Using the git pull command will bring any changes made directly in your GitHub (or other cloud) repository down to your machine.

git pull

Read more about the github.dev editor in the GitHub documentation.

Remotes

In Git, “origin” is a shorthand name that refers to the remote repository where your code is stored. When you clone a repository, Git automatically creates an “origin” remote that points to the original repository on the server. You can use this remote to pull changes from the server or push your local changes back to it.

You can add more than one remote to a repository.

Branches

Git branches are separate lines of development that allow multiple contributors to work on different features or versions of a project simultaneously.

On GitHub, the default branch of a new repository is called “main”. Older repositories, and Git itself unless configured otherwise, may still use “master”, so you may still see it.

Pull and Push

When we use git pull, Git already knows the source and destination of the changes (i.e., the remote and local repositories) because it’s been configured using the git clone command.

When we use git push, we can name both the remote repository (the destination) and the branch we want to push. In git push origin main, origin is the remote repository that receives the changes, and main is the branch we are pushing, which updates the branch of the same name on the remote.
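A repository you create locally (rather than clone) has no such link until you set one. The sketch below shows one common way to do that, pushing once with -u so that a plain git pull and git push work afterwards. The bare repository, file name, and identity values are placeholders for illustration.

```shell
# Work in a throwaway folder; remote.git stands in for GitHub.
demo="$(mktemp -d)"
git init -q --bare "$demo/remote.git"
git init -q "$demo/project"
cd "$demo/project"
git config user.name "Your Name"
git config user.email "your.email@example.com"
git remote add origin "$demo/remote.git"

echo "first draft" > notes.txt
git add .
git commit -q -m "add notes"
git branch -M main

# -u (--set-upstream) records origin/main as the upstream of the local main branch.
git push -q -u origin main

# Show which remote branch the current branch is linked to.
git rev-parse --abbrev-ref '@{upstream}'

# From now on, the short forms know the remote and branch to use.
git pull -q
echo "second draft" >> notes.txt
git commit -q -am "revise notes"
git push -q
```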

Git: Branches

In Git, we are always working on a branch of code, which is like a separate “timeline” for the code.

Default Branch

The default branch is employed automatically when we first create a repository, and it is typically and by default named main. On older repos, you may see a master branch instead, but the old terminology is discouraged and easy to update.

Working Alone

For independent projects, we may work directly on the main default branch.

Individual developers may choose to use branches to work on new features or fixes without affecting their main codebase.

Working Together

In a professional environment, it’s generally recommended to create new branches for new features or changes to avoid conflicts with other developers and to make it easier to manage and review changes.

Individual developers can also use branches to experiment with new features or make changes without affecting the main codebase.

We can make changes, commit them to our branch, and then merge our branch back into the default branch when appropriate. Multiple branches allow a team to work on different features or changes at once without worrying about conflicts or breaking the main codebase.

Once we’re satisfied with our changes on a branch, we can create a pull request to request that the changes be reviewed and merged into the default branch. Team leads can then review and merge the changes as needed. The default branch is typically set to “main” and is the primary branch for the project.

You can create a new branch with the git branch command, and switch to that branch with the git checkout command. Once you’re on the new branch, any changes you make and commit will only affect that branch.

To merge a branch back into the main codebase, you can use the git merge command. This will bring any changes from the branch into the main codebase, and you can resolve any conflicts that arise during the merge.
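The branch, checkout, and merge commands described above can be tried end to end in a scratch repository. This is a sketch with made-up names: the branch add-summary, the two text files, and the identity values are placeholders.

```shell
# Work in a throwaway repository.
demo="$(mktemp -d)"
git init -q "$demo/project"
cd "$demo/project"
git config user.name "Your Name"
git config user.email "your.email@example.com"

# First commit on the default branch.
echo "line one" > notes.txt
git add .
git commit -q -m "start notes"
git branch -M main

# Create a branch and switch to it.
# (Newer versions of Git also offer: git switch add-summary)
git branch add-summary
git checkout -q add-summary

# Commits made here affect only the add-summary branch.
echo "summary" > summary.txt
git add .
git commit -q -m "add summary"

# Back on main, summary.txt is not present yet.
git checkout -q main
ls

# Merge the branch into main, then delete the finished branch.
git merge -q add-summary
git branch -d add-summary
git log --oneline
```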

Git branches are an important tool for managing complex projects with multiple contributors, and they allow for efficient collaboration and code review.

Git: Configuration

After installing, configure Git with your name and email.

Use your GitHub email for best results.

Open Git Bash on Windows

To open Git Bash on Windows:

  1. Press the Windows key on your keyboard to open the Start menu.
  2. Type “Git Bash” into the search bar and select it from the list of results.
  3. Git Bash should now open in a new window.

Open Terminal on Mac or Linux

On Mac or Linux, open the Terminal app.

Check Git Configuration

Type the following command to display your Git configuration:

git config --list

Look for the following lines in the output:

user.name=Your Name
user.email=your.email@example.com

If you see your name and email listed, then they are set in Git.

Set Git Configuration

If you don’t see your name and email listed, set them using the following commands:

git config --global user.name "Your Name"
git config --global user.email "your.email@example.com"

Replace “Your Name” and “your.email@example.com” with your actual name and email address.

The --global flag ensures the settings are applied globally across all your Git repositories.
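For contrast, leaving out --global stores a setting in a single repository only, which can be handy if one project should use a different email. The sketch below assumes a scratch repository and a placeholder address; it does not change your global settings.

```shell
# Work in a throwaway repository.
demo="$(mktemp -d)"
git init -q "$demo/project"
cd "$demo/project"

# Without --global, the value is saved in this repository's .git/config only.
git config user.email "work.email@example.com"

# Read the repository-level value back.
git config --local --get user.email
```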

Git: Conflicts

We can edit project files in at least two places:

  • locally, on our machine
  • in the cloud, e.g., by using the editing features in GitHub

Bad Practices

We want to keep our local version and cloud version in sync at all times.

Some of the worst things we can do are:

  1. Forget to pull before we start our work.
  2. Pull code and leave it for a long time, then start working on old, stale code.
  3. Make huge, expansive contributions that take a long time (unless we know how to branch - an intermediate Git skill.)
  4. Wait to push our completed changes to the cloud.

Good Practices

To minimize the chance of conflicts:

  1. Always pull code before you start working locally. Never work on stale code!
  2. Make small, incremental changes.
  3. As soon as you finish a useful contribution, git add, commit, and push up to the cloud.

Keep your local and cloud repositories synchronized. Use these for each session.

Before you start:

git pull

After you finish a set of edits:

git add .
git commit -m "add title"
git push

When working collaboratively, communicate with team members and establish a clear workflow. Ensure the team knows who is working on which files and when changes are being made. You might create different small, focused branches that don’t overlap much in terms of the files they modify.

Merge Conflicts

Merge conflicts can occur when:

  • two people edit the same lines of a file at the same time
  • changes are made to the same file both locally and in the cloud before the two copies are synchronized
  • two branches that changed the same part of a file are merged

For example, we might use the GitHub cloud editor to make a quick fix to our README.md - forgetting that we’re also in the process of updating installation instructions on the local README.md.

Merge conflicts can be frustrating, but they are an inevitable part of collaborative work.

If you do run into a merge conflict, don’t worry - it’s not the end of the world. Git provides tools to help you resolve conflicts and merge changes together. The first step is to understand which files have conflicts by running git status. The files with conflicts will be marked as “unmerged”.

To resolve the conflict, open the conflicted file and look for the conflicting sections marked with <<<<<<< HEAD, =======, and >>>>>>>. Manually edit the file to keep the changes you want and remove the marker lines. Once you’ve resolved the conflict, stage the changes with git add and commit them with git commit.
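One low-stakes way to practice is to create a conflict on purpose in a scratch repository. The sketch below does that with two branches that change the same line of a README.md, then resolves it. The branch name cloud-fix, the file contents, and the identity values are invented for the example.

```shell
# Work in a throwaway repository.
demo="$(mktemp -d)"
git init -q "$demo/project"
cd "$demo/project"
git config user.name "Your Name"
git config user.email "your.email@example.com"

# Starting point shared by both branches.
echo "Install with pip." > README.md
git add .
git commit -q -m "add readme"
git branch -M main

# One edit on a branch (standing in for a quick fix made in the cloud).
git checkout -q -b cloud-fix
echo "Install with conda." > README.md
git commit -q -am "cloud edit"

# A different edit to the same line on main (the local work).
git checkout -q main
echo "Install with pip or conda." > README.md
git commit -q -am "local edit"

# The merge stops with a conflict; "|| true" lets this script continue.
git merge cloud-fix || true

git status --short      # UU README.md  (unmerged)
cat README.md           # shows both versions between the conflict markers

# Resolve: write the text you want to keep, without any marker lines.
echo "Install with pip or conda." > README.md
git add README.md
git commit -q -m "merge cloud-fix and keep both installers"
```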

If you’re still unsure how to resolve the conflict, ask for help from your team members or consult Git documentation. Stay calm and take your time to carefully resolve the conflict.

Experience managing merge conflicts can be very valuable.

Git: Crash Course

Student-recommended video on Git - definitely worth sharing! It covers things in a similar way and you can jump right to the parts you need.

Note: Watch when you have time or use it when you’re ready to learn more about Git. Many students find it very helpful. I don’t know how anyone could provide more information, more efficiently than this.

https://www.youtube.com/watch?v=RGOj5yH7evk

Git and GitHub for Beginners - Crash Course

Over 2 million views.

From the video description:

Learn about Git and GitHub in this tutorial. These are important tools for all developers to understand. Git and GitHub make it easier to manage different software versions and make it easier for multiple people to work on the same software project. This course was developed by Gwen Faraday.

Git: Remotes

In Git, the term “origin” refers to the default remote repository that a local repository is linked to. When you clone a repository from a remote server to your local machine, Git automatically sets up the “origin” remote for you. This allows you to push changes from your local repository to the remote repository, and pull changes from the remote repository to your local repository.

Because the origin remote points to the repository you cloned from, pushing to origin updates the matching branch in that repository.

Using the “origin” remote allows you to collaborate with others by sharing changes to the same repository. When someone else pushes changes to the remote repository, you can pull those changes down to your local repository and merge them with your own changes.

However, if you edit the same file in both your local repository and the remote repository, conflicts can arise. To avoid conflicts, it’s important to always pull down changes from the remote repository before making your own changes, and to carefully review any merge conflicts that arise.

Working with Remote Repositories

Git provides a set of commands that allow you to work with remote repositories. Here are some commonly used commands:

  • git remote - List the remote repositories that are connected to your local repository.

  • git remote -v - List the remote repositories along with their URLs.

  • git remote add <name> <url> - Add a new remote repository to your local repository. The name parameter is the name you want to give the remote, and url is the URL of the remote repository.

  • git remote rm <name> - Remove a remote repository from your local repository.

  • git push <remote> <branch> - Push your local changes to a remote repository. The remote parameter is the name of the remote repository, and branch is the branch you want to push to.

  • git pull <remote> <branch> - Pull changes from a remote repository into your local repository. The remote parameter is the name of the remote repository, and branch is the branch you want to pull from.

  • git fetch <remote> - Download new commits from a remote repository without merging them into your current branch.

  • git clone <url> - Clone a remote repository to your local machine.
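
The commands above can be tried safely without a GitHub account. The following is a minimal sketch, assuming Git is installed: a local bare repository stands in for a hosted remote, and every directory name, file name, and user detail is hypothetical.

```shell
#!/bin/sh
# Sketch: practice remote commands against a local bare repository.
# All names below (remote.git, local, README.md, Example User) are made up.
set -e

workdir="$(mktemp -d)"
cd "$workdir"

# A bare repository plays the role of the remote server.
git init --quiet --bare remote.git

# Create a local repository and make one commit.
git init --quiet local
cd local
git config user.name "Example User"
git config user.email "user@example.com"
echo "hello" > README.md
git add README.md
git commit --quiet -m "Add README"

# Link the local repository to the stand-in remote, then list remotes with URLs.
git remote add origin "$workdir/remote.git"
git remote -v

# Push the current branch; HEAD resolves to whatever the default branch is named.
git push --quiet origin HEAD

# Fetch downloads remote commits without merging them into the current branch.
git fetch --quiet origin

# Record a few values to confirm that local and remote are in sync.
branch="$(git rev-parse --abbrev-ref HEAD)"
remote_names="$(git remote)"
local_head="$(git rev-parse HEAD)"
remote_head="$(git rev-parse "origin/$branch")"
echo "remote: $remote_names, branch: $branch"
```

With a real project, the URL passed to git remote add would be the address of the hosted repository instead of a local path.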

Git Learning: Concepts Over Memorization

Learning every Git command by heart is neither necessary nor efficient. Instead, focus on understanding the concepts and workflows of Git, and how the commands fit into those workflows. The many online resources available will serve as reliable references when you need them.

As you work with Git more frequently, the most common commands will become second nature. However, for the rest, don’t hesitate to look them up. Remember that the value of Git lies not in memorizing commands but in leveraging its powerful version control capabilities to manage your projects effectively.

GitHub

GitHub is a popular, web-based platform that allows data analysts and developers to store and manage their code and collaborate with others.

GitHub is built on Git, which is a distributed version control system that allows developers to track changes to their code over time and collaborate with others on the same codebase.

With GitHub, developers can create their own repositories, which are essentially folders that contain their code, documentation, and other files related to a specific project. They can also fork other people’s repositories to create their own copies, which they can then modify and contribute back to the original repository. This allows for easy collaboration and code sharing among developers.

GitHub provides tools for developers to manage their code, such as the ability to track and resolve issues, review and merge pull requests, and create and manage branches. It also provides a web-based interface for viewing and editing code. Additionally, it has a wide range of integrations and APIs that allow developers to automate various development tasks and integrate with other tools and services.

Sign Up For A Free Account

Sign up for a free account with GitHub.com, a code hosting platform that manages a vast number of programming projects. Follow their website instructions to get started. See the recommendations on GitHub email and username below.

GitHub Email

You’ll need an email. I use a permanent personal email for most GitHub work, rather than a work or school account (which may be temporary). GitHub provides a setting to keep your email address private.

GitHub Username

You’ll create a GitHub username. Your username will be public. Your username can be anonymous (e.g., ‘analystextraordinaire’) or publicly associated with you. For example, I use ‘denisecase’. Your username will be a part of the URL to all of your projects.
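
To see how the username fits into a project URL, here is a small sketch. The username is the anonymous example from above and the repository name my-first-project is hypothetical. It assumes the common https://github.com/<username>/<repo> pattern.

```shell
# Hypothetical username and repository name, used only for illustration.
username="analystextraordinaire"
repo="my-first-project"

# Build the project URL from the two parts.
url="https://github.com/$username/$repo"
echo "$url"

# Going the other way: recover both parts with POSIX parameter expansion.
path="${url#https://github.com/}"   # strip the site prefix
parsed_user="${path%%/*}"           # text before the first slash
parsed_repo="${path#*/}"            # text after the first slash
echo "user: $parsed_user, repo: $parsed_repo"
```

Because the username appears in every project URL, it is worth choosing one you are comfortable sharing.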

Students New to GitHub

  • Recruiters may look at GitHub and LinkedIn profiles - it can be helpful to show your skills using modern tools. 
  • Be courageous. The best way to learn is by doing, and don’t be too concerned about making mistakes.
  • Git mistakes and do-overs are common when getting started.
  • Learning to fix issues is a key skill in data analytics.
  • Keep and share your latest, most useful, and best work in GitHub. 

GitHub Repositories

Each coding project lives in a GitHub repository (called ‘repo’ for short) in the ‘cloud’ (a distributed group of machines).

Git (the system) keeps track of committed changes to an evolving project.

  • The GitHub repo can be kept in sync with a Git repo on your local machine.
  • For example, if a GitHub repo is named datafun-01-getting-started, on my machine it’s in my Documents/datafun-01-getting-started directory.

Quick Quiz

Go to: https://github.com/denisecase/datafun-01-getting-started

Q: What is the username? 

Q: What is the repo name in the URL? 

Get Started 

After you have an account, you can use the Get Started Guide that the GitHub team has created to help you understand the platform.

For more information on getting started on GitHub, view the “Getting Started with GitHub” video below from the GitHub Training & Guides YouTube channel.

GitHub Training & Guides

More About GitHub

The following definition of GitHub comes from Kinsta.com

At a high level, GitHub is a website and cloud-based service that helps developers store and manage their code, as well as track and control changes to their code. To understand exactly what GitHub is, you need to know two connected principles: Version control, which helps developers track and manage changes to a software project’s code, and Git, which is a specific open-source version control system.

Learn more about GitHub in the following video from the GitHub YouTube channel.

GitHub Video

Free Stuff For Students

For more fun stuff, check these out. 

See Also

There is more information about GitHub in the Hosting Chapter.

Homebrew

Homebrew is a package manager for macOS and Linux that simplifies the installation, updating, and management of software on your system. With a single command in the terminal, Homebrew lets you install command-line tools, applications, and libraries.

Jupyter

Jupyter is an open-source web application that allows users to create and share documents that contain live code, equations, visualizations, and narrative text. It is a popular tool for data analysis, scientific computing, and machine learning, and is widely used in academic research, industry, and data science education.

Jupyter gets its name from Julia, Python, and R, some of the original programming languages supported.

Jupyter provides the following features.

Interactive Computing

Jupyter notebooks allow users to write and execute code interactively, one cell at a time, and see the results immediately. This allows users to explore data, prototype algorithms, and create visualizations in a single, cohesive environment.

Multiple Language Support

Jupyter supports multiple programming languages, including Python, R, and Julia. This makes it easy to integrate different tools and frameworks and collaborate with colleagues who use different programming languages.

Collaboration

Jupyter notebooks can be shared with others, allowing for easy collaboration and reproducibility of analyses. This also facilitates communication and knowledge sharing among team members and stakeholders.

Visualization

Jupyter notebooks support interactive visualization libraries such as Matplotlib, Bokeh, and Plotly, making it easy to create and share data visualizations.

Integration

Jupyter notebooks can be integrated with other tools and frameworks such as Git, GitHub, and Docker. This makes it easy to manage version control, share code and data, and deploy projects.

Ecosystem

Jupyter has a rich ecosystem of tools and services, such as JupyterLab, JupyterHub, and Binder, that can help streamline the development and deployment process. Many third-party tools and plugins also integrate with Jupyter to extend its functionality.

Jupyter Installation

The installation process for Jupyter depends on your operating system and your preferred installation method. Follow the instructions below based on your platform.

Jupyter Ecosystem

Here’s a short guide to clarify some of the terms used with Jupyter.

  • JupyterLab: An interactive development environment (IDE) for working with Jupyter notebooks, code, and data. It provides a flexible and powerful user interface that can be customized to suit the needs of individual users.

  • Jupyter Notebook: A web-based interactive computational environment for creating and sharing notebooks, which are documents that contain live code, equations, visualizations, and narrative text.

  • JupyterHub: A multi-user server that allows multiple users to access Jupyter notebooks and other resources from a shared server. It is commonly used in educational settings or for collaborative research projects.

  • Jupyter Book: A tool for building beautiful, publication-quality books and documents from computational material, such as Jupyter notebooks. It provides a simple way to create interactive documents with executable code and visualizations.

  • nbconvert: A command-line tool that converts Jupyter notebooks to other formats, such as HTML, PDF, or Markdown. This allows you to share your work with others who may not have Jupyter installed.

  • ipywidgets: A library for creating interactive widgets in Jupyter notebooks. Widgets are user interface elements, such as buttons and sliders, that allow you to interact with and visualize data in real time.

  • nbviewer: A web application that allows you to view Jupyter notebooks without having to install Jupyter yourself. You can simply paste the URL of a notebook and view it in your browser.

Get Started with Jupyter Notebooks

There are excellent resources available for getting started with Jupyter Notebooks.

See:

VS Code

Visual Studio Code (VS Code) is a free code editor developed by Microsoft and built on an open-source codebase. It is available on Windows, Linux, and macOS and offers features such as debugging, syntax highlighting, and intelligent code completion.

Some of the key features of VS Code include:

  • Built-in Git integration
  • Support for multiple languages and frameworks
  • Extensions for customizing the editor and adding new functionality
  • Debugging capabilities for Node.js, Python, and other languages
  • Integrated terminal for running commands and scripts

Using a modern editor or IDE can make your coding experience more efficient and productive.

Installation

The installation process depends on your operating system. Follow the instructions below based on your platform:

VS Code Extensions

VS Code extensions are add-ons that allow users to customize and enhance the functionality of VS Code.

For example, IntelliSense provides intelligent code suggestions, auto-completion, and parameter hints while writing code. It is built into VS Code for some languages, and language extensions (such as the Python extension) add IntelliSense for others.

To learn more about extensions, visit the official documentation at https://code.visualstudio.com/docs/introvideos/extend.

Why VS Code

One reason we teach VS Code over other IDEs (e.g., Spyder, PyCharm, IDLE) is that VS Code is a more general-purpose code editor that supports multiple languages and workflows, and works on Windows, macOS, and Linux machines. VS Code is capable of handling a wide range of tasks and can be used for web development, data analysis, scripting, and more.

VS Code has a lot of built-in functionality for working with other languages including Markdown, SQL, PowerShell, Julia, and more. Learning VS Code is a great skill for someone who is getting started with programming, data analysis, and/or automation and wants to learn a versatile environment that will accommodate growing skills.

VS Code is widely used and well-supported, with many resources for learning how to use it effectively. In addition to the comprehensive official documentation, there are articles and videos available for beginners through experts.

Subsections of VS Code

PowerShell: Installation

PowerShell is a powerful command-line shell and scripting language designed for system administration and automation tasks. Here are some options for installing PowerShell on your system:

Windows Users

  • Option 1: Install via Microsoft Store. If you’re running Windows 10 or later, you can install PowerShell via the Microsoft Store. This is the recommended method, as it ensures that you have the latest version of PowerShell and allows for easy updates.

  • Option 2: Download the MSI installer. If you’re not able to install via the Microsoft Store, you can download the MSI installer from the PowerShell GitHub repository. Choose the appropriate version for your system architecture (32-bit or 64-bit) and follow the installation wizard.

macOS Users

  • Option 1: Install via Homebrew. If you’re using Homebrew on your Mac, you can install PowerShell by running the following command in your terminal: brew install --cask powershell.

  • Option 2: Download the PKG installer. You can also download the PKG installer from the PowerShell GitHub repository. Choose the appropriate version for your macOS version and system architecture (Intel or Apple Silicon) and follow the installation wizard.

Linux Users

  • Option 1: Package manager installation. Microsoft publishes PowerShell packages for many Linux distributions through its own package repository. After registering that repository with your package manager, you can install PowerShell from there. For example, on Ubuntu or Debian, you can then run sudo apt-get install powershell.

  • Option 2: Download the package manually. You can also download the package for your distribution directly from the PowerShell GitHub repository and install it manually. Follow the instructions for your specific distribution on the download page.

Once you have PowerShell installed, you can use it to run commands interactively and to automate common system administration tasks. Happy scripting!

Winget

Winget (Windows Package Manager) is an official package manager for Windows systems, developed by Microsoft. It simplifies the process of discovering, installing, upgrading, and removing software on Windows machines. Winget provides command-line access to manage software packages, similar to package managers on Linux and macOS systems.

With Winget, you can search for, install, update, and uninstall software packages without having to manually navigate to a website, download installers, or follow installation wizards. Winget automates these tasks and makes it easy to manage software on your Windows system.

Alternatives

Chocolatey remains a popular alternative. Chocolatey has been around longer, offering a mature set of features and a large repository of packages. The Chocolatey community is well-established, and it has extensive documentation and support. Chocolatey is known for its versatility and integration with various Windows tools, such as PowerShell and NuGet infrastructure. This makes it a common choice for Windows users looking for a reliable and comprehensive package management solution.