A Gentle Introduction to Python

An introduction to setting up a Python development environment, the basic language structures, basic Git, and some popular libraries such as NumPy and Pandas.

Contributed by Ahmad Aghaebrahimian (ZHAW-ICLS)

Colab Notebook


Introduction

Python was created by Guido van Rossum and was first released in 1991. It was named after Monty Python, a popular British comedy group. Python was designed to be an easy-to-read and easy-to-write language that emphasizes readability and code reusability.

In the late 1990s, Python became increasingly popular as a general-purpose programming language, due to its simplicity, versatility, and ability to run on almost any platform. This led to the development of a large number of third-party libraries, making it a popular choice for various applications, including scientific computing, data analysis, artificial intelligence, and web development. For scientific computing and data analysis, for instance, Python has a number of powerful libraries, including NumPy, which provides support for arrays and matrices, and Pandas, which provides fast and efficient data analysis and manipulation. In the field of Artificial Intelligence (AI), Python has a number of libraries too, including TensorFlow and PyTorch, which are used to build and train machine learning models. Additionally, the scikit-learn library provides a simple and efficient way to perform machine learning tasks, including regression, classification, and clustering. Last but not least, Python has a number of popular libraries, including Django, Flask, and FastAPI for web development.

In the early 2000s, Python gained even more popularity when it was included as one of the standard scripting languages for the popular Linux operating system. Today, Python is used by organizations such as NASA, Google, and Facebook, and it has become one of the most widely-used programming languages in the world.

Python has continued to evolve over the years, with the release of several major versions, each of which has brought new features and improvements to the language. The latest major version, Python 3, was released in 2008 and has since become the preferred version of the language, due to its improved performance and enhanced features. This overview uses Python 3 throughout.

Setting up

The first step in setting up a Python development environment is to have a Python interpreter installed on the computer. This process may vary slightly depending on the computer's operating system. A virtual environment and an Integrated Development Environment (IDE) are installed afterward. Please note that if you create a virtual environment as described in Step 2 below, the first step can be safely skipped, since the virtual environment contains its own Python interpreter.

Step 1: Installing Python

You can safely disregard the version of Python that comes pre-installed by default. In the next step, you will set up a virtual environment, allowing you to create multiple Python environments with different versions as needed.


Step 2: Installing a virtual environment

The next step in setting up a Python development environment is to create a virtual environment. Virtual environments are an essential tool for Python development. They help maintain a clean and organized development environment and ensure the stability and reproducibility of your projects. Furthermore, virtual environments isolate the development environment from the global Python interpreter, which is particularly beneficial for beginners. There are several ways to create a virtual environment. Here, we use Anaconda for this purpose.

Linux:

  1. Download the Anaconda script compatible with your Linux distribution from here at the bottom of the page.

  2. Run the script, accept the license, and accept all the default values except the last one, which asks "Do you wish the installer to initialize Anaconda3 by running conda init?" and should be answered with yes.

  3. When installed, you can open a 'new' terminal and run conda create -n my_new_env python==3.10. Accept all the default options.

  4. Activate your new environment with conda activate my_new_env and check the version with python --version.

  5. my_new_env is isolated from the system altogether. You can safely remove it with conda remove -n my_new_env --all and create a brand new one.

  6. Alternatively, you can return to the base environment by deactivating the active environment with conda deactivate.

  7. A complete Anaconda installation comes with Jupyter Notebook, the most popular notebook system for Python. In Python, a notebook is a web-based interactive computational interface that enables you to run and experiment with Python code interactively. To run a notebook locally, open a terminal and run jupyter notebook

Read more about installing Anaconda on Linux here.

For more information about Conda commands check this


Mac:

  1. Download the Anaconda Mac graphical installer from here at the bottom of the page.

  2. Run the installer, accept the license agreement and all default values.

  3. When installed, open a 'new' terminal and run conda create -n my_new_env python==3.10. Accept all the default options. (for these steps you may observe similar images as the ones in the Linux part above)

  4. Activate your new environment with conda activate my_new_env and check the version with python --version.

  5. my_new_env is isolated from the system altogether. You can safely remove it with conda remove -n my_new_env --all and create a brand new one.

  6. Alternatively, you can return to the base environment by deactivating the active environment with conda deactivate.

  7. A complete Anaconda installation comes with Jupyter Notebook, the most popular notebook system for Python. In Python, a notebook is a web-based interactive computational interface that enables you to run and experiment with Python code interactively. To run a notebook locally, open a terminal and run jupyter notebook.

Read more about installing Anaconda on Mac here.

For more information about Conda commands check this


Windows:

  1. Download the Anaconda Windows graphical installer from here at the bottom of the page.

  2. Run the installer, accept the license agreement, and all default values. On the 'Advanced Installation Options' page, check all the fields.

  3. When installed, you can work with environments in the command line. Open a command line prompt (CMD) in the start menu and run conda create -n my_new_env python==3.10. Accept all the default options. (for these steps you may observe similar images as the ones in the Linux part above).

  4. Activate your new environment conda activate my_new_env and check the version, python --version.

  5. my_new_env is isolated from the system altogether. You can safely remove it with conda remove -n my_new_env --all and create a brand new one.

  6. Alternatively, you can return to the base environment by deactivating the active environment conda deactivate.

  7. A complete Anaconda installation comes with Jupyter Notebook, the most popular notebook system for Python. In Python, a notebook is a web-based interactive computational interface that enables you to run and experiment with Python code interactively. To run a notebook locally, run CMD and run jupyter notebook in it. You can also run it in the 'Anaconda Navigator' in the start menu.

Read more about installing Anaconda on Windows here.

For more information about Conda commands check this

When you install an Integrated Development Environment (IDE) in the next step, Anaconda will be integrated into the IDE, where each project can be configured with a new environment.


Step 3: Installing Integrated Development Environment

The last step is to install a code editor or an Integrated Development Environment (IDE) such as Visual Studio Code, PyCharm, or IDLE (the built-in Python editor). In this tutorial, we install and use PyCharm as the default IDE.

Linux:

  1. The easiest way to install PyCharm on Linux (Ubuntu) is to use the Software Center. Other distributions have similar options. Search for PyCharm and install either PyCharm-community or PyCharm-EDU. This tutorial uses PyCharm-community.

  2. Another alternative is to run sudo snap install pycharm-community --classic in the terminal.

  3. When installed, click on New Project in the welcome window or in the File menu on top of the page (if no welcome window has appeared).

  4. PyCharm integrates with the Anaconda installation from Step 2, allowing the creation of a new environment with a specific Python version for each project. However, it's important to keep in mind that some libraries may not be compatible with certain Python versions. In that case, it's recommended to create an environment with a Python version that is compatible with all the required libraries. With Conda installed, removing an environment with an incompatible Python version and creating a new, compatible environment is a straightforward process, as described above.

  5. The project automatically creates main.py. You can execute it by right-clicking on the file and selecting Run 'main.py', or from Run in the top menu.

  6. If required, you can install and import new libraries. Click on Terminal at the bottom of the page to open a terminal window. Note that the prompt includes the active environment created when the project was initialized. Run conda install numpy or pip install numpy to install NumPy, a powerful library for working with vectors and matrices.

  7. Check what libraries are already installed in your environment by running conda list or pip list in the terminal.

  8. If a library's version is incompatible, let's say NumPy, you can safely remove it from your environment with pip uninstall numpy and install the correct version, let's say 1.24.0, with pip install numpy==1.24.0.

  9. When delivering your code to someone else or running it on other machines, you can make sure that the code runs in exactly the same environment, with the same library versions, by generating a requirements file that lists all libraries with their versions: conda list --export > requirements.txt or pip list --format=freeze > requirements.txt. This creates a new file, requirements.txt, which should accompany the rest of the code, letting the recipient replicate the same environment by installing the exact libraries with pip install -r requirements.txt (for a file generated by pip) or conda install --file requirements.txt (for a file generated by conda, whose format pip cannot read).


Mac:

  1. Download the PyCharm installer from here. Select the Community version, which is free and open-source software. Also, select the dmg package compatible with your processor type (Intel or Apple Silicon).

  2. Install PyCharm by running the installer. Accept all the default values.

  3. When finished, you can run PyCharm by clicking on the newly created icon 'PyCharm CE' in the Applications folder.

  4. Click on New Project in the welcome window or in the File menu on top of the page (if no welcome window has appeared). From now on, the procedure is the same as from step 3 onward in the Linux part above. Please follow the instructions from there on.


Windows:

  1. Download the PyCharm installer from here. Select the Community version, which is free and open-source software.

  2. Install PyCharm by running the installer. Accept all the default values.

  3. When finished, you can run PyCharm by clicking on the newly created icon 'PyCharm Community Edition' in the start menu.

  4. Click on New Project in the welcome window or in the File menu on top of the page (if no welcome window has appeared). From now on, the procedure is the same as from step 3 onward in the Linux part above. Please follow the instructions from there on.


Step 4: Git and GitHub

Git is a distributed version control system that allows developers to track code changes, collaborate with others, and revert to previous versions of their work.

GitHub is a web-based platform (similar to Bitbucket or GitLab) that provides hosting for Git repositories, as well as additional collaboration tools like issue tracking, code review, and continuous integration.

Git commands are often run in the command line or terminal. However, Git is also nicely integrated with PyCharm, so you can use it within your IDE. The first step in using Git is to make sure it is already installed on the system. Git is usually available on Linux and macOS by default. You can check this by opening a terminal and running git --version. On Windows, you need to install Git from here. Download the installer for Windows, run it, and accept all the default values except for the terminal emulator, for which it is better to use the Windows default console window (like the image below). On Windows, when the Git installation is done, you may need to restart your PyCharm IDE.

After creating a new project in PyCharm, you can open the VCS menu and select 'Enable Version Control Integration'. In the next window, select 'Git'. After this, VCS in the top menu changes to Git (on Windows, this requires an IDE restart). Git has numerous functionalities; here we describe only the most basic ones: Commit, Push, and Clone.

The history of all changes made to code is stored by Git. After making modifications to the code, you can Commit to save the progress. You may add all or only specific files for committing. When committing, it is necessary to provide a description that offers a clue about the changes you have made. These commit points serve as checkpoints for developers, allowing them to revert to earlier stages in case of errors.

Click on Commit in the Git menu in PyCharm. In the newly opened window, select all 'Unversioned Files'. In the space provided below, write a note or message to remind you what has been changed in this commit (e.g., Initial commit, Deleted my_file.txt), and hit the Commit button.

You can make some changes (adding, deleting, or changing some code) and commit each time. Now, in the terminal at the bottom of the PyCharm window, run git log to see the history of your commits. Each entry in the log has a commit_id (a hash), with which you can restore your code to the state of that commit by running git checkout commit_id.

Up to now, all tracking and progress saving has been done locally. To publish the code to a remote repository, on GitHub for instance, you first need to open a GitHub account. This is a straightforward process: just navigate to https://github.com/ and sign up.

Next, within the PyCharm project, click on the menu Git -> GitHub -> Share Project on GitHub

Provide a repository name and description. Click 'Add account' to connect your PyCharm to your GitHub account. Select 'Log In via GitHub' which automatically navigates you to the GitHub Login page. There you may Authorize JetBrains (PyCharm) to connect to your GitHub Account. Provide your credentials and Authorize JetBrains IDE integration.

After this step, PyCharm connects to your GitHub account and creates a new repository there. Now, you can click on Push in the Git menu, select your commit within the Push window, and hit the Push button. You can check your new repository on GitHub and observe the changes that have been uploaded there.

If you find other people's repositories interesting and decide to work on their code, you can simply Clone their project by clicking on Clone under the Git menu. (It is important to acknowledge the contribution of others by citing their work if you have used all or part of their code.) This will open a window asking for the URL of the original repository and the local directory where you want to copy the repository. PyCharm will open the project after it copies the content. To learn more about Fork, Pull requests, branches, and many other Git functionalities check this document.


Basics

Python is a high-level language known for its readability and ease of use. Python uses indentation to define blocks of code, such as loops or functions. Indentation may be set to any number of spaces (often 4 or 8), which should be consistent throughout the entire code.

Variables in Python are defined using the equal sign, e.g., my_variable = 5. They do not need a type to be specified. Python has several built-in data types, including numbers (integers and floats), strings, lists, dictionaries, and tuples. It supports the usual arithmetic operators (+, -, *, /, %), comparison operators (==, !=, <, >, <=, >=), and logical operators (and, or, not). Conditional statements in Python are defined using the keywords if, elif, and else. Python has a large standard library and also allows users to import external modules using the import statement. The following sections will provide an overview of all these concepts along with some examples.

Variables
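As a minimal sketch of how variables behave (the names and values here are only illustrative), a variable is created by assignment and can later be rebound to a value of a different type:

```python
# A variable is created the first time a value is assigned to it.
my_variable = 5
print(type(my_variable))   # <class 'int'>

# No type declaration is needed; the same name can be rebound to another type.
my_variable = "five"
print(type(my_variable))   # <class 'str'>

# Several variables can be assigned in one statement.
width, height = 3, 4
area = width * height
print(area)                # 12
```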

Data types

Python supports several built-in data types, including:

bool: Boolean values, which can be either True or False.

int: Integer values, such as 1, 2, 3, etc.

float: Floating-point numbers, such as 1.0, 2.5, 3.14, etc. To learn more about int, float, and other built-in numeric types in Python, please check here.

str: String values, such as "hello", "world", etc. Strings can be declared using single or double quotes. Long strings spanning multiple lines can be declared with triple quotes. When working in an IDE, typing a dot (.) after a variable name triggers the IDE's autocomplete, which shows the methods available for that particular object. You can find more about strings here.

list: Lists are ordered collections of values, which can be of any data type. For example: [1, 2, 3, "hello", [4, 5]]. Lists are mutable, which means that you can add items to them or change their items.

dict: Dictionaries are collections of key-value pairs, where each unique key is mapped to a value. For example: {"key1": "value1", "key2": "value2"}. Since Python 3.7, dictionaries preserve the order in which keys were inserted.

tuple: Tuples are similar to lists, but they are immutable, meaning that their values cannot be changed once they are created. For example: (1, 2, 3, "hello", (4, 5)).

set: Sets are unordered collections of unique values.

These data types can be combined to create more complex data structures, such as lists of dictionaries or dictionaries of lists. Additionally, you can also create your own custom data types by defining classes and objects.
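The built-in types listed above can be tried out with a short sketch such as the following; all names and values are hypothetical and chosen only for illustration:

```python
is_ready = True                          # bool
count = 3                                # int
price = 2.5                              # float
greeting = "hello"                       # str
items = [1, 2, 3, "hello", [4, 5]]       # list (mutable)
point = (1, 2, 3)                        # tuple (immutable)
ages = {"alice": 30, "bob": 25}          # dict (key-value pairs)
unique = {1, 2, 2, 3}                    # set (duplicates are dropped)

items.append("new")                      # lists can grow in place
ages["carol"] = 41                       # dicts can be updated in place

print(greeting.upper())                  # HELLO
print(len(items))                        # 6
print(unique == {1, 2, 3})               # True
print(2 in unique)                       # True (membership test)
print(is_ready is True)                  # True (identity test)

# Data types can be nested, e.g. a list of dictionaries.
people = [{"name": "alice", "age": 30}, {"name": "bob", "age": 25}]
print(people[1]["name"])                 # bob
```

Trying to change a tuple, for example with point[0] = 10, raises a TypeError, which is what "immutable" means in practice.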

Operators

Operators are special symbols that perform specific operations on one or more operands. An operand is an object on which an operator operates.

There are several types of operators in Python:

Arithmetic Operators perform basic arithmetic operations like addition (+), subtraction (-), multiplication (*), exponentiation (**), modulo (%, the remainder of an integer division), etc.

Comparison Operators compare two values and return a Boolean value based on the comparison. For example: > for greater than, < for less than, == for equal to, etc.

Assignment Operators are used to assign values to variables. For example: = for assignment, += for addition and assignment, etc.

Logical Operators perform operations on Boolean values and return a Boolean value based on the evaluation of a logical expression. For example: and for logical AND, or for logical OR, not for logical NOT, etc.

Membership Operators test for membership in a sequence, such as strings, lists, or tuples. For example: in for membership. (see an example in set examples above)

Identity Operators compare the memory locations of two objects and return True if they are the same objects located at the same memory location, otherwise False. For example: is for identity checking (see an example in bool examples above)
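The operator families described above can be illustrated with a small sketch; the variable names and values are arbitrary:

```python
a, b = 7, 2

# Arithmetic operators
print(a + b, a - b, a * b)   # 9 5 14
print(a / b)                 # 3.5 (true division always returns a float)
print(a // b)                # 3   (floor division)
print(a % b)                 # 1   (remainder)
print(a ** b)                # 49  (exponentiation)

# Comparison operators return a bool
print(a > b, a == b)         # True False

# Assignment operators
total = 10
total += 5                   # same as total = total + 5
print(total)                 # 15

# Logical operators
print(a > 5 and b > 5)       # False
print(a > 5 or b > 5)        # True
print(not a > 5)             # False

# Membership operator
print("py" in "python")      # True

# Identity versus equality
x = [1, 2]
y = x                        # y refers to the same list object as x
z = [1, 2]                   # z is an equal but separate list
print(x is y, x is z, x == z)   # True False True
```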

Conditionals

Conditional statements allow you to control the flow of execution based on certain conditions. They allow you to check if a condition is true or false, and execute certain blocks of code based on the result.
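A minimal sketch of an if/elif/else chain follows; the temperature value and the thresholds are hypothetical. Only the first branch whose condition is true is executed:

```python
temperature = 15  # hypothetical value in degrees Celsius

if temperature < 0:
    label = "freezing"
elif temperature < 20:
    label = "cool"
else:
    label = "warm"

print(label)  # cool
```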

Loops

Loops are an important construct in programming that allow you to repeat a block of code multiple times. In Python, there are two types of loops: for loops and while loops.

It is important to be careful when using while loops, as they can run forever if the condition never becomes False.
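Both loop types can be sketched as follows (illustrative values only). Note the update inside the while loop, which is what eventually makes its condition False:

```python
# A for loop visits each item of a sequence in turn.
squares = []
for number in [1, 2, 3, 4]:
    squares.append(number ** 2)
print(squares)          # [1, 4, 9, 16]

# range(n) yields 0, 1, ..., n-1.
total = 0
for i in range(5):
    total += i
print(total)            # 10

# A while loop repeats as long as its condition is True.
countdown = 3
while countdown > 0:
    print(countdown)
    countdown -= 1      # without this update the loop would never end
```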

Loop control

Loop control mechanisms in Python allow you to control the flow of execution within a loop. They provide a way to exit a loop prematurely, skip the current iteration, or do nothing. break, continue, and pass are some common loop control mechanisms in Python:
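A small sketch, with arbitrary numbers, that uses all three keywords in a single loop:

```python
visited = []
for number in range(10):
    if number == 2:
        continue        # skip the rest of this iteration
    if number == 5:
        break           # leave the loop entirely
    if number == 3:
        pass            # placeholder that does nothing
    visited.append(number)

print(visited)          # [0, 1, 3, 4]
```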

File Input/Output

To be able to read or write a file in Python, one must first utilize the built-in open() function to open it. This function generates a file object that can be used to invoke other related methods.

This is a simple syntax for the open function:

file_obj = open(file_name, access_mode)

file_name is an absolute or relative file address. access_mode is the mode of the file to be opened (read, write, append, etc.)
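The following sketch writes, appends to, and reads back a small text file. The file name example.txt and its location in a temporary directory are assumptions made for illustration. The with statement used here is an addition to the syntax above: it closes the file automatically at the end of the block, which is otherwise done by calling file_obj.close().

```python
import os
import tempfile

# Build a path in a temporary directory so no existing file is overwritten.
file_name = os.path.join(tempfile.mkdtemp(), "example.txt")

# "w" opens the file for writing (creating or truncating it).
with open(file_name, "w") as file_obj:
    file_obj.write("first line\n")

# "a" appends to the end of the file.
with open(file_name, "a") as file_obj:
    file_obj.write("second line\n")

# "r" opens the file for reading.
with open(file_name, "r") as file_obj:
    lines = file_obj.read().splitlines()

print(lines)  # ['first line', 'second line']
```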

To read more about Files check this

Functions

Functions in Python allow for organizing and reusing code, making it easier to write, test, and maintain large programs. Functions provide a way to encapsulate code blocks and perform specific tasks, allowing for modular and organized code. They also promote code reuse and reduce the risk of bugs by breaking down complex code into smaller, manageable pieces. Additionally, functions can make code easier to read and understand by giving descriptive names to code blocks and abstracting away implementation details.

Functions are defined using the def keyword, followed by the function name, a set of parentheses, and a colon. The code inside the function is indented, and the function is executed when it is called.
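As a minimal sketch, the hypothetical function below takes two parameters, one of them with a default value, and returns a result:

```python
def rectangle_area(width, height=1.0):
    """Return the area of a rectangle; height defaults to 1.0."""
    return width * height


# The function body runs only when the function is called.
print(rectangle_area(3, 4))                # 12
print(rectangle_area(3))                   # 3.0 (default height is used)
print(rectangle_area(height=2, width=5))   # 10 (keyword arguments)
```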

Modules and Libraries

Modules are collections of functions, variables, and other objects that can be imported into other Python files. Modules provide a way to organize related functions and objects into a single, importable file. They help to separate code into different, reusable components and reduce the risk of naming conflicts between different parts of a program. One can write several functions in a file, import the file in another file where the functions defined in the imported file can simply be called.

Libraries are collections of modules that provide additional functionality for a programming language. In Python, libraries are usually distributed as packages and can be installed using package managers like pip (or conda as you did earlier). They provide a wide range of functionalities, including scientific computing, data analysis, machine learning, web development, and more. For example:
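The common forms of the import statement can be sketched with modules from the standard library, so that nothing needs to be installed. Third-party libraries are imported in the same way once they are installed (e.g., import numpy as np).

```python
import math                      # import a whole module
from collections import Counter  # import a single name from a module
import statistics as stats       # import a module under a shorter alias

print(math.sqrt(16))                      # 4.0
print(Counter("banana").most_common(1))   # [('a', 3)]
print(stats.mean([1, 2, 3, 4]))           # 2.5
```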

Conclusion

The sequential structure (e.g., running lines of code one after another), conditional structure (e.g., if/else), repetitive structure (e.g., loops), and function structure (e.g., def) covered in this overview are four fundamental building blocks of any software program. They can be combined in various ways to solve different problems and meet specific requirements. There is one other building block, the class construct in Object-Oriented Programming (OOP), which will be covered later in more advanced modules in your studies.


Useful libraries

Two popular Python libraries, NumPy and Pandas, are introduced here.

Numpy

NumPy (Numerical Python) is a Python library for numerical computing, providing powerful tools for working with multi-dimensional arrays and matrices. It includes functions for mathematical operations, random number generation, linear algebra, Fourier analysis, and more. NumPy is widely used in scientific computing, data analysis, machine learning, and other fields where high-performance numerical operations are required. To work with NumPy, first install the library, for example with pip install numpy or conda install numpy as described in the setup section.

Data structure:

Before working with NumPy, let's review some basic and useful concepts in linear algebra, beginning with the scalar, vector, matrix, and tensor data structures:

Scalar: Single numerical value (rank 0), e.g., 2.0

Vector: An array of numbers (rank 1), e.g., [2.0, 3.0, 4.0]

Matrix: An array of numbers arranged in rows and columns (rank 2), e.g., a 2-row by 3-column (i.e., 2 x 3) matrix of zeros

Tensor: A generalization of a matrix with an arbitrary rank, e.g., rank-1 tensor (or vector), rank-2 tensor (or matrix), rank-3 tensor, ...
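The four structures above can be sketched as NumPy arrays, assuming NumPy is installed. The attribute ndim gives the rank and shape gives the size along each axis:

```python
import numpy as np

scalar = np.array(2.0)                 # rank 0
vector = np.array([2.0, 3.0, 4.0])     # rank 1
matrix = np.zeros((2, 3))              # rank 2: 2 rows, 3 columns of zeros
tensor = np.zeros((2, 3, 4))           # rank 3

for name, array in [("scalar", scalar), ("vector", vector),
                    ("matrix", matrix), ("tensor", tensor)]:
    # ndim is the rank (number of axes); shape is the size along each axis.
    print(name, array.ndim, array.shape)
```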

Data types:

Int (integer) and Float with different precisions (8, 16, 32, 64 bit) are two common data types in numpy arrays. Uint (unsigned integer), bool, and complex are some other data types. Numpy arrays can have their data types explicitly declared or implicitly inferred.

Assigning an array to a new variable does not copy its data: the new name is only a reference to the original array (i.e., changing it changes the original array), unless the copy() function is explicitly used.
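A short sketch of inferred and explicit data types, and of the difference between a plain assignment and copy(). The array contents are arbitrary, and the default integer precision depends on the platform, so it is not shown as a fixed value:

```python
import numpy as np

inferred_int = np.array([1, 2, 3])                 # an integer dtype is inferred
inferred_float = np.array([1.0, 2.0, 3.0])         # float64 is inferred
explicit = np.array([1, 2, 3], dtype=np.float32)   # dtype declared explicitly
print(inferred_int.dtype, inferred_float.dtype, explicit.dtype)

original = np.array([1, 2, 3])
alias = original                 # no data is copied; both names refer to one array
alias[0] = 100
print(original.tolist())         # [100, 2, 3]

independent = original.copy()    # an explicit, separate copy
independent[1] = -1
print(original.tolist())         # [100, 2, 3] (unchanged by the copy)
print(independent.tolist())      # [100, -1, 3]
```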

Specific matrices:

In the following block, several specific types of matrices are introduced.

Zero/one/full matrix: matrices of any size in which all elements are the same number: zeros, ones, or another predefined value.

Identity matrix (denoted with I) is a square matrix (i.e., has the same number of rows and columns) that has 1's along the main diagonal and 0's elsewhere. An identity matrix plays the role of 1 in ordinary arithmetic; multiplying any matrix by an identity matrix (of compatible size) yields the same matrix.

Inverse matrix: The inverse of a square matrix (denoted with a superscript -1) is another square matrix that, when multiplied by the original matrix, yields the identity matrix of the same size. Note: a square matrix is called singular if it has no inverse.
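These matrices can be built with NumPy as sketched below. The matrices a and singular are hypothetical examples; np.allclose is used for the comparisons because floating-point results are only accurate up to rounding error.

```python
import numpy as np

zeros = np.zeros((2, 3))          # all elements 0
ones = np.ones((2, 3))            # all elements 1
sevens = np.full((2, 3), 7.0)     # all elements equal to a chosen value
identity = np.eye(2)              # 2 x 2 identity matrix

a = np.array([[2.0, 1.0],
              [1.0, 1.0]])        # hypothetical invertible matrix (determinant 1)

# Multiplying by the identity leaves the matrix unchanged.
print(np.allclose(a @ identity, a))        # True

# The inverse times the original gives the identity (up to rounding error).
a_inv = np.linalg.inv(a)
print(np.allclose(a_inv @ a, identity))    # True

# This matrix is singular (its second row is twice the first),
# so asking for its inverse raises LinAlgError.
singular = np.array([[1.0, 2.0],
                     [2.0, 4.0]])
try:
    np.linalg.inv(singular)
except np.linalg.LinAlgError as error:
    print("not invertible:", error)
```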

Operations:

Numpy comes with a wide range of arithmetic and mathematical operations:

Element-wise operations: vector-vector or matrix-matrix addition, subtraction, multiplication, and division are possible if both vectors or matrices have the same rank and size.
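A minimal sketch of element-wise operations on arrays of the same shape (the values are arbitrary). Note that * multiplies matching elements and is not a matrix product:

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])
print((u + v).tolist())   # [5.0, 7.0, 9.0]
print((u - v).tolist())   # [-3.0, -3.0, -3.0]
print((u * v).tolist())   # [4.0, 10.0, 18.0]
print((v / u).tolist())   # [4.0, 2.5, 2.0]

m = np.array([[1, 2], [3, 4]])
n = np.array([[10, 20], [30, 40]])
print((m + n).tolist())   # [[11, 22], [33, 44]]
print((m * n).tolist())   # [[10, 40], [90, 160]] (element-wise)
```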

Some other operations:

Inner/dot product is the sum over the element-wise multiplication of two vectors of the same length. It yields a scalar.

Matrix multiplication: Assume A, B, and C are matrices. To compute the product C = A x B, the number of columns in A and the number of rows in B must be the same. The product (a matrix) has the same number of rows as A and the same number of columns as B:

C(m x n) = A(m x k) x B(k x n)

Transposition (denoted with a small superscript T) means switching the row and column indices of a matrix. In matrix multiplication, we sometimes need to transpose one matrix to satisfy the condition for matrix multiplication described above.
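The three operations can be sketched with small, arbitrary arrays. Here np.dot computes the dot product, the @ operator performs matrix multiplication, and .T gives the transpose:

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([4, 5, 6])
print(np.dot(u, v))            # 32 = 1*4 + 2*5 + 3*6

a = np.array([[1, 2, 3],
              [4, 5, 6]])      # shape (2, 3)
b = np.array([[1, 0],
              [0, 1],
              [1, 1]])         # shape (3, 2)
c = a @ b                      # (2 x 3) times (3 x 2) gives (2 x 2)
print(c.tolist())              # [[4, 5], [10, 11]]

# a @ a is not defined, since a has 3 columns but only 2 rows.
# Transposing the second factor makes the shapes compatible.
print(a.T.shape)               # (3, 2)
gram = a @ a.T                 # (2 x 3) times (3 x 2) gives (2 x 2)
print(gram.tolist())           # [[14, 32], [32, 77]]
```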

Slicing:

Numpy provides several ways to index arrays, such as slicing, integer indexing and Boolean indexing.

Note: Integer indexing and slicing can be combined to construct lower ranked arrays:
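The indexing styles mentioned above can be sketched on a small hypothetical array. The last two lines show how mixing an integer index with a slice yields an array of lower rank than slicing alone:

```python
import numpy as np

grid = np.array([[1, 2, 3, 4],
                 [5, 6, 7, 8],
                 [9, 10, 11, 12]])

# Slicing: rows 0-1 and columns 1-2 (the end index is excluded).
print(grid[:2, 1:3].tolist())                 # [[2, 3], [6, 7]]

# Integer indexing: pick elements (0, 0), (1, 1) and (2, 3).
print(grid[[0, 1, 2], [0, 1, 3]].tolist())    # [1, 6, 12]

# Boolean indexing: keep the elements that satisfy a condition.
print(grid[grid > 8].tolist())                # [9, 10, 11, 12]

# Mixing an integer index with a slice lowers the rank.
row = grid[1, :]                 # rank 1, shape (4,)
row_as_matrix = grid[1:2, :]     # rank 2, shape (1, 4)
print(row.shape, row_as_matrix.shape)         # (4,) (1, 4)
```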

Example:

To wrap up this section we do some image manipulations using numpy. Before that, we should install two other libraries:


Pandas

Pandas is a fast and popular data manipulation library for the Python programming language. To begin with Pandas, first install the library, for example with pip install pandas or conda install pandas.

Pandas is built on top of the NumPy library and provides high-level data structures and functions to manipulate data. The two main data structures in Pandas are the Series and the DataFrame.

Series:

A Series is a one-dimensional labeled array that can hold any data type, such as integers, floats, strings, or even Python objects.
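A minimal sketch of a Series, assuming Pandas is installed; the temperature values and day labels are made up for illustration:

```python
import pandas as pd

# Values plus an index of labels, one label per value.
temperatures = pd.Series([21.5, 19.0, 23.1], index=["mon", "tue", "wed"])

print(temperatures["tue"])       # 19.0 (access by label)
print(temperatures.mean())       # average of the three values

# Boolean filtering keeps the labels of the matching values.
warm_days = temperatures[temperatures > 20]
print(warm_days.index.tolist())  # ['mon', 'wed']
```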

Dataframe:

A DataFrame is a two-dimensional labeled data structure with columns of potentially different data types. It can be thought of as a spreadsheet.
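A DataFrame can be sketched in the same way, again assuming Pandas is installed; the column names and values are hypothetical. Each column holds one data type, and rows can be filtered with a Boolean condition:

```python
import pandas as pd

people = pd.DataFrame({
    "name": ["alice", "bob", "carol"],     # strings
    "age": [30, 25, 41],                   # integers
    "height_m": [1.65, 1.80, 1.72],        # floats
})

print(people.shape)                        # (3, 3): 3 rows, 3 columns
print(people["age"].max())                 # 41

# Select the rows where a condition holds.
over_28 = people[people["age"] > 28]
print(over_28["name"].tolist())            # ['alice', 'carol']

# A new column can be computed from an existing one.
people["age_in_months"] = people["age"] * 12
print(people.columns.tolist())
```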