There are many programming languages for data analysis (for example, R); however, Python has the advantage of being more intuitive for software developers and people with programming skills. With its data analysis packages and its access to a range of machine learning and artificial intelligence packages, Python has become increasingly popular in the fields of data analysis and artificial intelligence.
If you don’t know anything about Python, this is a good place to start.
First, we have to get into the basics: installation and choosing a 'ready to go' IDE.
For these initial steps we need to install Python on our OS.
To make everything as easy and smooth as possible, I recommend that beginners install Anaconda (https://www.anaconda.com/download). Anaconda has the advantage of bundling all the tools you will need in one place, so you don’t need to install Python or Jupyter separately: the Anaconda distribution includes both, along with other tools that will be useful in the future.
If you don’t want to install Anaconda, you will have to install Python from the command line or go to the official website and download the installer: https://www.python.org/downloads
Follow the steps of the wizard and Python will be installed on your OS. After that, you can install Jupyter Notebook or run your code from a text editor or the command line.
*** If you are not a fan of installations, you can try Jupyter notebooks online from the Azure website: https://notebooks.azure.com/ ***
Once you have Anaconda installed, go to the app and launch Jupyter notebook.
This will open a tree view of your Home folder.
To add a new folder click ‘New’ —> ‘Folder’
A new ‘Untitled Folder’ will appear (you can change the name by selecting the folder and clicking ‘Rename’).
To create a new Jupyter Notebook, click ‘New’ —> ‘Notebook: Python3’
Getting around your Jupyter Notebook
To run code, type it in a cell and then click ‘Run’. The output will appear below the cell.
You can add more cells from the ‘Insert’ menu, and later export the notebook as a Python file (.py) to be executed outside Jupyter.
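As a first thing to try in a cell, here is a minimal sketch. The variable names and values are only illustrative; any valid Python code works the same way.

```python
# Type this into a notebook cell and click 'Run'.
# The printed lines should appear directly below the cell.
greeting = "Hello, Jupyter!"
numbers = [1, 2, 3, 4]
total = sum(numbers)

print(greeting)
print("The sum is", total)
```

Variables defined in one cell stay available in the cells you run afterwards, so you can build up an analysis step by step.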
Python includes some packages by default (the standard library). To use those packages you only need to import them in your code. If a package is not part of the default Python installation, you need to install it before using it.
To install a package into your Python environment when you are using conda, type the following into your terminal:
conda install package_name
If you don't have Anaconda, or you want to install the package into a plain Python installation, you first need pip, Python's package installer (it will make things easier).
Linux: sudo apt install python3-pip
Mac: python3 -m ensurepip --upgrade
Windows: pip3 is usually installed along with Python by the installer.
Once pip is installed, you can use the following command in your terminal:
pip3 install package_name
('pip install package_name' for Python 2.x)
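If you are not sure whether a package is already available in your environment, you can check from Python itself before installing anything. The sketch below uses the standard library's importlib.util.find_spec; the helper name is_installed and the second package name are made up for illustration.

```python
import importlib.util

def is_installed(package_name):
    """Return True if Python can find a top-level package with this name."""
    return importlib.util.find_spec(package_name) is not None

# 'json' ships with Python, so it should always be found.
print(is_installed("json"))

# A made-up name that is not expected to exist in any environment.
print(is_installed("surely_not_a_real_package_name_12345"))
```

If the check fails for a package you need, install it with conda or pip as described above.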
To use a package in your Python file or Jupyter notebook, you have to import it before you use it, for example:
import numpy as np
(‘as xx’ sets an alias: a shorter name, such as np, that you use to refer to the package in the rest of your code.)
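The same syntax works for the packages that come with Python. This is a small sketch using two standard library modules, so it should run without installing anything; the alias stats is just a choice made for this example.

```python
import math                  # plain import: names are used as math.<name>
import statistics as stats   # import with an alias: names are used as stats.<name>

radius = 2
area = math.pi * radius ** 2        # area of a circle with radius 2
average = stats.mean([1, 2, 3, 4])  # arithmetic mean of the list

print(area)
print(average)
```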
Check out the Python Package Index for more Python packages: https://pypi.org/
In Python, variables are dynamically typed, so you don’t need to declare their type.
You can define a list of strings, a list of numbers, or a mixed list that combines different types.
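A short sketch of both ideas, with made-up names and values: the same variable can hold values of different types, and lists can be homogeneous or mixed.

```python
# Dynamic typing: no type declaration, and the type can change on reassignment.
x = 10
print(type(x))
x = "ten"
print(type(x))

# Lists of strings, numbers, or a mix of types.
names = ["Ana", "Luis", "Marta"]
scores = [7.5, 9, 6]
mixed = ["Ana", 7.5, True]

names.append("Pedro")   # lists can grow after they are created
print(names[0])         # indexing starts at 0
print(len(names))

# Note: multiplying a list repeats it; it does not multiply each element.
print(scores * 2)
```

That last line shows why plain lists are awkward for numeric work: arithmetic operators on a list do not act element by element.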
NumPy, or Numerical Python, is a Python package that provides the ‘numpy array’, an alternative to the regular Python list. The great advantage of a numpy array over a normal list is that you can perform calculations on the whole array easily and quickly.
To install numpy use the following command in your terminal:
pip3 install numpy
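Once numpy is installed, calculations on whole arrays look like this. The example is a minimal sketch with invented height and weight values; the variable names are not from any particular dataset.

```python
import numpy as np

# Illustrative data: heights in metres and weights in kilograms.
heights_m = np.array([1.60, 1.75, 1.82])
weights_kg = np.array([55.0, 72.0, 90.0])

# Arithmetic is applied element by element, with no explicit loop.
bmi = weights_kg / heights_m ** 2

print(bmi)
print(bmi.mean())        # average of all elements
print(heights_m * 100)   # every height converted to centimetres
```

With regular lists, the same calculation would need a loop or a list comprehension.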
NumPy arrays have the disadvantage that every element must share a single data type. If you need to work with different types of data, Pandas is a great ally. Pandas DataFrames have rows and columns, and each column can hold a different type of data, which makes the data very easy to manipulate.
To install the package, run the following command in your terminal:
pip3 install pandas
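With pandas installed, a DataFrame with columns of different types can be built from a dictionary. This is a small sketch with invented data; the column names are assumptions made for the example.

```python
import pandas as pd

# Each key becomes a column; columns can hold different data types.
people = pd.DataFrame({
    "name": ["Ana", "Luis", "Marta"],
    "age": [28, 35, 41],
    "height_m": [1.60, 1.75, 1.82],
})

print(people.dtypes)          # one data type per column
print(people["age"].mean())   # calculations on a single column

# Filter rows with a condition on a column.
older = people[people["age"] > 30]
print(older)
```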