Machine Learning, Computer Vision and Natural Language Processing with Python

Warith Harchaoui, Mohamed Chelali, Matias Tassano and Pierre-Louis Antonsanti
Special thanks to Azedine Mani

Since December 2018
Updated in October 2024

Introduction
Programming in Artificial Intelligence?
Installation of Artificial Intelligence toolboxes
Conclusion

Introduction

Artificial Intelligence needs heavy computations. During the 2010s, the Deep Learning community paved the way for hardware acceleration, first with Graphics Processing Units (GPUs) diverted from their original purpose to serve Applied Mathematics research beyond graphics. The aim of this webpage is to present a basic cheat sheet for programming in Machine Learning (i.e. Statistical Learning, Pattern Recognition, Artificial Intelligence, Data Science) for a wide range of applications such as Computer Vision, Sound Processing and Natural Language Processing.

To the best of our knowledge, there are two hardware and software solutions for accelerating Machine Learning computations:

CUDA acceleration thanks to NVIDIA GPUs. CUDA is a C/C++ dialect for massive parallelization that can provide 40x to 100x speedups compared to regular CPU computations. As a research scientist, you can apply for a free NVIDIA GPU thanks to the NVIDIA Research promotion initiative.
Apple M1 is neither a CPU nor a GPU: it combines on the same chip, without a motherboard in between, a CPU, a GPU, a controller hub, RAM and an additional Neural Engine. As of today, more and more software is compiled to run directly on Apple M1, so you can leverage the hardware acceleration without learning a new programming language.
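
To make the two options above concrete, here is a minimal sketch that guesses which kind of acceleration a machine might offer. It is only a heuristic under stated assumptions: the function name `guess_accelerator` is hypothetical, finding `nvidia-smi` on the PATH is taken as a hint that an NVIDIA driver is installed, and an `arm64` macOS machine is taken to be Apple silicon.

```python
import platform
import shutil


def guess_accelerator(system=None, machine=None, has_nvidia_smi=None):
    """Return "cuda", "apple-silicon" or "cpu" from coarse system hints.

    Hypothetical helper: arguments default to values probed on this machine.
    """
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    if has_nvidia_smi is None:
        # Assumption: nvidia-smi ships with the NVIDIA driver, so finding it
        # on the PATH is only a hint that a CUDA-capable setup is present.
        has_nvidia_smi = shutil.which("nvidia-smi") is not None
    if has_nvidia_smi:
        return "cuda"
    if system == "Darwin" and machine == "arm64":
        return "apple-silicon"
    return "cpu"


print(guess_accelerator())
```

The deep learning toolboxes discussed below have their own, more reliable device checks; this sketch only needs the Python standard library.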

For this kind of acceleration in scientific programming outside Apple M1 and NVIDIA, in spite of high hopes regarding OpenCL (the non-proprietary, open equivalent of CUDA), there is no other realistic way, in terms of research community contributions, to perform Machine Learning, Computer Vision and Natural Language Processing in high-level languages (i.e. non-C/C++) with modern toolboxes. It is nevertheless worth encouraging initiatives that are still modest, like DeepCL and PlaidML.

This page has been extensively used in the MAP5 lab in Applied Mathematics to conduct research in Machine Learning (ML), Computer Vision (CV) and Natural Language Processing (NLP) in Python. Please feel free to contact Warith Harchaoui with improvements and suggestions.

Programming in Artificial Intelligence?

Depending on the decade, the language of choice in corporate and academic Machine Learning (ML) environments has changed: Java, Matlab, Python. Trillion-dollar companies have chosen Python, bringing funding and a research community with them. For these reasons, we strongly recommend Python for your ML projects, to stay in line with scientific and industrial trends. Choosing Python for prototyping has nothing to do with our personal tastes: we simply follow the contemporary trend. Nevertheless, people using R or even Java can still use ONNX-R and DL4J. There is also an emerging trend in Rust for Machine Learning. Let's stay open-minded!

Several toolboxes in Machine Learning (ML), Computer Vision (CV) and Natural Language Processing (NLP) have been released over the last decades. Many great ideas have been developed in all of them, but only a few remain popular within the scientific community:

Machine Learning (excluding deep learning): scikit-learn
Deep Learning: TensorFlow and PyTorch emerged as the preferred choices for large-scale computations involving differentiation. More concretely, choosing between PyTorch and TensorFlow is merely a question of taste because the possibilities are huge and similar. On the one hand, TensorFlow is well established for production settings and hardware compatibility beyond servers, e.g. phones (see TensorFlow for Swift, TensorFlow Lite for Android). On the other hand, PyTorch has offered more modularity and flexibility since its beginning, a little after TensorFlow (see the OpenAI effervescence).
Deep Learning state-of-the-art pretrained models in Computer Vision, Natural Language Processing and even Speech Processing: Hugging Face
Computer Vision (excluding deep learning): OpenCV

Recently in Deep Learning, Keras and PyTorch Lightning have become popular within Python, as they use the above-mentioned reference toolboxes as backends.
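
As an illustration of what prototyping with the first toolbox of the list looks like, here is a minimal scikit-learn sketch. The choice of the Iris dataset, of logistic regression and of the split parameters is ours, purely for illustration; it assumes scikit-learn is installed as described in the installation section.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Small built-in dataset: 150 flowers, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the data to estimate generalization.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit a simple non-deep classifier and score it on the held-out part.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Most scikit-learn estimators follow the same `fit` / `predict` / `score` pattern, so swapping in another model usually changes a single line.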

Model serving means hosting machine-learning models (in the cloud or on premises) and making their functions available via an API so that applications can incorporate AI into their systems. Model serving is crucial, as a business cannot offer AI products to a large user base without making its product accessible. Deploying a machine-learning model in production also involves resource management and model monitoring, including operations statistics as well as model drift. For language compatibility, two solutions stand out as industry-grade:

MXNet: it is an Apache Project that provides both language compatibility and multi-hardware computations
ONNX: it is both an ML model file format and a deployment framework backed by Open Source efforts combined with Microsoft and Linux Foundation AI.

ONNX put a lot of effort into supporting both deep and non-deep (e.g. sklearn) models in production. In practice, we appreciate the fact that one can prototype in one's favorite language (e.g., Python) and then deploy via the ONNX format for the model, with ONNX Runtime for programming in C/C++, C#, Java, JavaScript and Objective-C, or with MXNet in R, C/C++, Clojure, Java, JavaScript, Julia, Perl and Scala.
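
The model-serving idea described above (a hosted model exposed through an API) can be sketched with the Python standard library alone. This is not how ONNX Runtime or MXNet serve models: the "model" is a hypothetical fixed linear scorer, and the names `predict`, `predict_json`, `PredictHandler`, `serve`, the `/predict` route and port 8000 are all assumptions made for the example.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical "trained" model: fixed weights of a linear scorer.
WEIGHTS = [0.5, -0.25, 1.0]
BIAS = 0.1


def predict(features):
    """Score one feature vector with the toy linear model."""
    if len(features) != len(WEIGHTS):
        raise ValueError(f"expected {len(WEIGHTS)} features")
    return BIAS + sum(w * x for w, x in zip(WEIGHTS, features))


def predict_json(body):
    """Turn a JSON request body into a JSON response body (both bytes)."""
    payload = json.loads(body)
    score = predict(payload["features"])
    return json.dumps({"score": score}).encode("utf-8")


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Only one route is exposed in this sketch.
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            response = predict_json(self.rfile.read(length))
        except (ValueError, KeyError, TypeError) as err:
            # Malformed JSON, missing key or wrong feature count.
            self.send_error(400, str(err))
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(response)))
        self.end_headers()
        self.wfile.write(response)


def serve(host="127.0.0.1", port=8000):
    """Block and answer prediction requests until interrupted."""
    HTTPServer((host, port), PredictHandler).serve_forever()
```

Calling `serve()` starts the server; a client can then POST a body such as `{"features": [1, 2, 3]}` to `http://127.0.0.1:8000/predict` and receives a JSON object with a `score` field. Resource management and monitoring, mentioned above, are deliberately left out.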

For those who are not afraid of really programming :) there is the good old technique of writing brand new algorithms in C/C++ and calling them from higher-level languages. Matlab, R and Python offer such possibilities and, for Python, the most effective experience we had was with pybind11 (for example, Facebook researchers chose it for fastText). Although Cython became very respected thanks to scikit-learn, we enjoy the efficiency of Numba. Apple now develops coremltools, which makes it possible to benefit from the M1 acceleration while still working in Python; this is good news, especially at inference time.

Installation of Artificial Intelligence toolboxes

For optimal performance, we recommend using an Ubuntu operating system equipped with NVIDIA GPU acceleration. This setup can also be remotely controlled from a macOS machine. That being said, for prototyping with small datasets and for programming and iteration, we recommend using Visual Studio Code, which is available on both Windows and macOS.

Ubuntu (20.04 LTS or higher)

Install

We install a pip environment within a conda environment called ENV=env4ml with Python version 3.10 (at the time of writing, PyTorch recommends Python 3.10; see the "Getting Started" section of its website)

Conda: these commands can run within the terminal of Visual Studio Code (free!)

mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh

~/miniconda3/bin/conda init bash

Close and reopen your Terminal window

cd ~
ENV=env4ml
conda update -y -n base -c defaults conda
conda create -y -n $ENV python=3.10
conda activate $ENV
conda install -y pip
yes | pip install scikit-learn pandas matplotlib seaborn

NVIDIA driver and CUDA
For NVIDIA GPUs on Ubuntu, we recommend following the Google Cloud instructions, which also apply to your local machines. This step is mandatory for NVIDIA acceleration.

Usage


ENV=env4ml
conda activate $ENV
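
Once the environment is activated, a quick sanity check is to ask Python which of the basic toolboxes it can see. The sketch below uses only the standard library; the helper name `installed_versions` is hypothetical, and the package list simply mirrors the pip command of the install step.

```python
from importlib import metadata

# Distribution names as passed to pip in the install step.
PACKAGES = ["scikit-learn", "pandas", "matplotlib", "seaborn"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not visible from the current environment.
            versions[name] = None
    return versions


for name, version in installed_versions(PACKAGES).items():
    print(f"{name:>14}: {version or 'NOT INSTALLED'}")
```

If a package shows up as not installed, the most likely cause is that the script ran outside the env4ml environment.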

macOS (11 Big Sur or higher)

Install

We install a pip environment within a conda environment called ENV=env4ml with Python version 3.10 (at the time of writing, PyTorch recommends Python 3.10; see the "Getting Started" section of its website)

Command Line Tools (Xcode): use the Terminal.app of your macOS
```
xcode-select --install
```

Brew and wget

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
brew install wget

Conda: first determine whether you have an Intel or an M1 Apple computer, using the "About This Mac" menu under the Apple icon at the top left of your screen.
These commands can run within the terminal of Visual Studio Code (free!)
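
As an alternative to the "About This Mac" menu, the architecture can be read from the Python that ships with the Command Line Tools. The sketch below maps it to the two installer files used in this section; the helper name `macos_installer_url` is hypothetical, and note the assumption that Python runs natively, since a Python running under Rosetta on an M1 machine reports `x86_64`.

```python
import platform

BASE_URL = "https://repo.anaconda.com/miniconda/"

# Installer file names taken from the Intel and M1 commands of this section.
INSTALLERS = {
    "x86_64": "Miniconda3-latest-MacOSX-x86_64.sh",  # Intel
    "arm64": "Miniconda3-latest-MacOSX-arm64.sh",  # M1
}


def macos_installer_url(machine=None):
    """Return the Miniconda installer URL for a macOS architecture string."""
    machine = platform.machine() if machine is None else machine
    if machine not in INSTALLERS:
        raise ValueError(f"unexpected macOS architecture: {machine!r}")
    return BASE_URL + INSTALLERS[machine]


# On a Mac, macos_installer_url() with no argument picks the matching file.
print("detected architecture:", platform.machine())
```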

Intel

mkdir -p ~/miniconda3
curl https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -o ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3

rm -rf ~/miniconda3/miniconda.sh
~/miniconda3/bin/conda init bash
~/miniconda3/bin/conda init zsh

or M1

mkdir -p ~/miniconda3
curl https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh -o ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3

rm -rf ~/miniconda3/miniconda.sh
~/miniconda3/bin/conda init bash
~/miniconda3/bin/conda init zsh

Close and reopen your Terminal window


cd ~
ENV=env4ml
conda update -y -n base -c defaults conda
conda create -y -n $ENV python=3.10
conda activate $ENV
conda install -y pip
yes | pip install scikit-learn pandas matplotlib seaborn

Usage


ENV=env4ml
conda activate $ENV

Windows (10 or higher)

Install

We install a pip environment within a conda environment called ENV=env4ml with Python version 3.10 (at the time of writing, PyTorch recommends Python 3.10; see the "Getting Started" section of its website)

Conda: These commands can run within the terminal of Visual Studio Code (free!)

curl https://repo.anaconda.com/miniconda/Miniconda3-latest-Windows-x86_64.exe -o miniconda.exe
start /wait "" miniconda.exe /S
del miniconda.exe

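Basic toolboxes: open a new Command Prompt in which conda is available (for example the Anaconda Prompt installed with Miniconda), since the commands below use cmd.exe syntax


set ENV=env4ml
conda update -y -n base -c defaults conda
conda create -y -n %ENV% python=3.10
conda activate %ENV%
conda install -y pip
pip install scikit-learn pandas matplotlib seaborn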

Usage


set ENV=env4ml
conda activate %ENV%

Web Browser

Using your own web browser for demonstrations and remote computing has become increasingly popular. First, select your operating system from the options above for standard installation instructions. Then, you can enhance your setup with one of these three "web browser" solutions:

Jupyter
Jupyter is a local in-browser programming environment which is suitable for demos, hands-on sessions, public workshops and even quick and dirty prototyping.
```
yes | pip install jupyter
```
To use it, just run the following in your Terminal app
```
jupyter notebook
```
and your default web browser opens, ready for programming.
Colab
Google Colab is inspired by Jupyter and built upon it. The spirit is the same, except that you are using Google hardware, especially their GPUs and even TPUs.
Streamlit
Streamlit is an open-source app framework in Python for Machine Learning and Data Science teams, for creating web apps in minutes without front-end or server skills.
```
yes | pip install streamlit watchdog
```
It is really straightforward to use:
```
streamlit run app.py
```
and it opens your web browser with your app.

Conclusion

This page outlines installation procedures for Artificial Intelligence development in Python, a rapidly growing field. We strongly recommend that beginners explore the courses offered for free by Kaggle. Our choice of conda and pip as environment management systems is based on their distinct advantages:

conda is a comprehensive environment management system capable of creating isolated environments, even specifying Python versions.
pip is a widely used package manager that provides access to a vast repository of Python libraries and tools.

Our approach leverages the strengths of both tools: we initially create an environment with conda and then use pip within that environment to install packages. This strategy allows for the use of both pip install xxx and conda install xxx commands within the same environment. To export your environment's package configuration, you can use:

conda env export > environment.yml

and you can recreate it, pip-installed packages included, with:

conda env create --name ENVNAME --file environment.yml

where ENVNAME is your chosen conda/pip environment name.

If something goes wrong in your installation to the point that you need to remove it, you can use these commands:

cd ~
rm -rf miniconda*
rm -rf .conda*