Machine Learning, Computer Vision and Natural Language Processing with Python

Warith Harchaoui, Mohamed Chelali, Matias Tassano and Pierre-Louis Antonsanti
Special thanks to Azedine Mani

Since December 2018
Updated in April 2024

Introduction

Artificial Intelligence requires heavy computation. During the 2010s, the Deep Learning community paved the way for hardware acceleration, historically with Graphics Processing Units (GPUs) diverted from their original purpose for the benefit of Applied Mathematics research beyond graphics. The aim of this webpage is to present a basic cheat sheet for programming in Machine Learning (i.e. Statistical Learning, Pattern Recognition, Artificial Intelligence, Data Science), with its many applications in areas such as Computer Vision, Sound Processing and Natural Language Processing.

To the best of our knowledge, there are two hardware and software solutions for accelerating Machine Learning computations:

For this kind of acceleration in scientific programming, there is no realistic alternative to Apple M1 and NVIDIA, in terms of research community contributions, for Machine Learning, Computer Vision and Natural Language Processing in high-level languages (i.e. non-C/C++) with modern toolboxes, in spite of high hopes regarding OpenCL (the non-proprietary, open equivalent of CUDA). Still, it is worth encouraging initiatives that remain modest so far, such as DeepCL and PlaidML.

This page has been used extensively in the MAP5 lab in Applied Mathematics to conduct research in Machine Learning (ML), Computer Vision (CV) and Natural Language Processing (NLP) in Python. Please feel free to contact Warith Harchaoui with improvements and suggestions.

Programming in Artificial Intelligence?

The dominant language in Machine Learning (ML) corporate and academic environments has changed over the decades: Java, then Matlab, then Python. In this section we observe that trillion-dollar companies chose Python and brought funding and a research community with it. For these reasons, we strongly recommend Python for your ML projects, in order to follow scientific and industrial trends. Choosing Python for prototyping has nothing to do with our personal tastes; we simply follow the current trend. Nevertheless, people using R or even Java can still use ONNX-R and DL4J. There is also an emerging trend in Rust for Machine Learning. Let's stay open-minded!

Several toolboxes in Machine Learning (ML), Computer Vision (CV) and Natural Language Processing (NLP) have been released over the last decades. Many great ideas have been developed in all of them, but only a few remain popular within the scientific communities:

Recently in Deep Learning, Keras and PyTorch Lightning have become popular within Python, as they use the above-mentioned reference toolboxes as backends.

Model serving basically means hosting machine-learning models (in the cloud or on premises) and making their functions available via an API, so that applications can incorporate AI into their systems. Model serving is crucial, as a business cannot offer AI products to a large user base without making its product accessible. Deploying a machine-learning model in production also involves resource management and model monitoring, including operational statistics as well as model drift. For language compatibility, two solutions emerge as industry-grade tools:

ONNX has put a lot of effort into bringing deep and non-deep (e.g. sklearn) models to production. In practice, we appreciate that one can prototype in one's favorite language (e.g., Python) and then deploy with the ONNX format for the model and ONNX Runtime for programming in C/C++, C#, Java, JavaScript and Objective-C, or with MXNet in R, C/C++, Clojure, Java, JavaScript, Julia, Perl and Scala.
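To make the idea of model serving concrete, here is a minimal sketch that exposes a model through an HTTP API using only the Python standard library. It is not how ONNX Runtime or MXNet serve models: the "model" is a hypothetical fixed linear scorer, and the names WEIGHTS, BIAS, predict, handle_payload, PredictHandler and serve are our own illustrative choices.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical toy model: a fixed linear scorer standing in for a trained model.
WEIGHTS = [0.5, -1.0, 2.0]
BIAS = 0.25


def predict(features):
    """Return the linear score w . x + b for one feature vector."""
    if len(features) != len(WEIGHTS):
        raise ValueError(f"expected {len(WEIGHTS)} features, got {len(features)}")
    return sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS


def handle_payload(body):
    """Turn a JSON request body into an (HTTP status, response dict) pair."""
    try:
        features = json.loads(body)["features"]
        return 200, {"prediction": predict(features)}
    except (ValueError, KeyError, TypeError) as error:
        # Malformed JSON, missing key or wrong types all end up here.
        return 400, {"error": str(error)}


class PredictHandler(BaseHTTPRequestHandler):
    """Answer POST requests whose body looks like {"features": [x1, x2, x3]}."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, response = handle_payload(self.rfile.read(length))
        payload = json.dumps(response).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(host="127.0.0.1", port=8000):
    """Block and serve predictions until the process is interrupted."""
    HTTPServer((host, port), PredictHandler).serve_forever()
```

Calling serve() starts the server on the local machine; any client able to send a JSON POST request can then obtain predictions, whatever its programming language. A production setup would add the resource management and monitoring mentioned above.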

For those who are not afraid of really programming :) there is the good old technique of writing brand new algorithms in C/C++ and calling them from higher-level languages. Matlab, R and Python all offer this possibility. For Python, our most effective experience was with pybind11 (for example, Facebook researchers chose it for fastText). Although Cython became very respected thanks to scikit-learn, we enjoy the efficiency of Numba. Apple now develops coremltools, which makes it possible to benefit from M1 acceleration while still working in Python; this is good news, especially at inference time.
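As a small illustration of calling compiled C code from Python, the sketch below uses ctypes from the standard library rather than pybind11, Cython or Numba, simply because it needs no compiler and no extra package. It assumes a POSIX system (Linux or macOS) and calls strlen from the C standard library; the helper names load_c_library and c_strlen are ours.

```python
import ctypes
import ctypes.util


def load_c_library():
    """Load the C standard library (POSIX systems only in this sketch)."""
    path = ctypes.util.find_library("c")
    # CDLL(None) falls back to the symbols already loaded in the Python process.
    return ctypes.CDLL(path) if path else ctypes.CDLL(None)


libc = load_c_library()

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text):
    """Call the C function strlen on the UTF-8 encoding of text."""
    return libc.strlen(text.encode("utf-8"))


print(c_strlen("fastText"))  # 8 bytes
```

With pybind11 the C++ side is compiled into an importable Python module, so no signature declaration is needed on the Python side; ctypes is only the quickest way to see the principle at work.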

Installation of Artificial Intelligence toolboxes

For optimal performance, we recommend using an Ubuntu operating system equipped with NVIDIA GPU acceleration. This setup can also be controlled remotely from a macOS machine. That being said, for prototyping with small datasets and for programming and iteration, we recommend using Visual Studio Code, which is available on both Windows and macOS.

Ubuntu (20.04 LTS or higher)

Install

We set up pip inside a conda environment called env4ml with Python version 3.11.
  1. Conda: these commands can run within the terminal of Visual Studio Code (free!)
    mkdir -p ~/miniconda3
    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
    bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
    rm -rf ~/miniconda3/miniconda.sh
    
    ~/miniconda3/bin/conda init bash
    ~/miniconda3/bin/conda init zsh
    
    # Close and reopen your Terminal window
    cd ~
    conda update -y -n base -c defaults conda
    conda create -y -n env4ml python=3.11
    conda activate env4ml
    conda install -y pip
    yes | pip install scikit-learn pandas matplotlib seaborn 
  2. NVIDIA driver and CUDA
    For NVIDIA GPUs on Ubuntu, we recommend following the Google Cloud instructions on your local machine. This is mandatory for NVIDIA acceleration.
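Once the driver is installed, a quick way to check that it is visible from Python is to query the nvidia-smi command-line tool that ships with the NVIDIA driver. This is only a sketch: the function name list_nvidia_gpus is ours, and a machine without the driver simply yields None.

```python
import shutil
import subprocess


def list_nvidia_gpus():
    """Return one 'name, driver version' string per GPU, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None  # the tool is not on the PATH, so the driver is likely missing
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=False,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None  # the tool exists but cannot talk to the driver
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = list_nvidia_gpus()
print("No working NVIDIA driver found" if gpus is None else gpus)
```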

Usage

conda activate env4ml
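After activating the environment, a short Python script can confirm that the interpreter and the packages installed above are the ones actually in use. The sketch below relies only on the standard library; PACKAGES and check_environment are names we chose, and scikit-learn is listed under its import name sklearn.

```python
import importlib.util
import sys

# Import names of the packages installed above (scikit-learn is imported as sklearn).
PACKAGES = ["sklearn", "pandas", "matplotlib", "seaborn"]


def check_environment(packages=PACKAGES):
    """Map each import name to True if it can be found in the active environment."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


# sys.executable should point inside the env4ml directory of miniconda3.
print("Python", sys.version.split()[0], "at", sys.executable)
for name, found in check_environment().items():
    print(f"{name:12s} {'ok' if found else 'MISSING'}")
```

The same check applies unchanged on macOS and Windows.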

macOS (11 Big Sur or higher)

Install

We set up pip inside a conda environment called env4ml with Python version 3.11.
  1. Command Line Tools (Xcode): use the Terminal.app of your macOS
    xcode-select --install
  2. Brew and wget
    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
    brew install wget
  3. Conda: first determine whether you have an Intel or an M1 Apple computer, using the "About This Mac" entry of the Apple menu at the top left of your screen.
    These commands can run within the terminal of Visual Studio Code (free!)
    • Intel
      mkdir -p ~/miniconda3
      curl https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -o ~/miniconda3/miniconda.sh
      bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
      
      rm -rf ~/miniconda3/miniconda.sh
      ~/miniconda3/bin/conda init bash
      ~/miniconda3/bin/conda init zsh
      
      
    • or M1
      mkdir -p ~/miniconda3
      curl https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh -o ~/miniconda3/miniconda.sh
      bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
      
      rm -rf ~/miniconda3/miniconda.sh
      ~/miniconda3/bin/conda init bash
      ~/miniconda3/bin/conda init zsh
      
    # Close and reopen your Terminal window
    
    cd ~
    conda update -y -n base -c defaults conda
    conda create -y -n env4ml python=3.11
    conda activate env4ml
    conda install -y pip
    yes | pip install scikit-learn pandas matplotlib seaborn
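The Intel-or-M1 choice in step 3 can also be made from the Python that ships with the Command Line Tools, since platform.machine() reports the processor architecture. The sketch below maps that value to the two installer URLs used above; miniconda_installer is a hypothetical helper name. One caveat: a Python running under Rosetta on an M1 machine reports x86_64, so "About This Mac" remains the reference.

```python
import platform

BASE_URL = "https://repo.anaconda.com/miniconda/"


def miniconda_installer(machine=None):
    """Return the macOS Miniconda installer URL matching the given architecture."""
    machine = machine or platform.machine()
    if machine == "arm64":  # Apple M1
        return BASE_URL + "Miniconda3-latest-MacOSX-arm64.sh"
    if machine == "x86_64":  # Intel
        return BASE_URL + "Miniconda3-latest-MacOSX-x86_64.sh"
    raise ValueError(f"unexpected architecture for a Mac: {machine}")


# Explicit arguments shown for illustration; call without argument on your Mac.
print(miniconda_installer("arm64"))
print(miniconda_installer("x86_64"))
```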
    

Usage

conda activate env4ml

Windows (10 or higher)

Install

We set up pip inside a conda environment called env4ml with Python version 3.11.
  1. Conda: These commands can run within the terminal of Visual Studio Code (free!)
    curl https://repo.anaconda.com/miniconda/Miniconda3-latest-Windows-x86_64.exe -o miniconda.exe
    start /wait "" miniconda.exe /S
    del miniconda.exe
  2. Basic toolboxes:
    conda update -y -n base -c defaults conda
    conda create -y -n env4ml python=3.11
    conda activate env4ml
    conda install -y pip
    pip install scikit-learn pandas matplotlib seaborn

Usage

conda activate env4ml

Web Browser

Using your own web browser for demonstrations and remote computing has become increasingly popular. First, select your operating system from the options above for standard installation instructions. Then, you can enhance your setup with one of these three "web browser" solutions:

Conclusion

This page outlines installation procedures for Artificial Intelligence development in Python, a rapidly growing field. We strongly recommend that beginners explore the free courses offered by Kaggle. Our choice of conda and pip as environment management systems is based on their distinct advantages:

Our approach leverages the strengths of both tools: we first create an environment with conda and then use pip within that environment to install packages. This strategy allows both pip install xxx and conda install xxx commands to be used within the same environment. To export your environment's package configuration, you can use:
conda list --export > requirements.txt
and you can recreate it with:
conda create --name ENVNAME --file requirements.txt
where ENVNAME is your chosen conda/pip environment name.
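Because the environment mixes conda and pip packages, it can help to see which entries of the exported file came from pip before recreating the environment. The sketch below assumes the name=version=build line format written by conda list --export, and assumes that pip-installed packages carry a build string starting with pypi; the sample text and the build strings in it are made up for illustration, and split_export is our own helper name.

```python
def split_export(text):
    """Separate conda-installed from pip-installed entries of an exported list."""
    conda_pkgs, pip_pkgs = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header comments
        parts = line.split("=")
        name = parts[0]
        version = parts[1] if len(parts) > 1 else ""
        build = parts[2] if len(parts) > 2 else ""
        if build.startswith("pypi"):
            pip_pkgs.append(f"{name}=={version}")  # pip requirement syntax
        else:
            conda_pkgs.append(line)
    return conda_pkgs, pip_pkgs


# Illustrative content only: versions and build strings are invented.
SAMPLE = """\
# platform: linux-64
python=3.11.0=build_0
pip=23.0=build_0
pandas=2.0.0=pypi_0
"""

conda_pkgs, pip_pkgs = split_export(SAMPLE)
print(conda_pkgs)
print(pip_pkgs)
```

If conda cannot resolve the pip-installed entries when recreating the environment, one option is to install those entries with pip after the environment has been created.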

If something goes wrong with your installation to the point that you need to remove it, you can use these commands:

cd ~
rm -rf miniconda*
rm -rf .conda*