Machine Learning, Computer Vision and Natural Language Processing with Python

Warith Harchaoui, Mohamed Chelali, Matias Tassano and Pierre-Louis Antonsanti
Special thanks to Azedine Mani

December 2018
Updated in April 2022


Introduction


Artificial Intelligence requires heavy computation. During the 2010s, the Deep Learning community paved the way for hardware acceleration, historically with Graphics Processing Units (GPUs) diverted from their original graphics purpose for the benefit of Applied Mathematics research. The aim of this webpage is to present a basic cheat sheet for programming in Machine Learning (i.e. Statistical Learning, Pattern Recognition, Artificial Intelligence, Data Science) for applications such as Computer Vision, Sound Processing and Natural Language Processing.

To the best of our knowledge, there are two hardware and software solutions for accelerating Machine Learning computations:

For this kind of acceleration in scientific programming outside Apple M1 and NVIDIA hardware, and in spite of high hopes regarding OpenCL (the non-proprietary, open equivalent of CUDA), there is no other realistic way, in terms of research community contributions, to perform Machine Learning, Computer Vision and Natural Language Processing in high-level (i.e. non-C/C++) languages with modern toolboxes. Still, more modest initiatives such as DeepCL and PlaidML are worth encouraging.

This page has been extensively used in the MAP5 lab in Applied Mathematics to conduct research in Machine Learning (ML), Computer Vision (CV) and Natural Language Processing (NLP) in Python. Please feel free to contact Warith Harchaoui for improvements and suggestions.

Programming in Artificial Intelligence?


Depending on the decade, we have observed several shifts in the Machine Learning (ML) corporate and academic environments: Java, Matlab, Python. In this section we observe that trillion-dollar companies chose Python, along with the associated funding and research community. For these reasons, we strongly recommend Python for your ML projects, in order to stay aligned with scientific and industrial trends. Choosing Python for prototyping has nothing to do with our personal tastes; we simply follow the contemporary trend. People using R or even Java can still use ONNX-R and DL4J.

Several toolboxes in Machine Learning (ML), Computer Vision (CV) and Natural Language Processing (NLP) have been released over the last decades. Many great ideas have been developed in all of them, but only a few remain popular in terms of scientific communities:

Recently in Deep Learning, Keras and Lightning have become popular within Python, as they use the above-mentioned reference toolboxes as backends.
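
As an illustration, here is a minimal Keras sketch (using the Keras API bundled with TensorFlow, which is installed in the sections below; the toy architecture and layer sizes are arbitrary):

import tensorflow as tf

# A tiny classifier: Keras describes the model, TensorFlow runs the computation
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()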

The basic meaning of model serving is to host machine-learning models (on the cloud or on premises) and to make their functions available via APIs so that applications can incorporate AI into their systems. Model serving is crucial, as a business cannot offer AI products to a large user base without making its product accessible. Deploying a machine-learning model in production also involves resource management and model monitoring, including operations statistics as well as model drift. For language compatibility, two solutions emerge as industry-grade machinery:

ONNX put a lot of effort into bringing deep and non-deep (e.g. scikit-learn) models into production. In practice, we appreciate the fact that one can prototype in one's favorite language (e.g. Python) and deploy via the ONNX format for the model, with ONNX Runtime for programming in C/C++, C#, Java, JavaScript and Objective-C, and MXNet for R, C/C++, Clojure, Java, JavaScript, Julia, Perl and Scala.
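
To make this concrete, here is a minimal sketch (assuming the skl2onnx and onnxruntime packages are installed, which is not covered by the install sections below): a scikit-learn classifier is exported to the ONNX format and then served with ONNX Runtime.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from skl2onnx import to_onnx
import onnxruntime as ort

# Train any scikit-learn model as usual
X, y = load_iris(return_X_y=True)
X = X.astype(np.float32)
model = RandomForestClassifier(n_estimators=10).fit(X, y)

# Export to the ONNX format; the input signature is inferred from a data sample
onnx_model = to_onnx(model, X[:1])
with open("model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

# Serve with ONNX Runtime (the same .onnx file can be loaded from C/C++, C#,
# Java, JavaScript, etc.)
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name
predicted_labels = session.run(None, {input_name: X[:5]})[0]
print(predicted_labels)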

For those who are not afraid of really programming :) there is the good old technique of writing brand new algorithms in C/C++ and calling them from higher-level languages. Matlab, R and Python offer such possibilities; for Python, the most effective experience we had was with pybind11 (for example, Facebook researchers chose it for fastText). Although Cython became very respected thanks to scikit-learn, we enjoy the efficiency of Numba. Apple now develops coremltools, which offers the possibility to benefit from M1 acceleration while still working in Python, which is good news, especially at inference time.
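
For instance, here is a minimal Numba sketch (assuming numba is installed via pip, which is not part of the install sections below): a naive pairwise squared-distance loop, slow in pure Python, is JIT-compiled to near-C speed.

import numpy as np
from numba import njit

@njit
def pairwise_sq_dist(X):
    # Naive O(n^2 d) loops: prohibitive in pure Python, fast once compiled
    n, d = X.shape
    out = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(d):
                diff = X[i, k] - X[j, k]
                s += diff * diff
            out[i, j] = s
    return out

X = np.random.rand(500, 16)
D = pairwise_sq_dist(X)  # the first call compiles, subsequent calls are fast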

Installation of Artificial Intelligence toolboxes


Ideally, we recommend an Ubuntu operating system with NVIDIA GPU acceleration for computing, possibly remotely controlled from a macOS machine. That being said, for prototyping with small datasets, programming and iterating on an Apple machine is comfortable with Visual Studio Code.

Ubuntu (20.04 LTS or higher)

Install

We install pip within a conda environment called ENVNAME=env4ml with Python version 3.8
  1. Conda
    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda.sh
    bash ~/miniconda.sh -b -p $HOME/miniconda
    Close and reopen your Terminal window
    source $HOME/miniconda/bin/activate
    conda init bash
    
    Close and reopen your Terminal window
    cd ~
    PYTHONVERSION=3.8
    ENVNAME=env4ml
    conda update -y -n base -c defaults conda
    conda create -y -n $ENVNAME python=$PYTHONVERSION
    conda activate $ENVNAME
    conda install -y pip
    yes | pip install scikit-learn pandas matplotlib seaborn opencv-python
  2. NVIDIA driver and CUDA
    For NVIDIA GPUs on Ubuntu, we recommend following the Google Cloud instructions, even on your local machines. This is mandatory for NVIDIA acceleration.
  3. TensorFlow: Google made the effort to provide a single TensorFlow install command line for any scenario
    yes | pip install tensorflow
  4. PyTorch
    Depending on your configuration, you will find what suits you on the PyTorch install page. For the CUDA version, type nvcc --version: if it ends, for example, with release 10.1, V10.1.243, then your CUDA version is 10.1 and your install command would be:
    yes | pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu101
    knowing that the suffix can vary (it would be /cu113 for CUDA 11.3). A short Python sanity check for the whole installation is given after this list.
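
Once the installation is done, the following minimal Python sanity check (a sketch, assuming the commands above succeeded) verifies that both TensorFlow and PyTorch are importable and can see the NVIDIA GPU:

import tensorflow as tf
import torch

# Both lines should report at least one GPU / CUDA device on a correctly
# configured NVIDIA machine
print("TensorFlow", tf.__version__, "GPUs:", tf.config.list_physical_devices("GPU"))
print("PyTorch", torch.__version__, "CUDA available:", torch.cuda.is_available())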

Usage

conda activate $ENVNAME

macOS (11 Big Sur or higher)


Install

We install pip within a conda environment called ENVNAME=env4ml with Python version 3.8
  1. Command Line Tools (Xcode)
    xcode-select --install
  2. Brew and wget
    cd ~
    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
    brew install wget
  3. Conda: determine whether you have an Intel or M1 (Apple Silicon) computer thanks to the "About This Mac" entry in the Apple menu at the top left of your screen.
    • Intel
      wget https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -O ~/miniconda.sh
      bash ~/miniconda.sh -b -p $HOME/miniconda
      
    • or M1
      wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh -O ~/miniconda.sh
      bash ~/miniconda.sh -b -p $HOME/miniconda
      
    Close and reopen your Terminal window
    source $HOME/miniconda/bin/activate
    conda init zsh
    
    Close and reopen your Terminal window
    # At the time of writing, we recommend Python 3.8 (but possibly not higher)
    cd ~
    PYTHONVERSION=3.8
    ENVNAME=env4ml
    conda update -y -n base -c defaults conda
    conda create -y -n $ENVNAME python=$PYTHONVERSION
    conda activate $ENVNAME
    conda install -y pip
    yes | pip install scikit-learn pandas matplotlib seaborn opencv-python
    
  4. TensorFlow for M1: special commands for Apple Silicon
    conda install -y -c apple tensorflow-deps
    yes | pip install tensorflow-macos
    yes | pip install tensorflow-metal
    
  5. TensorFlow for Intel: the standard command suffices
    yes | pip install tensorflow
    
  6. PyTorch for Intel/M1 (a short Python sanity check follows this list)
    yes | pip install torch torchvision torchaudio
    
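Once the installation is done, the following minimal Python check (a sketch; the MPS test is an assumption, since Metal support only exists in sufficiently recent PyTorch builds) verifies the Apple-specific backends:

import tensorflow as tf
import torch

# On M1, the Metal plugin should expose a GPU device to TensorFlow
print("TensorFlow", tf.__version__, "GPUs:", tf.config.list_physical_devices("GPU"))
# MPS (Metal) support for PyTorch only exists in recent builds, hence the guard
has_mps = hasattr(torch.backends, "mps") and torch.backends.mps.is_available()
print("PyTorch", torch.__version__, "MPS available:", has_mps)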

Usage

conda activate $ENVNAME

Windows (10 or higher)


Install

We install pip within a conda environment called ENVNAME=env4ml with Python version 3.8
  1. Conda
    Download and run this Miniconda3 Windows installer
    cd ~
    PYTHONVERSION=3.8
    ENVNAME=env4ml
    conda create -y -n $ENVNAME python=$PYTHONVERSION
    conda activate $ENVNAME
    conda install -y pip
    yes | pip install scikit-learn pandas matplotlib seaborn opencv-python
  2. TensorFlow: Google made the effort to provide a single TensorFlow install command line for any scenario
    yes | pip install tensorflow
  3. PyTorch
    Depending on your configuration, you will find what suits you on the PyTorch install page. A short Python check of the installation is given after this list.
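
Once the installation is done, the following minimal Python check (a sketch, assuming the packages above installed without errors) verifies that the scientific stack imports correctly:

import sklearn
import pandas
import matplotlib
import seaborn
import cv2
import tensorflow as tf
import torch

# Print one version per toolbox to confirm that every import succeeded
for name, module in [("scikit-learn", sklearn), ("pandas", pandas),
                     ("matplotlib", matplotlib), ("seaborn", seaborn),
                     ("OpenCV", cv2), ("TensorFlow", tf), ("PyTorch", torch)]:
    print(name, module.__version__)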

Usage

conda activate $ENVNAME

Web Browser


Using your own web browser for demos and for remote computing is increasingly popular. First choose your operating system above for the installation instructions, and then you can complement it with these three "web browser" solutions:

Conclusion


This page presented installation procedures for the emerging field of Artificial Intelligence in Python. We highly recommend that newcomers study the courses freely offered by Kaggle. We justify our usage of conda and pip by the following properties:

Our way of first using conda and then installing pip within each conda environment is about combining the best of both worlds.

Within a conda/pip environment that we build, you can use both pip install xxx and conda install xxx commands. To dump your toolbox configuration, you can use:

conda list --export > requirements.txt
and you can recreate it with:
conda create --name ENVNAME --file requirements.txt
where ENVNAME is your chosen conda/pip environment name.

If something goes wrong in your installation to the point that you need to remove it, you can use these commands:

cd ~
rm -rf miniconda*
rm -rf .conda*