Since December 2018 · Updated June 2026

A Practical Python Environment for Artificial Intelligence

A practical installation and tooling guide for AI, ML, CV and NLP in Python, written for AI scientists, students and lab teams.

By Warith Harchaoui, Mohamed Chelali, Matias Tassano, Pierre-Louis Antonsanti, Bachir Zerroug and Edmond Jacoupeau.

Battle-tested at the MAP5 applied-mathematics lab. Suggestions and corrections welcome — please contact Warith Harchaoui.

Introduction

This page answers a single, recurring question: what should I actually install today to do ML / CV / NLP work in Python?

It is opinionated on purpose. The default in section 1 is the one we install on every fresh laptop in the lab; the decision table in section 2 covers the situations where the default needs to bend. Sections 3–6 cover the libraries, the GPU question, the notebook / editor / agent layer and the path from notebook to deployable demo. Section 7 is the long, copy-pasteable per-OS recipe. Section 8 is the philosophy — when and why we keep an Open Source counterpart next to every commercial API.

The page is updated as the field moves; the install commands here are what we would run today, not what we would have run two years ago.

What should I install today?
Which setup matches my situation?
Core Python and ML stack
GPU choices: NVIDIA CUDA vs Apple Silicon
Notebooks, IDEs and AI coding agents
From notebooks to small applications
Full installation recipes
Historical and philosophical notes

1. What should I install today?

If you only read one section of this page, read this one.

You need Miniconda first. The commands below use conda, which comes with Miniconda (a small Anaconda). The install line is OS-specific, so grab the one-liner for your machine from section 7 — Ubuntu, macOS (Apple Silicon) or Windows 11 — then come back. If conda --version already works in your terminal, skip ahead.

Once conda is on your PATH, open a fresh terminal and run:

conda create -n env4ml python=3.12
conda activate env4ml
pip install scikit-learn pandas matplotlib seaborn jupyterlab
pip install torch torchvision torchaudio

That gives you classical ML (scikit-learn), data tooling (pandas, matplotlib, seaborn), interactive notebooks (JupyterLab) and deep learning (PyTorch). pip selects a platform-specific PyTorch wheel: on Apple Silicon it includes the mps back-end, and on Linux the default wheel bundles CUDA support. On native Windows the default wheel is typically CPU-only, so use the official selector there if you need CUDA.

Need an NVIDIA-CUDA-specific wheel, a Windows native path, or the classroom variant without PyTorch? Each one is a small tweak; the per-OS recipes are in section 7. Does your situation not match this default at all? Check the decision table in section 2. No local install? Google Colab hands you a remote GPU notebook with NumPy / pandas / scikit-learn / PyTorch pre-installed.
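
To confirm that the four install lines above landed in the active environment, a quick check along these lines can help. It is a minimal sketch using only the standard library; the `EXPECTED` mapping and the `check_install` helper are names chosen for this illustration, not part of any package.

```python
from importlib import metadata
from importlib.util import find_spec

# Import names (what you `import`) mapped to distribution names (what you `pip install`).
EXPECTED = {
    "sklearn": "scikit-learn",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "jupyterlab": "jupyterlab",
    "torch": "torch",
}


def check_install(expected=EXPECTED):
    """Return {distribution name: version string, or None if not importable}."""
    report = {}
    for import_name, dist_name in expected.items():
        if find_spec(import_name) is None:
            report[dist_name] = None
            continue
        try:
            report[dist_name] = metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            # Importable but without distribution metadata (unusual).
            report[dist_name] = "unknown"
    return report


if __name__ == "__main__":
    for name, version in check_install().items():
        print(f"{name:15s} {version or 'MISSING'}")
```

Run it with the env4ml environment activated; any line marked MISSING points at the pip command to re-run.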

Before you install — a few practical warnings

Do not install ML packages globally. One environment per project saves you from pip conflicts that take a day to diagnose.
Do not chase the newest Python blindly. Python 3.12 is currently a conservative, robust default for ML work. Newer Python releases exist, but ML libraries usually take a few months to fully support each one. In production and teaching, compatibility matters more than novelty.
GPU setup is where most installation pain happens. If a CPU-only install works, ship it first and add GPU support afterwards.
For teaching, reproducibility beats peak performance. A slightly slower setup that every student can reproduce is worth more than a fast one that breaks for a third of the class.
For deep learning at scale, CUDA is still the safest bet. Apple Silicon is excellent for prototyping; production-grade training still lands on NVIDIA.
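
The first two warnings can be turned into a small guard at the top of a setup script. The sketch below assumes the conda convention that an activated environment sets the `CONDA_DEFAULT_ENV` variable; the function names are hypothetical.

```python
import os
import sys


def in_isolated_env():
    """True when Python runs inside a venv or a non-base conda environment."""
    # venv / virtualenv: the interpreter prefix differs from the base install.
    in_venv = sys.prefix != sys.base_prefix
    # conda: an activated environment exports its name in CONDA_DEFAULT_ENV.
    conda_env = os.environ.get("CONDA_DEFAULT_ENV", "")
    in_conda_env = conda_env not in ("", "base")
    return in_venv or in_conda_env


def python_matches(expected=(3, 12)):
    """True when the interpreter's major.minor version equals `expected`."""
    return tuple(sys.version_info[:2]) == tuple(expected)


if __name__ == "__main__":
    if not in_isolated_env():
        print("Warning: this looks like a global or base install.")
    if not python_matches():
        print(f"Warning: running Python {sys.version_info.major}.{sys.version_info.minor}, not 3.12.")
```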

2. Which setup matches my situation?

The default install in section 1 covers most readers. If your situation is more specific — a particular GPU, a teaching constraint, a cloud-only workflow — pick the row that matches and follow that path.

Your situation	Recommended setup	Why
Laptop, CPU only, just getting started.	Python + `venv` / `uv` + JupyterLab + scikit-learn (+ PyTorch CPU when needed).	Smallest install, fewest moving parts, runs everywhere.
Apple Silicon Mac (M-series).	PyTorch with `mps` for most workflows; MLX for Apple-native experimentation.	Strong local performance and unified memory; limited for very large training jobs.
You have an NVIDIA GPU.	PyTorch with CUDA — install via the official selector, or conda-forge when binary deps fight you.	The most mature GPU path for deep learning at any scale.
No GPU and need one for a few hours.	Google Colab.	Free or paid GPU/TPU runtimes, zero local setup.
Cloud or self-managed server (Linux + NVIDIA).	Ubuntu LTS + Miniconda + PyTorch CUDA, driven via SSH or remote VS Code / Cursor.	Same stack as your laptop, real GPU horsepower, reproducible images.
Teaching a class or running a workshop.	CPU-only conda + pip env, pinned `requirements.txt`, Colab as the fallback.	Reproducibility beats performance. Every student should be able to follow along.
Heavy scientific Python (compiled deps, geo, bioinformatics).	conda-forge throughout, or a carefully pinned `uv` / `pip` environment.	Avoids most binary-dependency pain.
You want to turn an experiment into a small web app.	Streamlit, Gradio or Taipy — or Front if you want the result to look like your project, not the toolkit.	Section 6 walks through the notebook → demo pipeline.
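
For the teaching row, one way to produce the pinned `requirements.txt` is to read the versions already installed in a working environment. This is a sketch with the standard library only; the helper name and the package list are illustrative, and `pip freeze` is the usual alternative when you want every installed package rather than a chosen few.

```python
from importlib import metadata


def pinned_requirements(names):
    """Return sorted 'name==version' lines for the installed distributions in `names`."""
    lines = []
    for name in names:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Skip rather than guess a version for something that is not installed.
            continue
    return sorted(lines, key=str.lower)


if __name__ == "__main__":
    wanted = ["scikit-learn", "pandas", "matplotlib", "seaborn", "jupyterlab"]
    print("\n".join(pinned_requirements(wanted)))
```

Redirect the output to `requirements.txt` and students can reproduce the same versions with `pip install -r requirements.txt`.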

3. Core Python and ML stack

Across the decades, the ML community has shifted from Java to MATLAB to Python. The trillion-dollar companies have settled on Python, and the research community has followed with funding and tooling. We recommend Python for your ML projects to align with that scientific and industrial consensus — not as personal taste but as a pragmatic default. Teams on R or Java can still ship through ONNX-R and DL4J; Rust is steadily gaining ground — see "Are we learning yet?".

Recommended toolboxes

Ordered roughly chronologically by the first release of the lead tool, so the list doubles as a short history of the field. Notebooks, IDEs and AI coding agents have their own section (section 5) so this list stays focused on libraries and frameworks.

Computer Vision tools — libraries, models, annotation, hosted platform — since 2000. Five complementary pieces. Pick the one that matches the layer you are working at:
- Classical CV — OpenCV (2000). Image processing, geometric transforms, camera calibration, tracking. C++ core with Python bindings; still the right answer for everything that does not need a learned representation.
- Object detection / instance segmentation — Detectron2 (Meta FAIR, 2019). Modular PyTorch toolkit for detection, segmentation, keypoint and panoptic tasks. Mature recipes, model zoo, and a straightforward training loop — the default serious-CV stack when you want full control over the architecture and training recipe. For a faster practical path, the Ultralytics YOLO line (YOLOv8, YOLO11, YOLO12 — Python package pip install ultralytics) is the dominant ready-to-use detection / segmentation / pose / oriented-bounding-box stack; AGPL-3.0 with a commercial license for closed-source use.
- Annotation platform — CVAT (MIT-licensed) is the open-source reference for labelling image and video datasets — classification, detection, temporal labelling, segmentation, keypoints and OCR — and the Community edition is free. Paired with an AI coding agent, you can stand up sophisticated annotation pipelines without writing all the tedious plumbing yourself.
- Hosted CV platform + model hub — Roboflow (2020). Today's Roboflow is end-to-end: train, deploy and serve CV models, plus Roboflow Universe as a public hub of pre-trained models and datasets. They still ship Roboflow Annotate, but in 2026 their headline value is the training / deployment / inference stack, not annotation alone.
- Self-supervised feature backbones — DINOv3 (Meta FAIR, 2025). Strong general-purpose image embeddings trained without labels; use as a frozen backbone for downstream tasks (classification, retrieval, segmentation) when you have few labels or want a robust feature space without retraining.
Classical ML — since 2007 — scikit-learn. The canonical Python library for classification, regression, clustering and model selection. Stable API, good defaults, excellent docs — start here unless your problem clearly needs deep learning.
Deep learning — since 2015 — today I would usually start with PyTorch (2016): widely used in research and practice, large ecosystem, most new architectures land here first. TensorFlow (2015) is still important when you need its ecosystem, its production tooling or its mobile / edge deployment path (TF Lite / LiteRT).
Higher-level deep-learning wrappers — since 2015 — Keras 3 (2015; now multi-backend — runs on TensorFlow, JAX or PyTorch, with OpenVINO for inference) is the high-level API to consider when you want one Python interface across backends. PyTorch Lightning (2019) removes the training-loop boilerplate and gives you distributed training and checkpointing for free.
NLP + LLM tools — classical NLP, transformer models, local inference, retrieval, evaluation — since 2015. Five complementary pieces. Pick the one that matches the layer you are working at:
- Classical NLP + word embeddings — spaCy (2015) for industrial-strength tokenization, NER, dependency parsing and POS tagging. fastText (Meta, 2016) for very fast word embeddings and multilingual text classification — C++ core, Python bindings, still a strong low-cost baseline against transformer models.
- Transformer models + embeddings — Hugging Face Transformers (2018) for pre-trained encoders, decoders and seq2seq; sentence-transformers (2019) for dense semantic embeddings and retrieval. See also the standalone Hugging Face bullet below.
- Local LLM inference — Ollama (2023) is the easiest local-first daemon — it wraps llama.cpp for quantised inference on CPU, Apple Silicon (MPS / MLX) or CUDA. For production-scale serving with paged attention, continuous batching and OpenAI-API compatibility, use vLLM (2023). As a rule of thumb for model choice: the Qwen family is often the strongest option in the small / embedded range (when you need something that fits on a laptop, a phone or an edge device), while Gemma tends to be the most reliable choice at the larger end when you have a real GPU to run it on.
- Retrieval + RAG orchestration — LangChain or LlamaIndex (both 2022) to compose retrievers, prompts and tool calls. Pair with a vector store — FAISS (Meta) for the canonical in-process ANN index, or turbovec (2026; Rust core with Python bindings, built on Google Research's TurboQuant) when memory matters — it compresses float32 embeddings to 2- or 4-bit and fits ~10 M vectors in ~4 GB while keeping competitive recall, with no separate training step. My preferred ANN index for local RAG. Qdrant or Chroma when you need a separate service.
- LLM evaluation — DeepEval (2023) for assertion-style tests with LLM-as-judge in a Pytest-shaped runner; Ragas (2023) for RAG-specific metrics (context relevance, faithfulness, answer correctness).
Fairness + explainability + causality — bias metrics, local explanations, counterfactuals, causal effects, audit — since 2016. Six complementary pieces. Run them before the model ships, not after a regulator asks:
- Fairness metrics + mitigation — Fairlearn (Microsoft, 2018) ships demographic parity, equalized odds and a catalogue of mitigation algorithms (post-processing, reductions).
- Local explanations — SHAP (2017) unifies Shapley-value feature attributions across model families and is the de-facto answer for tabular and tree models. Shapash (MAIF, 2020) sits on top of SHAP and LIME with a business-readable layer — feature labels, a built-in webapp for non-ML stakeholders, and one-call export of individual-prediction reports. Pick it when the audience for the explanation is not the ML team. LIME (2016) is the older perturbation-based approach — still useful as a sanity check.
- Counterfactual explanations — FACET (BCG X, 2020) for supervised-learning explainability plus counterfactual simulation on top of scikit-learn. DiCE (Microsoft, 2020) generates diverse counterfactuals — "what minimal change would flip this prediction?"
- Bias auditing pipeline — Aequitas (CMU DSSG, 2018) wraps the metric zoo into an end-to-end audit with reports per protected group — useful when you need to show your work to a non-ML reviewer.
- Deep-net attribution + interactive what-if — Captum (Meta / PyTorch, 2019) for integrated gradients and other attribution methods on neural nets. What-If Tool (Google PAIR, 2018) is the TensorBoard / Colab plugin for tweaking inputs interactively.
- Causal inference — PyWhy (open-source ecosystem, 2022; spun out of Microsoft Research). Its core library DoWhy ships a four-step pipeline (model → identify → estimate → refute) for answering "does X cause Y?" rather than "can we predict Y from X?". EconML handles heterogeneous treatment effects when the answer depends on the subgroup. Causal counterfactuals are the sibling of the FACET / DiCE counterfactuals above — instead of "what input change would flip the prediction?" they ask "what intervention would change the outcome?"
Tabular-data preprocessing — since 2017 — skrub (originally dirty_cat, renamed in 2024; from the Probabl / Inria scikit-learn family). Turns messy real-world tables into scikit-learn-ready features: fuzzy joins on dirty string keys, high-cardinality categorical encoders (GapEncoder, MinHashEncoder, TableVectorizer), and a one-call tabular_learner() baseline. Use it whenever your CSV needs cleaning before it can reach a scikit-learn pipeline.
Tabular foundation models — since 2022 — A genuine surprise from the foundation-model wave: pre-trained transformers now match or beat well-tuned gradient-boosted trees on small-to-mid tables, with zero training on your data. TabPFN (Hutter lab, NeurIPS 2022; v2 in Nature, 2025) and TabICL (Inria SODA, ICML 2025; v3 extends in-context learning to larger tables and broader feature types) take your training CSV as in-context examples and predict on the test rows in a single forward pass — no gradient steps, no hyperparameter search. The "is XGBoost still king on tabular?" answer is finally shifting; try one before reaching for a stack you have to train yourself.
Pre-trained models for CV / NLP / Speech — since 2018 — Hugging Face. Model hub plus the transformers, diffusers and datasets libraries — the de-facto distribution layer for open-weights models. A one-line from_pretrained() is the modern equivalent of pip install.
AI testing tools — model scans, LLM unit tests, RAG eval, red-team, data validation — since 2022. Five complementary pieces. Testing an AI system needs more than a held-out accuracy number — pick the layer that matches the risk you actually run:
- ML model scans — Giskard (2022; open source on GitHub plus a hosted hub) scans tabular, NLP and LLM models for robustness, performance issues, hallucination, prompt injection and bias — the closest thing to a one-call "what's wrong with this model?" check.
- LLM unit testing — DeepEval (2023, open source on GitHub) wraps LLM-as-judge metrics in a Pytest-shaped runner — assertion-style tests for hallucination, answer relevancy, summarisation quality. Drop alongside your regular test suite.
- RAG-specific metrics — Ragas (2023) for context relevance, faithfulness and answer correctness on retrieval-augmented pipelines. Pair with DeepEval when the system uses both retrieval and free generation.
- LLM red-team + safety — PyRIT (Microsoft, 2024) is the Python Risk Identification Tool for automated red-teaming against generative AI. garak (NVIDIA, 2023) is the LLM vulnerability scanner — prompt injection, jailbreaks, data leakage, hallucination probes.
- Data + schema validation — Great Expectations (2017) and Pandera (2019) check the data before it reaches the model. You cannot test a model whose inputs you cannot audit — these are the upstream fence.
Machine Learning project tracking — since 2024 — Skore (by Probabl, the scikit-learn company). Sits on top of scikit-learn to give you opinionated evaluation, model comparison and a project-level dashboard. Use it the moment you have more than one model worth comparing.
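
The memory figure quoted in the retrieval bullet (roughly 10 M vectors in about 4 GB at 2- or 4-bit) can be sanity-checked with plain arithmetic. The sketch below counts raw storage only and ignores any per-vector or index overhead; the embedding width of 768 is an assumption made for this example, since the text does not state one.

```python
def index_size_gb(n_vectors, dim, bits_per_value):
    """Raw storage for n_vectors embeddings of `dim` values, in decimal gigabytes."""
    total_bits = n_vectors * dim * bits_per_value
    return total_bits / 8 / 1e9


N = 10_000_000   # ~10 M vectors, the figure quoted above
DIM = 768        # assumed embedding width; not stated in the text

for bits in (32, 4, 2):
    print(f"{bits:>2}-bit: {index_size_gb(N, DIM, bits):.2f} GB")
```

Under that assumption, float32 storage comes to 30.72 GB, 4-bit to 3.84 GB and 2-bit to 1.92 GB, which is consistent with the order of magnitude quoted above.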

AI helpers

A set of Python utility libraries Warith maintains on GitHub — the full index lives at harchaoui.org/warith/ai-helpers. Each one wraps a specific corner of the AI / media stack so you do not have to re-implement the same plumbing in every project.

os-helper — utility functions for working across operating systems (paths, environments, shell quirks). The dependency-less base layer most of the other helpers rely on.
audio-helper — loading audio, converting formats, separating sources, splitting / trimming. Wraps the messy parts of ffmpeg, librosa and friends behind a small Python API.
video-helper — load, convert and frame-extract video files; work with subtitle formats. The video-side counterpart to audio-helper, sharing conventions and CLI shape.
yt-helper — download videos, audio and thumbnails from YouTube, Vimeo, Dailymotion (via yt-dlp). The "give me a clip" layer above the raw downloader.
sftp-helper — utility functions for SFTP servers (upload, download, walk, mirror). Useful when your dataset or model artefacts live on an SFTP share rather than S3.

4. GPU choices: NVIDIA CUDA vs Apple Silicon

Two hardware paths dominate practical ML in 2026, and they cover different jobs. The choice is a practical one, not an ideological one — pick the one that matches the workload in front of you.

NVIDIA CUDA

The reference setup for serious GPU training. Mature ecosystem, the broadest model zoo, the most third-party tooling, and the only realistic path for large-scale training and high-throughput inference.

Best for: deep learning at scale, large-batch training, production inference.
Install via the official PyTorch selector for the right CUDA wheel.
Free research GPUs: the NVIDIA Academic Grant Program.

Apple Silicon (M-series)

Now genuinely useful for local ML work. Unified memory and quiet hardware make it excellent for experimentation, prototyping, teaching and many small-to-medium workflows. It is not a drop-in replacement for CUDA at the high end.

Best for: local development, prototyping, teaching, edge experiments.
PyTorch via the Metal Performance Shaders backend (mps device).
Apple's own MLX for NumPy-style research on Apple Silicon (see also mlx-examples).

Practical rule

Develop and prototype on whatever machine sits in front of you — Apple Silicon makes this comfortable. When the experiment outgrows the laptop, move to a CUDA box, a cloud GPU or a cluster. GPU setup is where most of the installation pain lives, so do the move when it has actually paid for itself, not before.
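
In code, the practical rule usually comes down to choosing the device once and writing everything else device-agnostically. A minimal sketch, assuming a reasonably recent PyTorch (the `mps` back-end check is guarded for older versions); `pick_device` is a name chosen for this illustration.

```python
def pick_device():
    """Return 'cuda', 'mps' or 'cpu' depending on what PyTorch can use here."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch not installed: nothing to accelerate
    if torch.cuda.is_available():
        return "cuda"
    # Older PyTorch builds have no `mps` back-end attribute at all.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    device = pick_device()
    print(f"Using device: {device}")
    # Typical use once PyTorch is installed:
    #   model = model.to(device); batch = batch.to(device)
```

The same script then runs unchanged on a MacBook, a CUDA workstation or a CPU-only classroom machine.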

5. Notebooks, IDEs and AI coding agents

The environment in which you actually write the code matters almost as much as the libraries underneath it. This section covers the three layers most ML practitioners touch every day: notebooks for exploration, editors for serious work, and the new generation of AI coding agents sitting on top of both.

Notebooks

JupyterLab — the default in-browser notebook for exploration, demos and teaching. Install with pip install jupyterlab, start with jupyter lab.
Marimo — since 2023 — a reactive notebook stored as a plain .py file (diff-friendly, no JSON, version-control friendly). Cells form a dataflow graph: change a value upstream and every downstream cell re-runs automatically — no hidden state, no "out-of-order Jupyter" debugging sessions. Built-in UI widgets (mo.ui.slider, mo.ui.dropdown, mo.ui.table) turn a notebook into a small reactive app with marimo run notebook.py. Use it as a modern Jupyter replacement when you want reproducibility, code review and lightweight ML demos in the same artefact.
Google Colab — Jupyter-style notebooks running on Google's GPUs / TPUs. Right answer when you need a remote GPU for a few hours, or a reproducible teaching environment with zero local setup.

Editors

VS Code is still the safe default for everyday ML development. Two AI-first forks have changed the landscape since 2023:

Cursor (Anysphere, 2023) is the original AI-first VS Code fork — deep inline completions, an agent mode that runs commands, and the editor most ML teams compare everything else against.
Google Antigravity (2025) is Google's agent-first VS Code fork — up to five parallel agents in a Manager view, multi-model (Gemini 3, Claude Opus 4.6, Claude Sonnet, GPT-OSS-120B), with a built-in Chrome view for visual verification of UI work.
The open-source equivalent in the same VS Code family is Cline (Apache-2 extension; bring-your-own-key for any provider, including local Ollama; top of the open-source SWE-bench leaderboard).

AI coding agents — a new abstraction level

Programming has always advanced one abstraction layer at a time. Machine code gave way to assembly. Assembly gave way to C. C gave way to Python, JavaScript, Rust. Each new layer let programmers express more intent and offload more mechanism onto a tool: the compiler. Large language models add the next layer — natural language as the source artefact, with code as the compiled output.

“The hottest new programming language is English.”

— Andrej Karpathy, tweet, January 24, 2023.

That is the optimist's framing. The same observation, viewed from the other end of the abstraction stack, comes from the creator of Linux:

“AI is a great new tool, but it's a tool, and when I see people saying, ‘Hey, 99% of our code is written by AI,’ I literally get angry, because those same people — I can pretty much guarantee — that 100% of their code is written by compilers.”

— Linus Torvalds, keynote at the Open Source Summit North America, Minneapolis, May 2026. Video clip.

Both statements are correct, at the same time. Karpathy is marking the new floor — most people will reach a useful program by writing English rather than Python. Torvalds is marking the ceiling — the engineer who understands what the compiler (or the model) actually produced is still the one who can debug it, optimise it and ship it. Engineering judgment, debugging, testing, architecture and taste have not gone anywhere. AI coding agents help most when the human already knows what they are trying to build — they amplify intent, they do not supply it.

For an ML student today, the practical synthesis is straightforward: treat AI coding agents as the next compiler — write your intent clearly, and read the output critically. The two we use most:

Claude Code

Anthropic's official CLI agent. Reads your repo, runs commands, edits files and follows skills declared under ~/.claude/skills/ (user-wide) or .claude/skills/ inside a repository (project-specific). Particularly strong at code review, refactors, end-to-end debugging and multi-file edits.

OpenCode

Open-source terminal coding agent that drives Claude, GPT or local models behind the same UX. Reads the same skill format from ~/.opencode/skills/. Pick it when you want to avoid vendor lock-in or run a local model for sensitive code.

6. From notebooks to small applications

Every ML project tends to follow the same trajectory:

Notebook → script → CLI → small web UI → deployable demo

You start in a notebook to find the model that works. You promote the useful cells to a script so the experiment is reproducible. You wrap the script in a CLI so a colleague can run it without reading your code. You wrap the CLI in a small web UI so a stakeholder can try it without a terminal. And, if it survives that, you deploy it.
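
The script-to-CLI rung is often just a thin argparse wrapper around the function you promoted from the notebook. The sketch below uses a placeholder `predict` function with a made-up scoring rule; only the wrapping pattern is the point.

```python
import argparse


def predict(text, threshold=0.5):
    """Placeholder for the function promoted from the notebook (hypothetical)."""
    score = min(len(text) / 100, 1.0)  # stand-in for a real model call
    return {"text": text, "score": score, "positive": score >= threshold}


def build_parser():
    parser = argparse.ArgumentParser(description="Score a piece of text.")
    parser.add_argument("text", help="input text to score")
    parser.add_argument("--threshold", type=float, default=0.5,
                        help="decision threshold (default: 0.5)")
    return parser


def main(argv=None):
    """Parse `argv` (or sys.argv when None), run the model, print the result."""
    args = build_parser().parse_args(argv)
    result = predict(args.text, threshold=args.threshold)
    print(result)
    return result

# In a real script, finish with:
#     if __name__ == "__main__":
#         main()
```

Keeping `main(argv)` callable from Python means the same entry point serves the terminal, the test suite and any web UI layered on top later.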

Different tools fit different rungs of this ladder. Here is how I pick.

Python-first auto-form generators

For the fastest path from notebook to interactive app, the Python-first options are Streamlit, Gradio and Taipy. Each generates a usable web form from Python code in minutes and covers the 80% case very well. The price you pay is that the result looks like a Streamlit, Gradio or Taipy app — their CSS is hard to override cleanly, so apps from any of them tend to look alike.

When you already have a CLI: Front

Many ML projects end up with a working Python CLI before anything else (argparse, click, typer). At that point, the question becomes "how do I put a usable interface on top of this CLI without rewriting it as a Streamlit app?". Front is one answer: an open-source skill for Claude Code and OpenCode that pins the agent to a single frontend stack — vanilla JavaScript, Tailwind CSS, Montserrat (or Inter) — and hands it a curated design system.

Asking the agent to "wrap this CLI in a GUI" produces a single-page index.html + app.js + Tailwind config that maps every argparse / click flag onto the right form control and streams the CLI's output to a log panel. No framework lock-in, no Python runtime, no "Gradio look" — just plain HTML you can edit.
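
To make the idea of mapping flags onto form controls concrete, here is an illustrative sketch of such a mapping for argparse. It is not Front's implementation; the mapping rules and function names are assumptions for this example, and it relies on argparse's private `_actions` list because argparse offers no public way to enumerate arguments.

```python
import argparse


def control_for(action):
    """Pick an HTML form control for one argparse action (illustrative mapping)."""
    if action.choices is not None:
        return "select"
    if action.nargs == 0:            # flags such as store_true take no value
        return "checkbox"
    if action.type in (int, float):
        return "number"
    return "text"


def describe_form(parser):
    """Return [(destination name, control type)] for every user-facing argument."""
    fields = []
    # `_actions` is a private argparse attribute (assumption: stable enough for a sketch).
    for action in parser._actions:
        if action.dest == "help":    # skip the automatic -h/--help entry
            continue
        fields.append((action.dest, control_for(action)))
    return fields


parser = argparse.ArgumentParser()
parser.add_argument("--epochs", type=int, default=10)
parser.add_argument("--model", choices=["small", "large"], default="small")
parser.add_argument("--verbose", action="store_true")
parser.add_argument("input")
print(describe_form(parser))
```

The printed list pairs each argument with a control type (number, select, checkbox, text), which is the kind of table a UI generator needs before it emits any HTML.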

Install with Claude Code

git clone https://github.com/warith-harchaoui/front.git
mkdir -p ~/.claude/skills
cp -r front/front-ui      ~/.claude/skills/front-ui
cp -r front/front-cli-gui ~/.claude/skills/front-cli-gui
cp -r front/front-publish ~/.claude/skills/front-publish
cp -r front/front-a11y    ~/.claude/skills/front-a11y

Or OpenCode

mkdir -p ~/.opencode/skills
cp -r front/front-* ~/.opencode/skills/

Repo: github.com/warith-harchaoui/front — public domain (Unlicense).

Why this matters for ML practitioners

Your CLI keeps owning execution. Front only emits the UI shell and a thin HTTP+SSE adapter on top of your existing CLI. You do not rewrite your training loop, your inference script or your evaluation harness.
The output looks like your project, not the toolkit's. Front emits plain Tailwind that you can read, audit and adjust.
It composes with the rest of your stack. Drop the emitted HTML into Tauri for a desktop app, into FastAPI for a hosted demo, or behind any reverse proxy.
Accessibility and i18n are built in. Focus rings, dark-mode peers, reduced-motion guards, alt-text drafting and a contrast audit are part of the skill, not a follow-up sprint.
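
The "thin HTTP+SSE adapter" idea can be sketched independently of any framework: run the CLI as a subprocess and turn each output line into a Server-Sent Events frame. This is a generic illustration under that assumption, not the adapter Front emits; `sse_events` is a hypothetical name.

```python
import subprocess
import sys


def sse_events(command):
    """Run `command` and yield each output line as a Server-Sent Events frame."""
    process = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so the log panel sees everything
        text=True,
        bufsize=1,                 # line-buffered on our side of the pipe
    )
    try:
        for line in process.stdout:
            yield f"data: {line.rstrip()}\n\n"
        exit_code = process.wait()
        # A named event lets the browser know the run has finished.
        yield f"event: done\ndata: {exit_code}\n\n"
    finally:
        process.stdout.close()
        if process.poll() is None:  # consumer stopped early: stop the CLI too
            process.kill()
            process.wait()


if __name__ == "__main__":
    for frame in sse_events([sys.executable, "-c", "print('epoch 1'); print('epoch 2')"]):
        print(frame, end="")
```

Any HTTP framework that supports streaming responses can return this generator with the `text/event-stream` content type; the CLI itself stays untouched.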

When the demo needs to ship

Once you have an HTML page driving a CLI, packaging is straightforward: Tauri for a desktop app, FastAPI for a hosted demo with a clean HTTP boundary, an ONNX export when you need the model to leave Python entirely. Each of these is a small additional step — not a full rewrite — because the CLI, the model and the UI stay loosely coupled.

Serving the model in production

Model serving hosts the trained model (in the cloud or on-premises) and exposes it through an API so applications can plug AI into their workflow. For cross-language portability, ONNX is both a model file format and a deployment runtime, backed by an open-source community together with Microsoft and the Linux Foundation AI. Prototype in Python, then deploy the ONNX file through ONNX Runtime in C / C++, C#, Java, JavaScript or Objective-C.

For performance-critical paths, the well-trodden route is still C / C++ called from a higher-level language. On the Python side, the best experience comes from pybind11 (Meta used it for fastText); Cython remains a solid option — scikit-learn is its most visible showcase; Numba is a pleasant just-in-time alternative for hot loops. Apple's coremltools lets you tap the Neural Engine from Python at inference time.

7. Full installation recipes

These recipes are the long form of section 1. Use them when you are setting up a fresh machine, when you need NVIDIA / CUDA drivers, or when you want a Linux-like environment on Windows. Throughout, the environment is named env4ml and the Python version is 3.12 — PyTorch and most maintained packages support 3.12 by default.

Ubuntu (22.04 LTS or higher)

Install Miniconda:

mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh
~/miniconda3/bin/conda init bash

Close and reopen the terminal, then create the environment:

ENV=env4ml
conda update -y -n base -c defaults conda
conda create -y -n $ENV python=3.12
conda activate $ENV
conda install -y pip
pip install scikit-learn pandas matplotlib seaborn jupyterlab

NVIDIA driver and CUDA: follow the Google Cloud instructions, which also work on a local machine. This step is mandatory for NVIDIA acceleration.

Then install PyTorch with the matching CUDA wheel via the official selector. For example, for CUDA 12.4 (swap cu124 for the tag the selector shows for your driver):

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124

Activate the environment for each session:

```
ENV=env4ml
conda activate $ENV
```

macOS (Sonoma 14 or higher, Apple Silicon)

On Apple Silicon Macs, the situation has improved dramatically. PyTorch can target Apple's Metal Performance Shaders backend through the mps device, which makes local experimentation and prototyping comfortable. Apple also develops MLX, a NumPy-like array framework with autodiff designed for ML research on Apple Silicon. Both are excellent for local work; for heavy training, CUDA on NVIDIA GPUs remains the reference (see section 4).

Command-line tools and Homebrew:

xcode-select --install
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
brew install wget

Install Miniconda (Apple Silicon, arm64):

mkdir -p ~/miniconda3
curl https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh -o ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh
~/miniconda3/bin/conda init zsh

Close and reopen the terminal, then create the environment:

ENV=env4ml
conda update -y -n base -c defaults conda
conda create -y -n $ENV python=3.12
conda activate $ENV
conda install -y pip
pip install scikit-learn pandas matplotlib seaborn jupyterlab
pip install torch torchvision torchaudio

Windows 11

For 2026 we recommend Windows 11 for native Windows users. For any serious ML work on Windows, we strongly recommend WSL2 (Windows Subsystem for Linux 2). WSL2 gives you a real Linux kernel and a Linux-like development environment that mirrors most cloud and server workflows — meaning the same installation commands work on your laptop, on a CI runner and on a remote GPU box.

Recommended path: WSL2

From an elevated PowerShell:

wsl --install -d Ubuntu

Then open the Ubuntu shell and follow the Ubuntu recipe above. NVIDIA CUDA in WSL2 is supported through the official NVIDIA CUDA on WSL guide.
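
If you are unsure whether a given terminal is running inside WSL2 or on native Windows, a small heuristic check from Python can settle it. The sketch assumes the common convention that WSL kernels include "microsoft" in their release string; treat it as a heuristic, not a guarantee.

```python
import platform


def running_in_wsl(release=None):
    """Heuristic: WSL kernels report 'microsoft' in their release string."""
    if release is None:
        if platform.system() != "Linux":
            return False  # native Windows or macOS
        release = platform.uname().release
    return "microsoft" in release.lower()


if __name__ == "__main__":
    print("Inside WSL" if running_in_wsl() else "Not inside WSL")
```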

Native Windows path

If you really need a native Windows environment (for example because a vendor SDK is Windows-only):

Install Miniconda from a Command Prompt (cmd.exe):

curl https://repo.anaconda.com/miniconda/Miniconda3-latest-Windows-x86_64.exe -o miniconda.exe
start /wait "" miniconda.exe /S
del miniconda.exe

Open an Anaconda Prompt (installed with Miniconda) so that conda is available, then create the environment:

set ENV=env4ml
conda update -y -n base -c defaults conda
conda create -y -n %ENV% python=3.12
conda activate %ENV%
conda install -y pip
pip install scikit-learn pandas matplotlib seaborn jupyterlab

No local install

If you would rather not touch your machine, Google Colab gives you Jupyter-style notebooks running on Google's GPUs and TPUs. It is the right starting point for one-off experiments, classroom assignments and short notebooks.

8. Historical and philosophical notes

Open Source — the backup generator

“AI is the new electricity.”

— Andrew Ng, Stanford Graduate School of Business, January 2017.

Andrew Ng's framing has been repeated so often that it now risks sounding obvious, but the trajectory it describes is real: AI is following the path electricity took a century ago — from a frontier technology, to a utility, to a quiet commodity that powers everything else. The analogy is useful here for a specific reason: electricity is also a textbook case of how to engineer for a serious dependency.

In any OECD country, a hospital is plugged into a power grid that is engineered, regulated, monitored and overwhelmingly reliable. It still keeps a diesel generator in the basement. Nobody runs the ICU off that generator on a normal Tuesday; it is there because, on the day the grid fails, "we trusted the utility" is not a defensible answer.

AI is settling into the same category of infrastructure as electricity, water and the road network. Hospitals, ministries, labs, schools and small businesses are coming to rely on it in ways that would be painful to undo. The important nuance — unlike the power grid — is that the commercial AI utilities are not run by the state. They are operated by a small number of private companies, in a small number of jurisdictions, each with its own pricing, policies and incident history.

The practical answer is the hospital's answer. Use a high-quality commercial AI when it is the best tool for the job, and keep an Open Source counterpart installed, tested and ready for the day a bill, a policy change, a rate limit or a geopolitical decision puts the commercial option out of reach. Open Source AI is, at the very least, the backup generator of the paid AI plant.

Even the most polished paid API goes down. Provider incidents, regional outages, revoked keys, sudden quota cuts, deprecated model snapshots, a vendor withdrawing from a country — none of these are hypothetical, and all of them have happened in production this year. Keeping an Open Source alternative on hand, ideally one you can run locally or on-prem, is not a philosophical stance; it is plain common sense — the same common sense that puts a generator in the basement of a hospital that has never lost power.

A pragmatic dual stack

Commercial side — Claude, ChatGPT, Gemini, Mistral — frontier capability, hosted inference, SLAs.
Open Source side — open-weight families from Meta Llama, Google Gemma, Qwen, Mistral and DeepSeek, runnable locally via Ollama, llama.cpp or vLLM.
Coding agent — the same pattern at the tooling layer: Claude Code for daily work, and OpenCode in the basement — ready to drive the same skills with a local model when the upstream API is unreachable or off-limits.
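
In code, the dual stack boils down to an ordered list of backends and a fallback rule. The sketch below illustrates that pattern only: commercial_api and local_model are hypothetical stand-ins (the first simulates an outage, the second echoes the prompt), and a real setup would replace them with calls to a hosted client and to a locally served model.

```python
class AllBackendsFailed(RuntimeError):
    """Raised when every backend in the chain has failed."""


def generate_with_fallback(prompt, backends):
    """Try each (name, callable) pair in order; return (name, answer) from the first success."""
    errors = []
    for name, backend in backends:
        try:
            return name, backend(prompt)
        except Exception as exc:  # outage, revoked key, quota cut, timeout...
            errors.append(f"{name}: {exc!r}")
    raise AllBackendsFailed("; ".join(errors))


# Hypothetical stand-in for a hosted commercial API (simulates an outage).
def commercial_api(prompt):
    raise ConnectionError("upstream API unreachable")


# Hypothetical stand-in for an open-weight model served locally.
def local_model(prompt):
    return f"[local] echo: {prompt}"


if __name__ == "__main__":
    used, answer = generate_with_fallback(
        "hello", [("commercial", commercial_api), ("local", local_model)]
    )
    print(used, answer)
```

The point of the design is that the fallback path is exercised by the same code as the primary path, so testing the generator is as simple as making the first backend fail on purpose.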

A virtuous cycle, not a rivalry

Beyond the backup-generator argument, the two sides actively feed one another. Commercial labs push the frontier; Open Source catches up — often within months — and then advances it further with ideas the next proprietary release absorbs in turn. The lineage is hard to miss: TensorFlow and PyTorch released into the open by Google and Meta; the original Transformer paper on arXiv; Hugging Face and its Model Hub; llama.cpp running serious models on a laptop; Llama's open weights; DeepSeek-R1's reasoning recipes. Each one collapsed a cost curve or unlocked a technique the proprietary world then built on — and the other way around.

The whole intellectual adventure of modern AI — the papers, the benchmarks, the reproducible baselines, the worldwide community that reads and improves them — could not have happened inside a closed ecosystem. The field is too vast, too cross-disciplinary and moving too fast for any single lab to carry it alone. Without Open Source, there is no PyTorch to learn from, no arXiv preprint to reproduce on a Friday night, no Kaggle kernel to fork, no Hugging Face card to fine-tune — and no field as we know it.

The dual stack is therefore both operational hygiene and a quiet act of recognition. The generator runs once a month, the staff knows how to start it, and the day the grid fails the hospital keeps working. On every other day, the grid itself owes a great deal to the open code, open papers and open weights that made the whole electricity of AI possible in the first place.

Frameworks that mattered

Older frameworks and GPU abstraction layers such as Apache MXNet, PlaidML and DeepCL helped shape the evolution of deep-learning tooling, but we would no longer recommend starting a new project with them; MXNet in particular has been retired to the Apache Attic. They are worth knowing for context, not for green-fielding.

Why conda + pip

Throughout this page, the default has been conda + pip. The rationale is pragmatic:

conda manages isolated environments and Python versions, and handles compiled scientific dependencies cleanly across platforms.
pip reaches the rest of the Python package universe, including ML libraries that ship faster on PyPI than on conda channels.

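Create the environment with conda, then install with either as needed. To export and recreate it, use conda env export, which also records pip-installed packages (conda list --export writes them as pypi_0 entries that conda create --file cannot resolve):

conda env export > environment.yml
conda env create --name ENVNAME --file environment.yml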

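If something goes wrong and you need a clean slate on Linux, macOS or WSL2 (this deletes Miniconda, every environment it holds and your conda configuration, assuming the default install location in your home directory):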

cd ~
rm -rf miniconda*
rm -rf .conda*

Beginners looking for hands-on practice will find the free Kaggle courses a good place to start once the environment in section 1 is up and running.