Warith Harchaoui | In Data Science: no Data, no Science

In the rapidly advancing field of Artificial Intelligence, it is common to observe a high volume and pace of both scientific and non-scientific publications. It is overwhelming, and I am often asked how to cope with it.

Fortunately, experienced scientists take the time to write comprehensive books that provide valuable insights and surveys of the field. Beyond the concise format of papers at top AI conferences, it is beneficial to delve deeper into the mathematical and algorithmic details of full-length books, both to understand the shorter works and to use the various toolkits available online effectively. It is in this context that I present a list of books that I find particularly noteworthy, along with some comments, for readers who are eager to engage in the exciting AI adventure.

Recently, I have taken the habit of asking questions to the consensus.app and hyperwrite.ai engines, which are both LLM-powered and run in RAG (retrieval-augmented generation) mode over curated sources (see explanations here).

I have not yet commented on all the books I like. It is difficult for me to comment on books by people I admire: I try to be relevant and straightforward in a way that is valuable to readers, so it takes time.

What is a Data Scientist?

Short answer by Josh Wills in May 2012 on Twitter:

Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician.

I am Dr Warith Harchaoui and I have been fortunate to work as a Data Scientist since the summer of 2014 at Oscaro.com. Since then, I have kept learning every day and gathering great resources. For beginners, I strongly recommend the following:

Kaggle’s micro-courses, which offer a few hours of hands-on experience in both Python and machine learning: Kaggle Learn.
Introduction to Machine Learning with Python.

Back when I was the administrator of my Ph.D. lab's GPU servers, I maintained this page to help people start programming in AI. Today, it is used by clients who are beginning to embrace AI within their IT departments.

For those who want to delve deeper into the academic side, here are several books that have guided me throughout my career.

The Elements of Statistical Learning, 2nd Edition
Trevor Hastie, Robert Tibshirani, Jerome Friedman, 2009

Any practitioner in AI/ML/DS needs clean theoretical foundations in their own area of expertise, and most of them can be found in this influential book on machine learning and statistical modeling. It is a go-to rigorous reference for supervised and unsupervised learning, neural networks, support vector machines, and more. After working through a chapter (in practice, even just the one needed for a given prototyping project), any doubt about the subject simply vanishes.

Pattern Recognition and Machine Learning
Christopher M. Bishop, 2006

This 738-page top-notch textbook offers a complete overview of the fields of Pattern Recognition and Machine Learning. The “pattern recognition” part of the title reminds me how powerful it is for engineers to build things that people do not understand yet, things literally called engines. It might be too subjective to say, but I really love this book (except for its peculiar cover; otherwise it would be perfect!).

The mathematical background is not too heavy and is kindly refreshed for the reader when necessary. Re-reading the various chapters consistently inspires me, thanks to their unique pedagogical approach that weaves back and forth between theory and practice.

Machine Learning: A Probabilistic Perspective
Kevin Murphy, 2012

These three books (roughly a thousand pages each) cover a wide range of topics in detail, including probability, optimization and linear algebra for machine learning, with particular attention to conditional random fields, L₁ sparsity regularization, and deep learning. Readers with a mathematical background will find them a great reference, and they are also a good choice for self-study.

The attempt to unify traditional and more recent topics provides valuable coherence and food for thought for building a scientific culture. These books are not only about fundamentals but also about the state of the art. Ideally, an undergraduate student considering a doctoral thesis should at least try to read the first volume, “Machine Learning: A Probabilistic Perspective”: if the student does not find it fascinating, maybe he or she should not pursue a Ph.D. in Machine Learning.

It is written in an easy-to-understand style, with pseudo-code for the most important algorithms and plenty of examples from real-world fields like biology, text processing, computer vision, and robotics. Instead of just giving you a bunch of random tricks and techniques, the book takes a closer look at graphical models to tackle probabilistic modelling in a clear and concise way.

Bayesian Reasoning and Machine Learning
David Barber, 2012

This 735-page book explains how established tools are used in a rapidly growing range of industrial applications, including search engines, DNA sequencing, stock market analysis, and robot locomotion. Beyond sterile “Bayesians vs. Frequentists” debates (the Machine Learning equivalent of troll discussions like “emacs vs. vim” or “Linux vs. Windows”), this is the first book I can think of that explains what “Bayesian Modelling” or “Graphical Models” actually mean. This hands-on text opens opportunities for computer science students with some taste for mathematics to go further.

This book recounts advances in the field of machine learning and graphical models. Before reading it, I did not understand the circles and arrows in articles claiming they were graphical models. Now these drawings are much clearer to me, and I sometimes draw some myself. I would even say that what makes this book unique is the integration of multiple disciplines through graphical models. In addition, the transition from traditional artificial intelligence to modern machine learning, executed with finesse, adds to the value of the book. It is written with clarity and should therefore be accessible to a diverse audience, including readers with varying levels of mathematical proficiency.

Computer Vision

Computer vision is an area of artificial intelligence that aims to replicate the capabilities of human vision by teaching computers to interpret and comprehend the visual environment in a similar way to humans. This field is applied to a variety of tasks, such as facial recognition, object detection, autonomous driving, and medical imaging.

Computer Vision: Algorithms and Applications, 2nd Edition
Richard Szeliski, 2022

This 2nd edition (1212 pages) is pleasantly entertaining while covering almost all the important subjects in Computer Vision: Filtering, Recognition, Feature Matching, Image Alignment, Motion Estimation, Computational Photography, Robotic Vision, Depth Estimation (from two or even a single photograph of the same scene), 3D, Rendering...

I highly recommend this book for newcomers trying to dive into the field.

Computer Vision: A Modern Approach, 2nd Edition
David Forsyth and Jean Ponce, 2011

This textbook (800 pages) was written by two living legends of Computer Vision: David A. Forsyth and Jean Ponce. Its main objective is to develop a scientific culture and strengthen mathematical reflexes for handling classic Computer Vision problems, from image modelling to understanding human activity.

The book is particularly comprehensive on building image features, computational geometry, image preprocessing, segmentation and object recognition, which gives insight beyond Computer Vision.

Multiple View Geometry in Computer Vision
Richard Hartley and Andrew Zisserman, 2004

The book (670 pages) covers the basic principles of Computer Vision with regard to understanding the structure of real-world scenes and reconstructing them using geometric, algebraic and algorithmic principles. This is fundamental not only for 3D representations but also for understanding 2D perspective in images and videos. Absorbing the writing style of Richard Hartley and Andrew Zisserman is also valuable for becoming a researcher oneself.

Natural Language Processing

Natural Language Processing (NLP) allows computers to interpret and comprehend human language. This is achieved through the use of algorithms and software that analyze large amounts of data and extract the meaning of text, enabling computers to understand language in a similar way to humans. NLP is applied in various contexts, including search engine optimization, automatic summarization, sentiment analysis, and natural language generation.

Neural Network Methods in Natural Language Processing
Yoav Goldberg, 2017

This long article (76 pages), which can be combined with the associated longer book (309 pages), is a fine first read on the kind of Natural Language Processing that finally works in practice! How can numbers express the words and expressions of human beings? How can the terrific idea of embeddings be used even beyond NLP? How can the Deep Learning artillery accomplish wonders, building on the seminal word2vec approaches of 2013? Readers will appreciate the author's straightforward and clear explanations.

Foundations of Statistical Natural Language Processing
Chris Manning and Hinrich Schütze, 1999

This 620-page book is old, but it summarizes very well the good practices of non-deep Natural Language Processing. It is nicely written and a source of inspiration even for non-NLP problems, especially for data pre-processing. I recommend it at least for understanding the problems tackled in recent publications, such as part-of-speech tagging, context-free grammars, topic extraction or information retrieval.

Signal Processing and Information Theory

Crisper signals and more precise information mean better-performing systems across all domains!

Signal processing focuses on the manipulation, analysis, and transformation of signals, such as sound waves, radio waves, images, or data from medical instruments. It plays a vital role in various contexts, including audio enhancement, image processing, medical imaging analysis, communication systems, and control systems. At its core, signal processing extracts meaningful information (like frequency, amplitude, or phase) from signals, or modifies them to improve their quality, clarity, or usability.

Information theory complements signal processing by offering the mathematical tools needed to understand, quantify, and optimize the transmission, storage, and processing of information. It underpins critical concepts like data compression, error correction, and channel capacity.
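
To make “quantify information” concrete, here is a minimal Python sketch of Shannon entropy, H(p) = -Σᵢ pᵢ log₂ pᵢ, measured in bits. The function name `shannon_entropy` is our own; it simply evaluates the textbook formula on a discrete distribution.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy, in bits, of a discrete distribution given as probabilities."""
    if not math.isclose(sum(probs), 1.0, rel_tol=1e-9, abs_tol=1e-12):
        raise ValueError("probabilities must sum to 1")
    # Terms with p = 0 contribute nothing, since p * log(p) -> 0 as p -> 0.
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(shannon_entropy([0.5, 0.5]))  # fair coin: 1.0 bit
print(shannon_entropy([0.9, 0.1]))  # biased coin: about 0.469 bits
print(shannon_entropy([1 / 8] * 8))  # uniform over 8 symbols: 3.0 bits
```

A fair coin carries exactly one bit and a biased coin carries less. By Shannon's source coding theorem, entropy is the minimum average number of bits per symbol that any lossless code can achieve for such an independent source, which is the link between information theory and data compression.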

Together, signal processing and information theory form the foundation of cutting-edge technologies such as telecommunications, digital media, artificial intelligence, and biomedical engineering. Their combined impact drives innovations ranging from noise reduction in audio recordings and efficient wireless communication to breakthroughs in diagnostics and imaging technologies, ensuring smarter, more effective systems across diverse fields.

A Wavelet Tour of Signal Processing, 3rd Edition
Stéphane Mallat, 2008

I recommend this legendary book (now in its third edition) even if you don't like wavelets. Its great value lies in its explanations of the links between algebra and signal processing (bases and projections), its refreshing insights into what a Fourier transform is, time-frequency analysis, sparsity, scale spaces, compression, inverse problems... all with a pleasant writing style. The associated website, A Wavelet Tour of Signal Processing, is just magical! I cannot resist also citing its awesome sibling website, Numerical Tours, by Gabriel Peyré (who helped extend the book in this latest edition).
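
To give a flavor of the “bases and projections” and sparsity ideas, here is a minimal NumPy sketch of one level of the orthonormal Haar wavelet transform (our own illustration, not code from the book or its website). A piecewise-constant signal yields all-zero detail coefficients, which is the kind of sparsity that makes wavelets useful for compression.

```python
import numpy as np

def haar_step(x):
    """One level of the orthonormal Haar wavelet transform (len(x) must be even)."""
    pairs = np.asarray(x, dtype=float).reshape(-1, 2)
    approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)  # local averages (low-pass)
    detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2)  # local differences (high-pass)
    return approx, detail

def inverse_haar_step(approx, detail):
    """Exact inverse of haar_step."""
    out = np.empty(2 * len(approx))
    out[0::2] = (approx + detail) / np.sqrt(2)
    out[1::2] = (approx - detail) / np.sqrt(2)
    return out

signal = np.array([4.0, 4.0, 4.0, 4.0, 1.0, 1.0, 7.0, 7.0])  # piecewise-constant toy signal
a, d = haar_step(signal)
print(d)                                             # every detail coefficient is zero
print(np.allclose(inverse_haar_step(a, d), signal))  # perfect reconstruction
# Orthonormal basis: the signal's energy is preserved (Parseval).
print(np.isclose(np.sum(signal**2), np.sum(a**2) + np.sum(d**2)))
```

Applying `haar_step` again to the approximation coefficients gives the multi-level transform; keeping only the largest coefficients is the simplest form of wavelet compression.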

This book is also available in French.

Information Theory, Inference, and Learning Algorithms
David MacKay, 2003

This 640-page book is a masterful text that provides a comprehensive exploration of the connections between information theory, Bayesian inference, and machine learning. It is rare to find such a rigorous yet practical book that presents foundational principles while continuing to feel cutting-edge (you would be surprised), even two decades after its publication in 2003.

Key features include intuitive examples that clarify complex concepts and interdisciplinary insights bridging theory and application. David MacKay also covers nice topics like error-correcting codes and probabilistic graphical models, ensuring this book remains a timeless classic for students, researchers, and practitioners alike.

Advances in Financial Machine Learning
Marcos Lopez de Prado, 2018

This 400-page book is a great primer on Machine Learning applied to Finance, and it seems to be earning the well-deserved status of a classic textbook. I also recommend it for the quality of its treatment of time series, even beyond Finance.

It is written for pragmatic people who want to understand, apply and experiment all at once, from the Sharpe ratio to proper cross-validation, with just the right amount of Computer Science concepts involved. The modelling and backtesting sections of the book will save you a lot of time and sweat when applying AI to time series.
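
Proper cross-validation for financial time series is a central theme of the book. The following is a simplified sketch in the spirit of the purging and embargo ideas it discusses, not the book's own code: test folds stay contiguous and in time order, and training samples too close to each test fold are dropped to limit leakage from overlapping labels. The function name and the `purge`/`embargo` parameters (counted in samples rather than derived from label horizons) are our own assumptions.

```python
import numpy as np

def purged_kfold_indices(n_samples, n_splits=5, purge=0, embargo=0):
    """Yield (train_idx, test_idx) for time-ordered K-fold CV without shuffling.

    Simplifying assumption: labels span at most `purge` samples, so the `purge`
    samples just before each test fold are removed from training, and the
    `embargo` samples just after it are removed as well.
    """
    indices = np.arange(n_samples)
    for test_idx in np.array_split(indices, n_splits):
        start, stop = test_idx[0], test_idx[-1] + 1
        keep = (indices < start - purge) | (indices >= stop + embargo)
        yield indices[keep], test_idx

for train, test in purged_kfold_indices(20, n_splits=4, purge=1, embargo=2):
    print(f"test {test[0]}-{test[-1]}, train size {len(train)}")
```

Unlike a shuffled K-fold, no training sample sits right next to a test sample in time, so information from overlapping labels or serial correlation is less likely to inflate backtest scores.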

Introduction to Information Retrieval
Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, 2008

This comprehensive 496-page textbook is an essential resource for understanding search engine technology, text classification, and web information retrieval (IR), from classic approaches to modern ones.

The book adeptly balances theoretical concepts with practical applications while covering a wide array of topics, including Boolean and vector space retrieval models, evaluation metrics, indexing, query expansion, and machine learning approaches in IR. The authors' clear explanations and inclusion of real-world examples facilitate a deeper understanding of the material.
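
The vector space retrieval model mentioned above can be sketched in a few lines of plain Python: documents and queries become TF-IDF vectors and are ranked by cosine similarity. This uses one common weighting (raw term frequency times log(N/df)) among the several variants the book discusses; the toy corpus and function names are our own.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (vectors, idf); each vector maps term -> tf * idf."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(tokens).items()} for tokens in tokenized]
    return vectors, idf

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def search(query, docs):
    """Rank documents by cosine similarity to the query's TF-IDF vector."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms absent from the corpus get zero weight.
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(query.lower().split()).items()}
    scores = [cosine(q, d) for d in vectors]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True), scores

docs = [
    "the cat sat on the mat",
    "dogs chase cats in the park",
    "stock market prices fell sharply",
]
ranking, scores = search("market prices", docs)
print(ranking[0])  # 2: the only document mentioning market prices
```

Real engines add an inverted index so that only documents sharing a query term are scored, along with tokenization, stemming and length normalization, all of which the book covers.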

One of the book's strengths lies in its treatment of web search, addressing challenges such as crawling, link analysis, and handling large-scale data. This focus is particularly relevant given the ever-growing importance of effective information retrieval in the digital age.

Paradoxically, despite huge advancements in NLP and semantic search, the book remains highly relevant because of its wealth of foundational research. Many modern breakthroughs, such as semantic vectorization for text, images, and videos, have been developed to catch up with and leverage the concepts and techniques outlined here, demonstrating how the field has continually built upon this rich literature.

Reinforcement Learning

Reinforcement Learning (RL) is about making machines learn from their environment and take actions that maximize rewards. To do this, the computer is given a numerical goal or objective, then receives feedback after each action it performs, in the form of punishments and rewards that are also numerical. The computer then adjusts its actions according to this feedback, learning from its mistakes and optimizing its behavior over time. Reinforcement learning is applied in a variety of fields, including games, robotics and autonomous vehicles. In general terms, reinforcement learning is the process of teaching a computer to choose the most effective actions in a given environment in order to maximize its cumulative reward.
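
To make this reward-feedback loop concrete, here is a minimal tabular Q-learning sketch on a made-up five-cell corridor where only reaching the rightmost cell is rewarded. Q-learning is one standard algorithm among many (it is covered in Sutton and Barto's book); the environment, hyperparameters, and function names are illustrative assumptions.

```python
import random

N_STATES = 5        # cells 0..4 of a toy corridor; cell 4 is the goal
ACTIONS = (-1, +1)  # move left or move right

def step(state, action):
    """Hypothetical toy environment: reward 1 only when the goal cell is reached."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def greedy(q, state, rng):
    """Best-valued action in `state`, breaking ties at random."""
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])

def q_learning(episodes=1000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: explore with probability epsilon, otherwise exploit.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy(q, state, rng)
            next_state, reward, done = step(state, action)
            # Move Q(s, a) toward reward + discounted value of the best next action.
            target = reward if done else reward + gamma * max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q

q = q_learning()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)  # greedy action in each non-goal cell (+1 means "move right")
```

The discount factor `gamma` makes earlier rewards worth more, so the learned values decrease with the distance to the goal and the greedy policy ends up heading straight for it.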

Reinforcement Learning, 2nd Edition
Richard S. Sutton and Andrew G. Barto, 2018

As the name suggests, this 557-page book provides an in-depth introduction to Reinforcement Learning (RL) by two authority figures of this community: R. Sutton and A. Barto. It is a must-read to understand RL, and it assumes no prerequisite knowledge beyond the undergraduate level. It is perfect for anyone who wants to know more about RL, and this second edition is updated with deep learning approaches.

In the new chapters of this edition, readers can appreciate the relationships between RL and Optimal Control, as well as a chapter focused on famous feats such as AlphaGo, AlphaGo Zero, Atari game playing and IBM Watson.

Algorithms and Optimization

Algorithms are sets of instructions for solving problems in a systematic way. Optimization is the process of finding the best solution to a problem according to a given criterion. In essence, algorithms and optimization can be viewed as sister disciplines for improving efficiency. They are cornerstones of artificial intelligence, where they are used, for instance, to adjust model parameters to fit the data. Mastering algorithms and optimization techniques provides theoretical, but more importantly practical, tools to create new methods and adapt old ones to the specificities of your real-world problems.

Introduction to Algorithms
Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein, 2009

This 1312-page book is legendary. Don't be fooled by the word “introduction” or by its relative age (2009): I would consider anyone who masters this book very competent. It is considered a must-read by many members of the AI community and of the wider computer science community.

What makes it so special is that the chapters are both comprehensive and precise, with a particular effort to be simple but not simplistic. In practice, I have saved a lot of time in my work thanks to its chapters on parallel (multithreaded) computation and on how to use divide-and-conquer algorithms, dynamic programming and greedy algorithms to solve general problems, whatever your preferred or fashionable programming language.
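
As one small example of the dynamic programming mindset, here is the classic Levenshtein edit distance (our choice of illustration, not an excerpt from the book): each table cell reuses already-solved subproblems instead of recomputing them, turning an exponential recursion into O(len(a) × len(b)) work.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    # Before processing a[i-1], prev[j] is the distance between a[:i-1] and b[:j].
    # Only two rows of the DP table are kept in memory.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            )
        prev = curr
    return prev[-1]

print(edit_distance("kitten", "sitting"))  # 3
```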

Convex Optimization
Stephen Boyd and Lieven Vandenberghe, 2004

The “Boyd” is a gentle yet rigorous 727-page “first book” for newcomers to Numerical Optimization. Every time we hear about training or learning from data, it is basically optimization, even beyond AI. Convex optimization problems are special cases in which any local minimum is a global one, so they can be solved reliably and efficiently; they are also used to tackle non-convex problems through successive approximations, which makes them crucial in Machine Learning (and Deep Learning).

The exercises are so good that I sometimes suspect authors of scientific articles of taking inspiration from them and simply extending them into valuable publications. I also appreciate this book for developing intuitions and interpretations of the concepts and methods. I cannot write about this book without mentioning its well-known companion toolbox CVXPY, a Python modelling language for convex optimization that is really helpful for scientists and practitioners.
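
To illustrate why convexity matters, here is a minimal NumPy sketch (our own example, not from the book) of plain gradient descent on least squares, a convex problem: starting from zero, it reaches the same global minimizer as NumPy's direct solver. The step size 1/L, with L the largest eigenvalue of AᵀA (the Lipschitz constant of the gradient), is a standard safe choice. CVXPY would let you state the same problem declaratively; we stick to NumPy here to keep the sketch dependency-light.

```python
import numpy as np

def least_squares_gd(A, b, n_iter=5000):
    """Minimize f(x) = 0.5 * ||A x - b||^2 by gradient descent with step 1/L."""
    L = np.linalg.eigvalsh(A.T @ A).max()  # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)  # gradient of f at x
        x -= grad / L
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 3))  # random toy data
b = rng.normal(size=50)
x_gd = least_squares_gd(A, b)
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_gd, x_exact))  # True: gradient descent reaches the global minimizer
```

On a non-convex objective, the same loop could stall in a local minimum that depends on the starting point; convexity rules that out.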

Numerical Optimization, 2nd Edition
J. Frédéric Bonnans, J. Charles Gilbert, Claude Lemaréchal and Claudia A. Sagastizábal, 2006

Numerical Optimization is ubiquitous in science and engineering as nicely explained in the introduction. It is a key component of many algorithms in machine learning, signal processing, image processing, computer vision, robotics, and many other fields.

This 508-page book presents the main concepts and algorithms in a unified and accessible manner, with a focus on the practical aspects of their implementation. The authors are renowned in this field and have taught this course for many years; they also have experience with optimization problems in energy management, geoscience and the life sciences.

When in doubt while designing new AI/ML algorithms, this is the book I read first for confirmation and inspiration. When people don't find their answers in the “Boyd”, I recommend this one. The book is intended for graduate students and researchers, but I have no problem admitting that I consult it on a regular basis.

This book is also available in French.

Numerical Recipes, 3rd Edition
William H. Press, Saul A. Teukolsky, William T. Vetterling and Brian P. Flannery, 2007

“Numerical Recipes” is a famous and comprehensive 1256-page book on scientific computing techniques. It covers a wide range of topics, including linear algebra, the computer science involved, and various numerical methods and algorithms.

This is typically the kind of book that could help you design heavy computational algorithms in C/C++ or Fortran called from high-level languages like Python. It is quite rare to find such an easy and precise book to read, co-authored by world experts from academia and industry. I have been using this book for many years and still find it insightful. It is a must-have book for any serious scientist or engineer who wants to deliver reliable software on a large scale.

Computational Optimal Transport
Gabriel Peyré and Marco Cuturi, 2019

This 209-page book reviews the topic of Optimal Transport with a focus on numerical methods and their applications at various scales: small, medium and large. A standout feature of this book is the accompanying website, which boasts impressive teaching materials, a wealth of literature, and high-quality toolboxes like the Python Optimal Transport (POT) toolbox (developed by Rémi Flamary and Nicolas Courty).

Starting with a history of Optimal Transport (introduced by Gaspard Monge in 1781), the book guides readers through a comprehensive survey of the field, especially the concept of entropic regularization and how it has enabled the use of Optimal Transport in large-scale settings in fields like Imaging Sciences (such as Color or Texture Processing), Computer Vision, Computer Graphics (for shape manipulation), and Machine Learning (for tasks like Regression, Clustering, Classification, Density Fitting, and even Content Generation by imitation). To the best of my knowledge, this is the only book to cover Optimal Transport from such a precise computational angle.
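
Entropic regularization leads to the Sinkhorn algorithm, which alternately rescales the rows and columns of a Gibbs kernel exp(-C/ε) until the transport plan has the prescribed marginals. Below is a bare-bones NumPy sketch on two small 1-D histograms; the regularization strength `eps`, the grid, and the iteration count are arbitrary choices of ours. In practice, a library such as POT (for instance `ot.sinkhorn`) handles numerical stability far better, for example with log-domain variants when `eps` is small.

```python
import numpy as np

def sinkhorn(a, b, cost, eps=0.05, n_iter=3000):
    """Entropy-regularized OT plan between histograms a and b (each summing to 1)."""
    K = np.exp(-cost / eps)  # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)    # rescale rows so they sum to a
        v = b / (K.T @ u)  # rescale columns so they sum to b
    return u[:, None] * K * v[None, :]  # plan P = diag(u) K diag(v)

# Toy example: two Gaussian-shaped histograms on a 1-D grid.
x = np.linspace(0, 1, 20)
a = np.exp(-((x - 0.25) ** 2) / 0.01)
a /= a.sum()
b = np.exp(-((x - 0.75) ** 2) / 0.01)
b /= b.sum()
cost = (x[:, None] - x[None, :]) ** 2  # squared Euclidean ground cost

P = sinkhorn(a, b, cost)
print(np.allclose(P.sum(axis=1), a), np.allclose(P.sum(axis=0), b))
print("entropic transport cost:", np.sum(P * cost))
```

Each iteration only needs matrix-vector products, which is what makes entropic Optimal Transport practical at large scale and easy to run on GPUs.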
