Papers
- PhD - Statistical learning for selectivity estimation in relational databases (manuscript, slides)
- Selectivity correction with online machine learning - BDA 2020
- Selectivity Estimation with Attribute Value Dependencies using Linked Bayesian Networks - TLDKS 2020
- An Approach Based on Bayesian Networks for Query Selectivity Estimation - DASFAA, 2019
- Entropic Variable Projection for Explainability and Intepretability - 2018
- Master 2 year internship at HelloFresh (report, slides)
- Master 1 year internship at Privateaser (report, slides)
- Undergraduate internship at INSA Toulouse (report, slides)
- Detailed solutions to the first 30 Project Euler problems
Talks
- A brief introduction to online machine learning - Hong Kong Machine Learning, 2020
- Online machine learning with decision trees - Toulouse AOC workgroup, 2020
- Our solution to the IDAO 2020 qualifiers - virtual seminar, 2020
- Global explanation of machine learning with sensitivity analysis - MASCOT-NUM, Paris, 2020
- The benefits of online learning - Quantmetry, Paris, 2019
- The benefits of online learning - Element AI, London, 2019
- The benefits of online learning - Airbus BizLab, Toulouse, 2019
- An approach based on Bayesian networks for query selectivity estimation - DASFAA, 2019
- Machine learning incrémental: des concepts à la pratique - Toulouse Data Science, 2019
- Online machine learning with creme - PyData, Amsterdam, 2019
- Docker for data science - HelloFresh, Berlin, 2017
- Challenge Big Data - Toulouse, 2017
- Forecasting bicycle-sharing usage - Toulouse Data Science, 2016
Datasets
Blogroll
This is a list of blogs I regularly scroll through.
- Tim Salimans on Data Analysis
- Randal Olson
- Sam & Max – French and NSFW!
- Sebastian Raschka
- Clean Coder
- Pythonic Perambulations
- Erik Bernhardsson
- otoro
- Terra Incognita
- Real Python
- Airbnb Engineering
- No Free Hunch
- The Unofficial Google Data Science Blog
- will wolf
- Edwin Chen
- Use the index, Luke!
- Jack Preston
- Agustinus Kristiadi
- DataGenetics
- Katherine Bailey
- Netflix Research
- inFERENce
- Hyndsight – Rob Hyndman is a time series specialist.
- While My MCMC Gently Samples
- Ines Montani – By one of the founders of spaCy.
- Stephen Smerity
- Peter Norvig
- IT Best Kept Secret Is Optimization – By Jean-Francois Puget, aka CPMP.
- explained.ai
- Better Explained
- Genetic Argonaut
- pandas blog
- Towards Data Science
- Linear Digressions – data science podcasts.
- Not so standard deviations – more podcasts.
- Talking Machines – even more podcasts.
- Practical AI – here be podcasts.
- Probably Overthinking It
- Simply Statistics
- Practically Predictable
- koaning – By Vincent Warmerdam, who made calmcode.
- blogarithms
- Possibly Wrong
- FastML
- Parameter-free Learning and Optimization Algorithms
- Todd W. Schneider – This guy is really good at exploratory data analysis.
- Yann Thaddée – Not directly related to data science but interesting nonetheless.
- Colins Blog
- Fabien Sanglard – Nothing to do with data science, but such good taste!
- The Glowing Python – By the creator of MiniSom, which is worth checking out too.
- Matt Hancock
- Francis Bach – Someone with an h-index of 80+ who takes the time to blog is worth reading.
- Gwern Branwen – Cool in a weird way.
- Libres pensées d’un mathématicien ordinaire
- Count Bayesie
- Jim Savage
- Nick Higham – A lot of well explained algebra.
- Calmcode – Not a blog per se, but a nice collection of short, to-the-point tutorials about various tools.
- Chris Said
- Andrey Akinshin
- Single Lunch
Hall of fame
The following is a hall of fame of papers, books, and blog posts that have a very high signal-to-noise ratio, at least in my book. I highly recommend reading some of them when you get time.
- The Elements of Statistical Learning - Jerome H. Friedman, Robert Tibshirani, and Trevor Hastie
- Machine Learning - Tom Mitchell – I think this wonderful textbook is under-appreciated.
- Artificial Intelligence: A Modern Approach - Russell & Norvig
- mlcourse.ai – Of all the introductions to machine learning, I think this is the one that strikes the best balance between theory and practice.
- Machine learning cheat sheets - Shervine Amidi
- Kalman and Bayesian Filters in Python - Roger Labbe – Kalman filters are notoriously hard to grok; this tutorial nicely builds up the steps to understanding them.
- CS231n Convolutional Neural Networks for Visual Recognition - Stanford
- Algorithmes d’optimisation non-linéaire sans contrainte (French) - Michel Bergmann
- Graphical Models in a Nutshell - Koller et al.
- Rules of Machine Learning: Best Practices for ML Engineering - Martin Zinkevich – You should read this once a year.
- A Few Useful Things to Know about Machine Learning - Pedro Domingos – This short paper summarizes basic truths in machine learning.
- Choose Boring Technology - Dan McKinley
- How to Write a Spelling Corrector - Peter Norvig – Magic in 36 lines of code.
- MCMC sampling for dummies - Thomas Wiecki
- Your Easy Guide to Latent Dirichlet Allocation
- An Intuitive Explanation of Convolutional Neural Networks - Ujjwal Karn
- An overview of gradient descent optimization algorithms - Sebastian Ruder
- How to explain gradient boosting - Terence Parr and Jeremy Howard – A very good introduction to vanilla gradient boosting with step by step examples.
- Why Does XGBoost Win “Every” Machine Learning Competition? - Didrik Nielsen – This Master’s thesis goes into some of the details of XGBoost without being too bloated.
- Good sleep, good learning, good life - Piotr Wozniak – Extremely long and nothing to do with data science, but a very thorough essay nonetheless on how to properly sleep.
- Make for data scientists - Paul Butler – I believe Makefiles are yet to be rediscovered for managing data science pipelines.
- Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations – Just read it.
- The Cramér-Rao Lower Bound on Variance: Adam and Eve’s “Uncertainty Principle” - Michael Powers
- Kaggle contest on Observing Dark World - Cam Davidson-Pilon – If you’re not convinced about the power of Bayesian machine learning then read this and get your mind blown.
- A Concrete Introduction to Probability (using Python) - Peter Norvig – Extremely elegant Python coding.
- The Hungarian Maximum Likelihood Trick - Louis Abraham
- Machine Learning for Signal Processing - University of Illinois
- Don’t Call Yourself A Programmer, And Other Career Advice
- Tidy Data - Hadley Wickham – If you like playing with data then you need to be aware of this one.
- Gaussian Process, not quite for dummies - Yuge Shi – Gaussian processes are quite difficult to understand (at least, for me) but Yuge gives some great visual intuitions.
- Continuous Delivery for Machine Learning - Martin Fowler
- Memos - Sriram Krishnan
- Frequentism and Bayesianism: A Python-driven Primer - Jake VanderPlas
- Multiworld Testing Decision Service: A System for Experimentation, Learning, And Decision-Making
- Machine Learning: The High-Interest Credit Card of Technical Debt - Google
- Variational Inference: A Review for Statisticians - David Blei and his flock
- The Performance of Decision Tree Evaluation Strategies - Andrew Tulloch
- Hidden Technical Debt in Machine Learning Systems - Google
- Distill: Why do we need Flask, Celery, and Redis? (with McDonalds in Between) - Lj Miranda – A good example of the difference between abstract ideas and implementation details.
- Darts, Dice, and Coins: Sampling from a Discrete Distribution - Keith Schwarz
- Simplifying Graph Convolutional Networks - Felix Wu et al. – A nice example of putting the horse before the cart.
- MIT 6.867 machine learning course notes - Tommi Jaakkola – For people who enjoy concise mathematical notation.
- A Recipe for Training Neural Networks - Andrej Karpathy
- The Bitter Lesson - Richard Sutton
- The Best Medium-Hard Data Analyst SQL Interview Questions – There are some great interactive SQL tutorials out there, such as SQLBolt and Select Star SQL, but this one takes the cake due to its complexity.
- Rules of Programming - Rob Pike
- Transformers from scratch - Peter Bloem
- Understanding Matrix capsules with EM Routing - Jonathan Hui
- A Machine Learning Primer - Mihail Eric – A good read for beginners in machine learning algorithms.
- Fitting Bayesian structural time series with the bsts R package - Steven L. Scott
- Novelist Cormac McCarthy’s tips on how to write a great science paper - Savage and Yeh
- Emerging Architectures for Modern Data Infrastructure - Matt Bornstein, Martin Casado, and Jennifer Li – Gives a good overview of the data analysis tooling landscape as of late 2020.
- Fred’s ImageMagick Scripts
- Cameras and Lenses - Bartosz Ciechanowski – 100% worth a read.
- Ditherpunk - Surma
- Command-line Tools can be 235x Faster than your Hadoop Cluster - Adam Drake
Eye candy
- Tyler Hobbs – The god of generative art.
- Some Jean Giraud stuff
- Mauro Martins
- A new way to knit by Petros Vrellis
- A fascinating article about Manolo Gamboa Naon
- Some Ukiyo-e
- Turtletoy
- Dwitter
- generated.space
- Pixel art by Marcus Blättermann
- Nick Barnes’ football bible
- Simon Stålenhag
- Syd Mead (who worked on Blade Runner)
- Michael Fogleman’s blog
- World of Warcraft art by Dreamwalker
- Hors-sol by AKOREACRO
- Erica Anderson
- Jack Sharp
- Archillect – An AI that curates cool pictures. How awesome is that?
- Martin Kleppe
- Zoomquilt