Links
Papers
- PhD - Statistical learning for selectivity estimation in relational databases (manuscript, slides)
- Selectivity correction with online machine learning - BDA 2020
- Selectivity Estimation with Attribute Value Dependencies using Linked Bayesian Networks - TLDKS 2020
- An Approach Based on Bayesian Networks for Query Selectivity Estimation - DASFAA, 2019
- Entropic Variable Projection for Explainability and Interpretability - 2018
- Master 2 year internship at HelloFresh (report, slides)
- Master 1 year internship at Privateaser (report, slides)
- Undergraduate internship at INSA Toulouse (report, slides)
- Detailed solutions to the first 30 Project Euler problems
Talks
- Online machine learning with River - GAIA, 2022 (slides, recording)
- Online machine learning in practice - PyData PDX, 2022
- Real-time machine learning: the next frontier? - The Applied AI Community, 2021
- Real-time machine learning: the next frontier? - LVMH, 2021
- Manipulating ephemeral data with git - Défi IA, 2021
- The challenges of online machine learning in production - Itaú Unibanco Meetup, 2021
- Quelle est l’empreinte écologique du Big Data? - Toulouse Tech round table, 2021
- A brief introduction to online machine learning - Hong Kong Machine Learning, 2020
- Online machine learning with decision trees - Toulouse AOC workgroup, 2020
- Our solution to the IDAO 2020 qualifiers - virtual seminar, 2020
- Global explanation of machine learning with sensitivity analysis - MASCOT-NUM, Paris, 2020
- The benefits of online learning - Quantmetry, Paris, 2019
- The benefits of online learning - Element AI, London, 2019
- The benefits of online learning - Airbus BizLab, Toulouse, 2019
- An approach based on Bayesian networks for query selectivity estimation - DASFAA, 2019
- Machine learning incrémental: des concepts à la pratique - Toulouse Data Science, 2019
- Online machine learning with creme - PyData, Amsterdam, 2019
- Docker for data science - HelloFresh, Berlin, 2017
- Challenge Big Data - Toulouse, 2017
- Forecasting bicycle-sharing usage - Toulouse Data Science, 2016
Blogroll
This is a list of blogs I regularly scroll through.
- Tim Salimans on Data Analysis
- Randal Olson
- Sam & Max – French and NSFW!
- Sebastian Raschka
- Clean Coder
- Pythonic Perambulations
- Erik Bernhardsson
- otoro
- Terra Incognita
- Real Python
- Airbnb Engineering
- No Free Hunch
- The Unofficial Google Data Science Blog
- will wolf
- Edwin Chen
- Use the index, Luke!
- Jack Preston
- Agustinus Kristiadi
- DataGenetics
- Katherine Bailey
- Netflix Research
- inFERENce
- Hyndsight – Rob Hyndman is a time series specialist.
- While My MCMC Gently Samples
- Ines Montani – by one of the founders of spaCy.
- Stephen Smerity
- Peter Norvig
- IT Best Kept Secret Is Optimization – by Jean-Francois Puget, aka CPMP.
- explained.ai
- Better Explained
- Genetic Argonaut
- pandas blog
- Towards Data Science
- Linear Digressions – data science podcasts.
- Not so standard deviations – more podcasts.
- Talking Machines – even more podcasts.
- Practical AI – here be podcasts.
- Probably Overthinking It
- Simply Statistics
- Practically Predictable
- koaning – by Vincent Warmerdam, who made calmcode.
- blogarithms
- Possibly Wrong
- FastML
- Parameter-free Learning and Optimization Algorithms
- Todd W. Schneider – this guy is really good at exploratory data analysis.
- Yann Thaddée – not directly related to data science but interesting nonetheless.
- Colins Blog
- Fabien Sanglard – nothing to do with data science, but such good taste!
- The Glowing Python – by the creator of MiniSom, which is worth checking out too.
- Matt Hancock
- Francis Bach – someone with an h-index of 80+ who takes the time to blog is worth reading.
- Gwern Branwen – Cool in a weird way.
- Libres pensées d’un mathématicien ordinaire
- Count Bayesie
- Jim Savage
- Nick Higham – a lot of well explained algebra.
- Calmcode – not a blog per se, but a nice collection of short, to-the-point tutorials about various tools.
- Chris Said
- Evan Miller
- Eric Jang
- Andrey Akinshin
- Single Lunch
- Freakonometrics
- Martin Daniel
- Chris Kiehl
- ithaka.im – a guy I met who travelled for 6 years with his wife on a bike, very inspiring.
- Muthukrishnan – has written some neat document processing stuff.
- Björn Ottosson
- Guilherme Duarte Marmerola
- Cal Paterson
- Claire Carroll
- Luke Metz – Luke is working on the niche topic of meta-learning at Google. He also happens to be a very kind person.
- Practical Recommendations – a blog about recommender systems.
- Robin Linacre – some good stuff related to record linkage.
- Neal Lathia – machine learning in production stuff.
- John D. Cook
- Brandon Roberts
- Allen Downey
- Christophe Blefari
- Scott Rome
- Eugene Yan
- Lj Miranda
- death and gravity – great advanced Python resource.
Hall of fame
The following is a hall of fame of papers, books, and blog posts that left me with a strong impression, both in terms of content and quality.
- The Elements of Statistical Learning - Jerome H. Friedman, Robert Tibshirani, and Trevor Hastie
- Machine Learning - Tom Mitchell – I think this wonderful textbook is under-appreciated.
- Artificial Intelligence: A Modern Approach - Russell & Norvig
- mlcourse.ai – Of all the introductions to machine learning, I think this is the one that strikes the best balance between theory and practice.
- Machine learning cheat sheets - Shervine Amidi
- Kalman and Bayesian Filters in Python - Roger Labbe – Kalman filters are notoriously hard to grok, but this tutorial nicely builds up the steps to understanding them.
- CS231n Convolutional Neural Networks for Visual Recognition - Stanford
- Algorithmes d’optimisation non-linéaire sans contrainte (French) - Michel Bergmann
- Graphical Models in a Nutshell - Koller et al.
- Rules of Machine Learning: Best Practices for ML Engineering - Martin Zinkevich – You should read this once a year.
- A Few Useful Things to Know about Machine Learning - Pedro Domingos – This short paper summarizes basic truths in machine learning.
- Choose Boring Technology - Dan McKinley
- How to Write a Spelling Corrector - Peter Norvig – Magic in 36 lines of code.
- MCMC sampling for dummies - Thomas Wiecki
- Your Easy Guide to Latent Dirichlet Allocation
- An Intuitive Explanation of Convolutional Neural Networks - Ujjwal Karn
- An overview of gradient descent optimization algorithms - Sebastian Ruder
- How to explain gradient boosting - Terence Parr and Jeremy Howard – A very good introduction to vanilla gradient boosting with step-by-step examples.
- Why Does XGBoost Win “Every” Machine Learning Competition? - Didrik Nielsen – This Master’s thesis goes into some of the details of XGBoost without being too bloated.
- Good sleep, good learning, good life - Piotr Wozniak – Extremely long and nothing to do with data science, but a very thorough essay nonetheless on how to properly sleep.
- Make for data scientists - Paul Butler – I believe Makefiles are yet to be rediscovered for managing data science pipelines.
- Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations – Just read it.
- The Cramér-Rao Lower Bound on Variance: Adam and Eve’s “Uncertainty Principle” - Michael Powers
- Kaggle contest on Observing Dark World - Cam Davidson-Pilon – If you’re not convinced about the power of Bayesian machine learning then read this and get your mind blown.
- A Concrete Introduction to Probability (using Python) - Peter Norvig – Extremely elegant Python coding.
- The Hungarian Maximum Likelihood Trick - Louis Abraham
- Machine Learning for Signal Processing - University of Illinois
- Don’t Call Yourself A Programmer, And Other Career Advice
- Tidy Data - Hadley Wickham – If you like playing with data then you need to be aware of this one.
- Gaussian Process, not quite for dummies - Yuge Shi – Gaussian processes are quite difficult to understand (at least, for me) but Yuge gives some great visual intuitions.
- Continuous Delivery for Machine Learning - Martin Fowler
- Memos - Sriram Krishnan
- Frequentism and Bayesianism: A Python-driven Primer - Jake VanderPlas
- Multiworld Testing Decision Service: A System for Experimentation, Learning, And Decision-Making
- Machine Learning: The High-Interest Credit Card of Technical Debt - Google
- Variational Inference: A Review for Statisticians - David Blei and his flock
- The Performance of Decision Tree Evaluation Strategies - Andrew Tulloch
- Hidden Technical Debt in Machine Learning Systems - Google
- Distill: Why do we need Flask, Celery, and Redis? (with McDonalds in Between) - Lj Miranda – A good example of the difference between abstract ideas and implementation details.
- Darts, Dice, and Coins: Sampling from a Discrete Distribution - Keith Schwarz
- Simplifying Graph Convolutional Networks - Felix Wu et al. – A nice example of putting the horse before the cart.
- MIT 6.867 machine learning course notes - Tommi Jaakkola – For people who enjoy concise mathematical notation.
- A Recipe for Training Neural Networks - Andrej Karpathy
- The Bitter Lesson - Richard Sutton
- The Best Medium-Hard Data Analyst SQL Interview Questions – There are some great interactive SQL tutorials out there, such as SQLBolt and Select Star SQL, but this one takes the cake due to its complexity.
- Rules of Programming - Rob Pike
- Transformers from scratch - Peter Bloem
- Understanding Matrix capsules with EM Routing - Jonathan Hui
- A Machine Learning Primer - Mihail Eric – A good read for beginners in machine learning algorithms.
- Fitting Bayesian structural time series with the bsts R package - Steven L. Scott
- Novelist Cormac McCarthy’s tips on how to write a great science paper - Savage and Yeh
- Emerging Architectures for Modern Data Infrastructure
- What your data team is using: the analytics stack - Technically – Another solid article to understand what an analytics stack looks like in 2021.
- Fred’s ImageMagick Scripts
- Unprojecting text with ellipses - Matt Zucker – See also this article on page dewarping by the same author.
- The Log: What every software engineer should know about real-time data’s unifying abstraction - Jay Kreps
- Language models, classification and dbacl - Laird A. Breyer – machine learning on text with a UNIX philosophy.
- Cameras and Lenses - Bartosz Ciechanowski – 100% worth a read.
- Ditherpunk - Surma
- Command-line Tools can be 235x Faster than your Hadoop Cluster - Adam Drake
- Teaching An Old Dog A New Trick - Chris Kamphuis
- Git scraping, the five minute lightning talk - Simon Willison – I wish I had thought about this first!
- Optimal Peanut Butter and Banana Sandwiches - Ethan Rosenthal
- The Data Science Hierarchy of Needs - Monica Rogati
- Are Pop Lyrics Getting More Repetitive? - Colin Morris
- Tuesday Changes Everything - Jesper Juul
- Gently down the stream - Mitch Seymour
- Les études statistiques sont-elles hors de contrôle? - David Louapre
- Visually stunning math concepts which are easy to explain - StackExchange
- Introduction to Locality-Sensitive Hashing - Tyler Neylon
- How to Build an Economic Model in Your Spare Time - Hal R. Varian – The academic wisdom in this article goes beyond the world of economics.
- My Writings - Leslie Lamport
- Modeling marketing attribution - Claire Carroll – I worked on this problem for a short time at Alan. I definitely would have done a better job if I had read this article first.
- Doing Named Entity Recognition? Don’t optimize for F1 - Christopher Manning – A rather niche topic, but well explained.
- Why the super rich are inevitable - The Pudding – Really cool dataviz.
- Visual design rules you can safely follow every time - Anthony Hobday – good follow-up to Web Design in 4 minutes by Jeremy Thomas
Inspiring data analysis
- Bayesian Rock Climbing Rankings - Ethan Rosenthal
- Is Seattle Really Seeing an Uptick In Cycling? - Jake VanderPlas
- How we changed our roof and cut 1.5 tons of CO2e - Martin Daniel
- WWW: Who Will Win? - Peter Norvig
- State of dataviz: top insights - Henri Battiste – I like this one because of how pretty the charts are 🦋
- Wealth shown to scale - Matt Korostoff
- Are Pop Lyrics Getting More Repetitive? - Colin Morris
- Tracking the Fake GitHub Star Black Market - Fraser Marlow, Yuhan Luo, Alana Glassco
Data sources
Eye candy
- Tyler Hobbs – The god of generative arts.
- Some Jean Giraud stuff
- Mauro Martins
- A new way to knit by Petros Vrellis
- A fascinating article about Manolo Gamboa Naon
- Some Ukiyo-e
- Turtletoy
- Dwitter
- generated.space
- Pixel art by Marcus Blättermann
- Nick Barnes’ football bible
- Simon Stålenhag
- Syd Mead (who worked on Blade Runner)
- Michael Fogleman’s blog
- World of Warcraft art by Dreamwalker
- Hors-sol by AKOREACRO
- Erica Anderson
- Jack Sharp
- Archillect – An AI that curates cool pictures, how awesome is that?
- Martin Kleppe
- Zoomquilt
- lossfunctions.tumblr.com – Yes, that’s a thing.
- Shirts of Peter Norvig
- United Airlines ads by Cream Electric Art
- Miniature Calendar by Tatsuya Tanaka – Broccolis that look like trees, staples that look like workout benches… I love it!
- sandspiel
- Jorge Jacinto
- WaveFunctionCollapse
- Owen D. Pomery
- einsteigenbitte.eu – I love “weird” websites like Claire Glanois'
- 19th Century French Artists Predicted The World Of The Future In This Series Of Postcards
- Blog maps
- Decktwo
- eycndy.com
Cool
- WindowSwap
- Radio Garden
- Every Noise at Once
- Starlink Satellites Tracker
- Based Cooking
- ReadComicOnline – I recommend these French comics.
- Looria – especially for cycling and camping gear
- Same Energy
- BOOOOOOM
- indieblog.page
- Cloudhiker