- An Approach Based on Bayesian Networks for Query Selectivity Estimation (slides)
- Cost models in database query optimisation bibliography
- Master 2 year internship at HelloFresh (report, slides)
- Master 1 year internship at Privateaser (report, slides)
- Undergraduate internship at INSA Toulouse (report, slides)
- Detailed solutions to the first 30 Project Euler problems
- Machine learning incrémental: des concepts à la pratique (French; "Incremental machine learning: from concepts to practice") - TDS Meetup 2019
- Online machine learning with creme - PyData Amsterdam 2019
- Docker for data science - HelloFresh Data Science Academy
- Forecasting bicycle-sharing usage - Toulouse Data Science 2016
Hall of fame
The following is a hall of fame of papers, books, and blog posts with a very high signal-to-noise ratio. I highly recommend reading some of them when you get time.
- The Elements of Statistical Learning - Jerome H. Friedman, Robert Tibshirani, and Trevor Hastie
- Machine Learning - Tom Mitchell – I think this wonderful textbook is under-appreciated.
- Artificial Intelligence: A Modern Approach - Russell & Norvig
- mlcourse.ai – Of all the introductions to machine learning I think this is the one that strikes the best balance between theory and practice.
- Machine learning cheat sheets - Shervine Amidi
- Kalman and Bayesian Filters in Python - Roger Labbe – Kalman filters are notoriously hard to grok; this tutorial nicely builds up the steps to understanding them.
- CS231n Convolutional Neural Networks for Visual Recognition - Stanford
- Algorithmes d’optimisation non-linéaire sans contrainte (French) - Michel Bergmann
- Graphical Models in a Nutshell - Koller et al.
- Rules of Machine Learning: Best Practices for ML Engineering - Martin Zinkevich – You should read this once a year.
- A Few Useful Things to Know about Machine Learning - Pedro Domingos – This short paper summarizes basic truths in machine learning.
- Choose Boring Technology - Dan McKinley
- How to Write a Spelling Corrector - Peter Norvig – Magic in 36 lines of code.
- MCMC sampling for dummies - Thomas Wiecki
- Your Easy Guide to Latent Dirichlet Allocation
- An Intuitive Explanation of Convolutional Neural Networks - Ujjwal Karn
- An overview of gradient descent optimization algorithms - Sebastian Ruder
- How to explain gradient boosting - Terence Parr and Jeremy Howard – A very good introduction to vanilla gradient boosting with step by step examples.
- Why Does XGBoost Win “Every” Machine Learning Competition? - Didrik Nielsen – This Master’s thesis goes into some of the details of XGBoost without being too bloated.
- Good sleep, good learning, good life - Piotr Wozniak – Extremely long and nothing to do with data science, but a very thorough essay nonetheless on how to properly sleep.
- Make for data scientists - Paul Butler – I believe Makefiles are yet to be rediscovered for managing data science pipelines.
- Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations – Just read it.
- The Cramér-Rao Lower Bound on Variance: Adam and Eve’s “Uncertainty Principle” - Michael Powers
- Kaggle contest on Observing Dark World - Cam Davidson-Pilon – If you’re not convinced about the power of Bayesian machine learning then read this and get your mind blown.
- A Concrete Introduction to Probability (using Python) - Peter Norvig – Extremely elegant Python coding.
- The Hungarian Maximum Likelihood Trick - Louis Abraham
- Machine Learning for Signal Processing - University of Illinois
- Don’t Call Yourself A Programmer, And Other Career Advice
- Tidy Data - Hadley Wickham – If you like playing with data then you need to be aware of this one.
This is a list of blogs I regularly scroll through.
- Tim Salimans on Data Analysis
- Randal Olson
- Sam & Max – French and NSFW!
- Sebastian Raschka
- Clean Coder
- Pythonic Perambulations
- Erik Bernhardsson
- Terra Incognita
- Real Python
- Airbnb Engineering
- No Free Hunch
- The Unofficial Google Data Science Blog
- will wolf
- Edwin Chen
- Use the index, Luke!
- Jack Preston
- Agustinus Kristiadi
- Katherine Bailey
- Netflix Research
- Hyndsight – Rob Hyndman is a time series specialist
- While My MCMC Gently Samples
- Ines Montani – By one of the founders of spaCy
- Stephen Merity (Smerity)
- Peter Norvig
- IT Best Kept Secret Is Optimization – By Jean-Francois Puget, aka CPMP
- Better Explained
- Genetic Argonaut
- pandas blog
- Towards Data Science
- Linear Digressions – a data science podcast
- Probably Overthinking It
- Simply Statistics
- Practically Predictable
- koaning – By Vincent Warmerdam, who did this great presentation
- Possibly Wrong