Links
Smart people
- Tim Salimans on Data Analysis
- Randal Olson
- Sam & Max – French and NSFW!
- Sebastian Raschka
- Clean Coder
- Pythonic Perambulations
- Erik Bernhardsson
- otoro
- Terra Incognita
- Real Python
- Airbnb Engineering
- No Free Hunch
- The Unofficial Google Data Science Blog
- will wolf
- Edwin Chen
- Use the index, Luke!
- Jack Preston
- Agustinus Kristiadi
- DataGenetics
- Katherine Bailey
- Netflix Research
- inFERENce
- Hyndsight – Rob Hyndman is a time series specialist.
- While My MCMC Gently Samples
- Ines Montani – by one of the founders of spaCy.
- Stephen Smerity
- Peter Norvig
- IT Best Kept Secret Is Optimization – By Jean-Francois Puget, aka CPMP.
- explained.ai
- Better Explained
- Genetic Argonaut
- pandas blog
- Towards Data Science
- Probably Overthinking It
- Simply Statistics
- Practically Predictable
- koaning – by Vincent Warmerdam, who made calmcode
- blogarithms
- Possibly Wrong
- FastML
- Parameter-free Learning and Optimization Algorithms
- Todd W. Schneider – This guy is really good at exploratory data analysis.
- Yann Thaddée – Not directly related to data science but interesting nonetheless.
- Colins Blog
- Fabien Sanglard – nothing to do with data science, but such good taste!
- The Glowing Python – By the creator of MiniSom, which is worth checking out too.
- Matt Hancock
- Francis Bach – Someone with an h-index of 80+ who takes the time to blog is worth reading.
- Gwern Branwen – Cool in a weird way.
- Libres pensées d’un mathématicien ordinaire
- Count Bayesie
- Jim Savage
- Nick Higham – A lot of well explained algebra.
- Calmcode – Not a blog per se, but a nice collection of short, to-the-point tutorials about various tools.
- Chris Said
- Evan Miller
- Eric Jang
- Andrey Akinshin
- Single Lunch
- Freakonometrics
- Martin Daniel
- Chris Kiehl
- ithaka.im – A guy I met who travelled for 6 years with his wife on a bike, very inspiring.
- Muthukrishnan – Has written some neat document processing stuff.
- Björn Ottosson
- Guilherme Duarte Marmerola
- Cal Paterson
- Claire Carroll
- Luke Metz – Luke is working on the niche topic of meta-learning at Google. He also happens to be a very kind person.
- Practical Recommendations – A blog about recommender systems.
- Robin Linacre – Some good stuff related to record linkage.
- Neal Lathia – Machine learning in production stuff.
- John D. Cook
- Brandon Roberts
- Allen Downey
- Christophe Blefari
- Scott Rome
- Eugene Yan
- Lj Miranda
- death and gravity – Great advanced Python resource.
- The Shape of Data
- IDEA
- Shaded relief
- Leslie Lamport
- Curtis Miller
- Naftali Harris
Machine learning
- The Elements of Statistical Learning - Jerome H. Friedman, Robert Tibshirani, and Trevor Hastie
- Machine Learning - Tom Mitchell – I think this wonderful textbook is under-appreciated.
- Artificial Intelligence: A Modern Approach - Russell & Norvig
- mlcourse.ai – Of all the introductions to machine learning, I think this is the one that strikes the best balance between theory and practice.
- Machine learning cheat sheets - Shervine Amidi
- Kalman and Bayesian Filters in Python - Roger Labbe – Kalman filters are notoriously hard to grok; this tutorial nicely builds up the steps to understanding them.
- CS231n Convolutional Neural Networks for Visual Recognition - Stanford
- Algorithmes d’optimisation non-linéaire sans contrainte (French) - Michel Bergmann
- Graphical Models in a Nutshell - Koller et al.
- Rules of Machine Learning: Best Practices for ML Engineering - Martin Zinkevich – You should read this once a year.
- A Few Useful Things to Know about Machine Learning - Pedro Domingos – This short paper summarizes basic truths in machine learning.
- How to Write a Spelling Corrector - Peter Norvig – Magic in 36 lines of code.
- MCMC sampling for dummies - Thomas Wiecki
- Your Easy Guide to Latent Dirichlet Allocation
- An Intuitive Explanation of Convolutional Neural Networks - Ujjwal Karn
- An overview of gradient descent optimization algorithms - Sebastian Ruder
- How to explain gradient boosting - Terence Parr and Jeremy Howard – A very good introduction to vanilla gradient boosting with step by step examples.
- Why Does XGBoost Win “Every” Machine Learning Competition? - Didrik Nielsen – This Master’s thesis goes into some of the details of XGBoost without being too bloated.
- Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations
- The Cramér-Rao Lower Bound on Variance: Adam and Eve’s “Uncertainty Principle” - Michael Powers
- A Concrete Introduction to Probability (using Python) - Peter Norvig – Extremely elegant Python coding.
- The Hungarian Maximum Likelihood Trick - Louis Abraham
- Machine Learning for Signal Processing - University of Illinois
- Gaussian Process, not quite for dummies - Yuge Shi – Gaussian processes are quite difficult to understand (at least, for me) but Yuge gives some great visual intuitions.
- Frequentism and Bayesianism: A Python-driven Primer - Jake VanderPlas
- Variational Inference: A Review for Statisticians - David Blei and his flock
- The Performance of Decision Tree Evaluation Strategies - Andrew Tulloch
- Simplifying Graph Convolutional Networks - Felix Wu et al. – A nice example of putting the horse before the cart.
- MIT 6.867 machine learning course notes - Tommi Jaakkola – For people who enjoy concise mathematical notation.
- A Recipe for Training Neural Networks - Andrej Karpathy
- The Bitter Lesson - Richard Sutton
- Introduction to Locality-Sensitive Hashing - Tyler Neylon
- Transformers from scratch - Peter Bloem
- A Machine Learning Primer - Mihail Eric – A good read for beginners in machine learning algorithms.
- Fitting Bayesian structural time series with the bsts R package - Steven L. Scott
- Super Fast String Matching in Python - Chris van den Berg
Data science
- Make for data scientists - Paul Butler
- Tidy Data - Hadley Wickham – You need to be aware of this framework if you want to be serious about analysing tabular data.
- Modeling marketing attribution - Claire Carroll – I worked on this problem for a short time at Alan. I definitely would have done a better job if I had read this article first.
- Darts, Dice, and Coins: Sampling from a Discrete Distribution - Keith Schwarz
- Unprojecting text with ellipses - Matt Zucker – See also this article on page dewarping by the same author.
- Language models, classification and dbacl - Laird A. Breyer – Machine learning on text with a UNIX philosophy.
- Teaching An Old Dog A New Trick - Chris Kamphuis
- Optimal Peanut Butter and Banana Sandwiches - Ethan Rosenthal
- The Data Science Hierarchy of Needs - Monica Rogati
- Tuesday Changes Everything - Jesper Juul
- Doing Named Entity Recognition? Don’t optimize for F1 - Christopher Manning – A rather niche topic, but well explained.
- Lessons learned building an ML trading system that turned $5k into $200k
- Common statistical tests are linear models (or: how to teach stats)
Data engineering
- Emerging Architectures for Modern Data Infrastructure
- What your data team is using: the analytics stack - Technically – Another solid article to understand what an analytics stack looks like in 2021.
- Multiworld Testing Decision Service: A System for Experimentation, Learning, And Decision-Making
- Machine Learning: The High-Interest Credit Card of Technical Debt - Google
- Continuous Delivery for Machine Learning - Martin Fowler
- Hidden Technical Debt in Machine Learning Systems - Google
- The Log: What every software engineer should know about real-time data’s unifying abstraction - Jay Kreps
- Command-line Tools can be 235x Faster than your Hadoop Cluster - Adam Drake
- Git scraping, the five minute lightning talk - Simon Willison – I wish I had thought about this first!
- Choose Boring Technology - Dan McKinley
- Gently down the stream - Mitch Seymour
SQL
- The Best Medium-Hard Data Analyst SQL Interview Questions – There are some great interactive SQL tutorials out there, such as SQLBolt and Select Star SQL, but this one takes the cake due to its complexity. The Ultimate SQL guide is a comprehensive guide made with Count.
Good advice
- Don’t Call Yourself A Programmer, And Other Career Advice
- Memos - Sriram Krishnan
- Common Bugs in Writing
- Rules of Programming - Rob Pike
- Novelist Cormac McCarthy’s tips on how to write a great science paper - Savage and Yeh
- How to Build an Economic Model in Your Spare Time - Hal R. Varian – The academic wisdom in this article goes beyond the world of economics.
- Fast - Patrick Collison
- Visual design rules you can safely follow every time - Anthony Hobday – Good follow-up to Web Design in 4 minutes by Jeremy Thomas.
Product
- Beautiful Polished Rocks - Steve Jobs – the best metaphor for product design I’ve ever heard.
- Stevey’s Google Platforms Rant – insights about product design at GAFAs.
Inspiring data analysis
- Bayesian Rock Climbing Rankings - Ethan Rosenthal
- Is Seattle Really Seeing an Uptick In Cycling? - Jake VanderPlas
- How we changed our roof and cut 1.5 tons of CO2e - Martin Daniel
- WWW: Who Will Win? - Peter Norvig
- Wealth shown to scale - Matt Korostoff
- Are Pop Lyrics Getting More Repetitive? - Colin Morris
- Tracking the Fake GitHub Star Black Market - Fraser Marlow, Yuhan Luo, Alana Glassco
- Why the super rich are inevitable - The Pudding – Really cool dataviz.
- Kaggle contest on Observing Dark World - Cam Davidson-Pilon – If you’re doubtful about the power of Bayesian machine learning, then read this and get mindblown.
- looria.com/reddit – This is a website that aggregates informal product reviews found on Reddit. There’s a bunch of cool NLP stuff going on behind the scenes. For instance, here are recommendations for cycling and camping gear.
Data sources
Eye candy
- Tyler Hobbs – The god of generative arts.
- Some Jean Giraud stuff
- Mauro Martins
- A new way to knit by Petros Vrellis
- A fascinating article about Manolo Gamboa Naon
- Some Ukiyo-e
- Turtletoy
- Dwitter
- generated.space
- Pixel art by Marcus Blättermann
- Nick Barnes’ football bible
- Simon Stålenhag
- Syd Mead (who worked on Blade Runner)
- Michael Fogleman’s blog
- World of Warcraft art by Dreamwalker
- Hors-sol by AKOREACRO
- Erica Anderson
- Jack Sharp
- Archillect – An AI that curates cool pictures, how awesome is that?
- Martin Kleppe
- Zoomquilt
- lossfunctions.tumblr.com – Yes, that’s a thing.
- Shirts of Peter Norvig
- United Airlines ads by Cream Electric Art
- Miniature Calendar by Tatsuya Tanaka – Broccolis that look like trees, staples that look like workout benches… I love it!
- sandspiel
- Jorge Jacinto
- WaveFunctionCollapse
- Owen D. Pomery
- einsteigenbitte.eu – I love “weird” websites like Claire Glanois'
- 19th Century French Artists Predicted The World Of The Future In This Series Of Postcards
- Blog maps
- Decktwo
- eycndy.com
- Fred’s ImageMagick Scripts
- Cameras and Lenses - Bartosz Ciechanowski – 100% worth a read.
- Ditherpunk - Surma
- Visually stunning math concepts which are easy to explain - StackExchange
- Cars, bars and burger joints: William Eggleston’s iconic America – in pictures
- Spectrolite
- RamenHaus