Links
Smart people
- Tim Salimans on Data Analysis
- Randal Olson
- Sam & Max – French and NSFW!
- Sebastian Raschka
- Clean Coder
- Pythonic Perambulations
- Erik Bernhardsson
- otoro
- Terra Incognita
- Real Python
- Airbnb Engineering
- No Free Hunch
- The Unofficial Google Data Science Blog
- will wolf
- Edwin Chen
- Use the index, Luke!
- Jack Preston
- Agustinus Kristiadi
- DataGenetics
- Katherine Bailey
- Netflix Research
- inFERENce
- Hyndsight – Rob Hyndman is a time series specialist.
- While My MCMC Gently Samples
- Ines Montani – by one of the founders of spaCy.
- Stephen Smerity
- Peter Norvig
- IT Best Kept Secret Is Optimization – By Jean-Francois Puget, aka CPMP.
- explained.ai
- Better Explained
- Genetic Argonaut
- pandas blog
- Towards Data Science
- Probably Overthinking It
- Simply Statistics
- Practically Predictable
- koaning – by Vincent Warmerdam, who made calmcode
- blogarithms
- Possibly Wrong
- FastML
- Parameter-free Learning and Optimization Algorithms
- Todd W. Schneider – This guy is really good at exploratory data analysis.
- Yann Thaddée – Not directly related to data science but interesting nonetheless.
- Colins Blog
- Fabien Sanglard – nothing to do with data science, but such good taste!
- The Glowing Python – By the creator of MiniSom, which is worth checking out too.
- Matt Hancock
- Francis Bach – Someone with an h-index of 80+ who takes the time to blog is worth reading.
- Gwern Branwen – Cool in a weird way.
- Libres pensées d’un mathématicien ordinaire
- Count Bayesie
- Jim Savage
- Nick Higham – A lot of well explained algebra.
- Calmcode – Not a blog per se, but a nice collection of short, to-the-point tutorials about various tools.
- Chris Said
- Evan Miller
- Eric Jang
- Andrey Akinshin
- Single Lunch
- Freakonometrics
- Martin Daniel
- Chris Kiehl
- ithaka.im – A guy I met who travelled for 6 years with his wife on a bike, very inspiring.
- Muthukrishnan – Has written some neat document processing stuff.
- Björn Ottosson
- Guilherme Duarte Marmerola
- Cal Paterson
- Claire Carroll
- Luke Metz – Luke is working on the niche topic of meta-learning at Google. He also happens to be a very kind person.
- Practical Recommendations – A blog about recommender systems.
- Robin Linacre – Some good stuff related to record linkage.
- Neal Lathia – Machine learning in production stuff.
- John D. Cook
- Brandon Roberts
- Allen Downey
- Christophe Blefari
- Scott Rome
- Eugene Yan
- Lj Miranda
- death and gravity – Great advanced Python resource.
- The Shape of Data
- IDEA
- Shaded relief
- Leslie Lamport
- Curtis Miller
- Naftali Harris
- Laird Breyer – wrote some cool software for text classification called dbacl, and markovpr which is a PageRank implementation.
- Vicky Boykis – the OG behind Normconf
- Danielle Navarro
- Amit Patel – visual explanations of algorithms used in games.
Machine learning
- The Elements of Statistical Learning - Jerome H. Friedman, Robert Tibshirani, and Trevor Hastie
- Machine Learning - Tom Mitchell – I think this wonderful textbook is under-appreciated.
- Artificial Intelligence: A Modern Approach - Russell & Norvig
- mlcourse.ai – Of all the introductions to machine learning, I think this is the one that strikes the best balance between theory and practice.
- Machine learning cheat sheets - Shervine Amidi
- Kalman and Bayesian Filters in Python - Roger Labbe – Kalman filters are notoriously hard to grok; this tutorial nicely builds up the steps to understanding them.
- CS231n Convolutional Neural Networks for Visual Recognition - Stanford
- Algorithmes d’optimisation non-linéaire sans contrainte (French) - Michel Bergmann
- Graphical Models in a Nutshell - Koller et al.
- Rules of Machine Learning: Best Practices for ML Engineering - Martin Zinkevich – You should read this once a year.
- A Few Useful Things to Know about Machine Learning - Pedro Domingos – This short paper summarizes basic truths in machine learning.
- How to Write a Spelling Corrector - Peter Norvig – Magic in 36 lines of code.
- MCMC sampling for dummies - Thomas Wiecki
- Your Easy Guide to Latent Dirichlet Allocation
- An Intuitive Explanation of Convolutional Neural Networks - Ujjwal Karn
- An overview of gradient descent optimization algorithms - Sebastian Ruder
- How to explain gradient boosting - Terence Parr and Jeremy Howard – A very good introduction to vanilla gradient boosting with step by step examples.
- Why Does XGBoost Win “Every” Machine Learning Competition? - Didrik Nielsen – This Master’s thesis goes into some of the details of XGBoost without being too bloated.
- Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations
- The Cramér-Rao Lower Bound on Variance: Adam and Eve’s “Uncertainty Principle” - Michael Powers
- A Concrete Introduction to Probability (using Python) - Peter Norvig – Extremely elegant Python coding.
- The Hungarian Maximum Likelihood Trick - Louis Abraham
- Machine Learning for Signal Processing - University of Illinois
- Gaussian Process, not quite for dummies - Yuge Shi – Gaussian processes are quite difficult to understand (at least, for me) but Yuge gives some great visual intuitions.
- Frequentism and Bayesianism: A Python-driven Primer - Jake VanderPlas
- Variational Inference: A Review for Statisticians - David Blei and his flock
- The Performance of Decision Tree Evaluation Strategies - Andrew Tulloch
- Simplifying Graph Convolutional Networks - Felix Wu et al. – A nice example of putting the horse before the cart.
- MIT 6.867 machine learning course notes - Tommi Jaakkola – For people who enjoy concise mathematical notation.
- A Recipe for Training Neural Networks - Andrej Karpathy
- The Bitter Lesson - Richard Sutton
- Introduction to Locality-Sensitive Hashing - Tyler Neylon
- Transformers from scratch - Peter Bloem
- A Machine Learning Primer - Mihail Eric – A good read for beginners in machine learning algorithms.
- Fitting Bayesian structural time series with the bsts R package - Steven L. Scott
- Super Fast String Matching in Python - Chris van den Berg
- Poisson regression and non-normal loss - scikit-learn
- Perfect lung cancer detections in a $1 million ML competition with an ingenious hack - Yusaku Sako
- Beware Default Random Forest Importances
- From RankNet to LambdaRank to LambdaMART: An Overview
- Word2Vec Tutorial - The Skip-Gram Model - Chris McCormick
- Causal Inference for The Brave and True
- Data Distribution Shifts and Monitoring - Chip Huyen
- Creating Confidence Intervals for Machine Learning Classifiers - Sebastian Raschka
- Machine Learning @ VU
- SVD Image Compression, Explained - Dennis Miczek
- Classifying all of the pdfs on the internet - Santiago Pedroza
Deep learning
- Transformers from scratch - Peter Bloem
- Explaining RNNs without neural networks
- Practical Deep Learning for Coders
- What Is ChatGPT Doing… and Why Does It Work?
Data science
- Make for data scientists - Paul Butler
- Tidy Data - Hadley Wickham – You need to be aware of this framework if you want to be serious about analysing tabular data.
- Modeling marketing attribution - Claire Carroll – I worked on this problem for a short time at Alan. I definitely would have done a better job if I had read this article first.
- Darts, Dice, and Coins: Sampling from a Discrete Distribution - Keith Schwarz
- Unprojecting text with ellipses - Matt Zucker – See also this article on page dewarping by the same author.
- Language models, classification and dbacl - Laird A. Breyer – Machine learning on text with a UNIX philosophy.
- Teaching An Old Dog A New Trick - Chris Kamphuis
- Optimal Peanut Butter and Banana Sandwiches - Ethan Rosenthal
- The Data Science Hierarchy of Needs - Monica Rogati
- Tuesday Changes Everything - Jesper Juul
- Doing Named Entity Recognition? Don’t optimize for F1 - Christopher Manning – A rather niche topic, but well explained.
- Lessons learned building an ML trading system that turned $5k into $200k
- Common statistical tests are linear models (or: how to teach stats)
Mathematics
Physics
- Portail Électricité - Vikidia
- Simulating Fluids, Fire, and Smoke in Real-Time - Andrew Chan
- GPS - Bartosz Ciechanowski – all his articles are great.
Data engineering
- Emerging Architectures for Modern Data Infrastructure
- What your data team is using: the analytics stack - Technically – Another solid article to understand what an analytics stack looks like in 2021.
- Multiworld Testing Decision Service: A System for Experimentation, Learning, And Decision-Making
- Machine Learning: The High-Interest Credit Card of Technical Debt - Google
- Continuous Delivery for Machine Learning - Martin Fowler
- Hidden Technical Debt in Machine Learning Systems - Google
- The Log: What every software engineer should know about real-time data’s unifying abstraction - Jay Kreps
- Command-line Tools can be 235x Faster than your Hadoop Cluster - Adam Drake
- Git scraping, the five minute lightning talk - Simon Willison – I wish I had thought about this first!
- Gently down the stream - Mitch Seymour
- Turning the database inside-out with Apache Samza
- The Snowflake Elastic Data Warehouse
- Differential Dataflow – also see the Naiad paper
- Time, Clocks, and the Ordering of Events in a Distributed System - Leslie Lamport
- How Query Engines Work
- Building a cost-effective analytics stack with Modal, dlt, and dbt – prime example of what a modern analytics stack looks like in late 2024.
Inspiring data analysis
- Bayesian Rock Climbing Rankings - Ethan Rosenthal
- Is Seattle Really Seeing an Uptick In Cycling? - Jake VanderPlas
- How we changed our roof and cut 1.5 tons of CO2e - Martin Daniel
- WWW: Who Will Win? - Peter Norvig
- Wealth shown to scale - Matt Korostoff
- Are Pop Lyrics Getting More Repetitive? - Colin Morris
- Tracking the Fake GitHub Star Black Market - Fraser Marlow, Yuhan Luo, Alana Glassco
- Why the super rich are inevitable - The Pudding – Really cool dataviz.
- Kaggle contest on Observing Dark World - Cam Davidson-Pilon – If you’re doubtful about the power of Bayesian machine learning, then read this and get mindblown.
- looria.com/reddit – This is a website that aggregates informal product reviews found on Reddit. There’s a bunch of cool NLP stuff going on behind the scenes. For instance, here are recommendations for cycling and camping gear.
- Who is the average nomad? – draws on live NomadList data.
- Every Noise at Once – uses PCA to map music genres.
- How Big is YouTube? - Ethan Zuckerman
- NYC Taxi Rides viz
- Mario meets Pareto - Antoine Mayerowitz
- We mapped weather forecast accuracy across the U.S. Look up your city
- Resurfacing the past - a madlad decides to pinpoint all the ships that sank during WWII.
Life cycle assessment (LCA)
Data sources
- API Rank
- Finding Undocumented APIs
- bigquery-public-data
- fh-bigquery
- Wikidata Query Service
- New York City transport data
- Reverse Engineering Bumble’s API – a fun/scary API reverse engineering example that worked in 2020
- ccxt – access cryptocurrency exchanges’ APIs
- Our World in Data
- Beyond the route: Introducing granular MTA bus speed data
Data visualization
- Datawrapper – great way to produce professional-looking charts and tables.
- SlidesCodeHighlighter
Food for thought
- If Sapiens were a blog post - Neil Kakkar
- Fast - Patrick Collison
- Choose Boring Technology - Dan McKinley
- Memos - Sriram Krishnan
- What is Money, Anyway? - Lyn Alden
- The Final Speech from The Great Dictator
- The tyranny of the algorithm: why every coffee shop looks the same
- Against Disruption: On the Bulletpointization of Books
SQL
- The Best Medium-Hard Data Analyst SQL Interview Questions – There are some great interactive SQL tutorials out there, such as SQLBolt and Select Star SQL, but this one takes the cake due to its complexity. The Ultimate SQL guide is a comprehensive guide made with Count.
- Bypassing airport security via SQL injection – A fun/dangerous example of what can happen when you don’t sanitize your inputs.
Programming
- The Grand Unified Theory of Software Architecture
- Don’t Call Yourself A Programmer, And Other Career Advice
- Rules of Programming - Rob Pike
- Why Lisp?
Writing
- Common Bugs in Writing
- Novelist Cormac McCarthy’s tips on how to write a great science paper - Savage and Yeh
- How to Build an Economic Model in Your Spare Time - Hal R. Varian – The academic wisdom in this article goes beyond the world of economics.
Web development
- Visual design rules you can safely follow every time - Anthony Hobday – Good follow-up to Web Design in 4 minutes by Jeremy Thomas.
- Typography in ten minutes
- alpine.js – I usually go to Vue.js for web dev, but my brother made me realize alpine.js is a great alternative for small projects.
- Hot Page – looks like a good way to create a landing page.
Building a product
- Beautiful Polished Rocks - Steve Jobs – the best metaphor for product design I’ve ever heard.
- Stevey’s Google Platforms Rant – insights about product design at GAFAs.
- Jeff Bezos on the disagree and commit principle
I don’t have a clue but it looks cool
Eye candy
- Tyler Hobbs – The god of generative arts.
- Some Jean Giraud stuff
- Mauro Martins
- A new way to knit by Petros Vrellis
- A fascinating article about Manolo Gamboa Naon
- Some Ukiyo-e
- Turtletoy
- Dwitter
- generated.space
- Pixel art by Marcus Blättermann
- Nick Barnes’ football bible
- Simon Stålenhag
- Syd Mead (who worked on Blade Runner)
- Michael Fogleman’s blog
- World of Warcraft art by Dreamwalker
- Hors-sol de AKOREACRO
- Erica Anderson
- Jack Sharp
- Archillect – An AI that curates cool pictures, how awesome is that?
- Martin Kleppe
- Zoomquilt
- lossfunctions.tumblr.com – Yes, that’s a thing.
- Shirts of Peter Norvig
- United Airlines ads by Cream Electric Art
- Miniature Calendar by Tatsuya Tanaka – Broccolis that look like trees, staples that look like workout benches… I love it!
- sandspiel
- Jorge Jacinto
- WaveFunctionCollapse
- Owen D. Pomery
- 19th Century French Artists Predicted The World Of The Future In This Series Of Postcards
- Blog maps
- Decktwo
- eycndy.com
- Fred’s ImageMagick Scripts
- Ditherpunk - Surma
- Visually stunning math concepts which are easy to explain - StackExchange
- Cars, bars and burger joints: William Eggleston’s iconic America – in pictures
- Spectrolite
- RamenHaus
- SportsNetUSA.net
- readcomiconline
- MUBI
- La vida en viñetas
- Plotting 3 years of hourly data in 150ms
- What I’ve learned about flow fields so far
- Dear Data
- FAA Aviation Maps
- Floor796
- John Martin
- marimekko.com
Pretty websites
- MotherDuck: Data Infrastructure and Analytics
- Welcome to the Operational Analytics Club 👋
- Snaplet
- Noun Project: Free Icons & Stock Photos for Everything
- Benthos Studio
- Claire Glanois
- API for Automated Image and Video Generation
- 𝚜𝚙𝚎𝚗𝚌𝚎𝚛𝚌𝚑𝚊𝚗𝚐.𝚖𝚎 𝚒𝚜 𝚠𝚊𝚗𝚍𝚎𝚛𝚒𝚗𝚐
- Maintenance 🌱 Digital Garden
- Equals | The fastest way for startups to do any analysis
- Maki.vc | European Venture Capital Firm
- Harlequin: The DuckDB IDE for Your Terminal.
- Inter font family
- Neatnik
- The Creative Independent
- Bay 12 Games: Dwarf Fortress
- Browserbear
I like these retrocool websites:
Cool
- WindowSwap
- Radio Garden
- Every Noise at Once
- Starlink Satellites Tracker
- Based Cooking
- ReadComicOnline – I recommend these French comics.
- Same Energy
- BOOOOOOM
- indieblog.page
- Cloudhiker
- Fish doorbells! Historic sandwiches! 50 of the weirdest, most wonderful corners of the web
- Marginalia
- Anna’s Archive
- Browser games – these are made by a single doujin developer called Kenta Cho
- Pong wars