Max Halford ツ
My unvarnished guide to solution engineering
2026-06-04 data-science showerthoughts
Autonomous web scraping with Claude Code
2026-05-11 scraping llm
Lower your warehouse costs via DuckDB transpilation
2026-03-12 data-eng sql
Text classification with Python 3.14's zstd module
2026-02-06 machine-learning text-processing python
Solving Détrak with brute force
2026-02-02 optimization llm
Nostalgia for a time I didn’t experience
2026-01-17 showerthoughts
Row level lineage at Carbonfact
2026-01-09 python data-eng
No pain no startup
2025-10-27 showerthoughts
Scraping Google Calendar events
2025-10-12 python scraping
Warmshowers sparks joy
2025-08-24 bike-touring showerthoughts
Do LLMs identify fonts?
2025-07-30 llm scraping
Thoughts on DuckLake
2025-06-09 data-eng
The total derivative of a metric tree
2025-05-06 data-science
Minimizing the runtime of a SQL DAG
2025-02-08 data-eng python
Hard data integration problems at Carbonfact
2025-01-02 data-science
Introducing icanexplain @ PyData Paris 2024
2024-09-26 analytics-engineering python
@daily_cache implementation in Python
2024-08-27 python
LCA software: exit the matrix
2024-06-09 sustainability python
Cutting up shoes to measure their footprint
2024-05-17 sustainability data-science
A training set for bike sharing forecasting
2024-04-04 data-eng machine-learning
Fast Poetry and pre-commit with GitHub Actions
2024-02-27 python
Decomposing funnel metrics
2023-12-14 data-science
Efficient ELT refreshes
2023-12-01 data-eng
Online machine learning on the road @ IDE+A, TH Köln
2023-10-26 online-machine-learning
Sh*t flows downhill, but not at Carbonfact
2023-10-16 data-eng
Answering "Why did the KPI change?" using decomposition
2023-08-09 data-science
Measuring the carbon footprint of pizzas
2023-06-25 sustainability python
Graph components with DuckDB
2023-06-03 data-science sql
For analytics, don't use dynamic JSON keys
2023-05-11 data-eng sql
Metric correctness doesn't matter, consistency does
2023-04-28 data-science
Online gradient descent written in SQL
2023-03-07 online-machine-learning sql
Using SymPy in Python doctests
2023-02-15 python
Online active learning in 80 lines of Python
2023-01-22 online-machine-learning
Are Airbnb guests less energy efficient than their host?
2023-01-17 sustainability data-science
The future of River
2022-12-13 online-machine-learning
Parsing garment descriptions with GPT-3
2022-11-20 text-processing
Dynamic on-screen TV keyboards
2022-09-25 ux
NLP at Carbonfact: how would you do it?
2022-09-06 text-processing
Matrix inverse mini-batch updates
2022-08-24 online-machine-learning
A rant against dbt ref
2022-06-28 data-eng sql rant
First IRL meetup with the River developers
2022-06-09 online-machine-learning
Online machine learning with River @ GAIA
2022-04-07 online-machine-learning
Fuzzy regex matching in Python
2022-04-04 text-processing python
OCR spelling correction is hard
2022-03-06 text-processing
Comic book panel segmentation
2022-03-05 image-processing python
Online machine learning in practice @ PyData PDX
2022-02-09 online-machine-learning
The online machine learning predict/fit switcheroo
2022-01-06 online-machine-learning
Weighted sampling without replacement in pure Python
2021-12-24 python
Online machine learning in practice @ Applied AI
2021-12-17 online-machine-learning
Online machine learning in practice @ LVMH
2021-12-10 online-machine-learning
Web scraping, upside down
2021-11-11 scraping
One year at Alan
2021-10-26 job-log
Manipulating ephemeral data with git
2021-10-07 scraping
Dashboards and GROUPING SETS
2021-09-10 data-eng sql
Homoglyphs: different characters that look identical
2021-08-19 text-processing
Automated document processing at Alan
2021-06-10 text-processing
Text classification by data compression
2021-06-08 machine-learning text-processing
Reducing the memory footprint of a scikit-learn text classifier
2021-04-11 machine-learning text-processing
An overview of dataset time travel
2021-04-07 data-eng
The challenges of online machine learning in production @ Itaú Unibanco
2021-02-26 online-machine-learning
What is the ecological footprint of Big Data? @ Toulouse Tech
2021-01-22 sustainability
Organising a Kaggle InClass competition with a fairness metric
2021-01-21 kaggle
Converting Amazon Textract tables to pandas DataFrames
2021-01-14 text-processing
What my PhD was about
2021-01-06 job-log
Computing cross-correlations in SQL
2020-11-17 sql
Unsupervised text classification with word embeddings
2020-10-03 machine-learning text-processing
Focal loss implementation for LightGBM
2020-09-20 machine-learning python
A few intermediate pandas tricks
2020-08-17 data-eng
A brief introduction to online machine learning @ Hong Kong Machine Learning Meetup
2020-06-10 online-machine-learning
The correct way to evaluate online machine learning models
2020-06-07 online-machine-learning
Online machine learning with decision trees @ Toulouse AOC workgroup
2020-05-07 online-machine-learning
Server-sent events in Flask without extra dependencies
2020-05-04 web-dev python
I got plagiarized and Google didn't help
2020-04-17 rant
Our solution to the IDAO 2020 qualifiers
2020-04-12 competitive-machine-learning
Speeding up scikit-learn for single predictions
2020-03-31 machine-learning
Machine learning for streaming data with creme
2020-03-26 online-machine-learning
Global explanation of machine learning with sensitivity analysis @ MASCOT-NUM
2020-03-10 machine-learning explainability
Bayesian linear regression for practitioners
2020-02-26 machine-learning
Under-sampling a dataset with desired ratios
2019-12-17 machine-learning
The benefits of online machine learning @ Quantmetry
2019-10-29 online-machine-learning
The benefits of online machine learning @ Element AI
2019-10-23 online-machine-learning
Finding fuzzy duplicates with pandas
2019-09-16 data-eng
A smooth approach to putting machine learning into production
2019-07-13 machine-learning data-eng
The benefits of online machine learning @ Airbus Bizlab
2019-06-28 online-machine-learning
Incremental machine learning: from concepts to practice @ Toulouse Data Science Meetup
2019-05-28 online-machine-learning
Skyline queries in Python
2019-05-21 data-eng
Online machine learning with creme @ PyData Amsterdam
2019-05-11 online-machine-learning
SQL subquery enumeration
2019-05-06 sql
An approach based on Bayesian networks for query selectivity estimation @ DASFAA
2019-04-22 selectivity-estimation phd
Morellet crosses with JavaScript
2019-02-03 generative-art
Streaming groupbys in pandas for big datasets
2018-12-05 online-machine-learning
Target encoding done the right way
2018-10-13 machine-learning python
Stella triangles with JavaScript
2018-04-26 generative-art
Unknown pleasures with JavaScript
2017-07-24 generative-art
Subsampling a training set to match a test set - Part 1
2017-06-19 machine-learning
Docker for data science @ HelloFresh Berlin
2017-06-01 data-science
Halftoning with Go - Part 2
2017-03-20 image-processing
Grid paintings à la Mondrian with JavaScript
2017-03-04 generative-art
A short introduction and conclusion to the OpenBikes 2016 Challenge
2017-01-26 kaggle
Challenge Big Data @ TSE
2017-01-09 competitive-machine-learning
Halftoning with Go - Part 1
2016-11-27 image-processing
Predicting the availability of Vélib' bikes @ Toulouse Data Science Meetup
2016-03-30 data-science machine-learning data-viz
Recursive polygons with JavaScript
2016-03-25 generative-art
The Naïve Bayes classifier
2015-09-10 machine-learning
An introduction to genetic algorithms
2015-08-02 machine-learning
Setting up a droplet to host a Flask app
2015-07-14 web-dev
Visualizing bike stations live data
2015-06-03 data-viz