Max Halford ツ

My unvarnished guide to solution engineering 2026-06-04 data-science showerthoughts
Autonomous web scraping with Claude Code 2026-05-11 scraping llm
Lower your warehouse costs via DuckDB transpilation 2026-03-12 data-eng sql
Text classification with Python 3.14's zstd module 2026-02-06 machine-learning text-processing python
Solving Détrak with brute force 2026-02-02 optimization llm
Nostalgia for a time I didn’t experience 2026-01-17 showerthoughts
Row level lineage at Carbonfact 2026-01-09 python data-eng
No pain no startup 2025-10-27 showerthoughts
Scraping Google Calendar events 2025-10-12 python scraping
Warmshowers sparks joy 2025-08-24 bike-touring showerthoughts
Do LLMs identify fonts? 2025-07-30 llm scraping
Thoughts on DuckLake 2025-06-09 data-eng
The total derivative of a metric tree 2025-05-06 data-science
Minimizing the runtime of a SQL DAG 2025-02-08 data-eng python
Hard data integration problems at Carbonfact 2025-01-02 data-science
Introducing icanexplain @ PyData Paris 2024 2024-09-26 analytics-engineering python
@daily_cache implementation in Python 2024-08-27 python
LCA software: exit the matrix 2024-06-09 sustainability python
Cutting up shoes to measure their footprint 2024-05-17 sustainability data-science
A training set for bike sharing forecasting 2024-04-04 data-eng machine-learning
Fast Poetry and pre-commit with GitHub Actions 2024-02-27 python
Decomposing funnel metrics 2023-12-14 data-science
Efficient ELT refreshes 2023-12-01 data-eng
Online machine learning on the road @ IDE+A, TH Köln 2023-10-26 online-machine-learning
Sh*t flows downhill, but not at Carbonfact 2023-10-16 data-eng
Answering "Why did the KPI change?" using decomposition 2023-08-09 data-science
Measuring the carbon footprint of pizzas 2023-06-25 sustainability python
Graph components with DuckDB 2023-06-03 data-science sql
For analytics, don't use dynamic JSON keys 2023-05-11 data-eng sql
Metric correctness doesn't matter, consistency does 2023-04-28 data-science
Online gradient descent written in SQL 2023-03-07 online-machine-learning sql
Using SymPy in Python doctests 2023-02-15 python
Online active learning in 80 lines of Python 2023-01-22 online-machine-learning
Are Airbnb guests less energy efficient than their host? 2023-01-17 sustainability data-science
The future of River 2022-12-13 online-machine-learning
Parsing garment descriptions with GPT-3 2022-11-20 text-processing
Dynamic on-screen TV keyboards 2022-09-25 ux
NLP at Carbonfact: how would you do it? 2022-09-06 text-processing
Matrix inverse mini-batch updates 2022-08-24 online-machine-learning
A rant against dbt ref 2022-06-28 data-eng sql rant
First IRL meetup with the River developers 2022-06-09 online-machine-learning
Online machine learning with River @ GAIA 2022-04-07 online-machine-learning
Fuzzy regex matching in Python 2022-04-04 text-processing python
OCR spelling correction is hard 2022-03-06 text-processing
Comic book panel segmentation 2022-03-05 image-processing python
Online machine learning in practice @ PyData PDX 2022-02-09 online-machine-learning
The online machine learning predict/fit switcheroo 2022-01-06 online-machine-learning
Weighted sampling without replacement in pure Python 2021-12-24 python
Online machine learning in practice @ Applied AI 2021-12-17 online-machine-learning
Online machine learning in practice @ LVMH 2021-12-10 online-machine-learning
Web scraping, upside down 2021-11-11 scraping
One year at Alan 2021-10-26 job-log
Manipulating ephemeral data with git 2021-10-07 scraping
Dashboards and GROUPING SETS 2021-09-10 data-eng sql
Homoglyphs: different characters that look identical 2021-08-19 text-processing
Automated document processing at Alan 2021-06-10 text-processing
Text classification by data compression 2021-06-08 machine-learning text-processing
Reducing the memory footprint of a scikit-learn text classifier 2021-04-11 machine-learning text-processing
An overview of dataset time travel 2021-04-07 data-eng
The challenges of online machine learning in production @ Itaú Unibanco 2021-02-26 online-machine-learning
Quelle est l’empreinte écologique du Big Data? @ Toulouse Tech 2021-01-22 sustainability
Organising a Kaggle InClass competition with a fairness metric 2021-01-21 kaggle
Converting Amazon Textract tables to pandas DataFrames 2021-01-14 text-processing
What my PhD was about 2021-01-06 job-log
Computing cross-correlations in SQL 2020-11-17 sql
Unsupervised text classification with word embeddings 2020-10-03 machine-learning text-processing
Focal loss implementation for LightGBM 2020-09-20 machine-learning python
A few intermediate pandas tricks 2020-08-17 data-eng
A brief introduction to online machine learning @ Hong Kong Machine Learning Meetup 2020-06-10 online-machine-learning
The correct way to evaluate online machine learning models 2020-06-07 online-machine-learning
Online machine learning with decision trees @ Toulouse AOC workgroup 2020-05-07 online-machine-learning
Server-sent events in Flask without extra dependencies 2020-05-04 web-dev python
I got plagiarized and Google didn't help 2020-04-17 rant
Our solution to the IDAO 2020 qualifiers 2020-04-12 competitive-machine-learning
Speeding up scikit-learn for single predictions 2020-03-31 machine-learning
Machine learning for streaming data with creme 2020-03-26 online-machine-learning
Global explanation of machine learning with sensitivity analysis @ MASCOT-NUM 2020-03-10 machine-learning explainability
Bayesian linear regression for practitioners 2020-02-26 machine-learning
Under-sampling a dataset with desired ratios 2019-12-17 machine-learning
The benefits of online machine learning @ Quantmetry 2019-10-29 online-machine-learning
The benefits of online machine learning @ Element AI 2019-10-23 online-machine-learning
Finding fuzzy duplicates with pandas 2019-09-16 data-eng
A smooth approach to putting machine learning into production 2019-07-13 machine-learning data-eng
The benefits of online machine learning @ Airbus Bizlab 2019-06-28 online-machine-learning
Machine learning incrémental: des concepts à la pratique @ Toulouse Data Science Meetup 2019-05-28 online-machine-learning
Skyline queries in Python 2019-05-21 data-eng
Online machine learning with creme @ PyData Amsterdam 2019-05-11 online-machine-learning
SQL subquery enumeration 2019-05-06 sql
An approach based on Bayesian networks for query selectivity estimation @ DASFAA 2019-04-22 selectivity-estimation phd
Morellet crosses with JavaScript 2019-02-03 generative-art
Streaming groupbys in pandas for big datasets 2018-12-05 online-machine-learning
Target encoding done the right way 2018-10-13 machine-learning python
Stella triangles with JavaScript 2018-04-26 generative-art
Unknown pleasures with JavaScript 2017-07-24 generative-art
Subsampling a training set to match a test set - Part 1 2017-06-19 machine-learning
Docker for data science @ HelloFresh Berlin 2017-06-01 data-science
Halftoning with Go - Part 2 2017-03-20 image-processing
Grid paintings à la Mondrian with JavaScript 2017-03-04 generative-art
A short introduction and conclusion to the OpenBikes 2016 Challenge 2017-01-26 kaggle
Challenge Big Data @ TSE 2017-01-09 competitive-machine-learning
Halftoning with Go - Part 1 2016-11-27 image-processing
Predire la disponibilité des Velib' @ Toulouse Data Science Meetup 2016-03-30 data-science machine-learning data-viz
Recursive polygons with JavaScript 2016-03-25 generative-art
The Naïve Bayes classifier 2015-09-10 machine-learning
An introduction to genetic algorithms 2015-08-02 machine-learning
Setting up a droplet to host a Flask app 2015-07-14 web-dev
Visualizing bike stations live data 2015-06-03 data-viz