Max Halford ツ
2025-06-09
Thoughts on DuckLake
2025-05-06
The total derivative of a metric tree
2025-02-08
Minimizing the runtime of a SQL DAG
2025-01-02
Hard data integration problems at Carbonfact
2024-09-26
Introducing icanexplain @ PyData Paris 2024
2024-08-27
@daily_cache implementation in Python
2024-06-09
LCA software: exit the matrix
2024-05-17
Cutting up shoes to measure their footprint
2024-04-04
A training set for bike sharing forecasting
2024-02-27
Fast Poetry and pre-commit with GitHub Actions
2023-12-14
Decomposing funnel metrics
2023-12-01
Efficient ELT refreshes
2023-10-26
Online machine learning on the road @ IDE+A, TH Köln
2023-10-16
Sh*t flows downhill, but not at Carbonfact
2023-08-09
Answering "Why did the KPI change?" using decomposition
2023-06-25
Measuring the carbon footprint of pizzas
2023-06-03
Graph components with DuckDB
2023-05-11
For analytics, don't use dynamic JSON keys
2023-04-28
Metric correctness doesn't matter, consistency does
2023-03-07
Online gradient descent written in SQL
2023-02-15
Using SymPy in Python doctests
2023-01-22
Online active learning in 80 lines of Python
2023-01-17
Are Airbnb guests less energy efficient than their host?
2022-12-13
The future of River
2022-11-20
Parsing garment descriptions with GPT-3
2022-09-25
Dynamic on-screen TV keyboards
2022-09-06
NLP at Carbonfact: how would you do it?
2022-08-24
Matrix inverse mini-batch updates
2022-06-28
A rant against dbt ref
2022-06-09
First IRL meetup with the River developers
2022-04-07
Online machine learning with River @ GAIA
2022-04-04
Fuzzy regex matching in Python
2022-03-06
OCR spelling correction is hard
2022-03-05
Comic book panel segmentation
2022-02-09
Online machine learning in practice @ PyData PDX
2022-01-06
The online machine learning predict/fit switcheroo
2021-12-24
Weighted sampling without replacement in pure Python
2021-12-17
Online machine learning in practice @ Applied AI
2021-12-10
Online machine learning in practice @ LVMH
2021-11-11
Web scraping, upside down
2021-10-26
One year at Alan
2021-10-07
Manipulating ephemeral data with git
2021-09-10
Dashboards and GROUPING SETS
2021-08-19
Homoglyphs: different characters that look identical
2021-06-10
Automated document processing at Alan
2021-06-08
Text classification by data compression
2021-04-11
Reducing the memory footprint of a scikit-learn text classifier
2021-04-07
An overview of dataset time travel
2021-02-26
The challenges of online machine learning in production @ Itaú Unibanco
2021-01-22
What is the ecological footprint of Big Data? @ Toulouse Tech
2021-01-21
Organising a Kaggle InClass competition with a fairness metric
2021-01-14
Converting Amazon Textract tables to pandas DataFrames
2021-01-06
What my PhD was about
2020-11-17
Computing cross-correlations in SQL
2020-10-03
Unsupervised text classification with word embeddings
2020-09-20
Focal loss implementation for LightGBM
2020-08-17
A few intermediate pandas tricks
2020-06-10
A brief introduction to online machine learning @ Hong Kong Machine Learning Meetup
2020-06-07
The correct way to evaluate online machine learning models
2020-05-07
Online machine learning with decision trees @ Toulouse AOC workgroup
2020-05-04
Server-sent events in Flask without extra dependencies
2020-04-17
I got plagiarized and Google didn't help
2020-04-12
Our solution to the IDAO 2020 qualifiers
2020-03-31
Speeding up scikit-learn for single predictions
2020-03-26
Machine learning for streaming data with creme
2020-03-10
Global explanation of machine learning with sensitivity analysis @ MASCOT-NUM
2020-02-26
Bayesian linear regression for practitioners
2019-12-17
Under-sampling a dataset with desired ratios
2019-10-29
The benefits of online machine learning @ Quantmetry
2019-10-23
The benefits of online machine learning @ Element AI
2019-09-16
Finding fuzzy duplicates with pandas
2019-07-13
A smooth approach to putting machine learning into production
2019-06-28
The benefits of online machine learning @ Airbus Bizlab
2019-05-28
Incremental machine learning: from concepts to practice @ Toulouse Data Science Meetup
2019-05-21
Skyline queries in Python
2019-05-11
Online machine learning with creme @ PyData Amsterdam
2019-05-06
SQL subquery enumeration
2019-04-22
An approach based on Bayesian networks for query selectivity estimation @ DASFAA
2019-02-03
Morellet crosses with JavaScript
2018-12-05
Streaming groupbys in pandas for big datasets
2018-10-13
Target encoding done the right way
2018-04-26
Stella triangles with JavaScript
2017-07-24
Unknown pleasures with JavaScript
2017-06-19
Subsampling a training set to match a test set - Part 1
2017-06-01
Docker for data science @ HelloFresh Berlin
2017-03-20
Halftoning with Go - Part 2
2017-03-04
Grid paintings à la Mondrian with JavaScript
2017-01-26
A short introduction and conclusion to the OpenBikes 2016 Challenge
2017-01-09
Challenge Big Data @ TSE
2016-11-27
Halftoning with Go - Part 1
2016-03-30
Predicting the availability of Vélib' bikes @ Toulouse Data Science Meetup
2016-03-25
Recursive polygons with JavaScript
2015-09-10
The Naïve Bayes classifier
2015-08-02
An introduction to genetic algorithms
2015-07-14
Setting up a droplet to host a Flask app
2015-06-03
Visualizing bike stations live data