Blog on Max Halford
https://maxhalford.github.io/blog/
Recent content in Blog on Max Halford
Hugo -- gohugo.io
en-US
Sat, 22 Aug 2015 06:42:21 -0700
Skyline queries in Python
https://maxhalford.github.io/blog/skyline-queries-in-python/
Tue, 21 May 2019 00:00:00 +0000
Imagine that you’re looking to buy a home. If you have an analytical mind then you might want to tackle this with a quantitative approach. Let’s suppose that you have a list of potential homes, and each home has some attributes that can help you compare them. As an example, we’ll consider three attributes:
- The price of the house, which you want to minimize
- The size of the house, which you want to maximize
- The city where the house is located, which you don’t really care about
Some houses will be objectively better than others because they will be cheaper and bigger.
SQL subquery enumeration
https://maxhalford.github.io/blog/sql-subquery-enumeration/
Mon, 06 May 2019 00:00:00 +0000
I recently stumbled on a rather fun problem during my PhD. I wanted to generate all possible subqueries from a given SQL query. In this case an example is easily worth a thousand words. Take the following SQL query:
SELECT *
FROM customers AS c, purchases AS p, shops AS s
WHERE p.customer_id = c.id
  AND p.shop_id = s.id
  AND c.nationality = 'Swedish'
  AND c.hair = 'Blond'
  AND s.city = 'Stockholm'
Here are all the possible subqueries that can be generated from the above query.
Morellet crosses with JavaScript
https://maxhalford.github.io/blog/morellet-crosses-with-javascript/
Sun, 03 Feb 2019 00:00:00 +0000
These days I’m working on a deep learning project. I hate it but I promised myself to give it a real try. My scripts are taking a long time so I decided to do some procedural art while I waited. This time I’m going to reproduce the following crosses made by François Morellet. I saw them the last time I went to the Musée Pompidou with some friends from university. I don’t have a smartphone anymore so one of my friends was kind enough to take a few pictures for me, including this one.
Streaming groupbys in pandas for big datasets
https://maxhalford.github.io/blog/streaming-groupbys-in-pandas-for-big-datasets/
Wed, 05 Dec 2018 00:00:00 +0000
If you’ve done a bit of Kaggling then you’ve probably typed your fair share of df.groupby(some_col). That is, if you’re using Python. If you’re handling tabular data, then a lot of your features will revolve around computing aggregate statistics. This is very true for the ongoing PLAsTiCC Astronomical Classification challenge. The goal of the competition is to classify objects in the sky into one of 14 groups. The bulk of the available data is a set of so-called light curves.
Target encoding done the right way
https://maxhalford.github.io/blog/target-encoding-done-the-right-way/
Sat, 13 Oct 2018 00:00:00 +0000
When you’re doing supervised learning you often have to deal with categorical variables. That is, variables which don’t have a natural numerical representation. The problem is that most machine learning algorithms require the input data to be numerical. At some point or another a data science pipeline will require converting categorical variables to numerical variables.
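To make the conversion concrete, here is a minimal pure-Python sketch of two common encodings. The toy `colors` list and the helper names `label_encode` and `one_hot_encode` are hypothetical and chosen only for illustration; they are not taken from the post.

```python
# Hypothetical toy data: a single categorical column.
colors = ["red", "green", "blue", "green"]


def label_encode(values):
    """Map each distinct category to an arbitrary integer.

    Here the integer is simply the order of first appearance.
    """
    mapping = {}
    for value in values:
        mapping.setdefault(value, len(mapping))
    return [mapping[value] for value in values], mapping


def one_hot_encode(values):
    """Create one binary column per category (categories sorted alphabetically)."""
    categories = sorted(set(values))
    rows = [[int(value == category) for category in categories] for value in values]
    return rows, categories


labels, mapping = label_encode(colors)
rows, categories = one_hot_encode(colors)

print(labels)      # [0, 1, 2, 1]
print(categories)  # ['blue', 'green', 'red']
print(rows)        # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

In practice a library such as scikit-learn or pandas would usually handle this; the sketch only shows what the resulting numerical representation looks like.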
There are many ways to do so:
- Label encoding, where you choose an arbitrary number for each category
- One-hot encoding, where you create one binary column per category
- Vector representation a.
Stella triangles with JavaScript
https://maxhalford.github.io/blog/stella-triangles-with-javascript/
Thu, 26 Apr 2018 00:00:00 +0000
Around the same time last year I visited the San Francisco Museum of Modern Art. Frank Stella’s compositions really caught my eye. When I saw them I started thinking about how I could write a computer program to imitate his work. In this post I’m going to attempt to reproduce his so-called V Series.
Nice and simple, right? Indeed, in a lot of his work Frank Stella uses straight lines without much randomness.
Unknown pleasures with JavaScript
https://maxhalford.github.io/blog/unknown-pleasures-with-javascript/
Mon, 24 Jul 2017 00:00:00 +0000
No, this blog post is not about how nice JavaScript can be; instead it’s just another one of my attempts at reproducing modern art with procedural generation and the HTML5 <canvas> element. This time I randomly generated images resembling the cover of the album by Joy Division called “Unknown Pleasures”.
According to Wikipedia, this somewhat iconic album cover is based on radio waves. I saw a poster of it in a bar not long ago and decided to reproduce it the next time I had some time to kill.
Subsampling a training set to match a test set - Part 1
https://maxhalford.github.io/blog/subsampling-a-training-set-to-match-a-test-set---part-1/
Mon, 19 Jun 2017 00:00:00 +0000
Some friends and I recently qualified for the final of the 2017 edition of the Data Science Game competition. The first part was a Kaggle competition with data provided by Deezer. The problem was a binary classification task where one had to predict if a user was going to listen to a song that was proposed to him. Like many teams we extracted clever features and trained an XGBoost classifier, classic.
Halftoning with Go - Part 2
https://maxhalford.github.io/blog/halftoning-with-go---part-2/
Mon, 20 Mar 2017 00:00:00 +0000
The next stop on my travel through the world of halftoning will be the implementation of Weighted Voronoi Stippling as described in Adrian Secord’s 2002 paper. This method is more involved than the ones I detailed in my previous blog post; however, the results are quite interesting. Again, I did the implementation in Go.
Notice the black dot in the middle of the white square?
Overview
I found a fair amount of resources about the method, most of them being implementations of Adrian Secord’s paper.
Grid paintings à la Mondrian with JavaScript
https://maxhalford.github.io/blog/grid-paintings-%C3%A0-la-mondrian-with-javascript/
Sat, 04 Mar 2017 00:00:00 +0000
I was at a laundrette today and had just finished my book so I had some time to kill. Naturally I devised an algorithm for generating drawings that would resemble the grid-like paintings that Piet Mondrian made famous. With the benefit of hindsight I guess I could indulge in saner activities while waiting for my laundry to dry!
I went through different ideas but in the end I settled on a recursive approach.
A short introduction and conclusion to the OpenBikes 2016 Challenge
https://maxhalford.github.io/blog/a-short-introduction-and-conclusion-to-the-openbikes-2016-challenge/
Thu, 26 Jan 2017 00:00:00 +0000
During my undergraduate internship in 2015 I started a side project called OpenBikes. The idea was to visualize and analyze bike sharing over multiple cities. Axel Bellec joined me and in 2016 we won a national open data competition. Since then we haven’t pursued anything major; instead we use OpenBikes to try out technologies and to apply concepts we learn at university and online.
Before the 2016 summer holidays one of my professors, Aurélien Garivier, mentioned that he was considering using our data for a Kaggle-like competition between some statistics curriculums in France.
Halftoning with Go - Part 1
https://maxhalford.github.io/blog/halftoning-with-go---part-1/
Sun, 27 Nov 2016 00:00:00 +0000
Recently I stumbled upon this webpage which shows how to use a TSP solver as a halftoning technique. I began to read about related concepts like dithering and stippling. I don’t have any background in photography but I can appreciate the visual appeal of these techniques. As I understand it these techniques were first invented to save ink for printing. However, nowadays printing has become cheaper and the modern use of these techniques is mostly aesthetic, at least for images.
Recursive polygons with JavaScript
https://maxhalford.github.io/blog/recursive-polygons-with-javascript/
Fri, 25 Mar 2016 00:00:00 +0000
I like modern art, I enjoy looking at the stuff that was made at the beginning of the 20th century and thinking how it is still shaping today’s style. I’m not an expert, it’s just a hobby of mine. I especially like the Centre Pompidou in Paris, it’s got loads of fascinating stuff. While I was going through the galleries it struck me that some of the paintings were very geometrical.
The Naïve Bayes classifier
https://maxhalford.github.io/blog/the-na%C3%AFve-bayes-classifier/
Thu, 10 Sep 2015 00:00:00 +0000
The objective of a classifier is to decide to which class (also called label) to assign an observation based on observed data. In supervised learning, this is done by taking into account previous classifications. In other words if we know that certain observations are classified in a certain way, the goal is to determine the class of a new observation. The first group of observations on which the classifier is built is called the training set.
An introduction to genetic algorithms
https://maxhalford.github.io/blog/an-introduction-to-genetic-algorithms/
Sun, 02 Aug 2015 00:00:00 +0000
The goal of genetic algorithms (GAs) is to solve problems whose solutions are not easily found (e.g. NP-hard problems, nonlinear optimization, etc.). For example, finding the shortest path from A to B in a directed graph is easily done with Dijkstra’s algorithm; it can be solved in polynomial time. However, the time to find the shortest path that joins all points on a non-directed graph, also known as the Travelling Salesman Problem (TSP), increases exponentially as the number of points increases.
Setting up a droplet to host a Flask app
https://maxhalford.github.io/blog/setting-up-a-droplet-to-host-a-flask-app/
Tue, 14 Jul 2015 00:00:00 +0000
After having worked for some weeks on the OpenBikes website, it was time to put it online. Digital Ocean seemed to provide a good service and so I decided to give it a spin. Their documentation is quite good but it doesn’t cover exactly everything for setting up Flask. In this post I simply want to record every single step I took.
OpenBikes is a project with a Flask backend and a few upstart jobs.
Visualizing bike stations live data
https://maxhalford.github.io/blog/visualizing-bike-stations-live-data/
Wed, 03 Jun 2015 00:00:00 +0000
Recently some friends and I decided to launch openbikes.co, a website for visualizing (and later on analyzing) urban bike traffic. We have a lot of ideas that we will progressively implement. Anyway, the point is that all of it started with me fiddling about with the JCDecaux API and the leaflet.js library and I would like to share it with you. Shall we?
Presentation
In this post I want to show you the tools and the code to get a fully functional website for visualizing live data.