A fair number of cutting-edge machine learning models, including deep networks, are trained through stochastic learning. In a stochastic setting, a finite training set is assumed to be at one's disposal. At every time step, an example is sampled at random from the training set. The example is passed through the model, the prediction is compared to the ground truth, and finally the error gradient is used to update the model's parameters. Typically, this process is carried out for each element of the training set, and the whole pass is repeated for several epochs until a (heuristic) stopping criterion is met.
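
To make this concrete, here is a minimal sketch of the loop just described, assuming a plain linear model trained with squared loss; the function and parameter names are purely illustrative and not taken from any particular library.

```python
import random

def train_sgd(X, y, n_epochs=10, lr=0.01):
    """Stochastic gradient descent over a finite training set (toy sketch)."""
    n_features = len(X[0])
    weights = [0.0] * n_features

    for epoch in range(n_epochs):
        # Visit every example once per epoch, in a random order
        for i in random.sample(range(len(X)), len(X)):
            x_i, y_i = X[i], y[i]
            # Forward pass: compute the model's prediction
            y_pred = sum(w * x for w, x in zip(weights, x_i))
            # Compare the prediction to the ground truth
            error = y_pred - y_i
            # Use the error gradient to update each parameter
            weights = [w - lr * error * x for w, x in zip(weights, x_i)]

    return weights
```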

The same mathematics underpin both stochastic machine learning and online machine learning, which is why the two concepts are rarely distinguished from each other. However, I would like to argue (and remind) that online machine learning goes beyond stochastic gradient descent in several ways. Indeed, it brings several additional principles to the table that change the way we think about applying machine learning. In online machine learning, the emphasis is on the fact that models keep learning once they have been deployed. In particular, good online models are designed to deal with concept drift, and they sometimes provide guarantees that their performance will remain stable through time. It is also assumed that new features might appear on-the-fly. Likewise, some features might not be available whenever a prediction has to be made. Meanwhile, some concepts simply do not apply: overfitting, convergence, and epochs do not make sense in an online machine learning scenario, because the data is assumed to be infinite, whilst its distribution and difficulty are likely to vary through time.
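
As a sketch of what this looks like in practice, here is a toy predict-then-learn loop, assuming a linear model whose features live in dictionaries; the class name, feature names, and stream are hypothetical, but they illustrate how features can appear on-the-fly or be missing at prediction time.

```python
from collections import defaultdict

class OnlineLinearRegression:
    """A toy online linear model updated one example at a time."""

    def __init__(self, lr=0.01):
        self.lr = lr
        # Unseen features default to a weight of 0, so new features
        # can show up at any point in the stream
        self.weights = defaultdict(float)

    def predict(self, x):
        # Only the features present in this particular example are used
        return sum(self.weights[name] * value for name, value in x.items())

    def learn(self, x, y):
        error = self.predict(x) - y
        for name, value in x.items():
            self.weights[name] -= self.lr * error * value


model = OnlineLinearRegression()

# The stream is potentially infinite; each example is seen only once
stream = [
    ({'temperature': 20.0}, 3.0),
    ({'temperature': 22.0, 'humidity': 0.4}, 3.5),  # a new feature appears
    ({'humidity': 0.5}, 1.2),                       # a feature is missing
]

for x, y in stream:
    y_pred = model.predict(x)  # predict first, as in deployment
    model.learn(x, y)          # then update once the ground truth arrives
```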

Online machine learning is therefore not just a family of algorithms; it is a different line of thought, where deployment and real-world concerns are treated as first-class citizens. This distinction has been made by researchers other than myself, such as Léon Bottou. It's important to call a spade a spade, which is what Patrick Winston called the Rumpelstiltskin Principle. I believe this is especially relevant for topics that are not so much in the limelight.