Machine Learning Without Code

Posted on Wed 15 February 2017 in Blog

I've grown increasingly frustrated with the way the popular press talks about Machine Learning and Artificial Intelligence.

Too many articles talk about Machine Learning (ML) as a sci-fi black box capable of doing and learning anything, and leave the reader with a poorer understanding of what ML is than had they not read the article. I'm especially tired of reading headlines that start with "Google's AI Learns ...", as if they have one big machine that does everything and learns autonomously like a human.

I think it's possible for somebody with no background in math or programming to speak intelligently about the subject, and to understand the uses and limitations of Machine Learning.

A technical note: ML is usually divided into three branches, the most common of which is supervised learning¹. In this blog post I will only discuss supervised learning, and to keep things simple I'll use the phrase Machine Learning/ML to mean just that one branch.

The TL;DR of Machine Learning & Artificial Intelligence

Machine Learning is a tool for building software systems where decision-making rules are discovered from automated processing of data rather than being explicitly programmed. If I thought any one sentence would sufficiently explain ML to the uninitiated I wouldn't be writing this blog post, so please read the more in-depth explanation and examples below.

As for Artificial Intelligence - unfortunately, these days people tend to use the phrases AI and ML interchangeably.

To me, Artificial Intelligence is set of goals and Machine Learning is a set of methods. These days the cutting-edge breakthroughs in AI tend to be achieved using ML methods (especially deep neural networks), but it doesn't have to be that way. Deep Blue was programmed without any Machine Learning.

Otto the Fruit Separator

Let's say we manage a fruit processing plant. We hire a new employee, and want him to sit next to a conveyor belt of fruit and separate out the apples and the oranges. His name is Otto. Otto is a very simple guy - he has never seen an apple or an orange, and has absolutely no common sense or ability to make inferences. Otto will do exactly what you tell him quickly and efficiently, but nothing more - which is why he is named Otto (pronounced "auto" - get it?).

Otto Otto

My metaphor was not very subtle, but to be clear - my description of Otto also describes exactly what a computer is. Please keep that in mind whenever people talk about computers "learning" things - after reading this post you should understand a bit more about what it means for ML algorithms to "learn" or "be trained".

Broadly speaking, there are two ways to teach Otto how to separate apples and oranges: the heuristic way or the Machine Learning way.

Apples vs. Oranges by Heuristics

A heuristic is a rule of thumb, a guideline that seems to work in general but isn't necessarily proven scientifically to be ideal in every case.

We, as managers of a fruit processing plant, know a lot about what apples and oranges look like. So we can just give Otto detailed instructions on how to decide whether a fruit is an apple or an orange, and if we write these instructions wisely then Otto should be able to separate the fruit easily.

We set up a scanner facing the conveyor belt that scans each incoming fruit. This scanner outputs to a screen the dominant color of the fruit. We then give him a sheet of paper explaining what to do in each case.

Color	Bucket To Throw Fruit Into
Red	Apple
Orange	Orange
Green	Apple
Anything Else	Unknown / Mystery Fruit

There, that was easy. Since Otto doesn't have anything to figure out, he should be able to do a good job from this point on... unless some prankster puts a red bell pepper on the apple vs. orange conveyor belt and makes poor Otto look bad.

Apples vs. Oranges by Machine Learning

The Machine Learning approach is quite different. We still give Otto his scanner and screen that displays the fruit's color - but instead of giving Otto the instructions on how to decide, we instead spend his first week at work sending fruit down the conveyor belt that we've laboriously hand-labeled with stickers reading "apple" or "orange". In Machine Learning we call these labeled fruit the training data.

We've also given him a sheet of paper with a table like this:

Color	Apples	Oranges	Unknown
Red
Orange
Green
etc.

Along with a set of instructions on what to do with the labeled fruit. Specifically, for each labeled fruit he should make a tick mark in the cell of the table corresponding to the color of the fruit and its label. After telling Otto to add up the tick marks (Otto is fantastic at arithmetic when the problem is very clear), it might look like:

Color	Apples	Oranges	Unknown
Red	189	21	1
Orange	21	275	3
Green	50	13	5
Black	1	2	10

We should expect that not all fruits identified as green will be apples - because some unripe oranges are green, and sometimes we'll accidentally place the wrong label on a fruit, and sometimes the scanner will malfunction and output the wrong color.

From there, Otto should go row by row and note which cell had the highest number per row, put an X in that cell, and erase everything else. After a bit of cleanup it might look like:

Color	Apples	Oranges	Unknown
Red	X
Orange		X
Green	X
Black			X

At this point we've helped Otto create his own set of rules, like the instructions we gave him when separating fruit by heuristics. He can use this table once his training period is over and we start running non-labeled fruit on the conveyor belt. If the incoming fruit is green, for example, he would see that the table has an X for apples in the green row, and throw the new fruit in the apple bucket.

While this was the simplest algorithm I could think of that would still count as ML, there exist many types of ML algorithms. Most of them are far more complex and powerful than this one - but what they all have in common is that they turn a human-supplied algorithm and human-supplied data into computer-generated rules on which to make decisions.

Machine Learning Terms

Before we continue, a few terms

training data - the labeled examples that the software processes in order to generate rules for new predictions. These were the fruit we put labels on.

model - the rules that the computer will generate after processing the training data. This is like the sheet that Otto made after seeing the labeled fruit.

test data - data that wasn't used for training. This can either refer to a labeled data set that's used to understand how good the model is, or for unlabeled data the model will have to process once it goes to work. The fruit that Otto processed without labels were his test data.

feature - an aspect of an object that we're informing the ML algorithm of. The only feature that Otto saw was color, but we could've used weight, shape, smell, etc.

Comparing ML vs. Heuristics

The ML method seemed more difficult than the heuristic-based method - and in this case it was almost certainly overkill. Why would we ever use Machine Learning over heuristic methods?

Because many problems are harder for humans to write heuristics for. In many cases the complexity of the heuristics would be overwhelming - like helping a self-driving car decide if an image of the road ahead contains a pedestrian. In others the signal is not as complicated, but is obscured by a lot of noise - like predicting which people on a website will click on an ad. A human could spend a lifetime studying online advertising data in order to figure it out - or we could "train" a computer on terabytes of data in a few hours and let it separate the signal from the noise.

Regardless of whether Machine Learning was a good choice for this example, it is clear that it differs from the heuristic method in key ways. When decisions are made by heuristics, the computer programmer writes out a series of criteria on which the software (or Otto) should make a decision when it sees new data. When Machine Learning is used, however, we're giving the software a detailed method it can use to turn a set of labeled training data into criteria on which to make decisions. A key part of this is finding or creating a good training data set.

The software is "learning" only in the ways that we've explicitly told it to learn, and only from examples that we've given it.

Machine Learning Challenges

If you work with people who build Machine Learning systems, it's important that you understand the challenges involved in building useful ML systems.

Training data must resemble the test data

Imagine that our labeled fruit includes only green apples, but when Otto's training period is over he's exposed to both green and red apples. He would probably start claiming that red apples are a type of orange, and the person responsible for that failure won't be Otto - it will be whoever put together his training data.

Alternatively, what if he's trained on apples and oranges but a red bell pepper ends up on the conveyor belt after training. We can't blame Otto for not knowing that it's not an apple. But that's not just an issue of having comparable train/test sets, it's also an issue because we told Otto to only care about color.

Feature selection

Feature selection is how the human programmer decides what types of information about fruit the ML algorithm should be exposed to.

In the example above the only feature was color. But this will fail miserably once we add red, green and orange peppers to the conveyor belt - we would need to use additional features (density might work) in order to train Otto to differentiate new types of fruit.

Finding or creating labeled training data

When training Otto we had to put stickers on hundreds of apples and oranges by hand. That sounds very boring and time-consuming, doesn't it?

Imagine that we were building an ML-based computer vision system to detect pedestrians in a roadway. Such systems often require huge amounts of training data to work well, and we can't automate the labeling - because if labeling was already automated, we wouldn't need to build this software. A lot of money is spent on paying people to build good training data sets.

For many applications, simple ML models trained on large amounts of good data tend to be more effective than more sophisticated models trained on worse data.

Take-Aways

I hope you come away from this blog post understanding that Machine Learning is not a silver bullet, and it's not sci-fi style Artificial Intelligence. ML systems are capable of "learning" in only limited ways, and anthropomorphizing Machine Learning can be misleading.

It's an interesting and powerful way to solve many problems. It doesn't take much math and programming skill to start building ML models, and if your interest has been piqued there are many high quality online classes².

[1] the other two being unsupervised learning (understanding the structure of unlabeled data, like clustering customers based on their interests) and reinforcement learning (figuring out how sequences of actions lead to good and bad outcomes, like in a video game).

[2] For a course that uses R, start with this one from Stanford.