Neural Nets
TUTORIAL ON MACHINE LEARNING AND NEURAL NETWORKS
Let's just dive right in. Below I've embedded a neural network classifier rendered with TensorFlow Playground. There are a variety of knobs and buttons on the interface, and more of these options will become available as we move along. Don't worry: all of them will be explained in detail, in due time. For now, let's define our primary goal throughout this tutorial: categorization.
Our primary task is to train neural nets to classify items into categories based on some limited information: fruit vs. vegetable, undergrad major, or Alzheimer's Disease patient (CASE) vs. control participant (CTRL). In the TensorFlow Playground below, you can see a bunch of orange and blue dots. Instead of simply thinking of these as dots at arbitrary spatial coordinates, I think it will be helpful to think of them as representing people in a clinical study. Let's say the blue dots are patients from the CASE group, and the orange dots are CTRL participants. Now what is our 'limited information' about them? Let's say we have collected each participant's age and their score on a cognitive exam. So the first thing we'd probably want to do is make a scatter plot of these two variables. Let's define their respective dimensions on the plot axes as:
- x-axis | dim1 | current age (AGE)
- y-axis | dim2 | exam score (SCORE)
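To make this setup concrete, here is a minimal sketch of that scatter plot in Python with NumPy. The data below are synthetic stand-ins generated for illustration; the actual Playground dots follow their own distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the Playground dots:
# CASE patients (blue) tend to be older with lower exam scores,
# CTRL participants (orange) younger with higher scores.
n = 50
case_age, case_score = rng.normal(2, 1, n), rng.normal(-2, 1, n)
ctrl_age, ctrl_score = rng.normal(-2, 1, n), rng.normal(2, 1, n)

age = np.concatenate([case_age, ctrl_age])        # dim1, x-axis (AGE)
score = np.concatenate([case_score, ctrl_score])  # dim2, y-axis (SCORE)
label = np.array([1] * n + [-1] * n)              # 1 = CASE, -1 = CTRL

# With matplotlib available, the scatter plot would be:
#   plt.scatter(age, score, c=label)
```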
Notice that after plotting, the dots seem to form clusters. That's very promising! If you were asked to draw a line on this plane to separate the two clusters, it could be done easily. Our brain's neural nets have already solved the spatial problem. Now let's see if an artificial neural net can solve the same problem.
Go ahead and click the blue start button below; let it run for about 500 epochs (~5 seconds), then click pause.
{{#widget:Tensorflow1}}
Finished?
How'd it do? Is one neuron with input from a single feature (the dim1 data: AGE) performing well in the separation task? If so, an orange-colored background should have formed behind the orange dots, while a blue-colored background should have formed behind the blue dots. This colored surface gradient can be understood as the neural network's prediction value at that given coordinate. We will explore prediction values in more detail later on in the tutorial. First let's take a look at what the neural net is taking as inputs.
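That colored background can be sketched as a grid evaluation: compute the network's prediction at every coordinate on the plot, and let its sign pick the color. The weights below are made-up placeholders for whatever the trained network learned, not values taken from the Playground.

```python
import numpy as np

# Hypothetical weights for a single tanh neuron; the Playground's
# trained values will differ.
w1, w2, b = 1.0, 0.0, 0.0   # only AGE (x) contributes here

# Evaluate the neuron's prediction on a grid covering the plot area.
xs = np.linspace(-6, 6, 121)
ys = np.linspace(-6, 6, 121)
X, Y = np.meshgrid(xs, ys)
pred = np.tanh(w1 * X + w2 * Y + b)   # values strictly in (-1, 1)

# The sign of the prediction picks the background color at each
# coordinate (here, positive -> blue side, negative -> orange side).
color = np.where(pred > 0, "blue", "orange")
```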
Inputs
Take a close look at the input options in Figure-1 on the right. There are a bunch of X variables with subscripts and superscripts, and next to each is a box with various color gradients. In keeping with the example above, here is what those symbols represent for us...
- X₁ | AGE
- X₂ | SCORE
- X₁² | AGE²
- X₂² | SCORE²
- X₁X₂ | AGE × SCORE
- sin(X₁) | sin(AGE)
- sin(X₂) | sin(SCORE)
These are parsed such that subscripts (X₁, X₂, ... Xᵢ) represent each predictor variable, like AGE and SCORE, while superscripts represent powers of those variables (so X₁² is AGE squared). As you can see, the first two input options X₁ and X₂ are just X_AGE and X_SCORE.
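In code, deriving these seven inputs from the raw measurements is just elementwise arithmetic. A sketch with NumPy, using made-up participant values:

```python
import numpy as np

# Raw measurements for three hypothetical participants
age = np.array([-2.0, 0.5, 3.0])    # X1 (mean-centered years)
score = np.array([1.0, -1.5, 2.0])  # X2 (mean-centered exam score)

# The seven input features the Playground can feed to the network
features = {
    "X1":      age,
    "X2":      score,
    "X1^2":    age ** 2,
    "X2^2":    score ** 2,
    "X1*X2":   age * score,
    "sin(X1)": np.sin(age),
    "sin(X2)": np.sin(score),
}
```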
Note that since X₁ is plotted on the x-axis, its box shows color varying only along the horizontal direction (vertical bands of constant color); since X₂ is plotted on the y-axis, its color varies only vertically (horizontal bands). This might seem backwards at first, so to clarify: if the only thing we know about these study participants is their AGE, X₁, we can only make a 1-D plot with each person's age along the x-axis, such that [ x = AGEᵢ , y = 0 ].
If you take a look at Figure-2, it should be clear that when information is collapsed onto its single dimension and plotted along the x-axis, the best line we can draw to separate the dim-1 data will be orthogonal to the x-axis. In plain terms, if given a pencil, and asked to separate the orange and blue dots plotted near the number line, it'd best be a vertical line!
When the neural net only gets input about a single feature of each person in the dataset, its synaptic weights can only adapt the output along that one dimension. Thus if, for example, the network sees that a person is 3 years above the dataset average (the data has been mean-centered), it won't matter what that person's cognitive SCORE was, since the neural net has no access to that information: it will always make the same guess for anyone 3 years above average. Hence the predicted color is constant along the vertical line x = 3, for any value of y.
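A sketch of that behavior: a tanh neuron wired only to AGE necessarily gives the same answer for any SCORE. The weight and bias here are hypothetical, not values from the Playground.

```python
import math

# Hypothetical learned parameters for a neuron that only receives AGE
w_age, bias = 1.2, 0.0

def predict(age, score):
    # SCORE never enters the computation: the network was not given
    # that input, so it has no weight for it.
    return math.tanh(w_age * age + bias)

# Two participants, both 3 years above the mean age, with very
# different exam scores, get an identical prediction:
p_low = predict(3.0, -4.0)
p_high = predict(3.0, +4.0)
assert p_low == p_high
```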
Outputs
What does the network actually output? Most directly, it is the value spit out by the activation function of the 'output layer'. Here, since we only have a single layer, our 'hidden layer' and 'output layer' are one and the same. The output function of our neuron is known as the tanh function.
The tanh function is an extremely common choice of output function in artificial neural network machine learning frameworks because it yields a nice sigmoid shape, and no matter the magnitude of its inputs, the output of the tanh function is bounded between { -1 : 1 }. These are very desirable properties for neural net nodes. Here you see the tanh function evaluated across various x-dim inputs...
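A quick check of those properties with Python's math module:

```python
import math

# Output stays strictly between -1 and 1, even for large inputs
assert -1 < math.tanh(-10.0) < 1
assert -1 < math.tanh(10.0) < 1
assert math.tanh(0.0) == 0.0

# tanh takes exact rational values at natural-log arguments:
# tanh(ln k) = (k**2 - 1) / (k**2 + 1), so tanh(ln 3) = 0.8
assert abs(math.tanh(math.log(3)) - 0.8) < 1e-12
```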
Tanh makes most of its sigmoid transition over the input range { -2 : 2 }, and evaluates to exact rational values when its argument is a natural logarithm (for instance, tanh(ln 3) = 0.8). Speaking of sigmoids, the closely related logistic function is another very common choice of output function, for the same reasons as tanh.
For now, let's not belabor the point that our neuron (and, going forward, all our neurons) uses the tanh function. Just keep this in mind if you're wondering what sorts of numbers are travelling along the axons of these neurons, ultimately producing those colored gradients underneath the dots.
This tutorial continues on the next page. Don't worry about playing around too much with the TensorFlow GUI, there will be plenty of that on the next page, and those that follow.