Neural Nets: Difference between revisions

From bradwiki
No edit summary
 
(33 intermediate revisions by 2 users not shown)
Line 1: Line 1:
<big>TUTORIAL ON MACHINE LEARNING AND NEURAL NETWORKS</big>{{SmallBox|float=right|clear=none|width=170px|font-size=13px|Tutorial Pages|txt-size=11px|
{{SmallBox|float=right|clear=none|margin=0px 0px 8px 18px|width=170px|font-size=13px|Tutorial Pages|txt-size=11px|
1. [[Neural Nets|Tensorflow Intro]]<br>
1. [[Neural Nets|Intro]]<br>
2. [[Neural Nets 2|Neural Net Classifiers]]<br>
2. [[Neural Nets 2|Network Inputs]]<br>
3. [[Neural Nets 3|DIY Machine Learning]]<br>
3. [[Neural Nets 3|Network Activation]]<br>
4. [[Neural Nets 4|PCA &nbsp; t-SNE]]<br>
4. [[Neural Nets 4|Network Outputs]]<br>
}}
}}
Below is an embedded neural network classifier, rendered using the [https://www.tensorflow.org Tensorflow] [http://playground.tensorflow.org Playground]. There are a variety of knobs and buttons on the interface; as we move along, more of these options will become available. Don't worry though, all of these will be explained in detail in due time. For now, let's define our primary goal throughout this tutorial: '''classification'''.


Let's just dive right in... Below I've embedded a neural network classifier rendered using the [https://www.tensorflow.org Tensorflow] [http://playground.tensorflow.org Playground]. There are a variety of knobs and buttons on the interface; as we move along, more of these options will become available. Don't worry though, all of these will be explained in detail in due time. For now, let's define our primary goal throughout this tutorial: '''categorization'''.
The primary task is to train neural nets to classify items into categories based on some limited information: for example, fruit or vegetable; undergrad major; or Alzheimer's Disease patient (CASE) versus control participant (CTRL). In the Tensorflow playground below, you can see a bunch of orange and blue dots. Instead of simply thinking about these as dots at arbitrary spatial coordinates, it will be helpful to think of them as representing people in a clinical study. Let's say the blue dots are patients from the CASE group, and the orange dots are CTRL participants. What is our 'limited information' about them? Let's say we have collected information about their age and their score on a dementia screening exam (scores represent the number of items forgotten). So the first thing we'd probably want to do is make a scatter plot of these two variables. Let's define their respective dimensions on the plot axes as:
 
Our primary task is to train neural nets to classify items into categories based on some limited information: for example, fruit or vegetable; undergrad major; or Alzheimer's Disease patient (CASE) versus control participant (CTRL). In the Tensorflow playground below, you can see a bunch of orange and blue dots. Instead of simply thinking about these as dots at arbitrary spatial coordinates, I think it will be helpful to think of them as representing people in a clinical study. Let's say the blue dots are patients from the CASE group, and the orange dots are CTRL participants. Now what is our 'limited information' about them? Let's say we have collected information about their age and their score on a cognitive exam. So the first thing we'd probably want to do is make a scatter plot of these two variables. Let's define their respective dimensions on the plot axes as:


* x-axis | dim1 | current age ('''''AGE''''')
* x-axis | dim1 | current age ('''''AGE''''')
Line 15: Line 14:
Notice that after plotting, the dots seem to form clusters. That's very promising! If you were asked to draw a line on this plane to separate these two clusters, it could be easily done. Our brain's neural nets have already solved the spatial problem. Now let's see if an artificial neural net can solve the same problem.
Notice that after plotting, the dots seem to form clusters. That's very promising! If you were asked to draw a line on this plane to separate these two clusters, it could be easily done. Our brain's neural nets have already solved the spatial problem. Now let's see if an artificial neural net can solve the same problem.


Go ahead and click the blue ''start'' button below; let it run for about 500 epochs (~5 seconds), then click pause.
Click the blue ''start'' button below; let it run for about 500 epochs (~5 seconds), then click pause.
 
{{Clear}}
 
===Tensorflow Playground===
 
<iframe key="tf" path="#activation=tanh&regularization=L1&batchSize=15&dataset=gauss&regDataset=reg-plane&learningRate=0.003&regularizationRate=0.001&noise=0&networkShape=1&seed=0.73576&showTestData=false&discretize=false&percTrainData=40&x=true&y=false&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false&showTestData_hide=true&stepButton_hide=true&problem_hide=true&noise_hide=true&discretize_hide=true&regularization_hide=true&dataset_hide=true&batchSize_hide=true&percTrainData_hide=true&regularizationRate_hide=true&learningRate_hide=true&numHiddenLayers_hide=true" />
 
 
 


{{Clear}}


{{#widget:Tensorflow1}}
Is one neuron with input from ''a single feature'' (the dim1 data: AGE) performing well in the separation task? If so, an orange-colored background should have formed behind the orange dots, while a blue-colored background should have formed behind the blue dots. This colored surface gradient can be understood as the neural network's prediction value at that given coordinate. We will explore prediction values in more detail later on in the tutorial. First let's take a look at what the neural net is taking as inputs.
<br><br><br>
<big>'''''Finished?'''''</big>


How'd it do? Is one neuron with input from ''a single feature'' (the dim1 data: AGE) performing well in the separation task? If so, an orange-colored background should have formed behind the orange dots, while a blue-colored background should have formed behind the blue dots. This colored surface gradient can be understood as the neural network's prediction value at that given coordinate. We will explore prediction values in more detail later on in the tutorial. First let's take a look at what the neural net is taking as inputs.
{{Clear}}


<br>
===Inputs===
===Inputs===
----
----
Line 31: Line 37:
|clear=none  
|clear=none  
|width=250px
|width=250px
|margin=5px -10% 5px auto
|margin=5px 2%
|border-width=2px  
|border-width=2px  
|border-radius=2px
|border-radius=2px
|[[File: Neural Net Features.png|250px]]
|[[File: Neural Net Features.png|250px]]
| Figure 1
}}
}}


Take a close look at the input options in the figure on the right. There are a bunch of X variables with subscripts and superscripts, next to boxes with various color gradients. These are parsed such that subscripts (''X''<sub>1</sub> , ''X''<sub>2</sub> ,... ''X''<sub>i</sub> ) represent each predictor variable, like AGE and SCORE. So, as you can see, the first two input options ''X''<sub>1</sub> and ''X''<sub>2</sub> are just ''X''<sub>AGE</sub> and ''X''<sub>SCORE</sub>.  
To assess the performance, take a look at the input options in ''Figure-1''. There are a bunch of ''X'' variables with subscripts and superscripts, and next to each is a box with various color gradients. For now, let's focus on just two of those symbols, and what they mean to us...


Note that since ''X''<sub>1</sub> is plotted on the x-axis, it has a vertical color gradient; since ''X''<sub>2</sub> is plotted on the y-axis, it has a horizontal gradient. This might seem backwards, so to clarify: if the only thing we know about these study participants is their AGE, ''X''<sub>1</sub>, we can plot each person's age along the x-axis on a flat 1-D line, such that [ x = ''AGE''<sub>i</sub> , y = 0 ].
{| class="wikitable" width=30% align=center  
 
 
{| class="wikitable" width=40% align=center  
|+ style="font-weight:bold;"|Input Features
|+ style="font-weight:bold;"|Input Features
|- style="height:35px"
|- style="height:30px"
| style="background:#fcf9f2; border:1px solid #ffffff"| X<sub>1</sub>  
| style="background:#f7f7f7; border:3px solid #ffffff"| ''X''<sub>1</sub>  
|colspan=2 style="background:#fcf9f2; border:1px solid #ffffff"| AGE
|colspan=2 style="background:#f7f7f7; border:3px solid #ffffff"| AGE
|- style="height:35px"
|- style="height:30px"
| style="background:#fcf9f2; border:1px solid #ffffff"| X<sub>2</sub>   
| style="background:#f7f7f7; border:3px solid #ffffff"| ''X''<sub>2</sub>   
|colspan=2 style="background:#fcf9f2; border:1px solid #ffffff"|  SCORE
|colspan=2 style="background:#f7f7f7; border:3px solid #ffffff"|  SCORE
|- style="height:35px"
| style="background:#fcf9f2; border:1px solid #ffffff"| X<sub>1</sub><sup>2</sup>
|colspan=2 style="background:#fcf9f2; border:1px solid #ffffff"|  AGE<sup>2</sup>
|- style="height:35px"
| style="background:#fcf9f2; border:1px solid #ffffff"| X<sub>2</sub><sup>2</sup>
|colspan=2 style="background:#fcf9f2; border:1px solid #ffffff"|  SCORE<sup>2</sup>
|- style="height:35px"
| style="background:#fcf9f2; border:1px solid #ffffff"| X<sub>1</sub>X<sub>2</sub>
|colspan=2 style="background:#fcf9f2; border:1px solid #ffffff"|  AGE × SCORE
|- style="height:35px"
| style="background:#fcf9f2; border:1px solid #ffffff"| sin(X<sub>1</sub>)
|colspan=2 style="background:#fcf9f2; border:1px solid #ffffff"|  sin(AGE)
|- style="height:35px"
| style="background:#fcf9f2; border:1px solid #ffffff"| sin(X<sub>2</sub>)
|colspan=2 style="background:#fcf9f2; border:1px solid #ffffff"|  sin(SCORE)
|}
|}


These are parsed such that subscripts (''X''<sub>1</sub> , ''X''<sub>2</sub> ,... ''X''<sub>i</sub> ) represent each predictor variable, like AGE and SCORE. As you can see, the first two input options ''X''<sub>1</sub> and ''X''<sub>2</sub> are just ''X''<sub>AGE</sub> and ''X''<sub>SCORE</sub>. Note that since ''X''<sub>1</sub> is plotted on the x-axis, it has a color gradient that changes horizontally, but is constant in the vertical dimension. Conversely the ''X''<sub>2</sub> feature plotted on the y-axis has a vertical color gradient. To clarify why this happens...


If the only thing we know about these study participants is their AGE, ''X''<sub>1</sub>, we can only make a 1-D plot with each person's age along the x-axis, such that [ x = ''AGE''<sub>i</sub> , y = 0 ]. If you take a look at Figure-2, it should be clear that when information is collapsed onto its single dimension and plotted along the x-axis, the best line we can draw to separate the dim-1 data will be orthogonal to the x-axis (a vertical line). As you move horizontally along the x-axis your categorical guess will likely change, along with the confidence in that guess, which is precisely what is being represented by the color gradient. On the other hand, knowing nothing about exam score, moving up and down on the y-axis will have no effect on your decision, which is why color is constant in the y-dimension.


[[File:NN NumberLine.png|thumb|left|300px]]
When the neural net only gets input about a single feature of each person in the dataset, its synaptic weights can only adapt the output along that one dimension. For example, if the network sees that a person is 3 years above the dataset average (assuming the data have been ''mean-centered''), it won't matter what that person's cognitive SCORE was, since the neural net doesn't have access to that info; the network will always make the same guess for anyone 3 years above the average age. This is why color is constant at ''x''=3 for any ''y'' value.


{{Clear}}


{{SmallBox|display=block
|float=left
|clear=none
|width=420px
|margin=15px 5%
|border-width=2px
|border-radius=2px
|[[File:NN NumberLine.png|400px]]
| Figure 2
}}


{{Clear}}


This isn't a shortcoming of having just one single neuron in the entire network. You could add as many neurons and layers as you want (go ahead and try it), but if the network only gets input about one feature dimension, the result will be essentially the same, whether there is 1 neuron or 1 billion. To see why, pretend you can only see the dots as they are plotted along the number line in 1D (in Figure 2); if we were unable to see the 2D cluster clouds above that line, the billions of neurons in our brain would tell us to draw the classification line in basically the same place as that one single neuron in our artificial neural net. This points to a concept worth noting: neural net classifiers can fail for two very different reasons.


<br><br><br>
(1) The neural network itself might be ill-formulated in such a way that, no matter how much information you provide, it cannot seem to learn to solve the classification problem. (2) On the other hand, you might have implemented an apposite deep neural network; yet if the input data is insufficient to solve the classification problem, it will appear to you that this potentially very good neural network performs like garbage. <br>
===Outputs===
----


More directly, the network's output is the value spit out by the activation function of the 'output layer'. Here, since we only have a single layer, our 'hidden layer' and 'output layer' are one and the same. The output function of our neuron is known as the '''tanh''' function.
With that said, there are ways to help prevent that latter scenario from happening. These involve transforming the inputs, as you can see in the rest of the input features. The next page will discuss the full set of possible network inputs we have here, which includes...


The tanh function is an extremely common choice for an output function in artificial neural network machine learning frameworks because it yields a nice sigmoid shape, and no matter the magnitude of its inputs, the output from the tanh function is bounded between { -1 : 1 }. These are very desirable properties for neural net nodes. Here you see the tanh function evaluated across various x-dim inputs...


<br><br><br><br>
{{Clear}}
[[File: Tanh.png|thumb|500px|left|see [http://reference.wolfram.com/language/ref/Tanh.html tanh on wolfram alpha] for many details about tanh function.]]
{{Clear}}
{{Clear}}


Tanh produces a sigmoid-shaped output bounded between -1 and 1, with most of the transition happening for inputs roughly in the range {-2 : 2}. It also evaluates to exact values when its argument is the natural logarithm of a rational number. Another very common choice of output function, for the same reasons as tanh, is the logistic sigmoid, which is bounded between 0 and 1.
{| class="wikitable" width=30% align=center
|+ style="font-weight:bold;"|Input Features
|- style="height:30px"
| style="background:#f7f7f7; border:3px solid #ffffff"| ''X''<sub>1</sub>
|colspan=2 style="background:#f7f7f7; border:3px solid #ffffff"| AGE
|- style="height:30px"
| style="background:#f7f7f7; border:3px solid #ffffff"| ''X''<sub>2</sub> 
|colspan=2 style="background:#f7f7f7; border:3px solid #ffffff"|  SCORE
|- style="height:30px"
| style="background:#f7f7f7; border:3px solid #ffffff"| ''X''<sub>1</sub><sup>2</sup>
|colspan=2 style="background:#f7f7f7; border:3px solid #ffffff"|  AGE<sup>2</sup>
|- style="height:30px"
| style="background:#f7f7f7; border:3px solid #ffffff"| ''X''<sub>2</sub><sup>2</sup>
|colspan=2 style="background:#f7f7f7; border:3px solid #ffffff"|  SCORE<sup>2</sup>
|- style="height:30px"
| style="background:#f7f7f7; border:3px solid #ffffff"| ''X''<sub>1</sub>''X''<sub>2</sub>
|colspan=2 style="background:#f7f7f7; border:3px solid #ffffff"|  AGE × SCORE
|- style="height:30px"
| style="background:#f7f7f7; border:3px solid #ffffff"| sin(''X''<sub>1</sub>)
|colspan=2 style="background:#f7f7f7; border:3px solid #ffffff"|  sin(AGE)
|- style="height:30px"
| style="background:#f7f7f7; border:3px solid #ffffff"| sin(''X''<sub>2</sub>)
|colspan=2 style="background:#f7f7f7; border:3px solid #ffffff"|  sin(SCORE)
|}


For now, let's not belabor the point that our neuron (and, going forward, all our neurons) uses the tanh function. Just keep this in mind if you're wondering what sorts of numbers are travelling along the axons of these neurons, and ultimately producing those colored gradients underneath the dots.


This tutorial continues on the next page. Don't worry about playing around too much with the TensorFlow GUI, there will be plenty of that on the next page, and those that follow.
{{Clear}}
{{SmallBox|'''[[Neural Nets 2|Continue to Neural Nets Tutorial Page 2]]'''}}


<br>
<!-- <btn data-toggle="tooltip">Neural Nets 2</btn> -->
{{SmallBox|'''[[Neural Nets 2|Continue to Neural Nets Tutorial Page 2]]'''}}

Latest revision as of 05:20, 4 May 2020

Tutorial Pages

Below is an embedded neural network classifier, rendered using the Tensorflow Playground. There are a variety of knobs and buttons on the interface; as we move along, more of these options will become available. Don't worry though, all of these will be explained in detail in due time. For now, let's define our primary goal throughout this tutorial: classification.

The primary task is to train neural nets to classify items into categories based on some limited information: for example, fruit or vegetable; undergrad major; or Alzheimer's Disease patient (CASE) versus control participant (CTRL). In the Tensorflow playground below, you can see a bunch of orange and blue dots. Instead of simply thinking about these as dots at arbitrary spatial coordinates, it will be helpful to think of them as representing people in a clinical study. Let's say the blue dots are patients from the CASE group, and the orange dots are CTRL participants. What is our 'limited information' about them? Let's say we have collected information about their age and their score on a dementia screening exam (scores represent the number of items forgotten). So the first thing we'd probably want to do is make a scatter plot of these two variables. Let's define their respective dimensions on the plot axes as:

  • x-axis | dim1 | current age (AGE)
  • y-axis | dim2 | exam score (SCORE)

Notice that after plotting, the dots seem to form clusters. That's very promising! If you were asked to draw a line on this plane to separate these two clusters, it could be easily done. Our brain's neural nets have already solved the spatial problem. Now let's see if an artificial neural net can solve the same problem.

Click the blue start button below; let it run for about 500 epochs (~5 seconds), then click pause.

Tensorflow Playground

<iframe key="tf" path="#activation=tanh&regularization=L1&batchSize=15&dataset=gauss&regDataset=reg-plane&learningRate=0.003&regularizationRate=0.001&noise=0&networkShape=1&seed=0.73576&showTestData=false&discretize=false&percTrainData=40&x=true&y=false&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false&showTestData_hide=true&stepButton_hide=true&problem_hide=true&noise_hide=true&discretize_hide=true&regularization_hide=true&dataset_hide=true&batchSize_hide=true&percTrainData_hide=true&regularizationRate_hide=true&learningRate_hide=true&numHiddenLayers_hide=true" />



Is one neuron with input from a single feature (the dim1 data: AGE) performing well in the separation task? If so, an orange-colored background should have formed behind the orange dots, while a blue-colored background should have formed behind the blue dots. This colored surface gradient can be understood as the neural network's prediction value at that given coordinate. We will explore prediction values in more detail later on in the tutorial. First let's take a look at what the neural net is taking as inputs.
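To make "one neuron with input from a single feature" concrete, here is a minimal sketch in Python with NumPy. The embedded Playground configuration does specify a tanh activation, but everything else below is an assumption made for illustration: the synthetic mean-centered ages, the cluster centers, the labels of -1 for CTRL and +1 for CASE, the squared-error loss, and the learning rate. It is not the Playground's own code.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Hypothetical mean-centered data: CTRL (label -1) cluster below the average
# age, CASE (label +1) cluster above it. SCORE is never shown to the neuron.
n = 100
age = np.concatenate([rng.normal(-2.0, 1.0, n), rng.normal(2.0, 1.0, n)])
label = np.concatenate([-np.ones(n), np.ones(n)])

w, b = 0.0, 0.0        # one weight for AGE and one bias
learning_rate = 0.03   # assumed value, chosen only for this sketch

for epoch in range(500):
    pred = np.tanh(w * age + b)           # prediction value, always in (-1, 1)
    err = pred - label
    grad_z = err * (1.0 - pred ** 2)      # gradient of 0.5 * err**2 through tanh
    w -= learning_rate * np.mean(grad_z * age)
    b -= learning_rate * np.mean(grad_z)

pred = np.tanh(w * age + b)
accuracy = np.mean(np.sign(pred) == label)
print(f"w={w:.2f}  b={b:.2f}  accuracy={accuracy:.2f}")
```

The sign of each prediction plays the role of the background color (orange versus blue), and its magnitude plays the role of the color intensity.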

Inputs


Figure 1 (image missing: Neural Net Features.png)


To assess the performance, take a look at the input options in Figure-1. There are a bunch of X variables with subscripts and superscripts, and next to each is a box with various color gradients. For now, let's focus on just two of those symbols, and what they mean to us...

Input Features
X_1 | AGE
X_2 | SCORE

These are parsed such that subscripts (X_1, X_2, ... X_i) represent each predictor variable, like AGE and SCORE. As you can see, the first two input options X_1 and X_2 are just X_AGE and X_SCORE. Note that since X_1 is plotted on the x-axis, it has a color gradient that changes horizontally but is constant in the vertical dimension. Conversely, the X_2 feature plotted on the y-axis has a vertical color gradient. To clarify why this happens...

If the only thing we know about these study participants is their AGE, X_1, we can only make a 1-D plot with each person's age along the x-axis, such that [ x = AGE_i , y = 0 ]. If you take a look at Figure-2, it should be clear that when information is collapsed onto its single dimension and plotted along the x-axis, the best line we can draw to separate the dim-1 data will be orthogonal to the x-axis (a vertical line). As you move horizontally along the x-axis your categorical guess will likely change, along with the confidence in that guess, which is precisely what is being represented by the color gradient. On the other hand, knowing nothing about exam score, moving up and down on the y-axis will have no effect on your decision, which is why color is constant in the y-dimension.
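The idea of a single vertical separating line can be sketched by searching for the best cut point along the age axis. The data below are synthetic and the cluster centers are assumptions; the rule "guess CASE when age is above the cut" is just one simple way to illustrate the point, not how a neural net is trained.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the sketch is repeatable

# Hypothetical mean-centered ages, collapsed onto the x-axis (y = 0 for everyone).
n = 100
age = np.concatenate([rng.normal(-2.0, 1.0, n), rng.normal(2.0, 1.0, n)])
is_case = np.concatenate([np.zeros(n, dtype=bool), np.ones(n, dtype=bool)])

# Candidate vertical lines: midpoints between neighboring ages on the number line.
sorted_age = np.sort(age)
candidates = (sorted_age[:-1] + sorted_age[1:]) / 2.0

# Accuracy of the rule "guess CASE when age > cut" for every candidate cut.
accuracies = np.array([np.mean((age > cut) == is_case) for cut in candidates])
best_cut = candidates[np.argmax(accuracies)]
best_accuracy = accuracies.max()

print(f"best vertical line at x = {best_cut:.2f}, accuracy = {best_accuracy:.2f}")
```

Because the rule only ever looks at age, the boundary it finds is a single x value, which is a vertical line once the points are drawn on the 2-D plane.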

When the neural net only gets input about a single feature of each person in the dataset, its synaptic weights can only adapt the output along that one dimension. For example, if the network sees that a person is 3 years above the dataset average (assuming the data have been mean-centered), it won't matter what that person's cognitive SCORE was, since the neural net doesn't have access to that info; the network will always make the same guess for anyone 3 years above the average age. This is why color is constant at x=3 for any y value.
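A short sketch shows why the color cannot change in the vertical direction. The weight and bias values here are hypothetical stand-ins for whatever a trained neuron would hold; the point is only that SCORE never enters the calculation.

```python
import numpy as np

w_age, bias = 0.8, 0.0   # hypothetical trained values, for illustration only

ages = np.arange(-6.0, 7.0)            # -6, -5, ..., 6 (years from the mean)
scores = np.linspace(-6.0, 6.0, 5)     # any SCORE values will do
age_grid, score_grid = np.meshgrid(ages, scores)   # both have shape (5, 13)

# The prediction surface uses age_grid only; score_grid is never touched.
surface = np.tanh(w_age * age_grid + bias)

# Column 9 of the grid is x = 3: one prediction per SCORE value.
column_at_3 = surface[:, 9]
print(column_at_3)
```

Every entry of `column_at_3` is the same number, tanh(0.8 * 3), so the background color at x = 3 is identical at every height.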

Figure 2 (image missing: NN NumberLine.png)


This isn't a shortcoming of having just one single neuron in the entire network. You could add as many neurons and layers as you want (go ahead and try it), but if the network only gets input about one feature dimension, the result will be essentially the same, whether there is 1 neuron or 1 billion. To see why, pretend you can only see the dots as they are plotted along the number line in 1D (in Figure 2); if we were unable to see the 2D cluster clouds above that line, the billions of neurons in our brain would tell us to draw the classification line in basically the same place as that one single neuron in our artificial neural net. This points to a concept worth noting: neural net classifiers can fail for two very different reasons.

(1) The neural network itself might be ill-formulated in such a way that, no matter how much information you provide, it cannot seem to learn to solve the classification problem. (2) On the other hand, you might have implemented an apposite deep neural network; yet if the input data is insufficient to solve the classification problem, it will appear to you that this potentially very good neural network performs like garbage.

With that said, there are ways to help prevent that latter scenario from happening. These involve transforming the inputs, as you can see in the rest of the input features. The next page will discuss the full set of possible network inputs we have here, which includes...


Input Features
X_1 | AGE
X_2 | SCORE
X_1^2 | AGE^2
X_2^2 | SCORE^2
X_1 X_2 | AGE × SCORE
sin(X_1) | sin(AGE)
sin(X_2) | sin(SCORE)
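As a rough sketch of how the seven inputs in the table relate to the two raw measurements, the derived features can be computed directly from AGE and SCORE. The function name `expand_features` and the example values are hypothetical; the column order simply follows the table.

```python
import numpy as np

def expand_features(age, score):
    """Return one row per person with the seven inputs listed in the table."""
    age = np.asarray(age, dtype=float)
    score = np.asarray(score, dtype=float)
    return np.column_stack([
        age,             # X_1
        score,           # X_2
        age ** 2,        # X_1^2
        score ** 2,      # X_2^2
        age * score,     # X_1 X_2
        np.sin(age),     # sin(X_1)
        np.sin(score),   # sin(X_2)
    ])

# Two made-up participants (values are mean-centered, so they can be negative).
features = expand_features([3.0, -1.0], [2.0, 0.5])
print(features.shape)
```

Each extra column is just a fixed transformation of the original two measurements, so no new data needs to be collected to offer the network these inputs.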


Continue to Neural Nets Tutorial Page 2