I have a neural network with a lot of inputs, and I want to train it to realise that only one of the inputs matters. First I train it with input[1] = 1 and a target result of 10,
then I train with exactly the same inputs except input[1] = 0 and a target result of 0.
I train on each example until the error is 0 before switching to the other, but the network just keeps nudging different weights up and down until the output matches the target; it never figures out that only the weights connected to input[1] matter.
Is this a common problem, so to say, that can be bypassed somehow?
P.S. I'm using the sigmoid activation and its derivative.
What you are doing is incremental (or selective) learning: each time you re-train the network on new data for several epochs, you overfit that new data. If in your case you don't care about incremental learning and only care about the result on your data set, it is better to train on batches drawn from your whole data set over several iterations, until your network converges, instead of fitting whichever example it saw last.
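A minimal sketch of that idea in C#, assuming a hypothetical TrainOnExample method that does one forward/backward pass and returns the squared error for that example (the interface and all names below are placeholders, not part of any library):

    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Stand-in for whatever network class you already have.
    public interface INetwork
    {
        double TrainOnExample(double[] inputs, double target); // one forward/backward pass, returns squared error
    }

    public static class BatchTrainer
    {
        public static void Train(INetwork network,
                                 IList<(double[] Inputs, double Target)> examples,
                                 int maxEpochs = 1000, double tolerance = 1e-4)
        {
            var rng = new Random();
            for (int epoch = 0; epoch < maxEpochs; epoch++)
            {
                // Shuffle so both contradictory-looking examples are seen in every epoch.
                var shuffled = examples.OrderBy(_ => rng.Next()).ToList();

                double epochError = 0.0;
                foreach (var (inputs, target) in shuffled)
                    epochError += network.TrainOnExample(inputs, target);

                // Stop when the average error over the whole data set is small,
                // not when a single example's error hits zero.
                if (epochError / shuffled.Count < tolerance)
                    break;
            }
        }
    }

Because every epoch sees both examples, the only weights that can consistently reduce the average error are the ones connected to input[1].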
I need to show a chart from data returned from an API.
This API could potentially return millions of results, but doing so would tax the server heavily.
Thus, I'm looking for a way to return fewer results and still show a trend in the chart. Basically, I'm looking to "smooth" the line of the graph by showing only relevant points.
Is there a .NET library that could help with this implementation? Or perhaps a "smoothing" function that takes a limit on the number of points to return?
What would be your target number of results? One approach would be to just take a sampling of the points: for every 10 points you have, return 1, for instance. In that case, you could use LINQ to accomplish this: Sampling a list with linq
This doesn't address the "showing only relevant points" part of your question, though. That's a little harder to solve programmatically. What does "relevant" mean in your data? Exceeding a certain deviation?
So maybe a moving average of your data would work. Take 10 points at a time, average them, return 1 point. Like this example: Smoothing data from a sensor
With either of those approaches, you can trade off accuracy and 'smoothness' by varying the "10" in the above examples. The higher the number, the "smoother" your result.
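Here is a minimal sketch of both ideas with LINQ; the class and method names are mine, only the every-Nth sampling and block-averaging approaches come from the answer above:

    using System.Collections.Generic;
    using System.Linq;

    public static class ChartDownsampling
    {
        // Simple sampling: keep every Nth point.
        public static List<double> SampleEveryNth(IReadOnlyList<double> points, int n) =>
            points.Where((_, index) => index % n == 0).ToList();

        // Simple smoothing: average each consecutive block of N points into one point.
        public static List<double> AverageBlocks(IReadOnlyList<double> points, int n) =>
            points.Select((value, index) => new { value, index })
                  .GroupBy(x => x.index / n)
                  .Select(g => g.Average(x => x.value))
                  .ToList();
    }

Varying n trades the number of returned points against how closely the line follows the raw data.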
I'm looking for a way to detect faulty sensors in an IoT environment.
In this case it is a tank level sensor. The readings always fluctuate somewhat, and the "hop" at the beginning is a tank refill, which is "normal". On Sep 16 the sensor started to malfunction and just gives apparently random values after that.
As a programmer, ideally I'd like a simple way of detecting the problem (and as soon after it starts as possible).
I can mess about with rules like "if the direction of the vector between two hourly averages changes more than once per day, it is unstable", but I guess there are more sound and stable algorithms out there.
Two simple options:
Domain knowledge based: if you know the maximum possible output of the tank (say 5 liters/h), any output above that would signal an error (a small code sketch of this check follows after the list). I.e., in the case of the example, a fault is flagged if
t1 - t2 > 5
assuming t1 and t2 show the tank level at hourly intervals. You might want to add a safety margin related to sensor accuracy.
Past data based: assuming that all tanks are similar regarding output capacity and sensor quality, calculate the following over all your data from non-faulty sensors:
max(t1 - t2)
The result is the error threshold to be used, similar to the value 5 above.
Note: tank refill operation might require additional consideration.
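A rough sketch of the domain-knowledge check, assuming the readings are already reduced to one level value per hour (the method and parameter names are mine):

    using System.Collections.Generic;

    public static class TankSensorCheck
    {
        // Returns the index of the first hour at which the level drops faster than the
        // tank can physically be emptied, or null if no such hour exists.
        // maxDrainPerHour plays the role of the value 5 above; margin is the
        // sensor-accuracy safety margin mentioned in the answer.
        public static int? FirstFaultIndex(IReadOnlyList<double> hourlyLevels,
                                           double maxDrainPerHour, double margin = 0.0)
        {
            for (int i = 1; i < hourlyLevels.Count; i++)
            {
                double drop = hourlyLevels[i - 1] - hourlyLevels[i];
                // A rise is a refill (normal); only an impossibly fast drop is flagged.
                if (drop > maxDrainPerHour + margin)
                    return i;
            }
            return null;
        }
    }

The past-data variant uses the same function: pass max(t1 - t2) computed from your known-good sensors as maxDrainPerHour.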
Additional methods are described e.g. here. You can find other papers for sure.
http://bourbon.usc.edu/leana/pubs-full/sensorfaults.pdf
Standard deviation.
You're looking at how much variation there is between the measurements. Standard deviation is an easy, well-known formula. Look for a high value, and you know there's a problem.
You can also use the coefficient of variation, which is the ratio of the standard deviation to the mean.
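A small sketch of that check, assuming hourly readings and a threshold you would pick from data of known-good sensors (the names and the sliding-window idea are my own additions):

    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class SensorStats
    {
        // Standard deviation of a window of readings.
        public static double StdDev(IReadOnlyList<double> window)
        {
            double mean = window.Average();
            double variance = window.Sum(x => (x - mean) * (x - mean)) / window.Count;
            return Math.Sqrt(variance);
        }

        // Coefficient of variation: standard deviation relative to the mean,
        // useful when tanks of different sizes should share one threshold.
        public static double CoefficientOfVariation(IReadOnlyList<double> window) =>
            StdDev(window) / window.Average();

        // True when the last windowSize readings fluctuate more than threshold allows.
        public static bool LooksFaulty(IReadOnlyList<double> readings, int windowSize, double threshold)
        {
            var window = readings.Skip(Math.Max(0, readings.Count - windowSize)).ToList();
            return StdDev(window) > threshold;
        }
    }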
I'm implementing Ng's example of an OCR neural network in C#.
I think I've got all the formulas correctly implemented [vectorized version], and my app is training the network.
Any advice on how I can see my network improving in recognition, without testing examples manually by drawing them after the training is done? I want to see where my training is going while the network is being trained.
I've tested my trained weights on drawn digits; the output on all neurons is quite similar (approx. 0.077, or something like that, on all neurons), and the largest value is on the wrong neuron. So the result doesn't match the drawn image.
This is the only test I'm doing so far: Cost Function changes with epochs
So, this is what happens with the cost function (some call it the objective function?) over 50 epochs.
My lambda value is set to 3.0, the learning rate is 0.01, and there are 5000 examples; I do a batch update after each epoch, i.e. after those 5000 examples. Activation function: sigmoid.
input: 400
hidden: 25
output: 10
I don't know what proper values are for lambda and learning rate so that my network can learn without overfitting or underfitting.
Any suggestions on how to find out whether my network is learning well?
Also, what value should J cost function have after all this training?
Should it approach zero?
Should I have more epochs?
Is it bad that my examples are all ordered by digits?
Any help is appreciated.
Q: Any suggestions on how to find out whether my network is learning well?
A: Split the data into three groups: training, cross-validation and test. Validate your result with the test data. This is actually addressed in the course later.
Q: Also, what value should the J cost function have after all this training? Should it approach zero?
A: I recall that in the homework Ng mentioned what the expected value is. The regularized cost should not be zero, since it includes a sum over all the weights.
Q: Should I have more epochs?
A: If you run your program long enough (less than 20 minutes?) you will see the cost stops getting smaller. I assume it has reached a local/global optimum, so more epochs would not be necessary.
Q: Is it bad that my examples are all ordered by digits?
A: The algorithm modifies the weights for every example, so a different order of the data does affect each step within a batch. However, the final result should not differ much.
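If you do want to remove the ordering effect, a minimal Fisher-Yates shuffle of the example indices before each epoch would look like this (the class and method names are mine):

    using System;

    public static class ExampleShuffler
    {
        // Shuffle the example indices so consecutive weight updates are not
        // all driven by the same digit class.
        public static int[] ShuffledIndices(int exampleCount, Random rng)
        {
            var indices = new int[exampleCount];
            for (int i = 0; i < exampleCount; i++) indices[i] = i;

            for (int i = exampleCount - 1; i > 0; i--)
            {
                int j = rng.Next(i + 1);
                (indices[i], indices[j]) = (indices[j], indices[i]);
            }
            return indices;
        }
    }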
I am developing a program to study neural networks. By now I understand (I think) the differences involved in dividing a dataset into 3 sets (training, validation & testing). My networks may have just one output or multiple outputs, depending on the dataset and the problem. The learning algorithm is back-propagation.
So, the problem basically is that I am getting confused with each error and the way to calculate it.
What is the training error? If I want to use the MSE, is it (desired - output)^2? But then, what happens if my network has 2 or more outputs? Is the training error the sum over all outputs?
Then, the validation error is just using the validation data set to calculate the output and comparing the obtained results with the desired results; this gives me an error. Is it computed the same way as the training error, and with multiple outputs?
And finally, one thing that is not totally clear: when should the validation run? Somewhere I read that it could be once every 5 epochs, but is there any rule for this?
Thanks in advance for your time!
For multiple output neurons, to calculate the training error in each epoch/iteration, you take each output value and get its difference from the target value for that neuron. Square it, do the same for the other output neurons, and then take the mean.
So e.g. with two output neurons,
MSE = (|op1 - targ1|^2 + |op2 - targ2|^2) / 2
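The same formula for any number of output neurons, as a small C# helper (the class and method names are mine):

    using System;

    public static class Errors
    {
        // Mean squared error over all output neurons for a single example,
        // matching the two-neuron formula above.
        public static double MeanSquaredError(double[] outputs, double[] targets)
        {
            if (outputs.Length != targets.Length)
                throw new ArgumentException("outputs and targets must have the same length");

            double sum = 0.0;
            for (int i = 0; i < outputs.Length; i++)
            {
                double diff = outputs[i] - targets[i];
                sum += diff * diff;
            }
            return sum / outputs.Length;
        }
    }

The training error for the whole set is then just this value averaged over all training examples; the validation and test errors average it over their respective sets.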
The training, validation and test errors are calculated the same way. The difference is when they are run and how they are used.
The full validation set is usually checked on every training epoch. To speed up computation, you could perhaps run it every 5 epochs instead.
The result of the validation check is not used to update the weights, only to decide when to exit training. It's used to decide whether the network has generalized from the data and not overfitted.
Check the pseudocode in the first answer in this question
whats is the difference between train, validation and test set, in neural networks?
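A rough sketch of such a training loop with validation-based early stopping; the interface methods stand in for whatever network code you already have, and the names and the patience parameter are my own choices:

    using System.Collections.Generic;

    // Placeholders for your own network and example types.
    public interface ITrainableNetwork
    {
        void TrainOneEpoch(IList<(double[] Inputs, double[] Targets)> trainingSet);
        double MeanError(IList<(double[] Inputs, double[] Targets)> dataSet); // e.g. the MSE above
    }

    public static class EarlyStoppingTrainer
    {
        public static void Train(ITrainableNetwork net,
                                 IList<(double[] Inputs, double[] Targets)> trainingSet,
                                 IList<(double[] Inputs, double[] Targets)> validationSet,
                                 int maxEpochs = 500, int patience = 10)
        {
            double bestValidationError = double.MaxValue;
            int epochsWithoutImprovement = 0;

            for (int epoch = 0; epoch < maxEpochs; epoch++)
            {
                // Weights are updated from the training set only.
                net.TrainOneEpoch(trainingSet);

                // The validation error is measured but never used to update weights.
                double validationError = net.MeanError(validationSet);

                if (validationError < bestValidationError)
                {
                    bestValidationError = validationError;
                    epochsWithoutImprovement = 0;
                }
                else if (++epochsWithoutImprovement >= patience)
                {
                    break; // validation error stopped improving: likely overfitting from here on
                }
            }
        }
    }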
I have been doing some research with neural networks, and the concept and theory as a whole make sense to me. The one question that sticks out, which I haven't been able to find an answer to yet, is how many neurons should be used in a neural net to achieve proper/efficient results, including the number of hidden layers, neurons per hidden layer, etc. Do more neurons necessarily give more accurate results (while being more taxing on the system), or will fewer neurons still be sufficient? Is there some sort of governing rule to help determine those numbers? Does it depend on the type of training/learning algorithm implemented in the neural net? Does it depend on the type of data/input presented to the network?
If it makes the questions easier to answer, I will most likely be using feedforward and backpropagation as the main methods for training and prediction.
On a side note, is there a prediction algorithm/firing rule or learning algorithm that is generally regarded as "the best/most practical", or is that also dependent on the type of data being presented to the network?
Thanks to anyone with any input, it's always appreciated!
EDIT: Regarding the C# tag, that is the language in which I'll be putting together my neural network, if that information helps at all.
I specialized in AI / NN in college, and have had some amateur experience working on them for games, and here is what I found as a guide for getting started. Realize, however, that each NN will take some tweaking to work best in your chosen environment. (One potential solution is to expose your program to 1000s of different NNs, set up testable criteria for performance, and then use a genetic algorithm to propagate more useful NNs and cull less useful ones, but that is a whole other very large post...)
In general, I found:
Input Layer - One AN for each input value + 1 Bias (always 1)
Inner Layer - Double the Input Layer
Output Layer - One AN for each Action or Result
Example: Character Recognition
If you are examining a 10x10 grid for character recognition;
start with 101 Input AN (one for each pixel, plus one bias)
202 Inner AN
and 26 Output AN (one for each letter of the alphabet)
Example: Blackjack
If you are building a NN to "win at blackjack";
start with 16 Input AN (13 to count the occurrences of each card rank, 1 for the player hand value, 1 for the dealer "up-card", and 1 bias)
32 Inner AN
and 6 Output AN (one each for "Hit", "Stay", "Split", "Double", "Surrender" and "Insurance")
Some general rules are the following, based on the paper 'Approximating Number of Hidden Layer Neurons in Multiple Hidden Layer BPNN Architecture' by Saurabh Karsoliya. Source here
The number of hidden layer neurons is 2/3 (or 70% to 90%) of the size of the input layer. If this is insufficient, the number of output layer neurons can be added later on.
The number of hidden layer neurons should be less than twice the number of neurons in the input layer.
The size of the hidden layer should be between the input layer size and the output layer size.
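Those rules of thumb as tiny C# helpers, just to make the arithmetic concrete (the names are mine, and the results are only starting points to tweak):

    using System;

    public static class LayerSizing
    {
        // "2/3 (or 70% to 90%) of the size of the input layer."
        public static int HiddenFromInputFraction(int inputNeurons, double fraction = 2.0 / 3.0) =>
            Math.Max(1, (int)Math.Round(inputNeurons * fraction));

        // "Less than twice the number of neurons in the input layer."
        public static int HiddenUpperBound(int inputNeurons) => 2 * inputNeurons;

        // "Between the input layer size and the output layer size."
        public static bool IsBetweenInputAndOutput(int hiddenNeurons, int inputNeurons, int outputNeurons) =>
            hiddenNeurons >= Math.Min(inputNeurons, outputNeurons) &&
            hiddenNeurons <= Math.Max(inputNeurons, outputNeurons);
    }

For example, with 400 inputs the first rule suggests roughly 267 hidden neurons, and the second caps the hidden layer below 800.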