I'm implementing Ng's OCR neural network example in C#.
I think I've implemented all the formulas correctly (vectorized version), and my app is training the network.
Any advice on how I can see my network improving in recognition, without manually testing examples by drawing them after training is done? I want to see where my training is going while it's being trained.
I've tested my trained weights on drawn digits; the output on all neurons is quite similar (approx. 0.077, or something like that, on all neurons), and the largest value is on the wrong neuron. So the result doesn't match the drawn image.
This is the only test I'm doing so far: Cost Function changes with epochs
So this is what happens with the cost function (some call it the objective function?) over 50 epochs.
My lambda value is set to 3.0, the learning rate is 0.01, and I have 5000 examples. I do a batch update after each epoch, i.e. after those 5000 examples. Activation function: sigmoid.
Input: 400
Hidden: 25
Output: 10
I don't know what the proper values are for lambda and the learning rate so that my network can learn without overfitting or underfitting.
Any suggestions on how to find out whether my network is learning well?
Also, what value should the cost function J have after all this training?
Should it approach zero?
Should I have more epochs?
Is it bad that my examples are all ordered by digits?
Any help is appreciated.
Q: Any suggestions on how to find out whether my network is learning well?
A: Split the data into three groups: training, cross-validation and test. Validate your result with the test data. This is actually addressed later in the course.
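For example, a minimal C# sketch of such a split (the 60/20/20 ratio, the `Example` record and the shuffle are my own assumptions, not from the course):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical container for one labelled training example.
public record Example(double[] Inputs, int Label);

public static class DataSplitter
{
    // Shuffles the examples and splits them 60/20/20 into
    // training, cross-validation and test sets.
    public static (List<Example> Train, List<Example> Validation, List<Example> Test)
        Split(IList<Example> examples, int seed = 42)
    {
        var rng = new Random(seed);
        var shuffled = examples.OrderBy(_ => rng.Next()).ToList();

        int trainCount = (int)(shuffled.Count * 0.6);
        int valCount   = (int)(shuffled.Count * 0.2);

        var train      = shuffled.Take(trainCount).ToList();
        var validation = shuffled.Skip(trainCount).Take(valCount).ToList();
        var test       = shuffled.Skip(trainCount + valCount).ToList();

        return (train, validation, test);
    }
}
```

Train on the first set, use the validation set to tune lambda / the learning rate and decide when to stop, and report the final accuracy on the untouched test set.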
Q: Also, what value should the cost function J have after all this training? Should it approach zero?
A: I recall that in the homework Ng mentioned what the expected value is. The regularized cost should not be zero, since it includes a sum over all the weights.
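For reference, the regularized cost from the course has this form (my transcription); the second term sums the squared non-bias weights, so J stays above zero even when the network fits the data well:

$$
J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[ y_k^{(i)} \log\big(h_\Theta(x^{(i)})\big)_k + \big(1 - y_k^{(i)}\big)\log\Big(1 - \big(h_\Theta(x^{(i)})\big)_k\Big)\right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\big(\Theta_{j,i}^{(l)}\big)^2
$$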
Q: Should I have more epochs?
A: If you run your program long enough (less than 20 minutes?), you will see that the cost stops getting smaller. I assume it has reached a local/global optimum, so more epochs would not be necessary.
Q: Is it bad that my examples are all ordered by digits?
A: The algorithm modifies the weights for every example, so a different order of the data does affect each individual step within a batch. However, the final result should not differ much.
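If you want to remove the ordering anyway, a plain Fisher-Yates shuffle of the examples once per epoch is enough; this is a generic sketch, not code from the question:

```csharp
using System;
using System.Collections.Generic;

public static class ShuffleExtensions
{
    // In-place Fisher-Yates shuffle; call once per epoch if you want
    // the examples visited in a different random order each time.
    public static void Shuffle<T>(this IList<T> list, Random rng)
    {
        for (int i = list.Count - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);                    // 0 <= j <= i
            (list[i], list[j]) = (list[j], list[i]);    // swap
        }
    }
}
```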
Related
I need to show a chart from data returned from an API.
This API could potentially return millions of results, but it would tax the server heavily.
Thus, I'm looking for a way to return a smaller number of results and still show the trend in the chart. Basically, I'm looking to "smooth" the line of the graph by showing only relevant points.
Is there a .NET library that could help me with this implementation? Or perhaps a "smoothing" function that takes a limit on the number of points to return?
What would be your target number of results? One approach would be to just take a sampling of the points. For every 10 points you have, return 1, for instance. In which case, you could use Linq to accomplish this: Sampling a list with linq
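A minimal take-every-Nth LINQ sketch along those lines (the method name and generic point type are my own):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ChartSampling
{
    // Keeps every Nth point, e.g. step = 10 returns 1 point out of every 10.
    public static List<T> SampleEveryNth<T>(IEnumerable<T> points, int step) =>
        points.Where((_, index) => index % step == 0).ToList();
}
```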
This doesn't address the "showing only relevant points" part of your question, though. That's a little harder to solve programmatically. What does "relevant" mean in your data? Exceeding a certain deviation?
So maybe a moving average of your data would work. Take 10 points at a time, average them, return 1 point. Like this example: Smoothing data from a sensor
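A simple non-overlapping block average in the same spirit (method name and numeric type are assumptions; a sliding window would also work):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ChartSmoothing
{
    // Averages each consecutive block of `windowSize` values into one point,
    // e.g. windowSize = 10 turns 1000 raw values into 100 smoothed ones.
    public static List<double> BlockAverage(IReadOnlyList<double> values, int windowSize)
    {
        var result = new List<double>();
        for (int i = 0; i < values.Count; i += windowSize)
        {
            int count = System.Math.Min(windowSize, values.Count - i);
            result.Add(values.Skip(i).Take(count).Average());
        }
        return result;
    }
}
```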
With either of those approaches, you can trade off accuracy and 'smoothness' by varying the "10" in the above examples. The higher the number, the "smoother" your result.
I'm looking for a way to detect faulty sensors in an IoT environment.
In this case it is a tank level sensor. The readings always fluctuate somewhat, and the "hop" at the beginning is a tank refill, which is "normal". On Sep 16 the sensor started to malfunction and just gives apparently random values after that.
As a programmer I'd ideally like a simple way of detecting the problem (and as soon after it starts as possible).
I could mess about with something like "if the direction of the vector between two hourly averages changes more than once per day, it is unstable", but I guess there are more sound and stable algorithms out there.
Two simple options:
Domain-knowledge based: if you know the maximum possible output of the tank (say 5 liters/h), any apparent output above that signals an error. I.e., in the case of the example, if
t1 - t2 > 5
assuming t1 and t2 are the tank readings at hourly intervals. You might want to add a safety margin for sensor accuracy.
Past-data based: assuming that all tanks are similar in output capacity and in the quality of the sensors used, calculate the following over all your data from non-faulty sensors:
max(t1-t2)
The result is the error threshold to be used, similar to the value 5 above.
Note: tank refill operation might require additional consideration.
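A minimal C# sketch of both checks, assuming hourly level readings, an illustrative 5 l/h maximum output and an arbitrary safety margin (refills, i.e. rising levels, are simply ignored here):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TankSensorCheck
{
    // Option 1 (domain knowledge): flag any hourly drop larger than the
    // physically possible output plus a safety margin.
    public static bool LooksFaulty(IReadOnlyList<double> hourlyLevels,
                                   double maxOutputPerHour = 5.0,
                                   double safetyMargin = 0.5)
    {
        for (int i = 1; i < hourlyLevels.Count; i++)
        {
            double drop = hourlyLevels[i - 1] - hourlyLevels[i];
            // Negative drops mean the level rose (a refill); skip those.
            if (drop > maxOutputPerHour + safetyMargin)
                return true;
        }
        return false;
    }

    // Option 2 (past data): derive the threshold from historical readings
    // of sensors known to be healthy.
    public static double ThresholdFromHistory(IReadOnlyList<double> healthyHourlyLevels) =>
        Enumerable.Range(1, healthyHourlyLevels.Count - 1)
                  .Max(i => healthyHourlyLevels[i - 1] - healthyHourlyLevels[i]);
}
```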
Additional methods are described, for example, in this paper; you can find other papers for sure:
http://bourbon.usc.edu/leana/pubs-full/sensorfaults.pdf
Standard deviation.
You're looking at how much variation there is between the measurements. Standard deviation is an easy, well-known formula. Look for a high value, and you know there's a problem.
You can also use the coefficient of variation, which is the ratio of the standard deviation to the mean.
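A small sketch of those two statistics over a window of recent readings (the window handling and the 0.2 threshold are placeholders, not recommendations):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class VariationCheck
{
    // Population standard deviation of a window of readings.
    public static double StdDev(IReadOnlyList<double> window)
    {
        double mean = window.Average();
        double variance = window.Sum(x => (x - mean) * (x - mean)) / window.Count;
        return Math.Sqrt(variance);
    }

    // Coefficient of variation: standard deviation divided by the mean.
    public static double CoefficientOfVariation(IReadOnlyList<double> window) =>
        StdDev(window) / window.Average();

    // Flags a window whose relative variation exceeds a chosen threshold.
    public static bool IsUnstable(IReadOnlyList<double> window, double cvThreshold = 0.2) =>
        CoefficientOfVariation(window) > cvThreshold;
}
```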
I have a neural network with a lot of inputs, and I want to train it to realize that only one of the inputs matters. First I train it with input[1] = 1 and a target result of 10,
then I train with the exact same inputs except input[1] = 0 and a target result of 0.
I train on each until the error is 0 before I switch to the other one, but the network just keeps changing various weights up and down until the output equals the target; it never figures out that only the weights related to input[1] need to be adjusted.
Is this a common problem, so to speak, that can be worked around somehow?
P.S. I'm using sigmoid activations and their derivatives.
What you are doing is incremental or selective learning: each time you re-train the network on new data for several epochs, you are overfitting that new data. If in your case you don't care about incremental learning and just care about the result over your whole data set, it is better to use batches from your data set over several iterations, until your network converges and doesn't overfit the training data.
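A rough sketch of that batch-style loop, with a hypothetical `ITrainableNetwork` interface standing in for your own network code:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal interface; your network class would provide these.
public interface ITrainableNetwork
{
    void TrainOnExample(double[] inputs, double[] targets); // one backprop step
    double MeanError(IList<(double[] Inputs, double[] Targets)> data);
}

public static class InterleavedTraining
{
    // Present the whole data set every epoch instead of driving the error
    // to zero on one example before switching to the next.
    public static void Train(ITrainableNetwork network,
                             IList<(double[] Inputs, double[] Targets)> data,
                             int maxEpochs = 1000,
                             double targetError = 1e-3)
    {
        var rng = new Random(0);
        for (int epoch = 0; epoch < maxEpochs; epoch++)
        {
            // Visit the examples in a fresh random order each epoch.
            for (int i = data.Count - 1; i > 0; i--)
            {
                int j = rng.Next(i + 1);
                (data[i], data[j]) = (data[j], data[i]);
            }

            foreach (var (inputs, targets) in data)
                network.TrainOnExample(inputs, targets);

            // Stop once the error over the whole set is small enough.
            if (network.MeanError(data) < targetError)
                break;
        }
    }
}
```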
I am developing a program to study neural networks. By now I understand (I guess) the reasons for dividing a dataset into three sets (training, validation & test). My networks may have just one output or multiple outputs, depending on the dataset and the problem. The learning algorithm is back-propagation.
So, the problem is basically that I am getting confused about each error and the way to calculate it.
What is the training error? If I want to use the MSE, is it (desired - output)^2? But then, what happens if my network has 2 or more outputs? Is the training error the sum over all outputs?
Then, for the validation error, I just use the validation data set to calculate the outputs and compare the obtained results with the desired results; this gives me an error. Is it computed the same way as the training error, and how does it work with multiple outputs?
And finally, it is not totally clear to me when the validation should run. Somewhere I read that it could be once every 5 epochs, but is there any rule for this?
Thanks in advance for your time!
For multiple output neurons, to calculate the training error in each epoch/iteration, you take each output value and get its difference from the target value for that neuron. Square it, do the same for the other output neurons, and then take the mean.
So, e.g., with two output neurons,
MSE = (|op1 - targ1|^2 + |op2 - targ2|^2 ) / 2
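A direct C# translation of that formula for any number of output neurons (names are my own):

```csharp
public static class ErrorMetrics
{
    // Mean squared error over all output neurons for one example:
    // MSE = sum_k (op_k - targ_k)^2 / K
    public static double MeanSquaredError(double[] outputs, double[] targets)
    {
        double sum = 0.0;
        for (int k = 0; k < outputs.Length; k++)
        {
            double diff = outputs[k] - targets[k];
            sum += diff * diff;
        }
        return sum / outputs.Length;
    }
}
```

For the whole training or validation set, you would average this value over all examples in the set.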
The training, validation and test errors are calculated the same way. The difference is when they are run and how they are used.
The full validation set is usually checked every training epoch. To speed up computation, you could run it every 5 epochs instead.
The result of the validation check is not used to update the weights, only to decide when to exit training. It's used to decide whether the network has generalized on the data and not overfitted.
Check the pseudocode in the first answer to this question:
What is the difference between train, validation and test sets in neural networks?
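A rough C# outline of that kind of loop, using a hypothetical `INetwork` interface; the patience value is arbitrary:

```csharp
using System.Collections.Generic;

// Hypothetical minimal interface; your own network class would provide these.
public interface INetwork
{
    void TrainOneEpoch(IList<(double[] Inputs, double[] Targets)> trainingSet);
    double MeanError(IList<(double[] Inputs, double[] Targets)> dataSet);
}

public static class EarlyStopping
{
    // Trains until the validation error stops improving for `patience` epochs.
    // The validation error is only inspected, never used to update weights.
    public static int Train(INetwork network,
                            IList<(double[] Inputs, double[] Targets)> trainingSet,
                            IList<(double[] Inputs, double[] Targets)> validationSet,
                            int maxEpochs = 1000,
                            int patience = 10)
    {
        double bestValError = double.MaxValue;
        int epochsWithoutImprovement = 0;

        for (int epoch = 0; epoch < maxEpochs; epoch++)
        {
            network.TrainOneEpoch(trainingSet);
            double valError = network.MeanError(validationSet);

            if (valError < bestValError)
            {
                bestValError = valError;          // still generalizing
                epochsWithoutImprovement = 0;
            }
            else if (++epochsWithoutImprovement >= patience)
            {
                return epoch;                     // validation stopped improving
            }
        }
        return maxEpochs;
    }
}
```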
I have a stream of data that I get at around 10 snapshots per second. I wrote a C# controller that takes this data and adjusts some data-gathering parameters based on the distance away from the expected result. Currently, I just do a linear scale operation on my data so that the farther away I am from the expected result the more it corrects. The problem is that the incoming data stream is delayed by somewhere between 0.5 and 2 seconds (I can calculate that at runtime). Because of this delay, it is correcting for results from a while ago and is constantly overcorrecting and even correcting in the wrong direction sometimes.
I am looking for an algorithm to do the following:
Implement a correction algorithm (I'd prefer PID) that will attempt to home in on the optimal value
Predict a certain amount of time (or number of datasets) in advance based on the history of corrections and results
What options do I have in terms of algorithms that can accomplish this?
C# code samples would be appreciated; I am not great at converting complex pseudo-code into a working implementation.
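For reference, a minimal discrete PID update in C# looks roughly like this (the gains are placeholders, and this sketch does not by itself compensate for the 0.5-2 s measurement delay, which would need something like a Smith predictor on top):

```csharp
using System;

// Minimal discrete PID controller; Kp/Ki/Kd must be tuned for the process.
public sealed class PidController
{
    private readonly double _kp, _ki, _kd;
    private double _integral;
    private double _previousError;
    private bool _first = true;

    public PidController(double kp, double ki, double kd)
    {
        _kp = kp; _ki = ki; _kd = kd;
    }

    // setpoint: desired value; measurement: latest (delayed) reading;
    // dt: seconds since the previous update (~0.1 s at 10 snapshots/s).
    public double Update(double setpoint, double measurement, double dt)
    {
        double error = setpoint - measurement;
        _integral += error * dt;

        double derivative = _first ? 0.0 : (error - _previousError) / dt;
        _first = false;
        _previousError = error;

        return _kp * error + _ki * _integral + _kd * derivative;
    }
}
```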