How to avoid overfitting (Encog3 C#)?

I am new to neural networks and I'm working with Encog3. I have created a feedforward neural network which can be trained and tested.
The problem is that I'm not sure how to prevent overfitting. I know I have to split the data into training, testing and evaluation sets, but I'm not sure where and when to use the evaluation set.
Currently, I split all the data into training and testing sets (50%/50%), train the network on one part and test on the other. Accuracy is 85%.
I tried CrossValidationKFold, but in that case accuracy is only 12% and I don't understand why.
My question is: how can I use the evaluation set to avoid overfitting?
I am confused about the evaluation set and any help would be appreciated.

It is general practice to split the data 60/20/20 (another common split is 80/10/10): 60 percent for training, 20 percent for validation, and the remaining 20 percent for a final test of the previous two. Why three parts? Because it gives you a better picture of how your model performs on data it has never seen before. Another part of the analysis is how representative the training set is: if the validation set contains values that have no representation in your training data, your model will most probably make mistakes on them. It's the same way your brain works: if you learn some rules and then suddenly get a task that is actually an exception to those rules, you will most probably give the wrong answer. If you have problems with learning, you can do the following: increase the dataset, or increase the number of inputs (via some non-linear transformations of your inputs). You may also need to apply an anomaly detection algorithm, and you can consider different normalization techniques.
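As a minimal illustration (the allInputs/allIdeals arrays are hypothetical placeholders for however your normalized data is stored, and this assumes the rows have already been shuffled), a 60/20/20 split feeding Encog's BasicMLDataSet could look like:

// Split pre-shuffled data into 60% training, 20% validation, 20% test.
// Needs: using System.Linq; using Encog.ML.Data.Basic;
double[][] Slice(double[][] src, int from, int to) =>
    src.Skip(from).Take(to - from).ToArray();

int n = allInputs.Length;
int trainEnd = (int)(n * 0.6);
int validEnd = (int)(n * 0.8);

var trainingSet   = new BasicMLDataSet(Slice(allInputs, 0, trainEnd),        Slice(allIdeals, 0, trainEnd));
var validationSet = new BasicMLDataSet(Slice(allInputs, trainEnd, validEnd), Slice(allIdeals, trainEnd, validEnd));
var testSet       = new BasicMLDataSet(Slice(allInputs, validEnd, n),        Slice(allIdeals, validEnd, n));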

As a quick aside, you keep referring to the data as an “evaluation” set. Whilst it is being used in that capacity, the general term is “validation” set, which might allow you better success when googling it.
You’re in something of a chicken-and-egg situation with your current setup. Basically, the sole purpose of the validation set is to prevent overfitting – making no use of a validation set will (for all intents and purposes) result in overfitting. By contrast, the testing set has no part to play in preventing overfitting, it’s just another way of seeing, at the end, whether overfitting might have occurred.
Perhaps it would be easier to take this away from any maths or code (which I assume you have seen before) and imagine this as questions the model keeps asking itself. On every training epoch, the model is desperately trying to reduce its residual error against the training set and, being so highly non-linear, there’s a good chance in structured problems that it will reduce this error to almost nothingness if you allow it to keep running. But that’s not what you’re after. You’re after a model that is a good approximator for all three datasets. So, we make it do the following on every epoch:
“Has my new move reduced the error on the training set?” If yes: “Awesome, I’ll keep going in that direction.”
“Has my new move reduced the error on the validation set?” If yes: “Awesome, I’ll keep going in that direction.”
Eventually, you’ll come to:
“Has my new move reduced the error on the training set?” Yes: “Awesome, I’ll keep going in that direction.”
“Has my new move reduced the error on the validation set?” No, it’s increased: “Perhaps I’ve gone too far.”
If the validation error continues to rise, then you’ve identified the point at which your model is moving away from being a good approximator and moving towards being over-fit to the training set. It’s time to stop. Then you want to apply that final model to your test data and see whether the model is still a good approximator to that data too. And if it is, you have your model.
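In Encog 3 C# terms, a hand-rolled version of that loop might look roughly like the sketch below; network, trainingSet, validationSet and testSet are assumed to already exist (e.g. from a 60/20/20 split), and exact class names can vary slightly between Encog releases:

// A rough early-stopping sketch: keep training while the validation error improves.
var train = new ResilientPropagation(network, trainingSet);
double bestValidationError = double.MaxValue;
int epochsSinceImprovement = 0;
const int patience = 25;                                // epochs tolerated without improvement

while (epochsSinceImprovement < patience)
{
    train.Iteration();                                  // one pass over the training set
    double validationError = network.CalculateError(validationSet);

    if (validationError < bestValidationError)
    {
        bestValidationError = validationError;          // still improving: keep going
        epochsSinceImprovement = 0;
    }
    else
    {
        epochsSinceImprovement++;                        // validation error is rising
    }
}

double testError = network.CalculateError(testSet);     // final check on unseen data

In practice you would also keep a snapshot of the weights from the best validation epoch (for example by cloning the network) and restore it before measuring the test error; newer Encog versions also ship an early-stopping training strategy that automates this pattern.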
A final word: it's good to see you're doing some form of cross-validation, because I've seen that kind of safeguard missed so many times in the past.

Related

Why doesn't my Feed-Forward NN work with varying inputs?

I decided to create a feedforward neural network without using any libraries. I am fairly new to the subject and completely self-taught.
My neural network uses backpropagation to set the weights, and the activation function between all layers (input-hidden1-output) is a sigmoid function.
Let's say that I try to solve a basic problem like the XOR logic gate with my NN. Whenever I use the complete training set (all the possible combinations of 1s and 0s), my NN cannot set the weights in such a way that it produces the desired output. Seemingly it always gets stuck in the middle (the output is ~0.5 in all cases).
On the other hand, when I only iterate over one type of input (let's say 0 and 1), it quickly learns.
Is there a problem with my cost function, number of nodes, hidden layers, or something else? I would appreciate some guiding words!
The XOR problem is not linearly separable, which makes a single-layer perceptron unfit for it. However, the addition of a hidden layer in your network lets it capture non-linear features, so the architecture is fine.
The most plausible reason for the poor performance is the tortuous initial phase of learning this problem, so increasing the number of iterations should help.
One more thing to check: because of the smooth non-linearity of XOR, the bias is crucial as the translation parameter and is as important as the weights (which you did not mention).
XOR can't be solved without a hidden layer, because you can't separate your labels (0 and 1) with just one line. You can separate them with two lines (two hidden neurons) and then use an AND-like output neuron to find their common area.
See this post for clarification: https://medium.com/@jayeshbahire/the-xor-problem-in-neural-networks-50006411840b
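To tie this back to Encog from the question above: a 2-2-1 network with bias neurons learns XOR reliably. This is roughly the standard Encog 3 C# hello-world example (a sketch; class names as in Encog 3.x):

// XOR truth table.
double[][] xorInput = { new[] {0.0, 0.0}, new[] {1.0, 0.0}, new[] {0.0, 1.0}, new[] {1.0, 1.0} };
double[][] xorIdeal = { new[] {0.0}, new[] {1.0}, new[] {1.0}, new[] {0.0} };

// 2 inputs -> 2 hidden sigmoid neurons -> 1 sigmoid output; "true" enables a bias neuron on that layer.
var network = new BasicNetwork();
network.AddLayer(new BasicLayer(null, true, 2));
network.AddLayer(new BasicLayer(new ActivationSigmoid(), true, 2));
network.AddLayer(new BasicLayer(new ActivationSigmoid(), false, 1));
network.Structure.FinalizeStructure();
network.Reset();                                   // random initial weights

var trainingSet = new BasicMLDataSet(xorInput, xorIdeal);
var train = new ResilientPropagation(network, trainingSet);

int epoch = 0;
do
{
    train.Iteration();
    epoch++;
} while (train.Error > 0.01 && epoch < 10000);     // give it enough iterations to get past the slow start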

How can I reduce the optimality gap of the routing model's assignment by allowing more time to search?

I am solving a pick and delivery problem. I was testing OR-Tools to know how good it is by the following example:
1. Two vehicles at the same start, two pickup locations (one for each customer) that are actually the same point in terms of geolocation, and two customers having the same geolocation too.
2. No demand or capacity, just a time dimension between points and constraints to satisfy pickup and delivery.
3. The objective is to reduce the global span of the cumulative time.
It's obvious that the optimal solution will use both vehicles, but it doesn't! I tried a lot of settings to make it escape from a local optimum, but it still doesn't; it doesn't even try to use the time at hand to reach a better solution and just finishes in a couple of seconds.
So, how can I force it to continue search even if it thinks that the solution at hand is enough?
BTW: I checked whether my logic is correct by giving it the optimal route as an initial route, and when I do that, it uses it. It also indicated that the objective value of the optimal route is less than that of the original route, so I guess there are no bugs in the code.
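No answer was recorded for this one, but for reference the usual knobs in the OR-Tools C# binding are a search time limit plus a local-search metaheuristic such as guided local search, which lets the solver keep improving past the first local optimum it finds. A sketch, assuming routing is your existing RoutingModel and names as in recent OR-Tools releases:

// Needs: using Google.OrTools.ConstraintSolver; using Google.Protobuf.WellKnownTypes;
// Keep searching for up to 30 seconds instead of stopping at the first local optimum.
RoutingSearchParameters searchParameters =
    operations_research_constraint_solver.DefaultRoutingSearchParameters();
searchParameters.FirstSolutionStrategy =
    FirstSolutionStrategy.Types.Value.PathCheapestArc;
searchParameters.LocalSearchMetaheuristic =
    LocalSearchMetaheuristic.Types.Value.GuidedLocalSearch;
searchParameters.TimeLimit = new Duration { Seconds = 30 };   // search budget
searchParameters.LogSearch = true;                            // watch the objective improve in the log

Assignment solution = routing.SolveWithParameters(searchParameters);

Without a metaheuristic the local search stops as soon as no neighbouring move improves the solution, which matches the "finishes in a couple of seconds" behaviour described above.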

When running unit tests on object(s) whose purpose is to track various lengths of elapsed time, is there any way to speed up the process?

Longform Question:
When running unit tests on object(s) whose purpose is to track various lengths of elapsed time, is there any way to speed up the process rather than having to sit through it? In essence, if there's a unit test that would take sixty or more seconds to complete, is there a way to simulate that test in one or two seconds? I don't want something that will cheat the test, as I still want the same comparable, accurate results, just without the minute of waiting before I get them. I guess you could say I'm asking if anyone knows how to implement a form of time warp.
Background Info:
I’m currently working with an object that can count up or down, and then does an action when the desired time has elapsed. All of my tests pass, so I’m completely fine on that front. My problem is that the tests require various lengths of time to pass for the tests to be completely thorough. This isn’t a problem for short tests, say five seconds, but if I wish to test longer lengths of time, say sixty seconds or longer, I have to wait that long before I get my result.
I’m using longer lengths of time on some tests to see how accurate the timing is, and if I’ve made sure the logic is correct so rollover isn’t an issue. I’ve essentially found that, while a short duration of time is fine for the majority of the tests, there are a few that have to be longer.
I’ve been googling and regardless of what search terms I’ve used, I can’t seem to find an answer to a question such as this. The only ones that seem to pop up are "getting up to speed with unit tests" and others of that nature. I’m currently using the MSTestv2 framework that comes with VS2017 if that helps.
Any help is greatly appreciated!
Edit:
Thanks for the responses! I appreciate the info I've been given so far and it's nice to get a fresh perspective on how I could tackle the issue. If anyone else has anything they'd like to / want to add, I'm all ears!
In 1998, John Carmack wrote:
If you don't consider time an input value, think about it until you do -- it is an important concept
The basic idea here is that your logic is going to take time as an input value, and your boundary is going to have an element that can integrate with the clock.
In C#, the result is probably going to look like ports and adapters; you have a port (interface) that describes how you want to interact with the clock, and an adapter (implementation) that implements the interface and reads times off of the clock that you will use in production.
In your unit tests, you replace the production adapter with an implementation that you control.
Key idea:
There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies
Your adapter implementation should be so simple (by design) that you can just look at it and evaluate its correctness. No logic, no data transformations, just the simplest thing that could possibly insulate you from the boundary.
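A minimal sketch of such a port and its two adapters in C#; the IClock name and the Advance helper are just illustrative choices, not anything mandated by MSTest:

// The port: the only way production code is allowed to ask what time it is.
public interface IClock
{
    DateTime UtcNow { get; }
}

// Production adapter: so trivial there is nothing to test.
public sealed class SystemClock : IClock
{
    public DateTime UtcNow => DateTime.UtcNow;
}

// Test adapter: the test decides what time it is, and can jump forward instantly.
public sealed class FakeClock : IClock
{
    public DateTime UtcNow { get; private set; } = new DateTime(2020, 1, 1, 0, 0, 0, DateTimeKind.Utc);

    public void Advance(TimeSpan delta) => UtcNow += delta;
}

Your timer object then takes an IClock in its constructor, and a sixty-second test becomes a call to fakeClock.Advance(TimeSpan.FromSeconds(60)) followed by an assertion, running in microseconds instead of a minute.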
Note that this might be a significant departure from your current design. That's OK, and part of the point of test driven design; the difficulties in testing are supposed to help you recognize the separable responsibilities in your code, and create the correct boundaries around them.
Cory Benfield's talk on building protocol libraries the right way describes this approach in the context of reading data from a socket; read data from IO, and just copy the bytes as is into a state machine that performs all of the logic.
So your "clock" might actually just be a stream/sequence of timestamp events, and your unit tests then document "given this sequence of timestamps, then that is the expected behavior of the functional core".
The slower tests, that actually interact with the real clock, are moved from the unit test suite to the integration suite. They still have value, and you still want them to pass, but you don't want the delays they produce to interrupt the development/refactoring workflow.

Redundancy algorithm for reading noisy bitstream

I'm reading a lossy bit stream and I need a way to recover as much usable data as possible. There can be 1's in place of 0's and 0's in place of 1's, but accuracy is probably over 80%.
A bonus would be if the algorithm could compensate for missing/too many bits as well.
The source I'm reading from is analogue with noise (microphone via FFT), and the read timing could vary depending on computer speed.
I remember reading about algorithms used in CD-ROMs doing this in 3(?) layers, so I'm guessing using several layers is a good option. I don't remember the details though, so if anyone can share some ideas that would be great! :)
Edit: Added sample data
Best case data:
in: 0000010101000010110100101101100111000000100100101101100111000000100100001100000010000101110101001101100111000101110000001001111011001100110000001001100111011110110101011100111011000100110000001000010111
out: 0010101000010110100101101100111000000100100101101100111000000100100001100000010000101110101001101100111000101110000001001111011001100110000001001100111011110110101011100111011000100110000001000010111011
Bad case (timing is off, samples are missing):
out: 00101010000101101001011011001110000001001001011011001110000001001000011000000100001011101010011011001
in: 00111101001011111110010010111111011110000010010000111000011101001101111110000110111011110111111111101
Edit 2: I am able to control the data being sent. Currently attempting to implement simple XOR checking (though it won't be enough).
If I understand you correctly, you have two needs:
Modulate a signal into sound and then demodulate it.
Apply error correction since the channel is unreliable.
Modulation and demodulation is a well-known application, with several ways to modulate the information.
Number two, error correction, is also well-known and has several possibilities. Which one is applicable depends on the error rate and whether you have duplex operation so that you can request resends. If you have decent quality and can request resends, an approach like the one TCP uses is worth exploring.
Otherwise you will have to get down to error detection and error correction algorithms, like the ones used on CD-ROMs.
Edit after the comment
Having the modulation/demodulation done and no resend possibilities narrows the problem. If you are having timing issues, I would still recommend that you read up on existing (de)modulation methods, as there are ways to automatically resynchronize with the sender and increase signal-to-noise ratio.
Down to the core problem, error correction: you will have to add parity bits to your output stream in order to be able to detect the errors. Starting with the forward error correction article @Justin suggests, a scheme that looks quite simple but is still powerful is the Hamming(7,4) scheme.
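For illustration, a minimal C# sketch of Hamming(7,4), which packs 4 data bits plus 3 parity bits into a 7-bit codeword and can correct any single flipped bit (bits are represented as 0/1 ints for readability):

// Encode 4 data bits (d1..d4) into a 7-bit codeword laid out as: p1 p2 d1 p3 d2 d3 d4.
static int[] HammingEncode(int[] d)
{
    int p1 = d[0] ^ d[1] ^ d[3];   // covers codeword positions 1,3,5,7
    int p2 = d[0] ^ d[2] ^ d[3];   // covers positions 2,3,6,7
    int p3 = d[1] ^ d[2] ^ d[3];   // covers positions 4,5,6,7
    return new[] { p1, p2, d[0], p3, d[1], d[2], d[3] };
}

// Decode a 7-bit codeword (corrects at most one flipped bit in place) and return d1..d4.
static int[] HammingDecode(int[] c)
{
    int s1 = c[0] ^ c[2] ^ c[4] ^ c[6];          // parity check over positions 1,3,5,7
    int s2 = c[1] ^ c[2] ^ c[5] ^ c[6];          // parity check over positions 2,3,6,7
    int s3 = c[3] ^ c[4] ^ c[5] ^ c[6];          // parity check over positions 4,5,6,7
    int errorPos = s1 + 2 * s2 + 4 * s3;         // 0 = no error, otherwise the 1-based bit position
    if (errorPos != 0)
        c[errorPos - 1] ^= 1;                    // flip the offending bit back
    return new[] { c[2], c[4], c[5], c[6] };
}

At roughly 80% per-bit accuracy, two or more errors inside a 7-bit block are still common, so on its own this won't be enough; in practice you would combine it with interleaving or a stronger code, but it shows the parity-bit idea.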
You need to use forward error correction. An XOR parity check will only detect when an error occurs. A simple error correction algorithm would be to send each chunk of data multiple times (at least 3) and make a majority decision.
The choice of algorithm depends on several factors:
Channel utilization (if you have lots of free time, you don't need an efficient coding)
Error types: are the bad bits randomly spaced or do they usually occur in a row
Processing time: code complexity is limited if data transmission needs to be fast
There are lots of possibilities; see: http://en.wikipedia.org/wiki/Error_detection_and_correction
This can help you with flipped bits, but may be unsuitable for checking whether you have all the bits.
In the end, it will probably take much more than a few lines of simple code.
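As an illustration of the simplest option mentioned above (send each bit three times and take a majority vote on the receiving side), a short C# sketch:

// Needs: using System.Collections.Generic; using System.Linq;
// Send every bit three times...
static IEnumerable<int> RepetitionEncode(IEnumerable<int> bits) =>
    bits.SelectMany(b => new[] { b, b, b });

// ...and on the receiving side take the majority of each group of three.
static IEnumerable<int> RepetitionDecode(IReadOnlyList<int> received)
{
    for (int i = 0; i + 2 < received.Count; i += 3)
        yield return (received[i] + received[i + 1] + received[i + 2]) >= 2 ? 1 : 0;
}

At an 80% per-bit accuracy this pushes the effective per-bit error rate from about 20% down to roughly 10% (the chance that at least two of the three copies flip), at the cost of tripling the stream, so it is only a starting point.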

Factory floor simulation

I would like to create a simulation of a factory floor, and I am looking for ideas on how to do this. My thoughts so far are:
• A factory is made up of a bunch of processes; some of these processes are in series and some are in parallel. Each process would communicate with its upstream, downstream and parallel neighbors to let them know of its throughput
• Each process would have its own basic attributes, like maximum throughput and cost of maintenance as a result of throughput
Obviously I have not fully thought this out, but I was hoping somebody might be able to give me a few ideas or perhaps a link to an online resource
update:
This project is only for my own entertainment, and perhaps to learn a little bit along the way. I am not employed as a programmer; programming is just a hobby for me. I have decided to write it in C#.
Simulating an entire factory accurately is a big job.
Firstly you need to figure out: why are you making the simulation? Who is it for? What value will it give them? What parts of the simulation are interesting? How accurate does it need to be? What parts of the process don't need to be simulated accurately?
To figure out the answers to these questions, you will need to talk to whoever it is that wants the simulation written.
Once you have figured out what to simulate, then you need to figure out how to simulate it. You need some models and some parameters for those models. You can maybe get some actual figures from real production and try to derive models from the figures. The models could be a simple linear relationship between an input and an output, a more complex relationship, and perhaps even a stochastic (random) effect. If you don't have access to real data, then you'll have to make guesses in your model, but this will never be as good so try to get real data wherever possible.
You might also want to consider the probabilities of components breaking down, and what effect that might have. What about the workers going on strike? Unavailability of raw materials? Wear and tear on the machinery causing progressively lower output over time? Again, you might not want to consider these details; it depends on what the customer wants.
If your simulation involves random events, you might want to run it many times and get an average outcome, for example using a Monte Carlo simulation.
To give a better answer, we need to know more about what you need to simulate and what you want to achieve.
Since your customer is yourself, you'll need to decide the answer to all of the questions that Mark Byers asked. However, I'll give you some suggestions and hopefully they'll give you a start.
Let's assume your factory takes a few different parts and assembles them into just one finished product. A flowchart of the assembly process might look like this:
Factory Flowchart http://img62.imageshack.us/img62/863/factoryflowchart.jpg
For the first diamond, where widgets A and B are assembled, assume it takes on average 30 seconds to complete this step. We'll assume the actual time it takes the two widgets to be assembled is distributed normally, with mean 30 s and variance 5 s². For the second diamond, assume it also takes on average 30 seconds, but most of the time it doesn't take nearly that long, and other times it takes a lot longer. This is well approximated by an exponential distribution with a mean of 30 s, i.e. a rate parameter (often represented in equations by a lambda) of 1/30 per second.
For the first process, compute the time to assemble widgets A and B as:
timeA = randn(mean, sqrt(variance)); // Assuming C# has a function for a normally
// distributed random number with mean and
// sigma as inputs
For the second process, compute the time to add widget C to the assembly as:
timeB = -log(1 - rand())/lambda; // Assuming rand() returns a uniformly distributed
                                 // random number in [0, 1); this is inverse-CDF
                                 // sampling of an exponential with rate lambda
Now your total assembly time for each iGadget will be timeA + timeB + waitingTime. At each assembly point, store a queue of widgets waiting to be assembled. If the second assembly point is a bottleneck, its queue will fill up. You can enforce a maximum size for its queue, and hold things further upstream when that max size is reached. If an item is in a queue, its assembly time is increased by that of all the iGadgets ahead of it in the assembly line. I'll leave it up to you to figure out how to code that up, and you can run lots of trials to see what the total assembly time will be, on average. What does the resulting distribution look like?
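A runnable version of those two sampling steps might look like the following sketch (Box-Muller for the normal draw and inverse-CDF for the exponential, since System.Random only produces uniform numbers; 30 s and a variance of 5 are the assumed parameters from above):

var rng = new Random();

// Normally distributed sample via the Box-Muller transform.
double NextGaussian(double mean, double sigma)
{
    double u1 = 1.0 - rng.NextDouble();                 // avoid log(0)
    double u2 = rng.NextDouble();
    double z = Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Cos(2.0 * Math.PI * u2);
    return mean + sigma * z;
}

// Exponentially distributed sample via inverse-CDF; rate = 1/mean.
double NextExponential(double mean) =>
    -mean * Math.Log(1.0 - rng.NextDouble());

double timeA = NextGaussian(30.0, Math.Sqrt(5.0));      // step 1: assemble widgets A and B
double timeB = NextExponential(30.0);                   // step 2: add widget C (mean 30 s)
double totalAssemblyTime = timeA + timeB;               // plus any queueing/waiting time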
Ways to "spice this up":
Require 3 B widgets for every A widget. Play around with inventory. Replenish inventory at random intervals.
Add a quality assurance check (exponential distribution is good to use here), and reject some of the finished iGadgets. I suggest using a low rejection rate.
Try using different probability distributions than those I've suggested. See how they affect your simulation. Always try to figure out how the input parameters to the probability distributions would map into real world values.
You can do a lot with this simple simulation. The next step would be to generalize your code so that you can have an arbitrary number of widgets and assembly steps. This is not quite so easy. There is an entire field of applied math called operations research that is dedicated to this type of simulation and analysis.
What you're describing is a classical problem addressed by discrete event simulation. A variety of both general purpose and special purpose simulation languages have been developed to model these kinds of problems. While I wouldn't recommend programming anything from scratch for a "real" problem, it may be a good exercise to write your own code for a small queueing problem so you can understand event scheduling, random number generation, keeping track of calendars, etc. Once you've done that, a general purpose simulation language will do all that stuff for you so you can concentrate on the big picture.
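If you do write that small queueing model yourself, the heart of it is just an event calendar: a simulated clock plus a priority queue of future events. A bare-bones sketch in C# (using .NET 6's PriorityQueue; everything here is illustrative, not any particular simulation library's API):

// A bare-bones discrete-event calendar: schedule events in the future,
// then repeatedly pop the earliest one and advance the simulated clock to it.
using System;
using System.Collections.Generic;

public sealed class EventCalendar
{
    private readonly PriorityQueue<Action, double> _events = new();

    public double Now { get; private set; }

    public void Schedule(double delay, Action handler) =>
        _events.Enqueue(handler, Now + delay);

    public void Run(double endTime)
    {
        while (_events.TryDequeue(out var handler, out var time) && time <= endTime)
        {
            Now = time;       // jump straight to the next event; no real waiting
            handler();        // the handler typically schedules follow-up events
        }
    }
}

An "assembly finished" handler would, for example, pull the next widget from its queue and Schedule another completion after a randomly drawn processing time; that bookkeeping is essentially what a general-purpose simulation language automates for you.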
A good reference is Law & Kelton. ARENA is a standard package. It is widely used and, IMHO, is very comprehensive for these kinds of simulations. The ARENA book is also a decent book on simulation, and it comes with software that can be applied to small problems. To model bigger problems, you'll need to get a license. You should be able to download a trial version of ARENA here.
It may be more than what you are looking for, but Visual Components is a good industrial simulation tool.
To be clear, I do not work for them, nor does the company I work for currently use them, but we have looked at them.
Automod is the way to go.
http://www.appliedmaterials.com/products/automod_2.html
There is a lot to learn, and it won't be cheap.
ASI's Automod has been in the factory simulation business for about 30 years. It is now owned by Applied Materials. The big players who work with material handling in a warehouse use Automod because it is the proven leader.
