I have a stream of data that I get at around 10 snapshots per second. I wrote a C# controller that takes this data and adjusts some data-gathering parameters based on the distance away from the expected result. Currently, I just do a linear scale operation on my data so that the farther away I am from the expected result the more it corrects. The problem is that the incoming data stream is delayed by somewhere between 0.5 and 2 seconds (I can calculate that at runtime). Because of this delay, it is correcting for results from a while ago and is constantly overcorrecting and even correcting in the wrong direction sometimes.
I am looking for an algorithm to do the following:
Implement a correction algorithm (I'd prefer PID) that will attempt to hone-in on the optimal value
Predict a certain amount of time (or number of datasets) in advance based on the history of corrections and results
What options do I have in terms of algorithms that can accomplish this?
C# code samples would be appreciated; I am not great at converting complex pseudo-code in to a working implementation.
Related
I need to show a chart from data returned from an API.
This API could potentially return millions of results, but it would tax the server heavily.
Thus, I'm looking for a way to return a fewer number of results and still show a trend in the chart. Basically, I'm looking to "smooth" the line of the graph by showing only relevant points.
Is there a .NET library that could help me in this implementation? Or perhaps a "smoothing" function that takes a limit on the number of points to results?
What would be your target number of results? One approach would be to just take a sampling of the points. For every 10 points you have, return 1, for instance. In which case, you could use Linq to accomplish this: Sampling a list with linq
This doesn't address the "showing only relevant points" part of your question, though. That's a little harder to solve programmatically. What does "relevant" mean in your data? Exceeding a certain deviation?
So maybe a moving average of your data would work. Take 10 points at a time, average them, return 1 point. Like this example: Smoothing data from a sensor
With either of those approaches, you can trade off accuracy and 'smoothness' by varying the "10" in the above examples. The higher the number, the "smoother" your result.
I'm looking for a way to detect faulty sensors in an IOT environment.
In this case a tank level sensor. The readings are always fluctuating somewhat, and the "hop" at the beginning is a tank refill which is "normal". On Sep 16 the sensor started to malfunction and just gives apparent random values after that.
As a programmer ideally I'd like a simple way of detecting the problem (and as soon after it starts as possible).
I can mess about with "if direction of vector between two hourly averages changes direction more than once per day it is unstable". But I guess there are more sound and stable algorithms out there.
Two simple options:
domain knowledge based: If you know the max possible output of the tank (say 5 liter/h), any output above that would signal an error. I.e. in case of the example, if
t1-t2 > 5
assuming t1 and t2 show the tank capacity at hourly intervall. You might want to add sensor accuracy related safety margin.
past data based: Assuming that all tanks are similar regarding output capacity and used sensor quality, calculate the following for all your data of non-faulty sensors:
max(t1-t2)
The result is the error threshold to be used, similar to the value 5 above.
Note: tank refill operation might require additional consideration.
Additional methods are described e.g. here. You can find other papers for sure.
http://bourbon.usc.edu/leana/pubs-full/sensorfaults.pdf
Standard deviation.
You're looking at how much variation there is between the measurements. Standard deviation is an easy formula, and well known. Look for a high value, and you know there's a problem.
You can also use coefficient of variation, which is the ratio of the mean to standard deviation.
So... no clue where I should be asking this question but I'm hoping someone here can at least point me in the right direction. I have a time series that I would like to do spectral analysis on but I can't find any tools for doing FFT that accommodate a varied time difference between data points (they all assume dt is constant). Does anyone know of a tool that would work for this (I'm specifically looking for a periodogram or some other way to determine periodicity).
My only thought is to do linear interpolation between data points at a specific time interval to give the data a constant dt but I'm worried that will scew the spectral analysis data.
Here is a small chunk of the data; time, data, dt
time data dt
39.630 49662.1 0.170
39.810 49582.5 0.180
40.150 49430.0 0.340
40.320 49413.8 0.170
40.490 49324.0 0.170
40.670 49092.5 0.180
40.830 49025.6 0.160
41.010 49101.5 0.180
any suggestions??
Most mathematical theory behind FFT requires a fixed sampling period.
I can suggest you to do the following:
Create a sequence of equally spaced time instants
Calculate the value for each instant based on the neighbor points (linear or quadratic interpolation should do it). You can use as many points as you desire in order to obtain the best approximation.
Depending on the degree of detail you want for your results use a parametric or non-parametric method for estimating spectrum: Burg method can be useful and LPC/AR model can also be useful.
Check the links at Mathworks:
http://www.mathworks.com/help/signal/nonparametric-spectral-estimation.html
http://www.mathworks.com/help/signal/parametric-spectral-estimation.html
I'm implementing Ng's example of OCR neural network in C#.
I think I've got all formulas correctly implemented [vectorized version] and my app is training the network.
Any advice on how can I see my network improving in recognition - without testing examples manually by drawing them after the training is done? I want to see where my training is going while it's being trained.
I've test my trained weights on a drawn digits, output on all neurons is quite similar(approx. 0.077,or something like that ...on all neurons) ,and the largest value is on the wrong neuron. So the result doesn't match the drawn image.
This is the only test I'm doing so far: Cost Function changes with epochs
So, this is what happens with Cost function (some call it objective function? ) in 50 epochs.
my Lambda value is set to 3.0 , learning rate is 0.01, 5000 examples, I do batch after each epoch i.e. after those 5000 examples. Activation function: sigmoid.
input: 400
hidden: 25
output:10
I don't know what proper values are for lambda and learning rate so that my network can learn without overfitting or underfitting.
Any suggestions how to find out my network is learning well?
Also, what value should J cost function have after all this training?
Should it approach zero?
Should I have more epochs?
Is it bad that my examples are all ordered by digits?
Any help is appreciated.
Q: Any suggestions how to find out my network is learning well?
A: Split the data into three groups training, cross validation and test.Validate your result with test data. This is actually address in the course later.
Q: Also, what value should J cost function have after all this training? Should it approach zero?
A: I recall in the homework Ng mentioned what is the expected value. The regularized cost should not be zero since it includes a sum of all the weights.
Q: Should I have more epochs?
A: If you run your program long enough ( less than 20 minutes? ) you will see the cost is not getting smaller, I assume it reached the local/global optimum so more epochs would not be necessary.
Q: Is it bad that my examples are all ordered by digits?
A: The algorithm modify the weights for every example so different order of data does affect each step in a batch. However the final result should not have much difference.
I need to display a set of signals. Each signal is defined by millions of samples. Just processing the collection (for converting samples to points according to bitmap size) of samples takes a significant amount of time (especially during scrolling).
So I implemented some kind of downsampling. I just skip some points: take every 2nd, every 3rd, every 50th point depending on signal characteristics. It increases speed very much but significantly distorts signal form.
Are there any smarter approaches?
We've had a similar issue in a recent application. Our visualization (a simple line graph) became too cluttered when zoomed out to see the full extent of the data (about 7 days of samples with a sample taken every 6 seconds more or less), so down-sampling was actually the way to go. If we didn't do that, zooming out wouldn't have much meaning, as all you would see was just a big blob of lines smeared out over the screen.
It all depends on how you are going to implement the down-sampling. There's two (simple) approaches: down-sample at the moment you get your sample or down-sample at display time.
What really gives a huge performance boost in both of these cases is the proper selection of your data-sources.
Let's say you have 7 million samples, and your viewing window is just interested in the last million points. If your implementation depends on an IEnumerable, this means that the IEnumerable will have to MoveNext 6 million times before actually starting. However, if you're using something which is optimized for random reads (a List comes to mind), you can implement your own enumerator for that, more or less like this:
public IEnumerator<T> GetEnumerator(int start, int count, int skip)
{
// assume we have a field in the class which contains the data as a List<T>, named _data
for(int i = start;i<count && i < _data.Count;i+=skip)
{
yield return _data[i];
}
}
Obviously this is a very naive implementation, but you can do whatever you want within the for-loop (use an algorithm based on the surrounding samples to average?). However, this approach will make usually smooth out any extreme spikes in your signal, so be wary of that.
Another approach would be to create some generalized versions of your dataset for different ranges, which update itself whenever you receive a new signal. You usually don't need to update the complete dataset; just updating the end of your set is probably good enough. This allows you do do a bit more advanced processing of your data, but it will cost more memory. You will have to cache the distinct 'layers' of detail in your application.
However, reading your (short) explanation, I think a display-time optimization might be good enough. You will always get a distortion in your signal if you generalize. You always lose data. It's up to the algorithm you choose on how this distortion will occur, and how noticeable it will be.
You need a better sampling algorithm, also you can employ parallel processing features of c#. Refer to Task Parallel Library