I'm looking for a way to detect faulty sensors in an IoT environment.
In this case, a tank level sensor. The readings are always fluctuating somewhat, and the "hop" at the beginning is a tank refill, which is normal. On Sep 16 the sensor started to malfunction and just gives apparently random values after that.
As a programmer ideally I'd like a simple way of detecting the problem (and as soon after it starts as possible).
I can mess about with rules like "if the direction of the vector between two hourly averages changes more than once per day, the sensor is unstable". But I guess there are more sound and stable algorithms out there.
Two simple options:
domain knowledge based: If you know the max possible output of the tank (say 5 liters/h), any apparent output above that would signal an error. I.e., in the case of the example, it's an error if
t1-t2 > 5
assuming t1 and t2 show the tank level at hourly intervals. You might want to add a safety margin related to sensor accuracy.
past data based: Assuming that all tanks are similar regarding output capacity and sensor quality, calculate the following over all your data from non-faulty sensors:
max(t1-t2)
The result is the error threshold to be used, similar to the value 5 above.
Note: the tank refill operation might require additional consideration.
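A minimal sketch of that check in C#, assuming hourly readings; the 5 L/h maximum and the 0.5 L safety margin are placeholder values you would take from domain knowledge or from max(t1-t2) over known-good data:

// Returns true when the apparent hourly consumption exceeds what the
// tank can physically deliver; t1 and t2 are levels one hour apart.
static bool LooksFaulty(double t1, double t2)
{
    const double maxOutputPerHour = 5.0;   // assumed max tank output, liters/h
    const double safetyMargin = 0.5;       // assumed sensor-accuracy margin

    double drop = t1 - t2;
    if (drop < 0)
        return false;                      // level rose: a refill, handle separately

    return drop > maxOutputPerHour + safetyMargin;
}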
Additional methods are described, e.g., in the paper linked below. You can find other papers for sure.
http://bourbon.usc.edu/leana/pubs-full/sensorfaults.pdf
Standard deviation.
You're looking at how much variation there is between the measurements. Standard deviation is an easy, well-known formula. Look for a high value, and you know there's a problem.
You can also use the coefficient of variation, which is the ratio of the standard deviation to the mean.
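A minimal sketch, assuming you keep a sliding window of recent readings and tune the threshold on known-good sensors:

using System;
using System.Collections.Generic;
using System.Linq;

static bool IsUnstable(IReadOnlyList<double> recentReadings, double threshold)
{
    double mean = recentReadings.Average();
    double variance = recentReadings.Sum(r => (r - mean) * (r - mean)) / recentReadings.Count;
    double stdDev = Math.Sqrt(variance);

    // Coefficient of variation: the standard deviation relative to the
    // mean, handy if differently-sized tanks should share one threshold.
    double cv = stdDev / mean;

    return stdDev > threshold;
}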
I'm implementing Ng's example of OCR neural network in C#.
I think I've got all formulas correctly implemented [vectorized version] and my app is training the network.
Any advice on how I can see my network improving in recognition, without manually drawing test examples after the training is done? I want to see where my training is going while it's being trained.
I've tested my trained weights on drawn digits; the output on all neurons is quite similar (approx. 0.077, or something like that, on all neurons), and the largest value is on the wrong neuron. So the result doesn't match the drawn image.
This is the only test I'm doing so far: watching how the cost function changes with epochs.
So, this is what happens with the cost function (some call it the objective function?) over 50 epochs.
My lambda value is set to 3.0, the learning rate is 0.01, and I have 5000 examples. I do a batch update after each epoch, i.e. after those 5000 examples. Activation function: sigmoid.
input: 400
hidden: 25
output: 10
I don't know what proper values are for lambda and learning rate so that my network can learn without overfitting or underfitting.
Any suggestions on how to find out whether my network is learning well?
Also, what value should J cost function have after all this training?
Should it approach zero?
Should I have more epochs?
Is it bad that my examples are all ordered by digits?
Any help is appreciated.
Q: Any suggestions on how to find out whether my network is learning well?
A: Split the data into three groups: training, cross-validation, and test. Validate your result with the test data. This is actually addressed in the course later.
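As a sketch of that split (the ratios are a judgment call; 60/20/20 here, with Example standing in for whatever your data type is and examples for your full list):

var rng = new Random(42);                                  // fixed seed for reproducibility
List<Example> shuffled = examples.OrderBy(_ => rng.Next()).ToList();

int trainCount = (int)(shuffled.Count * 0.6);
int cvCount = (int)(shuffled.Count * 0.2);

var training = shuffled.Take(trainCount).ToList();                      // fit weights here
var crossValidation = shuffled.Skip(trainCount).Take(cvCount).ToList(); // tune lambda / learning rate here
var test = shuffled.Skip(trainCount + cvCount).ToList();                // touch only once, at the end

Track the cost (or accuracy) on the cross-validation set after each epoch; training cost falling while cross-validation cost rises is the classic sign of overfitting.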
Q: Also, what value should the J cost function have after all this training? Should it approach zero?
A: I recall that in the homework Ng mentioned what the expected value is. The regularized cost should not be zero, since it includes a sum over all the weights.
Q: Should I have more epochs?
A: If you run your program long enough (less than 20 minutes?) you will see the cost is not getting smaller. I assume it has reached a local/global optimum, so more epochs would not be necessary.
Q: Is it bad that my examples are all ordered by digits?
A: The algorithm modifies the weights for every example, so a different order of data does affect each step within a batch. However, the final result should not differ much. If you want to rule it out, shuffle the examples before each epoch, as in the sketch below.
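A minimal Fisher-Yates shuffle you could run before each epoch; the examples list is whatever collection you train from:

using System;
using System.Collections.Generic;

static void Shuffle<T>(IList<T> examples, Random rng)
{
    for (int i = examples.Count - 1; i > 0; i--)
    {
        int j = rng.Next(i + 1);                             // 0 <= j <= i
        (examples[i], examples[j]) = (examples[j], examples[i]);
    }
}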
I'm writing an application that needs to know the speed you're traveling. My application talks to several pieces of equipment, all with different built-in GPS receivers. Where the hardware I'm working with reports speed, I use that parameter. But in some cases, I have hardware which does NOT report speed, simply latitude and longitude.
What I have been doing in that case is marking the time that I receive the first coordinate, then waiting for another coordinate to come in. I then calculate the distance traveled and divide by the elapsed time.
The problem I'm running into is that some of the hardware reports position quickly (5-10 times per second) while some reports position slowly (0.5 times per second). When I'm receiving the GPS position quickly, my algorithm fails to accurately calculate the speed due to the inherent inaccuracies of GPS receivers. In other words, the position will naturally move due to GPS inaccuracy, and since the elapsed time span from the last received position is so small, my algorithm thinks we've moved far over a short time, meaning we are going fast (when in reality we may be standing still).
How can I go about averaging the speed to avoid this problem? It seems like the process will have to be adaptive based on how fast the points come in. For example if I simply average the last 5 points collected to do my speed calculation, it will probably work great for "fast" reporting units but it will hurt my accuracy for "slow" reporting units.
Any ideas?
Use a simple filter:
Take a position only if it is more than 10 meters away from the last taken position.
Then calculate the distance between lastGood and thisGood, and divide by timeDiff.
You further want to ignore all speeds under 5 km/h, where GPS is most noisy.
You can optimize further by calculating the direction between the last position and this one; if it stays stable, you accept the reading. This helps filtering.
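A minimal sketch of the distance-gated part of this filter; GpsFix is a hypothetical type holding a fix and its timestamp:

using System;

class GpsFix
{
    public double Lat, Lon;                           // degrees
    public DateTime Time;
}

class SpeedFilter
{
    GpsFix lastGood;

    // Returns a speed in km/h, or null while the fix is filtered out.
    public double? OnFix(GpsFix fix)
    {
        if (lastGood == null) { lastGood = fix; return null; }

        double meters = HaversineMeters(lastGood, fix);
        if (meters < 10) return null;                 // ignore GPS jitter near a standstill

        double seconds = (fix.Time - lastGood.Time).TotalSeconds;
        double kmh = meters / seconds * 3.6;
        lastGood = fix;

        return kmh < 5 ? 0 : kmh;                     // treat very low speeds as standing still
    }

    static double HaversineMeters(GpsFix a, GpsFix b)
    {
        const double R = 6371000;                     // mean earth radius in meters
        double dLat = (b.Lat - a.Lat) * Math.PI / 180;
        double dLon = (b.Lon - a.Lon) * Math.PI / 180;
        double h = Math.Sin(dLat / 2) * Math.Sin(dLat / 2)
                 + Math.Cos(a.Lat * Math.PI / 180) * Math.Cos(b.Lat * Math.PI / 180)
                 * Math.Sin(dLon / 2) * Math.Sin(dLon / 2);
        return 2 * R * Math.Asin(Math.Sqrt(h));
    }
}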
I would average the speed over the last X seconds. Let's pick X=3. For your fast reporters, that means averaging your speed over about 20 data points. For your slow reporters, that may only get you a data point or two, so you may want a longer window there. This should keep the accuracy reasonably even across the board.
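A minimal sketch of that windowed average; the queue of (speed, timestamp) samples is an assumption about how you might store them:

using System;
using System.Collections.Generic;
using System.Linq;

class SpeedAverager
{
    readonly TimeSpan window = TimeSpan.FromSeconds(3);     // the X above
    readonly Queue<(double Speed, DateTime Time)> samples = new Queue<(double Speed, DateTime Time)>();

    // Add one instantaneous speed sample and get back the smoothed value.
    public double AddSample(double speed, DateTime time)
    {
        samples.Enqueue((speed, time));
        while (samples.Peek().Time < time - window)
            samples.Dequeue();                              // drop samples older than the window
        return samples.Average(s => s.Speed);
    }
}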
I'd try using the average POSITION over the last X seconds.
This should "average out" the random noise associated with the high frequency location input....which should yield a better speed computation.
(Obviously you'd use "averaged" positions to compute your speed)
You probably have an existing data point structure to pull a linq query from?
In light of the note that we need to account for negative vectors, and the suggestion to account for known margins of error, here is a more complex example:
class GPS
{
    // GPSData and its XVector/YVector/VectorErrorMargin helpers are
    // assumed to be part of your existing data-point structure.
    List<GPSData> recentData = new List<GPSData>();

    // Only points newer than this window are considered. Note that
    // 100,000 ticks is only 10 ms; in practice you would likely use
    // something like TimeSpan.FromSeconds(3).
    TimeSpan speedCalcZone = new TimeSpan(100000);
    decimal acceptableError = .5m;

    double CalcAverageSpeed(GPSData newestPoint)
    {
        // Displacement vectors from each recent, low-error point to the
        // newest one; points with too large an error margin are skipped.
        var vectors = from point in recentData
                      where point.timestamp > DateTime.Now - speedCalcZone
                      where newestPoint.VectorErrorMargin(point) < acceptableError
                      select new
                      {
                          xVector = newestPoint.XVector(point),
                          yVector = newestPoint.YVector(point)
                      };

        var averageXVector = (from vector in vectors
                              select vector.xVector).Average();
        var averageYVector = (from vector in vectors
                              select vector.yVector).Average();

        // Length of the averaged displacement vector; divide by the elapsed
        // time if you need a true speed rather than a smoothed distance.
        var averagedSpeed = Math.Sqrt(Math.Pow(averageXVector, 2) + Math.Pow(averageYVector, 2));
        return averagedSpeed;
    }
}
But as pointed out in the comments, there is no one magic algorithm; you have to tweak it for your circumstances and needs.
You're looking for one ideal algorithm that may not exist, for one very simple reason: you can't invent data where there isn't any, and sometimes you can't even tell where the data ends and the error begins.
That being said, there are ways to reduce the "noise", as you've discovered with averaging 5 consecutive measurements. I'd add to that: you can throw away the "outliers" and choose the 3 of the 5 that are closest to each other.
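A minimal sketch of that outlier rejection, purely illustrative: sort the last five readings and keep the tightest cluster of three.

using System;
using System.Collections.Generic;
using System.Linq;

static double RobustSpeed(IReadOnlyList<double> lastFive)
{
    var sorted = lastFive.OrderBy(v => v).ToList();

    // In sorted order, the cluster of three closest values is always a
    // run of three consecutive elements; find the run with the smallest spread.
    double bestSpread = double.MaxValue;
    int bestStart = 0;
    for (int i = 0; i + 2 < sorted.Count; i++)
    {
        double spread = sorted[i + 2] - sorted[i];
        if (spread < bestSpread) { bestSpread = spread; bestStart = i; }
    }
    return sorted.Skip(bestStart).Take(3).Average();
}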
The question here is what would work best (or acceptably well) for your situation. If you're tracking trucks moving around the continent a few mph won't matter as the errors will cancel themselves out, but if you're tracking a flying drone that moves between buildings the difference can be quite significant.
Here are some more ideas; you can pick and choose how far you go. I'm assuming the truck scenario, and the idea is to get the most probable speed when you don't have an accurate reading:
- discard "improbable" speeds: tall buildings can reflect the GPS signal, causing speeds of over 100 mph when you're just walking; having a "highway map" (see below) can help you manage the cut-off value
- transmit, store, and calculate with error ranges rather than point values (some GPS receivers report an error range)
- keep the average error per location
- keep the average error per reporting device
- keep the average speed per location; you'll end up having a map of highways vs. other roads
- you can correlate location, speed, and direction
I'm trying to get input from a plugged-in guitar, get the frequency from it, and check whether the user is playing the right note or not. Something like a guitar tuner (I'll need to build a guitar tuner as well).
My first question is, how can I get the frequency of guitar input in real time?
and is it possible to do something like:
if (frequency == noteCFrequency)
{
//print This is a C note!!
}
I'm now able to get input from the soundcard, record and playback the input sound already.
For an implementation of FFT in C# you can have a look at this.
While I think that you do not need to fully understand the FFT to use it, you should know about some basic limitations:
You always need a sample window. You may use a sliding window, but the essence of being fast here is to take a chunk of the signal and accept some error.
You get "buckets" of frequencies, not exact ones. The result is something like "in the range 420 Hz - 440 Hz you have 30% of the signal". (The "width" of the buckets should be adjustable.)
The window size must contain a number of samples that is a power of 2.
The window must span at least two periods of the lowest frequency (longest wavelength) you want to detect.
The highest detectable frequency is half the sampling rate (Nyquist). (You don't need to worry about this so much.)
The more precisely you want your frequencies separated, the longer your window must be.
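To make that bucket arithmetic concrete (44.1 kHz and a 4096-sample window are just assumed values):

int sampleRate = 44100;                                   // samples per second
int windowSize = 4096;                                    // must be a power of 2

// Width of each frequency bucket (bin); the center of bin k is k * binWidth.
double binWidth = (double)sampleRate / windowSize;        // about 10.8 Hz here

// The window must hold at least two periods of the lowest tone, so the
// lowest usable frequency is roughly:
double lowestFrequency = 2.0 * sampleRate / windowSize;   // about 21.5 Hz

// And the highest is half the sampling rate (Nyquist):
double nyquist = sampleRate / 2.0;                        // 22050 Hz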
The other answers don't really explain how to do this; they just kind of wave their arms. For example, you would have no idea from reading those answers that the output of an FFT is a set of complex numbers, and you wouldn't have any clue how to interpret them.
Moreover, the FFT is not even the best available method, although for your purposes it works fine, and most people find it the most intuitive. Anyway, this question has been asked to death, so I'll just refer you to other questions on SO. Not all of them apply to C#, but you are going to need to understand the (non-trivial) concepts first. You can find an answer to your question by reading the answers to these questions and following the links.
frequency / pitch detection for dummies
Get the frequency of an audio file in every 1/4 seconds in android
How to detect sound frequency / pitch on an iPhone?
How to calculate sound frequency in android?
how to retrieve the different original frequency in each FFT calculating and without any frequency leakage in java
You must compute the FFT (Fast Fourier Transform) of a piece of the signal and look for a peak. For the kinds of FFT, window type, window size... you must read some documentation regarding signal processing. Anyway, a 25 ms window is OK; use a Hamming window, for example. On the net there is lots of code for computing the FFT. Good luck!
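A sketch of that recipe; Fft() is a hypothetical stand-in for whatever FFT routine you end up using, assumed here to return one magnitude per bin:

using System;

static double DominantFrequency(float[] samples, int sampleRate)
{
    int n = samples.Length;                       // must be a power of 2
    var windowed = new float[n];
    for (int i = 0; i < n; i++)
    {
        // Hamming window: tapers the chunk edges to reduce spectral leakage.
        double w = 0.54 - 0.46 * Math.Cos(2 * Math.PI * i / (n - 1));
        windowed[i] = (float)(samples[i] * w);
    }

    double[] magnitudes = Fft(windowed);          // hypothetical FFT helper

    int peak = 1;                                 // skip the DC bin at k = 0
    for (int k = 2; k < magnitudes.Length; k++)
        if (magnitudes[k] > magnitudes[peak]) peak = k;

    return peak * (double)sampleRate / n;         // bin index -> Hz
}

// Plug your actual FFT implementation in here.
static double[] Fft(float[] windowedSamples) => throw new NotImplementedException();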
Imagine I want to, say, compute the first one million terms of the Fibonacci sequence using the GPU. (I realize this will exceed the precision limit of a 32-bit data type; it's just used as an example.)
Given a GPU with 40 shaders/stream processors, and cheating by using a reference book, I can break up the million terms into 40 blocks of 25,000 terms each, and seed each shader with its two start values:
unit 0: 1,1 (which then calculates 2,3,5,8, blah blah blah)
unit 1: 25,000th term
unit 2: 50,000th term
...
How, if possible, could I go about ensuring that pixels are processed in order? If the first few pixels in the input texture have values (with RGBA for simplicity)
0,0,0,1 // initial condition
0,0,0,1 // initial condition
0,0,0,2
0,0,0,3
0,0,0,5
...
How can I ensure that I don't try to calculate the 5th term before the first four are ready?
I realize this could be done in multiple passes by setting a "ready" bit whenever a value is calculated, but that seems incredibly inefficient and sort of eliminates the benefit of performing this type of calculation on the GPU.
OpenCL/CUDA/etc probably provide nice ways to do this, but I'm trying (for my own edification) to get this to work with XNA/HLSL.
Links or examples are appreciated.
Update/Simplification
Is it possible to write a shader that uses values from one pixel to influence the values from a neighboring pixel?
You cannot determine the order in which the pixels are processed. If you could, that would break the massive pixel throughput of the shader pipelines. What you can do is calculate the Fibonacci sequence using the non-recursive, closed-form formula, so that each unit is independent of the others.
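A sketch of that closed-form (Binet) formula, shown in C# for readability; note that plain double precision only gives exact terms up to roughly F(70), so a million terms would need extended precision anyway:

using System;

static ulong Fibonacci(int n)
{
    double sqrt5 = Math.Sqrt(5.0);
    double phi = (1.0 + sqrt5) / 2.0;             // golden ratio
    return (ulong)Math.Round(Math.Pow(phi, n) / sqrt5);
}

Because each term depends only on n, every shader unit can evaluate its own range with no knowledge of its neighbors.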
In your question, you are actually trying to serialize the shader units to run one after another. You could use the CPU right away, and it would be much faster.
By the way, multiple passes aren't as slow as you might think, but they won't help you in your case: you cannot really calculate any next value without knowing the previous ones, which kills any parallelization.
I want to make a program that detects the note being played in front of the microphone. I am testing the FFT function of NAudio, but from the tests I did in Audacity it seems that the FFT does not detect the pitch correctly. I played a C5, but the highest peak was at E7.
I changed the first dropdown box in the frequency-analysis window to "enhanced autocorrelation", and after that the highest peak was at C5.
I googled "enhanced autocorrelation" and had no luck.
You are likely getting thrown off by harmonics. Have you tried testing with a pure sine wave to see if NAudio's FFT is in the ballpark?
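A quick way to run that sanity check with NAudio's built-in FFT (NAudio.Dsp.FastFourierTransform), feeding it a synthetic C5 at 523.25 Hz and checking where the peak lands; the sample rate and buffer size are arbitrary choices:

using System;
using NAudio.Dsp;                                  // FastFourierTransform, Complex

int sampleRate = 44100;
int n = 4096;                                      // must be a power of 2
double freq = 523.25;                              // C5

var buffer = new Complex[n];
for (int i = 0; i < n; i++)
{
    buffer[i].X = (float)Math.Sin(2 * Math.PI * freq * i / sampleRate);
    buffer[i].Y = 0;
}

int m = (int)Math.Round(Math.Log(n, 2));           // the FFT wants log2 of the length
FastFourierTransform.FFT(true, m, buffer);

int peak = 1;
double peakMag = 0;
for (int k = 1; k < n / 2; k++)
{
    double mag = Math.Sqrt(buffer[k].X * buffer[k].X + buffer[k].Y * buffer[k].Y);
    if (mag > peakMag) { peakMag = mag; peak = k; }
}

// Expect the bin nearest 523 Hz; if an overtone-rich guitar note lands
// elsewhere, harmonics (not the FFT) are the culprit.
Console.WriteLine("Peak at about " + peak * (double)sampleRate / n + " Hz");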
See these references:
http://cnx.org/content/m11714/latest/
http://www.gamedev.net/community/forums/topic.asp?topic_id=506592&whichpage=1
Line 48 in Spectrum.cpp in the Audacity source code seems to be close to what you want. They also reference an IEEE paper by Tolonen and Karjalainen.
The highest peak in an audio spectrum is not necessarily the musical pitch as a human would perceive it, especially in a sound with strong overtones. That's because pitch is a human psycho-perceptual phenomenon; the brain will often deduce frequencies that aren't even present in a waveform.
Autocorrelation methods of frequency or pitch estimation (roughly, finding how far apart even a funny-looking and/or non-sinusoidal waveform repeats in time) are usually a better match for what a human would call pitch. The reason for the various enhancements to the autocorrelation algorithm is that simple autocorrelation will find a near-infinite number of repeating wavelengths (e.g. if it repeats every 1 second, it also repeats twice every 2 seconds, etc.). So the trick is to weight the correlation to somehow statistically better match what a human would guess about the same waveform.
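A bare-bones autocorrelation sketch to make that concrete (no weighting or enhancement, so expect the octave ambiguities described above); the 60-1000 Hz search range is an assumption that roughly covers guitar fundamentals:

using System;

static double EstimatePitch(float[] samples, int sampleRate,
                            double minHz = 60, double maxHz = 1000)
{
    int minLag = (int)(sampleRate / maxHz);        // short lag = high pitch
    int maxLag = (int)(sampleRate / minHz);        // long lag = low pitch

    int bestLag = minLag;
    double bestScore = double.MinValue;
    for (int lag = minLag; lag <= maxLag; lag++)
    {
        double score = 0;
        for (int i = 0; i + lag < samples.Length; i++)
            score += samples[i] * samples[i + lag]; // how well the signal lines up with a shifted copy of itself
        if (score > bestScore) { bestScore = score; bestLag = lag; }
    }
    return (double)sampleRate / bestLag;           // lag in samples -> Hz
}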
Well, if you can live with GPLv2, why not take a peek at the Audacity source code?
http://audacity.sourceforge.net/download/beta_source