So... no clue where I should be asking this question but I'm hoping someone here can at least point me in the right direction. I have a time series that I would like to do spectral analysis on but I can't find any tools for doing FFT that accommodate a varied time difference between data points (they all assume dt is constant). Does anyone know of a tool that would work for this (I'm specifically looking for a periodogram or some other way to determine periodicity).
My only thought is to do linear interpolation between data points at a specific time interval to give the data a constant dt but I'm worried that will scew the spectral analysis data.
Here is a small chunk of the data; time, data, dt
time data dt
39.630 49662.1 0.170
39.810 49582.5 0.180
40.150 49430.0 0.340
40.320 49413.8 0.170
40.490 49324.0 0.170
40.670 49092.5 0.180
40.830 49025.6 0.160
41.010 49101.5 0.180
any suggestions??
Most mathematical theory behind FFT requires a fixed sampling period.
I can suggest you to do the following:
Create a sequence of equally spaced time instants
Calculate the value for each instant based on the neighbor points (linear or quadratic interpolation should do it). You can use as many points as you desire in order to obtain the best approximation.
Depending on the degree of detail you want for your results use a parametric or non-parametric method for estimating spectrum: Burg method can be useful and LPC/AR model can also be useful.
Check the links at Mathworks:
http://www.mathworks.com/help/signal/nonparametric-spectral-estimation.html
http://www.mathworks.com/help/signal/parametric-spectral-estimation.html
Related
I need to show a chart from data returned from an API.
This API could potentially return millions of results, but it would tax the server heavily.
Thus, I'm looking for a way to return a fewer number of results and still show a trend in the chart. Basically, I'm looking to "smooth" the line of the graph by showing only relevant points.
Is there a .NET library that could help me in this implementation? Or perhaps a "smoothing" function that takes a limit on the number of points to results?
What would be your target number of results? One approach would be to just take a sampling of the points. For every 10 points you have, return 1, for instance. In which case, you could use Linq to accomplish this: Sampling a list with linq
This doesn't address the "showing only relevant points" part of your question, though. That's a little harder to solve programmatically. What does "relevant" mean in your data? Exceeding a certain deviation?
So maybe a moving average of your data would work. Take 10 points at a time, average them, return 1 point. Like this example: Smoothing data from a sensor
With either of those approaches, you can trade off accuracy and 'smoothness' by varying the "10" in the above examples. The higher the number, the "smoother" your result.
I'm looking for a way to detect faulty sensors in an IOT environment.
In this case a tank level sensor. The readings are always fluctuating somewhat, and the "hop" at the beginning is a tank refill which is "normal". On Sep 16 the sensor started to malfunction and just gives apparent random values after that.
As a programmer ideally I'd like a simple way of detecting the problem (and as soon after it starts as possible).
I can mess about with "if direction of vector between two hourly averages changes direction more than once per day it is unstable". But I guess there are more sound and stable algorithms out there.
Two simple options:
domain knowledge based: If you know the max possible output of the tank (say 5 liter/h), any output above that would signal an error. I.e. in case of the example, if
t1-t2 > 5
assuming t1 and t2 show the tank capacity at hourly intervall. You might want to add sensor accuracy related safety margin.
past data based: Assuming that all tanks are similar regarding output capacity and used sensor quality, calculate the following for all your data of non-faulty sensors:
max(t1-t2)
The result is the error threshold to be used, similar to the value 5 above.
Note: tank refill operation might require additional consideration.
Additional methods are described e.g. here. You can find other papers for sure.
http://bourbon.usc.edu/leana/pubs-full/sensorfaults.pdf
Standard deviation.
You're looking at how much variation there is between the measurements. Standard deviation is an easy formula, and well known. Look for a high value, and you know there's a problem.
You can also use coefficient of variation, which is the ratio of the mean to standard deviation.
I have a stream of data that I get at around 10 snapshots per second. I wrote a C# controller that takes this data and adjusts some data-gathering parameters based on the distance away from the expected result. Currently, I just do a linear scale operation on my data so that the farther away I am from the expected result the more it corrects. The problem is that the incoming data stream is delayed by somewhere between 0.5 and 2 seconds (I can calculate that at runtime). Because of this delay, it is correcting for results from a while ago and is constantly overcorrecting and even correcting in the wrong direction sometimes.
I am looking for an algorithm to do the following:
Implement a correction algorithm (I'd prefer PID) that will attempt to hone-in on the optimal value
Predict a certain amount of time (or number of datasets) in advance based on the history of corrections and results
What options do I have in terms of algorithms that can accomplish this?
C# code samples would be appreciated; I am not great at converting complex pseudo-code in to a working implementation.
I have chart that can contain a lot of points (10000 +)
When I scale the chart in order to see all points in screen, it takes some time to draw them
Can You advice me some optimization, in order not to draw all points
I'm not an expert with listed technologies, but I would solve this by 'bucketing' your data points.
Your X axis is time, so determine the resolution point for the current chart size. IE, if you are seeing the entire chart you will only need a data point per day for example. If you are zoomed in a long way, you might want a point per hour.
Now you have determined resolution, go through your chart, and find all the data that exists between the resolution points, IE, all data that is > 20th April 2011 at 4pm and < 20th April 2011 at 5pm if you are on an hourly resolution.
Depending on the type of data you are using, will determine if you want to average all the data point you have collected, or find the median (or some other method, such as a candle stick chart to show the max/min values). Either way, pick the most relevant method, repeat for all points and render the result with your new data.
Hope that's what you meant.
Seems like you should use some sort of level of detail (LoD) algorithm.
For example:
Always use a maximum given set of points to represent all your actual points. By calculating local minima and maxima you can create a proper representation of the given point set for a certain 'detail', depending on how far you are zoomed in.
Calculating these extrema can still prove to be slow, so you might need to cache them. You can calculate and cache this on the fly as new data arrives.
In addition to the other good suggestions, I would
Do some random-pausing on it, to see if it's spending much time doing something else that could be avoided, such as maybe allocating new point structures all the time.
Rather than paint directly to the window, paint to a bitmap, and copy that to the window. It always looks faster, and sometimes it even is faster. (Be sure to stub out the method that clears the window background.)
I had experienced a severe performance problem with thousands of Series added to the chart rather than thousands of Points. The solution that worked for me was a flavor of the Flyweight pattern:
Instead of adding 1000-s of series, add just a single one.
At the end of a virtual series, i.e., when all points of the series have been added and it's time to move on to the next one, insert an empty point:
series.Points.Add(new DataPoint(0, 0) { IsEmpty = true });
Hope that helps somebody.
I have very little data for my analysis, and so I want to produce more data for analysis through interpolation.
My dataset contain 23 independent attributes and 1 dependent attribute.....how can this done interpolation?
EDIT:
my main problem is of shortage of data, i hv to increase the size of my dataset, n attributes are categorical for example attribute A may be low, high, meduim, so interpolation is the right approach for it or not????
This is a mathematical problem but there is too little information in the question to properly answer. Depending on distribution of your real data you may try to find a function that it follows. You can also try to interpolate data using artificial neural network but that would be complex. The thing is that to find interpolations you need to analyze data you already have and that defeats the purpose. There is probably more to this problem but not explained. What is the nature of the data? Can you place it in n-dimensional space? What do you expect to get from analysis?
Roughly speaking, to interpolate an array:
double[] data = LoadData();
double requestedIndex = /* set to the index you want - e.g. 1.25 to interpolate between values at data[1] and data[2] */;
int previousIndex = (int)requestedIndex; // in example, would be 1
int nextIndex = previousIndex + 1; // in example, would be 2
double factor = requestedIndex - (double)previousIndex; // in example, would be 0.25
// in example, this would give 75% of data[1] plus 25% of data[2]
double result = (data[previousIndex] * (1.0 - factor)) + (data[nextIndex] * factor);
This is really pseudo-code; it doesn't perform range-checking, assumes your data is in an object or array with an indexer, and so on.
Hope that helps to get you started - any questions please post a comment.
If the 23 independent variables are sampled in a hyper-grid (regularly spaced), then you can choose to partition into hyper-cubes and do linear interpolation of the dependent value from the vertex closest to the origin along the vectors defined from that vertex along the hyper-cube edges away from the origin. In general, for a given partitioning, you project the interpolation point onto each vector, which gives you a new 'coordinate' in that particular space, which can then be used to compute the new value by multiplying each coordinate by the difference of the dependent variable, summing the results, and adding to the dependent value at the local origin. For hyper-cubes, this projection is straightforward (you simply subtract the nearest vertex position closest to the origin.)
If your samples are not uniformly spaced, then the problem is much more challenging, as you would need to choose an appropriate partitioning if you wanted to perform linear interpolation. In principle, Delaunay triangulation generalizes to N dimensions, but it's not easy to do and the resulting geometric objects are a lot harder to understand and interpolate than a simple hyper-cube.
One thing you might consider is if your data set is naturally amenable to projection so that you can reduce the number of dimensions. For instance, if two of your independent variables dominate, you can collapse the problem to 2-dimensions, which is much easier to solve. Another thing you might consider is taking the sampling points and arranging them in a matrix. You can perform an SVD decomposition and look at the singular values. If there are a few dominant singular values, you can use this to perform a projection to the hyper-plane defined by those basis vectors and reduce the dimensions for your interpolation. Basically, if your data is spread in a particular set of dimensions, you can use those dominating dimensions to perform your interpolation, since you don't really have much information in the other dimensions anyway.
I agree with the other commentators, however, that your premise may be off. You generally don't want to interpolate to perform analysis, as you're just choosing to interpolate your data in different ways and the choice of interpolation biases the analysis. It only makes sense if you have a compelling reason to believe that a particular interpolation is physically consistent and you simply need additional points for a particular algorithm.
May I suggest Cubic Spline Interpolation
http://www.coastrd.com/basic-cubic-spline-interpolation
unless you have a very specific need, this is easy to implement and calculates splines well.
Have a look at the regression methods presented in Elements of statistical learning; most of them may be tested in R. There are plenty of models that can be used: linear regression, local models and so on.