Triggering an event with a certain probability in C#

I'm trying to simulate a realistic key press event using the SendInput() method. For a convincing result I need to specify the delay between the key-down and key-up events. The numbers below show the elapsed time in milliseconds between DOWN and UP events (these are real, measured values):
96, 95, 112, 111, 119, 104, 143, 96, 95, 104, 120, 112, 111, 88,
104, 119, 111, 103, 95, 104, 95, 127, 112, 143, 144, 142, 143, 128,
144, 112, 111, 112, 120, 128, 111, 135, 118, 147, 96, 135, 103, 64,
64, 87, 79, 112, 88, 111, 111, 112, 111, 104, 87, 95
We can simplify the output:
delay 64-88 ms -> 20% of the time
delay 89-135 ms -> 60% of the time
delay 136-150 ms -> 20% of the time
How do I trigger an event according to the probabilities above? Here is the code I'm using right now:
private void button2_Click(object sender, EventArgs e)
{
    textBox2.Focus();
    // reuse one Random; creating several in quick succession can seed them identically
    Random r = new Random();
    int rez = r.Next(0, 5); // 0,1,2,3,4 - five numbers total
    if (rez == 0) // 20% (1/5): short delay
    {
        // upper bound of Next() is exclusive, so 89 yields 64-88
        textBox2.AppendText(" " + rez + " " + r.Next(64, 89) + Environment.NewLine);
        // do stuff
    }
    else if (rez == 4) // 20% (1/5): long delay
    {
        textBox2.AppendText(" " + rez + " " + r.Next(136, 151) + Environment.NewLine);
        // do stuff
    }
    else // 1, 2 or 3 (3/5) -> 60%: medium delay
    {
        textBox2.AppendText(" " + rez + " " + r.Next(89, 136) + Environment.NewLine);
        // do stuff
    }
}
There is a huge problem with this code: within each bucket the delays are uniformly distributed. In theory, after millions of iterations, the resulting histogram will show three flat steps rather than the smooth, natural curve of real timings (the graph originally pictured here).
How do I deal with this problem?
EDIT: the solution was to use a proper distribution, as people suggested.
Here is a Java implementation:
http://docs.oracle.com/javase/1.4.2/docs/api/java/util/Random.html#nextGaussian%28%29
and here is a C# implementation:
How to generate normally distributed random from an integer range?
although I'd suggest decreasing the value of "deviations" a little.
Here is an interesting MSDN article:
http://blogs.msdn.com/b/ericlippert/archive/2012/02/21/generating-random-non-uniform-data-in-c.aspx
Thanks everyone for the help!

Sounds like you need to generate a normal distribution. The built-in .NET Random class generates a uniform distribution. Gaussian (normally distributed) random numbers can still be produced from the built-in Random class by applying the Box-Muller transform.
You should end up with a nice bell-shaped probability curve (see the figure at http://en.wikipedia.org/wiki/Normal_distribution, from which the image originally shown here was taken).
To transform a normally distributed random number into an integer range, the Box-Muller transform can help again. See this previous question and answer, which describes the process and links to the mathematical proof.
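For illustration, here is a minimal Box-Muller sketch; the class, its names, and the example parameters are mine, not from the linked answer:

using System;

class GaussianRandom
{
    private readonly Random _uniform = new Random();

    // Box-Muller: two independent uniform samples -> one normal sample.
    public double NextGaussian(double mean, double stdDev)
    {
        double u1 = 1.0 - _uniform.NextDouble(); // avoid Math.Log(0)
        double u2 = _uniform.NextDouble();
        double standardNormal =
            Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Sin(2.0 * Math.PI * u2);
        return mean + stdDev * standardNormal;
    }
}

A delay could then be drawn with something like (int)Math.Round(new GaussianRandom().NextGaussian(110, 20)) and clamped to the observed 64-150 ms range.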

This is the right idea; I just think you need to use doubles instead of ints so you can partition the probability space between 0 and 1. This gives you a finer grain, as follows:
Normalise the real values by dividing them all by the largest value.
Divide the values into buckets: the more buckets, the closer the graph will be to the continuous case.
Now, the larger the bucket, the greater the chance of the event being raised. So partition the interval [0,1] according to how many elements are in each bucket: if you have 20 real values and a bucket holds 5 of them, it takes up a quarter of the interval.
On each test, generate a random number between 0 and 1 using Random.NextDouble(), and whichever bucket the random number falls into, raise an event with that parameter. For the numbers you provided, you can tabulate the shares for, say, 5 buckets in the same way.
This is a bit much to put in a full code example, but hopefully this gives the right idea.
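To make the idea concrete anyway, here is one possible reading of the bucket approach as a runnable sketch; the bucket count, the 12-value sample, and all names are my own assumptions:

using System;
using System.Linq;

class BucketSampler
{
    static void Main()
    {
        int[] delays = { 96, 95, 112, 111, 119, 104, 143, 96, 95, 104, 120, 112 };
        int bucketCount = 5;
        var random = new Random();

        int min = delays.Min(), max = delays.Max();
        double width = (max - min) / (double)bucketCount;

        // Count how many observations fall into each bucket; the count
        // determines how much of the [0,1] interval the bucket occupies.
        var counts = new int[bucketCount];
        foreach (var d in delays)
            counts[Math.Min(bucketCount - 1, (int)((d - min) / width))]++;

        // Draw: walk the buckets until the cumulative share exceeds the
        // roll, then return a uniform value from inside that bucket.
        double roll = random.NextDouble() * delays.Length;
        int bucket = 0;
        double cumulative = counts[0];
        while (cumulative <= roll)
            cumulative += counts[++bucket];

        double delay = min + (bucket + random.NextDouble()) * width;
        Console.WriteLine(delay);
    }
}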

One possible approach would be to model the delays as an Exponential Distribution. The exponential distribution models the time between events that occur continuously and independently at a constant average rate - which sounds like a fair assumption given your problem.
You can estimate the parameter lambda by taking the inverse of the average of your real observed delays, and simulate the distribution using this approach, i.e.
delay = -Math.Log(random.NextDouble()) / lambda
However, looking at your sample, the data looks too "concentrated" around the mean to be a pure Exponential, so simulating that way would result in delays with the proper mean, but too spread out to match your sample.
One way to address that is to model the process as a shifted Exponential; essentially, the process is shifted by a value which represents the minimum the value can take, instead of 0 for an exponential. In code, taking the shift as the minimum observed value from your sample, this could look like this:
using System;
using System.Collections.Generic;
using System.Linq;

var sample = new List<double>()
{
    96, 95, 112, 111, 119, 104, 143, 96, 95, 104, 120, 112
};

var min = sample.Min();
sample = sample.Select(it => it - min).ToList();
var lambda = 1d / sample.Average();

var random = new Random();
var result = new List<double>();
for (var i = 0; i < 100; i++)
{
    // 1 - NextDouble() avoids Math.Log(0), since NextDouble() can return 0
    var simulated = min - Math.Log(1d - random.NextDouble()) / lambda;
    result.Add(simulated);
    Console.WriteLine(simulated);
}
A trivial alternative, which is in essence similar to Aidan's approach, is to re-sample: pick random elements from your original sample, and the result will have exactly the desired distribution:
var sample = new List<double>()
{
    96, 95, 112, 111, 119, 104, 143, 96, 95, 104, 120, 112
};

var random = new Random();
var size = sample.Count;
for (var i = 0; i < 100; i++)
{
    Console.WriteLine(sample[random.Next(0, size)]);
}

Related

What kind of Algorithm am I looking for to combine quantities?

I have been stuck on this problem for 8 weeks now, and I think I almost have a solution, but the last bit of math is racking my brain. I will try to explain a simple problem that requires a complex solution. I am programming in a C# .NET MVC web project. Here is the situation.
I have an unknown group of incoming quantities in which I need to look for like items. Those like items share a max level that makes a full box. Here is an example:
Revision******
This is the real-world case.
I have many orders (let's say candy orders) coming in to a company.
Qty Item MaxFill Sold-To DeliverNumber
60 candy#14 26 Joe 1
1 candy#12 48 Jim 2
30 candy#11 48 Jo 3
60 candy#15 48 Tom 4
6 candy#8 48 Kat 5
30 candy#61 48 Kim 6
44 candy#12 48 Jan 7
10 candy#12 48 Yai 8
10 candy#91 48 Jun 9
55 candy#14 26 Qin 10
30 candy#14 26 Yo 11
40 candy#14 26 Moe 12
In this list I am looking for like candy items to combine, to make all the full boxes of candy that I can, based on the MaxFill number. Here we see the like items are:
Qty Item MaxFill Sold-To DeliverNumber
60 candy#14 26 Joe 1
55 candy#14 26 Qin 10
30 candy#14 26 Yo 11
40 candy#14 26 Moe 12
1 candy#12 48 Jim 2
44 candy#12 48 Jan 7
10 candy#12 48 Yai 8
Now let's take the first set of numbers, for candy#14.
The total for candy#14 is 185, so I can get 7 full boxes of 26, with the last box holding only 3. How do I do this with the values I have, without losing the information from the original orders? This is how I am working it out right now (see below).
End of Revision******
The candy#14 max fill level is 26.
The candy#14 quantities are:
60
55
30
40
Now, I already have a recursive function that breaks these down to the 26 level, and it works fine. I feel I need another recursive function to deal with the remainders that come out of this. As you can see, most of the time there will be remainders from any given list, but those remainders could total up to another full box of 26.
60 = 26+26+8
55 = 26+26+3
30 = 26+4
40 = 26+14
The 8, 3, 4 and 14 add up to 29, so I can get another 26 out of this. But in the real, unknown world the remainders could themselves produce a new set of remainders, repeating the same situation. To make this even more complicated, I have to preserve the data that originally came with the 60, 55, 30 and 40, such as who it was sold to and the delivery number. This is also needed to record how the original amount was broken down and combined.
From the 8, 3, 4, 14, the best combination I could think of is to take the 8, 4 and 14: that gives me the 26 I am looking for, and I would not have to split any value, because 3 is the remainder and I could keep all the other data intact. However, that only works in this particular situation. If instead I go in linear order, 8+3+4=15, so I would have to take 11 from the next value, 14, leaving a remainder of 3.
Reading about different algorithms, I suspect this falls into the NP / NP-complete / NP-hard category, but amid all the technical detail there are not many worked real-world scenarios to be found.
Any suggestions would help here: should I search the list of numbers for the best combinations that reach 26, or is the linear progression with splitting of the next value the better solution? I know I can compute how many full boxes the remainders yield and what is left over, e.g. 8+3+4+14=29, which gives me one box of 26 with 3 left over, but I have no idea how to do this recursively. I have this much done and I "feel" that it is on the right track, but I can't see how to adjust it to work linearly or to "test every possible combination".
public static void Main(string[] args)
{
    var numbers = new List<int>() { 8, 3, 4, 14 };
    var target = 26;
    sum_up(numbers, target);
}

private static void sum_up(List<int> numbers, int target)
{
    sum_up_recursive(numbers, target, new List<int>());
}

private static void sum_up_recursive(List<int> numbers, int target, List<int> partial)
{
    int s = 0;
    foreach (int x in partial) s += x;

    if (s == target)
    {
        // report the combination instead of just building the string
        Console.WriteLine("sum(" + string.Join(",", partial.ToArray()) + ")=" + target);
    }
    if (s >= target)
        return;

    for (int i = 0; i < numbers.Count; i++)
    {
        // take numbers[i], then recurse on the elements after it
        List<int> remaining = new List<int>();
        int n = numbers[i];
        for (int j = i + 1; j < numbers.Count; j++) remaining.Add(numbers[j]);

        List<int> partial_rec = new List<int>(partial);
        partial_rec.Add(n);
        sum_up_recursive(remaining, target, partial_rec);
    }
}
I wrote a sample project in JavaScript. Please check my repo:
https://github.com/panghea/packaging_sample
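Separately from the linked repo (whose algorithm I haven't verified), here is a minimal greedy C# sketch of the splitting-with-provenance part of the problem: it fills boxes in delivery order, splits orders where needed, and records which delivery each slice came from. For the candy#14 example it reproduces the 7 full boxes with 3 left over:

using System;
using System.Collections.Generic;

class BoxFiller
{
    static void Main()
    {
        // (quantity, deliveryNumber) for candy#14, maxFill = 26
        var orders = new List<(int Qty, int Delivery)>
        {
            (60, 1), (55, 10), (30, 11), (40, 12)
        };
        const int maxFill = 26;

        var box = new List<(int Qty, int Delivery)>();
        int inBox = 0;
        foreach (var (qty, delivery) in orders)
        {
            int remaining = qty;
            while (remaining > 0)
            {
                // put as much of this order as fits into the open box
                int take = Math.Min(remaining, maxFill - inBox);
                box.Add((take, delivery));
                inBox += take;
                remaining -= take;

                if (inBox == maxFill)
                {
                    Console.WriteLine("Full box: " + string.Join(", ",
                        box.ConvertAll(p => $"{p.Qty} from delivery #{p.Delivery}")));
                    box.Clear();
                    inBox = 0;
                }
            }
        }
        if (inBox > 0)
            Console.WriteLine($"Partial box with {inBox} left over");
        // 185 total -> 7 full boxes of 26 and a partial box of 3.
    }
}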

How to sort a number sequence that wraps around

I have a sequence of objects, each of which has a sequence number that goes from 0 to ushort.MaxValue (0-65535). I have at most about 10,000 items in my sequence, so there should not be any duplicates, and the items are mostly sorted due to the way they are loaded. I only need to access the data sequentially; I don't need them in a list, if that helps. This is also done quite frequently, so it cannot have too high a time complexity.
What is the best way to sort this list?
An example sequence could be (in this example, assume the sequence number is a single byte and wraps at 255):
240 241 242 243 244 250 251 245 246 248 247 249 252 253 0 1 2 254 255 3 4 5 6
The correct order would then be
240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 0 1 2 3 4 5 6
I have a few different approaches in mind, including making an array of ushort.MaxValue size and just incrementing the position, but that seems very inefficient, and I have some problems when the data I receive has a jump in sequence. However, it's O(1) per item.
Another approach is to order the items normally, then find the split (6-240), and move the first items to the end. But I'm not sure if that is a good idea.
My third idea is to loop the sequence, until I find a wrong sequence number, look ahead until I find the correct one, and move it to its correct position. However, this can potentially be quite slow if there is a wrong sequence number early on.
Is this what you are looking for?
var groups = ints.GroupBy(x => x < 255 / 2)
.OrderByDescending(list => list.ElementAt(0))
.Select(x => x.OrderBy(u => u))
.SelectMany(i => i).ToList();
Example
In:
int[] ints = new int[] { 88, 89, 90, 91, 92, 0, 1, 2, 3, 92, 93, 94, 95, 96, 97, 4, 5, 6, 7, 8, 99, 100, 9, 10, 11, 12, 13 };
Out:
88 89 90 91 92 92 93 94 95 96 97 99 100 0 1 2 3 4 5 6 7 8 9 10 11 12 13
I realise this is an old question, but I also needed to do this and would have liked an answer, so...
Use a SortedSet<FileData> with a custom comparer, where FileData contains information about the files you are working with, e.g.:
struct FileData
{
    public ushort SequenceNumber;
    ...
}

internal class Sequencer : IComparer<FileData>
{
    public int Compare(FileData x, FileData y)
    {
        ushort comparer = (ushort)(x.SequenceNumber - y.SequenceNumber);
        if (comparer == 0) return 0;
        if (comparer < ushort.MaxValue / 2) return 1;
        return -1;
    }
}
As you read file information from disk, add it to your SortedSet; when you read the items back out, they are in the correct order. Note that SortedSet uses a red-black tree internally, which should give you a nice balance between performance and memory: insertion is O(log n) and traversal is O(n).
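A quick usage sketch (my own; the sequence numbers are chosen around the wrap point, and I assume the remaining FileData members are optional):

var files = new SortedSet<FileData>(new Sequencer());
files.Add(new FileData { SequenceNumber = 65534 });
files.Add(new FileData { SequenceNumber = 1 });      // logically last
files.Add(new FileData { SequenceNumber = 65535 });
files.Add(new FileData { SequenceNumber = 0 });
foreach (var f in files)
    Console.Write(f.SequenceNumber + " ");           // prints: 65534 65535 0 1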

Hysteresis Round to solve "flickering" values due to noise

Background:
We have an embedded system that converts linear positions (0 mm - 40 mm) from a potentiometer voltage to its digital value using a 10-bit analog to digital converter.
0 mm |------------------------| 40 mm
We show the user the linear position in 1 mm increments, e.g. 1 mm, 2 mm, 3 mm, etc.
The problem:
Our system can be used in electromagnetically noisy environments which can cause the linear position to "flicker" due to noise entering the ADC. For example, we will see values like: 39,40,39,40,39,38,40 etc. when the potentiometer is at 39 mm.
Since we are rounding to every 1 mm, we will see flicker between 1 and 2 if the value toggles between 1.4 and 1.6 mm for example.
Proposed software solution:
Assuming we cannot change the hardware, I would like to add some hysteresis to the rounding of values to avoid this flicker, such that:
If the value is currently at 1 mm, it can only go to 2 mm if the raw value is 1.8 or higher.
Likewise, if the current value is 1 mm, it can only go to 0 mm if the raw value is 0.2 or lower.
I wrote the following simple app to test my solution. Please let me know if I am on the right track, or if you have any advice.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace PDFSHysteresis
{
    class Program
    {
        static void Main(string[] args)
        {
            double test = 0;
            int curr = 0;
            Random random = new Random();

            // Feed HystRound a random walk and print raw vs. displayed value.
            for (double i = 0; i < 100; i++)
            {
                test = test + random.Next(-1, 2) + Math.Round(random.NextDouble(), 3);
                curr = HystRound(test, curr, 0.2);
                Console.WriteLine("{0:00.000} - {1}", test, curr);
            }
            Console.ReadLine();
        }

        static int HystRound(double test, int curr, double margin)
        {
            // Step up only when the raw value clears the upper margin...
            if (test > curr + 1 - margin && test < curr + 2 - margin)
            {
                return curr + 1;
            }
            // ...step down only when it clears the lower margin...
            else if (test < curr - 1 + margin && test > curr - 2 + margin)
            {
                return curr - 1;
            }
            // ...hold the current value inside the dead band...
            else if (test >= curr - 1 + margin && test <= curr + 1 - margin)
            {
                return curr;
            }
            // ...and re-anchor if the value jumped more than one step away.
            else
            {
                return HystRound(test, (int)Math.Floor(test), margin);
            }
        }
    }
}
Sample output:
Raw HystRound
====== =========
00.847 1
00.406 1
01.865 2
01.521 2
02.802 3
02.909 3
02.720 3
04.505 4
06.373 6
06.672 6
08.444 8
09.129 9
10.870 11
10.539 11
12.125 12
13.622 13
13.598 13
14.141 14
16.023 16
16.613 16
How about using the average of readings for the last N seconds, where N could be fairly small / sub-second depending on your sample rate?
You can use a simple linear average, or something more complex, depending on your needs. Several moving average algorithms are detailed on Wikipedia:
http://en.wikipedia.org/wiki/Moving_average
Depending on your sensitivity / responsiveness needs, you could reset the average if a new reading exceeds the running average by X%.
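A minimal sketch of that suggestion (the window size and reset fraction are illustrative assumptions, not tuned values):

using System;
using System.Collections.Generic;
using System.Linq;

class SmoothedReading
{
    private readonly Queue<double> _window = new Queue<double>();
    private readonly int _size;
    private readonly double _resetFraction;

    public SmoothedReading(int size = 16, double resetFraction = 0.2)
    {
        _size = size;
        _resetFraction = resetFraction;
    }

    // Returns the smoothed value after incorporating the new reading.
    public double Add(double reading)
    {
        if (_window.Count > 0)
        {
            // If the reading jumps far from the running average, assume a
            // genuine position change and restart the window.
            double avg = _window.Average();
            if (Math.Abs(reading - avg) > Math.Abs(avg) * _resetFraction)
                _window.Clear();
        }
        _window.Enqueue(reading);
        if (_window.Count > _size)
            _window.Dequeue();
        return _window.Average();
    }
}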
I had to deal with something similar some time ago, when I had to read voltage output from a circuit and display a graph on a computer screen. The bottom line is, this really depends on your system requirements. If the requirement is 1 mm accuracy, then there is nothing you can really do. Otherwise, as mentioned above, several methods can help lessen the flickering. You can:
Calculate the average of the values over a period of time that the user can configure.
Allow the user to set a "sensitivity threshold". This threshold is used to decide whether to accept a new value as valid. In your example, the threshold could be set to 2 mm, in which case values such as 39, 40, 39, 38 would all read as 39 mm (see the sketch below).
Also, have you thought about putting an external stabilizer between your application and the hardware itself?
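A minimal sketch of that sensitivity-threshold idea (the method name is mine; the 2 mm default is the example value from the answer):

using System;

static class Display
{
    // Keep showing the current value until the raw reading moves at least
    // thresholdMm away from it; then snap to the rounded new reading.
    public static int ApplyThreshold(double readingMm, int displayedMm,
                                     double thresholdMm = 2.0)
    {
        return Math.Abs(readingMm - displayedMm) >= thresholdMm
            ? (int)Math.Round(readingMm)
            : displayedMm;
    }
}

With readings 39, 40, 39, 38 and a displayed value of 39, every delta stays under 2 mm, so the display holds steady at 39.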
I think Gareth Rees gave an excellent answer to a very similar question:
how to prevent series of integers to have the same value to often

Porting libmfcc to C# using Bass Library

I am currently using the BASS library for audio analysis, which can calculate an FFT and return it as an array. libmfcc uses this data to calculate the MFCC coefficients I need. (Info: MFCC is like an audio spectrum, but it better matches the way human hearing and frequency scaling work.)
The BASS library returns FFT values from 0 to 1.
Now I have encountered several problems and questions:
Their example FFT data seems to have a different format; the values are very high, and the 8192 FFT values sum to 10739.24. How can that be?
In their example application they call the function as follows. Why do they use 128 as the FFT array size if they just loaded 8192 values?
Using their MFCC class, which I copied and edited a bit to match C# syntax, I get negative values for some coefficients, and I don't think that should be the case.
Can anyone help me figure out why it returns negative values, or what I did wrong?
I made a simple, ready-to-try example program that does what is described above and is useful for debugging.
Link: http://www.xup.in/dl,17603935/MFCC_Test.rar/
Output from my C# application (most likely not correct):
Coeff 16 = 0,017919318626506
Coeff 17 = -0,155580763009355
Coeff 18 = -0,76072865841987
Coeff 19 = 0,108961510335727
Coeff 20 = 0,819025783804398
Coeff 21 = -0,660508603974514
Coeff 22 = -0,951623924906163
Coeff 23 = 0,424922129906254
Coeff 24 = 0,0129727009313168
Coeff 25 = -0,388796833267654
Coeff 26 = 0,270839393161931
Coeff 27 = -0,138515788828431
Coeff 28 = -0,454837674981149
Coeff 29 = -0,448629344922371
Coeff 30 = -0,11908663618393
Coeff 31 = 0,237500036702818
Coeff 32 = 0,114874386870208
Coeff 33 = -0,100822381384326
Coeff 34 = 0,144242143551012
Coeff 35 = 0,209338502838453
Coeff 36 = 0,247588420953066
Coeff 37 = -0,451654204112441
Coeff 38 = 0,0346927542067229
Coeff 39 = 0,180816031061584
Their example FFT Data (Different Format?)
14.524506
38.176063
10.673860
3.705076
2.102398
1.461585
1.145616
0.974108
0.878079
0.825304
0.798959
0.789067
0.789914
0.797102
0.808576
0.822048
0.836592
0.851101
0.864869
0.877625
0.888780
0.897852
0.905033
0.910054
0.912214
0.912414
0.909593
0.904497
I can answer the first part:
The sample code clearly states that the input data was computed using FFTW, which produces an unnormalized result. You need to divide by sqrt(n) to get the normalized values, which is what I suspect BASS returns.
Perhaps multiplying your inputs by sqrt(n) will give you better results.
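As a hedged illustration of that suggestion (the helper below is my own, not part of BASS or libmfcc):

using System;

static class FftScaling
{
    // Scale raw FFTW-style magnitudes by 1/sqrt(n). Whether 1/sqrt(n) or
    // 1/n is the right factor depends on the FFT convention in use.
    public static double[] Normalize(double[] fft)
    {
        double scale = Math.Sqrt(fft.Length);
        var normalized = new double[fft.Length];
        for (int i = 0; i < fft.Length; i++)
            normalized[i] = fft[i] / scale;
        return normalized;
    }
}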
The MFCC routine returns cepstral coefficients (the DCT of the log of the mel magnitudes), not mel magnitude values, and cepstral coefficients can be negative. I believe the value 128 in the example code is indeed a mistake by the author. To preserve the signal energy, an FFT requires normalization at some point (either after the FFT, after the iFFT, or split between the two); in the example you are looking at the raw (unnormalized) magnitudes, which is why they are so large.

Merging approximately equal points in dataset

I'm looking for an algorithm that can quickly run through a short (<30 element) array and merge points that are approximately equal. It'll probably end up being some sort of segmentation algorithm.
The context is as follows: I'm looking for the tallest peaks in a dataset. I've already separated the tallest maximums from the dross using a one-dimensional implementation of J-SEG, but anywhere where the dataset is "flat," I get back a point for every element along the plateau. I need to be able to adaptively merge these points to a single point at the center of the plateau. (This also means I don't know how many clusters there will be.)
Sample dataset 1 (Sample/Artificial input)
Input:
97 54686024814922.8
118 406406320535.935
148 24095826539423.7
152 1625624905272.95
160 1625625128029.81
166 1625625152145.47
176 1625625104745.48
179 1625625127365.09
183 1625625152208.44
190 1625624974205.81
194 21068100428092.9
247 54686024895222.1
Ideal Output:
97 54686024814922.8
118 406406320535.935
148 24095826539423.7
159 1625625061816.08
182 1625625089631.21
194 21068100428092.9
247 54686024895222.1
Sample dataset 2 (Real input):
Input:
2 196412376940671
123 206108518197124
135 194488685387149
148 178463949513298
154 192912098976702
156 195042451997727
161 195221254214493
168 204760073508681
172 189240741651297
182 191554457423846
187 215014126955355
201 202294866774063
Ideal output:
2 196412376940671
123 206108518197124
135 194488685387149
148 178463949513298
157 194391935062974
168 204760073508681
172 189240741651297
182 191554457423846
187 215014126955355
201 202294866774063
Sample Dataset 3 (Real input)
Input:
2 299777367852602
26 263467434856928
35 293412234811901
83 242768805551742
104 226333969841383
107 227548774800053
178 229173574175201
181 229224441416751
204 244334414017228
206 245258151638118
239 198782930497571
Ideal output:
2 299777367852602
26 263467434856928 (May be merged
35 293412234811901 depending on parameters)
83 242768805551742
105.5 226941372320718
179.5 229199007795976
205 244796282827673
239 198782930497571
(Will edit in further information as needed.)
I'm not sure if this is exactly what you want, but there aren't any other answers posted yet so here we go.
I looked at it from the perspective of a graph. If I were looking at a graph and wanted to decide which points were horizontally similar, that would end up being relative to the graph's scale. So I made a function that accepts the percentage of the scale you want to be considered "the same"; it multiplies that percentage by the maximum difference within your dataset to get an epsilon.
Additionally, each new value is compared against the average of the currently accumulating plateau. Once a plateau is detected to end, the function averages the first and last x to get the middle, takes the average y value, and emits that as a single data point.
Without access to good sample data, all I had to go on was the very poor data generator below. But in my testing, a 1% tolerance generally eliminated about half of my data points.
It's important to note that this is one-dimensional: x distance is completely ignored. You could easily extend it to two dimensions. Also, instead of outputting a single data point to represent a plateau, you could output the start and end points of the average instead.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

namespace PointCondenser
{
    public static class Extensions
    {
        static public bool AlmostEqual<T>(this T value, T value2, T epsilon)
        {
            return (Math.Abs((dynamic)value - value2) < epsilon);
        }
    }

    public struct Point
    {
        public Point(double x, double y)
        {
            X = x;
            Y = y;
        }

        public override string ToString()
        {
            return string.Format("{0}\t{1}", X, Y);
        }

        public double X;
        public double Y;
    }

    class Program
    {
        static public Point RandomYPoint(int i)
        {
            var r = new Random();
            var r2 = new Random(i);
            var variance = r2.NextDouble() / 100;
            return new Point(i, Math.Abs(r.NextDouble() - variance) * 100);
        }

        static public IEnumerable<Point> SmoothPoints(IEnumerable<Point> points, double percent)
        {
            if (percent <= 0 || percent >= 1)
                throw new ArgumentOutOfRangeException("percent", "Percentage outside of logical bounds");

            var final = new List<Point>();
            var apoints = points.ToArray();

            // "Same" means within this fraction of the data's full y-range.
            var largestDifference = apoints.Max(x => x.Y) - apoints.Min(x => x.Y);
            var epsilon = largestDifference * percent;

            var currentPlateau = new List<Point> { apoints[0] };
            for (var i = 1; i < apoints.Length; ++i)
            {
                var point = apoints[i];
                if (point.Y.AlmostEqual(currentPlateau.Average(x => x.Y), epsilon))
                    currentPlateau.Add(point);
                else
                {
                    // Plateau ended: emit its x-midpoint with the average y.
                    var x = (currentPlateau[0].X + currentPlateau[currentPlateau.Count - 1].X) / 2.0;
                    var y = currentPlateau.Average(z => z.Y);
                    final.Add(new Point(x, y));

                    currentPlateau.Clear();
                    currentPlateau.Add(point);
                }
            }

            // Emit the final plateau as well.
            var lastX = (currentPlateau[0].X + currentPlateau[currentPlateau.Count - 1].X) / 2.0;
            final.Add(new Point(lastX, currentPlateau.Average(z => z.Y)));

            return final;
        }

        static void Main(string[] args)
        {
            var r = new Random();
            var points = new List<Point>();
            for (var i = 0; i < 100; ++i)
            {
                for (var n = 0; n < r.Next(1, 5); ++n)
                {
                    var p = RandomYPoint(points.Count);
                    points.Add(p);
                    Console.WriteLine(p);
                }
                Thread.Sleep(r.Next(10, 250));
            }

            Console.Write("\n\n Condensed \n\n");
            var newPoints = SmoothPoints(points, .01);
            foreach (var p in newPoints)
                Console.WriteLine(p);
        }
    }
}
Another approach to clustering without parameters is to merge the closest data points: in each pass, find the smallest gap between two data points and merge that pair. The granularity therefore decreases with each pass. However, finding the smallest gap can be expensive unless the data points are kept sorted by the attribute you compare.
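A sketch of that pass structure, assuming the points are already sorted by x and adding a stopping threshold (my own addition) so the merging doesn't eventually collapse everything into one point:

using System;
using System.Collections.Generic;

static class ClosestPairMerger
{
    public static List<(double X, double Y)> Merge(
        List<(double X, double Y)> points, double stopGap)
    {
        var pts = new List<(double X, double Y)>(points);
        while (pts.Count > 1)
        {
            // Find the adjacent pair with the smallest y distance.
            int best = 0;
            double bestGap = double.MaxValue;
            for (int i = 0; i + 1 < pts.Count; i++)
            {
                double gap = Math.Abs(pts[i].Y - pts[i + 1].Y);
                if (gap < bestGap) { bestGap = gap; best = i; }
            }
            if (bestGap > stopGap) break;

            // Replace the pair with a single averaged point.
            pts[best] = ((pts[best].X + pts[best + 1].X) / 2,
                         (pts[best].Y + pts[best + 1].Y) / 2);
            pts.RemoveAt(best + 1);
        }
        return pts;
    }
}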
In retrospect, I could also have done this with linear regression: if the slope is close to zero, and the slope to the next point is similar to the average slope of previous points on the plateau, then register next point for merging and continue.
