For Loop in Foreach Loop Performance Improvement - c#

I have a db table with 2M entries.
My XPositions table structure is:
Id - int
FID - int
CoordinateQue - int
Latitude - float
Longitude - float
Each row represents a marker position, and I need to calculate the distance between every pair of coordinates and save it to another table.
My xWeights table structure is:
Id - int
x_Id - int
Tox - int
Distance - decimal(18,8)
So far my working code is:
var query = _xRepository.TableNoTracking;
var xNodes = query.ToList();
var n = new xWeights();
foreach (var x in xNodes)
{
for (var i = 0; i < xNodes.Count; i++)
{
if(x.Id == xNodes[i].Id)
{
//Do nothing - Same Node
}
else
{
var R = 6378137;
var φ1 = (Math.PI / 180) * x.Latitude;
var φ2 = (Math.PI / 180) * xNodes[i].Latitude;
var Δφ = (xNodes[i].Latitude - x.Latitude) * (Math.PI / 180);
var Δλ = (xNodes[i].Longitude - x.Longitude) * (Math.PI / 180);
var Δψ = Math.Log(Math.Tan(Math.PI / 4 + φ2 / 2) / Math.Tan(Math.PI / 4 + φ1 / 2));
var q = Math.Abs(Δψ) > 10e-12 ? Δφ / Δψ : Math.Cos(φ1); // E-W course creates problem with 0/0
// if Longitude over 180° take shorter rhumb line across the anti-meridian:
if (Math.Abs(Δλ) > Math.PI) Δλ = Δλ > 0 ? -(2 * Math.PI - Δλ) : (2 * Math.PI + Δλ);
var dist = (Math.Sqrt(Δφ * Δφ + q * q * Δλ * Δλ)) * R;
n.x_Id = x.Id;
n.Tox = xNodes[i].Id;
n.Distance = dist;
_xWeightsRepository.Insert(n);
}
}
}
My problem is that I am getting approximately 35k records per minute, so about 2.1M records per hour. At this rate it will take forever to finish. Any ideas how to improve the performance?

The problem is not with this function, but with what you are trying to achieve.
You are trying to insert every from-to combination into _xWeightsRepository. If there are 2 million nodes, then that means about 4 thousand billion (4 × 10^12) weights.
At your current rate of roughly 35k inserts per minute, that would take on the order of 200 years. Even a thousandfold speedup would still leave you waiting for months.
Check out SQL spatial indexes. I'm going to take a guess that your answer lies in that direction:
https://learn.microsoft.com/en-us/sql/t-sql/statements/create-spatial-index-transact-sql
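The scale of the problem can be checked with a few lines of arithmetic. This is only a rough sketch: the class and method names are made up for illustration, and the insert rate is the 35k per minute reported in the question.

```csharp
using System;

public static class PairEstimate
{
    // Number of ordered from-to pairs among n nodes, excluding a node paired with itself.
    public static double OrderedPairs(double n) => n * (n - 1);

    // Years needed to insert `rows` rows at a steady `rowsPerMinute`.
    public static double YearsToInsert(double rows, double rowsPerMinute) =>
        rows / rowsPerMinute / 60.0 / 24.0 / 365.0;
}
```

Calling PairEstimate.YearsToInsert(PairEstimate.OrderedPairs(2_000_000), 35_000) gives a figure in the low hundreds of years, which is why restricting the work to nearby nodes (for example through a spatial index) matters far more than tuning the loop.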

Related

C# trigonometric functions / math functions

I'm writing a class which should calculate angles (in degrees), but after a lot of trying I can't get it to work.
When r = 45 & h = 0, sum should be around 77.14°,
but it returns NaN. Why? I know it must have something to do with the Math.Atan method and an invalid value.
My code:
private int r = 0, h = 0;
public Schutzwinkelberechnung(int r, int h)
{
this.r = r;
this.h = h;
}
public double getSchutzwinkel()
{
double sum = 0;
sum = 180 / Math.PI * Math.Atan((Math.Sqrt(2 * h * r - (h * h)) - r * Math.Sin(Math.Asin((Math.Sqrt(2 * h * r)) / (2 * r)))) / (h - r + r * (Math.Cos(Math.Asin(Math.Sqrt(2 * h * r) / (2 * r))))));
return sum;
}
Does someone see my mistake? I got the formula from an excel sheet.
EDIT: Okay, my problem was that I had a parsing error while creating the object or getting the user input; apparently I solved it by accident. Of course I have to add a simple exception, as Nick Dechiara said. Thank you very much for the fast reply, I appreciate it.
EDIT2: The exception in my excel sheet is:
if(h < 2) {h = 2}
so that's explaining everything and I wasn't paying attention at all. Thanks again for all answers.
int r = 45, h = 2;
sum = 77.14°
A good approach to debugging these kinds of issues is to break the equation into smaller pieces, so it is easier to debug.
double r = 45;
double h = 0;
double sqrt2hr = Math.Sqrt(2 * h * r);
double asinsqrt2hr = Math.Asin((sqrt2hr) / (2 * r));
double a = (Math.Sqrt(2 * h * r - (h * h)) - r * Math.Sin(asinsqrt2hr));
double b = (h - r + r * (Math.Cos(asinsqrt2hr)));
double sum = 180 / Math.PI * Math.Atan(a / b);
Now if we put a breakpoint at sum and let the code run, we see that both a and b are equal to zero. This gives us a / b = 0 / 0 = NaN in the final line.
Now we can ask, why is this happening? In the case of b you have h - r + r * cos(asin(0)), which is 0 - 45 + 45 * 1 and evaluates to 0, so b becomes 0. You probably have an error in your math there.
In the case of a, we have 2 * h * r - h * h, which also evaluates to 0.
You probably either A) have an error in your equation, or B) need to include a special case for when h = 0, as that is breaking your math here.
Definitely break up the expression, into something like
var a = Asin(Sqrt(2 * h * r) / (2 * r));
var b = Sqrt(2 * h * r - h * h) - r * Sin(a);
var c = h - r + r * Cos(a);
var sum = 180 / PI * Atan(b / c);
and you will find b=0 and c=0. You might consider changing the last expression into
var sum = 180 / PI * Atan2(b , c);
which will return a value when b=0 and c=0.
PS. Also, use using static System.Math; in the beginning of the code to shorten such math expressions.
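As a minimal sketch, the split-up formula with Atan2 could be wrapped in a helper like the one below. The class and method names are hypothetical. Note that returning 0 for h = 0 only avoids the NaN; whether 0 is a meaningful angle there is a separate question (the asker's spreadsheet clamps h to at least 2 instead).

```csharp
using System;

public static class Schutzwinkel
{
    // Same formula as in the question, split into named parts.
    // Atan2 is used so that the degenerate case b = 0, c = 0 yields 0 instead of NaN.
    public static double Angle(double r, double h)
    {
        double a = Math.Asin(Math.Sqrt(2 * h * r) / (2 * r));
        double b = Math.Sqrt(2 * h * r - h * h) - r * Math.Sin(a);
        double c = h - r + r * Math.Cos(a);
        return 180 / Math.PI * Math.Atan2(b, c);
    }
}
```

With r = 45 and h = 2 this gives about 77.14 degrees, matching the value quoted in the question; with h = 0 it returns 0, whereas Math.Atan(b / c) would return NaN.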

c# linear regression given 2 sets of data

I have 2 sets of data: one is an average position and the other a score, so for every position I have the predicted score of an item:
double[] positions = {0.1,0.2,0.3,0.45,0.46,...};
double[] scores = {1,1.2,1.5,2.2,3.4,...};
I need to create a function that predicts the score for an average position, for example for a new item with position 1.7.
I understand the function should be something like y=a*x + b, but how do I get to it?
Any help will be appreciated!
Yes, you have to build a linear function
y = a * x + b
In order to do this you have to compute the following sums, where N is the number of points, x holds the predictor's values and y the corresponding results:
sx - sum of x's
sxx - sum of x * x
sy - sum of y's
sxy - sum of x * y
So
a = (N * sxy - sx * sy) / (N * sxx - sx * sx);
b = (sy - a * sx) / N;
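The sums and the two formulas above translate fairly directly into code. The following is a sketch rather than a full implementation; the names LinearFit, Fit and Predict are illustrative, and the guard against identical x values is an addition not discussed in the answer.

```csharp
using System;

public static class LinearFit
{
    // Least-squares fit of y = a * x + b using the sums sx, sxx, sy and sxy.
    public static (double a, double b) Fit(double[] x, double[] y)
    {
        if (x.Length != y.Length || x.Length < 2)
            throw new ArgumentException("Need two arrays of equal length with at least two points.");

        int n = x.Length;
        double sx = 0, sxx = 0, sy = 0, sxy = 0;
        for (int i = 0; i < n; i++)
        {
            sx += x[i];
            sxx += x[i] * x[i];
            sy += y[i];
            sxy += x[i] * y[i];
        }

        double denominator = n * sxx - sx * sx; // zero when all x values are equal
        if (denominator == 0)
            throw new ArgumentException("All x values are identical; the slope is undefined.");

        double a = (n * sxy - sx * sy) / denominator;
        double b = (sy - a * sx) / n;
        return (a, b);
    }

    // Evaluates the fitted line at a new position x.
    public static double Predict((double a, double b) line, double x) => line.a * x + line.b;
}
```

Usage would look like var line = LinearFit.Fit(positions, scores); followed by LinearFit.Predict(line, 1.7) for the new item.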

Weird glitch when creating sine audio signal

I had a look at the wave format today and created a little wave generator. I create a sine sound like this:
public static Wave GetSine(double length, double hz)
{
int bitRate = 44100;
int l = (int)(bitRate * length);
double f = 1.0 / bitRate;
Int16[] data = new Int16[l];
for (int i = 0; i < l; i++)
{
data[i] = (Int16)(Math.Sin(hz * i * f * Math.PI * 2) * Int16.MaxValue);
}
return new Wave(false, Wave.MakeInt16WaveData(data));
}
MakeInt16WaveData looks like this:
public static byte[] MakeInt16WaveData(Int16[] ints)
{
int s = sizeof(Int16);
byte[] buf = new byte[s * ints.Length];
for(int i = 0; i < ints.Length; i++)
{
Buffer.BlockCopy(BitConverter.GetBytes(ints[i]), 0, buf, i * s, s);
}
return buf;
}
This works as expected! Now I wanted to swoop from one frequency to another like this:
public static Wave GetSineSwoop(double length, double hzStart, double hzEnd)
{
int bitRate = 44100;
int l = (int)(bitRate * length);
double f = 1.0 / bitRate;
Int16[] data = new Int16[l];
double hz;
double hzDelta = hzEnd - hzStart;
for (int i = 0; i < l; i++)
{
hz = hzStart + ((double)i / l) * hzDelta * 0.5; // why *0.5 ?
data[i] = (Int16)(Math.Sin(hz * i * f * Math.PI * 2) * Int16.MaxValue);
}
return new Wave(false, Wave.MakeInt16WaveData(data));
}
Now, when I swooped from 200 to 100 Hz, the sound played from 200 to 0 hertz. For some reason I had to multiply the delta by 0.5 to get the correct output. What might be the issue here ? Is this an audio thing or is there a bug in my code ?
Thanks
Edit by TaW: I took the liberty of adding charts of the data that illustrate the problem: the first with the 0.5 factor, the second with 1.0, and the third and fourth with 2.0 and 5.0. [charts not preserved]
Edit: here is an example, a swoop from 200 to 100 Hz, with debug values: the wave clearly does not end at 100 Hz. [screenshots not preserved]
Digging out my rusty math I think it may be because:
Going in L steps from frequency F1 to F2 you have a frequency of
Fi = F1 + i * ( F2 - F1 ) / L
or with
( F2 - F1 ) / L = S
Fi = F1 + i * S
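Now, the argument of the sine has to be the accumulated phase, so to find out how far we have progressed we need the integral of Fi over i, which is
I(Fi) = i * F1 + 1/2 * i ^ 2 * S
That is exactly the term inside your sine once the 0.5 factor is applied to the delta: hz * i = F1 * i + 0.5 * S * i ^ 2. Without the factor the frequency changes twice as fast, which is why the swoop from 200 to 100 Hz ended at 0 Hz.
Note that you can gain efficiency by moving the constant part (S/2) out of the loop.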
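An alternative way to express the same idea is to accumulate the phase sample by sample instead of multiplying frequency by time. The sketch below assumes a linear sweep and returns raw samples rather than the question's Wave object; the class name, method name and the sampleRate parameter are illustrative.

```csharp
using System;

public static class Sweep
{
    // Linear frequency sweep. The sine argument is the accumulated phase
    // (the running sum of the instantaneous frequency), not hz * t.
    public static short[] GetSineSweep(double seconds, double hzStart, double hzEnd, int sampleRate = 44100)
    {
        int count = (int)(sampleRate * seconds);
        var data = new short[count];
        double phase = 0.0; // radians
        for (int i = 0; i < count; i++)
        {
            data[i] = (short)(Math.Sin(phase) * short.MaxValue);
            double hz = hzStart + (hzEnd - hzStart) * i / count; // frequency at sample i
            phase += 2 * Math.PI * hz / sampleRate;              // advance the phase by one sample
        }
        return data;
    }
}
```

Because the phase is summed, no 0.5 factor appears, and the same loop would also work for non-linear sweeps where the integral has no simple closed form.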

Exponential Moving Average with different kernels

I am trying to replicate some formulas but am having trouble translating the math to code.
Here is the simple Exponential Moving Average
In c#:
out[1] = values[1];
for (i in 2:N(X)) {
tmp = (times[i] - times[i-1]) / tau;
w = exp(-tmp);
w2 = (1 - w) / tmp;
out[i] = out[i-1] * w + values[i] * (1 - w2) + values[i-1] * (w2 - w);
}
In Python:
mu = numpy.exp ((ts[1] - ts[0]) / self.tau)
nu = 1.0 - mu
return numpy.array ([
mu * el + nu * arr[0] for el, arr in zip (last, arrays)
])
I want to be able to specify different kernels, as described in a formula that is not reproduced here, and am not sure how to go about it.
This is all done so I can eventually recreate a moving differential (formula likewise not reproduced).
Thanks for any help given
One possible approach here is to have a method that returns the kernel.
From what I am able to see, inputs to this method would be kerneltype, i, and otherInputs.
A simple approach would be:
for (int i = 1; i < values.Length; i++)
{
double tmp = (times[i] - times[i-1]) / tau;
//w = exp(-tmp);
//w2 = (1 - w) / tmp;
List<Object> kernelInputsInitial = new List<Object>();
kernelInputsInitial.Add(tmp); //takes in the first argument
kernelInputsInitial.Add(true); //expected to calculate the first term
double w = GetKernel(KernelType.Exponential, i, kernelInputsInitial);
List<Object> kernelInputsSecondTerm = new List<Object>();
kernelInputsSecondTerm.Add(w); //takes in the first argument
kernelInputsSecondTerm.Add(false); //expected to calculate the second term
double w2 = GetKernel(KernelType.Exponential, i, kernelInputsSecondTerm);
//'out' is a reserved word in C#, so the result array is called output here
output[i] = output[i-1] * w + values[i] * (1 - w2) + values[i-1] * (w2 - w);
....
}
This is of course terribly, terribly rough, and a lot of improvement can be made, but it is intended to merely get the point across.
I would use an interface to represent a kernel, and have classes derived per kernel. In my experience, that produces sufficiently readable and maintainable code, but there's always room for improvement.
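The interface idea could look roughly like the sketch below. It rests on an assumption that is not confirmed by the question: that the variants differ only in how the weight w2 is derived from tmp and w. All type names are hypothetical. LinearInterpolation reproduces the question's w2 = (1 - w) / tmp; the previous-point and next-point variants are common alternatives added purely for illustration.

```csharp
using System;

// Hypothetical interface: a scheme decides the weight nu (w2 in the question)
// that splits the new contribution between values[i-1] and values[i].
public interface IEmaInterpolation
{
    // alpha = (times[i] - times[i-1]) / tau, mu = exp(-alpha)
    double Nu(double alpha, double mu);
}

public sealed class PreviousPoint : IEmaInterpolation
{
    public double Nu(double alpha, double mu) => 1.0;
}

public sealed class LinearInterpolation : IEmaInterpolation
{
    // Matches w2 = (1 - w) / tmp from the question; alpha must be non-zero.
    public double Nu(double alpha, double mu) => (1.0 - mu) / alpha;
}

public sealed class NextPoint : IEmaInterpolation
{
    public double Nu(double alpha, double mu) => mu;
}

public static class Ema
{
    public static double[] Compute(double[] values, double[] times, double tau, IEmaInterpolation scheme)
    {
        var result = new double[values.Length];
        if (values.Length == 0) return result;
        result[0] = values[0];
        for (int i = 1; i < values.Length; i++)
        {
            double alpha = (times[i] - times[i - 1]) / tau;
            double mu = Math.Exp(-alpha);
            double nu = scheme.Nu(alpha, mu);
            result[i] = result[i - 1] * mu + values[i] * (1 - nu) + values[i - 1] * (nu - mu);
        }
        return result;
    }
}
```

The three coefficients always sum to 1, so a constant input series stays constant under any scheme, which makes a convenient sanity check when adding a new class.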

Which way is more accurate?

I need to divide a numeric range into some segments of equal length, but I can't decide which way is more accurate. For example:
double r1 = 100.0, r2 = 1000.0, r = r2 - r1;
int n = 30;
double[] position = new double[n];
for (int i = 0; i < n; i++)
{
position[i] = r1 + (double)i / n * r;
// position[i] = r1 + i * r / n;
}
It's about (double)int1 / int2 * double or int1 * double / int2. Which way is more accurate? Which way should I use?
Update
The following code will show the difference:
double r1 = 1000.0, r2 = 100000.0, r = r2 - r1;
int n = 300;
double[] position = new double[n];
for (int i = 0; i < n; i++)
{
double v1 = r1 + (double)i / n * r;
double v2 = position[i] = r1 + i * r / n;
if (v1 != v2)
{
Console.WriteLine(v2 - v1);
}
}
Disclaimer: All numbers I am going to give as examples are not exact, but show the principle of what's happening behind the scenes.
Let's examine two cases:
(1) int1 = 1000, int2= 3, double = 3.0
The first method will give you: (1000.0 / 3) * 3 == 333.33333 * 3.0 == 999.999...
While the second will give (1000 * 3.0) / 3 == 3000 / 3 == 1000
In this scenario - the second method is more accurate.
(2) int1 = 2, int2 = 2, double = Double.MAX_VALUE
The first will yield (2.0 / 2) * Double.MAX_VALUE == 1 * Double.MAX_VALUE == Double.MAX_VALUE
While the second will give (2 * Double.MAX_VALUE) / 2, which in Java evaluates to Infinity. C# doubles follow the same IEEE 754 rules, so (2 * double.MaxValue) / 2 overflows to positive infinity there as well, and it is definitely an issue.
So, in this case - the first method is more accurate.
Things get more complicated if the integers are longs or the double is a float: there are long values that cannot be represented exactly by a double, so accuracy can be lost for large values, and in any case large double values are less precise.
Conclusion: Which is better is domain specific. In some cases the first method will be better and in some the second. It really depends on the values of int1, int2, and double.
However, AFAIK, the general rule of thumb with double-precision operations is to keep the intermediate values as small as possible (don't create huge numbers and then shrink them back; keep them small for as long as you can). This issue is known as loss of significant digits.
Neither is particularly faster, since the compiler or the JIT process may reorder the operation for efficiency anyway.
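The two orderings and the overflow case from example (2) can be tried out with a sketch like this one. The names are illustrative, and the behaviour described assumes a runtime that evaluates double arithmetic in 64-bit IEEE 754 precision.

```csharp
using System;

public static class DivisionOrder
{
    // Divide first: the intermediate i / n stays small (between 0 and 1 when i <= n).
    public static double DivideFirst(int i, int n, double r) => (double)i / n * r;

    // Multiply first: the intermediate i * r can exceed double.MaxValue.
    public static double MultiplyFirst(int i, int n, double r) => i * r / n;
}
```

DivisionOrder.MultiplyFirst(2, 2, double.MaxValue) overflows to positive infinity, while DivisionOrder.DivideFirst(2, 2, double.MaxValue) returns double.MaxValue; for moderate values such as (1000, 3, 3.0), multiplying first gives exactly 1000.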
Maybe I misunderstand your requirement but why do any division/multiplication inside the loop at all? Maybe this would get the same results:
decimal r1 = 100.0m, r2 = 1000.0m, r = r2 - r1;
int n = 30;
decimal[] position = new decimal[n];
decimal diff = r / n;
decimal current = r1;
for (int i = 0; i < n; i++)
{
position[i] = current;
current += diff;
}
