How to round values in a very large array of double in C#

In C# I have an array of double (a double[], that is).
I want to round all of its values to two decimal places.
One solution is to do this with a foreach loop and Math.Round(),
but the array is very large (1,000,000 to 10,000,000 values).
Instead of foreach, is there a more efficient solution?

Instead of foreach, a more efficient solution?
Nope. Fundamentally, if you want to round all the values, you've got to round all the values. There's no magic that will make that simpler.
Of course, if you're only going to access a few of the values, you could potentially redesign your code to lazily round on access - or even on display - but if you really need the rounded versions of all the values, there's nothing that will make that faster than an O(n) operation.
You could do it in parallel as noted in other answers, which may well make it faster - but it will simultaneously be less efficient than a single-threaded approach. (You'll end up doing more work, because of the coordination involved - but you'll still get it done faster overall, probably.)
You haven't said anything about why you want to round these values though - usually, you should only round values at the point of display. Are you really sure you need to round them before then? What do the values represent? (If they're financial values, you should strongly consider decimal instead of double.)
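For illustration, if the rounding only matters at display time, a per-value format string does the job without any up-front pass over the array; a minimal sketch (not the asker's code):
// Sketch: round only where a value is actually shown,
// instead of pre-rounding the whole array.
static string FormatForDisplay(double value)
{
    return value.ToString("F2"); // two decimal places, applied at display time
}

// For money, decimal keeps exact decimal fractions end to end.
static string FormatForDisplay(decimal amount)
{
    return amount.ToString("F2");
}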

You can try Parallel.ForEach and thread it.
There is still one operation per item, but the operations are independent of each other, so they can safely run in parallel.

To do it in a multi-threaded way:
Parallel.For(0, arr.Length, i => arr[i] = Math.Round(arr[i], 2));

I don't think there is any solution which does not include some kind of loop. You could use LINQ:
array = array.Select(v => Math.Round(v, 2)).ToArray();
but it will be even slower than your custom for loop, because instead of modifying the array in place, it will create a new array with the new values.
To make your loop faster, you can split it into parts and run simultaneously, using TPL.
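A sketch of that idea (assuming the array is called array): Partitioner.Create hands each worker a contiguous chunk of indexes, so every chunk is processed by a tight inner loop rather than a delegate call per element.
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

static void RoundInPlace(double[] array)
{
    // Each range is a [fromInclusive, toExclusive) pair of indexes.
    Parallel.ForEach(Partitioner.Create(0, array.Length), range =>
    {
        for (int i = range.Item1; i < range.Item2; i++)
        {
            array[i] = Math.Round(array[i], 2);
        }
    });
}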

Any solution will eventually devolve into a loop, though you can write it nicely as a LINQ statement:
rounds = mean.Select(x => Math.Round(x, 2));
If you have multiple processors, then a potentially faster solution would be to use Parallel.ForEach, though you'd have to test it to see whether it actually is.

Yeah, you can use Parallel.ForEach.
List<double> values = new List<double>();
// ... fill the list ...
ConcurrentBag<double> rounded = new ConcurrentBag<double>();
Parallel.ForEach(
    values,
    value =>
    {
        // Math.Round(value, 2) rounds to two decimal places.
        // Note: ConcurrentBag does not preserve the original order.
        rounded.Add(Math.Round(value, 2));
    });

Related

Loop - Calculated last element different

Hi everyone (sorry for the bad title),
I have a loop in which a rounding difference can occur on every pass. I would like to accumulate these differences and add them to the last record of my result.
var cumulatedRoundDifference = 0m;
var resultSet = Enumerable.Range(0, periods)
    .Select(currentPeriod =>
    {
        var value = this.CalculateValue(currentPeriod);
        var valueRounded = this.CommercialRound(value);
        // Bad part :(
        cumulatedRoundDifference += value - valueRounded;
        if (currentPeriod == periods - 1)
            valueRounded = this.CommercialRound(value + valueRounded);
        return valueRounded;
    });
In my opinion, the code isn't very nice at the moment.
Is there a pattern or algorithm for this kind of thing, or some clever way to do it with LINQ, without a variable outside the loop?
Many greetings
It seems like you are doing two things - rounding everything, and calculating the total rounding error.
You could remove the variable outside the lambda, but then you would need 2 queries.
var baseQuery = Enumerable.Range(0, periods)
    .Select(x => CalculateValue(x))
    .Select(value => new { Value = value, ValueRounded = CommercialRound(value) });
var cumulateRoundDifference = baseQuery.Select(x => x.Value - x.ValueRounded).Sum();
// LINQ isn't really good at doing something different to the last element.
var resultSet = baseQuery
    .Select(x => x.ValueRounded)
    .Take(periods - 1)
    .Concat(new[] { CommercialRound(CalculateValue(periods - 1) + cumulateRoundDifference) });
Is there a pattern or algorithm for this kind of thing, or some clever way to do it with LINQ, without a variable outside the loop?
I don't quite agree with what you're trying to accomplish. You're trying to accomplish two very different tasks, so why are you trying to merge them into the same iteration block? The latter (handling the last item) isn't even supposed to be an iteration.
For readability's sake, I suggest splitting the two off. It makes more sense and doesn't require you to check if you're on the last loop of the iteration (which saves you some code and nesting).
While I don't quite understand the calculation in and of itself, I can answer the algorithm you're directly asking for (though I'm not sure this is the best way to do it, which I'll address later in the answer).
var allItemsExceptTheLastOne = allItems.Take(allItems.Count() - 1);
foreach (var item in allItemsExceptTheLastOne)
{
    // Your logic for all items except the last one
}

var theLastItem = allItems.Last();
// Your logic for the last item
This is, in my opinion, a cleaner and more readable approach. I'm not a fan of using lambdas as mini-methods when their readability is less than trivial. This may be subjective and a matter of personal style.
On rereading, I think I understand the calculation better, so I've added an attempt at implementing it, while still maximizing readability as best I can:
// First we make a list of the values (without the sum)
var myValues = Enumerable
    .Range(0, periods)
    .Select(period => this.CalculateValue(period))
    .Select(value => value - this.CommercialRound(value))
    .ToList();
// myValues = [ 0.1, 0.2, 0.3 ]
myValues.Add(myValues.Sum());
// myValues = [ 0.1, 0.2, 0.3, 0.6 ]
This follows the same approach as the algorithm I first suggested: iterate over the iteratable items, and then separately handle the last value of your intended result list.
Note that I separated the logic into two subsequent Select statements as I consider it the most readable (no excessive lambda bodies) and efficient (no duplicate CalculateValue calls) way of doing this. If, however, you are more concerned about performance, e.g. when you are expecting to process massive lists, you may want to merge these again.
I suggest that you always try to default to writing code that favors readability over (excessive) optimization; and only deviate from that path when there is a clear need for additional optimization (which I cannot decide based on your question).
On a second reread, I'm not sure you've explained the actual calculation well enough, as cumulatedRoundDifference is not actually used in your calculations, but the code seems to suggest that its value should be important to the end result.

Pure Speed for Lookup Single Value Type c#?

.NET 4.5.1
I have a "bunch" of Int16 values that fit in a range from -4 to 32760. The numbers in the range are not consecutive, but they are ordered from -4 to 32760. In other words, the numbers from 16-302 are not in the "bunch", but numbers 303-400 are in there, number 2102 is not there, etc.
What is the all-out fastest way to determine if a particular value (eg 18400) is in the "bunch"? Right now it is in an Int16[] and the Linq Contains method is used to determine if a value is in the array, but if anyone can say why/how a different structure would deliver a single value faster I would appreciate it. Speed is the key for this lookup (the "bunch" is a static property on a static class).
Sample code that works
Int16[] someShorts = new[] { (short)4, (short)5, (short)6 };
var isInIt = someShorts.Contains((short)4);
I am not sure if that is the most performant thing that can be done.
Thanks.
It sounds like you really want BitArray - just offset the value by 4 so you've got a range of [0, 32764] and you should be fine.
That will allocate an array which is effectively 4K in size (32764 / 8), with one bit per value in the array. It will handle finding the relevant element in the array, and applying bit masking. (I don't know whether it uses a byte[] internally or something else.)
This is a potentially less compact representation than storing ranges, but the only cost involved in getting/setting a bit will be computing an index (basically a shift), getting the relevant bit of memory to the CPU, and then bit masking. It takes 1/8th the size of a bool[], making your CPU cache usage more efficient.
Of course, if this is really a performance bottleneck for you, you should compare both this solution and a bool[] approach in your real application - microbenchmarks aren't nearly as important here as how your real app behaves.
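A rough sketch of that approach, reusing someShorts from the question (the Offset constant and variable names are just for illustration):
using System.Collections;

const int Offset = 4;                           // shifts [-4, 32760] onto [0, 32764]
var present = new BitArray(32760 + Offset + 1); // one bit per possible value

// Build the bit set once from the existing values.
foreach (short s in someShorts)
    present[s + Offset] = true;

// Lookup: an index computation, a load and a bit mask.
bool isInIt = present[18400 + Offset];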
Make one bool for each possible value:
var isPresentItems = new bool[32760-(-4)+1];
Set the corresponding element to true if the given item is present in the set. Lookup is easy:
var isPresent = isPresentItems[myIndex];
Can't be done any faster. The bools will fit into L1 or L2 cache.
I advise against using BitArray because it stores multiple values per byte. This means that each access is slower. Bit-arithmetic is required.
And if you want insane speed, don't make LINQ call a delegate once for each item. LINQ is not the first choice for performance-critical code. Many indirections that stall the CPU.
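Roughly, with the -4 offset made explicit (again reusing someShorts from the question; the names are illustrative):
const int Min = -4, Max = 32760;
var isPresentItems = new bool[Max - Min + 1];

// Mark every value that occurs in the source array.
foreach (short s in someShorts)
    isPresentItems[s - Min] = true;

// Lookup is a single array access.
bool isPresent = isPresentItems[18400 - Min];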
If you want to optimize for lookup time, pick a data structure with O(1) (constant-time) lookups. You have several choices since you only care about set membership, and not sorting or ordering.
A HashSet<Int16> will give this to you, as will a BitArray indexed on max - min + 1. The absolute fastest ad-hoc solution would probably be a simple array indexed on max - min + 1, as @usr suggests. Any of these should be plenty "fast enough". The HashSet<Int16> will probably use the most memory, as the size of the internal hash table is an implementation detail. BitArray would be the most space efficient out of these options.
If you only have a single lookup, then memory should not be a concern, and I suggest first going with a HashSet<Int16>. That solution is easy to reason about and deal with in a bug-free manner, as you don't have to worry about staying within array boundaries; you can simply check set.Contains(n). This is particularly useful if your value range might change in the future. You can fall back to one of the other solutions if you need to optimize further for speed or performance.
One option is to use a HashSet<Int16>. Checking whether a value is in it is an O(1) operation.
The code example:
HashSet<Int16> evenNumbers = new HashSet<Int16>();
for (Int16 i = 0; i < 20; i++)
{
    evenNumbers.Add(i);
}

if (evenNumbers.Contains(0))
{
    // the value is in the set
}
Because the numbers are sorted, I would loop through the list one time and generate a list of Range objects that have a start and end number. That list would be much smaller than having a list or dictionary of thousands of numbers.
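A sketch of that idea, with a hypothetical NumberRange type and a binary search over the ranges for the lookup:
using System;
using System.Collections.Generic;

struct NumberRange
{
    public short Start;
    public short End; // inclusive
}

// One pass over the sorted values builds the (much smaller) range list.
static List<NumberRange> BuildRanges(short[] sortedValues)
{
    var ranges = new List<NumberRange>();
    if (sortedValues.Length == 0)
        return ranges;

    var current = new NumberRange { Start = sortedValues[0], End = sortedValues[0] };
    for (int i = 1; i < sortedValues.Length; i++)
    {
        if (sortedValues[i] <= current.End + 1)
        {
            current.End = Math.Max(current.End, sortedValues[i]); // consecutive or duplicate: extend
        }
        else
        {
            ranges.Add(current);                                  // gap: close this range, start a new one
            current = new NumberRange { Start = sortedValues[i], End = sortedValues[i] };
        }
    }
    ranges.Add(current);
    return ranges;
}

// Membership test: binary search on the range bounds.
static bool ContainsValue(List<NumberRange> ranges, short value)
{
    int lo = 0, hi = ranges.Count - 1;
    while (lo <= hi)
    {
        int mid = (lo + hi) / 2;
        if (value < ranges[mid].Start) hi = mid - 1;
        else if (value > ranges[mid].End) lo = mid + 1;
        else return true;
    }
    return false;
}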
If your "bunch" of numbers can be identified as a series of intervals, I suggest you use Interval Trees. An interval tree allows dynamic insertion/deletions and also searching if a an interval intersects any interval in the tree is O(log(n)) where n is the number of intervals in the tree. In your case the number of intervals would be way less than the number of ints and the search is much faster.

merge in-place without external storage

I want to merge two arrays with sorted values into one. Since both source arrays are stored as succeeding parts of a large array, I wonder if you know a way to merge them within that large storage - meaning an in-place merge.
All the methods I found need some external storage. They often require sqrt(n) temp arrays. Is there an efficient way without it?
I'm using C#. Other languages are welcome too. Thanks in advance!
AFAIK, merging two (even sorted) arrays does not work in place without considerably increasing the necessary number of comparisons and moves of elements; see merge sort. However, blocked variants exist which are able to merge a list of length n by using a temporary array of length sqrt(n) - as you wrote - while still keeping the number of operations reasonably low. It's not bad - but it's also not "nothing", and it is apparently the best you can get.
For practical situations, and if you can afford it, you are better off using a temporary array to merge your lists.
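For reference, the straightforward buffered merge of two adjacent sorted runs only has to copy out the left run; a sketch assuming double values (the method name and index convention are illustrative):
using System;

// Merge the sorted runs a[lo..mid) and a[mid..hi) back into a,
// buffering only the left run.
static void MergeAdjacentRuns(double[] a, int lo, int mid, int hi)
{
    int leftLen = mid - lo;
    var temp = new double[leftLen];
    Array.Copy(a, lo, temp, 0, leftLen);

    int i = 0;   // position in temp (the left run)
    int j = mid; // position in the right run
    int k = lo;  // write position in a

    while (i < leftLen && j < hi)
        a[k++] = temp[i] <= a[j] ? temp[i++] : a[j++];

    // Whatever remains of the left run is copied back;
    // the right run is already in its final place.
    while (i < leftLen)
        a[k++] = temp[i++];
}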
If the values are stored as succeeding parts of a larger array, you just want to sort the array, then remove consecutive values which are equal.
static int SortAndDedupe<T>(T[] a) where T : IComparable<T>
{
    if (a.Length == 0)
        return 0;

    // Do an efficient in-place sort
    Array.Sort(a);

    // Now deduplicate
    int lwm = 0; // low water mark
    int hwm = 1; // high water mark
    while (hwm < a.Length)
    {
        // If the lwm and hwm elements are the same, it is a duplicate entry.
        if (a[lwm].CompareTo(a[hwm]) == 0)
        {
            hwm++;
        }
        else
        {
            // Not a duplicate entry - move the lwm up
            // and copy the hwm element down over the gap.
            lwm++;
            if (lwm < hwm)
            {
                a[lwm] = a[hwm];
            }
            hwm++;
        }
    }

    // New length is lwm + 1;
    // the number of elements removed is (hwm - lwm - 1).
    return lwm + 1;
}
Before you conclude that this will be too slow, implement it and profile it. That should take about ten minutes.
Edit: This can of course be improved by using a different sort rather than the built-in sort, e.g. Quicksort, Heapsort or Smoothsort, depending on which gives better performance in practice. Note that hardware architecture issues mean that the practical performance comparisons may very well be very different from the results of big O analysis.
Really you need to profile it with different sort algorithms on your actual hardware/OS platform.
Note: I am not attempting in this answer to give an academic answer, I am trying to give a practical one, on the assumption you are trying to solve a real problem.
Don't worry about the external storage. sqrt(n), or even more, should not hurt your performance. You just have to make sure the storage is pooled, especially for large data and especially when merging in loops. Otherwise the GC will get stressed and eat up a considerable part of your CPU time / memory bandwidth.
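One way to pool the scratch buffer on current .NET is ArrayPool<T> from System.Buffers (a sketch, with leftLen as in the merge sketch above; on older frameworks you would keep and reuse a buffer of your own instead):
using System.Buffers;

// Rent a scratch buffer from the shared pool instead of allocating one per merge.
// Note: Rent may hand back an array larger than requested.
double[] temp = ArrayPool<double>.Shared.Rent(leftLen);
try
{
    // ... use temp[0 .. leftLen) as the merge buffer ...
}
finally
{
    ArrayPool<double>.Shared.Return(temp);
}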

Best practice with Math.Pow

I'm working on an image processing library which extends OpenCV, HALCON, ... . The library must work with .NET Framework 3.5, and since my experience with .NET is limited I would like to ask some questions regarding performance.
I have encountered a few specific things which I cannot properly explain to myself, and I would like to ask you a) why and b) what the best practice is to deal with these cases.
My first question is about Math.Pow. I already found some answers here on Stack Overflow which explain (a) quite well, but not what to do about it (b). My benchmark program looks like this:
Stopwatch watch = new Stopwatch(); // from System.Diagnostics
watch.Start();
for (int i = 0; i < 1000000; i++)
{
    double result = Math.Pow(4, 7); // the function call
}
watch.Stop();
The result was not very nice (~300 ms on my computer; I ran the test 10 times and calculated the average).
My first idea was to check whether this is because it is a static function, so I implemented my own class:
class MyMath
{
    public static double Pow(double x, double y) // using some expensive functions to calculate the power
    {
        return Math.Exp(Math.Log(x) * y);
    }

    public static double PowLoop(double x, int y) // using a loop
    {
        double res = x;
        for (int i = 1; i < y; i++)
            res *= x;
        return res;
    }

    public static double Pow7(double x) // using inline calls
    {
        return x * x * x * x * x * x * x;
    }
}
The third thing I checked was replacing Math.Pow(4, 7) directly with 4*4*4*4*4*4*4.
The results are (the average out of 10 test runs)
300 ms Math.Pow(4,7)
356 ms MyMath.Pow(4,7) //gives wrong rounded results
264 ms MyMath.PowLoop(4,7)
92 ms MyMath.Pow7(4)
16 ms 4*4*4*4*4*4*4
Now my situation is basically this: don't use Math.Pow. My only problem is just that... do I really have to implement my own Math class now? It seems somehow wasteful to implement my own class just for the power function. (Btw, PowLoop and Pow7 are even faster in the Release build by ~25%, while Math.Pow is not.)
So my final questions are:
a) Am I wrong to avoid Math.Pow entirely (except perhaps for fractional exponents)? (That makes me somehow sad.)
b) If you have code to optimize, do you really write all such mathematical operations out by hand?
c) Is there maybe already a faster (open-source ^^) library for mathematical operations?
d) The source of my question is basically this: I had assumed that the .NET Framework itself already provides very optimized code / compile results for such basic operations - be it the Math class or handling arrays - and I was a little surprised how much benefit I would gain by writing my own code. Are there other general "fields", or anything else, to look out for in C# where I cannot trust C# directly?
Two things to bear in mind:
You probably don't need to optimise this bit of code. You've just done a million calls to the function in less than a second. Is this really going to cause big problems in your program?
Math.Pow is probably fairly optimal anyway. At a guess, it will be calling a proper numerics library written in a lower level language, which means you shouldn't expect orders of magnitude increases.
Numerical programming is harder than you think. Even the algorithms that you think you know how to calculate, aren't calculated that way. For example, when you calculate the mean, you shouldn't just add up the numbers and divide by how many numbers you have. (Modern numerics libraries use a two pass routine to correct for floating point errors.)
That said, if you decide that you definitely do need to optimise, then consider using integers rather than floating point values, or outsourcing this to another numerics library.
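As an aside, the corrected two-pass mean mentioned above looks roughly like this (a sketch, not any particular library's implementation):
static double CorrectedMean(double[] x)
{
    int n = x.Length;

    // First pass: the naive mean.
    double sum = 0.0;
    for (int i = 0; i < n; i++) sum += x[i];
    double mean = sum / n;

    // Second pass: the residuals would sum to zero in exact arithmetic,
    // so whatever is left over is accumulated rounding error.
    double correction = 0.0;
    for (int i = 0; i < n; i++) correction += x[i] - mean;

    return mean + correction / n;
}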
Firstly, integer operations are much faster than floating point. If you don't need floating-point values, don't use the floating-point data type. This is generally true for any programming language.
Secondly, as you have stated yourself, Math.Pow can handle real exponents. It uses a much more intricate algorithm than a simple loop, so no wonder it is slower than simply looping. If you get rid of the loop and just do n multiplications, you also cut out the overhead of setting up the loop - thus making it faster. But if you don't use a loop, you have to know the value of the exponent beforehand - it can't be supplied at runtime.
I am not really sure why Math.Exp and Math.Log would be faster. But if you use Math.Log, you can't compute the power of negative values.
Basically, ints are faster, and avoiding loops avoids extra overhead. But you trade away some flexibility when you go for those. It is generally a good idea to avoid reals when all you need are integers, but in this case coding up a custom function when one already exists seems a little too much.
The question you have to ask yourself is whether this is worth it. Is Math.Pow actually slowing your program down? And in any case, the Math.Pow already bundled with your language is often the fastest or very close to that. If you really wanted to make an alternate implementation that is really general purpose (i.e. not limited to only integers, positive values, etc.), you will probably end up using the same algorithm used in the default implementation anyway.
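If you do need a general-purpose power for non-negative integer exponents that aren't known at compile time, exponentiation by squaring is the usual compromise between a plain loop and hand-inlined multiplications (a sketch; this is not what Math.Pow does for real exponents):
static double PowInt(double x, int n)
{
    // Exponentiation by squaring: O(log n) multiplications, n >= 0.
    double result = 1.0;
    double factor = x;
    while (n > 0)
    {
        if ((n & 1) == 1) result *= factor;
        factor *= factor;
        n >>= 1;
    }
    return result;
}
PowInt(4, 7) then covers the PowLoop case with fewer multiplications for large exponents.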
When you are talking about making a million iterations of a line of code then obviously every little detail will make a difference.
Math.Pow() is a function call which will be substantially slower than your manual 4*4...*4 example.
Don't write your own class, as it's doubtful you'll be able to write anything more optimised than the standard Math class.

Any way to make this LINQ faster?

I have a LINQ expression that's slowing down my application.
I'm drawing a control, but to do this, I need to know the max width of the text that will appear in my column.
The way I'm doing that is this:
return Items.Max(w => TextRenderer.MeasureText(
    (w.RenatlUnit == null) ? "" : w.RenatlUnit.UnitNumber,
    this.Font).Width) + 2;
However, this iterates over ~1000 Items, and takes around 20% of the CPU time that is used in my drawing method. To make it worse, there are two other columns that this must be done with, so this LINQ statement on all the items/columns takes ~75-85% of the CPU time.
TextRenderer is from System.Windows.Forms, and because I'm not using a monospaced font, MeasureText is needed to figure out the pixel width of a string.
How might I make this faster?
I don't believe that your problem lies in the speed of LINQ, it lies in the fact that you're calling MeasureText over 1000 times. I would imagine that taking your logic out of a LINQ query and putting it into an ordinary foreach loop would yield similar run times.
A better idea is probably to employ a little bit of sanity checking around what you're doing. If you go with reasonable inputs (and disregard the possibility of linebreaks), then you really only need to measure the text of strings that are, say, within 10% or so of the absolute longest (in terms of number of characters) string, then use the maximum value. In other words, there's no point in measuring the string "foo" if the largest value is "paleontology". There's no font that has widths THAT variable.
It's the MeasureText method that takes time, so the only way to increase the speed is to do less work.
You can cache the results of the call to MeasureText in a dictionary, that way you don't have to remeasure strings that already has been measured before.
You can calculate the values once and keep along with the data to display. Whenever you change the data, you recalculate the values. That way you don't have to measure the strings every time the control is drawn.
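A sketch of such a cache, assuming it lives in the control class and the font never changes (MeasureCached and _widthCache are illustrative names, not an existing API):
private readonly Dictionary<string, int> _widthCache = new Dictionary<string, int>();

private int MeasureCached(string text, Font font)
{
    int width;
    if (!_widthCache.TryGetValue(text, out width))
    {
        width = TextRenderer.MeasureText(text, font).Width;
        _widthCache[text] = width; // measured once, reused on every redraw
    }
    return width;
}
The Max call from the question then becomes Items.Max(w => MeasureCached(w.RenatlUnit == null ? "" : w.RenatlUnit.UnitNumber, this.Font)) + 2.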
Step 0: Profile. Assuming you find that most of the execution time is indeed in MeasureText, then you can try the following to reduce the number of calls:
Compute the widths of all individual characters. Since it sounds like you're rendering numbers, this should be a small set.
Estimate each string's width as numstr.Select(digitChar => digitLengthDict[digitChar]).Sum().
Take the strings with the top N lengths, and measure only those.
To avoid even most of the cost of the lookup+sum, also filter to include only those strings within 90% of the maximum string-length, as suggested.
e.g. Something like...
// somewhere else, during initialization - do this only once.
var digitLengthDict = possibleChars.ToDictionary(
    c => c,
    c => TextRenderer.MeasureText(c.ToString(), this.Font).Width);
//...
var relevantStringArray = Items
    .Where(w => w.RenatlUnit != null && w.RenatlUnit.UnitNumber != null)
    .Select(w => w.RenatlUnit.UnitNumber)
    .ToArray();
double minStrLen = 0.9 * relevantStringArray.Max(str => str.Length);
return (
    from numstr in relevantStringArray
    where numstr.Length >= minStrLen
    orderby numstr.Select(digitChar => digitLengthDict[digitChar]).Sum() descending
    select TextRenderer.MeasureText(numstr, this.Font).Width
).Take(10).Max() + 2;
If we knew more about the distribution of the strings, that would help.
Also, MeasureText isn't magic; it's quite possible you can duplicate its functionality entirely, quite easily, for a limited set of inputs. For instance, it would not surprise me to learn that the measured length of a string is precisely equal to the sum of the widths of all characters in the string, minus the kerning overhang of all character bigrams in the string. If your strings then consist of, say, 0-9, +, -, ,, ., and a terminator symbol, then a lookup table of 14 character widths and 15*15-1 kerning corrections might be enough to precisely emulate MeasureText at far greater speed, and without much complexity.
Finally, the best solution is to not solve the problem at all - perhaps you can rearchitect the application to not require such a precise number - if a simpler estimate were to suffice, you could avoid MeasureText almost completely.
Unfortunately, it doesn't look like LINQ is your problem. If you ran a for loop and did this same calculation, the amount of time would be the same order of magnitude.
Have you considered running this calculation on multiple threads? It would work nicely with Parallel LINQ.
Edit: It seems Parallel LINQ won't work because MeasureText is a GDI function and will simply be marshaled back to the UI thread (thanks @Adam Robinson for correcting me).
My guess is the issue is not the LINQ expression but calling MeasureText several thousand times.
I think you could work around the non-monospaced font issue by breaking the problem into 4 parts.
Find the digit that is widest when rendered.
Find the apartment unit number with the most digits.
Create a string consisting of the digit from #1, repeated as many times as the digit count from #2.
Pass the value created in #3 to MeasureText and use that as your basis
This won't yield a perfect solution but it will ensure that you reserve at least enough space for your item and avoids the pitfall of calling MeasureText far too many times.
If you can't figure out how to make MeasureText faster, you could precalculate the width of all the characters in your font size and style and estimate the width of a string like that, although kerning of character pairs would suggest that it would probably be only an estimate and not precise.
You might want to consider as an approximation taking the length of the longest string and then finding the width of a string of that length of 0's (or whatever the widest digit is, I can't remember). That should be a much faster method, but it would only be an approximation and probably longer than necessary.
var longest = Items.Max(w => (w.RenatlUnit == null || w.RenatlUnit.UnitNumber == null)
    ? 0
    : w.RenatlUnit.UnitNumber.Length);

if (longest == 0)
{
    return 2;
}

return TextRenderer.MeasureText(new String('0', longest), this.Font).Width + 2;
