I need to create an array of boolean values, which could be on the scale of 100,000s or even millions of entries. It also needs to be super-fast, so every millisecond per iteration counts.
At the time of beginning the loop, I will already know how many entries there are going to be in the array. The question is, will it be faster to create a bool array up front and fill in the values by index (which is random access - could be slow?), or should I create a List<bool>, keep adding entries to the list, and at the end return .ToArray()?
In other words:
Option 1
var array = new bool[size];
for (var n=0; n<size; n++)
array[n] = GetValue(n);
return array;
Option 2
var list = new List<bool>();
for (var n=0; n<size; n++)
list.Add(GetValue(n));
return list.ToArray();
Or maybe there's a 3rd way that's even faster?
Use a System.Collections.BitArray and don't worry about speed.
What you are suggesting above will only waste your memory. BitArray optimizes for both speed and size, and packs your bool values nicely (8 per byte, as the gods intended :).
Reply to below comments: If you use a BitArray, everything will be zero at first. Set only those bits for which you have GetValue == true.
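A minimal sketch of that suggestion, assuming GetValue is passed in as a Func<int, bool>; the BitArrayFill class and its method names are made up for illustration. It relies on BitArray starting out all false and on BitArray.CopyTo accepting a bool[] target.

```csharp
using System;
using System.Collections;

public static class BitArrayFill
{
    // Builds a BitArray of the given size. Bits start out false,
    // so only the true entries are written.
    public static BitArray Build(int size, Func<int, bool> getValue)
    {
        var bits = new BitArray(size);
        for (int n = 0; n < size; n++)
        {
            if (getValue(n))
                bits[n] = true;
        }
        return bits;
    }

    // Expands the packed bits into a bool[] for callers that need one.
    public static bool[] ToBoolArray(BitArray bits)
    {
        var result = new bool[bits.Length];
        bits.CopyTo(result, 0);
        return result;
    }
}
```

Calling BitArrayFill.Build(size, GetValue) keeps the data packed; ToBoolArray is only needed if the caller insists on a bool[], and it costs one extra pass over the data.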
The following code seems to show (at least to me) that of the methods discussed on this page, the simple allocation to a bool[] using a loop is quickest.
The code also seems to show me that unless GetValue(n) is computationally trivial, the overhead of allocating the bytes is not the part of the process I would be hoping to optimise.
Hope this helps in some way.
edit: added the results from the run (on my machine)
-- 187ms BitArray
-- 171ms List<bool>().ToArray
-- 168ms bool[] set only if true
-- 130ms bool[] always set
--11460ms bool[] always set with 'complex' GetValue()
class Program
{
static void Main(string[] args)
{
BitArray bitArray = new BitArray(10000000);
bool[] boolArray = new bool[10000000];
Stopwatch sw1 = new Stopwatch();
sw1.Start();
for (int i = 0; i < 10000000; i++)
{
bitArray[i] = GetMod2(i);
}
Console.WriteLine(sw1.ElapsedMilliseconds);
sw1.Restart();
var list = new List<bool>();
for (int i = 0; i < 10000000; i++)
list.Add(GetMod2(i));
var boolArray2 = list.ToArray();
Console.WriteLine(sw1.ElapsedMilliseconds);
sw1.Restart();
for (int i = 0; i < 10000000; i++)
{
bool nextVal = GetMod2(i);
if (nextVal)
bitArray[i] = true;
}
Console.WriteLine(sw1.ElapsedMilliseconds);
sw1.Restart();
for (int i = 0; i < 10000000; i++)
{
boolArray[i] = GetMod2(i);
}
Console.WriteLine(sw1.ElapsedMilliseconds);
sw1.Restart();
for (int i = 0; i < 10000000; i++)
{
boolArray[i] = GetRand(i);
}
Console.WriteLine(sw1.ElapsedMilliseconds);
Console.ReadLine();
}
static bool GetMod2(int i)
{
return (i % 2) == 1;
}
static bool GetRand(int i)
{
return new Random().Next(2) == 1;
}
}
Go with the first. The only reason it might be "slow" is if it keeps paging data from outside the processor cache.
The list will have exactly the same problem, except it will also need to perform several memory allocations and copies.
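The extra allocations and copies come from the list growing its backing array as items are added. As a sketch (again assuming GetValue is supplied as a Func<int, bool>, with hypothetical class and method names), passing the known size as the initial capacity avoids the regrowth, although ToArray() still makes one final copy that the plain array never needs:

```csharp
using System;
using System.Collections.Generic;

public static class BoolFill
{
    // Option 1: allocate once, write by index.
    public static bool[] FillArray(int size, Func<int, bool> getValue)
    {
        var array = new bool[size];
        for (int n = 0; n < size; n++)
            array[n] = getValue(n);
        return array;
    }

    // Option 2 with a preset capacity: the list never has to grow,
    // but ToArray() still copies every element once.
    public static bool[] FillListWithCapacity(int size, Func<int, bool> getValue)
    {
        var list = new List<bool>(size);
        for (int n = 0; n < size; n++)
            list.Add(getValue(n));
        return list.ToArray();
    }
}
```

Both methods return the same contents; which one is faster on a given machine is something to measure rather than assume.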
Now here's a funny old thing. Inspired by #paul, I ran these benchmark tests myself, on 10,000,000 booleans. The results (in milliseconds) are very surprising, given the discussion in the comments to this question:
BitArray: 517
BitArray + CopyTo(array): 536
List + ToArray(): 455
bool array: 483
And what a turn-up for the books! Despite the fact that the List<bool> is inserting a new record every time, while the bool[] and BitArray are initialized to false on every record and I only updated them where the value should be true, the List<bool> comes out on top, consistently, even including the .ToArray() call.
Yet another case where practical application is better than textbook knowledge, it seems... :)
I've stumbled upon this effect when debugging an application - see the repro code below.
It gives me the following results:
Data init, count: 100,000 x 10,000, 4.6133365 secs
Perf test 0 (False): 5.8289565 secs
Perf test 0 (True): 5.8485172 secs
Perf test 1 (False): 32.3222312 secs
Perf test 1 (True): 217.0089923 secs
As far as I understand, the array store operations shouldn't normally have such a drastic performance effect (32 vs 217 seconds). I wonder if anyone understands what effects are at play here?
UPD extra test added; Perf 0 shows the results as expected, Perf 1 - shows the performance anomaly.
class Program
{
static void Main(string[] args)
{
var data = InitData();
TestPerf0(data, false);
TestPerf0(data, true);
TestPerf1(data, false);
TestPerf1(data, true);
if (Debugger.IsAttached)
Console.ReadKey();
}
private static string[] InitData()
{
var watch = Stopwatch.StartNew();
var data = new string[100_000];
var maxString = 10_000;
for (int i = 0; i < data.Length; i++)
{
data[i] = new string('-', maxString);
}
watch.Stop();
Console.WriteLine($"Data init, count: {data.Length:n0} x {maxString:n0}, {watch.Elapsed.TotalSeconds} secs");
return data;
}
private static void TestPerf1(string[] vals, bool testStore)
{
var watch = Stopwatch.StartNew();
var counters = new int[char.MaxValue + 1]; // one slot per possible char value (0..65535)
int tmp = 0;
for (var j = 0; ; j++)
{
var allEmpty = true;
for (var i = 0; i < vals.Length; i++)
{
var val = vals[i];
if (j < val.Length)
{
allEmpty = false;
var ch = val[j];
var count = counters[ch];
tmp ^= count;
if (testStore)
counters[ch] = count + 1;
}
}
if (allEmpty)
break;
}
// prevent the compiler from optimizing away our computations
tmp.GetHashCode();
watch.Stop();
Console.WriteLine($"Perf test 1 ({testStore}): {watch.Elapsed.TotalSeconds} secs");
}
private static void TestPerf0(string[] vals, bool testStore)
{
var watch = Stopwatch.StartNew();
var counters = new int[65536];
int tmp = 0;
for (var i = 0; i < 1_000_000_000; i++)
{
var j = i % counters.Length;
var count = counters[j];
tmp ^= count;
if (testStore)
counters[j] = count + 1;
}
// prevent the compiler from optimizing away our computations
tmp.GetHashCode();
watch.Stop();
Console.WriteLine($"Perf test 0 ({testStore}): {watch.Elapsed.TotalSeconds} secs");
}
}
After testing your code for quite some time my best guess is, as already said in the comments, that you experience a lot of cache-misses with your current solution. The line:
if (testStore)
counters[ch] = count + 1;
might force the CPU to load a new cache line and displace the current contents. There might also be some problems with branch prediction in this scenario. This is highly hardware-dependent, and I'm not aware of a really good way to test it in a managed language (it's also quite hard in compiled languages where the hardware is fixed and well known).
After going through the disassembly, you can clearly see that you also introduce a whole bunch of new instructions, which might aggravate the aforementioned problems further.
Overall I'd advise you to rewrite the complete algorithm, as there are better places to improve performance than picking at this one little assignment. These are the optimizations I'd suggest (they also improve readability):
Invert your i and j loop. This will remove the allEmpty variable completely.
Cast ch to int with var ch = (int) val[j]; - because you ALWAYS use it as index.
Think about why this might be a problem at all. You introduce a new instruction and any instruction comes at a cost. If this is really the primary "hot-spot" of your code you can start to think about better solutions (Remember: "premature optimization is the root of all evil").
As this is a "test setting" which the name suggests, is this important at all? Just remove it.
EDIT: Why did I suggest inverting the loops? With this little rearrangement of code:
foreach (var val in vals)
{
foreach (int ch in val)
{
var count = counters[ch];
tmp ^= count;
if (testStore)
{
counters[ch] = count + 1;
}
}
}
I went from the original runtimes to far shorter ones (the timing output itself is missing here).
Do you still think it's not worth a try? I saved some orders of magnitude here and nearly eliminated the effect of the if (to be clear - all optimizations are disabled in the settings). If there are special reasons not to do this you should tell us more about the context in which this code will be used.
EDIT2: For the in-depth answer. My best explanation for why this problem occurs is because you cross-reference your cache-lines. In the lines:
for (var i = 0; i < vals.Length; i++)
{
var val = vals[i];
you load a really massive dataset. This is by far bigger than a cache-line itself. So it will most likely need to be loaded every iteration fresh from the memory into a new cache-line (displacing the old content). This is also known as "cache-thrashing" if I remember correctly. Thanks to #mjwills for pointing this out in his comment.
In my suggested solution, on the other hand, the content of a cache-line can stay alive as long as the inner loop did not exceed its boundaries (which happens a lot less if you use this direction of memory access).
This is the closest explanation I have for why my code runs that much faster, and it also supports the assumption that you have serious caching problems with your code.
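To compare the two traversal orders side by side, here is a simplified, self-contained sketch. It drops the tmp and testStore bookkeeping from the original test and just counts characters; the LoopOrder class and method names are hypothetical. Both methods produce the same counters, and only the memory access pattern differs.

```csharp
using System;

public static class LoopOrder
{
    // Original order: outer loop over the character position j,
    // inner loop over all strings. Each step touches a different string.
    public static int[] CountColumnFirst(string[] vals)
    {
        var counters = new int[char.MaxValue + 1];
        for (int j = 0; ; j++)
        {
            bool allEmpty = true;
            for (int i = 0; i < vals.Length; i++)
            {
                string val = vals[i];
                if (j < val.Length)
                {
                    allEmpty = false;
                    counters[val[j]]++;
                }
            }
            if (allEmpty) break;
        }
        return counters;
    }

    // Suggested order: finish one string before moving to the next,
    // so consecutive reads come from adjacent memory.
    public static int[] CountRowFirst(string[] vals)
    {
        var counters = new int[char.MaxValue + 1];
        foreach (string val in vals)
        {
            foreach (char ch in val)
                counters[ch]++;
        }
        return counters;
    }
}
```

Wrapping each call in a Stopwatch on the 100,000 x 10,000 data set from the question is one way to check how much of the difference comes from traversal order alone on a given machine.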
I have a loop that is too slow in C#. I want to know if there is a faster way to process these arrays. I'm currently working in .NET 2.0, but I'm not opposed to upgrading this project. This is part of a theoretical image processing concept involving gray levels.
Pixel count (PixCnt = 21144402)
g_len = 4625
list_1d - one-dimensional array of the image, with an upper bound of the above pixel count.
pg - gray level intensity holder.
This function creates an index of those values, hence pgidx.
int[] pgidx = new int[PixCnt];
sw = new Stopwatch();
sw.Start();
for (i = 0; i < PixCnt; i++)
{
j = 0;
pgidx[i] = 0;
// check the bound first so pg[j] is never read past the end
while (j < g_len && list_1d[i] != pg[j]) j++;
if (j < g_len)
pgidx[i] = j;
}
sw.Stop();
Debug.WriteLine("PixCnt Loop took " + sw.ElapsedMilliseconds + " ms");
I think using a dictionary to store what's in the pg array will speed it up. g_len is 4625 elements, so you will likely average around 2312 iterations of the inner while loop. Replacing that with a single hashed lookup in a dictionary should be faster. Since the outer loop executes 21 million times, speeding up the body of that loop should reap big rewards. I'm guessing the code below will be 100 to 1000 times faster.
var pgDict = new Dictionary<int,int>(g_len);
for (int i = 0; i < g_len; i++) pgDict.Add(pg[i], i);
int[] pgidx = new int[PixCnt];
int value = 0;
for (int i = 0; i < PixCnt; i++) {
if (pgDict.TryGetValue(list_1d[i], out value)) pgidx[i] = value;
}
Note that setting pgidx[i] to zero when a match isn't found is not necessary, because all elements of the array are already initialized to zero when the array is created.
If there is the possibility for a value in pg to appear more than once, you would want to check first to see if that key has already been added, and skip adding it to the dictionary if it has. That would mimic your current behavior of finding the first match. To do that replace the line where the dictionary is built with this:
for (int i = 0; i < g_len; i++) if (!pgDict.ContainsKey(pg[i])) pgDict.Add(pg[i], i);
If the range of the pixel values in pg allows it (say 16 bpp = 65536 entries), you can create an auxiliary array that maps all possible gray levels to the index value in pg. Filling this array is done with a single pass over pg (after initializing to all zeroes).
Then convert list_1d to pgidx with straight table lookups.
If the table is too big (bigger than the image), then do as #hatchet answered.
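A sketch of the table-lookup idea, under the assumption that list_1d and pg are int arrays whose values all lie in [0, maxValue]; the GrayIndex class, the BuildIndex method, and the maxValue parameter are illustrative names, not part of the original code. Walking pg backwards makes the first occurrence win for duplicate gray levels, which mirrors the original loop, and unmatched pixels map to 0 as before.

```csharp
using System;

public static class GrayIndex
{
    // Maps each pixel value to the index of its first occurrence in pg.
    // Assumes every value in pg and list1d lies in [0, maxValue].
    public static int[] BuildIndex(int[] list1d, int[] pg, int maxValue)
    {
        // One slot per possible gray level, zero-initialized by the runtime.
        var lookup = new int[maxValue + 1];

        // Fill back to front so the lowest index wins for duplicates.
        for (int j = pg.Length - 1; j >= 0; j--)
            lookup[pg[j]] = j;

        // One array read per pixel instead of a search through pg.
        var pgidx = new int[list1d.Length];
        for (int i = 0; i < list1d.Length; i++)
            pgidx[i] = lookup[list1d[i]];
        return pgidx;
    }
}
```

For 16-bit data the call would be GrayIndex.BuildIndex(list_1d, pg, 65535), which needs a 65,536-entry table regardless of the image size.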
EDIT: so it looks like this is normal behavior, so can anyone just recommend a faster way to do these numerous intersections?
so my problem is this. I have 8000 lists (strings in each list). For each list (ranging from size 50 to 400), I'm comparing it to every other list and performing a calculation based on the intersection number. So I'll do
list1(intersect)list1= number
list1(intersect)list2= number
list1(intersect)list888= number
And I do this for every list. Previously, I had HashList and my code was essentially this (well, I was actually searching through properties of an object, so I had to modify the code a bit, but it's basically this):
I have my two versions below, but if anyone knows anything faster, please let me know!
Loop through AllLists, getting each list, starting with list1, and then do this:
foreach (List list in AllLists)
{
if (list1_length < list_length) // just a check so I'm looping through the smaller list
{
foreach (string word in list1)
{
if (block.generator_list.Contains(word))
{
//simple integer count
}
}
}
// a little more code, but the same, but looping through the other list if it's smaller/bigger
Then I made the lists into regular lists and applied Sort(), which changed my code to:
foreach (List list in AllLists)
{
if (list1_length < list_length) // just a check so I'm looping through the smaller list
{
for (int i = 0; i < list1_length; i++)
{
var test = list.BinarySearch(list1[i]);
if (test > -1)
{
//simple integer count
}
}
}
The first version takes about 6 seconds, the other one takes more than 20 (I just stop there cuz otherwise it would take more than a minute!!!) (and this is for a smallish subset of the data)
I'm sure there's a drastic mistake somewhere, but I can't find it.
Well I have tried three distinct methods for achieving this (assuming I understood the problem correctly). Please note I have used HashSet<int> in order to more easily generate random input.
setting up:
List<HashSet<int>> allSets = new List<HashSet<int>>();
Random rand = new Random();
for(int i = 0; i < 8000; ++i) {
HashSet<int> ints = new HashSet<int>();
int size = rand.Next(50, 400); // pick the target size once, not on every loop test
for(int j = 0; j < size; ++j) {
ints.Add(rand.Next(0, 1000));
}
allSets.Add(ints);
}
the three methods I checked (code is what runs in the inner loop):
the loop:
note that you are getting duplicated results in your code (intersecting set A with set B and later intersecting set B with set A).
It won't affect your performance thanks to the list length check you are doing. But iterating this way is clearer.
for(int i = 0; i < allSets.Count; ++i) {
for(int j = i + 1; j < allSets.Count; ++j) {
}
}
first method:
used IEnumerable.Intersect() to get the intersection with the other list and checked IEnumerable.Count() to get the size of the intersection.
var intersect = allSets[i].Intersect(allSets[j]);
count = intersect.Count();
this was the slowest one averaging 177s
second method:
cloned the smaller set of the two sets I was intersecting, then used ISet.IntersectWith() and checked the resulting sets Count.
HashSet<int> intersect;
HashSet<int> intersectWith;
if(allSets[i].Count < allSets[j].Count) {
intersect = new HashSet<int>(allSets[i]);
intersectWith = allSets[j];
} else {
intersect = new HashSet<int>(allSets[j]);
intersectWith = allSets[i];
}
intersect.IntersectWith(intersectWith);
count = intersect.Count;
this one was slightly faster, averaging 154s
third method:
did something very similar to what you did: iterated over the shorter set and checked ISet.Contains on the longer set.
for(int i = 0; i < allSets.Count; ++i) {
for(int j = i + 1; j < allSets.Count; ++j) {
count = 0;
if(allSets[i].Count < allSets[j].Count) {
loopingSet = allSets[i];
containsSet = allSets[j];
} else {
loopingSet = allSets[j];
containsSet = allSets[i];
}
foreach(int k in loopingSet) {
if(containsSet.Contains(k)) {
++count;
}
}
}
}
this method was by far the fastest (as expected), averaging 66s
conclusion
the method you're using is the fastest of these three. I certainly can't think of a faster single threaded way to do this. Perhaps there is a better concurrent solution.
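On the concurrent idea: one possible shape, sketched here without any measurements, is to run the outer loop with Parallel.For and keep a per-thread running total so the inner work needs no locking. The PairwiseIntersections class and its method names are hypothetical, and summing the counts stands in for whatever calculation is actually done with each intersection size. The sets are only read, never modified, while the loop runs.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class PairwiseIntersections
{
    // Third method from above: iterate the smaller set, probe the larger one.
    public static int CountCommon(HashSet<int> a, HashSet<int> b)
    {
        HashSet<int> loopingSet = a.Count < b.Count ? a : b;
        HashSet<int> containsSet = a.Count < b.Count ? b : a;
        int count = 0;
        foreach (int k in loopingSet)
        {
            if (containsSet.Contains(k))
                ++count;
        }
        return count;
    }

    // Sums the intersection sizes over all pairs (i, j) with i < j.
    // Each worker accumulates into its own local total, and the totals
    // are merged once per worker at the end.
    public static long SumAllPairs(List<HashSet<int>> allSets)
    {
        long total = 0;
        Parallel.For<long>(0, allSets.Count,
            () => 0L,
            (i, state, local) =>
            {
                for (int j = i + 1; j < allSets.Count; ++j)
                    local += CountCommon(allSets[i], allSets[j]);
                return local;
            },
            local => Interlocked.Add(ref total, local));
        return total;
    }
}
```

Whether this beats the single-threaded loop depends on core count and set sizes, so it would need to be timed against the 66s figure on the same data.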
I've found that one of the most important considerations in iterating/searching any kind of collection is to choose the collection type very carefully. Iterating through an ordinary collection will not be optimal for your purposes. Try using something like:
System.Collections.Generic.HashSet<T>
Using the Contains() method while iterating over the shorter list of two (as you mentioned you're already doing) should give close to O(1) performance, the same as key lookups in the generic Dictionary type.
I've found two different methods to get a Max value from an array, but I'm not really fond of parallel programming, so I really don't understand them.
I was wondering, do these methods do the same thing or am I missing something?
I really don't have much information about them. Not even comments...
The first method:
int[] vec = ... (I guess the content doesn't matter)
static int naiveMax()
{
int max = vec[0];
object obj = new object();
Parallel.For(0, vec.Length, i =>
{
lock (obj) {
if (vec[i] > max) max = vec[i];
}
});
return max;
}
And the second one:
static int Max()
{
int max = vec[0];
object obj = new object();
Parallel.For(0, vec.Length, //could be Parallel.For<int>
() => vec[0],
(i, loopState, partial) =>
{
if(vec[i]>partial) partial = vec[i];
return partial;
},
partial => {
lock (obj) {
if( partial > max) max = partial;
}
});
return max;
}
Do these do the same thing or something different, and if so, what? Thanks ;)
Both find the maximum value in an array of integers. In an attempt to find the maximum value faster, they do it "in parallel" using the Parallel.For Method. Both methods fail at this, though.
To see this, we first need a sufficiently large array of integers. For small arrays, parallel processing doesn't give us a speed-up anyway.
int[] values = new int[100000000];
Random random = new Random();
for (int i = 0; i < values.Length; i++)
{
values[i] = random.Next();
}
Now we can run the two methods and see how long they take. Using an appropriate performance measurement setup (Stopwatch, array of 100,000,000 integers, 100 iterations, Release build, no debugger attached, JIT warm-up) I get the following results on my machine:
naiveMax 00:06:03.3737078
Max 00:00:15.2453303
So Max is much much better than naiveMax (6 minutes! cough).
But how does it compare to, say, PLINQ?
static int MaxPlinq(int[] values)
{
return values.AsParallel().Max();
}
MaxPlinq 00:00:11.2335842
Not bad, saved a few seconds. Now, what about a plain, old, sequential for loop for comparison?
static int Simple(int[] values)
{
int result = values[0];
for (int i = 0; i < values.Length; i++)
{
if (result < values[i]) result = values[i];
}
return result;
}
Simple 00:00:05.7837002
I think we have a winner.
Lesson learned: Parallel.For is not pixie dust that you can sprinkle over your code to make it magically run faster. If performance matters, use the right tools and measure, measure, measure, ...
They appear to do the same thing, however they are very inefficient. The point of parallelization is to improve the speed of code that can be executed independently. Due to race conditions, discovering the maximum (as implemented here) requires an atomic semaphore/lock on the actual logic... Which means you're spinning up many threads and related resources simply to do the code sequentially anyway... Defeating the purpose of parallelization entirely.
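One common way to cut the per-element overhead in loops like these is to hand each worker a contiguous range of indices instead of a single index. The sketch below does that with Partitioner.Create and Parallel.ForEach; the ChunkedMax class name is made up, the code has not been benchmarked here, and it assumes a non-empty array. Each worker scans its range with a plain for loop and the lock is taken only once per worker.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class ChunkedMax
{
    // Finds the maximum by splitting the index range into chunks.
    // Assumes values contains at least one element.
    public static int Max(int[] values)
    {
        int max = values[0];
        object gate = new object();
        Parallel.ForEach<Tuple<int, int>, int>(
            Partitioner.Create(0, values.Length),   // ranges [Item1, Item2)
            () => values[0],                         // per-worker starting value
            (range, state, partial) =>
            {
                // Sequential scan of this worker's chunk, no locking.
                for (int i = range.Item1; i < range.Item2; i++)
                {
                    if (values[i] > partial) partial = values[i];
                }
                return partial;
            },
            partial =>
            {
                // Merge each worker's result into the shared maximum.
                lock (gate)
                {
                    if (partial > max) max = partial;
                }
            });
        return max;
    }
}
```

Since a maximum scan is mostly limited by memory bandwidth, this may still not beat the sequential loop, so the same measuring advice applies.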
I have a List<T> that I want to be able to copy to an array backwards, meaning start from List.Count and copy maybe 5 items starting at the end of the list and working its way backwards. I could do this with a simple reverse for loop; however there is probably a faster/more efficient way of doing this so I thought I should ask. Can I use Array.Copy somehow?
Originally I was using a Queue as that pops it off in the correct order I need, but I now need to pop off multiple items at once into an array and I thought a list would be faster.
Looks like Array.Reverse has native code for reversing an array which sometimes doesn't apply and would fall back to using a simple for loop. In my testing Array.Reverse is very slightly faster than a simple for loop. In this test of reversing a 1,000,000 element array 1,000 times, Array.Reverse is about 600ms whereas a for-loop is about 800ms.
I wouldn't recommend performance as a reason to use Array.Reverse though. It's a very minor difference which you'll lose the minute you load it into a List which will loop through the array again. Regardless, you shouldn't worry about performance until you've profiled your app and identified the performance bottlenecks.
public static void Test()
{
var a = Enumerable.Range(0, 1000000).ToArray();
var stopwatch = Stopwatch.StartNew();
for(int i=0; i<1000; i++)
{
Array.Reverse(a);
}
stopwatch.Stop();
Console.WriteLine("Elapsed Array.Reverse: " + stopwatch.ElapsedMilliseconds);
stopwatch = Stopwatch.StartNew();
for (int i = 0; i < 1000; i++)
{
MyReverse(a);
}
stopwatch.Stop();
Console.WriteLine("Elapsed MyReverse: " + stopwatch.ElapsedMilliseconds);
}
private static void MyReverse(int[] a)
{
int j = a.Length - 1;
for(int i=0; i<j; i++, j--)
{
int z = a[i];
a[i] = a[j];
a[j] = z;
}
}
It is not possible to do this faster than a simple for loop.
You can accomplish it any number of ways, but the fastest way is to get the elements in exactly the manner you are. You can use Array.Reverse, Array.Copy, etc., or you can use LINQ and extension methods, and both are valid alternatives, but they shouldn't be any faster.
In one of your comments:
Currently we are pulling out one result and committing it to a database one at a time
There is a big difference between using a for loop to iterate backwards over a List<T> and committing records to a database one at a time. The former is fine; nobody's endorsing the latter.
Why not just iterate first--to populate an array--and then send that array into the database, all populated?
var myArray = new T[numItemsYouWantToSend];
int arrayIndex = 0;
for (int i = myList.Count - 1; arrayIndex < myArray.Length; --i) {
if (i < 0) break;
myArray[arrayIndex++] = myList[i];
}
UpdateDatabase(myArray);
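For the Array.Copy part of the question, a sketch of one alternative to the reverse for loop: List<T>.CopyTo can block-copy the tail of the list, and Array.Reverse then flips it into back-to-front order. The TailCopy class and TakeLastReversed method are hypothetical names, and count is assumed to be non-negative. Unlike the loop above, this returns a shorter array when the list holds fewer than count items instead of leaving trailing default values.

```csharp
using System;
using System.Collections.Generic;

public static class TailCopy
{
    // Returns up to `count` items from the end of the list, last item first.
    public static T[] TakeLastReversed<T>(List<T> list, int count)
    {
        int n = Math.Min(count, list.Count);
        var result = new T[n];
        list.CopyTo(list.Count - n, result, 0, n); // block copy of the tail
        Array.Reverse(result);                      // flip into back-to-front order
        return result;
    }
}
```

For five items this is unlikely to be measurably faster than the loop; it is mainly a matter of which version reads better.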