LINQ: Why is "Enumerable = Enumerable.Skip(N)" slow?

I am having an issue with the performance of a LINQ query, so I created a small simplified example to demonstrate it below. The code takes a random list of small integers and partitions it into several smaller lists, each of which totals 10 or less.
The problem is that (as I've written this) the running time grows dramatically with N, even though this should only be an O(N) problem. With N = 2500, the code takes over 10 seconds to run on my PC.
I would greatly appreciate it if someone could explain what is going on. Thanks, Mark.
int N = 250;
Random r = new Random();
var work = Enumerable.Range(1, N).Select(x => r.Next(0, 6)).ToList();
var chunks = new List<List<int>>();
// work.Dump("All the work."); // LINQPad Print

var workEnumerable = work.AsEnumerable();
Stopwatch sw = Stopwatch.StartNew();
while (workEnumerable.Any()) // or .FirstOrDefault() != null
{
    int soFar = 0;
    var chunk = workEnumerable.TakeWhile(x =>
    {
        soFar += x;
        return (soFar <= 10);
    }).ToList();

    chunks.Add(chunk);                                  // Commented out makes no difference.
    workEnumerable = workEnumerable.Skip(chunk.Count);  // <== SUSPECT
}
sw.Stop();
// chunks.Dump("Work Chunks."); // LINQPad Print
sw.Elapsed.Dump("Time elapsed.");

What Skip() does is create a new IEnumerable that loops over its source and only begins yielding results after the first N elements. You keep chaining these one after another, so every time you call Any(), it has to loop over all of the previously skipped elements again.
Generally speaking, it's a bad idea to build up very long operator chains in LINQ and enumerate them repeatedly. Also, since LINQ is a querying API, methods like Skip() are a poor choice when what you're really trying to do amounts to modifying a data structure.
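For comparison, here is a minimal single-pass sketch of the same chunking (same work list and running-total limit of 10 as in the question) that never re-enumerates the source:

var chunks = new List<List<int>>();
var current = new List<int>();
int soFar = 0;
foreach (int x in work)
{
    // Close the current chunk once adding x would push the running total past 10.
    if (soFar + x > 10 && current.Count > 0)
    {
        chunks.Add(current);
        current = new List<int>();
        soFar = 0;
    }
    current.Add(x);
    soFar += x;
}
if (current.Count > 0)
    chunks.Add(current);   // flush the last partial chunk

Because each element is visited exactly once, this stays O(N) regardless of how many chunks are produced.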

You effectively keep chaining Skip() onto the same enumerable. In a list of 250, the last chunk will be created from a lazy enumerable with ~25 'Skip' enumerator classes on the front.
You would already find things become a lot faster if you did
workEnumerable = workEnumerable.Skip(chunk.Count).ToList();
However, I think the whole approach could be altered.
How about using standard LINQ to achieve the same:
See it live on http://ideone.com/JIzpml
using System;
using System.Collections.Generic;
using System.Linq;

public class Program
{
    private static readonly Random r = new Random();

    public static void Main(string[] args)
    {
        int N = 250;
        var work = Enumerable.Range(1, N).Select(x => r.Next(0, 6)).ToList();
        var chunks = work.Select((o, i) => new { Index = i, Obj = o })
                         .GroupBy(e => e.Index / 10)
                         .Select(group => group.Select(e => e.Obj).ToList())
                         .ToList();
        foreach (var chunk in chunks)
            Console.WriteLine("Chunk: {0}", string.Join(", ", chunk.Select(i => i.ToString()).ToArray()));
    }
}

The Skip() method and others like it basically create a placeholder object, implementing IEnumerable, that references its parent enumerable and contains the logic to perform the skipping. Skips in loops are therefore non-performant, because instead of actually discarding elements, as you might expect, they add a new layer of lazily executed logic that runs whenever you ask for the first element after all the ones you've skipped.
You can get around this by calling ToList() or ToArray(). This forces "eager" evaluation of the Skip() method, and really does get rid of the elements you're skipping from the new collection you will be enumerating. That comes at an increased memory cost, and requires all of the elements to be known (so if you're running this on an IEnumerable that represents an infinite series, good luck).
The second option is to not use Linq, and instead use the IEnumerable implementation itself, to get and control an IEnumerator. Then instead of Skip(), simply call MoveNext() the necessary number of times.
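A rough sketch of that second option, applied to the chunking loop from the question: the enumerator is advanced exactly once per element, so nothing is ever re-skipped.

var chunks = new List<List<int>>();
using (var e = work.GetEnumerator())
{
    bool more = e.MoveNext();
    while (more)
    {
        int soFar = 0;
        var chunk = new List<int>();
        // Consume elements while the running total stays within the limit.
        // (With the 0-5 values from the question a chunk is never empty; a single
        // element larger than 10 would need extra handling.)
        while (more && soFar + e.Current <= 10)
        {
            soFar += e.Current;
            chunk.Add(e.Current);
            more = e.MoveNext();
        }
        chunks.Add(chunk);
    }
}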


Why does LINQ orderby consume more memory?

I want to know why OrderBy consumes more memory than simply copying the list and sorting it.
void printMemoryUsage()
{
    long memory = GC.GetTotalMemory(true);
    long mb = 1024 * 1024;
    Console.WriteLine("memory: " + memory / mb + " MB");
}

var r = new Random();
var list = Enumerable.Range(0, 20 * 1024 * 1024).OrderBy(x => r.Next()).ToList();
printMemoryUsage();

var lsitCopy = list.OrderBy(x => x);
foreach (var v in lsitCopy)
{
    printMemoryUsage();
    break;
}
Console.ReadKey();
The result I got is:
memory: 128 MB
memory: 288 MB
But copying the list and sorting it consumes less memory.
void printMemoryUsage()
{
    long memory = GC.GetTotalMemory(true);
    long mb = 1024 * 1024;
    Console.WriteLine("memory: " + memory / mb + " MB");
}

var r = new Random();
var list = Enumerable.Range(0, 20 * 1024 * 1024).OrderBy(x => r.Next()).ToList();
printMemoryUsage();

var lsitCopy = list.ToList();
printMemoryUsage();
lsitCopy.Sort();
printMemoryUsage();
Console.ReadKey();
Results are:
memory: 128 MB
memory: 208 MB
memory: 208 MB
More testing shows that the extra memory consumed by OrderBy is twice the list size.
It's a bit unsurprising when you dive into how the two approaches are implemented internally. Take a look at the Reference Source for .NET.
In your second approach where you call the Sort() method on the list, the internal array in the List object is passed to the TrySZSort method that is written in native code outside of C#, which means no work for the garbage collector.
private static extern bool TrySZSort(Array keys, Array items, int left, int right);
Now, in your first approach you're using LINQ to sort the enumerable. What really happens when you call OrderBy() is that an OrderedEnumerable<T> object is constructed. Calling OrderBy doesn't sort the list by itself; the sort only happens when the result is enumerated, i.e. when its GetEnumerator method is called. GetEnumerator is called implicitly behind the scenes when you call ToList or when you enumerate with a construct like foreach.
You're actually sorting the list twice since you're enumerating the list once on this line:
var list = Enumerable.Range(0, 20*1024*1024).OrderBy(x => r.Next()).ToList();
and again when you enumerate via foreach on this line:
var lsitCopy = list.OrderBy(x => x);
foreach(var v in lsitCopy)
Since these LINQ methods are not using native code, they rely on the garbage collector to pick up after them. Each of the classes is also creating a bunch of objects (e.g. OrderedEnumerable creates a Buffer<TElement> with another copy of the array). All of these objects consume RAM.
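A small sketch of that deferred behaviour: nothing is sorted (and no buffer is allocated) until the OrderedEnumerable is actually enumerated.

var numbers = new List<int> { 3, 1, 2 };
var ordered = numbers.OrderBy(x => x);   // nothing sorted yet, no buffer allocated

numbers.Add(0);                          // mutate the source after building the query

foreach (var n in ordered)               // the sort happens here, over the current contents
    Console.Write(n + " ");              // prints: 0 1 2 3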
I had to do some research on this one, and found some interesting information. The default List.Sort performs an in-place sort (no second copy), and does so via a call to Array.Sort, which ultimately calls through to TrySZSort, a heavily optimized, unmanaged native CLR function that selects the specific sort algorithm based on the input type. In most cases it performs what's called an introspective sort, which combines the best aspects of quicksort, heapsort, and insertion sort for maximum efficiency. This is done in unmanaged code, meaning it's generally faster and more efficient.
If you're interested in going down the rabbit hole, the Array Sort source is here and the TrySZSort implementation is here. Ultimately though, the use of Unmanaged code means the garbage collector doesn't get involved, and thus less memory is used.
The implementation used by OrderBy is a standard quicksort, and the OrderedEnumerable actually creates a second copy of the keys used in the sort (in your case the key is the value itself, though it wouldn't have to be if you were sorting a larger class by one or two of its properties), leading to exactly what you observed: additional usage equal to the size of the collection for the second copy. Assuming you then materialize that into a List or Array (rather than keeping the OrderedEnumerable) and wait for or force a garbage collection, you should recover most of that memory. The Enumerable.OrderBy method source is here if you want to dig into it.
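A minimal sketch of that suggestion: materialize the ordered result so the OrderedEnumerable and its internal buffers become garbage, then force a collection (printMemoryUsage is the helper from the question).

// The buffer and key map only live for the duration of the ToList() call.
List<int> sorted = list.OrderBy(x => x).ToList();
GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();
printMemoryUsage();   // should be roughly the original figure plus one list copy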
The source of extra memory used can be found in the implementation of OrderedEnumerable which is created on the line
IOrderedEnumerable<int> lsitCopy = list.OrderBy(x => x);
OrderedEnumerable is a generic implementation that sorts by any criteria you provide, which is distinctly different from the implementation of List.Sort, which sorts elements only by value. If you follow the code of OrderedEnumerable you will find it creates a buffer into which your values are copied, accounting for an extra 80 MB (4 * 20 * 1024 * 1024) of memory. The additional 80 MB (another 4 * 20 * 1024 * 1024) is associated with the structures created to sort the list by its keys.
Another thing to note is that OrderBy(x => x) not only uses more memory, it also uses a lot more processing power: in my testing, calling Sort is about 6 times faster than using OrderBy(x => x).
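A rough harness for that comparison (a sketch only; the absolute numbers will vary by machine and data):

var r = new Random();
var data = Enumerable.Range(0, 20 * 1024 * 1024).Select(x => r.Next()).ToList();

var sw = Stopwatch.StartNew();
var copy = data.ToList();
copy.Sort();                                  // in-place, native-backed sort
sw.Stop();
Console.WriteLine("List.Sort: " + sw.Elapsed);

sw.Restart();
var ordered = data.OrderBy(x => x).ToList();  // builds an OrderedEnumerable plus buffer and key map
sw.Stop();
Console.WriteLine("OrderBy:   " + sw.Elapsed);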
The List.Sort() method is backed by a heavily optimised native implementation for sorting elements by their value, whereas the LINQ OrderBy method is far more versatile and consequently less optimised for simply sorting a list by value:
IOrderedEnumerable<TSource> OrderBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
P.S. I would suggest you stop using var in place of the actual variable types, as it hides valuable information about how the code actually functions from the reader. I recommend developers only use the var keyword for anonymous types.
Connor's answer gives a clue to what is happening here, and the implementation of OrderedEnumerable makes it clearer. The GetEnumerator of OrderedEnumerable is:
public IEnumerator<TElement> GetEnumerator() {
    Buffer<TElement> buffer = new Buffer<TElement>(source);
    if (buffer.count > 0) {
        EnumerableSorter<TElement> sorter = GetEnumerableSorter(null);
        int[] map = sorter.Sort(buffer.items, buffer.count);
        sorter = null;
        for (int i = 0; i < buffer.count; i++) yield return buffer.items[map[i]];
    }
}
Buffer is another copy of the original data, and map keeps the mapping of the sort order. So, if the code is:
// memory_foot_print_1
var sortedList = originalList.OrderBy(v => v);
foreach (var v in sortedList)
{
    // memory_foot_print_2
    ...
}
Here, memory_foot_print_2 will be equal to memory_foot_print_1 + size_of(originalList) + size_of(new int[count_of(originalList)]) (assuming no GC).
Thus, if originalList is a list of ints of size 80 MB, memory_foot_print_2 - memory_foot_print_1 = 80 + 80 = 160 MB. And if originalList is a list of longs of size 80 MB, memory_foot_print_2 - memory_foot_print_1 = 80 + 40 (size of the map) = 120 MB (assuming ints are 4 bytes and longs are 8 bytes), which is what I was observing.
This leads to another question: whether it makes sense to use OrderBy for larger objects.
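If the elements are larger objects, one alternative worth measuring is an in-place List&lt;T&gt;.Sort with a Comparison&lt;T&gt; on the key, which avoids the extra buffer and key array that OrderBy allocates. A minimal sketch (the LogEntry type here is hypothetical):

class LogEntry
{
    public DateTime Timestamp;
    public string Message;
}

var logs = new List<LogEntry>();   // assume this is populated

// In-place sort by key: no second copy of the list, no separate key array.
logs.Sort((a, b) => a.Timestamp.CompareTo(b.Timestamp));

// Versus the LINQ form, which buffers the source and the extracted keys:
var orderedLogs = logs.OrderBy(l => l.Timestamp).ToList();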

Simple List<string> vs IEnumerable<string> performance issues

I've tested List<string> vs IEnumerable<string> iteration with for and foreach loops. Is it possible that the List is much faster?
These are 2 of the few links I could find that publicly state that performance is better when iterating an IEnumerable rather than a List:
Link1
Link2
My test was loading 10K lines from a text file that holds a list of URLs.
I first loaded it into a List, then assigned the List to an IEnumerable:
List<string> StrByLst = ...method to load records from the file .
IEnumerable StrsByIE = StrByLst;
So each has 10k items of type string.
Looping over each collection 100 times, meaning 100K iterations, resulted in:
List<string> is faster than the IEnumerable<string> by an amazing 50x.
Is that predictable?
Update:
This is the code that is doing the tests:
string WorkDirtPath = HostingEnvironment.ApplicationPhysicalPath;
string fileName = "tst.txt";
string fileToLoad = Path.Combine(WorkDirtPath, fileName);

List<string> ListfromStream = new List<string>();
ListfromStream = PopulateListStrwithAnyFile(fileToLoad);
IEnumerable<string> IEnumFromStream = ListfromStream;

string trslt = "";
Stopwatch SwFr = new Stopwatch();
Stopwatch SwFe = new Stopwatch();
string resultFrLst = "", resultFrIEnumrable, resultFe = "", Container = "";

SwFr.Start();
for (int itr = 0; itr < 100; itr++)
{
    for (int i = 0; i < ListfromStream.Count(); i++)
    {
        Container = ListfromStream.ElementAt(i);
    }
    // the Stop() was here; I was making changes, so my mistake.
}
SwFr.Stop();
resultFrLst = SwFr.Elapsed.ToString();

// forgot to do this reset, though it is still faster (x56??)
SwFr.Reset();

SwFr.Start();
for (int itr = 0; itr < 100; itr++)
{
    for (int i = 0; i < IEnumFromStream.Count(); i++)
    {
        Container = IEnumFromStream.ElementAt(i);
    }
}
SwFr.Stop();
resultFrIEnumrable = SwFr.Elapsed.ToString();
Update (final):
Taking the count out of the for loops:
int counter = ...count for both IEnumerable & List
and then passing counter (an int) as the total item count, as suggested by @ScottChamberlain, and re-checking that everything is in place, the results are now only about 5% apart, with IEnumerable slightly faster.
So that concludes it: choose by scenario / use case... there is no real performance difference at all.
You are doing something wrong.
The times that you get should be very close to each other, because you are running essentially the same code.
IEnumerable is just an interface, which List implements, so when you call some method on the IEnumerable reference it ends up calling the corresponding method of List.
There is no code implemented in the IEnumerable - this is what interfaces are - they only specify what functionality a class should have, but say nothing about how it's implemented.
You have a few problems with your test. One is the IEnumFromStream.Count() inside the for loop: every time it wants that value it must enumerate the entire list, and the value is not cached between loops. Move that call outside of the for loop, save the result in an int, and use that value in the for loop; you will see a shorter time for your IEnumerable.
The IEnumFromStream.ElementAt(i) call behaves similarly to Count(): it must iterate over the list up to i every time (e.g. the first time it visits 0, the second time 0, 1, the third 0, 1, 2, and so on), whereas List can jump directly to the index it needs. You should be working with the IEnumerator returned from GetEnumerator() instead.
IEnumerables and for loops don't mix well. Use the correct tool for the job: either call GetEnumerator() and work with that, or use a foreach loop.
Now, I know a lot of you may be saying "But it is an interface, it will just map the calls and it should make no difference", but there is a key point: IEnumerable<T> does not have a Count() or ElementAt() method! Those methods are extension methods added by LINQ, and the LINQ code does not know the underlying collection is a List, so it does what it knows the underlying object can do, which is iterate over it every time the method is called.
IEnumerable using IEnumerator
using (var enu = IEnumFromStream.GetEnumerator())
{
    // You have to call MoveNext() once before reading Current the first time;
    // this is done so you can have a nice clean while loop like this.
    while (enu.MoveNext())
    {
        Container = enu.Current;
    }
}
The above code is basically the same thing as
foreach (var enu in IEnumFromStream)
{
    Container = enu;
}
The important thing to remember is that IEnumerables do not have a length; in fact, they can be infinitely long, and there is a whole field of computer science around detecting an infinitely long IEnumerable.
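For illustration, a trivial sketch of an IEnumerable with no end; code that only knows it has an IEnumerable<int> cannot assume a count exists.

static IEnumerable<int> Naturals()
{
    int i = 0;
    while (true)
        yield return i++;   // never terminates; Count() on this would loop forever
}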
Based on the code you posted I think the problem is with your use of the Stopwatch class.
You declare two of these, SwFr and SwFe, but only use the former. Because of this, the last call to SwFr.Elapsed will get the total amount of time across both sets of for loops.
If you are wanting to reuse that object in this way, place a call to SwFr.Reset() right after resultFrLst = SwFr.Elapsed.ToString();.
Alternatively, you could use SwFe when running the second test.

Using PLINQ causing OutOfMemoryException but not with LINQ?

I have come across a scenario where LINQ works fine but PLINQ causes an "OutOfMemoryException". Following is the sample code:
static void Main(string[] args)
{
    Stopwatch timer = new Stopwatch();
    var guidList = new List<Guid>();
    for (int i = 0; i < 10000000; i++)
    {
        guidList.Add(Guid.NewGuid());
    }

    timer.Start();
    // var groupedList = guidList.GroupBy(f => f).Where(g => g.Count() > 1);
    var groupedList = guidList.AsParallel().GroupBy(f => f).Where(g => g.Count() > 1);
    timer.Stop();

    Console.WriteLine(string.Format("Took {0} ms time with result: {1} duplications", timer.ElapsedMilliseconds, groupedList.Count()));
    Console.ReadKey();
}
It throws the inner exception "Exception of type 'System.OutOfMemoryException' was thrown". What could be the issue? What are the guidelines for using PLINQ in this type of scenario? Thanks in advance.
You are probably close to running out of memory with the regular LINQ version as well, but AsParallel() adds extra partitioning overhead to run in parallel, and that pushes you over the limit.
When I tried your sample I had the same results at first: the non-parallel version would finish, but the PLINQ version would run out of memory; doubling the Guid list size then caused both versions to run out of memory. Also note that 10 million Guids occupy about 152 MB of memory.
Also note that your current PLINQ and LINQ queries are only executed in your Console.WriteLine(): since LINQ is lazy, you have to force evaluation, e.g. using ToList() (or, in your case, Count()).
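A small rearrangement of the question's timing code makes that deferred execution visible:

timer.Start();
var groupedList = guidList.AsParallel().GroupBy(f => f).Where(g => g.Count() > 1);
timer.Stop();                          // measures almost nothing: the query is only defined here

timer.Restart();
int duplicates = groupedList.Count();  // the grouping actually runs (and allocates) here
timer.Stop();                          // this is the elapsed time you actually want to report
Console.WriteLine(string.Format("Took {0} ms, {1} duplications", timer.ElapsedMilliseconds, duplicates));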
One way to at least dampen the problem is to not put all of the guids into a list, but rather to use an enumerable.
public IEnumerable<Guid> getGuids(int number)
{
    for (int i = 0; i < number; i++)
    {
        yield return Guid.NewGuid();
    }
}
This has several advantages. First, it's lazily evaluated, so you'll fail in the middle of processing rather than at the declaration of the guids. Second, you aren't holding on to all of the guids that fail the Where clause; they can be reclaimed from memory. That matters a LOT: you only need one copy of each guid in memory, not two, when you hit the Where clause.
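A usage sketch under that premise (how much this actually helps depends on how much GroupBy itself has to buffer):

// getGuids as defined above; assumed reachable here (e.g. made static).
var duplicateGroups = getGuids(10000000)
    .AsParallel()
    .GroupBy(f => f)
    .Where(g => g.Count() > 1);
Console.WriteLine("{0} duplications", duplicateGroups.Count());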

foreach + break vs linq FirstOrDefault performance difference

I have two classes that perform date range data fetching for particular days.
// Note: TItem is assumed to expose an IsWithinRange(DateTime) method.
public class IterationLookup<TItem, TKey> where TItem : class
{
    private IList<TItem> items = null;

    public IterationLookup(IEnumerable<TItem> items, Func<TItem, TKey> keySelector)
    {
        this.items = items.OrderByDescending(keySelector).ToList();
    }

    public TItem GetItem(DateTime day)
    {
        foreach (TItem i in this.items)
        {
            if (i.IsWithinRange(day))
            {
                return i;
            }
        }
        return null;
    }
}

public class LinqLookup<TItem, TKey> where TItem : class
{
    private IList<TItem> items = null;

    public LinqLookup(IEnumerable<TItem> items, Func<TItem, TKey> keySelector)
    {
        this.items = items.OrderByDescending(keySelector).ToList();
    }

    public TItem GetItem(DateTime day)
    {
        return this.items.FirstOrDefault(i => i.IsWithinRange(day));
    }
}
Then I ran speed tests, which show that the LINQ version is about 5 times slower. That would make sense if I stored the items without materializing them using ToList, because then every call to FirstOrDefault would also re-run OrderByDescending. But that's not the case, so I don't really know what's going on; LINQ should perform very similarly to the iteration.
This is the code excerpt that measures my timings
IList<RangeItem> ranges = GenerateRanges(); // returns List<T>
var iterLookup = new IterationLookup<RangeItem, int>(ranges, r => r.Id); // Id assumed to be an int
var linqLookup = new LinqLookup<RangeItem, int>(ranges, r => r.Id);

Stopwatch timer = new Stopwatch();

timer.Start();
for (int i = 0; i < 1000000; i++)
{
    iterLookup.GetItem(GetRandomDay());
}
timer.Stop();
// display elapsed time

timer.Restart();
for (int i = 0; i < 1000000; i++)
{
    linqLookup.GetItem(GetRandomDay());
}
timer.Stop();
// display elapsed time
Why do I know it should perform better? Because when I write very similar code without using these lookup classes, LINQ performs very similarly to the foreach iteration...
// continue from previous code block
// items are ordered the same way as they are in the classes
IList<RangeItem> items = ranges.OrderByDescending(r => r.Id).ToList();

timer.Restart();
for (int i = 0; i < 1000000; i++)
{
    DateTime day = GetRandomDay();
    foreach (RangeItem r in items)
    {
        if (r.IsWithinRange(day))
        {
            // RangeItem result = r;
            break;
        }
    }
}
timer.Stop();
// display elapsed time

timer.Restart();
for (int i = 0; i < 1000000; i++)
{
    DateTime day = GetRandomDay();
    items.FirstOrDefault(i => i.IsWithinRange(day));
}
timer.Stop();
// display elapsed time
This is, in my opinion, highly similar code. As far as I know, FirstOrDefault also iterates only until it reaches a matching item or the end of the sequence, which is essentially the same as foreach with break.
But even the iteration class performs worse than my simple foreach loop, which is also a mystery, because its only overhead is a method call on a class instance compared with direct access.
Question
What am I doing wrong in my LINQ class that makes it perform so slowly?
What am I doing wrong in my iteration class that makes it perform twice as slowly as the direct foreach loop?
Which times are being measured?
I do these steps:
Generate ranges (as seen below in results)
Create object instances of IterationLookup and LinqLookup (and my optimized date-range class BitCountLookup, which is not part of the discussion here).
Start the timer and execute 1 million lookups on random days within the maximum date range (as seen in the results) using the previously instantiated IterationLookup class.
Start the timer and execute 1 million lookups on random days within the maximum date range (as seen in the results) using the previously instantiated LinqLookup class.
Start the timer and execute 1 million lookups (6 times) using manual foreach+break loops and LINQ calls.
As you can see object instantiation is not measured.
Appendix I: Results over million lookups
The ranges displayed in these results don't overlap, which should make both approaches even more similar in case the LINQ version doesn't break out of the loop on a successful match (which it very likely does).
Generated Ranges:
ID Range 000000000111111111122222222223300000000011111111112222222222
123456789012345678901234567890112345678901234567890123456789
09 22.01.-30.01. |-------|
08 14.01.-16.01. |-|
07 16.02.-19.02. |--|
06 15.01.-17.01. |-|
05 19.02.-23.02. |---|
04 01.01.-07.01.|-----|
03 02.01.-10.01. |-------|
02 11.01.-13.01. |-|
01 16.01.-20.01. |---|
00 29.01.-06.02. |-------|
Lookup classes...
- Iteration: 1028ms
- Linq: 4517ms !!! THIS IS THE PROBLEM !!!
- BitCounter: 401ms
Manual loops...
- Iter: 786ms
- Linq: 981ms
- Iter: 787ms
- Linq: 996ms
- Iter: 787ms
- Linq: 977ms
- Iter: 783ms
- Linq: 979ms
Appendix II: GitHub:Gist code to test for yourself
I've put up a Gist so you can get the full code yourself and see what's going on. Create a Console application, copy Program.cs into it, and add the other files that are part of the gist.
Grab it here.
Appendix III: Final thoughts and measurement tests
The most problematic thing was, of course, the LINQ implementation, which was awfully slow. It turns out that this has everything to do with how delegates are compiled and optimized. LukeH provided the best and most usable solution, which made me try different approaches. I've tried various approaches in the GetItem method (or GetPointData, as it's called in the Gist):
the usual way that most developers would write it (which is also what's implemented in the Gist, and wasn't updated after the results revealed this is not the best way of doing it):
return this.items.FirstOrDefault(item => item.IsWithinRange(day));
by defining a local predicate variable:
Func<TItem, bool> predicate = item => item.IsWithinRange(day);
return this.items.FirstOrDefault(predicate);
local predicate builder:
Func<DateTime, Func<TItem, bool>> builder = d => item => item.IsWithinRange(d);
return this.items.FirstOrDefault(builder(day));
local predicate builder and local predicate variable:
Func<DateTime, Func<TItem, bool>> builder = d => item => item.IsWithinRange(d);
Func<TItem, bool> predicate = builder(day);
return this.items.FirstOrDefault(predicate);
class level (static or instance) predicate builder:
return this.items.FirstOrDefault(classLevelBuilder(day));
an externally defined predicate provided as a method parameter:
public TItem GetItem(Func<TItem, bool> predicate)
{
    return this.items.FirstOrDefault(predicate);
}
when executing this method I also took two approaches:
predicate provided directly at method call within for loop:
for (int i = 0; i < 1000000; i++)
{
    linqLookup.GetItem(item => item.IsWithinRange(GetRandomDay()));
}
predicate builder defined outside for loop:
Func<DateTime, Func<Ranger, bool>> builder = d => r => r.IsWithinRange(d);
for (int i = 0; i < 1000000; i++)
{
    linqLookup.GetItem(builder(GetRandomDay()));
}
Results - what performs best
For comparison, when using the iteration class it takes approx. 770 ms to execute 1 million lookups on randomly generated ranges.
Approach 3 (the local predicate builder) turns out to be optimized best by the compiler and performs almost as fast as plain iteration: 800 ms.
Approach 6.2 (predicate builder defined outside the for loop): 885 ms.
Approach 6.1 (predicate provided directly within the for loop): 1525 ms.
All the others took somewhere between 4200 ms and 4360 ms and are thus considered unusable.
So whenever you pass a predicate to an externally defined, frequently called method, define a builder and execute that. This will yield the best results.
The biggest surprise to me was that delegates (or predicates) can be this costly.
Sometimes LINQ appears slower because generating delegates in a loop (especially a non-obvious loop over method calls) can add time. Instead, you may want to consider moving your finder out of the class so it's more generic (like your key selector, which is supplied at construction):
public class LinqLookup<TItem, TKey>
{
    private IList<TItem> items = null;

    public LinqLookup(IEnumerable<TItem> items, Func<TItem, TKey> keySelector)
    {
        this.items = items.OrderByDescending(keySelector).ToList();
    }

    public TItem GetItem(Func<TItem, bool> predicate)
    {
        return this.items.FirstOrDefault(predicate);
    }
}
Since you don't use a lambda in your iterative code, this can make a bit of a difference, because the LINQ version has to create the delegate on each pass through the loop. Usually this time is insignificant in day-to-day coding, and invoking the delegate is no more expensive than any other method call; it's just the delegate creation in a tight loop that can add that little bit of extra time.
In this case, since the delegate never changes for the class, you can create it outside of the code you are looping through, and it will be more efficient.
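For instance, a hypothetical usage sketch of that idea with the refactored GetItem above (note this fixes the day, so it is only valid when the predicate does not depend on per-iteration state):

// Built once; the loop below allocates no new delegates.
DateTime day = GetRandomDay();
Func<RangeItem, bool> predicate = r => r.IsWithinRange(day);

for (int i = 0; i < 1000000; i++)
{
    linqLookup.GetItem(predicate);
}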
Update:
Actually, even without any optimization, compiling in Release mode on my machine I do not see the 5x difference. I just performed 1,000,000 lookups on an Item that only has a DateTime field, with 5,000 items in the list. Of course my data etc. are different, but you can see the times are actually really close when you abstract out the delegate:
iterative : 14279 ms, 0.014279 ms/call
linq w opt : 17400 ms, 0.0174 ms/call
These time differences are very minor and worth the readability and maintainability improvements of using LINQ. I don't see the 5x difference though, which leads me to believe there's something we're not seeing in your test harness.
Further to Gabe's answer, I can confirm that the difference appears to be caused by the cost of re-constructing the delegate for every call to GetPointData.
If I add a single line to the GetPointData method in your IterationRangeLookupSingle class then it slows right down to the same crawling pace as LinqRangeLookupSingle. Try it:
// in IterationRangeLookupSingle<TItem, TKey>
public TItem GetPointData(DateTime point)
{
    // just a single line, this delegate is never used
    Func<TItem, bool> dummy = i => i.IsWithinRange(point);

    // the rest of the method remains exactly the same as before
    // ...
}
(I'm not sure why the compiler and/or jitter can't just ignore the superfluous delegate that I added above. Obviously, the delegate is necessary in your LinqRangeLookupSingle class.)
One possible workaround is to compose the predicate in LinqRangeLookupSingle so that point is passed to it as an argument. This means that the delegate only needs to be constructed once, not every time the GetPointData method is called. For example, the following change will speed up the LINQ version so that it's pretty much comparable with the foreach version:
// in LinqRangeLookupSingle<TItem, TKey>
public TItem GetPointData(DateTime point)
{
    Func<DateTime, Func<TItem, bool>> builder = x => y => y.IsWithinRange(x);
    Func<TItem, bool> predicate = builder(point);
    return this.items.FirstOrDefault(predicate);
}
Assume you have a loop like this:
for (int counter = 0; counter < 1000000; counter++)
{
    // execute this 1M times and time it
    DateTime day = GetRandomDay();
    items.FirstOrDefault(i => i.IsWithinRange(day));
}
This loop creates 1,000,000 lambda objects so that the i.IsWithinRange call can access day. Each created delegate is then invoked items.Length / 2 times on average, for a total of roughly 1,000,000 * items.Length / 2 invocations of i.IsWithinRange. Neither of those factors exists in your foreach loop, which is why the explicit loop is faster.
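To make the allocation pattern concrete, here is a rough sketch of what the compiler generates for the captured lambda (the class and member names are hypothetical stand-ins for the compiler-generated ones):

// Hypothetical stand-in for the closure class generated for i => i.IsWithinRange(day).
class DayClosure
{
    public DateTime Day;
    public bool Invoke(RangeItem i) { return i.IsWithinRange(Day); }
}

// The loop above is lowered to roughly this: a new closure object and a new
// delegate are allocated on every iteration, before any range is even examined.
for (int counter = 0; counter < 1000000; counter++)
{
    var closure = new DayClosure { Day = GetRandomDay() };
    Func<RangeItem, bool> predicate = closure.Invoke;   // delegate allocation
    items.FirstOrDefault(predicate);
}

A lambda that captures nothing can be cached and reused by the compiler, which is why hoisting the state out of the delegate makes such a difference.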

Calling a list of methods in a random sequence?

I have a list of 10 methods. Now I want to call these methods in a random sequence. The sequence should be generated at runtime. What's the best way to do this?
It is always astonishing to me the number of incorrect and inefficient answers one sees whenever anyone asks how to shuffle a list of things on StackOverflow. Here we have several examples of code which is brittle (because it assumes that key collisions are impossible when in fact they are merely rare) or slow for large lists. (In this case the problem is stated to be only ten elements, but when possible surely it is better to give a solution that scales to thousands of elements if doing so is not difficult.)
This is not a hard problem to solve correctly. The correct, fast way to do this is to create an array of actions, and then shuffle that array in-place using a Fisher-Yates Shuffle.
http://en.wikipedia.org/wiki/Fisher-Yates_shuffle
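A minimal sketch of that approach (the loop bounds follow the standard Fisher-Yates formulation; the array is assumed to already hold your ten methods):

static void Shuffle<T>(T[] array, Random rng)
{
    // Fisher-Yates: walk backwards, swapping each element with a randomly chosen
    // element at or before it. rng.Next(i + 1) yields 0..i, the correct (unbiased) range.
    for (int i = array.Length - 1; i > 0; i--)
    {
        int j = rng.Next(i + 1);
        T tmp = array[i];
        array[i] = array[j];
        array[j] = tmp;
    }
}

// Usage: shuffle the actions in place, then invoke them in the new order.
var actions = new Action[10];   // assumed to be filled with your ten methods
var rng = new Random();
Shuffle(actions, rng);
foreach (var action in actions)
    action();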
Some things not to do:
Do not implement the Fisher-Yates shuffle incorrectly. One sees more incorrect than correct implementations of this trivial algorithm. In particular, make sure you are choosing the random number from the correct range; choosing it from the wrong range produces a biased shuffle.
If the shuffle algorithm must actually be unpredictable then use a source of randomness other than Random, which is only pseudo-random. Remember, Random only has 2^32 possible seeds, and therefore there are fewer than that many possible shuffles.
If you are going to be producing many shuffles in a short amount of time, do not create a new instance of Random every time. Save and re-use the old one, or use a different source of randomness entirely. Random chooses its seed based on the time; many Randoms created in close succession will produce the same sequence of "random" numbers.
Do not sort on a "random" GUID as your key. GUIDs are guaranteed to be unique. They are not guaranteed to be randomly ordered. It is perfectly legal for an implementation to spit out consecutive GUIDs.
Do not use a random function as a comparator and feed that to a sorting algorithm. Sort algorithms are permitted to do anything they please if the comparator is bad, including crashing, and including producing non-random results. As Microsoft recently found out, it is extremely embarrassing to get a simple algorithm like this wrong.
Do not use the input to random as the key to a dictionary, and then sort the dictionary. There is nothing stopping the randomness source from choosing the same key twice, and therefore either crashing your application with a duplicate key exception, or silently losing one of your methods.
Do not use the algorithm "Create two lists. Add the elements to the first list. Repeatedly move a random element from the first list to the second list, removing the element from the first list". If the list is O(n) to remove an item then this is an O(n^2) algorithm.
Do not use the algorithm "Create two lists. Add the elements to the first list. Repeatedly move a random non-null element from the first list to the second list, setting the element in the first list to null." Also do not do the crazy equivalent of that algorithm. If there are lots of items in the list then this gets slower and slower as you start hitting more and more nulls.
New, short answer
Starting from where Ilya Kogan left off, totally correct now that we had Eric Lippert find the bug:
var methods = new Action[10];
var rng = new Random();
var shuffled = methods.Select(m => Tuple.Create(rng.Next(), m))
                      .OrderBy(t => t.Item1)
                      .Select(t => t.Item2);
foreach (var action in shuffled) {
    action();
}
Of course this is doing a lot behind the scenes. The method below should be much faster. But if LINQ is fast enough...
Old answer (much longer)
After stealing this code from here:
public static T[] RandomPermutation<T>(T[] array)
{
    T[] retArray = new T[array.Length];
    array.CopyTo(retArray, 0);

    Random random = new Random();
    for (int i = 0; i < array.Length; i += 1)
    {
        int swapIndex = random.Next(i, array.Length);
        if (swapIndex != i)
        {
            T temp = retArray[i];
            retArray[i] = retArray[swapIndex];
            retArray[swapIndex] = temp;
        }
    }
    return retArray;
}
the rest is easy:
var methods = new Action[10];
var perm = RandomPermutation(methods);
foreach (var method in perm)
{
    // call the method
}
Have an array of delegates. Suppose you have this:
class YourClass {
    public static int YourFunction1(int x) { /* ... */ return 0; }
    public static int YourFunction2(int x) { /* ... */ return 0; }
    public static int YourFunction3(int x) { /* ... */ return 0; }
}
Now declare a delegate:
public delegate int MyDelegate(int x);
Now create an array of delegates:
MyDelegate[] delegates = new MyDelegate[10];
delegates[0] = new MyDelegate(YourClass.YourFunction1);
delegates[1] = new MyDelegate(YourClass.YourFunction2);
delegates[2] = new MyDelegate(YourClass.YourFunction3);
and now call it like this:
int result = delegates[randomIndex](48);
You can create a shuffled collection of delegates, and then call all methods in the collection.
Here is an easy way of doing so using a dictionary. The keys of the dictionary are random numbers, and the values are delegates to your methods. When you iterate through the dictionary, it has the effect of shuffling.
var shuffledActions = actions.ToDictionary(
    action => random.Next(),
    action => action);

foreach (var pair in shuffledActions.OrderBy(item => item.Key))
{
    pair.Value();
}
actions is an enumerable of your methods.
random is an instance of type Random.
Think of it as a list of objects from which you want to extract objects at random. You can get a random index using the Random.Next method (always pass the current List.Count as the parameter) and then remove the object from the list so it will not be drawn again.
When processing a list in a random order, the natural inclination is to shuffle a list.
Another approach is to just keep the list order, but randomly select and remove each item.
var actionList = new[]
{
    new Action( () => CallMethodOne() ),
    new Action( () => CallMethodTwo() ),
    new Action( () => CallMethodThree() )
}.ToList();

var r = new Random();
while (actionList.Count() > 0)
{
    var index = r.Next(actionList.Count());
    var action = actionList[index];
    actionList.RemoveAt(index);
    action();
}
I think:
Via reflection, get the method objects (MethodInfo);
create an array of those method objects;
generate a random index (normalized to the array's range);
invoke the method.
You can remove the method from the array so that each method executes only once.
Bye
