I don't like premature optimization but I was curious while doing a simple task
so I added a stopwatch.
I don't understand how the difference can be so big.
An array of strings, 7 characters each, taken from richtextbox.Text.
Length of array: 5500 elements.
Foreach Elapsed Time: 0.0015 seconds
For Elapsed Time: 9.757 seconds
For:
if (chkLineBreaks.Checked)
{
for (int i = 0; i < txtInput.Lines.Length; i++)
{
outputStringBuilder.Append($"'{txtInput.Lines[i]}',");
}
}
Foreach:
foreach (var line in txtInput.Lines)
{
outputStringBuilder.Append($"'{line}',");
if (chkLineBreaks.Checked)
outputStringBuilder.AppendLine();
}
From what I've read the difference should be negligible, and the for should be slightly faster.
Even more, the foreach version has a condition check in each iteration (unless it is being 'hoisted' up before the loop).
What is going on here?
Edit:
I've changed the foreach code to:
int i = 0;
foreach (var line in txtInput.Lines)
{
outputStringBuilder.Append($"'{txtInput.Lines[i]}',");
i++;
}
So it is now doing the same thing.
It is taking 4.625 seconds, still about half the time of the for loop.
Also I know that I can extract the array outside the loop but this is not what I am testing here :)
Edit #2:
This is the whole code for that section:
Stopwatch sw = new Stopwatch();
sw.Start();
// for (int i = 0; i < txtInput.Lines.Length; i++)
// {
// outputStringBuilder.Append($"'{txtInput.Lines[i]}',");
// }
int i = 0;
foreach (var line in txtInput.Lines)
{
outputStringBuilder.Append($"'{txtInput.Lines[i]}',");
i++;
}
MessageBox.Show(sw.Elapsed.ToString());
The issue is that txtInput.Lines executes many times (once per iteration) in your for loop, both in the loop condition and in the use of txtInput.Lines[i]. So for every line of the file you are saying 'OK, please parse this textbox into multiple lines, and then get me the nth line', and the parsing is the killer bit.
For a fairer comparison:
if (chkLineBreaks.Checked)
{
var lines = txtInput.Lines;
for (int i = 0; i < lines.Length; i++)
{
outputStringBuilder.Append($"'{lines[i]}',");
}
}
this way the Lines call is done only once (i.e. equivalent to the foreach scenario).
One way to spot these kinds of issues is to compare the timings. The slow one is about 6,000× slower than the fast one, and you have 5.5K entries. Since 5.5K and 6K are very similar numbers, it may prompt you to think: 'am I doing something once per element in the loop that I really shouldn't?'
The compiled code sees very little difference between a for and a foreach statement when traversing an array (or list).
Consider this simple code that writes out a list of strings three different ways:
class Program
{
static void Main(string[] args)
{
var list = Enum.GetNames(typeof(System.UriComponents));
// 1. for each
foreach (var item in list)
{
Console.WriteLine(item);
}
// 2. for loop
for (int i = 0; i<list.Length; i++)
{
Console.WriteLine(list[i]);
}
// 3. LINQ
Console.WriteLine(string.Join(Environment.NewLine, list));
}
}
Now look at the compiled MSIL, translated back into C# using ILSpy or dotPeek.
// ConsoleApplication1.Program
private static void Main(string[] args)
{
string[] list = Enum.GetNames(typeof(UriComponents));
string[] array = list;
for (int j = 0; j < array.Length; j++)
{
string item = array[j];
Console.WriteLine(item);
}
for (int i = 0; i < list.Length; i++)
{
Console.WriteLine(list[i]);
}
Console.WriteLine(string.Join(Environment.NewLine, list));
}
See the two for loops: the compiler turned the foreach statement into a for loop. As for the string.Join() call, it uses SZArrayEnumerator, which holds a reference to the array and the current index value. At each MoveNext() call the index is incremented and the next value returned. Basically, it is equivalent to the following:
int i = 0;
while (i<list.Length)
{
Console.WriteLine(list[i]);
i++;
}
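Written out against the enumerator API directly, that traversal looks like this. A minimal sketch: when foreach is used on an IEnumerable&lt;T&gt;, the compiler also wraps the loop in a try/finally that disposes the enumerator, which the using block stands in for here.

```csharp
using System;
using System.Collections.Generic;

class EnumeratorSketch
{
    static void Main()
    {
        string[] list = { "Scheme", "UserInfo", "Host" };

        // Roughly what `foreach (var item in seq)` expands to for an
        // IEnumerable<T>; for plain arrays the compiler emits an index
        // loop instead, as shown in the decompiled code above.
        using (IEnumerator<string> e = ((IEnumerable<string>)list).GetEnumerator())
        {
            while (e.MoveNext())
            {
                string item = e.Current;
                Console.WriteLine(item);
            }
        }
    }
}
```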
Related
I'm optimizing every line of code in my application, as performance is key. I'm testing all assumptions, as what I expect is not what I see in reality.
A strange occurrence to me is the performance of function calls. Below are two scenarios: incrementing an integer directly in the loop, and via a function in the loop. I expected the function call to be slower, however it is faster??
Can anyone explain this? I'm using .NET 4.7.1
Without function: 2808ms
With function: 2295ms
UPDATE:
Switching the order of the loops switches the runtimes as well. I don't understand why, but I will accept it as it is. Running the two loops in separate applications gives similar results. I'll assume in the future that a function call won't add any meaningful overhead.
public static int a = 0;
public static void Increment()
{
a = a + 1;
}
static void Main(string[] args)
{
//There were suggestions that the first for loop always runs faster. I have included a 'dummy' for loop here to warm up.
a = 0;
for (int i = 0;i < 1000;i++)
{
a = a + 1;
}
//Normal increment
Stopwatch sw = new Stopwatch();
sw.Start();
a = 0;
for (int i = 0; i < 900000000;i++)
{
a = a + 1;
}
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds);
//Increment with function
Stopwatch sw2 = new Stopwatch();
sw2.Start();
a = 0;
for (int i = 0; i < 900000000; i++)
{
Increment();
}
sw2.Stop();
Console.WriteLine(sw2.ElapsedMilliseconds);
Console.ReadLine();
}
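One likely explanation for the numbers above is that the JIT inlined the tiny Increment method, making both loops compile to essentially the same code. A hedged way to test that hypothesis is to forbid inlining with MethodImplOptions.NoInlining; the sketch below is an assumption about what is happening, and the timings it prints are machine-dependent:

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

class InlineProbe
{
    static int a;

    // NoInlining forces a real call; without it the JIT is free to inline
    // this one-liner, in which case both loops below become identical.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void Increment()
    {
        a = a + 1;
    }

    static void Main()
    {
        // Warm up both paths so JIT compilation is excluded from the timings.
        for (int i = 0; i < 1000; i++) { a = a + 1; Increment(); }

        var sw = Stopwatch.StartNew();
        a = 0;
        for (int i = 0; i < 100000000; i++) a = a + 1;
        sw.Stop();
        Console.WriteLine("direct add:  " + sw.ElapsedMilliseconds + " ms");

        sw.Restart();
        a = 0;
        for (int i = 0; i < 100000000; i++) Increment();
        sw.Stop();
        Console.WriteLine("forced call: " + sw.ElapsedMilliseconds + " ms");
    }
}
```

If the forced-call version is now clearly slower, the earlier result really was inlining at work.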
Now, I may get negative points because perhaps somewhere in the vast internet there is already an answer to this, but I tried to look for it and I simply couldn't find it.
The gist of the problem is that HackerRank wants you to create an array with a size decided by the user, have the user enter its values (integers), and finally have the program sum those values.
There are plenty of ways to do it, and I already know how, but my problem is that I just can't understand the C# sample code HackerRank gave me. I commented the parts I don't understand, which is most of it:
static int simpleArraySum(int n, int[] ar) {
// Complete this function
int sum = 0;
foreach( var item in ar){
sum += item;
}
return sum;
}
static void Main(String[] args) {
//I know what this does
int n = Convert.ToInt32(Console.ReadLine());
//I am lost here; why create a string array and use the Split method?
string[] ar_temp = Console.ReadLine().Split(' ');
//I don't understand this either; what is it converting? What is the Parse for?
int[] ar = Array.ConvertAll(ar_temp,Int32.Parse);
//Why send the n when all you need is the array itself?
int result = simpleArraySum(n, ar);
Console.WriteLine(result);
}
I know some people hate HackerRank, and honestly, I do too, but it does give me some nice ways to test my limited C# skills and my logic. So, if there are better sites that help you test your logic as a CS student, please share them with me.
Here is the code I made to solve this problem in Visual Studio, but for some stupid reason HackerRank won't accept it unless I make custom inputs:
//This code can be potentially shorter using the code commented further below.
//For practice's sake, it was made longer.
static int simpleArraySum(int[] arr_temp)
{
int total = 0;
foreach (var item in arr_temp)
{
total += item;
}
return total;
}
static void Main(String[] args)
{
int n = Convert.ToInt32(Console.ReadLine());
int[] arr_temp = new int[n];
for (int i = 0; i < n; i++)
{
arr_temp[i] = Convert.ToInt32(Console.ReadLine());
}
int result = simpleArraySum(arr_temp);
//int result = arr_temp.Sum();
Console.WriteLine(result);
Console.ReadLine();
}
You need the string array because Console.ReadLine() hands you the whole input line as a single string. To get the sum you then need to convert each piece of that string into a usable number (an integer).
I agree that it doesn't make sense to send the first argument n in simpleArraySum because n is simply unused.
As for the part int[] ar = Array.ConvertAll(ar_temp, Int32.Parse); it simply parses every element of the string array into the integer array. It is also risky, because if an element isn't numeric it will throw an exception, e.g. with input "3 4 1 f" the 'f' will throw, unless that is the desired behaviour.
Personally I think the Main method should not get too involved with the data; the heavy lifting should be done in the methods. A better version would perhaps be to modify simpleArraySum and move that parsing line into it, like:
static int simpleArraySum(string input)
{
String[] fields = input.Split(null);
List<int> vals = new List<int>();
foreach (string i in fields)
{
var j = 0;
if (Int32.TryParse(i, out j)) vals.Add(j);
}
int sum = 0;
foreach (var item in vals)
{
sum += item;
}
return sum;
}
I introduced a generic List because it's more readable, if not cleaner, although List might look like overkill to some programmers and isn't as lightweight as a plain array. On the other hand, you can easily stick with an array, except that it needs to be initialized with a length, i.e. int[] vals = new int[fields.Length];. Roughly:
static int simpleArraySum(string input)
{
String[] fields = input.Split(null);
int[] vals = new int[fields.Length];
for (int i = 0; i < fields.Length; i++)
{
var j = 0;
if (Int32.TryParse(fields[i], out j)) vals[i] = j;
}
int sum = 0;
foreach (var item in vals)
{
sum += item;
}
return sum;
}
Here is my code, I hope it helps:
static int simpleArraySum(int[] ar,int count) {
if (count > 0 && count <= 10000)
{
if (count == ar.Length)
{
if (!ar.Any(item => (item < 0 || item >= 10000)))
{
return ar.Sum();
}
}
}
return 0;
}
And in Main:
int arCount = Convert.ToInt32(Console.ReadLine());
int[] arr = Array.ConvertAll(Console.ReadLine().Split(' '), arTemp => Convert.ToInt32(arTemp));
int result = simpleArraySum(arr, arCount);
Console.WriteLine(result);
since Array.ConvertAll() takes an array of one type and converts it to an array of another type, int or float for example.
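For example, a minimal self-contained sketch of what that Split/ConvertAll pair does with a hard-coded input line:

```csharp
using System;
using System.Linq;

class ConvertAllDemo
{
    static void Main()
    {
        // Split the line into string tokens...
        string[] parts = "3 4 1".Split(' ');               // { "3", "4", "1" }

        // ...then parse each token into an int in one call.
        int[] numbers = Array.ConvertAll(parts, Int32.Parse); // { 3, 4, 1 }

        Console.WriteLine(numbers.Sum()); // prints 8
    }
}
```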
For users still looking for a 100% C# solution: on the above-mentioned coding websites, do not modify the Main function. The aim of the test is to complete the given function in the online compiler.
using System.Linq;
public static int simpleArraySum(List<int> ar)
{
int sum = ar.Sum();
return sum;
}
This is out of curiosity I want to ask this question...
Here is my code:
for (int i = 0; i < myList.Count - 1; ++i)
{
for (int j = i+1; j < myList.Count; ++j)
{
DoMyStuff(myList[i], myList[j]);
}
}
Pretty simple loop, but obviously it only works with List...
But I was wondering... how can I code this loop in order to make it independent of the collection's type (deriving from IEnumerable...)
My first thought:
IEnumerator it1 = myList.GetEnumerator();
while (it1.MoveNext())
{
IEnumerator it2 = it1; // this part is obviously wrong
while (it2.MoveNext())
{
DoMyStuff(it1.Current, it2.Current);
}
}
Because enumerators don't have an efficient way of getting the n'th element, your best bet is to copy the enumerable into a list, then use your existing code:
void CrossMap<T>(IEnumerable<T> enumerable)
{
List<T> myList = enumerable.ToList();
for (int i = 0; i < myList.Count - 1; ++i)
{
for (int j = i+1; j < myList.Count; ++j)
{
DoMyStuff(myList[i], myList[j]);
}
}
}
However, there is a rather tricksie hack you can do with some collection types. Because the enumerators of some of the collection types in the BCL are declared as value types, rather than reference types, you can create an implicit clone of the state of an enumerator by copying it to another variable:
// notice the struct constraint!
void CrossMap<TEnum, T>(TEnum enumerator) where TEnum : struct, IEnumerator<T>
{
while (enumerator.MoveNext())
{
TEnum enum2 = enumerator; // value type, so this makes an implicit clone!
while (enum2.MoveNext())
{
DoMyStuff(enumerator.Current, enum2.Current);
}
}
}
// to use (you have to specify the type args exactly)
List<int> list = Enumerable.Range(0, 10).ToList();
CrossMap<List<int>.Enumerator, int>(list.GetEnumerator());
This is quite obtuse, and quite hard to use, so you should only do this if this is performance and space-critical.
Here is a way that truly uses the lazy IEnumerable paradigm to generate a stream of non-duplicated combinations from a single IEnumerable input. The first pair is returned immediately (no caching of lists), but there will be increasing delays during the Skip(n) operation, which occurs after every move forward on the outer enumerator (still imperceptible except for very high values of n or very expensive IEnumerables):
public static IEnumerable<Tuple<T, T>> Combinate<T>(this IEnumerable<T> enumerable) {
var outer = enumerable.GetEnumerator();
var n = 1;
while (outer.MoveNext()) {
foreach (var item in enumerable.Skip(n))
yield return Tuple.Create(outer.Current, item);
n++;
}
}
Here is how you would use it in your case:
foreach(var pair in mySource.Combinate())
DoMyStuff(pair.Item1, pair.Item2);
Postscript
Everyone has pointed out (here and elsewhere) that there is no efficient way of getting the "nth" element of an IEnumerable. This is partly because IEnumerable does not require there to even be an underlying source collection. For example, here's a silly little function that dynamically generates values for an experiment as quickly as they can be consumed, and continues for a specified period of time rather than for any count:
public static IEnumerable<double> Sample(double milliseconds, Func<double> generator) {
var sw = new Stopwatch();
var timeout = TimeSpan.FromMilliseconds(milliseconds);
sw.Start();
while (sw.Elapsed < timeout)
yield return generator();
}
There are extension methods Count() and ElementAt(int) declared on IEnumerable<T>. They live in the System.Linq namespace, which should be included by default in your .cs files if you are using any C# version later than C# 3. That means you could just do:
for (int i = 0; i < myList.Count() - 1; ++i)
{
for (int j = i+1; j < myList.Count(); ++j)
{
DoMyStuff(myList.ElementAt(i), myList.ElementAt(j));
}
}
However, note that these are methods, and will be called over and over again during iteration, so you might want to save their result to variables, like:
var elementCount = myList.Count();
for (int i = 0; i < elementCount - 1; ++i)
{
var iElement = myList.ElementAt(i);
for (int j = i+1; j < elementCount; ++j)
{
DoMyStuff(iElement, myList.ElementAt(j));
}
}
You could also try some LINQ that selects all pairs of eligible elements, and then use a simple foreach to call the processing, something like:
var result = myList.SelectMany((avalue, aindex) =>
myList.Where((bvalue, bindex) => aindex < bindex)
.Select(bvalue => new {First = avalue, Second = bvalue}));
foreach (var item in result)
{
DoMyStuff(item.First, item.Second);
}
I'd write against IEnumerable<T> and pass a delegate for the indexing operation:
public static void DoStuff<T>(IEnumerable<T> seq, Func<int, T> selector)
{
int count = seq.Count();
for (int i = 0; i < count - 1; ++i)
{
for (int j = i+1; j < count; ++j)
{
DoMyStuff(selector(i), selector(j));
}
}
}
You can call it using:
List<T> list = //whatever
DoStuff(list, i => list[i]);
If you restrict the collection argument to ICollection<T> you can use the Count property instead of using the Count() extension method.
Not really efficient, but readable:
int i = 0;
foreach( var item1 in myList)
{
++i;
foreach( var item2 in myList.Skip(i))
DoMyStuff(item1, item2);
}
You can do it fairly succinctly using IEnumerable.Skip(), and it might even be fairly fast compared with copying the list into an array IF the list is short enough. It's bound to be a lot slower than the copying for lists of a sufficient size, though.
You'd have to do some timings with lists of various sizes to see where copying to an array becomes more efficient.
Here's the code. Note that it's iterating an enumerable twice - which will be ok if the enumerable is implemented correctly!
static void test(IEnumerable<int> myList)
{
int n = 0;
foreach (int v1 in myList)
{
foreach (int v2 in myList.Skip(++n))
{
DoMyStuff(v1, v2);
}
}
}
I can't figure out a discrepancy between the time it takes for the Contains method to find an element in an ArrayList and the time it takes for a small function I wrote to do the same thing. The documentation states that Contains performs a linear search, so it's supposed to be O(n) and not using any faster method. However, while the exact values may not be relevant, the Contains method returns in 00:00:00.1087087 seconds while my function takes 00:00:00.1876165. It might not be much, but this difference becomes more evident when dealing with even larger arrays. What am I missing, and how should I write my function to match Contains's performance?
I'm using C# on .NET 3.5.
public partial class Window1 : Window
{
public bool DoesContain(ArrayList list, object element)
{
for (int i = 0; i < list.Count; i++)
if (list[i].Equals(element)) return true;
return false;
}
public Window1()
{
InitializeComponent();
ArrayList list = new ArrayList();
for (int i = 0; i < 10000000; i++) list.Add("zzz " + i);
Stopwatch sw = new Stopwatch();
sw.Start();
//Console.Out.WriteLine(list.Contains("zzz 9000000") + " " + sw.Elapsed);
Console.Out.WriteLine(DoesContain(list, "zzz 9000000") + " " + sw.Elapsed);
}
}
EDIT:
Okay, now, lads, look:
public partial class Window1 : Window
{
public bool DoesContain(ArrayList list, object element)
{
int count = list.Count;
for (int i = count - 1; i >= 0; i--)
if (element.Equals(list[i])) return true;
return false;
}
public bool DoesContain1(ArrayList list, object element)
{
int count = list.Count;
for (int i = 0; i < count; i++)
if (element.Equals(list[i])) return true;
return false;
}
public Window1()
{
InitializeComponent();
ArrayList list = new ArrayList();
for (int i = 0; i < 10000000; i++) list.Add("zzz " + i);
Stopwatch sw = new Stopwatch();
long total = 0;
int nr = 100;
for (int i = 0; i < nr; i++)
{
sw.Reset();
sw.Start();
DoesContain(list,"zzz");
total += sw.ElapsedMilliseconds;
}
Console.Out.WriteLine(total / nr);
total = 0;
for (int i = 0; i < nr; i++)
{
sw.Reset();
sw.Start();
DoesContain1(list, "zzz");
total += sw.ElapsedMilliseconds;
}
Console.Out.WriteLine(total / nr);
total = 0;
for (int i = 0; i < nr; i++)
{
sw.Reset();
sw.Start();
list.Contains("zzz");
total += sw.ElapsedMilliseconds;
}
Console.Out.WriteLine(total / nr);
}
}
I made an average of 100 runs for two versions of my function (forward and backward loop) and for the default Contains function. The times I got are 136 and 133 milliseconds for my functions, and a distant winner of 87 for the Contains version. Well now, if before you could argue that the data was scarce and I based my conclusions on a single, isolated run, what do you say about this test? Not only does Contains perform better on average, it achieves consistently better results in each run. So, is there some kind of disadvantage here for third-party functions, or what?
First, you're not running it many times and comparing averages.
Second, your method isn't being jitted until it actually runs. So the just in time compile time is added into its execution time.
A true test would run each multiple times and average the results (any number of things could cause one or the other to be slower for run X out of a total of Y), and your assemblies should be pre-jitted using ngen.exe.
As you're using .NET 3.5, why are you using ArrayList to start with, rather than List<string>?
A few things to try:
You could see whether using foreach instead of a for loop helps
You could cache the count:
public bool DoesContain(ArrayList list, object element)
{
int count = list.Count;
for (int i = 0; i < count; i++)
{
if (list[i].Equals(element))
{
return true;
}
}
return false;
}
You could reverse the comparison:
if (element.Equals(list[i]))
While I don't expect any of these to make a significant (positive) difference, they're the next things I'd try.
Do you need to do this containment test more than once? If so, you might want to build a HashSet<T> and use that repeatedly.
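A minimal sketch of that idea (the list contents mirror the question's "zzz " + i strings, scaled down so it runs quickly): building the set costs O(n) once, and every Contains after that is a hash lookup rather than a linear scan.

```csharp
using System;
using System.Collections.Generic;

class HashSetLookup
{
    static void Main()
    {
        var items = new List<string>();
        for (int i = 0; i < 100000; i++) items.Add("zzz " + i);

        // Pay the O(n) construction cost once...
        var set = new HashSet<string>(items);

        // ...then each lookup is roughly O(1).
        Console.WriteLine(set.Contains("zzz 90000")); // True
        Console.WriteLine(set.Contains("zzz -1"));    // False
    }
}
```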
I'm not sure if you're allowed to post Reflector code, but if you open the method using Reflector, you can see that's it's essentially the same (there are some optimizations for null values, but your test harness doesn't include nulls).
The only difference that I can see is that calling list[i] does bounds checking on i whereas the Contains method does not.
Using the code below I was able to get the following timings relatively consistently (within a few ms):
1: 190ms DoesContainRev
2: 198ms DoesContainRev1
3: 188ms DoesContainFwd
4: 203ms DoesContainFwd1
5: 199ms Contains
Several things to notice here.
This is run with release compiled code from the commandline. Many people make the mistake of benchmarking code inside the Visual Studio debugging environment, not to say anyone here did but something to be careful of.
The list[i].Equals(element) appears to be just a bit slower than element.Equals(list[i]).
using System;
using System.Diagnostics;
using System.Collections;
namespace ArrayListBenchmark
{
class Program
{
static void Main(string[] args)
{
Stopwatch sw = new Stopwatch();
const int arrayCount = 10000000;
ArrayList list = new ArrayList(arrayCount);
for (int i = 0; i < arrayCount; i++) list.Add("zzz " + i);
sw.Start();
DoesContainRev(list, "zzz");
sw.Stop();
Console.WriteLine(String.Format("1: {0}", sw.ElapsedMilliseconds));
sw.Reset();
sw.Start();
DoesContainRev1(list, "zzz");
sw.Stop();
Console.WriteLine(String.Format("2: {0}", sw.ElapsedMilliseconds));
sw.Reset();
sw.Start();
DoesContainFwd(list, "zzz");
sw.Stop();
Console.WriteLine(String.Format("3: {0}", sw.ElapsedMilliseconds));
sw.Reset();
sw.Start();
DoesContainFwd1(list, "zzz");
sw.Stop();
Console.WriteLine(String.Format("4: {0}", sw.ElapsedMilliseconds));
sw.Reset();
sw.Start();
list.Contains("zzz");
sw.Stop();
Console.WriteLine(String.Format("5: {0}", sw.ElapsedMilliseconds));
sw.Reset();
Console.ReadKey();
}
public static bool DoesContainRev(ArrayList list, object element)
{
int count = list.Count;
for (int i = count - 1; i >= 0; i--)
if (element.Equals(list[i])) return true;
return false;
}
public static bool DoesContainFwd(ArrayList list, object element)
{
int count = list.Count;
for (int i = 0; i < count; i++)
if (element.Equals(list[i])) return true;
return false;
}
public static bool DoesContainRev1(ArrayList list, object element)
{
int count = list.Count;
for (int i = count - 1; i >= 0; i--)
if (list[i].Equals(element)) return true;
return false;
}
public static bool DoesContainFwd1(ArrayList list, object element)
{
int count = list.Count;
for (int i = 0; i < count; i++)
if (list[i].Equals(element)) return true;
return false;
}
}
}
With a really good optimizer there should be no difference at all, because the semantics are the same. However, the existing optimizer cannot optimize your function as well as the hardcoded Contains is optimized. Some points for optimization:
comparing against a property each time can be slower than counting downwards and comparing against 0
the function call itself has a performance penalty
using iterators instead of explicit indexing can be faster (a foreach loop instead of a plain for)
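A sketch of the first point: counting down lets the loop compare against the constant 0 instead of re-reading a property or variable. Whether this measurably helps depends on the JIT and the machine, so treat it as a micro-optimization hypothesis rather than a guaranteed win.

```csharp
using System;
using System.Collections;

class CountDownLoop
{
    static bool DoesContain(ArrayList list, object element)
    {
        // list.Count is read once; the loop then compares i against 0.
        for (int i = list.Count - 1; i >= 0; i--)
            if (element.Equals(list[i])) return true;
        return false;
    }

    static void Main()
    {
        var list = new ArrayList { "a", "b", "c" };
        Console.WriteLine(DoesContain(list, "b")); // True
        Console.WriteLine(DoesContain(list, "z")); // False
    }
}
```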
First, if you are using types you know ahead of time, I'd suggest using generics: List<string> instead of ArrayList. Under the hood, ArrayList.Contains actually does a bit more than what you are doing. The following is from Reflector:
public virtual bool Contains(object item)
{
if (item == null)
{
for (int j = 0; j < this._size; j++)
{
if (this._items[j] == null)
{
return true;
}
}
return false;
}
for (int i = 0; i < this._size; i++)
{
if ((this._items[i] != null) && this._items[i].Equals(item))
{
return true;
}
}
return false;
}
Notice that it forks on being passed a null item. However, since none of the values in your example are null, the additional null checks at the beginning and inside the second loop should in theory make it take slightly longer.
Are you positive you are dealing with fully compiled code? I.e. when your code runs the first time it gets JIT-compiled, whereas the framework is obviously already compiled.
After your Edit, I copied the code and made a few improvements to it.
The difference was not reproducible; it turns out to be a measuring/rounding issue.
To see that, change your runs to this form:
sw.Reset();
sw.Start();
for (int i = 0; i < nr; i++)
{
DoesContain(list,"zzz");
}
total += sw.ElapsedMilliseconds;
Console.WriteLine(total / nr);
I just moved some lines. The JIT issue was insignificant with this number of repetitions.
My guess would be that ArrayList is written in C++ and could be taking advantage of some micro-optimizations (note: this is a guess).
For instance, in C++ you can use pointer arithmetic (specifically incrementing a pointer to iterate an array) to be faster than using an index.
Using an array structure, you can't search faster than O(n) without additional information.
If you know that the array is sorted, you can use the binary search algorithm and spend only O(log n).
Otherwise you should use a set.
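A minimal sketch of the sorted case using the built-in Array.BinarySearch, which returns the element's index when found and a negative number otherwise:

```csharp
using System;

class BinarySearchDemo
{
    static void Main()
    {
        int[] sorted = { 1, 3, 5, 7, 9 };   // must already be sorted

        // Non-negative result = index of the element.
        int index = Array.BinarySearch(sorted, 7);
        Console.WriteLine(index);            // prints 3

        // Negative result = not present.
        Console.WriteLine(Array.BinarySearch(sorted, 4) >= 0); // False
    }
}
```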
Revised after reading comments:
List<T> does not use a hash algorithm to enable fast lookup.
Use SortedList<TKey,TValue>, Dictionary<TKey, TValue> or System.Collections.ObjectModel.KeyedCollection<TKey, TValue> for fast access based on a key.
var list = new List<myObject>(); // Search is sequential
var dictionary = new Dictionary<myObject, myObject>(); // key based lookup, but no sequential lookup, Contains fast
var sortedList = new SortedList<myObject, myObject>(); // key based and sequential lookup, Contains fast
KeyedCollection<TKey, TValue> is also fast and allows indexed lookup, however, it needs to be inherited as it is abstract. Therefore, you need a specific collection. However, with the following you can create a generic KeyedCollection.
public class GenericKeyedCollection<TKey, TValue> : KeyedCollection<TKey, TValue> {
public GenericKeyedCollection(Func<TValue, TKey> keyExtractor) {
this.keyExtractor = keyExtractor;
}
private Func<TValue, TKey> keyExtractor;
protected override TKey GetKeyForItem(TValue value) {
return this.keyExtractor(value);
}
}
The advantage of using the KeyedCollection is that the Add method does not require that a key is specified.
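Usage might look like the sketch below. The Person type is made up purely for illustration, and the collection class is repeated from above so the sketch compiles on its own:

```csharp
using System;
using System.Collections.ObjectModel;

// Same GenericKeyedCollection as above, repeated for self-containment.
public class GenericKeyedCollection<TKey, TValue> : KeyedCollection<TKey, TValue>
{
    private readonly Func<TValue, TKey> keyExtractor;

    public GenericKeyedCollection(Func<TValue, TKey> keyExtractor)
    {
        this.keyExtractor = keyExtractor;
    }

    protected override TKey GetKeyForItem(TValue value)
    {
        return this.keyExtractor(value);
    }
}

// Hypothetical example type.
public class Person
{
    public string Name { get; private set; }
    public int Age { get; private set; }
    public Person(string name, int age) { Name = name; Age = age; }
}

class Demo
{
    static void Main()
    {
        // Add() takes just the value; the key comes from the extractor.
        var people = new GenericKeyedCollection<string, Person>(p => p.Name);
        people.Add(new Person("Ada", 36));
        people.Add(new Person("Alan", 41));

        Console.WriteLine(people["Ada"].Age); // key-based lookup, prints 36
        Console.WriteLine(people[1].Name);    // index-based lookup, prints Alan
    }
}
```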
I have been testing out the yield return statement with some of the code I have been writing. I have two methods:
public static IEnumerable<String> MyYieldCollection {
get
{
wrapper.RunCommand("Fetch First From Water_Mains");
for (int row = 0; row < tabinfo.GetNumberOfRows() ; row++) //GetNumberOfRows
//will return 1000+ most of the time.
{
yield return wrapper.Evaluate("Water_Mains.col1");
wrapper.RunCommand("Fetch Next From Water_Mains");
}
}
}
and
public static List<String> MyListCollection
{
get
{
List<String> innerlist = new List<String>();
wrapper.RunCommand("Fetch First From Water_Mains");
for (int row = 0; row < tabinfo.GetNumberOfRows(); row++)
{
innerlist.Add(wrapper.Evaluate("Water_Mains.col1"));
wrapper.RunCommand("Fetch Next From Water_Mains");
}
return innerlist;
}
}
then I use a foreach loop over each collection:
foreach (var item in MyYieldCollection) //Same thing for MyListCollection.
{
Console.WriteLine(item);
}
The funny thing is that for some reason I seem to be able to loop over and print out the full MyListCollection faster than the MyYieldCollection.
Results:
MyYieldCollection -> 2062
MyListCollection -> 1847
I can't really see a reason for this, am I missing something or is this normal?
How have you done your timings? Are you in the debugger? In debug mode? It looks like you are using DataTable, so I used your code as the template for a test rig (creating 1000 rows each time), and used the harness as below, in release mode at the command line; the results were as follows (the number in brackets is a check to see they both did the same work):
Yield: 2000 (5000000)
List: 2100 (5000000)
Test harness:
static void Main()
{
GC.Collect(GC.MaxGeneration,GCCollectionMode.Forced);
int count1 = 0;
var watch1 = Stopwatch.StartNew();
for(int i = 0 ; i < 5000 ; i++) {
foreach (var row in MyYieldCollection)
{
count1++;
}
}
watch1.Stop();
GC.Collect(GC.MaxGeneration,GCCollectionMode.Forced);
int count2 = 0;
var watch2 = Stopwatch.StartNew();
for (int i = 0; i < 5000; i++)
{
foreach (var row in MyListCollection)
{
count2++;
}
}
watch2.Stop();
Console.WriteLine("Yield: {0} ({1})", watch1.ElapsedMilliseconds, count1);
Console.WriteLine("List: {0} ({1})", watch2.ElapsedMilliseconds, count2);
}
(note you shouldn't normally use GC.Collect, but it has uses for levelling the field for performance tests)
The only other change I made was to the for loop, to avoid repetition:
int rows = tabinfo.Rows.Count;
for (int row = 0; row < rows; row++) {...}
So I don't reproduce your numbers...
What happens if one iteration of your loop is expensive and you only need to iterate over a few items in your collection?
With yield you only need to pay for what you get ;)
public IEnumerable<int> YieldInts()
{
for (int i = 0; i < 1000; i++)
{
Thread.Sleep(1000); // or do some other work
yield return i;
}
}
public void Main()
{
foreach(int i in YieldInts())
{
Console.WriteLine(i);
if(i == 42)
{
break;
}
}
}
My guess is that the JIT can better optimize the for loop in the version that returns the list. In the version that returns IEnumerable, the row variable used in the for loop is now actually a member of a generated class instead of a variable that is local only to the method.
The speed difference is only around 10%, so unless this is performance critical code I wouldn't worry about it.
As far as I understand it, "yield return" does not run the whole loop up front. The compiler turns the method into a state machine, and the body executes only as far as the next yield each time the consuming foreach asks for another item, so the work is interleaved with the loop rather than done once before anything inside the foreach executes.
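One way to check what yield return actually does is to put side effects in the iterator body; the body does not run until a foreach asks for an item:

```csharp
using System;
using System.Collections.Generic;

class LazyYieldDemo
{
    static IEnumerable<int> Numbers()
    {
        Console.WriteLine("iterator started");
        for (int i = 0; i < 3; i++)
        {
            Console.WriteLine("yielding " + i);
            yield return i;
        }
    }

    static void Main()
    {
        IEnumerable<int> seq = Numbers();
        Console.WriteLine("nothing printed yet"); // the body has not run

        foreach (int n in seq)        // now the body runs, one step per item
        {
            Console.WriteLine("got " + n);
            if (n == 1) break;        // remaining iterations never execute
        }
    }
}
```

Running this prints "nothing printed yet" first, then the interleaved "yielding"/"got" lines, confirming the lazy, interleaved execution.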
It could be down to the types of collection returned. Perhaps the List can be iterated over faster than whatever data structure the IEnumerable uses.