Accessing yield return collection - c#

Is there any way to access the IEnumerable<T> collection being built up by yield return in a loop from within the method building the IEnumerable itself?
Silly example:
Random random = new Random();
IEnumerable<int> UniqueRandomIntegers(int n, int max)
{
while ([RETURN_VALUE].Count() < n)
{
int value = random.Next(max);
if (![RETURN_VALUE].Contains(value))
yield return value;
}
}

There is no collection being built up. The sequence that is returned is evaluated lazily, and unless the caller explicitly copies the data to another collection, it will be gone as soon as it's been fetched.
If you want to ensure uniqueness, you'll need to do that yourself. For example:
IEnumerable<int> UniqueRandomIntegers(int n, int max)
{
HashSet<int> returned = new HashSet<int>();
for (int i = 0; i < n; i++)
{
int candidate;
do
{
candidate = random.Next(max);
} while (returned.Contains(candidate));
yield return candidate;
returned.Add(candidate);
}
}
Another alternative for unique random integers is to build a collection of max items and shuffle it, which can still be done just-in-time. This is more efficient in the case where max and n are similar (as you don't need to loop round until you're lucky enough to get a new item) but inefficient in the case where max is very large and n isn't.
EDIT: As noted in comments, you can shorten this slightly by changing the body of the for loop to:
int candidate;
do
{
candidate = random.Next(max);
} while (!returned.Add(candidate));
yield return candidate;
That uses the fact that Add will return false if the item already exists in the set.
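The shuffle alternative mentioned in this answer can be sketched roughly as follows. This is only an illustration under some assumptions: the class and method names are hypothetical, the Random instance is passed in rather than shared, and the values are drawn from the range [0, max). It fills a pool of max items once and then performs one Fisher-Yates swap per item requested, so the shuffling itself still happens just-in-time.

```csharp
using System;
using System.Collections.Generic;

public static class UniqueRandom
{
    // Hypothetical helper: yields n distinct integers from [0, max)
    // using a lazy (partial) Fisher-Yates shuffle.
    public static IEnumerable<int> UniqueRandomIntegersByShuffle(int n, int max, Random random)
    {
        if (random == null)
            throw new ArgumentNullException(nameof(random));
        if (n < 0 || n > max)
            throw new ArgumentOutOfRangeException(nameof(n), "n must be between 0 and max.");
        return Iterate(n, max, random);
    }

    private static IEnumerable<int> Iterate(int n, int max, Random random)
    {
        // Build the pool 0, 1, ..., max - 1 up front: this costs O(max) memory.
        int[] pool = new int[max];
        for (int i = 0; i < max; i++)
            pool[i] = i;

        for (int i = 0; i < n; i++)
        {
            // Pick a random slot from the part of the pool not yet returned
            // and swap it into position i before yielding it.
            int j = random.Next(i, max);
            int picked = pool[j];
            pool[j] = pool[i];
            pool[i] = picked;
            yield return picked;
        }
    }
}
```

Each requested item costs exactly one swap, so there is no retry loop, but the pool allocation is what makes this a poor fit when max is very large and n is small.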


Performance of Skip and Take in Linq to Objects

"Searching for alternative functionalities for "Skip" and "Take" functionalities"
Link 1 says: "Every time you invoke Skip() it will have to iterate your collection from the beginning in order to skip the number of elements you desire, which gives a loop within a loop (n² behaviour)."
Conclusion: For large collections, don’t use Skip and Take. Find another way to iterate through your collection and divide it.
To access the last page of data in a huge collection, can you suggest an approach other than Skip and Take?
Looking at the source for Skip, you can see it enumerates over all the items, even over the first n items you want to skip.
It's strange though, because several LINQ-methods have optimizations for collections, like Count and Last.
Skip apparently does not.
If you have an array or IList<T>, you use the indexer to truly skip over them:
for (int i = skipStartIndex; i < list.Count; i++) {
yield return list[i];
}
Internally, the implementation is correct for a general IEnumerable<T>:
private static IEnumerable<TSource> SkipIterator<TSource>(IEnumerable<TSource> source, int count)
{
using (IEnumerator<TSource> enumerator = source.GetEnumerator())
{
while (count > 0 && enumerator.MoveNext())
--count;
if (count <= 0)
{
while (enumerator.MoveNext())
yield return enumerator.Current;
}
}
}
If you want to skip over an IEnumerable<T>, this works correctly: there is no way other than enumeration to reach a specific element. But you can write your own extension method on IReadOnlyList<T> or IList<T> (if the collection holding your elements implements one of those interfaces).
public static class IReadOnlyListExtensions
{
    public static IEnumerable<T> Skip<T>(this IReadOnlyList<T> collection, int count)
    {
        if (collection == null)
            return null;
        // Call the iterator defined in this class; a negative count is treated as zero.
        return YieldSkip(collection, System.Math.Max(count, 0));
    }

    private static IEnumerable<T> YieldSkip<T>(IReadOnlyList<T> collection, int count)
    {
        // Start directly at the first wanted index instead of enumerating the skipped items.
        for (int index = count; index < collection.Count; index++)
        {
            yield return collection[index];
        }
    }
}
In addition you can implement it for IEnumerable<T> but check inside for optimization:
if (collection is IReadOnlyList<T>)
{
// do optimized skip
}
This kind of type check is used in many places in the LINQ source code (but unfortunately not in Skip).
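A minimal sketch of that type-check idea, assuming a hypothetical extension method named SkipFast (it is not part of LINQ) and checking for IList<T>, which arrays and List<T> both implement:

```csharp
using System;
using System.Collections.Generic;

public static class SkipExtensions
{
    // Hypothetical name. Uses the indexer when the source is an IList<T>
    // and falls back to plain enumeration otherwise.
    public static IEnumerable<T> SkipFast<T>(this IEnumerable<T> source, int count)
    {
        if (source == null)
            throw new ArgumentNullException(nameof(source));

        IList<T> list = source as IList<T>;
        return list != null ? SkipList(list, count) : SkipSequence(source, count);
    }

    private static IEnumerable<T> SkipList<T>(IList<T> list, int count)
    {
        // Jump straight to the first wanted index; nothing before it is touched.
        for (int i = Math.Max(count, 0); i < list.Count; i++)
            yield return list[i];
    }

    private static IEnumerable<T> SkipSequence<T>(IEnumerable<T> source, int count)
    {
        // Same shape as the SkipIterator shown above: the skipped items
        // still have to be enumerated because there is no indexer.
        using (IEnumerator<T> e = source.GetEnumerator())
        {
            while (count > 0 && e.MoveNext())
                count--;
            if (count <= 0)
            {
                while (e.MoveNext())
                    yield return e.Current;
            }
        }
    }
}
```

One trade-off of the indexed path is that it does not notice if the list is modified while it is being enumerated, whereas a list's own enumerator would throw in that situation.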
Depends on your implementation, but it would make sense to use indexed arrays for the purpose, instead.

List is taking too much time

I have been writing a program that has a list of 100,000 elements, and I have to process all the elements against different conditions. This does not take much time, 3 seconds at most. After this I have a list of valid entries and my original list, which had 100,000 elements. The new list usually has a size of 6K - 7K. The main problem is that when I use the List.Remove function, or any other way to remove the invalid elements from the original list with 100K elements, it is too slow.
Please advise whether I should use something other than a List, or whether there is something I can do with this code.
I am including all codes I tried.
for( int k = 0; k < initialList.Count;k++)
{
combo c = initialList.ElementAt(k);
if(invalidEntries.Contains(c))
{
smartString.Append(c.number1.ToString());
smartString.Append(c.number2.ToString());
smartString.Append(c.number3.ToString());
smartString.Append(c.number4.ToString());
smartString.Append(c.number5.ToString());
smartString.Append(" Sum : ");
smartString.Append(c.sum.ToString());
smartString.AppendLine();
InvalidCombo.AppendText(smartString.ToString());
smartString.Clear();
}
else
{
smartString.Append(c.number1.ToString());
smartString.Append(c.number2.ToString());
smartString.Append(c.number3.ToString());
smartString.Append(c.number4.ToString());
smartString.Append(c.number5.ToString());
smartString.Append(" Sum : ");
smartString.Append(c.sum.ToString());
smartString.AppendLine();
validCombo.AppendText(smartString.ToString());
smartString.Clear();
}
}
Also
for(int k=0;k<100000;k++)
{
combo c = initialList.ElementAt(k);
if (!invalidEntries.Contains(c))
validEntries.Add(c);
}
I have also tried the Remove functions, but I think the list can't handle it. Any suggestions or solutions?
I'm a big fan of the structs, but you must be very careful when you work with a struct like yours. The List<T> methods that rely on equality (Contains, IndexOf, Remove) may not work and should not be used. Same for HashSet<T> and similar.
The best for your case would be to combine the processing with the removal. And the fastest way to do a removal from a List<T> is to not use its item removal methods (Remove/RemoveAt)! :-) Instead, you "compact" the list by keeping the items that should remain (and their count) at the beginning of the list, and then just use the RemoveRange method to cut the unnecessary items at the end of the list. This is very efficient and avoids all the data block moving which happens when you use the "normal" list remove methods. Here is some sample code based on your struct definition:
public struct combo { public int number1; public int number2; public int number3; public int number4; public int number5; public int sum; public bool invalid; }
void ProcessList(List<combo> list)
{
int count = 0;
for (int i = 0; i < list.Count; i++)
{
var item = list[i];
ProcessItem(ref item);
if (!item.invalid) list[count++] = item;
}
list.RemoveRange(count, list.Count - count);
}
void ProcessItem(ref combo item)
{
// do the processing and set item.invalid=true/false
}
In case you are not mutating the item inside the ProcessItem, you can remove the ref modifier, change the return type to bool and use it to control whether the item should be removed from the list or not.
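A minimal sketch of that bool-returning variant, written generically. The names ListCompaction and KeepWhere are hypothetical, and the delegate stands in for a ProcessItem that returns true when the item should stay:

```csharp
using System;
using System.Collections.Generic;

public static class ListCompaction
{
    // Keeps only the items for which keep(item) returns true, in a single pass.
    public static void KeepWhere<T>(List<T> list, Func<T, bool> keep)
    {
        if (list == null) throw new ArgumentNullException(nameof(list));
        if (keep == null) throw new ArgumentNullException(nameof(keep));

        int count = 0;
        for (int i = 0; i < list.Count; i++)
        {
            T item = list[i];
            if (keep(item))
                list[count++] = item;   // move each survivor towards the front
        }

        // Cut off the leftover tail in one operation.
        list.RemoveRange(count, list.Count - count);
    }
}
```

The built-in List<T>.RemoveAll method takes a predicate selecting the items to remove, so list.RemoveAll(x => !keep(x)) is another way to get the same result when the processing does not need to mutate the items.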
Here is an example of using HashSet. It is very fast.
using System.Collections.Generic;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
var myInts = new HashSet<int>();
for (var i = 0; i < 100000; i++)
myInts.Add(i);
myInts.Remove(62345);
}
}
}

Read skipped data

I've got an IEnumerable of T. I wish to skip a certain number of T, but in the process, I also wish to read these T. You can read the Ts with Take() and skip them with Skip(), but that would entail multiple enumerations.
I need to read N items, deal with them, and then get all the unread items back as an IEnumerable, in one enumeration.
Edit: I'm trying to feed an IEnumerable to a method that takes a Stream-alike. Namely, I must implement only the method
public int Read(T[] readBuffer, int offset, int count)
The problem is that I need to advance the enumerable past the read data to store the position and also read the data to pass back out in the input buffer.
So far I've tried this:
public static IEnumerable<T> SkipTake<T>(this IEnumerable<T> input, int num, Action<List<T>> take)
{
var enumerator = input.GetEnumerator();
var chunk = new List<T>();
for (int i = 0; i < num; ++num)
{
chunk.Add(enumerator.Current);
if (!enumerator.MoveNext())
break;
}
take(chunk);
yield return enumerator.Current;
while (enumerator.MoveNext())
yield return enumerator.Current;
}
Not much luck.
Seems like your implementation does not call MoveNext() at the right time. You must call MoveNext() before you can get the Current element:
public static IEnumerable<T> SkipTake<T>(this IEnumerable<T> input, int num, Action<List<T>> take)
{
var enumerator = input.GetEnumerator();
var chunk = new List<T>();
for (int i = 0; i < num; ++i) // increment i, not num, so the loop stops after num items
{
if (!enumerator.MoveNext())
break;
chunk.Add(enumerator.Current);
}
take(chunk);
while (enumerator.MoveNext())
yield return enumerator.Current;
}
EDIT: Just to make it clear, here's a usage example:
var list = new List<string>() {"This", "is", "an", "example", "!"};
var res = list.SkipTake(2, chunk =>
{
Console.WriteLine(chunk.Count());
});
Console.WriteLine(res.Count());
The output is
2
3
and the collections contain
{"This", "is"}
and
{"an", "example", "!"}
respectively and the original collection list was not modified.
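For the stream-like Read method mentioned in the question's edit, one possible shape is to hold on to a single enumerator and advance it only as far as each call needs. This is a sketch under assumptions: the wrapper class name EnumerableReader is hypothetical, and returning 0 at the end of the sequence follows the convention of Stream.Read.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper: exposes an IEnumerable<T> through a Stream-like Read method.
public sealed class EnumerableReader<T> : IDisposable
{
    // One enumerator is kept for the lifetime of the reader, so its
    // position is remembered between calls to Read.
    private readonly IEnumerator<T> enumerator;

    public EnumerableReader(IEnumerable<T> source)
    {
        if (source == null)
            throw new ArgumentNullException(nameof(source));
        enumerator = source.GetEnumerator();
    }

    // Copies up to count items into readBuffer starting at offset and returns
    // how many were copied (0 once the sequence is exhausted).
    public int Read(T[] readBuffer, int offset, int count)
    {
        if (readBuffer == null)
            throw new ArgumentNullException(nameof(readBuffer));
        if (offset < 0 || count < 0 || count > readBuffer.Length - offset)
            throw new ArgumentException("offset and count must describe a range inside readBuffer.");

        int read = 0;
        // MoveNext is only called while more items are wanted, so nothing is
        // consumed beyond what ends up in the buffer.
        while (read < count && enumerator.MoveNext())
        {
            readBuffer[offset + read] = enumerator.Current;
            read++;
        }
        return read;
    }

    public void Dispose()
    {
        enumerator.Dispose();
    }
}
```

Because the source is enumerated exactly once, each call continues where the previous one stopped, which is the "advance past the read data" behaviour the question asks for.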

Writing the implementation for List<T> using IEnumerable<T> from scratch

Out of boredom I decided to write the implementation of List from scratch using IEnumerable. I ran into a few issues that I honestly don't know how to solve:
1. How would you resize a generic array (T[]) when an index is nulled or set to default(T)?
2. Since you cannot null T, how do you overcome the numerical primitive problem with their values being 0 by default?
3. If nothing can be done regarding #2, how do you stop the GetEnumerator() method from yield returning 0 when utilizing a numerical data type?
4. Last but not least, what is the standard practice regarding downsizing an array? I know for certain that one of the best solutions for upsizing is to increase the current length by a power of 2; if and when do you downsize? Per Remove/RemoveAt or by the currently used length % 2?
Here's what I've done so far:
public class List<T> : IEnumerable<T>
{
T[] list = new T[32];
int current;
public void Add(T item)
{
if (current + 1 > list.Length)
{
T[] temp = new T[list.Length * 2];
Array.Copy(list, temp, list.Length);
list = temp;
}
list[current] = item;
current++;
}
public void Remove(T item)
{
for (int i = 0; i < list.Length; i++)
if (list[i].Equals(item))
list[i] = default(T);
}
public void RemoveAt(int index)
{
list[index] = default(T);
}
public IEnumerator<T> GetEnumerator()
{
foreach (T item in list)
if (item != null && !item.Equals(default(T)))
yield return item;
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
foreach (T item in list)
if (item != null && !item.Equals(default(T)))
yield return item;
}
}
Thanks in advance.
Well, for starters, your Remove and RemoveAt methods do not implement the same behavior as List<T>. List<T> will decrease in size by 1, whereas your List will remain constant in size. You should be shifting the values of higher index from the removed object to one lower index.
Also, GetEnumerator should iterate over every item in the list, regardless of what its value is.
I believe that will solve all of the issues you have. If someone adds a default(T) to the list, then a default(T) is what they will get back out again, regardless if T is an int and thus 0 or a class-type and thus null.
Finally, on downsizing: some growable array implementations rationalize that, if the array had ever gotten so big, then it is more likely than usual to get that big again. For that reason, they specifically avoid downsizing.
The key problem you're running into is maintaining the internal array and what remove does. List<T> does not support partial arrays internally. That doesn't mean you can't, but doing so is far more complicated. To exactly mimic List<T> you want to keep an array and a field for the number of elements in the array that are actually utilized (the list length, which is equal to or less than array length).
Add is easy, you add an element to the end like you did.
Remove is more complicated. If you are removing an element from the end, set the end element to default(T) and change the list length. If you are removing an element from the beginning or middle, then you need to shift the contents of the array and set the last one to default(T). The reason we set the last element to default(T) is to clear the reference, not so we can tell whether or not it's "in use". We know if it's "in use" based on the position in the array and our stored list length.
Another key to implementation is the enumerator. You want to loop through the first elements until you hit the list length. Don't skip nulls.
The code at the end of this answer is not a complete implementation, but it should be a correct implementation of the methods you started.
btw, I would not agree with
I know for certain that the best solution for upsizing is to increase the current length by a power of 2
This is the default behavior of List<T> but it's not the best solution in all situations. That's exactly why List<T> allows you to specify a capacity. If you're loading a list from a source and know how many items you're adding, then you can pre-initialize the capacity of the list to reduce the number of copies. Similarly, if you're creating hundreds or thousands of lists that are larger than the default size or likely to be larger, it can be a benefit to memory utilization to pre-initialize the lists to be the same size. That way the memory they allocate and free will be the same continuous blocks and can be more efficiently allocated and deallocated repeatedly. For example, we have a reporting calculation engine that creates about 300,000 lists for each run, with many runs a second. We know the lists are always a few hundred items each, so we pre-initialize them all to 1024 capacity. This is more than most need, but since they're all the same length and they're created and disposed of very quickly, this makes memory reusage efficient.
public class MyList<T> : IEnumerable<T>
{
T[] list = new T[32];
int listLength;
public void Add(T item)
{
if (listLength + 1 > list.Length)
{
T[] temp = new T[list.Length * 2];
Array.Copy(list, temp, list.Length);
list = temp;
}
list[listLength] = item;
listLength++;
}
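public void Remove(T item)
{
// Search only the slots that are in use; the default comparer also copes with null entries.
for (int i = 0; i < listLength; i++)
if (EqualityComparer<T>.Default.Equals(list[i], item))
{
RemoveAt(i);
return;
}
}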
public void RemoveAt(int index)
{
if (index < 0 || index >= listLength)
{
throw new ArgumentException("'index' must be between 0 and list length.");
}
if (index == listLength - 1)
{
list[index] = default(T);
listLength = index;
return;
}
// need to shift the list
Array.Copy(list, index + 1, list, index, listLength - index - 1); // move only the items after the removed one
listLength--;
list[listLength] = default(T);
}
public IEnumerator<T> GetEnumerator()
{
for (int i = 0; i < listLength; i++)
{
yield return list[i];
}
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
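The capacity point made in this answer can be illustrated with the built-in List<T>, which accepts an initial capacity in its constructor and exposes a TrimExcess method for downsizing on request. A small sketch; the CapacityDemo class and its method names are hypothetical:

```csharp
using System;
using System.Collections.Generic;

public static class CapacityDemo
{
    // Loads a known number of items into a list that was sized up front,
    // so Add never needs to allocate a bigger array and copy the contents.
    public static List<int> LoadKnownCount(int itemCount)
    {
        var list = new List<int>(itemCount);
        for (int i = 0; i < itemCount; i++)
            list.Add(i);
        return list;
    }

    // Downsizing on explicit request: removing items does not shrink the
    // backing array by itself, so the caller asks for it when it matters.
    public static void RemoveLastHalfAndShrink(List<int> list)
    {
        int half = list.Count / 2;
        list.RemoveRange(half, list.Count - half);
        list.TrimExcess();
    }
}
```

This mirrors the design described above: growth is automatic, while shrinking is left to the caller, who knows whether the list is likely to grow again.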

Get next N elements from enumerable

Context: C# 3.0, .Net 3.5
Suppose I have a method that generates random numbers (forever):
private static IEnumerable<int> RandomNumberGenerator() {
while (true) yield return GenerateRandomNumber(0, 100);
}
I need to group those numbers in groups of 10, so I would like something like:
foreach (IEnumerable<int> group in RandomNumberGenerator().Slice(10)) {
Assert.That(group.Count() == 10);
}
I have defined Slice method, but I feel there should be one already defined. Here is my Slice method, just for reference:
private static IEnumerable<T[]> Slice<T>(IEnumerable<T> enumerable, int size) {
var result = new List<T>(size);
foreach (var item in enumerable) {
result.Add(item);
if (result.Count == size) {
yield return result.ToArray();
result.Clear();
}
}
}
Question: is there an easier way to accomplish what I'm trying to do? Perhaps Linq?
Note: above example is a simplification, in my program I have an Iterator that scans given matrix in a non-linear fashion.
EDIT: Why Skip+Take is no good.
Effectively what I want is:
var group1 = RandomNumberGenerator().Skip(0).Take(10);
var group2 = RandomNumberGenerator().Skip(10).Take(10);
var group3 = RandomNumberGenerator().Skip(20).Take(10);
var group4 = RandomNumberGenerator().Skip(30).Take(10);
without the overhead of regenerating numbers (10+20+30+40) times. I need a solution that will generate exactly 40 numbers and break those into 4 groups of 10.
Are Skip and Take of any use to you?
Use a combination of the two in a loop to get what you want.
So,
list.Skip(10).Take(10);
Skips the first 10 records and then takes the next 10.
I have done something similar. But I would like it to be simpler:
//Remove "this" if you don't want it to be a extension method
public static IEnumerable<IList<T>> Chunks<T>(this IEnumerable<T> xs, int size)
{
var curr = new List<T>(size);
foreach (var x in xs)
{
curr.Add(x);
if (curr.Count == size)
{
yield return curr;
curr = new List<T>(size);
}
}
}
I think yours are flawed. You return the same array for all your chunks/slices so only the last chunk/slice you take would have the correct data.
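One detail worth noting about the list-based Chunks method above: if a finite source does not divide evenly by size, the trailing partial chunk is never yielded. That does not matter for an endless generator, but for finite input a variant along these lines keeps the remainder. The name ChunksWithRemainder is hypothetical, and the argument checks are an addition:

```csharp
using System;
using System.Collections.Generic;

public static class ChunkExtensions
{
    // Hypothetical variant of Chunks that also returns a final, shorter chunk.
    public static IEnumerable<IList<T>> ChunksWithRemainder<T>(this IEnumerable<T> xs, int size)
    {
        if (xs == null)
            throw new ArgumentNullException(nameof(xs));
        if (size <= 0)
            throw new ArgumentOutOfRangeException(nameof(size));
        return Iterate(xs, size);
    }

    private static IEnumerable<IList<T>> Iterate<T>(IEnumerable<T> xs, int size)
    {
        var curr = new List<T>(size);
        foreach (var x in xs)
        {
            curr.Add(x);
            if (curr.Count == size)
            {
                yield return curr;
                curr = new List<T>(size);   // fresh list, so earlier chunks stay intact
            }
        }

        // Whatever is left over forms the last, shorter chunk.
        if (curr.Count > 0)
            yield return curr;
    }
}
```

The source is still read only once and only as far as the caller enumerates, so this works on an endless sequence as well.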
Addition: Array version:
public static IEnumerable<T[]> Chunks<T>(this IEnumerable<T> xs, int size)
{
var curr = new T[size];
int i = 0;
foreach (var x in xs)
{
curr[i % size] = x;
if (++i % size == 0)
{
yield return curr;
curr = new T[size];
}
}
}
Addition: Linq version (not C# 2.0). As pointed out, it will not work on infinite sequences and will be a great deal slower than the alternatives:
public static IEnumerable<T[]> Chunks<T>(this IEnumerable<T> xs, int size)
{
return xs.Select((x, i) => new { x, i })
.GroupBy(xi => xi.i / size, xi => xi.x)
.Select(g => g.ToArray());
}
Using Skip and Take would be a very bad idea. Calling Skip on an indexed collection may be fine, but calling it on any arbitrary IEnumerable<T> is liable to result in enumeration over the number of elements skipped, which means that if you're calling it repeatedly you're enumerating over the sequence an order of magnitude more times than you need to be.
Complain of "premature optimization" all you want; but that is just ridiculous.
I think your Slice method is about as good as it gets. I was going to suggest a different approach that would provide deferred execution and obviate the intermediate array allocation, but that is a dangerous game to play (i.e., if you try something like ToList on such a resulting IEnumerable<T> implementation, without enumerating over the inner collections, you'll end up in an endless loop).
(I've removed what was originally here, as the OP's improvements since posting the question have since rendered my suggestions here redundant.)
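The cost of repeated Skip and Take calls can be measured with a source that counts how many items it produces. The sketch below is only an illustration: the EnumerationCost class, its counter, and both method names are hypothetical, and the Skip/Take figure is measured rather than assumed because it depends on the LINQ implementation in use.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerationCost
{
    // How many items the source has produced since the last reset.
    public static int Generated;

    // Endless source that counts each item it hands out.
    public static IEnumerable<int> CountingSource()
    {
        int next = 0;
        while (true)
        {
            Generated++;
            yield return next++;
        }
    }

    // Four groups of ten via Skip/Take: every group restarts the source
    // and walks over the skipped items again.
    public static int CostOfSkipTake()
    {
        Generated = 0;
        for (int g = 0; g < 4; g++)
            CountingSource().Skip(g * 10).Take(10).ToList();
        return Generated;
    }

    // Four groups of ten via a single pass that buffers ten items at a time.
    public static int CostOfSinglePass()
    {
        Generated = 0;
        var groups = new List<List<int>>();
        var current = new List<int>(10);
        foreach (int x in CountingSource())
        {
            current.Add(x);
            if (current.Count == 10)
            {
                groups.Add(current);
                if (groups.Count == 4)
                    break;              // stop right after the 40th item
                current = new List<int>(10);
            }
        }
        return Generated;
    }
}
```

The single pass produces exactly 40 items, while the Skip/Take loop has to produce the skipped items again for every group, which is the overhead the question's edit describes.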
Let's see if you even need the complexity of Slice. If your random number generator is stateless, I would assume each call to it would generate unique random numbers, so perhaps this would be sufficient:
var group1 = RandomNumberGenerator().Take(10);
var group2 = RandomNumberGenerator().Take(10);
var group3 = RandomNumberGenerator().Take(10);
var group4 = RandomNumberGenerator().Take(10);
Each call to Take returns a new group of 10 numbers.
Now, if your random number generator re-seeds itself with a specific value each time it's iterated, this won't work. You'll simply get the same 10 values for each group. So instead, you would use:
var generator = RandomNumberGenerator();
var group1 = generator.Take(10);
var group2 = generator.Take(10);
var group3 = generator.Take(10);
var group4 = generator.Take(10);
This maintains an instance of the generator so that you can continue retrieving values without re-seeding the generator.
You could use the Skip and Take methods with any Enumerable object.
For your edit :
How about a function that takes a slice number and a slice size as a parameter?
private static IEnumerable<T> Slice<T>(IEnumerable<T> enumerable, int sliceSize, int sliceNumber) {
return enumerable.Skip(sliceSize * sliceNumber).Take(sliceSize);
}
It seems like we'd prefer for an IEnumerable<T> to have a fixed position counter so that we can do
var group1 = items.Take(10);
var group2 = items.Take(10);
var group3 = items.Take(10);
var group4 = items.Take(10);
and get successive slices rather than getting the first 10 items each time. We can do that with a new implementation of IEnumerable<T> which keeps one instance of its Enumerator and returns it on every call of GetEnumerator:
public class StickyEnumerable<T> : IEnumerable<T>, IDisposable
{
private IEnumerator<T> innerEnumerator;
public StickyEnumerable( IEnumerable<T> items )
{
innerEnumerator = items.GetEnumerator();
}
public IEnumerator<T> GetEnumerator()
{
return innerEnumerator;
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return innerEnumerator;
}
public void Dispose()
{
if (innerEnumerator != null)
{
innerEnumerator.Dispose();
}
}
}
Given that class, we could implement Slice with
public static IEnumerable<IEnumerable<T>> Slices<T>(this IEnumerable<T> items, int size)
{
using (StickyEnumerable<T> sticky = new StickyEnumerable<T>(items))
{
IEnumerable<T> slice;
do
{
slice = sticky.Take(size).ToList();
yield return slice;
} while (slice.Count() == size);
}
yield break;
}
That works in this case, but StickyEnumerable<T> is generally a dangerous class to have around if the consuming code isn't expecting it. For example,
using (var sticky = new StickyEnumerable<int>(Enumerable.Range(1, 10)))
{
var first = sticky.Take(2);
var second = sticky.Take(2);
foreach (int i in second)
{
Console.WriteLine(i);
}
foreach (int i in first)
{
Console.WriteLine(i);
}
}
prints
1
2
3
4
rather than
3
4
1
2
Take a look at Take(), TakeWhile() and Skip()
I think the use of Slice() would be a bit misleading. I think of that as a means to give me a chunk of an array as a new array without causing side effects. In this scenario you would actually move the enumerable forward 10.
A possible better approach is to just use the Linq extension Take(). I don't think you would need to use Skip() with a generator.
Edit: Dang, I have been trying to test this behavior with the following code
Note: this wasn't really correct; I leave it here so others don't fall into the same mistake.
var numbers = RandomNumberGenerator();
var slice = numbers.Take(10);
public static IEnumerable<int> RandomNumberGenerator()
{
yield return random.Next();
}
but the Count() for slice is always 1. I also tried running it through a foreach loop, since I know that the Linq extensions are generally lazily evaluated, and it only looped once. I eventually did the code below instead of the Take() and it works:
public static IEnumerable<int> Slice(this IEnumerable<int> enumerable, int size)
{
var list = new List<int>();
foreach (var count in Enumerable.Range(0, size)) list.Add(enumerable.First());
return list;
}
If you notice I am adding the First() to the list each time, but since the enumerable that is being passed in is the generator from RandomNumberGenerator() the result is different every time.
So again with a generator using Skip() is not needed since the result will be different. Looping over an IEnumerable is not always side effect free.
Edit: I'll leave the last edit just so no one falls into the same mistake, but it worked fine for me just doing this:
var numbers = RandomNumberGenerator();
var slice1 = numbers.Take(10);
var slice2 = numbers.Take(10);
The two slices were different.
I had made some mistakes in my original answer but some of the points still stand. Skip() and Take() are not going to work the same with a generator as it would a list. Looping over an IEnumerable is not always side effect free. Anyway here is my take on getting a list of slices.
public static IEnumerable<int> RandomNumberGenerator()
{
while(true) yield return random.Next();
}
public static IEnumerable<IEnumerable<int>> Slice(this IEnumerable<int> enumerable, int size, int count)
{
var slices = new List<List<int>>();
foreach (var iteration in Enumerable.Range(0, count)){
var list = new List<int>();
list.AddRange(enumerable.Take(size));
slices.Add(list);
}
return slices;
}
I got this solution for the same problem:
int[] ints = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
IEnumerable<IEnumerable<int>> chunks = Chunk(ints, 2, t => t.Dump());
//won't enumerate, so won't do anything unless you force it:
chunks.ToList();
IEnumerable<T> Chunk<T, R>(IEnumerable<R> src, int n, Func<IEnumerable<R>, T> action){
IEnumerable<R> head;
IEnumerable<R> tail = src;
while (tail.Any())
{
head = tail.Take(n);
tail = tail.Skip(n);
yield return action(head);
}
}
If you just want the chunks returned without doing anything with them, use chunks = Chunk(ints, 2, t => t). What I would really like is to have t => t as the default action, but I haven't found out how to do that yet.
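One way to get the effect of a default action is an overload, since an optional parameter's default value has to be a compile-time constant and a lambda is not one. A sketch, assuming both methods live in a hypothetical static class named Chunker; the first method repeats the Chunk method above and the second forwards to it with t => t:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Chunker
{
    // Same shape as the Chunk method above. Note that it re-enumerates src
    // through the growing chain of Skip calls, and that it never terminates
    // for n <= 0 because the tail would never get shorter.
    public static IEnumerable<T> Chunk<T, R>(IEnumerable<R> src, int n, Func<IEnumerable<R>, T> action)
    {
        IEnumerable<R> tail = src;
        while (tail.Any())
        {
            IEnumerable<R> head = tail.Take(n);
            tail = tail.Skip(n);
            yield return action(head);
        }
    }

    // Overload standing in for a default action of t => t.
    public static IEnumerable<IEnumerable<R>> Chunk<R>(IEnumerable<R> src, int n)
    {
        if (n <= 0)
            throw new ArgumentOutOfRangeException(nameof(n));
        return Chunk<IEnumerable<R>, R>(src, n, t => t);
    }
}
```

With the overload in place, Chunker.Chunk(ints, 2) returns the chunks themselves, while the three-argument form still accepts a custom action.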
