Can someone come up with a better version of this enumerator? - c#

I'm pretty happy with the following method. It takes an enumerable and a list of sorted, disjoint ranges and skips items not in the ranges. If the ranges are null, we just walk every item. The enumerable and the list of ranges are both possibly large. We want this method to be as high performance as possible.
Can someone think of a more elegant piece of code? I'm primarily interested in C# implementations, but if someone has a three-character APL implementation, that's cool too.
public static IEnumerable<T> WalkRanges<T>(IEnumerable<T> source, List<Pair<int, int>> ranges)
{
Debug.Assert(ranges == null || ranges.Count > 0);
int currentItem = 0;
Pair<int, int> currentRange = new Pair<int, int>();
int currentRangeIndex = -1;
bool betweenRanges = false;
if (ranges != null)
{
currentRange = ranges[0];
currentRangeIndex = 0;
betweenRanges = currentRange.First > 0;
}
foreach (T item in source)
{
if (ranges != null) {
if (betweenRanges) {
if (currentItem == currentRange.First)
betweenRanges = false;
else {
currentItem++;
continue;
}
}
}
yield return item;
if (ranges != null) {
if (currentItem == currentRange.Second) {
if (currentRangeIndex == ranges.Count - 1)
break; // We just visited the last item in the ranges
currentRangeIndex = currentRangeIndex + 1;
currentRange = ranges[currentRangeIndex];
betweenRanges = true;
}
}
currentItem++;
}
}

Maybe use linq on your source something like:
public static IEnumerable<T> WalkRanges<T>(IEnumerable<T> source, List<Pair<int, int>> ranges)
{
if(ranges == null)
return null;
return source.Where((item, index) => ranges.Any(y => y.First < index && y.Second > index)).AsEnumerable();
}
I don't have my Windows PC in front of me and I'm not sure I understood your code correctly, but I tried to understand your text instead and the code above could work.... or something like it.
UPDATED: Regarding the performance issue I would recommend you to test the performance with some simple test and time both of the functions.

You could copy the source list to an array and then for each range, you can block copy from your new source array to a target array in the proper location. If you can get your source collection passed in as an array, that would make this an even better approach. If you do have to do the initial copy, it is O(N) for that operation plus O(M) where M is the total number of items in the final array. So it ends up coming out to O(N) in either case.

Here's my take. I find it easier to understand, if not more elegant.
public static IEnumerable<T> WalkRanges<T>(IEnumerable<T> source, List<Tuple<int, int>> ranges)
{
if (ranges == null)
return source;
Debug.Assert(ranges.Count > 0);
return WalkRangesInternal(source, ranges);
}
static IEnumerable<T> WalkRangesInternal<T>(IEnumerable<T> source, List<Tuple<int, int>> ranges)
{
int currentItem = 0;
var rangeEnum = ranges.GetEnumerator();
bool moreData = rangeEnum.MoveNext();
using (var sourceEnum = source.GetEnumerator())
while (moreData)
{
// skip over every item in the gap between ranges
while (currentItem < rangeEnum.Current.Item1
&& (moreData = sourceEnum.MoveNext()))
currentItem++;
// yield all the elements in the range
while (currentItem <= rangeEnum.Current.Item2
&& (moreData = sourceEnum.MoveNext()))
{
yield return sourceEnum.Current;
currentItem++;
}
// advance to the next range
moreData = rangeEnum.MoveNext();
}
}

How about this (untested)? Should have pretty similar performance characteristics (pure streaming, no unnecessary buffering, quick exit), but is easier to follow, IMO:
public static IEnumerable<T> WalkRanges<T>(IEnumerable<T> source,
List<Pair<int, int>> ranges)
{
if (source == null)
throw new ArgumentNullException("source");
// If ranges is null, just return the source. From spec.
return ranges == null ? source : RangeIterate(source, ranges);
}
private static IEnumerable<T> RangeIterate<T>(IEnumerable<T> source,
List<Pair<int, int>> ranges)
{
// The key bit: a lazy sequence of all valid indices belonging to
// each range. No buffering.
var validIndices = from range in ranges
let start = Math.Max(0, range.First)
from validIndex in Enumerable.Range(start, range.Second - start + 1)
select validIndex;
int currentIndex = -1;
using (var indexErator = validIndices.GetEnumerator())
{
// Optimization: Get out early if there are no ranges.
if (!indexErator.MoveNext())
yield break;
foreach (var item in source)
{
if (++currentIndex == indexErator.Current)
{
// Valid index, yield.
yield return item;
// Move to the next valid index.
// Optimization: get out early if there aren't any more.
if (!indexErator.MoveNext())
yield break;
}
}
}
}
If you don't mind buffering indices, you can do something like this, which is even more clearer, IMO:
public static IEnumerable<T> WalkRanges<T>(IEnumerable<T> source,
List<Pair<int, int>> ranges)
{
if (source == null)
throw new ArgumentNullException("source");
if (ranges == null)
return source;
// Optimization: Get out early if there are no ranges.
if (!ranges.Any())
return Enumerable.Empty<T>();
var validIndices = from range in ranges
let start = Math.Max(0, range.First)
from validIndex in Enumerable.Range(start, range.Second - start + 1)
select validIndex;
// Buffer the valid indices into a set.
var validIndicesSet = new HashSet<int>(validIndices);
// Optimization: don't take an item beyond the last index of the last range.
return source.Take(ranges.Last().Second + 1)
.Where((item, index) => validIndicesSet.Contains(index));
}

You could iterate over the collection manually to prevent the enumerator from getting the current item when it will be skipped:
public static IEnumerable<T> WalkRanges<T>(IEnumerable<T> source, List<Pair<int, int>> ranges)
{
Debug.Assert(ranges == null || ranges.Count > 0);
int currentItem = 0;
Pair<int, int> currentRange = new Pair<int, int>();
int currentRangeIndex = -1;
bool betweenRanges = false;
if (ranges != null)
{
currentRange = ranges[0];
currentRangeIndex = 0;
betweenRanges = currentRange.First > 0;
}
using (IEnumerator<T> enumerator = source.GetEnumerator())
{
while (enumerator.MoveNext())
{
if (ranges != null)
{
if (betweenRanges)
{
if (currentItem == currentRange.First)
betweenRanges = false;
else
{
currentItem++;
continue;
}
}
}
yield return enumerator.Current;
if (ranges != null)
{
if (currentItem == currentRange.Second)
{
if (currentRangeIndex == ranges.Count - 1)
break; // We just visited the last item in the ranges
currentRangeIndex = currentRangeIndex + 1;
currentRange = ranges[currentRangeIndex];
betweenRanges = true;
}
}
currentItem++;
}
}
}

My second try, this will consider the ordering of the ranges. I haven' tried it yet but I thinkt it works :). You could probably extract some of the code to smaller functions to make it more readable.
public static IEnumerable<T> WalkRanges<T>(IEnumerable<T> source, List<Pair<int, int>> ranges)
{
int currentIndex = 0;
int currentRangeIndex = 0;
int maxRangeIndex = ranges.Length;
bool done = false;
foreach(var item in source)
{
if(currentIndex > range[currentRangeIndex].Second)
{
while(currentIndex > range[currentRangeIndex].Second)
{
if(!++currentRangeIndex < maxRangeIndex)
{
// We've passed last range =>
// set done = true to break outer loop and then break
done = true;
break;
}
}
if(currentIndex > range[currentRangeIndex].First)
yield item; // include if larger than first since we now it's smaller than second
}
else if(currentIndex > range[currentRangeIndex].First)
{
// If higher than first and lower than second we're in range
yield item;
}
if(done) // if done break outer loop
break;
currentIndex++; // always increase index when advancint through source
}
}

Related

Skip foreach X positions

I want skip my in foreach. For example:
foreach(Times t in timeList)
{
if(t.Time == 20)
{
timeList.Skip(3);
}
}
I want "jump" 3 positions in my list.. If, in my if block t.Id = 10 after skip I want get t.Id = 13
How about this? If you use a for loop then you can just step the index forward as needed:
for (var x = 0; x < timeList.Length; x++)
{
if (timeList[x].Time == 20)
{
// option 1
x += 2; // 'x++' in the for loop will +1,
// we are adding +2 more to make it 3?
// option 2
// x += 3; // just add 3!
}
}
You can't modify an enumerable in-flight, as it were, like you could the index of a for loop; you must account for it up front. Fortunately there are several way to do this.
Here's one:
foreach(Times t in timeList.Where(t => t.Time < 20 || t.Time > 22))
{
}
There's also the .Skip() option, but to use it you must break the list into two separate enumerables and then rejoin them:
var times1 = timeList.TakeWhile(t => t.Time != 20);
var times2 = timeList.SkipeWhile(t => t.Time != 20).Skip(3);
foreach(var t in times1.Concat(times2))
{
}
But that's not exactly efficient, as it requires iterating over the first part of the sequence twice (and won't work at all for Read Once -style sequences). To fix this, you can make a custom enumerator:
public static IEnumerable<T> SkipAt<T>(this IEnumerable<T> items, Predicate<T> SkipTrigger, int SkipCount)
{
bool triggered = false;
int SkipsRemaining = 0;
var e = items.GetEnumerator();
while (e.MoveNext())
{
if (!triggered && SkipTrigger(e.Current))
{
triggered = true;
SkipsRemaining = SkipCount;
}
if (triggered)
{
SkipsRemaining--;
if (SkipsRemaining == 0) triggered = false;
}
else
{
yield return e.Current;
}
}
}
Then you could use it like this:
foreach(Times t in timeList.SkipAt(t => t.Times == 20, 3))
{
}
But again: you still need to decide about this up front, rather than inside the loop body.
For fun, I felt like adding an overload that uses another predicate to tell the enumerator when to resume:
public static IEnumerable<T> SkipAt<T>(this IEnumerable<T> items, Predicate<T> SkipTrigger, Predicate<T> ResumeTrigger)
{
bool triggered = false;
var e = items.GetEnumerator();
while (e.MoveNext())
{
if (!triggered && SkipTrigger(e.Current))
{
triggered = true;
}
if (triggered)
{
if (ResumeTrigger(e.Current)) triggered = false;
}
else
{
yield return e.Current;
}
}
}
You can use continue with some simple variables.
int skipCount = 0;
bool skip = false;
foreach (var x in myList)
{
if (skipCount == 3)
{
skip = false;
skipCount = 0;
}
if (x.time == 20)
{
skip = true;
skipCount = 0;
}
if (skip)
{
skipCount++;
continue;
}
// here you do whatever you don't want to skip
}
Or if you can use a for-loop, increase the index like this:
for (int i = 0; i < times.Count)
{
if (times[i].time == 20)
{
i += 2; // 2 + 1 loop increment
continue;
}
// here you do whatever you don't want to skip
}

Why does the Count() method use the "checked" keyword?

As I was looking the difference between Count and Count(), I thought to glance at the source code of Count(). I saw the following code snippet in which I wonder why the checked keyword is necessary/needed:
int num = 0;
using (IEnumerator<TSource> enumerator = source.GetEnumerator())
{
while (enumerator.MoveNext())
{
num = checked(num + 1);
}
return num;
}
The source code:
// System.Linq.Enumerable
using System.Collections;
using System.Collections.Generic;
public static int Count<TSource>(this IEnumerable<TSource> source)
{
if (source == null)
{
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.source);
}
ICollection<TSource> collection = source as ICollection<TSource>;
if (collection != null)
{
return collection.Count;
}
IIListProvider<TSource> iIListProvider = source as IIListProvider<TSource>;
if (iIListProvider != null)
{
return iIListProvider.GetCount(onlyIfCheap: false);
}
ICollection collection2 = source as ICollection;
if (collection2 != null)
{
return collection2.Count;
}
int num = 0;
using (IEnumerator<TSource> enumerator = source.GetEnumerator())
{
while (enumerator.MoveNext())
{
num = checked(num + 1);
}
return num;
}
}
Because it doesn't want to return a negative number in the (admittedly unlikely) event that there are more than 2-billion-odd items in the sequence - or a non-negative but just wrong number in the (even more unlikely) case that there are more than 4-billion-odd items in the sequence. checked will detect the overflow condition.

Get the index of item in a list given its property

In MyList List<Person> there may be a Person with its Name property set to "ComTruise". I need the index of first occurrence of "ComTruise" in MyList, but not the entire Person element.
What I'm doing now is:
string myName = ComTruise;
int thatIndex = MyList.SkipWhile(p => p.Name != myName).Count();
If the list is very large, is there a more optimal way to get the index?
You could use FindIndex
string myName = "ComTruise";
int myIndex = MyList.FindIndex(p => p.Name == myName);
Note: FindIndex returns -1 if no item matching the conditions defined by the supplied predicate can be found in the list.
As it's an ObservableCollection, you can try this
int index = MyList.IndexOf(MyList.Where(p => p.Name == "ComTruise").FirstOrDefault());
It will return -1 if "ComTruise" doesn't exist in your collection.
As mentioned in the comments, this performs two searches. You can optimize it with a for loop.
int index = -1;
for(int i = 0; i < MyList.Count; i++)
{
//case insensitive search
if(String.Equals(MyList[i].Name, "ComTruise", StringComparison.OrdinalIgnoreCase))
{
index = i;
break;
}
}
It might make sense to write a simple extension method that does this:
public static int FindIndex<T>(
this IEnumerable<T> collection, Func<T, bool> predicate)
{
int i = 0;
foreach (var item in collection)
{
if (predicate(item))
return i;
i++;
}
return -1;
}
var p = MyList.Where(p => p.Name == myName).FirstOrDefault();
int thatIndex = -1;
if (p != null)
{
thatIndex = MyList.IndexOf(p);
}
if (p != -1) ...

LINQ: take a sequence of elements from a collection

I have a collection of objects and need to take batches of 100 objects and do some work with them until there are no objects left to process.
Instead of looping through each item and grabbing 100 elements then the next hundred etc is there a nicer way of doing it with linq?
Many thanks
static void test(IEnumerable<object> objects)
{
while (objects.Any())
{
foreach (object o in objects.Take(100))
{
}
objects = objects.Skip(100);
}
}
:)
int batchSize = 100;
var batched = yourCollection.Select((x, i) => new { Val = x, Idx = i })
.GroupBy(x => x.Idx / batchSize,
(k, g) => g.Select(x => x.Val));
// and then to demonstrate...
foreach (var batch in batched)
{
Console.WriteLine("Processing batch...");
foreach (var item in batch)
{
Console.WriteLine("Processing item: " + item);
}
}
This will partition the list into a list of lists of however many items you specify.
public static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> source, int size)
{
int i = 0;
List<T> list = new List<T>(size);
foreach (T item in source)
{
list.Add(item);
if (++i == size)
{
yield return list;
list = new List<T>(size);
i = 0;
}
}
if (list.Count > 0)
yield return list;
}
I don't think linq is really suitable for this sort of processing - it is mainly useful for performing operations on whole sequences rather than splitting or modifying them. I would do this by accessing the underlying IEnumerator<T> since any method using Take and Skip are going to be quite inefficient.
public static void Batch<T>(this IEnumerable<T> items, int batchSize, Action<IEnumerable<T>> batchAction)
{
if (batchSize < 1) throw new ArgumentException();
List<T> buffer = new List<T>();
using (var enumerator = (items ?? Enumerable.Empty<T>()).GetEnumerator())
{
while (enumerator.MoveNext())
{
buffer.Add(enumerator.Current);
if (buffer.Count == batchSize)
{
batchAction(buffer);
buffer.Clear();
}
}
//execute for remaining items
if (buffer.Count > 0)
{
batchAction(buffer);
}
}
}
var batchSize = 100;
for (var i = 0; i < Math.Ceiling(yourCollection.Count() / (decimal)batchSize); i++)
{
var batch = yourCollection
.Skip(i*batchSize)
.Take(batchSize);
// Do something with batch
}

N-way intersection of sorted enumerables

Given n enumerables of the same type that return distinct elements in ascending order, for example:
IEnumerable<char> s1 = "adhjlstxyz";
IEnumerable<char> s2 = "bdeijmnpsz";
IEnumerable<char> s3 = "dejlnopsvw";
I want to efficiently find all values that are elements of all enumerables:
IEnumerable<char> sx = Intersect(new[] { s1, s2, s3 });
Debug.Assert(sx.SequenceEqual("djs"));
"Efficiently" here means that
the input enumerables should each be enumerated only once,
the elements of the input enumerables should be retrieved only when needed, and
the algorithm should not recursively enumerate its own output.
I need some hints how to approach a solution.
Here is my (naive) attempt so far:
static IEnumerable<T> Intersect<T>(IEnumerable<T>[] enums)
{
return enums[0].Intersect(
enums.Length == 2 ? enums[1] : Intersect(enums.Skip(1).ToArray()));
}
Enumerable.Intersect collects the first enumerable into a HashSet, then enumerates the second enumerable and yields all matching elements.
Intersect then recursively intersects the result with the next enumerable.
This obviously isn't very efficient (it doesn't meet the constraints). And it doesn't exploit the fact that the elements are sorted at all.
Here is my attempt to intersect two enumerables. Maybe it can be generalized for n enumerables?
static IEnumerable<T> Intersect<T>(IEnumerable<T> first, IEnumerable<T> second)
{
using (var left = first.GetEnumerator())
using (var right = second.GetEnumerator())
{
var leftHasNext = left.MoveNext();
var rightHasNext = right.MoveNext();
var comparer = Comparer<T>.Default;
while (leftHasNext && rightHasNext)
{
switch (Math.Sign(comparer.Compare(left.Current, right.Current)))
{
case -1:
leftHasNext = left.MoveNext();
break;
case 0:
yield return left.Current;
leftHasNext = left.MoveNext();
rightHasNext = right.MoveNext();
break;
case 1:
rightHasNext = right.MoveNext();
break;
}
}
}
}
OK; more complex answer:
public static IEnumerable<T> Intersect<T>(params IEnumerable<T>[] enums) {
return Intersect<T>(null, enums);
}
public static IEnumerable<T> Intersect<T>(IComparer<T> comparer, params IEnumerable<T>[] enums) {
if(enums == null) throw new ArgumentNullException("enums");
if(enums.Length == 0) return Enumerable.Empty<T>();
if(enums.Length == 1) return enums[0];
if(comparer == null) comparer = Comparer<T>.Default;
return IntersectImpl(comparer, enums);
}
public static IEnumerable<T> IntersectImpl<T>(IComparer<T> comparer, IEnumerable<T>[] enums) {
IEnumerator<T>[] iters = new IEnumerator<T>[enums.Length];
try {
// create iterators and move as far as the first item
for (int i = 0; i < enums.Length; i++) {
if(!(iters[i] = enums[i].GetEnumerator()).MoveNext()) {
yield break; // no data for one of the iterators
}
}
bool first = true;
T lastValue = default(T);
do { // get the next item from the first sequence
T value = iters[0].Current;
if (!first && comparer.Compare(value, lastValue) == 0) continue; // dup in first source
bool allTrue = true;
for (int i = 1; i < iters.Length; i++) {
var iter = iters[i];
// if any sequence isn't there yet, progress it; if any sequence
// ends, we're all done
while (comparer.Compare(iter.Current, value) < 0) {
if (!iter.MoveNext()) goto alldone; // nasty, but
}
// if any sequence is now **past** value, then short-circuit
if (comparer.Compare(iter.Current, value) > 0) {
allTrue = false;
break;
}
}
// so all sequences have this value
if (allTrue) yield return value;
first = false;
lastValue = value;
} while (iters[0].MoveNext());
alldone:
;
} finally { // clean up all iterators
for (int i = 0; i < iters.Length; i++) {
if (iters[i] != null) {
try { iters[i].Dispose(); }
catch { }
}
}
}
}
You can use LINQ:
public static IEnumerable<T> Intersect<T>(IEnumerable<IEnumerable<T>> enums) {
using (var iter = enums.GetEnumerator()) {
IEnumerable<T> result;
if (iter.MoveNext()) {
result = iter.Current;
while (iter.MoveNext()) {
result = result.Intersect(iter.Current);
}
} else {
result = Enumerable.Empty<T>();
}
return result;
}
}
This would be simple, although it does build the hash-set multiple times; advancing all n at once (to take advantage of sorted) would be hard, but you could also build a single hash-set and remove missing things?

Categories

Resources