LINQ: take a sequence of elements from a collection - c#

I have a collection of objects and need to take batches of 100 objects and do some work with them until there are no objects left to process.
Instead of looping through each item and grabbing 100 elements then the next hundred etc is there a nicer way of doing it with linq?
Many thanks

static void test(IEnumerable<object> objects)
{
while (objects.Any())
{
foreach (object o in objects.Take(100))
{
}
objects = objects.Skip(100);
}
}
:)

int batchSize = 100;
var batched = yourCollection.Select((x, i) => new { Val = x, Idx = i })
.GroupBy(x => x.Idx / batchSize,
(k, g) => g.Select(x => x.Val));
// and then to demonstrate...
foreach (var batch in batched)
{
Console.WriteLine("Processing batch...");
foreach (var item in batch)
{
Console.WriteLine("Processing item: " + item);
}
}

This will partition the list into a list of lists of however many items you specify.
public static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> source, int size)
{
int i = 0;
List<T> list = new List<T>(size);
foreach (T item in source)
{
list.Add(item);
if (++i == size)
{
yield return list;
list = new List<T>(size);
i = 0;
}
}
if (list.Count > 0)
yield return list;
}

I don't think linq is really suitable for this sort of processing - it is mainly useful for performing operations on whole sequences rather than splitting or modifying them. I would do this by accessing the underlying IEnumerator<T> since any method using Take and Skip are going to be quite inefficient.
public static void Batch<T>(this IEnumerable<T> items, int batchSize, Action<IEnumerable<T>> batchAction)
{
if (batchSize < 1) throw new ArgumentException();
List<T> buffer = new List<T>();
using (var enumerator = (items ?? Enumerable.Empty<T>()).GetEnumerator())
{
while (enumerator.MoveNext())
{
buffer.Add(enumerator.Current);
if (buffer.Count == batchSize)
{
batchAction(buffer);
buffer.Clear();
}
}
//execute for remaining items
if (buffer.Count > 0)
{
batchAction(buffer);
}
}
}

var batchSize = 100;
for (var i = 0; i < Math.Ceiling(yourCollection.Count() / (decimal)batchSize); i++)
{
var batch = yourCollection
.Skip(i*batchSize)
.Take(batchSize);
// Do something with batch
}

Related

Queue move item back to the end

I have a System.Collection.Generic.Queue<int> with following sample code
Queue<int> iq = new Queue<int>();
iq.Enqueue(1); // 1
iq.Enqueue(2); // 1 2
iq.Enqueue(3); // 1 2 3
//move 1 to the end of the line here
int i = iq.Dequeue(); // 2 3
I want to move the value (access by value) 1 back to the end of the line so that the result is 2 and 1 would be the last dequeueable value.
Any idea? Is there something like iq.MoveToLast(1) ?
If you want to Remove / Add item by its value, you can use List<T> instead of Queue<T>:
List<int> id = ...
int itemToMove = 2;
int index = id.IndexOf(itemToMove);
// If we have item found we should put it at the end
if (index >= 0) {
id.Add(id[index]);
id.RemoveAt(index);
}
If you have to use Queue<T> you can create a temporal List<T>:
Queue<int> iq = ...
int itemToMove = 2;
// Create temporal list
var list = iq.ToList();
// process items in it
int index = list.IndexOf(itemToMove);
if (index >= 0) {
list.Add(list[index]);
list.RemoveAt(index);
}
// enqueue items back into queue in the desired order
iq.Clear();
foreach (var item in list)
iq.Enqueue(item);
Finally, you can implement an extension method:
public static partial class QueueExtensions {
public static void MoveToLast<T>(this Queue<int> queue, T itemToMove) {
if (null == queue)
throw new ArgumentNullException(nameof(queue));
var list = queue.ToList();
int index = list.IndexOf(itemToMove);
if (index < 0)
return; // Nothing to do
list.Add(list[index]);
list.RemoveAt(index);
queue.Clear();
foreach (var item in list)
queue.Enqueue(item);
}
}
Then you can put
iq.MoveToLast(1);
Just try:
queue.Enqueue(queue.Dequeue());
You can't Remove elements from Queue by using methods other than Dequeue.
Here's an approach which just manipulates the queue:
public static void MoveElementToBack<T>(Queue<T> queue, T elementToMove)
{
T item = default;
bool found = false;
for (int i = 0, n = queue.Count; i < n; ++i)
{
var current = queue.Dequeue();
if (!found && current.Equals(elementToMove))
{
item = current;
found = true;
}
else
{
queue.Enqueue(current);
}
}
if (found)
queue.Enqueue(item);
}
This is always an O(N) operation, but it only makes one pass through the queue.

Most efficient method for averaging list<int>

I am keeping a rolling accumulator for a graphing application, that one of the features is providing a running average of the sample.
The size of the accumulator is variable, but essentially the end goal is achieved like this.
The accumulator class (ugly but functional)
public class accumulator<t>
{
private int trim;
List<t> _points = new List<t>();
List<string> _labels = new List<string>();
List<t> _runAvg = new List<t>();
List<t> _high = new List<t>();
List<t> _low = new List<t>();
public List<t> points { get { return _points; } }
public List<string> labels { get { return _labels; } }
public List<t> runAvg { get { return _runAvg; } }
public List<t> high { get { return _high; } }
public List<t> low { get { return _low; } }
public delegate void onChangeHandler(accumulator<t> sender, EventArgs e);
public event onChangeHandler onChange;
public accumulator(int trim)
{
this.trim = trim;
}
public void add(t point, string label)
{
if (_points.Count == trim)
{
_points.RemoveAt(0);
_labels.RemoveAt(0);
_runAvg.RemoveAt(0);
_high.RemoveAt(0);
_low.RemoveAt(0);
}
_points.Add(point);
_labels.Add(label);
if (typeof(t) == typeof(System.Int32))
{
int avg = 0;
if (_high.Count == 0)
{
_high.Add(point);
}
else
{
t v = (Convert.ToInt32(point) > Convert.ToInt32(_high[0])) ? point : _high[0];
_high.Clear();
for (int i = 0; i < _points.Count; i++) _high.Add(v);
}
if (_low.Count == 0)
{
_low.Add(point);
}
else
{
t v = (Convert.ToInt32(point) < Convert.ToInt32(_low[0])) ? point : _low[0];
_low.Clear();
for (int i = 0; i < _points.Count; i++) _low.Add(v);
}
foreach (t item in _points) avg += Convert.ToInt32(item);
avg = (avg / _points.Count);
_runAvg.Add((t)(object)avg);
//_runAvg.Add((t)(object)_points.Average(a => Convert.ToInt32(a)));
}
if (typeof(t) == typeof(System.Double))
{
double avg = 0;
if (_high.Count == 0)
{
_high.Add(point);
}
else
{
t v = (Convert.ToDouble(point) > Convert.ToDouble(_high[0])) ? point : _high[0];
_high.Clear();
for (int i = 0; i < _points.Count; i++) _high.Add(v);
}
if (_low.Count == 0)
{
_low.Add(point);
}
else
{
t v = (Convert.ToDouble(point) < Convert.ToDouble(_low[0])) ? point : _low[0];
_low.Clear();
for (int i = 0; i < _points.Count; i++) _low.Add(v);
}
foreach (t item in _points) avg += Convert.ToDouble(item);
avg = (avg / _points.Count);
_runAvg.Add((t)(object)avg);
//_runAvg.Add((t)(object)_points.Average(a => Convert.ToDouble(a)));
}
onChangeHappen();
}
private void onChangeHappen()
{
if (onChange != null) onChange(this, EventArgs.Empty);
}
}
As you can see essentially I want to keep a running average, a High/Low mark (with the same count of data points so it binds directly to the chart control in an extended series class)
The average is I think what is killing me, to have to add every element of the list to a sum / divide by the count is of course how an average is achieved, but is the loop the most efficient way to do this?
I took a stab at a lambda expression (commented out) but figured on some level it had to be doing the same. (Sort of like using the generic list VS array, I figure it has to be re declaring an array somewhere and moving elements around, but it is likely being done as or more efficient than I would have, so let it do it for convenience's sake)
The ultimate goal and final question being really, given a list of generic values...
The most efficient way to average the whole list.
*Note: I DO realize the pedantic nature of typing the int/floats, this is because of type checking in the consumer of this class over which I have no control, it actually looks to see if it is double/int otherwise I would have treated it ALL as floats ;)
Thanks in advance...
Keeping a running average is actually really easy. You don't need to sum the whole list every time, because you already have it!
Once you have a current average (from your loop), you just do the following:
((oldAverage * oldCount) + newValue) / newCount
This will give you the average of the old set with the new value(s) included.
To get the initial average value, consider using the Average function from LINQ:
double average = listOfInts.Average();

Removing elements of a collection that I am foreaching through

Here is some dummy code that illustrates what I want to do:
List<int> list1 = new List<int>();
//Code to fill the list
foreach(int number in list1)
{
if(number%5==0)
{
list1.Remove(number);
}
}
Assuming the test actually removes an int, it will throw an error. Is there a way of doing this in a foreach, or do I have to convert it to a for loop?
You can't remove items from a collection that you are iterating thru with a for each.
I would do this...
list1 = list1.Where(l => l % 5 != 0).ToList();
The RemoveAll() method comes closest to what you want, I think:
list1.RemoveAll(i => i%5 == 0);
Actually if you want to remove the list as you state in the O.P you could do:
List<int> list1 = new List<int>();
//Code to fill the list
for(var n = 0; n < list.Count; i++)
{
if (list[n] % 5 == 0)
{
list1.Remove(list[n--]);
}
}
Edited to Add
The reason why you can't change a list while in a for each loos is as follows:
[Serializable()]
public struct Enumerator : IEnumerator<T>, System.Collections.IEnumerator
{
private List<T> list;
private int index;
private int version;
private T current;
internal Enumerator(List<T> list) {
this.list = list;
index = 0;
version = list._version;
current = default(T);
}
public void Dispose() {
}
public bool MoveNext() {
List<T> localList = list;
if (version == localList._version && ((uint)index < (uint)localList._size))
{
current = localList._items[index];
index++;
return true;
}
return MoveNextRare();
}
private bool MoveNextRare()
{
if (version != list._version) {
ThrowHelper.ThrowInvalidOperationException(ExceptionResource.InvalidOperation_EnumFailedVersion);
}
index = list._size + 1;
current = default(T);
return false;
}
public T Current {
get {
return current;
}
}
Object System.Collections.IEnumerator.Current {
get {
if( index == 0 || index == list._size + 1) {
ThrowHelper.ThrowInvalidOperationException(ExceptionResource.InvalidOperation_EnumOpCantHappen);
}
return Current;
}
}
void System.Collections.IEnumerator.Reset() {
if (version != list._version) {
ThrowHelper.ThrowInvalidOperationException(ExceptionResource.InvalidOperation_EnumFailedVersion);
}
index = 0;
current = default(T);
}
}
As far as I know, a collection cannot be modified while in a foreach loop. You need to change it to a for loop. Another way you can accomplish that is using LINQ.
You can't do it in-situ using foreach, because it invalidates the enumerator.
Either take a copy of the list and iterate over that, or use a different type of loop such as a for() loop.
You cannot modify the collection your are enumerating through using foreach. What I often do is use a for loop and go backwards through the collection, thus you can safely remove items since the length won't be affected until you move to the previous item.

Can someone come up with a better version of this enumerator?

I'm pretty happy with the following method. It takes an enumerable and a list of sorted, disjoint ranges and skips items not in the ranges. If the ranges are null, we just walk every item. The enumerable and the list of ranges are both possibly large. We want this method to be as high performance as possible.
Can someone think of a more elegant piece of code? I'm primarily interested in C# implementations, but if someone has a three-character APL implementation, that's cool too.
public static IEnumerable<T> WalkRanges<T>(IEnumerable<T> source, List<Pair<int, int>> ranges)
{
Debug.Assert(ranges == null || ranges.Count > 0);
int currentItem = 0;
Pair<int, int> currentRange = new Pair<int, int>();
int currentRangeIndex = -1;
bool betweenRanges = false;
if (ranges != null)
{
currentRange = ranges[0];
currentRangeIndex = 0;
betweenRanges = currentRange.First > 0;
}
foreach (T item in source)
{
if (ranges != null) {
if (betweenRanges) {
if (currentItem == currentRange.First)
betweenRanges = false;
else {
currentItem++;
continue;
}
}
}
yield return item;
if (ranges != null) {
if (currentItem == currentRange.Second) {
if (currentRangeIndex == ranges.Count - 1)
break; // We just visited the last item in the ranges
currentRangeIndex = currentRangeIndex + 1;
currentRange = ranges[currentRangeIndex];
betweenRanges = true;
}
}
currentItem++;
}
}
Maybe use linq on your source something like:
public static IEnumerable<T> WalkRanges<T>(IEnumerable<T> source, List<Pair<int, int>> ranges)
{
if(ranges == null)
return null;
return source.Where((item, index) => ranges.Any(y => y.First < index && y.Second > index)).AsEnumerable();
}
I don't have my Windows PC in front of me and I'm not sure I understood your code correctly, but I tried to understand your text instead and the code above could work.... or something like it.
UPDATED: Regarding the performance issue I would recommend you to test the performance with some simple test and time both of the functions.
You could copy the source list to an array and then for each range, you can block copy from your new source array to a target array in the proper location. If you can get your source collection passed in as an array, that would make this an even better approach. If you do have to do the initial copy, it is O(N) for that operation plus O(M) where M is the total number of items in the final array. So it ends up coming out to O(N) in either case.
Here's my take. I find it easier to understand, if not more elegant.
public static IEnumerable<T> WalkRanges<T>(IEnumerable<T> source, List<Tuple<int, int>> ranges)
{
if (ranges == null)
return source;
Debug.Assert(ranges.Count > 0);
return WalkRangesInternal(source, ranges);
}
static IEnumerable<T> WalkRangesInternal<T>(IEnumerable<T> source, List<Tuple<int, int>> ranges)
{
int currentItem = 0;
var rangeEnum = ranges.GetEnumerator();
bool moreData = rangeEnum.MoveNext();
using (var sourceEnum = source.GetEnumerator())
while (moreData)
{
// skip over every item in the gap between ranges
while (currentItem < rangeEnum.Current.Item1
&& (moreData = sourceEnum.MoveNext()))
currentItem++;
// yield all the elements in the range
while (currentItem <= rangeEnum.Current.Item2
&& (moreData = sourceEnum.MoveNext()))
{
yield return sourceEnum.Current;
currentItem++;
}
// advance to the next range
moreData = rangeEnum.MoveNext();
}
}
How about this (untested)? Should have pretty similar performance characteristics (pure streaming, no unnecessary buffering, quick exit), but is easier to follow, IMO:
public static IEnumerable<T> WalkRanges<T>(IEnumerable<T> source,
List<Pair<int, int>> ranges)
{
if (source == null)
throw new ArgumentNullException("source");
// If ranges is null, just return the source. From spec.
return ranges == null ? source : RangeIterate(source, ranges);
}
private static IEnumerable<T> RangeIterate<T>(IEnumerable<T> source,
List<Pair<int, int>> ranges)
{
// The key bit: a lazy sequence of all valid indices belonging to
// each range. No buffering.
var validIndices = from range in ranges
let start = Math.Max(0, range.First)
from validIndex in Enumerable.Range(start, range.Second - start + 1)
select validIndex;
int currentIndex = -1;
using (var indexErator = validIndices.GetEnumerator())
{
// Optimization: Get out early if there are no ranges.
if (!indexErator.MoveNext())
yield break;
foreach (var item in source)
{
if (++currentIndex == indexErator.Current)
{
// Valid index, yield.
yield return item;
// Move to the next valid index.
// Optimization: get out early if there aren't any more.
if (!indexErator.MoveNext())
yield break;
}
}
}
}
If you don't mind buffering indices, you can do something like this, which is even more clearer, IMO:
public static IEnumerable<T> WalkRanges<T>(IEnumerable<T> source,
List<Pair<int, int>> ranges)
{
if (source == null)
throw new ArgumentNullException("source");
if (ranges == null)
return source;
// Optimization: Get out early if there are no ranges.
if (!ranges.Any())
return Enumerable.Empty<T>();
var validIndices = from range in ranges
let start = Math.Max(0, range.First)
from validIndex in Enumerable.Range(start, range.Second - start + 1)
select validIndex;
// Buffer the valid indices into a set.
var validIndicesSet = new HashSet<int>(validIndices);
// Optimization: don't take an item beyond the last index of the last range.
return source.Take(ranges.Last().Second + 1)
.Where((item, index) => validIndicesSet.Contains(index));
}
You could iterate over the collection manually to prevent the enumerator from getting the current item when it will be skipped:
public static IEnumerable<T> WalkRanges<T>(IEnumerable<T> source, List<Pair<int, int>> ranges)
{
Debug.Assert(ranges == null || ranges.Count > 0);
int currentItem = 0;
Pair<int, int> currentRange = new Pair<int, int>();
int currentRangeIndex = -1;
bool betweenRanges = false;
if (ranges != null)
{
currentRange = ranges[0];
currentRangeIndex = 0;
betweenRanges = currentRange.First > 0;
}
using (IEnumerator<T> enumerator = source.GetEnumerator())
{
while (enumerator.MoveNext())
{
if (ranges != null)
{
if (betweenRanges)
{
if (currentItem == currentRange.First)
betweenRanges = false;
else
{
currentItem++;
continue;
}
}
}
yield return enumerator.Current;
if (ranges != null)
{
if (currentItem == currentRange.Second)
{
if (currentRangeIndex == ranges.Count - 1)
break; // We just visited the last item in the ranges
currentRangeIndex = currentRangeIndex + 1;
currentRange = ranges[currentRangeIndex];
betweenRanges = true;
}
}
currentItem++;
}
}
}
My second try, this will consider the ordering of the ranges. I haven' tried it yet but I thinkt it works :). You could probably extract some of the code to smaller functions to make it more readable.
public static IEnumerable<T> WalkRanges<T>(IEnumerable<T> source, List<Pair<int, int>> ranges)
{
int currentIndex = 0;
int currentRangeIndex = 0;
int maxRangeIndex = ranges.Length;
bool done = false;
foreach(var item in source)
{
if(currentIndex > range[currentRangeIndex].Second)
{
while(currentIndex > range[currentRangeIndex].Second)
{
if(!++currentRangeIndex < maxRangeIndex)
{
// We've passed last range =>
// set done = true to break outer loop and then break
done = true;
break;
}
}
if(currentIndex > range[currentRangeIndex].First)
yield item; // include if larger than first since we now it's smaller than second
}
else if(currentIndex > range[currentRangeIndex].First)
{
// If higher than first and lower than second we're in range
yield item;
}
if(done) // if done break outer loop
break;
currentIndex++; // always increase index when advancint through source
}
}

N-way intersection of sorted enumerables

Given n enumerables of the same type that return distinct elements in ascending order, for example:
IEnumerable<char> s1 = "adhjlstxyz";
IEnumerable<char> s2 = "bdeijmnpsz";
IEnumerable<char> s3 = "dejlnopsvw";
I want to efficiently find all values that are elements of all enumerables:
IEnumerable<char> sx = Intersect(new[] { s1, s2, s3 });
Debug.Assert(sx.SequenceEqual("djs"));
"Efficiently" here means that
the input enumerables should each be enumerated only once,
the elements of the input enumerables should be retrieved only when needed, and
the algorithm should not recursively enumerate its own output.
I need some hints how to approach a solution.
Here is my (naive) attempt so far:
static IEnumerable<T> Intersect<T>(IEnumerable<T>[] enums)
{
return enums[0].Intersect(
enums.Length == 2 ? enums[1] : Intersect(enums.Skip(1).ToArray()));
}
Enumerable.Intersect collects the first enumerable into a HashSet, then enumerates the second enumerable and yields all matching elements.
Intersect then recursively intersects the result with the next enumerable.
This obviously isn't very efficient (it doesn't meet the constraints). And it doesn't exploit the fact that the elements are sorted at all.
Here is my attempt to intersect two enumerables. Maybe it can be generalized for n enumerables?
static IEnumerable<T> Intersect<T>(IEnumerable<T> first, IEnumerable<T> second)
{
using (var left = first.GetEnumerator())
using (var right = second.GetEnumerator())
{
var leftHasNext = left.MoveNext();
var rightHasNext = right.MoveNext();
var comparer = Comparer<T>.Default;
while (leftHasNext && rightHasNext)
{
switch (Math.Sign(comparer.Compare(left.Current, right.Current)))
{
case -1:
leftHasNext = left.MoveNext();
break;
case 0:
yield return left.Current;
leftHasNext = left.MoveNext();
rightHasNext = right.MoveNext();
break;
case 1:
rightHasNext = right.MoveNext();
break;
}
}
}
}
OK; more complex answer:
public static IEnumerable<T> Intersect<T>(params IEnumerable<T>[] enums) {
return Intersect<T>(null, enums);
}
public static IEnumerable<T> Intersect<T>(IComparer<T> comparer, params IEnumerable<T>[] enums) {
if(enums == null) throw new ArgumentNullException("enums");
if(enums.Length == 0) return Enumerable.Empty<T>();
if(enums.Length == 1) return enums[0];
if(comparer == null) comparer = Comparer<T>.Default;
return IntersectImpl(comparer, enums);
}
public static IEnumerable<T> IntersectImpl<T>(IComparer<T> comparer, IEnumerable<T>[] enums) {
IEnumerator<T>[] iters = new IEnumerator<T>[enums.Length];
try {
// create iterators and move as far as the first item
for (int i = 0; i < enums.Length; i++) {
if(!(iters[i] = enums[i].GetEnumerator()).MoveNext()) {
yield break; // no data for one of the iterators
}
}
bool first = true;
T lastValue = default(T);
do { // get the next item from the first sequence
T value = iters[0].Current;
if (!first && comparer.Compare(value, lastValue) == 0) continue; // dup in first source
bool allTrue = true;
for (int i = 1; i < iters.Length; i++) {
var iter = iters[i];
// if any sequence isn't there yet, progress it; if any sequence
// ends, we're all done
while (comparer.Compare(iter.Current, value) < 0) {
if (!iter.MoveNext()) goto alldone; // nasty, but
}
// if any sequence is now **past** value, then short-circuit
if (comparer.Compare(iter.Current, value) > 0) {
allTrue = false;
break;
}
}
// so all sequences have this value
if (allTrue) yield return value;
first = false;
lastValue = value;
} while (iters[0].MoveNext());
alldone:
;
} finally { // clean up all iterators
for (int i = 0; i < iters.Length; i++) {
if (iters[i] != null) {
try { iters[i].Dispose(); }
catch { }
}
}
}
}
You can use LINQ:
public static IEnumerable<T> Intersect<T>(IEnumerable<IEnumerable<T>> enums) {
using (var iter = enums.GetEnumerator()) {
IEnumerable<T> result;
if (iter.MoveNext()) {
result = iter.Current;
while (iter.MoveNext()) {
result = result.Intersect(iter.Current);
}
} else {
result = Enumerable.Empty<T>();
}
return result;
}
}
This would be simple, although it does build the hash-set multiple times; advancing all n at once (to take advantage of sorted) would be hard, but you could also build a single hash-set and remove missing things?

Categories

Resources