Method signature for an IList<T>.Split() extension method - C#

I'd like to be able to write the following code:
// contains 500 entries
IList<string> longListOfStrings = ...
// shorterListsOfStrings is now an array of 5 IList<string>,
// with each element containing 100 strings
IList<string>[] shorterListsOfStrings = longListOfStrings.Split(5);
To do this I have to create a generic extension method called Split that looks something like the following:
public static TList[] Split<TList>(this TList source, int elementCount)
    where TList : IList<>, ICollection<>, IEnumerable<>, IList, ICollection, IEnumerable
{
    return null;
}
But when I try to compile that, the compiler tells me that IList<>, ICollection<> and IEnumerable<> require a type argument. So, I changed the definition to the following:
public static TList<TType>[] Split<TList<TType>>(this TList<TType> source, int elementCount)
    where TList : IList<TType>, ICollection<TType>, IEnumerable<TType>, IList, ICollection, IEnumerable
{
    return null;
}
but then the compiler complains that it can't find the type TList. I suspect I'm overcomplicating things, but I can't see how... any help is appreciated!

Yes, I think you're overcomplicating things. Try this:
public static IList<T>[] Split<T>(this IList<T> source, int elementCount)
{
    // What the heck, it's easy to implement...
    IList<T>[] ret = new IList<T>[(source.Count + elementCount - 1)
                                  / elementCount];
    for (int i = 0; i < ret.Length; i++)
    {
        int start = i * elementCount;
        int size = Math.Min(elementCount, source.Count - start);
        T[] tmp = new T[size];
        // Would like CopyTo with a count, but never mind
        for (int j = 0; j < size; j++)
        {
            tmp[j] = source[j + start];
        }
        ret[i] = tmp;
    }
    return ret;
}
After all, you're not going to change which kind of list you create within the method based on the source, are you? You'll presumably create a List<T> (or maybe a T[]) even if I pass in some other implementation.
You might want to look at the Batch method in MoreLINQ for an IEnumerable<T>-based implementation.
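If you only have an IEnumerable<T> rather than an IList<T>, a minimal batching sketch along those lines (my own illustration, not MoreLINQ's actual code) could look like this:
public static IEnumerable<IList<T>> Batch<T>(this IEnumerable<T> source, int size)
{
    List<T> bucket = null;
    foreach (T item in source)
    {
        if (bucket == null)
            bucket = new List<T>(size);
        bucket.Add(item);
        if (bucket.Count == size)
        {
            yield return bucket;
            bucket = null;
        }
    }
    // Yield the final partial batch, if any.
    if (bucket != null)
        yield return bucket;
}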

How about this:
public static IList<TResult>[] Split<TSource, TResult>(
    this IList<TSource> source,        // input IList to split
    Func<TSource, TResult> selector,   // projection to apply to each item
    int elementCount                   // number of items per IList
) {
    // do something
}
And if you don't need a version to project each item:
public static IList<T>[] Split<T>(
    this IList<T> source,   // input IList to split
    int elementCount        // number of items per IList
) {
    return Split<T, T>(source, x => x, elementCount);
}
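For completeness, here is one possible body for the projecting overload, as a sketch only (the chunking mirrors the answer above):
public static IList<TResult>[] Split<TSource, TResult>(
    this IList<TSource> source,
    Func<TSource, TResult> selector,
    int elementCount
) {
    IList<TResult>[] ret = new IList<TResult>[(source.Count + elementCount - 1) / elementCount];
    for (int i = 0; i < ret.Length; i++)
    {
        int start = i * elementCount;
        int size = Math.Min(elementCount, source.Count - start);
        var chunk = new List<TResult>(size);
        for (int j = 0; j < size; j++)
        {
            // Apply the projection while copying into the chunk.
            chunk.Add(selector(source[start + j]));
        }
        ret[i] = chunk;
    }
    return ret;
}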

Related

Entity Framework Skip with Int64 parameter [duplicate]


How use Long type for Skip in Linq

How can I use the long type (Int64) with Skip in LINQ? It supports only Int32.
dataContext.Persons.Skip(LongNumber);
You can use a while loop:
// Some init
List<Person> persons = new List<Person>();
List<Person> resultList = persons;
long bigNumber = 3 * (long)int.MaxValue + 12;
while (bigNumber > int.MaxValue)
{
    resultList = resultList.Skip(int.MaxValue).ToList();
    bigNumber -= int.MaxValue;
}
// Skip whatever is left; at this point it fits in an int.
resultList = resultList.Skip((int)bigNumber).ToList();
// Then do what you want with this result list
But does your collection have more than int.MaxValue entries?
The following extension method BigSkip allows skipping more than the Int32.MaxValue maximum value of LINQ's Skip method by calling the method multiple times until the long value has been reached. This method has the advantage of not causing iteration over the collection prematurely.
example usage
bigCollection.BigSkip(howMany: int.MaxValue + 1L)
method
using System;
using System.Collections.Generic;
using System.Linq;

public static class LinqExtensions
{
    public static IEnumerable<T> BigSkip<T>(this IEnumerable<T> items, long howMany)
        => BigSkip(items, Int32.MaxValue, howMany);

    internal static IEnumerable<T> BigSkip<T>(this IEnumerable<T> items, int segmentSize, long howMany)
    {
        long segmentCount = Math.DivRem(howMany, segmentSize, out long remainder);

        for (long i = 0; i < segmentCount; i += 1)
            items = items.Skip(segmentSize);

        if (remainder != 0)
            items = items.Skip((int)remainder);

        return items;
    }
}
The method has been split into two: the internal overload is a convenience that allows a segment size smaller than Int32.MaxValue to be specified, so the method can be unit tested on a smaller scale.
bonus
Replace Skip with Take to make a BigTake method; this same extension method pattern can be used to extend the reach of other LINQ methods.
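As a sketch of what a BigTake might look like: note that, unlike Skip, chained Take calls do not add up (a second Take only narrows the first), so the segments have to be concatenated. The code below is only an illustration of the pattern, not a tuned implementation.
public static IEnumerable<T> BigTake<T>(this IEnumerable<T> items, long howMany)
    => BigTake(items, Int32.MaxValue, howMany);

internal static IEnumerable<T> BigTake<T>(this IEnumerable<T> items, int segmentSize, long howMany)
{
    long segmentCount = Math.DivRem(howMany, segmentSize, out long remainder);
    IEnumerable<T> result = Enumerable.Empty<T>();
    for (long i = 0; i < segmentCount; i += 1)
    {
        // Take the next segment, then advance past it for the following one.
        // Each segment re-walks the deferred Skip chain, so this is a sketch only.
        result = result.Concat(items.Take(segmentSize));
        items = items.Skip(segmentSize);
    }
    if (remainder != 0)
        result = result.Concat(items.Take((int)remainder));
    return result;
}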
Here is an implementation that includes a fast path, in case the count is in the Int32 range. In that case, any optimizations that are embedded in the native Skip implementation are enabled.
/// <summary>Bypasses a specified number of elements in a sequence and then
/// returns the remaining elements.</summary>
public static IEnumerable<T> LongSkip<T>(this IEnumerable<T> source, long count)
{
    if (count >= 0 && count <= Int32.MaxValue)
        return source.Skip(checked((int)count));
    else
        return Iterator(source, count);

    static IEnumerable<T> Iterator(IEnumerable<T> source, long count)
    {
        long current = 0;
        foreach (var item in source)
            if (++current > count) yield return item;
    }
}
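Hypothetical usage; the sequence below is just a stand-in for a collection with more than Int32.MaxValue elements, and nothing is enumerated until the result is consumed:
IEnumerable<long> hugeSequence = Enumerable.Range(0, int.MaxValue).Select(i => (long)i)
    .Concat(Enumerable.Range(0, int.MaxValue).Select(i => (long)i));
// Skip more elements than the int-based Skip could express.
var remaining = hugeSequence.LongSkip((long)int.MaxValue + 1);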

List is taking too much time

I have been writing a program that has a list of 100,000 elements, and I have to process all the elements with different conditions. This does not take much time, 3 seconds at most. After that I have a list of valid entries plus my original list, which had 100,000 elements. The new list usually has a size of 6K-7K. The main problem is that when I use the List.Remove function, or any other way to remove the invalid elements from the original list of 100K elements, it is too slow.
Please advise whether I should use anything other than a List, or whether there is something I can do with this code.
I am including all the code I tried.
for (int k = 0; k < initialList.Count; k++)
{
    combo c = initialList.ElementAt(k);
    if (invalidEntries.Contains(c))
    {
        smartString.Append(c.number1.ToString());
        smartString.Append(c.number2.ToString());
        smartString.Append(c.number3.ToString());
        smartString.Append(c.number4.ToString());
        smartString.Append(c.number5.ToString());
        smartString.Append(" Sum : ");
        smartString.Append(c.sum.ToString());
        smartString.AppendLine();
        InvalidCombo.AppendText(smartString.ToString());
        smartString.Clear();
    }
    else
    {
        smartString.Append(c.number1.ToString());
        smartString.Append(c.number2.ToString());
        smartString.Append(c.number3.ToString());
        smartString.Append(c.number4.ToString());
        smartString.Append(c.number5.ToString());
        smartString.Append(" Sum : ");
        smartString.Append(c.sum.ToString());
        smartString.AppendLine();
        validCombo.AppendText(smartString.ToString());
        smartString.Clear();
    }
}
Also
for (int k = 0; k < 100000; k++)
{
    combo c = initialList.ElementAt(k);
    if (!invalidEntries.Contains(c))
        validEntries.Add(c);
}
I have also tried the .Remove functions, but I think the list can't take it. So, any suggestions/solutions?
I'm a big fan of structs, but you must be very careful when you work with a struct like yours. The List<T> methods that rely on equality (Contains, IndexOf, Remove) may not work as expected and should not be used. The same goes for HashSet<T> and similar collections.
The best approach for your case would be to combine the processing with the removal. And the fastest way to remove items from a List<T> is not to use its item-removal (Remove/RemoveAt) methods! :-) Instead, you "compact" the list by keeping the items that should remain (and their count) at the beginning of the list, and then use the RemoveRange method to cut the unnecessary items off the end. This is very efficient and avoids all the data-block moving that happens when you use the "normal" list remove methods. Here is sample code based on your struct definition:
public struct combo
{
    public int number1;
    public int number2;
    public int number3;
    public int number4;
    public int number5;
    public int sum;
    public bool invalid;
}
void ProcessList(List<combo> list)
{
    int count = 0;
    for (int i = 0; i < list.Count; i++)
    {
        var item = list[i];
        ProcessItem(ref item);
        if (!item.invalid) list[count++] = item;
    }
    list.RemoveRange(count, list.Count - count);
}

void ProcessItem(ref combo item)
{
    // do the processing and set item.invalid = true/false
}
In case you are not mutating the item inside the ProcessItem, you can remove the ref modifier, change the return type to bool and use it to control whether the item should be removed from the list or not.
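For example, that non-mutating variant could look roughly like this (a sketch following the suggestion above):
void ProcessList(List<combo> list)
{
    int count = 0;
    for (int i = 0; i < list.Count; i++)
    {
        // Keep the item only when ProcessItem says it is still valid.
        if (ProcessItem(list[i]))
            list[count++] = list[i];
    }
    list.RemoveRange(count, list.Count - count);
}

bool ProcessItem(combo item)
{
    // do the processing and return true to keep the item, false to drop it
    return !item.invalid;
}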
Here is an example of using HashSet. It is very fast.
using System.Collections.Generic;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            var myInts = new HashSet<int>();
            for (var i = 0; i < 100000; i++)
                myInts.Add(i);
            myInts.Remove(62345);
        }
    }
}
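Applied to the question's scenario, and assuming the default equality of the combo struct is adequate for your data (see the caveat in the previous answer), the filtering could be sketched as:
// invalidEntries copied into a HashSet for fast lookups;
// RemoveAll then compacts the list in a single pass.
var invalidSet = new HashSet<combo>(invalidEntries);
initialList.RemoveAll(c => invalidSet.Contains(c));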

Read skipped data

I've got an IEnumerable of T. I wish to skip a certain number of T, but in the process, I also wish to read these T. You can read the Ts with Take() and skip them with Skip(), but that would entail multiple enumerations.
I need to read N items, deal with them, and then get all the unread items back as an IEnumerable, in one enumeration.
Edit: I'm trying to feed an IEnumerable to a method that takes a Stream-alike. Namely, I must implement only the method
public int Read(T[] readBuffer, int offset, int count)
The problem is that I need to advance the enumerable past the read data to store the position and also read the data to pass back out in the input buffer.
So far I've tried this:
public static IEnumerable<T> SkipTake<T>(this IEnumerable<T> input, int num, Action<List<T>> take)
{
    var enumerator = input.GetEnumerator();
    var chunk = new List<T>();
    for (int i = 0; i < num; ++num)
    {
        chunk.Add(enumerator.Current);
        if (!enumerator.MoveNext())
            break;
    }
    take(chunk);
    yield return enumerator.Current;
    while (enumerator.MoveNext())
        yield return enumerator.Current;
}
Not much luck.
Seems like your implementation does not call MoveNext() at the right time. You must call MoveNext() before you can get the Current element:
public static IEnumerable<T> SkipTake<T>(this IEnumerable<T> input, int num, Action<List<T>> take)
{
    var enumerator = input.GetEnumerator();
    var chunk = new List<T>();
    for (int i = 0; i < num; ++i)
    {
        if (!enumerator.MoveNext())
            break;
        chunk.Add(enumerator.Current);
    }
    take(chunk);
    while (enumerator.MoveNext())
        yield return enumerator.Current;
}
EDIT: Just to make it clear, here's a usage example:
var list = new List<string>() { "This", "is", "an", "example", "!" };
var res = list.SkipTake(2, chunk =>
{
    Console.WriteLine(chunk.Count());
});
Console.WriteLine(res.Count());
Console.WriteLine(res.Count());
The output is
2 3
and the collections contain
{"This", "is"}
and
{"an", "example", "!"}
respectively, and the original collection list was not modified.
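To connect this back to the Stream-like Read(T[], int, int) method mentioned in the question, one sketch is to hold on to the enumerator between calls; the class and member names below are made up for illustration:
public class EnumerableReader<T>
{
    private readonly IEnumerator<T> _enumerator;

    public EnumerableReader(IEnumerable<T> source)
    {
        _enumerator = source.GetEnumerator();
    }

    public int Read(T[] readBuffer, int offset, int count)
    {
        int read = 0;
        while (read < count && _enumerator.MoveNext())
        {
            readBuffer[offset + read] = _enumerator.Current;
            read++;
        }
        // May be less than count once the sequence is exhausted.
        return read;
    }
}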

Way to pad an array to avoid index outside of bounds of array error

I expect to have at least 183 items in my list when I query it, but sometimes the result from my extract contains fewer than 183 items. My current fix supposedly pads the array when the count is less than 183.
if (extractArray.Count() < 183)
{
    int arraysize = extractArray.Count();
    var tempArr = new String[183 - arraysize];
    List<string> itemsList = extractArray.ToList<string>();
    itemsList.AddRange(tempArr);
    var values = itemsList.ToArray();
    //-- Process the new array that is now at least 183 in length
}
But it seems my solution is not the best. I would appreciate any other solutions that could help ensure I get at least 183 items whenever the extract happens.
I'd probably follow others' suggestions, and use a list. Use the "capacity" constructor for added performance:
var list = new List<string>(183);
Then, whenever you get a new array, do this (replace " " with whatever value you use to pad the array):
list.Clear();
list.AddRange(array);
// logically, you can do this without the if, but it saves an object allocation when the array is full
if (array.Length < 183)
    list.AddRange(Enumerable.Repeat(" ", 183 - array.Length));
This way, the list is always reusing the same internal array, reducing allocations and GC pressure.
Or, you could use an extension method:
public static class ArrayExtensions
{
    public static T ElementOrDefault<T>(this T[] array, int index)
    {
        return ElementOrDefault(array, index, default(T));
    }

    public static T ElementOrDefault<T>(this T[] array, int index, T defaultValue)
    {
        return index < array.Length ? array[index] : defaultValue;
    }
}
Then code like this:
items.Zero = array[0];
items.One = array[1];
//...
Becomes this:
items.Zero = array.ElementOrDefault(0);
items.One = array.ElementOrDefault(1);
//...
Finally, this is the rather cumbersome idea with which I started writing this answer: You could wrap the array in an IList implementation that's guaranteed to have 183 indexes (I've omitted most of the interface member implementations for brevity):
class ConstantSizeReadOnlyArrayWrapper<T> : IList<T>
{
    private readonly T[] _array;
    private readonly int _constantSize;
    private readonly T _padValue;

    public ConstantSizeReadOnlyArrayWrapper(T[] array, int constantSize, T padValue)
    {
        // parameter validation omitted for brevity
        _array = array;
        _constantSize = constantSize;
        _padValue = padValue;
    }

    private int MissingItemCount
    {
        get { return _constantSize - _array.Length; }
    }

    public IEnumerator<T> GetEnumerator()
    {
        // maybe you don't need to implement this, or maybe just returning
        // _array.GetEnumerator() would suffice.
        return _array.Concat(Enumerable.Repeat(_padValue, MissingItemCount)).GetEnumerator();
    }

    public int Count
    {
        get { return _constantSize; }
    }

    public bool IsReadOnly
    {
        get { return true; }
    }

    public int IndexOf(T item)
    {
        var arrayIndex = Array.IndexOf(_array, item);
        if (arrayIndex < 0 && item.Equals(_padValue))
            return _array.Length;
        return arrayIndex;
    }

    public T this[int index]
    {
        get
        {
            if (index < 0 || index >= _constantSize)
                throw new IndexOutOfRangeException();
            return index < _array.Length ? _array[index] : _padValue;
        }
        set { throw new NotSupportedException(); }
    }
}
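Hypothetical usage, padding with a space as the other answers do:
var padded = new ConstantSizeReadOnlyArrayWrapper<string>(extractArray, 183, " ");
string item42 = padded[42]; // returns the pad value when the underlying array is shorter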
Ack.
The Array class provides a static Resize method:
if (extractArray.Length < 183)
    Array.Resize<string>(ref extractArray, 183);
However, keep in mind that resizing is problematic for performance, so this method is useful only if you require an array for some reason; if you can, switch to a List instead.
Also, I assume you have a one-dimensional array of strings here, so I use the Length property to check the effective number of items in the array.
Since you've stated that you need to ensure there's 183 indexes, and that you need to pad it if there is not, I would suggest using a List instead of an array. You can do something like:
while (extractList.Count < 183)
{
    extractList.Add(" "); // just add a space
}
If you ABSOLUTELY have to go back to an array, you can use something similar.
I can't say that I would recommend this solution, but I won't let that stop me from posting it! Whether they like to admit it or not, everyone likes LINQ solutions!
Using LINQ, given an array with X elements in it, you can generate an array with exactly Y (183 in your case) elements in it like this:
var items183exactly = extractArray.Length == 183
    ? extractArray
    : extractArray.Take(183)
                  .Concat(Enumerable.Repeat(string.Empty, Math.Max(0, 183 - extractArray.Length)))
                  .ToArray();
If there are fewer than 183 elements, the array will be padded with empty strings. If there are more than 183 elements, the array will be truncated. If there are exactly 183 elements, the array is used as is.
I don't claim that this is efficient or that it is necessarily a good idea. However, it does use LINQ (yippee!) and it is fun.
