I'm splitting a list of (in this example) approximately 190000 items into chunks of 5000 items.
So instead of: List<Object>, Count 190000
it becomes: List<List<Object>>, Count 38 (each with Count 5000)
I do this with the following code:
public static IEnumerable<List<Object>> Split(this IEnumerable<Object> sourceList, int chunkSize)
{
int numberOfLists = (sourceList.Count() / chunkSize) + 1;
List<List<Object>> result = new List<List<object>>();
for (int i = 0; i < numberOfLists; i++)
{
List<Object> subList = new List<Object>();
subList = sourceList.Skip(i * chunkSize).Take(chunkSize).ToList();
result.Add(subList);
}
return result;
}
I call this method (which resides in a helper class) as follows:
var chunkList = (IEnumerable<List<MyObjectClass>>)MyHelper.Split(myObjectList, 5000);
In the above line I explicitly cast the result, which fails with an InvalidCastException. When I use the as operator instead:
var chunkList = MyHelper.Split(myObjectList, 5000) as IEnumerable<List<MyObjectClass>>;
the result is null.
I expected I could use
List<List<MyObjectClass>> chunkList = MyHelper.Split(myObjectList, 5000) as List<List<MyObjectClass>>;
I would like to keep the splitter method as generic as possible. The question is how I can cast the return value correctly. Can someone point out to me how to do this?
Thanks in advance.
As others have stated, the problem is that you are attempting to use the type parameter of List<T> in a variant manner, which is not possible. Instead, you need to make the splitting method generic so that it has its own type parameter that matches that of the list.
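To make the variance point concrete, here is a minimal sketch. The VarianceDemo class and its CanView helper are hypothetical names introduced only for illustration. The helper reports whether a non-null object can be viewed as a given type at run time: IEnumerable<T> is covariant for reference types, while List<T> is invariant, which is why the cast in the question cannot succeed.

```csharp
using System.Collections.Generic;

public static class VarianceDemo
{
    // Hypothetical helper: true when the non-null 'value' is compatible with
    // TTarget, i.e. when an explicit cast to TTarget would not throw.
    public static bool CanView<TTarget>(object value)
    {
        return value is TTarget;
    }
}
```

For example, VarianceDemo.CanView<IEnumerable<object>>(new List<string>()) is true, but VarianceDemo.CanView<IEnumerable<List<string>>>(new List<List<object>>()) is false, because a List<object> is never a List<string>.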
That said, you can turn the method into an iterator block so that it only produces the sub-lists on demand:
public static IEnumerable<List<T>> Partition<T>(this IEnumerable<T> source,
int chunkSize)
{
while (source.Any())
{
yield return source.Take(chunkSize).ToList();
source = source.Skip(chunkSize);
}
}
This would be used as
var chunkList = sourceList.Partition(5000);
Note that the above version is free of the off-by-one error that your original code and the solutions based on it all share.
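The off-by-one error mentioned here comes from computing the number of lists as Count / chunkSize + 1, which yields one extra (empty) list whenever the count is an exact multiple of the chunk size. A small sketch of the arithmetic, using hypothetical helper names (ChunkMath, NaiveCount, CeilingCount) and assuming non-negative counts small enough that the addition does not overflow:

```csharp
public static class ChunkMath
{
    // The formula used in the question: always adds one list,
    // even when itemCount divides evenly by chunkSize.
    public static int NaiveCount(int itemCount, int chunkSize)
    {
        return itemCount / chunkSize + 1;
    }

    // Integer ceiling division: adds a list only for a partial remainder.
    public static int CeilingCount(int itemCount, int chunkSize)
    {
        return (itemCount + chunkSize - 1) / chunkSize;
    }
}
```

With 190000 items and a chunk size of 5000, NaiveCount gives 39 (the last list is empty) while CeilingCount gives 38.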
If you don't care about lazy evaluation there is also the possibility of using this trick with GroupBy to do the partitioning:
int i = 0;
var chunkList = sourceList
.GroupBy(o => i++ / chunkSize) // group into partitions
.Select(Enumerable.ToList) // transform each partition into a List
.ToList(); // force evaluation of query right now
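The GroupBy trick above relies on a captured counter that is mutated inside the lambda. A variant that avoids the side effect is sketched below, using the Select overload that supplies each element's index. The names GroupByChunking and ChunkByIndex are hypothetical, chosen only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupByChunking
{
    // Eagerly splits 'source' into lists of at most 'chunkSize' items.
    public static List<List<T>> ChunkByIndex<T>(this IEnumerable<T> source, int chunkSize)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        return source
            .Select((item, index) => new { item, index })              // pair each item with its position
            .GroupBy(pair => pair.index / chunkSize, pair => pair.item) // the chunk number is the group key
            .Select(g => g.ToList())                                    // materialize each chunk
            .ToList();
    }
}
```

Like the original trick, this buffers the whole sequence before returning anything, so it trades lazy evaluation for brevity.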
You can use a type parameter instead of Object:
public static IEnumerable<List<T>> Split<T>(this IEnumerable<T> sourceList, int chunkSize) {
int numberOfLists = (sourceList.Count() / chunkSize) + 1;
List<List<T>> result = new List<List<T>>();
for (int i = 0; i < numberOfLists; i++)
{
result.Add(sourceList.Skip(i * chunkSize).Take(chunkSize).ToList());
}
return result;
}
Usage:
IEnumerable<List<MyObjectClass>> chunkList = myObjectClassList.Split(5000);
Instead of taking IEnumerable<Object> as a parameter and returning IEnumerable<List<Object>>, make your method generic:
public static IEnumerable<List<T>> Split<T>(this IEnumerable<T> sourceList, int chunkSize)
{
int numberOfLists = (sourceList.Count() / chunkSize) + 1;
var result = new List<List<T>>(numberOfLists);
for (int i = 0; i < numberOfLists; i++)
{
result.Add(sourceList.Skip(i * chunkSize).Take(chunkSize).ToList());
}
return result;
}
You can also use the Batch method from the MoreLINQ library. It is more efficient than your solution, because using Skip().Take() causes unnecessary iteration over the same set of elements every time.
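To illustrate what avoiding the repeated Skip().Take() passes can look like, here is a minimal single-pass sketch. It is not MoreLINQ's implementation; the names ChunkingExtensions and Chunked are hypothetical. Each element of the source is visited exactly once, and no trailing empty list is produced.

```csharp
using System;
using System.Collections.Generic;

public static class ChunkingExtensions
{
    // Validates eagerly, then hands off to the lazy iterator below.
    public static IEnumerable<List<T>> Chunked<T>(this IEnumerable<T> source, int chunkSize)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        return ChunkedIterator(source, chunkSize);
    }

    private static IEnumerable<List<T>> ChunkedIterator<T>(IEnumerable<T> source, int chunkSize)
    {
        var chunk = new List<T>(chunkSize);
        foreach (T item in source)
        {
            chunk.Add(item);
            if (chunk.Count == chunkSize)
            {
                yield return chunk;              // hand out the full chunk
                chunk = new List<T>(chunkSize);  // start a fresh one; never reuse the returned list
            }
        }
        if (chunk.Count > 0)
        {
            yield return chunk;                  // final partial chunk, if any
        }
    }
}
```

It would be called as var chunkList = myObjectList.Chunked(5000); and the result can be enumerated lazily or materialized with ToList().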
How would I use the following LINQ query correctly? I want to create a one-dimensional array that contains only values that are greater than 5. I can't understand why it can't iterate over this multidimensional array, but if I use foreach, it actually iterates.
// Create a multidimensional array, fill it with numbers, and print it.
int[,] myArr = new int[5,6];
int rows = myArr.GetLength(0);
int cols = myArr.GetLength(1);
for (int i = 0; i < rows; i++)
{
for (int k = 0; k < cols; k++)
{
myArr[i,k] = i + k;
Console.Write(myArr[i,k]);
}
Console.WriteLine("");
}
var highList = from val in myArr where (val > 5) select val;
The error is:
Could not find an implementation of the query pattern for source type 'int[*,*]'. 'Where' not found. Are you missing a reference or a using directive for 'System.Linq'?
I thought this might fix the problem:
public static IEnumerator<int> GetEnumerator(int[,] arr)
{
foreach(int i in arr)
{
yield return i;
}
}
But it doesn't implement the iterator.
The problem is that multi-dimensional (rectangular) arrays implement IEnumerable, but not IEnumerable<T>. Fortunately, you can use Cast to fix that - and Cast gets called automatically if you explicitly specify the type of the range variable:
var highList = from int val in myArr where (val > 5) select val;
Or without the unnecessary brackets:
var highList = from int val in myArr where val > 5 select val;
Or using method calls directly, given that it's a pretty trivial query expression:
var highList = myArr.Cast<int>().Where(val => val > 5);
I think this will box each element, however. You could add your own Cast extension method to avoid that:
public static class RectangularArrayExtensions
{
public static IEnumerable<T> Cast<T>(this T[,] source)
{
foreach (T item in source)
{
yield return item;
}
}
}
I am using an extension method which shuffles a generic list. This works:
public static void Shuffle<T>(this IList<T> list)
{
RNGCryptoServiceProvider provider = new RNGCryptoServiceProvider();
int n = list.Count;
while (n > 1)
{
byte[] box = new byte[1];
do provider.GetBytes(box);
while (!(box[0] < n * (Byte.MaxValue / n)));
int k = (box[0] % n);
n--;
T value = list[k];
list[k] = list[n];
list[n] = value;
}
}
I am trying to create another extension method which would use Shuffle(), but would shuffle the items in a list in groups based on a defined group size. This method seems to work when debugging the extension method, but the source list in the calling code still contains the original list after the extension call:
public static void GroupRandomize<T>(this IList<T> sourceList, int groupSize)
{
List<T> shuffledList = new List<T>();
List<T> tempList = new List<T>();
int addCounter = 0;
for (int i = 0; i < sourceList.Count; i++)
{
tempList.Add(sourceList[i]);
// if we've built a full group, or we're done processing the entire list
if ((addCounter == groupSize - 1) || (i == sourceList.Count - 1))
{
tempList.Shuffle();
shuffledList.AddRange(tempList);
tempList.Clear();
addCounter = 0;
}
else
{
addCounter++;
}
}
sourceList = shuffledList;
}
How do I ensure the shuffled list is stored properly into the source list?
sourceList is actually a local variable.
It might be better to return shuffledList and use it in the caller:
var newList = caller.GroupRandomize(5);
sourceList = shuffledList;
This will do nothing unless you are using a ref parameter. You could change your method so that it modifies the sourceList directly:
for(int i = 0; i < sourceList.Count; i++)
sourceList[i] = shuffledList[i];
But I'd recommend changing your approach so that the extension methods return new, shuffled lists, leaving the original lists intact. So instead of:
var list = GetList();
list.Shuffle();
... you would say:
var list = GetList().Shuffle();
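A rough sketch of what such a non-mutating version of the group shuffle could look like follows. The names GroupShuffleExtensions, GroupShuffled and ShuffleInPlace are hypothetical, and the sketch uses System.Random passed in by the caller rather than the RNGCryptoServiceProvider from the question, purely to keep the example short and repeatable.

```csharp
using System;
using System.Collections.Generic;

public static class GroupShuffleExtensions
{
    // Returns a new list in which each consecutive group of 'groupSize'
    // items is shuffled; 'source' itself is left untouched.
    public static List<T> GroupShuffled<T>(this IEnumerable<T> source, int groupSize, Random rng)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (rng == null) throw new ArgumentNullException(nameof(rng));
        if (groupSize <= 0) throw new ArgumentOutOfRangeException(nameof(groupSize));

        var result = new List<T>();
        var group = new List<T>(groupSize);
        foreach (T item in source)
        {
            group.Add(item);
            if (group.Count == groupSize)
            {
                ShuffleInPlace(group, rng);
                result.AddRange(group);
                group.Clear();
            }
        }
        ShuffleInPlace(group, rng);   // trailing partial group (possibly empty)
        result.AddRange(group);
        return result;
    }

    // Fisher-Yates shuffle of a private working list.
    private static void ShuffleInPlace<T>(IList<T> list, Random rng)
    {
        for (int i = list.Count - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);   // 0 <= j <= i
            T tmp = list[i];
            list[i] = list[j];
            list[j] = tmp;
        }
    }
}
```

The caller then writes var shuffled = GetList().GroupShuffled(5, new Random()); and decides whether to keep or replace the original list.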
Make it a regular method instead of an extension so you can pass it in by reference:
public static void GroupRandomize<T>(ref IList<T> sourceList, int groupSize) {
// ... stuff
sourceList = shuffledList;
}
Or, if you don't want to change the signature of the method, you could do something like:
sourceList.Clear();
sourceList.AddRange( shuffledList );
Edit:
As stated by bperniciaro, the AddRange method is not available on the IList<T> interface.
StriplingWarrior already suggested an implementation that does what AddRange would do, so instead I will just improve his answer a little by pointing to another answer, by hvostt, that implements AddRange as an extension method of IList<T>.
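For reference, such an extension might look roughly like the sketch below. This is an illustration only, not the code from the answer referred to above, and the class name IListExtensions is hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class IListExtensions
{
    // Appends all 'items' to 'list'. Uses List<T>.AddRange when the runtime
    // type allows it and falls back to repeated Add calls otherwise.
    public static void AddRange<T>(this IList<T> list, IEnumerable<T> items)
    {
        if (list == null) throw new ArgumentNullException(nameof(list));
        if (items == null) throw new ArgumentNullException(nameof(items));

        var concrete = list as List<T>;
        if (concrete != null)
        {
            concrete.AddRange(items);
            return;
        }
        foreach (T item in items)
        {
            list.Add(item);
        }
    }
}
```

With this in scope, sourceList.Clear(); sourceList.AddRange(shuffledList); compiles for an IList<T> parameter. Note that fixed-size implementations such as arrays throw NotSupportedException from Clear and Add, so this only works for resizable lists.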
I wrote a SkipLast method. When I call it with an int array, I expect to get only 2 elements back, not 4.
What is wrong?
Thanks
public static class myclass
{
public static IEnumerable<T> SkipLast<T>(this IEnumerable<T> source, int n)
{
return source.Reverse().Skip(n).Reverse();
}
}
class Program
{
static void Main(string[] args)
{
int [] a = new int[] {5, 6, 7, 8};
ArrayList a1 = new ArrayList();
a.SkipLast(2);
for( int i = 0; i <a.Length; i++)
{
Console.Write(a[i]);
}
}
}
You need to capture the result of the call:
var newlist = a.SkipLast(2).ToList();
for( int i = 0; i <newlist.Count; i++)
{
Console.Write(newlist[i]);
}
Your method returns a new sequence with the last elements skipped; it does not modify the original array.
If you want to update the original variable, assign the result back to it: a = a.SkipLast(2).ToArray();
You should assign the result, not just put a.SkipLast(2):
a = a.SkipLast(2).ToArray(); // <- if you want to change "a" and loop on a
for( int i = 0; i <a.Length; i++) { ...
When you just call a.SkipLast(2), it creates an IEnumerable<int> and then discards it.
The most readable solution, IMHO, is to use foreach which is very convenient with LINQ:
...
int [] a = new int[] {5, 6, 7, 8};
foreach(int item in a.SkipLast(2))
Console.Write(item);
The other replies have answered your question, but the following implementation would be more efficient, since it doesn't make two copies of the array in order to reverse it twice. It does iterate the collection twice, though (or rather, once to count and then count-n accesses):
public static IEnumerable<T> SkipLast<T>(this IEnumerable<T> source, int n)
{
n = source.Count() - n;
return source.TakeWhile(_ => n-- > 0);
}
Actually, if source is a type that implements Count without iteration (such as an array or a List) this will only access the elements count-n times, so it will be extremely efficient for those types.
Here is a better solution that only iterates the sequence once. Its data requirements are such that it only needs a buffer with n elements, which makes it very efficient if n is small compared with the size of the sequence:
public static IEnumerable<T> SkipLast<T>(this IEnumerable<T> source, int n)
{
    // Nothing to skip: pass everything through (also avoids count % 0 below).
    if (n <= 0)
    {
        foreach (T item in source)
            yield return item;
        yield break;
    }
    int count = 0;
    T[] buffer = new T[n]; // circular buffer holding the last n items seen
    foreach (T item in source)
    {
        if (count >= n)
            yield return buffer[count % n]; // oldest buffered item is now safe to emit
        buffer[count++ % n] = item;
    }
}
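The same single-pass idea can also be sketched with a Queue<T> in place of the hand-rolled circular buffer, which some may find easier to read. The names SkipLastExtensions and SkipLastBuffered are hypothetical; the method is deliberately not called SkipLast because newer framework versions (.NET Core 2.0 and later) already ship Enumerable.SkipLast.

```csharp
using System.Collections.Generic;

public static class SkipLastExtensions
{
    // Yields every item except the last 'n'. At most n + 1 items are buffered.
    public static IEnumerable<T> SkipLastBuffered<T>(this IEnumerable<T> source, int n)
    {
        var pending = new Queue<T>();
        foreach (T item in source)
        {
            pending.Enqueue(item);
            if (pending.Count > n)
            {
                // More than n items are waiting, so the oldest one
                // cannot be among the last n of the sequence.
                yield return pending.Dequeue();
            }
        }
        // Whatever remains in the queue are the last n items; they are dropped.
    }
}
```

For the array from the question, new[] { 5, 6, 7, 8 }.SkipLastBuffered(2) yields 5 and 6.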
Change your code to,
foreach (var r in a.SkipLast(2))
{
Console.Write(r);
}
for three reasons:
The SkipLast function returns a new sequence; it doesn't change the original array.
What is the point of using an indexer with IEnumerable? It imposes a needless count.
This code is easy to read, easier to type and shows intent.
For a more efficient generic SkipLast see Matthew's buffer with enumerator.
Your example could use a more specialised SkipLast,
public static IEnumerable<T> SkipLast<T>(this IList<T> source, int n = 1)
{
for (var i = 0; i < (source.Count - n); i++)
{
yield return source[i];
}
}
I have a collection that contains, let's say, 100 items.
Collection<int> myCollection = new Collection<int>();
for(int i = 1; i <= 100; i++)
{
myCollection.Add(i);
}
}
How can I randomly select a percentage (e.g. 30%) of the items from this collection?
Try this:
var rand = new Random();
var top30percent = myCollection.OrderBy(x=> rand.Next(myCollection.Count))
.Take((int)(0.3f*myCollection.Count)).ToList();
You can remove the ToList() if you want a deferred query.
There are two parts to your question. First, you must shuffle your collection in order to select items randomly. You can do that properly with the Fisher-Yates shuffle, or just order your items using a pseudo-random generator.
The Fisher-Yates shuffle comes from this popular answer :
public static IList<T> Shuffle<T>(this IList<T> list)
{
Random rng = new Random();
int n = list.Count;
while (n > 1) {
n--;
int k = rng.Next(n + 1);
T value = list[k];
list[k] = list[n];
list[n] = value;
}
return list;
}
However, I'm returning the list so that the call can be chained cleanly with the Take step.
Also, if you don't really need to shuffle cleanly, you can use a simple OrderBy with either i => random.Next() or i => Guid.NewGuid() as the lambda expression.
Secondly, once it's shuffled, now you need to take a percentage of items. You can do this simply by using the Take LINQ method.
Like the Shuffle method, you can write it as an extension method:
public static IEnumerable<int> TakePercentage(this IList<int> list, int percentage)
{
return list.Take(percentage * list.Count / 100);
}
If you prefer receiving a decimal (e.g. 0.3) directly :
public static IEnumerable<int> TakePercentage(this IList<int> list, double percentage)
{
return list.Take((int)(percentage * list.Count));
}
Finally, to use it, it's quite simple :
var thirtyPercent = myCollection.Shuffle().TakePercentage(30);
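Note that the Shuffle above reorders the caller's collection, and TakePercentage is written for int only. A generic, non-mutating alternative is sketched below under a few assumptions: the names SamplingExtensions and RandomSample are hypothetical, the fraction is given as a value between 0 and 1, and the item count is truncated toward zero. It shuffles only as many positions of a private copy as are actually needed (a partial Fisher-Yates shuffle).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SamplingExtensions
{
    // Returns a random sample containing 'fraction' of the items in 'source'
    // (rounded down), without modifying 'source'.
    public static List<T> RandomSample<T>(this IEnumerable<T> source, double fraction, Random rng)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (rng == null) throw new ArgumentNullException(nameof(rng));
        if (fraction < 0.0 || fraction > 1.0) throw new ArgumentOutOfRangeException(nameof(fraction));

        List<T> pool = source.ToList();           // private copy; the caller's data stays intact
        int take = (int)(fraction * pool.Count);  // truncates toward zero

        for (int i = 0; i < take; i++)
        {
            int j = rng.Next(i, pool.Count);      // pick from the not-yet-chosen tail
            T tmp = pool[i];
            pool[i] = pool[j];
            pool[j] = tmp;
        }
        return pool.GetRange(0, take);
    }
}
```

Usage would be along the lines of var thirtyPercent = myCollection.RandomSample(0.3, new Random());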
I have a 2-dimensional jagged array (though it's always rectangular), which I initialize using the traditional loop:
var myArr = new double[rowCount][];
for (int i = 0; i < rowCount; i++) {
myArr[i] = new double[colCount];
}
I thought maybe some LINQ function would give me an elegant way to do this in one statement. However, the closest I can come up with is this:
double[][] myArr = Enumerable.Repeat(new double[colCount], rowCount).ToArray();
The problem is that it seems to be creating a single double[colCount] and assigning references to that instead of allocating a new array for each row. Is there a way to do this without getting too cryptic?
double[][] myArr = Enumerable
.Range(0, rowCount)
.Select(i => new double[colCount])
.ToArray();
What you have won't work as the new occurs before the call to Repeat. You need something that also repeats the creation of the array. This can be achieved using the Enumerable.Range method to generate a range and then performing a Select operation that maps each element of the range to a new array instance (as in Amy B's answer).
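The difference between the two approaches can be checked directly. The sketch below wraps both in hypothetical helper methods (RepeatAliasingDemo, WithRepeat, WithRange) so that the resulting rows can be compared by reference.

```csharp
using System.Linq;

public static class RepeatAliasingDemo
{
    // 'new double[colCount]' is evaluated once, before Repeat is called,
    // so every row refers to the same array instance.
    public static double[][] WithRepeat(int rowCount, int colCount)
    {
        return Enumerable.Repeat(new double[colCount], rowCount).ToArray();
    }

    // The lambda runs once per element of the range, so each row
    // gets its own freshly allocated array.
    public static double[][] WithRange(int rowCount, int colCount)
    {
        return Enumerable.Range(0, rowCount)
                         .Select(_ => new double[colCount])
                         .ToArray();
    }
}
```

After var shared = RepeatAliasingDemo.WithRepeat(3, 2); shared[0][0] = 1.0; the value also shows up in shared[1][0] and shared[2][0], whereas rows produced by WithRange are independent of each other.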
However, I think that you are trying to use LINQ where it isn't really appropriate to do so in this case. What you had prior to the LINQ solution is just fine. Of course, if you wanted a LINQ-style approach similar to Enumerable.Repeat, you could write your own extension method that generates a new item, such as:
public static IEnumerable<TResult> Repeat<TResult>(
Func<TResult> generator,
int count)
{
for (int i = 0; i < count; i++)
{
yield return generator();
}
}
Then you can call it as follows:
var result = Repeat(() => new double[colCount], rowCount).ToArray();
The behavior is correct - Repeat() returns a sequence that contains the supplied object multiple times. You can do the following trick.
double[][] myArr = Enumerable
.Repeat(0, rowCount)
.Select(i => new double[colCount])
.ToArray();
You can't do that with the Repeat method : the element parameter is only evaluated once, so indeed it always repeats the same instance. Instead, you could create a method to do what you want, which would take a lambda instead of a value :
public static IEnumerable<T> Sequence<T>(Func<T> generator, int count)
{
for (int i = 0; i < count; i++)
{
yield return generator();
}
}
...
var myArr = Sequence(() => new double[colCount], rowCount).ToArray();
I just wrote this function...
public static T[][] GetMatrix<T>(int m, int n)
{
var v = new T[m][];
for(int i=0;i<m; ++i) v[i] = new T[n];
return v;
}
Seems to work.
Usage:
float[][] vertices = GetMatrix<float>(8, 3);
What about
var myArr = new double[rowCount, colCount];
or
double[,] myArr = new double[rowCount, colCount];
Reference: http://msdn.microsoft.com/en-us/library/aa691346(v=vs.71).aspx