I thought I knew everything about IEnumerable<T>, but I just ran into a case that I cannot explain. When we call the .Where LINQ method on an IEnumerable, execution is deferred until the object is enumerated, isn't it?
So how do you explain the sample below:
public class CTest
{
public CTest(int amount)
{
Amount = amount;
}
public int Amount { get; set; }
public override string ToString()
{
return $"Amount:{Amount}";
}
public static IEnumerable<CTest> GenerateEnumerableTest()
{
var tab = new List<int> { 2, 5, 10, 12 };
return tab.Select(t => new CTest(t));
}
}
Nothing bad so far!
But the following test gives me an unexpected result, despite what I know about IEnumerable<T> and the .Where LINQ method:
[TestMethod]
public void TestCSharp()
{
var tab = CTest.GenerateEnumerableTest();
foreach (var item in tab.Where(i => i.Amount > 6))
{
item.Amount = item.Amount * 2;
}
foreach (var t in tab)
{
var s = t.ToString();
Debug.Print(s);
}
}
No item from tab is multiplied by 2. The output is:
Amount:2
Amount:5
Amount:10
Amount:12
Can anyone explain why, after enumerating tab, I get the original values?
Of course, everything works fine if I call .ToList() right after calling the GenerateEnumerableTest() method.
var tab = CTest.GenerateEnumerableTest();
This tab is a LINQ query that generates CTest instances initialized from int values, which come from a list of integers that never changes. So whenever you enumerate this query you get new instances that look the "same" (with the original Amount).
If you want to "materialize" this query you could use ToList and then change them.
Otherwise you are modifying CTest instances that exist only in the first foreach loop. The second loop enumerates other CTest instances with the unmodified Amount.
So the query only contains the information about how to get the items. You could just as well call the method directly:
foreach (var item in CTest.GenerateEnumerableTest().Where(i => i.Amount > 6))
{
item.Amount = item.Amount * 2;
}
foreach (var t in CTest.GenerateEnumerableTest())
{
// now you don't expect them to be changed, do you?
}
Like many LINQ operations, Select is lazy and uses deferred execution, so your lambda expression runs again every time the query is enumerated, and each enumeration creates new CTest instances. The changes made in the first loop are applied to objects that the second loop never sees. This is why everything works fine after calling .ToList() right after the GenerateEnumerableTest() method:
var tab = CTest.GenerateEnumerableTest().ToList();
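A minimal sketch of the difference, assuming a hypothetical Item class in place of CTest (the names Item, DeferredDemo, MakeQuery and DoubleThenSum are illustrative and not from the question):

```csharp
using System.Collections.Generic;
using System.Linq;

public class Item
{
    public int Amount { get; set; }
}

public static class DeferredDemo
{
    // Returns a deferred query: the Select lambda runs on every enumeration.
    public static IEnumerable<Item> MakeQuery()
    {
        var source = new List<int> { 2, 5, 10, 12 };
        return source.Select(n => new Item { Amount = n });
    }

    // Doubles every amount above 6, then enumerates the sequence again and sums it.
    public static int DoubleThenSum(IEnumerable<Item> items)
    {
        foreach (var item in items.Where(i => i.Amount > 6))
        {
            item.Amount *= 2;
        }
        // If items is a deferred query, this second pass builds fresh objects.
        return items.Sum(i => i.Amount);
    }
}
```

Calling DoubleThenSum(MakeQuery()) sums the untouched values, while DoubleThenSum(MakeQuery().ToList()) sums the doubled ones, because the list holds on to the same objects between the two passes.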
I checked and tried this example: How to write Asynchronous LINQ query?
This works well and executes both procedures asynchronously, but my goal is a query where, as soon as I get the first result, I can start using that value while the query is still executing and looking for more values.
I did this while programming for Android and the result was amazing and really fast, but I have no clue how to do it in C# using LINQ.
Asynchronous sequences are modeled in .NET as IObservable<T>. Reactive Extensions (Rx) is the asynchronous counterpart of LINQ.
You can generate an observable sequence in any number of ways, such as through a call to Generate:
var sequence = Observable.Generate(0,
i => i < 10,
i => i + 1,
i => SomeExpensiveGeneratorFunction());
And then query it using all of the regular LINQ functions (which as a result allows for the use of query syntax) along with a number of additional operations that make sense specifically for asynchronous sequence (and also a lot of different ways of creating observable sequences) such as:
var query = from item in sequence
where ConditionIsTrue(item)
select item.ToString();
In short, this does exactly what you want. The generator notifies its subscribers whenever it produces a value (or when it is done) and keeps generating values without waiting for the subscribers to finish. The Where method subscribes to sequence and notifies its own subscribers whenever it observes a value that passes the condition. Select subscribes to the sequence returned by Where, performs its transformation (asynchronously) whenever it gets a value, and then pushes the result to all of its subscribers.
I have modified TheSoftwareJedi's answer from the link you gave.
You can raise an event for the first item from the asynchronous class and use it to start your work.
Here's the class:
public static class AsynchronousQueryExecutor
{
private static Action<object> m_OnFirstItemProcessed;
public static void Call<T>(IEnumerable<T> query, Action<IEnumerable<T>> callback, Action<Exception> errorCallback, Action<object> OnFirstItemProcessed)
{
m_OnFirstItemProcessed = OnFirstItemProcessed;
Func<IEnumerable<T>, IEnumerable<T>> func =
new Func<IEnumerable<T>, IEnumerable<T>>(InnerEnumerate<T>);
IEnumerable<T> result = null;
IAsyncResult ar = func.BeginInvoke(
query,
new AsyncCallback(delegate(IAsyncResult arr)
{
try
{
result = ((Func<IEnumerable<T>, IEnumerable<T>>)((AsyncResult)arr).AsyncDelegate).EndInvoke(arr);
}
catch (Exception ex)
{
if (errorCallback != null)
{
errorCallback(ex);
}
return;
}
//errors from inside here are the callbacks problem
//I think it would be confusing to report them
callback(result);
}),
null);
}
private static IEnumerable<T> InnerEnumerate<T>(IEnumerable<T> query)
{
int iCount = 0;
foreach (var item in query) //the method hangs here while the query executes
{
if (iCount == 0)
{
iCount++;
m_OnFirstItemProcessed(item);
}
yield return item;
}
}
}
Here are the handlers:
private void OnFirstItem(object value) // Your first item is processed here.
{
//You can start your work here.
}
public void HandleResults(IEnumerable<int> results)
{
foreach (var item in results)
{
}
}
public void HandleError(Exception ex)
{
}
And here's how you call the function:
private void buttonclick(object sender, EventArgs e)
{
IEnumerable<int> range = Enumerable.Range(1,10000);
var qry = TestSlowLoadingEnumerable(range);
//We begin the call and give it our callback delegate
//and a delegate to an error handler
AsynchronousQueryExecutor.Call(qry, HandleResults, HandleError, OnFirstItem);
}
If this meets your expectation, you can use this to start your work with the first item processed.
Try again ...
If I understand you, the logic you want is something like ...
var query = getData.Where( ... );
query.AsParallel().ForAll(r => {
//other stuff
});
What will happen here ...
Well, in short, this evaluates to something like: while iterating through the query results in parallel, perform the logic where the comment is.
This is parallel and makes use of the thread pool managed by .NET to get through the results as fast as possible.
This is an automatically managed parallel operation.
It's also worth noting that if I do this ...
var query = getData.Where( ... );
... no actual code is run until I begin iterating the IQueryable, and by declaring the operation a parallel one the framework is able to operate on more than one of the results at any point in time by threading the code for you.
The ForAll is essentially a normal foreach loop in which the iterations are handled in parallel.
The logic you put in there could call some sort of callback if you wanted, but that's down to how you wrap this code ...
Might I suggest something like this:
// operation and callback take an item and return nothing, so they are Action<T>
void DoAsync<T>(IQueryable<T> items, Action<T> operation, Action<T> callback)
{
items.AsParallel().ForAll(x => {
operation(x);
callback(x);
});
}
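As a self-contained sketch of the same idea: the PLINQ method that runs a delegate on every element is ForAll on ParallelQuery<T>. The names ParallelDemo, DoParallel and SquareAll are made up for this illustration, and a ConcurrentBag is assumed as the thread-safe place to collect results.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

public static class ParallelDemo
{
    // Runs operation and then callback on each item, with items handled in parallel.
    public static void DoParallel<T>(IEnumerable<T> items, Action<T> operation, Action<T> callback)
    {
        items.AsParallel().ForAll(x =>
        {
            operation(x);
            callback(x);
        });
    }

    // Example use: square every number and collect the results thread-safely.
    public static int[] SquareAll(IEnumerable<int> numbers)
    {
        var results = new ConcurrentBag<int>();
        DoParallel(numbers, n => { /* no-op placeholder for real work */ }, n => results.Add(n * n));
        // Parallel execution gives no ordering guarantee, so sort before returning.
        return results.OrderBy(x => x).ToArray();
    }
}
```

Note that ForAll blocks the calling thread until every item has been handled, so this is parallel rather than asynchronous in the async/await sense.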
This is pretty simple with the TPL.
Here's a dummy "slow" enumerator that has to do a bit of work between getting items:
static IEnumerable<int> SlowEnumerator()
{
for (int i = 0; i < 10; i++)
{
Thread.Sleep(1000);
yield return i;
}
}
Here's a dummy bit of work to do with each item in the sequence:
private static void DoWork(int i)
{
Thread.Sleep(1000);
Console.WriteLine("{0} at {1}", i, DateTime.Now);
}
And here's how you can simultaneously run the "bit of work" on one item that the enumerator has returned and ask the enumerator for the next item:
foreach (var i in SlowEnumerator())
{
Task.Run(() => DoWork(i));
}
You should get work done every second - not every 2 seconds as you would expect if you had to interleave the two types of work:
0 at 20/01/2015 10:56:52
1 at 20/01/2015 10:56:53
2 at 20/01/2015 10:56:54
3 at 20/01/2015 10:56:55
4 at 20/01/2015 10:56:56
5 at 20/01/2015 10:56:57
6 at 20/01/2015 10:56:58
7 at 20/01/2015 10:56:59
8 at 20/01/2015 10:57:00
9 at 20/01/2015 10:57:01
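The loop above starts the tasks but never waits for them. If the caller needs to know when all of the work has finished, one possible sketch is to keep the tasks and await Task.WhenAll. The names OverlapDemo and ProcessAllAsync are hypothetical and not part of the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class OverlapDemo
{
    // Starts work for each item as soon as the enumerator yields it,
    // then waits for all of the started work to complete.
    // Returns the number of items that were processed.
    public static async Task<int> ProcessAllAsync(IEnumerable<int> source, Action<int> work)
    {
        var tasks = new List<Task>();
        foreach (var i in source)
        {
            var current = i; // explicit copy, safe on older compilers too
            tasks.Add(Task.Run(() => work(current)));
        }
        await Task.WhenAll(tasks);
        return tasks.Count;
    }
}
```

With the enumerator and work method from this answer, the call would be await OverlapDemo.ProcessAllAsync(SlowEnumerator(), DoWork).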
I was wondering, is there an elegant way to remove multiple items from a generic collection (in my case, a List<T>) without doing something such as specifying a predicate in a LINQ query to find the items to delete?
I'm doing a bit of batch processing, in which I'm filling a List<T> with Record object types that need to be processed. This processing concludes with each object being inserted into a database. Instead of building the list, and then looping through each individual member and processing/inserting it, I want to perform transactional bulk inserts with groups of N items from the list because it's less resource intensive (where N represents the BatchSize that I can put in a config file, or equivalent).
I'm looking to do something like:
public void ProcessRecords()
{
// list of Records will be a collection of List<Record>
var listOfRecords = GetListOfRecordsFromDb( _connectionString );
var batchSize = Convert.ToInt32( ConfigurationManager.AppSettings["BatchSize"] );
do
{
var recordSubset = listOfRecords.Take(batchSize);
DoProcessingStuffThatHappensBeforeInsert( recordSubset );
InsertBatchOfRecords( recordSubset );
// now I want to remove the objects added to recordSubset from the original list
// the size of listOfRecords afterwards should be listOfRecords.Count - batchSize
} while( listOfRecords.Any() );
}
I'm looking for a way to do this all at once, instead of iterating through the subset and removing the items that way, such as:
foreach(Record rec in recordSubset)
{
if( listOfRecords.Contains(rec) )
{
listOfRecords.Remove(rec);
}
}
I was looking at using List<T>.RemoveRange( 0, batchSize ), but wanted to get some StackOverflow feedback first :) What methods do you use to maximize the efficiency of your batch processing algorithms in C#?
Any help/suggestions/hints are much appreciated!
With an extension method:
public static IEnumerable<List<T>> ToBatches<T>(this List<T> list, int batchSize)
{
int index = 0;
List<T> batch = new List<T>(batchSize);
foreach (T item in list)
{
batch.Add(item);
index++;
if (index == batchSize)
{
index = 0;
yield return batch;
batch = new List<T>(batchSize);
}
}
// only yield the trailing batch if it actually holds items
if (batch.Count > 0)
yield return batch;
}
You can split the input sequence into batches:
foreach(var batch in listOfRecords.ToBatches(batchSize))
{
DoProcessingStuffThatHappensBeforeInsert(batch);
InsertBatchOfRecords(batch);
}
MoreLINQ has a Batch extension method that would allow you to call:
var listOfRecords = GetListOfRecordsFromDb( _connectionString );
var batchSize = Convert.ToInt32( ConfigurationManager.AppSettings["BatchSize"] );
foreach(var batch in listOfRecords.Batch(batchSize))
{
DoProcessingStuffThatHappensBeforeInsert(batch);
InsertBatchOfRecords(batch);
}
You wouldn't need to bother taking stuff out of the listOfRecords.
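On .NET 6 or later, a similar result is available without an extra library through Enumerable.Chunk. The sketch below only records the batch sizes so that it stays self-contained; ChunkDemo and BatchSizes are hypothetical names, and a real caller would do its processing and inserting where the comment is.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ChunkDemo
{
    // Splits records into arrays of at most batchSize items (requires .NET 6 or later)
    // and returns the size of each batch that was produced.
    public static List<int> BatchSizes<T>(IEnumerable<T> records, int batchSize)
    {
        var sizes = new List<int>();
        foreach (T[] batch in records.Chunk(batchSize))
        {
            // A real caller would process and insert the batch here.
            sizes.Add(batch.Length);
        }
        return sizes;
    }
}
```

Chunk never produces an empty trailing batch, so a source whose length is an exact multiple of batchSize yields only full batches.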
IQueryable<SomeType> collection = GetCollection();
foreach (var c in collection)
{
//do some complex checking that can't be embedded in a query
//based on results from prev line we want to discard the 'c' object
}
//here I only want the results of collection - the discarded objects
So with that simple code, what is the best way to get the results? Should I create a List just before the foreach and insert the objects I want to keep, or is there a better way to do this type of thing?
I know there are other posts on similar topics but I just don't feel I'm getting what I need out of them.
Edit: I tried this
var collection = GetCollection().Where(s =>
{
if (s.property == 1)
{
int num= Number(s);
double avg = Avg(s.x);
if (num > avg)
return true;
else
return false;
}
else return false;
});
but got "A lambda expression with a statement body cannot be converted to an expression tree" at compile time. Did I do something wrong?
//do some complex checking that can't be embedded in a query
I don't get it. You can pass a delegate which can point to a very complex function (Turing-complete) that checks whether you should discard it or not:
var result = GetCollection().AsEnumerable().Where(c => {
// ...
// process "c"
// return true if you want it in the collection
});
If you want, you can refactor it in another function:
var result = GetCollection().AsEnumerable().Where(FunctionThatChecksToDiscardOrNot);
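A small runnable sketch of this, assuming an in-memory IQueryable<int> as a stand-in for the real collection. FilterDemo, Keep and FilterInMemory are hypothetical names, and the check inside Keep is arbitrary.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class FilterDemo
{
    // Hypothetical "complex" check that a query provider could not translate.
    private static bool Keep(int value)
    {
        int half = value / 2;
        return value % 2 == 0 && half > 1;
    }

    public static List<int> FilterInMemory(IQueryable<int> source)
    {
        // AsEnumerable switches to LINQ to Objects, so ordinary methods and
        // statement-bodied lambdas are allowed as predicates from here on.
        return source.AsEnumerable().Where(Keep).ToList();
    }
}
```

The trade-off is that everything before AsEnumerable is still executed by the query provider, but the filter itself runs in memory, so all candidate rows are fetched first.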
If you wrap it into another method, you can use yield return and then iterate over the returned collection, like so:
public IEnumerable<SomeType> FindResults(IQueryable<SomeType> collection) {
foreach (var c in collection)
{
if (doComplicatedQuery(c)) {
yield return c;
}
}
}
// elsewhere
foreach (var goodItem in FindResults(GetCollection())) {
// do stuff.
}
What is the best way to approach removing items from a collection in C#, once the item is known, but not its index? This is one way to do it, but it seems inelegant at best.
//Remove the existing role assignment for the user.
int cnt = 0;
int assToDelete = 0;
foreach (SPRoleAssignment spAssignment in workspace.RoleAssignments)
{
if (spAssignment.Member.Name == shortName)
{
assToDelete = cnt;
}
cnt++;
}
workspace.RoleAssignments.Remove(assToDelete);
What I would really like to do is find the item to remove by property (in this case, name) without looping through the entire collection and using 2 additional variables.
If RoleAssignments is a List<T> you can use the following code.
workspace.RoleAssignments.RemoveAll(x => x.Member.Name == shortName);
If you want to access members of the collection by one of their properties, you might consider using a Dictionary<TKey, TValue> or KeyedCollection<TKey, TItem> instead. This way you don't have to search for the item you're looking for.
Otherwise, you could at least do this:
foreach (SPRoleAssignment spAssignment in workspace.RoleAssignments)
{
if (spAssignment.Member.Name == shortName)
{
workspace.RoleAssignments.Remove(spAssignment);
break;
}
}
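The keyed-collection suggestion can be sketched as follows. Assignment and AssignmentsByName are hypothetical stand-ins for illustration only (the real SharePoint types are not used here), and the sketch assumes member names are unique.

```csharp
using System.Collections.ObjectModel;

// Hypothetical item type standing in for a role assignment.
public class Assignment
{
    public string MemberName { get; set; }
}

// KeyedCollection derives the key from each item, so items can be
// looked up and removed by key without a manual search loop.
public class AssignmentsByName : KeyedCollection<string, Assignment>
{
    protected override string GetKeyForItem(Assignment item)
    {
        return item.MemberName;
    }
}
```

With such a collection, removing by name is a single call, for example assignments.Remove(shortName), which returns false when no item has that key.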
@smaclell asked in a comment to @sambo99 why reverse iteration was more efficient.
Sometimes it's more efficient. Suppose you have a list of people, and you want to remove or filter out all customers with a credit rating < 1000.
We have the following data
"Bob" 999
"Mary" 999
"Ted" 1000
If we were to iterate forward, we'd soon get into trouble
for( int idx = 0; idx < list.Count ; idx++ )
{
if( list[idx].Rating < 1000 )
{
list.RemoveAt(idx); // whoops!
}
}
At idx = 0 we remove Bob, which then shifts all remaining elements left. The next time through the loop idx = 1, but list[1] is now Ted instead of Mary. We end up skipping Mary by mistake. We could use a while loop, and we could introduce more variables.
Or, we just reverse iterate:
for (int idx = list.Count-1; idx >= 0; idx--)
{
if (list[idx].Rating < 1000)
{
list.RemoveAt(idx);
}
}
All the indexes to the left of the removed item stay the same, so you don't skip any items.
The same principle applies if you're given a list of indexes to remove from an array. In order to keep things straight you need to sort the list and then remove the items from highest index to lowest.
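For the index-list case, a minimal sketch might look like this. IndexRemoval and RemoveAtIndexes are hypothetical names, and a List<T> is used because arrays cannot shrink in place.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class IndexRemoval
{
    // Removes the given positions from list, highest index first, so that
    // earlier removals never shift the positions still waiting to be removed.
    public static void RemoveAtIndexes<T>(List<T> list, IEnumerable<int> indexes)
    {
        // Distinct guards against removing the same position twice.
        foreach (int idx in indexes.Distinct().OrderByDescending(i => i))
        {
            list.RemoveAt(idx);
        }
    }
}
```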
Now you can just use List<T>.RemoveAll with a lambda and declare what you're doing in a straightforward manner.
list.RemoveAll(o => o.Rating < 1000);
For this case of removing a single item, iterating backwards is no more efficient than iterating forwards. You could also use List<T>.FindIndex for this.
int removeIndex = list.FindIndex(o => o.Name == "Ted");
if( removeIndex != -1 )
{
list.RemoveAt(removeIndex);
}
If it's an ICollection then you won't have a RemoveAll method. Here's an extension method that will do it:
public static void RemoveAll<T>(this ICollection<T> source,
Func<T, bool> predicate)
{
if (source == null)
throw new ArgumentNullException("source", "source is null.");
if (predicate == null)
throw new ArgumentNullException("predicate", "predicate is null.");
source.Where(predicate).ToList().ForEach(e => source.Remove(e));
}
Based on:
http://phejndorf.wordpress.com/2011/03/09/a-removeall-extension-for-the-collection-class/
For a simple List<T>, the most efficient way seems to be the predicate-based RemoveAll method.
Eg.
workspace.RoleAssignments.RemoveAll(x => x.Member.Name == shortName);
The reasons are:
The RemoveAll method is implemented in List<T> and has access to the internal array storing the actual data. It compacts the kept items toward the front of that array in a single pass.
The RemoveAt method is comparatively slow when called repeatedly: each call shifts every element after the removed index down by one, so removing many items one at a time costs O(n^2) in the worst case. This means reverse iteration with RemoveAt is still slower than RemoveAll for List<T>.
If you are stuck implementing this in the pre-C# 3.0 era, you have 2 options.
The easily maintainable option: copy all the items you want to keep into a new list and swap it with the underlying list.
Eg.
List<int> list = GetList();
List<int> list2 = new List<int>();
foreach (int i in list)
{
if (!(i % 2 == 0)) // keep the odd numbers
{
list2.Add(i);
}
}
list = list2; // swap in the filtered list
Or
The tricky slightly faster option, which involves shifting all the data in the list down when it does not match and then resizing the array.
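A sketch of that second option, written against List<T> with hypothetical names (CompactDemo, KeepWhere). It is only meant to show the shifting technique and is not a copy of how the framework implements RemoveAll.

```csharp
using System;
using System.Collections.Generic;

public static class CompactDemo
{
    // Keeps only the items for which keep returns true, in a single pass.
    public static void KeepWhere<T>(List<T> list, Predicate<T> keep)
    {
        int write = 0;
        for (int read = 0; read < list.Count; read++)
        {
            if (keep(list[read]))
            {
                // Shift the kept item down into the next free slot.
                list[write] = list[read];
                write++;
            }
        }
        // Drop the leftover tail in one call.
        list.RemoveRange(write, list.Count - write);
    }
}
```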
If you are removing stuff really frequently from a list, perhaps another structure like a Hashtable (.NET 1.1), a Dictionary (.NET 2.0) or a HashSet (.NET 3.5) is better suited for this purpose.
What type is the collection? If it's List, you can use the helpful "RemoveAll":
int cnt = workspace.RoleAssignments
.RemoveAll(spa => spa.Member.Name == shortName);
(This works in .NET 2.0. Of course, if you don't have the newer compiler, you'll have to use "delegate (SPRoleAssignment spa) { return spa.Member.Name == shortName; }" instead of the nice lambda syntax.)
Another approach if it's not a List, but still an ICollection:
var toRemove = workspace.RoleAssignments
.FirstOrDefault(spa => spa.Member.Name == shortName);
if (toRemove != null) workspace.RoleAssignments.Remove(toRemove);
This requires the Enumerable extension methods. (You can copy the Mono ones in, if you are stuck on .NET 2.0). If it's some custom collection that cannot take an item, but MUST take an index, some of the other Enumerable methods, such as Select, pass in the integer index for you.
This is my generic solution
public static IEnumerable<T> Remove<T>(this IEnumerable<T> items, Func<T, bool> match)
{
var list = items.ToList();
for (int idx = 0; idx < list.Count; idx++)
{
if (match(list[idx]))
{
list.RemoveAt(idx);
idx--; // the list is 1 item shorter
}
}
return list.AsEnumerable();
}
It would look much simpler if extension methods supported passing by reference!
Usage:
var result = new string[] { "mike", "john", "ali" };
result = result.Remove(x => x == "mike").ToArray();
Assert.IsTrue(result.Length == 2);
EDIT: ensured that the list looping remains valid even when deleting items by decrementing the index (idx).
Here is a pretty good way to do it
http://support.microsoft.com/kb/555972
System.Collections.ArrayList arr = new System.Collections.ArrayList();
arr.Add("1");
arr.Add("2");
arr.Add("3");
/*This throws an exception
foreach (string s in arr)
{
arr.Remove(s);
}
*/
//whereas this works correctly
Console.WriteLine(arr.Count);
foreach (string s in new System.Collections.ArrayList(arr))
{
arr.Remove(s);
}
Console.WriteLine(arr.Count);
Console.ReadKey();
There is another approach you can take depending on how you're using your collection. If you're downloading the assignments one time (e.g., when the app runs), you could translate the collection on the fly into a hashtable where:
shortname => SPRoleAssignment
If you do this, then when you want to remove an item by short name, all you need to do is remove the item from the hashtable by key.
Unfortunately, if you're loading these SPRoleAssignments a lot, that obviously isn't going to be any more cost efficient in terms of time. The suggestions other people made about using Linq would be good if you're using a new version of the .NET Framework, but otherwise, you'll have to stick to the method you're using.
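The lookup idea can be sketched with a generic Dictionary in place of a Hashtable. LookupDemo, BuildLookup and getName are hypothetical names, and the sketch assumes short names are unique, since ToDictionary throws on duplicate keys.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LookupDemo
{
    // Builds a name -> item lookup once; later removals by name are a single call.
    public static Dictionary<string, T> BuildLookup<T>(IEnumerable<T> items, Func<T, string> getName)
    {
        return items.ToDictionary(getName);
    }
}
```

After building the lookup, lookup.Remove(shortName) removes the entry by key and returns false if no such key exists. This only changes the dictionary, so the underlying collection still has to be updated separately.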
Along the same lines as the Dictionary suggestion, I have done this:
Dictionary<string, bool> sourceDict = new Dictionary<string, bool>();
sourceDict.Add("Sai", true);
sourceDict.Add("Sri", false);
sourceDict.Add("SaiSri", true);
sourceDict.Add("SaiSriMahi", true);
var itemsToDelete = sourceDict.Where(DictItem => DictItem.Value == false);
foreach (var item in itemsToDelete)
{
sourceDict.Remove(item.Key);
}
Note:
The code above can fail with "Collection was modified; enumeration operation may not execute." This was reported on the .NET Client Profile (3.5 and 4.5), and some readers mentioned it fails for them on .NET 4.0 as well. The cause is that Where is lazy, so the loop removes entries while the dictionary is still being enumerated.
To avoid that error, materialize the items to delete with .ToList() first:
var itemsToDelete = sourceDict.Where(DictItem => DictItem.Value == false).ToList();
Per MSDN, the Client Profile is discontinued from .NET 4.5 onwards. http://msdn.microsoft.com/en-us/library/cc656912(v=vs.110).aspx
Save your items first, then delete them.
var itemsToDelete = Items.Where(x => !!!your condition!!!).ToArray();
for (int i = 0; i < itemsToDelete.Length; ++i)
Items.Remove(itemsToDelete[i]);
If Items is a hash-based collection, your Item class needs a GetHashCode() override that is consistent with Equals().
A good way to do it is to combine List<T>.RemoveAll with LINQ.
Example class:
public class Product
{
public string Name { get; set; }
public string Price { get; set; }
}
Linq query:
int removedCount = collection1.RemoveAll(w => collection2.Any(q => q.Name == w.Name));
This removes every element from collection1 whose Name matches the Name of any element in collection2. RemoveAll returns the number of elements removed.
Remember to add: using System.Linq;
To do this while looping through the collection without getting the "collection was modified" exception, this is the approach I've taken in the past. Note the .ToList() at the end of the original collection: it creates a copy in memory to iterate over, so you can modify the existing collection.
foreach (SPRoleAssignment spAssignment in workspace.RoleAssignments.ToList())
{
if (spAssignment.Member.Name == shortName)
{
workspace.RoleAssignments.Remove(spAssignment);
}
}
If you have a List<T>, then List<T>.RemoveAll is your best bet. There can't be anything more efficient. Internally it does the array moving in one shot, not to mention it is O(N).
If all you have is an IList<T> or an ICollection<T>, you have roughly these three options:
public static void RemoveAll<T>(this IList<T> ilist, Predicate<T> predicate) // O(N^2)
{
for (var index = ilist.Count - 1; index >= 0; index--)
{
var item = ilist[index];
if (predicate(item))
{
ilist.RemoveAt(index);
}
}
}
or
public static void RemoveAll<T>(this ICollection<T> icollection, Predicate<T> predicate) // O(N)
{
var nonMatchingItems = new List<T>();
// Move all the items that do not match to another collection.
foreach (var item in icollection)
{
if (!predicate(item))
{
nonMatchingItems.Add(item);
}
}
// Clear the collection and then copy back the non-matched items.
icollection.Clear();
foreach (var item in nonMatchingItems)
{
icollection.Add(item);
}
}
or
public static void RemoveAll<T>(this ICollection<T> icollection, Func<T, bool> predicate) // O(N^2)
{
foreach (var item in icollection.Where(predicate).ToList())
{
icollection.Remove(item);
}
}
Go for either 1 or 2.
1 is lighter on memory and faster if you have fewer deletes to perform (i.e. the predicate is false most of the time).
2 is faster if you have more deletes to perform.
3 is the cleanest code but performs poorly IMO. Again all that depends on input data.
For some benchmarking details see https://github.com/dotnet/BenchmarkDotNet/issues/1505
A lot of good responses here; I especially like the lambda expressions...very clean. I was remiss, however, in not specifying the type of Collection. This is a SPRoleAssignmentCollection (from MOSS) that only has Remove(int) and Remove(SPPrincipal), not the handy RemoveAll(). So, I have settled on this, unless there is a better suggestion.
foreach (SPRoleAssignment spAssignment in workspace.RoleAssignments)
{
if (spAssignment.Member.Name != shortName) continue;
workspace.RoleAssignments.Remove((SPPrincipal)spAssignment.Member);
break;
}