IQueryable remove from the collection, best way?


IQueryable<SomeType> collection = GetCollection();
foreach (var c in collection)
{
//do some complex checking that can't be embedded in a query
//based on results from prev line we want to discard the 'c' object
}
//here I only want the results of collection - the discarded objects
So with that simple code, what is the best way to get the results? Should I create a List just before the foreach and insert the objects I want to keep, or is there a better way to do this type of thing?
I know there are other posts on similar topics but I just don't feel I'm getting what I need out of them.
Edit: I tried this:
var collection = GetCollection().Where(s =>
{
if (s.property == 1)
{
int num= Number(s);
double avg = Avg(s.x);
if (num > avg)
return true;
else
return false;
}
else return false;
});
I tried this but was given "A lambda expression with a statement body cannot be converted to an expression tree" on compile. Did I not do something right?

//do some complex checking that can't be embedded in a query
I don't get it. You can pass a delegate which can point to a very complex function (Turing-complete) that checks whether you should discard it or not:
var result = GetCollection().AsEnumerable().Where(c => {
// ...
// process "c"
// return true if you want it in the collection
});
If you want, you can refactor it in another function:
var result = GetCollection().AsEnumerable().Where(FunctionThatChecksToDiscardOrNot);
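The compile error in the question comes from the IQueryable<T> version of Where, which expects an expression tree (Expression<Func<T, bool>>); a lambda with a statement body cannot be converted to one. Calling AsEnumerable() first makes the call bind to Enumerable.Where, which takes an ordinary delegate. A minimal sketch, assuming a hypothetical SomeType with Property and X members, an in-memory list standing in for GetCollection(), and a made-up threshold:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's SomeType.
public class SomeType
{
    public int Property { get; set; }
    public double X { get; set; }
}

public static class FilterDemo
{
    // Hypothetical stand-in for GetCollection(): an in-memory list exposed as IQueryable.
    public static IQueryable<SomeType> GetCollection()
    {
        return new List<SomeType>
        {
            new SomeType { Property = 1, X = 5.0 },
            new SomeType { Property = 1, X = 0.5 },
            new SomeType { Property = 2, X = 9.0 },
        }.AsQueryable();
    }

    // Arbitrary "complex" check; any C# code is allowed here.
    public static bool Keep(SomeType s)
    {
        if (s.Property != 1)
        {
            return false;
        }
        double threshold = 1.0; // assumed value, for illustration only
        return s.X > threshold;
    }

    public static List<SomeType> Filter()
    {
        // AsEnumerable() switches from IQueryable to IEnumerable, so Where
        // accepts a delegate (method group or statement lambda) and runs in memory.
        return GetCollection().AsEnumerable().Where(Keep).ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        foreach (var item in FilterDemo.Filter())
        {
            Console.WriteLine(item.X);
        }
    }
}
```

Keep in mind that with a real query provider (a database, for example) everything after AsEnumerable() runs on the client, so all rows produced by the query up to that point are fetched first. Put any filtering that can be translated before the AsEnumerable() call.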

If you wrap it into another method, you can use yield return and then iterate over the returned collection, like so:
public IEnumerable<SomeType> FindResults(IQueryable<SomeType> collection) {
foreach (var c in collection)
{
if (doComplicatedQuery(c)) {
yield return c;
}
}
}
// elsewhere
foreach (var goodItem in FindResults(GetCollection())) {
// do stuff.
}

Related

IEnumerable<T> and .Where Linq method behaviour?

I thought I knew everything about IEnumerable<T>, but I just met a case that I cannot explain. When we call the .Where LINQ method on an IEnumerable, execution is deferred until the object is enumerated, isn't it?
So how can the sample below be explained?
public class CTest
{
public CTest(int amount)
{
Amount = amount;
}
public int Amount { get; set; }
public override string ToString()
{
return $"Amount:{Amount}";
}
public static IEnumerable<CTest> GenerateEnumerableTest()
{
var tab = new List<int> { 2, 5, 10, 12 };
return tab.Select(t => new CTest(t));
}
}
Nothing bad so far!
But the following test gives me an unexpected result, despite what I know about IEnumerable<T> and the .Where LINQ method:
[TestMethod]
public void TestCSharp()
{
var tab = CTest.GenerateEnumerableTest();
foreach (var item in tab.Where(i => i.Amount > 6))
{
item.Amount = item.Amount * 2;
}
foreach (var t in tab)
{
var s = t.ToString();
Debug.Print(s);
}
}
No item from tab will be multiplied by 2. The output will be :
Amount:2
Amount:5
Amount:10
Amount:12
Can anyone explain why, after enumerating tab, I get the original values?
Of course, everything works fine after calling .ToList() just after calling the GenerateEnumerableTest() method.

var tab = CTest.GenerateEnumerableTest();
This tab is a LINQ query that generates CTest instances initialized from int values, which come from a list of integers that never changes. So whenever you enumerate this query you will get the "same" instances (with the original Amount), although they are newly created objects each time.
If you want to "materialize" this query you can use ToList and then change the items.
Otherwise you are modifying CTest instances that exist only in the first foreach loop. The second loop enumerates other CTest instances with the unmodified Amount.
So the query only contains the information on how to get the items; you could just as well call the method directly:
foreach (var item in CTest.GenerateEnumerableTest().Where(i => i.Amount > 6))
{
item.Amount = item.Amount * 2;
}
foreach (var t in CTest.GenerateEnumerableTest())
{
// now you don't expect them to be changed, do you?
}

Like many LINQ operations, Select is lazy and uses deferred execution: your lambda expression is executed again every time the query is enumerated, so each foreach loop receives freshly created CTest instances. This is why everything works fine after calling .ToList() just after calling the GenerateEnumerableTest() method:
var tab = CTest.GenerateEnumerableTest().ToList();
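The difference can be made visible by counting how often the Select lambda runs. This is only a sketch: Item, DeferredDemo and the Created counter are hypothetical names, not part of the question's code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Item
{
    public int Amount { get; set; }
}

public static class DeferredDemo
{
    public static int Created; // counts how many Item objects Select has built

    public static IEnumerable<Item> Query()
    {
        var source = new List<int> { 2, 5, 10, 12 };
        // Nothing is created here; the lambda runs during enumeration.
        return source.Select(n => { Created++; return new Item { Amount = n }; });
    }

    // Doubles amounts over 6, then returns the sum seen by a second enumeration.
    public static int DoubleThenSum(IEnumerable<Item> items)
    {
        foreach (var item in items.Where(i => i.Amount > 6))
        {
            item.Amount *= 2;
        }
        return items.Sum(i => i.Amount);
    }
}

public static class Program
{
    public static void Main()
    {
        // Lazy query: the second enumeration builds new objects, so the changes are lost.
        Console.WriteLine(DeferredDemo.DoubleThenSum(DeferredDemo.Query()));          // 29
        // Materialized list: both enumerations see the same objects.
        Console.WriteLine(DeferredDemo.DoubleThenSum(DeferredDemo.Query().ToList())); // 51
        // 4 + 4 objects for the lazy case, 4 for ToList().
        Console.WriteLine(DeferredDemo.Created);                                      // 12
    }
}
```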

How to create an IEnumerable with optional Action?

I have an IntegerRectangle class. I want it to have an internal_perimeter() method that returns all points of its perimeter, and an internal_perimeter(Action<IntegerPoint> processor) overload that applies processor to all points of its perimeter.
One of my classes has a variable IntegerRect canvas; and a HashSet<IntegerPoint> forbidden_points. It calls:
canvas.internal_perimeter((IntegerPoint p)=>{forbidden_points.Add(p); print("[f]" + forbidden_points.Contains(p).ToString());});
The result differs between different implementations of internal_perimeter().
This works:
public IEnumerable<IntegerPoint> internal_perimeter()
{
for(int i=0;i<width;++i)
{
yield return new IntegerPoint(x+i,y);
}
for(int i=1;i<height;++i)
{
yield return new IntegerPoint(x+width-1,y-i);
}
for(int i=width-2;i>=0;--i)
{
yield return new IntegerPoint(x+i,y-height+1);
}
for(int i=height-2;i>=0;--i)
{
yield return new IntegerPoint(x,y-i);
}
}
public void internal_perimeter(Action<IntegerPoint> processor)
{
foreach(IntegerPoint i in internal_perimeter())
processor(i);
}
This doesn't:
public IEnumerable<IntegerPoint> internal_perimeter(Action<IntegerPoint> processor=null)
{
if(processor==null)
{
for(int i=0;i<width;++i)
{
yield return new IntegerPoint(x+i,y);
}
for(int i=1;i<height;++i)
{
yield return new IntegerPoint(x+width-1,y-i);
}
for(int i=width-2;i>=0;--i)
{
yield return new IntegerPoint(x+i,y-height+1);
}
for(int i=height-2;i>=0;--i)
{
yield return new IntegerPoint(x,y-i);
}
}
else
foreach(IntegerPoint i in internal_perimeter())
processor(i);
}
I don't understand what is wrong with the second one

To add to @Lucas' answer, which explains why your code doesn't work, you should also consider refactoring your code:
internal_perimeter is a bad name for the method. If its purpose is to mutate internal points, then it should be named void Process(Action a) or something like that.
The second example is rather problematic because it returns nothing (an empty sequence) when you pass a non-null action. It would make more sense to use a Func<T, TResult> (like LINQ Select) and yield return all processed values. Also, the null branch is really uncommon (it is rarely recommended to pass a null delegate like this).
Next, the method really does too little. Why do you need a new method which has an existing LINQ alternative? I.e.:
var rect = new IntegerRectangle();
// this gets a list of points
var forbiddenPoints = rect.internal_perimeter().ToList();
// this filters them and projects them
// (i.e. "get all x coordinates larger than 10")
var xLargerThan10 = rect
.internal_perimeter()
.Where(p => p.X > 10)
.Select(p => p.X)
.ToList();
Even the original internal_perimeter overload might have a better name, e.g. simply GetPoints would be pretty indicative of what its purpose is:
foreach (var point in rect.GetPoints())
DoStuff(point);

Your second example is an iterator (i.e. it uses yield return). The body of this kind of function is not executed until you enumerate the result.
If you do: var x = internal_perimeter(i => {});
The variable x will hold an IEnumerable<IntegerPoint> of a class constructed by the compiler from your function. Your code is not executed yet at this point.
Now, try to consume it: foreach(var point in x) {}. This will execute your function. Actually in your particular case, it will all be executed on the first iteration, so calling x.FirstOrDefault(); will be enough. Indeed, calling MoveNext on the enumerator will execute the code up to the first yield return, and there are none in the else branch of your code.
Now, I'd go with your first example because of this. It is less error prone.
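Here is a reduced version of the same situation, as a sketch only: IteratorDemo, Numbers and the Calls counter are hypothetical names that mirror the shape of your second example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IteratorDemo
{
    public static int Calls; // incremented by the processor passed in Main

    // Because this method contains yield return, the compiler turns the whole
    // body into a state machine; none of it runs until MoveNext is called.
    public static IEnumerable<int> Numbers(Action<int> processor = null)
    {
        if (processor == null)
        {
            yield return 1;
            yield return 2;
        }
        else
        {
            // No yield in this branch: the resulting sequence is empty,
            // but the side effects still only happen on enumeration.
            foreach (int n in Numbers())
            {
                processor(n);
            }
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var pending = IteratorDemo.Numbers(n => IteratorDemo.Calls++);
        Console.WriteLine(IteratorDemo.Calls); // 0: nothing has run yet
        pending.Any();                         // first MoveNext runs the else branch to completion
        Console.WriteLine(IteratorDemo.Calls); // 2
    }
}
```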

Check if IEnumerable has ANY rows without enumerating over the entire list

I have the following method which returns an IEnumerable of type T. The implementation of the method is not important, apart from the yield return to lazy load the IEnumerable. This is necessary as the result could have millions of items.
public IEnumerable<T> Parse()
{
foreach(...)
{
yield return parsedObject;
}
}
Problem:
I have the following property which can be used to determine if the IEnumerable will have any items:
public bool HasItems
{
get
{
return Parse().Take(1).SingleOrDefault() != null;
}
}
Is there perhaps a better way to do this?

Enumerable.Any() returns true if there are any elements in the sequence and false if there are none. It does not iterate the entire sequence (it reads at most one element): it returns true as soon as the first MoveNext succeeds and false otherwise.
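Applied to the HasItems property from the question, that could look like the sketch below. The Parser class, the item strings and the Produced counter are assumptions made for illustration; the real Parse() implementation is not shown in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Parser
{
    public int Produced; // how many items the iterator has generated so far

    // Stand-in for the question's Parse(): lazily yields a large number of items.
    public IEnumerable<string> Parse()
    {
        for (int i = 0; i < 1000000; i++)
        {
            Produced++;
            yield return "item " + i;
        }
    }

    // Any() stops after the first successful MoveNext.
    public bool HasItems
    {
        get { return Parse().Any(); }
    }
}

public static class Program
{
    public static void Main()
    {
        var parser = new Parser();
        Console.WriteLine(parser.HasItems); // True
        Console.WriteLine(parser.Produced); // 1
    }
}
```

Unlike comparing SingleOrDefault() with null, this also gives the right answer when T is a value type, where the default value is never null.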

As with "Howto: Count the items from a IEnumerable<T> without iterating?", an enumerable is meant to be a lazy, forward-only "list", and, as in quantum mechanics, the act of investigating it alters its state.
See confirmation: https://dotnetfiddle.net/GPMVXH
var sideeffect = 0;
var enumerable = Enumerable.Range(1, 10).Select(i => {
// show how many times it happens
sideeffect++;
return i;
});
// will 'enumerate' one item!
if(enumerable.Any()) Console.WriteLine("There are items in the list; sideeffect={0}", sideeffect);
enumerable.Any() is the cleanest way to check if there are any items in the list. You could try casting to something not lazy, like if(null != (list = enumerable as ICollection<T>) && list.Any()) return true.
Or, your scenario may permit using an Enumerator and making a preliminary check before enumerating:
var e = enumerable.GetEnumerator();
// check first
if(!e.MoveNext()) return;
// do some stuff, then enumerate the list
do {
actOn(e.Current); // do stuff with the current item
} while(e.MoveNext()); // stop when we don't have anything else

The best way to answer this question, and to clear all doubts, is to see what the 'Any' function does.
public static bool Any<TSource>(this IEnumerable<TSource> source) {
if (source == null) throw Error.ArgumentNull("source");
using (IEnumerator<TSource> e = source.GetEnumerator()) {
if (e.MoveNext()) return true;
}
return false;
}
https://github.com/microsoft/referencesource/blob/master/System.Core/System/Linq/Enumerable.cs

Is there a way to know I am getting the last element in the foreach loop

I need to do special treatment for the last element in the collection. I am wondering if I can know I hit the last element when using foreach loop.

Only way I know of is to increment a counter and compare with length on exit, or when breaking out of loop set a boolean flag, loopExitedEarly.

There isn't a direct way. You'll have to keep buffering the next element.
IEnumerable<Foo> foos = ...
Foo prevFoo = default(Foo);
bool elementSeen = false;
foreach (Foo foo in foos)
{
if (elementSeen) // If prevFoo is not the last item...
ProcessNormalItem(prevFoo);
elementSeen = true;
prevFoo = foo;
}
if (elementSeen) // Required because foos might be empty.
ProcessLastItem(prevFoo);
Alternatively, you could use the underlying enumerator to do the same thing:
using (var erator = foos.GetEnumerator())
{
if (!erator.MoveNext())
return;
Foo current = erator.Current;
while (erator.MoveNext())
{
ProcessNormalItem(current);
current = erator.Current;
}
ProcessLastItem(current);
}
It's a lot easier when working with collections that reveal how many elements they have (typically the Count property from ICollection or ICollection<T>) - you can maintain a counter (alternatively, if the collection exposes an indexer, you could use a for-loop instead):
int numItemsSeen = 0;
foreach(Foo foo in foos)
{
if(++numItemsSeen == foos.Count)
ProcessLastItem(foo);
else ProcessNormalItem(foo);
}
If you can use MoreLinq, it's easy:
foreach (var entry in foos.AsSmartEnumerable())
{
if(entry.IsLast)
ProcessLastItem(entry.Value);
else ProcessNormalItem(entry.Value);
}
If efficiency isn't a concern, you could do:
Foo[] fooArray = foos.ToArray();
foreach(Foo foo in fooArray.Take(fooArray.Length - 1))
ProcessNormalItem(foo);
ProcessLastItem(fooArray.Last());

Unfortunately not, I would write it with a for loop like:
string[] names = { "John", "Mary", "Stephanie", "David" };
int iLast = names.Length - 1;
for (int i = 0; i <= iLast; i++) {
Debug.Write(names[i]);
Debug.Write(i < iLast ? ", " : Environment.NewLine);
}
And yes, I know about String.Join :).
I see others already posted similar ideas while I was typing mine, but I'll post it anyway:
void Enumerate<T>(IEnumerable<T> items, Action<T, bool> action) {
IEnumerator<T> enumerator = items.GetEnumerator();
if (!enumerator.MoveNext()) return;
bool foundNext;
do {
T item = enumerator.Current;
foundNext = enumerator.MoveNext();
action(item, !foundNext);
}
while (foundNext);
}
...
string[] names = { "John", "Mary", "Stephanie", "David" };
Enumerate(names, (name, isLast) => {
Debug.Write(name);
Debug.Write(!isLast ? ", " : Environment.NewLine);
});

Not without jumping through flaming hoops (see above). But you can just use the enumerator directly (slightly awkward because of C#'s enumerator design):
IEnumerator<string> it = foo.GetEnumerator();
for (bool hasNext = it.MoveNext(); hasNext; ) {
string element = it.Current;
hasNext = it.MoveNext();
if (hasNext) { // normal processing
Console.Out.WriteLine(element);
} else { // special case processing for last element
Console.Out.WriteLine("Last but not least, " + element);
}
}
Notes on the other approaches I see here: Mitch's approach requires having access to a container which exposes its size. J.D.'s approach requires writing a method in advance, then doing your processing via a closure. Ani's approach spreads loop management all over the place. John K's approach involves creating numerous additional objects, or (second method) only allows additional post processing of the last element, rather than special case processing.
I don't understand why people don't use the Enumerator directly in a normal loop, as I've shown here. K.I.S.S.
This is cleaner with Java iterators, because their interface uses hasNext rather than MoveNext. You could easily write an extension method for IEnumerable that gave you Java-style iterators, but that's overkill unless you write this kind of loop a lot.
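For what it's worth, such an extension method could look roughly like this. It is a sketch, not an existing library type: LookaheadIterator, AsLookahead, HasNext and Next are names I made up here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: wraps an IEnumerator<T> and reads one element ahead.
public sealed class LookaheadIterator<T> : IDisposable
{
    private readonly IEnumerator<T> _inner;

    public bool HasNext { get; private set; }

    public LookaheadIterator(IEnumerable<T> source)
    {
        _inner = source.GetEnumerator();
        HasNext = _inner.MoveNext(); // prefetch the first element
    }

    // Returns the buffered element and advances the underlying enumerator.
    public T Next()
    {
        if (!HasNext)
        {
            throw new InvalidOperationException("No more elements.");
        }
        T current = _inner.Current;
        HasNext = _inner.MoveNext();
        return current;
    }

    public void Dispose()
    {
        _inner.Dispose();
    }
}

public static class LookaheadExtensions
{
    public static LookaheadIterator<T> AsLookahead<T>(this IEnumerable<T> source)
    {
        return new LookaheadIterator<T>(source);
    }
}

public static class Program
{
    public static void Main()
    {
        var names = new[] { "John", "Mary", "David" };
        using (var it = names.AsLookahead())
        {
            while (it.HasNext)
            {
                string name = it.Next();
                // After Next(), HasNext is false only for the last element.
                Console.WriteLine(it.HasNext ? name : "Last but not least, " + name);
            }
        }
    }
}
```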

Can the special treatment only be done while processing in the foreach loop? Couldn't you do it while adding to the collection? If so, write your own custom collection:
public class ListCollection : List<string>
{
string _lastitem;
// 'new' is needed because List<string>.Add is not virtual and is hidden here
public new void Add(string item)
{
//TODO: Do special treatment on the new Item, new item should be last one.
//Not applicable for filter/sort
base.Add(item);
}
}

List<int> numbers = new ....;
int last = numbers.Last();
Stack<int> stack = new ...;
stack.Peek();
Update:
var numbers = new int[] { 1, 2,3,4,5 };
var enumerator = numbers.GetEnumerator();
object last = null;
bool hasElement = true;
do
{
hasElement = enumerator.MoveNext();
if (hasElement)
{
last = enumerator.Current;
Console.WriteLine(enumerator.Current);
}
else
Console.WriteLine("Last = {0}", last);
} while (hasElement);
Console.ReadKey();

Deferred Execution trick
Build a class that encapsulates the values to be processed and the processing function for deferred execution purpose. We will end up using one instance of it for each element processed in the loop.
// functor class
class Runner {
string ArgString {get;set;}
object ArgContext {get;set;}
// CTOR: encapsulate args and a context to run them in
public Runner(string str, object context) {
ArgString = str;
ArgContext = context;
}
// This is the item processor logic.
public void Process() {
// process ArgString normally in ArgContext
}
}
Use your functor in the foreach loop to effect deferred execution by one element:
// intended to track previous item in the loop
var recent = default(Runner); // see Runner class above
// normal foreach iteration
foreach(var str in listStrings) {
// is deferred because this executes recent item instead of current item
if (recent != null)
recent.Process(); // run recent processing (from previous iteration)
// store the current item for next iteration
recent = new Runner(str, context);
}
// now the final item remains unprocessed - you have a choice
if (want_to_process_normally)
recent.Process(); // just like the others
else
do_something_else_with(recent.ArgString, recent.ArgContext);
This functor approach uses more memory but saves you from having to count the elements in advance. In some scenarios you might achieve a kind of efficiency.
OR
Shorter Workaround
If you want to apply special processing to the last element after processing them all in a regular way ....
// example using strings
var recentStr = default(string);
foreach(var str in listStrings) {
recentStr = str;
// process str normally
}
// now apply additional special processing to recentStr (last)
It's a potential workaround.

Remove repetitive, hard coded loops and conditions in C#

I have a class that compares 2 instances of the same objects, and generates a list of their differences. This is done by looping through the key collections and filling a set of other collections with a list of what has changed (this may make more sense after viewing the code below). This works, and generates an object that lets me know what exactly has been added and removed between the "old" object and the "new" one.
My question/concern is this...it is really ugly, with tons of loops and conditions. Is there a better way to store/approach this, without having to rely so heavily on endless groups of hard-coded conditions?
public void DiffSteps()
{
try
{
//Confirm that there are 2 populated objects to compare
if (NewStep.Id != Guid.Empty && SavedStep.Id != Guid.Empty)
{
//<TODO> Find a good way to compare quickly if the objects are exactly the same...hash?
//Compare the StepDoc collections:
OldDocs = SavedStep.StepDocs;
NewDocs = NewStep.StepDocs;
Collection<StepDoc> docstoDelete = new Collection<StepDoc>();
foreach (StepDoc oldDoc in OldDocs)
{
bool delete = false;
foreach (StepDoc newDoc in NewDocs)
{
if (newDoc.DocId == oldDoc.DocId)
{
delete = true;
}
}
if (delete)
docstoDelete.Add(oldDoc);
}
foreach (StepDoc doc in docstoDelete)
{
OldDocs.Remove(doc);
NewDocs.Remove(doc);
}
//Same loop(s) for StepUsers...omitted for brevity
//This is a collection of users to delete; it is the collection
//of users that has not changed. So, this collection also needs to be checked
//to see if the permissions (or any other future properties) have changed.
foreach (StepUser user in userstoDelete)
{
//Compare the two
StepUser oldUser = null;
StepUser newUser = null;
foreach(StepUser oldie in OldUsers)
{
if (user.UserId == oldie.UserId)
oldUser = oldie;
}
foreach (StepUser newie in NewUsers)
{
if (user.UserId == newie.UserId)
newUser = newie;
}
if(oldUser != null && newUser != null)
{
if (oldUser.Role != newUser.Role)
UpdatedRoles.Add(newUser.Name, newUser.Role);
}
OldUsers.Remove(user);
NewUsers.Remove(user);
}
}
}
catch(Exception ex)
{
string errorMessage =
String.Format("Error generating diff between Step objects {0} and {1}", NewStep.Id, SavedStep.Id);
log.Error(errorMessage,ex);
throw;
}
}
The targeted framework is 3.5.

Are you using .NET 3.5? I'm sure LINQ to Objects would make a lot of this much simpler.
Another thing to think about is that if you've got a lot of code with a common pattern, where just a few things change (e.g. "which property am I comparing?"), then that's a good candidate for a generic method taking a delegate to represent that difference.
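As a rough sketch of that idea (without LINQ, so it does not depend on the answer to the framework question): DiffHelper, RemoveCommon and the User class below are hypothetical, and the code assumes the collections are List<T> and that keys are unique within each list.

```csharp
using System;
using System.Collections.Generic;

public static class DiffHelper
{
    // Removes from both lists every item whose key occurs in both, and returns
    // the matched (old, new) pairs so the caller can compare other properties.
    public static List<KeyValuePair<T, T>> RemoveCommon<T, TKey>(
        List<T> oldItems, List<T> newItems, Func<T, TKey> keySelector)
    {
        // Index the new items by key (assumes keys are unique within each list).
        var newByKey = new Dictionary<TKey, T>();
        foreach (T item in newItems)
        {
            newByKey[keySelector(item)] = item;
        }

        var matches = new List<KeyValuePair<T, T>>();
        foreach (T oldItem in oldItems)
        {
            T newItem;
            if (newByKey.TryGetValue(keySelector(oldItem), out newItem))
            {
                matches.Add(new KeyValuePair<T, T>(oldItem, newItem));
            }
        }

        // Remove only after both loops are done, so no list is modified while enumerated.
        foreach (KeyValuePair<T, T> match in matches)
        {
            oldItems.Remove(match.Key);
            newItems.Remove(match.Value);
        }
        return matches;
    }
}

// Hypothetical element type, for illustration only.
public class User
{
    public int UserId;
    public string Role;
}

public static class Program
{
    public static void Main()
    {
        var oldUsers = new List<User> { new User { UserId = 1, Role = "Admin" }, new User { UserId = 2, Role = "Reader" } };
        var newUsers = new List<User> { new User { UserId = 2, Role = "Editor" }, new User { UserId = 3, Role = "Reader" } };

        // The delegate is the only part that differs between docs and users.
        foreach (var pair in DiffHelper.RemoveCommon(oldUsers, newUsers, u => u.UserId))
        {
            if (pair.Key.Role != pair.Value.Role)
            {
                Console.WriteLine("User " + pair.Value.UserId + " changed role to " + pair.Value.Role);
            }
        }
        // oldUsers now holds only removed users, newUsers only added ones.
        Console.WriteLine(oldUsers.Count + " removed, " + newUsers.Count + " added");
    }
}
```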
EDIT: Okay, now we know we can use LINQ:
Step 1: Reduce nesting
Firstly I'd take out one level of nesting. Instead of:
if (NewStep.Id != Guid.Empty && SavedStep.Id != Guid.Empty)
{
// Body
}
I'd do:
if (NewStep.Id == Guid.Empty || SavedStep.Id == Guid.Empty)
{
return;
}
// Body
Early returns like that can make code much more readable.
Step 2: Finding docs to delete
This would be much nicer if you could simply specify a key function to Enumerable.Intersect. You can specify an equality comparer, but building one of those is a pain, even with a utility library. Ah well.
var oldDocIds = OldDocs.Select(doc => doc.DocId);
var newDocIds = NewDocs.Select(doc => doc.DocId);
var deletedIds = oldDocIds.Intersect(newDocIds).ToDictionary(x => x);
var deletedDocs = OldDocs.Where(doc => deletedIds.ContainsKey(doc.DocId));
Step 3: Removing the docs
Either use the existing foreach loop, or change the properties. If your properties are actually of type List<T> then you could use RemoveAll.
Step 4: Updating and removing users
foreach (StepUser deleted in usersToDelete)
{
// Should use SingleOrDefault here if there should only be one
// matching entry in each of NewUsers/OldUsers. The
// code below matches your existing loop.
StepUser oldUser = OldUsers.LastOrDefault(u => u.UserId == deleted.UserId);
StepUser newUser = NewUsers.LastOrDefault(u => u.UserId == deleted.UserId);
// Existing code here using oldUser and newUser
}
One option to simplify things even further would be to implement an IEqualityComparer using UserId (and one for docs with DocId).
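A comparer along those lines might look like the sketch below. The StepUser shape shown here (an int UserId and a string Role) is an assumption for illustration; the real class is not shown in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of StepUser, for illustration only.
public class StepUser
{
    public int UserId { get; set; }
    public string Role { get; set; }
}

// Treats two users as equal when their UserId values match.
public sealed class UserIdComparer : IEqualityComparer<StepUser>
{
    public bool Equals(StepUser a, StepUser b)
    {
        if (ReferenceEquals(a, b)) return true;
        if (a == null || b == null) return false;
        return a.UserId == b.UserId;
    }

    public int GetHashCode(StepUser user)
    {
        return user.UserId.GetHashCode();
    }
}

public static class Program
{
    public static void Main()
    {
        var oldUsers = new List<StepUser> { new StepUser { UserId = 1, Role = "Admin" }, new StepUser { UserId = 2, Role = "Reader" } };
        var newUsers = new List<StepUser> { new StepUser { UserId = 2, Role = "Editor" }, new StepUser { UserId = 3, Role = "Reader" } };
        var comparer = new UserIdComparer();

        // Intersect yields the elements of the first sequence that also occur in the second.
        List<StepUser> inBoth = oldUsers.Intersect(newUsers, comparer).ToList();
        // Except yields the elements of the first sequence that do not occur in the second.
        List<StepUser> removed = oldUsers.Except(newUsers, comparer).ToList();
        List<StepUser> added = newUsers.Except(oldUsers, comparer).ToList();

        Console.WriteLine(inBoth.Count + " in both, " + removed.Count + " removed, " + added.Count + " added");
    }
}
```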

As you are using at least .NET 2.0, I recommend implementing Equals and GetHashCode ( http://msdn.microsoft.com/en-us/library/7h9bszxx.aspx ) on StepDoc. As a hint of how this can clean up your code, you could replace something like this:
Collection<StepDoc> docstoDelete = new Collection<StepDoc>();
foreach (StepDoc oldDoc in OldDocs)
{
bool delete = false;
foreach (StepDoc newDoc in NewDocs)
{
if (newDoc.DocId == oldDoc.DocId)
{
delete = true;
}
}
if (delete) docstoDelete.Add(oldDoc);
}
foreach (StepDoc doc in docstoDelete)
{
OldDocs.Remove(doc);
NewDocs.Remove(doc);
}
with this:
oldDocs.FindAll(newDocs.Contains).ForEach(delegate(StepDoc doc) {
oldDocs.Remove(doc);
newDocs.Remove(doc);
});
This assumes oldDocs is a List of StepDoc.

If both StepDocs and StepUsers implement IComparable<T>, and they are stored in collections that implement IList<T>, then you can use the following helper method to simplify this function. Just call it twice, once with StepDocs, and once with StepUsers. Use the beforeRemoveCallback to implement the special logic used to do your role updates. I'm assuming the collections don't contain duplicates. I've left out argument checks.
public delegate void BeforeRemoveMatchCallback<T>(T item1, T item2);
public static void RemoveMatches<T>(
IList<T> list1, IList<T> list2,
BeforeRemoveMatchCallback<T> beforeRemoveCallback)
where T : IComparable<T>
{
// looping backwards lets us safely modify the collection "in flight"
// without requiring a temporary collection (as required by a foreach
// solution)
for(int i = list1.Count - 1; i >= 0; i--)
{
for(int j = list2.Count - 1; j >= 0; j--)
{
if(list1[i].CompareTo(list2[j]) == 0)
{
// do any cleanup stuff in this function, like your role assignments
if(beforeRemoveCallback != null)
beforeRemoveCallback(list1[i], list2[j]);
list1.RemoveAt(i);
list2.RemoveAt(j);
break;
}
}
}
}
Here is a sample beforeRemoveCallback for your updates code:
BeforeRemoveMatchCallback<StepUsers> callback =
delegate(StepUsers oldUser, StepUsers newUser)
{
if(oldUser.Role != newUser.Role)
UpdatedRoles.Add(newUser.Name, newUser.Role);
};

What framework are you targeting? (This will make a difference in the answer.)
Why is this a void function?
Shouldn't the signature look like:
DiffResults results = object.CompareTo(object2);

If you want to hide the traversal of the tree-like structure you could create an IEnumerator subclass that hides the "ugly" looping constructs and then use CompareTo interface:
MyTraverser t = new MyTraverser(oldDocs, newDocs);
foreach (object oldOne in t)
{
if (oldOne.CompareTo(t.CurrentNewOne) != 0)
{
// use RTTI to figure out what to do with the object
}
}
However, I'm not at all sure that this particularly simplifies anything. I don't mind seeing the nested traversal structures. The code is nested, but not complex or particularly difficult to understand.

Using multiple lists in foreach is easy. Do this:
foreach (TextBox t in col)
{
foreach (TextBox d in des) // des and col are lists of text boxes
{
// remove the first element of des, then break out of the inner loop
des.RemoveAt(0);
break;
}
}
It works similarly to a hypothetical foreach (TextBox t in col && TextBox d in des).
