Can't call another method from LINQ statement - c#

I have a c# statement which iterate thru a collection of row. One of the fields call a private method to get an array of object but I am getting null. I placed a breakpoint inside the linq but it never hits the method.
Here is my code
IQueryable<MyObject> myObject = ds.Tables['Table'].AsEnumerable().Select(row => new MyObject
{
id = row.Field<int>("ID"),
MyCollectionArray = this.getCollectionArray(row.Field<string>("MyAggregatedString")),
}).AsQueryable();
private MyObect[] getCollectionArray(string concatString)
{
// placed a breakpoint, it is never called. Not sure why
}
Thanks for any asistance.

What you are facing is called Deferred Execution.
Which means that your query is not executed until you use it somewhere.
Here is a part from the documentation:
Deferred execution means that the evaluation of an expression is
delayed until its realized value is actually required. Deferred
execution can greatly improve performance when you have to manipulate
large data collections, especially in programs that contain a series
of chained queries or manipulations. In the best case, deferred
execution enables only a single iteration through the source
collection.
And in order to execute your statement you just have to use it.
The simplest way might be just by calling
myObject.ToList()
Also it will be executed in case you use functions which require the values to be populated (like Sum, Average, etc)

Related

Using Linq for copying/filtering produces a dynamic result in foreach loops?

So I create this projection of a dictionary of items I would like to remove.
var toRemoveList =
this.outputDic.Keys.Where(key =>
this.removeDic.ContainsKey(key));
Then I iterate through the result removing from the actual dictionary
foreach(var key in toRemoveList)
this.outputDic.Remove(key);
However during that foreach an exception is thrown saying that the list was modified during the loop. But, how so? is the linq query somewhat dynamic and gets re evaluated every time the dictionary changes? A simple .ToArray() call on the end of the query solves the issues, but imo, it shouldn't even occur in the first place.
So I create this projection of a dictionary of items I would like to remove.
var toRemoveList =
this.outputDic.Keys.Where(key =>
this.removeDic.ContainsKey(key));
As I have often said, if I can teach people one thing about LINQ it is this: the result of a query expression is a query, not the results of executing the query. You now have an object that means "the keys of a dictionary such that the key is... something". It is not the results of that query, it is that query. The query is an object unto itself; it does not give you a result set until you ask for one.
Then you do this:
foreach(var key in toRemoveList)
this.outputDic.Remove(key);
So what are you doing? You are iterating over the query. Iterating over the query executes the query, so the query is iterating over the original dictionary. But you then remove an item from the dictionary, while you are iterating over it, which is illegal.
imo, it shouldn't even occur in the first place.
Your opinion about how the world should be is a common one, but doing it your way leads to deep inefficiencies. Let us suppose that creating a query executes the query immediately rather than creates a query object. What does this do?
var query = expensiveRemoteDatabase
.Where(somefilter)
.Where(someotherfilter)
.OrderBy(something);
The first call to Where produces a query, which in your world is then executed, pulling down from the remote database all records which match that query. The second call to Where then says "oh, sorry, I meant to also apply this filter here as well, can we do that whole query again, this time with the second filter?" and so then that whole record set is computed, and then we say "oh, no, wait, I forgot to tell you when you built that last query object, we're going to need to sort it, so database, can you run this query for me a third time?"
Now perhaps do you see why queries produce a query that then does not execute until it needs to?
The reason is that toRemoveList does not contain a list of things to be removed, it contains a description of how to get a list of things that can be removed.
If you step through this in a debugger using F11 you can see this quite clearly for yourself. The first point it stops is with the cursor on foreach which is what you would expect.
Next you stop at toRemoveList (the one in foreach(var key in toRemoveList)). This is where it is setting up the iterator.
When you step through var key (with F11) however it now jumps into the original definition of toRemoveList, specifically the this.removeDic.ContainsKey(key) part. Now you get an idea of what is really happening.
The foreach is calling the iterators Next method to move to the next point in the dictionary's keys and holds onto the list. When you call into this.outputDic.Remove(key); this detects that the iterator hasn't finished and therefore stops you with this error.
As everybody is saying on here, the correct way to solve this is to use ToArray()/ToList() as what these do is to give you another copy of the list. So the you have one to step through, and one to remove from.
The .ToArray() solves the issues because it forces you to evaluate the entire enumeration and cache the local values. Without doing so, when you enumerate through it the enumerable attempts to calculate the first index, return that, then return to the collection and calculate the next index. If the underlying collection you're iterating over changes, you can no longer guarantee that the enumeration will return the appropriate value.
In short: just force the evaluation with .ToArray() (or .ToList(), or whatever).
The LINQ query uses deferred execution. It streams the items one by one, retruning them as they're requested. So yes, every time you try to remove a key it changes the result which is why it throws an exception. When you invoke ToArray() it forces execution of the query which is why it works.
EDIT: This is somewhat in response to your comments. Check out iterator blocks on msdn this is the mechanism being used when your for each executes. Your query just gets turned into an expression tree and the filter, projects, operation ect is applied to the elements one by one when they're retrieved unless it is not possible.
The reason you are getting this error is because of deferred execution of linq. To explain it fully when your loop runs is actually when the data is fetch from the dictionary. Thus modification in outputdic takes place at this point of time and it is not allowed to modify the collection you are looping upon. This is why you get this error. You can get rid of this error by asking the compiler to execute it before you run the loop.
var toRemoveList =
this.outputDic.Keys.Where(key =>
this.removeDic.ContainsKey(key)).ToList();
Notice the ToList() in the above statement. It will make sure that your query has been executed and you have your list in toRemoveList.

Do linq queries re-execute if no changes occured?

I was reading up on LINQ. I know that there are deferred and immediate queries. I know with deferred types running the query when it's enumerated allows any changes to the data set to be reflected in each enumeration. But I can't seem to find an answer if there's a mechanism in place to prevent the query from running if no changes occurred to the data set since the last enumeration.
I read on MSDN referring to LINQ queries:
Therefore, it follows that if a query is enumerated twice it will be executed twice.
Have I overlooked an obvious - but...?
Indeed, there is none. Actually, that's not quite true - some LINQ providers will spot trivial but common examples like:
int id = ...
var customer = ctx.Customers.SingleOrDefault(x => x.Id == id);
and will intercept that via the identity-manager, i.e. it will check whether it has a matching record already in the context; if it does: it doesn't execute anything.
You should note that the re-execution also has nothing to do with whether or not data has changed; it will re-execute (or not) regardless.
There are two take-away messages here:
don't iterate any enumerable more than once: not least, it isn't guaranteed to work at all
if you want to buffer data, put it into a list/array
So:
var list = someQuery.ToList();
will never re-execute the query, no matter how many times you iterate over list. Because list is not a query: it is a list.
A third take-away would be:
if you have a context that lives long enough that it is interesting to ask about data migration, then you are probably holding your data-context for far, far too long - they are intended to be short-lived

LINQ to SQL use query after DataContext Dispose

Im writing DLL for one project, I just started using LINQ to SQL and after moving all methods to this new dll. I disovered that I can't acess DataContext because it was disposed, I understand why but I'm not sure how I can acess results of query for my main project method so:
My method in DLL
public static IEnumerable<Problem> GetProblemsNewest(int howMuch)
{
using (ProblemClassesDataContext context = new ProblemClassesDataContext())
{
var problems = (from p in context.Problems orderby p.DateTimeCreated select p).Take(howMuch);
return problems;
}
}
Calling it:
IEnumerable<Problem> problems = ProblemsManipulation.GetProblemsNewest(10);
//Error can't acess it because it was disposed..
This is just first method, I have larger ones so I really need a way to do this. There must be a way to use LINQ to SQL in DLL? I know I can do something Like .ToList or .ToArray but then I wouldn't be able to acess row properties directly and would have to reference it as problem[0],problem[1] etc. which is even more messy than having tone of code in main project.
After you are outside of the using statement the context is automatically disposed, so when the IEnumerable is actually enumerated the context is already disposed.
Therefore you need to tell Linq that it should go ahead and actually retrieve the values from the DB while your inside of your using statement. You can do so via ToList() or ToArray (or others).
See the updated code:
public static IList<Problem> GetProblemsNewest(int howMuch)
{
using (ProblemClassesDataContext context = new ProblemClassesDataContext())
{
var problems = (from p in context.Problems orderby p.DateTimeCreated select p).Take(howMuch);
return problems.ToList();
}
}
Change this:
return problems;
to this:
return problems.ToList();
Explanation:
The ToList() will iterate through the results and pull them all into memory. Because this happens inside the using statement, you're fine. And because it creates a list, your values will be returned.
You could do this in other ways. The basic idea is to actually retrieve the results before the using statement closes.
An alternate solution would be to avoid the using statement, create an iterator that owns the object and disposes it when the last item has been iterated past.
You could handle it by doing a .ToList() on the IEnumerable before exiting the using block. This will retrieve the records and populate the list. Depending on your scenario this might not be optimal in terms of performance though (you lose the possibility of lazy retrieval and additional filtering of the query)
You have to finish the query inside the using clause, i.e. use ToList() or First() or Count(), etc...
Currently you returning query, and when you want use it, because database connection closed before your usage, you will get an exception, just do:
return problems.AsEnumerable()
This is because of deffered execution manner of linq. In fact your problems object is just query, and you should convert it to objects to use it somewhere else.
You may not want to use the context in a using; the problem is then you can't use the navigation properties later on, because you'll have the same "object disposed" issue when it tries to load the related data. What you need to do is let the context live, and return the results directly. Or, when you return the results, call ToList(), and later on query all related data.

Correct usage of iterator blocks

I was refactoring some code earlier and I came across an implementation of an iterator block I wasn't too sure about. In an integration layer of a system where the client is calling an extrernal API for some data I have a set of translators that take the data returned from the API and translate it into collections of business entities used in the logic layer. A common translator class will look like this:
// translate a collection of entities coming back from an extrernal source into business entities
public static IEnumerable<MyBusinessEnt> Translate(IEnumerable<My3rdPartyEnt> ents) {
// for each 3rd party ent, create business ent and return collection
return from ent in ents
select new MyBusinessEnt {
Id = ent.Id,
Code = ent.Code
};
}
Today I came across the following code. Again, it's a translator class, it's purpose is to translate the collection in the parameter into the method return type. However, this time it's an iterator block:
// same implementation of a translator but as an iterator block
public static IEnumerable<MyBusinessEnt> Translate(IEnumerable<My3rdPartyEnt> ents) {
foreach(var ent in ents)
{
yield return new MyBusinessEnt {
Id = ent.Id,
Code = ent.Code
};
}
}
My question is: is this a valid use of an iterator block? I can't see the benefit of creating a translator class in this way. Could this result in some unexpected behaviour?
Your two samples do pretty much exactly the same thing. The query version will be rewritten into a call to Select, and Select is written exactly like your second example; it iterates over each element in the source collection and yield-returns a transformed element.
This is a perfectly valid use of an iterator block, though of course it is no longer necessary to write your own iterator blocks like this because you can just use Select.
Yes, that's valid. The foreach has the advantage of being debuggable,so I tend to prefer that design.
The first example is not an iterator. It just creates and returns an IEnumerable<MyBusinessEnt>.
The second is an iterator and I don't see anything wrong with it. Each time the caller iterates over the return value of that method, the yield will return a new element.
Yes, that works fine, and the result is very similar.
Both creates an object that is capable of returning the result. Both rely on the source enumerable to remain intact until the result is completed (or cut short). Both uses deferred execution, i.e. the objects are created one at a time when you iterate the result.
There is a difference in that the first returns an expression that uses library methods to produce an enumerator, while the second creates a custom enumerator.
The major difference is on when each code runs. First one is delayed until return value is iterated while second one runs immediately. What I mean is that the for loop is forcing the iteration to run. The fact that the class exposes a IEnumerable<T> and in this case is delayed is another thing.
This does not provide any benefit over simple Select. yield's real power is when there is a conditional involved:
foreach(var ent in ents)
{
if(someCondition)
yield return new MyBusinessEnt {
Id = ent.Id,
Code = ent.Code
};
}

LINQ - Does .Cast<T>() Select Records?

I've noticed that certain command cause LINQtoSQL to connect to the database and download the records that are part of the query, for example, .ToArray().
Does the command .Cast() cause a query to execute (and how can I tell these things in the future?). For example...
IRevision<T> current = context.GetTable(typeof(T))
.Cast<IRevision<T>>()
.SingleOrDefault(o => o.ID == recordId);
I know there is a command for .GetTable that allows you to specify a generic type, but for strange and unexplainable reasons, it cannot be used in this situation.
From Enumerable.Cast()'s remarks:
This method is implemented by using deferred execution. The immediate return value is an object that stores all the information that is required to perform the action. The query represented by this method is not executed until the object is enumerated either by calling its GetEnumerator method directly or by using foreach in Visual C# or For Each in Visual Basic.
All of the LINQ operators will let you know if they are deferred execution or immediate query execution. Additionally, here are the standard LINQ operators which are NOT deferred:
Aggregate
All
Any
Average
Contains
Count
ElementAt
ElementAtOrDefault
First
FirstOrDefault
Last
LastOrDefault
LongCount
Max
Min
SequenceEqual
Single
SingleOrDefault
Sum
ToArray
ToDictionary
ToList
ToLookup
No, it does not. It simply will perform a cast when you iterate through the IEnumerable.
There isn't any definitive way (in code) to know whether or not a method will use deferred execution or not. The documentation is going to be your best friend here as it will tell you if it defers execution or not.
However, that doesn't mean that you can't make some assumptions if the documentation is unclear.
If you have a method that returns another list/structure (like ToList, ToArray), then it will have to execute the query in order to populate the new data structure.
If the method returns a scalar value, then it will have to execute the query to generate that scalar value.
Other than that, if it simply returns IEnumerable<T>, then it more-than-likely is deferring execution. However, that doesn't mean that it's guaranteed, it just means it is more-than-likely.
What you are looking for is called "Deferred Execution". Statements that defer execution only run when you attempt to access the data. Statements such as ToList execute immediately, as the data is needed to transform it into a list.
Cast can wait until you actually access it, so it is a deferred statement.

Categories

Resources