A programming pattern like this comes up every so often:
int staleCount = 0;
fileUpdatesGridView.DataSource = MultiMerger.TargetIds
.Select(id =>
{
FileDatabaseMerger merger = MultiMerger.GetMerger(id);
if (merger.TargetIsStale)
staleCount++;
return new
{
Id = id,
IsStale = merger.TargetIsStale,
// ...
};
})
.ToList();
fileUpdatesGridView.DataBind();
fileUpdatesMergeButton.Enabled = staleCount > 0;
I'm not sure there is a more succinct way to code this - is there?
And even if there is, is doing this bad practice?
No, it is not strictly "bad practice" (like constructing SQL queries with string concatenation of user input or using goto).
Sometimes such code is more readable than several queries/foreach loops or a no-side-effect Aggregate call. It is also a good idea to at least try to write the foreach and no-side-effect versions, to see which one is more readable and easier to prove correct.
Please note that:
it is frequently very hard to reason about what will happen, and when, with such code. For example, your sample hacks around the fact that LINQ queries execute lazily: the .ToList() call is what forces the lambda to run, and without it the count would never be computed (see the sketch after this list).
pure functions can be run in parallel; ones with side effects need a lot of care to run that way
if you ever need to convert LINQ-to-Objects to LINQ-to-SQL you have to rewrite such queries
generally, LINQ queries favor a functional, side-effect-free programming style (and hence, by convention, readers will not expect side effects in the code).
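For example, here is a minimal, self-contained sketch (the GetItems() source is made up for illustration) of how the side effect silently disappears, or runs more than once, depending on when the query is enumerated:
using System;
using System.Collections.Generic;
using System.Linq;

class LazySideEffectDemo
{
    // Hypothetical source standing in for something like MultiMerger.TargetIds.
    static IEnumerable<int> GetItems() => new[] { 1, 2, 3 };

    static void Main()
    {
        int count = 0;

        // Deferred query: the lambda has not run yet, so count is still 0 here.
        var query = GetItems().Select(i => { count++; return i * 2; });
        Console.WriteLine(count); // 0

        // Forcing execution with ToList() runs the lambda and mutates count.
        var first = query.ToList();
        Console.WriteLine(count); // 3

        // Enumerating again re-runs the lambda, double-counting the side effect.
        var second = query.ToList();
        Console.WriteLine(count); // 6
    }
}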
Why not just code it like this:
var result = MultiMerger.TargetIds
.Select(id =>
{
FileDatabaseMerger merger = MultiMerger.GetMerger(id);
return new
{
Id = id,
IsStale = merger.TargetIsStale,
// ...
};
})
.ToList();
fileUpdatesGridView.DataSource = result;
fileUpdatesGridView.DataBind();
fileUpdatesMergeButton.Enabled = result.Any(r => r.IsStale);
I would consider this a bad practice. You are assuming that the lambda expression is forced to execute because you called ToList. That's an implementation detail of the current version of ToList. What if ToList in .NET 7.x is changed to return an object that semi-lazily converts the IQueryable? What if it's changed to run the lambda in parallel? All of a sudden you have concurrency issues on your staleCount. As far as I know, both of those are possibilities, and either would break your code because of the assumptions it is making.
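To make the concurrency point concrete, here is a minimal sketch (not the question's code) of what happens if a counting side effect ends up inside a parallel projection: the plain increment races, while Interlocked.Increment stays correct.
using System;
using System.Linq;
using System.Threading;

class ParallelSideEffectDemo
{
    static void Main()
    {
        int unsafeCount = 0;
        int safeCount = 0;

        // With AsParallel(), the lambda may run on several threads at once.
        var results = Enumerable.Range(0, 100000)
            .AsParallel()
            .Select(i =>
            {
                unsafeCount++;                        // data race: increments can be lost
                Interlocked.Increment(ref safeCount); // atomic: always ends at 100000
                return i;
            })
            .ToList();

        Console.WriteLine($"unsafe: {unsafeCount}, safe: {safeCount}");
    }
}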
Now, as far as repeatedly calling MultiMerger.GetMerger with a single id goes, that really should be reworked as a join; the logic for doing a join would (or at least could) be much more efficient than what you have coded there and would scale a lot better, especially if the implementation of MultiMerger is actually pulling data from a database (or might be changed to do so).
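Here is a rough sketch of what that join could look like, reusing the GetMergers() method assumed later in this answer; the member names (Id, TargetIsStale) are assumptions about MultiMerger's API, not a definitive implementation:
// Sketch only: GetMergers(), Id and TargetIsStale are assumed members.
var rows = MultiMerger.TargetIds
    .Join(MultiMerger.GetMergers(),   // enumerate the mergers once
          id => id,                   // outer key: the target id itself
          merger => merger.Id,        // inner key: the merger's id
          (id, merger) => new
          {
              Id = id,
              IsStale = merger.TargetIsStale,
              // ...
          })
    .ToList();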
As for calling ToList() before assigning the DataSource: if the DataSource doesn't use all the fields in your new object, skipping the ToList and letting the data source pull only the fields it needs would be (much) faster and use less memory. What you've done is tightly couple the data to the exact requirements of the view, which should be avoided where possible. For example, what if you suddenly need to display a field that exists on FileDatabaseMerger but isn't in your current anonymous object? Now you have to change both the controller and the view to add it, whereas if you had just passed in an IQueryable you would only have to change the view. Again: faster, less memory, more flexible, and more maintainable.
Hope this helps. Also, this question really should have been posted on Code Review, not Stack Overflow.
Update: on further review, the following code would be much better:
var result = MultiMerger.GetMergersByIds(MultiMerger.TargetIds);
fileUpdatesGridView.DataSource = result;
fileUpdatesGridView.DataBind();
fileUpdatesMergeButton.Enabled = result.Any(r => r.TargetIsStale);
or
var result = MultiMerger.GetMergers().Where(m => MultiMerger.TargetIds.Contains(m.Id));
fileUpdatesGridView.DataSource = result;
fileUpdatesGridView.DataBind();
fileUpdatesMergeButton.Enabled = result.Any(r => r.TargetIsStale);
Related
In C#, I have a collection of objects that I want to transform to a different type. This conversion, which I would like to do with LINQ Select(), requires multiple operations in sequence. To perform these operations, is it better to chain together multiple Select() queries like
resultSet.Select(stepOneDelegate).Select(stepTwoDelegate).Select(stepThreeDelegate);
or instead to perform these three steps in a single call?
resultSet.Select(item => stepThree(stepTwo(stepOne(item))));
Note: The three steps themselves are not necessarily functions. They are meant to be a concise demonstration of the problem. If that has an effect on the answer please include that information.
Any performance difference would be negligible, but the definitive answer to that is simply "test it". The question is really about readability: which one is easier to understand and makes it clearer what is going on?
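If you do want to test it, a rough sketch along these lines is usually enough to show that the difference is noise; the three step methods here are placeholders standing in for the real delegates:
using System;
using System.Diagnostics;
using System.Linq;

class SelectChainBenchmark
{
    // Placeholder steps standing in for stepOne/stepTwo/stepThree.
    static int StepOne(int x) => x + 1;
    static int StepTwo(int x) => x * 2;
    static int StepThree(int x) => x - 3;

    static void Main()
    {
        var source = Enumerable.Range(0, 1000000).ToArray();

        var sw = Stopwatch.StartNew();
        var chained = source.Select(StepOne).Select(StepTwo).Select(StepThree).ToList();
        Console.WriteLine($"Chained:  {sw.ElapsedMilliseconds} ms");

        sw.Restart();
        var composed = source.Select(x => StepThree(StepTwo(StepOne(x)))).ToList();
        Console.WriteLine($"Composed: {sw.ElapsedMilliseconds} ms");
    }
}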
Cases where I have needed to project on a projection include working with EF LINQ expressions, where I ultimately need to do something that isn't supported by EF's LINQ provider, so I have to materialize a projection (usually to an anonymous type) and then finish the expression before selecting the final output. In those cases you would need to use the first form.
Personally I'd probably just stick with the first scenario, as to me it is easy to understand what is going on, and it easily supports additions such as filtering with Where or ordering with OrderBy. The second scenario only really comes up when I'm building a composite model:
.Select(x => new OuterModel
{
Id = x.Id,
InnerModel = x.Inner.Select(i => new InnerModel
{
// ...
})
}) // ...
In most cases though this can be handled through Automapper.
I'd be wary of any code that I felt needed to chain a lot of Select expressions, as it would smell like trying to do too much in one expression chain. Making something easy to understand, even if it involves a few extra steps and might add a few milliseconds to the execution, is far better than risking bugs introduced because someone misunderstood complex-looking code.
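To make the "easily supports additions" point concrete, here is a small sketch of the chained form growing a filter and an ordering without restructuring anything; resultSet and the step delegates are the question's placeholders, and the filtering/ordering lambdas are made up for illustration:
var results = resultSet
    .Select(stepOneDelegate)
    .Where(intermediate => intermediate != null)  // filtering can slot in between steps
    .Select(stepTwoDelegate)
    .Select(stepThreeDelegate)
    .OrderBy(final => final.ToString())           // ordering likewise, on whatever key makes sense
    .ToList();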
I'm going through old projects at work trying to make them faster, and I'm currently looking at some web APIs. One API is running particularly slowly; the problem is in the data service it is calling. Specifically, it is in a lambda method that maps a stored procedure result to a domain model. A simplified version of the code:
public IEnumerable<DomainModelResult> GetData()
{
return this.EntityFrameworkDB.GetDataSproc().ToList()
.Select(sprocResults=>sprocResults.ToDomainModelResult())
.AsEnumerable();
}
This is a simplified version, but after profiling I found the major hang-up is in the lambda function. I am assuming this is because the EF context is still open and some goofy Entity Framework stuff is happening.
The problem is that I'm relatively new to Entity Framework (I'm an intern) and pretty ignorant of its inner workings. Could someone explain why this is so slow? I feel it should be very fast: DomainModelResult is a POCO and only setters are used in ToDomainModelResult.
Edit:
I thought ToList() would do that, but I started to doubt myself because I couldn't think of another explanation. All the ToDomainModelResult() stuff is extremely simple. Something like:
public static DomainModelResult ToDomainModelResult(this SprocResult source)
{
return new DomainModelResult
{
FirstName = source.description,
MiddleName = source._middlename,
LastName = source.lastname,
UserName = source.expr2,
Address = source.uglyName
};
}
It's just a bunch of simple setters; I think the model causing problems has 17 properties. The reason this is done is that the project is an old database-first one and the stored procedures have ugly column names that aren't descriptive at all. It also makes swapping out stored procedures in the data services easy without breaking the rest of the project.
Edit 2: For some reason, using ToArray and breaking apart the LINQ statements makes the assignment from procedure result to domain model result extremely fast. Now the whole data service method is faster, which is odd; I don't know where the rest of the time went.
This might be a more esoteric question than I originally thought. My question hasn't been answered, but the problem is no longer there. Thanks to all who replied. I'm keeping this unanswered for now.
Edit 3: Please flag this question for removal; I can't remove it myself. I found the problem, but it is totally unrelated to my original question: I misunderstood the problem when I asked it. The apparent speed increase I'm chalking up to compiler optimization and to running the code under the profiler. The real issue wasn't in my lambda but in a dynamic lambda called by Entity Framework when the context is closed or an object is accessed; it was doing data validation, and GetString, GetInt32, and IsDBNull were eating up the most time. So I'm assuming Microsoft has already optimized those methods, and the only way to speed this up may be to make some columns non-nullable in the procedure. This question is misleading and so esoteric that I don't think it belongs here; it will just confuse people. Sorry.
You should split the code and check which part is taking the time:
public IEnumerable<DomainModelResult> GetData()
{
var lst = this.EntityFrameworkDB.GetDataSproc().ToList();
return lst
.Select(sprocResults=>sprocResults.ToDomainModelResult())
.AsEnumerable();
}
I am pretty sure the GetDataSproc call is taking most of your time; you need to optimize the stored procedure code.
Update
If possible, it is better to do more work on the SQL side instead of retrieving 60,000 rows into memory. A few possible solutions:
If you need to display this information, do paging (TOP and SKIP).
If you are filtering, calculating, or grouping anything after you retrieve the rows into memory, do it in your stored proc instead.
On the .NET side, since you are returning IEnumerable, you may be able to use yield for the second part, depending on your architecture (see the sketch after this list).
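For the last point, a minimal sketch of what a yield-based version of the mapping half might look like, assuming the same GetDataSproc and ToDomainModelResult from the question:
public IEnumerable<DomainModelResult> GetData()
{
    // Materialise the sproc results once, then stream the mapping lazily.
    foreach (var row in this.EntityFrameworkDB.GetDataSproc().ToList())
    {
        yield return row.ToDomainModelResult();
    }
}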
I'm looking to retrieve the last row of a table by the table's ID column. What I am currently using works:
var x = db.MyTable.OrderByDescending(d => d.ID).FirstOrDefault();
Is there any way to get the same result with more efficient speed?
I cannot see why this would query through the entire table.
Do you not have an index on the ID column?
Can you add the results of analysing the query to your question, because this is not how it should be.
As well as the analysis results, show the SQL produced. I can't see how it would be anything other than select top 1 * from MyTable order by id desc, only with explicit column names and some aliasing. Nor, if there's an index on id, how it would be anything other than a scan on that index.
Edit: That promised explanation.
Linq gives us a set of common interfaces, and in the case of C# and VB.NET some keyword support, for a variety of operations upon sources which return 0 or more items (e.g. in-memory collections, database calls, parsing of XML documents, etc.).
This allows us to express similar tasks regardless of the underlying source. Your query for example includes the source, but we could do a more general form of:
public static YourType FinalItem(IQueryable<YourType> source)
{
return source.OrderByDescending(d => d.ID).FirstOrDefault();
}
Now, we could do:
IEnumerable<YourType> l = SomeCallThatGivesUsAList();
var x = FinalItem(db.MyTable);//same as your code.
var y = FinalItem(l.AsQueryable()); // item in list with highest id.
var z = FinalItem(db.MyTable.Where(d => d.ID % 10 == 0)); // item with highest id that ends in zero.
But the really important part, is that while we've a means of defining the sort of operation we want done, we can have the actual implementation hidden from us.
The call to OrderByDescending produces an object that has information on its source, and the lambda function it will use in ordering.
The call to FirstOrDefault in turn has information on that, and uses it to obtain a result.
In the case with the list, the implementation is to produce the equivalent Enumerable-based code (Queryable and Enumerable mirror each other's public members, as do the interfaces they use such as IOrderedQueryable and IOrderedEnumerable and so on).
This is because, with a list that we don't know is already sorted in the order we care about (or in the opposite order), there isn't any faster way than to examine each element. The best we can hope for is an O(n) operation, and we might get an O(n log n) operation - depending on whether the implementation of the ordering is optimised for the possibility of only one item being taken from it*.
Or to put it another way, the best we could hand-code in code that only worked on enumerables is only slightly more efficient than:
public static YourType FinalItem(IEnumerable<YourType> source)
{
YourType highest = default(YourType);
int highestID = int.MinValue;
foreach(YourType item in source)
{
int curID = item.ID;
if(highest == null || curID > highestID)
{
highest = item;
highestID = curID;
}
}
return highest;
}
We can do slightly better with some micro-opts on handling the enumerator directly, but only slightly and the extra complication would just make for less-good example code.
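For the curious, here is a sketch of that enumerator-handling version, kept deliberately close to the loop above; all it really saves is the null/sentinel bookkeeping on the first element:
public static YourType FinalItem(IEnumerable<YourType> source)
{
    using (var e = source.GetEnumerator())
    {
        if (!e.MoveNext())
            return default(YourType); // empty source: mirrors FirstOrDefault

        YourType highest = e.Current; // seed from the first item,
        int highestID = highest.ID;   // so no null check is needed in the loop

        while (e.MoveNext())
        {
            int curID = e.Current.ID;
            if (curID > highestID)
            {
                highest = e.Current;
                highestID = curID;
            }
        }
        return highest;
    }
}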
Since we can't do any better than that by hand, and since the linq code doesn't know anything more about the source than we do, that's the best we could possibly hope for it matching. It might do less well (again, depending on whether the special case of our only wanting one item was thought of or not), but it won't beat it.
However, this is not the only approach linq will ever take. It'll take a comparable approach with an in-memory enumerable source, but your source isn't such a thing.
db.MyTable represents a table. To enumerate through it gives us the results of an SQL query more or less equivalent to:
SELECT * FROM MyTable
However, db.MyTable.OrderByDescending(d => d.ID) is not the equivalent of calling that, and then ordering the results in memory. Because queries get processed as a whole when they are executed, we actually get the result of an SQL query more or less like:
SELECT * FROM MyTable ORDER BY id DESC
Finally, the entire query db.MyTable.OrderByDescending(d => d.ID).FirstOrDefault() results in a query like:
SELECT TOP 1 * FROM MyTable ORDER BY id DESC
Or
SELECT * FROM MyTable ORDER BY id DESC LIMIT 1
Depending upon what sort of database server you are using. Then the results get passed to code equivalent to the following ADO.NET-based code:
return dataReader.Read() ?
new MyType{ID = dataReader.GetInt32(0), SomeNumber = dataReader.GetInt32(1), SomeText = dataReader.GetString(2)} // or similar; property names here are placeholders
: null;
You can't get much better.
And as for that SQL query. If there's an index on the id column (and since it looks like a primary key, there certainly should be), then that index will be used to very quickly find the row in question, rather than examining each row.
In all, because different linq providers use different means to fulfil the query, they can all try their best to do so in the best way possible. Of course, being in an imperfect world we'll no doubt find that some are better than others. What's more, they can even work to pick the best approach for different conditions. One example of this is that database-related providers can choose different SQL to take advantage of features of different versions of databases. Another is that the implementation of the version of Count() that works with in-memory enumerations works a bit like this:
public static int Count<T>(this IEnumerable<T> source)
{
var asCollT = source as ICollection<T>;
if(asCollT != null)
return asCollT.Count;
var asColl = source as ICollection;
if(asColl != null)
return asColl.Count;
int tally = 0;
foreach(T item in source)
++tally;
return tally;
}
This is one of the simpler cases (and a bit simplified again in my example here; I'm showing the idea, not the actual code), but it shows the basic principle of code taking advantage of more efficient approaches when they're available - the O(1) Length of arrays and the Count property on collections, which is sometimes O(1) (and we haven't made things worse in the cases where it's O(n)) - and then, when those aren't available, falling back to a less efficient but still functional approach.
The result of all of this is that Linq tends to give very good bang for buck, in terms of performance.
Now, a decent coder should be able to match or beat its approach to any given case most of the time†, and even when Linq comes up with the perfect approach there are some overheads to it itself.
However, over the scope of an entire project, using Linq means that we can concisely create reasonably efficient code that relates to a relatively constrained number of well-defined entities (generally one per table as far as databases go). In particular, the use of anonymous types and joins means that we get queries that are very good. Consider:
var result = from a in db.Table1
join b in db.Table2
on a.relatedBs equals b.id
select new {a.id, b.name};
Here we're ignoring the columns we don't care about, and the SQL produced will do the same. Consider what we would do if we were creating the objects that a and b relate to with hand-coded DAO classes:
1. Create a new class to represent this combination of a's id and b's name, and relevant code to run the query we need to produce instances.
2. Run a query to obtain all information about each a and the related b, and live with the waste.
3. Run a query to obtain the information on each a and b that we care about, and just set default values for the other fields.
Of these, option 2 will be wasteful, perhaps very wasteful. Option 3 will be a bit wasteful and very error prone (what if we accidentally try to use a field elsewhere in the code that wasn't set correctly?). Only option 1 will be more efficient than what the linq approach will produce, but this is just one case. Over a large project this could mean producing dozens or even hundreds or thousands of slightly different classes (and unlike the compiler, we won't necessarily spot the cases where they're actually the same). In practice, therefore, linq can do us some great favours when it comes to efficiency.
Good policies for efficient linq are:
Stay with the type of query you start with as long as you can. Whenever you grab items into memory with ToList(), ToArray(), etc., consider whether you really need to. Unless you need to, or you can clearly state the advantage doing so gives you, don't.
If you do need to move to processing in memory, favour AsEnumerable() over ToList() and the other means, so you only grab one item at a time.
Examine long-running queries with SQL Profiler or similar. There are a handful of cases where the first policy here is wrong and moving to memory with AsEnumerable() is actually better (most relate to uses of GroupBy that don't use aggregates on the non-grouped fields, and hence don't actually have a single SQL query they correspond with); see the sketch after this list.
If a complicated query is hit many times, then CompiledQuery can help (less so with 4.5 since it has automatic optimisations that cover some of the cases it helps in), but it's normally better to leave that out of the first approach and then use it only in hot-spots that are efficiency problems.
You can get EF to run arbitrary SQL, but avoid it unless it's a strong gain, because too much such code reduces the consistent readability that using a linq approach throughout gives (I have to say, though, that I think Linq2SQL beats EF on calling stored procedures and even more so on calling UDFs, but even there this still applies - it's less clear from just looking at the code how things relate to each other).
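As a sketch of the GroupBy exception mentioned in the third point (the table and column names here are assumptions, not from the question): grouping whole rows, rather than aggregates, has no single SQL equivalent, so switching to memory after the filter can be the better option.
var ordersByCustomer = db.Orders
    .Where(o => o.CreatedOn >= cutoff)  // this filter still runs as SQL
    .AsEnumerable()                     // switch to LINQ-to-Objects from here on
    .GroupBy(o => o.CustomerId)         // group the full rows in memory
    .ToList();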
*AFAIK, this particular optimisation isn't applied, but we're talking about the best possible implementation at this point, so it doesn't matter if it is, isn't, or is in some versions only.
†I'll admit though that Linq2SQL would often produce queries that use APPLY that I would not think of, as I was used to thinking of how to write queries in versions of SQLServer before 2005 introduced it, while code doesn't have those sort of human tendencies to go with old habits. It pretty much taught me how to use APPLY.
While I was programming I came up with this question,
What is better: having a method accept a single entity, or a List of those entities?
For example I need a List of strings. I can either have:
a method accepting a List and returning a List of strings with the results:
List<string> results = methodwithlist(listOfObjects);
or
a method accepting a single object and returning a string, then calling it in a loop to fill a list:
List<string> results = new List<string>();
for (int i = 0; i < listOfObjects.Count; i++)
{
    results.Add(methodwithsingleobject(listOfObjects[i]));
}
This is just an example. I need to know which one is better, or more commonly used, and why.
Thanks!
Well, it's easy to build the first form when you've got the second - but using LINQ, you really don't need to write your own, once you've got the projection. For example, you could write:
List<string> results = objectList.Select(x => MethodWithSingleObject(x)).ToList();
Generally it's easier to write and test a method which only deals with a single value, unless it actually needs to know the rest of the values in the collection (e.g. to find aggregates).
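A quick sketch of that distinction; the method and type names here are made up for illustration:
using System;
using System.Collections.Generic;
using System.Linq;

static class SingleVersusListExample
{
    // Easy to write and test in isolation: one value in, one value out.
    static string Describe(int item) => "Item " + item;

    // Needs the whole collection, because it aggregates across it.
    static string DescribeLargest(IReadOnlyCollection<int> items) =>
        items.Count == 0 ? "empty" : "Largest is " + items.Max();

    static void Main()
    {
        var numbers = new List<int> { 3, 1, 4 };
        List<string> descriptions = numbers.Select(Describe).ToList();
        Console.WriteLine(string.Join(", ", descriptions));
        Console.WriteLine(DescribeLargest(numbers));
    }
}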
I would choose the second because it's easier to use when you have a single object (i.e. it's more general-purpose). Also, the responsibility of the method itself is clearer, because the method should not have anything to do with lists if its purpose is just to turn one object into a string.
Also, you can simplify the call with Linq:
result = yourList.Select(p => methodwithsingleobject(p));
This question comes up a lot when learning any language, the answer is somewhat moot since the standard coding practice is to rely upon LINQ to optimize the code for you at runtime. But this presumes you're using a version of the language that supports it. But if you do want to do some research on this there are a few Stack Overflow articles that delve into this and also give external resources to review:
In .NET, which loop runs faster, 'for' or 'foreach'?
C#, For Loops, and speed test... Exact same loop faster second time around?
What I have learned, though, is not to rely too heavily on Count() and to use Length on arrays and typed collections where available, as that can be a lot faster.
Hope this is helpful.
So in the following bit of code, I really like the verbosity of scenario one, but I would like to know how big a performance hit it takes compared to scenario two. Is instantiation inside the loop a big deal?
Is the syntactic benefit (which I like; some might not even agree it is verbose or optimal) worth said performance hit? You can assume that the collections remain reasonably small (N < a few hundred).
// First scenario
var productCategoryModels = new List<ProductCategoryModel>();
foreach (var productCategory in productCategories)
{
var model = new ProductCategoryModel.ProductCategoryModelConverter(currentContext).Convert(productCategory);
productCategoryModels.Add(model);
}
// Second scenario
var productCategoryModels = new List<ProductCategoryModel>();
var modelConvert = new ProductCategoryModel.ProductCategoryModelConverter(currentContext);
foreach (var productCategory in productCategories)
{
var model = modelConvert.Convert(productCategory);
productCategoryModels.Add(model);
}
Would love to hear your guys' thoughts on this as I see this quite often.
I would approach this question a bit differently. If whatever happens in new ProductCategoryModel.ProductCategoryModelConverter(currentContext) doesn't change during the loop, I see no reason to include it within the loop. If it is not part of the loop, it shouldn't be in there, imo.
If you include it just because it looks better, you force the reader to figure out if it makes a difference or not.
Like Brian, I see no reason to create a new instance if you're not actually changing it - I'm assuming that Convert doesn't change the original object?
I'd recommend using LINQ to avoid the loop entirely though (in terms of your own code):
var modelConverter = new ProductCategoryModel.ProductCategoryModelConverter(currentContext);
var models = productCategories.Select(x => modelConverter.Convert(x))
.ToList();
In terms of performance, it would depend on what ProductCategoryModelConverter's constructor had to do. Just creating a new object is pretty cheap, in terms of overhead. It's not free, of course - but I wouldn't expect this to cause a bottleneck in most cases. However, I'd say that instantiating it in a loop would imply to the reader that it's necessary; that there was some reason not to use just a single object. A converter certainly sounds like something which will stay immutable as it does its work... so I'd be puzzled by the instantiate-in-a-loop version.
Keep whichever of the formats you're happiest with - if they both perform well enough for your application. Optimize later if you encounter performance problems.