First() taking time in LINQ - C#

1) When I use it without First(), it takes 8 ms:
IEnumerable<string> Discriptionlist = (from lib in ProgramsData.Descendants("program")
                                       where lib.Attribute("TMSId").Value == TMSIds
                                       select lib.Element("descriptions").Element("desc").Value);
2) With First(), it takes 248 ms:
string Discriptionlist = (from lib in ProgramsData.Descendants("program")
                          where lib.Attribute("TMSId").Value == TMSIds
                          select lib.Element("descriptions").Element("desc").Value).First();
The data is read with:
using (var sr = new StreamReader(FilePath))
{
    Xdoc = XDocument.Load(sr);
}
Is there a solution or another way to reduce the time (to less than 248 ms) and still get the result as a string? Thank you.

The first statement just creates an IEnumerable; the actual query runs only when you start enumerating. The second statement performs the enumeration, which is why it is slower.
You'll notice the same thing with the same statement if you run this:
string DiscriptionListStr;
foreach(var a in Discriptionlist)
{
DiscriptionListStr = a;
break;
}

LINQ uses a feature called deferred ("lazy") execution. In practice this means that in many cases a LINQ expression does not actually do anything when it is defined; it is merely ready to do something when asked. When you ask for an element, it performs the work needed to produce that element at that time.
Since your first statement never asks for an element, the data is not even queried. In your second, you ask for the first element, so the query has to run.
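For example, here is a tiny stand-alone sketch (an in-memory list, not your XML) that makes the deferred execution visible:
using System;
using System.Collections.Generic;
using System.Linq;

var numbers = new List<int> { 1, 2, 3 };
var query = numbers.Where(n =>
{
    Console.WriteLine("checking " + n);    // side effect shows when the query actually runs
    return n > 1;
});
Console.WriteLine("query defined - nothing checked yet");
int first = query.First();                 // only now does "checking 1" and "checking 2" print
Console.WriteLine("first match: " + first);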

Related

Slow LINQ Performance on DataTable Where Clause?

I'm dumping a table out of MySQL into a DataTable object using MySqlDataAdapter. Database input and output is doing fine, but my application code seems to have a performance issue I was able to track down to a specific LINQ statement.
The goal is simple: search the contents of the DataTable for a column value matching a specific string, just like a traditional WHERE column = 'text' SQL clause.
Simplified code:
foreach (String someValue in someList) {
String searchCode = OutOfScopeFunction(someValue);
var results = emoteTable.AsEnumerable()
.Where(myRow => myRow.Field<String>("code") == searchCode)
.Take(1);
if (results.Any()) {
results.First()["columnname"] = 10;
}
}
This simplified code is executed thousands of times, once for each entry in someList. When I run Visual Studio Performance Profiler I see that the "results.Any()" line is highlighted as consuming 93.5% of the execution time.
I've tried several different methods for optimizing this code, but none have improved performance while keeping the emoteTable DataTable as the primary source of the data. I can convert emoteTable to Dictionary<String, DataRow> outside of the foreach, but then I have to keep the DataTable and the Dictionary in sync, which while still a performance improvement, feels wrong.
Three questions:
Is this the proper way to search for a value in a DataTable (equivalent of a traditional SQL WHERE clause)? If not, how SHOULD it be done?
Addendum to 1, regardless of the proper way, what is the fastest (execution time)?
Why does the results.Any() line consume 90%+ resources? In this situation it makes more sense that the var results line should consume the resources, after all, it's the line doing the actual search, right?
Thank you for your time. If I find an answer I shall post it here as well.
Any() is taking 90% of the time because the query is only executed when you call Any(). Before you call Any(), the query has not actually run.
It would seem the problem is that you first fetch the entire table into memory and then search it. You should instruct your database to do the searching instead.
Moreover, when you call results.First(), the whole results query is executed again.
With deferred execution in mind, you should write something like
var result = emoteTable.AsEnumerable()
.Where(myRow => myRow.Field<String>("code") == searchCode)
.FirstOrDefault();
if (result != null) {
result["columnname"] = 10;
}
What you have implemented is pretty much a join:
var searchCodes = someList.Select(OutOfScopeFunction);
var emotes = emoteTable.AsEnumerable();
var results = Enumerable.Join(emotes, searchCodes, e => e.Field<String>("code"), sc => sc, (e, sc) => e);
foreach(var result in results)
{
result["columnname"] = 10;
}
Join will probably optimize the access to both lists using some kind of lookup.
But the first thing I would do is completely abandon the idea of combining DataTable and LINQ. They are two different technologies, and reasoning about what they do internally when combined is hard.
Did you try doing raw UPDATE calls? How many items are you expecting to update?
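A rough sketch of the raw-UPDATE idea, using MySql.Data (which the question already uses via MySqlDataAdapter); the table name emotes, the column names, and connectionString are assumptions for illustration, not taken from the question:
using MySql.Data.MySqlClient;

using (var conn = new MySqlConnection(connectionString))            // connectionString is assumed to exist
using (var cmd = new MySqlCommand(
    "UPDATE emotes SET columnname = 10 WHERE code = @code", conn))  // 'emotes' table name is a guess
{
    conn.Open();
    cmd.Parameters.Add("@code", MySqlDbType.VarChar);
    foreach (String someValue in someList)
    {
        cmd.Parameters["@code"].Value = OutOfScopeFunction(someValue);
        cmd.ExecuteNonQuery();                                       // one UPDATE per search code
    }
}
This pushes the search to the database instead of scanning the in-memory DataTable thousands of times.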

C# Linq - Delayed Execution

If I build a query, say:
(the query is built using the XDocument class from System.Xml.Linq)
var elements = from e in calendarDocument.Root.Elements("elementName") select e;
and then I call elements.Last() SEVERAL times. Will each call return the most up to date Last() element?
For example if I do
elements.Last().AddAfterSelf(new XElement("elementName", "someValue1"));
elements.Last().AddAfterSelf(new XElement("elementName", "someValue2"));
elements.Last().AddAfterSelf(new XElement("elementName", "someValue3"));
elements.Last().AddAfterSelf(new XElement("elementName", "someValue4"));
Is it actually getting the latest element each time and adding a new one to the end, or is elements.Last() the same element each time?
Yes, LINQ queries are lazily evaluated. It's not until you call Last() that the query is executed. In this case it will get the most up-to-date last element each time.
I think this is actually the first time I've seen a proper use of calling Last() (or any other operator that executes the query) several times. When people do it, it is usually by mistake, causing bad performance.
I think that the linq-to-xml library is smart enough to get good performance for your query, but I wouldn't trust it without trying.
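A small self-contained sketch (not your document, just an illustrative one) that demonstrates the re-evaluation:
using System;
using System.Linq;
using System.Xml.Linq;

var root = new XElement("root", new XElement("elementName", "someValue0"));
var elements = from e in root.Elements("elementName") select e;

for (int i = 1; i <= 4; i++)
{
    // Last() re-runs the query, so it always sees the element appended on the previous pass.
    elements.Last().AddAfterSelf(new XElement("elementName", "someValue" + i));
}

Console.WriteLine(elements.Count());        // 5
Console.WriteLine(elements.Last().Value);   // someValue4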

Performance issue with Linq to object taking more time

I have XML that gets updated based on object values. The foreach loop here is taking almost 12-15 minutes to process a 200 KB XML file. Please suggest how I can improve the performance. (The XML is nested four levels deep, with about ten child tags at the fourth level.)
Code:
IEnumerable<XElement> elements = xmlDoc.Descendants();
foreach (DataSource Data in DataLst)
{
XElement xmlElem = (from xmlData in elements
where Data.Name == xmlData.Name.LocalName //Name
&& Data.Store == xmlData.Element(XName.Get("Store", "")).Value
&& Data.Section == xmlData.Element(XName.Get("Section", "")).Value
select xmlData.Element(XName.Get("Val", ""))).Single();
xmlElem.ReplaceWith(new XElement(XName.Get("Val", ""), Data.Value));
}
It looks like you have an O(n×m) issue here, for n = the size of DataLst and m = the size of the XML. To make this O(n+m), you should index the data; for example:
var lookup = elements.ToLookup(
x => new {
Name = x.Name.LocalName,
Store = x.Element(XName.Get("Store", "")).Value,
Section = x.Element(XName.Get("Section", "")).Value},
x => x.Element(XName.Get("Val", ""))
);
foreach (DataSource Data in DataLst)
{
XElement xmlElem = lookup[
new {Data.Name, Data.Store, Data.Section}].Single();
xmlElem.ReplaceWith(new XElement(XName.Get("Val", ""), Data.Value));
}
(untested - to show general approach only)
I think a better approach would be to deserialize the XML to C# classes and then use LINQ on those objects; that should be fast.
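A rough sketch of that idea with XmlSerializer; the class and element names (Item, Store, Section, Val, data.xml) are assumptions for illustration only - the question's document apparently uses varying element names (matched via Name.LocalName), which would need a different mapping:
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Serialization;

public class Item
{
    public string Store { get; set; }
    public string Section { get; set; }
    public string Val { get; set; }
}

[XmlRoot("Items")]
public class ItemList
{
    [XmlElement("Item")]
    public List<Item> Items { get; set; }
}

// Deserialize once, then query plain objects instead of navigating XML repeatedly.
var serializer = new XmlSerializer(typeof(ItemList));
ItemList data;
using (var reader = new StreamReader("data.xml"))      // hypothetical file name
{
    data = (ItemList)serializer.Deserialize(reader);
}
var match = data.Items.FirstOrDefault(i => i.Store == "store1" && i.Section == "section1");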
"Well thanks everyone for your precious time and effort"
Problem answer: Actually the object 'DataLst' was of the type IEnumerable<> which was taking time in obtaining the values but after I changed it to the List<> type the performance improved drastically (now running in 20 seconds)
If it really takes this long to run, then maybe do something like this:
Don't iterate both - only iterate the XML file, and load the data from your DataLst up front (use a SQL command or a simple LINQ statement to load the data based on Name/Store/Section). Make a simple struct/class for your key with this data (Name/Store/Section) - don't forget to implement Equals and GetHashCode - and put the values in a dictionary keyed by it.
Then iterate through your XML elements and use the dictionary to find the values to replace (see the sketch below).
This way you only iterate the XML file once, not once for every item in your data source.
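An untested sketch of that approach; it assumes DataSource exposes Name/Store/Section/Value as in the question and that the Store/Section/Val elements have no namespace:
using System;
using System.Linq;
using System.Xml.Linq;

// Key type combining Name/Store/Section; Equals/GetHashCode matter for dictionary lookups.
struct DataKey : IEquatable<DataKey>
{
    public string Name, Store, Section;
    public bool Equals(DataKey other) =>
        Name == other.Name && Store == other.Store && Section == other.Section;
    public override bool Equals(object obj) => obj is DataKey other && Equals(other);
    public override int GetHashCode() =>
        (Name ?? "").GetHashCode() ^ (Store ?? "").GetHashCode() ^ (Section ?? "").GetHashCode();
}

// Build the dictionary once from DataLst...
var newValues = DataLst.ToDictionary(
    d => new DataKey { Name = d.Name, Store = d.Store, Section = d.Section },
    d => d.Value);

// ...then walk the XML only once.
foreach (XElement element in xmlDoc.Descendants())
{
    var store = element.Element("Store");
    var section = element.Element("Section");
    if (store == null || section == null) continue;

    var key = new DataKey { Name = element.Name.LocalName, Store = store.Value, Section = section.Value };
    if (newValues.TryGetValue(key, out string newValue))
    {
        element.Element("Val")?.ReplaceWith(new XElement("Val", newValue));
    }
}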
It's not clear why it's taking that long - that's a very long time. How many elements are in DataLst? I would rewrite the query for simplicity to start with though:
IEnumerable<XElement> elements = xmlDoc.Descendants();
foreach (DataSource data in DataLst)
{
XElement valElement = (from element in xmlDoc.Descendants(data.Name)
where data.Store == element.Element("Store").Value
&& data.Section == element.Element("Section").Value
select element.Element("Val")).Single();
valElement.ReplaceWith(new XElement("Val", data.Value));
}
(I'm assuming none of your elements actually have namespaces, by the way.)
Next up: consider replacing the contents of valElement instead of replacing the element itself. Change it to:
valElement.ReplaceAll(data.Value);
Now, this has all been trying to keep to the simplicity of avoiding precomputation etc... because it sounds like it shouldn't take this long. However, you may need to build lookups as Marc and Carsten suggested.
Try replacing the Single() call in the LINQ query with First().
At the risk of flaming, have you considered writing this in XQuery instead? There's a good chance that a decent XQuery processor would have a join optimiser that handles this query efficiently.

Which is faster in .NET, .Contains() or .Count()?

I want to compare an array of modified records against a list of records pulled from the database, and delete those records from the database that do not exist in the incoming array. The modified array comes from a client app that maintains the database, and this code runs in a WCF service app, so if the client deletes a record from the array, that record should be deleted from the database. Here's the sample code snippet:
public void UpdateRecords(Record[] recs)
{
// look for deleted records
foreach (Record rec in UnitOfWork.Records.ToList())
{
var copy = rec;
if (!recs.Contains(rec)) // use this one?
if (0 == recs.Count(p => p.Id == copy.Id)) // or this one?
{
// if not in the new collection, remove from database
Record deleted = UnitOfWork.Records.Single(p => p.Id == copy.Id);
UnitOfWork.Remove(deleted);
}
}
// rest of method code deleted
}
My question: is there a speed advantage (or other advantage) to using the Count method over the Contains method? The Id property is guaranteed to be unique and to identify that particular record, so you don't need to do a bitwise compare, as I assume Contains might do.
Anyone?
Thanks, Dave
This would be faster:
if (!recs.Any(p => p.Id == copy.Id))
This has the same advantages as using Count(), but it also stops after it finds the first match, unlike Count().
You should not even consider Count, since you are only checking for the existence of a record. You should use Any instead.
Using Count forces iterating the entire enumerable to get the correct count; Any stops enumerating as soon as it finds the first element.
As for the use of Contains, you need to consider whether, for the specified type, reference equality is equivalent to the Id comparison you are performing. By default it is not.
Assuming Record implements both GetHashCode and Equals properly, I'd use a different approach altogether:
// I'm assuming it's appropriate to pull down all the records from the database
// to start with, as you're already doing it.
foreach (Record recordToDelete in UnitOfWork.Records.ToList().Except(recs))
{
UnitOfWork.Remove(recordToDelete);
}
Basically there's no need to have an N * M lookup time - the above code will end up building a set of records from recs based on their hash code, and find non-matches rather more efficiently than the original code.
If you've actually got more to do, you could use:
HashSet<Record> recordSet = new HashSet<Record>(recs);
foreach (Record recordFromDb in UnitOfWork.Records.ToList())
{
if (!recordSet.Contains(recordFromDb))
{
UnitOfWork.Remove(recordFromDb);
}
else
{
// Do other stuff
}
}
(I'm not quite sure why your original code is refetching the record from the database using Single when you've already got it as rec...)
Contains() is going to use Equals() against your objects. If you have not overridden this method, it's even possible Contains() is returning incorrect results. If you have overridden it to use the object's Id to determine identity, then Count() and Contains() are almost doing the exact same thing, except Contains() will short-circuit as soon as it hits a match, whereas Count() will keep on counting. Any() might be a better choice than both of them.
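For illustration, a sketch of what overriding equality by Id could look like (the real Record class is the question's; this assumes it has an int Id property, as the lambdas suggest):
public class Record
{
    public int Id { get; set; }

    // Compare by Id so Contains() and HashSet lookups match the Any(p => p.Id == ...) semantics.
    public override bool Equals(object obj) => obj is Record other && other.Id == Id;
    public override int GetHashCode() => Id;
}
With something like this in place, recs.Contains(rec) and recs.Any(p => p.Id == copy.Id) agree on what counts as a match.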
Do you know for certain this is a bottleneck in your app? It feels like premature optimization to me. Which is the root of all evil, you know :)
Since you're guaranteed that there will be one and only one, Any might be faster, because as soon as it finds a record that matches it will return true.
Count will traverse the entire list, counting each occurrence. So even if the item is #1 in a list of 1000 items, it's still going to check all 1000.
EDIT
Also, this might be a time to mention not doing a premature optimization.
Wire up both your methods, put a stopwatch before and after each one.
Create a sufficiently large list (1000 items or more, depending on your domain.) And see which one is faster.
My guess is that we're talking on the order of ms here.
I'm all for writing efficient code, just make sure you're not taking hours to save 5 ms on a method that gets called twice a day.
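A rough sketch of that kind of micro-benchmark (data is synthetic; real numbers depend entirely on your domain):
using System;
using System.Diagnostics;
using System.Linq;

var recs = Enumerable.Range(0, 1_000_000).Select(i => new { Id = i }).ToArray();
int targetId = 0;                          // best case for Any: the match is the very first element

var sw = Stopwatch.StartNew();
bool foundByCount = recs.Count(p => p.Id == targetId) > 0;   // still scans all 1,000,000 items
sw.Stop();
Console.WriteLine($"Count: {sw.ElapsedMilliseconds} ms");

sw.Restart();
bool foundByAny = recs.Any(p => p.Id == targetId);           // stops at the first item
sw.Stop();
Console.WriteLine($"Any:   {sw.ElapsedMilliseconds} ms");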
It could be as simple as:
UnitOfWork.Records.RemoveAll(r => !recs.Any(rec => rec.Id == r.Id));
May I suggest an alternative approach that should be faster, I believe, since Count would continue even after the first match.
public void UpdateRecords(Record[] recs)
{
// look for deleted records
foreach (Record rec in UnitOfWork.Records.ToList())
{
var copy = rec;
if (!recs.Any(x => x.Id == copy.Id))
{
// if not in the new collection, remove from database
Record deleted = UnitOfWork.Records.Single(p => p.Id == copy.Id);
UnitOfWork.Remove(deleted);
}
}
// rest of method code deleted
}
That way you are sure to stop at the first match instead of continuing to count.
If you need to know the actual number of elements, use Count(); it's the only way. If you are checking for the existence of a matching record, use Any() or Contains(). Both are MUCH faster than Count(), and both will perform about the same, but Contains will do an equality check on the entire object while Any() will evaluate a lambda predicate based on the object.

Linq to Entities 4.0 - Optimized Query and multiple calls to database in a single query

Can anyone tell me if the following query calls the database multiple times or just once?
var bizItems = new
{
items = (
from bl in db.rc_BusinessLocations
orderby bl.rc_BusinessProfiles.BusinessName
select new BusinessLocationItem
{
BusinessLocation = bl,
BusinessProfile = bl.rc_BusinessProfiles,
Products = bl.rc_InventoryProducts_Business
})
.Skip(pageSkip).Take(pageSize),
Count = db.rc_BusinessLocations.Count()
};
I really need to get the Count() out of the query and I couldn't find another way to do it so if you have some better optimized code, feel free to share it!
Thanks in advance!
Gwar
It totally depends on what you are doing with the bizItems variable, because after you've run just this code, only a COUNT(*) query will have run. This is because the item contains an IQueryable, which is a description of a query (an intent to run), not the result of the operation. The query will only run when you start iterating it, by using foreach or an operator such as .Count(). Besides this, the BusinessProfile and Products properties will probably also contain IQueryables.
So, let's take a look at what you might do with this code:
foreach (var item in bizItems.items)
{
Console.WriteLine(item.BusinessLocation.City);
foreach (var profile in item.BusinessProfile)
{
Console.WriteLine(profile.Name);
}
foreach (var product in item.Products)
{
Console.WriteLine(product.Price);
}
Console.WriteLine(bizItems.Count);
Console.WriteLine();
}
So, if you ask me again, looking at this code, how many queries will be sent to the database, my answer is: 2 + 2 * the number of items in bizItems.items. So the number of queries will be between 2 and (2 + 2 * pageSize).
You should check out Scott Guthrie's post on using the LINQ to SQL Debug Visualizer. This will allow you to see exactly the SQL that will be generated from your LINQ statement.
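If the LINQ to SQL visualizer doesn't fit (this looks like LINQ to Entities), one alternative is ObjectQuery.ToTraceString(), which returns the SQL for a query without running it. A sketch, assuming the query is backed by an Entity Framework ObjectQuery:
// Hypothetical: 'itemsQuery' stands for the LINQ to Entities query built for 'items' above.
var objectQuery = itemsQuery as System.Data.Objects.ObjectQuery;
if (objectQuery != null)
{
    Console.WriteLine(objectQuery.ToTraceString());   // the SQL that would be sent to the database
}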
This will definitely result in two calls.
