Performance issue with LINQ to Objects taking too much time - C#

I have an XML document that gets updated based on object values. The foreach loop here is taking almost 12-15 minutes to process a 200 KB XML file. Please suggest how I can improve the performance. (The XML file consists of four levels of tags, in which the child (4th level) tags are each 10 in number.)
Code:
IEnumerable<XElement> elements = xmlDoc.Descendants();
foreach (DataSource Data in DataLst)
{
    XElement xmlElem = (from xmlData in elements
                        where Data.Name == xmlData.Name.LocalName // Name
                           && Data.Store == xmlData.Element(XName.Get("Store", "")).Value
                           && Data.Section == xmlData.Element(XName.Get("Section", "")).Value
                        select xmlData.Element(XName.Get("Val", ""))).Single();
    xmlElem.ReplaceWith(new XElement(XName.Get("Val", ""), Data.Value));
}

It looks like you have an O(n×m) issue here, for n = the size of DataLst and m = the size of the XML. To make this O(n)+O(m), you should index the data; for example:
var lookup = elements.ToLookup(
    x => new {
        Name = x.Name.LocalName,
        Store = x.Element(XName.Get("Store", "")).Value,
        Section = x.Element(XName.Get("Section", "")).Value },
    x => x.Element(XName.Get("Val", ""))
);
foreach (DataSource Data in DataLst)
{
    XElement xmlElem = lookup[
        new { Data.Name, Data.Store, Data.Section }].Single();
    xmlElem.ReplaceWith(new XElement(XName.Get("Val", ""), Data.Value));
}
(untested - to show general approach only)

I think a better approach would be to deserialize the XML into C# classes and then use LINQ on those; it should be fast.
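A minimal sketch of that approach, assuming (purely for illustration) classes shaped like the XML in the question; the class and element names here are hypothetical:
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical POCOs mirroring the XML layout - adjust to the real element names.
public class Entry
{
    public string Store { get; set; }
    public string Section { get; set; }
    public string Val { get; set; }
}

[XmlRoot("Entries")]
public class EntryList
{
    [XmlElement("Entry")]
    public List<Entry> Items { get; set; }
}

public static class XmlLoader
{
    // Deserialize the file once, then query the plain objects with LINQ to Objects.
    public static EntryList Load(string path)
    {
        var serializer = new XmlSerializer(typeof(EntryList));
        using (var stream = File.OpenRead(path))
        {
            return (EntryList)serializer.Deserialize(stream);
        }
    }
}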

"Well thanks everyone for your precious time and effort"
Answer to the problem: the object 'DataLst' was actually of type IEnumerable<>, which was taking time to obtain the values; after I changed it to List<>, the performance improved drastically (it now runs in 20 seconds).
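A minimal sketch of that change; GetDataSources() is a hypothetical stand-in for however DataLst was originally produced:
// Materialize the sequence once so it is not re-evaluated on every enumeration.
List<DataSource> DataLst = GetDataSources().ToList();
foreach (DataSource Data in DataLst)
{
    // ... same update logic as above ...
}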

If it really takes this long to run, then maybe do something like this:
Don't iterate both collections - only iterate the XML file. Load the data from your DataLst (via a SQL command or a simple LINQ statement) keyed by Name/Store/Section, and make a simple struct/class for that key (Name/Store/Section) - don't forget to implement Equals and GetHashCode. Put the data into a dictionary keyed by that struct.
Then iterate through your XML elements and use the dictionary to find the values to replace (see the sketch below).
This way you only iterate the XML file once, not once for every entry in your DataSource.
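A hedged sketch of that key type and dictionary (names are illustrative, not taken from the question):
// Illustrative key type; Equals and GetHashCode make it usable as a dictionary key.
public struct DataKey : IEquatable<DataKey>
{
    public string Name;
    public string Store;
    public string Section;

    public bool Equals(DataKey other)
    {
        return Name == other.Name && Store == other.Store && Section == other.Section;
    }

    public override bool Equals(object obj)
    {
        return obj is DataKey && Equals((DataKey)obj);
    }

    public override int GetHashCode()
    {
        return (Name ?? "").GetHashCode() ^ (Store ?? "").GetHashCode() ^ (Section ?? "").GetHashCode();
    }
}

// Build the dictionary from DataLst once, then walk the XML a single time.
var valueByKey = DataLst.ToDictionary(
    d => new DataKey { Name = d.Name, Store = d.Store, Section = d.Section },
    d => d.Value);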

It's not clear why it's taking that long - that's a very long time. How many elements are in DataLst? I would rewrite the query for simplicity to start with though:
foreach (DataSource data in DataLst)
{
    XElement valElement = (from element in xmlDoc.Descendants(data.Name)
                           where data.Store == element.Element("Store").Value
                              && data.Section == element.Element("Section").Value
                           select element.Element("Val")).Single();
    valElement.ReplaceWith(new XElement("Val", data.Value));
}
(I'm assuming none of your elements actually have namespaces, by the way.)
Next up: consider replacing the contents of valElement instead of replacing the element itself. Change it to:
valElement.ReplaceAll(data.Value);
Now, this has all been trying to keep to the simplicity of avoiding precomputation etc... because it sounds like it shouldn't take this long. However, you may need to build lookups as Marc and Carsten suggested.

Try replacing the Single() call in the LINQ query with First().

At the risk of flaming, have you considered writing this in XQuery instead? There's a good chance that a decent XQuery processor would have a join optimiser that handles this query efficiently.

Related

First() taking time in LINQ

1) Without First(), it takes 8 ms:
IEnumerable<string> Discriptionlist = (from lib in ProgramsData.Descendants("program")
                                       where lib.Attribute("TMSId").Value == TMSIds
                                       select lib.Element("descriptions").Element("desc").Value);
2) With First(), it takes 248 ms:
string Discriptionlist = (from lib in ProgramsData.Descendants("program")
                          where lib.Attribute("TMSId").Value == TMSIds
                          select lib.Element("descriptions").Element("desc").Value).First();
The data is read using:
using (var sr = new StreamReader(FilePath))
{
Xdoc = XDocument.Load(sr);
}
Is there any solution or another way to reduce the time (to less than 248 ms) and still get the result as a string? Thank you.
The first statement just creates an IEnumerable; the actual query runs only when you start enumerating it. The second statement runs the enumeration, which is why it's slower.
You'll notice the same thing with the same statement if you run this:
string DiscriptionListStr;
foreach (var a in Discriptionlist)
{
    DiscriptionListStr = a;
    break;
}
LINQ uses a "feature" called deferred (lazy) execution. What this means in practice is that in certain cases a LINQ expression will not actually do anything; it is just ready to do something when asked. So when you ask for an element, it performs the work needed to produce that element at that time.
Since your first statement does not ask for an element, the source is not even queried. In your second statement you ask for the first element, so the query has to run.
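A hedged illustration of that behaviour with a simple in-memory sequence:
using System;
using System.Collections.Generic;
using System.Linq;

var numbers = new List<int> { 1, 2, 3 };
IEnumerable<int> query = numbers.Where(n =>
{
    Console.WriteLine("checking " + n);   // side effect to show when the work happens
    return n > 1;
});
// Nothing has been printed yet: the Where predicate has not run.
int firstMatch = query.First();           // prints "checking 1", "checking 2", then stops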

LINQ version of looping through array and performing multiple types of replacements

I have an array of strings defined called currentRow. I have a loop that looks something like this:
for (int index = 0; index < currentRow.Length; index++)
{
    if (currentRow[index] == string.Empty)
    {
        currentRow[index] = null;
    }
    if (currentRow[index] == "''")
    {
        currentRow[index] = string.Empty;
    }
}
If I were to do this as a LINQ query instead, what would it look like? Am I better off using a loop instead for this? It seems like with a LINQ query I would have to create multiple copies of the array.
If I were to do this as a LINQ query instead, what would it look like?
If you want to project to a new array (and optionally overwrite the existing reference) it would look like:
currentRow.Select(r => r == string.Empty ? null :
r == "''" ? string.Empty : r)
.ToArray();
If you wanted to use LINQ to modify the original collection, it would look like an evil lambda with side-effects too horrible to utter in this place.
Am I better off using a loop instead for this?
Probably. You avoid the creation of a new array.
It seems like with a LINQ query I would have to create multiple copies of the array.
No, just one additional copy. And if you overwrite the original reference (and if nothing else has a reference to it) it would get garbage collected as necessary.
You need projection, not selection.
It's important to remember that LINQ is designed for querying and not actually updating existing values.
If you wanted to do what you are indicating, you could use LINQ to create a projection of your existing collection that maps your values to others via a Select() statement:
// This would project each element from your currentRow array and set its value to
// either null (if it was empty), the empty string (if it was just single quotes) or
// use its original value.
var output = currentRow.Select(x => x == "" ? null : x == "''" ? "" : x).ToArray();
Would a loop be a better option?
There isn't anything wrong with your current approach, and it doesn't require the creation of an entirely separate array to store your new values (as a projection would). It might not look as concise as a LINQ statement, but it still works and is quite readable (unlike what some LINQ queries can become).

Slow LINQ Performance on DataTable Where Clause?

I'm dumping a table out of MySQL into a DataTable object using MySqlDataAdapter. Database input and output is doing fine, but my application code seems to have a performance issue I was able to track down to a specific LINQ statement.
The goal is simple, search the contents of the DataTable for a column value matching a specific string, just like a traditional WHERE column = 'text' SQL clause.
Simplified code:
foreach (String someValue in someList) {
    String searchCode = OutOfScopeFunction(someValue);
    var results = emoteTable.AsEnumerable()
        .Where(myRow => myRow.Field<String>("code") == searchCode)
        .Take(1);
    if (results.Any()) {
        results.First()["columnname"] = 10;
    }
}
This simplified code is executed thousands of times, once for each entry in someList. When I run Visual Studio Performance Profiler I see that the "results.Any()" line is highlighted as consuming 93.5% of the execution time.
I've tried several different methods for optimizing this code, but none have improved performance while keeping the emoteTable DataTable as the primary source of the data. I can convert emoteTable to Dictionary<String, DataRow> outside of the foreach, but then I have to keep the DataTable and the Dictionary in sync, which while still a performance improvement, feels wrong.
Three questions:
1. Is this the proper way to search for a value in a DataTable (equivalent of a traditional SQL WHERE clause)? If not, how SHOULD it be done?
2. Addendum to 1: regardless of the proper way, what is the fastest (in execution time)?
3. Why does the results.Any() line consume 90%+ of the resources? In this situation it makes more sense that the var results line should consume the resources; after all, it's the line doing the actual search, right?
Thank you for your time. If I find an answer I shall post it here as well.
Any() is taking 90% of the time because the query is only executed when you call Any(); before you call Any(), the query has not actually run.
It would also seem the problem is that you first fetch the entire table into memory and then search it. You should instruct your database to do the search.
Moreover, when you call results.First(), the whole query is executed again.
With deferred execution in mind, you should write something like
var result = emoteTable.AsEnumerable()
.Where(myRow => myRow.Field<String>("code") == searchCode)
.FirstOrDefault();
if (result != null) {
result["columnname"] = 10;
}
What you have implemented is pretty much a join:
var searchCodes = someList.Select(OutOfScopeFunction);
var emotes = emoteTable.AsEnumerable();
var results = Enumerable.Join(emotes, searchCodes,
                              e => e.Field<String>("code"),
                              sc => sc,
                              (e, sc) => e);
foreach (var result in results)
{
    result["columnname"] = 10;
}
Join will probably optimize the access to both lists using some kind of lookup.
But the first thing I would do is to completely abandon the idea of combining DataTable and LINQ. They are two different technologies, and trying to reason about what they might do internally when combined is hard.
Did you try doing raw UPDATE calls? How many items are you expecting to update?
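For completeness, here is a hedged sketch of the dictionary approach the question itself mentions (assuming the "code" values are unique); because the dictionary stores references to the same DataRow objects, updating a row through it updates the DataTable as well:
// Build the lookup once, outside the loop.
Dictionary<String, DataRow> rowsByCode = emoteTable.AsEnumerable()
    .ToDictionary(row => row.Field<String>("code"));

foreach (String someValue in someList)
{
    String searchCode = OutOfScopeFunction(someValue);
    DataRow row;
    if (rowsByCode.TryGetValue(searchCode, out row))
    {
        row["columnname"] = 10;   // same DataRow instance that lives in emoteTable
    }
}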

Optimize XML Reading

I have an event that occurs over 50 000 times. Each time this event occurs, I need to go look into an xml file and grab some data. This xml file has over 20 000 possible values.
Before I needed to read the XML, the other work done for these 50,000 events was very fast. As soon as I implemented the XML read, the whole operation slowed down very significantly.
To read the XML, I am doing this:
XElement xelement = XElement.Load(SomePath);
IEnumerable<XElement> elements = xelement.Elements();
foreach (var element in elements)
{
    if (element.Element("SomeNode").Value == value)
    {
        valueToSet = element.Element("SomeOtherNode").Value;
        break;
    }
}
What can I do to optimize this?
It sounds to me like you should load the file once, preprocess it to a Dictionary<string, String>, and then just perform a lookup in that dictionary each time the event occurs. You haven't given us enough context to say exactly where you'd cache the dictionary, but that's unlikely to be particularly hard.
The loading and converting to a dictionary is really simple:
var dictionary = XElement.Load(SomePath)
.Elements()
.ToDictionary(x => x.Element("SomeNode").Value,
x => x.Element("SomeOtherNode").Value);
That's assuming that all of the secondary elements contain both a SomeNode element and a SomeOtherNode element... if that's not the case, you'd need to do some filtering first.
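A hedged sketch of that filtering, keeping only the elements that have both child elements:
var dictionary = XElement.Load(SomePath)
                         .Elements()
                         .Where(x => x.Element("SomeNode") != null
                                  && x.Element("SomeOtherNode") != null)
                         .ToDictionary(x => x.Element("SomeNode").Value,
                                       x => x.Element("SomeOtherNode").Value);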
If you always need to load the (refreshed?) file each time, you may want to consider XStreamingElement so that you don't have to load the entire document upfront (as XElement.Load does). With XElement.Load, you load the entire XML into memory and then iterate over that; with XStreamingElement, you only bring the part of the document you need into memory. In addition, to then get the first match, use .FirstOrDefault().
If you don't need to constantly refresh the document, consider a caching mechanism like the one Jon Skeet offered.
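If the file does not change while the application runs, a hedged sketch of such a caching mechanism could be as simple as a lazily-initialized field (SomePath is assumed to be available as a static member here):
// Loaded on first use, then reused for every subsequent event.
private static readonly Lazy<Dictionary<string, string>> Cache =
    new Lazy<Dictionary<string, string>>(() =>
        XElement.Load(SomePath)
                .Elements()
                .ToDictionary(x => x.Element("SomeNode").Value,
                              x => x.Element("SomeOtherNode").Value));

// In the event handler:
string valueToSet;
if (Cache.Value.TryGetValue(value, out valueToSet))
{
    // use valueToSet
}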

Which is faster in .NET, .Contains() or .Count()?

I want to compare an array of modified records against a list of records pulled from the database, and delete those records from the database that do not exist in the incoming array. The modified array comes from a client app that maintains the database, and this code runs in a WCF service app, so if the client deletes a record from the array, that record should be deleted from the database. Here's the sample code snippet:
public void UpdateRecords(Record[] recs)
{
    // look for deleted records
    foreach (Record rec in UnitOfWork.Records.ToList())
    {
        var copy = rec;
        if (!recs.Contains(rec)) // use this one?
        if (0 == recs.Count(p => p.Id == copy.Id)) // or this one?
        {
            // if not in the new collection, remove from database
            Record deleted = UnitOfWork.Records.Single(p => p.Id == copy.Id);
            UnitOfWork.Remove(deleted);
        }
    }
    // rest of method code deleted
}
My question: is there a speed advantage (or other advantage) to using the Count method over the Contains method? The Id property is guaranteed to be unique and to identify that particular record, so you don't need to do a bitwise compare, as I assume Contains might do.
Anyone?
Thanks, Dave
This would be faster:
if (!recs.Any(p => p.Id == copy.Id))
This has the same advantages as using Count() - but it also stops after it finds the first match, unlike Count().
You should not even consider Count, since you are only checking for the existence of a record; use Any instead.
Using Count forces iteration of the entire enumerable to get the correct count, while Any stops enumerating as soon as the first matching element is found.
As for the use of Contains, you need to take into consideration whether, for the specified type, reference equality is equivalent to the Id comparison you are performing - which by default it is not.
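A hedged sketch of what overriding equality on Record might look like, so that Contains compares by Id (the Id type is assumed here; the question only says it is unique):
public class Record
{
    public int Id { get; set; }   // assumed type for illustration

    public override bool Equals(object obj)
    {
        var other = obj as Record;
        return other != null && other.Id == Id;
    }

    public override int GetHashCode()
    {
        return Id.GetHashCode();
    }
}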
Assuming Record implements both GetHashCode and Equals properly, I'd use a different approach altogether:
// I'm assuming it's appropriate to pull down all the records from the database
// to start with, as you're already doing it.
foreach (Record recordToDelete in UnitOfWork.Records.ToList().Except(recs))
{
    UnitOfWork.Remove(recordToDelete);
}
Basically there's no need to have an N * M lookup time - the above code will end up building a set of records from recs based on their hash code, and find non-matches rather more efficiently than the original code.
If you've actually got more to do, you could use:
HashSet<Record> recordSet = new HashSet<Record>(recs);
foreach (Record recordFromDb in UnitOfWork.Records.ToList())
{
    if (!recordSet.Contains(recordFromDb))
    {
        UnitOfWork.Remove(recordFromDb);
    }
    else
    {
        // Do other stuff
    }
}
(I'm not quite sure why your original code is refetching the record from the database using Single when you've already got it as rec...)
Contains() is going to use Equals() against your objects. If you have not overridden this method, it's even possible Contains() is returning incorrect results. If you have overridden it to use the object's Id to determine identity, then Count() and Contains() are almost doing the exact same thing - except Contains() will short-circuit as soon as it hits a match, whereas Count() will keep on counting. Any() might be a better choice than both of them.
Do you know for certain this is a bottleneck in your app? It feels like premature optimization to me. Which is the root of all evil, you know :)
Since you're guaranteed that there will be one and only one, Any might be faster, because as soon as it finds a record that matches it will return true.
Count will traverse the entire list counting each occurrence. So if the item is #1 in the list of 1000 items, it's going to check each of the 1000.
EDIT
Also, this might be the time to mention not doing premature optimization.
Wire up both of your methods and put a stopwatch before and after each one.
Create a sufficiently large list (1,000 items or more, depending on your domain) and see which one is faster.
My guess is that we're talking on the order of ms here.
I'm all for writing efficient code, just make sure you're not taking hours to save 5 ms on a method that gets called twice a day.
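A hedged sketch of that kind of micro-benchmark (someRecord is a placeholder for a record you expect to find):
var sw = System.Diagnostics.Stopwatch.StartNew();
bool viaContains = recs.Contains(someRecord);
sw.Stop();
Console.WriteLine("Contains: {0} ms", sw.ElapsedMilliseconds);

sw.Restart();
bool viaAny = recs.Any(p => p.Id == someRecord.Id);
sw.Stop();
Console.WriteLine("Any: {0} ms", sw.ElapsedMilliseconds);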
It would be something like this:
UnitOfWork.Records.RemoveAll(r => !recs.Any(rec => rec.Id == r.Id));
May I suggest an alternative approach that I believe should be faster, since Count would continue even after the first match:
public void UpdateRecords(Record[] recs)
{
    // look for deleted records
    foreach (Record rec in UnitOfWork.Records.ToList())
    {
        var copy = rec;
        if (!recs.Any(x => x.Id == copy.Id))
        {
            // if not in the new collection, remove from database
            Record deleted = UnitOfWork.Records.Single(p => p.Id == copy.Id);
            UnitOfWork.Remove(deleted);
        }
    }
    // rest of method code deleted
}
That way you are sure to break on the first match instead of continuing to count.
If you need to know the actual number of elements, use Count(); it's the only way. If you are checking for the existence of a matching record, use Any() or Contains(). Both are MUCH faster than Count(), and both will perform about the same, but Contains will do an equality check on the entire object while Any() will evaluate a lambda predicate based on the object.
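To summarize the difference in code (a hedged illustration; copy is a record with the Id being looked for):
bool viaContains = recs.Contains(copy);              // uses Record equality (Equals/GetHashCode)
bool viaAny      = recs.Any(p => p.Id == copy.Id);   // evaluates the predicate, stops at the first match
int  matchCount  = recs.Count(p => p.Id == copy.Id); // always enumerates the whole array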
