I have a performance problem on certain computers with the following query:
System.Diagnostics.EventLog log = new System.Diagnostics.EventLog("Application");
var entries = log.Entries
.Cast<System.Diagnostics.EventLogEntry>()
.Where(x => x.EntryType == System.Diagnostics.EventLogEntryType.Error)
.OrderByDescending(x => x.TimeGenerated)
.Take(cutoff)
.Select(x => new
{
x.Index,
x.TimeGenerated,
x.EntryType,
x.Source,
x.InstanceId,
x.Message
}).ToList();
Apparently ToList() can be quite slow in certain queries, but what should I replace it with?
The log.Entries collection works like this: it knows the total number of events (log.Entries.Count), and when you access an individual element it issues a query to fetch that element.
That means that when you enumerate the whole Entries collection, it queries for each individual element, so there will be Count queries. And the structure of your LINQ query (the OrderBy, for example) forces full enumeration of that collection. As you already know, that is very inefficient.
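Conceptually, enumerating Entries behaves like the sketch below (hypothetical code, just to illustrate the access pattern):
var log = new System.Diagnostics.EventLog("Application");
int count = log.Entries.Count;      // one call to get the total
for (int i = 0; i < count; i++)
{
    var entry = log.Entries[i];     // each index access fetches that single entry from the event log
    // the Where/OrderBy in your query force this to happen for every entry
}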
Much more efficient might be to query only the log entries you need. For that you can use the EventLogQuery class. Suppose you have a simple class to hold event info details:
private class EventLogInfo {
public int Id { get; set; }
public string Source { get; set; }
public string Message { get; set; }
public DateTime? Timestamp { get; set; }
}
Then you can convert your inefficient LINQ query like this:
// query Application log, only entries with Level = 2 (that's error)
var query = new EventLogQuery("Application", PathType.LogName, "*[System/Level=2]");
// reverse default sort, by default it sorts oldest first
// but we need newest first (OrderByDescending(x => x.TimeGenerated))
query.ReverseDirection = true;
var events = new List<EventLogInfo>();
// analog of Take
int cutoff = 100;
using (var reader = new EventLogReader(query)) {
while (true) {
using (var next = reader.ReadEvent()) {
if (next == null)
// we are done, no more events
break;
events.Add(new EventLogInfo {
Id = next.Id,
Source = next.ProviderName,
Timestamp = next.TimeCreated,
Message = next.FormatDescription()
});
cutoff--;
if (cutoff == 0)
// we are done, took as much as we need
break;
}
}
}
It should be 10-100 times faster. However, this API is lower-level and returns instances of EventRecord (not EventLogEntry), so some information has to be obtained differently than it would be from EventLogEntry.
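For example, severity is exposed differently: EventLogEntry has an EntryType enum, while EventRecord exposes a numeric Level plus a LevelDisplayName string. A minimal sketch, matching the Level=2 filter used above:
using System.Diagnostics.Eventing.Reader;

static bool IsError(EventRecord record)
{
    // 2 = Error, the same value used in the "*[System/Level=2]" filter above
    return record.Level.HasValue && record.Level.Value == 2;
}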
If you decide that you absolutely must use log.Entries and EventLogEntry, then at least enumerate Entries backwards. That's because the newest events are at the end (it's sorted by timestamp ascending) and you need the top X errors by timestamp descending.
EventLog log = new System.Diagnostics.EventLog("Application");
int cutoff = 100;
var events = new List<EventLogEntry>();
for (int i = log.Entries.Count - 1; i >= 0; i--) {
// note that line below might throw ArgumentException
// if, for example, entries were deleted in the middle
// of our loop. That's a rare condition, but robust code should handle it
var next = log.Entries[i];
if (next.EntryType == EventLogEntryType.Error) {
// add what you need here
events.Add(next);
// got as much as we need, break
if (events.Count == cutoff)
break;
}
}
That is less efficient, but should still be about 10 times faster than your current approach. It's faster because the Entries collection is not materialized in memory: individual elements are queried only when you access them, and by enumerating backwards in your specific case there is a high chance you will query far fewer elements.
I am working with an XML standard called SDMX. It's fairly complicated, but I'll keep it as short as possible. I am receiving an object called CategoryScheme. This object can contain a number of Category objects, each Category can contain more Category objects, and so on; the nesting can be arbitrarily deep. Every Category has a unique ID.
Usually each Category contains a lot of Categories. Together with this object I receive an array containing the list of IDs that indicates where a specific Category is nested, and then I receive the ID of that Category.
What I need to do is create an object that maintains the hierarchy of the Category objects, but each Category must have only one child, and that child has to be the one on the path that leads to the specific Category.
So I had an idea, but to implement it I would have to generate LINQ queries inside a loop, and I have no clue how to do that. More information about what I wanted to try is commented inside the code.
Let's go to the code:
public void RemoveCategory(ArtefactIdentity ArtIdentity, string CategoryID, string CategoryTree)
{
try
{
WSModel wsModel = new WSModel();
// Prepare Art Identity and Array
ArtIdentity.Version = ArtIdentity.Version.Replace("_", ".");
var CatTree = JArray.Parse(CategoryTree).Reverse();
// Get Category Scheme
ISdmxObjects SdmxObj = wsModel.GetCategoryScheme(ArtIdentity, false, false);
ICategorySchemeMutableObject CatSchemeObj = SdmxObj.CategorySchemes.FirstOrDefault().MutableInstance;
foreach (var Cat in CatTree)
{
// The cycle should work like this.
// At every iteration it must delete all the elements except the correct one
// and on the next iteration it must delete all the elements of the previously selected element
// At the end, I need to have the CatSchemeObj full of the all chains of categories.
// Iteration 1...
//CatSchemeObj.Items.ToList().RemoveAll(x => x.Id != Cat.ToString());
// Iteration 2...
//CatSchemeObj.Items.ToList().SingleOrDefault().Items.ToList().RemoveAll(x => x.Id != Cat.ToString());
// Iteration 3...
//CatSchemeObj.Items.ToList().SingleOrDefault().Items.ToList().SingleOrDefault().Items.ToList().RemoveAll(x => x.Id != Cat.ToString());
// Etc...
}
}
catch (Exception ex)
{
throw ex;
}
}
Thank you for your help.
So, as I already said in my comment, building a recursive function should fix the issue. If you're new to it, you can find some basic information about recursion in C# here.
The method could look something like this:
private void DeleteRecursively(int currentRecursionLevel, string[] catTree, ICategorySchemeMutableObject catSchemeObj)
{
catSchemeObj.Items.ToList().RemoveAll(x => x.Id != catTree[currentRecursionLevel].ToString());
var leftoverObject = catSchemeObj.Items.ToList().SingleOrDefault();
if(leftoverObject != null) DeleteRecursively(++currentRecursionLevel, catTree, leftoverObject);
}
Afterwards you can call this method in your main method, instead of the loop:
DeleteRecursively(0, CatTree, CatSchemeObject);
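Note that CatTree in the question comes from JArray.Parse(...).Reverse() and is therefore a sequence of JTokens, while the method above takes a string[]. Assuming that, you would first need a small conversion along these lines:
// hypothetical conversion of the JTokens to plain strings before calling the method
string[] catTreeArray = CatTree.Select(t => t.ToString()).ToArray();
DeleteRecursively(0, catTreeArray, CatSchemeObj);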
But as I also said, keep in mind that calling the method in a loop seems pointless to me, because you have already cleared the tree down to the one leftover path, so calling the method again with the same tree but another category will result in an empty tree (in CatSchemeObject).
CAUTION! Another thing I noticed just now: calling ToList() on your Items property and then deleting entries will NOT affect your source object, because ToList generates a new list. It keeps references to the original objects, but a deletion only affects the new list. So you must write the resulting list back to your Items property, or find a way to delete directly in the Items object. (Assuming it's an IEnumerable and not a concrete collection type, you should write it back.)
Just try it out with this simple example, and you will see that the original list is not modified.
IEnumerable<int> test = new List<int>() { 1, 2, 3, 4 , 1 };
test.ToList().RemoveAll(a => a != 1);
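One way to keep the result is to materialize the list once, modify that list, and then use (or write back) that list, for example:
List<int> filtered = test.ToList();
filtered.RemoveAll(a => a != 1);
// 'filtered' now contains { 1, 1 }; the original 'test' sequence is unchanged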
Edited:
So here is another possible way to go about it, after the discussion below.
I'm not sure exactly what you need, so just try it out.
int counter = 0;
var list = CatSchemeObj.Items.ToList();
// check before you use it or you will get an error
if (list != null && list.Count > 0)
{
    while (true)
    {
        // or != ? play with it
        var current = list.SingleOrDefault(x => CatTree[counter++].ToString() == x.Id);
        if (current == null)
        {
            break;
        }
        list = current.Items.ToList();
    }
}
I just translated your problem into two solutions, but I am not sure whether you might lose data because of the SingleOrDefault call. It means 'take the single item, or the default value if the sequence is empty' (and it throws if there is more than one). I know you said you only ever have one item, but still... :)
Let me know in a comment whether this worked for you.
//solution 1
// inside of this loop check each child list if empty or not
foreach (var Cat in CatTree)
{
    var list = CatSchemeObj.Items.ToList();
    // check before you use it or you will get an error
    if (list != null && list.Count > 0)
    {
        while (true)
        {
            list.RemoveAll(x => x.Id != Cat.ToString());
            var next = list.SingleOrDefault();
            if (next == null)
            {
                break;
            }
            list = next.Items.ToList();
        }
    }
}
//solution 2
foreach (var Cat in CatTree)
{
    var list = CatSchemeObj.Items.ToList();
    // check before you use it or you will get an error
    if (list != null && list.Count > 0)
    {
        CleanTheCat(Cat.ToString(), list);
    }
}

// use this recursive function outside of the loop because it calls itself
void CleanTheCat(string cat, List<ICategoryMutableObject> items /* place here whatever item type your Items property actually contains */)
{
    items.RemoveAll(x => x.Id != cat);
    var catObj = items.SingleOrDefault();
    if (catObj != null)
    {
        CleanTheCat(cat, catObj.Items.ToList());
    }
}
Thank you to whoever tried to help, but I solved it by myself in a much easier way.
I just sent the full CategoryScheme object to the method that converts it to XML format, and then just one line did the trick:
XmlDocument.Descendants("Category").Where(x => !CatList.Contains(x.Attribute("id").Value)).RemoveIfExists();
I've simplified things as much as possible. This is reading from a table that has around 3,000,000 rows. I want to create a Dictionary from some concatenated fields of the data.
Here's the code that, in my opinion, should never, ever throw an Out Of Memory Exception:
public int StupidFunction()
{
var context = GetContext();
int skip = 0;
int take = 100000;
var batch = context.VarsHG19.OrderBy(v => v.Id).Skip(skip).Take(take);
while (batch.Any())
{
batch.ToList();
skip += take;
batch = context.VarsHG19.OrderBy(v => v.Id).Skip(skip).Take(take);
}
return 1;
}
In my opinion, the batch object should simply be replaced on each iteration, and the memory allocated for the previous batch should be garbage collected. I would expect the loop in this function to use a nearly constant amount of memory; at worst it should be bounded by the memory needed for one row * 100,000. The maximum size of a row in this table is 540 bytes. I removed the navigation properties from the edmx.
You can turn off tracking using AsNoTracking. Why not use a foreach loop on a filtered IEnumerable from the DbSet instead? You can also help by returning only what you need, using an anonymous type with Select(). – Igor
Thanks for the answer, Igor.
public int StupidFunction()
{
var context = GetContext();
int skip = 0;
int take = 100000;
var batch = context.VarsHG19.AsNoTracking().OrderBy(v => v.Id).Skip(skip).Take(take);
while (batch.Any())
{
batch.ToList();
skip += take;
batch = context.VarsHG19.AsNoTracking().OrderBy(v => v.Id).Skip(skip).Take(take);
}
return 1;
}
No Out of Memory Exception.
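For completeness, here is a rough sketch of the streaming approach from Igor's comment: no Skip/Take batching, no tracking, and only the needed columns projected (the actual columns of VarsHG19 aren't shown in the question, so the projection below is an assumption):
var context = GetContext();
var rows = context.VarsHG19
    .AsNoTracking()
    .OrderBy(v => v.Id)
    .Select(v => new { v.Id });   // project only the columns you actually need
foreach (var row in rows)
{
    // build your dictionary entry here; rows are streamed rather than all loaded up front
}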
You are not assigning the query's results to anything, so how would the runtime know what can be collected to free up memory?
batch is just a query and does not contain anything by itself. Only once you call .ToList() is the query executed and the records returned.
public int StupidFunction()
{
var context = GetContext();
int skip = 0;
int take = 100000;
var batch = context.VarsHG19.OrderBy(v => v.Id).Skip(skip).Take(take).ToList();
while (batch.Any())
{
skip += take;
batch = context.VarsHG19.OrderBy(v => v.Id).Skip(skip).Take(take).ToList();
}
return 1;
}
I have the following code returning 22,000,000 records from the database pretty quickly:
var records = from row in dataContext.LogicalMapTable
select
new
{
row.FwId,
row.LpDefId,
row.FlDefMapID
};
The code following the database call above takes over 60 seconds to run:
var cache = new Dictionary<string, int>();
foreach (var record in records)
{
var tempHashCode = record.FwId + "." + record.LpDefId;
cache.Add(tempHashCode, record.FlDefMapID);
}
return cache;
Is there a better way to do this to improve performance?
The second part of your code is not slow. It just triggers the LINQ query evaluation; you can see this by consuming your query earlier, for example:
var records = (from row in dataContext.LogicalMapTable
select
new
{
row.FwId,
row.LpDefId,
row.FlDefMapID
}).ToList();
So it is your LINQ query that is slow, and here is how you can fix it.
You probably don't need 22M records cached in memory. Things you can try:
Pagination (Take, Skip); a rough sketch follows this list
Change queries to include specific ids or other columns. E.g. before select * ..., after select * ... where id in (1,2,3) ...
Do most of the analytic work at database, it's fast and doesn't take up your app memory
Prefer queries that bring small data batches fast. You can run several of these concurrently to update different bits of your UI
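As a rough sketch of the pagination idea (ordering by FlDefMapID is an assumption here; Skip/Take needs some stable ordering):
const int pageSize = 100000;
for (int skip = 0; ; skip += pageSize)
{
    var page = dataContext.LogicalMapTable
        .OrderBy(row => row.FlDefMapID)
        .Skip(skip)
        .Take(pageSize)
        .Select(row => new { row.FwId, row.LpDefId, row.FlDefMapID })
        .ToList();
    if (page.Count == 0)
        break;
    // build up part of the dictionary from this page, then let the page go out of scope
}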
As others have mentioned in comments, reading the entire list like that is very inefficient.
Based on the code you posted, I am assuming that after the list is loaded into your cache, you look up the FirmwareLogicalDefinitionMapID using the key FirmwareVersionID + "." + LogicalParameterDefinitionID.
My suggestion to improve overall performance and memory usage is to implement an actual caching pattern, something like this:
// requires a reference to System.Runtime.Caching and "using System.Runtime.Caching;"
public static class CacheHelper
{
public static readonly object _SyncLock = new object();
public static readonly MemoryCache _MemoryCache = MemoryCache.Default;
public static int GetFirmwareLogicalDefinitionMapID(int firmwareVersionID, int logicalParameterDefinitionID)
{
int result = -1;
// Build up the cache key
string cacheKey = string.Format("{0}.{1}", firmwareVersionID, logicalParameterDefinitionID);
// Check if the object is in the cache already
if(_MemoryCache.Contains(cacheKey))
{
// It is, so read it and type cast it
object cacheObject = _MemoryCache[cacheKey];
if(cacheObject is int)
{
result = (int)cacheObject;
}
}
else
{
// The object is not in cache, acquire a sync lock for thread safety
lock(_SyncLock)
{
// Double check that the object hasn't been put into the cache by another thread.
if(!_MemoryCache.Contains(cacheKey))
{
// Still not there, now Query the database
result = (from i in dataContext.LogicalMapTable
where i.FwId == firmwareVersionID && i.LpDefId == logicalParameterDefinitionID
select i.FlDefMapID).FirstOrDefault();
// Add the results to the cache so that the next operation that asks for this object can read it from ram
_MemoryCache.Add(new CacheItem(cacheKey, result), new CacheItemPolicy() { SlidingExpiration = new TimeSpan(0, 5, 0) });
}
else
{
// we lost a concurrency race to read the object from source; it's in the cache now, so read it from there
object cacheObject = _MemoryCache[cacheKey];
if(cacheObject is int)
{
result = (int)cacheObject;
}
}
}
}
// return the results
return result;
}
}
You should also read up on the .Net MemoryCache: http://www.codeproject.com/Articles/290935/Using-MemoryCache-in-Net-4-0
Hope this helps!
Trying to speed up iterating through two foreach loops; at the moment it takes about 15 seconds.
foreach (var prodCost in Settings.ProdCostsAndQtys)
{
foreach (var simplified in Settings.SimplifiedPricing
.Where(simplified => prodCost.improd.Equals(simplified.PPPROD) &&
prodCost.pplist.Equals(simplified.PPLIST)))
{
prodCost.pricecur = simplified.PPP01;
prodCost.priceeur = simplified.PPP01;
}
}
Basically the ProdCostsAndQtys list is a list of objects with 5 properties; the size of the list is 798,677.
The SimplifiedPricing list is a list of objects with 44 properties; the size of this list is 347, but it is more than likely going to get a lot bigger (hence wanting to get the best performance now).
The outer loop iterates through all the objects in the first list; in the inner loop, if the two conditions match, the two properties on the object from the first list are replaced with values from the matching object in the second list.
It seems that SimplifiedPricing is a smaller lookup list and the outer loop iterates over a larger list. It looks to me as if the main source of delay is the Equals check performed for every item in the smaller list against every item in the larger list. Also, when you have several matches, you update the value in the larger list each time, so updating multiple times looks redundant.
Considering this, I would suggest building a Dictionary for the items in the smaller list, increasing memory consumption but drastically speeding up lookup times. First we need something to hold the key of this dictionary. I will assume that improd and pplist are integers, but it does not matter for this case:
public struct MyKey
{
public readonly int Improd;
public readonly int Pplist;
public MyKey(int improd, int pplist)
{
Improd = improd;
Pplist = pplist;
}
public override int GetHashCode()
{
return Improd.GetHashCode() ^ Pplist.GetHashCode();
}
public override bool Equals(object obj)
{
if (!(obj is MyKey)) return false;
var other = (MyKey)obj;
return other.Improd.Equals(this.Improd) && other.Pplist.Equals(this.Pplist);
}
}
Now that we have something that compares Pplist and Improd in one go, we can use it as a key for a dictionary containing the SimplifiedPricing.
IReadOnlyDictionary<MyKey, SimplifiedPricing> simplifiedPricingLookup =
(from sp in Settings.SimplifiedPricing
group sp by new MyKey(sp.PPPROD, sp.PPLIST) into g
select new {key = g.Key, value = g.Last()}).ToDictionary(o => o.key, o => o.value);
Notice the IReadOnlyDictionary. This is to show our intent of not modifying this dictionary after its creation, allowing us to safely parallelize the main loop:
Parallel.ForEach(Settings.ProdCostsAndQtys, c =>
{
SimplifiedPricing value;
if (simplifiedPricingLookup.TryGetValue(new MyKey(c.improd, c.pplist), out value))
{
c.pricecur = value.PPP01;
c.priceeur = value.PPP01;
}
});
This should change your single-threaded O(n²) loop to a parallelized O(n) loop, with a slight overhead for creating the simplifiedPricingLookup dictionary.
A join should be more efficient:
var toUpdate = from pc in Settings.ProdCostsAndQtys
join s in Settings.SimplifiedPricing
on new { prod=pc.improd, list=pc.pplist } equals new { prod=s.PPPROD, list=s.PPLIST }
select new { prodCost = pc, simplified = s };
foreach (var pcs in toUpdate)
{
pcs.prodCost.pricecur = pcs.simplified.PPP01;
pcs.prodCost.priceeur = pcs.simplified.PPP01;
}
You could make use of multiple threads with Parallel.ForEach:
Parallel.ForEach(Settings.ProdCostsAndQtys, prodCost =>
{
    foreach (var simplified in Settings.SimplifiedPricing
        .Where(simplified =>
            prodCost.improd.Equals(simplified.PPPROD) &&
            prodCost.pplist.Equals(simplified.PPLIST)))
    {
        prodCost.pricecur = simplified.PPP01;
        prodCost.priceeur = simplified.PPP01;
    }
});
However, this only applies if you have the lists in memory. There are far more efficient mechanisms for updating the data in the database itself. Also, using a LINQ join might make the code more readable at negligible performance cost.
I'm fetching data from all 3 tables at once to avoid network latency. Fetching the data is pretty fast, but when I loop through the results a lot of time is used:
Int32[] arr = { 1 };
var query = from a in arr
select new
{
Basket = from b in ent.Basket
where b.SUPERBASKETID == parentId
select new
{
Basket = b,
ObjectTypeId = 0,
firstObjectId = "-1",
},
BasketImage = from b in ent.Image
where b.BASKETID == parentId
select new
{
Image = b,
ObjectTypeId = 1,
CheckedOutBy = b.CHECKEDOUTBY,
firstObjectId = b.FIRSTOBJECTID,
ParentBasket = (from parentBasket in ent.Basket
where parentBasket.ID == b.BASKETID
select parentBasket).ToList()[0],
},
BasketFile = from b in ent.BasketFile
where b.BASKETID == parentId
select new
{
BasketFile = b,
ObjectTypeId = 2,
CheckedOutBy = b.CHECKEDOUTBY,
firstObjectId = b.FIRSTOBJECTID,
ParentBasket = (from parentBasket in ent.Basket
where parentBasket.ID == b.BASKETID
select parentBasket),
}
};
//Exception handling
var mixedElements = query.First();
ICollection<BasketItem> basketItems = new Collection<BasketItem>();
//Here 15 millis has been used
//only 6 elements were found
if (mixedElements.Basket.Count() > 0)
{
foreach (var mixedBasket in mixedElements.Basket){}
}
if (mixedElements.BasketFile.Count() > 0)
{
foreach (var mixedBasketFile in mixedElements.BasketFile){}
}
if (mixedElements.BasketImage.Count() > 0)
{
foreach (var mixedBasketImage in mixedElements.BasketImage){}
}
//the empty loops take 811 millis!!
Why are you bothering to check the counts before the foreach statements? If there are no results, the foreach will just finish immediately.
Your queries are actually all being deferred - they'll be executed as and when you ask for the data. Don't forget that your outermost query is a LINQ to Objects query: it's just returning the result of calling ent.Basket.Where(...).Select(...) etc... which doesn't actually execute the query.
Your plan to do all three queries in one go isn't actually working. However, by asking for the count separately, you may actually be executing each database query twice - once just getting the count and once for the results.
I strongly suggest that you get rid of the "optimizations" in this code which are making it much more complicated and slower than just writing the simplest code you can.
I don't know of any way of getting LINQ to SQL (or LINQ to EF) to execute multiple queries in a single call - but this approach certainly isn't going to do it.
One other minor hint which is irrelevant in this case, but can be useful in LINQ to Objects - if you want to find out whether there's any data in a collection, just use Any() instead of Count() > 0 - that way it can stop as soon as it's found anything.
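Putting those suggestions together, a minimal sketch of the simplified version might look like this (the loop bodies stand in for whatever per-item work you need; foreach simply does nothing when a sequence is empty):
var mixedElements = query.First();

foreach (var basket in mixedElements.Basket)
{
    // handle each basket
}
foreach (var file in mixedElements.BasketFile)
{
    // handle each basket file
}
foreach (var image in mixedElements.BasketImage)
{
    // handle each basket image
}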
You're using IEnumerable in the foreach loop. Implementations only have to prepare data when it's asked for. In this way, I'd suggest that the above code is accessing your data lazily -- that is, only when you enumerate the items (which actually happens when you call Count().)
Put a System.Diagnostics.Stopwatch around the call to Count() and see whether that's taking the bulk of the time you're seeing.
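For example, a hypothetical timing sketch:
var sw = System.Diagnostics.Stopwatch.StartNew();
int basketCount = mixedElements.Basket.Count();   // this is where the query actually runs
sw.Stop();
Console.WriteLine("Count() took {0} ms", sw.ElapsedMilliseconds);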
I can't comment further here because you don't specify the type of ent in your code sample.