C# performance issue with the List<T> Count property

The project I am working on requests XML from a web site. The server side constructs the XML. The XML may have many nodes, so the performance is not that good.
I used the Visual Studio 2010 profiler to analyze the performance issue.
I found that the most time-consuming function is System.Collections.Generic.ICollection`1.get_Count(), which is actually the Count property of the generic List. This function is called about 9000 times.
The performance data is shown below:
The Elapsed exclusive time is 4154.14 ms, while the Application exclusive time is just 0.52 ms.
I know the difference between Elapsed exclusive time and Application exclusive time.
Application exclusive time excludes the time spent on things like context switches.
How could a context switch happen when the code just reads the Count property of the generic List?
I am very confused by the profiler data. Can anyone provide some information? Thanks a lot!

Actually the decompiled sources show the following for List<T>:
[__DynamicallyInvokable]
public int Count
{
    [__DynamicallyInvokable, TargetedPatchingOptOut("Performance critical to inline this type of method across NGen image boundaries")]
    get
    {
        return this._size;
    }
}
It's literally returning the value of a field and doing nothing else. I'd suggest your performance hit is elsewhere, or you're misinterpreting your profiler's output.
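One way to cross-check the profiler is to time the calls directly. The following is only a rough sketch: the CountTiming class and its TimeCountCalls method are names made up for this example, and a Stopwatch loop is not a rigorous benchmark. It reads Count through the ICollection<T> interface, which is the method the profiler reported.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical helper, not part of the original question's code.
public static class CountTiming
{
    // Reads Count through the ICollection<T> interface `calls` times and
    // returns the elapsed wall-clock time. The checksum keeps the loop
    // from being optimized away and lets the caller sanity-check the result.
    public static TimeSpan TimeCountCalls(ICollection<int> items, int calls, out long checksum)
    {
        checksum = 0;
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < calls; i++)
        {
            checksum += items.Count; // same interface method the profiler reported
        }
        sw.Stop();
        return sw.Elapsed;
    }
}
```

Calling CountTiming.TimeCountCalls(new List<int>(new int[1000]), 9000, out sum) and printing the returned TimeSpan should show whether 9000 calls can plausibly account for seconds of elapsed time on your machine.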

Related

Visual Studio Performance Analyzer: Line count on open brace

I am trying to assess the performance of a program I'm writing.
I have a method:
public double FooBar(ClassA firstArg, EnumType secondArg)
{
[...]
If I check the Function Details in the VS Performance Analyzer for FooBar, I can see that the method accounts for 14% of the total time (inclusive), and that 10% is spent in the body of the method itself. The thing that I cannot understand is that it looks like 6.5% of the total time (both inclusive and exclusive) is spent on the open brace of this method; it is actually the most time-consuming line in the code (as far as exclusive time is concerned).
The method is not overriding any other method. The profiling is done in the Debug configuration using sampling; the run lasts about 150 s, and that 6.5% corresponds to more than 3000 samples out of a total of 48000.
Can someone explain to me what is happening on this line and whether there is a way to improve that behaviour?
The time shown against the first open curly brace of the method is the time spent on method initialization.
During the method initialization, the local variables are allocated and initialized.
Be aware that all the local variables of the method are initialized before execution starts, even if they are declared in the middle of the body.
To reduce the initialization time, try moving local variables to the heap or, if they are only used sometimes (like variables inside an if branch or after a return), extract the piece of code that uses them into another method.

C# Optimizing a function by pre-loading it

I have a function that is very small, but is called so many times that my profiler marks it as time consuming. It is the following one:
private static XmlElement SerializeElement(XmlDocument doc, String nodeName, String nodeValue)
{
    XmlElement newElement = doc.CreateElement(nodeName);
    newElement.InnerXml = nodeValue;
    return newElement;
}
The second line (where it sets the nodeValue) is the one that takes some time.
The thing is, I don't think it can be optimized code-wise, though I'm still open to suggestions on that part.
However, I remember reading or hearing somewhere that you could tell the compiler to flag this function so that it is loaded into memory when the program starts and runs faster.
Is this just my imagination, or does such a flag exist?
Thanks,
FB.
There are ways you can cause it to be jitted early, but it's not the jit time that's hurting you here.
If you're having performance problems related to Xml serialization, you might consider using XmlWriter rather than XmlDocument, which is fairly heavy. Also, most automatic serialization systems (including the built-in .NET XML Serialization) will emit code dynamically to perform the serialization, which can then be cached and re-used. Most of this has to do with avoiding the overhead of reflection, however, rather than the overhead of the actual XML writing/parsing.
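As a rough illustration of the XmlWriter suggestion, here is a minimal sketch that writes elements straight to a string without building an XmlDocument. The XmlWriterSketch class, WriteItems and its parameters are hypothetical names for this example, and it assumes the values are plain text rather than XML markup (unlike InnerXml, WriteElementString escapes its argument).

```csharp
using System;
using System.Text;
using System.Xml;

// Hypothetical helper, not the asker's code.
public static class XmlWriterSketch
{
    // Writes <root><item>value</item>...</root> directly to a string,
    // without building a DOM tree first.
    public static string WriteItems(string rootName, string itemName, string[] values)
    {
        StringBuilder sb = new StringBuilder();
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.OmitXmlDeclaration = true; // keep the output to the elements only

        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            writer.WriteStartElement(rootName);
            foreach (string value in values)
            {
                // The value is written as text; characters such as '<' are escaped.
                writer.WriteElementString(itemName, value);
            }
            writer.WriteEndElement();
        }
        return sb.ToString();
    }
}
```

If the values really are XML fragments, XmlWriter.WriteRaw exists for that case, but it skips the well-formedness checks, so treat that as a separate decision.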
I don't think this can be solved using any kind of caching or inlining. And I believe it's your imagination, mainly the part about performance. What you have in mind is pre-JIT-ing your code. This technique removes the wait for the JIT compiler when your function is first called. But that only affects the first call; it has no performance effect on subsequent calls.
As the documentation states, setting InnerXml parses the given string as XML. Parsing an XML string can be an expensive operation, especially if the XML in the string is complex. The documentation even has this line:
InnerXml is not an efficient way to modify the DOM. There may be performance issues when replacing complex nodes. It is more efficient to construct nodes and use methods such as InsertBefore, InsertAfter, AppendChild, and RemoveChild to modify the Xml document.
So, if you are creating a complex XML structure this way, it would be wise to build the nodes by hand.
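For the simplest case, building the node by hand could look like the sketch below. It is a variant of SerializeElement that assumes nodeValue is plain text, not markup; if nodeValue actually contains child elements, those would have to be created with CreateElement and AppendChild as well. DomBuildSketch and CreateTextElement are names invented for this example.

```csharp
using System;
using System.Xml;

// Hypothetical variant of the question's SerializeElement.
public static class DomBuildSketch
{
    // Creates <nodeName>nodeValue</nodeName>, treating nodeValue as text.
    public static XmlElement CreateTextElement(XmlDocument doc, string nodeName, string nodeValue)
    {
        XmlElement newElement = doc.CreateElement(nodeName);
        // A text node is attached directly, so no XML parsing of nodeValue takes place.
        newElement.AppendChild(doc.CreateTextNode(nodeValue));
        return newElement;
    }
}
```

Note the behaviour change: with InnerXml a value like "<b>x</b>" becomes a child element, while here it becomes literal text.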

Can Performance Counters track time taken along with a string identifier?

I need to record the time taken by a task and it's been suggested I use Windows performance counters.
I need to record the time taken to solve a given MathProblem. The Solve method's first line will start the Stopwatch and the last line will stop it.
When I record the time taken to solve the problem I need to record the time along with the ProblemID (a string).
Can performance counters be used to record data like this? Will the perfmon graph plot the times along with an identifier, so that when I click or hover over a graph point it will show the ProblemID?
Thanks in advance
public class MathProblem
{
    public string ProblemID;

    public void Solve()
    {
        Stopwatch sw = Stopwatch.StartNew(); // System.Diagnostics.Stopwatch
        // ... solve the problem ...
        sw.Stop();
        // Log to performance counter with ProblemID
    }
}
No, the system performance counters do not work for such a case if you want separate counters per problem ID. While it is true that an instanced category can track separate counters for each instance, and the display can show the counters for each instance and for _Total (an aggregate instance that you create in code, making sure you add every individual instance's values to it as well), this infrastructure is designed for fairly stable instances, the most volatile example being a process. If your ProblemIDs show up and vanish frequently (i.e. the ID is very volatile, changing more often than a few times per hour), tracking this kind of volatility under the perfmon infrastructure is just not going to work. The clients take a snapshot of the instance names and then look for changes in that instance namespace, so if the names are volatile, the clients will track basically nothing: they will track the ProblemIDs that happened to exist at the moment the snapshot was captured, then nothing more, since the snapshotted instances will be gone and no new instance is captured.
Forgive me if this seems like a simplistic approach, but for what it's worth (and based on Remus' information that perfmon will not do what you want), I've always approached this type of problem by writing results to a CSV file. This can then be imported directly into Excel and Excel will analyse and graph the data in whatever manner is most appropriate. Any other spreadsheet program will work, of course.
This way you get sophisticated analysis capabilities without having to write code for anything more than creating a text file; unless you want to write some VBA in Excel to help process the data.
I may have misunderstood your requirements but you could append your results to what is effectively a log file (in a CSV format) and simply add to that each time your program is run, which would have more or less the same effect as creating performance counter datapoints.
If you want to get more sophisticated, you can of course inject results directly into Excel sheets with a more complicated solution.
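A minimal sketch of the CSV approach might look like this. TimingLog, FormatRow and TimeAndLog are hypothetical names, the two-column layout (ProblemID, elapsed milliseconds) is just one possible choice, and File.AppendAllText is assumed to be called from one thread at a time.

```csharp
using System;
using System.Diagnostics;
using System.Globalization;
using System.IO;

// Hypothetical helper for appending timings to a CSV log.
public static class TimingLog
{
    // Formats one CSV row: quoted ProblemID, then elapsed milliseconds.
    public static string FormatRow(string problemId, TimeSpan elapsed)
    {
        // Quote the ID so commas or quotes inside it do not break the row.
        string quotedId = "\"" + problemId.Replace("\"", "\"\"") + "\"";
        // Invariant culture keeps '.' as the decimal separator on every machine.
        string ms = elapsed.TotalMilliseconds.ToString("F3", CultureInfo.InvariantCulture);
        return quotedId + "," + ms;
    }

    // Times `solve` and appends one row to the CSV file at `path`.
    public static void TimeAndLog(string path, string problemId, Action solve)
    {
        Stopwatch sw = Stopwatch.StartNew();
        solve();
        sw.Stop();
        File.AppendAllText(path, FormatRow(problemId, sw.Elapsed) + Environment.NewLine);
    }
}
```

Inside MathProblem.Solve this could be used as TimingLog.TimeAndLog("timings.csv", ProblemID, DoSolve), where DoSolve stands for whatever method does the actual work.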

To cache or not to cache - GetCustomAttributes

I currently have a function:
public static Attribute GetAttribute(MemberInfo Member, Type AttributeType)
{
    Object[] Attributes = Member.GetCustomAttributes(AttributeType, true);
    if (Attributes.Length > 0)
        return (Attribute)Attributes[0];
    else
        return null;
}
I am wondering if it would be worthwhile caching all the attributes on a property in a dictionary of the form
Attribute = _cache[MemberInfo][Type]
This would require using GetCustomAttributes without any type parameter and then enumerating over the result. Is it worth it?
You will get more bang for your buck if you replace the body of your method with this:
return Attribute.GetCustomAttribute(Member, AttributeType,false); // only look in the current member and don't go up the inheritance tree.
If you really need to cache on a type-basis:
public static class MyCacheFor<T>
{
    static MyCacheFor()
    {
        // grab the data
        Value = ExtractExpensiveData(typeof(T));
    }

    public static readonly MyExpensiveToExtractData Value;

    private static MyExpensiveToExtractData ExtractExpensiveData(Type type)
    {
        // ...
    }
}
Beats dictionary lookups every time. Plus it's thread-safe :)
Cheers,
Florian
PS: Depends how often you call this. I had some cases where doing a lot of serialization using reflection really called for caching, as usual, you want to measure the performance gain versus the memory usage increase. Instrument your memory use and profile your CPU time.
The only way you can know for sure, is to profile it. I am sorry if this sounds like a cliche. But the reason why a saying is a cliche is often because it's true.
Caching the attribute is actually making the code more complex, and more error prone. So you might want to take this into account-- your development time-- before you decide.
So like optimization, don't do it unless you have to.
In my experience (I am talking about an AutoCAD-like Windows application, with a lot of click-edit GUI operations and heavy number crunching), reading custom attributes has never, not even once, been the performance bottleneck.
I just had a scenario where GetCustomAttributes turned out to be the performance bottleneck. In my case it was getting called hundreds of thousands of times in a dataset with many rows and this made the problem easy to isolate. Caching the attributes solved the problem.
Preliminary testing led to a barely noticeable performance hit at about 5000 calls on a modern day machine. (And it became drastically more noticeable as the dataset size increased.)
I generally agree with the other answers about premature optimization, however, on a scale of CPU instruction to DB call, I'd suggest that GetCustomAttributes would lean more towards the latter.
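For reference, a cache along those lines could be sketched as follows. This is an illustration under assumptions, not the code from that scenario: AttributeCache is a made-up name, the key is a (MemberInfo, Type) pair, and the lookup keeps the question's behaviour of returning the first match or null. Whether it pays off still has to be measured.

```csharp
using System;
using System.Collections.Concurrent;
using System.Reflection;

// Hypothetical cache around the question's GetAttribute logic.
public static class AttributeCache
{
    // Key: (member, attribute type). The cached value is null when the
    // attribute is absent, so misses are remembered too.
    private static readonly ConcurrentDictionary<Tuple<MemberInfo, Type>, Attribute> Cache =
        new ConcurrentDictionary<Tuple<MemberInfo, Type>, Attribute>();

    public static Attribute GetAttribute(MemberInfo member, Type attributeType)
    {
        return Cache.GetOrAdd(
            Tuple.Create(member, attributeType),
            key =>
            {
                // Same lookup as the original method: first match or null.
                object[] found = key.Item1.GetCustomAttributes(key.Item2, true);
                return found.Length > 0 ? (Attribute)found[0] : null;
            });
    }
}
```

The trade-off mentioned elsewhere in this thread applies: every distinct (member, type) pair stays in memory for the life of the process, and callers now share one attribute instance instead of getting a fresh one per call.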
Your question is a case of premature optimization.
You don't know the inner workings of the reflection classes and therefore are making assumptions about the performance implications of calling GetCustomAttributes multiple times. The method itself could well cache its output already, meaning your code would actually add overhead with no performance improvement.
Save your brain cycles for thinking about things which you already know are problems!
Old question, but GetCustomAttributes is costly/heavyweight.
Using a cache can be a good idea if it is causing performance problems.
The article I linked (Dodge Common Performance Pitfalls to Craft Speedy Applications) was taken down, but here is a link to an archived version:
https://web.archive.org/web/20150118044646/http://msdn.microsoft.com:80/en-us/magazine/cc163759.aspx
Are you actually having a performance problem? If not then don't do it until you need it.
It might help, depending on how often you call the method with the same parameters. If you only call it once per (MemberInfo, Type) combination then it won't do any good. Even if you do cache it, you are trading speed for memory consumption. That might be fine for your application.

Lockless list help!

Hi, I'm trying to write a lockless list. I think I got the adding part working, but the code that extracts objects from the list does not work too well :(
Well, the list is not a normal list. I have the interface IWorkItem:
interface IWorkItem
{
    DateTime ExecuteTime { get; }
    bool Cancelled { get; }
    void Execute(DateTime now);
}
and I have a list where I can add these :P The idea is that when I run Get() on the list, it should loop through it until it finds an IWorkItem for which
if (item.ExecuteTime < DateTime.Now)
holds, then remove it from the list and return it.
I have run tests with many threads on my dual-core CPU. Add seems to work (it has never failed so far), but the Get function loses some work items somewhere, and I have no idea what's wrong.
PS: if I get this working, anyone is free to use the code :) Well, you are anyway, but I don't see the point while it's bugged :P
The code is here http://www.easy-share.com/1903474734/LinkedList.zip and if you try to run it you will see that it is sometimes not able to get as many work items as it put into the list.
Edit: I have got a lockless list working. It was faster than using the lock(obj) statement, but I have a lock object that uses Interlocked that was still outperforming the lockless list. I'm going to try to make a lockless ArrayList and see if I get the same results there. When I'm done, I'll upload the result here.
The problem is your algorithm: Consider this sequence of events:
Thread 1 calls list.Add(workItem1), which completes fully.
Status is:
first=workItem1, workItem1.next = null
Then thread 1 calls list.Add(workItem2) and reaches the spot right before the second Replace (where you have the comment "//lets try").
Status is:
first=workItem1, workItem1.next = null, nextItem=workItem1
At this point thread 2 takes over and calls list.Get(). Assume workItem1's executionTime is now, so the call succeeds and returns workItem1.
After this status is:
first = null, workItem1.next = null
(and in the other thread, nextItem is still workItem1).
Now we get back to the first thread, and it completes the Add() by setting workItem1.next:=workItem2.
If we call list.Get() now, we will get null, even though the Add() completed successfully.
You should probably look up a real peer-reviewed lock-free linked list algorithm. I think the standard one is this by John Valois. There is a C++ implementation here. This article on lock-free priority queues might also be of use.
You can use a timestamping protocol for datastructures just fine, mirroring this example from the database world:
Concurrency
But be clear that each item needs both a read and write timestamp, and be sure to follow the rules of the algorithm clearly.
There are some additional difficulties of implementing this on a linked list though, I think. The database example would be fine for a vector where you know the array index of what you want. However, in a linked list, you may need to walk down the pointers -- and the structure of the list could change while you are searching! I guess you could solve that by some sort of nuance (or if you just want to traverse the "new" list as it is, do nothing), but it poses a problem. Try to solve it without introducing some rollback condition that makes it worse than locking the list!
So are you sure that it needs to be lockless? Depending on your work load the non-blocking solution can sometimes be slower. Check out this MSDN article for a little more. Also proving that a lockless data structure is correct can be very difficult.
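For comparison, a plain locked version is short and easy to reason about, which makes it a useful baseline to measure any lockless attempt against. The sketch below is an assumption-laden illustration, not the asker's code: LockedWorkList and SampleWorkItem are made-up names, and Get takes the current time as a parameter (instead of reading DateTime.Now) so it can be tested.

```csharp
using System;
using System.Collections.Generic;

public interface IWorkItem
{
    DateTime ExecuteTime { get; }
    bool Cancelled { get; }
    void Execute(DateTime now);
}

// Hypothetical minimal work item, only here to make the sketch usable.
public sealed class SampleWorkItem : IWorkItem
{
    public SampleWorkItem(DateTime executeTime) { ExecuteTime = executeTime; }
    public DateTime ExecuteTime { get; private set; }
    public bool Cancelled { get { return false; } }
    public void Execute(DateTime now) { }
}

// Lock-based baseline: one monitor guards every access to the list.
public sealed class LockedWorkList
{
    private readonly object _gate = new object();
    private readonly LinkedList<IWorkItem> _items = new LinkedList<IWorkItem>();

    public void Add(IWorkItem item)
    {
        lock (_gate) { _items.AddLast(item); }
    }

    // Removes and returns the first item whose ExecuteTime is before `now`,
    // or null when no item is due.
    public IWorkItem Get(DateTime now)
    {
        lock (_gate)
        {
            for (LinkedListNode<IWorkItem> node = _items.First; node != null; node = node.Next)
            {
                IWorkItem item = node.Value;
                if (item.ExecuteTime < now)
                {
                    _items.Remove(node); // safe: we return right after removing
                    return item;
                }
            }
            return null;
        }
    }
}
```

Because Add and Get both hold the same lock, the interleaving described in the earlier answer (an Add completing after the node it links to was already removed) cannot occur here.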
I am in no way an expert on the subject, but as far as I can see, you need to either make the ExecuteTime field in the implementation of IWorkItem volatile (of course it might already be that) or insert a memory barrier either after you set the ExecuteTime or before you read it.
