cast to IEnumerable results in performance hit

cast to IEnumerable results in performance hit - c#

While analyzing an performance hit in our Windows CE based software, I stumbled upon a very
fascinating mystery:
Look at these two methods:
void Method1(List<int> list)
{
foreach (var item in list)
{
if (item == 2000)
break;
}
}
void Method2(List<int> list)
{
foreach (var item in (IEnumerable<int>)list)
{
if (item == 2000)
break;
}
}
void StartTest()
{
var list = new List<int>();
for (var i = 0; i < 3000; i++)
list.Add(i);
StartMeasurement();
Method1(list);
StopMeasurement(); // 1 ms
StartMeasurement();
Method2(list);
StopMeasurement(); // 721 ms
}
void StartMeasurement()
{
_currentStartTime = Environment.TickCount;
}
void StopMeasurement()
{
var time = Environment.TickCount - _currentStartTime;
Debug.WriteLine(time);
}
Method1 takes 1 ms to run. Method2 needs nearly 700 ms !
Don't try to reproduce this performance hit: it won't appear in a normal program on PC.
Unfortunately we can reproduce it in our software on our smart devices very reliably. The program runs on Compact Framework 3.5, Windows Forms, Windows CE 6.0.
The measurement uses Environment.TickCount.
As it is quite clear that there must be a strange bug in our software which slows down the enumerator, I simply can't imagine what kind of error could let the List class slow down, only if the iteration is using the IEnumerable interface of List.
And one more hint: After opening and closing a modal dialog (Windows Forms), suddenly both methods take the same time: 1 ms.

You need to run tests several times, because in a single run the CPU might get suspended, etc. It is for instance possible that while running method2, you are moving with your mouse resulting in the OS temporary letting the mouse driver run, etc. Or a network package arrives, or the timer says it is time to let another application run,... In other words there are a lot of reasons why all of a sudden your program stops running for a few milliseconds.
If I run the following program (note that using DateTime's, etc.) is not recommended:
var list = new List<int>();
for (var i = 0; i < 3000; i++)
list.Add(i);
DateTime t0 = DateTime.Now;
for(int i = 0; i < 50000; i++) {
Method1(list);
}
DateTime t1 = DateTime.Now;
for(int i = 0; i < 50000; i++) {
Method2(list);
}
DateTime t2 = DateTime.Now;
Console.WriteLine(t1-t0);
Console.WriteLine(t2-t1);
I get:
00:00:00.6522770 (method1)
00:00:01.2461630 (method2)
Swapping the order of testing results in:
00:00:01.1278890 (method2)
00:00:00.5473190 (method1)
So it's only 100% slower. Furthermore the performance of the first method (method1) is probably a bit better since for method1, the JIT compiler will first need to translate the code to machine instructions. In other words, the first method calls you do tend to be a bit slower than the ones later in the process..
The delay is probably caused because if you use a List<T>, the compiler can specialize the foreach loop: it already knows the structure of the IEnumerator<T> at compile time and can inline it if necessary.
If you use an IEnumerable<T>, the compiler must use virtual calls and lookup the exact methods using a vtable. This explains the time difference. Especially since you don't do much in your loop. In other words, the runtime must lookup which method to actually use, since the IEnumerable<T> could be anything: a LinkedList<T>, a HashSet<T>, a datastructure you made yourself,...
A general rule is: the higher the type of object in the class hierarchy the less the compiler knows about the real instance, the less it can optimize performance.

Maybe generating the template code the first time for the IEnumerable?

Many thanks for all your comments.
Our development team is analyzing this performance bug for nearly a week now
We run these tests several times in different order and different modules of the program
with and without compiler optimization. #CommuSoft: You are right that JIT needs more time to run code
for the first time. Unfortunately the result is always the same:
Method2 is about 700 times slower than Method1.
Maybe it's worth to mention again, that the performance hit appears untill we open and close any modal dialog in the program. It's not important what modal dialog. The performance hit will be fixed right after the
method Dispose() of the base class Form has been called. (The Dispose method hasn't been
overriden by a derived class)
On my way of analyzing the bug I stepped deeply into the framework and found out now,
that List can't be the reason for the performance hit. Look at this piece of code:
class Test
{
void Test()
{
var myClass = new MyClass();
StartMeasurement();
for (int i = 0; i < 5000; i++)
{
myClass.DoSth();
}
StopMeasurement(); // ==> 46 ms
var a = (IMyInterface)myClass;
StartMeasurement();
for (int i = 0; i < 5000; i++)
{
a.DoSth();
}
StopMeasurement(); // ==> 665 ms
}
}
public interface IMyInterface
{
void DoSth();
}
public class MyClass : IMyInterface
{
public void DoSth()
{
for (int i = 0; i < 10; i++ )
{
double a = 1.2345;
a = a / 19.44;
}
}
}
To call a method via the interface needs much more time than calling it directly.
But of course after closing our dubious dialog we measured between 44 and 45 ms for both loops.

Related

Java is scaling much worse than C# over many cores?

I am testing spawning off many threads running the same function on a 32 core server for Java and C#. I run the application with 1000 iterations of the function, which is batched across either 1,2,4,8, 16 or 32 threads using a threadpool.
At 1, 2, 4, 8 and 16 concurrent threads Java is at least twice as fast as C#. However, as the number of threads increases, the gap closes and by 32 threads C# has nearly the same average run-time, but Java occasionally takes 2000ms (whereas both languages are usually running about 400ms). Java is starting to get worse with massive spikes in the time taken per thread iteration.
EDIT This is Windows Server 2008
EDIT2 I have changed the code below to show using the Executor Service threadpool. I have also installed Java 7.
I have set the following optimisations in the hotspot VM:
-XX:+UseConcMarkSweepGC -Xmx 6000
but it still hasnt made things any better. The only difference between the code is that im using the below threadpool and for the C# version we use:
http://www.codeproject.com/Articles/7933/Smart-Thread-Pool
Is there a way to make the Java more optimised? Perhaos you could explain why I am seeing this massive degradation in performance?
Is there a more efficient Java threadpool?
(Please note, I do not mean by changing the test function)
import java.io.DataOutputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.PrintStream;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;
public class PoolDemo {
static long FastestMemory = 2000000;
static long SlowestMemory = 0;
static long TotalTime;
static int[] FileArray;
static DataOutputStream outs;
static FileOutputStream fout;
static Byte myByte = 0;
public static void main(String[] args) throws InterruptedException, FileNotFoundException {
int Iterations = Integer.parseInt(args[0]);
int ThreadSize = Integer.parseInt(args[1]);
FileArray = new int[Iterations];
fout = new FileOutputStream("server_testing.csv");
// fixed pool, unlimited queue
ExecutorService service = Executors.newFixedThreadPool(ThreadSize);
ThreadPoolExecutor executor = (ThreadPoolExecutor) service;
for(int i = 0; i<Iterations; i++) {
Task t = new Task(i);
executor.execute(t);
}
for(int j=0; j<FileArray.length; j++){
new PrintStream(fout).println(FileArray[j] + ",");
}
}
private static class Task implements Runnable {
private int ID;
public Task(int index) {
this.ID = index;
}
public void run() {
long Start = System.currentTimeMillis();
int Size1 = 100000;
int Size2 = 2 * Size1;
int Size3 = Size1;
byte[] list1 = new byte[Size1];
byte[] list2 = new byte[Size2];
byte[] list3 = new byte[Size3];
for(int i=0; i<Size1; i++){
list1[i] = myByte;
}
for (int i = 0; i < Size2; i=i+2)
{
list2[i] = myByte;
}
for (int i = 0; i < Size3; i++)
{
byte temp = list1[i];
byte temp2 = list2[i];
list3[i] = temp;
list2[i] = temp;
list1[i] = temp2;
}
long Finish = System.currentTimeMillis();
long Duration = Finish - Start;
TotalTime += Duration;
FileArray[this.ID] = (int)Duration;
System.out.println("Individual Time " + this.ID + " \t: " + (Duration) + " ms");
if(Duration < FastestMemory){
FastestMemory = Duration;
}
if (Duration > SlowestMemory)
{
SlowestMemory = Duration;
}
}
}
}

Summary
Below are the original response, update 1, and update 2. Update 1 talks about dealing with the race conditions around the test statistic variables by using concurrency structures. Update 2 is a much simpler way of dealing with the race condition issue. Hopefully no more updates from me - sorry for the length of the response but multithreaded programming is complicated!
Original Response
The only difference between the code is that im using the below
threadpool
I would say that is an absolutely huge difference. It's difficult to compare the performance of the two languages when their thread pool implementations are completely different blocks of code, written in user space. The thread pool implementation could have enormous impact on performance.
You should consider using Java's own built-in thread pools. See ThreadPoolExecutor and the entire java.util.concurrent package of which it is part. The Executors class has convenient static factory methods for pools and is a good higher level interface. All you need is JDK 1.5+, though the newer, the better. The fork/join solutions mentioned by other posters are also part of this package - as mentioned, they require 1.7+.
Update 1 - Addressing race conditions by using concurrency structures
You have race conditions around the setting of FastestMemory, SlowestMemory, and TotalTime. For the first two, you are doing the < and > testing and then the setting in more than one step. This is not atomic; there is certainly the chance that another thread will update these values in between the testing and the setting. The += setting of TotalTime is also non-atomic: a test and set in disguise.
Here are some suggested fixes.
TotalTime
The goal here is a threadsafe, atomic += of TotalTime.
// At the top of everything
import java.util.concurrent.atomic.AtomicLong;
...
// In PoolDemo
static AtomicLong TotalTime = new AtomicLong();
...
// In Task, where you currently do the TotalTime += piece
TotalTime.addAndGet (Duration);
FastestMemory / SlowestMemory
The goal here is testing and updating FastestMemory and SlowestMemory each in an atomic step, so no thread can slip in between the test and update steps to cause a race condition.
Simplest approach:
Protect the testing and setting of the variables using the class itself as a monitor. We need a monitor that contains the variables in order to guarantee synchronized visibility (thanks #A.H. for catching this.) We have to use the class itself because everything is static.
// In Task
synchronized (PoolDemo.class) {
if (Duration < FastestMemory) {
FastestMemory = Duration;
}
if (Duration > SlowestMemory) {
SlowestMemory = Duration;
}
}
Intermediate approach:
You may not like taking the whole class for the monitor, or exposing the monitor by using the class, etc. You could do a separate monitor that does not itself contain FastestMemory and SlowestMemory, but you will then run into synchronization visibility issues. You get around this by using the volatile keyword.
// In PoolDemo
static Integer _monitor = new Integer(1);
static volatile long FastestMemory = 2000000;
static volatile long SlowestMemory = 0;
...
// In Task
synchronized (PoolDemo._monitor) {
if (Duration < FastestMemory) {
FastestMemory = Duration;
}
if (Duration > SlowestMemory) {
SlowestMemory = Duration;
}
}
Advanced approach:
Here we use the java.util.concurrent.atomic classes instead of monitors. Under heavy contention, this should perform better than the synchronized approach. Try it and see.
// At the top of everything
import java.util.concurrent.atomic.AtomicLong;
. . . .
// In PoolDemo
static AtomicLong FastestMemory = new AtomicLong(2000000);
static AtomicLong SlowestMemory = new AtomicLong(0);
. . . . .
// In Task
long temp = FastestMemory.get();
while (Duration < temp) {
if (!FastestMemory.compareAndSet (temp, Duration)) {
temp = FastestMemory.get();
}
}
temp = SlowestMemory.get();
while (Duration > temp) {
if (!SlowestMemory.compareAndSet (temp, Duration)) {
temp = SlowestMemory.get();
}
}
Let me know what happens after this. It may not fix your problem, but the race condition around the very variables that track your performance is too dangerous to ignore.
I originally posted this update as a comment but moved it here so that I would have room to show code. This update has been through a few iterations - thanks to A.H. for catching a bug I had in an earlier version. Anything in this update supersedes anything in the comment.
Last but not least, an excellent source covering all this material is Java Concurrency in Practice, the best book on Java concurrency, and one of the best Java books overall.
Update 2 - Addressing race conditions in a much simpler way
I recently noticed that your current code will never terminate unless you add executorService.shutdown(). That is, the non-daemon threads living in that pool must be terminated or else the main thread will never exit. This got me to thinking that since we have to wait for all threads to exit, why not compare their durations after they finished, and thus bypass the concurrent updating of FastestMemory, etc. altogether? This is simpler and could be faster; there's no more locking or CAS overhead, and you are already doing an iteration of FileArray at the end of things anyway.
The other thing we can take advantage of is that your concurrent updating of FileArray is perfectly safe, since each thread is writing to a separate cell, and since there is no reading of FileArray during the writing of it.
With that, you make the following changes:
// In PoolDemo
// This part is the same, just so you know where we are
for(int i = 0; i<Iterations; i++) {
Task t = new Task(i);
executor.execute(t);
}
// CHANGES BEGIN HERE
// Will block till all tasks finish. Required regardless.
executor.shutdown();
executor.awaitTermination(10, TimeUnit.SECONDS);
for(int j=0; j<FileArray.length; j++){
long duration = FileArray[j];
TotalTime += duration;
if (duration < FastestMemory) {
FastestMemory = duration;
}
if (duration > SlowestMemory) {
SlowestMemory = duration;
}
new PrintStream(fout).println(FileArray[j] + ",");
}
. . .
// In Task
// Ending of Task.run() now looks like this
long Finish = System.currentTimeMillis();
long Duration = Finish - Start;
FileArray[this.ID] = (int)Duration;
System.out.println("Individual Time " + this.ID + " \t: " + (Duration) + " ms");
Give this approach a shot as well.
You should definitely be checking your C# code for similar race conditions.

...but Java occasionally takes 2000ms...
And
byte[] list1 = new byte[Size1];
byte[] list2 = new byte[Size2];
byte[] list3 = new byte[Size3];
The hickups will be the garbage collector cleaning up your arrays. If you really want to tune that I suggest you use some kind of cache for the arrays.
Edit
This one
System.out.println("Individual Time " + this.ID + " \t: " + (Duration) + " ms");
does one or more synchronized internally. So your highly "concurrent" code will be serialized quite good at this point. Just remove it and retest.

While #sparc_spread's answer is great, another thing I've noticed is this:
I run the application with 1000 iterations of the function
Notice that the HotSpot JVM is working on interpreted mode for the first 1.5k iterations of any function on client mode, and for 10k iterations on server mode. Computers with that many cores are automatically considered "servers" by the HotSpot JVM.
That would mean that C# would do JIT (and run in machine code) before Java does, and has a chance for better performance at the function runtime. Try increasing the iterations to 20,000 and start counting from 10k iteration.
The rationale here is that the JVM collects statistical data for how to do JIT best. It trusts that your function is going to be run a lot through time, so it takes a "slow bootstrapping" mechanism for a faster runtime overall. Or in their words "20% of the functions run 80% of the time", so why JIT them all?

Are you using java6? Java 7 comes with features to improve performance in parallel programing:
http://www.oracle.com/technetwork/articles/java/fork-join-422606.html

Why is concurrent modification of arrays so slow?

I was writing a program to illustrate the effects of cache contention in multithreaded programs. My first cut was to create an array of long and show how modifying adjacent items causes contention. Here's the program.
const long maxCount = 500000000;
const int numThreads = 4;
const int Multiplier = 1;
static void DoIt()
{
long[] c = new long[Multiplier * numThreads];
var threads = new Thread[numThreads];
// Create the threads
for (int i = 0; i < numThreads; ++i)
{
threads[i] = new Thread((s) =>
{
int x = (int)s;
while (c[x] > 0)
{
--c[x];
}
});
}
// start threads
var sw = Stopwatch.StartNew();
for (int i = 0; i < numThreads; ++i)
{
int z = Multiplier * i;
c[z] = maxCount;
threads[i].Start(z);
}
// Wait for 500 ms and then access the counters.
// This just proves that the threads are actually updating the counters.
Thread.Sleep(500);
for (int i = 0; i < numThreads; ++i)
{
Console.WriteLine(c[Multiplier * i]);
}
// Wait for threads to stop
for (int i = 0; i < numThreads; ++i)
{
threads[i].Join();
}
sw.Stop();
Console.WriteLine();
Console.WriteLine("Elapsed time = {0:N0} ms", sw.ElapsedMilliseconds);
}
I'm running Visual Studio 2010, program compiled in Release mode, .NET 4.0 target, "Any CPU", and executed in the 64-bit runtime without the debugger attached (Ctrl+F5).
That program runs in about 1,700 ms on my system, with a single thread. With two threads, it takes over 25 seconds. Figuring that the difference was cache contention, I set Multipler = 8 and ran again. The result is 12 seconds, so contention was at least part of the problem.
Increasing Multiplier beyond 8 doesn't improve performance.
For comparison, a similar program that doesn't use an array takes only about 2,200 ms with two threads when the variables are adjacent. When I separate the variables, the two thread version runs in the same amount of time as the single-threaded version.
If the problem was array indexing overhead, you'd expect it to show up in the single-threaded version. It looks to me like there's some kind of mutual exclusion going on when modifying the array, but I don't know what it is.
Looking at the generated IL isn't very enlightening. Nor was viewing the disassembly. The disassembly does show a couple of calls to (I think) the runtime library, but I wasn't able to step into them.
I'm not proficient with windbg or other low-level debugging tools these days. It's been a really long time since I needed them. So I'm stumped.
My only hypothesis right now is that the runtime code is setting a "dirty" flag on every write. It seems like something like that would be required in order to support throwing an exception if the array is modified while it's being enumerated. But I readily admit that I have no direct evidence to back up that hypothesis.
Can anybody tell me what is causing this big slowdown?

You've got false sharing. I wrote an article about it here

Why does parallel this code work sometimes?

I wanted to parallelize a piece of code, but the code actually got slower probably because of overhead of Barrier and BlockCollection. There would be 2 threads, where the first would find pieces of work wich the second one would operate on. Both operations are not much work so the overhead of switching safely would quickly outweigh the two threads.
So I thought I would try to write some code myself to be as lean as possible, without using Barrier etc. It does not behave consistent however. Sometimes it works, sometimes it does not and I can't figure out why.
This code is just the mechanism I use to try to synchronize the two threads. It doesn't do anything useful, just the minimum amount of code you need to reproduce the bug.
So here's the code:
// node in linkedlist of work elements
class WorkItem {
public int Value;
public WorkItem Next;
}
static void Test() {
WorkItem fst = null; // first element
Action create = () => {
WorkItem cur=null;
for (int i = 0; i < 1000; i++) {
WorkItem tmp = new WorkItem { Value = i }; // create new comm class
if (fst == null) fst = tmp; // if it's the first add it there
else cur.Next = tmp; // else add to back of list
cur = tmp; // this is the current one
}
cur.Next = new WorkItem { Value = -1 }; // -1 means stop element
#if VERBOSE
Console.WriteLine("Create is done");
#endif
};
Action consume = () => {
//Thread.Sleep(1); // this also seems to cure it
#if VERBOSE
Console.WriteLine("Consume starts"); // especially this one seems to matter
#endif
WorkItem cur = null;
int tot = 0;
while (fst == null) { } // busy wait for first one
cur = fst;
#if VERBOSE
Console.WriteLine("Consume found first");
#endif
while (true) {
if (cur.Value == -1) break; // if stop element break;
tot += cur.Value;
while (cur.Next == null) { } // busy wait for next to be set
cur = cur.Next; // move to next
}
Console.WriteLine(tot);
};
try { Parallel.Invoke(create, consume); }
catch (AggregateException e) {
Console.WriteLine(e.Message);
foreach (var ie in e.InnerExceptions) Console.WriteLine(ie.Message);
}
Console.WriteLine("Consume done..");
Console.ReadKey();
}
The idea is to have a Linkedlist of workitems. One thread adds items to the back of that list, and another thread reads them, does something, and polls the Next field to see if it is set. As soon as it is set it will move to the new one and process it. It polls the Next field in a tight busy loop because it should be set very quickly. Going to sleep, context switching etc would kill the benefit of parallizing the code.
The time it takes to create a workitem would be quite comparable to executing it, so the cycles wasted should be quite small.
When I run the code in release mode, sometimes it works, sometimes it does nothing. The problem seems to be in the 'Consumer' thread, the 'Create' thread always seems to finish. (You can check by fiddling with the Console.WriteLines).
It has always worked in debug mode. In release it about 50% hit and miss. Adding a few Console.Writelines helps the succes ratio, but even then it's not 100%. (the #define VERBOSE stuff).
When I add the Thread.Sleep(1) in the 'Consumer' thread it also seems to fix it. But not being able to reproduce a bug is not the same thing as knowing for sure it's fixed.
Does anyone here have a clue as to what goes wrong here? Is it some optimization that creates a local copy or something that does not get updated? Something like that?
There's no such thing as a partial update right? like a datarace, but then that one thread is half doen writing and the other thread reads the partially written memory? Just checking..
Looking at it I think it should just work.. I guess once every few times the threads arrive in different order and that makes it fail, but I don't get how. And how I could fix this without adding slowing it down?
Thanks in advance for any tips,
Gert-Jan

I do my damn best to avoid the utter minefield of closure/stack interaction at all costs.
This is PROBABLY a (language-level) race condition, but without reflecting Parallel.Invoke i can't be sure. Basically, sometimes fst is being changed by create() and sometimes not. Ideally, it should NEVER be changed (if c# had good closure behaviour). It could be due to which thread Parallel.Invoke chooses to run create() and consume() on. If create() runs on the main thread, it might change fst before consume() takes a copy of it. Or create() might be running on a separate thread and taking a copy of fst. Basically, as much as i love c#, it is an utter pain in this regard, so just work around it and treat all variables involved in a closure as immutable.
To get it working:
//Replace
WorkItem fst = null
//with
WorkItem fst = WorkItem.GetSpecialBlankFirstItem();
//And
if (fst == null) fst = tmp;
//with
if (fst.Next == null) fst.Next = tmp;

A thread is allowed by the spec to cache a value indefinitely.
see Can a C# thread really cache a value and ignore changes to that value on other threads? and also http://www.yoda.arachsys.com/csharp/threads/volatility.shtml

Generics vs Object performance

I'm doing practice problems from MCTS Exam 70-536 Microsft .Net Framework Application Dev Foundation, and one of the problems is to create two classes, one generic, one object type that both perform the same thing; in which a loop uses the class and iterated over thousand times. And using the timer, time the performance of both. There was another post at C# generics question that seeks the same questoion but nonone replied.
Basically if in my code I run the generic class first it takes loger to process. If I run the object class first than the object class takes longer to process. The whole idea was to prove that generics perform faster.
I used the original users code to save me some time. I didn't particularly see anything wrong with the code and was puzzled by the outcome. Can some one explain why the unusual results?
Thanks,
Risho
Here is the code:
class Program
{
class Object_Sample
{
public Object_Sample()
{
Console.WriteLine("Object_Sample Class");
}
public long getTicks()
{
return DateTime.Now.Ticks;
}
public void display(Object a)
{
Console.WriteLine("{0}", a);
}
}
class Generics_Samle<T>
{
public Generics_Samle()
{
Console.WriteLine("Generics_Sample Class");
}
public long getTicks()
{
return DateTime.Now.Ticks;
}
public void display(T a)
{
Console.WriteLine("{0}", a);
}
}
static void Main(string[] args)
{
long ticks_initial, ticks_final, diff_generics, diff_object;
Object_Sample OS = new Object_Sample();
Generics_Samle<int> GS = new Generics_Samle<int>();
//Generic Sample
ticks_initial = 0;
ticks_final = 0;
ticks_initial = GS.getTicks();
for (int i = 0; i < 50000; i++)
{
GS.display(i);
}
ticks_final = GS.getTicks();
diff_generics = ticks_final - ticks_initial;
//Object Sample
ticks_initial = 0;
ticks_final = 0;
ticks_initial = OS.getTicks();
for (int j = 0; j < 50000; j++)
{
OS.display(j);
}
ticks_final = OS.getTicks();
diff_object = ticks_final - ticks_initial;
Console.WriteLine("\nPerformance of Generics {0}", diff_generics);
Console.WriteLine("Performance of Object {0}", diff_object);
Console.ReadKey();
}
}

Well, the first problem I can see is that you're using the DateTime object to measure time in your application (for a very small interval).
You should be using the Stopwatch class. It offers better precision when trying to benchmark code.
The second problem is that you're not allowing for JIT (Just-In-Time compilation). The first call to your code is going to take longer simply because it has to be JIT'd. After that, you'll get your results.
I would make a single call in to your code before you start timing things so you can get an accurate idea of what is happening during the loop.

You should run both classes a separate time before timing it to allow the JITter to run.

Your test is incorrect. Here are your methods:
public void display(T a)
{
Console.WriteLine("{0}", a); // Console.WriteLine(string format, params object[] args) <- boxing is performed here
}
public void display(Object a)// <- boxing is performed here
{
Console.WriteLine("{0}", a);
}
So, in both cases you are using boxing. Much better would be if your class, for example, will count total sum of values, like:
public void add(long a)
{
Total += a;
}
public void display(Object a)// <- boxing is performed here
{
Total += (long) a;// <- unboxing is performed here
}

Your timed code includes a Console.WriteLine(). That will take up 99.999999% of the time.
Your assumption that generic will be faster in this situation is wrong. You may have misinterpreted a remark about non-generic collection classes.
This won't be on he exam.

why would it be faster? both ints must be boxed in order to use Console.WriteLine(string, object)
edit: ToString() itself does not seem to cause boxing
http://weblogs.asp.net/ngur/archive/2003/12/16/43856.aspx
so when you use Console.WriteLine(a); which would call Console.WriteLine(Int32) that should work i guess (i would need to look into reflector to confirm this)

Is using delegates excessively a bad idea for performance? [duplicate]

This question already has answers here:
Does using delegates slow down my .NET programs?
(4 answers)
Closed 9 years ago.
Consider the following code:
if (IsDebuggingEnabled) {
instance.Log(GetDetailedDebugInfo());
}
GetDetailedDebugInfo() may be an expensive method, so we only want to call it if we're running in debug mode.
Now, the cleaner alternative is to code something like this:
instance.Log(() => GetDetailedDebugInfo());
Where .Log() is defined such as:
public void Log(Func<string> getMessage)
{
if (IsDebuggingEnabled)
{
LogInternal(getMessage.Invoke());
}
}
My concern is with performance, preliminary testing doesn't show the second case to be particularly more expensive, but I don't want to run into any surprises if load increases.
Oh, and please don't suggest conditional compilation because it doesn't apply to this case.
(P.S.: I wrote the code directly in the StackOverflow Ask a Question textarea so don't blame me if there are subtle bugs and it doesn't compile, you get the point :)

No, it shouldn't have a bad performance. After all, you'll be calling it only in debug mode where performance is not at the forefront. Actually, you could remove the lambda and just pass the method name to remove the overhead of an unnecessary intermediate anonymous method.
Note that if you want to do this in Debug builds, you can add a [Conditional("DEBUG")] attribute to the log method.

There is a difference in performance. How significant it is will depend on the rest of your code so I would recommend profiling before embarking on optimisations.
Having said that for your first example:
if (IsDebuggingEnabled)
{
instance.Log(GetDetailedDebugInfo());
}
If IsDebuggingEnabled is static readonly then the check will be jitted away as it knows it can never change. This means that the above sample will have zero performance impact if IsDebuggingEnabled is false, because after the JIT is done the code will be gone.
instance.Log(() => GetDetailedDebugInfo());
public void Log(Func<string> getMessage)
{
if (IsDebuggingEnabled)
{
LogInternal(getMessage.Invoke());
}
}
The method will be called every time instance.Log is called. Which will be slower.
But before expending time with this micro optimization you should profile your application or run some performance tests to make sure this is actually a bottle neck in your application.

I was hoping for some documentation regarding performance in such cases, but it seems that all I got were suggestions on how to improve my code... No one seems to have read my P.S. - no points for you.
So I wrote a simple test case:
public static bool IsDebuggingEnabled { get; set; }
static void Main(string[] args)
{
for (int j = 0; j <= 10; j++)
{
Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i <= 15000; i++)
{
Log(GetDebugMessage);
if (i % 1000 == 0) IsDebuggingEnabled = !IsDebuggingEnabled;
}
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds);
}
Console.ReadLine();
for (int j = 0; j <= 10; j++)
{
Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i <= 15000; i++)
{
if (IsDebuggingEnabled) GetDebugMessage();
if (i % 1000 == 0) IsDebuggingEnabled = !IsDebuggingEnabled;
}
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds);
}
Console.ReadLine();
}
public static string GetDebugMessage()
{
StringBuilder sb = new StringBuilder(100);
Random rnd = new Random();
for (int i = 0; i < 100; i++)
{
sb.Append(rnd.Next(100, 150));
}
return sb.ToString();
}
public static void Log(Func<string> getMessage)
{
if (IsDebuggingEnabled)
{
getMessage();
}
}
Timings seem to be exactly the same between the two versions.
I get 145 ms in the first case, and 145 ms in the second case
Looks like I answered my own question.

You can also do this:
// no need for a lambda
instance.Log(GetDetailedDebugInfo)
// Using these instance methods on the logger
public void Log(Func<string> detailsProvider)
{
if (!DebuggingEnabled)
return;
this.LogImpl(detailsProvider());
}
public void Log(string message)
{
if (!DebuggingEnabled)
return;
this.LogImpl(message);
}
protected virtual void LogImpl(string message)
{
....
}

Standard answers:
If you gotta do it, you gotta do it.
Loop it 10^9 times, look at a stopwatch, & that tells you how many nanoseconds it takes.
If your program is big, chances are you have bigger problems elsewhere.

Call getMessage delegate directly instead of calling Invoke on it.
if(IsDebuggingEnabled)
{
LogInternal(getMessage());
}
You should also add null check on getMessage.

I believe delegates create a new thread, so you may be right about it increasing performance.
Why not set up a test run like Dav suggested, and keep a close eye on the number of threads spawned by your app, you can use Process Explorer for that.
Hang on! I've been corrected! Delegates only use threads when you use 'BeginInvoke'... so my above comments don't apply to the way you're using them.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

cast to IEnumerable results in performance hit - c#

Maybe generating the template code the first time for the IEnumerable?

Related

Java is scaling much worse than C# over many cores?

Why is concurrent modification of arrays so slow?

Why does parallel this code work sometimes?

Generics vs Object performance

Is using delegates excessively a bad idea for performance? [duplicate]

Categories

Resources