Consider this code:
var (mult, sum) = MultSum(a, b);
and
var (_, sum) = MultSum(a, b);
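Assume MultSum is something along these lines (the exact body does not matter for the question):
static (int mult, int sum) MultSum(int a, int b)
{
    // hypothetical implementation: return both the product and the sum as a tuple
    return (a * b, a + b);
}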
Question 1.
If I use a discard instead of a variable name, does it have a performance benefit, e.g. by reducing assignment operations?
Question 2.
Is there any way to write MultSum smart enough so it doesn't calculate the discards!?
If I use a discard instead of a variable name, does it have a performance benefit, e.g. by reducing assignment operations?
In your particular case it is unlikely that there would be a benefit in performance. The tuple that is returned is assigned to temporary storage; you've just not given a name to one part of that storage.
Now, if you had an expression that had discards that were entire values, not fragments of a tuple, then the compiler and the jitter can be smart about not allocating any storage on the short-term pool for the result, or re-using existing storage that was already allocated. Note that by "short-term pool" I effectively mean "activation record on the stack" or "registers". This could, in theory, lead to better register allocation or smaller frames (and therefore better locality of reference) and that in turn could save you entire nanoseconds.
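To make the distinction concrete, here is a sketch:
// Discarding one fragment of the tuple: the whole tuple is still produced and
// held in temporary storage; one of its slots simply has no name.
var (_, sum) = MultSum(a, b);

// Discarding the entire value: here the compiler and the jitter have more
// freedom to avoid dedicating any short-term storage to the result at all.
_ = MultSum(a, b);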
Nano-optimizations are generally not worth it; there is almost always a better bang-for-buck performance problem to attack. But if you think it might be relevant for your scenario, measure it and see. That is the only way to know if there is a relevant performance difference. Get out a nano-scale stopwatch, run the code both ways, and see which one is faster.
The benefit you should be attempting to accrue by using discards is the "make my program easier to understand" benefit. Programmers are expensive; optimize for making your code easy for future programmers to read, understand and modify.
Is there any way to write MultSum smart enough so it doesn't calculate the discards!?
Yes. Write your program in Haskell. Haskell will avoid performing calculations whose results are never used. C# is not such a language.
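That said, if you want something resembling that laziness in C#, one sketch (assuming you control MultSum) is to return Lazy<int> components, so that nothing is computed until the caller asks for it:
// Hypothetical lazy variant: each component is computed only when .Value is read,
// so a discarded component is never computed at all.
static (Lazy<int> mult, Lazy<int> sum) MultSumLazy(int a, int b) =>
    (new Lazy<int>(() => a * b), new Lazy<int>(() => a + b));

// var (_, sum) = MultSumLazy(a, b);
// Console.WriteLine(sum.Value); // only the sum is ever computed
Of course, for operations as cheap as a multiply and an add, the Lazy<T> machinery costs far more than it could ever save; this only pays off when the discarded computation is genuinely expensive.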
Suppose I have the following program:
static void SomeMethod(Func<int, int> otherMethod)
{
otherMethod(1);
}
static int OtherMethod(int x)
{
return x;
}
static void Main(string[] args)
{
SomeMethod(OtherMethod);
SomeMethod(x => OtherMethod(x));
SomeMethod(x => OtherMethod(x));
}
I cannot understand the compiled IL code (it emits a lot of extra code). Here is a simplified version:
class C
{
public static C c;
public static Func<int, int> foo;
public static Func<int, int> foo1;
static C()
{
c = new C();
}
C(){}
public int b(int x)
{
return OtherMethod(x);
}
public int b1(int x)
{
return OtherMethod(x);
}
}
static void Main()
{
SomeMethod(new Func<int, int>(OtherMethod));
if (C.foo != null)
SomeMethod(C.foo);
else
{
C.foo = new Func<int, int>(C.c, C.b);
SomeMethod(C.foo);
}
if (C.foo1 != null)
SomeMethod(C.foo1);
else
{
C.foo1 = new Func<int, int>(C.c, C.b1);
SomeMethod(C.foo1);
}
}
Why does the compiler create two separate non-static methods b/b1 that are equal? (By equal I mean they have the same body.)
Your question is: why did the compiler not realize that the two lines
SomeMethod(x => OtherMethod(x));
SomeMethod(x => OtherMethod(x));
are the same and write this as
if ( delegate is not created )
create the delegate and stash it away
SomeMethod( the delegate );
SomeMethod( the delegate );
? Well, let me answer that question in several ways.
First off, is the compiler permitted to make that optimization? Yes. The specification calls out that a C# compiler is permitted to make two lambdas that do exactly the same thing into a single delegate. And in fact you can see that it already does this optimization in part: it creates each delegate once and saves it away so that it doesn't have to create it again later when the code is called again. Notice that this is a waste of memory in the case where code is only called once.
Second, is the compiler required to make the caching optimization? No. The specification calls out that the compiler is only permitted to make the optimization, but not required to.
Is the compiler required to make the optimization you want? Obviously not, because it doesn't. It is permitted to, and maybe a future version of the compiler will. The compiler is open-source; if you care about this optimization, go write it and submit a pull request.
Third, is it possible to make the optimization you want? Yes. The compiler could take all pairs of lambdas that appear in the same method, compile them to the internal tree format, and do a tree comparison to see if they have the same content, and then generate the same static backing field for both.
So now we have a situation: the compiler is permitted to make a particular optimization, and it doesn't. And you've asked "why not"? That's an easy question to answer: no optimization gets implemented until someone spends the considerable time and effort to:
Carefully design the optimization: under precisely what conditions is the optimization triggered and not triggered? How general should the optimization be? You've suggested that they detect similar lambda bodies but why stop there? You have two identical statements of code, so why not generate the code for those statements once instead of twice? What if you had a repeated group of statements? There is a huge amount of design work to do here.
In particular, an important aspect of the design is: could the user reasonably do the optimization "by hand" while still keeping the code readable? In this case, yes they could, easily. Just assign the duplicated lambda to a variable and then use the variable (see the sketch below). An optimization which does automatically something that a user who cared could have done themselves easily is not really a very interesting or compelling optimization.
Your examples are trivial; real-world code is not. What does your proposed design do with identical nested lambdas? And so on.
Does your optimization cause the behaviour of the code in the debugger to "look weird"? You have probably noticed that when debugging code that was compiled with optimizations turned on, the debugger seems to behave weirdly; that's because there's no longer a clear mapping between the generated code and the original code. Does your optimization make that worse? Is it acceptable to users? Does the debugger need to be aware of the optimization? If so, you'll have to change the debugger. In this case, probably not, but these are questions you have to ask and answer.
Get the design reviewed by experts; this takes up their time, and will likely result in changes to the design.
Make estimates of the pros and cons of the optimization -- optimizations often have hidden costs, like the memory leak I mentioned before. In particular, optimizations often preclude other optimizations which might be better.
Make estimates as to the total savings world-wide of this optimization. Does the optimization actually affect real-world code? Does it change the correctness of that code? Is there any production code, anywhere in the world, that would break with this optimization and cause the CTO of company X to call the CTO of Microsoft demanding a fix? If the answer is yes then maybe you might want to not do this optimization. C# is not a toy. Millions and millions of people depend on its correct operation every day.
What's the estimated burden of doing the optimization on compile time? Compilation doesn't have to happen between keystrokes but it does have to be pretty fast. Anything which introduces a superlinear algorithm in a common code path in the compiler is going to be unacceptable. Can you implement your optimization so that it is linear in code size? Note that the algorithm I sketched before -- compare all pairs -- is superlinear in code size. (Exercise: what's the worst case asymptotic performance of doing a tree comparison on all pairs of lambdas?)
Actually implement the optimization. I encourage you to do so.
Test the optimization; does it actually produce better code? On what metric? An optimization which causes no change to any metric is not an optimization.
Sign up to fix bugs in the optimization forever.
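To illustrate the "do it by hand" point from the design discussion above, here is a sketch based on the question's own code:
// Hoist the duplicated lambda into a single local delegate; only one delegate
// object is created, and it is reused for both calls.
Func<int, int> callOther = x => OtherMethod(x);
SomeMethod(callOther);
SomeMethod(callOther);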
The optimization you want simply doesn't meet the bar. No one writes code like that. If they did, and they cared that it duplicated an object, they could easily fix it themselves. So the optimization optimizes code that doesn't exist, in order to get a "win" that is the construction of a single object amongst the millions and millions of objects the program will allocate. Not worth it.
But again, if you think it is, go ahead and implement it and submit a pull request. Make sure to submit the results of the investigations I noted above, because those are where the real work is. The implementation is usually the smallest part of the total effort spent on a feature; that's why C# is a successful language.
Consider the situation in which the main logic of a method should only actually run given a certain condition. As far as I know, there are two basic ways to achieve this:
If inverse condition is true, simply return:
public void aMethod(){
if(!aBoolean) return;
// rest of method code goes here
}
or
If original condition is true, continue execution:
public void aMethod(){
if(aBoolean){
// rest of method code goes here
}
}
Now, I would guess that which of these implementations is more efficient depends on the language it's written in and/or on how if statements, return statements, and possibly method calls are implemented by the compiler/interpreter/VM (depending on the language); so the first part of my question is, is this true?
The second part of my question is, if the answer to the first part is "yes", which of the above code-flow patterns is more efficient specifically in C#/.NET 4.6.x?
Edit:
In reference to Dark Falcon's comment: the purpose of this question is not actually to fix performance issues or optimize any real code I've written; I am just curious about how each piece of each pattern is implemented by the compiler, e.g., for argument's sake, if it were compiled verbatim with no compiler optimizations, which would be more efficient?
TL;DR It doesn't make a difference. Current generations of processors (circa Ivy Bridge and later) don't use a static branch-prediction algorithm that you can reason about anymore, so there is no possible performance gain in using one form or the other.
On most older processors, the static branch-prediction strategy is generally that forward conditional jumps are assumed to be taken, while backwards conditional jumps are assumed not-taken. Therefore, there might be a small performance advantage to be gained the first time the code is executed by arranging for the fall-through case to be the most likely—i.e.,
if { expected } else { unexpected }.
But the fact is, this kind of low-level performance analysis makes very little sense when writing in a managed, JIT-compiled language like C#.
You're getting a lot of answers that say readability and maintainability should be your primary concern when writing code. This is regrettably common with "performance" questions, and while it is completely true and unarguable, it mostly skirts the question instead of answering it.
Moreover, it isn't clear why form "A" would be intrinsically more readable than form "B", or vice versa. There are just as many arguments one way or the other—do all parameter validation at the top of the function, or ensure there is only a single return point—and it ultimately gets down to doing what your style guide says, except in really egregious cases where you'd have to contort the code in all sorts of terrible ways, and then you should obviously do what is most readable.
Beyond being a completely reasonable question to ask on conceptual/theoretical grounds, understanding the performance implications also seems like an excellent way to make an informed decision about which general form to adopt when writing your style guide.
The remainder of the existing answers consist of misguided speculation, or downright incorrect information. Of course, that makes sense. Branch prediction is complicated, and as processors get smarter, it only gets harder to understand what is actually happening (or going to happen) under the hood.
First, let's get a couple of things straight. You make reference in the question to analyzing the performance of unoptimized code. No, you don't ever want to do that. It is a waste of time; you'll get meaningless data that does not reflect real-world usage, and then you'll try and draw conclusions from that data, which will end up being wrong (or maybe right, but for the wrong reasons, which is just as bad). Unless you're shipping unoptimized code to your clients (which you shouldn't be doing), you don't care how unoptimized code performs.
When writing in C#, there are effectively two levels of optimization. The first is performed by the C# compiler when it is generating the intermediate language (IL). This is controlled by the optimization switch in the project settings. The second level of optimization is performed by the JIT compiler when it translates the IL into machine code. This is a separate setting, and you can actually analyze the JITed machine code with optimization enabled or disabled. When you're profiling or benchmarking, or even analyzing the generated machine code, you need to have both levels of optimizations enabled.
But benchmarking optimized code is difficult, because the optimization often interferes with the thing you're trying to test. If you tried to benchmark code like that shown in the question, an optimizing compiler would likely notice that neither one of them is actually doing anything useful and transform them into no-ops. One no-op is equally fast as another no-op—or maybe it's not, and that's actually worse, because then all you're benchmarking is noise that has nothing to do with performance.
The best way to go here is to actually understand, on a conceptual level, how the code is going to be transformed by a compiler into machine code. Not only does that allow you to escape the difficulties of creating a good benchmark, but it also has value above and beyond the numbers. A decent programmer knows how to write code that produces correct results; a good programmer knows what is happening under the hood (and then makes an informed decision about whether or not they need to care).
There has been some speculation about whether the compiler will transform form "A" and form "B" into equivalent code. It turns out that the answer is complicated. The IL will almost certainly be different because it will be a more or less literal translation of the C# code that you actually write, regardless of whether or not optimizations are enabled. But it turns out that you really don't care about that, because IL isn't executed directly. It's only executed after the JIT compiler gets done with it, and the JIT compiler will apply its own set of optimizations. The exact optimizations depend on exactly what type of code you've written. If you have:
int A1(bool condition)
{
if (condition) return 42;
return 0;
}
int A2(bool condition)
{
if (!condition) return 0;
return 42;
}
it is very likely that the optimized machine code will be the same. In fact, even something like:
void B1(bool condition)
{
if (condition)
{
DoComplicatedThingA();
DoComplicatedThingB();
}
else
{
throw new ArgumentException();
}
}
void B2(bool condition)
{
if (!condition)
{
throw new ArgumentException();
}
DoComplicatedThingA();
DoComplicatedThingB();
}
will be treated as equivalent in the hands of a sufficiently capable optimizer. It is easy to see why: they are equivalent. It is trivial to prove that one form can be rewritten in the other without changing the semantics or behavior, and that is precisely what an optimizer's job is.
But let's assume that they did give you different machine code, either because you wrote complicated enough code that the optimizer couldn't prove that they were equivalent, or because your optimizer was just falling down on the job (which can sometimes happen with a JIT optimizer, since it prioritizes speed of code generation over maximally efficient generated code). For expository purposes, we'll imagine that the machine code is something like the following (vastly simplified):
C1:
cmp condition, 0 // test the value of the bool parameter against 0 (false)
jne ConditionWasTrue // if true (condition != 0), jump elsewhere;
// otherwise, fall through
call DoComplicatedStuff // condition was false, so do some stuff
ret // return
ConditionWasTrue:
call ThrowException // condition was true, throw an exception and never return
C2:
cmp condition, 0 // test the value of the bool parameter against 0 (false)
je ConditionWasFalse // if false (condition == 0), jump elsewhere;
// otherwise, fall through
call DoComplicatedStuff // condition was true, so do some stuff
ret // return
ConditionWasFalse:
call ThrowException // condition was false, throw an exception and never return
That cmp instruction is equivalent to your if test: it checks the value of condition and determines whether it's true or false, implicitly setting some flags inside the CPU. The next instruction is a conditional branch: it branches to the specified location/label based on the values of one or more flags. In this case, je is going to jump if the "equals" flag is set, while jne is going to jump if the "equals" flag is not set. Simple enough, right? This is exactly how it works on the x86 family of processors, which is probably the CPU for which your JIT compiler is emitting code.
And now we get to the heart of the question that you're really trying to ask; namely, does it matter whether we execute a je instruction to jump if the comparison set the equal flag, or whether we execute a jne instruction to jump if the comparison did not set the equal flag? Again, unfortunately, the answer is complicated, but enlightening.
Before continuing, we need to develop some understanding of branch prediction. These conditional jumps are branches to some arbitrary section in the code. A branch can either be taken (which means the branch actually happens, and the processor begins executing code found at a completely different location), or it can be not taken (which means that execution falls through to the next instruction as if the branch instruction wasn't even there). Branch prediction is very important because mispredicted branches are very expensive on modern processors with deep pipelines that use speculative execution. If it predicts right, it continues uninterrupted; however, if it predicts wrong, it has to throw away all of the code that it speculatively executed and start over. Therefore, a common low-level optimization technique is replacing branches with clever branchless code in cases where the branch is likely to be mispredicted. A sufficiently smart optimizer would turn if (condition) { return 42; } else { return 0; } into a conditional move that didn't use a branch at all, regardless of which way you wrote the if statement, making branch prediction irrelevant. But we're imagining that this didn't happen, and you actually have code with a conditional branch—how does it get predicted?
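(As a brief aside before answering that question: here is a rough C# sketch of the branch-free rewrite mentioned above. It is illustrative only; whether the JIT actually emits a conditional move for the plain if or ternary form depends on the JIT version and the surrounding code.)
// The straightforward form: naive codegen contains a conditional branch.
static int WithBranch(bool condition)
{
    if (condition) return 42;
    return 0;
}

// A hand-written branch-free equivalent: convert the bool to 0 or 1 and
// multiply, leaving nothing for the branch predictor to mispredict.
// Whether this is actually faster depends entirely on the JIT and the CPU.
static int WithoutBranch(bool condition)
{
    return System.Convert.ToInt32(condition) * 42;
}
The point is simply that when the branch disappears, so does the prediction question.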
How branch prediction works is complicated, and getting more complicated all the time as CPU vendors continue to improve the circuitry and logic inside of their processors. Improving branch prediction logic is a significant way that hardware vendors add value and speed to the things they're trying to sell, and every vendor uses different and proprietary branch-prediction mechanisms. Worse, every generation of processor uses slightly different branch-prediction mechanisms, so reasoning about it in the "general case" is exceedingly difficult. Static compilers offer options that allow you to optimize the code they generate for a particular generation of microprocessor, but this doesn't generalize well when shipping code to a large number of clients. You have little choice but to resort to a "general purpose" optimization strategy, although this usually works pretty well. The big promise of a JIT compiler is that, because it compiles the code on your machine right before you use it, it can optimize for your specific machine, just like a static compiler invoked with the perfect options. This promise hasn't exactly been reached, but I won't digress down that rabbit hole.
All modern processors have dynamic branch prediction, but how exactly they implement it is variable. Basically, they "remember" whether a particular (recent) branch was taken or not taken, and then predict that it will go this way the next time. There are all kinds of pathological cases that you can imagine here, and there are, correspondingly, all kinds of cases in or approaches to the branch-prediction logic that help to mitigate the possible damage. Unfortunately, there isn't really anything you can do yourself when writing code to mitigate this problem—except getting rid of branches entirely, which isn't even an option available to you when writing in C# or other managed languages. The optimizer will do whatever it will; you just have to cross your fingers and hope that it is the most optimal thing. In the code we're considering, then, dynamic branch prediction is basically irrelevant and we won't talk about it any more.
What is important is static branch prediction—what prediction is the processor going to make the first time it executes this code, the first time it encounters this branch, when it doesn't have any real basis on which to make a decision? There are a bunch of plausible static prediction algorithms:
Predict all branches are not taken (some early processors did, in fact, use this).
Assume "backwards" conditional branches are taken, while "forwards" conditional branches are not taken. The improvement here is that loops (which jump backwards in the execution stream) will be correctly predicted most of the time. This is the static branch-prediction strategy used by most Intel x86 processors, up to about Sandy Bridge.
Because this strategy was used for so long, the standard advice was to arrange your if statements accordingly:
if (condition)
{
// most likely case
}
else
{
// least likely case
}
This possibly looks counter-intuitive, but you have to go back to what the machine code looks like that this C# code will be transformed into. Compilers will generally transform the if statement into a comparison and a conditional branch into the else block. This static branch prediction algorithm will predict that branch as "not taken", since it's a forward branch. The if block will just fall through without taking the branch, which is why you want to put the "most likely" case there.
If you get into the habit of writing code this way, it might have a performance advantage on certain processors, but it's never enough of an advantage to sacrifice readability. Especially since it only matters the first time the code is executed (after that, dynamic branch prediction kicks in), and executing code for the first time is always slow in a JIT-compiled language!
Always use the dynamic predictor's result, even for never-seen branches.
This strategy is pretty strange, but it's actually what most modern Intel processors use (circa Ivy Bridge and later). Basically, even though the dynamic branch-predictor may have never seen this branch and therefore may not have any information about it, the processor still queries it and uses the prediction that it returns. You can imagine this as being equivalent to an arbitrary static-prediction algorithm.
In this case, it absolutely does not matter how you arrange the conditions of an if statement, because the initial prediction is essentially going to be random. Some 50% of the time, you'll pay the penalty of a mispredicted branch, while the other 50% of the time, you'll benefit from a correctly predicted branch. And that's only the first time—after that, the odds get even better because the dynamic predictor now has more information about the nature of the branch.
This answer has already gotten way too long, so I'll refrain from discussing static prediction hints (implemented only in the Pentium 4) and other such interesting topics, bringing our exploration of branch prediction to a close. If you're interested in more, examine the CPU vendor's technical manuals (although most of what we know has to be empirically determined), read Agner Fog's optimization guides (for x86 processors), search online for various white-papers and blog posts, and/or ask additional questions about it.
The takeaway is probably that it doesn't matter, except on processors that use a certain static branch-prediction strategy, and even there, it hardly matters when you're writing code in a JIT-compiled language like C# because the first-time compilation delay exceeds the cost of a single mispredicted branch (which may not even be mispredicted).
Same issue when validating parameters to functions.
It's much cleaner to act like a night-club bouncer, kicking the no-hopers out as soon as possible.
public void aMethod(SomeParam p)
{
if (!aBoolean || p == null)
return;
// Write code in the knowledge that everything is fine
}
Letting them in only causes trouble later on.
public void aMethod(SomeParam p)
{
if (aBoolean)
{
if (p != null)
{
// Write code, but now you're indented
// and other if statements will be added later
}
// Later on, someone else could add code here by mistake.
}
// or here...
}
The C# language prioritizes safety (bug prevention) over speed. In other words, almost everything has been slowed down to prevent bugs, one way or another.
If you need speed so badly that you start worrying about if statements, then perhaps a faster language would suit your purposes better, possibly C++.
Compiler writers can and do make use of statistics to optimize code, for example "else clauses are only executed 30% of the time".
However, the hardware guys probably do a better job of predicting execution paths. I would guess that these days, the most effective optimizations happen within the CPU, with its L1 and L2 caches, and compiler writers don't need to do a thing.
I am just curious about how each piece of each pattern is implemented
by the compiler, e.g. for arguments sake, if it was compiled verbatim
with no compiler optimizations, which would be more efficient?
The best way to test efficiency in this way is to run benchmarks on the code samples you're concerned with. With C# in particular it is not going to be obvious what the JIT is doing with these scenarios.
As a side note, I'll throw in a +1 for the other answers that point out that efficiency isn't determined only at the compiler level; code maintainability will buy you orders of magnitude more efficiency than this specific sort of pattern choice.
As Dark Falcon mentioned, you should not be concerned with micro-optimization of little bits of code; the compiler will most probably optimize both approaches to the same thing.
Instead, you should be very concerned with your program's maintainability and ease of reading.
From this perspective you should choose B for two reasons:
It only has one exit point (just one return)
The if block is surrounded by curly braces
Edit:
But hey! As noted in the comments, that is just my opinion and what I consider good practice.
Which is better for accessing a property value?
Accessing it like this:
propertyobjA.objB.Prop1
propertyobjA.objB.Prop2
or assigning it to a variable first:
var objB = propertyobjA.objB;
and then calling objB.Prop1 and objB.Prop2?
Which one gives better performance in C#?
To be perfectly honest, the answer is likely that the second will be faster, but I can pretty much guarantee that it will not matter in the slightest. You should be careful of thinking too hard about optimisation too early. 99% of all performance issues are down to much larger problems, such as hitting a database too frequently, not trivial issues like this. Even if there were a tiny difference between the two cases, unless this is some of the most time-critical software on the planet, what matters is readability (and neither is hard to read in this case), not which is faster.
It depends on what objB is. If the getter calculates something (which you shouldn't do, but can), then of course assigning the result to a variable will yield better performance.
Another note: you should avoid depending on sub-properties of a variable, since you are creating tighter coupling between the classes.
I think this won't make a big difference performance-wise (the second alternative might be a bit faster). But this is not the place where your performance problems (if any) come from.
UPDATE: Thinking about it, the value of propertyobjA.objB could change between getting Prop1 and Prop2, so the two alternatives cannot be considered to be the same code.
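A contrived sketch (with hypothetical types, not the asker's actual classes) of the kind of getter where the two alternatives genuinely differ:
class ObjB
{
    public int Prop1 { get; set; }
    public int Prop2 { get; set; }
}

class ObjA
{
    // The backing field may be swapped out at any time, e.g. by another thread.
    private volatile ObjB _current = new ObjB();

    public ObjB objB
    {
        get { return _current; }
    }

    public void Swap(ObjB next) { _current = next; }
}

// propertyobjA.objB.Prop1 and propertyobjA.objB.Prop2 may read from two
// different ObjB instances, whereas
//   var objB = propertyobjA.objB;
//   var p1 = objB.Prop1;
//   var p2 = objB.Prop2;
// always reads both properties from the same instance.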
The impact on performance largely depends on the implementation of the propertyObjA.objB property getter. For instance, if it is simply implemented as:
public Foo objB { get { return this._objB; } }
Then calling that twice will have a negligible impact on performance.
If, however, that same property did something computationally expensive, then your second suggestion would perform better.
That being said, the framework design guidelines state that you should not use property getters to hide potentially computationally expensive operations, preferring a method call instead, e.g.:
public Foo ComputeB();
You really ought not to concern yourself with things like that when writing code in a higher-level language such as C#.
Modern compilers for languages like C# and Java are extremely sophisticated and will perform all kinds of optimizations on your code. The end result for you as a developer is that you will never see a difference in performance when writing a particular trivial piece of code one way or the other. The compiler will pick the most optimal way.
Everything else is down to preference. If you like to chain several property accesses, that's fine. If you like to assign an intermediate result to a variable to improve readability of your code, that's fine too.
The question is simple: which is faster, CalledOften1 or CalledOften2?
class MyTest
{
public bool test = false;
void CalledOften1()
{
if (!test) test = true;
DoSomething();
}
void CalledOften2()
{
test = true;
DoSomething();
}
}
Does the compiler optimize this (if possible) to avoid future assignments of test when it is already true?
UPDATE:
This question is just for information; I will not use the if (bla) style when I can simply write test = true, as I prefer code readability.
I prefer to measure for these sorts of questions rather than guess:
CalledOften1: 52 million operations per second
CalledOften2: 53 million operations per second
So they are nearly the same. If anything, the simpler method is also the faster.
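For reference, here is a minimal sketch of the kind of harness behind numbers like these (illustrative only, not the exact benchmark code):
using System;
using System.Diagnostics;

static double MeasureOpsPerSecond(Action action, int iterations)
{
    action(); // warm up so JIT compilation is not included in the timing
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < iterations; i++)
        action();
    sw.Stop();
    return iterations / sw.Elapsed.TotalSeconds;
}

// var t = new MyTest();
// Console.WriteLine(MeasureOpsPerSecond(t.CalledOften1, 50_000_000));
// Console.WriteLine(MeasureOpsPerSecond(t.CalledOften2, 50_000_000));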
This is a perfect example of premature optimization.
If you want to set test to true every time, just set it. Don't complicate your code for a theorized speedup.
That being said, the second example's smaller instruction count, along with it being simpler and more maintainable, most likely makes it faster, since it avoids branching. A single assignment of a bool is a very fast operation. If you really need to know how much faster it may be, I would suggest profiling it yourself. However, I suspect that either would be fast enough in any case.
I would expect the second version to be slightly faster, given that it doesn't involve any branching. It also expresses the intention of "make sure the variable is true, whatever it was before" more clearly IMO. However:
I doubt that it's significant
Any number of actual changes in context could make the results change (including your code, or the version of the framework you're running against)
Write the clearest code first, and optimize later
Benchmark this against your real code, under realistic conditions before you decide to change anything
The compiler only optimizes things that are definite at compile time. This value changes at runtime, so the answer is no; the compiler could optimize it only if you were checking against a constant. CalledOften1 is faster, but the magnitude is so small that you would not notice. This is the kind of micro-optimisation you should avoid.
If I had to guess, I would say that CalledOften2 is more optimized, as there is no logical test performed.
In the end, if you are looking at this level of optimization, then your application will probably go as fast as it can. Any performance gain you get out of this type of optimization will likely never be noticed by anyone.
My two cents,
Brian
Premature optimization is the root of all evil. Use the one that expresses your intent most clearly.
(I'm guessing a read+branch is going to be more expensive than just a write, but don't really know the CLR. The important thing is that computers are increasing in speed exponentially, and programmers aren't. Algorithmic improvements in performance bottlenecks are worth exploring, barely measurable constant-time improvements for their own sake aren't.)