Why doesn't hoisting exist in C#? - c#

I use both JavaScript and C# on a daily basis and I sometimes have to consider hoisting when using JavaScript. However, C# doesn't seem to implement hoisting (that I know of) and I can't figure out why. Is it more of a design choice or is it more akin to a security or language constraint that applies to all statically typed languages?
For the record, I'm not saying I WANT it to exist in C#. I just want to understand why it doesn't.
EDIT: I noticed the issue when I declared a variable after a LINQ query, but the LINQ query was deferred until after the variable declaration.
var results = from c in db.LoanPricingNoFee
              where c.LoanTerm == LoanTerm
                    && c.LoanAdvance <= UpperLimit
              orderby c.LoanInstalment ascending
              select c;
int LoanTerm = 12;
Throws an error whereas:
int LoanTerm = 12;
var results = from c in db.LoanPricingNoFee
              where c.LoanTerm == LoanTerm
                    && c.LoanAdvance <= UpperLimit
              orderby c.LoanInstalment ascending
              select c;
Does not.
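To illustrate the deferral mentioned in the edit, here is a minimal sketch using LINQ to Objects. The array terms is a made-up stand-in for db.LoanPricingNoFee, and DeferredQueryDemo is a hypothetical name. It suggests that the variable has to be declared before the query text, even though its value is only read when the query is enumerated.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredQueryDemo
{
    // Defines a query before loanTerm is reassigned, then enumerates it afterwards.
    public static List<int> Run()
    {
        int[] terms = { 12, 24, 36 };   // hypothetical stand-in for db.LoanPricingNoFee
        int loanTerm = 12;              // must be declared before the query text

        // Building the query only captures the variable; nothing is filtered yet.
        IEnumerable<int> query = from t in terms
                                 where t == loanTerm
                                 select t;

        loanTerm = 24;                  // changed before the query is enumerated

        // Enumeration happens here, so the filter sees the current value: [24].
        return query.ToList();
    }
}
```

So deferral affects when the value is read, not where the name is in scope.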

Of all the programming languages I have used, JavaScript has the most confusing scope system and hoisting is a part of that. The outcome is that it is easy to write unpredictable code in JavaScript and you have to be careful with how you write it to make it into the powerful and expressive language it can be.
C#, in common with almost every other language, assumes that you will not use a variable until you have declared it. Because it has a compiler it can enforce that by simply refusing to compile if you try to use an undeclared variable. The other approach to this, more often seen in scripting languages, is that if a variable is used without having been declared it is instantiated at first use. This can make it somewhat hard to follow the flow of code and is often used as a criticism of languages that behave that way. Most people who have used languages with block-level scope (where variables only exist at the level where they were declared) find it a particularly weird feature of JavaScript.
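As a small sketch of that compile-time check (DeclarationOrderDemo and the variable names are made up for illustration), the commented-out line below is the kind of use the C# compiler rejects:

```csharp
public static class DeclarationOrderDemo
{
    public static int Run()
    {
        // Uncommenting the next line fails to compile with error CS0841
        // ("Cannot use local variable 'loanTerm' before it is declared"):
        // int tooEarly = loanTerm * 2;

        int loanTerm = 12;
        int doubled = loanTerm * 2;   // fine: used after its declaration
        return doubled;
    }
}
```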
A couple of big reasons that hoisting can cause problems:
It is absolutely counter-intuitive and makes code harder to read and its behaviour harder to predict unless you are conscious of this behaviour. Hard to read and hard to predict code is far more likely to include bugs.
In terms of limiting the number of bugs in your code, limiting the lifetime of your variables can be really helpful. If you can declare the variable and use it in two lines of code, then having ten lines of code in between those two lines gives a lot of opportunities to accidentally affect the behaviour of the variable. There is a lot of information on this in Code Complete - if you haven't read that, I heartily recommend it.
There is a classic UX concept of the Principle Of Least Astonishment - features like hoisting (or like the way JavaScript handles equality) tend to break that. People don't often think of user experience when developing programming languages, but actually programmers tend to be quite discerning users and more than a little grumpy when they find themselves routinely caught out by odd features. JavaScript is very lucky that its unique ubiquity in the browser has created a kind of enforced popularity that means we have to tolerate its many quirks and problematic design decisions.
Finally, I cannot imagine a reason why it would be a useful addition to a language like C# - what possible benefit could it confer?

"Is it more of a design choice or is it more akin to a security or language constraint that applies to all statically typed languages?"
It's not a constraint of static typing. It would be trivial for the compiler to move all variable declarations to the top of the scope (in Javascript this is the top of the function, in C# the top of the current block) and to error if a name was declared with different types.
So the reason hoisting doesn't exist in C# is purely a design decision. Why it was designed that way I can't say; I wasn't on the team. But it was probably due to the ease of parsing (both for human programmers and the compiler) if variables are always declared before use.

There is a form of hoisting that exists in C# (and Java), in the context of loop-invariant code motion: a JIT compiler optimization which "hoists" (pulls up) expressions out of loops when they don't affect the loop itself.
You can learn more about it here.
Quote:
“Hoisting” is a compiler optimization that moves loop-invariant code
out of loops. “Loop-invariant code” is code that is referentially
transparent to the loop and can be replaced with its values, so that
it doesn’t change the semantic of the loop. This optimization improves
runtime performance by executing the code only once rather than at
each iteration.
So this written code
public void Update(int[] arr, int x, int y)
{
    for (var i = 0; i < arr.Length; i++)
    {
        arr[i] = x + y;
    }
}
is actually optimized to be somewhat like this:
public void Update(int[] arr, int x, int y)
{
    var temp = x + y;
    var length = arr.Length;
    for (var i = 0; i < length; i++)
    {
        arr[i] = temp;
    }
}
This happens in the JIT, i.e. when translating the IL into native machine instructions, so it's not so easy to view (you can check here, and here).
I'm not an expert in reading assembly, but here is what I got from running this snippet with BenchmarkDotNet, and my comments on it showing that the optimization actually took place:
int[] arr = new int[10];
int x = 11;
int y = 19;
public void Update()
{
    for (var i = 0; i < arr.Length; i++)
    {
        arr[i] = x + y;
    }
}
Generated:

Because it is a faulty concept, most probably existing because of the rushed implementation of JavaScript. It is a bad approach to coding, which can mislead even an experienced JavaScript coder about the scope of a variable.

Hoisting has a potentially unnecessary cost in the work that the compiled code has to do. For example, if a variable declaration is never even reached because control flow returned from the function earlier, then the processor does not need to waste time pushing an undefined null-reference variable onto the stack and then popping it again as part of its method's clean-up when the declaration wasn't even reached.
Also, remember that JavaScript has "variable hoisting" and "function hoisting" (among others) which are treated differently. Function hoisting wouldn't make sense in C# since it is not a top-down interpreted language. Once the code is compiled, the method might not ever be called. In JavaScript, however, the "self-invoking" functions are evaluated immediately as the interpreter parses them.
I doubt that it was an arbitrary design decision: Not only is hoisting inefficient for C#, but it just wouldn't make sense for the way that C# works.

Related

Do functions slow down performance

I am using C# to go through a loop and do something (this loop is massive, sometimes as big as 1,000,000 records long). I wanted to replace the inline code with code that does the exact same thing, except in a function.
I am guessing there is a slight decrease in performance, but will it actually be noticeable?
If I have a loop:
public void main()
{
    int x = 0;
    for (int i = 0; i < 1000; i++)
    {
        x += 1;
    }
}
Would my loop slow down if I did the same thing except this time making use of a function?
public void main()
{
    int x = 0;
    for (int i = 0; i < 1000; i++)
    {
        x = incrementInt(x);
    }
}
public int incrementInt(int x)
{
    return x + 1;
}
EDIT:
Fixed logic bug, sorry for that.
A method call will always slow you down. But the JIT compiler can inline your method if a set of conditions is fulfilled, which results in assembly code equivalent to your first example (if you fix the logic bug in your example).
The question you are indirectly asking is: under which circumstances is my method inlined? There are many different rules, but the easiest way to be sure that inlining works is to measure it.
You can also use PerfView to find out for each method why it was not inlined. Since .NET 4.5 you can give the JIT compiler a hint to relax some of the rules and inline a method (MethodImplOptions.AggressiveInlining).
See http://blogs.microsoft.co.il/sasha/2012/01/20/aggressive-inlining-in-the-clr-45-jit/
There are some conditions described which prevent inlining:
Methods marked with MethodImplOptions.NoInlining
Methods larger than 32 bytes of IL
Virtual methods
Methods that take a large value type as a parameter
Methods on MarshalByRef classes
Methods with complicated flowgraphs
Methods meeting other, more exotic criteria
If you follow the rules and measure carefully you can write highly performant code while keeping readable and maintainable code.
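The hint and the opt-out from the list above can be sketched as follows. This is only an illustration: InliningDemo and its method names are made up, the attribute is a request rather than a guarantee, and the Stopwatch helper is a rough single-shot timer, not a proper benchmark.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

public static class InliningDemo
{
    // Asks the JIT to inline this method if it can (available since .NET 4.5).
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static int IncrementInlined(int x) { return x + 1; }

    // Forbids inlining, so every iteration pays for a real call.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int IncrementNotInlined(int x) { return x + 1; }

    public static int RunInlined(int iterations)
    {
        int x = 0;
        for (int i = 0; i < iterations; i++) x = IncrementInlined(x);
        return x;
    }

    public static int RunNotInlined(int iterations)
    {
        int x = 0;
        for (int i = 0; i < iterations; i++) x = IncrementNotInlined(x);
        return x;
    }

    // Times a single run of the given loop; a real measurement would warm up
    // the JIT first and repeat the run several times.
    public static TimeSpan Time(Func<int, int> loop, int iterations)
    {
        var stopwatch = Stopwatch.StartNew();
        loop(iterations);
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

Comparing InliningDemo.Time(InliningDemo.RunInlined, 1000000) with InliningDemo.Time(InliningDemo.RunNotInlined, 1000000) in a release build gives a first impression; the numbers will vary by runtime and machine.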
I have written a test application and run the performance analyzer on the code and the function call is slower than the loop (Although as mentioned above the two do different things.)
It is very simple to analyze these things in VS2012. Just click the "ANALYZE" menu item and select "Start Performance Analysis".
Calling a function is slower than not calling it, but you can really ignore this.

In C#, does copying a member variable to a local stack variable improve performance?

I quite often write code that copies member variables to a local stack variable in the belief that it will improve performance by removing the pointer dereference that has to take place whenever accessing member variables.
Is this valid?
For example
public class Manager {
    private readonly Constraint[] mConstraints;
    public void DoSomethingPossiblyFaster()
    {
        var constraints = mConstraints;
        for (var i = 0; i < constraints.Length; i++)
        {
            var constraint = constraints[i];
            // Do something with it
        }
    }
    public void DoSomethingPossiblySlower()
    {
        for (var i = 0; i < mConstraints.Length; i++)
        {
            var constraint = mConstraints[i];
            // Do something with it
        }
    }
}
My thinking is that DoSomethingPossiblyFaster is actually faster than DoSomethingPossiblySlower.
I know this is pretty much a micro optimisation, but it would be useful to have a definitive answer.
Edit
Just to add a little bit of background around this. Our application has to process a lot of data coming from telecom networks, and this method is likely to be called about 1 billion times a day for some of our servers. My view is that every little helps, and sometimes all I am trying to do is give the compiler a few hints.
Which is more readable? That should usually be your primary motivating factor. Do you even need to use a for loop instead of foreach?
As mConstraints is readonly I'd potentially expect the JIT compiler to do this for you - but really, what are you doing in the loop? The chances of this being significant are pretty small. I'd almost always pick the second approach simply for readability - and I'd prefer foreach where possible. Whether the JIT compiler optimizes this case will very much depend on the JIT itself - which may vary between versions, architectures, and even how large the method is or other factors. There can be no "definitive" answer here, as it's always possible that an alternative JIT will optimize differently.
If you think you're in a corner case where this really matters, you should benchmark it - thoroughly, with as realistic data as possible. Only then should you change your code away from the most readable form. If you're "quite often" writing code like this, it seems unlikely that you're doing yourself any favours.
Even if the readability difference is relatively small, I'd say it's still present and significant - whereas I'd certainly expect the performance difference to be negligible.
If the compiler/JIT isn't already doing this or a similar optimization for you (this is a big if), then DoSomethingPossiblyFaster should be faster than DoSomethingPossiblySlower. The best way to explain why is to look at a rough translation of the C# code to straight C.
When a non-static member function is called, a hidden pointer to this is passed into the function. You'd have roughly the following, ignoring virtual function dispatch since it's irrelevant to the question (or equivalently making Manager sealed for simplicity):
struct Manager {
    Constraint* mConstraints;
    int mLength;
};
void DoSomethingPossiblyFaster(Manager* this) {
    // Copy both fields into locals once, before the loop
    Constraint* constraints = this->mConstraints;
    int length = this->mLength;
    for (int i = 0; i < length; i++)
    {
        Constraint constraint = constraints[i];
        // Do something with it
    }
}
void DoSomethingPossiblySlower(Manager* this)
{
    // Both fields are read through the hidden this pointer on every iteration
    for (int i = 0; i < this->mLength; i++)
    {
        Constraint constraint = (this->mConstraints)[i];
        // Do something with it
    }
}
The difference is that in DoSomethingPossiblyFaster, mConstraints lives on the stack and access only requires one layer of pointer indirection, since it's at a fixed offset from the stack pointer. In DoSomethingPossiblySlower, if the compiler misses the optimization opportunity, there's an extra pointer indirection. The compiler has to read a fixed offset from the stack pointer to access this and then read a fixed offset from this to get mConstraints.
There are two possible optimizations that could negate this hit:
The compiler could do exactly what you did manually and cache mConstraints on the stack.
The compiler could store this in a register so that it doesn't need to fetch it from the stack on every loop iteration before dereferencing it. This means that fetching mConstraints from this or from the stack is basically the same operation: A single dereference of a fixed offset from a pointer that's already in a register.
You know the response you will get, right? "Time it."
There is probably not a definitive answer. First, the compiler might do the optimization for you. Second, even if it doesn't, indirect addressing at the assembly level may not be significantly slower. Third, it depends on the cost of making the local copy, compared to the number of loop iterations. Then there are caching effects to consider.
I love to optimize, but this is one place I would definitely say wait until you have a problem, then experiment. This is a possible optimization that can be added when needed, not one of those optimizations that needs to be planned up front to avoid a massive ripple effect later.
Edit: (towards a definitive answer)
Compiling both functions in release mode and examining the IL with IL Dasm shows that in both places the "PossiblyFaster" function uses the local variable, it has one less instruction
ldloc.0 vs
ldarg.0; ldfld class Constraint[] Manager::mConstraints
Of course, this is still one level removed from the machine code - you don't know what the JIT compiler will do for you. But it is likely that "PossiblyFaster" is marginally faster.
However, I still don't recommend adding the extra variable until you are sure this function is the most expensive thing in your system.
I've profiled this and came up with a bunch of interesting results that are probably only valid for my specific example, but I thought they would be worthwhile noting here.
The fastest is X86 release mode. That runs one iteration of my test in 7.1 seconds, whereas the equivalent X64 code takes 8.6 seconds. This was running 5 iterations, each iteration processing the loop 19.2 million times.
The fastest approach for the loop was:
foreach (var constraint in mConstraints)
{
    ... do stuff ...
}
The second fastest approach, which massively surprised me, was the following:
for (var i = 0; i < mConstraints.Length; i++)
{
    var constraint = mConstraints[i];
    ... do stuff ...
}
I guess this was because mConstraints was stored in a register for the loop.
This slowed down when I removed the readonly option for mConstraints.
So, my summary from this is that being readable in this situation does give performance as well.

Does the C# compiler optimize Count properties?

List<int> list = ...
for(int i = 0; i < list.Count; ++i)
{
    ...
}
So does the compiler know the list.Count does not have to be called each iteration?
Are you sure about that?
List<int> list = new List<int> { 0 };
for (int i = 0; i < list.Count; ++i)
{
    if (i < 100)
    {
        list.Add(i + 1);
    }
}
If the compiler cached the Count property above, the contents of list would be 0 and 1. If it did not, the contents would be the integers from 0 to 100.
Now, that might seem like a contrived example to you; but what about this one?
List<int> list = new List<int>();
int i = 0;
while (list.Count <= 100)
{
    list.Add(i++);
}
It may seem as if these two code snippets are completely different, but that's only because of the way we tend to think about for loops versus while loops. In either case, the value of a variable is checked on every iteration. And in either case, that value very well could change.
Typically it's not safe to assume the compiler optimizes something when the behavior between "optimized" and "non-optimized" versions of the same code is actually different.
The C# compiler does not do any optimizations like this. The JIT compiler, however, optimizes this for arrays, I believe (which are not resizable), but not for lists.
A List's count property can change within the loop structure, so it would be an incorrect optimization.
It's worth noting, as nobody else has mentioned it, that there is no knowing from looking at a loop like this what the "Count" property will actually do, or what side effects it may have.
Consider the following cases:
A third party implementation of a property called "Count" could execute any code it wished to. e.g. return a Random number for all we know. With List we can be a bit more confident about how it will operate, but how is the JIT to tell these implementations apart?
Any method call within the loop could potentially alter the return value of Count (not just a straight "Add" directly on the collection, but a user method that is called in the loop might also party on the collection)
Any other thread that happens to be executing concurrently could also change the Count value.
The JIT just can't "know" that Count is constant.
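A small sketch of the first case, assuming a made-up type (CountingCollection is hypothetical, not a library class): a Count getter is ordinary code, and here it records every call, so skipping calls would change observable behaviour.

```csharp
public sealed class CountingCollection
{
    private readonly int size;

    public CountingCollection(int size) { this.size = size; }

    // Number of times the Count getter has been executed.
    public int CountCalls { get; private set; }

    // A property getter can run arbitrary code; this one counts its own calls.
    public int Count
    {
        get
        {
            CountCalls++;
            return size;
        }
    }
}

public static class CountDemo
{
    public static int Run()
    {
        var items = new CountingCollection(3);
        for (int i = 0; i < items.Count; i++)
        {
            // loop body intentionally empty
        }
        // The condition was evaluated for i = 0, 1, 2 and 3, so this returns 4.
        return items.CountCalls;
    }
}
```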
However, the JIT compiler can make the code run much more efficiently by inlining the implementation of the Count property (as long as it is a trivial implementation). In your example it may well be inlined down to a simple test of a variable value, avoiding the overhead of a function call on each iteration, and thus making the final code nice and fast. (Note: I don't know if the JIT will do this, just that it could. I don't really care - see the last sentence of my answer to find out why)
But even with inlining, the value may still be changed between iterations of the loop, so it would still need to be read from RAM for each comparison. If you were to copy Count into a local variable and the JIT could determine by looking at the code in the loop that the local variable will remain constant for the loop's lifetime, then it may be able to further optimise it (e.g. by holding the constant value in a register rather than having to read it from RAM on each iteration). So if you (as a programmer) know that Count will be constant for the lifetime of the loop, you may be able to help the JIT by caching Count in a local variable. This gives the JIT the best chance of optimising the loop. (But there are no guarantees that the JIT will actually apply this optimisation, so it may make no difference to the execution times to manually "optimise" this way. You also risk things going wrong if your assumption (that Count is constant) is incorrect. Or your code may break if another programmer edits the contents of the loop so that Count is no longer constant, and he doesn't spot your cleverness)
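The manual caching described above, and the risk that comes with it, might look like this. It is a sketch with made-up names (CachedCountDemo, Sum, GrowWithCachedCount); whether it is any faster depends on the JIT, so it shows the shape of the change rather than a recommended optimisation.

```csharp
using System.Collections.Generic;

public static class CachedCountDemo
{
    // Safe: the loop body never changes the list, so Count really is constant.
    public static int Sum(List<int> list)
    {
        int count = list.Count;   // read once, before the loop
        int sum = 0;
        for (int i = 0; i < count; i++)
        {
            sum += list[i];
        }
        return sum;
    }

    // Risky: the body adds to the list, so the cached value goes stale and
    // the loop stops after one iteration, leaving only 0 and 1 in the list.
    public static List<int> GrowWithCachedCount()
    {
        var list = new List<int> { 0 };
        int count = list.Count;   // 1, read once
        for (int i = 0; i < count; ++i)
        {
            if (i < 100)
            {
                list.Add(i + 1);
            }
        }
        return list;
    }
}
```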
So the moral of the story is: The JIT can make a pretty good stab at optimising this case by inlining. Even if it doesn't do this now, it may do it with the next C# version. You might not gain any advantage by manually "optimising" the code, and you risk changing its behaviour and thus breaking it, or at least making future maintenance of your code more risky, or possibly losing out on future JIT enhancements. So the best approach is to just write it the way you have, and optimise it when your profiler tells you that the loop is your performance bottleneck.
Hence, IMHO it's interesting to consider/understand cases like this, but ultimately you don't actually need to know. A little bit of knowledge can be a dangerous thing. Just let the JIT do its thing, and then profile the result to see if it needs improving.
If you take a look at the IL generated for Dan Tao's example you'll see a line like this at the condition of the loop:
callvirt instance int32 [mscorlib]System.Collections.Generic.List`1<int32>::get_Count()
This is undeniable proof that Count (i.e. get_Count()) is called for every iteration of the loop.
For all the other commenters who say that the 'Count' property could change in a loop body: JIT optimizations let you take advantage of the actual code that's running, not the worst-case of what might happen. In general, the Count could change. But it doesn't in all code.
So in the poster's example (which might not have any Count-changing), is it unreasonable for the JIT to detect that the code in the loop doesn't change whatever internal variable List uses to hold its length? If it detects that list.Count is constant, wouldn't it lift that variable access out of the loop body?
I don't know if the JIT does this or not. But I am not so quick to brush this problem off as trivially "never."
No, it doesn't, because the condition is evaluated on each iteration. It can be more complex than just a comparison with Count; any boolean expression is allowed:
for(int i = 0; new Random().NextDouble() < .5d; i++)
    Console.WriteLine(i);
http://msdn.microsoft.com/en-us/library/aa664753(VS.71).aspx
It depends on the particular implementation of Count; I've never noticed any performance issues with using the Count property on a List so I assume it's ok.
In this case you can save yourself some typing with a foreach.
List<int> list = new List<int>(){0};
foreach (int item in list)
{
    // ...
}

Do you use 1-3 letters variables EVERYWHERE? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
I notice that in C# I use very short variable names EVERYWHERE. My code is polluted with
foreach(var (v|f|i) in SOMETHING)
for(int (i|n|z)=0
var (ret|r) = blah();
...
return ret;
var sw = new StringWriter();
using(var r = cmd.ExecuteNonQuery()) {
while(r.Read()) {
r.GetSomething(3)
I don't know if this is bad or OK. I can certainly read it. I haven't looked at code from 5 months ago or older, so I can't say whether I understand my old code. All my functions are short and do one thing, so by reading the function you can get a good idea what each variable is, especially since there are <= 5 variables in a function.
People used to yell at me about not using good variable names. Am I not using good variable names, or is this OK?
Write code for humans to read. A good rule of thumb is the bigger the scope in which a variable is used, the more descriptive its name should be. Function parameters especially should have very descriptive names, with the exception of functions where it is obvious what the parameter does, as in
double sqrt(double n)
However, if it's something commonly given a short name and used in a small scope, then use a short name. Examples:
//these are okay
var sb = new StringBuilder();
var sw = new StringWriter();
var cmd = new SqlCommand();
for(var i = 0; i< count; i++) {}
Unless your code is minified, you shouldn't see vars like this all over the place. Your code should be effortlessly intelligible.
I recall hearing that we ought all to code as if the next person to manage our project is a psychopathic killer who knows where we live.
Using short variable names for local variables is okay as long as the scope is limited.
Personally, I find that for simple usage short concise variable names tend to be easier to read than longer ones.
using (StreamReader sr = new StreamReader(inputStream))
{
    sr.ReadByte();
}
As opposed to:
using (StreamReader streamReader = new StreamReader(inputStream))
{
    streamReader.ReadByte();
}
It's really all about readability. Every situation is different, and developer teams are different. Follow the coding standard for the project, if that exists. If not, follow the style of existing codebase, if that exists.
I agree with the answers here that say variables should have good names. But I believe that presupposes that an object has semantic value. Sometimes, it doesn't. In some cases, you just need an instance of a specific object to perform some small task, after which it becomes irrelevant. In cases like this, I believe that abbreviated identifiers are acceptable.
Note: Just because the usage of a variable is limited in its scope does not necessarily mean that a meaningless name is okay. If there is a good name that represents what the object does, then it should be used. If you can come up with a variable name that answers 'Why?', then that name is far preferable.
Also, using 'i' and 'j' for indexes is well understood by developers. By convention, loop counter variables have been named this way since the days of FORTRAN.
for (int i = 0; i < 10; i++)
{
    for (int j = 0; j < 10; j++)
    {
        PerformOperation(i,j);
    }
}
Some years ago I discovered what happens if I made my functions short:
I could understand them. My brain is small, and long functions don't fit.
Classes get complicated (lots of functions). But Extract Class produced small, cohesive, single-purpose classes. Again, small brain, so small classes required.
The number of variables in a function (or class) is small. Remembering which is which from declaration time to use time is easy, because the distance is short.
The number of scopes in my functions is small, so I don't have to figure out which variables go where.
With all of that in place, how I name my variables doesn't matter much. The names don't have to make up for code that is otherwise hard to understand.
Since the number of variables in a scope is small, and the purpose obvious, I rarely need to put any effort in to choosing a descriptive name. Since I don't want to strain my small brain any more than I have to, I never abbreviate. My default name for a variable is the name of the type. E.g. class Foo goes in variable foo. In my code, if it's ever something different, you know something special is happening, and you should pay attention.
In the old days, my no-abbreviation habit would have produced unwieldy code, but since my methods and classes are small, the code doesn't suffer.
It's not just a matter of good variable names (which is a good idea) but rather if someone else could make sense of what you've written relying on the comments as well as the variable names.
Sure, for things like counters or simple actions short and concise names make sense. For more complex algorithms or something that is a little harder to read, you'll want to elaborate to the extent that the code is clear.
Every shop and every developer is different. At the end of the day, try to write your code with consideration for the next guy that might have to maintain it.
Using one letter variable names as indexes in loops or in short well defined blocks is normally considered ok. However, there is nothing wrong with using descriptive camel case names that convey meaning to others reading your code, for things like function arguments and local variables.
With limited exceptions, no - this is not OK. There's just no excuse any longer for single letter or overly abbreviated variable names. Even if you're a hunt-and-peck typist, IntelliSense means you almost never have to spell anything out. If you continue to name variables this way you are punishing both yourself and anyone unfortunate enough to be tasked with maintaining your code.
Would I consider it a bad coding style? Well, yes.
If you were working with me on the same code I'd repeatedly remind you to name your variables better. In short, good code should be readable by other developers without much trouble and good variable names help a lot. Maybe you don't have problems reading your code even after a while, but the question is whether someone who has never worked on that code would be equally fine with it.
There are a few exceptions where I think that short variable names are okay:
Indexes (mostly in for loops) such as:
for (int i = 0; i < 10; i++)
{
}
Variables used in a very limited scope, such as LINQ queries, lambda expressions or some of the examples already mentioned, like StreamWriters and StreamReaders, are another case where I think that short variable names are fine.
Furthermore it's always a question of how readable your code eventually is. The reason why I would be constantly nagging people who use short variable names is that, for me, it is an indicator that they generally don't care about how readable their code is (especially for others).
I have no idea how you can keep track of things when you have variable names like that.
Generally, it's much better to have longer names that actually describe the variables. The thing to strive for is for anyone to be able to read the code and understand what's going on, and to understand what the variables are for =)
It seems like the average length of my variable names increases by one every year I spend writing (and more importantly reading) code.
It should be immediately clear what any variable is for just by looking at a few lines of code. This can either be due to a nice variable name or the context.
About the only time I use short variable names is either if a short name is entirely descriptive (ie, x & y in a situation dealing with coordinates) or it's a utility function that operates on a data type (ie, capitalize the first letter of this string. It's a string, what else can you say about it? I named it S.)
I might not know what 'r' is later on in the code. Also, variable names are one thing, but you should be commenting code for the verbose explanation.
NB: This should probably be a community wiki as there's no definite answer.
This is bad. This is unmaintainable.
Short variables have their place. There is really no reason to write
for(int iterator; iterator
The rule of thumb is: one letter per screen of reach, with a standardized 24-line screen.
The exception is picking one-two extremely frequently used globals or semi-globals like pointer to the data storage or THE input data pointer, and make them 1-3 letters long. But anything else - use reason. Short loops - one letter. Locals of a function - 3-5 letters. Class variables - full word. Library class/function names - two words.
I don't see any reason to use short variable names that say nothing. We live in the 21st century, we've got IDEs with IntelliSense (or other autocompletion)! Just press Ctrl+Space and it will suggest a sensible name for your variable based on its type, e.g.
StringBuilder <Ctrl+Space> stringBuilder;
List<Person> <Ctrl+Space> persons;
It is even easier than to type something like sb or another short name. No reason to use short names anymore.
P.S.: The only exception for me are counters like i, j, k in for loop.
I tend to prefer short "cryptic" variables (Symbols, in Mathematica) combined with descriptive comments.
Mathematica already has VeryLongFunctionNames for built in commands, and adding my own often spreads out code more than I care for.
I find it easier to read a shorter block of code where I can see everything at once, alongside a series of symbol descriptions.

How to get optimization from a "pure function" in C#?

If I have the following function, it is considered pure in that it has no side effects and will always produce the same result given the same input x.
public static int AddOne(int x) { return x + 1; }
As I understand it, if the runtime understood the functional purity it could optimize execution so that return values wouldn't have to be re-calculated.
Is there a way to achieve this kind of runtime optimization in C#? And I assume there is a name for this kind of optimization. What's it called?
Edit: Obviously, my example function wouldn't benefit much from this kind of optimization. The example was given to express the type of purity I had in mind rather than to be a real-world example.
As others have noted, if you want to save on the cost of re-computing a result you've already computed, then you can memoize the function. This trades increased memory usage for increased speed -- remember to clear your cache occasionally if you suspect that you might run out of memory should the cache grow without bound.
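The advice about clearing the cache could be sketched roughly as follows. This is only an illustration: the BoundedMemo class, the SlowSquare stand-in and the MaxEntries cap are hypothetical names of my own, and dropping the whole cache when the cap is hit is the crudest possible eviction policy, not a recommended one. The sketch is also not thread-safe.

```csharp
using System;
using System.Collections.Generic;

public static class BoundedMemo
{
    // Hypothetical cap; tune it to the memory you can spare.
    private const int MaxEntries = 1000;

    private static readonly Dictionary<int, int> Cache = new Dictionary<int, int>();

    public static int CacheCount
    {
        get { return Cache.Count; }
    }

    // Stand-in for an expensive pure function.
    private static int SlowSquare(int x)
    {
        return x * x;
    }

    public static int Square(int x)
    {
        int result;
        if (Cache.TryGetValue(x, out result))
        {
            return result; // cache hit: nothing is recomputed
        }

        if (Cache.Count >= MaxEntries)
        {
            Cache.Clear(); // crude eviction: drop everything once the cap is reached
        }

        result = SlowSquare(x);
        Cache[x] = result;
        return result;
    }
}
```

A real cache would more likely evict the least recently used entries instead of everything at once.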
However, there are other optimizations one can perform on pure functions than memoizing their results. For example, pure functions, having no side effects, are usually safe to call on other threads. Algorithms which use a lot of pure functions can often be parallelized to take advantage of multiple cores.
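As a small sketch of that idea, PLINQ can spread calls to a pure function across cores. The ParallelPure class and AddOneToAll method are names made up for this example; for something as cheap as an addition, the parallel overhead would outweigh any gain, so treat it purely as an illustration.

```csharp
using System;
using System.Linq;

public static class ParallelPure
{
    // Pure: the result depends only on x, and nothing outside the method is touched.
    public static int AddOne(int x)
    {
        return x + 1;
    }

    // Safe to parallelize because AddOne shares no mutable state between calls.
    public static int[] AddOneToAll(int[] inputs)
    {
        return inputs
            .AsParallel()
            .AsOrdered()               // keep results in input order
            .Select(x => AddOne(x))
            .ToArray();
    }
}
```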
This area will become increasingly important as massively multi-core machines become less expensive and more common. We have a long-term research goal for the C# language to figure out some way to take advantage of the power of pure functions (and impure but "isolated" functions) in the language, compiler and runtime. But doing so involves many difficult problems, problems about which there is little consensus in industry or academia as to the best approach. Top minds are thinking about it, but do not expect any major results any time soon.
If the calculation were a costly one, you could cache the result in a dictionary:
static Dictionary<int, int> cache = new Dictionary<int, int>();
public static int AddOne(int x)
{
    int result;
    if (!cache.TryGetValue(x, out result))
    {
        result = x + 1;
        cache[x] = result;
    }
    return result;
}
Of course, the dictionary lookup in this case is more costly than the add :)
There's another much cooler way to do functional memoization explained by Wes Dyer here: http://blogs.msdn.com/wesdyer/archive/2007/01/26/function-memoization.aspx - if you do a LOT of this caching, then his Memoize function might save you a lot of code...
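For a rough idea of what such a helper can look like, here is a minimal sketch of a generic memoizing wrapper. It is my own approximation and not necessarily the code from the linked post; FunctionExtensions is a made-up class name, and the sketch assumes single-threaded use and non-null arguments.

```csharp
using System;
using System.Collections.Generic;

public static class FunctionExtensions
{
    // Wraps a pure function in a closure that remembers past results.
    public static Func<TArg, TResult> Memoize<TArg, TResult>(this Func<TArg, TResult> f)
    {
        var cache = new Dictionary<TArg, TResult>();
        return arg =>
        {
            TResult result;
            if (!cache.TryGetValue(arg, out result))
            {
                result = f(arg);     // first call with this argument: compute
                cache[arg] = result; // and remember the answer
            }
            return result;
        };
    }
}
```

Given a Func<int, int> called addOne, addOne.Memoize() returns a new delegate that invokes addOne at most once per distinct argument.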
I think you're looking for functional memoization
The technique you are after is memoization: cache the results of execution, keyed off the arguments passed in to the function, in an array or dictionary. Runtimes do not tend to apply it automatically, although there are certainly cases where they would. Neither C# nor .NET applies memoization automatically. You can implement memoization yourself (it's rather easy), but doing so is generally useful only for slower pure functions where you tend to repeat calculations and where you have enough memory.
This will probably be inlined (aka inline expansion) by the compiler ...
Just make sure you compile your code with the "Optimize Code" flag set (in VS: project properties / build tab / Optimize Code).
The other thing you can do is to cache the results (aka memoization). However, there is a huge initial performance hit due to your lookup logic, so this is interesting only for slow functions (i.e. not an int addition).
There is also a memory impact, but this can be managed through a clever use of weak references.
"As I understand it, if the runtime understood the functional purity it could optimize execution so that return values wouldn't have to be re-calculated."
In your example, the runtime WILL have to compute the result, unless x is known at compile time. In that case, your code will be further optimized through the use of constant folding.
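To illustrate the compile-time case, here is a small sketch (the Folding class and its members are hypothetical names). The C# compiler itself only folds constant expressions; a method call is never a constant expression in C#, so folding a call like AddOne(5) is left to the JIT, which may or may not inline it.

```csharp
public static class Folding
{
    private const int X = 5;

    // X + 1 is a constant expression, so the C# compiler evaluates it
    // at compile time and stores the value 6 in the assembly.
    public const int Folded = X + 1;

    public static int AddOne(int x)
    {
        return x + 1;
    }

    // "const int Y = AddOne(X);" would not compile, because a method call
    // is not a constant expression. Here, any folding is up to the JIT
    // after it has inlined AddOne.
    public static int ViaCall()
    {
        return AddOne(X);
    }
}
```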
How could the compiler do that? How does it know what values of x are going to be passed in at runtime?
And re: other answers that mention inlining...
My understanding is that inlining (as an optimization) is warranted for small functions that are used only once (or only a very few times...) not because they have no side effects...
A compiler can optimize this function through a combination of inlining (replacing a function call with the body of that function at the call site) and constant propagation (replacing an expression with no free variables with the result of that expression). For example, in this bit of code:
AddOne(5);
AddOne can be inlined:
5 + 1;
Constant propagation can then simplify the expression:
6;
(Dead code elimination can then simplify this expression even further, but this is just an example).
Knowing that AddOne() has no side effects might also enable a compiler to perform common subexpression elimination, so that:
AddOne(3) + AddOne(3)
may be transformed to:
int x = AddOne(3);
x + x;
or by strength reduction, even:
2*AddOne(3);
There is no way to command the JIT compiler used for C# to perform these optimizations; it optimizes at its own discretion. But it's pretty smart, and you should feel comfortable relying on it to perform these sorts of transformations without your intervention.
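The closest thing to a request that I am aware of is the MethodImpl attribute with MethodImplOptions.AggressiveInlining (available since .NET 4.5). It is a hint rather than a command, so it fits the point above: the JIT remains free to ignore it. The Hints class below is just a hypothetical container for the example.

```csharp
using System.Runtime.CompilerServices;

public static class Hints
{
    // Asks the JIT to inline this method where it can; the JIT may still decline.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static int AddOne(int x)
    {
        return x + 1;
    }
}
```

Whether the call was actually inlined can only be confirmed by inspecting the generated machine code, not from the C# source.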
Another option is to use a Fody plugin: https://github.com/Dresel/MethodCache
With it, you can decorate methods that should be cached. When using this you should of course take into consideration all the comments mentioned in the other answers.
