Unable to initialize a variable in while in C#, why? [duplicate]

We can do:
using (Stream s ..)
and:
for (int i ...)
Why can't we as well do something like:
while ((int i = NextNum()) > 0) {..}
I find it very useful and sensible.

I'm not a language designer, but I'll give it an educated guess.
The clause inside the while() is executed every single time the loop is executed. (+1 more time at the end.) The statement int i = NextNum() declares a local variable. You can't declare a local variable more than once.
Update
Semantically, it makes sense that this should be possible. In fact, as pointed out in the comments, this is possible in other languages. However, this would not be possible in C# without re-writing some major syntax rules.
Local variables must be declared in a statement. I believe the language to be separated this way because a variable declaration is not really executed. When you see a line of code that creates a variable and assigns it a value, that is actually just a shortcut for two statements. From the ECMA-334 Specification:
The example
void F() {
int x = 1, y, z = x * 2;
}
corresponds exactly to
void F() {
int x; x = 1;
int y;
int z; z = x * 2;
}
The variable declaration part by itself is not "executed". It just means that there should be some memory allocated on the stack for a certain type of variable.
The while statement expects a boolean expression, but an expression cannot be composed of a statement unless the grammar is special-cased to allow it.
The for loop is specially designed to declare a local variable, but you'll note that declaration part is "executed" only once.
The using statement was specifically designed to declare local variables (and dispose of them). It is also "executed" only once.
Also consider that a local variable declaration doesn't return a value. It can't, since it allows you to declare multiple variables. Which value would this statement return?
int x = 1, y, z = x * 2;
The above statement is a local-variable-declaration. It is composed of a type and three local-variable-declarators. Each of those can optionally include an "=" token and a local-variable-initializer. Allowing a local variable to be declared in this manner would mean pulling apart the existing grammar a bit, since you would need the type specifier but would have to mandate a single declarator so that it could return a value.
Enabling this behavior may have negative side effects as well. Consider that the while and do/while statements are opposite, but parallel. As a language designer, would you also enable the do statement to declare a local variable? I don't see this as possible. You wouldn't be able to use the variable in the body of the loop because it wouldn't have been initialized yet (as of the first run). Only the while statement would be possible, but then you would destroy the parallelism between while and do statements.

Don't know for certain but here is my educated guess.
The for case works because it actually has 3 parts.
The iteration variable
The terminating condition
The increment
These 3 run differing amounts of times. #1 runs only once, #2 runs number of iterations +1 and #3 runs once per iteration. Since #1 runs only once it's a nice and clean place to define a variable.
Now let's examine the while loop. It has only 1 part, and it runs once per iteration plus 1. Since it runs every single time, it's not a great place to define a variable which must necessarily be a part of the condition. It raises questions like:
Should the define happen once or once per iteration? If it's the latter how many people will misunderstand this and mess up the conditional? If it's the former then you have a complete statement of which only part executes once per query
How do multiple variable defines behave?
I'm guessing the complexity / ambiguity is one of the reasons why it's not allowed. That and it's probably never been high on the priority list.

Because then it's defined in every iteration. A for loop doesn't do that. (So you can't define it again after the first iteration, since it already exists.)

Unlike the 'for' loop, 'while' doesn't have an initialization part. The 'for' syntax looks like this:
for ( initializer; conditional expression; loop expression)
{
statements to be executed
}
and the 'while' looks like this:
while (condition)
{
statements to be executed
}
The closest thing to your request is:
int i;
while ((i = NextNum()) > 0) { ... }

There's no inherent reason why it couldn't be done - the C# designers just chose not to allow it. It works just fine in Perl, for example ("while ((my $i = NextNum()) > 0) { ... }").

Variable declarations are statements; the while loop expects an expression (an expression can be a statement, but not the other way around). using and for, on the other hand, are special-cased to allow variable declarations.
More importantly, the scope of such a thing would be unclear. Consider:
int x;
// ...
y = (int x = 42) + x;
// what is y?
Or worse:
(int x = 42) + (int x = 42);
Or even:
(int x = 42, y = 24) // what is the value of this expression?
Once you allow declarations to be expressions, you have to deal with these things. In particular, the last example becomes a problem because it's hard to disambiguate between an expression-declaration used as a statement and a statement-declaration, so the expression style would have to be general enough to also cover how we write statement declarations.
This is starting to become a bit of a hairy design challenge, and since it's not really a very important feature, it's likely the C# committee decided not to allow declarations to be expressions (or they simply never thought of it :).
Finally, you can actually get what you want with just a for loop. For example:
while ((int i = NextNum()) > 0) {..}
// becomes...
for (int i; (i = NextNum()) > 0; ) {..}
// or...
for (int i = NextNum(); i > 0; i = NextNum()) {..}

Ten years later and it's possible,
while (NextNum() is int i and > 0) { ..}
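As a rough, self-contained sketch of that pattern-based loop (it needs C# 9 or later for the `and` pattern): the queue-backed NextNum() below is only a hypothetical stand-in for the asker's method, and the class and method names are invented for illustration.

```csharp
using System;
using System.Collections.Generic;

static class PatternWhileDemo
{
    // Collects numbers from 'source' until the first non-positive value.
    public static List<int> TakeWhilePositive(Queue<int> source)
    {
        // Hypothetical stand-in for NextNum(): dequeues the next number, or 0 when empty.
        int NextNum() => source.Count > 0 ? source.Dequeue() : 0;

        var seen = new List<int>();

        // 'i' is declared by the pattern on every evaluation of the condition
        // and is in scope inside the loop body.
        while (NextNum() is int i and > 0)
        {
            seen.Add(i);
        }

        return seen;
    }
}
```

Called with a queue holding 3, 2, 1, 0, 5, this collects 3, 2 and 1 and stops at the 0. The variable i is not visible after the loop, which is the scoping the question was asking for.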

Related

What is the benefit of using a local variable?

I keep seeing examples online, where there is a property of an element within a method that is copied to a local variable before use. For example, something like this (from Microsoft's StackPanel source code):
UIElementCollection children = arrangeElement.InternalChildren;
...
for (int i = 0, count = children.Count; i < count; ++i)
{
UIElement child = (UIElement)children[i];
if (child == null) { continue; }
...
}
Can anyone explain to me what the benefit of doing that is (if there is one), rather than accessing the property directly each time, like this?:
for (int i = 0, count = arrangeElement.InternalChildren.Count; i < count; ++i)
{
UIElement child = (UIElement)arrangeElement.InternalChildren[i];
if (child == null) { continue; }
...
}
Clearly, it saves a few characters on the screen, but that's not much of a reason to do this. Also, I understand why we might want to do this with a long running method, as a form of caching:
double value = GetValueFromLongRunningMethod();
...
for (int i = 0; i < someCollection.Count; i++) DoSomethingWith(value);
But I see this done with properties a lot and wonder why. Here's another commonly found example from the internet to do with virtualization:
IItemContainerGenerator generator = this.ItemContainerGenerator;
GeneratorPosition position = generator.GeneratorPositionFromIndex(firstVisibleItemIndex);
Why do that instead of this?:
GeneratorPosition position =
this.ItemContainerGenerator.GeneratorPositionFromIndex(firstVisibleItemIndex);
Finally, if this is done for the same reason that we might cache the result of a long running method, then how are we supposed to know which properties need to be accessed in this way?
Firstly, it avoids calling .InternalChildren lots of times. This could be a small but noticeable reduction of virtual calls (since it is used in a loop), but in some cases it might be much more significant. In some cases, a property that returns a collection or array might allocate every time it is called; DataRow.ItemArray is a classic example of this - so it is actively harmful to call it each time. An additional consideration is that even if it returns the same array each time it is called, there is JIT magic that happens to elide bounds checking, but it'll only work if the JIT can see that you are iterating a single array for the entire duration. If you stick a property accessor in the middle: this won't be obvious and the bounds check removal won't happen. It also might not happen if you've manually hoisted the upper bound!
Side note: if it isn't an array, then foreach would probably usually be preferable, and there would not be any advantage to introducing a local, due to how foreach works internally.
Note: since you're using .Count vs .Length, this definitely isn't an array, and you should probably simplify to:
foreach (UIElement child in arrangeElement.InternalChildren) {...}
or
foreach (var child in arrangeElement.InternalChildren) {...}
Not only does this remove this question completely, but it means that the type's own iterator (which might be an optimized struct iterator, or might be a simple IEnumerable<T> class, such as a compiler-generated iterator block) can be used. This usually has more direct access to the internals, and thus bypasses a few indirections and API checks that indexers require.
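To make the point about properties that allocate on every access more concrete, here is a minimal sketch. The Row class and its ItemArray property are hypothetical stand-ins that merely imitate a property returning a fresh copy on each read; they are not the real DataRow API.

```csharp
using System;

// Hypothetical type: imitates a property that returns a new array on every read.
sealed class Row
{
    private readonly object[] _items;

    public Row(params object[] items) => _items = items;

    // Counts how many copies the property has handed out.
    public int CopiesMade { get; private set; }

    public object[] ItemArray
    {
        get
        {
            CopiesMade++;
            return (object[])_items.Clone();
        }
    }
}

static class LocalCopyDemo
{
    // Reads the property in the loop condition and in the body.
    public static int CopiesWithoutLocal(Row row)
    {
        for (int i = 0; i < row.ItemArray.Length; i++)
        {
            _ = row.ItemArray[i];
        }
        return row.CopiesMade;
    }

    // Reads the property once and iterates over the local.
    public static int CopiesWithLocal(Row row)
    {
        object[] items = row.ItemArray;
        for (int i = 0; i < items.Length; i++)
        {
            _ = items[i];
        }
        return row.CopiesMade;
    }
}
```

For a three-element row, the first method triggers seven copies (four condition checks plus three body reads), while the second triggers one. Whether a real property behaves this way depends on its implementation, which is why the local is a cheap safeguard when you cannot tell.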
It might be useful in some cases, such as when you have to:
debug a piece of code and need to see the value of a variable at a glance
perform several operations on an object that requires casting, so that you cast it only once
and sometimes, when you work with value types, making a local copy lets you use the value without changing the class's property
Why do that instead of this?:
GeneratorPosition position =
this.ItemContainerGenerator.GeneratorPositionFromIndex(firstVisibleItemIndex);
Let's get very abstract about this:
We get a generator. That apparently is this.ItemContainerGenerator for now, but that could change.
We use it. Only once here, but usually in multiple statements.
When we later decide to get that generator elsewhere, the usage should stay the same.
The example is too small to make this convincing, but there is some kind of logic to be discerned here.

Why doesn't hoisting exist in C#?

I use both JavaScript and C# on a daily basis and I sometimes have to consider hoisting when using JavaScript. However, C# doesn't seem to implement hoisting (that I know of) and I can't figure out why. Is it more of a design choice or is it more akin to a security or language constraint that applies to all statically typed languages?
For the record, I'm not saying i WANT it to exist in C#. I just want to understand why it doesn't.
EDIT: I noticed the issue when I declared a variable after a LINQ query, but the LINQ query was deferred until after the variable declaration.
var results = from c in db.LoanPricingNoFee
              where c.LoanTerm == LoanTerm && c.LoanAdvance <= UpperLimit
              orderby c.LoanInstalment ascending
              select c;
int LoanTerm = 12;
Throws an error whereas:
int LoanTerm = 12;
var results = from c in db.LoanPricingNoFee
              where c.LoanTerm == LoanTerm && c.LoanAdvance <= UpperLimit
              orderby c.LoanInstalment ascending
              select c;
Does not.
Of all the programming languages I have used, Javascript has the most confusing scope system and hoisting is a part of that. The outcome is that it is easy to write unpredictable code in JavaScript and you have to be careful with how you write it to make it into the powerful and expressive language it can be.
C#, in common with almost every other language, assumes that you will not use a variable until you have declared it. Because it has a compiler it can enforce that by simply refusing to compile if you try to use an undeclared variable. The other approach to this, more often seen in scripting languages, is that if a variable is used without having been declared it is instantiated at first use. This can make it somewhat hard to follow the flow of code and is often used as a criticism of languages that behave that way. Most people who have used languages with block level scope ( where variables only exist at the level where they were declared ) find it a particularly weird feature of Javascript.
A couple of big reasons that hoisting can cause problems:
It is absolutely counter-intuitive and makes code harder to read and its behaviour harder to predict unless you are conscious of this behaviour. Hard to read and hard to predict code is far more likely to include bugs.
In terms of limiting the number of bugs in your code, limiting the lifetime of your variables can be really helpful. If you can declare the variable and use it in two lines of code, then having ten lines of code in between those two lines gives a lot of opportunities to accidentally affect the behaviour of the variable. There is a lot of information on this in Code Complete - if you haven't read that, I heartily recommend it.
There is a classic UX concept of the Principle Of Least Astonishment - features like hoisting (or like the way JavaScript handles equality) tend to break that. People don't often think of user experience when developing programming languages, but actually programmers tend to be quite discerning users and more than a little grumpy when they find themselves routinely caught out by odd features. JavaScript is very lucky that its unique ubiquity in the browser has created a kind of enforced popularity that means we have to tolerate its many quirks and problematic design decisions.
Finally, I cannot imagine a reason why it would be a useful addition to a language like C#- what possible benefit could it confer?
"Is it more of a design choice or is it more akin to a security or language constraint that applies to all statically typed languages?"
It's not a constraint of static typing. It would be trivial for the compiler to move all variable declarations to the top of the scope (in Javascript this is the top of the function, in C# the top of the current block) and to error if a name was declared with different types.
So the reason hoisting doesn't exist in C# is purely a design decision. Why it was designed that way I can't say; I wasn't on the team. But it was probably due to the ease of parsing (both for human programmers and the compiler) if variables are always declared before use.
There is a form of hoisting that exists in C# (and Java), in the context of loop-invariant code motion - the JIT compiler optimization which "hoists" (pulls up) expressions out of loop statements when they don't affect the actual loop.
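A small sketch of how the declare-before-use rule interacts with deferred LINQ execution, which is what prompted the question. The array of loan terms and all names here are made up for illustration, and the query runs over an in-memory array rather than a database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DeclareBeforeUseDemo
{
    public static List<int> TermsMatching()
    {
        int[] loanTerms = { 6, 12, 24, 12 }; // hypothetical data

        // The declaration has to come textually before the query that uses it.
        // Moving this line below the query is a compile-time error (CS0841),
        // even though the query does not run until it is enumerated.
        int loanTerm = 12;

        IEnumerable<int> results =
            from t in loanTerms
            where t == loanTerm
            select t;

        // The lambda captured the variable, not its value at that moment,
        // so this assignment is visible when the query finally executes.
        loanTerm = 24;

        return results.ToList();
    }
}
```

The returned list contains only 24. So deferred execution affects when the variable is read, but not where it must be declared: scoping is checked at compile time against the source order.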
You can learn more about it here.
Quote:
“Hoisting” is a compiler optimization that moves loop-invariant code
out of loops. “Loop-invariant code” is code that is referentially
transparent to the loop and can be replaced with its values, so that
it doesn’t change the semantic of the loop. This optimization improves
runtime performance by executing the code only once rather than at
each iteration.
So this written code
public void Update(int[] arr, int x, int y)
{
for (var i = 0; i < arr.Length; i++)
{
arr[i] = x + y;
}
}
is actually optimized to be somewhat like this:
public void Update(int[] arr, int x, int y)
{
var temp = x + y;
var length = arr.Length;
for (var i = 0; i < length; i++)
{
arr[i] = temp;
}
}
This happens in the JIT - i.e. when translating the IL into native machine instructions, so it's not so easy to view (you can check here, and here).
I'm not an expert in reading assembly, but here is what I got from running this snippet with BenchmarkDotNet, and my comments on it showing that the optimization actually took place:
int[] arr = new int[10];
int x = 11;
int y = 19;
public void Update()
{
for (var i = 0; i < arr.Length; i++)
{
arr[i] = x + y;
}
}
Generated:
Because it is a faulty concept, most probably existing because of the rushed implementation of JavaScript. It is a bad approach to coding, which can mislead even an experienced JavaScript coder about the scope of a variable.
Function hoisting has a potentially unnecessary cost in work that the compiler has to fulfill. For example, if a variable declaration is never even reached because various control-flow decisions returned from the function, then the processor does not need to waste time pushing an undefined null-reference variable onto the stack memory and then popping it from the stack as part of its method's clean-up operations when it wasn't even reached.
Also, remember that JavaScript has "variable hoisting" and "function hoisting" (among others) which are treated differently. Function hoisting wouldn't make sense in C# since it is not a top-down interpreted language. Once the code is compiled, the method might not ever be called. In JavaScript, however, the "self-invoking" functions are evaluated immediately as the interpreter parses them.
I doubt that it was an arbitrary design decision: Not only is hoisting inefficient for C#, but it just wouldn't make sense for the way that C# works.

C#, Declaring a variable inside for..loop, will it decrease performance? [duplicate]

This question already has answers here:
Reference type variable recycling - is a new reference variable created every loop in a loop if declared therein?
(3 answers)
Closed 6 years ago.
For example:
for (int i = 0; i < 100; i++)
{
string myvar = "";
// Some logic
}
Does this hurt performance or cause a memory leak?
I do this because I don't want "myvar" to be accessible outside the for loop.
Is there a performance monitor with which I can compare the execution time of two snippets or of a whole program?
Thank you.
No, variables are purely for the programmer's convenience. It doesn't matter where you declare them. See my answer to this duplicate question for more details.
Perhaps you could check out an old test that I once did regarding another conversation. Variable declaration. Optimized way
The results turned out that it was faster to redefine but not as easy on memory.
My simple test: I initialised an object 100,000,000 times, and it was apparently faster to create a new one instead of re-using an old one :O
string output = "";
{
DateTime startTime1 = DateTime.Now;
myclass cls = new myclass();
for (int i = 0; i < 100000000; i++)
{
cls = new myclass();
cls.var1 = 1;
}
TimeSpan span1 = DateTime.Now - startTime1;
output += span1.ToString();
}
{
DateTime startTime2 = DateTime.Now;
for (int i = 0; i < 100000000; i++)
{
myclass cls = new myclass();
cls.var1 = 1;
}
TimeSpan span2 = DateTime.Now - startTime2;
output += Environment.NewLine + span2.ToString() ;
}
//Span1 took 00:00:02.0391166
//Span2 took 00:00:01.9331106
public class myclass
{
public int var1 = 0;
public myclass()
{
}
}
I believe this would be a performance problem. Since the VM would need to allocate memory to store a reference to the String every loop. Even though the reference may be to the same String instance, allocating memory every time it goes around the loop would not be preferable.
UPDATE:
If you are using the same type of variable as the loop counter, you could do what I originally suggested:
for (int i = 0, myvar = 0; i < 100; i++) {
//some logic
}
Otherwise, don't worry about it, as others have already suggested.
@phoog, thanks for checking the answer.
To provide a real life example:
I just finished writing a .obj model loader which, of course, contains some nested loops.
I declared all my variables above my loops, but then I started wondering the same thing as the OP and found this thread. So I tried moving all variable declarations to the first point in my loop where I use them and actually saw a small performance gain. A model that previously took 380 ms on average to load (370-400 ms actually) now consistently loads about 15-20 ms faster. This is just about 5%, but nonetheless an improvement.
My loop construct consists of just a foreach-loop and a nested for-loop but also contains a lot of if-statements. I moved 5 variable declarations, most of which are arrays of strings and ints.
Yes, that would create a memory usage problem. I don't believe it would result in a memory leak as the Garbage Collector would eventually collect the unused objects, but that will have a negative impact on the performance of the application.
I would recommend you use a string builder declared outside the scope of the for loop.
It won't exactly decrease performance or cause a memory leak, but it is still something to be cautious of.
Strings are immutable, which means that once created, they can't be changed. By creating a new string inside the loop, you are creating at least n string variables. If you're trying to do string manipulation inside a loop, you should consider using a StringBuilder instead.
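A minimal sketch of the StringBuilder suggestion, assuming the loop's job is to build up text; the method and class names are hypothetical. Note that merely declaring a string variable inside the loop is not the costly part: it is repeated concatenation that creates a new string each time.

```csharp
using System.Text;

static class JoinDemo
{
    // Builds "0,1,2,..." for the first 'count' integers.
    public static string JoinNumbers(int count)
    {
        // One builder declared outside the loop; Append grows its internal
        // buffer instead of creating a new string on every iteration.
        var builder = new StringBuilder();

        for (int i = 0; i < count; i++)
        {
            if (i > 0)
            {
                builder.Append(',');
            }
            builder.Append(i);
        }

        return builder.ToString();
    }
}
```

For example, JoinNumbers(4) produces "0,1,2,3". For a handful of iterations plain concatenation is fine; the builder starts to pay off as the number of appends grows.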
Unless the compiler somehow optimizes your code, declaring a variable inside a for loop will require the allocation of new variables and the collection of old ones. That being said, the compiler does a really good job at optimizing your code.
If you want a quick way to test your two scenarios, use the Stopwatch class to measure how much time it takes to execute each case. My guess is that the difference will be nonexistent to negligible.

Howto: "letrec" in C# (lambda expression call within its definition)

Consider the factorial function defined within a method body as a lambda expression and assigned to a variable:
Func<int, int> factfail = n =>
{
if (n == 0)
return 1;
else
return n * factfail(n-1);
};
This fails, since factfail isn't bound by the local variable yet.
Is there a way to add a sort of fixpoint - by abstracting over the function itself?!
Func<Func<int, int>, int, int> fact_ = (fact, n) =>
{
if (n == 0)
return 1;
else
return n * fact(n-1);
};
fact_(??);
long story:
I need to write a recursive function that has the side effect of changing some outer state.
Therefore i am trying to write that method as a lambda expression that captures that outer state.
I am still experimenting with different styles how to write that and - besides of that one dictionary that needs to be the same for all recursive calls - i want to be as purely functional and lazy as possible.
So I was playing with LINQ, since it helps me reduce mutable data.
It also helps with understanding which parts of the code can be expressed in a functional style.
To keep the LINQ statement brief, it is helpful to be able to define some helper functions in front of it, and I did that by binding lambda expressions to variables.
And with lambda expressions I can also capture my dictionary without needing to pass its reference to the method explicitly, which is quite nice.
not sure if i am on the right track though...
You can find more information about recursive lambda expressions in this blog post by Mads Torgersen. He shows how to define the usual fixed point combinator. He uses factorial function as an example, so you can find your exact sample there :-).
However, in practice, you can just define a local Func<..> variable and then mutate it. If you want to give a name to the delegate, then it works just fine (it is a bit dirty, but simple):
Func<int, int> fact = null;
fact = (n) => (n == 0) ? 1 : n * fact(n-1);
This works, because the closure captures reference to the fact variable, so when you actually call it (during the recursive call), the value is not null anymore, but references the delegate.
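For the fact_(??) part of the question, one possible sketch is a small helper that ties the knot using the same mutate-a-local trick. The Fix helper below is hypothetical and written for illustration; it is not a framework API, and it is not necessarily the combinator from the blog post mentioned above.

```csharp
using System;

static class Recursion
{
    // Turns a function that takes "itself" as its first argument
    // into an ordinary one-argument recursive delegate.
    public static Func<T, TResult> Fix<T, TResult>(Func<Func<T, TResult>, T, TResult> f)
    {
        Func<T, TResult> self = null;
        // The lambda captures the variable 'self'; by the time it is invoked,
        // 'self' already refers to this delegate.
        self = x => f(self, x);
        return self;
    }

    // The question's fact_ shape, passed through Fix.
    public static int Factorial(int n)
    {
        Func<int, int> fact = Fix<int, int>(
            (recurse, k) => k == 0 ? 1 : k * recurse(k - 1));
        return fact(n);
    }
}
```

With this, Factorial(5) evaluates to 120. If the delegate does not need to be passed around as a value, a local function (available since C# 7) can simply call itself by name and avoids the null initialization altogether.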

C# Best way of assigning values to strings in a loop

I wonder what is the most efficient way of assigning string variables in a loop. So, for example if I have to browse through a list of nodes and assigning the value of the node to a string, would it be better if I define a variable before the loop starts like
string myStringVariable = string.Empty;
foreach(XmlNode node in givenNodes)
{
myStringVariable = node.Value;
....
...
}
or would it be more efficient if I define the variable inside the loop like
foreach(XmlNode node in givenNodes)
{
string myStringVariable = node.Value;
....
...
}
I think the first approach is more efficient while the second looks more elegant. Is there a performance difference between the two?
Thanks for you answers.
With modern compilers this doesn't make any performance difference at all and you should always use the way that best matches your algorithm. That is, prefer the second variant if you don't need the variable's value from the last iteration.
I guess the main question is: do you need to use that string variable further down in your code somewhere, or is its use limited to the scope of the for loop? If it's limited to the scope of the for loop, definitely declare it inside the loop. I doubt there's any performance penalty for doing it either way, but that should be secondary to keeping your variables scoped properly.
Nope, there is no real performance difference between the two. The VM will recognize that it only needs to allocate space on the stack for one additional variable.
I don't usually optimize to this level, because I'd expect the JIT compiler to be able to perform an optimization like that anyway at runtime. That being said, I've never actually compared the two. Of course, if you really do need the maximum performance, it's worth testing it both ways (using a sufficient number of iterations and with a release build).
Because strings are immutable and .NET works with references, there is no performance difference between the two methods.
Maybe the first one would be a little bit slower, because there is one (unneeded) assignment of string.Empty to myStringVariable. But I think this will be handled by the compiler and the JIT, so there is no difference between the two in terms of performance.
Last but not least there is a difference in scope. So declare the variable in the appropriate scope, where the variable is needed.
Why don't you set up a little test in a console application and test it.
I get very close results for both methods.
using System;
using System.Collections.Generic;
using System.Text;
using System.Diagnostics;
namespace stringtestloop
{
class Program
{
static void Main(string[] args)
{
Stopwatch w = new Stopwatch();
int itterations = 1024 * 1024 * 512;
w.Start();
string var1 = string.Empty;
for (var i = 0; i < itterations; i++)
{
var1 = "some string";
}
w.Stop();
Console.WriteLine("outside: {0} ms", w.ElapsedMilliseconds);
w.Reset();
w.Start();
for (var i = 0; i < itterations; i++)
{
string var2 = "some string";
}
w.Stop();
Console.WriteLine("inside: {0} ms", w.ElapsedMilliseconds);
Console.ReadKey();
}
}
}
EDIT:
The next question to ask yourself is: is 536870912 (1024*1024*512) similar to the number of iterations you are going to be working with? If your number is going to be a lot smaller, then you really aren't going to notice the difference.
I doubt there is any significant performance difference, since in both cases you're just getting a reference to the XmlNode.Value, not creating a new string.
Still, you usually shouldn't worry about optimizing these cases. Just declare the variable in the scope it's going to be used and let the compiler work its magic.
