Specifics of lambda closures in for and foreach loops in C#

I have read various articles on this topic (Eric Lippert's blog and others here), and I know why these two pieces of code behave differently:
int[] values = { 7, 9, 13 };
List<Action> f = new List<Action>();
foreach (var value in values)
    f.Add(() => Console.WriteLine("foreach value: " + value));
foreach (var item in f)
    item();

f = new List<Action>();
for (int i = 0; i < values.Length; i++)
    f.Add(() => Console.WriteLine("for value: " + ((i < values.Length) ? values[i] : i)));
foreach (var item in f)
    item();
But I did not find a clear explanation of why, starting with the C# 5 compiler, it was decided that "the foreach loop variable will be logically inside the body of the loop, and therefore closures will get a fresh copy every time."
The old implementation gave more freedom in how lambdas captured the loop variable ("by value or by ref"), but it demanded that the programmer use it carefully; if you wanted the current foreach behaviour, you had to write the following:
foreach (var v in values)
{
    var v2 = v;
    funcs.Add(() => v2);
}
QUESTIONS:
Q1. I wonder why, in the end, it was decided to change the implementation of foreach (I read the pros and cons; it looked like 50/50 in Eric Lippert's blog)?
Q2. In the case of a "for" loop, the captured variable ends up with a value that falls outside the loop's operating range (this creates a very specific situation where the lambda gets a value you will never observe inside the loop). Why is that not considered "out of control", given that it is a very error-prone situation? (Question 2 is more rhetorical and can therefore be skipped.)
Additional explanation (for Q1): It would be interesting to know the reasons why this implementation was chosen, i.e. why the C# developers, starting with version 5, changed the closure semantics for the foreach loop, and why they did it for foreach but not for for.

I wonder why in the end it was decided to change the implementation of foreach
Because (1) users strongly believed that the compiler's behaviour was at best unexpected, and at worst, simply wrong, and (2) there was no compelling reason to keep the strange behaviour. The biggest factor causing the design team to NOT take a breaking change is "because real code depends on the current behavior", but real code that depends on that behaviour is probably wrong!
This was a fairly easy call to make. Lots of people complained about the behaviour; no one at all complained about the fix. It was a good call.
why they did it for the foreach loop but didn't do it for the for.
(1) people didn't complain about the for loop, and (2) the change would be much more difficult, and (3) the change would be much more likely to produce a real-world break.
One reasonably expects that the "loop variable" of a foreach is not a "real" variable. You don't ever change it; the runtime changes it for you:
foreach(char c in "ABCDEFG")
{
c = 'X'; // This is illegal! You cannot treat c as a variable.
}
But that's not true of the loop variable(s) of a for loop; they really are variables.
for(int i = 0; i < 10; i += 1)
i = 11; // weird but legal!
for loops are much more complicated. You can have multiple variables, they can change value arbitrarily, they can be declared outside the loop, and so on. Better to not risk breaking someone by changing how those variables are treated inside the loop.
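To make the change concrete, here is a rough sketch of how the compiler logically treats the foreach loop variable before and after C# 5, reusing values and f from the question (a simplified pseudo-lowering for illustration, not the exact generated code):
// C# 4 and earlier: one variable for the whole loop, shared by every closure.
{
    int value;
    IEnumerator<int> e = ((IEnumerable<int>)values).GetEnumerator();
    while (e.MoveNext())
    {
        value = e.Current;
        f.Add(() => Console.WriteLine(value)); // every lambda prints 13
    }
}
// C# 5 and later: a fresh variable per iteration, so each closure captures its own copy.
{
    IEnumerator<int> e = ((IEnumerable<int>)values).GetEnumerator();
    while (e.MoveNext())
    {
        int value = e.Current;                 // logically declared inside the loop body
        f.Add(() => Console.WriteLine(value)); // prints 7, 9, 13
    }
}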

The potential impact of an if statement inside a loop

Assume we have a Boolean variable check whose value never changes inside the loop (it is always true or always false). Would the first snippet below be more computationally efficient than the second?
First:
// The value of the check variable never changes inside the loop.
if (check) {
    for (int i = 0; i < array.Length; i++) {
        sb.Append(String.Format("\"{0}\"", array[i].ToString()));
    }
}
else {
    for (int i = 0; i < array.Length; i++) {
        sb.Append(String.Format("{0}", array[i].ToString()));
    }
}
Second:
for (int i = 0; i < array.Length; i++) {
    if (check) {
        sb.Append(String.Format("\"{0}\"", array[i].ToString()));
    }
    else {
        sb.Append(String.Format("{0}", array[i].ToString()));
    }
}
There is no general answer for that, especially on modern CPUs.
In theory
Theoretically, the fewer branches you have in your code, the better. Since the second variant repeats the branch once per loop iteration, it needs more processing time and is therefore less efficient.
In practice
Modern CPUs do what is called branch prediction. That means they try to figure out in advance if a branch is taken. If the prediction is correct, the branch is free (free as in 0 CPU cycles), if it is incorrect, the CPU has to flush its execution queue and the branch is very expensive (as in much more than 1 CPU cycle).
In your specific examples you have two branch types, the ones for the loop and the ones for the if. Since your condition for the if does not change and the loop has a fixed number of executions, both branches are trivial to predict for the branch prediction engine and you can expect both alternatives to perform the same.
In coding practice
Performance considerations rarely have an impact in practice (especially in this case because of branch prediction), so you should choose the better coding style. And I would consider the second alternative to be better in this respect.
Sefe's answer is very interesting, but if you know in advance that the value will not change throughout the loop, then you really shouldn't be checking within the loop.
It is preferable to separate the decision from the loop entirely:
var template = check ? "\"{0}\"" : "{0}";
for (int i = 0; i < array.Length; i++)
{
    sb.Append(String.Format(template, array[i].ToString()));
}
Also, the whole code could be refactored as:
Func<int, string> getText;
if (array.Length > 2) getText = i => $@"A - ""{array[i]}""";
else getText = i => $@"B - ""{array[i]}""";
for (int i = 0; i < array.Length; i++) sb.Append(getText(i));
That is, you define the whole Func<int, string> based on some boolean check, and later you do the whole for loop against the pre-defined Func<int, string> which won't need a check anymore, and also, you don't repeat yourself!
See how I've used interpolated strings, which are syntactic sugar over regular string concatenation, and verbatim strings to escape quotes using doubled quotes.
In summary:
You avoid repeating yourself.
You avoid many calls to string.Format.
You avoid many calls to string.ToString().
You reduce code lines.
Compiled code is simpler, because delegates end up in a call to a method in a generated internal class, and the rest of operations are just syntactic sugar over regular string concatenation...
I know...
I know that my answer doesn't address the question at all, but I wanted to give the OP some hints on how to optimize the code from a high-level point of view, instead of focusing on low-level details.
The first one is more efficient, but compiler optimizations can potentially transform the latter case into the former during compilation.

In C families, in a loop why is "less than or equal to" more preferred over just "less than" symbol? [closed]

Why is it that in the C family of languages, when we use a counter for a loop, the preferred comparison is less-than-or-equal-to (<=) or its inverse? Please take a look at these three pieces of code:
for (var i = 0; i <= 5; i++)
{ ... } // loop1
for (var i = 0; i < 6; i++)
{ ... } // loop2
for (var i = 0; i != 6; i++)
{ ... } // loop3
I understand why loop3 should be discouraged, since something in the code could set i > 5, causing an infinite loop. But loop1 and loop2 are essentially the same, and loop2 may even be better performance-wise if only one comparison is done. So why is loop1 preferred? Is it just convention, or is there something more to it?
Note: I have no formal training in programming. I just picked up C when I needed better tools to program 8051s rather than using assembly language.
For loops are often used to iterate over arrays, and the limit is the length of the array. Since arrays are zero-based, the last valid element is length-1. So the choice is between:
for (int i = 0; i < length; i++)
and
for (int i = 0; i <= length-1; i++)
The first is simpler, so it is preferred. As a result, this idiom has become common even when the limit is not an array size.
We don't use != because occasionally we write loops where the index increments by variable steps, and sometimes it will skip over the limit. So it's safer to use a < comparison, so these won't turn into infinite loops.
This is generally a matter of contextual semantics, facilitating 'those that come after you' to maintain the code.
If you need 10 iterations of something, this is usually written as starting from 0 and having an end condition with < or != because it means the 10 is literally part of the code, thus showing clearly that 10 iterations were intended. The non-inclusive notation is also more practical for zero-based arrays like C-style strings. Notation with != is generally discouraged because it can cause endless loops in case the indexer isn't just a straightforward increment, unexpected overflows occur or the like.
On the other hand, if you need a loop from and to a specific value, it's also clearer if you have the end condition literally in the code, for example with for(var i = 1; i <= 5; i++) it is clear right away that it's an inclusive loop from 1 to 5.
These are just common reasons cited for using one notation or the other; most good programmers decide which to use by context and situation. There is no performance or other technical reason to prefer one over the other.
Less than or equal to is not preferred. Traditionally, in C, less than was preferred; in C++, not equals is by far the most idiomatic. Thus, in C:
#define N 100
int array[N];
for ( int i = 0; i < N; i ++ ) {
// ...
}
and in C++, either:
int const N = 100;
int array[N];
for ( int i = 0; i != N; ++ i ) {
// ...
}
or even more often, if there is only one container, and the index isn't needed otherwise:
for ( int* p = std::begin( array ); p != std::end( array ); ++ p ) {
// ...
}
(In pre-C++11, of course, we used our own implementations of begin and end to do the same thing.)
Other forms are generally not idiomatic, and are only used in exceptional cases.
Almost all for loops have the exact same header except for the upper bound. It is a useful convention that helps with quick understanding and making less mistakes. (And the convention is <, not <=. Not sure where you got that from.)
Programs that do the same thing are not necessarily equal when it comes to code quality. Coding style has an objective component to it in that it helps humans deal with the complexity of the task.
Consistency is an important goal. If you have the choice, prefer the alternative that the majority of team members is using.

C# equivalent to Delphi High() and Low() functions for arrays that maintains performance?

In Delphi there are the Low() and High() functions that return the lowermost and uppermost indexes of an array. This helps eliminate error-prone for loops that might fall victim to an insidious +1/-1 array boundary error, like using <= when you meant < in the for loop's terminating condition.
Here's an example for the Low/High functions (in Delphi):
for i := Low(ary) to High(ary) do
For now I'm using a simple for loop statement in C#:
for (int i = 0; i < ary.Length; i++)
I know there is the Array method GetLength(n), but that has its own liabilities, since I could introduce an error by accidentally using the wrong dimension index. I guess I could do something with enumerators, but I worry that there would be a significant performance cost when scanning a large array compared to using a for loop. Is there an equivalent to High/Low in C#?
The C# equivalent to the intrinsic low(ary) and high(ary) functions are, respectively, 0 and ary.Length-1. That's because C# arrays are zero based. I don't see any reason why the Length property of an array should have performance characteristics that differ from Delphi's high().
In terms of performance, the big difference between a Pascal for loop and that used by C derived languages concerns evaluation of the termination test. Consider a classic Pascal for loop:
for i := 0 to GetCount()-1 do
....
With a Pascal for loop, GetCount() is evaluated once only, at the beginning of the loop.
Now consider the equivalent in a C derived language:
for (int i=0; i<GetCount(); i++)
....
In this loop, GetCount() is evaluated every time round the loop. So in a language like C#, you would need a local variable to avoid calling that function over and over.
int N = GetCount();
for (int i=0; i<N; i++)
....
In the case of an array, if the optimiser could be certain that ary.Length did not mutate during the loop, then the code could be optimised by the compiler. I personally do not know whether or not the C# optimiser does that, but please refer to the comments for some more information.
Before you start re-writing your loops to use local variables containing the length of the array, check whether or not it makes any difference. Almost certainly it won't. The difference between Pascal and C-like for loops that I outline above is probably more significant in semantic terms than performance.
The language that I am particularly envious of is D. Here you can use a foreach loop that presents each item in an array as a reference, and thus allows you to modify the contents of the array:
void IncArray(int[] array, int increment) {
foreach (ref e; array) {
e += increment;
}
}
In C# the lower boundary is always zero, so the equivalent of Low(ary) is just 0.
For a single dimension array, the equivalent of High(ary) is ary.Length - 1. (For multi dimensional arrays you would need more than one loop anyway.)
Can you just use a foreach statement instead? Like:
foreach(int i in ary)
{
. . .
}
Array.GetLowerBound(int dimension) and Array.GetUpperBound(int dimension)
These boundary methods provide the start and end indexes for the specified dimension of any array. The second dimension would be noted with a 1 instead of a 0, and so on.
for (int i = ary.GetLowerBound(0); i <= ary.GetUpperBound(0); i++ )
{}
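For a multi-dimensional array, the same methods extend naturally; a small sketch (the array here is purely illustrative):
int[,] grid = new int[3, 4];
for (int r = grid.GetLowerBound(0); r <= grid.GetUpperBound(0); r++)
{
    for (int c = grid.GetLowerBound(1); c <= grid.GetUpperBound(1); c++)
    {
        grid[r, c] = r * 10 + c; // visit every element of the 2-D array
    }
}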

Is It Ever Good Practice To Modify The Index Variable Inside a FOR Loop?

Given the code:
for (int i = 1; i <= 5; i++)
{
// Do work
}
Is it ever acceptable to change the value of i from within the loop?
For example:
for (int i = 1; i <= 5; i++)
{
if( i == 2)
{
i = 4;
}
// Do work
}
In my opinion, it is too confusing. Better use a while loop in such case.
It is acceptable; however, I personally think this should be avoided. Since it creates code that will be unexpected by most developers, it results in something much less maintainable.
Personally, if you need to do this, I would recommend switching to a while loop:
int i = 1;
while (i <= 5)
{
    if (i == 2)
        i = 4;

    // Do work
    ++i;
}
This, at least, warns people that you're using non-standard logic.
Alternatively, if you're just trying to skip elements, use continue:
for (int i = 1; i <= 5; i++)
{
if (i == 2 || i == 3)
continue;
}
While this is, technically, a few more operations than just setting i directly, it will make more sense to other developers...
YES
You see that frequently in apps that parse data. For example, suppose I'm scanning a binary file, and I'm basically looking for certain data structures. I might have code that does the following:
int SizeOfInterestingSpot = 4;
int InterestingSpotCount = 0;
for (int currentSpot = 0; currentSpot < endOfFile; currentSpot++)
{
    if (IsInterestingPart(file[currentSpot]))
    {
        InterestingSpotCount++;
        // I know that I have one of what I need, and further, that this
        // structure in the file takes SizeOfInterestingSpot bytes, so...
        currentSpot += SizeOfInterestingSpot - 1; // Skip the rest of that structure.
    }
}
An example would be deleting items which match some criteria:
for (int i = 0; i < array.size(); /*nothing*/)
{
if (pred(array[i]))
i++;
else
array.erase(array.begin() + i);
}
However a better idea would be using iterators:
for (auto it = array.begin(); it != array.end(); /*nothing*/)
{
if (pred(*it))
++it;
else
it = array.erase(it);
}
EDIT
Oh sorry, my code is C++, and the question is about C#. But nevertheless the idea is the same:
for (int i = 0; i < list.Count; /*nothing*/)
{
    if (pred(list[i]))
        i++;
    else
        list.RemoveAt(i);
}
And a better idea might be of course just
list.RemoveAll(x => !pred(x));
Or in a slightly more modern style,
list = list.Where(pred);
(here list should be IEnumerable<...>)
I would say yes, but only in specific cases.
It may be a bit confusing: if I set i = 4, will it be incremented before the next iteration or not?
It may be a sign of a code smell: maybe you should do a LINQ query beforehand and only process the relevant elements (see the sketch below)?
Use with care!
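For instance, the LINQ suggestion above might look like this (a sketch; items and IsRelevant are illustrative names, and System.Linq is assumed):
// Filter first, then loop without touching the index at all.
foreach (var item in items.Where(IsRelevant))
{
    // Do work
}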
Yes, it can be. As there is an enormous number of possible situations, you're bound to find one exception where it would be considered good practice.
But stepping away from the theoretical side of things, I'd say: no. Don't do it.
It gets quite complicated and hard to read and/or follow. I would rather see something like the continue statement, although I'm not a big fan of that either.
Personally, I would say that if the logic of the algorithm called for a normally-linearly-iterating behavior, but skipping or repeating certain iterations, go for it. However, I also agree with most people that this is not normal for loop usage, so were I in your shoes, I'd make sure to throw in a line or two of comments stating WHY this is happening.
A perfectly valid use case for such a thing might be to parse a roman numeral string. For each character index in the string, look at that character and the next one. If the next character's numeric value is greater than the current character, subtract the current character's value from the next one's, add the result to the total, and skip the next char by incrementing the current index. Otherwise, just add the current character's value to the running total and continue.
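As a rough illustration of that idea, here is a minimal, untested sketch (the dictionary and the method name are mine, and the input is assumed to be a well-formed numeral):
using System.Collections.Generic;

static int ParseRoman(string s)
{
    var values = new Dictionary<char, int>
    {
        ['I'] = 1, ['V'] = 5, ['X'] = 10, ['L'] = 50,
        ['C'] = 100, ['D'] = 500, ['M'] = 1000
    };
    int total = 0;
    for (int i = 0; i < s.Length; i++)
    {
        // Subtractive pair (e.g. "IV"): add the difference, then skip the next char.
        if (i + 1 < s.Length && values[s[i + 1]] > values[s[i]])
        {
            total += values[s[i + 1]] - values[s[i]];
            i++; // deliberately advancing the index inside the loop body
        }
        else
        {
            total += values[s[i]];
        }
    }
    return total;
}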
An example could be a for loop where, under a certain condition, you want to repeat the current iteration, go back to a previous iteration, or skip a certain number of iterations (instead of a numbered continue).
But these cases are rare. And even for these cases, consider that the for loop is just one means among while, do, and the other tools that can be used. So consider this bad practice and try to avoid it; your code will also be less readable that way.
In conclusion: it's achievable (though not in a foreach), but strive to avoid it by using while, do, etc. instead.
Quoting Petar Minchev:
In my opinion, it is too confusing.
Better use a while loop in such case.
And I would say that by doing so, you must be aware of the things that could happen, such as infinite loops, prematurely cancelled loops, weird variable values or arithmetic based on your index, and mainly (not excluding any of the others) execution-flow problems based on your index and other variables modified by the faulty loop.
But if you got such a case, go for it.

'do...while' vs. 'while'

Possible Duplicates:
While vs. Do While
When should I use do-while instead of while loops?
I've been programming for a while now (2 years work + 4.5 years degree + 1 year pre-college), and I've never used a do-while loop short of being forced to in the Introduction to Programming course. I have a growing feeling that I'm doing programming wrong if I never run into something so fundamental.
Could it be that I just haven't run into the correct circumstances?
What are some examples where it would be necessary to use a do-while instead of a while?
(My schooling was almost all in C/C++ and my work is in C#, so if there is another language where it absolutely makes sense because do-whiles work differently, then these questions don't really apply.)
To clarify...I know the difference between a while and a do-while. While checks the exit condition and then performs tasks. do-while performs tasks and then checks exit condition.
Use do-while if you always want the loop to execute at least once. It's not common, but I do use it from time to time. One case where you might want it is trying to access a resource that could require a retry, e.g.
do
{
try to access resource...
put up message box with retry option
} while (user says retry);
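In C#, that pattern might look roughly like this (a sketch; TryAccessResource and AskRetry are hypothetical helpers, named only for illustration):
bool success;
do
{
    success = TryAccessResource(); // always attempted at least once
} while (!success && AskRetry("Could not access the resource. Try again?"));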
do-while is better if the compiler isn't competent at optimization. do-while has only a single conditional jump, as opposed to for and while which have a conditional jump and an unconditional jump. For CPUs which are pipelined and don't do branch prediction, this can make a big difference in the performance of a tight loop.
Also, since most compilers are smart enough to perform this optimization, all loops found in decompiled code will usually be do-while (if the decompiler even bothers to reconstruct loops from backward local gotos at all).
I have used this in a TryDeleteDirectory function. It was something like this
int retryDeleteDirectoryCount = 0;
do
{
    try
    {
        DisableReadOnly(directory);
        directory.Delete(true);
    }
    catch (Exception)
    {
        retryDeleteDirectoryCount++;
    }
} while (Directory.Exists(fullPath) && retryDeleteDirectoryCount < 4);
Do-while is useful when you want to execute something at least once. As a good example of using do-while vs. while, let's say you want to make the following: a calculator.
You could approach this by using a loop and checking after each calculation whether the person wants to exit the program. You can probably assume that once the program is opened, the person wants to do this at least once, so you could do the following:
do
{
//do calculator logic here
//prompt user for continue here
} while(cont==true);//cont is short for continue
This is sort of an indirect answer, but this question got me thinking about the logic behind it, and I thought this might be worth sharing.
As everyone else has said, you use a do ... while loop when you want to execute the body at least once. But under what circumstances would you want to do that?
Well, the most obvious class of situations I can think of would be when the initial ("unprimed") value of the check condition is the same as when you want to exit. This means that you need to execute the loop body once to prime the condition to a non-exiting value, and then perform the actual repetition based on that condition. What with programmers being so lazy, someone decided to wrap this up in a control structure.
So for example, reading characters from a serial port with a timeout might take the form (in Python):
response_buffer = []
char_read = port.read(1)
while char_read:
response_buffer.append(char_read)
char_read = port.read(1)
# When there's nothing to read after 1s, there is no more data
response = ''.join(response_buffer)
Note the duplication of code: char_read = port.read(1). If Python had a do ... while loop, I might have used:
do:
char_read = port.read(1)
response_buffer.append(char_read)
while char_read
The added benefit for languages that create a new scope for loops: char_read does not pollute the function namespace. But note also that there is a better way to do this, and that is by using Python's None value:
response_buffer = []
char_read = None
while char_read != '':
char_read = port.read(1)
response_buffer.append(char_read)
response = ''.join(response_buffer)
So here's the crux of my point: in languages with nullable types, the situation initial_value == exit_value arises far less frequently, and that may be why you do not encounter it. I'm not saying it never happens, because there are still times when a function will return None to signify a valid condition. But in my hurried and briefly-considered opinion, this would happen a lot more if the languages you used did not allow for a value that signifies: this variable has not been initialised yet.
This is not perfect reasoning: in reality, now that null-values are common, they simply form one more element of the set of valid values a variable can take. But practically, programmers have a way to distinguish between a variable being in sensible state, which may include the loop exit state, and it being in an uninitialised state.
I used them a fair bit when I was in school, but not so much since.
In theory, they are useful when you want the loop body to execute once before the exit-condition check. The problem is that for the few instances where I don't want the check first, I typically want the exit check in the middle of the loop body rather than at the very end. In that case, I prefer the well-known for (;;) with an if (condition) break; somewhere in the body.
In fact, if I'm a bit shaky on the loop exit condition, sometimes I find it useful to start writing the loop as a for (;;) {} with a break statement where needed, and then when I'm done I can see if it can be "cleaned up" by moving initializations, exit conditions, and/or increment code inside the for's parentheses.
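For example, that "exit test in the middle" style might look like this (a sketch; GetNext and Process are hypothetical helpers):
for (;;)
{
    var item = GetNext();      // always runs at least once
    if (item == null) break;   // the exit test sits mid-body, not at either end
    Process(item);
}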
A situation where you always need to run a piece of code once, and depending on its result, possibly more times. The same can be produced with a regular while loop as well.
rc = get_something();
while (rc == wrong_stuff)
{
rc = get_something();
}
do
{
rc = get_something();
}
while (rc == wrong_stuff);
It's as simple as that:
precondition vs postcondition
while (cond) {...} - precondition, it executes the code only after checking.
do {...} while (cond) - postcondition, code is executed at least once.
Now that you know the secret .. use them wisely :)
do while is if you want to run the code block at least once. while on the other hand won't always run depending on the criteria specified.
I see that this question has been adequately answered, but would like to add this very specific use case scenario. You might start using do...while more frequently.
do
{
...
} while (0)
is often used for multi-line #defines. For example:
#define compute_values \
area = pi * r * r; \
volume = area * h
This works alright for:
r = 4;
h = 3;
compute_values;
-but- there is a gotcha for:
if (shape == circle) compute_values;
as this expands to:
if (shape == circle) area = pi *r * r;
volume = area * h;
If you wrap it in a do ... while(0) loop it properly expands to a single block:
if (shape == circle)
do
{
area = pi * r * r;
volume = area * h;
} while (0);
The answers so far summarize the general use for do-while. But the OP asked for an example, so here is one: Get user input. But the user's input may be invalid - so you ask for input, validate it, proceed if it's valid, otherwise repeat.
With do-while, you get the input while the input is not valid. With a regular while-loop, you get the input once, but if it's invalid, you get it again and again until it is valid. It's not hard to see that the former is shorter, more elegant, and simpler to maintain if the body of the loop grows more complex.
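A minimal C# sketch of that pattern (the prompt text and the validity rule are illustrative, not from the original answer):
int value;
do
{
    Console.Write("Enter a number from 1 to 10: ");
} while (!int.TryParse(Console.ReadLine(), out value) || value < 1 || value > 10);
// 'value' is guaranteed valid here, and the prompt appeared at least once.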
I've used it for a reader that reads the same structure multiple times.
using (IDataReader reader = command.ExecuteReader())
{
do
{
while(reader.Read())
{
//Read record
}
} while(reader.NextResult());
}
I can't imagine how you've gone this long without using a do...while loop.
There's one on another monitor right now and there are multiple such loops in that program. They're all of the form:
do
{
GetProspectiveResult();
}
while (!ProspectIsGood());
I like to understand these two as:
while -> 'repeat until',
do ... while -> 'repeat if'.
I've used a do while when I'm reading a sentinel value at the beginning of a file, but other than that, I don't think it's abnormal that this structure isn't too commonly used--do-whiles are really situational.
-- file --
5
Joe
Bob
Jake
Sarah
Sue
-- code --
int MAX = a.readLine(); // the sentinel: how many names follow
int count = 0;
do {
    k[count] = a.readLine();
    count++;
} while (count < MAX);
Here's my theory why most people (including me) prefer while(){} loops to do{}while(): a while(){} loop can easily be adapted to behave like a do..while() loop, while the opposite is not true. A while loop is, in a certain way, "more general". Also, programmers like easy-to-grasp patterns. A while loop says right at the start what its invariant is, and this is a nice thing.
Here's what I mean about the "more general" thing. Take this do..while loop:
do {
A;
if (condition) INV=false;
B;
} while(INV);
Transforming this in to a while loop is straightforward:
INV=true;
while(INV) {
A;
if (condition) INV=false;
B;
}
Now, we take a model while loop:
while(INV) {
A;
if (condition) INV=false;
B;
}
And transform this into a do..while loop, yields this monstrosity:
if (INV) {
do
{
A;
if (condition) INV=false;
B;
} while(INV);
}
Now we have two checks on opposite ends and if the invariant changes you have to update it on two places. In a certain way do..while is like the specialized screwdrivers in the tool box which you never use, because the standard screwdriver does everything you need.
I have been programming for about 12 years, and only 3 months ago did I meet a situation where it was really convenient to use do-while, as one iteration was always necessary before checking the condition. So I guess your big time is ahead :).
It is a quite common structure in a server/consumer:
DOWHILE (no shutdown requested)
determine timeout
wait for work(timeout)
IF (there is work)
REPEAT
process
UNTIL(wait for work(0 timeout) indicates no work)
do what is supposed to be done at end of busy period.
ENDIF
ENDDO
the REPEAT UNTIL(cond) being a do {...} while(!cond)
Sometimes the wait for work(0) can be cheaper CPU wise (even eliminating the timeout calculation might be an improvement with very high arrival rates). Moreover, there are many queuing theory results that make the number served in a busy period an important statistic. (See for example Kleinrock - Vol 1.)
Similarly:
DOWHILE (no shutdown requested)
determine timeout
wait for work(timeout)
IF (there is work)
set throttle
REPEAT
process
UNTIL(--throttle<0 **OR** wait for work(0 timeout) indicates no work)
ENDIF
check for and do other (perhaps polled) work.
ENDDO
where check for and do other work may be exorbitantly expensive to put in the main loop or perhaps a kernel that does not support an efficient waitany(waitcontrol*,n) type operation or perhaps a situation where a prioritized queue might starve the other work and throttle is used as starvation control.
This type of balancing can seem like a hack, but it can be necessary. Blind use of thread pools would entirely defeat the performance benefit of a caretaker thread with a private queue for a complicated, frequently updated data structure, since a thread pool, unlike a caretaker thread, would require a thread-safe implementation.
I really don't want to get into a debate about the pseudo code (for example, whether shutdown requested should be tested in the UNTIL) or caretaker threads versus thread pools - this is just meant to give a flavor of a particular use case of the control flow structure.
This is my personal opinion, but this question begs for an answer rooted in experience:
I have been programming in C for 38 years, and I never use do / while loops in regular code.
The only compelling use for this construct is in macros where it can wrap multiple statements into a single statement via a do { multiple statements } while (0)
I have seen countless examples of do / while loops with bogus error detection or redundant function calls.
My explanation for this observation is that programmers tend to model problems incorrectly when they think in terms of do / while loops. They either miss an important ending condition, or they miss the possible failure of the initial condition, which they moved to the end.
For these reasons, I have come to believe that where there is a do / while loop, there is a bug, and I regularly challenge newbie programmers to show me a do / while loop where I cannot spot a bug nearby.
This type of loop can be easily avoided: use a for (;;) { ... } and add the necessary termination tests where they are appropriate. It is quite common that there need be more than one such test.
Here is a classic example:
/* skip the line */
do {
c = getc(fp);
} while (c != '\n');
This will fail if the file does not end with a newline. A trivial example of such a file is the empty file.
A better version is this:
int c; // another classic bug is to define c as char.
while ((c = getc(fp)) != EOF && c != '\n')
continue;
Alternately, this version also hides the c variable:
for (;;) {
int c = getc(fp);
if (c == EOF || c == '\n')
break;
}
Try searching for while (c != '\n'); in any search engine, and you will find bugs such as this one (retrieved June 24, 2017):
In ftp://ftp.dante.de/tex-archive/biblio/tib/src/streams.c , the function getword(stream,p,ignore) has a do / while, and sure enough, at least 2 bugs:
(1) c is defined as a char, and
(2) there is a potential infinite loop: while (c != '\n') c = getc(stream);
Conclusion: avoid do / while loops and look for bugs when you see one.
while loops check the condition before the loop, do...while loops check the condition after the loop. This is useful is you want to base the condition on side effects from the loop running or, like other posters said, if you want the loop to run at least once.
I understand where you're coming from, but the do-while is something that most use rarely, and I've never used myself. You're not doing it wrong.
You're not doing it wrong. That's like saying someone is doing it wrong because they've never used the byte primitive. It's just not that commonly used.
The most common scenario I run into where I use a do/while loop is in a little console program that runs based on some input and will repeat as many times as the user likes. Obviously it makes no sense for a console program to run no times; but beyond the first time it's up to the user -- hence do/while instead of just while.
This allows the user to try out a bunch of different inputs if desired.
do
{
int input = GetInt("Enter any integer");
// Do something with input.
}
while (GetBool("Go again?"));
I suspect that software developers use do/while less and less these days, now that practically every program under the sun has a GUI of some sort. It makes more sense with console apps, as there is a need to continually refresh the output to provide instructions or prompt the user with new information. With a GUI, in contrast, the text providing that information to the user can just sit on a form and never need to be repeated programmatically.
I use do-while loops all the time when reading in files. I work with a lot of text files that include comments in the header:
# some comments
# some more comments
column1 column2
1.234 5.678
9.012 3.456
... ...
I'll use a do-while loop to read up to the "column1 column2" line so that I can look for the column of interest. Here's the pseudocode:
do {
line = read_line();
} while ( line[0] == '#');
/* parse line */
Then I'll do a while loop to read through the rest of the file.
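In C#, the same idea might look like this (a sketch, assuming reader is a StreamReader over the file above):
string line;
do
{
    line = reader.ReadLine();
} while (line != null && line.StartsWith("#"));
// 'line' now holds the "column1 column2" header; parse it here, then use a
// plain while loop to read the remaining data rows.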
Being a geezer programmer, I wrote many of my school programming projects as text-menu-driven interactions. Virtually all used something like the following logic for the main procedure:
do
display options
get choice
perform action appropriate to choice
while choice is something other than exit
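In C#, that menu skeleton might look roughly like this (a sketch; the menu entries are illustrative):
string choice;
do
{
    Console.WriteLine("1) List items  2) Add item  x) Exit");
    choice = Console.ReadLine();
    switch (choice)
    {
        case "1": /* perform the "list items" action */ break;
        case "2": /* perform the "add item" action */ break;
    }
} while (choice != "x");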
Since school days, I have found that I use the while loop more frequently.
One of the applications I have seen for it is in Oracle, when we look at result sets.
Once you have a result set, you first fetch from it (do), and from that point on, check whether the fetch returned an element or not (while an element was found). The same may apply to any other "fetch-like" implementation.
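Sketched in C#, the pattern looks like this (cursor, Fetch, and Process are placeholders for a cursor-like API, not a real Oracle interface):
object row;
do
{
    row = cursor.Fetch();  // the first fetch always happens (the "do" part)
    if (row != null)
        Process(row);
} while (row != null);     // keep looping while the fetch found an element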
I've used it in a function that returned the next character position in a UTF-8 string:
char *next_utf8_character(const char *txt)
{
    if (!txt || *txt == '\0')
        return (char *)txt;
    do {
        txt++;
    } while ((((unsigned char) *txt) & 0xc0) == 0x80); /* skip UTF-8 continuation bytes */
    return (char *)txt;
}
Note that this function was written from memory and is not tested. The point is that you have to take the first step anyway, and you have to take it before you can evaluate the condition.
Any sort of console input works well with do-while because you prompt the first time, and re-prompt whenever the input validation fails.
Even though there are plenty of answers, here is my take. It all comes down to optimization. I'll show two examples where one is faster than the other.
Case 1: while
string fileName = string.Empty, fullPath = string.Empty;
while (string.IsNullOrEmpty(fileName) || File.Exists(fullPath))
{
fileName = Guid.NewGuid().ToString() + fileExtension;
fullPath = Path.Combine(uploadDirectory, fileName);
}
Case 2: do while
string fileName = string.Empty, fullPath = string.Empty;
do
{
fileName = Guid.NewGuid().ToString() + fileExtension;
fullPath = Path.Combine(uploadDirectory, fileName);
}
while (File.Exists(fullPath));
So these two will do the exact same thing. But there is one fundamental difference: the while version requires an extra statement before entering the loop. That is ugly, because suppose every possible value the Guid class can produce has already been taken except for one variant. This means I would have to loop around 5,316,911,983,139,663,491,615,228,241,121,400,000 times.
Every time I get to the end of my while statement, I need to do the string.IsNullOrEmpty(fileName) check. So this would take up a tiny fraction of extra CPU work. But multiply that very small task by the number of possible combinations the Guid class has, and we are talking about hours, days, or months of extra time.
Of course this is an extreme example, because you probably wouldn't see this in production. But thinking of, say, YouTube's ID generation, it is quite possible that they encounter generating an ID that has already been taken. So it comes down to big projects and optimization.
Even in educational references you will barely find a do...while example. Only recently, after reading Ethan Brown's beautiful book Learning JavaScript, did I encounter one well-defined do...while example. That being said, I believe it is OK if you don't find an application for this structure in your routine job.
It's true that do/while loops are pretty rare. I think this is because a great many loops are of the form
while(something needs doing)
do it;
In general, this is an excellent pattern, and it has the usually-desirable property that if nothing needs doing, the loop runs zero times.
But once in a while, there's some fine reason why you definitely want to make at least one trip through the loop, no matter what. My favorite example is: converting an integer to its decimal representation as a string, that is, implementing printf("%d"), or the semistandard itoa() function.
To illustrate, here is a reasonably straightforward implementation of itoa(). It's not quite the "traditional" formulation; I'll explain it in more detail below if anyone's curious. But the key point is that it embodies the canonical algorithm, repeatedly dividing by 10 to pick off digits from the right, and it's written using an ordinary while loop... and this means it has a bug.
#include <stddef.h>
char *itoa(unsigned int n, char buf[], int bufsize)
{
if(bufsize < 2) return NULL;
char *p = &buf[bufsize];
*--p = '\0';
while(n > 0) {
if(p == buf) return NULL;
*--p = n % 10 + '0';
n /= 10;
}
return p;
}
If you didn't spot it, the bug is that this code returns nothing — an empty string — if you ask it to convert the integer 0. So this is an example of a case where, when there's "nothing" to do, we don't want the code to do nothing — we always want it to produce at least one digit. So we always want it to make at least one trip through the loop. So a do/while loop is just the ticket:
do {
if(p == buf) return NULL;
*--p = n % 10 + '0';
n /= 10;
} while(n > 0);
So now we have a loop that usually stops when n reaches 0, but if n is initially 0 — if you pass in a 0 — it returns the string "0", as desired.
As promised, here's a bit more information about the itoa function in this example. You pass it arguments which are: an int to convert (actually, an unsigned int, so that we don't have to worry about negative numbers); a buffer to render into; and the size of that buffer. It returns a char * pointing into your buffer, pointing at the beginning of the rendered string. (Or it returns NULL if it discovers that the buffer you gave it wasn't big enough.) The "nontraditional" aspect of this implementation is that it fills in the array from right to left, meaning that it doesn't have to reverse the string at the end — and also meaning that the pointer it returns to you is usually not to the beginning of the buffer. So you have to use the pointer it returns to you as the string to use; you can't call it and then assume that the buffer you handed it is the string you can use.
Finally, for completeness, here is a little test program to test this version of itoa with.
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
int n;
if(argc > 1)
n = atoi(argv[1]);
else {
printf("enter a number: "); fflush(stdout);
if(scanf("%d", &n) != 1) return EXIT_FAILURE;
}
if(n < 0) {
fprintf(stderr, "sorry, can't do negative numbers yet\n");
return EXIT_FAILURE;
}
char buf[20];
printf("converted: %s\n", itoa(n, buf, sizeof(buf)));
return EXIT_SUCCESS;
}
I ran across this while researching the proper loop to use for a situation I have. I believe this will fully satisfy a common situation where a do...while loop is a better implementation than a while loop (C#, since you stated that is your primary language for work).
I am generating a list of strings based on the results of an SQL query. The object returned by my query is a SqlDataReader. This object has a function called Read(), which advances the object to the next row of data and returns true if there was another row. It will return false if there is not another row.
Using this information, I want to return each row to a list, then stop when there is no more data to return. A do...while loop works best in this situation, as it ensures that adding an item to the list happens BEFORE checking whether there is another row. The reason this must be done BEFORE checking the while (condition) is that when it checks, it also advances. Using a while loop in this situation would cause it to bypass the first row, due to the nature of that particular function.
In short:
This won't work in my situation.
//This will skip the first row because Read() returns true after advancing.
while (_read.Read())
{
list.Add(_read.GetValue(0).ToString());
}
return list;
This will.
//This will make sure the currently read row is added before advancing.
do
{
list.Add(_read.GetValue(0).ToString());
}
while (_read.Read());
return list;
