I have a large for-loop (30k max iterations) that seems to be consistently slowing down:
The first thousand iterations take 1.34s
After 12k iterations, the next thousand take 5.31s
After 23k iterations, the next thousand take 6.65s
The last thousand iterations take 7.43s
To gain a little performance I switched from a foreach loop to a for loop and tried a Release configuration, but I can't find anything else in this question that applies to me. The loop is in an async method.
Why does the loop slow down? Can it be avoided?
for (int iter = 0; iter < LargeList1.Count; iter++)
{
    var cl_from = LargeList1[iter];
    if (LargeList2.Any(cl => cl.str.Contains(cl_from.str)))
    {
        DateTime dt1 = /* last write time of a file */;
        DateTime dt2 = /* last write time of a different file */;
        if (DateTime.Compare(dt1, dt2) > 0)
        {
            try
            {
                CopyFile(/* Kernel32 CopyFile a file, overwrite */);
                globals.fileX++;
            }
            catch (Exception filexx)
            {
                // error handler
            }
        }
        else
        {
            globals.fileS++;
        }
    }
    else
    {
        Directory.CreateDirectory(/* create a directory, no check if it already exists */);
        try
        {
            CopyFile(/* Kernel32 CopyFile a file, do not overwrite */);
            globals.fileX++;
        }
        catch (Exception filex)
        {
            // error handler
        }
    }
    gui.UpdateCount(globals.fileF, globals.fileX, globals.fileS); // updates iteration counts on textboxes
    float p = (float)100.0 * ((float)globals.fileF + (float)globals.fileX + (float)globals.fileS) / (float)globals.totalCount;
    gui.setProgress(p); // updates progressbar
}
Edit: as many suggested, using HashSet.Contains(cl_from.str) solved the problem.
The nature of these two items is, I can imagine, the bottleneck:
for (int iter = 0; iter < LargeList1.Count; iter++)
{
    .....
    if (LargeList2.Any(cl => cl.str.Contains(cl_from.str)))
    ...........
You are checking whether the current string is contained within any string from the other large list.
A few reasons why it probably is slower over time:
Initially it's faster because the GC doesn't run as much; as you get further into the loop, the GC has to collect more and more often.
Is the length of the string cl_from.str possibly getting larger?
Some points to consider:
How big are cl_from.str and LargeList2? Is it worth creating a hash set of all the LargeList2 strings and checking lookups against that, perhaps even iterating over each candidate substring of cl_from.str? (See the sketch after this list.)
You probably want to improve your searching algorithm, e.g. check out C# .NET: Fastest way to check if a string occurs within a string, or search for other string-search indexing/algorithms. Why not use something like Lucene.NET?
Use a .NET profiler to find out where the bottleneck is, and where it is spending time.
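For instance, here is a minimal sketch of the hash-set idea the asker ended up using. Note this is an exact-match lookup, not the substring match of the original Any(...) call; the asker's edit suggests exact matching was sufficient. It assumes System.Linq is imported.

// Build once, before the loop, so each iteration is an O(1) lookup
// instead of a scan over all of LargeList2.
var knownStrings = new HashSet<string>(LargeList2.Select(cl => cl.str));

for (int iter = 0; iter < LargeList1.Count; iter++)
{
    var cl_from = LargeList1[iter];
    if (knownStrings.Contains(cl_from.str))
    {
        // ... copy/compare logic as before ...
    }
}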
Another possibility is that the file system is causing you trouble. If you have thousands of files in a single folder, opening a file can take a very long time. The system loads the directory and does a sequential search to find the entry for the file you requested.
If you're getting a list of the files in the directory and then opening them one-by-one, it will take an increasingly long time to open a file as you get further into the list. If you have, for example:
foreach (var filename in Directory.GetFiles(...))
{
    // start stopwatch
    // open the file
    // stop stopwatch
    // display time
    // close the file
}
You'll find that the time to open the file increases as you get further in the list of files. That difference isn't really significant when you're talking about a few hundred files, but it becomes quite evident when you have 10,000 files in a single folder.
The solution is to break things up so that you don't have so many files in a folder. Rather than 10,000 files in a single folder, have 100 folders with 100 files each. Or 10 folders with 1,000 files each. Either is going to be much faster than a single folder with a huge number of files.
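If it helps, here is one hedged sketch of that idea in C#. The 100-subfolder shard count and the helper name are arbitrary choices for illustration, not anything from the question.

// Hypothetical helper: spread files across 100 subfolders based on a
// stable hash of the file name, so no single directory grows huge.
static string GetShardedPath(string rootDir, string fileName)
{
    int hash = 0;
    foreach (char c in fileName)
        hash = unchecked(hash * 31 + c); // stable across runs, unlike string.GetHashCode on newer runtimes

    string shard = ((hash & 0x7fffffff) % 100).ToString("D2"); // "00" .. "99"
    string dir = Path.Combine(rootDir, shard);
    Directory.CreateDirectory(dir); // no-op if it already exists
    return Path.Combine(dir, fileName);
}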
I'm pretty sure the offending line will be:
if(LargeList2.Any(cl => cl.str.Contains(cl_from.str)))
The 'Any' extension method with the lambda expression will ultimately be slower than the longhand version due to the overhead of the anonymous delegate, so you could try expanding out thus:
foreach (var cl in LargeList2)
{
    if (cl.str.Contains(cl_from.str))
    {
        // do stuff
        break;
    }
}
But I think this iteration is where the slowness creeps in, what with the anonymous delegate and the string function. It is likely that the iteration has to work through more and more of LargeList2 to find a match as the loop progresses, meaning a slowdown in performance as the process goes on.
Try using ANTS performance profiler with line-level timings enabled to see if it shows this line as the hotspot after running this process.
I am trying to increase the speed of some UI Automation operations. I stumbled across the (not so well) documented caching possibility.
From what I have understood, the whole operation (if you have a large GUI tree) is so slow because for every single function call there has to be a process change (kind of like going to kernel mode, I suppose, speed-wise?!). So... in comes caching.
Simply tell the function to cache an element and its children, and then work on it lightning-fast. (From what I understand, you only have one context change, and assemble all the data you need in one go.)
Good idea, but... it is just as slow for me as the uncached variation. I wrote some simple test code and could not see an improvement.
AutomationElement ae; // element whose siblings are to be examined; there are quite a few siblings
AutomationElement sibling;

#region non-cached
watch.Start();
for (int i = 0; i < 10; ++i)
{
    sibling = TreeWalker.RawViewWalker.GetFirstChild(TreeWalker.RawViewWalker.GetParent(ae));
    while (sibling != null)
    {
        sibling = TreeWalker.RawViewWalker.GetNextSibling(sibling);
    }
}
watch.Stop();
System.Diagnostics.Debug.WriteLine("Execution time without cache: " + watch.ElapsedMilliseconds + " ms.");
#endregion

#region cached
watch.Reset();
watch.Start();
CacheRequest cacheRequest = new CacheRequest();
cacheRequest.TreeScope = TreeScope.Children | TreeScope.Element; // for testing I chose a minimal set
AutomationElement parent;
for (int j = 0; j < 10; ++j)
{
    using (cacheRequest.Activate())
    {
        parent = TreeWalker.RawViewWalker.GetParent(ae, cacheRequest);
    }
    int cnt = parent.CachedChildren.Count;
    for (int i = 0; i < cnt; ++i)
    {
        sibling = parent.CachedChildren[i];
    }
}
watch.Stop();
System.Diagnostics.Debug.WriteLine("Execution time parentcache: " + watch.ElapsedMilliseconds + " ms.");
#endregion
The setup is: you get an element and want to check on all its (many) siblings. Both implementations, without and with cache, are given.
The output (debug mode):
Execution time without cache: 1130 ms.
Execution time parentcache: 1271 ms.
Why is this not working? How to improve?
Thanks for any ideas!!!
I would not expect the two loops to differ much in their running time, since in both cases the parent's children must be fully walked (in the first explicitly, and in the second to populate the cache.) I do think, though, that the time required for looping through the parent.CachedChildren array is much lower than your initial walk code. At that point, the elements should be cached, and you can use them without re-walking the tree.
The general point is that you can't get performance for free, since you need to invest the time to actually populate the cache before it becomes useful.
Actually, I recently found the time to check it again, against different UI elements.
For caching to be faster, one first has to specify all (and only) the needed elements and properties.
There is a real performance difference if you, e.g., look at a ComboBox with 100 elements inside, or something similar like a very complicated GUI with a lot of elements on the same hierarchy level, and you actually need the data from all of them.
So caching is not a solution for everything, but rather a performance-optimization tool that has to be applied with knowledge about the situation, the requirements, and the internal workings.
BTW, talking about performance, I found out that each .Current access on a UI element takes very roughly 20 ms, so for complex operations this really can add up to a significant time.
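For what it's worth, a minimal sketch of "specify all (and only) the needed elements" with the standard UIAutomationClient API looks something like this; which properties you add depends entirely on what you read afterwards:

CacheRequest request = new CacheRequest();
request.TreeScope = TreeScope.Element | TreeScope.Children;
// Cache only the properties you will actually read later.
request.Add(AutomationElement.NameProperty);
request.Add(AutomationElement.ControlTypeProperty);

AutomationElement parent;
using (request.Activate())
{
    parent = TreeWalker.RawViewWalker.GetParent(ae, request);
}

// These reads come from the cache - no further cross-process calls.
foreach (AutomationElement child in parent.CachedChildren)
{
    string name = child.Cached.Name;
}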
I was hoping to get some advice on how to speed up the following function. Specifically, I'm hoping to find a faster way to convert numbers (mostly doubles; IIRC there's one int in there) to strings to store as ListView subitems. As it stands, this function takes 9 seconds to process 16 orders! Absolutely insane, especially considering that with the exception of the call to process the DateTimes, it's all just string conversion.
I thought it was the actual displaying of the ListView items that was slow, so I did some research and found that adding all subitems to an array and using AddRange was far faster than adding the items one at a time. I implemented the change, but got no better speed.
I then added some stopwatches around each line to narrow down exactly what's causing the slowdown; unsurprisingly, the call to the DateTime function is the biggest slowdown, but I was surprised to see that the string.Format calls were extremely slow as well and, given the number of them, make up the majority of my time.
private void ProcessOrders(List<MyOrder> myOrders)
{
    lvItems.Items.Clear();
    marketInfo = new MarketInfo();
    ListViewItem[] myItems = new ListViewItem[myOrders.Count];
    string[] mySubItems = new string[8];
    int counter = 0;
    MarketInfo.GetTime();
    CurrentTime = MarketInfo.CurrentTime;
    DateTime OrderIssueDate = new DateTime();
    foreach (MyOrder myOrder in myOrders)
    {
        string orderIsBuySell = "Buy";
        if (!myOrder.IsBuyOrder)
            orderIsBuySell = "Sell";
        var listItem = new ListViewItem(orderIsBuySell);
        mySubItems[0] = (myOrder.Name);
        mySubItems[1] = (string.Format("{0:g}", myOrder.QuantityRemaining) + "/" + string.Format("{0:g}", myOrder.InitialQuantity));
        mySubItems[2] = (string.Format("{0:f}", myOrder.Price));
        mySubItems[3] = (myOrder.Local);
        if (myOrder.IsBuyOrder)
        {
            if (myOrder.Range == -1)
                mySubItems[4] = ("Local");
            else
                mySubItems[4] = (string.Format("{0:g}", myOrder.Range));
        }
        else
            mySubItems[4] = ("N/A");
        mySubItems[5] = (string.Format("{0:g}", myOrder.MinQuantityToBuy));
        string IssueDateString = (myOrder.DateWhenIssued + " " + myOrder.TimeWhenIssued);
        if (DateTime.TryParse(IssueDateString, out OrderIssueDate))
            mySubItems[6] = (string.Format(MarketInfo.ParseTimeData(CurrentTime, OrderIssueDate, myOrder.Duration)));
        else
            mySubItems[6] = "Error getting date";
        mySubItems[7] = (string.Format("{0:g}", myOrder.ID));
        listItem.SubItems.AddRange(mySubItems);
        myItems[counter] = listItem;
        counter++;
    }
    lvItems.BeginUpdate();
    lvItems.Items.AddRange(myItems.ToArray());
    lvItems.EndUpdate();
}
Here's the time data from a sample run:
0: 166686
1: 264779
2: 273716
3: 136698
4: 587902
5: 368816
6: 955478
7: 128981
Where the numbers are equal to the indexes of the array. All other lines were so low in ticks as to be negligible compared to these.
Although I'd like to be able to use the number formatting of string.Format for pretty output, I'd like even more to be able to load a list of orders within my lifetime, so if there's an alternative to string.Format that's considerably faster but without the bells and whistles, I'm all for it.
Edit: Thanks to all of the people who suggested the myOrder class might be using getter methods rather than actually storing the variables as I originally thought. I checked that and sure enough, that was the cause of my slowdown. Although I don't have access to the class to change it, I was able to piggyback onto the method call to populate myOrders and copy each of the variables to a list within the same call, then use that list when populating my Listview. Populates pretty much instantly now. Thanks again.
I find it hard to believe that simple string.Format calls are causing your slowness problems - it's generally a very fast call, especially for nice simple ones like most of yours.
But one thing that might give you a few microseconds...
Replace
string.Format("{0:g}", myOrder.MinQuantityToBuy)
with
myOrder.MinQuantityToBuy.ToString("g")
This will work when you're doing a straight format of a single value, but isn't any good for more complex calls.
I put all the string.Format calls into a loop and was able to run them all 1 million times in under a second, so your problem isn't string.Format... it's somewhere else in your code.
Perhaps some of these properties have logic in their getter methods? What sort of times do you get if you comment out all the code for the listview?
It is definitely not string.Format that is slowing you down. Suspect the property accesses from myOrder.
On one of the format calls, try declaring a few local variables, set those to the properties you're trying to format, then pass those local variables to your string.Format and re-time it. You may find that your string.Format now runs at lightning speed, as it should.
Now, property accesses usually don't require much time to run. However, I've seen some classes where each property access is logged (for audit trail). Check if this is the case and if some operation is holding your property access from returning immediately.
If there is some operation holding a property access, try to queue up those operations (e.g. queue up the logging calls) and have a background thread execute them. Return the property access immediately.
Also, never put slow-running code (e.g. elaborate calculations) into a property accessor/getter, nor code that has side effects. People using the class will not be aware that it is slow (since most property accesses are fast) or has side effects (since most property accesses do not have side effects). If the access is slow, rename it to a GetXXX() function. If it has side effects, name the method something that conveys that fact.
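As a hedged sketch of the "copy to locals" idea (the OrderSnapshot type here is hypothetical, invented for illustration; it mirrors what the asker eventually did):

// Read each (potentially slow) property exactly once, up front,
// then build the ListView rows from these plain fields.
class OrderSnapshot
{
    public string Name;
    public double QuantityRemaining;
    public double InitialQuantity;
    public double Price;
    public bool IsBuyOrder;
}

var snapshots = new List<OrderSnapshot>(myOrders.Count);
foreach (MyOrder myOrder in myOrders)
{
    snapshots.Add(new OrderSnapshot
    {
        Name = myOrder.Name,
        QuantityRemaining = myOrder.QuantityRemaining,
        InitialQuantity = myOrder.InitialQuantity,
        Price = myOrder.Price,
        IsBuyOrder = myOrder.IsBuyOrder
    });
}
// ...then format from snapshots instead of from myOrders.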
Wow. I feel a little stupid now. I've spent hours beating my head against the wall trying to figure out why a simple string operation would be taking so long. MarketOrders is (I thought) an array of myOrders, which is populated by an explicit call to a method which is severely restricted as far as how times per second it can be run. I don't have access to that code to check, but I had been assuming that myOrders were simple structs with member variables that were assigned when MarketOrders is populated, so the string.format calls would simply be acting on existing data. On reading all of the replies that point to the access of the myOrder data as the culprit, I started thinking about it and realized that MarketOrders is likely just an index, rather than an array, and the myOrder info is being read on demand. So every time I call an operation on one of its variables, I'm calling the slow lookup method, waiting for it to become eligible to run again, returning to my method, calling the next lookup, etc. No wonder it's taking forever.
Thanks for all of the replies. I can't believe that didn't occur to me.
I am glad that you got your issue solved. However I did a small refactoring on your method and came up with this:
private void ProcessOrders(List<MyOrder> myOrders)
{
    lvItems.Items.Clear();
    marketInfo = new MarketInfo();
    ListViewItem[] myItems = new ListViewItem[myOrders.Count];
    string[] mySubItems = new string[8];
    int counter = 0;
    MarketInfo.GetTime();
    CurrentTime = MarketInfo.CurrentTime;
    // ReSharper disable TooWideLocalVariableScope
    DateTime orderIssueDate;
    // ReSharper restore TooWideLocalVariableScope
    foreach (MyOrder myOrder in myOrders)
    {
        string orderIsBuySell = myOrder.IsBuyOrder ? "Buy" : "Sell";
        var listItem = new ListViewItem(orderIsBuySell);
        mySubItems[0] = myOrder.Name;
        mySubItems[1] = string.Format("{0:g}/{1:g}", myOrder.QuantityRemaining, myOrder.InitialQuantity);
        mySubItems[2] = myOrder.Price.ToString("f");
        mySubItems[3] = myOrder.Local;
        if (myOrder.IsBuyOrder)
            mySubItems[4] = myOrder.Range == -1 ? "Local" : myOrder.Range.ToString("g");
        else
            mySubItems[4] = "N/A";
        mySubItems[5] = myOrder.MinQuantityToBuy.ToString("g");
        // This code smells:
        string issueDateString = string.Format("{0} {1}", myOrder.DateWhenIssued, myOrder.TimeWhenIssued);
        if (DateTime.TryParse(issueDateString, out orderIssueDate))
            mySubItems[6] = MarketInfo.ParseTimeData(CurrentTime, orderIssueDate, myOrder.Duration);
        else
            mySubItems[6] = "Error getting date";
        mySubItems[7] = myOrder.ID.ToString("g");
        listItem.SubItems.AddRange(mySubItems);
        myItems[counter] = listItem;
        counter++;
    }
    lvItems.BeginUpdate();
    lvItems.Items.AddRange(myItems.ToArray());
    lvItems.EndUpdate();
}
This method should be further refactored:
Remove outer dependencies, with Inversion of Control (IoC) in mind, by using dependency injection (DI);
Create a new property "DateTimeWhenIssued" for MyOrder that returns a DateTime. This should be used instead of joining two strings (DateWhenIssued and TimeWhenIssued) and then parsing them into a DateTime;
Rename ListViewItem, as this is a built-in class;
ListViewItem should have a new constructor for the boolean IsBuyOrder: var listItem = new ListViewItem(myOrder.IsBuyOrder), instead of the string "Buy" or "Sell";
The "mySubItems" string array should be replaced with a class, for better readability and extensibility;
Lastly, the foreach (MyOrder myOrder in myOrders) could be replaced with a "for" loop, as you are using a counter anyway. Besides, "for" loops are faster too.
Hopefully you do not mind my suggestions and that they are doable in your situation.
PS. Are you using generic collections? The ListViewItem.SubItems property could then be:
public List<string> SubItems { get; set; }
Possible Duplicates:
While vs. Do While
When should I use do-while instead of while loops?
I've been programming for a while now (2 years work + 4.5 years degree + 1 year pre-college), and I've never used a do-while loop short of being forced to in the Introduction to Programming course. I have a growing feeling that I'm doing programming wrong if I never run into something so fundamental.
Could it be that I just haven't run into the correct circumstances?
What are some examples where it would be necessary to use a do-while instead of a while?
(My schooling was almost all in C/C++ and my work is in C#, so if there is another language where it absolutely makes sense because do-whiles work differently, then these questions don't really apply.)
To clarify... I know the difference between a while and a do-while: while checks the exit condition and then performs tasks, whereas do-while performs tasks and then checks the exit condition.
Use it if you always want the loop to execute at least once. It's not common, but I do use it from time to time. One case where you might want to use it is trying to access a resource that could require a retry, e.g.:
do
{
    try to access resource...
    put up message box with retry option
} while (user says retry);
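A concrete C# version of that pseudocode might look like this (AccessResource is a hypothetical stand-in for whatever call can fail):

bool retry;
do
{
    retry = false;
    try
    {
        AccessResource(); // hypothetical: whatever operation might fail
    }
    catch (IOException ex)
    {
        // Offer the user a retry; Retry/Cancel maps naturally onto the loop.
        retry = MessageBox.Show(ex.Message, "Error",
            MessageBoxButtons.RetryCancel) == DialogResult.Retry;
    }
} while (retry);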
do-while is better if the compiler isn't competent at optimization. do-while has only a single conditional jump, as opposed to for and while which have a conditional jump and an unconditional jump. For CPUs which are pipelined and don't do branch prediction, this can make a big difference in the performance of a tight loop.
Also, since most compilers are smart enough to perform this optimization, all loops found in decompiled code will usually be do-while (if the decompiler even bothers to reconstruct loops from backward local gotos at all).
I have used this in a TryDeleteDirectory function. It was something like this
do
{
    try
    {
        DisableReadOnly(directory);
        directory.Delete(true);
    }
    catch (Exception)
    {
        retryDeleteDirectoryCount++;
    }
} while (Directory.Exists(fullPath) && retryDeleteDirectoryCount < 4);
do-while is useful when you want to execute something at least once. As for a good example of using do-while vs. while, let's say you want to make the following: a calculator.
You could approach this by using a loop and checking after each calculation if the person wants to exit the program. Now you can probably assume that once the program is opened the person wants to do this at least once so you could do the following:
do
{
    // do calculator logic here
    // prompt user for continue here
} while (cont == true); // cont is short for continue
This is sort of an indirect answer, but this question got me thinking about the logic behind it, and I thought this might be worth sharing.
As everyone else has said, you use a do ... while loop when you want to execute the body at least once. But under what circumstances would you want to do that?
Well, the most obvious class of situations I can think of would be when the initial ("unprimed") value of the check condition is the same as when you want to exit. This means that you need to execute the loop body once to prime the condition to a non-exiting value, and then perform the actual repetition based on that condition. What with programmers being so lazy, someone decided to wrap this up in a control structure.
So for example, reading characters from a serial port with a timeout might take the form (in Python):
response_buffer = []
char_read = port.read(1)
while char_read:
    response_buffer.append(char_read)
    char_read = port.read(1)
# When there's nothing to read after 1s, there is no more data
response = ''.join(response_buffer)
Note the duplication of code: char_read = port.read(1). If Python had a do ... while loop, I might have used:
do:
    char_read = port.read(1)
    response_buffer.append(char_read)
while char_read
The added benefit for languages that create a new scope for loops: char_read does not pollute the function namespace. But note also that there is a better way to do this, and that is by using Python's None value:
response_buffer = []
char_read = None
while char_read != '':
    char_read = port.read(1)
    response_buffer.append(char_read)
response = ''.join(response_buffer)
So here's the crux of my point: in languages with nullable types, the situation initial_value == exit_value arises far less frequently, and that may be why you do not encounter it. I'm not saying it never happens, because there are still times when a function will return None to signify a valid condition. But in my hurried and briefly-considered opinion, this would happen a lot more if the languages you used did not allow for a value that signifies: this variable has not been initialised yet.
This is not perfect reasoning: in reality, now that null-values are common, they simply form one more element of the set of valid values a variable can take. But practically, programmers have a way to distinguish between a variable being in sensible state, which may include the loop exit state, and it being in an uninitialised state.
I used them a fair bit when I was in school, but not so much since.
In theory they are useful when you want the loop body to execute once before the exit condition check. The problem is that for the few instances where I don't want the check first, typically I want the exit check in the middle of the loop body rather than at the very end. In that case, I prefer to use the well-known for (;;) with an if (condition) exit; somewhere in the body.
In fact, if I'm a bit shaky on the loop exit condition, sometimes I find it useful to start writing the loop as a for (;;) {} with an exit statement where needed, and then when I'm done I can see if it can be "cleaned up" by moving initializations, exit conditions, and/or increment code inside the for's parentheses.
A situation where you always need to run a piece of code once, and depending on its result, possibly more times. The same can be produced with a regular while loop as well.
rc = get_something();
while (rc == wrong_stuff)
{
    rc = get_something();
}

do
{
    rc = get_something();
}
while (rc == wrong_stuff);
It's as simple as that:
precondition vs postcondition
while (cond) {...} - precondition, it executes the code only after checking.
do {...} while (cond) - postcondition, code is executed at least once.
Now that you know the secret .. use them wisely :)
do while is if you want to run the code block at least once. while on the other hand won't always run depending on the criteria specified.
I see that this question has been adequately answered, but would like to add this very specific use case scenario. You might start using do...while more frequently.
do
{
    ...
} while (0)
is often used for multi-line #defines. For example:
#define compute_values  \
    area = pi * r * r;  \
    volume = area * h
This works alright for:
r = 4;
h = 3;
compute_values;
-but- there is a gotcha for:
if (shape == circle) compute_values;
as this expands to:
if (shape == circle) area = pi * r * r;
volume = area * h;
If you wrap it in a do ... while(0) loop it properly expands to a single block:
if (shape == circle)
do
{
area = pi * r * r;
volume = area * h;
} while (0);
The answers so far summarize the general use for do-while. But the OP asked for an example, so here is one: Get user input. But the user's input may be invalid - so you ask for input, validate it, proceed if it's valid, otherwise repeat.
With do-while, you get the input while the input is not valid. With a regular while-loop, you get the input once, but if it's invalid, you get it again and again until it is valid. It's not hard to see that the former is shorter, more elegant, and simpler to maintain if the body of the loop grows more complex.
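A minimal C# sketch of that pattern (the range check is just an arbitrary example of validation):

int value;
bool valid;
do
{
    Console.Write("Enter a number between 1 and 10: ");
    valid = int.TryParse(Console.ReadLine(), out value)
            && value >= 1 && value <= 10;
    if (!valid)
        Console.WriteLine("Invalid input, try again.");
} while (!valid);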
I've used it for a reader that reads the same structure multiple times.
using (IDataReader reader = connection.ExecuteReader())
{
    do
    {
        while (reader.Read())
        {
            // Read record
        }
    } while (reader.NextResult());
}
I can't imagine how you've gone this long without using a do...while loop.
There's one on another monitor right now and there are multiple such loops in that program. They're all of the form:
do
{
    GetProspectiveResult();
}
while (!ProspectIsGood());
I like to understand these two as:
while -> 'repeat until',
do ... while -> 'repeat if'.
I've used a do while when I'm reading a sentinel value at the beginning of a file, but other than that, I don't think it's abnormal that this structure isn't too commonly used--do-whiles are really situational.
-- file --
5
Joe
Bob
Jake
Sarah
Sue
-- code --
int MAX = a.readLine(); // sentinel: how many entries follow
int count = 0;
do {
    k[count] = a.readLine();
    count++;
} while (count < MAX);
Here's my theory about why most people (including me) prefer while(){} loops to do{}while(): a while(){} loop can easily be adapted to perform like a do..while() loop, while the opposite is not true. A while loop is, in a certain way, "more general". Also, programmers like easy-to-grasp patterns. A while loop says right at the start what its invariant is, and that is a nice thing.
Here's what I mean about the "more general" thing. Take this do..while loop:
do {
    A;
    if (condition) INV = false;
    B;
} while (INV);
Transforming this in to a while loop is straightforward:
INV = true;
while (INV) {
    A;
    if (condition) INV = false;
    B;
}
Now, we take a model while loop:
while (INV) {
    A;
    if (condition) INV = false;
    B;
}
And transforming this into a do..while loop yields this monstrosity:
if (INV) {
    do
    {
        A;
        if (condition) INV = false;
        B;
    } while (INV);
}
Now we have two checks on opposite ends and if the invariant changes you have to update it on two places. In a certain way do..while is like the specialized screwdrivers in the tool box which you never use, because the standard screwdriver does everything you need.
I have been programming for about 12 years, and only 3 months ago did I meet a situation where it was really convenient to use do-while, as one iteration was always necessary before checking the condition. So I guess your big time is ahead :).
It is quite a common structure in a server/consumer:
DOWHILE (no shutdown requested)
    determine timeout
    wait for work(timeout)
    IF (there is work)
        REPEAT
            process
        UNTIL (wait for work(0 timeout) indicates no work)
        do what is supposed to be done at end of busy period
    ENDIF
ENDDO
the REPEAT UNTIL(cond) being a do {...} while(!cond)
Sometimes the wait for work(0) can be cheaper CPU wise (even eliminating the timeout calculation might be an improvement with very high arrival rates). Moreover, there are many queuing theory results that make the number served in a busy period an important statistic. (See for example Kleinrock - Vol 1.)
Similarly:
DOWHILE (no shutdown requested)
    determine timeout
    wait for work(timeout)
    IF (there is work)
        set throttle
        REPEAT
            process
        UNTIL (--throttle < 0 OR wait for work(0 timeout) indicates no work)
    ENDIF
    check for and do other (perhaps polled) work
ENDDO
where "check for and do other work" may be exorbitantly expensive to put in the main loop, or perhaps you have a kernel that does not support an efficient waitany(waitcontrol*, n) type operation, or perhaps a situation where a prioritized queue might starve the other work, and throttle is used as starvation control.
This type of balancing can seem like a hack, but it can be necessary. Blind use of thread pools would entirely defeat the performance benefit of a caretaker thread with a private queue for a complicated, frequently updated data structure, since a thread pool, unlike a caretaker thread, would require a thread-safe implementation.
I really don't want to get into a debate about the pseudo code (for example, whether shutdown requested should be tested in the UNTIL) or caretaker threads versus thread pools - this is just meant to give a flavor of a particular use case of the control flow structure.
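That said, here is a rough C# rendering of the inner REPEAT/UNTIL, using a BlockingCollection from System.Collections.Concurrent as the work queue (WorkItem, Process, and the timeout are placeholders, not part of the pseudocode above):

var queue = new BlockingCollection<WorkItem>(); // WorkItem is hypothetical

while (!shutdownRequested)
{
    WorkItem item;
    if (queue.TryTake(out item, timeoutMs)) // wait for work(timeout)
    {
        do
        {
            Process(item); // hypothetical per-item work
        }
        while (queue.TryTake(out item, 0)); // wait for work(0 timeout)

        // end of busy period: do the per-burst housekeeping here
    }
}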
This is my personal opinion, but this question begs for an answer rooted in experience:
I have been programming in C for 38 years, and I never use do / while loops in regular code.
The only compelling use for this construct is in macros where it can wrap multiple statements into a single statement via a do { multiple statements } while (0)
I have seen countless examples of do / while loops with bogus error detection or redundant function calls.
My explanation for this observation is that programmers tend to model problems incorrectly when they think in terms of do / while loops. They either miss an important ending condition, or they miss the possible failure of the initial condition which they move to the end.
For these reasons, I have come to believe that where there is a do / while loop, there is a bug, and I regularly challenge newbie programmers to show me a do / while loop where I cannot spot a bug nearby.
This type of loop can be easily avoided: use a for (;;) { ... } and add the necessary termination tests where they are appropriate. It is quite common that there need be more than one such test.
Here is a classic example:
/* skip the line */
do {
    c = getc(fp);
} while (c != '\n');
This will fail if the file does not end with a newline. A trivial example of such a file is the empty file.
A better version is this:
int c; // another classic bug is to define c as char
while ((c = getc(fp)) != EOF && c != '\n')
    continue;
Alternately, this version also hides the c variable:
for (;;) {
    int c = getc(fp);
    if (c == EOF || c == '\n')
        break;
}
Try searching for while (c != '\n'); in any search engine, and you will find bugs such as this one (retrieved June 24, 2017):
In ftp://ftp.dante.de/tex-archive/biblio/tib/src/streams.c , function getword(stream,p,ignore), has a do / while and sure enough at least 2 bugs:
c is defined as a char and
there is a potential infinite loop while (c!='\n') c=getc(stream);
Conclusion: avoid do / while loops and look for bugs when you see one.
while loops check the condition before the loop; do...while loops check the condition after the loop. This is useful if you want to base the condition on side effects from the loop running or, like other posters said, if you want the loop to run at least once.
I understand where you're coming from, but the do-while is something that most use rarely, and I've never used myself. You're not doing it wrong.
You're not doing it wrong. That's like saying someone is doing it wrong because they've never used the byte primitive. It's just not that commonly used.
The most common scenario I run into where I use a do/while loop is in a little console program that runs based on some input and will repeat as many times as the user likes. Obviously it makes no sense for a console program to run no times; but beyond the first time it's up to the user -- hence do/while instead of just while.
This allows the user to try out a bunch of different inputs if desired.
do
{
    int input = GetInt("Enter any integer");
    // Do something with input.
}
while (GetBool("Go again?"));
I suspect that software developers use do/while less and less these days, now that practically every program under the sun has a GUI of some sort. It makes more sense with console apps, as there is a need to continually refresh the output to provide instructions or prompt the user with new information. With a GUI, in contrast, the text providing that information to the user can just sit on a form and never need to be repeated programmatically.
I use do-while loops all the time when reading in files. I work with a lot of text files that include comments in the header:
# some comments
# some more comments
column1 column2
1.234 5.678
9.012 3.456
... ...
I'll use a do-while loop to read up to the "column1 column2" line so that I can look for the column of interest. Here's the pseudocode:
do {
    line = read_line();
} while (line[0] == '#');
/* parse line */
Then I'll do a while loop to read through the rest of the file.
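In C#, the same shape might look like this (assuming, as with my files, that there is always at least one header line before the data):

using (var reader = new StreamReader(path))
{
    string line;
    // Skip the '#' comment block; the body runs at least once, which is
    // fine because the first line always needs to be read anyway.
    do
    {
        line = reader.ReadLine();
    } while (line != null && line.StartsWith("#"));

    // 'line' now holds the column header ("column1 column2").
    while ((line = reader.ReadLine()) != null)
    {
        // parse the data row
    }
}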
Being a geezer programmer, I did many school programming projects that used text menu driven interactions. Virtually all used something like the following logic for the main procedure:
do
    display options
    get choice
    perform action appropriate to choice
while choice is something other than exit
Since school days, I have found that I use the while loop more frequently.
One of the applications I have seen for it is in Oracle, when we look at result sets.
Once you have a result set, you first fetch from it (do), and from that point on check whether the fetch returns an element or not (while element found). The same might apply to any other "fetch-like" implementation.
I've used it in a function that returned the next character position in a UTF-8 string:
char *next_utf8_character(const char *txt)
{
    if (!txt || *txt == '\0')
        return (char *)txt;
    do {
        txt++;
    } while ((((unsigned char) *txt) & 0xc0) == 0x80); /* skip UTF-8 continuation bytes (10xxxxxx) */
    return (char *)txt;
}
Note that this function was written from memory and not tested. The point is that you have to take the first step anyway, and you have to take it before you can evaluate the condition.
Any sort of console input works well with do-while because you prompt the first time, and re-prompt whenever the input validation fails.
Even though there are plenty of answers, here is my take. It all comes down to optimization. I'll show two examples where one is faster than the other.
Case 1: while
string fileName = string.Empty, fullPath = string.Empty;
while (string.IsNullOrEmpty(fileName) || File.Exists(fullPath))
{
    fileName = Guid.NewGuid().ToString() + fileExtension;
    fullPath = Path.Combine(uploadDirectory, fileName);
}
Case 2: do while
string fileName = string.Empty, fullPath = string.Empty;
do
{
    fileName = Guid.NewGuid().ToString() + fileExtension;
    fullPath = Path.Combine(uploadDirectory, fileName);
}
while (File.Exists(fullPath));
These two will do exactly the same thing. But there is one fundamental difference: the while version requires an extra condition just to enter the loop. Which is ugly, because let's say every possible value the Guid class can produce has already been taken except for one variant. This means I'll have to loop around 5,316,911,983,139,663,491,615,228,241,121,400,000 times.
Every time I get to the end of my while statement, I will need to do the string.IsNullOrEmpty(fileName) check. So this would take up a tiny extra fraction of CPU work. But multiply that very small task by the number of possible combinations the Guid class has, and we are talking about hours, days, or months of extra time.
Of course this is an extreme example, because you probably wouldn't see this in production. But if we think about the YouTube ID algorithm, it is very well possible that they would encounter the generation of an ID where some IDs have already been taken. So it comes down to big projects and optimization.
Even in educational references you would barely find a do...while example. Only recently, after reading Ethan Brown's beautiful book Learning JavaScript, did I encounter one well-defined do...while example. That being said, I believe it is OK if you don't find an application for this structure in your routine job.
It's true that do/while loops are pretty rare. I think this is because a great many loops are of the form
while (something needs doing)
    do it;
In general, this is an excellent pattern, and it has the usually-desirable property that if nothing needs doing, the loop runs zero times.
But once in a while, there's some fine reason why you definitely want to make at least one trip through the loop, no matter what. My favorite example is: converting an integer to its decimal representation as a string, that is, implementing printf("%d"), or the semistandard itoa() function.
To illustrate, here is a reasonably straightforward implementation of itoa(). It's not quite the "traditional" formulation; I'll explain it in more detail below if anyone's curious. But the key point is that it embodies the canonical algorithm, repeatedly dividing by 10 to pick off digits from the right, and it's written using an ordinary while loop... and this means it has a bug.
#include <stddef.h>

char *itoa(unsigned int n, char buf[], int bufsize)
{
    if (bufsize < 2) return NULL;
    char *p = &buf[bufsize];
    *--p = '\0';
    while (n > 0) {
        if (p == buf) return NULL;
        *--p = n % 10 + '0';
        n /= 10;
    }
    return p;
}
If you didn't spot it, the bug is that this code returns nothing — an empty string — if you ask it to convert the integer 0. So this is an example of a case where, when there's "nothing" to do, we don't want the code to do nothing — we always want it to produce at least one digit. So we always want it to make at least one trip through the loop. So a do/while loop is just the ticket:
do {
    if (p == buf) return NULL;
    *--p = n % 10 + '0';
    n /= 10;
} while (n > 0);
So now we have a loop that usually stops when n reaches 0, but if n is initially 0 — if you pass in a 0 — it returns the string "0", as desired.
As promised, here's a bit more information about the itoa function in this example. You pass it arguments which are: an int to convert (actually, an unsigned int, so that we don't have to worry about negative numbers); a buffer to render into; and the size of that buffer. It returns a char * pointing into your buffer, pointing at the beginning of the rendered string. (Or it returns NULL if it discovers that the buffer you gave it wasn't big enough.) The "nontraditional" aspect of this implementation is that it fills in the array from right to left, meaning that it doesn't have to reverse the string at the end — and also meaning that the pointer it returns to you is usually not to the beginning of the buffer. So you have to use the pointer it returns to you as the string to use; you can't call it and then assume that the buffer you handed it is the string you can use.
Finally, for completeness, here is a little test program to test this version of itoa with.
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int n;
    if (argc > 1)
        n = atoi(argv[1]);
    else {
        printf("enter a number: "); fflush(stdout);
        if (scanf("%d", &n) != 1) return EXIT_FAILURE;
    }
    if (n < 0) {
        fprintf(stderr, "sorry, can't do negative numbers yet\n");
        return EXIT_FAILURE;
    }
    char buf[20];
    printf("converted: %s\n", itoa(n, buf, sizeof(buf)));
    return EXIT_SUCCESS;
}
I ran across this while researching the proper loop to use for a situation I have. I believe this will fully satisfy a common situation where a do..while loop is a better implementation than a while loop (C#, since you stated that is your primary language for work).
I am generating a list of strings based on the results of an SQL query. The object returned by my query is a SqlDataReader. This object has a function called Read(), which advances the object to the next row of data and returns true if there was another row. It will return false if there is not another row.
Using this information, I want to return each row to a list, then stop when there is no more data to return. A do...while loop works best in this situation, as it ensures that adding an item to the list happens BEFORE checking whether there is another row. The reason this must be done BEFORE checking the while(condition) is that when it checks, it also advances. Using a while loop in this situation would cause it to bypass the first row due to the nature of that particular function.
In short:
This won't work in my situation.
// This will skip the first row because Read() returns true after advancing.
while (_read.Read())
{
    list.Add(_read.GetValue(0).ToString());
}
return list;
This will.
//This will make sure the currently read row is added before advancing.
do
{
list.Add(_read.GetValue(0).ToString());
}
while (_read.NextResult());
return list;
I have built an application that is used to simulate the number of products that a company can produce in different "modes" per month. This simulation is used to aid in finding the optimal series of modes to run in for a month to best meet the projected sales forecast for the month. This application has been working well until recently, when the plant was modified to run in additional modes. It is now possible to run in 16 modes. For a month with 22 work days this yields 9,364,199,760 possible combinations, up from 8 modes in the past, which would have yielded a mere 1,560,780 possible combinations. The PC that runs this application is on the old side and cannot handle the number of calculations before an out-of-memory exception is thrown. In fact, the entire application cannot support more than 15 modes because it uses integers to track the number of combinations, and that count exceeds the upper limit of an integer. Barring that issue, I need to do what I can to reduce the memory utilization of the application and optimize it to run as efficiently as possible, even if it cannot achieve the stated goal of 16 modes. I was considering writing the data to disk rather than storing the list in memory, but before I take on that overhead I would like to get people's opinions on the method, to see if there is any room for optimization there.
EDIT
Based on a suggestion by a few to consider something more academic than merely calculating every possible answer, below is a brief explanation of how the optimal run (combination of modes) is chosen.
Currently the computer determines every possible way that the plant can run for the number of work days that month. For example, 3 modes for a max of 2 work days would result in the combinations (where the number represents the mode chosen) of (1,1), (1,2), (1,3), (2,2), (2,3), (3,3). For each mode, a product produces at a different rate of production; for example, in mode 1, product x may produce at 50 units per hour, where product y produces at 30 units per hour and product z produces at 0 units per hour. Each combination is then multiplied by work hours and production rates. The run that produces numbers that most closely match the forecasted value for each product for the month is chosen. However, because some months the plant does not meet the forecasted value for a product, the algorithm increases the priority of a product for the next month to ensure that at the end of the year the product has met the forecasted value. Since warehouse space is tight, it is important that products not overproduce too much either.
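(For reference, those counts are exactly "combinations with repetition": choosing how many of the k interchangeable work days go to each of n modes gives C(n + k - 1, k). A small helper, included here only as a sanity check on the numbers above:)

// Number of multisets of size k drawn from n options: C(n + k - 1, k).
static long CombinationsWithRepetition(int n, int k)
{
    long result = 1;
    for (int i = 1; i <= k; i++)
        result = result * (n - 1 + i) / i; // exact at every step
    return result;
}

// CombinationsWithRepetition(8, 22)  -> 1,560,780
// CombinationsWithRepetition(16, 22) -> 9,364,199,760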
Thank you
private List<List<int>> _modeIterations = new List<List<int>>();

private void CalculateCombinations(int modes, int workDays, string combinationValues)
{
    List<int> _tempList = new List<int>();
    if (modes == 1)
    {
        combinationValues += Convert.ToString(workDays);
        string[] _combinations = combinationValues.Split(',');
        foreach (string _number in _combinations)
        {
            _tempList.Add(Convert.ToInt32(_number));
        }
        _modeIterations.Add(_tempList);
    }
    else
    {
        for (int i = workDays + 1; --i >= 0; )
        {
            CalculateCombinations(modes - 1, workDays - i, combinationValues + i + ",");
        }
    }
}
This kind of optimization problem is difficult but extremely well-studied. You should probably read up in the literature on it rather than trying to re-invent the wheel. The keywords you want to look for are "operations research" and "combinatorial optimization problem".
It is well known in the study of optimization problems that finding the optimal solution to a problem is almost always computationally infeasible as the problem grows large, as you have discovered for yourself. However, it is frequently the case that finding a solution guaranteed to be within a certain percentage of the optimal solution is feasible. You should probably concentrate on finding approximate solutions. (After all, your sales targets are already just educated guesses, therefore finding the optimal solution is already going to be impossible; you haven't got complete information.)
What I would do is start by reading the wikipedia page on the Knapsack Problem:
http://en.wikipedia.org/wiki/Knapsack_problem
This is the problem of "I've got a whole bunch of items of different values and different weights, I can carry 50 pounds in my knapsack, what is the largest possible value I can carry while meeting my weight goal?"
This isn't exactly your problem, but clearly it is related -- you've got a certain amount of "value" to maximize, and a limited number of slots to pack that value into. If you can start to understand how people find near-optimal solutions to the knapsack problem, you can apply that to your specific problem.
You could process the permutation as soon as you have generated it, instead of collecting them all in a list first:
public delegate void Processor(List<int> args);

private void CalculateCombinations(int modes, int workDays, string combinationValues, Processor processor)
{
    if (modes == 1)
    {
        List<int> _tempList = new List<int>();
        combinationValues += Convert.ToString(workDays);
        string[] _combinations = combinationValues.Split(',');
        foreach (string _number in _combinations)
        {
            _tempList.Add(Convert.ToInt32(_number));
        }
        processor.Invoke(_tempList);
    }
    else
    {
        for (int i = workDays + 1; --i >= 0; )
        {
            CalculateCombinations(modes - 1, workDays - i, combinationValues + i + ",", processor);
        }
    }
}
I am assuming here that your current pattern of work is something along the lines of:
CalculateCombinations(initial_value_1, initial_value_2, initial_value_3);
foreach (List<int> list in _modeIterations)
{
    ... process the list ...
}
With the direct-process approach, this would become:
private void ProcessPermutation(List<int> args)
{
    ... process ...
}

// ... somewhere else ...
CalculateCombinations(initial_value_1, initial_value_2, initial_value_3, ProcessPermutation);
I would also suggest that you try to prune the search tree as early as possible; if you can already tell that certain combinations of the arguments will never yield anything that can be processed, you should catch those during generation and avoid the recursion altogether, if possible.
In newer versions of C#, generating the combinations with an iterator function might let you retain the original structure of your code. I haven't really used this feature (yield) as of yet, so I cannot comment on it.
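For what it's worth, here is a rough sketch of that yield-based shape, assuming the goal is to stream each combination to the caller instead of accumulating them all in memory (not tested against the original's exact output):

private IEnumerable<List<int>> Combinations(int modes, int workDays)
{
    if (modes == 1)
    {
        yield return new List<int> { workDays };
    }
    else
    {
        for (int i = workDays; i >= 0; i--)
        {
            foreach (List<int> rest in Combinations(modes - 1, workDays - i))
            {
                rest.Insert(0, i); // prepend this mode's day count
                yield return rest;
            }
        }
    }
}

// Usage: only one combination is alive at a time.
// foreach (List<int> combination in Combinations(16, 22)) { ... process ... }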
The problem lies more in the brute-force approach than in the code itself. It's possible that brute force is the only way to approach the problem, but I doubt it. Chess, for example, is unsolvable by brute force, but computers play it quite well using heuristics to discard the less promising approaches and focus on good ones. Maybe you should take a similar approach.
On the other hand, we need to know how each "mode" is evaluated in order to suggest any heuristics. In your code you're only computing all possible combinations, which won't scale if the modes go up to 32... even if you store them on disk.
if (modes == 1)
{
    List<int> _tempList = new List<int>();
    combinationValues += Convert.ToString(workDays);
    string[] _combinations = combinationValues.Split(',');
    foreach (string _number in _combinations)
    {
        _tempList.Add(Convert.ToInt32(_number));
    }
    processor.Invoke(_tempList);
}
Everything in this block of code is executed over and over again, so no line in it should use memory without freeing it. The most obvious place to avoid memory churn is to write combinationValues out to disk as it is processed (i.e. use a FileStream, not a string). I think that in general, doing string concatenation the way you are doing it here is bad, since every concatenation allocates a new string. At least use a StringBuilder (see Back to Basics, which discusses the same issue in terms of C). There may be other places with issues, though. The simplest way to figure out why you are getting an out-of-memory error may be to use a memory profiler (download link from download.microsoft.com).
By the way, my tendency with code like this is to have a global List object that is Clear()ed rather than having a temporary one that is created over and over again.
I would replace the List objects with my own class that uses preallocated arrays to hold the ints. I'm not really sure about this right now, but I believe that each integer in a List is boxed, which means much more memory is used than with a simple array of ints.
Edit: On the other hand it seems I am mistaken: Which one is more efficient : List<int> or int[]