I have a loop (below) that loops N number of times based on user input. the loop, calls a method that creates a random string of text for an insert into a database. I want the loop to call this method before it executes the query, so every insert into the database has a different random string of characters.
What seems to be happening, is that the loop runs too quickly, and the random string is inserted about 50 times because the dynamic string variable doesn't get updated quick enough. However, if I throw in a Thread.Sleep(50), the code executes perfectly.
I don't like the thread.sleep option because I don't know exactly how long it needs to sleep, and this time will add up if we start running a few hundred thousand transactions. does anyone have a good solution to ensuring that the method executes completely before it moves on?
for (int i = 0; i < nLoop; i++)
{
rnd.RndName();
query.CommandText = "insert into XXX (col";
query.ExecuteNonQuery();
}
What seems to be happening, is that the loop runs too quickly, and the random string is inserted about 50 times because the dynamic string variable doesn't get updated quick enough .
The instructions inside your loop will be executed one after the other.
Unless rnd.RndName() fires up a separate thread (in which case, show that code) it will complete before the following two statements execute.
If the name is not changing, the problem lies elsewhere.
however, if i throw in a thread.sleep(50), the code executes perfectly.
Nothing in the code you have shown would be sensitive to a thread sleep. If that is having some effect, the issue lies in how rnd.RndName() is implemented. Perhaps you are creating a new instance of Random each time (as suggested in the comment by #rynah)? If so, the instance is initialized using the system time. That would cause the behavior you observe.
The Random class does not really generate random numbers. It generates a deterministic series of numbers for a given seed value. If you seed to the current number of ticks (which I believe Random does), creating many Random instances in quick succession will cause them all to have the same seed, and therefore to produce the exact same sequence of numbers.
Related
This question already has answers here:
Random number generator only generating one random number
(15 answers)
Closed 5 years ago.
Yesterday I wrote my first answer at Programming Puzzles & Code Golf. The question said this:
Given an input string S, print S followed by a non-empty separator
in the following way:
Step 1: S has a 1/2 chance of being printed, and a 1/2 chance for the program to terminate.
Step 2: S has a 2/3 chance of being printed, and a 1/3 chance for the program to terminate.
Step 3: S has a 3/4 chance of being printed, and a 1/4 chance for the program to terminate.
…
Step n: S has a n/(n+1) chance of being printed, and a 1/(n+1) chance for the program to terminate.
So I went and wrote this code (ungolfed):
Action<string> g = s =>
{
var r = new Random();
for (var i = 2; r.Next(i++) > 0;)
Console.Write(s + " ");
};
This code works fine, but then someone said that I could save a few bytes creating the r variable inline, like this:
Action<string> g = s =>
{
for (var i = 2; new Random().Next(i++) > 0;)
Console.Write(s + " ");
};
I tried but when I executed the code, it always went in one of two possibilities:
Either the program halted before printing anything (the first call to Next() returns 0), or
The program never stops (the calls to Next() never return 0).
When I reverted the code to my original proposal, the program stopped more randomly as expected by the OP.
I know that the new Random() constructor depends on time, but how much? If I add a Sleep() call, the code behaviour starts to seem really random (but not much, the strings returned are still longer than the ones returned by the initial code):
Action<string> g = s =>
{
for (var i = 2; new Random().Next(i++) > 0; Thread.Sleep(1))
Console.Write(s + " ");
};
If I increment the sleep time to 10 ms, now the code really behaves like the original one.
So why is this? How much does the Random class depends on time? How exactly does the Random class seeds the number generator when calling the empty constructor?
Note: I know that creating a single Random object is the best practice, I just wanted to know a bit more than what the MSDN says:
The default seed value is derived from the system clock and has finite resolution.
What is that "finite resolution" the Random class default constructor uses as seed? How much time should we separate the construction of two Random objects to get different sequences? How much would those two different sequences differ when creating the Random instances too close in time?
What is that "finite resolution" the Random class default constructor uses as seed?
It uses Environment.TickCount which has a resolution of one millisecond.
How much time should we separate the construction of two Random objects to get different sequences?
As per the previous section, by at least one millisecond - or manually feed another seed to the constructor each time (or, well, reuse the same generator?)
How much would those two different sequences differ when creating the Random instances too close in time?
Matt Moss did a nice visualization in his Random Headaches from System.Random blog post:
Each row of the bitmap represents five first generated numbers with that seed (without preserving generated order):
As you can see, numbers are being selected from distinct but related sequences. As MSDN on Random says, "The chosen numbers... are sufficiently random for practical purposes."
I have a question regarding thread ordering for the TPL.
Indeed, it is very important for me that my Parallel.For loop to be executed in the order of the loop. What I mean is that given 4 threads, i would like the first thread to execute every 4k loop, 2nd thread every 4k+1 etc with (k between 0 and, NbSim/4).
1st thread -> 1st loop, 2nd thread -> 2nd loop , 3rd thread -> 3rd loop
4th thread -> 4th loop , 1th thread -> 5th loop etc ...
I have seen the OrderedPartition directive but I am not quite sure of the way I should apply it to a FOR loop and not to a Parallel.FOREACH loop.
Many Thanks for your help.
Follwing the previous remkarks, I am completing the description :
Actually, after some consideration, I believe that my problem is not about ordering.
Indeed, I am working on a Monte-Carlo engine, in which for each iteration I am generating a set of random numbers (always the same (seed =0)) and then apply some business logic to them. Thus everything should be deterministic and when running the algorithm twice I should get the exact same results. But unfortunately this is not the case, and I am strugeling to understand why. Any idea, on how to solve that kind of problems (without printing out every variable I have)?
Edit Number 2:
Thank you all for your suggestions
First, here is the way my code is ordered :
ParallelOptions options = new ParallelOptions();
options.MaxDegreeOfParallelism = 4; //or 1
ParallelLoopResult res = Parallel.For<LocalDataStruct>(1,NbSim, options,
() => new LocalDataStruct(//params of the constructor of LocalData),
(iSim, loopState, localDataStruct) => {
//logic
return localDataStruct;
}, localDataStruct => {
lock(syncObject) {
//critical section for outputting the parameters
});
When setting the degreeofParallelism to 1,everything works fine, however when setting the degree of Parallelism to 4 I am getting results that are false and non deterministic (when running the code twice I get different results). It is probably due to mutable objects that is what I am checking now, but the source code is quite extensive, so it takes time. Do you think that there is a good strategy to check the code other than review it (priniting out all variables is impossible in this case (> 1000)? Also when setting the Nb of Simulation to 4 for 4 threads everything is working fine as well, mostly due to luck I believe ( that s why I metionned my first idea regarding ordering).
You can enforce ordering in PLINQ but it comes at a cost. It gives ordered results but does not enforce ordering of execution.
You really cannot do this with TPL without essentially serializing your algorithm. The TPL works on a Task model. It allows you to schedule tasks which are executed by the scheduler with no guarantee as to the order in which the Tasks are executed. Typically parallel implementations take the PLINQ approach and guarantee ordering of results not ordering of execution.
Why is ordered execution important?
So. For a Monte-Carlo engine you would need to make sure that each index in your array received the same random numbers. This does not mean that you need to order your threads, just make the random numbers are ordered across the work done by each thread. So if each loop of your ParallelForEach was passed not only the array of elements to do work on but also it's own instance of a random number generator (with a different fixed seed per thread) then you will still get deterministic results.
I'm assuming that you are familiar with the challenges related to parallelizing Monte-Carlo and generating good random number sequences. If not here's something to get you started;Pseudo-random Number Generation for
Parallel Monte Carlo—A Splitting Approach, Fast, High-Quality, Parallel Random-Number Generators: Comparing Implementations.
Some suggestions
I would start off by ensuring that you can get deterministic results in the sequential case by replacing the ParallelForEach with a ForEach and see if this runs correctly. You could also try comparing the output of a sequential and a parallel run, add some diagnostic output and pipe it to a text file. Then use a diff tool to compare the results.
If this is OK then it is something to do with your parallel implementation, which as is pointed out below is usually related to mutable state. Some things to consider:
Is your random number generator threadsafe? Random is a poor random number generator at best and as far as I know is not designed for parallel execution. It is certainly not suitable for M-C calculations, parallel or otherwise.
Does your code have other state shared between threads, if so what is it? This state will be mutated in a non-deterministic manner and effect your results.
Are you merging results from different threads in parallel. The non-associativity of parallel floating point operations will also cause you issues here, see How can floating point calculations be made deterministic?. Even if the thread results are deterministic if you are combining them in a non deterministic way you will still have issues.
Assuming all threads share the same random number generator, then although you are generating the same sequence every time, which thread gets which elements of this sequence is non-deterministic. Hence you could arrive at different results.
That's if the random number generator is thread-safe; if it isn't, then it's not even guaranteed to generate the same sequence when called from multiple threads.
Apart from that it is difficult to theorize what could be causing non-determinism to arise; basically any global mutable state is suspicious. Each Task should be working with its own data.
If, rather than using a random number generator, you set up an array [0...N-1] of pre-determined values, say [0, 1/N, 2/N, ...], and do a Parallel.ForEach on that, does it still give nondeterministic results? If so, the RNG isn't the issue.
I am creating a live editor in Windows 8.1 App using JavaScript. Almost done with that, but the problem is whenever I run such bad loops or functions then it automatically hangs or exits.
I test it with a loop such as:( It just a example-user may write its loop in its own way..)
for(i=0;i<=50000;i++)
{
for(j=0;j<5000;j++){
$('body').append('hey I am a bug<br>');
}
}
I know that this is a worst condition for any app or browser to handle that kind of loop. So here I want that if user uses such a loop then how I handle it, to produce their output?
Or if its not possible to protect my app for that kind of loop, if it is dangerous to my app so I alert the user that:
Running this snippet may crash the app!
I have an idea to check the code by using regular expressions if code have something like for(i=0;i<=5000;i++) then the above alert will show, how to do a Regex for that?
Also able to include C# as back-end .
Unfortunately, without doing some deep and complex code analysis of the edited code, you'll not be able to fully prevent errant JavaScript that kills your application. You could use, for example, a library that builds an abstract syntax tree from JavaScript and not allow code execution if certain patterns are found. But, the number of patterns that could cause an infinite loop are large, so it would not be simple to find, and it's likely to not be robust enough.
In the for example, you could modify the code to be like this:
for(i=0;!timeout() && i<=50000;i++)
{
for(j=0;!timeout() && j<5000;j++){
$('body').append('hey I am a bug<br>');
}
}
I've "injected" a call to a function you'd write called timeout. In there, it would need to be able to detect whether the loop should be aborted because the script has been running too long.
But, that could have been written with a do-while, so that type of loop would need to be handled.
The example of using jQuery for example in a tight loop, and modifying the DOM means that solutions that trying to isolate the JavaScript into a Web Worker would be complex, as it's not allowed to manipulate the DOM directly. It can only send/receive "string" messages.
If you had used the XAML/C# WebView to host (and build) the JavaScript editor, you could have considered using an event that is raised called WebView.LongRunningScriptDetected. It is raised when a long running script is detected, providing the host the ability to kill the script before the entire application becomes unresponsive and is killed.
Unfortunately, this same event is not available in the x-ms-webview control which is available in a WinJS project.
I've got 2 solutions:
1.
My first solution would be defining a variable
startSeconds=new Date().getSeconds();.
Then, using regex, I'm inserting this piece of code inside the nested loop.
;if(startSecond < new Date().getSeconds())break;
So, what it does is each time the loop runs, it does two things:
Checks if startSecond is less than current seconds new Date().getSeconds();.
For example, startSecond may be 22. new Date().getSeconds() may return 24.Now, the if condition succeeds so it breaks the loop.
Mostly, a non dangerous loop should run for about 2 to 3 seconds
Small loops like for(var i=0;i<30;i++){} will run fully, but big loops will run for 3 to 4 seconds, which is perfectly ok.
My solution uses your own example of 50000*5000, but it doesn't crash!
Live demo:http://jsfiddle.net/nHqUj/4
2.
My second solution would be defining two variables start, max.
Max should be the maximum number of loops that you are willing to run. Example 1000.
Then, using regex, I'm inserting this piece of code inside the nested loop.
;start+=1;if(start>max)break;
So, what it does is each time the loop runs, it does two things:
Increments the value of start by 1.
Checks whether start is greater than the max. If yes, it breaks the loop.
This solution also uses your own example of 50000*5000, but it doesn't crash!
Updated demo:http://jsfiddle.net/nHqUj/3
Regex I'm using:(?:(for|while|do)\s*\([^\{\}]*\))\s*\{([^\{\}]+)\}
One idea, but not sure what is your editor is capable of..
If some how you can understand that this loop may cause problem(like if a loop is more than 200 times then its a issue) and for a loop like that from user if you can change the code to below to provide the output then it will not hang. But frankly not sure if it will work for you.
var j = 0;
var inter = setInterval( function(){
if( j<5000 ){
$('#test').append('hey I am a bug<br>');
++j;
} else {
clearInterval(inter);
}
}, 100 );
Perhaps inject timers around for loops and check time at the first line. Do this for every loop.
Regex: /for\([^{]*\)[\s]*{/
Example:
/for\([^{]*\)[\s]*{/.test("for(var i=0; i<length; i++){");
> true
Now, if you use replace and wrap the for in a grouping you can get the result you want.
var code = "for(var i=0; i<length; i++){",
testRegex = /(?:for\([^{]*\)[\s]*{)/g,
matchReplace = "var timeStarted = new Date().getTime();" +
"$1" +
"if (new Date().getTime() - timeStarted > maxPossibleTime) {" +
"return; // do something here" +
"}";
code.replace(textRegex, matchReplace);
You cannot find what user is trying to do with a simple regex. Lets say, the user writes his code like...
for(i=0;i<=5;i++)
{
for(j=0;j<=5;j++){
if(j>=3){
i = i * 5000;
j = j * 5000;
}
$('body').append('hey I am a bug<br>');
}
}
Then with a simple regex you cannot avoid this. Because the value of i is increased after a time period. So the best way to solve the problem is to have a benchmark. Say, your app hangs after continuos processing of 3 minutes(Assume, until your app hits 3 minutes of processing time, its running fine). Then, whatever the code the user tries to run, you just start a timer before the process and if the process takes more than 2.5 minutes, then you just kill that process in your app and raise a popup to the user saying 'Running this snippet may crash the app!'... By doing this way you dont even need a regex or to verify users code if it is bad...
Try this... Might help... Cheers!!!
Let's assume you are doing this in the window context and not in a worker. Put a function called rocketChair in every single inner loop. This function is simple. It increments a global counter and checks the value against a global ceiling. When the ceiling is reached rocketChair summarily throws "eject from perilous code". At this time you can also save to a global state variable any state you wish to preserve.
Wrap your entire app in a single try catch block and when rocket chair ejects you can save the day like the hero you are.
I have a very strange problem that I never been in touch with in my entire life.
This is what I been up to:
I have programmed a game that involves you going to throw two dices and the sum of the two dices should be seven for you to win.
This is how the interface is built:
The textbox1 shows the value of first thrown dice.
The textbox2 shows the value of second thrown dice.
The textbox3 shows the sum of the both dices.
The button1 throws the dices.
This is the problem:
When i Debugg (F5) the application in Visual Studio 2013 Ultimate
the textboxes gets the exactly same value all the time. This is wrong, it shouldn't act like this.
When i Step Into (F11) the application/code the textboxes gets
different values, just as it should be, this is right, this is how the program should act.
Is there anyone that can help with this problem, i think that I have just missed a very small but a obvious thing that I have missed but I really can't find anything, I'm actually out of ideas!
Attachments
Here is all the files, I hope it will help you, the program is written in Swedish but I don't think that makes any problem, if it do, I can translate the whole solution to English.
The whole Solution: Throw_Dices.zip
The Code: Big picture on three screens of the code
From MSDN:
different Random objects that are created in close succession by a
call to the default constructor will have identical default seed
values and, therefore, will produce identical sets of random numbers
In your Kasta.cs, create a static instance of Random instead of multiple ones.
public class Tarning
{
private static Random ran = new Random();
int slump;
public int Kasta()
{
//Random ran = new Random();
slump = ran.Next(1, 6);
return slump;
}
}
Another possibility would be to create a seed manually. For instance like
public int Kasta()
{
byte[] seed = new byte[4];
new RNGCryptoServiceProvider().GetBytes(seed);
int seedInt = BitConverter.ToInt32(seed, 0);
Random ran = new Random(seedInt);
slump = ran.Next(1, 6);
return slump;
}
Instead of creating two Dice (Tarning ?)
Create one and roll it twice.
Or create both on start up, Or perhaps have a class that holds 2 dice.
and throw them again.
You should also google random and seeding, what's happening is from the same seed value, you get the same sequence of random numbers. Debugging is introducing enough of a delay between the new Random calls, that the seed (based on the clock) has changed between the two calls.
PS your button1Click handler
should set the three textbox values, not trigger textbox changed events which then set them. Imagine if you wanted to reuse your code, you'd have to create a UI to do it.
A better way would be to have a class that held two (or n) dice with a Roll method and a property that returned the result. Then you could reuse it without worrying about when and how.
Short Question
I have a loop that runs 180,000 times. At the end of each iteration it is supposed to append the results to a TextBox, which is updated real-time.
Using MyTextBox.Text += someValue is causing the application to eat huge amounts of memory, and it runs out of available memory after a few thousand records.
Is there a more efficient way of appending text to a TextBox.Text 180,000 times?
Edit I really don't care about the result of this specific case, however I want to know why this seems to be a memory hog, and if there is a more efficient way to append text to a TextBox.
Long (Original) Question
I have a small app which reads a list of ID numbers in a CSV file and generates a PDF report for each one. After each pdf file is generated, the ResultsTextBox.Text gets appended with the ID Number of the report that got processed and that it was successfully processed. The process runs on a background thread, so the ResultsTextBox gets updated real-time as items get processed
I am currently running the app against 180,000 ID numbers, however the memory the application is taking up is growing exponentially as time goes by. It starts by around 90K, but by about 3000 records it is taking up roughly 250MB and by 4000 records the application is taking up about 500 MB of memory.
If I comment out the update to the Results TextBox, the memory stays relatively stationary at roughly 90K, so I can assume that writing ResultsText.Text += someValue is what is causing it to eat memory.
My question is, why is this? What is a better way of appending data to a TextBox.Text that doesn't eat memory?
My code looks like this:
try
{
report.SetParameterValue("Id", id);
report.ExportToDisk(ExportFormatType.PortableDocFormat,
string.Format(#"{0}\{1}.pdf", new object[] { outputLocation, id}));
// ResultsText.Text += string.Format("Exported {0}\r\n", id);
}
catch (Exception ex)
{
ErrorsText.Text += string.Format("Failed to export {0}: {1}\r\n",
new object[] { id, ex.Message });
}
It should also be worth mentioning that the app is a one-time thing and it doesn't matter that it is going to take a few hours (or days :)) to generate all the reports. My main concern is that if it hits the system memory limit, it will stop running.
I'm fine with leaving the line updating the Results TextBox commented out to run this thing, but I would like to know if there is a more memory efficient way of appending data to a TextBox.Text for future projects.
I suspect the reason the memory usage is so large is because textboxes maintain a stack so that the user can undo/redo text. That feature doesn't seem to be required in your case, so try setting IsUndoEnabled to false.
Use TextBox.AppendText(someValue) instead of TextBox.Text += someValue. It's easy to miss since it's on TextBox, not TextBox.Text. Like StringBuilder, this will avoid creating copies of the entire text each time you add something.
It would be interesting to see how this compares to the IsUndoEnabled flag from keyboardP's answer.
Don't append directly to the text property. Use a StringBuilder for the appending, then when done, set the .text to the finished string from the stringbuilder
Instead of using a text box I would do the following:
Open up a text file and stream the errors to a log file just in case.
Use a list box control to represent the errors to avoid copying potentially massive strings.
Personally, I always use string.Concat* . I remember reading a question here on Stack Overflow years ago that had profiling statistics comparing the commonly-used methods, and (seem) to recall that string.Concat won out.
Nonetheless, the best I can find is this reference question and this specific String.Format vs. StringBuilder question, which mentions that String.Format uses a StringBuilder internally. This makes me wonder if your memory hog lies elsewhere.
**based on James' comment, I should mention that I never do heavy string formatting, as I focus on web-based development.*
Maybe reconsider the TextBox? A ListBox holding string Items will probably perform better.
But the main problem seem to be the requirements, Showing 180,000 items cannot be aimed at a (human) user, neither is changing it in "Real Time".
The preferable way would be to show a sample of the data or a progress indicator.
When you do want to dump it at the poor User, batch string updates. No user could descern more than 2 or 3 changes per second. So if you produce 100/second, make groups of 50.
Some responses have alluded to it, but nobody has outright stated it which is surprising.
Strings are immutable which means a String cannot be modified after it is created. Therefore, every time you concatenate to an existing String, a new String Object needs to be created. The memory associated with that String Object also obviously needs to be created, which can get expensive as your Strings become larger and larger. In college, I once made the amateur mistake of concatenating Strings in a Java program that did Huffman coding compression. When you're concatenating extremely large amounts of text, String concatenation can really hurt you when you could have simply used StringBuilder, as some in here have mentioned.
Use the StringBuilder as suggested.
Try to estimate the final string size then use that number when instantiating the StringBuilder. StringBuilder sb = new StringBuilder(estSize);
When updating the TextBox just use assignment eg: textbox.text = sb.ToString();
Watch for cross-thread operations as above. However use BeginInvoke. No need to block
the background thread while the UI updates.
A) Intro: already mentioned, use StringBuilder
B) Point: don't update too frequently, i.e.
DateTime dtLastUpdate = DateTime.MinValue;
while (condition)
{
DoSomeWork();
if (DateTime.Now - dtLastUpdate > TimeSpan.FromSeconds(2))
{
_form.Invoke(() => {textBox.Text = myStringBuilder.ToString()});
dtLastUpdate = DateTime.Now;
}
}
C) If that's one-time job, use x64 architecture to stay within 2Gb limit.
StringBuilder in ViewModel will avoid string rebindings mess and bind it to MyTextBox.Text. This scenario will increase performance many times over and decrease memory usage.
Something that has not been mentioned is that even if you're performing the operation in the background thread, the update of the UI element itself HAS to happen on the main thread itself (in WinForms anyway).
When updating your textbox, do you have any code that looks like
if(textbox.dispatcher.checkAccess()){
textbox.text += "whatever";
}else{
textbox.dispatcher.invoke(...);
}
If so, then your background op is definitely being bottlenecked by the UI Update.
I would suggest that your background op use StringBuilder as noted above, but instead of updating the textbox every cycle, try updating it at regular intervals to see if it increases performance for you.
EDIT NOTE:have not used WPF.
You say memory grows exponentially. No, it is a quadratic growth, i.e. a polynomial growth, which is not as dramatic as an exponential growth.
You are creating strings holding the following number of items:
1 + 2 + 3 + 4 + 5 ... + n = (n^2 + n) /2.
With n = 180,000 you get total memory allocation for 16,200,090,000 items, i.e. 16.2 billion items! This memory will not be allocated at once, but it is a lot of cleanup work for the GC (garbage collector)!
Also, bear in mind, that the previous string (which is growing) must be copied into the new string 179,999 times. The total number of copied bytes goes with n^2 as well!
As others have suggested, use a ListBox instead. Here you can append new strings without creating a huge string. A StringBuild does not help, since you want to display the intermediate results as well.