Threads executing the same code block - passing parameters from a DataSet (C#)

Let me preface my question by stating that I am a casual developer, so I don't really know what I am doing past the basics.
This is developed in C# on .NET 3.5.
The core of my current application connects to remote servers, executes a WMI call, retrieves some data and then places that data into a database. It is a simple health-check application.
The basic code runs fine, but I ran into an issue: if a few servers were offline, each took about a minute to time out (which is realistic because of network bandwidth etc.). One run of the application took 45 minutes because 40 servers were offline, which is not efficient: roughly 40 of those 45 minutes were spent waiting on timeouts.
After some research I think that using threads would be the best way around this, spawning a thread for each server as it is processed.
Here is my thread code:
for (int x = 0; x < mydataSet.Tables[0].Rows.Count; x++)
{
    Thread ts0 = new Thread(() =>
        executeSomeSteps(mydataSet.Tables[0].Rows[x]["IPAddress"].ToString(),
                         mydataSet.Tables[0].Rows[x]["ID"].ToString(),
                         connString, filepath));
    ts0.Start();
}
The dataset contains an ID reference and an IP address. The executeSomeSteps function looks like this:
static void executeSomeSteps(string IPAddress, string ID, string connstring, string filepath)
{
    string executeStuff;
    executeStuff = funclib.ExecuteSteps(IPAddress, ID, connstring, filepath);
    executeStuff = null;
}
And executeSomeSteps inserts data into a database based on the returned WMI results. This process works fine, as mentioned earlier, but the problem is that some of the threads in the above for loop end up with the same data, so the steps execute more than once per server. There are often up to 5 records for a single server once the process completes.
From the research I have done, I believe the issue is more than one thread reading the same x value from the dataset.
So now onto my questions:
Assume there are 10 records in the dataset:
Why are there more than 10 executions happening?
Will I still gain the performance benefit if I lock the dataset value?
Can someone point me in the right direction for dealing with variable data being passed to a static function by multiple threads?

What Davide refers to is that, at the time your thread executes, the values captured from Rows[x] may be different (are even likely to be different) from when the delegate was created. This is because the for loop keeps going while the threads start running. This is a very common gotcha; it can even happen without servers timing out.
The solution to this "modified closure" problem is to use new variables for each thread:
for (int x = 0; x < mydataSet.Tables[0].Rows.Count; x++)
{
    string ip = mydataSet.Tables[0].Rows[x]["IPAddress"].ToString();
    string id = mydataSet.Tables[0].Rows[x]["ID"].ToString();
    Thread ts0 = new Thread(() => executeSomeSteps(ip, id, connString, filepath));
    ts0.Start();
}
You may even encounter a System.ArgumentOutOfRangeException, because when the for loop has finished, the final x++ has executed, making x one higher than the highest row index. Any thread that reaches its Rows[x] call after that will throw the exception.
Edit
This issue kept bugging me. I think what you describe in your comment (it looked like extra records were being generated by one iteration) is exactly what the modified closure does. A few threads happen to start at roughly the same time, all taking the value of x at that moment. You must also have found that servers were skipped in a given run; I cannot imagine that not happening.
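To see the gotcha in isolation, here is a minimal standalone sketch (not from the original post); run a few times, it typically prints duplicated and skipped values, and can even print 5:

using System;
using System.Threading;

class ClosureDemo
{
    static void Main()
    {
        for (int x = 0; x < 5; x++)
        {
            // each lambda captures the variable x itself, not its value at
            // this iteration, so a thread may read it after the loop moved on
            new Thread(() => Console.WriteLine(x)).Start();
        }
    }
}

Copying the value into a fresh local per iteration, as with ip and id above, gives each lambda its own variable; that is exactly the fix shown earlier.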

Related

How do I keep two (or more) threads that work on the same table at the same time from working on the same row?

I am trying to make a C# WinForms application that fetches data from a URL saved in a table named "Links". Each link has a "LastCheck" and a "NextCheck" datetime, and there is an "Interval" that determines NextCheck based on LastCheck.
Right now, I fetch the ID with a query BEFORE doing the web scraping; then I set LastCheck to DateTime.Now and NextCheck to null until everything is completed, after which both get updated once the web scraping is done.
The problem with this is that if an ongoing process is aborted, LastCheck will hold a date but NextCheck will stay null.
So I need a better way to keep two processes from working on the same row of the same table, but I am not sure how.
For a multithreaded solution, the standard engineering approach is to use a pool of workers and a pool of work.
This is just a conceptual sketch - you should adapt it to your circumstances:
A worker (i.e. a thread) looks at the pool of work. If there is some work available, it marks it as in_progress. This has to be done so that no two threads can take the same work. For example, you could use a lock in C# to do the query in a database, and to mark a row before returning it.
You need to have a way of un-marking it after the thread finishes. Successful or not, in_progress must be re-set. Typically, you could use a finally block so that you don't miss it in the event of any exception.
If there is no work available, the thread goes to sleep.
Whenever new work arrives (i.e. an INSERT, or a nextcheck becoming due), one of the sleeping threads is awakened.
When your program starts, it should clear any in_progress flags in the event of a previous crash.
You should take advantage of DBMS transactions so that any changes a worker makes after completing its work are atomic, i.e. other threads perceive them as if they had happened all at once.
By changing the size of worker pool, you can set the maximum number of simultaneously active workers.
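A minimal in-memory sketch of that shape (a hedged illustration, not a drop-in solution): WorkItem and the MarkInProgress/Process/ClearInProgress helpers are hypothetical placeholders for your own tables and queries, and Monitor.Wait/Pulse provides the sleep/wake behaviour:

using System;
using System.Collections.Generic;
using System.Threading;

class WorkItem { public int LinkId; }

class WorkerPool
{
    private readonly object _sync = new object();
    private readonly Queue<WorkItem> _work = new Queue<WorkItem>();

    // The size of this pool caps the number of simultaneously active workers.
    public void Start(int workerCount)
    {
        for (int i = 0; i < workerCount; i++)
            new Thread(WorkerLoop) { IsBackground = true }.Start();
    }

    // Call this whenever new work arrives (an INSERT, or a nextcheck coming due).
    public void Enqueue(WorkItem item)
    {
        lock (_sync)
        {
            _work.Enqueue(item);
            Monitor.Pulse(_sync); // wake one sleeping worker
        }
    }

    private void WorkerLoop()
    {
        while (true)
        {
            WorkItem item;
            lock (_sync)
            {
                while (_work.Count == 0)
                    Monitor.Wait(_sync);  // no work available: go to sleep
                item = _work.Dequeue();   // dequeued under the lock, so no two
            }                             // workers can ever take the same item
            try
            {
                MarkInProgress(item);     // hypothetical: flag the row in the DB
                Process(item);            // hypothetical: the actual scraping
            }
            finally
            {
                ClearInProgress(item);    // always un-mark, success or failure
            }
        }
    }

    private void MarkInProgress(WorkItem item) { /* e.g. UPDATE Links SET in_progress = 1 ... */ }
    private void Process(WorkItem item) { /* fetch and scrape the URL */ }
    private void ClearInProgress(WorkItem item) { /* e.g. UPDATE Links SET in_progress = 0 ... */ }
}

Because the dequeue happens under the lock, no two workers can ever take the same item, which is the in-progress guarantee the steps above call for.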
First thing: the separation of controller/workers might be a better pattern, as mentioned in the other answer. It will work better if the number of threads gets large and the number of links to check is large.
But if your problem is this:
But problem with it is, if for any reason that scraping gets aborted/finishes halfway/doesn't work properly, LastCheck becomes DateTime.Now but NextCheck is left NULL, and previous LastCheck/NextCheck values are gone, and LastCheck/NextCheck values are updated for a link that is not actually checked
You just need to handle errors better.
The failure will result in an exception. Catch the exception and handle it by resetting the state in the database. For example:
void DoScraping(.....)
{
    // capture the current LastCheck/NextCheck values here, before starting,
    // so they can be restored if anything goes wrong
    try
    {
        // ... do the scraping, then update LastCheck/NextCheck on success
    }
    catch (Exception err)
    {
        // oh dear, it went wrong: reset LastCheck/NextCheck (e.g. to the
        // values captured above) so the row is not left half-updated
    }
}
What you reset LastCheck/NextCheck to is up to you. You could reset them to what they were at the start: when you determine 'the next thing to do', also read the current LastCheck/NextCheck values and store them in variables. Then, in the event of failure, just set them back to what they were before.

Reading database values and executing function in new Thread

I am trying to use threads in my Windows service application for the first time. Per my requirements, I have to read data from a database, and if it matches a condition I have to execute a function in a new thread. My main concern is this: since the function meant to run in the new thread is lengthy and will take time, will my program still reach the data-reader code and read new values from the database while the function keeps executing in the background thread? My application's execution logic is time-specific.
Here is the code..
while (dr.Read())
{
    time = dr["SendingTime"].ToString();
    if ((str = DateTime.Now.ToString("HH:mm")).Equals(time))
    {
        // Execute function and send reports based on the data from the database.
        Thread thread = new Thread(sendReports);
        thread.Start();
    }
}
Please help me.
Yep, as the comments said, you will have one thread per row. If you have 4-5 rows and you run that code, you'll get 4-5 threads working happily in the background.
You might be happy with it and leave it, and in half a year someone else will play with the DB, you'll get 10K rows, this will create 10K threads, and you'll be on holiday when people call you panicking because the program is broken...
In other words, you don't want to do it, because it's a bad practice.
You should either use a queue of work units with a fixed number of threads reading from it (in which case you might have 10K units queued, but, say, 10 threads picking them up and processing them until they are done), or some other mechanism to make sure you don't create a thread per row (a sketch follows below).
Unless of course, you don't care ...
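A minimal sketch of that queue-based shape, adapted to the reader loop above (a hedged fragment, not a drop-in: SendReport and the "ReportId" column are hypothetical stand-ins for your own data; requires System.Collections.Generic and System.Threading):

void SendDueReports(System.Data.IDataReader dr)
{
    // collect the due rows first, instead of starting a thread per row
    var pending = new Queue<string>();
    while (dr.Read())
    {
        if (DateTime.Now.ToString("HH:mm").Equals(dr["SendingTime"].ToString()))
            pending.Enqueue(dr["ReportId"].ToString()); // hypothetical key column
    }

    // a fixed number of threads drain the queue, however many rows there are
    object sync = new object();
    for (int i = 0; i < 10; i++)
    {
        new Thread(() =>
        {
            while (true)
            {
                string id;
                lock (sync)
                {
                    if (pending.Count == 0) return; // queue empty: worker is done
                    id = pending.Dequeue();
                }
                SendReport(id); // hypothetical: the lengthy report-sending work
            }
        }).Start();
    }
}

However many rows the query returns, at most 10 threads run at once, which is the property the answer above is after.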

Need to run migration console app: Too many records killing process - no idea how to solve

I've written a bit of code to manipulate data for a comprehensive transaction. However, I am suffering endless problems and dead ends. If I run my code on a small data set, it works as expected. But now that I've had the production environment restored to the testing DB to get a full scope of testing, as best I can tell I have basically wasted my time.
private static void AddProvisionsForEachSupplement2(ISupplementCoordinator supplmentCord)
{
    var time = DateTime.Now;
    using (var scope = new UnitOfWorkScope())
    {
        var supplements = supplmentCord.GetContracts(x => x.EffectiveDate <= new DateTime(2014, 2, 27)).AsEnumerable();
        foreach (var supplement in supplements)
        {
            var specialProvisionTable = supplement.TrackedTables.FirstOrDefault(x => x.Name == "SpecialProvisions");
            SetDefaultSpecialProvisions(specialProvisionTable, supplement);
            Console.Out.WriteLine(supplement.Id.ToString() + ": " + (DateTime.Now - time).TotalSeconds);
        }
    }
}
You can see I decided to time it: the loop alone takes roughly 300+ seconds to complete, and then the 'commit' that follows is obscenely long, probably longer than the loop itself.
I get this error:
The transaction associated with the current connection has completed but has not been disposed. The transaction must be disposed before the connection can be used to execute SQL statements.
I added [Transaction(Timeout = 6000)] just to get that far; before that I was getting a transaction timeout.
There is so much more info that we would need, so giving you a definitive answer is going to be tough. However, dealing with a large dataset in one massive transaction is always going to hurt performance.
The first thing I would do, to be honest, is hook up a SQL profiler (like NHProf, SQL Profiler, or even Log4Net) to see how many queries you are issuing. Then, as you test, you can see whether you can reduce the queries, which will in turn reduce the time.
Then, after working this out, you have a few options:
- Use a stateless session to grab the data, but this removes change tracking, so you will need to persist changes manually later
- Break the single big transaction into smaller transactions (this is what I would start with; see the sketch below)
- See if eager loading a whole lot of data in one hit is more efficient
- See if batching will help you
Don't give up; fight and understand why this is giving you issues. That is the key to becoming a master of NHibernate (rather than a slave to it).
Good luck :)
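As a hedged sketch of the 'smaller transactions' option: commit in chunks rather than in one giant transaction. It assumes your UnitOfWorkScope commits on dispose (as the original code relies on) and that GetContracts results can be paged with LINQ's Skip/Take (requires System.Linq); adapt it to your actual repository API:

private static void AddProvisionsInBatches(ISupplementCoordinator coordinator, int batchSize)
{
    int page = 0;
    while (true)
    {
        using (var scope = new UnitOfWorkScope())
        {
            var batch = coordinator
                .GetContracts(x => x.EffectiveDate <= new DateTime(2014, 2, 27))
                .Skip(page * batchSize)   // assumption: the provider supports paging
                .Take(batchSize)
                .ToList();
            if (batch.Count == 0)
                break;                    // no rows left: done
            foreach (var supplement in batch)
            {
                var table = supplement.TrackedTables.FirstOrDefault(t => t.Name == "SpecialProvisions");
                SetDefaultSpecialProvisions(table, supplement);
            }
        } // each batch commits here, so no single transaction holds all the rows
        page++;
    }
}

Each pass keeps its transaction open for only batchSize rows, so the final commit never has to flush the whole 300+ seconds' worth of changes at once.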

Issue calling database trigger with long process in C#

I want to read a huge directory with its subdirectories and files, and then write to a database. Everything is fine, but I put a trigger on a table that fires when data is inserted and updates another table. The trigger works fine with a single SQL command, but due to the long-running process in the main program, the trigger does not appear to fire. I am using queue/dequeue and a BackgroundWorker thread (C#).
How can this problem be solved? Any ideas appreciated.
I assume that the trigger is working OK, but you need all the data to be processed before you can see its effects. Therefore, I suggest that you split the data into smaller pieces (batches) and insert them into the database one by one. Choose a batch size that suits your setup and load the data iteratively.
Here is some example C# code:
public void ProcessData(String rootDirectory, int batchSize)
{
    // materialize the paths so they can be counted and paged reliably
    // (requires System.Linq; IEnumerable<string> has no Length property)
    List<string> pathsToProcess = GetPathsToProcess(rootDirectory).ToList();
    int currentBatch = 0;
    while (currentBatch * batchSize < pathsToProcess.Count)
    {
        // take a subset of the paths to process
        IEnumerable<string> batch = pathsToProcess
            .Skip(currentBatch * batchSize)
            .Take(batchSize);
        DoYourDatabaseLogic(batch);
        currentBatch++;
    }
}
The code above executes the database operation for a smaller subset of the data, after which your trigger executes against that subset. This happens for each batch. You still have to wait for all the batches to complete, but you can see the changes for the ones that have finished.
This approach does, however, bring an important issue to worry about: what happens if one of the batches fails for some reason?
If you must revert the changes for the entire pathsToProcess collection when a single batch fails, you should organize the above code to run in a single database transaction and ensure the rollback happens appropriately.
If the pathsToProcess collection does not need to be rolled back entirely, I still recommend using a transaction for each batch, as sketched below. In that case you may want to record which batch you last wrote successfully, so you can resume from it if the data has to be processed again.
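As a minimal sketch of the per-batch transaction option, assuming System.Transactions is available in your project (add a reference and a using System.Transactions; directive), the body of the while loop above could wrap each batch like this:

// inside the while loop: give each batch its own transaction
using (var tx = new TransactionScope())
{
    DoYourDatabaseLogic(batch);
    tx.Complete(); // commit; the trigger's effects for this batch become visible
}
// if an exception escapes before Complete(), disposing the scope rolls back
// this batch only, without undoing batches that already committed

You would also persist currentBatch (to a file or a status table) after each Complete(), so a rerun can resume from the last successful batch.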

BackgroundWorker Question (C# Windows Forms)

I am on MSDN reading about the BackgroundWorker class and I have a question about how it works.
The following code has a for loop in it. And inside the for loop, in the else clause, you're supposed to: Perform a time consuming operation and report progress.
But, why is there a for loop, and why is its maximum value only 10?
private void bw_DoWork(object sender, DoWorkEventArgs e)
{
BackgroundWorker worker = sender as BackgroundWorker;
for (int i = 1; (i <= 10); i++)
{
if ((worker.CancellationPending == true))
{
e.Cancel = true;
break;
}
else
{
// Perform a time consuming operation and report progress.
System.Threading.Thread.Sleep(500);
worker.ReportProgress((i * 10));
}
}
}
I have a really massive database, and it sometimes takes up to a minute to check for new orders based on certain criteria. I don't want to guess how long it may take to complete a query, I want actual progress. How can I make this background worker report progress based on a MySQL SELECT query?
How can I make this background worker report progress based on a MySQL SELECT query?
You can't. That's one of the problems with a synchronous method call: you cannot predict ahead of time how long it is going to take. You have only two points in time to work with: before you call the method, and after it returns. You get nothing in between; either the method has returned, or it has not.
You can use statistics to your advantage though. You can record how long it takes each time it executes, store that, and use that to calculate a prediction, but it's never going to be accurate. With such a prediction, you could space out progress reporting accordingly so that you end up at 100% at or around the statistical prediction you've calculated.
However, if the database is slower or faster than usual, it'll be off.
Also note that whichever thread calls into MySQL to retrieve the data cannot be the same thread that reports progress, since it will be blocked "waiting" for the MySQL database and the .NET code that talks to it to return the data, all in one piece. You need to spin up another thread to report progress; a sketch of the statistical approach follows.
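As a small sketch of that statistical idea, assuming a hypothetical RunQuery wrapper around the blocking database call (requires System.Collections.Generic, System.Diagnostics and System.Linq):

// keep a short history of how long recent queries took
private readonly Queue<TimeSpan> recentDurations = new Queue<TimeSpan>();

private TimeSpan TimedQuery()
{
    var sw = Stopwatch.StartNew();
    RunQuery();                          // hypothetical: the blocking SELECT
    sw.Stop();

    recentDurations.Enqueue(sw.Elapsed);
    if (recentDurations.Count > 20)
        recentDurations.Dequeue();       // keep only the 20 most recent runs

    // the running average becomes the prediction used to pace the progress bar
    return TimeSpan.FromMilliseconds(
        recentDurations.Average(d => d.TotalMilliseconds));
}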
But, why is there a for loop, and why is its maximum value only 10?
In the example, the worker is reporting progress between 10 and 100, purely for simplicity. The values 10 to 100 come from i (1-10) multiplied by the * 10 in ReportProgress.
The documentation says that ReportProgress takes:
The percentage, from 0 to 100, of the background operation that is complete.
When you write it for your really massive database, you must report progress as a percentage, between 0 and 100.
Given that your database may take "up to a minute", 1% is slightly more than half a second, so any associated progress bar should move every half second or so. That sounds like pretty smooth reporting to me.
(Other answers describe why it's difficult to attach progress to a SQL query.)
You'll need to figure out a way to measure the progress of your query. Instead of one long query, you might be able to do it in batches (say 10; the progress then increments by 10% each time).
The example is showing how to batch long processes so they can be reported. The 'sleep' instruction in the example would be replaced by a call to a method that did a time-consuming, batchable job.
In your case, unless you can split up your query into multiple parts, you can't really use ReportProgress to give feedback - you won't have any progress to report. A SQL query is a one-shot run, and ReportProgress is used for batchable things.
You may want to look into optimizing your database - it's possible that an index on a heavily-used table or something similar could be a big help. If this isn't possible, you'll have to find a way to do batched queries against the data (or get back the whole thing and go through it in code - ugh) if you want to be able to report meaningful progress.
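If you can split the query, a hedged sketch of the batched approach looks like this; CheckOrdersInRange is a hypothetical method that runs one slice of the big SELECT (for example, one ID range or date range):

private void bw_DoWork(object sender, DoWorkEventArgs e)
{
    BackgroundWorker worker = sender as BackgroundWorker;
    for (int batch = 0; batch < 10; batch++)
    {
        if (worker.CancellationPending)
        {
            e.Cancel = true;
            return;
        }
        CheckOrdersInRange(batch);               // hypothetical: one slice of the query
        worker.ReportProgress((batch + 1) * 10); // real progress: 10%, 20%, ... 100%
    }
}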
The example code is just that: an example. So the 10 is arbitrary; it simply shows one way of estimating progress. In this case there are 10 discrete steps, so progress is easy to estimate.
I don't want to guess how long it may take to complete a query, I want actual progress.
A database query provides no means to report progress. You cannot do anything but guess.
What I do is this:
Assume that the longest it can take is the connection's timeout period. That way, if the query fails because the connection died, the user gets a perfectly accurate progress bar. Most queries take far, far less time than the timeout value, so the user sees a little progress and then it suddenly completes, giving the illusion that things went better than expected!
To accomplish this, I perform the DB query asynchronously and run the progress bar off a UI timer rather than using a BackgroundWorker. See the sketch below.
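Roughly, that looks like the following Windows Forms sketch; RunQuery is a hypothetical blocking database call, uiTimer is a System.Windows.Forms.Timer on the form, and the 300 ms interval assumes a 30-second connection timeout (about 100 ticks):

private volatile bool queryDone;

private void StartQuery()
{
    progressBar1.Value = 0;
    queryDone = false;
    uiTimer.Interval = 300; // ~100 ticks across the 30 s timeout window
    uiTimer.Start();
    new Thread(() =>
    {
        RunQuery();         // hypothetical: the blocking SELECT
        queryDone = true;   // the timer notices this on its next tick
    }).Start();
}

// wired to uiTimer.Tick once, in the designer or the constructor
private void uiTimer_Tick(object sender, EventArgs e)
{
    if (queryDone)
    {
        progressBar1.Value = progressBar1.Maximum; // jump to 100% on completion
        uiTimer.Stop();
    }
    else if (progressBar1.Value < progressBar1.Maximum)
    {
        progressBar1.Value++; // creep toward the timeout-based estimate
    }
}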
