I am trying to use Thread in my Windows service application for the first time. Per my requirements, I have to read data from a database and, if a row matches a condition, execute a function on a new thread. My main concern: since the function meant to run on the new thread is lengthy and will take time, will my program continue through the data reader loop and read the next values from the database while the function keeps executing on the background thread? My application's execution logic is time specific.
Here is the code:
while (dr.Read())
{
    time = dr["SendingTime"].ToString();
    if ((str = DateTime.Now.ToString("HH:mm")).Equals(time))
    {
        // Execute the function and send reports based on the data from the database.
        Thread thread = new Thread(sendReports);
        thread.Start();
    }
}
Please help me.
Yep, as the comments said, you will have one thread per row. If you have 4-5 rows and you run that code, you'll get 4-5 threads working happily in the background.
You might be happy with it and leave it; then in half a year someone else will play with the DB, you'll get 10K rows, this will create 10K threads, and you'll be on holiday while people call you in a panic because the program is broken...
In other words, you don't want to do this, because it's bad practice.
You should either use a queue of work units with a fixed number of threads reading from it (in which case you might have 10K units queued, but, say, 10 threads that pick them up and process them until they are done), or some other mechanism that makes sure you don't create a thread per row. See the sketch below.
Unless, of course, you don't care...
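For reference, a minimal sketch of that queue-plus-fixed-workers idea, assuming .NET 4+ for BlockingCollection (the item type, worker count, and SendReports body are placeholders, not anything from the question):

using System.Collections.Concurrent;
using System.Threading;

class ReportScheduler
{
    // Work items waiting to be processed; the int stands in for
    // whatever data sendReports actually needs per row.
    private readonly BlockingCollection<int> _work = new BlockingCollection<int>();

    public void Start(int workerCount)
    {
        for (int i = 0; i < workerCount; i++)
        {
            Thread t = new Thread(() =>
            {
                // GetConsumingEnumerable blocks until work arrives and
                // ends cleanly once CompleteAdding() has been called.
                foreach (int reportId in _work.GetConsumingEnumerable())
                {
                    SendReports(reportId);
                }
            });
            t.IsBackground = true;
            t.Start();
        }
    }

    public void Enqueue(int reportId) { _work.Add(reportId); }

    public void Stop() { _work.CompleteAdding(); }

    private void SendReports(int reportId)
    {
        // ... the lengthy report-sending logic goes here ...
    }
}

The reader loop then calls Enqueue() instead of new Thread(...).Start(), so the number of concurrent threads stays fixed at workerCount no matter how many rows the query returns.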
I am trying to make a C# WinForms application that fetches data from a URL saved in a table named "Links". Each link has a "Last Checked" and a "Next Check" datetime, and there is an "Interval" that determines "Next Check" based on "Last Checked".
Right now, I fetch the ID with a query BEFORE doing the web scraping, and then I set Last Checked to DateTime.Now and Next Check to null until everything is completed. Both then get updated after the web scraping is done.
The problem with this is that if the ongoing process aborts for any reason, LastCheck will be a date but NextCheck will stay null.
So I need a better way to keep two processes from working on the same row of the same table, but I am not sure how.
For a multithreaded solution, the standard engineering approach is to use a pool of workers and a pool of work.
This is just a conceptual sketch - you should adapt it to your circumstances:
A worker (i.e. a thread) looks at the pool of work. If there is some work available, it marks it as in_progress. This has to be done so that no two threads can take the same work. For example, you could take a lock in C# around the database query that selects a row and marks it before returning it (see the sketch after this list).
You need to have a way of un-marking it after the thread finishes. Successful or not, in_progress must be re-set. Typically, you could use a finally block so that you don't miss it in the event of any exception.
If there is no work available, the thread goes to sleep.
Whenever new work arrives (i.e. an INSERT, or a nextcheck comes due), one of the sleeping threads is awakened.
When your program starts, it should clear any in_progress flags in the event of a previous crash.
You should take advantage of DBMS transactions so that any changes a worker makes after completing its work are atomic, i.e. other threads perceive them as if they had happened all at once.
By changing the size of the worker pool, you can set the maximum number of simultaneously active workers.
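A hedged sketch of the claim step from the list above, for SQL Server: a single UPDATE both marks a row in_progress and returns it, so no two workers can ever take the same row. The table and column names (Links, InProgress, NextCheck) and connString are assumptions, not from the question:

using System.Data.SqlClient;

// Atomically claim one due link. UPDATE TOP (1) ... OUTPUT marks the
// row and returns its values in a single statement.
const string claimSql =
    "UPDATE TOP (1) Links " +
    "SET InProgress = 1 " +
    "OUTPUT inserted.Id, inserted.Url " +
    "WHERE InProgress = 0 AND NextCheck <= GETUTCDATE();";

using (SqlConnection conn = new SqlConnection(connString))
using (SqlCommand cmd = new SqlCommand(claimSql, conn))
{
    conn.Open();
    using (SqlDataReader reader = cmd.ExecuteReader())
    {
        if (reader.Read())
        {
            int id = reader.GetInt32(0);
            string url = reader.GetString(1);
            // ... scrape, then (in a finally block) clear InProgress
            // and set LastCheck/NextCheck inside one transaction ...
        }
        // else: no work available, so this worker can go to sleep.
    }
}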
First thing: the separation of controller/workers might be a better pattern, as mentioned in the other answer. It will work better if the number of threads gets large and the number of links to check is large.
But if your problem is this:
But problem with it is, if for any reason that scraping gets aborted/finishes halfway/doesn't work properly, LastCheck becomes DateTime.Now but NextCheck is left NULL, and previous LastCheck/NextCheck values are gone, and LastCheck/NextCheck values are updated for a link that is not actually checked
You just need to handle errors better.
The failure will result in an exception. Catch the exception and handle it by resetting the state in the database. For example:
void DoScraping(.....)
{
    try
    {
        // ....
    }
    catch (Exception err)
    {
        // oh dear, it went wrong, reset lastcheck/nextcheck
    }
}
What you reset LastCheck/NextCheck to is up to you. You could reset them to what they were at the start: when you determine 'the next thing to do', also read the current LastCheck/NextCheck values and store them in variables. Then, in the event of failure, just set them back to what they were before.
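Filling in that skeleton as a sketch; GetCheckTimes/SetCheckTimes and the CheckTimes type are hypothetical data-access helpers, not anything from the question:

void DoScraping(int linkId)
{
    // Capture the current values before touching the row, so the
    // catch block can put them back on failure.
    CheckTimes original = GetCheckTimes(linkId);
    try
    {
        // ... mark the row, scrape, write the new values ...
    }
    catch (Exception)
    {
        // It went wrong: restore the original values so the link is
        // picked up again as if nothing had happened.
        SetCheckTimes(linkId, original.LastCheck, original.NextCheck);
        throw;
    }
}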
I should first mention I am new to programming. I will explain my problem and my question to the best of my ability.
I was wondering if it is possible to update a single column of an ObjectListView (from here on referred to as OLV). What I would like is to have a column that would display the "Elapsed Time" of each row. This Elapsed Time column would be refreshed every 15 to 30 seconds.
Here is how I set my OLV:
myOLV.SetObjects(GetMyList());
The GetMyList method returns a list populated from a simple database query. Within GetMyList, the DateTime column is converted into a string showing the elapsed time, which the OLV then displays as a string, not a DateTime:
elapsed_time = ((TimeSpan)(DateTime.Now - x.time_arrived)).ToString("hh\\:mm\\:ss");
How I was trying to make this work was with a timer that re-called the GetMyList method every 30 seconds. Because the method re-queries the database and returns the records it retrieves, they were up to date. This worked fine until I had more than 20 rows in the OLV: each "refresh" of 200 rows takes 4 seconds to complete, and that is 4 seconds during which the UI is unresponsive...
As I am new to programming, I have never done anything with multi-threading, but I did a little reading on it and attempted a solution. Even when I create a new thread to "refresh" the OLV object, the entire form still becomes unresponsive.
My question is, how can I have a column within my ObjectListView refresh every 15-30 seconds without causing the entire UI to become unresponsive?
Is it even possible/a good idea to have the ObjectListView refresh in the background and then display it when it's ready?
The problem is the run time of the database query, not the updating of ObjectListView (updating 200 rows in ObjectListView should take about 200ms). You need to get the database work off the UI thread and onto a background thread. That background thread can then periodically fetch the updated data from the database. Once you have the updated data, updating ObjectListView is very fast.
However, multi-threading is a deep-dive topic that is likely to bite you. There are many things that can go wrong. You would be better off having a button on your UI that the user can click to Refresh the grid, and running everything synchronously. Once the synchronous version is working perfectly, then start working on the much more error prone multi-threaded version.
To strictly answer your question, you would do something like this:
var thread = new Thread(_ => {
    while (!_cancelled) {
        Thread.Sleep(15 * 1000);
        if (!_cancelled) {
            var results = QueryDatabase();
            this.objectListView1.SetObjects(results);
        }
    }
});
thread.Start();
_cancelled is a boolean field in your class that lets you cancel the querying process (a sketch follows below). QueryDatabase() is your method that fetches the new data.
This example doesn't deal with errors, which are a significant issue from background threads.
Another gotcha is that most UI components cannot be updated from a background thread. However, ObjectListView has a few methods that are thread-safe, and SetObjects() is one of them, so it can be called like this.
Just to repeat: you really should start with the synchronous version, and only start thinking about async once you are happy with that one.
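One detail worth spelling out for the _cancelled flag: it should be declared volatile so the background thread reliably observes the UI thread's write. A minimal sketch; the form-closing handler is just one plausible place to set it:

// Written by the UI thread, read by the polling thread. Without
// volatile, the JIT may cache the field in a register and the loop
// might never see the change.
private volatile bool _cancelled;

private void Form1_FormClosing(object sender, FormClosingEventArgs e)
{
    _cancelled = true; // lets the polling loop exit
}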
I tried to make the title as specific as possible. Basically, what I have running inside a BackgroundWorker thread now is some code that looks like this:
SqlConnection conn = new SqlConnection(connstring);
SqlCommand cmd = new SqlCommand(query, conn);
conn.Open();
SqlDataAdapter sda = new SqlDataAdapter(cmd);
sda.Fill(Results);
conn.Close();
sda.Dispose();
Where query is a string representing a large, time consuming query, and conn is the connection object.
My problem now is that I need a stop button. I've come to realize that killing the BackgroundWorker would be worthless, because I still want to keep whatever results are left over after the query is canceled. Plus, it wouldn't be able to check the cancelled state until after the query.
What I've come up with so far:
I've been trying to conceptualize how to handle this efficiently without taking too big of a performance hit.
My idea was to use a SqlDataReader to read the data from the query a piece at a time, so that I had a "loop" in which to check a flag I could set from the GUI via a button. The problem is that, as far as I know, I can't use the Load() method of a DataTable and still be able to cancel the SqlCommand. If I'm wrong, please let me know, because that would make cancelling slightly easier.
In light of what I discovered, I realized I may only be able to cancel the SqlCommand mid-query by doing something like the below (pseudo-code):
while (reader.Read())
{
    // check flag status
    // if it is set to 'kill', fire off the kill thread
    // otherwise populate the datatable with what was read
}
However, it seems to me this would be highly inefficient and possibly costly. Is this the only way to kill a SqlCommand in progress whose results absolutely need to end up in a DataTable? Any help would be appreciated!
There are really two stages where cancelling matters:
Cancelling the initial query execution before the first rows are returned
Aborting the process of reading the rows as they are served
Depending on the nature of the actual SQL statement, either of these stages could account for 99% of the time, so both should be considered. For example, calling SELECT * on some table with a billion rows will take essentially no time to execute but a very long time to read. Conversely, requesting a super complicated join over poorly tuned tables and then wrapping it all in some aggregating clauses may take minutes to execute but negligible time to read the handful of rows once they are actually returned.
Well-tuned advanced database engines will also cache chunks of rows at a time for complicated queries, so you will see alternating pauses where the engine is executing the query on the next batch of rows and then fast bursts of data as it returns the next batch of results.
Cancelling the query execution
In order to be able to cancel a query while it is executing, you can use one of the overloads of SqlCommand.BeginExecuteReader to start the query and call SqlCommand.Cancel to abort it. Alternatively, you can call ExecuteReader() synchronously on one thread and still call Cancel() from another. I'm not including code examples because there are plenty of them in the documentation.
Aborting the read operation
Here, a simple boolean flag is probably the easiest way. And remember, it's really easy to fill a DataTable row using the Rows.Add() overload that takes an array of object, that is:
object[] buffer = new object[reader.FieldCount];
while (reader.Read()) {
    if (cancelFlag) break;
    reader.GetValues(buffer);  // copy the current row into the buffer
    dataTable.Rows.Add(buffer);
}
Cancelling blocking calls to Read()
A sort of mixed case occurs when, as mentioned earlier, a call to reader.Read() causes the database engine to do another batch of intensive processing. As noted in the MSDN documentation, calls to Read() can block in this case even if the original query was executed with BeginExecuteReader. You can still get around this by calling Read() on one thread that handles all the reading while calling Cancel() from another thread. The way to know whether your reader is in a blocking Read() call is to have another flag that the reader thread updates and the monitoring thread reads:
// On the reader thread:
inRead = true;
while (reader.Read()) {
    inRead = false;
    // ... process the row ...
    inRead = true;
}

// Somewhere else:
private void foo_onUITimerTick(...) {
    status.Text = inRead ? "Waiting for server" : "Reading";
}
Regarding performance of Reader vs Adapter
A DataReader is usually faster than using DataAdapter.Fill(). The whole point of a DataReader is to be really, really fast and responsive for reading. Checking some boolean flag once per row would not add a measurable difference in time even over millions of rows.
The limiting factor for a big database query is not the local CPU processing time but the size of the I/O pipe (your network connection for a remote database or your disk speed for a local one) or a combination of the db server's own disk speed and CPU processing time for a complex query. Both a DataAdapter and a DataReader will spend time (perhaps the majority of the time) just waiting for a few nanoseconds at a time for the next row to be served.
One convenience of DataAdapter.Fill() is that it does the magic of dynamically generating the DataTable columns to match the query results, but that's not difficult to do yourself (see SqlDataReader.GetSchemaTable()).
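A hedged sketch of doing that column generation yourself with GetSchemaTable, combined with the cancellable row loop from above (cancelFlag and cmd are assumed to exist as in the earlier snippets):

// Build the DataTable columns from the reader's schema, then fill the
// rows manually. This reproduces the convenient part of
// DataAdapter.Fill() while keeping the per-row cancel check.
DataTable table = new DataTable();
using (SqlDataReader reader = cmd.ExecuteReader())
{
    foreach (DataRow col in reader.GetSchemaTable().Rows)
    {
        table.Columns.Add((string)col["ColumnName"], (Type)col["DataType"]);
    }

    object[] buffer = new object[reader.FieldCount];
    while (reader.Read())
    {
        if (cancelFlag) break;
        reader.GetValues(buffer);
        table.Rows.Add(buffer);
    }
}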
Just a try:
I would suggest you run the time-consuming query in a BackgroundWorker and pass the command object to it, so that you keep a reference to the command under your control. When a cancel request comes in, just tell the command you passed to the (in-progress) BackgroundWorker to cancel, via command.Cancel().
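A minimal sketch of that idea (the field, button handler, and variable names are placeholders; SqlCommand.Cancel is documented as safe to call from another thread):

// Keep the running command in a field so the UI thread can reach it.
private SqlCommand _currentCommand;

private void bw_DoWork(object sender, DoWorkEventArgs e)
{
    using (SqlConnection conn = new SqlConnection(connstring))
    using (SqlCommand cmd = new SqlCommand(query, conn))
    using (SqlDataAdapter sda = new SqlDataAdapter(cmd))
    {
        _currentCommand = cmd;
        conn.Open();
        // Rows already filled before a cancel stay in Results; the
        // cancellation itself typically surfaces as a SqlException here.
        sda.Fill(Results);
    }
}

private void btnStop_Click(object sender, EventArgs e)
{
    SqlCommand cmd = _currentCommand;
    if (cmd != null) cmd.Cancel();
}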
Let me preface my question by stating that I am a casual developer, so I don't really know what I am doing past the basics.
This is developed in c# and .net 3.5.
The core of my current application connects to remote servers, executes a WMI call, retrieves some data, and then places this data into a database. It is a simple health-check application.
The basic code runs fine, but I ran into an issue: if a few servers were offline, each would take 1 minute to time out (which is realistic because of network bandwidth etc.). One execution of the application ran for 45 minutes (because 40 servers were offline), which is not efficient: of the 45 minutes of execution, 40 minutes were spent waiting.
After some research, I think using threads would be the best way around it: spawn a thread for each of the servers as it is processed.
Here is my thread code:
for (int x = 0; x < mydataSet.Tables[0].Rows.Count; x++)
{
    Thread ts0 = new Thread(() =>
        executeSomeSteps(mydataSet.Tables[0].Rows[x]["IPAddress"].ToString(),
                         mydataSet.Tables[0].Rows[x]["ID"].ToString(), connString, filepath));
    ts0.Start();
}
The dataset contains an ID reference and an IP address. The executeSomeSteps function looks like this:
static void executeSomeSteps(string IPAddress, string ID, string connstring, string filepath)
{
    string executeStuff;
    executeStuff = funclib.ExecuteSteps(IPAddress, ID, connstring, filepath);
    executeStuff = null;
}
executeSomeSteps inserts data into a database based on the returned WMI results. This process works fine, as mentioned earlier, but the problem is that some of the threads in the above for loop end up with the same data, so it executes more than once per server. There are often up to 5 records for a single server once the process completes.
From the research I have done, I believe it might be an issue with more than one thread reading the same x value from the dataset.
So now onto my questions:
Assume there are 10 records in the dataset:
Why are there more than 10 executions happening?
Will I still gain the performance if I lock the dataset value?
Can someone point me into the right direction regarding how to deal with variable data being passed to a static function by multiple threads?
What Davide refers to is that, by the time your thread executes, the captured values from Rows[x] may be different (are even likely to be different) than when the delegate was created. This is because the for loop goes on while the threads are starting up. This is a very common gotcha. It can even happen without servers timing out.
The solution to this "modified closure" problem is to use new variables for each thread:
for (int x = 0; x < mydataSet.Tables[0].Rows.Count; x++)
{
    // Copy the row values into fresh locals so that each lambda
    // captures its own variables instead of the shared x.
    string ip = mydataSet.Tables[0].Rows[x]["IPAddress"].ToString();
    string id = mydataSet.Tables[0].Rows[x]["ID"].ToString();
    Thread ts0 = new Thread(() => executeSomeSteps(ip, id, connString, filepath));
    ts0.Start();
}
You may even encounter a System.ArgumentOutOfRangeException, because once the for loop has finished, the final x++ has executed, leaving x one higher than the highest row index. Any thread reaching its Rows[x] call after that point will throw the exception.
Edit
This issue kept bugging me. I think what you describe in your comment (that extra records looked like they were generated by one iteration) is exactly what the modified closure does: a few threads happen to start at roughly the same time, all taking the value of x at that moment. You must also have found that servers were skipped in a single pass; I cannot imagine that not happening.
I am on MSDN reading about the BackgroundWorker class and I have a question about how it works.
The following code has a for loop in it, and inside the for loop, in the else clause, you're supposed to "perform a time consuming operation and report progress".
But, why is there a for loop, and why is its maximum value only 10?
private void bw_DoWork(object sender, DoWorkEventArgs e)
{
    BackgroundWorker worker = sender as BackgroundWorker;

    for (int i = 1; (i <= 10); i++)
    {
        if ((worker.CancellationPending == true))
        {
            e.Cancel = true;
            break;
        }
        else
        {
            // Perform a time consuming operation and report progress.
            System.Threading.Thread.Sleep(500);
            worker.ReportProgress((i * 10));
        }
    }
}
I have a really massive database, and it sometimes takes up to a minute to check for new orders based on certain criteria. I don't want to guess how long a query may take to complete; I want actual progress. How can I make this BackgroundWorker report progress based on a MySQL SELECT query?
How can I make this background worker report progress based on a MySQL SELECT query?
You can't. That's one of the problems with a synchronous method call: you cannot predict ahead of time how long it is going to take. You have two cut points in time to deal with, before you call the method and after you call the method, and you do not get anything in between. Either the method has returned, or it has not.
You can use statistics to your advantage, though. You can record how long each execution takes, store that, and use it to calculate a prediction, though it will never be exact. With such a prediction, you can space out progress reports so that you end up at 100% at or around the statistical prediction you've calculated.
However, if the database is slower or faster than usual, the estimate will be off.
Also note that whichever thread calls into MySQL to retrieve the data cannot be the same thread that reports progress, since it will be "waiting" for the MySQL database and the .NET code that talks to it to return the data, all in one piece. You need to spin up yet another thread (or a timer) to report the progress. A sketch follows.
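A hedged sketch of that statistics-driven approach in WinForms: time each run with a System.Diagnostics.Stopwatch, keep an exponential moving average as the prediction, and let a UI timer walk the bar toward the estimate. progressTimer, queryWorker, and progressBar are placeholder names; the BackgroundWorker is assumed to run the SELECT:

private readonly Stopwatch _queryWatch = new Stopwatch();
private double _expectedMs = 30000; // seed guess, refined after each run

private void StartQuery()
{
    _queryWatch.Restart();
    progressTimer.Start();        // a WinForms Timer ticking on the UI thread
    queryWorker.RunWorkerAsync(); // the BackgroundWorker running the SELECT
}

private void progressTimer_Tick(object sender, EventArgs e)
{
    // Walk toward 99% over the predicted duration; never claim 100%
    // until the query has actually returned.
    double fraction = _queryWatch.ElapsedMilliseconds / _expectedMs;
    progressBar.Value = (int)Math.Min(99, 100 * fraction);
}

private void queryWorker_RunWorkerCompleted(object sender, RunWorkerCompletedEventArgs e)
{
    progressTimer.Stop();
    progressBar.Value = 100;
    // Exponential moving average: adapt the prediction to recent runs.
    _expectedMs = 0.7 * _expectedMs + 0.3 * _queryWatch.ElapsedMilliseconds;
}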
But, why is there a for loop, and why is its maximum value only 10?
In the example, the worker reports progress between 10 and 100, purely for simplicity. The values 10 to 100 come from i (1-10) and the * 10 in ReportProgress.
The documentation says that ReportProgress takes:
The percentage, from 0 to 100, of the background operation that is complete.
When you write it for your really massive database, you must report progress as a percentage, between 0 and 100.
Given that your database may take "up to a minute", 1% is slightly more than 1/2 second, so you should see any associated progress bar move every 1/2 second or so. That sounds like pretty smooth reporting to me.
(Other answers describe why it's difficult to attach progress to a SQL query.)
You'll need to figure out a way to measure the progress of your query. Instead of one long query, you might be able to run it in batches (say 10; the progress then increments by 10% per batch), as in the sketch below.
The example shows how to batch long processes so that progress can be reported. The 'sleep' instruction in the example would be replaced by a call to a method that does a time-consuming, batchable piece of work.
In your case, unless you can split your query into multiple parts, you can't really use ReportProgress to give feedback; you won't have any progress to report. A SQL query is a one-shot run, and ReportProgress is meant for batchable things.
You may also want to look into optimizing your database; an index on a heavily used table or something similar could be a big help. If that isn't possible, you'll have to find a way to run batched queries against the data (or get back the whole thing and go through it in code... ugh) if you want to report meaningful progress.
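A hedged sketch of the batched approach using the MySQL connector with LIMIT/OFFSET paging; the table name, WHERE clause, and batch sizes are illustrative assumptions, and the worker needs WorkerReportsProgress = true:

private void bw_DoWork(object sender, DoWorkEventArgs e)
{
    BackgroundWorker worker = (BackgroundWorker)sender;
    const int batches = 10;
    const int batchSize = 1000; // rows per batch; tune to your data

    using (MySqlConnection conn = new MySqlConnection(connString))
    {
        conn.Open();
        for (int i = 0; i < batches; i++)
        {
            if (worker.CancellationPending) { e.Cancel = true; return; }

            // Page through the result set; ORDER BY keeps the pages stable.
            using (MySqlCommand cmd = new MySqlCommand(
                "SELECT * FROM orders WHERE status = 'new' " +
                "ORDER BY id LIMIT @take OFFSET @skip", conn))
            {
                cmd.Parameters.AddWithValue("@take", batchSize);
                cmd.Parameters.AddWithValue("@skip", i * batchSize);
                using (MySqlDataReader reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        // ... accumulate the rows of this batch ...
                    }
                }
            }
            worker.ReportProgress((i + 1) * 100 / batches);
        }
    }
}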
The example code is just that: an example. So the 10 is arbitrary. It simply shows one way of estimating progress; in this case there are 10 discrete steps, so progress is easy to estimate.
I don't want to guess how long it may take to complete a query, I want actual progress.
A database query provides no means to report progress. You cannot do anything but guess.
What I do is this:
Assume that the longest the query can take is the timeout period for the connection. That way, if the query fails because the connection died, the user gets a perfectly accurate progress bar. Most queries take far, far less time than the timeout value, so the user sees a little progress and then it suddenly completes. This gives the user the illusion that things went better than expected!
To accomplish this, I perform the DB query asynchronously and run the progress bar off a UI timer rather than using a BackgroundWorker.
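A hedged sketch of that trick: scale elapsed time against the command timeout and drive a WinForms Timer. Control and method names are placeholders, and older SqlClient versions also require "Asynchronous Processing=true" in the connection string for BeginExecuteReader:

private DateTime _queryStarted;
private int _timeoutSeconds;

private void StartQuery(SqlCommand cmd)
{
    _timeoutSeconds = cmd.CommandTimeout;       // treat the timeout as the worst case
    _queryStarted = DateTime.UtcNow;
    uiTimer.Start();                            // WinForms Timer, ticks on the UI thread
    cmd.BeginExecuteReader(QueryFinished, cmd); // run the query asynchronously
}

private void uiTimer_Tick(object sender, EventArgs e)
{
    // Map elapsed time onto the timeout period: if the query dies at
    // the timeout, the bar is exactly full; if it finishes sooner, the
    // user is pleasantly surprised.
    double elapsed = (DateTime.UtcNow - _queryStarted).TotalSeconds;
    progressBar.Value = (int)Math.Min(100, 100 * elapsed / _timeoutSeconds);
}

private void QueryFinished(IAsyncResult ar)
{
    SqlCommand cmd = (SqlCommand)ar.AsyncState;
    using (SqlDataReader reader = cmd.EndExecuteReader(ar))
    {
        // ... read the results ...
    }
    // Marshal back to the UI thread to stop the timer and fill the bar.
    BeginInvoke((Action)(() => { uiTimer.Stop(); progressBar.Value = 100; }));
}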