Put a breakpoint before the thread starts and you will notice the console app using about 5 to 8 MB of memory, but once the threads start it spikes to 17 to 20 MB. That memory stays in use until I close the console. How can I free up the memory after a thread has finished its task? Is there a better solution?
Why do I need this, given that the garbage collector frees memory automatically when needed? I am doing web scraping and have a global class that stores all the scraped HTML text. I have to scrape about 10k pages and store their HTML in that global class. When I run the app, after about 500 pages have been stored it uses almost 100% of my PC's RAM, which is 20 GB. So I need to free up RAM. I can't close the console app to free it because I have some calculations to run after all the HTML has been collected.
class DemoData
{
public int Id { get; set; }
public string Text { get; set; }
public static List<DemoData> data = new List<DemoData>();
}
class Program
{
public static void Main()
{
for (var i = 0; i < 5000; i++)
{
DemoData.data.Add(new DemoData
{
Id = i,
Text = "something....",
});
}
foreach (var item in DemoData.data)
{
var t = new Thread(new ThreadStart(DoSomething));//put break point here and see.
t.Name = item.Id.ToString();
t.Start();
}
Console.WriteLine("wait");
Console.ReadLine();
}
public static void DoSomething()
{
Thread thr = Thread.CurrentThread;
Console.WriteLine(thr.Name);
}
}
You just need to wait for the garbage collector to run. Your current app isn't complex enough to trigger it before it closes, so you aren't seeing a collection occur.
The runtime fires off the GC when it decides one is needed. The GC then works out which objects are no longer reachable (from the call stack, static fields, and other roots) and deletes them, freeing up memory. This happens completely automatically without you needing to do anything.
If you want to help the GC, check your variable scopes so that it can see when something is no longer needed because it is completely out of scope: avoid global variables, don't create links between data objects that aren't needed, make correct choices between value and reference types, etc.
However, if you ever do end up in a position where you need to fire the GC manually, you can call GC.Collect().
See
https://learn.microsoft.com/en-us/dotnet/api/system.gc.collect?view=net-5.0
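As a rough illustration of the point about scope, the sketch below fills a static list with large strings, then drops the references and measures how much managed memory becomes reclaimable. The names ScrapedPageStore and FillThenRelease are made up for this example, and the forced full collection (through GC.GetTotalMemory(true)) is only there to make the effect measurable; in a real scraper you would normally just drop the references, or process each page and discard it, and let the GC run on its own schedule.

```csharp
using System;
using System.Collections.Generic;

static class ScrapedPageStore
{
    // Hypothetical stand-in for the question's global list of scraped HTML.
    static readonly List<string> Pages = new List<string>();

    // Returns the approximate number of bytes freed by dropping the references.
    public static long FillThenRelease(int pageCount, int charsPerPage)
    {
        for (int i = 0; i < pageCount; i++)
            Pages.Add(new string('x', charsPerPage));

        // Full collection first, so we measure only memory that is still reachable.
        long before = GC.GetTotalMemory(forceFullCollection: true);

        // Dropping the references is what makes the strings collectable.
        Pages.Clear();
        Pages.TrimExcess();

        long after = GC.GetTotalMemory(forceFullCollection: true);
        return before - after;
    }
}
```

Calling ScrapedPageStore.FillThenRelease(200, 100_000) allocates roughly 40 MB of strings; as long as the static list still references them, no amount of GC.Collect() calls can reclaim that memory.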
You might need to use a memory profiler to check for potential memory leaks.
A possible reason for your issues is the large number of threads; there is rarely a reason to use more threads than there are CPU cores. Each thread needs some memory for its stack and other housekeeping.
One fairly simple way would be to put the addresses that need visiting in a collection and use a Parallel.ForEach loop. This will try to adjust the number of threads used to maximize throughput. More complex variants could use one of the concurrent collections and multiple consumers. There are also async variants of many IO calls that avoid the memory overhead of blocking a thread while waiting for IO. I would recommend reading some examples of the multiple producers/consumers pattern for more details.
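A minimal sketch of the Parallel.ForEach idea, assuming the real download is replaced by a short sleep. BoundedParallelVisit and VisitAll are hypothetical names, and the explicit MaxDegreeOfParallelism cap is an addition for illustration; without it Parallel.ForEach picks the thread count itself.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

static class BoundedParallelVisit
{
    // Visits every address with at most maxWorkers threads and returns the
    // highest number of visits that were actually running at the same time.
    public static int VisitAll(IEnumerable<string> addresses, int maxWorkers,
                               ConcurrentBag<string> results)
    {
        int running = 0, peak = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxWorkers };

        Parallel.ForEach(addresses, options, address =>
        {
            int now = Interlocked.Increment(ref running);

            // Record the peak concurrency with a compare-and-swap loop.
            int seen;
            while (now > (seen = Volatile.Read(ref peak)) &&
                   Interlocked.CompareExchange(ref peak, now, seen) != seen) { }

            Thread.Sleep(5);      // placeholder for the real download/parse work
            results.Add(address); // ConcurrentBag is safe to fill from many threads

            Interlocked.Decrement(ref running);
        });
        return peak;
    }
}
```

The returned peak never exceeds maxWorkers, which is the property that keeps the per-thread stack memory bounded no matter how many addresses are queued.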
If your problem is feeding the data into the threads for processing, then I would suggest you look at https://devblogs.microsoft.com/dotnet/an-introduction-to-system-threading-channels/ which lets you create a thread-safe processing buffer.
Here is a working example; note that it uses .NET 5 syntax (top-level statements).
using System;
using System.Threading.Channels;
using System.Net.Http;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Linq;
var sites = Enumerable.Range(0, 5000).Select(i => "http://www.example.com");
//create thread safe buffer no limit on size
var sitebuffer = Channel.CreateUnbounded<string>();
//create thread safe buffer limited to 10 elements
var htmlbuffer = Channel.CreateBounded<string>(10);
//one HttpClient shared by all feeders; creating one per request can exhaust sockets
var http = new HttpClient();
async Task Feed()
{
//read urls until the buffer is completed and empty
//(ReadAllAsync is safe when several Feed tasks share the same reader)
await foreach (var uri in sitebuffer.Reader.ReadAllAsync())
{
var html= await http.GetAsync(uri);
Console.WriteLine("reading site");
//load the return text to the htmlbuffer, if buffer is full wait for space
await htmlbuffer.Writer.WriteAsync(await html.Content.ReadAsStringAsync());
Console.WriteLine("reading site complete");
}
}
async Task Process()
{
//read html until the buffer is completed and empty
await foreach (var html in htmlbuffer.Reader.ReadAllAsync())
{
//send each html document to doSomething, then read the next element
await doSomethingWithHTML(html);
}
}
async Task doSomethingWithHTML(string html)
{
await Task.Delay(2);
Console.WriteLine("done something");
}
//start 4 feeder tasks
var feeders = new[]
{
Feed(),
Feed(),
Feed(),
Feed(),
};
//start 2 worker tasks
var workers = new[]
{
Process(),
Process(),
};
//start of feeding in sites
foreach (var item in sites)
{
await sitebuffer.Writer.WriteAsync(item);
}
//mark that all sites have been fed into the systems
sitebuffer.Writer.Complete();
//wait for all feeders to finish
await Task.WhenAll(feeders);
//mark that no more sites will be read
htmlbuffer.Writer.Complete();
//wait for all workers to finish
await Task.WhenAll(workers);
Console.WriteLine("all tasks complete");
Notice the async Tasks and awaits: this is a newer abstraction over threads that removes a lot of the complexity of managing them yourself.
I have two versions of my program that submit ~3000 HTTP GET requests to a web server.
The first version is based off of what I read here. That solution makes sense to me because making web requests is I/O bound work, and the use of async/await along with Task.WhenAll or Task.WaitAll means that you can submit 100 requests all at once and then wait for them all to finish before submitting the next 100 requests so that you don't bog down the web server. I was surprised to see that this version completed all of the work in ~12 minutes - way slower than I expected.
The second version submits all 3000 HTTP GET requests inside a Parallel.ForEach loop. I use .Result to wait for each request to finish before the rest of the logic within that iteration of the loop can execute. I thought that this would be a far less efficient solution, since using threads to perform tasks in parallel is usually better suited for performing CPU bound work, but I was surprised to see that this version completed all of the work within ~3 minutes!
My question is why is the Parallel.ForEach version faster? This came as an extra surprise because when I applied the same two techniques against a different API/web server, version 1 of my code was actually faster than version 2 by about 6 minutes - which is what I expected. Could performance of the two different versions have something to do with how the web server handles the traffic?
You can see a simplified version of my code below:
private async Task<ObjectDetails> TryDeserializeResponse(HttpResponseMessage response)
{
try
{
using (Stream stream = await response.Content.ReadAsStreamAsync())
using (StreamReader readStream = new StreamReader(stream, Encoding.UTF8))
using (JsonTextReader jsonTextReader = new JsonTextReader(readStream))
{
JsonSerializer serializer = new JsonSerializer();
ObjectDetails objectDetails = serializer.Deserialize<ObjectDetails>(
jsonTextReader);
return objectDetails;
}
}
catch (Exception e)
{
// Log exception
return null;
}
}
private async Task<HttpResponseMessage> TryGetResponse(string urlStr)
{
try
{
HttpResponseMessage response = await httpClient.GetAsync(urlStr)
.ConfigureAwait(false);
if (response.StatusCode != HttpStatusCode.OK)
{
throw new WebException("Response code is "
+ response.StatusCode.ToString() + "... not 200 OK.");
}
return response;
}
catch (Exception e)
{
// Log exception
return null;
}
}
private async Task<ListOfObjects> GetObjectDetailsAsync(string baseUrl, int id)
{
string urlStr = baseUrl + @"objects/id/" + id + "/details";
HttpResponseMessage response = await TryGetResponse(urlStr);
ObjectDetails objectDetails = await TryDeserializeResponse(response);
return objectDetails;
}
// With ~3000 objects to retrieve, this code will create 100 API calls
// in parallel, wait for all 100 to finish, and then repeat that process
// ~30 times. In other words, there will be ~30 batches of 100 parallel
// API calls.
private Dictionary<int, Task<ObjectDetails>> GetAllObjectDetailsInBatches(
string baseUrl, Dictionary<int, MyObject> incompleteObjects)
{
int batchSize = 100;
int numberOfBatches = (int)Math.Ceiling(
(double)incompleteObjects.Count / batchSize);
Dictionary<int, Task<ObjectDetails>> objectTaskDict
= new Dictionary<int, Task<ObjectDetails>>(incompleteObjects.Count);
var orderedIncompleteObjects = incompleteObjects.OrderBy(pair => pair.Key);
for (int i = 0; i < 1; i++)
{
var batchOfObjects = orderedIncompleteObjects.Skip(i * batchSize)
.Take(batchSize);
var batchObjectsTaskList = batchOfObjects.Select(
pair => GetObjectDetailsAsync(baseUrl, pair.Key));
Task.WaitAll(batchObjectsTaskList.ToArray());
foreach (var objTask in batchObjectsTaskList)
objectTaskDict.Add(objTask.Result.id, objTask);
}
return objectTaskDict;
}
public void GetObjectsVersion1()
{
string baseUrl = @"https://mywebserver.com:/api";
// GetIncompleteObjects is not shown, but it is not relevant to
// the question
Dictionary<int, MyObject> incompleteObjects = GetIncompleteObjects();
Dictionary<int, Task<ObjectDetails>> objectTaskDict
= GetAllObjectDetailsInBatches(baseUrl, incompleteObjects);
foreach (KeyValuePair<int, MyObject> pair in incompleteObjects)
{
ObjectDetails objectDetails = objectTaskDict[pair.Key].Result
.objectDetails;
// Code here that copies fields from objectDetails to pair.Value
// (the incompleteObject)
AllObjects.Add(pair.Value);
};
}
public void GetObjectsVersion2()
{
string baseUrl = @"https://mywebserver.com:/api";
// GetIncompleteObjects is not shown, but it is not relevant to
// the question
Dictionary<int, MyObject> incompleteObjects = GetIncompleteObjects();
Parallel.ForEach(incompleteObjects, pair =>
{
ObjectDetails objectDetails = GetObjectDetailsAsync(
baseUrl, pair.Key).Result.objectDetails;
// Code here that copies fields from objectDetails to pair.Value
// (the incompleteObject)
AllObjects.Add(pair.Value);
});
}
A possible reason why Parallel.ForEach may run faster is that it creates the side effect of throttling. Initially x threads are processing the first x elements (where x is the number of available cores), and progressively more threads may be added depending on internal heuristics. Throttling IO operations is a good thing because it protects the network and the server that handles the requests from becoming overburdened. Your alternative improvised method of throttling, by making requests in batches of 100, is far from ideal for many reasons, one of them being that 100 concurrent requests are a lot of requests! Another one is that a single long-running operation may delay the completion of the batch until long after the completion of the other 99 operations.
Note that Parallel.ForEach is also not ideal for parallelizing IO operations. It just happened to perform better than the alternative, wasting memory all along. For better approaches look here: How to limit the amount of concurrent async I/O operations?
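One common way to limit concurrent async IO without tying up threads is a SemaphoreSlim used as an async gate. The following is only a sketch under that assumption: ThrottledAsync and RunAsync are made-up names, and the operation passed in stands for whatever request the caller wants to make (for example the question's GetObjectDetailsAsync).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class ThrottledAsync
{
    // Runs operation(item) for every item, never more than maxConcurrent at once.
    // Results come back in the same order as the input items.
    public static async Task<TResult[]> RunAsync<TItem, TResult>(
        IEnumerable<TItem> items, int maxConcurrent,
        Func<TItem, Task<TResult>> operation)
    {
        using (var gate = new SemaphoreSlim(maxConcurrent))
        {
            var tasks = items.Select(async item =>
            {
                await gate.WaitAsync();           // waits without blocking a thread
                try { return await operation(item); }
                finally { gate.Release(); }       // free the slot for the next item
            }).ToList();                          // ToList starts every task exactly once

            return await Task.WhenAll(tasks);
        }
    }
}
```

Unlike fixed batches of 100, a slot is reused as soon as any single request finishes, so one slow response does not hold up the other requests. The ToList call matters: enumerating a lazy Select twice would start every operation twice.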
https://learn.microsoft.com/en-us/dotnet/api/system.threading.tasks.parallel.foreach?view=netframework-4.8
Basically, Parallel.ForEach allows iterations to run in parallel, so you are not constraining them to run serially. On a host that is not thread-constrained, this will tend to improve throughput.
In short:
Parallel.ForEach() is most useful for CPU bound tasks.
Task.WaitAll() is more useful for IO bound tasks.
In your case you are getting information from web servers, which is IO. If the async methods are implemented correctly, they won't block any thread (they use IO completion ports to wait), so the threads can do other work.
Running the async method synchronously with GetObjectDetailsAsync(baseUrl, pair.Key).Result blocks a thread, so the thread pool will be flooded with waiting threads.
So I think the Task solution is a better fit.
My process gets data through HTTP requests, in chunks of 100 records at a time. In my case I have 100,000 records.
I then need to process that data and load it into a DB.
My current process:
GrabAllRecords()
{
Grab all 100,000 records (i.e. 1000 requests); this takes a large amount of time.
Load into ArrayData
}
Then:
ProcessData(ArrayData)
{
}
But I need something like this:
START:
step1:
Grab 100 Records load into arraylist..
repeat step1 until it reach 100,000
step2:
process arrayList
This screams for the producer - consumer design pattern: one producer produces something in its own pace, while one or more consumers wait until something is produced, grab the produced information and process it, possibly leading to new produced output that other consumers might process.
Microsoft has good support for this via Microsoft TPL Dataflow nuget package.
Implement a Producer-Consumer Dataflow Pattern
Also helpful to start: Walkthrough: Creating a Dataflow Pipeline
The producer produces output in processable units, in your case: chunks. The output will be sent to an object of class BufferBlock< T > , where T is your chunk. Code will be similar to:
public class ChunkProducer
{
private BufferBlock<Chunk> outputBuffer = new BufferBlock<Chunk>();
// whenever the ChunkProducer produces a chunk it is put in this buffer
// consumers will need access to this outputbuffer as source of data:
public ISourceBlock<Chunk> OutputBuffer
{get {return this.outputBuffer as ISourceBlock<Chunk>;} }
public async Task ProduceAsync()
{
while(someThingsToProcess)
{
Chunk chunk = CreateChunk(...);
await this.outputBuffer.SendAsync(chunk);
}
// if here: nothing to process anymore.
// notify consumers that all output has been produced
this.outputBuffer.Complete();
}
}
The efficiency of this can be enhanced by creating the next chunk while the previous one is being sent and await before sending the next chunk. This is a bit out of scope here. More info about this is available on Stackoverflow.
You'll also need a ChunkConsumer. The ChunkConsumer will wait for chunks on the buffer block and process them:
public class ChunkConsumer
{
private readonly ISourceBlock<Chunk> chunkSource;
// the chunkConsumer will wait for input at this source
public ChunkConsumer(ISourceBlock<Chunk> chunkSource)
{
this.chunkSource = chunkSource;
}
public async Task ConsumeAsync()
{
// wait until there is some data in the buffer
while (await this.chunkSource.OutputAvailableAsync())
{
// get the chunk and process it:
Chunk chunk = this.chunkSource.Receive();
ProcessChunk(chunk);
}
// if here: chunkSource has been completed. No more data to expect
}
}
Put it all together:
private async Task ProcessAsync()
{
ChunkProducer producer = new ChunkProducer();
ChunkConsumer consumer = new ChunkConsumer(producer.OutputBuffer);
// start a thread for the consumer to consume:
Task consumeTask = Task.Run( () => consumer.ConsumeAsync());
// let this thread start producing, and await until it is completed
await producer.ProduceAsync();
// if here, I know the producer finished producing
// wait until the consumer finished consuming:
await consumeTask;
// finished, all produced data is consumed.
}
Possible enhancements:
If producing is faster than consuming, consider using multiple consumers listening to the same ISourceBlock. Check TPL to see which of the BufferBlock types can handle multiple listeners
If producing is slower than consuming, consider using multiple producers producing to the same ITargetBlock. Check which type of buffer block can handle this.
Consider enabling cancellation using CancellationToken
If your chunk is not always the same number of records, consider using a batch block: The consumer gets notified if the batch has enough records to process.
You can use the DataFlow library to do something like this:
ActionBlock<Record[]> action_block = new ActionBlock<Record[]>(
x => ConsumeRecords(x),
new ExecutionDataflowBlockOptions
{
//Use one thread to process data.
//You can increase it if you want
//That would make sense if you produce the records faster than you consume them
MaxDegreeOfParallelism = 1
});
for (int i = 0; i < 1000; i++)
{
action_block.Post(ProduceNext100Records());
}
I am assuming that you have a method called ProduceNext100Records that produces records (e.g. via web service call) and another method called ConsumeRecords that consumes the records.
The easy answer, I think, is to use Microsoft Reactive Extensions (NuGet "Rx-Main", since renamed to "System.Reactive").
Then you can do something like this:
var query =
from records in Get100Records().ToObservable()
from record in records.ToObservable()
from result in Observable.Start(() => ProcessRecord(record))
select new { record, result };
IDisposable subscription =
query
.Subscribe(
rr =>
{
/* Process each `rr.record`/`rr.result`
as they are produced */
},
() => { /* Run when all completed */ });
This will process in parallel and you'll start getting results as soon as the first ProcessRecord call is completed.
If you need to stop the processing early you just call subscription.Dispose().
I have developed an application in C#. The class structure is as follows.
Form1 => The UI form. Has a BackgroundWorker, a progress bar, and an "ok" button.
SourceReader, TimedWebClient, HttpWorker, ReportWriter // classes that do some work
Controller => Has overall control. On the "ok" button click an instance of this class called "cntrlr" is created. This cntrlr is a global variable in Form1.cs.
(In the constructor of the Controller I create the SourceReader, TimedWebClient, HttpWorker, and ReportWriter instances.)
Then I call RunWorkerAsync() on the background worker.
Its code is as follows.
private void backgroundWorker1_DoWork(object sender, DoWorkEventArgs e)
{
int iterator = 1;
for (iterator = 1; iterator <= this.urlList.Count; iterator++)
{
cntrlr.Vmain(iterator-1);
backgroundWorker1.ReportProgress(iterator);
}
}
At the moment ReportProgress updates the progress bar.
The urlList mentioned above has 1000 URLs. cntrlr.Vmain(int i) handles the whole process at the moment. I want to give the task to several threads, each one processing 100 URLs. Access to the other instances and their methods does not need to be restricted, but access to ReportWriter should be limited to one thread at a time. I can't find a way to do this. If anyone has an idea or an answer, please explain.
If you do want to restrict multiple threads using the same method concurrently then I would use the Semaphore class to facilitate the required thread limit; here's how...
A semaphore is like a mean night club bouncer: it has been given a club capacity and is not allowed to exceed this limit. Once the club is full, no one else can enter and a queue builds up outside. Then as one person leaves another can enter (analogy thanks to J. Albahari).
A Semaphore with a value of one is equivalent to a Mutex or Lock except that the Semaphore has no owner so that it is thread ignorant. Any thread can call Release on a Semaphore whereas with a Mutex/Lock only the thread that obtained the Mutex/Lock can release it.
Now, for your case we are able to use Semaphores to limit concurrency and prevent too many threads from executing a particular piece of code at once. In the following example five threads try to enter a night club that only allows entry to three...
class BadAssClub
{
static SemaphoreSlim sem = new SemaphoreSlim(3);
static void Main()
{
for (int i = 1; i <= 5; i++)
new Thread(Enter).Start(i);
}
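// Enforce at most three threads running the guarded section at once.
// The parameter must be object to match ParameterizedThreadStart.
static void Enter(object id)
{
Console.WriteLine(id + " wants to enter.");
sem.Wait(); // acquire outside the try so a failed Wait is never released
try
{
Console.WriteLine(id + " is in!");
Thread.Sleep(1000 * (int)id);
Console.WriteLine(id + " is leaving...");
}
finally
{
sem.Release();
}
}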
}
Note that SemaphoreSlim is a lighter-weight version of the Semaphore class and incurs about a quarter of the overhead. It is sufficient for what you require.
I hope this helps.
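Since only one thread at a time may use ReportWriter, the limit-of-one case can also be written with a plain lock. Below is a minimal sketch of that, assuming the URLs are split into chunks with one task per chunk; the ReportWriter class here is a hypothetical stand-in for the one in the question, and the string concatenation is a placeholder for the real per-URL work.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for the question's ReportWriter; not thread-safe by itself.
class ReportWriter
{
    public List<string> Lines { get; } = new List<string>();
    public void Write(string line) => Lines.Add(line);
}

static class ChunkedUrlProcessor
{
    public static ReportWriter ProcessAll(IList<string> urls, int chunkSize)
    {
        var writer = new ReportWriter();
        var writerLock = new object();

        // One task per chunk of urls (e.g. 10 tasks for 1000 urls and chunkSize 100).
        var tasks = new List<Task>();
        for (int start = 0; start < urls.Count; start += chunkSize)
        {
            var chunk = urls.Skip(start).Take(chunkSize).ToList();
            tasks.Add(Task.Run(() =>
            {
                foreach (string url in chunk)
                {
                    string result = "processed " + url; // placeholder for the real work
                    lock (writerLock)                    // only one thread writes at a time
                    {
                        writer.Write(result);
                    }
                }
            }));
        }
        Task.WaitAll(tasks.ToArray());
        return writer;
    }
}
```

The work outside the lock runs in parallel across chunks; only the short Write call is serialized, so the writer does not become a bottleneck unless writing dominates the processing time.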
I think I would have used the ThreadPool instead of a background worker, and given each thread 1 URL to process, not 100. The thread pool limits the number of threads it starts at once, so you won't have to worry about getting 1000 requests at once. Have a look here for a good example:
http://msdn.microsoft.com/en-us/library/3dasc8as.aspx
Feeling a little more adventurous? Consider using TPL DataFlow to download a bunch of urls:
var urls = new[]{
"http://www.google.com",
"http://www.microsoft.com",
"http://www.apple.com",
"http://www.stackoverflow.com"};
var tb = new TransformBlock<string, string>(async url => {
using(var wc = new WebClient())
{
var data = await wc.DownloadStringTaskAsync(url);
Console.WriteLine("Downloaded : {0}", url);
return data;
}
}, new ExecutionDataflowBlockOptions{MaxDegreeOfParallelism = 4});
var ab = new ActionBlock<string>(data => {
//process your data
Console.WriteLine("data length = {0}", data.Length);
}, new ExecutionDataflowBlockOptions{MaxDegreeOfParallelism = 1});
tb.LinkTo(ab); //join output of producer to consumer block
foreach(var u in urls)
{
tb.Post(u);
}
tb.Complete();
Note how you can control the parallelism of each block explicitly, so you can gather in parallel but process without going concurrent (for example).
Just grab it with nuget. Easy.
I'm creating a tool to load test (sends http: GETs) and it runs fine but eventually dies because of an out of memory error.
ASK: How can I reset the threads so this loop can continually run and not err?
static void Main(string[] args)
{
System.Net.ServicePointManager.DefaultConnectionLimit = 200;
while (true)
{
for (int i = 0; i < 1000; i++)
{
new Thread(LoadTest).Start(); //<-- EXCEPTION!.eventually errs out of memory
}
Thread.Sleep(2);
}
}
static void LoadTest()
{
string url = "http://myserv.com/api/dev/getstuff?whatstuff=thisstuff";
// Sends http get from above url ... and displays the response in the console....
}
You are instantiating Threads left, right and centre. This is likely your problem. You want to replace the
new Thread(LoadTest).Start();
with
Task.Run(LoadTest);
This will run your LoadTest on a Thread in the ThreadPool, instead of using resources to create a new Thread each time. HOWEVER. This will then expose a different issue.
Threads on the ThreadPool are a limited resource and you want to return Threads to the ThreadPool as soon as possible. I assume you are using the synchronous download methods as opposed to the APM methods. This means that whilst the request is being sent out to the server, the thread spawning the request is sleeping as opposed to going off to do some other work.
Either use (assuming .net 4.5)
var client = new WebClient();
var response = await client.DownloadStringTaskAsync(url);
Console.WriteLine(response);
Or use a callback (if not .net 4.5)
var client = new WebClient();
client.DownloadStringCompleted += (sender, e) => Console.WriteLine(e.Result);
client.DownloadStringAsync(new Uri(url));
Use the ThreadPool and QueueUserWorkItem instead of creating thousands of threads. Threads are expensive objects, so it is no surprise you are running out of memory; besides, your test tool won't perform well with so many threads.
Your code snippet creates lots of threads, so no wonder it eventually runs out of memory. It would be better to use a thread pool here.
Your code would look like this:
static void Main(string[] args)
{
System.Net.ServicePointManager.DefaultConnectionLimit = 200;
ThreadPool.SetMaxThreads(500, 300);
while (true)
{
ThreadPool.QueueUserWorkItem(LoadTest);
}
}
static void LoadTest(object state)
{
string url = "http://myserv.com/api/dev/getstuff?whatstuff=thisstuff";
// Sends http get from above url ... and displays the response in the console....
}
I think I may need to re-think my design. I'm having a hard time narrowing down a bug that is causing my computer to completely hang, sometimes throwing an HRESULT 0x8007000E from VS 2010.
I have a console application (that I will later convert to a service) that handles transferring files based on a database queue.
I am throttling the threads allowed to transfer. This is because some systems we are connecting to can only contain a certain number of connections from certain accounts.
For example, System A can only accept 3 simultaneous connections (which means 3 separate threads). Each one of these threads has their own unique connection object, so we shouldn't run in to any synchronization problems since they aren't sharing a connection.
We want to process the files from those systems in cycles. So, for example, we will allow 3 connections that can transfer up to 100 files per connection. This means that, to move 1000 files from System A, we can only process 300 files per cycle, since 3 threads are allowed with 100 files each (3 threads x 100 files = 300 files per cycle). Therefore, over the lifetime of this transfer, we will have 10 threads, of which only 3 can run at a time. So there will be 3 full cycles, plus a last cycle that uses only 1 thread to transfer the last 100 files.
The current architecture by example is:
A System.Threading.Timer checks the queue every 5 seconds for something to do by calling GetScheduledTask()
If there's nothing to do, GetScheduledTask() simply does nothing
If there is work, create a ThreadPool thread to process the work [Work Thread A]
Work Thread A sees that there are 1000 files to transfer
Work Thread A sees that it can only have 3 threads running to the system it is getting files from
Work Thread A starts three new work threads [B,C,D] and transfers
Work Thread A waits for B,C,D [WaitHandle.WaitAll(transfersArray)]
Work Thread A sees that there are still more files in the queue (should be 700 now)
Work Thread A creates a new array to wait on [transfersArray = new TransferArray[3]], which is the max for System A but could vary by system
Work Thread A starts three new work threads [B,C,D] and waits for them [WaitHandle.WaitAll(transfersArray)]
The process repeats until there are no more files to move.
Work Thread A signals that it is done
I am using ManualResetEvent to handle the signaling.
My questions are:
Is there any glaring circumstance which would cause a resource leak or problem that I am experiencing?
Should I loop thru the array after every WaitHandle.WaitAll(array) and call array[index].Dispose()?
The Handle count under the Task Manager for this process slowly creeps up
I am calling the initial creation of Worker Thread A from a System.Threading.Timer. Is there going to be any problems with this? The code for that timer is:
(Some class code for scheduling)
private ManualResetEvent _ResetEvent;
private void Start()
{
_IsAlive = true;
ManualResetEvent transferResetEvent = new ManualResetEvent(false);
//Set the scheduler timer to 5 second intervals
_ScheduledTasks = new Timer(new TimerCallback(ScheduledTasks_Tick), transferResetEvent, 200, 5000);
}
private void ScheduledTasks_Tick(object state)
{
ManualResetEvent resetEvent = null;
try
{
resetEvent = (ManualResetEvent)state;
//Block timer until GetScheduledTasks() finishes
_ScheduledTasks.Change(Timeout.Infinite, Timeout.Infinite);
GetScheduledTask();
}
finally
{
_ScheduledTasks.Change(5000, 5000);
Console.WriteLine("{0} [Main] GetScheduledTasks() finished", DateTime.Now.ToString("MMddyy HH:mm:ss:fff"));
resetEvent.Set();
}
}
private void GetScheduledTask()
{
try
{
//Check to see if the database connection is still up
if (!_IsAlive)
{
//Handle
_ConnectionLostNotification = true;
return;
}
//Get scheduled records from the database
ISchedulerTask task = null;
using (DataTable dt = FastSql.ExecuteDataTable(
_ConnectionString, "hidden for security", System.Data.CommandType.StoredProcedure,
new List<FastSqlParam>() { new FastSqlParam(ParameterDirection.Input, SqlDbType.VarChar, "@ProcessMachineName", Environment.MachineName) })) //call to static class
{
if (dt != null)
{
if (dt.Rows.Count == 1)
{ //Only 1 row is allowed
DataRow dr = dt.Rows[0];
//Get task information
TransferParam.TaskType taskType = (TransferParam.TaskType)Enum.Parse(typeof(TransferParam.TaskType), dr["TaskTypeId"].ToString());
task = ScheduledTaskFactory.CreateScheduledTask(taskType);
task.Description = dr["Description"].ToString();
task.IsEnabled = (bool)dr["IsEnabled"];
task.IsProcessing = (bool)dr["IsProcessing"];
task.IsManualLaunch = (bool)dr["IsManualLaunch"];
task.ProcessMachineName = dr["ProcessMachineName"].ToString();
task.NextRun = (DateTime)dr["NextRun"];
task.PostProcessNotification = (bool)dr["NotifyPostProcess"];
task.PreProcessNotification = (bool)dr["NotifyPreProcess"];
task.Priority = (TransferParam.Priority)Enum.Parse(typeof(TransferParam.SystemType), dr["PriorityId"].ToString());
task.SleepMinutes = (int)dr["SleepMinutes"];
task.ScheduleId = (int)dr["ScheduleId"];
task.CurrentRuns = (int)dr["CurrentRuns"];
task.TotalRuns = (int)dr["TotalRuns"];
SchedulerTask scheduledTask = new SchedulerTask(new ManualResetEvent(false), task);
//Queue up task to worker thread and start
ThreadPool.QueueUserWorkItem(new WaitCallback(this.ThreadProc), scheduledTask);
}
}
}
}
catch (Exception ex)
{
//Handle
}
}
private void ThreadProc(object taskObject)
{
SchedulerTask task = (SchedulerTask)taskObject;
ScheduledTaskEngine engine = null;
try
{
engine = SchedulerTaskEngineFactory.CreateTaskEngine(task.Task, _ConnectionString);
engine.StartTask(task.Task);
}
catch (Exception ex)
{
//Handle
}
finally
{
task.TaskResetEvent.Set();
task.TaskResetEvent.Dispose();
}
}
0x8007000E is an out-of-memory error. That and the handle count seem to point to a resource leak. Ensure you're disposing of every object that implements IDisposable. This includes the arrays of ManualResetEvents you're using.
If you have time, you may also want to convert to using the .NET 4.0 Task class; it was designed to handle complex scenarios like this much more cleanly. By defining child Task objects, you can reduce your overall thread count (threads are quite expensive not only because of scheduling but also because of their stack space).
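To give an idea of what the Task-based version could look like, here is a sketch of the cycle logic from the question with Task.WaitAll in place of ManualResetEvent arrays. CycledTransfer, TransferAll, and the transferFile callback are hypothetical names for illustration; the real connection handling is left as a comment.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

static class CycledTransfer
{
    // Transfers all files in cycles; each cycle runs at most maxConnections tasks,
    // and each task handles at most filesPerConnection files.
    // Returns the number of cycles that were needed.
    public static int TransferAll(IReadOnlyList<string> files, int maxConnections,
                                  int filesPerConnection, Action<string> transferFile)
    {
        int cycles = 0;
        int next = 0;
        while (next < files.Count)
        {
            var cycleTasks = new List<Task>();
            for (int c = 0; c < maxConnections && next < files.Count; c++)
            {
                var slice = files.Skip(next).Take(filesPerConnection).ToList();
                next += slice.Count;
                cycleTasks.Add(Task.Run(() =>
                {
                    // A real implementation would open this task's own connection here.
                    foreach (string file in slice) transferFile(file);
                }));
            }
            Task.WaitAll(cycleTasks.ToArray()); // no WaitHandle objects to create or dispose
            cycles++;
        }
        return cycles;
    }
}
```

With 1000 files, 3 connections, and 100 files per connection this runs 4 cycles (three full ones and a final one with a single task), and there are no event handles whose disposal could be forgotten.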
I'm looking for answers to a similar problem (Handles Count increasing over time).
I took a look at your application architecture and would like to suggest something that could help you out:
Have you heard about IOCP (Input/Output Completion Ports)?
I'm not sure how difficult this is to implement in C#, but in C/C++ it is a piece of cake.
With it you create a single thread pool (the number of threads in that pool is generally set to 2 x the number of processors or processor cores in the PC or server).
You associate this pool with an IOCP handle and the pool does the work.
See the help for these functions:
CreateIoCompletionPort();
PostQueuedCompletionStatus();
GetQueuedCompletionStatus();
In general, creating and exiting threads on the fly can be time consuming and leads to performance penalties and memory fragmentation.
There is a lot of literature about IOCP on MSDN and on Google.
I think you should reconsider your architecture altogether. The fact that you can only have 3 simultaneously connections is almost begging you to use 1 thread to generate the list of files and 3 threads to process them. Your producer thread would insert all files into a queue and the 3 consumer threads will dequeue and continue processing as items arrive in the queue. A blocking queue can significantly simplify the code. If you are using .NET 4.0 then you can take advantage of the BlockingCollection class.
public class Example
{
private BlockingCollection<string> m_Queue = new BlockingCollection<string>();
public void Start()
{
var threads = new Thread[]
{
new Thread(Producer),
new Thread(Consumer),
new Thread(Consumer),
new Thread(Consumer)
};
foreach (Thread thread in threads)
{
thread.Start();
}
}
private void Producer()
{
while (true)
{
Thread.Sleep(TimeSpan.FromSeconds(5));
ScheduledTask task = GetScheduledTask();
if (task != null)
{
foreach (string file in task.Files)
{
m_Queue.Add(file);
}
}
}
}
private void Consumer()
{
// Make a connection to the resource that is assigned to this thread only.
while (true)
{
string file = m_Queue.Take();
// Process the file.
}
}
}
I have definitely oversimplified things in the example above, but I hope you get the general idea. Notice how this is much simpler as there is not much in the way of thread synchronization (most will be embedded in the blocking queue) and of course there is no use of WaitHandle objects. Obviously you would have to add in the correct mechanisms to shut down the threads gracefully, but that should be fairly easy.
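For the graceful shutdown part, one option is to let the producer call CompleteAdding and have the consumers iterate GetConsumingEnumerable, which ends on its own once the queue is marked complete and drained. This is only a sketch: GracefulQueue and Run are made-up names, the bounded capacity of 100 is an arbitrary choice, and the counter increment stands in for processing a file.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

static class GracefulQueue
{
    // Processes every file with consumerCount consumer threads and returns
    // how many files were handled once all threads have exited.
    public static int Run(IEnumerable<string> files, int consumerCount)
    {
        int processed = 0;
        using (var queue = new BlockingCollection<string>(boundedCapacity: 100))
        {
            var consumers = new List<Thread>();
            for (int i = 0; i < consumerCount; i++)
            {
                var t = new Thread(() =>
                {
                    // The loop ends once CompleteAdding was called and the queue is empty.
                    foreach (string file in queue.GetConsumingEnumerable())
                        Interlocked.Increment(ref processed); // placeholder for real work
                });
                t.Start();
                consumers.Add(t);
            }

            foreach (string file in files) queue.Add(file); // producer; blocks when full
            queue.CompleteAdding();                          // signal: no more items

            foreach (Thread t in consumers) t.Join();        // wait for a clean exit
        }
        return processed;
    }
}
```

The bounded capacity also gives back-pressure: if the consumers fall behind, Add blocks the producer instead of letting the queue grow without limit.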
It turns out the source of this strange problem was not related to architecture but rather because of converting the solution from 3.5 to 4.0. I re-created the solution, performing no code changes, and the problem never occurred again.