What does MaxDegreeOfParallelism do? - c#
I am using Parallel.ForEach to do some database updates. Without setting MaxDegreeOfParallelism, a dual-core machine gets SQL client timeouts, whereas a quad-core machine somehow does not time out.
I have no control over what kind of processor cores are available where my code runs, but is there a setting I can change with MaxDegreeOfParallelism that will run fewer operations simultaneously and avoid the timeouts?
I can increase the timeouts, but that isn't a good solution. If I can process fewer operations simultaneously on a lower-end CPU, that will put less load on the CPU.
OK, I have read all the other posts and MSDN too, but will setting MaxDegreeOfParallelism to a lower value make my quad-core machines suffer?
For example, is there any way to do something like: if the CPU has two cores, use 20; if it has four cores, use 40?
The answer is that it is the upper limit for the entire parallel operation, irrespective of the number of cores.
So even if you don't use the CPU because you are waiting on IO or a lock, no extra tasks will run in parallel, only the maximum that you specify.
To find this out, I wrote this piece of test code. There is an artificial lock in there to stimulate the TPL to use more threads. The same will happen when your code is waiting for IO or database.
using System;
using System.Threading;
using System.Threading.Tasks;
class Program
{
static void Main(string[] args)
{
var locker = new Object();
int count = 0;
Parallel.For
(0
, 1000
, new ParallelOptions { MaxDegreeOfParallelism = 2 }
, (i) =>
{
Interlocked.Increment(ref count);
lock (locker)
{
Console.WriteLine("Number of active threads:" + count);
Thread.Sleep(10);
}
Interlocked.Decrement(ref count);
}
);
}
}
If I don't specify MaxDegreeOfParallelism, the console logging shows that up to around 8 tasks are running at the same time. Like this:
Number of active threads:6
Number of active threads:7
Number of active threads:7
Number of active threads:7
Number of active threads:7
Number of active threads:7
Number of active threads:6
Number of active threads:7
Number of active threads:7
Number of active threads:7
Number of active threads:7
Number of active threads:7
Number of active threads:7
Number of active threads:7
Number of active threads:7
Number of active threads:7
Number of active threads:7
Number of active threads:7
Number of active threads:7
It starts lower, increases over time and at the end it is trying to run 8 at the same time.
If I limit it to some arbitrary value (say 2), I get
Number of active threads:2
Number of active threads:1
Number of active threads:2
Number of active threads:2
Number of active threads:2
Number of active threads:2
Number of active threads:2
Number of active threads:2
Number of active threads:2
Number of active threads:2
Number of active threads:2
Number of active threads:2
Number of active threads:2
Number of active threads:2
Number of active threads:2
Number of active threads:2
Number of active threads:2
Oh, and this is on a quad-core machine.
For example, is there any way to do something like: if the CPU has two cores, use 20; if it has four cores, use 40?
You can do this to make parallelism dependent on the number of CPU cores:
var options = new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount * 10 };
Parallel.ForEach(sourceCollection, options, sourceItem =>
{
// do something
});
However, newer CPUs tend to use hyper-threading to simulate extra cores. So if you have a quad-core processor, Environment.ProcessorCount will probably report 8 cores. I've found that if you set the parallelism to account for the simulated cores, it actually slows down other threads such as UI threads.
So although the operation will finish a bit faster, an application UI may experience significant lag during this time. Dividing Environment.ProcessorCount by 2 seems to achieve the same processing speeds while still keeping the CPU available for UI threads.
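The halving heuristic above could be sketched as follows. The names ParallelismHelper, HalfOfLogicalProcessors and CreateOptions are hypothetical, and the division by 2 assumes two logical processors per physical core, which will not hold on every machine.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelismHelper
{
    // Hypothetical helper: halves the logical processor count, never going below 1.
    // Assumes two logical processors per physical core (hyper-threading).
    public static int HalfOfLogicalProcessors(int logicalProcessors)
    {
        return Math.Max(1, logicalProcessors / 2);
    }

    // Builds ParallelOptions from the processor count reported by the runtime.
    public static ParallelOptions CreateOptions()
    {
        return new ParallelOptions
        {
            MaxDegreeOfParallelism = HalfOfLogicalProcessors(Environment.ProcessorCount)
        };
    }
}
```

The result of CreateOptions() can be passed as the second argument of Parallel.ForEach. The Math.Max guard matters on single-core machines, where integer division would otherwise yield 0, a value MaxDegreeOfParallelism rejects.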
It sounds like the code that you're running in parallel is deadlocking, which means that unless you can find and fix the issue that's causing that, you shouldn't parallelize it at all.
Something else to consider, especially for those finding this many years later: depending on your situation, it's usually best to collect all the data in a DataTable and then use SqlBulkCopy toward the end of each major task.
For example I have a process that I made that runs through millions of files and I ran into the same errors when each file transaction made a DB query to insert the record. I instead moved to storing it all in a DataTable in memory for each share I iterated through, dumping the DataTable into my SQL Server and clearing it between each separate share. The bulk insert takes a split second and has the benefit of not opening thousands of connections at once.
EDIT:
Here's a quick & dirty working example
The SqlBulkCopy method:
private static void updateDatabase(DataTable targetTable)
{
try
{
DataSet ds = new DataSet("FileFolderAttribute");
ds.Tables.Add(targetTable);
writeToLog(targetTable.TableName + " - Rows: " + targetTable.Rows.Count, logDatabaseFile, getLineNumber(), getCurrentMethod(), true);
writeToLog(@"Opening SQL connection", logDatabaseFile, getLineNumber(), getCurrentMethod(), true);
Console.WriteLine(@"Opening SQL connection");
SqlConnection sqlConnection = new SqlConnection(sqlConnectionString);
sqlConnection.Open();
SqlBulkCopy bulkCopy = new SqlBulkCopy(sqlConnection, SqlBulkCopyOptions.TableLock | SqlBulkCopyOptions.FireTriggers | SqlBulkCopyOptions.UseInternalTransaction, null);
bulkCopy.DestinationTableName = "FileFolderAttribute";
writeToLog(@"Copying data to SQL Server table", logDatabaseFile, getLineNumber(), getCurrentMethod(), true);
Console.WriteLine(@"Copying data to SQL Server table");
foreach (var table in ds.Tables)
{
writeToLog(table.ToString(), logDatabaseFile, getLineNumber(), getCurrentMethod(), true);
Console.WriteLine(table.ToString());
}
bulkCopy.WriteToServer(ds.Tables[0]);
sqlConnection.Close();
sqlConnection.Dispose();
writeToLog(@"Closing SQL connection", logDatabaseFile, getLineNumber(), getCurrentMethod(), true);
writeToLog(@"Clearing local DataTable...", logDatabaseFile, getLineNumber(), getCurrentMethod(), true);
Console.WriteLine(@"Closing SQL connection");
Console.WriteLine(@"Clearing local DataTable...");
targetTable.Clear();
ds.Tables.Remove(targetTable);
ds.Clear();
ds.Dispose();
}
catch (Exception error)
{
errorLogging(error, getCurrentMethod(), logDatabaseFile);
}
}
...and for dumping it into the datatable:
private static void writeToDataTable(string ServerHostname, string RootDirectory, string RecordType, string Path, string PathDirectory, string PathFileName, string PathFileExtension, decimal SizeBytes, decimal SizeMB, DateTime DateCreated, DateTime DateModified, DateTime DateLastAccessed, string Owner, int PathLength, DateTime RecordWriteDateTime)
{
try
{
if (tableToggle)
{
DataRow toInsert = results_1.NewRow();
toInsert[0] = ServerHostname;
toInsert[1] = RootDirectory;
toInsert[2] = RecordType;
toInsert[3] = Path;
toInsert[4] = PathDirectory;
toInsert[5] = PathFileName;
toInsert[6] = PathFileExtension;
toInsert[7] = SizeBytes;
toInsert[8] = SizeMB;
toInsert[9] = DateCreated;
toInsert[10] = DateModified;
toInsert[11] = DateLastAccessed;
toInsert[12] = Owner;
toInsert[13] = PathLength;
toInsert[14] = RecordWriteDateTime;
results_1.Rows.Add(toInsert);
}
else
{
DataRow toInsert = results_2.NewRow();
toInsert[0] = ServerHostname;
toInsert[1] = RootDirectory;
toInsert[2] = RecordType;
toInsert[3] = Path;
toInsert[4] = PathDirectory;
toInsert[5] = PathFileName;
toInsert[6] = PathFileExtension;
toInsert[7] = SizeBytes;
toInsert[8] = SizeMB;
toInsert[9] = DateCreated;
toInsert[10] = DateModified;
toInsert[11] = DateLastAccessed;
toInsert[12] = Owner;
toInsert[13] = PathLength;
toInsert[14] = RecordWriteDateTime;
results_2.Rows.Add(toInsert);
}
}
catch (Exception error)
{
errorLogging(error, getCurrentMethod(), logFile);
}
}
...and here's the context, the looping piece itself:
private static void processTargetDirectory(DirectoryInfo rootDirectory, string targetPathRoot)
{
DateTime StartTime = DateTime.Now;
int directoryCount = 0;
int fileCount = 0;
try
{
manageDataTables();
Console.WriteLine(rootDirectory.FullName);
writeToLog(@"Working in Directory: " + rootDirectory.FullName, logFile, getLineNumber(), getCurrentMethod(), true);
applicationsDirectoryCount++;
// REPORT DIRECTORY INFO //
string directoryOwner = "";
try
{
directoryOwner = File.GetAccessControl(rootDirectory.FullName).GetOwner(typeof(System.Security.Principal.NTAccount)).ToString();
}
catch (Exception error)
{
//writeToLog("\t" + rootDirectory.FullName, logExceptionsFile, getLineNumber(), getCurrentMethod(), true);
writeToLog("[" + error.Message + "] - " + rootDirectory.FullName, logExceptionsFile, getLineNumber(), getCurrentMethod(), true);
errorLogging(error, getCurrentMethod(), logFile);
directoryOwner = "SeparatedUser";
}
writeToRawLog(serverHostname + "," + targetPathRoot + "," + "Directory" + "," + rootDirectory.Name + "," + rootDirectory.Extension + "," + 0 + "," + 0 + "," + rootDirectory.CreationTime + "," + rootDirectory.LastWriteTime + "," + rootDirectory.LastAccessTime + "," + directoryOwner + "," + rootDirectory.FullName.Length + "," + DateTime.Now + "," + rootDirectory.FullName + "," + "", logResultsFile, true, logFile);
//writeToDBLog(serverHostname, targetPathRoot, "Directory", rootDirectory.FullName, "", rootDirectory.Name, rootDirectory.Extension, 0, 0, rootDirectory.CreationTime, rootDirectory.LastWriteTime, rootDirectory.LastAccessTime, directoryOwner, rootDirectory.FullName.Length, DateTime.Now);
writeToDataTable(serverHostname, targetPathRoot, "Directory", rootDirectory.FullName, "", rootDirectory.Name, rootDirectory.Extension, 0, 0, rootDirectory.CreationTime, rootDirectory.LastWriteTime, rootDirectory.LastAccessTime, directoryOwner, rootDirectory.FullName.Length, DateTime.Now);
if (rootDirectory.GetDirectories().Length > 0)
{
Parallel.ForEach(rootDirectory.GetDirectories(), new ParallelOptions { MaxDegreeOfParallelism = directoryDegreeOfParallelism }, dir =>
{
Interlocked.Increment(ref directoryCount); // ++ is not atomic inside a parallel loop
Interlocked.Increment(ref threadCount);
processTargetDirectory(dir, targetPathRoot);
});
}
// REPORT FILE INFO //
Parallel.ForEach(rootDirectory.GetFiles(), new ParallelOptions { MaxDegreeOfParallelism = fileDegreeOfParallelism }, file =>
{
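Interlocked.Increment(ref applicationsFileCount); // assumes an int or long field
Interlocked.Increment(ref fileCount); // ++ is not atomic inside a parallel loop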
Interlocked.Increment(ref threadCount);
processTargetFile(file, targetPathRoot);
});
}
catch (Exception error)
{
writeToLog(error.Message, logExceptionsFile, getLineNumber(), getCurrentMethod(), true);
errorLogging(error, getCurrentMethod(), logFile);
}
finally
{
Interlocked.Decrement(ref threadCount);
}
DateTime EndTime = DateTime.Now;
writeToLog(@"Run time for " + rootDirectory.FullName + @" is: " + (EndTime - StartTime).ToString() + @" | File Count: " + fileCount + @", Directory Count: " + directoryCount, logTimingFile, getLineNumber(), getCurrentMethod(), true);
}
As noted above, this is quick & dirty, but works very well.
Because of memory-related issues I ran into at around 2,000,000 records, I had to create a second DataTable and alternate between the two, dumping the records to SQL Server at each switch. So my SQL connections amount to one per 100,000 records.
I managed that like this:
private static void manageDataTables()
{
try
{
Console.WriteLine(@"[Checking datatable size] toggleValue: " + tableToggle + " | " + @"r1: " + results_1.Rows.Count + " - " + @"r2: " + results_2.Rows.Count);
if (tableToggle)
{
int rowCount = 0;
if (results_1.Rows.Count > datatableRecordCountThreshhold)
{
tableToggle ^= true;
writeToLog(@"results_1 row count > 100000 # " + results_1.Rows.Count, logDatabaseFile, getLineNumber(), getCurrentMethod(), true);
rowCount = results_1.Rows.Count;
logResultsFile = "FileServerReport_Results_" + DateTime.Now.ToString("yyyyMMdd-HHmmss") + ".txt";
Thread.Sleep(5000);
if (results_1.Rows.Count != rowCount)
{
writeToLog(@"results_1 row count increased, # " + results_1.Rows.Count, logDatabaseFile, getLineNumber(), getCurrentMethod(), true);
rowCount = results_1.Rows.Count;
Thread.Sleep(15000);
}
writeToLog(@"results_1 row count stopped increasing, updating database...", logDatabaseFile, getLineNumber(), getCurrentMethod(), true);
updateDatabase(results_1);
results_1.Clear();
writeToLog(@"results_1 cleared, count: " + results_1.Rows.Count, logDatabaseFile, getLineNumber(), getCurrentMethod(), true);
}
}
else
{
int rowCount = 0;
if (results_2.Rows.Count > datatableRecordCountThreshhold)
{
tableToggle ^= true;
writeToLog(@"results_2 row count > 100000 # " + results_2.Rows.Count, logDatabaseFile, getLineNumber(), getCurrentMethod(), true);
rowCount = results_2.Rows.Count;
logResultsFile = "FileServerReport_Results_" + DateTime.Now.ToString("yyyyMMdd-HHmmss") + ".txt";
Thread.Sleep(5000);
if (results_2.Rows.Count != rowCount)
{
writeToLog(@"results_2 row count increased, # " + results_2.Rows.Count, logDatabaseFile, getLineNumber(), getCurrentMethod(), true);
rowCount = results_2.Rows.Count;
Thread.Sleep(15000);
}
writeToLog(@"results_2 row count stopped increasing, updating database...", logDatabaseFile, getLineNumber(), getCurrentMethod(), true);
updateDatabase(results_2);
results_2.Clear();
writeToLog(@"results_2 cleared, count: " + results_2.Rows.Count, logDatabaseFile, getLineNumber(), getCurrentMethod(), true);
}
}
}
catch (Exception error)
{
errorLogging(error, getCurrentMethod(), logDatabaseFile);
}
}
Where datatableRecordCountThreshhold = 100000.
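The alternating-table idea can also be expressed without toggles and sleeps. The following is only a sketch under stated assumptions, not the code used above: BatchBuffer is a hypothetical class, and the flush callback stands in for whatever writes a batch to the database (for example a SqlBulkCopy call). The callback may run on several threads at once, so it has to be safe for that.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: collects items from many threads and hands them
// to a flush callback in batches of a fixed size.
public sealed class BatchBuffer<T>
{
    private readonly object _gate = new object();
    private readonly int _threshold;
    private readonly Action<IReadOnlyList<T>> _flush;
    private List<T> _current = new List<T>();

    public BatchBuffer(int threshold, Action<IReadOnlyList<T>> flush)
    {
        if (threshold < 1) throw new ArgumentOutOfRangeException(nameof(threshold));
        _threshold = threshold;
        _flush = flush ?? throw new ArgumentNullException(nameof(flush));
    }

    public void Add(T item)
    {
        List<T> full = null;
        lock (_gate)
        {
            _current.Add(item);
            if (_current.Count >= _threshold)
            {
                full = _current;           // detach the full batch
                _current = new List<T>();  // writers continue into a fresh list
            }
        }
        // Flush outside the lock so other threads can keep adding.
        if (full != null) _flush(full);
    }

    // Call once after the parallel loop has finished to write the remainder.
    public void FlushRemaining()
    {
        List<T> rest;
        lock (_gate)
        {
            rest = _current;
            _current = new List<T>();
        }
        if (rest.Count > 0) _flush(rest);
    }
}
```

Swapping the list under a lock plays the role of the two alternating DataTables: a full batch is handed off while new rows go into a fresh list, so no thread has to wait for the row count to stop increasing.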
The Parallel.ForEach method internally starts a number of tasks, and each of these tasks repeatedly takes an item from the source sequence and invokes the body delegate for this item. The MaxDegreeOfParallelism can set an upper limit on the number of these internal tasks. But this setting is not the only factor that limits the parallelism. There is also the willingness of the TaskScheduler to execute the tasks that are spawned by the Parallel.ForEach.
The spawning mechanism works by each spawned task replicating itself. In other words, the first thing that each task does is create another task. Most TaskSchedulers have a limit on how many tasks can execute concurrently, and when this limit is reached they queue the next incoming tasks without executing them immediately. So eventually the self-replicating pattern of Parallel.ForEach will stop spawning more tasks, because the last task spawned will be sitting idle in the TaskScheduler's queue.
Let's talk about the TaskScheduler.Default, which is the default scheduler of the Parallel.ForEach, and schedules the tasks on the ThreadPool. The ThreadPool has a soft and a hard limit. Past the soft limit, the demand for work is not satisfied immediately; at the hard limit, the demand for work is not satisfied at all until an already running work item completes. When the ThreadPool reaches the soft limit, which is Environment.ProcessorCount by default, it spawns more threads to satisfy the demand at a frequency of one new thread per second¹. The soft limit can be configured with the ThreadPool.SetMinThreads method. The hard limit can be found with the ThreadPool.GetMaxThreads method, and is 32,767 threads on my machine.
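As a small sketch, the two limits can be read and the soft limit raised like this. ThreadPoolLimits and its method names are hypothetical; the ThreadPool calls are the ones mentioned above.

```csharp
using System;
using System.Threading;

public static class ThreadPoolLimits
{
    // Returns the worker-thread soft limit (minimum) and hard limit (maximum).
    public static (int Min, int Max) GetWorkerLimits()
    {
        ThreadPool.GetMinThreads(out int minWorker, out _);
        ThreadPool.GetMaxThreads(out int maxWorker, out _);
        return (minWorker, maxWorker);
    }

    // Raises the soft limit for worker threads and leaves the
    // I/O completion thread setting unchanged.
    public static bool TryRaiseWorkerMinimum(int workerThreads)
    {
        ThreadPool.GetMinThreads(out _, out int minIo);
        return ThreadPool.SetMinThreads(workerThreads, minIo);
    }
}
```

SetMinThreads returns false when the requested value is rejected, for instance when it exceeds the current maximum, so the return value is worth checking.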
So if I configure the Parallel.ForEach on my 4-core machine with MaxDegreeOfParallelism = 20, and the body delegate keeps the current thread busy for more than one second, the effective degree of parallelism will start at 5, then it will gradually increase during the next 15 seconds until it becomes 20, and it will stay at 20 until the completion of the loop. The reason that it starts at 5 instead of 4 is that the Parallel.ForEach also uses the current thread, along with the ThreadPool.
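One way to observe the effective degree of parallelism is to count how many loop bodies are running at the same moment. This is a minimal sketch with hypothetical names (ParallelismProbe, MeasurePeak); Thread.Sleep merely stands in for blocking work, and the observed peak depends on the machine and on the state of the ThreadPool.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelismProbe
{
    // Runs a loop whose body blocks briefly and returns the highest number
    // of iterations that were observed running at the same time.
    public static int MeasurePeak(int maxDegreeOfParallelism, int iterations, int sleepMs)
    {
        int active = 0;
        int peak = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDegreeOfParallelism };

        Parallel.For(0, iterations, options, i =>
        {
            int now = Interlocked.Increment(ref active);
            // Record a new peak with a compare-and-swap retry loop.
            int seen;
            while (now > (seen = Volatile.Read(ref peak)))
            {
                if (Interlocked.CompareExchange(ref peak, now, seen) == seen) break;
            }
            Thread.Sleep(sleepMs); // stands in for blocking work such as a database call
            Interlocked.Decrement(ref active);
        });

        return peak;
    }
}
```

The returned peak never exceeds the configured MaxDegreeOfParallelism, but it can be lower when the ThreadPool has not yet injected enough threads.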
If I don't configure the MaxDegreeOfParallelism, it will be the same as configuring it with the value -1, which means unlimited parallelism. In this case the ThreadPool availability will be the only limiting factor of the actual degree of parallelism. As long as the Parallel.ForEach runs, the ThreadPool will be saturated; in other words, demand will constantly exceed supply. Each time a new thread is spawned by the ThreadPool, this thread will pick the last task scheduled previously by the Parallel.ForEach, which will immediately replicate itself, and the replica will enter the ThreadPool's queue. Provided that the Parallel.ForEach runs for a sufficiently long time, the ThreadPool will reach its maximum size (32,767 on my machine), and will stay at this level until the completion of the loop. This assumes that the process has not already crashed for lack of other resources like memory.
The official documentation for the MaxDegreeOfParallelism property states that "generally, you do not need to modify this setting". Apparently it has been this way since the introduction of the TPL with .NET Framework 4.0 (2010). At this point you may have started questioning the validity of this advice. So did I, so I posted a question on the dotnet/runtime repository, asking whether the advice is still valid or outdated. I was surprised to receive the feedback that the advice is as valid as ever. Microsoft's argument is that limiting the MaxDegreeOfParallelism to the value Environment.ProcessorCount may cause performance regressions, or even deadlocks in some scenarios. I responded with a couple of examples demonstrating the problematic behavior that might emerge when an unconfigured Parallel.ForEach runs in an async-enabled application, where other things are happening concurrently with the parallel loop. The demos were dismissed as unrepresentative, because I used the Thread.Sleep method to simulate the work inside the loop.
My personal suggestion is: whenever you use any of the Parallel methods, always specify the MaxDegreeOfParallelism explicitly. If you buy my argument that saturating the ThreadPool is undesirable and unhealthy, you can configure it with a suitable value like Environment.ProcessorCount. If you buy Microsoft's arguments, you can configure it with -1. In either case, everyone who sees your code will know that you made a conscious and informed decision.
¹ The injection rate of the ThreadPool is not documented. The "one new thread per second" is an experimental observation.
It sets the maximum number of threads that run in parallel.
Related
DateTime multithreading UDP Sender has very different time results when re-assigning the value of DateTime object in my schedule loop
Summary: the only code difference between the blocks is the assignment and re-assignment of the DateTime object.
I have an app that sends UDP packets out on a very rudimentary "schedule". I am seeing some behavior that makes me think I have a big misunderstanding about timers, threads, or some other concept. I am trying to send out packets using the SendAllMessages() method based on a SendingSchedule list. My sending frequency is 12.5 Hz, or just under a 0.1 second delay between each burst of messages. However, I am seeing very different values if I re-assign the DateTime object within my loop in this method. Note that this method is run in 4 different tasks, so 4 different threads.
The first example of my code gives the behavior I desire and expect: 4 packets sent at nearly the same time, then 0.08 seconds later another burst of 4 messages. The second example of code sends messages at a much slower rate, but I don't understand why. I thought both would behave identically. Is my "time" object being shared between threads somehow in my second example, or is something else happening?
Working code (burst of 4 messages followed by waiting 0.08 seconds before another burst):
private static void SendAllMessages(List<DataMessageFormat> dataMessageList, UDPSender udpSender, byte[] first4bytes, int messageSize, bool loopContinuously = false)
{
    // pass inetMessageList to DataMessageEncoder
    MessageDataEncoder dataMessageEncoder = new MessageDataEncoder();
    List<byte[]> byteArrayListDataMessage = dataMessageEncoder.ConvertFromFormatToByteArray(dataMessageList, first4bytes, messageSize, switchDefaultEndian);
    Console.WriteLine("Sending " + first4bytes + " UDP Messages on Thread" + Thread.CurrentThread.ManagedThreadId);
    do
    {
        DateTime start = DateTime.Now;
        for (int i = 0; i < byteArrayListDataMessage.Count; i++) // all message lists must have the same count for this to work
        {
            DateTime time = start.AddSeconds(dataMessageEncoder.SendingSchedule[i]);
            Send:
            if (DateTime.Now > time)
            {
                udpSender.SendUDPOnce(byteArrayListDataMessage[i]);
            }
            else
            {
                System.Threading.Thread.Sleep(1);
                goto Send;
            }
        }
    } while (loopContinuously);
}
The code below waits a very long time between bursts of 4 messages, almost as if the threads are all waiting on the same DateTime object:
private static void SendAllMessages(List<DataMessageFormat> dataMessageList, UDPSender udpSender, byte[] first4bytes, int messageSize, bool loopContinuously = false)
{
    // pass inetMessageList to DataMessageEncoder
    MessageDataEncoder dataMessageEncoder = new MessageDataEncoder();
    List<byte[]> byteArrayListDataMessage = dataMessageEncoder.ConvertFromFormatToByteArray(dataMessageList, first4bytes, messageSize, switchDefaultEndian);
    Console.WriteLine("Sending " + first4bytes + " UDP Messages on Thread" + Thread.CurrentThread.ManagedThreadId);
    do
    {
        DateTime time = DateTime.Now;
        for (int i = 0; i < byteArrayListDataMessage.Count; i++) // all message lists must have the same count for this to work
        {
            time = time.AddSeconds(dataMessageEncoder.SendingSchedule[i]);
            Send:
            if (DateTime.Now > time)
            {
                udpSender.SendUDPOnce(byteArrayListDataMessage[i]);
            }
            else
            {
                System.Threading.Thread.Sleep(1);
                goto Send;
            }
        }
    } while (loopContinuously);
}
[Screenshots in the original question: Wireshark capture for the working code; Wireshark capture for the slow code]
The two versions have different logic.
First version: the time value you are comparing against is the initial time plus the schedule entry. Once you hit the desired time, the if test will probably always be true and you send all byteArrayListDataMessage.Count elements.
Second version: every step of the for loop creates a moving target, so only one element gets sent.
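The difference is easier to see when the target times are computed without any sending. The sketch below is illustrative only: ScheduleMath is a hypothetical class, and it assumes that SendingSchedule holds offsets in seconds measured from the start of the loop, which the question does not state.

```csharp
using System;
using System.Collections.Generic;

public static class ScheduleMath
{
    // First version: every target is the loop start (taken as 0) plus that entry's offset.
    public static List<double> AbsoluteTargets(IReadOnlyList<double> schedule)
    {
        var targets = new List<double>();
        foreach (double offset in schedule) targets.Add(offset);
        return targets;
    }

    // Second version: each offset is added to the previous target, so the delays pile up.
    public static List<double> AccumulatedTargets(IReadOnlyList<double> schedule)
    {
        var targets = new List<double>();
        double time = 0;
        foreach (double offset in schedule)
        {
            time += offset;
            targets.Add(time);
        }
        return targets;
    }
}
```

For an assumed schedule of 0, 0.08, 0.16 and 0.24 seconds, the first version waits until 0.24 s for the last message, while the second waits until roughly 0.48 s, and the gap keeps growing with every additional entry.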
StopWatch in C# to calculate inactivity time
I've got the code:
public static Stopwatch stopWatch = new Stopwatch();
private void start_Checker()
{
    stopWatch.Start();
    while (true)
    {
        TimeSpan ts = stopWatch.Elapsed;
        if (ts.Minutes % 15 == 0)
        {
            Core.sendLog("Detected " + ts.Minutes + " of possible inactivity. Bot might be in game. Waiting " + (Core.inactivity_Max - ts.Minutes) + " minutes before restarting", false);
        }
        if (ts.Minutes >= Core.inactivity_Max)
        {
            Core.sendLog(Core.inactivity_Max + " minutes of inactivity - restarting the bot.");
            Thread.Sleep(500);
            Process.Start(Assembly.GetExecutingAssembly().Location);
            Environment.Exit(0);
        }
        Thread.Sleep(10000);
    }
}
and this one in the Core class:
public static void sendLog(string text, bool isAction = true)
{
    if (isAction)
    {
        Listener.stopWatch.Reset();
    }
    using (WebClient client = new WebClient())
    {
        try
        {
            string log = "[" + account[0] + "] " + text + " | Time: " + DateTime.Now;
            client.OpenRead(url + @"elements/logs/logs.php?file=" + used_email + "&text=" + log);
        }
        catch (Exception)
        {
            return;
        }
    }
}
It is supposed to send the log every 15 minutes, and if ts.Minutes exceeds the max inactivity time, it's supposed to restart the application. Every time sendLog() is executed, it resets the stopwatch's time. The current code results in the log file being spammed with messages like the one below:
[ChristianFromDK] Detected0 of possible inactivity. Bot might be in game. Waiting 80 minutes before restarting | Time: 7/21/2017 7:50:18 PM
What have I done wrong?
Since you are just checking for modulo there will be a message every ten seconds while the StopWatch is less than one minute and for every 15 minutes from there on. This is due to zero modulo 15 being zero so the condition matches. If you want to call this only once every 15 minutes you will have to compare the current value to a previous value, if it's more than 15 minutes send the message and then set the previous value to current value. Then keep comparing. This way it will happen only once when the timer gets to 15 minutes. Also remember to zero the previous value when the stopwatch is zeroed. You could also use a timer for it canceling it when the stopwatch is zeroed. Usually system timers are less resource intensive.
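The "compare with a previous value" approach described above might look like the following sketch. InactivityReporter is a hypothetical class, not part of the asker's code; it only decides whether a message is due and leaves the sending to the caller.

```csharp
using System;

// Hypothetical helper: decides when an inactivity message is due.
public sealed class InactivityReporter
{
    private readonly TimeSpan _interval;
    private TimeSpan _lastReported = TimeSpan.Zero;

    public InactivityReporter(TimeSpan interval)
    {
        _interval = interval;
    }

    // Returns true once per elapsed interval instead of on every poll.
    public bool ShouldReport(TimeSpan elapsed)
    {
        if (elapsed - _lastReported < _interval) return false;
        _lastReported = elapsed;
        return true;
    }

    // Call whenever the stopwatch is reset.
    public void Reset()
    {
        _lastReported = TimeSpan.Zero;
    }
}
```

In the polling loop, the modulo test would be replaced with a call such as reporter.ShouldReport(stopWatch.Elapsed), and Reset() would be called next to stopWatch.Reset().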
avoid long connection time
I have a class called Messaging. When an instance of the class is created, a connection is made to a service which in turn has access to a database. This connection takes 5 seconds (Messaging.Connection MESConnection = new Messaging.Connection(); below). The class has a method with which the user can submit a message that puts some data in the database. When a user presses a button I want to submit X messages to the database using threading. I have this working using the Task Parallel Library, but the issue is that the X threads create X instances of the class, which means that the whole operation takes about 10 seconds if X is 30, for example. How could I keep, say, 10 connections open ahead of time, so that when messages are submitted the connection to the database is already open and I avoid the 5 second connection time?
C# Code
// Loop through and multithread
foreach (string container in containers)
{
    int output = Convert.ToInt32(container);
    Task t = Task.Factory.StartNew(() =>
    {
        Messaging.Connection MESConnection = new Messaging.Connection(); //Takes 5 seconds
        BSCContainerWorkflowResponse.BscContainerWorkflowResponse WorkflowResponse2;
        // Get device next step
        MESConnection.xmlMessage = Messaging.BscContainerNextTaskRequest(Convert.ToString(output));
        // Send message to MES
        String result;
        result = MESConnection.SendMessage();
        if (result != "")
        {
            MessageBox.Show("Error sending message to MES: " + result);
            return null;
        }
        result = MESConnection.GetReply();
        if (result != "")
        {
            MessageBox.Show("Error receiving message from MES: " + result);
            return null;
        }
        WorkflowResponse2 = BSCContainerWorkflowResponse.ReadBscContainerWorkflowResponse(MESConnection.xmlReply);
        if (WorkflowResponse2.mes_message.msg_header.msg_stat < 0)
        {
            MessageBox.Show("Error with mes Response " + " message stat:" + Convert.ToString(WorkflowResponse.mes_message.msg_header.msg_stat) + " Error source " + (WorkflowResponse.mes_message.msg_error.error_source) + " Error code " + (WorkflowResponse.mes_message.msg_error.error_code) + " Error string " + Convert.ToString(WorkflowResponse.mes_message.msg_error.error_string), "MES Message Error");
            return null;
        }
        return WorkflowResponse2;
    }).ContinueWith(o =>
    {
        listBox1.Items.Add(o.Result.mes_message.msg_body.Container.Name + " " + o.Result.mes_message.msg_body.Container.Product.Name + " " + o.Result.mes_message.msg_body.Container.Product.BscModelNumber + " " + o.Result.mes_message.msg_body.Container.BscSerialNumber + " " + o.Result.mes_message.msg_body.Container.TaskList.Name + " " + o.Result.mes_message.msg_body.Container.TaskList.Revision + " " + Convert.ToString(o.Result.mes_message.msg_body.Container.MfgOrder.BscSWR));
        buttonSendBSCNextTaskRequestThreaded.Text = "Process";
        buttonSendBSCNextTaskRequestThreaded.Enabled = true;
    }, TaskScheduler.FromCurrentSynchronizationContext());
}
The right approach would be to use a connection pool to alleviate the long connection times. The ADO.NET providers have this built in. If your connection class doesn't, you can implement one yourself, as it could improve the performance of the entire application.
That said, the right approach in this particular situation would be to compare what takes longer:
- creating multiple connections to push the data through, or
- reusing a single connection to push the data through.
This will depend on how much data you are pushing through and the latency involved. To keep things simple, I'd probably start off with reusing 1 connection, and if that proves insufficient, try connection pooling. I'd only resort to parallelization if the latency is very high.
Always dispose of disposables when you are no longer using them. A connection typically requires disposing.
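If the connection class has no pooling of its own, a very small pool can be built on BlockingCollection. This is a minimal sketch under stated assumptions: SimplePool is a hypothetical class, T stands in for the expensive connection type, and the sketch ignores broken connections, timeouts and health checks that a real pool would need.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical minimal pool; T stands in for an expensive connection type.
public sealed class SimplePool<T> : IDisposable where T : class
{
    private readonly BlockingCollection<T> _items = new BlockingCollection<T>();

    public SimplePool(int size, Func<T> factory)
    {
        // Pay the connection cost up front, before any message is submitted.
        for (int i = 0; i < size; i++) _items.Add(factory());
    }

    // Blocks until a pooled item is free.
    public T Rent() => _items.Take();

    // Hands an item back so that another caller can reuse it.
    public void Return(T item) => _items.Add(item);

    public void Dispose()
    {
        // Dispose the items that are currently idle, then the collection itself.
        while (_items.TryTake(out T item)) (item as IDisposable)?.Dispose();
        _items.Dispose();
    }
}
```

Each worker would call Rent() at the start of its work and Return() in a finally block, so that at most size connections exist no matter how many messages are submitted.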
Multi Threaded Windows Application uses more memory when idle
I've got a technical question around memory usage and multi-threaded applications. My scenario is that I've built a server application (in C#) that is constantly running and updating DB info periodically (every minute). The application is split into two threads. The first thread handles the UI. The second thread handles all the grunt work in a continuous loop (until Tasks flag = false). When the second thread completes its tasks it goes to sleep for 60 seconds, after which it loops and does it again.
The application runs happily without memory issues or resource issues, but I've noticed something strange which I don't understand. When the application is working (doing its tasks), it utilizes 100% of the available CPU and the memory drops to 80k kb. But when the tasks are finished and the second thread goes to sleep, the memory goes up to 180k kb. I'd appreciate it if anyone can explain this.
Code
private void BackgroundWorker1()
{
    BackgroundWorker bw = new BackgroundWorker();
    bw.WorkerReportsProgress = true;
    bw.DoWork += new DoWorkEventHandler(
        delegate(object o, DoWorkEventArgs args)
        {
            BackgroundWorker b = o as BackgroundWorker;
            {
                while (Tasks)
                {
                    try
                    {
                        SetValue(System.DateTime.Now.ToString() + " - Opening DB Connection \r\n");
                        GlobalVars.WiFiToolDataSource.Open();
                        SetValue(System.DateTime.Now.ToString() + " - Starting Task \r\n");
                        TicketFunctions.GetBOSSTickets();
                        SetValue(System.DateTime.Now.ToString() + " - Got Tickets from BOSS \r\n");
                        EmailFunctions.GetEmails("");
                        SetValue(System.DateTime.Now.ToString() + " - Got Emails from Inbox \r\n");
                        EmailFunctions.GetEmails("WiFI Survey Archive");
                        SetValue(System.DateTime.Now.ToString() + " - Got Emails from Archive \r\n");
                        EmailFunctions.checkEmails("BodyText", GlobalVars.ApplicationData + "Emails");
                        SetValue(System.DateTime.Now.ToString() + " - Loaded Emails in DB \r\n");
                        EmailFunctions.CreateEmailIndexes();
                        SetValue(System.DateTime.Now.ToString() + " - Created Email Indexes \r\n");
                        TicketFunctions.UpdateTicketList();
                        SetValue(System.DateTime.Now.ToString() + " - Updated Tickets \r\n");
                        TicketFunctions.BuildTicketKB();
                        SetValue(System.DateTime.Now.ToString() + " - Finished Build Knowledge Base \r\n");
                        SetValue(System.DateTime.Now.ToString() + " - Finished Refresh \r\n");
                        if (!Tasks) break;
                        if (System.DateTime.Now.Hour.ToString("HH") == "00")
                        {
                            if (newFile)
                            {
                                string CDate = System.DateTime.Now.AddDays(-1).ToString("ddMMyyyy");
                                string FileName = "QA_LOG_" + CDate + ".txt";
                                System.IO.File.WriteAllText(GlobalVars.ApplicationData + @"QA Server Log\" + FileName, textBox1.Text);
                                ClearValue();
                                newFile = false;
                            }
                        }
                        if (System.DateTime.Now.Hour.ToString("HH") == "01")
                        {
                            newFile = true;
                        }
                        Thread.Sleep(60000);
                    }
                    catch (Exception e)
                    {
                        SetValue("Error : " + e.ToString() + " \r\n");
                    }
                    finally
                    {
                        GlobalVars.WiFiToolDataSource.Close();
                    }
                }
            }
        });
    bw.RunWorkerCompleted += new RunWorkerCompletedEventHandler(
        delegate(object o, RunWorkerCompletedEventArgs args)
        {
            SetValue(System.DateTime.Now.ToString() + " - Complete \r\n");
        });
    bw.RunWorkerAsync();
}
Odds are one of the later tasks that you have is one that is more memory intensive, while the earlier tasks take longer but don't consume as much memory. Given that you're sleeping for 60 seconds, there's a high chance that a garbage collection will run sometime during that time period, thus bringing the memory back down before the start of the next iteration. Without knowing the specifics of your functions, going into any more detail wouldn't be possible.
The computer ultimately wants everything to be as close to the CPU as possible, so while the thread is active it will try to move things out of memory into the various levels of cache. When the thread goes to sleep, the computer will try to free up the different levels of cache to make room for active tasks, moving things further and further away from the CPU (back to memory or disk).
Many TCPClients in benchmark do not close properly
I'm currently programming a benchmark for my TCP socket server. The basic concept is the following:
- The client creates 10000 connections
- 2500 connections are concurrent
- They all send ping-pong messages to the server for 10 seconds and receive the pong
- After the 10 seconds they all disconnect
When I use smaller numbers of connections (100 concurrent and 1000 connections) everything works fine, but with the setup above, some of the connections remain connected at the server. This means that the close call never reaches the server at all. Here is the code for the explanation above:
class Program
{
    static List<Thread> mConnectionThreads_ = new List<Thread>(); //!< The list of the Threads for all textloaders
    static List<TCPConnection> mConnections_ = new List<TCPConnection>(); //!< The list of TextsXMLParser
    static void Main(string[] args)
    {
        int numConnections = 10000;
        int numConcurrentConnections = 2500;
        for( int k = 0; k < numConnections/numConcurrentConnections; ++k)
        {
            for( int i = 0; i < numConcurrentConnections; ++i )
            {
                TCPConnection connection = new TCPConnection();
                connection.connect(((k+1)*numConcurrentConnections)+i);
                mConnections_.Add(connection);
                mConnectionThreads_.Add(new Thread(connection.pingLoop));
            }
            Console.WriteLine(((k+1)*numConcurrentConnections) + "/" + numConnections + " Threads connected");
            // start all threads
            foreach (Thread t in mConnectionThreads_) t.Start();
            foreach (Thread t in mConnectionThreads_) t.Join();
            foreach (TCPConnection c in mConnections_) c.disconnect();
            Console.WriteLine(((k+1)*numConcurrentConnections) + "/" + numConnections + " Threads disconnected " + cnt + " calls");
            mConnections_.Clear();
            mConnectionThreads_.Clear();
        }
    }
}
The disconnect function looks like the following:
public void disconnect()
{
    if( mClient_.Client != null )
    {
        mClient_.Client.Disconnect(false);
        //mClient_.GetStream().Close();
        //mClient_.Close();
        Console.WriteLine("closed " + mConnectionId_);
    }
    else if( mClient_.Client == null )
        Console.WriteLine("closed invalid " + mConnectionId_);
}
As you can see, I've already tried a lot of different close methods, but none of them works. Is there anything I can do in this case? Is anybody else having the same issue?
Maybe I'm missing something, but what type is mClient_.Client? Usually if you use a TCP client (the TcpClient class) you can call Close to close the connection. In the same fashion, when using a Socket or NetworkStream directly you can also call Close. On the other hand, you're detecting and debugging open/closed connections on the server, right? It is possible that the server code does not handle connection close properly and thus you get incorrect statistics. Also, under heavy load the server may not have enough CPU time to update the state of the connections, so you can expect some delays. Does your server use asynchronous I/O or the connection-per-thread principle?
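To check what the server side actually sees after a Close, a small loopback experiment can help. This is only a sketch: CloseDemo and ReadAfterClientClose are hypothetical names, the listener runs in the same process on a port chosen by the OS, and it says nothing about how the asker's server handles the close.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

public static class CloseDemo
{
    // Connects to a loopback listener, closes the client, and returns the
    // number of bytes the server side reads afterwards.
    public static int ReadAfterClientClose()
    {
        var listener = new TcpListener(IPAddress.Loopback, 0); // port 0: let the OS pick a free port
        listener.Start();
        try
        {
            int port = ((IPEndPoint)listener.LocalEndpoint).Port;
            var client = new TcpClient();
            client.Connect(IPAddress.Loopback, port);
            using (TcpClient serverSide = listener.AcceptTcpClient())
            {
                client.Close(); // closes the underlying socket
                var buffer = new byte[16];
                // A read that returns 0 means the remote side has closed the connection.
                return serverSide.GetStream().Read(buffer, 0, buffer.Length);
            }
        }
        finally
        {
            listener.Stop();
        }
    }
}
```

A server that never issues a read on an idle connection will not notice the close this way, which is one reason its connection statistics can lag behind the client's.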