I have a C# console application running on a 64-bit Windows Server 2008 R2 machine which also hosts SQL Server 2005.
This application runs through text files, reads each line, splits the line's values into variables, and inserts the data into a SQL Server database hosted at localhost.
Each text file is handled by a new thread, each line by a new thread, and each SQL insert statement is executed under a new thread.
I am counting the number of each of these types of threads and decrementing the count when they complete. I'm wondering what the best way is to "pend" future threads from opening...
For example, before a new SQL insert thread is opened I'm calling...
while (numberofcurrentthreads > specifiednumberofthreads)
{
    // busy-wait until a running thread finishes and decrements the count
}
new Thread(insertSQL).Start();
Here specifiednumberofthreads has been estimated as a value that does not throw System.OutOfMemoryExceptions. A lot of guesswork has gone into determining that number for each process.
My question is: is there a more 'efficient' or proper way to do this? Is there a way to read system memory, not physical memory, and wait based on a specified resource allotment?
To illustrate this idea (in pseudocode)...
while (System.Memory < (System.Memory / 2) || System.OutOfMemory == true)
{
    // wait
}
new Thread(insertSQL).Start();
The current method I am employing works and completes in a decent time, but it could do better. Some of the text files going through the process are larger than others and do not necessarily make the best use of system resources...
For example, if I say process 2 text files at a time, that works perfectly when both text files are < 300KB. It does not work so well if one or two are over 100,000KB.
There also seems to be a 'butter zone' where things process most efficiently, somewhere around 75% of all CPU resources. Crank these values too high and it will run at 100% CPU but process far slower, as it cannot keep up.
It's crazy to be creating a new thread for every file and for every line and for every SQL insert statement. You'd probably be much better off using three threads and a chained producer-consumer model, all of which communicate through thread-safe queues. In C#, that would be BlockingCollection.
First, you set up two queues, one for lines that have been read from a text file, and one for lines that have been processed:
const int MaxQueueSize = 10000;
BlockingCollection<string> _lines = new BlockingCollection<string>(MaxQueueSize);
BlockingCollection<DataObject> _dataObjects = new BlockingCollection<DataObject>(MaxQueueSize);
DataObject, by the way, is what I'm calling the object that you'll be inserting into the database. You don't say what that is. It doesn't really matter for the purposes of this discussion, but you'd replace it with whatever type you use to represent the processed string.
Now, you create three threads:
A reader thread that reads the text files line-by-line and places the lines into the _lines queue.
A processor thread that takes lines one-by-one from the _lines queue, processes each, and creates a DataObject, which it then places on the _dataObjects queue.
A writer thread that reads the _dataObjects queue and inserts the objects into the database.
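A minimal sketch of that pipeline (using System.IO, System.Threading, and System.Collections.Concurrent), assuming a files collection plus hypothetical ProcessLine and InsertIntoDatabase helpers standing in for your parsing and insert code:

// Reader: files -> _lines
var reader = new Thread(() =>
{
    foreach (string file in files)
        foreach (string line in File.ReadLines(file))
            _lines.Add(line);                  // blocks while the queue is full
    _lines.CompleteAdding();                   // signal: no more lines coming
});

// Processor: _lines -> _dataObjects
var processor = new Thread(() =>
{
    foreach (string line in _lines.GetConsumingEnumerable())
        _dataObjects.Add(ProcessLine(line));   // ProcessLine: your split/parse code
    _dataObjects.CompleteAdding();
});

// Writer: _dataObjects -> database
var writer = new Thread(() =>
{
    foreach (DataObject obj in _dataObjects.GetConsumingEnumerable())
        InsertIntoDatabase(obj);               // your existing insert logic
});

reader.Start();
processor.Start();
writer.Start();
writer.Join();                                 // the writer is the last to finish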
Beyond simplicity (and this is very easy to put together), there are many benefits to this model.
First, having more than one thread reading from the disk concurrently usually leads to slower performance because the disk drive can only do one thing at a time. Having multiple threads hitting the disk at the same time just causes unnecessary head seeks. Just one thread will keep your input queue full.
Second, limiting the queues' sizes will prevent you from running out of memory. When the disk reading thread tries to insert the 10,001st item into the queue, it will wait until the processing thread removes an item. That's the "blocking" part of BlockingCollection.
You might find that you can speed up your SQL inserts by grouping them and sending a bunch of records at once, doing what is essentially a bulk insert of 100 or 1,000 records at a time rather than sending 100 or 1,000 individual transactions.
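One way to do that grouping in the writer thread, sketched with a hypothetical BulkInsert helper (e.g. one that fills a DataTable and hands it to SqlBulkCopy):

var batch = new List<DataObject>(1000);
foreach (DataObject obj in _dataObjects.GetConsumingEnumerable())
{
    batch.Add(obj);
    if (batch.Count == 1000)
    {
        BulkInsert(batch);   // one round trip for 1,000 rows, not 1,000 round trips
        batch.Clear();
    }
}
if (batch.Count > 0)
    BulkInsert(batch);       // flush the final partial batch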
This solution prevents the problem of too many threads. You have a fixed number of threads, all of which are running as fast as they possibly can. And memory use is constrained by limiting the number of things that can be in the queues.
The solution also scales rather well. If you have files on multiple drives, you can add a second file reading thread to read files from the other physical drive and place the lines in the same queue. BlockingCollection supports multiple producers and multiple consumers, so adding another producer is no trouble at all.
The same goes for consumers. If you find that the processing step is the bottleneck, you can add another processing thread. It, too, will read from the _lines queue and write to the _dataObjects queue.
However, having more threads than you have processor cores will likely make your program slower. If you have a four-core processor, creating 8 processing threads won't do you any good. It will make things slower because the operating system will be spending a lot of time on thread context switches rather than on doing useful work.
You'll have to do a little tuning to get the best performance. Queue sizes should be large enough to support continuous workflow (so no thread is starved of work, or spends too much time waiting for the output queue), but not so large that they overfill memory. Depending on the relative speed of the three stages, one of the queues might have to be larger than the other. If one of the three stages is a bottleneck, you can add another thread to help at that stage.
I created a simple example of this model using text files for input and output. It should be pretty easy to extend for your situation. See Simple Multithreading, and the follow up, Part 2.
I am processing my SSAS Cube programmatically. I process the dimensions in parallel (I manage the parallel calls to .Process() myself) and once they're all finished, I process the measure group partitions in parallel (again managing the parallelism myself).
As far as I can see, this is a direct replication of what I would otherwise do in SSMS (same process types, etc.). The only difference I can see is that I'm processing ALL of the dimensions in parallel and ALL of the measure group partitions in parallel thereafter. If you right-click and process several objects within SSMS, it appears to only process 2 in parallel at any one time (inferred from the text indicating that processing has not started in all processing windows other than 2). But if anything, I would expect my code to be faster, not slower, than SSMS.
I have wrapped the processing action with "starting" and "finishing" debug messages and everything is as expected. It is the work done by .Process() that seems to be much slower than SSMS.
On a Cube that normally takes just under 1 hour to process, it is taking 7.5 hours.
On a cube that normally takes just under 3 minutes to process, it is taking 6.5 minutes.
As far as I can tell, the processing of dimensions is about the same but the measure groups are significantly slower. However, the latter are much much larger of course so it might just be that the difference is not as obvious to me.
I'm at a loss for ideas and would appreciate any help! Am I missing a setting? Is managing the parallelism myself, and processing more than 2 objects in parallel, causing a problem?
If you can provide your code I'm happy to look but my guess is that you are calling dimension.Process() in parallel threads expecting it to process in parallel on the server. It won't. It will process in serial due to locking because you are executing separate processing batches and separate transactions.
Any reason not to process everything (rather than incrementally processing just recent partitions or something)? Let's start simple and see if this is all you need. Can you get the database object and just do a ProcessFull? That will properly process in parallel all dimensions and measure groups.
database.Process(ProcessType.ProcessFull);
If you do need incremental processing then review this link for using ExecuteCaptureLog(true,true) to run multiple ProcessUpdate commands in parallel and in a transaction:
https://jesseorosz.wordpress.com/2006/11/20/how-to-process-dimensions-in-parallel-using-amo/
I would recommend including the partitions you want to process in that transactional batch. It will know the right dependencies automatically. Also make sure to include a ProcessIndexes on the cube object in that batch so flexible aggs and indexes on old partitions get rebuilt after the dimension ProcessUpdate.
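A sketch of that batch with AMO (Microsoft.AnalysisServices), assuming server, db, and cube objects are already connected and located; treat this as an outline of the linked post rather than drop-in code:

server.CaptureXml = true;                      // record commands instead of executing them
foreach (Dimension dim in db.Dimensions)
    dim.Process(ProcessType.ProcessUpdate);
foreach (MeasureGroup mg in cube.MeasureGroups)
    foreach (Partition p in mg.Partitions)
        p.Process(ProcessType.ProcessData);
cube.Process(ProcessType.ProcessIndexes);      // rebuild indexes and flexible aggs
server.CaptureXml = false;
// One batch: transactional = true, parallel = true; the server orders the dependencies.
server.ExecuteCaptureLog(true, true);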
I have some 2TB read-only (no writing once created) files on a RAID 5 (4 x 7.2k @ 3TB) system.
Now I have some threads that want to read portions of that file.
Every thread has an array of chunks it needs.
Every chunk is addressed by file offset (position) and size (mostly about 300 bytes) to read from.
What is the fastest way to read this data?
I don't care about CPU cycles; (disk) latency is what counts.
So if possible I want to take advantage of the NCQ of the hard disks.
As the files are highly compressed and will be accessed randomly, and I know exactly the position, I have no other way to optimize it.
Should I pool the file reading into one thread?
Should I keep the file open?
Should every thread (maybe about 30) keep every file open simultaneously? And what about new threads that come in (from the web server)?
Will it help if I wait 100ms and sort my reads by file offset (lowest first)?
What is the best way to read the data? Do you have experiences, tips, hints?
The optimum number of parallel requests depends highly on factors outside your app (e.g. disk count = 4, NCQ depth = ?, driver queue depth = ?, ...), so you might want to use a system that can adapt or be adapted. My recommendation is:
Write all your read requests into a queue, together with some metadata that allows you to notify the requesting thread.
Have N threads dequeue from that queue, synchronously read the chunk, and notify the requesting thread.
Make N runtime-changeable.
Since CPU is not your concern, your worker threads can calculate a floating latency average (and/or maximum, depending on your needs).
Slide N up and down until you hit the sweet spot.
Why sync reads? They have lower latency than async reads.
Why waste latency on a queue? A good lock-free queue implementation starts at less than 10ns latency, much less than two thread switches.
Update: Some Q/A
Should the read threads keep the files open? Yes, definitely so.
Would you use a FileStream with FileOptions.RandomAccess? Yes.
You write "synchronously read the chunk". Does this mean every single read thread should start reading a chunk from disk as soon as it dequeues an order to read a chunk? Yes, that's what I meant. The queue depth of read requests is managed by the thread count.
Disks are "single threaded" because there is only one head. They won't go faster no matter how many threads you use... in fact, more threads will probably just slow things down. Just get yourself the list and arrange (sort) it in the app.
You can of course use many threads, which would probably make use of NCQ more efficiently, but arranging the reads in the app and using one thread should work better.
If the file is fragmented, use NCQ and a couple of threads, because then you can't know the exact position on disk, so only NCQ can optimize the reads. If it's contiguous, use sorting.
You may also try direct I/O to bypass OS caching and read the whole file sequentially... it can sometimes be faster, especially if you have no other load on this array.
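A sketch of the collect-and-sort idea (the 100ms batching the asker proposed), assuming a requests BlockingCollection<ReadRequest> and a ReadChunk helper; both names are stand-ins:

var window = new List<ReadRequest>();
var sw = Stopwatch.StartNew();
ReadRequest r;
while (sw.ElapsedMilliseconds < 100 && requests.TryTake(out r, 10))
    window.Add(r);                         // gather requests for ~100 ms

foreach (var req in window.OrderBy(x => x.Offset))
    ReadChunk(req);                        // one low-seek sweep across the platters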
Will ReadFileScatter do what you want?
I have a long-running process that reads large files and writes summary files. To speed things up, I'm processing multiple files simultaneously using regular old threads:
ThreadStart ts = new ThreadStart(Work);
Thread t = new Thread(ts);
t.Start();
What I've found is that even with separate threads reading separate files and no locking between them and using 4 threads on a 24-core box, I can't even get up to 10% on the CPU or 10% on disk I/O. If I use more threads in my app, it seems to run even more slowly.
I'd guess I'm doing something wrong, but where it gets curious is that if I start the whole exe a second and third time, then it actually processes files two and three times faster. My question is, why can't I get 12 threads in my one app to process data and tax the machine as well as 4 threads in 3 instances of my app?
I've profiled the app and the most time-intensive and frequently called functions are all string processing calls.
It's possible that your computing problem is not CPU bound but I/O bound. It doesn't help to state that your disk I/O is "only at 10%"; I'm not sure such a performance counter even exists.
The reason it gets slower while using more threads is that those threads are all trying to get at their respective files at the same time, while the disk subsystem is having a hard time trying to accommodate all of the different threads. You see, even with modern technology like SSDs, where the seek time is several orders of magnitude smaller than with traditional hard drives, there's still a penalty involved.
Rather, you should conclude that your problem is disk bound and a single thread will probably be the fastest way to solve your problem.
One could argue that you could use asynchronous techniques to process a bit that's been read, while on the background the next bit is being read in, but I think you'll see very little performance improvement there.
I had a similar problem not too long ago in a small tool where I wanted to calculate MD5 signatures of all the files on my hard drive, and I found that the CPU is way too fast compared to the storage system; I got similar results when trying to get more performance by using more threads.
Using the Task Parallel Library didn't alleviate this problem.
First of all, on a 24-core box, if you are using only 4 threads, the most CPU they could ever use is 16.7% (4/24), so at 10% you are really getting about 60% utilization of those 4 threads, which is fairly good.
It is hard to tell if your program is I/O bound at this point; my guess is that it is. You need to run a profiler on your project and see which sections of code the project spends most of its time in. If it is sitting in a read/write operation, it is I/O bound.
It is possible you have some form of inter-thread locking being used. That would cause the program to slow down as you add more threads, and yes, running a second process would sidestep that, but fixing your locking would too.
What it all boils down to is that without profiling information we cannot say whether using a second process will speed things up or slow them down. We need to know if the program is hanging on an I/O operation, a locking operation, or just taking a long time in a function that could be parallelized better.
I think you have found that the file cache is not ideal when one process writes data to many files concurrently. The file cache syncs to disk when the number of dirty page-cache pages exceeds a threshold, and it seems concurrent writers in one process hit that threshold faster than a single-threaded writer would. You can read about the file system cache here: File Cache Performance and Tuning.
Try using the Task library from .NET 4 (System.Threading.Tasks). This library has built-in optimizations for different numbers of processors.
I have no clue what your problem is, though, maybe because your code snippet is not really informative.
I need to download certain files using FTP. It is already implemented without using threads, and it takes too much time to download all the files.
So I need to use threads to speed up the process.
My code is like:
foreach (string str1 in files)
{
    download_FTP(str1);
}
I referred to this, but I don't want every file to be queued at once; say, for example, 5 files at a time.
If the process is too slow, it most likely means that the network/Internet connection is the bottleneck. In that case, downloading the files in parallel won't significantly increase the performance.
It might be another story, though, if you are downloading from different servers. We may then imagine that some of the servers are slower than others. In that case, parallel downloads would increase the overall performance, since the program would download files from other servers while being busy with slow downloads.
EDIT: OK, we have more info from you: Single server, many small files.
Downloading multiple files involves some overhead. You can decrease this overhead by somehow grouping the files (tar, zip, whatever) on the server side. Of course, this may not be possible. If your app were talking to a web server, I'd advise creating a zip file on the fly server-side according to the list of files transmitted in the request. But you are on an FTP server, so I'll assume you have nearly no flexibility server-side.
Downloading several files in parallel may well increase the throughput in your case. Be very careful though about restrictions set by the server, such as the maximum number of simultaneous connections. Also keep in mind that if you have many simultaneous users, you'll end up with a big number of connections on the server: users x threads. That may prove counter-productive depending on the scalability of the server.
A commonly accepted rule of good behaviour is to limit yourself to a maximum of 2 simultaneous connections per user. YMMV.
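For instance, a sketch that caps concurrency with a semaphore, reusing the asker's download_FTP and assuming a 2-connection limit:

var gate = new Semaphore(2, 2);            // at most 2 downloads in flight
foreach (string file in files)
{
    string f = file;                       // capture a copy for the closure
    new Thread(() =>
    {
        gate.WaitOne();                    // wait for a free connection slot
        try { download_FTP(f); }
        finally { gate.Release(); }
    }).Start();
}

This still creates one thread per file; the queue-based answer below avoids that.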
Okay, as you're not using .NET 4 that makes it slightly harder - the Task Parallel Library would make it really easy to create five threads reading from a producer/consumer queue. However, it still won't be too hard.
Create a Queue<string> with all the files you want to download
Create 5 threads, each of which has a reference to the queue
Make each thread loop, taking an item off the queue and downloading it, or finishing if the queue is empty
Note that as Queue<T> isn't thread-safe, you'll need to lock to make sure that only one thread tries to fetch an item from the queue at a time:
string fileToDownload = null;
lock(padlock)
{
if (queue.Count == 0)
{
return; // Done
}
fileToDownload = queue.Dequeue();
}
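Putting the pieces together, a sketch of the full worker, assuming a hypothetical DownloadFile helper and that queue and padlock are fields shared by the five threads:

private readonly object padlock = new object();
private Queue<string> queue;               // pre-filled with the files to download

private void DownloadWorker()
{
    while (true)
    {
        string fileToDownload;
        lock (padlock)
        {
            if (queue.Count == 0)
                return;                    // queue drained: this thread is done
            fileToDownload = queue.Dequeue();
        }
        DownloadFile(fileToDownload);      // your existing FTP download code
    }
}

// Kick off five workers and wait for all of them.
var threads = new List<Thread>();
for (int i = 0; i < 5; i++)
{
    var t = new Thread(DownloadWorker);
    threads.Add(t);
    t.Start();
}
threads.ForEach(t => t.Join());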
As noted elsewhere, threading may not speed things up at all - it depends where the bottleneck is. If the bottleneck is the user's network connection, you won't be able to get more data down the same size of pipe just by using multi-threading. On the other hand, if you have a lot of small files to download from different hosts, then it may be latency rather than bandwidth which is the problem, in which case threading will help.
Look up ParameterizedThreadStart:
List<Thread> threadsToUse = new List<Thread>();
foreach (string str1 in files)
{
    // download_FTP must take a single object parameter to match
    // the ParameterizedThreadStart signature.
    Thread aThread = new Thread(new ParameterizedThreadStart(download_FTP));
    threadsToUse.Add(aThread);
    aThread.Start(str1);
}
There is also Thread.Join, which lets you block until a given thread has finished.
There is also something else you might want to look up, which I'm still trying to fully grasp: asynchronous calls. With those you will know when the file has been downloaded; with a normal thread you will have to find another way to flag that it's finished.
This may or may not help your speed. If your line speed is low, then it won't help you much;
on the other hand, some servers cap each connection to a certain speed, in which case this will, in theory, set up multiple connections to the server and therefore give a slight increase in speed. How much of an increase, though, I cannot answer.
Hope this helps in some way.
I can add some experience to the comments already posted. In an app some years ago I had to generate a treeview of files on an FTP server. Listing files does not normally require actual downloading, but some of the files were zipped folders and I had to download these and unzip them (sometimes recursively) to display the files/folders inside. For a multithreaded solution, this required a 'FolderClass' for each folder that could keep state and so handle both unzipped and zipped folders. To start the operation off, one of these was set up with the root folder and submitted to a P-C queue and a pool of threads. As each folder was LISTed and iterated, more FolderClass instances were submitted to the queue for each subfolder. When a FolderClass instance reached the end of its LIST, it PostMessaged itself to the UI thread (it was not C#; there you would need BeginInvoke or the like), where its info was added to the listview.
This activity was characterised by a lot of latency-sensitive TCP connect/disconnect with occasional download/unzip.
A pool of, IIRC, 4-6 threads (as already suggested by other posters) provided the best performance on the single-core system I had at the time and, in this particular case, was much faster than a single-threaded solution. I can't remember the figures exactly, but no stopwatch was needed to detect the performance boost: something like 3-4 times faster. On a modern box with multiple cores, where LISTs and unzips could occur concurrently, I would expect even more improvement.
There were some problems. The visual ListView component could not keep up with the incoming messages (because of the multiple threads, data arrived for apparently 'random' positions in the treeview and so required continual tree navigation for display), and so the UI tended to freeze during the operation. Another problem was detecting when the operation had actually finished. These snags are probably not relevant to your download-many-small-files app.
Conclusion - I expect that downloading a lot of small files is going to be faster if multithreaded with multiple connections, if only from mitigating the connect/disconnect latency which can be larger than the actual data download time. In the extreme case of a satellite connection with high speed but very high latency, a large thread pool would provide a massive speedup.
Note the valid caveats from the other posters - if the server, (or its admin), disallows or gets annoyed at the multiple connections, you may get no boost, limited bandwidth or a nasty email from the admin!
Rgds,
Martin
We're developing an application which reads data from a number of external hardware devices continuously. The data rate is between 0.5MB - 10MB / sec, depending on the external hardware configuration.
The reading of the external devices is currently being done on a BackgroundWorker. Trying to write the acquired data to disk with this same BackgroundWorker does not appear to be a good solution, so what we want to do is, to queue this data to be written to a file, and have another thread dequeue the data and write to a file. Note that there will be a single producer and single consumer for the data.
We're thinking of using a synchronized queue for this purpose. But we thought this wheel must have been invented so many times already, so we should ask the SO community for some input.
Any suggestions or comments on things that we should watch out for would be appreciated.
I would do a combination of what mr 888 does.
Basically you have 2 background workers:
one that reads from the hardware device,
one that writes the data to disk.
Hardware background worker:
Adds chunks of data from the hardware to the Queue<>, in whatever format you have it in.
Write background worker:
Parses the data if needed and dumps it to disk.
One thing to consider here is whether getting the data from the hardware to disk as fast as possible is important.
If yes, then I would have the write background worker basically in a loop, with a 100ms or 10ms sleep in the while loop, checking whether the queue has data.
If no, then I would have it sleep a similar amount (making the assumption that the speed you get from your hardware changes periodically) and only write to disk when it has around 50-60MB of data. I would consider doing it this way because modern hard drives can write about 60MB per second (this is a desktop hard drive; your enterprise ones could be much quicker), and constantly writing data to them in small chunks is a waste of I/O bandwidth. A sketch of that batching loop follows.
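Here is a minimal sketch of the batch-then-flush writer, assuming a shared Queue<byte[]> guarded by a lock, an open FileStream named output, and a running flag set to false on shutdown; all three names are stand-ins:

const int FlushThreshold = 50 * 1024 * 1024;   // ~50 MB before touching the disk
var pending = new List<byte[]>();
int pendingBytes = 0;

while (running)
{
    byte[] chunk = null;
    lock (queue)
    {
        if (queue.Count > 0)
            chunk = queue.Dequeue();
    }

    if (chunk != null)
    {
        pending.Add(chunk);
        pendingBytes += chunk.Length;
    }
    else
    {
        Thread.Sleep(100);                     // the 100ms poll from above
    }

    if (pendingBytes >= FlushThreshold)
    {
        foreach (var c in pending)
            output.Write(c, 0, c.Length);      // one large sequential write
        output.Flush();
        pending.Clear();
        pendingBytes = 0;
    }
}
// On shutdown, drain the queue and flush any remainder the same way.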
I am pretty confident that your queue will be pretty much OK. But make sure that you use an efficient method of storing/retrieving the data, so as not to bog down your logging procedure with memory allocation/deallocation. I would go for a pre-allocated memory buffer, used as a circular queue.
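One way to realize that pre-allocation, sketched as a simple buffer free-list (a close cousin of the circular queue); the slot count and block size are assumptions to tune:

private readonly Queue<byte[]> free = new Queue<byte[]>();

// Allocate all buffers once, up front; the producer rents, the consumer returns.
public void Preallocate()
{
    for (int i = 0; i < 64; i++)
        free.Enqueue(new byte[64 * 1024]);     // 64 slots of 64 KB each
}

public byte[] Rent()
{
    lock (free)
        return free.Count > 0 ? free.Dequeue()
                              : new byte[64 * 1024]; // grow only if we run dry
}

public void Return(byte[] buf)
{
    lock (free)
        free.Enqueue(buf);                     // recycle instead of leaving it to the GC
}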
You might need queuing, e.g. code:
protected Queue<Byte[]> myQ;
or
protected Queue<Stream> myQ;
// when you have the content, enqueue it; Queue<T> is not
// thread-safe, so guard it with a lock:
lock (myQ)
{
    myQ.Enqueue(...);
}
and use another thread to pop the queue:
// another thread
protected void Logging()
{
    while (true)
    {
        while (myQ.Count > 0)
        {
            Byte[] content;
            lock (myQ)
            {
                content = myQ.Dequeue();
            }
            // save content
        }
        System.Threading.Thread.Sleep(1000);
    }
}
I had a similar situation. In my case I used an asynchronous lock-free queue with a LIFO of synchronization objects.
Basically, the threads that write to the queue set a sync object in the LIFO, while the other 'worker' threads reset sync objects in the LIFO.
We have a fixed number of sync objects, equal to the number of threads. The reason for using a LIFO is to keep the minimum number of threads running and to make better use of the cache system.
Have you tried MSMQ?