I've written a small, in house C# program which batch converts between various file formats, mostly relying on other API's. Currently, the UI spawns a BackgroundWorker (not a Thread) to handle the conversions, and then fills a queue with the requests which empties as the worker completes jobs. The queue object themselves are very small (3 Strings to tell the worker what to do) and don't contribute greatly to the program's memory footprint. I've been fairly draconian with my memory management (disposing of Images when finished with them, and manually garbage collecting at certain points.) Still, the program tends to use about 100 MB of memory at a time, and uses about 50% of total CPU time. It seems like if I naively implemented threading, it would quickly run out of system memory (unless the CLR does some sort of magic I don't know about.)
Is there a simple/efficient way to spawn threads to prevent the system from running out of memory besides catching OutOfMemory exceptions and rolling back the thread that died (seems very inefficient, but there's no way to preserve state without using prohibitive amounts of memory)?
If you use the ThreadPool.QueueUserWorkItem to spawn the conversions you will automatically get a limit on how many threads that are running. The ThreadPool manages this internally and will process queued calls as soon as a pool thread becomes available.
Put a limit on the queue size and make sender wait if it's full. You would have to find the right queue size limit empirically though.
Out of memory exceptions can be tricky and maybe caused by fragmented memory, not actually running out of memory. Therefore they can be tricky to track down.
Tess from Microsoft product support in Sweeden (http://blogs.msdn.com/tess/) has got a good number of posts on tracking down memory where memory is going to help with this process.
A good profile of your app could also help. JetBrains have a good one as is AQTime.
Using 100MB of memory is not a lot for an image processing application.
Also, in .net when you get an OutOfMemory exception the entire process dies, you can't recover from this.
If you need more memory (or, more correctly, more address space) than the system can give you can either page memory in and out as you use it or switch to 64bit.
You can limit the queue size to make sure the only "heavy" memory user is the worker thread, you can have more than one worker thread, that will increase your memory usage but will also empty the queue faster, just remember that in a CPU intensive operations like this having more threads than CPU cores is inefficient.
You talk about "naively implemented threading", multi-threading is full of pitfalls - a naive implementation will be inefficient and full of bugs.
Related
I would like to limit how much RAM a thread in C# can allocate, such that allocations beyond this limit will fail/crash or the thread will terminate.
So for example, suppose that I have 3 managed threads in C#, threads A, B, and C. I want to limit all three threads to not use more than 100 MB of RAM. If they do, then I want them to crash without impacting the other threads.
Now it seems to me that since C# is a virtual machine-based language, and uses a garbage collector, this should actually be theoretically possible without trying to use OS features that limit process RAM usage.
How can something like this be achieved? Any guidance would be greatly appreciated.
You can't limit the RAM used by a thread, or a group of threads, because the only RAM that is owned by a thread is its stack (around 1MB of memory). The memory that is allocated in the heap is shared by all threads of the program. It's not owned by any thread or groups of threads. There is no provision in the C# language or the .NET runtime for preventing a specific thread from interacting with an object allocated in the heap. If you want to limit the amount of RAM that is available for an operation, you must isolate this operation in a separate process, not a separate thread.
I was attempting to use worker threads to speed up a larger algorithm when I noticed that using independent priority queue's on more threads actually slowed performance down. So I wrote a small test case.
In which I query how many threads to start, set each thread to its own processor, and the push and pop a lot of stuff from my priority queues. Each thread owns it's own priority queue, and they're allocated separately so I don't suspect false sharing.
I put the test case here, because it's longer than a snippet.
(The processor affinity bit comes from NCrunch)
The priority queue is of my own creation because .NET didn't have a built in queue. It uses a Pairing Heap if that makes any difference.
At any rate if I run the program with one thread and one core, it gets about 100% usage.
Usage drops with two threads / two cores
And eventually pittles down to 30% usage with all 8 cores.
Which is a problem because the drop in performance nullifies any would be gain from multithreading. What's causing the drop in performance? Each queue is completely independent of the other thread's
Some problems like solving pi are more suited to parallelization and hyperthreading can actually give you a speed up. When you are dealing with a heavy memory problem like you are, Hyperthreading can't help and can actually hurt. Check out "pipelining" in CPU architecture.
There are not many practical problems for which you can get a 2x speedup by using 2-cpus. The more cpus, the more overhead. In your test case algorithm, I suspect cores are having to wait for the memory subsystem. If you tweak the memory requirements, you will see an increase in performance (and utilization) as you move the memory requirements closer to the CPU cache-size.
The OS is assigning the processing to whichever CPU it wishes to at a moment. Therefore, you are seeing every processor do some amount of work.
Additionally, when you say "drop in performance", have you checked how many contentions the system is creating? You probably are relieving yourself of contentions amongst the threads as well.
I have written a program in C# that does a lot of parallel work using different threads. When i reach approx 300 threads the GUI of the program starts to become slow and the execution of threads is also slowing down drastically. The threads are reading and writing data from a mySQL Database runnning on a different machine.
The funny thing is that if i split the work between two processes on the same machine everything runs perfect. Is there a thread limit per process in the .net framework or in windows? Or why am I getting this behaviour? Could it be a network related problem? I am running Windows 7 Ultimate and i have tried both VS2010 and VS 2012 with the same behaviour.
The way processor time is allocated is that the Operating System gives processor time to every process, then every process gives time to every thread.
So two processes will get twice the processor time, and that's why it works faster if you divide the program into two processes.
If you want to make the GUI run smoother, just set the priority higher for that thread.
This way the GUI thread will get more processor time then the other threads, but not so much that it will noticeably slow down the other threads.
300 threads is silly.
The number of threads should be in the range of your number of cores (2..8) and/or the max simultaneous connections (sometimes only 4 over TCP) your system supports.
Get beyond that and you're only wasting memory, at 1 MB per thread. In a 32bit system, 300 MB is already consuming a lot of the available mem space. And I assume each thread has some buffers attached.
If 2 separate processes perform better than1 then it probably isn't the context switching but either memory usage or a connection limit that holds you back.
Use ThreadPool. That should automatically allocate the optimal number of threads based on your system by throttling the number of threads in existence. You can also set the maximum number of threads allowable at any one time.
Also, if you're allocating thread to parallelize tasks from within a for-loop, foreach-loop, or linq statment you should look at the Parallel Class or PLINQ.
The accepted answer to this question will probably explain what is happening, but 300 threads seems like to many to be a good idea for any normal application.
At first if you have 300 threads for an application then probably you should rethink about your program design.
Setting up GUI threads priority may give you a better performance of GUI. But if you run so much thread the OS have to allocate space in program stack. And the stack is a continuous segment of the memory. So each time you create a new thread the allocated memory space for the stack may be incapable to hold the new thread. And then the OS must have to allocate a larger continuous space in the memory and copy all the data from the old stack to new stack. So obviously this may cause performance slow of your program.
the Windows Taskmanger is fine to check the CPU and memory useage of an application but in our programm we have different threads and we want to know how much of the total ammount is on each thread.
We would like to check this with an external programm and also from the application itself during runtime. It would be great if a thread could tell about his memory and cpu useage.
Here's is the example:
You have threadA and ThreadB.
ThreadA creats an object X.
ThreadB uses this object.
So what do you want to see in thread's information? Who created the object or who is using it?
The only thing you can see is how much CPU time is using thread AFAIK
And all the same the only program that I know that shows MAX info on process is Process Explorer. http://technet.microsoft.com/en-us/sysinternals/bb896653
You can use performance monitor to see how much memory is allocated to a process, but you cannot see the same for single threads inside it.
However, you could create custom performance counters to display any value you want to monitor from within your code.
SysInternals Process Explorer has this feature, check this Server fault thread.
There is an open source project on CodeProject, the screenshot looks promising: How to get CPU usage of processes and threads, but the demo project seems to be crashing on Win7 (probably missing some privileges).
[Edit] If you want to write it yourself, you can P/Invoke Thread32First and Thread32Next functions to enumerate threads within a single process, and then use QueryThreadCycleTime to query CPU time for each thread.
Objects are shared between threads, threads do not own objects.
Memory for an object is allocated on the heap, which lives in the realm of the application. Any thread can access any of this memory at any time during the lifetime of the application.
There is no way to determine which thread is or may be using any arbitrary blocks of memory.
Threads perform units of work. Unless you know which thread is going to be running which unit of work you will be able to get no reliable metrics out of CPU usage. If you do know which thread will be performing which tasks, then Process Explorer by SysInternals has this metric.
I am working on a C# application that works with an array. It walks through it (meaning that at one time only a narrow part of the array is used). I am considering adding threads in it to make it perform faster (it runs on a dualcore computer). The problem is that I do not know if it would actually help, because threads cost something and this cost could easily be more than the parallel gain... So how do I determine if threading would help?
Try writing some benchmarks that mimic, as closely as possible, the real-world conditions in which your software will actually be used.
Test and time the single-threaded version. Test and time the multi-threaded version. Compare the two sets of results.
If your application is CPU bound (i.e. it isn't spending time trying to read files or waiting for data from a device) and there is little to no sharing of live data (data being altered, if its read only its fine) between the threads then you can pretty much increase the speed by 50->75% by adding another thread (as long as it still remains CPU bound of course).
The main overhead in multithreading comes from 2 places.
Creation & initialization of the thread. Creating a thread requires quite a few resources to be allocated and involves swaps between kernel and user mode, this is expensive though a once off per thread so you can pretty much ignore it if the thread is running for any reasonable amount of time. The best way to mitigate this problem is to use a thread pool as it will keep the thread on hand and not need to be recreated.
Handling synchronization of data. If one thread is reading from data that another is writing, bad things will generally happen (worse if both are changing it). This requires you to lock your data before altering it so that no thread reads a half written value. These locks are generally quite slow as well. To mitigate this problem, you need to design your data layout so that the threads don't need to read or write to the same data as much as possible. If you do need a lot of these locks it can then become slower than the single thread option.
In short, if you are doing something that requires the CPU's to share a lot of data, then multi-threading it will be slower and if the program isn't CPU bound there will be little or no difference (could be a lot slower depending on what it is bound to, e.g. a cd/hard drive). If your program matches these conditions, then it will PROBABLY be worthwhile to add another thread (though the only way to be certain would be profiling).
One more little note, you should only create as many CPU bound threads as you have physical cores (threads that idle most of the time, such as a GUI message pump thread, can be ignored for this condition).
P.S. You can reduce the cost of locking data by using a methodology called "lock-free programming", though this something that should really only be attempted by people with a lot of experience with multi-threading and a clear understanding of their target architecture (including how the cache is treated and the memory bus).
I agree with Luke's answer. Benchmark it, it's the only way to be sure.
I can also give a prediction of the results - the fastest version will be when the number of threads matches the number of cores, EXCEPT if the array is very small and each thread would have to process just a few items, the setup/teardown times might get larger than the processing itself. How few - that depends on what you do. Again - benchmark.
I'd advise to find out a "minimum number of items for a thread to be useful". Then, when you are deciding how many threads to spawn (or take from a pool), check how many cores the computer has and how many items there are. Spawn as many threads as possible, but no more than the computer has cores, and not so many that each thread would have less than the minimum number of items to process.
For example if the minimum number of items is, say, 1000; and the computer has 4 cores; and your list contains 2500 items, you would spawn just 2 threads, because more threads would be inefficient (each would process less than 1000 items).
Making a step by step list for Luke's idea:
Make a single threaded test app
Download Sysinternals Process Monitor and run it
Run your test app and find it on the process list (remember to run it as a release build outside of Visual Studio)
Double click the process and select the Performance Graph tab
Observe the CPU time used by your process
If the CPU time is sittling flat 50% for more than a few seconds, you can probably speed your overall process up using threads (assuming the bunch of stuff Mr Peters refered to holds true)
(However, the best you can do on a duel core machine is to halve the time it takes to run. If your process only take 4 seconds, it might not be worth getting it to run in 2 seconds)
Using the task parallel library / Rx provides a friendlier interface than System.Threading.ThreadPool, which might make your world a bit easier.
You miss imho one item, which is that it is not always about execution time. There is:
The problem to koop a UI operational during an operation. Even if the UI is "dormant", a nonresponsive message pump makes a worse impression.
The possibility to use a thread pool to actually not ahve to start / stop threads all the time. I use thread pools very extensively, and various parts of the applications keep them busy.
Anyhow, ignoring my point 1 - where you may go multi threaded without speeding things up in order to keep your UI responsive - I would say it is always then faster when you can actually either split up work (so you can keep more than one core busy) or offload it for othe reasons.