I have a small list of rather large files that I want to process, which got me thinking...
In C#, I was thinking of using Parallel.ForEach from the TPL to take advantage of modern multi-core CPUs, but my question is more hypothetical in nature:
Does multi-threading in practice mean that it would take longer to load the files in parallel (using as many CPU cores as possible) than to load each file sequentially (but with probably less CPU utilization)?
Or, to put it another way:
What is the point of multi-threading? More tasks in parallel but at a slower rate, as opposed to focusing all computing resources on one task at a time?
In order to not increase latency, parallel computational programs typically only create one thread per core. Applications which aren't purely computational tend to add more threads so that the number of runnable threads is the number of cores (the others are in I/O wait, and not competing for CPU time).
Now, parallelism in disk-I/O-bound programs may well cause performance to decrease: if the disk has a non-negligible seek time, then much more time will be wasted performing seeks and less time actually reading. This is called "churning" or "thrashing". Elevator sorting helps somewhat; true random access (such as solid-state storage) helps more.
Parallelism does almost always increase the total raw work done, but this is only important if battery life is of foremost importance (and by the time you account for power used by other components, such as the screen backlight, completing quicker is often still more efficient overall).
You asked multiple questions, so I've broken up my response into multiple answers:
Multithreading may have no effect on loading speed, depending on what your bottleneck during loading is. If you're loading a lot of data off disk or a database, I/O may be your limiting factor. On the other hand if 'loading' involves doing a lot of CPU work with some data, you may get a speed up from using multithreading.
Generally speaking you can't focus "all computing resources on one task." Some multicore processors have the ability to overclock a single core in exchange for disabling other cores, but this speed boost is not equal to the potential performance benefit you would get from fully utilizing all of the cores using multithreading/multiprocessing. In other words it's asymmetrical -- if you have a 4-core 1 GHz CPU, it won't be able to overclock a single core all the way to 4 GHz in exchange for disabling the others. In fact, that's the reason the industry is going multicore in the first place -- at least for now we've hit limits on how fast we can make a single CPU run, so instead we've gone the route of adding more CPUs.
There are 2 reasons for multithreading. The first is that you want two tasks to run at the same time simply because it's desirable for both to be able to happen simultaneously -- e.g. you want your GUI to continue to respond to clicks or keyboard presses while it's doing other work (event loops are another way to accomplish this though). The second is to utilize multiple cores to get a performance boost.
For loading files from disk, this is likely to make things much slower. What happens is the operating system tries to lay out files on disk such that you should only need to do an expensive disk seek once for each file. If you have a lot of threads reading a lot of files, you're gonna have contention over which thread has access to the disk, and you'll have to seek back to the right place in the file every time the next thread gets a turn.
What you can do is use exactly two threads. Set one to load all of the files in the background, and let the other remain available for other tasks, like handling user input. In C# WinForms, you can do this easily with a BackgroundWorker component.
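To make the "one thread reads, the others stay free" idea concrete, here is a minimal sketch, assuming the processing step is CPU-heavy enough to be worth spreading over cores. FilePipeline, Run, and the process callback are made-up names for illustration, not anything from the TPL itself. A single reader task pulls the files off the disk one after another, and a few workers do the CPU work:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

public static class FilePipeline
{
    // Reads files one at a time on a single task (so the disk sees one
    // sequential stream) and hands the bytes to CPU-bound workers.
    // 'process' is a hypothetical callback standing in for the real work.
    public static void Run(IEnumerable<string> paths, Action<string, byte[]> process, int workerCount)
    {
        // Bounded, so the reader cannot race ahead and fill memory with large files.
        using (var queue = new BlockingCollection<KeyValuePair<string, byte[]>>(2))
        {
            Task reader = Task.Run(() =>
            {
                try
                {
                    foreach (string path in paths)
                        queue.Add(new KeyValuePair<string, byte[]>(path, File.ReadAllBytes(path)));
                }
                finally
                {
                    queue.CompleteAdding(); // lets the workers drain the queue and exit
                }
            });

            var workers = new Task[workerCount];
            for (int i = 0; i < workerCount; i++)
            {
                workers[i] = Task.Run(() =>
                {
                    foreach (var item in queue.GetConsumingEnumerable())
                        process(item.Key, item.Value);
                });
            }

            Task.WaitAll(workers);
            reader.Wait(); // surfaces any read error as an AggregateException
        }
    }
}
```

Something like FilePipeline.Run(paths, (path, bytes) => Summarize(bytes), Environment.ProcessorCount) would be a typical call, where Summarize is whatever your per-file work is. Error handling is deliberately minimal here, and ReadAllBytes assumes each file fits in memory; for really large files you would stream chunks instead.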
Multi-threading is useful for highly parallelizable tasks. CPU-intensive tasks are perfect. Your CPU has many cores, and many threads can use many cores. They'll use more CPU time, but in the end they'll use less "user" time. If your app is I/O bound, then multithreading isn't always the solution (but it COULD help).
It might be helpful to first understand the difference between Multithreading and Parallelism, as more often than not I see them being used rather interchangeably. Joseph Albahari has written a quite interesting guide about the subject: Threading in C# - Part 5 - Parallelism
As with all great programming endeavors, it depends. By and large, you'll be requesting files from one physical store, or one physical controller which will serialize the requests anyhow (or worse, cause a LOT of head back-and-forth on a classical hard drive) and slow down the already slow I/O.
OTOH, if the controllers and the media are separate, loading data from them on multiple cores should be faster than loading sequentially.
This is a VERY open question.
Basically, I have a computing application that launches test combinations for N Scenarios.
Each test is conducted in a single dedicated thread, and involves reading large binary data, processing it, and dropping results to DB.
If the number of threads is too large, the app goes rogue, eats up all available memory, and hangs.
What is the most efficient way to exploit all CPU+RAM capabilities (high-performance computing, i.e. 12 cores/16 GB RAM) without bringing the system to its knees (which happens if "too many" simultaneous threads are launched, "too many" being a relative notion of course)?
I should specify that I have a worker buffer queue with N workers; every time one finishes and dies, a new one is launched via a Queue. This works pretty well as of now. But I would like to avoid "manually" and "empirically" setting the number of simultaneous threads, and instead have an intelligent, scalable system that launches as many threads at a time as the system can properly handle and stops at a "reasonable" memory usage (the target server is dedicated to the app, so there is no problem regarding other applications except the system).
PS: I know that .NET 3.5 comes with thread pools and .NET 4 has interesting TPL capabilities, which I am still considering right now (I have never gone very deep into this so far).
PS 2: After reading this post I was a bit puzzled by the "don't do this" answers. I think such a request is fair for a memory-demanding computing program, though.
EDIT
After reading this post I will try to use WMI features
None of the built-in threading capabilities in .NET support adjusting to memory usage. You need to build this yourself.
You can either predict memory usage or react to low memory conditions. Alternatives:
1. Look at the amount of free memory on the system before launching a new task. If it is below 500 MB, wait until enough has been freed.
2. Launch tasks as they come and throttle as soon as some of them start to fail because of OOM. Restart them later. This alternative sucks big time because your process will do garbage collections like crazy to avoid the OOMs.
I recommend (1).
You can either look at free system memory or your own process's memory usage. In order to get the memory usage I recommend looking at private bytes using the Process class.
If you set aside 1GB of buffer on your 16GB system you run at 94% efficiency and are pretty safe.
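A rough sketch of the "check memory before launching" approach, assuming private bytes of the current process is the number you care about. MemoryThrottledRunner, RunAll, and the budget parameter are names invented for this example; the memory probe is passed in as a delegate so you can swap in a system-wide free-memory check if you prefer:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

public static class MemoryThrottledRunner
{
    // Default probe: private bytes of the current process.
    public static long CurrentPrivateBytes()
    {
        using (Process p = Process.GetCurrentProcess())
            return p.PrivateMemorySize64;
    }

    // Starts a job only while usedBytes() is below budgetBytes. When over
    // budget, waits for a running job to finish and checks again. At least
    // one job is always allowed to run, so the loop cannot stall forever.
    public static void RunAll(IEnumerable<Action> jobs, long budgetBytes, Func<long> usedBytes)
    {
        var all = new List<Task>();     // kept so failures surface at the end
        var running = new List<Task>(); // jobs that have not finished yet

        foreach (Action job in jobs)
        {
            running.RemoveAll(t => t.IsCompleted);
            while (running.Count > 0 && usedBytes() >= budgetBytes)
            {
                Task.WaitAny(running.ToArray(), 1000); // re-check at least once a second
                running.RemoveAll(t => t.IsCompleted);
            }

            Task started = Task.Run(job);
            all.Add(started);
            running.Add(started);
        }

        Task.WaitAll(all.ToArray()); // rethrows job exceptions as an AggregateException
    }
}
```

You would call it as MemoryThrottledRunner.RunAll(jobs, 15L * 1024 * 1024 * 1024, MemoryThrottledRunner.CurrentPrivateBytes) for the 15 GB budget mentioned above. Keep in mind that memory usage lags behind task launches: a job that has just started has not allocated its data yet, so the probe can under-report for a moment and you may want some extra headroom.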
I have a long-running process that reads large files and writes summary files. To speed things up, I'm processing multiple files simultaneously using regular old threads:
ThreadStart ts = new ThreadStart(Work);
Thread t = new Thread(ts);
t.Start();
What I've found is that even with separate threads reading separate files and no locking between them and using 4 threads on a 24-core box, I can't even get up to 10% on the CPU or 10% on disk I/O. If I use more threads in my app, it seems to run even more slowly.
I'd guess I'm doing something wrong, but where it gets curious is that if I start the whole exe a second and third time, then it actually processes files two and three times faster. My question is, why can't I get 12 threads in my one app to process data and tax the machine as well as 4 threads in 3 instances of my app?
I've profiled the app and the most time-intensive and frequently called functions are all string processing calls.
It's possible that your computing problem is not CPU bound, but I/O bound. It doesn't help to state that your disk I/O is "only at 10%". I'm not sure such a performance counter even exists.
The reason it gets slower with more threads is that those threads are all trying to get to their respective files at the same time, while the disk subsystem is having a hard time trying to accommodate all of the different threads. You see, even with a modern technology like SSDs, where the seek time is several orders of magnitude smaller than with traditional hard drives, there's still a penalty involved.
Rather, you should conclude that your problem is disk bound and a single thread will probably be the fastest way to solve your problem.
One could argue that you could use asynchronous techniques to process a bit that's been read while the next bit is being read in the background, but I think you'll see very little performance improvement there.
I had a similar problem not too long ago in a small tool where I wanted to calculate MD5 signatures of all the files on my hard drive. I found that the CPU is way too fast compared to the storage system, and I got similar results when trying to get more performance by using more threads.
Using the Task Parallel Library didn't alleviate this problem.
First of all, on a 24-core box, if you are using only 4 threads, the most CPU it could ever use is 16.7% (4/24), so at 10% you are really getting 60% utilization, which is fairly good.
It is hard to tell if your program is I/O bound at this point; my guess is that it is. You need to run a profiler on your project and see which sections of code your project is spending most of its time in. If it is sitting on a read/write operation, it is I/O bound.
It is possible you have some form of inter-thread locking being used. That would cause the program to slow down as you add more threads, and yes, running a second process would fix that, but fixing your locking would too.
What it all boils down to is that without profiling information we cannot say if using a second process will speed things up or make things slower. We need to know if the program is hanging on an I/O operation, a locking operation, or just taking a long time in a function that can be parallelized better.
I think you have found that the file cache is not ideal when one process writes to many files concurrently. The file cache has to sync to disk when the number of dirty cache pages exceeds a threshold. It seems concurrent writers in one process hit that threshold faster than a single-threaded writer. You can read about the file system cache here: File Cache Performance and Tuning
Try using the Task library from .NET 4 (System.Threading.Tasks). This library has built-in optimizations for different numbers of processors.
I have no clue what your problem is, maybe because your code snippet is not really informative.
I am interning for a company this summer, and I got passed down this program which is a total piece. It does very computationally intensive operations throughout most of its duration. It takes about 5 minutes to complete a run on a small job, and the guy I work with said that the larger jobs have taken up to 4 days to run. My job is to find a way to make it go faster. My idea was that I could split the input in half and pass the halves to two new threads or processes. I was wondering if I could get some feedback on how effective that might be and whether threads or processes are the way to go.
Any inputs would be welcomed.
Hunter
I'd take a strong look at the TPL that was introduced in .NET 4 :) PLINQ might be especially useful for easy speedups.
Generally speaking, splitting into different processes (exe files) is inadvisable for performance since starting processes is expensive. It does have other merits, such as isolation (if part of a program crashes), but I don't think they are applicable to your problem.
If the jobs are splittable, then going multithreaded/multiprocessed will bring better speed. That is assuming, of course, that the computer they run on actually has multiple cores/cpus.
Threads or processes doesn't really matter regarding speed (if the threads don't share data). The only reason to use processes that I know of is when a job is likely to crash an entire process, which is not likely in .NET.
Use threads if there's lots of memory sharing in your code, but if you think you'd like to scale the program to run across multiple computers (when required cores > 16), then develop it using processes with a client/server model.
The best way to optimise code, always, is to profile it to find out where the logjams are, IMO.
Sometimes you can find non obvious huge speed increases with little effort.
EQATEC and SlimTune are two free C# profilers which may be worth trying out.
(Of course the other comments about which parallelization architecture to use are spot on; it's just that I prefer analysis first.)
Have a look at the Task Parallel Library -- this sounds like a prime candidate problem for using it.
As for the threads vs processes dilemma: threads are fine unless there is a specific reason to use processes (e.g. if you were using buggy code that you couldn't fix, and you did not want a bad crash in that code to bring down your whole process).
Well if the problem has a parallel solution then this is the right way to (ideally) significantly (but not always) increase performance.
However, you don't control making additional processes except for running an app that launches multiple mini apps ... which is not going to help you with this problem.
You are going to need to utilize multiple threads. There is a pretty cool library added to .NET for parallel programming you should take a look at. Its namespace is System.Threading.Tasks, which contains the Parallel class.
Edit: I would definitely suggest, though, that you think about whether or not a linear solution may fit better. Sometimes parallel solutions take even longer. It all depends on the problem in question.
If you need to communicate/pass data, go with threads (and if you can go .Net 4, use the Task Parallel Library as others have suggested). If you don't need to pass info that much, I suggest processes (scales a bit better on multiple cores, you get the ability to do multiple computers in a client/server setup [server passes info to clients and gets a response, but other than that not much info passing], etc.).
Personally, I would invest my effort into profiling the application first. You can gain a much better awareness of where the problem spots are before attempting a fix. You can parallelize this problem all day long, but it will only give you a linear improvement in speed (assuming that it can be parallelized at all). But, if you can figure out how to transform the solution into something that only takes O(n) operations instead of O(n^2), for example, then you have hit the jackpot. I guess what I am saying is that you should not necessarily focus on parallelization.
You might find spots that are looping through collections to find specific items. Instead you can transform these loops into hash table lookups. You might find spots that do frequent sorting. Instead you could convert those frequent sorting operations into a single binary search tree (SortedDictionary) which maintains a sorted collection efficiently through the many add/remove operations. And maybe you will find spots that repeatedly make the same calculations. You can cache the results of already made calculations and look them up later if necessary.
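To illustrate two of those transformations, here is a small sketch. LookupSketch and its method names are invented for the example; the collection types are the standard ones from System.Collections.Generic. The first pair shows a repeated linear search replaced by a hash lookup, and Memoize shows caching the results of a pure calculation:

```csharp
using System;
using System.Collections.Generic;

public static class LookupSketch
{
    // O(n * m): List.Contains scans the list again for every query.
    public static int CountHitsLinear(List<int> items, List<int> queries)
    {
        int hits = 0;
        foreach (int q in queries)
            if (items.Contains(q)) hits++;
        return hits;
    }

    // O(n + m): build the HashSet once; each lookup is then O(1) on average.
    public static int CountHitsHashed(List<int> items, List<int> queries)
    {
        var set = new HashSet<int>(items);
        int hits = 0;
        foreach (int q in queries)
            if (set.Contains(q)) hits++;
        return hits;
    }

    // Wraps a pure function so that each distinct input is computed only once.
    // Not thread-safe as written: the Dictionary is unsynchronized.
    public static Func<TIn, TOut> Memoize<TIn, TOut>(Func<TIn, TOut> f)
    {
        var cache = new Dictionary<TIn, TOut>();
        return x =>
        {
            TOut value;
            if (!cache.TryGetValue(x, out value))
            {
                value = f(x);
                cache[x] = value;
            }
            return value;
        };
    }
}
```

Both counting methods return the same answer; only the cost differs, and the gap grows with the size of the inputs. Whether either change matters for your program is exactly what the profiler should tell you first.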
Basically, I'm wondering if threading is useful or necessary, or possibly more specifically the uses and situations in which you would use it. I don't know much about threading, and have never used it (I primarily use C#) and have wondered if there are any gains to performance or stability if you use them. If anyone would be so kind to explain, I would be grateful.
In the world of desktop applications (my domain), threading is a vital construct in creating responsive user interfaces. Whenever a time-or-computationally-intensive operation needs to run, it's almost essential to run that operation in a separate thread. Otherwise, the user interface locks up and, in some cases, Windows will decide that the whole application has become unresponsive.
Threading is also a vital tool in animation, audio and communications. Basically, any situation in which you find yourself needing to do several things at once lends itself to the use of threads.
There are definitely no gains to stability :). I would suggest you get a basic understanding of threading, but don't jump to use it in any real production application until you have a real need. You use C#, so I'm not sure if you are building websites or WinForms.
Usually the first threading use case for WinForms is when a user clicks a button and you want to run some expensive operation (database or web service call) but you don't want the screen to freeze up.
A good way to deal with that situation is to look at the BackgroundWorker class in C#, as this will give you a first taste of this space, and then you can go from there.
There was a time when our applications would speed up when we deployed them on a new CPU. That speedup was to a large extent because CPU clock speeds were increasing by large factors.
But several years ago, CPU manufacturers stopped increasing CPU clocks because of physical limits (e.g. heat dissipation). And instead they started adding additional cores to CPUs.
Now, if your application runs on only one thread, it cannot take advantage of the complete CPU (e.g. of 4 cores it uses only 1).
So today, to fully utilize the CPU, we must make the effort to divide the task across multiple threads.
For ASP.NET this is already done for us by ASP.NET architecture and IIS.
Look here: The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software
Here is a simple example of how threading can improve performance. You have n numbers that all need to be added together. In a single-threaded application, it will take n time units to add all of the numbers together for the final sum. However, if you broke your numbers into 2 groups, you could have the same operation running side by side, each with a group of n/2 numbers. Each would take n/2 time units to find its respective sum, and then an additional unit to find the full sum. By creating two threads, you have effectively cut the compute time in half.
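The two-group sum described above could be sketched like this in C#. ParallelSum and its method names are made up for the example, and it assumes a machine with at least two cores and an array large enough that the cost of starting a task is negligible:

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelSum
{
    // Sums data[start..end), i.e. 'end' is exclusive.
    public static long SumRange(int[] data, int start, int end)
    {
        long sum = 0;
        for (int i = start; i < end; i++)
            sum += data[i];
        return sum;
    }

    public static long SumInTwoHalves(int[] data)
    {
        int mid = data.Length / 2;

        // First half runs on a thread-pool thread...
        Task<long> left = Task.Run(() => SumRange(data, 0, mid));

        // ...while the calling thread sums the second half.
        long right = SumRange(data, mid, data.Length);

        // The "additional unit": combine the two partial sums.
        return left.Result + right;
    }
}
```

For small arrays the single-threaded loop will usually win, because scheduling the task costs more than the additions it saves.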
Technically, on a single-core processor there is no true parallel execution, just the illusion that multiple tasks are happening in parallel, since each task gets a small slice of time.
However, that being said, threading is very useful if you have to do some work that takes a long time but you want your application to be responsive (i.e. be able to do other things) while you wait for that task to finish. A good example is GUI applications.
On multi-core / multi-processor systems, you can have one process doing many things at once so the performance gain there is obvious :)
How often do you find yourself actually using spinlocks in your code? How common is it to come across a situation where using a busy loop actually outperforms the usage of locks?
Personally, when I write some sort of code that requires thread safety, I tend to benchmark it with different synchronization primitives, and as far as it goes, it seems like using locks gives better performance than using spinlocks. No matter for how little time I actually hold the lock, the amount of contention I receive when using spinlocks is far greater than the amount I get from using locks (of course, I run my tests on a multiprocessor machine).
I realize that it's more likely to come across a spinlock in "low-level" code, but I'm interested to know whether you find it useful in even a more high-level kind of programming?
It depends on what you're doing. In general application code, you'll want to avoid spinlocks.
In low-level stuff where you'll only hold the lock for a couple of instructions, and latency is important, a spinlock may be a better solution than a lock. But those cases are rare, especially in the kind of applications where C# is typically used.
In C#, "Spin locks" have been, in my experience, almost always worse than taking a lock - it's a rare occurrence where spin locks will outperform a lock.
However, that's not always the case. .NET 4 is adding a System.Threading.SpinLock structure. This provides benefits in situations where a lock is held for a very short time, and being grabbed repeatedly. From the MSDN docs on Data Structures for Parallel Programming:
In scenarios where the wait for the lock is expected to be short, SpinLock offers better performance than other forms of locking.
Spin locks can outperform other locking mechanisms in cases where you're doing something like locking through a tree - if you're only having locks on each node for a very, very short period of time, they can out perform a traditional lock. I ran into this in a rendering engine with a multithreaded scene update, at one point - spin locks profiled out to outperform locking with Monitor.Enter.
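For reference, this is roughly what using System.Threading.SpinLock looks like. It is a minimal sketch: SpinLockedCounter is a made-up class, and a plain counter is only a stand-in for a very short critical section such as the per-node updates mentioned above:

```csharp
using System.Threading;

public sealed class SpinLockedCounter
{
    // SpinLock is a struct, so it must live in a non-readonly field and
    // must never be copied. 'false' turns off thread-owner tracking,
    // which is a debugging aid with some overhead.
    private SpinLock _lock = new SpinLock(false);
    private long _value;

    public void Increment()
    {
        bool taken = false;
        try
        {
            _lock.Enter(ref taken); // spins until the lock is acquired
            _value++;               // keep the protected region tiny
        }
        finally
        {
            if (taken) _lock.Exit();
        }
    }

    public long Value
    {
        get { return Interlocked.Read(ref _value); }
    }
}
```

Whether this beats lock (Monitor.Enter) depends on hold time and contention, so it is worth benchmarking both against your actual workload rather than assuming.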
For my realtime work, particularly with device drivers, I've used them a fair bit. It turns out that (when last I timed this) waiting for a sync object like a semaphore tied to a hardware interrupt chews up at least 20 microseconds, no matter how long it actually takes for the interrupt to occur. A single check of a memory-mapped hardware register, followed by a check of RDTSC (to allow for a time-out so you don't lock up the machine), is in the high nanosecond range (basically down in the noise). For hardware-level handshaking that shouldn't take much time at all, it is really tough to beat a spinlock.
My 2c: If your updates satisfy some access criteria then they are good spinlock candidates:
fast: i.e. you will have time to acquire the spinlock, perform the updates and release the spinlock in a single thread quantum, so that you don't get pre-empted while holding the spinlock.
localized: all the data you update is preferably in one single page that is already loaded. You do not want a TLB miss while you are holding the spinlock, and you definitely don't want a page fault that has to read from swap!
atomic: you do not need any other lock to perform the operation, i.e. never wait for locks under a spinlock.
For anything that has any potential to yield, you should use a notified lock structure (events, mutex, semaphores etc).
One use case for spin locks is if you expect very low contention but are going to have a lot of them. If you don't need support for recursive locking, a spinlock can be implemented in a single byte, and if contention is very low then the CPU cycle waste is negligible.
For a practical use case, I often have arrays of thousands of elements, where updates to different elements of the array can safely happen in parallel. The odds of two threads trying to update the same element at the same time are very small (low contention) but I need one lock for every element (I'm going to have a lot of them). In these cases, I usually allocate an array of ubytes of the same size as the array I'm updating in parallel and implement spinlocks inline as (in the D programming language):
while(!atomicCasUbyte(spinLocks[i], 0, 1)) {}
myArray[i] = newVal;
atomicSetUbyte(spinLocks[i], 0);
On the other hand, if I had to use regular locks, I would have to allocate an array of pointers to Objects, and then allocate a Mutex object for each element of this array. In scenarios such as the one described above, this is just plain wasteful.
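For C# readers, a comparable per-element scheme might look like the sketch below. This is an assumed translation, not the D code above: it uses one int flag per element (4 bytes rather than 1) because Interlocked.CompareExchange on int is available everywhere, and it adds a SpinWait back-off instead of a bare busy loop. StripedArray and its members are invented names:

```csharp
using System;
using System.Threading;

public sealed class StripedArray
{
    private readonly double[] _values;
    private readonly int[] _locks; // one flag per element: 0 = free, 1 = held

    public StripedArray(int length)
    {
        _values = new double[length];
        _locks = new int[length];
    }

    private void Acquire(int i)
    {
        var spinner = new SpinWait();
        // Atomically flip 0 -> 1; a non-zero result means someone else holds it.
        while (Interlocked.CompareExchange(ref _locks[i], 1, 0) != 0)
            spinner.SpinOnce(); // backs off and eventually yields the thread
    }

    private void Release(int i)
    {
        Volatile.Write(ref _locks[i], 0);
    }

    // Applies 'update' to element i while holding that element's lock.
    public void Update(int i, Func<double, double> update)
    {
        Acquire(i);
        try { _values[i] = update(_values[i]); }
        finally { Release(i); }
    }

    public double Read(int i)
    {
        Acquire(i);
        try { return _values[i]; }
        finally { Release(i); }
    }
}
```

The lock is not recursive, so calling Update or Read on the same index from inside the update delegate would spin forever.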
If you have performance critical code and you have determined that it needs to be faster than it currently is and you have determined that the critical factor is the lock speed, then it'd be a good idea to try a spinlock. In other cases, why bother? Normal locks are easier to use correctly.
Please note the following points:
Most mutex implementations spin for a little while before the thread is actually descheduled. Because of this it is hard to compare these mutexes with pure spinlocks.
Several threads spinning "as fast as possible" on the same spinlock will consume all the bandwidth and drastically decrease your program's efficiency. You need to add a tiny "sleeping" time by adding a no-op to your spinning loop.
You hardly ever need to use spinlocks in application code; if anything, you should avoid them.
I can't think of any reason to use a spinlock in C# code running on a normal OS. Busy locks are mostly a waste at the application level: the spinning can cause you to use the entire CPU timeslice, whereas a lock will immediately cause a context switch if needed.
High-performance code where the number of threads equals the number of processors/cores might benefit in some cases, but if you need performance optimization at that level you're likely making a next-gen 3D game, working on an embedded OS with poor synchronization primitives, creating an OS/driver, or in any case not using C#.
I used spin locks for the stop-the-world phase of the garbage collector in my HLVM project because they are easy and that is a toy VM. However, spin locks can be counter-productive in that context:
One of the perf bugs in the Glasgow Haskell Compiler's garbage collector is so annoying that it has a name, the "last core slowdown". This is a direct consequence of their inappropriate use of spinlocks in their GC and is exacerbated on Linux due to its scheduler but, in fact, the effect can be observed whenever other programs are competing for CPU time.
The effect is clear on the second graph here and can be seen affecting more than just the last core here, where the Haskell program sees performance degradation beyond only 5 cores.
Always keep these points in your mind while using spinlocks:
Fast user mode execution.
Synchronizes threads within a single process, or multiple processes if in shared memory.
Does not return until the object is owned.
Does not support recursion.
Consumes 100% of CPU while "waiting".
I have personally seen so many deadlocks just because someone thought it would be a good idea to use a spinlock.
Be very very careful while using spinlocks
(I can't emphasize this enough).