C# parallelize simulation

I have a simulation engine that I would like to parallelize first and later develop as a web service in C#. This is an intensive simulation that requires a lot of CPU and RAM, and I would like to execute each run on a separate thread. To give you a better idea, the simulation can perform 100 runs, and for each run I collect some results. It would be straightforward to collect the results from each run and then collate them into one big file. So if I have a multi-core machine with 4 cores, for example, the idea is to execute 4 runs at a time, one per core, and then another 4, etc. I have read a few things about Parallel Extensions in the newer version of .NET. Could I achieve the same things in 3.5 or would it be better to move to 4.0? Also, is there anything to watch out for if I make this a web service? Any further ideas or suggestions are more than welcome.

You would be better off moving to 4.0 and using the TPL. That way you could create a Task<> to run each simulation and have the TPL scheduler schedule them appropriately as resources become available. As the runs finish you could put the results into one of the concurrent collections (ConcurrentBag<T>, for example) and, once everything had finished, run a collation on them (you could even have another Task collating while the others were running if this turned out to be important to you).
In 3.5 much of the scheduling work would be left to you, and the APIs aren't as clean for creating tasks. You'd also not have any of the concurrent collections, which might make result collation a lot simpler (never underestimate the complexity of writing a concurrent collection that's both correct and performant).
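A minimal sketch of the approach described above, assuming .NET 4.0 or later. RunSimulation is a hypothetical stand-in for the real engine, and the choice of ConcurrentBag with KeyValuePair entries is just one way to hold the per-run results:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class SimulationRunner
{
    // Hypothetical placeholder for one simulation run; the real engine goes here.
    public static double RunSimulation(int runId)
    {
        var rng = new Random(runId);   // one Random per run: Random is not thread-safe
        double sum = 0;
        for (int i = 0; i < 100000; i++) sum += rng.NextDouble();
        return sum / 100000;
    }

    // Starts one task per run and collects (runId, result) pairs as the runs finish.
    public static ConcurrentBag<KeyValuePair<int, double>> RunAll(int runCount)
    {
        var results = new ConcurrentBag<KeyValuePair<int, double>>();
        Task[] tasks = Enumerable.Range(0, runCount)
            .Select(runId => Task.Factory.StartNew(() =>
            {
                results.Add(new KeyValuePair<int, double>(runId, RunSimulation(runId)));
            }))
            .ToArray();
        Task.WaitAll(tasks);           // block until every run has finished
        return results;
    }

    public static void Main()
    {
        // Collation step: order by run id and write everything out in one go.
        foreach (var pair in RunAll(100).OrderBy(p => p.Key))
            Console.WriteLine("run {0}: {1:F4}", pair.Key, pair.Value);
    }
}
```

The default scheduler decides how many runs execute at once. If RAM rather than CPU turns out to be the limit, Parallel.For with ParallelOptions.MaxDegreeOfParallelism is one way to cap the number of concurrent runs.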
If you make this a web service then you have to understand how the service will be used and how that will affect it. Essentially, you can improve individual request latency, but this may come at the cost of a degradation in overall throughput. See the following link for a discussion of this.
http://blogs.msdn.com/b/pfxteam/archive/2010/02/08/9960003.aspx

Related

Azure Web App. Free is faster than Basic and Standard?

I have a C# MVC application with a WCF service running on Azure. At first it was of course hosted on the Free tier, but as I had that one running smoothly I wanted to try and see how it ran on either Basic or Standard, which as far as I know should be dedicated servers.
To my surprise the code ran significantly slower once it was changed from Free to either Standard or Basic. I chose the smallest instance, but still expected them to perform better than the Free option.
From my performance logging I can see that the code that runs especially slowly is something that is started asynchronously with Task.Run. Initially it used an old-school Thread.Start(), but I wondered whether this might spawn it on some lower-priority thread and therefore changed it to Task.Run. This changed nothing, so perhaps it has nothing to do with it, but it might, so now you know.
The code that runs really slowly basically works on an XML document, through XDocument, XElement etc. It loops through it, has some LINQ etc., but nothing too fancy. Still, it is 5-10 times slower on Basic and Standard than on the Free version. For the exact same request the Free version takes around 1000ms, whereas Basic and Standard take 8000-10000ms.
In each test I have tried 5-10 times but without any decrease in response-times. I thought about whether I need to wait some hours before the Basic/Standard is fully functional or something like that, but each time I switch back, the Free version just outperforms it from the get-go.
Any suggestions? Is the Free version for some strange reason more powerful than Basic or Standard or do I need to configure something differently once I get up and running on Basic or Standard?
The notable difference between the Free and Basic/Standard tiers is that Free uses an undisclosed number of shared cores, whereas Basic/Standard has a defined number of CPU cores (1-4 based on how much you pay). Related to this is the fact that Free is a shared instance while Basic/Standard is a private instance.
My best guess based on this is that, since the Free servers you would be on house multiple different users and applications, they probably have pretty beefy specs. Their CPUs are probably 8-core Xeons and there might even be multiple CPUs. Most likely, Azure isn't enforcing any caps but rather relying on quotas (60 CPU minutes / day for the Free tier) and overall demand on the server to restrict CPU use. In other words, if your site is the only one that happens to be doing anything at the moment (unlikely of course, but for the sake of example), you could potentially be utilizing all 8+ cores on the box, whereas when you move over to Basic/Standard you are hard-limited to 1-4. Processing XML is actually very CPU heavy, so this seems to line up with my assumptions.
More than likely, this is a fluke. Perhaps your residency is currently on a relatively newly provisioned server that hasn't been filled up with tenants yet. Maybe you just happen to be sharing with tenants that aren't doing much. Who knows? But if the server is ever actually under real load, I'd imagine you'd see a much worse response time on the Free tier than even on Basic/Standard.

Force simultaneous threads/tasks for C# load testing app?

Question:
Is there a way to force the Task Parallel Library to run multiple tasks simultaneously? Even if it means making the whole process run slower with all the added context switching on each core?
Background:
I'm fairly new to multithreading, so I could use some assistance. My initial research hasn't turned up much, but I also doubt I know what exactly to search for. Perhaps someone more experienced with multithreading can help me better understand TPL and/or find a better solution.
Our company is planning on deploying a piece of software to all users' machines that will connect to a central server a few times a day, and synchronize some files and MS Access data back to the user's machine. We would like to load-test this concept first and see how the Access DB holds up to lots of simultaneous connections.
I've been tasked with writing a .NET application that behaves like the client app (connecting & syncing with a network location), but does this on multiple threads simultaneously.
I've been getting familiar with the Task Parallel Library (TPL), as this seems like the best (newest) way to handle multithreading and get return values back from each thread easily. However, as I understand it, TPL decides how to run each "task" for the fastest execution possible, splitting the work among the available cores. So let's say I want to run 30 sync jobs on a 2-core machine... the TPL would run 15 on each core, sequentially. This would mean my load test would only be hitting the Access DB with at most 2 connections at the same time. I want to hit the database with lots of simultaneous connections.
You can force the TPL to do this by specifying TaskCreationOptions.LongRunning. According to Reflector (not according to the docs, though) this always creates a new thread. I consider relying on this safe for production use.
Normal tasks will not do, because they don't guarantee that all of them execute at the same time. Raising the thread pool minimum with ThreadPool.SetMinThreads is a horrible solution (for production) because you are changing a process-global setting to solve a local problem. And still, you are not guaranteed success.
Of course, you can also start threads. Tasks are more convenient though because of error handling. Nothing wrong with using threads for this use case.
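Here is a rough sketch of the LongRunning approach, assuming .NET 4.0 or later. The LoadTest class, the job delegate and the Barrier used as a common start line are illustrative additions, not part of the original answer. Note that the dedicated thread per LongRunning task is an implementation detail rather than a documented guarantee:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class LoadTest
{
    // Runs clientCount jobs at the same time and returns how many distinct
    // threads the jobs ran on.
    public static int RunSimultaneously(int clientCount, Action<int> job)
    {
        var threadIds = new int[clientCount];
        using (var startLine = new Barrier(clientCount))
        {
            Task[] tasks = Enumerable.Range(0, clientCount)
                .Select(id => Task.Factory.StartNew(() =>
                {
                    threadIds[id] = Thread.CurrentThread.ManagedThreadId;
                    startLine.SignalAndWait();   // hold every job until all of them are ready
                    job(id);                     // e.g. open a connection and sync
                }, TaskCreationOptions.LongRunning))
                .ToArray();
            Task.WaitAll(tasks);                 // rethrows job failures as AggregateException
        }
        return threadIds.Distinct().Count();
    }

    public static void Main()
    {
        // Thread.Sleep stands in for one client sync against the database.
        int threads = RunSimultaneously(30, id => Thread.Sleep(100));
        Console.WriteLine("jobs ran on {0} distinct threads", threads);
    }
}
```

Because every job waits at the barrier until all the others have arrived, the jobs hit the database at roughly the same moment instead of trickling in as threads become available.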
Based on your comment, I think you should reconsider using Access in the first place. It doesn't scale well and has problems once the database grows to a certain size. Especially if this is simply served off some file share on your network.
You can try and simulate load from your single machine but I don't think that would be very representative of what you are trying to accomplish.
Have you considered using SQL Server Express? It's basically a de-tuned version of the full-blown SQL Server which might suit your needs better.

Any Good Patterns For Distributed Parallelism?

I've got a for loop I want to parallelize with something like Parallel.ForEach() or PLINQ.
The key here is that the C++ library I'm calling to do the computation is decidedly not thread safe; therefore, any plan to parallelize this needs to do so across multiple processes.
I was thinking about using WCF to create a "distributor" process to which the "client" and multiple "calculators" could both connect and add/remove items to/from a queue. The "calculator" would then send the results directly back to the client, which could update the GUI as it receives them. This architecture would allow me to bring as many "calculators" online as I have processors and, as I see it, even bring them up across multiple computers, creating a potential farm of processing power which all the clients could share.
I'm just wondering if anyone has had any experience doing this and if there are existing application blocks or frameworks that I can use to build this. PLINQ does it within the process. Is there something like a DPLINQ (distributed)?
Also if that doesn't exist, does anybody want to give an opinion on my proposed architecture? Any obvious pitfalls? Does anyone think it will work!?!?!?
Sounds like you could be looking for Dryad. It's a Microsoft research project right now, but they do have an "academic release" available. My understanding is that they are also in the process of better productizing it (probably some kind of integration with Azure) for RTM sometime near the end of 2011. Mary Jo Foley covers more about this here.
A long-time standard for controlling/dispatching distributed work is MPI. I've only ever used it from C++, but implementations for many languages exist. A quick Google search suggests that MPI.NET could be a good implementation for .NET!

Is Threading Necessary/Useful?

Basically, I'm wondering if threading is useful or necessary, or possibly more specifically the uses and situations in which you would use it. I don't know much about threading, and have never used it (I primarily use C#) and have wondered if there are any gains to performance or stability if you use them. If anyone would be so kind to explain, I would be grateful.
In the world of desktop applications (my domain), threading is a vital construct in creating responsive user interfaces. Whenever a time-consuming or computationally intensive operation needs to run, it's almost essential to run that operation in a separate thread. Otherwise, the user interface locks up and, in some cases, Windows will decide that the whole application has become unresponsive.
Threading is also a vital tool in animation, audio and communications. Basically, any situation in which you find yourself needing to do several things at once lends itself to the use of threads.
There are definitely no gains to stability :). I would suggest you get a basic understanding of threading but don't jump to use it in any real production application until you have a real need. You are using C#, so I'm not sure if you are building websites or WinForms applications.
Usually the first threading use case for WinForms is when a user clicks a button and you want to run some expensive operation (a database or web service call) but you don't want the screen to freeze up.
A good way to deal with that situation is to look at the BackgroundWorker class in C#, as this will give you a first taste of this space, and then you can go from there.
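To give a first taste of BackgroundWorker, here is a minimal console sketch. The helper name RunInBackground and the sample workload are made up for illustration. In a real WinForms app you would update your controls inside RunWorkerCompleted and never block the UI thread the way this console demo blocks to wait for the result:

```csharp
using System;
using System.ComponentModel;
using System.Threading;

public static class BackgroundWorkerDemo
{
    // Runs `work` on a background thread via BackgroundWorker and returns its result.
    // The blocking wait is only here because a console app has no message loop;
    // do NOT wait like this on a UI thread.
    public static int RunInBackground(Func<int> work)
    {
        int result = 0;
        using (var done = new ManualResetEvent(false))
        using (var worker = new BackgroundWorker())
        {
            // DoWork runs on a thread pool thread, off the calling thread.
            worker.DoWork += (sender, e) => e.Result = work();

            // RunWorkerCompleted fires when DoWork returns or throws.
            worker.RunWorkerCompleted += (sender, e) =>
            {
                if (e.Error == null) result = (int)e.Result;
                done.Set();
            };

            worker.RunWorkerAsync();
            done.WaitOne();
        }
        return result;
    }

    public static void Main()
    {
        // Thread.Sleep stands in for an expensive database or web service call.
        int answer = RunInBackground(() => { Thread.Sleep(500); return 42; });
        Console.WriteLine("background result: {0}", answer);
    }
}
```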
There was a time when our applications would speed up simply because we deployed them on a new CPU. That speed-up was to a large extent because CPU clock speeds kept increasing by large factors.
But several years ago, CPU manufacturers stopped increasing CPU clocks because of physical limits (e.g. heat dissipation). Instead they started adding additional cores to CPUs.
Now, if your application runs on only one thread, it cannot take advantage of the whole CPU (e.g. of 4 cores it uses only 1).
So today, to fully utilize the CPU, we must make the effort to divide the work across multiple threads.
For ASP.NET this is already done for us by the ASP.NET architecture and IIS.
See Herb Sutter's article "The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software".
Here is a simple example of how threading can improve performance. You have n numbers that all need to be added together. In a single-threaded application, it will take n time units to add all of the numbers together for the final sum. However, if you broke your numbers into 2 groups, you could have the same operation running side by side, each with a group of n/2 numbers. Each would take n/2 time units to find its respective sum, and then an additional unit to find the full sum. By creating two threads, you have effectively cut the compute time roughly in half (assuming at least two cores are available).
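The two-group sum can be sketched like this. The class and method names are made up for illustration, and for small arrays the cost of starting threads will outweigh any gain:

```csharp
using System;
using System.Linq;
using System.Threading;

public static class TwoThreadSum
{
    // Adds up numbers[start .. end) on the calling thread.
    private static long SumRange(int[] numbers, int start, int end)
    {
        long sum = 0;
        for (int i = start; i < end; i++) sum += numbers[i];
        return sum;
    }

    // Splits the array into two halves, sums each half on its own thread,
    // then combines the two partial sums.
    public static long Sum(int[] numbers)
    {
        int mid = numbers.Length / 2;
        long left = 0, right = 0;

        var first = new Thread(() => left = SumRange(numbers, 0, mid));
        var second = new Thread(() => right = SumRange(numbers, mid, numbers.Length));
        first.Start();
        second.Start();
        first.Join();    // wait for both halves before combining
        second.Join();

        return left + right;   // the "additional unit" of work
    }

    public static void Main()
    {
        int[] numbers = Enumerable.Range(1, 1000000).ToArray();
        Console.WriteLine(Sum(numbers));   // prints 500000500000
    }
}
```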
Technically, on a single-core processor threads never actually run in parallel; there is just the illusion that multiple tasks are happening at once, since each task gets a small slice of time.
However, that being said, threading is very useful if you have to do some work that takes a long time but you want your application to be responsive (i.e. be able to do other things) while you wait for that task to finish. A good example is GUI applications.
On multi-core / multi-processor systems, you can have one process doing many things at once so the performance gain there is obvious :)

Multi-threading access to MapPoint?

Good afternoon,
As I said earlier in another post, I have to calculate some 8,000,000 shortest-time/path distances between some points on the map, the coordinates of which are known. The problem is that, while straight-line distances were easy (and quick) to calculate, someone told me that a single-threaded application can have problems calculating this number of distances using MapPoint. The trouble is that I know nothing about multi-threading... I am currently working on an i7-720QM, so I would like to use all 4 cores to make these calculations... Is there any easy way of doing this in C# or C++?
Thank you very much.
If you are totally new to multithreading, my advice is to start with the BackgroundWorker component and gradually switch to more granular threading concepts.
And if you are using .NET 4.0, the Task Parallel Library gives you an easy way to start.
See the links below:
TPL
BackgroundWorker
That might have been me who said it would take a long time. MapPoint's COM API is single threaded. The way to get it to compute multiple routes in parallel is to start multiple MapPoint instances, each on its own thread.
So for your quad core, you will start 2-3 threads. Each thread starts its own MapPoint, and then uses it for routing. You will NOT have one MapPoint per core: as well as the OS overhead and your own I/O overhead, if you watch a single MapPoint compute a route, you will find that later versions are partially internally multi-threaded and can take about 1.5 cores if they are available.
There are also a lot of gotchas to watch out for. MapPoint's own garbage collection is not optimized for batch route calculation. The easiest workaround for this is to simply restart each MapPoint application at periodic intervals (at least once a day but probably more frequently).
Also, some operations (File Open seems to be the main one) cannot be called by multiple MapPoints at once. Probably because they are trying to open the same file, but I have not investigated further. You will need to implement your own locking mechanism to avoid this.
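The shape of that setup (one engine per thread, with a lock around the open step) might look roughly like the sketch below. IRouteEngine and FakeEngine are hypothetical stand-ins invented for illustration; they are NOT the MapPoint API, and the real COM calls would go behind that interface:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical abstraction over one routing engine instance (not the MapPoint API).
public interface IRouteEngine : IDisposable
{
    void Open();                          // stands in for the File Open step
    double RouteTime(int from, int to);   // stands in for one route calculation
}

// Fake engine so the sketch runs without MapPoint installed.
public sealed class FakeEngine : IRouteEngine
{
    public void Open() { }
    public double RouteTime(int from, int to) { return Math.Abs(from - to); }
    public void Dispose() { }
}

public static class RouteBatch
{
    private static readonly object OpenLock = new object();

    public static ConcurrentDictionary<Tuple<int, int>, double> Run(
        Tuple<int, int>[] pairs, int threadCount, Func<IRouteEngine> createEngine)
    {
        var queue = new ConcurrentQueue<Tuple<int, int>>(pairs);
        var results = new ConcurrentDictionary<Tuple<int, int>, double>();
        var threads = new Thread[threadCount];

        for (int t = 0; t < threadCount; t++)
        {
            threads[t] = new Thread(() =>
            {
                using (IRouteEngine engine = createEngine())   // one engine per thread
                {
                    lock (OpenLock) { engine.Open(); }          // only one Open at a time
                    Tuple<int, int> pair;
                    while (queue.TryDequeue(out pair))          // pull work until none is left
                        results[pair] = engine.RouteTime(pair.Item1, pair.Item2);
                }
            });
            threads[t].Start();
        }
        foreach (Thread thread in threads) thread.Join();
        return results;
    }

    public static void Main()
    {
        var pairs = new[] { Tuple.Create(1, 5), Tuple.Create(2, 9), Tuple.Create(3, 4) };
        var results = Run(pairs, 3, () => new FakeEngine());
        foreach (var entry in results)
            Console.WriteLine("{0} -> {1}: {2}", entry.Key.Item1, entry.Key.Item2, entry.Value);
    }
}
```

Pulling pairs from a shared queue keeps all threads busy even when some routes take much longer than others. Restarting an engine periodically, as suggested above, could be added by recreating it after a fixed number of dequeued pairs.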
Saurabh's advice for .NET 4 sounds good: I have yet to use .NET 4's multi-threading in anger - my MapPoint/.NET threading experience is with .NET 2.
I don't know what your app is, but did you know that I sell a product that uses multi-processor MapPoint for batch route distance/time calculation... :-)
