We developed Windows services that use threads to consume database records (mostly targeting .NET 2.0). The code block below shows the approach.
for (int i = 0; i < ThreadCount; i++)
{
    ParameterizedThreadStart pts = new ParameterizedThreadStart(MyCode.DoWork);
    Thread t = new Thread(pts);
    t.Start(someObject);
}
ThreadCount is read from app.config. The MyCode.DoWork(object someObject) method selects some data from SQL Server and performs some operations on it. We also call stored procedures, and the queries use hints such as WITH (ROWLOCK).
while (someObject.Running)
{
    // select some data and process it
}
The main question is how to improve this Windows service. Some articles say that manually creating threads increases CPU cost, and so on. So how can I improve my app's performance? Would the Task Parallel Library bring any advantage, i.e. creating a Task instead of creating a Thread? Does a Task manage its threads based on the available CPU count? Should I convert the code to something like this:
for (int i = 0; i < ThreadCount; i++)
{
    Task t = new Task(() => MyCode.DoWork(someObject));
    t.Start();
}
To improve the performance of your application, you need to find out where the performance is lacking and then why it's lacking. Unfortunately we can't do that for you, because it needs access to the running C# and SQL code.
I suggest doing some performance profiling, either with a profiler tool (I use Redgate's tool) or by adding profiling code to your application. Once you can see the bottlenecks, you can make a theory about what's causing them.
I would start with the database first - look at the cached execution plans to see if there are any clues. Try running the stored procedures separately from the service.
To make things simpler, and assuming the Windows service is the only client accessing the database:
1. Try to access the database from one thread. This reduces lock contention on database tables, assuming there is contention in accessing the data.
2. Put the retrieved data into a queue and have TPL tasks process it, to utilize all CPU cores, assuming your performance bottleneck is the CPU (a sketch of this pattern follows below).
3. And then buy as many CPU cores as you can afford.
This is an intuitive pattern and there are many assumptions in it. You'll need to profile and analyze your program to know whether this is suitable.
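A minimal sketch of the single-reader, multi-worker pattern described above, using BlockingCollection<T> from .NET 4. Record, FetchAll and Process are placeholder names for your own row type, data access and processing code.
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

class Pipeline
{
    class Record { /* your row data */ }
    static IEnumerable<Record> FetchAll() { yield break; }  // placeholder for your SQL reads
    static void Process(Record r) { }                       // placeholder for your CPU-bound work

    static void Run()
    {
        var queue = new BlockingCollection<Record>(boundedCapacity: 1000);

        // Single reader: the only code touching the database,
        // which avoids lock contention on the tables.
        Task reader = Task.Factory.StartNew(() =>
        {
            foreach (Record record in FetchAll())
                queue.Add(record);      // blocks if the workers fall behind
            queue.CompleteAdding();     // tell the workers no more data is coming
        });

        // One CPU-bound worker per core.
        var workers = new Task[Environment.ProcessorCount];
        for (int i = 0; i < workers.Length; i++)
        {
            workers[i] = Task.Factory.StartNew(() =>
            {
                foreach (Record record in queue.GetConsumingEnumerable())
                    Process(record);
            });
        }

        Task.WaitAll(workers);
    }
}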
I have a Windows Forms application.
(C# with Visual Studio 2010, Framework: 4.0)
Database: .db (file database), accessed through a SQLite connection object.
Thread th = new Thread(() => database(node, xNodetcname, xNodesqlquery1, xNodeHint, e, inXml));
th.Name = thread;
th.Start();
The code above creates a thread per query, and the database() function runs in parallel on each thread.
Each thread runs one SQL query that fetches data from the database.
When I don't use multithreading the performance is better,
but when I use multithreading the performance is worse.
Example:
Without multithreading, 3 queries take 1.5 minutes to process.
With multithreading, 3 queries take 1.9 minutes to process.
My aim is to reduce the processing time of the queries.
Then generally stay away from threads.
Some basic education: in most cases database performance is limited by IO, not CPU. IO can partially be mitigated by using a lot of memory as buffers, hence large database servers have tons of memory.
You are running a small, lightweight database. It is likely not running on database-server-class hardware or an SSD, so it has an IO problem; performance will be limited by IO.
Multiple threads now make the IO side (the hard disc) even less efficient, especially because a WinForms app does not normally run on a high-end IO subsystem.
Ergo: if you want a faster query, then:
Optimize the query. Missing index?
Get proper hardware and/or upgrade to a heavier setup. An SSD is a great help here.
Do not use multithreading. Try to solve it at the SQL level, but accept that this may not be possible. There is a reason companies still use real database servers to handle large amounts of data. And SQLite may not be a good option; there is no way to say whether it is or not, since you leave that side out of your information entirely.
There are a few things you have to take into consideration here:
Is your db connection in multi-threading mode (as described in this document)?
Is the database engine suitable for multithreading? (Hint: SQLite is not; see @TomTom's answer.)
Are you using a thread pool (you are not), or are you initializing a new thread every time (which is rather slow)? A sketch of the thread-pool approach follows below.
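As a sketch of the thread-pool approach on .NET 4: queue the query work as tasks instead of constructing a Thread per query, so thread start-up cost is paid once by the pool. The argument names mirror the asker's variables and are placeholders.
using System.Threading.Tasks;

// Instead of:  new Thread(() => database(node, ...)).Start();
// let the thread pool reuse its worker threads:
Task query1 = Task.Factory.StartNew(() => database(node, xNodetcname, xNodesqlquery1, xNodeHint, e, inXml));
Task query2 = Task.Factory.StartNew(() => database(node, xNodetcname, xNodesqlquery2, xNodeHint, e, inXml));  // hypothetical second query

Task.WaitAll(query1, query2);  // block until both queries have finished
Note that with SQLite this only removes the per-thread start-up overhead; the IO contention described in the other answer still applies.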
Question:
Is there a way to force the Task Parallel Library to run multiple tasks simultaneously? Even if it means making the whole process run slower with all the added context switching on each core?
Background:
I'm fairly new to multithreading, so I could use some assistance. My initial research hasn't turned up much, but I also doubt I know what exactly to search for. Perhaps someone more experienced with multithreading can help me better understand TPL and/or find a better solution.
Our company is planning on deploying a piece of software to all users' machines that will connect to a central server a few times a day, and synchronize some files and MS Access data back to the user's machine. We would like to load-test this concept first and see how the Access DB holds up to lots of simultaneous connections.
I've been tasked with writing a .NET application that behaves like the client app (connecting & syncing with a network location), but does this on multiple threads simultaneously.
I've been getting familiar with the Task Parallel Library (TPL), as this seems like the best (newest) way to handle multithreading and get return values back from each thread easily. However, as I understand it, the TPL decides how to run each "task" for the fastest execution possible, splitting the work among the available cores. So let's say I want to run 30 sync jobs on a 2-core machine... the TPL would run 15 on each core, sequentially. This would mean my load test would only be hitting the Access DB with at most 2 connections at the same time. I want to hit the database with lots of simultaneous connections.
You can force the TPL to do this by specifying TaskCreationOptions.LongRunning. According to Reflector (not according to the docs, though) this always creates a new thread. I consider relying on this safe for production use; a sketch follows below.
Normal tasks will not do, because they don't guarantee simultaneous execution. Raising the thread pool's minimum thread count (ThreadPool.SetMinThreads) is a horrible solution (for production) because you are changing a process-global setting to solve a local problem. And even then, you are not guaranteed success.
Of course, you can also start threads. Tasks are more convenient though because of error handling. Nothing wrong with using threads for this use case.
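A minimal sketch of the LongRunning approach on .NET 4; SyncWithServer is a hypothetical placeholder for the asker's sync job.
using System.Threading.Tasks;

const int SimultaneousJobs = 30;
var jobs = new Task[SimultaneousJobs];

for (int i = 0; i < SimultaneousJobs; i++)
{
    // LongRunning hints the scheduler to dedicate a thread to each task,
    // so all 30 connections are really open at the same time.
    jobs[i] = Task.Factory.StartNew(
        () => SyncWithServer(),                  // hypothetical: your client sync logic
        TaskCreationOptions.LongRunning);
}

Task.WaitAll(jobs);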
Based on your comment, I think you should reconsider using Access in the first place. It doesn't scale well and has problems once the database grows to a certain size. Especially if this is simply served off some file share on your network.
You can try and simulate load from your single machine but I don't think that would be very representative of what you are trying to accomplish.
Have you considered using SQL Server Express? It's basically a de-tuned version of the full-blown SQL Server which might suit your needs better.
I have a C# service application which interacts with a database. It was recently migrated from .NET 2.0 to .NET 4.0 so there are plenty of new tools we could use.
I'm looking for pointers to programming approaches or tools/libraries to handle defining tasks, configuring which tasks they depend on, queueing, prioritizing, cancelling, etc.
There are various types of services:
Data (for retrieving and updating)
Calculation (populate some table with the results of a calculation on the data)
Reporting
These services often depend on one another and are triggered on demand, i.e., a Reporting task will probably have code within it such as
if (IsSomeDependentCalculationRequired())
PerformDependentCalculation(); // which may trigger further calculations
GenerateRequestedReport();
Also, any Data modification is likely to set the Required flag on some of the Calculation or Reporting services, (so the report could be out of date before it's finished generating). The tasks vary in length from a few seconds to a couple of minutes and are performed within transactions.
This has worked OK up until now, but it is not scaling well. There are fundamental design problems and I am looking to rewrite this part of the code. For instance, if two users request the same report at similar times, the dependent tasks will be executed twice. Also, there's currently no way to cancel a task in progress, and it's hard to maintain the dependent tasks, etc.
I'm NOT looking for suggestions on how to implement a fix. Rather I'm looking for pointers to what tools/libraries I would be using for this sort of requirement if I were starting in .NET 4 from scratch. Would this be a good candidate for Windows Workflow? Is this what Futures are for? Are there any other libraries I should look at or books or blog posts I should read?
Edit: What about Rx (Reactive Extensions)?
I don't think your requirements fit into any of the built-in stuff. Your requirements are too specific for that.
I'd recommend that you build a task queueing infrastructure around a SQL database. Your tasks are pretty long-running (seconds) so you don't need particularly high throughput in the task scheduler. This means you won't encounter performance hurdles. It will actually be a pretty manageable task from a programming perspective.
You should probably build a Windows service or some other process that continuously polls the database for new tasks or requests (a minimal polling loop is sketched below). This service can then enforce arbitrary rules on the requested tasks. For example, it can detect that a reporting task is already running and not schedule a new computation.
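A minimal sketch of such a polling loop, under the assumption that pending requests live in a database table; TaskRequest, FetchPendingRequests, IsAlreadyRunning and Execute are hypothetical placeholders for your own schema and rules.
using System;
using System.Threading;

// Runs on a dedicated thread inside the Windows service.
static void PollLoop(CancellationToken cancel)
{
    while (!cancel.IsCancellationRequested)
    {
        // Hypothetical: read pending rows from a task-request table.
        foreach (TaskRequest request in FetchPendingRequests())
        {
            // Arbitrary rules go here, e.g. skip a report whose
            // computation is already running.
            if (!IsAlreadyRunning(request))
                Execute(request);
        }
        Thread.Sleep(TimeSpan.FromSeconds(5));  // polling interval, tune to taste
    }
}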
My main point is that your requirements are so specific that you need to use C# code to encode them. You cannot make an existing tool fit your needs; you need the Turing completeness of a programming language to do this yourself.
Edit: You should probably separate a task-request from a task-execution. This allows multiple parties to request a refresh of some report while only one actual computation is running (an in-memory sketch of this follows below). Once this single computation is completed, all task-requests are marked as completed. When a request is cancelled, the execution does not need to be cancelled; only when the last request is cancelled is the task-execution cancelled as well.
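As a sketch of that request/execution separation, assuming .NET 4 and keeping everything in memory; Report, GenerateReport and the string key are hypothetical. Concurrent requests for the same report share a single running computation:
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class ReportExecutions
{
    // Lazy<> guarantees only one generation task is started per key,
    // even if two requests race on GetOrAdd.
    readonly ConcurrentDictionary<string, Lazy<Task<Report>>> running =
        new ConcurrentDictionary<string, Lazy<Task<Report>>>();

    public Task<Report> Request(string reportKey)
    {
        return running.GetOrAdd(reportKey, key =>
            new Lazy<Task<Report>>(() =>
                Task.Factory.StartNew(() => GenerateReport(key)))).Value;  // hypothetical generator
    }
}
A durable version would keep the requests in a database table instead of in memory, so they survive a service restart.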
Edit 2: I don't think workflows are the solution. Workflows usually operate separately from each other. But you don't want that. You want to have rules which span multiple tasks/workflows. You would be working against the system with a workflow based model.
Edit 3: A few words about the TPL (Task Parallel Library). You mentioned it ("Futures"). If you want some inspiration on how tasks can work together, how dependencies can be created and how tasks can be composed, look at the Task Parallel Library (in particular the Task and TaskFactory classes). You will find some nice design patterns there because it is very well designed. Here is how you model a sequence of tasks: you call Task.ContinueWith, which registers a continuation function as a new task. And here is how you model dependencies: TaskFactory.ContinueWhenAll(Task[], ...) starts a task that only runs when all its input tasks are completed. Both are sketched below.
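A minimal sketch of those two composition patterns on .NET 4; Calculate, BuildReport, LoadData and Summarize (and their result types) are hypothetical stand-ins for the data, calculation and reporting services.
using System.Threading.Tasks;

// Sequence: the report continues after the calculation finishes.
Task<CalcResult> calc = Task.Factory.StartNew(() => Calculate());
Task<Report> report = calc.ContinueWith(t => BuildReport(t.Result));

// Dependency: a task that only runs once all of its inputs have completed.
Task<Data>[] inputs =
{
    Task.Factory.StartNew(() => LoadData("a")),
    Task.Factory.StartNew(() => LoadData("b")),
};
Task<Summary> summary = Task.Factory.ContinueWhenAll(
    inputs, completed => Summarize(completed));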
BUT: the TPL itself is probably not well suited for you because its tasks cannot be saved to disk. When you reboot your server or deploy new code, all existing tasks are cancelled and the process is aborted. This is likely to be unacceptable. Please just use the TPL as inspiration: learn from it what a "task/future" is and how tasks can be composed, then implement your own form of tasks.
Does this help?
I would try to use the state machine package Stateless to model the workflow (a small sketch follows below). Using a package will provide a consistent way to advance the state of the workflow across the various services. Each of your services would hold an internal state machine implementation and expose methods for advancing it. Stateless will be responsible for triggering actions based on the state of the workflow, and it forces you to explicitly set up the various states the workflow can be in. This will be particularly useful for maintenance, and it will probably help you understand the domain better.
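A minimal sketch of what a report workflow could look like with Stateless; the states, triggers and the RunReport action are hypothetical examples, not taken from the question.
using Stateless;  // the Stateless NuGet package

enum WorkflowState { Idle, Calculating, Reporting, Done }
enum WorkflowTrigger { RequestReport, CalculationFinished, ReportFinished, Cancel }

var workflow = new StateMachine<WorkflowState, WorkflowTrigger>(WorkflowState.Idle);

workflow.Configure(WorkflowState.Idle)
    .Permit(WorkflowTrigger.RequestReport, WorkflowState.Calculating);

workflow.Configure(WorkflowState.Calculating)
    .Permit(WorkflowTrigger.CalculationFinished, WorkflowState.Reporting)
    .Permit(WorkflowTrigger.Cancel, WorkflowState.Idle);

workflow.Configure(WorkflowState.Reporting)
    .OnEntry(() => RunReport())  // hypothetical: kick off the reporting service
    .Permit(WorkflowTrigger.ReportFinished, WorkflowState.Done);

workflow.Fire(WorkflowTrigger.RequestReport);  // advance the workflow explicitly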
If you want to solve this fundamental problem properly and in a scalable way, you should probably look at the SOA architectural style.
Your services will receive commands and generate events that you can handle in order to react to things happening in your system.
And yes, there are tools for it. For example, NServiceBus is a wonderful tool for building SOA systems.
You could use a SQL Server Agent job to run SQL queries at a timed interval. Beyond that, it looks like you have to write the application yourself: a long-running program that checks the time and does something. I don't think there are clear-cut tools out there to do what you are trying to do. Write a C# application or WCF service; the data automation can be done in SQL itself.
If I understand you right, you want to cache the generated reports and not do the work again. As other commenters have pointed out, this can be solved elegantly with multiple producer/consumer queues and some caches.
First you enqueue your report request. Based on the report generation parameters you can check the cache first, and if a previously generated report is already available, simply return that one. If the report becomes obsolete due to changes in the database, you need to take care that the cache is invalidated in a reliable manner.
Now, if the report has not been generated yet, you need to schedule it for generation. The report scheduler needs to check whether the same report is already being generated. If so, register an event to be notified when it completes, and return the report once it is finished. Make sure that you do not access the data via the caching layer, since that could produce races (the report is generated, the data changes, and the finished report would immediately be discarded by the cache, leaving nothing for you to return).
Or, if you do want to prevent returning outdated reports, you can let the caching layer become your main data provider, which keeps producing reports until one is generated in time that is not outdated. But be aware that if your database changes constantly, you might enter an endless loop here, constantly generating invalid reports, if the report generation time is longer than the average time between two changes to your db.
As you can see, you have plenty of options here without even talking about .NET, the TPL, or SQL Server. First you need to set your goals for how fast, scalable and reliable your system should be; then you need to choose the appropriate architecture and design, as described above, for your particular problem domain. I cannot do it for you because I do not have your full domain knowledge of what is acceptable and what is not.
The tricky part is the handover between the different queues with the proper reliability and correctness guarantees. Depending on your specific report generation needs, you can put this logic into the cloud, or use a single thread by putting all work into the proper queues and working on them concurrently, or one by one, or something in between.
The TPL and SQL Server can certainly help there, but they are only tools. If used wrongly, due to insufficient experience with one or the other, it might turn out that a different approach (such as using only in-memory queues and persisting the reports in the file system) is better suited to your problem.
From my current understanding I would not misuse SQL Server as a cache, but if you want a database I would use something like RavenDB or RaportDB, which look stable and much more lightweight compared to a full-blown SQL Server.
But if you already have a SQL server running then go ahead and use it.
I am not sure if I understood you correctly, but you might want to have a look at the JAMS Scheduler: http://www.jamsscheduler.com/. It's not free, but it is a very good system for scheduling dependent tasks and reporting. I have used it with success at my previous company. It's written in .NET and there is a .NET API for it, so you can write your own apps communicating with JAMS. They also have very good support and are eager to implement new features.
I have a Windows Service which executes large number of tasks in parallel using the Task Parallel Library (TPL). This is about to be extended to handle tasks which interact with an SQL Server on an external server.
TPL is supposed to be good at measuring load and assigning the right number of parallel threads to the tasks. Is there a way to make it aware of load to the external SQL Server instance? The actual code to run for each task on the local server is quite small, but the calls to the database can be quite heavy.
Won't I end up with my service bogging down the database with requests, because the TPL sees that the local server has loads of free resources? Or is there a known way to handle this?
There's nothing native to the TPL that will help you with this. The TPL is about managing/maximizing the CPU load of your local application. It has no idea about SQL load, let alone load on another machine.
That said, if you wanted to get crazy, there is an extensibility point called the TaskScheduler. You could theoretically implement a custom TaskScheduler that can watch the load on the SQL server and only schedule tasks to execute if that load is at some defined threshold.
Honestly though, I don't think it's the right solution to the problem. Managing load against a shared resource like a SQL server is a completely different beast than what the TPL is designed to solve. You'd be much better off just making sure you design your application so that it doesn't abuse the SQL server in its own right: load test, find a sweet spot, and configure your application not to go outside those bounds, for example by capping concurrent database calls as sketched below. From there it would be up to your DBA to determine the right solution for the SQL server infrastructure itself to manage that application's needs along with any other external load.
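A sketch of one way to enforce such a bound: cap the number of in-flight database calls with a SemaphoreSlim. The MaxDbCalls value and RunHeavyQuery are hypothetical; the cap would come from your load testing.
using System;
using System.Threading;

static class DbThrottle
{
    const int MaxDbCalls = 8;  // hypothetical cap, found by load testing
    static readonly SemaphoreSlim Gate = new SemaphoreSlim(MaxDbCalls, MaxDbCalls);

    public static T Run<T>(Func<T> databaseCall)
    {
        Gate.Wait();  // blocks once MaxDbCalls calls are already in flight
        try { return databaseCall(); }
        finally { Gate.Release(); }
    }
}

// Usage inside a task:
// var rows = DbThrottle.Run(() => RunHeavyQuery());  // hypothetical query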
If you parallelise the data access functionality in your client application, you will find that the next bottleneck is the SQL Server connection pool.
The TPL is good at partitioning your data; as for measuring load, that is for your OS to determine (and in fact it's pretty good at it). Therefore this is more of a configuration question than a development question. (Does your SQL Server instance have a higher priority than your service?)
I have a simulation engine that I would like to parallelize first and later develop as a web service in C#. This is an intensive simulation that requires a lot of CPU and RAM, and I would like to run each simulation run on its own thread. To give you a better idea, the simulation can consist of 100 runs, and for each run I collect some results. It would be straightforward to collect the results from each run and then collate them into one big file. So if I have a multi-core machine with 4 cores, for example, the idea is to execute 4 runs at a time, one per core, then another 4, etc. I have read a few things about the Parallel Extensions in the newer version of .NET. Could I achieve the same things in 3.5, or would it be better to move to 4.0? Also, is there anything to watch out for if I make this a web service? Any further ideas or suggestions are more than welcome.
You would be better off moving to 4.0 and using the TPL. That way you could create a Task<> to run each simulation and have the TPL scheduler schedule them appropriately as resources become available. As the runs finish you could put the results into one of the concurrent collections (e.g. ConcurrentBag<>), and once everything had finished, run a collation on them (you could even have another Task collating while the others were running, if this turned out to be important to you). A minimal sketch follows below.
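A minimal sketch of that shape on .NET 4; RunResult, RunSimulation and Collate are hypothetical stand-ins for the simulation engine and the collation step.
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

const int TotalRuns = 100;
var results = new ConcurrentBag<RunResult>();

// One task per run; the scheduler keeps roughly one run per core in flight.
Task[] runs = Enumerable.Range(0, TotalRuns)
    .Select(i => Task.Factory.StartNew(() => results.Add(RunSimulation(i))))  // hypothetical
    .ToArray();

Task.WaitAll(runs);
Collate(results);  // hypothetical: merge the per-run results into one big file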
In 3.5 much of the scheduling work would be left to you, and the APIs aren't as clean for creating tasks. You'd also not have any of the concurrent collections, which might otherwise make result collation a lot simpler (never underestimate the complexity of writing a concurrent collection that's both correct and performant).
If you make this a web service then you have to understand the usage of the service and how that will affect it. Essentially you can improve individual request latency, but this may come at the cost of a degradation in overall throughput. See the following link for a discussion of this.
http://blogs.msdn.com/b/pfxteam/archive/2010/02/08/9960003.aspx