Improve performance of a REST service [closed] - C#

I have a method which calls a stored procedure 300 times in a for loop, and each time the stored procedure returns 1200 records. How can I improve this? I cannot eliminate the 300 calls, but are there any other ways I can try? I am using a REST service implemented in ASP.NET and IBATIS for database connectivity.

I cannot eliminate the 300 calls
Eliminate the 300 calls.
Even if all you can do is to just add another stored procedure which calls the original stored procedure 300 times, aggregating the results, you should see a massive performance gain.
Even better if you can write a new stored procedure that replicates the original functionality but is structured more appropriately for your specific use case, and call that, once, instead.
Making 300 round trips between your code and your database quite simply is going to take time, even where the code and the database are on the same system.
Once this bit of horrible is resolved, there will be other things you can look to optimise, if required.

Measure.
Measure the amount of time spent inside the server-side code. Measure how much of that time is spent in the stored procedure. Measure the amount of time spent on the client side. Do some math, and you have a rough estimate of network time and other overheads.
Returning 1200 records, I would expect network bandwidth to be one of the main issues; you could perhaps investigate whether a different serialization engine (with the same output type) might help, or perhaps whether adding compression (gzip / deflate) support would be beneficial (meaning: reduced bandwidth being more important than the increased CPU required).
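If the clients are also .NET, opting in to compressed responses is a one-liner on the request; a minimal sketch, assuming an HttpWebRequest-based client and a made-up URL:
using System.IO;
using System.Net;

var request = (HttpWebRequest)WebRequest.Create("http://example.com/api/records"); // placeholder URL
request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
using (var response = (HttpWebResponse)request.GetResponse())
using (var reader = new StreamReader(response.GetResponseStream()))
{
    string body = reader.ReadToEnd(); // compressed on the wire, transparently decompressed here
}
The server still has to be configured to emit gzip/deflate (for example in IIS), so treat this as the client half of the change only.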
Latency might be important if you are calling the REST service 300 times; maybe you can parallelize slightly, or make fewer big calls rather than lots of small calls.
You could batch the SQL code, so you only make a few trips to the DB (calling the SP repeatedly in each) - that is perfectly possible; just use EXEC etc (still using parameterization).
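As a rough sketch of that batching (the procedure name, the open connection and the ids array are placeholders for whatever you actually have), something along these lines sends 50 EXECs per round trip and then walks the multiple result sets:
using System.Data;
using System.Data.SqlClient;
using System.Text;

var sql = new StringBuilder();
var cmd = new SqlCommand { Connection = connection, CommandType = CommandType.Text };
for (int i = 0; i < 50; i++)
{
    sql.AppendFormat("EXEC dbo.GetRecords @id{0};", i);   // hypothetical procedure name
    cmd.Parameters.AddWithValue("@id" + i, ids[i]);       // still parameterized
}
cmd.CommandText = sql.ToString();
using (var reader = cmd.ExecuteReader())
{
    do
    {
        while (reader.Read()) { /* map the 1200 rows returned by this EXEC */ }
    } while (reader.NextResult());                         // move on to the next EXEC's result set
}
Fifty is arbitrary; the point is simply turning 300 round trips into a handful.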
You could look at how you are getting the data from ADO.NET to the REST layer. You mention IBATIS, but have you checked whether this is fast / slow compared to, say, "dapper" ?
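For a quick comparison, the equivalent call through Dapper would look roughly like this (MyRecord and the procedure name are made up, and connection is assumed to be an open SqlConnection); timing this against your IBATIS mapping would tell you whether the data-access layer is a factor at all:
using System.Data;
using Dapper;

class MyRecord { public int Id { get; set; } public string Name { get; set; } }   // placeholder POCO

var rows = connection.Query<MyRecord>(
    "dbo.GetRecords",                                  // hypothetical stored procedure
    new { id = someId },
    commandType: CommandType.StoredProcedure);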
Finally, the SP performance itself can be investigated; indexing or just a re-structuring of the SP's SQL may help.

Well, if you have to return 360,000 records, you have to return 360,000 records. But do you really need to return 360,000 records? Start there and work your way down.

Without knowing too much of the details, the architecture appears flawed. On one hand it's considered unreasonable to lock the tables for the 6 seconds it takes to retrieve the 360,000 records using a single stored procedure execution, but it's fine to return a possibly inconsistent set of 360,000 records retrieved via multiple executions. It makes me wonder what exactly you are trying to implement, and whether there is a better way to design the integration between the client and the server.
For instance, if the client is retrieving a set of records that have been created since the last request, then maybe a paged ATOM feed would be more appropriate.
Whatever it is you are doing, 360,000 records is a lot of data to move between the server and the client, and we should be looking at the architecture and purpose of that data transfer to make sure the current approach is appropriate.


Entity Framework vs SQL stored procedure

I have a table with more than 10,000,000 rows.
I need some filters (some IN queries and some LIKE queries) and dynamic ORDER BY.
I wondered what is the best way to work with big data: pagination, filtering and ordering.
Of course it's easy to work with Entity Framework, but I think the performance is better with a stored procedure.
I have a table with more than 10,000,000 rows.
You have a small table, nearly tiny - small enough to cause no problems for anyone not abusing the server.
Seriously.
I wondered what is the best way to work with big data,
That starts with HAVING big data, which is generally defined as multiple times the RAM of a low-cost server - which today means around 16 cores and around 128 GB of memory. After that it gets expensive.
General rules are:
DO NOT PAGE. Paging at the start is easy, but getting to the end results is slow - either you precalculate the pages and store them, OR you have to re-execute queries just to throw away results. It works nicely on pages 1-2, then it gets slower.
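If paging is unavoidable, a keyset ("seek") style query at least avoids re-reading and discarding everything before the requested page; a sketch with made-up table and column names, assuming an open connection and a lastSeenId remembered from the previous page:
using System.Data.SqlClient;

var cmd = new SqlCommand(
    @"SELECT TOP (@pageSize) Id, Name, Price
      FROM dbo.Products               -- hypothetical table
      WHERE Id > @lastSeenId
      ORDER BY Id", connection);
cmd.Parameters.AddWithValue("@pageSize", 50);
cmd.Parameters.AddWithValue("@lastSeenId", lastSeenId);
using (var reader = cmd.ExecuteReader())
{
    while (reader.Read()) { /* map a row; remember the last Id for the next page */ }
}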
Of course it's easy to work with Entity Framework, but I think the performance is better with a stored procedure
And why would that be? The overhead of generating the query is tiny, and contrary to often-repeated delusions, SQL Server uses query plan caching for everything. A stored procedure is faster only if the compilation overhead is significant (i.e. SMALL data), or if you would otherwise pull a lot of data over the network just to send results back (i.e. when the processing can be done entirely in the database).
For anything else the "general" performance impact is close to zero.
OTOH it allows you to send much more tailored SQL without getting into really bad and ugly stored procedures - ones that either issue dynamic SQL internally, or have tons of complex conditions for optional parameters.
What to be careful with:
IN clauses can be terrible for performance. Do not put hundreds of elements in there. If you need that, a stored procedure and a table variable (or table-valued parameter) that is filled and joined is the better way (see the sketch after this list).
As I said - careful with paging. Someone asking for page 100 and just pressing forward is repeating a TON of processing.
And: attitude adjustment. The time when 10 million rows were large was around 20 years ago.
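On the IN-clause point above, a sketch of passing the values as a table-valued parameter instead; it assumes a table type and a procedure already exist on the SQL Server side (both names here are made up), plus an open connection and an ids collection:
using System.Data;
using System.Data.SqlClient;

// Assumes: CREATE TYPE dbo.IdList AS TABLE (Id int);
//          and a procedure dbo.GetByIds (@ids dbo.IdList READONLY) that joins against @ids.
var table = new DataTable();
table.Columns.Add("Id", typeof(int));
foreach (var id in ids) table.Rows.Add(id);

var cmd = new SqlCommand("dbo.GetByIds", connection) { CommandType = CommandType.StoredProcedure };
var p = cmd.Parameters.AddWithValue("@ids", table);
p.SqlDbType = SqlDbType.Structured;
p.TypeName = "dbo.IdList";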

XML vs Array Performance [closed]

From a performance standpoint, is it more beneficial to read large amounts of data from an XML file or by looping through an array?
I have around 2,000 datasets I need to loop through and do calculations with, so I'm just wondering if it would be better to import all XML data and process it as an array (single large import) or to import each dataset sequentially (many small imports).
Thoughts and suggestions?
If I have interpreted your question correctly, you need to load 2,000 sets of data from one file, and then process them all. So you have to read all the data and process all the data. At a basic level there is the same amount of work to do.
So I think the question is "How can I finish the same processing earlier?"
Consider:
How much memory will the data use? If it's going to be more than 1.5GB of RAM, then you will not be able to process it in a single pass on a 32-bit PC, and even on 64-bit PCs you're likely to see virtual memory paging killing performance. In either of these cases, streaming the data in smaller chunks is a necessity.
Conversely if the data is small (e.g. 2000 records might only be 200kB for all I know), then you may get better I/O performance by reading it in one chunk, or it will load so fast compared to the processing time that there is no point trying to optimise it.
Are the records independent? (so they don't need to be processed in a particular order, and you don't need one record present in memory in order to process another one) If so, and if the loading time is significant overall, then the "best" approach may be to parallelise the operation - If you can process some data while you are loading more data in the background, you will utilise the hardware better and do the same work in less time. So you probably want to consider splitting your loading and processing onto different threads.
But spreading the processing onto many threads might not help you if loading takes much longer than processing, as your processing threads may be starved of data while waiting for I/O - so using 1 processing thread may be just as fast as using 3 or 7. And there's no point in creating more threads than you have available CPU cores. If going multithreaded, I'd write it to use a configurable/dynamic number of threads and then do some testing to determine what the optimum approach will be.
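As a sketch of that split (the file name, the element name and the calculation are placeholders): one task streams datasets out of the XML with XmlReader into a bounded BlockingCollection, while a couple of workers consume them and do the calculations.
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;
using System.Xml;
using System.Xml.Linq;

var queue = new BlockingCollection<XElement>(boundedCapacity: 100);
var loader = Task.Factory.StartNew(() =>
{
    using (var reader = XmlReader.Create("datasets.xml"))      // hypothetical file name
    {
        while (reader.ReadToFollowing("dataset"))              // hypothetical element name
            queue.Add((XElement)XNode.ReadFrom(reader));       // stream one record at a time
    }
    queue.CompleteAdding();
});
var workers = Enumerable.Range(0, 2).Select(_ => Task.Factory.StartNew(() =>
{
    foreach (var dataset in queue.GetConsumingEnumerable())
    {
        int elementCount = dataset.Elements().Count();          // stand-in for the real calculation
    }
})).ToArray();
Task.WaitAll(workers);
loader.Wait();
The bounded capacity keeps memory flat even if loading outpaces processing; tune the worker count as described above.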
But before you consider all of that, you might want to consider writing a brute force approach and see what the performance is like. Do you even need to optimise it?
And if the answer is "yes, I desperately need to optimise it", then can you reconsider the data format? XML is a very useful but grossly inefficient format. If you have a performance critical case, is there anything you can do to reduce the XML size (e.g. simply using shorter element names can make a massive difference on large files), or even use a much more compact and easily read binary format?

C# Multithread, which performs the best? [closed]

I'm currently writing an application that makes a huge number of calls to slow web services (I had no say in that pattern) that produce little output.
I'd like to make around 100 parallel calls (I know real parallelism can only go as far as the number of cores you have).
But I was wondering if there are performance differences between the different approaches.
I'm hesitating between:
Using Task.Factory.StartNew in a loop.
Using Parallel.For.
Using BackgroundWorker.
Using AsyncCallback.
...Others?
My main goal is to get as many web service calls started as quickly as possible.
How should I proceed?
From a performance standpoint it's unlikely to matter. As you yourself have described, the bottleneck in your program is a network call to a slow performing web service. That will be the bottleneck. Any differences in how long it takes you to spin up new threads or manage them is unlikely to matter at all due to how much they will be overshadowed by the network interaction.
You should use the model/framework that you are most comfortable with, and that will most effectively allow you to write code that you know is correct. It's also important to note that you don't actually need to use multiple threads on your machine at all. You can send a number of asynchronous requests to the web service all from the same thread, and even handle all of the callbacks in the same thread. Parallelizing the sending of requests is unlikely to have any meaningful performance impact. Because of this you don't really need to use any of the frameworks that you have described, although the Task Parallel Library is actually highly effective at managing asynchronous operations even when those operations don't represent work in another thread. You don't need it, but it's certainly capable of helping.
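For illustration, here is roughly what firing the requests asynchronously from a single thread looks like with the TPL; the urls collection is a placeholder, and no thread sits blocked while the network calls are in flight:
using System.Linq;
using System.Net;
using System.Threading.Tasks;

var tasks = urls.Select(url =>
{
    var req = (HttpWebRequest)WebRequest.Create(url);
    return Task.Factory.FromAsync<WebResponse>(req.BeginGetResponse, req.EndGetResponse, null);
}).ToArray();

Task.WaitAll(tasks);   // each task's Result is the WebResponse to read and dispose
Continuations (or await, on newer framework versions) can replace the blocking WaitAll if the caller itself must not block.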
Following your advice, I used async (with I/O events) where I was previously using the TPL.
Async really does outperform Sync + Task usage.
I can now launch 100 requests (almost?) at the same time, and if the longest-running one takes 5 seconds, the whole process only lasts 7 seconds, whereas with sync + TPL it took around 70 seconds.
In conclusion, (auto-generated) async is really the way to go when consuming a lot of web services.
Thanks to you all.
Oh, and by the way, this would not be possible without the following in app.config / web.config (under system.net):
<system.net>
  <connectionManagement>
    <add address="*" maxconnection="100" />
  </connectionManagement>
</system.net>

Best practice - load a lot of stuff in Application_Start? [closed]

I have a webshop with a lot of products and other content. Currently I load all the content into a global list at Application_Start, which takes approximately 15-25 seconds.
This makes the site really fast, as I can get any product/content in O(1) time.
However, is this best practice?
Currently I have a web hotel (shared hosting), not a VPS / dedicated server, so it recycles the application from time to time, which gives random visitors load times of up to 15-25 seconds (a number that will only grow with more content). This is of course totally unacceptable, but I guess it would be solved with a VPS.
What is the normal way of doing this? I guess a webshop like Amazon probably doesn't load all its products into a huge list :-D
Any thoughts and ideas would be highly appreciated.
It looks like you've already answered the question for your own case: "This is of course totally unacceptable".
If your goal is O(1): a normal database request for a single product is likely O(1) anyway, unless you need complicated joins between products. Consider dropping all your pre-caching logic and seeing whether you actually have a performance problem. You can limit the startup impact by caching lazily instead.
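A sketch of that lazy approach with the built-in ASP.NET cache (the Product shape and LoadProductFromDatabase stand in for your own types and data access): each product is loaded the first time it is requested and kept for a while, so nothing heavy happens at Application_Start.
using System;
using System.Web;
using System.Web.Caching;

public static class ProductCache
{
    public static object GetProduct(int id)
    {
        string key = "product:" + id;
        object product = HttpRuntime.Cache[key];
        if (product == null)
        {
            product = LoadProductFromDatabase(id);                      // load on first request only
            HttpRuntime.Cache.Insert(key, product, null,
                DateTime.UtcNow.AddMinutes(30),                         // absolute expiration
                Cache.NoSlidingExpiration);
        }
        return product;
    }

    private static object LoadProductFromDatabase(int id)
    {
        return new { Id = id };                                         // stand-in; replace with the real data access
    }
}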
Large sites often use distributed caching like Memcached.
A more scalable setup is to set up a web service to provide the content, which the website calls when it needs it. The web service will need to cache frequently needed content to achieve fast response times.
First of all, 15-20 seconds to load data is too much time, so I suspect one of these causes:
The time is spent compiling, not loading the data.
The data is too much and you fill up the memory.
The method that you use to read the data is very slow.
The data storage is too slow, or the structure is a text file and reading it is slow.
My opinion is that you should cache only a small amount of data that you need to use many times in a short period. The way you describe it is not good practice, for a few reasons:
If you have many application pools, you read the same data in every pool and spend memory for no reason.
The data that you cache cannot be changed - it is read-only.
Even if you cache some data, you still need to render the page, and that is where you actually need to cache: on the final render, not on the data.
What and how to cache.
We cache the final rendered page.
We also set cache headers for the page and other elements sent to the client.
We read and write the data from the database as requests come in, and we let the database do the caching - it knows better.
If we do cache data, it is a small amount that is needed in a long loop, so that we avoid calling the database many times.
Also, we cache on demand, and if an item is not used for a long time, or the memory needs space, that part of the cache goes away. If some part of the data comes from a complex combination of many tables, we make a big temporary flat table that keeps all the data together, one item per row. This table is temporary, and if we need too many of them we make a second temporary database file in which to keep this part of the data.
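For the "cache the final rendered page" point above, in ASP.NET MVC that is mostly an attribute (the duration, parameter name and controller here are only illustrative):
using System.Web.Mvc;

public class ProductController : Controller
{
    // Cache the fully rendered product page for 60 seconds, varied by the id parameter.
    [OutputCache(Duration = 60, VaryByParam = "id")]
    public ActionResult Product(int id)
    {
        return View(LoadProduct(id));
    }

    private object LoadProduct(int id)
    {
        return new { Id = id };   // stand-in for the real data access
    }
}
In WebForms the equivalent is the <%@ OutputCache Duration="60" VaryByParam="id" %> page directive.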
How fast is the database read? Well, it is so fast that you do not need to worry about it; you need to check other points of delay, such as, as I said, the full render of a page or of some parts of the page.
What you need to worry about is a good database design, a good and fast way to retrieve your data, and well-optimized code to display it.
Separation of responsibilities will help you scale for the future.
With your current setup, you are limited to the resources of your web server, and, like you said, your start up times will grow out of control as you continue adding more products.
If you share the burden of each page request with SQL Server, you open up your application to allow it to scale as needed. Over time, you may decide to add more web servers, cluster SQL Server, or switch to a new database back-end altogether. However, if all the burden is on the application pool, then you are drastically limiting yourself.

What to use for high-performance server coding? [closed]

I need to build a server to accept client connections with a very high frequency and load (each user will send a request every 0.5 seconds and should get a response in under 800 ms; I should be able to support thousands of users on one server). The assumption is that the SQL Server is finely tuned and will not pose a problem (an assumption that, of course, might not be true).
I'm looking to write a non-blocking server to accomplish this. My back end is an SQL Server which is sitting on another machine. It doesn't have to be updated live - so I think I can cache most of the data in memory and dump it to the DB every 10-20 seconds.
Should I write the server in C# (which is more compatible with SQL Server)? Maybe Python with Tornado? What should my considerations be when writing a high-performance server?
EDIT: (added more info)
The Application is a game server.
I don't really know the actual traffic - but this is the prognosis and the server should support it and scale well.
It's hosted "in the cloud" in a Datacenter.
Language doesn't really matter. Performance does. (a Web service can be exposed on the SQL Server to allow other languages than .NET)
The connections are very frequent but small (very little data is returned and little computations are necessary).
It should hold most of the data in the memory for fastest performance.
Any thoughts will be much appreciated :)
Thanks
Okay, if you REALLY need high performance, don't go for C#, but C/C++, it's obvious.
In any case, the fastest way to do server programming (as far as I know) is to use IOCP (I/O Completion Ports). Well, that's what I used when I made an MMORPG server emulator, and it performed faster than the official C++ select-based servers.
Here's a very complete introduction to IOCP in C#
http://www.codeproject.com/KB/IP/socketasynceventargs.aspx
Good luck !
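To give a flavour of the pattern from that article, here is a heavily trimmed accept loop with SocketAsyncEventArgs (the port is arbitrary; error handling, buffer pooling and the receive side are left out):
using System.Net;
using System.Net.Sockets;

class AcceptLoop
{
    static Socket listener;

    static void Main()
    {
        listener = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
        listener.Bind(new IPEndPoint(IPAddress.Any, 9000));   // example port
        listener.Listen(100);

        var acceptArgs = new SocketAsyncEventArgs();
        acceptArgs.Completed += (s, e) => ProcessAccept(e);
        StartAccept(acceptArgs);

        System.Console.ReadLine();                            // keep the demo process alive
    }

    static void StartAccept(SocketAsyncEventArgs e)
    {
        e.AcceptSocket = null;                                // reuse the args object between accepts
        if (!listener.AcceptAsync(e))                         // false means it completed synchronously
            ProcessAccept(e);
    }

    static void ProcessAccept(SocketAsyncEventArgs e)
    {
        Socket client = e.AcceptSocket;                       // hand this socket to an async receive loop
        client.NoDelay = true;                                // e.g. disable Nagle for low latency
        StartAccept(e);                                       // immediately post the next accept
    }
}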
Use the programming language that you know the most. It's a lot more expensive to hunt down performance issues in a large application that you do not fully understand.
It's a lot cheaper to buy more hardware.
People will say C++, because garbage collection in .Net could kill your latency. You could avoid garbage collection though if you were clever, by reusing existing managed objects.
Edit: Your assumption about SQL Server is probably wrong. You need to store your state in memory for random access. If you need to persist changes, journal them to the filesystem and consolidate them with the database infrequently.
Edit 2: You will have a lot of different threads talking to the same data. In order to avoid blocking and deadlocks, learn about lock-free programming (Interlocked.CompareExchange etc.).
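A tiny example of the compare-and-swap pattern behind that: updating a shared value from many threads without taking a lock (the "high score" here is just an illustration):
using System.Threading;

class Scores
{
    static long highScore;                                     // shared state touched by many threads

    public static void UpdateHighScore(long candidate)
    {
        long seen;
        do
        {
            seen = Interlocked.Read(ref highScore);
            if (candidate <= seen) return;                     // nothing to do
        }
        while (Interlocked.CompareExchange(ref highScore, candidate, seen) != seen);
        // If another thread changed the value in between, CompareExchange fails and we retry.
    }
}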
I was part of a project that included very high-performance server code, which actually included the ability to respond with a TCP packet within 12 milliseconds or so.
We used C# and I must agree with jgauffin - a language that you know is much more important than just about anything.
Two tips:
Writing to the console (especially in color) can really slow things down.
If it's important for the server to be fast at the first requests, you might want to use a pre-JIT compiler to avoid JIT compilation during the first requests. See Ngen.exe.
