reducing loading time of 100 pages of google - c#

For my project I need to access all 100 pages of Google results at a time for a particular keyword. I used a 'for' loop in my C# code to access the pages by URL, but it is taking a long time, and sometimes it shows an HttpRequest error. Is there any way to increase the speed?

Query them in parallel. HTTP is asynchronous by nature, and so should your request code be.

In your case, speed is limited by the time it takes to fulfil an I/O request. You can speed up the total task by accessing the servers in parallel (i.e. using the ThreadPool). A browser will generally use a couple (2-8) of parallel I/O requests to a server, and so could you (useful, for instance, if you also need image files or CSS files referenced by the Google result). Since you'll have up to 100 servers, you can do it massively in parallel; again, a task the ThreadPool will help you with.
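A minimal sketch of that ThreadPool approach, assuming a prebuilt list of result-page URLs (the WebClient usage, the connection limit of 8 and the FetchAll name are illustrative, not from the question):

using System;
using System.Collections.Generic;
using System.Net;
using System.Threading;

class ParallelFetcher
{
    static void FetchAll(IList<string> urls)
    {
        // .NET allows only 2 concurrent connections per host by default;
        // raise the limit or the parallel requests will queue up anyway.
        ServicePointManager.DefaultConnectionLimit = 8;

        using (var allDone = new CountdownEvent(urls.Count))
        {
            foreach (string url in urls)
            {
                ThreadPool.QueueUserWorkItem(state =>
                {
                    try
                    {
                        using (var client = new WebClient())
                        {
                            string html = client.DownloadString((string)state);
                            // ...parse the page here...
                        }
                    }
                    catch (WebException ex)
                    {
                        Console.Error.WriteLine("Failed {0}: {1}", state, ex.Message);
                    }
                    finally
                    {
                        allDone.Signal(); // count this request as finished
                    }
                }, url);
            }
            allDone.Wait(); // block until all pages have been fetched
        }
    }
}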

Related

How to asynchronously download millions of files from a file storage?

Let's assume I have a database managing millions of documents, which are stored on a WebDav or SMB server that does not support retrieving documents in bulk.
Given a list of (potentially all) document IDs, how do I download the corresponding documents as fast as possible?
Iterating over the list and sequentially downloading them is far too slow.
The two options I see are threads and async downloads.
My gut says that async programming should be preferred to threads, because I'm just waiting for IO on the client side. But I am rather new to async programming and I don't know how to do it.
I assume that iterating over the whole list and sending an async download request could potentially lead to too many requests in a very short time leading to rejected requests. So how do I throttle this? Is there a best practice way to do this?
Take a look at this: How to limit the amount of concurrent async I/O? Using a SemaphoreSlim, as suggested in the accepted answer, is an easy and quite good solution.
My personal favorite though for this kind of job is the TPL Dataflow library. You can see here an example of using this library to download pages from the web asynchronously with a configurable level of concurrency, in combination with the HttpClient class. Here is another example.
I also found this great article explaining 4 different ways to limit the number of concurrent downloads.
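For illustration, a minimal sketch of the SemaphoreSlim approach in combination with HttpClient (the limit of 10 and the class/method names are placeholder assumptions):

using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

class ThrottledDownloader
{
    // At most 10 downloads in flight at once; tune for your server.
    static readonly SemaphoreSlim Throttle = new SemaphoreSlim(10);
    static readonly HttpClient Client = new HttpClient();

    static async Task<byte[]> DownloadAsync(string url)
    {
        await Throttle.WaitAsync();   // wait for a free slot
        try
        {
            return await Client.GetByteArrayAsync(url);
        }
        finally
        {
            Throttle.Release();       // free the slot for the next download
        }
    }

    static Task<byte[][]> DownloadAllAsync(IEnumerable<string> urls)
    {
        // Start everything eagerly; the semaphore does the throttling.
        return Task.WhenAll(urls.Select(DownloadAsync));
    }
}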

How to increase/decrease # of threads used (applicationPool) in ASP.NET Web API

I have a process that takes a very large amount of memory; it involves manipulating large images. The process is called via a GET request route, and I currently have a lock on the image creation method. Without the lock, if I send 10+ requests at once, the application's memory immediately spikes and it throws an exception.
[HttpGet]
[Route("example")]
public HttpResponseMessage GetImage([FromUri]ImageParams imageParams)
{
    lock (myLock)
    {
        return CreateImage(imageParams);
    }
}
Someone mentioned increasing the applicationPool in another question, but I can't figure out how to do it. I think this would be a better alternative to locking, because I could still use a couple of threads to create images, but could limit this so I don't run out of memory. I am under the impression that .NET is using an integrated thread pooling system for each GET request. I am sure from testing that these requests are somehow run in parallel, and it would be helpful to decrease the potential number of threads rather than locking it to one.
Looking at this resource,
https://msdn.microsoft.com/en-us/library/dd560842(v=vs.110).aspx
I've tried adding this element to the System.Web section but it says it is not a valid child element (even though I am running IIS version 10)
I was able to change the aspnet.config file to change the applicationPool number from 0 (default, no limit) to 1 and 3 but this did not yield any different results at all.
Any input would be appreciated, thanks for reading
Edit:
This is a follow-up to my question at this link, which shows the code these errors point me to and some of my efforts to analyze them:
Diagnosing OutOfMemory issues with Image Processing API in ASP.NET
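Not an answer from the original thread, but one commonly suggested middle ground between a global lock and an unbounded pool is a SemaphoreSlim; a sketch reusing the action above, with an arbitrary cap of three concurrent image builds:

// using System.Threading; using System.Threading.Tasks;
// Allows up to 3 image creations at a time instead of serializing them all.
private static readonly SemaphoreSlim imageGate = new SemaphoreSlim(3, 3);

[HttpGet]
[Route("example")]
public async Task<HttpResponseMessage> GetImage([FromUri]ImageParams imageParams)
{
    await imageGate.WaitAsync();   // requests queue here instead of spiking memory
    try
    {
        return CreateImage(imageParams);
    }
    finally
    {
        imageGate.Release();
    }
}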

C# 4 System.Threading.Tasks performance questions

I'm currently working on an ASP.NET MVC application with some pages that load a lot of data (split across separate LINQ queries).
To increase the performance of these pages, I am considering using C# 4 Tasks to run the queries simultaneously and gain execution time.
But I have one major question: from the server's point of view, which situation is best:
pages that use Tasks and therefore a lot of server resources in a small amount of time?
pages that use only synchronous code, using fewer server resources but over a longer time?
no difference?
Performance of my pages is important, but stability of the server is more so!
Thanks in advance for your help.
You don't say whether the LINQ queries are CPU bound (e.g. computed in-memory) or IO bound (e.g. reading across the network or from disk).
If they are CPU bound, then using asynchronous code will improve fairness but reduce throughput - everyone just suffers equally. For example, say you could only process one request at a time, and each request takes 5 seconds. Two requests come in almost at the same time. With synchronous code, the first will complete in 5 seconds while the second is queued, and the second will complete after 10. With asynchronous code, both will start together and finish after slightly more than 10 seconds (due to the overhead of swapping between the two). This is hypothetical, of course, because in reality you have many threads to process requests concurrently.
In reality, you'll find asynchronous code will only help when you have lots of IO bound operations that take long enough to cause request queuing. If the queue fills up, the server will start issuing 503 Service Unavailable errors. Check your performance counters - if you have few or no requests queued in ASP.NET under typical live loads, then don't bother with the additional complexity.
If the work is IO bound, then using asynchronous code will push the bottleneck towards the network/disk. This is a good thing, because you aren't wasting your web server's memory resource on idle blocked request threads just waiting for responses - instead you make request throughput dependent on downstream performance and can focus on optimizing that. i.e. You'll keep those request threads free to take more work from the queue.
EDIT - Nice article on this here: http://blog.stevensanderson.com/2010/01/25/measuring-the-performance-of-asynchronous-controllers/
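As an illustration of the IO bound case, a sketch assuming C# 5 async/await is available (with C# 4 alone you would use AsyncController and Task continuations instead); LoadOrdersAsync, LoadCustomersAsync and DashboardModel are hypothetical:

public async Task<ActionResult> Dashboard()
{
    // Start both IO bound queries without awaiting either one yet.
    Task<List<Order>> ordersTask = LoadOrdersAsync();
    Task<List<Customer>> customersTask = LoadCustomersAsync();

    // Both queries are in flight at once; the request thread goes back
    // to the pool while the database does the work.
    await Task.WhenAll(ordersTask, customersTask);

    return View(new DashboardModel(ordersTask.Result, customersTask.Result));
}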

Calling multiple services in a method. How to do it effectively?

I have an ASP.NET MVC web page displaying 10,000 products.
For this I am using a method. In that method I have to call an external web service 20 times. This is because the web service returns 500 records at a time, so to get 10,000 records I need to call the service 20 times.
The 20 calls make the page load slowly. Now I need to increase the performance. Since the web service is external, I cannot make changes there.
Threading is an option I thought of. Since I can use page numbers (the service pages its data), each service call is almost independent.
Another option is using Parallel LINQ.
Should I use Parallel LINQ, or choose threading?
Someone please guide me here, or let me know another way to achieve this.
Note: this web page can be used by many users at a time.
We have filters on the left side of the page; to construct them we need all 10,000 records, otherwise page-wise data would have been enough. Caching is not possible because of the huge load on the server: 400-1000 users can hit the server at a time, and the web service response time is 10 seconds, so we end up hitting it many times.
We have to hit the service 20 times to get all the data. Now I need a solution to improve those hits. Is threading the only option?
If you can't cache the data from the service, then just get the data you need, when you need to display it. I very much doubt that somebody wants to see all 10000 products on a single web page, and if they do, there is probably something wrong!
Threads and Parallel LINQ will not help you here.
Parallel LINQ is meant for lots of CPU work to be shared over CPU cores; what you want to do is make 20 web requests at the same time. You will need to use threading to do that.
You'll probably want to use the built in async capability of HttpWebRequest (see BeginGetResponse).
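A sketch of that idea on .NET 4, wrapping BeginGetResponse/EndGetResponse in Tasks so all 20 page requests run concurrently (BuildPageUrl is a hypothetical helper that appends the page number to the service URL):

// using System.Collections.Generic; using System.IO; using System.Linq;
// using System.Net; using System.Threading.Tasks;
// Remember to raise ServicePointManager.DefaultConnectionLimit,
// which defaults to 2 concurrent connections per host.
static IEnumerable<Task<string>> StartPageRequests(int pageCount)
{
    for (int page = 1; page <= pageCount; page++)
    {
        var request = (HttpWebRequest)WebRequest.Create(BuildPageUrl(page));
        yield return Task.Factory
            .FromAsync<WebResponse>(request.BeginGetResponse, request.EndGetResponse, null)
            .ContinueWith(t =>
            {
                using (WebResponse response = t.Result)
                using (var reader = new StreamReader(response.GetResponseStream()))
                {
                    return reader.ReadToEnd();   // one 500-product chunk
                }
            });
    }
}

// Usage: start all 20 requests, block until every chunk has arrived, then merge.
// Task<string>[] chunks = StartPageRequests(20).ToArray();
// Task.WaitAll(chunks);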
Consider calling that service asynchronously. Most of the delay in calling a web service is caused by IO operations that can be done simultaneously.
But getting 10,000 items on each request is something very scary :)

How to Download 5 files at a time using Thread in .net framework 3.5

I need to download certain files using FTP. It is already implemented without using threads, and it takes too much time to download all the files.
So I need to use some threads to speed up the process.
My code is like:
foreach (string str1 in files)
{
    download_FTP(str1);
}
I referred to this, but I don't want every file to be queued at once; say, for example, 5 files at a time.
If the process is too slow, it means most likely that the network/Internet connection is the bottleneck. In that case, downloading the files in parallel won't significantly increase the performance.
It might be another story though if you are downloading from different servers. We may then imagine that some of the servers are slower than others. In that case, parallel downloads would increase the overall performance since the program would download files from other servers while being busy with slow downloads.
EDIT: OK, we have more info from you: Single server, many small files.
Downloading multiple files involves some overhead. You can decrease this overhead by grouping the files somehow (tar, zip, whatever) on the server side. Of course, this may not be possible. If your app were talking to a web server, I'd advise creating a zip file on the fly server-side according to the list of files transmitted in the request. But you are on an FTP server, so I'll assume you have nearly no flexibility server-side.
Downloading several files in parallel will probably increase the throughput in your case. Be very careful though about restrictions set by the server, such as the maximum number of simultaneous connections. Also, keep in mind that if you have many simultaneous users, you'll end up with a big number of connections on the server: users x threads. That may prove counter-productive depending on the scalability of the server.
A commonly accepted rule of good behaviour is to limit each user to at most 2 simultaneous connections. YMMV.
Okay, as you're not using .NET 4 that makes it slightly harder - the Task Parallel Library would make it really easy to create five threads reading from a producer/consumer queue. However, it still won't be too hard.
Create a Queue<string> with all the files you want to download
Create 5 threads, each of which has a reference to the queue
Make each thread loop, taking an item off the queue and downloading it, or finishing if the queue is empty
Note that as Queue<T> isn't thread-safe, you'll need to lock to make sure that only one thread tries to fetch an item from the queue at a time:
string fileToDownload = null;
lock (padlock)
{
    if (queue.Count == 0)
    {
        return; // Done
    }
    fileToDownload = queue.Dequeue();
}
As noted elsewhere, threading may not speed things up at all - it depends where the bottleneck is. If the bottleneck is the user's network connection, you won't be able to get more data down the same size of pipe just by using multi-threading. On the other hand, if you have a lot of small files to download from different hosts, then it may be latency rather than bandwidth which is the problem, in which case threading will help.
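Putting those steps together, a sketch of the whole pattern for .NET 3.5 (download_FTP and files come from the question; padlock guards the shared queue):

// using System.Collections.Generic; using System.Threading;
object padlock = new object();
Queue<string> queue = new Queue<string>(files);

ThreadStart worker = delegate
{
    while (true)
    {
        string fileToDownload;
        lock (padlock)
        {
            if (queue.Count == 0)
            {
                return; // queue drained - this thread is done
            }
            fileToDownload = queue.Dequeue();
        }
        download_FTP(fileToDownload);
    }
};

List<Thread> threads = new List<Thread>();
for (int i = 0; i < 5; i++)
{
    Thread t = new Thread(worker);
    threads.Add(t);
    t.Start();
}
threads.ForEach(t => t.Join()); // wait for all five download threads to finish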
Look up ParameterizedThreadStart:
// ParameterizedThreadStart requires download_FTP to take a single
// object parameter (cast it back to string inside the method).
List<System.Threading.Thread> threadsToUse = new List<System.Threading.Thread>();
foreach (string str1 in files)
{
    var aThread = new System.Threading.Thread(
        new System.Threading.ParameterizedThreadStart(download_FTP));
    threadsToUse.Add(aThread);
    aThread.Start(str1); // note: this starts one thread per file, with no throttling
}
Thread.Join is also worth looking up: it lets the starting thread wait until a worker thread has finished.
There is also something else you might want to look at, which I'm still trying to fully grasp: asynchronous calls. With those you will know when the file has been downloaded; with a normal thread you're going to have to find another way to flag that it's finished.
This may or may not help your speed: if your line speed is low, then it won't help you much.
On the other hand, some servers cap each connection to a certain speed, in which case multiple connections will, in theory, give a slight increase in speed. How much of an increase, though, I cannot say.
Hope this helps in some way.
I can add some experience to the comments already posted. In an app some years ago I had to generate a treeview of files on an FTP server. Listing files does not normally require actually downloading them, but some of the files were zipped folders and I had to download these and unzip them (sometimes recursively) to display the files/folders inside. For a multithreaded solution, this required a 'FolderClass' for each folder that could keep state and so handle both unzipped and zipped folders. To start the operation off, one of these was set up with the root folder and submitted to a P-C queue and a pool of threads. As each folder was LISTed and iterated, more FolderClass instances were submitted to the queue for each subfolder. When a FolderClass instance reached the end of its LIST, it PostMessaged itself (it was not C#, for which you would need BeginInvoke or the like) to the UI thread, where its info was added to the listview.
This activity was characterised by a lot of latency-sensitive TCP connect/disconnect with occasional download/unzip.
A pool of, IIRC, 4-6 threads (as already suggested by other posters) provided the best performance on the single-core system I had at the time and, in this particular case, was much faster than a single-threaded solution. I can't remember the figures exactly, but no stopwatch was needed to detect the performance boost - something like 3-4 times faster. On a modern box with multiple cores, where LISTs and unzips could occur concurrently, I would expect even more improvement.
There were some problems - the visual ListView component could not keep up with the incoming messages (because of the multiple threads, data arrived for apparently 'random' positions in the treeview and so required continual tree navigation for display), and so the UI tended to freeze during the operation. Another problem was detecting when the operation had actually finished. These snags are probably not relevant to your download-many-small-files app.
Conclusion - I expect that downloading a lot of small files is going to be faster if multithreaded with multiple connections, if only from mitigating the connect/disconnect latency which can be larger than the actual data download time. In the extreme case of a satellite connection with high speed but very high latency, a large thread pool would provide a massive speedup.
Note the valid caveats from the other posters - if the server, (or its admin), disallows or gets annoyed at the multiple connections, you may get no boost, limited bandwidth or a nasty email from the admin!
Rgds,
Martin
