Out of Memory Threading - Perf Test Tool - c#

I'm creating a tool to load test (sends http: GETs) and it runs fine but eventually dies because of an out of memory error.
ASK: How can I reset the threads so this loop can continually run and not err?
static void Main(string[] args)
{
System.Net.ServicePointManager.DefaultConnectionLimit = 200;
while (true)
{
for (int i = 0; i < 1000; i++)
{
new Thread(LoadTest).Start(); //<-- EXCEPTION!.eventually errs out of memory
}
Thread.Sleep(2);
}
}
static void LoadTest()
{
string url = "http://myserv.com/api/dev/getstuff?whatstuff=thisstuff";
// Sends http get from above url ... and displays the repose in the console....
}

You are instantiating Threads left right and centre. This is likely you problem. You want to replace the
new Thread(LoadTest).Start();
with
Task.Run(LoadTest);
This will run your LoadTest on a Thread in the ThreadPool, instead of using resources to create a new Thread each time. HOWEVER. This will then expose a different issue.
Threads on the ThreadPool are a limited resource and you want to return Threads to the ThreadPool as soon as possible. I assume you are using the synchronous download methods as opposed to the APM methods. This means that whilst the request is being sent out to the server, the thread spawning the request is sleeping as opposed to going off to do some other work.
Either use (assuming .net 4.5)
var client = new WebClient();
var response = await client.DownloadStringTaskAsync(url);
Console.WriteLine(response);
Or use a callback (if not .net 4.5)
var client = new WebClient();
client.OnDownloadStringCompleted(x => Console.WriteLine(x));
client.BeginDownloadString(url);

Use a ThreadPool and use QueueUserWorkItem instead of creating thousands of threads. Threads are expensive objects and it is no surprise you are running out of memory and besides you won't be able to have any performance (in your test tool) with so many threads.

You code snippet creates lots of threads and no wonder it eventually runs out of memory. It would be better to use a Thread Pool here.
You code would look like this:
static void Main(string[] args)
{
System.Net.ServicePointManager.DefaultConnectionLimit = 200;
ThreadPool.SetMaxThreads(500, 300);
while (true)
{
ThreadPool.QueueUserWorkItem(LoadTest);
}
}
static void LoadTest(object state)
{
string url = "http://myserv.com/api/dev/getstuff?whatstuff=thisstuff";
// Sends http get from above url ... and displays the repose in the console....
}

Related

Free up memory once Thread finished its task

Put break point before Thread start and you will notice console consuming about 5 to 8 MB memory but once Thread started it spike to 17 to 20 MB memory. And this memory stay used until close console. How can i freeup memory after Thread finished it task? Any better solution?
Now question is: Why i need it since garbage collector will automatically free up memory when needed. I need it because i am doing web scraping and i got a global class to store all scraped html text there and i have to scrape like 10k pages and store that html to global class. What happen is: when i run this app after scrape 500 html data to global class it eat almost 100% of my pc RAM which is 20 GB. So i need to free up RAM. I cant close console app to free up ram bcoz i have some calculation after collect all html.
class DemoData
{
public int Id { get; set; }
public string Text { get; set; }
public static List<DemoData> data = new List<DemoData>();
}
class Program
{
public static void Main()
{
for (var i = 0; i < 5000; i++)
{
DemoData.data.Add(new DemoData
{
Id = i,
Text = "something....",
});
}
foreach (var item in DemoData.data)
{
var t = new Thread(new ThreadStart(DoSomething));//put break point here and see.
t.Name = item.Id.ToString(); ;
t.Start();
}
Console.WriteLine("wait");
Console.ReadLine();
}
public static void DoSomething()
{
Thread thr = Thread.CurrentThread;
Console.WriteLine(thr.Name);
}
}
you just need to wait for for the garbage collector to run, your current app isn't complex enough to require this except on close so you aren't seeing this occur,
so the framework will fire off the GC when it feels it is needed, this will then look though the callstack and decide which objects it no longer needs and delete them freeing up memory, this will happen completely automatically with out you needing to do anything.
if you want to help the GC out you can check your variable scopes so that the GC can see its not needed anymore because its completely out of scope, so avoiding global variables, not creating links between data objects that aren't needed making correct choices between value and reference types, ect
however if you ever do end up in a position where you need to manually fire the GC you can call GC.Collect()
see
https://learn.microsoft.com/en-us/dotnet/api/system.gc.collect?view=net-5.0
You might need to use a memory profiler to check for potential memory leaks.
A possible reason for your issues is the large number of threads, there is no reason to use more threads than there are cpu cores. Each thread used will need some memory for a stack and other house keeping.
One fairly simple way would be to put addresses that needs visiting in a collection, and use a parallel.Foreach loop. This will try to adjust the number of threads used to maximize thruput. More complex variants could use one of the concurrent collections and multiple consumers. There is also async variants of many IO calls to avoid the memory overhead of blocking a thread while waiting for IO. I would recommend reading some examples of multiple producers/consumers pattern for more details.
if your problem is feeding the data into the threads for processing then i would suggest you look at https://devblogs.microsoft.com/dotnet/an-introduction-to-system-threading-channels/ this allows you to then create a thread safe processing buffer
here is working example, note this is using the .net 5 syntax
using System;
using System.Threading.Channels;
using System.Net.Http;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Linq;
var sites = Enumerable.Range(0, 5000).Select(i => #"http:\\www.example.com");
//create thread safe buffer no limit on size
var sitebuffer = Channel.CreateUnbounded<string>();
//create thread safe buffer limited to 10 elements
var htmlbuffer = Channel.CreateBounded<string>(10);
async Task Feed()
{
//while the buffer hasn't closed, wait for new data to be available
while(await sitebuffer.Reader.WaitToReadAsync())
{
//read the next available url from the buffer
var uri = await sitebuffer.Reader.ReadAsync();
var http = new HttpClient();
var html= await http.GetAsync(uri);
Console.WriteLine("reading site");
//load the return text to the htmlbuffer, if buffer is full wait for space
await htmlbuffer.Writer.WriteAsync(await html.Content.ReadAsStringAsync());
Console.WriteLine("reading site complete");
}
}
async Task Process()
{
//while the buffer hasn't closed, wait for new data to be available
while (await htmlbuffer.Reader.WaitToReadAsync())
{
//read html from buffer send to doSomething then read next element
var html = await htmlbuffer.Reader.ReadAsync();
await doSomethingWithHTML(html);
}
}
async Task doSomethingWithHTML(string html)
{
await Task.Delay(2);
Console.WriteLine("done something");
}
//start 4 feeders threads
var feeders = new[]
{
Feed(),
Feed(),
Feed(),
Feed(),
};
//start 2 worker threads
var workers = new[]
{
Process(),
Process(),
};
//start of feeding in sites
foreach (var item in sites)
{
await sitebuffer.Writer.WriteAsync(item);
}
//mark that all sites have been fed into the systems
sitebuffer.Writer.Complete();
//wait for all feeders to finish
await Task.WhenAll(feeders);
//mark that no more sites will be read
htmlbuffer.Writer.Complete();
//wait for all workers to finish
await Task.WhenAll(workers);
Console.WriteLine("all tasks complete");
notice the async Task's and awaits this is a newer wrapper around threads that simplifies a lot of the complexity in managing threads

Multiple concurrent webservice requests in ASP.NET using threads

I need to make multiple requests to a webservice at the same time. I thought of creating a thread for each request. Can this be done in ASP.NET v3.5?
Example:
for(int i = 0; i<=10; i++)
{
"Do each Request in a separate thread..."
}
While the oportunities to what you can use vary depending of what and where you would like to use paralelism in your code. I would suggest that you start off by using the new Task class from .NET 4.0.
Example would be:
Task backgroundProcess = new Task(() =>
{
service.CallMethod();
});
This will get you started. After that I suggest that you do some reading because this is a very broad subject. Try this link:
http://www.albahari.com/threading/
The following pattern can be used to spin off multiple requests as work items in the ThreadPool. It will also wait for all of those works items to complete before proceeding.
int pending = requests.Count;
var finished = new ManualResetEvent(false);
foreach (Request request in requests)
{
Request capture = request; // Required to close over the loop variable correctly.
ThreadPool.QueueUserWorkItem(
(state) =>
{
try
{
ProcessRequest(capture);
}
finally
{
if (Interlocked.Decrement(ref pending) == 0)
{
finished.Set(); // Signal completion of all work items.
}
}
}, null);
}
finished.WaitOne(); // Wait for all work items to complete.
You could also download the Reactive Extensions backport for 3.5 and then use Parallel.For to do the same thing.
If the webservice calls are asynchronous I don't see how have multiple threads will accomplish anything.
In .NET 3.5 you can use ThreadPool QueueUserWorkItem method. Plenty of examples on the web.

Run x number of web requests simultaneously

Our company has a web service which I want to send XML files (stored on my drive) via my own HTTPWebRequest client in C#. This already works. The web service supports 5 synchronuous requests at the same time (I get a response from the web service once the processing on the server is completed). Processing takes about 5 minutes for each request.
Throwing too many requests (> 5) results in timeouts for my client. Also, this can lead to errors on the server side and incoherent data. Making changes on the server side is not an option (from different vendor).
Right now, my Webrequest client will send the XML and wait for the response using result.AsyncWaitHandle.WaitOne();
However, this way, only one request can be processed at a time although the web service supports 5. I tried using a Backgroundworker and Threadpool but they create too many requests at same, which make them useless to me. Any suggestion, how one could solve this problem? Create my own Threadpool with exactly 5 threads? Any suggestions, how to implement this?
The easy way is to create 5 threads ( aside: that's an odd number! ) that consume the xml files from a BlockingCollection.
Something like:
var bc = new BlockingCollection<string>();
for ( int i = 0 ; i < 5 ; i++ )
{
new Thread( () =>
{
foreach ( var xml in bc.GetConsumingEnumerable() )
{
// do work
}
}
).Start();
}
bc.Add( xml_1 );
bc.Add( xml_2 );
...
bc.CompleteAdding(); // threads will end when queue is exhausted
If you're on .Net 4, this looks like a perfect fit for Parallel.ForEach(). You can set its MaxDegreeOfParallelism, which means you are guaranteed that no more items are processed at one time.
Parallel.ForEach(items,
new ParallelOptions { MaxDegreeOfParallelism = 5 },
ProcessItem);
Here, ProcessItem is a method that processes one item by accessing your server and blocking until the processing is done. You could use a lambda instead, if you wanted.
Creating your own threadpool of five threads isn't tricky - Just create a concurrent queue of objects describing the request to make, and have five threads that loop through performing the task as needed. Add in an AutoResetEvent and you can make sure they don't spin furiously while there are no requests that need handling.
It can though be tricky to return the response to the correct calling thread. If this is the case for how the rest of your code works, I'd take a different approach and create a limiter that acts a bit like a monitor but allowing 5 simultaneous threads rather than only one:
private static class RequestLimiter
{
private static AutoResetEvent _are = new AutoResetEvent(false);
private static int _reqCnt = 0;
public ResponseObject DoRequest(RequestObject req)
{
for(;;)
{
if(Interlocked.Increment(ref _reqCnt) <= 5)
{
//code to create response object "resp".
Interlocked.Decrement(ref _reqCnt);
_are.Set();
return resp;
}
else
{
if(Interlocked.Decrement(ref _reqCnt) >= 5)//test so we don't end up waiting due to race on decrementing from finished thread.
_are.WaitOne();
}
}
}
}
You could write a little helper method, that would block the current thread until all the threads have finished executing the given action delegate.
static void SpawnThreads(int count, Action action)
{
var countdown = new CountdownEvent(count);
for (int i = 0; i < count; i++)
{
new Thread(() =>
{
action();
countdown.Signal();
}).Start();
}
countdown.Wait();
}
And then use a BlockingCollection<string> (thread-safe collection), to keep track of your xml files. By using the helper method above, you could write something like:
static void Main(string[] args)
{
var xmlFiles = new BlockingCollection<string>();
// Add some xml files....
SpawnThreads(5, () =>
{
using (var web = new WebClient())
{
web.UploadFile(xmlFiles.Take());
}
});
Console.WriteLine("Done");
Console.ReadKey();
}
Update
An even better approach would be to upload the files async, so that you don't waste resources on using threads for an IO task.
Again you could write a helper method:
static void SpawnAsyncs(int count, Action<CountdownEvent> action)
{
var countdown = new CountdownEvent(count);
for (int i = 0; i < count; i++)
{
action(countdown);
}
countdown.Wait();
}
And use it like:
static void Main(string[] args)
{
var urlXML = new BlockingCollection<Tuple<string, string>>();
urlXML.Add(Tuple.Create("http://someurl.com", "filename"));
// Add some more to collection...
SpawnAsyncs(5, c =>
{
using (var web = new WebClient())
{
var current = urlXML.Take();
web.UploadFileCompleted += (s, e) =>
{
// some code to mess with e.Result (response)
c.Signal();
};
web.UploadFileAsyncAsync(new Uri(current.Item1), current.Item2);
}
});
Console.WriteLine("Done");
Console.ReadKey();
}

Why does WebClient.DownloadStringTaskAsync() block ? - new async API/syntax/CTP

For some reason there is a pause after the program below starts. I believe that WebClient().DownloadStringTaskAsync() is the cause.
class Program
{
static void Main(string[] args)
{
AsyncReturnTask();
for (int i = 0; i < 15; i++)
{
Console.WriteLine(i);
Thread.Sleep(100);
}
}
public static async void AsyncReturnTask()
{
var result = await DownloadAndReturnTaskStringAsync();
Console.WriteLine(result);
}
private static async Task<string> DownloadAndReturnTaskStringAsync()
{
return await new WebClient().DownloadStringTaskAsync(new Uri("http://www.weather.gov"));
}
}
As far as I understand my program should start counting from 0 to 15 immediately. Am I doing something wrong?
I had the same problem with the original Netflix download sample (which you get with CTP) - after pressing the search button the UI first freezes - and after some time it is responsive while loadning the next movies. And I believe it didn't freeze in Anders Hejlsberg's presentation at PDC 2010.
One more thing. When instead of
return await new WebClient().DownloadStringTaskAsync(new Uri("http://www.weather.gov"));
I use my own method:
return await ReturnOrdinaryTask();
Which is:
public static Task<string> ReturnOrdinaryTask()
{
var t = Task.Factory.StartNew(() =>
{
for (int i = 0; i < 10; i++)
{
Console.WriteLine("------------- " + i.ToString());
Thread.Sleep(100);
}
return "some text";
});
return t;
}
It works as it should. I mean it doesn't load anything, but it starts immediately and doesn't block the main thread, while doing its work.
Edit
OK, what I believe right now is: the WebClient.DownloadStringTaskAsync function is screwed up. It should work without the initial blocking period, like this:
static void Main(string[] args)
{
WebClient cli = new WebClient();
Task.Factory.StartNew(() =>
{
cli.DownloadStringCompleted += (sender, e) => Console.WriteLine(e.Result);
cli.DownloadStringAsync(new Uri("http://www.weather.gov"));
});
for (int i = 0; i < 100; i++)
{
Console.WriteLine(i);
Thread.Sleep(100);
}
}
While your program does block for a while, it does resume execution in the for loop, before the result is returned from the remote server.
Remember that the new async API is still single-threaded. So WebClient().DownloadStringTaskAsync() still needs to run on your thread until the request has been prepared and sent to the server, before it can await and yield execution back to your program flow in Main().
I think the results you are seeing are due to the fact that it takes some time to create and send the request out from your machine. First when that has finished, the implementation of DownloadStringTaskAsync can wait for network IO and the remote server to complete, and can return execution to you.
On the other hand, your RunOrdinaryTask method just initializes a task and gives it a workload, and tells it to start. Then it returns immediately. That is why you don't see a delay when using RunOrdinaryTask.
Here are some links on the subject: Eric Lippert's blog (one of the language designers), as well as Jon Skeet's initial blog post about it. Eric has a series of 5 posts about continuation-passing style, which really is what async and await is really about. If you want to understand the new feature in detail, you might want to read Eric's posts about CPS and Async. Anyways, both links above does a good job on explaining a very important fact:
Asynchronous != parallel
In other words, async and await does not spin up new threads for you. They just lets you resume execution of your normal flow, when you are doing a blocking operation - times where your CPU would just sit and do nothing in a synchronous program, waiting for some external operation to complete.
Edit
Just to be clear about what is happening: DownloadStringTaskAsync sets up a continuation, then calls WebClient.DownloadStringAsync, on the same thread, and then yields execution back to your code. Therefore, the blocking time you are seeing before the loop starts counting, is the time it takes DownloadStringAsync to complete. Your program with async and await is very close to be the equivalent of the following program, which exhibits the same behaviour as your program: An initial block, then counting starts, and somewhere in the middle, the async op finishes and prints the content from the requested URL:
static void Main(string[] args)
{
WebClient cli = new WebClient();
cli.DownloadStringCompleted += (sender, e) => Console.WriteLine(e.Result);
cli.DownloadStringAsync(new Uri("http://www.weather.gov")); // Blocks until request has been prepared
for (int i = 0; i < 15; i++)
{
Console.WriteLine(i);
Thread.Sleep(100);
}
}
Note: I am by no means an expert on this subject, so I might be wrong on some points. Feel free to correct my understanding of the subject, if you think this is wrong - I just looked at the PDC presentation and played with the CTP last night.
Are you sure the issue isn't related to the proxy configuration settings being detected from IE/Registry/Somewhere Slow?
Try setting webClient.Proxy = null (or specifying settings in app.config) and your "blocking" period should be minimal.
Are you pressing F5 or CTLR+F5 to run it? With F5 there's a delay for VS just to search for the symbols for AsyncCtpLibrary.dll...

Threading an unkown amount of threads in C#

I'm currently writing a sitemap generator that scrapes a site for urls and builds an xml sitemap. As most of the waiting is spent on requests to uri's I'm using threading, specifically the build in ThreadPool object.
In order to let the main thread wait for the unknown amount of threads to complete I have implemented the following setup. I don't feel this is a good solution though, can any threading gurus advise me of any problems this solution has, or suggest a better way to implement it?
The EventWaitHandle is set to EventResetMode.ManualReset
Here is the thread method
protected void CrawlUri(object o)
{
try
{
Interlocked.Increment(ref _threadCount);
Uri uri = (Uri)o;
foreach (Match match in _regex.Matches(GetWebResponse(uri)))
{
Uri newUri = new Uri(uri, match.Value);
if (!_uriCollection.Contains(newUri))
{
_uriCollection.Add(newUri);
ThreadPool.QueueUserWorkItem(_waitCallback, newUri);
}
}
}
catch
{
// Handle exceptions
}
finally
{
Interlocked.Decrement(ref _threadCount);
}
// If there are no more threads running then signal the waithandle
if (_threadCount == 0)
_eventWaitHandle.Set();
}
Here is the main thread method
// Request first page (based on host)
Uri root = new Uri(context.Request.Url.GetLeftPart(UriPartial.Authority));
// Begin threaded crawling of the Uri
ThreadPool.QueueUserWorkItem(_waitCallback, root);
Thread.Sleep(5000); // TEMP SOLUTION: Sleep for 5 seconds
_eventWaitHandle.WaitOne();
// Server the Xml Sitemap
context.Response.ContentType = "text/xml";
context.Response.Write(GetXml().OuterXml);
Any ideas are much appreciated :)
Well, first off you can create a ManualResetEvent that starts unset, so you don't have to sleep before waiting on it. Secondly you're going to need to put thread synchronization around your Uri collection. You could get a race condition where one two threads pass the "this Uri does not exist yet" check and they add duplicates. Another race condition is that two threads could pass the if (_threadCount == 0) check and they could both set the event.
Last, you can make the whole thing much more efficient by using the asynchronous BeginGetRequest. Your solution right now keeps a thread around to wait for every request. If you use async methods and callbacks, your program will use less memory (1MB per thread) and won't need to do context switches of threads nearly as much.
Here's an example that should illustrate what I'm talking about. Out of curiosity, I did test it out (with a depth limit) and it does work.
public class CrawlUriTool
{
private Regex regex;
private int pendingRequests;
private List<Uri> uriCollection;
private object uriCollectionSync = new object();
private ManualResetEvent crawlCompletedEvent;
public List<Uri> CrawlUri(Uri uri)
{
this.pendingRequests = 0;
this.uriCollection = new List<Uri>();
this.crawlCompletedEvent = new ManualResetEvent(false);
this.StartUriCrawl(uri);
this.crawlCompletedEvent.WaitOne();
return this.uriCollection;
}
private void StartUriCrawl(Uri uri)
{
Interlocked.Increment(ref this.pendingRequests);
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
request.BeginGetResponse(this.UriCrawlCallback, request);
}
private void UriCrawlCallback(IAsyncResult asyncResult)
{
HttpWebRequest request = asyncResult.AsyncState as HttpWebRequest;
try
{
HttpWebResponse response = (HttpWebResponse)request.EndGetResponse(asyncResult);
string responseText = this.GetTextFromResponse(response); // not included
foreach (Match match in this.regex.Matches(responseText))
{
Uri newUri = new Uri(response.ResponseUri, match.Value);
lock (this.uriCollectionSync)
{
if (!this.uriCollection.Contains(newUri))
{
this.uriCollection.Add(newUri);
this.StartUriCrawl(newUri);
}
}
}
}
catch (WebException exception)
{
// handle exception
}
finally
{
if (Interlocked.Decrement(ref this.pendingRequests) == 0)
{
this.crawlCompletedEvent.Set();
}
}
}
}
When doing this kind of logic I generally try to make an object representing each asynchronous task and the data it needs to run. I would typically add this object to the collection of tasks to be done. The thread pool gets these tasks secheduled, and I would let the object itself remove itself from the "to be done" collection when the task finishes, possibly signalling on the collection itself.
So you're finished when the "to be done" collection is empty; the main thread is probably awoken once by each task that finishes.
You could look into the CTP of the Task Parallel Library which should make this simpler for you. What you're doing can be divided into "tasks", chunks or units of work, and the TPL can parallelize this for you if you supply the tasks. It uses a thread pool internally as well, but it's easier to use and comes with a lot of options like waiting for all tasks to finish. Check out this Channel9 video where the possibilities are explained and where a demo is shown of traversing a tree recursively in parallel, which seems very applicable to your problem.
However, it's still a preview and won't be released until .NET 4.0, so it comes with no warranties and you'll have to manually include the supplied System.Threading.dll (found in the install folder) into your project and I don't know if that's an option to you.

Categories

Resources