I have a program which loops through an apps list.
Apps
--------
App1
App2
App3
Now, for each of them, I do an HTTP request to get its list of builds as XML.
So a request like,
http://example.com/getapplist.do?appid=App1
gives me a response like,
<appid name="App1">
<buildid BldName="Bld3" Status="Not Ready"></buildid>
<buildid BldName="Bld2" Status="Ready"></buildid>
<buildid BldName="Bld1" Status="Ready"></buildid>
</appid>
Now I get the highest build number with Status "Ready" and then make another web API call like,
http://example.com/getapplist.do?appid=App1&bldid=Bld2
This gives me a response like,
<buildinfo appid="App1" buildid="Bld2" value="someinfo"></buildinfo>
I feed these into internal data tables. But this program now takes a painfully long time to complete (3 hours), since I have close to 2000 app ids and there are 2 web requests for each id. I tried to solve this using a BackgroundWorker as specified here. I thought of collating all the info from the HTTP responses into a single XML and then using that XML for further processing, but that throws the error,
file being used by another process
So my code looks like,
if (!backgroundWorker1.IsBusy)
{
    for (int i = 0; i < appList.Count; i++)
    {
        BackgroundWorker bgw = new BackgroundWorker();
        bgw.WorkerReportsProgress = true;
        bgw.WorkerSupportsCancellation = true;
        bgw.DoWork += new DoWorkEventHandler(bgw_DoWork);
        bgw.ProgressChanged += new ProgressChangedEventHandler(bgw_ProgressChanged);
        bgw.RunWorkerCompleted += new RunWorkerCompletedEventHandler(bgw_RunWorkerCompleted);

        //Start The Worker
        bgw.RunWorkerAsync();
    }
}
And the DoWork function picks the tag values and puts them into an XML file.
What is the best way to get the app/build-info details from all the HTTP responses, across all the background workers, into a common file?
HTTP requests are IO-bound and asynchronous by nature; there is no reason to use background workers to accomplish what you need.
You can take advantage of async-await, which is available on .NET 4 via Microsoft.Bcl.Async, together with HttpClient:
private async Task ProcessAppsAsync(List<string> appList)
{
    var httpClient = new HttpClient();

    // This will execute your IO requests concurrently,
    // no need for extra threads.
    var appListTasks = appList
        .Select(app => httpClient.GetStringAsync(
            "http://example.com/getapplist.do?appid=" + app))
        .ToList();

    // Wait asynchronously for all of them to finish
    await Task.WhenAll(appListTasks);

    // Process each Task.Result and aggregate them into an xml
    using (var streamWriter = new StreamWriter(@"PathToFile"))
    {
        foreach (var appTask in appListTasks)
        {
            await streamWriter.WriteAsync(appTask.Result);
        }
    }
}
This way, you process all requests concurrently and handle results from all of them once they've completed.
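If you also want to fold in the second, per-build request, you can chain both calls per app in one async method and pick the highest "Ready" build with LINQ to XML. A rough sketch, using the URL formats from the question and assuming build names sort correctly as strings (GetBuildInfoAsync is a made-up helper name):
private static async Task<string> GetBuildInfoAsync(HttpClient httpClient, string appId)
{
    // First request: the list of builds for this app.
    var listXml = await httpClient.GetStringAsync(
        "http://example.com/getapplist.do?appid=" + appId);

    // Pick the highest build whose Status is "Ready".
    var highestReady = XDocument.Parse(listXml).Root
        .Elements("buildid")
        .Where(b => (string)b.Attribute("Status") == "Ready")
        .OrderByDescending(b => (string)b.Attribute("BldName"))
        .FirstOrDefault();

    if (highestReady == null)
        return null;

    // Second request: the build info for that build.
    return await httpClient.GetStringAsync(
        "http://example.com/getapplist.do?appid=" + appId +
        "&bldid=" + (string)highestReady.Attribute("BldName"));
}
You could then kick off one of these tasks per app and await Task.WhenAll over them, exactly as above.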
This solution works for .NET 2.0 and up by using the async methods of the WebClient class, a counter that is decremented via the Interlocked class, and an ordinary lock to serialize writing the results to the file.
var writer = XmlWriter.Create(
    new FileStream("api.xml", FileMode.Create),
    new XmlWriterSettings { CloseOutput = true }); // also close the FileStream when the writer closes

writer.WriteStartElement("apps"); // root element in the xml

// lock for one write
object writeLock = new object();

// this many calls
int counter = appList.Count;

foreach (var app in appList)
{
    var wc = new WebClient();
    var url = String.Format(
        "http://example.com/getapplist.do?appid={0}&bldid=Bld2",
        app);
    wc.DownloadDataCompleted += (o, args) =>
    {
        try
        {
            var xd = new XmlDocument();
            xd.LoadXml(Encoding.UTF8.GetString(args.Result));
            lock (writeLock)
            {
                xd.WriteContentTo(writer);
            }
        }
        finally
        {
            // count down our counter in a thread safe manner
            if (Interlocked.Decrement(ref counter) == 0)
            {
                // this was the last one, close nicely
                writer.WriteEndElement();
                writer.Close();
            }
        }
    };
    wc.DownloadDataAsync(new Uri(url));
}
Related
I want to increase the performance of a procedure that invokes a web service multiple times sequentially and stores the results in a list.
Since a single call to the WS lasts about 1 second and I need to make around 300 calls, doing the job sequentially takes 300 seconds. That's why I changed the implementation to multithreading, using the following piece of code:
List<WCFResult> resultList = new List<WCFResult>();
List<Task> tasks = new List<Task>();

using (var ws = new WCFService(binding, endpoint))
{
    foreach (var singleElement in listOfelements)
    {
        Action action = () =>
        {
            var singleResult = ws.Call(singleElement);
            resultList.Add(singleResult); // note: List<T>.Add is not thread-safe
        };
        tasks.Add(Task.Factory.StartNew(action, TaskCreationOptions.LongRunning));
    }
}

Task.WaitAll(tasks.ToArray());
//Do other stuff with the resultList...
Using this code I achieve to save 0.1 seconds per single element which is less than I thought, do you know any further optimization I can do? Or can you share an alternative?
Using the following code, all the requests are handled in half the time:
ParallelOptions ops = new ParallelOptions();
ops.MaxDegreeOfParallelism = 16;

ConcurrentBag<WCFResult> resultList = new ConcurrentBag<WCFResult>();

Parallel.ForEach(allItems, ops, item =>
{
    var ws = new WCFClient(binding, endpoint);
    var result = ws.Call(item);
    ws.Close();
    resultList.Add(result); // ConcurrentBag is safe for concurrent adds
});
//Do other stuff with the resultList...
Mission accomplished. I also modified the result list to be a ConcurrentBag instead of a List.
I have a TCP listener which listens for and writes data from the server. I use a BlockingCollection to store the data. Here I don't know when the stream ends, so my FileStream is always open.
Part of my code is:
private static BlockingCollection<string> Buffer = new BlockingCollection<string>();

async Task Process()
{
    var consumer = Task.Factory.StartNew(() => WriteData());
    while (true)
    {
        string request = await reader.ReadLineAsync();
        Buffer.Add(request);
    }
}

void WriteData()
{
    FileStream fStream = new FileStream(filename, FileMode.Append,
        FileAccess.Write, FileShare.Write, 16392);
    foreach (var val in Buffer.GetConsumingEnumerable(token))
    {
        // encode once so the byte count matches the encoded length
        var bytes = Encoding.UTF8.GetBytes(val);
        fStream.Write(bytes, 0, bytes.Length);
        fStream.Flush();
    }
}
The problem is that I cannot dispose of the FileStream within the loop; otherwise I would have to create a FileStream for each line, and the loop may never end.
This would be much easier in .NET 4.5 if you used a DataFlow ActionBlock. An ActionBlock accepts and buffers incoming messages and processes them asynchronously using one or more Tasks.
You could write something like this:
public static async Task ProcessFile(string sourceFileName, string targetFileName)
{
    //Pass the target stream as part of the message to avoid globals
    var block = new ActionBlock<Tuple<string, FileStream>>(async tuple =>
    {
        var line = tuple.Item1;
        var stream = tuple.Item2;
        //Encode once so the byte count matches the encoded length
        var bytes = Encoding.UTF8.GetBytes(line);
        await stream.WriteAsync(bytes, 0, bytes.Length);
    });

    //Post lines to block
    using (var targetStream = new FileStream(targetFileName, FileMode.Append,
        FileAccess.Write, FileShare.Write, 16392))
    {
        using (var sourceStream = File.OpenRead(sourceFileName))
        {
            await PostLines(sourceStream, targetStream, block);
        }

        //Tell the block we are done
        block.Complete();

        //And wait for it to finish
        await block.Completion;
    }
}

private static async Task PostLines(FileStream sourceStream, FileStream targetStream,
    ActionBlock<Tuple<string, FileStream>> block)
{
    using (var reader = new StreamReader(sourceStream))
    {
        while (true)
        {
            var line = await reader.ReadLineAsync();
            if (line == null)
                break;
            var tuple = Tuple.Create(line, targetStream);
            block.Post(tuple);
        }
    }
}
Most of the code deals with reading each line and posting it to the block. By default, an ActionBlock uses only a single Task to process one message at a time, which is fine in this scenario. More tasks can be used if needed to process data in parallel.
Once all lines are read, we notify the block with a call to Complete and wait for it to finish processing with await block.Completion.
Once the block's Completion task finishes, we can close the target stream.
The beauty of the DataFlow library is that you can link multiple blocks together, to create a pipeline of processing steps. ActionBlock is typically the final step in such a chain. The library takes care to pass data from one block to the next and propagate completion down the chain.
For example, one step can read files from a log, a second can parse them with a regex to find specific patterns (eg error messages) and pass them on, a third can receive the error messages and write them to another file. Each step will execute on a different thread, with intermediate messages buffered at each step.
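A minimal sketch of that three-step pipeline might look like this (the file names and the error pattern are made up for illustration):
// Step 1: expand each file path into its lines
var readLines = new TransformManyBlock<string, string>(
    path => File.ReadLines(path));

// Step 2: keep only lines that match the error pattern
var findErrors = new TransformManyBlock<string, string>(
    line => Regex.IsMatch(line, "ERROR")
        ? new[] { line }
        : Enumerable.Empty<string>());

// Step 3: append matches to another file
var writeErrors = new ActionBlock<string>(
    line => File.AppendAllText("errors.log", line + Environment.NewLine));

// Link the blocks and let completion flow down the chain
var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
readLines.LinkTo(findErrors, linkOptions);
findErrors.LinkTo(writeErrors, linkOptions);

readLines.Post("app.log");
readLines.Complete();
writeErrors.Completion.Wait();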
I'm currently using WebClient to open a few websites, but after a period of time I start receiving Error 403 messages.
I'm assuming it's because I'm hitting their servers too frequently/quickly, and that all I need to do is add a Thread.Sleep interval between requests.
Since I'm having to do this a large number of times, is there a suggestion on how to handle the throttling without it taking a tremendous amount of time?
For instance, 3 seconds between requests will end up taking me about 3 hours.
So the question is: is Thread.Sleep really the right solution for this? And if it is, what is a good time-frame for it?
As a side note, I have also used HttpWebRequest and ran into the same problem. I still use it in other code projects and am hoping to utilize the same solution (or close to it) for those projects as well.
Try running the requests in parallel:
public static void RunRequest(Uri uri, Action<string> onCompleted)
{
    var client = new WebClient();
    client.DownloadStringCompleted += (sender, e) => onCompleted(e.Result);
    client.DownloadStringAsync(uri);
}
Warning: this code is not tested and I have never used WebClient.
private const int _maxParallelRequest = 10;
private int _requestCount = 0;
private readonly object _sync = new object();
private readonly AutoResetEvent _ev = new AutoResetEvent(false);

while (true)
{
    foreach (var uri in _allYourUris)
    {
        // block until fewer than _maxParallelRequest requests are in flight,
        // so the current uri is never skipped
        while (true)
        {
            lock (_sync)
            {
                if (_requestCount < _maxParallelRequest)
                {
                    ++_requestCount;
                    break;
                }
            }
            _ev.WaitOne();
        }
        RunRequest(uri, r =>
        {
            lock (_sync) { --_requestCount; }
            _ev.Set(); // wake one waiting iteration
            // handle r
        });
    }
    Thread.Sleep(3000);
}
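A simpler way to get the same throttling is to gate the loop with a semaphore instead of hand-rolled counters and locks (a sketch, reusing the RunRequest helper above):
// At most 10 requests in flight at any time
var gate = new Semaphore(10, 10);

foreach (var uri in _allYourUris)
{
    gate.WaitOne(); // blocks while 10 requests are pending
    RunRequest(uri, r =>
    {
        try
        {
            // handle r
        }
        finally
        {
            gate.Release(); // free a slot for the next request
        }
    });
}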
Attempting to write an HTML crawler using the Async CTP, I have gotten stuck on how to write a recursion-free method for accomplishing this.
This is the code I have so far.
private readonly ConcurrentStack<LinkItem> _LinkStack;
private readonly Int32 _MaxStackSize;
private readonly WebClient client = new WebClient();

Func<string, string, Task<List<LinkItem>>> DownloadFromLink = async (BaseURL, uri) =>
{
    string html = await client.DownloadStringTaskAsync(uri);
    return LinkFinder.Find(html, BaseURL);
};

Action<LinkItem> DownloadAndPush = async (o) =>
{
    List<LinkItem> result = await DownloadFromLink(o.BaseURL, o.Href);
    if (this._LinkStack.Count() + result.Count <= this._MaxStackSize)
    {
        this._LinkStack.PushRange(result.ToArray());
        o.Processed = true;
    }
};

Parallel.ForEach(this._LinkStack, (o) =>
{
    DownloadAndPush(o);
});
But obviously this doesn't work as I would hope, because at the time Parallel.ForEach executes the first (and only) iteration I have just 1 item. The simplest approach I can think of is to make the ForEach recursive, but I can't (I don't think) do that, as I would quickly run out of stack space.
Could anyone please guide me as to how I can restructure this code, to create what I would describe as a recursive continuation that adds items until either the MaxStackSize is reached or the system runs out of memory?
I think the best way to do something like this using C# 5/.NET 4.5 is to use TPL Dataflow. There is even a walkthrough on how to implement a web crawler using it.
Basically, you create one "block" that takes care of downloading one URL and getting the link from it:
var cts = new CancellationTokenSource();

Func<LinkItem, Task<IEnumerable<LinkItem>>> downloadFromLink =
    async link =>
    {
        // WebClient is not guaranteed to be thread-safe,
        // so we shouldn't use one shared instance
        var client = new WebClient();
        string html = await client.DownloadStringTaskAsync(link.Href);
        return LinkFinder.Find(html, link.BaseURL);
    };

var linkFinderBlock = new TransformManyBlock<LinkItem, LinkItem>(
    downloadFromLink,
    new ExecutionDataflowBlockOptions
    { MaxDegreeOfParallelism = 4, CancellationToken = cts.Token });
You can set MaxDegreeOfParallelism to any value you want. It says at most how many URLs can be downloaded concurrently. If you don't want to limit it at all, you can set it to DataflowBlockOptions.Unbounded.
Then you create one block that processes all the downloaded links somehow, like storing them all in a list. It can also decide when to cancel downloading:
var links = new List<LinkItem>();
var storeBlock = new ActionBlock<LinkItem>(
    linkItem =>
    {
        links.Add(linkItem);
        if (links.Count == maxSize)
            cts.Cancel();
    });
Since we didn't set MaxDegreeOfParallelism, it defaults to 1. That means using a collection that is not thread-safe should be okay here.
We create one more block: it will take a link from linkFinderBlock, and pass it both to storeBlock and back to linkFinderBlock.
var broadcastBlock = new BroadcastBlock<LinkItem>(li => li);
The lambda in its constructor is a "cloning function". You can use it to create a clone of the item if you want to, but it shouldn't be necessary here, since we don't modify the LinkItem after creation.
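If you did mutate items downstream, the cloning function would be the place to make a copy; something like this sketch, assuming LinkItem exposes settable Href and BaseURL properties:
// Hand each target its own copy instead of a shared reference
var broadcastBlock = new BroadcastBlock<LinkItem>(
    li => new LinkItem { Href = li.Href, BaseURL = li.BaseURL });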
Now we can connect the blocks together:
linkFinderBlock.LinkTo(broadcastBlock);
broadcastBlock.LinkTo(storeBlock);
broadcastBlock.LinkTo(linkFinderBlock);
Then we can start processing by giving the first item to linkFinderBlock (or broadcastBlock, if you want to also send it to storeBlock):
linkFinderBlock.Post(firstItem);
And finally wait until the processing is complete:
try
{
    linkFinderBlock.Completion.Wait();
}
catch (AggregateException ex)
{
    if (!(ex.InnerException is TaskCanceledException))
        throw;
}
Our company has a web service to which I want to send XML files (stored on my drive) via my own HttpWebRequest client in C#. This already works. The web service supports 5 simultaneous requests (I get a response from the web service once the processing on the server is completed). Processing takes about 5 minutes for each request.
Throwing too many requests (> 5) results in timeouts for my client. It can also lead to errors on the server side and incoherent data. Making changes on the server side is not an option (it's from a different vendor).
Right now, my web request client sends the XML and waits for the response using result.AsyncWaitHandle.WaitOne();
However, this way only one request can be processed at a time, although the web service supports 5. I tried using a BackgroundWorker and ThreadPool, but they create too many requests at the same time, which makes them useless to me. Any suggestion how one could solve this problem? Create my own thread pool with exactly 5 threads? Any suggestions how to implement this?
The easy way is to create 5 threads (aside: that's an odd number!) that consume the xml files from a BlockingCollection.
Something like:
var bc = new BlockingCollection<string>();

for (int i = 0; i < 5; i++)
{
    new Thread(() =>
    {
        foreach (var xml in bc.GetConsumingEnumerable())
        {
            // do work
        }
    }).Start();
}

bc.Add(xml_1);
bc.Add(xml_2);
...
bc.CompleteAdding(); // threads will end when queue is exhausted
If you're on .NET 4, this looks like a perfect fit for Parallel.ForEach(). You can set its MaxDegreeOfParallelism, which guarantees that no more than that many items are processed at one time.
Parallel.ForEach(items,
new ParallelOptions { MaxDegreeOfParallelism = 5 },
ProcessItem);
Here, ProcessItem is a method that processes one item by accessing your server and blocking until the processing is done. You could use a lambda instead, if you wanted.
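For example, the lambda form might look like this sketch (uploadUrl and item.FilePath are placeholders for however you address the service):
Parallel.ForEach(items,
    new ParallelOptions { MaxDegreeOfParallelism = 5 },
    item =>
    {
        using (var client = new WebClient())
        {
            // blocks until the service responds, so at most 5 requests run at once
            client.UploadFile(uploadUrl, item.FilePath);
        }
    });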
Creating your own thread pool of five threads isn't tricky: just create a concurrent queue of objects describing the request to make, and have five threads that loop, performing the tasks as needed. Add in an AutoResetEvent and you can make sure they don't spin furiously while there are no requests that need handling. A sketch of that approach follows.
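For instance (RequestObject and ProcessRequest are hypothetical placeholders):
var queue = new ConcurrentQueue<RequestObject>();
var signal = new AutoResetEvent(false);

for (int i = 0; i < 5; i++)
{
    new Thread(() =>
    {
        while (true)
        {
            RequestObject req;
            if (queue.TryDequeue(out req))
                ProcessRequest(req); // perform the web request
            else
                signal.WaitOne();    // sleep until new work arrives
        }
    }) { IsBackground = true }.Start();
}

// Producer side:
// queue.Enqueue(someRequest);
// signal.Set();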
It can, though, be tricky to return the response to the correct calling thread. If that's what the rest of your code expects, I'd take a different approach and create a limiter that acts a bit like a monitor but allows 5 simultaneous threads rather than only one:
private static class RequestLimiter
{
    private static readonly AutoResetEvent _are = new AutoResetEvent(false);
    private static int _reqCnt = 0;

    public static ResponseObject DoRequest(RequestObject req)
    {
        for (;;)
        {
            if (Interlocked.Increment(ref _reqCnt) <= 5)
            {
                //code to create response object "resp".
                Interlocked.Decrement(ref _reqCnt);
                _are.Set();
                return resp;
            }
            else
            {
                // test so we don't end up waiting due to a race on decrementing from a finished thread
                if (Interlocked.Decrement(ref _reqCnt) >= 5)
                    _are.WaitOne();
            }
        }
    }
}
You could write a little helper method, that would block the current thread until all the threads have finished executing the given action delegate.
static void SpawnThreads(int count, Action action)
{
    var countdown = new CountdownEvent(count);

    for (int i = 0; i < count; i++)
    {
        new Thread(() =>
        {
            action();
            countdown.Signal();
        }).Start();
    }

    countdown.Wait();
}
And then use a BlockingCollection<string> (a thread-safe collection) to keep track of your xml files. Using the helper method above, you could write something like:
static void Main(string[] args)
{
    var xmlFiles = new BlockingCollection<string>();
    // Add some xml files....

    SpawnThreads(5, () =>
    {
        using (var web = new WebClient())
        {
            // "http://someurl.com" stands in for the service address
            web.UploadFile("http://someurl.com", xmlFiles.Take());
        }
    });

    Console.WriteLine("Done");
    Console.ReadKey();
}
Update
An even better approach would be to upload the files async, so that you don't waste resources on using threads for an IO task.
Again you could write a helper method:
static void SpawnAsyncs(int count, Action<CountdownEvent> action)
{
    var countdown = new CountdownEvent(count);

    for (int i = 0; i < count; i++)
    {
        action(countdown);
    }

    countdown.Wait();
}
And use it like:
static void Main(string[] args)
{
    var urlXML = new BlockingCollection<Tuple<string, string>>();
    urlXML.Add(Tuple.Create("http://someurl.com", "filename"));
    // Add some more to collection...

    SpawnAsyncs(5, c =>
    {
        // Don't wrap the client in a using block here: the upload is still
        // in flight when the block exits, so dispose it in the callback instead.
        var web = new WebClient();
        var current = urlXML.Take();
        web.UploadFileCompleted += (s, e) =>
        {
            // some code to mess with e.Result (response)
            c.Signal();
            web.Dispose();
        };
        web.UploadFileAsync(new Uri(current.Item1), current.Item2);
    });

    Console.WriteLine("Done");
    Console.ReadKey();
}