I want to download images from randomly generated URIs, and the fastest way I have found cannot be stopped once it starts.
I'm using a pregenerated List<string> of URIs and reaching about 400 images/minute (about 8 times more than with standard Threads), but I want it to continuously generate URIs and download new images until I tell it to pause. How can I achieve that?
private void StartButton_Click(object sender, EventArgs e)
{
List<string> ImageURI;
GenerateURIs(out ImageURI); // creates list of 1000 uris
ImageURI.AsParallel().WithDegreeOfParallelism(50).Sum(s => DownloadFile(s));
}
private int DownloadFile(string URI)
{
try
{
HttpWebRequest webrequest = (HttpWebRequest)WebRequest.Create(URI);
webrequest.Timeout = 10000;
webrequest.ReadWriteTimeout = 10000;
webrequest.Proxy = null;
webrequest.KeepAlive = false;
using (HttpWebResponse webresponse = (HttpWebResponse)webrequest.GetResponse())
using (Stream sr = webresponse.GetResponseStream())
{
DownloadedImages++;
using (MemoryStream ms = new MemoryStream())
{
sr.CopyTo(ms);
byte[] ImageBytes = ms.ToArray();
if (ImageBytes.Length == 503)
{
InvalidImages++;
return 0;
}
else
{
ValidImages++;
using (var Writer = new BinaryWriter(new FileStream("images/" + (++FilesIndex).ToString() + ".png", FileMode.Append, FileAccess.Write)))
{
Writer.Write(ImageBytes);
}
}
}
}
}
catch (Exception e)
{
return 0;
}
return 0;
}
First off, your current code isn't thread safe. InvalidImages, DownloadedImages and ValidImages all need synchronization.
That being said, you can do this more efficiently using async instead of threading. Since nearly all of the "work" in this case is IO bound, async will likely be a far better, more scalable approach.
Try this instead:
private async void StartButton_Click(object sender, EventArgs e)
{
List<string> ImageURI;
GenerateURIs(out ImageURI); // creates list of 1000 uris
var requests = ImageURI
.Select(uri => (new WebClient()).DownloadDataTaskAsync(uri))
.Select(SaveImageFile);
await Task.WhenAll(requests);
}
private async Task SaveImageFile(Task<byte[]> data)
{
try
{
byte[] ImageBytes = await data;
DownloadedImages++;
if (ImageBytes.Length == 503)
{
InvalidImages++;
return;
}
ValidImages++;
using (var file = new FileStream("images/" + (++FilesIndex).ToString() + ".png", FileMode.Append, FileAccess.Write))
{
await file.WriteAsync(ImageBytes, 0, ImageBytes.Length);
}
}
catch (Exception e)
{
}
return;
}
Note that, with async/await, you no longer have to worry about synchronization, since the code after each await resumes on the main UI thread and those values are only ever set there.
As for pausing, there are various options - you could add a flag of whether or not to continually execute data, or use CancellationTokenSource to provide cancellation support through the entire operation.
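A minimal sketch of the CancellationTokenSource option, under stated assumptions: nextUri and download are hypothetical delegates standing in for the URI generator and for the download-and-save step (neither name is from the question), and a SemaphoreSlim caps the number of downloads in flight. Cancelling the token stops new downloads from starting and lets the running ones finish.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ContinuousDownloader
{
    // Keeps generating URIs and downloading them until the token is cancelled.
    // nextUri and download are hypothetical stand-ins, not part of the original code.
    // Returns the number of downloads that were started.
    public static async Task<int> RunAsync(
        Func<string> nextUri,
        Func<string, Task> download,
        int maxConcurrency,
        CancellationToken token)
    {
        var throttle = new SemaphoreSlim(maxConcurrency);
        var running = new List<Task>();
        int started = 0;

        while (!token.IsCancellationRequested)
        {
            try
            {
                // Wait for a free slot; this throws if the token is cancelled meanwhile.
                await throttle.WaitAsync(token);
            }
            catch (OperationCanceledException)
            {
                break;
            }

            started++;
            running.Add(DownloadOneAsync(nextUri(), download, throttle));
            running.RemoveAll(t => t.IsCompleted); // forget downloads that already finished
        }

        await Task.WhenAll(running); // let in-flight downloads complete
        return started;
    }

    private static async Task DownloadOneAsync(
        string uri, Func<string, Task> download, SemaphoreSlim throttle)
    {
        try { await download(uri); }
        catch (Exception) { /* a failed download is skipped in this sketch */ }
        finally { throttle.Release(); }
    }
}
```

A pause button would call Cancel on the CancellationTokenSource whose Token was passed in; resuming means calling RunAsync again with a fresh source.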
What you're looking for is a producer/consumer model, in which a producer adds items to a queue and a consumer pulls items off it. BlockingCollection makes this very easy. Create a BlockingCollection, have your producer keep adding items to it as you generate them (calling CompleteAdding when done), and have your consumer iterate over GetConsumingEnumerable; you can run your existing query on that enumerable.
You'll want both the producer and the consuming code to be moved into non-UI threads, so that they both don't block the UI and can produce/consume the data in parallel.
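The producer/consumer arrangement described above could look roughly like this. It is a sketch, not a drop-in replacement: nextUri and downloadFile are hypothetical delegates standing in for the URI generator and for DownloadFile, and the NoBuffering partitioner is my addition so that PLINQ hands items to workers as soon as they arrive instead of collecting them into chunks.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class UriPipeline
{
    // Producer: keeps adding generated URIs on a background thread until cancelled,
    // then marks the collection as complete so the consumer can finish.
    public static Task StartProducer(
        BlockingCollection<string> queue, Func<string> nextUri, CancellationToken token)
    {
        return Task.Run(() =>
        {
            try
            {
                while (!token.IsCancellationRequested)
                    queue.Add(nextUri(), token); // blocks while a bounded queue is full
            }
            catch (OperationCanceledException) { /* cancellation ends production */ }
            finally { queue.CompleteAdding(); }
        });
    }

    // Consumer: the same kind of PLINQ query as in the question, fed from the live queue.
    // Blocks until CompleteAdding has been called and the queue has been drained.
    public static int Consume(
        BlockingCollection<string> queue, Func<string, int> downloadFile, int degree)
    {
        var source = Partitioner.Create(
            queue.GetConsumingEnumerable(), EnumerablePartitionerOptions.NoBuffering);
        return source.AsParallel()
            .WithDegreeOfParallelism(degree)
            .Sum(downloadFile);
    }
}
```

Consume blocks its caller, so in a UI application it would itself be started with Task.Run. Creating the collection with a bounded capacity keeps the producer from running far ahead of the downloads.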
Also note that your DownloadFile method currently mutates and reads instance data, even though it is likely to be called from different threads concurrently. Incrementing an index is not safe, because it is not an atomic operation, so concurrent increments can be lost or produce duplicate values. You need to either avoid sharing state between these threads or properly synchronize access to that shared state.
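For plain counters, one way to synchronize is Interlocked.Increment, which performs the read-modify-write as a single atomic step. The class below is an illustrative sketch; DownloadCounters and its member names are hypothetical, chosen to mirror the fields in the question.

```csharp
using System.Threading;

public class DownloadCounters
{
    private int downloaded;
    private int filesIndex;

    // Atomic increment: concurrent callers never lose an update.
    public void RecordDownload()
    {
        Interlocked.Increment(ref downloaded);
    }

    // Returns a distinct index per call, even under concurrency,
    // because the incremented value is returned by the same atomic operation.
    public int NextFileIndex()
    {
        return Interlocked.Increment(ref filesIndex);
    }

    public int Downloaded
    {
        get { return Volatile.Read(ref downloaded); }
    }
}
```

Using the value returned by NextFileIndex for the file name (rather than reading the field afterwards) is what keeps two threads from writing to the same file.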
I have a .NET framework Windows Forms application with a form that has this code:
using System;
using System.Collections.Generic;
using System.IO;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using System.Windows.Forms;
namespace test
{
public partial class Main : Form
{
public int exitCode = 1;
private Options opts;
CancellationTokenSource cancellationSource = new CancellationTokenSource();
public Main(Options opts)
{
InitializeComponent();
this.opts = opts;
}
private void btnCancel_Click(object sender, EventArgs e)
{
exitCode = 1;
cancellationSource.Cancel();
Close();
}
async Task doUpload()
{
using (var content = new MultipartFormDataContent())
{
List<FileStream> streams = new List<FileStream>();
try
{
foreach (string fPath in opts.InputFiles)
{
FileStream stream = new FileStream(fPath, FileMode.Open, FileAccess.Read);
streams.Add(stream);
content.Add(new StreamContent(stream), fPath);
}
var progressContent = new ProgressableStreamContent(
content,
4096,
(sent, total) =>
{
double percent = 100 * sent / total;
progressBar.Value = (int)percent;
});
using (var client = new HttpClient())
{
using (var response = await client.PostAsync(opts.URL, progressContent, cancellationSource.Token))
{
if (response.IsSuccessStatusCode)
{
exitCode = 0;
}
else
{
MessageBox.Show(
response.Content.ToString(),
"Error " + response.StatusCode,
MessageBoxButtons.OK, MessageBoxIcon.Error
);
}
Close();
}
}
}
finally
{
foreach (FileStream stream in streams)
{
stream.Close();
}
}
}
}
private void Main_Load(object sender, EventArgs e)
{
}
private void Main_FormClosing(object sender, FormClosingEventArgs e)
{
e.Cancel = !cancellationSource.IsCancellationRequested;
}
private void Main_Shown(object sender, EventArgs e)
{
doUpload();
}
}
}
The ProgressableStreamContent is the same that was given here: C#: HttpClient, File upload progress when uploading multiple file as MultipartFormDataContent
The problem is that the response is never returned. In other words, the await on PostAsync never completes. Also, the progress callback is never called. Even if I use a POST URL that contains a non-existent domain, nothing happens. I guess it is a deadlock, but I don't see how: the async Task's result is never used anywhere and the task is never awaited.
It is different from An async/await example that causes a deadlock because .Result is not used and the method is never awaited, and it also seems that calling ConfigureAwait(false) has no effect.
UPDATE: I have created a new github repo for this question, so anyone can test it:
https://github.com/nagylzs/csharp_http_post_example
UPDATE: Finally it works. ConfigureAwait is not needed. All UI update operations must be placed inside Invoke. I have updated the test repo to the working version. Also added TLSv1.2 support (which is disabled by default).
PostAsync in the code you've posted doesn't block (but it really never returns though!). It throws an exception:
System.InvalidOperationException: Cross-thread operation not valid: Control 'progressBar' accessed from a thread other than the thread it was created on.
That's why the breakpoints didn't work for you. The right solution would be:
var progressContent = new ProgressableStreamContent(
content,
4096,
(sent, total) =>
{
Invoke((Action) (() => {
double percent = 100 * sent / total;
progressBar.Value = (int) percent;
}));
});
(either add Invoke or BeginInvoke to the callback)
The callbacks of the HTTP client are called on a background thread, and you have to put them into your window's event queue if you want them to access your UI controls.
.ConfigureAwait(false) has nothing to do with this issue, you shouldn't use it in UI context (quite the opposite: you want it to put the continuation onto the UI thread, so you shouldn't use it).
You need to change this:
client.PostAsync(opts.URL, progressContent, cancellationSource.Token)
to
client.PostAsync(opts.URL, progressContent, cancellationSource.Token).ConfigureAwait(false)
This has already been discussed, so you can find additional resources on the net, but this should be a good starting point.
I need to make a WebRequest to a certain endpoint every 2 seconds. I tried to do it with a Timer; the problem is that every call to the callback function runs on a different thread and I'm having some concurrency problems. So I decided to change my implementation, and I was thinking about using a BackgroundWorker with a sleep of two seconds inside, or using async/await, but I don't see the advantages of using async/await. Any advice? Thank you.
This is the code that I will reimplement.
private void InitTimer()
{
TimerCallback callback = TimerCallbackFunction;
m_timer = new Timer(callback, null, 0, m_interval);
}
private void TimerCallbackFunction(Object info)
{
Thread.CurrentThread.Name = "Requester thread ";
m_object = GetMyObject();
}
public MyObject GetMyObject()
{
MyObject myObject = new MyObject();
try
{
MemoryStream responseInMemory = CreateWebRequest(m_host, ENDPOINT);
XmlSerializer xmlSerializer = new XmlSerializer(typeof(MyObject));
myObject = (MyObject) xmlSerializer.Deserialize(responseInMemory);
}
catch (InvalidOperationException ex)
{
m_logger.WriteError("Error getting MyObject: ", ex);
throw new XmlException();
}
return myObject;
}
private MemoryStream CreateWebRequest(string host, string endpoint)
{
WebRequest request = WebRequest.Create(host + endpoint);
using (var response = request.GetResponse())
{
return (MemoryStream) response.GetResponseStream();
}
}
EDIT: I have read this SO thread Async/await vs BackgroundWorker
async/await is also a form of concurrency. If you have concurrency problems and you want your application to have only one thread, you should avoid using async/await.
However, the best way to make a WebRequest is to use async/await, which does not block the main UI thread.
Use the method below; it will not block anything and it is recommended by Microsoft. https://msdn.microsoft.com/en-us/library/86wf6409(v=vs.110).aspx
private async Task<MemoryStream> CreateWebRequest(string host, string endpoint)
{
WebRequest request = WebRequest.Create(host + endpoint);
using (var response = await request.GetResponseAsync())
using (var responseStream = response.GetResponseStream())
{
// The response stream is not a MemoryStream and is closed with the response,
// so copy its contents instead of casting it.
var buffer = new MemoryStream();
await responseStream.CopyToAsync(buffer);
buffer.Position = 0;
return buffer;
}
}
You don't mention what the concurrency problems are. It may be that the request takes so long that the next one starts before the previous one finishes. It could also be that the callback replaces the value in m_object while readers are accessing it.
You can easily make a request every X seconds, asynchronously and without blocking, by using Task.Delay, eg:
ConcurrentQueue<MyObject> m_Responses=new ConcurrentQueue<MyObject>();
public async Task MyPollMethod(int interval)
{
while(...)
{
var result=await SomeAsyncCall();
m_Responses.Enqueue(result);
await Task.Delay(interval);
}
}
This results in a polling call X seconds after the previous one finishes.
It also avoids concurrency issues by storing the result in a concurrent queue instead of replacing the old value, perhaps while someone else was reading it.
Consumers of MyObject would call TryDequeue to retrieve MyObject instances in the order they were received.
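Put together, the polling side and the consuming side might look like the sketch below. Poller, fetch and TryGetNext are hypothetical names for illustration; fetch stands in for the asynchronous request, and the CancellationToken is my addition to give the loop a way to stop.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public class Poller<T>
{
    private readonly ConcurrentQueue<T> responses = new ConcurrentQueue<T>();

    // Calls fetch, stores the result, waits for the interval, and repeats.
    // The next request cannot start before the previous one has finished.
    public async Task PollAsync(Func<Task<T>> fetch, TimeSpan interval, CancellationToken token)
    {
        try
        {
            while (!token.IsCancellationRequested)
            {
                responses.Enqueue(await fetch());
                await Task.Delay(interval, token);
            }
        }
        catch (OperationCanceledException)
        {
            // Cancellation during the delay ends the loop normally.
        }
    }

    // Consumer side: returns false when no response is waiting.
    public bool TryGetNext(out T response)
    {
        return responses.TryDequeue(out response);
    }
}
```

Because TryDequeue never blocks, a reader can check for new results from any thread without coordinating with the polling loop.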
You could use the ConcurrentQueue to fix the current code too:
private void TimerCallbackFunction(Object info)
{
Thread.CurrentThread.Name = "Requester thread ";
var result=GetMyObject();
m_Responses.Enqueue(result);
}
or
private async void TimerCallbackFunction(Object info)
{
Thread.CurrentThread.Name = "Requester thread ";
var result=await GetMyObjectAsync();
m_Responses.Enqueue(result);
}
if you want to change your GetMyObject method to work asynchronously.
Since your request seems to take a long time, it's a good idea to make it asynchronous and avoid blocking the timer's ThreadPool thread while waiting for a network response.
I have a TCP listener that receives data from the server and writes it to a file. I use a BlockingCollection to buffer the data. I don't know when the data ends, so my FileStream is always open.
Part of my code is:
private static BlockingCollection<string> Buffer = new BlockingCollection<string>();
Process()
{
var consumer = Task.Factory.StartNew(() =>WriteData());
while()
{
string request = await reader.ReadLineAsync();
Buffer.Add(request);
}
}
WriteData()
{
FileStream fStream = new FileStream(filename,FileMode.Append,FileAccess.Write,FileShare.Write, 16392);
foreach(var val in Buffer.GetConsumingEnumerable(token))
{
fStream.Write(Encoding.UTF8.GetBytes(val), 0, val.Length);
fStream.Flush();
}
}
The problem is that I cannot dispose the FileStream within the loop; otherwise I would have to create a FileStream for each line, and the loop may never end.
This would be much easier in .NET 4.5 if you used a DataFlow ActionBlock. An ActionBlock accepts and buffers incoming messages and processes them asynchronously using one or more Tasks.
You could write something like this:
public static async Task ProcessFile(string sourceFileName,string targetFileName)
{
//Pass the target stream as part of the message to avoid globals
var block = new ActionBlock<Tuple<string, FileStream>>(async tuple =>
{
var line = tuple.Item1;
var stream = tuple.Item2;
// Use the encoded byte count; line.Length counts characters, not bytes
var bytes = Encoding.UTF8.GetBytes(line);
await stream.WriteAsync(bytes, 0, bytes.Length);
});
//Post lines to block
using (var targetStream = new FileStream(targetFileName, FileMode.Append,
FileAccess.Write, FileShare.Write, 16392))
{
using (var sourceStream = File.OpenRead(sourceFileName))
{
await PostLines(sourceStream, targetStream, block);
}
//Tell the block we are done
block.Complete();
//And wait for it to finish
await block.Completion;
}
}
private static async Task PostLines(FileStream sourceStream, FileStream targetStream,
ActionBlock<Tuple<string, FileStream>> block)
{
using (var reader = new StreamReader(sourceStream))
{
while (true)
{
var line = await reader.ReadLineAsync();
if (line == null)
break;
var tuple = Tuple.Create(line, targetStream);
block.Post(tuple);
}
}
}
Most of the code deals with reading each line and posting it to the block. By default, an ActionBlock uses only a single Task to process one message at a time, which is fine in this scenario. More tasks can be used if needed to process data in parallel.
Once all lines are read, we notify the block with a call to Complete and wait for it to finish processing with await block.Completion.
Once the block's Completion task finishes we can close the target stream.
The beauty of the Dataflow library is that you can link multiple blocks together to create a pipeline of processing steps. ActionBlock is typically the final step in such a chain. The library takes care of passing data from one block to the next and, when links are created with PropagateCompletion, of propagating completion down the chain.
For example, one step can read lines from a log, a second can parse them with a regex to find specific patterns (e.g. error messages) and pass them on, and a third can receive the error messages and write them to another file. Each step runs in its own task, with intermediate messages buffered at each step.
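A small sketch of such a pipeline, assuming the System.Threading.Tasks.Dataflow package is referenced. LogPipeline and FindErrorsAsync are hypothetical names, the ERROR pattern is only an example, and the last step collects matches into a list instead of writing a file so the result is easy to inspect.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class LogPipeline
{
    public static async Task<List<string>> FindErrorsAsync(IEnumerable<string> lines)
    {
        var errors = new List<string>();
        var pattern = new Regex(@"\bERROR\b");

        // Step 1: keep only the lines that match the pattern (zero or one output per input).
        Func<string, IEnumerable<string>> keepErrors =
            line => pattern.IsMatch(line) ? new[] { line } : new string[0];
        var parse = new TransformManyBlock<string, string>(keepErrors);

        // Step 2: final block. It processes one message at a time by default,
        // so the plain List<string> is only touched by one task at a time.
        var collect = new ActionBlock<string>(line => errors.Add(line));

        // Link the steps; PropagateCompletion forwards Complete() down the chain.
        parse.LinkTo(collect, new DataflowLinkOptions { PropagateCompletion = true });

        foreach (var line in lines)
            parse.Post(line);

        parse.Complete();          // no more input
        await collect.Completion;  // wait until the last block has drained
        return errors;
    }
}
```

Only the first block is completed explicitly; awaiting the last block's Completion is enough to know that everything in between has been processed.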
My application connects to a large number of clients over http, downloads data from those clients and processes data as these results are received. Each request is sent in a separate thread so that the main thread does not remain occupied.
We have started encountering performance issues, and it seems these are mostly related to the large number of ThreadPool threads that are just waiting to get data back from those requests. I know that .NET 4.5 has async and await for this type of problem, but we are still using .NET 3.5.
Any thoughts on the best way to send these requests on a different thread without keeping that thread alive while all it's doing is waiting for the response to come back?
You can use async operations in .NET 3.5, it's just not as convenient as in .NET 4.5. Most IO methods have a BeginX/EndX method pair that is the async equivalent of the X method. This is called the Asynchronous Programming Model (APM).
For instance, instead of Stream.Read, you could use Stream.BeginRead and Stream.EndRead.
Actually, many async IO methods in .NET 4.5 are just wrappers around the Begin/End methods.
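As an illustration of the Begin/End pattern, here is a sketch that reads a stream to the end with BeginRead/EndRead and reports the number of bytes through a callback. ApmReader and CountBytes are hypothetical names; the code uses only constructs available in .NET 3.5.

```csharp
using System;
using System.IO;

public static class ApmReader
{
    // Reads the whole stream asynchronously and passes the total byte count to onDone.
    // No thread is blocked while a read is pending; each completion callback
    // simply starts the next read.
    public static void CountBytes(Stream stream, Action<long> onDone, Action<Exception> onError)
    {
        var buffer = new byte[4096];
        long total = 0;
        AsyncCallback callback = null;

        callback = ar =>
        {
            try
            {
                int read = stream.EndRead(ar); // completes the pending read
                if (read == 0)
                {
                    onDone(total);             // end of stream
                    return;
                }
                total += read;
                stream.BeginRead(buffer, 0, buffer.Length, callback, null);
            }
            catch (Exception ex)
            {
                onError(ex);
            }
        };

        try
        {
            stream.BeginRead(buffer, 0, buffer.Length, callback, null);
        }
        catch (Exception ex)
        {
            onError(ex);
        }
    }
}
```

The same shape applies to WebRequest.BeginGetResponse/EndGetResponse: start the operation, return immediately, and continue the work in the callback.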
If you cannot use .NET 4.x and async/await, you can still achieve similar behavior using IEnumerator and yield. This lets you write pseudo-synchronous, linear code around Begin/End-style callbacks, including statements like using, try/finally, while/for/foreach etc. You cannot use yield inside try/catch, though.
There are a few implementations of the asynchronous enumerator driver out there, e.g. Jeffrey Richter's AsyncEnumerator.
I used something like below in the past:
class AsyncIO
{
void ReadFileAsync(string fileName)
{
AsyncOperationExt.Start(
start => ReadFileAsyncHelper(fileName, start),
result => Console.WriteLine("Result: " + result),
error => Console.WriteLine("Error: " + error));
}
static IEnumerator<object> ReadFileAsyncHelper(string fileName, Action nextStep)
{
using (var stream = new FileStream(
fileName, FileMode.Open, FileAccess.Read, FileShare.Read, bufferSize: 1024, useAsync: true))
{
IAsyncResult asyncResult = null;
AsyncCallback asyncCallback = ar => { asyncResult = ar; nextStep(); };
var buff = new byte[1024];
while (true)
{
stream.BeginRead(buff, 0, buff.Length, asyncCallback, null);
yield return Type.Missing;
int readBytes = stream.EndRead(asyncResult);
if (readBytes == 0)
break;
// process the buff
}
}
yield return true;
}
}
// ...
// implement AsyncOperationExt.Start
public static class AsyncOperationExt
{
public static void Start<TResult>(
Func<Action, IEnumerator<TResult>> start,
Action<TResult> oncomplete,
Action<Exception> onerror)
{
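IEnumerator<TResult> enumerator = null;
TResult result = default(TResult);
Action nextStep = () =>
{
try
{
// Resume the iterator. Yielding Type.Missing means "waiting for a callback";
// any other yielded value is kept as the result.
while (enumerator.MoveNext())
{
if (object.Equals(enumerator.Current, Type.Missing))
return;
result = enumerator.Current;
}
enumerator.Dispose();
oncomplete(result);
}
catch (Exception ex)
{
enumerator.Dispose();
onerror(ex);
}
};
try
{
enumerator = start(nextStep);
nextStep(); // run the iterator up to its first yield
}
catch (Exception ex)
{
onerror(ex);
if (enumerator != null)
enumerator.Dispose();
}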
}
}
I want to put the file-writing code inside a mutex to avoid any concurrent modification of a file. However, I want only the file with a particular name to be treated as a critical section, not the other files. Will this code work as expected, or have I missed anything?
private async static void StoreToFileAsync(string filename, object data)
{
IsolatedStorageFile AppIsolatedStorage = IsolatedStorageFile.GetUserStoreForApplication();
if (IsolatedStorageFileExist(filename))
{
AppIsolatedStorage.DeleteFile(filename);
}
if (data != null)
{
string json = await Task.Factory.StartNew<string>(() => JsonConvert.SerializeObject(data));
if (!string.IsNullOrWhiteSpace(json))
{
byte[] buffer = Encoding.UTF8.GetBytes(json);
Mutex mutex = new Mutex(false, filename);
mutex.WaitOne();
IsolatedStorageFileStream ISFileStream = AppIsolatedStorage.CreateFile(filename);
await ISFileStream.WriteAsync(buffer, 0, buffer.Length);
ISFileStream.Close();
mutex.ReleaseMutex();
}
}
}
EDIT 1: or should I replace the async write with synchronous one and run as a separate task?
await Task.Factory.StartNew(() =>
{
if (!string.IsNullOrWhiteSpace(json))
{
byte[] buffer = Encoding.UTF8.GetBytes(json);
lock (filename)
{
IsolatedStorageFileStream ISFileStream = AppIsolatedStorage.CreateFile(filename);
ISFileStream.Write(buffer, 0, buffer.Length);
ISFileStream.Close();
}
}
});
In the general case, using a Mutex with asynchronous code is wrong. Normally, I'd say you should use something like AsyncLock from my AsyncEx library, and if you want a separate lock per file, then you'll need a dictionary or some such.
However, if your method is always called from the UI thread, then it will work. It is not very efficient, but it will work.
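To illustrate the "separate lock per file" idea without the AsyncEx dependency, the sketch below swaps in SemaphoreSlim from the base class library (its WaitAsync method requires .NET 4.5 or later) and keeps one semaphore per file name in a dictionary. FileLocks and RunExclusiveAsync are hypothetical names, and this only protects against concurrent writers inside a single process.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class FileLocks
{
    // One semaphore per file name; file names are compared case-insensitively here.
    private static readonly ConcurrentDictionary<string, SemaphoreSlim> locks =
        new ConcurrentDictionary<string, SemaphoreSlim>(StringComparer.OrdinalIgnoreCase);

    // Runs action while holding the lock for this file name only.
    // Unlike Mutex, SemaphoreSlim has no thread affinity, so it can be
    // released on a different thread after an await.
    public static async Task RunExclusiveAsync(string filename, Func<Task> action)
    {
        SemaphoreSlim gate = locks.GetOrAdd(filename, _ => new SemaphoreSlim(1, 1));
        await gate.WaitAsync();
        try
        {
            await action();
        }
        finally
        {
            gate.Release();
        }
    }
}
```

The asynchronous write from the question would go inside the action delegate; writes to other file names use different semaphores and are not blocked.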
There are a few things you should improve:
Running asynchronous code 'inside' a Mutex is not good, as the same thread must acquire and release the Mutex. You will probably get an exception when the method is not called from the UI thread.
filename is not a good name for a Mutex: it is a global object (shared among all processes), so the name should be unique.
You should Dispose your Mutex and Release it in a finally clause (what happens if you get an Exception?). BTW here is a good pattern
You should/must Dispose the IsolatedStorageFileStream.
You should also consider whether waiting indefinitely is a good choice.
If you aren't accessing the file from multiple processes (Background Agents, other apps), then you can use lock or SemaphoreSlim.
private async static void StoreToFileAsync(string filename, object data)
{
using (IsolatedStorageFile AppIsolatedStorage = IsolatedStorageFile.GetUserStoreForApplication())
{
if (IsolatedStorageFileExist(filename))
AppIsolatedStorage.DeleteFile(filename);
if (data != null)
{
string json = await Task.Factory.StartNew<string>(() => JsonConvert.SerializeObject(data));
if (!string.IsNullOrWhiteSpace(json)) await Task.Run(() =>
{
byte[] buffer = Encoding.UTF8.GetBytes(json);
using (Mutex mutex = new Mutex(false, filename))
{
mutex.WaitOne(); // acquire before the try, so finally only releases a mutex we own
try
{
using (IsolatedStorageFileStream ISFileStream = AppIsolatedStorage.CreateFile(filename))
ISFileStream.Write(buffer, 0, buffer.Length);
}
catch { }
finally { mutex.ReleaseMutex(); }
}
});
}
}
}
EDIT: Removed the asynchronous call from inside the Mutex, as it will cause problems when not called from the UI thread.