Hi, I'm new to async programming. How can I make my method checkAvaible run asynchronously? I would like to download 2500 pages at once if possible, rather than waiting for one download to complete before starting the next. How can I do that?
private static void searchForLinks()
{
string url = "http://www.xxxx.pl/xxxx/?action=xxxx&id=";
for (int i = 0; i < 2500; i++)
{
string tmp = url;
tmp += Convert.ToString(i);
checkAvaible(tmp); // this method should run async: do not wait while one page is downloading
}
Console.WriteLine(listOfUrls.Count());
Console.ReadLine();
}
private static async void checkAvaible(string url)
{
using (WebClient client = new WebClient())
{
string htmlCode = client.DownloadString(url);
if (htmlCode.IndexOf("Brak takiego obiektu w naszej bazie!") == -1)
listOfUrls.Add(url);
}
}
You would not want to download 2500 pages at the same time, since this would be a problem for both your client and the server. Instead, I have added a limit on concurrent downloads (10 by default), so the web pages are downloaded 10 at a time. (Or you can change it to 2500 if you are running a supercomputer :))
Generic lists (I think it is a List of strings in your case) are not thread safe by default, so you should synchronize access to the Add method. I have added that as well.
Here is the updated source code to download pages asynchronously with a configurable number of concurrent calls. Note that it works in batches: each batch must finish before the next one starts.
private static List<string> listOfUrls = new List<string>();
private static void searchForLinks()
{
string url = "http://www.xxxx.pl/xxxx/?action=xxxx&id=";
int numberOfConcurrentDownloads = 10;
for (int i = 0; i < 2500; i += numberOfConcurrentDownloads)
{
List<Task> allDownloads = new List<Task>();
for (int j = i; j < i + numberOfConcurrentDownloads; j++)
{
string tmp = url;
tmp += Convert.ToString(j); // use the inner index j; using i would request the same URL for the whole batch
allDownloads.Add(checkAvaible(tmp));
}
Task.WaitAll(allDownloads.ToArray());
}
Console.WriteLine(listOfUrls.Count());
Console.ReadLine();
}
private static async Task checkAvaible(string url)
{
using (WebClient client = new WebClient())
{
string htmlCode = await client.DownloadStringTaskAsync(new Uri(url));
if (htmlCode.IndexOf("Brak takiego obiektu w naszej bazie!") == -1)
{
lock (listOfUrls)
{
listOfUrls.Add(url);
}
}
}
}
It's best to convert code to async by working from the inside and proceeding out. Follow best practices along the way, such as avoiding async void, using the Async suffix, and returning results instead of modifying shared variables:
private static async Task<string> checkAvaibleAsync(string url)
{
using (var client = new HttpClient())
{
string htmlCode = await client.GetStringAsync(url);
if (htmlCode.IndexOf("Brak takiego obiektu w naszej bazie!") == -1)
return url;
else
return null;
}
}
You can then start off any number of these concurrently using Task.WhenAll:
private static async Task<string[]> searchForLinksAsync()
{
string url = "http://www.xxxx.pl/xxxx/?action=xxxx&id=";
// Start all 2500 checks; the name must match checkAvaibleAsync as declared above.
var tasks = Enumerable.Range(0, 2500).Select(i => checkAvaibleAsync(url + i));
var results = await Task.WhenAll(tasks);
var listOfUrls = results.Where(x => x != null).ToArray();
Console.WriteLine(listOfUrls.Length);
Console.ReadLine();
return listOfUrls; // the method is declared to return Task<string[]>
}
I'm downloading 100K+ files and want to do it in batches, such as 100 files at a time.
static void Main(string[] args) {
Task.WaitAll(
new Task[]{
RunAsync()
});
}
// each group has 100 attachments.
static async Task RunAsync() {
foreach (var group in groups) {
var tasks = new List<Task>();
foreach (var attachment in group.attachments) {
tasks.Add(DownloadFileAsync(attachment, downloadPath));
}
await Task.WhenAll(tasks);
}
}
static async Task DownloadFileAsync(Attachment attachment, string path) {
using (var client = new HttpClient()) {
using (var fileStream = File.Create(path + attachment.FileName)) {
var downloadedFileStream = await client.GetStreamAsync(attachment.url);
await downloadedFileStream.CopyToAsync(fileStream);
}
}
}
Expected
I expected it to download 100 files at a time, then download the next 100.
Actual
It downloads a lot more at the same time, and quickly fails with the error "Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host".
Running tasks in "batches" is not a good idea in terms of performance: one long-running task blocks the whole batch. A better approach is to start a new task as soon as one finishes.
This can be implemented with a queue, as @MertAkcakaya suggested. But I will post another alternative based on my other answer, "Have a set of Tasks with only X running at a time".
int maxTread = 3;
System.Net.ServicePointManager.DefaultConnectionLimit = 50; //Set this once to a max value in your app
var urls = new Tuple<string, string>[] {
Tuple.Create("http://cnn.com","temp/cnn1.htm"),
Tuple.Create("http://cnn.com","temp/cnn2.htm"),
Tuple.Create("http://bbc.com","temp/bbc1.htm"),
Tuple.Create("http://bbc.com","temp/bbc2.htm"),
Tuple.Create("http://stackoverflow.com","temp/stackoverflow.htm"),
Tuple.Create("http://google.com","temp/google1.htm"),
Tuple.Create("http://google.com","temp/google2.htm"),
};
DownloadParallel(urls, maxTread);
async Task DownloadParallel(IEnumerable<Tuple<string,string>> urls, int maxThreads)
{
SemaphoreSlim maxThread = new SemaphoreSlim(maxThreads);
var client = new HttpClient();
foreach(var url in urls)
{
await maxThread.WaitAsync();
DownloadFile(client, url.Item1, url.Item2)
.ContinueWith((task) => maxThread.Release() );
}
}
async Task DownloadFile(HttpClient client, string url, string fileName)
{
var stream = await client.GetStreamAsync(url);
using (var fileStream = File.Create(fileName))
{
await stream.CopyToAsync(fileStream);
}
}
PS: DownloadParallel returns as soon as it starts the last download, so awaiting it does not mean that all downloads have finished. If you really need to wait for completion, add for (int i = 0; i < maxThreads; i++) await maxThread.WaitAsync(); at the end of the method.
PS2: Don't forget to add exception handling to DownloadFile
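Both postscripts can be combined into one helper. The following is only a sketch under assumptions, not the code from the answer above: ThrottledRunner, RunAllAsync and RunOneAsync are hypothetical names, and the download is passed in as a delegate so that any async operation can be throttled. It keeps at most maxConcurrency operations running, completes only when all of them are done, and counts failures instead of letting one exception go unobserved.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of the original answer.
public static class ThrottledRunner
{
    // Runs 'work' for every item with at most maxConcurrency running at once.
    // The returned task completes only after ALL items have finished;
    // its result is the number of items that failed.
    public static async Task<int> RunAllAsync<T>(
        IEnumerable<T> items, int maxConcurrency, Func<T, Task> work)
    {
        using (var throttle = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = new List<Task<bool>>();
            foreach (var item in items)
            {
                await throttle.WaitAsync();   // wait for a free slot
                tasks.Add(RunOneAsync(item, work, throttle));
            }
            bool[] succeeded = await Task.WhenAll(tasks);
            return succeeded.Count(ok => !ok);
        }
    }

    private static async Task<bool> RunOneAsync<T>(
        T item, Func<T, Task> work, SemaphoreSlim throttle)
    {
        try
        {
            await work(item);
            return true;
        }
        catch (Exception ex)
        {
            Console.WriteLine("Failed: " + ex.Message);
            return false;
        }
        finally
        {
            throttle.Release();   // free the slot for the next item
        }
    }
}

public static class Program
{
    // Simulates downloads with Task.Delay and records the peak concurrency.
    public static async Task<int> MeasurePeakAsync(int itemCount, int maxConcurrency)
    {
        int running = 0, peak = 0;
        object gate = new object();
        await ThrottledRunner.RunAllAsync(
            Enumerable.Range(0, itemCount), maxConcurrency, async i =>
            {
                lock (gate) { running++; peak = Math.Max(peak, running); }
                await Task.Delay(20);   // stand-in for a real download
                lock (gate) { running--; }
            });
        return peak;
    }

    public static void Main()
    {
        int peak = MeasurePeakAsync(10, 3).GetAwaiter().GetResult();
        Console.WriteLine("Peak concurrency: " + peak);
    }
}
```

With a real download, the delegate could be something like url => DownloadFile(client, url.Item1, url.Item2), reusing the DownloadFile method shown above.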
I would like to ask for your help in implementing multi-threading in my C# program.
The program uploads 10,000+ files to an FTP server. I am planning to use at least 10 threads to speed up the process.
This is the code that I have so far:
I have initialized 10 Threads:
public ThreadStart[] threadstart = new ThreadStart[10];
public Thread[] thread = new Thread[10];
My plan is to assign one file to a thread, as follows:
file 1 > thread 1
file 2 > thread 2
file 3 > thread 3
.
.
.
file 10 > thread 10
file 11 > thread 1
.
.
.
And so I have the following:
foreach (string file in files)
{
loop++;
threadstart[loop] = new ThreadStart(() => ftp.uploadToFTP(uploadPath + @"/" + Path.GetFileName(file), file));
thread[loop] = new Thread(threadstart[loop]);
thread[loop].Start();
if (loop == 9)
{
loop = 0;
}
}
Assigning files to their respective threads works. My problem is that the thread starts overlap.
For example, an exception occurs when a file is passed to Thread 1 while it is still running: it has not finished its previous upload when the new parameter arrives. The same happens with the other threads.
What is the best way to implement this?
Any feedback will be greatly appreciated. Thank you! :)
Use async-await and just pass an array of files into it:
private static async Task TestFtpAsync(string userName, string password, string ftpBaseUri,
IEnumerable<string> fileNames)
{
// Start every upload first; each upload owns its WebClient until it completes.
var tasks = fileNames
.Select(fileName => UploadAsync(userName, password, ftpBaseUri, new FileInfo(fileName)))
.ToList();
Console.WriteLine("Uploading...");
foreach (var task in tasks)
{
try
{
await task;
Console.WriteLine("Success");
}
catch (Exception ex)
{
Console.WriteLine(ex.ToString());
}
}
}
private static async Task<byte[]> UploadAsync(string userName, string password, string ftpBaseUri,
FileInfo fileInfo)
{
// Awaiting inside the using block disposes the client only after the upload has finished.
using (var webClient = new WebClient())
{
webClient.Credentials = new NetworkCredential(userName, password);
return await webClient.UploadFileTaskAsync(ftpBaseUri + fileInfo.Name, fileInfo.FullName);
}
}
Then call it like this:
const string userName = "username";
const string password = "password";
const string ftpBaseUri = "ftp://192.168.1.1/";
var fileNames = new[] { @"d:\file0.txt", @"d:\file1.txt", @"d:\file2.txt" };
TestFtpAsync(userName, password, ftpBaseUri, fileNames);
Why do it the hard way?
.NET already has a class called ThreadPool.
You can just use that, and it manages the threads itself.
Your code would look like this:
static void DoSomething(object n)
{
Console.WriteLine(n);
Thread.Sleep(10);
}
static void Main(string[] args)
{
ThreadPool.SetMaxThreads(20, 10);
for (int x = 0; x < 30; x++)
{
ThreadPool.QueueUserWorkItem(new WaitCallback(DoSomething), x);
}
Console.Read();
}
I have a string array which contains addresses of websites:
string[] arr = new string[]
{
"https://www.google.com",
"https://www.yahoo.com",
"https://www.microsoft.com"
};
I have to pass these URLs as arguments to an async task method so that I can measure the loading time of each website. I don't need to display the pages, so I am not using a WebView.
I can use a Stopwatch or an HTTP request to measure the loading time. My ultimate goal is for all the websites to start loading at the same time, asynchronously, with output like the following:
Loading time
google - 00:00:04:092345 (hr:min:sec:millisec)
yahoo - 00:00:06:028458
How can I pass an array to the async task, and how can I measure the loading time without using await?
Here is a brief solution of what you could do.
This is neither complete nor perfect. It will give you the loading time of one URL, along with a suggestion for how you could extend it to multiple URLs.
You will need a WebView, either in code or from UI.
Load the URL into the WebView using webview.LoadUrl("https://www.google.com");.
Create a new class by extending it from WebViewClient as follows:
public class myWebViewClient : WebViewClient
{
public override void OnPageFinished(WebView view, string url)
{
base.OnPageFinished(view, url);
Console.WriteLine("OnPageFinished for url : " + url + " at : " + DateTime.Now);
}
}
In your OnCreate() method add the following line of code :
webview.SetWebViewClient(new myWebViewClient());
From here, create a Dictionary with the URL as key and the loading time as value. Set all the loading times to 0 initially, then update the value for each URL in OnPageFinished(). Finally, create an async Task function that returns the populated dictionary.
public async Task<Dictionary<string, double>> myAsyncFunction()
{
await Task.Delay(5); //to make it async
//Wait till all the OnPageFinished events have fired.
while (myDictionary.Any(x=>x.Value == 0) == true)
{
//there are still websites which have not fully loaded.
await Task.Delay(1); //wait a millisecond before checking again
}
return myDictionary;
}
You can call myAsyncFunction() on a separate thread from your UI and attach a ContinueWith(), or just let it run on a separate thread and write the output somewhere you can check when required.
e.g.: Task.Run(async () => await myAsyncFunction());
UPDATE : based on OP's comments
In the UI thread :
var myClassList = new List<myClass>
{
new myClass{URL = "https://www.google.com", TimeTaken = null},
new myClass{URL = "https://www.yahoo.com", TimeTaken = null},
new myClass{URL = "https://www.microsoft.com", TimeTaken = null}
};
Console.WriteLine("Started at : " + DateTime.Now.ToShortTimeString());
var business = new BusinessLogic();
var loadtimetask = business.GetLoadTimeTakenAsync(myClassList);
await loadtimetask;
Console.WriteLine("Completed at : " + DateTime.Now.ToShortTimeString());
And implementation class :
public async Task<List<myClass>> GetLoadTimeTakenAsync(List<myClass> myClassList)
{
// One shared client that stays alive until every request has completed.
using (var client = new HttpClient())
{
// Start all requests at once; each task times its own URL.
var tasks = myClassList.Select(async myClassObj =>
{
myClassObj.StartTime = DateTime.Now;
using (await client.GetStreamAsync(myClassObj.URL))
{
// The response has arrived and the stream is open; nothing to read here.
}
myClassObj.EndTime = DateTime.Now;
myClassObj.TimeTaken = myClassObj.EndTime - myClassObj.StartTime;
}).ToList();
// Complete only when every URL has been timed.
await Task.WhenAll(tasks);
}
return myClassList;
}
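Since the question mentions Stopwatch, here is a minimal sketch of the same idea using Stopwatch and Task.WhenAll. It is an illustration under assumptions, not the answerer's code: LoadTimer, TimeOneAsync and TimeAllAsync are hypothetical names, the fetch is passed in as a delegate, and the URLs are assumed to be distinct. It measures how long the fetch delegate takes, which for a plain HTTP request is the download of the HTML only, not the rendering of a full page.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper, not part of the original answer.
public static class LoadTimer
{
    // Times a single fetch. 'fetch' is whatever async operation loads the URL.
    public static async Task<KeyValuePair<string, TimeSpan>> TimeOneAsync(
        string url, Func<string, Task> fetch)
    {
        var stopwatch = Stopwatch.StartNew();
        await fetch(url);
        stopwatch.Stop();
        return new KeyValuePair<string, TimeSpan>(url, stopwatch.Elapsed);
    }

    // Starts all fetches at once and completes when every one has been timed.
    // Assumes the URLs are distinct, since they are used as dictionary keys.
    public static async Task<Dictionary<string, TimeSpan>> TimeAllAsync(
        IEnumerable<string> urls, Func<string, Task> fetch)
    {
        var timings = await Task.WhenAll(urls.Select(url => TimeOneAsync(url, fetch)));
        return timings.ToDictionary(t => t.Key, t => t.Value);
    }
}

public static class Program
{
    public static void Main()
    {
        var urls = new[] { "https://www.yahoo.com", "https://www.microsoft.com" };
        // Simulated fetch so the sketch runs without network access.
        var result = LoadTimer.TimeAllAsync(urls, url => Task.Delay(50))
            .GetAwaiter().GetResult();
        Console.WriteLine("Loading time");
        foreach (var pair in result)
        {
            Console.WriteLine(pair.Key + " - " + pair.Value);
        }
    }
}
```

For real measurements, the simulated fetch could be replaced with a shared HttpClient, for example url => client.GetStringAsync(url).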
//Create TextView to display status of Wifi
TextView wifitext = FindViewById<TextView>(Resource.Id.WifiTextView);
//Configuring Wifi connection
var connectivityManager = (ConnectivityManager)GetSystemService(ConnectivityService);
var activeConnection = connectivityManager.ActiveNetworkInfo;
if (activeConnection != null && activeConnection.IsConnected)
{
wifitext.Text = "WIFI AVAILABLE";
string[] urladdress = new string[] { "https://www.google.com/", "https://www.yahoo.com/"};
for (int i = 0; i < urladdress.Length; i++)
{
string url = urladdress[i];
//Call async method
Task returnedTask = Task_MethodAsync(url);
}
}
else
wifitext.Text = "WIFI UNAVAILABLE";
}
public async Task Task_MethodAsync(string url)
{
LinearLayout ll = FindViewById<LinearLayout>(Resource.Id.linearLayout1);
WebClient client = new WebClient();
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
Stream listurl = client.OpenRead(url);
StreamReader reader = new StreamReader(listurl);
stopwatch.Stop();
// listurl.Close();
var time = Convert.ToString(stopwatch.Elapsed);
I am writing an application for WP 8.1. One of my methods parses HTML, and everything was OK. But I want to change the encoding so that Polish characters are handled.
For that I need the Length property of a byte[] variable. To make this possible I had to use await and change my method to async.
public async void GetTimeTable(string href, int day)
{
string htmlPage = string.Empty;
using (var client = new HttpClient())
{
var response = await client.GetByteArrayAsync(URL);
char[] decoded = new char[response.Length];
for (int i = 0; i < response.Length; i++)
{
if (response[i] < 128)
decoded[i] = (char)response[i];
else if (response[i] < 0xA0)
decoded[i] = '\0';
else
decoded[i] = (char)iso8859_2[response[i] - 0xA0];
}
htmlPage = new string(decoded);
}
// further code... and on the end::
TimeTableCollection.Add(xxx);
}
public ObservableCollection<Groups> TimeTableCollection { get; set; }
The method is called from MainPage.xaml.cs:
vm.GetTimeTable(navContext.HrefValue, pivot.SelectedIndex);
TimeTableViewOnPage.DataContext = vm.TimeTableCollection;
And now my question: why is vm.TimeTableCollection null? When I don't use async and await, everything is OK and vm.TimeTableCollection has x elements.
And now my question: why is vm.TimeTableCollection null?
Because you're executing an async operation without awaiting it. Hence, the request may not be complete when you access your vm property in the next line.
You need to change your method signature to async Task and await it:
public async Task GetTimeTableAsync(string href, int day)
{
string htmlPage = string.Empty;
using (var client = new HttpClient())
{
var response = await client.GetByteArrayAsync(URL);
char[] decoded = new char[response.Length];
for (int i = 0; i < response.Length; i++)
{
if (response[i] < 128)
decoded[i] = (char)response[i];
else if (response[i] < 0xA0)
decoded[i] = '\0';
else
decoded[i] = (char)iso8859_2[response[i] - 0xA0];
}
htmlPage = new string(decoded);
}
// further code... and on the end::
TimeTableCollection.Add(xxx);
}
and then:
await vm.GetTimeTableAsync(navContext.HrefValue, pivot.SelectedIndex);
This means your top-level calling method has to become async as well. This is the usual pattern when dealing with async methods: you need to go async all the way.
Note: to follow the Task-based Asynchronous Pattern (TAP) naming guidelines, you should give any async method the Async suffix. Hence GetTimeTable should be GetTimeTableAsync.
You're not awaiting the result:
await vm.GetTimeTable(navContext.HrefValue, pivot.SelectedIndex);
TimeTableViewOnPage.DataContext = vm.TimeTableCollection;
If you don't await an async method, the program will start it and continue with the following code without waiting for it to finish. (For the await above to compile, GetTimeTable must return Task rather than void.)
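The difference can be seen in a small console sketch. This is an illustration only: TimeTableViewModel and LoadAsync here are hypothetical stand-ins for the view model in the question, and Task.Delay stands in for the HTTP request.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for the view model in the question.
public class TimeTableViewModel
{
    public List<string> Items { get; } = new List<string>();

    public async Task LoadAsync()
    {
        await Task.Delay(50);   // stand-in for the HTTP request
        Items.Add("loaded");
    }
}

public static class Program
{
    // Returns the item counts seen without and with awaiting the load.
    public static async Task<int[]> DemoAsync()
    {
        var notAwaited = new TimeTableViewModel();
        Task pending = notAwaited.LoadAsync();      // started, but not awaited yet
        int countBefore = notAwaited.Items.Count;   // read immediately: nothing loaded
        await pending;                              // let the load finish before moving on

        var awaited = new TimeTableViewModel();
        await awaited.LoadAsync();                  // wait for the load to complete
        int countAfter = awaited.Items.Count;       // the item is there now

        return new[] { countBefore, countAfter };
    }

    public static void Main()
    {
        int[] counts = DemoAsync().GetAwaiter().GetResult();
        Console.WriteLine("Without await: " + counts[0]);
        Console.WriteLine("With await: " + counts[1]);
    }
}
```

The first read happens while the load is still waiting on its delay, which is the same situation as assigning DataContext right after calling GetTimeTable without awaiting it.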
I have a zip file creator that takes in a String[] of URLs and returns a zip file containing all of the files in the String[].
I figured there would be a number of examples of this, but I cannot seem to find an answer to "How to download many files asynchronously and return when done".
How do I download {n} files at once, and return the Dictionary only when all downloads are complete?
private static Dictionary<string, byte[]> ReturnedFileData(IEnumerable<string> urlList)
{
var returnList = new Dictionary<string, byte[]>();
using (var client = new WebClient())
{
foreach (var url in urlList)
{
client.DownloadDataCompleted += (sender1, e1) => returnList.Add(GetFileNameFromUrlString(url), e1.Result);
client.DownloadDataAsync(new Uri(url));
}
}
return returnList;
}
private static string GetFileNameFromUrlString(string url)
{
var uri = new Uri(url);
return System.IO.Path.GetFileName(uri.LocalPath);
}
First, you tagged your question with async-await without actually using it. There really is no reason anymore to use the old asynchronous paradigms.
To wait asynchronously for all concurrent async operations to complete, you should use Task.WhenAll, which means that you need to keep all the tasks in some construct (e.g. a dictionary) before actually extracting their results.
At the end, when you have all the results in hand you just create the new result dictionary by parsing the uri into the file name, and extracting the result out of the async tasks.
async Task<Dictionary<string, byte[]>> ReturnFileData(IEnumerable<string> urls)
{
var dictionary = urls.ToDictionary(
url => new Uri(url),
url => new WebClient().DownloadDataTaskAsync(url));
await Task.WhenAll(dictionary.Values);
return dictionary.ToDictionary(
pair => Path.GetFileName(pair.Key.LocalPath),
pair => pair.Value.Result);
}
public string JUST_return_dataURL_by_URL(string URL, int interval, int max_interval)
{
var client = new WebClient { Proxy = proxy }; // WebClient has no constructor that takes a proxy
client.Headers = _headers;
string downloaded_from_URL = "false"; //default - until downloading
client.DownloadDataCompleted += (sender, e) =>
{
Console.WriteLine("Done!");
string dataURL = Convert.ToBase64String( e.Result ); // e.Result holds the downloaded bytes
string filename = Guid.NewGuid().ToString().Trim('{', '}')+".png";
downloaded_from_URL =
"Image Downloaded from " + URL
+ "<br>"
+ "<a href=\""+dataURL+"\" download=\""+filename+"\">"
+ "<img src=\"data:image/png;base64," + dataURL + "\"/>"+filename
+ "</a>"
;
return;
};
client.DownloadDataAsync(new System.Uri(URL));
int i = 0;
do{
// Console.WriteLine(
// "(interval > 10): "+(interval > 10)
// +"\n(downloaded_from_URL == \"false\"): " + (downloaded_from_URL == "false")
// +"\ninterval: "+interval
// );
Thread.Sleep(interval);
i+=interval;
}
while( (downloaded_from_URL == "false") && (i < max_interval) );
return downloaded_from_URL;
}
You'd want the Task.WaitAll method (see its MSDN documentation).
Create each download as a separate task, then pass them as a collection.
A shortcut to this might be to wrap your download method in a task. Use Task.Run so that the task is actually started:
return Task.Run<downloadresult>(() => { method body });
Apologies for vagueness, working on iPad sucks for coding.
EDIT:
Another implementation of this that may be worth considering is wrapping the downloads using the parallel framework.
Since your tasks all do the same thing with a different parameter, you could instead use Parallel.ForEach and wrap that into a single task:
public System.Threading.Tasks.Task<System.Collections.Generic.IDictionary<string, byte[]>> DownloadTask(System.Collections.Generic.IEnumerable<string> urlList)
{
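// Task.Run starts the work; a task created with "new Task(...)" never runs unless Start() is called.
return System.Threading.Tasks.Task.Run<System.Collections.Generic.IDictionary<string, byte[]>>(() =>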
{
var r = new System.Collections.Concurrent.ConcurrentDictionary<string, byte[]>();
System.Threading.Tasks.Parallel.ForEach<string>(urlList, (url, s, l) =>
{
using (System.Net.WebClient client = new System.Net.WebClient())
{
var bytedata = client.DownloadData(url);
r.TryAdd(url, bytedata);
}
});
var results = new System.Collections.Generic.Dictionary<string, byte[]>();
foreach (var value in r)
{
results.Add(value.Key, value.Value);
}
return results;
});
}
This leverages a concurrent collection to support parallel access within the method before converting back to IDictionary.
This method returns a task, so it can be awaited.
Hope this provides a helpful alternative.