C# Multi-threading - Upload to FTP Server

I would like to ask for help implementing multi-threading in my C# program.
The program uploads more than 10,000 files to an FTP server. I plan to use at least 10 threads to speed up the process.
This is the code I have so far.
I have initialized 10 threads:
public ThreadStart[] threadstart = new ThreadStart[10];
public Thread[] thread = new Thread[10];
My plan is to assign one file to a thread, as follows:
file 1 > thread 1
file 2 > thread 2
file 3 > thread 3
.
.
.
file 10 > thread 10
file 11 > thread 1
.
.
.
And so I have the following:
foreach (string file in files)
{
loop++;
threadstart[loop] = new ThreadStart(() => ftp.uploadToFTP(uploadPath + @"/" + Path.GetFileName(file), file));
thread[loop] = new Thread(threadstart[loop]);
thread[loop].Start();
if (loop == 9)
{
loop = 0;
}
}
Passing the files to their respective threads works. My problem is that the thread starts overlap.
For example, while Thread 1 is still running, a new file is passed to it. This raises an error because Thread 1 has not finished its previous upload when the new parameter arrives. The same happens with the other threads.
What is the best way to implement this?
Any feedback will be greatly appreciated. Thank you! :)

Use async/await and pass the collection of file names to a single method:
private static async void TestFtpAsync(string userName, string password, string ftpBaseUri,
IEnumerable<string> fileNames)
{
var tasks = new List<Task<byte[]>>();
foreach (var fileInfo in fileNames.Select(fileName => new FileInfo(fileName)))
{
using (var webClient = new WebClient())
{
webClient.Credentials = new NetworkCredential(userName, password);
tasks.Add(webClient.UploadFileTaskAsync(ftpBaseUri + fileInfo.Name, fileInfo.FullName));
}
}
Console.WriteLine("Uploading...");
foreach (var task in tasks)
{
try
{
await task;
Console.WriteLine("Success");
}
catch (Exception ex)
{
Console.WriteLine(ex.ToString());
}
}
}
Then call it like this:
const string userName = "username";
const string password = "password";
const string ftpBaseUri = "ftp://192.168.1.1/";
var fileNames = new[] { @"d:\file0.txt", @"d:\file1.txt", @"d:\file2.txt" };
TestFtpAsync(userName, password, ftpBaseUri, fileNames);
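With 10,000 or more files, starting every upload at once may overwhelm the FTP server or the local connection limit. A minimal sketch of capping the number of uploads in flight with SemaphoreSlim follows. The ThrottledUploader class and the uploadOne delegate are hypothetical names introduced for illustration; uploadOne stands in for the real call, for example a wrapper around WebClient.UploadFileTaskAsync.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledUploader
{
    // uploadOne is a hypothetical delegate that performs one upload.
    // Returns the number of uploads that completed without an exception.
    public static async Task<int> UploadAllAsync(
        IEnumerable<string> fileNames,
        Func<string, Task> uploadOne,
        int maxConcurrency = 10)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            // ToList forces the query to run, which starts one task per file.
            // Each task waits on the semaphore before it touches the network.
            List<Task<bool>> tasks = fileNames.Select(async fileName =>
            {
                await gate.WaitAsync();   // wait for a free slot
                try
                {
                    await uploadOne(fileName);
                    return true;
                }
                catch (Exception ex)
                {
                    Console.WriteLine(fileName + " failed: " + ex.Message);
                    return false;
                }
                finally
                {
                    gate.Release();       // free the slot for the next file
                }
            }).ToList();

            bool[] results = await Task.WhenAll(tasks);
            return results.Count(ok => ok);
        }
    }
}
```

Returning Task instead of void lets the caller await the whole batch and observe how many uploads succeeded.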

Why do it the hard way?
.NET already has a class called ThreadPool.
You can just use that, and it manages the threads itself.
Your code would look like this:
static void DoSomething(object n)
{
Console.WriteLine(n);
Thread.Sleep(10);
}
static void Main(string[] args)
{
ThreadPool.SetMaxThreads(20, 10);
for (int x = 0; x < 30; x++)
{
ThreadPool.QueueUserWorkItem(new WaitCallback(DoSomething), x);
}
Console.Read();
}
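The example above keeps the process alive with Console.Read(). If the program needs to know when every queued item has finished, one option is a CountdownEvent. The sketch below is illustrative only: PoolDemo and RunAll are hypothetical names, and the counter increment stands in for the real upload work.

```csharp
using System;
using System.Threading;

public static class PoolDemo
{
    // Queues itemCount work items on the thread pool and blocks until all
    // of them have signalled completion. Returns how many items ran.
    public static int RunAll(int itemCount)
    {
        int processed = 0;
        using (var done = new CountdownEvent(itemCount))
        {
            for (int x = 0; x < itemCount; x++)
            {
                ThreadPool.QueueUserWorkItem(state =>
                {
                    try
                    {
                        // Stand-in for the real work (for example, one upload).
                        Interlocked.Increment(ref processed);
                    }
                    finally
                    {
                        done.Signal();   // always signal, even if the work throws
                    }
                }, x);
            }
            done.Wait();                 // returns once the count reaches zero
        }
        return processed;
    }
}
```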

Related

Method giving back wrong result during Task

I have a loop creating three tasks:
List<Task> tasks = new List<Task>();
foreach (DSDevice device in validdevices)
{
var task = Task.Run(() =>
{
var conf = PrepareModasConfig(device, alternativconfig);
//CHECK-Point1
string config = ModasDicToConfig(conf);
//CHECK-Point2
if (config != null)
{
//Do Stuff
}
else
{
//Do other Stuff
}
});
tasks.Add(task);
}
Task.WaitAll(tasks.ToArray());
It calls this method, where some data in a dictionary holding the default config gets overwritten:
private Dictionary<string, Dictionary<string, string>> PrepareModasConfig(DSDevice device, string alternativeconfig)
{
try
{
Dictionary<string, Dictionary<string, string>> config = new Dictionary<string, Dictionary<string, string>>(Project.project.ModasConfig.Config);
if (config.ContainsKey("[Main]"))
{
if (config["[Main]"].ContainsKey("DevName"))
{
config["[Main]"]["DevName"] = device.ID;
}
}
return config;
}
catch
{
return null;
}
}
After that, it gets converted into a string with this method:
private string ModasDicToConfig(Dictionary<string, Dictionary<string, string>> dic)
{
string s = string.Empty;
try
{
foreach (string key in dic.Keys)
{
s = s + key + "\n";
foreach (string k in dic[key].Keys)
{
s = s + k + "=" + dic[key][k] + "\n";
}
s = s + "\n";
}
return s;
}
catch
{
return null;
}
}
But every task gets the exact same string back.
At //CHECK-Point1 I check the dictionary for the changed value: correct value for each task.
At //CHECK-Point2 I check the string: same string in all 3 tasks (it should of course be different).
The default dictionary looks like this (shortened):
{
{"[Main]",
{"DevName", "Default"},
...
},
...
}
The resulting string looks like this:
[Main]
DevName=003 <--This should be different (from Device.ID)
...
[...]
EDIT:
I moved the method calls outside the task, and now I get the correct results. So I guess it has something to do with the task?
List<Task> tasks = new List<Task>();
foreach (DSDevice device in validdevices)
{
var conf = PrepareModasConfig(device, alternativconfig);
//CHECK-Point1
string config = ModasDicToConfig(conf);
//CHECK-Point2
var task = Task.Run(() =>
{
if (config != null)
{
//Do Stuff
}
else
{
//Do other Stuff
}
});
tasks.Add(task);
}
Task.WaitAll(tasks.ToArray());
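The problem isn't caused by tasks. The lambda passed to Task.Run captures the loop variable device, so by the time the tasks execute, all of them read whatever that variable currently holds. (Since C# 5, foreach declares a fresh variable for each iteration, so this applies to for loops and to foreach under older compilers.) The same problem occurs even without tasks, as this SO question shows. The following code prints 10 ten times: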
List<Action> actions = new List<Action>();
for (int i = 0; i < 10; ++i )
actions.Add(()=>Console.WriteLine(i));
foreach (Action a in actions)
a();
------
10
10
10
10
10
10
10
10
10
10
If the question's code used an Action without Task.Run, it would still produce wrong results.
One way to fix this is to copy the loop variable into a local variable and use only that copy in the lambda:
for (int i = 0; i < 10; ++i )
{
var ii=i;
actions.Add(()=>Console.WriteLine(ii));
}
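The difference between capturing the shared loop variable and capturing a per-iteration copy can be checked with a small, self-contained sketch. CaptureDemo and its method names are hypothetical, chosen only for this illustration; the third method assumes a C# 5 or later compiler.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CaptureDemo
{
    // Every lambda captures the same variable i, which is 3 when they run.
    public static List<int> SharedVariable()
    {
        var funcs = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
            funcs.Add(() => i);
        return funcs.Select(f => f()).ToList();   // 3, 3, 3
    }

    // Each lambda captures its own copy, declared inside the loop body.
    public static List<int> PerIterationCopy()
    {
        var funcs = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
        {
            int copy = i;
            funcs.Add(() => copy);
        }
        return funcs.Select(f => f()).ToList();   // 0, 1, 2
    }

    // With C# 5 or later, foreach already gives each iteration a fresh variable.
    public static List<int> ForEachVariable()
    {
        var funcs = new List<Func<int>>();
        foreach (int item in new[] { 0, 1, 2 })
            funcs.Add(() => item);
        return funcs.Select(f => f()).ToList();   // 0, 1, 2
    }
}
```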
The question's code can be fixed by copying the device loop variable into a local variable inside the loop:
foreach (DSDevice dev in validdevices)
{
var device=dev;
var task = Task.Run(() =>
{
var conf = PrepareModasConfig(device, alternativconfig);
Another way is to use Parallel.ForEach to process all items in parallel, using all available cores, without creating tasks explicitly:
Parallel.ForEach(validdevices,device=>{
var conf = PrepareModasConfig(device, alternativconfig);
string config = ModasDicToConfig(conf);
...
});
Parallel.ForEach allows limiting the number of worker tasks through the MaxDegreeOfParallelism option. It's a blocking call because it uses the current thread to process data along with any worker tasks.
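As a rough illustration of the MaxDegreeOfParallelism option, the sketch below builds one config string per device ID with a capped number of workers. BoundedParallel and BuildConfigs are hypothetical names, and the string built inside the loop is a stand-in for the question's PrepareModasConfig and ModasDicToConfig calls.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class BoundedParallel
{
    // Builds one config string per device ID, using at most maxParallel workers.
    public static ConcurrentDictionary<string, string> BuildConfigs(
        IEnumerable<string> deviceIds, int maxParallel)
    {
        // A thread-safe dictionary, because several workers write to it at once.
        var results = new ConcurrentDictionary<string, string>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        Parallel.ForEach(deviceIds, options, id =>
        {
            // Stand-in for the real per-device work.
            results[id] = "[Main]\nDevName=" + id + "\n";
        });

        // Parallel.ForEach has returned, so every item has been processed.
        return results;
    }
}
```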

How do you release a thread when it finishes

I have an application that recursively walks a very large (6 TB) folder. To speed things up, I create a new thread for each recursion. At one point my thread count was in excess of 12,000. As the task gets closer to completion, my thread count drops, but in Task Manager the thread count keeps climbing. I think that indicates that the threads are not being garbage collected when they finish.
At one point, my internal thread count showed 5575 threads while the Windows resource monitor showed the task using 33,023 threads.
static void Main(string[] args)
{
string folderName = Properties.Settings.Default.rootFolder;
ParameterizedThreadStart needleThreader = new ParameterizedThreadStart(needle);
Thread eye = new Thread(needleThreader);
threadcount = 1;
eye.Start(folderName);
}
static void needle(object objFolderName)
{
string folderName = (string)objFolderName;
FolderData folderData = getFolderData(folderName);
addToDB(folderData);
//since the above statement gets executed (my database table
//gets populated), I think the thread should get garbage collected
//here, but the windows thread count keeps climbing.
}
// recursive routine to walk directory structure and create annotated treeview
private static FolderData getFolderData(string folderName)
{
//Console.WriteLine(folderName);
long folderSize = 0;
string[] directories = new string[] { };
string[] files = new string[] { };
try
{
directories = Directory.GetDirectories(folderName);
}
catch { };
try
{
files = Directory.GetFiles(folderName);
}
catch { }
for (int f = 0; f < files.Length; f++)
{
try
{
folderSize += new FileInfo(files[f]).Length;
}
catch { } //cannot access file so skip;
}
FolderData folderData = new FolderData(folderName, directories.Length, files.Length, folderSize);
List<String> directoryList = directories.ToList<String>();
directoryList.Sort();
for (int d = 0; d < directoryList.Count; d++)
{
Console.Write(" " + threadcount + " ");
//threadcount is my internal counter. it increments here
//where i start a new thread and decrements when the thread ends
//see below
threadcount++;
ParameterizedThreadStart needleThreader = new ParameterizedThreadStart(needle);
Thread eye = new Thread(needleThreader);
eye.Start(directoryList[d]);
}
//thread is finished, so decrement
threadcount--;
return folderData;
}
Thanks to matt-dot-net's suggestion, I spent a few hours researching the TPL (Task Parallel Library), and it was well worth it.
Here is my new code. It works blazingly fast, does not peg the CPU (uses 41%, which is a lot but still plays nice in the sandbox), uses only about 160 MB of memory (instead of nearly all of the 4 GB available) and uses a maximum of about 70 threads.
You'd almost think I knew what I was doing. But the .NET TPL handles all the hard stuff, like determining the correct number of threads and making sure they clean up after themselves.
class Program
{
static object padlock = new object();
static void Main(string[] args)
{
OracleConnection ora = new OracleConnection(Properties.Settings.Default.ora);
ora.Open();
new OracleCommand("DELETE FROM SCRPT_APP.S_DRIVE_FOLDERS", ora).ExecuteNonQuery();
ora.Close();
string folderName = Properties.Settings.Default.rootFolder;
Task processRoot = new Task((value) =>
{
getFolderData(value);
}, folderName);
//wait is like join; it waits for this asynchronous task to finish.
processRoot.Start();
processRoot.Wait();
}
// recursive routine to walk directory structure and create annotated treeview
private static void getFolderData(object objFolderName)
{
string folderName = (string)objFolderName;
Console.WriteLine(folderName);
long folderSize = 0;
string[] directories = new string[] { };
string[] files = new string[] { };
try
{
directories = Directory.GetDirectories(folderName);
}
catch { };
try
{
files = Directory.GetFiles(folderName);
}
catch { }
for (int f = 0; f < files.Length; f++)
{
try
{
folderSize += new FileInfo(files[f]).Length;
}
catch { } //cannot access file so skip;
}
FolderData folderData = new FolderData(folderName, directories.Length, files.Length, folderSize);
List<String> directoryList = directories.ToList<String>();
directoryList.Sort();
//create a task for each subdirectory
List<Task> dirTasks = new List<Task>();
for (int d = 0; d < directoryList.Count; d++)
{
dirTasks.Add(new Task((value) =>
{
getFolderData(value);
}, directoryList[d]));
}
//start all tasks
foreach (Task task in dirTasks)
{
task.Start();
}
//wait for them to finish
Task.WaitAll(dirTasks.ToArray());
addToDB(folderData);
}
private static void addToDB(FolderData folderData)
{
lock (padlock)
{
OracleConnection ora = new OracleConnection(Properties.Settings.Default.ora);
ora.Open();
OracleCommand addFolderData = new OracleCommand(
"INSERT INTO FOLDERS " +
"(PATH, FOLDERS, FILES, SPACE_USED) " +
"VALUES " +
"(:PATH, :FOLDERS, :FILES, :SPACE_USED) ",
ora);
addFolderData.BindByName = true;
addFolderData.Parameters.Add(":PATH", OracleDbType.Varchar2);
addFolderData.Parameters.Add(":FOLDERS", OracleDbType.Int32);
addFolderData.Parameters.Add(":FILES", OracleDbType.Int32);
addFolderData.Parameters.Add(":SPACE_USED", OracleDbType.Int64);
addFolderData.Prepare();
addFolderData.Parameters[":PATH"].Value = folderData.FolderName;
addFolderData.Parameters[":FOLDERS"].Value = folderData.FolderCount;
addFolderData.Parameters[":FILES"].Value = folderData.FileCount;
addFolderData.Parameters[":SPACE_USED"].Value = folderData.Size;
addFolderData.ExecuteNonQuery();
ora.Close();
}
}
}
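A more compact variant of the same idea, sketched here with Parallel.ForEach instead of explicitly created tasks, is shown below. It is not the poster's code: FolderWalker and TotalSize are hypothetical names, and the sketch only sums file sizes rather than writing to a database.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class FolderWalker
{
    // Returns the total size in bytes of all readable files under folder.
    public static long TotalSize(string folder)
    {
        long size = 0;
        string[] files = new string[0];
        string[] directories = new string[0];

        // Folders that cannot be listed are treated as empty.
        try { files = Directory.GetFiles(folder); }
        catch (UnauthorizedAccessException) { }
        catch (IOException) { }

        try { directories = Directory.GetDirectories(folder); }
        catch (UnauthorizedAccessException) { }
        catch (IOException) { }

        foreach (string file in files)
        {
            try { size += new FileInfo(file).Length; }
            catch (UnauthorizedAccessException) { }   // cannot access file, so skip
            catch (IOException) { }
        }

        // Recurse into subfolders in parallel; the TPL decides how many
        // threads to use. Interlocked.Add keeps the shared total consistent.
        Parallel.ForEach(directories, directory =>
        {
            long subTotal = TotalSize(directory);
            Interlocked.Add(ref size, subTotal);
        });

        return size;
    }
}
```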

Calling API asynchronously in c#

I need to call an API 5000 times. With the current logic, the calls happen synchronously, one by one. Is there any way to call it asynchronously without waiting for each API response? Code below.
while (true)
{
using (HttpClient httpclient = new HttpClient())
{// ***Want to call the API Asynchronously***
for (int i = 0; i < 5000; i++)
{
DateTime dt = DateTime.Now;
dt = dt.AddSeconds(-dt.Second);
Log[] data1 = new Log[]
{
log =new Log(){LogID=Guid.NewGuid(),LogLevel=new LogLevel(){ },Message="Maverick_Messgaes",Source="Maverick",StackTrace="Maverick Started",
Time=dt,Traceid="1"},
};
var response4 = httpclient.PostAsJsonAsync("http://localhost:8095/api/Log/PostAsync", data1).Result;
}
}
//logstack.Clear();
Console.WriteLine(log.Message + log.Time + " ");
Thread.Sleep(120000);
Console.WriteLine(" " + " 5000 messages Sent.. Iterating Again" + "" + DateTime.Now.ToString());
}
}
catch(Exception ex)
{ throw ex; }
}
You could replace your for-loop with a Parallel.For loop to run the code within the loop in parallel.
This guide provides a good introduction with examples: https://learn.microsoft.com/en-us/dotnet/standard/parallel-programming/how-to-write-a-simple-parallel-for-loop
In its simplest form, it would look like:
Parallel.For(0, 5000, i =>
{
// code within existing for-loop goes here...
});
If you're concerned about the number of concurrent tasks, there are overloads that take a ParallelOptions parameter, within which you can specify the MaxDegreeOfParallelism.
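A minimal sketch of the ParallelOptions overload might look like the following. ParallelSender and SendAll are hypothetical names, and the send callback stands in for one blocking API call such as the PostAsJsonAsync(...).Result line in the question.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelSender
{
    // Runs send(i) for i in [0, count) on at most maxParallel workers and
    // returns how many calls completed.
    public static int SendAll(int count, int maxParallel, Action<int> send)
    {
        int completed = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        Parallel.For(0, count, options, i =>
        {
            send(i);                               // one blocking call per index
            Interlocked.Increment(ref completed);  // thread-safe counter
        });

        return completed;
    }
}
```

Parallel.For blocks until all iterations have finished, so the return value is the final count.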
Looks like you are not doing anything with the result so I am assuming you don't need to return it.
You need to make your method async and await the asynchronous HttpClient call, as in the code below.
static void Main(string[] args)
{
using (var client = new HttpClient())
{
for (int i = 0; i < 10; i++)
{
Console.WriteLine("Continuing iteration " + i);
PostData(client);
}
Console.ReadKey();
}
}
static async void PostData(HttpClient client)
{
await client.GetStringAsync("https://www.google.com.ph");
Console.WriteLine("Async call done");
}
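An async void method cannot be awaited, so the caller has no way to learn when the calls finish or whether they failed. A hedged alternative is to return Task from each call and await them together with Task.WhenAll. In this sketch, AsyncPoster and PostAllAsync are hypothetical names, and postOne stands in for the real HttpClient call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class AsyncPoster
{
    // postOne is a hypothetical delegate that sends one request and
    // reports whether it succeeded.
    public static async Task<int> PostAllAsync(int count, Func<int, Task<bool>> postOne)
    {
        var pending = new List<Task<bool>>();
        for (int i = 0; i < count; i++)
        {
            // Starts the call and moves on without waiting for the response.
            pending.Add(postOne(i));
        }

        // Completes once every call has finished.
        bool[] results = await Task.WhenAll(pending);
        return results.Count(ok => ok);
    }
}
```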

I want to read the content of 10 different websites and save them into 10 different text files

I have been able to read the content of a web page and save it into a file. The question is: how do I do this for 10 different web pages without having to repeat the code over and over again? Is there a loop mechanism that can help? Here is what I have done:
static void Main(string[] args)
{
Task Task1 = new Task(() => ReadWriteWeb("http://www.hawaii.edu"));
Task1.Start();
Console.ReadLine();
}
static void ReadWriteWeb(string Url)
{
try
{
using (WebClient WebC = new WebClient())
{
string WebContents = WebC.DownloadString(Url);
Console.WriteLine(WebContents);
using (StreamWriter SW = new StreamWriter("myFile"))
// Concatenate instead of using a format string: page content may contain '{' or '}'.
SW.WriteLine(WebContents + ". The length of file is " + WebContents.Length);
}
}
catch (Exception e)
{
Console.WriteLine("The web content cannot be reached");
Console.WriteLine(e.Message);
}
}
Forgive me, but assuming you are already familiar with loops, here is the answer I think you are looking for. It starts all the downloads at once by looping over a list of tasks with PLINQ (AsParallel).
public class Program
{
volatile static int fileNameCounter = 1;
static void Main(string[] args)
{
var listOfTasks = new List<Task>()
{
new Task(() => ReadWriteWeb("http://www.hawaii.edu")),
new Task(() => ReadWriteWeb("http://www.hawaii.edu")),
new Task(() => ReadWriteWeb("http://www.hawaii.edu")),
new Task(() => ReadWriteWeb("http://www.hawaii.edu"))
};
listOfTasks.AsParallel().ForAll(task => task.Start());
Console.ReadLine();
}
static async void ReadWriteWeb(string Url)
{
Console.WriteLine($"File {fileNameCounter} complete");
try
{
using (WebClient WebC = new WebClient())
{
string WebContents = await WebC.DownloadStringTaskAsync(Url);
Console.WriteLine(WebContents);
using (StreamWriter SW = new StreamWriter($"myFile{fileNameCounter++}"))
// Concatenate instead of using a format string: page content may contain '{' or '}'.
SW.WriteLine(WebContents + ". The length of file is " + WebContents.Length);
}
}
catch (Exception e)
{
Console.WriteLine("The web content cannot be reached");
Console.WriteLine(e.Message);
}
}
}
Since I only had the one web url I listed the same one 4 times. You get the idea...
Standard looping constructs in C# include the for loop and the foreach loop.
In general, anything you could possibly want to know about C# can be found in the reference on MSDN.
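To make the loop idea concrete, here is a minimal sketch that takes a list of URLs and writes each page to its own numbered file. PageSaver and SaveAllAsync are hypothetical names, and the fetch delegate is an assumption standing in for a real download call such as WebClient.DownloadStringTaskAsync.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

public static class PageSaver
{
    // Downloads every URL through fetch and writes page1.txt, page2.txt, ...
    // into outputDir. Returns the paths in the same order as the URLs.
    public static async Task<List<string>> SaveAllAsync(
        IList<string> urls, Func<string, Task<string>> fetch, string outputDir)
    {
        // The index comes from Select, so no shared counter is needed.
        IEnumerable<Task<string>> jobs = urls.Select(async (url, index) =>
        {
            string path = Path.Combine(outputDir, "page" + (index + 1) + ".txt");
            string content = await fetch(url);
            File.WriteAllText(path, content);
            return path;
        });

        string[] paths = await Task.WhenAll(jobs);
        return paths.ToList();
    }
}
```

Deriving the file name from the position in the list avoids the race that can occur when several downloads increment one shared counter.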

C# Task.WaitAll isn't waiting

My aim is to download images from an Amazon Web Services bucket.
I have the following function, which downloads multiple images at once:
public static void DownloadFilesFromAWS(string bucketName, List<string> imageNames)
{
int batchSize = 50;
int maxDownloadMilliseconds = 10000;
List<Task> tasks = new List<Task>();
for (int i = 0; i < imageNames.Count; i++)
{
string imageName = imageNames[i];
Task task = Task.Run(() => GetFile(bucketName, imageName));
tasks.Add(task);
if (tasks.Count > 0 && tasks.Count % batchSize == 0)
{
Task.WaitAll(tasks.ToArray(), maxDownloadMilliseconds);//wait to download
tasks.Clear();
}
}
//if there are any left, wait for them
Task.WaitAll(tasks.ToArray(), maxDownloadMilliseconds);
}
private static void GetFile(string bucketName, string filename)
{
try
{
using (AmazonS3Client awsClient = new AmazonS3Client(Amazon.RegionEndpoint.EUWest1))
{
string key = Path.GetFileName(filename);
GetObjectRequest getObjectRequest = new GetObjectRequest() {
BucketName = bucketName,
Key = key
};
using (GetObjectResponse response = awsClient.GetObject(getObjectRequest))
{
string directory = Path.GetDirectoryName(filename);
if (!Directory.Exists(directory))
{
Directory.CreateDirectory(directory);
}
if (!File.Exists(filename))
{
response.WriteResponseStreamToFile(filename);
}
}
}
}
catch (AmazonS3Exception amazonS3Exception)
{
if (amazonS3Exception.ErrorCode == "NoSuchKey")
{
return;
}
if (amazonS3Exception.ErrorCode != null && (amazonS3Exception.ErrorCode.Equals("InvalidAccessKeyId") || amazonS3Exception.ErrorCode.Equals("InvalidSecurity")))
{
// Log AWS invalid credentials
throw new ApplicationException("AWS Invalid Credentials");
}
else
{
// Log generic AWS exception
throw new ApplicationException("AWS Exception: " + amazonS3Exception.Message);
}
}
catch
{
//
}
}
Downloading the images works fine, but the Task.WaitAll seems to be ignored and the rest of the code continues to execute, meaning I try to get files that do not exist yet (because they have not been downloaded).
I found this answer to another question which seems to be the same as mine. I tried to use the answer to change my code but it still wouldn't wait for all files to be downloaded.
Can anyone tell me where I am going wrong?
The code behaves as expected. Task.WaitAll returns after ten seconds even when not all files have been downloaded, because you have specified a timeout of 10 seconds (10000 milliseconds) in variable maxDownloadMilliseconds.
If you really want to wait for all downloads to finish, call Task.WaitAll without specifying a timeout.
Use
Task.WaitAll(tasks.ToArray());//wait to download
at both places.
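The timeout overload of Task.WaitAll returns a bool that says whether all tasks completed in time, which makes the behaviour easy to observe. The sketch below is illustrative: WaitDemo and AllFinishedWithin are hypothetical names, and Thread.Sleep stands in for a download.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class WaitDemo
{
    // Starts two tasks that each take workMilliseconds and reports whether
    // both finished before the timeout expired.
    public static bool AllFinishedWithin(int workMilliseconds, int timeoutMilliseconds)
    {
        Task[] tasks =
        {
            Task.Run(() => Thread.Sleep(workMilliseconds)),
            Task.Run(() => Thread.Sleep(workMilliseconds))
        };

        // Returns false if the timeout expires first; the tasks keep running.
        bool finished = Task.WaitAll(tasks, timeoutMilliseconds);

        // The overload without a timeout blocks until every task is done.
        Task.WaitAll(tasks);
        return finished;
    }
}
```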
To see some good explanations on how to implement parallel downloads while not stressing the system (only have a maximum number of parallel downloads), see the answer at How can I limit Parallel.ForEach?
