Hi, I am using multiple threads to copy many files from a source to multiple network destinations; each thread copies a bulk of files to a different network share.
I use .NET File.Copy(...).
I see 100% usage on only one network at any given moment; which network is at 100% changes from moment to moment.
When I changed the destinations to local ones, I saw the copied bytes balanced across all threads.
When I ran 10 processes (each one targeting a different destination) instead of 10 threads, I got all 10 networks at 100% usage.
I use .NET 4.5.
Any idea?
I'd suggest you replace threads (good for CPU-bound operations) with the async/await model, which performs excellently for IO-bound operations.
Let's wrap the File.Copy operation as:
public static async Task Copy(string src, string dest)
{
    await Task.Run(() =>
    {
        System.IO.File.Copy(src, dest);
    });
}
You can call it from the calling method as in the snippet below. Collecting the tasks and awaiting them ensures no copy failure goes unobserved:
var srcPath = "your source location";
var dstPath = "your destination location";
var copies = new List<Task>();
foreach (var file in System.IO.Directory.EnumerateFiles(srcPath))
{
    var dstFile = System.IO.Path.Combine(dstPath, System.IO.Path.GetFileName(file));
    copies.Add(Copy(file, dstFile));
}
await Task.WhenAll(copies);
Now you can simply pump this method with source and destination paths as fast as you like. Your limitation will be IO speed (disk, network, etc.), but your CPU will be mostly free.
Have you taken a look at asynchronous operations? There is some really good documentation on MSDN, specifically about async IO.
There are also some questions about async IO on Stack Overflow, such as this one.
Using Task.Run you can queue all of your file copy operations asynchronously.
Try something like:
List<string> fileList = new List<string>();
fileList.Add("TextFile1.txt");
fileList.Add("TextFile2.txt");

Parallel.For(0, fileList.Count, x =>
{
    File.Copy(fileList[x], @"C:\" + fileList[x]);
});
Change C:\ to match your multiple destinations. If you provide the original code, we could do more.
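For completeness, here is a minimal sketch of the Task.Run approach mentioned above, reusing fileList from the snippet and a placeholder destination; the tasks are collected so they can be waited on together:

var copyTasks = fileList
    .Select(file => Task.Run(() => File.Copy(file, Path.Combine(@"C:\dest1", file))))
    .ToList();

// Wait for every queued copy to finish and surface any exceptions.
Task.WaitAll(copyTasks.ToArray());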
I have a Windows Forms app that works well on my development machine. However, I see strange behavior when trying to run multiple tasks in parallel after publishing the application. There is no error, but it doesn't work as expected. Here is the code:
private async void Button1_Click(object sender, EventArgs e)
{
    button1.Enabled = false;
    try
    {
        var watch = Stopwatch.StartNew();
        textBox1.Text = "Processing...";
        await SyncAppDbAsync();
        watch.Stop();
        var time = watch.ElapsedMilliseconds;
        textBox1.Text = $"End successfully. Minutes: {String.Format("{0:0.00}", (double)(time / 1000) / 60)}";
    }
    catch (Exception ex)
    {
        textBox1.Text = $"Message: {ex.Message}, Source: {ex.Source}, HResult: {ex.InnerException}";
    }
}
public async Task SyncAppDbAsync()
{
    // delete table rows
    // I block the UI for some seconds because I don't want to write
    // a record if it is not deleted
    Task.WaitAll(
        AgenteApp.RemoveAllAgentiAppAsync(),
        RubricaApp.RemoveAllRubricheAppAsync(),
        ...
    );

    // read data from the database
    var readAgents = Task.Run(Agent.GetAgentAsync);
    var readAddressBooks = Task.Run(AddressBook.GetAddressBookAsync);
    ...

    await Task.WhenAll(
        readAgents,
        readAddressBooks,
        ...
    );

    // save data to the SQLite database (myDb.db)
    var addAgenti = Task.Run(async () =>
    {
        var progrIndicator = new Progress<int>(AgentiProgress);
        var agenti = AgenteApp.FillAgentiAppFromCompanyAsync(await readAgents, progrIndicator);
        await AgenteApp.AddAgentiAppAsync(await agenti);
    });

    var addRubriche = Task.Run(async () =>
    {
        var progrIndicator = new Progress<int>(RubricheProgress);
        var rubriche = RubricaApp.FillRubricheAppFromCompanyAsync(await readAddressBooks, progrIndicator);
        await RubricaApp.AddRubricheAppAsync(await rubriche);
    });

    await Task.WhenAll(
        addAgenti,
        addRubriche,
        ...
    );
}
Each task in that code corresponds to a table in an SQLite database. The code reads data from one SQLite database and writes to another SQLite database.
I expect this code to take a few minutes to run. In the meantime, there is a progress bar for each table that should update. Instead, the code runs in just a few seconds, the progress bars never update, and the database tables are unchanged. I see this text in my textbox at the end: End successfully. Minutes: 0,02.
What can I do to understand the problem and fix it? Again, this works correctly on my development machine.
UPDATE:
Sorry everyone: the code works perfectly fine! I made a stupid mistake with the path of the SQLite database, which I hardcoded in app.config.
I'd welcome suggestions on how to make that path dynamic.
So again, sorry.
There's not enough information in the question at the time I'm writing this for me to evaluate the problem. But I can at least give some strategies that will help you find a solution on your own:
Add excessive and detailed logging to the code (You can remove it later). That will help you understand what is happening as the program runs, and potentially see where it goes wrong. Run this in production if you have to, but preferably:
If you don't already have one, get a staging or QA environment separate from your own machine, (use a local VM if you really have to) where you can reproduce the problem on demand, away from production. The logging information from the previous step may help with this.
Look for exceptions that might be hidden by the async code; make sure you're checking the result of each of those operations (see the sketch after this list).
Remove most of the code. The program will be incomplete, but it will run that incomplete section as expected. Keep adding more small chunks of the complete program back until it breaks again. At this point, you will (probably) know where the issue is... though it could be a race condition caused by an earlier block, but at least you'll have a clue where to start looking.
Unroll the async code and run everything using traditional synchronous methods. Make sure the simple synchronous code works in the production environment before trying to add parallelism.
When you finally track down this issue, make sure you have a unit test that will detect the problem in the future before it goes to production, to avoid a regression.
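On the third point, here is a minimal sketch of how Task.Run can hide an exception until the task is observed (the names are illustrative, not from the question):

// Fire-and-forget: the exception below is lost until the task is observed.
var task = Task.Run(() => throw new InvalidOperationException("hidden"));

try
{
    task.Wait(); // Waiting (or awaiting) surfaces the exception.
}
catch (AggregateException ex)
{
    Console.WriteLine($"Observed: {ex.InnerException?.Message}");
}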
I have a Windows Forms application that currently does the following:
1) points at a directory and does 2) for all the XML files in there (usually a maximum of 25 files, ranging from 10 MB to 5 GB (!) - uncommon but possible)
2) reads/writes the XML to alter some of the existing XML attributes (currently I use a single BackgroundWorker for that)
3) writes the altered XML attributes directly to a NEW file in a different directory
The little app works fine, but it takes far too long to finish (about 20 minutes, depending on the total size in GB).
What I casually tried was starting the main read/write method in a Parallel.ForEach(), but unsurprisingly it blocked itself and exited.
My idea would be to parallelize the read/write process by starting it on all ~25 files at the same time. Is this wise? How can I do it (TPL?) without locking myself out?
PS: I have quite a powerful desktop PC, with a 1 TB Samsung Pro SSD, 16 GB of RAM, and an Intel Core i7.
You can use a thread pool for this approach, for example a pool sized for 20 files.
Since you have a Core i7, you should use TaskFactory.StartNew.
In this case, you should encapsulate the code for processing one file in a class such as XMLProcessor; then, with TaskFactory.StartNew, you can use multithreading for the XML processing.
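A minimal sketch of that idea; XMLProcessor and its Process method are assumptions for illustration, not an existing API:

using System.IO;
using System.Linq;
using System.Threading.Tasks;

public class XMLProcessor
{
    // Hypothetical: read one XML file, alter attributes, write to the output directory.
    public void Process(string inputFile, string outputDir)
    {
        // read/alter/write logic goes here
    }
}

public static class Runner
{
    public static void ProcessAll(string inputDir, string outputDir)
    {
        var tasks = Directory.EnumerateFiles(inputDir, "*.xml")
            .Select(file => Task.Factory.StartNew(
                () => new XMLProcessor().Process(file, outputDir),
                TaskCreationOptions.LongRunning)) // hint that each task is a long file operation
            .ToArray();

        Task.WaitAll(tasks);
    }
}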
This sounds like a job for data parallelism via PLINQ + asynchronous lambdas.
I recently needed to process data from a zip archive that itself contained 5,200 zip archives which then each contained one or more data files in XML or CSV format. In total, between 40-60 GB of data when decompressed and read into memory.
The algorithm browses through this data, makes decisions based on what it finds in conjunction with supplied predicates, and finally writes the selections to disk as 1.0-1.5 GB files. Using an async PLINQ pattern with 32 processors, the average run time for each output file was 4.23 minutes.
After implementing the straightforward solution with async PLINQ, I spent some time trying to improve the running time by digging down into the TPL and TPL Dataflow libraries. In the end, attempting to beat async PLINQ proved to be a fun but ultimately fruitless exercise for my needs. The performance margins from the more "optimized" solutions were not worth the added complexity.
Below is an example of the async PLINQ pattern. The initial collection is an array of file paths.
In the first step, each file path is asynchronously read into memory and parsed, the file name is cached as a root-level attribute, and streamed to the next function.
In the last step, each XElement is asynchronously written to a new file.
I recommend that you play around with the lambda that reads the files. In my case, I found that reading via an async lambda gave me better throughput while decompressing files in memory.
However, for simple XML documents, you may be better off replacing the first async lambda with a method call to XElement.Load(string file) and letting PLINQ read as needed (see the variant after the example below).
using System;
using System.IO;
using System.Linq;
using System.Threading.Tasks;
using System.Xml.Linq;

namespace AsyncPlinqExample
{
    public class Program
    {
        public static void Main(string[] args)
        {
            // Limit parallelism here if needed
            int degreeOfParallelism = Environment.ProcessorCount;

            string resultDirectory = "[result directory path here]";
            string[] files = Directory.GetFiles("[directory with files here]");

            var tasks =
                files.AsParallel()
                     .WithDegreeOfParallelism(degreeOfParallelism)
                     .Select(async x =>
                     {
                         // Read and parse asynchronously; cache the file name as a root-level attribute.
                         using (StreamReader reader = new StreamReader(x))
                         {
                             XElement root = XElement.Parse(await reader.ReadToEndAsync());
                             root.SetAttributeValue("fileName", Path.GetFileName(x));
                             return root;
                         }
                     })
                     .Select(x => x.Result)
                     .Select(x =>
                     {
                         // Perform other manipulations here
                         return x;
                     })
                     .Select(async x =>
                     {
                         string fileName = (string)x.Attribute("fileName");
                         string resultPath = Path.Combine(resultDirectory, fileName);

                         await Console.Out.WriteLineAsync($"{DateTime.Now}: Starting {fileName}.");

                         using (StreamWriter writer = new StreamWriter(resultPath))
                         {
                             await writer.WriteAsync(x.ToString());
                         }

                         await Console.Out.WriteLineAsync($"{DateTime.Now}: Completed {fileName}.");
                     })
                     .ToArray();

            // PLINQ is lazy: ToArray() forces execution; then wait for all writes to finish.
            Task.WaitAll(tasks);
        }
    }
}
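As mentioned above, here is a variant of the first stage that lets PLINQ read synchronously via XElement.Load; this is an alternative sketch, not the pattern I benchmarked:

// Alternative first stage: synchronous load, PLINQ schedules the reads.
files.AsParallel()
     .WithDegreeOfParallelism(degreeOfParallelism)
     .Select(x =>
     {
         XElement root = XElement.Load(x);
         root.SetAttributeValue("fileName", Path.GetFileName(x));
         return root;
     })
     // ... remaining stages unchanged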
I have a method, GetThumbnailPhotoForProfile(), to which I pass a List<EmployeeInfo> that has a few properties like FirstName, LastName, and EmailAddress. Within the method I use the following code, which, for each email address, connects to Active Directory, fetches the profile image, and saves it to a location (C:\Temp\Images):
private bool GetThumbnailPhotoForProfile(List<EmployeeInfo> employeeInfoList)
{
    Parallel.ForEach(employeeInfoList.Select(emp => emp.EmailAddress).ToList(), emailAddress =>
    {
        var thumbnail = emailAddress.GetThumbnailPhoto();
        if (thumbnail != null)
        {
            thumbnail.Save(@"C:\Test\Images\" + emailAddress + ".jpg", ImageFormat.Jpeg);
        }
        else
        {
            Log.Information("Thumbnail photo is not available for the {EmailAddress}.", emailAddress);
        }
    });
    return true; // the original snippet declared bool but returned nothing
}
I initially used a foreach loop and found that the method was taking more time; then I updated the code to use Parallel.ForEach and found that it takes less time. I tested it many times and found the same performance difference.
Here I want to know whether the above method is thread-safe and efficient, or whether there is a better way to implement it.
Can anyone please guide me here?
The question of whether the code is thread safe cannot be answered based on just the code you showed. It depends on what exactly GetThumbnailPhoto() and Save() do, and possibly on how EmployeeInfoList is constructed.
I have two identical processes running on two different computers, both accessing a shared folder on a third computer. This shared folder contains directories, so how do I make sure these two processes do not access and start working on the same directory? In other words, if one is already working on a directory, the other should skip it.
I tried creating a temp file in the directory using the File.Create() method.
MSDN states that:
The FileStream object created by this method has a default FileShare value of None; no other process or code can access the created file until the original file handle is closed.
So my logic was to skip the directory if the temp file was already there, or if an exception was thrown while trying to create it.
Unfortunately this does not seem to work. So my questions are:
Does what MSDN says hold true even if the files are created in a shared folder by processes running on different computers, or is it only valid for local drives?
Is there a way to actually accomplish what I am trying to do?
Your best option is to create a true temporary file.
Use this code around your directory processing:
public void ProcessDirectory(string path)
{
    using (var lockFile = File.Create(Path.Combine(path, "process.lock"), 256, FileOptions.DeleteOnClose))
    {
        // process the directory here
    }
}
Notice the FileOptions.DeleteOnClose there. Even if your process terminates abnormally, the file system will delete the file for you when it is closed; in the case of abnormal termination, the file is closed as part of the process teardown. In the case of a hard power failure, though, the file might be left behind.
Note that this method will throw an IOException if there is already a lock there; you need to handle this and move on.
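A minimal sketch of that handling, calling the ProcessDirectory method above:

try
{
    ProcessDirectory(path);
}
catch (IOException)
{
    // Another process holds the lock file; skip this directory and move on.
}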
Your idea is right; perhaps you made a mistake in the implementation?
Here is working code:
private const string _lockFileName = @"c:\whatever\lock";
private FileStream _lockFile;

public bool Lock()
{
    try
    {
        _lockFile = File.Open(_lockFileName, FileMode.Create, FileAccess.ReadWrite, FileShare.None);
        return true;
    }
    catch { } // perhaps catch a specific exception type?

    return false;
}

public void Unlock()
{
    if (_lockFile != null)
    {
        _lockFile.Close();
        File.Delete(_lockFileName);
        _lockFile = null;
    }
}
At the start of using a folder, call Lock(), and work with the folder only if it returns true. At the end, call Unlock(); it's safe to call even if Lock() returned false.
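A minimal usage sketch of the two methods above:

if (Lock())
{
    try
    {
        // work with the folder here
    }
    finally
    {
        Unlock();
    }
}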
You can write a lock.conf file in the directory when you start working on it. You then remove the file (releasing the lock) when you are done.
Two considerations:
Set a timestamp on the lock file, so that the lock can be ignored after a predetermined time interval (in order to avoid the lock remaining there if the program crashes)
Use a function that raises an exception when creating a file if the file is already there, to avoid race conditions (both points are sketched below).
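A minimal sketch covering both considerations; the lock.conf name comes from above, while the 30-minute stale interval is an assumption:

using System;
using System.IO;

public static class DirectoryLock
{
    private static readonly TimeSpan MaxAge = TimeSpan.FromMinutes(30); // assumed stale-lock interval

    public static bool TryAcquire(string directory)
    {
        string lockPath = Path.Combine(directory, "lock.conf");

        // Consideration 1: ignore (delete) a stale lock left behind by a crashed program.
        if (File.Exists(lockPath) &&
            DateTime.UtcNow - File.GetLastWriteTimeUtc(lockPath) > MaxAge)
        {
            File.Delete(lockPath);
        }

        try
        {
            // Consideration 2: CreateNew fails if the file already exists,
            // so creation doubles as the existence check (no separate race).
            using (File.Open(lockPath, FileMode.CreateNew, FileAccess.Write, FileShare.None)) { }
            return true;
        }
        catch (IOException)
        {
            return false; // someone else holds the lock
        }
    }

    public static void Release(string directory)
    {
        File.Delete(Path.Combine(directory, "lock.conf"));
    }
}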
I am designing and developing an API where multiple threads download files from the net and then write them to disk.
If it is used incorrectly, the same file could be downloaded and written by more than one thread, which will lead to an exception at the moment of writing to disk.
I would like to avoid this problem with a lock() { ... } around the part that writes the file, but obviously I don't want to lock on a global object, just on something related to that specific file, so that not all threads are blocked whenever a file is written.
I hope this question is understandable.
So what you want is to synchronize a bunch of actions based on a given key. In this case, that key can be an absolute file name. We can implement this as a dictionary that maps a key to some synchronization object. This could be either an object to lock on, if we want a blocking synchronization mechanism, or a Task if we want an asynchronous way of running the code when appropriate; I went with the latter. I also went with a ConcurrentDictionary to let it handle the synchronization, rather than handling it manually, and used Lazy to ensure that each task is created exactly once:
public class KeyedSynchronizer<TKey>
{
    private ConcurrentDictionary<TKey, Lazy<Task>> dictionary;

    public KeyedSynchronizer(IEqualityComparer<TKey> comparer = null)
    {
        dictionary = new ConcurrentDictionary<TKey, Lazy<Task>>(
            comparer ?? EqualityComparer<TKey>.Default);
    }

    public Task ActOnKey(TKey key, Action action)
    {
        var dictionaryValue = dictionary.AddOrUpdate(key,
            new Lazy<Task>(() => Task.Run(action)),
            (_, task) => new Lazy<Task>(() =>
                task.Value.ContinueWith(t => action())));
        return dictionaryValue.Value;
    }

    public static readonly KeyedSynchronizer<TKey> Default =
        new KeyedSynchronizer<TKey>();
}
You can now create an instance of this synchronizer and then specify actions along with the keys (files) they correspond to. You can be confident that an action won't be executed until any previous actions on that file have completed. If you want to wait until the action completes, you can Wait on the returned task; if you don't need to wait, simply don't. This also allows you to do your processing asynchronously by awaiting the task. For example:
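A minimal usage sketch; the path and GetDownloadedBytes are placeholders:

// Queue a write for this file; a second call with the same path will
// run only after this action completes.
byte[] data = GetDownloadedBytes(); // hypothetical download result
Task task = KeyedSynchronizer<string>.Default.ActOnKey(
    @"C:\downloads\file.bin",
    () => File.WriteAllBytes(@"C:\downloads\file.bin", data));

task.Wait(); // or await it, if the caller needs the write to finish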
You may consider using ReaderWriterLockSlim
http://msdn.microsoft.com/en-us/library/system.threading.readerwriterlockslim.aspx
private ReaderWriterLockSlim fileLock = new ReaderWriterLockSlim();

fileLock.EnterWriteLock();
try
{
    // write your file here
}
finally
{
    fileLock.ExitWriteLock();
}
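Note that a single ReaderWriterLockSlim serializes all writers. To lock per file, as the question asks, one option (my assumption, not part of the linked documentation) is to keep one lock per path in a ConcurrentDictionary:

private static readonly ConcurrentDictionary<string, ReaderWriterLockSlim> _fileLocks =
    new ConcurrentDictionary<string, ReaderWriterLockSlim>();

void WriteFile(string path, byte[] data)
{
    // One lock per absolute path; only writers to the same file contend.
    var fileLock = _fileLocks.GetOrAdd(path, _ => new ReaderWriterLockSlim());
    fileLock.EnterWriteLock();
    try
    {
        File.WriteAllBytes(path, data);
    }
    finally
    {
        fileLock.ExitWriteLock();
    }
}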
I had a similar situation, and resolved it by lock()ing on the StreamWriter object in question:
private Dictionary<string, StreamWriter> _writers; // Consider using a thread-safe dictionary

void WriteContent(string file, string content)
{
    StreamWriter writer;
    if (_writers.TryGetValue(file, out writer))
    {
        lock (writer)
        {
            writer.Write(content);
        }
    }
    // Else handle missing writer
}
That's from memory, so it may not compile. I'd read up on Andrew's solution (I will be), as it may be more exactly what you need... but this is super simple if you just want something quick and dirty.
I'll make this an answer, with some explanation.
Windows already has something like what you want. The idea behind it is simple: allow multiple processes to access the same file and carry out all writing/reading operations so that 1) all processes operate on the most recent data in that file, and 2) multiple writes or reads occur without waiting (where possible).
It's called Memory-Mapped Files. I was using it mostly for IPC (without a backing file), so I can't provide an example from my own code, but a minimal sketch follows.
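A minimal MMF sketch (the file name, map name, and sizes are placeholders, not from my original use):

using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Text;

class MmfDemo
{
    static void Main()
    {
        // Map a file into memory; other processes can map the same file or map name.
        using (var mmf = MemoryMappedFile.CreateFromFile(
                   "shared.dat", FileMode.OpenOrCreate, "sharedMap", 1024))
        using (var accessor = mmf.CreateViewAccessor())
        {
            byte[] payload = Encoding.UTF8.GetBytes("hello");
            accessor.Write(0, payload.Length);                  // length prefix
            accessor.WriteArray(4, payload, 0, payload.Length); // payload bytes
        }
    }
}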
You could mimic MMF behavior yourself with a buffer and a sort of layer on top of it that redirects all reading/writing operations to the buffer and periodically flushes updated content to the physical file.
P.S.: also look into file sharing (opening a file for shared reading/writing).