Use multithreading for multiple file copies

I have to copy a large number of files (10,000 files), and it takes a long time to copy them.
I have tried using two threads instead of a single thread: one copies the odd-numbered files in the list and the other copies the even-numbered ones.
I have used this code:
ThreadPool.QueueUserWorkItem(new WaitCallback(this.RunFileCopy),object)
but there is no significant difference in time between using a single thread and using two threads.
What could be the reason for this?

File copying is not a CPU-bound process, it is an I/O-bound process, so multithreading or parallelism won't help you.
Multithreading will slow you down in almost all cases. Even if the disk is an SSD, it has a limited read/write speed, and a single thread will already use it efficiently. If you use parallelism, you will just split that speed into pieces, and on an HDD this creates a huge overhead.
Multithreading only helps when more than one disk is involved, that is, when you read from different disks and write to different disks.
If the files are very small, zipping them and unzipping them on the target drive can be faster in most cases, and if you zip the files with low compression it will be quite a bit faster.
using System.IO;
using System.IO.Compression;
.....
string startPath = @"c:\example\start";
string zipPath = @"c:\example\result.zip";
string extractPath = @"c:\example\extract";
ZipFile.CreateFromDirectory(startPath, zipPath, CompressionLevel.Fastest, true);
ZipFile.ExtractToDirectory(zipPath, extractPath);
More implementation details here
How to: Compress and Extract Files

I'm going to provide a minority opinion here. Everybody is telling you that Disk I/O is preventing you from getting any speedup from multiple threads. That's ... sort ... of right, but...
Given a single disk request, the OS can only choose to move the heads to the point on the disk selected implicitly by the file access, usually incurring an average of half of the full-stroke seek time (tens of milliseconds) and rotational delays (another 10 milliseconds) to access the data. And sticking with single disk requests, this is a pretty horrendous (and unavoidable) cost to pay.
Because disk accesses take a long time, the OS has plenty of CPU to consider the best order to access the disk when there are multiple requests, should they occur while it is already waiting for the disk to do something. The OS does so usually with an elevator algorithm, causing the heads to efficiently scan across the disk in one direction in one pass, and scan efficiently in the other direction when the "furthest" access has been reached.
The idea is simple: if you process multiple disk requests in exactly the time order in which they occur, the disk heads will likely jump randomly about the disk (under the assumption the files are placed randomly), thus incurring the half-full seek + rotational delay on every access. With 1000 live accesses processed in order, 1000 average half-full seeks will occur. Ick.
Instead, given N near-simultaneous accesses, the OS can sort these accesses by the physical cylinder they will touch, and then process them in cylinder order. 1000 live accesses, processed in cylinder order (even with random file distributions), are likely to have one request per cylinder. Now the heads only have to step from one cylinder to the next, and that's a lot less than the average seek.
So having lots of requests should help the OS make better access-order decisions.
Since OP has lots of files, there's no reason he could not run a lot of threads, each copying its own file and generating demand for disk locations. He would want each thread to issue a read and write of something like a full track, so that when the heads arrive at a cylinder, a full track is read or written (under the assumption the OS lays files out contiguously on a track where it can).
OP would want to make sure his machine had enough RAM to buffer his threadcount times tracksize. An 8 GB machine with 4 GB unbusy during the copy has essentially a 4 GB disk cache. A 100 KB per track (been a long time since I looked) suggests "room" for 10,000 threads. I seriously doubt he needs that many; mostly he needs enough threads to overwhelm the number of cylinders on his disk. I'd surely consider several hundred threads.
Two threads surely is not enough. (Windows appears to use one thread when you ask it to copy a bunch of files. That's always seemed pretty dumb to me.)
Another poster suggested zipping the files. With lots of threads, and everything waiting on the disk (the elevator algorithm doesn't change that, just the average wait time), many threads can afford to throw computational cycles at zipping. This won't help with reads; files are what they are when read. But it may shorten the amount of data to write, and provide effectively larger buffers in memory, providing some additional speedup.
Note: If one has an SSD, then there are no physical cylinders, thus no seek time, and nothing for an elevator algorithm to optimize. Here, lots of threads don't buy any cylinder ordering time. They shouldn't hurt, either.
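To make the "many threads, each copying its own file" idea concrete, here is a minimal sketch. The class name ParallelCopy, the method CopyAll and the maxThreads parameter are made up for illustration; Parallel.ForEach with MaxDegreeOfParallelism is just one way to cap the number of concurrent copies, and the thread pool may not actually start that many threads at once. Whether it is faster than a single thread has to be measured on the actual disk.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class ParallelCopy
{
    // Hypothetical helper: copies every file directly inside sourceDir into
    // destDir, running at most maxThreads copies at the same time.
    // Returns the number of files that were copied.
    public static int CopyAll(string sourceDir, string destDir, int maxThreads)
    {
        Directory.CreateDirectory(destDir);
        string[] files = Directory.GetFiles(sourceDir);

        // MaxDegreeOfParallelism is an upper bound, not a guarantee.
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxThreads };

        Parallel.ForEach(files, options, file =>
        {
            string target = Path.Combine(destDir, Path.GetFileName(file));
            File.Copy(file, target, true); // overwrite an existing target
        });

        return files.Length;
    }
}
```

Calling ParallelCopy.CopyAll(source, dest, 1) and then again with larger values on the same data set is a simple way to see which of the answers here applies to a given machine.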

Related

Writing big files async or sync in a special thread?

I am recording the screen in HoloLens, using C#. I am creating small videos with 100 frames and I plan to write them separately because RAM is not enough to write only one big video.
Which would be better in terms of performance?
Create videos and write them async, or create a work queue that writes them synchronously?

I would create a work queue and only allow 1 (background) thread to write to the same physical disk at the same time.
If you did this with multiple threads to a spinning disk, each thread would be fighting for access, causing a lot of unnecessary disk seeks and switching between threads/files.
On average, a disk seek is around 10ms (to 15ms). In that same time an extra megabyte could have been written.
So, for spinning disks, writing from multiple threads will never be faster (but probably slower, depending on the buffering/caching).
For SSD however there could be some speed improvement - but there is always a maximum total bandwidth. So if the data to be written is ready in memory, writing from a single thread should be close to the available SSD bandwidth.
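A work queue with a single background writer could look roughly like the sketch below. The class name SingleWriterQueue and the capacity of 8 pending items are assumptions for illustration, not something from the question; error handling is left out, so an exception in the writer thread would go unhandled.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;

public sealed class SingleWriterQueue : IDisposable
{
    // Bounded so producers block instead of piling up buffers in RAM.
    private readonly BlockingCollection<(string Target, byte[] Data)> _jobs =
        new BlockingCollection<(string Target, byte[] Data)>(8);
    private readonly Thread _writer;

    public SingleWriterQueue()
    {
        // Exactly one background thread ever touches the disk.
        _writer = new Thread(WriteLoop) { IsBackground = true };
        _writer.Start();
    }

    // Called by producers; blocks while the queue is full.
    public void Enqueue(string target, byte[] data)
    {
        _jobs.Add((target, data));
    }

    private void WriteLoop()
    {
        // Ends once CompleteAdding has been called and the queue is drained.
        foreach (var job in _jobs.GetConsumingEnumerable())
            File.WriteAllBytes(job.Target, job.Data);
    }

    // Signals that no more jobs will arrive and waits for pending writes.
    public void Dispose()
    {
        _jobs.CompleteAdding();
        _writer.Join();
        _jobs.Dispose();
    }
}
```

The producer only calls Enqueue with each finished video buffer; disposing the queue at the end waits until everything has been written.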

file copy is faster after one pass

I have a program that copies a large number of files from several directories to one directory. The number is not known (about 50K), but they are all on the same drive. The program should have a progress bar. When I wrote the program for the first time, I did not write the progress bar, and the program ran slowly: it took about 15-20 min to copy the files. In order to write the progress bar I needed to know how many files I have, so I went through the directories and listed the files. Now the first run through the files takes about 5 min, but the copy takes only 5-7 min.
Can anyone explain the phenomenon? I'm sorry that I can't share the code, but it's a simple use of File.Copy and a simple C# .NET 3.5 ProgressBar.

This approach minimizes the most expensive operation on a disk drive, moving the reader head. Disk speeds are rated by two basic mechanical constraints. One is how fast the platters spin which sets an upper bound on the data transfer speed. That's fixed. And how fast the read head can be moved to another track. The seek time, a fat dozen milliseconds to move it by one track is typical. A very long time in cpu cycles. Which makes the order in which you access disk data very important. Constantly jumping the reader head back-and-forth between the directory records and the file data clusters like you did originally is very expensive.
To what degree the data on the disk is fragmented is also very important, the reason defrag utilities exist. A drive that sees a high rate of files getting created and deleted tends to get fragmented quicker. The higher the fragmentation rate, the more disk seeks you'll need to read data from the drive.
By reading the directory entries first you can avoid a lot of seeks. They are localized in an area of the drive called the MFT, physically close to each other so far fewer long seeks. You'll read them again when you actually start copying the files, but this time they come from the file system cache. Stored in RAM when you scanned the directories the first time. So no need for an expensive seek back to the MFT.
Also notably the reason why SSDs work so much better than mechanical drives, they have a very low seek time.
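The two-pass pattern described in this answer (list everything first, then copy) can be sketched as follows. The asker did not share code, so the names TwoPassCopy, Copy and onProgress are invented here, and files with the same name from different source directories simply overwrite each other in this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class TwoPassCopy
{
    // Pass 1 lists every file (this warms the file system cache and gives
    // the total for the progress bar); pass 2 copies and reports progress.
    // Returns the number of files copied.
    public static int Copy(string[] sourceDirs, string destDir, Action<int, int> onProgress)
    {
        var files = new List<string>();
        foreach (string dir in sourceDirs)
            files.AddRange(Directory.GetFiles(dir, "*", SearchOption.AllDirectories));

        Directory.CreateDirectory(destDir);
        for (int i = 0; i < files.Count; i++)
        {
            string target = Path.Combine(destDir, Path.GetFileName(files[i]));
            File.Copy(files[i], target, true);
            onProgress(i + 1, files.Count); // (done so far, total)
        }
        return files.Count;
    }
}
```

In a WinForms program the onProgress callback would set the ProgressBar's Maximum and Value.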

This is not a phenomenon, it is caching.

Multiple threading in the eyes of I/O operations?

I was thinking...
Will multithreading in C# for I/O operations (let's say copying many files from c:\1\ to c:\2\) perform any differently from doing the operation sequentially?
The reason I'm struggling with this is that an I/O operation is ultimately handled by one device that has to do the work, so even if I'm working in parallel, it will still execute those copy orders sequentially...
Or maybe my assumption is wrong?
In that case, is there any benefit to using a multithreaded copy to:
copy many small files (4 GB in total)
copy 4 big files (4 GB in total, 1000 MB each)
Thanks

As others say, it has to be measured against the concrete application context.
But I would like to draw attention to this.
Every time you copy a file, write permission on the destination location is checked, which is slow.
All of us have had to copy/paste a set of already compressed files. If you compress them again into one big ZIP file, so that the total compressed size is not significantly smaller than the sum of all the content, the I/O operation will execute way faster. (Just try it; you will see a huge difference if you haven't done it before.)
So I would assume (again, it has to be measured on a concrete system; mine are just guesses) that writing one big file to a single disk will be faster than writing a lot of small files.
Hope this helps.

Multithreading with files is not so much about the CPU but about IO. This means that totally different rules apply. Different devices have different characteristics:
Magnetic disks like sequential IO
SSDs like sequential or parallel random IO (mine has 4 hardware "threads")
The network likes many parallel operations to amortize latency

I'm no expert in hard-disk related questions, but maybe this will shed some light for you:
Windows uses the NTFS file system. This system doesn't "like" lots of small files, for example files under 1 KB. It will "magically" make 100 files of 1 KB weigh 400 KB instead of 100 KB. It is also VERY slow when dealing with a lot of "small" files. Therefore, copying one big file instead of many small files of the same total size will be much faster.
Also, from my personal experience and knowledge, multithreading will NOT speed up the copying of many files, because the actual hardware disk acts as one unit and can't be sped up by sending many requests at the same time (it will process them one by one).

Fastest way to read many 300 bytes chunks randomly by file offset from a 2TB file?

I have some 2TB read-only (no writing once created) files on a RAID 5 (4 x 7.2k @ 3TB) system.
Now I have some threads that want to read portions of that file.
Every thread has an array of chunks it needs.
Every chunk is addressed by file offset (position) and size (mostly about 300 bytes) to read from.
What is the fastest way to read this data.
I don't care about CPU cycles, (disk) latency is what counts.
So if possible I want take advantage of NCQ of the hard disks.
As the files are highly compressed and will be accessed randomly and I know exactly the position, I have no other way to optimize it.
Should I pool the file reading to one thread?
Should I keep the file open?
Should every thread (maybe about 30) keep every file open simultaneously, what is with new threads that are coming (from web server)?
Will it help if I wait 100ms and sort my readings by file offsets (lowest first)?
What is the best way to read the data? Do you have experiences, tips, hints?

The optimum number of parallel requests depends highly on factors outside your app (e.g. Disk count=4, NCQ depth=?, driver queue depth=? ...), so you might want to use a system, that can adapt or be adapted. My recommendation is:
Write all your read requests into a queue together with some metadata that allows to notify the requesting thread
have N threads dequeue from that queue, synchronously read the chunk, notify the requesting thread
Make N runtime-changeable
Since CPU is not your concern, your worker threads can calculate a floating latency average (and/or maximum, depending on your needs)
Slide N up and down, until you hit the sweet spot
Why sync reads? They have lower latency than async reads.
Why waste latency on a queue? A good lockless queue implementation starts at less than 10ns latency, much less than two thread switches
Update: Some Q/A
Should the read threads keep the files open? Yes, definitely so.
Would you use a FileStream with FileOptions.RandomAccess? Yes
You write "synchronously read the chunk". Does this mean every single read thread should start reading a chunk from disk as soon as it dequeues an order to read a chunk? Yes, that's what I meant. The queue depth of read requests is managed by the thread count.
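The queue-plus-N-readers design from this answer might be sketched like this. The names ChunkReader, Request and ReadAsync are hypothetical, a Task is used as the "notify the requesting thread" mechanism, and N is fixed at construction for brevity (the answer suggests making it changeable at runtime). A plain BlockingCollection stands in for the lockless queue mentioned above.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public sealed class ChunkReader : IDisposable
{
    private sealed class Request
    {
        public long Offset;
        public int Size;
        public TaskCompletionSource<byte[]> Done = new TaskCompletionSource<byte[]>();
    }

    private readonly BlockingCollection<Request> _queue = new BlockingCollection<Request>();
    private readonly Thread[] _workers;
    private readonly string _path;

    public ChunkReader(string path, int workerCount)
    {
        _path = path;
        _workers = new Thread[workerCount];
        for (int i = 0; i < workerCount; i++)
        {
            _workers[i] = new Thread(Work) { IsBackground = true };
            _workers[i].Start();
        }
    }

    // Queues a read; the returned task completes when a worker has the bytes.
    public Task<byte[]> ReadAsync(long offset, int size)
    {
        var request = new Request { Offset = offset, Size = size };
        _queue.Add(request);
        return request.Done.Task;
    }

    private void Work()
    {
        // Each worker keeps its own handle open for its whole lifetime.
        using (var fs = new FileStream(_path, FileMode.Open, FileAccess.Read,
                                       FileShare.Read, 4096, FileOptions.RandomAccess))
        {
            foreach (Request r in _queue.GetConsumingEnumerable())
            {
                try
                {
                    var buffer = new byte[r.Size];
                    fs.Seek(r.Offset, SeekOrigin.Begin);
                    int read = 0, n;
                    while (read < r.Size && (n = fs.Read(buffer, read, r.Size - read)) > 0)
                        read += n;
                    if (read < r.Size) Array.Resize(ref buffer, read); // hit end of file
                    r.Done.SetResult(buffer);
                }
                catch (Exception ex)
                {
                    r.Done.SetException(ex);
                }
            }
        }
    }

    public void Dispose()
    {
        _queue.CompleteAdding();
        foreach (Thread t in _workers) t.Join();
        _queue.Dispose();
    }
}
```

The number of worker threads is the effective queue depth seen by the disks, so that is the knob to tune while watching the measured latency.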

Disks are "single threaded" because there is only one head. It won't go faster no matter how many threads you use... in fact more threads probably will just slow things down. Just get yourself the list and arrange (sort) it in the app.
You can of course use many threads, which would probably make more efficient use of NCQ, but arranging it in the app and using one thread should work better.
If the file is fragmented - use NCQ and a couple of threads, because you then can't know the exact position on disk, so only NCQ can optimize reads. If it's contiguous - use sorting.
You may also try direct I/O to bypass OS caching and read the whole file sequentially... it sometimes can be faster, especially if you have no other load on this array.
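The "sort it in the app and use one thread" suggestion could be sketched like this. SortedChunkRead and ReadSorted are names made up for the example, and it assumes every requested offset is unique, since results are keyed by offset.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class SortedChunkRead
{
    // Reads every requested chunk on the calling thread, lowest offset first,
    // and returns the bytes keyed by offset.
    public static Dictionary<long, byte[]> ReadSorted(
        string path, IEnumerable<(long Offset, int Size)> chunks)
    {
        var result = new Dictionary<long, byte[]>();
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
        {
            foreach (var chunk in chunks.OrderBy(c => c.Offset))
            {
                var buffer = new byte[chunk.Size];
                fs.Seek(chunk.Offset, SeekOrigin.Begin);
                int read = 0, n;
                while (read < chunk.Size && (n = fs.Read(buffer, read, chunk.Size - read)) > 0)
                    read += n;
                if (read < chunk.Size) Array.Resize(ref buffer, read); // hit end of file
                result[chunk.Offset] = buffer;
            }
        }
        return result;
    }
}
```

Sorting by file offset only approximates on-disk order when the file is contiguous, which is the condition this answer attaches to the approach.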

Will ReadFileScatter do what you want?

C#: poor performance with multithreading with heavy I/O

I've written an application in C# that moves jpgs from one set of directories to another set of directories concurrently (one thread per fixed subdirectory). The code looks something like this:
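string destination = "";
DirectoryInfo dir = new DirectoryInfo("");
DirectoryInfo[] subDirs = dir.GetDirectories();
foreach (DirectoryInfo d in subDirs)
{
    // list the files of the current subdirectory
    FileInfo[] files = d.GetFiles();
    foreach (FileInfo f in files)
    {
        // MoveTo expects the full target path, including the file name
        f.MoveTo(Path.Combine(destination, f.Name));
    }
}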
However, the performance of the application is horrendous - tons of page faults/sec. The number of files in each subdirectory can get quite large, so I think a big performance penalty comes from a context switch, to where it can't keep all the different file arrays in RAM at the same time, such that it's going to disk nearly every time.
There are two different solutions that I can think of. The first is rewriting this in C or C++, and the second is to use multiple processes instead of multithreading.
Edit: The files are named based on a time stamp, and the directory they are moved to are based on that name. So the directories they are moved to would correspond to the hour it was created; 3-27-2009/10 for instance.
We are creating a background worker per directory for threading.
Any suggestions?

Rule of thumb: don't parallelize operations with serial dependencies. In this case your hard drive is the bottleneck, and too many threads are just going to make performance worse.
If you are going to use threads, try to limit their number to the number of resources you have available (cores and hard disks), not the number of jobs you have pending (directories to copy).

Reconsidered answer
I've been rethinking my original answer below. I still suspect that using fewer threads would probably be a good idea, but as you're just moving files, it shouldn't actually be that IO intensive. It's possible that just listing the files is taking a lot of disk work.
However, I doubt that you're really running out of memory for the files. How much memory have you got? How much memory is the process taking up? How many threads are you using, and how many cores do you have? (Using significantly more threads than you have cores is a bad idea, IMO.)
I suggest the following plan of attack:
Work out where the bottlenecks actually are. Try fetching the list of files but not actually moving them. See how hard the disk is hit, and how long it takes.
Experiment with different numbers of threads, with a queue of directories still to process.
Keep an eye on the memory use and garbage collections. The Windows performance counters for the CLR are good for this.
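For the first step of that plan, a rough timing harness that only lists the files and moves nothing might look like this. ListingTimer and Measure are hypothetical names; the loop mirrors the GetDirectories/GetFiles structure from the question.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class ListingTimer
{
    // Lists the files of every subdirectory of root without moving anything.
    // Returns the total file count and the elapsed time in milliseconds.
    public static (int Count, long Millis) Measure(string root)
    {
        var sw = Stopwatch.StartNew();
        int count = 0;
        foreach (DirectoryInfo d in new DirectoryInfo(root).GetDirectories())
            count += d.GetFiles().Length;
        sw.Stop();
        return (count, sw.ElapsedMilliseconds);
    }
}
```

If this alone takes a large share of the total run time, the listing rather than the move is the bottleneck. Note that a second run will usually be faster because the directory data is then cached.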
Original answer
Rewriting in C or C++ wouldn't help. Using multiple processes wouldn't help. What you're doing is akin to giving a single processor a hundred threads - except you're doing it with the disk instead.
It makes sense to parallelise tasks which use IO if there's also a fair amount of computation involved, but if it's already disk bound, asking the disk to work with lots of files at the same time is only going to make things worse.
You may be interested in a benchmark (description and initial results) I've recently been running, testing "encryption" of individual lines of a file. When the level of "encryption" is low (i.e. it's hardly doing any CPU work) the best results are always with a single thread.

If you've got a block of work that is dependent on a system bottleneck, in this case disk IO, you would be better off not using multiple threads or processes. All that you will end up doing is generating a lot of extra CPU and memory activity while waiting for the disk. You would probably find the performance of your app improved if you used a single thread to do your moves.

It seems you are moving a directory; surely just renaming/moving the directory would be sufficient. If the source and destination are on the same hard disk, it would be instant.
Also capturing all the file info for every file would be unnecessary, the name of the file would suffice.

The performance problem comes from the hard drive. There is no point in redoing everything in C/C++, nor in using multiple processes.

Are you looking at the page-fault count and inferring memory pressure from that? You might well find that the underlying Win32/OS file copy is using mapped files/page faults to do its work, and the faults are not a sign of a problem anyway. Much of Windows' own file handling is done via page faults (e.g. 'loading' executable code) - they're not a bad thing per se.
If you are suffering from memory pressure, then I would surmise that it's more likely to be caused by creating a huge number of threads (which are very expensive), rather than by the file copying.
Don't change anything without profiling, and if you profile and find the time is spent in framework methods which are merely wrappers on Win32 functions (download the framework source and have a look at how those methods work), then don't waste time on C++.

If GetFiles() is indeed returning a large set of data, you could write an enumerator, as in:
IEnumerable<string> GetFiles();
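One way to fill in such an enumerator, assuming .NET 4 or later where Directory.EnumerateFiles and Directory.EnumerateDirectories are available (the question may predate them): the class name LazyFiles, the root parameter and the "*.jpg" pattern are illustrative choices, not taken from the original code.

```csharp
using System.Collections.Generic;
using System.IO;

public static class LazyFiles
{
    // Yields the jpg paths under each subdirectory of root one at a time,
    // instead of building a complete FileInfo[] per directory up front.
    public static IEnumerable<string> GetFiles(string root)
    {
        foreach (string dir in Directory.EnumerateDirectories(root))
            foreach (string file in Directory.EnumerateFiles(dir, "*.jpg"))
                yield return file;
    }
}
```

The caller can then foreach over LazyFiles.GetFiles(root) and handle each path as it arrives, so only file names (not FileInfo objects for the whole directory) are held in memory.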

So, you're moving files, one at a time, from one subfolder to another subfolder? Wouldn't you be causing lots of disk seeks as the drive head moves back and forth? You might get better performance from reading the files into memory (at least in batches if not all at once), writing them to disk, then deleting the originals from disk.
And if you're doing multiple sets of folders in separate threads, then you're moving the disk head around even more. This is one case where multiple threads aren't doing you a favor (although you might get some benefit if you have a RAID or SAN, etc).
If you were processing the files in some way, then multithreading could help if different CPUs could calculate on multiple files at once. But you can't get four CPUs to move one disk head to four different locations at once.
