Writing big files: async, or sync on a dedicated thread? - C#

I am recording the screen on HoloLens using C#. I create small videos of 100 frames each and plan to write them out separately, because there is not enough RAM to hold one big video.
Which would be better in terms of performance?
Creating the videos and writing them asynchronously, or creating a work queue that writes them synchronously?

I would create a work queue and only allow 1 (background) thread to write to the same physical disk at the same time.
If you did this with multiple threads to a spinning disk, each thread would be fighting for access, causing a lot of unnecessary disk seeks and switching between threads/files.
On average, a disk seek takes around 10 ms (up to 15 ms). In that same time, an extra megabyte could have been written.
So for spinning disks, writing from multiple threads will never be faster, and will probably be slower, depending on the buffering/caching.
For an SSD, however, there could be some speed improvement - but there is always a maximum total bandwidth. So if the data to be written is already in memory, writing from a single thread should get close to the available SSD bandwidth.
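A minimal sketch of the single-writer work queue described above, using `BlockingCollection<T>`. The `VideoWriteQueue` name and the byte-array payload are assumptions for illustration; producers enqueue finished buffers and one background thread drains the queue, so the disk only ever serves one sequential writer.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;

// Hypothetical single-writer queue: any thread can enqueue a finished
// video buffer; exactly one background thread writes them to disk,
// one file at a time, in FIFO order.
class VideoWriteQueue : IDisposable
{
    private readonly BlockingCollection<(string Path, byte[] Data)> _queue
        = new BlockingCollection<(string, byte[])>();
    private readonly Thread _writer;

    public VideoWriteQueue()
    {
        _writer = new Thread(() =>
        {
            // Blocks while the queue is empty; exits when CompleteAdding is called.
            foreach (var (path, data) in _queue.GetConsumingEnumerable())
                File.WriteAllBytes(path, data);   // sequential write, no seek contention
        }) { IsBackground = true };
        _writer.Start();
    }

    public void Enqueue(string path, byte[] data) => _queue.Add((path, data));

    public void Dispose()
    {
        _queue.CompleteAdding();  // let the writer drain the remaining items
        _writer.Join();
    }
}
```

Recording threads stay free to capture the next 100 frames while the writer thread owns the disk.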

Related

Load a single file at a time or all files at once into dictionary

I've been doing some work with loading multiple image files into an HTML document that is then converted into a PDF.
I am unsure of the specifics, but I was under the impression that it's better to read a single file at a time and keep the memory footprint low, rather than loading all the files into memory (in a dictionary) at once - there are so many images that the collection can be as large as 500 MB!
I was wondering, though: which is faster? Is it quicker to read, say, 100 MB worth of files into memory, process them, then load another 100 MB? Or is it better to do it a single file at a time (surely the number of disk I/O operations would be similar either way)?
It's better to read the files one by one, as that is more memory efficient. If you can, you should work with a stream rather than an in-memory buffer.
When you use more memory, your data may end up in the page file, resulting in more disk I/O operations.
You should avoid working with large memory blocks if you don't want to see an OutOfMemoryException.
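A minimal sketch of the stream-based approach this answer recommends: the file is consumed through a small reusable buffer instead of being loaded whole, so the memory footprint stays constant regardless of file size. The `CountBytes` helper and 80 KB buffer size are illustrative choices, not from the original.

```csharp
using System;
using System.IO;

static class StreamDemo
{
    // Processes a file through a fixed-size buffer; here the "processing"
    // is just counting bytes, standing in for real per-chunk work.
    public static long CountBytes(string path)
    {
        var buffer = new byte[81920];    // ~80 KB, stays under the large-object heap threshold
        long total = 0;
        using (var stream = File.OpenRead(path))
        {
            int read;
            while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
                total += read;           // handle this chunk, then reuse the buffer
        }
        return total;
    }
}
```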
This depends on a number of things, but fundamentally, disk is a lot slower than memory, so you can gain by reading, if you do it right.
First, a warning: if you do not have plenty of memory to fit the files you attempt to load, then your operating system will page memory to disk, which will slow your system down far more than reading one file at a time, so be careful.
The key to improving disk I/O performance is to keep the disk busy. Reading one file at a time leaves the disk idle while you are processing the file in memory. Reading a set of files into a large block of memory, but still reading one at a time, and then processing the block of files, probably won't improve performance except in very unusual conditions.
If your goal is to reduce the time from start to finish of processing these files, you will probably want to run on multiple threads; the system calls to open and read a file still take time to queue, so depending on the capabilities of your disk, you can usually get better overall I/O throughput by having at least one read request queued while the disk is loading another request; this minimizes idle time between requests and keeps the disk at its absolute maximum. Note that having too many requests queued can slow performance.
Since processing in memory is likely to be faster, you could have at least 2 threads set up to read files, and at least 1 thread set up to process the files that have already been loaded into memory by the other threads.
A better way than managing your own threads is to use a thread pool; this naturally limits the number of I/O requests to the number of concurrent threads allowed, and doesn't require you to manage the threads yourself. This may not be quite optimal, but a thread pool should be faster than processing the files one at a time, and easier/safer than managing threads.
Note that if you don't understand what I mean by threads and a thread pool, or you haven't done much multi-threaded development relating to disk I/O, you are probably better off sticking with one file at a time, unless improving the total processing time is a requirement you can't get around. There are plenty of examples of how to use threads on MSDN, but if you haven't done much of it, this probably isn't a good first project for threading.
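The reader-plus-processor split described above can be sketched with a bounded `BlockingCollection<T>`: a reader task keeps the disk busy and stays a few files ahead, while the consuming thread does the in-memory work. The class name, capacity of 4, and the `process` callback are assumptions for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

static class ReadAheadPipeline
{
    public static void Run(string[] files, Action<string, byte[]> process)
    {
        // Bound the queue so at most 4 loaded files wait in memory at once -
        // enough read-ahead to keep the disk busy, without loading everything.
        var loaded = new BlockingCollection<(string Name, byte[] Bytes)>(boundedCapacity: 4);

        var reader = Task.Run(() =>
        {
            foreach (var f in files)
                loaded.Add((f, File.ReadAllBytes(f)));  // blocks when the queue is full
            loaded.CompleteAdding();
        });

        // CPU-side processing overlaps the next disk read.
        foreach (var (name, bytes) in loaded.GetConsumingEnumerable())
            process(name, bytes);

        reader.Wait();
    }
}
```

Raising the bound or adding a second reader task trades memory for more queued requests, per the caveat above about queuing too many.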

Use multithreading for multiple file copies

I have to copy a large number of files (10,000 files),
and it takes a long time.
I tried using two threads instead of a single thread: one to copy the odd-numbered files in the list and the other to copy the even-numbered ones.
I have used this code:
ThreadPool.QueueUserWorkItem(new WaitCallback(this.RunFileCopy),object)
but there is no significant difference in time between using a single thread and using two threads.
What could be the reason for this?
File copying is not a CPU-bound process, it is an I/O-bound process, so multithreading or parallelism won't help you.
Multithreading will slow you down in almost all cases. If the disk is an SSD, it has a limited read/write speed too, and a single thread will use it efficiently. If you use parallelism, you will just split that speed into pieces, and on an HDD this creates a huge overhead.
Multithreading only helps in the multiple-disk case, when you read from different disks and write to different disks.
If the files are small, zipping and unzipping the files on the target drive can be faster in most cases, and if you zip the files with low compression it will be quite fast:
using System.IO;
using System.IO.Compression;
.....
string startPath = @"c:\example\start";
string zipPath = @"c:\example\result.zip";
string extractPath = @"c:\example\extract";
ZipFile.CreateFromDirectory(startPath, zipPath, CompressionLevel.Fastest, true);
ZipFile.ExtractToDirectory(zipPath, extractPath);
More implementation details here
How to: Compress and Extract Files
I'm going to provide a minority opinion here. Everybody is telling you that Disk I/O is preventing you from getting any speedup from multiple threads. That's ... sort ... of right, but...
Given a single disk request, the OS can only move the heads to the point on the disk selected implicitly by the file access, usually incurring an average of half of the full-stroke seek time (tens of milliseconds) plus rotational delay (another 10 milliseconds) to access the data. Sticking with single disk requests, this is a pretty horrendous (and unavoidable) cost to pay.
Because disk accesses take a long time, the OS has plenty of CPU to consider the best order to access the disk when there are multiple requests, should they occur while it is already waiting for the disk to do something. The OS does so usually with an elevator algorithm, causing the heads to efficiently scan across the disk in one direction in one pass, and scan efficiently in the other direction when the "furthest" access has been reached.
The idea is simple: if you process multiple disk requests in exactly the time order in which they occur, the disk heads will likely jump randomly about the disk (under the assumption that the files are placed randomly), thus incurring the half-full seek + rotational delay on every access. With 1000 live accesses processed in order, 1000 average half-full seeks will occur. Ick.
Instead, given N near-simultaneous accesses, the OS can sort these accesses by the physical cylinder they will touch, and then process them in cylinder order. 1000 live accesses, processed in cylinder order (even with random file distributions), are likely to average one request per cylinder. Now the heads only have to step from one cylinder to the next, and that's a lot less than the average seek.
So having lots of requests should help the OS make better access-order decisions.
Since OP has lots of files, there's no reason he could not run a lot of threads, each copying its own file and generating demand for disk locations. He would want each thread to issue a read and write of something like a full track, so that when the heads arrive at a cylinder, a full track is read or written (under the assumption that the OS lays files out contiguously on a track where it can).
OP would want to make sure his machine has enough RAM to buffer his thread count times the track size. An 8 GB machine with 4 GB not busy during the copy has essentially a 4 GB disk cache. At 100 KB per track (it's been a long time since I looked), that suggests "room" for 10,000 threads. I seriously doubt he needs that many; mostly he needs enough threads to overwhelm the number of cylinders on his disk. I'd surely consider several hundred threads.
Two threads surely are not enough. (Windows appears to use one thread when you ask it to copy a bunch of files. That's always seemed pretty dumb to me.)
Another poster suggested zipping the files. With lots of threads, and everything waiting on the disk (the elevator algorithm doesn't change that, just the average wait time), many threads can afford to throw computational cycles at zipping. This won't help with reads; files are what they are when read. But it may shorten the amount of data to write, and provide effectively larger buffers in memory, providing some additional speedup.
Note: If one has an SSD, then there are no physical cylinders, thus no seek time, and nothing for an elevator algorithm to optimize. Here, lots of threads don't buy any cylinder ordering time. They shouldn't hurt, either.
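The many-threads copy idea from this answer can be sketched with `Parallel.ForEach`, which keeps many requests outstanding so the OS scheduler has something to reorder. The degree of parallelism is an assumed tuning knob, not a recommended value; the answer above argues for going well beyond two.

```csharp
using System.IO;
using System.Threading.Tasks;

static class ParallelCopy
{
    // Copies each source file into destDir on its own pool thread,
    // generating many simultaneous disk requests for the OS to sort.
    public static void CopyAll(string[] sources, string destDir, int parallelism = 16)
    {
        Directory.CreateDirectory(destDir);
        Parallel.ForEach(
            sources,
            new ParallelOptions { MaxDegreeOfParallelism = parallelism },
            src => File.Copy(src, Path.Combine(destDir, Path.GetFileName(src)),
                             overwrite: true));
    }
}
```

On an SSD the parallelism neither buys cylinder ordering nor (per the note above) should it hurt much.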

Fastest way to read many 300 bytes chunks randomly by file offset from a 2TB file?

I have some 2 TB read-only (no writes once created) files on a RAID 5 system (4 x 7.2k RPM disks @ 3 TB).
Now I have some threads that wants to read portions of that file.
Every thread has an array of chunks it needs.
Every chunk is addressed by file offset (position) and size (mostly about 300 bytes) to read from.
What is the fastest way to read this data?
I don't care about CPU cycles; (disk) latency is what counts.
So if possible I want to take advantage of the hard disks' NCQ.
As the files are highly compressed, will be accessed randomly, and I know the exact positions, I have no other way to optimize it.
Should I pool the file reading to one thread?
Should I keep the file open?
Should every thread (maybe about 30) keep every file open simultaneously, and what about new threads arriving (from the web server)?
Will it help if I wait 100ms and sort my readings by file offsets (lowest first)?
What is the best way to read the data? Do you have experiences, tips, hints?
The optimum number of parallel requests depends highly on factors outside your app (e.g. disk count = 4, NCQ depth = ?, driver queue depth = ? ...), so you might want to use a system that can adapt, or can be adapted. My recommendation is:
Write all your read requests into a queue, together with some metadata that allows you to notify the requesting thread
have N threads dequeue from that queue, synchronously read the chunk, notify the requesting thread
Make N runtime-changeable
Since CPU is not your concern, your worker threads can calculate a floating latency average (and/or maximum, depending on your needs)
Slide N up and down, until you hit the sweet point
Why sync reads? Because they have lower latency than async reads.
Why waste latency on a queue? Because a good lock-free queue implementation starts at less than 10 ns of latency, much less than two thread switches
Update: Some Q/A
Should the read threads keep the files open? Yes, definitely so.
Would you use a FileStream with FileOptions.RandomAccess? Yes
You write "synchronously read the chunk". Does this mean every single read thread should start reading a chunk from disk as soon as it dequeues an order to read a chunk? Yes, that's what I meant. The queue depth of read requests is managed by the thread count.
Disks are "single threaded" because there is only one head. It won't go faster no matter how many threads you use... in fact more threads probably will just slow things down. Just get yourself the list and arrange (sort) it in the app.
You can of course use many threads that'd make use of NCQ probably more efficient, but arranging it in the app and using one thread should work better.
If the file is fragmented - use NCQ and a couple of threads because you then can't know exact position on disk so only NCQ can optimize reads. If it's contignous - use sorting.
You may also try direct I/O to bypass OS caching and read the whole file sequentially... it sometimes can be faster, especially if you have no other load on this array.
Will ReadFileScatter do what you want?

Threads vs Processes in .NET

I have a long-running process that reads large files and writes summary files. To speed things up, I'm processing multiple files simultaneously using regular old threads:
ThreadStart ts = new ThreadStart(Work);
Thread t = new Thread(ts);
t.Start();
What I've found is that even with separate threads reading separate files and no locking between them and using 4 threads on a 24-core box, I can't even get up to 10% on the CPU or 10% on disk I/O. If I use more threads in my app, it seems to run even more slowly.
I'd guess I'm doing something wrong, but where it gets curious is that if I start the whole exe a second and third time, then it actually processes files two and three times faster. My question is, why can't I get 12 threads in my one app to process data and tax the machine as well as 4 threads in 3 instances of my app?
I've profiled the app and the most time-intensive and frequently called functions are all string processing calls.
It's possible that your computing problem is not CPU bound, but I/O bound. It doesn't help to state that your disk I/O is "only at 10%"; I'm not sure such a performance counter even exists.
The reason it gets slower with more threads is that those threads are all trying to get to their respective files at the same time, while the disk subsystem is having a hard time trying to accommodate all of the different threads. You see, even with a modern technology like SSDs, where seek time is several orders of magnitude smaller than with traditional hard drives, there's still a penalty involved.
Rather, you should conclude that your problem is disk bound and a single thread will probably be the fastest way to solve your problem.
One could argue that you could use asynchronous techniques to process a bit that's been read, while on the background the next bit is being read in, but I think you'll see very little performance improvement there.
I had a similar problem not too long ago in a small tool where I wanted to calculate MD5 signatures of all the files on my hard drive, and I found that the CPU is way too fast compared to the storage system; I got similar results when trying to get more performance by using more threads.
Using the Task Parallel Library didn't alleviate this problem.
First of all, on a 24-core box, if you are using only 4 threads, the most CPU they could ever use is 16.7%, so really you are getting 60% utilization, which is fairly good.
It is hard to tell whether your program is I/O bound at this point; my guess is that it is. You need to run a profiler on your project and see which sections of code the project is spending most of its time in. If it is sitting on a read/write operation, it is I/O bound.
It is possible you have some form of inter-thread locking in use. That would cause the program to slow down as you add more threads, and yes, running a second process would fix that, but fixing your locking would too.
What it all boils down to is that without profiling information, we cannot say whether using a second process will speed things up or slow things down; we need to know whether the program is hanging on an I/O operation, a locking operation, or just taking a long time in a function that could be parallelized better.
I think you have found that the file cache is not ideal when one process writes data to many files concurrently. The file cache has to sync to disk when the number of dirty pages exceeds a threshold, and it seems concurrent writers in one process hit that threshold faster than a single-threaded writer. You can read about the file system cache here: File Cache Performance and Tuning
Try using the Task library from .NET 4 (System.Threading.Tasks). This library has built-in optimizations for different numbers of processors.
I have no clue what your problem is, maybe because your code snippet is not really informative.

How to calculate the optimum chunk size for uploading large files

Is there such a thing as an optimum chunk size for processing large files? I have an upload service (WCF) which is used to accept file uploads ranging from several hundred megabytes.
I've experimented with chunk sizes from 4 KB and 8 KB up to 1 MB. Bigger chunk sizes are good for performance (faster processing), but they come at the cost of memory.
So, is there a way to work out the optimum chunk size at the moment of uploading a file? How would one go about doing such a calculation? Would it be a combination of the client's available memory, CPU, and network bandwidth that determines the optimum size?
Cheers
EDIT: Probably should mention that the client app will be in silverlight.
If you are concerned about running out of resources, then the optimum is probably best determined by evaluating your peak upload concurrency against your system's available memory. How many simultaneous uploads you have in progress at a time would be the key critical variable in any calculation you might do. All you have to do is make sure you have enough memory to handle the upload concurrency, and that's rather trivial to achieve. Memory is cheap and you will likely run out of network bandwidth long before you get to the point where your concurrency would overrun your memory availability.
On the performance side, this isn't the kind of thing you can really optimize much during app design and development. You have to have the system in place, users uploading files for real, and then you can monitor actual runtime performance.
Try a chunk size that matches your network's TCP/IP window size. That's about as optimal as you'd really need to get at design time.
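Since the answer above says the real optimum only shows up at runtime, one pragmatic option is to measure it: time a fixed payload at a few candidate chunk sizes and keep the fastest. The `PickChunkSize` helper and its candidate sizes are illustrative assumptions; the `upload` delegate stands in for the real WCF call.

```csharp
using System;
using System.Diagnostics;

static class ChunkTuner
{
    // Sends totalBytes through the upload delegate at each candidate chunk
    // size and returns the size that finished fastest. A crude empirical
    // probe, not a substitute for monitoring real production uploads.
    public static int PickChunkSize(Action<byte[]> upload, long totalBytes = 1 << 20)
    {
        int best = 0;
        double bestSecs = double.MaxValue;
        foreach (int size in new[] { 4096, 8192, 65536, 1 << 20 })
        {
            var chunk = new byte[size];
            var sw = Stopwatch.StartNew();
            for (long sent = 0; sent < totalBytes; sent += size)
                upload(chunk);                  // stand-in for the WCF chunk call
            sw.Stop();
            if (sw.Elapsed.TotalSeconds < bestSecs)
            {
                bestSecs = sw.Elapsed.TotalSeconds;
                best = size;
            }
        }
        return best;
    }
}
```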
