How to increase or keep the starting speed of copying a file - C#

I am using this code to copy a big file:
const int CopyBufferSize = 64 * 1024;
string src = @"F:\Test\src\Setup.exe";
string dst = @"F:\Test\dst\Setup.exe";

public void CopyFile()
{
    Stream input = File.OpenRead(src);
    long length = input.Length;
    byte[] buffer = new byte[CopyBufferSize];
    Stopwatch swTotal = Stopwatch.StartNew();
    Invoke((MethodInvoker)delegate
    {
        progressBar1.Maximum = (int)Math.Abs(length / CopyBufferSize) + 1;
    });
    using (Stream output = File.OpenWrite(dst))
    {
        int bytesRead = 1;
        // This will finish silently if we couldn't read "length" bytes.
        // An alternative would be to throw an exception.
        while (length > 0 && bytesRead > 0)
        {
            bytesRead = input.Read(buffer, 0, Math.Min(CopyBufferSize, buffer.Length));
            output.Write(buffer, 0, bytesRead);
            length -= bytesRead;
            Invoke((MethodInvoker)delegate
            {
                progressBar1.Value++;
                label1.Text = (100 * progressBar1.Value / progressBar1.Maximum).ToString() + " %";
                label3.Text = ((int)swTotal.Elapsed.TotalSeconds).ToString() + " Seconds";
            });
        }
        Invoke((MethodInvoker)delegate
        {
            progressBar1.Value = progressBar1.Maximum;
        });
    }
    Invoke((MethodInvoker)delegate
    {
        swTotal.Stop();
        Console.WriteLine("Total time: {0:N4} seconds.", swTotal.Elapsed.TotalSeconds);
        label3.Text += ((int)swTotal.Elapsed.TotalSeconds - int.Parse(label3.Text.Replace(" Seconds", ""))).ToString() + " Seconds";
    });
}
The file size is about 4 GB.
In the first 7 seconds it can copy up to 400 MB, then this high speed drops off.
What is happening, and how can I keep this initial speed or even increase it?
Another question:
When the file is copied, Windows is still working on the destination file (for about 10 seconds).
Copy time: 116 seconds
Extra time: 10-15 seconds or even more
How can I remove or decrease this extra time?

What happens? Caching, mostly.
The OS pretends you copied 400 MiB in seven seconds, but you didn't. You just handed 400 MiB to the OS (or file system) to write in the future, and that's as much as the buffer can take. If you try to write a 400 MiB file and you pull the plug as soon as it's "done", your file will not be written. The same thing explains the "extra time": your application has sent everything it has to the buffer, but the buffer hasn't yet been written to the drive itself (either to its own cache, or, even slower, to the actual physical platter).
This is especially visible with USB flash drives, which tend to use caching heavily. This makes working with the (usually very slow) drive much more pleasant, with the trade-off that you have to wait for the OS to finish writing everything before pulling the drive out (that's why you get the "Safe remove" icon).
So it should be obvious that you can't really make the total time shorter. All you can do is try and make the user interface reflect reality a bit better, so that the user doesn't see the "first 400 MiB are so fast!" thing... but it doesn't really work well. In any case, your read->write speed is ~30 MiB/s. The OS just hides the peaks to make it easier to deal with the slow hard drive - very useful when you're dealing with lots of small files, worthless when dealing with files bigger than the buffer.
You have a bit of control over this when you use the FileStream constructor directly, instead of using File.OpenWrite - you can use FileOptions.WriteThrough to instruct the OS to avoid any caching and write directly to disk[1], giving you a better idea of the real write speed. Do note that this usually makes the total time larger, though, and it may make concurrent access even worse. You definitely don't want to use it for small files.
[1] - Haha, right. The drive usually has caching of its own, and some ignore the OS' pleas. Tough luck.
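For reference, the write-through variant of the copy above might look something like this (a sketch only, reusing the src, dst and CopyBufferSize values from the question; the exact FileOptions and share mode are a judgement call, not the one true way):
using (Stream input = File.OpenRead(src))
// FileOptions.WriteThrough asks the OS to bypass its write cache for this handle.
using (Stream output = new FileStream(dst, FileMode.Create, FileAccess.Write,
                                      FileShare.None, CopyBufferSize, FileOptions.WriteThrough))
{
    input.CopyTo(output, CopyBufferSize);  // same copy loop, minus the progress reporting
}
Expect the progress to advance more evenly with this, and the total time to be the same or slightly worse.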

One thing you could try is to increase the buffer size. This really matters when the write cache can no longer keep up (as discussed in the other answer). Writing a lot of small blocks is often slower than writing a few large blocks. Instead of 64 kB, try 1 MB, 4 MB or even more:
const int CopyBufferSize = 1 * 1024 * 1024; // 1 MB
// or
const int CopyBufferSize = 4 * 1024 * 1024; // 4 MB

Related

monotorrent - writeRate/readRate not working

I'm using MonoTorrent to download a ~20 GB file. When MonoTorrent creates the files, memory and CPU usage reach their maximum, which slows the computer down and even overheats it, so I wanted to limit the memory usage by limiting the write rate.
Here's what I have tried:
I checked around and found that you can limit the read/write rate of the engine using this code:
EngineSettings engineSettings = new EngineSettings(downloadsPath, port);
engineSettings.PreferEncryption = true;
engineSettings.AllowedEncryption = EncryptionTypes.All;
engineSettings.MaxWriteRate = **maximum write rate in bytes**;
engineSettings.MaxReadRate = **maximum read rate in bytes**;
engineSettings.GlobalMaxDownloadSpeed = **max download in bytes**;
The download rate limit worked, but it didn't limit the memory usage, so I checked the write rate value at runtime using this code:
MessageBox.Show(engine.DiskManager.WriteRate.ToString());
It returned 0, so instead of setting MaxWriteRate on the EngineSettings, I went into EngineSettings.cs and gave MaxWriteRate a default value by changing this code:
public int MaxWriteRate
{
    get { return 5000; }
    set { maxWriteRate = 5000; }
}
That didn't limit the memory usage either, and WriteRate still returned 0, so I went into DiskManager.cs and gave WriteRate a default value by changing this code:
public int WriteRate
{
    get { return 5000; }
}
Now WriteRate returned 5000, but it still didn't limit the memory usage. At that point I was stuck and couldn't find anything else to change.
Does anyone know why it's not working? I'm starting to think that WriteRate isn't about limiting the writing speed at all.
When downloading a torrent, the download speed is limited by three things:
1) The maximum allowed download speed for the TorrentManager
2) The maximum allowed download speed overall
3) No more than 4MB of data is held in memory while waiting to be written to disk.
Specifically on the third point, if there are more than 4MB of pieces held in memory then no further Socket.Receive calls will be made until that data is flushed. https://github.com/mono/monotorrent/blob/caac16cffd95749febe04c3f7cf22567c3e40432/src/MonoTorrent/MonoTorrent.Client/RateLimiters/DiskWriterLimiter.cs#L43-L46
With a max write rate of 2 * 1024 * 1024 (2,048 kB/sec), the download rate auto-limits because the 4 MB buffer fills up, which means setting the max disk write rate ends up limiting both the download rate and memory consumption.
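In other words, don't edit the library source; set the limit on the settings object before the engine starts. A minimal sketch, using only the members already shown above (the 2 MB/s figure is just the value from the test, and constructing the engine from the settings is shown here as an assumption about the usual setup):
EngineSettings engineSettings = new EngineSettings(downloadsPath, port);
engineSettings.MaxWriteRate = 2 * 1024 * 1024;   // bytes per second written to disk
// Assumed typical setup: the engine takes its limits from the settings it is built with.
ClientEngine engine = new ClientEngine(engineSettings);
// With writes capped and at most ~4 MB of pieces buffered in memory,
// the download rate (and therefore memory use) throttles itself.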

C#.Net Bandwidth Computation VS Speedtest.net Speed

We are working on a Windows desktop application in which we need to measure the current internet bandwidth.
We are downloading a ZIP file multiple times sequentially, but our results do not match Speedtest.
We are capturing bytes received on the active network card, but sequential downloads don't produce the expected result. We even tried downloading different files multiple times in parallel, but that failed too.
We only got matching numbers when we downloaded different files in parallel and ran a Speedtest measurement at the same time.
Now here are my questions:
Does bandwidth between TCP hops affect our bandwidth?
Does traffic between TCP hops affect our bandwidth?
How can we effectively consume the entire bandwidth using HTTP/TCP downloads and C#/.NET?
Does the ISP throttle bandwidth per TCP socket connection?
Does the ISP give preferential bandwidth to http://www.speedtest.net? (This could be possible, as it always shows the expected result while other sites do not.)
for (int downloadCount = 0; downloadCount < iterations; downloadCount++)
{
    try
    {
        string downloadUrl = GetUniqueDownloadUrl();
        bool isValidUrl = Uri.IsWellFormedUriString(downloadUrl, UriKind.Absolute);
        if (true != isValidUrl)
        {
            return result;
        }

        // Download file and register total time to download file.
        Stopwatch stopwatch = new Stopwatch();
        stopwatch.Start();
        byte[] fileContent = webclient.DownloadData(new Uri(downloadUrl, UriKind.Absolute));
        stopwatch.Stop();

        double downloadTime = stopwatch.ElapsedMilliseconds / 1000.0; // milliseconds to seconds (divide by 1000.0 to avoid integer division)

        // Convert bytes to megabits.
        fileSizeInMbits = fileContent.Length / 125000.0; // bytes to megabits (again avoiding integer division)
        double speed = fileSizeInMbits / downloadTime;   // speed in Mbps

        // Store speeds for average calculation.
        speeds.Add(speed);
    }
    catch (Exception e)
    {
        result.Error = e;
        break;
    }
}

// Calculate average bandwidth for total successful downloads.
double totalAvgSpeed = speeds.Average();
result.FileSizeInMB = fileSizeInMbits / 8;
result.Speed = Math.Round(totalAvgSpeed, 2, MidpointRounding.AwayFromZero);
return result;
}
There's no such thing as internet "speed"; there's only speed between two hosts. Even if you have one computer on gigabit ethernet and a server also on gigabit ethernet, if just one node on the way is saturated, speed will go down. When you use speedtest.net, it has a lot of close servers (likely including one at your ISP), so you're going to get a very optimistic estimate.
If your ISP throttled you, you'd see it on Speedtest just the same.
The only thing to remember is that downloading a file from a server will only give you an estimate of the speed to/from that server, not an "internet speed", which is a concept that doesn't really exist to begin with.
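If you do want your own measurement to get closer to Speedtest's numbers, the usual approach is several concurrent downloads from nearby servers, timed together. A rough sketch (the URLs are placeholders, and this assumes it runs inside an async method with System.Net.Http, System.Linq, System.Diagnostics and System.Threading.Tasks available):
string[] urls =
{
    "http://example.com/test1.zip",   // placeholder test files, ideally on nearby servers
    "http://example.com/test2.zip",
    "http://example.com/test3.zip",
    "http://example.com/test4.zip"
};

using (var client = new HttpClient())
{
    var stopwatch = Stopwatch.StartNew();

    // Start all downloads at once so the connections share (and hopefully saturate) the link.
    Task<byte[]>[] downloads = urls.Select(u => client.GetByteArrayAsync(u)).ToArray();
    byte[][] results = await Task.WhenAll(downloads);

    stopwatch.Stop();

    long totalBytes = results.Sum(r => (long)r.Length);
    double mbps = (totalBytes * 8) / 1000000.0 / stopwatch.Elapsed.TotalSeconds;
    Console.WriteLine("Aggregate throughput: {0:N2} Mbps", mbps);
}
Even then, the number only describes the path to those particular servers, as explained above.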

Memory mapped files that are contiguous on disk

I've read quite a few SO posts and general articles on trying to allocate over 1 GB of memory, so before getting shot down like the others, here is some context.
This app will run as a kiosk with a dedicated machine running no unnecessary processes.
My app acquires images from a high-speed camera with a rolling shutter at a rate of 120 frames per second at a resolution of 1920 x 1080 with a bit depth of 24. The app needs to write every single frame to disk for post-processing. The current problem I am facing is that the Disk I/O won't keep up with the capture rate even though it is limited to 120 frames per second. The Disk I/O bandwidth needed is around 750MBps!
The total length of the recording needs to be at least 10 seconds (7.5GB) in raw form. Performing any on-the-fly transcoding or compression brings the frame-rate down to utterly unacceptable levels.
To work around this, I have tried the following:
Compromising on quality by reducing the bit-depth at hardware-level to 16 which is still around 500MBps.
Disabled all image encoding and writing raw camera data to disk. This has saved some processing time.
Creating a single 10GB file on disk and doing a sequential write-through as frames come in. This has helped most so far. All dev and production systems have a 100GB dedicated drive for this application.
Using Contig.exe from Sysinternals to defragment the file. This has had astonishing gains on non-SSD drives.
I'm out of options to explore here. I am not familiar with memory-mapped files, and when trying to create them I get an IOException saying "Not enough storage is available to process this command."
using (var file = MemoryMappedFile.CreateFromFile(@"D:\Temp.VideoCache", FileMode.OpenOrCreate, "MyMapName", int.MaxValue, MemoryMappedFileAccess.CopyOnWrite))
{
    ...
}
The large file I currently use requires either sequential write-through or sequential read access. Any pointers would be appreciated.
I could even force the overall recording size down to 1.8 GB if only there were a way to allocate that much RAM. Once again, this will run on a dedicated machine with 8 GB of available memory and 100 GB of free space. However, not all production systems will have SSD drives.
32 bit processes on a 64 bit system can allocate 4 GB of RAM, so it should be possible to get 1.8 GB of RAM for storing the video, but of course you need to consider loaded DLLs and a buffer until the video is compressed.
Other than that, you could use a RAMDisk, e.g. from DataRam. You just need to find a balance between how much memory the application needs and how much memory you can grant the disk. IMHO a 5 GB / 3 GB setting could work well: 1 GB for the OS, 4 GB for your application and 3 GB for the file.
Don't forget to copy the file from the RAM disk to HDD if you want it persistent.
Commodity hardware is cheap for a reason. You need faster hardware.
Buy a faster disk system. A good RAID controller and four SSDs. Put the drives into a RAID 1+0 configuration and be done with this problem.
How much money is your company planning on spending developing and testing software to push cheap hardware past its limitations? And even if you can get it to work fast enough, how much do they plan on spending to maintain that software?
Memory-mapped files don't speed up writing to a file very much...
If you have a big file, you normally don't try to map it entirely into RAM... you map a "window" of it, then "move" the window (in C#/the Windows API you create a "view" of the file starting at any location and with a certain size).
Example code (here the window is 1 MB big... bigger windows are possible... at 32 bits it should be possible to allocate a 64 or 128 MB window without any problem):
const string fileName = "Test.bin";
const long fileSize = 1024L * 1024 * 16;
const long windowSize = 1024 * 1024;

if (!File.Exists(fileName)) {
    using (var file = File.Create(fileName)) {
        file.SetLength(fileSize);
    }
}

long realFileSize = new FileInfo(fileName).Length;

if (realFileSize < fileSize) {
    using (var file = File.Create(fileName)) {
        file.SetLength(fileSize);
    }
}

using (var mm = MemoryMappedFile.CreateFromFile(fileName, FileMode.Open)) {
    long start = 0;

    while (true) {
        long size = Math.Min(fileSize - start, windowSize);

        if (size <= 0) {
            break;
        }

        using (var acc = mm.CreateViewAccessor(start, size)) {
            for (int i = 0; i < size; i++) {
                // It is probably faster if you write the file with
                // acc.WriteArray()
                acc.Write(i, (byte)i);
            }
        }

        start += windowSize;
    }
}
Note that here I'm writing code that writes a fixed, pre-known number of bytes (fileSize)... Your code will need to be different, because you can't know the "exact" fileSize in advance. Still, remember: memory-mapped files don't speed up writing to a file very much.

Is it necessary to Flush the Stream during an internet file read (download) process?

I'm working on a download method with resume support in my application (inside a thread).
My application's users usually have low internet speeds (commonly between 10 and 50 kbps download), and the target files are between 3 and 80 MB.
Part of my download code:
// Buffer size = 1024 * 1000 = 1,024,000 bytes
int iBufferSize = 1024;
iBufferSize *= 1000;

// some code.....

try
{
    // some code.....
    request = HttpWebRequest.Create(sourceUrl);
    request.AddRange((int)downloadedSize);

    Stream smRespStream = response.GetResponseStream();

    const int MAX_LOOP = 5;
    var flushStreamCounter = 0;

    while ((iByteSize = smRespStream.Read(downBuffer, 0, downBuffer.Length)) > 0)
    {
        saveFileStream.Write(downBuffer, 0, iByteSize);
        //----------------------------------
        // some code
        //----------------------------------

        // Is this necessary to really write the data to the file now (when the condition is true)?
        flushStreamCounter++;
        if (flushStreamCounter > MAX_LOOP)
        {
            saveFileStream.Flush();
            flushStreamCounter = 0;
        }
    }
}
finally
{
    if (saveFileStream != null)
    {
        saveFileStream.Flush();
        saveFileStream.Close();
        saveFileStream.Dispose();
    }
}
Since I know the internet speed is low for my customers, I don't want to lose downloaded data to internet disconnections, the application being force-closed, or the PC powering off...
So, my questions:
Should I use the Flush method of saveFileStream inside the while loop to write data from memory to the hard disk and prevent data loss, given that users may force-close my application during a download, the internet may disconnect, etc.?
If using Flush is a good and necessary approach, what is the best MAX_LOOP value in my case, and why?
What is the best iBufferSize value for my application (based on the users' internet speed), and why?
Update:
I don't know what the advantages and disadvantages of calling Flush inside the while loop are.
While my application's download is running, Windows Explorer does not show the file size increasing (even after a refresh). It only updates the file size after the method above completes, so I don't know whether there is a danger of losing data if the application is force-closed or the PC loses power.
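One detail worth knowing when reasoning about this: Stream.Flush() on a FileStream only pushes the .NET buffer down to the OS cache; it does not guarantee the bytes have reached the disk. If the worry is a power cut, FileStream has an overload that also asks the OS to flush to the physical device. A small sketch of how the loop above could use it (the interval is arbitrary, and this assumes saveFileStream is actually a FileStream):
flushStreamCounter++;
if (flushStreamCounter > MAX_LOOP)
{
    // Flush(true) empties the .NET buffer *and* asks the OS to write its cached
    // data for this file to the device (.NET 4.0+). A plain Flush() only does
    // the first part, so the bytes may still sit in the OS cache after a crash.
    ((FileStream)saveFileStream).Flush(true);
    flushStreamCounter = 0;
}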

Parallel GZip Decompression of Log Files - Tweaking MaxDegreeOfParallelism for the Highest Throughput

We have up to 30 GB of gzipped log files per day. Each file holds 100,000 lines and is between 6 and 8 MB when compressed. The simplified code, in which the parsing logic has been stripped out, utilises a Parallel.ForEach loop.
The maximum number of lines processed peaks at a MaxDegreeOfParallelism of 8 on a two-NUMA-node, 32-logical-CPU box (Intel Xeon E7-2820 @ 2 GHz):
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.IO;
using System.IO.Compression;
using System.Threading.Tasks;

namespace ParallelLineCount
{
    public class ScriptMain
    {
        static void Main(String[] args)
        {
            int maxMaxDOP = (args.Length > 0) ? Convert.ToInt16(args[0]) : 2;
            string fileLocation = (args.Length > 1) ? args[1] : "C:\\Temp\\SomeFiles";
            string filePattern = (args.Length > 2) ? args[2] : "*2012-10-30.*.gz";
            string fileNamePrefix = (args.Length > 3) ? args[3] : "LineCounts";

            Console.WriteLine("Start: {0}", DateTime.UtcNow.ToString("yyyy-MM-ddTHH:mm:ss.fffffffZ"));
            Console.WriteLine("Processing file(s): {0}", filePattern);
            Console.WriteLine("Max MaxDOP to be used: {0}", maxMaxDOP.ToString());
            Console.WriteLine("");
            Console.WriteLine("MaxDOP,FilesProcessed,ProcessingTime[ms],BytesProcessed,LinesRead,SomeBookLines,LinesPer[ms],BytesPer[ms]");

            for (int maxDOP = 1; maxDOP <= maxMaxDOP; maxDOP++)
            {
                // Construct ConcurrentStacks for resulting strings and counters
                ConcurrentStack<Int64> TotalLines = new ConcurrentStack<Int64>();
                ConcurrentStack<Int64> TotalSomeBookLines = new ConcurrentStack<Int64>();
                ConcurrentStack<Int64> TotalLength = new ConcurrentStack<Int64>();
                ConcurrentStack<int> TotalFiles = new ConcurrentStack<int>();

                DateTime FullStartTime = DateTime.Now;

                string[] files = System.IO.Directory.GetFiles(fileLocation, filePattern);
                var options = new ParallelOptions() { MaxDegreeOfParallelism = maxDOP };

                // Method signature: Parallel.ForEach(IEnumerable<TSource> source, Action<TSource> body)
                Parallel.ForEach(files, options, currentFile =>
                {
                    string filename = System.IO.Path.GetFileName(currentFile);
                    DateTime fileStartTime = DateTime.Now;

                    using (FileStream inFile = File.Open(fileLocation + "\\" + filename, FileMode.Open))
                    {
                        Int64 lines = 0, someBookLines = 0, length = 0;
                        String line = "";

                        using (var reader = new StreamReader(new GZipStream(inFile, CompressionMode.Decompress)))
                        {
                            while (!reader.EndOfStream)
                            {
                                line = reader.ReadLine();
                                lines++;               // total lines
                                length += line.Length; // total line length

                                if (line.Contains("book")) someBookLines++; // some special lines that need to be parsed later
                            }

                            TotalLines.Push(lines); TotalSomeBookLines.Push(someBookLines); TotalLength.Push(length);
                            TotalFiles.Push(1); // silly way to count processed files :)
                        }
                    }
                });

                TimeSpan runningTime = DateTime.Now - FullStartTime;

                // Console.WriteLine("MaxDOP,FilesProcessed,ProcessingTime[ms],BytesProcessed,LinesRead,SomeBookLines,LinesPer[ms],BytesPer[ms]");
                Console.WriteLine("{0},{1},{2},{3},{4},{5},{6},{7}",
                    maxDOP.ToString(),
                    TotalFiles.Sum().ToString(),
                    Convert.ToInt32(runningTime.TotalMilliseconds).ToString(),
                    TotalLength.Sum().ToString(),
                    TotalLines.Sum(),
                    TotalSomeBookLines.Sum().ToString(),
                    Convert.ToInt64(TotalLines.Sum() / runningTime.TotalMilliseconds).ToString(),
                    Convert.ToInt64(TotalLength.Sum() / runningTime.TotalMilliseconds).ToString());
            }

            Console.WriteLine();
            Console.WriteLine("Finish: " + DateTime.UtcNow.ToString("yyyy-MM-ddTHH:mm:ss.fffffffZ"));
        }
    }
}
The results show a clear peak at MaxDegreeOfParallelism = 8. The aggregated CPU load stayed mostly on a single NUMA node, even when DOP was in the 20 to 30 range.
The only way I've found to make CPU load cross 95% mark was to split the files across 4 different folders and execute the same command 4 times, each one targeting a subset of all files.
Can someone find a bottleneck?
It's likely that one problem is the small buffer size used by the default FileStream constructor. I suggest you use a larger input buffer. Such as:
using (FileStream infile = new FileStream(
    name, FileMode.Open, FileAccess.Read, FileShare.None, 65536))
The default buffer size is 4 kilobytes, which has the thread making many calls to the I/O subsystem to fill its buffer. A buffer of 64K means that you will make those calls much less frequently.
I've found that a buffer size of between 32K and 256K gives the best performance, with 64K being the "sweet spot" when I did some detailed testing a while back. A buffer size larger than 256K actually begins to reduce performance.
Also, although this is unlikely to have a major effect on performance, you probably should replace those ConcurrentStack instances with 64-bit integers and use Interlocked.Add or Interlocked.Increment to update them. It simplifies your code and removes the need to manage the collections.
Update:
Re-reading your problem description, I was struck by this statement:
"The only way I've found to make CPU load cross 95% mark was to split the files across 4 different folders and execute the same command 4 times, each one targeting a subset of all files."
That, to me, points to a bottleneck in opening files. As though the OS is using a mutual exclusion lock on the directory. And even if all the data is in the cache and there's no physical I/O required, processes still have to wait on this lock. It's also possible that the file system is writing to the disk. Remember, it has to update the Last Access Time for a file whenever it's opened.
If I/O really is the bottleneck, then you might consider having a single thread that does nothing but load files and stuff them into a BlockingCollection or similar data structure so that the processing threads don't have to contend with each other for a lock on the directory. Your application becomes a producer/consumer application with one producer and N consumers.
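A minimal sketch of that shape, reusing the files array and the usings from the code above (the per-line work is reduced to a comment, and the bounded capacity of 16 buffers is an arbitrary choice):
var queue = new BlockingCollection<byte[]>(boundedCapacity: 16);

// Single producer: only this thread touches the directory / physical disk.
var producer = Task.Run(() =>
{
    try
    {
        foreach (string file in files)
            queue.Add(File.ReadAllBytes(file));
    }
    finally
    {
        queue.CompleteAdding();   // unblocks the consumers even if reading fails
    }
});

// N consumers: decompress and count entirely in memory.
Task[] consumers = Enumerable.Range(0, Environment.ProcessorCount)
    .Select(_ => Task.Run(() =>
    {
        foreach (byte[] compressed in queue.GetConsumingEnumerable())
        {
            using (var gz = new GZipStream(new MemoryStream(compressed), CompressionMode.Decompress))
            using (var reader = new StreamReader(gz))
            {
                while (reader.ReadLine() != null)
                {
                    // count lines / look for "book" here
                }
            }
        }
    }))
    .ToArray();

Task.WaitAll(consumers);
producer.Wait();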
The reason for this is usually that threads synchronize too much.
Looking for synchronization in your code I can see heavy syncing on the collections. Your threads are pushing the lines individually. This means that each line incurs at best an interlocked operation and at worst a kernel-mode lock wait. The interlocked operations will contend heavily because all threads race to get their current line into the collection. They all try to update the same memory locations. This causes cache line pinging.
Change this to push lines in bigger chunks. Push line-arrays of 100 lines or more. The more the better.
In other words, collect results in a thread-local collection first and only rarely merge into the global results.
You might even want to get rid of the manual data pushing altogether. This is what PLINQ is made for: Streaming data concurrently. PLINQ abstracts away all the concurrent collection manipulations in a well-performing way.
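With the loop from the question, one way to express "collect locally, merge rarely" is the Parallel.ForEach overload that carries per-thread state (a sketch for the line count only; Interlocked lives in System.Threading, which the original code does not yet import):
long totalLines = 0;

Parallel.ForEach(
    files,
    options,
    () => 0L,                                   // localInit: per-thread running total
    (currentFile, loopState, localLines) =>     // body: touches no shared state
    {
        using (var reader = new StreamReader(
            new GZipStream(File.OpenRead(currentFile), CompressionMode.Decompress)))
        {
            while (reader.ReadLine() != null)
                localLines++;
        }
        return localLines;
    },
    localLines => Interlocked.Add(ref totalLines, localLines)); // localFinally: one merge per thread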
I don't think parallelizing the disk reads is helping you. In fact, this could be seriously impacting your performance by creating contention when reading from multiple areas of storage at the same time.
I would restructure the program to first do a single-threaded read of the raw file data into memory streams or byte[] buffers. Then do a Parallel.ForEach() over each stream or buffer to decompress and count the lines.
You take an initial I/O read hit up front, but you let the OS/hardware optimize the (hopefully mostly sequential) reads, then decompress and parse in memory.
Keep in mind that operations like decompression, Encoding.UTF8.ToString(), String.Split(), etc. will use large amounts of memory, so clean up references to / dispose of old buffers as you no longer need them.
I'd be surprised if you can't make the machine generate some serious waste heat this way.
Hope this helps.
The problem, I think, is that you are using blocking I/O, so your threads cannot fully take advantage of parallelism.
If I understand your algorithm right (sorry, I'm more of a C++ guy) this is what you are doing in each thread (pseudo-code):
while (there is data in the file)
    read data
    gunzip data
Instead, a better approach would be something like this:
N = 0
read data block N
while (there is data in the file)
    asyncRead data block N+1
    gunzip data block N
    N = N + 1
gunzip data block N
The asyncRead call does not block, so basically you have the decoding of block N happening concurrently with the reading of block N+1, so by the time you are done decoding block N you might have block N+1 ready (or close to be ready if I/O is slower than decoding).
Then it's just a matter of finding the block size that gives you the best throughput.
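Translated to C# at file granularity (per-block decompression isn't practical for gzip, since the stream can't be decoded from an arbitrary offset), the same overlap might look like this, reusing the files array and the usings from the question:
// Prefetch the first file (assumes files is non-empty).
Task<byte[]> pendingRead = Task.Run(() => File.ReadAllBytes(files[0]));

for (int i = 0; i < files.Length; i++)
{
    byte[] compressed = pendingRead.Result;      // wait for the prefetched bytes of file N

    if (i + 1 < files.Length)
    {
        string next = files[i + 1];              // start reading file N+1 before decoding N
        pendingRead = Task.Run(() => File.ReadAllBytes(next));
    }

    using (var gz = new GZipStream(new MemoryStream(compressed), CompressionMode.Decompress))
    using (var reader = new StreamReader(gz))
    {
        while (reader.ReadLine() != null)
        {
            // parse / count lines here
        }
    }
}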
Good luck.
