I have to split a huge file into many smaller files. Each of the destination files is defined by an offset and length as the number of bytes. I'm using the following code:
private void copy(string srcFile, string dstFile, int offset, int length)
{
    BinaryReader reader = new BinaryReader(File.OpenRead(srcFile));
    reader.BaseStream.Seek(offset, SeekOrigin.Begin);
    byte[] buffer = reader.ReadBytes(length);

    BinaryWriter writer = new BinaryWriter(File.OpenWrite(dstFile));
    writer.Write(buffer);
}
Considering that I have to call this function about 100,000 times, it is remarkably slow.
Is there a way to make the Writer connected directly to the Reader? (That is, without actually loading the contents into the Buffer in memory.)
I don't believe there's anything within .NET to allow copying a section of a file without buffering it in memory. However, it strikes me that this is inefficient anyway, as it needs to open the input file and seek many times. If you're just splitting up the file, why not open the input file once, and then just write something like:
public static void CopySection(Stream input, string targetFile, int length)
{
    byte[] buffer = new byte[8192];

    using (Stream output = File.OpenWrite(targetFile))
    {
        int bytesRead = 1;
        // This will finish silently if we couldn't read "length" bytes.
        // An alternative would be to throw an exception
        while (length > 0 && bytesRead > 0)
        {
            bytesRead = input.Read(buffer, 0, Math.Min(length, buffer.Length));
            output.Write(buffer, 0, bytesRead);
            length -= bytesRead;
        }
    }
}
This has a minor inefficiency in creating a buffer on each invocation - you might want to create the buffer once and pass that into the method as well:
public static void CopySection(Stream input, string targetFile,
                               int length, byte[] buffer)
{
    using (Stream output = File.OpenWrite(targetFile))
    {
        int bytesRead = 1;
        // This will finish silently if we couldn't read "length" bytes.
        // An alternative would be to throw an exception
        while (length > 0 && bytesRead > 0)
        {
            bytesRead = input.Read(buffer, 0, Math.Min(length, buffer.Length));
            output.Write(buffer, 0, bytesRead);
            length -= bytesRead;
        }
    }
}
Note that this also closes the output stream (due to the using statement) which your original code didn't.
The important point is that this will use the operating system file buffering more efficiently, because you reuse the same input stream, instead of reopening the file at the beginning and then seeking.
I think it'll be significantly faster, but obviously you'll need to try it to see...
This assumes contiguous chunks, of course. If you need to skip bits of the file, you can do that from outside the method. Also, if you're writing very small files, you may want to optimise for that situation too - the easiest way to do that would probably be to introduce a BufferedStream wrapping the input stream.
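For the very-small-files case, here is a minimal sketch of that BufferedStream idea. It assumes a hypothetical sections collection of (TargetFile, Length) pairs, listed in file order, and reuses the CopySection method above; the 64 KB buffer size is just a starting point to tune:
using (Stream input = File.OpenRead(srcFile))
// Wrap the raw stream once so that many small sequential reads are served from memory
using (Stream buffered = new BufferedStream(input, 64 * 1024))
{
    byte[] buffer = new byte[8192];
    foreach (var section in sections)   // hypothetical list of (TargetFile, Length) pairs, in file order
    {
        CopySection(buffered, section.TargetFile, section.Length, buffer);
    }
}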
The fastest way to do file I/O from C# is to use the Windows ReadFile and WriteFile functions. I have written a C# class that encapsulates this capability as well as a benchmarking program that looks at different I/O methods, including BinaryReader and BinaryWriter. See my blog post at:
http://designingefficientsoftware.wordpress.com/2011/03/03/efficient-file-io-from-csharp/
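As a rough, hedged sketch (not the class from the blog post), a raw ReadFile call from C# might be declared and used like this; the error handling and the helper name are assumptions:
using System;
using System.ComponentModel;
using System.IO;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

static class NativeIo
{
    // P/Invoke declaration for the Win32 ReadFile function (synchronous use only).
    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool ReadFile(SafeFileHandle hFile, byte[] lpBuffer,
        uint nNumberOfBytesToRead, out uint lpNumberOfBytesRead, IntPtr lpOverlapped);

    // Reads up to buffer.Length bytes from the stream's underlying handle.
    // Note: mixing this with FileStream's own buffered reads can desynchronize
    // positions, so read through one mechanism only.
    public static int Read(FileStream fs, byte[] buffer)
    {
        uint bytesRead;
        if (!ReadFile(fs.SafeFileHandle, buffer, (uint)buffer.Length, out bytesRead, IntPtr.Zero))
            throw new Win32Exception(Marshal.GetLastWin32Error());
        return (int)bytesRead;
    }
}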
How large is length? You may do better to re-use a fixed-size (moderately large, but not obscene) buffer, and forget BinaryReader... just use Stream.Read and Stream.Write.
(edit) something like:
private static void copy(string srcFile, string dstFile, int offset,
                         int length, byte[] buffer)
{
    using (Stream inStream = File.OpenRead(srcFile))
    using (Stream outStream = File.OpenWrite(dstFile))
    {
        inStream.Seek(offset, SeekOrigin.Begin);
        int bufferLength = buffer.Length, bytesRead;
        while (length > bufferLength &&
               (bytesRead = inStream.Read(buffer, 0, bufferLength)) > 0)
        {
            outStream.Write(buffer, 0, bytesRead);
            length -= bytesRead;
        }
        while (length > 0 &&
               (bytesRead = inStream.Read(buffer, 0, length)) > 0)
        {
            outStream.Write(buffer, 0, bytesRead);
            length -= bytesRead;
        }
    }
}
You shouldn't re-open the source file each time you do a copy; it's better to open it once and pass the resulting BinaryReader to the copy function. Also, it might help if you order your seeks so you don't make big jumps inside the file.
If the lengths aren't too big, you can also try to group several copy calls by grouping offsets that are near to each other and reading the whole block you need for them, for example:
offset = 1234, length = 34
offset = 1300, length = 40
offset = 1350, length = 1000
can be grouped into one read:
offset = 1234, length = 1116 (covering through byte 2350, the end of the last section)
Then you only have to "seek" in your buffer and can write the three new files from there without having to read again.
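A minimal sketch of that grouping idea, assuming the three sections above are already sorted by offset and known to be close together; srcFile and the target file names are placeholders:
// Hypothetical sections: sorted by offset, all falling inside one block
var sections = new[]
{
    new { Offset = 1234L, Length = 34,   Target = "part1.bin" },
    new { Offset = 1300L, Length = 40,   Target = "part2.bin" },
    new { Offset = 1350L, Length = 1000, Target = "part3.bin" },
};

long blockOffset = sections[0].Offset;
int blockLength = (int)(sections[sections.Length - 1].Offset
                        + sections[sections.Length - 1].Length - blockOffset); // 1116

byte[] block = new byte[blockLength];
using (Stream input = File.OpenRead(srcFile))
{
    input.Seek(blockOffset, SeekOrigin.Begin);
    int read = 0;
    while (read < blockLength)              // make sure the whole block is read
    {
        int n = input.Read(block, read, blockLength - read);
        if (n == 0) break;
        read += n;
    }
}

// "Seek" inside the in-memory block and write each file from the right slice
foreach (var s in sections)
{
    using (Stream output = File.OpenWrite(s.Target))
    {
        output.Write(block, (int)(s.Offset - blockOffset), s.Length);
    }
}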
Have you considered using the CCR since you are writing to separate files you can do everything in parallel (read and write) and the CCR makes it very easy to do this.
static void Main(string[] args)
{
    Dispatcher dp = new Dispatcher();
    DispatcherQueue dq = new DispatcherQueue("DQ", dp);

    Port<long> offsetPort = new Port<long>();

    Arbiter.Activate(dq, Arbiter.Receive<long>(true, offsetPort,
        new Handler<long>(Split)));

    FileStream fs = File.Open(file_path, FileMode.Open);
    long size = fs.Length;
    fs.Dispose();

    for (long i = 0; i < size; i += split_size)
    {
        offsetPort.Post(i);
    }
}

private static void Split(long offset)
{
    FileStream reader = new FileStream(file_path, FileMode.Open,
        FileAccess.Read);
    reader.Seek(offset, SeekOrigin.Begin);

    long toRead = 0;
    if (offset + split_size <= reader.Length)
        toRead = split_size;
    else
        toRead = reader.Length - offset;

    byte[] buff = new byte[toRead];
    reader.Read(buff, 0, (int)toRead);
    reader.Dispose();

    File.WriteAllBytes("c:\\out" + offset + ".txt", buff);
}
This code posts offsets to a CCR port, which causes a thread to be created to execute the code in the Split method. (file_path and split_size are assumed to be static fields set elsewhere.) This causes you to open the file multiple times, but gets rid of the need for synchronization. You can make it more memory-efficient, but you'll have to sacrifice speed.
The first thing I would recommend is to take measurements. Where are you losing your time? Is it in the read, or the write?
Over 100,000 accesses (sum the times):
How much time is spent allocating the buffer array?
How much time is spent opening the file for read (is it the same file every time?)
How much time is spent in read and write operations?
If you aren't doing any type of transformation on the file, do you need a BinaryWriter, or can you use a FileStream for writes? (Try it: do you get identical output? Does it save time?)
Using FileStream + StreamWriter, I know it's possible to create massive files in little time (less than 1 minute 30 seconds). Using that technique, I generate three files totaling 700+ megabytes from one file.
Your primary problem with the code you're using is that you are opening a file every time. That is creating file I/O overhead.
If you knew the names of the files you would be generating ahead of time, you could extract the File.OpenWrite into a separate method; it will increase the speed. Without seeing the code that determines how you are splitting the files, I don't think you can get much faster.
Has no one suggested threading? Writing the smaller files looks like a textbook example of where threads are useful. Set up a bunch of threads to create the smaller files; that way, you can create them all in parallel and you don't need to wait for each one to finish. My assumption is that creating the files (a disk operation) will take WAY longer than splitting up the data. And of course you should verify first that a sequential approach is not adequate. A rough sketch of one way to parallelize it is shown below.
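As a hedged sketch (not the poster's code), here is what parallelizing the split might look like with Parallel.ForEach; each worker opens its own read handle so no synchronization is needed, and the Section type, file names, and buffer size are assumptions:
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

class Section
{
    public long Offset;
    public int Length;
    public string TargetFile;
}

static class ParallelSplitter
{
    public static void Split(string srcFile, IEnumerable<Section> sections)
    {
        Parallel.ForEach(sections, section =>
        {
            // Each task gets its own FileStream, so no locking is required
            using (var input = File.OpenRead(srcFile))
            using (var output = File.OpenWrite(section.TargetFile))
            {
                input.Seek(section.Offset, SeekOrigin.Begin);
                byte[] buffer = new byte[64 * 1024];
                int remaining = section.Length;
                while (remaining > 0)
                {
                    int n = input.Read(buffer, 0, Math.Min(remaining, buffer.Length));
                    if (n == 0) break;   // ran out of file early
                    output.Write(buffer, 0, n);
                    remaining -= n;
                }
            }
        });
    }
}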
(For future reference.)
Quite possibly the fastest way to do this would be to use memory mapped files (so primarily copying memory, and the OS handling the file reads/writes via its paging/memory management).
Memory Mapped files are supported in managed code in .NET 4.0.
But as noted, you need to profile, and expect to switch to native code for maximum performance.
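A minimal sketch of the managed memory-mapped approach (the offset, length, and output name are assumptions, and error handling is omitted; it requires the System.IO.MemoryMappedFiles namespace from .NET 4.0):
// requires System.IO and System.IO.MemoryMappedFiles
static void CopySectionViaMmf(string srcFile, string dstFile, long offset, long length)
{
    using (var mmf = MemoryMappedFile.CreateFromFile(srcFile, FileMode.Open))
    // A view stream over just the requested range of the source file
    using (var view = mmf.CreateViewStream(offset, length, MemoryMappedFileAccess.Read))
    using (var output = File.Create(dstFile))
    {
        view.CopyTo(output);   // Stream.CopyTo is available from .NET 4.0
    }
}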
I'm a novice programmer. I'm creating a library to process binary files of a certain type -- like a codec (though without a need to process a progressive stream coming over a wire). I'm looking for an efficient way to read the file into memory and then parse portions of the data as needed. In particular, I'd like to avoid large memory copies, hopefully without a lot of added complexity to avoid that.
In some situations, I want to do sequential reading of values in the data. For this, a MemoryStream works well.
FileStream fs = new FileStream(_fileName, FileMode.Open, FileAccess.Read);
byte[] bytes = new byte[fs.Length];
fs.Read(bytes, 0, bytes.Length);
_ms = new MemoryStream(bytes, 0, bytes.Length, false, true);
fs.Close();
(That involved a copy from the bytes array into the memory stream; that's one time, and I don't know of a way to avoid it.)
With the memory stream, it's easy to seek to arbitrary positions and then start reading structure members. E.g.,
_ms.Seek(_tableRecord.Offset, SeekOrigin.Begin);
byte[] ab32 = new byte[4];
_ms.Read(ab32, 0, 4);
_version = ConvertToUint(ab32);
_ms.Read(ab32, 0, 4);
_numRecords = ConvertToUint(ab32);
// etc.
But there may also be times when I want to take a slice out of the memory corresponding to some large structure and then pass it into a method for certain processing. MemoryStream doesn't support that. I could always pass the MemoryStream plus an offset and length, though that might not always be the most convenient.
Instead of MemoryStream, I could store the data in memory using Memory<byte>. That supports slicing, but not sequential reading.
If for some situation I want to get a slice (rather than pass stream & offset/length), I could construct an ArraySegment from MemoryStream.GetBuffer.
ArraySegment<byte> seg = new ArraySegment<byte>(ms.GetBuffer(), offset, length);
It's not clear to me, though, if that will result in a (potentially large) copy, or if that uses a reference into the same memory held by the MemoryStream. I gather that GetBuffer exposes the underlying memory rather than providing a copy; and that ArraySegment will point into the same memory?
There will be times when I need to get a slice that is a copy as I'll need to modify some elements and then process that, but without changing the original. If ArraySegment gets a reference rather than a copy, I gather I could use ArraySegment<byte>.ToArray()?
So, my questions are:
Is MemoryStream the best approach? Is there any other type that allows sequential reading like MemoryStream but also allows slicing like Memory<byte>?
If I want a slice without copying memory, will ArraySegment<byte>(ms.GetBuffer(), offset, length) do that?
Then if I need a copy that can be modified without affecting the original, use ArraySegment<byte>.ToArray()?
Is there a way to read the data from a file directly into a new MemoryStream without creating a temporary byte array that gets copied?
Am I approaching all this the best way?
To get the initial MemoryStream from reading the file, the following works:
byte[] bytes;
try
{
    // File.ReadAllBytes opens a FileStream and then ensures it is closed
    bytes = File.ReadAllBytes(_fi.FullName);
    _ms = new MemoryStream(bytes, 0, bytes.Length, false, true);
}
catch (IOException)
{
    // rethrow; "throw e" would have reset the stack trace
    throw;
}
File.ReadAllBytes() copies the file content into memory. It uses a using statement internally, which ensures the file gets closed, so no finally block is needed.
I can read individual values from the MemoryStream using MemoryStream.Read. These calls involve copies of those values, which is fine.
In one situation, I needed to read a table out of the file, change a value, and then calculate a checksum of the entire file with that change in place. Instead of copying the entire file so that I could edit one part, I was able to calculate the checksum in progressive steps: first on the initial, unchanged segment of the file, then continue with the middle segment that was changed, then continue with the remainder.
For this I could process the first and final segments using the MemoryStream. This involved lots of reads, with each read copying; but those copies were transient variables, so no significant working set increase.
The middle segment needed to be copied, since it had to be changed (but the original version needed to be kept intact). The following worked:
// get ref (not copy!) to the byte array underlying the MemoryStream
byte[] fileData = _ms.GetBuffer();
// determine the required length
int length = _tableRecord.Length;
// create array to hold the copy
byte[] segmentCopy = new byte[length];
// get the copy
Array.ConstrainedCopy(fileData, _tableRecord.Offset, segmentCopy, 0, length);
After modifying values in segmentCopy, I then needed to pass this to my static method for calculating checksums, which expected a MemoryStream (for sequential reading). This worked:
// new MemoryStream will hold a ref to the segmentCopy array (no new copy!)
MemoryStream ms = new MemoryStream(segmentCopy, 0, segmentCopy.Length);
What I haven't needed to do yet, but will want to do, is to get a slice of the MemoryStream that doesn't involve copying. This works:
MemoryStream sliceFromMS = new MemoryStream(fileData, offset, length);
From above, fileData was a ref to the array underlying the original MemoryStream. Now sliceFromMS will have a ref to a segment within that same array.
You can use FileStream.Seek directly; as I understand it, there is no need to load the data into memory first and then use the equivalent method of MemoryStream.
In the following example, str1 and str2 are equal:
using (var fs = new FileStream(@"C:\Users\bar_v\OneDrive\Desktop\js_balancer.txt", FileMode.Open))
{
    var buffer = new byte[20];
    fs.Read(buffer, 0, 20);
    var str1 = Encoding.ASCII.GetString(buffer);

    fs.Seek(0, SeekOrigin.Begin);
    fs.Read(buffer, 0, 20);
    var str2 = Encoding.ASCII.GetString(buffer);
}
By the way, when you create a new MemoryStream object, you don’t copy the byte array, you just keep a reference to it:
public MemoryStream(byte[] buffer, bool writable)
{
    if (buffer == null)
        throw new ArgumentNullException(nameof(buffer), SR.ArgumentNull_Buffer);

    _buffer = buffer;
    _length = _capacity = buffer.Length;
    _writable = writable;
    _exposable = false;
    _origin = 0;
    _isOpen = true;
}
But when reading, as we can see, copying occurs:
public override int Read(byte[] buffer, int offset, int count)
{
    if (buffer == null)
        throw new ArgumentNullException(nameof(buffer), SR.ArgumentNull_Buffer);
    if (offset < 0)
        throw new ArgumentOutOfRangeException(nameof(offset), SR.ArgumentOutOfRange_NeedNonNegNum);
    if (count < 0)
        throw new ArgumentOutOfRangeException(nameof(count), SR.ArgumentOutOfRange_NeedNonNegNum);
    if (buffer.Length - offset < count)
        throw new ArgumentException(SR.Argument_InvalidOffLen);

    EnsureNotClosed();

    int n = _length - _position;
    if (n > count)
        n = count;
    if (n <= 0)
        return 0;

    Debug.Assert(_position + n >= 0, "_position + n >= 0"); // len is less than 2^31 - 1.

    if (n <= 8)
    {
        int byteCount = n;
        while (--byteCount >= 0)
            buffer[offset + byteCount] = _buffer[_position + byteCount];
    }
    else
        Buffer.BlockCopy(_buffer, _position, buffer, offset, n);

    _position += n;
    return n;
}
First of all, I understand that I can solve this issue in different ways. I guess the issue exists only because I'm using different methods in an incorrect way, but I want to find out what exactly happened in my example.
I was using StreamReader for reading a file. In order to get bytes from it, I decided to use BaseStream.Read:
int length = (int)reader.BaseStream.Length;
byte[] file = new byte[length];
while (!reader.EndOfStream)
{
    int readBytes = reader.BaseStream.Read(file, 0,
        (length - offset) > bufferSize ? bufferSize : (length - offset));
    for (int i = 0; i < readBytes; i++)
    {
        ...
    }
    offset += readBytes;
}
BaseStream.Read fails to return the last 1024 bytes when the StreamReader.EndOfStream property has been used before reading. I later found information that EndOfStream tries to read 1 byte, but in fact it reads 1024 bytes into its internal buffer for performance reasons. Apparently that 1 KB becomes impossible to reach.
EDIT: If I remove the reader.EndOfStream check from the code, reader.BaseStream.Read works correctly. That was the main point of the question.
Again, I understand that this code example is absolutely inefficient. I'm just trying to understand how streams work in this example, and whether this issue exists only because of bad code (or whether StreamReader.BaseStream has some issue of its own). Thanks in advance.
It is not that StreamReader.BaseStream has some issue; the problem is in your code, because you work directly with the Stream wrapped inside the StreamReader.
From MSDN about StreamReader.DiscardBufferedData:
You need to call this method only when the position of the internal buffer and the BaseStream do not match. These positions can become mismatched when you read data into the buffer and then seek a new position in the underlying stream.
That means, in your case, that when the Stream has already reached its end position, the position tracked by the StreamReader's internal buffer still holds the value from before you read the underlying stream directly; therefore reader.EndOfStream is still false. That is why you cannot finish the loop.
Edit:
I think you are missing something. Here is some code to prove that the file is successfully read to the end. Run it, and you will see your app repeatedly say: I'm at the end of the file!
static void Main()
{
    using (StreamReader reader = new StreamReader(@"yourFile"))
    {
        int offset = 0;
        int bufferSize = 102400;
        int length = (int)reader.BaseStream.Length;
        byte[] file = new byte[length];
        while (!reader.EndOfStream)
        {
            // Add this line:
            Console.WriteLine(reader.BaseStream.Position);
            Console.ReadLine();

            int readBytes = reader.BaseStream.Read(file, 0,
                (length - offset) > bufferSize ? bufferSize : (length - offset));
            string str = Encoding.UTF8.GetString(file, 0, readBytes);
            offset += readBytes;
            if (reader.BaseStream.Position == length)
            {
                Console.WriteLine("I'm at the end of the file! Current Tickcount: " + Environment.TickCount);
                Thread.Sleep(100);
            }
        }
    }
}
Edit 2
But still, offset and length should be equal; in my case length - offset = 1024 (for files bigger than 1 KB). Maybe I'm doing something wrong, but if I use files smaller than 1 KB, readBytes always equals 0.
That is because on your first call to while (!reader.EndOfStream), the reader has to read the file (in this case 1024 bytes, into its internal buffer) to determine whether the file has ended or not (see the two lines of code I added above). After that read, the file has been seeked forward by 1024 bytes, which is why length - offset = 1024; and if your file is smaller than 1 KB, then this first call already seeks to the end of the file. This is where you lose data.
On the second call it doesn't seek, because you don't send any read request to the reader, so it considers its buffer unchanged and doesn't need to read the file again to check whether it is at the end of the file. That is why the second call doesn't lose data.
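As a hedged sketch of the takeaway: if you want raw bytes, skip StreamReader entirely (or at least its EndOfStream check) and let the return value of Read drive the loop; the file name and buffer size here are assumptions:
using (FileStream fs = File.OpenRead("yourFile"))
{
    byte[] buffer = new byte[4096];
    int readBytes;
    // Read returns 0 at end of file, so no EndOfStream-style check is needed
    while ((readBytes = fs.Read(buffer, 0, buffer.Length)) > 0)
    {
        for (int i = 0; i < readBytes; i++)
        {
            // process buffer[i] ...
        }
    }
}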
I have written an application that implements a file copy as shown below. I was wondering why, when attempting to copy from one network drive to another network drive, the copy times are huge (20-30 minutes to copy a 300 MB file) with the following code:
public static void CopyFileToDestination(string source, string dest)
{
    _log.Debug(string.Format("Copying file {0} to {1}", source, dest));

    DateTime start = DateTime.Now;

    string destinationFolderPath = Path.GetDirectoryName(dest);
    if (!Directory.Exists(destinationFolderPath))
    {
        Directory.CreateDirectory(destinationFolderPath);
    }

    if (File.Exists(dest))
    {
        File.Delete(dest);
    }

    FileInfo sourceFile = new FileInfo(source);
    if (!sourceFile.Exists)
    {
        throw new FileNotFoundException("source = " + source);
    }

    long totalBytesToTransfer = sourceFile.Length;
    if (!CheckForFreeDiskSpace(dest, totalBytesToTransfer))
    {
        throw new ApplicationException(string.Format("Unable to copy file {0}: Not enough disk space on drive {1}.",
            source, dest.Substring(0, 1).ToUpper()));
    }

    long bytesTransferred = 0;
    using (FileStream reader = sourceFile.OpenRead())
    {
        using (FileStream writer = new FileStream(dest, FileMode.OpenOrCreate, FileAccess.Write))
        {
            byte[] buf = new byte[64 * 1024];
            int bytesRead = reader.Read(buf, 0, buf.Length);
            double lastPercentage = 0;

            while (bytesRead > 0)
            {
                double percentage = ((float)bytesTransferred / (float)totalBytesToTransfer) * 100.0;
                writer.Write(buf, 0, bytesRead);
                bytesTransferred += bytesRead;
                if (Math.Abs(lastPercentage - percentage) > 0.25)
                {
                    System.Diagnostics.Debug.WriteLine(string.Format("{0} : Copied {1:#,##0} of {2:#,##0} MB ({3:0.0}%)",
                        sourceFile.Name,
                        bytesTransferred / (1024 * 1024),
                        totalBytesToTransfer / (1024 * 1024),
                        percentage));
                    lastPercentage = percentage;
                }
                bytesRead = reader.Read(buf, 0, buf.Length);
            }
        }
    }

    System.Diagnostics.Debug.WriteLine(string.Format("{0} : Done copying", sourceFile.Name));
    _log.Debug(string.Format("{0} copied in {1:#,##0} seconds", sourceFile.Name, (DateTime.Now - start).TotalSeconds));
}
However, with a simple File.Copy, the time is as expected.
Does anyone have any insight? Could it be because we are making the copy in small chunks?
Changing the size of your buf variable doesn't change the size of the buffer that FileStream.Read or FileStream.Write use when communicating with the file system. To see any change with buffer size, you have to specify the buffer size when you open the file.
As I recall, the default buffer size is 4K. Performance testing I did some time ago showed that the sweet spot is somewhere between 64K and 256K, with 64K being more consistently the best choice.
You should change your File.OpenRead() to:
new FileStream(sourceFile.FullName, FileMode.Open, FileAccess.Read, FileShare.None, BufferSize)
Change the FileShare value if you don't want exclusive access, and declare BufferSize as a constant equal to whatever buffer size you want. I use 64*1024.
Also, change the way you open your output file to:
new FileStream(dest, FileMode.Create, FileAccess.Write, FileShare.None, BufferSize)
Note that I used FileMode.Create rather than FileMode.OpenOrCreate. If you use OpenOrCreate and the source file is smaller than the existing destination file, I don't think the file is truncated when you're done writing. So the destination file would contain extraneous data.
That said, I wouldn't expect this to change your copy time from 20-30 minutes down to the few seconds that it should take. I suppose it could if every low-level read requires a network call. With the default 4K buffer, you're making 16 read calls to the file system in order to fill your 64K buffer. So by increasing your buffer size you greatly reduce the number of OS calls (and potentially the number of network transactions) your code makes.
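Putting the two FileStream changes together, here is a hedged sketch of what the stream setup might look like (BufferSize is the constant described above; the progress-reporting loop from the question stays the same):
const int BufferSize = 64 * 1024;   // 64K internal FileStream buffer

using (FileStream reader = new FileStream(sourceFile.FullName, FileMode.Open,
                                          FileAccess.Read, FileShare.None, BufferSize))
using (FileStream writer = new FileStream(dest, FileMode.Create,
                                          FileAccess.Write, FileShare.None, BufferSize))
{
    byte[] buf = new byte[BufferSize];
    int bytesRead;
    while ((bytesRead = reader.Read(buf, 0, buf.Length)) > 0)
    {
        writer.Write(buf, 0, bytesRead);
    }
}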
Finally, there's no need to check to see if a file exists before you delete it. File.Delete silently ignores an attempt to delete a file that doesn't exist.
Call the SetLength method on your writer Stream before the actual copying; this should reduce the number of operations performed by the target disk.
Like so
writer.SetLength(totalBytesToTransfer);
You may need to set the Stream's position back to the start after calling this method by using Seek. Check the position of the stream after calling SetLength; it should still be zero.
writer.Seek(0, SeekOrigin.Begin); // Not sure on that one
If that is still too slow, use the Windows SetFileValidData function.
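For reference, a hedged sketch of what calling SetFileValidData from C# might look like; this is a Win32 call that requires the process to hold the SE_MANAGE_VOLUME_NAME privilege, and the declaration below is based on the documented native signature rather than on any code from this answer:
using System.IO;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

static class FilePrealloc
{
    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool SetFileValidData(SafeFileHandle hFile, long validDataLength);

    public static void Preallocate(FileStream writer, long length)
    {
        writer.SetLength(length);               // reserve the space
        // Mark the reserved range as valid so NTFS can skip zero-filling it.
        // Fails (returns false) without the SE_MANAGE_VOLUME_NAME privilege.
        SetFileValidData(writer.SafeFileHandle, length);
    }
}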
I am using this code to extract a chunk from a file:
// info is FileInfo object pointing to file
var percentSplit = info.Length * 50 / 100; // extract 50% of file
var bytes = new byte[percentSplit];
var fileStream = File.OpenRead(fileName);
fileStream.Read(bytes, 0, bytes.Length);
fileStream.Dispose();
File.WriteAllBytes(splitName, bytes);
Is there any way to speed up this process?
Currently for a 530 MB file it takes around 4 - 5 seconds. Can this time be improved?
There are several cases to your question, but none of them is language-relevant.
Here are some things to consider:
What is the file system of the source/destination file?
Do you want to keep the original source file?
Do they lie on the same drive?
In C#, you will hardly find a method faster than File.Copy, which invokes the CopyFile function of the Win API internally. Because the percentage is fifty, however, the following code might not be faster: it copies the whole file and then sets the length of the destination file.
var info = new FileInfo(fileName);
var percentSplit = info.Length * 50 / 100; // extract 50% of file

File.Copy(info.FullName, splitName);
using (var outStream = File.OpenWrite(splitName))
    outStream.SetLength(percentSplit);
Further, if
you don't keep the original source after the file is split,
the destination drive is the same as the source, and
you are not using a crypto/compression-enabled file system,
then the best thing you can do is not copy the data at all.
For example, if your source file lies on a FAT or FAT32 file system, what you can do is:
create new directory entry(entries) for the newly split parts of the file
let the entry(entries) point to the cluster of the target part(s)
set the correct file size for each entry
check for cross-links and avoid them
If your file system is NTFS, you might need to spend a long time studying the spec.
Good luck!
var percentSplit = (int)(info.Length * 50 / 100); // extract 50% of file
var buffer = new byte[8192];

using (Stream input = File.OpenRead(info.FullName))
using (Stream output = File.OpenWrite(splitName))
{
    int bytesRead = 1;
    while (percentSplit > 0 && bytesRead > 0)
    {
        bytesRead = input.Read(buffer, 0, Math.Min(percentSplit, buffer.Length));
        output.Write(buffer, 0, bytesRead);
        percentSplit -= bytesRead;
    }
    output.Flush();
}
The flush may not be needed, but it doesn't hurt. This was quite interesting: changing the loop to a do-while rather than a while had a big hit on performance. I suppose the IL is not as fast. My PC was running the original code in 4-6 seconds; the attached code seemed to run in about 1 second.
I get better results when reading/writing in chunks of a few megabytes. Performance also changes depending on the size of the chunk.
FileInfo info = new FileInfo(@"C:\source.bin");

FileStream f = File.OpenRead(info.FullName);
BinaryReader br = new BinaryReader(f);

FileStream t = File.OpenWrite(@"C:\split.bin");
BinaryWriter bw = new BinaryWriter(t);

long count = 0;
long split = info.Length * 50 / 100;
long chunk = 8000000;

DateTime start = DateTime.Now;

while (count < split)
{
    if (count + chunk > split)
    {
        chunk = split - count;
    }
    bw.Write(br.ReadBytes((int)chunk));
    count += chunk;
}

Console.WriteLine(DateTime.Now - start);
I am developing a TCP file transfer client-server program. At the moment I am able to send text files and other file formats perfectly fine, such as .zip with all contents intact on the server end. However, when I transfer a .gif the end result is a gif with same size as the original but with only part of the image showing as if most of the bytes were lost or not written correctly on the server end.
The client sends a 1 KB header packet with the name and size of the file to the server. The server then responds with OK if ready, and then creates a fileBuffer as large as the file to be sent.
Here is some code to demonstrate my problem:
// Serverside method snippet dealing with data being sent
while (true)
{
    // Spin the data in
    if (streams[0].DataAvailable)
    {
        streams[0].Read(fileBuffer, 0, fileBuffer.Length);
        break;
    }
}

// Finished receiving file, write from buffer to created file
FileStream fs = File.Open(LOCAL_FOLDER + fileName, FileMode.CreateNew, FileAccess.Write);
fs.Write(fileBuffer, 0, fileBuffer.Length);
fs.Close();
Print("File successfully received.");
// Clientside method snippet dealing with a file send
while (true)
{
    con.Read(ackBuffer, 0, ackBuffer.Length);
    // Wait for OK response to start sending
    if (Encoding.ASCII.GetString(ackBuffer) == "OK")
    {
        // Convert file to bytes
        FileStream fs = new FileStream(inPath, FileMode.Open, FileAccess.Read);
        fileBuffer = new byte[fs.Length];
        fs.Read(fileBuffer, 0, (int)fs.Length);
        fs.Close();

        con.Write(fileBuffer, 0, fileBuffer.Length);
        con.Flush();
        break;
    }
}
I've tried a binary writer instead of just using the filestream with the same result.
Am I incorrect in believing successful file transfer to be as simple as conversion to bytes, transportation and then conversion back to filename/type?
All help/advice much appreciated.
It's not about your image; it's about your code.
If your image bytes were lost or not written correctly, that means your file transfer code is wrong, and even a .zip file or any other file you receive is going to be corrupted.
It's a huge mistake to set the byte buffer length to the file size. Imagine you're going to send a large file of about 1 GB; it's going to take 1 GB of RAM. For an ideal transfer you should loop over the file as you send it.
Here is a way to send/receive files nicely with no size limitation.
Send File
using (FileStream fs = new FileStream(srcPath, FileMode.Open, FileAccess.Read))
{
    long fileSize = fs.Length;
    long sum = 0;                  // sum here is the total of sent bytes.
    int count = 0;
    data = new byte[1024 * 8];     // 8 KB buffer .. you might use a smaller size also.
    while (sum < fileSize)
    {
        count = fs.Read(data, 0, data.Length);
        network.Write(data, 0, count);
        sum += count;
    }
    network.Flush();
}
Receive File
long fileSize = ...; // the file size you are going to receive.
using (FileStream fs = new FileStream(destPath, FileMode.Create, FileAccess.Write))
{
    int count = 0;
    long sum = 0;                  // sum here is the total of received bytes.
    data = new byte[1024 * 8];     // 8 KB buffer .. you might use a smaller size also.
    while (sum < fileSize)
    {
        if (network.DataAvailable)
        {
            count = network.Read(data, 0, data.Length);
            fs.Write(data, 0, count);
            sum += count;
        }
    }
}
happy coding :)
When you write over TCP, the data can arrive in a number of packets. I think your early tests happened to fit into one packet, but this gif file is arriving in 2 or more. So when you call Read, you'll only get what's arrived so far - you'll need to check repeatedly until you've got as many bytes as the header told you to expect.
I found Beej's guide to network programming a big help when doing some work with TCP.
As others have pointed out, the data doesn't necessarily all arrive at once, and your code is overwriting the beginning of the buffer each time through the loop. The more robust way to write your reading loop is to read as many bytes as are available and increment a counter to keep track of how many bytes have been read so far so that you know where to put them in the buffer. Something like this works well:
int totalBytesRead = 0;
int bytesRead;
do
{
    bytesRead = streams[0].Read(fileBuffer, totalBytesRead, fileBuffer.Length - totalBytesRead);
    totalBytesRead += bytesRead;
} while (bytesRead != 0);
Stream.Read will return 0 when there's no data left to read.
Doing things this way will perform better than reading a byte at a time. It also gives you a way to ensure that you read the proper number of bytes. If totalBytesRead is not equal to the number of bytes you expected when the loop is finished, then something bad happened.
Thanks for your input, tvanfosson. I tinkered around with my code and managed to get it working; the synchronization between my client and server was off. I took your advice, though, and replaced the single read with reading a byte at a time.