I have a 1 GB file that I need to write to a TcpClient object. What's the best way to do this without reading the entire file into memory?
You have to read it into memory at some point, though you obviously don't need to do it all at once.
Just use BinaryReader.Read and read n bytes at a time, something like:
using (var fs = new FileStream("test.dat", FileMode.Open, FileAccess.Read))
using (var reader = new BinaryReader(fs))
{
    byte[] buffer = new byte[8192]; // pick a sensible chunk size
    int bytesRead;
    // The offset argument is the position in the buffer, not in the file,
    // so always read into the start of the buffer; the stream position advances by itself.
    while ((bytesRead = reader.Read(buffer, 0, buffer.Length)) > 0)
    {
        // Send the first bytesRead bytes of buffer
    }
}
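To actually get those chunks onto the wire you can write each one to the TcpClient's NetworkStream as you read it. A minimal sketch, assuming an already-connected client (the method name, variable names and file path are placeholders, not from the question); on .NET 4 and later, Stream.CopyTo does the same buffered copy in one call:
using System.IO;
using System.Net.Sockets;

// Sketch: stream a file to a connected TcpClient in fixed-size chunks.
static void SendFile(TcpClient client, string path)
{
    using (NetworkStream network = client.GetStream())
    using (FileStream file = File.OpenRead(path))
    {
        byte[] buffer = new byte[8192];
        int bytesRead;
        while ((bytesRead = file.Read(buffer, 0, buffer.Length)) > 0)
        {
            network.Write(buffer, 0, bytesRead); // send only the bytes actually read
        }
        // .NET 4+ equivalent: file.CopyTo(network);
    }
}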
I'm reading values from a huge file (> 10 GB) using the following code:
FileStream fs = new FileStream(fileName, FileMode.Open);
BinaryReader br = new BinaryReader(fs);
int count = br.ReadInt32();
List<long> numbers = new List<long>(count);
for (int i = count; i > 0; i--)
{
numbers.Add(br.ReadInt64());
}
Unfortunately, the read speed from my SSD is stuck at a few MB/s. I guess the limit is the IOPS of the SSD, so it might be better to read from the file in chunks.
Question
Does the FileStream in my code really read only 8 bytes from the file every time the BinaryReader calls ReadInt64()?
If so, is there a transparent way for the BinaryReader to provide a stream that reads in larger chunks from the file to speed up the procedure?
Test-Code
Here's a minimal example that creates a test file and measures the read performance.
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
namespace TestWriteRead
{
class Program
{
static void Main(string[] args)
{
System.IO.File.Delete("test");
CreateTestFile("test", 1000000000);
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
IEnumerable<long> test = Read("test");
stopwatch.Stop();
Console.WriteLine("File loaded within " + stopwatch.ElapsedMilliseconds + "ms");
}
private static void CreateTestFile(string filename, int count)
{
FileStream fs = new FileStream(filename, FileMode.CreateNew);
BinaryWriter bw = new BinaryWriter(fs);
bw.Write(count);
for (int i = 0; i < count; i++)
{
long value = i;
bw.Write(value);
}
fs.Close();
}
private static IEnumerable<long> Read(string filename)
{
FileStream fs = new FileStream(filename, FileMode.Open);
BinaryReader br = new BinaryReader(fs);
int count = br.ReadInt32();
List<long> values = new List<long>(count);
for (int i = 0; i < count; i++)
{
long value = br.ReadInt64();
values.Add(value);
}
fs.Close();
return values;
}
}
}
You should configure the stream with FileOptions.SequentialScan to indicate that you will read the stream from start to finish. It should improve the speed significantly.
Indicates that the file is to be accessed sequentially from beginning to end. The system can use this as a hint to optimize file caching. If an application moves the file pointer for random access, optimum caching may not occur; however, correct operation is still guaranteed.
using (
var fs = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite, 8192,
FileOptions.SequentialScan))
{
var br = new BinaryReader(fs);
var count = br.ReadInt32();
var numbers = new List<long>();
for (int i = count; i > 0; i--)
{
numbers.Add(br.ReadInt64());
}
}
Try reading blocks instead:
using (
var fs = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite, 8192,
FileOptions.SequentialScan))
{
var br = new BinaryReader(fs);
var numbersLeft = br.ReadInt32(); // the count prefix was written as an Int32
byte[] buffer = new byte[8192];
var bufferOffset = 0;
var bytesLeftToReceive = sizeof(long) * numbersLeft;
var numbers = new List<long>();
while (true)
{
// Do not read more than is possible
var bytesToRead = Math.Min(bytesLeftToReceive, buffer.Length - bufferOffset);
if (bytesToRead == 0)
break;
var bytesRead = fs.Read(buffer, bufferOffset, bytesToRead);
if (bytesRead == 0)
break; //TODO: Continue to read if file is not ready?
//move forward in read counter
bytesLeftToReceive -= bytesRead;
bytesRead += bufferOffset; //include bytes from previous read.
//decide how many complete numbers we got
var numbersToCrunch = bytesRead / sizeof(long);
//crunch them
for (int i = 0; i < numbersToCrunch; i++)
{
numbers.Add(BitConverter.ToInt64(buffer, i * sizeof(long)));
}
// move the last incomplete number to the beginning of the buffer.
var remainder = bytesRead % sizeof(long);
Buffer.BlockCopy(buffer, bytesRead - remainder, buffer, 0, remainder);
bufferOffset = remainder;
}
}
Update in response to a comment:
May I know what's the reason that manual reading is faster than the other one?
I don't know how the BinaryReader is actually implemented, so this is just an assumption.
The actual read from the disk is not the expensive part. The expensive part is moving the reader arm into the correct position on the disk.
As your application isn't the only one reading from the hard drive, the disk has to re-position itself every time an application requests a read.
Thus, if the BinaryReader just reads the requested int, it has to wait on the disk for every read (if some other application makes a read in between).
As I read a much larger buffer directly (which is faster), I can process more integers without having to wait for the disk between reads.
Caching will of course speed things up a bit, and that's why it's "just" three times faster.
(future readers: If something above is incorrect, please correct me).
You can use a BufferedStream to increase the read buffer size.
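A sketch of what that could look like for the file format above (the 64 KB buffer size is an arbitrary choice):
using System.Collections.Generic;
using System.IO;

// BufferedStream serves the many small ReadInt64 calls from one larger read buffer.
using (var fs = new FileStream(fileName, FileMode.Open, FileAccess.Read))
using (var bs = new BufferedStream(fs, 64 * 1024))
using (var br = new BinaryReader(bs))
{
    int count = br.ReadInt32();
    var numbers = new List<long>(count);
    for (int i = 0; i < count; i++)
    {
        numbers.Add(br.ReadInt64());
    }
}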
In theory, memory-mapped files should help here. You could load the file into memory in several very large chunks. I'm not sure, though, how relevant this is when using SSDs.
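A rough sketch with MemoryMappedFile, assuming the same count-prefixed layout as the test code (untested here, as the answer says, so treat it as a starting point):
using System.Collections.Generic;
using System.IO;
using System.IO.MemoryMappedFiles;

// Map the whole file and read values through a view accessor.
using (var mmf = MemoryMappedFile.CreateFromFile(fileName, FileMode.Open))
using (var accessor = mmf.CreateViewAccessor(0, 0, MemoryMappedFileAccess.Read))
{
    int count = accessor.ReadInt32(0);
    var numbers = new List<long>(count);
    for (long i = 0; i < count; i++)
    {
        numbers.Add(accessor.ReadInt64(4 + i * sizeof(long))); // 4-byte count prefix, then longs
    }
}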
I am working with FileStream.Read: https://msdn.microsoft.com/en-us/library/system.io.filestream.read%28v=vs.110%29.aspx
What I'm trying to do is read a large file in a loop a certain number of bytes at a time; not the whole file at once. The code example shows this for reading:
int n = fsSource.Read(bytes, numBytesRead, numBytesToRead);
The definition of "bytes" is: "When this method returns, contains the specified byte array with the values between offset and (offset + count - 1) replaced by the bytes read from the current source."
I want to read only 1 MB at a time, so I do this:
using (FileStream fsInputFile = new FileStream(strInputFileName, FileMode.Open, FileAccess.Read)) {
int intBytesToRead = 1024;
int intTotalBytesRead = 0;
int intInputFileByteLength = 0;
byte[] btInputBlock = new byte[intBytesToRead];
byte[] btOutputBlock = new byte[intBytesToRead];
intInputFileByteLength = (int)fsInputFile.Length;
while (intInputFileByteLength - 1 >= intTotalBytesRead)
{
if (intInputFileByteLength - intTotalBytesRead < intBytesToRead)
{
intBytesToRead = intInputFileByteLength - intTotalBytesRead;
}
// *** Problem is here ***
int n = fsInputFile.Read(btInputBlock, intTotalBytesRead, intBytesToRead);
intTotalBytesRead += n;
fsOutputFile.Write(btInputBlock, intTotalBytesRead - n, n);
}
fsOutputFile.Close(); }
Where the problem area is marked, btInputBlock works on the first cycle because it reads in 1024 bytes. But on the second loop it doesn't recycle this byte array; it instead tries to append the new 1024 bytes to btInputBlock. As far as I can tell, you can only specify the offset and length of the file you want to read, not the offset and length within btInputBlock. Is there a way to "re-use" the array that FileStream.Read dumps into, or should I find another solution?
Thanks.
P.S. The exception on the read is: "Offset and length were out of bounds for the array or count is greater than the number of elements from index to the end of the source collection."
Your code can be simplified somewhat:
int num;
byte[] buffer = new byte[1024];
while ((num = fsInputFile.Read(buffer, 0, buffer.Length)) != 0)
{
//Do your work here
fsOutputFile.Write(buffer, 0, num);
}
Note that Read takes in the array to fill, the offset (which is the offset in the array where the bytes should be placed), and the (max) number of bytes to read.
That's because you're incrementing intTotalBytesRead, which is an offset for the array, not for the FileStream. In your case it should always be zero, so that each read overwrites the previous byte data in the array rather than appending it at offset intTotalBytesRead.
int n = fsInputFile.Read(btInputBlock, intTotalBytesRead, intBytesToRead); //currently
int n = fsInputFile.Read(btInputBlock, 0, intBytesToRead); //should be
FileStream doesn't need an offset; every Read picks up where the last one left off. See https://msdn.microsoft.com/en-us/library/system.io.filestream.read(v=vs.110).aspx for details.
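A small illustration of that (the file name is a placeholder): successive Read calls continue from the current position without any offset bookkeeping.
using System;
using System.IO;

using (var fs = new FileStream("example.bin", FileMode.Open, FileAccess.Read))
{
    var block = new byte[1024];
    int first = fs.Read(block, 0, block.Length);       // reads bytes 0 .. first-1
    int second = fs.Read(block, 0, block.Length);      // picks up where the first call stopped
    Console.WriteLine(fs.Position == first + second);  // True
}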
Your Read call should be Read(btInputBlock, 0, intBytesToRead). The 2nd parameter is the offset into the array you want to start writing the bytes to. Similarly for Write you want Write(btInputBlock, 0, n) as the 2nd parameter is the offset in the array to start writing bytes from. Also you don't need to call Close as the using will clean up the FileStream for you.
using (FileStream fsInputFile = new FileStream(strInputFileName, FileMode.Open, FileAccess.Read))
{
int intBytesToRead = 1024;
byte[] btInputBlock = new byte[intBytesToRead];
while (fsInputFile.Position < fsInputFile.Length)
{
int n = fsInputFile.Read(btInputBlock, 0, intBytesToRead);
fsOutputFile.Write(btInputBlock, 0, n);
}
}
I am doing some data chunking and I'm seeing an interesting issue when sending binary data in my response. I can confirm that the length of the byte array is below my data limit of 4 megabytes, but when I receive the message, its total size is over 4 megabytes.
For the example below, I used the largest chunk size I could so I could illustrate the issue while still receiving a usable chunk.
The size of the binary data is 3,040,870 on the service side and the client (once the message is deserialized). However, I can also confirm that the byte array is actually just under 4 megabytes (this was done by actually copying the binary data from the message and pasting it into a text file).
So, is WCF causing these issues and, if so, is there anything I can do to prevent it? If not, what might be causing this inflation on my side?
Thanks!
The usual way of sending byte[]s in SOAP messages is to base64-encode the data. This encoding takes 33% more space than binary encoding, which accounts for the size difference almost precisely.
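A quick way to see that expansion, using the payload size from the question:
using System;

byte[] payload = new byte[3040870];                // size reported in the question
string encoded = Convert.ToBase64String(payload);
Console.WriteLine(encoded.Length);                 // ~4,054,000 characters, about a third more than the input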
You could adjust the max size or chunk size slightly so that the end result is within the right range, or use another encoding, e.g. MTOM, to eliminate this 33% overhead.
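If the service uses something like BasicHttpBinding, switching to MTOM is a small change. A sketch in code (the quota value is illustrative; it can equally be set in config):
using System.ServiceModel;

// MTOM sends byte[] parameters as raw binary MIME parts instead of base64 text.
var binding = new BasicHttpBinding
{
    MessageEncoding = WSMessageEncoding.Mtom,
    MaxReceivedMessageSize = 4 * 1024 * 1024 // keep the 4 MB quota from the question
};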
If you're stuck with SOAP, you can offset the base64 overhead Tim S. talked about by using the System.IO.Compression library in .NET: compress the data first, before building and sending the SOAP message.
You'd compress with this:
public static byte[] Compress(byte[] data)
{
MemoryStream ms = new MemoryStream();
DeflateStream ds = new DeflateStream(ms, CompressionMode.Compress);
ds.Write(data, 0, data.Length);
ds.Flush();
ds.Close();
return ms.ToArray();
}
On the receiving end, you'd use this to decompress:
public static byte[] Decompress(byte[] data)
{
const int BUFFER_SIZE = 256;
byte[] tempArray = new byte[BUFFER_SIZE];
List<byte[]> tempList = new List<byte[]>();
int count = 0;
int length = 0;
MemoryStream ms = new MemoryStream(data);
DeflateStream ds = new DeflateStream(ms, CompressionMode.Decompress);
while ((count = ds.Read(tempArray, 0, BUFFER_SIZE)) > 0) {
if (count == BUFFER_SIZE) {
tempList.Add(tempArray);
tempArray = new byte[BUFFER_SIZE];
} else {
byte[] temp = new byte[count];
Array.Copy(tempArray, 0, temp, 0, count);
tempList.Add(temp);
}
length += count;
}
byte[] retVal = new byte[length];
count = 0;
foreach (byte[] temp in tempList) {
Array.Copy(temp, 0, retVal, count, temp.Length);
count += temp.Length;
}
return retVal;
}
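A quick round-trip check of the two helpers (the input array here is made up for the example; all-zero data compresses very well):
byte[] original = new byte[1000];                      // all zeros, highly compressible
byte[] packed = Compress(original);
byte[] restored = Decompress(packed);
Console.WriteLine(packed.Length);                      // much smaller than 1000
Console.WriteLine(restored.Length == original.Length); // True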
I have the following code:
byte[] myBytes = new byte[10 * 10000];
for (long i = 0; i < 10000; i++)
{
byte[] a1 = BitConverter.GetBytes(i);
byte[] a2 = BitConverter.GetBytes(true);
byte[] a3 = BitConverter.GetBytes(false);
byte[] rv = new byte[10];
System.Buffer.BlockCopy(a1, 0, rv, 0, a1.Length);
System.Buffer.BlockCopy(a2, 0, rv, a1.Length, a2.Length);
System.Buffer.BlockCopy(a3, 0, rv, a1.Length + a2.Length, a3.Length);
}
Everything works as it should. I was trying to convert this code so that everything is written into myBytes, but then I realised that I use a long, and if its value is higher than int.MaxValue the cast will fail.
How could one solve this?
Another question: since I don't want to create a very large byte array in memory, how could I send the data directly to my .WriteBytes(path, myBytes); function?
If the final destination for this is, as suggested, a file, then write to the file more directly rather than buffering in memory:
using (var file = File.Create(path)) // or append file FileStream etc
using (var writer = new BinaryWriter(file))
{
for (long i = 0; i < 10000; i++)
{
writer.Write(i);
writer.Write(true);
writer.Write(false);
}
}
Perhaps the ideal way of doing this in your case would be to pass a single BinaryWriter instance to each object in turn as you serialize them (don't open and close the file per-object).
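A sketch of that pattern; the Record type and its WriteTo method are made up for illustration, not part of the original code:
using System.Collections.Generic;
using System.IO;

class Record
{
    public long Id;
    public bool FlagA;
    public bool FlagB;

    // Each object writes its own fields to the shared writer.
    public void WriteTo(BinaryWriter writer)
    {
        writer.Write(Id);
        writer.Write(FlagA);
        writer.Write(FlagB);
    }
}

static void WriteAll(string path, IEnumerable<Record> records)
{
    using (var file = File.Create(path))
    using (var writer = new BinaryWriter(file))
    {
        foreach (var record in records)
        {
            record.WriteTo(writer); // one open file, many objects
        }
    }
}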
Why don't you just Write() the bytes out as you process them rather than converting to a massive buffer, or use a smaller buffer at least?
I have a client-server application where the server transmits a 4-byte integer specifying how large the next transmission is going to be. When I read the 4-byte integer on the client side (specifying FILE_SIZE), the next time I read the stream I get FILE_SIZE + 4 bytes read.
Do I need to specify the offset to 4 when reading from this stream, or is there a way to automatically advance the NetworkStream so my offset can always be 0?
SERVER
NetworkStream theStream = theClient.GetStream();
//...
//Calculate file size with FileInfo and put into byte[] size
//...
theStream.Write(size, 0, size.Length);
theStream.Flush();
CLIENT
NetworkStream theStream = theClient.GetStream();
//read size
byte[] size = new byte[4];
int bytesRead = theStream.Read(size, 0, 4);
...
//read content
byte[] content = new byte[4096];
bytesRead = theStream.Read(content, 0, 4096);
Console.WriteLine(bytesRead); // <-- Prints filesize + 4
Right; found it; FileInfo.Length is a long; your call to:
binWrite.Write(fileInfo.Length);
writes 8 bytes, little-endian. You then read that back via:
filesize = binRead.ReadInt32();
which, read little-endian, will give you the same value (for 32 bits, at least). You have four 0x00 bytes left unused in the stream, though (from the high bytes of the long), hence the 4-byte mismatch.
Use one of:
binWrite.Write((int)fileInfo.Length);
filesize = binRead.ReadInt64();
NetworkStream certainly advances, but in both cases, your read is unreliable; a classic "read known amount of content" would be:
static void ReadAll(Stream source, byte[] buffer, int bytes) {
if(bytes > buffer.Length) throw new ArgumentOutOfRangeException("bytes");
int bytesRead, offset = 0;
while(bytes > 0 && (bytesRead = source.Read(buffer, offset, bytes)) > 0) {
offset += bytesRead;
bytes -= bytesRead;
}
if(bytes != 0) throw new EndOfStreamException();
}
with:
ReadAll(theStream, size, 4);
...
ReadAll(theStream, content, contentLength);
Note also that you need to be careful with endianness when parsing the length prefix.
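One common pattern, sketched here with made-up helper names, is to send the prefix in network (big-endian) order and convert it back on receipt:
using System;
using System.Net;

// Sender: encode the length in network (big-endian) order before writing it.
static byte[] EncodeLengthPrefix(int length)
{
    return BitConverter.GetBytes(IPAddress.HostToNetworkOrder(length));
}

// Receiver: decode the 4-byte prefix back to host order before using it.
static int DecodeLengthPrefix(byte[] prefix)
{
    return IPAddress.NetworkToHostOrder(BitConverter.ToInt32(prefix, 0));
}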
I suspect you simply aren't reading the complete data.