Can't read FileStream into byte[] correctly - c#

I have some C# code to call as TF(true,"C:\input.txt","C:\noexistsyet.file"), but when I run it, it breaks on FileStream.Read() for reading the last chunk of the file into the buffer, getting an index-out-of-bounds ArgumentException.
To me, the code seems logical with no overflow for trying to write to the buffer. I thought I had all that set up with rdlen and _chunk, but maybe I'm looking at it wrong. Any help?
My error: ArgumentException was unhandled: Offset and length were out of bounds for the array or count is greater than the number of elements from index to the end of the source collection.
public static bool TF(bool tf, string filepath, string output)
{
long _chunk = 16 * 1024; //buffer count
long total_size = 0
long rdlen = 0;
long wrlen = 0;
long full_chunks = 0;
long end_remain_buf_len = 0;
FileInfo fi = new FileInfo(filepath);
total_size = fi.Length;
full_chunks = total_size / _chunk;
end_remain_buf_len = total_size % _chunk;
fi = null;
FileStream fs = new FileStream(filepath, FileMode.Open);
FileStream fw = new FileStream(output, FileMode.Create);
for (long chunk_pass = 0; chunk_pass < full_chunks; chunk_pass++)
{
int chunk = (int)_chunk * ((tf) ? (1 / 3) : 3); //buffer count for xbuffer
byte[] buffer = new byte[_chunk];
byte[] xbuffer = new byte[(buffer.Length * ((tf) ? (1 / 3) : 3))];
//Read chunk of file into buffer
fs.Read(buffer, (int)rdlen, (int)_chunk); //ERROR occurs here
//xbuffer = do stuff to make it *3 longer or *(1/3) shorter;
//Write xbuffer into chunk of completed file
fw.Write(xbuffer, (int)wrlen, chunk);
//Keep track of location in file, for index/offset
rdlen += _chunk;
wrlen += chunk;
}
if (end_remain_buf_len > 0)
{
byte[] buffer = new byte[end_remain_buf_len];
byte[] xbuffer = new byte[(buffer.Length * ((tf) ? (1 / 3) : 3))];
fs.Read(buffer, (int)rdlen, (int)end_remain_buf_len); //error here too
//xbuffer = do stuff to make it *3 longer or *(1/3) shorter;
fw.Write(xbuffer, (int)wrlen, (int)end_remain_buf_len * ((tf) ? (1 / 3) : 3));
rdlen += end_remain_buf_len;
wrlen += chunk;
}
//Close opened files
fs.Close();
fw.Close();
return false; //no functionality yet lol
}

The Read() method of Stream (the base class of FileStream) returns an int indicating the number of bytes read, and 0 when it has no more bytes to read, so you don't even need to know the file size beforehand:
public static void CopyFileChunked(int chunkSize, string filepath, string output)
{
byte[] chunk = new byte[chunkSize];
using (FileStream reader = new FileStream(filepath, FileMode.Open))
using (FileStream writer = new FileStream(output, FileMode.Create))
{
int bytes;
while ((bytes = reader.Read(chunk , 0, chunkSize)) > 0)
{
writer.Write(chunk, 0, bytes);
}
}
}
Or even File.Copy() may do the trick, if you can live with letting the framework decide about the chunk size.

I think it's failing on this line:
fw.Write(xbuffer, (int)wrlen, chunk);
You are declaring xbuffer as
byte[] xbuffer = new byte[(buffer.Length * ((tf) ? (1 / 3) : 3))];
Since 1 / 3 is an integer division, it returns 0.And you are declaring xbuffer with the size 0 hence the error.You can fix it by casting one of the operand to a floating point type or using literals.But then you still need to cast the result back to integer.
byte[] xbuffer = new byte[(int)(buffer.Length * ((tf) ? (1m / 3) : 3))];
The same problem also present in the chunk declaration.

Related

C# Garbage Collection Weird Behavior

I have a block of code that loads a custom storage file (data.00x) and dumps it's file contents (several files...) [for this example we'll say the referenced index only contains data.001 files for export]
Example:
public void ExportFileEntries(ref List<IndexEntry> filteredIndex, string dataDirectory, string buildDirectory, int chunkSize)
{
OnTotalMaxDetermined(new TotalMaxArgs(8));
// For each set of dataId files in the filteredIndex
for (int dataId = 1; dataId < 8; dataId++)
{
OnTotalProgressChanged(new TotalChangedArgs(dataId, string.Format("Exporting selected files from data.00{0}", dataId)));
// Filter only entries with current dataId into temp index
List<IndexEntry> tempIndex = GetEntriesByDataId(ref filteredIndex, dataId, SortType.Offset);
// Determine the path of the data.xxx file being exported from
string dataPath = string.Format(#"{0}\data.00{1}", dataDirectory, dataId);
if (File.Exists(dataPath))
{
// Load the data.xxx into filestream
using (FileStream dataFs = new FileStream(dataPath, FileMode.Open, FileAccess.Read))
{
// Loop through filex to export
foreach (IndexEntry indexEntry in tempIndex)
{
int fileLength = indexEntry.Length;
OnCurrentMaxDetermined(new CurrentMaxArgs(fileLength));
// Set the filestreams position to the file entries offset
dataFs.Position = indexEntry.Offset;
// Read the file into a byte array (buffer)
byte[] fileBytes = new byte[indexEntry.Length];
dataFs.Read(fileBytes, 0, fileBytes.Length);
// Define some information about the file being exported
string fileExt = Path.GetExtension(indexEntry.Name).Remove(0, 1);
string buildPath = string.Format(#"{0}\{1}\{2}", buildDirectory, fileExt.ToUpper(), indexEntry.Name);
// If needed unencrypt the data (fileBytes buffer)
if (XOR.Encrypted(fileExt)) { byte b = 0; XOR.Cipher(ref fileBytes, ref b); }
// If no chunkSize is provided, generate default
if (chunkSize == 0) { chunkSize = Math.Max(64000, (int)(fileBytes.Length * .02)); }
// If the build directory doesn't exist yet, create it.
if (!Directory.Exists(Path.GetDirectoryName(buildPath))) { Directory.CreateDirectory(Path.GetDirectoryName(buildPath)); }
using (FileStream buildFs = new FileStream(buildPath, FileMode.Create, FileAccess.Write))
{
using (BinaryWriter bw = new BinaryWriter(buildFs, encoding))
{
for (int byteCount = 0; byteCount < fileLength; byteCount += Math.Min(fileLength - byteCount, chunkSize))
{
bw.Write(fileBytes, byteCount, Math.Min(fileLength - byteCount, chunkSize));
OnCurrentProgressChanged(new CurrentChangedArgs(byteCount, ""));
}
}
}
OnCurrentProgressReset(EventArgs.Empty);
fileBytes = null;
}
}
}
else { OnError(new ErrorArgs(string.Format("[ExportFileEntries] Cannot locate: {0}", dataPath))); }
}
OnTotalProgressReset(EventArgs.Empty);
GC.Collect();
}
The data.001 stores about 12k files, most are very small .jpg pictures etc...etc.. for about the first half of the export process the gc collects just fine, but out of nowhere toward the last half of the export process the gc just stops giving a crap.
If I don't issue GC.Collect() at the end of the method the tool sits at around 255mb ram, but if I do call it goes down to about 14mb. What I'm asking, is there any obvious improvements over the way I coded the method (to increase gc performance)?

Binaryreader read from Filestream which loads in chunks

I'm reading values from a huge file (> 10 GB) using the following code:
FileStream fs = new FileStream(fileName, FileMode.Open);
BinaryReader br = new BinaryReader(fs);
int count = br.ReadInt32();
List<long> numbers = new List<long>(count);
for (int i = count; i > 0; i--)
{
numbers.Add(br.ReadInt64());
}
unfortunately the read-speed from my SSD is stuck at a few MB/s. I guess the limit are the IOPS of the SSD, so it might be better to read in chunks from the file.
Question
Does the FileStream in my code really read only 8 bytes from the file everytime the BinaryReader calls ReadInt64()?
If so, is there a transparent way for the BinaryReader to provide a stream that reads in larger chunks from the file to speed up the procedure?
Test-Code
Here's a minimal example to create a test-file and to measure the read-performance.
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
namespace TestWriteRead
{
class Program
{
static void Main(string[] args)
{
System.IO.File.Delete("test");
CreateTestFile("test", 1000000000);
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
IEnumerable<long> test = Read("test");
stopwatch.Stop();
Console.WriteLine("File loaded within " + stopwatch.ElapsedMilliseconds + "ms");
}
private static void CreateTestFile(string filename, int count)
{
FileStream fs = new FileStream(filename, FileMode.CreateNew);
BinaryWriter bw = new BinaryWriter(fs);
bw.Write(count);
for (int i = 0; i < count; i++)
{
long value = i;
bw.Write(value);
}
fs.Close();
}
private static IEnumerable<long> Read(string filename)
{
FileStream fs = new FileStream(filename, FileMode.Open);
BinaryReader br = new BinaryReader(fs);
int count = br.ReadInt32();
List<long> values = new List<long>(count);
for (int i = 0; i < count; i++)
{
long value = br.ReadInt64();
values.Add(value);
}
fs.Close();
return values;
}
}
}
You should configure the stream to use SequentialScan to indicate that you will read the stream from start to finish. It should improve the speed significantly.
Indicates that the file is to be accessed sequentially from beginning
to end. The system can use this as a hint to optimize file caching. If
an application moves the file pointer for random access, optimum
caching may not occur; however, correct operation is still guaranteed.
using (
var fs = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite, 8192,
FileOptions.SequentialScan))
{
var br = new BinaryReader(fs);
var count = br.ReadInt32();
var numbers = new List<long>();
for (int i = count; i > 0; i--)
{
numbers.Add(br.ReadInt64());
}
}
Try read blocks instead:
using (
var fs = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite, 8192,
FileOptions.SequentialScan))
{
var br = new BinaryReader(fs);
var numbersLeft = (int)br.ReadInt64();
byte[] buffer = new byte[8192];
var bufferOffset = 0;
var bytesLeftToReceive = sizeof(long) * numbersLeft;
var numbers = new List<long>();
while (true)
{
// Do not read more then possible
var bytesToRead = Math.Min(bytesLeftToReceive, buffer.Length - bufferOffset);
if (bytesToRead == 0)
break;
var bytesRead = fs.Read(buffer, bufferOffset, bytesToRead);
if (bytesRead == 0)
break; //TODO: Continue to read if file is not ready?
//move forward in read counter
bytesLeftToReceive -= bytesRead;
bytesRead += bufferOffset; //include bytes from previous read.
//decide how many complete numbers we got
var numbersToCrunch = bytesRead / sizeof(long);
//crunch them
for (int i = 0; i < numbersToCrunch; i++)
{
numbers.Add(BitConverter.ToInt64(buffer, i * sizeof(long)));
}
// move the last incomplete number to the beginning of the buffer.
var remainder = bytesRead % sizeof(long);
Buffer.BlockCopy(buffer, bytesRead - remainder, buffer, 0, remainder);
bufferOffset = remainder;
}
}
Update in response to a comment:
May I know what's the reason that manual reading is faster than the other one?
I don't know how the BinaryReader is actually implemented. So this is just assumptions.
The actual read from the disk is not the expensive part. The expensive part is to move the reader arm into the correct position on the disk.
As your application isn't the only one reading from a hard drive the disk have to re-position itself every time an application requests a read.
Thus if the BinaryReader just reads the requested int it have to wait on the disk for every read (if some other application make a read in-between).
As I read a much larger buffer directly (which is faster) I can process more integers without having to wait for the disk between reads.
Caching will of course speed things up a bit, and that's why it's "just" three times faster.
(future readers: If something above is incorrect, please correct me).
You can use a BufferedStream to increase the read buffer size.
In theory memory mapped files should help here. You could load it into memory using several very large chunks. Not sure though how much is this relevant when using SSDs.

C# Filestream Read - Recycle array?

I am working with filestream read: https://msdn.microsoft.com/en-us/library/system.io.filestream.read%28v=vs.110%29.aspx
What I'm trying to do is read a large file in a loop a certain number of bytes at a time; not the whole file at once. The code example shows this for reading:
int n = fsSource.Read(bytes, numBytesRead, numBytesToRead);
The definition of "bytes" is: "When this method returns, contains the specified byte array with the values between offset and (offset + count - 1) replaced by the bytes read from the current source."
I want to only read in 1 mb at a time so I do this:
using (FileStream fsInputFile = new FileStream(strInputFileName, FileMode.Open, FileAccess.Read)) {
int intBytesToRead = 1024;
int intTotalBytesRead = 0;
int intInputFileByteLength = 0;
byte[] btInputBlock = new byte[intBytesToRead];
byte[] btOutputBlock = new byte[intBytesToRead];
intInputFileByteLength = (int)fsInputFile.Length;
while (intInputFileByteLength - 1 >= intTotalBytesRead)
{
if (intInputFileByteLength - intTotalBytesRead < intBytesToRead)
{
intBytesToRead = intInputFileByteLength - intTotalBytesRead;
}
// *** Problem is here ***
int n = fsInputFile.Read(btInputBlock, intTotalBytesRead, intBytesToRead);
intTotalBytesRead += n;
fsOutputFile.Write(btInputBlock, intTotalBytesRead - n, n);
}
fsOutputFile.Close(); }
Where the problem area is stated, btInputBlock works on the first cycle because it reads in 1024 bytes. But then on the second loop, it doesn't recycle this byte array. It instead tries to append the new 1024 bytes into btInputBlock. As far as I can tell, you can only specify the offset and length of the file you want to read and not the offset and length of btInputBlock. Is there a way to "re-use" the array that is being dumped into by Filestream.Read or should I find another solution?
Thanks.
P.S. The exception on the read is: "Offset and length were out of bounds for the array or count is greater than the number of elements from index to the end of the source collection."
Your code can be simplified somewhat
int num;
byte[] buffer = new byte[1024];
while ((num = fsInputFile.Read(buffer, 0, buffer.Length)) != 0)
{
//Do your work here
fsOutputFile.Write(buffer, 0, num);
}
Note that Read takes in the Array to fill, the offset (which is the offset of the array where the bytes should be placed, and the (max) number of bytes to read.
That's because you're incrementing intTotalBytesRead, which is an offset for the array, not for the filestream. In your case it should always be zero, which will overwrite previous byte data in the array, rather than append it at the end, using intTotalBytesRead.
int n = fsInputFile.Read(btInputBlock, intTotalBytesRead, intBytesToRead); //currently
int n = fsInputFile.Read(btInputBlock, 0, intBytesToRead); //should be
Filestream doesn't need an offset, every Read picks up where the last one left off.
See https://msdn.microsoft.com/en-us/library/system.io.filestream.read(v=vs.110).aspx
for details
Your Read call should be Read(btInputBlock, 0, intBytesToRead). The 2nd parameter is the offset into the array you want to start writing the bytes to. Similarly for Write you want Write(btInputBlock, 0, n) as the 2nd parameter is the offset in the array to start writing bytes from. Also you don't need to call Close as the using will clean up the FileStream for you.
using (FileStream fsInputFile = new FileStream(strInputFileName, FileMode.Open, FileAccess.Read))
{
int intBytesToRead = 1024;
byte[] btInputBlock = new byte[intBytesToRead];
while (fsInputFile.Postion < fsInputFile.Length)
{
int n = fsInputFile.Read(btInputBlock, 0, intBytesToRead);
intTotalBytesRead += n;
fsOutputFile.Write(btInputBlock, 0, n);
}
}

Reading a file one byte at a time in reverse order

Hi I am trying to read a file one byte at a time in reverse order.So far I only managed to read the file from begining to end and write it on another file.
I need to be able to read the file from the end to the begining and print it to another file.
This is what I have so far:
string fileName = Console.ReadLine();
using (FileStream file = new FileStream(fileName ,FileMode.Open , FileAccess.Read))
{
//file.Seek(endOfFile, SeekOrigin.End);
int bytes;
using (FileStream newFile = new FileStream("newsFile.txt" , FileMode.Create , FileAccess.Write))
{
while ((bytes = file.ReadByte()) >= 0)
{
Console.WriteLine(bytes.ToString());
newFile.WriteByte((byte)bytes);
}
}
}
I know that I have to use the Seek method on the fileStream and that gets me to the end of the file.I already did that at the commented protion of the code , but I do not know how to read the file now in the while loop.
How can I achive this?
string fileName = Console.ReadLine();
using (FileStream file = new FileStream(fileName, FileMode.Open, FileAccess.Read))
{
byte[] output = new byte[file.Length]; // reversed file
// read the file backwards using SeekOrigin.Current
//
long offset;
file.Seek(0, SeekOrigin.End);
for (offset = 0; offset < fs.Length; offset++)
{
file.Seek(-1, SeekOrigin.Current);
output[offset] = (byte)file.ReadByte();
file.Seek(-1, SeekOrigin.Current);
}
// write entire reversed file array to new file
//
File.WriteAllBytes("newsFile.txt", output);
}
You could do it by reading one byte at a time, or you could read a larger buffer, write it to the output file in reverse, and continue like that until you've reached the beginning of the file. For example:
string inputFilename = "inputFile.txt";
string outputFilename = "outputFile.txt";
using (ofile = File.OpenWrite(outputFilename))
{
using (ifile = File.OpenRead(inputFilename))
{
int bufferSize = 4096;
byte[] buffer = new byte[bufferSize];
long filePos = ifile.Length;
do
{
long newPos = Math.Max(0, filePos - bufferSize);
int bytesToRead = (int)(filePos - newPos);
ifile.Seek(newPos, SeekOrigin.Set);
int bytesRead = ifile.Read(buffer, 0, bytesToRead);
// write the buffer to the output file, in reverse
for (int i = bytesRead-1; i >= 0; --i)
{
ofile.WriteByte(buffer[i]);
}
filePos = newPos;
} while (filePos > 0);
}
}
An obvious optimization would be to reverse the buffer after you've read it, and then write it in one whole chunk to the output file.
And if you know that the file will fit into memory, it's really easy:
var buffer = File.ReadAllBytes(inputFilename);
// now, reverse the buffer
int i = 0;
int j = buffer.Length-1;
while (i < j)
{
byte b = buffer[i];
buffer[i] = buffer[j];
buffer[j] = b;
++i;
--j;
}
// and write it
File.WriteAllBytes(outputFilename, buffer);
If the file is small (fits in your RAM) then this would work:
public static IEnumerable<byte> Reverse(string inputFilename)
{
var bytes = File.ReadAllBytes(inputFilename);
Array.Reverse(bytes);
foreach (var b in bytes)
{
yield return b;
}
}
Usage:
foreach (var b in Reverse("smallfile.dat"))
{
}
If the file is large (bigger than your RAM) then this would work:
using (var inputFile = File.OpenRead("bigfile.dat"))
using (var inputFileReversed = new ReverseStream(inputFile))
using (var binaryReader = new BinaryReader(inputFileReversed))
{
while (binaryReader.BaseStream.Position != binaryReader.BaseStream.Length)
{
var b = binaryReader.ReadByte();
}
}
It uses the ReverseStream class which can be found here.

zlib from C++ to C#(How to convert byte[] to stream and stream to byte[])

My task is to decompress a packet(received) using zlib and then use an algoritm to make a picture from the data
The good news is that I have the code in C++,but the task is to do it in C#
C++
//Read the first values of the packet received
DWORD image[200 * 64] = {0}; //used for algoritm(width always = 200 and height always == 64)
int imgIndex = 0; //used for algoritm
unsigned char rawbytes_[131072] = {0}; //read below
unsigned char * rawbytes = rawbytes_; //destrination parameter for decompression(ptr)
compressed = r.Read<WORD>(); //the length of the compressed bytes(picture)
uncompressed = r.Read<WORD>(); //the length that should be after decompression
width = r.Read<WORD>(); //the width of the picture
height = r.Read<WORD>(); //the height of the picture
LPBYTE ptr = r.GetCurrentStream(); //the bytes(file that must be decompressed)
outLen = uncompressed; //copy the len into another variable
//Decompress
if(uncompress((Bytef*)rawbytes, &outLen, ptr, compressed) != Z_OK)
{
printf("Could not uncompress the image code.\n");
Disconnect();
return;
}
//Algoritm to make up the picture
// Loop through the data
for(int c = 0; c < (int)height; ++c)
{
for(int r = 0; r < (int)width; ++r)
{
imgIndex = (height - 1 - c) * width + r;
image[imgIndex] = 0xFF000000;
if(-((1 << (0xFF & (r & 0x80000007))) & rawbytes[((c * width + r) >> 3)]))
image[imgIndex] = 0xFFFFFFFF;
}
}
I'm trying to do this with zlib.NET ,but all demos have that code to decompress(C#)
private void decompressFile(string inFile, string outFile)
{
System.IO.FileStream outFileStream = new System.IO.FileStream(outFile, System.IO.FileMode.Create);
zlib.ZOutputStream outZStream = new zlib.ZOutputStream(outFileStream);
System.IO.FileStream inFileStream = new System.IO.FileStream(inFile, System.IO.FileMode.Open);
try
{
CopyStream(inFileStream, outZStream);
}
finally
{
outZStream.Close();
outFileStream.Close();
inFileStream.Close();
}
}
public static void CopyStream(System.IO.Stream input, System.IO.Stream output)
{
byte[] buffer = new byte[2000];
int len;
while ((len = input.Read(buffer, 0, 2000)) > 0)
{
output.Write(buffer, 0, len);
}
output.Flush();
}
My problem:I don't want to save the file after decompression,because I have to use the algoritm shown in the C++ code.
How to convert the byte[] array into a stream similiar to the one in the C# zlib code to decompress the data and then how to convert the stream back into byte array?
Also,How to change the zlib.NET code to NOT save files?
Just use MemoryStreams instead of FileStreams:
// Assuming inputData is a byte[]
MemoryStream input = new MemoryStream(inputData);
MemoryStream output = new MemoryStream();
Then you can use output.ToArray() afterwards to get a byte array out.
Note that it's generally better to use using statements instead of a single try/finally block - as otherwise if the first call to Close fails, the rest won't be made. You can nest them like this:
using (MemoryStream output = new MemoryStream())
using (Stream outZStream = new zlib.ZOutputStream(output))
using (Stream input = new MemoryStream(bytes))
{
CopyStream(inFileStream, outZStream);
return output.ToArray();
}
I just ran into this same issue.
For Completeness... (since this stumped me for several hours)
In the case of ZLib.Net you also have to call finish(), which usually happens during Close(), before you call return output.ToArray()
Otherwise you will get an empty/incomplete byte array from your memory stream, because the ZStream hasn't actually written all of the data yet:
public static void CompressData(byte[] inData, out byte[] outData)
{
using (MemoryStream outMemoryStream = new MemoryStream())
using (ZOutputStream outZStream = new ZOutputStream(outMemoryStream, zlibConst.Z_DEFAULT_COMPRESSION))
using (Stream inMemoryStream = new MemoryStream(inData))
{
CopyStream(inMemoryStream, outZStream);
outZStream.finish();
outData = outMemoryStream.ToArray();
}
}
public static void DecompressData(byte[] inData, out byte[] outData)
{
using (MemoryStream outMemoryStream = new MemoryStream())
using (ZOutputStream outZStream = new ZOutputStream(outMemoryStream))
using (Stream inMemoryStream = new MemoryStream(inData))
{
CopyStream(inMemoryStream, outZStream);
outZStream.finish();
outData = outMemoryStream.ToArray();
}
}
In this example I'm also using the zlib namespace:
using zlib;
Originally found in this thread:
ZLib decompression
I don't have enough points to vote up yet, so...
Thanks to Tim Greaves for the tip regarding finish before ToArray
And Jon Skeet for the tip regarding nesting the using statements for streams (which I like much better than try/finally)

Categories

Resources