I am working with FileStream.Read: https://msdn.microsoft.com/en-us/library/system.io.filestream.read%28v=vs.110%29.aspx
What I'm trying to do is read a large file in a loop, a certain number of bytes at a time, not the whole file at once. The code example shows this for reading:
int n = fsSource.Read(bytes, numBytesRead, numBytesToRead);
The definition of "bytes" is: "When this method returns, contains the specified byte array with the values between offset and (offset + count - 1) replaced by the bytes read from the current source."
I want to read in only 1 MB at a time, so I do this:
using (FileStream fsInputFile = new FileStream(strInputFileName, FileMode.Open, FileAccess.Read))
{
    int intBytesToRead = 1024;
    int intTotalBytesRead = 0;
    int intInputFileByteLength = 0;
    byte[] btInputBlock = new byte[intBytesToRead];
    byte[] btOutputBlock = new byte[intBytesToRead];

    intInputFileByteLength = (int)fsInputFile.Length;

    while (intInputFileByteLength - 1 >= intTotalBytesRead)
    {
        if (intInputFileByteLength - intTotalBytesRead < intBytesToRead)
        {
            intBytesToRead = intInputFileByteLength - intTotalBytesRead;
        }

        // *** Problem is here ***
        int n = fsInputFile.Read(btInputBlock, intTotalBytesRead, intBytesToRead);
        intTotalBytesRead += n;
        fsOutputFile.Write(btInputBlock, intTotalBytesRead - n, n);
    }

    fsOutputFile.Close();
}
Where the problem area is marked, btInputBlock works on the first cycle because it reads in 1024 bytes. But on the second loop, it doesn't recycle this byte array; it instead tries to append the new 1024 bytes past the end of btInputBlock. As far as I can tell, you can only specify the offset and length of the file you want to read, and not the offset and length within btInputBlock. Is there a way to "re-use" the array that FileStream.Read dumps into, or should I find another solution?
Thanks.
P.S. The exception on the read is: "Offset and length were out of bounds for the array or count is greater than the number of elements from index to the end of the source collection."
Your code can be simplified somewhat:
int num;
byte[] buffer = new byte[1024];
while ((num = fsInputFile.Read(buffer, 0, buffer.Length)) != 0)
{
//Do your work here
fsOutputFile.Write(buffer, 0, num);
}
Note that Read takes the array to fill, the offset (which is the offset into the array where the bytes should be placed), and the (max) number of bytes to read.
That's because you're passing intTotalBytesRead as the offset, and that parameter is an offset into the array, not into the FileStream. In your case it should always be zero, so each read overwrites the previous byte data in the array instead of trying to append it at offset intTotalBytesRead.
int n = fsInputFile.Read(btInputBlock, intTotalBytesRead, intBytesToRead); //currently
int n = fsInputFile.Read(btInputBlock, 0, intBytesToRead); //should be
FileStream doesn't need an offset; every Read picks up where the last one left off. See https://msdn.microsoft.com/en-us/library/system.io.filestream.read(v=vs.110).aspx for details.
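Putting that fix into your original loop, a minimal sketch would look like this (keeping your length-based bookkeeping, and assuming fsOutputFile is opened elsewhere just as in your code):

while (intInputFileByteLength - 1 >= intTotalBytesRead)
{
    if (intInputFileByteLength - intTotalBytesRead < intBytesToRead)
    {
        intBytesToRead = intInputFileByteLength - intTotalBytesRead;
    }
    // Array offset is 0, so each pass simply overwrites the previous block.
    int n = fsInputFile.Read(btInputBlock, 0, intBytesToRead);
    intTotalBytesRead += n;
    fsOutputFile.Write(btInputBlock, 0, n);
}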
Your Read call should be Read(btInputBlock, 0, intBytesToRead). The second parameter is the offset into the array where you want to start writing the bytes. Similarly, for Write you want Write(btInputBlock, 0, n), as the second parameter is the offset in the array to start writing bytes from. Also, you don't need to call Close, since the using block will clean up the FileStream for you.
using (FileStream fsInputFile = new FileStream(strInputFileName, FileMode.Open, FileAccess.Read))
{
    int intBytesToRead = 1024;
    byte[] btInputBlock = new byte[intBytesToRead];

    while (fsInputFile.Position < fsInputFile.Length)
    {
        int n = fsInputFile.Read(btInputBlock, 0, intBytesToRead);
        fsOutputFile.Write(btInputBlock, 0, n);
    }
}
Related
In the text file (test.txt) I have the value
"Hi this my code project file"
Minimum = 5 and maximum = 6.
The output I need is:
minimum = 5 ("Hi t)
maximum = 6 ("Hi th)
I believe you are looking for a function that reads the text file into a stream and then parses it into a string variable. Once you have done that, you can call stringVariable.Substring(0, x) to get the substring you are looking for.
Here is code that demonstrates the idea.
public string GetSubString(string filePath, int x)
{
    byte[] buffer;
    FileStream fileStream = new FileStream(filePath, FileMode.Open, FileAccess.Read);
    try
    {
        int length = (int)fileStream.Length;  // get file length
        buffer = new byte[length];            // create buffer
        int count;                            // actual number of bytes read
        int sum = 0;                          // total number of bytes read

        // read until Read method returns 0 (end of the stream has been reached)
        while ((count = fileStream.Read(buffer, sum, length - sum)) > 0)
            sum += count;                     // sum is the buffer offset for the next read
    }
    finally
    {
        fileStream.Close();
    }

    var str = System.Text.Encoding.Default.GetString(buffer);
    string sub = str.Substring(0, x);
    return sub;
}
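A usage sketch under the same assumptions (the file path and the length 5 are placeholders):

string firstFive = GetSubString("test.txt", 5);
Console.WriteLine(firstFive); // prints the first 5 characters of the file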
Here is a simpler version of Jeffrey's answer:
Note that double quotes inside a C# string literal have to be escaped with a backslash (\"), which is what the sample below does.
int min = 5;
int max = 6;
String s = "\"Hi this my code project file\"";
String minS = s.Substring(0, min);
String maxS = s.Substring(0, max);
Console.WriteLine(minS);
Console.WriteLine(maxS);
While converting some older code to use async in C#, I started seeing problems with variations in the return values of the Read() and ReadAsync() methods of DeflateStream.
I thought that the transition from synchronous code like
bytesRead = deflateStream.Read(buffer, 0, uncompressedSize);
to its equivalent asynchronous version of
bytesRead = await deflateStream.ReadAsync(buffer, 0, uncompressedSize);
should always return the same value.
See the updated code added at the bottom of this question - it uses the streams the correct way, making the initial question irrelevant.
I found that after a number of iterations this didn't hold true, and in my specific case it was causing random errors in the converted application.
Am I missing something here?
Below is a simple repro case (in a console app), where the Assert will break for me in the ReadAsync method on iteration #412, giving output that looks like this:
....
ReadAsync #410 - 2055 bytes read
ReadAsync #411 - 2055 bytes read
ReadAsync #412 - 453 bytes read
---- DEBUG ASSERTION FAILED ----
My question is, why is the DeflateStream.ReadAsync method returning 453 bytes at this point?
Note: this only happens with certain input strings - the massive StringBuilder block in CreateProblemDataString was the best way I could think of to construct the string for this post.
class Program
{
static byte[] DataAsByteArray;
static int uncompressedSize;
static void Main(string[] args)
{
string problemDataString = CreateProblemDataString();
DataAsByteArray = Encoding.ASCII.GetBytes(problemDataString);
uncompressedSize = DataAsByteArray.Length;
MemoryStream memoryStream = new MemoryStream();
using (DeflateStream deflateStream = new DeflateStream(memoryStream, CompressionMode.Compress, true))
{
for (int i = 0; i < 1000; i++)
{
deflateStream.Write(DataAsByteArray, 0, uncompressedSize);
}
}
// now read it back synchronously
Read(memoryStream);
// now read it back asynchronously
Task retval = ReadAsync(memoryStream);
retval.Wait();
}
static void Read(MemoryStream memoryStream)
{
memoryStream.Position = 0;
using (DeflateStream deflateStream = new DeflateStream(memoryStream, CompressionMode.Decompress, true))
{
byte[] buffer = new byte[uncompressedSize];
int bytesRead = -1;
int i = 0;
while (bytesRead > 0 || bytesRead == -1)
{
bytesRead = deflateStream.Read(buffer, 0, uncompressedSize);
System.Diagnostics.Debug.WriteLine("Read #{0} - {1} bytes read", i, bytesRead);
System.Diagnostics.Debug.Assert(bytesRead == 0 || bytesRead == uncompressedSize);
i++;
}
}
}
static async Task ReadAsync(MemoryStream memoryStream)
{
memoryStream.Position = 0;
using (DeflateStream deflateStream = new DeflateStream(memoryStream, CompressionMode.Decompress, true))
{
byte[] buffer = new byte[uncompressedSize];
int bytesRead = -1;
int i = 0;
while (bytesRead > 0 || bytesRead == -1)
{
bytesRead = await deflateStream.ReadAsync(buffer, 0, uncompressedSize);
System.Diagnostics.Debug.WriteLine("ReadAsync #{0} - {1} bytes read", i, bytesRead);
System.Diagnostics.Debug.Assert(bytesRead == 0 || bytesRead == uncompressedSize);
i++;
}
}
}
/// <summary>
/// This is one of the strings of data that was causing issues.
/// </summary>
/// <returns></returns>
static string CreateProblemDataString()
{
StringBuilder sb = new StringBuilder();
sb.Append("0601051081 ");
sb.Append(" ");
sb.Append(" 225021 0300420");
sb.Append("34056064070072076361102 13115016017");
sb.Append("5 192 230237260250 2722");
sb.Append("73280296 326329332 34535535");
sb.Append("7 3 ");
sb.Append(" 4");
sb.Append(" ");
sb.Append(" 50");
sb.Append("6020009 030034045 063071076 360102 13");
sb.Append("1152176160170 208206 23023726025825027227328");
sb.Append("2283285 320321333335341355357 622005009 0");
sb.Append("34053 060070 361096 130151176174178172208");
sb.Append("210198 235237257258256275276280290293 3293");
sb.Append("30334 344348350 ");
sb.Append(" ");
sb.Append(" ");
sb.Append(" ");
sb.Append(" 225020012014 046042044034061");
sb.Append("075078 361098 131152176160170 208195210 230");
sb.Append("231260257258271272283306 331332336 3443483");
sb.Append("54 29 ");
sb.Append(" ");
sb.Append(" 2");
sb.Append("5 29 06 0");
sb.Append("1 178 17");
sb.Append("4 205 2");
sb.Append("05 195 2");
sb.Append("31 231 23");
sb.Append("7 01 01 0");
sb.Append("2 260 26");
sb.Append("2 274 2");
sb.Append("72 274 01 01 0");
sb.Append("3 1 5 3 6 43 52 ");
return sb.ToString();
}
}
UPDATED CODE TO READ STREAMS INTO BUFFER CORRECTLY
Output now looks like this:
...
ReadAsync #410 - 2055 bytes read
ReadAsync #411 - 2055 bytes read
ReadAsync PARTIAL #412 - 453 bytes read, offset for next read = 453
ReadAsync #412 - 1602 bytes read
ReadAsync #413 - 2055 bytes read
...
static void Read(MemoryStream memoryStream)
{
memoryStream.Position = 0;
using (DeflateStream deflateStream = new DeflateStream(memoryStream, CompressionMode.Decompress, true))
{
byte[] buffer = new byte[uncompressedSize]; // buffer to hold known fixed size record.
int bytesRead; // number of bytes read from Read operation
int offset = 0; // offset for writing into buffer
int i = -1; // counter to track iteration #
while ((bytesRead = deflateStream.Read(buffer, offset, uncompressedSize - offset)) > 0)
{
offset += bytesRead; // offset in buffer for results of next reading
System.Diagnostics.Debug.Assert(offset <= uncompressedSize, "should never happen - because would mean more bytes read than requested.");
if (offset == uncompressedSize) // buffer full, complete fixed size record in buffer.
{
offset = 0; // buffer is now filled, next read to start at beginning of buffer again.
i++; // increment counter that tracks iteration #
System.Diagnostics.Debug.WriteLine("Read #{0} - {1} bytes read", i, bytesRead);
}
else // buffer still not full
{
System.Diagnostics.Debug.WriteLine("Read PARTIAL #{0} - {1} bytes read, offset for next read = {2}", i+1, bytesRead, offset);
}
}
}
}
static async Task ReadAsync(MemoryStream memoryStream)
{
memoryStream.Position = 0;
using (DeflateStream deflateStream = new DeflateStream(memoryStream, CompressionMode.Decompress, true))
{
byte[] buffer = new byte[uncompressedSize]; // buffer to hold known fixed size record.
int bytesRead; // number of bytes read from Read operation
int offset = 0; // offset for writing into buffer
int i = -1; // counter to track iteration #
while ((bytesRead = await deflateStream.ReadAsync(buffer, offset, uncompressedSize - offset)) > 0)
{
offset += bytesRead; // offset in buffer for results of next reading
System.Diagnostics.Debug.Assert(offset <= uncompressedSize, "should never happen - because would mean more bytes read than requested.");
if (offset == uncompressedSize) // buffer full, complete fixed size record in buffer.
{
offset = 0; // buffer is now filled, next read to start at beginning of buffer again.
i++; // increment counter that tracks iteration #
System.Diagnostics.Debug.WriteLine("ReadAsync #{0} - {1} bytes read", i, bytesRead);
}
else // buffer still not full
{
System.Diagnostics.Debug.WriteLine("ReadAsync PARTIAL #{0} - {1} bytes read, offset for next read = {2}", i+1, bytesRead, offset);
}
}
}
}
Damien's comments are exactly correct. But, your mistake is a common enough one and IMHO the question deserves an actual answer, if for no other reason than to help others who make the same mistake more easily find the answer to the question.
So, to be clear:
As is true for all of the stream-oriented I/O methods in .NET where one provides a byte[] buffer and the number of bytes read is returned by the method, the only assumptions you can make about the number of bytes are:
The number will not be larger than the maximum number of bytes you asked to read (i.e. passed to the method as the count of bytes to read)
The number will be non-negative, and will be greater than 0 as long as there were in fact data remaining to be read (0 will be returned when you reach the end of the stream).
When reading using any of these methods, you cannot even count on the same method always returning the same number of bytes on every call (in some contexts this does happen to be deterministic, but you should still not rely on that), and there is no guarantee of any sort that different methods, even ones reading from the same source, will return the same number of bytes as each other.
It is up to the caller to read the bytes as a stream, taking into account the return value specifying the number of bytes read for each call, and reassembling those bytes in whatever manner is appropriate for that particular stream of bytes.
Note that when dealing with Stream objects, you can use the Stream.CopyTo() method. Of course, it only copies to another Stream object. But in many cases, the destination object can be used without treating it as a Stream. E.g. you just want to write the data as a file, or you want to copy it to a MemoryStream and then use the MemoryStream.ToArray() method to turn that into an array of bytes (which you can then access without any concern about how many bytes have been read in a given read operation…by the time you get to the array, all of them have been read :) ).
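For example, a minimal sketch of that MemoryStream variant (the input path here is a placeholder; CopyTo handles the read loop for you):

byte[] allBytes;
using (var source = File.OpenRead(inputPath)) // inputPath is a placeholder
using (var destination = new MemoryStream())
{
    source.CopyTo(destination);        // CopyTo loops over Read internally
    allBytes = destination.ToArray();  // the full contents as a byte array
}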
I have some C# code to call as TF(true,"C:\input.txt","C:\noexistsyet.file"), but when I run it, it breaks on FileStream.Read() for reading the last chunk of the file into the buffer, getting an index-out-of-bounds ArgumentException.
To me, the code seems logical, with no overflow when writing to the buffer. I thought I had all that set up with rdlen and _chunk, but maybe I'm looking at it wrong. Any help?
My error: ArgumentException was unhandled: Offset and length were out of bounds for the array or count is greater than the number of elements from index to the end of the source collection.
public static bool TF(bool tf, string filepath, string output)
{
long _chunk = 16 * 1024; //buffer count
long total_size = 0;
long rdlen = 0;
long wrlen = 0;
long full_chunks = 0;
long end_remain_buf_len = 0;
FileInfo fi = new FileInfo(filepath);
total_size = fi.Length;
full_chunks = total_size / _chunk;
end_remain_buf_len = total_size % _chunk;
fi = null;
FileStream fs = new FileStream(filepath, FileMode.Open);
FileStream fw = new FileStream(output, FileMode.Create);
for (long chunk_pass = 0; chunk_pass < full_chunks; chunk_pass++)
{
int chunk = (int)_chunk * ((tf) ? (1 / 3) : 3); //buffer count for xbuffer
byte[] buffer = new byte[_chunk];
byte[] xbuffer = new byte[(buffer.Length * ((tf) ? (1 / 3) : 3))];
//Read chunk of file into buffer
fs.Read(buffer, (int)rdlen, (int)_chunk); //ERROR occurs here
//xbuffer = do stuff to make it *3 longer or *(1/3) shorter;
//Write xbuffer into chunk of completed file
fw.Write(xbuffer, (int)wrlen, chunk);
//Keep track of location in file, for index/offset
rdlen += _chunk;
wrlen += chunk;
}
if (end_remain_buf_len > 0)
{
byte[] buffer = new byte[end_remain_buf_len];
byte[] xbuffer = new byte[(buffer.Length * ((tf) ? (1 / 3) : 3))];
fs.Read(buffer, (int)rdlen, (int)end_remain_buf_len); //error here too
//xbuffer = do stuff to make it *3 longer or *(1/3) shorter;
fw.Write(xbuffer, (int)wrlen, (int)end_remain_buf_len * ((tf) ? (1 / 3) : 3));
rdlen += end_remain_buf_len;
wrlen += chunk;
}
//Close opened files
fs.Close();
fw.Close();
return false; //no functionality yet lol
}
The Read() method of Stream (the base class of FileStream) returns an int indicating the number of bytes read, and 0 when it has no more bytes to read, so you don't even need to know the file size beforehand:
public static void CopyFileChunked(int chunkSize, string filepath, string output)
{
byte[] chunk = new byte[chunkSize];
using (FileStream reader = new FileStream(filepath, FileMode.Open))
using (FileStream writer = new FileStream(output, FileMode.Create))
{
int bytes;
while ((bytes = reader.Read(chunk, 0, chunkSize)) > 0)
{
writer.Write(chunk, 0, bytes);
}
}
}
Or even File.Copy() may do the trick, if you can live with letting the framework decide about the chunk size.
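For reference, a one-line sketch of that call (assuming you want to overwrite an existing destination file):

File.Copy(filepath, output, true); // true = overwrite the destination if it already exists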
I think it's failing on this line:
fw.Write(xbuffer, (int)wrlen, chunk);
You are declaring xbuffer as
byte[] xbuffer = new byte[(buffer.Length * ((tf) ? (1 / 3) : 3))];
Since 1 / 3 is integer division, it returns 0, so you are declaring xbuffer with size 0, hence the error. You can fix it by casting one of the operands to a floating-point (or decimal) type, or by using a suitable literal. But then you still need to cast the result back to an integer:
byte[] xbuffer = new byte[(int)(buffer.Length * ((tf) ? (1m / 3) : 3))];
The same problem is also present in the chunk declaration.
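For instance, a sketch of the chunk line with the same fix applied (decimal division, then a cast back to int):

int chunk = (int)(_chunk * ((tf) ? (1m / 3) : 3)); // 1m forces decimal division instead of integer division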
I have the following code:
byte[] myBytes = new byte[10 * 10000];
for (long i = 0; i < 10000; i++)
{
byte[] a1 = BitConverter.GetBytes(i);
byte[] a2 = BitConverter.GetBytes(true);
byte[] a3 = BitConverter.GetBytes(false);
byte[] rv = new byte[10];
System.Buffer.BlockCopy(a1, 0, rv, 0, a1.Length);
System.Buffer.BlockCopy(a2, 0, rv, a1.Length, a2.Length);
System.Buffer.BlockCopy(a3, 0, rv, a1.Length + a2.Length, a3.Length);
}
Everything works as it should. I was trying to convert this code so that everything gets written into myBytes, but then I realised that I use a long, and if its value is higher than int.MaxValue the cast will fail.
How could one solve this?
Another question: since I don't want to create a very large byte array in memory, how could I send it directly to my .WriteBytes(path, myBytes); function?
If the final destination for this is, as suggested, a file, then write to the file more directly rather than buffering everything in memory:
using (var file = File.Create(path)) // or append file FileStream etc
using (var writer = new BinaryWriter(file))
{
for (long i = 0; i < 10000; i++)
{
writer.Write(i);
writer.Write(true);
writer.Write(false);
}
}
Perhaps the ideal way of doing this in your case would be to pass a single BinaryWriter instance to each object in turn as you serialize them (don't open and close the file per-object).
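A minimal sketch of that pattern (Record and WriteTo are hypothetical names, not from your code):

class Record
{
    public long Id;
    public bool FlagA;
    public bool FlagB;

    // Each object writes itself to a writer that the caller owns.
    public void WriteTo(BinaryWriter writer)
    {
        writer.Write(Id);
        writer.Write(FlagA);
        writer.Write(FlagB);
    }
}

// One file, one writer, many records ('records' is assumed to exist):
using (var file = File.Create(path))
using (var writer = new BinaryWriter(file))
{
    foreach (var record in records)
        record.WriteTo(writer);
}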
Why don't you just Write() the bytes out as you process them rather than converting to a massive buffer, or use a smaller buffer at least?
Best explained with code:
long pieceLength = (long)Math.Pow(2, 18); //simplification
...
public void HashFile(string path)
{
using (FileStream fin = File.OpenRead(path))
{
byte[] buffer = new byte[(int)pieceLength];
int pieceNum = 0;
long remaining = fin.Length;
int done = 0;
int offset = 0;
while (remaining > 0)
{
while (done < pieceLength)
{
int toRead = (int)Math.Min(pieceLength, remaining);
int read = fin.Read(buffer, offset, toRead);
//if read == 0, EOF reached
if (read == 0)
break;
offset += read;
done += read;
remaining -= read;
}
HashPiece(buffer, pieceNum);
done = 0;
pieceNum++;
buffer = new byte[(int)pieceLength];
}
}
}
This works fine if the file is smaller than pieceLength and only does the outer loop once. However, if the file is larger, it throws this at me:
This is in the int read = fin.Read(buffer, offset, toRead); line.
Unhandled Exception: System.ArgumentException: Offset and length were out of bounds for the array or count is greater than the number of elements from index to the end of the source collection.
at System.IO.FileStream.Read(Byte[] array, Int32 offset, Int32 count)
done and buffer do get reinitialized properly. The file is larger than 1 MB.
Thanks in advance
Well, at least one problem is that you're not taking into account the "piece already read" when you work out how much to read. Try this:
int toRead = (int)Math.Min(pieceLength - done, remaining);
And then also adjust where you're reading to within the buffer:
int read = fin.Read(buffer, done, toRead);
(as you're resetting done for the new buffer, but not offset).
Oh, and at that point offset is irrelevant, so remove it.
Then note djna's answer as well - consider the case where for whatever reason you read to the end of the file, but without remaining becoming zero. You may want to consider whether remaining is actually useful at all... why not just keep reading blocks until you get to the end of the stream?
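A sketch of that simplification, dropping remaining entirely and relying on Read returning 0 at end of stream (HashPiece and pieceLength are from your existing code):

public void HashFile(string path)
{
    using (FileStream fin = File.OpenRead(path))
    {
        byte[] buffer = new byte[(int)pieceLength];
        int pieceNum = 0;
        int done = 0;
        int read;
        while ((read = fin.Read(buffer, done, (int)pieceLength - done)) > 0)
        {
            done += read;
            if (done == pieceLength) // a full piece has been assembled
            {
                HashPiece(buffer, pieceNum++);
                done = 0;
                buffer = new byte[(int)pieceLength];
            }
        }
        if (done > 0) // hash the final, partial piece at the end of the file
            HashPiece(buffer, pieceNum);
    }
}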
You don't adjust the value of "remaining" in this case
if (read == 0)
break;
The FileStream.Read method's Offset and Length parameters relate to positions in the buffer, not to positions in the file.
Basically, this should fix it:
int read = fin.Read(buffer, 0, toRead);