Best explained with code:
long pieceLength = (long)Math.Pow(2, 18); // simplification
...
public void HashFile(string path)
{
    using (FileStream fin = File.OpenRead(path))
    {
        byte[] buffer = new byte[(int)pieceLength];
        int pieceNum = 0;
        long remaining = fin.Length;
        int done = 0;
        int offset = 0;
        while (remaining > 0)
        {
            while (done < pieceLength)
            {
                int toRead = (int)Math.Min(pieceLength, remaining);
                int read = fin.Read(buffer, offset, toRead);
                // if read == 0, EOF reached
                if (read == 0)
                    break;
                offset += read;
                done += read;
                remaining -= read;
            }
            HashPiece(buffer, pieceNum);
            done = 0;
            pieceNum++;
            buffer = new byte[(int)pieceLength];
        }
    }
}
This works fine if the file is smaller than pieceLength and the outer loop runs only once. However, if the file is larger, it throws this at me:
Unhandled Exception: System.ArgumentException: Offset and length were out of bounds for the array or count is greater than the number of elements from index to the end of the source collection.
   at System.IO.FileStream.Read(Byte[] array, Int32 offset, Int32 count)
The exception is thrown at the int read = fin.Read(buffer, offset, toRead); line.
done and buffer DO get reinitialized properly. The file is larger than 1 MB.
Thanks in advance
Well, at least one problem is that you're not taking into account the "piece already read" when you work out how much to read. Try this:
int toRead = (int)Math.Min(pieceLength - done, remaining);
And then also adjust where you're reading to within the buffer:
int read = fin.Read(buffer, done, toRead);
(as you're resetting done for the new buffer, but not offset).
Oh, and at that point offset is irrelevant, so remove it.
Then note djna's answer as well: consider the case where, for whatever reason, you reach the end of the file without remaining becoming zero. You may want to consider whether remaining is actually useful at all. Why not just keep reading blocks until you get to the end of the stream?
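Putting both suggestions together (use done as the write position inside the piece, and stop at end of stream instead of tracking remaining), a minimal sketch might look like the following. HashPieces is a hypothetical name, and SHA-1 is an assumption standing in for the question's HashPiece, which isn't shown; the input can be any Stream, such as the FileStream returned by File.OpenRead.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

public static class PieceHasher
{
    // Hypothetical helper: splits a stream into fixed-size pieces and returns
    // one hash per piece. SHA-1 is an assumption, not taken from the question.
    public static List<byte[]> HashPieces(Stream input, int pieceLength)
    {
        var hashes = new List<byte[]>();
        byte[] buffer = new byte[pieceLength];
        using (SHA1 sha1 = SHA1.Create())
        {
            while (true)
            {
                // "done" is both the progress counter and the offset into the buffer.
                int done = 0;
                while (done < pieceLength)
                {
                    int read = input.Read(buffer, done, pieceLength - done);
                    if (read == 0)
                        break; // end of stream
                    done += read;
                }
                if (done == 0)
                    break; // nothing left to hash
                // The last piece may be shorter than pieceLength.
                hashes.Add(sha1.ComputeHash(buffer, 0, done));
                if (done < pieceLength)
                    break; // a short piece means the stream has ended
            }
        }
        return hashes;
    }
}
```

Reusing one buffer is safe here because each hash is computed before the buffer is refilled; if HashPiece kept a reference to the array, you would still need a fresh buffer per piece, as in the question.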
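You don't adjust the value of remaining in this case, so if Read returns 0 before remaining reaches zero, the outer while (remaining > 0) loop never ends: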
if (read == 0)
break;
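The offset and count parameters of FileStream.Read relate to positions in the buffer, not to positions in the file; the stream keeps track of its own file position.
Basically, this should fix it:
int read = fin.Read(buffer, 0, toRead);
(This relies on each Read filling the whole piece in one call. If a Read can return fewer bytes, pass done as the offset instead, so later reads don't overwrite the start of the piece.)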
Related
I am working with FileStream.Read: https://msdn.microsoft.com/en-us/library/system.io.filestream.read%28v=vs.110%29.aspx
What I'm trying to do is read a large file in a loop a certain number of bytes at a time; not the whole file at once. The code example shows this for reading:
int n = fsSource.Read(bytes, numBytesRead, numBytesToRead);
The definition of "bytes" is: "When this method returns, contains the specified byte array with the values between offset and (offset + count - 1) replaced by the bytes read from the current source."
I want to read in only 1 MB at a time, so I do this:
using (FileStream fsInputFile = new FileStream(strInputFileName, FileMode.Open, FileAccess.Read))
{
    int intBytesToRead = 1024;
    int intTotalBytesRead = 0;
    int intInputFileByteLength = 0;
    byte[] btInputBlock = new byte[intBytesToRead];
    byte[] btOutputBlock = new byte[intBytesToRead];
    intInputFileByteLength = (int)fsInputFile.Length;
    while (intInputFileByteLength - 1 >= intTotalBytesRead)
    {
        if (intInputFileByteLength - intTotalBytesRead < intBytesToRead)
        {
            intBytesToRead = intInputFileByteLength - intTotalBytesRead;
        }
        // *** Problem is here ***
        int n = fsInputFile.Read(btInputBlock, intTotalBytesRead, intBytesToRead);
        intTotalBytesRead += n;
        fsOutputFile.Write(btInputBlock, intTotalBytesRead - n, n);
    }
    fsOutputFile.Close();
}
At the line marked as the problem, btInputBlock works on the first pass because it reads in 1024 bytes. On the second pass, however, it doesn't reuse the byte array; it instead tries to append the next 1024 bytes to btInputBlock. As far as I can tell, you can only specify the offset and length within the file you want to read, not the offset and length within btInputBlock. Is there a way to reuse the array that FileStream.Read fills, or should I find another solution?
Thanks.
P.S. The exception on the read is: "Offset and length were out of bounds for the array or count is greater than the number of elements from index to the end of the source collection."
Your code can be simplified somewhat:
int num;
byte[] buffer = new byte[1024];
while ((num = fsInputFile.Read(buffer, 0, buffer.Length)) != 0)
{
    //Do your work here
    fsOutputFile.Write(buffer, 0, num);
}
Note that Read takes the array to fill, the offset (the position in the array where the bytes should be placed), and the maximum number of bytes to read.
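The simplified loop can be wrapped in a small reusable method. BlockCopier and CopyInBlocks are hypothetical names; any pair of streams works, such as the question's fsInputFile and fsOutputFile.

```csharp
using System;
using System.IO;

public static class BlockCopier
{
    // Copies input to output through one reused buffer. Every Read starts
    // filling the buffer at index 0; the stream itself remembers its position.
    public static long CopyInBlocks(Stream input, Stream output, int blockSize = 1024)
    {
        byte[] buffer = new byte[blockSize];
        long total = 0;
        int num;
        while ((num = input.Read(buffer, 0, buffer.Length)) != 0)
        {
            output.Write(buffer, 0, num); // write only the bytes actually read
            total += num;
        }
        return total;
    }
}
```

To read 1 MB at a time, as the question intends, pass 1024 * 1024 as blockSize.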
That's because you're incrementing intTotalBytesRead, which is used as an offset into the array, not into the file stream. In your case the offset should always be zero, so that each read overwrites the previous data in the array instead of appending it after intTotalBytesRead.
int n = fsInputFile.Read(btInputBlock, intTotalBytesRead, intBytesToRead); //currently
int n = fsInputFile.Read(btInputBlock, 0, intBytesToRead); //should be
FileStream doesn't need an offset: every Read picks up where the last one left off.
See https://msdn.microsoft.com/en-us/library/system.io.filestream.read(v=vs.110).aspx for details.
Your Read call should be Read(btInputBlock, 0, intBytesToRead). The second parameter is the offset into the array where you want to start writing the bytes. Similarly, for Write you want Write(btInputBlock, 0, n), as the second parameter is the offset into the array from which to start writing bytes. Also, if fsOutputFile is wrapped in its own using block, you don't need to call Close, because the using will dispose the FileStream for you.
using (FileStream fsInputFile = new FileStream(strInputFileName, FileMode.Open, FileAccess.Read))
{
    int intBytesToRead = 1024;
    int intTotalBytesRead = 0;
    byte[] btInputBlock = new byte[intBytesToRead];
    while (fsInputFile.Position < fsInputFile.Length)
    {
        int n = fsInputFile.Read(btInputBlock, 0, intBytesToRead);
        intTotalBytesRead += n;
        fsOutputFile.Write(btInputBlock, 0, n);
    }
}
What I'm doing is taking a user-entered string, creating a packet with the data, and then sending it out to a serial port. I am then reading back the data I send via a loopback connector. My send is working flawlessly; however, my receive randomly throws an arithmetic overflow exception.
I say randomly because it is not happening consistently. For example, I send the message "hello" twice. The first time works fine, the second time outputs nothing and throws an exception. I restart my program, run the code again, and send hello only to receive "hell" and then an exception. On rare occasion, I'll receive the packet 3 or 4 times in a row without error before the exception.
Here is my relevant code:
public void receivePacket(object sender, SerialDataReceivedEventArgs e)
{
    byte[] tempByte = new byte[2];
    int byteCount = 0;
    while (serialPort1.BytesToRead > 0)
    {
        if (byteCount <= 1)
        {
            tempByte[byteCount] = (byte)serialPort1.ReadByte();
        }
        if (byteCount == 1)
        {
            receivedString = new byte[tempByte[byteCount]];
            receivedString[0] = tempByte[0];
            receivedString[1] = tempByte[1];
        }
        else if (byteCount > 1)
        {
            byte b = (byte)serialPort1.ReadByte();
            receivedString[byteCount] = b;
        }
        byteCount++;
    }
    int strLen = (byteCount - 3);
    tempByte = new byte[strLen];
    int newBit = 0;
    for (int i = 2; i <= strLen+1; i++)
    {
        tempByte[newBit] = receivedString[i];
        newBit++;
    }
    string receivedText = encoder.GetString(tempByte);
    SetText(receivedText.ToString());
}
I'm well aware that my implementation using byteCount (which I use to traverse the byte array) is rather sloppy. When I step through the code, I find that when I get the error, byteCount == 1, which makes strLen negative. (strLen is byteCount - 3 because the packet contains a header, a length byte, and a CRC, so byteCount - 3 is the number of actual data bytes received.) This leads to tempByte having a size of -2, which throws my exception. However, I am having a very hard time figuring out why byteCount is being set to 1.
The code after this basically just traverses the data section of the array, copies it into the tempByte, then is sent off to a function to append the text in another thread.
I am guessing that byteCount is 1 because you only received one byte - or rather, you processed the first byte before the second one arrived in the buffer.
SerialPort.ReadByte waits for a byte to arrive if there isn't one already waiting (up to ReadTimeout, which is infinite by default), whereas BytesToRead only reports what has already arrived.
Maybe if instead of checking BytesToRead, you did something more like this:
byte headerByte = (byte)serialPort1.ReadByte();
byte length = (byte)serialPort1.ReadByte();
receivedString = new byte[length];
receivedString[0] = headerByte;
receivedString[1] = length;
for (int i = 2; i < length; i++) {
    receivedString[i] = (byte)serialPort1.ReadByte();
}
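The same idea can be written as a self-contained sketch over a plain Stream, so it can be tested without hardware; on a real port you could pass serialPort1.BaseStream. The packet layout is inferred from the question (a header byte, a length byte counting the whole packet, the data, then a one-byte CRC), PacketReader and ReadPacketData are hypothetical names, and the CRC is not checked because its algorithm isn't given.

```csharp
using System;
using System.IO;

public static class PacketReader
{
    // Assumed framing: [header][length][data...][crc], where length counts
    // every byte of the packet, so the data part is length - 3 bytes.
    public static byte[] ReadPacketData(Stream port)
    {
        ReadOne(port); // header byte (not validated in this sketch)
        int length = ReadOne(port);
        if (length < 3)
            throw new InvalidDataException("Packet length " + length + " is too short.");

        // Read the remaining length - 2 bytes (data plus CRC) in a loop,
        // because a single Read may return fewer bytes than requested.
        byte[] rest = new byte[length - 2];
        int offset = 0;
        while (offset < rest.Length)
        {
            int read = port.Read(rest, offset, rest.Length - offset);
            if (read == 0)
                throw new EndOfStreamException();
            offset += read;
        }

        byte[] data = new byte[length - 3];
        Array.Copy(rest, 0, data, 0, data.Length); // drop the trailing CRC byte
        return data;
    }

    private static int ReadOne(Stream port)
    {
        int b = port.ReadByte(); // Stream.ReadByte returns -1 at end of stream
        if (b < 0)
            throw new EndOfStreamException();
        return b;
    }
}
```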
I'm writing an interface for talking to a piece of test equipment. The equipment talks over a serial port and responds with a known number of bytes to each command I send it.
My current structure is:
Send command
Read number of specified bytes back
Proceed with application
However, when I use SerialPort.Read(byte[], int, int), the call does not block until all the bytes have arrived. For example, if I call MySerialPort.Read(byteBuffer, 0, bytesExpected);, it returns with fewer than bytesExpected bytes. Here is my code:
public bool ReadData(byte[] responseBytes, int bytesExpected, int timeOut)
{
    MySerialPort.ReadTimeout = timeOut;
    int bytesRead = MySerialPort.Read(responseBytes, 0, bytesExpected);
    return bytesRead == bytesExpected;
}
And I call this method like this:
byte[] responseBytes = new byte[13];
if (!Connection.ReadData(responseBytes, 13, 5000))
    ProduceError();
My problem is that I can't ever seem to get it to read the full 13 bytes like I am telling it. If I put a Thread.Sleep(1000) right before my SerialPort.Read(...) everything works fine.
How can I force the Read method to block until either the timeOut is exceeded or the specified number of bytes are read?
That is expected: most I/O APIs only let you specify an upper bound. They are simply required to return at least one byte, unless it is an EOF, in which case they can return a non-positive value. To compensate, you loop:
public bool ReadData(byte[] responseBytes, int bytesExpected, int timeOut)
{
    MySerialPort.ReadTimeout = timeOut;
    int offset = 0, bytesRead;
    while(bytesExpected > 0 &&
        (bytesRead = MySerialPort.Read(responseBytes, offset, bytesExpected)) > 0)
    {
        offset += bytesRead;
        bytesExpected -= bytesRead;
    }
    return bytesExpected == 0;
}
The only problem is you might need to reduce the timeout per iteration, by using a Stopwatch or similar to see how much time has passed.
Note that I also removed the ref on responseBytes: you don't need it, because you never reassign that parameter.
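One way to apply the Stopwatch suggestion is sketched below against a general Stream (for a serial port, SerialPort.BaseStream) rather than SerialPort itself. TimedReader and ReadWithDeadline are hypothetical names, and the catch clause assumes timeouts surface as TimeoutException, as SerialPort.Read documents; other stream types may report them differently, for example as an IOException.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class TimedReader
{
    // Fills buffer with exactly count bytes, giving up once the total time
    // spent exceeds timeoutMs. Returns false on timeout or end of stream.
    public static bool ReadWithDeadline(Stream stream, byte[] buffer, int count, int timeoutMs)
    {
        Stopwatch clock = Stopwatch.StartNew();
        int offset = 0;
        while (offset < count)
        {
            // Shrink the per-call timeout by the time already spent.
            long remainingMs = timeoutMs - clock.ElapsedMilliseconds;
            if (remainingMs <= 0)
                return false;
            if (stream.CanTimeout)
                stream.ReadTimeout = (int)remainingMs;

            int read;
            try
            {
                read = stream.Read(buffer, offset, count - offset);
            }
            catch (TimeoutException)
            {
                return false; // assumed timeout signal (SerialPort behaviour)
            }
            if (read == 0)
                return false; // end of stream before count bytes arrived
            offset += read;
        }
        return true;
    }
}
```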
Try changing the timeout to SerialPort.InfiniteTimeout.
SerialPort.Read is expected to throw a TimeoutException in case no bytes are available before SerialPort.ReadTimeout.
So this method reads exactly the desired number of bytes, or throws an exception:
public byte[] ReadBytes(int byteCount)
{
    try
    {
        int totBytesRead = 0;
        byte[] rxBytes = new byte[byteCount];
        while (totBytesRead < byteCount)
        {
            int bytesRead = comPort.Read(rxBytes, totBytesRead, byteCount - totBytesRead);
            totBytesRead += bytesRead;
        }
        return rxBytes;
    }
    catch (Exception ex)
    {
        throw new MySerialComPortException("SerialComPort.ReadBytes error", ex);
    }
}
This is a little more tricky than I first imagined. I'm trying to read n bytes from a stream.
MSDN says that Read does not have to return n bytes: it must return at least 1 and up to n bytes, with 0 bytes being the special case of reaching the end of the stream.
Typically, I'm using something like:
var buf = new byte[size];
var count = stream.Read(buf, 0, size);
if (count != size) {
    buf = buf.Take(count).ToArray();
}
yield return buf;
I'm hoping for exactly size bytes, but by spec FileStream would be allowed to return a large number of 1-byte chunks as well. This must be avoided.
One way to solve this would be to have 2 buffers, one for reading and one for collecting the chunks until we got the requested number of bytes. That's a little cumbersome though.
I also had a look at BinaryReader but its spec also does not clearly state that n bytes will be returned for sure.
To clarify: of course, upon reaching the end of the stream the returned number of bytes may be less than size; that's not a problem. I'm only talking about not receiving n bytes even though they are available in the stream.
A slightly more readable version:
int offset = 0;
while (offset < count)
{
    int read = stream.Read(buffer, offset, count - offset);
    if (read == 0)
        throw new System.IO.EndOfStreamException();
    offset += read;
}
Or written as an extension method for the Stream class:
public static class StreamUtils
{
    public static byte[] ReadExactly(this System.IO.Stream stream, int count)
    {
        byte[] buffer = new byte[count];
        int offset = 0;
        while (offset < count)
        {
            int read = stream.Read(buffer, offset, count - offset);
            if (read == 0)
                throw new System.IO.EndOfStreamException();
            offset += read;
        }
        System.Diagnostics.Debug.Assert(offset == count);
        return buffer;
    }
}
Simply put, you loop:
int read, offset = 0;
while(leftToRead > 0 && (read = stream.Read(buf, offset, leftToRead)) > 0) {
    leftToRead -= read;
    offset += read;
}
if(leftToRead > 0) throw new EndOfStreamException(); // not enough!
After this, buf should have been populated with exactly the right amount of data from the stream, or will have thrown an EOF.
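The question allows a short result at end of stream, while the answers above throw instead. Below is a sketch of a variant that keeps whatever was read, together with a hypothetical OneByteStream test wrapper that returns at most one byte per Read call (which the Stream.Read contract permits), to show that the loop still collects full chunks. StreamReadHelpers, ReadUpTo, and OneByteStream are illustrative names.

```csharp
using System;
using System.IO;

public static class StreamReadHelpers
{
    // Reads until size bytes have arrived or the stream ends; the result is
    // shorter than size only at end of stream.
    public static byte[] ReadUpTo(Stream stream, int size)
    {
        byte[] buffer = new byte[size];
        int offset = 0;
        while (offset < size)
        {
            int read = stream.Read(buffer, offset, size - offset);
            if (read == 0)
                break; // end of stream: keep what we have
            offset += read;
        }
        if (offset != size)
            Array.Resize(ref buffer, offset); // trim, like BinaryReader.ReadBytes
        return buffer;
    }
}

// Hypothetical test stream: never returns more than one byte per Read call.
public class OneByteStream : Stream
{
    private readonly Stream inner;
    public OneByteStream(Stream inner) { this.inner = inner; }

    public override int Read(byte[] buffer, int offset, int count)
        => inner.Read(buffer, offset, Math.Min(count, 1));

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}
```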
Putting together the answers here, I came up with the following solution. It relies on the source stream's Length, so the source must support seeking. It works on .NET Core 3.1.
/// <summary>
/// Copy stream based on source stream length
/// </summary>
/// <param name="source"></param>
/// <param name="destination"></param>
/// <param name="bufferSize">
/// A value that is the largest multiple of 4096 and is still smaller than the LOH threshold (85K).
/// So the buffer is likely to be collected at Gen0, and it offers a significant improvement in Copy performance.
/// </param>
/// <returns></returns>
private async Task CopyStream(Stream source, Stream destination, int bufferSize = 81920)
{
    var buffer = new byte[bufferSize];
    long offset = 0; // long, so sources larger than 2 GB don't overflow
    while (offset < source.Length)
    {
        var leftToRead = source.Length - offset;
        var lengthToRead = leftToRead < buffer.Length ? (int)leftToRead : buffer.Length;
        var read = await source.ReadAsync(buffer, 0, lengthToRead).ConfigureAwait(false);
        if (read == 0)
            break;
        // Write only the bytes actually read; ReadAsync may return fewer than requested.
        await destination.WriteAsync(buffer, 0, read).ConfigureAwait(false);
        offset += read;
    }
    destination.Seek(0, SeekOrigin.Begin);
}
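If the goal is simply to copy one stream into another, the framework's Stream.CopyToAsync(Stream, int) already runs this kind of read/write loop and stops at end of stream instead of relying on Length, so the source need not be seekable. A sketch of the same method built on it (StreamCopy and CopyAndRewindAsync are illustrative names):

```csharp
using System.IO;
using System.Threading.Tasks;

public static class StreamCopy
{
    // Delegates the loop to Stream.CopyToAsync, which reads until end of stream.
    public static async Task CopyAndRewindAsync(Stream source, Stream destination, int bufferSize = 81920)
    {
        await source.CopyToAsync(destination, bufferSize).ConfigureAwait(false);
        // Mirror the original's final Seek, but only when the destination allows it.
        if (destination.CanSeek)
            destination.Seek(0, SeekOrigin.Begin);
    }
}
```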
I was searching for a BinaryReader.Skip function when I came across this feature request on MSDN.
The author said you can provide your own BinaryReader.Skip() function by using the extension method shown at the end of this question.
Only looking at this code, I'm wondering why he chose this way to skip a certain amount of bytes:
for (int i = 0; i < count; i++) {
    reader.ReadByte();
}
Is there a difference between that and:
reader.ReadBytes(count);
Even if it's just a small optimization, I'd like to understand it, because right now it doesn't make sense to me why you would use the for loop.
// Extension methods must be declared static, inside a static class.
public static void Skip(this BinaryReader reader, int count) {
    if (reader.BaseStream.CanSeek) {
        reader.BaseStream.Seek(count, SeekOrigin.Current);
    }
    else {
        for (int i = 0; i < count; i++) {
            reader.ReadByte();
        }
    }
}
No, there is no difference. EDIT: assuming that the stream has enough bytes.
The ReadByte method simply forwards to the underlying Stream's ReadByte method.
The ReadBytes method calls the underlying stream's Read until it reads the required number of bytes.
It's defined like this:
public virtual byte[] ReadBytes(int count) {
    if (count < 0) throw new ArgumentOutOfRangeException("count", Environment.GetResourceString("ArgumentOutOfRange_NeedNonNegNum"));
    Contract.Ensures(Contract.Result<byte[]>() != null);
    Contract.Ensures(Contract.Result<byte[]>().Length <= Contract.OldValue(count));
    Contract.EndContractBlock();
    if (m_stream==null) __Error.FileNotOpen();

    byte[] result = new byte[count];

    int numRead = 0;
    do {
        int n = m_stream.Read(result, numRead, count);
        if (n == 0)
            break;
        numRead += n;
        count -= n;
    } while (count > 0);

    if (numRead != result.Length) {
        // Trim array. This should happen on EOF & possibly net streams.
        byte[] copy = new byte[numRead];
        Buffer.InternalBlockCopy(result, 0, copy, 0, numRead);
        result = copy;
    }

    return result;
}
For most streams, ReadBytes will probably be faster.
ReadByte will throw an EndOfStreamException if the end of the stream is reached, whereas ReadBytes will not. It depends on whether you want Skip to throw if it cannot skip the requested number of bytes without reaching the end of the stream.
ReadBytes is faster than multiple ReadByte calls.
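If you do want Skip to fail when the stream is too short, as ReadByte does, one possible sketch checks the remaining length before seeking and discards non-seekable data in chunks rather than one byte at a time. BinaryReaderExtensions and the 4096-byte scratch size are assumptions, not part of the original feature request.

```csharp
using System;
using System.IO;

public static class BinaryReaderExtensions
{
    // Hypothetical helper: skips count bytes and throws EndOfStreamException
    // if the stream ends first (ReadByte's behaviour, not ReadBytes').
    public static void Skip(this BinaryReader reader, int count)
    {
        if (count < 0)
            throw new ArgumentOutOfRangeException(nameof(count));
        Stream stream = reader.BaseStream;
        if (stream.CanSeek)
        {
            // Seeking past the end doesn't fail on FileStream or MemoryStream,
            // so check the remaining length explicitly.
            if (stream.Length - stream.Position < count)
                throw new EndOfStreamException();
            stream.Seek(count, SeekOrigin.Current);
            return;
        }
        // Non-seekable: read and discard in chunks instead of byte by byte.
        byte[] scratch = new byte[Math.Min(count, 4096)];
        while (count > 0)
        {
            int read = stream.Read(scratch, 0, Math.Min(count, scratch.Length));
            if (read == 0)
                throw new EndOfStreamException();
            count -= read;
        }
    }
}
```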
It's a very small optimization that will occasionally skip bytes (rather than reading them with ReadByte). Think of it this way:
if(vowel)
{
    println(vowel);
}
else
{
    nextLetter();
}
If you can prevent that extra function call, you save a little runtime.