I am working on improving a stream reader class that uses a BinaryReader. It consists of a while loop that uses .PeekChar() to check if more data exists to continue processing.
The very first operation is a .ReadInt32() which reads 4 bytes. What if PeekChar only "saw" one byte (or one bit)? This doesn't seem like a reliable way of checking for EOF.
The BinaryReader is constructed using its default parameters, which as I understand it, uses UTF8 as the default encoding. I assume that .PeekChar() checks for 8 bits but I really am not sure.
How many bits does .PeekChar() look for? (and what are some alternate methods to checking for EOF?)
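For context, the loop looks roughly like this (simplified; the file name and field names are just placeholders, not my real code):

using (var reader = new BinaryReader(File.OpenRead("data.bin")))
{
    while (reader.PeekChar() != -1)          // is this a safe EOF check?
    {
        int recordType = reader.ReadInt32(); // the first read is always 4 bytes
        // ... read the rest of the record ...
    }
}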
In the documentation for BinaryReader.PeekChar I read:
ArgumentException: The current character cannot be decoded into the internal character buffer by using the Encoding selected for the stream.
This makes it clear that the number of bytes read depends on the Encoding applied to the stream.
EDIT
Actually, the definition according to MSDN is:
Returns the next available character and does not advance the byte or character position.
In fact, it depends on the encoding whether this is one byte or more.
Hope this helps.
Making your Read*() calls blindly and handling the EndOfStreamException that gets thrown at the end of the data is the normal approach. I don't believe the stream position is moved if anything goes wrong.
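A minimal sketch of that approach (the file name is just a placeholder; the Position/Length alternative assumes a seekable stream):

using System.IO;

using (var reader = new BinaryReader(File.OpenRead("data.bin")))
{
    try
    {
        while (true)
        {
            int value = reader.ReadInt32();   // throws EndOfStreamException at EOF
            // ... read and process the rest of the record ...
        }
    }
    catch (EndOfStreamException)
    {
        // no more data
    }

    // Alternative for seekable streams: stop before the read instead of catching:
    // while (reader.BaseStream.Position < reader.BaseStream.Length) { ... }
}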
The PeekChar() method of BinaryReader is very buggy. Even when reading from a memory stream with UTF-8 encoded data, PeekChar() throws an exception after reading a particular length of the stream. The BCL team has acknowledged the issue but has not committed to resolving it; their only advice is to avoid using PeekChar() if you can.
I'm creating an application that will take an image in a certain format from one of a video game's files and convert it to a DDS. This requires me to build the DDS in a buffer and then write it out to a DDS file. This buffer is of type List<byte>.
I first write the magic number, which is just the text "DDS ", with this code:
ddsFile.AddRange(Encoding.ASCII.GetBytes("DDS "));
I then need to write the header size, which is always 124 (0x7C, stored little-endian so the bytes are 7C 00 00 00), and this is where I've hit a wall. I used this code to write it to the buffer:
ddsFile.AddRange(BitConverter.GetBytes(0x0000007C));
This made sense to me because BitConverter.GetBytes() says itself that it returns a byte[], and it happily accepts an int as a parameter. Additionally, this was what I saw recommended when looking for a way to add multi-byte values to a byte list. But for whatever reason, when the program tries to execute that line, this exception is thrown:
Unable to cast object of type 'System.Byte[]' to type 'System.IConvertible'.
But what's even more strange to the point of being ridiculous is that, upon seeing what did make it into the buffer, I see that the int actually was being written to the buffer, but the exception was still occurring for who knows what reason.
Bizarrely, even writing a single byte to the list after writing the magic number, e.g. ddsFile.Add((byte)0x00), results in the same thing.
Any help in figuring out why this exception occurs and/or a solution would be greatly appreciated.
This is not an answer to the question but a suggestion to do it differently.
Instead of using a List<byte> and manually doing all the conversions (certainly possible, but cumbersome), use a stream and a BinaryWriter - the stream can be a MemoryStream if you want to buffer the image in memory, or a FileStream if you want to write it to disk right away.
Using a BinaryWriter against the stream makes the conversions a lot simpler (and you can still manually convert parts of the data easily, if you need to do so).
Here's a short example:
var ms = new MemoryStream();
var bw = new BinaryWriter(ms, Encoding.ASCII);
bw.Write("DDS ");
bw.Write(124); // writes 4 bytes
bw.Write((byte) 124); // writes 1 byte
...
Use whichever overload of Write() you need to output the right bytes. (This short example omits cleanup, but if you use a file stream you'll need to make sure it is properly flushed and closed.)
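For example, if you want to write straight to a file, something like this (the file name is just a placeholder) takes care of the cleanup:

using (var fs = File.Create("output.dds"))            // placeholder file name
using (var bw = new BinaryWriter(fs, Encoding.ASCII))
{
    bw.Write(Encoding.ASCII.GetBytes("DDS "));        // magic number as raw bytes
    bw.Write(124);                                     // header size, written little-endian
    // ... rest of the header and the pixel data ...
}                                                      // writer and stream are flushed and closed here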
I want to use the stream class to read/write data to/from a serial port. I use BaseStream to get the stream (link below), but the Length property doesn't work. Does anyone know how I can read the full buffer without knowing how many bytes there are?
http://msdn.microsoft.com/en-us/library/system.io.ports.serialport.basestream.aspx
You can't. That is, you can't guarantee that you've received everything if all you have is the BaseStream.
There are two ways you can know if you've received everything:
Send a length word as the first 2 or 4 bytes of the packet. That says how many bytes will follow. Your reader then reads that length word, reads that many bytes, and knows it's done (there's a sketch of this after the list).
Agree on a record separator. That works great for text; for example, you might decide that a null byte or an end-of-line character signals the end of the data. This is somewhat more difficult with arbitrary binary data, but possible.
Or, depending on your application, you can do some kind of timing. That is, if you haven't received anything new for X number of seconds (or milliseconds?), you assume that you've received everything. That has the obvious drawback of not working well if the sender is especially slow.
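Here's a rough sketch of the length-word approach against the port's BaseStream; the 4-byte little-endian length prefix is just an assumption about your protocol, and you'd drop these methods into whatever class owns the port:

using System;
using System.IO;

static byte[] ReadPacket(Stream stream)
{
    byte[] lengthBytes = ReadExactly(stream, 4);
    int length = BitConverter.ToInt32(lengthBytes, 0);   // assumes a 4-byte little-endian length prefix
    return ReadExactly(stream, length);
}

// Stream.Read may return fewer bytes than requested, so loop until we have them all.
static byte[] ReadExactly(Stream stream, int count)
{
    var buffer = new byte[count];
    int offset = 0;
    while (offset < count)
    {
        int read = stream.Read(buffer, offset, count - offset);
        if (read == 0) throw new EndOfStreamException();
        offset += read;
    }
    return buffer;
}

You would then call ReadPacket(port.BaseStream) in your receive loop.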
Maybe you can try the SerialPort.BytesToRead property.
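For example (the event wiring and handler name here are assumptions, not from the question):

// Inside a SerialPort.DataReceived handler, grab whatever has arrived so far.
private void OnDataReceived(object sender, SerialDataReceivedEventArgs e)
{
    var port = (SerialPort)sender;
    var chunk = new byte[port.BytesToRead];
    port.Read(chunk, 0, chunk.Length);
    // Append chunk to your own buffer; BytesToRead only tells you what has
    // arrived so far, not whether the complete message is there yet.
}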
I have a socket-based application that exposes received data with a BinaryReader object on the client side. I've been trying to debug an issue where the data contained in the reader is not clean... i.e. the buffer that I'm reading contains old data past the size of the new data.
In the code below:
System.Diagnostics.Debug.WriteLine("Stream length: {0}", _binaryReader.BaseStream.Length);
byte[] buffer = _binaryReader.ReadBytes((int)_binaryReader.BaseStream.Length);
When I comment out the first line, the data doesn't end up being dirty (or, doesn't end up being dirty as regularly) as when I have that print line statement. As far as I can tell, from the server side the data is coming in cleanly, so it's possible that my socket implementation has some issues. But does anyone have any idea why adding that print line would cause the data to be dirty more often?
Your binary reader looks like it is a private member variable (if the leading underscore is a telltale sign).
Is your application multithreaded? You could be experiencing a race condition if another thread is also attempting to use your BinaryReader while you are reading from it. The fact that you experience issues even without that line seems quite suspect to me.
Are you sure that your reading logic is correct? Stream.Length indicates the length of the entire stream, not the length of the remaining data to be read.
Suppose that, initially, 100 bytes were available. Length is 100, and BinaryReader correctly reads 100 bytes and advances the stream position by 100. Then another 20 bytes arrive. Length is now 120; however, your BinaryReader should only be reading 20 bytes, not 120. The 'extra' 100 bytes requested in the second read would either cause it to block or (if the stream is not implemented correctly) break.
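In other words, something along these lines (a sketch, assuming the underlying stream is seekable):

long remaining = _binaryReader.BaseStream.Length - _binaryReader.BaseStream.Position;
byte[] buffer = _binaryReader.ReadBytes((int)remaining);   // read only what hasn't been read yet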
The problem was silly and unrelated. I believe my reading logic above is correct, however. The issue was that the _binaryReader I was using was a reference that was not owned by my class and hence the underlying stream was being rewritten with bad data.
I have a Decoder instance, and am using its convert method in conjunction with a file reader to read in data with the appropriate encoding.
I'd like to switch the Decoder instance I'm using mid-way through my read. However, I'm conscious that the original decoder may have some bytes buffered internally (incomplete chars) even when bytesUsed equals byteCount, so the switch sounds as though it could result in data loss.
Can I retrieve the internal byte buffer so I might pass it through? Additionally, this switch is only to occur when a fallback takes place - I have considered using the invalid bytes position made available by the fallback exception to 'split' the current read buffer (as presumably at this point any previous buffered bytes have been used), but perhaps there's a better way?
Thanks in advance,
James
This link explains the Encoder.GetBytes method, and the bool parameter called flush is explained there too. The explanation of flush is:
true if this encoder can flush its state at the end of the conversion; otherwise, false. To ensure correct termination of a sequence of blocks of encoded bytes, the last call to GetBytes can specify a value of true for flush.
but I didn't understand what flush does; maybe I am drunk or something :). Can you explain it in more detail, please?
Suppose you receive data over a socket connection. You will receive a long text as several byte[] blocks.
It is possible that one Unicode character occupies 2+ bytes in a UTF-8 stream and that it is split over two byte blocks. Decoding the two byte blocks separately (and concatenating the strings) would produce an error.
So you can only specify flush=true on the last block. And of course, if you only have 1 block then that is also the last.
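A small sketch of the problem, with 'é' (0xC3 0xA9 in UTF-8) split across two blocks; the byte values are just made up for the example:

using System;
using System.Text;

byte[] block1 = { 0x48, 0x69, 0xC3 };   // "Hi" plus the first byte of 'é'
byte[] block2 = { 0xA9 };               // the second byte of 'é'

// Wrong: decoding each block independently mangles the split character.
Console.WriteLine(Encoding.UTF8.GetString(block1) + Encoding.UTF8.GetString(block2));

// Right: one Decoder keeps the dangling 0xC3 buffered between calls,
// and flush is only set to true on the last block.
Decoder decoder = Encoding.UTF8.GetDecoder();
var chars = new char[16];
var sb = new StringBuilder();
sb.Append(chars, 0, decoder.GetChars(block1, 0, block1.Length, chars, 0, false));
sb.Append(chars, 0, decoder.GetChars(block2, 0, block2.Length, chars, 0, true));
Console.WriteLine(sb.ToString());       // "Hié"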
Tip: use a TextReader and let it handle these problems for you.
Edit
The mirror problem (that was actually asked: GetBytes) is slightly harder to explain.
Using flush=true is roughly the same as calling Encoder.Reset() after GetBytes(...): it clears the 'state' of the encoder, including trailing characters at the end of the previous data block, such as an unmatched high surrogate.
The basic idea is the same: when converting from string to blocks of bytes, or vice versa, the blocks are not independent.
Internally the Encoder is implemented with a buffer; this buffer may need to be flushed (cleared) in order to end the conversion correctly or to prepare the Encoder for the next block.
Here is one explanation of buffer flushing.
The exact usage of the flush parameter is described here:
true to clear the internal state of the encoder after the conversion; otherwise, false.
Flushing will reset the internal state of the encoder instance used to encode the text into bytes. Why does it need internal state, you ask? Well, to quote MSDN:
The flush parameter is useful for flushing a high-surrogate at the end of a stream that does not have a low-surrogate. For example, the Encoder created by UTF8Encoding.GetEncoder uses this parameter to determine whether to write out a dangling high-surrogate at the end of a character block.
Hence, if you're using multiple GetBytes() calls, you would want to flush the internal state at the end to terminate any character sequences that need terminating, but only at the end, since terminating sequences might otherwise be introduced in the middle of words.
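A sketch of that situation, splitting the surrogate pair for '𝄞' (U+1D11E) across two GetBytes calls:

using System;
using System.Text;

string text = "A\uD834\uDD1E";          // "A" followed by one surrogate pair
char[] chars = text.ToCharArray();
Encoder encoder = Encoding.UTF8.GetEncoder();
byte[] output = new byte[16];

// The first block ends in the middle of the pair; with flush = false the
// dangling high surrogate stays buffered inside the encoder.
int written = encoder.GetBytes(chars, 0, 2, output, 0, false);

// The last block sets flush = true so the encoder emits everything it held back.
written += encoder.GetBytes(chars, 2, 1, output, written, true);

Console.WriteLine(BitConverter.ToString(output, 0, written));   // 41-F0-9D-84-9E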
Note that this may be a purely theoretical problem these days. And, you'd be better off using higher-level wrappers anyway. If you do, being drunk will not be a problem.