Why is my encoding showing twice? - c#

byte[] lengthBytes = new byte[4];
serverStream.Read(lengthBytes, 0, 4);
MessageBox.Show("'>>" + System.Text.Encoding.UTF8.GetString(lengthBytes) + "<<'");
MessageBox.Show("Hello");
This is the code I used for debugging. I now get two message boxes. If I use Debug.WriteLine, the output is also printed twice.
Msgbox 1: '>>/ (Note that this is still 4 characters long; the last 3 bytes are null.)
Msgbox 2: '>>{"ac<<'
Msgbox 3: Hello
I'm trying to send 4 bytes containing an integer, the length of the message. That part works fine ('/' is UTF-8 for 47). The problem is that the first 4 bytes of the message are also being read ('{"ac'). I have no idea how this happens; I've been debugging it for several hours and I just can't get my head around it. One of my friends suggested making an account on Stack Overflow, so here I am :p
Thanks for all the help :)
EDIT: The real code for the people who asked
My code http://kutj.es/2ah-j9

You are making traditional programmer mistakes; everybody has to make them once to learn how to avoid them and do it right. This primarily went off the rails because you wrote debugging code that is itself buggy, which made it a lot harder to find your mistake:
Never write debugging code that uses MessageBox.Show(). It is a very, very evil function: it causes re-entrancy. That's an expensive word that means it only freezes the user interface, it doesn't freeze your program. Your program continues to run, and one of the things that can go wrong is that the code you posted is executed again. Re-entered. You'll see two message boxes. And you'll have a completely corrupted program state, because your code was never written to assume it could be re-entered. Which is why you complained that 4 bytes of data were swallowed.
The proper tool to use here is the feature that really freezes your program: a debugger breakpoint.
Never assume that binary data can be converted to text. Those 4 bytes you received contain binary zeros. There is no character for a zero byte. Worse, it acts as a string terminator for many operating system calls, the kind used by the debugger, Debug.WriteLine(), etc. This is why you can't see the "<<".
The proper tool to use here is a debugger watch or tooltip; it lets you look into the array directly. If you absolutely have to generate a diagnostic string, use BitConverter.ToString().
Never assume that a stream's Read() method will always return the number of bytes you asked for. Using the return value in your code is a hard requirement. This is the real bug in your program, the one you are actually trying to fix.
The proper solution is to keep calling Read() until you have counted down the number of bytes you expect to receive, based on the length you read earlier. You'll need a MemoryStream to store the chunks of byte[]s you get, as in the sketch below.
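For illustration, a minimal read loop along those lines might look like this (a sketch only; ReadExactly is an illustrative name, not part of the original code, and it assumes using System, System.IO):

// Sketch: keep calling Read() until the requested number of bytes has arrived,
// collecting the chunks in a MemoryStream as suggested above.
static byte[] ReadExactly(Stream stream, int count)
{
    using (MemoryStream ms = new MemoryStream())
    {
        byte[] chunk = new byte[4096];
        while (ms.Length < count)
        {
            int wanted = (int)Math.Min(chunk.Length, count - ms.Length);
            int read = stream.Read(chunk, 0, wanted);
            if (read == 0)
                throw new EndOfStreamException("Connection closed before the full message arrived.");
            ms.Write(chunk, 0, read);
        }
        return ms.ToArray();
    }
}

// Possible usage with the stream from the question:
// byte[] lengthBytes = ReadExactly(serverStream, 4);
// int length = BitConverter.ToInt32(lengthBytes, 0);
// byte[] messageBytes = ReadExactly(serverStream, length);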

Perhaps this link regarding Encoding.GetString() will help you out a bit. The part to pay attention to is:
If the data to be converted is available only in sequential blocks
(such as data read from a stream) or if the amount of data is so large
that it needs to be divided into smaller blocks, you should use the
Decoder object returned by the GetDecoder method of a derived class.
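For example, a rough sketch of that Decoder approach (illustrative names; it assumes a stream variable such as networkStream and using System.Text, System.IO) might look like this:

// Sketch: decode UTF-8 text that arrives in arbitrary chunks without
// splitting multi-byte characters across Read() calls.
Decoder decoder = Encoding.UTF8.GetDecoder();
StringBuilder text = new StringBuilder();
byte[] buffer = new byte[4096];
int bytesRead;
while ((bytesRead = networkStream.Read(buffer, 0, buffer.Length)) > 0)
{
    char[] chars = new char[decoder.GetCharCount(buffer, 0, bytesRead)];
    int charCount = decoder.GetChars(buffer, 0, bytesRead, chars, 0);
    text.Append(chars, 0, charCount);
}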

The problem was that I started the getMessage method twice, which started the while loop twice (in different threads).
Elgonzo helped me find the problem, he is a great guy :)

Looking for an efficient way to build and parse string without GC

I am trying to figure out whether there is a more efficient way than what I'm doing now to build up a message coming in on a serial port and validate that it is the right message before I parse it. A complete message starts with a $ and ends with a CR/LF. I use an event handler to get the characters as they show up at the serial port, so the message will not necessarily come in as one complete block. Just to confuse things, there are a bunch of other messages that come in on the serial port that don't necessarily start with a $ or end with a CR/LF. I want to see those but not parse them.
I understand that concatenating strings is probably not a good idea, so I use a StringBuilder to build the message, then I use a couple of .ToString() calls to make sure I've got the right message to parse. Do the .ToString() calls generate much garbage? Is there a better way?
I'm not a particularly experienced programmer, so thanks for the help.
private void SetText(string text)
{
    //This is the original approach
    //this.rtbIncoming.Text += text;

    //First post the raw data to the console rtb
    rtbIncoming.AppendText(text);

    //Now clean up the text and only post messages to the CPFMessages rtb that start with a $ and end with a LF
    incomingMessage.Append(text);

    //Make sure the message starts with a $
    int stxIndex = incomingMessage.ToString().IndexOf('$');
    if (stxIndex == 0)
    { }
    else
    {
        if (stxIndex > 0)
            incomingMessage.Remove(0, stxIndex);
    }

    //If the message is terminated with a LF: 1) post it to the CPFMessage textbox,
    //                                        2) remove it from incomingMessage,
    //                                        3) parse and display fields
    int etxIndex = incomingMessage.ToString().IndexOf('\n');
    if (etxIndex >= 0)
    {
        rtbCPFMessages.AppendText(incomingMessage.ToString(0, etxIndex));
        incomingMessage.Remove(0, etxIndex);
        parseCPFMessage();
    }
}
Do the .ToString calls generate much garbage?
Every time you call ToString(), you get a new String object instance. Whether that's "much garbage" depends on your definition of "much garbage" and what you do with those instances.
Is there a better way?
You can inspect the contents of a StringBuilder directly, but you'll have to write your own methods to do that; a sketch is shown below. You could also use state-machine-based techniques to monitor the stream of data.
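As a rough illustration of the first suggestion (a sketch only, reusing the incomingMessage name from the question; IndexOf here is a hypothetical helper, not a framework method), you could scan the StringBuilder through its indexer instead of calling ToString() just to search:

// Sketch: find a character in a StringBuilder without allocating a string.
private static int IndexOf(StringBuilder sb, char value)
{
    for (int i = 0; i < sb.Length; i++)
    {
        if (sb[i] == value)
            return i;
    }
    return -1;
}

// Possible use inside SetText:
// int stxIndex = IndexOf(incomingMessage, '$');
// int etxIndex = IndexOf(incomingMessage, '\n');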
Whether any of that would be "better" than your current implementation depends on a number of factors, including but not limited to:
Are you seeing a specific performance issue now?
If so, what specific performance goal are you trying to achieve?
What other overhead exists in your code?
The first question above is very important. Your first priority should be code that works. If your code is working now, and does not have a specific performance issue that you know you need to solve, then you can safely ignore the GC issues for now. .NET's GC system is designed to perform well in scenarios just like this one, and usually will. Only in unusual situations would you need to do extra work to solve a performance problem here.
Without a good, minimal, complete code example that clearly illustrates the above and any other relevant issues, it would not be possible to say with any specificity whether there is in fact "a better way". If the above answers don't provide the information you're looking for, consider improving your question so that it is not so broad.

FromBase64 string length must be multiple of 4 or not?

According to my understanding, a base64-encoded string (i.e. the output of the encoder) must always be a multiple of 4 in length.
The C# Convert.FromBase64String documentation says that its input must be a multiple of 4.
However, if I give it a 25-character string it doesn't complain:
[convert]::FromBase64String("ei5gsIELIki+GpnPGyPVBA==")
[convert]::FromBase64String("1ei5gsIELIki+GpnPGyPVBA==")
Both work (the first one is 24 characters, the second is 25), while
[convert]::FromBase64String("11ei5gsIELIki+GpnPGyPVBA==")
fails with an invalid-length exception.
I assume this is a bug in the C# library, but I just want to make sure. I am writing code that sniffs strings to see if they are valid base64 strings, and I want to be sure that I understand what a valid one looks like (one possible implementation was to give the string to System.Convert and see if it threw - why reinvent perfectly good code?).
Yes, this is a flaw (aka bug). It got started as a perf optimization in an internal helper function named FromBase64_ComputeResultLength(), which calculates the length of the byte[] result. It has this comment (edited to fit):
// For legal input, we can assume that 0 <= padding < 3. But it may be
// more for illegal input.
// We will notice it at decode when we see a '=' at the wrong place.
The "we will notice" remark is not entirely accurate, the decoder does flag an '=' if one isn't expected but it fails to check if there's one too many. Which is the case for the 25-char string.
You can report the problem at connect.microsoft.com, I don't see an existing report that resembles it. Do note that it is fairly unlikely that Microsoft can actually fix it any time soon since the change is going to break existing programs that now successfully parse bad base64 strings. It normally requires a major .NET release update to get rid of such problems, like it was done for .NET 4.0, there isn't one on the horizon afaik.
But yes, the simple workaround for you is to check if the string length is divisible by 4, use the % operator.
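A small sniffing helper built around that check could look roughly like this (a sketch with an illustrative name, not a definitive implementation; note that the real decoder ignores whitespace, so strip whitespace first if your input may contain it):

// Sketch: treat a candidate as base64 only if its length is a multiple of 4,
// then let Convert.FromBase64String do the rest of the validation.
static bool TryFromBase64(string s, out byte[] result)
{
    result = null;
    if (string.IsNullOrEmpty(s) || s.Length % 4 != 0)
        return false;
    try
    {
        result = Convert.FromBase64String(s);
        return true;
    }
    catch (FormatException)
    {
        return false;
    }
}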

"Where are my bytes?" or Investigation of file length traits

This is a continuation of my question about downloading files in chunks. The explanation will be quite long, so I'll try to divide it into several parts.
1) What did I try to do?
I was creating a download manager for a Windows Phone application. First, I tried to solve the problem of downloading
large files (the explanation is in the previous question). Now I want to add a "resumable download" feature.
2) What I've already done.
At the current moment I have a well-working download manager that allows me to work around the Windows Phone RAM limit.
The gist of this manager is that it downloads small chunks of the file sequentially, using the HTTP Range header.
A quick explanation of how it works:
The file is downloaded in chunks of constant size. Let's call this size "delta". After a file chunk has been downloaded,
it is saved to local storage (hard disk; on WP it's called Isolated Storage) in append mode (so the downloaded byte array is
always added to the end of the file). After downloading a single chunk, the statement
if (mediaFileLength >= delta) // mediaFileLength is the length of the downloaded chunk
is checked. If it's true, that
means there's something left to download, and the method is invoked recursively. Otherwise it means that this chunk
was the last one and there's nothing left to download.
3) What's the problem?
As long as I used this logic for one-time downloads (by one-time I mean you start downloading a file and wait until the download is finished),
it worked well. However, I decided that I need the "resume download" feature. So, the facts:
3.1) I know that the file chunk size is a constant.
3.2) I know whether the file is completely downloaded or not. (That's an indirect result of my app logic;
I won't weary you with the explanation, just take it as a fact.)
Based on these two statements I can conclude that the number of downloaded chunks is equal to
(CurrentFileLength)/delta, where CurrentFileLength is the size of the already downloaded file in bytes.
To resume downloading the file I should simply set the required headers and invoke the download method. That seems logical, doesn't it? And I tried to implement it:
// Check file size
using (IsolatedStorageFileStream fileStream = isolatedStorageFile.OpenFile("SomewhereInTheIsolatedStorage", FileMode.Open, FileAccess.Read))
{
    int currentFileSize = Convert.ToInt32(fileStream.Length);
    int currentFileChunkIterator = currentFileSize / delta;
}
And what do I see as a result? The downloaded file length is equal to 2432000 bytes (delta is 304160; the total file size is about 4.5 MB; we've downloaded only half of it). So the result is
approximately 7.995 (it actually has long/int type, so it's 7, and it should be 8 instead!). Why is this happening?
Simple math tells us that the file length should be 2433280, so the given value is very close, but not equal.
Further investigation showed that all values given by fileStream.Length are inaccurate, but all are close.
Why is this happening? I don't know precisely, but perhaps the .Length value is taken from file metadata somewhere.
Perhaps such rounding is normal for this method. Perhaps, when the download was interrupted, the file wasn't saved completely... (no, that's really far-fetched, it can't be).
So the problem is set: "How to determine the number of chunks downloaded". The question is how to solve it.
4) My thoughts about solving the problem.
My first thought was to use maths here: set some epsilon-neighborhood and use it in the currentFileChunkIterator = currentFileSize / delta; statement.
But that would require us to keep type I and type II errors in mind (or false alarm and miss, if you don't like the statistics terms) - perhaps there's nothing left to download at all.
Also, I haven't checked whether the difference between the provided value and the true value is supposed to grow steadily
or whether there will be cyclical fluctuations. With small sizes (about 4-5 MB) I've seen only growth, but that doesn't prove anything.
So, I'm asking for help here, as I don't like my solution.
5) What I would like to hear as an answer:
What causes the difference between the real value and the received value?
Is there a way to receive the true value?
If not, is my solution good for this problem?
Are there other, better solutions?
P.S. I won't set a Windows Phone tag, because I'm not sure this problem is OS-related. I used the Isolated Storage Tool
to check the size of the downloaded file, and it showed me the same as the received value (I'm sorry about the Russian language in the screenshot).
I'm answering your update:
This is my understanding so far: the length actually written to the file is more (rounded up to the next 1 KiB) than what you actually wrote to it. This causes your assumption of "file.Length == amount downloaded" to be wrong.
One solution would be to track this information separately. Create some metadata structure (which can be persisted using the same storage mechanism) to accurately track which blocks have been downloaded, as well as the entire size of the file:
[DataContract] //< I forgot how serialization on the phone works, please forgive me if the tags differ
struct Metadata
{
    [DataMember]
    public int Length;

    [DataMember]
    public int NumBlocksDownloaded;
}
This would be enough to reconstruct which blocks have been downloaded and which have not, assuming that you keep downloading them in a consecutive fashion.
edit
Of course, you would have to change your code from a simple append to moving the position of the stream to the correct block before writing the data to the stream:
file.Position = currentBlock * delta;
file.Write(block, 0, block.Length);
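For completeness, persisting and reloading that metadata could look roughly like this (a sketch only; the file name and method names are illustrative, and it assumes DataContractSerializer and isolated storage are available on the phone):

// Sketch: save/load the Metadata struct in isolated storage.
static void SaveMetadata(IsolatedStorageFile store, Metadata meta)
{
    using (IsolatedStorageFileStream stream = store.OpenFile("download.meta", FileMode.Create, FileAccess.Write))
    {
        new DataContractSerializer(typeof(Metadata)).WriteObject(stream, meta);
    }
}

static Metadata LoadMetadata(IsolatedStorageFile store)
{
    using (IsolatedStorageFileStream stream = store.OpenFile("download.meta", FileMode.Open, FileAccess.Read))
    {
        return (Metadata)new DataContractSerializer(typeof(Metadata)).ReadObject(stream);
    }
}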
Just a possible bug to consider: don't forget to verify whether the file was modified between requests, especially when a long time passes between them, which can happen on pause/resume.
The error could be significant - for example, the file being shrunk so that your count goes wrong, or the file staying the same size but with modified contents. Either way you end up with a corrupted file.
Have you heard the anecdote about a noob programmer and ten guru programmers? The gurus were trying to find an error in his solution, while the noob had already found it but didn't tell anyone, because it was something so stupid that he was afraid of being laughed at.
Why did I remember this? Because the situation is similar.
The explanation in my question was already very heavy, and I decided not to mention some small aspects that I was sure worked correctly. (And they really did work correctly.)
One of those small aspects was the fact that the downloaded file was encrypted with AES using PKCS7 padding. Well, the decryption worked correctly, I knew it, so why should I mention it? And I didn't.
So, then I tried to find out what exactly causes the error with the last chunk. The most credible theory was a problem with buffering, and I tried to find where I was losing the missing bytes. I tested again and again, but I couldn't find them, as every chunk was saved without any losses. And one day I understood:
There is no spoon
There is no error.
What's the point of AES with PKCS7 padding? Well, one effect is that it makes the decrypted data smaller than the encrypted data. Not by much, only 16 bytes per encrypted part. And this was accounted for in my decryption method and my download method, so there should be no problem, right?
But what happens when the download process is interrupted? The last chunk saves correctly; there are no errors with buffering or anything else. Then we want to continue the download, and the number of downloaded chunks is computed as currentFileChunkIterator = currentFileSize / delta;
And here I should ask myself: "Why are you trying to do something THAT stupid?"
"The size of one downloaded chunk on disk is not delta. Actually, it's less than delta." (The decryption makes each chunk smaller by 16 bytes per part, remember?)
The delta itself consists of 10 equal parts, each of which is decrypted. So we should divide not by delta, but by (delta - 16 * 10), which is (304160 - 160) = 304000.
I smell a rat here. Let's try to find the number of downloaded chunks:
2432000 / 304000 = 8. Wait... OH SHI~
So, that's the end of the story.
The whole solution logic was right.
The only reason it failed was my assumption that, for some reason, the decrypted file size on disk should be the same as the sum of the downloaded encrypted chunks.
And, of course, since I didn't mention the decryption (it's mentioned only in the previous question, which is only linked), none of you could give me a correct answer. I'm terribly sorry about that.
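For reference, here is the corrected calculation with the numbers from this question (just a restatement of the arithmetic above; the variable names are illustrative):

// Sketch: count downloaded chunks against the decrypted chunk size, not delta.
const int delta = 304160;                 // encrypted chunk size
const int partsPerChunk = 10;             // each chunk is decrypted in 10 parts
const int paddingPerPart = 16;            // bytes removed by PKCS7 padding per part
const int decryptedDelta = delta - partsPerChunk * paddingPerPart; // 304000

int currentFileSize = 2432000;            // size of the already downloaded file
int downloadedChunks = currentFileSize / decryptedDelta;           // = 8, as expected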
Continuing from my comment:
The original file size, as I understand from your description, is 2432000 bytes.
The chunk size is set to 304160 bytes (one "delta").
So the machine that sent the file was able to fill 7 chunks and send them.
The receiving machine now has 7 x 304160 bytes = 2129120 bytes.
The last chunk will not be filled to the end, as there are not enough bytes left to fill it, so it will contain 2432000 - 2129120 = 302880 bytes, which is less than 304160.
If you add the numbers you get 7 x 304160 + 1 x 302880 = 2432000 bytes.
So according to that, the original file was transferred in full to the destination.
The problem is that you are calculating 8 x 304160 = 2433280, insisting that even the last chunk must be filled completely - but with what? And why?
In all humbleness: are you stuck in some kind of math confusion, or did I misunderstand your problem?
Please answer: what is the original file size, and what size is received at the other end? (Totals!)

Problem with C# multiline textbox memory usage

I am using a multiline text box in C# to log some trace information. I simply call AppendText("text-goes-here\r\n") as I need to add lines.
I let this program run for a few days (with a lot of active trace) and I noticed it was using a lot of memory. Long story short: it appears that even with the MaxLength value set to something very small (256), the content of the text box just keeps expanding.
I thought it worked like a FIFO (throwing away the oldest text that exceeds the MaxLength size). It doesn't; it just keeps increasing in size. This is apparently the cause of my memory waste. Does anybody know what I'm doing wrong?
Added a few hours after the initial question...
OK, I tried the suggested code below. To quickly test it, I simply added a timer to my app, and from that timer tick I now call a method that does essentially the same thing as the code below. The tick rate is high so that I can observe the memory usage of the process and quickly determine whether there is a leak. There wasn't. That was good; however, I put this in my application and the memory usage did not change (still leaking). That sure seems to imply that I have a leak somewhere else :-( However, if I simply add a return at the top of that method, the usage drops back to stable. Any thoughts on this? The timer-tick-invoked code did not accumulate memory, but my real code (same method) does. The difference is that I'm calling the method from a variety of different places in the real code. Can the context of the call affect this somehow? (Note, if it isn't already obvious, I'm not a .NET expert by any means.)
A TextBox will allow you to append text regardless of the MaxLength value - it's only used to control user entry. You can create a method that adds new text after verifying that the maximum length has not been exceeded, and if it has, removes some text from the beginning.
You could use a simple function to append text:
int maxLength = 256;

private void AppendText(string text)
{
    textBox1.AppendText(text);
    if (textBox1.Text.Length > maxLength)
        textBox1.Text = textBox1.Text.Substring(textBox1.Text.Length - maxLength);
}
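If you would rather trim whole lines than raw characters, a variation on the code above could look like this (a sketch only, not tested against the original program; maxLines and AppendLine are illustrative names):

int maxLines = 100;

private void AppendLine(string text)
{
    textBox1.AppendText(text + Environment.NewLine);
    string[] lines = textBox1.Lines;
    if (lines.Length > maxLines)
    {
        string[] trimmed = new string[maxLines];
        Array.Copy(lines, lines.Length - maxLines, trimmed, 0, maxLines);
        textBox1.Lines = trimmed;  // keeps only the newest maxLines lines
    }
}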

StackOverflowException - but obviously NO recursion/endless loop

This problem has blocked me for the entire day now. I have read thousands of Google results, but nothing seems to reflect my problem or even come near it... I hope one of you can give me a push in the right direction.
I wrote a client-server application (so more like 2 applications) - the client collects data about its system, as well as a screenshot, serializes all of this into an XML stream (the picture as a byte[] array) and sends it to the server at regular intervals.
The server receives the stream (via TCP), deserializes the XML to an information object and shows the information on a Windows Form.
This process runs stably for about 20-25 minutes at a submission interval of 3 seconds. When observing the memory usage there's nothing significant to see; it is also fairly stable. But after these 20-25 minutes the server throws a StackOverflowException at the point where it deserializes the TCP stream, specifically when setting the Image property from the byte[] array.
I thoroughly searched for recursion or endless loops, and given the fact that the exception occurs after thousands of successful intervals, I can hardly imagine that's the cause.
public byte[] ImageBase
{
    get
    {
        MemoryStream ms = new MemoryStream();
        _screen.Save(ms, System.Drawing.Imaging.ImageFormat.Jpeg);
        return ms.GetBuffer();
    }
    set
    {
        if (_screen != null) _screen.Dispose(); //preventing well-known image memory leak
        MemoryStream ms = new MemoryStream(value);
        try
        {
            _screen = Image.FromStream(ms); //<< EXCEPTION THROWING HERE
        }
        catch (StackOverflowException ex) //thx to new CLR management this wont work anymore -.-
        {
            Console.WriteLine(ex.Message + Environment.NewLine + ex.StackTrace);
        }
        ms.Dispose();
        ms = null;
    }
}
I hope more code is unnecessary, as it would get very complex...
Please help, I have no clue at all anymore.
Thanks,
Chris
I suspect that it's not the code you posted but the code that reads from the TCP stream that's growing the stack. The fact that the straw that breaks the camel's back happens during Image.FromStream is probably irrelevant. I've oftentimes seen people write socket-processing code containing self-calling code (sometimes indirectly, like A -> B -> A -> B). You should inspect that code and post it here for us to look at.
You might want to read this: Loading an image from a stream without keeping the stream open
It seems possible that streams are being maintained on the stack or in some other object that is eventually blowing the stack.
My suggestion would be to just hold onto the byte[] and wait until the last possible moment to decode it and draw it, then dispose of the Image immediately. Your getter/setter would then simply get/set the byte[]. You would then implement a custom drawing routine that decodes the current byte[] and draws it, making sure not to hold onto any more resources than necessary; a sketch is below.
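A rough sketch of that idea (illustrative names such as screenPanel; it assumes a WinForms Paint handler and is not the original code):

// Sketch: keep only the raw bytes; decode and dispose inside the paint handler.
private byte[] _screenBytes;

public byte[] ImageBase
{
    get { return _screenBytes; }
    set { _screenBytes = value; screenPanel.Invalidate(); }
}

private void screenPanel_Paint(object sender, PaintEventArgs e)
{
    if (_screenBytes == null) return;
    using (MemoryStream ms = new MemoryStream(_screenBytes))
    using (Image image = Image.FromStream(ms))
    {
        e.Graphics.DrawImage(image, Point.Empty);
    }
}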
Update
If there is a way you can get us a full stack trace, we might be able to help further. I'm beginning to think the problem isn't what I described. I created a sample program that created 10,000 images just like you do in your setter and there wasn't a problem. If you send an image every 3 seconds, that's 20 images a minute times 20 minutes, which is only 400 images.
I'm very interested in the solution to this. I'll come back to it later.
There are a few possibilities, however remote:
Image.FromStream is trying to process an invalid/corrupt byte[] and that method somehow uses recursion to decode a bitmap. Highly unlikely.
The exception isn't being thrown where you think it is. A full stack trace, if possible, would be very helpful. As you stated, you cannot catch a StackOverflowException. I believe there are provisions for this if you are running it through the debugger, though.
I'm not sure if it's relevant but the MSDN documentation for Image.FromStream states that
You must keep the stream open for the lifetime of the Image.
