I am trying to understand why Visual Studio loses all keyword highlighting after this line of code. The code will compile, but it throws exceptions.
The string is a Base64 representation of an image. If you remove the first letter, VS recognizes the characters as a valid string, the code compiles, and there are no exceptions. Interestingly enough, the string is 32,742 characters long; add one more, to 32,743, and it's a no-go. I assume it is related to either:
a limit on how you can initialize a string, or
a need to use a different data type, like char.
Anyone have an idea? I just stumbled upon this and now I am curious.
Bob
string g = "Any string greater that 32,742 characters suddenly disables all keyword highlighting and code will fail......";
Related
I'm migrating a project from VB6 to C#.
In one part of the code it takes an int value in ANSI (I think!) and converts the corresponding char value to a string:
StringBuilder s_e = new StringBuilder();
s_e.Append(Microsoft.VisualBasic.Strings.Chr(inNCr).ToString());
Tried with
s_e.Append((char)inNCr);
But that only works for a-z, not for other characters, so the result is not always correct.
I know I can add using Microsoft.VisualBasic; and be done with it, but is there any way to do it "natively" in C#?
Thanks in advance.
[edit] Clarification:
inNCr is the character code of the character as an int. That int should become a char and show up as a string in the app.
For example:
Character code 130 should be '‚', but in the app it shows up as blank and in the debugger as \u0082.
Use Convert.ToChar. See here for more details: https://learn.microsoft.com/en-us/dotnet/api/system.convert.tochar?view=netframework-4.8#System_Convert_ToChar_System_Int32_
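A minimal sketch of that call. The Encoding-based line is an assumption added for the ANSI case raised in the clarification (code 130 maps to '‚' only under a Windows ANSI code page such as 1252); it is not part of the original answer.
using System;
using System.Text;

class ChrDemo
{
    static void Main()
    {
        int inNCr = 130;

        // Convert.ToChar treats the int as a Unicode code point,
        // so 130 becomes the control character U+0082.
        char c = Convert.ToChar(inNCr);
        Console.WriteLine((int)c); // 130

        // Assumption: if the VB6 code relied on the Windows-1252 ANSI code page,
        // mapping the value through that encoding yields '‚' (U+201A).
        // On .NET Core/5+ this requires registering CodePagesEncodingProvider first.
        char ansi = Encoding.GetEncoding(1252).GetChars(new[] { (byte)inNCr })[0];
        Console.WriteLine(ansi); // ‚
    }
}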
I copied and pasted some source code into my program with a text editor. I basically need to confirm that the source code begins with "int main()", so I went ahead and compared line with "int main()", but the comparison always returned false.
I decided to strip the string into characters and found something weird.
So string line has "int main()" passed into it, which is the text that was pasted into the text editor. You would think a and b would have the same characters, but they don't:
I'm honestly not sure where that quotation mark at the beginning is coming from. The original string didn't contain it, and the debugger doesn't show it (it would display "\"int main()\"" otherwise). What is happening here?
Edit: I tried line = line.Trim(), but that character is still there. Apparently it's a special Unicode character, ZERO WIDTH NO-BREAK SPACE. How can I remove it from my string?
65279 looks like the decimal representation of a UTF-16 BOM (U+FEFF). Is it possible that the way you're reading the data into "line" failed to remove it?
Could you set line to line.Trim(); It's hard to tell what might be going on without seeing how line is set.
update based on the BOM character: try line.Trim(new char[]{'\uFEFF'}); assuming .NET 4
I've found the solution:
private readonly string BYTE_ORDER_MARK_UTF8 = Encoding.UTF8.GetString(Encoding.UTF8.GetPreamble());
...
if (line.StartsWith(BYTE_ORDER_MARK_UTF8))
line = line.Remove(0, BYTE_ORDER_MARK_UTF8.Length);
That was bizarre...
In that code you have posted, it seems like the line variable begins with a space character. Try line = line.Trim();
Edit:
The reason the string.Trim() method is not working as expected can be found on MSDN:
Starting with the .NET Framework 4, the method trims all Unicode white-space characters (that is, characters that produce a true return value when they are passed to the Char.IsWhiteSpace method). Because of this change, the Trim method in the .NET Framework 3.5 SP1 and earlier versions removes two characters, ZERO WIDTH SPACE (U+200B) and ZERO WIDTH NO-BREAK SPACE (U+FEFF), that the Trim method in the .NET Framework 4 and later versions does not remove.
U+FEFF seems to be the character at the beginning of line, which is why Trim isn't dealing with it.
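A minimal sketch of both approaches, assuming the text ultimately comes from a UTF-8 file (the file name is hypothetical): let StreamReader strip the BOM while reading, or trim U+FEFF explicitly if it has already leaked into the string.
using System;
using System.IO;
using System.Text;

class BomDemo
{
    static void Main()
    {
        // Option 1: let the reader detect and strip the BOM while reading.
        string line;
        using (var reader = new StreamReader("pasted.txt", Encoding.UTF8,
                                             detectEncodingFromByteOrderMarks: true))
        {
            line = reader.ReadLine() ?? string.Empty;
        }

        // Option 2: if the BOM is already in the string, trim it off explicitly.
        line = line.TrimStart('\uFEFF');

        Console.WriteLine(line.StartsWith("int main()")); // True once the BOM is gone
    }
}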
In a console application I receive as input the RTF (Rich Text Format) code of a file. The source is a database, and the data is gathered via a query.
My goal is to check whether the input code, as a string, contains the keyword \par (end of paragraph in RTF).
I tried string.IndexOf and string.Contains, but both return bad results since they also match codes like "\pard".
Given a string like:
{\rtf1\ansi\deff0{\fonttbl{\f0\fnil\fcharset0 Times New Roman;}
{\f1\fnil\fcharset0 MS Sans Serif;}
\deflang1033\pard\plain\tx0\f2\lang1033\fs20\cf1 Payment}
How can I build my condition so that it returns false, since the string does not contain \par? Alternatively, how could I write a regex so that exactly the keyword "\par" (4 characters) and nothing else will match? Thanks.
EDIT: The language used is C# and I am developing the console application with VS 2010.
You don't tell us the language you are using, but generally you need a word boundary, something like this:
\\par\b
to ensure that there is no word character following it.
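In C# (the language the edit names), a minimal sketch of that pattern; the variable names are illustrative:
using System;
using System.Text.RegularExpressions;

class ParCheck
{
    static void Main()
    {
        string rtf = @"{\rtf1\ansi\deff0{\fonttbl{\f0\fnil\fcharset0 Times New Roman;}
{\f1\fnil\fcharset0 MS Sans Serif;}
\deflang1033\pard\plain\tx0\f2\lang1033\fs20\cf1 Payment}";

        // \\par matches the literal \par; \b then requires that no word
        // character follows, so \pard does not match.
        bool hasPar = Regex.IsMatch(rtf, @"\\par\b");

        Console.WriteLine(hasPar); // False for the sample above
    }
}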
I'm working on a function where, given some settings such as line spacing, the output (in string form) is modified. In order to test such scenarios, I'm using string literals, as shown below for the expected result.
The method generates that output using a StringBuilder (AppendLine). One issue I have run into is comparing such strings. In the example below, both are equal in terms of what they represent, and the result is what I care about. However, when comparing the two strings, one a verbatim literal and one not, equality naturally fails. This is because one of the strings emits the line breaks, while the other only shows the formatting it contains.
What would be the best way of solving this equality problem? I do care about formatting such as newlines in the result of the method; this is crucially important.
Code:
string expected = #"Test\n\n\nEnd Test.";
string result = "Test\n\n\nEnd Test";
Console.WriteLine(expected);
Console.WriteLine(result);
Output:
Test\n\n\nEnd Test.
Test
End Test
The @ prefix tells the compiler to take the string exactly as it is written, so it doesn't convert the \n escape sequences into actual line breaks.
Since you don't have the same prefix on the string assigned to your result variable, the compiler does process it. If you would like to continue to use the @ prefix, just do the following:
string expected = #"Test
End Test";
You'll have to input the carriage returns and line feed within the string as invisible characters.
You're using the term "literal" incorrectly. "Literal" simply means an actual value that exists in code. In other words, values exist in code either as variables (for the sake of simplicity I'm including constants in this group) or as literals. Variables are an abstract notion of a value, whereas literals are a value.
All this is to say that both of your strings are string literals, as they're hard-coded into your application. The @ prefix simply states that the compiler is to include escape characters (indeed, anything other than a double quote) in the string, rather than evaluating the escape sequences when compiling the string literal into the assembly.
First of all, whatever your function returns (either a string that contains standard escape sequences for newlines rather than newlines themselves, or a string that actually contains newlines) is what your test variable should contain. Make your tests as close to the actual output as possible; the more work you do to massage the values into a comparable form, the more code paths you have to test. If you're looking to compare a string with formatting escape sequences embedded in it to a string where those sequences have been evaluated (essentially comparing the two strings in your example), then I would say this:
1. Be sure that this is really what you want to do.
2. You'll have to duplicate the functionality of the C# compiler in interpreting these values, turning your "format string" into a "formatted string".
For #2, a regex processor is probably going to be the simplest option (a sketch follows below). See this page for a list of C# string escape sequences.
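A minimal sketch of that idea using Regex.Unescape, which interprets the common escape sequences such as \n and \t; this is an illustration of the approach, not the original poster's code:
using System;
using System.Text.RegularExpressions;

class UnescapeDemo
{
    static void Main()
    {
        // "Format string": the escape sequences are stored as literal characters.
        string expected = @"Test\n\n\nEnd Test";

        // "Formatted string": the escape sequences have been evaluated.
        string result = "Test\n\n\nEnd Test";

        // Regex.Unescape turns \n, \t, etc. into the actual characters,
        // so the two forms can be compared directly.
        Console.WriteLine(Regex.Unescape(expected) == result); // True
    }
}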
I feel somewhat enlightened, yet annoyed at what I discovered.
This is my first project using MSTest, and after a failing test I was selecting View Test Details to see how and why my test failed. The formatting for string output in this details display is very poor, for example you get:
Assert.AreEqual failed. Expected:<TestTest End>. Actual:<TestTest End>.
This is for formatted text. The strange thing is that if you have \r (carriage returns) instead of \n (line feeds), the formatting is actually somewhat correct.
It turns out that to view the correct output you need to run the tests in debug mode. In other words, when you have a failing test, run the test in debug and the exception will be caught and displayed as follows:
Assert.AreEqual failed. Expected:<Test
Test End>. Actual:<Test
Test End>.
The above obviously containing the correct formatting.
In the end it turns out my initial method of storing the expectations (with formatting) in strings was correct; my unfamiliarity with MSTest made me question my approach, because valid input was simply being displayed back to me in what only appeared to be identical output.
Use a regex to strip white space before you do your compare?
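A minimal sketch of that suggestion (an illustration, not the original test code): normalize both strings by removing whitespace before comparing, which ignores newline differences at the cost of ignoring all other whitespace too.
using System;
using System.Text.RegularExpressions;

class WhitespaceCompare
{
    static string Normalize(string s) => Regex.Replace(s, @"\s+", "");

    static void Main()
    {
        string expected = "Test\n\n\nEnd Test";
        string actual = "Test\r\n\r\n\r\nEnd Test";

        // Both collapse to "TestEndTest", so the comparison passes
        // regardless of which newline style was emitted.
        Console.WriteLine(Normalize(expected) == Normalize(actual)); // True
    }
}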
I keep getting a Base64 invalid character error even though I shouldn't.
The program takes an XML file and exports it to a document. If the user wants, it will compress the file as well. The compression works fine and returns a Base64 String which is encoded into UTF-8 and written to a file.
When it's time to reload the document into the program I have to check whether it's compressed or not; the code is simply:
byte[] gzBuffer = System.Convert.FromBase64String(text);
return "1F-8B-08" == BitConverter.ToString(new List<Byte>(gzBuffer).GetRange(4, 3).ToArray());
It checks the beginning of the decoded data to see if it has the GZip signature in it.
Now the thing is, all my tests work. I take a string, compress it, decompress it, and compare it to the original. The problem is when I get the string returned from an ADO Recordset. The string is exactly what was written to the file (with the addition of a "\0" at the end, but I don't think that even does anything; even with it trimmed off it still throws). I even copied and pasted the entire string into a test method and compressed/decompressed it. That works fine.
The tests pass but the code fails using the exact same string? The only difference is that instead of just declaring a regular string and passing it in, I'm getting one returned from a recordset.
Any ideas on what am I doing wrong?
You say
The string is exactly what was written to the file (with the addition of a "\0" at the end, but I don't think that even does anything).
In fact, it does do something (it causes your code to throw a FormatException: "Invalid character in a Base-64 string"), because Convert.FromBase64String does not consider "\0" to be a valid Base64 character.
byte[] data1 = Convert.FromBase64String("AAAA\0"); // Throws exception
byte[] data2 = Convert.FromBase64String("AAAA"); // Works
Solution: Get rid of the zero termination. (Maybe call .Trim('\0').)
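A minimal sketch of that fix applied before decoding (the sample value is illustrative):
using System;

class TrimNullDemo
{
    static void Main()
    {
        // Simulate the value coming back from the recordset: valid Base64
        // followed by a trailing null terminator.
        string text = "H4sIAAAAAAAA\0";

        // '\0' is not a valid Base64 character, so trim it off before decoding.
        byte[] gzBuffer = Convert.FromBase64String(text.Trim('\0'));

        Console.WriteLine(gzBuffer.Length); // 9
    }
}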
Notes:
The MSDN docs for Convert.FromBase64String say it will throw a FormatException when
The length of s, ignoring white space characters, is not zero or a multiple of 4.
-or-
The format of s is invalid. s contains a non-base 64 character, more than two padding characters, or a non-white space character among the padding characters.
and that
The base 64 digits in ascending order from zero are the uppercase characters 'A' to 'Z', lowercase characters 'a' to 'z', numerals '0' to '9', and the symbols '+' and '/'.
Whether the null char is allowed or not really depends on the Base64 codec in question.
Given the vagueness of the Base64 standard (there is no single authoritative specification), many implementations simply ignore it as white space, others flag it as a problem, and the buggiest ones don't notice and happily try to decode it... :-/
But it sounds like the C# implementation does not like it (which is one valid approach), so if removing it helps, that should be done.
One minor additional comment: UTF-8 is not a requirement; ISO-8859-x (aka Latin-x) and 7-bit ASCII would work as well. This is because Base64 was specifically designed to use only a 7-bit subset that works with all 7-bit ASCII-compatible encodings.
string stringToDecrypt = HttpContext.Current.Request.QueryString.ToString();
//change to
string stringToDecrypt = HttpUtility.UrlDecode(HttpContext.Current.Request.QueryString.ToString());
If removing \0 from the end of the string is impossible, you can append your own character to each string you encode and remove it on decode.
One gotcha with converting Base64 from a string is that some conversion functions expect the leading "data:image/jpg;base64," prefix, while others only accept the actual data.
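Convert.FromBase64String is in the second group, so the prefix has to be stripped first. A minimal sketch (the sample data is illustrative, not a real image):
using System;

class DataUriDemo
{
    static void Main()
    {
        // A data URI as it might arrive from a web page.
        string dataUri = "data:image/jpg;base64,iVBORw0KGgo=";

        // Convert.FromBase64String only accepts the raw Base64 payload,
        // so drop everything up to and including the comma.
        int comma = dataUri.IndexOf(',');
        string payload = comma >= 0 ? dataUri.Substring(comma + 1) : dataUri;

        byte[] bytes = Convert.FromBase64String(payload);
        Console.WriteLine(bytes.Length); // 8
    }
}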