Casting char for ANSI characters - C#

I'm migrating a project from VB6 to C#.
In one part of the code, it takes an int value (an ANSI code, I think!) and appends the corresponding char to a string:
StringBuilder s_e = new StringBuilder();
s_e.Append(Microsoft.VisualBasic.Strings.Chr(inNCr).ToString());
I tried:
s_e.Append((char)inNCr);
But this works only for a-z, not for other characters, so the result is not always correct.
I know I can add using Microsoft.VisualBasic; and be done with it, but is there any way to do it "natively" in C#?
Thanks in advance.
[edit] Clarification needed.
inNCr is the character code as an int. That int should become a char and show up as a string in the app.
For example:
Character code 130 should be '‚', but in the app it shows as blank, and in the debugger as \u0082.

Use Convert.ToChar. See here for more details: https://learn.microsoft.com/en-us/dotnet/api/system.convert.tochar?view=netframework-4.8#System_Convert_ToChar_System_Int32_
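One caveat worth checking: Convert.ToChar(int), like the (char) cast, treats the number as a UTF-16 code unit, so 130 still becomes U+0082. VB's Chr interprets codes 128-255 through the system's ANSI code page instead. The sketch below mimics that by decoding the byte through an explicit code page. It assumes the data is Windows-1252 (where 0x82 is '‚', U+201A); the class and method names are hypothetical. The provider registration is needed on .NET Core and .NET 5+; on .NET Framework, code page 1252 is available out of the box and that line can be removed.

```csharp
using System;
using System.Text;

// Hypothetical helper that mimics VB's Chr() for single-byte codes.
// Assumption: the legacy data uses Windows-1252; VB's Chr actually uses
// the machine's default ANSI code page, which may be different.
public static class AnsiChr
{
    static AnsiChr()
    {
        // .NET Core / .NET 5+: make legacy code pages such as 1252 available.
        // On .NET Framework this line can be removed.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    public static char FromCode(int code, int codePage = 1252)
    {
        if (code < 0 || code > 255)
            throw new ArgumentOutOfRangeException(nameof(code), "Expected a single-byte code (0-255).");

        // Decode the byte through the code page instead of casting it,
        // so 130 (0x82) becomes U+201A rather than the control char U+0082.
        Encoding ansi = Encoding.GetEncoding(codePage);
        return ansi.GetString(new[] { (byte)code })[0];
    }
}
```

In the original loop this would be used as s_e.Append(AnsiChr.FromCode(inNCr));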

Related

String comparison returns False for same strings [duplicate]

I am parsing emails using a regex in a C# VSTO project. Once in a while, the regex does not seem to work (although if I paste the text and regex into RegexBuddy, the regex correctly matches the text). If I look at the email in Gmail, I see
=E2=80=8B
at the beginning and end of some lines (which I understand is the UTF-8 encoding of the zero width space); this appears to be what is messing up the regex. This seems to be the only sequence showing up.
What is the easiest way to get rid of this exact sequence? I cannot do the obvious
MailItem.Body.Replace("=E2=80=8B", "")
because those characters don't show up in the C# string.
I also tried
byte[] bytes = Encoding.Default.GetBytes(MailItem.TextBody);
string myString = Encoding.UTF8.GetString(bytes);
But the zero-width spaces just show up as ?. I suppose I could go through the byte array and remove the bytes comprising the zero width space, but I don't know what the bytes would look like (it does not seem as simple as converting E2 80 8B to decimal and searching for that).
As strings in C# are stored in Unicode (UTF-16, not UTF-8), the following might do the trick. Note that Replace returns a new string, so the result has to be assigned:
string body = MailItem.Body.Replace("\u200B", "");
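To see why \u200B is the right character to look for: =E2=80=8B is quoted-printable notation for the three bytes E2 80 8B, and those bytes are the UTF-8 encoding of the single UTF-16 char U+200B (ZERO WIDTH SPACE). Once Outlook has decoded the body, only that one char remains in the string. A small sketch that checks this and strips the character (the class and member names are hypothetical):

```csharp
using System;
using System.Text;

// Hypothetical helper showing how the "=E2=80=8B" bytes relate to U+200B.
public static class ZeroWidthSpace
{
    // Decoding the UTF-8 bytes E2 80 8B yields a single char: U+200B.
    public static readonly string FromUtf8Bytes =
        Encoding.UTF8.GetString(new byte[] { 0xE2, 0x80, 0x8B });

    // String.Replace(string, string) is an ordinal replacement and returns
    // a new string; the input is left unchanged.
    public static string Strip(string text) => text.Replace("\u200B", string.Empty);
}
```

The cleaned text, e.g. ZeroWidthSpace.Strip(MailItem.Body), can then be passed to the regex.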
As all the Regex.Replace() methods operate on strings, that's not going to be useful here.
The string indexer returns a char, so for want of a better solution (and if you can't predict where these characters are going to be), as long-winded as it seems, you may be best off with:
string body = MailItem.Body;
StringBuilder newText = new StringBuilder(body.Length);
for (int i = 0; i < body.Length; i++)
{
    // Copy every character except ZERO WIDTH SPACE (U+200B)
    if (body[i] != '\u200b')
    {
        newText.Append(body[i]);
    }
}
Use System.Web.HttpUtility.HtmlDecode(string);
Quite simple.

Base64EncodedString does not include NewLines

I'm using a .NET Core 3.0 project on Windows 10. I'm trying to encode a string to Base64 with the code below:
var stringvalue = "Row1" + Environment.NewLine + "\n\n" + "Row2";
var encodedString = Convert.ToBase64String(Encoding.UTF8.GetBytes(stringvalue));
encodedString then has the following result:
Um93MQ0KCgpSb3cy
stringvalue is:
Row1\r\n\n\nRow2
However, if I pass the same value to this site (https://www.base64encode.org/), I get a different result:
Um93MVxyXG5cblxuUm93Mg==
In Visual Studio, I tried re-saving the file with Unix line endings, but without any luck.
I want the string to be encoded the way it's done in https://www.base64encode.org. Any ideas how to get this done?
From the screenshot, I can see that you have entered a different string from the string you used in your C# code. The string you used in https://www.base64encode.org is represented as a C# string literal like this:
"Row1\\r\\n\\n\\nRow2"
// or
@"Row1\r\n\n\nRow2"
So to answer your question:
I want the string to be encoded the way it's done in https://www.base64encode.org. Any ideas how to get this done?
You should do:
var encodedString = Convert.ToBase64String(Encoding.UTF8.GetBytes("Row1\\r\\n\\n\\nRow2"));
But that's probably not what you actually want. Your first attempt at the C# code is more likely to be desired, because that is actually a carriage return character, followed by 3 new line characters. The string you entered in https://www.base64encode.org is simply the backslash character followed by the letter r (or n).
You can't really make the output on https://www.base64encode.org match the C# output, because you can only choose one kind of line separator there. You can only encode either Row1\r\n\r\n\r\nRow2 or Row1\n\n\nRow2. Nevertheless, you can check that the C# result is correct by decoding the output using https://www.base64decode.org.
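A quick way to see the difference is to encode both strings in C#: the regular literal contains real CR/LF characters, while the verbatim literal contains backslashes and letters, which is what was typed into the website. The class and member names are hypothetical; the two expected values are the outputs quoted in the question.

```csharp
using System;
using System.Text;

public static class Base64Compare
{
    // Real control characters: CR, LF, LF, LF (what the C# code produced).
    public const string WithControlChars = "Row1\r\n\n\nRow2";

    // Literal backslash sequences (what was typed into the website).
    public const string WithBackslashes = @"Row1\r\n\n\nRow2";

    public static string EncodeUtf8(string s) =>
        Convert.ToBase64String(Encoding.UTF8.GetBytes(s));
}
```

EncodeUtf8(WithControlChars) gives Um93MQ0KCgpSb3cy and EncodeUtf8(WithBackslashes) gives Um93MVxyXG5cblxuUm93Mg==, matching the two results in the question.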
The \r\n will be encoded literally on the website: it is not a newline but 4 characters. There is a newline-separator checkbox that lets you choose the Windows style, which converts your real-world input value:
Row1
Row2.
I guess your \r\n\n\n is just a mistake; the website can only convert it to \r\n\r\n.
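If the goal is to reproduce the site's Windows-style (CRLF) newline option from C#, one approach is to normalize every line break to \r\n before encoding. This is a sketch under that assumption (the class and method names are hypothetical); it turns the question's "Row1\r\n\n\nRow2" into "Row1\r\n\r\n\r\nRow2".

```csharp
using System;
using System.Text;

public static class LineEndings
{
    // Normalize CRLF, lone CR and lone LF to CRLF:
    // first collapse everything to LF, then expand each LF to CRLF.
    public static string ToCrLf(string s) =>
        s.Replace("\r\n", "\n").Replace('\r', '\n').Replace("\n", "\r\n");

    public static string EncodeWithCrLf(string s) =>
        Convert.ToBase64String(Encoding.UTF8.GetBytes(ToCrLf(s)));
}
```

Collapsing to LF first avoids turning an existing \r\n into \r\r\n.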

When converting from Unicode to an ANSI code page, we get ? as the first character

When converting from Unicode to an ANSI code page, we get ? as the first character. Please help.
byte[] ansibyte = System.Text.Encoding.Convert(System.Text.Encoding.UTF8, System.Text.Encoding.GetEncoding(865), unibytes);
It works fine, but it always displays the first character as ?.
OK, this isn't really an answer, but it's too difficult to write code in comments.
using System.Text;

class StackOverflow
{
    byte[] unibytes; // To be replaced by your data

    public void JustTesting()
    {
        string s;
        // Single-step these under the debugger; examine s after each attempt to see which decoding works
        s = Encoding.Unicode.GetString(unibytes);
        s = Encoding.UTF8.GetString(unibytes);
        s = Encoding.BigEndianUnicode.GetString(unibytes);
        // Once you have the correct decoding, re-encode to code page 865
        byte[] asciiLikeByteArray = Encoding.GetEncoding(865).GetBytes(s);
    }
}
What I meant by doing it in two stages is to first convert the Unicode byte array into a C# string, and examine the string. That's where the problem is, probably. Then, when the C# string is OK, convert that to the new byte array - it's unlikely that the problem is in that stage.
In the above code I suggest three possible conversions from "Unicode" to C# string. Unicode can exist in several different variations.
If none of those three possibilities work then it is probable that your byte array is not pure Unicode after all. Maybe it has a one-byte length prefix or something. You'll have to analyze that situation.
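Another possibility worth checking, given that only the first character is affected: if the input starts with a UTF-8 byte-order mark (EF BB BF), Encoding.UTF8.GetString keeps it as U+FEFF, and code page 865 has no character for it, so the encoder's fallback would typically emit ?. A sketch of the two-stage approach that drops a leading BOM before re-encoding, assuming the input really is UTF-8 (the class and method names are hypothetical; the provider registration is needed on .NET Core and .NET 5+):

```csharp
using System;
using System.Text;

public static class Cp865Converter
{
    static Cp865Converter()
    {
        // .NET Core / .NET 5+: make code page 865 available.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Stage 1: decode UTF-8 bytes, skipping a leading byte-order mark if present.
    public static string DecodeUtf8WithoutBom(byte[] utf8Bytes)
    {
        byte[] bom = Encoding.UTF8.GetPreamble(); // EF BB BF
        bool hasBom = utf8Bytes.Length >= bom.Length;
        for (int i = 0; hasBom && i < bom.Length; i++)
            hasBom = utf8Bytes[i] == bom[i];

        int offset = hasBom ? bom.Length : 0;
        return Encoding.UTF8.GetString(utf8Bytes, offset, utf8Bytes.Length - offset);
    }

    // Stage 2: re-encode the clean string to code page 865.
    public static byte[] ToCodePage865(byte[] utf8Bytes) =>
        Encoding.GetEncoding(865).GetBytes(DecodeUtf8WithoutBom(utf8Bytes));
}
```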

C# string length limit

I am trying to understand why VS loses all keyword highlighting after this line of code. The code will compile, but it will throw exceptions.
The string is a Base64 string representation of an image. If you remove the first letter, VS recognizes that the characters are a valid string, and the code compiles and runs without exceptions. Interestingly enough, the string is 32,742 characters long; at 32,743 characters it's a no-go. I assume it is related to either:
a limit on how you can initialize a string, or
a need to use a different data type, like char.
Does anyone have an idea? I just stumbled upon this and now I am curious.
Bob
string g = "Any string greater than 32,742 characters suddenly disables all keyword highlighting and code will fail......";
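One experiment that may help narrow this down: System.String itself has no limit anywhere near 32,742 characters, so a Base64 string of that size built at run time should work fine. The sketch below (hypothetical names, random test data) creates one from 30,000 bytes, which yields a 40,000-character Base64 string, and the test decodes it again. If this runs, the problem most likely lies in how the long literal is handled in the source file or editor rather than in the string type, and loading the Base64 text at run time (for example from a file or resource) would likely sidestep it.

```csharp
using System;

public static class LongBase64Check
{
    // Produces a Base64 string of 4 * ceil(byteCount / 3) characters at run time.
    public static string MakeBase64(int byteCount)
    {
        byte[] data = new byte[byteCount];
        new Random(1).NextBytes(data); // arbitrary test payload
        return Convert.ToBase64String(data);
    }
}
```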

Mysteriously added quotation mark inside a string

I copied and pasted some source code into my program using a text editor. I need to confirm that the source code begins with "int main()", so I compared line with "int main()", but the comparison always returned false.
I decided to split the string into characters and found something weird.
So the string line holds "int main()", which is the text that was pasted into the text editor. You would think a and b would have the same characters, but they don't.
I'm honestly not sure where that quotation mark at the beginning is coming from. The original string didn't contain it, and the debugger doesn't show it (it would display "\"int main()\"" otherwise). What is happening here?
Edit: I tried line = line.Trim(), but the character is still there. Apparently it's a special Unicode character, ZERO WIDTH NO-BREAK SPACE. How can I remove it from my string?
65279 looks like the decimal representation of a UTF-16 BOM (U+FEFF), is it possible that the way you're reading the data into "line" would've failed to remove it?
Could you set line to line.Trim(); It's hard to tell what might be going on without seeing how line is set.
Update based on the BOM character: try line = line.Trim(new char[] { '\uFEFF' }); (assuming .NET 4).
I've found the solution:
private readonly string BYTE_ORDER_MARK_UTF8 = Encoding.UTF8.GetString(Encoding.UTF8.GetPreamble());
...
// Ordinal comparison: a culture-sensitive StartsWith can treat U+FEFF as ignorable
if (line.StartsWith(BYTE_ORDER_MARK_UTF8, StringComparison.Ordinal))
    line = line.Remove(0, BYTE_ORDER_MARK_UTF8.Length);
That was bizarre...
In that code you have posted, it seems like the line variable begins with a space character. Try line = line.Trim();
Edit:
The reason the string.Trim() method is not working as expected can be found on MSDN:
Starting with the .NET Framework 4, the method trims all Unicode white-space characters (that is, characters that produce a true return value when they are passed to the Char.IsWhiteSpace method). Because of this change, the Trim method in the .NET Framework 3.5 SP1 and earlier versions removes two characters, ZERO WIDTH SPACE (U+200B) and ZERO WIDTH NO-BREAK SPACE (U+FEFF), that the Trim method in the .NET Framework 4 and later versions does not remove.
(U+FEFF) seems to be the character at the beginning of line, hence why Trim isn't dealing with it.
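To confirm the diagnosis: char.IsWhiteSpace('\uFEFF') returns false, which is why Trim() on .NET Framework 4 and later leaves the character in place, while TrimStart with the character passed explicitly removes it. A minimal sketch (the class and member names are hypothetical):

```csharp
using System;

public static class BomHelper
{
    public const char ByteOrderMark = '\uFEFF';

    // Removes any leading U+FEFF characters. TrimStart compares chars
    // ordinally, so culture-sensitive comparison rules do not apply.
    public static string StripLeadingBom(string line) => line.TrimStart(ByteOrderMark);
}
```

Usage: line = BomHelper.StripLeadingBom(line); before comparing it with "int main()".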
