I'm using a .NET Core 3.0 project on Windows 10. I'm trying to encode a string to Base64 with the code below:
var stringvalue = "Row1" + Environment.NewLine + "\n\n" + "Row2";
var encodedString = Convert.ToBase64String(Encoding.UTF8.GetBytes(stringvalue));
encodedString then has the following result:
Um93MQ0KCgpSb3cy
stringvalue is:
Row1\r\n\n\nRow2
However, if I pass the same value to this site (https://www.base64encode.org/), I get a different result:
Um93MVxyXG5cblxuUm93Mg==
In Visual Studio, I tried to resave the file with Unix line endings, but without any luck.
I want the string to be encoded the same way it's done on https://www.base64encode.org. Any ideas how to get this done?
From the screenshot, I can see that you have entered a different string from the string you used in your C# code. The string you used in https://www.base64encode.org is represented as a C# string literal like this:
"Row1\\r\\n\n\\nRow2"
// or
#"Row1\r\n\n\nRow2"
So to answer your question:
I want the string to be encoded the same way it's done on https://www.base64encode.org. Any ideas how to get this done?
You should do:
var encodedString = Convert.ToBase64String(Encoding.UTF8.GetBytes("Row1\\r\\n\\n\\nRow2"));
But that's probably not what you actually want. Your first attempt in C# is more likely what you want, because it contains an actual carriage return character followed by three newline characters. The string you entered on https://www.base64encode.org simply contains the backslash character followed by the letter r (or n).
You can't really make the output of https://www.base64encode.org match the C# output, because you can only choose one kind of line separator there: you can encode either Row1\r\n\r\n\r\nRow2 or Row1\n\n\nRow2. Nevertheless, you can check that the C# result is correct by decoding the output using https://www.base64decode.org.
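If you want to check it from C# instead, here is a minimal sketch (assuming a small console app) that decodes that Base64 string and inspects the characters:
using System;
using System.Text;
// Decode the C# result and show that it contains real control characters,
// not the two-character text sequences "\r" and "\n"
string decoded = Encoding.UTF8.GetString(Convert.FromBase64String("Um93MQ0KCgpSb3cy"));
Console.WriteLine(decoded.Contains("\r"));  // True  -- an actual carriage return (0x0D)
Console.WriteLine(decoded.Contains("\\r")); // False -- no backslash followed by the letter r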
On the website, the \r\n gets encoded as typed; it is not a newline, it is four characters. There is a newline-separator checkbox to say you want the Windows style, which converts a real-world input value like:
Row1
Row2
I guess your \r\n\n\n is just a mistake; the website is only prepared to convert it to \r\n\r\n.
The following unicode string from a text file encodes a single apostrophe using 3 bytes:
It\u00e2\u0080\u0099s working
This should decode to:
It’s working
How can I decode this string in C#?
For example, when I try the following code:
string test = @"It\u00e2\u0080\u0099s working";
string test2 = System.Text.RegularExpressions.Regex.Unescape(test);
it incorrectly decodes the first byte only:
Itâ\u0080\u0099s working
This is UTF-8. Try UTF-8 encoding:
using System.Text;

string test = "It\u00e2\u0080\u0099s working";
// Reinterpret each character as a single byte using Latin-1 (code page 28591, ISO-8859-1)...
byte[] bytes = Encoding.GetEncoding(28591).GetBytes(test);
// ...then decode those bytes as UTF-8
var converted = Encoding.UTF8.GetString(bytes); // It’s working
Try this to parse the file:
using System.Globalization;
using System.Text.RegularExpressions;

// Matches \u followed by exactly four hex digits
private static readonly Regex _regex = new Regex(@"\\u(?<Value>[0-9a-fA-F]{4})", RegexOptions.Compiled);

public string DecodeString(string value)
{
    return _regex.Replace(
        value,
        m => ((char)int.Parse(m.Groups["Value"].Value, NumberStyles.HexNumber)).ToString()
    );
}
That is JavaScript Unicode escaping. Use a C# JavaScript (JSON) deserializer to convert it.
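For example, a minimal sketch using System.Text.Json (my choice here; any JSON deserializer works, and a Newtonsoft.Json version appears in a later answer) that lets the parser interpret the \uXXXX escapes:
using System;
using System.Text.Json;
// The escapes arrive as literal text here (backslash, 'u', four hex digits)
string escaped = @"It\u00e2\u0080\u0099s working";
// Wrap it in quotes so it parses as a JSON string value; the deserializer
// turns each \uXXXX into the corresponding character
string unescaped = JsonSerializer.Deserialize<string>("\"" + escaped + "\"");
// Note: this yields 'â' plus two control characters, not the apostrophe --
// see the UTF-8 re-encoding answer above for the remaining step
Console.WriteLine(unescaped);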
(I don't have enough reputation to comment, so I will write here)
Where did you get those characters from in the first place?
\uXXXX is an encoding used by JavaScript and C# (I didn't know about C# until now) to encode 16-bit Unicode characters in string literals. 16 bits means 4 hex characters, so \uXXXX, each X representing one hexadecimal digit.
Note this is used to encode string literals in source code! It is not used to encode the bytes stored in files or memory or whatnot. It is an older style of escaping, because modern source code editors usually support UTF-8, UTF-16, or some other encoding and can store Unicode characters directly in source code files; they can also display the Unicode character symbols and let you type them right in the editor. So typing \uXXXX is not needed and is going out of style.
So that is why I asked where you got the string initially. You wrote in a comment that you read it from a file? What generated the file?
If each \uXXXX is taken by itself as a Unicode character, which is what \uXXXX means, it doesn't make sense: 00e2 is 'â' (a with a circumflex), and 0080 and 0099 are control characters, not printable.
If e2 80 99 are taken together as three single bytes, i.e. dropping the 00-valued first byte of each since they are in the form \u00XX, then it fits the UTF-8 representation of the Unicode character U+2019, which is "Unicode Character 'RIGHT SINGLE QUOTATION MARK' (U+2019)".
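As a quick check of that claim, a minimal sketch that decodes exactly those three bytes as UTF-8:
using System;
using System.Text;
// The three bytes E2 80 99, decoded as UTF-8, yield U+2019 (right single quotation mark)
var bytes = new byte[] { 0xE2, 0x80, 0x99 };
Console.WriteLine(Encoding.UTF8.GetString(bytes)); // ’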
Then that is what you are looking for, but that doesn't seem like correct usage of whatever encoding generated the string. If you end up with those strings and have to evaluate them, then the approach in the comment above by "C# Novice" works, but it may not work in every case.
You could convert string literals that use \uXXXX escapes with a JavaScript evaluator, or with CSharpScript.RunAsync(), by building a string literal from them, assigning it to a variable, and then looking at its bytes. But I tried that later, and because those byte values/characters don't make sense, I don't get anything meaningful from them. I get an a with a circumflex, and the next two CSharpScript refuses to decode and leaves as-is, because they are control characters when decoded.
Here are three different ways to do \uXXXX decoding using available C# libraries. The first two use the Newtonsoft.Json package, the last uses Roslyn/CSharpScript, both available from NuGet. Note that none of these print a single apostrophe, for the reason described above. In contrast, if I change the string to "\u3053\u3093\u306B\u3061\u306F\u4E16\u754C!", it prints this Japanese text on the debug output window: "こんにちは世界!", which Google Translate tells me is the Japanese translation of "Hello World!"
https://translate.google.com/?sl=ja&tl=en&text=%E3%81%93%E3%82%93%E3%81%AB%E3%81%A1%E3%81%AF%E4%B8%96%E7%95%8C!&op=translate
So in summary, whatever generated those strings doesn't seem to be doing standard things.
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CodeAnalysis.CSharp.Scripting;
using Microsoft.CodeAnalysis.Scripting;
using Newtonsoft.Json;

string test = @"It\u00e2\u0080\u0099s working";

// Using JSON deserialization, since \uXXXX is valid escaping in JavaScript/JSON string literals.
// Have to add starting and ending quotes to make it a string literal definition, then deserialize as string.
var d = Newtonsoft.Json.JsonConvert.DeserializeObject("\"" + test + "\"", typeof(string));
Console.WriteLine(d);
System.Diagnostics.Debug.WriteLine(d);

// Another way of JSON deserialization. If you are using a stream, like reading from a file, this may be better:
TextReader reader = new StringReader("\"" + test + "\"");
JsonTextReader rdr = new JsonTextReader(reader);
rdr.Read();
Console.WriteLine(rdr.Value);
System.Diagnostics.Debug.WriteLine(rdr.Value);

// Lastly, overkill and too heavy: using Roslyn CSharpScript and letting the C# compiler decode the \uXXXX's in a string literal:
ScriptOptions opt = ScriptOptions.Default;
//opt = opt.WithFileEncoding(Encoding.Unicode);
Task<ScriptState<string>> task = Task.Run(async () => { return await CSharpScript.RunAsync<string>("string str = \"" + test + "\";", opt); });
ScriptState<string> s = task.Result;
var ddd = s.Variables[0];
Console.WriteLine(ddd.Value);
System.Diagnostics.Debug.WriteLine(ddd.Value);
This is probably a very basic question, but I would really appreciate it if you could help me with this:
I want to convert a string that contains escape sequences like \u000d\u000a\u000d\u000a into a human-readable string. However, I don't want to use the .Replace method, since there might be many more Unicode escapes than the ones I would hard-code the software to check for and replace.
string = "Test \u000d\u000a\u000d\u000aTesting with new line. \u000d\u000a\u000d\u000aone more new line"
I receive this string as a json Object from my server.
Do you even need that?
For example, the following code will print abc which is the actual decoded value:
var unicodeString = "\u0061\u0062\u0063";
Console.WriteLine(unicodeString);
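If, on the other hand, the \uXXXX sequences arrive as literal text at runtime (for example in a raw payload before it is parsed), a minimal sketch using Regex.Unescape, which also handles these escapes as seen earlier in this thread:
using System;
using System.Text.RegularExpressions;
// Here the backslashes are real characters in the string (note the verbatim literal)
string raw = @"Test \u000d\u000a\u000d\u000aTesting with new line.";
// Regex.Unescape understands \uXXXX (as well as \n, \t, ...) and replaces each
// escape with the character it denotes
string readable = Regex.Unescape(raw);
Console.WriteLine(readable); // "Test ", two CR+LF line breaks, then "Testing with new line."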
I want a String to have a New Line in it, but I cannot use escape sequences because the interface I am sending my string to does not recognize them. As far as I know, C# does not actually store a New Line in the String, but rather it stores the escape sequence, causing the literal contents to be passed, rather than what they actually mean.
My best guess is that I would have to somehow parse the number 10 (the decimal value of a new line according to the ASCII table) into ASCII. But I'm not sure how to do that, because C# converts numbers directly to their string representation if you attempt this:
"hello" + 10 + "world"
Any suggestions?
If you say "hello\nworld", the actual string will contain:
hello
world
There will be an actual new-line character in the string. At no point are the characters \ and n stored in the string.
There are a few ways to get the exact same result, but a simple \n in the string is a common way.
A simple cast should also do the same:
"hello" + (char)10 + "world"
Although it is likely slightly slower because of the string concatenation. I say "likely" because it could probably be optimized away, or an actual example using \n would also result in string concatenation, taking roughly the same amount of time.
The preferred new line character is Environment.NewLine for its cross-platform capability.
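To illustrate the options mentioned above, a minimal sketch (my own example values) showing that each approach stores a real line-break character, not a backslash and a letter:
using System;

string a = "hello\nworld";                           // escape sequence in the source literal
string b = "hello" + (char)10 + "world";             // cast the ASCII code 10 (LF) to char
string c = "hello" + Environment.NewLine + "world";  // platform-specific separator ("\r\n" on Windows)

Console.WriteLine(a.Contains("\\")); // False -- no backslash is stored
Console.WriteLine(a == b);           // True  -- both are "hello", LF, "world"
Console.WriteLine(c);                // prints "hello" and "world" on separate lines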
You could use XML for communication, if your receiver can handle it.
I know this has been covered lots of times but I still have a problem with all of the solutions.
I need to build a string to send to a JSON parser which needs quotes in it. I've tried these forms:
string t1 = "[{\"TS\"}]";
string t2 = "[{" + "\"" + "TS" + "\"" + "}]";
string t3 = @"[{""TS""}]";
Debug.Print(t1);
Debug.Print(t2);
Debug.Print(t3);
The debug statement shows it correctly as [{"TS"}], but when I look at it in the debugger, and most importantly when I send the string to my server-side JSON parser, it has the escape characters in it:
"[{\"TS\"}]"
How can I get rid of the escape characters in the actual string?
The debug statement shows it correctly as [{"TS"}], but when I look at it in the debugger, and most importantly when I send the string to my server-side JSON parser, it has the escape characters in it:
"[{\"TS\"}]"
From the debugger point of view it will always show the escaped version (this is so you, as the developer, know exactly what the string value is). This is not an error. When you send it to another .Net system, it will again show the escaped version from the debugger point of view. If you output the value, (Response.Write() or Console.WriteLine()) you will see that the version you expect will be there.
If you highlight the variable (from the debugger) and select the dropdown next to the magnifying glass icon and select "Text Visualizer" you will see how it displays in plain text. This may be what you are looking for.
Per your comments, I wanted to suggest that you also watch how you convert your string into bytes. You want to make sure you encode your bytes in a format that can be understood by other machines. Make sure you convert your string into bytes using a command like the following:
System.Text.Encoding.ASCII.GetBytes(mystring);
I have the sneaking suspicion that you are sending the bit representation of the string itself instead of an encoded version.
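To put both points together, a minimal sketch (the send path itself is omitted; only the string and its bytes are shown):
using System;
using System.Text;

string json = "[{\"TS\"}]";

// The backslashes exist only in the source literal and the debugger display
Console.WriteLine(json);                // [{"TS"}]
Console.WriteLine(json.Length);         // 8 -- no '\' characters are stored
Console.WriteLine(json.Contains("\\")); // False

// Encode to bytes before sending, so the receiving machine can decode it reliably
byte[] payload = Encoding.ASCII.GetBytes(json);
Console.WriteLine(payload.Length);      // 8 bytes, one per character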
I have an ASP.NET C# page and am trying to read a file that has the following character ’ and convert it to '. (From slanted apostrophe to straight apostrophe).
FileInfo fileinfo = new FileInfo(FileLocation);
string content = File.ReadAllText(fileinfo.FullName);
//strip out bad characters
content = content.Replace("’", "'");
This doesn't work and it changes the slanted apostrophes into ? marks.
I suspect that the problem is not with the replacement, but rather with the reading of the file itself. When I tried this the naive way (using Word and copy-paste) I ended up with the same results as you; however, examining content showed that the .NET framework believed the character was Unicode character 65533, i.e. the replacement character '�', before the string replacement. You can check this yourself by examining the relevant character in the Visual Studio debugger, where it should show the character code:
content[0]; // 65533 '�'
The reason why the replace isn't working is simple - content doesn't contain the string you gave it:
content.IndexOf("’"); // -1
As for why the file reading isn't working properly - you are probably using the wrong encoding when reading the file. (If no encoding is specified then the .Net framework will try to determine the correct encoding for you, however there is no 100% reliable way to do this and so often it can get it wrong). The exact encoding you need depends on the file itself, however in my case the encoding being used was Extended ASCII, and so to read the file I just needed to specify the correct encoding:
string content = File.ReadAllText(fileinfo.FullName, Encoding.GetEncoding("iso-8859-1"));
You also need to make sure that you specify the correct character in your replacement string - when using "odd" characters in code you may find it more reliable to specify the character by its character code, rather than as a string literal (which may cause problems if the encoding of the source file changes), for example the following worked for me:
content = content.Replace("\u0092", "'");
My bet is the file is encoded in Windows-1252. This is almost the same as ISO 8859-1. The difference is Windows-1252 uses "displayable characters rather than control characters in the 0x80 to 0x9F range". (Which is where the slanted apostrophe is located. i.e. 0x92)
//Specify Windows-1252 here
string content = File.ReadAllText(fileinfo.FullName, Encoding.GetEncoding(1252));
//Your replace code will then work as is
content = content.Replace("’", "'");
// This should replace smart single quotes with a straight single quote
content = Regex.Replace(content, @"(\u2018|\u2019)", "'");
// However, the better approach seems to be to read the file with the proper encoding and leave the quotes alone
var sreader = new StreamReader(fileinfo.OpenRead(), Encoding.GetEncoding(1252));
If you use String (capitalized) and not string, it should be able to handle any Unicode you throw at it. Try that first and see if that works.