Display and save '\0' in .Net - c#

.Net 4.5 Framework.
I have a string:
string input = "abcdqw\0asdv\0aaa";
Is there any way to display the string in a richtextbox like
abcdqwasdvaaa
and when I save it to a .txt file then open by notepad++, it is
abcdqw[nul]asdv[nul]aaa
???
When I display it in a normal way as
richTextBox.Text = input;
the output is just
abcdqw

You can make a RichTextBox load a file that has ASCII nul's in it:
yourRichTextBox.LoadFile(@"C:\path\to\file\with\nulls.txt", RichTextBoxStreamType.PlainText);
But you can't do it by setting its Text property. I presume this is because setting the text is (ultimately) handled through Windows message calls (WM_SETTEXT), which cut the string off at the first ASCII NUL encountered.

I haven't tried a null character in a RichTextBox, however I guess you are getting a truncated string on display. If that is the case, the solution should be as easy as
var input = "abcdqw\0asdv\0aaa";
var displayResult = input.Replace("\0","");

\0 is the "null character".
It seems the RichTextBox truncates the string at the first \0, so strip the null characters before assigning the text:
string input = "abcdqw\0asdv\0aaa";
var cleaned = input.Replace("\0", string.Empty);
richTextBox.Text = cleaned;
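To cover both halves of the question, one option is to keep the original string untouched for saving and strip the NUL characters only from the copy that is displayed. The following is a minimal sketch of that idea, not a tested WinForms solution: the class name NulTextDemo, the helper names and the temp-file path are made up for illustration, and the console output stands in for the assignment to richTextBox.Text.

```csharp
using System;
using System.IO;
using System.Text;

static class NulTextDemo
{
    // Returns a copy without NUL characters, suitable for a text control.
    public static string ForDisplay(string text)
    {
        return text.Replace("\0", string.Empty);
    }

    // Writes the original text, NUL characters included, as UTF-8 without a BOM.
    public static void Save(string path, string text)
    {
        File.WriteAllText(path, text, new UTF8Encoding(false));
    }

    static void Main()
    {
        string input = "abcdqw\0asdv\0aaa";

        // In the form this would be: richTextBox.Text = ForDisplay(input);
        Console.WriteLine(ForDisplay(input));

        // Hypothetical output file; the saved bytes still contain the two NULs.
        string path = Path.Combine(Path.GetTempPath(), "nul-demo.txt");
        Save(path, input);
    }
}
```

Because the file is written from the original string rather than from the control's text, Notepad++ should still show the NUL markers when it opens the result.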

Related

Base64EncodedString does not include NewLines

I'm using a .NET Core 3.0 project on Windows 10. I'm trying to encode a string to Base64 with the code below:
var stringvalue = "Row1" + Environment.NewLine + "\n\n" + "Row2";
var encodedString = Convert.ToBase64String(Encoding.UTF8.GetBytes(stringvalue));
encodedString then has the result below:
Um93MQ0KCgpSb3cy
stringvalue is:
Row1\r\n\n\nRow2
However, if I pass the same value to this site (https://www.base64encode.org/), I get a different result:
Um93MVxyXG5cblxuUm93Mg==
In Visual Studio, I tried to resave the file with Unix line endings, but without any luck.
I want the string to be encoded as how it's done in https://www.base64encode.org. Any ideas how to get this done?
From the screenshot, I can see that you entered a different string from the one you used in your C# code. The string you used on https://www.base64encode.org is represented as a C# string literal like this:
"Row1\\r\\n\\n\\nRow2"
// or
@"Row1\r\n\n\nRow2"
So to answer your question:
I want the string to be encoded as how it's done in https://www.base64encode.org. Any ideas how to get this done?
You should do:
var encodedString = Convert.ToBase64String(Encoding.UTF8.GetBytes("Row1\\r\\n\\n\\nRow2"));
But that's probably not what you actually want. Your first attempt at the C# code is more likely to be desired, because that is actually a carriage return character, followed by 3 new line characters. The string you entered in https://www.base64encode.org is simply the backslash character followed by the letter r (or n).
You can't really make the output on https://www.base64encode.org match the C# output, because you can only choose one kind of line separator there. You can only encode either Row1\r\n\r\n\r\nRow2 or Row1\n\n\nRow2. Nevertheless, you can check that the C# result is correct by decoding the output using https://www.base64decode.org.
On the website, the \r\n you typed is encoded literally: it is not a newline, it is four characters (backslash, r, backslash, n). The site has a newline-separator option that lets you ask for Windows-style line endings when converting real multi-line input such as:
Row1
Row2.
I guess your \r\n\n\n is just a mistake; the website is only prepared to convert line breaks to \r\n\r\n.
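The difference between the two inputs can be reproduced directly in C#. This is a small sketch with a made-up class name (Base64NewlineDemo); it spells out "\r\n" instead of using Environment.NewLine so that the result does not depend on the platform. The first call encodes real control characters, the second encodes the literal backslash sequences that were typed into the website.

```csharp
using System;
using System.Text;

static class Base64NewlineDemo
{
    // Encodes the UTF-8 bytes of the given string as Base64.
    public static string Encode(string value)
    {
        return Convert.ToBase64String(Encoding.UTF8.GetBytes(value));
    }

    static void Main()
    {
        // Escape sequences: one carriage return and three line feeds.
        Console.WriteLine(Encode("Row1\r\n\n\nRow2"));   // Um93MQ0KCgpSb3cy

        // Verbatim string: backslashes and letters, no control characters.
        Console.WriteLine(Encode(@"Row1\r\n\n\nRow2"));  // Um93MVxyXG5cblxuUm93Mg==
    }
}
```

The two outputs match the two values quoted in the question, which suggests that the encoder itself behaves the same in both places and only the input differs.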

Non-Unicode to unicode conversion of a txt file

Given a txt file with non-Unicode text, I am able to detect its charset as 1251. Now I would like to convert it to Unicode.
byte[] bytes1251 = Encoding.GetEncoding(1251).GetBytes(File.ReadAllText("sampleNU.txt"));
String str = Encoding.UTF8.GetString(bytes1251);
This doesn't work.
Is this the way to go about it for non-unicode to unicode conversion?
After trying the suggested approach on the RTF file, I get the below dialog when I try to open the output RTF file. Please let me know what to do because selecting Unicode doesn't make it readable or give the expected text?
// load as charset 1251
string text = File.ReadAllText("sampleNU.txt", Encoding.GetEncoding(1251));
// save as Unicode
File.WriteAllText("sampleU.txt", text, Encoding.Unicode);
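The attempt in the question fails because File.ReadAllText without an encoding argument decodes the file as UTF-8, so the 1251 bytes are already lost before GetBytes is called. The sketch below wraps the answer's two calls in a helper and round-trips a small sample. The class name, method name and temp files are illustrative only, and the RegisterProvider line assumes .NET Core 3.0 or later; on the .NET Framework code page 1251 is available without it and that line can be dropped.

```csharp
using System;
using System.IO;
using System.Text;

static class Cp1251Demo
{
    // Decodes sourcePath as Windows-1251 and rewrites it as UTF-16 ("Unicode").
    public static void ConvertToUnicode(string sourcePath, string targetPath)
    {
        // Assumption: running on .NET Core / .NET 5+, where legacy code pages
        // must be registered before GetEncoding(1251) can find them.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        string text = File.ReadAllText(sourcePath, Encoding.GetEncoding(1251));
        File.WriteAllText(targetPath, text, Encoding.Unicode);
    }

    static void Main()
    {
        // Hypothetical sample: a short Russian word stored as Windows-1251 bytes.
        string source = Path.GetTempFileName();
        string target = Path.GetTempFileName();
        File.WriteAllBytes(source, new byte[] { 0xCF, 0xF0, 0xE8, 0xE2, 0xE5, 0xF2 });

        ConvertToUnicode(source, target);
        Console.WriteLine(File.ReadAllText(target).Length);
    }
}
```

Note that this only applies to plain text. An RTF file carries its own charset declarations inside the markup, so re-encoding the raw file this way may not give a readable document.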

RichTextBox not showing formfeed character

I have a text file that, when I open it in Notepad, shows the form feed character (byte 12). I want to show this character in my richtextbox but no matter which encoding I use when I read the text file it won't show. When I enter the character myself it shows. When I do myRTB.Text = "♀" it shows, but when I do
myRTB.Text = File.ReadAllText("myFileName.txt");
it doesn't show. I've also tried using the readers in the Encoding class to no avail.
How can I show the form feed character in my rtb?
Firstly, a line feed has a value of 10 (a carriage return is 13). If you have characters with the value 12 in there, they are form feeds, not line feeds.
As for your issue, ReadAllLines reads the lines of a file into a String array, thus stripping out all the line breaks. You might do as Damith suggests and call ReadAllText, which reads the file contents as a single String, and assign the result to the Text property or else call ReadAllLines and assign the result to the Lines property. Better to call LoadFile on the RichTextBox itself though.
try with ReadAllText
myRTB.Text = File.ReadAllText("myFileName.txt", Encoding.Unicode);
Thanks for the help @jmcilhinney and @Damith. I ended up cheating the system with a dirty hack. I saw that myRTB was replacing the form feed char with \page in the RTF, but when I typed the form feed char myself it put \u9792. Therefore I went with the hack:
myRTB.Rtf = myRTB.Rtf.Replace("\\page", "\\u9792");
If you have something less hackish that I can get working please let me know.

C# - ReadLine() in a textbox

I'm sorry for asking noobish questions, but I am one :).
I can write a .txt file using Write or WriteLine, which reads the whole TextBox. The problem is when I read it. I can not read it using ReadLine. It gives the whole text on one line. It must be a problem with the reading, because in NotePad, I get the file correctly.
What is the reason of this quite strange behavior, and how can I change it?
method containing StreamReader
StreamReader streamreader = new StreamReader(openfiledialog.FileName);
textbox.Text = "";
while (!streamreader.EndOfStream)
{
string read_line = streamreader.ReadLine();
textbox.Text += read_line + "\n";
}
streamreader.Close();
method containing StreamWriter
StreamWriter streamwriter = new StreamWriter(savefiledialog.FileName);
streamwriter.Write(textbox.Text);
streamwriter.Close();
Thanks in advance.
UPDATED: ReadToEnd worked
Without seeing any code the best guess I have is you're using different line separators between the textbox and the text file.
I'd guess you either need to format the data to make sure the data gets the right separator for the source, or change the newline separator for the textbox.
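A Windows Forms TextBox only breaks lines on "\r\n", while the loop in the question appends a bare "\n", which would explain why everything ends up on one line and why ReadToEnd (which keeps the file's own separators) worked. One way to make the text safe for the control is to normalize every line ending to "\r\n" first. The helper below is a sketch under that assumption; the names NewlineDemo and ToCrLf are invented for the example.

```csharp
using System;

static class NewlineDemo
{
    // Converts "\r\n", lone "\r" and lone "\n" into "\r\n".
    public static string ToCrLf(string text)
    {
        return text.Replace("\r\n", "\n")   // collapse Windows endings first
                   .Replace("\r", "\n")     // then old Mac endings
                   .Replace("\n", "\r\n");  // finally expand everything to CRLF
    }

    static void Main()
    {
        string mixed = "first\nsecond\r\nthird";
        // In the form this would be: textbox.Text = ToCrLf(mixed);
        Console.WriteLine(ToCrLf(mixed).Split(new[] { "\r\n" }, StringSplitOptions.None).Length);
    }
}
```

In the original read loop, appending Environment.NewLine instead of "\n" would have the same effect on Windows.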
Couple of possibilities here.
The text in the File is not UTF-8, so it needs to be converted to UTF-8 and then assigned to the text box.
The Textbox has a character limit that needs to be increased
Width of Text Box. Wrapping of text could make a difference.
Usually you would use ReadToEnd if you want the whole file's worth of text in one go and ReadLine if you want one line. The difference here is in the encoding of the file. One line in one text editor could look different in another. Some text editors convert the text to another encoding before displaying it and some do not. I would recommend Notepad++, because it tells you at the bottom which encoding the file is in and lets you change the encoding and save the file for testing.
.NET strings are UTF-16 internally, and StreamReader assumes UTF-8 unless told otherwise, so a difference in the encoding of the text could make a big difference.
Best of Luck

Read a file with unicode characters

I have an ASP.NET C# page and am trying to read a file that has the following character ’ and convert it to '. (From slanted apostrophe to apostrophe).
FileInfo fileinfo = new FileInfo(FileLocation);
string content = File.ReadAllText(fileinfo.FullName);
//strip out bad characters
content = content.Replace("’", "'");
This doesn't work and it changes the slanted apostrophes into ? marks.
I suspect that the problem is not with the replacement, but rather with the reading of the file itself. When I tried this the naive way (using Word and copy-paste) I ended up with the same results as you; however, examining content showed that the .NET Framework believed the character was Unicode character 65533, i.e. the "WTF?" replacement character, before the string replacement even ran. You can check this yourself by examining the relevant character in the Visual Studio debugger, where it should show the character code:
content[0]; // 65533 '�'
The reason why the replace isn't working is simple - content doesn't contain the string you gave it:
content.IndexOf("’"); // -1
As for why the file reading isn't working properly - you are probably using the wrong encoding when reading the file. (If no encoding is specified then the .Net framework will try to determine the correct encoding for you, however there is no 100% reliable way to do this and so often it can get it wrong). The exact encoding you need depends on the file itself, however in my case the encoding being used was Extended ASCII, and so to read the file I just needed to specify the correct encoding:
string content = File.ReadAllText(fileinfo.FullName, Encoding.GetEncoding("iso-8859-1"));
(See this question).
You also need to make sure that you specify the correct character in your replacement string - when using "odd" characters in code you may find it more reliable to specify the character by its character code, rather than as a string literal (which may cause problems if the encoding of the source file changes), for example the following worked for me:
content = content.Replace("\u0092", "'");
My bet is the file is encoded in Windows-1252. This is almost the same as ISO 8859-1. The difference is Windows-1252 uses "displayable characters rather than control characters in the 0x80 to 0x9F range". (Which is where the slanted apostrophe is located. i.e. 0x92)
//Specify Windows-1252 here
string content = File.ReadAllText(fileinfo.FullName, Encoding.GetEncoding(1252));
//Your replace code will then work as is
content = content.Replace("’", "'");
// This should replace smart single quotes with a straight single quote
content = Regex.Replace(content, @"(\u2018|\u2019)", "'");
//However the better approach seems to be to read the page with the proper encoding and leave the quotes alone
var sreader = new StreamReader(fileinfo.OpenRead(), Encoding.GetEncoding(1252));
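The effect of the encoding choice can be seen on a few raw bytes. The sketch below decodes the same byte sequence once as UTF-8 and once as Windows-1252, then straightens the quote. The class and method names are invented for the example, the sample bytes are an assumption about what such a file contains, and the RegisterProvider call assumes .NET Core 3.0 or later (it is not needed on the .NET Framework).

```csharp
using System;
using System.Text;

static class SmartQuoteDemo
{
    // Decodes Windows-1252 bytes and replaces curly single quotes with '.
    public static string DecodeAndStraighten(byte[] raw)
    {
        // Assumption: .NET Core / .NET 5+, where code page 1252 must be registered.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        string text = Encoding.GetEncoding(1252).GetString(raw);
        return text.Replace('\u2018', '\'').Replace('\u2019', '\'');
    }

    static void Main()
    {
        // "It", byte 0x92 (right single quote in Windows-1252), "s"
        byte[] raw = { 0x49, 0x74, 0x92, 0x73 };

        // As UTF-8, 0x92 is invalid and becomes U+FFFD, so Replace finds nothing.
        string wrong = Encoding.UTF8.GetString(raw);
        Console.WriteLine(wrong.IndexOf('\u2019'));   // -1

        Console.WriteLine(DecodeAndStraighten(raw));  // It's
    }
}
```

Once the file is decoded with the right code page, the quote arrives as U+2019 and either the string Replace or the Regex version above can find it.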
If you use String (capitalized) and not string, it should be able to handle any Unicode you throw at it. Try that first and see if that works.
