Given a .txt file with non-Unicode text, I am able to detect its charset as 1251. Now I would like to convert it to Unicode.
byte[] bytes1251 = Encoding.GetEncoding(1251).GetBytes(File.ReadAllText("sampleNU.txt"));
String str = Encoding.UTF8.GetString(bytes1251);
This doesn't work.
Is this the way to go about it for non-unicode to unicode conversion?
After trying the suggested approach on the RTF file, I get the dialog below when I try to open the output RTF file. Please let me know what to do, because selecting Unicode neither makes the file readable nor gives the expected text.
// load as charset 1251
string text = File.ReadAllText("sampleNU.txt", Encoding.GetEncoding(1251));
// save as Unicode
File.WriteAllText("sampleU.txt", text, Encoding.Unicode);
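To see why the first attempt fails: File.ReadAllText without an encoding argument decodes the file as UTF-8, so the 1251 bytes are already mangled before GetBytes is ever called. Below is a minimal sketch of the decode step on in-memory bytes. The class and method names are made up for illustration, and the sample bytes are the Russian word "Привет" in Windows-1251.

```csharp
using System;
using System.Text;

static class Cp1251Demo
{
    // Decode bytes that are known to be Windows-1251 into a .NET string.
    // .NET strings are UTF-16 internally, so decoding IS the conversion to Unicode.
    public static string DecodeCp1251(byte[] raw)
    {
        // Needed on .NET Core / .NET 5+, where code-page encodings are opt-in.
        // On .NET Framework this line is unnecessary and can be removed.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(1251).GetString(raw);
    }

    static void Main()
    {
        // "Привет" in Windows-1251: one byte per Cyrillic letter.
        byte[] raw = { 0xCF, 0xF0, 0xE8, 0xE2, 0xE5, 0xF2 };
        string text = DecodeCp1251(raw);

        // Re-encode as UTF-16 LE, which is what Encoding.Unicode means in .NET.
        byte[] utf16 = Encoding.Unicode.GetBytes(text);
        Console.WriteLine(utf16.Length); // 12
    }
}
```

This applies to plain text files. RTF is a 7-bit format with its own escapes and code-page declarations, so re-encoding a whole RTF file as UTF-16 is likely to confuse readers such as WordPad.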
.NET Framework 4.5.
I have a string:
string input = "abcdqw\0asdv\0aaa";
Is there any way to display the string in a RichTextBox as
abcdqwasdvaaa
and, when I save it to a .txt file and open it in Notepad++, have it appear as
abcdqw[nul]asdv[nul]aaa
?
When I display it the normal way, with
richTextBox.Text = input;
the output is just
abcdqw
You can make a RichTextBox load a file that has ASCII NULs in it:
yourRichTextBox.LoadFile(@"C:\path\to\file\with\nulls.txt", RichTextBoxStreamType.PlainText);
But you can't do it by setting its Text property. I presume this is because that is (ultimately) handled via Windows message calls (WM_SETTEXT), which cut the text off at the first ASCII NUL encountered.
I haven't tried a null character in a RichTextBox, however I guess you are getting a truncated string on display. If that is the case, the solution should be as easy as
var input = "abcdqw\0asdv\0aaa";
var displayResult = input.Replace("\0","");
\0 is a "null character".
It seems the RichTextBox truncates the string at \0. Strip the null characters before assigning the text:
string input = "abcdqw\0asdv\0aaa";
var cleaned = input.Replace("\0", string.Empty);
richTextBox.Text = cleaned;
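To cover both halves of the question (clean text on screen, NULs preserved in the saved file), one option is to strip the NULs only for display and write the original string to disk. A minimal sketch, assuming the names NulDisplayDemo and ToDisplayText and the temp file name, none of which come from the thread:

```csharp
using System;
using System.IO;

static class NulDisplayDemo
{
    // Returns a copy that is safe to show in a text control: NULs removed.
    public static string ToDisplayText(string raw)
    {
        return raw.Replace("\0", string.Empty);
    }

    static void Main()
    {
        string input = "abcdqw\0asdv\0aaa";

        // Show this in the control, e.g. richTextBox.Text = display;
        string display = ToDisplayText(input);
        Console.WriteLine(display); // abcdqwasdvaaa

        // Save the ORIGINAL string so the NUL characters survive in the file.
        string path = Path.Combine(Path.GetTempPath(), "with-nuls.txt"); // hypothetical file name
        File.WriteAllText(path, input);
        Console.WriteLine(File.ReadAllBytes(path).Length); // 15 = 13 letters + 2 NULs
    }
}
```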
I have a PDF that has a string that is in the catalog portion of the PDF file. I need to read that string.
With iTextSharp 5 I was able to read the catalog and pull out the string.
I am now limited to another library (Syncfusion) and in that library the catalog is marked as private and I do not have access to it.
I am able to "open" the PDF in Notepad++ and I can see the string as plain text. I need to programmatically open that file and retrieve that string. Using ReadAllBytes I can read the file, but then I am at a loss as to how to search it for a specific string.
Any suggestions or examples that I can explore would be appreciated.
If you know the encoding of the text, you could always convert the raw bytes to a string and then use a Regex to find what you need.
Here's an example of that:
var bytes = File.ReadAllBytes("example.pdf");
string pdfStr = Encoding.UTF8.GetString(bytes); //for UTF8
Regex pdfReg = new Regex(...); //the regex for finding your string
string pdfSubstring = pdfReg.Match(pdfStr).Value; //the string you needed (empty if there is no match)
C# Regex Reference
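Since a PDF mixes text with binary streams, decoding it as UTF-8 can replace invalid byte sequences with U+FFFD. A variation that avoids this is to decode with ISO-8859-1, which maps every byte to exactly one character. The sketch below assumes the string is stored as an unescaped PDF literal of the form /MyKey (value). The key name /MyKey, the class name and the method name are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

static class PdfTextSearch
{
    // Maps each byte to exactly one char (0x00-0xFF), so binary streams cannot break decoding.
    static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // Returns the first literal string that follows the given key, or null if none is found.
    public static string FindLiteralAfterKey(byte[] pdfBytes, string key)
    {
        string text = Latin1.GetString(pdfBytes);

        // Matches e.g. "/MyKey (some value)"; escaped or nested parentheses are NOT handled.
        var pattern = new Regex(Regex.Escape(key) + @"\s*\(([^)]*)\)");
        Match m = pattern.Match(text);
        return m.Success ? m.Groups[1].Value : null;
    }

    static void Main(string[] args)
    {
        // args[0] is the path to the PDF; "/MyKey" is a placeholder key name.
        byte[] bytes = File.ReadAllBytes(args[0]);
        Console.WriteLine(FindLiteralAfterKey(bytes, "/MyKey") ?? "(not found)");
    }
}
```

This only works when the catalog is stored uncompressed, which seems to be the case here given that the string is visible in Notepad++.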
I need to write a string from C# into an RTF file, but I am running into a weird problem.
To write the text I simply use
string fileName = System.IO.Path.GetTempPath() + Guid.NewGuid().ToString() + ".rtf";
System.IO.File.WriteAllText(fileName, body);
body is a string variable, that is filled from a varchar column from a database.
The problem is that the character é is displayed incorrectly by WordPad when I open the file, like this
If I open the file in Notepad, I see this
(één schade gevonden -> ander dossier)
So for some obscure reason WordPad shows the character é all messed up.
I tried writing the file as UTF-8 and other Unicode encodings, but then WordPad refused to treat the file as RTF and just showed the plain text with all the tags.
I also looked at this page, which tells me to write a tag like \uXXX? where XXX is the number of a Unicode UTF-16 code unit.
But I cannot find which number to use, or any good example of how to do this.
Actually, I am not even sure this is Unicode related; to my mind é is not a character that needs Unicode, though I could be wrong of course.
Anyway, does anyone know how to solve this problem?
I just need a way to make WordPad not mess up the character é on display and in print.
The problem was that I did not encode the RTF file properly.
Using this link provided by Filburt, I managed to encode the RTF file correctly like this.
var iso = Encoding.GetEncoding("ISO-8859-1");
string fileName = System.IO.Path.GetTempPath() + Guid.NewGuid().ToString() + ".rtf";
System.IO.File.WriteAllText(fileName, body, iso);
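For characters that ISO-8859-1 cannot represent, the \uN? form mentioned in the question is an alternative. N is the UTF-16 code unit written as a signed 16-bit decimal number, and the trailing ? is the fallback character shown by readers that do not understand \u. For é (U+00E9) that gives \u233?. A minimal sketch, with the class and method names invented for illustration:

```csharp
using System;
using System.Globalization;
using System.Text;

static class RtfEscape
{
    // Replaces every non-ASCII UTF-16 code unit with an RTF \uN? escape.
    // Existing RTF markup (backslashes, braces) is left untouched.
    public static string EscapeNonAscii(string rtf)
    {
        var sb = new StringBuilder(rtf.Length);
        foreach (char c in rtf)
        {
            if (c < 0x80)
            {
                sb.Append(c);
            }
            else
            {
                // Signed 16-bit decimal: code units above 32767 become negative numbers.
                short n = unchecked((short)c);
                sb.Append("\\u").Append(n.ToString(CultureInfo.InvariantCulture)).Append('?');
            }
        }
        return sb.ToString();
    }

    static void Main()
    {
        Console.WriteLine(EscapeNonAscii("één schade")); // \u233?\u233?n schade
    }
}
```

The escaped result contains only ASCII characters, so it can be written with File.WriteAllText(fileName, escaped, Encoding.ASCII) regardless of the code page the RTF header declares.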
I'm trying to read a text file that has readable and unreadable characters. It opens easily in any text editor. Most of the text characters are unknown characters and the part I want to change is readable.
The file looks like this
readable1 gibberish readable2 gibberish.
I want to change readable2
If I use the following techniques they seem to return only readable1. They do not give the same output as dropping it on a text reader.
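string ReadFile(string path)
{
    // Option 1: StreamReader (decodes as UTF-8 by default)
    using (var sr = new StreamReader(path))
    {
        return sr.ReadToEnd();
    }
    // Option 2 (equivalent): return File.ReadAllText(path);
}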
I tried a few encodings (ASCII, Unicode, UTF8, UTF32), but none of them gives the same output as dragging the file onto a text editor.
byte[] bytes = System.IO.File.ReadAllBytes(path);
string str = System.Text.Encoding.ASCII.GetString(bytes);
Is there any way to get it to return all the characters and just modify the readable characters?
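One approach that might work here is to treat the file as bytes and decode it with ISO-8859-1, which maps every byte value to exactly one character and back, so the gibberish survives a round trip unchanged. This is a sketch under assumptions: the class and method names are invented, the replacement text "changed22" is a placeholder, and keeping the replacement the same length as the original is a precaution in case the file format stores offsets or lengths. If the string really is being read in full and only displayed up to the first NUL character, that would also explain seeing just readable1, but that is a guess, since the question does not show the bytes.

```csharp
using System;
using System.IO;
using System.Text;

static class BinarySafeEdit
{
    // ISO-8859-1 maps bytes 0x00-0xFF to chars U+0000-U+00FF one to one,
    // so decode -> edit -> encode leaves every untouched byte unchanged.
    static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    public static byte[] ReplaceText(byte[] data, string oldText, string newText)
    {
        string text = Latin1.GetString(data);
        // Ordinal replace; oldText and newText are assumed to contain only Latin-1 characters.
        return Latin1.GetBytes(text.Replace(oldText, newText));
    }

    static void Main(string[] args)
    {
        // args[0] is the path to the file; the result goes to a new file next to it.
        string path = args[0];
        byte[] edited = ReplaceText(File.ReadAllBytes(path), "readable2", "changed22");
        File.WriteAllBytes(path + ".edited", edited);
    }
}
```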
Is it possible to read a file's hex values into C# and output the corresponding ASCII? I can view the file in a hex editor, where I can see the appropriate ASCII next to the hex, but rather than manually copying out the parts I need, I imagine there is a way to have the machine do it for me in a C# program?
I did find "Converting HEX data in a file to ascii", but that didn't really help.
It sounds like you just need:
string text = File.ReadAllText("file.txt");
There's no such thing as "hex values" in a file - they're just bytes which are shown as hex in various editors geared towards editing non-text files.
The above line of code will load a text file, decoding it as UTF-8 - which is compatible with ASCII, so if your file is truly ASCII, it should be fine. If you need to specify a different encoding, you can do it with an overload, e.g.
// Load an ISO-8859-1 file
string text = File.ReadAllText("file.txt", Encoding.GetEncoding(28591));
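To make the "they're just bytes" point concrete, here is a small sketch that prints the same bytes both ways, as hex and as ASCII, much like a hex editor does. The class name, method name and 16-bytes-per-line layout are choices made for this illustration.

```csharp
using System;
using System.IO;
using System.Text;

static class HexDump
{
    // Formats one line per 16 bytes: hex values on the left, printable ASCII on the right.
    public static string Dump(byte[] data)
    {
        var sb = new StringBuilder();
        for (int offset = 0; offset < data.Length; offset += 16)
        {
            int count = Math.Min(16, data.Length - offset);
            var hex = new StringBuilder();
            var ascii = new StringBuilder();
            for (int i = 0; i < count; i++)
            {
                byte b = data[offset + i];
                hex.Append(b.ToString("X2")).Append(' ');
                // Bytes outside the printable ASCII range are shown as '.'
                ascii.Append(b >= 0x20 && b < 0x7F ? (char)b : '.');
            }
            // Pad the hex column so the ASCII column lines up on a short last line.
            sb.Append(hex.ToString().PadRight(16 * 3)).Append(ascii.ToString()).Append('\n');
        }
        return sb.ToString();
    }

    static void Main(string[] args)
    {
        // args[0] is the path to the file to dump.
        Console.Write(Dump(File.ReadAllBytes(args[0])));
    }
}
```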