Convert file path to UTF-8

Convert file path to UTF-8 - c#

I want to get, print, and write to a text file the full on-disk path of a file named A&T+X-8_L_R1.png, but when I print it I get A&amp;T+X-8_L_R1.png.
AFAIK I need to change the encoding. I did a search and found this potential solution but it doesn't work:
String filePathString = relativeUri.ToString();
byte[] bytes = Encoding.Default.GetBytes(filePathString);
filePathString = Encoding.UTF8.GetString(bytes);
filePathNode.SetValue(filePathString);
This is the full code of my class: http://pastebin.com/dZLGeS8p
The class searches recursively for *.png files and creates an XML structure from their paths. When I save the XML file the special characters from the paths like & are changed.
Can anyone point me to a solution?

You are writing an XML file, not a plain text file. In XML, an ampersand must be escaped as &amp;.
So the result you get is perfectly OK. It is even required to look like this.
I recommend opening the XML file with an application that can properly validate and display XML. That makes it easier to see that the file is correct.
The UTF-8 conversion in your code isn't required. If the XML file is encoded in UTF-8, your XML classes will take care of any required conversions.
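To illustrate the point, here is a minimal sketch using LINQ to XML. The element name FilePath and the class name XmlEscapeDemo are assumptions made for the example, not names taken from the pastebin code. It shows that the ampersand is escaped only in the serialized markup and comes back unchanged when the XML is parsed again.

```csharp
using System;
using System.Xml.Linq;

static class XmlEscapeDemo
{
    // Serializes a path as the text of a <FilePath> element (hypothetical name).
    // The XML classes escape '&' as "&amp;" in the markup automatically.
    public static string ToXml(string path)
    {
        return new XElement("FilePath", path).ToString();
    }

    // Parses the markup again and returns the unescaped text content.
    public static string FromXml(string xml)
    {
        return XElement.Parse(xml).Value;
    }
}
```

XmlEscapeDemo.ToXml("A&T+X-8_L_R1.png") yields <FilePath>A&amp;T+X-8_L_R1.png</FilePath>, and passing that string to FromXml returns the original file name. If you need the plain path in a text file, write the string itself rather than the serialized XML.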

Related

Detect special symbols in c#

I'm working on a c# project in which some data contains characters which are not recognised by the encoding.
They are displayed like that:
"Some text � with special � symbols in it".
I have no control over the encoding process, also data come from files of various origins and various formats.
I want to be able to flag data that contains such characters as erroneous or incomplete. Right now I am able to detect them this way:
if(myString.Contains("�"))
{
//Do stuff
}
While it does work, it doesn't feel quite right to use the weird symbol directly in the Contains function. Isn't there a cleaner way to do this?
EDIT:
After checking back with the team responsible for reading the files, this is how they do it:
var sr = new StreamReader(filePath, true);
var content = sr.ReadToEnd();
Passing true as a second parameter of StreamReader is supposed to detect the encoding from the file's BOM, and use it to read the content. It doesn't always work though, as some files don't bear that information, hence why their data is read incorrectly.
We've run some tests, and using StreamReader(filePath, Encoding.Default) instead appears to work for most if not all of the files we had issues with. As expected, files that were working before no longer work because they do not use the default encoding.
So the best solution for us would be to do the following: read the file while trying to detect its encoding, then, if that wasn't successful, read it again with the default encoding.
The problem remains the same though: how do we check, after trying to detect the file's encoding, whether the data has been read incorrectly?

The � character is not a special symbol. It's the Unicode Replacement Character (U+FFFD). It means that the code tried to decode non-Unicode text using the wrong encoding. Any bytes that were not valid in that encoding were replaced with �.
The solution is to read the file using the correct encoding. The default encoding used by the File methods and StreamReader is UTF-8. You can pass a different encoding using the appropriate constructor, e.g. StreamReader(String, Encoding). To use the system locale's code page, you need to use Encoding.Default (this holds on .NET Framework; on .NET Core and later, Encoding.Default is UTF-8):
var sr = new StreamReader(filePath,Encoding.Default);
You can use the StreamReader(String, Encoding, Boolean) constructor to autodetect Unicode encodings from the BOM and fall back to a different encoding.
Assuming the files are either some type of Unicode or match your system locale, you can use:
var sr = new StreamReader(filePath,Encoding.Default, true);
StreamReader's source shows that the DetectEncoding method checks the first bytes of a file to determine the encoding. If one is found, it is used instead of the supplied encoding. The operation doesn't cause extra I/O because the method checks the class's internal buffer.
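For the remaining question of how to tell whether the data was read incorrectly, one possible approach is sketched below. It is a swapped-in technique, not something from the code above: validate the bytes with a UTF8Encoding that throws on invalid input, and only use the fallback encoding when that validation fails. The class name EncodingFallback and both method names are hypothetical. The sketch assumes files are either Unicode with a BOM, valid UTF-8, or in the fallback code page.

```csharp
using System;
using System.IO;
using System.Text;

static class EncodingFallback
{
    // True if the decoded text contains U+FFFD, the Unicode replacement character.
    // Uses the escape sequence instead of pasting the symbol into the source.
    public static bool HasReplacementChar(string text)
    {
        return text.IndexOf('\uFFFD') >= 0;
    }

    // Hypothetical helper: picks strict UTF-8 when the bytes are valid UTF-8,
    // otherwise the supplied fallback. A BOM, if present, still wins because
    // StreamReader is asked to detect it.
    public static string Decode(byte[] bytes, Encoding fallback)
    {
        // Second argument 'true' makes invalid byte sequences throw
        // DecoderFallbackException instead of producing U+FFFD.
        var strictUtf8 = new UTF8Encoding(false, true);
        Encoding chosen;
        try
        {
            strictUtf8.GetCharCount(bytes); // validation only, result is discarded
            chosen = strictUtf8;
        }
        catch (DecoderFallbackException)
        {
            chosen = fallback;
        }

        using (var reader = new StreamReader(new MemoryStream(bytes), chosen, true))
        {
            return reader.ReadToEnd();
        }
    }
}
```

A call such as EncodingFallback.Decode(File.ReadAllBytes(filePath), Encoding.Default) reads the file once and decodes it in memory. Note that a single-byte code page accepts any byte sequence, so a wrong fallback cannot be detected this way; HasReplacementChar only flags text that was already decoded leniently.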

EDIT
I just realized you can't actually load the raw file into a .NET string and still be able to have full information about the original file.
The project here uses the MLang API, which does a better job because it does not load the file into a .NET string before guessing. There is also a related SO question.

Open file, read as hex and convert it to ASCII?

Is it possible to read a file's hex values into C# and output the corresponding ASCII? I can view the file in a hex editor, where I can see the appropriate ASCII next to the hex, but rather than manually copying out the parts I need, I imagine there is a way of having the machine do it for me in a C# program?
I did find Converting HEX data in a file to ascii, but that didn't really help.

It sounds like you just need:
string text = File.ReadAllText("file.txt");
There's no such thing as "hex values" in a file - they're just bytes which are shown as hex in various editors geared towards editing non-text files.
The above line of code will load a text file, decoding it as UTF-8 - which is compatible with ASCII, so if your file is truly ASCII, it should be fine. If you need to specify a different encoding, you can do it with an overload, e.g.
// Load an ISO-8859-1 file
string text = File.ReadAllText("file.txt", Encoding.GetEncoding(28591));
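To make the "they're just bytes" point concrete, here is a small sketch that produces both views of the same bytes, roughly the two columns a hex editor shows. The class and method names (HexDemo, ToHex, ToAscii) are made up for the example.

```csharp
using System;
using System.Text;

static class HexDemo
{
    // The hex view: each byte as two hex digits, separated by spaces.
    public static string ToHex(byte[] bytes)
    {
        return BitConverter.ToString(bytes).Replace("-", " ");
    }

    // The text view: the same bytes decoded as ASCII.
    // Bytes above 0x7F are not ASCII and come out as '?'.
    public static string ToAscii(byte[] bytes)
    {
        return Encoding.ASCII.GetString(bytes);
    }
}
```

For the bytes 0x48 0x69 0x21, ToHex returns "48 69 21" and ToAscii returns "Hi!". With a real file you would pass File.ReadAllBytes("file.txt") to either method.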

C#: Load *.txt to RichTextBox and convert into UTF8

I want to open text files and load them into a RichTextBox. This has been going fine so far, but now I'm struggling with an encoding issue.
So I used the GetType() method from this StackOverflow page:
How to find out the Encoding of a File? C#
- and it returns "System.Text.UnicodeEncoding".
My questions now are:
How do I convert Unicode (I guess that's what they are, although I haven't double checked) into UTF8 (and possibly backwards)?
Can I switch the RichTextBox to display Unicode correctly? The following shows awkward results: rtb.LoadFile(aFile, RichTextBoxStreamType.PlainText);
How can I define which encoding a SaveFileDialog should use?

Instead of having the RichTextBox load the file from the disk, load it yourself, while specifying the correct encoding. (By the way, Encoding.Unicode is just a synonym for "UTF-16 little-endian".)
string myText = File.ReadAllText(myFilePath, Encoding.Unicode);
This will take care of the conversion for you. The string you get is encoded "correctly" (i.e. in the format used internally by .NET), so you can just assign it to the Text property of your RichTextBox.
About your third question: The SaveFileDialog is just a tool that lets the user choose a file name. What you do with the file name (like: save some text into it, or encode some string and then save it) has nothing to do with the SaveFileDialog.
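As for the first question (converting to UTF-8), a minimal sketch could look like the following. It assumes the source file really is UTF-16 little-endian, as the GetType() result suggests; the class name Transcoder and the method name are hypothetical. The target path could be whatever the user picked in the SaveFileDialog (its FileName property).

```csharp
using System.IO;
using System.Text;

static class Transcoder
{
    // Reads a UTF-16 LE file and writes the same text as UTF-8 without a BOM.
    public static void Utf16ToUtf8(string sourcePath, string targetPath)
    {
        // Decoding: bytes on disk -> .NET string.
        string text = File.ReadAllText(sourcePath, Encoding.Unicode);

        // Encoding: .NET string -> UTF-8 bytes on disk.
        // Pass Encoding.UTF8 instead if a BOM is wanted in the output.
        File.WriteAllText(targetPath, text, new UTF8Encoding(false));
    }
}
```

Going the other way is the same two steps with the encodings swapped. The string in between has no "file encoding" of its own, which is why the conversion only happens at read and write time.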

The SaveFileDialog just allows you to choose the path where the file will be saved. It doesn't save it for you.
Use the Encoding class to convert from one encoding to another.
And read this article for some examples of how to convert the text and write it to a file.

You can also use:
richTextBox.LoadFile(filePath, RichTextBoxStreamType.UnicodePlainText);

Special Character Issue in XML Creation using SQLXMLBULKLOADLib - C#

I am trying to load a CSV file using SQLXMLBULKLOADLib, which first converts the CSV to XML and then maps it to the database model. My CSV file contains special characters. When SQLXMLBULKLOADLib loads it into XML, the special characters are converted to a different representation like , ,  etc. I am not aware of what , ,  is. How do I handle this in XML and SQL Server? I need to show the exact special characters present in the CSV file.

I got the answer from http://social.msdn.microsoft.com/Forums/en-au/sqlxml/thread/5c46314d-ec4c-4ec2-91c9-7ceb466af3c6

Reading only XML from a text file which contains text, binary, and XML data?

I have a text file (.txt) which has text data, binary data, and XML data all mixed together within it. I've googled around for a few minutes and cannot figure out how to only extract the XML from this text file. Can the good users of SO offer some suggestion?
I'm using C# 4.0.
Since I cannot simply load the text file into an XDocument, I've been messing with regex, but this approach is getting me nowhere.

First of all, a file can't be text and binary simultaneously: if it contains binary data, it's a binary file. But from your description, it seems like it's a text file with some binary data in text-encoded form.
If you know the root tag name, use a substring search to locate the start and end of the XML document, "cut" it out, and then you can process it in any way you want.
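A rough sketch of that substring approach is below. The names XmlExtractor and Extract are hypothetical, and the sketch assumes the root element name occurs only once in the file, is not nested inside itself, and is not a prefix of another tag name.

```csharp
using System;
using System.Xml.Linq;

static class XmlExtractor
{
    // Returns the first <rootName ...>...</rootName> fragment found in text,
    // parsed as an XElement, or null if no such fragment exists.
    public static XElement Extract(string text, string rootName)
    {
        // Start of the opening tag (attributes may follow the name).
        int start = text.IndexOf("<" + rootName, StringComparison.Ordinal);
        if (start < 0) return null;

        // First closing tag after the opening tag.
        string endTag = "</" + rootName + ">";
        int end = text.IndexOf(endTag, start, StringComparison.Ordinal);
        if (end < 0) return null;

        // Cut out the fragment, closing tag included, and parse it.
        return XElement.Parse(text.Substring(start, end + endTag.Length - start));
    }
}
```

For example, Extract("junk<config><item>1</item></config>junk", "config") returns an element whose item child has the value "1". When loading the mixed file, a single-byte encoding such as ISO-8859-1 (Encoding.GetEncoding(28591)) maps every byte to one character, so the binary parts will not break the read, although non-ASCII characters inside the XML would then need separate handling.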
