Saving unicode characters as "“" into a file; C#

Saving unicode characters as "“" into a file; C# - c#

I have an xml file (converted from xfdl) which contains something like:
<custom:popUp xfdl:compute="toggle(activated,'off','on') == '1' ? viewer.messageBox('o Once you click ..... page.
o When you use the “Create ” function in.......Portal.','Information'):''">
I load it and save it using...
XmlDocument xmlOut = new XmlDocument(); //note: not read only
FileStream outfs = new FileStream(tempOutXmlFileName, FileMode.Open, FileAccess.Read,
FileShare.ReadWrite);
xmlOut.Load(outfs);
xmlOut.Save(tempOutXmlFileName);
outfs.Close();
This process converts some of the unicode instructions into actual characters which completely messes up the xml/xfdl parsing as there are now quotation marks where quotation marks shouldn't be.
Does anybody know a way I can save the file with all the lovely “ characters intact?
Thank you.
Well, after fiddling around for a bit and getting the xml->xfdl conversion working better, I ran into a new problem.
The solution below seems to work and all the parsing of the xml is correct, but the program to read the xfdl file doesn't seem to like when I encode it using UTF-8 and wants the encoding to be ISO-8859-1.
Any ideas?

Using StreamReader and StreamWriter should help. To be clear you are trying to read from and write to the same file? I added some nice using statements aswell.
XmlDocument xmlOut = new XmlDocument();
//note: not read only
using (FileStream outfs = new FileStream(tempOutXmlFileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
using (StreamReader reader = new StreamReader(outfs, Encoding.UTF8))
{
xmlOut.Load(reader);
}
using (StreamWriter writer = new StreamWriter(tempOutXmlFileName, false, Encoding.UTF8))
{
xmlOut.Save(writer);
}
I set append to false in the StreamWriter, seems to make sense.

Turns out that reading and writing the files byte by byte solved the problem since the writer never got the opportunity to do any interpretation on the content.

Related

How to remove special character or unwanted symbols from the file written JSON format in C# UNity?

The JSON format when I read after writing is as follows
�[{"SaveValues":[{"id":1,"allposition":{"x":-5.12429666519165,"y":4.792403697967529},"allrotation":{"x":0.0,"y":0.0,"z":0.0,"w":1.0},"allscale":{"x":1.0,"y":1.0},"linepos0":{"x":0.0,"y":0.0,"z":0.0},"linepos1":{"x":0.0,"y":0.0,"z":0.0},"movetype":1},{"id":1,"allposition":{"x":-4.788785934448242,"y":-3.4373996257781984},"allrotation":{"x":0.0,"y":0.0,"z":0.0,"w":1.0},"allscale":{"x":1.0,"y":1.0},"linepos0":{"x":0.0,"y":0.0,"z":0.0},"linepos1":{"x":0.0,"y":0.0,"z":0.0},"movetype":1}],"NoteValues":[{"movenumber":1,"notemsg":"Move One"}]},{"SaveValues":[{"id":1,"allposition":{"x":-5.12429666519165,"y":4.792403697967529},"allrotation":{"x":0.0,"y":0.0,"z":0.0,"w":1.0},"allscale":{"x":1.0,"y":1.0},"linepos0":{"x":0.0,"y":0.0,"z":0.0},"linepos1":{"x":0.0,"y":0.0,"z":0.0},"movetype":2},{"id":1,"allposition":{"x":-4.788785934448242,"y":-3.4373996257781984},"allrotation":{"x":0.0,"y":0.0,"z":0.0,"w":1.0},"allscale":{"x":1.0,"y":1.0},"linepos0":{"x":0.0,"y":0.0,"z":0.0},"linepos1":{"x":0.0,"y":0.0,"z":0.0},"movetype":2},{"id":2,"allposition":{"x":5.185188293457031,"y":4.803859233856201},"allrotation":{"x":0.0,"y":0.0,"z":0.0,"w":1.0},"allscale":{"x":1.0,"y":1.0},"linepos0":{"x":0.0,"y":0.0,"z":0.0},"linepos1":{"x":0.0,"y":0.0,"z":0.0},"movetype":2},{"id":2,"allposition":{"x":5.154441833496094,"y":-4.023111343383789},"allrotation":{"x":0.0,"y":0.0,"z":0.0,"w":1.0},"allscale":{"x":1.0,"y":1.0},"linepos0":{"x":0.0,"y":0.0,"z":0.0},"linepos1":{"x":0.0,"y":0.0,"z":0.0},"movetype":2}],"NoteValues":[{"movenumber":2,"notemsg":"Move Two"}]}]
The code I use for saving to JSON format is given below.
ListContainer container = new ListContainer(getAlldata,playerNotes);
var temp = container;
//--Adding data in container into List<string> jsonstring
jsonstring.Add(JsonUtility.ToJson(temp));
//--Combing list of string into a single string
string jsons = "[" +string.Join(",", jsonstring)+"]";
//Writing into a JSON file in the persistent path
using (FileStream fs = new FileStream( Path.Combine(Application.persistentDataPath , savedName+".json"), FileMode.Create))
{
BinaryWriter filewriter = new BinaryWriter(fs);
filewriter.Write(jsons);
fs.Close();
}
Here I am looking to remove the special character that came at the starting point of the JSON format.
I am trying to read the JSON by using the following code
using (FileStream fs = new FileStream(Application.persistentDataPath + "/" + filename, FileMode.Open))
{
fs.Dispose();
string dataAsJson = File.ReadAllText(Path.Combine(Application.persistentDataPath, filename));
Debug.Log("DataJsonRead - - -" + dataAsJson);
}
I am getting an error - ArgumentException: JSON parse error: Invalid value.
How to remove that special or unwanted symbol from the starting ?I think it is something to do with writing the file into the directory.While trying to save with other methods I did not find any character or symbols.

� is the Unicode Replacement character, emitted when there's an attempt to read text as if they were encoded with a single-byte codepage using the wrong codepage. It's not a BOM - File.ReadAllText would recognize it and use it to load the rest of the file using the correct encoding. This means there's garbage at the start.
The problem is caused by the inappropriate use of BinaryWriter. That class is used to write fields of primitive types in a binary format to a stream. For variable length types like stings, the first byte(s) contain the field length.
This code :
using var ms=new MemoryStream();
using( BinaryWriter writer = new BinaryWriter(ms))
{
writer.Write(new String('0',3));
}
var b=ms.ToArray();
Produces
3, 48,48,48
Use StreamWriter or File.WriteAllText instead. The default encoding used is UTF8 so there's no need to specify an encoding or try to change anything :
using (FileStream fs = new FileStream( Path.Combine(Application.persistentDataPath , savedName+".json"), FileMode.Create))
using(var writer=new StreamWriter(fs))
{
writer.Write(jsons);
}
or
var path=Path.Combine(Application.persistentDataPath , savedName+".json")
using(var writer=new StreamWriter(path))
{
writer.Write(jsons);
}

first add this to your .cs file
using System.Text.RegularExpressions;
then we can do this with RegEx as follows.
varName = Regex.Replace(SaveValues, "[-*]", "");
This will look for the - symbol and remove it from your string.
Hope this helps.

Encoding string issue reading a CSV file in C#

I am currently developing a Windows Phone 8 application in which one I have to download a CSV file from a web-service and convert data to a C# business object (I do not use a library for this part).
Download the file and convert data to a C# business object is not an issue using RestSharp.Portable, StreamReader class and MemoryStream class.
The issue I face to is about the bad encoding of the string fields.
With the library RestSharp.Portable, I retrieve the csv file content as a byte array and then convert data to string with the following code (where response is a byte array) :
using (var streamReader = new StreamReader(new MemoryStream(response)))
{
while (streamReader.Peek() >= 0)
{
var csvLine = streamReader.ReadLine();
}
}
but instead of "Jérome", my csvLine variable contains J�rome. I tried several things to obtain Jérome but without success like :
using (var streamReader = new StreamReader(new MemoryStream(response), true))
or
using (var streamReader = new StreamReader(new MemoryStream(response), Encoding.UTF8))
When I open the CSV file with a simple notepad software like notepad++ I obtain Jérome only when the file is encoding in ANSI. But if I try the following code in C# :
using (var streamReader = new StreamReader(new MemoryStream(response), Encoding.GetEncoding("ANSI")))
I have the following exception :
'ANSI' is not a supported encoding name.
Can someone help me to decode correctly my CSV file ?
Thank you in advance for your help or advices !

You need to pick one of these.
https://msdn.microsoft.com/en-us/library/windows/desktop/dd317756(v=vs.85).aspx
If you don't know, you can try to guess it. Guessing isn't a perfect solution, per the answer here.
You can't detect the codepage, you need to be told it. You can analyse the bytes and guess it, but that can give some bizarre (sometimes amusing) results.

From the link of Lawtonfogle I tried to use
using (var streamReader = new StreamReader(new MemoryStream(response), Encoding.GetEncoding("Windows-1252")))
But I had the following error :
'Windows-1252' is not a supported encoding name.
Searching why on the internet, I finally found following thread with the following answer that works for me.
So here the working solution in my case :
using (var streamReader = new StreamReader(new MemoryStream(response), Encoding.GetEncoding("ISO-8859-1")))
{
while (streamReader.Peek() >= 0)
{
var csvLine = streamReader.ReadLine();
}
}

Convert a file froM Shift-JIS to UTF8 No BOM without re-reading from disk

I am dealing with files in many formats, including Shift-JIS and UTF8 NoBOM. Using a bit of language knowledge, I can detect if the files are being interepeted correctly as UTF8 or ShiftJIS, but if I detect that the file is not of the type I read in, I was wondering if there is a way to just reinterperet my in-memory array without having to re-read the file with a new encoding specified.
Right now, I read in the file assuming Shift-JIS as such:
using (StreamReader sr = new StreamReader(path, Encoding.GetEncoding("shift-jis"), true))
{
String line = sr.ReadToEnd();
// Detection must be done AFTER you read from the file. Silly rabbit.
fileFormatCertain = !sr.CurrentEncoding.Equals(Encoding.GetEncoding("shift-jis"));
codingFromBOM = sr.CurrentEncoding;
}
and after I do my magic to determine if it is either a known format (has a BOM) or that the data makes sense as Shift-JIS, all is well. If the data is garbage though, then I am re-reading the file via:
using (StreamReader sr = new StreamReader(path, Encoding.UTF8))
{
String line = sr.ReadToEnd();
}
I am trying to avoid this re-read step and reinterperet the data in memory if possible.
Or is magic already happening and I am needlessly worrying about double I/O access?

var buf = File.ReadAllBytes(path);
var text = Encoding.UTF8.GetString(buf);
if (text.Contains("\uFFFD")) // Unicode replacement character
{
text = Encoding.GetEncoding(932).GetString(buf);
}

how to read special character like é, â and others in C#

I can't read those special characters
I tried like this
1st way #
string xmlFile = File.ReadAllText(fileName);
2nd way #
FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read);
StreamReader r = new StreamReader(fs);
string s = r.ReadToEnd();
But both statements don't understand those special characters.
How should I read?
UPDATE ###
I also try all encoding with
string xmlFile = File.ReadAllText(fileName, Encoding. );
but still don't understand those special characters.

There is no such thing as "special character". What those likely are is extended ascii characters from the latin1 set (iso-8859-1).
You can read those by supplying encoding explicitly to the stream reader (otherwise it will assume UTF8)
using (StreamReader r = new StreamReader(fileName, Encoding.GetEncoding("iso-8859-1")))
r.ReadToEnd();

StreamReader sr = new StreamReader(stream, Encoding.UTF8)

This worked for me :
var json = System.IO.File.ReadAllText(#"././response/response.json" , System.Text.Encoding.GetEncoding("iso-8859-1"));

You have to tell the StreamReader that you are reading Unicode like so
StreamReader sr = new StreamReader(stream, Encoding.Unicode);
If your file is of some other encoding, specify it as the second parameter

I had to "find" the encoding of the file first
//try to "find" the encoding, if not found, use UTF8
var enc = GetEncoding(filePath)??Encoding.UTF8;
var text = File.ReadAllText(filePath, enc );
(please refer to this answer to get the GetEncoding function)

If you can modify the file in question, you can save it with encoding.
I had a json file that I had created (normally) in VS, and I was having the same problem. Rather than specify the encoding when reading the file (I was using System.IO.File.ReadAllText which defaults to UTF8), I resaved the file (File->Save As) and on the Save button, I clicked the arrow and chose "Save with Encoding", then chose "Unicode (UTF-8 with signature) - Codepage 65001".
Problem solved, no need to specify the encoding when reading the file.

Writing XML files using XmlTextWriter with ISO-8859-1 encoding

I'm having a problem writing Norwegian characters into an XML file using C#. I have a string variable containing some Norwegian text (with letters like æøå).
I'm writing the XML using an XmlTextWriter, writing the contents to a MemoryStream like this:
MemoryStream stream = new MemoryStream();
XmlTextWriter xmlTextWriter = new XmlTextWriter(stream, Encoding.GetEncoding("ISO-8859-1"));
xmlTextWriter.Formatting = Formatting.Indented;
xmlTextWriter.WriteStartDocument(); //Start doc
Then I add my Norwegian text like this:
xmlTextWriter.WriteCData(myNorwegianText);
Then I write the file to disk like this:
FileStream myFile = new FileStream(myPath, FileMode.Create);
StreamWriter sw = new StreamWriter(myFile);
stream.Position = 0;
StreamReader sr = new StreamReader(stream);
string content = sr.ReadToEnd();
sw.Write(content);
sw.Flush();
myFile.Flush();
myFile.Close();
Now the problem is that in the file on this, all the Norwegian characters look funny.
I'm probably doing the above in some stupid way. Any suggestions on how to fix it?

Why are you writing the XML first to a MemoryStream and then writing that to the actual file stream? That's pretty inefficient. If you write directly to the FileStream it should work.
If you still want to do the double write, for whatever reason, do one of two things. Either
Make sure that the StreamReader and StreamWriter objects you use all use the same encoding as the one you used with the XmlWriter (not just the StreamWriter, like someone else suggested), or
Don't use StreamReader/StreamWriter. Instead just copy the stream at the byte level using a simple byte[] and Stream.Read/Write. This is going to be, btw, a lot more efficient anyway.

Both your StreamWriter and your StreamReader are using UTF-8, because you're not specifying the encoding. That's why things are getting corrupted.
As tomasr said, using a FileStream to start with would be simpler - but also MemoryStream has the handy "WriteTo" method which lets you copy it to a FileStream very easily.
I hope you've got a using statement in your real code, by the way - you don't want to leave your file handle open if something goes wrong while you're writing to it.
Jon

You need to set the encoding everytime you write a string or read binary data as a string.
Encoding encoding = Encoding.GetEncoding("ISO-8859-1");
FileStream myFile = new FileStream(myPath, FileMode.Create);
StreamWriter sw = new StreamWriter(myFile, encoding);
stream.Position = 0;
StreamReader sr = new StreamReader(stream, encoding);
string content = sr.ReadToEnd();
sw.Write(content);
sw.Flush();
myFile.Flush();
myFile.Close();

As mentioned in above answers, the biggest issue here is the Encoding, which is being defaulted due to being unspecified.
When you do not specify an Encoding for this kind of conversion, the default of UTF-8 is used - which may or may not match your scenario. You are also converting the data needlessly by pushing it into a MemoryStream and then out into a FileStream.
If your original data is not UTF-8, what will happen here is that the first transition into the MemoryStream will attempt to decode using default Encoding of UTF-8 - and corrupt your data as a result. When you then write out to the FileStream, which is also using UTF-8 as encoding by default, you simply persist that corruption into the file.
In order to fix the issue, you likely need to specify Encoding into your Stream objects.
You can actually skip the MemoryStream process entirely, also - which will be faster and more efficient. Your updated code might look something more like:
FileStream fs = new FileStream(myPath, FileMode.Create);
XmlTextWriter xmlTextWriter =
new XmlTextWriter(fs, Encoding.GetEncoding("ISO-8859-1"));
xmlTextWriter.Formatting = Formatting.Indented;
xmlTextWriter.WriteStartDocument(); //Start doc
xmlTextWriter.WriteCData(myNorwegianText);
StreamWriter sw = new StreamWriter(fs);
fs.Position = 0;
StreamReader sr = new StreamReader(fs);
string content = sr.ReadToEnd();
sw.Write(content);
sw.Flush();
fs.Flush();
fs.Close();

Which encoding do you use for displaying the result file? If it is not in ISO-8859-1, it will not display correctly.
Is there a reason to use this specific encoding, instead of for example UTF8?

After investigating, this is that worked best for me:
var doc = new XDocument(new XDeclaration("1.0", "ISO-8859-1", ""));
using (XmlWriter writer = doc.CreateWriter()){
writer.WriteStartDocument();
writer.WriteStartElement("Root");
writer.WriteElementString("Foo", "value");
writer.WriteEndElement();
writer.WriteEndDocument();
}
doc.Save("dte.xml");

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Saving unicode characters as "“" into a file; C# - c#

Turns out that reading and writing the files byte by byte solved the problem since the writer never got the opportunity to do any interpretation on the content.

Related

How to remove special character or unwanted symbols from the file written JSON format in C# UNity?

Encoding string issue reading a CSV file in C#

Convert a file froM Shift-JIS to UTF8 No BOM without re-reading from disk

how to read special character like é, â and others in C#

Writing XML files using XmlTextWriter with ISO-8859-1 encoding

Categories

Resources