Creating XML in C# with UTF8 encoding


I know there are a lot of tutorials about this, and even answered questions here, but I have a problem I have been trying to resolve for hours. I have read almost everything here, and it still remains a mystery to me. Please help:
I'm creating XML, and it is created, but the problem is that the encoding is UTF-16 and it should be UTF-8. This is what I have tried so far, but the result is still UTF-16:
var xmlText = new StringBuilder();
using (var xml = XmlWriter.Create(xmlText))
{
xml.WriteStartDocument();
xml.WriteStartElement("Weather");
if (model.ModuleList[0] != null)
{
foreach (var weather in model.ModuleList)
{
var AddProperty = new Action<XmlWriter, ModuleModel>((a, forc) =>
{
xml.WriteStartElement("Forecast");
a.WriteElementString("Description", forc.Description);
a.WriteElementString("Date", forc.Date.ToString());
a.WriteElementString("MinTemp", forc.Min_Temp.ToString());
a.WriteElementString("MaxTemp", forc.Max_Temp.ToString());
a.WriteElementString("Pressure", forc.Pressure.ToString());
a.WriteElementString("Humidity", forc.Humidity.ToString());
xml.WriteEndElement();
});
AddProperty(xml, weather);
}
}
xml.WriteEndElement();
xml.WriteEndDocument();
}
var xmlresult = xmlText.ToString();
How to set encoding to my XML to UTF-8? Please help...

The result of your code is a string, xmlresult, and strings do not have an encoding: in .NET they are always Unicode (UTF-16 in memory).
You use an encoding when you convert a string to a sequence of bytes, so your problem is not in the piece of code you posted, but in the code you use to write that string to a file.
Something like this:
using (StreamWriter writer = new StreamWriter(fileName, false, Encoding.UTF8))
{
writer.Write(xmlresult);
}
will write a UTF-8 file, where fileName contains the path of the file. The second argument is the append flag; false overwrites an existing file instead of appending a second document to it.
If you need UTF-8 encoded bytes in memory, use:
var utf8Bytes = Encoding.UTF8.GetBytes(xmlresult);
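One detail worth adding: an XmlWriter created over a StringBuilder still emits encoding="utf-16" in the XML declaration, so the text and the bytes on disk can disagree. A minimal sketch of one way around that, assuming it is acceptable to write to a stream instead of a StringBuilder, is shown here. The class name Utf8XmlSketch and the method BuildWeatherXml are hypothetical names used only for illustration.

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class Utf8XmlSketch
{
    // Builds a small XML document and returns it as UTF-8 encoded bytes.
    public static byte[] BuildWeatherXml(string description)
    {
        var settings = new XmlWriterSettings
        {
            Indent = true,
            // UTF-8 without a byte order mark; the writer uses this
            // encoding for the bytes and for the XML declaration.
            Encoding = new UTF8Encoding(false)
        };

        using (var stream = new MemoryStream())
        {
            using (var xml = XmlWriter.Create(stream, settings))
            {
                xml.WriteStartDocument();
                xml.WriteStartElement("Weather");
                xml.WriteElementString("Description", description);
                xml.WriteEndElement();
                xml.WriteEndDocument();
            } // disposing the writer flushes it into the stream

            return stream.ToArray();
        }
    }
}
```

The returned bytes can be written with File.WriteAllBytes, or decoded with Encoding.UTF8.GetString if a string is really needed.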

Related

Convert XmlWriter to Stream / char / byte []

I have an ASP.NET/C#/Blazor environment where a button generates an XML file from a specific class. With XmlWriter I can create the file, and even save/download it, but it ends up on the server side. (It must be downloaded on the client side. Please do not argue about it.)
I know Blazor implements instant downloads (https://learn.microsoft.com/en-us/aspnet/core/blazor/file-downloads?view=aspnetcore-6.0), and this works perfectly with blank/new files. The problem is that I don't know how to pass or convert the XML I previously generated with XmlWriter, because the Blazor methods only accept a Stream, char array, or byte array.
When I tried to convert it, I got the compiler error shown after the code.
Some of my code is:
protected async Task CreateXmlFile(int correlativo,string idDocumento, string siglaDocumento, List<DocumentoXML> documentos = null, List<SignersXML> signersXMLs = null,
List<ContentXMLComplemento> complementos = null,
List<SignersXMLComplemento> signersComplemento = null)
{
_xmlWriterSettings = new XmlWriterSettings
{
Indent = true,
Encoding = new UTF8Encoding(false)
};
string fullPath= "";
XmlWriter writer;
XmlSerializer serializer;
var documentoIngresoRaiz = new DocumentoIngresoRaiz
{
Content_XML = new List<ContentXML>
{
new ContentXML
{
sve_XML = new List<sveXML>
{
new sveXML
{
Documento_XML = documentos
}
}
}
},
Signers_XML = signersXMLs
};
fullPath = $"{mainPath}Ingreso-{correlativo}.xml";
var fileName = $"Ingreso-{correlativo}.xml";
writer = XmlWriter.Create(fullPath, _xmlWriterSettings);
serializer = new XmlSerializer(typeof(DocumentoIngresoRaiz));
serializer.Serialize(writer, documentoIngresoRaiz);
writer.Close();
//I've been trying with these 3 Blazor method lines, to send my xml as stream
var fileStream = new MemoryStream(writer);
using var streamRef = new DotNetStreamReference(stream: fileStream);
await JS.InvokeVoidAsync("downloadFileFromStream", fileName, streamRef);
}
Error CS1503: Argument 1: cannot convert from 'System.Xml.XmlWriter' to 'byte[]'
I've been looking all around StackOverflow and the Internet with no success.
I found some similar posts (I want to download XML file in C# which I created using XmlWriter.Create() on user's system) (How to get a Stream from an XMLWriter?), but they couldn't solve my problem. Any help or tip is welcome. Thank you in advance!

Since I found no way to convert the already generated XML file to a byte/stream/char array, the solution I ended up with was:
saving this XML file on the server side
and then immediately downloading it to the local machine via JavaScript code (pasted below), passing the fileURL (location of the file on the server) and fileName (name of the file)
await JS.InvokeVoidAsync("triggerFileDownload", fileName, fileURL);
function triggerFileDownload(fileName, url) {
const anchorElement = document.createElement('a');
anchorElement.href = url;
anchorElement.download = fileName ?? '';
anchorElement.click();
anchorElement.remove();
}
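As an alternative to the server-side file, XmlWriter.Create also has an overload that takes a Stream, so the serializer output can go straight into a MemoryStream. The following is only a sketch under assumptions: SampleDocument is a hypothetical stand-in for the asker's DocumentoIngresoRaiz type, and XmlToStreamSketch.SerializeToStream is a made-up helper name.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical stand-in for the asker's DocumentoIngresoRaiz type.
public class SampleDocument
{
    public string Title { get; set; }
}

public static class XmlToStreamSketch
{
    // Serializes a value to XML in memory and returns a rewound stream.
    public static MemoryStream SerializeToStream<T>(T value)
    {
        var settings = new XmlWriterSettings
        {
            Indent = true,
            Encoding = new UTF8Encoding(false)
        };

        var stream = new MemoryStream();
        using (var writer = XmlWriter.Create(stream, settings))
        {
            new XmlSerializer(typeof(T)).Serialize(writer, value);
        } // the writer is flushed here; the stream stays open by default

        stream.Position = 0; // rewind so the consumer reads from the start
        return stream;
    }
}
```

Assuming the download helper from the question is available, the returned stream could then be passed to new DotNetStreamReference(stream: ...) in place of the failing new MemoryStream(writer) call.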

How can I read Persian line in csv file c#

I want to read a simple CSV file with comma separated with this code:
var reader = new StreamReader(File.OpenRead(@"d:\34.csv"));
List<string> listA = new List<string>();
List<string> listB = new List<string>();
while (!reader.EndOfStream)
{
var line = reader.ReadLine();
var values = line.Split(',');
listA.Add(values[0]);
listB.Add(values[1]);
}
MessageBox.Show("READ IT!!!");
But when I read the file and debug that code, it cannot read Persian or Arabic characters. How can I solve that? Could it be that my file is not in a valid encoding?

If your CSV file contains just one line, ReadToEnd could be acceptable, but if you have a log file made up of more than one line, it is better to read it line by line using the ReadLine method of the StreamReader object:
using (StreamReader sr = new StreamReader("c:/temp/34.csv"))
{
string currentLine;
// currentLine will be null when the StreamReader reaches the end of file
while((currentLine = sr.ReadLine()) != null)
{
// Search, case insensitive, if the currentLine contains the searched keyword
if(currentLine.IndexOf("I/RPTGEN", StringComparison.CurrentCultureIgnoreCase) >= 0)
{
Console.WriteLine(currentLine);
}
}
}

You can create a class with a get/set property for each column of the CSV. You can then build a list of objects, one per CSV line.
Try it this way:
class Program
{
static void Main(string[] args)
{
var reader = new StreamReader(File.OpenRead(@"YourCSV"),Encoding.Unicode);
List<Customer> customer = new List<Customer>();
while (!reader.EndOfStream)
{
Customer c = new Customer
{
m_line1 = null,
m_line2 = null,
};
var line = reader.ReadLine();
var tokens = line.Split(',');
c.m_line1 = tokens[0];
c.m_line2 = tokens[1];
customer.Add(c);
}
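foreach(var s in customer)
{
// print both columns; printing the object itself would only show its type name
Console.WriteLine(s.m_line1 + "," + s.m_line2);
}
Console.ReadLine();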
}
}
class Customer
{
private string line1;
public string m_line1
{
get
{
return line1;
}
set
{
line1= value;
}
}
private string line2;
public string m_line2
{
get
{
return line2;
}
set
{
line2= value;
}
}
}

You will have to pass the character encoding to the StreamReader constructor. There is no such thing as plain text: reading text requires knowing its encoding.
The line
using (StreamReader sr = new StreamReader("c:/temp/34.csv"))
should be
using (StreamReader sr = new StreamReader("c:/temp/34.csv", myencoding))
What myencoding should be is something only you can know. With what encoding was the file saved? That is the encoding you need there. If the file was generated on Windows, an educated guess at the most likely encoding would be UTF-16LE. That encoding is available as Encoding.Unicode, which is a bad name (it should have been Encoding.UTF16LE), but that is the name the .NET framework uses.
Other possible encodings that are supported by StreamReader are listed on https://msdn.microsoft.com/en-us/library/System.Text.Encoding_properties(v=vs.110).aspx
If you don't know with what encoding the file was saved, some encodings leave a hint in the form of a byte order mark, sometimes abbreviated to BOM. A byte order mark is the first few bytes of a text document that tell you its encoding. You can find more information on the byte order mark, and some of its values, on http://en.wikipedia.org/wiki/Byte_order_mark
Relying on the BOM is generally a bad idea, because
it's not a foolproof solution: some encodings don't use a BOM, or make the BOM optional
even if you successfully determine the encoding, that doesn't mean that StreamReader knows how to handle that encoding (unlikely, but possible)
the BOM might not be a BOM at all, but part of the actual text (also unlikely, but possible)
In some cases it is impossible to know the encoding of a file, notably if the file comes from a file upload on the web, or if someone just mailed you the file and they don't know how they encoded it. This can be a good reason not to allow "plain text" uploads (which is reasonable because, and it bears repeating, there is no such thing as plain text).
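To make the BOM discussion concrete, here is a small sketch that inspects the first bytes of a file and maps the well-known BOM values to a .NET Encoding. BomSniffer and DetectFromBom are hypothetical names, and, as noted above, a null result only means that no BOM was found, not that the file has no encoding.

```csharp
using System.Text;

public static class BomSniffer
{
    // Returns the encoding suggested by a leading byte order mark,
    // or null when no known BOM is present.
    public static Encoding DetectFromBom(byte[] head)
    {
        // UTF-32 LE must be checked before UTF-16 LE,
        // because its BOM starts with the same two bytes (FF FE).
        if (head.Length >= 4 && head[0] == 0xFF && head[1] == 0xFE
            && head[2] == 0x00 && head[3] == 0x00)
            return Encoding.UTF32;

        if (head.Length >= 4 && head[0] == 0x00 && head[1] == 0x00
            && head[2] == 0xFE && head[3] == 0xFF)
            return new UTF32Encoding(true, true); // big-endian UTF-32

        if (head.Length >= 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF)
            return Encoding.UTF8;

        if (head.Length >= 2 && head[0] == 0xFF && head[1] == 0xFE)
            return Encoding.Unicode; // UTF-16 LE

        if (head.Length >= 2 && head[0] == 0xFE && head[1] == 0xFF)
            return Encoding.BigEndianUnicode; // UTF-16 BE

        return null; // no BOM: the encoding has to be known some other way
    }
}
```

A caller would typically read the first four bytes of the file, call DetectFromBom, and fall back to an encoding chosen by the user when the result is null.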
tl;dr: The most likely thing to work is one of
using (StreamReader sr = new StreamReader(File.OpenRead(@"c:/temp/34.csv"),Encoding.Unicode)) {
...
}
or
using (StreamReader sr = new StreamReader(File.OpenRead(@"c:/temp/34.csv"),Encoding.UTF8))
or
using (StreamReader sr = new StreamReader(File.OpenRead(@"c:/temp/34.csv"),Encoding.UTF32))
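Putting the advice together with the loop from the question, a reader that takes the encoding as a parameter might look like the sketch below. CsvEncodingSketch and ReadCsv are hypothetical names, and the naive Split(',') is kept from the question; it does not handle quoted fields.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class CsvEncodingSketch
{
    // Reads a simple comma-separated file using the encoding the caller supplies.
    public static List<string[]> ReadCsv(string path, Encoding encoding)
    {
        var rows = new List<string[]>();
        using (var reader = new StreamReader(path, encoding))
        {
            string line;
            // ReadLine returns null at the end of the file
            while ((line = reader.ReadLine()) != null)
            {
                rows.Add(line.Split(','));
            }
        }
        return rows;
    }
}
```

For example, ReadCsv(@"d:\34.csv", Encoding.UTF8) would be the call to try first if the file was saved as UTF-8.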

Irregular characters added to beginning of file

I'm following Microsoft's tutorial on creating and writing to a simple file and I'm getting unexpected results.
http://msdn.microsoft.com/en-us/library/36b93480%28v=vs.110%29.aspx
Instead of writing a series of numbers to a file, I'm actually writing XML text to a file. But it's adding "Ł" to the very beginning and I don't know why.
Here's the code:
public static void CreateFile(string xml)
{
var dateStamp = DateTime.Now.ToString("yyyy-MM-dd");
var fileName = "file_" + dateStamp + ".xml";
if (File.Exists(fileName))
{
Console.WriteLine("File already exists.");
return;
}
using (FileStream fileStream = new FileStream(fileName, FileMode.CreateNew))
{
using (BinaryWriter writer = new BinaryWriter(fileStream))
{
writer.Write(xml);
}
}
}

When you read the manual for BinaryWriter.Write(string), it reads:
Writes a length-prefixed string to this stream…
So the “inappropriate” character is in fact the length of the string.
You should use a TextWriter-based writer instead (such as StreamWriter), or any other available method for outputting text files.
Also, you should pay attention to the encoding of the text. Specifically, if you constructed the XML with .NET's XML classes and had it written into a string, the <?xml?> declaration will likely refer to utf-16 encoding. This is because .NET strings are stored as UTF-16, with two-byte code units. Hence, when dealing with XML, it is better to use .NET's own means of serializing XML to the output (see e.g. XmlWriter), so that the encoding named in the <?xml?> declaration matches the bytes actually written.
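The length prefix is easy to see by writing the same string through both writers into memory and comparing the bytes. This is a small sketch; LengthPrefixDemo and its method names are made up for the illustration.

```csharp
using System.IO;
using System.Text;

public static class LengthPrefixDemo
{
    // BinaryWriter.Write(string) emits a length prefix followed by the encoded text.
    public static byte[] WithBinaryWriter(string text)
    {
        using (var stream = new MemoryStream())
        {
            using (var writer = new BinaryWriter(stream, Encoding.UTF8))
            {
                writer.Write(text);
            }
            return stream.ToArray();
        }
    }

    // StreamWriter emits only the encoded text (no BOM with UTF8Encoding(false)).
    public static byte[] WithStreamWriter(string text)
    {
        using (var stream = new MemoryStream())
        {
            using (var writer = new StreamWriter(stream, new UTF8Encoding(false)))
            {
                writer.Write(text);
            }
            return stream.ToArray();
        }
    }
}
```

For the input "abc", the BinaryWriter version yields the four bytes 03 61 62 63, while the StreamWriter version yields just 61 62 63. With a longer XML string the prefix takes a larger value (or more than one byte), which an editor then renders as a stray character such as the one in the question.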

That's because you are using a BinaryWriter to write the data to the file. It will write the string in a way that it can be read later, so it will write the string length first to the file, then the string data.
Just write the file as a text file instead. You can use a StreamWriter, or simply use one of the static helper methods in the File class that opens, writes and closes the file for you:
File.WriteAllText(fileName, xml);

This happens if you use a BinaryWriter. If you change it to a StreamWriter this problem goes away.

This is because BinaryWriter prefixes the string with its length (as a 7-bit encoded integer). Use a StreamWriter instead:
public static void CreateFile(string xml)
{
var dateStamp = DateTime.Now.ToString("yyyy-MM-dd");
var fileName = "file_" + dateStamp + ".xml";
if (File.Exists(fileName))
{
Console.WriteLine("File already exists.");
return;
}
using (FileStream fileStream = new FileStream(fileName, FileMode.CreateNew))
{
using (StreamWriter writer = new StreamWriter(fileStream))
{
writer.Write(xml);
}
}
}

How to read an entire file to a string using C#?

What is the quickest way to read a text file into a string variable?
I understand it can be done in several ways, such as read individual bytes and then convert those to string. I was looking for a method with minimal coding.

How about File.ReadAllText:
string contents = File.ReadAllText(@"C:\temp\test.txt");

A benchmark comparison of File.ReadAllLines vs StreamReader ReadLine from C# file handling
Results. StreamReader is much faster for large files with 10,000+
lines, but the difference for smaller files is negligible. As always,
plan for varying sizes of files, and use File.ReadAllLines only when
performance isn't critical.
StreamReader approach
Since File.ReadAllText has already been suggested by others, you can also try the StreamReader approach below. I have not measured the performance impact quantitatively, but it appears to be faster than File.ReadAllText (see the comparison below). The difference will be visible only for larger files, though.
string readContents;
using (StreamReader streamReader = new StreamReader(path, Encoding.UTF8))
{
readContents = streamReader.ReadToEnd();
}
Comparison of File.Readxxx() vs StreamReader.Readxxx()
Viewing the indicative code through ILSpy I have found the following about File.ReadAllLines, File.ReadAllText.
File.ReadAllText - Uses StreamReader.ReadToEnd internally
File.ReadAllLines - Also uses StreamReader.ReadLine internally, with the additional overhead of creating the List<string> to return as the read lines and looping until the end of the file.
So both the methods are an additional layer of convenience built on top of StreamReader. This is evident by the indicative body of the method.
File.ReadAllText() implementation as decompiled by ILSpy
public static string ReadAllText(string path)
{
if (path == null)
{
throw new ArgumentNullException("path");
}
if (path.Length == 0)
{
throw new ArgumentException(Environment.GetResourceString("Argument_EmptyPath"));
}
return File.InternalReadAllText(path, Encoding.UTF8);
}
private static string InternalReadAllText(string path, Encoding encoding)
{
string result;
using (StreamReader streamReader = new StreamReader(path, encoding))
{
result = streamReader.ReadToEnd();
}
return result;
}

string contents = System.IO.File.ReadAllText(path);
Here's the MSDN documentation

For the noobs out there who find this stuff fun and interesting, the fastest way to read an entire file into a string in most cases (according to these benchmarks) is by the following:
using (StreamReader sr = File.OpenText(fileName))
{
string s = sr.ReadToEnd();
}
//you then have to process the string
However, the absolute fastest to read a text file overall appears to be the following:
using (StreamReader sr = File.OpenText(fileName))
{
string s = String.Empty;
while ((s = sr.ReadLine()) != null)
{
//do what you have to here
}
}
Put up against several other techniques, it won out most of the time, including against the BufferedReader.

Take a look at the File.ReadAllText() method
Some important remarks:
This method opens a file, reads each line of the file, and then adds
each line as an element of a string. It then closes the file. A line
is defined as a sequence of characters followed by a carriage return
('\r'), a line feed ('\n'), or a carriage return immediately followed
by a line feed. The resulting string does not contain the terminating
carriage return and/or line feed.
This method attempts to automatically detect the encoding of a file
based on the presence of byte order marks. Encoding formats UTF-8 and
UTF-32 (both big-endian and little-endian) can be detected.
Use the ReadAllText(String, Encoding) method overload when reading
files that might contain imported text, because unrecognized
characters may not be read correctly.
The file handle is guaranteed to be closed by this method, even if
exceptions are raised
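The remark about the ReadAllText(String, Encoding) overload can be illustrated with a short sketch. It assumes the file is known to be ISO-8859-1 (Latin-1) and has no byte order mark; ReadAllTextSketch and ReadLatin1 are hypothetical names.

```csharp
using System.IO;
using System.Text;

public static class ReadAllTextSketch
{
    // Reads a whole file that is known to be saved as ISO-8859-1.
    public static string ReadLatin1(string path)
    {
        // Without the explicit encoding, bytes above 0x7F in such a file
        // would be treated as (invalid) UTF-8 and decoded incorrectly.
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        return File.ReadAllText(path, latin1);
    }
}
```

The same pattern applies to any other encoding: obtain the Encoding instance and pass it as the second argument.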

string text = File.ReadAllText("Path"); you have all text in one string variable. If you need each line individually you can use this:
string[] lines = File.ReadAllLines("Path");

System.IO.StreamReader myFile =
new System.IO.StreamReader("c:\\test.txt");
string myString = myFile.ReadToEnd();

If you want to pick up a file from the Bin folder of the application, you can try the following, and don't forget to handle exceptions.
string content = File.ReadAllText(Path.Combine(System.IO.Directory.GetCurrentDirectory(), @"FilesFolder\Sample.txt"));

@Cris sorry. This is a quote from MSDN (Microsoft):
Methodology
In this experiment, two classes will be compared. The StreamReader and the FileStream class will be directed to read two files of 10K and 200K in their entirety from the application directory.
StreamReader (VB.NET)
sr = New StreamReader(strFileName)
Do
line = sr.ReadLine()
Loop Until line Is Nothing
sr.Close()
FileStream (VB.NET)
Dim fs As FileStream
Dim temp As UTF8Encoding = New UTF8Encoding(True)
Dim b(1024) As Byte
fs = File.OpenRead(strFileName)
Do While fs.Read(b, 0, b.Length) > 0
temp.GetString(b, 0, b.Length)
Loop
fs.Close()
Result
FileStream is obviously faster in this test. It takes an additional 50% more time for StreamReader to read the small file. For the large file, it took an additional 27% of the time.
StreamReader is specifically looking for line breaks while FileStream does not. This will account for some of the extra time.
Recommendations
Depending on what the application needs to do with a section of data, there may be additional parsing that will require additional processing time. Consider a scenario where a file has columns of data and the rows are CR/LF delimited. The StreamReader would work down the line of text looking for the CR/LF, and then the application would do additional parsing looking for a specific location of data. (Did you think String. SubString comes without a price?)
On the other hand, the FileStream reads the data in chunks and a proactive developer could write a little more logic to use the stream to his benefit. If the needed data is in specific positions in the file, this is certainly the way to go as it keeps the memory usage down.
FileStream is the better mechanism for speed but will take more logic.

well the quickest way meaning with the least possible C# code is probably this one:
string readText = System.IO.File.ReadAllText(path);

you can use :
public static void ReadFileToEnd()
{
try
{
//provide to reader your complete text file
using (StreamReader sr = new StreamReader("TestFile.txt"))
{
String line = sr.ReadToEnd();
Console.WriteLine(line);
}
}
catch (Exception e)
{
Console.WriteLine("The file could not be read:");
Console.WriteLine(e.Message);
}
}

string content = System.IO.File.ReadAllText( @"C:\file.txt" );

You can use like this
public static string ReadFileAndFetchStringInSingleLine(string file)
{
StringBuilder sb;
try
{
sb = new StringBuilder();
using (FileStream fs = File.Open(file, FileMode.Open))
{
using (BufferedStream bs = new BufferedStream(fs))
{
using (StreamReader sr = new StreamReader(bs))
{
string str;
while ((str = sr.ReadLine()) != null)
{
sb.Append(str);
}
}
}
}
return sb.ToString();
}
catch (Exception ex)
{
return "";
}
}
Hope this will help you.

you can read a text from a text file in to string as follows also
string str = "";
StreamReader sr = new StreamReader(Application.StartupPath + "\\Sample.txt");
while(sr.Peek() != -1)
{
str = str + sr.ReadLine();
}

I made a comparison between a ReadAllText and StreamBuffer for a 2Mb csv and it seemed that the difference was quite small but ReadAllText seemed to take the upper hand from the times taken to complete functions.

I'd highly recommend using File.ReadLines(path) compared to StreamReader or any other file reading method. Please find below the detailed performance benchmark for both a small file and a large file.
I hope this would help.
File read benchmark results (images not reproduced): one for a small file (just 8 lines) and one for a larger file (128465 lines).
Readlines Example:
public void ReadFileUsingReadLines()
{
var contents = File.ReadLines(path);
}
Note : Benchmark is done in .NET 6.

This comment is for those who are trying to read a complete text file in a WinForms application using C++/CLI with the help of the .NET ReadAllText function:
using namespace System;
using namespace System::IO;
String^ filename = gcnew String(charfilename);
if(System::IO::File::Exists(filename))
{
String^ data = System::IO::File::ReadAllText(filename)->Replace("\0", Environment::NewLine);
textBox1->Text = data;
}

Reading iso-8859-1 rss feed C# WP7

I'm trying to read a rss feed which uses the iso-8859-1 encoding.
I can get all elements fine; the problem is that when I put them in a TextBlock, not all characters are shown. I'm not sure what I'm doing wrong. I've tried a few solutions I found on Google, but they didn't work for me. I must be missing something. It's also the first time I have really worked with anything other than UTF-16, so I never had to convert anything before.
The app works as follows: I call DownloadStringAsync (WebClient), and when it completes I get a string containing the complete RSS feed.
I have tried getting the bytes and then using Encoding.Convert, but I must be missing something.
Here is a sample:
WebClient RSS = new WebClient();
RSS.Encoding = Encoding.GetEncoding("ISO-8859-1");
RSS.DownloadStringCompleted += new DownloadStringCompletedEventHandler(RSS_DSC);
RSS.DownloadStringAsync(new Uri("some rss feed"));
public void RSS_DSC(object sender, DownloadStringCompletedEventArgs args)
{
_xml = XElement.Parse(args.Result);
foreach(XElement item in _xml.Elements("channel").Elements("item"))
{
feeditem.title = item.Element("title").Value;
// + all other items
}
}
I've tried this as well:
private void RSS_ORC(object sender, OpenReadCompletedEventArgs args)
{
Encoding e = Encoding.GetEncoding("ISO-8859-1");
Stream ez = args.Result;
StreamReader rdr = new StreamReader(ez, e);
XElement _xml = XElement.Parse(rdr.ReadToEnd());
feedlist = new List<Code.NewsItem>();
XNamespace dc = "http://purl.org/dc/elements/1.1/";
foreach (XElement item in _xml.Elements("channel").Elements("item"))
{
Code.NewsItem feeditem = new Code.NewsItem();
feeditem.title = item.Element("title").Value;
feeditem.description = item.Element("description").Value;
feeditem.pubdate = item.Element("pubDate").Value;
feeditem.author = item.Element(dc + "creator").Value;
feedlist.Add(feeditem);
}
listBox1.ItemsSource = feedlist;
}
The titles also contain characters that are not displayed correctly. I can get the encoding to work partially, but I still see characters such as a square with a question mark, a plain question mark, or a single square.
Don't get me wrong, I'm a total beginner at this, but the solutions that have been posted on the web do not solve it for me.
Note that I removed the encoding part because it wasn't working :/
If someone were able to help me, that would be amazing.

You can specify an encoding by setting it before calling client.DownloadStringAsync:
webClient.Encoding = Encoding.GetEncoding("iso-8859-1");
In your code sample you do not create the XML doc anywhere. Is some code missing? You should initialize it with something like the following (XDocument.Parse takes the XML text itself, whereas XDocument.Load(string) expects a URI or file path):
var xml = XDocument.Parse(args.Result);

If it helps, you can use:
var myString = HttpUtility.HtmlDecode(feeditem.description);
This way every special character will be decoded, and you can then display myString correctly.

Windows Phone 7 and Silverlight do not support other encodings such as ISO-8859-1; they only support ASCII and the Unicode encoders. For anything else you will need to use OpenReadAsync to get a stream of bytes and then apply your own implementation of an encoding.
This blog might be helpful to you in creating one.

ISO-8859-1 most definitely is supported in WP7. It is the only one of the ISO-8859-* encodings that is. I use an XmlReader to deserialize RSS streams and UTF-* and ISO-8859-1 are the only encodings that are supported by that class (windows-* and ISO-8859-2 and above throw exceptions in the XmlReader c'tor).
Try using an XmlReader like this (without specifying the encoding):
using (XmlReader reader = XmlReader.Create(stream))
{
...
}
The XmlReader will get the encoding from the xml declaration in the stream.
You may still have problems displaying the upper half of the characters (above 0x80). I had this problem in feed me (my WP7 app) and used this little hack to fix things up:
public static string EncodeHtml(string text)
{
if (text == null) return string.Empty;
StringBuilder decodedText = new StringBuilder();
foreach (char value in text)
{
int i = (int)value;
if (i > 127)
{
decodedText.Append(string.Format("&#{0};", i));
}
else
{
decodedText.Append(value);
}
}
return decodedText.ToString();
}
It only works in a WebBrowser control of course, but that is the only place that I ever saw an incorrect display.
Hope this helps,
Calum
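The XmlReader suggestion above can be sketched as follows: the reader is created directly over the byte stream, so it reads the encoding from the XML declaration instead of relying on a string that was already decoded. FeedParsingSketch and LoadFeed are hypothetical names, and the sketch assumes a runtime where ISO-8859-1 is available to XmlReader.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class FeedParsingSketch
{
    // Parses a feed from raw bytes; the encoding comes from the
    // <?xml ... encoding="..."?> declaration in the stream itself.
    public static XDocument LoadFeed(Stream stream)
    {
        using (XmlReader reader = XmlReader.Create(stream))
        {
            return XDocument.Load(reader);
        }
    }
}
```

In the question's OpenReadCompleted handler, args.Result is already a Stream, so it could be passed to LoadFeed without going through a StreamReader first.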

This worked for me when I needed to decode the RSS XML. It's generic enough that it will support all encodings supported by .NET:
WebClient wcRSSFeeds = new WebClient();
String rssContent;
// Support for international chars
Encoding encoding = wcRSSFeeds.Encoding;
if (encoding != null)
{
encoding = Encoding.GetEncoding(encoding.BodyName);
}
else
{
encoding = Encoding.UTF8; // set to standard if none given
}
Stream stRSSFeeds = wcRSSFeeds.OpenRead(feedURL); // feedURL is a string eg, "http://blah.com"
using (StreamReader srRSSFeeds = new StreamReader(stRSSFeeds, encoding, false))
{
rssContent = srRSSFeeds.ReadToEnd();
}
