Xml entity get removed when load a xml document and save it - c#

i have a xml document that contains entity. when i load it and process and save then all entity get converted into UTF encoding. how to get document same as input content.
Input xml is
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE chapter SYSTEM "DTD_v1.5\PLI.dtd">
<chapter num="1" update-date="November 2012" type="chapter">
<page num="3"/>
<title>Artist-Dealer Relations</title>
<para align="left"><content-style font-style="bold">S</content-style>i ­Salander-O’Reilly .</para>
<para>Since that time, the New York state artist consignment statute—which, </para>
<para>The of the artist’s work; the amount o and the like. </para>
<para>But firss of “dealer” and “gallery” interchangeably.</para>
<itemizedlist type="•">
<item><para>To care </para></item>
</itemizedlist>
<chapter>
I want out put also same as the above xml
i have try the following codes
XmlDocument xDoc = new XmlDocument();
string text = string.Empty;
text = ReadFile("38149_Chapter01_Art_Law_XML.xml", enmReadType.None, Encoding.Default);
xDoc.LoadXml(text);
xDoc.Save("38149_Chapter01_Art_Law_XML.xml");
and
string sXmlData = File.ReadAllText("38149_Chapter01_Art_Law_XML.xml", Encoding.Default);
xDoc.LoadXml(sXmlData);
XmlAttribute aa = xDoc.CreateAttribute("Name");
aa.Value = "Sarvesh";
XmlNode snode = xDoc.SelectSingleNode("//chapter");
snode.Attributes.Append(aa);
using (MemoryStream ms = new MemoryStream())
{
XmlWriterSettings settings = new XmlWriterSettings();
settings.Encoding = Encoding.GetEncoding("ISO-8859-1");
XmlWriter xw = XmlWriter.Create(ms, settings);
xDoc.Save(xw);
xw.Close();
using (TextWriter tx = new StreamWriter("38149_Chapter01_Art_Law_XML.xml", false, Encoding.Default))
{
tx.Write(Encoding.UTF8.GetString(ms.ToArray()));
tx.Close();
}
}

Related

C# Diffgram to XElement, Reader/Writer confusion

How do I obtain the Diffgram of a DataSet in an XElement? (Or XDocument)
I found out how to obtain the Diffgram in a string:
// DataSet to Diffgram in a string:
StringBuilder sb = new StringBuilder();
XmlWriterSettings settings = new XmlWriterSettings { Indent = true, Encoding = Encoding.UTF8 };
using (XmlWriter xw = XmlWriter.Create(sb, settings))
{
ds.WriteXml(xw, XmlWriteMode.WriteSchema);//XmlWriteMode.DiffGram
}
string str = sb.ToString();
but it seems wasteful to first output the xml of a diffgram to a string and then parse it back in to an XElement. So I am trying to find out how to fill in the missing link in this code, which should transfer the xml of the diffgram without conversions to an Xml variable:
// I have a DataSet filled with some data
DataSet ds = new DataSet();
ds.Tables.Add(ScanData);
// I need the diffgram of the DataSet in an XElement
XElement xe = null;
XmlReader xr = xe.CreateReader();
// I could live with output to XDocument, and extract the XElement later
XDocument xd = new XDocument();
XmlReader xrd = xd.CreateReader();
// Q: How do I construct a stream that connects an XmlReader to ds.WriteXml()?
Stream stream = ...???... ;
// This method creates the DiffGram output format to a stream
ds.WriteXml(stream, XmlWriteMode.WriteSchema);//diffgram output
I hope to find the answer to my code problem, and maybe even to learn how stream/reader/writer really work.
You need a writer, not reader, to write from DataSet to XDocument:
XDocument xd = new XDocument();
using (var writer = xd.CreateWriter())
{
ds.WriteXml(writer, XmlWriteMode.WriteSchema); // XmlWriteMode.DiffGram
}

Xml file not opening

I am developing C# application, but I encountered a problem while typing into xml file. Let me show the code first:
Company comp = new Company();
comp.CompanyID = comboBox1.SelectedValue.ToString();
comp.CompanyName = comboBox1.Text;
comp.Serial = strEncryptedData;
comp.ListProduct = ll;
XmlDocument xDoc = new XmlDocument();
using (StringWriter stringWriter = new StringWriter())
{
XmlSerializer serializer = new XmlSerializer(typeof(Company));
serializer.Serialize(stringWriter, comp);
xDoc.LoadXml(stringWriter.ToString());
}
string temp = xDoc.OuterXml;
MessageBox.Show(temp);
System.IO.StreamWriter sw = new System.IO.StreamWriter(#"c:\test.xml");
sw.WriteLine(temp);
sw.Flush();
sw.Close();
Program writes the file but when i try to open it in xml format, i recieve blank document, nothing inside. When i opened it in text editor i recieve this:
<?xml version="1.0" encoding="utf-16"?><CompanyXml xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><CompanyName /><CompanyID>100</CompanyID><Serial>00000G2SB4BER9PSFJİ2GTVM2UC1VYEİ</Serial></CompanyXml>
It is the correct data I receive, but however cannot be opened as xml.
How can I format it? Or am I doing something wrong while writing it?
just remove this text encoding="utf-16" from your first line of xml then you will open the xml.
The issue appears to be with the encoding, removing utf-16 or changing to utf-8 corrects it.
You could try the StreamWriter constructor which takes an encoding to see if it saves the .xml with the correct encoding.
For example:
StreamWriter sw = new StreamWriter(#"c:\test.xml", Encoding.UTF8);
However, i found some other way to solve this issue after you guys pointed out utf format issue.
First of all we create a class which extends to StringWriter
public class Utf8StringWriter : StringWriter
{
public override Encoding Encoding
{
get { return Encoding.UTF8; }
}
}
Then we edit code into this by modifying StringWriter:
Company comp = new Company();
comp.CompanyID = comboBox1.SelectedValue.ToString();
comp.CompanyName = comboBox1.Text;
comp.Serial = strEncryptedData;
comp.ListProduct = ll;
XmlDocument xDoc = new XmlDocument();
using (StringWriter stringWriter = new Utf8StringWriter())
{
XmlSerializer serializer = new XmlSerializer(typeof(Company));
serializer.Serialize(stringWriter, comp);
StreamWriter sw = new StreamWriter(#"c:\text.xml");
sw.WriteLine(stringWriter);
sw.Flush();
sw.Close();
}
Regards...

Adding an xsd file value to an XmlAttribute

How I am creating an xml string of proper xml with the code below..
string myInputXmlString = #"<ApplicationData>
<something>else</something>
</ApplicationData>";
var doc = new XmlDocument();
doc.LoadXml(myInputXmlString);
XmlAttribute newAttr = doc.CreateAttribute(
"xsi",
"noNamespaceSchemaLocation",
"http://www.w3.org/2001/XMLSchema-instance");
doc.DocumentElement.Attributes.Append(newAttr);
var ms = new MemoryStream();
XmlWriterSettings ws = new XmlWriterSettings
{
OmitXmlDeclaration = false,
ConformanceLevel = ConformanceLevel.Document,
Encoding = UTF8Encoding.UTF8
};
var tx = XmlWriter.Create(ms, ws);
doc.Save(tx);
tx.Flush();
var xmlString = UTF8Encoding.UTF8.GetString(ms.ToArray());
Console.WriteLine(xmlString);
How do I add the xsd information to this so the xml looks like this (with "FullModeDataset.xsd" includded?
<ApplicationData
xsi:noNamespaceSchemaLocation="FullModeDataset.xsd"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" />
Instead of this which the current code is outputing
<ApplicationData
xsi:noNamespaceSchemaLocation=""
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" />
Does this work by chance?
doc.DocumentElement.SetAttribute("noNamespaceSchemaLocation",
"http://www.w3.org/2001/XMLSchema-instance",
"FullModeDataset.xsd");

When saving an XmlDocument, it ignores the encoding in the XmlDeclaration (UTF8) and uses UTF16

i have the following code:
var doc = new XmlDocument();
XmlDeclaration xmlDeclaration = doc.CreateXmlDeclaration("1.0", "UTF-8", null);
doc.AppendChild(xmlDeclaration);
XmlElement root = doc.CreateElement("myRoot");
doc.AppendChild(root);
root.InnerText = "myInnerText";
StringWriter sw = new StringWriter();
doc.Save(sw);
Console.WriteLine(sw.ToString());
Console.WriteLine();
MemoryStream ms = new MemoryStream();
doc.Save(ms);
Console.WriteLine(Encoding.ASCII.GetString(ms.ToArray()));
And here is the output:
<?xml version="1.0" encoding="utf-16"?>
<myRoot>myInnerText</myRoot>
???<?xml version="1.0" encoding="UTF-8"?>
<myRoot>myInnerText</myRoot>
Basically what it does is make an xml file, and set the encoding to utf8, but when it saves it to stringwriter it ignores my encoding and uses utf16. However, when using a memory stream, it uses utf8 (with the extra BOM chars)
Why is this? Why isn't it honouring my explicit encoding setting of utf-8?
Thanks a lot
Because all you are doing is setting an XML element that says it's UTF-8, you aren't actually saving it as UTF-8. You need to set the output stream to use UTF-8, like this:
var doc = new XmlDocument();
XmlElement root = doc.CreateElement("myRoot");
doc.AppendChild(root);
root.InnerText = "myInnerText";
using(TextWriter sw = new StreamWriter("C:\\output.txt", false, Encoding.UTF8)) //Set encoding
{
doc.Save(sw);
}
Once you do that, you don't even have to add the XML declaration. It figures it out on its own. If you want to save it to a MemoryStream, use a StreamWriter that wraps the MemoryStream.
I use the following method, it writes it out pretty and as UTF-8
public static string Beautify(XmlDocument doc)
{
string xmlString = null;
using (MemoryStream ms = new MemoryStream()) {
XmlWriterSettings settings = new XmlWriterSettings {
Encoding = new UTF8Encoding(false),
Indent = true,
IndentChars = " ",
NewLineChars = "\r\n",
NewLineHandling = NewLineHandling.Replace
};
using (XmlWriter writer = XmlWriter.Create(ms, settings)) {
doc.Save(writer);
}
xmlString = Encoding.UTF8.GetString(ms.ToArray());
}
return xmlString;
}
Call it like:
File.WriteAllText(fileName, Utilities.Beautify(xmlDocument));
From the MSDN we can see...
The encoding on the TextWriter determines the encoding that is written out (The encoding of the XmlDeclaration node is replaced by the encoding of the TextWriter). If there was no encoding specified on the TextWriter, the XmlDocument is saved without an encoding attribute.
If you want to use the encoding from the XmlDeclaration you'll need to use a stream to save the document.

Format XML string to print friendly XML string

I have an XML string as such:
<?xml version='1.0'?><response><error code='1'> Success</error></response>
There are no lines between one element and another, and thus is very difficult to read. I want a function that formats the above string:
<?xml version='1.0'?>
<response>
<error code='1'> Success</error>
</response>
Without resorting to manually write the format function myself, is there any .Net library or code snippet that I can use offhand?
You will have to parse the content somehow ... I find using LINQ the most easy way to do it. Again, it all depends on your exact scenario. Here's a working example using LINQ to format an input XML string.
string FormatXml(string xml)
{
try
{
XDocument doc = XDocument.Parse(xml);
return doc.ToString();
}
catch (Exception)
{
// Handle and throw if fatal exception here; don't just ignore them
return xml;
}
}
[using statements are ommitted for brevity]
Use XmlTextWriter...
public static string PrintXML(string xml)
{
string result = "";
MemoryStream mStream = new MemoryStream();
XmlTextWriter writer = new XmlTextWriter(mStream, Encoding.Unicode);
XmlDocument document = new XmlDocument();
try
{
// Load the XmlDocument with the XML.
document.LoadXml(xml);
writer.Formatting = Formatting.Indented;
// Write the XML into a formatting XmlTextWriter
document.WriteContentTo(writer);
writer.Flush();
mStream.Flush();
// Have to rewind the MemoryStream in order to read
// its contents.
mStream.Position = 0;
// Read MemoryStream contents into a StreamReader.
StreamReader sReader = new StreamReader(mStream);
// Extract the text from the StreamReader.
string formattedXml = sReader.ReadToEnd();
result = formattedXml;
}
catch (XmlException)
{
// Handle the exception
}
mStream.Close();
writer.Close();
return result;
}
This one, from kristopherjohnson is heaps better:
It doesn't require an XML document header either.
Has clearer exceptions
Adds extra behaviour options: OmitXmlDeclaration = true, NewLineOnAttributes = true
Less lines of code
static string PrettyXml(string xml)
{
var stringBuilder = new StringBuilder();
var element = XElement.Parse(xml);
var settings = new XmlWriterSettings();
settings.OmitXmlDeclaration = true;
settings.Indent = true;
settings.NewLineOnAttributes = true;
using (var xmlWriter = XmlWriter.Create(stringBuilder, settings))
{
element.Save(xmlWriter);
}
return stringBuilder.ToString();
}
The simple solution that is working for me:
XmlDocument xmlDoc = new XmlDocument();
StringWriter sw = new StringWriter();
xmlDoc.LoadXml(rawStringXML);
xmlDoc.Save(sw);
String formattedXml = sw.ToString();
Check the following link: How to pretty-print XML (Unfortunately, the link now returns 404 :()
The method in the link takes an XML string as an argument and returns a well-formed (indented) XML string.
I just copied the sample code from the link to make this answer more comprehensive and convenient.
public static String PrettyPrint(String XML)
{
String Result = "";
MemoryStream MS = new MemoryStream();
XmlTextWriter W = new XmlTextWriter(MS, Encoding.Unicode);
XmlDocument D = new XmlDocument();
try
{
// Load the XmlDocument with the XML.
D.LoadXml(XML);
W.Formatting = Formatting.Indented;
// Write the XML into a formatting XmlTextWriter
D.WriteContentTo(W);
W.Flush();
MS.Flush();
// Have to rewind the MemoryStream in order to read
// its contents.
MS.Position = 0;
// Read MemoryStream contents into a StreamReader.
StreamReader SR = new StreamReader(MS);
// Extract the text from the StreamReader.
String FormattedXML = SR.ReadToEnd();
Result = FormattedXML;
}
catch (XmlException)
{
}
MS.Close();
W.Close();
return Result;
}
I tried:
internal static void IndentedNewWSDLString(string filePath)
{
var xml = File.ReadAllText(filePath);
XDocument doc = XDocument.Parse(xml);
File.WriteAllText(filePath, doc.ToString());
}
it is working fine as expected.
.NET 2.0 ignoring name resolving, and with proper resource-disposal, indentation, preserve-whitespace and custom encoding:
public static string Beautify(System.Xml.XmlDocument doc)
{
string strRetValue = null;
System.Text.Encoding enc = System.Text.Encoding.UTF8;
// enc = new System.Text.UTF8Encoding(false);
System.Xml.XmlWriterSettings xmlWriterSettings = new System.Xml.XmlWriterSettings();
xmlWriterSettings.Encoding = enc;
xmlWriterSettings.Indent = true;
xmlWriterSettings.IndentChars = " ";
xmlWriterSettings.NewLineChars = "\r\n";
xmlWriterSettings.NewLineHandling = System.Xml.NewLineHandling.Replace;
//xmlWriterSettings.OmitXmlDeclaration = true;
xmlWriterSettings.ConformanceLevel = System.Xml.ConformanceLevel.Document;
using (System.IO.MemoryStream ms = new System.IO.MemoryStream())
{
using (System.Xml.XmlWriter writer = System.Xml.XmlWriter.Create(ms, xmlWriterSettings))
{
doc.Save(writer);
writer.Flush();
ms.Flush();
writer.Close();
} // End Using writer
ms.Position = 0;
using (System.IO.StreamReader sr = new System.IO.StreamReader(ms, enc))
{
// Extract the text from the StreamReader.
strRetValue = sr.ReadToEnd();
sr.Close();
} // End Using sr
ms.Close();
} // End Using ms
/*
System.Text.StringBuilder sb = new System.Text.StringBuilder(); // Always yields UTF-16, no matter the set encoding
using (System.Xml.XmlWriter writer = System.Xml.XmlWriter.Create(sb, settings))
{
doc.Save(writer);
writer.Close();
} // End Using writer
strRetValue = sb.ToString();
sb.Length = 0;
sb = null;
*/
xmlWriterSettings = null;
return strRetValue;
} // End Function Beautify
Usage:
System.Xml.XmlDocument xmlDoc = new System.Xml.XmlDocument();
xmlDoc.XmlResolver = null;
xmlDoc.PreserveWhitespace = true;
xmlDoc.Load("C:\Test.svg");
string SVG = Beautify(xmlDoc);
Customizable Pretty XML output with UTF-8 XML declaration
The following class definition gives a simple method to convert an input XML string into formatted output XML with the xml declaration as UTF-8. It supports all the configuration options that the XmlWriterSettings class offers.
using System;
using System.Text;
using System.Xml;
using System.IO;
namespace CJBS.Demo
{
/// <summary>
/// Supports formatting for XML in a format that is easily human-readable.
/// </summary>
public static class PrettyXmlFormatter
{
/// <summary>
/// Generates formatted UTF-8 XML for the content in the <paramref name="doc"/>
/// </summary>
/// <param name="doc">XmlDocument for which content will be returned as a formatted string</param>
/// <returns>Formatted (indented) XML string</returns>
public static string GetPrettyXml(XmlDocument doc)
{
// Configure how XML is to be formatted
XmlWriterSettings settings = new XmlWriterSettings
{
Indent = true
, IndentChars = " "
, NewLineChars = System.Environment.NewLine
, NewLineHandling = NewLineHandling.Replace
//,NewLineOnAttributes = true
//,OmitXmlDeclaration = false
};
// Use wrapper class that supports UTF-8 encoding
StringWriterWithEncoding sw = new StringWriterWithEncoding(Encoding.UTF8);
// Output formatted XML to StringWriter
using (XmlWriter writer = XmlWriter.Create(sw, settings))
{
doc.Save(writer);
}
// Get formatted text from writer
return sw.ToString();
}
/// <summary>
/// Wrapper class around <see cref="StringWriter"/> that supports encoding.
/// Attribution: http://stackoverflow.com/a/427737/3063884
/// </summary>
private sealed class StringWriterWithEncoding : StringWriter
{
private readonly Encoding encoding;
/// <summary>
/// Creates a new <see cref="PrettyXmlFormatter"/> with the specified encoding
/// </summary>
/// <param name="encoding"></param>
public StringWriterWithEncoding(Encoding encoding)
{
this.encoding = encoding;
}
/// <summary>
/// Encoding to use when dealing with text
/// </summary>
public override Encoding Encoding
{
get { return encoding; }
}
}
}
}
Possibilities for further improvement:-
An additional method GetPrettyXml(XmlDocument doc, XmlWriterSettings settings) could be created that allows the caller to customize the output.
An additional method GetPrettyXml(String rawXml) could be added that supports parsing raw text, rather than have the client use the XmlDocument. In my case, I needed to manipulate the XML using the XmlDocument, hence I didn't add this.
Usage:
String myFormattedXml = null;
XmlDocument doc = new XmlDocument();
try
{
doc.LoadXml(myRawXmlString);
myFormattedXml = PrettyXmlFormatter.GetPrettyXml(doc);
}
catch(XmlException ex)
{
// Failed to parse XML -- use original XML as formatted XML
myFormattedXml = myRawXmlString;
}
Check the following link: Format an XML file so it looks nice in C#
// Format the XML text.
StringWriter string_writer = new StringWriter();
XmlTextWriter xml_text_writer = new XmlTextWriter(string_writer);
xml_text_writer.Formatting = Formatting.Indented;
xml_document.WriteTo(xml_text_writer);
// Display the result.
txtResult.Text = string_writer.ToString();
It is possible to pretty-print an XML string via a streaming transformation with XmlWriter.WriteNode(XmlReader, true). This method
copies everything from the reader to the writer and moves the reader to the start of the next sibling.
Define the following extension methods:
public static class XmlExtensions
{
public static string FormatXml(this string xml, bool indent = true, bool newLineOnAttributes = false, string indentChars = " ", ConformanceLevel conformanceLevel = ConformanceLevel.Document) =>
xml.FormatXml( new XmlWriterSettings { Indent = indent, NewLineOnAttributes = newLineOnAttributes, IndentChars = indentChars, ConformanceLevel = conformanceLevel });
public static string FormatXml(this string xml, XmlWriterSettings settings)
{
using (var textReader = new StringReader(xml))
using (var xmlReader = XmlReader.Create(textReader, new XmlReaderSettings { ConformanceLevel = settings.ConformanceLevel } ))
using (var textWriter = new StringWriter())
{
using (var xmlWriter = XmlWriter.Create(textWriter, settings))
xmlWriter.WriteNode(xmlReader, true);
return textWriter.ToString();
}
}
}
And now you will be able to do:
var inXml = #"<?xml version='1.0'?><response><error code='1'> Success</error></response>";
var newXml = inXml.FormatXml(indentChars : "", newLineOnAttributes : false); // Or true, if you prefer
Console.WriteLine(newXml);
Which prints
<?xml version='1.0'?>
<response>
<error code="1"> Success</error>
</response>
Notes:
Other answers load the XML into some Document Object Model such as XmlDocument or XDocument/XElement, then re-serialize the DOM with indentation enabled.
This streaming solution completely avoids the added memory overhead of a DOM.
In your question you do not add any indentation for the nested <error code='1'> Success</error> node, so I set indentChars : "". Generally an indentation of two spaces per level of nesting is customary.
Attribute delimiters will be unconditionally transformed to double-quotes if currently single-quotes. (I believe this is true of other answers as well.)
Passing conformanceLevel : ConformanceLevel.Fragment allows strings containing sequences of XML fragments to be formatted.
Other than ConformanceLevel.Fragment, the input XML string must be well-formed. If it is not, XmlReader will throw an exception.
Demo fiddle here.
if you load up the XMLDoc I'm pretty sure the .ToString() function posses an overload for this.
But is this for debugging? The reason that it is sent like that is to take up less space (i.e stripping unneccessary whitespace from the XML).
Hi why don't you just try this:
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.PreserveWhitespace = false;
....
....
xmlDoc.Save(fileName);
PreserveWhitespace = false; that option can be used xml beautifier as well.

Categories

Resources