Reading contents of XML file without having to remove the XML declaration - c#

I want to read all XML contents from a file. The code below only works when the XML declaration (<?xml version="1.0" encoding="UTF-8"?>) is removed. What is the best way to read the file without removing the XML declaration?
XmlTextReader reader = new XmlTextReader(@"c:\my path\a.xml");
reader.Read();
string rs = reader.ReadOuterXml();
Without removing the XML declaration, reader.ReadOuterXml() returns an empty string.
<?xml version="1.0" encoding="UTF-8"?>
<s:Envelope xmlns:s="http://www.w3.org/2003/05/soap-envelope" xmlns:a="http://www.w3.org/2005/08/addressing">
<s:Header>
<a:Action s:mustUnderstand="1">http://www.as.com/ver/ver.IClaimver/Car</a:Action>
<a:MessageID>urn:uuid:b22149b6-2e70-46aa-8b01-c2841c70c1c7</a:MessageID>
<ActivityId CorrelationId="16b385f3-34bd-45ff-ad13-8652baeaeb8a" xmlns="http://schemas.microsoft.com/2004/09/ServiceModel/Diagnostics">04eb5b59-cd42-47c6-a946-d840a6cde42b</ActivityId>
<a:ReplyTo>
<a:Address>http://www.w3.org/2005/08/addressing/anonymous</a:Address>
</a:ReplyTo>
<a:To s:mustUnderstand="1">http://localhost/ver.Web/ver2011.svc</a:To>
</s:Header>
<s:Body xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<Car xmlns="http://www.as.com/ver">
<carApplication>
<HB_Base xsi:type="HB" xmlns="urn:core">
<Header>
<Advisor>
<AdvisorLocalAuthorityCode>11</AdvisorLocalAuthorityCode>
<AdvisorType>1</AdvisorType>
</Advisor>
</Header>
<General>
<ApplyForHB>yes</ApplyForHB>
<ApplyForCTB>yes</ApplyForCTB>
<ApplyForFSL>yes</ApplyForFSL>
<ConsentSupplied>no</ConsentSupplied>
<SupportingDocumentsSupplied>no</SupportingDocumentsSupplied>
</General>
</HB_Base>
</carApplication>
</Car>
</s:Body>
</s:Envelope>
Update
I know other methods that use NON-xml reader (e.g. by using File.ReadAllText()). But I need to know a way that uses an xml method.

Nothing may appear before the <?xml ?> declaration other than a BOM: no text and no whitespace. Between the declaration and the root element, only whitespace, comments, processing instructions and a DOCTYPE are allowed.
Anything else is an invalid document.
UPDATE:
I think your expectation of XmlTextReader.Read() is incorrect.
Each call to XmlTextReader.Read() advances to the next "token" (node) in the XML document, one at a time. A "token" can be an XML element, whitespace, text, or the XML declaration.
Your call to reader.ReadOuterXml() returns an empty string because the first token in your XML file is the XML declaration, and an XML declaration has no outer XML.
Consider this code:
XmlTextReader reader = new XmlTextReader("test.xml");
reader.Read();
Console.WriteLine(reader.NodeType); // XmlDeclaration
reader.Read();
Console.WriteLine(reader.NodeType); // Whitespace
reader.Read();
Console.WriteLine(reader.NodeType); // Element
string rs = reader.ReadOuterXml();
The code above produces this output:
XmlDeclaration
Whitespace
Element
The first "token" is the XML declaration.
The second "token" encountered is the line break after the XML declaration.
The third "token" encountered is the <s:Envelope> element. From here a call to reader.ReadOuterXml() will return what I think you're expecting to see: the text of the <s:Envelope> element, which is the entire SOAP envelope.
If what you really want is to load the XML file into memory as objects, just call
var doc = XDocument.Load("test.xml");
and be done with the parsing in one fell swoop.
Unless you're working with an XML doc that is so monstrously huge that it won't fit in system memory, there's really not a lot of reason to go poking through the XML document one token at a time.
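If you do want to stay with a reader, a minimal sketch of the same idea uses XmlReader.MoveToContent(), which skips non-content nodes such as the XML declaration, whitespace and comments, and stops on the root element. The class name RootReader and the sample markup are assumptions made for illustration; a StringReader stands in for the file.

```csharp
using System;
using System.IO;
using System.Xml;

public static class RootReader
{
    // Returns the markup of the root element, skipping the XML declaration,
    // whitespace, comments and processing instructions in front of it.
    public static string ReadRootOuterXml(TextReader input)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();        // positions the reader on the root element
            return reader.ReadOuterXml();  // markup of that element and its children
        }
    }

    public static void Main()
    {
        // Hypothetical stand-in for the contents of a.xml.
        string xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" +
                     "<root><child>text</child></root>";
        Console.WriteLine(ReadRootOuterXml(new StringReader(xml)));
    }
}
```

To read from disk instead, File.OpenText(path) or XmlReader.Create(path) could replace the StringReader.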

What about
XmlDocument doc = new XmlDocument();
doc.Load(@"c:\my path\a.xml");
//Now we have the XML document - convert it to a String
//There are many ways to do this, one should be:
StringWriter sw=new StringWriter();
doc.Save(sw);
String finalresult=sw.ToString();
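
One thing to be aware of with this approach: a StringWriter reports UTF-16 as its encoding, so Save() rewrites the declaration to encoding="utf-16". The sketch below (class name and sample markup are assumptions) contrasts that with the OuterXml property, which keeps the declaration as loaded.

```csharp
using System;
using System.IO;
using System.Xml;

public static class DocToString
{
    // Serializes through a StringWriter; the declaration follows the writer's encoding.
    public static string ViaStringWriter(XmlDocument doc)
    {
        using (var sw = new StringWriter())
        {
            doc.Save(sw);
            return sw.ToString();
        }
    }

    public static void Main()
    {
        var doc = new XmlDocument();
        // Hypothetical sample document standing in for a.xml.
        doc.LoadXml("<?xml version=\"1.0\" encoding=\"UTF-8\"?><root><child>text</child></root>");

        Console.WriteLine(ViaStringWriter(doc)); // declaration now says utf-16
        Console.WriteLine(doc.OuterXml);         // declaration still says UTF-8
    }
}
```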

EDIT: I'm assuming you mean you actually have text between the document declaration and the root element. If that's not the case, please clarify.
Without removing the extra text, it's simply an invalid XML file. I wouldn't expect it to work. You don't have an XML file - you have something a bit like an XML file, but with extraneous stuff before the root element.

IMHO you can't read this file, because there's plain text before the root element <s:Envelope>, which makes the whole document invalid.

You're parsing an XML document as XML just to obtain the source text? Why?
If you really want to do that then:
string rs;
using(var rdr = new StreamReader(@"c:\my path\a.xml"))
rs = rdr.ReadToEnd();
Will work, but I'm really not sure that is what you actually want. This pretty much ignores that it's XML and just reads the text. Useful for some things, but not a lot.

Related

How to add data after xml in c# and read the xml after?

I have to create a file with an XML header followed by normal data, something like this:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Header>
<Algorithm>alg</Algorithm>
<nod2>aaa</nod2>
<nod3>bbb</nod3>
<node>
<User>
<Email />
<SessionKey />
</User>
</node>
</Header>
Data of the file....
I've already got the code to write it to the file.
Code for that part:
private void setHeader(FileStream output, string nod2, string nod3, string )
{
XmlDocument doc = new XmlDocument();
XmlNode docNode = doc.CreateXmlDeclaration("1.0", "UTF-8", "yes");
doc.AppendChild(docNode);
XmlNode header = doc.CreateElement("Header");
doc.AppendChild(header);
XmlNode algorithm = doc.CreateElement("Algorithm");
algorithm.InnerText = "alg";
header.AppendChild(algorithm);
XmlNode node2= doc.CreateElement("nod2");
node2.InnerText = nod2;
header.AppendChild(node2);
XmlNode node3= doc.CreateElement("nod3");
node3.InnerText = nod3;
header.AppendChild(node3);
XmlNode node= doc.CreateElement("node");
header.AppendChild(node);
XmlNode user1 = doc.CreateElement("User");
node.AppendChild(user1);
XmlNode mail = doc.CreateElement("Email");
user1.AppendChild(mail);
XmlNode sessionKey = doc.CreateElement("SessionKey");
user1.AppendChild(sessionKey);
doc.Save(output);
}
It works pretty well, but when I want to read it with
private void readHeader(FileStream input, out string algorithm)
{
XmlDocument doc = new XmlDocument();
doc.Load(input);
}
I get an error at the point where the "Data of the file..." part starts: "Data on the root level is invalid".
Is there a way to do this with the data after the whole XML, or do I have to add the data as a node?
This can be done in multiple ways. In comments, you've indicated that the best way is unacceptable for reasons outside the scope of the discussion. For completeness, I'm going to put that one first anyway. Skip down to tl;dr for what I think you'll have to end up doing.
The preferred way to do this is to base64 encode the encrypted data and put it in a CDATA block:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<File>
<Header>
<Algorithm>alg</Algorithm>
<nod2>aaa</nod2>
<nod3>bbb</nod3>
<node>
<User>
<Email />
<SessionKey />
</User>
</node>
</Header>
<Data><![CDATA[
ICAgIFhtbE5vZGUgYWxnb3JpdGhtID0gZG9jLkNyZWF0ZUVsZW1lbnQoIkFsZ29yaXRobSIpOw0K
ICAgIGFsZ29yaXRobS5Jbm5lclRleHQgPSAiYWxnIjsNCiAgICBoZWFkZXIuQXBwZW5kQ2hpbGQo
YWxnb3JpdGhtKTsNCiAgICBYbWxOb2RlIG5vZGUyPSBkb2MuQ3JlYXRlRWxlbWVudCgibm9kMiIp
Ow0KICAgIG5vZGUyLklubmVyVGV4dCA9IG5vZDI7DQogICAgaGVhZGVyLkFwcGVuZENoaWxkKG5v
ZGUyKTsNCiAgICBYbWxOb2RlIG5vZGUzPSBkb2MuQ3JlYXRlRWxlbWVudCgibm9kMyIpOw0KICAg
IG5vZGUzLklubmVyVGV4dCA9IG5vZDM7DQogICAgaGVhZGVyLkFwcGVuZENoaWxkKG5vZGUzKTs=
]]></Data>
</File>
That's the canonical answer to this question.
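As a rough sketch of that layout, the code below builds a document with the payload base64-encoded in a CDATA section and then decodes it again. The class and method names are assumptions, and the header is reduced to a single element to keep the example short.

```csharp
using System;
using System.Xml.Linq;

public static class Base64Payload
{
    // Builds <File><Header>...</Header><Data><![CDATA[base64]]></Data></File>.
    public static string Wrap(byte[] data)
    {
        var file = new XElement("File",
            new XElement("Header", new XElement("Algorithm", "alg")),
            new XElement("Data", new XCData(Convert.ToBase64String(data))));
        return file.ToString();
    }

    // Reads the <Data> element back and decodes it to the original bytes.
    public static byte[] Unwrap(string xml)
    {
        XElement file = XElement.Parse(xml);
        return Convert.FromBase64String(file.Element("Data").Value);
    }

    public static void Main()
    {
        byte[] original = { 0x00, 0xFF, 0x10, 0x80 };
        string xml = Wrap(original);
        Console.WriteLine(xml);
        Console.WriteLine(Unwrap(xml).Length); // same number of bytes as the input
    }
}
```

Base64 uses only letters, digits, '+', '/' and '=', so the payload can never terminate the CDATA section early.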
But you've told me that in your case, a requirement has been imposed that you can't do it that way.
Second choice is MIME (actually, MIME might be the first choice and the above might be the second). But I have a feeling they won't like MIME either.
Third choice, read the file as a string and search for some marker that's inserted between the XML and the binary data, something like a MIME boundary.
tl;dr
If they won't let you add such a marker to the file (and I bet they won't), search for the first occurrence of the substring "</Header>":
var xml = File.ReadAllText(filePath);
var endTag = "</Header>";
var headerXML = xml.Substring(0, xml.IndexOf(endTag) + endTag.Length);
var xdHeader = new XmlDocument();
xdHeader.LoadXml(headerXML);
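If the trailing data is binary, reading the whole file as text may corrupt it. A variant of the same idea, sketched here under the assumption that the header is UTF-8 and that "</Header>" does not occur earlier inside the header, works on bytes instead. HeaderSplitter and its members are hypothetical names.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class HeaderSplitter
{
    // Returns the index of the first occurrence of needle in haystack, or -1.
    private static int IndexOf(byte[] haystack, byte[] needle)
    {
        for (int i = 0; i <= haystack.Length - needle.Length; i++)
        {
            int j = 0;
            while (j < needle.Length && haystack[i + j] == needle[j]) j++;
            if (j == needle.Length) return i;
        }
        return -1;
    }

    // Parses everything up to and including </Header> as XML and
    // returns the remaining bytes untouched in payload.
    public static XmlDocument Split(byte[] fileBytes, out byte[] payload)
    {
        byte[] endTag = Encoding.UTF8.GetBytes("</Header>");
        int pos = IndexOf(fileBytes, endTag);
        if (pos < 0) throw new InvalidDataException("No </Header> end tag found.");
        int headerLength = pos + endTag.Length;

        var doc = new XmlDocument();
        using (var ms = new MemoryStream(fileBytes, 0, headerLength))
        {
            doc.Load(ms);
        }

        payload = new byte[fileBytes.Length - headerLength];
        Buffer.BlockCopy(fileBytes, headerLength, payload, 0, payload.Length);
        return doc;
    }

    public static void Main()
    {
        byte[] header = Encoding.UTF8.GetBytes(
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?><Header><Algorithm>alg</Algorithm></Header>");
        byte[] data = { 0x00, 0xFF, 0x3C, 0x80 };
        byte[] file = new byte[header.Length + data.Length];
        header.CopyTo(file, 0);
        data.CopyTo(file, header.Length);

        byte[] payload;
        XmlDocument doc = Split(file, out payload);
        Console.WriteLine(doc.DocumentElement.Name);
        Console.WriteLine(payload.Length);
    }
}
```

If the code that writes the file emits a line break after </Header>, that line break ends up at the start of the payload and would need to be skipped.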
I tested your code by writing directly to a file: doc.Save(@"c:\temp\test1.xml");
Loading from that file works fine, so there is nothing wrong with your XML document. Check your FileStream. Do you flush and close it properly? Does it have UTF-8 encoding?
What is the input in the node strings? Is there anything in it that is invalid according to XML rules?
After the single root element, only comments and processing instructions can be written to an XML document. So you can try to write your data in comments.
It will look like this:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Header>
...
</Header>
<!-- your data -->
<!-- another data -->
This method has limitations: your data may not contain -- (double-hyphen) and may not end in -.
But it is better, of course, not to do so.

Read XML with Arabic data embedded c#

I am trying to load an XML file that contains a mix of ASCII text and Arabic characters. Here is the top snippet:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE TS>
<TS version="2.1" language="ar_EG">
<context>
<message>
<location filename="ui/aboutdialog.cpp" line="90"/>
<source>You have </source>
<translation type="unfinished">يوجد لديك</translation>
</message>
<message>
<location filename="ui/aboutdialog.cpp" line="90"/>
<source> launches left</source>
<translation type="unfinished">عدد التشغيلات المتبقية</translation>
</message>
</context>
I want to load this into a C# TreeView object, but I am having issues loading it into an XDocument or XmlDocument.
Using this:
XDocument xd = XDocument.Load(File.ReadAllText(tbxTSFileName.Text));
or
XDocument xd = XDocument.Load(File.ReadAllText(tbxTSFileName.Text, Encoding.GetEncoding(874)));
Gives me a "Invalid URI: Uri string is too long" error.
Using this:
XmlDocument xd = new XmlDocument();
xd.Load(tbxTSFileName.Text);
Gives the error "Invalid character in the given encoding. Line 9 position 40".
Read the documentation for the method you're calling.
XDocument.Load takes a URL, not an XML string.
You want XDocument.Parse.
Your reader needs to use utf-8, as indicated in the document itself. Ideally, you would use an XML reader and it would take care of using the indicated encoding itself.
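To make the difference concrete, here is a small sketch: XDocument.Load is given a path and detects the encoding from the file itself, while XDocument.Parse is given markup that has already been decoded to a string. The class name, the temporary file name and the shortened sample document are assumptions.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Linq;

public static class LoadVersusParse
{
    // Pass a file path or URI: the XML reader honours the BOM / encoding declaration.
    public static XDocument FromFile(string path)
    {
        return XDocument.Load(path);
    }

    // Pass the markup itself, already decoded to a .NET string.
    public static XDocument FromText(string markup)
    {
        return XDocument.Parse(markup);
    }

    public static void Main()
    {
        string xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n" +
                     "<TS version=\"2.1\" language=\"ar_EG\"><translation>يوجد لديك</translation></TS>";

        // Hypothetical sample file, written as UTF-8 without a BOM.
        string path = Path.Combine(Path.GetTempPath(), "sample_ts.xml");
        File.WriteAllText(path, xml, new UTF8Encoding(false));

        string a = FromFile(path).Root.Element("translation").Value;
        string b = FromText(File.ReadAllText(path, Encoding.UTF8)).Root.Element("translation").Value;
        Console.WriteLine(a == b);
    }
}
```

If the file on disk is not really UTF-8, despite what its declaration says, Load will still fail with an invalid-character error; in that case the file has to be re-saved in the declared encoding.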

XmlException: Text node cannot appear in this state. Line 1, position 1

Before I get into the issue, I'm aware there is another question that sounds exactly the same as mine. However, I've tried that solution (using Notepad++ to encode the xml file as UTF-8 (without BOM) ) and it doesn't work.
XmlDocument namesDoc = new XmlDocument();
XmlDocument factionsDoc = new XmlDocument();
namesDoc.LoadXml(Application.persistentDataPath + "/names.xml");
factionsDoc.LoadXml(Application.persistentDataPath + "/factions.xml");
Above is the code I have problems with. I'm not sure what the problem is.
<?xml version="1.0" encoding="UTF-8"?>
<factions>
<major id="0">
...
Above is a section of the XML file (the start of it - names.xml is also the same except it has no 'id' attribute). The file(s) are both encoded in UTF-8 - in the latest notepad++ version, there is no option of "encode in UTF-8 without BOM" afaik UTF-8 is the same as UTF-8 without BOM.
Does anyone have any idea what the cause may be? Or am I doing something wrong/forgetting something? :/
You are receiving an error because the .LoadXml() method expects a string argument that contains the XML data, not the location of an XML file. If you want to load an XML file then you need to use the .Load() method, not the .LoadXml() method.
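A short sketch of the two calls side by side follows. The temporary file stands in for Application.persistentDataPath + "/factions.xml", and the class name is an assumption; the exact exception text differs between runtimes, so it is only printed, not relied on.

```csharp
using System;
using System.IO;
using System.Xml;

public static class LoadVersusLoadXml
{
    // Load: the argument is a file name (or URL).
    public static XmlDocument FromFile(string path)
    {
        var doc = new XmlDocument();
        doc.Load(path);
        return doc;
    }

    // LoadXml: the argument is the XML markup itself.
    public static XmlDocument FromMarkup(string markup)
    {
        var doc = new XmlDocument();
        doc.LoadXml(markup);
        return doc;
    }

    public static void Main()
    {
        // Hypothetical sample file.
        string path = Path.Combine(Path.GetTempPath(), "factions_sample.xml");
        File.WriteAllText(path,
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?><factions><major id=\"0\" /></factions>");

        Console.WriteLine(FromFile(path).DocumentElement.Name);

        try
        {
            FromMarkup(path); // a path is not XML, so parsing fails at line 1, position 1
        }
        catch (XmlException ex)
        {
            Console.WriteLine(ex.Message);
        }
    }
}
```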

C# XDoc Parse XML string

I'm receiving data via an XML API and it's returning a node like the following:
<?xml version='1.0' encoding='utf-8' ?>
<location>
<name>&Oslash;L Shop</name>
</location>
I have no control over the response but I am trying to Load it into an XDocument in which it fails due to the invalid character.
Is there anything I can do to make this load properly? I want to keep the solution as general as possible because it is possible other invalid characters exist.
Thoughts?
You can use HTML parsers, which are more tolerant of invalid input. For example, using HtmlAgilityPack, this code works without any problem:
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(xml);
var name = doc.DocumentNode.Descendants("name").First().InnerText;
You can't use a bare "&" symbol in the XDocument.Parse input text. Replace it with "&amp;", like this:
<?xml version='1.0' encoding='utf-8' ?>
<location>
<name>&amp;Oslash;L Shop</name>
</location>
Why not just escape any invalid XML characters before you load the response into an XDocument? You could use a regex for this; it should be relatively straightforward.
See escape invalid XML characters in C#
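One possible shape for such a regex, as a sketch rather than a complete solution: escape every ampersand that does not start one of the five predefined XML entities or a numeric character reference. The class name and the sample response are assumptions, and the pattern would also rewrite ampersands inside CDATA sections, which may not be what you want.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class AmpersandFixer
{
    // An ampersand NOT followed by amp; lt; gt; quot; apos; or a numeric reference.
    private static readonly Regex StrayAmpersand =
        new Regex(@"&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)");

    public static string EscapeStrayAmpersands(string xml)
    {
        return StrayAmpersand.Replace(xml, "&amp;");
    }

    public static void Main()
    {
        // Hypothetical response containing an HTML entity that XML does not define.
        string response = "<?xml version='1.0' encoding='utf-8' ?>\n" +
                          "<location><name>&Oslash;L Shop</name></location>";

        XDocument doc = XDocument.Parse(EscapeStrayAmpersands(response));

        // The entity text survives literally; it is not converted to a character.
        Console.WriteLine(doc.Root.Element("name").Value);
    }
}
```

If the entity should become the actual character, the element value could be passed through System.Net.WebUtility.HtmlDecode after parsing.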

How to save this string into XML file?

I have this string variable:
string xml = @"<Contacts>
<Contact>
<Name>Patrick Hines</Name>
<Phone Type=""Home"">206-555-0144</Phone>
<Phone Type=""Work"">425-555-0145</Phone>
<Phone Type=""Mobile"">332-899-5678</Phone>
<Address>
<Street1>123 Main St</Street1>
<City>Mercer Island</City>
<State>WA</State>
<Postal>68042</Postal>
</Address>
</Contact>
<Contact>
<Name>Dorothy Lee</Name>
<Phone Type=""Home"">910-555-1212</Phone>
<Phone Type=""Work"">336-555-0123</Phone>
<Phone Type=""Mobile"">336-555-0005</Phone>
<Address>
<Street1>16 Friar Duck Ln</Street1>
<City>Greensboro</City>
<State>NC</State>
<Postal>27410</Postal>
</Address>
</Contact>
</Contacts>";
How can I save this string into an XML file in my drive c? Using c#.
The fact that it's XML is basically irrelevant. You can save any text to a file very simply with File.WriteAllText:
File.WriteAllText("foo.xml", xml);
Note that you can also specify the encoding, which defaults to UTF-8. So for example, if you want to write a file in plain ASCII:
File.WriteAllText("foo.xml", xml, Encoding.ASCII);
XmlDocument xdoc = new XmlDocument();
xdoc.LoadXml(yourXMLString);
xdoc.Save("myfilename.xml");
If you don't need to do any processing on the string (with an XML library, for example), you could just do:
File.WriteAllText(@"c:\myXml.xml", xml);
System.IO.File.WriteAllText("filename.xml", xml );
You can do this:
string path = @"C:\testfolder\testfile.txt";
using (System.IO.StreamWriter file = new System.IO.StreamWriter(path))
{
file.Write(text);
}
You can also do this after you've created an XML Document, but it is slower:
xdoc.Save
If you want to save the string as-is without performing any check on whether it's well-formed or valid, then as has been answered above, use System.IO.File.WriteAllText(@"C:\myfilename.xml", xml);
As has also been noted, this defaults to saving the file as UTF-8, but you can specify encoding as Jon Skeet mentioned.
I'd recommend adding an XML declaration to the string, e.g.,
<?xml version="1.0" encoding="UTF-8"?>
and ensuring the encoding in the declaration matches that in the WriteAllText method. It's likely to save a fair amount of hassle at a later date, judging by the frequency of XML encoding questions on Stack Overflow.
If you want to ensure the XML is well-formed and/or valid, then you will need to use an XML parser on it first, such as XDocument doc = XDocument.Parse(str); That method also has an overload if you want to preserve whitespace: XDocument.Parse(str, LoadOptions.PreserveWhitespace)
You can then perform validation on it http://msdn.microsoft.com/en-us/library/bb340331.aspx
before saving to file: doc.Save(@"C:\myfilename.xml");
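Put together, the parse-then-save route might look like the sketch below. It only checks well-formedness, not schema validity; the class name, method name and temporary file name are assumptions.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class CheckedSave
{
    // Saves the string only if it parses as well-formed XML.
    // Returns false (and writes nothing) when parsing fails.
    public static bool TrySave(string xml, string path)
    {
        try
        {
            XDocument doc = XDocument.Parse(xml);
            doc.Save(path); // Save also writes an XML declaration
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    public static void Main()
    {
        // Hypothetical output location.
        string path = Path.Combine(Path.GetTempPath(), "contacts_sample.xml");

        Console.WriteLine(TrySave("<Contacts><Contact><Name>Patrick Hines</Name></Contact></Contacts>", path));
        Console.WriteLine(TrySave("<Contacts><Contact></Contacts>", path)); // mismatched tags
    }
}
```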
