Copying an existing XML file to a new XML file - c#

I have an existing XML file that has the following start element:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE mynms SYSTEM 'mynms20.dtd'>
<mynms version="2.0" xmlns="mynms20.xsd"> <!-- This is the troublesome line-->
<cmData type="actual">
<header>
<log dateTime="2011-10-17T06:07:07" action="created" appInfo="ActualExporter">InternalValues are used</log>
</header>
.....
</cmData>
I'm reading this file with C#'s XmlReader, altering certain elements, and writing the result to another file with C#'s XmlWriter.
When the XmlReader reads an element, I do the following:
writer.WriteStartElement(xmlReader.Prefix, xmlReader.Name, null);
writer.WriteAttributes(xmlReader, true); // This line throws the exception. Without it, everything is OK.
But I get an exception stating: The prefix ' ' cannot be redefined from ' ' to 'mynms20.xsd' within the same start element tag. What does this mean, and how can I copy the namespace and attributes over to the other file?
Many thanks.
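One likely reading of the error, offered as a sketch rather than a definitive answer: passing null as the namespace to WriteStartElement puts the element in no namespace, so when WriteAttributes then copies the xmlns attribute, the writer is asked to rebind the default prefix on a start tag it has already begun. Passing the reader's LocalName and NamespaceURI avoids the conflict. The names XmlCopySketch and Copy are made up for illustration, the input is assumed to be a string without the DOCTYPE line, and only element, text, and end-element nodes are copied.

```csharp
using System;
using System.IO;
using System.Xml;

public static class XmlCopySketch
{
    // Copies elements, attributes (including xmlns declarations) and text
    // from the input XML to a new XML string.
    public static string Copy(string inputXml)
    {
        var output = new StringWriter();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };

        using (XmlReader reader = XmlReader.Create(new StringReader(inputXml)))
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                    {
                        // Use LocalName and NamespaceURI instead of Name and null,
                        // so the writer already knows the element's namespace
                        // before the xmlns attribute is copied.
                        writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                        writer.WriteAttributes(reader, true);
                        if (reader.IsEmptyElement)
                        {
                            writer.WriteEndElement();
                        }
                        break;
                    }
                    case XmlNodeType.Text:
                        writer.WriteString(reader.Value);
                        break;
                    case XmlNodeType.EndElement:
                        writer.WriteFullEndElement();
                        break;
                }
            }
        }

        return output.ToString();
    }
}
```

Any per-element changes would go inside the Element case, before or instead of the plain copy. For a file that keeps its DOCTYPE line, the reader's DTD handling would also need to be configured, which this sketch leaves out.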

Related

DTD must be defined before the document root element

I'm creating an XML file programmatically. To create the starting tag I have the code:
Dim XDoc As XDocument = <?xml version="1.0" encoding="UTF-8" standalone="yes"?><Customers></Customers>
It's then followed by looping through the data, adding in the required elements using the Root.Add method. The XML displays in the browser successfully.
Xdoc.Root.Add(<customer>
<fields>
</customer>
When some clients connect to this XML data, if there is no data to retrieve the page is displayed as:
<?xml version="1.0"?>
<Customers/>
When reading the XML URL from a .NET project:
Dim Xdoc As XDocument = XDocument.Load(UrlToXmlFile)
The error "DTD must be defined before the document root element" is thrown.
Although I can trap the error, I thought perhaps I may have done something wrong when creating the XML (XML isn't my strong point).
Some sites suggest adding a DTD (<!DOCTYPE note SYSTEM "Note.dtd"> for example). I don't know if this is correct, if I can ignore the error, or if there is a better way to declare this.
Update: when I view the page directly in Chrome, it displays the XML as
<Customers>
<customer>....</customer>
<customer>....</customer>
</Customers>
in IE it displays as
<?xml version="1.0"?>
<Customers>
<customer>....</customer>
<customer>....</customer>
</Customers>
but in both browsers, when I look at 'view source', it shows
<Customers>
<customer>....</customer>
<customer>....</customer>
</Customers>
I don't know if this would be an issue.
Update 2
XDoc.Save(Sr)
Response.Clear()
Response.Buffer = True
Response.Charset = ""
Response.Cache.SetCacheability(HttpCacheability.NoCache)
Response.ContentType = "application/xml"
Response.Write(Sr.GetStringBuilder.ToString)
Response.Flush()
Response.End()
Here is an example showing the generation, saving, and loading of your XML with no errors.
Dim XmlFile As String = "C:\Temp\TestData.xml"
Dim XDoc As XDocument = <?xml version="1.0" encoding="UTF-8"?><Customers></Customers>
For ForCount As Integer = 0 To 10
    ' Add a real <customer> element; passing a string would add escaped text instead
    XDoc.Root.Add(New XElement("customer", "Customer" & ForCount.ToString()))
Next
XDoc.Save(XmlFile)
Dim XDocReader As XDocument = XDocument.Load(XmlFile)
Also, it sounds like you might be using a web service. Use Fiddler to verify that your web service is not adding in that attribute when serving the data. I do not see how you are exporting the XML. Make sure you are not just calling .ToString on XDoc, because that leaves out the XML declaration.
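To illustrate the point about ToString, here is a small C# sketch (the question uses VB.NET, but the XDocument API is the same). The class and method names are hypothetical. XDocument.ToString() leaves out the XML declaration, while Save to a TextWriter writes one; with a StringWriter the declared encoding is utf-16, because .NET strings are UTF-16.

```csharp
using System;
using System.IO;
using System.Xml.Linq;

public static class DeclarationSketch
{
    // Serializes the document without the <?xml ...?> declaration.
    public static string WithoutDeclaration(XDocument doc)
    {
        return doc.ToString();
    }

    // Serializes the document including the <?xml ...?> declaration.
    public static string WithDeclaration(XDocument doc)
    {
        var sw = new StringWriter();
        doc.Save(sw);
        return sw.ToString();
    }
}
```

If the response has to declare UTF-8, saving to a stream or to a writer whose Encoding is UTF-8 is one option; which one fits depends on how the page writes its response.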

how to add uri of lexicon file in xml

I am writing a custom grammar, for which I created a lexicon file that I reference from the XML grammar file. I want to ship the lexicon to end users as an embedded resource, so how can I reference its URI in the XML?
My XML is:
<?xml version="1.0" encoding="utf-8"?>
<grammar
version="1.0" mode="voice" root="Voice_Automator"
xml:lang="en-IN" tag-format="semantics/1.0"
sapi:alphabet="x-microsoft-ups"
xml:base="http://www.contoso.com/"
xmlns="http://www.w3.org/2001/06/grammar"
xmlns:sapi="http://schemas.microsoft.com/Speech/2002/06/SRGSExtensions">
**<lexicon uri="C:\Users\Agrawal\Documents\Visual Studio 2010\Projects\123.pls" />**
I want to use the 123.pls file as an embedded resource so that the program loads it properly on the end user's machine.
Use one of the XML link mechanisms:
XLink
<lexicon xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="C:\Users\Agrawal\Documents\Visual Studio 2010\Projects\123.pls"/>
External Entity
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE foo [<!ENTITY bar SYSTEM "C:\Users\Agrawal\Documents\Visual Studio 2010\Projects\123.pls"> ]>
<grammar version="1.0" mode="voice" root="Voice_Automator" xml:lang="en-IN" tag-format="semantics/1.0" sapi:alphabet="x-microsoft-ups" xml:base="http://www.contoso.com/" xmlns="http://www.w3.org/2001/06/grammar" xmlns:sapi="http://schemas.microsoft.com/Speech/2002/06/SRGSExtensions">
&bar;
</grammar>

Modifying xml file without changing special chars

I have an XML file with structure:
<?xml version='1.0'?>
<a>
<b>
<d>
<LineCode>0</LineCode>
<LineName>Metro</LineName>
<LineDescription>Test C&#38;C all countries with MCFM</LineDescription>
</d>
.......
<e>.....</e>
<f>....</f>
</b>
</a>
I added a section to this file with the following code:
XElement newElement = new XElement("e",
    new XElement("e1", "test1"),
    new XElement("e2", "test2"),
    new XElement("e3", "test3"));
// doc.Root is <a>, so <d> is reached through its child <b>
doc.Root.Element("b").Element("d").AddAfterSelf(newElement);
doc.Save(file.Directory + "//" + file.Name);
After I run this code, all the special characters used in the initial XML file are modified.
For example, the first row became:
<?xml version="1.0" encoding="utf-8"?>
and the LineDescription line became:
<LineDescription>Test C&amp;C all countries with MCFM</LineDescription>
How can I add the new section to my XML file without modifying the existing characters? Or how can I save without modifying the existing special characters?
== Observation #1 ==
<!-- Before: -->
<?xml version='1.0'?>
<!-- After: -->
<?xml version="1.0" encoding="utf-8"?>
Explanation: If the encoding attribute is omitted, utf-8 is the default.
== Observation #2 ==
<!-- Before: -->
&#38;
<!-- After: -->
&amp;
Explanation: These XML entities for representing the ampersand are equivalent.
== Summary ==
The new files are to be used by other applications and it is possible that there
might be problems reading or processing the new files.
Well-behaved XML processing software should treat your before- and after- documents in an equivalent fashion. So, if you encounter problems reading or processing the newly edited XML files, those problems really should be addressed. But it is possible that you may not have the kinds of problems that you anticipate.
If the code processing your file is supposed to handle standard XML, it should not matter which form the characters are stored in. Your original file uses a numeric character reference for the character, while the newly saved file uses the predefined entity name. The same applies to the XML declaration: version='1.0' and version="1.0" should be treated exactly the same, and the additional encoding attribute just identifies which character set was used in writing the file.
Provided your other applications are using standard XML parsers, or custom parsers which are capable of reading standard XML there should be no problem with the modified XML. The only issue you might have is if these other applications cannot read standard XML (ie they assume that all values use single quotes, or they don't correctly process the XML standard entities, etc) - in this case you may need to use a filtering parser on any file sent to those applications to ensure that these requirements are met. (ie a simple SAX parser which writes out the file as the events are triggered using the additional limitations)
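The equivalence described in these answers can be checked with a short sketch. The class and method names are hypothetical; the only API used is XDocument.Parse. Both a numeric character reference and the predefined entity should yield the same parsed text, a plain ampersand.

```csharp
using System;
using System.Xml.Linq;

public static class EntityEquivalenceSketch
{
    // Parses the XML and returns the text content of the root element,
    // with all character references and entities already resolved.
    public static string ReadValue(string xml)
    {
        return XDocument.Parse(xml).Root.Value;
    }
}
```

A consumer that compares parsed values, as this method does, sees no difference between the two spellings. Only software that compares the raw bytes of the file would notice the change.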

Getting garbled characters

My aim here is to convert the original xml file through some xsl to the destination having a utf-8 encoding. Here is the original xml file with the following header:
<?xml version='1.0' encoding='ISO-8859-1'?>
I'm transforming this using xsl to another xml file. The xsl file has the following header:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns="http://www.w3.org/1999/xhtml"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0"
xmlns:xalan="http://xml.apache.org/xalan"
exclude-result-prefixes="xalan">
<xsl:output method="xml"
encoding="UTF-8"
indent="yes"
xalan:indent-amount="4"/>
Here is the C# code:
XPathDocument myXPathDoc = new XPathDocument(FileName);
XslCompiledTransform myXslTrans = new XslCompiledTransform();
myXslTrans.Load("C:/test/test.xsl");
XmlTextWriter myWriter = new XmlTextWriter(destinationFile, Encoding.UTF8);
myWriter.Formatting = Formatting.Indented;
myWriter.Indentation = 4;
myXslTrans.Transform(myXPathDoc, null, myWriter);
myWriter.Close();
The output at destinationFile contains garbled Arabic text. How do I get it to produce proper Arabic text?
EDIT, Question 2:
The original XML file is missing the closing tags for the root and child elements. How do I edit this XML to include them?
For example, the original XML file below is missing the closing tags for aaaa and nnnn. How do I add them using C#?
<aaaa>
<nnnn>
<rrrr>
</rrrr>
If your original XML file contains Arabic characters then its XML declaration is lying - the file is not encoding="ISO-8859-1" as that encoding cannot represent Arabic.
If you can determine what encoding the original file really uses you can force the file to be read in that encoding by using the XPathDocument constructor that takes a TextReader instead of the one that takes a file name. For Arabic, the encoding is probably Windows-1256.
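A minimal sketch of that suggestion, assuming the real encoding is known and passed in by the caller. The class and method names are made up for illustration. Because the StreamReader has already decoded the bytes, the encoding named in the file's XML declaration is not used to decode them. On .NET Core and later, Windows-1256 is only available after registering the code pages provider from System.Text.Encoding.CodePages, which is why the encoding is a parameter here rather than hard-coded.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.XPath;

public static class EncodingOverrideSketch
{
    // Loads an XPathDocument, decoding the bytes with the encoding the
    // caller supplies instead of the one the XML declaration claims.
    public static XPathDocument Load(Stream input, Encoding actualEncoding)
    {
        using (var reader = new StreamReader(input, actualEncoding))
        {
            // The document is read fully by the constructor, so the
            // reader can be disposed as soon as it returns.
            return new XPathDocument(reader);
        }
    }
}
```

The returned document can then be passed to XslCompiledTransform.Transform in place of the one built from the file name.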

Reading contents of XML file without having to remove the XML declaration

I want to read all XML contents from a file. The code below only works when the XML declaration (<?xml version="1.0" encoding="UTF-8"?>) is removed. What is the best way to read the file without removing the XML declaration?
XmlTextReader reader = new XmlTextReader(@"c:\my path\a.xml");
reader.Read();
string rs = reader.ReadOuterXml();
Without removing the XML declaration, reader.ReadOuterXml() returns an empty string.
<?xml version="1.0" encoding="UTF-8"?>
<s:Envelope xmlns:s="http://www.w3.org/2003/05/soap-envelope" xmlns:a="http://www.w3.org/2005/08/addressing">
<s:Header>
<a:Action s:mustUnderstand="1">http://www.as.com/ver/ver.IClaimver/Car</a:Action>
<a:MessageID>urn:uuid:b22149b6-2e70-46aa-8b01-c2841c70c1c7</a:MessageID>
<ActivityId CorrelationId="16b385f3-34bd-45ff-ad13-8652baeaeb8a" xmlns="http://schemas.microsoft.com/2004/09/ServiceModel/Diagnostics">04eb5b59-cd42-47c6-a946-d840a6cde42b</ActivityId>
<a:ReplyTo>
<a:Address>http://www.w3.org/2005/08/addressing/anonymous</a:Address>
</a:ReplyTo>
<a:To s:mustUnderstand="1">http://localhost/ver.Web/ver2011.svc</a:To>
</s:Header>
<s:Body xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<Car xmlns="http://www.as.com/ver">
<carApplication>
<HB_Base xsi:type="HB" xmlns="urn:core">
<Header>
<Advisor>
<AdvisorLocalAuthorityCode>11</AdvisorLocalAuthorityCode>
<AdvisorType>1</AdvisorType>
</Advisor>
</Header>
<General>
<ApplyForHB>yes</ApplyForHB>
<ApplyForCTB>yes</ApplyForCTB>
<ApplyForFSL>yes</ApplyForFSL>
<ConsentSupplied>no</ConsentSupplied>
<SupportingDocumentsSupplied>no</SupportingDocumentsSupplied>
</General>
</HB_Base>
</carApplication>
</Car>
</s:Body>
</s:Envelope>
Update
I know other methods that use NON-xml reader (e.g. by using File.ReadAllText()). But I need to know a way that uses an xml method.
There can be no text or whitespace before the <?xml ?> declaration other than a BOM, and no character data other than whitespace between the declaration and the root element (comments, processing instructions, and a DOCTYPE are allowed there).
Anything else makes the document not well-formed.
UPDATE:
I think your expectation of XmlTextReader.Read() is incorrect.
Each call to XmlTextReader.Read() advances to the next "token" in the XML document, one token at a time. A "token" here can be an element, whitespace, text, or the XML declaration.
Your call to reader.ReadOuterXml() returns an empty string because the first token in your XML file is the XML declaration, and an XML declaration has no outer XML.
Consider this code:
XmlTextReader reader = new XmlTextReader("test.xml");
reader.Read();
Console.WriteLine(reader.NodeType); // XmlDeclaration
reader.Read();
Console.WriteLine(reader.NodeType); // Whitespace
reader.Read();
Console.WriteLine(reader.NodeType); // Element
string rs = reader.ReadOuterXml();
The code above produces this output:
XmlDeclaration
Whitespace
Element
The first "token" is the XML declaration.
The second "token" encountered is the line break after the XML declaration.
The third "token" encountered is the <s:Envelope> element. From here, a call to reader.ReadOuterXml() will return what I think you're expecting to see: the text of the <s:Envelope> element, which is the entire SOAP packet.
If what you really want is to load the XML file into memory as objects, just call
var doc = XDocument.Load("test.xml")
and be done with the parsing in one fell swoop.
Unless you're working with an XML doc that is so monstrously huge that it won't fit in system memory, there's really not a lot of reason to go poking through the XML document one token at a time.
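For the reader-based approach, the separate Read() calls walked through above can be replaced by a single MoveToContent() call, which skips the XML declaration, whitespace, comments, and processing instructions until it reaches content such as the root element. The sketch below is illustrative only: the class and method names are hypothetical, and it takes a TextReader so it can be tried without a file on disk.

```csharp
using System;
using System.IO;
using System.Xml;

public static class OuterXmlSketch
{
    // Returns the outer XML of the root element, regardless of whether
    // an XML declaration or whitespace precedes it.
    public static string ReadRootOuterXml(TextReader input)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            return reader.ReadOuterXml();
        }
    }
}
```

Passing File.OpenText with the path from the question would apply this to the SOAP file shown there.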
What about
XmlDocument doc = new XmlDocument();
doc.Load(@"c:\my path\a.xml");
//Now we have the XML document - convert it to a String
//There are many ways to do this, one should be:
StringWriter sw=new StringWriter();
doc.Save(sw);
String finalresult=sw.ToString();
EDIT: I'm assuming you mean you actually have text between the document declaration and the root element. If that's not the case, please clarify.
Without removing the extra text, it's simply an invalid XML file. I wouldn't expect it to work. You don't have an XML file - you have something a bit like an XML file, but with extraneous stuff before the root element.
IMHO you can't read this file, because there is plain text before the root element <s:Envelope>, which makes the whole document invalid.
You're parsing an XML document as XML just to obtain the source text? Why?
If you really want to do that then:
string rs;
using (var rdr = new StreamReader(@"c:\my path\a.xml"))
rs = rdr.ReadToEnd();
Will work, but I'm really not sure that is what you actually want. This pretty much ignores that it's XML and just reads the text. Useful for some things, but not a lot.
