DataTable.WriteXml - How to preserve spaces in element names - c#

I've added functionality in my application where a DataTable is converted into XML, like so:
dtResults.WriteXml(fileName);
In the resulting XML document, _x0020_ appears wherever a column name contains a space.
Is it possible to generate XML files without the _x0020_ code? That is, can this method (or a similar one) produce an XML file in which the spaces are actually preserved?
This is the DataGrid:
This is the resulting XML:
<Customers>
<Customer_x0020_Name>Sean</Customer_x0020_Name>
</Customers>
<Customers>
<Customer_x0020_Name>John</Customer_x0020_Name>
</Customers>
<Customers>
<Customer_x0020_Name>Sarah</Customer_x0020_Name>
</Customers>
<Customers>
<Customer_x0020_Name>Mark</Customer_x0020_Name>
</Customers>
<Customers>
<Customer_x0020_Name>Norman</Customer_x0020_Name>
</Customers>

The name of your column contains a space. XML element names cannot contain a space. A space is used in XML to separate element names from attribute names, for instance:
<ElementName Attribute1="value" />
The DataTable.WriteXml method tries to write out the XML file in a consistent way, so that another DataTable object can later load the XML and get as close to an exact copy of the original as possible. Therefore, it replaces each illegal character in a name with an escape built from its hexadecimal character code (_x0020_ for a space, which is character 0x20), so that the illegal characters are not lost in translation.
So, if you want to write it to XML differently, you need to either:
Change the name of the column in the DataTable so that it does not contain a space
Manually output the XML yourself using XDocument, XmlDocument, XmlWriter, or XmlSerializer and format the output however you desire
Output the XML as you do now, but then run an XSLT script on it to fix the formatting
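The first option above could be sketched roughly as follows. This is only an illustration: the class and method names are hypothetical, the underscore replacement is an arbitrary choice, and the sketch works on a copy so the original table keeps its column names. It assumes the table has a TableName set, which WriteXml requires, and that the renamed columns do not collide with existing ones.

```csharp
using System.Data;
using System.IO;

// Hypothetical helper, not part of the original answer.
public static class DataTableXmlExport
{
    // Returns the table's XML after replacing spaces in column names
    // with underscores, so that no _x0020_ escapes are produced.
    public static string WriteXmlWithoutSpaces(DataTable table)
    {
        // Work on a copy so the caller's column names stay untouched.
        DataTable copy = table.Copy();
        for (int i = 0; i < copy.Columns.Count; i++)
        {
            DataColumn column = copy.Columns[i];
            column.ColumnName = column.ColumnName.Replace(" ", "_");
        }

        using (var writer = new StringWriter())
        {
            copy.WriteXml(writer);
            return writer.ToString();
        }
    }
}
```

With a "Customer Name" column, the elements would then be written as Customer_Name. A DataTable loaded from this XML will have the renamed columns, not the original ones.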

I don't think what you're wanting is possible. I don't believe an XML element name can contain a space, just like a variable name cannot contain a space. What is the reason that it needs to be a space?
If there actually needs to be a space (which I think will render the XML useless for parsing), you can simply do a find and replace in the file.
If you're storing it to read back in and display again in a DataTable, I would just rename the columns once I've read the data back in, replacing _x0020_ with spaces.
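For converting names back and forth, the _x0020_ form matches the name encoding exposed by System.Xml.XmlConvert, so a hand-written find and replace may not be needed. The following is a minimal sketch; the wrapper class and its method names are hypothetical, while EncodeName and DecodeName are the framework calls.

```csharp
using System.Xml;

// Hypothetical wrapper around the XmlConvert name-encoding helpers.
public static class ColumnNameCodec
{
    // "Customer Name" -> "Customer_x0020_Name"
    public static string ToXmlName(string columnName)
    {
        return XmlConvert.EncodeName(columnName);
    }

    // "Customer_x0020_Name" -> "Customer Name"
    public static string FromXmlName(string xmlName)
    {
        return XmlConvert.DecodeName(xmlName);
    }
}
```

Decoding handles any escaped character, not just spaces, which a literal replace of _x0020_ would miss.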

I am facing the same problem. I am exporting data from an Excel file whose column names contain spaces, so the columns of my DataTable contain spaces too.
I tried the following:
// Requires: using System.Data; using System.IO;
StringWriter strXML = new StringWriter();
// Write the rows only: no inline schema, no child-table hierarchy.
dtPartOriginalData.WriteXml(strXML, XmlWriteMode.IgnoreSchema, false);
// Strip the encoded spaces, so "Customer_x0020_Name" becomes "CustomerName".
strMessages = strXML.ToString().Replace("_x0020_", "");
This removes every _x0020_ from the generated XML, but you cannot reconstruct the same DataTable from that XML. It is only a workaround, which I am adding as a suggestion. Thank you.

Related

Save XDocument without any formatting changes

I have an XML file in which I have to replace the value of a single element. For this I'm loading the file into an XDocument: var camtXml = XDocument.Load(fileStream); After I make my change and save the XDocument to a file, the output contains several changes that should not be there. As you can see in the following picture (left side: file from XDocument, right side: original file):
The UTF-8 in the declaration was changed from upper to lower case, CR line feeds were added, and the indentation was changed by removing whitespace. I really want to use XDocument because its library makes it easy to create and iterate through XElements, but the formatting changes are a showstopper. Is there a way to avoid these formatting changes, or is there an alternative to XDocument with the same features, such as XPath and XElement?
I found this, but it didn't solve my problem:
XDocument how to save without Byte Order Mark AND preseve formatting/whitespace
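One direction that is sometimes tried for the whitespace part of this problem is to load with LoadOptions.PreserveWhitespace and serialize with SaveOptions.DisableFormatting. The sketch below is only an illustration with hypothetical class, method, and parameter names; it does not address the casing of the encoding in the XML declaration, and ToString() omits the declaration entirely.

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not taken from the question.
public static class XmlRoundTrip
{
    // Replaces the value of the first element with the given name while
    // keeping the original indentation and line breaks between elements.
    public static string SetValueKeepingLayout(string xml, string elementName, string newValue)
    {
        // Keep whitespace-only text nodes instead of discarding them.
        XDocument doc = XDocument.Parse(xml, LoadOptions.PreserveWhitespace);

        XElement target = doc.Descendants(elementName).FirstOrDefault();
        if (target != null)
        {
            target.Value = newValue;
        }

        // Do not re-indent on output. Note: the XML declaration is not included.
        return doc.ToString(SaveOptions.DisableFormatting);
    }
}
```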

How to parse an xml that has non-xml data in it

I am working with some xml in C# and am having some issues parsing an xml file due to the format it is in. It has non xml data in the file and I have no control over the format of this file. The file is "test.xml"(see below). I am only concerned with the xml portion of the data, but am unsure the best way to go about accessing it. Any thoughts or recommendations would be greatly appreciated.
Test data -1
Smith, 2234
##*j
Random--
#<?xml version="1.0" encoding="utf-16"?>
<ConfigMessage xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://www.Test.com/schemas/Test.test.Config">
<Config>
<Version>10</Version>
<Build>00520</Build>
<EnableV>false</EnableV>
<BuildL>22</BuildL>
<BuildP>\\testpath\test</BuildP>
</Config>
</ConfigMessage>
#
Read the whole file into a string, then take the substring that runs from the first '<' to the last '>' character in the file. From there you can treat it as normal XML. If non-XML content is scattered throughout the XML as well, you will need additional logic to detect where each XML "block" starts and stops.
I can suggest this solution: open your pseudo-XML as a plain text file and read the whole text. Then use a regex to extract the XML document (the part of the original file that can be parsed as XML: start tag, any content, end tag), load it into an in-memory XDocument, and parse it like a normal XML file.
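The first suggestion (first '<' to last '>') might look roughly like this. It is a minimal sketch with hypothetical class and method names, and it assumes the file contains exactly one XML block and no stray '<' or '>' characters in the surrounding text.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper illustrating the substring approach.
public static class EmbeddedXml
{
    // Parses the text between the first '<' and the last '>' as XML.
    public static XDocument Extract(string raw)
    {
        int start = raw.IndexOf('<');
        int end = raw.LastIndexOf('>');
        if (start < 0 || end <= start)
        {
            throw new FormatException("No XML block found in the input.");
        }

        // Include the closing '>' in the extracted block.
        string xml = raw.Substring(start, end - start + 1);
        return XDocument.Parse(xml);
    }
}
```

Usage would be along the lines of EmbeddedXml.Extract(File.ReadAllText("test.xml")). Because the sample document declares a default namespace, element lookups need that namespace (an XNamespace combined with the local name) or a comparison on Name.LocalName.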

Remove Empty Line From XML Document

I am currently facing a problem. I load an XML file in C#, remove some nodes from it, and append some others. When I remove nodes from the file, empty lines are left behind automatically, and I want to remove those lines.
Also, when I append nodes to the parent node, I want a new line after each closing tag.
For example, my XML file is:
<intro id="S0001">
<title>Introduction Title</title>
<para>This is a paragraph. Note that paragraphs can contain other block–level objects, such as lists, as well as directly containing text.</para>
<para>The introduction can contain all of the text objects that a section can contain, except that it cannot be divided into parts, sections and sub–sections.</para>
<para>The introduction can contain tables:</para>
</intro><part>
<no>Part A</no> Article Structure <sup>&lpar;Part Title&rpar;</sup><section1 id="S0002">
<no>Sect 1</no>
<title>First Section in Part 1 <sup>&lpar;Section 1 Title&rpar;</sup></title>
<shortsectionhead>Short Section Header</shortsectionhead>
<para>This is a section in the first part of the article.</para>
</section1><section1 id="S0003">
Code:
// Requires: using System.Xml;
// Copy <part> to the position directly after <intro>, then remove the original.
XmlNode partNnode = xmlDoc.SelectSingleNode("//part");
XmlNode introNode = xmlDoc.SelectSingleNode("//intro");
XmlDocumentFragment newNode = xmlDoc.CreateDocumentFragment();
newNode.InnerXml = partNnode.OuterXml;
introNode.ParentNode.InsertAfter(newNode, introNode);
partNnode.ParentNode.RemoveChild(partNnode);
// Re-select <part> (now the inserted copy) and append a copy of each <section1> to it.
partNnode = xmlDoc.SelectSingleNode("//part");
XmlNodeList nodeList = xmlDoc.SelectNodes("//section1");
foreach (XmlNode refrangeNode in nodeList)
{
    newNode = xmlDoc.CreateDocumentFragment();
    newNode.InnerXml = refrangeNode.OuterXml;
    partNnode.AppendChild(newNode);
}
Please help me
Thanks in advance
If you load and save an XML file with C#, the XML should come out formatted correctly (an easy way to tidy up strange-looking XML files is simply to load and save them with some C# code).
If I understand your question correctly, you are just not happy with the format of the XML file?
That is, you want (A):
</intro><part>
But you get (B):
</intro>
<part>
If that is the question, then, in my eyes, you want a strange thing, because:
a) Code doesn't care how the XML file is formatted, and
b) The format in (B) is the correct one.
If, for whatever reason, you still want to change it, you have to go through the XML file as a string and check manually for closing and opening tags.
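The "load and save" idea from this answer could be sketched as below. The class and method names are hypothetical, and the writer settings (two-space indent, "\n" line endings, no declaration) are illustrative choices, not something the answer prescribes. The sketch relies on XmlDocument discarding whitespace-only nodes when PreserveWhitespace is false, which is what removes the empty lines. Note that the sample in the question uses entities such as &lpar; that would have to be declared for the document to load.

```csharp
using System.IO;
using System.Xml;

// Hypothetical helper illustrating "load it and save it again".
public static class XmlTidy
{
    // Re-serializes the XML with consistent indentation and no blank lines.
    public static string Reformat(string xml)
    {
        var doc = new XmlDocument();
        doc.PreserveWhitespace = false; // drop whitespace-only nodes (the default)
        doc.LoadXml(xml);

        var settings = new XmlWriterSettings
        {
            Indent = true,             // put each child element on its own line
            IndentChars = "  ",
            NewLineChars = "\n",
            OmitXmlDeclaration = true  // do not add an <?xml ...?> header
        };

        using (var text = new StringWriter())
        {
            using (XmlWriter writer = XmlWriter.Create(text, settings))
            {
                doc.Save(writer);
            }
            return text.ToString();
        }
    }
}
```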

Reading XML file with Invalid character

I am using DataSet.ReadXml() to read an XML string. I get an error because the XML string contains the invalid character 0x1F, which is 'US', the unit separator. The character sits inside otherwise well-formed tags.
The data is extracted from an Oracle DB using a Perl script. What would be the best way to escape this character so that the XML is read correctly?
EDIT: XML string:
<RESULT>
<DEPARTMENT>Oncology</DEPARTMENT>
<DESCRIPTION>Oncology</DESCRIPTION>
<STUDY_NAME>**7360C hsd**</STUDY_NAME>
<STUDY_ID>27</STUDY_ID>
</RESULT>
The US separator is between the C and the h in the bold part; when pasted here, it actually shows up as a space. How can I ignore it in an XML string?
If you look at section 2.2 of the XML recommendation, you'll see that 0x1F is not in the range of characters allowed in XML documents. So while the string you're looking at may look like an XML document to you, it isn't one.
You have two problems. The relatively small one is what to do about this document. I'd probably preprocess the string and discard any character that's not legal in well-formed XML, but then I don't know anything about the relatively large problem.
And the relatively large problem is: what's this data doing in there in the first place? What purpose (if any) do non-visible ASCII characters in the middle of a (presumably) human-readable data field serve? Why doesn't the Perl script that produces this string fail when it encounters an illegal character?
I'll bet you one American dollar that it's because the person who wrote that script is using string manipulation and not an XML library to emit the XML document. Which is why, as I've said time and again, you should never use string manipulation to produce XML. (There are certainly exceptions. If you're writing a throwaway application, for instance, or an XML parser. Or if your name's Tim Bray.)
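The preprocessing step mentioned above (discarding any character that is not legal in well-formed XML) could be sketched like this. The class and method names are hypothetical; the ranges are the Char production from section 2.2 of the XML 1.0 recommendation, with characters above U+FFFF handled as surrogate pairs.

```csharp
using System.Text;

// Hypothetical helper, not part of the original answer.
public static class XmlSanitizer
{
    // Returns the input with every character outside the XML 1.0
    // Char production removed (for example 0x1F, the unit separator).
    public static string StripInvalidXmlChars(string input)
    {
        var sb = new StringBuilder(input.Length);
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (c == 0x9 || c == 0xA || c == 0xD ||
                (c >= 0x20 && c <= 0xD7FF) ||
                (c >= 0xE000 && c <= 0xFFFD))
            {
                sb.Append(c);
            }
            else if (i + 1 < input.Length && char.IsSurrogatePair(c, input[i + 1]))
            {
                // A valid surrogate pair encodes a character in #x10000-#x10FFFF.
                sb.Append(c).Append(input[i + 1]);
                i++;
            }
            // Anything else is silently dropped.
        }
        return sb.ToString();
    }
}
```

The cleaned string can then be wrapped in a StringReader and passed to DataSet.ReadXml. Dropping characters changes the data, so whether removal or replacement with a space is appropriate depends on what the field is used for.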
Your XmlReader/TextReader must be created with the correct encoding. You can create it as below and pass it to your DataSet:
// Requires: using System.IO; using System.Text;
StreamReader reader = new StreamReader("myfile.xml", Encoding.ASCII); // or the correct encoding
myDataset.ReadXml(reader);

XSLT: transfer xml with the closed tags

I'm using XSLT to transform an XML document into a different XML format. If an element has no data, it is output as a self-closing tag, e.g. <data />, but I want it output with an explicit closing tag, like this: <data></data>.
If I change the output method from "xml" to "html", I get <data></data>, but I lose the <?xml version="1.0" encoding="UTF-8"?> declaration at the top of the document. Is this the correct way of doing this?
Many thanks.
Daoming
If you want this because you think that self closing tags are ugly, then get over it.
If you want to pass the output to some non-conformant XML parser that is under your control, then use a better parser, or fix the one you are using.
If it is out of your control, and you must send it to an inadequate XML Parser, then do you really need the prolog? If not, then html output method is fine.
If you do need the XML prolog, then you could use the html output method, and prepend the prolog after transformation, but before sending it to the deficient parser.
Alternatively, you could output it as XML with self-closing tags, and preprocess before sending it to your deficient parser with some kind of custom serialisation, using the DOM. If it can't handle self-closing tags, then I'm sure that isn't the only way in which it fails to parse XML. You might need to do something about namespaces, for example.
You could try adding an empty text node to any empty elements that you are outputting. That might do the trick.
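The empty-text-node idea can be illustrated outside XSLT. The sketch below uses LINQ to XML in C# rather than an XSLT stylesheet, purely as an analogy, and the class and method names are hypothetical: an element with no content serializes as a self-closing tag, while an element that holds an empty string serializes with an explicit closing tag. Whether a given XSLT processor behaves the same way is not something this sketch demonstrates.

```csharp
using System.Xml.Linq;

// Hypothetical demo class; LINQ to XML stands in for the XSLT serializer.
public static class EmptyElementDemo
{
    // An element with no child nodes at all.
    public static string SelfClosing()
    {
        return new XElement("data").ToString();
    }

    // An element whose content is an empty string.
    public static string ExplicitClose()
    {
        return new XElement("data", string.Empty).ToString();
    }
}
```

SelfClosing() yields <data /> and ExplicitClose() yields <data></data>; an XML parser treats the two as the same empty element.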
Self-closed and explicitly closed elements are exactly the same thing in any regard whatsoever.
Only if somewhere along your processing chain there is a tool that is not XML aware (code that does XML processing with regex, for example), it might make a difference. At which point you should think about changing that part of the processing, instead of the XML generation/serialization part.
