I am trying to generate some JSON from an XML file, but not a straightforward conversion. I wish to pick and choose bits and have a slightly different structure.
I would rather not just concatenate a giant string together and was wondering if there were some decent libraries around to do this.
Also, for testing I would like to be able to validate the created JSON; just a simple check to see whether it is valid JSON.
Load the XML into a set of classes (use XmlSerializer), then implement JSON generator methods on those classes. Different methods, different JSON.
You can convert XML to other text representations pretty easily using XSLT, particularly file-to-file using xsltproc or a command-line version of xalan.
XSLT is sometimes an awkward programming language, but if you go this route, I have two recommendations for JSON conversions. Set your output to text, with a UTF-8 character set:
<xsl:output method="text" encoding="UTF-8" />
and run JSLint on the result, in order to catch any bugs in your XSLT file.
I would probably use LINQ to XML (XElement and friends) to generate the new object and then pass that object to the JSON serializer.
Other answers look good: I would also bind the source format into objects, then serialize those objects as the other format. Any transformations would be done on the objects, not on the data format representation. When you use a proper parser (for input) and proper generators/serializers (for output), you do not have to worry about well-formedness (whether the resulting XML or JSON is syntactically correct).
And business-logic validity you could (and should) check on the objects.
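As a rough illustration of the "LINQ to XML, then a JSON serializer" approach from the answers above, here is a minimal sketch. It assumes System.Text.Json is available (.NET Core 3.0 or later); the element names Book, Title and Tag and the class XmlToJson are made up for the example and are not from the question.

```csharp
using System;
using System.Linq;
using System.Text.Json;
using System.Xml.Linq;

public static class XmlToJson
{
    // Picks a few pieces out of the XML and reshapes them before serializing.
    // The element names (Title, Tag) are hypothetical.
    public static string Convert(string xml)
    {
        XElement root = XElement.Parse(xml);
        var shaped = new
        {
            title = (string)root.Element("Title"),
            tags = root.Elements("Tag").Select(t => t.Value).ToArray()
        };
        return JsonSerializer.Serialize(shaped);
    }

    // Simple validity check for tests: the text is valid JSON if it parses.
    public static bool IsValidJson(string text)
    {
        try
        {
            using (JsonDocument.Parse(text))
            {
                return true;
            }
        }
        catch (JsonException)
        {
            return false;
        }
    }
}
```

For the input <Book><Title>XML</Title><Tag>a</Tag><Tag>b</Tag><Isbn>123</Isbn></Book>, Convert returns {"title":"XML","tags":["a","b"]}; the Isbn element is simply not selected. Because the serializer produces the text, well-formedness comes for free, and IsValidJson is only a sanity check.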
I'm trying to deserialize XML into a complex POCO, and using xsd.exe to generate my C# classes from the XSD turned some properties into string arrays. That might work with my Postgres database, but it doesn't work with a mocked DbContext for unit testing.
An example XML might look like this:
<Car>
  <CarWindow>FrontLeft</CarWindow>
  <CarWindow>BackLeft</CarWindow>
  <OtherFields></OtherFields>
</Car>
So what I want to do is change some of these string arrays, which have a low maxOccurs of 2-4, into dedicated columns. That change will be useful in other ways for this project too. So instead of string[] CarWindow I could have string CarWindow1; string CarWindow2;
What I'm missing is a way to specify in the XmlElement attribute a way to map the first, second, etc occurrence of a repeating element. Something like [XmlElement(Occurrence=1)]
I've looked at the XmlElementAttribute documentation and maybe I'm missing it, but I don't see a way to specifically map the nth occurrence of a repeating xml node to one property. Thanks!
XmlWriter.WriteRaw will preserve &apos; and not send an actual apostrophe. Is there a method to read in &apos; and keep it as such?
You need to encode it properly. Let's take for example the following XML:
<root>&apos;</root>
The value of the <root> node is ' no matter which XML parser you use to read this XML.
On the other hand if you have the following XML:
<root>&amp;apos;</root>
the value of the <root> node is &apos;.
So in both cases we have properly encoded XML so that when a standard compliant parser reads it, it is able to correctly retrieve the value.
So be very careful when using the WriteRaw method when generating the XML. Since it does not encode its argument, it is your responsibility to ensure that you are passing correctly encoded data to it.
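The difference can be checked with a small sketch. The class and method names below (AposDemo, ReadRootValue, WriteRoot) are made up for the illustration; the calls themselves are standard System.Xml and System.Xml.Linq APIs.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class AposDemo
{
    // Returns the text value of the root element as a conforming parser sees it.
    public static string ReadRootValue(string xml)
    {
        return XElement.Parse(xml).Value;
    }

    // Writes <root>content</root>, either verbatim (WriteRaw) or escaped (WriteString).
    public static string WriteRoot(string content, bool raw)
    {
        var output = new StringWriter();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            writer.WriteStartElement("root");
            if (raw)
                writer.WriteRaw(content);    // no escaping: the caller must pass valid markup
            else
                writer.WriteString(content); // escapes & as &amp; so the text round-trips
            writer.WriteEndElement();
        }
        return output.ToString();
    }
}
```

ReadRootValue("<root>&apos;</root>") returns a single apostrophe, while ReadRootValue("<root>&amp;apos;</root>") returns the six characters &apos;. So to read &apos; back as literal text, write it with WriteString, which produces <root>&amp;apos;</root>, rather than with WriteRaw.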
I am writing an XML parser; my application creates XML files. For this I have to handle special characters -- for example I know that < should be replaced with &lt;, similarly > should be replaced with &gt;, and so on. What are all the different characters which need to be handled in this way?
See this wikipedia article:
http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
(Unless you're doing it for academic purposes, I recommend you use the existing .NET XML parsing libraries, such as those in the System.Xml namespace, or System.Xml.Linq. If you are trying to serialize/deserialize objects, use the built-in XML serialization.)
For XML parsing you don't need to perform those replacements - you'd need to perform them when creating XML. You'd also want to consider replacing & with &amp; where required - see the XML specification for details.
However, I would strongly advise you not to write your own XML API. .NET already contains several of them, including the excellent LINQ to XML. Use that instead of building your own. The chances of you independently creating your own XML API which is of a similar quality are very low, and you'll spend an awful lot of time getting there to start with.
Using a decent XML API, you don't need to worry about character conversions etc - the API will handle them for you.
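To illustrate the point about letting the API handle character conversions, here is a minimal sketch using LINQ to XML. The class name EscapingDemo and the element name root are invented for the example.

```csharp
using System;
using System.Xml.Linq;

public static class EscapingDemo
{
    // Builds an element from plain text; the API escapes <, > and & on output.
    public static string ToXml(string text)
    {
        return new XElement("root", text).ToString();
    }

    // Parses the markup back; the API turns the entity references into characters again.
    public static string FromXml(string xml)
    {
        return XElement.Parse(xml).Value;
    }
}
```

ToXml("a < b & c > d") returns <root>a &lt; b &amp; c &gt; d</root>, and FromXml on that result gives back the original text, so no manual replacement table is needed.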
There is a list of XML escape codes listed here.
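Use the System.Xml.XmlConvert class to encode special characters for you. Note that EncodeName makes a string usable as an XML name; it does not produce entity references such as &lt;.
using System;

class Program
{
    static void Main(string[] args)
    {
        // EncodeName replaces characters that are not valid in an XML name with _xHHHH_ escapes.
        string s = System.Xml.XmlConvert.EncodeName("valid XML --> !@#$%^&*()");
        Console.WriteLine("Encoded: {0}", s);
        // DecodeName reverses the escapes and restores the original string.
        Console.WriteLine("Decoded: {0}", System.Xml.XmlConvert.DecodeName(s));
        Console.ReadLine();
    }
}
Will yield this result:
Encoded: valid_x0020_XML_x0020_--_x003E__x0020__x0021__x0040__x0023__x0024__x0025__x005E__x0026__x002A__x0028__x0029_
Decoded: valid XML --> !@#$%^&*()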
There is a built in .NET method SecurityElement.Escape for escaping certain (not all) invalid XML characters. Check out this link:
http://msdn.microsoft.com/en-us/library/system.security.securityelement.escape%28v=VS.80%29.aspx
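A quick sketch of what SecurityElement.Escape does. The wrapper class EscapeDemo is invented for the example; the method replaces the five characters <, >, &, " and ' with their predefined entity references and leaves everything else alone.

```csharp
using System;
using System.Security;

public static class EscapeDemo
{
    // Thin wrapper so the call is easy to exercise in isolation.
    public static string Escape(string text)
    {
        return SecurityElement.Escape(text);
    }
}
```

For example, Escape("<a href=\"x\">Tom & 'Jerry'</a>") returns &lt;a href=&quot;x&quot;&gt;Tom &amp; &apos;Jerry&apos;&lt;/a&gt;. Characters that are not allowed in XML at all, such as most control characters, are not removed, which is what the "certain (not all)" caveat refers to.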
I'm attempting to find complete XML objects in a string. They have been placed in the string by an XmlSerializer, but may or may not be complete. I've toyed with the idea of using a regular expression, because it seems like the kind of thing they were built for, except for the fact that I'm trying to parse XML.
I'm trying to find complete objects in the form:
<?xml version="1.0"?>
<type>
  <field>value</field>
  ...
</type>
My thought was a regex to find <?xml version="1.0"?><type> and </type>, but if a field has the same name as type, it obviously won't work.
There's plenty of documentation on XML parsers, but they seem to all need a complete, fully-formed document to parse. My XML objects can be in a string surrounded by pretty much anything else (including other complete objects).
hw<e>reR#lot$0fr#ndm&nchrs%<?xml version="1.0"?><type><field>...</field>...</type>#ndH#r$omOre!!>nuT6erjc?y!<?xml version="1.0"?><type><field>...</field>...</type>ty!=]
A regex would be able to match a string while excluding the random characters, but not find a complete XML object. I'd like some way to extract an object, parse it with a serializer, then repeat until the string contains no more valid objects.
Can you use a regular expression to search for the "<?xml" piece and assume that it is the beginning of an XML object? Then use an XmlReader to read/check the remainder of the string until you have parsed one entire element at the root level, and stop reading from the stream once the root node has been completely parsed.
Edit: For more information about using XmlReader, I suggest one of the questions I asked: "I can never predict xmlreader behavior, any tips on understanding?"
My final solution was to stick with the "Read" method when parsing XML and avoid other methods that actually read from the stream advancing the current position.
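The idea in this answer could be sketched roughly as follows. This is an illustration under assumptions, not the answerer's code: the names EmbeddedXml, ExtractAll and TryReadRoot are invented, the "<?xml" marker is located with IndexOf rather than a regular expression, and the element is rebuilt through an XmlWriter, so comments and processing instructions are dropped. Only Read() is used, so the reader never advances past the root end tag into the surrounding junk.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

public static class EmbeddedXml
{
    // Returns every complete root element that follows an "<?xml" marker.
    public static List<string> ExtractAll(string text)
    {
        var found = new List<string>();
        int pos = 0;
        while ((pos = text.IndexOf("<?xml", pos, StringComparison.Ordinal)) >= 0)
        {
            string xml = TryReadRoot(text.Substring(pos));
            if (xml != null) found.Add(xml);
            pos += 5; // continue searching after this marker
        }
        return found;
    }

    // Reads node by node and stops as soon as the root element is closed.
    // Returns null if the candidate is incomplete or malformed.
    private static string TryReadRoot(string candidate)
    {
        var output = new StringBuilder();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(candidate)))
            using (XmlWriter writer = XmlWriter.Create(output, settings))
            {
                while (reader.Read())
                {
                    switch (reader.NodeType)
                    {
                        case XmlNodeType.Element:
                            bool isEmpty = reader.IsEmptyElement;
                            writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                            writer.WriteAttributes(reader, false);
                            if (isEmpty)
                            {
                                writer.WriteEndElement();
                                if (reader.Depth == 0) { writer.Flush(); return output.ToString(); }
                            }
                            break;
                        case XmlNodeType.Text:
                        case XmlNodeType.Whitespace:
                        case XmlNodeType.SignificantWhitespace:
                            if (reader.Depth > 0) writer.WriteString(reader.Value);
                            break;
                        case XmlNodeType.CDATA:
                            writer.WriteCData(reader.Value);
                            break;
                        case XmlNodeType.EndElement:
                            writer.WriteFullEndElement();
                            if (reader.Depth == 0) { writer.Flush(); return output.ToString(); }
                            break;
                    }
                }
            }
        }
        catch (XmlException)
        {
            // Incomplete or malformed object: ignore this marker.
        }
        return null;
    }
}
```

Run on a string shaped like the one in the question, with two embedded <type><field>...</field></type> objects between random characters, ExtractAll should return the two objects and skip a third one that is cut off before its end tag. Each returned string can then be handed to the serializer.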
You could try using the Html Agility Pack, which can be used to parse "malformed XML" and make it accessible with a DOM.
It would be necessary to know which element you are looking for (like <type> in your example), because it will be parsing the accidental elements too (like <e> in your example).
I'm using XSLT to transform an XML document into XML of a different format. If an element has no data, it is output as self-closing, e.g. <data />, but I want it output with a closing tag, like this: <data></data>.
If I change the output method from "xml" to "html" then I can get the <data></data>, but I will lose the <?xml version="1.0" encoding="UTF-8"?> on the top of the document. Is this the correct way of doing this?
Many thanks.
Daoming
If you want this because you think that self closing tags are ugly, then get over it.
If you want to pass the output to some non-conformant XML parser that is under your control, then use a better parser, or fix the one you are using.
If it is out of your control, and you must send it to an inadequate XML Parser, then do you really need the prolog? If not, then html output method is fine.
If you do need the XML prolog, then you could use the html output method, and prepend the prolog after transformation, but before sending it to the deficient parser.
Alternatively, you could output it as XML with self-closing tags, and preprocess before sending it to your deficient parser with some kind of custom serialisation, using the DOM. If it can't handle self-closing tags, then I'm sure that isn't the only way in which it fails to parse XML. You might need to do something about namespaces, for example.
You could try adding an empty text node to any empty elements that you are outputting. That might do the trick.
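If the post-processing step happens in .NET, the "empty text node" idea can be sketched with LINQ to XML as below. This is an assumption about the tooling, since the question only mentions XSLT; the class name FullEndTags is invented. Giving an empty element an empty string value makes the serializer write a start tag and a full end tag.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class FullEndTags
{
    // Rewrites every self-closing element as an explicit start/end tag pair.
    public static string Expand(string xml)
    {
        XElement root = XElement.Parse(xml);
        // ToList() snapshots the matches before any element is modified.
        foreach (XElement element in root.DescendantsAndSelf().Where(e => e.IsEmpty).ToList())
        {
            element.Value = string.Empty; // empty text content forces <x></x>
        }
        return root.ToString(SaveOptions.DisableFormatting);
    }
}
```

Expand("<root><data /><name>x</name></root>") returns <root><data></data><name>x</name></root>. Both forms are equivalent to any conforming parser, so this only matters for the deficient consumer discussed above.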
Self-closed and explicitly closed elements are exactly the same thing in every regard.
Only if somewhere along your processing chain there is a tool that is not XML aware (code that does XML processing with regex, for example), it might make a difference. At which point you should think about changing that part of the processing, instead of the XML generation/serialization part.