Overriding or ignoring undeclared entities in C# using LINQ - c#

I have a little utility that runs through looking for certain things in XML files using LINQ. It processes a MASSIVE collection of them rather quickly and nicely. However, about 20% of a certain batch of files fail to be read and are skipped, failing because of the degree symbol's presence as &deg; in the files. This is the "Reference to undeclared entity 'deg'." a previous question was about.
The solutions offered in the previous question cannot be directly applied here. I am not at liberty to go around modifying the files, and making copies of them and replacing instances or inserting tags in the copies seems inefficient. What would be the best way to go about getting LINQ to ignore the undeclared entities, which have absolutely no bearing on what my program does anyway? Or is there perhaps a good way of getting an XDocument.Load to be fed some entity declarations beforehand?

Unfortunately entities form part of the well-formedness rules for XML (2.1 Well-Formed XML Documents). It seems like you're saying you want the XDocument.Load to load what is notionally an XML file, but does not in fact conform to the rules, which it won't do, quite reasonably.
If your users are passing you what are supposed to be XML files, but that have undefined entities, then either you have to get them to provide the files in a valid format, or manage the incorrectness yourself at load time, in the ways that have been suggested.
It seems to me, from your restrictions, that the neatest approach would be to follow the linked example and create some settings to pass into the XmlReader along the lines of (Validating an XML Document in the DOM).
If there are entities which aren't defined and aren't listed in public schemas, you'll need to create your own schema which defines all the entities you need. So, create generic settings for the XmlReader which reference your own, custom schema. Add the necessary entities to this schema as certain files fail to load, and you'll build up a list of all the entities that you need to define in order for the XML files to be valid.
Then, for each document you try to load, create an XmlReader for the file using the settings above and call the XDocument.Load(XmlReader) overload.
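A minimal sketch of that load path, under some loudly stated assumptions. Entity declarations live in a DTD rather than in an XSD schema, so this sketch swaps in a DTD internal subset handed to the reader through an XmlParserContext. The class name EntityTolerantLoader, the document type name "root" and the single deg declaration are hypothetical, and the files are assumed to carry no DOCTYPE of their own.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class EntityTolerantLoader
{
    // Assumption: &deg; is the only undeclared entity. Append further
    // <!ENTITY ...> declarations here as more files fail to load.
    private const string InternalSubset = "<!ENTITY deg \"&#176;\">";

    public static XDocument Load(TextReader input)
    {
        var settings = new XmlReaderSettings
        {
            // DTD processing must be enabled for the supplied declarations to be used.
            DtdProcessing = DtdProcessing.Parse,
            // No resolver: nothing external is ever fetched.
            XmlResolver = null
        };

        // "root" is a placeholder document type name; the reader is non-validating.
        var context = new XmlParserContext(
            null, null, "root", null, null, InternalSubset, "", "", XmlSpace.None);

        using (var reader = XmlReader.Create(input, settings, context))
        {
            return XDocument.Load(reader);
        }
    }
}
```

The files on disk are never modified or copied; the declarations exist only in the reader's context, e.g. EntityTolerantLoader.Load(File.OpenText(path)).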

Related

How to deal with a XML based protocol where the response may conform to one of two XSDs?

I have to read and write data through a protocol where the response XML may be different according to the error state of the server application. If the response is good it uses, let's say, Xml_1 with a specific schema, but if the response indicates an error it uses Xml_2 with a completely different schema. The good design, in my opinion, would be to incorporate the error state into the first schema, but we are just consumers of this service and we don't have access to the design of the server application. My solution is to (using C#) read the XML response as a string, do some searching in order to understand which XML schema is in use, and then use the appropriate XmlSerializer to convert the response to an object. Is there a more elegant solution?
Is the union of the two schemas a valid schema? (This will typically be the case, for example, if they use different namespaces, but it's likely not to be the case if they are both no-namespace schemas or if they are two versions of the same schema).
If the union is a valid schema, then you could consider validating against that.
Otherwise peeking at the start of the file will often be enough to tell you which vocabulary is in use.
It's possible to parse an XML document without validation, inspect it, and then validate the already parsed document. It's even possible to do this in a single pipeline without putting the whole document in memory. But the details depend on the toolkit you are using. You've tagged the question C# - I'm not sure if this is possible using the Microsoft tools, but it should be possible I think using Saxon-CS. [Disclaimer, my product].
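The "peek at the start" idea can be sketched with the Microsoft tools by positioning an XmlReader on the root element and choosing a serializer from its name. This is only an illustration: the root element names Result and Error, and the types OkResponse, ErrorResponse and ResponseParser, are hypothetical stand-ins for the two real vocabularies.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical shapes for the two response vocabularies.
[XmlRoot("Result")]
public class OkResponse { public string Value { get; set; } }

[XmlRoot("Error")]
public class ErrorResponse { public string Message { get; set; } }

public static class ResponseParser
{
    // Serializers are created once and reused.
    private static readonly XmlSerializer OkSerializer = new XmlSerializer(typeof(OkResponse));
    private static readonly XmlSerializer ErrorSerializer = new XmlSerializer(typeof(ErrorResponse));

    public static object Parse(string xml)
    {
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            // Skip the XML declaration, comments and whitespace; stop on the root element.
            reader.MoveToContent();
            switch (reader.LocalName)
            {
                case "Result": return OkSerializer.Deserialize(reader);
                case "Error": return ErrorSerializer.Deserialize(reader);
                default:
                    throw new InvalidDataException("Unexpected root element: " + reader.LocalName);
            }
        }
    }
}
```

The response is parsed only once: the same reader that was used to inspect the root element is handed to the serializer, so no string searching is needed.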

C#: compare two xml files

I would like to write a function to compare two XML files. Based on the following cases, the diff could be due to
the order of the nodes and
some nodes or attributes could have been added or removed
I have found a solution to traverse the XML by using a recursive function or LINQ to XML so basically I can get all nodes and attributes. I have read about the XML Diff and Patch Tool but I'm trying to avoid dependencies on my project. An added complexity to this is to determine which line the diff occurred but this is optional for now.
I'm currently thinking of storing the nodes and attributes of the two XML files in a data structure (e.g. a dictionary) and comparing the dictionaries later, but I'm not quite sure how to do this. Can you share some ideas?
There is XNode.DeepEquals (https://msdn.microsoft.com/en-us/library/system.xml.linq.xnode.deepequals%28v=vs.110%29.aspx), which you can call as XNode.DeepEquals(XDocument.Load("file1.xml"), XDocument.Load("file2.xml")), but that will only give you a boolean result, not an indication of where the difference is.
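The dictionary idea from the question could be sketched roughly as follows; this is one possible approach with no external dependencies, not an established algorithm. The class XmlDictionaryDiff and its path notation are hypothetical. Each attribute and leaf value is stored under a path, with children indexed per element name, so reordering differently named siblings is not reported, while reordering same-named siblings still is. Namespace declarations are treated as ordinary attributes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XmlDictionaryDiff
{
    // Flattens a document into path -> value entries.
    public static Dictionary<string, string> Flatten(XDocument doc)
    {
        var map = new Dictionary<string, string>();
        Walk(doc.Root, "/" + doc.Root.Name, map);
        return map;
    }

    private static void Walk(XElement element, string path, Dictionary<string, string> map)
    {
        foreach (var attribute in element.Attributes())
            map[path + "/@" + attribute.Name] = attribute.Value;

        // Only leaf elements contribute a text entry.
        if (!element.HasElements)
            map[path + "/text()"] = element.Value;

        // Index children per name so sibling order across names is ignored.
        foreach (var group in element.Elements().GroupBy(child => child.Name))
        {
            int index = 1;
            foreach (var child in group)
                Walk(child, path + "/" + group.Key + "[" + index++ + "]", map);
        }
    }

    // Reports paths that were removed, added or changed going from left to right.
    public static List<string> Compare(XDocument left, XDocument right)
    {
        var a = Flatten(left);
        var b = Flatten(right);
        var report = new List<string>();
        foreach (var key in a.Keys.Union(b.Keys).OrderBy(k => k, StringComparer.Ordinal))
        {
            string va, vb;
            bool inA = a.TryGetValue(key, out va);
            bool inB = b.TryGetValue(key, out vb);
            if (!inB) report.Add("removed: " + key);
            else if (!inA) report.Add("added: " + key);
            else if (va != vb) report.Add("changed: " + key);
        }
        return report;
    }
}
```

For the optional line numbers, loading with LoadOptions.SetLineInfo and reading IXmlLineInfo from each node would be one way to extend this.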

Performance difference between linq to xml and xml serialization

I would like to create an XML file (100 lines, 5 namespaces, 30 different tags, and 20 attributes in total). I already have a hardcoded XML example, but I need to write some C# code to generate the XML dynamically and fill in the values, which of course can change. Performance is a concern.
Should I use LINQ to XML, creating all the tags with XDocument and XElement and providing variables that contain the dynamic values,
or, since I already have an XML example, create a schema.xsd and provide the values to the generated object?
The XML (the object stream) will be sent via HTTP POST every second to a web service.
I am going to time both versions, but I was just curious whether someone has already done that.
The LINQ to XML version should have better performance.
If you want to optimize it even more you probably should consider direct string concatenation (but that's not a best practice and the performance gain won't be significant).
The next performance option will be XmlTextWriter. It is probably the fastest way to write XML "correctly": it doesn't need to create an XML object model like LINQ to XML does, so it should be significantly faster.
You can optimize serialization a bit if you cache the XmlSerializer instance and don't create it every time. Then it will also be relatively fast, though definitely slower than direct XML writes.
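To make the comparison concrete, here is a small sketch that builds the same fragment both ways. It uses XmlWriter.Create rather than constructing XmlTextWriter directly, which is a substitution on my part, and the element names reading and value are invented for illustration. It says nothing about actual timings; those still need to be measured.

```csharp
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class XmlBuilders
{
    // LINQ to XML: builds an in-memory tree first, then serializes it.
    public static string WithLinq(string id, string value)
    {
        var element = new XElement("reading",
            new XAttribute("id", id),
            new XElement("value", value));
        return element.ToString(SaveOptions.DisableFormatting);
    }

    // XmlWriter: streams the markup straight into the buffer, no object model.
    public static string WithWriter(string id, string value)
    {
        var buffer = new StringBuilder();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (var writer = XmlWriter.Create(buffer, settings))
        {
            writer.WriteStartElement("reading");
            writer.WriteAttributeString("id", id);
            writer.WriteElementString("value", value);
            writer.WriteEndElement();
        }
        return buffer.ToString();
    }
}
```

Both methods escape special characters in the values, which is the main thing plain string concatenation would not do for you.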

Do LINQ to XML queries support IntelliSense in C#?

In my project I am using an XML file for data storage. I am accessing that file with LINQ to XML queries. I created the XML file from my SQL Server database, but as the tables in SQL contained more than 50 columns, the resulting XML file also has more than 50 elements...
Now, when applying queries, I first load the XML file into an XDocument object and then run the queries against that.
My main problem is that, as it contains more than 50 elements, it is very difficult to write queries without IntelliSense support. Why is IntelliSense not supported? What have I done wrong? What can I do to get IntelliSense support?
LINQ to XML is based on strings and it isn't confined to documents that follow some schema. That's the reason you don't get IntelliSense, VS has no information about the schema.
If this is really important for you, maybe using something like xsd.exe to generate classes that represent the schema would be better for you.
It's not possible to get IntelliSense for LINQ to XML.
This is because you load a file at runtime and you expect it to have compile-time IntelliSense. What if you loaded a different file at runtime; would you then get a compile-time error?
What you could do is generate classes from your XML file and then deserialize your XML file into these classes. Then you can use LINQ to Objects to access the data.
Here is some documentation for creating your classes.
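As a rough sketch of the deserialize-then-query approach: the classes below are hand-written stand-ins for what a generator such as xsd.exe would produce, and the names CustomerList, Customer, Name and City are hypothetical. Once the data is in typed objects, member names are known at compile time, so the editor can offer completion on them.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Serialization;

// Hypothetical classes standing in for generated ones.
[XmlRoot("Customers")]
public class CustomerList
{
    [XmlElement("Customer")]
    public List<Customer> Items { get; set; } = new List<Customer>();
}

public class Customer
{
    public string Name { get; set; }
    public string City { get; set; }
}

public static class CustomerQueries
{
    private static readonly XmlSerializer Serializer = new XmlSerializer(typeof(CustomerList));

    public static CustomerList Load(TextReader input)
    {
        return (CustomerList)Serializer.Deserialize(input);
    }

    // LINQ to Objects over typed properties instead of string element names.
    public static List<string> NamesInCity(CustomerList list, string city)
    {
        return list.Items
            .Where(c => c.City == city)
            .Select(c => c.Name)
            .ToList();
    }
}
```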

Read DTD or Schema and list all valid child elements or attributes for a given element

I want to develop an application something like XML editor.. providing intellisense like feature when user types an element, the application will read the DTD or schema and list the valid child elements and attributes (something like Oxygen XML Editor).
Is there an API that i can get this done?
I'm not familiar with an API that performs this task.
If you choose to implement this yourself, however, here's a couple of thoughts:
An XML schema is itself an XML file, that is structured according to the meta-schema. You can easily use one of the existing APIs to unmarshal a schema into an object structure that you can easily work with in-memory.
A DTD is not an XML structure, but any DTD can be represented as a simple schema. Therefore you should try and find a way to convert a DTD into a schema (and apply your schema solution).
HTH
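For what it's worth, here is a sketch of the "load the schema into an object structure" idea using .NET's System.Xml.Schema classes. It is deliberately limited: the helper SchemaInspector is hypothetical, it only looks up global elements by name, and it only walks sequence, choice and all groups. A real editor would also need local elements, wildcards and occurrence constraints.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class SchemaInspector
{
    public static XmlSchemaSet LoadSchema(string xsdText)
    {
        var set = new XmlSchemaSet();
        set.Add(null, XmlReader.Create(new StringReader(xsdText)));
        set.Compile(); // populates the post-compilation properties used below
        return set;
    }

    // Names of child elements allowed under a global element.
    public static List<string> ChildElements(XmlSchemaSet set, string name, string ns = "")
    {
        var result = new List<string>();
        var type = FindComplexType(set, name, ns);
        if (type != null) Collect(type.ContentTypeParticle, result);
        return result;
    }

    // Names of attributes allowed on a global element.
    public static List<string> Attributes(XmlSchemaSet set, string name, string ns = "")
    {
        var result = new List<string>();
        var type = FindComplexType(set, name, ns);
        if (type != null)
            foreach (XmlSchemaAttribute attribute in type.AttributeUses.Values)
                result.Add(attribute.QualifiedName.Name);
        return result;
    }

    private static XmlSchemaComplexType FindComplexType(XmlSchemaSet set, string name, string ns)
    {
        var element = set.GlobalElements[new XmlQualifiedName(name, ns)] as XmlSchemaElement;
        return element == null ? null : element.ElementSchemaType as XmlSchemaComplexType;
    }

    private static void Collect(XmlSchemaParticle particle, List<string> names)
    {
        var element = particle as XmlSchemaElement;
        if (element != null) { names.Add(element.QualifiedName.Name); return; }

        // Sequence, choice and all groups share this base class.
        var group = particle as XmlSchemaGroupBase;
        if (group == null) return;
        foreach (XmlSchemaObject item in group.Items)
        {
            var child = item as XmlSchemaParticle;
            if (child != null) Collect(child, names);
        }
    }
}
```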
You might find XSD4J useful:
XSD4J is a library to parse XML Schema files into a structure of Java objects, convert those back into an XML DOM tree (and hence plain text) again, and allow for performing several queries on the XSD objects. The library currently supports most real-world features such as simple and complex types, type restrictions and attributes.
