Performance difference between LINQ to XML and XML serialization - C#

I would like to create an XML file (100 lines, 5 namespaces, 30 different tags, and 20 attributes in total). I already have a hardcoded XML example, but I need to write some C# code to generate the XML dynamically and fill in the values, which of course can change. Performance is a concern.
Should I use LINQ to XML, create all the tags with XDocument and XElement, and provide variables that contain the dynamic values?
Or, since I already have an XML example, create a schema (.xsd) and provide the values to the object for serialization?
The XML (the object stream) will be sent via HTTP POST to a web service every second.
I am going to time both versions, but I was just curious whether someone has already done this.

The LINQ to XML version should have better performance than XmlSerializer-based serialization.
If you want to optimize even further, you could consider direct string concatenation, but that's not a best practice and the performance gain won't be significant.
The next option, performance-wise, is XmlWriter (XmlTextWriter). It is probably the fastest way to write XML "correctly": it doesn't need to build an XML object model the way LINQ to XML does, so it should be significantly faster.
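For illustration, a minimal sketch of that writer approach; the element, attribute, and namespace names are placeholders, not your real schema:

    using System.Globalization;
    using System.Text;
    using System.Xml;

    // Build the document with a forward-only XmlWriter; no in-memory tree is created.
    static string BuildOrderXml(string orderId, decimal amount)
    {
        var sb = new StringBuilder();
        var settings = new XmlWriterSettings { Indent = false };
        using (var writer = XmlWriter.Create(sb, settings))
        {
            writer.WriteStartDocument();
            writer.WriteStartElement("ord", "Order", "http://example.com/orders"); // placeholder namespace
            writer.WriteAttributeString("id", orderId);
            writer.WriteElementString("Amount", amount.ToString(CultureInfo.InvariantCulture));
            writer.WriteEndElement();
            writer.WriteEndDocument();
        }
        return sb.ToString();
    }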
You can speed up the serialization approach a bit by caching the XmlSerializer instance instead of creating it every time. It will then also be relatively fast, though still slower than direct XML writes.
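A sketch of the cached-serializer idea; Order stands in for a hypothetical class generated from your .xsd:

    using System.IO;
    using System.Xml.Serialization;

    // Order stands in for the class generated from your schema (e.g. via xsd.exe).
    public class Order
    {
        public string Id { get; set; }
        public decimal Amount { get; set; }
    }

    public static class OrderSerializer
    {
        // Create the XmlSerializer once and reuse it; constructing it on every call
        // triggers runtime assembly generation and is the usual hot spot.
        private static readonly XmlSerializer Serializer = new XmlSerializer(typeof(Order));

        public static string Serialize(Order order)
        {
            using (var writer = new StringWriter())
            {
                Serializer.Serialize(writer, order);
                return writer.ToString();
            }
        }
    }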

Related

Fastest efficient way to create XML document .NET/Oracle and return to web client

Does anyone know which is the fastest, most efficient way to create XML documents in an Oracle/.NET environment?
There are two philosophies:
Do the coding in an Oracle package and use Oracle's native XML abilities to return an XML document: after querying the data from the DB, create the document by looping through the query result and setting nodes, like so: addXMLNode(doc, nContact, 'ROW_ID', rec_con.ROW_ID);
Just query the data from Oracle, loop through the data reader, and create the XML document using the .NET XML classes. Essentially, the DB only serves the data, and the DOM/XML creation is done in .NET (a rough sketch follows).
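A rough illustration of the second option (element and column names are placeholders; rdr is any open data reader):

    using System.Data;
    using System.Xml;

    // Stream rows from the data reader straight into an XmlWriter.
    static void WriteContacts(IDataReader rdr, XmlWriter writer)
    {
        writer.WriteStartElement("CONTACTS");
        while (rdr.Read())
        {
            writer.WriteStartElement("CONTACT");
            writer.WriteElementString("ROW_ID", rdr["ROW_ID"].ToString());
            writer.WriteEndElement();
        }
        writer.WriteEndElement();
    }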
Assuming no difference in knowledge between the two practices, does anyone know whether one is more efficient, faster, or better than the other? Please don't give me your "favorite" way to handle it. A real-world example, such as "our query was slow so we moved it into the code" or vice versa, would give me some direction for code refactoring and application performance improvement.
Thanks.

Extracting a small subset of data from XMLs

I am writing a C# / VB program that is to be used for reporting data based upon information received in XMLs.
My situation is that I receive many XMLs per month (about 100-200), each ranging in size from 10 MB to 350 MB. From each of these XMLs I only need a small subset of its data (less than 5% of any one file) in order to produce the necessary reports.
Also, that subset of data will always be held in the same key structure (it may exist within multiple keys and at differing levels down, but it will always exist under the same key names, and the keys containing it will always have the same attributes, such as "name", etc.).
So, my current idea of how to go about doing this is to:
To create a "scraper" that will pull the necessary data from the XMLs using XPath.
Store that small subset of necessary data in a SQL Server table along with file characteristic data stored in a separate table so as to know which file this scraped data came from
Query out the data into a program for reporting it.
My main question here is really what is the best way to scrape that data out?
I am most familiar with XPath, but with multiple files of 200 MB in size, I'm afraid of the performance hit of loading each entire file into memory.
Other things I have seen / researched are:
Creating an XSLT file to transform / pull from the XML only the data I want
Using Linq to XML
Somehow linking the XMLs to SQL Server and then being able to query them directly
Using ADO to query the XMLs from within the program
Doing it using the XMLReader class (rather than loading in each XML entirely)
Maybe there is a native .NET component that does this very well already
Quite honestly, I just have no clue what the standard approach is, given the high number of XMLs and the large variance in file sizes. I'm also not familiar with some of the other options - such as linking the XMLs to SQL Server directly or using ADO to query the XMLs - and therefore don't know their possible benefits and drawbacks.
If any of you have been in a similar situation, I'd really appreciate any kind of pointers in the right direction / at least validation that my method isn't the worst one out there :)
Thanks!!!
As for the memory-consumption and performance concerns, a nice feature of the .NET XML APIs is that you can combine XmlReader with XPathDocument, XmlDocument, or XElement to selectively read only part of a document into memory and then have the XPath or LINQ to XML features available on that part. LINQ to XML has XNode.ReadFrom (http://msdn.microsoft.com/en-us/library/system.xml.linq.xnode.readfrom%28v=vs.110%29.aspx) for doing that; DOM/XmlDocument has ReadNode (http://msdn.microsoft.com/en-us/library/system.xml.xmldocument.readnode%28v=vs.110%29.aspx). So, depending on your XML structure, you might be able to use an XmlReader to read forward through the XML quickly without consuming much memory and then, when you reach an element you are interested in, read it into an XElement (LINQ to XML) or XmlNode (DOM) and apply LINQ to XML and/or XPath to it to read out the details.
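A minimal sketch of that combination, assuming the interesting elements are called "Record" (adjust the names to your actual key structure):

    using System.Collections.Generic;
    using System.Xml;
    using System.Xml.Linq;

    // Read forward with XmlReader and only materialize the matching elements as XElements.
    static IEnumerable<XElement> StreamRecords(string path)
    {
        using (var reader = XmlReader.Create(path))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "Record")
                {
                    // ReadFrom consumes the element and leaves the reader positioned after it.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }

    // usage: foreach (var rec in StreamRecords(file)) { var name = (string)rec.Attribute("name"); ... }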

Parsing XML with DataSet - Performance

As per my requirements, I want to display some product-related information in my UI.
All the information comes through one API URL. The API returns XML output. The XML may have more than 100 tags, but I only need 30 to 50 of them. I need to pass a parameter as input and get the product information back.
I am using an .asmx service as a wrapper service, and all the parsing is done there.
In the code-behind page, I consume the service and display the information.
How should I parse the XML? Currently I plan to read the XML into a DataSet (ds.ReadXml(xml)).
Does that affect performance? Is there another way to do it? Please guide me.
If you want to bind the result to a control, then the DataSet approach you indicated makes sense. However, if you just need the text values of those 30-50 tags, regardless of which parent/child nodes sit in between, you can use XmlDocument/XPath.
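A small sketch of the XmlDocument/XPath route (the XPath expression and names are placeholders):

    using System.Xml;

    // Load the API response once, then pull just the tag values you need with XPath.
    static string GetProductName(string xmlFromApi)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xmlFromApi);
        XmlNode node = doc.SelectSingleNode("//Product/Name"); // placeholder XPath
        return node != null ? node.InnerText : null;
    }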
I would use LINQ to XML
more info at
http://msdn.microsoft.com/en-us/library/bb387098.aspx
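For example, a rough sketch of what that could look like for a couple of the product tags (element names are placeholders):

    using System.Linq;
    using System.Xml.Linq;

    // Parse the response and project only the tags you need.
    static void ReadProducts(string xmlFromApi)
    {
        XDocument doc = XDocument.Parse(xmlFromApi);
        var products = from p in doc.Descendants("Product")     // placeholder element name
                       select new
                       {
                           Name = (string)p.Element("Name"),
                           Price = (decimal?)p.Element("Price")
                       };
        // bind 'products' to your UI control or map it to your DTOs
    }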
For older versions of the framework, use the XmlTextReader.
Use the XmlTextReader class to process large XML documents in an efficient, forward-only manner. XmlTextReader uses small amounts of memory.
Avoid using the DOM because the DOM reads the entire XML document into memory. If the entire XML document is read into memory, the scalability of your application is limited. Using XmlTextReader in combination with an XmlTextWriter class permits you to handle much larger documents than a DOM-based XmlDocument class.
http://msdn.microsoft.com/en-us/library/ff647804.aspx
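A brief sketch of that forward-only style (the tag name and input stream are placeholders):

    using System.IO;
    using System.Xml;

    // Forward-only read: only the current node is held in memory.
    static void ReadNames(Stream xmlStream)
    {
        using (var reader = new XmlTextReader(xmlStream))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "Name") // placeholder tag
                {
                    string value = reader.ReadString();
                    // use value ...
                }
            }
        }
    }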

Overriding or ignoring undeclared entities in C# using LINQ

I have a little utility that runs through looking for certain things in XML files using LINQ. It processes a MASSIVE collection of them rather quickly and nicely. However, about 20% of a certain batch of files fail to be read and are skipped because of the degree symbol's presence as &deg; in the files. This is the "Reference to undeclared entity 'deg'." error that a previous question was about.
The solutions offered in the previous question cannot be directly applied here. I am not at liberty to go around modifying the files, and making copies of them and replacing instances or inserting tags in the copies seems inefficient. What would be the best way to go about getting LINQ to ignore the undeclared entities, which have absolutely no bearing on what my program does anyway? Or is there perhaps a good way of getting an XDocument.Load to be fed some entity declarations beforehand?
Unfortunately, entities form part of the well-formedness rules for XML (section 2.1, Well-Formed XML Documents). It seems like you're saying you want XDocument.Load to load what is notionally an XML file but does not in fact conform to those rules, which it won't do, quite reasonably.
If your users are passing you what are supposed to be XML files but that have undefined entities, then either you have to get them to provide the files in a valid format, or you have to manage the incorrectness yourself at load time, in the ways that have been suggested.
It seems to me, from your restrictions, that the neatest approach would be to follow the linked example and create some settings to pass into the XmlReader, along the lines of "Validating an XML Document in the DOM".
If there are entities which aren't defined and aren't listed in public schemas, you'll need to create your own schema which defines all the entities you need. So, create generic settings for the XmlReader that reference your own custom schema. Add the necessary entities to this schema as files fail to load, and you'll gradually build up a list of all the entities you need to define in order for the XML files to be valid.
Then, for each document you try to load, create an XmlReader for the file using the settings above and call the XDocument.Load(XmlReader) overload.
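A sketch of that reader-plus-settings idea, declaring the missing entity in an internal DTD subset supplied through an XmlParserContext. This assumes the files do not already carry a DOCTYPE of their own; "data" and the entity list are placeholders you would grow as files fail to load:

    using System.Xml;
    using System.Xml.Linq;

    static XDocument LoadWithEntities(string path)
    {
        var settings = new XmlReaderSettings { DtdProcessing = DtdProcessing.Parse };
        var context = new XmlParserContext(
            null, null,
            "data",                           // doctype name - use your files' root element
            null, null,
            "<!ENTITY deg \"&#176;\">",       // add further <!ENTITY ...> declarations as needed
            null, null, XmlSpace.None);

        using (var reader = XmlReader.Create(path, settings, context))
        {
            return XDocument.Load(reader);
        }
    }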

Problem with huge data

I have a WCF service which reads data from an XML file. The data in the XML changes every minute.
This XML is very big: it has about 16k records. Parsing it takes about 7 seconds, which is definitely too long.
Right now it works this way:
ASP.NET call WCF
WCF parse xml
ASP.NET is waiting for WCF callback
WCF gives back data to ASP.NET
Of course there is caching for 1 minute, but after that WCF must load the data again.
Is there any way to refresh the data without blocking the site? Something like... I don't know, double buffering? Something that returns the old data while there is no new data yet? Maybe you know a better solution?
Best regards
EDIT:
the statement which takes the longest time:
XDocument doc = XDocument.Load(XmlReader.Create(uri)); // takes 7 sec.
Parsing takes 70 ms, which is okay, so that is not the problem. Is there a better solution that doesn't block the website? :)
EDIT2:
OK, I have found a better solution. Simply, I download the XML to the HDD and read the data from it; then another process downloads a new version of the XML and replaces the old one (see the sketch below). Thanks for the engagement.
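For anyone interested, a rough sketch of that refresh-in-the-background, swap-when-ready idea (the URI and interval are placeholders; error handling is deliberately minimal):

    using System;
    using System.Threading;
    using System.Xml.Linq;

    public static class XmlCache
    {
        private const string Uri = "http://example.com/data.xml";   // placeholder

        // Callers always read the last successfully loaded copy.
        private static volatile XDocument _current = XDocument.Load(Uri);

        private static readonly Timer _refresh = new Timer(_ =>
        {
            try
            {
                var fresh = XDocument.Load(Uri);   // the slow ~7 s load happens off the request path
                _current = fresh;                  // reference assignment is atomic
            }
            catch
            {
                // keep serving the old copy if the refresh fails
            }
        }, null, TimeSpan.FromMinutes(1), TimeSpan.FromMinutes(1));

        public static XDocument Current
        {
            get { return _current; }
        }
    }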
You seem to have an XML-to-object tool that creates an object model from the XML.
What usually takes most of the time is not the parsing but the creation of all these objects to represent the data.
So you might want to extract only the part of the XML data you need, which will be faster, rather than systematically creating a big object tree just to use a small part of it.
You could use XPath to extract the pieces you need from the XML file, for example (see the sketch below).
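For instance, a minimal sketch using the read-only XPathDocument (the file path and XPath are placeholders):

    using System.Xml.XPath;

    // XPathDocument is a lightweight, read-only store optimised for XPath queries.
    static void ReadValues(string pathToXml)
    {
        var document = new XPathDocument(pathToXml);
        XPathNavigator nav = document.CreateNavigator();
        XPathNodeIterator it = nav.Select("/root/record/value");   // placeholder XPath
        while (it.MoveNext())
        {
            string value = it.Current.Value;
            // use value ...
        }
    }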
In the past I have used a nice XML parsing tool that focuses on performance: vtd-xml (see http://vtd-xml.sourceforge.net/).
It supports XPath and other XML Tech.
There is a C# version. I have used the Java version but I am sure that the C# version has the same qualities.
LINQ to XML is also a nice tool and it might do the trick for you.
It all depends on your database design. If you designed the database in a way that lets you recognize which data has already been queried, then for each new query you can return only the records that changed between the last query time and the current time.
Maybe you could add a rowstamp to each record and update it on every add/edit/delete action; then you can easily implement the logic described at the beginning of this answer.
Also, if you don't want the first call to take long (when the initial data has to be collected), think about storing that data locally.
Use something other than XML (like JSON). If you have a lot of XML overhead, try replacing long element names with something shorter (like single-character element names).
Take a look at this:
What is the easiest way to add compression to WCF in Silverlight?
Create JSON from C# using JSON Library
If you take a few stackshots, it might tell you that the biggest "bottleneck" is not parsing, but data structure allocation, initialization, and subsequent garbage collection. If so, a way around it is to have a pool of pre-allocated row objects and re-use them.
Also, if each item is appended to the list, you might find it spending a large fraction of time doing the append. It might be faster to simply push each new row on the front, and then reverse the whole list at the end.
(But don't implement these things unless you prove they are problems by stackshots. Until then, they are just guesses.)
It's been my experience that the real cost of XML is not the parsing, but the data structure manipulation.
