Parsing XML with DataSet- Performance

I need to display some product-related information in my UI.
All of the information comes from a single API URL, which returns XML. The XML may contain more than 100 tags, but I need only 30 to 50 of them. I pass a parameter as input and get the product information back.
I am using an .asmx service as a wrapper service, and all of the parsing is done there.
In the code-behind page, I consume the service and display the information.
How should I parse the XML? My current plan is to load the XML into a DataSet with ds.ReadXml(XML).
Does that affect performance? Is there another way to do it? Please guide me.

If you want to bind the result to a control, then the DataSet approach you describe makes sense. However, if you only need the text values of those 30 to 50 tags, regardless of which parent/child nodes lie in between, you can use XmlDocument with XPath.
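A minimal sketch of the XmlDocument/XPath route, assuming the wanted tags can each be addressed by one XPath expression. The class name ProductXmlExtractor and the element names in the usage note are made up for illustration; they are not from the question.

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper: pulls only the requested nodes out of an XML string.
public static class ProductXmlExtractor
{
    // Returns the inner text for each XPath, or null when no node matches.
    public static Dictionary<string, string> Extract(string xml, IEnumerable<string> xpaths)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml); // loads the whole document into memory (DOM)

        var values = new Dictionary<string, string>();
        foreach (string xpath in xpaths)
        {
            XmlNode node = doc.SelectSingleNode(xpath);
            values[xpath] = node?.InnerText;
        }
        return values;
    }
}
```

For example, Extract("<Product><Name>Widget</Name><Price>9.99</Price><Stock>4</Stock></Product>", new[] { "/Product/Name", "/Product/Price" }) returns only the two requested values and ignores Stock. This still builds the full DOM, so it suits documents of moderate size.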

I would use LINQ to XML.
More info at
http://msdn.microsoft.com/en-us/library/bb387098.aspx
For older versions of the framework, use XmlTextReader.
Use the XmlTextReader class to process large XML documents in an efficient, forward-only manner. XmlTextReader uses small amounts of memory.
Avoid using the DOM, because the DOM reads the entire XML document into memory, which limits the scalability of your application. Using XmlTextReader in combination with the XmlTextWriter class lets you handle much larger documents than the DOM-based XmlDocument class.
http://msdn.microsoft.com/en-us/library/ff647804.aspx
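A rough sketch of the forward-only approach, using XmlReader.Create rather than constructing XmlTextReader directly (a swap on my part; the reading model is the same). It assumes the wanted elements contain only text, because ReadElementContentAsString throws on child elements. The names ForwardOnlyReader and ReadWantedElements are hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper: collects the text of selected elements without building a DOM.
public static class ForwardOnlyReader
{
    public static List<KeyValuePair<string, string>> ReadWantedElements(
        TextReader input, ISet<string> wantedNames)
    {
        var results = new List<KeyValuePair<string, string>>();
        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent(); // skip the XML declaration, land on the root element
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && wantedNames.Contains(reader.LocalName))
                {
                    string name = reader.LocalName;
                    // Reads the text and advances past the end tag,
                    // so no extra Read() call is needed in this branch.
                    string value = reader.ReadElementContentAsString();
                    results.Add(new KeyValuePair<string, string>(name, value));
                }
                else
                {
                    reader.Read();
                }
            }
        }
        return results;
    }
}
```

Only the current node is held in memory at any time, so memory use stays roughly flat regardless of document size.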

Related

Is it better to have SQL Server parse a large multi document XML or send it each document separately

I need to request the XML from PubMed like
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=27087788,28322247,26158412&retmode=xml
The example has 3 IDs, but a request can contain as many as 200 at a time. The request is made by a .NET web service. I am looking for the most efficient way to process the XML files. I know that the term "best" or "efficient" is very subjective and depends on many things, but:
Is it better to send the entire string to the SQL Server database (if that is even possible, given the length or possible nesting levels) and let it parse the document and save it to the database, or is it better to parse the document in the web service using an XmlTextReader or an XML document object and send each document separately? Each document needs to be saved as a separate record.
Thanks for your information.

My first thought was: Why SQL Server? Why send all this data around? Do the parsing in C#!
But on second thought: if I understand this correctly, you want to read many different XML files and store them in your database.
Now I'd rather ask: When do you need to retrieve data from these XMLs, and do you need to store the extracted data in relational tables? Would it be possible for you to store all these XMLs as-is in XML-typed columns and read them on demand?
You can pass your XML as a C# string (which is Unicode) and insert it directly into an XML-typed column. To avoid any hassle, you should cut away the first lines (the XML declaration and the DOCTYPE) and start with <PubmedArticleSet>.
The rest should be easy to transfer and store in SQL Server.
If you need help reading this data, just come back with another, more pointed question.
As for your "which is faster" question, you might read this.

Extracting a small subset of data from XMLs

I am writing a C# / VB program that is to be used for reporting data based upon information received in XMLs.
My situation is that I receive many XMLs per month (about 100-200) - Each ranging in size from 10mb to 350mb. For each of these XMLs, I only need a small subset of its data (less than 5% of any one file's entire data) so as to produce the necessary reports.
Also, that subset of data will always be held in the same key structure (it may exist within multiple keys and at differing levels, but it will always exist within the same key names, and the keys containing it will always have the same attributes, such as "name").
So, my current idea is to:
Create a "scraper" that will pull the necessary data from the XMLs using XPath.
Store that small subset of necessary data in a SQL Server table along with file characteristic data stored in a separate table so as to know which file this scraped data came from
Query out the data into a program for reporting it.
My main question here is really what is the best way to scrape that data out?
I am most familiar with XPath, but for multiple files of 200MB in size, I'm afraid of performance issues loading in the entire file.
Other things I have seen / researched are:
Creating an XSLT file to transform / pull from the XML only the data I want
Using Linq to XML
Somehow linking the XMLs to SQL server and then being able to query them directly
Using ADO to query the XMLs from within the program
Doing it using the XMLReader class (rather than loading in each XML entirely)
Maybe there is a native .Net component that does this very well already
Quite honestly, I just have no clue what the standard is given the high number of XMLs and the large variance in file sizes and I'm not familiar with any of the other ways of doing this - such as, for example, linking the XMLs to SQL Server directly / using ADO to query the XML - and, therefore, don't know of their possible benefits / drawbacks.
If any of you have been in a similar situation, I'd really appreciate any kind of pointers in the right direction / at least validation that my method isn't the worst one out there :)
Thanks!!!

As for the memory consumption and performance concerns, a nice feature of the .NET XML APIs is that you can combine XmlReader with XPathDocument, XmlDocument, or XElement to selectively read only part of a document into memory and then have the XPath or LINQ to XML features available on that part. LINQ to XML has XNode.ReadFrom (http://msdn.microsoft.com/en-us/library/system.xml.linq.xnode.readfrom%28v=vs.110%29.aspx) for doing that, and DOM/XmlDocument has XmlDocument.ReadNode (http://msdn.microsoft.com/en-us/library/system.xml.xmldocument.readnode%28v=vs.110%29.aspx). So, depending on your XML structure, you might be able to use an XmlReader to read forward through the XML quickly without consuming much memory. Then, when you reach an element you are interested in, you can read it into an XElement (LINQ to XML) or XmlNode (DOM) and apply LINQ to XML and/or XPath to read out the details.
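Here is a small sketch of that combination with XNode.ReadFrom, assuming the interesting data lives in repeated elements with a known local name. The class and method names (XmlStreaming, StreamElements) and the Order/Total elements in the usage note are invented for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper: streams matching elements one at a time as XElement.
public static class XmlStreaming
{
    public static IEnumerable<XElement> StreamElements(TextReader input, string elementName)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == elementName)
                {
                    // ReadFrom materializes just this element (and its subtree)
                    // and leaves the reader on the node after its end tag.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }
}
```

Each yielded XElement supports the usual LINQ to XML queries, for example StreamElements(File.OpenText(path), "Order").Sum(o => (int)o.Element("Total")), while only one Order subtree is in memory at a time. Picking an element name close to the root defeats the purpose, since its subtree would then hold most of the document.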

Performance difference between linq to xml and xml serialization

I would like to create an XML file (100 lines, 5 namespaces, 30 different tags, and 20 attributes in total). I already have a hardcoded XML example, but I need to write some C# code to generate the XML dynamically and fill in the values, which of course can change. Performance is a concern.
Should I use LINQ to XML, creating all the tags with XDocument and XElement and supplying variables that contain the dynamic values?
Or, since I already have an XML example, should I create a schema.xsd and provide the values to the object?
The XML (the object stream) will be sent via HTTP POST every second to a web service.
I am going to time both versions, but I was curious whether someone has already done that.

The LINQ to XML version should have better performance.
If you want to optimize it even more, you could consider direct string concatenation (but that's not a best practice, and the performance gain won't be significant).
The next option is XmlTextWriter. It is probably the fastest way to write XML "correctly": it doesn't need to build an XML object model the way LINQ to XML does, so it should be significantly faster.
You can optimize serialization a bit if you cache the XmlSerializer instance instead of creating it every time. Then it will also be relatively fast, though definitely slower than direct XML writes.
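To illustrate the direct-write option, here is a minimal sketch using XmlWriter.Create (the factory method, used here in place of constructing XmlTextWriter directly). The element and attribute names are placeholders, not the asker's actual schema, and OrderXmlWriter is a hypothetical name.

```csharp
using System.Globalization;
using System.Text;
using System.Xml;

// Hypothetical example: writes a tiny document straight to a string, with no object model.
public static class OrderXmlWriter
{
    public static string Write(string id, decimal amount)
    {
        var sb = new StringBuilder();
        // The declaration is omitted because a StringBuilder target would declare UTF-16.
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            writer.WriteStartElement("order");
            writer.WriteAttributeString("id", id); // special characters are escaped by the writer
            writer.WriteElementString("amount", amount.ToString(CultureInfo.InvariantCulture));
            writer.WriteEndElement();
        }
        return sb.ToString();
    }
}
```

Unlike string concatenation, the writer handles escaping and well-formedness, which is why it counts as writing XML "correctly". Whether it beats LINQ to XML by a margin that matters for a 100-line document sent once per second is something only a measurement can tell.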

problem with huge data

I have a WCF service which reads data from XML. The data in the XML changes every minute.
This XML is very big; it has about 16k records. Parsing it takes about 7 seconds, so it's definitely too long.
Right now it works this way:
ASP.NET calls WCF
WCF parses the XML
ASP.NET waits for the WCF callback
WCF returns the data to ASP.NET
Of course there is caching for 1 minute, but after that WCF must load the data again.
Is there any way to refresh the data without stopping the site? Something like, I don't know, double buffering, which would return the old data if there is no new data yet? Maybe you know a better solution?
Best regards
EDIT:
The statement which takes the longest time:
XDocument doc = XDocument.Load(XmlReader.Create(uri)); // takes 7 sec.
Parsing takes 70 ms, which is okay, so that is not the problem. Is there a better solution that doesn't block the website? :)
EDIT2:
OK, I have found a better solution. I simply download the XML to the hard disk and read the data from it. Another process then downloads the new version of the XML and replaces the old one. Thanks for your engagement.
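The double-buffering idea from the question can also be done in memory. This is only a sketch under assumptions: SnapshotCache is a hypothetical class, the loader delegate stands in for the slow XDocument.Load call, and something like a System.Threading.Timer is assumed to call Refresh once a minute.

```csharp
using System;
using System.Threading;

// Hypothetical cache: readers always see the last good snapshot while a new one loads.
public sealed class SnapshotCache<T> where T : class
{
    private readonly Func<T> _load;
    private T _current;

    public SnapshotCache(Func<T> load)
    {
        _load = load;
        _current = load(); // the very first load still blocks
    }

    // Never blocks on a reload; returns whatever snapshot was published last.
    public T Current => Volatile.Read(ref _current);

    // Intended to run on a background thread or timer callback.
    public void Refresh()
    {
        try
        {
            T fresh = _load();                   // slow part, done off to the side
            Volatile.Write(ref _current, fresh); // publish by swapping the reference
        }
        catch (Exception)
        {
            // Load failed: keep serving the old snapshot.
        }
    }
}
```

Requests read Current and are never held up by the 7-second load; they just see data that is at most one refresh interval old. Swallowing the exception silently is a simplification, and a real service would log it.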

You seem to have an XML-to-object tool that creates an object model from the XML.
What usually takes most of the time is not the parsing but creating all those objects to represent the data.
So you might want to extract only part of the XML data, which will be faster, rather than systematically creating a big object tree just to extract part of it.
You could use XPath to extract the pieces you need from the XML file, for example.
In the past I have used a nice XML parsing tool that focuses on performance. It is called vtd-xml (see http://vtd-xml.sourceforge.net/).
It supports XPath and other XML technologies.
There is a C# version. I have used the Java version, but I am sure that the C# version has the same qualities.
LINQ to XML is also a nice tool, and it might do the trick for you.

It all depends on your database design. If you designed the database so that you can recognize which data has already been queried, then for each new query return only the records that changed between the last query time and the current time.
Maybe you could add a rowstamp to each record and update it on each add/edit/delete action; then you can easily implement the logic described at the beginning of this answer.
Also, if you don't want the first call to take long (when the initial data has to be collected), think about storing that data locally.
Use something other than XML (like JSON). If you have a big XML overhead, try replacing long element names with something shorter (like single-character element names).
Take a look at these:
What is the easiest way to add compression to WCF in Silverlight?
Create JSON from C# using JSON Library

If you take a few stackshots, it might tell you that the biggest "bottleneck" is not parsing, but data structure allocation, initialization, and subsequent garbage collection. If so, a way around it is to have a pool of pre-allocated row objects and re-use them.
Also, if each item is appended to the list, you might find it spending a large fraction of time doing the append. It might be faster to simply push each new row on the front, and then reverse the whole list at the end.
(But don't implement these things unless you prove they are problems by stackshots. Until then, they are just guesses.)
It's been my experience that the real cost of XML is not the parsing, but the data structure manipulation.

LINQ How to search data from large XML file?

I have an XML file of about 500 MB and I'm using LINQ with C# to query it, but it's very slow because it loads everything into memory. Is there any way I can query the file without loading it all into memory?
Thanks

This article should get you up and running. Take a look at the SimpleStreamAxis method, which is very handy for finding nodes in large XML files. I've successfully used a variant of this method on 5GB XML files without loading the file into memory.

You can use the technique described on MSDN's page about XNode.ReadFrom to generate an IEnumerable of XNodes (in the example they provide, XElements) from an XmlReader.
Note that when you read an XElement from a Stream or XmlReader, the entire contents of that element must be read too, so you'll still need a little bit of custom logic in the enumerator to ensure that the right XElements get returned. For instance, if you return the root element, you might as well parse the entire document right away, since the root element contains almost everything anyhow. The XNode.ReadFrom example contains such logic too.

No, it's not possible when using LINQ. LINQ loads a model of the full XML into memory so that you can access it through the tree structure.
If you want fast access without loading the file into memory, you could use the XmlReader class.
This class gives you a fast, forward-only XML parser that keeps only the current node in memory.
Here is some help on that: http://support.microsoft.com/kb/307548
Edit: Sorry, I didn't know that it's possible to combine XmlReader with LINQ.
