I'm looking for some advice on how to approach a solution. I have an import to write in C#. The data comes from an XML file containing ~30000 records, each with ~10 nodes for different data. My initial thought was to create a node list of record IDs (one of the nodes is a unique ID), then loop through the node list and use XPath to get the rest of the data for each record. My other thought was to convert the XML file into .csv format and read it that way. Before I dive head first into one or the other, any advice, pros/cons or suggestions? Thanks in advance
Go with whichever you feel more comfortable with.
Personally, I would use XDocument and LINQ to XML to query the XML directly.
Transforming to CSV has its own pitfalls if you don't adhere to the rules (quoting fields, line breaks within fields, etc.).
I agree with the above poster that you want to use LINQ to XML if possible. However, if you are on an older version of the framework, you could use an XmlDocument and its SelectNodes/SelectSingleNode methods. If you do that, make sure you pass an XmlNamespaceManager, or your queries won't return anything unless your XML has no namespaces.
That got me a bunch of times.
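To make the two suggestions above concrete, here is a minimal sketch of both. The element names (records, record, id, name), the Record class and the namespace handling are assumptions for illustration, since the question does not show the actual file layout.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class RecordImport
{
    // Hypothetical record shape; the real file has ~10 child nodes per record.
    public sealed class Record
    {
        public string Id { get; set; }
        public string Name { get; set; }
    }

    // LINQ to XML: reads every <record> in one pass, with no per-record XPath lookup.
    public static List<Record> Parse(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        // Element names must be qualified with the default namespace if the file
        // declares one; this is XNamespace.None when it does not.
        XNamespace ns = doc.Root.GetDefaultNamespace();
        return doc.Root
            .Elements(ns + "record")
            .Select(r => new Record
            {
                Id = (string)r.Element(ns + "id"),
                Name = (string)r.Element(ns + "name")
            })
            .ToList();
    }

    // XmlDocument: XPath needs a prefix bound to the namespace URI, even when
    // the file itself uses a default (unprefixed) namespace.
    public static List<string> SelectIds(string xml, string namespaceUri)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        var nsmgr = new XmlNamespaceManager(doc.NameTable);
        nsmgr.AddNamespace("x", namespaceUri); // "x" is an arbitrary prefix
        var ids = new List<string>();
        foreach (XmlNode node in doc.SelectNodes("/x:records/x:record/x:id", nsmgr))
        {
            ids.Add(node.InnerText);
        }
        return ids;
    }
}
```

For a real file you would call XDocument.Load(path) or XmlDocument.Load(path) instead of parsing a string. At ~30000 small records either approach should fit comfortably in memory.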
Related
I would like to write a function to compare two XML files. The differences could be due to:
the order of the nodes, and
nodes or attributes that have been added or removed.
I have found a way to traverse the XML using a recursive function or LINQ to XML, so basically I can get all nodes and attributes. I have read about the XML Diff and Patch Tool, but I'm trying to avoid adding dependencies to my project. An added complexity is determining on which line a difference occurred, but this is optional for now.
I'm currently thinking of storing the nodes and attributes of the two XML files in a data structure (e.g. a dictionary) and comparing the dictionaries afterwards, but I'm not sure how to do this. Can you share some ideas?
There is XNode.DeepEquals (https://msdn.microsoft.com/en-us/library/system.xml.linq.xnode.deepequals%28v=vs.110%29.aspx), which you can call as XNode.DeepEquals(XDocument.Load("file1.xml"), XDocument.Load("file2.xml")), but that will only give you a boolean result, not an indication of where the difference is.
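If you want to go the dictionary route from the question, one possible sketch is to flatten each document into "path=value" entries and compare how often each entry occurs, which makes the comparison insensitive to sibling order. The class and method names here are made up for illustration, and the approach is deliberately simple: it does not tie an attribute to one particular element among same-named siblings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XmlDiffSketch
{
    // Flattens a document into "path" / "path=value" entries and counts each one,
    // so sibling order does not matter and repeated nodes are still tracked.
    public static Dictionary<string, int> Flatten(XDocument doc)
    {
        var counts = new Dictionary<string, int>();
        foreach (XElement e in doc.Descendants())
        {
            string path = "/" + string.Join("/",
                e.AncestorsAndSelf().Reverse().Select(a => a.Name.ToString()));
            // Leaf elements are recorded together with their text content.
            Add(counts, e.HasElements ? path : path + "=" + e.Value);
            foreach (XAttribute a in e.Attributes().Where(x => !x.IsNamespaceDeclaration))
            {
                Add(counts, path + "/@" + a.Name + "=" + a.Value);
            }
        }
        return counts;
    }

    private static void Add(Dictionary<string, int> counts, string key)
    {
        counts.TryGetValue(key, out int n);
        counts[key] = n + 1;
    }

    // Lists entries whose counts differ: "- " means it occurs more often in left,
    // "+ " means it occurs more often in right.
    public static List<string> Compare(XDocument left, XDocument right)
    {
        Dictionary<string, int> l = Flatten(left);
        Dictionary<string, int> r = Flatten(right);
        var diffs = new List<string>();
        foreach (string key in l.Keys.Union(r.Keys).OrderBy(k => k, StringComparer.Ordinal))
        {
            l.TryGetValue(key, out int ln);
            r.TryGetValue(key, out int rn);
            if (ln > rn) diffs.Add("- " + key);
            else if (rn > ln) diffs.Add("+ " + key);
        }
        return diffs;
    }
}
```

For the optional line numbers, loading with XDocument.Load(path, LoadOptions.SetLineInfo) makes each element's position available through the IXmlLineInfo interface, which you could store alongside each entry.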
I would like to create an XML file (100 lines, 5 namespaces, 30 different tags and 20 attributes in total). I already have a hardcoded XML example, but I need to write some C# code to generate the XML dynamically and fill in the values, which of course can change. Performance is a concern.
Should I use LINQ to XML and create all the tags with XDocument and XElement, providing variables that contain the dynamic values?
Or, since I already have an XML example, should I create a schema.xsd and provide the values to the object?
The XML (the object stream) will be sent via HTTP POST every second to a web service.
I am going to time both versions, but I was curious whether someone has already done that.
The LINQ to XML version should have better performance.
If you want to optimize it even more you probably should consider direct string concatenation (but that's not a best practice and the performance gain won't be significant).
The next performance option is XmlTextWriter. It is probably the fastest way to write XML "correctly": it doesn't need to build an XML object model the way LINQ to XML does, so it should be significantly faster.
You can optimize serialization a bit if you cache the XmlSerializer instance instead of creating it every time. Then it will also be relatively fast, though definitely slower than direct XML writes.
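As a rough illustration of the direct-writer option, here is a small sketch. It uses XmlWriter.Create, which Microsoft recommends over instantiating XmlTextWriter directly; the namespace URI, element names and attribute are hypothetical stand-ins for the real message format.

```csharp
using System;
using System.Text;
using System.Xml;

public static class ReadingMessage
{
    // Hypothetical namespace; the real document would declare its own five.
    private const string Ns = "urn:example:readings";

    // Writes the XML straight to a buffer without building an object model.
    public static string Build(string sensorId, double value)
    {
        var sb = new StringBuilder();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlWriter w = XmlWriter.Create(sb, settings))
        {
            w.WriteStartElement("r", "reading", Ns);
            w.WriteAttributeString("sensor", sensorId);
            // XmlConvert formats numbers in a culture-independent way.
            w.WriteElementString("r", "value", Ns, XmlConvert.ToString(value));
            w.WriteEndElement();
        }
        return sb.ToString();
    }
}
```

The writer takes care of escaping and namespace declarations, which is the main thing you give up with plain string concatenation. At one small message per second, any of the approaches is likely fast enough, so it is still worth timing them as planned.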
I have a particular requirement to parse SharePoint CAML and produce something else.
The first scenario is to produce an SQL query.
Are there any best-practice dev tools/algorithms for parsing XML? I am thinking of using LINQ to XML as my tool, but I'm not sure whether there is a better approach for this type of parsing.
Another approach I like is the one used by the OpenXML SDK, where they have built a strongly typed engine around the Open XML format. Perhaps I could build something similar, but it could be a little far-fetched.
Any assistance (perhaps previous experience on xml parsing) would be greatly appreciated.
When reading any structure in any format, you have to find the optimal traversal mechanism. In this case, since CAML logical operators are binary (each has exactly two children), a simple recursive binary-tree traversal can be adopted. LINQ to XML is a powerful tool for traversing XML.
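A sketch of that recursive traversal might look like the following. It handles only And/Or and the basic comparison elements (Eq, Neq, Gt, Geq, Lt, Leq), treats every value as a quoted string, and the class name is made up; a real translator would need to honor the Value Type attribute and emit SQL parameters rather than literals.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class CamlToSql
{
    // CAML comparison elements mapped to SQL operators (a partial list).
    private static readonly Dictionary<string, string> Comparisons =
        new Dictionary<string, string>
        {
            { "Eq", "=" }, { "Neq", "<>" }, { "Gt", ">" },
            { "Geq", ">=" }, { "Lt", "<" }, { "Leq", "<=" }
        };

    // Translates the single child of a <Where> element into a SQL predicate.
    public static string Translate(string camlWhere)
    {
        XElement where = XElement.Parse(camlWhere);
        return Visit(where.Elements().Single());
    }

    private static string Visit(XElement node)
    {
        string name = node.Name.LocalName;
        if (name == "And" || name == "Or")
        {
            // Logical operators have exactly two children, so recurse on both.
            List<XElement> kids = node.Elements().ToList();
            if (kids.Count != 2)
                throw new FormatException(name + " needs exactly two children");
            return "(" + Visit(kids[0]) + " " + name.ToUpperInvariant() + " " + Visit(kids[1]) + ")";
        }
        if (Comparisons.TryGetValue(name, out string op))
        {
            string field = (string)node.Element("FieldRef")?.Attribute("Name");
            string value = (string)node.Element("Value");
            if (field == null || value == null)
                throw new FormatException(name + " needs a FieldRef and a Value");
            // Simplistic quoting for illustration only; prefer SQL parameters.
            return "[" + field + "] " + op + " '" + value.Replace("'", "''") + "'";
        }
        throw new NotSupportedException("Unhandled CAML element: " + name);
    }
}
```

Because nesting in CAML is expressed purely through And/Or pairs, the recursion depth follows the nesting of the query, and supporting more operators is mostly a matter of adding cases to Visit.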
I have data in about 5000 XML files and need to search and filter it. It would be wonderful if I could use fuzzy search. I suppose I need an index, perhaps an index of the attributes? Should I use an XML database, or something like Lucene? I prefer .NET.
For a recordset that small, you may find that XSLT meets your needs.
See here for an XSLT Tutorial.
Have you looked at LINQ to XML? Sounds like it would be perfect for this situation, allowing you to query the XML with a SQL-like syntax, and the performance shouldn't be too bad over ~5000 records.
Edit: Sorry, I misread the question—it appears you meant 5000 XML files, not 5000 records. Still might be worth a look though.
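For what it's worth, a LINQ to XML query works the same way across many documents as across one. Here is a minimal sketch of an exact-match filter; the item element, its category attribute and its name child are hypothetical, since the question doesn't show the file format, and this does not attempt fuzzy matching.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ItemSearch
{
    // Returns the names of all <item> elements in any document whose
    // category attribute equals the requested category, sorted by name.
    public static List<string> NamesInCategory(IEnumerable<XDocument> docs, string category)
    {
        var query =
            from doc in docs
            from item in doc.Descendants("item")
            where (string)item.Attribute("category") == category
            orderby (string)item.Element("name")
            select (string)item.Element("name");
        return query.ToList();
    }
}
```

The documents could come from something like Directory.EnumerateFiles(folder, "*.xml").Select(path => XDocument.Load(path)), so each file is loaded lazily as the query reaches it. Every search still reparses all the files, which is why an index would matter if queries are frequent.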
I have an XML file of about 500 MB and I'm using LINQ with C# to query it, but it's very slow because it loads everything into memory. Is there any way I can query the file without loading it all into memory?
Thanks
This article should get you up and running. Take a look at the SimpleStreamAxis method, which is very handy for finding nodes in large XML files. I've successfully used a variant of this method on 5GB XML files without loading the file into memory.
You can use the technique described on MSDN's page about XNode.ReadFrom to generate an IEnumerable of XNodes (in the example they provide, XElements) from an XmlReader.
Note that when you read an XElement from a Stream or XmlReader, the entire contents of that element must be read too, so you'll still need a little custom logic in the enumerator to ensure that the right XElements get returned. For instance, if you return the root element, you might as well parse the entire document right away, since the root element contains almost everything anyhow. The XNode.ReadFrom example contains such logic too.
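A minimal sketch of that technique, assuming the large file consists of many repeated elements with a known name (the class and method names here are made up):

```csharp
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.Linq;

public static class LargeXml
{
    // Lazily yields each element with the given local name as an XElement.
    // Only the element currently being yielded is materialized in memory.
    public static IEnumerable<XElement> StreamElements(XmlReader reader, string localName)
    {
        reader.MoveToContent();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.LocalName == localName)
            {
                // ReadFrom consumes the whole element and leaves the reader on
                // the node that follows it, so no extra Read() is needed here.
                yield return (XElement)XNode.ReadFrom(reader);
            }
            else
            {
                reader.Read();
            }
        }
    }
}
```

You would open the file with XmlReader.Create(path) inside a using block and then run ordinary LINQ queries over StreamElements(reader, "record"), where "record" stands in for whatever the repeated element is called. Since the sequence is forward-only, it can be enumerated just once per reader.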
No, it's not possible when using LINQ. LINQ to XML loads a model of the full XML into memory so that you can access it through the tree structure.
If you want fast access without loading the file into memory, you could use the XmlReader class.
This class gives you a fast forward-only xml parser that has only the current node in memory.
Here is some help on that: http://support.microsoft.com/kb/307548
Edit: Sorry, I didn't know that it's possible to combine XmlReader with LINQ.