I have a huge XML file of 9 GB where i need to add a node:
XML File like:
<ABC>
<DEF>
<GHI>
<AB>"ab"</AB>
<CD>"cd"</CD>
<EF>"ef"</EF> --NEED TO ADD
</GHI>
</DEF>
</ABC>
<PQR>
</PQR>
This is fixed that need too add in only ABC tag . XDocument will need so much resources any help is appreciated
Have a look here at how to stream in an XML document to avoid loading it in one go, then simply match on the element(s) you desire and add what you need in.
Use XmlReader/XmlWriter, XmlTextReader/XmlTextWriter (see here). These are fast, forward only reader/writers, that do not load the whole xml in one go so should cope with large files.
Related
Below Example How can Get "Airtel" and "145" Values, because my Client Has given this type XML Response
So How Can I Get both Values
<item item="Campaign name" type="string">Airtel</item>
<item item="Daily Limit" type="number">145</item>
As per the comment, we would need to see the full XML to give you a more complete answer. But you have 3 real options :
XMLReader. Probably not what you want as it's forward only and involves a lot of manual parsing.
XMLDocument. Not a bad option if you only want limited amount of notes and want their values and don't want to deserialize the entire XML doc.
XMLSerializer. Good if you want to deserialize the entire object straight from XML to class without you having to do a heck of a lot.
If you edit your question to include the full XML doc, then I can give you a more complete answer or you can read about your options for parsing XML here : https://dotnetcoretutorials.com/2020/04/23/how-to-parse-xml-in-net-core/
I am working with some xml in C# and am having some issues parsing an xml file due to the format it is in. It has non xml data in the file and I have no control over the format of this file. The file is "test.xml"(see below). I am only concerned with the xml portion of the data, but am unsure the best way to go about accessing it. Any thoughts or recommendations would be greatly appreciated.
Test data -1
Smith, 2234
##*j
Random--
#<?xml version="1.0" encoding="utf-16"?>
<ConfigMessage xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://www.Test.com/schemas/Test.test.Config">
<Config>
<Version>10</Version>
<Build>00520</Build>
<EnableV>false</EnableV>
<BuildL>22</BuildL>
<BuildP>\\testpath\test</BuildP>
</Config>
</ConfigMessage>
#
Put the whole file into a string that contains anything within the first '<' and the last '>' characters detected on the file. Then you can treat it as normal XML from there. If there's random non-XML elements throughout it though you will need to add additional logic to detect starting/stopping XML "blocks".
I can suggest you such solution: open your pseudo-xml like simple text-file, read whole text, after that, with using regex you ought to take xml document (part of primordial document that is able to be converted to XML [|startTag|any symbols|/endTag|]), put it into XDocument (in memory) and now parse it like XML-file.
How to add a new node, update an existing node and remove an existing node of an xml document without loading the whole document in memory?
I'm having an xml document and treating it as the memory of my application so would need to be able to do hundreds of reads and writes quickly without loading the whole document.
its structure is like this:
<spiderMemory>
<profileSite profileId="" siteId="">
<links>
<link>
<originalUrl></originalUrl>
<isCrawled></isCrawled>
<isBroken></isBroken>
<isHtmlPage></isHtmlPage>
<firstAppearedLevel></firstAppearedLevel>
</link>
</links>
</profileSite>
</spiderMemory>
How would that be possible with XDocument?
Thanks
If you want to do hundreds of reads and writes quickly... you might be using the wrong technology. Have you tried using a plain old RDBMS?
If you still need the XML representation, then you can create an export methods to produce it from the database.
XML isn't really a good substitute for this kind of problem. Just saying.
Also... what is wrong with having the whole thing in memory? How big can it possibly get? Say 1GB? Suck it up. Say 1TB? Oops. But then XML is wrong, wrong, wrong anyway in that case ;) way too verbose!
You can use XmlReader, something like this :
FileStream stream = new FileStream("test.xml", FileMode.Open);
XmlReader reader = new XmlTextReader(stream);
while(reader.Read())
{
Console.WriteLine(reader.Value);
}
here is an more elaborate example http://msdn.microsoft.com/en-us/library/cc189056%28v=vs.95%29.aspx
As Daren Thomas said, the proper solution is to use RDBMS instead of XML for your needs. I have a partial solution using XML and Java. Stax parser does not parse the whole document in memory and is a lot faster than DOM (still XML parsing will always be slow). A 'pull parser' (eg Stax) allows u to control what gets parsed. A less cleaner way is to throw an exception in SAX parser when you get the element(s) needed.
To modify, the simplest (but slow) way is to use XPath. Another (untested) option is to treat XML file as text and then 'Search and replace' stuff. Here you can use all kinds of text search optimization.
I am Currently Facing A problem. I am loading a xml file in C# and remove some nodes from it and appending some nodes. now problem is that when i am doing removal from the xml file then there are some empty lines created automatically ,so i want to remove these line .
And when i append some nodes to the parent node in xml then i want the new line in each ending tag
For Eg. My Xml file is
<intro id="S0001">
<title>Introduction Title</title>
<para>This is a paragraph. Note that paragraphs can contain other block–level objects, such as lists, as well as directly containing text.</para>
<para>The introduction can contain all of the text objects that a section can contain, except that it cannot be divided into parts, sections and sub–sections.</para>
<para>The introduction can contain tables:</para>
</intro><part>
<no>Part A</no> Article Structure <sup>(Part Title)</sup><section1 id="S0002">`enter code here`
<no>Sect 1</no>
<title>First Section in Part 1 <sup>(Section 1 Title)</sup></title>
<shortsectionhead>Short Section Header</shortsectionhead>
<para>This is a section in the first part of the article.</para>
</section1><section1 id="S0003">
Code:
XmlNode partNnode = xmlDoc.SelectSingleNode("//part");
XmlNode introNode=xmlDoc.SelectSingleNode("//intro");
XmlDocumentFragment newNode=xmlDoc.CreateDocumentFragment();
newNode.InnerXml=partNnode.OuterXml;
introNode.ParentNode.InsertAfter(newNode,introNode);
partNnode.ParentNode.RemoveChild(partNnode);
partNnode = xmlDoc.SelectSingleNode("//part");
nodeList = xmlDoc.SelectNodes("//section1");
foreach (XmlNode refrangeNode in nodeList)
{
newNode=xmlDoc.CreateDocumentFragment();
newNode.InnerXml=refrangeNode.??OuterXml;
partNnode.AppendChild(newNode);
}
Please help me
Thanks in advance
If you load and save a XMl file with C#, then the XML should be formatted correctly (an easy way to format strange looking XML files is just to load and save them with some C# code).
If I understand your question correctly, then you are just not happy with the format of the XML file?
Like you want (A):
</intro><part>
But you get (B):
</intro>
<part>
If that is the question, then, in my eyes, you just want a strange thing. Because...
a) Code doesn't care how the XML file is formatted and
b) The format in (B) is the correct one
If you, for what reason ever, want to change it, then you have to parse through the XML file, opening it as a string and checking manually for closed and opened tags.
I'm using XSLT transfer an XML to a different format XML. If there is empty data with the element, it will display as a self-closing, eg. <data />, but I want output it with the closing tag like this <data></data>.
If I change the output method from "xml" to "html" then I can get the <data></data>, but I will lose the <?xml version="1.0" encoding="UTF-8"?> on the top of the document. Is this the correct way of doing this?
Many thanks.
Daoming
If you want this because you think that self closing tags are ugly, then get over it.
If you want to pass the output to some non-conformant XML Parser that is under control, then use a better parser, or fix the one you are using.
If it is out of your control, and you must send it to an inadequate XML Parser, then do you really need the prolog? If not, then html output method is fine.
If you do need the XML prolog, then you could use the html output method, and prepend the prolog after transformation, but before sending it to the deficient parser.
Alternatively, you could output it as XML with self-closing tags, and preprocess before sending it to your deficient parser with some kind of custom serialisation, using the DOM. If it can't handle self-closing tags, then I'm sure that isn't the only way in which it fails to parse XML. You might need to do something about namespaces, for example.
You could try adding an empty text node to any empty elements that you are outputting. That might do the trick.
Self-closed and explicitly closed elements are exactly the same thing in any regard whatsoever.
Only if somewhere along your processing chain there is a tool that is not XML aware (code that does XML processing with regex, for example), it might make a difference. At which point you should think about changing that part of the processing, instead of the XML generation/serialization part.