Does XDocument.Load load all data into memory? - c#

I must read all first-level nodes of the root node of a large XML file that looks like the following:
<root>
<record n="1"><a/><b/><c/></record>
<record n="2"><a/><b/><c/></record>
<record n="3"><a/><b/><c/></record>
</root>
And my code looks like:
var xml = XDocument.Load(filename);
var firstNode = xml?.Root?.Descendants()?.FirstOrDefault();
var elements = firstNode?.Elements();
I just need the first child of the root and all of its first-level children. This code works, but is it safe to read the file this way? My guess is that it does not load all the data into memory, only the structure of the XML file.
As far as I can see, memory usage does not increase while debugging. It only explodes when I actually inspect the contents of the xml variable.

No, XDocument loads the whole document into memory. Whether it's "safe" to do this or not depends on what size of document you need to be able to handle.
If you need to handle XML files that wouldn't fit into memory, you'd want to use XmlReader, which is unfortunately considerably harder to use.
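For the specific case in the question (only the first child of the root is needed), the XmlReader route can stay fairly small. The following is a minimal sketch, not a definitive implementation: the class and method names are hypothetical, and it assumes the caller owns and disposes the TextReader it passes in.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

static class FirstRecordReader
{
    // Hypothetical helper: returns the first child element of the root,
    // or null when the root has no child elements.
    public static XElement ReadFirstChildOfRoot(TextReader input)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();                 // position on the root start tag
            if (reader.IsEmptyElement) return null; // <root/> has no children

            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    // Materialize only this one subtree; the rest of the file is never read.
                    return (XElement)XNode.ReadFrom(reader);
                }
                if (reader.NodeType == XmlNodeType.EndElement)
                {
                    return null;                    // reached </root> without finding a child
                }
            }
            return null;
        }
    }
}
```

With a file, something like ReadFirstChildOfRoot(File.OpenText(filename))?.Elements() would give the same elements as the code in the question, while keeping only the first record in memory.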

I use a combination of XmlReader and XDocument. The code below takes the record tag name dynamically from the first element it reads.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;
namespace ConsoleApplication1
{
class Program
{
const string FILENAME = @"c:\temp\test.xml";
static void Main(string[] args)
{
XmlReader reader = XmlReader.Create(FILENAME);
reader.ReadStartElement(); // consume the root start tag
reader.MoveToContent(); // skip any whitespace before the first record
XElement record = null;
string recordName = "";
Boolean first = true;
while (!reader.EOF)
{
if (first)
{
record = (XElement)XElement.ReadFrom(reader);
first = false;
recordName = record.Name.LocalName;
}
else
{
if (reader.Name != recordName)
{
reader.ReadToFollowing(recordName);
}
if (!reader.EOF)
{
record = (XElement)XElement.ReadFrom(reader);
}
}
}
}
}
}

Related

ArgumentException: The node to be inserted is from a different document context

I've searched Stack Overflow and other forums for this problem, but no one seems to be doing it the way I am.
By this I mean that my code uses XmlElement instead of XmlNode.
So, without further ado: I want to add a new element to an already existing XML document, as a child of an existing element other than the root.
This is an example of my XML file:
<ROOT>
<NOT_THIS_ONE>
</NOT_THIS_ONE>
<THIS_ONE>
</THIS_ONE>
</ROOT>
So, this is my code:
//XML File
TextAsset repository = Resources.Load("Repository") as TextAsset;
//Create XML Reference
XmlDocument xmlDocument = new XmlDocument();
//Load XML File into XML Reference
xmlDocument.LoadXml(repository.text);
//Root Node
XmlNode statsNode = GetRootNode();
//Get History Node
XmlNode thisOneNode = statsNode.ChildNodes.Item(1);
The GetRootNode() function is this:
//Create Xml Reference
XmlDocument xmlData = new XmlDocument();
//Load Xml File into Xml Reference
xmlData.LoadXml(repository.text);
//Get Root Node
return xmlData.ChildNodes.Item(1);
The thisOneNode gets the <THIS_ONE> Element as a Node (at least that's what I think it does).
Later on, I do this:
XmlElement childOfThisOne = xmlDocument.CreateElement("CHILD");
XmlElement pointsSession = xmlDocument.CreateElement("POINTS");
pointsSession.InnerText = points.ToString();
childOfThisOne.AppendChild(pointsSession);
thisOneNode.AppendChild(childOfThisOne);
xmlDocument.Save("Assets/Resources/GamePoints.xml");
And my intention with this would be something like:
<ROOT>
<NOT_THIS_ONE>
</NOT_THIS_ONE>
<THIS_ONE>
<CHILD>
<POINTS>102</POINTS>
</CHILD>
</THIS_ONE>
</ROOT>
But I get the error in the title: "ArgumentException: The node to be inserted is from a different document context."
And the line in question is this: thisOneNode.AppendChild(childOfThisOne);
In the articles I found, people were using XmlNode and even xmlDocument.ImportNode(); I tried that too and got the same error. I don't know how to fix this, so I'm asking for your help.
Thank you for your time and happy holidays!
Using XML LINQ:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
string xml =
@"<ROOT>
<NOT_THIS_ONE>
</NOT_THIS_ONE>
<THIS_ONE>
</THIS_ONE>
</ROOT>";
XDocument doc = XDocument.Parse(xml);
XElement thisOne = doc.Descendants("THIS_ONE").FirstOrDefault();
thisOne.Add(new XElement("CHILD", new XElement("POINTS", 102)));
doc.Save("Assets/Resources/GamePoints.xml");
}
}
}
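If you would rather stay with XmlDocument: in the question's code, GetRootNode() loads the text into a second XmlDocument (xmlData), so thisOneNode belongs to that document while the new elements are created by xmlDocument. That mismatch is what the exception reports. A minimal sketch of the same append with everything taken from one document follows; the class and method names are hypothetical, and it returns the XML as a string instead of saving it so it can be tried without the Unity asset.

```csharp
using System;
using System.Xml;

static class SameDocumentAppend
{
    // Hypothetical helper: appends <CHILD><POINTS>n</POINTS></CHILD> under <THIS_ONE>.
    public static string AddPoints(string xml, int points)
    {
        var xmlDocument = new XmlDocument();
        xmlDocument.LoadXml(xml);

        // Take the parent node from the SAME document that creates the new elements.
        XmlNode thisOneNode = xmlDocument.DocumentElement.SelectSingleNode("THIS_ONE");

        XmlElement childOfThisOne = xmlDocument.CreateElement("CHILD");
        XmlElement pointsSession = xmlDocument.CreateElement("POINTS");
        pointsSession.InnerText = points.ToString();
        childOfThisOne.AppendChild(pointsSession);
        thisOneNode.AppendChild(childOfThisOne);

        return xmlDocument.OuterXml;
    }
}
```

If the node really has to come from another document, ImportNode must be called on the target document, and it is the node returned by ImportNode that gets appended, not the original.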

C# Read xml file that uses name prefixes but does not define namespace in the document itself

I have an XML file from a client. Many of its nodes use name prefixes, but the document does not declare any namespaces. A sample is given below:
<?xml version="1.0"?>
<SemiconductorTestDataNotification>
<ssdh:DocumentHeader>
<ssdh:DocumentInformation>
<ssdh:Creation>2019-03-16T13:49:23</ssdh:Creation>
</ssdh:DocumentInformation>
</ssdh:DocumentHeader>
<LotReport>
<BALocation>
<dm:ProprietaryLabel>ABCDEF</dm:ProprietaryLabel>
</BALocation>
</LotReport>
</SemiconductorTestDataNotification>
I tried the following XML classes to read it, but all of them failed:
System.Xml.Linq.XElement
System.Xml.XmlDocument
System.Xml.XmlReader
System.Xml.Linq.XDocument
It gives error:
'ssdh' is an undeclared prefix.
I know the namespaces for these prefixes. They would be:
xmlns:ssdh="urn:rosettanet:specification:system:StandardDocumentHeader:xsd:schema:01.13"
xmlns:dm="urn:rosettanet:specification:domain:Manufacturing:xsd:schema:01.14"
Adding these namespaces to the XML files by hand is not feasible, because there are many files and new ones arrive daily.
Is it possible to create a file (e.g. an xsd) that declares the namespaces, and have the C# code read the XML files using this (so-called) schema file?
You need to use a non-XML method to read a malformed XML file. Try the following code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;
using System.IO;
namespace ConsoleApplication3
{
class Program1
{
const string BAD_FILENAME = @"c:\temp\test.xml";
const string Fixed_FILENAME = @"c:\temp\test1.xml";
static void Main(string[] args)
{
StreamReader reader = new StreamReader(BAD_FILENAME);
StreamWriter writer = new StreamWriter(Fixed_FILENAME);
string line = "";
while ((line = reader.ReadLine()) != null)
{
if (line == "<SemiconductorTestDataNotification>")
{
line = line.Replace(">",
" xmlns:ssdh=\"urn:rosettanet:specification:system:StandardDocumentHeader:xsd:schema:01.13\"" +
" xmlns:dm=\"urn:rosettanet:specification:domain:Manufacturing:xsd:schema:01.14\"" +
" >");
}
writer.WriteLine(line);
}
reader.Close();
writer.Flush();
writer.Close();
XDocument doc = XDocument.Load(Fixed_FILENAME);
}
}
}
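An alternative that avoids rewriting each file may be to hand the known prefixes to the parser through an XmlParserContext, so the reader resolves them as if they had been declared. This is a sketch under that assumption rather than a tested drop-in: the class and method names are hypothetical, and the two URIs are the ones given in the question.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

static class PrefixAwareLoader
{
    // Hypothetical helper: loads XML whose ssdh/dm prefixes are not declared in the file.
    public static XDocument Load(TextReader input)
    {
        // Share one name table between the reader settings and the namespace manager.
        var nameTable = new NameTable();
        var namespaces = new XmlNamespaceManager(nameTable);
        namespaces.AddNamespace("ssdh", "urn:rosettanet:specification:system:StandardDocumentHeader:xsd:schema:01.13");
        namespaces.AddNamespace("dm", "urn:rosettanet:specification:domain:Manufacturing:xsd:schema:01.14");

        // The parser context supplies the prefix-to-namespace bindings the document lacks.
        var context = new XmlParserContext(nameTable, namespaces, null, XmlSpace.None);
        var settings = new XmlReaderSettings { NameTable = nameTable };

        using (XmlReader reader = XmlReader.Create(input, settings, context))
        {
            return XDocument.Load(reader);
        }
    }
}
```

Elements can then be queried with their namespaces, for example doc.Descendants(ssdh + "Creation") where ssdh is an XNamespace built from the first URI.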

Check if document received is XML or Edifact in custom pipeline component

Problem: I need to check whether an incoming document, inside an XML element, is XML or Edifact formatted. Depending on what format the document has, it needs to be processed accordingly.
Current solution: An XDocument instance is created from the incoming message. The incoming message is always XML.
var originalStream = pInMsg.BodyPart.GetOriginalDataStream();
XDocument xDoc;
using (XmlReader reader = XmlReader.Create(originalStream))
{
reader.MoveToContent();
xDoc = XDocument.Load(reader);
}
After this, the document is extracted from the XML element "msgbody". Currently the code assumes it is XML formatted, which throws an error when the document is Edifact formatted. The code below extracts it and then creates a new XDocument, which is sent to the MessageBox.
string extractedDocument = xDoc.Root.Element("msgbody").Value;
extractedDocument = HttpUtility.HtmlDecode(extractedDocument);
XDocument outputXml = XDocument.Parse(extractedDocument);
Example message from BizTalk:
<NewTable>
<conversationID>2ff845e7-30a4-482e-98d6-8c3249c5dea1</conversationID>
<hostUTC>2018-12-17T12:17:04.107Z</hostUTC>
<msgType>INVOIC</msgType>
<msgid>721254</msgid>
<icref>36655</icref>
<msgFormat_org>EDIFACTBauhaus</msgFormat_org>
<msgFormat>EDI</msgFormat>
<msgbody>"Edifact or XML document"</msgbody>
<fromID>GLN:5790034516518</fromID>
<toID>GLN:5790000451485</toID>
</NewTable>
Question: How can I create a check for the document inside the msgbody tag, to determine whether it is XML or Edifact formatted, before processing it?
I like using a dictionary to get all the properties using XML LINQ. See the code below. If you are receiving a string response, use Parse(string) instead of the Load(filename) method.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data;
using System.Xml;
using System.Xml.Linq;
namespace ConsoleApplication93
{
class Program
{
const string FILENAME = @"c:\temp\test.xml";
static void Main(string[] args)
{
XDocument doc = XDocument.Load(FILENAME);
Dictionary<string, string> dict = doc.Descendants("NewTable").Elements()
.GroupBy(x => x.Name.LocalName, y => (string)y)
.ToDictionary(x => x.Key, y => y.FirstOrDefault());
}
}
}
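For the actual format check on the extracted msgbody text, one possible sketch is shown below. It rests on two assumptions that are not in the question: an EDIFACT interchange starts with a UNA or UNB segment, and anything that XDocument.Parse accepts counts as XML. The class and method names are hypothetical.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

static class BodyFormat
{
    // Assumption: an EDIFACT interchange begins with the optional UNA service
    // string advice or with the UNB interchange header.
    public static bool LooksLikeEdifact(string body)
    {
        string text = body.TrimStart();
        return text.StartsWith("UNA", StringComparison.Ordinal)
            || text.StartsWith("UNB", StringComparison.Ordinal);
    }

    // Returns true and the parsed document when the body is well-formed XML.
    public static bool TryParseXml(string body, out XDocument document)
    {
        try
        {
            document = XDocument.Parse(body);
            return true;
        }
        catch (XmlException)
        {
            document = null;
            return false;
        }
    }
}
```

In the pipeline component this could be called on extractedDocument after the HtmlDecode step: check LooksLikeEdifact first, and fall back to TryParseXml otherwise. The msgFormat element in the sample message ("EDI") might be an even simpler signal if it is reliably filled in.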

Fast replacement node names in XML

I use a legacy service that manages Fetch for CRM: crmService.Fetch(fetchXml).
I get XML string result like this:
<resultset>
<result>
<new_categoria name="Cliente" formattedvalue="1">1</new_categoria>
<new_name>Admin</new_name>
<new_tipodecampanaid>{F8F29978-4E0F-AE92-FB43-48B4DC406B1F}</new_tipodecampanaid>
<statuscode name="Activo">0</statuscode>
</result>
<result>
<new_categoria name="Client" formattedvalue="1">1</new_categoria>
<new_name>Client</new_name>
<new_tipodecampanaid>{758341BA-4661-D694-6743-8D2DC875793E}</new_tipodecampanaid>
<statuscode name="Activo">0</statuscode>
</result>
<result>
</resultset>
Because the Fetch method does not support aliases, we need to rename several nodes:
1 - Replace new_categoria node name by org_category
2 - Replace new_name node name by org_name
3 - Replace new_tipodecampanaid node name by org_campaignid
We need high performance, since the results may be huge.
Using XmlDocument:
var doc = new XmlDocument();
doc.LoadXml(fetchXMLResult);
XmlNode root = doc.SelectSingleNode("resultset");
foreach (XmlNode childNode in root.ChildNodes)
{
}
Using XDocument:
XDocument resultset = XDocument.Parse(fetchXMLResult);
if (resultset.Root == null || !resultset.Root.Elements("result").Any())
{
return;
}
resultset.Root.Elements("result")
Any suggestions?
For speed you could run through the file using an XmlReader and write each node you read to a new file using an XmlWriter.
See this link for an example.
When you have a large XML file, always use XmlReader. Try this code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;
namespace ConsoleApplication1
{
class Program
{
const string INPUT_FILENAME = @"c:\temp\test1.xml";
const string OUTPUT_FILENAME = @"c:\temp\test2.xml";
static void Main(string[] args)
{
XmlReader reader = XmlReader.Create(INPUT_FILENAME);
XmlWriter writer = XmlWriter.Create(OUTPUT_FILENAME);
writer.WriteStartElement("resultset");
while (!reader.EOF)
{
if (reader.Name != "result")
{
reader.ReadToFollowing("result");
}
if (!reader.EOF)
{
XElement result = (XElement)XElement.ReadFrom(reader);
result.Element("new_categoria").Name = "org_category";
result.Element("new_name").Name = "org_name";
result.Element("new_tipodecampanaid").Name = "org_campaignid";
writer.WriteRaw(result.ToString());
}
}
writer.WriteEndElement();
writer.Flush();
writer.Close();
}
}
}

How to add Process Instruction to an XML file in C#

How can I add a processing instruction to ~50 XML files?
<?xml-stylesheet type="text/xsl" href="Sample.xsl"?>
If I append the node, it is added to the end of the file but it needs to be first.
I would use LINQ to XML:
using System;
using System.Xml.Linq;
public class Test
{
static void Main()
{
XDocument doc = XDocument.Load("test.xml");
var proc = new XProcessingInstruction
("xml-stylesheet", "type=\"text/xsl\" href=\"Sample.xsl\"");
doc.Root.AddBeforeSelf(proc);
doc.Save("test2.xml");
}
}
