Extract text from XML using C#

I work with XML documents that look like this:
All I need is to extract the text between the tags. Since a common XML editor successfully highlights that text in black, I assume I should be able to extract it myself?
So far I've tried the following:
private void Form1_Load(System.Object sender, System.EventArgs e)
{
    XmlDocument doc = new XmlDocument();
    doc.Load("C:\\users\\admin\\desktop\\index.xml");
    // Walk the top-level nodes (XML declaration, comments, root element).
    foreach (XmlNode node in doc.ChildNodes) {
        ProcNode(node);
    }
}
private void ProcNode(XmlNode node)
{
    // InnerText concatenates the text of this node and all of its descendants.
    Console.WriteLine(node.InnerText);
    foreach (XmlNode subNode in node.ChildNodes) {
        Console.WriteLine(subNode.InnerText);
    }
}
Is that a reliable solution?

Use the XDocument class to read the XML and query it using LINQ to XML.
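As a minimal sketch of that suggestion: the helper below collects the content of every text node in document order. The class and method names (XmlTextExtractor, ExtractText) are made up for illustration, and trimming and skipping whitespace-only text is an assumption about what "the text between the tags" should mean.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XmlTextExtractor
{
    // Returns the trimmed content of every non-empty text node, in document order.
    public static List<string> ExtractText(XDocument doc)
    {
        return doc.DescendantNodes()
                  .OfType<XText>()              // text nodes, including CDATA sections
                  .Select(t => t.Value.Trim())
                  .Where(s => s.Length > 0)     // skip whitespace-only text
                  .ToList();
    }
}
```

A call such as XmlTextExtractor.ExtractText(XDocument.Load(path)) would return one string per text node, which avoids the repeated output you get when InnerText is printed for both a parent and its children.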

You can do something like this:
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load(HttpContext.Current.Server.MapPath("App_Data/file.xml"));
XmlElement xelNo = xmlDoc.GetElementById("ElementID");
You can then read the attributes or the text of this element.
This only works if you know the ID of your element, and GetElementById only recognizes attributes that the document's DTD declares as type ID.

Related

Using XDocument to read the root element from XML using C# is not showing the root element

I am new to C# and am trying to update an XML file. When I try to get the root element using XDocument, it shows the complete contents of the file instead.
My code is explained below.
The following function receives the file path from the command-line arguments.
private XDocument doc;
public void Update(string filepath)
{
    string filename = Path.GetFileName(filepath);
    doc = XDocument.Load(filepath);
    XElement rootelement = doc.Root;
}
The filepath variable holds the path "E:\BuilderTest\COMMON.wxs".
We then load the file using XDocument.
But when we try to get the root element from the file, we do not see just the root element. Instead, we see the complete contents of the file.
When I use XmlDocument instead of XDocument, I see only the root element.
Below is the code using XmlDocument:
private XmlDocument doc;
public void Update(string filepath)
{
string filename = Path.GetFileName(filepath);
doc = new XmlDocument();
doc.Load(filepath);
XmlElement rootelement = doc.DocumentElement;
}
Please help me by providing your valuable inputs on this.

XDocument and XmlDocument are different APIs with different class hierarchies; choose one according to your requirements.
XDocument works like this:
XDocument doc;
doc = XDocument.Load(filepath);
XElement root = doc.Root;
Root, Descendants, and Elements are among the members XDocument provides. Each element they return is an XElement.
In your case, use doc.Root to get the root element, then use .Value to get its text content.
XElement lives in System.Xml.Linq. It derives from XContainer, which derives from XNode.
Calling ToString() on an XElement serializes the element together with all of its descendants.
XmlDocument, on the other hand, works like this:
XmlDocument doc;
doc = new XmlDocument();
doc.Load(filepath);
XmlElement rootelement = doc.DocumentElement;
XmlElement lives in System.Xml. It derives from XmlLinkedNode, which derives from XmlNode; XmlNode in turn implements IEnumerable.
You can therefore enumerate an element's child nodes directly, for example with foreach.
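A small sketch of how to look at just the root element with XDocument. It assumes that the "complete data" in the question comes from XElement.ToString(), which is what the debugger and Console.WriteLine display; the class and method names here are made up for illustration.

```csharp
using System.Xml.Linq;

public static class RootInspector
{
    // Returns only the root element's local tag name, not its serialized content.
    public static string RootName(XDocument doc)
    {
        return doc.Root.Name.LocalName;
    }

    // Builds a copy of the root element that keeps its name and attributes
    // but none of its child nodes.
    public static XElement RootWithoutChildren(XDocument doc)
    {
        return new XElement(doc.Root.Name, doc.Root.Attributes());
    }
}
```

In both APIs the root object holds the whole tree; the difference is only in how each one is displayed. Reading doc.Root.Name or doc.Root.Attributes() gives the root's own data without its children.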

Select xml file part with xpath and xdocument - C#/Win8

I am building a Windows 8 app, and I need to extract a whole XML node and its children as a string from a large XML document. The method that does this so far looks like this:
public string GetNodeContent(string path)
{
XmlReaderSettings settings = new XmlReaderSettings();
settings.IgnoreWhitespace = true;
settings.ConformanceLevel = ConformanceLevel.Auto;
settings.IgnoreComments = true;
using (XmlReader reader = XmlReader.Create("something.xml", settings))
{
reader.MoveToContent();
reader.Read();
XmlDocument doc = new XmlDocument();
doc.LoadXml(reader.ReadOuterXml());
IXmlNode node = doc.SelectSingleNode(path);
return node.InnerText;
}
}
When I pass any form of XPath, node is null. I'm using the reader to get the first child of the root node and then creating an XmlDocument from that XML. Since it's Windows 8, I apparently can't use the XPathSelectElements method, and this is the only way I can think of. Is there a way to do it using this or any other logic?
Thank you in advance for your answers.
[UPDATE]
Let's say XML has this general form:
<nodeone attributes...>
<nodetwo attributes...>
<nodethree attributes... />
<nodethree attributes... />
<nodethree attributes... />
</nodetwo>
</nodeone >
I expect to get nodetwo and all of its children as an XML string when I pass "/nodeone/nodetwo" or "//nodetwo".

I've come up with this solution; the whole approach was wrong to start with. The problematic part was that this code
reader.MoveToContent();
reader.Read();
loses the namespace, because it skips the root tag. This is the new, working code:
public static async Task<string> ReadFileTest(string xpath)
{
    StorageFolder folder = await Package.Current.InstalledLocation.GetFolderAsync("NameOfFolderWithXML");
    StorageFile xmlFile = await folder.GetFileAsync("filename.xml");
    XmlDocument xmldoc = await XmlDocument.LoadFromFileAsync(xmlFile);
    // Query the document that was just loaded.
    var nodes = xmldoc.SelectNodes(xpath);
    XmlElement element = (XmlElement)nodes[0];
    return element.GetXml();
}
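For comparison, here is a rough desktop .NET sketch of the same idea using System.Xml rather than the Windows Runtime Windows.Data.Xml.Dom types used above. The class and method names are hypothetical, and the sketch assumes the XML has no default namespace; otherwise the XPath would need a namespace manager. It returns OuterXml because InnerText, as used in the question, yields only the concatenated text and not the markup.

```csharp
using System.Xml;

public static class NodeXml
{
    // Returns the first node matching the XPath, serialized with its children,
    // or null when nothing matches.
    public static string GetOuterXml(string xml, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlNode node = doc.SelectSingleNode(xpath);
        return node == null ? null : node.OuterXml;
    }
}
```

Loading the whole document first keeps the root element, and with it any namespace declarations, available to the XPath query.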

C# - using XmlNode class to find acroform or xfa form

I am using the XmlNode class to find out whether a form is an AcroForm or an XFA form. I think the difference must show up in the XML content, so what I am trying to find is the template node.
For example, once I have the XML, I want to use this class to look for the
"template" node.
If the "template" node is there, I want to print a message about the template part; if it is not there, I will print some other message.
I have reached this far:
private void getFormType(string reasonstring)
{
XmlDocument docxml = new XmlDocument();
docxml.LoadXml(reasonstring);
System.Xml.XmlNode xmlNode = docxml.SelectSingleNode("xml");
}
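One possible way to finish this check, offered as a sketch only: search the whole document for an element whose local name is "template". The class and method names are hypothetical, and whether the presence of that element really distinguishes the two form types is the question's own premise, not something verified here.

```csharp
using System.Xml;

public static class FormTypeDetector
{
    // Returns true when the XML contains an element named "template"
    // at any depth.
    public static bool HasTemplateNode(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        // local-name() matches the element regardless of its namespace or prefix.
        return doc.SelectSingleNode("//*[local-name()='template']") != null;
    }
}
```

Using local-name() sidesteps namespace handling; a plain "//template" query would not match an element that belongs to a namespace.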

Use Server.MapPath to load external files

I want to load an XML file that is on the D: drive. This is what I used:
doc.Load(System.Web.HttpContext.Current.Server.MapPath("/D:/Employee.xml"));
But it gives me an error whenever I try to run my program:
Object reference not set to an instance of an object.
I read somewhere that Server.MapPath can be used only for web pages or web apps. I made a form in ASP.NET using C#.
Why am I getting this error?
This is my code:
private void btnRead_Click(object sender, EventArgs e)
{
XmlDocument doc = new XmlDocument();
doc.Load("D:\\Employee.xml");
XmlNode root = doc.DocumentElement;
StringBuilder sb = new StringBuilder();
XmlNodeList nodeList = root.SelectNodes("Employee");
foreach (XmlNode node in nodeList)
{
sb.Append("Name: ");
// Select the text from a single node, "Name" in this case
sb.Append(node.SelectSingleNode("Name").InnerText);
sb.Append("EmpID: ");
sb.Append(node.SelectSingleNode("EmpID").InnerText);
sb.Append("Dept: ");
sb.Append(node.SelectSingleNode("Dept").InnerText);
sb.Append("");
}
System.Web.HttpContext.Current.Response.Write(sb.ToString());
}
I made a form in VS 2008 and saved the details in an XML file. Now I want to display the output.

Why not load directly:
doc.Load("D:\\Employee.xml");

A desktop application has no HTTP request context, so HttpContext.Current is null; that is why you get the NullReferenceException. Instead, use
doc.Load("D:/Employee.xml");

How can I retrieve all the text nodes of a HTMLDocument in the fastest way in C#?

I need to perform some logic on all the text nodes of a HTMLDocument. This is how I currently do this:
HTMLDocument pageContent = (HTMLDocument)_webBrowser2.Document;
IHTMLElementCollection myCol = pageContent.all;
foreach (IHTMLDOMNode myElement in myCol)
{
foreach (IHTMLDOMNode child in (IHTMLDOMChildrenCollection)myElement.childNodes)
{
if (child.nodeType == 3)
{
//Do something with textnode!
}
}
}
Since some of the elements in myCol also have children, which themselves are in myCol, I visit some nodes more than once! There must be some better way to do this?

It might be best to iterate over the childNodes (direct descendants) within a recursive function, starting at the top-level, something like:
HtmlElementCollection collection = pageContent.GetElementsByTagName("HTML");
IHTMLDOMNode htmlNode = (IHTMLDOMNode)collection[0];
ProcessChildNodes(htmlNode);
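private void ProcessChildNodes(IHTMLDOMNode node)
{
    // childNodes is loosely typed in the interop assembly, so cast it as the question's code does.
    foreach (IHTMLDOMNode childNode in (IHTMLDOMChildrenCollection)node.childNodes)
    {
        if (childNode.nodeType == 3)
        {
            // ...
        }
        // Recurse so that each node is visited exactly once.
        ProcessChildNodes(childNode);
    }
}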

You could access all the text nodes in one shot using XPath in HTML Agility Pack.
I think this would work as shown, but have not tried this out.
using HtmlAgilityPack;
HtmlDocument htmlDoc = new HtmlDocument();
// filePath is a path to a file containing the html
htmlDoc.Load(filePath);
HtmlNodeCollection coll = htmlDoc.DocumentNode.SelectNodes("//text()");
foreach (HtmlNode node in coll)
{
// do the work for a text node here
}
