Handling NullReferenceException with HtmlAgilityPack - C#

I'm getting a NullReferenceException from HtmlAgilityPack when my search returns nothing. I need to know how to handle this in code. I'm trying to use ??, but I'm not using it correctly and I'm not really sure how it works anyway. I really just want to know how to run some method if nodes is empty. I could probably just check with an if statement if there's no better way.
public DataTable tableIntoTable(HtmlDocument doc)
{
var table = new DataTable("MyTable");
table.Columns.Add("raw", typeof(string));
var xpath = @"//th[@class='ddlabel'] | //table[not(.//*[contains(@*,'pldefault') or contains(@*,'ntdefault') or contains(@*,'bgtabon')])]";
var nodes = doc.DocumentNode.SelectNodes(xpath);
foreach (var node in nodes ?? new HtmlAgilityPack.HtmlNodeCollection {null})
//new is underlined in red, not sure how it's supposed to work
{
table.Rows.Add(node.InnerHtml);
}
return table;
}

Well, if the exception is caused by nodes being null, then don't try to iterate through it if it is null.
public DataTable tableIntoTable(HtmlDocument doc)
{
var table = new DataTable("MyTable");
table.Columns.Add("raw", typeof(string));
var xpath = @"//th[@class='ddlabel'] | //table[not(.//*[contains(@*,'pldefault') or contains(@*,'ntdefault') or contains(@*,'bgtabon')])]";
var nodes = doc.DocumentNode.SelectNodes(xpath);
// Don't iterate if nodes is null.
if (nodes != null)
{
foreach (var node in nodes)
{
table.Rows.Add(node.InnerHtml);
}
}
return table;
}

If you really like the null-coalescing operator for its beauty (like me), try this:
foreach (var node in nodes ?? Enumerable.Empty<HtmlNode>())
{
// whatever
}
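To see the fallback in isolation, without pulling in HtmlAgilityPack, here is a minimal sketch that uses a plain List<string> instead of an HtmlNodeCollection. The names NullSafeLoop and CountItems are made up for illustration. The only point is that ?? Enumerable.Empty<T>() turns a null reference into an empty sequence, so the loop body is simply never entered.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class NullSafeLoop
{
    // Hypothetical helper: counts the items in a list that may be null.
    public static int CountItems(List<string> items)
    {
        var count = 0;
        // If items is null, iterate an empty sequence instead of throwing.
        foreach (var item in items ?? Enumerable.Empty<string>())
        {
            count++;
        }
        return count;
    }
}
```

Calling NullSafeLoop.CountItems(null) returns 0 rather than throwing, which is the same behavior the foreach above gives you when SelectNodes finds nothing.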

Try this one:
public DataTable tableIntoTable(HtmlDocument doc)
{
var table = new DataTable("MyTable");
table.Columns.Add("raw", typeof(string));
var xpath = @"//th[@class='ddlabel'] | //table[not(.//*[contains(@*,'pldefault') or contains(@*,'ntdefault') or contains(@*,'bgtabon')])]";
var nodes = doc.DocumentNode.SelectNodes(xpath);
if (nodes != null && nodes.Count > 0)
{
foreach (var node in nodes)
{
table.Rows.Add(node.InnerHtml);
}
}
return table;
}

A foreach loop does not skip a null collection. It skips the body only when the collection is empty; when nodes is null, the loop throws the NullReferenceException from the question. This unguarded version therefore fails whenever SelectNodes finds nothing:
public DataTable tableIntoTable(HtmlDocument doc)
{
var table = new DataTable("MyTable");
table.Columns.Add("raw", typeof(string));
var xpath = @"//th[@class='ddlabel'] | //table[not(.//*[contains(@*,'pldefault') or contains(@*,'ntdefault') or contains(@*,'bgtabon')])]";
var nodes = doc.DocumentNode.SelectNodes(xpath);
// Throws NullReferenceException when nodes is null.
foreach (var node in nodes)
{
table.Rows.Add(node.InnerHtml);
}
return table;
}

I think your problem is the line above, where you get the nodes. Declare the collection as nullable (this needs nullable reference types, C# 8 or later) and check it before looping. Note that SelectNodes returns an HtmlNodeCollection, not a single HtmlNode.
public DataTable tableIntoTable(HtmlDocument doc)
{
var table = new DataTable("MyTable");
table.Columns.Add("raw", typeof(string));
var xpath = @"//th[@class='ddlabel'] | //table[not(.//*[contains(@*,'pldefault') or contains(@*,'ntdefault') or contains(@*,'bgtabon')])]";
// The nullable annotation makes the possible null explicit.
HtmlAgilityPack.HtmlNodeCollection? nodes = doc.DocumentNode.SelectNodes(xpath);
if (nodes != null)
{
foreach (var node in nodes)
{
table.Rows.Add(node.InnerHtml);
}
}
return table;
}

Related

Add columns to DataTable with a loop from an HTML file

I want to add columns to my DataTable from my <th> tags using foreach.
I have a problem with it: I don't understand why there is a null reference exception. My HTML file doesn't have any empty tags.
Fragment of my C# code:
DataTable dt = new DataTable();
int i = 0;
HtmlNode table = doc.DocumentNode.SelectSingleNode("//table[1]");
foreach (var row in table.SelectNodes("tr"))
{
var headers = row.SelectNodes("th");
foreach (var el in headers)
{
if (headers != null)
{
dt.Columns.Add(headers[i].InnerText);
i++;
}
}
}
There is a fragment of my HTML file:
<table>
<colgroup><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/></colgroup>
<tr><th>id</th><th>inserted_at</th><th>DisplayName</th><th>DistinguishedName</th><th>Enabled</th><th>GivenName</th><th>HomeDirectory</th><th>Manager</th><th>Name</th><th>ObjectClass</th><th>ObjectGUID</th><th>SamAccountName</th><th>Surname</th><th>UserPrincipalName</th><th>RowError</th><th>RowState</th><th>Table</th><th>ItemArray</th><th>HasErrors</th></tr>
This works for your html:
var str = @"<table>
<colgroup><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/></colgroup>
<tr><th>id</th><th>inserted_at</th><th>DisplayName</th><th>DistinguishedName</th><th>Enabled</th><th>GivenName</th><th>HomeDirectory</th><th>Manager</th><th>Name</th><th>ObjectClass</th><th>ObjectGUID</th><th>SamAccountName</th><th>Surname</th><th>UserPrincipalName</th><th>RowError</th><th>RowState</th><th>Table</th><th>ItemArray</th><th>HasErrors</th></tr>";
var hdoc = new HtmlAgilityPack.HtmlDocument();
hdoc.LoadHtml(str);
var headerElements = hdoc.DocumentNode.Descendants("th");
foreach(var headerElement in headerElements)
{
Console.WriteLine(headerElement.InnerText);
}
I also need to select the headers from a specific table, so this is what actually worked for me:
HtmlNode table = doc.DocumentNode.SelectSingleNode("//table[1]");
var headerElements = table.Descendants("th");
foreach (var headerElement in headerElements)
{
dt.Columns.Add(headerElement.InnerText, typeof(string));
}

How to check if an XML node exists in the children of another node?

I want to add an XmlNode to another XmlNode if it doesn't contain this node (the comparison should be based on the node name and its contents)
System.Xml.XmlDocument doc;
...
XmlNode newNode = doc.CreateElement(name);
newNode.InnerXml = something
XmlNode parentNode = doc.GetElementsByTagName(parentName);
if (parentNode.???? (newNode))
{
parentNode.AppendChild(newNode);
}
How can I check whether it already exists? parentNode.ChildNodes doesn't have a Contains method.
I think this will do the trick:
private void doSomething()
{
XmlDocument doc = new XmlDocument();
XmlNode newNode = doc.CreateElement("name");
newNode.InnerXml = "something";
XmlNode parentNode = doc.GetElementsByTagName("parentName")[0];
// I just stuck an index on end of above line...
// Note that GetElementsByTagName returns an XmlNodeList
int huh = 0;
foreach (XmlNode n in parentNode.ChildNodes)
{
// If I understood you correctly, you want these checks?
if (n.InnerXml == newNode.InnerXml && n.Name == newNode.Name) huh++;
}
if (huh == 0) parentNode.AppendChild(newNode);
}
You could do this using LINQ to XML making use of the XNode.DeepEquals method to compare your child nodes for equality. An example might look like this - the duplicateChild will not be added but newChild will be:
var doc = new XDocument(
new XElement("parent",
new XElement("child", 1)));
var parent = doc.Descendants("parent").Single();
var duplicateChild = new XElement("child", 1);
var newChild = new XElement("child", 2);
if (!parent.Elements().Any(e => XNode.DeepEquals(e, duplicateChild)))
{
parent.Add(duplicateChild);
}
if (!parent.Elements().Any(e => XNode.DeepEquals(e, newChild)))
{
parent.Add(newChild);
}
A demo here: https://dotnetfiddle.net/1t4Q1b
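The two checks above repeat the same pattern, so it can be folded into a small extension method. This is only a sketch: AddIfMissing and XElementExtensions are hypothetical names, not part of LINQ to XML. It relies solely on XNode.DeepEquals, Elements() and Add, as in the example above.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class XElementExtensions
{
    // Hypothetical helper: adds child to parent only if no existing child
    // element is deeply equal to it. Returns true when the child was added.
    public static bool AddIfMissing(this XElement parent, XElement child)
    {
        if (parent.Elements().Any(e => XNode.DeepEquals(e, child)))
        {
            return false;
        }
        parent.Add(child);
        return true;
    }
}
```

With this in place, parent.AddIfMissing(duplicateChild) leaves the parent unchanged and returns false, while parent.AddIfMissing(newChild) appends the element and returns true.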

Merge nodes of the same kind to a single node

I'm trying to merge two nodes of the same kind into a single node.
Given one node like this
<Clubs>
<SPE>Accepted</SPE>
<SCU>Accepted</SCU>
</Clubs>
and this
<Clubs>
<BUS>Declined</BUS>
</Clubs>
it will become like this
<Clubs>
<SPE>Accepted</SPE>
<SCU>Accepted</SCU>
<BUS>Declined</BUS>
</Clubs>
How can I achieve this?
This might help you
XmlDocument myDocument = new XmlDocument();
myDocument.Load(XMLFile);
var NodeToadd = myDocument.ChildNodes.OfType<XmlElement>().Where(nodeVariant => nodeVariant.Name == "Clubs").SelectMany(o => o.ChildNodes.OfType<XmlElement>()).ToList();
// ToList() takes a snapshot, so removing nodes below does not cut the enumeration short
var nodeToDelete = myDocument.ChildNodes.OfType<XmlElement>().Where(nodeVariant => nodeVariant.Name == "Clubs").ToList();
foreach (var m in nodeToDelete)
{
myDocument.RemoveChild(m);
}
XmlNode newNode = myDocument.CreateElement("Clubs");
foreach(var m in NodeToadd)
{
newNode.AppendChild(m);
}
myDocument.AppendChild(newNode);
myDocument.Save(XMLFile);
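If the two <Clubs> nodes are available as separate elements, the same merge can be sketched with LINQ to XML. This is an alternative to the XmlDocument code above, not a fix of it, and ClubsMerger.Merge is a hypothetical name. It assumes both inputs are XElement instances; their children are copied into the result because they already have a parent, so the inputs stay untouched.

```csharp
using System.Xml.Linq;

public static class ClubsMerger
{
    // Hypothetical helper: returns a new element named after the first input
    // that holds copies of the child elements of both inputs, in order.
    public static XElement Merge(XElement first, XElement second)
    {
        // Elements that already have a parent are cloned when added to a new parent.
        return new XElement(first.Name, first.Elements(), second.Elements());
    }
}
```

For the two sample nodes in the question, Merge produces a single <Clubs> element containing SPE, SCU and BUS in that order.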

C# treeview of SQL data

I have a SQL table looking like this:
orderID customerName orderDate valueTotal
================================================================
1 JohnA 01/02/2013 100
2 AmandaF 01/02/2013 140
3 JohnA 05/03/2013 58
4 FredM 05/03/2013 200
And I want to order this information on a treeView by either orderDate or customerName, depending on user settings so that it looks like this if ordered by customerName:
JohnA
01/02/2013
05/03/2013
AmandaF
01/02/2013
FredM
05/03/2013
Or like this if ordered by orderDate:
01/02/2013
JohnA
AmandaF
05/03/2013
JohnA
FredM
What would be the best way to achieve this?
EDIT:
I'm using windows forms
If you're using ADO.Net, test this:
Dictionary<string, List<string>> groups = new Dictionary<string, List<string>>();
//set these dynamic
string groupingFieldName = "customerName";
string targetFieldName = "orderDate";
SqlDataReader rdr = sqlCmd.ExecuteReader();
while (rdr.Read())
{
if (!groups.ContainsKey(rdr[groupingFieldName].ToString()))
{
groups.Add(rdr[groupingFieldName].ToString(), new List<string>());
}
groups[rdr[groupingFieldName].ToString()].Add(rdr[targetFieldName].ToString());
}
//next, iterate the dictionary and populate the treeView
foreach (KeyValuePair<string, List<string>> group in groups)
{
//add to treeView
}
Please note that this is not tested. You still need to test it.
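The grouping step can be tried without a database. The sketch below assumes the rows have already been read into (key, value) pairs; OrderGrouping and GroupRows are hypothetical names. It does the same thing as the while loop above, but uses TryGetValue so each key is looked up only once per row.

```csharp
using System.Collections.Generic;

public static class OrderGrouping
{
    // Hypothetical helper: groups (key, value) pairs by key.
    public static Dictionary<string, List<string>> GroupRows(
        IEnumerable<KeyValuePair<string, string>> rows)
    {
        var groups = new Dictionary<string, List<string>>();
        foreach (var row in rows)
        {
            // Create the list for this key the first time the key is seen.
            if (!groups.TryGetValue(row.Key, out var values))
            {
                values = new List<string>();
                groups.Add(row.Key, values);
            }
            values.Add(row.Value);
        }
        return groups;
    }
}
```

To populate the tree, each key can become a top-level node via treeView.Nodes.Add(key), and each value can be added to the Nodes collection of the TreeNode that call returns.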
I've ended up creating two functions to perform this in a more dynamic and scalable way.
public static TreeNodeCollection SqlToTreeNodeHierarchy(this SqlDataReader dataReader, TreeNode parent)
{
// create a parent TreeNode if we don't have one, so we can anchor the new TreeNodes to it
// I think this will work better than a list since we might be given a real parent..
if (parent == null)
{
parent = new TreeNode("topNode");
}
while (dataReader.Read())
{
//at the beginning of each row, reset the parent
var parentNode = parent;
for (var i = 0; i < dataReader.FieldCount; i++)
{
// Adds a new TreeNode as a child of parentNode if it doesn't already exist
// at this level, else it will return the existing TreeNode and save
// it onto parentNode. This way, subsequent TreeNodes will always be a child
// of this one, until a new row begins and the parent TreeNode is reset.
parentNode = AddUniqueNode(dataReader[i].ToString(), parentNode);
}
}
return parent.Nodes;
}
public static TreeNode AddUniqueNode(string text, TreeNode parentNode)
{
// if parentNode is null, create new treeNode and return it
if (parentNode == null)
{
return new TreeNode {Name = text, Text = text};
}
// if parentNode is not null, do a find for child nodes at this level containing the key
// we're after (text and name have the same value) and return the first one it finds
foreach (var childNode in parentNode.Nodes.Find(text, false))
{
return childNode;
}
// Node does not yet exist, so just add a new node to the parentNode and return that
return parentNode.Nodes.Add(text, text);
}
Then I just need to call the functions as follows:
using (var sqlConn = new SqlConnection(connectionString))
{
sqlConn.Open();
const string query = "SELECT orderDate, customerName from MAIN";
using (var sqlCommand = new SqlCommand(query, sqlConn))
{
using (var sqlDataReader = sqlCommand.ExecuteReader())
{
var treeNodeCollection = sqlDataReader.SqlToTreeNodeHierarchy(null);
foreach (TreeNode treeNode in treeNodeCollection)
{
nativeTreeView.Nodes.Add(treeNode);
}
}
}
}
This way it scales to as many levels of child nodes as I want. It also gives me the flexibility to load child nodes only on expand, by running another SQL query and passing the TreeNode that was just expanded as the parent.

Renaming xmlnodes using c# dynamically

I am using the code below to rename XML nodes dynamically. It loops through the XML just fine, but it does not change the node names. Please help me do this.
Sample XML document
<NewDataSet>
<Table5>
<FLD_ID>62</FLD_ID>
<FLD_DATE>2013-03-12</FLD_DATE>
<FLD_MOD_DATE>2013-04-05</FLD_MOD_DATE>
<FLD_DESC>New Creation</FLD_DESC>
</Table5>
</NewDataSet>
Needed XML document
<rows>
<row>
<cell>62</cell>
<cell>2013-03-12</cell>
<cell>2013-04-05</cell>
<cell>New Creation</cell>
</row>
</rows>
My code is here
XmlNode PackageListNode = hst_doc.SelectSingleNode("NewDataSet");
XmlNodeList PackageNodeList = PackageListNode.SelectNodes("Table5");
foreach (XmlNode node in PackageNodeList)
{
node.Name.Replace("Table5", "row");
foreach (XmlNode ls in node)
{
ls.Name.Replace(ls.Name, "cell");
}
}
Since you can't rename elements in an XmlDocument, here is a replacement approach for your specific situation:
string srcXML = "<NewDataSet><Table5><FLD_ID>62</FLD_ID><FLD_DATE>2013-03-12</FLD_DATE><FLD_MOD_DATE>2013-04-05</FLD_MOD_DATE><FLD_DESC>New Creation</FLD_DESC></Table5></NewDataSet>";
var doc = new XmlDocument();
doc.LoadXml(srcXML);
XmlNode oldRoot = doc.SelectSingleNode("NewDataSet");
XmlNode newRoot = doc.CreateElement("rows");
doc.ReplaceChild(newRoot, oldRoot);
foreach (XmlNode childNode in oldRoot.ChildNodes)
{
newRoot.AppendChild(childNode.CloneNode(true));
}
XmlNodeList PackageNodeList = newRoot.SelectNodes("Table5");
foreach (XmlNode node in PackageNodeList)
{
var newNode = doc.CreateElement("row");
newRoot.ReplaceChild(newNode, node);
foreach (XmlNode childNode in node.ChildNodes)
{
var clonedChildNode = childNode.CloneNode(true);
newNode.AppendChild(clonedChildNode);
var newChildNode = doc.CreateElement("cell");
newNode.ReplaceChild(newChildNode, clonedChildNode);
foreach (XmlNode childChildNode in clonedChildNode.ChildNodes)
{
newChildNode.AppendChild(childChildNode.CloneNode(true));
}
}
}
Debug.Print(doc.OuterXml);
Embrace LINQ, embrace it!
// load the document from a file
var doc = XDocument.Load(xmlPath);
var root = doc.Root;
// replace the root element with a new element
root.ReplaceWith(
// create a new element with
// the name "rows" with new children
new XElement("rows",
// replace all child elements of
// the root with new elements
root.Elements().Select(table =>
// replace the current element with a new element
// with the name "row" with the new children
new XElement("row",
// replace all child elements of the
// current element with new elements
table.Elements().Select(field =>
// replace the current element with a new element
// with the name "cell" with the same value
new XElement("cell",
(string)field
)
)
)
)
)
);
// save the document back to the file
doc.Save(xmlPath);
String.Replace returns a new string, so of course one would love to:
node.Name = node.Name.Replace("Table5", "row");
which might as well be
node.Name = "row";
However, the documentation shows that XmlNode.Name only has a getter and no setter, so neither assignment compiles. You need to create new nodes and replace the old ones with them. XmlNode is an abstract class, so the new nodes have to be created through the owning document:
for (int i = 0; i < PackageNodeList.Count; ++i)
{
XmlNode node = PackageNodeList[i];
// XmlNode is abstract, so create elements through the owning document
XmlNode replacementNode = node.OwnerDocument.CreateElement("row");
foreach (XmlNode ls in node.ChildNodes)
{
XmlNode newCell = node.OwnerDocument.CreateElement("cell");
// Value is null for element nodes, so copy the text content instead
newCell.InnerText = ls.InnerText;
replacementNode.AppendChild(newCell);
}
// ReplaceChild takes (newChild, oldChild)
node.ParentNode.ReplaceChild(replacementNode, node);
}
