Add columns to DataTable with loop from html file - c#

I want to add columns to my DataTable from the <th> tags in my HTML file, using a foreach loop.
I have a problem with it: I don't understand why I get a null reference exception. My HTML file doesn't contain any empty tags.
Fragment of my C# code:
DataTable dt = new DataTable();
int i = 0;
HtmlNode table = doc.DocumentNode.SelectSingleNode("//table[1]");
foreach (var row in table.SelectNodes("tr"))
{
var headers = row.SelectNodes("th");
foreach (var el in headers)
{
if (headers != null)
{
dt.Columns.Add(headers[i].InnerText);
i++;
}
}
}
Here is a fragment of my HTML file:
<table>
<colgroup><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/></colgroup>
<tr><th>id</th><th>inserted_at</th><th>DisplayName</th><th>DistinguishedName</th><th>Enabled</th><th>GivenName</th><th>HomeDirectory</th><th>Manager</th><th>Name</th><th>ObjectClass</th><th>ObjectGUID</th><th>SamAccountName</th><th>Surname</th><th>UserPrincipalName</th><th>RowError</th><th>RowState</th><th>Table</th><th>ItemArray</th><th>HasErrors</th></tr>

This works for your html:
var str = @"<table>
<colgroup><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/></colgroup>
<tr><th>id</th><th>inserted_at</th><th>DisplayName</th><th>DistinguishedName</th><th>Enabled</th><th>GivenName</th><th>HomeDirectory</th><th>Manager</th><th>Name</th><th>ObjectClass</th><th>ObjectGUID</th><th>SamAccountName</th><th>Surname</th><th>UserPrincipalName</th><th>RowError</th><th>RowState</th><th>Table</th><th>ItemArray</th><th>HasErrors</th></tr>";
var hdoc = new HtmlAgilityPack.HtmlDocument();
hdoc.LoadHtml(str);
var headerElements = hdoc.DocumentNode.Descendants("th");
foreach(var headerElement in headerElements)
{
Console.WriteLine(headerElement.InnerText);
}

I also need to select the headers from a specific table, so this is what actually worked for me:
HtmlNode table = doc.DocumentNode.SelectSingleNode("//table[1]");
var headerElements = table.Descendants("th");
foreach (var headerElement in headerElements)
{
dt.Columns.Add(headerElement.InnerText, typeof(string));
}
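On the original null exception: a likely cause, assuming the table also contains data rows with only <td> cells, is that HtmlAgilityPack's SelectNodes returns null (not an empty collection) when nothing matches, so the inner foreach runs over null before the null check is reached. A minimal null-safe sketch follows; the class and method names are made up for illustration, and it needs the HtmlAgilityPack package.

```csharp
using System.Data;
using HtmlAgilityPack;

// Hypothetical helper, not part of the original posts.
public static class HeaderColumns
{
    // Builds a DataTable whose columns are named after the <th> cells of the first table.
    public static DataTable FromFirstTable(HtmlDocument doc)
    {
        var dt = new DataTable();
        HtmlNode table = doc.DocumentNode.SelectSingleNode("//table[1]");

        // SelectNodes returns null when there is no match, so test the result
        // before iterating instead of inside the loop.
        HtmlNodeCollection headers = table?.SelectNodes(".//th");
        if (headers == null)
            return dt;

        foreach (HtmlNode th in headers)
            dt.Columns.Add(th.InnerText.Trim(), typeof(string));

        return dt;
    }
}
```

Iterating the nodes directly also removes the need for the separate index i, which was never reset between rows in the question's code.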

Related

Why do I get System.Data.DataRow instead of the DataTable? (I've retrieved a table from Outlook as an HTML body, then parsed it into a DataTable)

I've retrieved a table from Outlook as an HTML body and parsed it into a DataTable, but when I run the code, all I get is System.Data.DataRow
static void Main(string[] args)
{
var mails = OutlookEmails.ReadMailItems();
foreach (var mail in mails)
{
StringBuilder builder = new StringBuilder();
builder.Append(mail.EmailBody.ToString());
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(builder.ToString());
var nodes = doc.DocumentNode.SelectNodes("//table//tr");
DataTable dataTable = new DataTable();
var headers = nodes[0]
.Elements("th")
.Select(th => th.InnerText.Trim());
foreach (var header in headers)
{
dataTable.Columns.Add(header);
}
var rows = nodes.Skip(1).Select(tr => tr
.Elements("td")
.Select(td => td.InnerText.Trim())
.ToArray());
foreach (var row in rows)
{
dataTable.Rows.Add(row);
}
Console.WriteLine(dataTable.Rows);
Console.ReadLine();
}
}
Because you are just printing out the type of the object.
Console.WriteLine calls ToString() on dataTable.Rows, and that returns the type name rather than the contents.
If you want to print out every column for every row in your dataTable, you must do so explicitly.
Try this:
foreach (DataRow row in dataTable.Rows)
{
Console.WriteLine();
foreach (DataColumn col in dataTable.Columns)
{
Console.Write(row[col] + " ");
}
}
For further information: MS DataTable Docs
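As a variation on the loop above, here is a small sketch that formats the whole table as tab-separated text, header line included. The class name DataTablePrinter is made up for illustration; it only uses System.Data and LINQ.

```csharp
using System.Data;
using System.Linq;
using System.Text;

// Hypothetical helper, not part of the original answer.
public static class DataTablePrinter
{
    // Returns one header line, then one line per row, with values separated by tabs.
    public static string Format(DataTable table)
    {
        var sb = new StringBuilder();

        var names = table.Columns.Cast<DataColumn>().Select(c => c.ColumnName);
        sb.AppendLine(string.Join("\t", names));

        foreach (DataRow row in table.Rows)
        {
            // ItemArray holds the row's values in column order.
            sb.AppendLine(string.Join("\t", row.ItemArray));
        }
        return sb.ToString();
    }
}
```

In the question's Main method this would be called as Console.WriteLine(DataTablePrinter.Format(dataTable));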

Fill a datagridview with specific XML node from several XML files

I have to fill a DataGridView with the data from 3 specific XML nodes, taken from several XML files.
Here is an example:
<?xml version='1.0' encoding='iso-8859-1'?>
<retorno>
<mensagem>
<codigo>00001 - Sucesso</codigo>
</mensagem>
<alerta>
</alerta>
<numero_nfse>641</numero_nfse>
<serie_nfse>1</serie_nfse>
<data_nfse>08/09/2020</data_nfse>
<hora_nfse>12:16:10</hora_nfse>
<arquivo_gerador_nfse>688569.xml</arquivo_gerador_nfse>
<cod_verificador_autenticidade>03379569</cod_verificador_autenticidade>
</retorno>
I need to get these 3 tags - <numero_nfse>, <data_nfse>, <cod_verificador_autenticidade> - and load them into a DataGridView.
However, there are more XML files with the same tags, and I would like to load all of them into the DataGridView at the same time.
I wrote the code below and, as you can see, it's not working.
string[] arquivos = Directory.GetFiles(@"D:\Documentos\retorno");
DataSet retorno = new DataSet();
for (int j = 0; j < arquivos.Length; j++)
{
FileInfo Xmls = new FileInfo(arquivos[j]);
string caminhoXmls = Convert.ToString(Xmls);
XmlDocument retornoXml = new XmlDocument();
retornoXml.Load(caminhoXmls);
XmlNodeList retornoTags = retornoXml.GetElementsByTagName("retorno");
foreach (XmlNode xn in retornoTags)
{
string XmlNumeroNfse = xn["numero_nfse"].InnerText;
string XmlDataNfse = xn["data_nfse"].InnerText;
string XmlHoraNfse = xn["hora_nfse"].InnerText;
string XmlCodigo = xn["cod_verificador_autenticidade"].InnerText;
}
retorno.ReadXml(caminhoXmls);
dgvDadosNota.DataSource = retorno.Tables[j];
}
To clarify: I want one column for each tag. So my DataGridView would have 3 columns and as many rows as there are files in the directory. There's only one <retorno> in each XML file.
Can anyone help me?
You are loading your multiple XML files into a DataSet with one DataTable for each file, but as explained in "How to bind Dataset to DataGridView in windows application", you can only bind a single DataTable to a DataGridView.
Since you have only one <retorno> node in each file, it would make sense to load the files into a DataTable with 3 columns - one each for <numero_nfse>, <data_nfse>, and <cod_verificador_autenticidade> - and one row for each file.
The following code does this:
static DataTable CreateDataTableFromRetornoXML(IEnumerable<string> fileNames)
{
var columnNames = new []
{
"numero_nfse",
"data_nfse",
"cod_verificador_autenticidade",
};
var rootName = "retorno";
var table = new DataTable();
foreach (var name in columnNames)
table.Columns.Add(name, typeof(string));
foreach (var fileName in fileNames)
{
var row = table.NewRow();
var root = XElement.Load(fileName);
var retorno = root.DescendantsAndSelf(rootName).Single(); // There should be only one.
foreach (DataColumn column in table.Columns)
{
row[column] = retorno.Element(column.ColumnName)?.Value;
}
table.Rows.Add(row);
}
return table;
}
Note that I have switched to the LINQ to XML API, which is generally easier to work with than the old XmlDocument API.
Demo fiddle here.

Should I Convert excel to csv or write htmltable to csv

I have 10 reports in my application which I let users export to Excel. I have never written CSV files. In my existing application, I convert the results from the stored procedure to an HTML table and write it to Excel. Some of the results from the stored procedures have dynamic columns, so I use Dapper. My new requirement is to provide CSV export as well.
So should I first convert the HTML table to Excel and then convert that to CSV, or write the HTML table directly to CSV? I don't want to parse manually, because there are 10 different reports with different columns, and some of the reports have dynamic columns.
Stored procs called through Dapper, dynamic columns
EFDbContext db = new EFDbContext();
var recordDate = StartDate.Date;
var cnn = new SqlConnection(db.Database.Connection.ConnectionString);
cnn.Open();
var p = new DynamicParameters();
p.Add("@StartDate", StartDate);
p.Add("@UserRoleID", UserRoleID);
p.Add("@SelectedSystemIDs", SelectedSystemIDs);
p.Add("@SelectedPartIDs", SelectedPartIDs);
p.Add("@SelectedSubSystems", SelectedSubsystems);
p.Add("#SelectedServiceTypes", SelectedServiceTypes);
var obs = cnn.Query(sql: "spExportInstrumentConfigAll", param: p, commandType: CommandType.StoredProcedure);
var dt = ToDataTable(obs);
return ExportDatatableToHtml(dt);
public static DataTable ToDataTable(IEnumerable<dynamic> items)
{
if (items == null) return null;
var data = items.ToArray();
if (data.Length == 0) return null;
var dt = new DataTable();
foreach (var pair in ((IDictionary<string, object>)data[0]))
{
dt.Columns.Add(pair.Key, (pair.Value ?? string.Empty).GetType());
}
foreach (var d in data)
{
dt.Rows.Add(((IDictionary<string, object>)d).Values.ToArray());
}
return dt;
}
public static string ExportDatatableToHtml(DataTable dt)
{
StringBuilder strHTMLBuilder = new StringBuilder();
strHTMLBuilder.Append("<html >");
strHTMLBuilder.Append("<head>");
strHTMLBuilder.Append("</head>");
strHTMLBuilder.Append("<body>");
strHTMLBuilder.Append("<table border='1px' cellpadding='1' cellspacing='1' style='font-family:Garamond; font-size:medium'>");
strHTMLBuilder.Append("<tr >");
foreach (DataColumn myColumn in dt.Columns)
{
strHTMLBuilder.Append("<td >");
strHTMLBuilder.Append(myColumn.ColumnName);
strHTMLBuilder.Append("</td>");
}
strHTMLBuilder.Append("</tr>");
foreach (DataRow myRow in dt.Rows)
{
strHTMLBuilder.Append("<tr >");
foreach (DataColumn myColumn in dt.Columns)
{
strHTMLBuilder.Append("<td >");
strHTMLBuilder.Append(myRow[myColumn.ColumnName].ToString());
strHTMLBuilder.Append("</td>");
}
strHTMLBuilder.Append("</tr>");
}
//Close tags.
strHTMLBuilder.Append("</table>");
strHTMLBuilder.Append("</body>");
strHTMLBuilder.Append("</html>");
string Htmltext = strHTMLBuilder.ToString();
return Htmltext;
}
Non-Dynamic Columns mapped to entity
return db.Database.SqlQuery<ServiceEntryPartExportDataRow>("[dbo].[spExportServiceParts] @parm1, @parm2, @parm3, @parm4,@parm5,@parm6",
new SqlParameter("parm1", StartDate),
new SqlParameter("parm2", EndDate),
new SqlParameter("parm3", Reconciled),
new SqlParameter("parm4", ServiceTypes),
new SqlParameter("parm5", SelectedSystemIDs),
new SqlParameter("parm6", UserRoleID)
).ToList().ToHTMLTable();
public static string ToHTMLTable<T>(this IList<T> data)
{
PropertyDescriptorCollection props =
TypeDescriptor.GetProperties(typeof(T));
StringBuilder builder = new StringBuilder();
builder.Append("<table border=\"1\">");
builder.Append("<tr>");
for (int i = 0; i < props.Count; i++)
{
builder.Append("<td>");
PropertyDescriptor prop = props[i];
builder.Append(prop.Name);
builder.Append("</td>");
}
builder.Append("</tr>");
object[] values = new object[props.Count];
foreach (T item in data)
{
builder.Append("<tr>");
for (int i = 0; i < values.Length; i++)
{
builder.Append("<td>");
builder.Append(props[i].GetValue(item));
builder.Append("</td>");
}
builder.Append("</tr>");
}
builder.Append("</table>");
return "<html><body>" + builder.ToString() + "</body></html>";
}
Current code Sending to Excel
return new PostActionResult(htmlTable, "ServiceEntryHistory", submit);
public PostActionResult(string htmlTable, string typeName, string submit) { this.htmlTable = htmlTable; this.typeName = typeName; this.submit = submit; }
public PostActionResult(DataTable dataTable, string typeName, string submit) { this.dataTable = dataTable; this.typeName = typeName; this.submit = submit; }
public override void ExecuteResult(ControllerContext context)
{
if (submit == "Excel")
{
ExcelHelpers.ExportToExcel(context.HttpContext, typeName, htmlTable);
}
if (submit == "CSV")
{
ExcelHelpers.ExportToExcelCSV(context.HttpContext, typeName, htmlTable);
}
}
public static void ExportToExcel(HttpContextBase httpBase, string fileNamePrefix, string table)
{
string TimeStamp = DateTime.Now.ToLocalTime().ToString();
string fileName = string.Format("attachment;filename={0}_{1}.xls", fileNamePrefix, TimeStamp);
httpBase.Response.ClearHeaders();
httpBase.Response.ClearContent();
httpBase.Response.Clear();
httpBase.Response.AddHeader("content-disposition", fileName);
httpBase.Response.ContentType = "application/vnd.ms-excel";
httpBase.Response.Write(table);
httpBase.Response.End();
}
You already have code to build an HTML table from the data. Building a CSV is very nearly identical. For brevity, let's simplify the HTML table pseudo-code:
builder.Append("<table>");
// header
builder.Append("<tr>");
foreach (var column in columns)
builder.Append("<th>" + column.name + "</th>");
builder.Append("</tr>");
// rows
foreach (var row in rows)
{
builder.Append("<tr>");
foreach (var column in row.columns)
builder.Append("<td>" + column.value + "</td>");
builder.Append("</tr>");
}
builder.Append("</table>");
Building a CSV is the exact same structure:
// header
foreach (var column in columns)
builder.Append("\"" + column.name + "\",");
// there's now an extra comma at the end. remove it, or use a
// different method to have built the row, such as string.Join.
// rows
foreach (var row in rows)
{
foreach (var column in row.columns)
builder.Append("\"" + column.value + "\",");
// there's now an extra comma at the end. remove it, or use a
// different method to have built the row, such as string.Join.
builder.Append(Environment.NewLine);
}
Remember that this is free-hand pseudo-code, so there are some clean-ups you can employ. You might also check the column types to determine whether you need those escaped quotes or not, since numeric types wouldn't want them. But the point is that the structure is the same. A CSV is text in the same way that HTML is text. It's only the dressing around the values that's different.
Side note: This is actually a classic example of the Template Method Pattern.
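To make the CSV pseudo-code above concrete, here is a minimal sketch that writes a DataTable (such as the one returned by ToDataTable in the question) as CSV text, using string.Join to avoid the trailing comma. The class name CsvExport and its quoting rule (quote only when a value contains a comma, a quote, or a line break, and double any embedded quotes) are my assumptions, not something from the original code.

```csharp
using System;
using System.Data;
using System.Linq;
using System.Text;

// Hypothetical helper, not part of the original answer.
public static class CsvExport
{
    // Quotes a field only when needed; embedded quotes are doubled.
    public static string Escape(object value)
    {
        // Convert.ToString yields "" for null and DBNull, and uses the current culture.
        string s = Convert.ToString(value) ?? string.Empty;
        bool needsQuotes = s.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        return needsQuotes ? "\"" + s.Replace("\"", "\"\"") + "\"" : s;
    }

    // One header line with the column names, then one line per row.
    public static string ToCsv(DataTable table)
    {
        var sb = new StringBuilder();

        var header = table.Columns.Cast<DataColumn>().Select(c => Escape(c.ColumnName));
        sb.AppendLine(string.Join(",", header));

        foreach (DataRow row in table.Rows)
        {
            // string.Join puts separators only between values, so no trailing comma.
            sb.AppendLine(string.Join(",", row.ItemArray.Select(v => Escape(v))));
        }
        return sb.ToString();
    }
}
```

Because it works from the DataTable's own columns, the same method should cover both the fixed and the dynamic-column reports, provided each report is first turned into a DataTable.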

Edit a row in a datatable C#

I have a DataTable that is filled in a foreach loop, getting site usage information from multiple SharePoint websites. I would like to add a column containing the site URL to the data from each foreach iteration. I can only figure out how to do this by adding a new row, which makes the site URL appear below the entry.
How can I get the URL to go into the row above it?
My code is below:
SPListItemCollection items = list.GetItems(query);
DataTable aggregatedTable = new DataTable();
foreach (SPListItem item in items)
{
string url = item["SiteUrl"].ToString();
try
{
using (SPSite siteadd = new SPSite(url))
using (SPWeb webadd = siteadd.OpenWeb())
{
//
DataTable table = webadd.GetUsageData(Microsoft.SharePoint.Administration.SPUsageReportType.browser, Microsoft.SharePoint.Administration.SPUsagePeriodType.lastMonth);
table.Columns.Add("url");
if (table == null)
{
// HttpContext.Current.Response.Write("Table Null");
}
else
{
DataRow dr;
dr = table.NewRow();
dr["url"] = url;
table.Rows.Add(dr);
// table.Rows.Add(url);
aggregatedTable.Merge(table);//Append the data to previous site data.
}
}
}
catch { }
}
dataGridView1.DataSource = aggregatedTable;//bind datatable with
Why are you adding a new row to your existing DataTable? You should instead set the value on the existing row.
e.g.
var CurRow = table.AsEnumerable().FirstOrDefault();
table.Columns.Add("url");
if (CurRow != null)
{
CurRow["url"] = url;
}
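The snippet above fills only the first row. If the usage table can contain several rows per site, a sketch along these lines would fill the URL on every row before merging. The class and method names are made up for illustration, and it assumes the aggregated table is merged the same way as in the question.

```csharp
using System.Data;

// Hypothetical helper, not part of the original answer.
public static class UsageTables
{
    // Adds a "url" column to `table`, sets it on every existing row,
    // then appends the rows to `aggregated`.
    public static void AppendWithUrl(DataTable aggregated, DataTable table, string url)
    {
        // Check for null before touching the table (the question's code added
        // the column first, which would throw if the table were null).
        if (table == null)
            return;

        if (!table.Columns.Contains("url"))
            table.Columns.Add("url", typeof(string));

        foreach (DataRow row in table.Rows)
            row["url"] = url;

        aggregated.Merge(table);
    }
}
```

Inside the question's loop this would replace the NewRow / Rows.Add / Merge lines with a single call: UsageTables.AppendWithUrl(aggregatedTable, table, url);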

Like statement or removal of trailing blanks in html agility pack?

I'm trying to download data from a website into a DataTable. The problem is that I cannot access the right node, because there seem to be blank spaces. Here is my code so far:
public static DataTable downloadtable()
{
DataTable dt = new DataTable();
string htmlCode = "";
using (WebClient client = new WebClient())
{
client.Headers.Add(HttpRequestHeader.UserAgent, "AvoidError");
htmlCode = client.DownloadString("https://www.eex.com/en/Market%20Data/Trading%20Data/Power/Hour%20Contracts%20%7C%20Spot%20Hourly%20Auction/Area%20Prices/spot-hours-area-table/2013-08-22");
}
//this is just to check the file structure from text file
System.IO.StreamWriter file = new System.IO.StreamWriter("c:\\temp\\test.txt");
file.WriteLine(htmlCode);
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlCode);
dt = new DataTable();
foreach (HtmlNode table in doc.DocumentNode.SelectNodes("//table[@class='list electricity']/tr/th[@class='title'][.='Market Area']"))
{
//This is the problem name where I get the error
foreach (HtmlNode row in table.SelectNodes("//td[@class='title'][.=' 00-01 ']"))
{
foreach (var cell in row.SelectNodes("//td"))
{
//this is to check for correct result, final result would be to dump it into datatable
Console.WriteLine(cell.InnerText);
}
}
}
return dt;
}
I'm trying to download the hourly prices from the link in the code, but it seems to fail because of trailing blanks (I think).
Is there something like a LIKE statement for matching a node? Or can you remove the trailing blanks?
I believe your problem is that you are trying to retrieve td's from inside a td node which obviously doesn't have more td's.
<tr>
<td class="title"> 00-01 </td>
<td class="spacer"></td>
<td class="r">€/MWh</td>
<td class="spacer"></td>
<td>35.34</td>
<td class="spacer"></td>
<td>34.02</td>
<td class="spacer"></td>
<td>34.02</td>
</tr>
So if you try to iterate over the result of table.SelectNodes("//td[@class='title'][.=' 00-01 ']"), it will contain no td's inside of it.
If you want all the rows starting from 00-01 you can use this one:
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlCode);
foreach (HtmlNode row in doc.DocumentNode.SelectNodes("//td[@class='title'][(normalize-space(.)='00-01')]/ancestor::table"))
{
foreach (var cell in row.SelectNodes("./tr/td"))
{
if (string.IsNullOrEmpty(cell.InnerText.Trim()))
continue;
Console.WriteLine(cell.InnerText.Trim());
}
}
If you want only the 00-01 row you can use this one:
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlCode);
foreach (HtmlNode row in doc.DocumentNode.SelectNodes("//td[@class='title']"))
{
if (row.InnerText.Trim() == "00-01")
{
foreach (var cell in row.ParentNode.ChildNodes)
{
if (string.IsNullOrEmpty(cell.InnerText.Trim()))
continue;
Console.WriteLine(cell.InnerText.Trim());
}
}
}
Or you can use it as:
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlCode);
foreach (HtmlNode row in doc.DocumentNode.SelectNodes("//td[@class='title'][(normalize-space(.)='00-01')]"))
{
foreach (var cell in row.ParentNode.ChildNodes)
{
if (string.IsNullOrEmpty(cell.InnerText.Trim()))
continue;
Console.WriteLine(cell.InnerText.Trim());
}
}
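To see what normalize-space does in isolation, here is a small sketch that runs the same kind of XPath against well-formed markup with the built-in System.Xml classes instead of HtmlAgilityPack (a swapped-in library, chosen only so the example has no package dependency). The class and method names are made up, and the label is assumed not to contain an apostrophe, since it is pasted straight into the XPath string.

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper, not part of the original answer.
public static class RowLookup
{
    // Returns the non-empty cell texts of the row whose "title" cell equals
    // `label` once leading and trailing whitespace is ignored.
    public static List<string> CellsFor(string xml, string label)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        // normalize-space(.) trims the cell text before comparing;
        // "/../td" then selects all cells of that cell's parent row.
        string xpath = "//td[@class='title'][normalize-space(.)='" + label + "']/../td";

        var result = new List<string>();
        foreach (XmlNode cell in doc.SelectNodes(xpath))
        {
            string text = cell.InnerText.Trim();
            if (text.Length > 0)
                result.Add(text);
        }
        return result;
    }
}
```

For a looser, LIKE-style match, the XPath 1.0 function contains(., '00-01') can be used in the predicate in place of the normalize-space comparison.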
