Converting XML of a SharePoint list to a DataSet - C#

It's been more than a week and I still can't figure out what the problem is here; I hope you can help me. I am successfully retrieving XML from a SharePoint server using the SOAP web service and then converting the XML to a DataSet object. I get the DataSet successfully, but it is "damaged": a few columns are missing values that exist in the XML. Here is the code for importing the XML using SOAP:
private void button2_Click(object sender, EventArgs e)
{
    oportal.Lists list = new oportal.Lists();
    list.Credentials = System.Net.CredentialCache.DefaultCredentials;
    list.Url = "http://xxx/xxx/xxx/xxx/_vti_bin/Lists.asmx";

    XmlDocument xmlDoc = new System.Xml.XmlDocument();
    XmlNode ndQuery = xmlDoc.CreateNode(XmlNodeType.Element, "Query", "");
    XmlNode ndViewFields = xmlDoc.CreateNode(XmlNodeType.Element, "ViewFields", "");
    XmlNode ndQueryOptions = xmlDoc.CreateNode(XmlNodeType.Element, "QueryOptions", "");

    ndQueryOptions.InnerXml =
        "<IncludeMandatoryColumns>TRUE</IncludeMandatoryColumns>" +
        "<DateInUtc>FALSE</DateInUtc>";
    // The field display names are in Hebrew ("customer name" and "account manager").
    ndViewFields.InnerXml = @"<FieldRef Name='שם לקוח' />
                              <FieldRef Name='שם מתל' />";

    try
    {
        XmlNode ndListItems = list.GetListItems(
            "{DD1CF626-62E1-4E36-BF2B-C7D08EA73674}", null, ndQuery,
            ndViewFields, "14000", ndQueryOptions, null);
        System.Diagnostics.Debug.WriteLine(ndListItems.OuterXml);
        dataGridView1.DataSource = ConverttYourXmlNodeToDataSet(ndListItems).Tables[1];
    }
    catch (System.Web.Services.Protocols.SoapException ex)
    {
        MessageBox.Show(ex.Message + Environment.NewLine + ex.Detail.InnerText
            + Environment.NewLine + ex.StackTrace);
    }
}
The XML I get looks OK. The column (field) names are in Hebrew, but the XML shows them hex-encoded (as _xNNNN_ sequences) - maybe that's the root of the problem?
After receiving the XML, I convert it into a DataSet with the ConverttYourXmlNodeToDataSet() function. Here is the code:
public static DataSet ConverttYourXmlNodeToDataSet(XmlNode xmlnodeinput)
{
    DataSet dataset = null;
    if (xmlnodeinput != null)
    {
        XmlTextReader xtr = new XmlTextReader(xmlnodeinput.OuterXml, XmlNodeType.Element, null);
        dataset = new DataSet();
        dataset.ReadXml(xtr);
    }
    return dataset;
}
I get the DataSet successfully but, as mentioned, it is damaged because of the missing values; they exist in the XML but not in the DataSet (the columns exist, but not the values).
Please have a look at this screenshot:
I've circled in red one of the columns that doesn't get its values from the XML. Here is a screenshot of the XML, with the missing value that should be in the DataSet circled in red:
I also tried converting the XML to a DataSet like this, but the results are the same:
public static DataSet read(XmlNode x)
{
    DataSet ds = new DataSet();
    XmlReader r = new XmlNodeReader(x);
    ds.ReadXml(r);
    return ds;
}
I hope someone can help me here. Thanks.
UPDATE:
OK, I have not solved it yet, but I discovered a few things that may lead to the solution:
I have noticed that all the columns that appear without values in the DataSet are columns filled in by users through the website controls, and, guess what, the captions for all of those columns are in Hebrew. The columns that do appear with values in the DataSet are SharePoint default columns: their captions are in English, and their internal names are not hex-encoded (look at the XML)! This makes me suspect the problem is related to the hex-encoded internal names generated for the Hebrew captions; my assumption is that the DataSet cannot interpret this encoding. Another clue is that a column name as spelled in the DataSet is not interpreted correctly (for example, look at the screenshot of the DataGridView above, column 4 from the left, index 3): the name should be 'שם מתל' and nothing more, but as you can see (you don't need to understand Hebrew for this) only half of that Hebrew string is there, with part of the hex-encoded name concatenated onto it.
I have also noticed that when I sort columns on the SharePoint website, the requested URL uses the hex-encoded internal name of the column, not its Hebrew name:
http://xxx/xxx/xxx/xxx/Lists/1/view9.aspx?View={c2538b95-efae-453b-b536-aad6f98265ed}&SortField=_x05e9__x05dd__x0020__x05de__x05&SortDir=Desc
whereas I expected to see something like:
http://xxx/xxx/xxx/xxx/Lists/1/view9.aspx?View={c2538b95-efae-453b-b536-aad6f98265ed}&SortField=_'שם מתל'=Desc
So I changed my code to declare the column names explicitly in their hex-encoded form (the original code is above):
ndViewFields.InnerXml = @"<FieldRef Name='_x05d0__x05d9__x05e9__x05d5__x05' />
                          <FieldRef Name='_x05e9__x05dd__x0020__x05de__x05' />";
Now the result in the DataSet changed: the columns I declared explicitly moved to the first column indexes of the DataSet, but there are still no values in them.
So, to summarize all of that digging, here are my assumptions:
*. The problem is in the conversion between the XML and the DataSet.
*. The conversion is defective because it cannot interpret the hex-encoded (_xNNNN_) attribute names properly.
*. The column names are hex-encoded because their captions are in Hebrew.
*. The solution could be either getting plain Hebrew column names into the XML, or making the XML-to-DataSet conversion handle the encoding properly (maybe using the XmlParserContext class? I tried a little with no success; or some other class that can manipulate encoded XML text).
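As an aside (not from the original post): the encoding described above is SharePoint's internal field-name escaping, where each character that is not valid in an XML name is stored as _xNNNN_ (the UTF-16 code unit in hex). The .NET framework already ships a decoder for exactly this scheme, XmlConvert.DecodeName, which can be used to recover the Hebrew caption from an internal name:

```csharp
using System;
using System.Xml;

class Demo
{
    static void Main()
    {
        // XmlConvert.DecodeName reverses the _xNNNN_ escaping that SharePoint
        // (and XmlConvert.EncodeName) applies to characters that are not valid
        // in XML names. "_x05e9_" is ש and "_x05dd_" is ם.
        Console.WriteLine(XmlConvert.DecodeName("_x05e9__x05dd_")); // prints "שם"
    }
}
```

Note that SharePoint truncates long internal names, which is why "_x05e9__x05dd__x0020__x05de__x05" ends mid-escape; an incomplete trailing "_x05" is not a valid escape and is left as-is by the decoder.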

Finally, I managed to solve this, and the solution I found was super simple.
I had been searching and struggling for some time without finding a solution, and then this simple one crossed my mind.
Only one line of code was needed:
String s = xmlnodeinput.OuterXml.Replace("ows__x05e9__x05dd__x0020__x05de__x05",
"AccountManager");
Just replacing the hex-encoded name lets the DataSet load properly.
I also checked that there are no processing-time issues (replacing the string takes less than a second):
start reading 12000 rows: 26/03/2016 17:18:00
start replacing strings: 26/03/2016 17:18:04
load xml string to dataset: 26/03/2016 17:18:04
finish loading dataset: 26/03/2016 17:18:04
The complete XML-to-DataSet conversion function:
public static DataSet ConverttYourXmlNodeToDataSet(XmlNode xmlnodeinput)
{
    // declaring the data set object
    DataSet dataset = null;
    if (xmlnodeinput != null)
    {
        NameTable nt = new NameTable();
        nt.Add("row");
        XmlNamespaceManager nsmgr = new XmlNamespaceManager(nt);
        XmlParserContext context = new XmlParserContext(nt, null, "heb", null, null,
            null, null, null, XmlSpace.None, Encoding.Unicode);
        String s = xmlnodeinput.OuterXml.Replace("ows__x05e9__x05dd__x0020__x05de__x05", "AccountManager");
        XmlTextReader xtr = new XmlTextReader(s, XmlNodeType.Element, context);
        dataset = new DataSet();
        dataset.ReadXml(xtr);
    }
    return dataset;
}

I encountered the same problem (missing values after loading XML into a DataSet).
There seems to be a problem with certain characters in attribute names (in my case "-").
The solution from jonathana works (replacing characters in the attribute names before loading the data into the DataSet).
I will additionally provide a solution for .NET 2 that rewrites all attribute names in the SharePoint SOAP result XML, to make sure the conversion into a DataSet won't result in an error (this can be done more nicely with .NET 3+, but I'm forced to use .NET 2 in my situation):
using System.Text.RegularExpressions;
using System.Web;
using System.Xml;

XmlDocument doc = new XmlDocument();
doc.LoadXml(spResXml.OuterXml);

System.Xml.XmlNamespaceManager nm = new System.Xml.XmlNamespaceManager(doc.NameTable);
nm.AddNamespace("rs", "urn:schemas-microsoft-com:rowset");
nm.AddNamespace("z", "#RowsetSchema");
nm.AddNamespace("rootNS", "http://schemas.microsoft.com/sharepoint/soap/");

var zRows = doc.SelectNodes("//z:row", nm);
for (int i = 0; i < zRows.Count; i++)
{
    XmlNode zRow = zRows[i];
    List<XmlAttribute> attsList = new List<XmlAttribute>();
    for (int j = 0; j < zRow.Attributes.Count; j++)
    {
        attsList.Add(zRow.Attributes[j]);
    }
    foreach (XmlAttribute att in attsList)
    {
        // Turn SharePoint's _xNNNN_ escapes into %uNNNN, URL-decode them,
        // then replace anything that is still not a safe identifier character.
        string patchedAttName = att.Name;
        patchedAttName = patchedAttName.Replace("_x", "%u");
        patchedAttName = HttpUtility.UrlDecode(patchedAttName);
        patchedAttName = Regex.Replace(patchedAttName, "[^A-Za-z0-9_]", "_", RegexOptions.None);
        if (att.Name.Equals(patchedAttName))
        {
            continue;
        }
        var newAtt = doc.CreateAttribute(att.Prefix, patchedAttName, att.NamespaceURI);
        newAtt.Value = att.Value;
        zRow.Attributes.Remove(att);
        zRow.Attributes.Append(newAtt);
    }
}

DataSet ds = new DataSet();
ds.ReadXml(new XmlNodeReader(doc));
DataTable t = ds.Tables[1];

Related

Html Agility Pack Xpath not working

So what I'm trying to do is parse an HTML document using Html Agility Pack. I load the HTML document and it works. The issue arises when I try to parse it using XPath: I get a "System.NullReferenceException: 'Object reference not set to an instance of an object.'" error.
To get my XPath I use the Chrome developer window: I highlight the whole table that has the rows containing the data I want to parse, right-click it, and copy the XPath.
Here's my code:
string url = "https://www.ctbiglist.com/index.asp";
string myPara = "LastName=Smith&FirstName=James&PropertyID=&Submit=Search+Properties";
string htmlResult;

// Get the raw HTML from the website
using (WebClient client = new WebClient())
{
    client.Headers[HttpRequestHeader.ContentType] = "application/x-www-form-urlencoded";
    // Send in the link along with the FirstName, LastName, and Submit POST request
    htmlResult = client.UploadString(url, myPara);
    //Console.WriteLine(htmlResult);
}

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(htmlResult);
HtmlNodeCollection table = doc.DocumentNode.SelectNodes("//*[@id=\"Table2\"]/tbody/tr[2]/td/table/tbody/tr/td/div[2]/table/tbody/tr[2]/td/table/tbody/tr[2]/td/form/div/table[1]/tbody/tr");
Console.WriteLine(table.Count);
The following code runs, but it grabs all the tables in the HTML document:
var query = from table in doc.DocumentNode.SelectNodes("//table").Cast<HtmlNode>()
            from row in table.SelectNodes("//tr").Cast<HtmlNode>()
            from cell in row.SelectNodes("//th|td").Cast<HtmlNode>()
            select new { Table = table.Id, CellText = cell.InnerText };

foreach (var cell in query)
{
    Console.WriteLine("{0}: {1}", cell.Table, cell.CellText);
}
What I want is the specific table that holds all the rows with the data I want to parse into objects.
Thanks for the help!
Change the line
from table in doc.DocumentNode.SelectNodes("//table").Cast<HtmlNode>()
to
from table in doc.DocumentNode.SelectNodes("//table[@id=\"Table2\"]").Cast<HtmlNode>()
This will select only the specific table with the given id. If you have nested tables, you will have to adjust your XPath accordingly to get the nested table rows. Note also that the inner expressions //tr and //th|td search from the document root rather than within each table; use relative paths (tr, th|td) to stay inside the current node.
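Putting the pieces together, a sketch of what the corrected query might look like (with relative row/cell paths so each table contributes only its own rows; Table2 is the id from the question):

```csharp
// Select only the table with id "Table2", then walk its own rows and cells.
var query = from table in doc.DocumentNode.SelectNodes("//table[@id='Table2']").Cast<HtmlNode>()
            from row in table.SelectNodes(".//tr").Cast<HtmlNode>()
            from cell in row.SelectNodes("th|td").Cast<HtmlNode>()
            select new { Table = table.Id, CellText = cell.InnerText };

foreach (var cell in query)
{
    Console.WriteLine("{0}: {1}", cell.Table, cell.CellText);
}
```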

FileHelpers - FieldOptional doesn't seem to work when using ReadStringAsDT

I'm using DelimitedClassBuilder and marked my fields with FieldOptional = true.
I later create my class builder like so:
classBuilder = ClassBuilder.LoadFromXmlString(ColumnMappings);
I create my engine:
engine = new FileHelperEngine((classBuilder as DelimitedClassBuilder).CreateRecordClass());
I then populate the DataSet like so:
var myString = "01,122242843,456183,160823,0716,84,80,1\n02,456183,122242843,1,160822,,USD,1/\n03,008066662,USD,010,0,,,015\n88,125,,450,1134403,,,570/";
using (var dt = engine.ReadStringAsDT(myString))
{
    using (var ds = new DataSet())
    {
        ds.Tables.Add(dt);
        var myXml = ds.GetXml();
    }
}
This seemed pretty straightforward. However, the fields that lack values are still output by the ds.GetXml() call, like so:
<NewDataSet>
  <Table1>
    <ID>01</ID>
    <Credit>122242843/</Credit>
    <Field3>456183</Field3>
    <Field4>160823</Field4>
    <Field5>0716</Field5>
    <Field6>84</Field6>
    <Field7>80</Field7>
    <Field8>1</Field8>
    <Field9 />
  </Table1>
</NewDataSet>
Notice Field9. It is marked with FieldOptional = true. I was under the impression that by marking it optional, if there was no value the field element would not be output in the final XML. Is that not the case, or am I missing another property setting?
Any help would be appreciated.
Kind regards everyone!
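(Not from the thread, but one possible workaround sketch: FieldOptional affects parsing - trailing fields may be absent in the input - while DataSet.GetXml omits an element only when the column value is DBNull. So converting empty strings to DBNull before serializing should drop the empty elements. The helper name below is made up for illustration.)

```csharp
using System;
using System.Data;

static class DataTableCleanup
{
    // Replace empty strings with DBNull so GetXml()/WriteXml()
    // skip emitting an element for those columns.
    public static void NullOutEmptyStrings(DataTable table)
    {
        foreach (DataRow row in table.Rows)
            foreach (DataColumn col in table.Columns)
                if (col.DataType == typeof(string) && Equals(row[col], ""))
                    row[col] = DBNull.Value;
    }
}
```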

XmlDocument to DataSet only returning 1 Row

I have a problematic piece of code, and it's something really peculiar that I've never experienced before!
I'm calling a SharePoint SOAP function, and the results are returned absolutely fine; many XML records of data come back.
I then convert the results into an XmlDocument, which I load into a DataSet.
However, only one record is inserted into the DataSet, and it happens to be the very first record of the XML.
The problematic code is below:
Lists list = new Lists();
list.Url = URL + "_vti_bin/Lists.asmx";
list.UseDefaultCredentials = true;

// Gets the entire set of lists attached to the SharePoint site
XmlNode Results = list.GetListCollection();

// Writes the entire result into an XmlDocument.
doc.AppendChild(doc.ImportNode(Results, true));
using (StringReader xmlSR = new StringReader(doc.InnerXml))
{
    ds.ReadXml(xmlSR, XmlReadMode.Auto);
}
The XML from 'doc.InnerXml' is all valid and pastes cleanly into XML Notepad 2007, so I'm a bit at a loss.
I hope someone can shed some light on this; it would be much appreciated.
The following example works for me:
Lists list = new Lists(); // SharePoint Lists SOAP service

// Perform request
XmlNode result = list.GetListCollection();

// Process result
var ds = new DataSet("ListsResults");
using (var reader = new StringReader(result.OuterXml))
{
    ds.ReadXml(reader, XmlReadMode.Auto);
}

// Print list titles
foreach (DataRow row in ds.Tables[0].Rows)
{
    Console.WriteLine(row["Title"]);
}
Another common approach is to utilize LINQ to XML:
Lists list = new Lists(); // SharePoint Lists SOAP service

// Perform request
XmlNode result = list.GetListCollection();

var docResult = XDocument.Parse(result.OuterXml);
XNamespace s = "http://schemas.microsoft.com/sharepoint/soap/";
var listEntries = from le in docResult.Descendants(s + "List")
                  select new
                  {
                      Title = le.Attribute("Title").Value
                  };

foreach (var e in listEntries)
{
    Console.WriteLine(e.Title);
}

FileHelpers - Column mapping

A quick question regarding the FileHelpers library:
I have used the FileHelpers engine to read a stream and do my validation. If the CSV file has no header, we need to match/map the columns to my model, i.e.:
id, name, age, phone, sex
but the CSV might not come in this format/order every time, and we need to match the columns using a drop-down list for each column.
Is there any way I can do this?
Thanks,
The short answer: no. BUT you can create a dependent class dynamically.
Since you have the list of possible fields in your JSON file, I would recommend doing a basic System.IO ReadLine for the first data row, and then splitting on your delimiter to get the individual headers, i.e.:
string headerString;
var headers = new List<String>();

var file = new System.IO.StreamReader("C:\\myFile.txt");
headerString = file.ReadLine();
file.Close();

headers = headerString.Split(',').ToList();
Now you have the list of strings from the first row to match against your JSON file. Then you can create your dependent class using System.Reflection.Emit (see the link below):
typeBuilder.SetParent(typeof(MyFileHelperBaseClass));

// can place the property definitions in a for loop against your headers
foreach (string h in headers)
{
    typeBuilder.DefineProperty("<header/col#>", ..., typeof(System.Int32), null);
}
Stack Overflow question 14724822: How can I add properties to a class at runtime in C#?
FileHelpers gets a little finicky at times, so it will take some tweaking.
Hope this helps.
You can use File.ReadLines(@"C:\myfile.txt").First() to read the first line and get the headers.
Then you can use a FileHelpers CodeBuilder to build your runtime class. From the example for a delimited CSV file:
DelimitedClassBuilder cb = new DelimitedClassBuilder("Customers", ",");
cb.IgnoreFirstLines = 1;
cb.IgnoreEmptyLines = true;

cb.AddField("BirthDate", typeof(DateTime));
cb.LastField.TrimMode = TrimMode.Both;
cb.LastField.FieldNullValue = DateTime.Today;

cb.AddField("Name", typeof(string));
cb.LastField.FieldQuoted = true;
cb.LastField.QuoteChar = '"';

cb.AddField("Age", typeof(int));

engine = new FileHelperEngine(cb.CreateRecordClass());

DataTable dt = engine.ReadFileAsDT("testCustomers.txt");
Then you can traverse the resulting data table.
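For example, a traversal of the resulting DataTable might look like this (column names taken from the AddField calls above; the output format is just for illustration):

```csharp
// Iterate the rows of the DataTable produced by ReadFileAsDT and print
// each record using the runtime-defined columns.
foreach (DataRow row in dt.Rows)
{
    Console.WriteLine("{0}, age {1}, born {2:d}",
        row["Name"], row["Age"], row["BirthDate"]);
}
```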

HTML Agility pack - parsing tables

I want to use the HTML Agility Pack to parse tables from complex web pages, but I am somewhat lost in the object model.
I looked at the linked example, but did not find any table data that way.
Can I use XPath to get the tables? After loading the data, I am basically lost as to how to get at the tables. I have done this in Perl before with HTML::TableParser, and it was a bit clumsy but worked.
I would also be happy if someone could just shed light on the right object order for the parsing.
How about something like:
// Using HTML Agility Pack
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(@"<html><body><p><table id=""foo""><tr><th>hello</th></tr><tr><td>world</td></tr></table></body></html>");

foreach (HtmlNode table in doc.DocumentNode.SelectNodes("//table"))
{
    Console.WriteLine("Found: " + table.Id);
    foreach (HtmlNode row in table.SelectNodes("tr"))
    {
        Console.WriteLine("row");
        foreach (HtmlNode cell in row.SelectNodes("th|td"))
        {
            Console.WriteLine("cell: " + cell.InnerText);
        }
    }
}
Note that you can make it prettier with LINQ-to-Objects if you want:
var query = from table in doc.DocumentNode.SelectNodes("//table").Cast<HtmlNode>()
            from row in table.SelectNodes("tr").Cast<HtmlNode>()
            from cell in row.SelectNodes("th|td").Cast<HtmlNode>()
            select new { Table = table.Id, CellText = cell.InnerText };

foreach (var cell in query)
{
    Console.WriteLine("{0}: {1}", cell.Table, cell.CellText);
}
The simplest way I've found to get the XPath for a particular element is to install the FireBug extension for Firefox: go to the site/webpage, press F12 to bring up FireBug, right-click the element on the page that you want to query, and select "Inspect Element". FireBug will select the element in its IDE; then right-click the element in FireBug and choose "Copy XPath". This gives you the exact XPath query you need to get the element using the HTML Agility library.
I know this is a pretty old question, but this was my solution, which helps with visualizing the table so you can create a class structure. This also uses the HTML Agility Pack:
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(@"<html><body><p><table id=""foo""><tr><th>hello</th></tr><tr><td>world</td></tr></table></body></html>");

var table = doc.DocumentNode.SelectSingleNode("//table");
var tableRows = table.SelectNodes("tr");
var columns = tableRows[0].SelectNodes("th/text()");

for (int i = 1; i < tableRows.Count; i++)
{
    for (int e = 0; e < columns.Count; e++)
    {
        var value = tableRows[i].SelectSingleNode($"td[{e + 1}]");
        Console.Write(columns[e].InnerText + ":" + value.InnerText);
    }
    Console.WriteLine();
}
In my case, there is a single table, which happens to be a device list from a router. If you wish to read the table using TR/TH/TD (row, header, data) instead of as a matrix as mentioned above, you can do something like the following:
List<TableRow> deviceTable =
    (from table in document.DocumentNode.SelectNodes(XPathQueries.SELECT_TABLE)
     from row in table?.SelectNodes(HtmlBody.TR)
     where row.FirstChild.OriginalName != null
         && row.FirstChild.OriginalName.Equals(HtmlBody.T_HEADER)
     select new TableRow
     {
         Header = row.SelectSingleNode(HtmlBody.T_HEADER)?.InnerText,
         Data = row.SelectSingleNode(HtmlBody.T_DATA)?.InnerText
     }).ToList();
TableRow is just a simple object with Header and Data as properties.
The approach takes care of null-ness, and also handles this case:
<tr>
<td width="28%"> </td>
</tr>
which is a row without a header. The constants hanging off the HtmlBody object should be readily deduced, but I apologize for them all the same; I come from the world where, if you have a string literal in your code, it should be either a constant or localizable.
Regarding this line from the answer above:
HtmlDocument doc = new HtmlDocument();
This doesn't work for me in VS 2015 C#; I cannot construct an HtmlDocument this way any more.
Another MS "feature" that makes things more difficult to use. Try HtmlAgilityPack.HtmlWeb and check out this link for some sample code.
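For reference, a sketch of the HtmlWeb approach suggested above (the URL is a placeholder, and whether new HtmlDocument() compiles depends on which HtmlAgilityPack package and version is referenced):

```csharp
using System;
using HtmlAgilityPack; // third-party NuGet package

class Program
{
    static void Main()
    {
        // HtmlWeb downloads and parses a page in one step,
        // returning an HtmlDocument without calling its constructor directly.
        var web = new HtmlWeb();
        HtmlDocument doc = web.Load("https://example.com/"); // placeholder URL

        var tables = doc.DocumentNode.SelectNodes("//table");
        if (tables != null)
        {
            foreach (HtmlNode table in tables)
            {
                Console.WriteLine("Found table: " + table.Id);
            }
        }
    }
}
```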
