Error in Parsing information from HTML document - c#

I’m writing a test application to see how far Html Agility Pack can go. I decided to use RockAuto because its structure is similar to my own site, which isn’t finished yet (I don’t send many requests to the site, because I know how annoying it is to have a website spammed). When I run the code below, instead of iterating over each element and adding every model to the list, it puts all the models into one string, even though the same approach works on other parent elements. For those familiar with this NuGet package: what am I doing wrong? Thank you for your help.
`using HtmlAgilityPack;
private List<Model> ModelList = new List<Model>();
public IActionResult GetModels(string automaker, string year)
{
var source = $"https://www.rockauto.com/en/catalog/{automaker},{year}";
HtmlWeb web = new HtmlWeb();
var htmlDoc = web.Load(source);
var table = htmlDoc.DocumentNode.SelectSingleNode("//*[@id=\"navchildren[1]\"]");
HtmlNodeCollection childNodes = table.ChildNodes;
foreach (var node in childNodes)
{
if (node.NodeType == HtmlNodeType.Element)
{
ModelList.Add(new Model { ModelName = $"{node.InnerText}", ModelYear = $"{year}", Automaker = $"{automaker}" });
}
}
return Ok(ModelList);
}
public class Model
{
// Auto-Initialized properties
public string ModelName { get; set; }
public string ModelYear { get; set; }
public string Automaker { get; set; }
}`
I tried the above but can’t get a list with one entry per model; it just returns one long string.
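One possible explanation, not verified against the live page: if the element with id navchildren[1] has a single wrapper child, the loop sees only one element node, and InnerText returns the text of everything nested beneath it as one string. Below is a minimal sketch of selecting the nested entries with a relative XPath instead. The sample markup and the navlabellink class name are assumptions made up for illustration, so the real page has to be inspected to find the right selector.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class ModelScraper
{
    // Returns the trimmed text of every <a class="navlabellink"> under the container.
    // The class name is hypothetical; adjust the XPath to the real markup.
    public static List<string> ExtractModelNames(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var container = doc.DocumentNode.SelectSingleNode("//*[@id='navchildren[1]']");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var links = container?.SelectNodes(".//a[@class='navlabellink']");
        if (links == null)
        {
            return new List<string>();
        }

        return links
            .Select(a => HtmlEntity.DeEntitize(a.InnerText).Trim())
            .ToList();
    }

    public static void Main()
    {
        // Made-up markup: a single wrapper <div> holds all the model links.
        const string html = @"<div id='navchildren[1]'><div>
            <a class='navlabellink'>CIVIC</a>
            <a class='navlabellink'>ACCORD</a>
        </div></div>";

        foreach (var name in ExtractModelNames(html))
        {
            Console.WriteLine(name);
        }
    }
}
```

Each returned name can then be wrapped in a Model object exactly as in the original loop.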

Related

Json Deserialize a webclient response C#

I am new to C#, and I know there are hundreds of examples on Google for JSON deserialization. I tried many of them but could not understand how deserialization works in C#.
using (var client = new WebClient())
{
client.Headers.Add("Content-Type", "text/json");
result = client.UploadString(url, "POST", json);
}
result looks like this:
{"Products":[{"ProductId":259959,"StockCount":83},{"ProductId":420124,"StockCount":158}]}
First I created a class:
public class ProductDetails
{
public string ProductId { get; set; }
public string StockCount { get; set; }
}
Then I tried to deserialize using this statement, but couldn't understand how to make it work.
var jsonresult = JsonConvert.DeserializeObject<ProductDetails>(result);
Debug.WriteLine(jsonresult.ProductId);
The equivalent worked fine in Visual Basic with the following code. How can I do something similar in C#?
Dim Json As Object
Set Json = JsonConverter.ParseJson(xmlHttp.responseText)
For Each Product In Json("Products")
Debug.Print = Product("ProductId")
Debug.Print = Product("StockCount")
Next Product
You should use:
public class Product
{
public int ProductId { get; set; }
public int StockCount { get; set; }
}
public class RootObject
{
public List<Product> Products { get; set; }
}
var jsonresult = JsonConvert.DeserializeObject<RootObject>(result);
Because your JSON contains a list of products, jsonresult.Products holds a list of Product objects.
To get each Product you can use, for example, foreach:
foreach(Product p in jsonresult.Products)
{
int id = p.ProductId;
}
Your JSON reads "an object that has a property named Products which contains an array of objects with properties ProductId and StockCount". Hence,
public class Inventory
{
public ProductDetails[] Products { get; set; }
}
var inventory = JsonConvert.DeserializeObject<Inventory>(result);
Your C# code cannot work because your JSON string contains an object whose Products property holds two product objects, so it cannot be deserialized into a single ProductDetails instance.
This is visible in your VB code too: you have to loop over Json("Products") in order to acquire each product object.
Your C# code would still work if your string contained values for only one object, like this:
{"ProductId": 420124, "StockCount": 158}
as you can see here http://www.newtonsoft.com/json/help/html/SerializingJSON.htm
Also you can try json parsing with JObject class, check this out: http://www.newtonsoft.com/json/help/html/t_newtonsoft_json_linq_jobject.htm
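The JObject route can be sketched as follows. This is a minimal example that assumes Json.NET (Newtonsoft.Json) is referenced; the StockReader class and ReadStock method are illustrative names, not part of any library. It is close in spirit to the VB loop, since no model classes are needed.

```csharp
using System;
using System.Collections.Generic;
using Newtonsoft.Json.Linq;

public static class StockReader
{
    // Walks the "Products" array and returns each (ProductId, StockCount) pair.
    public static List<(int ProductId, int StockCount)> ReadStock(string json)
    {
        var result = new List<(int ProductId, int StockCount)>();
        JObject root = JObject.Parse(json);

        // root["Products"] is the JSON array; enumerating it yields one token per product.
        foreach (JToken product in root["Products"])
        {
            result.Add(((int)product["ProductId"], (int)product["StockCount"]));
        }
        return result;
    }

    public static void Main()
    {
        string json = @"{""Products"":[{""ProductId"":259959,""StockCount"":83},{""ProductId"":420124,""StockCount"":158}]}";
        foreach (var (productId, stockCount) in ReadStock(json))
        {
            Console.WriteLine($"{productId}: {stockCount}");
        }
    }
}
```

The explicit (int) casts are where a type mismatch would surface, so a response with a missing or non-numeric field would fail at that line rather than silently.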

How to set Umbraco Child Node to List<T>

I have a blog export package which exports blog content in Umbraco to XML.
Now I want to export the comment data as well. The comments section is set up as a child node of the NewsItem node. How can I use this format to grab the data from the child nodes into the list?
Here is my code:
public List<BlogPosts> getPostList()
{
var contentType = ApplicationContext.Current.Services.ContentTypeService
.GetContentType("umbNewsItem");
var nodes = ApplicationContext.Current.Services.ContentService
.GetContentOfContentType(contentType.Id).Select(content => new Node(content.Id));
return nodes.Select(node => new BlogPosts()
{
Title = node.GetProperty("title").ToNullSafeString(),
BodyText = node.GetProperty("bodyText").ToNullSafeString(),
PublishDate = node.GetProperty("publishDate").ToNullSafeString(),
Author = node.GetProperty("author").ToNullSafeString(),
Image = node.GetProperty("image").ToNullSafeString(),
//This is where I want to grab the blog comments content
Comments = node.ChildrenAsList.Add("comments")
}).ToList();
}
On my first attempt, I get an error on the .Add("comments") line, which reads:
The best overloaded method match for 'System.Collections.Generic.List<umbraco.interfaces.INode>.Add(umbraco.interfaces.INode)' has some invalid arguments
The next thing I tried was this:
Comments = node.ChildrenAsList<BlogComment>.Add("comments").ToList()
which returns the following error:
The property 'umbraco.NodeFactory.Node.ChildrenAsList' cannot be used with type arguments
I have also tried this:
Comments = node.ChildrenAsList.Add("comments").ToList()
which returned this error:
The best overloaded method match for 'System.Collections.Generic.List<umbraco.interfaces.INode>.Add(umbraco.interfaces.INode)' has some invalid arguments
This is my BlogPosts model:
public class BlogPosts
{
public string Title { get; set; }
public string BodyText { get; set; }
public string PublishDate { get; set; }
public string Author { get; set; }
public string Image { get; set; }
public List<BlogComment> Comments { get; set; }
}
public class BlogComment
{
public string Comment { get; set; }
public string CommentDate { get; set; }
}
This is an example of the Umbraco backoffice page:
Image
I've searched Stack Overflow and Google for anything about reading data from child nodes into a list, but the list type here is INode. When I use this:
Comments = node.ChildrenAsList
it returns this error:
Cannot implicitly convert type 'System.Collections.Generic.List<umbraco.interfaces.INode>' to 'System.Collections.Generic.List<UmbracoBlogsExportPackage.Models.BlogComment>'
Okay then :-)
First of all, .Add() tries to add something to a collection, so that won't work here.
Second, I think selecting Content as Nodes is a bit backwards, so I would try not to do that.
Third, IEnumerable has a Cast<T>() method that I think might work here. I can't really test it, though.
Again, this is very untested, but maybe try something like this? Obviously I don't know the Comment DocType alias, so remember to change that bit :-)
public List<BlogPosts> getPostList()
{
var contentType = UmbracoContext.Current.Application.Services.ContentTypeService
.GetContentType("umbNewsItem");
var contentService = UmbracoContext.Current.Application.Services.ContentService;
var nodes = contentService.GetContentOfContentType(contentType.Id);
return nodes.Select(node => new BlogPosts()
{
Title = node.GetValue("title").ToNullSafeString(),
BodyText = node.GetValue("bodyText").ToNullSafeString(),
PublishDate = node.GetValue("publishDate").ToNullSafeString(),
Author = node.GetValue("author").ToNullSafeString(),
Image = node.GetValue("image").ToNullSafeString(),
//This is where I want to grab the blog comments content
Comments = contentService.GetChildren(node.Id).Where(x => x.ContentType.Alias == "Comment").Cast<BlogComment>().ToList()
}).ToList();
}
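One caveat about the untested Cast<T>() idea: Cast<T>() only changes the static type of each element and does not convert between unrelated classes, so casting content items to BlogComment would be expected to throw an InvalidCastException at run time. Mapping each child with Select is the usual alternative. Below is a self-contained sketch of that mapping. ContentItem is a hypothetical stand-in for the Umbraco content type (it is not a real Umbraco class), and the aliases "Comment", "comment" and "commentDate" are guesses that need to be replaced with the real ones.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an Umbraco content item; not a real Umbraco type.
public class ContentItem
{
    public string Alias { get; set; }
    public Dictionary<string, object> Values { get; } = new Dictionary<string, object>();

    // Returns the stored property value, or null when the alias is unknown.
    public object GetValue(string propertyAlias) =>
        Values.TryGetValue(propertyAlias, out var value) ? value : null;
}

public class BlogComment
{
    public string Comment { get; set; }
    public string CommentDate { get; set; }
}

public static class CommentMapper
{
    // Builds a BlogComment from each child of the assumed "Comment" type.
    public static List<BlogComment> ToComments(IEnumerable<ContentItem> children)
    {
        return children
            .Where(child => child.Alias == "Comment")
            .Select(child => new BlogComment
            {
                // Convert.ToString turns a null object into an empty string.
                Comment = Convert.ToString(child.GetValue("comment")),
                CommentDate = Convert.ToString(child.GetValue("commentDate"))
            })
            .ToList();
    }

    public static void Main()
    {
        var children = new List<ContentItem>
        {
            new ContentItem { Alias = "Comment", Values = { ["comment"] = "Nice post", ["commentDate"] = "2014-05-01" } },
            new ContentItem { Alias = "Image" }
        };

        foreach (var comment in ToComments(children))
        {
            Console.WriteLine($"{comment.CommentDate}: {comment.Comment}");
        }
    }
}
```

In the real export, the same Select projection would go in place of .Cast<BlogComment>() on the result of contentService.GetChildren(node.Id).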

System.ArgumentException: invalid JSON primitive:. but only when browsing from mobile browser

I have a simple, responsive test ASPX web page that downloads JSON data and tries to display the results.
If I browse from a desktop or laptop computer I do not receive any error and everything is OK: the JSON data is deserialized and shown (I can list the results about students). But if I open the page in a mobile browser I receive System.ArgumentException: invalid JSON primitive:.
I don't know what the reason can be, or where to look for the problem, since in one case it works as expected.
The JSON string is very simple and looks like this:
{
"students":[
{
"studentName":"TEST- MORE TEST",
"MoreInfo":"sdpThP5YUMsaFfwOM7tj",
"Counter":"404",
"Age":"20"
}
]
}
I tried replacing all dots in the strings and removing all special characters, without results. I also removed all records but one, and it still doesn't work. I can't figure out what is wrong.
The C# code that I use to deserialize:
using (var wc = new System.Net.WebClient())
{
//here i download my json data
string json = wc.DownloadString(download_url);
JavaScriptSerializer jss = new JavaScriptSerializer();
var myStudents = jss.Deserialize<studentList>(json);
foreach (student student in myStudents.students)
{
Response.Write(student.studentName + " " + student.Age + "<br />");
}
}
Any help appreciated, thanks.
Here is how the model is defined:
public class student
{
public string studentName { get; set; }
public string MoreInfo { get; set; }
public string Counter { get; set; }
public string Age { get; set; }
}
public class studentList
{
public List<student> students { get; set; }
}

Parsing Json facebook c#

I have been trying for many hours to parse a JSON array that I got from graph.facebook so that I can extract values. The values I want to extract are message and ID.
Getting the JSON array is no problem and works fine:
[
{
"code":200,
"headers":[{"name":"Access-Control-Allow-Origin","value":"*"}],
"body":"{
\"id\":\"255572697884115_1\",
\"from\":{
\"name\":\"xyzk\",
\"id\":\"59788447049\"},
\"message\":\"This is the first message\",
\"created_time\":\"2011-11-04T21:32:50+0000\"}"},
{
"code":200,
"headers":[{"name":"Access-Control-Allow-Origin","value":"*"}],
"body":"{
\"id\":\"255572697884115_2\",
\"from\":{
\"name\":\"xyzk\",
\"id\":\"59788447049\"},
\"message\":\"This is the second message\",
\"created_time\":\"2012-01-03T21:05:59+0000\"}"}
]
Now I have tried several methods to get access to message, but every method ends in catch... and throws an exception.
For example:
var serializer = new JavaScriptSerializer();
var result = serializer.Deserialize<dynamic>(json);
foreach (var item in result)
{
Console.WriteLine(item.body.message);
}
This throws the exception: System.Collections.Generic.Dictionary doesn't contain a definition for body. Nevertheless, you can see in the screenshot below that body does contain definitions.
Because I am not allowed to post pictures, you can find it on directupload: http://s7.directupload.net/images/120907/zh5xyy2k.png
I have run out of ideas, so I am asking for your help. I need this for a private, non-commercial project.
Maybe you could give me a snippet of code so I can continue my development.
Thank you so far
Dominic
If you use Json.Net, all you have to do is
replace
var serializer = new JavaScriptSerializer();
var result = serializer.Deserialize<dynamic>(json);
with
dynamic result = JsonConvert.DeserializeObject(json);
that's all.
You are not deserializing to a strongly typed object, so it's normal that the application throws an exception. In other words, the deserializer won't create an anonymous class for you.
Your string is actually deserialized to 2 objects, each containing Dictionary<string,object> elements. So what you need to do is this:
var serializer = new JavaScriptSerializer();
var result = serializer.Deserialize<dynamic>(json);
foreach(var item in result)
{
Console.WriteLine(item["body"]["message"]);
}
Here's a complete sample code:
void Main()
{
string json = @"[
{
""code"":200,
""headers"":[{""name"":""Access-Control-Allow-Origin"",""value"":""*""}],
""body"":{
""id"":""255572697884115_1"",
""from"":{
""name"":""xyzk"",
""id"":""59788447049""},
""message"":""This is the first message"",
""created_time"":""2011-11-04T21:32:50+0000""}},
{
""code"":200,
""headers"":[{""name"":""Access-Control-Allow-Origin"",""value"":""*""}],
""body"":{
""id"":""255572697884115_2"",
""from"":{
""name"":""xyzk"",
""id"":""59788447049""},
""message"":""This is the second message"",
""created_time"":""2012-01-03T21:05:59+0000""}}
]";
var serializer = new JavaScriptSerializer();
var result = serializer.Deserialize<dynamic>(json);
foreach(var item in result)
{
Console.WriteLine(item["body"]["message"]);
}
}
Prints:
This is the first message
This is the second message
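Note that in the JSON posted in the question, body is a string holding escaped JSON rather than a nested object as in the sample above, so it has to be parsed a second time before message can be read. Below is a minimal sketch of that two-step parse using Json.NET's JArray and JObject, assuming Newtonsoft.Json is referenced. The BatchReader and ReadMessages names are illustrative, and the input is a shortened version of the response from the question.

```csharp
using System;
using System.Collections.Generic;
using Newtonsoft.Json.Linq;

public static class BatchReader
{
    // Returns the "message" of every item in a batch response whose
    // "body" is a JSON document stored as a string.
    public static List<string> ReadMessages(string json)
    {
        var messages = new List<string>();
        foreach (JToken item in JArray.Parse(json))
        {
            // The first parse yields "body" as plain text...
            string bodyText = (string)item["body"];

            // ...so it is parsed a second time to reach its fields.
            JObject body = JObject.Parse(bodyText);
            messages.Add((string)body["message"]);
        }
        return messages;
    }

    public static void Main()
    {
        // Shortened version of the response in the question; "body" is an escaped string.
        string json = @"[
  { ""code"": 200, ""body"": ""{\""id\"":\""255572697884115_1\"",\""message\"":\""This is the first message\""}"" },
  { ""code"": 200, ""body"": ""{\""id\"":\""255572697884115_2\"",\""message\"":\""This is the second message\""}"" }
]";

        foreach (string message in ReadMessages(json))
        {
            Console.WriteLine(message);
        }
    }
}
```

The id field can be read the same way, with (string)body["id"].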
I am using this simple technique:
var responseTextFacebook =
@"{
""id"":""100000891948867"",
""name"":""Nishant Sharma"",
""first_name"":""Nishant"",
""last_name"":""Sharma"",
""link"":""https:\/\/www.facebook.com\/profile.php?id=100000891948867"",
""gender"":""male"",
""email"":""nihantanu2010\u0040gmail.com"",
""timezone"":5.5,
""locale"":""en_US"",
""verified"":true,
""updated_time"":""2013-06-10T07:56:39+0000""
}";
I have declared a class
public class RootObject
{
public string id { get; set; }
public string name { get; set; }
public string first_name { get; set; }
public string last_name { get; set; }
public string link { get; set; }
public string gender { get; set; }
public string email { get; set; }
public double timezone { get; set; }
public string locale { get; set; }
public bool verified { get; set; }
public string updated_time { get; set; }
}
Now I am deserializing
JavaScriptSerializer objJavaScriptSerializer = new JavaScriptSerializer();
RootObject parsedData = objJavaScriptSerializer.Deserialize<RootObject>(responseTextFacebook );

Extracting particular node values from a list of nodes using HtmlAgilityPack in C#

I am crawling a page www.thenextweb.com
I want to extract all the post links, article content, article image etc.
I have written this code...
string url = TextBox1.Text.ToString();
var webGet = new HtmlWeb();
var document = webGet.Load(url);
var infos = from info in document.DocumentNode.SelectNodes("//div[@class='article-listing']")
select new
{
Contr = info.InnerHtml
};
lvLinks.DataSource = infos;
lvLinks.DataBind();
This extracts all the required information from the page, and I have used this information on the home page with a ListView control in an ASP.NET page:
<li> <%# Eval("Contr") %> </li>
Now what I want is a way to extract the information from the individual nodes, since all the nodes in infos contain the link URL, post image, text, etc.
I want to store them as URL[0], PostContent[0], PostImage[0], Date[0] and URL[1], PostContent[1], etc., with each array holding the respective values, one entry per post.
It's like extracting information one by one from the inner nodes in infos.
Can you please suggest a way?
Why not create a class that parses the HTML and exposes those nodes as properties?
class ArticleInfo
{
public ArticleInfo (string html) { ... }
public string URL { get; set; }
public string PostContent { get; set; }
public string PostImage { get; set; }
public DateTime PostDate { get; set; }
}
You could then do something like this:
var infos = from info in document.DocumentNode.SelectNodes("//div[@class='article-listing']")
select new ArticleInfo(info.InnerHtml);
Then if you have an array of these, `infoArray = infos.ToArray()`, you can do:
infoArray[0].URL
infoArray[0].PostDate
infoArray[1].PostContent
etc...
Update
Something like this:
class ArticleInfo
{
private string html;
public ArticleInfo (string html)
{
this.html = html;
URL = //code to extract and assign Url from html
PostContent = //code to extract content from html
PostImage = //code to extract Image from html
PostDate = //code to extract date from html
}
public string URL { get; private set; }
public string PostContent { get; private set; }
public string PostImage { get; private set; }
public DateTime PostDate { get; private set; }
public string Contr { get { return html; } }
}
or maybe this:
class ArticleInfo
{
private string html;
public ArticleInfo (string html)
{
this.html = html;
}
public string URL { get { return /*code to extract and return Url from html*/; } }
public string PostContent { get { return /*code to extract and return Content from html*/; } }
public string PostImage { get { return /*code to extract and return Image from html*/; } }
public DateTime PostDate { get { return /*code to extract and return Date from html*/; } }
public string Contr { get { return html; } }
}
Your LINQ query then returns a sequence of ArticleInfo rather than the anonymous types. This way you don't have to maintain separate arrays for each element of the post. Each item in the array (or sequence) has properties to give you the associated element from that item. Of course, this might not fit what you're trying to achieve. I just thought it might be a bit cleaner.
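To make the placeholder comments concrete, here is one way the constructor could fill in the properties with HtmlAgilityPack. This is only a sketch: the markup it expects (the first a element carrying the link, the first img the image, the first p the content) is an assumption for illustration and will need adjusting to the real page. The date is left out because its format on the page is unknown.

```csharp
using System;
using HtmlAgilityPack;

public class ArticleInfo
{
    public ArticleInfo(string html)
    {
        Contr = html;

        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        var root = doc.DocumentNode;

        // SelectSingleNode returns null when nothing matches, hence the ?. operators.
        // The XPath expressions below assume a simple, made-up article layout.
        URL = root.SelectSingleNode(".//a[@href]")?.GetAttributeValue("href", "");
        PostImage = root.SelectSingleNode(".//img[@src]")?.GetAttributeValue("src", "");
        PostContent = root.SelectSingleNode(".//p")?.InnerText.Trim();
    }

    public string URL { get; private set; }
    public string PostContent { get; private set; }
    public string PostImage { get; private set; }
    public string Contr { get; private set; }
}

public static class ArticleDemo
{
    public static void Main()
    {
        // Made-up fragment standing in for the InnerHtml of one article-listing div.
        const string html =
            "<a href='https://example.com/post-1'><img src='https://example.com/1.png'></a>" +
            "<p> First post summary </p>";

        var info = new ArticleInfo(html);
        Console.WriteLine(info.URL);
        Console.WriteLine(info.PostImage);
        Console.WriteLine(info.PostContent);
    }
}
```

Properties for which no matching node is found are left as null, so callers should check for that before using them.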
