Group by the parsed value HTML AgilityPack C# - c#

I have parsed an HTML file in C# and extracted all the data from it. Now I want to group the data as follows:
The selected lines are the parents, and each contains the children that follow it. This is the code I am working on:
var uricontent = File.ReadAllText("TestHtml/Bew.html");
var doc = new HtmlDocument(); // with HTML Agility pack
doc.LoadHtml(uricontent);
var rooms = doc.DocumentNode.SelectNodes("//table[@class='rates']").SelectMany(
detail =>
{
return doc.DocumentNode.SelectNodes("//td[@class='rate-description'] | //table[@class='rooms']//h2 | //table[@class='rooms']//td[@class='room-price room-price-total']").Select(
r => new
{
RoomType = r.InnerText.CleanInnerText(),
});
}).ToArray();
RoomType contains the data parsed by HTML Agility Pack. How can I group the entries by rate name, such as Pay & Save, Best Available Room Only, ...?
The HTML file is here: http://notepad.cc/share/g0zh0TcyaG
Thank you.

Instead of taking the union of three XPath queries and then trying to group the results back by "Rate Description" (that is, by the element <td class="rate-description">), you can approach it the other way around.
Base your LINQ selection on "Rate Description", then in the projection get all room types and room rates under the current "Rate Description" using relative XPath:
var rooms =
    doc.DocumentNode
       .SelectNodes("//table[@class='rates']//tr[@class='rate']")
       .Select(r => new
       {
           RateType = r.SelectSingleNode("./td[@class='rate-description']")
                       .InnerText.CleanInnerText(),
           RoomTypes = r.SelectNodes("./following-sibling::tr[@class='rooms'][1]//table[@class='rooms']//h2")
                        .Select(s => new
                        {
                            RoomType = s.InnerText.CleanInnerText(),
                            Rate = s.SelectSingleNode(".//parent::td/following-sibling::td[@class='room-price room-price-total'][1]")
                                    .InnerText.CleanInnerText()
                        }).ToArray()
       }).ToArray();
Notice the period at the beginning of some of the XPath queries above. It tells HtmlAgilityPack that the query is relative to the current HtmlNode. The result is one entry per rate description, each holding its room types and rates.
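The CleanInnerText() extension method used in both snippets is the asker's own helper and its body is not shown. A minimal sketch of what such a helper might look like, assuming it only needs to decode HTML entities and collapse whitespace (the class name HtmlTextExtensions and this behaviour are assumptions, not the asker's code):

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlTextExtensions
{
    // Hypothetical stand-in for the asker's CleanInnerText helper:
    // decodes HTML entities, then collapses whitespace runs to one space.
    public static string CleanInnerText(this string text)
    {
        if (string.IsNullOrEmpty(text)) return string.Empty;
        string decoded = WebUtility.HtmlDecode(text);
        return Regex.Replace(decoded, @"\s+", " ").Trim();
    }
}

public static class CleanInnerTextDemo
{
    public static void Main()
    {
        // Prints: Pay & Save
        Console.WriteLine("  Pay &amp; Save \r\n  ".CleanInnerText());
    }
}
```

Because it is an extension method on string, it has to be called with parentheses, as in InnerText.CleanInnerText().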

Related

XML parsing C# using LINQ

How do you parse this kind of XML file with LINQ?
<houses>
<house nbr="146" city="Linköping" owner="john"/>
<house nbr="134" city="Norrköping" owner="wayne"/>
<house nbr="146" city="Köping" owner="steffe"/>
</houses>
All examples I can find only describe how to parse when each element has a value.
If this was the case I would have done it like this:
var houses = from house in xmlDoc.Descendants("house")
             select new RowData
             {
                 number = house.Element("nbr").Value,
                 city = house.Element("city").Value,
                 owner = house.Element("owner").Value,
             };
return houses;
But this xml file is not formatted that way.
Try this:
var houses = from house in document.Descendants("house")
select new RowData
{
number = (int)house.Attribute("nbr"),
city = (string)house.Attribute("city"),
owner = (string)house.Attribute("owner")
};
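For reference, here is a self-contained sketch of the attribute-based query, assuming a RowData class with an int number and two string properties (the property names and the HouseParser class are illustrative; the question does not show RowData). The explicit (int) cast throws if the attribute is missing, while the (string) cast simply yields null.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Assumed shape of RowData; the original question does not show it.
public class RowData
{
    public int Number { get; set; }
    public string City { get; set; }
    public string Owner { get; set; }
}

public static class HouseParser
{
    public static RowData[] Parse(string xml)
    {
        XDocument document = XDocument.Parse(xml);
        return document.Descendants("house")
            .Select(house => new RowData
            {
                // XAttribute defines explicit conversions to int and string.
                Number = (int)house.Attribute("nbr"),
                City = (string)house.Attribute("city"),
                Owner = (string)house.Attribute("owner")
            })
            .ToArray();
    }

    public static void Main()
    {
        const string xml = @"<houses>
  <house nbr='146' city='Linköping' owner='john'/>
  <house nbr='134' city='Norrköping' owner='wayne'/>
  <house nbr='146' city='Köping' owner='steffe'/>
</houses>";

        foreach (RowData row in Parse(xml))
        {
            Console.WriteLine(row.Number + " " + row.City + " " + row.Owner);
        }
    }
}
```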

Linq group by to create nested listview

I need to create a nested ListView and found a great article on how to do it, but my situation is a bit different. I am a LINQ newbie and need a little help, please :)
I need to get my data into a format similar to the one in that article (at the link above, search for "Configuring the ListView" and see the table right above it).
Here is my data:
Format     Movie Name     Price
DVD        Star Wars      12.99
DVD        Star Wars II   13.99
Blue-Ray   Star Wars      15.99
Blue-Ray   Star Wars II   17.99
Here is what I have, which isn't really that close, but it is as far as I could get:
var MoviesToBuy = from Movie in dtMovieListDetails.AsEnumerable()
//join MovieDetails in dtMovieListDetails.AsEnumerable() on (string)Movie["ID"] equals (string)MovieDetails["ID"]
group Movie by new
{
Format = Movie["Format"],
Movies = Movie
} into grp
select new
{
Format = (string)grp.Key.Format,
Movies = grp.Key.Movies
};
MoviesToBuy = MoviesToBuy.OrderBy(p => p.Format);
lvwAmazonMovieGroup.DataSource = MoviesToBuy.ToList();
lvwAmazonMovieGroup.DataBind();
I have 3 specific issues/questions:
1.) What I have doesn't work. Since the second column in my group key is the row itself, every row gets its own key and no actual grouping happens.
2.) Apart from that, I am also getting a "Data source is an invalid type. It must be either an IListSource, IEnumerable, or IDataSource" error. In this case, the Movies column is being created as a DataRow. I am not sure whether that is what causes the problem. Can I cast that field somehow?
3.) How can I sort on the fields of the movies? In the end I want the data sorted by Format and then by Movie Name, so that the nested ListView looks like this:
Blue-Ray
    Star Wars 15.99
    Star Wars II 17.99
DVD
    Star Wars 12.99
    Star Wars II 13.99
Any pointers are greatly appreciated!
Thanks in advance,
Chad
I was thinking you could start with the following, adjusting for the proper variable names and AsEnumerable(), etc.
It orders your movies as you want and puts them in a nested structure as you requested:
var moviesToBuy = from movie in dtMovieListDetails
                  orderby movie.Format, movie.Price
                  group movie by movie.Format into grp
                  select new
                  {
                      Format = grp.Key,
                      Movies = grp.Select(g => new { MovieName = g.MovieName, Price = g.Price })
                  };
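The "adjusting for AsEnumerable()" step can be sketched against an actual DataTable as follows. This is only a sketch under assumptions: the column names Format, MovieName and Price, and the small FormatGroup and MovieItem classes, are made up for illustration. Projecting into plain classes and calling ToList() gives the outer and inner ListView ordinary lists with bindable properties instead of DataRow values.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Illustrative classes; not part of the original question.
public class MovieItem
{
    public string MovieName { get; set; }
    public decimal Price { get; set; }
}

public class FormatGroup
{
    public string Format { get; set; }
    public List<MovieItem> Movies { get; set; }
}

public static class MovieGrouping
{
    // Groups rows by Format, sorts groups by Format and movies by name.
    public static List<FormatGroup> GroupByFormat(DataTable table)
    {
        return table.AsEnumerable()
            .GroupBy(row => row.Field<string>("Format"))
            .OrderBy(g => g.Key)
            .Select(g => new FormatGroup
            {
                Format = g.Key,
                Movies = g.OrderBy(row => row.Field<string>("MovieName"))
                          .Select(row => new MovieItem
                          {
                              MovieName = row.Field<string>("MovieName"),
                              Price = row.Field<decimal>("Price")
                          })
                          .ToList()
            })
            .ToList();
    }

    // Sample data taken from the question; column names are assumed.
    public static DataTable SampleTable()
    {
        var table = new DataTable();
        table.Columns.Add("Format", typeof(string));
        table.Columns.Add("MovieName", typeof(string));
        table.Columns.Add("Price", typeof(decimal));
        table.Rows.Add("DVD", "Star Wars", 12.99m);
        table.Rows.Add("DVD", "Star Wars II", 13.99m);
        table.Rows.Add("Blue-Ray", "Star Wars", 15.99m);
        table.Rows.Add("Blue-Ray", "Star Wars II", 17.99m);
        return table;
    }

    public static void Main()
    {
        foreach (FormatGroup group in GroupByFormat(SampleTable()))
        {
            Console.WriteLine(group.Format);
            foreach (MovieItem movie in group.Movies)
            {
                Console.WriteLine("\t" + movie.MovieName + " " + movie.Price);
            }
        }
    }
}
```

The result of GroupByFormat could then be assigned to the outer ListView's DataSource, with each group's Movies list bound to the inner one.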
Try something like this:
var res = from m in movies
group m by m.Format into grouped
orderby grouped.Key
select new
{
Format = grouped.Key,
Movies = grouped.AsEnumerable().OrderBy(x => x.MovieName)
};
Alternatively
var res = from m in movies
orderby m.MovieName
group m by m.Format into grouped
orderby grouped.Key
select new
{
Format = grouped.Key,
Movies = grouped.AsEnumerable()
};
Using this seed data:
var movies = new[] {
new Movie { Format = "DVD", MovieName = "SW1"},
new Movie { Format = "Blue-ray", MovieName = "SW1"},
new Movie { Format = "DVD", MovieName = "SW2"},
new Movie { Format = "Blue-ray", MovieName = "SW2"},
new Movie { Format = "DVD", MovieName = "RF"}
};
Produced:
Format: Blue-ray
Movie: SW1
Movie: SW2
Format: DVD
Movie: RF
Movie: SW1
Movie: SW2
Just for completeness, I used this code to generate the previous list:
foreach (var item in res)
{
Console.WriteLine("Format: " + item.Format);
foreach (var item2 in item.Movies)
{
Console.WriteLine("\tMovie: " + item2.MovieName);
}
}

All nodes in XML using Linq C# [duplicate]

I am trying to read an XML file using LINQ in a C# Windows application. A sample of the XML string is given below.
<Root>
<Name>John Doe</Name>
<Data>FBCCF14D504B7B2DBCB5A5BDA75BD93B</Data>
<customer>true</customer>
<Accounts>1</Accounts>
<dataSet>
<Type1>Found matching records.</Type1>
<Type2>No matches found.</Type2>
<Type3>Found matching records.</Type3>
</dataSet>
</Root>
I want to display all the data inside the <dataSet> tag, and I want to read the <customer> tag as well.
I have created a class with members (string type, string status). In type I want to store the element name (Type1, Type2, ...), and in status I want to store what is inside that Type node.
I am able to accomplish this, but in the code I have to write
type1 = (string)row.Element("Type1"),
type2 = (string)row.Element("Type2"),
I want generic code in which I don't have to mention every type. In other words, I want to read all the child nodes of a tag without mentioning the tag names. I have spent two hours searching for this on Google but haven't found anything yet.
Expected output:
Save the information in a class object (type and status).
I also want to read the customer tag so that I know whether the person is already a customer.
Any help will be very much appreciated.
Thanks
Update
According to inputs received from Raphaël Althaus
I have the following code:
var list = xml.Descendants("dataSet").Elements()
.Select(m => new CustomerInfo
{
Type = m.Name.LocalName,
Value = m.Value
}).ToList();
foreach (CustomerInfo item in list)
{
MessageBox.Show(item.Type+ " "+item.Value);
}
and for reading the customer tag I have written more code:
var isCustomer = from customer in xmlDoc.Descendants("Root")
                 select new
                 {
                     customer = customer.Element("customer").Value,
                 };
Can I do both in one query? Or is this approach light enough on performance that I can keep using it?
Something like this?
var q = xml.Descendants("dataSet").Elements()
.Select(m => new
{
type = m.Name.LocalName,
value = m.Value
}).ToList();
You can also directly populate a list of your "class with members"
var list = xml.Descendants("dataSet").Elements()
.Select(m => new <TheNameOfYourClass>
{
Type = m.Name.LocalName,
Value = m.Value
}).ToList();
EDIT :
to get the "customer" value, I would do another query
var customerElement = xml.Element("customer");
var isCustomer = customerElement != null && customerElement.Value == "true";
So you could combine all of that in a small function:
public IList<YourClass> ParseCustomers(string xmlPath, out bool isCustomer)
{
    var xml = XElement.Load(xmlPath);
    var customerElement = xml.Element("customer");
    isCustomer = customerElement != null && customerElement.Value == "true";
    return xml.Descendants("dataSet").Elements()
        .Select(m => new YourClass
        {
            Type = m.Name.LocalName,
            Value = m.Value
        }).ToList();
}
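A runnable variant of the same idea, as a sketch: it parses an XML string instead of a file path, uses the CustomerInfo class name from the question's update, and reads the flag through the (bool?) conversion that XElement provides rather than comparing strings. The CustomerXml class and the choice to return an empty list when <dataSet> is missing are assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public class CustomerInfo
{
    public string Type { get; set; }
    public string Value { get; set; }
}

public static class CustomerXml
{
    // Reads the <customer> flag and every child of <dataSet> from one load.
    public static List<CustomerInfo> Parse(string xmlText, out bool isCustomer)
    {
        XElement root = XElement.Parse(xmlText);

        // A missing <customer> element converts to null, treated as false here.
        isCustomer = (bool?)root.Element("customer") ?? false;

        XElement dataSet = root.Element("dataSet");
        if (dataSet == null) return new List<CustomerInfo>();

        return dataSet.Elements()
            .Select(m => new CustomerInfo
            {
                Type = m.Name.LocalName,  // element name, e.g. Type1
                Value = m.Value           // element text
            })
            .ToList();
    }

    public static void Main()
    {
        const string xml = @"<Root>
  <Name>John Doe</Name>
  <customer>true</customer>
  <dataSet>
    <Type1>Found matching records.</Type1>
    <Type2>No matches found.</Type2>
  </dataSet>
</Root>";

        bool isCustomer;
        List<CustomerInfo> items = Parse(xml, out isCustomer);
        Console.WriteLine("Customer: " + isCustomer);
        foreach (CustomerInfo item in items)
        {
            Console.WriteLine(item.Type + ": " + item.Value);
        }
    }
}
```

Both values come from the same in-memory tree, so running two small queries against it costs very little compared with loading the document.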

How to Search Within a XMl file based on Attributes values...

I am using the XML file from this link:
http://www.goalserve.com/samples/soccer_livescore.xml
Let's say "category" is our "Tournament". I need to search and show:
1. A listing of all the "Tournaments" in a GridView or DataList.
2. A listing of the matches within a selected "Tournament".
3. A listing of the events within the matches, etc.
Please guide me on how to achieve this. I am using DataSet.ReadXml, but the inner linking of fields becomes very complex.
Thanks and regards,
The simplest way to do this is with LINQ to XML. Something like this:
var doc = XDocument.Load(url);
var tournaments = doc.Root
    .Elements("category")
    .Where(x => (string) x.Attribute("name") == "Tournament")
    .Single(); // Is there only one matching category?
var matches = tournaments
    .Elements("match")
    .Select(m => new
    {
        LocalTeam = (string) m.Element("localteam").Attribute("name"),
        VisitorTeam = (string) m.Element("visitorteam").Attribute("name"),
        Events = m.Elements("Events")
            .Select(e => new
            {
                Player = (string) e.Attribute("player"),
                Type = (string) e.Attribute("type"),
                // etc
            })
            .ToList()
    });
How you display that is then up to you. You may want to create your own "normal" types for Event, Match etc rather than using the anonymous types above.
LINQ to XML is by far the simplest way of working with XML that I've used.
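As a rough sketch of the "normal types" suggestion, the following maps matches and events onto small classes. The inline XML is invented for illustration, and its element and attribute names (category, match, localteam, visitorteam, events, event) are assumptions about the feed rather than a verified schema; the class names are hypothetical too.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public class MatchEvent
{
    public string Player { get; set; }
    public string Type { get; set; }
}

public class SoccerMatch
{
    public string LocalTeam { get; set; }
    public string VisitorTeam { get; set; }
    public List<MatchEvent> Events { get; set; }
}

public static class LiveScoreParser
{
    // Returns the matches of every category whose name attribute matches.
    public static List<SoccerMatch> MatchesIn(XDocument doc, string categoryName)
    {
        return doc.Root.Elements("category")
            .Where(c => (string)c.Attribute("name") == categoryName)
            .SelectMany(c => c.Elements("match"))
            .Select(m => new SoccerMatch
            {
                // ?. keeps a missing team element from throwing; the cast then yields null.
                LocalTeam = (string)m.Element("localteam")?.Attribute("name"),
                VisitorTeam = (string)m.Element("visitorteam")?.Attribute("name"),
                Events = m.Descendants("event")
                    .Select(e => new MatchEvent
                    {
                        Player = (string)e.Attribute("player"),
                        Type = (string)e.Attribute("type")
                    })
                    .ToList()
            })
            .ToList();
    }

    public static void Main()
    {
        // Invented sample; the real feed may be structured differently.
        XDocument doc = XDocument.Parse(@"<scores>
  <category name='Cup A'>
    <match>
      <localteam name='Reds'/>
      <visitorteam name='Blues'/>
      <events><event player='Smith' type='goal'/></events>
    </match>
  </category>
</scores>");

        foreach (SoccerMatch match in MatchesIn(doc, "Cup A"))
        {
            Console.WriteLine(match.LocalTeam + " vs " + match.VisitorTeam
                + " (" + match.Events.Count + " events)");
        }
    }
}
```

Lists of such classes can be bound directly to a GridView or DataList, and the list of tournaments itself would be doc.Root.Elements("category") projected onto its name attribute.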

Sort by most recent date and cluster (group) similar titles

I am looking for the LINQ needed to sort on a date field while also keeping similar titles grouped and sorted. Consider something like the following desired ordering:
Title              Date        Note
"Some Title 1/3"   2009/1/3    note1: even though this date is old, title 3/3 causes this group to be 1st
"Some Title 2/3"   2011/1/31   note2: dates may not be in sequence with titles
"Some Title 3/3"   2011/1/1    note3: this date is the most recent between "groups" of titles
"Title XYZ 1of2"   2010/2/1
"Title XYz 2of2"   2010/2/21
I've shown titles varying by some suffix. What if a poster used something like the following for titles?
"1 LINQ Tutorial"
"2 LINQ Tutorial"
"3 LINQ Tutorial"
How would the query recognize these are similar titles?
You don't have to solve everything, a solution for the 1st example is much appreciated.
Thank you.
Addendum #1 20110605
@svick Title authors also typically do not think to use, say, two digits when their numbering scheme goes beyond 9, for example 01, 02, ..., 10, 11, etc.
Typical patterns I've seen tend to be a prefix, a suffix, or even something buried in the title, such as
1/10 1-10 ...
(1/10) (2/10) ...
1 of 10 2 of 10
Part 1 Part 2 ...
You pointed out a valid pattern as well:
xxxx Tutorial : first session, xxxx Tutorial : second session, ....
If I have a Levenshtein function StringDistance(s1, s2), how would I fit it into the LINQ query? :)
Normal grouping in LINQ (and in SQL, but that's not relevant here) works by selecting some key for every element in the collection. You don't have such key, so I wouldn't use LINQ, but two nested foreaches:
var groups = new List<List<Book>>();
foreach (var book in books)
{
bool found = false;
foreach (var g in groups)
{
if (sameGroup(book.Title, g[0].Title))
{
found = true;
g.Add(book);
break;
}
}
if (!found)
groups.Add(new List<Book> { book });
}
var result = groups.Select(g => g.OrderBy(b => b.Date).ToArray()).ToArray();
This gradually creates a list of groups. Each book is compared with the first one in each group. If it matches, it is added to the group. If no group matched, the book creates a new group. In the end, we sort the results using LINQ with dot notation.
It would be more correct to compare a book with every book in a group, not just the first. But you may not get completely correct results anyway, so I think this optimization is worth it.
This has time complexity O(N²), so it's probably not the best solution if you had millions of books.
EDIT: To sort the groups so that the one with the most recent date comes first, use something like
groups.OrderByDescending(g => g.Max(b => b.Date))
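The sameGroup predicate is left undefined above. One possible sketch, assuming the numbering is the only part of a title that varies: strip the digits, lower-case the rest, and accept titles whose Levenshtein distance is within a small threshold. The StringDistance name comes from the question; the normalization step, the default threshold of 2 and the TitleSimilarity class are assumptions that would need tuning on real data.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TitleSimilarity
{
    // Levenshtein distance, dynamic programming with two rolling rows.
    public static int StringDistance(string s1, string s2)
    {
        var previous = new int[s2.Length + 1];
        var current = new int[s2.Length + 1];
        for (int j = 0; j <= s2.Length; j++) previous[j] = j;

        for (int i = 1; i <= s1.Length; i++)
        {
            current[0] = i;
            for (int j = 1; j <= s2.Length; j++)
            {
                int cost = s1[i - 1] == s2[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(current[j - 1] + 1, previous[j] + 1),
                    previous[j - 1] + cost);
            }
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        return previous[s2.Length];
    }

    // Assumed rule: titles match if they are close once digits are removed.
    public static bool SameGroup(string a, string b, int maxDistance = 2)
    {
        return StringDistance(Normalize(a), Normalize(b)) <= maxDistance;
    }

    private static string Normalize(string title)
    {
        return Regex.Replace(title.ToLowerInvariant(), @"\d+", "").Trim();
    }

    public static void Main()
    {
        Console.WriteLine(SameGroup("Some Title 1/3", "Some Title 3/3"));   // True
        Console.WriteLine(SameGroup("1 LINQ Tutorial", "2 LINQ Tutorial")); // True
        Console.WriteLine(SameGroup("Some Title 1/3", "Title XYZ 1of2"));   // False
    }
}
```

In the nested loops above, the call would become TitleSimilarity.SameGroup(book.Title, g[0].Title). Spelled-out numbering such as "first session" and "second session" would not be caught by stripping digits alone.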
To order by date, use the orderby clause (or the OrderBy operator).
Example:
// Assuming your table is called Table in the data context ctx
var data = from t in ctx.Table
           orderby t.Date
           select t;
To group strings by similarity, consider something like the Hamming distance or the Metaphone algorithm (although I do not know of any direct implementations of these in .NET).
EDIT: As suggested in the comment by svick, the Levenshtein distance may also be considered as a better alternative to the Hamming distance.
Assuming that your Title and Date fields are contained in a class called Model, consider the following class definition:
public class Model
{
    public DateTime Date { get; set; }
    public string Title { get; set; }
    public string Prefix
    {
        get
        {
            return Title.Substring(0, Title.LastIndexOf(' '));
        }
    }
}
Alongside the Date and Title properties, I have created a Prefix property with no setter that returns the common prefix using Substring. You can use any method of your choice in the getter of this property. The rest of the job is simple. Consider this LINQPad program:
void Main()
{
var model = new List<Model>{new Model{Date = new DateTime(2011,1,3), Title = "Some Title 1/3"},
new Model{Date = new DateTime(2011,1,1), Title = "Some Title 2/3"},
new Model{Date = new DateTime(2011,1,1), Title = "Some Title 3/3"},
new Model{Date = new DateTime(2011,1,31), Title = "Title XYZ 1of2"},
new Model{Date = new DateTime(2011,1,31), Title = "Title XYZ 2of2"}};
var result = model.OrderBy(x => x.Date).GroupBy(x => x.Prefix);
Console.WriteLine(result);
}
Edits >>>
Prefix aside, the query itself does not return what I was after, which is: 1) sort the groups by their most recent date, and 2) sort by title within clusters. Try the following:
var model = new List<Model>{
new Model{Date = new DateTime(2009,1,3), Title = "BTitle 1/3"},
new Model{Date = new DateTime(2011,1,31), Title = "BTitle 2/3"},
new Model{Date = new DateTime(2011,1,1), Title = "BTitle 3/3"},
new Model{Date = new DateTime(2011,1,31), Title = "ATitle XYZ 2of2"},
new Model{Date = new DateTime(2011,1,31), Title = "ATitle XYZ 1of2"}
};
var result = model.OrderBy(x => x.Date).GroupBy(x => x.Prefix);
Console.WriteLine(result);
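One way to get both orderings, sketched under the assumption that the Prefix property is an acceptable grouping key: group first, order the groups by their latest date in descending order, then order each group's items by title. The TitleClusters class name and the fallback for titles without a space are additions made for this example; the sample rows are the ones from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Model
{
    public DateTime Date { get; set; }
    public string Title { get; set; }

    // Everything before the last space; the whole title if there is none.
    public string Prefix
    {
        get
        {
            int lastSpace = Title.LastIndexOf(' ');
            return lastSpace < 0 ? Title : Title.Substring(0, lastSpace);
        }
    }
}

public static class TitleClusters
{
    public static List<List<Model>> Cluster(IEnumerable<Model> items)
    {
        return items
            .GroupBy(x => x.Prefix)
            // Groups whose newest item is most recent come first.
            .OrderByDescending(g => g.Max(x => x.Date))
            // Inside each group, sort by title.
            .Select(g => g.OrderBy(x => x.Title, StringComparer.Ordinal).ToList())
            .ToList();
    }

    public static void Main()
    {
        var model = new List<Model>
        {
            new Model { Date = new DateTime(2010, 2, 21), Title = "Title XYZ 2of2" },
            new Model { Date = new DateTime(2011, 1, 1),  Title = "Some Title 3/3" },
            new Model { Date = new DateTime(2009, 1, 3),  Title = "Some Title 1/3" },
            new Model { Date = new DateTime(2010, 2, 1),  Title = "Title XYZ 1of2" },
            new Model { Date = new DateTime(2011, 1, 31), Title = "Some Title 2/3" }
        };

        foreach (List<Model> group in Cluster(model))
        {
            foreach (Model item in group)
            {
                Console.WriteLine(item.Title + "  " + item.Date.ToString("yyyy-MM-dd"));
            }
        }
    }
}
```

With this input the "Some Title" items are listed first, because that group contains the 2011-01-31 entry, followed by the "Title XYZ" items, each group in title order. Groups that tie on their latest date keep the order in which they first appear.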
