Parse html table using LINQ and HtmlAgilityPack - c#

I want to parse date, link text and link href from table class='nice' on web page http://cslh.cz/delegace.html?id_season=2013
I have created object DelegationLink
public class DelegationLink
{
public string date { get; set; }
public string link { get; set; }
public string anchor { get; set; }
}
and used it with LINQ to create List of DelegationLink
var parsedValues =
from table in htmlDoc.DocumentNode.SelectNodes("//table[@class='nice']")
from date in table.SelectNodes("tr//td")
from link in table.SelectNodes("tr//td//a")
.Where(x => x.Attributes.Contains("href"))
select new DelegationLink
{
date = date.InnerText,
link = link.Attributes["href"].Value,
anchor = link.InnerText,
};
return parsedValues.ToList();
which takes the date column one by one and combines it with the link column of every row, but I just want to take every row in the table and get the date, the href and the link text from that row.
I am new to LINQ and I searched Google for four hours without success. Thanks for the help.

Well, that's rather easy, you just have to select the tr's in the SelectNodes function calls and adjust your code a bit.
Something like this.
var parsedValues = htmlDoc.DocumentNode.SelectNodes("//table[@class='nice']/tr").Skip(1)
    .Select(r =>
    {
        var linkNode = r.SelectSingleNode(".//a");
        return new DelegationLink()
        {
            date = r.SelectSingleNode(".//td").InnerText,
            link = linkNode.GetAttributeValue("href",""),
            anchor = linkNode.InnerText,
        };
    }
);
return parsedValues.ToList();
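A slightly more defensive variant of the row-by-row approach above could look like the sketch below. It assumes the HtmlAgilityPack NuGet package is referenced. The DelegationParser class, its Parse method and the sample markup are made up for illustration and are not taken from the real page; DelegationLink is repeated from the question so the block is self-contained. Rows without a td cell or without a link (for example a header row) are skipped instead of relying on Skip(1).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public class DelegationLink
{
    public string date { get; set; }
    public string link { get; set; }
    public string anchor { get; set; }
}

public static class DelegationParser
{
    // Hypothetical helper: one DelegationLink per table row that has a cell and a link.
    public static List<DelegationLink> Parse(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var rows = doc.DocumentNode.SelectNodes("//table[@class='nice']//tr");
        if (rows == null)
            return new List<DelegationLink>();

        return rows
            .Select(r => new
            {
                Cell = r.SelectSingleNode("./td"),
                Link = r.SelectSingleNode(".//a[@href]")
            })
            .Where(x => x.Cell != null && x.Link != null)
            .Select(x => new DelegationLink
            {
                date = HtmlEntity.DeEntitize(x.Cell.InnerText).Trim(),
                link = x.Link.GetAttributeValue("href", ""),
                anchor = HtmlEntity.DeEntitize(x.Link.InnerText).Trim()
            })
            .ToList();
    }

    public static void Main()
    {
        // Made-up sample markup, only here to exercise the parser.
        const string html = @"<table class='nice'>
<tr><th>Date</th><th>Match</th></tr>
<tr><td>01.09.2013</td><td><a href='/d/1.pdf'>Round 1</a></td></tr>
</table>";

        foreach (var d in Parse(html))
            Console.WriteLine(d.date + " | " + d.link + " | " + d.anchor);
    }
}
```

Taking the first td of each row as the date mirrors the answer above; if the date sits in a different column on the real page, the "./td" expression would need an index such as "./td[2]".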

Related

how to take specific column from text file(.txt) with delimiters with C#

I have example data like the following. The data is in a text file (.txt); sorry, that is the file type I was given. If it were Excel or CSV it would probably be easier.
Edit: I am making a console app with C#.
FamilyID;name;gender;DOB;Place of birth;status
1;nicky;male;01-01-1998;greenland;married
1;sonia;female;02-02-1995;greenland;married
2;dicky;male;04-01-1995;bali;single
3;redding;male;01-05-1996;USA;single
3;sisca;female;05-03-1994;australia;married
I want to take specific columns from that data; for example, FamilyID, name and status.
I have already tried some code that reads the data, takes all of it and writes it to a new text file.
The goal is to create a new text file per FamilyID that contains only specific columns.
The problem is that I can't pick the specific columns I want from the text file (I don't know how to select several columns in the code I wrote).
DateTime date = DateTime.Now;
string tgl = date.Date.ToString("dd");
string bln = date.Month.ToString("d2");
string thn = date.Year.ToString();
string tglskrg = thn + "/" + bln + "/" + tgl;
string filename = ("C:\\Users\\Documents\\My Received Files\\exampledata.txt");
string[] liness = File.ReadAllLines(filename);
string[] col;
var lines = File.ReadAllLines(filename);
var groups = lines.Skip(1)
.Select(x => x.Split(';'))
.GroupBy(x => x[0]).ToArray();
foreach (var group in groups)
{
Console.WriteLine(group);
File.WriteAllLines(@"C:\Users\Documents\My Received Files\exampledata_"+group.Key+".txt", group.Select(x => string.Join(";", x)));
}
Maybe someone can help? Thank you.
One way to approach this would be to capture the details in a data structure and later write the required details to a file. For example,
public class Detail
{
public int FamilyID{get;set;}
public string Name{get;set;}
public string Gender{get;set;}
public DateTime DOB{get;set;}
public string PlaceOfBirth{get;set;}
public string Status{get;set;}
}
Now you can write a method that parses the string based on the delimiter and returns an IEnumerable<Detail>.
public IEnumerable<Detail> Parse(string source,char delimiter)
{
return source.Split(new []{Environment.NewLine},StringSplitOptions.RemoveEmptyEntries)
.Skip(1)
.Select(x=>
{
var detail = x.Split(new []{delimiter});
return new Detail
{
FamilyID = Int32.Parse(detail[0]),
Name = detail[1],
Gender = detail[2],
DOB = DateTime.Parse(detail[3]),
PlaceOfBirth = detail[4],
Status = detail[5]
};
}
);
}
Client Call
Parse(stringFromFile,';');
Output
Now you can pick and write the details you want to write to output file from the collection.
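The last step, picking columns and writing one file per family, might be sketched as follows. To stay short this works directly on the split arrays rather than on the Detail class; the FamilyExport class, the SelectColumns helper and the output file names in the current directory are assumptions made for the example. Columns 0, 1 and 5 correspond to FamilyID, name and status in the sample data.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class FamilyExport
{
    // Hypothetical helper: keeps only FamilyID, name and status (columns 0, 1 and 5)
    // and groups the resulting lines by FamilyID.
    public static Dictionary<string, List<string>> SelectColumns(IEnumerable<string> lines)
    {
        return lines
            .Skip(1)                                     // skip the header line
            .Where(l => !string.IsNullOrWhiteSpace(l))   // ignore blank lines
            .Select(l => l.Split(';'))
            .GroupBy(c => c[0])
            .ToDictionary(
                g => g.Key,
                g => g.Select(c => string.Join(";", c[0], c[1], c[5])).ToList());
    }

    public static void Main()
    {
        var lines = new[]
        {
            "FamilyID;name;gender;DOB;Place of birth;status",
            "1;nicky;male;01-01-1998;greenland;married",
            "1;sonia;female;02-02-1995;greenland;married",
            "2;dicky;male;04-01-1995;bali;single"
        };

        foreach (var family in SelectColumns(lines))
        {
            // Assumed output location: exampledata_<FamilyID>.txt in the current directory.
            File.WriteAllLines("exampledata_" + family.Key + ".txt", family.Value);
            Console.WriteLine(family.Key + ": " + string.Join(" | ", family.Value));
        }
    }
}
```

With real input, File.ReadAllLines(filename) can be passed straight to SelectColumns.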
Try this.
var list = new List<String>();
list.Add("FamilyID;name;gender;DOB;Place of birth;status");
list.Add("1;nicky;male;01-01-1998;greenland;married");
list.Add("1;sonia;female;02-02-1995;greenland;married");
list.Add("2;dicky;male;04-01-1995;bali;single");
list.Add("3;redding;male;01-05-1996;USA;single");
list.Add("3;sisca;female;05-03-1994;australia;married");
var group = from item in list.Skip(1)
let splitItem = item.Split(';', StringSplitOptions.RemoveEmptyEntries)
select new
{
FamilyID = splitItem[0],
Name = splitItem[1],
Status = splitItem[5],
};
foreach(var item in group.ToList())
{
Console.WriteLine($"Family ID: {item.FamilyID}, Name: {item.Name}, Status: {item.Status}");
}

Copy text from a table on a webpage into string in C#

I need to make a list like this: List>
And I want to copy the contents of a table on this website.
More specifically, I want the word in language 1 in the first string and the word in language 2 in the second string, and then do that for every word in that table.
I want to be able to do that by just entering a URL, because I want to do this for more languages.
It's probably pretty easy, but I have never done something like this before, so sorry if this question is trivial.
Also, please excuse my English since it's not my first language.
Thanks in advance.
You can use AngleSharp
public static async Task Main(string[] args)
{
List<WordCls> wordList = new List<WordCls>();
IBrowsingContext context = BrowsingContext.New(Configuration.Default.WithDefaultLoader());
Url url = Url.Create("http://1000mostcommonwords.com/1000-most-common-afrikaans-words");
IDocument doc = await context.OpenAsync(url);
IElement tableElement = doc.QuerySelector("table");
var trs = tableElement.QuerySelectorAll("tr");
foreach (IElement tr in trs.Next(selector: null))
{
var tds = tr.QuerySelectorAll("td");
WordCls word = new WordCls
{
Number = Convert.ToInt32(tds[0].Text()),
African = tds[1].Text(),
English = tds[2].Text()
};
wordList.Add(word);
}
Console.WriteLine(wordList.Count);
}
public class WordCls
{
public int Number { get; set; }
public string African { get; set; }
public string English { get; set; }
}
You can check out Html Agility Pack for C#. It is a powerful tool for crawling web content as well. You can find more information at https://html-agility-pack.net/from-web

Match sections of a List, and Replace if both exist

I've got dates from separate countries within a single List<>. I'm trying to get two records that contain the same characters before the second comma, and replace BOTH of those items with a new one.
Example:
From This:
18/04/2014,Good Friday,England and Wales
18/04/2014,Good Friday,Scotland
Into this:
18/04/2014,Good Friday,"England, Wales and Scotland"
Please note there may be multiple scenarios within the list like the above example. I've managed to get everything before the second Comma with:
splitSubstring = line.Remove(line.LastIndexOf(','));
I've tried the below, but it's clearly flawed since it won't delete both the records even if it does find a match:
foreach (var line in orderedLines)
{
if (splitSubstring == line.Remove(line.LastIndexOf(',')))
{
//Replace if previous is match here
}
splitSubstring = line.Remove(line.LastIndexOf(','));
File.AppendAllText(correctFile, line);
}
I would suggest parsing it into a structure you can work with e.g.
public class HolidayInfo
{
public DateTime Date { get; set; }
public string Name { get; set; }
public string[] Countries { get; set; }
};
And then
string[] lines = new string[]
{
"18/04/2014,Good Friday,England and Wales",
"18/04/2014,Good Friday,Scotland"
};
// splits the lines into an array of strings
IEnumerable<string[]> parsed = lines.Select(l => l.Split(','));
// copy the parsed lines into a data structure you can write code against
IEnumerable<HolidayInfo> info = parsed
.Select(l => new HolidayInfo
{
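// dd/MM/yyyy is parsed explicitly so the result does not depend on the current culture
Date = DateTime.ParseExact(l[0], "dd/MM/yyyy", System.Globalization.CultureInfo.InvariantCulture),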
Name = l[1],
Countries = l[2].Split(new[] {",", " and " }, StringSplitOptions.RemoveEmptyEntries)
});
...etc. And once you have it in a helpful data structure you can begin to develop the required logic. The above code is just an example, the approach is what you should focus on.
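One possible shape for that follow-up logic is sketched below: lines are grouped by everything before the last comma, and the countries of each group are merged into a single field. The HolidayMerger class and its Merge and JoinCountries helpers are hypothetical names, and the sketch assumes the input lines contain no quoted commas yet, since it splits on the last comma just as the question does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HolidayMerger
{
    // Joins names as "A", "A and B" or "A, B and C".
    public static string JoinCountries(IList<string> names)
    {
        if (names.Count <= 1)
            return string.Join("", names);
        return string.Join(", ", names.Take(names.Count - 1)) + " and " + names[names.Count - 1];
    }

    // Hypothetical helper: merges lines that share the same date and holiday name.
    public static List<string> Merge(IEnumerable<string> lines)
    {
        return lines
            .Select(l => new
            {
                Key = l.Remove(l.LastIndexOf(',')),            // "date,name"
                Region = l.Substring(l.LastIndexOf(',') + 1)   // text after the last comma
            })
            .GroupBy(x => x.Key)
            .Select(g =>
            {
                var countries = g
                    .SelectMany(x => x.Region.Split(new[] { ",", " and " }, StringSplitOptions.RemoveEmptyEntries))
                    .Select(c => c.Trim())
                    .Distinct()
                    .ToList();
                var joined = JoinCountries(countries);
                // Quote the field only when it contains a comma, as in the desired output.
                return g.Key + "," + (joined.Contains(",") ? "\"" + joined + "\"" : joined);
            })
            .ToList();
    }

    public static void Main()
    {
        var lines = new[]
        {
            "18/04/2014,Good Friday,England and Wales",
            "18/04/2014,Good Friday,Scotland"
        };
        foreach (var line in Merge(lines))
            Console.WriteLine(line);
    }
}
```

Because GroupBy handles any number of matching lines, the same code covers the case where several such pairs occur in one list.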
I ended up using LINQ to pull apart the List and .Add() them into another based on an if statement. LINQ made it nice and simple.
//Using LINQ to separate the two locations from the list.
var seperateScotland = from s in toBeInsertedList
where s.HolidayLocation == scotlandName
select s;
var seperateEngland = from e in toBeInsertedList
where e.HolidayLocation == engAndWales
select e;
Thanks for pointing me to LINQ

Extract data into datatable to draw multiple graphs on one chart

I am working on a project that aims to extract and process data for statistical purposes.
Let's say I have many elements "E" and each element has a list of fields {F1, F2, F3, ...}.
My main table looks like the following:
I need to extract data by elementID into a data table with "Date" as Key.
[{"key": "date1",
"F1": "value",
"F2": "value"}
,{"key": "date2",
"F1": "value",
"F2": "value"}
,{"key": "date3",
"F1": "value",
"F2": "value"}
,{....}]
My current implementation does the following:
1) Query the database by field, ordered by date, into a Dictionary<DateTime, double>.
2) Check for and fill in missing values in each Dictionary.
3) Loop through each Dictionary by key and fill the DataTable row by row.
I don't think this is the best solution, and I have been trying to optimize the code, but I am not really sure which layer I should focus on. Is there any way to get the required structure directly from a database select, with no further loops?
It's not that clear; however, I assume that you want to group by the date part of the DateTime and select a dictionary where this date is the key and the value is a List<double>.
Then you don't need a DataTable at all. Instead of a double I would use custom classes with all properties. Assuming that mainTable is already a DataTable with the raw data:
public class ElementMeasurement
{
public Element Element { get; set; }
public double Value { get; set; }
public string Field { get; set; }
public DateTime Date { get; set; }
}
public class Element
{
public int ElementID { get; set; }
public string Name { get; set; }
}
Now you can use Enumerable.GroupBy which is part of System.Linq:
Dictionary<DateTime, List<ElementMeasurement>> dateMeasurements = mainTable.AsEnumerable()
    .GroupBy(row => row.Field<DateTime>("Date").Date)
    .ToDictionary(
        g => g.Key,
        g => g.Select(row => new ElementMeasurement
        {
            Element = new Element { ElementID = row.Field<int>("ElementId") },
            Field = row.Field<string>("Field"),
            Value = row.Field<double>("Value"),
            Date = row.Field<DateTime>("Date")
        }).ToList()
    );
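To get closer to the structure shown in the question (one entry per date holding the F1, F2, ... values), the grouping could be taken one step further, as in the sketch below. It builds a small in-memory DataTable so that it runs on its own; the column names "Date", "Field" and "Value", the MeasurementPivot class and the sample values are assumptions for illustration. On .NET Framework, AsEnumerable and Field<T> need a reference to System.Data.DataSetExtensions.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class MeasurementPivot
{
    // Hypothetical helper: date -> (field name -> value), sorted by date.
    public static SortedDictionary<DateTime, Dictionary<string, double>> Pivot(DataTable table)
    {
        var result = new SortedDictionary<DateTime, Dictionary<string, double>>();
        foreach (var day in table.AsEnumerable().GroupBy(r => r.Field<DateTime>("Date").Date))
        {
            // If a field occurs more than once on a day, the last row wins
            // (a choice made for this sketch, not something the question specifies).
            var fields = new Dictionary<string, double>();
            foreach (var row in day)
                fields[row.Field<string>("Field")] = row.Field<double>("Value");
            result[day.Key] = fields;
        }
        return result;
    }

    // Made-up sample data with the assumed column layout.
    public static DataTable BuildSample()
    {
        var t = new DataTable();
        t.Columns.Add("Date", typeof(DateTime));
        t.Columns.Add("Field", typeof(string));
        t.Columns.Add("Value", typeof(double));
        t.Rows.Add(new DateTime(2020, 1, 2, 9, 0, 0), "F1", 3.0);
        t.Rows.Add(new DateTime(2020, 1, 1, 8, 0, 0), "F1", 1.5);
        t.Rows.Add(new DateTime(2020, 1, 1, 8, 0, 0), "F2", 2.5);
        return t;
    }

    public static void Main()
    {
        foreach (var day in Pivot(BuildSample()))
        {
            var values = day.Value.Select(kv => kv.Key + "=" + kv.Value);
            Console.WriteLine(day.Key.ToString("yyyy-MM-dd") + ": " + string.Join(", ", values));
        }
    }
}
```

Missing fields for a date simply have no entry in the inner dictionary, so the "fill missing values" step from the question would become a lookup with a default.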
Edit: "Well I should let you know that it is a huge table with thousands of miles of rows!"
I didn't know that the table was not already in memory. In that case this might use too much memory, so a loop over a DataReader that yields rows ordered by Date may be more efficient. You could still use my classes above to keep the code readable and maintainable, but fill the Dictionary step by step.
For example (assuming SQL-Server):
var dateMeasurements = new Dictionary<DateTime, List<ElementMeasurement>>();
using (var con = new SqlConnection("ConnectionString"))
using (var cmd = new SqlCommand("SELECT e.* FROM Elements e ORDER BY Date,ElementId,Field,Value", con))
{
    con.Open();
    using (var rd = cmd.ExecuteReader())
        while (rd.Read())
        {
            DateTime timeOfMeasurement = rd.GetDateTime(rd.GetOrdinal("Date"));
            DateTime dateOfMeasurement = timeOfMeasurement.Date;
            List<ElementMeasurement> measurements = null;
            // start a new list the first time a date is seen
            if (!dateMeasurements.TryGetValue(dateOfMeasurement, out measurements))
            {
                measurements = new List<ElementMeasurement>();
                dateMeasurements.Add(dateOfMeasurement, measurements);
            }
            var measurement = new ElementMeasurement
            {
                Element = new Element { ElementID = rd.GetInt32(rd.GetOrdinal("ElementId")) },
                Field = rd.GetString(rd.GetOrdinal("Field")),
                Value = rd.GetDouble(rd.GetOrdinal("Value")),
                Date = timeOfMeasurement
            };
            measurements.Add(measurement);
        }
}

Get SQL LINQ Results Based off of String List

Let's start off with a list of strings that will be used to filter the results:
List<String> RadioNames = new List<String>();
RadioNames.AddRange(new String[] { "abc", "123", "cba", "321" });
I want to be able to filter a LINQ to SQL database table based on RadioNames but the catch is that I want RadioNames to be a partial match (meaning it will catch Radio123 and not just 123).
The source that I need to filter is below:
var ChannelGrants = from cg in sddc.ChannelGrants
select new
{
cg.ID,
cg.Timestamp,
cg.RadioID,
cg.Radio
};
So I need to perform something similar to below (outside of the original ChannelGrants results as this is a conditional search)
if(RadioNamesToSearch != null)
{
List<String> RadioNames = new List<String>();
// Here I split all the radio names from RadioNamesToSearch based on a comma separator and then populate RadioNames with the results
ChannelGrants = from cg in ChannelGrants
where ???
select cg;
}
I need help where ??? is in the code above (or if ChannelGrants = ... is invalid all together). Repeating above, I need to filter ChannelGrants to return any matches from RadioNames but it will do partial matches (meaning it will catch Radio123 and not just 123).
All the code is contained in a method as such...
public static DataTable getBrowseChannelGrants(int Count = 300, String StartDate = null, String StartTime = null, String EndDate = null, String EndTime = null, String RadioIDs = null, String RadioNamesToSearch = null, String TalkgroupIDs = null, String TalkgroupNames = null, bool SortAsc = false)
What field in ChannelGrants are you comparing RadioNames to?
To retrieve entries that are only in your RadioNames list, you'd use the contains method like this
ChannelGrants = from cg in ChannelGrants
where RadioNames.Contains(cg.Radio)
select cg;
(If you wanted to find all rows that had one of your RadioNames in the Radio property. Replace cg.Radio with the appropriate column you are matching)
This gives you a similar outcome if you had this where clause in SQL
where cg.Radio in ("abc", "123", "cba", "321")
According to the question "How to do SQL Like % in Linq?", it looks like you can combine it with LIKE-style matching as well by adding slashes, but it's not something I've done personally.
In place of the ???:
RadioNames.Where(rn=>cg.Radio.ToLower().Contains(rn.ToLower())).Count() > 0
That should do it...
The ToLower() calls are optional, of course.
EDIT: I just wrote this and it worked fine for me in a Console Application. The result contained one item and the WriteLine spit out "cbaKentucky". Not sure what to tell ya.
class Program
{
static void Main(string[] args)
{
List<String> RadioNames = new List<String>();
RadioNames.AddRange(new String[] { "abc", "123", "cba", "321" });
List<ChannelGrants> grants = new List<ChannelGrants>();
grants.Add(new ChannelGrants() { ID = 1, Radio = "cbaKentucky", RadioID = 1, TimeStamp = DateTime.Now });
var result = from cg in grants
where RadioNames.Where(rn=>cg.Radio.ToLower().Contains(rn.ToLower())).Count() > 0
select cg;
foreach (ChannelGrants s in result)
{
Console.WriteLine(s.Radio);
}
}
}
class ChannelGrants
{
public int ID { get; set; }
public DateTime TimeStamp { get; set; }
public int RadioID { get; set; }
public string Radio { get; set; }
}
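The same partial match can be written with Any and a case-insensitive IndexOf instead of Where(...).Count() > 0 and ToLower(). The sketch below is an in-memory (LINQ to Objects) illustration only; RadioFilter and FilterByPartialName are hypothetical names, and it is an assumption that the rows have already been loaded, since a LINQ to SQL provider may not be able to translate a local list used this way.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RadioFilter
{
    // Hypothetical helper: keeps every radio whose name contains at least one
    // of the search terms, ignoring case.
    public static List<string> FilterByPartialName(IEnumerable<string> radios, IEnumerable<string> radioNames)
    {
        // Empty terms are dropped because IndexOf("") would match every radio.
        var names = radioNames.Where(n => !string.IsNullOrEmpty(n)).ToList();

        return radios
            .Where(radio => names.Any(n => radio.IndexOf(n, StringComparison.OrdinalIgnoreCase) >= 0))
            .ToList();
    }

    public static void Main()
    {
        var radioNames = new List<string> { "abc", "123", "cba", "321" };
        var radios = new[] { "Radio123", "cbaKentucky", "Tower9" };

        foreach (var radio in FilterByPartialName(radios, radioNames))
            Console.WriteLine(radio);
    }
}
```

Any stops at the first matching term, whereas Count() has to test every term for every row.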
At the moment there doesn't seem to be a best way, so I'll leave this answer here until a new answer appears that doesn't just repeat the other answers in this thread that don't work.
