I need to make a list in which each item holds two strings.
And I want to copy the contents of a table on a website into it.
More specifically, I want the word in language 1 in the first string and the word in language 2 in the second string, and then do that for every word in that table.
I want to be able to do that by just entering a URL, because I want to do this for more languages.
It's probably pretty easy, but I have never done something like this before, so sorry if this question is trivial.
Also, please excuse my English, since it's not my first language.
Thanks in advance.
You can use AngleSharp, an HTML parsing library for .NET:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using AngleSharp;
using AngleSharp.Dom;

public class Program
{
    public static async Task Main(string[] args)
    {
        List<WordCls> wordList = new List<WordCls>();
        IBrowsingContext context = BrowsingContext.New(Configuration.Default.WithDefaultLoader());
        // Change this URL to load the table for another language.
        Url url = Url.Create("http://1000mostcommonwords.com/1000-most-common-afrikaans-words");
        IDocument doc = await context.OpenAsync(url);
        IElement tableElement = doc.QuerySelector("table");
        var trs = tableElement.QuerySelectorAll("tr");
        // Skip(1) skips the first row, assumed to be the table header.
        foreach (IElement tr in trs.Skip(1))
        {
            var tds = tr.QuerySelectorAll("td");
            if (tds.Length < 3)
                continue; // ignore rows without number, word and translation
            WordCls word = new WordCls
            {
                Number = Convert.ToInt32(tds[0].Text()),
                African = tds[1].Text(),
                English = tds[2].Text()
            };
            wordList.Add(word);
        }
        Console.WriteLine(wordList.Count);
    }
}

public class WordCls
{
    public int Number { get; set; }
    public string African { get; set; }
    public string English { get; set; }
}
You can also check out Html Agility Pack for C#. It is a powerful tool for crawling web content as well; you can find plenty of information here: https://html-agility-pack.net/from-web
Related
I'm a beginner programmer working on a small web scraper in C#. The purpose is to take a hospital's public website, grab the data for each doctor (their name, department, phone and diploma info) and display it in a DataGridView. It's a public website, and as far as I know, the website's robots.txt allows this, so I left everything in the code as it is.
I am able to grab each piece of data (name, department, phone, diploma) separately and can successfully display them in a text box.
// THIS WORKS:
string text = "";
foreach (var nodes in full)
{
text += nodes.InnerText + "\r\n";
}
textBox1.Text = text;
However, when I try to pass the data on to the DataGridView using a class, the foreach loop only goes through the first name and fills the grid with it.
foreach (var nodes in full)
{
var Doctor = new Doctor
{
Col1 = full[0].InnerText,
Col2 = full[1].InnerText,
Col3 = full[2].InnerText,
Col4 = full[3].InnerText,
};
Doctors.Add(Doctor);
}
I spent a good few hours looking for solutions, but nothing I've found has worked, and I'm at the point where I can't tell whether I messed up the foreach loop somehow or whether I'm not following HTML Agility Pack's rules. It lets me iterate through the nodes for the text box, but not in this foreach. Changing full[0] to nodes[0] or nodes.InnerText doesn't seem to solve it either.
Thank you for the help in advance!
The problem is how you're selecting the nodes from the page. full contains all the individual names, departments etc. in a flat list, which means full[0] is the name of the first doctor while full[4] is the name of the next. Your foreach loop doesn't take that into account: for every node it always accesses full[0] to full[3], so you only ever get the properties of the first doctor.
To make your code more readable, I'd split it up a bit: first make a list of all the card elements (one per doctor), and then select the individual parts within the loop:
// Requires: using System.Collections.Generic; using System.Linq; using HtmlAgilityPack;
HtmlWeb web = new HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = web.Load("https://klinikaikozpont.unideb.hu/doctor_finder");

const string doctorListItem = "div[contains(@class, 'doctor-list-item-model')]";
const string cardContent = "div[contains(@class, 'card-content')]";
// SelectNodes returns null (not an empty collection) when nothing matches.
var doctorCards = doc.DocumentNode.SelectNodes($"//{doctorListItem}/{cardContent}")
                  ?? Enumerable.Empty<HtmlNode>();

var doctors = new List<Doctor>();
foreach (var card in doctorCards)
{
    // Queries starting with "./" only search inside the current card.
    var name = card.SelectSingleNode("./h3")?.InnerText.Trim();
    const string departmentNode = "div[contains(@class, 'department-name')]";
    var department = card.SelectSingleNode($"./{departmentNode}/p")?.InnerText.Trim();
    // other properties...
    doctors.Add(new Doctor { NameAndTitle = name, Department = department });
}

// I took the liberty of making this class easier to understand
public class Doctor
{
    public string NameAndTitle { get; set; }
    public string Department { get; set; }
    // Add other properties
}
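Alternatively, if you want to keep the flat full list from the question, you can step through it four entries at a time, so that doctor number i starts at index 4 * i. The sketch below works on plain strings (for example the InnerText of each node) and assumes every doctor contributes exactly four entries, in the order name, department, phone, diploma. If any doctor is missing a field, the grouping shifts, which is why selecting per card as above is more robust. DoctorRow, FlatListGrouping and GroupFlatList are hypothetical names for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical row type mirroring the question's Col1..Col4 layout.
public class DoctorRow
{
    public string Col1 { get; set; } // name
    public string Col2 { get; set; } // department
    public string Col3 { get; set; } // phone
    public string Col4 { get; set; } // diploma
}

public static class FlatListGrouping
{
    // Assumption: exactly four entries per doctor, always in the same order.
    public const int FieldsPerDoctor = 4;

    public static List<DoctorRow> GroupFlatList(IList<string> texts)
    {
        if (texts.Count % FieldsPerDoctor != 0)
            throw new ArgumentException(
                "Expected a multiple of " + FieldsPerDoctor + " entries, got " + texts.Count + ".");

        var rows = new List<DoctorRow>();
        // Advance by FieldsPerDoctor so each iteration reads a different doctor,
        // instead of always reading indices 0..3.
        for (int i = 0; i < texts.Count; i += FieldsPerDoctor)
        {
            rows.Add(new DoctorRow
            {
                Col1 = texts[i],
                Col2 = texts[i + 1],
                Col3 = texts[i + 2],
                Col4 = texts[i + 3],
            });
        }
        return rows;
    }
}
```

With HtmlAgilityPack you could feed it full.Select(n => n.InnerText).ToList() and then copy each DoctorRow into your own Doctor objects.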
OK, I'm super new to this, and this is for a school project.
The project is to code a program where a person can store, update and search information.
In my program I make lists which store clothing information (brand, type, color, size), and I think my information gets stored, but I don't know how to access it or make a search function for it.
Is my code correct? Should I use another strategy?
This is where my list is defined (?!):
public class klädDATALIST // kläd = clothing
{
    public string märke;   // brand
    public string typ;     // type
    public string färg;    // color
    public string storlek; // size
    public klädDATALIST(string _märke, string _typ, string _färg, string _storlek)
    {
        this.märke = _märke;
        this.typ = _typ;
        this.färg = _färg;
        this.storlek = _storlek;
    }
}
This is where the string variables get filled, through a couple of ReadLine() calls.
For example:
string _färg = Console.ReadLine().ToUpper();
Then, after I've saved it, I'll make a new list, I think?
List<klädDATALIST> newklädDataList = new List<klädDATALIST>();
newklädDataList.Add(new klädDATALIST(_märke, _typ, _färg, _storlek));
I hope you can help me, thank you!
Elements can be accessed by iterating through the collection/List.
foreach( var item in newklädDataList)
{
// access or read item members.
Console.WriteLine(item.märke);
}
When you want to find an element in the list, you can either use LINQ (with using System.Linq;):
var item = newklädDataList.FirstOrDefault(e=>e.märke == "searchstring"); //Any key to identify list item.
if(item != null)
{
Console.WriteLine(item.märke);
}
Or use List<T>.Find, which, like FirstOrDefault, returns null when no item matches:
var item = newklädDataList.Find(e=>e.märke == "searchstring");
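Since the question also asks for a search function, here is a sketch that returns every matching item rather than only the first, comparing case-insensitively against brand, type, color and size. The class from the question is repeated so the block compiles on its own; ClothingSearch and Search are hypothetical names, and matching any of the four fields is an assumption about what "search" should mean.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The class from the question, repeated so this sketch compiles on its own.
public class klädDATALIST
{
    public string märke;   // brand
    public string typ;     // type
    public string färg;    // color
    public string storlek; // size

    public klädDATALIST(string _märke, string _typ, string _färg, string _storlek)
    {
        märke = _märke;
        typ = _typ;
        färg = _färg;
        storlek = _storlek;
    }
}

public static class ClothingSearch
{
    // Hypothetical helper: all items where any field equals the term, ignoring case.
    // Returns an empty list (never null) when nothing matches.
    public static List<klädDATALIST> Search(List<klädDATALIST> items, string term)
    {
        return items
            .Where(e => string.Equals(e.märke, term, StringComparison.OrdinalIgnoreCase)
                     || string.Equals(e.typ, term, StringComparison.OrdinalIgnoreCase)
                     || string.Equals(e.färg, term, StringComparison.OrdinalIgnoreCase)
                     || string.Equals(e.storlek, term, StringComparison.OrdinalIgnoreCase))
            .ToList();
    }
}
```

Keep a single list for the whole program and call Add on it for each new item; if you create a new list before every Add, the earlier items are lost.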
Hope this helps!
I want to parse the date, link text and link href from the table with class='nice' on the web page http://cslh.cz/delegace.html?id_season=2013
I have created a class DelegationLink:
public class DelegationLink
{
public string date { get; set; }
public string link { get; set; }
public string anchor { get; set; }
}
and used it with LINQ to create a List of DelegationLink:
var parsedValues =
from table in htmlDoc.DocumentNode.SelectNodes("//table[@class='nice']")
from date in table.SelectNodes("tr//td")
from link in table.SelectNodes("tr//td//a")
.Where(x => x.Attributes.Contains("href"))
select new DelegationLink
{
date = date.InnerText,
link = link.Attributes["href"].Value,
anchor = link.InnerText,
};
return parsedValues.ToList();
This takes the date cells one by one and combines each of them with the link of every row, but I just want to take each row of the table and get the date, href and link text from that row.
I am new to LINQ and have been searching Google for four hours without success. Thanks for the help.
Well, that's rather easy: you just have to select the tr elements in the SelectNodes call and adjust your code a bit.
Something like this:
// Select the rows instead of the cells; Skip(1) skips the first (header) row.
var parsedValues = htmlDoc.DocumentNode.SelectNodes("//table[@class='nice']/tr").Skip(1)
    .Select(r =>
    {
        var linkNode = r.SelectSingleNode(".//a");
        return new DelegationLink()
        {
            date = r.SelectSingleNode(".//td").InnerText,
            // A row without a link gets empty strings instead of throwing.
            link = linkNode?.GetAttributeValue("href", "") ?? "",
            anchor = linkNode?.InnerText ?? "",
        };
    });
return parsedValues.ToList();
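Why the original query mixed up rows: in LINQ query syntax, each additional from clause iterates over the whole second sequence for every element of the first, so it behaves like a cross join (a SelectMany). The self-contained sketch below uses plain arrays instead of HTML nodes, with made-up sample values, to show that two dates and two links produce four combinations, while a single from over the rows keeps each date with its own link. CrossJoinDemo is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CrossJoinDemo
{
    // Made-up sample rows: each row holds a date and a link text.
    public static readonly string[][] Rows =
    {
        new[] { "1.9.2013", "Game A" },
        new[] { "2.9.2013", "Game B" },
    };

    // Two 'from' clauses: every date is paired with every link (2 x 2 = 4 results).
    public static List<string> CrossJoined()
    {
        return (from date in Rows.Select(r => r[0])
                from link in Rows.Select(r => r[1])
                select date + " " + link).ToList();
    }

    // One 'from' over the rows: each result uses the cells of a single row (2 results).
    public static List<string> PerRow()
    {
        return (from row in Rows
                select row[0] + " " + row[1]).ToList();
    }
}
```

This is the same reason the answer's query starts from the tr nodes: everything after that is read from one row at a time.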
I'm trying to add some csv elements to a list of Alimento, where Alimento is declared as:
namespace ContaCarboidrati
{
class Alimento
{
public virtual string Codice { get; set; }
public virtual string Descrizione { get; set; }
public virtual int Carboidrati { get; set; }
}
}
My csv looks something like this:
"C00, Pasta, 75".
Here's the method that should create the list from the csv:
private static List<Alimento> CreaListaAlimentiDaCsv()
{
List<Alimento> listaCsv = new List<Alimento>();
StreamReader sr = new StreamReader(@"C:\Users\Alex\Documents\RecordAlimenti.csv");
string abc = sr.ReadLine();
//listaCsv = abc.Split(",");
}
abc is "C00, Pasta, 75". I want to get a single element to add to the list, or add all 3 elements to the list; I thought a single element would be easier to do.
Sorry for my bad English
Thanks in advance
Alex
You are on the right track, but abc.Split(',') gives you an array of three strings, not an Alimento, so you cannot add its result to the list directly. You need to create a new Alimento object for each item (line) in the CSV file and initialize each object's properties from the fields. Something like this:
var item = abc.Split(',');
listaCsv.Add(new Alimento() { Codice = item[0], Descrizione = item[1],
    Carboidrati = int.Parse(item[2]) });
Also, your csv seems to include spaces after the commas which you might want to get rid of. You could use string.Trim() to get rid of leading/trailing spaces. You also have to make sure the third item is actually an integer and take action if that is not the case (i.e. add some error handling).
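Putting those two suggestions together, here is a sketch of a per-line parser that trims each field and uses int.TryParse instead of int.Parse, so a bad line is reported instead of throwing. AlimentoParser and TryParseAlimento are hypothetical names; the Alimento class is repeated (without the namespace) so the block compiles on its own, and it assumes exactly three fields per line with no commas inside a field.

```csharp
using System.Globalization;

public class Alimento
{
    public virtual string Codice { get; set; }
    public virtual string Descrizione { get; set; }
    public virtual int Carboidrati { get; set; }
}

public static class AlimentoParser
{
    // Hypothetical helper: parses one line such as "C00, Pasta, 75".
    // Returns false (and a null result) if the line does not have exactly
    // three fields or the third field is not an integer.
    public static bool TryParseAlimento(string line, out Alimento result)
    {
        result = null;
        if (line == null)
            return false;

        string[] fields = line.Split(',');
        if (fields.Length != 3)
            return false;

        // Trim() removes the spaces that follow each comma in the sample data.
        string codice = fields[0].Trim();
        string descrizione = fields[1].Trim();
        if (!int.TryParse(fields[2].Trim(), NumberStyles.Integer, CultureInfo.InvariantCulture, out int carboidrati))
            return false;

        result = new Alimento { Codice = codice, Descrizione = descrizione, Carboidrati = carboidrati };
        return true;
    }
}
```

In the reading loop you would call TryParseAlimento for each line and only add the result to listaCsv when it returns true.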
As a side note, implementing a csv reader is not as trivial as one may think, but there are several free C# implementations out there. If you need something a bit more advanced than just reading a simple (and strictly one-line-per-item) csv, try one of these:
http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader
http://www.filehelpers.com/
You can parse the file with LINQ:
var listaCsv = (from line in File.ReadAllLines("RecordAlimenti.csv")
let items = line.Split(',')
select new Alimento {
Codice = items[0],
Descrizione = items[1],
Carboidrati = Int32.Parse(items[2])
}).ToList();
You can parse it pretty easily, assuming your data isn't malformed:
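// Requires: using System; using System.Collections.Generic; using System.IO; using System.Linq;
private IEnumerable<Alimento> CreaListaAlimentiDaCsv(string fileName)
{
    return File.ReadLines(fileName) // e.g. @"C:\Users\Alex\Documents\RecordAlimenti.csv"
        // Split each line on commas and trim the spaces around each field.
        .Select(line => line.Split(',').Select(v => v.Trim()).ToArray())
        .Select(
            values =>
                new Alimento
                {
                    Codice = values[0],
                    Descrizione = values[1],
                    Carboidrati = Convert.ToInt32(values[2])
                });
}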
Because File.ReadLines reads the file lazily, you can also apply LINQ to the method's result, for example:
//Takes one line without iterating the entire file
CreaListaAlimentiDaCsv(@"C:\Users\Alex\Documents\RecordAlimenti.csv").Take(1);
//Skips the first line and takes the second line, reading two lines total
CreaListaAlimentiDaCsv(@"C:\Users\Alex\Documents\RecordAlimenti.csv").Skip(1).Take(1);
I have a text file that looks like this:
1,Smith, 249.24, 6/10/2010
2,Johnson, 1332.23, 6/11/2010
3,Woods, 2214.22, 6/11/2010
1,Smith, 219.24, 6/11/2010
I need to be able to find the balance for a client on a given date.
I'm wondering if I should:
A. Start from the end and read each line into an Array, one at a time.
Check the last name index to see if it is the client we're looking for.
Then, display the balance index of the first match.
or
B. Use RegEx to find a match and display it.
I don't have much experience with RegEx, but I'll learn it if it's a no-brainer in a situation like this.
I would recommend using the FileHelpers open-source project:
http://www.filehelpers.net/
Piece of cake:
Define your class:
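[DelimitedRecord(",")]
public class Customer
{
    public int CustId;
    public string Name;
    [FieldTrim(TrimMode.Both)] // the sample data has spaces after the commas
    public decimal Balance;
    // The sample dates are month-first without leading zeros, e.g. 6/10/2010
    [FieldTrim(TrimMode.Both)]
    [FieldConverter(ConverterKind.Date, "M/d/yyyy")]
    public DateTime AddedDate;
}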
Use it:
var engine = new FileHelperAsyncEngine<Customer>();
// Read
using(engine.BeginReadFile("TestIn.txt"))
{
// The engine is IEnumerable
foreach(Customer cust in engine)
{
// your code here
Console.WriteLine(cust.Name);
// your condition >> add balance
}
}
This looks like a pretty standard CSV layout, which is easy enough to process. You can actually do it with ADO.NET and the Jet provider, but I think it is probably easier in the long run to process it yourself.
So first off, you want to process the actual text data. It is reasonable to assume each record is separated by a newline, so you can use the ReadLine method to get each record:
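using (StreamReader reader = new StreamReader(@"C:\Path\To\file.txt"))
{
    string line;
    // ReadLine returns null at the end of the file.
    while ((line = reader.ReadLine()) != null)
    {
        if (string.IsNullOrWhiteSpace(line))
            continue; // skip blank lines instead of stopping at them
        // Process Line
    }
}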
And then to process each line, you can split the string on comma, and store the values into a data structure. So if you use a data structure like this:
public class MyData
{
public int Id { get; set; }
public string Name { get; set; }
public decimal Balance { get; set; }
public DateTime Date { get; set; }
}
And you can process the line data with a method like this:
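// Requires: using System.Globalization;
public MyData GetRecord(string line)
{
    var fields = line.Split(',');
    return new MyData()
    {
        Id = int.Parse(fields[0]),
        Name = fields[1].Trim(),
        // InvariantCulture: "." is the decimal separator and dates are month-first,
        // regardless of the machine's regional settings.
        Balance = decimal.Parse(fields[2], CultureInfo.InvariantCulture),
        Date = DateTime.Parse(fields[3], CultureInfo.InvariantCulture)
    };
}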
Now, this is the simplest example, and doesn't account for cases where the fields may be empty, in which case you would either need to support NULL for those fields (using nullable types int?, decimal? and DateTime?), or define some default value that would be assigned to those values.
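Here is a sketch of the nullable approach, assuming that a missing, empty or whitespace-only field should become null. It uses TryParse with CultureInfo.InvariantCulture so that 249.24 and 6/10/2010 (month first) are read the same way on any machine. MyNullableData, RecordParser and GetNullableRecord are hypothetical names; note that a non-empty but invalid value also becomes null here, which you may prefer to treat as an error instead.

```csharp
using System;
using System.Globalization;

public class MyNullableData
{
    public int? Id { get; set; }
    public string Name { get; set; }
    public decimal? Balance { get; set; }
    public DateTime? Date { get; set; }
}

public static class RecordParser
{
    public static MyNullableData GetNullableRecord(string line)
    {
        string[] fields = line.Split(',');
        var inv = CultureInfo.InvariantCulture;

        // Returns the trimmed field at 'index', or null when it is missing or blank.
        string Field(int index) =>
            index < fields.Length && !string.IsNullOrWhiteSpace(fields[index]) ? fields[index].Trim() : null;

        // TryParse returns false for null input, so blank fields end up as null.
        int? id = int.TryParse(Field(0), NumberStyles.Integer, inv, out int idValue) ? idValue : (int?)null;
        decimal? balance = decimal.TryParse(Field(2), NumberStyles.Number, inv, out decimal balanceValue)
            ? balanceValue : (decimal?)null;
        DateTime? date = DateTime.TryParse(Field(3), inv, DateTimeStyles.None, out DateTime dateValue)
            ? dateValue : (DateTime?)null;

        return new MyNullableData { Id = id, Name = Field(1), Balance = balance, Date = date };
    }
}
```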
So once you have that you can store the collection of MyData objects in a list, and easily perform calculations based on that. So given your example of finding the balance on a given date you could do something like:
var data = customerDataList.First(d => d.Name == customerNameImLookingFor
&& d.Date == dateImLookingFor);
Where customerDataList is the collection of MyData objects read from the file, customerNameImLookingFor is a variable containing the customer's name, and dateImLookingFor is a variable containing the date. Note that First throws an exception if no record matches; use FirstOrDefault if that can happen.
I've used this technique to process data in text files in the past for files ranging from a couple records, to tens of thousands of records, and it works pretty well.
I think the cleanest way is to load the entire file into an array of custom objects and work with that. For 3 MB of data, this won't be a problem. If you wanted to do a completely different search later, you could reuse most of the code. I would do it this way:
class Record
{
public int Id { get; protected set; }
public string Name { get; protected set; }
public decimal Balance { get; protected set; }
public DateTime Date { get; protected set; }
public Record (int id, string name, decimal balance, DateTime date)
{
Id = id;
Name = name;
Balance = balance;
Date = date;
}
}
…
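// Requires: using System.Globalization; using System.IO; using System.Linq;
Record[] records = (from line in File.ReadAllLines(filename)
                    let fields = line.Split(',')
                    select new Record(
                        int.Parse(fields[0]),
                        fields[1].Trim(),
                        decimal.Parse(fields[2], CultureInfo.InvariantCulture),
                        DateTime.Parse(fields[3], CultureInfo.InvariantCulture)
                    )).ToArray();
Record wantedRecord = records.Single
    (r => r.Name == clientName && r.Date == givenDate);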
Note that both your options will scan the file. That is fine if you only want to search in the file for 1 item.
If you need to search for multiple client/date combinations in the same file, you could parse the file into a Dictionary<string, Dictionary<DateTime, decimal>> first.
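A sketch of that index, assuming the file format from the question (id, name, balance, month-first date) and at most one line per client per date; in this sketch a later line for the same client and date overwrites an earlier one. BalanceIndex, BuildIndex and TryGetBalance are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class BalanceIndex
{
    // Outer key: client name; inner key: date; value: balance on that date.
    public static Dictionary<string, Dictionary<DateTime, decimal>> BuildIndex(IEnumerable<string> lines)
    {
        var index = new Dictionary<string, Dictionary<DateTime, decimal>>(StringComparer.OrdinalIgnoreCase);
        foreach (string line in lines)
        {
            if (string.IsNullOrWhiteSpace(line))
                continue; // skip blank lines

            string[] fields = line.Split(',');
            string name = fields[1].Trim();
            decimal balance = decimal.Parse(fields[2].Trim(), CultureInfo.InvariantCulture);
            DateTime date = DateTime.Parse(fields[3].Trim(), CultureInfo.InvariantCulture);

            if (!index.TryGetValue(name, out var byDate))
            {
                byDate = new Dictionary<DateTime, decimal>();
                index[name] = byDate;
            }
            byDate[date] = balance;
        }
        return index;
    }

    // Each lookup is two hash-table lookups instead of a scan of the file.
    public static bool TryGetBalance(
        Dictionary<string, Dictionary<DateTime, decimal>> index,
        string name, DateTime date, out decimal balance)
    {
        balance = 0m;
        return index.TryGetValue(name, out var byDate) && byDate.TryGetValue(date.Date, out balance);
    }
}
```

You would build the index once, for example with BuildIndex(File.ReadLines(path)), and then call TryGetBalance as often as you need.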
A direct answer: for a one-off, a RegEx will probably be faster.
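For the one-off regex route (option B in the question), here is a sketch that finds the line with a given name and date and captures its balance. It assumes the comma-separated layout from the question with optional spaces or tabs around the commas, and it compares the date as text, so 6/11/2010 will not match 06/11/2010. The name and date go through Regex.Escape so characters like "." are matched literally. RegexLookup and FindBalance are hypothetical names.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

public static class RegexLookup
{
    // Hypothetical helper: returns the balance on the first line whose name and
    // date match exactly (as text), or null if there is no such line.
    public static decimal? FindBalance(string fileText, string name, string date)
    {
        // RegexOptions.Multiline makes ^ and $ match at line boundaries;
        // [ \t]* (not \s*) keeps the match from spilling onto the next line,
        // and \r? tolerates Windows line endings before $.
        string pattern = @"^[ \t]*\d+[ \t]*,[ \t]*" + Regex.Escape(name) +
                         @"[ \t]*,[ \t]*(?<balance>\d+(?:\.\d+)?)[ \t]*,[ \t]*" +
                         Regex.Escape(date) + @"[ \t]*\r?$";

        Match m = Regex.Match(fileText, pattern, RegexOptions.Multiline);
        if (!m.Success)
            return null;
        return decimal.Parse(m.Groups["balance"].Value, CultureInfo.InvariantCulture);
    }
}
```

Like both options in the question, this still scans the text once per lookup.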
If you're just reading it, I'd consider reading the whole file into memory with StreamReader.ReadToEnd and treating it as one long string to search through. When you find a record you want to look at, look for the previous and next line breaks; the text between them is the transaction row you want.
If the file is on a server or is refreshed all the time, though, this might not be a good solution.
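A sketch of that in-memory approach, assuming the whole file fits comfortably in memory and that the rows you want can be found by a search string such as ",Smith,". It finds each match with IndexOf, then walks back to the previous line break and forward to the next one to cut out the full row. InMemorySearch and FindLinesContaining are hypothetical names; in practice text would come from StreamReader.ReadToEnd or File.ReadAllText.

```csharp
using System;
using System.Collections.Generic;

public static class InMemorySearch
{
    // Hypothetical helper: yields every full line of 'text' that contains 'needle'.
    public static IEnumerable<string> FindLinesContaining(string text, string needle)
    {
        if (string.IsNullOrEmpty(needle))
            throw new ArgumentException("Search string must not be empty.", nameof(needle));

        int pos = 0;
        while ((pos = text.IndexOf(needle, pos, StringComparison.Ordinal)) >= 0)
        {
            // Walk back to the previous line break (or the start of the text).
            int lineStart = text.LastIndexOf('\n', pos) + 1;
            // Walk forward to the next line break (or the end of the text).
            int lineEnd = text.IndexOf('\n', pos);
            if (lineEnd < 0)
                lineEnd = text.Length;

            // TrimEnd('\r') handles Windows line endings.
            yield return text.Substring(lineStart, lineEnd - lineStart).TrimEnd('\r');

            // Continue after this line so the same line is not returned twice.
            pos = lineEnd;
        }
    }
}
```

You would then check the date (and take the balance) from each returned row.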
If it's all well-formatted CSV like this, then I'd use something like the Microsoft.VisualBasic.FileIO.TextFieldParser class or the Fast CSV Reader over on CodeProject to read it all in.
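A sketch using TextFieldParser (namespace Microsoft.VisualBasic.FileIO; it ships with .NET Framework and with .NET Core 3.0 and later, and is usable from C#). Unlike string.Split(','), it handles quoted fields that contain commas. CsvReading and ReadAll are hypothetical names, and the quoted "Johnson, Ann" value in the usage below is made up to show that case.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class CsvReading
{
    // Reads all records from any TextReader (e.g. a StreamReader over the file).
    public static List<string[]> ReadAll(TextReader reader)
    {
        var records = new List<string[]>();
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true; // "Johnson, Ann" stays one field
            parser.TrimWhiteFields = true;           // drops the spaces after commas
            while (!parser.EndOfData)
            {
                records.Add(parser.ReadFields());
            }
        }
        return records;
    }
}
```

Each string[] can then be converted into your own record type, as in the other answers.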
The data type is a little tricky because I imagine not every client has a record for every day. That means a plain nested dictionary isn't enough for your lookups. Instead, you want to "index" by name first and then by date, but the date level is a little different: it needs to be sorted so you can find the nearest earlier record. I think I'd go for something like this as I read in each record:
Dictionary<string, SortedList<DateTime, double>>
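A sketch of how that structure could answer "balance on a given date" when the client has no line for that exact day, assuming the last known balance on or before the date is what you want. It uses decimal rather than double for the amounts, since decimal avoids binary rounding errors with currency. Because SortedList keeps its keys in order, the lookup is a binary search over Keys. BalanceHistory and BalanceAsOf are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class BalanceHistory
{
    // Returns the balance recorded on the latest date that is <= 'date',
    // or null if the client is unknown or has no record that early.
    public static decimal? BalanceAsOf(
        Dictionary<string, SortedList<DateTime, decimal>> history, string name, DateTime date)
    {
        if (!history.TryGetValue(name, out SortedList<DateTime, decimal> byDate))
            return null;

        IList<DateTime> keys = byDate.Keys; // sorted ascending
        int lo = 0, hi = keys.Count - 1, found = -1;
        while (lo <= hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (keys[mid] <= date)
            {
                found = mid;   // candidate; look for a later one to the right
                lo = mid + 1;
            }
            else
            {
                hi = mid - 1;
            }
        }
        return found < 0 ? (decimal?)null : byDate.Values[found];
    }
}
```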
Hey, hey, hey! Why not do it with this great project on CodeProject, LINQ to CSV? Way cool!
rock solid