Scrape data and join fields from the web in C#

I'm trying to make a simple TV channel guide for a school project using C#.
I wrote this while following a YouTube tutorial:
List<string> programasSPTV1 = new List<string>();
List<string> horasSPTV1 = new List<string>();
WebClient web = new WebClient();
String html = web.DownloadString("http://www.tv.sapo.pt/programacao/detalhe/sport-tv-1");
MatchCollection m1 = Regex.Matches(html, "\\s*(.+?)\\s*", RegexOptions.Singleline);
MatchCollection m2 = Regex.Matches(html, "<p>\\s*(.+?)\\s*</p>", RegexOptions.Singleline);
foreach (Match m in m1)
{
    string programaSPTV1 = m.Groups[1].Value;
    programasSPTV1.Add(programaSPTV1);
}
foreach (Match m in m2)
{
    string hora_programaSPTV1 = m.Groups[1].Value;
    horasSPTV1.Add(hora_programaSPTV1);
}
listBox1.DataSource = programasSPTV1 + horasSPTV1;
The last line is not correct. :(
What I really need is to get the time and the program together in the same list box item, something like:
17h45 : Benfica-FCPorto
and not 17h45 in one item and Benfica-FCPorto in another. :/
How can I do that?

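Assuming both lists have the same number of items, the following should give you what you want (the time goes first to get the 17h45 : Benfica-FCPorto format; Zip needs using System.Linq):
listBox1.DataSource = horasSPTV1.Zip(programasSPTV1, (hora, programa) => hora + " : " + programa).ToList();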
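A minimal, self-contained sketch of the Zip approach, assuming both lists were filled in the same order so that index i of each list describes the same broadcast. GuideFormatter and Pair are hypothetical names, and the sample values in the check are made up. Enumerable.Zip stops at the end of the shorter sequence, so if the two regexes return different numbers of matches, the extra items are silently dropped rather than reported.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: formats "time : programme" items for a list box.
public static class GuideFormatter
{
    // Pairs each time with the programme at the same index.
    // Zip stops at the shorter list, so unmatched trailing items are ignored.
    public static List<string> Pair(List<string> hours, List<string> programmes)
    {
        return hours.Zip(programmes, (hour, programme) => hour + " : " + programme).ToList();
    }
}
```

With the question's variables this would be listBox1.DataSource = GuideFormatter.Pair(horasSPTV1, programasSPTV1);.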

Related

Need write # under each tweet C#

Apologies in advance, English is not my first language.
Under each tweet I need to print its hashtags (#...). I am using Regex.IsMatch, but the console prints the entire tweet instead of just the hashtags.
var tweet = tweets[i].Text;
var CreatedDate = tweets[i].CreatedDate.ToString("F");
var TweetTime = DateTime.Parse(CreatedDate);
var age = DateTime.Now.Subtract(TweetTime);
Console.WriteLine(tweet);
Console.WriteLine($"{age} has passed since the tweet was created");
Console.WriteLine();
var pattern = @"#\D*";
foreach (var sharp in tweet)
{
    if (Regex.IsMatch(tweet, pattern, RegexOptions.IgnoreCase))
        Console.WriteLine(sharp);
}
I don't know exactly what you're trying to implement, but I see why you're not getting the hashtag to print: Regex.IsMatch doesn't change or return the text, it only tells you whether there is a match. Try something like this:
var pattern = @"#\D*";
foreach (var sharp in tweet)
{
    var text = sharp.ToString(); // iterating a string yields chars; Regex needs a string
    var match = Regex.Match(text, pattern); // input first, then pattern
    if (match.Success)
        Console.WriteLine(Regex.Replace(text, match.Value, "#" + match.Value, RegexOptions.Singleline));
    else
        Console.WriteLine(text);
}
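To make the difference concrete, here is a small sketch (MatchDemo and its methods are hypothetical names): Regex.IsMatch returns only a bool, while Regex.Match returns a Match whose Value holds the matched text. Note the argument order, input first and pattern second. The pattern #\w+ is an assumption chosen for hashtags; it differs from the #\D* used above, which would also swallow the spaces and punctuation after the #.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class contrasting Regex.IsMatch and Regex.Match.
public static class MatchDemo
{
    private const string HashtagPattern = @"#\w+";

    // IsMatch only answers "is there a match?" and never returns the text.
    public static bool HasHashtag(string text)
    {
        return Regex.IsMatch(text, HashtagPattern);
    }

    // Match(input, pattern) returns a Match; Value is the matched text when Success is true.
    public static string FirstHashtag(string text)
    {
        Match match = Regex.Match(text, HashtagPattern);
        return match.Success ? match.Value : string.Empty;
    }
}
```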
Thanks, all.
The right answer is:
string pattern = @"\s*#(\w+)\s*";
Regex rgx = new Regex(pattern, RegexOptions.IgnoreCase);
MatchCollection matches = rgx.Matches(tweet);
if (matches.Count > 0)
{
    foreach (Match match in matches)
        Console.WriteLine(" " + match.Value);
}
Thanks to @M. Green
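A self-contained sketch of the same idea, assuming you want the hashtags collected per tweet rather than printed directly. Hashtags.Extract is a hypothetical name. Dropping the \s* parts of the accepted pattern and rebuilding each tag from Groups[1] (the word after #) keeps surrounding whitespace out of the result, which match.Value would otherwise include.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper that returns every hashtag in a tweet, in order.
public static class Hashtags
{
    // \w matches letters, digits and underscore, so "#regex_101" is kept whole.
    private static readonly Regex HashtagRegex = new Regex(@"#(\w+)");

    public static List<string> Extract(string tweet)
    {
        var tags = new List<string>();
        foreach (Match match in HashtagRegex.Matches(tweet))
        {
            tags.Add("#" + match.Groups[1].Value);
        }
        return tags;
    }
}
```

Printing them under the tweet then becomes foreach (var tag in Hashtags.Extract(tweet)) Console.WriteLine(" " + tag);.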

How to split and take multiple strings from a url in c#?

I have a string looking something like this:
/Gender=&Age=&Query=&Orgrimmar+l%C3%A4n=01&Stormwind+l%C3%A4n=07&Undercity+l%C3%A4n=09&Pag
I want a list of strings containing "Orgrimmar", "Stormwind" and "Undercity". How can I split the string after Query, taking the text between each & and +, so that I avoid getting a string like "Orgrimmar+l%C3%A4n=01&Stormwind"?
Assume that we don't know the names in advance. :)
Update: I still can't get it to work. I have added a list of counties that I can use for validation, but I still find this case hard. countyList is used to check that the counties/cities in the URL match a pre-existing collection.
var countyQuery = Request.Url.Query;
var counties = this._locationService.GetAllCounties();
List<string> countyList = new List<string>();
List<string> selectedCountiesList = new List<string>();
foreach (var i in counties)
{
    countyList.Add(i.Name);
}
Regex r = new Regex(@"&(.+?)\+");
MatchCollection mc = r.Matches(countyQuery);
foreach (Match curMatch in mc)
{
    if (countyList.Contains(curMatch.Groups[1].Value))
    {
        selectedCountiesList.Add(curMatch.Groups[1].Value);
    }
}
return selectedCountiesList;
I changed the URL to /?Gender=&Age=&Query=&county=13&county=08&county=01&Page=1,
where 13, 08, 01 and so on are the IDs of the counties.
The final solution was:
// CountySearch = "county"; a repeated key comes back comma-joined, e.g. "13,08,01"
var selectedCountyQuery = Request.QueryString[QueryStringParameters.CountySearch];
List<string> countyList = new List<string>();
List<string> selectedCounties = new List<string>();
if (!string.IsNullOrEmpty(selectedCountyQuery))
{
    var selectedCountiesArray = selectedCountyQuery.Split(new[] { ',' });
    foreach (var selectedCounty in selectedCountiesArray)
    {
        selectedCounties.Add(selectedCounty);
    }
}
return selectedCounties;
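The final solution works because ASP.NET exposes the query string as a NameValueCollection, whose indexer joins repeated keys with commas (county=13&county=08 gives "13,08"). A sketch outside a web request, assuming HttpUtility.ParseQueryString (namespace System.Web) is available to stand in for Request.QueryString; CountyQuery and its methods are hypothetical names. GetValues returns the values as a separate array, which avoids a Split(',') that would break if a decoded value itself contained a comma.

```csharp
using System;
using System.Collections.Specialized;
using System.Web;

// Hypothetical helper; HttpUtility.ParseQueryString stands in for Request.QueryString.
public static class CountyQuery
{
    // Indexer: repeated keys are joined with commas, e.g. "13,08,01".
    public static string Joined(string query, string key)
    {
        NameValueCollection parsed = HttpUtility.ParseQueryString(query);
        return parsed[key];
    }

    // GetValues: one array element per occurrence of the key (null if absent, mapped to empty).
    public static string[] Values(string query, string key)
    {
        NameValueCollection parsed = HttpUtility.ParseQueryString(query);
        return parsed.GetValues(key) ?? new string[0];
    }
}
```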
You can get every parameter name and value with the Split() method.
Example:
var URL = "controller/method?var1=&var2=&var3=dsgdf";
var ParameterPart = URL.Split('?')[1];
var ParametersArray = ParameterPart.Split('&');
//output : ["var1=","var2=","var3=dsgdf"];
foreach (var Parameter in ParametersArray)
{
    var ParameterName = Parameter.Split('=')[0];
    var ParameterValue = Parameter.Split('=')[1];
    // use ParameterName and ParameterValue here
}
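One caveat with indexing [1]: a parameter that has no = at all (such as the trailing &Pag in the question's string) makes Split('=')[1] throw IndexOutOfRangeException. A slightly more defensive sketch that collects the pairs into a dictionary; QueryParser is a hypothetical name, values are not URL-decoded, and a repeated name keeps only its last value.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: splits "path?a=1&b=&c" into name/value pairs.
public static class QueryParser
{
    public static Dictionary<string, string> Parse(string url)
    {
        var result = new Dictionary<string, string>();
        int questionMark = url.IndexOf('?');
        string query = questionMark >= 0 ? url.Substring(questionMark + 1) : url;

        foreach (var parameter in query.Split(new[] { '&' }, StringSplitOptions.RemoveEmptyEntries))
        {
            // Split on the first '=' only; a parameter without '=' gets an empty value.
            var pair = parameter.Split(new[] { '=' }, 2);
            result[pair[0]] = pair.Length > 1 ? pair[1] : string.Empty;
        }
        return result;
    }
}
```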
You can use a regex and extract the matches:
Regex r = new Regex(@"&(.+?)\+");
MatchCollection mc = r.Matches(s);
Then you can iterate over your desired strings (in this case WoW cities) like this:
foreach (Match curMatch in mc)
{
    Console.WriteLine(curMatch.Groups[1].Value);
}
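One caveat with &(.+?)\+, which may explain why the updated code in the question still did not work: . also matches & and =, so the first match starts at the & before Age and captures Age=&Query=&Orgrimmar, which then fails the countyList check. A sketch that excludes those separators from the captured name; CityExtractor is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the parameter names that are directly followed by '+'.
public static class CityExtractor
{
    // [^&=+]+ cannot run across '&', '=' or '+', so a match cannot start at an
    // earlier '&' (e.g. the one before "Age") and swallow the preceding parameters.
    private static readonly Regex NameBeforePlus = new Regex(@"&([^&=+]+)\+");

    public static List<string> Names(string query)
    {
        var names = new List<string>();
        foreach (Match match in NameBeforePlus.Matches(query))
        {
            names.Add(match.Groups[1].Value);
        }
        return names;
    }
}
```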
string[] numbers = { "/Gender=&Age=&Query=&Orgrimmar+l%C3%A4n=01&Stormwind+l%C3%A4n=07&Undercity+l%C3%A4n=09&Pag" };
string sPattern = @"(?<=&Orgrimmar)+";
foreach (string s in numbers)
{
    if (System.Text.RegularExpressions.Regex.IsMatch(s, sPattern))
    {
        System.Console.WriteLine(" - valid");
    }
    else
    {
        System.Console.WriteLine(" - invalid");
    }
}
Output: valid
string[] numbers = { "/Gender=&Age=&Query=Orgrimmar+l%C3%A4n=01&Stormwind+l%C3%A4n=07&Undercity+l%C3%A4n=09&Pag" };
Output: invalid
To check for two parameters:
string[] numbers = { "/Gender=&Age=&Query=&Orgrimmar+l%C3%A4n=01&Stormwind+l%C3%A4n=07&Undercity+l%C3%A4n=09&Pag" };
string sPattern = @"(?<=&Orgrimmar)+";
string sPattern2 = @"(?<=&Stormwind)+";
foreach (string s in numbers)
{
    if (System.Text.RegularExpressions.Regex.IsMatch(s, sPattern) && System.Text.RegularExpressions.Regex.IsMatch(s, sPattern2))
...

Retrieve The Second Name Using Regex

I want to use a regex to retrieve each person and their address.
The result should be:
every Frank Anderson and their address, in a list of strings.
Problem:
The problem I'm facing is that my regex cannot retrieve the second name, as in "Frank Andre Anderson".
There may also be other people with a different second name.
Thank you!
string pFirstname = "Frank";
string pLastname = "Anderson";
string input = w.DownloadString("http://www.birthday.no/sok/?f=Frank&l=Anderson");
Match theRegex8 = Regex.Match(input, @"(?<=\><b>)" + pFirstname + "(.+?)" + pLastname + "</b></a></h3><p><span>(.+?<)", RegexOptions.IgnoreCase);
foreach (var matchgroup in theRegex8.Groups)
{
    var sss = matchgroup;
}
The current result I get with this code is:
You must be looking for something like
(?<=>[^<]*<b>)Frank([^<]+)Anderson</b></a></h3><p><span>([^<]+)
See the RegexStorm demo.
In C#, the regex declaration would be:
Match theRegex8 = Regex.Match(input, @"(?<=>[^<]*<b>)" + pFirstname + "([^<]+)" + pLastname + "</b></a></h3><p><span>([^<]+)", RegexOptions.IgnoreCase);
The problem was that . matches any character, including <, whereas here the match must be restricted to characters other than an opening angle bracket, which is what [^<] does.
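A sketch that applies the same pattern with Regex.Matches, so every Frank ... Anderson on the page is returned rather than only the first. The HTML in the check is made up to mirror the markup the pattern expects (<b>name</b></a></h3><p><span>address); the real page's markup may differ. PersonExtractor is a hypothetical name, and Regex.Escape is added as a precaution in case a name contains regex metacharacters.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper built on the pattern from the answer above.
public static class PersonExtractor
{
    // Returns (second name, address) for each "<first> ... <last>" entry in the HTML.
    public static List<(string SecondName, string Address)> Extract(string html, string first, string last)
    {
        var pattern = @"(?<=>[^<]*<b>)" + Regex.Escape(first) + "([^<]+)" + Regex.Escape(last)
                      + "</b></a></h3><p><span>([^<]+)";
        var results = new List<(string SecondName, string Address)>();
        foreach (Match match in Regex.Matches(html, pattern, RegexOptions.IgnoreCase))
        {
            // Group 1 is the text between first and last name; it is just " " when there is no second name.
            results.Add((match.Groups[1].Value.Trim(), match.Groups[2].Value.Trim()));
        }
        return results;
    }
}
```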
Update
Alternatively, you could leverage HtmlAgilityPack: get all <a> tags whose first child is a <b>, and then take the InnerText values that meet your conditions:
var conditions = new[] { pFirstname, pLastname };
var seconds = new List<string>();
var webGet = new HtmlAgilityPack.HtmlWeb();
var doc = webGet.Load("http://www.birthday.no/sok/?f=Frank&l=Anderson");
var a_nodes = doc.DocumentNode.Descendants("a").Where(a => a.HasChildNodes && a.ChildNodes[0].Name == "b");
var res = a_nodes.Select(a => a.ChildNodes[0].InnerText).Where(b => conditions.All(condition => b.Contains(condition))).ToList();
foreach (var name in res)
{
    var splts = name.Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries);
    if (splts.GetLength(0) > 2) // we have 3 elements at the least
        seconds.Add(name.Trim().Substring(name.Trim().IndexOf(" ") + 1, name.Trim().LastIndexOf(" ") - name.Trim().IndexOf(" ") - 1));
}
This way, you will get just the second names. I could not test this code, but I think you get the gist.
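The Substring/IndexOf arithmetic can also be expressed by joining the middle elements of the split name. A sketch, with NameParts.SecondNames as a hypothetical name; unlike the code above, it collapses runs of spaces between the middle names into a single space.

```csharp
using System;

// Hypothetical helper: everything between the first and the last word of a full name.
public static class NameParts
{
    public static string SecondNames(string fullName)
    {
        var parts = fullName.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length < 3)
        {
            return string.Empty; // only first and last name, no second name
        }
        // Join parts[1] .. parts[Length - 2] back together.
        return string.Join(" ", parts, 1, parts.Length - 2);
    }
}
```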

SplitString or SubString or?

Is there already a method (in .NET 3.5 or later) to split a string like this:
string str = "{MyValue} something else {MyOtherValue}";
Result: MyValue, MyOtherValue
Do it like this:
string regularExpressionPattern = @"\{(.*?)\}";
Regex re = new Regex(regularExpressionPattern);
foreach (Match m in re.Matches(str))
{
    Console.WriteLine(m.Groups[1].Value); // the text between the braces
}
System.Console.ReadLine();
Don't forget to add the using System.Text.RegularExpressions; directive.
You can use regular expressions to do it. This fragment prints MyValue and MyOtherValue.
var r = new Regex("{([^}]*)}");
var str = "{MyValue} something else {MyOtherValue}";
foreach (Match g in r.Matches(str)) {
    var s = g.Groups[1].ToString();
    Console.WriteLine(s);
}
MatchCollection match = Regex.Matches(str, @"\{([A-Za-z0-9\-]+)\}", RegexOptions.IgnoreCase);
Console.WriteLine(match[0].Groups[1].Value + "," + match[1].Groups[1].Value);
Something like this:
string[] result = "{MyValue} something else {MyOtherValue}".
    Split(new char[] { '{', '}' }, StringSplitOptions.RemoveEmptyEntries);
string myValue = result[0];
string myOtherValue = result[2];
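The Split version depends on positions (result[0] and result[2]), which shift as soon as the string starts with plain text or contains a different number of placeholders. A regex-based sketch that returns only the names, in order, using LINQ (available from .NET 3.5); Placeholders.Names is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: names of all {Placeholder} tokens in a string.
public static class Placeholders
{
    // Captures the text between each '{' and the next '}'; Groups[1] excludes the braces.
    public static List<string> Names(string input)
    {
        return Regex.Matches(input, @"\{([^}]*)\}")
                    .Cast<Match>()
                    .Select(match => match.Groups[1].Value)
                    .ToList();
    }
}
```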

Looping through Regex Matches

This is my source string:
<box><3>
<table><1>
<chair><8>
This is my regex pattern:
<(?<item>\w+?)><(?<count>\d+?)>
This is my Item class
class Item
{
    string Name;
    int count;
    //(...)
}
This is my Item Collection;
List<Item> OrderList = new List<Item>();
I want to populate that list with Item's based on source string.
This is my function. It's not working.
Regex ItemRegex = new Regex(@"<(?<item>\w+?)><(?<count>\d+?)>", RegexOptions.Compiled);
foreach (Match ItemMatch in ItemRegex.Matches(sourceString))
{
    Item temp = new Item(ItemMatch.Groups["item"].ToString(), int.Parse(ItemMatch.Groups["count"].ToString()));
    OrderList.Add(temp);
}
There might be some small mistakes in this example, such as a missing letter, because it is a simplified version of what I have in my app.
The problem is that in the end I have only one Item in OrderList.
UPDATE
I got it working.
Thanks for the help.
class Program
{
    static void Main(string[] args)
    {
        string sourceString = @"<box><3>
<table><1>
<chair><8>";
        Regex ItemRegex = new Regex(@"<(?<item>\w+?)><(?<count>\d+?)>", RegexOptions.Compiled);
        foreach (Match ItemMatch in ItemRegex.Matches(sourceString))
        {
            Console.WriteLine(ItemMatch);
        }
        Console.ReadLine();
    }
}
Returns 3 matches for me. Your problem must be elsewhere.
For future reference, here is the code above converted to a declarative (LINQ) approach, as a LINQPad snippet:
var sourceString = @"<box><3>
<table><1>
<chair><8>";
var count = 0;
var ItemRegex = new Regex(@"<(?<item>[^>]+)><(?<count>[^>]*)>", RegexOptions.Compiled);
var OrderList = ItemRegex.Matches(sourceString)
    .Cast<Match>()
    .Select(m => new
    {
        Name = m.Groups["item"].ToString(),
        Count = int.TryParse(m.Groups["count"].ToString(), out count) ? count : 0,
    })
    .ToList();
OrderList.Dump();
With output:
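For completeness, a sketch that populates a List<Item> as the question intended, outside LINQPad. The question elides the Item constructor, so the Item type below, with its Name and Count properties, is an assumption; OrderParser is also a hypothetical name. It uses the question's own pattern and returns one Item per line of the sample string.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Assumed shape of the question's Item class (its constructor is elided there).
public class Item
{
    public string Name { get; }
    public int Count { get; }

    public Item(string name, int count)
    {
        Name = name;
        Count = count;
    }
}

// Hypothetical helper that turns "<name><count>" lines into Item objects.
public static class OrderParser
{
    private static readonly Regex ItemRegex =
        new Regex(@"<(?<item>\w+?)><(?<count>\d+?)>", RegexOptions.Compiled);

    public static List<Item> Parse(string source)
    {
        var orderList = new List<Item>();
        foreach (Match match in ItemRegex.Matches(source))
        {
            orderList.Add(new Item(match.Groups["item"].Value, int.Parse(match.Groups["count"].Value)));
        }
        return orderList;
    }
}
```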
