I'm building a website in ASP.NET/C# and I want to get the number of Facebook likes of a specific page (think of a video/article). I need this value programmatically, because I want to sort on it later, but that's a different story.
I already know the link Facebook itself provides to get this amount, which is posted below.
http://api.facebook.com/method/fql.query?query=select%20like_count%20from%20link_stat%20where%20url=%27http://www.google.com%27
Here www.google.com is the website whose likes are being counted; it can of course be changed to whichever page one needs.
Does anybody know how I can read the XML returned by the URL above? I've done some research, but I can't seem to find an answer that works for me.
EDIT: I found the answer. I had to navigate through the XML a bit and modify the actual URL used. Working code is posted below.
string result;
string urlToXMLfile, currentURL;
currentURL = Globals.NavigateURL(TabId, "", "CategoryId=" + catId, "MovieId=" + Request.QueryString["MovieId"]);
urlToXMLfile = "https://api.facebook.com/method/fql.query?query=select%20%20like_count%20from%20link_stat%20where%20url=%22";
urlToXMLfile += currentURL;
urlToXMLfile += "%22";
//XDocument xdoc = XDocument.Load(urlToXMLfile);
//string test = xdoc.Descendants(XName.Get("like_count")).First().Value;
XmlDocument doc = new XmlDocument();
doc.Load(urlToXMLfile);
result = doc.FirstChild.NextSibling.InnerText;
return result;
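The commented-out XDocument lines above can probably be made to work as well. Here is a minimal sketch, assuming the response contains a like_count element somewhere inside a root that declares a default XML namespace. The class and method names are hypothetical, and the response shape is an assumption since the question does not show it. Matching on the local name avoids having to know the namespace:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class LikeCountParser
{
    // Returns the text of the first <like_count> element, or null if there is none.
    // Comparing LocalName ignores whatever default namespace the response declares
    // (the actual namespace is not shown in the question, so it is not hard-coded here).
    public static string ExtractLikeCount(string xml)
    {
        XDocument xdoc = XDocument.Parse(xml);
        XElement node = xdoc.Descendants()
                            .FirstOrDefault(e => e.Name.LocalName == "like_count");
        return node == null ? null : node.Value;
    }
}
```

To read straight from the URL, XDocument.Load(urlToXMLfile) could replace XDocument.Parse(xml). The null check matters because First() throws when no element matches.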
I had the same issue once when I worked with Selenium. For me it was enough to get the text representation of that XML and keep it as a simple string, storing the page body in a variable. That allowed me to extract the count later via regex or some other algorithm.
I added my own answer below the question. That line of code works and returns a simple String, with the amount of FB likes that page got.
I found a Selenium solution for you, try this:
string pageSource = driver.PageSource;
and after you get the data, you can do something like:
// Extract the text between the two like_count elements
string pattern = "(?i)(<like_count.*?>)(.+?)(</like_count>)";
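A minimal sketch of how that pattern could be applied in C#, assuming the page source is already available as a string. The class and method names are hypothetical. The second capture group holds the text between the opening and closing tags:

```csharp
using System;
using System.Text.RegularExpressions;

public static class LikeCountRegex
{
    // Same pattern as above: group 2 is the text between the like_count tags.
    private const string Pattern = "(?i)(<like_count.*?>)(.+?)(</like_count>)";

    // Returns the trimmed text between the tags, or null when the tags are absent.
    public static string Extract(string pageSource)
    {
        // Singleline lets '.' match newlines in case the element spans several lines.
        Match m = Regex.Match(pageSource, Pattern, RegexOptions.Singleline);
        return m.Success ? m.Groups[2].Value.Trim() : null;
    }
}
```

Calling LikeCountRegex.Extract(pageSource) returns the count as a string, which can then be passed to int.TryParse if a number is needed for sorting.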
Hey I would like to cut off a title from an RSS feed after a specific character, in this case, the character ";". I looked up plenty of questions and they all seem to do this with a predefined string. I need my code to pull the title of an RSS feed (which is dynamic, but always in a similar format with the ";" I want to delete the contents before). Here's my Code
ASP.NET - P.S. I'm using a fancybox iframe to pull the link up. It's irrelevant to my issue.
<%# FormatTitle( XPath("title") ) %>
C# - I made this code after searching similar questions on StackOverflow
public static string FormatTitle(object TitleIn)
{
string input = "Bid - Contract.: 13-C-00038; Howard F. Curren AWTP New Primary Sludge Pump Station Rehabilitation – Sheltered Market";
int index = input.IndexOf(";") + 1;
if (index > 0)
input= input.Substring(index);
return input;
}
The problem now is that all of my feeds have the same title, "Howard F. Curren AWTP New Primary Sludge Pump Station Rehabilitation – Sheltered Market". I need the "input" string to accept the "title" field from the XML that's being pulled. Sorry if this has already been answered. I looked up a bunch on StackOverflow and I can't find any that deal with dynamic titles.
Your code ignores the input param TitleIn and uses the local variable input that is set to the string literal. Hence, your method will always return the same value.
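A minimal sketch of the fix, assuming the value passed in from XPath("title") converts cleanly to a string. The class name is hypothetical, and the Trim() call is an addition that removes the space after the ";":

```csharp
using System;

public static class TitleFormatter
{
    // Uses the parameter instead of a hard-coded literal, so each feed item
    // gets its own title. Everything up to and including the first ';' is removed.
    public static string FormatTitle(object titleIn)
    {
        string input = Convert.ToString(titleIn) ?? string.Empty;
        int index = input.IndexOf(';');
        // When there is no ';' the title is returned unchanged.
        return index >= 0 ? input.Substring(index + 1).Trim() : input;
    }
}
```

With this version the data-binding expression stays the same; only the method body changes.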
I have a Hyperlink field (aka column) in SharePoint 2010.
Say it's called SalesReportUrl. The url looks like:
http://portal.cab.com/SalessiteCollection/October2012Library/Forms/customview.aspx
The hyperlink field stores values in two fields (the link and description).
What would be the RegEx if I want to get the October2012Library out of the Url?
I tried this but it's definitely not working:
@"<a[\s]+[^>]*?href[\s]?=[\s\"\']+(.*?)[\"\']+.*?>([^<]+|.*?)?<\/a>";
I also tried:
^(.*?/)?Forms/$
but no luck.
I think sharepoint stores hyperlink like this:
http://portal.cab.com/SalessiteCollection/October2012Library/Forms/customview.aspx, some description
It looks like this has a solution, but what's the substring syntax to get the list or library name? https://sharepoint.stackexchange.com/questions/40712/get-list-title-in-sharepoint-designer-workflow
How about this (as Daniel suggested) :
string url = @"http://portal.cab.com/SalessiteCollection/October2012Library/Forms/customview.aspx";
Uri uri = new Uri(url);
if (uri.Segments.Length > 2)
Console.WriteLine(uri.Segments[2]); // will output "October2012Library/"
you can add .Replace("/", string.Empty) if you want to get rid of the "/"
Console.WriteLine(uri.Segments[2].Replace("/", string.Empty));
http://[^/]+/[^/]+/([^/]+)/
The match's Groups[1] is the value you need. It gets the 3rd part of the URL (divided by /). If you need to make sure it is followed by other parts, e.g. Forms, add that at the end of the pattern.
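A minimal sketch of applying that pattern in C#. The class and method names are hypothetical, and the pattern is used exactly as given above:

```csharp
using System;
using System.Text.RegularExpressions;

public static class LibraryNameExtractor
{
    // Captures the part after host and site collection, i.e. the library name.
    private const string Pattern = "http://[^/]+/[^/]+/([^/]+)/";

    // Returns the captured library name, or null when the URL has too few parts.
    public static string GetLibraryName(string url)
    {
        Match m = Regex.Match(url, Pattern);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

Because the pattern stops at the slash after the captured part, a trailing ", some description" in the stored field value should not affect the result.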
try using this: new Regex("SalessiteCollection/(.+?)/Forms").Match(urlString).Groups[1].Value
Though it is a rough answer, you might have to make few corrections but I hope you understand what I am trying to explain.
maybe this?
http:\/\/([a-zA-Z0-9]([a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,6}\/[a-zA-Z]*\/([a-zA-Z0-9]*)\/
http://rubular.com/r/LuuuORPRXt
Is it possible to find links on a webpage by searching their text using a pattern like A-ZNN:NN:NN:NN, where N is a single digit (0-9)?
I've used Regex in PHP to turn text into links, so I was wondering if it's possible to use this sort of filter in Selenium with C# to find links that will all look the same, following a certain format.
I tried:
driver.FindElements(By.LinkText("[A-Z][0-9]{2}):([0-9]{2}):([0-9]{2}):([0-9]{2}")).ToList();
But this didn't work. Any advice?
In a word, no, none of the FindElement() strategies support using regular expressions for finding elements. The simplest way to do this would be to use FindElements() to find all of the links on the page, and match their .Text property to your regular expression.
Note though that if clicking on the link navigates to a new page in the same browser window (i.e., does not open a new browser window when clicking on the link), you'll need to capture the exact text of all of the links you'd like to click on for later use. I mention this because if you try to hold onto the references to the elements found during your initial FindElements() call, they will be stale after you click on the first one. If this is your scenario, the code might look something like this:
// WARNING: Untested code written from memory.
// Not guaranteed to be exactly correct.
List<string> matchingLinks = new List<string>();
// Assume "driver" is a valid IWebDriver.
ReadOnlyCollection<IWebElement> links = driver.FindElements(By.TagName("a"));
// You could probably use LINQ to simplify this, but here is
// the foreach solution
foreach(IWebElement link in links)
{
string text = link.Text;
if (Regex.IsMatch(text, "your Regex here")) // input first, then pattern
{
matchingLinks.Add(text);
}
}
foreach(string linkText in matchingLinks)
{
IWebElement element = driver.FindElement(By.LinkText(linkText));
element.Click();
// do stuff on the page navigated to
driver.Navigate().Back();
}
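The LINQ simplification mentioned in the comments might look like the sketch below. To keep it self-contained it works on plain strings, so it assumes the link texts have already been read from the elements (for example via each element's Text property). The class name, method name and pattern are assumptions based on the A-ZNN:NN:NN:NN format from the question:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class LinkTextFilter
{
    // One uppercase letter followed by four colon-separated two-digit groups,
    // e.g. "A12:34:56:78". Anchored so partial matches are rejected.
    private static readonly Regex LinkPattern =
        new Regex(@"^[A-Z]\d{2}:\d{2}:\d{2}:\d{2}$");

    // Keeps only the texts that match the pattern, ignoring surrounding whitespace.
    public static List<string> FilterMatching(IEnumerable<string> linkTexts)
    {
        return linkTexts
            .Where(t => t != null && LinkPattern.IsMatch(t.Trim()))
            .ToList();
    }
}
```

The resulting list can then be used in the second loop to look each link up again by its text.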
Don't use regex to parse HTML.
Use HtmlAgilityPack.
You can follow these steps:
Step 1 Use the HTML parser to extract all the links from the particular webpage and store them in a List.
HtmlWeb hw = new HtmlWeb();
HtmlDocument doc = hw.Load(/* url */);
foreach(HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
{
//collect all links here
}
Step 2 Use this regex to match all the links in the list
.*?[A-Z]\d{2}:\d{2}:\d{2}:\d{2}.*?
Step 3 You get your desired links.
I'm working on a website where I need a google map to display the positions of the members of the site.
However, I'm having a bit of trouble getting data from the returned XML document when using the HTTP Geocode Service. When I put the string into the browser it returns the XML just fine, and if I set a textbox.Text to the document's InnerText it also displays as it should. But when I try to extract values from nodes, I get "object reference not set to an instance of an object".
I'm doing it this way:
string address = m.getProperty("adresse").Value.ToString();
string zip = m.getProperty("postNummer").Value.ToString();
string city = m.getProperty("by").Value.ToString();
XmlDocument doc = new XmlDocument();
doc.Load("http://maps.googleapis.com/maps/api/geocode/xml?address=" + zip + "+" + city + "+" + address + "+DK&sensor=true");
XmlNode latNode = doc.SelectSingleNode("GeoCodeResponse/result/geometry/location/lat/text()");
XmlNode lonNode = doc.SelectSingleNode("GeoCodeResponse/result/geometry/location/lng/text()");
// The error occurs when the code hits these:
string lat = latNode.Value;
string lon = lonNode.Value;
I must admit that I haven't worked that much with XML in C# yet, so any hint will be greatly appreciated! :-) I should also say that the above code is in a foreach loop, looping through the members of the site.
Thanks a lot in advance!
All the best,
Bo
Edit: Sorry, I forgot to paste how I get the values! ;)
Replace "GeoCodeResponse" with "GeocodeResponse"
Please note the capital C in Code is incorrect. Xml is case sensitive.
How did you get latNode and lonNode? It seems to be those that are null.
Since you are doing it in a loop, do any of the members succeed? Perhaps you are not getting a hit for some of the addresses, so the lat/lng nodes might not be there in the document.
There really is no way to tell the exact problem from the code you posted. Use your debugger, and step through the code to see why you are not getting latNode assigned.
Edit
This works:
XmlNode latNode = doc.SelectSingleNode("GeocodeResponse/result/geometry/location/lat/text()");
XmlNode lonNode = doc.SelectSingleNode("GeocodeResponse/result/geometry/location/lng/text()");
You had a little typo in the path. "code" in "GeocodeResponse" should be lowercase. XPath is case sensitive.
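Since some addresses may also return no result at all, a null check keeps the loop from throwing. Here is a minimal sketch that uses an inline XML string in place of the HTTP call. The class name, method name and trimmed-down sample response used to exercise it are hypothetical:

```csharp
using System;
using System.Xml;

public static class GeocodeParser
{
    // Returns the latitude text, or null when the response has no result node.
    public static string ReadLatitude(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        // Note the lowercase "c": the root element is GeocodeResponse.
        XmlNode latNode = doc.SelectSingleNode(
            "GeocodeResponse/result/geometry/location/lat/text()");
        // SelectSingleNode returns null when nothing matches the path.
        return latNode == null ? null : latNode.Value;
    }
}
```

Inside the foreach loop, a member whose address yields null could simply be skipped instead of stopping the whole run.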
I'm attempting to write a screen scraper for Digikey that will allow our company to keep accurate track of pricing, part availability and product replacements when a part is discontinued. There seems to be a discrepancy between the XPATH that I'm seeing in Chrome Devtools as well as Firebug on Firefox and what my C# program is seeing.
The page that I'm scraping currently is http://search.digikey.com/scripts/DkSearch/dksus.dll?Detail&name=296-12602-1-ND
The code I'm currently using is pretty quick and dirty...
//This function retrieves data from the digikey
private static List<string> ExtractProductInfo(HtmlDocument doc)
{
List<HtmlNode> m_unparsedProductInfoNodes = new List<HtmlNode>();
List<string> m_unparsedProductInfo = new List<string>();
//Base Node for part info
string m_baseNode = @"//html[1]/body[1]/div[2]";
//Write part info to list
m_unparsedProductInfoNodes.Add(doc.DocumentNode.SelectSingleNode(m_baseNode + @"/table[1]/tr[1]/td[1]/table[1]/tr[1]/td[1]"));
//More lines of similar form will go here for more info
//this retrieves digikey PN
foreach(HtmlNode node in m_unparsedProductInfoNodes)
{
m_unparsedProductInfo.Add(node.InnerText);
}
return m_unparsedProductInfo;
}
Although the path I'm using appears to be "correct" I keep getting NULL when I look at the list "m_unparsedProductInfoNodes"
Any idea what's going on here? I'll also add that if I do a "SelectNodes" on the baseNode it only returns a div with the only significant child being "cs=####", which seems to vary with browser user agents. If I try to use this in any way (putting /cs=0 in the path for the unidentifiable browser) it pitches a fit, insisting that my expression doesn't evaluate to a node set, but leaving it out still leaves the problem that all data past div[2] is returned as NULL.
Try using this XPath expression:
/html[1]/body[1]/div[2]/cs=0[1]/rf=141[1]/table[1]/tr[1]/td[1]/table[1]/tr[1]/td[1]
Using Google Chrome Developer Tools and Firebug in Firefox, it seems like the webpage has 'cs' and 'rf' tags before the first table. Something like:
<cs="0">
<rf="141">
<table>
...
</table>
</rf>
</cs>
When you parse a known HTML file and don't get the results you expect, it can help to find out which path the parser itself assigns to the node you are after. In this case I just did:
string xpath = "";
//In this case I'll get all cells and see what cell has the text "296-12602-1-ND"
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//td"))
{
if (node.InnerText.Trim() == "296-12602-1-ND")
xpath = node.XPath; //Here it is
}
Or you could just debug your application after the document loads, and go through each child node until you find the node you want to get the info from. If you set a breakpoint where the InnerText is found, you can walk up through the parents and then keep looking for other nodes. I usually do that by manually entering commands in a 'watch' window and navigating with the tree view to see properties, attributes and children.
Just for an update:
I switched from C# to the somewhat friendlier Python (my programming experience is asm, C, and Python; the whole OO thing was totally new) and managed to correct my XPath issues. The tag was indeed the problem, but luckily it's unique, so with a little regular expression and a removed line I was in good shape. I'm not sure why a tag like that breaks the XPath though. If anyone has some insight I'd like to hear it.