Yahoo News API In C# - c#

So I'm working on a speech recognition program in C# and while trying to implement the YAHOO News API into the program I am getting no response.
I won't copy/paste my whole code as it would be very long so here are the main bits.
private void GetNews()
{
string query = String.Format("http://news.yahoo.com/rss/");
XmlDocument wData = new XmlDocument();
wData.Load(query);
XmlNamespaceManager manager = new XmlNamespaceManager(wData.NameTable);
manager.AddNamespace("media", "http://search.yahoo.com/mrss/");
XmlNode channel = wData.SelectSingleNode("rss").SelectSingleNode("channel");
XmlNodeList nodes = wData.SelectNodes("rss/channel/item/description", manager);
FirstStory = channel.SelectSingleNode("item").SelectSingleNode("title", manager).Attributes["alt"].Value;
}
I believe I have done something wrong here:
XmlNode channel = wData.SelectSingleNode("rss").SelectSingleNode("channel");
XmlNodeList nodes = wData.SelectNodes("rss/channel/item/description", manager);
FirstStory = channel.SelectSingleNode("item").SelectSingleNode("title", manager).Attributes["alt"].Value;
Here is the full XML Document: http://news.yahoo.com/rss/
If any more info is required let me know.

Hmm I have implemented my own code to get news from Yahoo, I read all the news Title ( which is located at rss/channel/item/title ) and Short story ( which is located rss/channel/item/description ).
The short story is the problem for news, and that is the point when we need to get all the inner text of description node in a string and then parse it like XML. The text code is in this format and the Short story is right behind </p>
<p><a><img /></a></p>"Short Story"<br clear="all"/>
We need to modify it since we have many xml roots (p and br) and we add an extra root <me>
string ShStory=null;
string Title = null;
//Creating a XML Document
XmlDocument doc = new XmlDocument();
//Loading rss on it
doc.Load("http://news.yahoo.com/rss/");
//Looping every item in the XML
foreach (XmlNode node in doc.SelectNodes("rss/channel/item"))
{
//Reading Title which is simple
Title = node.SelectSingleNode("title").InnerText;
//Putting all description text in string ndd
string ndd = node.SelectSingleNode("description").InnerText;
XmlDocument xm = new XmlDocument();
//Loading modified string as XML in xm with the root <me>
xm.LoadXml("<me>"+ndd+"</me>");
//Selecting node <p> which has the text
XmlNode nodds = xm.SelectSingleNode("/me/p");
//Putting inner text in the string ShStory
ShStory= nodds.InnerText;
//Showing the message box with the loaded data
MessageBox.Show(Title+ " "+ShStory);
}
Choose the me as the right answer or vote me up if the code works for you. If there are any issues you can ask me. Cheers

It's likely that you are passing that namespace manager to those attributes, but I'm not 100% certain. Those are definitely not in that .../mrss/ namespace, so I would guess that is your problem.
I would try it without passing the namespace (if possible) or using the GetElementsByTagName method to avoid namespace issues.

Tag contains the text rather than Xml.
Here is an example to display text news:
foreach (XmlElement node in nodes)
{
Console.WriteLine(Regex.Match(node.InnerXml,
"(?<=(/a>)).+(?=(</p))"));
Console.WriteLine();
}

Related

I'm unable to extract the Data from this XML

I'm currently working on a small weather application in C#. To do this, I need to extract data from this xml file: http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20geo.places%20where%20text%3D%22london%22&format=xml
In this specific case I need the value of the first /query/results/place/woeid node. I've been looking around and tried many different methods, but didn't manage to get any values with any of those. My current code looks like this:
string query = String.Format("http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20geo.places%20where%20text%3D%22london%22&format=xml");
XmlDocument xml = new XmlDocument();
xml.Load(query);
XmlNodeList nodeList = wData.DocumentElement.SelectNodes("/query/results/place");
foreach (XmlNode node in nodeList)
{
return node.SelectSingleNode("woeid").InnerText;
}
return "NO WOEID FOUND!";
I'm just starting to learn C# so I might do some stupid mistakes. Still, I would really appreciate any kind of help.
You can use XDocument
string query = String.Format("http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20geo.places%20where%20text%3D%22london%22&format=xml");
XDocument xml = XDocument.Load(query);
XNamespace ns = "http://where.yahooapis.com/v1/schema.rng";
var woeid = xml.Element("query").Element("results").Elements(ns + "place").FirstOrDefault().Element(ns +"woeid").Value;
Make sure you check if there are any elements in the XDocuments

Selecting value from first xml child node

I have an XML file that contains multiple URLs for different image file sizes, and I'm trying to get a single url to load into a picture box. My issue is that the child nodes are named similarly, and the parent nodes are named similarly as well. For example, I want to pull the first medium image (ending in SL160_.jpg). See below for XML code
<Items>
<SmallImage>
<URL>http://ecx.images-amazon.com/images/I/51TAL%2Bn7AqL._SL75_.jpg</URL>
</SmallImage>
<MediumImage>
<URL>http://ecx.images-amazon.com/images/I/51TAL%2Bn7AqL._SL160_.jpg</URL>
</MediumImage>
<LargeImage>
<URL>http://ecx.images-amazon.com/images/I/51TAL%2Bn7AqL.jpg</URL>
</LargeImage>
<MediumImage>
<URL>http://ecx.images-amazon.com/images/I/51TAL%2Bn7AqL._SL162_.jpg</URL>
</MediumImage>
<LargeImage>
<URL>http://ecx.images-amazon.com/images/I/51TAL%2Bn7AqL.jpg</URL>
</LargeImage>
</Items>
I've tried using GetElementsByTag, as well as trying to call something like doc.SelectSingleNode("LargeImage").SelectSingleNode("URL").InnerText, and GetElementByID. All of these have given me an Object set to null reference exception.
What can I do to specify that I want the url from the first found MediumImage node?
Use LinqToXMLïĵŒIt is rather simple
string xml = #"<Items>
<SmallImage>
<URL>http://ecx.images-amazon.com/images/I/51TAL%2Bn7AqL._SL75_.jpg</URL>
</SmallImage>
<MediumImage>
<URL>http://ecx.images-amazon.com/images/I/51TAL%2Bn7AqL.01_SL160_.jpg</URL>
</MediumImage>
<LargeImage>
<URL>http://ecx.images-amazon.com/images/I/51TAL%2Bn7AqL.jpg</URL>
</LargeImage>
<MediumImage>
<URL>http://ecx.images-amazon.com/images/I/51TAL%2Bn7AqL.02_SL162_.jpg</URL>
</MediumImage>
<LargeImage>
<URL>http://ecx.images-amazon.com/images/I/51TAL%2Bn7AqL.jpg</URL>
</LargeImage>
</Items>";
XElement root = XElement.Parse(xml);
var ele = root.Elements("MediumImage").Where(e => e.Element("URL").Value.EndsWith("SL160_.jpg")).FirstOrDefault();
Console.WriteLine(ele);
In addition to Sky Fang's answer, I think the OP wants this:
var firstMedImg = root.Elements("MediumImage").First();
var imgUrl = firstMedImg.Element("URL").Value;
XmlDocument doc = new XmlDocument();
// PATH TO YOUR DOCUMENT
doc.Load("daco.xml");
// Select LIST ALL ELEMENTS SmallImage,MediumImage,LargeImage
XmlNodeList listOfAllImageElements = doc.SelectNodes("/Items/*");
foreach (XmlNode imageElement in listOfAllImageElements)
{
// Select URL ELEMENT
XmlNode urlElement= node.SelectSingleNode("URL");
System.Console.WriteLine(urlElement.InnerText);
}
Console.ReadLine();
If you want to select multiple url's
XmlDocument doc = new XmlDocument();
// PATH TO YOUR DOCUMENT
doc.Load("daco.xml");
// Select LIST ALL ELEMENTS SmallImage,MediumImage,LargeImage
XmlNodeList listOfAllImageElements = doc.SelectNodes("/Items/*");
foreach (XmlNode imageElement in listOfAllImageElements)
{
// Select URL's ELEMENTs
XmlNodeList listOfAllUrlElements = imageElement.SelectNodes("URL");
foreach (XmlNode urlElement in listOfAllUrlElements)
{
System.Console.WriteLine(urlElement.InnerText);
}
}
Console.ReadLine();
if you have specific namespace in your xml file
XmlDocument doc = new XmlDocument();
doc.Load("doc.xml");
XmlNamespaceManager man = new XmlNamespaceManager(doc.NameTable);
// reaplace http://schemas.microsoft.com/vs/2009/dgml with your namespace
man.AddNamespace("x", "http://schemas.microsoft.com/vs/2009/dgml");
// next you have to use x: in your path like this
XmlNodeList node = doc.SelectNodes("/x:Items/x:*, man);

Create loops and parsing XML nodes value for Selenium tests

I am trying to write a test function in C# that read data from an XML file and parse into Selenium testing methods , the XML code is like:
<home>
<ask_frame>
<button>
<id>Object ID<id>
<xpath>Object XPath<xpath>
<textbox>
<id>Object ID<id>
<xpath>Object XPath<xpath>
</ask_frame>
<search_frame>
<id>Object ID<id>
<xpath>Object XPath<xpath>
</search_frame>
<home>
I am trying to create a loop that read the id and xpath value from these nodes and parse them into an method for searching a webpage element by id and xpath. My initial attempt was:
Code updated
public void CheckIdTest()
{
driver.Navigate().GoToUrl(baseURL + "FlightSearch");
XmlDocument xd = new XmlDocument();
xd.Load(#"C:\XMLFile1.xml");
XmlNodeList mainlist = xd.SelectNodes("//home/*");
XmlNode mainroot = mainlist[0];
foreach (XmlNode xnode in mainroot)
{
string objID = xnode.SelectSingleNode("id").InnerText;
string objXPath = xnode.SelectSingleNode("XPath").InnerText;
objID = objID.Trim();
objXPath = objXPath.Trim();
String checkValue = "ObjID value is: " + objID + Environment.NewLine+ "ObjXPath value is: " + objXPath;
System.IO.File.WriteAllText(#"C:\checkvalue.txt", checkValue);
objectCheck(objXPath, objID);
}
}
I have put a String and checked that correct values for ObjID and ObjXPath have been achieved, but this loop also went only twice (checked 2 nodes in first branch). How could I make it runs through every node in my XML?
Any suggestions and explanations to the code will be highly appreciated.
Basically these two lines are using incorrect XPath :
XmlNodeList idlist = xd.SelectNodes("id");
XmlNodeList xpathlist = xd.SelectNodes("XPath");
<id> and <xpath> nodes aren't located directly at the root level, so you can't access it just like above. Besides, xpath is case-sensitive so you should've used "xpath" instead of "XPath". Try to fix it like this :
XmlNodeList idlist = xd.SelectNodes("//id");
XmlNodeList xpathlist = xd.SelectNodes("//xpath");
or more verbose :
XmlNodeList idlist = xd.SelectNodes("home/*/id");
XmlNodeList xpathlist = xd.SelectNodes("home/*/xpath");
UPDATE :
Responding to your comment about looping problem, I think you want to change it like this :
foreach (XmlNode xnode in mainroot.ChildNodes)
{
string objID = xnode.SelectSingleNode("id").InnerText;
string objXPath = pathroot.SelectSingleNode("xpath").InnerText;
objectCheck(objID, objXPath);
}
You are getting this error because you are trying to use an object that is null i.e not instantiated.
Put in a breakpoint at the line
XmlDocument xd = new XmlDocument();
and step through line by line till you find where the nothing.null reference is.
It should not take long to find out what the problem is.

Selecting Particular Node List in XML

<Report xmlns="Microsoft.SystemCenter.DataWarehouse.Report.Alert" xmlns:p1="w3.org/2001/XMLSchema-instance"; Name="Microsoft.SystemCenter.DataWarehouse.Report.Alert" p1:schemaLocation="Microsoft.SystemCenter.DataWarehou?Schema=True">
<Title>Alert Report</Title>
<Created>6/27/2013 9:32 PM</Created>
<StartDate>6/1/2013 9:29 PM</StartDate>
<EndDate>6/27/2013 9:29 PM</EndDate>
<TimeZone>(UTC)</TimeZone>
<Severity>Warning, Critical</Severity>
<Priority>Low, Medium, High</Priority>
<AlertTable>
<Alerts>
<Alert>
<AlertName></AlertName>
<Priority></Priority>
</Alert>
</Alerts>
</AlertTable>
</Report>
So I'm trying to pull down the list of nodes that appear under Alerts child. So /Report/AlertTable/Alerts.
I've done very similar before but in this format it is not working for some reason. Can someone point me out in the right direction?
XmlDocument Log = new XmlDocument();
Log.Load("test.xml");
XmlNodeList myLog = Log.DocumentElement.SelectNodes("//Report/AlertTable/Alerts");
foreach (XmlNode alert in myLog)
{
Console.Write("HERE");
Console.WriteLine(alert.SelectNodes("AlertName").ToString());
Console.WriteLine(alert.SelectNodes("Priority").ToString());
Console.Read();
}
EDIT:
One of the responses had me try to use a bunch of namespace with p1 but had no such luck.
EDIT:
Did not work either:
var name = new XmlNamespaceManager(log.NameTable);
name.AddNamespace("Report", "http://www.w3.org/2001/XMLSchema-instance");
XmlNodeList xml = log.SelectNodes("//Report:Alerts", name);
From a site:
nodename Selects all nodes with the name "nodename"
/ Selects from the root node
// Selects nodes in the document from the current node that match the selection no matter where they are
So I believe
"/AlertTable/Alerts"
would work, as that would be 'from the root node' as well as
"Report/AlertTable/Alerts"
XPath Site
Figured this sucker out.
It had to do with the namespace of "Microsoft.SystemCenter.DataWarehouse.Report.Alert". Changing this to anything but that won't read the XML properly.
XmlDocument log = new XmlDocument();
log.Load(#"C:\Users\barranca\Desktop\test.xml");
// XmlNodeList xml = log.SelectNodes("//ns1:Alerts");
var name = new XmlNamespaceManager(log.NameTable);
name.AddNamespace("ns1", "Microsoft.SystemCenter.DataWarehouse.Report.Alert");
XmlNodeList xml = log.SelectNodes("//ns1:Alert", name);
foreach (XmlNode alert in xml)
{
Console.Write("HERE");
XmlNode test = alert.SelectSingleNode("//ns1:AlertName",name);
string testing = test.InnerText;
Console.Write(testing);
}

Parsing html -> xml and querying with Xpath

I want to parse a html page to get some data.
First, I convert it to XML document using SgmlReader.
Then, I load the result to XMLDocument and then navigate through XPath:
//contains html document
var loadedFile = LoadWebPage();
...
Sgml.SgmlReader sgmlReader = new Sgml.SgmlReader();
sgmlReader.DocType = "HTML";
sgmlReader.WhitespaceHandling = WhitespaceHandling.All;
sgmlReader.CaseFolding = Sgml.CaseFolding.ToLower;
sgmlReader.InputStream = new StringReader(loadedFile);
XmlDocument doc = new XmlDocument();
doc.PreserveWhitespace = true;
doc.XmlResolver = null;
doc.Load(sgmlReader);
This code works fine for most cases, except on this site - www.arrow.com (try to search something like OP295GS). I can get a table with result using the following XPath:
var node = doc.SelectSingleNode(".//*[#id='results-table']");
This gives me a node with several child nodes:
[0] {Element, Name="thead"}
[1] {Element, Name="tbody"}
[2] {Element, Name="tbody"}
FirstChild {Element, Name="thead"}
Ok, let's try to get some child nodes using XPath. But this doesn't work:
var childNodes = node.SelectNodes("tbody");
//childnodes.Count = 0
This also:
var childNode = node.SelectSingleNode("thead");
// childNode = null
And even this:
var childNode = doc.SelectSingleNode(".//*[#id='results-table']/thead")
What can be wrong in Xpath queries?
I've just tried to parse that HTML page with Html Agility Pack and my XPath queries work good. But my application use XmlDocument inside, Html Agility Pack doesn't suit me.
I even tried the following trick with Html Agility Pack, but Xpath queries doesn't work also:
//let's parse and convert HTML document using HTML Agility Pack and then load
//the result to XmlDocument
HtmlDocument xmlDocument = new HtmlDocument();
xmlDocument.OptionOutputAsXml = true;
xmlDocument.Load(new StringReader(webPage));
XmlDocument document = new XmlDocument();
document.LoadXml(xmlDocument.DocumentNode.InnerHtml);
Perhaps, web page contains errors (not all tags are closed and so on), but in spite of this I can see child nodes (through Quick Watch in Visual Studio), but cannot access them through XPath.
My XPath queries works correctly in Firefox + FirePath + XPather plugins, but don't work in .net XmlDocument :(
I have not used SqmlReader, but every time I have seen this problem it has been due to namespaces. A quick look at the HTML on www.arrow.com shows that this node has a namespace (note the xmlns:javaurlencoder):
<form name="CatSearchForm" method="post" action="http://components.arrow.com/part/search/OP295GS" xmlns:javaurlencoder="java.net.URLEncoder">
This code is how I loop through all nodes in a document to see which ones have namespaces and which don't. If the node you are looking for or any of its parents have namespaces, you must create a XmlNamespaceManager and pass it along with your call to SelectNodes().
This is kind of annoying, so another idea might be to strip all the xmlns: attributes out of the XML before loading it into a XmlDocument. Then, you won't need to fool with XmlNamespaceManager!
XmlDocument doc = new XmlDocument();
doc.Load(#"C:\temp\X.loadtest.xml");
Dictionary<string, string> namespaces = new Dictionary<string, string>();
XmlNodeList nlAllNodes = doc.SelectNodes("//*");
foreach (XmlNode n in nlAllNodes)
{
if (n.NodeType != XmlNodeType.Element) continue;
if (!String.IsNullOrEmpty(n.NamespaceURI) && !namespaces.ContainsKey(n.Name))
{
namespaces.Add(n.Name, n.NamespaceURI);
}
}
// Inspect the namespaces dictionary to write the code below
XmlNamespaceManager nMgr = new XmlNamespaceManager(doc.NameTable);
// Sometimes this works
nMgr.AddNamespace("ns1", doc.DocumentElement.NamespaceURI);
// You can make the first param whatever you want, it just must match in XPath queries
nMgr.AddNamespace("javaurlencoder", "java.net.URLEncoder");
XmlNodeList iter = doc.SelectNodes("//ns1:TestProfile", nMgr);
foreach (XmlNode n in iter)
{
// Do stuff
}
To be honest when I am trying to get information from a website I use regex.
Ok Kore Nordmann (in his php blog) thinks, this is not good. But some of the comments tell differently.
http://kore-nordmann.de/blog/0081_parse_html_extract_data_from_html.html
http://kore-nordmann.de/blog/do_NOT_parse_using_regexp.html
But it is in php, so sorry for this =) Hope it helps anyway.

Categories

Resources