Deserializing an RSS feed in .NET - c#

Is it practical / possible to use serialization to read data from an RSS feed? I basically want to pull information from my Netflix queue (provided as an RSS feed), and I'm trying to decide if serialization is feasible / possible, or if I should just use something like XmlReader. Also, what would be the best way to download the feed from a URL? I've never pulled files from anything but drives, so I'm not sure how to go about doing that.

If you can use LINQ, LINQ to XML is an easy way to get at the basics of an RSS feed document.
This is from something I wrote to select out a collection of anonymous types from my blog's RSS feed, for example:
protected void Page_Load(object sender, EventArgs e)
{
    XDocument feedXML = XDocument.Load("http://feeds.encosia.com/Encosia");
    var feeds = from feed in feedXML.Descendants("item")
                select new
                {
                    Title = feed.Element("title").Value,
                    Link = feed.Element("link").Value,
                    Description = feed.Element("description").Value
                };
    PostList.DataSource = feeds;
    PostList.DataBind();
}
You should be able to use something very similar against your Netflix feed.

The .NET 3.5 framework added syndication support. The System.ServiceModel.Syndication namespace provides a bunch of types to manage feeds, feed content and categories, feed formatting (RSS 2.0, Atom 1.0), etc.
http://msdn.microsoft.com/en-us/library/system.servicemodel.syndication.aspx
You have a few options for serialization, but the simplest is probably best described here:
http://msdn.microsoft.com/en-us/library/bb536530.aspx

using System.ServiceModel.Syndication;
using System.Xml;

public static SyndicationFeed GetFeed(string uri)
{
    if (!string.IsNullOrEmpty(uri))
    {
        var ff = new Rss20FeedFormatter(); // for Atom you can use Atom10FeedFormatter()
        // Dispose the reader once the feed has been read into memory.
        using (var xr = XmlReader.Create(uri))
        {
            ff.ReadFrom(xr);
        }
        return ff.Feed;
    }
    return null;
}

If you're using .NET 3.5, I would suggest using an XmlReader to read the document into an XDocument (LINQ to XML is not available in .NET 3.0). You can then use LINQ to XML to query against and render the RSS feed into something usable.
Building something to deserialize the XML is also feasible and will perform just as well (if not better), but will take more time to create.
Either way will work, so do what you're more comfortable with (or, if you're trying to learn XML serialization, go for it and learn something new).

Check out this link for a pretty thorough download routine.
RSS is basically an XML-based format. I like this link for defining the RSS format. This one has a really basic sample.
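As a rough illustration of the download step, here is a minimal sketch that fetches a feed over HTTP and reads it through an XmlReader, so that the encoding declared in the XML is respected. The class and method names (FeedDownloader, Download, ItemTitles) are made up for this example, and the feed is assumed to be plain RSS 2.0 with un-namespaced item and title elements.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Xml;
using System.Xml.Linq;

public static class FeedDownloader
{
    // Fetches the feed over HTTP and parses it into an XDocument.
    // WebClient works on older framework versions; HttpClient is the
    // usual choice on newer ones.
    public static XDocument Download(string feedUrl)
    {
        using (var client = new WebClient())
        using (var stream = client.OpenRead(feedUrl))
        using (var reader = XmlReader.Create(stream))
        {
            return XDocument.Load(reader);
        }
    }

    // Returns the <title> text of every <item> in an already-loaded feed.
    public static List<string> ItemTitles(XDocument feed)
    {
        return feed.Descendants("item")
                   .Select(item => (string)item.Element("title"))
                   .ToList();
    }
}
```

Usage would be along the lines of FeedDownloader.ItemTitles(FeedDownloader.Download(url)), with url being the address of your own feed.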

Get the RSS schema from
http://www.thearchitect.co.uk/schemas/rss-2_0.xsd
Generate C# classes using xsd.exe: xsd rssschema.xsd /c
At run time, deserialize the RSS XML using the xsd and the classes generated above.
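The run-time step above might look roughly like the sketch below. The classes here are hand-written stand-ins for whatever xsd.exe would generate (the real generated names and shapes will differ), and they only cover a handful of RSS 2.0 elements.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical, trimmed-down stand-ins for the xsd.exe-generated classes.
[XmlRoot("rss")]
public class RssDocument
{
    [XmlAttribute("version")]
    public string Version { get; set; }

    [XmlElement("channel")]
    public RssChannel Channel { get; set; }
}

public class RssChannel
{
    [XmlElement("title")]
    public string Title { get; set; }

    // Repeated <item> elements map onto a list.
    [XmlElement("item")]
    public List<RssItem> Items { get; set; }
}

public class RssItem
{
    [XmlElement("title")]
    public string Title { get; set; }

    [XmlElement("link")]
    public string Link { get; set; }

    [XmlElement("description")]
    public string Description { get; set; }
}

public static class RssDeserializer
{
    // Deserializes RSS XML text into the object model above.
    public static RssDocument FromXml(string xml)
    {
        var serializer = new XmlSerializer(typeof(RssDocument));
        using (var reader = new StringReader(xml))
        {
            return (RssDocument)serializer.Deserialize(reader);
        }
    }
}
```

Elements that have no matching property are skipped by XmlSerializer by default, so a partial model like this can still read a full feed.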

Related

What is the easiest way of handling xml files with C#?

I'm developing a windows app using C#. I chose xml for data storage.
It is required to read xml file, make small changes, and then write it back to hard disk.
Now, what is the easiest way of doing this?
XLinq (LINQ to XML) is much more comfortable than the ordinary XML classes, because it is much more object oriented, supports LINQ, has lots of built-in casts and serializes to the standard ISO format.
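For the read, change and write-back cycle described in the question, a minimal LINQ to XML sketch could look like this. The SettingsFile class and the flat "child elements under the root" layout are assumptions made for the example, not something from the question.

```csharp
using System.Xml.Linq;

public static class SettingsFile
{
    // Loads the file, sets (or creates) one child element directly under
    // the root element, and writes the document back to the same path.
    public static void SetValue(string path, string name, string value)
    {
        XDocument doc = XDocument.Load(path);
        doc.Root.SetElementValue(name, value);
        doc.Save(path);
    }

    // Reads one child element's text, or null if the element is missing.
    public static string GetValue(string path, string name)
    {
        XDocument doc = XDocument.Load(path);
        return (string)doc.Root.Element(name);
    }
}
```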
The best way is to use XML Serialization where it loads the XML into a class (with various classes representing all the elements/attributes). You can then change the values in code and then serialize back to XML.
To create the classes, the best thing to do is to use xsd.exe, which can infer a schema from an existing XML document and then generate the C# classes for you from that schema.
I think the easiest way of doing it is to use the XmlDocument class:
var doc = new XmlDocument();
doc.Load("filename"); // or a Stream, TextReader or XmlReader
// do something
doc.Save("filename"); // or a Stream, TextWriter or XmlWriter
I think I found the easiest way; check out this project on CodeProject. It is easy to use, as XML elements are accessed similarly to array elements, using name strings as indexes.
Code sample to write bool property to XML:
Xmlconfig xcfg = new Xmlconfig("config.xml", true);
xcfg.Settings[this.Name]["AddDateStamp"]["bool"].boolValue = checkBoxAddStamp.Checked;
xcfg.Save("config.xml");
Sample to read the property:
Xmlconfig xcfg = new Xmlconfig("config.xml", true);
checkBoxAddStamp.Checked = xcfg.Settings[this.Name]["AddDateStamp"]["bool"].boolValue;
To write string use .Value, for int .intValue.
You can use LINQ to read XML Files as described here...
LINQ to read XML
Check out linq to XML

SyndicationFeed & SyndicationItem LastUpdatedTime

I'm trying to use the SyndicationFeed class to get content from a feed. It works fine for some feeds but not for others.
For example, when I want to get feeds from http://www.alistapart.com/site/rss , the LastUpdatedTime has a value of 01-01-0001 for all feed items and the feed itself.
Is there something that I need to do? Or is it that SyndicationFeed cannot read all the information from every website?
Some sample code that I'm using:
var feed = SyndicationFeed.Load(XmlReader.Create("http://www.alistapart.com/site/rss"));
var feedPosts = feed.Items; // here all feedPosts have the invalid LastUpdatedTime, but if I go to the website I can see that there actually is one
You are looking at LastUpdatedTime, but the date in the RSS feed you mentioned is stored in neither LastUpdatedTime nor the more common pubDate element. Note the namespace of the date element, which is "http://purl.org/dc/elements/1.1/".
Most of these elements are optional, and you must be able to live without them or use alternative ones.
I have created podcast software, and I have found the SyndicationFeed implementation very poor and brittle when dealing with the various date formats found in the real world.
UPDATE
"Is there a way to use the framework's classes to parse these non-standard attributes?"
Yes, have a look at Element Extensions.
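A hedged sketch of the element-extensions approach: it falls back to an extension element when the built-in dates are empty. The element name "date" is an assumption (the answer above only gives the namespace), and FeedDates/GetDate are names invented for this example. On .NET Core and later, System.ServiceModel.Syndication comes from a separate NuGet package.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.ServiceModel.Syndication;

public static class FeedDates
{
    private const string DublinCoreNs = "http://purl.org/dc/elements/1.1/";

    // Returns the best available date for an item, or null if none is found.
    public static DateTimeOffset? GetDate(SyndicationItem item)
    {
        // Prefer the dates the framework parsed itself.
        if (item.LastUpdatedTime != DateTimeOffset.MinValue) return item.LastUpdatedTime;
        if (item.PublishDate != DateTimeOffset.MinValue) return item.PublishDate;

        // Assumption: the feed carries its date in a "date" element in the
        // Dublin Core namespace, which SyndicationItem keeps as an extension.
        string raw = item.ElementExtensions
                         .ReadElementExtensions<string>("date", DublinCoreNs)
                         .FirstOrDefault();

        DateTimeOffset parsed;
        bool ok = DateTimeOffset.TryParse(raw, CultureInfo.InvariantCulture,
                                          DateTimeStyles.None, out parsed);
        return ok ? parsed : (DateTimeOffset?)null;
    }
}
```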

Convert HTML to XML with WP7

Simple situation: I want to search through an HTML string and get a couple of pieces of information out of it.
It gets annoying writing masses of .Substring and .IndexOf calls for each element I want to find and cut out of the HTML file.
AFAIK I'm unable to load a DLL such as HTML Tidy or HTML Agility Pack into my WP7 project, so is there a more efficient and reliable way to search through my HTML string instead of building substrings with IndexOf?
void client_OpenReadCompleted(object sender, OpenReadCompletedEventArgs e)
{
    string document = string.Empty;
    using (var reader = new StreamReader(e.Result))
        document = reader.ReadToEnd();
    string temp = document.Substring(document.IndexOf("Games Played"), (document.IndexOf("League Games") - document.IndexOf("Games Played")));
    temp = (temp.Substring(temp.IndexOf("<span>"), (temp.IndexOf("</span>") - temp.IndexOf("<span>")))).Remove(0, 6);
    Int32.TryParse(temp, out leaugeGamesPlayed);
}
Thanks for your help
Gpx
You can use the HTML Agility Pack but you need the converted version of HTML Agility Pack for the Phone. It's only available from svn repository but it works great, I use it in my app.
http://htmlagilitypack.codeplex.com/SourceControl/changeset/view/77494#
You can find two projects under trunk named HAPPhone and HAPPhoneTest. You can use the download button to the right to get the code. It uses Linq instead of XPath to work.
You could use LINQ to XML to parse the HTML and locate the elements that you're interested in, provided the markup is well-formed XML (XDocument.Parse throws on markup that is not). For example:
XDocument parsed = XDocument.Parse(document);
var spans = parsed.Descendants("span");
Beth Massi has a great blog post: Querying HTML with LINQ to XML
Assuming you're doing this because you're getting the HTML from a web site/page/server.
Don't convert it on the device.
Create a wrapper/proxy site/server/page to do the conversion for you. While this has the downside of having to create the extra service, it has the following advantages:
Code on the server will be easier to update than code within a distributed app. (Experience with parsing HTML you don't directly control will show that you will need to make changes to your parsing, as the original HTML is almost certain to throw something unexpected at you when it changes in the future.)
If you can do it once on the server, you can cache the result rather than having every instance of the app do the conversion over again.
By virtue of the above two points, the app will run faster!
If you have the HTML file at design/build time then convert it to something easier to work with and avoid unnecessary computation at run time.
As a workaround, you could consider loading the HTML into a WebBrowser control and then query the DOM via injected javascript (which calls back to .NET)

Best way to consume an RSS feed

I'm currently working on an ASP.NET website where I want to retrieve data from an RSS feed. I can easily retrieve the data I want and get it to show in, e.g., a Repeater control.
My problem is, that the blog (Wordpress) that I'm getting the RSS from uses \n for linebreaks which I obviously can't use in HTML. I need to replace these \n with a <br /> tag.
What I've done so far is:
SyndicationFeed myFeed = SyndicationFeed.Load(XmlReader.Create("urltofeed/"));
IEnumerable<SyndicationItem> items = myFeed.Items;
foreach (SyndicationItem item in items)
{
    Feed f = new Feed();
    f.Content = f.ConvertLineBreaks(item.Summary.Text);
    f.Title = item.Title.Text;
    feedList.Add(f);
}
rptEvents.DataSource = feedList;
rptEvents.DataBind();
Then I have a Feed class with two properties, Title and Content, and a helper method to replace \n with <br />.
However, I'm not sure whether this is a good/pretty approach to getting data from an RSS feed.
Thanks in advance,
Bo
If you are averse to all the XML parsing in your code, you can also run the RSS XML schema through xsd and generate a topic and feed class in your code.
These classes should serialize/deserialize to XML. This may be overkill, but it's worked great for me when integrating with a standard XML API for a third party.
Does it have anything to do with the type of rss feed you're consuming?
http://codex.wordpress.org/WordPress_Feeds
If the RSS feed is not returning the content formatted as you wish, you have little other choice. That's definitely not a particularly bad way.
This is a sane approach in my opinion.
Remember that RSS is a data format and not HTML. Replacing \n with <br /> in order to get the HTML you want is just something which has to be done every now and then. (Unless you want to use <pre>.)
I find people stuffing way too much html in xml structures, not considering they will be used for other things than web pages. UI and data should be separated.
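The question does not show the body of its ConvertLineBreaks helper, so the following is only one plausible shape for it, written as a static method on a hypothetical FeedText class. It normalises Windows and old Mac line endings first so that each line break yields exactly one tag.

```csharp
public static class FeedText
{
    // Replaces every line break (\r\n, \r or \n) with a single <br /> tag.
    public static string ConvertLineBreaks(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return string.Empty;
        }

        return text.Replace("\r\n", "\n")
                   .Replace("\r", "\n")
                   .Replace("\n", "<br />");
    }
}
```

Keeping this conversion in the presentation layer, rather than in the feed itself, matches the point above about separating UI and data.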

XML Parsing with C#?

I'm working on a project for school that involves a heavy amount of XML parsing. I'm coding in C#, but I have yet to find a "suitable" method of parsing this XML. There are several different ways I've looked at, but I haven't gotten it right yet, so I have come to you. Ideally, I'm looking for something kind of similar to Beautiful Soup in Python (sort of).
I was wondering if there was any way to convert XML like this:
<config>
    <bgimg>C:\\background.png</bgimg>
    <nodelist>
        <node>
            <oid>012345</oid>
            <image>C:\\image.png</image>
            <label>EHRV</label>
            <tooltip>
                <header>EHR Viewer</header>
                <body>Version 1.0</body>
                <icon>C:\\ico\ehrv.png</icon>
            </tooltip>
            <msgSource>8181:iqLog</msgSource>
        </node>
    </nodelist>
</config>
Into an Array/Hashtable/Dictionary/Other like this:
Array
(
    ["config"] => array
    (
        ["bgimg"] => "C:\\background.png"
        ["nodelist"] => array
        (
            ["node"] => array
            (
                ["oid"] => "012345"
                ["image"] => "C:\\image.png"
                ["label"] => "EHRV"
                ["tooltip"] => array
                (
                    ["header"] => "EHR Viewer"
                    ["body"] => "Version 1.0"
                    ["icon"] => "C:\\ico\ehrv.png"
                )
                ["msgSource"] => "8181:iqLog"
            )
        )
    )
)
Even just giving me a decent resource to look through would be really helpful. Thanks a ton.
I would look into Linq to Xml. This gives you an object structure similar to the Xml file that is fairly easy to traverse.
XmlDocument + XPath is pretty much all you ever need in .NET to parse XML.
There must be half a dozen different ways to do this in C#. My favorite uses the System.Xml namespace, particularly System.Xml.Serialization.
You use a command line tool called xsd.exe to turn an xml sample into an xsd schema file (tip: make sure your nodelist has more than one node in the sample), and then use it again on the schema to turn that into a C# class file you can load into your project and easily use with the System.Xml.Serialization.XmlSerializer class.
There's no shame in using an old-fashioned XmlDocument:
var xml = "<config>hello world</config>";
var doc = new System.Xml.XmlDocument();
doc.LoadXml(xml);
var nodes = doc.SelectNodes("/config");
You should definitely use LINQ to XML, a.k.a. XLINQ. There is a nice tool called LINQPad that you should check out. It has nice features, from a comprehensive examples library to allowing you to directly query an SQL database via LINQ to SQL. Best of all, it lets you test your queries before putting them into code.
The best approach will be dictated by what you actually want to do with the data once you've parsed it out.
If you want to pass it around in a structured-but-not-tied-to-XML fashion, XML Serialization is probably your best bet. This will also get you closest to what you've described, though you'll be dealing with an object graph rather than nested maps.
If you are just looking for a convenient format to query for specific bits of data, your best option would be LINQ to Xml. Alternatively, you could use the more traditional classes in the System.Xml namespace (starting with XmlDocument) and query using XPath.
You could also use any of these techniques (or an XmlTextReader) as building blocks to create the datastructure you've described but, barring some special need, I don't think it'll give you any more versatility than what the other approaches will.
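If the nested-map shape from the question really is what you want, a small recursive LINQ to XML helper is one way to build it. This is a sketch under a few assumptions: attributes are ignored, repeated sibling names are collected into a list, and the XmlToDictionary/ToObject names are invented here.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XmlToDictionary
{
    // Leaf elements become strings. Elements with children become a
    // Dictionary<string, object> keyed by child name. When several
    // siblings share a name, their values are stored as a List<object>.
    public static object ToObject(XElement element)
    {
        if (!element.HasElements)
        {
            return element.Value;
        }

        var map = new Dictionary<string, object>();
        foreach (var group in element.Elements().GroupBy(e => e.Name.LocalName))
        {
            List<object> values = group.Select(e => ToObject(e)).ToList();
            map[group.Key] = values.Count == 1 ? values[0] : (object)values;
        }
        return map;
    }
}
```

Calling XmlToDictionary.ToObject(XElement.Parse(xml)) on the sample config returns the contents of the root element, so the outer "config" key from the question would have to be added by the caller if it is needed.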
You can also use serialization to convert the XML text back into a strongly typed class instance.
I personally like to map XML elements to classes and vice versa using the System.Xml.Serialization.XmlSerializer class.
http://msdn.microsoft.com/es-es/library/system.xml.serialization.xmlserializer(VS.80).aspx
I personally use XPathDocument, XPathNavigator and XPathNodeIterator e.g.
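// "source" is a placeholder: a file path or URI, Stream, TextReader or XmlReader.
XPathDocument xDoc = new XPathDocument(source);
XPathNavigator xNav = xDoc.CreateNavigator();
XPathNodeIterator iterator = xNav.Select("nodes/node[@SomePredicate = 'SomeValue']");
while (iterator.MoveNext())
{
    // SelectSingleNode returns a navigator (or null), not a string.
    XPathNavigator match = iterator.Current.SelectSingleNode("nodeWithValue");
    string val = match != null ? match.Value : null;
    // etc etc
}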
Yeah, I agree.
The LINQ way is very nice.
And I especially like the way you write XML using it.
It is much simpler using the "objects in objects" way.
