SyndicationFeed & SyndicationItem LastUpdatedTime - c#

I'm trying to use the SyndicationFeed class to read content from feeds. It works for some feeds but not for others.
For example, when I load the feed at http://www.alistapart.com/site/rss , LastUpdatedTime is 01-01-0001 for the feed itself and for every feed item.
Is there something I need to do? Or does SyndicationFeed simply not read all of the information from every website's feed?
Here is the sample code I'm using:
var feed = SyndicationFeed.Load(XmlReader.Create("http://www.alistapart.com/site/rss"));
var feedPosts = feed.Items; // every item here has the invalid LastUpdatedTime, although the website clearly shows a date for each post

You are looking at LastUpdatedTime, but the date in the feed you mentioned is stored in neither the element behind LastUpdatedTime nor the more common pubDate element. Note the namespace of the date element, which is "http://purl.org/dc/elements/1.1/" (Dublin Core).
Most of these elements are optional, so you must be able to live without them or fall back to alternative ones.
I have written podcast software, and I found the SyndicationFeed implementation poor and brittle when dealing with the variety of dates that exist in the real world.
UPDATE
"Is there a way to use the framework's classes to parse these non-standard attributes?"
Yes, have a look at Element Extensions.
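A minimal sketch of the element-extensions approach, assuming the feed carries its date in a Dublin Core date element per item. The class and method names (DublinCoreDates, ReadDcDate, LoadFromString) are made up for illustration; only the SyndicationItem.ElementExtensions collection and its members come from the framework.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Linq;
using System.ServiceModel.Syndication;
using System.Xml;
using System.Xml.Linq;

public static class DublinCoreDates
{
    // Namespace quoted in the answer above.
    const string DcNamespace = "http://purl.org/dc/elements/1.1/";

    // Returns the item's dc:date, or null when the element is missing or unparseable.
    public static DateTimeOffset? ReadDcDate(SyndicationItem item)
    {
        // Elements the formatter does not understand end up in ElementExtensions.
        var extension = item.ElementExtensions
            .FirstOrDefault(e => e.OuterName == "date" && e.OuterNamespace == DcNamespace);
        if (extension == null)
        {
            return null;
        }

        string raw = extension.GetObject<XElement>().Value;
        DateTimeOffset parsed;
        bool ok = DateTimeOffset.TryParse(
            raw, CultureInfo.InvariantCulture, DateTimeStyles.AssumeUniversal, out parsed);
        return ok ? parsed : (DateTimeOffset?)null;
    }

    // Helper for trying the code without a network call (hypothetical, for illustration).
    public static SyndicationFeed LoadFromString(string xml)
    {
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            return SyndicationFeed.Load(reader);
        }
    }
}
```

A caller could use ReadDcDate(item) as a fallback whenever item.LastUpdatedTime and item.PublishDate are both still at their default value.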

Related

Parsing a resume table with HtmlAgilityPack

Can you tell me how to parse the data from this link?
http://www.e1.ru/business/job/resume.detail.php?id=956004
I tried something like this:
var nodes = doc.DocumentNode.SelectNodes("/html[1]/body[1]/table[5]/tbody[1]/tr[1]/td[2]/table[4]/tbody[1]/tr[1]/td[1]/table[1]");
but it is not a good approach.
Abbath, I recommend using a third-party tool that can extract data from HTML and then pull out the fields you need, such as egrabber, rchilli and many more.
If you are looking for your own solution, index the complete text, treat it as XML, study the DOM structure and pick out the values you need.
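As a rough illustration of picking values out of the DOM with HtmlAgilityPack, here is a sketch that uses a short relative XPath instead of an absolute path from the document root. Absolute paths copied from browser tools often contain tbody steps that the browser inserted and that are absent from the raw HTML, so they can fail to match. The class name and the table XPath passed in are assumptions, since the structure of the real page is not shown here.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class ResumeTableParser
{
    // Returns the cell texts of every row under the table matched by tableXPath.
    // tableXPath is supplied by the caller, e.g. "//table[@class='resume']" (hypothetical).
    public static List<string[]> ReadRows(string html, string tableXPath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var rows = doc.DocumentNode.SelectNodes(tableXPath + "//tr");
        if (rows == null)
        {
            return new List<string[]>();
        }

        return rows
            .Select(row => row.Elements("td")
                .Select(td => HtmlEntity.DeEntitize(td.InnerText).Trim())
                .ToArray())
            .ToList();
    }
}
```

Anchoring the query on an attribute or on nearby label text tends to survive layout changes better than counting tables by position.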

Best way to consume an RSS feed

I'm currently working on an ASP.NET website where I want to retrieve data from an RSS feed. I can easily retrieve the data I want and show it in, for example, a Repeater control.
My problem is, that the blog (Wordpress) that I'm getting the RSS from uses \n for linebreaks which I obviously can't use in HTML. I need to replace these \n with a <br /> tag.
What I've done so far is:
SyndicationFeed myFeed = SyndicationFeed.Load(XmlReader.Create("urltofeed/"));
IEnumerable<SyndicationItem> items = myFeed.Items;
foreach (SyndicationItem item in items)
{
    Feed f = new Feed();
    f.Content = f.ConvertLineBreaks(item.Summary.Text);
    f.Title = item.Title.Text;
    feedList.Add(f);
}
rptEvents.DataSource = feedList;
rptEvents.DataBind();
I then have a Feed class with two properties, Title and Content, and a helper method that replaces \n with <br />.
However, I'm not sure whether this is a good or clean approach for getting data from an RSS feed.
Thanks in advance,
Bo
If you are averse to all the XML parsing in your code, you can also run the RSS XML schema through xsd and generate topic and feed classes in your code.
These classes should serialize to and deserialize from XML. This may be overkill, but it has worked great for me when integrating with a standard XML API from a third party.
Does it have anything to do with the type of rss feed you're consuming?
http://codex.wordpress.org/WordPress_Feeds
If the content is not coming back from the RSS feed formatted the way you wish, you have little other choice. That's definitely not a particularly bad way to do it.
This is a sane approach in my opinion.
Remember that RSS is a data format and not HTML. Replacing \n with <br /> in order to get the HTML you want is just something that has to be done every now and then. (Unless you want to use <pre>)
I find people stuffing way too much html in xml structures, not considering they will be used for other things than web pages. UI and data should be separated.
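For illustration, the helper the question describes could look like the sketch below. The class name FeedText is an assumption; the method name ConvertLineBreaks comes from the question, and the extra handling of \r\n and \r is an addition in case the feed uses Windows or old Mac line endings.

```csharp
public static class FeedText
{
    // Turns plain-text line breaks into <br /> tags for display in HTML.
    public static string ConvertLineBreaks(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return string.Empty;
        }

        // Normalize \r\n and lone \r to \n first so each break yields exactly one tag.
        return text.Replace("\r\n", "\n")
                   .Replace("\r", "\n")
                   .Replace("\n", "<br />");
    }
}
```

Keeping the conversion in a static helper, rather than on the Feed object itself, keeps the data class free of presentation concerns.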

Regex or XML Parser C#

I have some Word template (dot/dotx) files that contain XML tags along with plain text.
At run time, I need to replace the XML tags with their respective mail merge fields.
So I need to parse the document for these XML tags and replace them with merge fields.
I was using Regex to find and replace these XML tags, but it was suggested that I use an XML parser instead (see "Regex for string enclosed in <*>, C#").
The sample document looks like:
Solicitor Letter
<Tfirm/>
<Tbuilding/>
<TstreetNumber/> <TstreetName/>
For the attention of: <TContact1/> <TEmail/>
Dear <TContact1/>
RE: <Pbuilding/> <PstreetNumber/> <PstreetName/> <Pvillage/> <PTown/>
We were pleased to hear that contracts have now been exchanged in the sale of the
above property on behalf of our mutual client/s. We now have pleasure in enclosing a
copy of our invoice for your kind attention upon completion.
....
One more note, the angle brackets are typed manually by end user in the template.
I tried using XmlReader, but got an error because my documents have no root tag of their own.
Please advise whether I should stick with Regex or whether there is a way to use an XML parser.
Thank you!
Unless you can get it structured as an XML document, the tools in the .NET Libraries to read XML are going to be entirely useless.
What you have is not XML. Having a tag or two that would qualify as XML does not an XML document make. The problem is that it simply does not follow any of the rules of XML.
Moral of the story is that you will have to come up with your own method to parse this. If you like to drink the RegEx kool-aid, that'll be the best solution for ya. Of course, there are plenty of ways to skin this cat.
It looks like you aren't actually using XML, just using a token that looks similar to XML as a placeholder for replacement.
If that's the case, you should be using Regex.
I would suggest neither. Microsoft has a free library in C# specifically for modifying open xml format documents without an installation of Microsoft Office.
OpenXML SDK
Doesn't seem like XML processing to me. It's not an XML doc. It looks like straight string replacement, and for that, you're better off with a regular expression.
An XML parser doesn't help you locate XML; it only helps you understand a given piece of XML. You will need some other mechanism, perhaps a Regex, to find the XML.
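A sketch of the regex route several answers suggest, assuming the template text has already been read into a plain string (getting the text in and out of the Word file is a separate problem this does not cover). The class name, method name and dictionary of values are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TemplateFiller
{
    // Matches self-closing placeholders such as <Tfirm/> or <PstreetName />.
    static readonly Regex Placeholder = new Regex(@"<(\w+)\s*/>", RegexOptions.Compiled);

    // Replaces every known placeholder with its value; unknown ones are left untouched.
    public static string Fill(string template, IDictionary<string, string> values)
    {
        return Placeholder.Replace(template, match =>
        {
            string value;
            return values.TryGetValue(match.Groups[1].Value, out value)
                ? value
                : match.Value;
        });
    }
}
```

For example, Fill("Dear <TContact1/>", values) with values["TContact1"] = "Ms Smith" yields "Dear Ms Smith". Leaving unknown tokens in place makes typos in a template easy to spot in the output.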
It seems that the authors of most replies didn't read the question carefully.
inutan is asking for something that will parse Word documents. If a Word document is saved in docx format, it is actually a ZIP package of XML files that can be read with XmlReader or XPathReader; however, I would not recommend doing that.
Normally, mail merge with Word doesn't require any programming or XML parsing, see http://helpdesk.ua.edu/training/word/merg07.html
However, if you still want to have XML-like fields in your Word templates and replace them with values, I would suggest using Word automation objects.
Below is an example in VBA; for similar code in other languages, please refer to the MS Office development site http://msdn.microsoft.com/en-us/library/bb726434.aspx . For example, if you use .NET, you should use the Office interop assemblies, and ideally install MS Visual Studio Tools for Office http://msdn.microsoft.com/en-us/library/5s12ew2x.aspx
With Selection.Find
    .Text = "<TContact1/>"
    .Replacement.Text = "TContact1"
    .Forward = True
    .Wrap = wdFindContinue
    .Format = False
    .MatchCase = False
    .MatchWholeWord = False
    .MatchWildcards = False
    .MatchSoundsLike = False
    .MatchAllWordForms = False
End With
Selection.Find.Execute Replace:=wdReplaceAll

Effective Custom Tag Parsing with C#

I'm currently playing around with a CMS idea I've got. It's based on a MonoRail, NHibernate stack. I know there are already a million CMS solutions out there. This is more for my benefit for trying some new stuff out.
Anyway, the admin side of things is going well with a plugin architecture in full flow, however I've hit a bit of a road block with the front end template management side of things.
What I'm wanting to do is allow developers to write their own custom tags e.g.
<cms:news>
<h1><cms:news:title /></h1>
<p><cms:news:date /></p>
<cms:news:story />
</cms:news>
I believe this will give developers a great deal of flexibility.
The part I'm struggling with is the parsing of these tags. I could use reflection, however I'm worried that this may be quite intensive for every page. Has anyone else done something like this, that has a better solution?
Sorry for the lack of info guys. Here is a bit more info for you.
The above code would sit in a "page" in the CMS. The complete page markup would simply be a DB record.
Once the parser hits these tags, it would need to process them and convert them to content. In the example above, the parser would hit the cms:news tag and make a call to a function like this:
public void news()
{
// Get all of the news articles from the database
}
The cms:news:title (or cms:news.title) tag would call a function like this
public string newstitle()
{
// Return the news title for the current news element we are rendering
}
Hope this makes more sense now
Thanks
John
I think I've been looking at this all wrong.
I could basically do this by using something like the Spark View Engine's InMemoryViewFolder, with ViewComponents for the custom tags.
The tags you're considering to use are not valid XML : you can't have multiple colons in an element name (only one to separate the namespace from the local name)
Consider this instead :
<cms:news>
<h1><cms:news.title /></h1>
<p><cms:news.date /></p>
<cms:news.story />
</cms:news>
To parse this XML, there are a number of options available to you :
XmlReader
XmlDocument
XDocument (Linq to XML)
I don't think XML serialization is an option if the tags are customizable...
Anyway, I'm not sure what you're trying to achieve exactly... What would you do with those tags ? Could you be more specific in your question ?
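One way to avoid reflection on every page is a lookup table of handlers keyed by tag name. The sketch below uses XDocument-style parsing (LINQ to XML) with the dotted tag names suggested above. The namespace URI, the handler table and the sample output are all assumptions for illustration, and the news container is simply unwrapped rather than repeated once per article as a real implementation would do.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class CmsTagRenderer
{
    // Hypothetical namespace URI bound to the cms prefix.
    const string CmsUri = "urn:example:cms";
    static readonly XNamespace Cms = CmsUri;

    // Tag name -> replacement content; built once, so no reflection per request.
    static readonly Dictionary<string, Func<XElement, object>> Handlers =
        new Dictionary<string, Func<XElement, object>>
        {
            { "news", tag => tag.Nodes().ToList() },             // unwrap the container
            { "news.title", tag => new XText("Sample title") },  // stand-in for a DB lookup
            { "news.date", tag => new XText("2010-01-01") },
        };

    public static string Render(string markup)
    {
        // Wrap the fragment in a root element that declares the cms prefix.
        var root = XElement.Parse(
            "<root xmlns:cms=\"" + CmsUri + "\">" + markup + "</root>");

        // Reverse document order, so inner tags are replaced before their containers.
        var tags = root.Descendants()
                       .Where(e => e.Name.Namespace == Cms)
                       .Reverse()
                       .ToList();
        foreach (var tag in tags)
        {
            Func<XElement, object> handler;
            if (Handlers.TryGetValue(tag.Name.LocalName, out handler))
            {
                tag.ReplaceWith(handler(tag));
            }
        }

        return string.Concat(
            root.Nodes().Select(n => n.ToString(SaveOptions.DisableFormatting)));
    }
}
```

Plugins could register their own entries in the handler table at startup, which keeps the per-page cost to a dictionary lookup per tag.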

Deserializing an RSS feed in .NET

Is it practical / possible to use serialization to read data from an RSS feed? I basically want to pull information from my Netflix queue (provided from an RSS feed), and I'm trying to decide if serialization is feasible / possible, or if I should just use something like XMLReader. Also, what would be the best way to download the feed from a URL? I've never pulled files from anything but drives, so I'm not sure how to go about doing that.
If you can use LINQ, LINQ to XML is an easy way to get at the basics of an RSS feed document.
This is from something I wrote to select out a collection of anonymous types from my blog's RSS feed, for example:
protected void Page_Load(object sender, EventArgs e)
{
    XDocument feedXML = XDocument.Load("http://feeds.encosia.com/Encosia");
    var feeds = from feed in feedXML.Descendants("item")
                select new
                {
                    Title = feed.Element("title").Value,
                    Link = feed.Element("link").Value,
                    Description = feed.Element("description").Value
                };
    PostList.DataSource = feeds;
    PostList.DataBind();
}
You should be able to use something very similar against your Netflix feed.
The .NET 3.5 framework added syndication support. The System.ServiceModel.Syndication namespace provides a bunch of types to manage feeds, feed content and categories, feed formatting (RSS 2.0, Atom 1.0), etc.
http://msdn.microsoft.com/en-us/library/system.servicemodel.syndication.aspx
You have a few options for serialization, but the simplest is probably best described here:
http://msdn.microsoft.com/en-us/library/bb536530.aspx
using System.ServiceModel.Syndication;
using System.Xml;

public static SyndicationFeed GetFeed(string uri)
{
    if (string.IsNullOrEmpty(uri))
    {
        return null;
    }

    var ff = new Rss20FeedFormatter(); // for Atom you can use Atom10FeedFormatter()
    using (var xr = XmlReader.Create(uri)) // dispose the reader once the feed is read
    {
        ff.ReadFrom(xr);
    }
    return ff.Feed;
}
If you're using .NET 3.5 (LINQ to XML is not part of 3.0), I would suggest using an XmlReader to read the document into an XDocument. You can then use LINQ to XML to query it and turn the RSS feed into something usable.
Building something to de-serialize the XML is also feasible and will perform just as well (if not better) but will be more time intensive to create.
Either way will work...do what you're more comfortable with (or, if you're trying to learn XML serialization, go for it and learn something new).
Check out this link for a pretty thorough download routine.
RSS is basically a derivative of XML. I like this link for defining the RSS format. This one has a really basic sample.
Get the RSS schema from http://www.thearchitect.co.uk/schemas/rss-2_0.xsd
Generate a C# class using xsd.exe: xsd rssschema.xsd /c
At runtime, deserialize the RSS XML using the class generated above.
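To give a feel for the deserialization step, here is a sketch using XmlSerializer with small hand-written classes standing in for whatever xsd.exe would generate. The class names and the subset of elements covered are assumptions; a generated class would be more complete.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

[XmlRoot("rss")]
public class RssDocument
{
    [XmlElement("channel")]
    public RssChannel Channel { get; set; }
}

public class RssChannel
{
    [XmlElement("title")]
    public string Title { get; set; }

    // Each <item> child of <channel> becomes one list entry.
    [XmlElement("item")]
    public List<RssItem> Items { get; set; } = new List<RssItem>();
}

public class RssItem
{
    [XmlElement("title")]
    public string Title { get; set; }

    [XmlElement("link")]
    public string Link { get; set; }

    [XmlElement("description")]
    public string Description { get; set; }
}

public static class RssReader
{
    // Deserializes RSS 2.0 text into the classes above; unmapped elements are skipped.
    public static RssDocument Parse(string xml)
    {
        var serializer = new XmlSerializer(typeof(RssDocument));
        using (var reader = new StringReader(xml))
        {
            return (RssDocument)serializer.Deserialize(reader);
        }
    }
}
```

The feed text itself can be fetched from a URL first, for instance with WebClient.DownloadString, and then passed to Parse.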
