How to get old RSS items like Google Reader - C#

I'm creating an RSS reader application and I need to retrieve older items from linked feeds. Some feeds return only a small number of items, and my application's polling interval is long, so it sometimes misses news.
How can I get older RSS items?
When you scroll down in Google Reader, it shows previous items.

Try this: http://www.google.com/reader/atom/feed/{feed URL}?n=5000 (replace {feed URL} with the complete URL of the RSS feed, without the braces).

I guess Google saves these items and can display them even if they are no longer in the feed. Google Reader might even show you items from before you added the feed, because feeds are probably stored globally rather than per user.

Yes, the strategy recommended by #someone can help here. Expanding on that:
Google Reader's unofficial API lets you ask for old items from feeds, but it will be very slow if you ask for a large number at once (10,000 items, for instance), so you should request them once and cache the results on your side.
If you need more than 10,000 to 20,000 items you'll probably hit timeouts on Google's side. To work around this, you can request around 1,000 items at a time (http://www.google.com/reader/atom/feed/) and use the continuation parameter for paging. I've never used it myself, but the API includes a parameter (c, for continuation) that looks promising for what you need. As described here (in the 'Atom set of items' section):
a string used for the continuation process. Each request returns not all items, but only a certain number of them. In the Atom feed you'll find a continuation string (under the name gr:continuation). Pass that string as the value of this parameter, and you'll retrieve the next items.
One more thing: you'll need to log in to Google Reader before using the API. If you want code for that, check my answer to this other question.
Hope it helps!
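The paging described above could be sketched roughly like this. Note this is illustrative only: the URL format and the gr:continuation namespace reflect how the unofficial API worked at the time, and the authentication header is left as a placeholder since logging in is covered in the other answer.

```csharp
using System;
using System.Net;
using System.Xml;

class ReaderPager
{
    // Fetches one page of items; returns the continuation token, or null when done.
    static string FetchPage(string feedUrl, string continuation, out XmlDocument page)
    {
        string url = "http://www.google.com/reader/atom/feed/"
                     + Uri.EscapeDataString(feedUrl) + "?n=1000";
        if (continuation != null)
            url += "&c=" + continuation;

        page = new XmlDocument();
        using (var client = new WebClient())
        {
            // An authenticated session is required; set the auth header obtained at login:
            // client.Headers["Authorization"] = "GoogleLogin auth=...";
            page.LoadXml(client.DownloadString(url));
        }

        var ns = new XmlNamespaceManager(page.NameTable);
        ns.AddNamespace("gr", "http://www.google.com/schemas/reader/atom/");
        XmlNode node = page.SelectSingleNode("//gr:continuation", ns);
        return node != null ? node.InnerText : null;
    }
}
```

You would call FetchPage in a loop, passing back each returned token, and stop when it comes back null.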

Since Google Reader shut down about a year ago, I'd suggest you give Superfeedr a shot if you're looking for a replacement.

Related

Get a new link for a site

Suppose I had a link like this: https://site/2019
This site updates irregularly and I want to check if there is a new entry as soon as there is one.
If https://site/2020 becomes available, I want to capture the full link as a string.
If the site doesn't contain a given element (I'm using Selenium), it should skip that link and wait for https://site/2021 to become available.
I have tried a while loop in which I passed an old link (like https://site/2020) and repeatedly checked whether https://site/2021 had become available. That turned out to be more difficult than I thought, and it failed.
I think it could be done with events, but I don't know how.
If you have any ideas, I would love to hear them.
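One simple approach is a timed polling loop rather than events: periodically issue a cheap HEAD request for the next candidate URL and only advance once it responds. A minimal sketch, using the hypothetical https://site/{year} URLs from the question:

```csharp
using System;
using System.Net;
using System.Threading;

class LinkWatcher
{
    // Returns true if the URL responds with 200 OK; false on 404 or connection errors.
    static bool PageExists(string url)
    {
        try
        {
            var request = (HttpWebRequest)WebRequest.Create(url);
            request.Method = "HEAD"; // headers only, no body download
            using (var response = (HttpWebResponse)request.GetResponse())
                return response.StatusCode == HttpStatusCode.OK;
        }
        catch (WebException)
        {
            return false; // not available yet
        }
    }

    static void Main()
    {
        int year = 2020;
        while (true)
        {
            string candidate = "https://site/" + year;
            if (PageExists(candidate))
            {
                Console.WriteLine("New entry found: " + candidate);
                year++; // start watching for the following year
            }
            Thread.Sleep(TimeSpan.FromMinutes(30)); // poll interval
        }
    }
}
```

Once a page exists, you would then load it with Selenium and check for the required element before accepting the link, per the question's requirement.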

Connecting To A Website To Look Up A Word(Compiling Mass Data/Webcrawler)

I am currently developing a word-completion application in C#, and after getting the UI up and running, keyboard hooks set, and other things of that nature, I came to the realization that I need a WordList. The only issue is, I can't seem to find one with the appropriate information. I also don't want to spend an entire week gathering and formatting a WordList by hand.
The information I want is something like "TheWord, The definition, verb/etc."
So, it hit me. Why not download a basic word list with nothing but words (I already did this; there are about 109,523 words), write a program that iterates through every word, connects to the internet, retrieves the data (definition, etc.) from some arbitrary site, and creates XML data from that information? It could be 100% automated, and I would only have to wait for maybe an hour depending on my internet connection speed.
This however, brought me to a few questions.
How should I connect to a site to look up these words? << This my actual question.
How would I read this information from the website?
Would I piss off my ISP or the website for that matter?
Is this a really bad idea? Lol.
How do you guys think I should go about this?
EDIT
Someone noticed that Dictionary.com uses the word as a suffix in the URL. This will make it easy to iterate through the word file. I also see that the webpage is XHTML (or maybe just HTML). Here is the source for the word "Cat": http://pastebin.com/hjZj6AC1
For what you marked as your actual question: you just need to download the page from the website and extract what you need.
A great tool for this is CsQuery, which lets you use jQuery selectors.
You could do something like this:
// Download and parse a page, then query it with a jQuery-style selector:
var dom = CQ.CreateFromUrl("http://www.jquery.com");
string definition = dom.Select(".definitionDiv").Text();
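Putting that together with the word-as-URL-suffix observation from the question's edit, a sketch of the whole loop might look like this. The browse URL and the ".def-content" selector are assumptions; inspect the actual page source (like the pastebin above) to find the real selector.

```csharp
using System;
using System.IO;
using CsQuery;

class WordLookup
{
    static void Main()
    {
        // One word per line, as described in the question.
        foreach (string word in File.ReadLines("wordlist.txt"))
        {
            // Assumed URL pattern: the word is the last path segment.
            var dom = CQ.CreateFromUrl("http://dictionary.reference.com/browse/" + word);

            // Assumed selector for the definition block.
            string definition = dom[".def-content"].First().Text().Trim();

            Console.WriteLine(word + ": " + definition);
        }
    }
}
```

To avoid annoying the site (your third question), add a Thread.Sleep between requests and check the site's robots.txt and terms of use first.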

Difference between origLink and Link in RSS feedback xml File

I am developing an RSS reader, and I am confused: what is the difference between the <link> and <feedburner:origLink> elements in the feed XML?
And which is better to use when navigating to the topic page?
<link>http://rss.sciam.com/~r/sciam/topic/environmental-policy/~3/PTd5RKuTV_0/</link>
<feedburner:origLink>http://www.scientificamerican.com/article/keeling-curve-co2-monitoring-project-draws-a-decent-donation/</feedburner:origLink>
Thank You.
So, <feedburner:origLink> is the original (canonical) link to the content, while <link> is the one that Feedburner wants you to use...
In practice, <feedburner:origLink> only appears in Feedburner feeds, and your reader will likely handle more than just Feedburner feeds, so your general use case would be to keep track of the standard <link> element. However, you may want to store the origLink as well, just in case the redirection dies at some point (if Feedburner goes away, for example).
Also, I hear from Google employees that they never intended to expose <feedburner:origLink>. It was initially a bug, but once it got out there they could not take it back!
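A sketch of reading both elements, falling back to the standard <link> when <feedburner:origLink> is absent (the feed URL is the Scientific American one from the question; the Feedburner namespace URI is the one its feeds declare):

```csharp
using System;
using System.Xml;

class FeedLinks
{
    // Returns the canonical origLink when present, otherwise the standard link.
    static string GetItemLink(XmlNode item, XmlNamespaceManager ns)
    {
        XmlNode orig = item.SelectSingleNode("feedburner:origLink", ns);
        if (orig != null)
            return orig.InnerText;

        XmlNode link = item.SelectSingleNode("link");
        return link != null ? link.InnerText : null;
    }

    static void Main()
    {
        var doc = new XmlDocument();
        doc.Load("http://rss.sciam.com/sciam/topic/environmental-policy");

        var ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("feedburner", "http://rssnamespace.org/feedburner/ext/1.0");

        foreach (XmlNode item in doc.SelectNodes("//item"))
            Console.WriteLine(GetItemLink(item, ns));
    }
}
```

Storing both values per item, as the answer suggests, means you can navigate via <link> normally but still recover the canonical URL if the redirect ever breaks.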

Keyword entries for videos coming back blank, subsequently deleted

UPDATE: apparently this only happens when fetching videos from a playlist feed, which is what I'm doing.
I recently noticed my youtube api requests for videos were returning blank keyword entries. I found the blog post at http://apiblog.youtube.com/2012/08/video-tags-just-for-uploaders.html, and I'm already sending requests as the channel/video owner, yet I still get blank keywords. This has the undesirable side-effect of deleting existing keywords if I make any changes to the video details, such as to descriptions or titles.
For instance, I have video series where every video will have the same description. Perfect place to use the API to run through all the vids in a list and update their details. This used to work fine. But one ill-fated day, this routine became destructive. Any time I do this now, the keywords get blanked out, and I have to go back through all of the affected vids, replacing the lost keywords by hand. I've stopped using my API-based utility since this began happening.
The descriptions and titles will get updated as desired, but the keywords get blanked out, even if I don't touch them. I recall reading somewhere in the API docs something to the effect that when you submit updates for video details, any entries not filled in will be erased. In this instance, because the keyword entries I get back are already blank, any updates I do to the video other than to the keywords cause the keywords to be deleted.
Anybody have any ideas or workarounds? If I can't continue using the API to manage keywords, I would at least like to be able to continue making updates to titles and descriptions, but that won't work right now because the keywords get deleted with any title or description updates :(
The YouTube API should absolutely return media:keywords when you make an authenticated request for a video or a feed of videos in the current account. You can test it yourself at
http://gdata.youtube.com/demo/index.html
Click Authenticate there, then make a request for Uploads -> Query, and enter default as the user name. Run that request and take a look at the responses: all the videos that actually have keywords should have a media:keywords element returned for them. (Obviously if you've already deleted the keywords for a given video, they won't be returned, so test with a newly uploaded video that you've set keywords for.)
There is an internal bug that I believe is still open that prevented media:keywords from being returned in playlist entries when you're fetching a playlist feed. Are you perhaps reading your videos from a playlist?
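For reference, here is a sketch of pulling media:keywords out of an authenticated uploads-feed response, which could help confirm whether the element is present in your feed versus your playlist feed. The file name is a placeholder for a saved response; the Atom and Media RSS namespace URIs are the standard ones used by GData feeds.

```csharp
using System;
using System.Xml;

class KeywordReader
{
    static void Main()
    {
        var doc = new XmlDocument();
        doc.Load("uploads.xml"); // a saved authenticated uploads-feed response

        var ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("atom", "http://www.w3.org/2005/Atom");
        ns.AddNamespace("media", "http://search.yahoo.com/mrss/");

        foreach (XmlNode entry in doc.SelectNodes("//atom:entry", ns))
        {
            XmlNode keywords = entry.SelectSingleNode(".//media:keywords", ns);
            Console.WriteLine(keywords != null
                ? keywords.InnerText
                : "(no keywords returned)");
        }
    }
}
```

If the uploads feed shows keywords but the playlist feed doesn't, that points to the playlist-feed bug described above.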
Actually, this is a known issue, as YouTube decided to allow keyword retrieval only for authenticated users.
This is very annoying, but I'm also currently looking for a safe way to retrieve those keywords, using Zend, without writing down my password in plain text.
Let's look for a solution together :)
YOUTUBE API : Retrieve video keywords

Get referral item (link)

We have a Sitecore website and we need to know which item provided the link that brought you to page X.
Example:
You're on page A and click a link provided by item X that will lead you to page B.
On page B we need to be able to tell that item X referred you, and thus access the item and its properties.
It could go through session, Sitecore context, or anything else; we don't even need the entire item itself, just the ID would do.
Anyone know how to accomplish this?
From the discussion in the comments you have a web-architecture problem that isn't really Sitecore specific.
You have a back end which consumes several data items to produce some HTML which is sent to the client. Each of those data items may produce links in the HTML. They may produce identical links. Only one of the items is considered the source of the HTML page.
You want to know which of those items produced the link. Your only option is to find a way of identifying the links produced. To do this you will have to add some form of tagging information to the URL produced (such as a query string) that can be interpreted when the request for the URL is processed. The items themselves don't exist on the client.
The problem would be exactly the same if your links were produced by a database query. If you wanted to know which record produced the link you'd have to add an identifier to the link.
You could probably devise a system that would allow you to identify the item most of the time (i.e. when the link clicked was unique to that page), but it would involve either caching lots of data in the session (the list of links produced and the items that produced them) or recreating the request for the referring URL. Either sounds like a lot of hassle for an imperfect solution that could feasibly slow your server down a fair amount.
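The query-string tagging described above could be sketched like this. The parameter name srcItem and the helper names are illustrative, not Sitecore APIs: you append the source item's ID when rendering the link on page A, and read it back on page B.

```csharp
using System;
using System.Web;

static class ReferralTracking
{
    // When rendering a link on page A, tag it with the ID of the item that produced it.
    public static string TagLink(string targetUrl, Guid sourceItemId)
    {
        string separator = targetUrl.Contains("?") ? "&" : "?";
        return targetUrl + separator + "srcItem=" + sourceItemId.ToString("N");
    }

    // On page B, recover the source item's ID (Guid.Empty when absent or malformed).
    public static Guid GetSourceItem(HttpRequest request)
    {
        string raw = request.QueryString["srcItem"];
        Guid id;
        return (raw != null && Guid.TryParse(raw, out id)) ? id : Guid.Empty;
    }
}
```

With the ID in hand on page B, you can fetch the item from the Sitecore database as usual.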
James is correct... your original parameters are basically impossible to satisfy.
With some hacking and replacing of the standard Sitecore providers though, you could track these. But it would be far easier to use a querystring ID of some sort.
On our system, we have 3rd party advertising links... they have client javascript which actually submits the request to a local page and then gets redirected to the target URL. So when you hover over the link, the status bar shows you "http://whatever.com"... it appears the link is going to whatever.com, but you are actually going to http://ourserver/redirect.aspx first so we can track that link, and then getting a Response.Redirect().
You could do something similar by providing your own LinkManager and including the generating item ID in the tracking URL, then redirecting to the actual page/item the user wants.
However... this seems rather convoluted and error-prone, and I would not recommend it.
