Some sites can produce a full-text RSS feed even when the original RSS address doesn't include the full text, like this site.
How can I do that?
I don't know much about C#, but I can still give a general answer on how to solve your problem. RSS feeds (almost) always link to the article, hosted on the newspaper's or blog's website, where the whole article is available. So the "RSS filler" takes the article content from the website and basically puts it back into the feed, replacing the short intro that was there.
To achieve this you need to:
parse/generate RSS/Atom feeds (I'm sure there are plenty of C# libs to do that)
find the actual article in the HTML page linked from the original RSS feed. Indeed, the linked page contains a lot of things you don't want to put in the "full" RSS feed (such as the website header, nav bar, ads, comments, Facebook like button and so on). The easiest way to do this is to use readability (a quick Google check gives this lib).
If you combine both of these, you can achieve your goal.
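A rough C# sketch of the second step, assuming the HtmlAgilityPack NuGet package rather than an actual readability port; the XPath fallback below is a naive stand-in for what a real readability library does:

using System.Net;
using HtmlAgilityPack;

class ArticleExtractor
{
    // Fetches the page linked from a feed item and keeps only the main content.
    public static string GetArticleHtml(string articleUrl)
    {
        using (var client = new WebClient())
        {
            string html = client.DownloadString(articleUrl);

            var doc = new HtmlDocument();
            doc.LoadHtml(html);

            // Naive heuristic: prefer an <article> element, otherwise fall back
            // to the whole <body>. A readability library is much smarter here.
            var node = doc.DocumentNode.SelectSingleNode("//article")
                       ?? doc.DocumentNode.SelectSingleNode("//body");

            return node != null ? node.InnerHtml : string.Empty;
        }
    }
}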
You can find one implementation of this kind of tool at http://fivefilters.org, and their source code (for older versions) is at http://code.fivefilters.org/full-text-rss/. It's in PHP, but it can give a rough idea of how to proceed.
You can get the complete script that expands a partial RSS feed into a full one from the Full Post RSS Feed website.
The steps involved are:
- Get the post URL from the RSS feed.
- Fetch the full content of the post URL; it uses curl to get the content.
- Parse the content; it uses templates for that. They keep the templates updated for the most popular websites and WordPress themes. Based on the template, the HTML content is parsed into HTML DOM objects, and the post content is located from those DOM objects.
- Finally, generate the RSS feed again with the full content.
You can check the script, which is written in PHP, to get some ideas; later you can rewrite the logic in any language.
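If you do rewrite it in C#, a minimal sketch of the same four steps could look like this, using System.ServiceModel.Syndication for the feed side; the template-based extraction (step 3) is only marked with a comment, since in practice you would plug in HtmlAgilityPack or a readability-style library there:

using System.Net;
using System.ServiceModel.Syndication;
using System.Xml;

class FullTextFeedBuilder
{
    public static void Rebuild(string feedUrl, string outputPath)
    {
        // Step 1: read the partial feed.
        SyndicationFeed feed;
        using (var reader = XmlReader.Create(feedUrl))
        {
            feed = SyndicationFeed.Load(reader);
        }

        using (var client = new WebClient())
        {
            foreach (var item in feed.Items)
            {
                // Step 2: fetch the full page behind each post link.
                string postUrl = item.Links[0].Uri.ToString();
                string pageHtml = client.DownloadString(postUrl);

                // Step 3 (elided): run template/DOM parsing here to keep
                // only the post body instead of the whole page.
                string fullHtml = pageHtml;

                // Step 4: put the full content back into the item.
                item.Content = new TextSyndicationContent(fullHtml, TextSyndicationContentKind.Html);
            }
        }

        // Write the rebuilt feed out as RSS 2.0.
        using (var writer = XmlWriter.Create(outputPath))
        {
            feed.SaveAsRss20(writer);
        }
    }
}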
Related
I would like to make an app in C# that generates some graphics about football teams in the most important championships.
Ex: FC Barcelona, or Real Madrid... in Primera Division...
For this, I need the latest data from the internet about the teams, such as team name, points, ranking...
What is the most common way to do this?
Do I have to find an RSS feed for this? Do you have any information about this?
Do I have to find a website and parse its source code?
If you can find an RSS feed for the information you want, that would be the best solution.
Otherwise, you could use the HTML Agility Pack, http://html-agility-pack.net/ (also available on NuGet). It makes it really easy to parse relevant content out of a messy HTML page. With some creative coding you can write selectors that target the content you want, even on a dynamic site whose structure changes a bit. Really handy for getting content off the web.
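As a rough illustration of that kind of scraping, here is a minimal sketch with the HTML Agility Pack; the URL, the XPath and the column order are placeholders that would have to match whatever standings page you actually use:

using System;
using System.Net;
using HtmlAgilityPack;

class StandingsScraper
{
    public static void PrintStandings()
    {
        // Placeholder URL - point this at a real standings page.
        string html = new WebClient().DownloadString("http://example.com/primera-division/standings");

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Hypothetical selector: one <tr> per team inside a standings table.
        var rows = doc.DocumentNode.SelectNodes("//table[@class='standings']//tr");
        if (rows == null) return;

        foreach (var row in rows)
        {
            var cells = row.SelectNodes("td");
            if (cells == null || cells.Count < 3) continue;

            // Assumed column order: ranking, team name, points.
            Console.WriteLine("{0}  {1}  {2}",
                cells[0].InnerText.Trim(),
                cells[1].InnerText.Trim(),
                cells[2].InnerText.Trim());
        }
    }
}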
I'm using ExpertPDF to convert a couple of web pages to PDF, and there's one that I'm having difficulties with. This page only renders content when info is POSTed to it, and the content is text and a PNG graph (the graph is the most important piece).
I tried creating a page form with an auto-submit on the body onload='' event. If I go to this page, it auto-posts to the third-party page and I get the page as I expect. But it appears ExpertPDF won't take a 'snapshot' if the page is redirected.
I tried using HttpWebRequest/Response and WebClient, but have only been able to retrieve the HTML, which doesn't include the PNG graph.
Any idea how I can create a MemoryStream that includes the HTML AND the PNG graph, or POST to the page but then somehow send ExpertPDF to that URL to take a snapshot of the posted results?
Help is greatly appreciated - I've spent too much time on this one, sniff.
Thanks!
In HTML/HTTP, the web page (the HTML) is a separate resource from any images it includes. So you would need to parse the HTML, find the URL that points to your graph, and then make a second request to that URL to get the image. (This is unless the page spits the image out inline, which is pretty rare, and if that were the case you probably wouldn't be asking.)
A quick look at ExpertPDF's FAQ page shows a question that deals specifically with your problem. I would recommend you take a look at that.
** UPDATE **
Take a look at the second FAQ question:
Q: When I convert a HTML string to PDF, the external CSS files and images are not applied in the rendered PDF document.
You can take the original (single) response from your WebClient, convert it into a string, and pass that string to ExpertPDF based on the answer to that question.
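A minimal sketch of that flow (the form field name and value below are placeholders for whatever the page actually expects): do the POST yourself, turn the response into a string, and then hand that string to ExpertPDF's convert-from-HTML-string method per the FAQ answer, passing the page's URL as the base URL so the relative link to the PNG graph resolves.

using System.Collections.Specialized;
using System.Net;
using System.Text;

class PostedPageFetcher
{
    // POSTs the form data and returns the resulting HTML as a string.
    public static string FetchPostedHtml(string targetUrl)
    {
        var form = new NameValueCollection();
        form["someField"] = "someValue"; // placeholder form fields

        using (var client = new WebClient())
        {
            byte[] responseBytes = client.UploadValues(targetUrl, "POST", form);
            string html = Encoding.UTF8.GetString(responseBytes);

            // Hand 'html' to ExpertPDF's HTML-string conversion (see their FAQ),
            // supplying targetUrl as the base URL so images and CSS resolve.
            return html;
        }
    }
}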
I am trying to make a video download application for desktop in C#.
Now the problem is that the following code works fine:
// Downloading a direct .wmv link works as expected.
WebClient webOne = new WebClient();
string temp1 = "http://www.c-sharpcorner.com/UploadFile/shivprasadk/visual-studio-and-net-tips-and-tricks-15/Media/Tip15.wmv";
webOne.DownloadFile(new Uri(temp1), "video.wmv");
But the following code doesn't:
temp1="http://www.youtube.com/watch?v=Y_..."
(in this case a 200-400 kilobyte junk file gets downloaded)
The difference between the two URLs is obvious: the first one contains the exact name of the file, while the other seems to be encrypted in some way...
I was unable to find any proper and satisfactory solution to the problem, so I would highly appreciate a little help here. Thanks.
Note:
From one of the questions here I got a link, http://youtubefisher.codeplex.com/, so I visited it, got the source code and read it. It's great work, but what I don't get is how in the world that person knew what structures and classes he had to write to download a YouTube video, and why he had to go through all that trouble. Why isn't my method working?
Someone please guide. Thanks again.
In order to download a video from YouTube, you have to find the actual video location, not the page that you use to watch the video. The http://www.youtube.com/watch?v=... URL is an HTML page (much like this one) that loads the video from its source location and displays it. Normally, you have to parse the HTML and extract the video location from it.
In your case, you found code that does this already - and lucky you, because downloading videos from YouTube is not simple at all. Looking at the link you provided in your question, the magic behind the madness is available in YoutubeService.cs / GetDownloadUrl():
http://youtubefisher.codeplex.com/SourceControl/changeset/view/68461#1113202
That method parses the HTML page returned by a YouTube watch URL and finds the actual video content. The added complexity is that YouTube videos can come in a variety of different formats.
If you need to convert the video type after downloading, I recommend FFmpeg.
EDIT: In response to your comment - you didn't look at the source code of YoutubeFisher at all, did you? I'd recommend analysing the file I mentioned (YoutubeService.cs). Having taken a quick look myself, you'll have to parse the yt.playerConfig variable within the HTML page.
Use that source to help you.
EDIT: In response to your second comment: "Actually I am trying to develop an application that can download video from any video site." You say that like it's easy - FYI, it's not. Since every video website is different, you can't just write something that will work for everything out of the box. If I had to do it, though, here's how I would: I would write custom parsers for the major video-sharing websites (Metacafe, YouTube, whatever else) so that those ones are guaranteed to work. After that, I would write a "fallover", if you will. Basically, if you're requesting a video from an unknown website, it would scour the HTML looking for known video extensions (flv, wmv, mp4, etc.) and then extract the URL from that.
You could use a regex to extract the URL in the latter case, or a combination of something like IndexOf, Substring and LastIndexOf.
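For illustration only, a naive version of that regex fallback could look like this; note that it will not work for YouTube itself, where the real stream URLs sit inside the player config rather than in plain links:

using System.Collections.Generic;
using System.Text.RegularExpressions;

class VideoLinkScanner
{
    // Matches absolute URLs that end in a known video file extension.
    static readonly Regex VideoUrlPattern = new Regex(
        @"https?://[^\s""'<>]+\.(?:flv|wmv|mp4|avi|mov)\b",
        RegexOptions.IgnoreCase);

    public static List<string> FindVideoUrls(string pageHtml)
    {
        var urls = new List<string>();
        foreach (Match match in VideoUrlPattern.Matches(pageHtml))
        {
            if (!urls.Contains(match.Value))
                urls.Add(match.Value);
        }
        return urls;
    }
}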
I found this page at CodeProject; it shows you how to make a very efficient YouTube downloader using no third-party libraries. Remember that it is sometimes necessary to slightly modify the code, as YouTube sometimes changes its web structure, which may interfere with the way your app interacts with YouTube.
Here is the link, where you can also download the C# project files and see them directly:
CodeProject - Youtube downloader using C# .NET
I was wondering how I can do something similar to what Facebook does when a link is posted, or to link-shortening services that can get the title of a page and its content.
Example:
My idea is to get only the plain text from a web page. For example, if the URL is a newspaper article, how can I get only the article's text, as shown in the image? For now I have been trying to use the HtmlAgilityPack, but I can never get the text clean.
Note this app is for Windows Phone 7.
You're on the right track with HtmlAgilityPack.
If you want all the text of the website, go for the InnerText attribute. But I suggest you go with the meta description tag (if available).
EDIT - Go for the meta description. I believe that's what Facebook is doing:
(Screenshots: Facebook link sample, and the site source.)
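A small sketch of that approach with HtmlAgilityPack; on Windows Phone 7 you would download the HTML with WebClient.DownloadStringAsync, so only the parsing of an already-downloaded string is shown here:

using System;
using System.Linq;
using HtmlAgilityPack;

class PagePreview
{
    // Returns a short description for a page, the way link previews do.
    public static string GetDescription(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Prefer the meta description; fall back to the page title.
        var meta = doc.DocumentNode.Descendants("meta")
            .FirstOrDefault(m => m.GetAttributeValue("name", "")
                .Equals("description", StringComparison.OrdinalIgnoreCase));
        if (meta != null)
            return meta.GetAttributeValue("content", string.Empty);

        var title = doc.DocumentNode.Descendants("title").FirstOrDefault();
        return title != null ? title.InnerText : string.Empty;
    }
}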
Are there any APIs available, or any reference documents?
Thanks
You could make an HTTP request to one of the various RSS feeds on their site, parse the XML and include accreditation and links back to the original stories.
The simplest way would be to pick up one of their RSS feeds, and parse it with some RSS reader library.
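For example, reading one of their feeds with System.ServiceModel.Syndication (the feed URL is a placeholder) might look roughly like this, keeping the item link so you can credit and point back to the original story:

using System;
using System.ServiceModel.Syndication;
using System.Xml;

class FeedReader
{
    public static void PrintHeadlines(string feedUrl)
    {
        using (var reader = XmlReader.Create(feedUrl))
        {
            var feed = SyndicationFeed.Load(reader);
            foreach (var item in feed.Items)
            {
                // Title plus the link back to the original story for accreditation.
                Console.WriteLine("{0} - {1}", item.Title.Text, item.Links[0].Uri);
            }
        }
    }
}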