How to download images from response of Readability Parser API in C# - c#

I'm using the Readability Parser API to get the content of a page.
Once the result is received, the content goes to kindlegen.exe (to generate a .mobi file) and then to my Kindle via email. The problem is that the content I get from the Readability Parser API contains <img> tags pointing to remote images, so I need to download them first and only then launch kindlegen.exe.
The question is how to download the remote images from the article to my disk efficiently. The only solution I can see is to use a regexp to parse the response and extract the <img> tags, then extract the src attribute and finally download the images, but that's definitely the worst way.
I'm using ASP.NET MVC.

Looks like I need HtmlAgilityPack. I'll move this task out of the web application and into a console application.
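
A minimal sketch of the HtmlAgilityPack approach, assuming the HtmlAgilityPack NuGet package is referenced and the article HTML is already in a string. The class name ArticleImageLocalizer, the local file naming scheme (img0, img1, ...) and the output file name article.html are illustrative assumptions, not part of the Readability API or of kindlegen.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack; // third-party NuGet package

public static class ArticleImageLocalizer
{
    // Rewrites each <img src> to a local file name and returns a map of
    // absolute remote URL -> local file name. Performs no network access.
    public static Dictionary<Uri, string> RewriteImageSources(HtmlDocument doc, Uri articleUri)
    {
        var map = new Dictionary<Uri, string>();

        // SelectNodes returns null (not an empty list) when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes("//img[@src]");
        if (nodes == null) return map;

        foreach (var img in nodes)
        {
            string src = img.GetAttributeValue("src", "");
            if (string.IsNullOrWhiteSpace(src)) continue;

            // Resolve relative URLs against the article's own address.
            Uri absolute;
            if (!Uri.TryCreate(articleUri, src, out absolute)) continue;

            // Skip data: URIs and anything else that is not plain HTTP(S).
            if (absolute.Scheme != Uri.UriSchemeHttp && absolute.Scheme != Uri.UriSchemeHttps) continue;

            string local;
            if (!map.TryGetValue(absolute, out local))
            {
                // Hypothetical naming scheme: img0.jpg, img1.png, ...
                local = "img" + map.Count + Path.GetExtension(absolute.AbsolutePath);
                map[absolute] = local;
            }
            img.SetAttributeValue("src", local);
        }
        return map;
    }

    // Downloads every image next to the rewritten HTML so a later tool
    // can resolve the <img> tags from disk.
    public static async Task SaveArticleAsync(string html, Uri articleUri, string outputDir)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        var map = RewriteImageSources(doc, articleUri);

        Directory.CreateDirectory(outputDir);
        using (var http = new HttpClient())
        {
            foreach (var pair in map)
            {
                byte[] bytes = await http.GetByteArrayAsync(pair.Key);
                File.WriteAllBytes(Path.Combine(outputDir, pair.Value), bytes);
            }
        }
        File.WriteAllText(Path.Combine(outputDir, "article.html"), doc.DocumentNode.OuterHtml);
    }
}
```

Keeping the rewriting step separate from the download step means the HTML handling can be checked without a network connection; error handling for failed downloads is left out of the sketch.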

Related

c# webclient.DownloadFileTaskAsync downloads a corrupted 1KB PDF

I have a WebClient created in a WebBrowser_Navigating event handler. The handler stops navigation (to prevent the manual file download dialog) and passes the referred URL to the WebClient's DownloadFileTaskAsync method.
await client.DownloadFileTaskAsync(e.Url, AppDomain.CurrentDomain.BaseDirectory + "\\SUCCESS.pdf");
I have already set the SecurityProtocolType to Tls12 and passed all cookies and other headers to the webclient.
The file expected is about 11 MB large.
I assume you are downloading an HTML page instead of the actual file. If that is the case, you will need to scrape the download link using an HTML parser and XPath to navigate the HTML (for example, the HTML Agility Pack).
If that's not the case, can you print out exactly what e.Url contains, to see which URL you are trying to access with await client.DownloadFileTaskAsync(...)?
Another possible problem is that you don't correctly dispose of your WebClient, which might interfere with the file creation. It would help if you added more of your code to the question.
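
One way to test the "you downloaded an HTML page" theory is to look at the first bytes of the saved file. The sketch below is a heuristic only: it relies on PDF files starting with the marker %PDF- and on HTML pages usually starting with a doctype or <html> tag. The class and method names are made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;

public static class DownloadCheck
{
    // PDF files begin with the ASCII marker "%PDF-".
    public static bool LooksLikePdf(byte[] head)
    {
        return head.Length >= 5 && Encoding.ASCII.GetString(head, 0, 5) == "%PDF-";
    }

    // Rough check for an HTML page (login form, error page, redirect stub).
    public static bool LooksLikeHtml(byte[] head)
    {
        string text = Encoding.ASCII.GetString(head).TrimStart().ToLowerInvariant();
        return text.StartsWith("<!doctype html", StringComparison.Ordinal)
            || text.StartsWith("<html", StringComparison.Ordinal);
    }

    // Reads up to 'count' bytes from the start of the file.
    public static byte[] ReadHead(string path, int count = 256)
    {
        using (var stream = File.OpenRead(path))
        {
            var buffer = new byte[count];
            int read = stream.Read(buffer, 0, buffer.Length);
            Array.Resize(ref buffer, read);
            return buffer;
        }
    }
}
```

Calling DownloadCheck.LooksLikeHtml(DownloadCheck.ReadHead(path)) on the 1 KB file would indicate whether the server answered with a page rather than the document.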

Handling Authentication for a File Display using a Web Service

This is my first time developing this kind of system, so many of these concepts are very new to me. Any and all help would be appreciated. I'll try to sum up what I'm doing as efficiently as possible.
Background: I have a web application running AngularJS with Bootstrap. The app communicates with the server and DB through a web service programmed using C#. On the site, users can upload files and reference them later using direct links. There's no restriction to file type (yet), so just about anything is allowed.
My Goal: Having direct links creates a big security problem for me, since the documents/images are supposed to be private data. What I would prefer to do is validate a user's credentials when the link is clicked, then load the file in the browser using a more generic url path.
--Example--
"mysite.com/attachments/1" ---> (Image)
--instead of--
"mysite.com/data/files/importantImg.jpg"
Where I'm At: Not very far. My first thought was to add a page that sends the server request and receives a file byte stream along with mime type that I can reassemble and present to the user. However, I have no idea if this is possible using a web service that sends JSON requests, nor do I have a clue about how the reassembling process would work client-side.
Like I said, I'll take any and all advice. I'd love to learn more about this subject for future projects as well, but for now I just need to be pointed in the right direction.
Your first thought is correct. To do it, you need to use the Response object, and more specifically its AddHeader and Write functions. Of course, this will be a separate page that only handles file downloads, so it can sit alongside your JSON web service without any problem.
I don't think you want to do this with a web service. Just use a regular IHttpHandler to perform the validation and return the data. So you would have the URL "attachments/1" get rewritten to "attachments/download.ashx?id=1". Once you've verified access, write the data to the response stream. You can use the Content-Disposition header to set the file name.
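
A rough sketch of such a handler for classic ASP.NET (System.Web), assuming the application already authenticates users so that context.User is populated. UserMayRead and ResolvePath are hypothetical placeholders for your own permission check and database lookup; as written they deny everything.

```csharp
using System;
using System.IO;
using System.Web; // classic ASP.NET, not ASP.NET Core

public class AttachmentHandler : IHttpHandler
{
    public bool IsReusable { get { return false; } }

    public void ProcessRequest(HttpContext context)
    {
        int id;
        if (!int.TryParse(context.Request.QueryString["id"], out id))
        {
            context.Response.StatusCode = 400; // missing or malformed id
            return;
        }

        var user = context.User;
        if (user == null || !user.Identity.IsAuthenticated || !UserMayRead(user.Identity.Name, id))
        {
            context.Response.StatusCode = 403; // not allowed to see this attachment
            return;
        }

        string path = ResolvePath(id);
        if (path == null || !File.Exists(path))
        {
            context.Response.StatusCode = 404;
            return;
        }

        // The real path never reaches the client; only the generic URL does.
        // Assumes stored file names contain no quotes or control characters.
        context.Response.ContentType = MimeMapping.GetMimeMapping(path);
        context.Response.AddHeader("Content-Disposition",
            "inline; filename=\"" + Path.GetFileName(path) + "\"");
        context.Response.TransmitFile(path);
    }

    // Hypothetical: replace with a real permission check. Denies by default.
    private static bool UserMayRead(string userName, int attachmentId)
    {
        return false;
    }

    // Hypothetical: replace with a lookup of the stored file path by id.
    private static string ResolvePath(int attachmentId)
    {
        return null;
    }
}
```

Using "inline" asks the browser to display the file where it can; "attachment" would prompt a download instead. The files themselves would also need to live outside any directly served folder for the check to mean anything.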

How can I use HTMLAgilityPack to download a CSV file?

I have the code to go to a website, login with the hidden fields and cookies and include a browser header so that I appear as a normal user.
Now that I am in the protected content I need to download a csv file that I have found within the document using HTMLAgilityPack.
I would like to grab the csv with HTMLAgilityPack so that I can continue to use the cookies and browser user-agent string already setup.
From what I have read HTMLAgilityPack parses the dom. I would expect a csv file to cause an error and return null. But I have seen vague references of being able to grab the raw data of the page/file requested before it is parsed. If so, that would be the solution but I cannot find how to do that.
You don't need to use HtmlAgilityPack at all, assuming the HTML form you're submitting is constant. Just craft the HTTP request manually and submit it, then download the corresponding CSV file using an HttpWebRequest.
HtmlAgilityPack is only used for working with HTML you already have in your possession. It does include an ability to make basic HTTP requests, but that's a convenience feature. Generally you should use HttpWebRequest where possible.
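
A minimal sketch of the HttpWebRequest route, assuming the login step already filled a CookieContainer and that the CSV address was extracted from the parsed page. The class and parameter names are illustrative.

```csharp
using System;
using System.IO;
using System.Net;

public static class CsvDownloader
{
    // Builds a GET request that reuses the session cookies and the
    // browser-like User-Agent from the earlier login requests.
    public static HttpWebRequest BuildRequest(Uri csvUri, CookieContainer cookies, string userAgent)
    {
        var request = (HttpWebRequest)WebRequest.Create(csvUri);
        request.Method = "GET";
        request.CookieContainer = cookies;
        request.UserAgent = userAgent;
        return request;
    }

    // Streams the raw response body to disk without any HTML parsing.
    public static void Download(Uri csvUri, CookieContainer cookies, string userAgent, string targetPath)
    {
        var request = BuildRequest(csvUri, cookies, userAgent);
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var body = response.GetResponseStream())
        using (var file = File.Create(targetPath))
        {
            body.CopyTo(file);
        }
    }
}
```

Because the same CookieContainer instance is passed in, the server sees the download as part of the logged-in session.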

Retrieve data from a website

I want to grab a set of data from a site into my C# application. I've referred to some sites and articles using the WebClient class.
But the problem is the data I want is in a news bar made using flash. Is it possible to grab the data from it? The data in it also keeps on updating as well.
Have you tried the Yahoo approach? The project below does just that.
It is easy to download stock data from Yahoo!. For example, copy and paste this URL into your browser address:
http://download.finance.yahoo.com/d/quotes.csv?s=YHOO+GOOG+MSFT&f=sl1d1t1c1hgvbap2
Depending on your Internet browser setting, you may be asked to save the results into a filename called "quotes.csv" or the following will appear in your browser:
http://www.codeproject.com/KB/aspnet/StockQuote.aspx?display=Normal
It is not possible to grab data from Flash directly.
One possible solution: if you dig into the embed tag of the Flash object and find a URL or RSS feed that the Flash movie appears to consume, you can read that with WebClient or (hopefully) XmlReader.

How to upload files using Yahoo uploader widget in asp.net

Many of you might have experience with the File Upload widget from the Yahoo User Interface library. The docs and the community cover how to receive the files on the server using server technologies other than ASP.NET. If anyone has used the widget in their ASP.NET pages, could you share code showing:
How to receive the uploaded file's stream/bytes and save them to a file.
How to check the integrity of the file.
How to check whether the file was received correctly.
Also, I would love to do it in a single page, because that way I would learn how to differentiate between a normal web page request and one caused by the file upload widget.
The Yahoo Upload Widget can be found here: https://developer.yahoo.com/yui/uploader/.
Have you tried looking at the posted files collection (Request.Files), though? The API looks like it does a standard POST. If it does, then just use that collection.
If it doesn't, then you need to use the InputStream property on the Request object to read the incoming bytes.
Something like Fiddler or Firebug will tell you how it's making the request. Look for a request content type of multipart/form-data.
Edit:
Checking the file's integrity and whether it was uploaded correctly is pretty much impossible. The only way I can think of is to have the user generate a hash of the file, upload both the file and the hash, and then check on the server that the hash is valid, i.e. not really practical.
All you're getting is a stream of bytes. You have to assume that when the stream ends, it ended cleanly and you got the whole file.
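
If the client were able to send a hash along with the file, the server-side half of that check could look roughly like this. SHA-256 and the hex encoding are assumptions made for the sketch; the answer above does not name an algorithm, and the class name is made up.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class UploadIntegrity
{
    // Computes the SHA-256 of the stream as a lowercase hex string.
    public static string Sha256Hex(Stream content)
    {
        using (var sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(content);
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }

    // True when the uploaded bytes hash to the value the client supplied.
    public static bool Matches(Stream content, string expectedHex)
    {
        return string.Equals(Sha256Hex(content), expectedHex.Trim(),
            StringComparison.OrdinalIgnoreCase);
    }
}
```

In an ASP.NET page, the stream would be the posted file's InputStream and the expected value an extra form field; a mismatch suggests a truncated or altered upload.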
I answered my own question with code over here.
http://labs.deeptechtons.com/asp-net-tuts/how-to-upload-files-asynchronously-using-yahoo-uploader/
