How can I use HTMLAgilityPack to download a CSV file? - c#

I have the code to go to a website, log in with the hidden fields and cookies, and include a browser header so that I appear as a normal user.
Now that I am in the protected content, I need to download a CSV file that I have found within the document using HTMLAgilityPack.
I would like to grab the CSV with HTMLAgilityPack so that I can continue to use the cookies and browser user-agent string already set up.
From what I have read, HTMLAgilityPack parses the DOM. I would expect a CSV file to cause an error and return null. But I have seen vague references to being able to grab the raw data of the requested page/file before it is parsed. If so, that would be the solution, but I cannot find how to do that.

You don't need to use HtmlAgilityPack at all, assuming the HTML form you're submitting is constant. Just craft the HTTP request manually and submit it, then download the corresponding CSV file using an HttpWebRequest.
HtmlAgilityPack is only used for working with HTML you already have in your possession. It does include an ability to make basic HTTP requests, but that's a convenience feature. Generally you should use HttpWebRequest where possible.
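A minimal sketch of that approach, assuming the login step already filled a CookieContainer and used a fixed User-Agent string. The class name CsvDownloader and its method names are hypothetical, and the CSV URL would be whatever link was found in the protected page.

```csharp
using System;
using System.IO;
using System.Net;

// Hypothetical helper: none of these names come from the original post.
public static class CsvDownloader
{
    // Builds a GET request that reuses the cookies and User-Agent from the login step.
    public static HttpWebRequest CreateRequest(string url, CookieContainer cookies, string userAgent)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        request.CookieContainer = cookies; // same container that received the login cookies
        request.UserAgent = userAgent;     // same browser string sent during login
        return request;
    }

    // Sends the request and returns the raw response body as text; nothing is parsed as HTML.
    public static string DownloadText(HttpWebRequest request)
    {
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var stream = response.GetResponseStream())
        using (var reader = new StreamReader(stream))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Calling CreateRequest with the CSV link and then DownloadText would return the file contents as a string, which can be written to disk with File.WriteAllText.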

Related

c# webclient.DownloadFileTaskAsync downloads a corrupted 1KB PDF

I have a WebClient created in a WebBrowser_Navigating event handler. It stops navigation (to prevent the manual file download dialog) and passes the referred URL to the WebClient's DownloadFileTaskAsync method.
await client.DownloadFileTaskAsync(e.Url, AppDomain.CurrentDomain.BaseDirectory + "\\SUCCESS.pdf");
I have already set the SecurityProtocolType to Tls12 and passed all cookies and other headers to the WebClient.
The expected file is about 11 MB.
I'll assume that you are downloading the HTML page instead of the actual file. If that is the case, you will need to scrape the download link by using an HTML parser and XPath to navigate inside the HTML (for example, the HTML Agility Pack).
If that's not the case, can you print out what exactly e.Url contains, to see which URL you are trying to access with await client.DownloadFileTaskAsync(...)?
Another possible problem is that you don't correctly dispose of your WebClient, which might interfere with your file creation. It would help if you added more information about your code to your question.
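One way to test the first guess is to look at the first bytes of the 1 KB file: a real PDF starts with the ASCII bytes "%PDF-", while a login or redirect page starts with HTML markup. The sketch below is only a rough check. The class name DownloadInspector and its methods are hypothetical, and the HTML test is a heuristic, not a full content sniffer.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper for inspecting a downloaded file.
public static class DownloadInspector
{
    // PDF files begin with the ASCII signature "%PDF-".
    public static bool LooksLikePdf(byte[] head)
    {
        byte[] signature = Encoding.ASCII.GetBytes("%PDF-");
        if (head == null || head.Length < signature.Length) return false;
        for (int i = 0; i < signature.Length; i++)
        {
            if (head[i] != signature[i]) return false;
        }
        return true;
    }

    // Heuristic only: treats content that opens with a doctype or <html tag as an HTML page.
    public static bool LooksLikeHtml(byte[] head)
    {
        if (head == null) return false;
        string text = Encoding.ASCII.GetString(head).TrimStart();
        return text.StartsWith("<!doctype html", StringComparison.OrdinalIgnoreCase)
            || text.StartsWith("<html", StringComparison.OrdinalIgnoreCase);
    }

    // Reads up to 'count' bytes from the start of the file at 'path'.
    public static byte[] ReadHead(string path, int count = 64)
    {
        using (var stream = File.OpenRead(path))
        {
            var buffer = new byte[count];
            int read = stream.Read(buffer, 0, count);
            Array.Resize(ref buffer, read);
            return buffer;
        }
    }
}
```

If LooksLikeHtml returns true for the bytes from ReadHead on SUCCESS.pdf, the server most likely returned a page rather than the document.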

How do I download documents from AtTask?

I'm working on a continuing API project. The current issue at hand is to be able to download my data from the AtTask server in precisely the folder structure they exist in on the AtTask servers. I've got the folder creation working nicely; the data types between Document, Document Folder and Document Version seem to be pretty clear. I am a little disillusioned about the fact that extension isn't in the document object (that I have to refer to the document VERSION for that)... but I can see some of the reason for that from a design perspective.
The issue I'm running into now is that I need to get the file content. I originally thought from the API documentation that I'd be able to get to the file contents the same way the documentation recommends uploading it -- through the handle. Unfortunately, neither document nor docv seems to let me access the handle except to write a new file.
So that leaves me the "download URL" as the remaining option. If I build the UI strings from the API calls using my browser, I get a URL with https://attaskURL/document/download?ID=xxxx (and can also get the versionID and such). If I paste the URL into the browser where I'm logged in to the AtTask user interface, it works fine and I can download the file. If, instead, I use my C# code to do so, I get the login page returned as a stream for me to download instead of my actual file, because I'm not authenticated. I've tried creating a network credential and attaching it to the request with the username and password, but to no avail.
I imagine there are a couple of ways to solve this problem -- the easy one being to find a way to "log in" to the download site through code (which doesn't seem to use the usual network credential object in C#), OR to find a way to access the file contents through the API.
Appreciate your thoughts!
It looks like you can use the download URL if you put a session id in the URL. The details on getting a session id are here (basically just call login and a session id is returned in JSON):
http://developers.attask.com/api-docs/#Authentication
Then cram it on the end of your document download URL:
https://yourcompany.attask-ondemand.com/document/download?ID=xxxx&sessionID=abc1234
I've given this a quick test and I'm able to access a document.
You can use the downloadURL and a sessionID IF you are not using SAML authentication.
I have tried it both ways and using SAML will redirect you to the login page.
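A small sketch of the URL construction described above, assuming a session ID has already been obtained from the login call. The class name AtTaskDownload and its methods are hypothetical, and the host, document ID, and session ID are the placeholder values from the example URL.

```csharp
using System;
using System.Net;

// Hypothetical helper: not part of the AtTask API.
public static class AtTaskDownload
{
    // Appends the document ID and session ID to the download endpoint.
    public static string BuildDownloadUrl(string baseUrl, string documentId, string sessionId)
    {
        return baseUrl.TrimEnd('/')
            + "/document/download?ID=" + Uri.EscapeDataString(documentId)
            + "&sessionID=" + Uri.EscapeDataString(sessionId);
    }

    // Downloads the document to a local path. As noted above, this is not
    // expected to work when SAML authentication is in use.
    public static void DownloadTo(string url, string localPath)
    {
        using (var client = new WebClient())
        {
            client.DownloadFile(url, localPath);
        }
    }
}
```

With the placeholder values, BuildDownloadUrl("https://yourcompany.attask-ondemand.com", "xxxx", "abc1234") produces the same URL as shown in the answer.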

Read contents JSON file sitting on webserver from c# code behind

I am trying to read the contents of a JSON file sitting in my github pages repository.
I can navigate and see the file contents in my browser if I specify the url.
If I use the code here:
http://www.codeproject.com/Tips/397574/Use-Csharp-to-get-JSON-Data-from-the-Web-and-Map-i?msg=4615047#xx4615047xx
It claims to "just work", but it doesn't.
All I get back is:
<html><frameset><frame src="URL-TO-JSON-FILE"></frameset></html>
How am I supposed to read the JSON file and get its contents back as a string? I am using C#.
Once I get the JSON string back I can do the processing I need to do in c#.
EDIT:
According to rawgithub.com, those types of URLs are not to be used for production. I need this for production. How do production websites read remote JSON files that are located on a web server?
Thank you
Sometimes on GitHub, if you wish to use a file from a repository, you must change the URL to raw.github.com/, or click the Raw button and use that URL.

How to download images from response of Readability Parser API in C#

I'm using Readability Parser API to get content of the page.
After the result is received, the content goes to kindlegen.exe (to generate a .mobi) and then to my Kindle via email. The problem is that the content I get from the Readability Parser API contains <img> tags pointing to remote images, so I need to download them first and only then launch kindlegen.exe.
The question is how to download the remote images from the article to my disk in an efficient way. I can see only one solution: use a regexp to parse the response and extract each <img>, then extract the src attribute, and finally download the images, but that's definitely the worst way.
I'm using ASP.NET MVC.
Looks like I need HtmlAgilityPack. I'll move this task out of the web application and into a console application.

How to upload files using Yahoo uploader widget in asp.net

Many of you might have experience using the File Upload widget from the Yahoo User Interface library. The docs and community all know how to receive the files on the server using a server technology other than ASP.NET. If anyone has used the widget in their ASP.NET pages, could you share code showing:
How to receive the uploaded file's Stream/Bytes and save it to a file.
How to check the integrity of the file.
How to check if the file was received correctly.
Also, I would love to do it in a single page, because that way I would learn how to differentiate between a normal web page request and one caused by my file upload widget.
Yahoo Upload Widget can be Found here: https://developer.yahoo.com/yui/uploader/.
Have you tried looking at the posted files collection (Request.Files)? The API looks like it does a standard POST. If it does, then just use that collection.
If it doesn't, then you need to use the InputStream property on the request object to read the incoming bytes.
Using something like Fiddler or Firebug will tell you how it's making the request. Look for a request content type of multipart/form-data.
Edit:
Checking the file integrity and whether it was uploaded correctly is pretty much impossible. The only way I can think of is to have the user generate a hash of the file, upload both the file and the hash, and then check on the server that the hash is valid, i.e. not really practical.
All you're getting is a stream of bytes. You have to assume that when the stream ends, it ended cleanly and you got the whole file.
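If a client-supplied hash were available, the server-side comparison described above could look roughly like the sketch below. SHA-256 is an assumption (the answer does not name an algorithm), and the class name UploadIntegrity and its methods are hypothetical.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper: compares an uploaded stream against a hash sent by the client.
public static class UploadIntegrity
{
    // Returns the SHA-256 hash of the stream as lowercase hex.
    public static string ComputeSha256Hex(Stream content)
    {
        using (var sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(content);
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }

    // True when the uploaded bytes hash to the value the client claimed.
    public static bool MatchesClientHash(Stream content, string clientHashHex)
    {
        if (clientHashHex == null) return false;
        return string.Equals(ComputeSha256Hex(content), clientHashHex.Trim(),
            StringComparison.OrdinalIgnoreCase);
    }
}
```

The stream passed in could be the uploaded file's input stream. The check only proves the bytes match what the client hashed, so it still depends on the client being able to compute and send that hash.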
I answered my own question with code over here.
http://labs.deeptechtons.com/asp-net-tuts/how-to-upload-files-asynchronously-using-yahoo-uploader/
