I'm trying to download a ".aspx" file from a web server using a WebClient object and save it to the file system, but it raises an "HTTP 500 Internal Error" exception. I think this is because the server tries to render the HTML and send the rendered content rather than the file itself.
var objWebClient = new WebClient();
var remoteUrl = "someserverURL" + "default.aspx";
objWebClient.DownloadFile(remoteUrl, localPathToSave);
I tried adding HTTP headers, but I think they might not be of use because the request comes from a desktop application and not a browser. I have also set the server to serve all content as "application/octet-stream".
You can't do this.
If the web server is set up correctly, it will not allow you to directly download an aspx file.
The reason it downloads your other files, such as JPEGs and text files, is that the web server will happily serve these file types and allow them to be downloaded.
If what you are attempting were possible, anyone would be able to download the .aspx source files for any .NET site, which would be hugely insecure.
What you could do is get the rendered HTML content from the .aspx page and save that.
var webClient = new WebClient();
var remoteUrl = "someserverURL" + "default.aspx";
byte[] data = webClient.DownloadData(remoteUrl);
var utf8Encoding = new UTF8Encoding();
var html = utf8Encoding.GetString(data);
//now you could save the html to a file
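The last step in the snippet above could look roughly like the following. This is only a sketch: the class and method names (HtmlSaver, SaveHtml) are made up for illustration, and it assumes the page really is UTF-8 encoded, as the answer's code does.

```csharp
using System.IO;
using System.Text;

public static class HtmlSaver
{
    // Hypothetical helper: decodes the downloaded bytes and writes them to disk.
    // Assumption: the page is UTF-8 encoded; check the response charset if unsure.
    public static string SaveHtml(byte[] data, string localPathToSave)
    {
        string html = Encoding.UTF8.GetString(data);
        File.WriteAllText(localPathToSave, html, Encoding.UTF8);
        return html;
    }
}
```

With the variables from the answer, the call would be HtmlSaver.SaveHtml(data, localPathToSave). What lands on disk is the rendered HTML, not the .aspx source.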
So, I have been running into all kinds of CORS errors (when using HTTPS) and "Not allowed to load local resource: file:///C:/Windows/TEMP/e3ef26_75603_4.xml" when saving my file to a temp folder and then trying to serve it via AJAX to be displayed in my browser.
Basically, the scenario is that I am requesting a file from an S3 bucket. There are a couple of things that I tried:
The first was directly giving the full file path (HTTPS), with the associated bucket and file name, to an AJAX call. This is done by first generating the file path in the controller method and assigning it to a ViewBag variable. Something like:
ViewBag.currentURL = JsonConvert.SerializeObject(tempfilepath);
And associated AJAX:
$(function executeXML() {
    //console.log('@Html.Raw(ViewBag.currentURL)');
    $("#myeditor").execute({
        ajaxOptions: {
            pathtoxml: @Html.Raw(ViewBag.currentURL)
        },
    });
});
This method works quite well when the S3 bucket has public access and CORS policies are configured for the bucket.
Problem: Using this method on an S3 bucket that has no public access and no CORS policies results in the error "No 'Access-Control-Allow-Origin' header is present on the requested resource" in any browser.
Sigh! But not yet,
The second method I tried is to read the file on the server side and load it into an XML document. When I want to save this XML document, I use a temp folder. Something like this:
using (WebClient client = new WebClient())
{
    string myXMLString = client.DownloadString(fullpathstory);
    XmlDocument xml = new XmlDocument();
    xml.LoadXml(myXMLString); // suppose that myXMLString contains "<Names>...</Names>"
    // Now save the file to the temp folder
    tempfilepath = Path.Combine(Path.GetTempPath(), filename);
    xml.Save(tempfilepath);
}
This gives me a path like: file:///C:/Windows/TEMP/e3ef26_75603_4.xml
Now when I am sending this path to my AJAX, it gives me the error jquery.min.js:4 Not allowed to load local resource: file:///C:/Windows/TEMP/e3ef26_75603_4.xml which is quite obvious and expected.
Question: I am looking for a way to save my XML document in-memory and generate a path or a stream that can be read by my AJAX call and serve it on the browser.
Is there such a way, or do I need to create a proper file server where I store all my generated XML files and then read from that location? It would basically be a temp server folder, but then I would need to keep monitoring its ever-increasing size.
Thanks in advance
Rather than pre-generating the file, I would recommend generating it on demand. The moment the user issues an AJAX request for the file, the file is generated in memory, converted to a byte array, and returned to the client (as a base64-encoded string), and the download starts at the client's end.
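As a rough sketch of the in-memory part of that idea: the helper below validates the downloaded XML text and turns it into a base64 string without touching the file system. The names InMemoryXml and ToBase64 are hypothetical, and wiring the result into a controller action that answers the AJAX call is left out.

```csharp
using System;
using System.Text;
using System.Xml;

public static class InMemoryXml
{
    // Hypothetical helper: parses the XML text (so malformed input fails early),
    // then returns its UTF-8 bytes as a base64 string. Nothing is written to disk.
    public static string ToBase64(string xmlText)
    {
        var xml = new XmlDocument();
        xml.LoadXml(xmlText); // throws XmlException if the text is not well-formed
        byte[] bytes = Encoding.UTF8.GetBytes(xml.OuterXml);
        return Convert.ToBase64String(bytes);
    }
}
```

On the server, the string returned by client.DownloadString(fullpathstory) would be passed to ToBase64, and the result returned in the AJAX response for the client to decode.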
If anyone loads the URL https://de.visiblealpha.com/links/80488d55-ae41-4def-9452-bae3ac2e2b06 in a browser, an Excel file starts downloading. But when I invoke the same URL with HttpWebRequest, the Excel file does not start downloading. This is the code I tried:
string address = "https://de.visiblealpha.com/links/80488d55-ae41-4def-9452-bae3ac2e2b06";
using (WebClient client = new WebClient())
{
    client.DownloadString(address);
}
I also tried this:
string url = "https://de.visiblealpha.com/links/80488d55-ae41-4def-9452-bae3ac2e2b06";
WebRequest request = HttpWebRequest.Create(url);
WebResponse response = request.GetResponse();
StreamReader reader = new StreamReader(response.GetResponseStream());
string responseText = reader.ReadToEnd();
But I failed to reach my goal. The code executes successfully, but no Excel file starts downloading, which is what I am trying to achieve.
When I loaded the URL https://de.visiblealpha.com/links/80488d55-ae41-4def-9452-bae3ac2e2b06 into a WebBrowser control, I saw the same problem: no Excel file started downloading. Here is the code I tried:
webBrowser1.Navigate("https://de.visiblealpha.com/links/80488d55-ae41-4def-9452-bae3ac2e2b06");
webBrowser1.ScriptErrorsSuppressed = true;
I just do not understand why the Excel file is not downloaded when I invoke the very same URL.
Please tell me what I need to do so that the Excel file starts downloading on the client PC the moment I execute the URL.
Please share a working code example.
DownloadString returns the contents into a variable, that is, in memory. No file is saved on the system. If that is what you intended, there's a small change you need to make in your code:
string address = "https://de.visiblealpha.com/links/80488d55-ae41-4def-9452-bae3ac2e2b06";
using (WebClient client = new WebClient())
{
    string contents = client.DownloadString(address);
}
The variable "contents" will contain the HTML of the URL in your question. If you want it as a file, then you need to use the DownloadFile method instead. The spreadsheet itself is at a different URL.
There's an example at the end of this documentation.
In short, I need to detect a webpage's GET requests programmatically.
The long story is that my company is currently trying to write a small installer for a piece of proprietary software that installs another piece of software.
To get this other piece of software, I realize it's as simple as calling the download link through C#'s lovely WebClient class (Dir is just the Temp directory in AppData/Local):
using (WebClient client = new WebClient())
{
    client.DownloadFile("[download link]", Dir.FullName + "\\setup.exe");
}
However, the page the installer comes from is not a direct download page. The actual download link is subject to change (our company's specific installer might be hosted on a different download server another time around).
To get around this, I realized that I can just monitor the GET requests the page makes and dynamically grab the URL from there.
So, I know what I'm going to do, but I was wondering: is there a built-in part of the language that lets you see what requests a page has made? Or do I have to write this functionality myself, and if so, what would be a good starting point?
I think I'd do it like this. First download the HTML contents of the download page (the page that contains the link to download the file). Then scrape the HTML to find the download link URL. And finally, download the file from the scraped address.
using (WebClient client = new WebClient())
{
    // Get the website HTML.
    string html = client.DownloadString("http://[website that contains the download link]");
    // Scrape the HTML to find the download URL (see below)
    // and store it in downloadLink.
    // Download the desired file.
    client.DownloadFile(downloadLink, Dir.FullName + "\\setup.exe");
}
For scraping the download URL from the website I'd recommend using the HTML Agility Pack. See here for getting started with it.
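To illustrate the scraping step without an extra dependency, here is a minimal sketch that uses a regular expression instead of the HTML Agility Pack. It assumes the page exposes the installer as a plain href ending in ".exe"; the names DownloadLinkScraper and FindDownloadLink are hypothetical, and a real page may well need a proper HTML parser.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DownloadLinkScraper
{
    // Hypothetical helper: returns the first href that ends in ".exe",
    // resolved against the page URL, or null when no such link is found.
    // Assumption: the link is a plain quoted href attribute in the HTML.
    public static string FindDownloadLink(string html, string pageUrl)
    {
        Match match = Regex.Match(
            html,
            "href\\s*=\\s*[\"'](?<url>[^\"']+\\.exe)[\"']",
            RegexOptions.IgnoreCase);
        if (!match.Success)
        {
            return null;
        }
        // Relative links are resolved against the page they were found on.
        return new Uri(new Uri(pageUrl), match.Groups["url"].Value).AbsoluteUri;
    }
}
```

The returned string would then take the place of downloadLink in the DownloadFile call. A null result means the page layout did not match the assumption.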
I think you have to write your own "mediahandler", which returns a HttpResponseMessage.
E.g. with Web API 2:
[HttpGet]
[AllowAnonymous]
[Route("route")]
public HttpResponseMessage GetFile([FromUri] string path)
{
    HttpResponseMessage result = new HttpResponseMessage(HttpStatusCode.OK);
    // Stream the file from disk instead of loading it fully into memory.
    result.Content = new StreamContent(new FileStream(path, FileMode.Open, FileAccess.Read));
    // Send the original file name (including its extension) as an attachment.
    result.Content.Headers.ContentDisposition = new ContentDispositionHeaderValue("attachment") { FileName = Path.GetFileName(path) };
    result.Content.Headers.ContentType = new MediaTypeHeaderValue(MimeMapping.GetMimeMapping(Path.GetExtension(path)));
    return result;
}
For a web crawler project in C#, I am trying to execute JavaScript and Ajax to retrieve the full page source of a crawled page.
I am using an existing web crawler (Abot) that needs a valid HttpWebResponse object. Therefore I cannot simply use driver.Navigate().GoToUrl() method to retrieve the page source.
The crawler downloads the page source and I want to execute the existing Javascript/Ajax inside the source.
In a sample project I tried the following without success:
WebClient wc = new WebClient();
string content = wc.DownloadString("http://www.newegg.com/Product/Product.aspx?Item=N82E16834257697");
string tmpPath = Path.Combine(Path.GetTempPath(), "temp.htm");
File.WriteAllText(tmpPath, content);
var driverService = PhantomJSDriverService.CreateDefaultService();
var driver = new PhantomJSDriver(driverService);
driver.Navigate().GoToUrl(new Uri(tmpPath));
string renderedContent = driver.PageSource;
driver.Quit();
You need the following nuget packages to run the sample:
https://www.nuget.org/packages/phantomjs.exe/
http://www.nuget.org/packages/selenium.webdriver
The problem here is that the code stalls at GoToUrl(), and it takes several minutes until the program terminates without ever giving me driver.PageSource.
Doing this returns the correct HTML:
driver.Navigate().GoToUrl("http://www.newegg.com/Product/Product.aspx?Item=N82E16834257697");
string renderedContent = driver.PageSource;
But I don't want to download the data twice. The crawler (Abot) downloads the HTML and I just want to parse/render the javascript and ajax.
Thank you!
Without running it, I would bet you need file:/// prior to tmpPath. That is:
WebClient wc = new WebClient();
string content = wc.DownloadString("http://www.newegg.com/Product/Product.aspx?Item=N82E16834257697");
string tmpPath = Path.Combine(Path.GetTempPath(), "temp.htm");
File.WriteAllText(tmpPath, content);
var driverService = PhantomJSDriverService.CreateDefaultService();
var driver = new PhantomJSDriver(driverService);
driver.Navigate().GoToUrl(new Uri("file:///" + tmpPath));
string renderedContent = driver.PageSource;
driver.Quit();
You probably need to allow PhantomJS to make arbitrary requests. Requests are blocked when the domain/protocol doesn't match as is the case when a local file is opened.
var driverService = PhantomJSDriverService.CreateDefaultService();
driverService.LocalToRemoteUrlAccess = true;
driverService.WebSecurity = false; // may not be necessary
var driver = new PhantomJSDriver(driverService);
You might need to combine this with the solution of Dave Bush:
driver.Navigate().GoToUrl(new Uri("file:///" + tmpPath));
Some of the resources have URLs that begin with // which means that the protocol of the page is used when the browser retrieves those resources. When a local file is read, this protocol is file:// in which case none of those resources will be found. The protocol must be added to the local file in order to download all those resources.
File.WriteAllText(tmpPath, content.Replace("\"//", "\"http://"));
It is apparent from your output that you use PhantomJS 1.9.8. It may be that a newly introduced bug is responsible for this sort of thing. You should use PhantomJS 1.9.7 with driverService.SslProtocol = "tlsv1".
You should also enable the disk cache if you do this multiple times for the same domain. Otherwise, the resources are downloaded each time you try to scrape it. This can be done with driverService.DiskCache = true;
ASP.NET MVC 4 Razor:
I've been working at this for a bit, so I apologize if I'm missing something obvious, but I will truly appreciate any assistance that could be offered.
In a nutshell, what I'm looking to do is download an XML file from a URI using C#. It ought to be pretty straightforward, but the URI leads to a blank page with a download prompt popup populated with a dynamically created filename.
I can't provide the URI due to its confidential nature, but here is the code I've been toying with. (Forgive my ignorance on this matter, it's the first time I've tried anything like this)
byte[] data;
using (WebClient Client = new WebClient())
{
    data = Client.DownloadData(uriString + fileString);
}
File.WriteAllBytes(dirString + fileString, data);
I've also tried:
using (WebClient Client = new WebClient())
{
    Client.DownloadFile(uriString + fileString, dirString + fileString);
}
To be honest, this code doesn't really work for me. The downloaded files aren't correct. The XML files appear to contain the code from the webpage they've been downloaded from, and if I try something like an image, the image is broken. So, again, any assistance would be appreciated.
Thanks in advance!
The URI that you are using is probably wrong. You are using the URI that opens the popup page. The popup page should be doing another GET to the dynamically generated file.
To automate this process, you should use a WebRequest to get the contents of the popup page. Scrape the contents of the page to get the actual URL to download the file. Then use the code you have written to download the file.
var request = WebRequest.Create("PopupUrl");
var response = request.GetResponse();
// Placeholder: extract the real file URL from the popup page's response.
string url = GetUrlFromResponseByRegExOrXMLParsing();
var webClient = new WebClient();
webClient.DownloadFile(url, filePath);