Download A Non-Directed File With C# In WPF - c#

I have a small problem and I cannot figure out how to solve it.
I need to download a file from a site in my WPF application, but I can't use a direct link to the file because the file is updated daily and the URL responds with a different file each day. The link is correct and I can download the file in IDM or any web browser, but when I use C# classes like WebClient, it downloads something else, and when I open the file with Office it says that the file is corrupted. Can anyone suggest how to download my Excel file from this link using C# in WPF?
Here is the link
Another problem is that I don't know the name of the file in the response. Is it possible to figure out the file name too?
I would appreciate any response. Thanks a lot.

From the looks of it, the server is not respecting the Accept-Encoding header sent by the request. It always sends the response with gzip encoding. I was able to download the file successfully with HttpClient once I set AutomaticDecompression to GZip.
using System.IO;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        var task = DownloadFileAsync("http://members.tsetmc.com/tsev2/excel/MarketWatchPlus.aspx?d=0");
        task.Wait();
    }

    static async Task DownloadFileAsync(string url)
    {
        // The server always replies with gzip, so let the handler decompress it.
        HttpClient client = new HttpClient(new HttpClientHandler { AutomaticDecompression = DecompressionMethods.GZip });
        HttpResponseMessage response = await client.GetAsync(url);
        // Get the file name from the content-disposition header.
        // This is nasty because of bug in .net: http://stackoverflow.com/questions/21008499/httpresponsemessage-content-headers-contentdisposition-is-null
        string fileName = response.Content.Headers.GetValues("Content-Disposition")
            .Select(h => Regex.Match(h, @"(?<=filename=).+$").Value)
            .FirstOrDefault()
            .Replace('/', '_'); // The name contains slashes, which are not valid in a file name.
        using (FileStream file = File.Create(fileName))
        {
            await response.Content.CopyToAsync(file);
        }
    }
}

I had a look at the http headers that the server sends back:
Content-Encoding: gzip
Vary: *
Content-Disposition: attachment; filename=MarketWatchPlus-1393/4/25.xlsx
Content-Length: 77100
Cache-Control: public, max-age=60
Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
Date: Tue, 15 Jul 2014 20:59:46 GMT
Expires: Tue, 15 Jul 2014 21:00:44 GMT
Last-Modified: Tue, 15 Jul 2014 20:59:44 GMT
Server: Microsoft-IIS/7.5
X-Powered-By: ASP.NET
Notice how it says gzip there. I renamed the file to .gz and unzipped it, and it looks fine then. It seems that while web browsers sort that out by themselves, the WebRequest classes do not.
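If the gzip-encoded bytes have already been downloaded (for example with WebClient), they can probably be unpacked in code instead of renaming the file by hand. The following is a minimal sketch, not the answerer's code: the class name GzipSketch and the Compress helper are made up for the demo, and the sample payload is a stand-in for the real server response.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

static class GzipSketch
{
    // Decompresses a gzip-encoded buffer, e.g. the raw bytes a download returned.
    public static byte[] Decompress(byte[] gzipped)
    {
        using (var input = new MemoryStream(gzipped))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return output.ToArray();
        }
    }

    // Demo-only helper (an assumption): produces gzip bytes standing in for the server response.
    public static byte[] Compress(byte[] plain)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(plain, 0, plain.Length);
            }
            // ToArray still works after the GZipStream has closed the MemoryStream.
            return output.ToArray();
        }
    }

    static void Main()
    {
        byte[] original = Encoding.UTF8.GetBytes("pretend this is the .xlsx payload");
        byte[] roundTripped = Decompress(Compress(original));
        Console.WriteLine(Encoding.UTF8.GetString(roundTripped));
    }
}
```

In a real download, the result of Decompress would be written to disk with File.WriteAllBytes instead of being printed.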
To answer your second question, the file name is in the headers as well.
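As a rough illustration of pulling the name out of that header, here is a small sketch. The class name, the ExtractFileName helper, and the fallback name are assumptions for the example; the header value is the one shown above. The slashes are replaced because they are not allowed in a file name.

```csharp
using System;
using System.Text.RegularExpressions;

static class ContentDispositionSketch
{
    // Returns the filename parameter of a Content-Disposition value,
    // with path separators replaced, or the fallback if none is present.
    public static string ExtractFileName(string headerValue, string fallback)
    {
        Match match = Regex.Match(headerValue, @"filename=""?([^"";]+)""?");
        if (!match.Success)
        {
            return fallback;
        }
        return match.Groups[1].Value.Trim().Replace('/', '_').Replace('\\', '_');
    }

    static void Main()
    {
        string header = "attachment; filename=MarketWatchPlus-1393/4/25.xlsx";
        Console.WriteLine(ExtractFileName(header, "download.bin"));
        // prints: MarketWatchPlus-1393_4_25.xlsx
    }
}
```

Returning a fallback avoids a NullReferenceException when the server sends no file name at all.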

Related

Why is FileContentResult putting the file contents into the response body instead of putting through a file download?

I'm trying to export data from my database to CSV using return File(bytes, contentType, fileName);
This code is based on the accepted answer here.
I'm expecting a download to start in my browser, but what happens instead is that the intended contents of the file get written out to the response body and the response headers have a weird value for content-disposition.
Having updated my response by manually adding a content disposition and sending the return value, I see the content disposition now correctly reads content-disposition: attachment; filename=foo.csv
I'm still not getting my download so I replaced the ; with a , but that didn't help either.
Here are the headers:
content-disposition: attachment, filename=foo.csv
content-encoding: gzip
content-type: text/csv
date: Wed, 21 Apr 2021 06:54:40 GMT
server: Microsoft-IIS/10.0
vary: Accept-Encoding
x-powered-by: ASP.NET
The response does contain the data though...
So the question is how on earth do I get this data output to an actual file that'll be automatically downloaded on the browser?
I'd like to avoid saving files to the server first if at all possible because it's a bit of a nightmare in regard to maintaining.
I feel like a bit of a noob right now because I discovered the cause of the problem was nothing at all to do with the way the server was handling the request or building the response.
The problem is the way I was handling it on the front end.
$.get("/r/foo/export" + args, null, function (data) {
    console.log(data);
});
My understanding was that this would trigger the download since I'm requesting it on this URL, but this isn't sufficient or correct.
Changing this ajax GET request to a redirect triggers the download in the browser perfectly.
location.href = "/r/foo/export" + args;
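As a side note on the header itself: the disposition type and its parameters are separated by a semicolon, so swapping in a comma would not be expected to help. A minimal sketch of building the value with the ContentDispositionHeaderValue class from System.Net.Http.Headers follows; the class name DispositionSketch and the Build helper are made up for the example, and this is not the code from the question.

```csharp
using System;
using System.Net.Http.Headers;

static class DispositionSketch
{
    // Builds an "attachment" disposition carrying the given file name.
    public static ContentDispositionHeaderValue Build(string fileName)
    {
        return new ContentDispositionHeaderValue("attachment") { FileName = fileName };
    }

    static void Main()
    {
        ContentDispositionHeaderValue value = Build("foo.csv");
        // The string form joins the type and the filename parameter with "; ".
        Console.WriteLine(value.ToString());
    }
}
```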

Respond with SUCCESS in header reply

I am responding to a GET request from a field device with the following:
var reply = new HttpResponseMessage(System.Net.HttpStatusCode.OK)
{
    Content = new StringContent("SUCCESS")
};
The word appears in the body of the message, but the field device says that the request was not successful.
From this, I can gather that I shouldn't be using HttpResponseMessage but some other means.
The suggested reply needs to look like this:
HTTP/1.1 200 OK<CR><LF>
Date: Mon, 15 Feb 2016 11:34:50 GMT<CR><LF>
Server: Apache/2.2.31 (Win32) mod_ssl/2.2.31 OpenSSL/1.0.2f PHP/5.4.45<CR><LF>
X-Powered-By: PHP/5.4.45<CR><LF>
Content-Length: 7<CR><LF>
Keep-Alive: timeout=5, max=100<CR><LF>
Connection: Keep-Alive<CR><LF>
Content-Type: text/plain<CR><LF>
<CR><LF>
SUCCESS<CR><LF>
BTW, I am using a Microsoft stack for this, not PHP; the sample above is from the manual.
Should I be using HttpRequestMessage ?
The part I am concerned with is <CR><LF>SUCCESS<CR><LF>
So the issue was that the device required a time sync with my server before it could send data later on. So there never was an issue with the response.

Is Dot Net HttpClient Unexpectedly Caching Responses?

I'm attempting to write a curl-like tool that demonstrates the effect of various HTTP caching headers on .NET's HttpClient class.
In my initial attempt I'm pointing the tool at one of my internal web services that does not specify any caching information in the response and examining the header of the response.
I expect to see that the request is re-sent each time and executed on the server, returning a new but identical set of content each time (for the purpose of this test, the content is static on the server). But, instead, each request after the first returns much more quickly than the first and includes a new header Age that was not present in the very first response. This indicates to me that the HttpClient in my command-line tool is returning the response from cache, not placing a new request.
Here is the first request with the response headers:
HTTP:>GET http://myserver:8058/path1/path2
Status 200 OK (OK in 00:00:00.3235905):
Date = Sat, 08 Jul 2017 15:55:22 GMT
Server = Microsoft-HTTPAPI/2.0
Content-Length = 150867
Content-Type = application/json; charset=utf-8
and here is the request from the same session of my curl tool, a little while later:
HTTP:>GET http://myserver:8058/path1/path2
Status 200 OK (OK in 00:00:00.0188433):
Date = Sat, 08 Jul 2017 15:55:22 GMT
Server = Microsoft-HTTPAPI/2.0
Age = 312
Content-Length = 150867
Content-Type = application/json; charset=utf-8
and finally, after I stop and start my program, here's another request from the new instance:
HTTP:>GET http://myserver:8058/path1/path2
Status 200 OK (OK in 00:00:00.0517271):
Date = Sat, 08 Jul 2017 15:55:22 GMT
Server = Microsoft-HTTPAPI/2.0
Age = 528
Content-Length = 150867
Content-Type = application/json; charset=utf-8
The last one I find even more difficult to understand as I was under the impression (from reading this: https://aspnetmonsters.com/2016/08/2016-08-27-httpclientwrong/) that caching is maintained per instance of HttpClient.
This seems to continue forever with Age increasing each request. The only way to get back to the original response is to use Internet Explorer and delete temporary internet files.
[Additional Info] After leaving my command line application open for a couple of hours I repeated the request and received a response identical to the original, without the Age header. So it appears that, if HttpClient was caching the response, that cache expired after a couple of hours.
Can anyone tell me if I'm correct that HttpClient is performing internal caching in this case, and if so, why it's doing so in the absence of any caching-related response headers and what policy it's using?
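In HTTP, an Age header indicates that a response was served by a cache rather than freshly generated by the origin server. One way to probe whether some cache sits in the path might be to send Cache-Control: no-cache with the request and check whether the Age header disappears. The sketch below only builds such a request and does not send it; the class name and the BuildNoCacheRequest helper are made up for the example, and it does not by itself show which component is doing the caching.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;

static class NoCacheSketch
{
    // Builds a GET request that asks caches to revalidate instead of answering from storage.
    public static HttpRequestMessage BuildNoCacheRequest(string url)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, url);
        request.Headers.CacheControl = new CacheControlHeaderValue { NoCache = true };
        return request;
    }

    static void Main()
    {
        using (HttpRequestMessage request = BuildNoCacheRequest("http://myserver:8058/path1/path2"))
        {
            // Show the header value that would be sent.
            Console.WriteLine(request.Headers.CacheControl);
        }
    }
}
```

The request would then be passed to HttpClient.SendAsync in place of a plain GetAsync call.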

Would like to get http response results like Fiddler

I'm trying to get the same type of results that Fiddler gets when I launch a webpage from my app.
Below is the code I'm using and the results I'm getting. I've used google.com only as an example.
What do I need to modify in my code to get the results I want or do I need an entirely different approach?
Thanks for your help.
My code:
// create the HttpWebRequest object
HttpWebRequest objRequest = (HttpWebRequest)WebRequest.Create("http://www.google.com");
// get the response object which has the header info, using the GetResponse method
var objResults = objRequest.GetResponse();
// get the header count
int intCount = objResults.Headers.Count;
// loop through the results object
for (int i = 0; i < intCount; i++)
{
    string strKey = objResults.Headers.GetKey(i);
    string strValue = objResults.Headers.Get(i);
    // Key on one line, value on the next, then a blank line.
    lblResults.Text += strKey + "<br />" + strValue + "<br /><br />";
}
My results:
Cache-Control
private, max-age=0
Content-Type
text/html; charset=ISO-8859-1
Date
Tue, 05 Jun 2012 17:40:38 GMT
Expires
-1
Set-Cookie
PREF=ID=526197b0260fd361:FF=0:TM=1338918038:LM=1338918038:S=gefqgwkuzuPJlO3G; expires=Thu, 05-Jun-2014 17:40:38 GMT; path=/; domain=.google.com,NID=60=CJbpzMe6uTKf58ty7rysqUFTW6GnsQHZ-Uat_cFf1AuayffFtJoFQSIwT5oSQKqQp5PSIYoYtBf_8oSGh_Xsk1YtE7Z834Qwn0A4Sw3ruVCA9v3f_UDYH4b4fAloFJbW; expires=Wed, 05-Dec-2012 17:40:38 GMT; path=/; domain=.google.com; HttpOnly
P3P
CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."
Server
gws
X-XSS-Protection
1; mode=block
X-Frame-Options
SAMEORIGIN
Transfer-Encoding
chunked
=========================
Fiddler results:
# | Result | Protocol | Host | URL | Body | Caching | Content-Type | Process | Comments | Custom
1 | 304 | HTTP | www.rolandgarros.com | /images/misc/weather/P8.gif | 0 | max-age=700 Expires: Tue, 05 Jun 2012 17:53:40 GMT | image/gif | firefox:5456
2 | 200 | HTTP | www.google.com | / | 23,697 | private, max-age=0 Expires: -1 | text/html; charset=UTF-8 | chrome:2324
3 | 304 | HTTP | www.rolandgarros.com | /images/misc/weather/P9.gif | 0 | max-age=700 Expires: Tue, 05 Jun 2012 17:53:57 GMT | image/gif | firefox:5456
4 | 200 | HTTP | Tunnel to | translate.googleapis.com:443 | 0 | | | chrome:2324
5 | 200 | HTTP | www.google.com
The difference is Fiddler is actually recording an entire session, not just a single HTTP request.
If a user loads Google.com, the response is typically an HTML document which contains images, script files, CSS files, etc. Your browser will then initiate a new HTTP request for each one of those resources. With Fiddler running, it tracks each of those HTTP requests and spits out the result code and other information about the session.
With your C# code above, you're only initiating a single HTTP request, thus you only have information about a single result.
You'd probably be better off writing a browser plugin. Otherwise, you'd have to parse the HTML response and load other resources from that document as well.
If you do need to do this with C# code, you could probably parse the document with the HTML Agility Pack and then look for other resources within the HTML to simulate a browser. There are also embedded browsers, such as Awesomium, that might be helpful.
You are not asking for the same information that Fiddler is displaying. Fiddler shows the HTTP status code, the host and URI and (it appears, from your example) the content length, content type and cache status.
For many of these you will have to peek into the response headers.
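To give a feel for the "look for other resources within the HTML" step, here is a deliberately crude sketch that scans for src and href attributes with a regular expression and resolves them against the page address. The class name, the FindResources helper, the sample HTML and the example.com addresses are all made up for the illustration; a real HTML parser would be more robust, and href also matches ordinary links, not only sub-resources.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class ResourceLinkSketch
{
    // Collects src/href targets from the markup and resolves them against the page URI.
    public static List<Uri> FindResources(string html, Uri pageUri)
    {
        var found = new List<Uri>();
        string pattern = @"(?:src|href)\s*=\s*[""']([^""']+)[""']";
        foreach (Match m in Regex.Matches(html, pattern, RegexOptions.IgnoreCase))
        {
            Uri resolved;
            if (Uri.TryCreate(pageUri, m.Groups[1].Value, out resolved))
            {
                found.Add(resolved);
            }
        }
        return found;
    }

    static void Main()
    {
        string html =
            "<html><head><link href=\"/style.css\" rel=\"stylesheet\"></head>" +
            "<body><img src=\"images/logo.png\">" +
            "<script src=\"http://cdn.example.com/app.js\"></script></body></html>";

        // Each resolved URI is a request a browser would issue in addition to the page itself.
        foreach (Uri resource in FindResources(html, new Uri("http://www.example.com/")))
        {
            Console.WriteLine(resource.AbsoluteUri);
        }
    }
}
```

Each URI found this way would then need its own request to produce one row comparable to a Fiddler session entry.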

WebClient problem with URL which ends with a period

I'm running the following code:
using (WebClient wc = new WebClient())
{
    string page = wc.DownloadString(URL);
    ...
}
To access the URL of a share price website, http://www.shareprice.co.uk
If you append a company's symbol name onto the end of the URL, then a page is returned which I parse to get the latest price info etc.
e.g.
http://www.shareprice.co.uk/VOD
http://www.shareprice.co.uk/TW.
Now, my problem is that some symbols end in periods, as in the second example there. For some unknown reason, the code above has a problem retrieving these sorts of URLs.
There is no run-time error, but a page is returned back which reports "Symbol could not be found" from the website itself, indicating that something is happening to the period on the end of the URL in between the call to DownloadString and the actual HTTP request.
Does anyone have any idea what might be causing this, and how to fix it?
Thanks
It seems you found a bug in WebClient/WebRequest, though perhaps Microsoft put that in intentionally, who knows. Nonetheless, when you pass in TW., the URI class is translating that to TW without the period. Since WebClient/WebRequest parse strings into URI, your . is disappearing in that world.
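A quick way to check what the Uri class does with the trailing period on a given machine is to round-trip the address and compare. This is only a probe with made-up class and method names; the result may differ between runtime versions, so no particular output is claimed here.

```csharp
using System;

static class TrailingDotProbe
{
    // Parses the address with System.Uri and returns its canonical string form.
    public static string RoundTrip(string url)
    {
        return new Uri(url).AbsoluteUri;
    }

    static void Main()
    {
        string original = "http://www.shareprice.co.uk/TW.";
        string parsed = RoundTrip(original);
        Console.WriteLine(parsed);
        Console.WriteLine(parsed == original ? "period preserved" : "period was dropped");
    }
}
```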
You may have to use TcpClient to get around this and roll your own web client. Any variation of this:
TcpClient oClient = new TcpClient("www.shareprice.co.uk", 80);
NetworkStream ns = oClient.GetStream();
StreamWriter sw = new StreamWriter(ns);
sw.Write(
    string.Format(
        "GET /{0} HTTP/1.1\r\nUser-Agent: {1}\r\nHost: www.shareprice.co.uk\r\n\r\n",
        "TW.",
        "MyTCPClient" )
);
sw.Flush();
StringBuilder sb = new StringBuilder();
while (true)
{
    int i = ns.ReadByte(); // Inefficient but more reliable
    if (i == -1) break; // Other side has closed socket
    sb.Append( (char) i ); // Append the byte as a char to accumulate the page data
}
oClient.Close();
This will give you a 302 redirect, so just parse out the 'Location:' and execute the above again with the new location.
HTTP/1.1 302 Found
Date: Wed, 11 Nov 2009 19:29:27 GMT
Server: lighttpd
X-Powered-By: PHP/5.2.4-2ubuntu5.7
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Location: /TW./TAYLOR-WIMPEY-PLC
Content-type: text/html; charset=UTF-8
Content-Length: 0
Set-Cookie: SSID=668d5d0023e9885e1ef3762ef5e44033; path=/
Vary: Accept-Encoding
Connection: close
Try adding a slash to the end, after the period. Your normal web browser will do that for you, and the WebClient class isn't that smart.
http://www.shareprice.co.uk/TW./
This worked for me as well when I typed it into the browser.
Edit - added
The following all also worked in the browser
http://www.shareprice.co.uk/TW
and
http://www.shareprice.co.uk/TW/
so it looks like you should be able to just check to see if the last character is a period, and remove it.
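That check could be sketched as follows, assuming (as observed in the browser above) that the site accepts the symbol without its trailing period. The class name and the BuildUrl helper are made up for the example.

```csharp
using System;

static class SymbolUrlSketch
{
    // Joins the site address and the symbol, dropping any trailing period from the symbol.
    public static string BuildUrl(string baseUrl, string symbol)
    {
        return baseUrl.TrimEnd('/') + "/" + symbol.TrimEnd('.');
    }

    static void Main()
    {
        Console.WriteLine(BuildUrl("http://www.shareprice.co.uk", "TW."));  // http://www.shareprice.co.uk/TW
        Console.WriteLine(BuildUrl("http://www.shareprice.co.uk", "VOD"));  // http://www.shareprice.co.uk/VOD
    }
}
```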
Use URL encoding; it will turn the "." into %2E.
To address a single period (.) at the end of a URL use the following:
<system.web>
<httpRuntime relaxedUrlToFileSystemMapping="true" />
</system.web>
To address two periods (..) or other denied sequences, see the following article:
http://www.iis.net/ConfigReference/system.webServer/security/requestFiltering/denyUrlSequences
Just add a space after the period. When the URL is parsed, the space will be removed but the period will stay.
