I need to allow users to download files from our server, and I'd like to serve these files via an ASP.NET MVC 5 controller action. My action looks like this:
public FileContentResult Download(int fileId)
{
    var myContent = GetContentForFile(fileId);
    var myFileMeta = GetFileMeta(fileId);

    if (myContent == null || myFileMeta == null)
        throw new FriendlyException("The file or its associated data could not be found.");

    return File(myContent.Content, myContent.MediaType, myFileMeta.FileName);
}
The above is as simple as I could get it. It works fine on PC and iPhone, but not on Android. Using Fiddler, I can see the following response headers when I try to download one of my files - in this case a JPG file called "1447114384146-643143584.jpg":
HTTP/1.1 200 OK
Cache-Control: private, s-maxage=0
Content-Type: image/jpeg
Server: Microsoft-IIS/8.5
X-AspNetMvc-Version: 5.2
Content-Disposition: attachment; filename=1447114384146-643143584.jpg
X-AspNet-Version: 4.0.30319
X-Powered-By: ASP.NET
Date: Thu, 12 Nov 2015 23:09:00 GMT
Content-Length: 1682868
Note that I don't have any reliable way to know the correct MIME-type - is this an issue and could it explain why the file isn't being downloaded in Android?
To clarify, when I attempt to download any file from the database using Android, I get a toast notification telling me "Download started", but then the download sits in the queue for a while on 0% before eventually just changing to "Failed".
What I've tried
I've tried all manner of things that people have suggested in similar questions, most of which are to do with the content-disposition header or the content-type header. I've tried forcing the content-type header to application/octet-stream for every file, I've tried sending the correct content-type header for the particular file. I've tried manually sending the content-disposition header. I've tried forcing the filename extension to uppercase.
None of the above has worked, in fact none of them have had any impact at all on the problem, for better OR worse. I'm amazed that this is so hard - I feel like I must be missing something obvious?
Additional information
Browser: latest Chrome on Android
OS: Android 5.1 (it also occurs for a coworker whose phone runs an earlier Android version, not sure which specifically, so I don't think this is tied to one particular Android version).
Update
After reading this blog entry: http://www.digiblog.de/2011/04/android-and-the-download-file-headers/ I tried following the advice and set my headers exactly as suggested:
HTTP/1.1 200 OK
Cache-Control: private, s-maxage=0
Content-Type: application/octet-stream
Server: Microsoft-IIS/8.5
X-AspNetMvc-Version: 5.2
Content-Disposition: attachment; filename="1447114384146-643143584.JPG"
X-AspNet-Version: 4.0.30319
X-Powered-By: ASP.NET
Date: Thu, 12 Nov 2015 23:42:18 GMT
Content-Length: 1682868
Again, this had no impact on the problem at all.
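For reference, here is roughly how I produced those headers in the action (just a sketch of what I tried; GetContentForFile/GetFileMeta/FriendlyException are from my code above, and the header values follow the blog's advice - octet-stream content type, uppercase extension, quoted filename):

```csharp
using System.IO;
using System.Web.Mvc;

public ActionResult Download(int fileId)
{
    var myContent = GetContentForFile(fileId);
    var myFileMeta = GetFileMeta(fileId);
    if (myContent == null || myFileMeta == null)
        throw new FriendlyException("The file or its associated data could not be found.");

    // Per the blog: force the extension to uppercase and quote the
    // filename in a manually written Content-Disposition header.
    var fileName = Path.ChangeExtension(
        myFileMeta.FileName,
        Path.GetExtension(myFileMeta.FileName).ToUpperInvariant());
    Response.AppendHeader("Content-Disposition",
        "attachment; filename=\"" + fileName + "\"");

    return new FileContentResult(myContent.Content, "application/octet-stream");
}
```

As noted above, this produced exactly the headers the blog recommends, yet made no difference on Android.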
Further update
I have been able to test on a Marshmallow (Android v6.0) device and the download works. It seems to be a pre-Marshmallow issue.
Sadly this was caused by something very specific to my environment, but I'd like to put the answer here in case anyone else stumbles across this same problem.
It turns out the Android download manager doesn't like underscores in domain names, and our local domain address had an underscore in it. I used the server's IP address instead and everything worked as expected.
For example this: http://www.my_domain.com.au/file.png won't work. This: http://192.168.x.x/file.png does work.
Found as an answer on this question: Trouble downloading file from browser on Android
Disclaimer: I don't have enough rep to comment, so I'm forced to post this as an answer.
Have you tried different versions of Android using the emulator, or have you only tried an actual device?
If only on a device, is the code running in production, or are you connecting to your local development system over a local wireless connection?
Have you tried to use Chrome Remote Debugging on the device?
https://developers.google.com/web/tools/chrome-devtools/debug/remote-debugging/remote-debugging?hl=en
One way to rule out issues with the setup on your device would be to write a small Android app using Xamarin + RestSharp that does nothing but hit your download URL, to see if that works. If it does, that helps point the finger at Chrome itself. If it doesn't, then at least you can run the app with the debugger attached to get better insight into what is happening on the other end.
https://xamarin.com/
https://github.com/restsharp/RestSharp
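To make that concrete, a minimal sketch of such a test client might look like this (RestSharp's classic API from memory; the base URL and route are placeholders you'd point at your own Download action):

```csharp
using System;
using RestSharp;

class DownloadSmokeTest
{
    static void Main()
    {
        // Placeholder address and route - substitute your server's.
        var client = new RestClient("http://192.168.1.2");
        var request = new RestRequest("Files/Download", Method.GET);
        request.AddQueryParameter("fileId", "1");

        var response = client.Execute(request);

        // If this succeeds where Chrome's download fails, the problem
        // is in the browser/download manager, not the server.
        Console.WriteLine(response.StatusCode);
        Console.WriteLine((response.RawBytes?.Length ?? 0) + " bytes received");
    }
}
```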
UPDATE: Response headers as seen by Fiddler when calling a test endpoint served by my local machine:
HTTP/1.1 200 OK
Cache-Control: private
Content-Type: application/octet-stream
X-Content-Type-Options: nosniff
X-Frame-Options: DENY
Content-Disposition: attachment; filename=profile.jpg
Date: Fri, 13 Nov 2015 02:09:23 GMT
Content-Length: 218143
Update: Here are the incoming request server variables:
ALL_HTTP=HTTP_CACHE_CONTROL:max-age=0
HTTP_CONNECTION:keep-alive
HTTP_ACCEPT:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
HTTP_ACCEPT_ENCODING:gzip, deflate, sdch
HTTP_ACCEPT_LANGUAGE:en-US,en;q=0.8
HTTP_COOKIE:_ga=GA1.1.420021277.1447377172
HTTP_HOST:192.168.1.2
HTTP_USER_AGENT:Mozilla/5.0 (Linux; Android 5.0.2; HTC One Build/LRX22G) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.76 Mobile Safari/537.36
HTTP_UPGRADE_INSECURE_REQUESTS:1
HTTP_DNT:1
ALL_RAW=Cache-Control: max-age=0
Connection: keep-alive
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8
Cookie: _ga=GA1.1.420021277.1447377172
Host: 192.168.1.2
User-Agent: Mozilla/5.0 (Linux; Android 5.0.2; HTC One Build/LRX22G) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.76 Mobile Safari/537.36
Upgrade-Insecure-Requests: 1
DNT: 1
APPL_MD_PATH=/LM/W3SVC/2/ROOT
APPL_PHYSICAL_PATH=C:\development\rumble-strip\projects\net-framework\RumbleStrip.Website\
AUTH_TYPE=
AUTH_USER=
AUTH_PASSWORD=
LOGON_USER=
REMOTE_USER=
CERT_COOKIE=
CERT_FLAGS=
CERT_ISSUER=
CERT_KEYSIZE=
CERT_SECRETKEYSIZE=
CERT_SERIALNUMBER=
CERT_SERVER_ISSUER=
CERT_SERVER_SUBJECT=
CERT_SUBJECT=
CONTENT_LENGTH=0
CONTENT_TYPE=
GATEWAY_INTERFACE=CGI/1.1
HTTPS=off
HTTPS_KEYSIZE=
HTTPS_SECRETKEYSIZE=
HTTPS_SERVER_ISSUER=
HTTPS_SERVER_SUBJECT=
INSTANCE_ID=2
INSTANCE_META_PATH=/LM/W3SVC/2
LOCAL_ADDR=192.168.1.2
PATH_INFO=/
PATH_TRANSLATED=C:\development\rumble-strip\projects\net-framework\RumbleStrip.Website
QUERY_STRING=
REMOTE_ADDR=192.168.1.5
REMOTE_HOST=192.168.1.5
REMOTE_PORT=54748
REQUEST_METHOD=GET
SCRIPT_NAME=/
SERVER_NAME=192.168.1.2
SERVER_PORT=80
SERVER_PORT_SECURE=0
SERVER_PROTOCOL=HTTP/1.1
SERVER_SOFTWARE=Microsoft-IIS/10.0
URL=/
HTTP_CACHE_CONTROL=max-age=0
HTTP_CONNECTION=keep-alive
HTTP_ACCEPT=text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
HTTP_ACCEPT_ENCODING=gzip, deflate, sdch
HTTP_ACCEPT_LANGUAGE=en-US,en;q=0.8
HTTP_COOKIE=_ga=GA1.1.420021277.1447377172
HTTP_HOST=192.168.1.2
HTTP_USER_AGENT=Mozilla/5.0 (Linux; Android 5.0.2; HTC One Build/LRX22G) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.76 Mobile Safari/537.36
HTTP_UPGRADE_INSECURE_REQUESTS=1
HTTP_DNT=1
IS_LOGIN_PAGE=1
Related
(I found out when it happens; see the bottom of the question.)
I am working with a traditional ASP.NET web application. There is an .aspx page that hosts an Angular 11 application; it loads fine 9 times out of 10, but occasionally a bad response is returned with a 200 OK status. When this happens, Firefox shows a "content encoding error" page, while Chrome and Edge show just a blank screen with the same verbiage in the console.
Using Wireshark, I was able to determine that when the content encoding error occurs, the Content-Encoding response header carries three comma-separated "gzip" values, as shown below:
HTTP/1.1 200 OK
Cache-Control: no-cache, no-store, must-revalidate
Content-Type: text/html; charset=utf-8
Content-Encoding: gzip, gzip, gzip
...
Whereas a normal response from the .aspx page looks like this:
HTTP/1.1 200 OK
Cache-Control: no-cache, no-store, must-revalidate
Content-Type: text/html; charset=utf-8
Content-Encoding: gzip
...
I can duplicate the issue using one of the page's [WebMethod] calls:
var ctx = HttpContext.Current;
var unused = ctx.Response.Filter; // Because apparently you must access it before you can set it
ctx.Response.Filter = new GZipStream(ctx.Response.OutputStream, CompressionLevel.Optimal);
ctx.Response.AppendHeader("Content-Encoding", "gzip");
ctx.Response.AppendHeader("Content-Encoding", "gzip"); // <-- gzip appended twice here
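A defensive guard against exactly this (a sketch on my part; it assumes the IIS integrated pipeline, where Response.Headers is readable) would be to append the header only when it isn't already present:

```csharp
using System.IO.Compression;
using System.Web;

var ctx = HttpContext.Current;
var unused = ctx.Response.Filter; // the filter must be read before it can be set

// Wrap and tag the response only once, even if this code path
// runs more than once during the page lifecycle.
if (ctx.Response.Headers["Content-Encoding"] == null)
{
    ctx.Response.Filter = new GZipStream(ctx.Response.OutputStream, CompressionLevel.Optimal);
    ctx.Response.AppendHeader("Content-Encoding", "gzip");
}
```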
The troubling part is that the multiple "gzip" values are on the response from the .aspx page itself. I have searched the entire code base and all web.config files in an attempt to find where this compression is being applied, but to no avail. So I am thinking it could be a third party doing this.
We use DevExtreme and I have been looking at these settings in our config:
<add key="DXEnableCallbackCompression" value="true" />
<add key="DXEnableResourceCompression" value="true" />
<add key="DXEnableResourceMerging" value="true" />
<add key="DXEnableHtmlCompression" value="true" />
I am still having trouble scanning the code for issues. Does anyone know of a trick, using Fiddler, Wireshark, or any other tool, that could reveal where these headers are sporadically being tripled?
Edit: Here is the GET request header, which returns a response with proper encoding ~90% of the time.
GET http://xxx/xxx.aspx?xxx=4 HTTP/1.1
Host: xxx.com
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36 Edg/92.0.902.73
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Referer: http://xxx/Home.aspx
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9
Cookie: ASP.NET_SessionId=x; .ASPXAUTH=x;
Found out when it happens:
I was able to duplicate this issue on a regular basis. If I close all browser sessions and recycle the app pool, the issue occurs on the first request. On subsequent requests, the issue does not happen.
Also, the culprit is a google script embedded on the HTML page. When this script is removed the page loads fine on first request after a recycle or not.
Culprit Code:
<script type="text/javascript" src="//maps.googleapis.com/maps/api/js?key=
<%=GoogleMapAPIKey%>&channel=<%=GoogleMappingChannel%>"></script>
I am sure it is not the js file itself. The keys are embedded into the tag via server-side processors. Those two processors call an API to get the keys, and those calls are gzipped. I still don't know why the aspx's response header gets three "gzip" values when the js include statement is present in the page markup.
I may remove this wall of text and add a new question due to the new findings.
The problem occurred when gzip encoding was added to outgoing calls that were triggered from markup on the .aspx page. Web methods called after page load, in an asynchronous fashion from the Angular client, had no encoding issue.
Two calls were made via a page property that the page markup accessed. These web methods had gzip applied, and I guess that since they were processed earlier in the page lifecycle, something was getting mixed up.
My problem was solved by removing the compression on those two calls. In short: there were two calls to a function that added gzip encoding prior to page load, and at that time the response being filtered was the .aspx page itself.
I have a .NET Core 1.1 MVC controller that somehow isn't getting called correctly when a request comes in.
The controller method looks like this:
Although I don't know if that really matters, because I have debugging lines in the controller's constructor (and have run in debug with breakpoints in the constructor, as well), and it looks like even the constructor is never getting called.
The application output contains a line like this, when the call comes into the server:
Microsoft.AspNetCore.Mvc.Internal.ControllerActionInvoker:Information: Executed action Namespace.NameController.GetData (AssemblyName) in 0.9161ms
Something that is suspicious there is that most of the lines I see logged like this for other controllers being called contain the argument information, as well, and this one doesn't.
I'm not getting an error from the client side, instead I'm getting a success response with an empty body. It's almost like an empty response is getting returned before any of my controller's code actually runs.
Here are the details of the request/response (the response body is empty):
Request URL: http://localhost:61410/path-to-controller/GetData?xtype=xtypeargument
Request Method: POST
Status Code: 200
Query Url
xtype: xtypeargument
Request Headers
accept: */*
Origin: http://localhost:61410
X-Requested-With: XMLHttpRequest
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36
Authorization: bearer <bearer-token>
Referer: http://localhost:61410/path-to/index.html
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.8
Response Headers
Server: Kestrel
Access-Control-Allow-Origin: *
X-SourceFiles: =?UTF-8?B?...?=
X-Powered-By: ASP.NET
Date: Wed, 06 Sep 2017 14:08:10 GMT
Content-Length: 0
Any ideas of what might be going on here?
It turns out that the controller's constructor had arguments that were expected to be provided by Dependency Injection, but that weren't. Somehow this caused the behavior I was seeing, although I still don't really understand why I wasn't getting exceptions instead of these empty responses. Anyway, I fixed the code and it's working now.
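For anyone else who hits this: every type the controller's constructor asks for has to be registered with the DI container in Startup.ConfigureServices. A minimal sketch (IDataService/DataService are placeholder names, not from my real code):

```csharp
public void ConfigureServices(IServiceCollection services)
{
    // If the controller's constructor takes an IDataService and this
    // registration is missing, the controller cannot be activated -
    // which in my case surfaced as an empty 200 response rather than
    // an exception.
    services.AddScoped<IDataService, DataService>();
    services.AddMvc();
}
```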
I'm using Abot (C#) to crawl a website using the standard settings in their getting started documentation.
After retrieving a web page I can't read the content - it doesn't appear to have been decoded correctly.
If I comment out the Abot code and just use the standard .NET (HttpWebResponse)request.GetResponse() method, I can see the page content correctly.
I want to use Abot for its scraping capabilities though. But as you can see below I get a load of incorrectly decoded content.
Has anyone got any ideas on how I can fix the problem?
EDIT: I'm pretty sure it's something to do with the website, as I don't have the same problem running against http://www.google.com
EDIT 2: Here are the headers
WebRequest
User-Agent: Mozilla/5.0 (Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko
Accept: */*
Host: www.<website>.com
Connection: Keep-Alive
WebResponse
Transfer-Encoding: chunked
Connection: keep-alive
Content-Type: text/html; charset=UTF-8
Date: Wed, 29 Jul 2015 12:28:53 GMT
Set-Cookie: __cfduid=de5028c9ea76b127d7aebe40617a7a6b51438172932; expires=Thu, 28-Jul-16 12:28:52 GMT; path=/; domain=.<website>.com; HttpOnly,PHPSESSID=e2ekece8flgs000h6u6kvf66k6; path=/,ct_cookies_test=7a1a1460017221ec70f96f0f2a3cdaac; path=/
X-Powered-By: W3 Total Cache/0.9.4.1
Expires: Wed, 29 Jul 2015 13:28:53 GMT
Cache-Control: max-age=3600, public, must-revalidate, proxy-revalidate
Pragma: public
X-Pingback: http://www.<website>.com/<file>.php
Link: <http://wp.me/P2xmvI-a>; rel=shortlink
Last-Modified: Wed, 29 Jul 2015 12:28:53 GMT
Vary: Accept-Encoding,User-Agent
Server: cloudflare-nginx
CF-RAY: 20d8d37b9fc406be-LHR
If you remove the User-Agent: Mozilla/5.0 (Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko header your response will probably be more readable. I'm not sure but it looks like the web server encodes responses sent to this user agent in some way. (I'm not an expert either)
I can recommend using Fiddler (http://www.telerik.com/fiddler) to check how web requests are handled. (It's quite nice for debugging this kind of problem.)
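One more thing worth ruling out (a guess on my part, prompted by the Vary: Accept-Encoding header above): if the body comes back gzipped and is never inflated, it will look exactly like garbage. With a raw HttpWebRequest you can opt into transparent decompression:

```csharp
using System.IO;
using System.Net;

var request = (HttpWebRequest)WebRequest.Create("http://www.example.com/");
// Transparently inflate gzip/deflate response bodies.
request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;

using (var response = (HttpWebResponse)request.GetResponse())
using (var reader = new StreamReader(response.GetResponseStream()))
{
    string html = reader.ReadToEnd(); // readable text rather than raw compressed bytes
}
```

If the plain GetResponse() path works because it never advertises gzip support, that would explain why Abot's response looks mangled while yours does not.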
Bad content seen in fiddler
Correct content seen in fiddler
I am trying to handle a website programmatically. Let's say I visit the page www.example.com/something. On the website there is a button that I press. The code of the button looks something like this:
<form action="/something" method="POST" enctype="text/plain">
<input type="submit" class="button" value="Click me" >
</form>
Pressing this button updates the information on the website.
Now I would like to perform this procedure programmatically, to receive the content of the updated website after pressing the button.
Can someone point me in the right direction on how to do this, preferably in C#?
Thank you in advance!
Edit:
I used Fiddler to capture the HTTP request and response, it looks like this:
POST /something HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: de-de,de;q=0.8,en-us;q=0.5,en;q=0.3
Accept-Encoding: gzip, deflate
Referer: http://example.com/something
Cookie: cookie1=cookiecontent; cookie2=cookiecontent
Connection: keep-alive
Content-Type: text/plain
Content-Length: 0
HTTP/1.1 200 OK
Cache-Control: private
Content-Type: text/html; charset=utf-8
Content-Encoding: gzip
Vary: Accept-Encoding
Server: Microsoft-IIS/8.0
X-AspNet-Version: 4.0.30319
X-Powered-By: ASP.NET
Date: Thu, 05 Dec 2013 23:36:31 GMT
Content-Length: 2202
Although the request includes cookies, they don't appear to be relevant. I decompressed the received content with Fiddler and found the wanted data included in the response.
I am not very experienced with HTTP requests and am therefore hoping that someone can help me convert this into a C# HTTP request to receive the content.
If the website in question is open and doesn't do any sort of cookie generation to validate requests (there are plenty of sites like this) then you can just use System.Net.WebRequest or similar to post the required form data, then examine the response. See this MSDN page for an example.
If the page does use cookies and so on you'll have to get a bit more creative. In some cases you can issue one web request to get the first page, examine the results for cookies and hidden form values and use those in your POST.
If all else fails then the Selenium WebDriver library will give you almost complete browser emulation with full access to the DOM. It's a bit more complex than using a WebRequest, but will work for pretty much everything you can use a web browser for.
Regardless of which method you use, Fiddler is a good debugging tool. Use it to compare what your C# code is doing to what the web browser is doing to see if there's anything your code isn't getting right.
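Based on the capture in the question (a POST with Content-Type: text/plain and an empty body), a minimal HttpWebRequest sketch might look like this - example.com is the placeholder from the question:

```csharp
using System.IO;
using System.Net;

var request = (HttpWebRequest)WebRequest.Create("http://example.com/something");
request.Method = "POST";
request.ContentType = "text/plain"; // matches the form's enctype
request.ContentLength = 0;          // the captured request had no body
request.Referer = "http://example.com/something";
request.AutomaticDecompression = DecompressionMethods.GZip; // the response was gzipped

// Carry over the cookies Fiddler showed, in case they do matter after all.
var cookies = new CookieContainer();
cookies.Add(new Cookie("cookie1", "cookiecontent", "/", "example.com"));
request.CookieContainer = cookies;

using (var response = (HttpWebResponse)request.GetResponse())
using (var reader = new StreamReader(response.GetResponseStream()))
{
    string updatedPage = reader.ReadToEnd();
}
```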
Since it's a submit button, simulating the resulting HTTP request is easier than simulating a click. First, I would use a program like Fiddler to inspect what is being sent when you submit the form. Then I would replicate that request with HttpWebRequest, changing just the values that need changing. You can find an example here.
The resulting HttpWebResponse can then be parsed for data. Using something like HtmlAgilityPack makes that part easier.
You can do what you want with http://www.seleniumhq.org/projects/webdriver/. It is possible to do web automation with C# in a console program. I am using it for UI integration testing and it works fairly well.
I would look into searching for a browser automation framework. I would usually do this in Python and have not used .Net for this, but a quick Google search yields quite a few results.
Included within these:
http://watin.org/
Web automation using .NET
Can we script and automate a browser, preferably with .Net?
I am planning to create a mobile application (for fun) that should use the result from this web page (http://consultawebvehiculos.carabineros.cl/index.php). Is there any way to create an instance of a browser in my .NET code, read this result, and publish it using a web service?
something like:
var ie = new Browser("http://consultawebvehiculos.carabineros.cl/index.php");
var result = ie.FindElementById("txtIdentityCar").WriteText(yourIdentityCar);
PublishToWebService(result);
Update:
Using Fiddler I can see that the HTTP POST looks something like this:
POST http://consultawebvehiculos.carabineros.cl/index.php HTTP/1.1
Host: consultawebvehiculos.carabineros.cl
Connection: keep-alive
Content-Length: 61
Cache-Control: max-age=0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Origin: http://consultawebvehiculos.carabineros.cl
User-Agent: Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17
Content-Type: application/x-www-form-urlencoded
Referer: http://consultawebvehiculos.carabineros.cl/index.php
Accept-Encoding: gzip,deflate,sdch
Accept-Language: es-ES,es;q=0.8
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
accion=buscar&txtLetras=CL&txtNumeros1=sk&txtNumeros2=12&vin=
Maybe I need some .NET class like WebClient in order to connect to the PHP page... not sure.
UPDATE: I finally found the solution. I used Fiddler to discover the full set of parameters, and then used the code from http://www.hanselman.com/blog/HTTPPOSTsAndHTTPGETsWithWebClientAndCAndFakingAPostBack.aspx
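For anyone following along, the captured form body above (accion=buscar&txtLetras=CL&...) can be replayed with WebClient.UploadValues, roughly like this (the response encoding is an assumption; the page advertises ISO-8859-1 as well as UTF-8):

```csharp
using System.Collections.Specialized;
using System.Net;
using System.Text;

using (var client = new WebClient())
{
    var form = new NameValueCollection
    {
        { "accion", "buscar" },
        { "txtLetras", "CL" },
        { "txtNumeros1", "sk" },
        { "txtNumeros2", "12" },
        { "vin", "" }
    };

    // UploadValues posts as application/x-www-form-urlencoded,
    // which matches the Content-Type in the capture.
    byte[] raw = client.UploadValues(
        "http://consultawebvehiculos.carabineros.cl/index.php", form);
    string html = Encoding.UTF8.GetString(raw);
}
```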
If you are just interested in scraping the page, I suggest using Html Agility Pack.
If you also want to display the page, then you could use the WebBrowser control.
We've been using http://htmlunit.sourceforge.net/ for similar tasks. It allows you to send requests and receive the response, status code, etc.
(It's a Java lib, so you could either google for a .NET port or use a converter to convert the Java assembly into a .NET assembly - see http://blog.stevensanderson.com/2010/03/30/using-htmlunit-on-net-for-headless-browser-automation/ for guidance. We've used the conversion approach.)