Unicode in Content-Disposition header - c#

I am using the HttpContext object in an HttpHandler child class to download a file. When the file name contains non-ASCII characters it comes out mangled in IE, whereas it looks fine in Firefox.
Below is the code:
context.Response.ContentType = ".cs";
context.Response.AppendHeader("Content-Length", data.Length.ToString());
context.Response.AppendHeader("Content-Disposition", String.Format("attachment; filename={0}",filename));
context.Response.OutputStream.Write(data, 0, data.Length);
context.Response.Flush();
When I supply 'ß' 'ä' 'ö' 'ü' 'ó' 'ß' 'ä' 'ö' 'ü' 'ó' in the file name, the downloaded name in IE differs from what I supplied, while Firefox shows it correctly. Adding an encoding type and charset has been of no use.
In IE the name comes out as 'ß''ä''ö''ü''ó''ß''ä''ö''ü'_'ó', while in Firefox it is 'ß' 'ä' 'ö' 'ü' 'ó' 'ß' 'ä' 'ö' 'ü' 'ó' as expected.
Any idea how this can be fixed?

I had a similar problem. You have to use HttpUtility.UrlEncode or Server.UrlEncode to encode the filename. As I remember, Firefox didn't need it; moreover, it mangled the filename when it was url-encoded. My code:
// IE needs url encoding, FF doesn't support it, Google Chrome doesn't care
if (Request.Browser.IsBrowser("IE"))
{
    fileName = Server.UrlEncode(fileName);
}
Response.Clear();
Response.AddHeader("content-disposition", String.Format("attachment;filename=\"{0}\"", fileName));
Response.AddHeader("Content-Length", data.Length.ToString(CultureInfo.InvariantCulture));
Response.ContentType = mimeType;
Response.BinaryWrite(data);
Edit
I have read the specification more carefully. First of all, RFC 2183 states that:
Current [RFC 2045] grammar restricts parameter values (and hence Content-Disposition filenames) to US-ASCII.
But then I found references saying that [RFC 2045] is obsolete and one must reference RFC 2231 instead, which states:
Asterisks ("*") are reused to provide the indicator that language and character set information is present and encoding is being used. A single quote ("'") is used to delimit the character set and language information at the beginning of the parameter value. Percent signs ("%") are used as the encoding flag, which agrees with RFC 2047.
This means that you can use UrlEncode for non-ASCII symbols, as long as you include the encoding as stated in the RFC. Here is an example:
string.Format("attachment; filename=\"{0}\"; filename*=UTF-8''{0}", Server.UrlEncode(fileName, Encoding.UTF8));
Note that filename is included in addition to filename* for backwards compatibility. You can also choose another encoding and modify the parameter accordingly, but UTF-8 covers everything.

HttpUtility.UrlPathEncode might be a better option, since UrlEncode replaces spaces with '+' signs.
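To make the difference concrete, here is a rough sketch (the exact encoded output is from memory, so treat it as approximate):
string name = "my report ä.pdf";
string a = HttpUtility.UrlEncode(name);     // roughly "my+report+%c3%a4.pdf" - spaces become '+'
string b = HttpUtility.UrlPathEncode(name); // roughly "my%20report%20%c3%a4.pdf" - spaces become %20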

For me this solution is working on all major browsers:
Response.AppendHeader("Content-Disposition", string.Format("attachment; filename*=UTF-8''{0}", HttpUtility.UrlPathEncode(fileName).Replace(",", "%2C"));
var mime = MimeMapping.GetMimeMapping(fileName);
return File(fileName, mime);
Using ASP.NET MVC 3.
The Replace is necessary because Chrome doesn't accept a comma (,) in parameter values: http://www.gangarasa.com/lets-Do-GoodCode/tag/err_response_headers_multiple_content_disposition/

You may want to read RFC 6266 and look at the tests at http://greenbytes.de/tech/tc2231/.

For me this solved the problem:
var result = new HttpResponseMessage(HttpStatusCode.OK)
{
    Content = new ByteArrayContent(data)
};
result.Content.Headers.ContentDisposition = new ContentDispositionHeaderValue("attachment")
{
    FileNameStar = "foo-ä-€.html"
};
When I look at the response in Fiddler I can see the filename has automatically been encoded using UTF-8:
(Screenshot: Fiddler response with the Content-Disposition filename percent-encoded as UTF-8.)
If we look at the value of the Content-Disposition header, we can see it is the same as in Johannes Geyer's answer. The only difference is that we didn't have to do the encoding ourselves; the ContentDispositionHeaderValue class takes care of that.
I used the test cases for the Content-Disposition header at http://greenbytes.de/tech/tc2231/, as mentioned by Julian Reschke.
Information about the ContentDispositionHeaderValue class can be found on MSDN.
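For completeness, a hedged sketch of finishing the response; the "text/html" media type is just an example matching the .html name above and is not part of the original answer:
result.Content.Headers.ContentType = new System.Net.Http.Headers.MediaTypeHeaderValue("text/html");
return result; // e.g. from a Web API action that returns HttpResponseMessage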

For ASP.NET Core (version 2 as of this post) UrlPathEncode is deprecated; here's how to achieve the desired result:
System.Net.Mime.ContentDisposition cd = new System.Net.Mime.ContentDisposition
{
    FileName = Uri.EscapeUriString(fileName),
    Inline = true // false = prompt the user to download; true = let the browser try to show the file inline
};
Response.Headers.Add("Content-Disposition", cd.ToString());

I'm using Uri.EscapeUriString to convert all characters to their hexadecimal representation, and string.Normalize for Unicode Normalization Form C.
(tested in ASP.NET MVC5 framework 4.5)
var contentDispositionHeader = new System.Net.Mime.ContentDisposition
{
    Inline = false,
    FileName = Uri.EscapeUriString(Path.GetFileName(pathFile)).Normalize()
};
Response.Headers.Add("Content-Disposition", contentDispositionHeader.ToString());
string mimeType = MimeMapping.GetMimeMapping(Server.MapPath(pathFile));
return File(pathFile, mimeType);

Related

WebClient.DownloadString uses wrong encoding

I'm downloading XML files from sharepoint online using webclient.
However, when I use WebClient.DownloadString(string url) method, some characters are not correctly decoded.
When I use WebClient.DownloadFile(string url, string file) and then I read the file all characters are correct.
The XML itself does not contain an encoding declaration.
string wrongXml = webClient.DownloadString(url);
//wrongXml contains Ä™ instead of ę
webClient.DownloadFile(url, @"C:\temp\file1.xml");
string correctXml = File.ReadAllText(@"C:\temp\file1.xml");
//contains ę, like it should.
Also, when I open the url in Internet Explorer, it is shown correctly.
Why is that? Is it because of the default Windows encoding on my machine, or does WebClient handle responses differently in DownloadString versus DownloadFile?
Probably the encoding it is using now is not the one the service returns: DownloadString decodes the response with the WebClient.Encoding property (the system default encoding unless you set it), whereas DownloadFile just writes the raw bytes to disk.
You can set the encoding you expect before you make the request:
webClient.Encoding = Encoding.UTF8;
string previouslyWrongXml = webClient.DownloadString(url);
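If the charset is only known at runtime, an alternative (a sketch, not from the original answer; the charset parsing is deliberately simplistic, and the usual System.Net / System.Text usings are assumed) is to download the raw bytes and decode them yourself using the Content-Type response header:
byte[] raw = webClient.DownloadData(url);
string contentType = webClient.ResponseHeaders[HttpResponseHeader.ContentType] ?? "";
// naive charset extraction, e.g. "text/xml; charset=utf-8" -> "utf-8"
int i = contentType.IndexOf("charset=", StringComparison.OrdinalIgnoreCase);
Encoding enc = i >= 0
    ? Encoding.GetEncoding(contentType.Substring(i + "charset=".Length).Split(';')[0].Trim(' ', '"'))
    : Encoding.UTF8; // assumption: fall back to UTF-8 when no charset is declared
string xml = enc.GetString(raw);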

Can you use "inline" content-disposition with "application/octet-stream"?

I need the browser to open file types it understands directly in the browser (i.e. no "Open/Save/Cancel" dialog).
Here's my code, which currently works great, except that every file pops up the dialog box instead of opening directly:
string filePath = Path.Combine(WebConfigurationManager.AppSettings["NewsAttachmentPath"], context.Request.QueryString["FileName"]);
byte[] bytes = System.IO.File.ReadAllBytes(filePath);
context.Response.Clear();
context.Response.ContentType = "application/octet-stream";
context.Response.Cache.SetCacheability(HttpCacheability.Private);
context.Response.Expires = -1;
context.Response.Buffer = true;
context.Response.AddHeader("Content-Disposition", string.Format("{0};FileName=\"{1}\"", "inline", context.Request.QueryString["FileName"]));
context.Response.BinaryWrite(bytes);
context.Response.End();
As you can see, even when I change the Content-Disposition to "inline", it still prompts for download. This is with files that I know my browser understands. In other words, I can go to some random site and click a PDF, and it will open in the browser. My site will make me save it in order to view it.
Pre-emptive answer to "why do you want to use application/octet-stream?": because I don't want to create a handler for every single file type. If this is mistaken, please let me know.
You do not need to create a handler per file type. You just change the line:
context.Response.ContentType = "application/octet-stream";
to be:
string contentType = //your logic here, possibly many lines in a separate method
context.Response.ContentType = contentType;
But no: you can't "inline" an application/octet-stream. That means "here's some bytes, but I don't know what they are". The browser can't do much with that, other than save it somewhere, hence a download prompt. You can use content-disposition to suggest a filename, though.
The browser does not work on file extensions - it works on content-type. So: you need to report the correct content-type in your response. This might mean writing a switch / lookup based on the file extension that you know, or it might mean storing the explicit content-type separately as metadata along with the file information.
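A sketch of that lookup (filePath comes from the question's code, the mappings below are only examples, and the usual usings are assumed; MimeMapping.GetMimeMapping from System.Web would be a ready-made alternative table on .NET 4.5+):
// explicit lookup for the types you actually serve; anything else stays a download
var knownTypes = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
    { ".pdf", "application/pdf" },
    { ".png", "image/png" },
    { ".txt", "text/plain" }
};
string contentType;
if (!knownTypes.TryGetValue(Path.GetExtension(filePath), out contentType))
{
    contentType = "application/octet-stream"; // unknown type: keep forcing a download
}
context.Response.ContentType = contentType;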

ABCpdf 5 Problems with encoding (special characters)

I am using ABCpdf Version 5 in order to render some html-pages into PDFs.
I basically use the HttpServerUtility.Execute() method to retrieve the HTML for the PDF:
System.IO.StringWriter writer = new System.IO.StringWriter();
server.Execute(requestUrl, writer);
string pageResult = writer.ToString();
WebSupergoo.ABCpdf5.Doc pdfDoc = new WebSupergoo.ABCpdf5.Doc();
pdfDoc.AddImageHtml(pageResult);
response.Buffer = false;
response.ContentType = "application/pdf";
response.AddHeader("Content-Disposition", "attachment;filename=MyPdf_" +
FormatDate(DateTime.Now, "yyyy-MM-dd") + ".pdf");
response.BinaryWrite(pdfDoc.GetData());
Now some special characters like umlauts (äöü) are replaced with an empty space, though interestingly not all of them. What I did figure out:
Within the html-page I have:
`<meta http-equiv="content-type" content="text/xhtml; charset=utf-8" />`
If I strip this out, all special characters are rendered correctly, but this seems like an ugly hack to me.
In earlier days I did not use HttpServerUtility.Execute(), but let ABCpdf call the URL itself: pdfDoc.AddImageUrl("someUrl");. There I had no such encoding problems.
What else could I try?
Just came across this problem with ABCpdf 8.
In your code you retrieve HTML contents and pass the pageResult to AddImageHtml(). As the documentation states,
ABCpdf saves this HTML into a temporary file and renders the file
using a 'file://' protocol specifier.
What is not mentioned is that the temp file is UTF-8 encoded, but the encoding is not stated in the HTML file.
The <meta> tag actually sets the required encoding, and solved my problem.
One way to avoid declaring the encoding is to use the AddImageUrl() method, which I expect detects the HTML encoding from the HTTP/HTML response.
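In other words, keep (or inject) the encoding declaration in the markup you hand to AddImageHtml. A rough sketch of that idea, assuming the page has a plain <head> element you can patch (the Replace call is only an illustration, not a robust HTML rewrite):
string pageResult = writer.ToString();
// make sure the snippet declares its own encoding, since the temp file ABCpdf writes out does not state one
if (pageResult.IndexOf("charset=", StringComparison.OrdinalIgnoreCase) < 0)
{
    pageResult = pageResult.Replace("<head>",
        "<head><meta http-equiv=\"content-type\" content=\"text/html; charset=utf-8\" />");
}
pdfDoc.AddImageHtml(pageResult);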
An encoding meta tag and the AddImageUrl method perhaps help with a simple document, but not in a chained situation, where the encoding somehow gets lost despite the tag. I encountered this problem (exactly as described in the original question: some foreign characters such as umlauts would disappear) and see no solution. I am considering getting rid of ABCpdf altogether and replacing it with SSRS, which can render PDF formats.

How to "iso-8859-1" encoding a string in jQuery?

I'm looking for a jQuery (or jQuery plugin) equivalent of this C# code block. What it does is encode a string as a base64 string in the iso-8859-1 character set.
string authInfo = "encrypted secret";
Encoding encoding = Encoding.GetEncoding("iso-8859-1");
byte[] authBytes = encoding.GetBytes(authInfo);
string encryptedMsg = Convert.ToBase64String(authBytes);
Is there a plugin out there that can do this?
Found a jQuery plugin that's close enough to what I need: Base64 encode and decode
It doesn't have an option to specify character set but I can live with it for now. So the jQuery code becomes:
authInfo = $.base64.encode(authInfo);
I believe you must set the character encoding of the page (or wherever authInfo is defined) to ISO-8859-1. You may also specify the character encoding on the <script> tag for referenced JavaScript files if authInfo is defined in one of those.
As for base64 encoding, here's a page that has a code snippet that does just that: http://www.webtoolkit.info/javascript-base64.html

Is it necessary to url encode the file name?

In my asp.net mvc application I have the code:
response.ContentType = "application/octet-stream";
response.AddHeader("Content-Disposition", "attachment;filename=" +
HttpUtility.UrlEncode(attachment.FileName));
So all the Chinese characters are url-encoded to something like %5C%2D. In IE/Chrome, when users download the file they get the Chinese file name (that is, IE/Chrome will automatically url-decode the file name). But in Firefox they get something like %5C%2D%0A.docx. Now I'm going to remove HttpUtility.UrlEncode from the code, but before doing this I want to confirm that there are no security issues in this case. Would you please give me some ideas?
EDIT: Corbin's answer is correct. But after removing the url-encoding of the filename, some users on old IE versions get strange, messy file names. In the end I url-encode only for those users.
The name is allowed to be in quotes if it's ASCII.
If it's non-ASCII, then you have to use the encoding defined in RFC 2231 or the one in RFC 5987 or the one in RFC 2047... which browsers support which of these is a fun game, of course. :(
If you just stick the raw non-ASCII bytes into the header, it will almost certainly look like garbage for a large fraction of users.
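A hedged sketch of what that can look like in practice, sending both the legacy parameter and the RFC 2231/5987 form (the ASCII fallback name is made up; Uri.EscapeDataString percent-encodes the UTF-8 bytes, which is close to what RFC 5987 expects, though it escapes a few characters it strictly wouldn't need to):
string asciiFallback = "download.docx";                      // hypothetical plain-ASCII name for old browsers
string encoded = Uri.EscapeDataString(attachment.FileName);  // UTF-8 bytes percent-encoded, e.g. %E4%B8%AD...
response.AddHeader("Content-Disposition",
    string.Format("attachment; filename=\"{0}\"; filename*=UTF-8''{1}", asciiFallback, encoded));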
Please change your code as follows:
if (Request.Browser.Browser == "IE" || Request.Browser.Browser == "Chrome")
{
    filename = HttpUtility.UrlPathEncode(filename);
}
Response.AddHeader("Content-Disposition", "attachment;filename=\"" + filename + "\"");
Note: your code is missing "\"" to wrap the file name in quotes.
http://www.w3.org/Protocols/rfc2616/rfc2616-sec19.html
Unless I'm misunderstanding it, it looks like the name should just be in quotes, not url encoded.
