Why does UTF-8 fail on this encoding? - c#

I'm trying to download a page encoded in UTF-8.
This is my code:
using (WebClient client = new WebClient())
{
client.Headers.Add("user-agent", Request.UserAgent);
htmlPage = client.DownloadString(HttpUtility.UrlDecode(resoruce_url));
var KeysParsed = HttpUtility.ParseQueryString(client.ResponseHeaders["Content-Type"].Replace(" ", "").Replace(";", "&"));
var charset = ((KeysParsed["charset"] != null) ? KeysParsed["charset"] : "UTF-8");
Response.Write(client.ResponseHeaders);
byte[] bytePage = Encoding.GetEncoding(charset).GetBytes(htmlPage);
using (var reader = new StreamReader(new MemoryStream(bytePage), Encoding.GetEncoding(charset)))
{
htmlPage = reader.ReadToEnd();
Response.Write(htmlPage);
}
}
So it sets UTF-8 as the encoding. But the downloaded title, for example, shows on my screen as:
Sexy cover: 60 e piÃ¹ di â€œquei dischiâ€ vietati ai minori
and not as:
Sexy cover: 60 e più di “quei dischi” vietati ai minori
Something is wrong, but I can't find where. Any ideas?

The problem is that by the time you get the data it's already been converted.
When WebClient.DownloadString executes, it gets the raw bytes and converts them to a string using the client's Encoding property, which defaults to Encoding.Default. The damage is done. You can't take the resulting string, turn it back into bytes, and re-interpret it.
Put another way, this is what's happening:
// WebClient.DownloadString does, essentially, this.
byte[] rawBytes = DownloadData();
string htmlPage = Encoding.Default.GetString(rawBytes);
// Now you're doing this:
byte[] myBytes = Encoding.UTF8.GetBytes(htmlPage);
But myBytes will not necessarily be the same as rawBytes.
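The effect is easy to reproduce offline. This is a minimal sketch (the class name MojibakeDemo is hypothetical) that uses ISO-8859-1 as a stand-in for a mismatched Encoding.Default, because it is available on every .NET runtime. The UTF-8 bytes of "più" are decoded into "piÃ¹", and re-encoding that string as UTF-8 yields six bytes instead of the original four.

```csharp
using System.Text;

// Sketch: ISO-8859-1 stands in for a mismatched Encoding.Default (assumption).
public static class MojibakeDemo
{
    static readonly Encoding WrongEncoding = Encoding.GetEncoding("ISO-8859-1");

    // What DownloadString effectively does when its Encoding doesn't match the page.
    public static string DecodeWithWrongEncoding(byte[] rawBytes) =>
        WrongEncoding.GetString(rawBytes);

    // What the question's code does next: turn the (already wrong) string into UTF-8 bytes.
    public static byte[] ReEncodeAsUtf8(string htmlPage) =>
        Encoding.UTF8.GetBytes(htmlPage);
}
```

Once the string is wrong, a UTF-8 round trip faithfully preserves the wrong characters, which is what the StreamReader step in the question ends up doing.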
If you know what encoding to use beforehand, you can set the WebClient instance's Encoding property. If you want to interpret the string based on the encoding specified in the Content-Type header, then you have to download the raw bytes, determine the encoding, and use that to interpret the string. For example:
var rawBytes = client.DownloadData(HttpUtility.UrlDecode(resoruce_url));
var KeysParsed = HttpUtility.ParseQueryString(client.ResponseHeaders["Content-Type"].Replace(" ", "").Replace(";", "&"));
var charset = ((KeysParsed["charset"] != null) ? KeysParsed["charset"] : "UTF-8");
var theEncoding = Encoding.GetEncoding(charset);
htmlPage = theEncoding.GetString(rawBytes);
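The Replace(";", "&") trick works for simple headers, but System.Net.Mime.ContentType can parse the header value directly. A sketch with a hypothetical helper name (CharsetDecoder) that also falls back to UTF-8 when the header is missing, has no charset, or names an encoding the runtime doesn't know:

```csharp
using System;
using System.Net.Mime;
using System.Text;

public static class CharsetDecoder
{
    // Decodes raw response bytes with the charset from a Content-Type header value.
    // Falls back to UTF-8 (an assumption, matching the question's default) otherwise.
    public static string Decode(byte[] rawBytes, string contentTypeHeader)
    {
        Encoding encoding = Encoding.UTF8;
        if (!string.IsNullOrWhiteSpace(contentTypeHeader))
        {
            try
            {
                string charset = new ContentType(contentTypeHeader).CharSet;
                if (!string.IsNullOrEmpty(charset))
                    encoding = Encoding.GetEncoding(charset);
            }
            catch (FormatException)
            {
                // Malformed Content-Type header: keep the UTF-8 fallback.
            }
            catch (ArgumentException)
            {
                // Unknown charset name: keep the UTF-8 fallback.
            }
        }
        return encoding.GetString(rawBytes);
    }
}
```

Usage with the question's client would be along the lines of `var raw = client.DownloadData(url);` followed by `htmlPage = CharsetDecoder.Decode(raw, client.ResponseHeaders["Content-Type"]);`, reading the header only after the download has completed.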

Related

Getting empty pdf attachments when trying to create "activitymimeattachment"

I want to attach a PDF from a URL to a CRM "activitymimeattachment". Everything works fine, but I'm getting empty PDF attachments (even though the size is the same as the original PDF). Could it be a problem with encoding or conversion? Could somebody help me with this?
The email is created with the attachment, but the attachment is empty.
Entity attachment = new Entity("activitymimeattachment");
attachment["subject"] = "Attachment";
string fileName = "Invoice.pdf";
attachment["filename"] = fileName;
WebClient wc = new WebClient();
string theTextFile = wc.DownloadString(url);
byte[] fileStream = Encoding.ASCII.GetBytes(theTextFile);
attachment["body"] = Convert.ToBase64String(fileStream);
attachment["mimetype"] = "application/pdf";
attachment["attachmentnumber"] = 1;
attachment["objectid"] = new EntityReference("email", emailguid);
attachment["objecttypecode"] = "email";
service.Create(attachment);
I believe the encoding is the issue. A PDF is neither a string nor an ASCII file, so these lines are at fault:
string theTextFile = wc.DownloadString(url);
byte[] fileStream = Encoding.ASCII.GetBytes(theTextFile);
I would suggest changing your web client code to download the binary file (application/pdf) and save it locally, e.g. to d:\temp\invoice.pdf.
Then you can do this:
var bytes = File.ReadAllBytes(@"d:\temp\invoice.pdf");
var body = Convert.ToBase64String(bytes);
In short, avoid trying to put the PDF into a string, or using the ASCII encoding to get its byte array. It's a binary file until you convert it to Base64.
Of course you can also probably get your web client to download the file into memory and convert it to Base64 without writing the file locally. I just wanted the simplest example to make the point about the encoding.
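The size observation in the question fits this explanation. A sketch, assuming DownloadString used a single-byte code page (ISO-8859-1 stands in for it here; the class name is hypothetical): every byte becomes exactly one character, so the length survives, but Encoding.ASCII.GetBytes replaces every character above 0x7F with '?'. The sample bytes in the test are a PDF-style header followed by arbitrary high bytes, not a real file.

```csharp
using System.Text;

public static class BinaryThroughString
{
    // Mimics the question's path: bytes -> string -> ASCII bytes.
    // ISO-8859-1 maps each byte to one char (assumed stand-in for DownloadString's encoding).
    public static byte[] RoundTripThroughAscii(byte[] fileBytes)
    {
        string asText = Encoding.GetEncoding("ISO-8859-1").GetString(fileBytes);
        // ASCII's default replacement fallback turns every char above 0x7F into '?' (0x3F).
        return Encoding.ASCII.GetBytes(asText);
    }
}
```

Passing the bytes from DownloadData straight to Convert.ToBase64String avoids the string step entirely, so no byte is ever reinterpreted.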
Thank you for the advice, @Aron. I actually found a solution which is very simple: I just used another method of the WebClient class. The main thing I needed to change was the DownloadString(url) method into the DownloadData(url) method:
WebClient wc = new WebClient();
byte[] theTextFile = wc.DownloadData(url);
attachment["body"] = Convert.ToBase64String(theTextFile);

Equivalent of C# Encoding.UTF8.GetBytes in php

I have an example in C# and have to write the same in PHP.
request = request.Replace(sign, string.Empty);
byte[] sha1Request;
using (var shaM = new SHA1Managed())
{
sha1Request = shaM.ComputeHash(Encoding.UTF8.GetBytes(request));
}
log.InfoFormat($"request={request}. sha1Request={Convert.ToBase64String(sha1Request)}. Sign={sign}", request, Convert.ToBase64String(sha1Request));
var pubKey = (RSACryptoServiceProvider)FrontInterface.GetCertificate(checkFrontCertificateCod.Value).PublicKey.Key;
var isValid = pubKey.VerifyData(Encoding.UTF8.GetBytes(Convert.ToBase64String(sha1Request)), new SHA1CryptoServiceProvider(), Convert.FromBase64String(sign));
if (!isValid)
{
throw new Exception("Wrong digital sign");
}
So, I don't need to convert the string to bytes in PHP, and the line sha1Request = shaM.ComputeHash(Encoding.UTF8.GetBytes(request));
becomes in PHP: $sha1Request = sha1($request, true);
Am I right? If not, please help me convert this line to PHP.
Thanks a lot.
Note that SHA-1 should no longer be used for security-relevant applications; it is considered broken, since practical collision attacks exist.
C# Version:
string text = "<Hällo World>";
byte[] sha1;
using (var shaM = new SHA1Managed())
{
sha1 = shaM.ComputeHash(Encoding.UTF8.GetBytes(text));
}
string encoded = Convert.ToBase64String(sha1);
Console.Write(encoded);
PHP Version:
$text = "<Hällo World>";
// Encode as UTF8 if necessary (May not be necessary if string is already utf-8)
$text = utf8_encode($text);
// Calculate SHA1
$sha1 = sha1($text, TRUE);
// Convert to Base64
$encoded = base64_encode($sha1);
echo($encoded);
Both versions should output
1nSiStZRa/quRru7Sqe+ejupqfs=
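Note that the call to utf8_encode should only be there if the string you work with is not already UTF-8 encoded. utf8_encode specifically converts from ISO-8859-1 (Latin-1) to UTF-8, and it is deprecated as of PHP 8.2; mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1') performs the same conversion.
If the string is a literal in a *.php file, this depends on how the file is stored on disk (which character set it uses).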
If the string is retrieved from a web request or from a database or from reading a file, this also depends on what character set the web form, the database or the external file use.
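On the C# side, Encoding.UTF8.GetBytes fixes which bytes get hashed; PHP strings are already byte strings, so sha1($text, true) hashes whatever bytes the string holds. A small sketch (helper names are hypothetical) showing why the encoding step matters: the same text gives different bytes, and therefore a different SHA-1, under UTF-8 and ISO-8859-1.

```csharp
using System;
using System.Security.Cryptography;

public static class Sha1Bytes
{
    // SHA-1 of the given bytes, Base64-encoded (same shape as the C# in the question).
    public static string Sha1Base64(byte[] data)
    {
        using (SHA1 sha1 = SHA1.Create())
            return Convert.ToBase64String(sha1.ComputeHash(data));
    }

    // Lowercase hex, the format PHP's sha1($text) returns when its second argument is false.
    public static string Sha1Hex(byte[] data)
    {
        using (SHA1 sha1 = SHA1.Create())
            return BitConverter.ToString(sha1.ComputeHash(data)).Replace("-", "").ToLowerInvariant();
    }
}
```

For plain ASCII input the encoding choice makes no difference; it only matters once the text contains characters such as "ä", which is one byte in ISO-8859-1 and two in UTF-8.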

Convert image URL into image bytes in ASP.NET

for (int i = 0; i < ((DataTable)ViewState["Table_RemarksDetails"]).Rows.Count; i++)
{
string url = Image1.ImageUrl;
Byte[] imgByte = GetBytesFromUrl(url);
obj_ICCommon.Userid = username;
obj_ICCommon.Modules = "Transfer-TO";
obj_ICCommon.Invoiceno = TransferNo;
obj_ICCommon.Comments = ((DataTable)ViewState["Table_RemarksDetails"]).Rows[i]["Comments"].ToString();
//obj_ICCommon.Image = imgByte;
string Result = obj_ICCommon.funinsertRemarks();
fun_InsertRemarksImage(TransferNo, imgByte, ((DataTable)ViewState["Table_RemarksDetails"]).Rows[i]["Comments"].ToString());
}
static public byte[] GetBytesFromUrl(string url)
{
byte[] b;
System.Net.HttpWebRequest myReq = (System.Net.HttpWebRequest)System.Net.WebRequest.Create(url);
System.Net.WebResponse myResp = myReq.GetResponse();
Stream stream = myResp.GetResponseStream();
//int i;
using (BinaryReader br = new BinaryReader(stream))
{
//i = (int)(stream.Length);
b = br.ReadBytes(500000);
br.Close();
}
myResp.Close();
return b;
}
This code throws an exception: url not recognized. My URL is in a format like "data://image.png". What should I do? I have been struggling with this for the past 2 days.
I just want to convert the URL into bytes. If my code is wrong, or if there is another way to convert the URL into bytes, please let me know. Thanks in advance.
If you want to turn a data URI into image bytes, your input string is not sufficient: "data://image.png" is not a data URI. A data URI has the form data:[<mediatype>][;base64],<data>. Given such a URI, you would need to:
1. Trim the data: prefix from the front.
2. Split the part before the first , at the ; character to get the image format and the encoding type; the image data itself is everything after the comma.
3. If the encoding type is base64, use the data obtained in step 2 to decode the image string back into a byte array.
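The parsing steps above can be sketched in C# as follows. This is a minimal sketch under stated assumptions: the class name DataUri is hypothetical, and only the base64 form is decoded (percent-encoded payloads are rejected rather than handled).

```csharp
using System;

public static class DataUri
{
    // Splits "data:[<mediatype>][;base64],<data>" into its media type and decoded bytes.
    public static byte[] Decode(string uri, out string mediaType)
    {
        const string prefix = "data:";
        if (uri == null || !uri.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
            throw new FormatException("Not a data URI.");

        int comma = uri.IndexOf(',');
        if (comma < 0)
            throw new FormatException("Missing ',' before the payload.");

        // Header is e.g. "image/png;base64"; the payload is everything after the comma.
        string header = uri.Substring(prefix.Length, comma - prefix.Length);
        string payload = uri.Substring(comma + 1);

        string[] parts = header.Split(';');
        // RFC 2397: an omitted media type means text/plain;charset=US-ASCII.
        mediaType = parts[0].Length > 0 ? parts[0] : "text/plain;charset=US-ASCII";

        bool isBase64 = parts.Length > 1 &&
            string.Equals(parts[parts.Length - 1], "base64", StringComparison.OrdinalIgnoreCase);
        if (!isBase64)
            throw new NotSupportedException("Only base64 data URIs are handled in this sketch.");

        return Convert.FromBase64String(payload);
    }
}
```

With the question's input, Decode("data://image.png", out _) throws a FormatException, since there is no comma and therefore no payload to decode.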
If, on the other hand, you want to turn your image into its corresponding base64 data URI, you would need to make some changes to your code:
1. Decide which URLs you are going to read. If they are web URLs (http and https), your code should be fine, since you are dealing with web requests. If they are actual file paths, you need to handle the file URI, so your code would be slightly different.
2. Once you have your file, load it as a byte array.
3. Create a string with the following prefix: data:image/png;base64,
4. Append the base64 representation of the byte array you got in step 2.
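The second approach (load the bytes, then prepend the data:image/png;base64, prefix) is short enough to sketch directly. The class and method names are hypothetical, the PNG media type is an assumed default, and FromFile covers only the local-file case; for http/https URLs the bytes would come from a web request instead.

```csharp
using System;
using System.IO;

public static class DataUriBuilder
{
    // "data:<mime>;base64," followed by the Base64 text of the bytes.
    public static string FromBytes(byte[] imageBytes, string mimeType = "image/png") =>
        "data:" + mimeType + ";base64," + Convert.ToBase64String(imageBytes);

    // Local file path (not an http/https URL): load the file as a byte array first.
    public static string FromFile(string path, string mimeType = "image/png") =>
        FromBytes(File.ReadAllBytes(path), mimeType);
}
```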

AddAttachment from MemoryStream

The SendGrid API docs specify you can add attachments from a Stream. The example it gives uses a FileStream object.
I have some blobs in Azure Storage which I'd like to email as attachments. To achieve this I'm trying to use a MemoryStream:
var getBlob = blobContainer.GetBlobReferenceFromServer(fileUploadLink.Name);
if(getBlob != null)
{
// Get file as a stream
MemoryStream memoryStream = new MemoryStream();
getBlob.DownloadToStream(memoryStream);
emailMessage.AddAttachment(memoryStream, fileUploadLink.Name);
}
emailTransport.Deliver(emailMessage);
It sends fine but when the email arrives, the attachment appears to be there but it's actually empty. Looking at the email source, there is no content for the attachment.
Is using a MemoryStream a known limitation when using the SendGrid C# API to send attachments? Or should I be approaching this in some other way?
You probably just need to reset the stream position back to 0 after you call DownloadToStream:
var getBlob = blobContainer.GetBlobReferenceFromServer(fileUploadLink.Name);
if (getBlob != null)
{
var memoryStream = new MemoryStream();
getBlob.DownloadToStream(memoryStream);
memoryStream.Seek(0,SeekOrigin.Begin); // Reset stream back to beginning
emailMessage.AddAttachment(memoryStream, fileUploadLink.Name);
}
emailTransport.Deliver(emailMessage);
You might also want to check who cleans up the stream; if nobody does, you should dispose of it after you've called Deliver().
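The reason the reset matters: after DownloadToStream writes into the MemoryStream, its Position sits at the end, so anything reading from the current position sees no data. A small offline sketch with a hypothetical helper; it assumes the attachment code reads from the stream's current position, which is what the fix above implies rather than something confirmed from the SendGrid source.

```csharp
using System.IO;

public static class StreamPositionDemo
{
    // Counts the bytes a consumer would get by reading from the current position to the end.
    public static int BytesReadFromCurrentPosition(Stream stream)
    {
        var buffer = new byte[4096];
        int total = 0, read;
        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
            total += read;
        return total;
    }
}
```

A MemoryStream constructed over an existing byte[], as in the byte-array workaround, starts at position 0, which is why that variant works without an explicit Seek.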
According to their API, they have implemented void AddAttachment(Stream stream, String name).
You are probably using a MemoryStream which you have written to before. I suggest resetting the position inside the stream to the beginning, like:
memoryStream.Seek(0, SeekOrigin.Begin);
I ended up with the following which fixed the issue for me:
fileByteArray = new byte[getBlob.Properties.Length];
getBlob.DownloadToByteArray(fileByteArray, 0);
attachmentFileStream = new MemoryStream(fileByteArray);
emailMessage.AddAttachment(attachmentFileStream, fileUploadLink.Name);
The thread is a bit old, but I use a variant with the NReco PDF converter:
private async Task SendGridasyncBid(string from, string to, string displayName, string subject, byte[] PDFBody, string TxtBody, string HtmlBody)
{
...
var myStream = new System.IO.MemoryStream(PDFBody);
myStream.Seek(0, SeekOrigin.Begin);
myMessage.AddAttachment(myStream, "NewBid.pdf");
...
}
Convert the HTML to PDF and return the bytes instead of writing them out for download:
private byte[] getHTML(newBidViewModel model)
{
string strHtml = ...;
HtmlToPdfConverter pdfConverter = new HtmlToPdfConverter();
pdfConverter.CustomWkHtmlArgs = "--page-size Letter";
var pdfBytes = pdfConverter.GeneratePdf(strHtml);
return pdfBytes;
}
I am not sure how efficient this is, but it is working for me and I hope it helps someone else get their attachments figured out.

Get filename while downloading it

We are providing files that are saved in our database and the only way to retrieve them is by going by their id as in:
www.AwesomeURL.com/AwesomeSite.aspx?requestedFileId=23
Everything is working fine, as I am using the WebClient class.
There's only one issue that I am facing:
How can I get the real filename?
My code looks like this at the moment:
WebClient client = new WebClient ();
string url = "www.AwesomeURL.com/AwesomeSite.aspx?requestedFileId=23";
client.DownloadFile(url, "IDontKnowHowToGetTheRealFileNameHere.txt");
All I know is the id.
This does not happen when I access the URL from the browser, where it gets the proper name => DownloadedFile.xls.
What's the proper way to get the correct response?
I had the same problem, and I found this class: System.Net.Mime.ContentDisposition.
using (WebClient client = new WebClient()){
client.OpenRead(url);
string header_contentDisposition = client.ResponseHeaders["content-disposition"];
string filename = new ContentDisposition(header_contentDisposition).FileName;
...do stuff...
}
The class documentation suggests it's intended for email attachments, but it works fine on the server I used to test, and it's really nice to avoid the parsing.
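Since the header may be missing, a fallback name is worth having. A sketch of the same ContentDisposition approach wrapped in a hypothetical helper; the fallback value is up to the caller, and the header string would come from client.ResponseHeaders["content-disposition"].

```csharp
using System;
using System.Net.Mime;

public static class DispositionFileName
{
    // Returns the filename parameter of a Content-Disposition header value,
    // or the fallback when the header is missing, has no filename, or can't be parsed.
    public static string GetFileName(string contentDispositionHeader, string fallback)
    {
        if (string.IsNullOrWhiteSpace(contentDispositionHeader))
            return fallback;
        try
        {
            string name = new ContentDisposition(contentDispositionHeader).FileName;
            return string.IsNullOrEmpty(name) ? fallback : name;
        }
        catch (FormatException)
        {
            return fallback; // header present but not in a form this parser accepts
        }
    }
}
```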
Here is the full code required, assuming the server has set the Content-Disposition header:
using (WebClient client = new WebClient())
{
using (Stream rawStream = client.OpenRead(url))
{
string fileName = string.Empty;
string contentDisposition = client.ResponseHeaders["content-disposition"];
if (!string.IsNullOrEmpty(contentDisposition))
{
string lookFor = "filename=";
int index = contentDisposition.IndexOf(lookFor, StringComparison.CurrentCultureIgnoreCase);
if (index >= 0)
fileName = contentDisposition.Substring(index + lookFor.Length);
}
if (fileName.Length > 0)
{
using (StreamReader reader = new StreamReader(rawStream))
{
File.WriteAllText(Server.MapPath(fileName), reader.ReadToEnd());
reader.Close();
}
}
rawStream.Close();
}
}
If the server did not set this header, try debugging and see which ResponseHeaders you do have; one of them will probably contain the name you want. If the browser shows the name, it must come from somewhere. :)
You need to look at the content-disposition header, via:
string disposition = client.ResponseHeaders["content-disposition"];
a typical example would be:
"attachment; filename=IDontKnowHowToGetTheRealFileNameHere.txt"
I achieved this using wst's code.
Here is the full code to download the file at the URL into the c:\temp folder:
public static void DownloadFile(string url)
{
using (WebClient client = new WebClient())
{
client.OpenRead(url);
string header_contentDisposition = client.ResponseHeaders["content-disposition"];
string filename = new ContentDisposition(header_contentDisposition).FileName;
//Start the download and copy the file to the destinationFolder
client.DownloadFile(new Uri(url), @"c:\temp\" + filename);
}
}
You can use HTTP content-disposition header to suggest filenames for the content you are providing:
Content-Disposition: attachment; filename=downloadedfile.xls;
So, in your AwesomeSite.aspx script, you would set the content-disposition header. In your WebClient class you would retrieve that header to save the file as suggested by your AwesomeSite site.
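On the server side, the header value can be built with the same System.Net.Mime.ContentDisposition class instead of being concatenated by hand. A sketch with a hypothetical helper name; in the .aspx page the result would be passed to something like Response.AddHeader("Content-Disposition", value) before the file bytes are written. The exact text ToString produces (for example, whether the name gets quoted) is not relied on here; the test only checks that it parses back to the same name.

```csharp
using System.Net.Mime;

public static class DispositionHeader
{
    // Builds a Content-Disposition value along the lines of: attachment; filename=DownloadedFile.xls
    public static string ForAttachment(string fileName) =>
        new ContentDisposition
        {
            DispositionType = DispositionTypeNames.Attachment,
            FileName = fileName
        }.ToString();
}
```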
Although the solution proposed by Shadow Wizard works well for text files, I needed to support downloading binary files, such as pictures and executables, in my application.
Here is a small WebClient extension that does the trick. The download is asynchronous. A default file name is also required, because we can't be sure the server will send the right headers.
static class WebClientExtensions
{
public static async Task<string> DownloadFileToDirectory(this WebClient client, string address, string directory, string defaultFileName)
{
if (!Directory.Exists(directory))
throw new DirectoryNotFoundException("Downloads directory must exist");
string filePath = null;
using (var stream = await client.OpenReadTaskAsync(address))
{
var fileName = TryGetFileNameFromHeaders(client);
if (string.IsNullOrWhiteSpace(fileName))
fileName = defaultFileName;
filePath = Path.Combine(directory, Path.GetFileName(fileName)); // drop any directory parts a server might send
await WriteStreamToFile(stream, filePath);
}
return filePath;
}
private static string TryGetFileNameFromHeaders(WebClient client)
{
// content-disposition might contain the suggested file name, typically the same as the original name on the server
// Originally content-disposition is for email attachments, but web servers also use it.
string contentDisposition = client.ResponseHeaders["content-disposition"];
return string.IsNullOrWhiteSpace(contentDisposition) ?
null :
new ContentDisposition(contentDisposition).FileName;
}
private static async Task WriteStreamToFile(Stream stream, string filePath)
{
// Code below will throw generously, e. g. when we don't have write access, or run out of disk space
using (var outStream = new FileStream(filePath, FileMode.CreateNew))
{
var buffer = new byte[8192];
while (true)
{
int bytesRead = await stream.ReadAsync(buffer, 0, buffer.Length);
if (bytesRead == 0)
break;
// Could use async variant here as well. Probably helpful when downloading to a slow network share or tape. Not my use case.
outStream.Write(buffer, 0, bytesRead);
}
}
}
}
Ok, my turn.
I had a few things in mind when I tried to "download the file":
Use only HttpClient. I had a couple of extension methods over it, and it wasn't desirable to create other extensions for WebClient.
It was also mandatory for me to get the file name.
I had to write the result to a MemoryStream rather than a FileStream.
Solution
So, for me, it turned out to be this code:
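// assuming that httpClient has already been created (including the cumbersome authentication setup)
var response = await httpClient.GetAsync(absoluteURL); // call the external API
response.EnsureSuccessStatusCode(); // throws HttpRequestException on a non-success status code
// reading file name from HTTP headers; ContentDisposition is null if the server didn't send one
var contentDisposition = response.Content.Headers.ContentDisposition;
var fileName = contentDisposition?.FileNameStar ?? contentDisposition?.FileName; // FileName may still include quotes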
// reading file as a byte array
var fileBiteArr = await response.Content
.ReadAsByteArrayAsync()
.ConfigureAwait(false);
var memoryStream = new MemoryStream(fileBiteArr); // memory streamed file
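Picking the name out of ContentDispositionHeaderValue has two details worth isolating: FileNameStar (the RFC 5987 filename* form) comes back already decoded, while FileName returns the raw parameter value, which can still carry the surrounding quotes. A sketch with a hypothetical helper and caller-supplied fallback:

```csharp
using System.Net.Http.Headers;

public static class HttpDispositionName
{
    // Prefers filename* (decoded by FileNameStar), then filename with any quotes trimmed,
    // then the caller's fallback when the header or both parameters are missing.
    public static string GetFileName(ContentDispositionHeaderValue disposition, string fallback)
    {
        if (disposition == null)
            return fallback; // server sent no Content-Disposition header
        string name = disposition.FileNameStar;
        if (string.IsNullOrEmpty(name))
            name = disposition.FileName?.Trim('"');
        return string.IsNullOrEmpty(name) ? fallback : name;
    }
}
```

With the code above, this would be called as HttpDispositionName.GetFileName(response.Content.Headers.ContentDisposition, "download.bin").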
Test
To check that the stream contains what we expect, we can write it to a file:
// getting the "Downloads" folder location, can be anything else
string pathUser = Environment.GetFolderPath(Environment.SpecialFolder.UserProfile);
string downloadPath = Path.Combine(pathUser, "Downloads");
using (FileStream file =
new FileStream(
Path.Combine(downloadPath, "file.pdf"),
FileMode.Create,
FileAccess.Write))
{
memoryStream.Position = 0; // copy from the beginning of the stream
memoryStream.CopyTo(file); // writes the whole buffer without a manual byte[] copy
memoryStream.Close();
}
