Download files from Azure Data Lake - C#

I upload my files to Azure Data Lake and try to download them through an ASP.NET MVC application. I have the ADL path for each file. I can download files smaller than 150 MB, but downloads of files larger than 150 MB fail with a timeout error.
My code is below:
public ActionResult Download(string adlpath)
{
    String header = adlpath;
    Console.WriteLine(header);
    string[] splitedStr = header.Split('/');
    var path = GenerateDownloadPaths(adlpath);
    string filename = path["fileName"];
    HttpResponseMessage val = DataDownloadFile(path["fileSrcPath"]);
    byte[] filedata = val.Content.ReadAsByteArrayAsync().Result;
    string contentType = MimeMapping.GetMimeMapping(filename);
    var cd = new System.Net.Mime.ContentDisposition
    {
        FileName = filename,
        Inline = true,
    };
    Response.AppendHeader("Content-Disposition", cd.ToString());
    return File(filedata, contentType);
}
public HttpResponseMessage DataDownloadFile(string srcFilePath)
{
    string DownloadUrl = "https://{0}.azuredatalakestore.net/webhdfs/v1/{1}?op=OPEN&read=true";
    var fullurl = string.Format(DownloadUrl, _datalakeAccountName, srcFilePath);
    HttpResponseMessage resp = null;
    using (var client = new HttpClient())
    {
        client.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", _accesstoken.access_token);
        using (var formData = new MultipartFormDataContent())
        {
            resp = client.GetAsync(fullurl).Result;
        }
    }
    return resp;
}

You should modify your code to use async and await. Your implementation blocks while retrieving the file and that is probably what times out:
public async Task<HttpResponseMessage> DataDownloadFile(string srcFilePath)
{
    string DownloadUrl = "https://{0}.azuredatalakestore.net/webhdfs/v1/{1}?op=OPEN&read=true";
    var fullurl = string.Format(DownloadUrl, _datalakeAccountName, srcFilePath);
    HttpResponseMessage resp = null;
    using (var client = new HttpClient())
    {
        client.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", _accesstoken.access_token);
        using (var formData = new MultipartFormDataContent())
        {
            resp = await client.GetAsync(fullurl);
        }
    }
    return resp;
}
The return value of the method is changed to Task<HttpResponseMessage> and the async modifier is added.
Calling client.GetAsync is changed to use await instead of blocking by retrieving the Result property.
Your code may still time out. I believe there is a configurable limit on how long a request can take before it is aborted, and if you still get a timeout you should investigate this.
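For completeness, the calling action has to become asynchronous as well, otherwise the blocking ReadAsByteArrayAsync().Result call remains. A minimal sketch, reusing the GenerateDownloadPaths helper from the question:
public async Task<ActionResult> Download(string adlpath)
{
    var path = GenerateDownloadPaths(adlpath);
    string filename = path["fileName"];
    // Await both the request and the body read instead of blocking on .Result
    HttpResponseMessage val = await DataDownloadFile(path["fileSrcPath"]);
    byte[] filedata = await val.Content.ReadAsByteArrayAsync();
    string contentType = MimeMapping.GetMimeMapping(filename);
    var cd = new System.Net.Mime.ContentDisposition
    {
        FileName = filename,
        Inline = true,
    };
    Response.AppendHeader("Content-Disposition", cd.ToString());
    return File(filedata, contentType);
}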

Per my understanding, you could try to increase the HttpClient.Timeout (100 seconds by default) for your HttpClient instance.
HttpClient.Timeout
Gets or sets the timespan to wait before the request times out.
The default value is 100,000 milliseconds (100 seconds).
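A minimal sketch of what that looks like in the question's DataDownloadFile; the ten-minute value is an arbitrary example, not a recommendation:
using (var client = new HttpClient())
{
    // Default is 100 seconds; large downloads may need considerably longer
    client.Timeout = TimeSpan.FromMinutes(10);
    client.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", _accesstoken.access_token);
    resp = await client.GetAsync(fullurl);
}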
Moreover, if you host your application in an Azure Web App, you may run into the 4-minute idle timeout of the Azure Load Balancer. You can change the idle timeout setting for Azure VMs and Azure Cloud Services; details you can follow here.

Related

Is it necessary to verify an uploaded file via checksum?

Let's say I upload really important files, such as contracts, via an API with HttpClient in .NET, using the following code:
using (var content = new MultipartFormDataContent())
{
    foreach (FileInfo fi in inputFiles)
    {
        content.Add(CreateFileContent(fi));
    }
    AwaitRateLimit();
    var response = await _Client.PostAsync("upload/", content);
    response.EnsureSuccessStatusCode();
    // deserialize
    string responseJson = await response.Content.ReadAsStringAsync();
    ClientResponse.Response decodedResponse =
        JsonSerializer.Deserialize<ClientResponse.Response>(responseJson);
}

private StreamContent CreateFileContent(FileInfo fileInfo)
{
    var fileContent = new StreamContent(fileInfo.OpenRead());
    fileContent.Headers.ContentDisposition = new ContentDispositionHeaderValue("form-data")
    {
        Name = "\"file\"",
        FileName = "\"" + fileInfo.Name + "\""
    };
    return fileContent;
}
Currently I do the following:
1. Send multiple files via POST in one request.
2. Download each file into RAM, build the SHA-256 checksum of it, and compare it against the local file's SHA-256 checksum.
Step 1 is really quick (a couple of seconds) even for a large quantity of files; step 2 takes at least 15 minutes, because each file can only be downloaded individually.
Therefore I would like to know whether you consider step 2 necessary, or whether HttpClient already handles this automatically.
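For context, here is a minimal sketch of the local half of step 2, streaming one file through SHA-256 (ComputeFileChecksum is a hypothetical helper name, not part of the question's code):
private static string ComputeFileChecksum(string filePath)
{
    using (var sha256 = System.Security.Cryptography.SHA256.Create())
    using (var stream = System.IO.File.OpenRead(filePath))
    {
        // Stream the file through the hash so large files are not loaded into RAM
        byte[] hash = sha256.ComputeHash(stream);
        return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
    }
}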

404 for a short period of time after uploading a file to Azure Blob Storage

I upload attachments to Azure Blob Storage. I also create a SAS token when uploading; I retrieve the unique URL and send it to the browser, where it is opened immediately.
When I do this I often (but not always) receive a 404 saying that the resource does not exist. When refreshing the page just a few seconds later, it is retrieved correctly.
So it seems that I am "too fast" after uploading the attachment. Is there a way to wait until Azure is ready to serve the attachment? I would have expected awaiting the call to be sufficient to achieve this.
public async void UploadFile(string filename, byte[] filecontent)
{
    var containerClient = _blobServiceclient.GetBlobContainerClient("attachments");
    var blobClient = containerClient.GetBlobClient(filename);
    using (var stream = new MemoryStream(filecontent))
    {
        await blobClient.UploadAsync(stream, new BlobHttpHeaders { ContentType = GetContentTypeByFilename(filename) });
    }
}
public async Task<string> GetLinkForFile(string filename)
{
    var containerClient = _blobServiceclient.GetBlobContainerClient("attachments");
    var sasBuilder = new BlobSasBuilder()
    {
        BlobContainerName = containerName,
        BlobName = filename,
        Resource = "b",
        StartsOn = DateTime.UtcNow.AddMinutes(-1),
        ExpiresOn = DateTime.UtcNow.AddMinutes(5),
    };
    // Specify read permissions
    sasBuilder.SetPermissions(BlobSasPermissions.Read);
    var credentials = new StorageSharedKeyCredential(_blobServiceclient.AccountName, _accountKey);
    var sasToken = sasBuilder.ToSasQueryParameters(credentials);
    // Construct the full URI, including the SAS token.
    UriBuilder fullUri = new UriBuilder()
    {
        Scheme = "https",
        Host = string.Format("{0}.blob.core.windows.net", _blobServiceclient.AccountName),
        Path = string.Format("{0}/{1}", containerName, filename),
        Query = sasToken.ToString()
    };
    return fullUri.ToString();
}
public async Task<Document> GetInvoice(byte[] invoiceContent, string invoiceFilename)
{
    string filePath = await GetLinkForFile(invoiceFilename);
    UploadFile(invoiceFilename, file);
    return new Document()
    {
        Url = filePath
    };
}
The method GetInvoice is called by REST and the response (containing the URL) returned to the browser where it is opened.
If you notice, you're not waiting for the upload operation to finish here:
_azureStorageRepository.UploadFile(invoiceFilename, file);
Please change this to:
await _azureStorageRepository.UploadFile(invoiceFilename, file);
And you should not see the 404 error. Azure Blob Storage is strongly consistent.
Also, change the UploadFile method from public async void UploadFile to public async Task UploadFile, as mentioned by @Fildor in the comments.
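Putting both fixes together, a sketch of the corrected signature and call site, using the names from the question's code:
// async Task instead of async void, so callers can await completion
public async Task UploadFile(string filename, byte[] filecontent)
{
    var containerClient = _blobServiceclient.GetBlobContainerClient("attachments");
    var blobClient = containerClient.GetBlobClient(filename);
    using (var stream = new MemoryStream(filecontent))
    {
        await blobClient.UploadAsync(stream, new BlobHttpHeaders { ContentType = GetContentTypeByFilename(filename) });
    }
}

public async Task<Document> GetInvoice(byte[] invoiceContent, string invoiceFilename)
{
    string filePath = await GetLinkForFile(invoiceFilename);
    await UploadFile(invoiceFilename, invoiceContent); // upload now completes before the URL is returned
    return new Document() { Url = filePath };
}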

Define headers for a file upload in an API

I am trying to upload an Excel file to convert it to JSON, but I need to pass it through an API Gateway, and I am having trouble passing the file through the API Gateway.
I tried setting the ContentDisposition, ContentLength, and ContentType headers manually.
using (var client = new HttpClient())
{
    using (var Content = new MultipartFormDataContent())
    {
        var name = Path.GetFileName(postedFile.FileName);
        HttpContent content = new StringContent("");
        content.Headers.Clear();
        content.Headers.ContentDisposition = new ContentDispositionHeaderValue("form-data")
        {
            Name = name,
            FileName = name
        };
        content.Headers.Add("Content-Length", postedFile.ContentLength.ToString());
        content.Headers.ContentType = new System.Net.Http.Headers.MediaTypeHeaderValue("multipart/form-data");
        Content.Add(content);
    }
    HttpResponseMessage reply = new HttpResponseMessage();
    reply = await client.GetAsync(@"http://localhost:60897/api/ExceltoJSONConversion");
    if (reply.IsSuccessStatusCode)
    {
        var responseString = await reply.Content.ReadAsStringAsync();
        return Json(JsonConvert.DeserializeObject(responseString));
    }
}
I have tried several variations of this code, but the reply always returns 405 MethodNotAllowed.
Here is my controller where I process the file:
[HttpPost]
[Route("api/ExceltoJSONConversion")]
public IHttpActionResult ExceltoJSONConversion()
{
    // Process the file from API Gateway
}
Am I missing something when defining the multipart/form-data headers, or is my code just a mess?
Your API method accepts POST requests only ([HttpPost] attribute).
And in your client you are calling the API through the GET method (client.GetAsync ...).
Either decorate your API method with [HttpGet] instead of [HttpPost], or change the client to use POST (client.PostAsync ...).
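A minimal sketch of the client changed to POST the multipart body; the StreamContent wrapping of postedFile is an assumption about where the file bytes come from:
using (var client = new HttpClient())
using (var content = new MultipartFormDataContent())
{
    var name = Path.GetFileName(postedFile.FileName);
    var fileContent = new StreamContent(postedFile.InputStream);
    fileContent.Headers.ContentType = new System.Net.Http.Headers.MediaTypeHeaderValue(postedFile.ContentType);
    // This Add overload writes the form-data Content-Disposition header for you
    content.Add(fileContent, "file", name);
    HttpResponseMessage reply = await client.PostAsync("http://localhost:60897/api/ExceltoJSONConversion", content);
    if (reply.IsSuccessStatusCode)
    {
        var responseString = await reply.Content.ReadAsStringAsync();
        return Json(JsonConvert.DeserializeObject(responseString));
    }
}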

Asynchronous API call, System.Threading.Tasks.TaskCanceledException: A task was canceled

I'm working on an application I inherited that makes asynchronous calls to an API. The application sends about 60 asynchronous requests to the API and then retrieves the results as they are ready. The API returns the results in the form of a zip archive object. It was using the following (abbreviated) code to retrieve results from the API, but this kept throwing intermittent System.Threading.Tasks.TaskCanceledException errors:
HttpResponseMessage response = client.SendAsync(requestMessage).Result;
Stream responseStream = response.Content.ReadAsStreamAsync().Result;
responseStream.Seek(0, 0);
za = new ZipArchive(responseStream, ZipArchiveMode.Read);
So I attempted to fix this by using await and implemented the following methods, but I'm still getting the same errors. I can check the status of my API requests through a website, so I know they're not being canceled by the API. The requests that fail do so in less than 5 minutes, so I know it's also not because the timeout value is too low on the HttpClient. This is my first crack at asynchronous programming, so if anyone can help with this, it would be greatly appreciated. Thanks.
public async Task<ZipArchive> GetExport(SourceTable sourceTable)
{
    ZipArchive zipArchive = null;
    switch (GetResultStatus(sourceTable))
    {
        case null:
            break;
        case "Completed":
        {
            zipArchive = await RetrieveResult(sourceTable);
        }
        break;
    }
    return zipArchive;
}

private async Task<ZipArchive> RetrieveResult(SourceTable sourceTable)
{
    Export export = sourceTable.exports[0];
    ZipArchive za = await RetrieveResultAsync(export);
    return za;
}

private async Task<ZipArchive> RetrieveResultAsync(Export export)
{
    ZipArchive za = null;
    var credentials = new NetworkCredential(userName, password);
    HttpClientHandler handler = new HttpClientHandler { Credentials = credentials };
    HttpClient client = new HttpClient(handler);
    client.Timeout.Add(new TimeSpan(0, 5, 0)); // 5 minutes
    HttpResponseMessage response = await client.GetAsync(restURL + "file/" + export.FileId);
    response.EnsureSuccessStatusCode();
    Stream responseStream = await response.Content.ReadAsStreamAsync();
    responseStream.Seek(0, 0);
    za = new ZipArchive(responseStream, ZipArchiveMode.Read);
    return za;
}
UPDATE: After adding some more logging to this code, I found that it was indeed a timeout issue and that I wasn't setting the timeout value correctly: client.Timeout.Add(...) returns a new TimeSpan without modifying client.Timeout, so the original line had no effect. Setting the value as below resolved the issue (together, of course, with using a higher timeout value than the default):
var credentials = new NetworkCredential(userName, password);
HttpClientHandler handler = new HttpClientHandler { Credentials = credentials };
HttpClient client = new HttpClient(handler);
client.Timeout = TimeSpan.FromMinutes(httpClientTimeout);

Error with a file upload in WCF Web API (Preview 6): Cannot write more bytes to the buffer than the configured maximum buffer size: 65536

I'm having a real problem with the WCF Web API.
I have a simple method that uploads a file and saves it to disk. I seem to have set all the right parameters, but I get the above error message when I try to upload a 2 MB file.
Server Code:
public static void RegisterRoutes(RouteCollection routes)
{
routes.IgnoreRoute("{resource}.axd/{*pathInfo}");
HttpServiceHostFactory _factory = new HttpServiceHostFactory();
var config = new HttpConfiguration()
{
EnableTestClient = true,
IncludeExceptionDetail = true,
TransferMode = System.ServiceModel.TransferMode.Streamed,
MaxReceivedMessageSize = 4194304,
MaxBufferSize = 4194304
};
_factory.Configuration = config;
RouteTable.Routes.Add(new ServiceRoute("api/docmanage", _factory, typeof(WorksiteManagerApi)));
}
Client:
HttpClientHandler httpClientHandler = new HttpClientHandler();
httpClientHandler.MaxRequestContentBufferSize = 4194304;
var byteArray =
    Encoding.ASCII.GetBytes(ConnectionSettings.WebUsername + ":" + ConnectionSettings.WebPassword);
HttpClient httpClient = new HttpClient(httpClientHandler);
httpClient.DefaultRequestHeaders.Authorization =
    new AuthenticationHeaderValue("Basic", Convert.ToBase64String(byteArray));
httpClient.BaseAddress = new Uri(ConnectionSettings.WebApiBaseUrl);
httpClient.MaxResponseContentBufferSize = 4194304;
...
multipartFormDataContent.Add(new FormUrlEncodedContent(formValues));
multipartFormDataContent.Add(byteArrayContent);
var postTask = httpClient.PostAsync("api/docmanage/UploadFile", multipartFormDataContent);
Then, on the server:
[WebInvoke(Method = "POST")]
public HttpResponseMessage UploadFile(HttpRequestMessage request)
{
    // Verify that this is an HTML Form file upload request
    if (!request.Content.IsMimeMultipartContent("form-data"))
    {
        throw new HttpResponseException(HttpStatusCode.UnsupportedMediaType);
    }
    // Create a stream provider for setting up output streams
    MultipartFormDataStreamProvider streamProvider = new MultipartFormDataStreamProvider();
    // Read the MIME multipart content using the stream provider we just created.
    IEnumerable<HttpContent> bodyparts = request.Content.ReadAsMultipart(streamProvider);
    foreach (var part in bodyparts)
    {
        switch (part.Headers.ContentType.MediaType)
        {
            case "application/octet-stream":
                if (part.Headers.ContentLength.HasValue)
                {
                    // BLOWS UP HERE:
                    var byteArray = part.ReadAsByteArrayAsync().Result;
                    if (null == fileName)
                    {
                        throw new HttpResponseException(HttpStatusCode.NotAcceptable);
                    }
                    else
                    {
                        uniqueFileId = Guid.NewGuid().ToString();
                        string tempFilename = Path.GetTempPath() + @"\" + uniqueFileId + fileName;
                        using (FileStream fileStream = File.Create(tempFilename, byteArray.Length))
                        {
                            fileStream.Write(byteArray, 0, byteArray.Length);
                        }
                    }
                }
                break;
        }
    }
}
Any ideas? I am using the latest preview of the Web API. I noticed a lot is missing from the support documentation, but it seems there is some buffer limit that I can't find out how to specify, or that is being ignored.
One of the things that is not made clear in the (lack of) documentation for the HttpContent class is that the default internal buffer is 64 KB, so any content that is not a stream will throw the exception you are seeing once the content exceeds 64 KB.
The way around it is to use the following:
part.LoadIntoBufferAsync(bigEnoughBufferSize).Wait();
var byteArray = part.ReadAsByteArrayAsync().Result;
I presume that the 64 KB limit on the HttpContent buffer is there to prevent too much memory allocation occurring on the server. I wonder whether you would be better off reading the content as a stream instead of a byte array? That way it should still work without having to increase the HttpContent buffer size.
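A sketch of that alternative inside the question's loop, copying the part to disk as a stream so nothing large is buffered in memory (tempFilename as in the question):
// Read the part as a stream and copy it straight to the temp file
using (Stream input = part.ReadAsStreamAsync().Result)
using (FileStream output = File.Create(tempFilename))
{
    input.CopyTo(output);
}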
Have you set the maxRequestLength in web.config? Note that the value is specified in kilobytes, so 10240 allows requests up to 10 MB:
<httpRuntime maxRequestLength="10240" />
I struggled with a similar problem with the WCF Web API for a few days: I was trying to post a 12 MB file and could not figure out what was going on. My server side was a WCF service hosted in IIS; my problem was not the WCF settings but what Toni mentioned. When hosting in IIS, remember to increase this setting. The MS documentation indicates that the default value is 4 MB, which explains why I could post files of 400 KB.
Hope this helps others with the same sort of trouble.
