I have created a simple WCF service to prototype file uploading. The service:
[ServiceContract]
public class Service1
{
[OperationContract]
[WebInvoke(Method = "POST", UriTemplate = "/Upload")]
public void Upload(Stream stream)
{
using (FileStream targetStream = new FileStream(#"C:\Test\output.txt", FileMode.Create, FileAccess.Write))
{
stream.CopyTo(targetStream);
}
}
}
It uses webHttpBinding with transferMode set to "Streamed" and maxReceivedMessageSize, maxBufferPoolSize and maxBufferSize all set to 2GB. httpRuntime has maxRequestLength set to 10MB.
The client issues HTTP requests in the following way:
HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(#"http://.../Service1.svc/Upload");
request.Method = "POST";
request.SendChunked = true;
request.AllowWriteStreamBuffering = false;
request.ContentType = MediaTypeNames.Application.Octet;
using (FileStream inputStream = new FileStream(#"C:\input.txt", FileMode.Open, FileAccess.Read))
{
using (Stream outputStream = request.GetRequestStream())
{
inputStream.CopyTo(outputStream);
}
}
Now, finally, what's wrong:
When uploading the file 100MB big, the server returns HTTP 400 (Bad request). I've tried to enable WCF tracing, but it shows no error. When I increase httpRuntime.maxRequestLength to 1GB, the file gets uploaded without problems. The MSDN says that maxRequestLength "specifies the limit for the input stream buffering threshold, in KB".
This leads me to believe that the whole file (all 100MB of it) is first stored in "input stream buffer" and only then it is available to my Upload method on server. I can actually see that the size of file on server does not gradually increase (as I would expect), instead, in the moment it is created it is already 100MB big.
The question: How can I get this to work so that the "input stream buffer" is reasonably small (say, 1MB) and when it overflows, my Upload method gets called? In other words, I want the upload to be truly streamed without having to buffer the whole file anywhere.
EDIT:
I now discovered the httpRuntime contains another setting that is relevant here - requestLengthDiskThreshold. It seems that when the input buffer grows beyond this threshold, it is no longer stored in memory, but instead, on filesystem. So at least the whole 100MB big file is not kept in memory (this is what I was most afraid of), however, I still would like to know whether there is some way to avoid this buffer altogether.
If you are using .NET 4 and hosting your service in IIS7+, you may be affected an ASP.NET bug which is described in the following blog post:
http://blogs.microsoft.co.il/blogs/idof/archive/2012/01/17/what-s-new-in-wcf-4-5-improved-streaming-in-iis-hosting.aspx
Basically, for streamed requests, the ASP.NET handler in IIS will buffer the whole request before handing over control to WCF. And this handler obeys the maxRequestLength limit.
As far as I know, there is no workaround for the bug and you have the following options:
upgrade to .NET 4.5
self-host your service instead of using IIS
use a binding that is not based on HTTP, so that the ASP.NET handler is not involved
This may be a bug in the streaming implementation. I found a MSDN article that suggests doing exactly what you are describing at http://social.msdn.microsoft.com/Forums/en-US/wcf/thread/fb9efac5-8b57-417e-9f71-35d48d421eb4/. Unfortunately the Microsoft employee suggesting the fix found a bug in the implementation and didn't follow up with details on a fix.
That said it looks like the implementation is broken which you could test by profiling your code with a memory profiler and verifying whether or not the entire file is being stored in memory. If the entire file is being stored in memory, you won't be able to fix this issue, unless somebody finds a configuration issue with your code.
That said, while using requestLengthDiskThreshold could technically work, it will dramatically increase your write times as each file will have to be written first as temp data, read from temp data, written again as final, and finally the temp data deleted. As you have already said you are dealing with extremely large files so I doubt such a solution is acceptable.
Your best bet is to use a chunking framework and manually reconstruct the file. I found instructions on how to write such logic at http://aspilham.blogspot.com/2011/03/file-uploading-in-chunks-using.html but have not had the time to check it for accuracy.
I'm sorry I can't tell you why your code isn't working as documented, but something similar to the 2nd example should be able to work without ballooning your memory footprint.
Related
Context: I've been trying to find a "proper" solution to upload a file throw an API endpoint and upload the stream from request.Body without using the "IFormFile" so that I will have no data binding or load on the memory(if possible).
I tried https://trailheadtechnology.com/dos-and-donts-for-streaming-file-uploads-to-azure-blob-storage-with-net-mvc/ and this https://learn.microsoft.com/en-us/aspnet/core/mvc/models/file-uploads?view=aspnetcore-6.0#upload-large-files-with-streaming
but I got stuck on issues like this "Multipart body length limit 16384 exceeded" or MultipartReader with currentStream = 0.
Next thing I want to try is to somehow get from request.Body the stream and only send the stream without boundary and those headers.
My question is: What would be a better solution for streaming file upload to AWS.S3 of big files without having big load on memory?
I am working on an application where file uploads happen often, and can be pretty large in size.
Those files are being uploaded to a Web API, which will then get the Stream from the request, and pass it on to my storage service, that then uploads it to Azure Blob Storage.
I need to make sure that:
No temp files are written on the Web API instance
The request stream is not fully read into memory before passing it on to the storage service (to prevent OutOfMemoryExceptions).
I've looked at this article, which describes how to disable input stream buffering, but because many file uploads from many different users happen simultaneously, it's important that it actually does what it says on the tin.
This is what I have in my controller at the moment:
if (this.Request.Content.IsMimeMultipartContent())
{
var provider = new MultipartMemoryStreamProvider();
await this.Request.Content.ReadAsMultipartAsync(provider);
var fileContent = provider.Contents.SingleOrDefault();
if (fileContent == null)
{
throw new ArgumentException("No filename.");
}
var fileName = fileContent.Headers.ContentDisposition.FileName.Replace("\"", string.Empty);
// I need to make sure this stream is ready to be processed by
// the Azure client lib, but not buffered fully, to prevent OoM.
var stream = await fileContent.ReadAsStreamAsync();
}
I don't know how I can reliably test this.
EDIT: I forgot to mention that uploading directly to Blob Storage (circumventing my API) won't work, as I am doing some size checking (e.g. can this user upload 500mb? Has this user used his quota?).
Solved it, with the help of this Gist.
Here's how I am using it, along with a clever "hack" to get the actual file size, without copying the file into memory first. Oh, and it's twice as fast
(obviously).
// Create an instance of our provider.
// See https://gist.github.com/JamesRandall/11088079#file-blobstoragemultipartstreamprovider-cs for implementation.
var provider = new BlobStorageMultipartStreamProvider ();
// This is where the uploading is happening, by writing to the Azure stream
// as the file stream from the request is being read, leaving almost no memory footprint.
await this.Request.Content.ReadAsMultipartAsync(provider);
// We want to know the exact size of the file, but this info is not available to us before
// we've uploaded everything - which has just happened.
// We get the stream from the content (and that stream is the same instance we wrote to).
var stream = await provider.Contents.First().ReadAsStreamAsync();
// Problem: If you try to use stream.Length, you'll get an exception, because BlobWriteStream
// does not support it.
// But this is where we get fancy.
// Position == size, because the file has just been written to it, leaving the
// position at the end of the file.
var sizeInBytes = stream.Position;
Voilá, you got your uploaded file's size, without having to copy the file into your web instance's memory.
As for getting the file length before the file is uploaded, that's not as easy, and I had to resort to some rather non-pleasant methods in order to get just an approximation.
In the BlobStorageMultipartStreamProvider:
var approxSize = parent.Headers.ContentLength.Value - parent.Headers.ToString().Length;
This gives me a pretty close file size, off by a few hundred bytes (depends on the HTTP header I guess). This is good enough for me, as my quota enforcement can accept a few bytes being shaved off.
Just for showing off, here's the memory footprint, reported by the insanely accurate and advanced Performance Tab in Task Manager.
Before - using MemoryStream, reading it into memory before uploading
After - writing directly to Blob Storage
I think a better approach is for you to go directly to Azure Blob Storage from your client. By leveraging the CORS support in Azure Storage you eliminate load on your Web API server resulting in better overall scale for your application.
Basically, you will create a Shared Access Signature (SAS) URL that your client can use to upload the file directly to Azure storage. For security reasons, it is recommended that you limit the time period for which the SAS is valid. Best practices guidance for generating the SAS URL is available here.
For your specific scenario check out this blog from the Azure Storage team where they discuss using CORS and SAS for this exact scenario. There is also a sample application so this should give you everything you need.
My client is uploading file more then 1 GB through application. I know i can only upload only 100mb using asp.net MVC application.
public static byte[] ReadStream(Stream st)
{
st.Position = 0;
byte[] data = new byte[st.Length];
.
.
.
.
}
i am getting error at byte[] data = new byte[st.Length]; because st.Length=1330768612
Error - "Exception of type 'System.OutOfMemoryException' was thrown."
Is there any way i can upload more then 1gb file?
Why we can define maxRequestLength= 0 - 2097151 in webconfig,
IMO you need to use the right tool for the job. Http was simply not intended to transfer large files like this. Why dont you use ftp instead, and maybe you could then build a web interface around that.
The error shown to you suggests the server has not enough memory to process the file in memory. Validate if your server has enough memory to allocate such a big array/file.
You could also try to process chuncks of the stream. The fact that you get an out of memory suggests that the file is sent to the server, but the server cannot process the file.
I really think it has to do with the size of the array you allocate. It just won't fit in the memory of you machine (of in the memory assigned to .NET).
The error says that you run out of memory while trying to allocate a 1GB byte array in memory. This is not related to MVC. You should also note that the memory limit for 32bit processes is 2GB. If your server runs a 32bit OS and you allocate 1GB of that for a single upload you will quickly deplete the available memory.
Instead of trying to read the entire stream in memory, use Stream.Read to read the data in chuncks using a reasonably sized buffer and store the chuncks to a file stream with a Write call. Not only will you avoid OutOfMemoryExceptions, your code will also run much faster, because you won't have to wait to load the entire 1GB before storing it to a file.
The code can be as simple as this:
public static void SaveStream(Stream st,string targetFile)
{
byte[] inBuffer = new byte[10000];
using(FileStream outStream=File.Create(targetFile,20000))
using (BinaryWriter wr = new BinaryWriter(outStream))
{
st.Read(inBuffer, 0, inBuffer.Length);
wr.Write(inBuffer);
}
}
You can tweak the buffer sizes to balance throughput (how quickly you upload and save) vs scalability (how many clients you can handle).
I remember on an older project we had to work out a way to allow the user to upload a 2-4gb file for our ASP.NET web application.
If I recall correctly, we used the 'File Upload Control' and edited the web.config to allow for a greater file size:
<system.web>
<httpRuntime executionTimeout="2400" maxRequestLength="40960" />
</system.web>
Another option would be to use the HttpModule:
http://dotnetslackers.com/Community/blogs/haissam/archive/2008/09/12/upload-large-files-in-asp-net-using-httpmodule.aspx
I would suggest using FTP or write a small desktop application.
HTTP was never intended to send such large files.
Here's a Microsoft Knowledge base answer for you
I have a 5Mb pdf on the server dowloading this file using a writeFile gives me a 15Mb download, where as the transmitfile gives the correct 5Mb filesize...
Is this due to some sort of uncompression into memory on the server for the writeFile? Just wonder if anyone had seen the same thing happening...
(ps only noticed it since we went to iis7??)
code being...
if (File.Exists(filepath))
{
HttpContext.Current.Response.Clear();
HttpContext.Current.Response.ContentType = "application/octet-stream";
HttpContext.Current.Response.AddHeader("content-disposition","attachment;filename=\""+Path.GetFileName(filepath)+"\"");
HttpContext.Current.Response.AddHeader("content-length", new FileInfo(filepath).Length.ToString());
//HttpContext.Current.Response.WriteFile(filepath);
HttpContext.Current.Response.TransmitFile(filepath);
HttpContext.Current.Response.Flush();
HttpContext.Current.Response.Close();
}
TransmitFile - Writes the specified file directly to an HTTP response output stream without buffering it in memory.
WriteFile - Writes the specified file directly to an HTTP response output stream.
I would say the difference occurs because Transmit file doesn't buffer it. Write file is using buffering (Afiak), basically temporarily holding the data before transmitting it, as such it cannot guess the accurate file size because its writing it in chunks.
You can understand by following definition.
Response.TransmitFile VS Response.WriteFile:
TransmitFile: This method sends the file to the client without loading it to the Application memory on the server. It is the ideal way to use it if the file size being download is large.
WriteFile: This method loads the file being download to the server's memory before sending it to the client. If the file size is large, you might the ASPNET worker process might get restarted.*
Reference :- TransmitFile VS WriteFile
So, I'm actually trying to setup a Wopi Host for a Web project.
I've been working with this sample (the one from Shawn Cicoria, if anyone knows this), and he provides a whole code sample which tells you how to build the links to use your Office Web App servers with some files.
My problem here, is that his sample is working with files that are ON the OWA server, and i need it to work with online files (like http://myserv/res/test.docx. So when he reads his file content, he's using this :
var stream = new FileStream(myFile, FileMode.Open, FileAccess.Read);
responseMessage.Content = new StreamContent(stream);
But that ain't working on "http" files, so i changed it with this :
byte[] tmp;
using (WebClient client = new WebClient())
{
client.Credentials = CredentialCache.DefaultNetworkCredentials;
tmp = client.DownloadData(name);
}
responseMessage.Content = new ByteArrayContent(tmp);
which IS compiling. And with this sample, i managed to open excel files in my office web app, but words and powerpoint files aren't opened. So, here's my question.
Is there a difference between theses two methods, which could alter the content of the files that i'm reading, despite the fact that the WebClient alows "online reading" ?
Sorry for the unclear post, it's not that easy to explain such a problem x) I did my best.
Thanks four your help !
Is there a difference between theses two methods, which could alter
the content of the files that i'm reading, despite the fact that the
WebClient allows "online reading"
FileStream open a file handle to a file placed locally on disk, or a remote disk sitting elsewhere inside a network. When you open a FileStream, you're directly manipulating that particular file.
On the other hand, WebClient is a wrapper around the HTTP protocol. It's responsibility is to construct HTTP request and response messages, allowing you to conveniently work with them. It has no direct knowledge of a resources such as a file, or particularly where it's located. All it knows is to construct message complying with the specification, sends a request and expects a response.