How do I download a large file (via HTTP) in .NET? - c#

I need to download a large file (2 GB) over HTTP in a C# console application. Problem is, after about 1.2 GB, the application runs out of memory.
Here's the code I'm using:
WebClient request = new WebClient();
request.Credentials = new NetworkCredential(username, password);
byte[] fileData = request.DownloadData(baseURL + fName);
As you can see... I'm reading the file directly into memory. I'm pretty sure I could solve this if I were to read the data back from HTTP in chunks and write it to a file on disk.
How could I do this?

If you use WebClient.DownloadFile, it saves the data directly to a file.
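For example, a minimal sketch reusing the question's username, password, baseURL and fName (the second argument is just the local path to write to):
using (var client = new WebClient())
{
    client.Credentials = new NetworkCredential(username, password);
    // Streams the response straight to disk instead of buffering it in a byte[]
    client.DownloadFile(baseURL + fName, fName);
}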

The WebClient class is meant for simplified scenarios. Once you get past those simple scenarios (and you have), you'll have to fall back a bit and use WebRequest.
With WebRequest, you'll have access to the response stream, and you'll be able to loop over it, reading a bit and writing a bit, until you're done.
From the Microsoft documentation:
We don't recommend that you use WebRequest or its derived classes for new development. Instead, use the System.Net.Http.HttpClient class.
Source: learn.microsoft.com/WebRequest
Example:
public void MyDownloadFile(Uri url, string outputFilePath)
{
    const int BUFFER_SIZE = 16 * 1024;
    using (var outputFileStream = File.Create(outputFilePath, BUFFER_SIZE))
    {
        var req = WebRequest.Create(url);
        using (var response = req.GetResponse())
        {
            using (var responseStream = response.GetResponseStream())
            {
                var buffer = new byte[BUFFER_SIZE];
                int bytesRead;
                do
                {
                    bytesRead = responseStream.Read(buffer, 0, BUFFER_SIZE);
                    outputFileStream.Write(buffer, 0, bytesRead);
                } while (bytesRead > 0);
            }
        }
    }
}
Note that if WebClient.DownloadFile works, then I'd call it the best solution. I wrote the above before the "DownloadFile" answer was posted. I also wrote it way too early in the morning, so a grain of salt (and testing) may be required.
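Given the deprecation note quoted above, here's a hedged sketch of the same block-by-block copy using HttpClient. HttpCompletionOption.ResponseHeadersRead makes HttpClient hand back the stream as soon as the headers arrive, instead of buffering the whole body:
// requires System.Net.Http, System.IO and System.Threading.Tasks
public static async Task MyDownloadFileAsync(Uri url, string outputFilePath)
{
    using (var client = new HttpClient())
    using (var response = await client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead))
    {
        response.EnsureSuccessStatusCode();
        using (var responseStream = await response.Content.ReadAsStreamAsync())
        using (var outputFileStream = File.Create(outputFilePath))
        {
            // CopyToAsync reads and writes in buffer-sized blocks,
            // so the 2 GB body never has to fit in memory at once
            await responseStream.CopyToAsync(outputFileStream);
        }
    }
}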

You need to get the response stream and then read it in blocks, writing each block to a file so that the memory can be reused.
As you have written it, the whole response, all 2 GB, has to be in memory. Even on a 64-bit system, that will hit the 2 GB limit for a single .NET object.
Update: easier option. Get WebClient to do the work for you: its DownloadFile method will put the data directly into a file.

WebClient.OpenRead returns a Stream; just use Read to loop over the contents, so the data is not buffered in memory but can be written to a file in blocks.
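A minimal sketch, again reusing the question's variables (CopyTo needs .NET 4+; on earlier versions, use a Read/Write loop as in the other answers):
using (var client = new WebClient())
{
    client.Credentials = new NetworkCredential(username, password);
    using (var input = client.OpenRead(baseURL + fName))
    using (var output = File.Create(fName))
    {
        // Copies in internal buffer-sized blocks; the full 2 GB is never held in memory
        input.CopyTo(output);
    }
}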

I would use something like this:
The connection can be interrupted, so it is better to download the file in small chunks.
Akka Streams can help download the file in small chunks from a System.IO.Stream using multithreading. https://getakka.net/articles/intro/what-is-akka.html
The Download method below appends bytes to the file starting at offset fileStart. If the file does not exist, fileStart must be 0.
using Akka.Actor;
using Akka.IO;
using Akka.Streams;
using Akka.Streams.Dsl;
using Akka.Streams.IO;

private static Sink<ByteString, Task<IOResult>> FileSink(string filename)
{
    // Appends each incoming ByteString to the target file
    return Flow.Create<ByteString>()
        .ToMaterialized(FileIO.ToFile(new FileInfo(filename), FileMode.Append), Keep.Right);
}

private async Task Download(string path, Uri uri, long fileStart)
{
    using (var system = ActorSystem.Create("system"))
    using (var materializer = system.Materializer())
    {
        // Request only the byte range starting at fileStart, so an
        // interrupted download can be resumed where it left off
        HttpWebRequest request = WebRequest.Create(uri) as HttpWebRequest;
        request.AddRange(fileStart);
        using (WebResponse response = request.GetResponse())
        {
            Stream stream = response.GetResponseStream();
            // Read the response in 1 kB chunks and stream them into the file sink
            await StreamConverters.FromInputStream(() => stream, chunkSize: 1024)
                .RunWith(FileSink(path), materializer);
        }
    }
}
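Hypothetical usage (the path and URL here are illustrative): pass fileStart = 0 for a fresh download, or resume from the length of a partially downloaded file:
var target = new FileInfo(@"C:\temp\big.bin");
long fileStart = target.Exists ? target.Length : 0;
await Download(target.FullName, new Uri("https://example.com/big.bin"), fileStart);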

Related

Mvc - How to stream large file in 4k chunks for download

I was following this example, but when the download starts it hangs, and then after a minute it shows a server error. I guess the response ends before all the data is sent to the client.
Do you know another way I can do this, or why it's not working?
Writing to Output Stream from Action
private void StreamExport(Stream stream, System.Collections.Generic.IList<byte[]> data)
{
    using (BufferedStream bs = new BufferedStream(stream, 256 * 1024))
    using (StreamWriter sw = new StreamWriter(bs))
    {
        foreach (var stuff in data)
        {
            // Note: StreamWriter has no Write(byte[]) overload, so this binds to
            // Write(object) and writes "System.Byte[]" rather than the actual bytes
            sw.Write(stuff);
            sw.Flush();
        }
    }
}
Can you show the calling method? What is the Stream being passed in? Is it the Response Stream?
There are many helpful classes you can use so that you don't have to do the chunking yourself, because they chunk by default. If you use StreamContent, there is a constructor overload where you can specify the buffer size. I believe the default is 10 kB.
This is from memory, so it may not be complete:
[Route("download")]
[HttpGet]
public async Task<HttpResponseMessage> GetFile()
{
var response = this.Request.CreateResponse(HttpStatusCode.OK);
//don't use a using statement around the stream because the framework will dispose StreamContent automatically
var stream = await SomeMethodToGetFileStreamAsync();
//buffer size of 4kB
var content = new StreamContent(stream, 4096);
response.Content = content;
return response;
}

Raw Stream Has Data, Deflate Returns Zero Bytes

I'm reading data (an adCenter report, as it happens), which is supposed to be zipped. Reading the contents with an ordinary stream, I get a couple thousand bytes of gibberish, so this seems reasonable. So I feed the stream to DeflateStream.
First, it reports "Block length does not match with its complement." A brief search suggests that there is a two-byte prefix, and indeed if I call ReadByte() twice before opening DeflateStream, the exception goes away.
However, DeflateStream now returns nothing at all. I've spent most of the afternoon chasing leads on this, with no luck. Help me, StackOverflow, you're my only hope! Can anyone tell me what I'm missing?
Here's the code. Naturally I only enabled one of the two commented blocks at a time when testing.
_results = new List<string[]>();
using (Stream compressed = response.GetResponseStream())
{
    // Skip the zlib prefix, which conflicts with the deflate specification
    compressed.ReadByte(); compressed.ReadByte();
    // Reports reading 3,000-odd bytes, followed by random characters
    /*byte[] buffer = new byte[4096];
    int bytesRead = compressed.Read(buffer, 0, 4096);
    Console.WriteLine("Read {0} bytes.", bytesRead.ToString("#,##0"));
    string content = Encoding.ASCII.GetString(buffer, 0, bytesRead);
    Console.WriteLine(content);*/
    using (DeflateStream decompressed = new DeflateStream(compressed, CompressionMode.Decompress))
    {
        // Reports reading 0 bytes, and no output
        /*byte[] buffer = new byte[4096];
        int bytesRead = decompressed.Read(buffer, 0, 4096);
        Console.WriteLine("Read {0} bytes.", bytesRead.ToString("#,##0"));
        string content = Encoding.ASCII.GetString(buffer, 0, bytesRead);
        Console.WriteLine(content);*/
        using (StreamReader reader = new StreamReader(decompressed))
            while (reader.EndOfStream == false)
                _results.Add(reader.ReadLine().Split('\t'));
    }
}
As you can probably guess from the last line, the unzipped content should be tab-delimited text.
Just for fun, I tried decompressing with GZipStream, but it reports that the magic number is not correct. MS' docs just say "The downloaded report is compressed by using zip compression. You must unzip the report before you can use its contents."
Here's the code that finally worked. I had to save the content out to a file and read it back in. This does not seem reasonable, but for the small quantities of data I'm working with, it's acceptable, I'll take it!
WebRequest request = HttpWebRequest.Create(reportURL);
WebResponse response = request.GetResponse();
_results = new List<string[]>();
using (Stream compressed = response.GetResponseStream())
{
    // Save the content to a temporary location
    string zipFilePath = @"\\Server\Folder\adCenter\Temp.zip";
    using (StreamWriter file = new StreamWriter(zipFilePath))
    {
        compressed.CopyTo(file.BaseStream);
        file.Flush();
    }
    // Get the first file from the temporary zip
    ZipFile zipFile = ZipFile.Read(zipFilePath);
    if (zipFile.Entries.Count > 1) throw new ApplicationException("Found " + zipFile.Entries.Count.ToString("#,##0") + " entries in the report; expected 1.");
    ZipEntry report = zipFile[0];
    // Extract the data
    using (MemoryStream decompressed = new MemoryStream())
    {
        report.Extract(decompressed);
        decompressed.Position = 0; // Note that the stream does NOT start at the beginning
        using (StreamReader reader = new StreamReader(decompressed))
            while (reader.EndOfStream == false)
                _results.Add(reader.ReadLine().Split('\t'));
    }
}
You will find that DeflateStream is hugely limited in what data it will decompress. In fact, if you are expecting entire ZIP files, it will be of no use at all.
There are hundreds of (mostly small) variations of ZIP files, and DeflateStream will cope with only two or three of them.
The best approach is likely to use a dedicated library for reading ZIP files/streams, like DotNetZip or SharpZipLib (somewhat unmaintained).
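As a hedged aside (not from the original answers): on .NET 4.5 or later, the built-in System.IO.Compression.ZipArchive reads full ZIP containers, so neither the temp file nor a third-party library is strictly needed. Buffering into a MemoryStream first sidesteps the fact that ZipArchive wants a seekable stream to locate the central directory:
// requires a reference to the System.IO.Compression assembly (.NET 4.5+)
using (Stream compressed = response.GetResponseStream())
using (var buffered = new MemoryStream())
{
    compressed.CopyTo(buffered);
    buffered.Position = 0;
    using (var archive = new ZipArchive(buffered, ZipArchiveMode.Read))
    using (var reader = new StreamReader(archive.Entries[0].Open()))
    {
        while (!reader.EndOfStream)
            _results.Add(reader.ReadLine().Split('\t'));
    }
}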
You could write the stream to a file and try my tool Precomp on it. If you use it like this:
precomp -c- -v [name of input file]
any ZIP/gZip stream(s) inside the file will be detected and some verbose information will be reported (position and length of the stream). Additionally, if they can be decompressed and recompressed bit-to-bit identical, the output file will contain the decompressed stream(s).
Precomp detects ZIP/gZip (and some other) streams anywhere in the file, so you won't have to worry about header bytes or garbage at the beginning of the file.
If it doesn't detect a stream like this, try adding -slow, which detects deflate streams even if they don't have a ZIP/gZip header. If that fails, try -brute, which detects even deflate streams that lack the two-byte header, but this is extremely slow and can produce false positives.
After that, you'll know if there is a (valid) deflate stream in the file and if so, the additional information should help you to decompress other reports correctly using zLib decompression routines or similar.

Preserving binary data in streams

Using C#, I was surprised how complicated it seemed to preserve binary info from a stream. I'm trying to download a PNG datafile using the WebRequest class, but just transferring the resulting Stream to a file without corrupting it was more verbose than I expected. First, using StreamReader and StreamWriter was no good, as the ReadToEnd() function returns a string, which effectively doubles the size of the PNG file (probably due to the UTF conversion).
So my question is: do I really have to write all this code, or is there a cleaner way of doing it?
Stream srBytes = webResponse.GetResponseStream();
// Write to file
Stream swBytes = new FileStream("map(" + i.ToString() + ").png", FileMode.Create, FileAccess.Write);
int count = 0;
byte[] buffer = new byte[4096];
do
{
    count = srBytes.Read(buffer, 0, buffer.Length);
    swBytes.Write(buffer, 0, count);
}
while (count != 0);
swBytes.Close();
Using StreamReader/StreamWriter is definitely a mistake, yes - because that's trying to load the file as text, which it's not.
Options:
Use WebClient.DownloadFile as SLaks suggested
In .NET 4, use Stream.CopyTo(Stream) to copy the data in much the same way as you've got here
Otherwise, write your own utility method to do the copying, then you only need to do it once; you could even write this as an extension method, which means when you upgrade to .NET 4 you can just get rid of the utility method and use the built-in one with no change to the calling code:
public static class StreamExtensions
{
    public static void CopyTo(this Stream source, Stream destination)
    {
        if (source == null)
        {
            throw new ArgumentNullException("source");
        }
        if (destination == null)
        {
            throw new ArgumentNullException("destination");
        }
        byte[] buffer = new byte[8192];
        int bytesRead;
        while ((bytesRead = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            destination.Write(buffer, 0, bytesRead);
        }
    }
}
Note that you should be using using statements for the web response, response stream and output stream in order to make sure they're always closed appropriately, like this:
using (WebResponse response = request.GetResponse())
using (Stream responseStream = response.GetResponseStream())
using (Stream outputStream = File.OpenWrite("map(" + i + ").png"))
{
    responseStream.CopyTo(outputStream);
}
You can call WebClient.DownloadFile(url, localPath).
In .Net 4.0, you can simplify your current code by calling Stream.CopyTo.

Getting a Stream from an absolute path?

I have this method:
public RasImage Load(Stream stream);
if I want to load a url like:
string _url = "http://localhost/Application1/Images/Icons/hand.jpg";
How can I turn this url into a stream and pass it to my Load method?
Here's one way. I don't really know if it's the best way or not, but it works.
// requires System.Net namespace
WebRequest request = WebRequest.Create(_url);
using (var response = request.GetResponse())
using (var stream = response.GetResponseStream())
{
    RasImage image = Load(stream);
}
UPDATE: It looks like in Silverlight, the WebRequest class has no GetResponse method; you've no choice but to do this asynchronously.
Below is some sample code illustrating how you might go about this. (I warn you: I wrote this just now, without putting much thought into how sensible it is. How you choose to implement this functionality would likely be quite different. Anyway, this should at least give you a general idea of what you need to do.)
WebRequest request = WebRequest.Create(_url);
IAsyncResult getResponseResult = request.BeginGetResponse(
    result =>
    {
        using (var response = request.EndGetResponse(result))
        using (var stream = response.GetResponseStream())
        {
            RasImage image = Load(stream);
            // Do something with image.
        }
    },
    null
);
Console.WriteLine("Waiting for response from '{0}'...", _url);
getResponseResult.AsyncWaitHandle.WaitOne();
Console.WriteLine("The stream has been loaded. Press Enter to quit.");
Console.ReadLine();
Dan's answer is a good one, though you're requesting from localhost. Is this a file you can access from the filesystem? If so, I think you should be able to just pass in a FileStream:
FileStream stream = new FileStream(@"\path\to\file", FileMode.Open);

Reading "chunked" response with HttpWebResponse

I'm having trouble reading a "chunked" response when using a StreamReader to read the stream returned by GetResponseStream() of a HttpWebResponse:
// response is an HttpWebResponse
StreamReader reader = new StreamReader(response.GetResponseStream());
string output = reader.ReadToEnd(); // throws exception...
When the reader.ReadToEnd() method is called I'm getting the following System.IO.IOException: Unable to read data from the transport connection: The connection was closed.
The above code works just fine when server returns a "non-chunked" response.
The only way I've been able to get it to work is to use HTTP/1.0 for the initial request (instead of HTTP/1.1, the default) but this seems like a lame work-around.
Any ideas?
@Chuck
Your solution works pretty well. It still throws the same IOException on the last Read(), but after inspecting the contents of the StringBuilder it looks like all the data has been received. So perhaps I just need to wrap the Read() in a try-catch and swallow the "error".
Haven't tried this with a "chunked" response, but would something like this work?
StringBuilder sb = new StringBuilder();
Byte[] buf = new byte[8192];
Stream resStream = response.GetResponseStream();
string tmpString = null;
int count = 0;
do
{
    count = resStream.Read(buf, 0, buf.Length);
    if (count != 0)
    {
        tmpString = Encoding.ASCII.GetString(buf, 0, count);
        sb.Append(tmpString);
    }
} while (count > 0);
I am working on a similar problem. The .NET HttpWebRequest and HttpWebResponse handle cookies and redirects automatically, but they do not handle chunked content on the response body automatically.
This is perhaps because chunked content may contain more than simple data (i.e. chunk names, trailing headers).
Simply reading the stream and ignoring the EOF exception will not work, as the stream contains more than the desired content. The stream will contain chunks, and each chunk begins by declaring its size. If the stream is simply read from beginning to end, the final data will contain the chunk metadata (and in the case of gzipped content, it will fail the CRC check when decompressing).
To solve the problem it is necessary to manually parse the stream, removing the chunk size from each chunk (as well as the CR LF delimiters), detecting the final chunk and keeping only the chunk data. There is likely a library out there somewhere that does this; I have not found it yet.
Useful resources:
http://en.wikipedia.org/wiki/Chunked_transfer_encoding
https://www.rfc-editor.org/rfc/rfc2616#section-3.6.1
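For illustration, here is a minimal sketch of such a de-chunker (my own, not from the answer above). It assumes the stream is positioned at the start of the chunked body, and it discards chunk extensions and any trailing headers after the final zero-length chunk:
// requires System, System.IO and System.Text
static byte[] DecodeChunkedBody(Stream raw)
{
    using (var output = new MemoryStream())
    {
        while (true)
        {
            // Each chunk starts with its size in hex, optionally followed by ";extensions"
            string sizeLine = ReadChunkLine(raw);
            int semicolon = sizeLine.IndexOf(';');
            if (semicolon >= 0) sizeLine = sizeLine.Substring(0, semicolon);
            int chunkSize = Convert.ToInt32(sizeLine.Trim(), 16);
            if (chunkSize == 0) break; // final chunk; trailing headers are ignored
            byte[] chunk = new byte[chunkSize];
            int read = 0;
            while (read < chunkSize)
            {
                int n = raw.Read(chunk, read, chunkSize - read);
                if (n == 0) throw new IOException("Unexpected end of stream inside a chunk");
                read += n;
            }
            output.Write(chunk, 0, chunkSize);
            ReadChunkLine(raw); // consume the CRLF that terminates the chunk data
        }
        return output.ToArray();
    }
}

static string ReadChunkLine(Stream s)
{
    // Reads up to a LF, dropping the CR; chunk size lines are plain ASCII
    var sb = new StringBuilder();
    int b;
    while ((b = s.ReadByte()) != -1 && b != '\n')
        if (b != '\r') sb.Append((char)b);
    return sb.ToString();
}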
I've had the same problem (which is how I ended up here :-). Eventually tracked it down to the fact that the chunked stream wasn't valid - the final zero length chunk was missing. I came up with the following code which handles both valid and invalid chunked streams.
using (StreamReader sr = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
{
    StringBuilder sb = new StringBuilder();
    try
    {
        while (!sr.EndOfStream)
        {
            sb.Append((char)sr.Read());
        }
    }
    catch (System.IO.IOException)
    { }
    string content = sb.ToString();
}
After trying a lot of snippets from StackOverflow and Google, ultimately I found this to work the best (assuming you know the data is a UTF-8 string; if not, you can just keep the byte array and process it appropriately):
byte[] data;
var responseStream = response.GetResponseStream();
var reader = new StreamReader(responseStream, Encoding.UTF8);
data = Encoding.UTF8.GetBytes(reader.ReadToEnd());
return Encoding.Default.GetString(data.ToArray());
I found other variations work most of the time, but occasionally truncate the data. I got this snippet from:
https://social.msdn.microsoft.com/Forums/en-US/4f28d99d-9794-434b-8b78-7f9245c099c4/problems-with-httpwebrequest-and-transferencoding-chunked?forum=ncl
It is funny: while playing with the request headers, I removed "Accept-Encoding: gzip,deflate", and the server in my use case then answered in a plain ASCII manner, no longer with chunked, encoded snippets. Maybe you should give it a try and leave "Accept-Encoding: gzip,deflate" out. The idea came while reading the above-mentioned wiki in the topic about using compression.
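A hedged alternative (not part of the answer above): rather than dropping the header, HttpWebRequest can be told to advertise compression and transparently undo it, so the stream you read back is already plain content:
var request = (HttpWebRequest)WebRequest.Create(url); // url is illustrative
// The framework now sends Accept-Encoding and decompresses the gzip/deflate body for you
request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;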
