C# - Downloading from Google Drive in byte chunks


I'm currently developing for an environment that has poor network connectivity. My application automatically downloads required Google Drive files for users. It works reasonably well for small files (ranging from 40 KB to 2 MB), but fails far too often for larger files (9 MB). I know these file sizes might seem small, but on my client's network the Google Drive API constantly fails with the 9 MB file.
I've concluded that I need to download files in smaller byte chunks, but I don't see how I can do that with the Google Drive API. I've read this over and over again, and I've tried the following code:
// with the Drive File ID, and the appropriate export MIME type, I create the export request
var request = DriveService.Files.Export(fileId, exportMimeType);
// take the message so I can modify it by hand
var message = request.CreateRequest();
var client = request.Service.HttpClient;
// I change the Range headers of both the client, and message
client.DefaultRequestHeaders.Range =
    message.Headers.Range =
        new System.Net.Http.Headers.RangeHeaderValue(100, 200);
var response = await request.Service.HttpClient.SendAsync(message);
// if status code = 200, copy to local file
if (response.IsSuccessStatusCode)
{
    using (var fileStream = new FileStream(downloadFileName, FileMode.CreateNew, FileAccess.ReadWrite))
    {
        await response.Content.CopyToAsync(fileStream);
    }
}
The resulting local file (from fileStream), however, is still full-length (i.e. a 40 KB file for the 40 KB Drive file), and the 9 MB file still fails with a 500 Internal Server Error. On a side note, I've also experimented with ExportRequest.MediaDownloader.ChunkSize, but from what I observe it only changes the frequency at which the ExportRequest.MediaDownloader.ProgressChanged callback is called (i.e. the callback triggers every 256 KB if ChunkSize is set to 256 * 1024).
How can I proceed?

You seem to be heading in the right direction. As you noted at the end of your question, ChunkSize only controls how often progress is reported, so your observation was accurate.
The source code for MediaDownloader in the SDK contains the following comment (emphasis mine):
The core download logic. We download the media and write it to an
output stream ChunkSize bytes at a time, raising the ProgressChanged
event after each chunk. The chunking behavior is largely a historical
artifact: a previous implementation issued multiple web requests, each
for ChunkSize bytes. Now we do everything in one request, but the API
and client-visible behavior are retained for compatibility.
Your example code only requests a single chunk, bytes 100 to 200. With that approach you have to keep track of the offset yourself, request each chunk in turn, and copy each partial response into the file stream:
const int KB = 0x400;
int ChunkSize = 256 * KB; // 256 KB

public async Task ExportFileAsync(string downloadFileName, string fileId, string exportMimeType) {
    var exportRequest = driveService.Files.Export(fileId, exportMimeType);
    var client = exportRequest.Service.HttpClient;

    // you need to know the file size up front
    var size = await GetFileSize(fileId);

    using (var file = new FileStream(downloadFileName, FileMode.CreateNew, FileAccess.ReadWrite)) {
        file.SetLength(size);
        // walk the file one chunk at a time; the last chunk may be shorter
        for (long from = 0; from < size; from += ChunkSize) {
            // Range ends are inclusive, so clamp to the last byte of the file
            var to = Math.Min(from + ChunkSize, size) - 1;
            var request = exportRequest.CreateRequest();
            request.Headers.Range = new RangeHeaderValue(from, to);
            using (var response = await client.SendAsync(request)) {
                // throw instead of silently leaving a hole in the file
                response.EnsureSuccessStatusCode();
                using (var stream = await response.Content.ReadAsStreamAsync()) {
                    file.Seek(from, SeekOrigin.Begin);
                    await stream.CopyToAsync(file);
                }
            }
        }
    }
}

private async Task<long> GetFileSize(string fileId) {
    var getRequest = driveService.Files.Get(fileId);
    // assuming Drive API v3, where size is only returned when requested
    getRequest.Fields = "size";
    var file = await getRequest.ExecuteAsync();
    return file.Size ?? throw new InvalidOperationException("Drive did not report a size for this file.");
}
This code makes some assumptions about the Drive API and server:
That the server will allow the multiple requests needed to download the file in chunks. I don't know whether requests are throttled.
That the server still accepts the Range header, as stated in the developer documentation.
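The offset bookkeeping is the part that is easiest to get wrong, so here is a minimal sketch of it in isolation. ChunkRanges and Split are hypothetical names, not part of the Drive SDK; the helper only computes inclusive byte ranges and makes no network calls.

```csharp
using System;
using System.Collections.Generic;

public static class ChunkRanges
{
    // Yields inclusive (From, To) byte offsets that cover [0, totalSize) in steps of chunkSize.
    public static IEnumerable<(long From, long To)> Split(long totalSize, int chunkSize)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        for (long from = 0; from < totalSize; from += chunkSize)
        {
            // HTTP Range ends are inclusive; the last chunk is clamped to the end of the file.
            long to = Math.Min(from + chunkSize, totalSize) - 1;
            yield return (from, to);
        }
    }
}
```

Each pair can be passed straight to new RangeHeaderValue(from, to). A file whose size is an exact multiple of the chunk size produces no trailing empty range.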

Related

Extract the file header signature as it is being streamed directly to disk in ASP.NET Core

I have an API method that streams uploaded files directly to disk to be scanned with a virus checker. Some of these files can be quite large, so IFormFile is a no go:
Any single buffered file exceeding 64 KB is moved from memory to a
temp file on disk.
Source: https://learn.microsoft.com/en-us/aspnet/core/mvc/models/file-uploads?view=aspnetcore-3.1
I have a working example that uses multipart/form-data and a really nice NuGet package that takes the headache out of working with multipart/form-data, and it works well. However, I want to add a file header signature check to make sure that the file type declared by the client is actually what they say it is. I can't rely on the file extension to do this securely, but I can use the file header signature to make it at least a bit more secure. Since I'm streaming directly to disk, how can I extract the first bytes as they go through the file stream?
[DisableFormValueModelBinding] // required for form binding
[ValidateMimeMultipartContent] // simple check to make sure this is a multipart form
[FileUploadOperation(typeof(SwaggerFileItem))] // used to define the Swagger schema
[RequestSizeLimit(31457280)] // 30MB
[RequestFormLimits(MultipartBodyLengthLimit = 31457280)]
public async Task<IActionResult> PostAsync([FromRoute] int customerId)
{
// place holders
var uploadLocation = string.Empty;
var trustedFileNameForDisplay = string.Empty;
// this is using a nuget package that does the hard work on reading the multipart form-data.... using UploadStream;
var model = await this.StreamFiles<FileItem>(async x =>
{
// never trust the client
trustedFileNameForDisplay = WebUtility.HtmlEncode(Path.GetFileName(x.FileName));
// determine the quarantine location
uploadLocation = GetUploadLocation(trustedFileNameForDisplay);
// stream the input stream to the file stream
// importantly this should never load the file into memory
// it should be a straight pass through to disk
await using var fs = System.IO.File.Create(uploadLocation, BufSize);
// --> How do I extract the file signature? I.e. a copy of the header bytes as it is being streamed??? <--
await x.OpenReadStream().CopyToAsync(fs);
});
// The model state can now be checked
if (!ModelState.IsValid)
{
// delete the file
DeleteFileIfExists(uploadLocation);
// return a bad request
ThrowProblemDetails(ModelState, StatusCodes.Status400BadRequest);
}
// map as much as we can
var request = _mapper.Map<CreateAttachmentRequest>(model);
// map the remaining properties
request.CustomerId = customerId;
request.UploadServer = Environment.MachineName;
request.uploadLocation = uploadLocation;
request.FileName = trustedFileNameForDisplay;
// call mediator with this request to send it over WCF to Pulse Core.
var result = await _mediator.Send(request);
// build response
var response = new FileResponse { Id = result.FileId, CustomerId = customerId, ExternalId = request.ExternalId };
// return the 201 with the appropriate response
return CreatedAtAction(nameof(GetFile), new { fileId = response.Id, customerId = response.CustomerId }, response);
}
The section I'm stuck on is around the line await x.OpenReadStream().CopyToAsync(fs);. I would like to pull out the file header here as the stream is being copied to the FileStream. Is there a way to add some kind of inspector? I don't want to read the entire stream again, just the header.
Update
Based on the answer given by @Ackdari I have successfully switched the code to extract the header from the uploaded file stream. I don't know if this could be made any more efficient, but it does work:
//...... removed for clarity
var model = await this.StreamFiles<FileItem>(async x =>
{
trustedFileNameForDisplay = WebUtility.HtmlEncode(Path.GetFileName(x.FileName));
quarantineLocation = QuarantineLocation(trustedFileNameForDisplay);
await using (var fs = System.IO.File.Create(quarantineLocation, BufSize))
{
await x.OpenReadStream().CopyToAsync(fs);
fileFormat = await FileHelpers.GetFileFormatFromFileHeader(fs);
}
});
//...... removed for clarity
and
// using https://github.com/AJMitev/FileTypeChecker
public static async Task<IFileType> GetFileFormatFromFileHeader(FileStream fs)
{
IFileType fileFormat = null;
fs.Position = 0;
var headerData = new byte[40];
var bytesRead = await fs.ReadAsync(headerData, 0, 40);
if (bytesRead > 0)
{
await using (var ms = new MemoryStream(headerData))
{
if (!FileTypeValidator.IsTypeRecognizable(ms))
{
return null;
}
fileFormat = FileTypeValidator.GetFileType(ms);
}
}
return fileFormat;
}

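You may want to consider reading the header yourself, depending on which file type is expected:
int n = 4; // length of header
var headerData = new byte[n];
var bytesRead = 0;
var input = x.OpenReadStream();
// ReadAsync may return fewer bytes than requested, so loop until the header is complete
while (bytesRead < n)
{
    var read = await input.ReadAsync(headerData.AsMemory(bytesRead));
    if (read == 0) break; // the stream ended before a full header arrived
    bytesRead += read;
}
CheckHeader(headerData);
// write the bytes already consumed, then pass the rest straight through to disk
await fs.WriteAsync(headerData.AsMemory(0, bytesRead));
await input.CopyToAsync(fs);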
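What CheckHeader does is left open above. As one possible shape for it, here is a minimal sketch that compares the leading bytes against a few well-known signatures. FileSignatures, Matches and the short type keys are hypothetical names chosen for illustration; a real check would cover every type the API accepts.

```csharp
using System;
using System.Collections.Generic;

public static class FileSignatures
{
    // Leading bytes ("magic numbers") of a few common formats.
    private static readonly Dictionary<string, byte[]> Known = new Dictionary<string, byte[]>
    {
        ["pdf"] = new byte[] { 0x25, 0x50, 0x44, 0x46 }, // "%PDF"
        ["png"] = new byte[] { 0x89, 0x50, 0x4E, 0x47 }, // 0x89 followed by "PNG"
        ["jpg"] = new byte[] { 0xFF, 0xD8, 0xFF },
    };

    // Returns true when the header starts with the signature registered for the claimed type.
    public static bool Matches(string claimedType, byte[] header)
    {
        if (!Known.TryGetValue(claimedType, out var signature) || header.Length < signature.Length)
            return false;
        for (int i = 0; i < signature.Length; i++)
        {
            if (header[i] != signature[i]) return false;
        }
        return true;
    }
}
```

Unknown types are rejected rather than waved through, which fits the "never trust the client" stance of the question.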

Uploading media files to Azure File Share over 4 MB corrupts them

I'm trying to upload large files to Azure File Share via the Azure.Storage.Files.Shares library, and I am running into corruption issues on all media files (images, PDFs, etc.) over ~4 MB. Azure File Share has a limit of 4 MB for a single request, which is why I've split the upload into multiple chunks, but it still corrupts the files despite every chunk upload returning a 201.
Notes:
It doesn't seem to be an issue with writing multiple chunks, as I can write a 3 MB file in as many chunks as I want and it will be totally fine.
.txt files over 4 MB have no issues and display totally fine after uploading.
The uploading portion of this function is basically copied and pasted from the only other Stack Overflow "solution" I found regarding this issue:
public async Task WriteFileFromStream(string fullPath, MemoryStream stream)
{
// Get pieces of path
string dirName = Path.GetDirectoryName(fullPath);
string fileName = Path.GetFileName(fullPath);
ShareClient share = new ShareClient(this.ConnectionString, this.ShareName);
// Set position of the stream to 0 so that we write all contents
stream.Position = 0;
try
{
// Get a directory client for specified directory and create the directory if it doesn't exist
ShareDirectoryClient directory = share.GetDirectoryClient(dirName);
directory.CreateIfNotExists();
if (directory.Exists())
{
// Get file client
ShareFileClient file = directory.GetFileClient(fileName);
// Create file based on stream length
file.Create(stream.Length);
int blockSize = 300 * 1024; // can be anything as long as it doesn't exceed 4194304
long offset = 0; // Define http range offset
BinaryReader reader = new BinaryReader(stream);
while (true)
{
byte[] buffer = reader.ReadBytes(blockSize);
if (buffer.Length == 0)
break;
MemoryStream uploadChunk = new MemoryStream();
uploadChunk.Write(buffer, 0, buffer.Length);
uploadChunk.Position = 0;
HttpRange httpRange = new HttpRange(offset, buffer.Length); // offset -> buffer.Length-1 (inclusive)
var resp = file.UploadRange(httpRange, uploadChunk);
Console.WriteLine($"Wrote bytes {offset}-{offset+(buffer.Length-1)} to {fullPath}. Response: {resp.GetRawResponse()}");
offset += buffer.Length; // Shift the offset by number of bytes already written
}
reader.Close();
}
else
{
throw new Exception($"Failed to create directory: {dirName}");
}
}
catch (Exception e)
{
// Close out memory stream
throw new Exception($"Error occurred while writing file from stream: {e.Message}");
}
}
Any help on this is greatly appreciated.

"maximum request length exceeded" When uploading a file to Onedrive

I'm using the sample code from "OneDriveApiBrowser" as the base for adding save-to-OneDrive support to my app. This makes use of Microsoft.Graph. I can upload small files, but larger files (10 MB) will not upload and give the error "maximum request length exceeded". I get the same error in both my app and the sample code with the following line of code:
DriveItem uploadedItem = await graphClient.Drive.Root.ItemWithPath(drivePath).Content.Request().PutAsync<DriveItem>(newStream);
Is there a way to increase the maximum size of file that can be uploaded? If so how?

Graph will only accept small files using PUT-to-content, so you'll want to look into creating an upload session. Since you're using the Graph SDK I'd use this test case as a guide.
Here's some code for completeness - it won't directly compile but it should let you see the steps involved:
var uploadSession = await graphClient.Drive.Root.ItemWithPath("filename.txt").CreateUploadSession().Request().PostAsync();
var maxChunkSize = 320 * 1024; // 320 KB - Change this to your chunk size. 5MB is the default.
var provider = new ChunkedUploadProvider(uploadSession, graphClient, inputStream, maxChunkSize);
// Setup the chunk request necessities
var chunkRequests = provider.GetUploadChunkRequests();
var readBuffer = new byte[maxChunkSize];
var trackedExceptions = new List<Exception>();
DriveItem itemResult = null;
//upload the chunks
foreach (var request in chunkRequests)
{
var result = await provider.GetChunkRequestResponseAsync(request, readBuffer, trackedExceptions);
if (result.UploadSucceeded)
{
itemResult = result.ItemResponse;
}
}
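One detail worth guarding when you change maxChunkSize: the Graph upload session documentation asks for fragment sizes that are a multiple of 320 KiB (327,680 bytes). The helper below is a minimal sketch of rounding a requested size down to a valid value; UploadChunkSize and Normalize are hypothetical names and not part of the Graph SDK.

```csharp
using System;

public static class UploadChunkSize
{
    public const int Granularity = 320 * 1024; // 327,680 bytes

    // Rounds the requested size down to a multiple of 320 KiB, never below one unit.
    public static int Normalize(int requestedBytes)
    {
        if (requestedBytes <= 0) throw new ArgumentOutOfRangeException(nameof(requestedBytes));
        int units = Math.Max(1, requestedBytes / Granularity);
        return units * Granularity;
    }
}
```

For example, var maxChunkSize = UploadChunkSize.Normalize(1000000); yields three units, which is slightly under the requested size.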

Get actual upload progress on Azure Blob

I know that this has already been asked, but the marked solution is not correct. Usually this article is marked as the solution: https://learn.microsoft.com/en-us/archive/blogs/kwill/asynchronous-parallel-blob-transfers-with-progress-change-notification-2-0
It works and gives a progress value, but not the real-time progress (and in some cases it gives a totally wrong value). Let me explain:
It reports progress on the local read buffer, so when I upload something my first "uploaded value" is the total size of the read buffer. In my case this buffer is 4 MB, so for every file smaller than 4 MB the progress bar shows the upload as completed in 0 seconds, while the upload itself still takes the real upload time to finish.
Also, if you kill your connection just before the upload starts, it reports the first buffer size as the actual progress, so for my 1 MB file I get 100% progress while disconnected.
I found another article with another solution: it reads the HTTP response from Azure every time a single block upload completes. But I need my blocks to be 4 MB (since the maximum block count for a single file is 50,000), and it's not a perfect solution even with a small block size.
The first article subclasses Stream to create a ProgressStream class with a ProgressChanged event that is triggered every time a read is done. Is there some way to know the actual uploaded bytes when ProgressChanged is triggered?

You can do this by using code similar to https://learn.microsoft.com/en-us/archive/blogs/kwill/asynchronous-parallel-block-blob-transfers-with-progress-change-notification (version 1.0 of the blog post you referenced), but instead of calling m_Blob.PutBlock you would upload the block with an HttpWebRequest object and use the progress events from the HttpWebRequest class. This introduces a lot more code complexity, and you would have to add some additional error handling.
The alternative would be to download the Storage Client Library source code from GitHub and modify the block upload methods to track and report progress. The challenge you will face here is that you will have to make these same changes to every new version of the SCL if you plan on staying up to date with the latest fixes.
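For context, the ProgressStream approach described in the question can be sketched roughly as below. This is an illustrative reconstruction, not the blog post's actual code. It shows why the reported figure runs ahead of the network: the event fires when the uploader reads from the local stream, before those bytes have been sent or acknowledged.

```csharp
using System;
using System.IO;

// Read-only wrapper that reports how many bytes the consumer has pulled from the source so far.
public sealed class ProgressStream : Stream
{
    private readonly Stream _inner;
    private long _bytesRead;

    public ProgressStream(Stream inner) { _inner = inner; }

    // Raised after every Read with the running total of bytes read locally.
    public event Action<long> ProgressChanged;

    public override int Read(byte[] buffer, int offset, int count)
    {
        int read = _inner.Read(buffer, offset, count);
        _bytesRead += read;
        ProgressChanged?.Invoke(_bytesRead);
        return read;
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => _inner.Length;
    public override long Position
    {
        get => _bytesRead;
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}
```

If the client library fills a 4 MB buffer in one go, the first event already reports up to 4 MB, which matches the behaviour described in the question.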

I must admit I didn't check if everything is as you desired, but here are my 2 cents on uploading with progress indication.
public async Task UploadVideoFilesToBlobStorage(List<VideoUploadModel> videos, CancellationToken cancellationToken)
{
var blobTransferClient = new BlobTransferClient();
//register events
blobTransferClient.TransferProgressChanged += BlobTransferClient_TransferProgressChanged;
//files
_videoCount = _videoCountLeft = videos.Count;
foreach (var video in videos)
{
var blobUri = new Uri(video.SasLocator);
//create the sasCredentials
var sasCredentials = new StorageCredentials(blobUri.Query);
//get the URL without sasCredentials, so only path and filename.
var blobUriBaseFile = new Uri(blobUri.GetComponents(UriComponents.SchemeAndServer | UriComponents.Path,
UriFormat.UriEscaped));
//get the URL without filename (needed for BlobTransferClient (seems to me like a issue)
var blobUriBase = new Uri(blobUriBaseFile.AbsoluteUri.Replace("/"+video.Filename, ""));
var blobClient = new CloudBlobClient(blobUriBaseFile, sasCredentials);
//upload using stream, other overload of UploadBlob forces to put online filename of local filename
using (FileStream fs = new FileStream(video.FilePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
await blobTransferClient.UploadBlob(blobUriBase, video.Filename, fs, null, cancellationToken, blobClient,
new NoRetry(), "video/x-msvideo");
}
_videoCountLeft -= 1;
}
blobTransferClient.TransferProgressChanged -= BlobTransferClient_TransferProgressChanged;
}
private void BlobTransferClient_TransferProgressChanged(object sender, BlobTransferProgressChangedEventArgs e)
{
Console.WriteLine("progress, seconds remaining:" + e.TimeRemaining.Seconds);
double bytesTransfered = e.BytesTransferred;
double bytesTotal = e.TotalBytesToTransfer;
double thisProcent = bytesTransfered / bytesTotal;
double procent = thisProcent;
//divide by video amount
int videosUploaded = _videoCount - _videoCountLeft;
if (_videoCountLeft > 0)
{
procent = (thisProcent + videosUploaded) / _videoCount;
}
procent = procent * 100;//to real %
UploadProgressChangedEvent?.Invoke((int)procent, videosUploaded, _videoCount);
}
Microsoft.WindowsAzure.MediaServices.Client.BlobTransferClient should be able to do concurrent uploads, but there is no method for uploading multiple files. It does have NumberOfConcurrentTransfers and ParallelTransferThreadCount properties, but I'm not sure how to use them.
There is a bug in this BlobTransferClient: when uploading using the localFile parameter, it uses the filename of that local file, while I gave permissions on a specific file name in the SasLocator.
This example shows how to upload from a client (not on a server), so we don't need the CloudMediaContext that is usually required. Everything about SasLocators can be found here.

Download file in chunks (Windows Phone)

In my application I can download some media files from web. Normally I used WebClient.OpenReadCompleted method to download, decrypt and save the file to IsolatedStorage. It worked well and looked like that:
private void downloadedSong_OpenReadCompleted(object sender, OpenReadCompletedEventArgs e, SomeOtherValues someOtherValues) // delegate, uses additional values
{
// Some preparations
try
{
if (e.Result != null)
{
using (isolatedStorageFile = IsolatedStorageFile.GetUserStoreForApplication())
{
// working with the gained stream, decryption
// saving the decrypted file to isolatedStorage
isolatedStorageFileStream = new IsolatedStorageFileStream("SomeFileNameHere", FileMode.OpenOrCreate, isolatedStorageFile);
// and use it for MediaElement
mediaElement.SetSource(isolatedStorageFileStream);
mediaElement.Position = new TimeSpan(0);
mediaElement.MediaOpened += new RoutedEventHandler(mediaFile_MediaOpened);
// and some other work
}
}
}
catch(Exception ex)
{
// try/catch stuff
}
}
But after some investigation I found out that with large files (for me, more than 100 MB) I get an OutOfMemory exception while downloading the file. I suppose that's because WebClient.OpenReadCompleted loads the whole stream into RAM and chokes, and I will need even more memory to decrypt this stream.
After further investigation, I found how to divide a large file into chunks after the OpenReadCompleted event when saving it to IsolatedStorage (or, in my case, decrypting and then saving), but this would only solve part of the problem. The primary problem is how to prevent the phone from choking during the download. Is there a way to download a large file in chunks? Then I could use the solution I found for the decryption step. (I'd still need to find a way to load such a big file into mediaElement, but that would be another question.)
Answer:
private WebHeaderCollection headers;
private int iterator = 0;
private int delta = 1048576;
private string savedFile = "testFile.mp3";
// some preparations
// Start downloading first piece
using (IsolatedStorageFile isolatedStorageFile = IsolatedStorageFile.GetUserStoreForApplication())
{
if (isolatedStorageFile.FileExists(savedFile))
isolatedStorageFile.DeleteFile(savedFile);
}
headers = new WebHeaderCollection();
headers[HttpRequestHeader.Range] = "bytes=" + iterator.ToString() + '-' + (iterator + delta).ToString();
webClientReadCompleted = new WebClient();
webClientReadCompleted.Headers = headers;
webClientReadCompleted.OpenReadCompleted += downloadedSong_OpenReadCompleted;
webClientReadCompleted.OpenReadAsync(new Uri(song.Link));
// song.Link was given earlier
private void downloadedSong_OpenReadCompleted(object sender, OpenReadCompletedEventArgs e)
{
try
{
if (e.Cancelled == false)
{
if (e.Result != null)
{
((WebClient)sender).OpenReadCompleted -= downloadedSong_OpenReadCompleted;
using (IsolatedStorageFile myIsolatedStorage = IsolatedStorageFile.GetUserStoreForApplication())
{
using (IsolatedStorageFileStream fileStream = new IsolatedStorageFileStream(savedFile, FileMode.Append, FileAccess.Write, myIsolatedStorage))
{
int mediaFileLength = (int)e.Result.Length;
byte[] byteFile = new byte[mediaFileLength];
e.Result.Read(byteFile, 0, byteFile.Length);
fileStream.Write(byteFile, 0, byteFile.Length);
// If there's something left, download it recursively
if (byteFile.Length > delta)
{
iterator = iterator + delta + 1;
headers = new WebHeaderCollection();
headers[HttpRequestHeader.Range] = "bytes=" + iterator.ToString() + '-' + (iterator + delta).ToString();
webClientReadCompleted.Headers = headers;
webClientReadCompleted.OpenReadCompleted += downloadedSong_OpenReadCompleted;
webClientReadCompleted.OpenReadAsync(new Uri(song.Link));
}
}
}
}
}
}
catch (Exception ex)
{
// handle download errors here
}
}

To download a file in chunks you'll need to make multiple requests, one for each chunk.
Unfortunately it's not possible to say "get me this file and return it in chunks of size X".
Assuming that the server supports it, you can use the HTTP Range header to specify which bytes of a file the server should return in response to a request.
You then make multiple requests to get the file in pieces and then put it all back together on the device. You'll probably find it simplest to make sequential calls and start the next one once you've got and verified the previous chunk.
This approach makes it simple to resume a download when the user returns to the app. You just look at how much was downloaded previously and then get the next chunk.
I've written an app which downloads movies (up to 2.6GB) in 64K chunks and then played them back from IsolatedStorage with the MediaPlayerLauncher. Playing via the MediaElement should work too but I haven't verified. You can test this by loading a large file directly into IsolatedStorage (via Isolated Storage Explorer, or similar) and check the memory implications of playing that way.
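The resume logic described above comes down to deriving the next Range header from how much is already on disk. Here is a minimal sketch of that step; ResumableDownload and NextRange are hypothetical names, and the total size is assumed to be known from an earlier response.

```csharp
using System;

public static class ResumableDownload
{
    // Returns the Range header value for the next chunk, or null when the file is complete.
    public static string NextRange(long bytesOnDisk, long totalSize, int chunkSize)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        if (bytesOnDisk >= totalSize) return null;
        // Range ends are inclusive; clamp the last chunk to the end of the file.
        long to = Math.Min(bytesOnDisk + chunkSize, totalSize) - 1;
        return "bytes=" + bytesOnDisk + "-" + to;
    }
}
```

When the user returns to the app, calling NextRange with the length of the partially saved file picks up exactly where the previous session stopped.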

Confirmed: You can use BackgroundTransferRequest to download multi-GB files, but you must set TransferPreferences to None to force the download to happen only while connected to an external power supply and to Wi-Fi; otherwise the BackgroundTransferRequest will fail.
I wonder if it's possible to use a BackgroundTransferRequest to download large files easily and let the phone worry about the implementation details? The documentation seems to suggest that file downloads over 100 MB are possible, and the "Range" header is reserved for its own use, so it probably uses this automatically behind the scenes if it can.
From the documentation regarding files over 100 MB:
For files larger than 100 MB, you must set the TransferPreferences
property of the transfer to None or the transfer will fail. If you do
not know the size of a transfer and it is possible that it could
exceed this limit, you should set the value to None, meaning that the
transfer will only proceed when the phone is connected to external
power and has a Wi-Fi connection.
From the documentation regarding use of the "Range" header:
The Headers property of the BackgroundTransferRequest object is used
to set the HTTP headers for a transfer request. The following headers
are reserved for use by the system and cannot be used by calling
applications. Adding one of the following headers to the Headers
collection will cause a NotSupportedException to be thrown when the
Add(BackgroundTransferRequest) method is used to queue the transfer
request:
If-Modified-Since
If-None-Match
If-Range
Range
Unless-Modified-Since
Here's the documentation:
http://msdn.microsoft.com/en-us/library/windowsphone/develop/hh202955(v=vs.105).aspx
