C# Azure AppendBlob AppendBlock adding a file larger than the 4 MB limit

I've searched and searched and have not found any examples.
I'm using the Azure.Storage.Blobs nuget packages in C# .NET Core.
Here is an example of my current code that doesn't work.
I get a Status: 413 (The request body is too large and exceeds the maximum permissible limit.)
Searching seems to indicate there is either a 4 MB limit or a 100 MB limit. It's not clear, but I believe it's 4 MB per append on Append Blobs and 100 MB per block on Block Blobs.
var appendBlobClient = containerClient.GetAppendBlobClient(string.Format("{0}/{1}", tenantName, Path.GetFileName(filePath)));
using FileStream uploadFileStream = File.OpenRead(filePath);
appendBlobClient.CreateIfNotExists();
appendBlobClient.AppendBlock(uploadFileStream);
uploadFileStream.Close();
This doesn't work because of the 4 MB limit, so I need to append my file in 4 MB chunks, but I've not found examples of the best way to do this.
So what I'm trying to figure out is the best way to upload large files. It seems it has to be done in chunks (maybe 4 MB for append blobs and 100 MB for block blobs), but the documentation isn't clear and doesn't have examples.

I want to thank @silent for responding, since he provided enough info for me to work out what I needed. Sometimes just having someone to talk things through with helps me figure them out.
What I found is that the BlockBlobClient.Upload method chunks your file stream for you. From my research, I believe it uses 100 MB blocks; it appears to have a limit of 100 MB per block and 50,000 blocks.
AppendBlobClient.AppendBlock does not chunk your stream for you. It has a limit of 4 MB per block and 50,000 blocks.
Here is part of my code that allowed me to upload a 6gb file as a block blob and a 200mb file as an append blob.
BlobServiceClient blobServiceClient = new BlobServiceClient(azureStorageAccountConnectionString);
BlobContainerClient containerClient = blobServiceClient.GetBlobContainerClient(azureStorageAccountContainerName);
containerClient.CreateIfNotExists();
if (appendData)
{
var appendBlobClient = containerClient.GetAppendBlobClient(string.Format("{0}/{1}", tenantName, Path.GetFileName(filePath)));
appendBlobClient.CreateIfNotExists();
var appendBlobMaxAppendBlockBytes = appendBlobClient.AppendBlobMaxAppendBlockBytes;
using (var file = File.OpenRead(filePath))
{
int bytesRead;
var buffer = new byte[appendBlobMaxAppendBlockBytes];
while ((bytesRead = file.Read(buffer, 0, buffer.Length)) > 0)
{
// Wrap only the bytes actually read on this iteration
using Stream stream = new MemoryStream(buffer, 0, bytesRead, writable: false);
appendBlobClient.AppendBlock(stream);
}
}
}
else
{
var blockBlobClient = containerClient.GetBlockBlobClient(string.Format("{0}/{1}", tenantName, Path.GetFileName(filePath)));
using FileStream uploadFileStream = File.OpenRead(filePath);
blockBlobClient.DeleteIfExists();
// Upload handled the 6 GB file without manual chunking (see the notes above)
blockBlobClient.Upload(uploadFileStream);
uploadFileStream.Close();
}

Related

Uploading media files to Azure File Share over 4 MB corrupts them

I'm trying to upload large files to Azure File Share via the Azure.Storage.Files.Shares library and am running into corruption issues with all media files (images, PDFs, etc.) over ~4 MB. Azure File Share has a limit of 4 MB for a single request, which is why I've split the upload into multiple chunks, but it still corrupts the files despite every chunk upload returning a 201.
Notes:
It doesn't seem to be an issue with having to write multiple chunks, as I can write a 3 MB file in as many chunks as I want and it will be totally fine.
.txt files over 4 MB have no issues and display totally fine after uploading.
The uploading portion of this function is basically copied/pasted from the only other Stack Overflow "solution" I found regarding this issue:
public async Task WriteFileFromStream(string fullPath, MemoryStream stream)
{
// Get pieces of path
string dirName = Path.GetDirectoryName(fullPath);
string fileName = Path.GetFileName(fullPath);
ShareClient share = new ShareClient(this.ConnectionString, this.ShareName);
// Set position of the stream to 0 so that we write all contents
stream.Position = 0;
try
{
// Get a directory client for specified directory and create the directory if it doesn't exist
ShareDirectoryClient directory = share.GetDirectoryClient(dirName);
directory.CreateIfNotExists();
if (directory.Exists())
{
// Get file client
ShareFileClient file = directory.GetFileClient(fileName);
// Create file based on stream length
file.Create(stream.Length);
int blockSize = 300 * 1024; // can be anything as long as it doesn't exceed 4194304
long offset = 0; // Define http range offset
BinaryReader reader = new BinaryReader(stream);
while (true)
{
byte[] buffer = reader.ReadBytes(blockSize);
if (buffer.Length == 0)
break;
MemoryStream uploadChunk = new MemoryStream();
uploadChunk.Write(buffer, 0, buffer.Length);
uploadChunk.Position = 0;
HttpRange httpRange = new HttpRange(offset, buffer.Length); // covers offset .. offset + buffer.Length - 1 (inclusive)
var resp = file.UploadRange(httpRange, uploadChunk);
Console.WriteLine($"Wrote bytes {offset}-{offset+(buffer.Length-1)} to {fullPath}. Response: {resp.GetRawResponse()}");
offset += buffer.Length; // Shift the offset by number of bytes already written
}
reader.Close();
}
else
{
throw new Exception($"Failed to create directory: {dirName}");
}
}
catch (Exception e)
{
throw new Exception($"Error occurred while writing file from stream: {e.Message}");
}
}
Any help on this is greatly appreciated.
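One thing that might be worth trying, to rule out the manual range bookkeeping as the culprit: newer versions of Azure.Storage.Files.Shares expose ShareFileClient.OpenWrite, which returns a writable stream and, as far as I can tell, issues the UploadRange calls for you in chunks that respect the 4 MB limit. A rough sketch only, reusing the dirName/fileName/stream variables from the method above; I have not verified it against your exact SDK version:
// Sketch: push the whole MemoryStream through OpenWrite instead of manual UploadRange calls.
ShareClient share = new ShareClient(this.ConnectionString, this.ShareName);
ShareDirectoryClient directory = share.GetDirectoryClient(dirName);
directory.CreateIfNotExists();
ShareFileClient file = directory.GetFileClient(fileName);
stream.Position = 0;
file.Create(stream.Length); // the file size must still be declared up front
using (Stream destination = file.OpenWrite(overwrite: false, position: 0))
{
    stream.CopyTo(destination); // the SDK splits this into <= 4 MB ranges
}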

Uploading a File from OneDrive to Azure - issues with file content

We're writing an application to move content from a OneDrive account into Azure Storage. We've managed to get this working but ran into memory issues working with big files (> 1 GB) and Block Blobs. We've decided that Append Blobs are the best way forward, as that will solve the memory issues.
We're using a RPC call to SharePoint to get the file stream for big files, more info can be found here:
http://sharepointfieldnotes.blogspot.co.za/2009/09/downloading-content-from-sharepoint-let.html
The following code works fine when writing the file from OneDrive to local storage:
using (var strOut = System.IO.File.Create("path"))
using (var sr = wReq.GetResponse().GetResponseStream())
{
byte[] buffer = new byte[16 * 1024];
int read;
bool isHtmlRemoved = false;
while ((read = sr.Read(buffer, 0, buffer.Length)) > 0)
{
if (!isHtmlRemoved)
{
string result = Encoding.UTF8.GetString(buffer);
int startPos = result.IndexOf("</html>");
if (startPos > -1)
{
//get the length of the text, '</html>' as well
startPos += 8;
strOut.Write(buffer, startPos, read - startPos);
isHtmlRemoved = true;
}
}
else
{
strOut.Write(buffer, 0, read);
}
}
}
This creates the file with the correct size, but when we try to write it to an append blob in Azure Storage, we do not get the complete file, and in other cases we get larger files.
using (var sr = wReq.GetResponse().GetResponseStream())
{
byte[] buffer = new byte[16 * 1024];
int read;
bool isHtmlRemoved = false;
while ((read = sr.Read(buffer, 0, buffer.Length)) > 0)
{
if (!isHtmlRemoved)
{
string result = Encoding.UTF8.GetString(buffer);
int startPos = result.IndexOf("</html>");
if (startPos > -1)
{
//get the length of the text, '</html>' as well
startPos += 8;
//strOut.Write(buffer, startPos, read - startPos);
appendBlob.UploadFromByteArray(buffer, startPos, read - startPos);
isHtmlRemoved = true;
}
}
else
{
//strOut.Write(buffer, 0, read);
appendBlob.AppendFromByteArray(buffer, 0, read);
}
}
}
Is this the correct way of doing it? Why would we be getting different file sizes?
Any suggestions will be appreciated
Thanks
In response to "Why would we be getting different file sizes?":
From the CloudAppendBlob.appendFromByteArray documentation
"This API should be used strictly in a single writer scenario
because the API internally uses the append-offset conditional header
to avoid duplicate blocks which does not work in a multiple writer
scenario." If you are indeed using a single writer, you need to
explicitly set the value of
BlobRequestOptions.AbsorbConditionalErrorsOnRetry to true.
You can also check if you are exceeding the 50,000 committed block
limit. Your block sizes are relatively small, so this is a
possibility with sufficiently large files (> 16KB * 50,000 = .82
GB).
In response to "Is this the correct way of doing it?":
If you feel you need to use Append Blobs, try using the CloudAppendBlob.OpenWrite method to achieve functionality more similar to your code example for local storage.
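For example (a rough sketch only, against the older CloudAppendBlob API your appendBlob appears to be using; wReq and the HTML-stripping logic are assumed from your question):
// Sketch: stream the response into the append blob through one blob write stream,
// mirroring the local-file version of the loop.
using (CloudBlobStream blobStream = appendBlob.OpenWrite(createNew: true))
using (var sr = wReq.GetResponse().GetResponseStream())
{
    byte[] buffer = new byte[16 * 1024];
    int read;
    while ((read = sr.Read(buffer, 0, buffer.Length)) > 0)
    {
        // (apply the same </html> stripping as in the local-file version before writing)
        blobStream.Write(buffer, 0, read);
    }
    blobStream.Commit(); // flush any buffered bytes before the stream is disposed
}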
Block Blobs seem like they might be a more appropriate fit for your scenario. Can you please post the code you were using to upload Block Blobs? You should be able to upload to Block Blobs without running out of memory. You can upload different blocks in parallel to achieve faster throughput. Using Append Blobs to append (relatively) small blocks will result in degradation of sequential read performance, as currently append blocks are not defragmented.
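For reference, the block-level upload looks roughly like this (a hedged sketch against the same older SDK; the UploadInBlocks helper, the 4 MB block size, and the zero-padded block numbering are my own choices, not from your code):
// Hypothetical helper: stage a stream as individual blocks, then commit the block list.
static void UploadInBlocks(CloudBlockBlob blockBlob, Stream sourceStream)
{
    var blockIds = new List<string>();
    byte[] buffer = new byte[4 * 1024 * 1024]; // 4 MB per block keeps memory bounded
    int read, blockNumber = 0;
    while ((read = sourceStream.Read(buffer, 0, buffer.Length)) > 0)
    {
        // Block IDs must all be Base64 strings of the same length
        string blockId = Convert.ToBase64String(Encoding.UTF8.GetBytes(blockNumber.ToString("d6")));
        blockNumber++;
        using (var blockData = new MemoryStream(buffer, 0, read, writable: false))
        {
            blockBlob.PutBlock(blockId, blockData, null); // third argument is an optional content MD5
        }
        blockIds.Add(blockId);
    }
    blockBlob.PutBlockList(blockIds); // commits the staged blocks in the given order
}
The same loop can be parallelized by issuing several PutBlockAsync calls concurrently (each with its own buffer) and calling PutBlockList once they have all completed.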
Please let me know if any of these solutions work for you!

C# - Downloading from Google Drive in byte chunks

I'm currently developing for an environment that has poor network connectivity. My application helps to automatically download required Google Drive files for users. It works reasonably well for small files (ranging from 40KB to 2MB), but fails far too often for larger files (9MB). I know these file sizes might seem small, but in terms of my client's network environment, Google Drive API constantly fails with the 9MB file.
I've concluded that I need to download files in smaller byte chunks, but I don't see how I can do that with Google Drive API. I've read this over and over again, and I've tried the following code:
// with the Drive File ID, and the appropriate export MIME type, I create the export request
var request = DriveService.Files.Export(fileId, exportMimeType);
// take the message so I can modify it by hand
var message = request.CreateRequest();
var client = request.Service.HttpClient;
// I change the Range headers of both the client, and message
client.DefaultRequestHeaders.Range =
message.Headers.Range =
new System.Net.Http.Headers.RangeHeaderValue(100, 200);
var response = await request.Service.HttpClient.SendAsync(message);
// if status code = 200, copy to local file
if (response.IsSuccessStatusCode)
{
using (var fileStream = new FileStream(downloadFileName, FileMode.CreateNew, FileAccess.ReadWrite))
{
await response.Content.CopyToAsync(fileStream);
}
}
The resultant local file (from fileStream), however, is still full-length (i.e. a 40 KB file for the 40 KB Drive file, and a 500 Internal Server Error for the 9 MB file). On a side note, I've also experimented with ExportRequest.MediaDownloader.ChunkSize, but from what I observe it only changes the frequency at which the ExportRequest.MediaDownloader.ProgressChanged callback is called (i.e. the callback triggers every 256 KB if ChunkSize is set to 256 * 1024).
How can I proceed?
You seemed to be heading in the right direction. From your last comment, the request will update progress based on the chunk size, so your observation was accurate.
Looking into the source code for MediaDownloader in the SDK, the following was found (emphasis mine): "The core download logic. We download the media and write it to an output stream ChunkSize bytes at a time, raising the ProgressChanged event after each chunk. The chunking behavior is largely a historical artifact: a previous implementation issued multiple web requests, each for ChunkSize bytes. Now we do everything in one request, but the API and client-visible behavior are retained for compatibility."
Your example code will only download one chunk, from bytes 100 to 200. Using that approach, you would have to keep track of an index and download each chunk manually, copying each partial download to the file stream:
const int KB = 0x400;
int ChunkSize = 256 * KB; // 256KB;
public async Task ExportFileAsync(string downloadFileName, string fileId, string exportMimeType) {
var exportRequest = driveService.Files.Export(fileId, exportMimeType);
var client = exportRequest.Service.HttpClient;
//you would need to know the file size
var size = await GetFileSize(fileId);
using (var file = new FileStream(downloadFileName, FileMode.CreateNew, FileAccess.ReadWrite)) {
file.SetLength(size);
var chunks = (size / ChunkSize) + 1;
for (long index = 0; index < chunks; index++) {
var request = exportRequest.CreateRequest();
var from = index * ChunkSize;
var to = from + ChunkSize - 1;
request.Headers.Range = new RangeHeaderValue(from, to);
var response = await client.SendAsync(request);
if (response.StatusCode == HttpStatusCode.PartialContent || response.IsSuccessStatusCode) {
using (var stream = await response.Content.ReadAsStreamAsync()) {
file.Seek(from, SeekOrigin.Begin);
await stream.CopyToAsync(file);
}
}
}
}
}
private async Task<long> GetFileSize(string fileId) {
var request = driveService.Files.Get(fileId);
request.Fields = "size"; // make sure the size field is included in the response
var file = await request.ExecuteAsync();
return file.Size ?? 0; // Size is a nullable long on the File resource
}
This code makes some assumptions about the Drive API/server:
That the server will allow the multiple requests needed to download the file in chunks (I don't know whether such requests are throttled).
That the server still accepts the Range header, as stated in the developer documentation.
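If you want to check that second assumption up front, a small probe request should tell you whether ranges are honoured (a hypothetical addition, reusing exportRequest and client from the method above):
// Ask for only the first byte; a 206 Partial Content reply means Range is being honoured.
var probe = exportRequest.CreateRequest();
probe.Headers.Range = new RangeHeaderValue(0, 0);
var probeResponse = await client.SendAsync(probe);
bool supportsRanges = probeResponse.StatusCode == HttpStatusCode.PartialContent;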

Raw Stream Has Data, Deflate Returns Zero Bytes

I'm reading data (an adCenter report, as it happens), which is supposed to be zipped. Reading the contents with an ordinary stream, I get a couple thousand bytes of gibberish, so this seems reasonable. So I feed the stream to DeflateStream.
First, it reports "Block length does not match with its complement." A brief search suggests that there is a two-byte prefix, and indeed if I call ReadByte() twice before opening DeflateStream, the exception goes away.
However, DeflateStream now returns nothing at all. I've spent most of the afternoon chasing leads on this, with no luck. Help me, StackOverflow, you're my only hope! Can anyone tell me what I'm missing?
Here's the code. Naturally I only enabled one of the two commented blocks at a time when testing.
_results = new List<string[]>();
using (Stream compressed = response.GetResponseStream())
{
// Skip the zlib prefix, which conflicts with the deflate specification
compressed.ReadByte(); compressed.ReadByte();
// Reports reading 3,000-odd bytes, followed by random characters
/*byte[] buffer = new byte[4096];
int bytesRead = compressed.Read(buffer, 0, 4096);
Console.WriteLine("Read {0} bytes.", bytesRead.ToString("#,##0"));
string content = Encoding.ASCII.GetString(buffer, 0, bytesRead);
Console.WriteLine(content);*/
using (DeflateStream decompressed = new DeflateStream(compressed, CompressionMode.Decompress))
{
// Reports reading 0 bytes, and no output
/*byte[] buffer = new byte[4096];
int bytesRead = decompressed.Read(buffer, 0, 4096);
Console.WriteLine("Read {0} bytes.", bytesRead.ToString("#,##0"));
string content = Encoding.ASCII.GetString(buffer, 0, bytesRead);
Console.WriteLine(content);*/
using (StreamReader reader = new StreamReader(decompressed))
while (reader.EndOfStream == false)
_results.Add(reader.ReadLine().Split('\t'));
}
}
As you can probably guess from the last line, the unzipped content should be TDT.
Just for fun, I tried decompressing with GZipStream, but it reports that the magic number is not correct. MS' docs just say "The downloaded report is compressed by using zip compression. You must unzip the report before you can use its contents."
Here's the code that finally worked. I had to save the content out to a file and read it back in. This does not seem reasonable, but for the small quantities of data I'm working with it's acceptable, so I'll take it!
WebRequest request = HttpWebRequest.Create(reportURL);
WebResponse response = request.GetResponse();
_results = new List<string[]>();
using (Stream compressed = response.GetResponseStream())
{
// Save the content to a temporary location
string zipFilePath = @"\\Server\Folder\adCenter\Temp.zip";
using (StreamWriter file = new StreamWriter(zipFilePath))
{
compressed.CopyTo(file.BaseStream);
file.Flush();
}
// Get the first file from the temporary zip
ZipFile zipFile = ZipFile.Read(zipFilePath);
if (zipFile.Entries.Count > 1) throw new ApplicationException("Found " + zipFile.Entries.Count.ToString("#,##0") + " entries in the report; expected 1.");
ZipEntry report = zipFile[0];
// Extract the data
using (MemoryStream decompressed = new MemoryStream())
{
report.Extract(decompressed);
decompressed.Position = 0; // Note that the stream does NOT start at the beginning
using (StreamReader reader = new StreamReader(decompressed))
while (reader.EndOfStream == false)
_results.Add(reader.ReadLine().Split('\t'));
}
}
You will find that DeflateStream is hugely limited in what data it will decompress. In fact, if you are expecting entire ZIP files, it will be of no use at all.
There are hundreds of (mostly small) variations of ZIP files, and DeflateStream will get along with only two or three of them.
The best way is likely to use a dedicated library for reading Zip files/streams, like DotNetZip or SharpZipLib (somewhat unmaintained).
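For what it's worth, the DotNetZip call used in the working code above can also read from memory, which would avoid the temporary file (a sketch only; ZipFile.Read needs a seekable stream, so the response is copied into a MemoryStream first):
// Sketch: same DotNetZip extraction as above, but without writing a temp file to disk.
using (Stream compressed = response.GetResponseStream())
using (var zipBytes = new MemoryStream())
{
    compressed.CopyTo(zipBytes);
    zipBytes.Position = 0;
    using (ZipFile zipFile = ZipFile.Read(zipBytes))
    using (var decompressed = new MemoryStream())
    {
        zipFile[0].Extract(decompressed); // the report is expected to contain a single entry
        decompressed.Position = 0;
        using (StreamReader reader = new StreamReader(decompressed))
            while (reader.EndOfStream == false)
                _results.Add(reader.ReadLine().Split('\t'));
    }
}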
You could write the stream to a file and try my tool Precomp on it. If you use it like this:
precomp -c- -v [name of input file]
any ZIP/gZip stream(s) inside the file will be detected and some verbose information will be reported (position and length of the stream). Additionally, if they can be decompressed and recompressed bit-to-bit identical, the output file will contain the decompressed stream(s).
Precomp detects ZIP/gZip (and some other) streams anywhere in the file, so you won't have to worry about header bytes or garbage at the beginning of the file.
If it doesn't detect a stream like this, try to add -slow, which detects deflate streams even if they don't have a ZIP/gZip header. If this fails, you can try -brute which even detects deflate streams that lack the two byte header, but this will be extremely slow and can cause false positives.
After that, you'll know if there is a (valid) deflate stream in the file and if so, the additional information should help you to decompress other reports correctly using zLib decompression routines or similar.

How to avoid C# Azure API from running out of memory for large blob uploads?

I'm trying to upload very large (>100 GB) blobs to Azure using Microsoft.Azure.Storage.Blob (9.4.2). However, it appears that even when using the stream-based blob write API, the library allocates memory proportional to the size of the file (a 1.2 GB test file results in a 2 GB process memory footprint). I need this to work in constant memory. My code is below (I get similar results using UploadFromFile, UploadFromStream, etc.):
var container = new CloudBlobContainer(new Uri(sasToken));
var blob = container.GetBlockBlobReference("test");
const int bufferSize = 64 * 1024 * 1024; // 64MB
blob.StreamWriteSizeInBytes = bufferSize;
using (var writeStream = blob.OpenWrite())
{
using (var readStream = new FileStream(archiveFilePath, FileMode.Open))
{
var buffer = new byte[bufferSize];
var bytesRead = 0;
while ((bytesRead = readStream.Read(buffer, 0, bufferSize)) != 0)
{
writeStream.Write(buffer, 0, bytesRead);
}
}
}
This behavior is pretty baffling - I can see in TaskMgr that the upload indeed starts right away, so it's not like it's buffering things up waiting to send; there is no reason why it needs to hang on to previously sent data. How does anyone use this API for non-trivial blob uploads?
I suggest you take a look at the BlobStorageMultipartStreamProvider sample, as it shows how a request stream can be "forwarded" to an Azure Blob stream, which might reduce the amount of memory used on the server side while uploading.
Hope it helps!
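If moving to the newer Azure.Storage.Blobs package is an option, it exposes explicit transfer limits that should keep the in-flight buffers bounded by roughly MaximumTransferSize * MaximumConcurrency rather than growing with the file. A hedged sketch, assuming sasToken is the same container SAS URI used in the question:
// Sketch: let the SDK stage and commit blocks itself, with capped chunk size and concurrency.
var blobClient = new BlobContainerClient(new Uri(sasToken)).GetBlobClient("test");
var options = new BlobUploadOptions
{
    TransferOptions = new StorageTransferOptions
    {
        InitialTransferSize = 8 * 1024 * 1024, // 8 MB first request
        MaximumTransferSize = 8 * 1024 * 1024, // 8 MB per staged block
        MaximumConcurrency = 4                 // at most 4 blocks in flight
    }
};
using (var readStream = new FileStream(archiveFilePath, FileMode.Open))
{
    blobClient.Upload(readStream, options);
}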
