BlobServiceClient takes forever to read files from Blob Storage - c#

I have the following code that retrieves JSON files from Blob Storage. It takes about 2-3 seconds to return 15,000 file names.
var blobServiceClient = new BlobServiceClient(connectionString);
BlobContainerClient blobContainer = blobServiceClient.GetBlobContainerClient("ContainerName");
var foundItems = blobServiceClient.FindBlobsByTags("Client = 'AAA'").ToList();
foreach (var blob in foundItems)
{
    myList.Add(blob.BlobName);
}
Now I want to read the content of each file. The files are pretty small (about 500 B per JSON file), but it takes forever and I have to cancel the request.
var blobServiceClient = new BlobServiceClient(connectionString);
BlobContainerClient blobContainer = blobServiceClient.GetBlobContainerClient("ContainerName");
var foundItems = blobServiceClient.FindBlobsByTags("Client = 'AAA'").ToList();
foreach (var blob in foundItems)
{
    var blobClient2 = blobContainer.GetBlockBlobClient(blob.BlobName);
    BlobDownloadInfo download = await blobClient2.DownloadAsync();
    var content = download.Content;
    using (var streamReader = new StreamReader(content))
    {
        while (!streamReader.EndOfStream)
        {
            var line = await streamReader.ReadLineAsync();
            myList.Add(line);
        }
    }
}
From my debugging, the following line takes forever. Is this the proper way to read files from Blob Storage?
BlobDownloadInfo download = await blobClient2.DownloadAsync();
More testing:
If I limit the foreach loop to 200 items, it takes about 10 seconds; 500 items take about 90 seconds:
if (++count == 200) break;
Are there any other ways to read the content from the files?
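The sequential loop is the likely bottleneck here: each await DownloadAsync() is a separate round trip, so 15,000 small blobs are fetched one at a time. Below is a minimal sketch of downloading them concurrently instead, assuming Azure.Storage.Blobs 12.9+ (for DownloadContentAsync), .NET 6+ (for Parallel.ForEachAsync), and that collecting whole file contents is acceptable; the degree of parallelism is an arbitrary starting point, not a recommendation.

using System.Collections.Concurrent;
using System.Linq;
using Azure.Storage.Blobs;

var blobServiceClient = new BlobServiceClient(connectionString);
var blobContainer = blobServiceClient.GetBlobContainerClient("ContainerName");
var foundItems = blobServiceClient.FindBlobsByTags("Client = 'AAA'").ToList();

// List<T> is not thread-safe, so results are collected in a ConcurrentBag first.
var results = new ConcurrentBag<string>();

await Parallel.ForEachAsync(
    foundItems,
    new ParallelOptions { MaxDegreeOfParallelism = 32 },
    async (blob, ct) =>
    {
        var blobClient = blobContainer.GetBlobClient(blob.BlobName);
        // Buffers the (small) blob in memory in a single call.
        var response = await blobClient.DownloadContentAsync(ct);
        results.Add(response.Value.Content.ToString());
    });

myList.AddRange(results); // split into lines here if line-by-line entries are needed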

Related

c# - MailKit imap download attachments in memory/byte[] (No access to the file system)

I need to download attachments from an email; I'm using IMAP (via MailKit).
I don't have access to the file system, so I have to convert each attachment to a byte[] in order to store it in Azure Storage.
I have tried this example: http://www.mimekit.net/docs/html/M_MailKit_Net_Imap_ImapFolder_GetBodyPartAsync.htm
But again, I don't have access to the file system; I need a byte[] so I can store it in Azure Storage.
I've also tried this one:
MailKit: How to download all attachments locally from a MimeMessage
But again it is stored locally in the file system.
Here is the code, but as currently written it saves to the local file system:
await client.Inbox.OpenAsync(FolderAccess.ReadOnly);
var items = await client.Inbox.FetchAsync(new List<UniqueId>() { new UniqueId(uid) }, MessageSummaryItems.UniqueId | MessageSummaryItems.BodyStructure);
foreach (var item in items)
{
    var bodyPart = item.TextBody;
    foreach (var attachment in item.Attachments)
    {
        var entity = await client.Inbox.GetBodyPartAsync(item.UniqueId, attachment);
        var fileName = attachment.ContentDisposition?.FileName ?? attachment.ContentType.Name;
        var directory = @"C:\temp\mails";
        if (entity is MessagePart)
        {
            var rfc822 = (MessagePart)entity;
            var path = Path.Combine(directory, fileName);
            await rfc822.Message.WriteToAsync(path);
        }
        else
        {
            var part = (MimePart)entity;
            var path = Path.Combine(directory, fileName);
            using (var stream = File.Create(path))
                await part.Content.DecodeToAsync(stream);
        }
    }
}
I've tried this, but the file that comes out of it doesn't work:
var directory = @"C:\temp\mails";
using (var stream = new MemoryStream())
{
    if (entity is MessagePart)
    {
        var rfc822 = (MessagePart)entity;
        await rfc822.Message.WriteToAsync(stream);
    }
    else
    {
        var part = (MimePart)entity;
        await part.Content.DecodeToAsync(stream);
    }
    // To test if the file is converted, and readable
    var byteArr = stream.ToByteArray();
    File.WriteAllBytes(Path.Combine(directory, fileName), byteArr);
}
Thanks @ckuri, the solution is:
var byteArr = stream.ToArray();
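For completeness, here is a minimal sketch of the in-memory variant with that fix applied, plus a hypothetical upload to Azure Blob Storage via the v12 Azure.Storage.Blobs client; the storageConnectionString variable and the "attachments" container name are placeholders, not part of the original question.

using (var stream = new MemoryStream())
{
    if (entity is MessagePart rfc822)
        await rfc822.Message.WriteToAsync(stream);
    else
        await ((MimePart)entity).Content.DecodeToAsync(stream);

    // ToArray(), not ToByteArray(), returns the buffered bytes.
    byte[] byteArr = stream.ToArray();

    // Hypothetical upload; adjust the container/blob naming to your setup.
    var containerClient = new BlobContainerClient(storageConnectionString, "attachments");
    await containerClient.UploadBlobAsync(fileName, new MemoryStream(byteArr));
}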

How to zip multiple S3 Objects in a single zip file and move it to another folder in the same bucket using C#

I'm trying to write a Lambda function that zips all the S3 objects in the Download folder into a single zip file and then moves that zip file to the BatchDownload folder in the same S3 bucket.
ListObjectsRequest downloadS3Object = new ListObjectsRequest
{
    BucketName = sample,
    Prefix = download
};
ListObjectsResponse downloadResponse = s3Client.ListObjectsAsync(downloadS3Object).Result;
List<string> downloadS3ObjectKeys = downloadResponse.S3Objects.Where(x => !string.IsNullOrEmpty(Path.GetFileName(x.Key)))
    .Select(s3Object => s3Object.Key)
    .ToList();
foreach (string downloadS3ObjectKey in downloadS3ObjectKeys)
{
    ListObjectsRequest checkBatchDownload = new ListObjectsRequest
    {
        BucketName = sample,
        Prefix = batchDownload
    };
    ListObjectsResponse s3ObjectResponse = s3Client.ListObjectsAsync(checkBatchDownload).Result;
    bool IsArchived = false;
    if (s3ObjectResponse.S3Objects.Count <= 0)
    {
        PutObjectRequest createBatchFolder = new PutObjectRequest()
        {
            BucketName = sample,
            Key = batchDownload
        };
        s3Client.PutObjectAsync(createBatchFolder);
    }
In the above code I'm getting all the objects from the Download folder and then looping through each object key. I don't understand how to zip all of the objects into a single zip file. Is there a better way to do this without fetching the object keys separately?
Can you please help with code to zip all the objects in the Download folder into one zip file and move that file to a new folder?
I'm not sure why you appear to be calling ListObjects again, as well as just re-uploading the same objects again, but it doesn't seem right.
It seems you want to download all your objects, place them in a zip archive, and re-upload it.
So you need something like the following:
var downloadS3Object = new ListObjectsRequest
{
    BucketName = sample,
    Prefix = download
};
var downloadResponse = await s3Client.ListObjectsAsync(downloadS3Object);
List<string> downloadS3ObjectKeys = downloadResponse.S3Objects
    .Where(x => !string.IsNullOrEmpty(Path.GetFileName(x.Key)))
    .Select(s3Object => s3Object.Key)
    .ToList();
var stream = new MemoryStream();
using (var zip = new ZipArchive(stream, ZipArchiveMode.Update, true))
{
    foreach (string downloadS3ObjectKey in downloadS3ObjectKeys)
    {
        var getObject = new GetObjectRequest
        {
            BucketName = sample,
            Key = downloadS3ObjectKey,
        };
        var entry = zip.CreateEntry(downloadS3ObjectKey);
        using (var zipStream = entry.Open())
        using (var objectResponse = await s3Client.GetObjectAsync(getObject))
        using (var objectStream = objectResponse.ResponseStream)
        {
            await objectStream.CopyToAsync(zipStream);
        }
    }
}
stream.Position = 0; // reset the MemoryStream to the beginning
var createBatchFolder = new PutObjectRequest()
{
    BucketName = sample,
    Key = batchDownload,
    InputStream = stream,
};
await s3Client.PutObjectAsync(createBatchFolder);
Note the use of using to dispose the streams and responses that need it, and do not use .Result as you may deadlock; use await instead.
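One detail worth double-checking in the code above: PutObjectRequest.Key is set to the batchDownload prefix itself, so the archive is written to exactly that key. If the zip should show up as a named file inside the BatchDownload folder, give it an explicit key; the archive.zip name below is just a placeholder.

var createBatchFolder = new PutObjectRequest
{
    BucketName = sample,
    // "archive.zip" is a hypothetical name; any key under the BatchDownload/ prefix works.
    Key = $"{batchDownload.TrimEnd('/')}/archive.zip",
    InputStream = stream,
};
await s3Client.PutObjectAsync(createBatchFolder);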

How to improve performance of downloading large size azure blob file over a stream?

I have a JSON blob file of around 212 MB.
Locally, while debugging, it takes around 15 minutes to download.
When I deploy the code to an Azure App Service it runs for 10 minutes and then fails with the following error (locally it fails intermittently with the same error):
Server failed to authenticate the request. Make sure the value of
Authorization header is formed correctly including the signature
Code Attempt 1:
// Create a SAS token for the blob, valid for 15 minutes, with read permission
SharedAccessBlobPolicy sasConstraints = new SharedAccessBlobPolicy
{
    SharedAccessExpiryTime = DateTime.UtcNow.AddMinutes(15),
    Permissions = SharedAccessBlobPermissions.Read
};
var blob = cloudBlobContainer.GetBlockBlobReference(blobFilePath);
string sasContainerToken = blob.GetSharedAccessSignature(sasConstraints);
var cloudBlockBlob = new CloudBlockBlob(new Uri(blob.Uri + sasContainerToken));
using (var stream = new MemoryStream())
{
    await cloudBlockBlob.DownloadToStreamAsync(stream);
    // resetting stream's position to 0
    stream.Position = 0;
    var serializer = new JsonSerializer();
    using (var sr = new StreamReader(stream))
    {
        using (var jsonTextReader = new JsonTextReader(sr))
        {
            jsonTextReader.SupportMultipleContent = true;
            result = new List<T>();
            while (jsonTextReader.Read())
            {
                result.Add(serializer.Deserialize<T>(jsonTextReader));
            }
        }
    }
}
Code Attempt 2: I have tried using DownloadRangeToStreamAsync to download the blob in chunks, but nothing changed:
int bufferLength = 1 * 1024 * 1024; // 1 MB chunk
long blobRemainingLength = blob.Properties.Length;
Queue<KeyValuePair<long, long>> queues = new Queue<KeyValuePair<long, long>>();
long offset = 0;
do
{
    long chunkLength = (long)Math.Min(bufferLength, blobRemainingLength);
    offset += chunkLength;
    blobRemainingLength -= chunkLength;
    using (var ms = new MemoryStream())
    {
        await blob.DownloadRangeToStreamAsync(ms, offset, chunkLength);
        ms.Position = 0;
        lock (outPutStream)
        {
            outPutStream.Position = offset;
            var bytes = ms.ToArray();
            outPutStream.Write(bytes, 0, bytes.Length);
        }
    }
}
while (blobRemainingLength > 0);
I don't think 212 MB of data is a large JSON file. Can you please suggest a solution?
I suggest you give the Azure Storage Data Movement Library a try.
I tested with a slightly larger file of 220 MB; it takes about 5 minutes to download it into memory.
The sample code:
SharedAccessBlobPolicy sasConstraints = new SharedAccessBlobPolicy
{
    SharedAccessExpiryTime = DateTime.UtcNow.AddMinutes(15),
    Permissions = SharedAccessBlobPermissions.Read
};
CloudBlockBlob blob = blobContainer.GetBlockBlobReference("t100.txt");
string sasContainerToken = blob.GetSharedAccessSignature(sasConstraints);
var cloudBlockBlob = new CloudBlockBlob(new Uri(blob.Uri + sasContainerToken));
var stream = new MemoryStream();
// set this value as per your need
TransferManager.Configurations.ParallelOperations = 5;
Console.WriteLine("begin to download...");
// use Stopwatch to calculate the time
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
DownloadOptions options = new DownloadOptions();
options.DisableContentMD5Validation = true;
// these lines are just for checking the download progress; you can remove them from your code
SingleTransferContext context = new SingleTransferContext();
context.ProgressHandler = new Progress<TransferStatus>((progress) =>
{
    Console.WriteLine("Bytes downloaded: {0}", progress.BytesTransferred);
});
var task = TransferManager.DownloadAsync(cloudBlockBlob, stream, options, context);
task.Wait();
stopwatch.Stop();
Console.WriteLine("the length of the stream is: " + stream.Length);
Console.WriteLine("the time taken is: " + stopwatch.ElapsedMilliseconds);
The test result: the 220 MB file downloaded into memory in about 5 minutes (screenshot not included).
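If you can move to the newer Azure.Storage.Blobs v12 SDK, parallel range downloads are built in, so the Data Movement Library is not strictly required. Here is a minimal sketch under that assumption; the container and blob names are placeholders, and the transfer sizes are arbitrary starting points.

using Azure.Storage;
using Azure.Storage.Blobs;

// Sketch only: the v12 SDK downloads large blobs in parallel ranges for you.
var blobClient = new BlobServiceClient(connectionString)
    .GetBlobContainerClient("mycontainer")
    .GetBlobClient("t100.txt");

var transferOptions = new StorageTransferOptions
{
    MaximumConcurrency = 5,                 // number of parallel range requests
    MaximumTransferSize = 8 * 1024 * 1024   // range size in bytes (an arbitrary choice)
};

var stream = new MemoryStream();
await blobClient.DownloadToAsync(stream, transferOptions: transferOptions);
stream.Position = 0; // ready for deserialization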

C# migrating files from MongoDB to Azure Blob Storage with "UploadFromStream" method not working

I'm having problems migrating files from MongoDB to Azure Blob Storage.
The following method gets a GridFSFile object (which represents a file in MongoDB GridFS file storage) and then calls the uploadMemoryStream method to do the upload.
It's worth mentioning that gridFSFile does have content after FindById, that its length is non-zero, and that its position is initially 0.
The gridFSFile.Open method creates a Stream object that I then pass as an argument to the upload.
private static void iterateOverVersionCollection(Version version, Asset asset)
{
    try
    {
        string _gridFSId = version.GridFSId;
        GridFSFile gridFSFile = gridFSFileStorage.FindById(_gridFSId);
        if (gridFSFile == null) return;
        string size = version.Name.ToLower();
        asset.Size = size;
        CloudBlockBlob blockBlob = GetBlockBlobReference(version, gridFSFile, asset);
        uploadMemoryStream(blockBlob, gridFSFile, asset);
        asset.UploadedOK = true;
    }
    catch (StorageException ex)
    {
        asset.UploadedOK = false;
        logException(ex, asset);
    }
}

private static void uploadMemoryStream(CloudBlockBlob blockBlob, GridFSFile gridFSFile, Asset asset)
{
    Stream st = gridFSFile.Open();
    blockBlob.UploadFromStream(st);
}
UploadFromStream takes forever and never completes the upload. One thing to mention: no matter how I work with gridFSFile, even if I try to copy it into a MemoryStream with C#'s Stream.CopyTo method, it also runs forever and never finishes, so the app gets stuck at blockBlob.UploadFromStream(st);
Instead of just passing gridFSFile.Open() to UploadFromStream, I've also tried the following piece of code:
using (var stream = new MemoryStream())
{
    byte[] buffer = new byte[2048]; // read in chunks of 2KB
    int bytesRead;
    while ((bytesRead = st.Read(buffer, 0, buffer.Length)) > 0)
    {
        stream.Write(buffer, 0, bytesRead);
    }
    byte[] result = stream.ToArray();
}
But the same thing happens; the program gets stuck on the st.Read line.
Any help will be much appreciated.
Please note that since UploadFromFileAsync() or UploadFromStream() is not a reliable and efficient operation for a huge blob, I'd suggest you consider the following alternatives:
If you can accept a command-line tool, you can try AzCopy, which transfers Azure Storage data with high performance, and its transfers can be paused and resumed.
If you want to control the transfer jobs programmatically, use the Azure Storage Data Movement Library, which is the core of AzCopy. Sample code for the same:
string storageConnectionString = "myStorageConnectionString";
CloudStorageAccount account = CloudStorageAccount.Parse(storageConnectionString);
CloudBlobClient blobClient = account.CreateCloudBlobClient();
CloudBlobContainer blobContainer = blobClient.GetContainerReference("mycontainer");
blobContainer.CreateIfNotExistsAsync().Wait();
string sourcePath = @"C:\Tom\TestLargeFile.zip";
CloudBlockBlob destBlob = blobContainer.GetBlockBlobReference("LargeFile.zip");
// Set up the number of concurrent operations
TransferManager.Configurations.ParallelOperations = 64;
// Set up the transfer context and track the upload progress
var context = new SingleTransferContext
{
    ProgressHandler =
        new Progress<TransferStatus>(
            progress => { Console.WriteLine("Bytes uploaded: {0}", progress.BytesTransferred); })
};
// Upload a local blob
TransferManager.UploadAsync(sourcePath, destBlob, null, context, CancellationToken.None).Wait();
Console.WriteLine("Upload finished !");
Console.ReadKey();
If you are still looking to upload the file programmatically from a stream, I would suggest uploading it in chunks, which is possible using the code below:
var container = _client.GetContainerReference("test");
container.CreateIfNotExists();
var blob = container.GetBlockBlobReference(file.FileName);
var blockDataList = new Dictionary<string, byte[]>();
using (var stream = file.InputStream)
{
    var blockSizeInKB = 1024;
    var offset = 0;
    var index = 0;
    while (offset < stream.Length)
    {
        var readLength = Math.Min(1024 * blockSizeInKB, (int)stream.Length - offset);
        var blockData = new byte[readLength];
        offset += stream.Read(blockData, 0, readLength);
        blockDataList.Add(Convert.ToBase64String(BitConverter.GetBytes(index)), blockData);
        index++;
    }
}
Parallel.ForEach(blockDataList, (bi) =>
{
    blob.PutBlock(bi.Key, new MemoryStream(bi.Value), null);
});
blob.PutBlockList(blockDataList.Select(b => b.Key).ToArray());
On the other hand, if you have the file available on your system and want to use the UploadFromFile method, this method also gives you the flexibility to upload the file's data in chunks:
TimeSpan backOffPeriod = TimeSpan.FromSeconds(2);
int retryCount = 1;
BlobRequestOptions bro = new BlobRequestOptions()
{
    SingleBlobUploadThresholdInBytes = 1024 * 1024, // 1 MB, the minimum
    ParallelOperationThreadCount = 1,
    RetryPolicy = new ExponentialRetry(backOffPeriod, retryCount),
};
CloudStorageAccount cloudStorageAccount = CloudStorageAccount.Parse(ConnectionString);
CloudBlobClient cloudBlobClient = cloudStorageAccount.CreateCloudBlobClient();
cloudBlobClient.DefaultRequestOptions = bro;
cloudBlobContainer = cloudBlobClient.GetContainerReference(ContainerName);
CloudBlockBlob blob = cloudBlobContainer.GetBlockBlobReference(Path.GetFileName(fileName));
blob.StreamWriteSizeInBytes = 256 * 1024; // 256 KB
blob.UploadFromFile(fileName, FileMode.Open);
For a detailed explanation, please see:
https://www.red-gate.com/simple-talk/cloud/platform-as-a-service/azure-blob-storage-part-4-uploading-large-blobs/
Hope it helps.

How do I unzip a file of size 40 gb in Azure blob store using C#

How do I unzip a 40 GB file in Azure Blob Storage using C#? I tried using SharpZipLib and Ionic.Zip, but I run into the error:
Bad state (invalid block type)
Can anyone please help me out?
Below is my code:
var storageAccount1 = CloudStorageAccount.Parse(connectionString1);
var blobClient = storageAccount1.CreateCloudBlobClient();
CloudBlobContainer container = blobClient.GetContainerReference("testwwpoc");
if (share.Exists())
{
    Console.WriteLine("Yes");
    Console.WriteLine("Yes");
    CloudFileDirectory rootDir = share.GetRootDirectoryReference();
    var file = container.GetBlobReference("WAV_WW_5988.zip");
    // Ensure that the file exists.
    if (file.Exists())
    {
        using (ZipFile zipFile = ZipFile.Read(file.OpenRead()))
        {
            zipFile.CompressionLevel = Ionic.Zlib.CompressionLevel.Default;
            //zipFile.UseZip64WhenSaving = Zip64Option.Always;
            zipFile.Encryption = EncryptionAlgorithm.None;
            //zipFile.BufferSize = 65536 * 19000;
            zipFile.Password = "xyz";
            Console.WriteLine(zipFile.Entries.Count);
            //var entry = zipFile.Entries.First();
            //CloudFileDirectory sampleDir = rootDir.GetDirectoryReference("WAV_WW_5988");
            //foreach (var entry in zipFile.Entries)
            for (var i = 1; i < zipFile.Count; i++)
            {
                var blob = container.GetBlockBlobReference("test/" + zipFile[i].FileName); //+ entry.FileName);
                Console.WriteLine(zipFile[i].FileName);
                Console.WriteLine(zipFile[i].UncompressedSize);
                try
                {
                    blob.UploadFromStream(zipFile[i].OpenReader());
                }
                catch (Exception ex)
                {
                    Console.WriteLine(ex);
                }
            }
        }
    }
}
