How to perform atomic batch uploading to Azure Blob Storage? - c#

For my application, I need to upload several blobs to different containers in Azure Blob storage as part of a single transaction; that is, either all the files are uploaded successfully or none of them are, so there is no partial upload if the connection breaks, for instance.
It seems the blob storage API does not support batch uploading, so I need to implement it on my end. I considered using TransactionScope, but according to this post the uploaded blob is not cancelled if an exception is raised. Is there a way I can work around this issue?

There is no transaction management specific to blobs. You'd have to build something into your app to coordinate writes across multiple blobs (or track the upload state in a separate database). How you do that is really up to you and your app's design.
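One common workaround is a compensating delete: upload each blob in turn and, if anything fails, delete the blobs that already made it. This is only best-effort rather than a real transaction (a crash between the failure and the cleanup can still leave partial state). A minimal sketch, assuming the Azure.Storage.Blobs SDK; container and blob names are placeholders:

```csharp
// Best-effort "all or nothing" upload: on any failure, delete what already succeeded.
using Azure.Storage.Blobs;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

public static class BestEffortBatchUpload
{
    public static async Task UploadAllOrRollBackAsync(
        BlobServiceClient service,
        IEnumerable<(string Container, string BlobName, Stream Content)> items)
    {
        var uploaded = new List<BlobClient>();
        try
        {
            foreach (var (container, blobName, content) in items)
            {
                var blob = service.GetBlobContainerClient(container).GetBlobClient(blobName);
                await blob.UploadAsync(content, overwrite: true);
                uploaded.Add(blob);
            }
        }
        catch
        {
            // Compensate: best-effort delete of everything uploaded so far.
            foreach (var blob in uploaded)
                await blob.DeleteIfExistsAsync();
            throw;
        }
    }
}
```

If other readers must never see the partial state even briefly, you can also upload to temporary names (or a staging container) first and only "commit" by copying/renaming once every upload has succeeded.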

Related

Azure blob storage migration with huge data in each container with decryption

I have a requirement to migrate encrypted blobs from a source Azure storage account to a destination storage account in decrypted format (Key Vault key).
I wrote C# code for this, but it was taking almost 3 days for a single container. I am now trying an Event Grid-triggered Azure Function connected to the destination storage account (firing on the new-file event) and migrating the blobs with an Azure Data Factory copy pipeline; the Azure Function runs on an App Service plan that can scale out to 10 instances.
Am I on the right path? Is there a more performant way?
If your Azure Function's only job is to initiate the ADF pipeline, then you can take advantage of an event-based trigger instead, or you can opt for a Logic App to do the same job.
Event-driven architecture (EDA) is a popular data integration paradigm that entails event creation, detection, consumption, and response. Data integration scenarios frequently need pipelines triggered by storage account events, such as the arrival or deletion of a file in an Azure Blob Storage account.
Please check the link below to learn more about event-based triggers: Create a trigger that runs a pipeline in response to a storage event | Microsoft Docs
Also, consider increasing the DIUs (Data Integration Units) and the parallel copy setting inside the copy activity, which helps improve copy performance.
Sometimes you need to migrate a large amount of data from a data lake or an enterprise data warehouse (EDW) to Azure; other times you need to import huge volumes of data into Azure from several sources for big data analytics. In each scenario, achieving optimal performance and scalability is important.
Please check the link below for more details: Copy activity performance and scalability guide
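If you do keep the Azure Function in the middle (rather than letting an ADF storage-event trigger start the pipeline directly), kicking off the pipeline run from C# is only a few lines. A hedged sketch using the Microsoft.Azure.Management.DataFactory SDK; all resource names and the "sourcePath" pipeline parameter are assumptions:

```csharp
// Start an ADF pipeline run from code (e.g. from an Event Grid-triggered function).
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Management.DataFactory;
using Microsoft.Rest;

public class PipelineStarter
{
    // Placeholder identifiers; replace with your own resources.
    private const string SubscriptionId = "<subscription-id>";
    private const string ResourceGroup  = "<resource-group>";
    private const string FactoryName    = "<data-factory-name>";
    private const string PipelineName   = "<copy-pipeline-name>";

    public async Task<string> StartCopyAsync(ServiceClientCredentials credentials, string blobPath)
    {
        var client = new DataFactoryManagementClient(credentials)
        {
            SubscriptionId = SubscriptionId
        };

        // Pass the blob path that raised the event as a pipeline parameter
        // (the pipeline is assumed to expose a "sourcePath" parameter).
        var parameters = new Dictionary<string, object> { ["sourcePath"] = blobPath };

        var run = await client.Pipelines.CreateRunAsync(
            ResourceGroup, FactoryName, PipelineName, parameters: parameters);

        return run.RunId; // poll client.PipelineRuns to track progress
    }
}
```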

How do I detect an azure blob storage file change and delete local cache instance of that file?

I am currently migrating a legacy .NET application from a dedicated server to an Azure Web App. The application uses System.Web.Caching.CacheDependency for XML file caching.
Caching.CacheDependency(xmlFile) normally detects changes made to the file and updates the cache with the latest version.
The issue is that the files are now stored in an Azure storage account (i.e. not on the local file system) and I need a way to detect changes made to them. Caching.CacheDependency(xmlFile) will not work in this case, as it expects a local path.
Since the file-based CacheDependency does not detect changes to files in Azure Blob Storage, how can the web app detect changes and remove the stale file from its local cache?
I am thinking that a WebJob or Azure Function with a blob trigger will solve the file-monitoring part, but how do I remove the file from the System.Web cache of the web app? I am also concerned about excessive resource consumption; there are thousands of files.
Has anyone run into this yet, and if so, what was your solution?
I had an issue like that.
The solution was to create a new endpoint in the web app that just clears the cache entry. We built a WebJob with a blob storage trigger; when the trigger fires, the WebJob calls the new endpoint with a POST, and the next cache read picks up the new data.
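A sketch of that pattern, under two assumptions: the WebJobs SDK's [BlobTrigger] watches the container holding the XML files, and the web app exposes a hypothetical /cache/invalidate endpoint that removes the entry from the legacy HttpRuntime.Cache:

```csharp
// WebJob side: fires whenever a blob in "xml-config" is added or updated,
// then notifies the web app which cached file is now stale.
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;

public class CacheInvalidationJob
{
    private static readonly HttpClient Http = new HttpClient();

    public static async Task OnXmlChanged(
        [BlobTrigger("xml-config/{name}")] Stream blob, string name)
    {
        await Http.PostAsync(
            "https://<your-web-app>.azurewebsites.net/cache/invalidate?key=" + name,
            content: null);
    }
}

// Web app side (ASP.NET MVC), the endpoint just drops the stale entry:
//
// [HttpPost]
// public ActionResult Invalidate(string key)
// {
//     System.Web.HttpRuntime.Cache.Remove(key);
//     return new HttpStatusCodeResult(200);
// }
```

With thousands of files this stays cheap, because the WebJob only runs when a blob actually changes and only the affected cache key is evicted.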

Redeploy All Files to Cloud Storage After Implementing within Kentico

I had a Kentico site that I was using at first without any cloud storage. Now that I have switched to Amazon S3 using their documentation (https://docs.kentico.com/display/K8/Configuring+Amazon+S3), I have a lot of files that are still being stored locally. I want to move them into the cloud automatically without having to touch each file.
Is there an easy way to automatically push files in media library, app_theme, attachments, images, etc into the new bucket in the Amazon S3 cloud storage?
It is possible to move all the files to Amazon S3 storage, but it is important to note a few downsides of the process.
First of all, you can copy all the files you want to move to the corresponding bucket in the storage, which ensures the system will look for and retrieve the files from the storage instead of the local file system. However, I would also recommend deleting the files from your local file system afterwards, because keeping both copies can cause conflicts in specific scenarios.
The downside is that Amazon S3 does not support some special characters, so you might need to adjust file names manually, which means the references to those files would need to be changed accordingly.
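For the bulk copy itself, a minimal sketch using the AWS SDK for .NET (AWSSDK.S3 package); the folder path and bucket name are placeholders, and the key layout in the bucket must match what Kentico expects per its Amazon S3 configuration:

```csharp
// Recursively upload a local folder (e.g. the media library) into an S3 bucket.
using System.IO;
using System.Threading.Tasks;
using Amazon.S3;
using Amazon.S3.Transfer;

public static class MediaLibraryMigration
{
    public static async Task CopyFolderToS3Async(string localFolder, string bucketName)
    {
        using var s3 = new AmazonS3Client(); // credentials/region from environment or config
        var transfer = new TransferUtility(s3);

        await transfer.UploadDirectoryAsync(new TransferUtilityUploadDirectoryRequest
        {
            Directory = localFolder,
            BucketName = bucketName,
            SearchOption = SearchOption.AllDirectories,
            SearchPattern = "*"
        });
    }
}
```

This only copies the files; it does not rename anything, so files with characters S3 rejects still have to be fixed up (and re-referenced) by hand as noted above.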

Azure Instance change deleted files

I am new to Windows Azure. I am using a 6-month subscription plan with 1 VM. Yesterday the instance of my VM was replaced and my files in the root folder were deleted. How can I restore those files, and how can I prevent this in the future?
Azure PAAS instances aren't meant to store anything persistently on local disk since it may/will be wiped regularly when the instances are automatically replaced.
For best performance, you could store the files in blob storage where they can be accessed by any server using your storage account instead of just the single machine.
If you really need persistent file storage, you can attach blob storage as a local disk and store your data there. Note though that the disk will only be accessible by one instance at a time.
As for the files stored on the local file system when the instance was replaced, unless you have a backup, I know of no way to restore them.
This link is a good read regarding various storage options in Azure that gives much more details than this space allows for.
If you are using a virtual machine (IaaS):
Add a data disk and store files there, and not on the OS disk
You are responsible for making backups yourself
If you are using cloud services (PaaS):
Don't store data on the machine itself, use that only for cache, temporary data
Add a data disk and mount it, if it's a single server
Use blob storage if the data is used from multiple hosts

PDF Attachments in Azure, use memory or temporary directory?

I'm planning on writing an application that sends multiple PDFs to the users' emails as attachments.
Should I use memory (MemoryStream) or is there a temporary directory that I can use? Which is more advisable? Thanks!
BTW I'm using C# ASP.NET
I would go with file-system storage, since memory is a more scarce resource. Windows Azure provides Local Storage Resources for this purpose, which are areas of disk that you configure in Service Definition and then access through the Azure SDK at runtime. They are not permanent storage, and will get cleaned up when a role recycles, thus they are ideal for temporary operations such as the one you describe. Although you should still try to clean up the files after each operation to make sure you don't fill up the space.
Full information on Local Storage Resources is here: http://msdn.microsoft.com/en-us/library/windowsazure/ee758708.aspx
A table detailing the amount of disk space available for Local Storage Resources on each instance size is here: http://msdn.microsoft.com/en-us/library/windowsazure/ee814754.aspx
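As a sketch of how a Local Storage Resource is used at runtime with the classic Azure SDK (Microsoft.WindowsAzure.ServiceRuntime); "PdfScratch" is a placeholder name that must match the LocalStorage entry in ServiceDefinition.csdef:

```csharp
// Write a generated PDF into the role's local scratch area before attaching it.
using System.IO;
using Microsoft.WindowsAzure.ServiceRuntime;

public static class PdfScratchSpace
{
    public static string WriteTempPdf(string fileName, byte[] pdfBytes)
    {
        LocalResource scratch = RoleEnvironment.GetLocalResource("PdfScratch");
        string path = Path.Combine(scratch.RootPath, fileName);

        File.WriteAllBytes(path, pdfBytes);
        return path; // attach this file to the e-mail, then delete it afterwards
    }
}
```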
You could use a different pattern: put the PDFs in blob storage and place a queue message with the e-mail address and the list of PDFs to send. Have a separate worker role build and send the e-mail; an Extra Small or Small instance would do. Since this is asynchronous communication, you could start with just one instance, and if it can't keep up, spin up a second one via the config file (i.e. no redeployment). This also has the added benefit of giving your solution more aggregate bandwidth.
If the traffic isn't very heavy, you could instead spin up a separate thread (or process) that does the same thing.
Pat
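For illustration, the queue handoff described above might look like this today; a sketch using the current Azure.Storage.Queues SDK (which postdates the original answer), with the queue name and message shape as assumptions:

```csharp
// Enqueue an "e-mail job" for a background worker to pick up.
using System.Text.Json;
using System.Threading.Tasks;
using Azure.Storage.Queues;

public record EmailJob(string ToAddress, string[] PdfBlobNames);

public static class EmailQueue
{
    public static async Task EnqueueAsync(string connectionString, EmailJob job)
    {
        var queue = new QueueClient(connectionString, "pdf-email-jobs",
            new QueueClientOptions { MessageEncoding = QueueMessageEncoding.Base64 });
        await queue.CreateIfNotExistsAsync();

        // The worker dequeues this message, downloads the blobs, and sends the e-mail.
        await queue.SendMessageAsync(JsonSerializer.Serialize(job));
    }
}
```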
