I'm attempting to migrate a fairly complex application to Windows Azure. In both the worker role and the web role there are many places where the application saves files to a local file system.
Here's an example:
string thumbnailFileName = System.IO.Path.GetDirectoryName(fileName) + "\\" + "bthumb_" + System.IO.Path.GetFileName(fileName);
thumbnail.Save(thumbnailFileName);
and another example:
using (System.IO.StreamWriter file = System.IO.File.AppendText(GetCurrentLogFilePath()))
{
    string logEntry = String.Format("\r\n{0} - {1}: {2}", DateTime.Now.ToString("yyyy.MM.dd#HH.mm.ss"), type.ToString(), message);
    file.Write(logEntry);
    file.Close();
}
In these examples we are saving images and log files to file locations specified in the app.config. Here's an example:
<add key="ImageFileDirectory" value="C:\temp\foo\root\auth\inventorypictures"/>
I'd like to make as few code changes as possible to support Azure blob storage in case we ever decide to move back to a more traditional hosting environment and more generally to reduce the potential for creating unintended problems.
Based on this post I've decided that Azure Drive is not the best way to go.
Can someone guide me in the right direction (ideally with an example)? The best solution in my mind would be one that only requires a change to my config file. But I'm guessing that is not realistic.
Indeed, you want to use Azure Blob storage to save your files.
As for your coding question, consider creating an interface, call it IFileStore:
public interface IFileStore
{
    void Save(string filePath, byte[] contents);
    byte[] Read(string filePath);
}
Then you create 2 provider classes, one for the file system, and one for Azure Blob storage.
The file system provider can implement the save function like this:
public void Save(string filePath, byte[] content)
{
    File.WriteAllBytes(filePath, content);
}

public byte[] Read(string filePath)
{
    return File.ReadAllBytes(filePath);
}
As for the Azure Blob provider, you will have to derive the storage path based on the filePath passed in to you.
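For example, the blob provider might look like the sketch below. This is a minimal sketch, assuming the Azure.Storage.Blobs client library; the container name, connection string, and the path-to-blob-name mapping are illustrative assumptions, not part of the original answer.

using System;
using Azure.Storage.Blobs;

// Hypothetical blob-backed implementation of IFileStore.
// Container name, connection string, and path mapping are illustrative.
public class BlobFileStore : IFileStore
{
    private readonly BlobContainerClient _container;

    public BlobFileStore(string connectionString, string containerName)
    {
        _container = new BlobContainerClient(connectionString, containerName);
        _container.CreateIfNotExists();
    }

    public void Save(string filePath, byte[] contents)
    {
        // Map the local-style path to a blob name; a real implementation would
        // probably strip the configured root directory from the front first.
        string blobName = filePath.Replace('\\', '/').TrimStart('/');
        _container.GetBlobClient(blobName).Upload(new BinaryData(contents), overwrite: true);
    }

    public byte[] Read(string filePath)
    {
        string blobName = filePath.Replace('\\', '/').TrimStart('/');
        return _container.GetBlobClient(blobName).DownloadContent().Value.Content.ToArray();
    }
}

You could then pick the provider from a single appSettings switch (say, a hypothetical FileStoreProvider key), so moving between the local disk and blob storage really does come down to a config change, which is close to what the question asks for.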
Generally
For your storage, I'd recommend using Blob and Table storage - this allows multiple instances to access the storage simultaneously. If you want to make the code portable, then I'd recommend abstracting your storage access behind interfaces/APIs (see Philpp's IFileStore answer above).
For example:
for your log file example, Table storage might be the best fit (a sketch follows below)
for your image files, Blob storage might be the best fit
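If you go the Table storage route for the logs, here is a minimal sketch, assuming the Azure.Data.Tables package; the table name, partitioning scheme, and property names are assumptions for illustration:

using System;
using Azure.Data.Tables;

// Connection string and table name are placeholders for illustration.
var table = new TableClient("<storage-connection-string>", "ApplicationLogs");
table.CreateIfNotExists();

// One entity per log entry; partitioning by day keeps queries for a given date cheap.
var entry = new TableEntity(
    partitionKey: DateTime.UtcNow.ToString("yyyy.MM.dd"),
    rowKey: Guid.NewGuid().ToString())
{
    { "Type", "INFO" },
    { "Message", "message goes here" },
    { "LoggedAt", DateTime.UtcNow }
};
table.AddEntity(entry);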
If you really want to use AzureDrive
I'd only recommend using AzureDrive if you are only ever going to deploy a single instance of your role - otherwise you will end up fighting problems with sharing files across multiple instances (and remember that only one instance can mount the drive with write access at any one time).
If you are operating with a single instance, and you are only storing temp and log files, then you could also look at using local storage instead of Azure Drive - it's much simpler and cheaper to use than blob storage. For your log file example, one specialist alternative is to write to local storage and have Azure Diagnostics upload that local storage to Blob storage for you, as sketched below.
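A minimal sketch of the local storage approach, assuming a local resource named "LogStorage" has been declared in ServiceDefinition.csdef (the resource name, file name, and log values are illustrative):

using System;
using System.IO;
using Microsoft.WindowsAzure.ServiceRuntime;

public static class LocalLogWriter
{
    // Appends a log entry to a file inside the role's local storage resource.
    // "LogStorage" must be declared as <LocalStorage name="LogStorage" ... /> in
    // ServiceDefinition.csdef - the name here is an assumption for illustration.
    public static void Write(string type, string message)
    {
        LocalResource logStorage = RoleEnvironment.GetLocalResource("LogStorage");
        string logFilePath = Path.Combine(logStorage.RootPath, "app.log");
        string logEntry = String.Format("\r\n{0} - {1}: {2}",
            DateTime.Now.ToString("yyyy.MM.dd#HH.mm.ss"), type, message);
        File.AppendAllText(logFilePath, logEntry);
    }
}

Azure Diagnostics can then be configured to periodically ship the contents of that directory to Blob storage.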
Related
I was looking for resources on how to create a simple background service using C# that checks a specific folder for FLAC files and sends them to a GCP bucket; once a file is uploaded successfully it is deleted or moved to another folder. Where can I find something to read about this kind of thing?
To move a file to another location using C# you can use the File.Move method. It moves an existing file to a new location, with the same or a different file name, and takes two parameters: the source path and the destination path. Because the original file is removed from its source location, File.Move is also the method you use to rename a file.
Example:
try
{
    File.Move(sourceFile, destinationFile);
}
catch (IOException iox)
{
    Console.WriteLine(iox.Message);
}
If you need more examples about File.Move method please follow this link
Adding to that, you can use the Directory.GetFiles method to select the file extension, like in the example below.
This is the original thread where the example was posted
Example:
//Assume user types .txt into textbox
string fileExtension = "*" + textbox1.Text;
string[] txtFiles = Directory.GetFiles("Source Path", fileExtension);
foreach (var item in txtFiles)
{
    File.Move(item, Path.Combine("Destination Directory", Path.GetFileName(item)));
}
If you want to know more about Directory.GetFiles method follow this link
And concerning GCP, using Cloud Storage Transfer Service you can move or back up data to a Cloud Storage bucket, either from other cloud storage providers or from your on-premises storage. Storage Transfer Service provides options that make data transfers and synchronization easier. For example, you can:
Schedule one-time transfer operations or recurring transfer operations.
Delete existing objects in the destination bucket if they do not have a corresponding object in the source.
Delete data source objects after transferring them.
Schedule periodic synchronization from a data source to a data sink with advanced filters based on file creation dates, file-names, and the times of day you prefer to import data.
If you want to know more about GCP Cloud Storage Transfer Service follow this link
If you want to know more about how to create storage buckets follow this link
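Putting the pieces together for the original question (watch a folder, upload FLAC files to a bucket, then move them out of the folder), here is a minimal sketch using the Google.Cloud.Storage.V1 client library; the bucket name, folder paths, and the choice of polling with Directory.GetFiles are assumptions for illustration:

using System;
using System.IO;
using Google.Cloud.Storage.V1;

class FlacUploader
{
    // All paths and the bucket name below are placeholders for illustration.
    const string WatchFolder = @"C:\music\incoming";
    const string DoneFolder = @"C:\music\uploaded";
    const string BucketName = "my-flac-bucket";

    static void Main()
    {
        // Uses Application Default Credentials; see the GCP docs for other authentication options.
        var storage = StorageClient.Create();

        foreach (var path in Directory.GetFiles(WatchFolder, "*.flac"))
        {
            using (var stream = File.OpenRead(path))
            {
                // Upload the file to the bucket, keyed by its file name.
                storage.UploadObject(BucketName, Path.GetFileName(path), "audio/flac", stream);
            }

            // Only reached if the upload did not throw - move the file out of the watch folder.
            File.Move(path, Path.Combine(DoneFolder, Path.GetFileName(path)));
        }
    }
}

Run this on a timer (or swap in a FileSystemWatcher) inside a Windows Service or a .NET Worker Service to get the background-service behaviour described in the question.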
I'm trying to figure out how to backup videos produced by Azure Media Services.
Where are the assets and streaming locators stored, how to backup them or recreate them for existing binary files stored in the Azure Media Service's blob storage?
Proposed solution:
I've come up with a solution: once the video is processed by the transformation job, the app will create a copy of the container in a separate backup blob storage account.
Since, from my understanding, the data produced by transformation jobs is immutable, I don't have to manage any ongoing synchronization.
if (job.State == JobState.Finished)
{
    StreamingLocator locator = await AzureMediaServicesService.CreateStreamingLocatorAsync(client, azureMediaServicesConfig, outputAssetName, locatorName);
    var videoUrls = await AzureMediaServicesService.GetVideoUrlsAsync(client, azureMediaServicesConfig, locator.Name);
    // backup blobs in created container here
}
Is the binary data stored in blob storage sufficient on its own for restoring the videos successfully? After a restore, will the already existing streaming and download links work properly?
Since I'm passing the asset name when creating locators, I reckon I should back up the asset's data too. Can/should I somehow back up assets and locators? Where are they stored? Is there any better way to back up videos?
I was looking for the answers here:
https://learn.microsoft.com/en-us/azure/media-services/latest/streaming-locators-concept
https://learn.microsoft.com/en-us/azure/media-services/latest/stream-files-tutorial-with-api#get-a-streaming-locator
https://learn.microsoft.com/en-us/azure/media-services/latest/limits-quotas-constraints
Part of what you're asking is 'What is an asset in Media Services?'. The Storage container that is created as part of the encoding process is definitely a good portion of what you need to backup. Technically that is all you need to recreate an asset from the backup Storage account. Well, if you don't mind recreating the other aspects of the asset.
An asset is/can be several things:
The Storage container and the contents of that container. These would include the MP4 video files, the manifests (.ism and .ismc), and metadata XML files.
The published locator or URL where clients make GET requests to the streaming endpoint.
Metadata. This includes things like the asset name, creation date, description, etc.
If you keep track of the Storage container in your backup and the metadata associated with it, and you have a way of updating your site with a new streaming locator, then all you really need is the Storage container to recreate the asset.
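A minimal sketch of the "copy the asset's container to a backup account" step, assuming the Azure.Storage.Blobs package; the connection strings and container names are illustrative, and cross-account copies generally need the source blob to be readable, hence the short-lived read SAS:

using System;
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using Azure.Storage.Sas;

public static class AssetBackup
{
    // Copies every blob from the asset's container into a container in the backup account.
    // Connection strings and container names are placeholders for illustration.
    public static async Task BackupAssetContainerAsync(
        string sourceConnectionString, string sourceContainerName,
        string backupConnectionString, string backupContainerName)
    {
        var source = new BlobContainerClient(sourceConnectionString, sourceContainerName);
        var backup = new BlobContainerClient(backupConnectionString, backupContainerName);
        await backup.CreateIfNotExistsAsync();

        await foreach (var blobItem in source.GetBlobsAsync())
        {
            var sourceBlob = source.GetBlobClient(blobItem.Name);

            // Generate a short-lived read SAS so the copy service can read the source blob.
            Uri sourceUri = sourceBlob.GenerateSasUri(BlobSasPermissions.Read, DateTimeOffset.UtcNow.AddHours(1));

            // Server-side copy; it completes asynchronously on the storage service.
            await backup.GetBlobClient(blobItem.Name).StartCopyFromUriAsync(sourceUri);
        }
    }
}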
I am working on an application where file uploads happen often, and can be pretty large in size.
Those files are being uploaded to a Web API, which will then get the Stream from the request, and pass it on to my storage service, that then uploads it to Azure Blob Storage.
I need to make sure that:
No temp files are written on the Web API instance
The request stream is not fully read into memory before passing it on to the storage service (to prevent OutOfMemoryExceptions).
I've looked at this article, which describes how to disable input stream buffering, but because many file uploads from many different users happen simultaneously, it's important that it actually does what it says on the tin.
This is what I have in my controller at the moment:
if (this.Request.Content.IsMimeMultipartContent())
{
    var provider = new MultipartMemoryStreamProvider();
    await this.Request.Content.ReadAsMultipartAsync(provider);
    var fileContent = provider.Contents.SingleOrDefault();
    if (fileContent == null)
    {
        throw new ArgumentException("No filename.");
    }
    var fileName = fileContent.Headers.ContentDisposition.FileName.Replace("\"", string.Empty);
    // I need to make sure this stream is ready to be processed by
    // the Azure client lib, but not buffered fully, to prevent OoM.
    var stream = await fileContent.ReadAsStreamAsync();
}
I don't know how I can reliably test this.
EDIT: I forgot to mention that uploading directly to Blob Storage (circumventing my API) won't work, as I am doing some size checking (e.g. can this user upload 500mb? Has this user used his quota?).
Solved it, with the help of this Gist.
Here's how I am using it, along with a clever "hack" to get the actual file size, without copying the file into memory first. Oh, and it's twice as fast (obviously).
// Create an instance of our provider.
// See https://gist.github.com/JamesRandall/11088079#file-blobstoragemultipartstreamprovider-cs for implementation.
var provider = new BlobStorageMultipartStreamProvider ();
// This is where the uploading is happening, by writing to the Azure stream
// as the file stream from the request is being read, leaving almost no memory footprint.
await this.Request.Content.ReadAsMultipartAsync(provider);
// We want to know the exact size of the file, but this info is not available to us before
// we've uploaded everything - which has just happened.
// We get the stream from the content (and that stream is the same instance we wrote to).
var stream = await provider.Contents.First().ReadAsStreamAsync();
// Problem: If you try to use stream.Length, you'll get an exception, because BlobWriteStream
// does not support it.
// But this is where we get fancy.
// Position == size, because the file has just been written to it, leaving the
// position at the end of the file.
var sizeInBytes = stream.Position;
Voilà, you've got your uploaded file's size, without having to copy the file into your web instance's memory.
As for getting the file length before the file is uploaded, that's not as easy, and I had to resort to some rather unpleasant methods to get just an approximation.
In the BlobStorageMultipartStreamProvider:
var approxSize = parent.Headers.ContentLength.Value - parent.Headers.ToString().Length;
This gives me a pretty close file size, off by a few hundred bytes (depends on the HTTP header I guess). This is good enough for me, as my quota enforcement can accept a few bytes being shaved off.
Just for showing off, here's the memory footprint, reported by the insanely accurate and advanced Performance Tab in Task Manager.
Before - using MemoryStream, reading it into memory before uploading
After - writing directly to Blob Storage
I think a better approach is for you to go directly to Azure Blob Storage from your client. By leveraging the CORS support in Azure Storage you eliminate load on your Web API server resulting in better overall scale for your application.
Basically, you will create a Shared Access Signature (SAS) URL that your client can use to upload the file directly to Azure storage. For security reasons, it is recommended that you limit the time period for which the SAS is valid. Best practices guidance for generating the SAS URL is available here.
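A minimal sketch of issuing a short-lived, write-only SAS URL for a single blob, assuming the Azure.Storage.Blobs package (the connection string, container, and blob names are illustrative):

using System;
using Azure.Storage.Blobs;
using Azure.Storage.Sas;

// Placeholders for illustration; a client built from an account-key connection
// string is able to sign SAS tokens.
var container = new BlobContainerClient("<storage-connection-string>", "uploads");
BlobClient blob = container.GetBlobClient("user-123/video.mp4");

// Allow the caller to create/write this one blob for the next 15 minutes only.
Uri sasUrl = blob.GenerateSasUri(
    BlobSasPermissions.Create | BlobSasPermissions.Write,
    DateTimeOffset.UtcNow.AddMinutes(15));

// Hand sasUrl to the browser/client; it can PUT the file straight to Blob Storage.

Your API can still run the quota and size checks before it hands out the SAS, which addresses the size-checking concern from the question's edit.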
For your specific scenario check out this blog from the Azure Storage team where they discuss using CORS and SAS for this exact scenario. There is also a sample application so this should give you everything you need.
Suppose I have something like this:
string TheFile = HttpRuntime.AppDomainAppPath + "\\SomeDir\\" + TheFilename + ".js";
System.IO.File.WriteAllText(TheFile , SomeText);
This works on my local machine: the file is created and visible in the file system and in the solution explorer. If I deploy on Azure, is the file going to be written only on the instance that's running this code or will it be written and available on all other instances?
Unless you write to Azure Drive or some equivalent thereof, the change will of course be limited to the instance filesystem. The instance filesystem changes will be lost if the VM crashes, and in some other cases as well, so whatever you need to preserve should be stored in durable storage such as Azure Blob Storage.
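For the WriteAllText example above, here is a minimal sketch of writing the same content to Blob Storage instead of the instance filesystem, assuming the Azure.Storage.Blobs package (the connection string and container name are illustrative):

using System;
using Azure.Storage.Blobs;

// Placeholders for illustration: connection string and container name; the blob
// name reuses TheFilename from the question.
var container = new BlobContainerClient("<storage-connection-string>", "somedir");
container.CreateIfNotExists();

// Equivalent of File.WriteAllText: overwrite the blob with the text content.
container.GetBlobClient(TheFilename + ".js").Upload(BinaryData.FromString(SomeText), overwrite: true);

Every instance then sees the same file, and it survives instance restarts and redeployments.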
I have my project files on my Dropbox folder so I can play around with my files at the office as well.
My project contains an EmbeddableDocumentStore with UseEmbeddedHttpServer set to true.
const int ravenPort = 8181;
NonAdminHttp.EnsureCanListenToWhenInNonAdminContext(ravenPort);
var ds = new EmbeddableDocumentStore {
    DataDirectory = "Data",
    UseEmbeddedHttpServer = true,
    Configuration = { Port = ravenPort }
};
Now, when I started my project on my office PC today, I saw this message: Could not open transactional storage: D:\Dropbox\...\Data
Since it's early in my development stage, I deleted the data folder on my Dropbox and the project started flawlessly. Now that I'm back at home I've run into the same issue! I don't want to end up deleting this folder every time, of course.
Can't I store my development data on my Dropbox? Should I bypass something to get this to work?
Set a data directory to a physical disk volume on your local computer. You will not be able to use any sort of mapped drive, network share, UNC path, dropbox or skydrive as a data directory. Just because you have a drive letter does not mean you have a physical disk.
The only types of non-physical storage that even make sense is a LUN attached from a SAN over iSCSI or FibreChannel, or an attached VHD in a virtualized or cloud environment. They will all present as physical disks to the OS.
This would be the case for just about ANY data access environment. Try it with SQL Server if you don't believe me. In RavenDB's case, it is using ESENT as its data store, which requires direct access to the filesystem.
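For example, pointing the store at a plain local folder (the path below is an illustrative assumption) avoids the problem:

// Keep the ESENT data files on a real local disk, outside any synced folder.
var ds = new EmbeddableDocumentStore {
    DataDirectory = @"C:\RavenData",
    UseEmbeddedHttpServer = true,
    Configuration = { Port = ravenPort }
};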
Update
To clarify, even if you are storing on a physical disk, you can't rely on any type of synchronization technology like DropBox or SkyDrive. Why? Because they will be taking a shared read lock on the files to watch for changes. Technologies like ESENT (which RavenDB is based upon) require an exclusive lock to the file.
Other technologies like SQL Server and Windows Virtual Machine also take exclusive locks on their data stores. Why? Because they are constantly reading and writing bits of data in a random-access manner to the file. Would you really want DropBox to be trying to perform a sync operation for every little bit of data that changes? It would be very inefficient and problematic.
Applications that use shared locks don't have this problem. For example, when you work on an MS Word document, it is all being done in memory. When you save the file, DropBox can read the entire file and sync it to the cloud. It can optimize by sending only the bits that have changed, but it still needs to be able to read the file to do so.
So if DropBox has a shared read lock on the ESENT file, then when RavenDB tries to open it exclusively, it gets an error and raises the exception you are seeing.