I'm trying to figure out how to back up videos produced by Azure Media Services.
Where are the assets and streaming locators stored, and how can I back them up or recreate them for existing binary files stored in the Media Services account's blob storage?
Proposed solution:
I've come up with a solution: once the video is processed by the transformation job, the app creates a copy of the output container in a separate backup blob storage account.
Since, from my understanding, the data produced by transformation jobs is immutable, I don't have to manage any further synchronization.
if (job.State == JobState.Finished)
{
    // The job is done: publish the output asset and resolve its streaming/download URLs.
    StreamingLocator locator = await AzureMediaServicesService.CreateStreamingLocatorAsync(client, azureMediaServicesConfig, outputAssetName, locatorName);
    var videoUrls = await AzureMediaServicesService.GetVideoUrlsAsync(client, azureMediaServicesConfig, locator.Name);

    // back up blobs in the created container here
}
Is the binary data stored in blob storage alone sufficient to restore the videos successfully? After a restore, will the already existing streaming and download links keep working?
Since I'm passing the asset name when creating locators, I reckon I should back up the asset's data too. Can/should I somehow back up assets and locators? Where are they stored? Is there a better way to back up videos?
I was looking for the answers here:
https://learn.microsoft.com/en-us/azure/media-services/latest/streaming-locators-concept
https://learn.microsoft.com/en-us/azure/media-services/latest/stream-files-tutorial-with-api#get-a-streaming-locator
https://learn.microsoft.com/en-us/azure/media-services/latest/limits-quotas-constraints
Part of what you're asking is "What is an asset in Media Services?". The storage container that is created as part of the encoding process is definitely a good portion of what you need to back up. Technically, that is all you need to recreate an asset from the backup storage account, provided you don't mind recreating the other aspects of the asset.
An asset is/can be several things:
The Storage container and the contents of that container. These would include the MP4 video files, the manifests (.ism and .ismc), and metadata XML files.
The published locator or URL where clients make GET requests to the streaming endpoint.
Metadata. This includes things like the asset name, creation date, description, etc.
If you keep track of the Storage container in your backup and the metadata associated with it, and you have a way of updating your site with a new streaming locator, then all you really need is the Storage container to recreate the asset.
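As a rough sketch of the backup step behind the question's "// back up blobs in the created container here" placeholder, assuming the Azure.Storage.Blobs SDK: the snippet below copies every blob of the asset's container to a container with the same name in a second storage account. The connection strings and the helper name are hypothetical, and for a private source container the copy source would need a SAS URI.
using System.Threading.Tasks;
using Azure.Storage.Blobs;

// Hypothetical helper: copy all blobs of an asset container into a backup storage account.
public static async Task BackupAssetContainerAsync(
    string sourceConnectionString, string backupConnectionString, string containerName)
{
    var sourceContainer = new BlobContainerClient(sourceConnectionString, containerName);
    var backupContainer = new BlobContainerClient(backupConnectionString, containerName);
    await backupContainer.CreateIfNotExistsAsync();

    await foreach (var blobItem in sourceContainer.GetBlobsAsync())
    {
        var sourceBlob = sourceContainer.GetBlobClient(blobItem.Name);
        var targetBlob = backupContainer.GetBlobClient(blobItem.Name);

        // Server-side copy; swap sourceBlob.Uri for a SAS URI if the source container is private.
        await targetBlob.StartCopyFromUriAsync(sourceBlob.Uri);
    }
}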
Related
I am uploading files to Cloud Storage using the .NET client.
At the moment I am uploading files one by one, like:
StorageClient client = StorageClient.Create();
foreach (var file in files)
{
    client.UploadObject(bucketName, uploadLocation, contentType, file);
}
But I couldn't find any way to bulk upload files. Is there any way to upload files in bulk?
You are effectively bulk uploading; you're just uploading each file serially 😃
If you're looking for a method that you give files to and it handles the entire upload, I'm unsure that this exists.
You can run the uploads in parallel using threads or equivalent.
You'll want to ensure that you can resume failed uploads (uploading multiple files increases the likelihood of failure); see Create Object Uploader.
You'll need to self-manage resuming on failures.
It's possible that there are libraries that implement this abstraction.
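To illustrate the "parallel uploads plus self-managed retry" idea, here is a rough sketch assuming the Google.Cloud.Storage.V1 StorageClient; the folder path, bucket name, content type and retry count are placeholders, not part of the original question:
using System;
using System.IO;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Google.Cloud.Storage.V1;

var client = await StorageClient.CreateAsync();
var files = Directory.GetFiles(@"C:\upload");   // hypothetical local folder
var throttle = new SemaphoreSlim(8);            // cap the number of concurrent uploads

var uploads = files.Select(async path =>
{
    await throttle.WaitAsync();
    try
    {
        // Self-managed "resume": retry the whole object a few times if an upload fails.
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                using var stream = File.OpenRead(path);
                await client.UploadObjectAsync("my-bucket", Path.GetFileName(path),
                    "application/octet-stream", stream);
                break;
            }
            catch (Exception) when (attempt < 3)
            {
                await Task.Delay(TimeSpan.FromSeconds(attempt));   // simple backoff before retrying
            }
        }
    }
    finally
    {
        throttle.Release();
    }
});

await Task.WhenAll(uploads);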
I understand that you are searching for a way to upload a large number of files at once to Google Cloud Storage. However, there is no direct method that handles the entire upload for you. If you have a large number of files to upload, you can perform a parallel, multi-threaded/multi-processed copy. The following steps are to be followed (a sketch putting them together is shown below):
1. Instantiate a StorageClient object.
2. Specify the parallelization options.
3. Get a list of filenames from the upload folder and store the names of the files.
4. Get a list of the files already in the cloud.
5. Use Parallel.ForEach.
6. Call the UploadObject method from the Google Cloud Storage client library.
You can also refer to this article for more details on the above methods.
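A minimal sketch of those six steps, again assuming the Google.Cloud.Storage.V1 client; the folder path and bucket name are placeholders:
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;
using Google.Cloud.Storage.V1;

var client = StorageClient.Create();                                  // 1. instantiate the client
var options = new ParallelOptions { MaxDegreeOfParallelism = 8 };     // 2. parallelization options
var localFiles = Directory.GetFiles(@"C:\upload");                    // 3. filenames in the upload folder
var existing = new HashSet<string>(                                   // 4. objects already in the bucket
    client.ListObjects("my-bucket").Select(o => o.Name));

Parallel.ForEach(localFiles, options, path =>                         // 5. Parallel.ForEach
{
    var objectName = Path.GetFileName(path);
    if (existing.Contains(objectName))
        return;                                                       // skip files already uploaded

    using var stream = File.OpenRead(path);
    client.UploadObject("my-bucket", objectName, "application/octet-stream", stream);   // 6. UploadObject
});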
I was looking for resources on how to create a simple background service in C# that checks a specific folder for FLAC files and sends them to a GCP bucket; once a file is uploaded successfully, it is deleted or moved to another folder. Where can I find something to read about this kind of thing?
To move a file to another location in C# you can use the File.Move method. It moves an existing file to a new location with the same or a different file name, which also makes it the method used to rename files. It takes two parameters, the source path and the destination path, and it removes the original file once the move succeeds.
Example:
try
{
File.Move(sourceFile, destinationFile);
}
catch (IOException iox)
{
Console.WriteLine(iox.Message);
}
If you need more examples of the File.Move method, please follow this link.
Adding to that, you can use the Directory.GetFiles method to filter by file extension, as in the example below.
This is the original thread where the example was posted
Example:
//Assume user types .txt into textbox
string fileExtension = "*" + textbox1.Text;
string[] txtFiles = Directory.GetFiles("Source Path", fileExtension);
foreach (var item in txtFiles)
{
File.Move(item, Path.Combine("Destination Directory", Path.GetFileName(item)));
}
If you want to know more about Directory.GetFiles method follow this link
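Putting the pieces together for the original question (watch a folder for FLAC files, upload each one to a GCP bucket, then move it away), a hosted BackgroundService could look roughly like the sketch below. The folder paths, bucket name and polling interval are placeholders, and error handling is kept minimal:
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using Google.Cloud.Storage.V1;
using Microsoft.Extensions.Hosting;

public class FlacUploadService : BackgroundService
{
    private const string WatchFolder = @"C:\music\incoming";   // hypothetical paths
    private const string DoneFolder = @"C:\music\uploaded";
    private const string BucketName = "my-flac-bucket";

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        var client = await StorageClient.CreateAsync();
        Directory.CreateDirectory(DoneFolder);

        while (!stoppingToken.IsCancellationRequested)
        {
            foreach (var path in Directory.GetFiles(WatchFolder, "*.flac"))
            {
                using (var stream = File.OpenRead(path))
                {
                    await client.UploadObjectAsync(BucketName, Path.GetFileName(path),
                        "audio/flac", stream, cancellationToken: stoppingToken);
                }

                // Only move the file once the upload completed without throwing.
                File.Move(path, Path.Combine(DoneFolder, Path.GetFileName(path)));
            }

            await Task.Delay(TimeSpan.FromSeconds(30), stoppingToken);   // simple polling loop
        }
    }
}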
And concerning GCP: using the Cloud Storage Transfer Service you can move or back up data to a Cloud Storage bucket either from other cloud storage providers or from your on-premises storage. Storage Transfer Service provides options that make data transfers and synchronization easier. For example, you can:
Schedule one-time or recurring transfer operations.
Delete existing objects in the destination bucket if they do not have a corresponding object in the source.
Delete data source objects after transferring them.
Schedule periodic synchronization from a data source to a data sink, with advanced filters based on file creation dates, file names, and the times of day you prefer to import data.
If you want to know more about GCP Cloud Storage Transfer Service follow this link
If you want to know more about how to create storage buckets follow this link
I want to extract a thumbnail from an MP4 video hosted in Azure Storage. My current method in C# uses the NReco NuGet package, but that works on a local file. How do I extract the thumbnail from a file in Azure Storage?
string mp4inputpath = Server.MapPath("~/testfolder/myvideo.mp4");
string thumbOutputPath = Server.MapPath("~/testfolder/mythumb.jpg");
var ffMpeg = new NReco.VideoConverter.FFMpegConverter();
// Get the thumb at the frame 1 second into the video
ffMpeg.GetVideoThumbnail(mp4inputpath, thumbOutputPath, 1);
That works! But I need to use an Azure Storage file URL for mp4inputpath.
I can download the MP4 file from Azure Storage and save it temporarily in my Azure web app, and I can do that programmatically.
Then extract the thumb, i.e.,
ffMpeg.GetVideoThumbnail(mp4inputpath, thumbOutputPath, 1);
Then delete the temporary MP4 within my app.
This works, but I don't know whether it is advisable to download MP4 files into my Azure web app, or whether it will scale. This is the only solution I have so far.
string mp4Url = @"https://mysorageaccount.blob.core.windows.net/mp4/vacation/summer/dogbarking.mp4";
string thumbOutputPath = Server.MapPath("~/testfolder/mythumb.jpg");
var ffMpeg = new NReco.VideoConverter.FFMpegConverter();
// Get the thumb at the frame 1 second into the video
ffMpeg.GetVideoThumbnail(mp4Url, thumbOutputPath, 1);
This does not seem to work. There is no error, but the thumbOutputPath file is empty.
What you've done is pretty much what you have to do, since you cannot open an object in Azure Storage as you would a local file. So, grabbing the file into a local file or a stream is what you'd need to do.
As far as scaling: that will depend on the size (and number of instances) you're running in your Web App. Just be aware that you should have both your storage account and your Web App in the same region, to reduce latency and avoid egress charges for bandwidth.
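As a sketch of that download-then-extract approach, assuming the Azure.Storage.Blobs SDK (the connection string variable is hypothetical; the container and blob names come from the question's example URL):
using System.IO;
using Azure.Storage.Blobs;
using NReco.VideoConverter;

// storageConnectionString is assumed to exist; a SAS URI passed to new BlobClient(Uri) would also work.
var blobClient = new BlobClient(storageConnectionString, "mp4", "vacation/summer/dogbarking.mp4");

string tempMp4 = Path.GetTempFileName();
string thumbOutputPath = Server.MapPath("~/testfolder/mythumb.jpg");

try
{
    blobClient.DownloadTo(tempMp4);                        // pull the blob down to local temp storage
    var ffMpeg = new FFMpegConverter();
    ffMpeg.GetVideoThumbnail(tempMp4, thumbOutputPath, 1); // grab the frame at 1 second
}
finally
{
    File.Delete(tempMp4);                                  // clean up the temporary MP4
}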
I've just started working with Data Lake and I'm currently trying to figure out the real workflow steps and how to automate the whole process.
Say I have some files as input and I would like to process them and download the output files in order to push them into my data warehouse and/or SSAS.
I've found an absolutely lovely API and it's all good, but I can't find a way to get all the file names in a directory so that I can download them.
Please correct my thoughts regarding the workflow. Is there another, more elegant way to automatically get all the processed data (outputs) into storage (like a conventional SQL Server, SSAS, a data warehouse, etc.)?
If you have a working solution based on Data Lake, please describe the workflow (from "raw" files to reports for end users) in a few words.
Here is my example of a .NET Core application:
using Microsoft.Azure.DataLake.Store;
using Microsoft.IdentityModel.Clients.ActiveDirectory;
using Microsoft.Rest.Azure.Authentication;
// Authenticate with a service principal and create the Data Lake Store client.
var creds = new ClientCredential(ApplicationId, Secret);
var clientCreds = ApplicationTokenProvider.LoginSilentAsync(Tenant, creds).GetAwaiter().GetResult();
var client = AdlsClient.CreateClient("myfirstdatalakeservice.azuredatalakestore.net", clientCreds);
// Get the metadata of a single entry (here the folder itself); owner and group are returned as object IDs.
var result = client.GetDirectoryEntry("/mynewfolder", UserGroupRepresentation.ObjectID);
Say I have some files as an input and I would like to process them and download output files in order to push into my data warehouse or/and SSAS.
If you want to download the files from a folder in Azure Data Lake to a local path, you could use the following code to do that:
client.BulkDownload("/mynewfolder", @"D:\Tom\xx"); // local path
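And to answer the part about getting all the file names in a directory: the same AdlsClient exposes EnumerateDirectory, which you could use roughly like this to list the files before deciding what to download (the folder path is just the one from the example above):
// List every entry under the folder and print the full paths of the files.
foreach (var entry in client.EnumerateDirectory("/mynewfolder"))
{
    if (entry.Type == DirectoryEntryType.FILE)
    {
        Console.WriteLine(entry.FullName);
    }
}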
But based on my understanding, you could also use Azure Data Factory to push your data from Data Lake Store to Azure Blob storage or Azure Files storage.
I have my project files on my Dropbox folder so I can play around with my files at the office as well.
My project contains an EmbeddableDocumentStore with UseEmbeddedHttpServer set to true.
const int ravenPort = 8181;
NonAdminHttp.EnsureCanListenToWhenInNonAdminContext(ravenPort);
var ds = new EmbeddableDocumentStore {
DataDirectory = "Data",
UseEmbeddedHttpServer = true,
Configuration = { Port = ravenPort }
};
Now, today when I started my project on my office PC I saw this message: Could not open transactional storage: D:\Dropbox\...\Data
Since it's early in my development stage I deleted the Data folder in my Dropbox and the project started flawlessly. Now that I'm back at home I've run into the same issue! I don't want to end up deleting this folder every time, of course.
Can't I store my development data in my Dropbox? Should I bypass something to get this to work?
Set the data directory to a physical disk volume on your local computer. You will not be able to use any sort of mapped drive, network share, UNC path, Dropbox or SkyDrive folder as a data directory. Just because you have a drive letter does not mean you have a physical disk.
The only types of non-physical storage that even make sense are a LUN attached from a SAN over iSCSI or FibreChannel, or an attached VHD in a virtualized or cloud environment. They all present as physical disks to the OS.
This would be the case for just about ANY data access environment. Try it with SQL Server if you don't believe me. In RavenDB's case, it is using ESENT as its data store, which requires direct access to the filesystem.
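For example, pointing the embedded store at a plain local folder (the path below is only an illustration) avoids the problem:
var ds = new EmbeddableDocumentStore {
    DataDirectory = @"C:\RavenData",   // a local physical disk, outside any Dropbox/SkyDrive folder
    UseEmbeddedHttpServer = true,
    Configuration = { Port = ravenPort }
};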
Update
To clarify, even if you are storing on a physical disk, you can't rely on any type of synchronization technology like DropBox or SkyDrive. Why? Because they will be taking a shared read lock on the files to watch for changes. Technologies like ESENT (which RavenDB is based upon) require an exclusive lock to the file.
Other technologies like SQL Server and Windows Virtual Machines also take exclusive locks on their data stores. Why? Because they are constantly reading and writing bits of data in a random-access manner to the file. Would you really want DropBox to try to perform a sync operation for every bit of data that changes? It would be very inefficient and problematic.
Applications that use shared locks don't have this problem. For example, when you work on an MS Word document, it is all being done in memory. When you save the file, DropBox can read the entire file and sync it to the cloud. It can optimize by sending only the bits that have changed, but it still needs to be able to read the file to do so.
So if DropBox has a shared read lock on the ESENT file, then when RavenDB tries to open it exclusively, it gets an error and raises the exception you are seeing.