I am making a custom activity in ADF, which involves reading multiple files from Azure Storage Blobs, doing some work on them, and then finally writing a resulting file to the Azure Data Lake Store.
The last step is where I'm stuck, because as far as I can see, the .NET SDK only allows uploading from a local file.
Is there any way to (programmatically) upload a file to the ADL Store when it is not a local file? It could be a blob or a stream. If not, are there any workarounds?
Yes, it's possible to upload from a stream; the trick is to create the file first and then append your stream to it:
// Assumes `credentials`, `filepath`, and `stream` are already available.
string dataLakeAccount = "DLSAccountName";
var adlsFileSystemClient = new DataLakeStoreFileSystemManagementClient(credentials);
// Create the (empty) target file first, overwriting any existing one...
adlsFileSystemClient.FileSystem.Create(dataLakeAccount, filepath, overwrite: true);
// ...then append the contents of your stream to it.
adlsFileSystemClient.FileSystem.Append(dataLakeAccount, filepath, stream);
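For completeness, here is a minimal, self-contained sketch of how those pieces might be wired together. The tenant/service-principal values and the target path are placeholders, and it assumes the Microsoft.Azure.Management.DataLake.Store and Microsoft.Rest.ClientRuntime.Azure.Authentication packages:

using System.IO;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.Management.DataLake.Store;
using Microsoft.Rest.Azure.Authentication;

class AdlsStreamUploadSample
{
    static async Task Main()
    {
        // Placeholders - replace with your own AAD tenant / service principal values.
        var credentials = await ApplicationTokenProvider.LoginSilentAsync(
            "<tenant-id>", "<client-id>", "<client-secret>");

        var adlsFileSystemClient = new DataLakeStoreFileSystemManagementClient(credentials);

        string dataLakeAccount = "DLSAccountName";
        string filepath = "/output/result.csv"; // hypothetical target path in the store

        // Any readable Stream works here - e.g. one coming from a blob, not a local file.
        using (var stream = new MemoryStream(Encoding.UTF8.GetBytes("hello,world")))
        {
            adlsFileSystemClient.FileSystem.Create(dataLakeAccount, filepath, overwrite: true);
            adlsFileSystemClient.FileSystem.Append(dataLakeAccount, filepath, stream);
        }
    }
}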
See also this article.
I am uploading files to cloud storage using the .net client.
At the moment I am uploading files one by one, like this:
StorageClient client = StorageClient.Create();
foreach (var file in files)
{
    client.UploadObject(bucketName, uploadLocation, contentType, file);
}
But I couldn't find any way to bulk upload files. Is there any way to upload files in bulk?
You are effectively bulk uploading; you're just uploading each file serially 😃
If you're looking for a method that you can hand all the files to and have it handle the entire upload, I'm not sure one exists.
You can run the uploads in parallel using threads or equivalent.
You'll want to ensure that you can resume failed uploads (uploading multiple files increases the likelihood of a failure); see Create Object Uploader.
You'll need to self-manage resuming on failures.
It's possible that there are libraries that implement this abstraction.
I understand that you are searching for a way to upload a large number of files at once to Google Cloud Storage. There is no single method that handles the entire upload, but if you have a large number of files you can perform a parallel, multi-threaded/multi-processing copy. The steps are:
1. Instantiate a StorageClient object.
2. Specify the parallelization options.
3. Get a list of file names from your upload folder and store the names of the files.
4. Get a list of the files already in the cloud.
5. Use Parallel.ForEach.
6. Call the UploadObject method from the Google Cloud Storage client library.
You can also refer to this article for more details on the above methods.
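A minimal sketch of those steps, assuming the Google.Cloud.Storage.V1 package; the bucket name, local folder, and content type are placeholders:

using System.IO;
using System.Threading.Tasks;
using Google.Cloud.Storage.V1;

class BulkUploadSample
{
    static void Main()
    {
        var client = StorageClient.Create();
        string bucketName = "my-bucket";      // placeholder
        string uploadFolder = @"C:\upload";   // placeholder local folder

        // Cap the degree of parallelism so we don't open hundreds of connections at once.
        var options = new ParallelOptions { MaxDegreeOfParallelism = 8 };

        Parallel.ForEach(Directory.GetFiles(uploadFolder), options, file =>
        {
            using (var stream = File.OpenRead(file))
            {
                // Use the file name as the object name; add a prefix if you need "folders".
                client.UploadObject(bucketName, Path.GetFileName(file), "application/octet-stream", stream);
            }
        });
    }
}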
I want to extract a thumbnail from an mp4 video hosted in Azure Storage. My current method in C# uses the NReco NuGet package:
string mp4inputpath = Server.MapPath("~/testfolder/myvideo.mp4");
string thumbOutputPath = Server.MapPath("~/testfolder/mythumb.jpg");
var ffMpeg = new NReco.VideoConverter.FFMpegConverter();
// Get the thumb at the frame 1 second into the video
ffMpeg.GetVideoThumbnail(mp4inputpath, thumbOutputPath, 1);
That works! But that is a local file, and I need to use an Azure Storage file URL for mp4inputpath. How do I extract the thumb from an Azure Storage file?
I can download the mp4 file from Azure Storage and save it temporarily into my Azure web app. I can do that programmatically. Then I can extract the thumb, i.e.,
ffMpeg.GetVideoThumbnail(mp4inputpath, thumbOutputPath, 1);
and then delete the temporary mp4 within my app.
This works, but I don't know whether it is advisable to download mp4 files into my Azure web app, or whether it will scale. It is the only solution I have so far.
I have also tried passing the blob URL directly:
string mp4Url = @"https://mysorageaccount.blob.core.windows.net/mp4/vacation/summer/dogbarking.mp4";
string thumbOutputPath = Server.MapPath("~/testfolder/mythumb.jpg");
var ffMpeg = new NReco.VideoConverter.FFMpegConverter();
// Get the thumb at the frame 1 second into the video
ffMpeg.GetVideoThumbnail(mp4Url, thumbOutputPath, 1);
This does not seem to work. There is no error, but the thumbOutputPath file is empty.
What you've done is pretty much what you have to do, since you cannot open an object in Azure Storage as you would a local file. So, grabbing the file into a local file or a stream is what you'd need to do.
As far as scaling goes: that will depend on the size (and number) of the instances you're running in your Web App. Just be aware that you should have both your storage account and your Web App in the same region, to reduce latency and avoid egress charges for bandwidth.
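For illustration, a minimal sketch of that download-then-extract approach, assuming the Azure.Storage.Blobs (v12) package; the connection string, container, and blob names are placeholders:

using System;
using System.IO;
using Azure.Storage.Blobs;

class ThumbnailFromBlobSample
{
    static void Main()
    {
        // Placeholders - point this at your own storage account and blob.
        var blobClient = new BlobClient(
            connectionString: "<storage-connection-string>",
            blobContainerName: "mp4",
            blobName: "vacation/summer/dogbarking.mp4");

        // FFMpegConverter needs a seekable local file, so download to a temp path first.
        string tempMp4 = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName() + ".mp4");
        string thumbOutputPath = Path.Combine(Path.GetTempPath(), "mythumb.jpg");
        try
        {
            blobClient.DownloadTo(tempMp4);

            var ffMpeg = new NReco.VideoConverter.FFMpegConverter();
            ffMpeg.GetVideoThumbnail(tempMp4, thumbOutputPath, 1);
        }
        finally
        {
            // Clean up the temporary video once the thumbnail has been extracted.
            if (File.Exists(tempMp4))
                File.Delete(tempMp4);
        }
    }
}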
I am working on an application where file uploads happen often, and can be pretty large in size.
Those files are being uploaded to a Web API, which will then get the Stream from the request, and pass it on to my storage service, that then uploads it to Azure Blob Storage.
I need to make sure that:
No temp files are written on the Web API instance
The request stream is not fully read into memory before passing it on to the storage service (to prevent OutOfMemoryExceptions).
I've looked at this article, which describes how to disable input stream buffering, but because many file uploads from many different users happen simultaneously, it's important that it actually does what it says on the tin.
This is what I have in my controller at the moment:
if (this.Request.Content.IsMimeMultipartContent())
{
    var provider = new MultipartMemoryStreamProvider();
    await this.Request.Content.ReadAsMultipartAsync(provider);

    var fileContent = provider.Contents.SingleOrDefault();
    if (fileContent == null)
    {
        throw new ArgumentException("No filename.");
    }

    var fileName = fileContent.Headers.ContentDisposition.FileName.Replace("\"", string.Empty);

    // I need to make sure this stream is ready to be processed by
    // the Azure client lib, but not buffered fully, to prevent OoM.
    var stream = await fileContent.ReadAsStreamAsync();
}
I don't know how I can reliably test this.
EDIT: I forgot to mention that uploading directly to Blob Storage (circumventing my API) won't work, as I am doing some size checking (e.g. can this user upload 500 MB? Has this user used up their quota?).
Solved it, with the help of this Gist.
Here's how I am using it, along with a clever "hack" to get the actual file size, without copying the file into memory first. Oh, and it's twice as fast (obviously).
// Create an instance of our provider.
// See https://gist.github.com/JamesRandall/11088079#file-blobstoragemultipartstreamprovider-cs for implementation.
var provider = new BlobStorageMultipartStreamProvider ();
// This is where the uploading is happening, by writing to the Azure stream
// as the file stream from the request is being read, leaving almost no memory footprint.
await this.Request.Content.ReadAsMultipartAsync(provider);
// We want to know the exact size of the file, but this info is not available to us before
// we've uploaded everything - which has just happened.
// We get the stream from the content (and that stream is the same instance we wrote to).
var stream = await provider.Contents.First().ReadAsStreamAsync();
// Problem: If you try to use stream.Length, you'll get an exception, because BlobWriteStream
// does not support it.
// But this is where we get fancy.
// Position == size, because the file has just been written to it, leaving the
// position at the end of the file.
var sizeInBytes = stream.Position;
Voilà, you've got your uploaded file's size without having to copy the file into your web instance's memory.
As for getting the file length before the file is uploaded, that's not as easy, and I had to resort to some rather unpleasant methods in order to get just an approximation.
In the BlobStorageMultipartStreamProvider:
var approxSize = parent.Headers.ContentLength.Value - parent.Headers.ToString().Length;
This gives me a pretty close file size, off by a few hundred bytes (depends on the HTTP header I guess). This is good enough for me, as my quota enforcement can accept a few bytes being shaved off.
Just for showing off, here's the memory footprint, reported by the insanely accurate and advanced Performance Tab in Task Manager.
Before - using MemoryStream, reading it into memory before uploading
After - writing directly to Blob Storage
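For reference, here is a rough sketch of the kind of MultipartStreamProvider the Gist implements (this is not the Gist itself; see the link above for the real implementation). It assumes the classic WindowsAzure.Storage SDK and that you pass in the target container yourself:

using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using Microsoft.WindowsAzure.Storage.Blob;

public class BlobStorageMultipartStreamProvider : MultipartStreamProvider
{
    private readonly CloudBlobContainer _container;

    public BlobStorageMultipartStreamProvider(CloudBlobContainer container)
    {
        _container = container;
    }

    public override Stream GetStream(HttpContent parent, HttpContentHeaders headers)
    {
        // Use the client-supplied file name (falling back to a GUID) as the blob name.
        string fileName = headers.ContentDisposition?.FileName?.Trim('"');
        if (string.IsNullOrEmpty(fileName))
        {
            fileName = Guid.NewGuid().ToString();
        }

        // OpenWrite returns a stream that uploads blocks as they are written,
        // so the request body never has to be buffered in memory or on disk.
        CloudBlockBlob blob = _container.GetBlockBlobReference(fileName);
        return blob.OpenWrite();
    }
}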
I think a better approach is for you to go directly to Azure Blob Storage from your client. By leveraging the CORS support in Azure Storage you eliminate load on your Web API server resulting in better overall scale for your application.
Basically, you will create a Shared Access Signature (SAS) URL that your client can use to upload the file directly to Azure storage. For security reasons, it is recommended that you limit the time period for which the SAS is valid. Best practices guidance for generating the SAS URL is available here.
For your specific scenario check out this blog from the Azure Storage team where they discuss using CORS and SAS for this exact scenario. There is also a sample application so this should give you everything you need.
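For illustration, a minimal sketch of generating a short-lived, write-only SAS URI that a client could use to upload directly, assuming the Azure.Storage.Blobs (v12) package; the account, container, and blob names are placeholders:

using System;
using Azure.Storage.Blobs;
using Azure.Storage.Sas;

class SasSample
{
    static void Main()
    {
        // Placeholder connection string - GenerateSasUri requires the client to be
        // created with account-key credentials (e.g. a connection string).
        var blobClient = new BlobClient(
            connectionString: "<storage-connection-string>",
            blobContainerName: "uploads",
            blobName: Guid.NewGuid() + ".bin");

        // Create/write only, valid for 15 minutes, per the "limit the validity period" guidance.
        Uri sasUri = blobClient.GenerateSasUri(
            BlobSasPermissions.Create | BlobSasPermissions.Write,
            DateTimeOffset.UtcNow.AddMinutes(15));

        // Hand sasUri to the client; it can then PUT the blob directly, bypassing the Web API.
        Console.WriteLine(sasUri);
    }
}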
I want to create an application where the user, after authentication, can upload a file (txt/xml etc.) using Azure Mobile Services, and can afterwards download only the files they uploaded themselves.
I've watched a lot of tutorials (including this one: link), but in those cases they simply insert a row into a database table. I want basically the same thing, just with files. How can I do that?
I'm really new to this, so I'm just guessing, but should I upload the files to Blob Storage, and store a link in the database pointing to that file? I'm searching for the best practice.
Yes, you are correct!
You would be limited in size if you tried to store the text file as a field in the database.
http://azure.microsoft.com/en-us/documentation/articles/mobile-services-windows-store-dotnet-upload-data-blob-storage/
That article shows how to do what you want to do, but with images.
You would want to change the image stream to a text stream here:
// Get the new image as a stream.
using (var fileStream = await media.OpenStreamForReadAsync())
{
...
}
And use the Stream classes instead to open the file stream:
http://msdn.microsoft.com/en-us/library/windows/apps/system.io.stream(v=vs.105).aspx
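For illustration, here is a sketch of what the upload step might look like for a text file instead of an image, assuming you obtain a SAS URI the way the linked tutorial does and use the classic WindowsAzure.Storage client; the method and parameter names here are my own:

using System;
using System.IO;
using System.Threading.Tasks;
using Windows.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

public static class TextFileUploader
{
    // `file` is the StorageFile the user picked (e.g. via FileOpenPicker);
    // `sasBlobUri` is the SAS URI returned by your mobile service, as in the tutorial.
    public static async Task UploadTextFileAsync(StorageFile file, string sasBlobUri)
    {
        // Get the new text file as a stream (instead of the image stream in the tutorial).
        using (var fileStream = await file.OpenStreamForReadAsync())
        {
            var blob = new CloudBlockBlob(new Uri(sasBlobUri));
            await blob.UploadFromStreamAsync(fileStream);
        }
    }
}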
I have a large number of files (approx. 80,000) stored in a SQL Server database as BLOBs. I now have a situation where I need to export all those files to IBM FileNet.
For that, I think I first need to stream the existing BLOB data to the file system, and then use FileNet to upload those files to the FileNet server.
Please help me write a C# utility that will convert that huge amount of BLOB data into the corresponding files.
You can manage FILESTREAM data by using the Win32 API; see Managing FILESTREAM Data by Using Win32 API.
That link contains C# code that loads the BLOB into a variable in your C# code. You can then save it with the path, file name and extension derived from the DB as well. Here is a small quote of the code:
//Read the data from the FILESTREAM
//BLOB.
sqlFileStream.Seek(0L, SeekOrigin.Begin);
numBytes = sqlFileStream.Read(buffer, 0, buffer.Length);
string readData = unicode.GetString(buffer);
if (numBytes != 0)
Console.WriteLine(readData);
See also Using FILESTREAM Storage in Client Applications.
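For what it's worth, if the documents are stored in an ordinary varbinary(max) column rather than FILESTREAM, a minimal sketch of such an export utility could look like the following; the connection string, table, and column names are placeholders:

using System.Data;
using System.Data.SqlClient;
using System.IO;

class BlobExporter
{
    static void Main()
    {
        // Placeholders - adjust to your own schema.
        const string connectionString = "<connection-string>";
        const string query = "SELECT FileName, FileData FROM dbo.Documents";
        const string outputFolder = @"C:\export";

        Directory.CreateDirectory(outputFolder);

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(query, connection))
        {
            connection.Open();

            // SequentialAccess streams each BLOB instead of loading it fully into memory,
            // which matters when exporting ~80,000 potentially large files.
            using (var reader = command.ExecuteReader(CommandBehavior.SequentialAccess))
            {
                while (reader.Read())
                {
                    string fileName = reader.GetString(0);
                    string targetPath = Path.Combine(outputFolder, fileName);

                    using (Stream blobStream = reader.GetStream(1))
                    using (FileStream fileStream = File.Create(targetPath))
                    {
                        blobStream.CopyTo(fileStream);
                    }
                }
            }
        }
    }
}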