I am uploading files to cloud storage using the .NET client.
At the moment I am uploading the files one by one, like this:
StorageClient client = StorageClient.Create();
foreach (var file in files)
{
client.UploadObject(bucketName, uploadLocation, contentType, file);
}
But I couldn't find any way to bulk upload files. Is there a way to upload files in bulk?
You are effectively bulk uploading; you're just uploading each file serially 😃
If you're looking for a method that you give files to and it handles the entire upload, I'm unsure that this exists.
You can run the uploads in parallel using threads or equivalent.
You'll want to ensure that you can resume failed uploads (uploading multiple files increases the likelihood of failure), see Create Object Uploader
You'll need to self-manage resuming on failures.
It's possible that there are libraries that implement this abstraction.
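For illustration, a minimal sketch of running the uploads concurrently with a cap on parallelism (the bucket name, content type and concurrency limit are placeholders, and this does not handle resuming failed uploads):
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Google.Cloud.Storage.V1;

async Task UploadAllAsync(IEnumerable<string> files, string bucketName, string contentType)
{
    StorageClient client = await StorageClient.CreateAsync();
    var throttle = new SemaphoreSlim(4); // arbitrary limit on simultaneous uploads

    var uploads = files.Select(async path =>
    {
        await throttle.WaitAsync();
        try
        {
            using (var stream = File.OpenRead(path))
            {
                // Object name derived from the local file name; adjust to your layout.
                await client.UploadObjectAsync(bucketName, Path.GetFileName(path), contentType, stream);
            }
        }
        finally
        {
            throttle.Release();
        }
    });

    await Task.WhenAll(uploads);
}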
I understand that you are searching for a way to upload a large number of files at once to Google Cloud Storage. There is no single method that handles the entire upload, but if you have a large number of files you can perform a parallel, multi-threaded copy. The steps are:
1. Instantiate a StorageClient object.
2. Specify the parallelization options.
3. Get the list of file names from your upload folder.
4. Get the list of files already in the cloud bucket.
5. Use Parallel.ForEach.
6. Call the UploadObject method from the Google Cloud Storage client library.
You can also refer to this article for more details on the above methods.
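Putting those steps together, a rough sketch might look like this (the local folder, bucket name, content type and degree of parallelism are placeholders):
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;
using Google.Cloud.Storage.V1;

StorageClient client = StorageClient.Create();                                  // step 1
var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };               // step 2
string[] localFiles = Directory.GetFiles(@"C:\upload-folder");                  // step 3
var existing = new HashSet<string>(
    client.ListObjects("my-bucket").Select(o => o.Name));                       // step 4

Parallel.ForEach(localFiles, options, path =>                                   // step 5
{
    string objectName = Path.GetFileName(path);
    if (existing.Contains(objectName))
        return; // already uploaded

    using (var stream = File.OpenRead(path))
    {
        client.UploadObject("my-bucket", objectName, "application/octet-stream", stream); // step 6
    }
});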
Related
I was looking for resources on how to create a simple background service using C# that checks a specific folder for FLAC files and sends them to a GCP bucket; once a file is uploaded successfully, it is deleted or moved to another folder. Where can I find something to read about this kind of thing?
To move a file to another location using C# you can use the File.Move method. The Move method moves an existing file to a new location, with the same or a different file name, and it deletes the original file, so it is also the method used to rename files. The Move method takes two parameters: the source path and the destination path.
Example:
try
{
    File.Move(sourceFile, destinationFile);
}
catch (IOException iox)
{
    Console.WriteLine(iox.Message);
}
If you need more examples of the File.Move method, please follow this link
Adding to that, you can use the Directory.GetFiles method to filter by file extension, as in the example below.
This is the original thread where the example was posted
Example:
//Assume user types .txt into textbox
string fileExtension = "*" + textbox1.Text;
string[] txtFiles = Directory.GetFiles("Source Path", fileExtension);
foreach (var item in txtFiles)
{
File.Move(item, Path.Combine("Destination Directory", Path.GetFileName(item)));
}
If you want to know more about the Directory.GetFiles method, follow this link
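Combining the two, a very rough sketch of the loop such a background service could run for the FLAC scenario (the folder paths, bucket name and polling strategy are assumptions, not a complete service):
using System.IO;
using Google.Cloud.Storage.V1;

StorageClient client = StorageClient.Create();
string watchFolder = @"C:\flac-incoming";   // assumed source folder
string doneFolder = @"C:\flac-uploaded";    // assumed archive folder
Directory.CreateDirectory(doneFolder);

// Find FLAC files, upload each one, then move it out of the watch folder.
foreach (string path in Directory.GetFiles(watchFolder, "*.flac"))
{
    using (var stream = File.OpenRead(path))
    {
        client.UploadObject("my-bucket", Path.GetFileName(path), "audio/flac", stream);
    }
    File.Move(path, Path.Combine(doneFolder, Path.GetFileName(path)));
}
In practice this loop would run inside a Windows service or .NET worker, triggered by a timer or a FileSystemWatcher.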
And concerning GCP: using the Cloud Storage Transfer Service you can move or back up data to a Cloud Storage bucket, either from other cloud storage providers or from your on-premises storage. Storage Transfer Service provides options that make data transfers and synchronization easier. For example, you can:
Schedule one-time or recurring transfer operations.
Delete existing objects in the destination bucket if they do not have a corresponding object in the source.
Delete data source objects after transferring them.
Schedule periodic synchronization from a data source to a data sink, with advanced filters based on file creation dates, file names, and the times of day you prefer to import data.
If you want to know more about the GCP Cloud Storage Transfer Service, follow this link
If you want to know more about how to create storage buckets, follow this link
I'm trying to figure out how to backup videos produced by Azure Media Services.
Where are the assets and streaming locators stored, how to backup them or recreate them for existing binary files stored in the Azure Media Service's blob storage?
Proposed solution:
I've come up with a solution: once the video is processed by the transformation job, the app will create a copy of the container in a separate backup blob storage.
Since, from my understanding, the data produced by transformation jobs is immutable, I don't have to manage any further synchronization.
if (job.State == JobState.Finished)
{
StreamingLocator locator = await AzureMediaServicesService.CreateStreamingLocatorAsync(client, azureMediaServicesConfig, outputAssetName, locatorName);
var videoUrls = await AzureMediaServicesService.GetVideoUrlsAsync(client, azureMediaServicesConfig, locator.Name);
// back up blobs in the created container here
}
Is the binary data stored in blob storage sufficient on its own for restoring the videos successfully? After a restore, will the already existing streaming and download links still work properly?
Since I'm passing the asset name as well when creating locators, I reckon I should back up the asset's data too. Can/should I somehow back up assets and locators? Where are they stored? Is there a better way to back up the videos?
I was looking for the answers here:
https://learn.microsoft.com/en-us/azure/media-services/latest/streaming-locators-concept
https://learn.microsoft.com/en-us/azure/media-services/latest/stream-files-tutorial-with-api#get-a-streaming-locator
https://learn.microsoft.com/en-us/azure/media-services/latest/limits-quotas-constraints
Part of what you're asking is 'What is an asset in Media Services?'. The Storage container that is created as part of the encoding process is definitely a good portion of what you need to back up. Technically that is all you need to recreate an asset from the backup Storage account, provided you don't mind recreating the other aspects of the asset.
An asset is/can be several things:
The Storage container and the contents of that container. These would include the MP4 video files, the manifests (.ism and .ismc), and metadata XML files.
The published locator or URL where clients make GET requests to the streaming endpoint.
Metadata. This includes things like the asset name, creation date, description, etc.
If you keep track of the Storage container in your backup and the metadata associated with it, and you have a way of updating your site with a new streaming locator, then the Storage container is all you really need to recreate the asset.
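For the "backup blobs" step in the question's snippet, copying the asset's container to a backup container might look roughly like this (a sketch assuming the Azure.Storage.Blobs SDK; the connection string and container names are placeholders):
using System.Threading.Tasks;
using Azure.Storage.Blobs;

async Task BackupAssetContainerAsync(string connectionString, string assetContainerName, string backupContainerName)
{
    var service = new BlobServiceClient(connectionString);
    BlobContainerClient source = service.GetBlobContainerClient(assetContainerName);
    BlobContainerClient backup = service.GetBlobContainerClient(backupContainerName);
    await backup.CreateIfNotExistsAsync();

    // Start a server-side copy of every blob in the asset container.
    await foreach (var blob in source.GetBlobsAsync())
    {
        BlobClient from = source.GetBlobClient(blob.Name);
        BlobClient to = backup.GetBlobClient(blob.Name);
        await to.StartCopyFromUriAsync(from.Uri);
    }
}
If the backup container lives in a different storage account, the source URI would need a SAS token appended so the copy operation can read it.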
I am trying to write a custom .NET activity, which will be run from Azure Data Factory. It will do two tasks, one after the other:
it will download grib2 files from an FTP server daily (grib2 is a custom compression for meteorological data)
it will decompress each file as it is downloaded.
So far I have set up an Azure Batch pool with two nodes (Windows Server machines), which are used to run the FTP downloads. The nodes download the grib2 files to a blob storage container.
The code for the custom app so far looks like this:
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.Azure;
using Microsoft.Azure.Management.DataFactories.Models;
using Microsoft.Azure.Management.DataFactories.Runtime;
namespace ClassLibrary1
{
    public class Class1 : IDotNetActivity
    {
        public IDictionary<string, string> Execute(
            IEnumerable<LinkedService> linkedServices,
            IEnumerable<Dataset> datasets,
            Activity activity,
            IActivityLogger logger)
        {
            logger.Write("Start");

            // Get extended properties
            DotNetActivity dotNetActivityPipeline = (DotNetActivity)activity.TypeProperties;
            string sliceStartString = dotNetActivityPipeline.ExtendedProperties["SliceStart"];

            // Get linked service details
            Dataset inputDataset = datasets.Single(dataset => dataset.Name == activity.Inputs.Single().Name);
            Dataset outputDataset = datasets.Single(dataset => dataset.Name == activity.Outputs.Single().Name);

            /*
            DO FTP download here
            */

            logger.Write("End");
            return new Dictionary<string, string>();
        }
    }
}
So far my code works and I have the files downloaded to my blob storage account.
Now that I have the files downloaded, I would like to have the nodes of the Batch pool decompress the files and put the decompressed files in my blob storage for further processing.
For this, wgrib2.exe is used, which comes with some dll files. I have already zipped and uploaded the executable and all dll files it needs to the Application Packages to my pool. If I am correct, when each node joins the pool, this executable will be extracted and available for calls.
My question is: how do I go about writing the custom .NET activity so that the files are downloaded by the nodes of my pool and, after each file is downloaded, a decompression command is run on it to convert it to a CSV file? The command line for this would look like:
wgrib2.exe downloadedfileName.grb2 -csv downloadedfileName.csv
How do I get a handle on the name of each downloaded file, how do I process it on the node, and how do I save the result back to blob storage?
Also, how can I control how many files are downloaded at the same time and how many are decompressed at the same time?
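For reference, invoking such a command from .NET on the node is an ordinary process launch; a minimal sketch (the file names are the placeholders from the command line above, and the wgrib2.exe path depends on how the application package is extracted on the node):
using System.Diagnostics;

var psi = new ProcessStartInfo
{
    FileName = "wgrib2.exe",
    Arguments = "downloadedfileName.grb2 -csv downloadedfileName.csv",
    UseShellExecute = false,
    RedirectStandardOutput = true,
    RedirectStandardError = true
};

using (var process = Process.Start(psi))
{
    string stdout = process.StandardOutput.ReadToEnd();
    string stderr = process.StandardError.ReadToEnd();
    process.WaitForExit();
    // A non-zero ExitCode would indicate the conversion failed for this file.
}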
I have multiple web servers and one central file server inside my data center,
and all my web servers store the user-uploaded files on the central internal file server.
I would like to know: what is the best way to pass the files from the web server to the file server in this case?
As suggested, I'll try to add more details to the question.
The solution I came up with was:
after receiving files from the user at the web server, I could just do an HTTP POST to the file server. But I think there is something wrong with this, because it causes large files to be loaded entirely into memory twice (once at the web server and once at the file server).
Is your file server just another Windows/Linux server, or is it a NAS device? I can suggest a number of approaches based on your requirements. The question is why you want to use the HTTP protocol when you have much better ways to transfer files between servers.
HTTP is best when you send text data, as HTTP itself is text-based. From the client side to the server side, HTTP is used because that is the only option browsers give you. But between your servers (I am assuming you are using Windows, as the question is tagged IIS), I feel you should use the SMB protocol to move the data. It is orders of magnitude faster and much more efficient to transfer the same data over SMB than over HTTP.
And for the SMB protocol, you do not have to write any code or complex scripts to do this. As provided in one of the answers above, you can just issue a simple copy command and it will happen for you.
So, just summarizing the options for you (based on my preference):
1. Let the files be uploaded to some location on each IIS web server, e.g. C:\temp\UploadedFiles. You can write a simple two- or three-line PowerShell script which copies the files from C:\temp\UploadedFiles to \\FileServer\Files\UserID\<FILEGUID>\uploaded.file. The same PowerShell script can delete each file once it has been moved to the other server successfully.
E.g. the script can be this simple, and easy to set up as a Windows scheduled task:
$Source = "C:\temp\UploadedFiles"
$Destination = "\\FileServer\Files\UserID\<FILEGUID>\"
New-Item -ItemType directory -Path $Destination -Force
Copy-Item -Path $Source\*.* -Destination $Destination -Force
This script can be modified to suit your needs, e.g. to delete the files once the copy is done :)
2. In the ASP.NET application, you can directly save the file to the network location, i.e. in the SaveAs call you can give the network path itself. You have to make sure this network share is accessible to the IIS worker process and that it has write permission. Also, to my understanding, ASP.NET saves the file to a temporary location first (you have no control over this if you are using HttpPostedFileBase or FormCollection). More details here
You can even run this asynchronously so that your requests will not be blocked:
if (FileUpload1.HasFile)
{
    // Call SaveAs with the UNC path to save directly to the file server.
    FileUpload1.SaveAs(@"\\networkshare\filename");
}
https://msdn.microsoft.com/en-us/library/system.web.ui.webcontrols.fileupload.saveas(v=vs.110).aspx
3. Save the file the current way to a local directory and then use HTTP POST. This is the worst design possible, as you are first going to read the contents and then transfer them, chunked, to the other server, where you have to set up another web service which receives the file. Then you have to read the file from the request stream and save it again to your location. I am not sure you need to do this.
Let me know if you need more details on any of the listed methods.
Or you just write it to a folder on the web servers, and create a scheduled task that moves the files to the file server every x minutes (e.g. via robocopy). This also makes sure your web servers are not reliant on your file server.
Assuming that you have an HttpPostedFileBase then the best way is just to call the .SaveAs() method.
You need the UNC path to the file server and that is it. The simplest version would look something like this:
public void SaveFile(HttpPostedFileBase inputFile) {
var saveDirectory = @"\\fileshare\application\directory";
var savePath = Path.Combine(saveDirectory, inputFile.FileName);
inputFile.SaveAs(savePath);
}
However, this is simplistic in the extreme. Take a look at the OWASP Guidance on Unrestricted File Uploads. File uploads can be the source of many vulnerabilities in your application.
You also need to make sure that the web application has access to the file share. Take a look at this answer
Creating a file on network location in asp.net
for more info. Generally the best solution is to run the application pool with a special identity which is only used to access the folder.
The solution I came up with was: after receiving files from the user at the web server, I could just do an HTTP POST to the file server. But I think there is something wrong with this, because it causes large files to be loaded entirely into memory twice (once at the web server and once at the file server).
I would suggest not posting the file all at once; then it is held fully in memory, which is not needed.
You could post the file in chunks using AJAX. When a chunk arrives at your server, just append it to the file.
With the File API, you can read the file in chunks in JavaScript.
Something like this:
/** Upload a file in chunks. */
function upload(file) {
    var chunkSize = 8000;
    var start = 0;
    while (start < file.size) {
        var end = Math.min(start + chunkSize, file.size);
        var chunk = file.slice(start, end);
        var xhr = new XMLHttpRequest();
        xhr.onload = function () {
            // Check whether all chunks have arrived, then send the file name
            // (or send it in the first/last request).
        };
        xhr.open("POST", "/FileUpload", true);
        xhr.send(chunk);
        start = end;
    }
}
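On the server side, the receiving action could append each incoming chunk to the target file; a rough sketch, assuming ASP.NET MVC (the target path is a placeholder, and a real version would take the file name and chunk offset from the request, since the asynchronous loop above does not guarantee ordering):
using System.IO;
using System.Web.Mvc;

public class FileUploadController : Controller
{
    // Receives one chunk per POST and appends it to the file on the share.
    [HttpPost]
    public ActionResult FileUpload()
    {
        string targetPath = @"\\fileserver\uploads\incoming.bin"; // placeholder

        using (var target = new FileStream(targetPath, FileMode.Append, FileAccess.Write))
        {
            Request.InputStream.CopyTo(target);
        }
        return new HttpStatusCodeResult(200);
    }
}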
It can be implemented in different ways. If you are storing the files on the file server as plain files in the file system, and all of your servers are inside the same virtual network,
then it is better to create a shared folder on your file server; once you receive a file at the web server, just save it into this shared folder directly on the file server.
Here are the instructions on how to create shared folders: https://technet.microsoft.com/en-us/library/cc770880(v=ws.11).aspx
Just map a drive
I take it you have a means of saving the uploaded file on the web server's local filesystem. The question pertains to moving the file from the web server (which is probably one of many load-balanced nodes) to a central file system that all web servers can access.
The solution to this is remarkably simple.
Let's say you are currently saving the files to some folder, say c:\uploadedfiles, and the path to uploadedfiles is stored in your web.config.
Take the following steps:
Sign on as the service account under which your web site executes
Map a persistent network drive to the desired location, e.g. from command line:
NET USE f: \\MyFileServer\MyFileShare /user:SomeUserName password
Modify your web.config and change c:\uploadedfiles to f:\
Ta da, all done.
Just make sure the drive mapping is persistent, and make sure you use a user with adequate permissions, and voila.
In my scenario, users are able to upload zip files to a.example.com
I would love to create a "daemon" which, at specified time intervals, will move/transfer any zip files uploaded by the users from a.example.com to b.example.com.
From the info I have gathered so far:
1. The daemon will be an .ashx generic handler.
2. The daemon will be triggered at the specified time intervals via a Plesk cron job.
3. The daemon (thanks to SLaks) will consist of two FtpWebRequests (one for reading and one for writing).
So the question is: how could I implement step 3?
Do I have to read the whole file into a memory array and try to write that to b.example.com?
How could I write the data I read to b.example.com?
Could I perform reading and writing of the file at the same time?
No, I am not asking for the full code; I just can't figure out how I could perform reading and writing on the fly, without user interaction.
I mean, I could download the file locally from a.example.com and then upload it to b.example.com, but that is not the point.
Here is another solution:
Let ASP.NET on server A receive the file as a regular file upload and store it in directory XXX.
Have a Windows service on server A that checks directory XXX for new files.
Let the Windows service upload the file to server B using HttpWebRequest.
Let server B receive the file using a regular ASP.Net file upload page.
Links:
File upload example (ASP.Net): http://msdn.microsoft.com/en-us/library/aa479405.aspx
Building a windows service: http://www.codeproject.com/KB/system/WindowsService.aspx
Uploading files using HttpWebRequest: Upload files with HTTPWebrequest (multipart/form-data)
Problems you have to solve:
How to determine which files to upload to server B. I would use Directory.GetFiles in a Timer to find new files instead of using a FileSystemWatcher (see the sketch after this list). You need to be able to check whether a file has been uploaded previously (delete it, rename it, check a DB, or whatever suits your needs).
Authentication on server B, so that only you can upload files to it.
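A rough sketch of that timer-driven scan (the folder names, the interval and the "move to a sent folder" bookkeeping are assumptions; the actual upload call is left as a comment):
using System.IO;
using System.Timers;

string outgoing = @"C:\uploads\outgoing"; // assumed: where ASP.NET stores the uploads
string sent = @"C:\uploads\sent";         // assumed: files already pushed to server B
Directory.CreateDirectory(sent);

var timer = new Timer(60000); // check once a minute
timer.Elapsed += (s, e) =>
{
    foreach (string path in Directory.GetFiles(outgoing))
    {
        // Upload 'path' to server B here (e.g. with HttpWebRequest, see the links above),
        // then move the file so it is not picked up again on the next tick.
        File.Move(path, Path.Combine(sent, Path.GetFileName(path)));
    }
};
timer.Start();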
To answer your questions: yes, you can read and write the files at the same time.
You can open an FtpWebRequest to server A and an FtpWebRequest to server B. On the FtpWebRequest to server A you would request the file and get the ResponseStream. Once you have the ResponseStream, you would read a chunk of bytes at a time and write that chunk of bytes to the server B RequestStream.
The only memory you would be using would be the byte[] buffer in your read/write loop. Just keep in mind though that the underlying implementation of FTPWebRequest will download the complete FTP file before returning the response stream.
Similarly, you cannot send your FTPWebRequest to upload the new file until all bytes have been written. In effect, the operations will happen synchronously. You will call GetResponse which won't return until the full file is available, and only then can you 'upload' the new file.
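A minimal sketch of that read/write loop (the host names come from the question; the file names and credentials are placeholders):
using System.IO;
using System.Net;

// Request the file from server A.
var download = (FtpWebRequest)WebRequest.Create("ftp://a.example.com/uploads/file.zip");
download.Method = WebRequestMethods.Ftp.DownloadFile;
download.Credentials = new NetworkCredential("userA", "passwordA");

// Prepare the upload to server B.
var upload = (FtpWebRequest)WebRequest.Create("ftp://b.example.com/incoming/file.zip");
upload.Method = WebRequestMethods.Ftp.UploadFile;
upload.Credentials = new NetworkCredential("userB", "passwordB");

using (var response = (FtpWebResponse)download.GetResponse())
using (Stream source = response.GetResponseStream())
using (Stream destination = upload.GetRequestStream())
{
    var buffer = new byte[8192];
    int read;
    while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
    {
        destination.Write(buffer, 0, read);
    }
}

// Completing the request finishes the upload on server B.
using (var uploadResponse = (FtpWebResponse)upload.GetResponse()) { }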
References:
FTPWebRequest
Something you have to take into consideration is that a long-running web request (your .ashx generic handler) may be killed when the AppDomain refreshes. Therefore you have to implement some sort of atomic transaction logic in your code, and you should handle sudden disconnects and incomplete FTP transfers if you go that way.
Did you have a look at Windows Azure before? This cloud platform supports distributed file system, and has built-in atomic transactions. Plus it scales nicely, should your service grow fast.
I would make it pretty simple. The client program uploads the file to server A. This can be done very easily in C# with an FtpWebRequest.
http://msdn.microsoft.com/en-us/library/ms229715.aspx
I would then have a service on server A that monitors the directory where files are uploaded. When a file is uploaded to that directory, or at certain intervals, it simply copies the files over to server B. Again, this can be done via FTP or other means if they're on the same network.
You need some listener on the target domain, e.g. an FTP server running there, and on the client side you will use System.Net.WebClient and UploadFile or UploadFileAsync to send the file. Is that what you are asking?
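For example, a minimal sketch of the client side (the address, credentials and file path are placeholders):
using System.Net;

// Upload one local file to the target server over FTP.
using (var client = new WebClient())
{
    client.Credentials = new NetworkCredential("user", "password");
    client.UploadFile("ftp://b.example.com/incoming/archive.zip", @"C:\uploads\archive.zip");
}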
It sounds like you don't really need a web service or handler. All you need is a program that will, at regular intervals, open up an FTP connection to the other server and move the files. This can be done by any .NET program with the System.Net.WebClient class; it doesn't have to be a "web app". This other program could be a service, which would handle its own timing, or a simple app run by your cron job. If you need this to go two ways, for instance if the two servers are mirrors, you simply have the same app on the second box doing the same thing to upload files over to the first.
If both machines are in the same domain, couldn't you just do file replication at the OS level?
DFS
Set up SSH keys if you are using Linux-based systems:
http://compdottech.blogspot.com/2007/10/unix-login-without-password-setting.html
Once you have the keys working, you can copy the file from system A to system B with regular shell scripts that do not need any user interaction.