unzip Http FormFile in Azure Function - c#

I have a front end in which the user can upload a zip. My idea is to use an HTTP-triggered Azure Function to unzip that file and send the contents to Azure Blob Storage, so I am simulating the HTTP call with Postman, sending the zip as form-data. I cannot figure out how to go from the Http.FormFile to the unzipped files that I am going to send. I am using C#.
Do you have any suggestions or references?
Maybe my approach is wrong and I should send the data (which is about 60-70 MB unzipped) to a blob first and then use a blob trigger to write the unzipped files to another container. This last approach feels more resource intensive to me. Which would you choose?
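For the HTTP route, a minimal sketch of the idea (not a definitive implementation) could look like the following, assuming the in-process model and the Azure.Storage.Blobs SDK; the function name, the "StorageConnection" app setting and the "output-files" container are placeholders:
using System;
using System.IO;
using System.IO.Compression;
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;

public static class UnzipToBlob
{
    [FunctionName("UnzipToBlob")]
    public static async Task<IActionResult> Run(
        [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequest req)
    {
        // The zip arrives as the first (and only) file in the multipart form-data
        IFormFile zipFile = req.Form.Files[0];

        var container = new BlobContainerClient(
            Environment.GetEnvironmentVariable("StorageConnection"), "output-files");
        await container.CreateIfNotExistsAsync();

        using var archive = new ZipArchive(zipFile.OpenReadStream(), ZipArchiveMode.Read);
        foreach (ZipArchiveEntry entry in archive.Entries)
        {
            if (string.IsNullOrEmpty(entry.Name)) continue;      // skip directory entries

            // Buffer each entry so the blob SDK gets a seekable stream
            using var buffer = new MemoryStream();
            using (Stream entryStream = entry.Open())
            {
                await entryStream.CopyToAsync(buffer);
            }
            buffer.Position = 0;

            await container.UploadBlobAsync(entry.FullName, buffer);
        }

        return new OkObjectResult($"Unzipped {archive.Entries.Count} entries to output-files.");
    }
}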

As per the article by FBoucher:
Install the Azure Functions extension in Visual Studio Code along with the Azure Functions Core Tools, and set AzureWebJobsStorage to UseDevelopmentStorage=true in the local.settings.json file.
{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "UseDevelopmentStorage=true",
    "FUNCTIONS_WORKER_RUNTIME": "dotnet",
    "unziptools_STORAGE": "DefaultEndpointsProtocol=https;AccountName=unziptools;AccountKey=XXXXXXXXX;EndpointSuffix=core.windows.net"
  }
}
In the Azure Functions extension, select your subscription and your function app (AzUnzipEverything) under Function App -> add new setting -> create cloud5mins_storage, destinationStorage and destinationContainer.
In your storage account -> resource group -> select your blob container (input files) -> upload a zip file.
After a few minutes, the uploaded zip file will be unzipped into the blob storage container output-files.
For your reference:
https://github.com/FBoucher/AzUnzipEverything by FBoucher

Related

Azure blob trigger, how to trigger on "move" for a file?

So I have a blob trigger and recently discovered that if I have a file and I move it to the folder in question, it does not fire the blob trigger. How can I trigger on a "move"?
Context: the blob trigger is looking for a JSON file. It DOES fire if I copy or drag and drop a file from another folder on my PC via upload and overwrite an existing blob, but NOT if the file is new!
How I came across this: I have an "a" folder and my trigger folder. If I do NOT already have the file in my trigger folder and I perform a "move" from folder "a" to my trigger folder, this is ignored by the blob trigger. Why is that? Is there a workaround?
As far as code, I have to clean it up some, but it's a general consumption blob trigger configured to look for a JSON file. It's an ADLS Gen2 storage account. It DOES work on other blob copies and such, just NOT on moves.
So far I have tried to move the file and it never triggers, but if I copy or drag and drop WITH overwrite, it triggers. I looked through the configuration and checked the documentation and can't find any mention of this yet.
After reproducing from my end, this works fine. I'm using an ADLS Gen2 storage account. Even after moving a file from a different folder to the trigger folder, the trigger fires. I confirmed this after attaching the storage account to a function app and a logic app. Below is the function.json for my function app.
{
  "bindings": [
    {
      "name": "myBlob",
      "type": "blobTrigger",
      "direction": "in",
      "path": "input/{name}.json",
      "connection": "AzureWebJobsStorage"
    }
  ]
}
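For reference, an equivalent in-process C# function using the same binding might look like this sketch (the function name and log message are illustrative, not from the original answer):
using System.IO;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class JsonBlobTrigger
{
    [FunctionName("JsonBlobTrigger")]
    public static void Run(
        [BlobTrigger("input/{name}.json", Connection = "AzureWebJobsStorage")] Stream myBlob,
        string name,
        ILogger log)
    {
        // Fires for every blob created or updated under input/ with a .json extension
        log.LogInformation($"Blob trigger fired for input/{name}.json");
    }
}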
RESULTS:
Make sure the file extension you have specified in the path filter is in the right format for the trigger to fire.

Deleting files in Azure Synapse Notebook

This should have been simple but turned out to require a bit of GoogleFu.
I have an Azure Synapse Spark Notebook written in C# that
Receives a list of Deflate compressed IIS files.
Reads the files as binary into a DataFrame
Decompresses these files one at a time and writes them into Parquet format.
Now after all of them have been successfully processed I need to delete the compressed files.
This is my proof of concept but it works perfectly.
Create a linked service pointing to the storage account that contains the files you want to delete; see Configure access to Azure Blob Storage.
See code sample below
#r "nuget:Azure.Storage.Files.DataLake,12.0.0-preview.9"
using Microsoft.Spark.Extensions.Azure.Synapse.Analytics.Utils;
using Microsoft.Spark.Extensions.Azure.Synapse.Analytics.Notebook.MSSparkUtils;
using Azure.Storage.Files.DataLake;
using Azure.Storage.Files.DataLake.Models;
string blob_sas_token = Credentials.GetConnectionStringOrCreds('your linked service name here');
Uri uri = new Uri($"https://'your storage account name here'.blob.core.windows.net/'your container name here'{blob_sas_token}") ;
DataLakeServiceClient _serviceClient = new DataLakeServiceClient(uri);
DataLakeFileSystemClient fileClient = _serviceClient.GetFileSystemClient("'path to directory containing the file here'") ;
fileClient.DeleteFile("'file name here'") ;
The call to Credentials.GetConnectionStringOrCreds returns a signed SAS token that is ready for your code to attach to a storage resource uri.
You could of course use the DeleteFileAsync method if you so desire.
Hope this saves someone else a few hours of GoogleFu.

Custom .NET activity to download and extract files by using Azure Batch

I am trying to write a custom .NET activity, which will be run from Azure Data Factory. It will do two tasks, one after the other:
it will download grib2 files from an FTP server daily (grib2 is a custom compression for meteorological data)
it will decompress each file as it is downloaded.
So far I have set up an Azure Batch pool with two nodes (Windows Server machines), which are used to run the FTP downloads. The nodes download the grib2 files to a blob storage container.
The code for the custom app so far looks like this:
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.Azure;
using Microsoft.Azure.Management.DataFactories.Models;
using Microsoft.Azure.Management.DataFactories.Runtime;

namespace ClassLibrary1
{
    public class Class1 : IDotNetActivity
    {
        public IDictionary<string, string> Execute(
            IEnumerable<LinkedService> linkedServices,
            IEnumerable<Dataset> datasets,
            Activity activity,
            IActivityLogger logger)
        {
            logger.Write("Start");

            // Get extended properties
            DotNetActivity dotNetActivityPipeline = (DotNetActivity)activity.TypeProperties;
            string sliceStartString = dotNetActivityPipeline.ExtendedProperties["SliceStart"];

            // Get linked service details
            Dataset inputDataset = datasets.Single(dataset => dataset.Name == activity.Inputs.Single().Name);
            Dataset outputDataset = datasets.Single(dataset => dataset.Name == activity.Outputs.Single().Name);

            /*
            DO FTP download here
            */

            logger.Write("End");
            return new Dictionary<string, string>();
        }
    }
}
So far my code works and I have the files downloaded to my blob storage account.
Now that I have the files downloaded, I would like to have the nodes of the Batch pool decompress the files and put the decompressed files in my blob storage for further processing.
For this, wgrib2.exe is used, which comes with some DLL files. I have already zipped the executable and all the DLLs it needs and uploaded them as an application package on my pool. If I am correct, when each node joins the pool, this package will be extracted and the executable will be available to call.
My question is: how do I go about to write the custom .NET activity so the files are downloaded by the nodes of my pool and after each file is downloaded, a decompression command is run on each file to convert it to csv file? The command line for this would look like:
wgrib2.exe downloadedfileName.grb2 -csv downloadedfileName.csv
How do I get a handle on the name of each downloaded file, how do I process it on the node, and how do I save it back to the blob storage?
Also, how can I control how many files are downloaded at the same time and how many are decompressed at the same time?
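Purely as an illustration of the decompression step (not of the ADF/Batch orchestration being asked about), a hedged sketch of shelling out to wgrib2.exe from code running on a node could look like this; the helper name and paths are hypothetical:
using System;
using System.Diagnostics;

public static class Wgrib2Runner
{
    // Runs "wgrib2.exe <grb2File> -csv <csvFile>" and throws if the conversion fails.
    // wgrib2Path would point at the executable extracted from the application package on the node.
    public static void ConvertToCsv(string wgrib2Path, string grb2File, string csvFile)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = wgrib2Path,
            Arguments = $"\"{grb2File}\" -csv \"{csvFile}\"",
            UseShellExecute = false,
            RedirectStandardError = true
        };

        using (var process = Process.Start(startInfo))
        {
            string errors = process.StandardError.ReadToEnd();
            process.WaitForExit();
            if (process.ExitCode != 0)
                throw new InvalidOperationException($"wgrib2 failed for {grb2File}: {errors}");
        }
    }
}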

efficiently pass files from webserver to file server

I have multiple web servers and one central file server inside my data center, and all my web servers store user-uploaded files on the central internal file server.
I would like to know: what is the best way to pass the files from the web server to the file server in this case?
As suggested, I am adding more details to the question. The solution I came up with was: after receiving files from the user at the web server, just do an HTTP POST to the file server. But I think there is something wrong with this, because it causes large files to be loaded entirely into memory twice (once at the web server and once at the file server).
Is your file server just another Windows/Linux server, or is it a NAS device? I can suggest a number of approaches based on your requirements. The question is why you want to use the HTTP protocol when you have much better ways to transfer files between servers.
HTTP is best suited to sending text data, as HTTP itself is text-based. From the client side to the server side, HTTP is used because it is the only option your browsers give you. But between your servers (I assume you are using Windows, since the question is tagged IIS), I feel you should use the SMB protocol to move the data. It is orders of magnitude faster and much more efficient to transfer the same data over SMB vs HTTP.
And for the SMB protocol, you do not have to write any code or complex scripts. As provided in one of the answers above, you can just issue a simple copy command and it will happen for you.
So, just summarizing the options for you (in order of my preference):
1. Let the files get uploaded to some location on each IIS web server, e.g. C:\temp\UploadedFiles. You can write a simple 2-3 line PowerShell script which copies the files from C:\temp\UploadedFiles to \\FileServer\Files\UserID\<FILEGUID>\uploaded.file. The same PowerShell script can delete each file once it has been moved to the other server successfully.
For example, the script can be this simple, and is easy to set up as a Windows scheduled task:
$Source = "C:\temp\UploadedFiles"   # the upload folder on the web server
$Destination = "\\FileServer\Files\UserID\<FILEGUID>\"
New-Item -ItemType directory -Path $Destination -Force
Copy-Item -Path $Source\*.* -Destination $Destination -Force
This script can be modified to suit your needs, e.g. to delete the files once the copy is done :)
2. In the ASP.NET application, you can directly save the file to the network location, i.e. in the SaveAs call you can give the network path itself. You have to make sure this network share is accessible to the IIS worker process and has write permission. Also, in my understanding, ASP.NET saves the file to a temporary location first (you do not have control over this if you are using the ASP.NET HttpPostedFileBase or FormCollection). More details here.
You can even run this asynchronously so that your requests will not be blocked:
if (FileUpload1.HasFile)
{
    // Save the uploaded file directly to the network share
    FileUpload1.SaveAs(@"\\networkshare\filename");
}
https://msdn.microsoft.com/en-us/library/system.web.ui.webcontrols.fileupload.saveas(v=vs.110).aspx
3. Save the file the current way to a local directory and then use an HTTP POST. This is the worst design possible, as you first read the contents and then transfer them chunked to the other server, where you have to set up another web service which receives the file. Then you have to read the file from the request stream and save it to your location again. I am not sure you need to do this.
Let me know if you need more details on any of the listed methods.
Or you just write it to a folder on the webservers, and create a scheduled task that moves the files to the file server every x minutes (e.g. via robocopy). This also makes sure your webservers are not reliant on your file server.
Assuming that you have an HttpPostedFileBase then the best way is just to call the .SaveAs() method.
You need the UNC path to the file server and that is it. The simplest version would look something like this:
public void SaveFile(HttpPostedFileBase inputFile) {
    var saveDirectory = @"\\fileshare\application\directory";
    var savePath = Path.Combine(saveDirectory, inputFile.FileName);
    inputFile.SaveAs(savePath);
}
However, this is simplistic in the extreme. Take a look at the OWASP Guidance on Unrestricted File Uploads. File uploads can be the source of many vulnerabilities in your application.
You also need to make sure that the web application has access to the file share. Take a look at this answer
Creating a file on network location in asp.net
for more info. Generally the best solution is to run the application pool with a special identity which is only used to access the folder.
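On the OWASP point above, a hedged sketch of some basic hardening might look like the following; the helper name and the allowed-extension list are assumptions you would tailor to your application:
using System;
using System.IO;
using System.Linq;
using System.Web;

public class UploadHelper
{
    // Assumption: restrict uploads to a small, explicit set of extensions
    private static readonly string[] AllowedExtensions = { ".pdf", ".png", ".jpg" };

    public static void SaveFileSafely(HttpPostedFileBase inputFile)
    {
        var saveDirectory = @"\\fileshare\application\directory";

        // Never trust the client-supplied name: strip any path components first
        var fileName = Path.GetFileName(inputFile.FileName);
        var extension = Path.GetExtension(fileName).ToLowerInvariant();

        if (!AllowedExtensions.Contains(extension))
            throw new InvalidOperationException($"File type '{extension}' is not allowed.");

        // Use a server-generated name so uploads cannot collide or overwrite each other
        var savePath = Path.Combine(saveDirectory, Guid.NewGuid() + extension);
        inputFile.SaveAs(savePath);
    }
}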
The solution I came up with was: after receiving files from the user at the web server, just do an HTTP POST to the file server. But I think there is something wrong with this, because it causes large files to be loaded entirely into memory twice (once at the web server and once at the file server).
I would suggest not posting the file all at once - it is then fully in memory, which is not needed.
You could post the file in chunks using AJAX. When a chunk arrives at your server, just append it to the file.
With the File Reader API, you could read the file in chunks in JavaScript.
Something like this:
/** upload file in chunks */
function upload(file) {
    var chunkSize = 8000;
    var start = 0;
    while (start < file.size) {
        var end = start + chunkSize;
        var chunk = file.slice(start, end);
        var xhr = new XMLHttpRequest();
        xhr.onload = function () {
            // check whether all chunks have arrived; send the filename in the first/last request.
        };
        xhr.open("POST", "/FileUpload", true);
        xhr.send(chunk);
        start = end;
    }
}
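On the receiving side, a hedged sketch (assuming classic ASP.NET MVC with the default route, so that POST /FileUpload reaches the Index action below) could simply append each chunk to a file; the temporary file path is a placeholder, and a real implementation would correlate chunks with an upload id and preserve their order:
using System.IO;
using System.Web.Mvc;

public class FileUploadController : Controller
{
    [HttpPost]
    public ActionResult Index()
    {
        var path = Server.MapPath("~/App_Data/upload.tmp");   // hypothetical target file

        // Append the raw chunk body to the end of the file
        using (var target = new FileStream(path, FileMode.Append, FileAccess.Write))
        {
            Request.InputStream.CopyTo(target);
        }

        return new HttpStatusCodeResult(200);
    }
}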
It can be implemented in different ways. If you are storing the files on the file server as files in the file system, and all of your servers are inside the same virtual network, then it is better to create a shared folder on your file server, and once you receive files at the web server, save them directly into this shared folder on the file server.
Here are the instructions on how to create shared folders: https://technet.microsoft.com/en-us/library/cc770880(v=ws.11).aspx
Just map a drive
I take it you have a means of saving the uploaded file on the web server's local filesystem. The question pertains to moving the file from the web server (which is probably one of many load-balanced nodes) to a central file system all web servers can access.
The solution to this is remarkably simple.
Let's say you are currently saving the files to some folder, say c:\uploadedfiles. The path to uploadedfiles is stored in your web.config.
Take the following steps:
Sign on as the service account under which your web site executes
Map a persistent network drive to the desired location, e.g. from command line:
NET USE f: \\MyFileServer\MyFileShare /user:SomeUserName password
Modify your web.config and change c:\uploadedfiles to f:\
Ta da, all done.
Just make sure the drive mapping is persistent, and make sure you use a user with adequate permissions, and voila.

How to upload files to Azure Blob Storage using Azure WebJob? (Using C#, .NET)

I have a very basic understanding of Azure WebJob, that is, it can perform tasks in background. I want to upload files to Azure Blob Storage, specifically using Azure WebJob. I would like to know how to do this, from scratch. Assume that the file to be uploaded is locally available on the system in a certain folder (say C:/Users/Abc/Alpha/Beta/).
How and where do I define the background task that is supposed to be performed?
How do I make sure that whenever a new file is available in the same folder (C:/Users/Abc/Alpha/Beta/) the function is automatically triggered, and this new file is also transferred to Azure Blob Storage?
Can I monitor progress of transfer for each file? or for all files?
How to handle connection failures during transfer? and what other errors should I worry about?
How and where do I define the background task that is supposed to be performed?
According to your description, you could create a WebJob console application in Visual Studio and run this console application locally.
For more details, you could refer to this article on how to create a WebJob in Visual Studio.
Notice: since you need to watch a local folder, this WebJob runs on your local machine and is not uploaded to the Azure web app.
How do I make sure that whenever a new file is available in the same folder (C:/Users/Abc/Alpha/Beta/) the function is automatically triggered, and this new file is also transferred to Azure Blob Storage?
As far as I know, WebJobs support the FileTrigger, which monitors a particular directory for file additions/changes and triggers a job function when they occur.
For more details, you could refer to the code sample below.
Program.cs:
static void Main()
{
    var config = new JobHostConfiguration();
    FilesConfiguration filesConfig = new FilesConfiguration();
    // Set the root path of the folder that the FileTrigger functions will watch
    filesConfig.RootPath = @"D:\";
    config.UseFiles(filesConfig);
    var host = new JobHost(config);
    // The following code ensures that the WebJob will be running continuously
    host.RunAndBlock();
}
function.cs:
public static void ImportFile(
    [FileTrigger(@"fileupload\{name}", "*.*", WatcherChangeTypes.Created | WatcherChangeTypes.Changed)] Stream file,
    FileSystemEventArgs fileTrigger,
    [Blob("textblobs/{name}", FileAccess.Write)] Stream blobOutput,
    TextWriter log)
{
    log.WriteLine(string.Format("Processed input file '{0}'!", fileTrigger.Name));
    file.CopyTo(blobOutput);
    log.WriteLine("Upload File Complete");
}
Can I monitor progress of transfer for each file? or for all files?
As far as I know, there is a [BlobInput] attribute that lets you specify a container to listen on, and it includes an efficient blob listener that will dispatch to the method when new blobs are detected. For more details, you could refer to this article.
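In the current WebJobs SDK that listener is exposed through the [BlobTrigger] attribute; a hedged sketch of pointing it at the textblobs container used above, as a simple per-file completion signal, could sit alongside the ImportFile function:
public static void OnBlobUploaded(
    [BlobTrigger("textblobs/{name}")] Stream uploadedBlob,
    string name,
    TextWriter log)
{
    // Fires once per new blob, after the corresponding upload has landed in the container
    log.WriteLine($"Blob '{name}' has been written to the textblobs container.");
}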
How to handle connection failures during transfer? and what other errors should I worry about?
You could use try/catch to catch errors. If an error happens you could send the details to a queue or write a text file to the blob container, and then act on the queue message or the blob text file afterwards.
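A hedged sketch of what that could look like for the ImportFile function above, with a hypothetical "upload-errors" queue binding capturing the failure details:
public static void ImportFile(
    [FileTrigger(@"fileupload\{name}", "*.*", WatcherChangeTypes.Created | WatcherChangeTypes.Changed)] Stream file,
    FileSystemEventArgs fileTrigger,
    [Blob("textblobs/{name}", FileAccess.Write)] Stream blobOutput,
    [Queue("upload-errors")] ICollector<string> errorQueue,
    TextWriter log)
{
    try
    {
        file.CopyTo(blobOutput);
        log.WriteLine($"Processed input file '{fileTrigger.Name}'!");
    }
    catch (Exception ex)
    {
        // Record the failure so it can be inspected or retried later
        errorQueue.Add($"{fileTrigger.Name}: {ex.Message}");
        log.WriteLine($"Upload failed for '{fileTrigger.Name}': {ex}");
        throw; // let the WebJobs SDK record the failure as well
    }
}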
