I have a file such as Parent.zip which, when unzipped, yields these files: child1.jpg, child2.txt, child3.pdf.
When running Parent.zip through the function below, the files are correctly unzipped to:
some-container/child1.jpg
some-container/child2.txt
some-container/child3.pdf
How do I unzip the files into a parent folder named after the zip file? The desired result would be:
some-container/Parent/child1.jpg
some-container/Parent/child2.txt
some-container/Parent/child3.pdf
As you can see above, a folder named Parent is created.
I am using this code to create the files in Blob Storage:
using (var stream = entry.Open ()) {
//check for file or folder and update the above blob reference with actual content from stream
if (entry.Length > 0) {
await blob.UploadFromStreamAsync (stream);
}
}
Here's the full source:
[FunctionName ("OnUnzipHttpTriggered")]
public static async Task<IActionResult> Run (
[HttpTrigger (AuthorizationLevel.Function, "get", "post", Route = null)] HttpRequest req,
ILogger log) {
log.LogInformation ("C# HTTP trigger function processed a request.");
var requestBody = new StreamReader (req.Body).ReadToEnd ();
var data = JsonConvert.DeserializeObject<ZipFileMetaData> (requestBody);
var storageAccount =
CloudStorageAccount.Parse (Environment.GetEnvironmentVariable ("StorageConnectionString"));
var blobClient = storageAccount.CreateCloudBlobClient ();
var container = blobClient.GetContainerReference (data.SourceContainer);
var blockBlob = container.GetBlockBlobReference (data.FileName);
var extractcontainer = blockBlob.ServiceClient.GetContainerReference (data.DestinationContainer.ToLower ());
await extractcontainer.CreateIfNotExistsAsync ();
var files = new List<string> ();
// Save blob(zip file) contents to a Memory Stream.
using (var zipBlobFileStream = new MemoryStream ()) {
await blockBlob.DownloadToStreamAsync (zipBlobFileStream);
await zipBlobFileStream.FlushAsync ();
zipBlobFileStream.Position = 0;
//use ZipArchive from System.IO.Compression to extract all the files from zip file
using (var zip = new ZipArchive (zipBlobFileStream)) {
//Each entry here represents an individual file or a folder
foreach (var entry in zip.Entries) {
files.Add (entry.FullName);
//creating an empty blob (blockBlob) reference for the actual file, with the same name as the file
var blob = extractcontainer.GetBlockBlobReference (entry.FullName);
using (var stream = entry.Open ()) {
//check for file or folder and update the above blob reference with actual content from stream
if (entry.Length > 0) {
await blob.UploadFromStreamAsync (stream);
}
}
// TO-DO : Process the file (Blob)
//process the file here (blob) or you can write another process later
//to reference each of these files(blobs) on all files got extracted to other container.
}
}
}
return new OkObjectResult (files);
}
Simply add the directory name before the entry name, and the directory appears automatically:
var blob = extractcontainer.GetBlockBlobReference ("Parent/"+entry.FullName);
Note that the directory is virtual. Blob Storage has a flat container/blob structure: directories are really just prefixes of blob names, and the Storage service displays a directory structure by splitting the names on the / separator.
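As a sketch of the same idea, the parent folder name can be derived from the zip's own file name instead of being hard-coded. The helper below (BuildBlobNames is a hypothetical name) uses Path.GetFileNameWithoutExtension to turn Parent.zip into Parent, prefixes each entry's FullName with it, and skips directory entries, since those have no content to upload. It runs against an in-memory zip that stands in for the downloaded blob; in the Azure Function, each resulting name would be passed to extractcontainer.GetBlockBlobReference(...), assuming data.FileName holds something like Parent.zip.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

// Maps every file entry of the archive to "<zip name without extension>/<entry path>".
static List<string> BuildBlobNames(string zipFileName, ZipArchive archive)
{
    string prefix = Path.GetFileNameWithoutExtension(zipFileName); // "Parent.zip" -> "Parent"
    var blobNames = new List<string>();
    foreach (ZipArchiveEntry entry in archive.Entries)
    {
        // Directory entries have an empty Name (their FullName ends with '/').
        // Skip them: in Blob Storage a "folder" exists only as a name prefix.
        if (string.IsNullOrEmpty(entry.Name)) continue;
        blobNames.Add($"{prefix}/{entry.FullName}");
    }
    return blobNames;
}

// Stand-in for the downloaded Parent.zip: one directory entry plus three files.
using var zipBytes = new MemoryStream();
using (var writer = new ZipArchive(zipBytes, ZipArchiveMode.Create, leaveOpen: true))
{
    writer.CreateEntry("empty-folder/"); // a directory entry, as some zip tools add
    foreach (string fileName in new[] { "child1.jpg", "child2.txt", "child3.pdf" })
    {
        using var content = new StreamWriter(writer.CreateEntry(fileName).Open());
        content.Write("sample content");
    }
}
zipBytes.Position = 0;

using var zip = new ZipArchive(zipBytes, ZipArchiveMode.Read);
List<string> names = BuildBlobNames("Parent.zip", zip);
foreach (string name in names)
    Console.WriteLine(name); // Parent/child1.jpg, then Parent/child2.txt, then Parent/child3.pdf
```

Deriving the prefix from the zip name means different archives extract into different virtual folders without any code change.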
In a downstream system, one data folder is created every day within the folder Data, and files are generated over time within a subfolder TS. Finally, it is zipped as Data.zip and uploaded to Azure Blob Storage by the customer.
Now I am downloading the zip file and trying to find the one file which has the max date/time. Using the code below I am able to print the names of all files inside the zip file, but how do I get (print) only the file with the max date/time?
var blobClient = new BlobClient("conn-string", "upload", "Data.zip");
await DownloadFromStream(blobClient);
public static async Task DownloadFromStream(BlobClient blobClient)
{
int i = 0;
var stream = await blobClient.OpenReadAsync(new BlobOpenReadOptions(false));
using ZipArchive archive = new ZipArchive(stream);
foreach (ZipArchiveEntry entry in archive.Entries.OrderBy(x => x.LastWriteTime))
{
if (entry.Name.StartsWith("XXX_TS_"))
{
i++;
Console.WriteLine(i);
Console.WriteLine(entry.Name);
}
}
}
I tried it the following way, and it worked for me.
I uploaded the sample files to the Azure container as a zip file (shown in an Azure portal screenshot).
Code snippet:
var blobClient = new BlobClient("ConnectionString", "ContainerName", "Data.zip");
await DownloadFromStream(blobClient);
public static async Task DownloadFromStream(BlobClient blobClient)
{
var stream = await blobClient.OpenReadAsync(new BlobOpenReadOptions(false));
ZipArchive archive = new ZipArchive(stream);
List<string> sFileslist = new List<string>();
foreach (ZipArchiveEntry entry in archive.Entries.OrderBy(x => x.LastWriteTime))
{
if (entry.Name.Contains("_TS_"))
{
string[] strFileTokens = entry.Name.Split('_');
sFileslist.Add(strFileTokens[2]);
}
}
string maxValue = sFileslist.Max();
Console.WriteLine(maxValue);
}
This prints the latest file as the output.
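The answer above picks the maximum of the third "_"-separated token of each name, which works only if the file names embed a sortable timestamp. If they do not, a sketch of an alternative is to use the timestamp stored in the zip itself (ZipArchiveEntry.LastWriteTime). That value is set by whatever tool created the zip and is stored in the DOS format, with 2-second precision and no time zone. The entry names and dates below are hypothetical, and an in-memory zip stands in for Data.zip.

```csharp
#nullable enable
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;

// Returns the "_TS_" entry with the most recent LastWriteTime, or null if there is none.
static ZipArchiveEntry? LatestTsEntry(ZipArchive archive) =>
    archive.Entries
        .Where(e => e.Name.Contains("_TS_"))
        .OrderByDescending(e => e.LastWriteTime)
        .FirstOrDefault();

// Stand-in for Data.zip with hypothetical file names and timestamps.
var samples = new (string Name, DateTime Written)[]
{
    ("Data/TS/XXX_TS_a.csv", new DateTime(2023, 1, 1, 10, 0, 0)),
    ("Data/TS/XXX_TS_b.csv", new DateTime(2023, 1, 3, 10, 0, 0)),
    ("Data/TS/XXX_TS_c.csv", new DateTime(2023, 1, 2, 10, 0, 0)),
    ("Data/readme.txt",      new DateTime(2023, 1, 5, 10, 0, 0)), // newest, but not a TS file
};

using var zipBytes = new MemoryStream();
using (var writer = new ZipArchive(zipBytes, ZipArchiveMode.Create, leaveOpen: true))
{
    foreach (var (name, written) in samples)
    {
        ZipArchiveEntry entry = writer.CreateEntry(name);
        entry.LastWriteTime = new DateTimeOffset(written); // must be set before Open() in Create mode
        using var content = new StreamWriter(entry.Open());
        content.Write("sample");
    }
}
zipBytes.Position = 0;

using var dataZip = new ZipArchive(zipBytes, ZipArchiveMode.Read);
ZipArchiveEntry? latest = LatestTsEntry(dataZip);
Console.WriteLine(latest?.Name); // XXX_TS_b.csv
```

With the real blob, the archive would be the one opened from blobClient.OpenReadAsync(...), and only the Where/OrderByDescending/FirstOrDefault part is needed.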
I have embedded PDF files stored inside a blob file. I want to extract those files from my blob.
Below is what I have done so far:
I have made an HTTP-triggered function app,
established a connection with the storage container,
and I am able to fetch the blob.
To get the embedded file, I am using the following code:
namespace PDFDownloader {
public static class Function1 {
[FunctionName("Function1")]
public static async Task<IActionResult> Run([HttpTrigger(AuthorizationLevel.Function, "get", "post", Route = null)] HttpRequest req, ILogger log) {
log.LogInformation($"GetVolumeData function executed at: {DateTime.Now}");
try {
CloudStorageAccount storageAccount = CloudStorageAccount.Parse(Parameter.ConnectionString);
CloudBlobClient cloudBlobClient = storageAccount.CreateCloudBlobClient();
CloudBlobContainer cloudcontainer = cloudBlobClient.GetContainerReference(Parameter.SuccessContainer);
BlobResultSegment resultSegment = await
cloudcontainer.ListBlobsSegmentedAsync(currentToken: null);
IEnumerable <IListBlobItem> blobItems = resultSegment.Results;
string response = "";
int count = 0;
//string blobName = "";
foreach(IListBlobItem item in blobItems) {
var type = item.GetType();
if (type == typeof(CloudBlockBlob)) {
CloudBlockBlob blob = (CloudBlockBlob) item;
count++;
var blobname = blob.Name;
// response = blobname;
response = blob.DownloadTextAsync().Result;
//response = blob.DownloadToStream().Result;
}
}
if (count == 0) {
return new OkObjectResult("Error : File Not Found !!");
} else {
return new OkObjectResult(Convert.ToString(response));
}
} catch(Exception ex) {
log.LogError($"Function Exception Message: {ex.Message}");
return new OkObjectResult(ex.Message.ToString());
} finally {
log.LogInformation($"Function- ENDED ON : {DateTime.Now}");
}
}
}
How can I read the embedded files from my blob response and send them in the HTTP response?
Apart from the fact that your code needs quite some cleanup and that you should read up on the proper use of async, I believe your actual issue is here:
FileStream inputStream = new FileStream(response, FileMode.Open);
The response object contains the text content of the blob you downloaded earlier. The FileStream constructor, however, expects a path to a file. Since you do not have a file here, FileStream is not the right thing to use. Either download the blob to a temp file, or use its content directly, even as a string.
Plus, do yourself a favor and switch to the latest version of the Storage Blob SDK (https://github.com/Azure/azure-sdk-for-net/tree/master/sdk/storage/Azure.Storage.Blobs#downloading-a-blob).
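One caveat for this case: PDFs are binary, and a text download decodes the bytes with a text encoding (UTF-8 is assumed here), which can silently replace byte sequences that are not valid text. The self-contained sketch below uses the first bytes of a typical PDF, a header line followed by a comment line of non-ASCII bytes, and compares a round trip through a UTF-8 string with a round trip through a MemoryStream, the kind of stream DownloadToStreamAsync fills.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

// "%PDF-1.7" header line followed by a comment line with non-ASCII bytes, as PDF
// writers commonly emit to mark the file as binary.
byte[] pdfStart = { 0x25, 0x50, 0x44, 0x46, 0x2D, 0x31, 0x2E, 0x37, 0x0A,
                    0x25, 0xE2, 0xE3, 0xCF, 0xD3, 0x0A };

// Text path: decode as UTF-8 (roughly what a "download as text" call does), then encode back.
string asText = Encoding.UTF8.GetString(pdfStart);
byte[] viaText = Encoding.UTF8.GetBytes(asText);

// Binary path: copy the bytes through a MemoryStream.
using var buffer = new MemoryStream();
buffer.Write(pdfStart, 0, pdfStart.Length);
byte[] viaStream = buffer.ToArray();

Console.WriteLine($"via text intact:   {pdfStart.SequenceEqual(viaText)}");   // False
Console.WriteLine($"via stream intact: {pdfStart.SequenceEqual(viaStream)}"); // True
```

The invalid bytes come back as U+FFFD replacement characters, so anything that parses the PDF afterwards sees a corrupted file; keeping the content as bytes or a stream avoids that.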
Think of one blob as the equivalent of one file.
Check the line:
response = Convert.ToString(blob.DownloadTextAsync().Result);
Is the content of your blob a valid file path?
Maybe you are not using the constructor of the FileStream class, public FileStream (string path, System.IO.FileMode mode), correctly. This constructor can throw many different exceptions; try to find out which one you get.
Also, as recommended in a previous answer, it is worth using the Azure.Storage.Blobs package, based on SDK version 12; right now you are using SDK version 11 (Microsoft.Azure.Storage.Blob).
using Bytescout.PDFExtractor;
var stream1 = await blob.OpenReadAsync(); // read your blob like this
AttachmentExtractor extractor = new AttachmentExtractor();
extractor.RegistrationName = "demo";
extractor.RegistrationKey = "demo";
// Load the PDF document from the blob.
// Note: Bytescout's samples pass a file path to LoadDocumentFromFile; if your SDK version
// has no overload that accepts a Stream, save the blob to a temp file and pass that path.
extractor.LoadDocumentFromFile(stream1);
for (int i = 0; i < extractor.Count; i++)
{
Console.WriteLine("Saving attachment: " + extractor.GetFileName(i));
// Save attachment to file
extractor.Save(i, extractor.GetFileName(i));
Console.WriteLine("File size: " + extractor.GetSize(i));
}
extractor.Dispose();
We are using Parquet.Net to write Parquet files. I've set up a simple schema containing 3 columns and 2 rows:
// Set up the file structure
var UserKey = new Parquet.Data.DataColumn(
new DataField<Int32>("UserKey"),
new Int32[] { 1234, 12345}
);
var AADID = new Parquet.Data.DataColumn(
new DataField<string>("AADID"),
new string[] { Guid.NewGuid().ToString(), Guid.NewGuid().ToString() }
);
var UserLocale = new Parquet.Data.DataColumn(
new DataField<string>("UserLocale"),
new string[] { "en-US", "en-US" }
);
var schema = new Schema(UserKey.Field, AADID.Field, UserLocale.Field);
When using a FileStream to write to a local file, a file is created, and when the code finishes I can see two rows in the file (which is 1 KB afterwards):
using (Stream fileStream = System.IO.File.OpenWrite("C:\\Temp\\Users.parquet")) {
using (var parquetWriter = new ParquetWriter(schema, fileStream)) {
// Create a new row group in the file
using (ParquetRowGroupWriter groupWriter = parquetWriter.CreateRowGroup()) {
groupWriter.WriteColumn(UserKey);
groupWriter.WriteColumn(AADID);
groupWriter.WriteColumn(UserLocale);
}
}
}
Yet when I attempt the same to write to our blob storage, only an empty file is generated, and the data is missing:
// Open reference to Blob Container
CloudAppendBlob blob = OpenBlobFile(blobEndPoint, fileName);
using (MemoryStream stream = new MemoryStream()) {
blob.CreateOrReplaceAsync();
using (var parquetWriter = new ParquetWriter(schema, stream)) {
// Create a new row group in the file
using (ParquetRowGroupWriter groupWriter = parquetWriter.CreateRowGroup()) {
groupWriter.WriteColumn(UserKey);
groupWriter.WriteColumn(AADID);
groupWriter.WriteColumn(UserLocale);
}
// Set stream position to 0
stream.Position = 0;
blob.AppendBlockAsync(stream);
return true;
}
...
public static CloudAppendBlob OpenBlobFile (string blobEndPoint, string fileName) {
CloudBlobContainer container = new CloudBlobContainer(new System.Uri(blobEndPoint));
CloudAppendBlob blob = container.GetAppendBlobReference(fileName);
return blob;
}
Reading the documentation, I would think my implementation of blob.AppendBlockAsync should do the trick, yet I end up with an empty file. Does anyone have suggestions as to why this is and how I can resolve it so I actually end up with data in the file?
Thanks in advance.
The explanation for the file ending up empty is the line:
blob.AppendBlockAsync(stream);
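Note the Async suffix: the method returns a Task, and the append is only guaranteed to have completed once that task is awaited. Without await, the method could return, and dispose the MemoryStream, before the upload finished. I turned the function the code was in into an async one, and Visual Studio suggested the following change to the line: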
_ = await blob.AppendBlockAsync(stream);
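The _ is a C# discard: AppendBlockAsync returns a Task<long>, so awaiting it produces a long, and assigning that to _ says the value is deliberately ignored (which is why hovering over _ shows a long). With this change, the code works as intended. The earlier blob.CreateOrReplaceAsync() call should be awaited for the same reason.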
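The difference between calling and awaiting an async method can be reproduced without Azure. In the sketch below, FakeAppendBlockAsync is a hypothetical stand-in for CloudAppendBlob.AppendBlockAsync: it completes after a short delay and returns a Task<long>. Right after the un-awaited call nothing has been appended yet, which is the situation the original code was in when it returned.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

var appendedBlocks = new List<string>();

// Hypothetical stand-in for CloudAppendBlob.AppendBlockAsync: the work finishes later,
// and the method returns a Task<long>.
async Task<long> FakeAppendBlockAsync(string block)
{
    await Task.Delay(100); // simulated network round trip
    appendedBlocks.Add(block);
    return appendedBlocks.Count;
}

// Calling without await only starts the operation; the next line runs immediately.
Task<long> pending = FakeAppendBlockAsync("block 1");
int countBeforeAwait = appendedBlocks.Count; // 0: nothing has been appended yet

// Awaiting waits for completion; the awaited value is a long.
long countAfterFirst = await pending; // 1

// "_ =" discards the long on purpose, as in `_ = await blob.AppendBlockAsync(stream);`.
_ = await FakeAppendBlockAsync("block 2");

Console.WriteLine($"{countBeforeAwait} -> {countAfterFirst} -> {appendedBlocks.Count}"); // 0 -> 1 -> 2
```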
I want to upload a zip file in small chunks (less than 5 MB) to blob containers in Microsoft Azure Storage. I already configured a 4 MB chunk limit in BlobRequestOptions, but when I run my code and check the memory usage in Azure, it is not uploading in chunks. I am using C# .NET Core. I want to zip files that are already located in Azure, so first I download the individual files to a stream, add each stream to a zip archive, and then upload the zip back to the cloud. The following is my code:
if (CloudStorageAccount.TryParse(_Appsettings.GetSection("StorConf").GetSection("StorageConnection").Value, out CloudStorageAccount storageAccount)) {
CloudBlobClient BlobClient = storageAccount.CreateCloudBlobClient();
TimeSpan backOffPeriod = TimeSpan.FromSeconds(2);
int retryCount = 1;
BlobRequestOptions bro = new BlobRequestOptions() {
SingleBlobUploadThresholdInBytes = 4096 * 1024, // 4MB
ParallelOperationThreadCount = 1,
RetryPolicy = new ExponentialRetry(backOffPeriod, retryCount),
// new
ServerTimeout = TimeSpan.MaxValue,
MaximumExecutionTime = TimeSpan.FromHours(3),
//EncryptionPolicy = policy
};
// set blob request option for created blob client
BlobClient.DefaultRequestOptions = bro;
// using specified container which comes via transaction id
CloudBlobContainer container = BlobClient.GetContainerReference(transaction id);
using(var zipArchiveMemoryStream = new MemoryStream()) {
using(var zipArchive = new ZipArchive(zipArchiveMemoryStream, ZipArchiveMode.Create, true)) // new
{
foreach(FilesListModel FileName in filesList) {
if (await container.ExistsAsync()) {
CloudBlob file = container.GetBlobReference(FileName.FileName);
if (await file.ExistsAsync()) {
// zip: get stream and add zip entry
var entry = zipArchive.CreateEntry(FileName.FileName, CompressionLevel.Fastest);
// approach 1
using(var entryStream = entry.Open()) {
await file.DownloadToStreamAsync(entryStream, null, bro, null);
await entryStream.FlushAsync();
entryStream.Close();
}
} else {
downlReady = "false";
}
} else {
// case: Container does not exist
//return BadRequest("Container does not exist");
}
}
}
if (downlReady == "true") {
string zipFileName = "sample.zip";
CloudBlockBlob zipBlockBlob = container.GetBlockBlobReference(zipFileName);
zipArchiveMemoryStream.Position = 0;
//zipArchiveMemoryStream.Seek(0, SeekOrigin.Begin);
// new
zipBlockBlob.Properties.ContentType = "application/x-zip-compressed";
await zipArchiveMemoryStream.FlushAsync();
await zipBlockBlob.UploadFromStreamAsync(zipArchiveMemoryStream, zipArchiveMemoryStream.Length, null, bro, null);
}
zipArchiveMemoryStream.Close();
}
}
The following snapshot shows the memory usage (see private_Memory) in the Azure Kudu process explorer:
[Screenshot: memory usage]
Any suggestions would be really helpful. Thank you.
UPDATE 1:
To make it clearer: I have files which are already located in Azure Blob Storage. Now I want to read the files from the container and create a ZIP which contains all of them. The major challenge is that my code obviously loads all files into memory to create the zip. Is it possible, and if so how, to read files from a container and write the ZIP file back into the same container in parallel/pieces, so that my Azure web app does NOT need to load whole files into memory? Ideally I would read the files in pieces and start writing the zip right away, so that my Azure web app consumes less memory.
I found the solution by referring to this Stack Overflow question:
How can I dynamically add files to a zip archive stored in Azure blob storage?
The way to do it is to open a write stream on the target blob (OpenWriteAsync) and write the zip into it while reading/downloading the input files, instead of building the whole zip in a MemoryStream first.
Below is my code snippet:
// OpenWriteAsync returns a stream that uploads the blob in blocks as data is written,
// so the complete zip is never held in memory.
using (var zipBlobStream = await zipBlockBlob.OpenWriteAsync(null, bro, null))
using (var zipArchive = new ZipArchive(zipBlobStream, ZipArchiveMode.Create))
{
    foreach (FilesListModel FileName in filesList)
    {
        if (await container.ExistsAsync())
        {
            CloudBlob file = container.GetBlobReference(FileName.FileName);
            if (await file.ExistsAsync())
            {
                // zip: create an entry and stream the blob's content straight into it
                var entry = zipArchive.CreateEntry(FileName.FileName, CompressionLevel.Fastest);
                using (var entryStream = entry.Open())
                {
                    await file.DownloadToStreamAsync(entryStream, null, bro, null);
                }
            }
        }
    }
    // Do not close zipBlobStream here: disposing zipArchive (at the end of this block) writes
    // the zip's central directory first, and disposing the blob stream then commits the upload.
}
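The pattern above can be factored into a small helper and tried locally. In this sketch (WriteZipAsync is a hypothetical name), each source is copied stream-to-stream into its own entry, so only a copy buffer is in memory at once. A MemoryStream stands in for the blob write stream and in-memory files stand in for the source blobs; in the real code, output would be the stream from zipBlockBlob.OpenWriteAsync(...) and each source could, for example, come from the source blob's OpenReadAsync(). The answer's DownloadToStreamAsync(entryStream, ...) achieves the same streaming.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Threading.Tasks;

// Writes each source into its own zip entry by copying stream to stream, so only the
// copy buffer (not a whole file, not the whole zip) needs to be in memory at once.
static async Task WriteZipAsync(Stream output,
    IEnumerable<(string Name, Func<Task<Stream>> OpenSource)> sources)
{
    // Disposing the archive writes the zip's central directory and then disposes `output`.
    using var archive = new ZipArchive(output, ZipArchiveMode.Create, leaveOpen: false);
    foreach (var (name, openSource) in sources)
    {
        ZipArchiveEntry entry = archive.CreateEntry(name, CompressionLevel.Fastest);
        using Stream entryStream = entry.Open();
        using Stream source = await openSource();
        await source.CopyToAsync(entryStream); // copies in chunks
    }
}

// Stand-ins: a MemoryStream instead of the blob write stream, in-memory files as sources.
var zipOutput = new MemoryStream();
var inputFiles = new List<(string Name, Func<Task<Stream>> OpenSource)>
{
    ("a.txt", () => Task.FromResult<Stream>(new MemoryStream(Encoding.UTF8.GetBytes("first file")))),
    ("b.txt", () => Task.FromResult<Stream>(new MemoryStream(Encoding.UTF8.GetBytes("second file")))),
};
await WriteZipAsync(zipOutput, inputFiles);

// MemoryStream.ToArray still works after the stream has been disposed.
using var check = new ZipArchive(new MemoryStream(zipOutput.ToArray()), ZipArchiveMode.Read);
foreach (ZipArchiveEntry e in check.Entries)
    Console.WriteLine($"{e.FullName}: {e.Length} bytes"); // a.txt: 10 bytes, then b.txt: 11 bytes
```

ZipArchive in Create mode only allows one open entry at a time, which is why each entry stream is disposed at the end of its loop iteration before the next entry is created.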
I am trying to read a csv file from my Azure storage account, convert each line into an object, and build a list of those objects.
It keeps failing because it can't find the file (blob not found). The file is there; it is a csv file.
Error:
StorageException: The specified blob does not exist.
BatlGroup.Site.Services.AzureStorageService.AzureFileMethods.ReadCsvFileFromBlobAsync(CloudBlobContainer container, string fileName) in AzureFileMethods.cs
await blob.DownloadToStreamAsync(memoryStream);
public async Task<Stream> ReadCsvFileFromBlobAsync(CloudBlobContainer container, string fileName)
{
// Retrieve reference to a blob (fileName)
var blob = container.GetBlockBlobReference(fileName);
using (var memoryStream = new MemoryStream())
{
//downloads blob's content to a stream
await blob.DownloadToStreamAsync(memoryStream);
return memoryStream;
}
}
I've made sure the file is public. I can download any text file that is stored there, but none of the csv files.
I am also not sure which format to read it into, as I need to iterate through the lines.
I see examples of downloading the whole file to a temp drive and working with it there, but that seems unproductive, since then I could just store the file in the wwwroot folder instead of Azure.
What is the most appropriate way to read a csv file from Azure Storage?
Regarding how to iterate through the lines, after you get the memory stream, you can use StreamReader to read them line by line.
Sample code:
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;
using System;
using System.IO;
namespace ConsoleApp17
{
class Program
{
static void Main(string[] args)
{
string connstr = "your connection string";
CloudStorageAccount storageAccount = CloudStorageAccount.Parse(connstr);
CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
CloudBlobContainer container = blobClient.GetContainerReference("t11");
CloudBlockBlob blockBlob = container.GetBlockBlobReference("students.csv");
string text="";
string temp = "";
using (var memoryStream = new MemoryStream())
{
blockBlob.DownloadToStream(memoryStream);
//remember to set the position to 0 before reading
memoryStream.Position = 0;
using (var reader = new StreamReader(memoryStream))
{
//read the csv file as per line.
while (!reader.EndOfStream && !string.IsNullOrEmpty(temp=reader.ReadLine()))
{
text = text + "***" + temp;
}
}
}
Console.WriteLine(text);
Console.WriteLine("-------");
Console.ReadLine();
}
}
}
[Screenshots: my csv file; the test result]
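The sample above concatenates the lines into one string; since the question asks to turn each line into an object and build a list, here is a sketch of that step. The Name,Age columns are hypothetical (the real file's columns are not shown), results are plain tuples, and the split on commas is naive, so quoted fields containing commas are not handled; a dedicated CSV parser is safer for real data. Unlike the loop above, which stops at the first empty line, this one skips blank lines. The input is a MemoryStream standing in for the one filled by blockBlob.DownloadToStream.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Parses "Name,Age" lines (first line is a header) into tuples.
// Naive comma split: quoted fields that contain commas are not handled.
static List<(string Name, int Age)> ParseStudents(Stream csv)
{
    var students = new List<(string Name, int Age)>();
    using var reader = new StreamReader(csv); // also disposes the stream when done
    reader.ReadLine(); // skip the header row
    string? line;
    while ((line = reader.ReadLine()) != null)
    {
        if (string.IsNullOrWhiteSpace(line)) continue; // skip blank lines instead of stopping
        string[] fields = line.Split(',');
        students.Add((fields[0].Trim(), int.Parse(fields[1].Trim())));
    }
    return students;
}

// Stand-in for the MemoryStream filled by blockBlob.DownloadToStream (Position already 0).
var blobContent = new MemoryStream(Encoding.UTF8.GetBytes("Name,Age\nAlice,30\n\nBob,25\n"));
List<(string Name, int Age)> result = ParseStudents(blobContent);
foreach (var (name, age) in result)
    Console.WriteLine($"{name} is {age}"); // Alice is 30, then Bob is 25
```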