Extract embedded files from Azure blob in C#

I have PDF files embedded inside a blob, and I want to extract those files from the blob.
This is what I have done so far:
I have created an HTTP-triggered function app,
established a connection to the storage container,
and I am able to fetch the blob.
To get the embedded files I am using the following code:
namespace PDFDownloader {
    public static class Function1 {
        [FunctionName("Function1")]
        public static async Task<IActionResult> Run(
            [HttpTrigger(AuthorizationLevel.Function, "get", "post", Route = null)] HttpRequest req,
            ILogger log) {
            log.LogInformation($"GetVolumeData function executed at: {DateTime.Now}");
            try {
                CloudStorageAccount storageAccount = CloudStorageAccount.Parse(Parameter.ConnectionString);
                CloudBlobClient cloudBlobClient = storageAccount.CreateCloudBlobClient();
                CloudBlobContainer cloudcontainer = cloudBlobClient.GetContainerReference(Parameter.SuccessContainer);
                BlobResultSegment resultSegment = await cloudcontainer.ListBlobsSegmentedAsync(currentToken: null);
                IEnumerable<IListBlobItem> blobItems = resultSegment.Results;
                string response = "";
                int count = 0;
                foreach (IListBlobItem item in blobItems) {
                    // Only block blobs are processed; other item types are skipped.
                    if (item is CloudBlockBlob blob) {
                        count++;
                        // Await the download instead of blocking on .Result.
                        response = await blob.DownloadTextAsync();
                    }
                }
                if (count == 0) {
                    return new OkObjectResult("Error : File Not Found !!");
                }
                return new OkObjectResult(response);
            } catch (Exception ex) {
                log.LogError($"Function Exception Message: {ex.Message}");
                return new OkObjectResult(ex.Message);
            } finally {
                log.LogInformation($"Function- ENDED ON : {DateTime.Now}");
            }
        }
    }
}
How can I read the embedded files from my blob response and send them over HTTP?

Apart from the fact that your code needs quite some cleanup and that you should read up on the proper use of async, I believe your actual issue is here:
FileStream inputStream = new FileStream(response, FileMode.Open);
The response object contains the text content of the blob that you downloaded earlier. The FileStream constructor, however, expects a path to a file. Since you do not have a file here, FileStream is not the right thing to use. Either download the blob to a temporary file or read its content directly into memory (for example as a string).
Plus, do yourself a favor and switch to the latest version of the Storage Blob SDK (https://github.com/Azure/azure-sdk-for-net/tree/master/sdk/storage/Azure.Storage.Blobs#downloading-a-blob).
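To illustrate the point about FileStream, here is a minimal sketch of the two options using only the base class library. The byte array is a made-up stand-in for content already downloaded from a blob, and ReadHeader is a hypothetical helper; none of this is the asker's actual code. For binary content such as a PDF, working with bytes or a stream is usually safer than downloading the blob as text.

```csharp
using System;
using System.IO;
using System.Text;

// Stand-in for bytes already downloaded from a blob (hypothetical content).
byte[] downloaded = Encoding.ASCII.GetBytes("%PDF-1.7 example bytes");

// new FileStream(path, FileMode.Open) treats its first argument as a file path.
// For content that is already in memory, wrap the bytes in a MemoryStream instead.
using var inMemory = new MemoryStream(downloaded, writable: false);
Console.WriteLine(ReadHeader(inMemory)); // prints "%PDF-"

// Alternative: persist the bytes to a temporary file first, then open that path.
string tempPath = Path.GetTempFileName();
try
{
    File.WriteAllBytes(tempPath, downloaded);
    using var fromDisk = new FileStream(tempPath, FileMode.Open, FileAccess.Read);
    Console.WriteLine(ReadHeader(fromDisk)); // prints "%PDF-"
}
finally
{
    File.Delete(tempPath);
}

// Reads up to five bytes, which for a PDF are the "%PDF-" signature.
static string ReadHeader(Stream stream)
{
    var buffer = new byte[5];
    int read = stream.Read(buffer, 0, buffer.Length);
    return Encoding.ASCII.GetString(buffer, 0, read);
}
```

Either stream can then be handed to whatever library parses the PDF, as long as that library accepts a Stream.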

Think of one blob as the equivalent of one file.
Check this line:
response = Convert.ToString(blob.DownloadTextAsync().Result);
Is the content of your blob a valid file path?
Maybe you are not using the FileStream constructor correctly: public FileStream (string path, System.IO.FileMode mode). This constructor can throw many different exceptions; try to find out which one you are getting.
Also, as recommended in a previous answer, it is worth switching to the Azure.Storage.Blobs package (SDK version 12); you are currently using SDK version 11 (Microsoft.Azure.Storage.Blob).

using Bytescout.PDFExtractor;
// Read your blob like this
var stream1 = await blob.OpenReadAsync();
AttachmentExtractor extractor = new AttachmentExtractor();
extractor.RegistrationName = "demo";
extractor.RegistrationKey = "demo";
// Load sample PDF document
extractor.LoadDocumentFromFile(stream1);
for (int i = 0; i < extractor.Count; i++)
{
    Console.WriteLine("Saving attachment: " + extractor.GetFileName(i));
    // Save attachment to file
    extractor.Save(i, extractor.GetFileName(i));
    Console.WriteLine("File size: " + extractor.GetSize(i));
}
extractor.Dispose();

Related

Upload a zip file in small chunks to azure cloud blob storage

I want to upload a zip file in small chunks (less than 5 MB) to blob containers in Microsoft Azure Storage. I already configured a 4 MB chunk limit in BlobRequestOptions, but when I run my code and check the memory usage in Azure, it is not uploading in chunks. I am using C# .NET Core. I want to zip files that are already located in Azure, so I first download the individual files to a stream, add each stream to a zip archive, and then upload the zip back to the cloud. The following is my code:
if (CloudStorageAccount.TryParse(_Appsettings.GetSection("StorConf").GetSection("StorageConnection").Value, out CloudStorageAccount storageAccount)) {
CloudBlobClient BlobClient = storageAccount.CreateCloudBlobClient();
TimeSpan backOffPeriod = TimeSpan.FromSeconds(2);
int retryCount = 1;
BlobRequestOptions bro = new BlobRequestOptions() {
SingleBlobUploadThresholdInBytes = 4096 * 1024, // 4MB
ParallelOperationThreadCount = 1,
RetryPolicy = new ExponentialRetry(backOffPeriod, retryCount),
// new
ServerTimeout = TimeSpan.MaxValue,
MaximumExecutionTime = TimeSpan.FromHours(3),
//EncryptionPolicy = policy
};
// set blob request option for created blob client
BlobClient.DefaultRequestOptions = bro;
// using specified container which comes via transaction id
CloudBlobContainer container = BlobClient.GetContainerReference(transactionId);
using(var zipArchiveMemoryStream = new MemoryStream()) {
using(var zipArchive = new ZipArchive(zipArchiveMemoryStream, ZipArchiveMode.Create, true)) // new
{
foreach(FilesListModel FileName in filesList) {
if (await container.ExistsAsync()) {
CloudBlob file = container.GetBlobReference(FileName.FileName);
if (await file.ExistsAsync()) {
// zip: get stream and add zip entry
var entry = zipArchive.CreateEntry(FileName.FileName, CompressionLevel.Fastest);
// approach 1
using(var entryStream = entry.Open()) {
await file.DownloadToStreamAsync(entryStream, null, bro, null);
await entryStream.FlushAsync();
entryStream.Close();
}
} else {
downlReady = "false";
}
} else {
// case: Container does not exist
//return BadRequest("Container does not exist");
}
}
}
if (downlReady == "true") {
string zipFileName = "sample.zip";
CloudBlockBlob zipBlockBlob = container.GetBlockBlobReference(zipFileName);
zipArchiveMemoryStream.Position = 0;
//zipArchiveMemoryStream.Seek(0, SeekOrigin.Begin);
// new
zipBlockBlob.Properties.ContentType = "application/x-zip-compressed";
await zipArchiveMemoryStream.FlushAsync();
await zipBlockBlob.UploadFromStreamAsync(zipArchiveMemoryStream, zipArchiveMemoryStream.Length, null, bro, null);
}
zipArchiveMemoryStream.Close();
}
}
The following is a snapshot of the memory usage (see private_Memory) in the Azure Kudu process explorer:
[image: memory usage]
Any suggestions would be really helpful. Thank you.
UPDATE 1:
To make it clearer: I have files which are already located in Azure blob storage. I want to read the files from the container and create a ZIP which contains all of them. The major challenge is that my code obviously loads all files into memory to create the zip. Is it possible, and if so how, to read files from a container and write the ZIP file back into the same container in parallel/in pieces, so that my Azure web app does NOT need to load the whole files into memory? Ideally I would read the files in pieces and start writing the zip right away, so that my Azure web app consumes less memory.
I found the solution by referring to this Stack Overflow question:
How can I dynamically add files to a zip archive stored in Azure blob storage?
The way to do it is to write to the zip stream while reading/downloading the input files, instead of buffering everything first.
Below is my code snippet:
// Open the target blob for writing; the zip is streamed into it instead of into a MemoryStream.
using (var zipBlobStream = await zipBlockBlob.OpenWriteAsync(null, bro, null))
using (var zipArchive = new ZipArchive(zipBlobStream, ZipArchiveMode.Create))
{
    if (await container.ExistsAsync())
    {
        foreach (FilesListModel FileName in filesList)
        {
            CloudBlob file = container.GetBlobReference(FileName.FileName);
            if (await file.ExistsAsync())
            {
                // zip: get stream and add zip entry
                var entry = zipArchive.CreateEntry(FileName.FileName, CompressionLevel.Fastest);
                using (var entryStream = entry.Open())
                {
                    await file.DownloadToStreamAsync(entryStream, null, bro, null);
                }
            }
        }
    }
    // No explicit Close() here: disposing zipArchive writes the zip central directory
    // and then closes the blob stream, so the stream must still be open at that point.
}
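The same idea can be tried out locally without any storage account. The following is only a sketch under assumptions: the MemoryStream named output stands in for the blob write stream, the dictionary of in-memory sources stands in for the blobs being downloaded, and WriteZip is a hypothetical helper. It shows each input being copied straight into its zip entry, so only the copy buffer is held in memory per file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical in-memory inputs standing in for blobs opened for reading.
var inputs = new Dictionary<string, Func<Stream>>
{
    ["a.txt"] = () => new MemoryStream(Encoding.UTF8.GetBytes("first file")),
    ["b.txt"] = () => new MemoryStream(Encoding.UTF8.GetBytes("second file")),
};

// Stand-in for the stream returned by opening the target blob for writing.
var output = new MemoryStream();
WriteZip(output, inputs);
Console.WriteLine($"Zip size: {output.Length} bytes");

static void WriteZip(Stream destination, IReadOnlyDictionary<string, Func<Stream>> sources)
{
    // leaveOpen: true only so the caller can inspect the result afterwards.
    // Disposing the archive is what writes the zip central directory.
    using var archive = new ZipArchive(destination, ZipArchiveMode.Create, leaveOpen: true);
    foreach (var (name, open) in sources)
    {
        ZipArchiveEntry entry = archive.CreateEntry(name, CompressionLevel.Fastest);
        using Stream source = open();
        using Stream entryStream = entry.Open();
        // Copies in 80 KB chunks; the whole file is never buffered by this code.
        source.CopyTo(entryStream, 81920);
    }
}
```

In the blob scenario the copy step would be the DownloadToStreamAsync call from the snippet above, and the destination would be the blob write stream rather than a MemoryStream.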

Is there a way to get a file stream to download to the browser in Blazor?

I'm trying out a few things with Blazor and I'm still new to it. I'm trying to get a file stream to download to the browser. What's the best way to download a file from Blazor to browser?
I've tried using a method in my razor view that returns a stream but that didn't work.
// In my Blazor view
@code {
    private FileStream Download()
    {
        // get path + file name
        var file = @"c:\path\to\my\file\test.txt";
        var stream = new FileStream(file, FileMode.OpenOrCreate);
        return stream;
    }
}
The code above doesn't give me anything, not even an error.
Another solution is to add a simple api controller endpoint using endpoints.MapControllerRoute. This will work only with server side blazor though.
Ex:
endpoints.MapBlazorHub();
endpoints.MapControllerRoute("mvc", "{controller}/{action}");
endpoints.MapFallbackToPage("/_Host");
Then add a controller. For example:
public class InvoiceController : Controller
{
[HttpGet("~/invoice/{sessionId}")]
public async Task<IActionResult> Invoice(string sessionId, CancellationToken cancel)
{
return File(...);
}
}
Usage in a .razor file:
async Task GetInvoice()
{
...
Navigation.NavigateTo($"/invoice/{orderSessionId}", true);
}
Although the above answer is technically correct, NavigationManager won't work if you need to pass in a model (POST). In that case you will most likely end up using HttpClient. If so, wrap the response stream (for example, the result of response.Content.ReadAsStreamAsync()) in a DotNetStreamReference instance: new DotNetStreamReference(stream). This gives the JavaScript side a ReadableStream, from which you can then create the blob. Keep in mind that DotNetStreamReference was only recently introduced, with .NET 6 RC1. As of now it is the most efficient way. Otherwise, you can use the fetch API and create a blob from the response.
I wound up doing it a different way that does not need NavigationManager. It was partially taken from the Microsoft Docs. In my case I needed to render an Excel file (using EPPlus), but that is irrelevant; I just needed to return a Stream to get my result.
On my Blazor page or component when a button is clicked:
public async Task GenerateFile()
{
var fileStream = ExcelExportService.GetExcelStream(exportModel);
using var streamRef = new DotNetStreamReference(stream: fileStream);
await jsRuntime.InvokeVoidAsync("downloadFileFromStream", "Actual File Name.xlsx", streamRef);
}
The GetExcelStream is the following:
public static Stream GetExcelStream(ExportModel exportModel)
{
var result = new MemoryStream();
ExcelPackage.LicenseContext = LicenseContext.Commercial;
var fileName = @$"Gets Overwritten";
using (var package = new ExcelPackage(fileName))
{
var sheet = package.Workbook.Worksheets.Add(exportModel.SomeUsefulName);
var rowIndex = 1;
foreach (var dataRow in exportModel.Rows)
{
...
// Add rows and cells to the worksheet
...
}
sheet.Cells.AutoFitColumns();
package.SaveAs(result);
}
result.Position = 0; // This is required or no data is in result
return result;
}
This JavaScript comes from the same Microsoft Docs page; I am adding it here because it is the only other thing I needed.
window.downloadFileFromStream = async (fileName, contentStreamReference) => {
const arrayBuffer = await contentStreamReference.arrayBuffer();
const blob = new Blob([arrayBuffer]);
const url = URL.createObjectURL(blob);
const anchorElement = document.createElement("a");
anchorElement.href = url;
anchorElement.download = fileName ?? "";
anchorElement.click();
anchorElement.remove();
URL.revokeObjectURL(url);
}

how to create folder in blob storage

I have a file such as Parent.zip and when unzipped, it will yield these files: child1.jpg , child2.txt , child3.pdf.
When running Parent.zip through the function below, the files are correctly unzipped to:
some-container/child1.jpg
some-container/child2.txt
some-container/child3.pdf
How do I unzip the files to their parent folder? The desired result would be:
some-container/Parent/child1.jpg
some-container/Parent/child2.txt
some-container/Parent/child3.pdf
As you can see above the folder Parent was created.
I am using this to create the files in blob:
using (var stream = entry.Open ()) {
//check for file or folder and update the above blob reference with actual content from stream
if (entry.Length > 0) {
await blob.UploadFromStreamAsync (stream);
}
}
Here's the full source:
[FunctionName ("OnUnzipHttpTriggered")]
public static async Task<IActionResult> Run (
[HttpTrigger (AuthorizationLevel.Function, "get", "post", Route = null)] HttpRequest req,
ILogger log) {
log.LogInformation ("C# HTTP trigger function processed a request.");
var requestBody = await new StreamReader (req.Body).ReadToEndAsync ();
var data = JsonConvert.DeserializeObject<ZipFileMetaData> (requestBody);
var storageAccount =
CloudStorageAccount.Parse (Environment.GetEnvironmentVariable ("StorageConnectionString"));
var blobClient = storageAccount.CreateCloudBlobClient ();
var container = blobClient.GetContainerReference (data.SourceContainer);
var blockBlob = container.GetBlockBlobReference (data.FileName);
var extractcontainer = blockBlob.ServiceClient.GetContainerReference (data.DestinationContainer.ToLower ());
await extractcontainer.CreateIfNotExistsAsync ();
var files = new List<string> ();
// Save blob(zip file) contents to a Memory Stream.
using (var zipBlobFileStream = new MemoryStream ()) {
await blockBlob.DownloadToStreamAsync (zipBlobFileStream);
await zipBlobFileStream.FlushAsync ();
zipBlobFileStream.Position = 0;
//use ZipArchive from System.IO.Compression to extract all the files from zip file
using (var zip = new ZipArchive (zipBlobFileStream)) {
//Each entry here represents an individual file or a folder
foreach (var entry in zip.Entries) {
files.Add (entry.FullName);
//create a block blob reference with the same name as the zip entry
var blob = extractcontainer.GetBlockBlobReference (entry.FullName);
using (var stream = entry.Open ()) {
//check for file or folder and update the above blob reference with actual content from stream
if (entry.Length > 0) {
await blob.UploadFromStreamAsync (stream);
}
}
// TO-DO : Process the file (Blob)
//process the file here (blob) or you can write another process later
//to reference each of these files(blobs) on all files got extracted to other container.
}
}
}
return new OkObjectResult (files);
}
Simply prepend the directory name to the entry name; the directory then appears automatically.
var blob = extractcontainer.GetBlockBlobReference ("Parent/"+entry.FullName);
Note that the directory is virtual. Blob Storage has a flat container/blob structure: directories are really just prefixes of blob names, and the Storage service displays a directory structure based on the / separator.
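If the folder name should follow the zip file name rather than being hard-coded as "Parent/", the prefix can be derived from it. This is a small sketch, not part of the original function: BuildBlobName is a hypothetical helper, and the backslash normalization is an assumption for archives whose entry names use Windows-style separators.

```csharp
using System;
using System.IO;

Console.WriteLine(BuildBlobName("Parent.zip", "child1.jpg"));       // prints "Parent/child1.jpg"
Console.WriteLine(BuildBlobName("Parent.zip", @"docs\child3.pdf")); // prints "Parent/docs/child3.pdf"

// Hypothetical helper: builds a blob name whose "folder" is the zip file name
// without its extension. The folder is only a name prefix, not a real directory.
static string BuildBlobName(string zipFileName, string entryFullName)
{
    string folder = Path.GetFileNameWithoutExtension(zipFileName);
    // Blob names use '/' as the virtual directory separator.
    string normalized = entryFullName.Replace('\\', '/').TrimStart('/');
    return $"{folder}/{normalized}";
}
```

The result would then be passed to GetBlockBlobReference in place of "Parent/"+entry.FullName, with data.FileName as the first argument.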

Azure storage not finding csv file

I am trying to read a CSV file from my Azure storage account, convert each line into an object, and build a list of those objects.
It keeps failing because it can't find the file (blob not found). The file is there, and it is a CSV file.
Error:
StorageException: The specified blob does not exist.
BatlGroup.Site.Services.AzureStorageService.AzureFileMethods.ReadCsvFileFromBlobAsync(CloudBlobContainer container, string fileName) in AzureFileMethods.cs
await blob.DownloadToStreamAsync(memoryStream);
public async Task<Stream> ReadCsvFileFromBlobAsync(CloudBlobContainer container, string fileName)
{
// Retrieve reference to a blob (fileName)
var blob = container.GetBlockBlobReference(fileName);
using (var memoryStream = new MemoryStream())
{
//downloads blob's content to a stream
await blob.DownloadToStreamAsync(memoryStream);
return memoryStream;
}
}
I've made sure the file is public. I can download any text file that is stored there, but none of the CSV files.
I am also not sure what format to read it in, as I need to iterate through the lines.
I see examples that bring the whole file down to a temp drive and work with it there, but that seems unproductive: I could then just store the file in the wwwroot folder instead of Azure.
What is the most appropriate way to read a CSV file from Azure storage?
Regarding how to iterate through the lines, after you get the memory stream, you can use StreamReader to read them line by line.
Sample code as below:
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;
using System;
using System.IO;
namespace ConsoleApp17
{
class Program
{
static void Main(string[] args)
{
string connstr = "your connection string";
CloudStorageAccount storageAccount = CloudStorageAccount.Parse(connstr);
CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
CloudBlobContainer container = blobClient.GetContainerReference("t11");
CloudBlockBlob blockBlob = container.GetBlockBlobReference("students.csv");
string text="";
string temp = "";
using (var memoryStream = new MemoryStream())
{
blockBlob.DownloadToStream(memoryStream);
//remember set the position to 0
memoryStream.Position = 0;
using (var reader = new StreamReader(memoryStream))
{
//read the csv file as per line.
while (!reader.EndOfStream && !string.IsNullOrEmpty(temp=reader.ReadLine()))
{
text = text + "***" + temp;
}
}
}
Console.WriteLine(text);
Console.WriteLine("-------");
Console.ReadLine();
}
}
}
[image: my csv file]
[image: the test result]
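Since the goal in the question is to turn each line into an object, here is a minimal sketch of the reading step that returns rows instead of one concatenated string. It assumes the download has already filled a stream; the MemoryStream and its sample data are made up, and ReadRows is a hypothetical helper. Note that the stream has to stay open until reading is finished, so a MemoryStream returned from inside a using block, as in the question's method, would already be disposed by the time the caller reads it.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Stand-in for a stream that a blob download has just written into (hypothetical data).
var downloaded = new MemoryStream();
byte[] csv = Encoding.UTF8.GetBytes("name,age\nAnn,30\nBob,25\n");
downloaded.Write(csv, 0, csv.Length);

foreach (string[] fields in ReadRows(downloaded))
{
    Console.WriteLine(string.Join(" | ", fields));
}

static List<string[]> ReadRows(Stream stream)
{
    // After a download the position is at the end, so rewind before reading.
    stream.Position = 0;
    var rows = new List<string[]>();
    using var reader = new StreamReader(stream, Encoding.UTF8, true, 1024, leaveOpen: true);
    while (reader.ReadLine() is string line)
    {
        if (line.Length == 0) continue; // skip blank lines
        // Naive split: does not handle quoted fields that contain commas.
        rows.Add(line.Split(','));
    }
    return rows;
}
```

Each string[] can then be mapped to whatever object the application needs; for CSV files with quoted fields a dedicated parser would be a better fit than Split.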

Upload same documents to Dropbox .NET SDK

I have a web API function that successfully uploads files to Dropbox (using their new .NET SDK) and then gets shared links to the uploaded files (one document at a time).
private async Task<string> Upload(DropboxClient dbx, string localPath, string remotePath)
{
const int ChunkSize = 4096 * 1024;
using (var fileStream = File.Open(localPath, FileMode.Open))
{
if (fileStream.Length <= ChunkSize)
{
WriteMode mode = new WriteMode();
FileMetadata fileMetadata = await dbx.Files.UploadAsync(remotePath, body: fileStream, mode: mode.AsAdd, autorename: true);
//set the expiry date
var settings = new SharedLinkSettings(expires: DateTime.Today.AddDays(7));
SharedLinkMetadata sharedLinkMetadata = await dbx.Sharing.CreateSharedLinkWithSettingsAsync(fileMetadata.PathLower, settings);
return sharedLinkMetadata.Url;
}
else
{
await this.ChunkUpload(dbx, remotePath, fileStream, ChunkSize);
}
return "error";
}
}
That usually works fine, but when I upload the exact same document (name and content) twice, nothing happens, and I do need to have both files stored in my Dropbox account.
There can be as many identical documents as needed (not only two); ideally the second document (and the third, etc.) would be automatically renamed and uploaded to Dropbox.
Any idea how to accomplish that?
I am posting the answer in case it helps someone; it took me a long time to figure it out.
This is the code that checks whether a file already exists in Dropbox.
If the file exists, it checks whether a link was already shared for this file and, based on the result, either retrieves the existing shared link or generates a new one.
private async Task<string> Upload(DropboxClient dbx, string localPath, string remotePath)
{
const int ChunkSize = 4096 * 1024;
using (var fileStream = File.Open(localPath, FileMode.Open))
{
if (fileStream.Length <= ChunkSize)
{
WriteMode mode = new WriteMode();
FileMetadata fileMetadata = await dbx.Files.UploadAsync(remotePath, body: fileStream, mode: mode.AsAdd, autorename: true);
// Check whether the uploaded path refers to a file
var existingDoc = await dbx.Files.GetMetadataAsync(remotePath);
if (existingDoc.IsFile)
{
// Reuse an existing shared link if there is one
ListSharedLinksResult listSharedLinksResult = await dbx.Sharing.ListSharedLinksAsync(remotePath);
if (listSharedLinksResult.Links.Count > 0)
{
return listSharedLinksResult.Links[0].Url;
}
else
{
var sharedLinkSettings = new SharedLinkSettings(expires: DateTime.Today.AddDays(7));
SharedLinkMetadata sharedLinkMetadata = await dbx.Sharing.CreateSharedLinkWithSettingsAsync(remotePath, sharedLinkSettings);
return sharedLinkMetadata.Url;
}
}
else
{
var settings = new SharedLinkSettings(expires: DateTime.Today.AddDays(7));
SharedLinkMetadata sharedLinkMetadata = await dbx.Sharing.CreateSharedLinkWithSettingsAsync(fileMetadata.PathLower, settings);
return sharedLinkMetadata.Url;
}
}
else
{
var sharedLink = await this.ChunkUpload(dbx, remotePath, fileStream, ChunkSize);
return sharedLink;
}
}
}
When you upload the same exact content to the same path again, the Dropbox API won't produce a conflict or another copy of the file, as nothing changed. (Edit: you can force a conflict even for identical contents by using strictConflict=true, e.g., on UploadAsync).
If you want another copy of the same data in your account, you can specify the different desired path when calling UploadAsync the second time.
Or, more efficiently, you can use CopyAsync to make a copy of the file already on Dropbox.
