Read a Parquet file from Azure Blob Storage without downloading it locally (C# .NET)

We have a Parquet file (500 MB) stored in Azure Blob Storage. How can we read the file directly from the blob and load it into memory in C#, for example into a DataTable?
I can read a Parquet file that is stored in a local folder using the code below:
using System;
using System.IO;
using System.Linq;
using Parquet;
using Parquet.Data;
public void ReadParquetFile()
{
    using (Stream fileStream = System.IO.File.OpenRead("D:/../userdata1.parquet"))
    {
        using (var parquetReader = new ParquetReader(fileStream))
        {
            DataField[] dataFields = parquetReader.Schema.GetDataFields();
            for (int i = 0; i < parquetReader.RowGroupCount; i++)
            {
                using (ParquetRowGroupReader groupReader = parquetReader.OpenRowGroupReader(i))
                {
                    DataColumn[] columns = dataFields.Select(groupReader.ReadColumn).ToArray();
                    DataColumn firstColumn = columns[0];
                    Array data = firstColumn.Data;
                    //int[] ids = (int[])data;
                }
            }
        }
    }
}
(I can already read CSV files directly from the blob using a source stream.) What is the fastest way to read the Parquet file directly from the blob?

In my experience, the way to read a Parquet file directly from a blob is: first generate the blob URL with a SAS token, then get a response stream from that URL with HttpClient, and finally read the response stream with ParquetReader.
First, refer to the sample code below, taken from the section "Create a service SAS for a blob" of the official document "Create a service SAS for a container or blob with .NET", which uses the Azure Blob Storage SDK for .NET Core.
private static string GetBlobSasUri(CloudBlobContainer container, string blobName, string policyName = null)
{
    string sasBlobToken;
    // Get a reference to a blob within the container.
    // Note that the blob may not exist yet, but a SAS can still be created for it.
    CloudBlockBlob blob = container.GetBlockBlobReference(blobName);
    if (policyName == null)
    {
        // Create a new access policy and define its constraints.
        // Note that the SharedAccessBlobPolicy class is used both to define the parameters of an ad hoc SAS, and
        // to construct a shared access policy that is saved to the container's shared access policies.
        SharedAccessBlobPolicy adHocSAS = new SharedAccessBlobPolicy()
        {
            // When the start time for the SAS is omitted, the start time is assumed to be the time when the storage service receives the request.
            // Omitting the start time for a SAS that is effective immediately helps to avoid clock skew.
            SharedAccessExpiryTime = DateTime.UtcNow.AddHours(24),
            Permissions = SharedAccessBlobPermissions.Read | SharedAccessBlobPermissions.Write | SharedAccessBlobPermissions.Create
        };
        // Generate the shared access signature on the blob, setting the constraints directly on the signature.
        sasBlobToken = blob.GetSharedAccessSignature(adHocSAS);
        Console.WriteLine("SAS for blob (ad hoc): {0}", sasBlobToken);
        Console.WriteLine();
    }
    else
    {
        // Generate the shared access signature on the blob. In this case, all of the constraints for the
        // shared access signature are specified on the container's stored access policy.
        sasBlobToken = blob.GetSharedAccessSignature(null, policyName);
        Console.WriteLine("SAS for blob (stored access policy): {0}", sasBlobToken);
        Console.WriteLine();
    }
    // Return the URI string for the blob, including the SAS token.
    return blob.Uri + sasBlobToken;
}
Then get the HTTP response stream from the SAS URL with HttpClient:
var blobUrlWithSAS = GetBlobSasUri(container, blobName);
var client = new HttpClient();
var stream = await client.GetStreamAsync(blobUrlWithSAS);
Finally, read it with ParquetReader. This code comes from the "Reading Data" section of the GitHub repo aloneguid/parquet-dotnet:
var options = new ParquetOptions { TreatByteArrayAsString = true };
var reader = new ParquetReader(stream, options);
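A fuller sketch of the same idea, loading the result into a DataTable as the question asks. It assumes the Azure.Storage.Blobs (v12) package rather than the older SDK used in GetBlobSasUri, and Parquet.Net 3.x (the synchronous API the question uses; newer major versions changed these calls). Parquet keeps its metadata in the file footer, so ParquetReader generally needs a seekable stream, which an HTTP response stream such as the one from GetStreamAsync usually is not. The sketch therefore buffers the blob in a MemoryStream: nothing is written to disk, but a 500 MB file needs roughly that much RAM. It also assumes a flat schema (no nested or repeated columns). ParquetBlobLoader, LoadAsync and AppendRows are hypothetical names.

```csharp
// Sketch only: assumes the Azure.Storage.Blobs (v12) and Parquet.Net (3.x) NuGet packages.
using System;
using System.Data;
using System.IO;
using System.Linq;
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using Parquet;

public static class ParquetBlobLoader
{
    // Hypothetical helper: downloads the blob behind a SAS URI into memory
    // and copies every row group into a DataTable (flat schemas only).
    public static async Task<DataTable> LoadAsync(Uri blobUriWithSas)
    {
        var blobClient = new BlobClient(blobUriWithSas);

        // ParquetReader seeks to the footer, so buffer into a seekable MemoryStream.
        using var buffer = new MemoryStream();
        await blobClient.DownloadToAsync(buffer);
        buffer.Position = 0;

        var table = new DataTable();
        using var reader = new ParquetReader(buffer, new ParquetOptions { TreatByteArrayAsString = true });
        Parquet.Data.DataField[] fields = reader.Schema.GetDataFields();
        string[] names = fields.Select(f => f.Name).ToArray();

        for (int g = 0; g < reader.RowGroupCount; g++)
        {
            using ParquetRowGroupReader groupReader = reader.OpenRowGroupReader(g);
            Array[] columns = fields.Select(f => groupReader.ReadColumn(f).Data).ToArray();
            AppendRows(table, names, columns);
        }
        return table;
    }

    // Appends column-oriented arrays (one array per column, all the same length)
    // as rows, creating the table's columns on first use. Nulls become DBNull.
    public static void AppendRows(DataTable table, string[] names, Array[] columns)
    {
        if (table.Columns.Count == 0)
        {
            for (int c = 0; c < names.Length; c++)
            {
                Type elementType = columns[c].GetType().GetElementType();
                // DataTable columns cannot be Nullable<T>, so use the underlying type.
                table.Columns.Add(names[c], Nullable.GetUnderlyingType(elementType) ?? elementType);
            }
        }
        int rowCount = columns.Length == 0 ? 0 : columns[0].Length;
        for (int r = 0; r < rowCount; r++)
        {
            DataRow row = table.NewRow();
            for (int c = 0; c < columns.Length; c++)
            {
                row[c] = columns[c].GetValue(r) ?? DBNull.Value;
            }
            table.Rows.Add(row);
        }
    }
}
```

For example, `DataTable table = await ParquetBlobLoader.LoadAsync(new Uri(blobUrlWithSAS));` with the URL returned by GetBlobSasUri. Keeping AppendRows separate from the download makes the DataTable conversion easy to reuse for a local file as well.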

Related

Getting an error when copying a Dropbox URL to Azure Blob Storage

I am using Azure Blob Storage to copy a Dropbox file. When I try to copy the file via its URL, I get a 500 error and TotalBytes is -1.
I am using the StartCopy method from the WindowsAzure.Storage.Blob package, but copyStatus.TotalBytes is -1 and the copy does not work.
I tried all of the following URL variants:
https://dl.dropboxusercontent.com/s/1v9re1dozilpdgi/1_32min.mp4?dl=0
https://dl.dropboxusercontent.com/s/1v9re1dozilpdgi/1_32min.mp4?dl=1
https://www.dropbox.com/s/1v9re1dozilpdgi/1_32min.mp4?dl=0
Can you help me solve this? Does anything need to change in the URL, or is there another way to copy Dropbox media to Azure Blob Storage?
I am using .NET Framework 4.8 with C#.
Sample Code:
string url = "https://dl.dropboxusercontent.com/s/1v9re1dozilpdgi/1_32min.mp4?dl=0";
Uri fileUri = new Uri(url);
string filename = "test-file.mp4";
var account = CloudStorageAccount.Parse(connectionstring);
var blobClient = account.CreateCloudBlobClient();
var container = blobClient.GetContainerReference("test-container");
var blob = container.GetBlockBlobReference(filename);
blob.DeleteIfExists();
blob.StartCopy(fileUri);
var refBlob = (CloudBlockBlob)container.GetBlobReferenceFromServer(filename);
var fileLength = refBlob.CopyState.TotalBytes ?? 0;
while (refBlob.CopyState.Status == CopyStatus.Pending)
{
    refBlob = (CloudBlockBlob)container.GetBlobReferenceFromServer(filename);
    var copyStatus = refBlob.CopyState;
}
Error message: 500 InternalServerError "Copy failed."
We need to use the CloudBlockBlob constructor instead of GetBlockBlobReference, because GetBlockBlobReference is passed the file name, not the URL.
For more information, refer to the SO thread suggested by @Tobias Tengler
and this blog: Azure – Upload and Download data using C#.NET

Trouble downloading Azure Storage blobs with private access in C#

I'm a complete beginner at C#, know very little about the Azure APIs, and am a current CS student doing a project for work. Following YouTube tutorials, I built some middleware that authenticates with a storage account using a connection string and enumerates, uploads, downloads, and deletes blobs within a container. The problem affects ONLY the download functionality, and ONLY when the storage account access is set to private; it works fine with anonymous access. I suspect the issue is with the URL I return, but I'm not sure how to fix it. The blobs are mainly CSV data, if that matters. Any help or pointers to resources would be greatly appreciated 🙂 Here is the relevant code:
URL function:
public async Task<string> GetBlob(string name, string containerName)
{
    var containerClient = _blobClient.GetBlobContainerClient(containerName);
    var blobClient = containerClient.GetBlobClient(name);
    return blobClient.Uri.AbsoluteUri;
}
The config file:
"AllowedHosts": "*",
"BlobConnection" : "<mystringconnection>******==;EndpointSuffix=core.windows.net"
The controller action:
[HttpGet]
public async Task<IActionResult> ViewFile(string name)
{
    var res = await _blobService.GetBlob(name, "<mystorageacc>");
    return Redirect(res);
}
The reason you can't download blobs from a private container is that your method simply returns the blob's URL without any authorization information. Requests to access blobs in a private container must be authorized.
What you need to do is create a Shared Access Signature (SAS) with at least Read permission and return that SAS URL. The method to use is GenerateSasUri. Your code would look something like this:
public async Task<string> GetBlob(string name, string containerName)
{
    var containerClient = _blobClient.GetBlobContainerClient(containerName);
    var blobClient = containerClient.GetBlobClient(name);
    // GenerateSasUri returns a System.Uri, so read AbsoluteUri from it directly.
    return blobClient.GenerateSasUri(BlobSasPermissions.Read, DateTime.UtcNow.AddMinutes(5)).AbsoluteUri;
}
This gives you a link that is valid for 5 minutes from the time of creation and has permission to read (download) the blob.
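GenerateSasUri signs the URL locally with the account key, so it only works when the BlobClient was created with a shared key credential, for example from a connection string that contains AccountKey, as the question's config appears to. A minimal sketch that checks this first via the CanGenerateSasUri property, assuming the same Azure.Storage.Blobs (v12) package; BlobLinks and GetReadOnlyLink are hypothetical names. It is synchronous because signing makes no network call.

```csharp
// Sketch only: assumes the Azure.Storage.Blobs (v12) NuGet package.
using System;
using Azure.Storage.Blobs;
using Azure.Storage.Sas;

public static class BlobLinks
{
    // Hypothetical helper: returns a read-only SAS link that expires after `lifetime`.
    public static string GetReadOnlyLink(BlobServiceClient serviceClient, string containerName, string blobName, TimeSpan lifetime)
    {
        BlobClient blobClient = serviceClient
            .GetBlobContainerClient(containerName)
            .GetBlobClient(blobName);

        // Only clients that hold the account key (shared key credential) can sign a SAS locally.
        if (!blobClient.CanGenerateSasUri)
        {
            throw new InvalidOperationException("This client was not created with a shared key credential.");
        }

        Uri sasUri = blobClient.GenerateSasUri(BlobSasPermissions.Read, DateTimeOffset.UtcNow.Add(lifetime));
        return sasUri.AbsoluteUri;
    }
}
```

The controller action could then `Redirect(BlobLinks.GetReadOnlyLink(serviceClient, "<mystorageacc>", name, TimeSpan.FromMinutes(5)))`, where serviceClient is built from the BlobConnection connection string.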
If you want to download the blob through the blob service:
public async Task<byte[]> ReadFileAsync(string path)
{
    using var ms = new MemoryStream();
    var blob = _client.GetBlobClient(path);
    await blob.DownloadToAsync(ms);
    return ms.ToArray();
}
If you want to return the file's byte array from a controller, see:
https://stackoverflow.com/a/3605510/3024129
If you want to set a blob's public access level, see:
https://learn.microsoft.com/en-us/azure/storage/blobs/anonymous-read-access-configure
Alternatively, you can connect with Azure Storage Explorer, which is the easier route.
This worked for me, returning a byte array:
byte[] base64ImageRepresentation = new byte[] { };
BlobClient blobClient = new BlobClient(blobConnectionString, blobContainerUserDocs, fileName);
if (await blobClient.ExistsAsync())
{
    using var ms = new MemoryStream();
    await blobClient.DownloadToAsync(ms);
    return ms.ToArray();
}
// Falls back to the empty array when the blob does not exist.
return base64ImageRepresentation;

How to correctly serve a file from Azure Storage using a .NET Core controller?

I have a controller in my .NET Core web application that fetches a resource from an Azure Storage account and offers it to the user for download. The user can't access the Azure Storage account directly, so my web app works as a proxy and authenticates the user before serving the file.
My question is whether my implementation is efficient with large files. My concern is that DownloadToStreamAsync() actually fetches the entire file into the web app's memory before serving it.
public async Task<IActionResult> Serve(string path)
{
    MemoryStream streamIn = null;
    CloudFile file = null;
    Stream fileStream = null;
    var filename = Path.GetFileName(path);
    // application-level permission checks
    // fetching file from Azure Storage
    try {
        var storageConnectionString = _azureOptions.AzureStorageAccountConnectionString;
        var storageAccount = CloudStorageAccount.Parse(storageConnectionString);
        var fileClient = storageAccount.CreateCloudFileClient();
        var share = fileClient.GetShareReference(_azureOptions.AzureStorageAccountContentShareName);
        var root = share.GetRootDirectoryReference();
        file = root.GetFileReference(path);
        if (!await file.ExistsAsync())
        {
            return NotFound();
        }
        streamIn = new MemoryStream();
        await file.DownloadToStreamAsync(streamIn);
        fileStream = await file.OpenReadAsync();
    } catch (StorageException e) {
        _logger.LogError($"Error while retrieving content resource: {path}", e);
        return NotFound();
    }
    return File(fileStream, _getContentType(filename));
}
You are right to be concerned: your code downloads the entire file into memory before sending it to the client, which is very inefficient.
Furthermore, you never even use the MemoryStream that you download the file into. Just delete this code:
streamIn = new MemoryStream();
await file.DownloadToStreamAsync(streamIn);
The remaining code then streams the file from Azure to the client as it is being read, instead of buffering it first.

Azure Blob Storage DownloadToStream returning an empty string (ASP.NET)

I am trying to download a file from Azure Blob Storage, using the DownloadToStream method to get the contents of a blob as a text string.
However, I get nothing back but an empty string.
Here is the code I use to connect to the Azure blob container and retrieve the blob:
public static string DownLoadFroalaImageAsString(string blobStorageName, string companyID)
{
    // Retrieve storage account from connection string.
    CloudStorageAccount storageAccount = CloudStorageAccount.Parse(
        CloudConfigurationManager.GetSetting("StorageConnectionString"));
    // Create the blob client.
    CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
    // Retrieve reference to a previously created container.
    CloudBlobContainer container = blobClient.GetContainerReference(companyID.ToLower());
    //retrieving the actual filename of the blob
    string removeString = "BLOB/";
    string trimmedString = blobStorageName.Remove(blobStorageName.IndexOf(removeString), removeString.Length);
    // Retrieve reference to a blob named "trimmedString"
    CloudBlockBlob blockBlob2 = container.GetBlockBlobReference(trimmedString);
    string text;
    using (var memoryStream = new MemoryStream())
    {
        blockBlob2.DownloadToStream(memoryStream);
        text = System.Text.Encoding.UTF8.GetString(memoryStream.ToArray());
    }
    return text;
}
I was following this documentation, but I cannot get it to work. Any help would be greatly appreciated.
Regarding getting nothing back but an empty string: I tested the code you supplied on my side and it works correctly, so I assume the blob's content is empty in your case. You can troubleshoot this in the following ways:
1. Check the Length of memoryStream. If the length is 0, the blob content is empty.
using (var memoryStream = new MemoryStream())
{
    blockBlob2.DownloadToStream(memoryStream);
    var length = memoryStream.Length;
    text = System.Text.Encoding.UTF8.GetString(memoryStream.ToArray());
}
2. Upload a blob with some content to the container (you can do this easily with the Azure portal or Microsoft Azure Storage Explorer) and test your code against that blob.
If you want to get the text from the blob, you can use DownloadTextAsync():
var text = await blockBlob2.DownloadTextAsync();
If you want to return the file stream in an API response, you can use FileStreamResult, which implements IActionResult:
var stream = await blockBlob2.OpenReadAsync();
return File(stream, blockBlob2.Properties.ContentType, "name");

C# MVC Web App Service Connect to Azure Storage Blob

I've got a basic web app in C# MVC (I'm new to MVC) that is connected to a database. The database has a table listing file names; the files themselves are stored in an Azure Storage blob container.
I've used scaffolding (which creates a controller and view) to show data from my table of file names, and that works fine.
Now I would like to link those file names to blob storage so that the user can click a file name and open the file. How do I achieve this?
Do I edit the index view? Do I get the user to click on a filename and then connect to Azure storage to open that file? How is this done?
Please note that the files in storage are private and are accessed using the storage key. The files cannot be made public.
Thanks for any advice.
[Update]
I've implemented the Shared Access Signature (SAS) using the code below.
public static string GetSASUrl(string containerName)
{
    CloudStorageAccount storageAccount = CloudStorageAccount.Parse(CloudConfigurationManager.GetSetting("StorageConnectionString"));
    CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
    CloudBlobContainer container = blobClient.GetContainerReference(containerName);
    BlobContainerPermissions containerPermissions = new BlobContainerPermissions();
    containerPermissions.SharedAccessPolicies.Add("twominutepolicy", new SharedAccessBlobPolicy()
    {
        SharedAccessStartTime = DateTime.UtcNow.AddMinutes(-1),
        SharedAccessExpiryTime = DateTime.UtcNow.AddMinutes(2),
        Permissions = SharedAccessBlobPermissions.Write | SharedAccessBlobPermissions.Read
    });
    containerPermissions.PublicAccess = BlobContainerPublicAccessType.Off;
    container.SetPermissions(containerPermissions);
    string sas = container.GetSharedAccessSignature(new SharedAccessBlobPolicy(), "twominutepolicy");
    return sas;
}
public static string GetSasBlobUrl(string containerName, string fileName, string sas)
{
    // Create new storage credentials using the SAS token.
    StorageCredentials accountSAS = new StorageCredentials(sas);
    // Use these credentials and the account name to create a Blob service client.
    CloudStorageAccount accountWithSAS = new CloudStorageAccount(accountSAS, [Enter Account Name], endpointSuffix: null, useHttps: true);
    CloudBlobClient blobClientWithSAS = accountWithSAS.CreateCloudBlobClient();
    // Retrieve reference to a previously created container.
    CloudBlobContainer container = blobClientWithSAS.GetContainerReference(containerName);
    // Retrieve reference to the blob named fileName.
    CloudBlockBlob blockBlob = container.GetBlockBlobReference(fileName);
    return blockBlob.Uri.AbsoluteUri + sas;
}
To access blobs that are not public, you need to use Shared Access Signatures. With them you create access tokens that are valid for a period of time you choose, and you can also restrict access by IP address.
More information here:
https://learn.microsoft.com/en-us/azure/storage/storage-dotnet-shared-access-signature-part-1
Because the blobs are not public, you need an additional step before passing the data to your view: concatenate the SAS token to the blob URI. You can find a very good example here: http://www.dotnetcurry.com/windows-azure/901/protect-azure-blob-storage-shared-access-signature
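The update in the question calls container.SetPermissions on every request, which replaces the container's stored access policies each time. A possibly simpler alternative is an ad hoc, read-only SAS per blob, concatenated to the blob URI as described above. This is a minimal sketch assuming the same WindowsAzure.Storage (v11) package the question uses; BlobSasLinks and GetReadOnlyBlobUrl are hypothetical names.

```csharp
// Sketch only: assumes the WindowsAzure.Storage (v11) NuGet package used in the question.
using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

public static class BlobSasLinks
{
    // Hypothetical helper: builds "<blob URI>?<SAS token>" for one blob with an ad hoc,
    // read-only SAS, without touching the container's stored access policies.
    public static string GetReadOnlyBlobUrl(CloudStorageAccount account, string containerName, string fileName, TimeSpan lifetime)
    {
        CloudBlobClient blobClient = account.CreateCloudBlobClient();
        CloudBlobContainer container = blobClient.GetContainerReference(containerName);
        CloudBlockBlob blob = container.GetBlockBlobReference(fileName);

        var policy = new SharedAccessBlobPolicy
        {
            // Backdate the start slightly to tolerate clock skew, as the question's policy does.
            SharedAccessStartTime = DateTimeOffset.UtcNow.AddMinutes(-1),
            SharedAccessExpiryTime = DateTimeOffset.UtcNow.Add(lifetime),
            Permissions = SharedAccessBlobPermissions.Read
        };

        // The token is signed locally with the account key and already starts with "?".
        string sasToken = blob.GetSharedAccessSignature(policy);
        return blob.Uri.AbsoluteUri + sasToken;
    }
}
```

In the index view, each file name could then link to the URL this returns, generated per request so each link expires on its own schedule.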
