How to search Blob Container's file content? - c#

We store our logs in a Blob Container and create an individual JSON file for each action, for example 12345.json:
{"User":"User1","Location":"LOC","timestamp":"2023-01-10T10:34:43.5470187+00:00","Id":"12345"}
I want to return all the data where User = User1.
I can use BlobServiceClient to connect to the Blob storage account and retrieve all the JSON files. I assume I could read each JSON file and do some filtering, but is there a better way to do this?
My ultimate goal is to create an endpoint that accepts a list of keywords and a date range and then returns the corresponding results.

If you just want to use Blob Storage only, the option would be to first list all blobs in the container and then search inside each blob using Query Blob Contents (that link is to the REST API documentation; check for the equivalent method in the SDK).
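In the v12 .NET SDK (Azure.Storage.Blobs), the equivalent is BlockBlobClient.Query/QueryAsync. Below is only a minimal sketch, not a definitive implementation: it assumes query acceleration is available on your account and that each log file is a single JSON document, and the exact filter syntax should be checked against the query acceleration SQL reference; the connection string and container name are placeholders.

using System;
using System.IO;
using Azure;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;
using Azure.Storage.Blobs.Specialized;

string connectionString = "<storage-connection-string>";
var container = new BlobContainerClient(connectionString, "logs");

await foreach (BlobItem item in container.GetBlobsAsync())
{
    BlockBlobClient blob = container.GetBlockBlobClient(item.Name);

    var options = new BlobQueryOptions
    {
        // Parse the blob as JSON and return JSON.
        InputTextConfiguration = new BlobQueryJsonTextOptions(),
        OutputTextConfiguration = new BlobQueryJsonTextOptions()
    };

    // The filter runs server-side; only matching records come over the wire.
    Response<BlobDownloadInfo> result = await blob.QueryAsync(
        "SELECT * FROM BlobStorage WHERE User = 'User1'", options);

    using var reader = new StreamReader(result.Value.Content);
    string matches = await reader.ReadToEndAsync();
    if (!string.IsNullOrWhiteSpace(matches))
        Console.WriteLine($"{item.Name}: {matches}");
}

Note that this still issues one request per blob, which is part of why the indexer approach below scales better.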
The other option (a much better one, IMO) would be to use Azure Cognitive Search and create a blob indexer: have the contents of the blob container indexed by Azure Cognitive Search and then search over that indexed data.
You can learn more about using Azure Cognitive Search with Blob Storage here: https://learn.microsoft.com/en-us/azure/search/search-blob-storage-integration. For working with JSON data in Blob Storage, please see this: https://learn.microsoft.com/en-us/azure/search/search-howto-index-json-blobs.
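If you go the Cognitive Search route, the data source and JSON-parsing indexer can also be created from C#. This is only a sketch, assuming the Azure.Search.Documents SDK; the endpoint, key, and the "logs-*" names are placeholders, and the target index "logs-index" (with fields matching the JSON properties) is assumed to already exist.

using System;
using Azure;
using Azure.Search.Documents.Indexes;
using Azure.Search.Documents.Indexes.Models;

var indexerClient = new SearchIndexerClient(
    new Uri("https://<your-search-service>.search.windows.net"),
    new AzureKeyCredential("<admin-api-key>"));

// Point a data source at the container holding the JSON logs.
var dataSource = new SearchIndexerDataSourceConnection(
    "logs-datasource",
    SearchIndexerDataSourceType.AzureBlob,
    "<storage-connection-string>",
    new SearchIndexerDataContainer("logs"));
await indexerClient.CreateOrUpdateDataSourceConnectionAsync(dataSource);

// "json" parsing mode maps each blob to one search document.
var indexer = new SearchIndexer("logs-indexer", "logs-datasource", "logs-index")
{
    Parameters = new IndexingParameters
    {
        IndexingParametersConfiguration = new IndexingParametersConfiguration
        {
            ParsingMode = BlobIndexerParsingMode.Json
        }
    }
};
await indexerClient.CreateOrUpdateIndexerAsync(indexer);

Once the indexer has run, the endpoint you describe can query the index with SearchClient, filtering on User and the timestamp range instead of touching the blobs at all.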

Related

Use an existing blob in a Media Services Asset (API v3)

I have large media files in Azure Storage block blobs, and I would like to encode them with the Azure Media Services API v3.
I've found the equivalent in API v2, Copying existing blobs into a Media Services Asset, but v2 is obsolete and will be retired soon.
Where can I find an example of creating an Asset from an existing blob? All the examples I can find (including the official SDK v3 tutorials) use small local videos uploaded directly.
Also, it's not clear whether in v3 I still need to copy my blob into an asset as in v2, or whether I can use a blob with an asset as long as Media Services uses the same storage account (because, as the v2-to-v3 migration guide says, AssetFiles no longer exist in v3).
I have a pretty extensive sample of copying content from a storage account, encoding it with AMS, and delivering it back to the same location in this Node.js/TypeScript sample:
https://github.com/Azure-Samples/media-services-v3-node-tutorials/tree/main/VideoEncoding/Encoding_Bulk_Remote_Storage_Account_SAS
Take a look there first and tell me if that is what you need. There are a number of helper functions I use with the storage blob SDK in the Common folder here:
https://github.com/Azure-Samples/media-services-v3-node-tutorials/tree/main/Common
Keep in mind that the workflow for remote assets can be achieved in a couple of ways in v3 (a C# sketch of the third follows the JSON snippet below):
1. V3 Jobs support the JobInputHTTP object, which can point to a read-only SAS URL that you pass in from your remote storage blob (if the storage account is not attached to the AMS account).
2. You can create an empty Asset, copy the blob into it from a remote storage account, and then submit the job as a JobInputAsset as usual.
3. You can create an Asset and pass in the container name. If this is an attached storage account, you can wrap an existing storage container as an Asset and then submit a job with the specified file in that Asset container as the input source. This is what you said in your last sentence above, but it may not be clear that you can do this with JobInputAsset: look at the Files property on JobInputAsset to pass in a specific list of files to the encoder (single, or multiple if doing overlays).
"input": {
"#odata.type": "#Microsoft.Media.JobInputAsset",
"files": [],
"inputDefinitions": [],
"assetName": "job1-InputAsset"
},
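In C# (the Microsoft.Azure.Management.Media SDK), option 3 looks roughly like the sketch below. The resource group, account, transform, and asset/file names are placeholders; it assumes the storage account holding "existing-container" is attached to the AMS account and that the transform and output asset already exist.

using Microsoft.Azure.Management.Media;
using Microsoft.Azure.Management.Media.Models;

// client is an authenticated AzureMediaServicesClient.
// Wrap an existing container in the attached storage account as an Asset.
Asset inputAsset = await client.Assets.CreateOrUpdateAsync(
    "myResourceGroup", "myAmsAccount", "existing-content-asset",
    new Asset(container: "existing-container"));

// Use the Files property to point the encoder at a specific file.
var input = new JobInputAsset(
    assetName: "existing-content-asset",
    files: new[] { "video.mp4" });

Job job = await client.Jobs.CreateAsync(
    "myResourceGroup", "myAmsAccount", "myTransform", "job1",
    new Job
    {
        Input = input,
        Outputs = new JobOutput[] { new JobOutputAsset("job1-OutputAsset") }
    });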

What is the efficient way to get data from ADLS Gen2 and bind it to Angular UI?

I am using the Angular SPA project type to create a client-side application.
The data that I want to bind to the UI is in ADLS Gen2 storage.
What is the efficient way to connect to ADLS Gen2 and fetch the data?
I know there is a blob SDK available, but I am not sure how efficient it is as far as performance is concerned.
1. If you prefer to use an SDK, the blob SDK is the right way to fetch data. With it, you can download the data locally or read it as a stream.
2. If you want to download the data locally and then bind it to the UI, you can use AzCopy, which provides higher performance than the blob SDK.
3. The last way is to use a SAS token with the data: get a URL with a SAS token for each item, then bind those URLs directly to the UI. You can generate the SAS token in the Azure portal and control the permission (e.g. read-only).
Assume you have this data in ADLS Gen2: test/image1.jpg. When you generate the SAS token, append the blob path to the Blob Service SAS URL from the portal, like below:
https://ADLS_Gen2_account.blob.core.windows.net/test/image1.jpg?sv=xxxx
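If you generate the SAS from code instead of the portal (for example in the API endpoint that serves the Angular app), a minimal sketch with the v12 .NET SDK (Azure.Storage.Blobs) could look like this; the connection string, container, and blob names are placeholders, and it assumes the client is created with a shared-key credential so GenerateSasUri can sign the URL:

using System;
using Azure.Storage.Blobs;
using Azure.Storage.Sas;

string connectionString = "<storage-connection-string>";
BlobClient blobClient = new BlobServiceClient(connectionString)
    .GetBlobContainerClient("test")
    .GetBlobClient("image1.jpg");

// Read-only SAS valid for one hour; the resulting URL can be bound
// directly in the UI (e.g. as an <img> src).
var sasBuilder = new BlobSasBuilder
{
    BlobContainerName = "test",
    BlobName = "image1.jpg",
    Resource = "b", // "b" = an individual blob
    ExpiresOn = DateTimeOffset.UtcNow.AddHours(1)
};
sasBuilder.SetPermissions(BlobSasPermissions.Read);

Uri sasUri = blobClient.GenerateSasUri(sasBuilder);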

Azure Blob Storage 400 Bad Request

Good morning,
I'm trying to implement Azure Blob Storage for the first time using the example code they provide. However, my app is throwing a very broad 400 Bad Request error when calling UploadFromStream().
I have done a bunch of searching on this issue. Almost everything I have come across identifies naming conventions of the container or blob as the issue. This is NOT my issue; I'm using all lowercase, etc.
My code is no different from their example code:
The connection string:
<add key="StorageConnectionString" value="DefaultEndpointsProtocol=https;AccountName=xxxx;AccountKey=xxxxxx;EndpointSuffix=core.windows.net" />
And the code:
// Retrieve storage account from connection string
CloudStorageAccount storageAccount = CloudStorageAccount.Parse(CloudConfigurationManager.GetSetting("StorageConnectionString"));
// Create the blob client
CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
// Retrieve reference to a previously created container
CloudBlobContainer container = blobClient.GetContainerReference("mycontainer");
// Retrieve reference to a blob named "myblob"
CloudBlockBlob blob = container.GetBlockBlobReference("myblob");
// Create the container if it doesn't already exist
container.CreateIfNotExists();
// Create or overwrite the "myblob" blob with contents from a local file.
using (var fileStream = System.IO.File.OpenRead(@"D:\Files\logo.png"))
{
    blob.UploadFromStream(fileStream);
}
Here are the exception details (screenshot not reproduced here).
This is all I have to go on. The only other thing I can think of is that I'm running this in my development environment with HTTP, not HTTPS. Not sure if this might be an issue?
EDIT:
Additionally, when attempting to upload a file directly to the container in the Azure portal, I receive a
Validation error for TestAzureFileUpload.txt. Details: "The page blob
size must be aligned to a 512-byte boundary. The current file size is
56."
Could this be related to my issue? Am I missing some setting here?
I know I do not have enough to go on here for anyone to identify the exact issue, but I am hoping that someone can at least point me in the right direction to resolve this.
Any help would be appreciated.
I used a Premium storage account to test the code and got the same "400 Bad Request" as yours.
From the exception details, you can see the "Block blobs are not supported" message.
Here is an image of my exception details
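If the screenshot is not visible, the same details can be surfaced in code. A small sketch against the classic Microsoft.WindowsAzure.Storage SDK used in the question (the helper name is mine):

using System;
using System.IO;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

static void UploadWithDiagnostics(CloudBlockBlob blob, Stream fileStream)
{
    try
    {
        blob.UploadFromStream(fileStream);
    }
    catch (StorageException ex)
    {
        // The extended error information carries the service's actual reason,
        // e.g. "Block blobs are not supported" on a Premium (page blob) account.
        Console.WriteLine(ex.RequestInformation.HttpStatusCode);
        Console.WriteLine(ex.RequestInformation.ExtendedErrorInformation?.ErrorCode);
        Console.WriteLine(ex.RequestInformation.ExtendedErrorInformation?.ErrorMessage);
    }
}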
To solve your problem, I think you should know the difference between block blob and page blob.
Block blobs are comprised of blocks, each of which is identified by a block ID. You create or modify a block blob by writing a set of blocks and committing them by their block IDs. They are meant for discrete storage objects such as .jpg, .txt, and .log files, the kind of thing you would typically view as a file in your local OS. Block blobs are supported by standard storage accounts only.
Page blobs are a collection of 512-byte pages optimized for random read and write operations, such as VHDs. To create a page blob, you initialize it and specify the maximum size it will grow to. Page blobs are designed for Azure Virtual Machine disks and are supported by both standard and Premium storage accounts.
You are using Premium Storage, which is currently available only for storing data on disks used by Azure Virtual Machines, which is why your block blob upload fails.
So my suggestion is:
If you want your application to support streaming and random-access scenarios, and to be able to access application data from anywhere, use block blobs with a standard account.
If you want to lift and shift applications that use native file-system APIs to read and write data to persistent disks, or to store data that does not need to be accessed from outside the virtual machine the disk is attached to, use page blobs.
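As a side note on the portal error you quoted: page blob sizes must be a multiple of 512 bytes, which is exactly why a 56-byte text file is rejected. A small sketch with the same classic SDK (the blob name is a placeholder, and container is the CloudBlobContainer from your code):

using Microsoft.WindowsAzure.Storage.Blob;

CloudPageBlob pageBlob = container.GetPageBlobReference("disk.vhd");

// A page blob is created with a fixed maximum size, which must be
// aligned to a 512-byte boundary; 512 KB is fine, 56 bytes is not.
pageBlob.Create(512 * 1024);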
Reference link:
Understanding Block Blobs, Append Blobs, and Page Blobs

Not seeing all blobs when using the Azure List Blobs Container API

I am trying to find a simple way to list log files which I am storing in an Azure Blob Container so developers and admins can easily get to dev log information. I am following the information in this API doc https://msdn.microsoft.com/en-us/library/dd135734.aspx, but when I go to
https://-my-storage-url-.blob.core.windows.net/dev?comp=list&include={snapshots,metadata,uncommittedblobs,copy}&maxresults=1000
I see one file listed, which is a Block Blob, but the log files I have generated, which are of type Append Blob, are not showing. How can I construct this API call to include Append Blobs?

.NET Azure Blob Storage: Get Root-Level Directories w/o Listing All Blobs

I have an Azure cloud blob storage account and I need to enumerate its contents. The account has a large amount of data, and using ListBlobs to enumerate all of it takes a long time to complete.
For both cloud containers and directories, I want the ability to enumerate only root-level items. For a container, I assume this will enumerate root-level blobs:
cloudBlobContainer.ListBlobs(
    String.Empty,
    false,
    BlobListingDetails.None,
    null,
    null);
Is there any reasonable way to get root-level directories without listing all blobs? The only way I can think to do it is absurd: calling ListBlobs with every possible combination a blob prefix could be.
Zachary, unfortunately there is no such thing as a "directory" in Azure Blob Storage. The object hierarchy is as follows:
Storage Account (Management Plane)
  Storage Container [0..n] (Data Plane)
    Blobs [0..n] (Data Plane)
When you see additional forward slashes in the blob names, it is only a "virtual" directory, not a separate directory entity.
https://azure.microsoft.com/en-us/documentation/articles/storage-dotnet-how-to-use-blobs/
You can achieve a more granular listing of a directory's contents by using the .ListBlobs().OfType<your_chosen_blob_type>() call. One blob type is CloudBlobDirectory, for example. See this answer: https://stackoverflow.com/a/14440507/9654964.
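Putting those together, a minimal sketch with the classic SDK from your question: with useFlatBlobListing left false, the listing is delimiter-aware, so blobs sharing a prefix up to '/' collapse into CloudBlobDirectory placeholders and only root-level items come back.

using System;
using System.Linq;
using Microsoft.WindowsAzure.Storage.Blob;

// cloudBlobContainer is the container from your code above.
var rootItems = cloudBlobContainer.ListBlobs(
    prefix: null,
    useFlatBlobListing: false);

// Root-level "directories" only; no enumeration of every blob underneath.
foreach (CloudBlobDirectory dir in rootItems.OfType<CloudBlobDirectory>())
{
    Console.WriteLine(dir.Prefix); // e.g. "logs/"
}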
