Windows Azure: length of blob remains 0 - c#

I'm confused about how to get the size of a blob in Windows Azure.
In my case, I first get the blob reference with CloudBlockBlob blob = container.GetBlockBlobReference(foo); (here foo is the name of the blob, and I'm sure the blob exists). Then I try to read the blob size from blob.Properties.Length; however, it always returns 0. I set a breakpoint at that statement and inspected the blob: the URI of the blob is correct, so can I infer that the blob was correctly retrieved? Yet every field in Properties is either null or 0, and I cannot figure out a solution. Is it because I'm currently emulating the app locally in the Storage Emulator, and will it work after deployment?
Thanks and Best Regards.

Call blob.FetchAttributes(). GetBlockBlobReference doesn't actually make any calls to the blob service. It just constructs a local object that represents the blob.
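A minimal sketch, assuming the same container and foo variables from the question and the classic WindowsAzure.Storage SDK:

// Building the reference is purely local; no request is made yet,
// so Properties.Length is still 0 at this point.
CloudBlockBlob blob = container.GetBlockBlobReference(foo);

// Issue a HEAD request against the blob service to populate Properties and Metadata.
blob.FetchAttributes();

// Now the size is available.
long sizeInBytes = blob.Properties.Length;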

I have written a blog post on this exact same issue about 4 days back:
http://blogs.msdn.com/b/avkashchauhan/archive/2012/04/27/windows-azure-blob-size-return-0-even-when-blob-is-accessible-and-downloaded-without-any-problem.aspx

Related

How to search Blob Container's file content?

We store our logs in a Blob Container and create an individual JSON file for each action,
for example 12345.json:
{"User":"User1","Location":"LOC","timestamp":"2023-01-10T10:34:43.5470187+00:00","Id":"12345"}
I want to return all the data where User = User1.
I can use BlobServiceClient to connect to the Blob storage account and retrieve all the JSON files. I would assume I can read each individual JSON file and do some filtering, but are there any better ways to do this?
My ultimate goal is to create an endpoint that accepts a list of keywords and a date range, and then returns the corresponding results.
If you want to use Blob Storage only, the option would be to first list all blobs in the container and then search inside each blob using Query Blob Contents (I linked the REST API documentation; please check the equivalent method in the SDK).
The other option (a much better one IMO) would be to use Azure Cognitive Search and create a Blob Indexer: have the contents of the blob container indexed by Azure Cognitive Search and then search over that indexed data.
You can learn more about using Azure Cognitive Search with Blob Storage here: https://learn.microsoft.com/en-us/azure/search/search-blob-storage-integration. For working with JSON data in Blob Storage, please see this: https://learn.microsoft.com/en-us/azure/search/search-howto-index-json-blobs.
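If you do stick with plain Blob Storage, a rough sketch of the "list everything and filter in code" approach might look like this (assuming the Azure.Storage.Blobs SDK and System.Text.Json; the LogEntry record and method name are just illustrative):

using Azure.Storage.Blobs;
using System.Text.Json;

public record LogEntry(string User, string Location, DateTimeOffset Timestamp, string Id);

public static async Task<List<LogEntry>> FindByUserAsync(
    string connectionString, string containerName, string user)
{
    var container = new BlobContainerClient(connectionString, containerName);
    var matches = new List<LogEntry>();

    // List every blob in the container and download each JSON log file.
    await foreach (var item in container.GetBlobsAsync())
    {
        var blob = container.GetBlobClient(item.Name);
        using var stream = await blob.OpenReadAsync();
        var entry = await JsonSerializer.DeserializeAsync<LogEntry>(
            stream, new JsonSerializerOptions { PropertyNameCaseInsensitive = true });

        if (entry is not null && entry.User == user)
            matches.Add(entry);
    }

    return matches;
}

Note that this downloads every blob on every query, which is why the Cognitive Search indexer becomes the better option as the container grows.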

How to setup an Azure Search soft-delete policy attached to Azure Storage

I have an Azure Search instance that uses Azure Storage as its data source. I'm trying to get Search to remove files from its results whenever they are deleted in Storage, but somehow I can't seem to achieve it.
So far I've tried setting a soft-delete policy on the Storage account to keep files for another 7 days, and a soft-delete policy on the Search side that checks the metadata column IsDeleted and, if it is true, deletes the item from the results. Then, in code, I change the metadata and delete the file as follows:
// Flag the blob as deleted, then remove it from storage
blob.Metadata["IsDeleted"] = "true";
blob.SetMetadataAsync().Wait();
blob.DeleteAsync().Wait();
Without the delete it seems to work fine, but with it I guess Search can no longer access the metadata, even though the file is still retained. I'm assuming something as simple as this has already been thought out, but I can't seem to find it.
The short answer is that you can't use soft-deleted blobs with Azure Search.
When a blob is soft deleted from storage, for all intents and purposes the blob is deleted. You can't perform any operation on the blob without undeleting it first, and it is not returned as part of the regular blob listing process.
Because of this, when an indexer runs to fetch the list of blobs, it does not find the soft-deleted blob. The only way to mark the blob as deleted from the search service indexer's point of view is to keep the blob in storage and set the metadata property "IsDeleted" to "true", which you're already doing; just don't delete the blob afterwards.
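In other words, a sketch of that flow using the same calls as in the question:

// Flag the blob as deleted so the indexer's soft-delete policy picks it up.
blob.Metadata["IsDeleted"] = "true";
await blob.SetMetadataAsync();

// Do NOT call blob.DeleteAsync() here: a deleted (or soft-deleted) blob no longer
// shows up in the blob listing, so the indexer never sees the IsDeleted flag.
// Purge the flagged blob later (e.g. from a scheduled cleanup job) once the
// indexer has run and removed the document from the index.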

Fetch Azure blob metadata only if the blob exists

I need to fetch the metadata for an Azure blob if it exists, and would like to achieve this with minimal REST calls (from the storage SDK to the storage service).
I know I can do something like
CloudBlockBlob.ExistsAsync() and then
CloudBlockBlob.FetchAttributesAsync() if the blob exists
I tried to combine these 2 calls into one
CloudBlockBlob.FetchAttributesAsync(AccessCondition.GenerateIfExistsCondition(),new BlobRequestOptions(), new OperationContext());
Docs on 'AccessCondition.GenerateIfExistsCondition()' say -
Constructs an access condition such that an operation will be
performed only if the resource exists.
but it still fails with a 404 not found.
Any idea if what I want to achieve is even possible and what I might be doing wrong?
Looking at the documentation for the action: https://learn.microsoft.com/en-us/rest/api/storageservices/get-blob-properties.
It's basically a HEAD request to the blob, and there is no mention of If-Match etc. for headers.
So I think the optimal way of doing it is to just call FetchAttributesAsync and catch the error.
If that throws a 404, then the blob does not exist.
Either way it is only one HTTP request.
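A minimal sketch of that single-call pattern, assuming the classic WindowsAzure.Storage SDK from the question (the helper name is just illustrative):

using System.Net;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

public static async Task<BlobProperties> TryFetchPropertiesAsync(CloudBlockBlob blob)
{
    try
    {
        // A single HEAD request; populates blob.Properties and blob.Metadata.
        await blob.FetchAttributesAsync();
        return blob.Properties;
    }
    catch (StorageException ex)
        when (ex.RequestInformation?.HttpStatusCode == (int)HttpStatusCode.NotFound)
    {
        // The blob does not exist.
        return null;
    }
}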

Azure Blob Storage 400 Bad Request

Good morning,
I'm trying to implement Azure Blob Storage for the first time using the example code they provide. However, my app is throwing a very broad 400 Bad Request error when calling UploadFromStream().
I have done a bunch of searching on this issue. Almost everything I have come across identifies naming conventions of the container or blob as the culprit. This is NOT my issue; I'm using all lowercase, etc.
My code is no different from their example code:
The connection string:
<add key="StorageConnectionString" value="DefaultEndpointsProtocol=https;AccountName=xxxx;AccountKey=xxxxxx;EndpointSuffix=core.windows.net" />
And the code:
// Retrieve storage account from connection string
CloudStorageAccount storageAccount = CloudStorageAccount.Parse(CloudConfigurationManager.GetSetting("StorageConnectionString"));
// Create the blob client
CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
// Retrieve reference to a previously created container
CloudBlobContainer container = blobClient.GetContainerReference("mycontainer");
// Retrieve reference to a blob named "myblob"
CloudBlockBlob blob = container.GetBlockBlobReference("myblob");
// Create the container if it doesn't already exist
container.CreateIfNotExists();
// Create or overwrite the "myblob" blob with contents from a local file.
using (var fileStream = System.IO.File.OpenRead(@"D:\Files\logo.png"))
{
blob.UploadFromStream(fileStream);
}
Here are the exception details:
This is all I have to go on. The only other thing I can think of is that I'm running this in my development environment over HTTP, not HTTPS. Not sure if this might be an issue?
EDIT:
Additionally, when attempting to upload a file directly to the container in the Azure portal, I receive a
Validation error for TestAzureFileUpload.txt. Details: "The page blob size must be aligned to a 512-byte boundary. The current file size is 56."
Could this be related to my issue? Am I missing some setting here?
I know I do not have enough to go on here for anyone to help me identify the exact issue, but I am hoping that someone can at least point me in the right direction to resolve this.
Any help would be appreciated.
I used a Premium storage account to test the code and got the same "400 Bad Request" as yours.
From the exception details, you can see the "Block blobs are not supported" message.
Here is an image of my exception details
To solve your problem, I think you should know the difference between block blobs and page blobs.
Block blobs are composed of blocks, each of which is identified by a block ID. You create or modify a block blob by writing a set of blocks and committing them by their block IDs. They are meant for discrete storage objects such as jpg, txt, and log files, the kind of thing you'd typically view as a file in your local OS. They are supported by standard storage accounts only.
Page blobs are a collection of 512-byte pages optimized for random read and write operations, such as VHDs. To create a page blob, you initialize it and specify the maximum size to which it will grow. In practice, page blobs are designed for Azure Virtual Machine disks. They are supported by both standard and Premium storage accounts.
You are using Premium Storage, which is currently available only for storing data on disks used by Azure Virtual Machines, and therefore only supports page blobs.
So my suggestion is:
If you want your application to support streaming and random-access scenarios, and to be able to access application data from anywhere, use block blobs with a standard account.
If you want to lift and shift applications that use native file system APIs to read and write data to persistent disks, or to store data that does not need to be accessed from outside the virtual machine to which the disk is attached, use page blobs.
Reference link:
Understanding Block Blobs, Append Blobs, and Page Blobs
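As an aside, the 512-byte validation error from the portal upload is this page-blob constraint showing through. A rough sketch of what a page-blob write has to look like (classic SDK; purely to illustrate the alignment rule, not a recommendation for storing files like logo.png):

CloudPageBlob pageBlob = container.GetPageBlobReference("example.vhd");

byte[] data = System.IO.File.ReadAllBytes(@"D:\Files\logo.png"); // e.g. 56 bytes

// Both the blob size and every write must be aligned to 512-byte boundaries,
// so the payload has to be padded up to the next multiple of 512.
int alignedLength = ((data.Length + 511) / 512) * 512;
byte[] aligned = new byte[alignedLength];
Array.Copy(data, aligned, data.Length);

pageBlob.Create(alignedLength);                  // total size must be a multiple of 512
using (var ms = new System.IO.MemoryStream(aligned))
{
    pageBlob.WritePages(ms, 0);                  // start offset must also be 512-aligned
}

For the scenario in the question, though, the practical fix is simply to use a standard (non-Premium) storage account so the block-blob code above works unchanged.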

Start downloading a partial Azure blob before it's completed

Is there a form of blob (block, page, etc.) that will allow this using the C# API? Machine X could be uploading a file to an Azure blob endpoint while machine Y reads the file in real time. It seems to me like a block blob won't work, because you need to put the block list before you can query the HTTP endpoint for it, but is there a way to query for uncommitted blocks and download those beforehand?
An example of this in practice: a user machine does a handshake with the server and gets a write shared access token and permission to upload the file. Client #1 begins uploading; now say a second client machine requests the file from the server, but client #1 has not finished the upload. In this case, client #2 gets the relevant details from the server, along with a read-only shared access token, and then begins to read the file even though the upload has not finished yet.
With block blobs, I don't think it is possible to start downloading the blob while it is still being uploaded. This is simply because nothing is stored as the blob yet: when you upload blocks for a blob, Azure Storage simply stores the byte chunks someplace.
It is only when you commit the block list that Azure Storage creates a block blob, by arranging the byte chunks based on the request payload of the commit block list operation. Even though Azure Storage lets you see the block list before it is committed, it doesn't expose any API to read the contents of an uncommitted block.
I don't think a page blob is the correct type of blob for your scenario (same with an append blob), even though the moment you write a page it gets committed to the blob, so other callers can get the page ranges and start downloading the data stored in those pages. However, a page blob's size has to be a multiple of 512 bytes, and not all files uploaded by your application would meet this requirement.
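For completeness, a small sketch (classic WindowsAzure.Storage SDK) of what you can do with an in-progress block blob: you can list the uncommitted blocks and see their IDs and sizes, but there is no call that returns a block's contents before the block list is committed.

CloudBlockBlob blob = container.GetBlockBlobReference("file-being-uploaded");

// BlockListingFilter.Uncommitted returns blocks that have been put but not yet
// committed via the commit block list operation.
IEnumerable<ListBlockItem> blocks = blob.DownloadBlockList(BlockListingFilter.Uncommitted);

foreach (ListBlockItem block in blocks)
{
    // The block ID and length are visible, but the block data itself is not readable.
    Console.WriteLine($"Block {block.Name}: {block.Length} bytes, committed = {block.Committed}");
}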
