Cosmos DB Attachment Limits and Alternate Attachment Locations - C#

We're moving the data storage for our core product to Cosmos DB. For documents, it works very well but I'm having trouble finding the information I need for attachments.
I can successfully do everything I want with attachments from my C# code using the Microsoft.Azure.DocumentDB NuGet package v1.19.1.
According to the information I can find, attachments are limited to 2 GB total for all attachments in an account, which is hugely limiting. Info found here:
https://learn.microsoft.com/en-us/azure/cosmos-db/sql-api-resources#attachments-and-media
It states:
Azure Cosmos DB allows you to store binary blobs/media either with Azure Cosmos DB (maximum of 2 GB per account) or to your own remote media store.
There seems to be some implication that you can create attachments that point to resources stored elsewhere, perhaps on a CDN. But I can't find any documentation on how to actually do this from C#.
Does anyone know if Cosmos DB can, in fact, attach to BLOB payloads stored outside of itself? If so, can the .NET NuGet package do it or is it only available for pure REST calls?
Many thanks in advance.

There's nothing inherently built-in to manage externally-stored attachments. Rather, it's up to you to store them and then reference them.
The most common pattern is to store a URL to the specific attachment within a document (e.g. pointing to a blob in Azure Storage), as sketched below. This results in effectively two operations:
A query to retrieve the document from Cosmos DB
A read from storage, based on the URL found in the returned Cosmos DB document.
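A minimal sketch of the write path, assuming the Microsoft.Azure.DocumentDB SDK from the question plus a recent WindowsAzure.Storage package (the database, collection, container, and property names are placeholders):

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.Azure.Documents.Client;
using Microsoft.WindowsAzure.Storage.Blob;

public class Invoice
{
    public string id { get; set; }
    public string AttachmentUrl { get; set; }   // a plain document property, not a Cosmos DB attachment
}

public static class AttachmentPattern
{
    public static async Task SaveWithAttachmentAsync(
        DocumentClient client, CloudBlobContainer container, string localFilePath)
    {
        // 1. Upload the binary payload to blob storage.
        var blob = container.GetBlockBlobReference(
            Guid.NewGuid() + Path.GetExtension(localFilePath));
        await blob.UploadFromFileAsync(localFilePath);

        // 2. Store only the reference (URL) in the Cosmos DB document.
        var doc = new Invoice
        {
            id = Guid.NewGuid().ToString(),
            AttachmentUrl = blob.Uri.ToString()
        };
        await client.CreateDocumentAsync(
            UriFactory.CreateDocumentCollectionUri("mydb", "invoices"), doc);
    }
}
```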
Note: all responsibility is on you to manage referenced content: updating it, deleting it, etc. And if you're using blob storage, you'll need to deal with things such as private vs public access (and generating SAS for private URLs where necessary, when returning URLs to your clients, vs streaming content).
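For the private-access case, a time-limited, read-only SAS URL can be generated with the same storage SDK. A rough helper (the lifetime and permissions are just illustrative):

```csharp
using System;
using Microsoft.WindowsAzure.Storage.Blob;

public static class SasHelper
{
    // Returns a time-limited, read-only SAS URL for a private blob.
    public static string GetReadOnlySasUrl(CloudBlockBlob blob, TimeSpan lifetime)
    {
        var policy = new SharedAccessBlobPolicy
        {
            Permissions = SharedAccessBlobPermissions.Read,
            SharedAccessExpiryTime = DateTimeOffset.UtcNow.Add(lifetime)
        };

        // GetSharedAccessSignature returns the query string (starting with '?'),
        // so appending it to the blob URI yields a directly usable URL.
        return blob.Uri + blob.GetSharedAccessSignature(policy);
    }
}
```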
One more thing: CDN isn't a storage mechanism on its own. You cannot store something directly to a CDN; it's more of a layer on top of something like Azure Storage (for publicly accessible content).

Related

Copy .csv file from Azure Blob Storage to SharePoint site

I have a CSV file stored in blob storage. The goal is to move this file into a SharePoint site and set some metadata. What would be the best way to do this? The client does not want us to use Power Automate or Logic Apps.
I tried using Azure Data Factory but there seems to be an issue with writing data to SharePoint. I used the copy activity but the 'sink' to SharePoint failed. Does Data Factory support writing to SharePoint?
The client does not want us to use Power Automate or Logic Apps.
Why not? This is the simplest way to achieve this, and it is also more maintainable than, for instance, C# code.
Does Data Factory support writing to SharePoint?
Yes, it does. However, using Data Factory only to copy a file to SharePoint is overkill.
If Logic Apps are not an option, have a look at an Azure Function that triggers automatically when the file is created in Azure Storage, and see, for instance, Upload File To SharePoint Office 365 Programmatically Using C# CSOM – PNP for a C# way of uploading a file to SharePoint.
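For illustration, a rough sketch of that approach, assuming an Azure Function with a blob trigger and the Microsoft.SharePointOnline.CSOM package (the site URL, credentials, container, and library names are all placeholders):

```csharp
using System.IO;
using System.Security;
using Microsoft.Azure.WebJobs;
using Microsoft.SharePoint.Client;

public static class CopyBlobToSharePoint
{
    [FunctionName("CopyBlobToSharePoint")]
    public static void Run([BlobTrigger("incoming/{name}")] Stream blob, string name)
    {
        // Credentials shown inline only for brevity; use app settings / Key Vault in practice.
        var password = new SecureString();
        foreach (var c in "<password>") password.AppendChar(c);

        using (var ctx = new ClientContext("https://contoso.sharepoint.com/sites/mysite"))
        {
            ctx.Credentials = new SharePointOnlineCredentials("user@contoso.com", password);

            // Upload the blob content to a document library.
            var newFile = new FileCreationInformation
            {
                ContentStream = blob,
                Url = name,
                Overwrite = true
            };
            var library = ctx.Web.Lists.GetByTitle("Documents");
            var uploaded = library.RootFolder.Files.Add(newFile);

            // Set metadata on the uploaded file's list item.
            var item = uploaded.ListItemAllFields;
            item["Title"] = name;
            item.Update();

            ctx.ExecuteQuery();
        }
    }
}
```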

Get/set metadata of Azure Files via UNC

I am accessing the Azure Files share via UNC. I have it mounted in a Windows VM and I am able to access/read files. However, I also need to read and do things based on the metadata that is being set on those files.
As far as I know, the metadata are custom key-value pairs that can be stored on an Azure file share, folder, or file. A different application sets them via the REST API SDKs.
So, is there any way to get/set that custom metadata by mounting the share in a VM?
I am using a C# program to read the share and list files in order to find newly uploaded files. Although checking the last modified date works, I still need to filter on a specific metadata value to prevent double processing.
Metadata set via the REST API can only be accessed via REST (or the SDKs that wrap it); it is not exposed through SMB.
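For reference, that metadata can be read from C# over REST with the WindowsAzure.Storage file SDK; a rough sketch (the share name and the metadata key are assumptions):

```csharp
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.File;

public static class ShareMetadataScanner
{
    public static void ScanShare(string connectionString)
    {
        var account = CloudStorageAccount.Parse(connectionString);
        var share = account.CreateCloudFileClient().GetShareReference("myshare");
        var rootDir = share.GetRootDirectoryReference();

        foreach (var item in rootDir.ListFilesAndDirectories())
        {
            if (item is CloudFile file)
            {
                // Metadata is only populated after FetchAttributes / FetchAttributesAsync.
                file.FetchAttributes();

                if (file.Metadata.TryGetValue("processed", out var value) && value == "true")
                    continue; // already handled, skip to prevent double processing

                // ... process the file, then mark it so it isn't picked up again:
                file.Metadata["processed"] = "true";
                file.SetMetadata();
            }
        }
    }
}
```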
If you wish, you may leave your feedback here. All the feedback you share in these forums will be monitored and reviewed by the Microsoft engineering teams responsible for building Azure.

Copy Azure Storage Tables in C#

Thanks to AzCopy, it is quite easy to transfer data between different Azure storage accounts on the command line. But I have failed to find an efficient way to copy Azure Table storage in C#. I noticed that there is a Microsoft Azure Storage Data Movement Library that claims to power AzCopy, but it seems there is no direct way to copy tables according to the library reference. Any suggestions on how to implement that efficiently?
P.S. I have millions of entities to transfer now and then, and I prefer to integrate it in a C# project without using cmd.exe or PowerShell.
I noticed that there is a Microsoft Azure Storage Data Movement Library that claims to power AzCopy, but it seems there is no direct way to copy tables according to the library reference.
As you mentioned, there is no method for copying tables in the Microsoft Azure Storage Data Movement Library.
prefer to integrate it in a C# project without using cmd.exe or PowerShell.
For how to operate Azure Table storage with C#, we can refer to Get started with Azure Table storage using .NET.
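Purely as a sketch of what a plain C# copy could look like with that SDK (the table references are assumed to point at your source and target tables; entity group transactions are limited to 100 entities sharing a partition key):

```csharp
using System.Linq;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage.Table;

public static class TableCopier
{
    public static async Task CopyTableAsync(CloudTable source, CloudTable target)
    {
        var query = new TableQuery<DynamicTableEntity>();
        TableContinuationToken token = null;

        do
        {
            var segment = await source.ExecuteQuerySegmentedAsync(query, token);
            token = segment.ContinuationToken;

            // Batches must share a partition key, so group each segment accordingly.
            foreach (var partition in segment.Results.GroupBy(e => e.PartitionKey))
            {
                var batch = new TableBatchOperation();
                foreach (var entity in partition)
                {
                    batch.InsertOrReplace(entity);
                    if (batch.Count == 100)             // max 100 operations per batch
                    {
                        await target.ExecuteBatchAsync(batch);
                        batch = new TableBatchOperation();
                    }
                }
                if (batch.Count > 0)
                    await target.ExecuteBatchAsync(batch);
            }
        } while (token != null);
    }
}
```

For millions of entities this single-threaded loop will still be slow, which is part of why a managed service is suggested next.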
I have millions of entities to transfer now
Since a huge number of entities need to be transferred, based on my experience we could use Azure Data Factory to do that.
Related resources:
ETL using azure table storage
Copy data to or from Azure Table using Azure Data Factory

How to do content-level search in Amazon S3

I have some files (.txt, .doc, .xlsx, etc.) inside a bucket in my Amazon S3 account. Is it possible to perform a content-level search through my C# application? That is, when I type a string and press a key in my application, every file that contains the searched string in its content should be listed.
Is there any way to achieve this, either using any method or even using Web APIs?
Thanks in advance
Amazon S3 is purely a storage service. There is no search capability built into S3.
You could use services such as Amazon CloudSearch and Amazon Elasticsearch Service, which can index documents, but please note that this involves additional configuration and additional costs.
You won't be able to cover all of the file types you listed, but for any of your files that are structured or semi-structured, you could consider the newly released Amazon Athena, which does allow searching S3 files using an SQL-like language:
https://aws.amazon.com/athena/faqs/
Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to setup or manage, and you can start analyzing data immediately. You don’t even need to load your data into Athena, it works directly with data stored in S3. To get started, just log into the Athena Management Console, define your schema, and start querying. Amazon Athena uses Presto with full standard SQL support and works with a variety of standard data formats, including CSV, JSON, ORC, Apache Parquet and Avro. While Amazon Athena is ideal for quick, ad-hoc querying and integrates with Amazon QuickSight for easy visualization, it can also handle complex analysis, including large joins, window functions, and arrays.
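If you go the Athena route, it can be called from C# with the AWSSDK.Athena NuGet package. A hedged sketch (the database, table, column, and results bucket below are assumptions, and a schema over your structured files must already be defined in Athena):

```csharp
using System.Threading.Tasks;
using Amazon.Athena;
using Amazon.Athena.Model;

public static class AthenaSearch
{
    // Starts a search query and returns its execution id; poll GetQueryExecution /
    // GetQueryResults afterwards to read the matching rows.
    public static async Task<string> StartSearchAsync(string term)
    {
        using (var athena = new AmazonAthenaClient())
        {
            var response = await athena.StartQueryExecutionAsync(new StartQueryExecutionRequest
            {
                // Note: sanitize/escape user input before embedding it in the query.
                QueryString = $"SELECT s3_key FROM my_database.documents WHERE body LIKE '%{term}%'",
                ResultConfiguration = new ResultConfiguration
                {
                    OutputLocation = "s3://my-athena-results-bucket/"
                }
            });

            return response.QueryExecutionId;
        }
    }
}
```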

Good practice for working with files in ASP.NET Web API?

I'm new to working with files, so I have done some reading, although I feel that I'm still not certain how to deal with them using ASP.NET Web API.
What I want is to be able to reach images through my Web API. What I've read so far is that many people prefer saving the file and then calling for its URI; instead of saving the image to the database, you only save the URI there. So I then created an ImageController on the Web API that does exactly this (at least it works using localhost). I now have some people arguing that I should use blob storage (since I use Azure).
My question is: is it wrong or bad practice to have a folder in my project where I save my image files? If so, what would be the better way to save images?
Your question is really two questions:
1. Database vs. filesystem
It depends on 2 main factors: security and performance.
If your images are sensitive and the risk of accessing them "outside" your app (for example by hotlinking) is unacceptable, you must go for the database and serve images via an ASP.NET request, which you can authenticate any way you want. However, this is MUCH more resource-intensive than the second option (see below).
If security is no concern, you definitely want to go for filesystem storage. On traditional hosting, you would save them "anywhere" on disk, and IIS (the web server) would serve them to the user (by direct URL, bypassing your ASP.NET application). This alone is a HUGE performance improvement over the DB+ASP.NET solution for many reasons (request thread pool, memory pressure, average request duration, caching on IIS...).
2. Local directory on web role vs. blob storage
However, in Azure you can, and HAVE TO, go one step further: use dedicated blob storage, independent from your web role, so not even IIS on your web role will be serving them; the blob storage service does (this does not need to concern you at all; it just works). Your web role should not store anything permanently: it can fail, or be destroyed and replaced with a new one at any time by the Azure fabric. All permanent content must go to Azure blob storage.
Adding to #rouen's excellent answer (and I will focus only on local directory vs. blob storage).
Assuming you're deploying your Web API as an Azure Web App (instead of a Web Role), consider the following scenarios:
What would happen to your images if you accidentally delete your Web App? In that case, all your images would be lost.
What would happen if you need to scale your application from one instance to more than one? Since the files are located in an instance's local directory, they will not be replicated across other instances. If a request to fetch an image lands on an instance where the image is not present on the VM, then you will not be able to serve that image.
What would happen if you need to store more images than the disk size available to you in your web application?
What would happen if you need to serve the same images through another application (other than your WebApi)?
Considering all these, my recommendation would be to go with blob storage. What you do is store the image in blob storage and save the image URL in the database. Some of the advantages offered by blob storage are:
At this time, you can store 500 GB worth of data in a single blob storage account. If you need more space, you simply create another storage account. Currently you can create 100 storage accounts per subscription so essentially you can store 50TB worth of data in a single subscription.
Blob storage is cheap and you only pay for the storage that you use + storage transaction costs which are also very cheap.
Contents in blob storage are replicated three times, and if you opt for the geo-redundancy option (there's an extra cost for that), your content is replicated six times (three times in the primary region + three times in the secondary region). However, you should not confuse replication with backup!
Since the content is served from blob storage, you're essentially freeing up IIS server from serving that content.
You can take advantage of Azure CDN and serve the content from a CDN node closest to your end user.
Considering all these advantages, I would strongly urge you to take a look at blob storage.
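To make that pattern concrete, here is a minimal sketch of a Web API controller that uploads an incoming image to blob storage and returns its URL (the container name and connection-string setting are placeholders, assuming the WindowsAzure.Storage package; in a real app you would also persist the URL in your database):

```csharp
using System;
using System.Threading.Tasks;
using System.Web.Http;
using Microsoft.WindowsAzure.Storage;

public class ImagesController : ApiController
{
    [HttpPost]
    public async Task<IHttpActionResult> Upload()
    {
        var account = CloudStorageAccount.Parse(
            Environment.GetEnvironmentVariable("StorageConnectionString"));
        var container = account.CreateCloudBlobClient().GetContainerReference("images");
        await container.CreateIfNotExistsAsync();

        // Give the blob a unique name and keep the incoming content type.
        var blob = container.GetBlockBlobReference(Guid.NewGuid() + ".jpg");
        blob.Properties.ContentType =
            Request.Content.Headers.ContentType?.MediaType ?? "image/jpeg";

        using (var stream = await Request.Content.ReadAsStreamAsync())
        {
            await blob.UploadFromStreamAsync(stream);
        }

        // Return (and typically also store in your database) the blob URL.
        return Ok(blob.Uri.ToString());
    }
}
```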
