Cross-partition paging in CosmosDB - c#

I have a CosmosDB collection that is partitioned and where throughput is set to 10,000 RU/s (the problem does not occur when throughput is below 6100 RU/s).
Now I issue an arbitrary document query (for example to retrieve all documents in the collection) with a variable pageSize and a continuationToken (initially set to null):
var q = DocumentClient.CreateDocumentQuery<T>(CollectionUri,
    new FeedOptions
    {
        MaxItemCount = pageSize,
        EnableCrossPartitionQuery = true,
        RequestContinuation = continuationToken
    });
Now if I call
FeedResponse<T> response = await q.ExecuteNextAsync<T>();
I would expect the response to be paged according to the specified pageSize. In particular, if pageSize = -1 or pageSize = int.MaxValue, I want only exactly one page with all results to be returned. However, the resulting pages are fragmented along the partitions.
For example, with pageSize = -1 or pageSize = int.MaxValue I would get a page with 18 objects from the first partition, and only when ExecuteNextAsync is called a second time, I would get the remaining 35 objects from the other two partitions.
With pageSize = 17 I would first get a page with 17 objects on the first call of ExecuteNextAsync, then a page with 1 object on the next call, and then another page with 17 objects!
But this renders paging (almost) completely useless! Or is there a way to implement paging properly (even when throughput is above 6000 RU/s)?

Based on Nick Chapsas' information that ExecuteNextAsync may return fewer than MaxItemCount items even if more are available, I am using the following workaround:
List<T> result = new List<T>();
string continuationToken = null;
IDocumentQuery<T> docQuery = queryable.AsDocumentQuery();
// ugly hack to get the feed options using reflection
FeedOptions feedOptions = docQuery.GetNonPublicProperty<FeedOptions>("feedOptions");
while (docQuery.HasMoreResults && (pageSize <= 0 || result.Count < pageSize))
{
if (feedOptions != null && pageSize > 0)
{
feedOptions.MaxItemCount = pageSize - result.Count;
}
FeedResponse<T> response = await docQuery.ExecuteNextAsync<T>();
result.AddRange(response.ToList());
continuationToken = response.ResponseContinuation;
}
return (result, continuationToken);
Getting the private property using reflection is not very nice, but there doesn't seem to be any other way to get hold of the query's FeedOptions. In particular, the FeedOptions used for calling DocumentClient.CreateDocumentQuery<T> are cloned internally, so it's really a private instance.

MaxItemCount is the maximum number of items that a single request to a partition will return. A page is not guaranteed to contain that many items, and sometimes it will even be empty.
For that reason, you should leave MaxItemCount out of your pagination logic, as it has nothing to do with what you're trying to achieve.
Instead, collect items yourself until you have a full page. Here's an implementation with a pageSize and nextPageToken combo. The continuation token is in the FeedOptions of the query:
var results = new List<T>();
var nextPageToken = string.Empty;
while (query.HasMoreResults)
{
    if (results.Count == pageSize)
        break;
    var items = await query.ExecuteNextAsync<T>(cancellationToken);
    nextPageToken = items.ResponseContinuation;
    foreach (var item in items)
    {
        results.Add(item);
        if (results.Count == pageSize)
            break;
    }
}
return (results, nextPageToken);
For this to work at any RU/s, you will need to either wrap your query.ExecuteNextAsync<T>(cancellationToken) call in a retry wrapper or simply ramp up the DocumentClient's retry options.
For further implementation details, you can take a look at how Cosmonaut handles pagination and solves this issue. (Full disclosure: I am the creator of this library, but I don't want to paste the full implementation here.)
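As a rough illustration of the retry wrapper mentioned above, here is a minimal sketch. RetryAsync, isTransient and FakeExecuteNext are hypothetical names introduced only for this example, and the throttling is simulated with a TimeoutException; with the real SDK you would instead test for the throttling (HTTP 429) exception and ideally wait for the retry interval it reports.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper: retries an async operation while isTransient(exception) is true.
async Task<T> RetryAsync<T>(Func<Task<T>> operation, Func<Exception, bool> isTransient,
                            int maxAttempts, TimeSpan delay)
{
    for (int attempt = 1; ; attempt++)
    {
        try
        {
            return await operation();
        }
        catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
        {
            // Back off before the next attempt; the last attempt's exception is not caught.
            await Task.Delay(delay);
        }
    }
}

// Stand-in for query.ExecuteNextAsync: fails twice, then succeeds (assumption for the demo).
int calls = 0;
Task<string> FakeExecuteNext()
{
    calls++;
    if (calls < 3) throw new TimeoutException("simulated throttling");
    return Task.FromResult("page");
}

string page = await RetryAsync(() => FakeExecuteNext(), ex => ex is TimeoutException,
                               maxAttempts: 5, delay: TimeSpan.FromMilliseconds(10));
Console.WriteLine($"{page} after {calls} calls");
```

In the pagination loop, the ExecuteNextAsync call would be passed in as the operation, so a throttled request is retried instead of aborting the page.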

Related

How to get all entries from the database table using Take and Skip methods?

In our current application we have following functionality in Data layer:
public IEnumerable<User> GetUsers(IPagedAndFilteredAndSortedRequest request)
{
var users = dbContext.Users;
//1) "filteredAndSorted" is a result of applying filters and sorts on users
//2) "filteredAndSorted" is OrderedQueriable
//3) "rows" is number of rows to skip based on request.PageSize and request.PageNumber
var result = filteredAndSorted.Skip(rows).Take(request.PageSize);
return result.ToArray();
}
And we need to get all users from the database using this method. So, the questions are:
Is it a good idea to pass 1 as pageNumber and Int32.MaxValue as pageSize?
What is the maximum number of rows in MSSQL database table?
Is it a good idea to pass 1 as pageNumber and Int32.MaxValue as pageSize?
Not really. It would be best to add another property to the request, something like
var result = filteredAndSorted;
if (request.UsePaging)
{
result = filteredAndSorted.Skip(rows).Take(request.PageSize);
}
return result.ToArray();
Or use request.PageSize < 1 to turn off paging.
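The "pageSize < 1 turns paging off" variant can be sketched as below. This is only an in-memory illustration: ApplyPaging is a hypothetical helper, the 1-based pageNumber is an assumption, and the same Skip/Take shape would apply to an IQueryable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: pageSize < 1 disables paging; pageNumber is assumed to be 1-based.
IEnumerable<T> ApplyPaging<T>(IEnumerable<T> source, int pageNumber, int pageSize)
{
    if (pageSize < 1) return source;          // no paging requested: return everything
    int rows = (pageNumber - 1) * pageSize;   // number of rows to skip
    return source.Skip(rows).Take(pageSize);
}

var users = Enumerable.Range(1, 10).ToList();
int[] secondPage = ApplyPaging(users, pageNumber: 2, pageSize: 3).ToArray();
int[] everything = ApplyPaging(users, pageNumber: 1, pageSize: 0).ToArray();
Console.WriteLine(string.Join(",", secondPage));
Console.WriteLine(everything.Length);
```

This avoids passing Int32.MaxValue as a magic page size while keeping a single code path for paged and unpaged requests.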

Why does my Azure Cosmos query return empty results when it should return many results?

I am running a query against my Cosmos db instance, and I am occasionally getting 0 results back, when I know that I should be getting some results.
var options = new QueryRequestOptions()
{
MaxItemCount = 25
};
var query = @"
    select c.id,c.callTime,c.direction,c.action,c.result,c.duration,c.hasR,c.hasV,c.callersIndexed,c.callers,c.files
    from c
    where
        c.ownerId=@ownerId
        and c.callTime>=@dateFrom
        and c.callTime<=@dateTo
        and (CONTAINS(c.phoneNums_s, @name)
            or CONTAINS(c.names_s, @name)
            or CONTAINS(c.xNums_s, @name))
    order by c.callTime desc";
var queryIterator = container.GetItemQueryIterator<CallIndex>(new QueryDefinition(query)
    .WithParameter("@ownerId", "62371255008")
    .WithParameter("@name", "harr")
    .WithParameter("@dateFrom", dateFrom) // 5/30/2020 5:00:00 AM +00:00
    .WithParameter("@dateTo", dateTo) // 8/29/2020 4:59:59 AM +00:00
    .WithParameter("@xnum", null), requestOptions: options, continuationToken: null);
if (queryIterator.HasMoreResults)
{
var feed = queryIterator.ReadNextAsync().Result;
model.calls = feed.ToList(); //feed.Resource is empty; feed.Count is 0;
model.CosmosContinuationToken = feed.ContinuationToken; //feed.ContinuationToken is populated with a large token value, indicating that there are more results, even though this fetch returned 0 items.
model.TotalRecords = feed.Count(); // 0
}
As you can see, even though I received 0 results, the continuation token indicates that there is more data there after this first request. And, after visually inspecting the data directly in the database (data explorer in the Azure portal), I see records that should match, but they are not found in this query. To further test, I ran the same exact query a few seconds later, and received results:
var query = @"
    select c.id,c.callTime,c.direction,c.action,c.result,c.duration,c.hasR,c.hasV,c.callersIndexed,c.callers,c.files
    from c
    where
        c.ownerId=@ownerId
        and c.callTime>=@dateFrom
        and c.callTime<=@dateTo
        and (CONTAINS(c.phoneNums_s, @name)
            or CONTAINS(c.names_s, @name)
            or CONTAINS(c.xNums_s, @name))
    order by c.callTime desc";
var queryIterator = container.GetItemQueryIterator<CallIndex>(new QueryDefinition(query)
    .WithParameter("@ownerId", "62371255008")
    .WithParameter("@name", "harr")
    .WithParameter("@dateFrom", dateFrom) // 5/30/2020 5:00:00 AM +00:00
    .WithParameter("@dateTo", dateTo) // 8/29/2020 4:59:59 AM +00:00
    .WithParameter("@xnum", null), requestOptions: options, continuationToken: null);
if (queryIterator.HasMoreResults)
{
var feed = queryIterator.ReadNextAsync().Result;
model.calls = feed.ToList(); //feed.Resource has 25 items; feed.Count is 25;
model.CosmosContinuationToken = feed.ContinuationToken; //feed.ContinuationToken is populated, but it is considerably smaller than the token I received from the first request.
model.TotalRecords = feed.Count(); // 25
}
This is the exact query as before, but this time the feed gave me the results I expected. This has happened more than once, and continues to happen intermittently. What gives with this? Is this a bug in Azure Cosmos? If so, it seems like a serious bug that breaks the very core functionality of Cosmos (and databases in general).
Or, is this expected? Is it possible that in the first query, I need to continue to ReadNextAsync until I get some results back using the continuation token?
Any help is appreciated, as this is breaking very basic functionality in my app.
Also, I would like to add that the data returned from the query has not been newly added between the times of my first query attempt, and my second query attempt. That data has been there for a while.
Your code is correct: you are expected to drain the query by checking HasMoreResults (although I would replace .Result with await to avoid a possible deadlock). What can happen in cross-partition queries is that you get an empty page if the first partitions checked for results have none.
Sometimes queries may have empty pages even when there are results on a future page. Reasons for this could be:
The SDK could be doing multiple network calls.
The query might be taking a long time to retrieve the documents.
Reference: https://learn.microsoft.com/azure/cosmos-db/troubleshoot-query-performance#common-sdk-issues
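To make the "keep reading past empty pages" advice concrete, here is a minimal sketch that uses an in-memory queue as a stand-in for the feed iterator. HasMoreResults, ReadNextAsync and ReadNextNonEmptyAsync are local functions defined only for this example (they mimic the iterator's shape, they are not the SDK's members), and the page contents are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Stand-in for a feed iterator: each entry is one page, and some pages are empty.
var pages = new Queue<List<int>>(new[]
{
    new List<int>(),
    new List<int>(),
    new List<int> { 1, 2, 3 },
    new List<int> { 4 }
});

bool HasMoreResults() => pages.Count > 0;
Task<List<int>> ReadNextAsync() => Task.FromResult(pages.Dequeue());

// Keep reading until a page contains items or the query is fully drained.
async Task<List<int>> ReadNextNonEmptyAsync()
{
    while (HasMoreResults())
    {
        List<int> page = await ReadNextAsync();
        if (page.Count > 0) return page;
    }
    return new List<int>();
}

List<int> first = await ReadNextNonEmptyAsync();
Console.WriteLine(first.Count);
```

With a single `if (HasMoreResults)` check, the caller in this example would have seen the first empty page and stopped; the loop is what surfaces the results on a later page.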
Try using the code below:
Query Cosmos DB method:
public async Task<DocDbQueryResult> QueryCollectionBaseWithPagingInternalAsync(FeedOptions feedOptions, string queryString, IDictionary<string, object> queryParams, string collectionName)
{
string continuationToken = feedOptions.RequestContinuation;
List<JObject> documents = new List<JObject>();
IDictionary<string, object> properties = new Dictionary<string, object>();
int executionCount = 0;
double requestCharge = default(double);
double totalRequestCharge = default(double);
do
{
feedOptions.RequestContinuation = continuationToken;
var query = this.documentDbClient.CreateDocumentQuery<JObject>(
UriFactory.CreateDocumentCollectionUri(this.databaseName, collectionName),
new SqlQuerySpec
{
QueryText = queryString,
Parameters = ToSqlQueryParamterCollection(queryParams),
},
feedOptions)
.AsDocumentQuery();
var response = await query.ExecuteNextAsync<JObject>().ConfigureAwait(false);
documents.AddRange(response.AsEnumerable());
executionCount++;
requestCharge = executionCount == 1 ? response.RequestCharge : requestCharge;
totalRequestCharge += response.RequestCharge;
continuationToken = response.ResponseContinuation;
}
while (!string.IsNullOrWhiteSpace(continuationToken) && documents.Count < feedOptions.MaxItemCount);
var pagedDocuments = documents.Take(feedOptions.MaxItemCount.Value);
var result = new DocDbQueryResult
{
ResultSet = new JArray(pagedDocuments),
TotalResults = Convert.ToInt32(pagedDocuments.Count()),
ContinuationToken = continuationToken
};
// if query params are not null, use existing query params also to be passed as properties.
if (queryParams != null)
{
properties = queryParams;
}
properties.Add("TotalRequestCharge", totalRequestCharge);
properties.Add("ExecutionCount", executionCount);
return result;
}
ToSqlQueryParamterCollection method:
private static SqlParameterCollection ToSqlQueryParamterCollection(IDictionary<string, object> queryParams)
{
var coll = new SqlParameterCollection();
if (queryParams != null)
{
foreach (var paramKey in queryParams.Keys)
{
coll.Add(new SqlParameter(paramKey, queryParams[paramKey]));
}
}
return coll;
}

Redis Optimization with .NET, and a concrete example of How to Store and get an element from Hash

I have more than 15,000 POCO elements stored in a Redis List. I'm using ServiceStack to save and get them. However, I'm not happy with the response times when I load them into a grid. From what I have read, it would be better to store these objects in a hash, but unfortunately I could not find a good example for my case :(
This is the method I use, in order to get them into my grid
public IEnumerable<BookingRequestGridViewModel> GetAll()
{
try
{
var redisManager = new RedisManagerPool(Global.RedisConnector);
using (var redis = redisManager.GetClient())
{
var redisEntities = redis.As<BookingRequestModel>();
var result =redisEntities.Lists["BookingRequests"].GetAll().Select(z=> new BookingRequestGridViewModel
{
CreatedDate =z.CreatedDate,
DropOffBranchName =z.DropOffBranch !=null ? z.DropOffBranch.Name : string.Empty,
DropOffDate =z.DropOffDate,
DropOffLocationName = z.DropOffLocation != null ? z.DropOffLocation.Name : string.Empty,
Id =z.Id.Value,
Number =z.Number,
PickupBranchName =z.PickUpBranch !=null ? z.PickUpBranch.Name :string.Empty,
PickUpDate =z.PickUpDate,
PickupLocationName = z.PickUpLocation != null ? z.PickUpLocation.Name : string.Empty
}).OrderBy(z=>z.Id);
return result;
}
}
catch (Exception ex)
{
return null;
}
}
Note that I use redisEntities.Lists["BookingRequests"].GetAll(), which is causing the performance issues. (I would like to use just redisEntities.Lists["BookingRequests"], but then I lose the latest updates made by editing in the grid.)
I would like to know whether saving them in a list is a good approach, as it is very important for me to have a fast grid (paging currently takes 1 second, which is huge).
Please advise!
Firstly, you should not create a new Redis client manager such as RedisManagerPool on each call; there should be a single RedisManagerPool instance in your app from which all clients are resolved.
But otherwise I would rethink your data access strategy; downloading 15K items in one batch is not ideal. You can create indexes by storing ids in Sets, or you could store items in a sorted set with a score that you can page against, like an incrementing id, e.g.:
var redisEntities = redis.As<BookingRequestModel>();
var bookings = redisEntities.SortedSets["bookings"];
foreach (var item in new BookingRequestModel[0])
{
redisEntities.AddItemToSortedSet(bookings, item, item.Id);
}
That way you will be able to fetch them in batches, e.g:
var batch = bookings.GetRangeByLowestScore(fromId, toId, skip, take);
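To show what such a batch read returns, here is a small in-memory sketch that does not touch Redis at all. The SortedDictionary stands in for the sorted set, and RangeByLowestScore is a hypothetical helper; treating both score bounds as inclusive, with skip/take applied after the score filter, is an assumption of this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// In-memory stand-in for a sorted set: score (here the booking id) -> item.
var bookings = new SortedDictionary<double, string>();
for (int id = 1; id <= 20; id++) bookings[id] = $"booking-{id}";

// Hypothetical helper mimicking a range-by-score read: filter by score, then skip/take.
List<string> RangeByLowestScore(double fromScore, double toScore, int skip, int take) =>
    bookings.Where(kv => kv.Key >= fromScore && kv.Key <= toScore)
            .Select(kv => kv.Value)
            .Skip(skip)
            .Take(take)
            .ToList();

List<string> batch = RangeByLowestScore(5, 15, skip: 2, take: 3);
Console.WriteLine(string.Join(",", batch));
```

Because the score is the incrementing id, a grid page maps to a narrow score range or a skip/take window, so only that slice is fetched instead of all 15K items.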

Issues querying on a partitioned CosmosDB collection

I am trying to do a cross-partition query on Azure CosmosDB without a partition key. The throughput is set to 4000 RU/s, so I get 250 RU/s per partition key range.
My CosmosDB collection has about 1 million documents and is 70 GB in total. They are spread evenly across approximately 40,000 logical partitions, and the JSON documents are on average 100 KB in size. This is what the structure of my JSON documents looks like:
{
    "ArrayOfObjects": [
        {
            // other properties omitted for brevity
            "SubId": "ed2a49fb-51d4-45b4-9690-df0721d6a32f"
        },
        {
            "SubId": "35c87833-9bea-4151-86da-4d9c482ae1fe"
        }
    ],
    "ParitionKey": "b42"
}
This is how I am querying currently without a partition key:
public async Task<ResponseModel> GetBySubId(string subId)
{
var collectionId = _cosmosClient.CollectionId;
var query = $@"SELECT * FROM {collectionId} c
    WHERE ARRAY_CONTAINS(c.ArrayOfObjects, {{'SubId': '{subId}'}}, true)";
var feedOptions = new FeedOptions { EnableCrossPartitionQuery = true };
var docQuery = _cosmosClient.Client.CreateDocumentQuery(
_collectionUri,
query,
feedOptions)
.AsDocumentQuery();
var results = new List<ResponseModel>();
while (docQuery.HasMoreResults)
{
var executedQuery = await docQuery.ExecuteNextAsync<ResponseModel>();
if (executedQuery.Count != 0)
{
results.AddRange(executedQuery.ToList());
}
}
if (results.Count == 0)
{
return null;
}
return results.FirstOrDefault();
}
I am expecting to be able to retrieve the document via one of its SubIds right after inserting it. What actually happens is that the query cannot find the document and returns null, even after it finishes execution by draining all continuation tokens. The issue is intermittent and inconsistent: sometimes the document can be retrieved right after it is inserted, other times not.
For those documents that are failing to be retrieved after being inserted, if you wait some time (a couple of minutes usually) and repeat the query with the same SubId it is able to then retrieve the document. There seems to be a delay.
I have checked the CosmosDB metrics in the Azure portal. They indicate that I have not exceeded the provisioned RU/s per partition at all and that none of my requests have been rate limited (HTTP 429).
Given the above, why am I still seeing issues with cross-partition querying even when there is enough throughput provisioned?

Getting more than 100 documents back from ExecuteStoredProcedureAsync

I have a CosmosDB instance that is using the SQL / DocumentDB interface. I am accessing it via the .NET SDK.
I have a stored procedure that I call with ExecuteStoredProcedureAsync, but I can only get a maximum of 100 documents back. I know this is the default. Can I change it?
The optional parameter to ExecuteStoredProcedureAsync is a RequestOptions object. The RequestOptions doesn't have properties for MaxItemCount or continuation tokens.
You need to change the SP itself to adjust the number of records you'd like to return. Here is a complete example with skip/take logic implemented in the SP:
function storedProcedure(continuationToken, take){
    var filterQuery = "SELECT * FROM ...";
    var accept = __.queryDocuments(__.getSelfLink(), filterQuery, {pageSize: take, continuation: continuationToken},
        function (err, documents, responseOptions) {
            if (err) throw new Error("Error" + err.message);
            __.response.setBody({
                result: documents,
                continuation: responseOptions.continuation
            });
        });
}
Here is the corresponding C# code:
string continuationToken = null;
int pageSize = 500;
do
{
var r = await client.ExecuteStoredProcedureAsync<dynamic>(
UriFactory.CreateStoredProcedureUri(DatabaseId, CollectionId, "SP_NAME"),
new RequestOptions { PartitionKey = new PartitionKey("...") },
continuationToken, pageSize);
var documents = r.Response.result;
// processing documents ...
// 'dynamic' could be easily substituted with a class that will cater your needs
continuationToken = r.Response.continuation;
}
while (!string.IsNullOrEmpty(continuationToken));
As you can see, there is a parameter that controls the number of records to send back - pageSize. As you've noticed, pageSize is 100 by default. In case you need to return all at once, specify -1.
"The RequestOptions doesn't have properties for MaxItemCount or continuation tokens."
MaxItemCount is a property of FeedOptions.
The ExecuteStoredProcedureAsync method does not limit the number of entries returned. The key is the query operation inside the stored procedure, which sets the maximum number of entries you want to return.
Please refer to the sample stored procedure code below:
function sample(prefix) {
var collection = getContext().getCollection();
var isAccepted = collection.queryDocuments(
collection.getSelfLink(),
'SELECT * FROM root r',
{ pageSize: 1000 },
function (err, feed, options) {
if (err) throw err;
if (!feed || !feed.length) {
var response = getContext().getResponse();
response.setBody('no docs found');
}
else {
var response = getContext().getResponse();
var body = "";
for(var i=0 ; i<feed.length;i++){
body +="{"+feed[i].id+"}";
}
response.setBody(JSON.stringify(body));
}
});
if (!isAccepted) throw new Error('The query was not accepted by the server.');
}
