Getting multiple entities using Azure TableStorage over multiple partitions - c#

I'm using Azure.Data.Tables (12.6.1) and I need to query a single record from multiple partitions of a single table (so the result would be multiple records, 1 from each partition). Each entity needs to be looked up by its partition key and row key - for a single TableClient.GetEntity() call this would be a point query.
After reading the documentation I'm unsure whether it's efficient to call TableClient.QueryAsync() with multiple partition key / row key pairs, and the search results I found provide contradictory suggestions.
Is it efficient to do this (for up to ~50 partition key / row key combinations), or is it better to just call GetEntity() one by one for each entity?
var filter = "(PartitionKey eq 'p1' and RowKey eq 'r1') or " +
             "(PartitionKey eq 'p2' and RowKey eq 'r2') or ..."; // OData operators are lowercase
var results = tableClient.QueryAsync<TableEntity>(filter, 500, null, cancelToken);

I don't know if there is a definitive answer here, as it probably depends on your specific requirements. I would suggest testing the different options and tuning accordingly.
Just for reference, here is a general guide on query performance for tables: https://learn.microsoft.com/azure/storage/tables/table-storage-design-for-query

I settled on parallelizing point queries for this scenario, and it has given good results. I have heavy-burst read scenarios where I may have tens or hundreds of thousands of lookups to do against hundreds of millions of records. I prefer that over a query with a series of ORs, as those tended to give worse throughput (I don't have any stats to hand right now).
For me, parallelization happens through two means:
1) lower level: awaiting a batch of Tasks, each making an individual point query (see the sketch below)
2) higher level: architecting a particularly heavy workload to scale out over multiple instances, each making parallel queries via 1)
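For reference, here is a minimal sketch of the lower-level approach with Azure.Data.Tables; the key pairs, the TableEntity type and the cancelToken are placeholders taken from the question, not a definitive implementation:
// Sketch: one GetEntityAsync point query per (PartitionKey, RowKey) pair, awaited as a batch.
// Note: GetEntityAsync throws a RequestFailedException (404) if an entity is missing.
var keys = new (string Pk, string Rk)[] { ("p1", "r1"), ("p2", "r2") /* ... up to ~50 pairs */ };

var tasks = keys
    .Select(k => tableClient.GetEntityAsync<TableEntity>(k.Pk, k.Rk, cancellationToken: cancelToken))
    .ToList();

var responses = await Task.WhenAll(tasks);
var entities = responses.Select(r => r.Value).ToList();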

Related

Azure Table Storage multi row query performance

We have had an issue in a service using Azure Table Storage where the queries sometimes take multiple seconds (3 to 30 seconds). This happens daily, but only for some of the queries. We do not have a huge load on the service or the table storage (a few hundred calls per hour), yet the table storage is still not performing.
The slow queries are all filter queries that should return at most 10 rows. I have structured the filters so that a partition key and row key are always joined by an and, followed by the next pair of partition and row keys after an or operator:
(partitionKey1 and RowKey1) or (partitionKey2 and rowKey2) or (partitionKey3 and rowKey3)
So currently I am working on the premise that I need to split the query into separate queries. This was somewhat verified with a Python script I wrote: when I repeat the same query either as a single combined query (joined with or's and expecting multiple rows as the result) or split into multiple queries executed in separate threads, I see the combined query slow down every now and then.
import time
import threading
from azure.cosmosdb.table.tableservice import TableService
from azure.cosmosdb.table.models import Entity

############################################################################
# Script for querying data from Azure Table Storage or the Cosmos DB Table API.
# A SAS token needs to be generated to use this script, and a table with data
# needs to exist.
#
# Warning: extensive use of this script may burden the table performance,
# so use with care.
#
# PIP requirements:
# - requires azure-cosmosdb-table to be installed
#   * run: 'pip install azure-cosmosdb-table'
############################################################################

dateTimeSince = '2019-06-12T13:16:45.446Z'
sasToken = 'SAS_TOKEN_HERE'
tableName = 'TABLE_NAME_HERE'
table_service = TableService(account_name="ACCOUNT_NAME_HERE", sas_token=sasToken)
tableFilter = "(PartitionKey eq 'http%3a%2f%2fsome_website.azurewebsites.net%2fApiName_ed6d31b0') and (RowKey eq 'ed6d31b0-d2a3-4f18-9d16-7f72cbc88cb3') or (PartitionKey eq 'http%3a%2f%2fsome_website.azurewebsites.net%2fApiName_9be86f34') and (RowKey eq '9be86f34-865b-4c0f-8ab0-decf928dc4fc') or (PartitionKey eq 'http%3a%2f%2fsome_website.azurewebsites.net%2fApiName_97af3bdc') and (RowKey eq '97af3bdc-b827-4451-9cc4-a8e7c1190d17') or (PartitionKey eq 'http%3a%2f%2fsome_website.azurewebsites.net%2fApiName_9d557b56') and (RowKey eq '9d557b56-279e-47fa-a104-c3ccbcc9b023') or (PartitionKey eq 'http%3a%2f%2fsome_website.azurewebsites.net%2fApiName_e251a31a') and (RowKey eq 'e251a31a-1aaa-40a8-8cde-45134550235c')"
resultDict = {}

# Do separate queries
filters = tableFilter.split(" or ")
threads = []

def runQueryPrintResult(filter):
    result = table_service.query_entities(table_name=tableName, filter=filter)
    item = result.items[0]
    resultDict[item.RowKey] = item

# Loop where:
# - Step 1: the test is run with the tableFilter query split across multiple threads
#   * returns a single row per query
# - Step 2: the tableFilter query is run as a single query
# - Press enter to repeat the two query tests
while 1:
    start2 = time.time()
    for filter in filters:
        x = threading.Thread(target=runQueryPrintResult, args=(filter,))
        x.start()
        threads.append(x)
    for x in threads:
        x.join()
    end2 = time.time()
    print("Time elapsed with multi threaded implementation: {}".format(end2 - start2))

    # Do single query
    start1 = time.time()
    listGenerator = table_service.query_entities(table_name=tableName, filter=tableFilter)
    end1 = time.time()
    print("Time elapsed with single query: {}".format(end1 - start1))

    counter = 0
    allVerified = True
    for item in listGenerator:
        if resultDict[item.RowKey]:
            counter += 1
        else:
            allVerified = False
    if len(listGenerator.items) != len(resultDict):
        allVerified = False
    print("table item count since x: " + str(counter))
    if allVerified:
        print("Both queries returned same amount of results")
    else:
        print("Result count does not match, single threaded count={}, multithreaded count={}".format(
            len(listGenerator.items), len(resultDict)))
    input('Press enter to retry test!')
Here is an example output from the python code:
Time elapsed with multi threaded implementation: 0.10776209831237793
Time elapsed with single query: 0.2323908805847168
table item count since x: 5
Both queries returned same amount of results
Press enter to retry test!
Time elapsed with multi threaded implementation: 0.0897986888885498
Time elapsed with single query: 0.21547174453735352
table item count since x: 5
Both queries returned same amount of results
Press enter to retry test!
Time elapsed with multi threaded implementation: 0.08280491828918457
Time elapsed with single query: 3.2932426929473877
table item count since x: 5
Both queries returned same amount of results
Press enter to retry test!
Time elapsed with multi threaded implementation: 0.07794523239135742
Time elapsed with single query: 1.4898555278778076
table item count since x: 5
Both queries returned same amount of results
Press enter to retry test!
Time elapsed with multi threaded implementation: 0.07962584495544434
Time elapsed with single query: 0.20011520385742188
table item count since x: 5
Both queries returned same amount of results
Press enter to retry test!
The service we have problems with is implemented in C#, though, and I have yet to reproduce the results from the Python script on the C# side. There I seem to get worse performance when splitting the query into multiple separate queries versus using a single filter query (returning all the required rows).
So doing the following multiple times and awaiting them all to complete seems to be slower:
TableOperation getOperation =
    TableOperation.Retrieve<HqrScreenshotItemTableEntity>(partitionKey, id.ToString());
TableResult result = await table.ExecuteAsync(getOperation);
than doing it all in a single query:
private IEnumerable<MyTableEntity> GetBatchedItemsTableResult(Guid[] ids, string applicationLink)
{
    var table = InitializeTableStorage();

    TableQuery<MyTableEntity> itemsQuery =
        new TableQuery<MyTableEntity>().Where(TableQueryConstructor(ids, applicationLink));

    IEnumerable<MyTableEntity> result = table.ExecuteQuery(itemsQuery);
    return result;
}

public string TableQueryConstructor(Guid[] ids, string applicationLink)
{
    var fullQuery = new StringBuilder();
    foreach (var id in ids)
    {
        // Encode the link before setting it as the partition key, as REST GET requests
        // do not accept non-encoded URL params by default.
        var partitionKey = HttpUtility.UrlEncode(applicationLink);

        // Create a query for a single row in the requested partition.
        string queryForRow = TableQuery.CombineFilters(
            TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, partitionKey),
            TableOperators.And,
            TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.Equal, id.ToString()));

        if (fullQuery.Length == 0)
        {
            // Append the query for the first row.
            fullQuery.Append(queryForRow);
        }
        else
        {
            // Append queries for subsequent rows, joined with the or operator
            // to keep the conditions independent of each other.
            fullQuery.Append($" {TableOperators.Or} ");
            fullQuery.Append(queryForRow);
        }
    }
    return fullQuery.ToString();
}
The test case used with the C# code is quite different from the Python test, though. In C# I am querying 2000 rows out of roughly 100000 rows. If the data is queried in batches of 50 rows, the latter filter query beats the single-row queries run as 50 tasks.
Maybe I should just repeat the test I did with Python as a C# console app to see whether the .NET client API behaves the same way as Python performance-wise.
I think you should use the multi-threaded implementation, since it consists of multiple point queries. Doing it all in a single query probably results in a table scan. As the official doc mentions:
Using an "or" to specify a filter based on RowKey values results in a partition scan and is not treated as a range query. Therefore, you should avoid queries that use filters such as: $filter=PartitionKey eq 'Sales' and (RowKey eq '121' or RowKey eq '322')
You might think the example above is two point queries, but it actually results in a partition scan.
To me the answer here seems to be that query execution on table storage has not been optimized to work with the OR operator as you would expect. A query is not handled as a point query when it combines point queries with the OR operator.
This can be reproduced in Python, C# and Azure Storage Explorer: in all of them, if you combine point queries with OR, it can be 10x slower (or even more) than doing separate point queries that each return only one row.
So the most efficient way to get a number of rows whose partition and row keys are known is to do them all as separate async queries with TableOperation.Retrieve (in C#); a sketch follows below. Using TableQuery is highly inefficient and does not produce results anywhere near what the performance scalability targets for Azure Table Storage lead you to expect. The scalability targets say, for example: "Target throughput for a single table partition (1 KiB entities): up to 2,000 entities per second". And here I was not even able to be served 5 rows per second, although all rows were in different partitions.
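For illustration, here is a minimal sketch of that separate-point-query approach, reusing the table, entity type and key material from the snippets above (the method name and return shape are assumptions):
// Sketch: one Retrieve point query per (partitionKey, rowKey) pair, all awaited in parallel.
private async Task<List<HqrScreenshotItemTableEntity>> GetItemsByPointQueriesAsync(
    CloudTable table, Guid[] ids, string applicationLink)
{
    // Same encoding as in TableQueryConstructor above.
    var partitionKey = HttpUtility.UrlEncode(applicationLink);

    var tasks = ids
        .Select(id => table.ExecuteAsync(
            TableOperation.Retrieve<HqrScreenshotItemTableEntity>(partitionKey, id.ToString())))
        .ToList();

    TableResult[] results = await Task.WhenAll(tasks);
    return results
        .Select(r => r.Result as HqrScreenshotItemTableEntity)
        .Where(e => e != null) // missing rows come back with a null Result
        .ToList();
}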
This limitation in query performance is not stated very clearly anywhere in the documentation or performance optimization guides, but it can be understood from these lines in the Azure storage performance checklist:
Querying
This section describes proven practices for querying the table service.
Query scope
There are several ways to specify the range of entities to query. The following is a discussion of the uses of each.
In general, avoid scans (queries larger than a single entity), but if you must scan, try to organize your data so that your scans retrieve the data you need without scanning or returning significant amounts of entities you don't need.
Point queries
A point query retrieves exactly one entity. It does this by specifying both the partition key and row key of the entity to retrieve. These queries are efficient, and you should use them wherever possible.
Partition queries
A partition query is a query that retrieves a set of data that shares a common partition key. Typically, the query specifies a range of row key values or a range of values for some entity property in addition to a partition key. These are less efficient than point queries, and should be used sparingly.
Table queries
A table query is a query that retrieves a set of entities that does not share a common partition key. These queries are not efficient and you should avoid them if possible.
So "A point query retrieves exactly one entity" and "Use point queries when ever possible". Since I had split the data to partitions, it may have been handled as table query: "A table query is a query that retrieves a set of entities that does not share a common partition key". This although the query combined set of point queries as it listed both partition and row keys for all entities that were expected. But since the combined query was not retriewing only one query it cannot be expected to perform as point query (or set of point queries).
Posting as an answer since it was getting too big for a comment.
Can you try changing your query to something like the following:
(PartitionKey eq 'http%3a%2f%2fsome_website.azurewebsites.net%2fApiName_ed6d31b0' and RowKey eq 'ed6d31b0-d2a3-4f18-9d16-7f72cbc88cb3') or (PartitionKey eq 'http%3a%2f%2fsome_website.azurewebsites.net%2fApiName_9be86f34' and RowKey eq '9be86f34-865b-4c0f-8ab0-decf928dc4fc') or (PartitionKey eq 'http%3a%2f%2fsome_website.azurewebsites.net%2fApiName_97af3bdc' and RowKey eq '97af3bdc-b827-4451-9cc4-a8e7c1190d17') or (PartitionKey eq 'http%3a%2f%2fsome_website.azurewebsites.net%2fApiName_9d557b56' and RowKey eq '9d557b56-279e-47fa-a104-c3ccbcc9b023') or (PartitionKey eq 'http%3a%2f%2fsome_website.azurewebsites.net%2fApiName_e251a31a' and RowKey eq 'e251a31a-1aaa-40a8-8cde-45134550235c')

Linq join three tables ordered by Timestamp

I have three tables:
Sites:
Id
Timezone (string)
SiteName
Vehicles:
Id
SiteId
Name
Positions:
Id
TimestampLocal (DateTimeOffset)
VehicleId
Data1
Data2
...
Data50
There are multiple positions for a single vehicle. The Positions table is very large (100+ million records).
I need to get the last position for each vehicle (by timestamp, as they can send old data) and its timezone, so I can do further data processing based on the timezone. Something like {PositionId, VehicleId, Timezone, Data1}
I have tried with:
var result =
    from ot in entities.Positions
    join v in entities.Vehicles on ot.VehicleId equals v.Id
    join s in entities.Sites on v.SiteId equals s.Id
    group ot by ot.VehicleId into grp
    select grp.OrderByDescending(g => g.TimestampLocal).FirstOrDefault();
then I process the data with:
foreach (var rr in result){... update Data1 field ... }
This gets the last values, but it brings back all the fields in Positions (a lot of data) and no Timezone. Also, the foreach part is very CPU intensive (as it probably materializes the data), hitting 100% CPU for a few seconds.
How can this be done in LINQ and still be lightweight for the DB and its transfers?
Check out the following; it is along the same lines as what you have done, but the result contains the Sites Timezone column as a projection, projected in the Join itself:
var result =
    S.Join(V, s1 => s1.Id, v1 => v1.SiteId, (s1, v1) => new { v1.Id, s1.Timezone })
     .Join(P, v1 => v1.Id, p1 => p1.VehicleId, (v1, p1) => new { p1, v1.Timezone })
     .GroupBy(g => g.p1.VehicleId)
     .Select(x => x.OrderByDescending(y => y.p1.TimestampLocal).FirstOrDefault())
     .Select(y => new { y.p1, y.Timezone });
Now, here are some important points related to the questions you have asked.
To reduce the number of columns fetched (since you may not want all of the Positions columns), the following needs to be done:
In this line - Join(P, v1 => v1.Id, p1 => p1.VehicleId, (v1, p1) => new { p1, v1.Timezone }) -
project the fields that result from the Join, something like:
new { p1.Id, p1.TimestampLocal, p1.VehicleId, p1.Data1, p1..., v1.Timezone }
which will return only the projected fields, but then the GroupBy would change to GroupBy(g => g.VehicleId).
The other option would be to change the GroupBy projection instead of the Join statement, as follows:
GroupBy(g => g.p1.VehicleId,
        g => new { g.p1.Id, g.p1.TimestampLocal, g.p1.VehicleId, g.p1.Data1, g.p1..., g.Timezone })
A complete sketch of the projected version follows below.
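For illustration, here is a minimal sketch of the projected version put together (assuming an EF/LINQ-to-Entities context named entities with Sites, Vehicles and Positions sets, and keeping only Data1 of the Data columns):
// Sketch: join, project only the needed columns, then take the latest position per vehicle.
// The projection keeps the generated SQL from pulling all 50 Data columns.
var lastPositions =
    entities.Sites
        .Join(entities.Vehicles, s => s.Id, v => v.SiteId, (s, v) => new { VehicleId = v.Id, s.Timezone })
        .Join(entities.Positions, v => v.VehicleId, p => p.VehicleId,
              (v, p) => new { p.Id, p.TimestampLocal, p.VehicleId, p.Data1, v.Timezone })
        .GroupBy(x => x.VehicleId)
        .Select(g => g.OrderByDescending(x => x.TimestampLocal).FirstOrDefault())
        .ToList();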
Now, regarding the remaining part of the process being CPU intensive and using 100% CPU, the following could be done to optimize it:
If each foreach loop iteration makes a network Update call, that is bound to make the process network intensive and thus slow. It would be preferable to make all the necessary changes in memory and then update the database in one go, though that would still be intensive if you have millions of records.
Even doing that for millions of records will never be a good idea, since it would be fairly network and CPU intensive and thus slow. Your options would be:
Chunk the work into smaller pieces; I am not sure anyone needs to update a million records in one shot, so make it multiple smaller updates executed one after another (triggered by the user), which will be much less taxing on the system resources (see the sketch after this list).
Bring a smaller dataset into memory: do the filtering at the database level using a parameter and pass only the data that needs modification into memory for the update.
Using projection as shown in the LINQ above, bring only the required columns; that reduces the overall memory footprint of the data and is bound to have an impact.
If the logic is such that the various updates are mutually exclusive, then do them using the Parallel API with a thread-safe structure; that ensures efficient use of all the cores and is faster, though CPU will spike to 100% for a fraction of the time of the non-parallel execution.
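As a rough sketch of the chunked-update option (the itemsToUpdate list, the chunk size and the Recalculate helper are hypothetical, and the exact save call depends on your data access layer):
// Sketch: apply the Data1 updates in chunks so each database round trip stays small.
const int chunkSize = 1000;
for (int i = 0; i < itemsToUpdate.Count; i += chunkSize)
{
    foreach (var item in itemsToUpdate.Skip(i).Take(chunkSize))
    {
        item.Data1 = Recalculate(item); // hypothetical per-row update
    }
    entities.SaveChanges(); // one round trip per chunk instead of per row
}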
Besides this, provide specific details so I can help you further; these are basic suggestions, and there's no golden rule for solving such optimization issues.

Is it possible to accelerate (dynamic) LINQ queries using GPU?

I have been searching for some days for solid information on the possibility of accelerating LINQ queries using a GPU.
Technologies I have "investigated" so far:
Microsoft Accelerator
Cudafy
Brahma
In short, would it even be possible to do in-memory filtering of objects on the GPU?
Let's say we have a list of objects and we want to filter on something like:
var result = myList.Where(x => x.SomeProperty == SomeValue);
Any pointers on this one?
Thanks in advance!
UPDATE
I'll try to be more specific about what I am trying to achieve :)
The goal is to use any technology that is able to filter a list of objects (ranging from ~50,000 to ~2,000,000 items) in the fastest way possible.
The operations I perform on the data once the filtering is done (sum, min, max, etc.) use the built-in LINQ methods and are already fast enough for our application, so that's not a problem.
The bottleneck is "simply" the filtering of data.
UPDATE
Just wanted to add that I have tested about 15 databases, including MySQL (checking a possible cluster approach / memcached solution), H2, HSQLDB, VelocityDB (currently investigating further), SQLite, MongoDB, etc., and NONE is good enough when it comes to the speed of filtering data (of course, the NoSQL solutions do not offer this quite like the SQL ones do, but you get the idea) and/or returning the actual data.
Just to summarize what I/we need:
A database which is able to sort data with 200 columns and about 250,000 rows in less than 100 ms.
I currently have a solution with parallelized LINQ which is able (on a specific machine) to spend only nanoseconds on each row when filtering AND processing the result!
So, we need something like sub-nanosecond filtering on each row.
Why does it seem that only in-memory LINQ is able to provide this?
Why would this be impossible?
Some figures from the logfile:
Total tid för 1164 frågor: 2579
This is Swedish and translates:
Total time for 1164 queries: 2579
Where the queries in this case are queries like:
WHERE SomeProperty = SomeValue
And those queries are all being done in parallel on 225,639 rows.
So, 225,639 rows are being filtered in memory 1,164 times in about 2.5 seconds.
That's roughly 9.5e-9 seconds per row, BUT that also includes the actual processing of the numbers! We do Count (not null), total count, Sum, Min, Max, Avg and Median. So we have 7 operations on these filtered rows.
So, we could say it's actually 7 times faster than the databases we've tried, since we do NOT do any aggregation in those cases!
So, in conclusion, why are the databases so poor at filtering data compared to in-memory LINQ filtering? Has Microsoft really done such a good job that it is impossible to compete with it? :)
It makes sense, though, that in-memory filtering should be faster, but I don't want a sense that it is faster. I want to know what is faster and, if possible, why.
I will answer definitively about Brahma since it's my library, but it probably applies to other approaches as well. The GPU has no knowledge of objects. Its memory is also almost completely separate from CPU memory.
If you do have a LARGE set of objects and want to operate on them, you can only pack the data you want to operate on into a buffer suitable for the GPU/API you're using and send it off to be processed.
Note that this will make two round trips over the CPU-GPU memory interface, so if you aren't doing enough work on the GPU to make it worthwhile, you'll be slower than if you simply used the CPU in the first place (like the sample above).
Hope this helps.
The GPU is really not intended for general purpose computing, especially with object-oriented designs like this, and filtering an arbitrary collection of data like this would really not be an appropriate fit.
GPU computations are great for things where you are performing the same operation on a large dataset - which is why things like matrix operations and transforms can be very nice. There, the data copying can be outweighed by the incredibly fast computational capabilities on the GPU....
In this case, you'd have to copy all of the data into the GPU to make this work, and restructure it into some form the GPU will understand, which would likely be more expensive than just performing the filter in software in the first place.
Instead, I would recommend looking at using PLINQ for speeding up queries of this nature. Provided your filter is thread safe (which it'd have to be for any GPU related work...) this is likely a better option for general purpose query optimization, as it won't require the memory copying of your data. PLINQ would work by rewriting your query as:
var result = myList.AsParallel().Where(x => x.SomeProperty == SomeValue);
If the predicate is an expensive operation, or the collection is very large (and easily partitionable), this can make a significant improvement to the overall performance when compared to standard LINQ to Objects.
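As a rough usage sketch tying this back to the workload in the question (myList, SomeProperty, SomeValue and a numeric Value property are placeholders), the filter can run in parallel while the existing aggregations stay as ordinary LINQ:
// Sketch: parallel filter, then the aggregations from the question on the filtered rows.
var filtered = myList
    .AsParallel()
    .Where(x => x.SomeProperty == SomeValue)
    .ToList();

var totalCount = filtered.Count;
var sum = filtered.Sum(x => x.Value);
var min = filtered.Min(x => x.Value);
var max = filtered.Max(x => x.Value);
var avg = filtered.Average(x => x.Value);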
GpuLinq
GpuLinq's main mission is to democratize GPGPU programming through LINQ. The main idea is that we represent the query as an expression tree and, after various transformations and optimizations, compile it into fast OpenCL kernel code. In addition, we provide a very easy-to-use API without the need to mess with the details of the OpenCL API.
https://github.com/nessos/GpuLinq
select *
from table1 -- contains 100k rows
left join table2 -- contains 1M rows
on table1.id1=table2.id2 -- this would run for ~100G times
-- unless they are cached on sql side
where table1.id between 1 and 100000 -- but this optimizes things (depends)
could be turned into
select id1 from table1 -- 400k bytes if id1 is 32 bit
-- no need to order
stored in memory
select id2 from table2 -- 4Mbytes if id2 is 32 bit
-- no need to order
stored in memory, both arrays sent to gpu using a kernel(cuda,opencl) like below
int i = get_global_id(0);   // to select an id2, we need a thread id
int selectedID2 = id2[i];
int summary__ = -1;
for (int j = 0; j < id1Length; j++)
{
    int selectedID1 = id1[j];
    summary__ = (selectedID2 == selectedID1 ? j : summary__); // no branching
}
summary[i] = summary__;     // accumulates the target indexings of the
                            // "on table1.id1=table2.id2" part.
On the host side, you can make
select * from table1 --- query3
and
select * from table2 --- query4
then use the id list from gpu to select the data
// x is table1 ' s data
myList.AsParallel().ForEach(x=>query3.leftjoindata=query4[summary[index]]);
The GPU code shouldn't take more than 50 ms on a GPU with constant memory, global broadcast ability and some thousands of cores.
If any trigonometric function is used for filtering, performance drops fast. Also, the left join's row counts give it O(m*n) complexity, so millions versus millions would be much slower. GPU memory bandwidth is important here.
Edit:
A single gpu.findIdToJoin(table1, table2, "id1", "id2") operation on my HD 7870 (1280 cores) and R7-240 (320 cores) with a "products" table (64k rows) and a "categories" table (64k rows) (left join filter) took 48 milliseconds with an unoptimized kernel.
ADO.NET's "nosql"-style LINQ join took more than 2000 ms with only 44k products and a 4k-category table.
Edit-2:
A left join with a string search condition gets 50 to 200x faster on the GPU when the tables grow to thousands of rows, each having at least hundreds of characters.
The simple answer for your use case is no.
1) There's no solution for that kind of workload even in raw LINQ to Objects, much less in something that would replace your database.
2) Even if you were fine with loading the whole data set at once (this takes time), it would still be much slower: GPUs have high throughput, but access to them is high latency. So if you're looking for "very" fast solutions, GPGPU is often not the answer, as just preparing/sending the workload and getting back the results will be slow, and in your case it would probably need to be done in chunks too.

NHibernate - Log items that appear in a search result

I am using NHibernate in an MVC 2.0 application. Essentially, I want to keep track of the number of times each product shows up in a search result. For example, when somebody searches for a widget, the product named WidgetA will show up on the first page of the search results. At this point I will increment a field in the database to reflect that it appeared as part of a search result.
While this is straightforward, I am concerned that the inserts themselves will greatly slow down the search result. I would like to batch my statements together, but it seems that coupling my inserts with my select may be counterproductive. Has anyone tried to accomplish this in NHibernate and, if so, are there any standard patterns for completing this kind of operation?
Interesting question!
Here's a possible solution:
var searchResults = session.CreateCriteria<Product>()
    //your query parameters here
    .List<Product>();

session.CreateQuery(@"update Product set SearchCount = SearchCount + 1
                      where Id in (:productIds)")
       .SetParameterList("productIds", searchResults.Select(p => p.Id).ToList())
       .ExecuteUpdate();
Of course you can do the search with Criteria, HQL, SQL, Linq, etc.
The update query is a single round trip for all the objects, so the performance impact should be minimal.

How can I speed up my pagination code in ASP.NET MVC with Azure?

I'm using ASP.NET MVC and Azure Table Storage in the local development fabric. My pagination code is very slow when working with a large result set:
var PageSize = 25;

var qResult2 = from c in svc.CreateQuery<SampleEntity>(sampleTableName)
               where c.PartitionKey == "samplestring"
               select c;

TableStorageDataServiceQuery<SampleEntity> tableStorageQuery =
    new TableStorageDataServiceQuery<SampleEntity>
        (qResult2 as DataServiceQuery<SampleEntity>);

var result = tableStorageQuery.ExecuteAllWithRetries()
                              .Skip((page - 1) * PageSize)
                              .Take(PageSize);

var numberOfEntities = tableStorageQuery.ExecuteAllWithRetries().Count();

ViewData["TotalPages"] = (int)Math.Ceiling((double)numberOfEntities / PageSize);
ViewData["CurrentPage"] = page;

return View(result);
The ViewData is used by the View to calculate paging links using code from Sanderson's MVC book. For an Azure Table with 1000+ entities, this is very slow. For starters, "Count" takes quite a long time to calculate the total number of entities. If I'm reading my LINQ book correctly, this is because the query doesn't implement ICollection. The book is "Pro LINQ" by Joseph Rattz.
Even if I set "numberOfEntities" to the known total (e.g. 1500), the paging is still slow for pages above 10. I'm guessing that .Skip and/or .Take are slow. Also, I call ExecuteAllWithRetries() twice, and that can't be helping if in fact Azure is queried twice.
What strategy should I follow for paging through large datasets with ASP.NET MVC and Azure?
EDIT: I don't need to know the exact total number of pages.
Skip and Take aren't the problem here - they will be executed against the IEnumerable, which will already be in memory and thus very quick.
ExecuteAllWithRetries is likely to be the culprit here - you're basically retrieving all of the entities in the partition from the remote storage in this call, which will result in a very large payload.
Pagination in the manner you're showing is quite difficult in Table Storage. Here are a few issues:
The only order that's guaranteed is the PartitionKey/RowKey order, so you need to design your RowKeys with this in mind.
You can perform the Take in the query (ie, your qResult2), so this will reduce the number of entities going over the wire.
To get Skip-like functionality, you'll need to use a comparison operator. You'll need to know where you are in the result set and query all RowKeys above that value (ie, add something like where c.RowKey > [lastRowKey] to your query; see the sketch after this list).
There's no way to retrieve a count without keeping track of it yourself (or retrieving the entire table like you're already doing). Depending on your design, you could store the count along with each entity (ie, use an incrementing value) - but just make sure you keep track of concurrent edit conflicts, etc. If you do keep track of the count with each entity, then you can also perform your Skip using this as well. Another option would be to store the count in a single value in another entity (you could use the same table to ensure transactional behaviour). You could actually combine these approaches too (store the count in a single entity, to get the optimistic concurrency, and also store it in each entity so you know where it lies).
An alternative would be, if possible, to get rid of the count altogether. You'll notice a couple of large scalable sites do this - they don't provide an exact list of how many pages there are, but they might let you go a couple of pages ahead/back. This basically eliminates the need for count - you just need to keep track of the RowKeys for the next/prev pages.
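For illustration, here is a rough sketch of that continuation-style paging under the same setup as the question (lastRowKey, carried over from the previous page, is an assumption; the first page would simply omit that condition):
// Sketch: fetch one page of PageSize entities after lastRowKey on the server side,
// instead of pulling the whole partition and paging with Skip/Take in memory.
var PageSize = 25;

var pageQuery = (from c in svc.CreateQuery<SampleEntity>(sampleTableName)
                 where c.PartitionKey == "samplestring"
                       && c.RowKey.CompareTo(lastRowKey) > 0 // the "Skip": everything after the previous page
                 select c)
                .Take(PageSize); // only one page goes over the wire

var page = pageQuery.ToList();

// Remember where this page ended so the next request can continue from here.
var nextLastRowKey = page.Count > 0 ? page[page.Count - 1].RowKey : lastRowKey;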
