Assume we have a dataset like this:
Name             Population  Capital
London           8,799,800   true
Barcelona        1,620,343   false
Luxembourg City  128,512     true
When working with SQL databases, the order in which you write your WHERE, ORDER BY and LIMIT clauses matters. For instance:
SELECT *
FROM Cities
WHERE Capital = true
ORDER BY Population DESC
LIMIT 2
would return London and Luxembourg City. Whereas
SELECT *
FROM Cities
ORDER BY Population DESC
LIMIT 2
WHERE Capital = true
would return only London.
(The second statement is actually invalid SQL, but that's a separate issue.)
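To make the valid form concrete, here is a quick sketch of the same query against the sample data, using Python's sqlite3 (SQLite stands in for whichever SQL database you use):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Cities (Name TEXT, Population INTEGER, Capital BOOLEAN)")
conn.executemany(
    "INSERT INTO Cities VALUES (?, ?, ?)",
    [("London", 8799800, True),
     ("Barcelona", 1620343, False),
     ("Luxembourg City", 128512, True)],
)

# Filter first, then order, then limit -- the only clause order SQL accepts.
# (SQLite stores booleans as 0/1, hence Capital = 1.)
rows = conn.execute(
    "SELECT Name FROM Cities WHERE Capital = 1 "
    "ORDER BY Population DESC LIMIT 2"
).fetchall()
print([name for (name,) in rows])  # ['London', 'Luxembourg City']
```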
In Firestore (I'm using the C# SDK), we can set the query arguments and then get the snapshot. We could do:
Query query = citiesRef
.WhereEqualTo("Capital", true)
.OrderByDescending("Population")
.Limit(2);
QuerySnapshot querySnapshot = await query.GetSnapshotAsync();
We could also run this:
Query query = citiesRef
.OrderByDescending("Population")
.Limit(2)
.WhereEqualTo("Capital", true);
QuerySnapshot querySnapshot = await query.GetSnapshotAsync();
If Firestore were to behave the way a SQL database does, these two statements could return different results. However, when we run them, they seem to return the same results.
Are the two Firestore queries equivalent?
Yes, they are equivalent. When you specify a limit, you are always limiting the final result set, after all filters and ordering have been applied, so that the results can be paginated by the client. The limit is never applied to the data set before the other operations.
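The chaining order not mattering falls out of the query being declarative: each method only records a constraint on an immutable query object, and evaluation applies the constraints in a fixed order. A minimal pure-Python sketch of that idea (an illustration only, not the actual SDK implementation):

```python
# Minimal sketch (NOT the Firestore SDK) of a declarative query builder:
# each method records a constraint, and get() always applies
# filter -> order -> limit, regardless of the order of the calls.
class Query:
    def __init__(self, rows, flt=None, order=None, limit=None):
        self._rows, self._flt, self._order, self._limit = rows, flt, order, limit

    def where_equal_to(self, field, value):
        return Query(self._rows, (field, value), self._order, self._limit)

    def order_by_descending(self, field):
        return Query(self._rows, self._flt, field, self._limit)

    def limit(self, n):
        return Query(self._rows, self._flt, self._order, n)

    def get(self):
        rows = self._rows
        if self._flt:
            field, value = self._flt
            rows = [r for r in rows if r[field] == value]
        if self._order:
            rows = sorted(rows, key=lambda r: r[self._order], reverse=True)
        if self._limit is not None:
            rows = rows[:self._limit]
        return rows

cities = [
    {"Name": "London", "Population": 8799800, "Capital": True},
    {"Name": "Barcelona", "Population": 1620343, "Capital": False},
    {"Name": "Luxembourg City", "Population": 128512, "Capital": True},
]

q1 = Query(cities).where_equal_to("Capital", True).order_by_descending("Population").limit(2)
q2 = Query(cities).order_by_descending("Population").limit(2).where_equal_to("Capital", True)
assert q1.get() == q2.get()  # both: London, Luxembourg City
```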
I have a Clients table which has the following columns:
Id, FirstName, LastName, CurrencyId, Gender.
I want to select the Currency of the client whose Id is 10, so I am doing something like this:
var currencyId = Db.Clients.FirstOrDefault(c=>c.Id == 10)?.CurrencyId;
Does this line bring back all the properties from the database and select the currency in code, or does it execute something like this in the database:
SELECT currencyId FROM Client WHERE ID = 10
Or should I write the LINQ like this:
var currencyId = Db.Clients.Where(c => c.Id == 10).Select(c => c.CurrencyId).FirstOrDefault();
What's the difference between the two queries?
And what is the correct way to translate the above SQL query into a linq query?
I looked into it myself because I found the answer from most people questionable. I expect FirstOrDefault to materialize the result (you can also see from the return type that you are no longer working with a query object), so that would mean it queries for all properties.
This is unlike the second query, where you are still working with a query object when selecting the property you want; depending on the implementation, that can translate into selecting only the specific fields.
The following is an example of the queries generated using EF for two similar queries, where it shows both generating different queries: https://dotnetfiddle.net/5aFJAZ
In your first example, var currencyId = Db.Clients.FirstOrDefault(c => c.Id == 10)?.CurrencyId;, the query selects the entire object into memory and then returns the CurrencyId property from that in-memory object. As a result it needs to run something like the following SQL: SELECT * FROM Clients WHERE Id = 10. (EF actually spells out every column and parameterizes the value, but the point stands.) The key thing to understand is that by returning more columns than you need, you are potentially setting up a performance concern, because a covering index on Id and CurrencyId would not be used.
Your second LINQ query would use a SQL statement like SELECT CurrencyId FROM Clients WHERE Id = 10, which would take advantage of your indexes, assuming you have an index covering these columns.
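The covering-index effect is easy to observe with SQLite as a stand-in for SQL Server (the index name here is made up for the illustration):

```python
import sqlite3

# A covering index on (Id, CurrencyId): projecting only CurrencyId lets the
# query be answered from the index alone, without touching the table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Clients (Id INTEGER, FirstName TEXT, LastName TEXT, "
    "CurrencyId INTEGER, Gender TEXT)"
)
conn.execute("CREATE INDEX IX_Clients_Id_CurrencyId ON Clients (Id, CurrencyId)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT CurrencyId FROM Clients WHERE Id = 10"
).fetchall()
print(plan)  # the plan should mention a COVERING INDEX

plan_star = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM Clients WHERE Id = 10"
).fetchall()
print(plan_star)  # SELECT * must go back to the table: no covering index
```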
We have had issues in a service using Azure Table Storage where some queries take multiple seconds (3 to 30 seconds). This happens daily, but only for some of the queries. We do not have a huge load on the service or the table storage (some hundreds of calls per hour), but still the table storage is not performing well.
The slow queries are all filter queries that should return at most 10 rows. I have the filters structured so that there is always a partition key and row key joined by and, followed by the next pair of partition and row keys after an or operator:
(partitionKey1 and RowKey1) or (partitionKey2 and rowKey2) or (partitionKey3 and rowKey3)
So currently I am working on the premise that I need to split the query into separate queries. This was somewhat verified with a Python script I wrote: when I repeat the same query either as a single query (combined with or's, expecting multiple rows as the result) or split into multiple queries executed on separate threads, I see the combined query slow down every now and then.
import time
import threading
from azure.cosmosdb.table.tableservice import TableService

############################################################################
# Script for querying data from Azure Table Storage or the Cosmos DB Table API.
# A SAS token needs to be generated for using this script, and a table with
# data needs to exist.
#
# Warning: extensive use of this script may burden the table performance,
# so use with care.
#
# PIP requirements:
# - requires azure-cosmosdb-table to be installed
#   * run: 'pip install azure-cosmosdb-table'

dateTimeSince = '2019-06-12T13:16:45.446Z'
sasToken = 'SAS_TOKEN_HERE'
tableName = 'TABLE_NAME_HERE'
table_service = TableService(account_name="ACCOUNT_NAME_HERE", sas_token=sasToken)

tableFilter = "(PartitionKey eq 'http%3a%2f%2fsome_website.azurewebsites.net%2fApiName_ed6d31b0') and (RowKey eq 'ed6d31b0-d2a3-4f18-9d16-7f72cbc88cb3') or (PartitionKey eq 'http%3a%2f%2fsome_website.azurewebsites.net%2fApiName_9be86f34') and (RowKey eq '9be86f34-865b-4c0f-8ab0-decf928dc4fc') or (PartitionKey eq 'http%3a%2f%2fsome_website.azurewebsites.net%2fApiName_97af3bdc') and (RowKey eq '97af3bdc-b827-4451-9cc4-a8e7c1190d17') or (PartitionKey eq 'http%3a%2f%2fsome_website.azurewebsites.net%2fApiName_9d557b56') and (RowKey eq '9d557b56-279e-47fa-a104-c3ccbcc9b023') or (PartitionKey eq 'http%3a%2f%2fsome_website.azurewebsites.net%2fApiName_e251a31a') and (RowKey eq 'e251a31a-1aaa-40a8-8cde-45134550235c')"

filters = tableFilter.split(" or ")
resultDict = {}

def runQueryPrintResult(filter):
    result = table_service.query_entities(table_name=tableName, filter=filter)
    item = result.items[0]
    resultDict[item.RowKey] = item

# Loop where:
# - Step 1: the tableFilter query is split and run on multiple threads
#   * returns a single row per query
# - Step 2: the tableFilter query is run as a single combined query
# - Press enter to repeat the two query tests
while True:
    # Do separate queries (reset state from the previous round first)
    resultDict.clear()
    threads = []
    start2 = time.time()
    for filter in filters:
        x = threading.Thread(target=runQueryPrintResult, args=(filter,))
        x.start()
        threads.append(x)
    for x in threads:
        x.join()
    end2 = time.time()
    print("Time elapsed with multi threaded implementation: {}".format(end2 - start2))

    # Do single query
    start1 = time.time()
    listGenerator = table_service.query_entities(table_name=tableName, filter=tableFilter)
    end1 = time.time()
    print("Time elapsed with single query: {}".format(end1 - start1))

    counter = 0
    allVerified = True
    for item in listGenerator:
        if resultDict.get(item.RowKey):
            counter += 1
        else:
            allVerified = False
    if len(listGenerator.items) != len(resultDict):
        allVerified = False
    print("table item count since x: " + str(counter))
    if allVerified:
        print("Both queries returned same amount of results")
    else:
        print("Result count does not match, single threaded count={}, multithreaded count={}".format(
            len(listGenerator.items), len(resultDict)))
    input('Press enter to retry test!')
Here is an example output from the python code:
Time elapsed with multi threaded implementation: 0.10776209831237793
Time elapsed with single query: 0.2323908805847168
table item count since x: 5
Both queries returned same amount of results
Press enter to retry test!
Time elapsed with multi threaded implementation: 0.0897986888885498
Time elapsed with single query: 0.21547174453735352
table item count since x: 5
Both queries returned same amount of results
Press enter to retry test!
Time elapsed with multi threaded implementation: 0.08280491828918457
Time elapsed with single query: 3.2932426929473877
table item count since x: 5
Both queries returned same amount of results
Press enter to retry test!
Time elapsed with multi threaded implementation: 0.07794523239135742
Time elapsed with single query: 1.4898555278778076
table item count since x: 5
Both queries returned same amount of results
Press enter to retry test!
Time elapsed with multi threaded implementation: 0.07962584495544434
Time elapsed with single query: 0.20011520385742188
table item count since x: 5
Both queries returned same amount of results
Press enter to retry test!
The service we have problems with is implemented in C#, though, and I have yet to reproduce the results from the Python script on the C# side. There I seem to get worse performance when splitting the query into multiple separate queries than when using a single filter query (returning all the required rows).
So doing the following multiple times and awaiting them all to complete seems to be slower:
TableOperation getOperation =
    TableOperation.Retrieve<HqrScreenshotItemTableEntity>(partitionKey, id.ToString());
TableResult result = await table.ExecuteAsync(getOperation);
than doing it all in a single query:
private IEnumerable<MyTableEntity> GetBatchedItemsTableResult(Guid[] ids, string applicationLink)
{
    var table = InitializeTableStorage();

    TableQuery<MyTableEntity> itemsQuery =
        new TableQuery<MyTableEntity>().Where(TableQueryConstructor(ids, applicationLink));

    IEnumerable<MyTableEntity> result = table.ExecuteQuery(itemsQuery);
    return result;
}

public string TableQueryConstructor(Guid[] ids, string applicationLink)
{
    var fullQuery = new StringBuilder();
    foreach (var id in ids)
    {
        // Encode the link before using it as the partition key, as REST GET
        // requests do not accept unencoded URL params by default.
        string partitionKey = HttpUtility.UrlEncode(applicationLink);

        // Create a query for a single row in the requested partition.
        string queryForRow = TableQuery.CombineFilters(
            TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, partitionKey),
            TableOperators.And,
            TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.Equal, id.ToString()));

        if (fullQuery.Length == 0)
        {
            // Append the query for the first row.
            fullQuery.Append(queryForRow);
        }
        else
        {
            // Append queries for subsequent rows with the or operator to keep
            // the point queries independent of each other.
            fullQuery.Append($" {TableOperators.Or} ");
            fullQuery.Append(queryForRow);
        }
    }
    return fullQuery.ToString();
}
The test case used with the C# code is quite different from the Python test, though. In C# I am querying 2000 rows from a data set of roughly 100,000 rows. If the data is queried in batches of 50 rows, the combined filter query beats single-row queries run as 50 tasks.
Maybe I should just repeat the test I did in Python as a C# console app to see whether the .NET client API behaves the same way performance-wise.
I think you should use the multi-threaded implementation, since it consists of multiple point queries. Doing it all in a single query probably results in a partition scan. As the official doc mentions:
Using an "or" to specify a filter based on RowKey values results in a partition scan and is not treated as a range query. Therefore, you should avoid queries that use filters such as: $filter=PartitionKey eq 'Sales' and (RowKey eq '121' or RowKey eq '322')
You might think the example above is two Point Queries, but it actually results in a Partition Scan.
To me the answer seems to be that query execution on Table Storage has not been optimized to handle the OR operator the way you would expect. A query is not handled as a point query when it combines point queries with the OR operator.
This can be reproduced in Python, C# and Azure Storage Explorer: in all of them, combining point queries with OR can be 10x slower (or even more) than running the same point queries separately, each returning a single row.
So the most efficient way to fetch a set of rows whose partition and row keys are known is to run them all as separate async queries with TableOperation.Retrieve (in C#). Using TableQuery is highly inefficient and does not come anywhere near the performance that the scalability targets for Azure Table Storage lead you to expect. The scalability targets say, for example: "Target throughput for a single table partition (1 KiB-entities): up to 2,000 entities per second". Yet here I could not even be served 5 rows per second, although all the rows were in different partitions.
This limitation in query performance is not clearly stated in any documentation or performance optimization guide, but it can be inferred from these lines in the Azure storage performance checklist:
Querying
This section describes proven practices for querying the table service.
Query scope
There are several ways to specify the range of entities to query. The following is a discussion of the uses of each.
In general, avoid scans (queries larger than a single entity), but if you must scan, try to organize your data so that your scans retrieve the data you need without scanning or returning significant amounts of entities you don't need.
Point queries
A point query retrieves exactly one entity. It does this by specifying both the partition key and row key of the entity to retrieve. These queries are efficient, and you should use them wherever possible.
Partition queries
A partition query is a query that retrieves a set of data that shares a common partition key. Typically, the query specifies a range of row key values or a range of values for some entity property in addition to a partition key. These are less efficient than point queries, and should be used sparingly.
Table queries
A table query is a query that retrieves a set of entities that does not share a common partition key. These queries are not efficient and you should avoid them if possible.
So "a point query retrieves exactly one entity" and you should "use them wherever possible". Since I had split the data across partitions, the query may have been handled as a table query: "a table query is a query that retrieves a set of entities that does not share a common partition key". This even though the query combined a set of point queries, listing both the partition and row keys of every expected entity. But since the combined query was not retrieving just one entity, it cannot be expected to perform like a point query (or a set of point queries).
Posting as an answer since it was getting too big for a comment.
Can you try changing your query to something like the following:
(PartitionKey eq 'http%3a%2f%2fsome_website.azurewebsites.net%2fApiName_ed6d31b0' and RowKey eq 'ed6d31b0-d2a3-4f18-9d16-7f72cbc88cb3') or (PartitionKey eq 'http%3a%2f%2fsome_website.azurewebsites.net%2fApiName_9be86f34' and RowKey eq '9be86f34-865b-4c0f-8ab0-decf928dc4fc') or (PartitionKey eq 'http%3a%2f%2fsome_website.azurewebsites.net%2fApiName_97af3bdc' and RowKey eq '97af3bdc-b827-4451-9cc4-a8e7c1190d17') or (PartitionKey eq 'http%3a%2f%2fsome_website.azurewebsites.net%2fApiName_9d557b56' and RowKey eq '9d557b56-279e-47fa-a104-c3ccbcc9b023') or (PartitionKey eq 'http%3a%2f%2fsome_website.azurewebsites.net%2fApiName_e251a31a' and RowKey eq 'e251a31a-1aaa-40a8-8cde-45134550235c')
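If you build the filter in code, you can generate the pair-wise parentheses instead of hand-editing the string. A small sketch (the function name and the keys below are placeholders):

```python
def build_point_query_filter(pairs):
    """Build an OData filter where each (PartitionKey, RowKey) pair is
    parenthesized as a unit before the pairs are joined with 'or'."""
    clauses = [
        "(PartitionKey eq '{0}' and RowKey eq '{1}')".format(pk, rk)
        for pk, rk in pairs
    ]
    return " or ".join(clauses)

pairs = [
    ("pk_ed6d31b0", "ed6d31b0-d2a3-4f18-9d16-7f72cbc88cb3"),
    ("pk_9be86f34", "9be86f34-865b-4c0f-8ab0-decf928dc4fc"),
]
print(build_point_query_filter(pairs))
```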
I have a very simple LINQ query:
var result = (from r in employeeRepo.GetAll()
              where r.EmployeeName.Contains(searchString)
                  || r.SAMAccountName.Contains(searchString)
              orderby r.EmployeeName
              select new SelectListItem
              {
                  Text = r.EmployeeName,
                  Value = r.EmployeeName
              });
The issue is that, for some strange reason, it fetches the record of most people I search for whether I type the name in lower case or upper case, i.e. for:
test user
Test User
TEST USER
I get back the correct records. However, when I search for my own name in lower case I don't get any results back, but if I capitalize the first letter of my name then I do get results. I can't seem to figure out why it's doing that.
Every first and last name in the database starts with an upper-case letter.
The searchString values I'm using are:
richard - I get correct results
waidande - no results found
Both of the above users are in the database.
I'm also using Entity Framework to query SQL Server 2012.
If your text has the NVARCHAR datatype, check for similar-looking letters that are in reality not the same:
CREATE TABLE #employee (ID INT IDENTITY(1,1), EmployeeName NVARCHAR(100));
INSERT INTO #employee(EmployeeName) VALUES (N'waidаnde');
SELECT *
FROM #employee
WHERE EmployeeName LIKE '%waidande%';
-- checking
SELECT *
FROM #employee
WHERE CAST(EmployeeName AS VARCHAR(100)) <> EmployeeName;
db<>fiddle demo
Here, 'а' != 'a': the first is a Cyrillic 'а' and the second is a normal Latin 'a'.
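The mismatch is easy to verify; a quick check in Python using the standard unicodedata module:

```python
import unicodedata

latin_a = "a"          # U+0061
cyrillic_a = "\u0430"  # U+0430, visually identical in most fonts

print(latin_a == cyrillic_a)         # False
print(unicodedata.name(latin_a))     # LATIN SMALL LETTER A
print(unicodedata.name(cyrillic_a))  # CYRILLIC SMALL LETTER A
```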
Idea taken from a slide in: http://sqlbits.com/Sessions/Event12/Revenge_The_SQL
P.S. I highly recommend watching Rob Volk's talk Revenge: The SQL!.
To troubleshoot the issue, determine whether the problem is on the EF side, or on DB side.
A common mistake is extra whitespace, so make sure it's not the case before proceeding.
First, check what query is being generated by EF. You can use one of the following methods to do this:
the ObjectQuery.ToTraceString() method
EF logging of intercepted DB calls
SQL Server Profiler
If you are using EF correctly and your query is translated to SQL as expected, with the predicates in the WHERE section, but you are still not getting any meaningful results, here are some ideas to try on the DB side:
Check the collation (be aware it can be set at the server, database and individual column level); beware of case sensitivity and the code page being used.
Verify that your search string contains only symbols that can be represented in the DB code page. For example, if the code page is 1252 (Windows Latin 1, ANSI) and you send input with UTF-16 symbols outside ANSI, you won't get any results, even though the symbols look the same.
Highly improbable, but as a last resort check whether one of your query plans has been cached, as described here.
SQL Server 2012 is installed with a case-insensitive collation by default. If you need to retrieve records with case sensitivity (because you have "several" records), you need to change the collation (take care: if you change the DBMS collation you also change the master database collation, so table and field names also become case sensitive).
If retrieving all the matching records from the DBMS and filtering them afterwards is acceptable, you can apply a case-sensitive filter in memory, i.e.
var result = (from r in employeeRepo.GetAll()
              where r.EmployeeName.Contains(searchString)
                  || r.SAMAccountName.Contains(searchString)
              orderby r.EmployeeName
              select r)
              .ToList() // Materialize the records, then apply a case sensitive filter
              .Where(r => r.EmployeeName.Contains(searchString)
                  || r.SAMAccountName.Contains(searchString))
              .Select(r => new SelectListItem
              {
                  Text = r.EmployeeName,
                  Value = r.EmployeeName
              });
I'm querying a table with 200,000 rows and 6 columns, but I only want 2 of these columns in one controller, so I want to know if there is a better way to get them from the server without compromising performance, because as far as I know LINQ queries get the whole table and then do the filtering. I think maybe views are a good way, but I want to know if there are other, better options. Thanks.
for example:
var items = from i in db.Items select new {i.id,i.name};
If I have 1,000,000 items, will it be a problem for the server?
Your initial assumption is incorrect.
In general, LINQ queries do not get the whole table. The query is converted into a "server-side expression" (i.e. a SQL statement), the statement is resolved on the server, and only the requested data is returned.
Given the statement you provided, you will return only two columns, but you will get 1,000,000 objects in the result if you do not do any filtering. That isn't a problem with LINQ, though; that's a problem with you not filtering. If you include a where clause, you will only get the rows you requested.
var items = from i in db.Items
            where i.Whatever == SomeValue
            select new { i.id, i.name };
Your original query would be translated (roughly) into the following SQL:
SELECT id, name FROM Items
You didn't include a where clause so you're going to get everything.
With the version that included a where clause you'd get the following SQL generated:
SELECT id, name FROM Items WHERE Whatever = SomeValue
Only the rows that match the condition would be returned to your application and converted into objects.
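The projection behaviour can be observed directly. A SQLite sketch (standing in for whatever database the LINQ provider targets; the extra columns are made up) showing that only the projected columns come back:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Items (id INTEGER, name TEXT, a TEXT, b TEXT, c TEXT, d TEXT)")
conn.execute("INSERT INTO Items VALUES (1, 'first', 'x', 'x', 'x', 'x')")

# The projection in SQL decides what crosses the wire: two columns, not six.
cur = conn.execute("SELECT id, name FROM Items")
columns = [d[0] for d in cur.description]
print(columns)        # ['id', 'name']
print(cur.fetchall())  # [(1, 'first')]
```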
I have an MSSQL database with LINQ to SQL.
I have three tables.
Requests -> id, string name
Results -> id, requestID, int jumps
Places -> id, resultID, int location
Then, given an input string, I need to get an ICollection (or array, or similar) of Place which meets the following:
For each Request that has name = input, take its id (you can assume only one does).
For each Result whose requestID equals that id, take its id.
For each Place whose resultID equals that id, append it to the array for further processing.
I made it work by looping over all the Results and executing another LINQ statement for each, but it's extremely slow [about 500 ms for a single request!]. Can I make it any faster?
Thank you!
Edit: Whoops, I also need it grouped by Result, i.e. a List of Lists of Places, where each inner list corresponds to one row of Result.
You can perform table joins in Linq2Sql using the join keyword:
var places = from request in Requests
             join result in Results on request.Id equals result.requestID
             join place in Places on result.Id equals place.ResultId
             where request.name == input
             select place;
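For reference, the SQL such a join translates to can be sketched against a toy SQLite schema following the tables in the question (the sample rows are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Requests (id INTEGER, name TEXT);
CREATE TABLE Results  (id INTEGER, requestID INTEGER, jumps INTEGER);
CREATE TABLE Places   (id INTEGER, resultID INTEGER, location INTEGER);
INSERT INTO Requests VALUES (1, 'input');
INSERT INTO Results  VALUES (10, 1, 3), (11, 1, 5);
INSERT INTO Places   VALUES (100, 10, 7), (101, 11, 8);
""")

# One round trip instead of a separate query per Result row.
places = conn.execute("""
    SELECT p.id, p.resultID, p.location
    FROM Requests r
    JOIN Results  res ON res.requestID = r.id
    JOIN Places   p   ON p.resultID = res.id
    WHERE r.name = ?
""", ("input",)).fetchall()
print(places)  # [(100, 10, 7), (101, 11, 8)]
```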
Something like:
Requests.Where(r => r.name == input).SelectMany(r => r.Results).SelectMany(result => result.Places);
If this is too slow, then I expect you need some indexes on your database.
If you don't have the relationships in your model, then you need to establish foreign key constraints on your tables and rebuild your model.