OData and Cosmos DB - c#

I'm looking to implement OData v4 as a querying tool in an ASP.NET Core application I'm working on, and our backing persistence store is Cosmos DB. So far, I haven't figured out a way to make OData queries run against the DocumentQuery IQueryable interface without encountering some sort of exception or error.
I wanted to know whether there is a "clean" way to use OData against Cosmos Document DB (not the Table API), and if so, how? So far, all I've found is an unofficial library that targets .NET Framework 4.6, but nothing official, and almost all the documentation I've found about implementing OData runs against Entity Framework or an in-memory data store.

I know this isn't exactly the most insightful answer to my question, but the correct answer here is really to just not attempt to do this. If you're ever in a position where someone asks you to try to force together two technologies that don't really go together, say no and deal with the consequences.

OData integration with Cosmos is not that hard if you're using the SQL API and you only need the basic stuff like $orderby, $top and $skip. It's a matter of generating correct SQL.
If you need more than that it gets a bit harder. Anyway, I did some simple testing with this NuGet lib, and it seems to work in those tests at least.
var oDataToSqlTranslator = new ODataToSqlTranslator(new SQLQueryFormatter());
var select = oDataToSqlTranslator.Translate(odataQueryOptions, TranslateOptions.SELECT_CLAUSE);
var where = oDataToSqlTranslator.Translate(odataQueryOptions, TranslateOptions.WHERE_CLAUSE);
var order = oDataToSqlTranslator.Translate(odataQueryOptions, TranslateOptions.ORDERBY_CLAUSE);
var top = oDataToSqlTranslator.Translate(odataQueryOptions, TranslateOptions.TOP_CLAUSE);
var all = oDataToSqlTranslator.Translate(odataQueryOptions, TranslateOptions.ALL);
log.LogInformation("SQL select => " + select);
log.LogInformation("SQL where => " + where);
log.LogInformation("SQL order => " + order);
log.LogInformation("SQL all => " + all);
Given this URL as input:
http://localhost:7071/api/v1/invoices/customer/20?$top=2&$select=CustomerId&$filter=InvoiceNumber eq 'xxx'&orderby=brand
The logs show this:
SQL select => SELECT c.CustomerId FROM c
SQL where => WHERE c.InvoiceNumber = 'xxx'
SQL order => ORDER BY c.InvoiceNumber ASC
SQL all => SELECT TOP 2 c.CustomerId FROM c WHERE c.InvoiceNumber = 'xxx' ORDER BY c.InvoiceNumber ASC
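For the basic options only ($orderby, $top, $skip), the SQL can also be built by hand. The following is a minimal sketch and not part of the NuGet library above: the class and method names are made up for illustration, it assumes $orderby contains a single simple property name, and it relies on the Cosmos DB OFFSET ... LIMIT clause, which needs both values to be present.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: builds a Cosmos DB SQL string from already-parsed
// $orderby, $skip and $top values.
public static class BasicODataToCosmosSql
{
    // Only plain property names are accepted, so the name can be
    // concatenated into the query text without risking injection.
    private static readonly Regex Identifier = new Regex("^[A-Za-z_][A-Za-z0-9_]*$");

    public static string Build(string orderBy, bool isDescending, int? skip, int? top)
    {
        var sql = new StringBuilder("SELECT * FROM c");

        if (!string.IsNullOrEmpty(orderBy))
        {
            if (!Identifier.IsMatch(orderBy))
                throw new ArgumentException("Unsupported property name.", nameof(orderBy));

            sql.Append(" ORDER BY c.").Append(orderBy)
               .Append(isDescending ? " DESC" : " ASC");
        }

        if (skip.HasValue || top.HasValue)
        {
            int offset = skip ?? 0;
            int limit = top ?? int.MaxValue;
            if (offset < 0 || limit < 0)
                throw new ArgumentOutOfRangeException(nameof(skip), "skip and top must be non-negative.");

            // OFFSET and LIMIT are emitted together because Cosmos DB requires both.
            sql.Append(" OFFSET ").Append(offset).Append(" LIMIT ").Append(limit);
        }

        return sql.ToString();
    }
}
```

Calling Build("InvoiceNumber", false, 0, 2) produces a query that orders by c.InvoiceNumber and returns the first two documents. Anything beyond these options ($filter, $select, $expand) needs a real translator.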

How about something like this: convert the OData query to SQL, and then execute it on Cosmos DB?
https://github.com/Azure/azure-odata-sql-js

Related

Fluent Linq and SQL Server

This may be a silly question, but I have fluent linq query that is pulling data off a DB, but, of course, it processes the data in memory, not on the DB.
Is there a means of taking a query (say table.Where(t => t.id > 22)) and having it run as a DB query (i.e. SELECT * FROM TABLE WHERE ID > 22)?
I want to use the fluent style because I am building up the query from various places. I know I can make this specific query into query syntax, but my actual queries are more complex.
I have tried standard searches, but I suspect it is not going to be possible, as nothing comes up.
Yes using EF Core.
The code is not necessarily clear:
var getTableData = table.Where(base.whereSelection);
whereSelection.ForEach(w => getTableData = getTableData.Where(w));
At the moment, all of the Where selections are Funcs. They could be converted to something else (like Expressions) if I knew that would make them run on the DB. The rest of the code also builds up this set of details programmatically:
var selectedData = getTableData.Select(Convert);
var sortedTableData = GetSort(selectedData, Ordering);
var sorted = PickDataFrom(sortedTableData);
return CheckAndTake(sorted, pageSize);
Everything seems to work. Just in the wrong place.
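On the Func versus Expression point: when a Func<T, bool> is passed to Where on an IQueryable<T>, the compiler binds to Enumerable.Where, so the result is a plain IEnumerable<T> and the filter runs in memory. Passing an Expression<Func<T, bool>> binds to Queryable.Where, the result stays an IQueryable<T>, and the provider gets a chance to translate it. A minimal sketch of composing filters both ways follows; the class and method names are hypothetical, and whether a given predicate is actually translated still depends on the provider.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical helper illustrating the two overload families of Where.
public static class FilterComposition
{
    // Func predicates: each Where call resolves to Enumerable.Where,
    // so the query leaves the provider and is evaluated in memory.
    public static IEnumerable<T> ApplyAllInMemory<T>(
        IQueryable<T> source, IEnumerable<Func<T, bool>> predicates)
    {
        IEnumerable<T> result = source;
        foreach (var predicate in predicates)
            result = result.Where(predicate);
        return result;
    }

    // Expression predicates: each Where call resolves to Queryable.Where,
    // so the composed query remains an IQueryable<T> for the provider.
    public static IQueryable<T> ApplyAll<T>(
        IQueryable<T> source, IEnumerable<Expression<Func<T, bool>>> predicates)
    {
        foreach (var predicate in predicates)
            source = source.Where(predicate);
        return source;
    }
}
```

With the second form, a list of lambdas such as x => x.Id > 22 can be collected from various places as Expression<Func<T, bool>> values and applied one by one without leaving the fluent style.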

CosmosDB Linq query with contains is not of type IDocumentQuery

I have a project working with CosmosDB. At first I used the preview for EFCore, but it really isn't mature enough, so I decided to go with Cosmonaut instead. I have a LINQ statement that basically checks whether two properties contain any of a list of substrings. I'm trying to do something like:
SELECT * FROM c WHERE CONTAINS(c.Name, ListOfNames) AND CONTAINS(c.Producer, ListOfProducers);
Or a huge ass bunch of:
foreach(var name in nameList) {
foreach(var producer in producerList){
SELECT * FROM c WHERE c.Name == searchedName AND c.Producer == searchedProducer;
}
}
This worked with the EFCore SQL adapter with the following Linq Query:
public async void Search(List<string> producers, List<string> names){
await _store.Entity.Where(x => producers.Any(p => x.Producer.Contains(p)) && names.Any(w => x.Name.Contains(w))).ToListAsync();
}
However, this with the cosmonaut library (which wraps the DocumentDB client from cosmos) gives the following exception:
Input is not of type IDocumentQuery
Looking at the specs for Contains I can do this:
USE AdventureWorks2012;
GO
SELECT Name
FROM Production.Product
WHERE CONTAINS(Name, '"chain*" OR "full*"');
GO
But doing the following in the CosmosDB data explorer yields 0 results:
SELECT * FROM c WHERE CONTAINS(c.Name, '"Test" OR "Test2"')
Whereas a regular contains does:
SELECT * FROM c WHERE CONTAINS(c.Name, "Test")
Maybe my strategy is just wrong. The main reason I want to combine it is to get better performance. My search domain is around 100,000 documents, and I have a list of up to 1,000 producer + name pairs. So basically I want to see if I can find a given producer + name combination in my document list.
Best Regards
First of all, you should not mix T-SQL and the Cosmos SQL API. Cosmos DB has a SQL-like query syntax, but it doesn't support T-SQL (that's for MS SQL Server).
Secondly, CONTAINS in the Cosmos SQL API is a string function, so you cannot use it for arrays.
I think you're looking for the IN keyword.
So the query you actually need is:
SELECT * FROM c WHERE (c.Name IN("Test", "Test2")) AND (c.Producer IN("Producer1", "Producer2"))
I have not used the Cosmonaut library, but with the Microsoft LINQ provider for DocumentDB your query should look like this:
var data = yourQueryable.Where(x => producers.Contains(x.Producer) && names.Contains(x.Name)).ToList();
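Note that list.Contains(x.Property) is an exact-match test (the IN semantics above), while the Any(... Contains ...) form from the question is a substring test. The difference can be checked in memory with LINQ to Objects. This is a small sketch with made-up type and method names, and it says nothing about what a particular Cosmos provider is able to translate.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical document shape used only for this illustration.
public class Product
{
    public string Name = "";
    public string Producer = "";
}

public static class ProductSearch
{
    // Exact match: the property value must equal one of the list entries (IN semantics).
    public static List<Product> ExactMatch(
        IEnumerable<Product> items, ICollection<string> names, ICollection<string> producers)
    {
        return items
            .Where(x => producers.Contains(x.Producer) && names.Contains(x.Name))
            .ToList();
    }

    // Substring match: the property value must contain one of the list entries.
    public static List<Product> SubstringMatch(
        IEnumerable<Product> items, ICollection<string> names, ICollection<string> producers)
    {
        return items
            .Where(x => producers.Any(p => x.Producer.Contains(p))
                     && names.Any(n => x.Name.Contains(n)))
            .ToList();
    }
}
```

A document named "Test2 deluxe" is found by SubstringMatch when the list contains "Test2", but not by ExactMatch, so IN is only a replacement when full names are being looked up.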

Entity Framework Core count does not have optimal performance

I need to get the amount of records with a certain filter.
Theoretically this instruction:
_dbContext.People.Count (w => w.Type == 1);
It should generate SQL like:
Select count (*)
from People
Where Type = 1
However, the generated SQL is:
Select Id, Name, Type, DateCreated, DateLastUpdate, Address
from People
Where Type = 1
The query being generated takes much longer to run in a database with many records.
I need to generate the first query.
If I just do this:
_dbContext.People.Count ();
Entity Framework generates the following query:
Select count (*)
from People
.. which runs very fast.
How to generate this second query passing search criteria to the count?
There is not much to answer here. If your ORM tool does not produce the expected SQL query from a simple LINQ query, there is no way to make it do so by rewriting the query (and you shouldn't have to do that in the first place).
EF Core has a concept of mixed client/database evaluation in LINQ queries which allows them to release EF Core versions with incomplete/very inefficient query processing like in your case.
Excerpt from Features not in EF Core (note the word not) and Roadmap:
Improved translation to enable more queries to successfully execute, with more logic being evaluated in the database (rather than in-memory).
In short, they are planning to improve the query processing, but we don't know when that will happen or to what degree (remember that the mixed mode allows them to consider a query "working").
So what are the options?
First, stay away from EF Core until it becomes really useful. Go back to EF6; it has no such issues.
If you can't use EF6, then stay updated with the latest EF Core version.
For instance, in both v1.0.1 and v1.1.0 your query generates the intended SQL (tested), so you can simply upgrade and this particular issue will be gone.
But note that along with improvements, new releases also introduce bugs and regressions (as you can see here: EFCore returning too many columns for a simple LEFT OUTER join, for instance), so do that at your own risk (and consider the first option again, i.e. Which One Is Right for You :)
Try this lambda expression to make the query run faster:
_dbContext.People.Select(x => x.Id).Count();
Try this
(from x in _dbContext.People where x.Type == 1 select x).Count();
or you could do the async version of it like:
await (from x in _dbContext.People where x.Type == 1 select x).CountAsync();
and if those don't work out for you, then you could at least make the query more efficient by doing:
(from x in _dbContext.People where x.Type == 1 select x.Id).Count();
or
await (from x in _dbContext.People where x.Type == 1 select x.Id).CountAsync();
If you want to optimize performance and the current EF provider is not (yet) capable of producing the desired query, you can always rely on raw SQL.
Obviously, this is a trade-off as you are using EF to avoid writing SQL directly, but using raw SQL can be useful if the query you want to perform can't be expressed using LINQ, or if using a LINQ query is resulting in inefficient SQL being sent to the database.
A sample raw SQL query would look like this:
var results = _context.People.FromSql("SELECT Id, Name, Type " +
"FROM People " +
"WHERE Type = @p0",
1);
As far as I know, raw SQL queries passed to the FromSql extension method currently require that you return a model type, i.e. returning a scalar result may not yet be supported.
You can however always go back to plain ADO.NET queries:
using (var connection = _context.Database.GetDbConnection())
{
connection.Open();
using (var command = connection.CreateCommand())
{
command.CommandText = "SELECT COUNT(*) FROM People WHERE Type = 1";
var result = command.ExecuteScalar().ToString();
}
}
It seems that there has been some problem with one of the early releases of Entity Framework Core. Unfortunately you have not specified exact version so I am not able to dig into EF source code to tell what exactly has gone wrong.
To test this scenario, I have installed the latest EF Core package and managed to get correct result.
Here is my test program:
And here is the SQL that gets generated, as captured by SQL Server Profiler:
As you can see it matches all the expectations.
Here is the excerpt from packages.config file:
...
<package id="Microsoft.EntityFrameworkCore" version="1.1.0" targetFramework="net452" />
...
So, in your situation the only solution is to update to the latest package which is 1.1.0 at the time of writing this.
Does this get what you want:
_dbContext.People.Where(w => w.Type == 1).Count();
I am using EFCore 1.1 here.
This can occur if EFCore cannot translate the entire Where clause to SQL. The cause can be something as simple as DateTime.Now, which you might not even think about.
The following statement results in a SQL query that will surprisingly run a SELECT * and then C# .Count() once it has loaded the entire table!
int sentCount = ctx.ScheduledEmail.Where(x => x.template == template &&
x.SendConfirmedDate > DateTime.Now.AddDays(-7)).Count();
But this query will run an SQL SELECT COUNT(*) as you would expect / hope for:
DateTime earliestDate = DateTime.Now.AddDays(-7);
int sentCount = ctx.ScheduledEmail.Where(x => x.template == template
&& x.SendConfirmedDate > earliestDate).Count();
Crazy but true. Fortunately this also works:
DateTime now = DateTime.Now;
int sentCount = ctx.ScheduledEmail.Where(x => x.template == template &&
x.SendConfirmedDate > now.AddDays(-7)).Count();
sorry for the bump, but...
Probably the reason the query with the where clause is slow is that you didn't give your database a fast way to execute it.
In the case of the select count(*) from People query, we don't need the actual data for each field, so the database can use a small index that doesn't contain all those fields and avoid spending slow I/O on them. The database software is clever enough to see that the primary key index requires the least I/O to do the count on. The PK ids take less space than the full row, so you get more entries to count per I/O block and can finish faster.
In the case of the query with Type, the database needs to read Type to determine its value. You should create an index on Type if you want your query to be fast, or else it will have to do a very slow full table scan, reading all rows. It helps when your values are more discriminating. A column Gender (usually) only has two values and isn't very discriminating, while a primary key column where every value is unique is highly discriminating. More discriminating values result in a shorter index range scan and a faster count.
What I used to count rows using a search query was
_dbContext.People.Where(w => w.Type == 1).Count();
This can also be achieved by
List<People> people = _dbContext.People.Where(w => w.Type == 1).ToList();
int count = people.Count;
This way you will get the people list too if you need it further.

EntityFramework MySQL retrieves results for counting

I am using EntityFramework 6 with the official MySQL provider.
I have a database containing a list of VenuePlans which each consist of Areas.
In order to show these values I am using this very simple LINQ query:
model.VenuePlans = CurrentOrganization.VenuePlans.Select(p => new ViewModels.VenuePlans.IndexViewModel.VenuePlan
{
ID = p.MaskID,
Name = p.DisplayName,
AreaCount = p.VenuePlans_Areas.Count()
}).ToArray();
But when looking at the executed queries using MiniProfiler I see that this results in duplicate queries as follows:
Retrieving the VenuePlans:
SELECT
`Extent1`.`PlanID`,
`Extent1`.`MaskID`,
`Extent1`.`DisplayName`,
`Extent1`.`OrganizationID`,
`Extent1`.`SVG`
FROM `VenuePlans` AS `Extent1`
WHERE `Extent1`.`OrganizationID` = @EntityKeyValue1
Retrieving the Areas for the first VenuePlan:
SELECT
`Extent1`.`AreaID`,
`Extent1`.`PlanID`,
`Extent1`.`DisplayName`,
`Extent1`.`MaskID`,
`Extent1`.`FillColor`,
`Extent1`.`InternalName`
FROM `VenuePlans_Areas` AS `Extent1`
WHERE `Extent1`.`PlanID` = @EntityKeyValue1
Now this latter query is repeated for every area present in the database.
CurrentOrganization is an instance of another model retrieved earlier.
Now when writing the query directly on the DbContext instance I don't have this issue:
model.VenuePlans = DbContext.VenuePlans
.Where(p => p.OrganizationID == CurrentOrganization.OrganizationID)
.Select(p => new ViewModels.VenuePlans.IndexViewModel.VenuePlan
{
ID = p.MaskID,
Name = p.DisplayName,
AreaCount = p.VenuePlans_Areas.Count()
}).ToArray();
What is the reason for this?
DbContext is a variable declared in my BaseController which returns an instance of the current DbContext stored in HttpRequest.Items.
What can I do to prevent this behavior?
I've never found the MySql Linq stuff to be very good. I used it recently, and had to use ToList earlier than I would have liked to stop the query generation from spouting gibberish.
Armed with the knowledge that Linq to MySql is broken, and it's not just you, you'd be best using the version of the query that's fluid from your context instead of from your object.
Having said that, I'd be interested in seeing if anybody does have a solution, because I tend to avoid Linq when using MySql.

linq to sql query with 2 dbml files

I have an object of MyFriendFollowStatus. For each friend I need information from 2 different databases, so I wrote something like this:
db1Context FFdb = new db1Context();
db2Context EEdb = new db2Context();
foreach (fbFriendsFollowStatus a in fbids)
{
long ffID = FFdb.FFUsers.Where(x => x.FacebookID == a.fbid).FirstOrDefault().FFUserID;
a.ffID = ffID;
int status = EEdb.StatusTable.Where(x => x.ffID == ffID).FirstOrDefault().Status;
a.Status = status;
}
This works, but it doesn't really seem right to call 2 databases once each for every user. Is there something built into LinqToSql that helps with something like this, or some type of join I can use across 2 different databases?
Well, you can always limit your N+1 query problem to 3 queries: one to get the users, one to get the users' data from the first database, and one for the second database. Then connect all the results in memory. This limits the number of round trips to the databases, which should improve the performance of your application.
I don't know if linq-to-sql or entity framework offers building model from different databases - this would pose some performance problems probably - like in includes or something, but I may simply not be aware of such features.
Sample code to do what you're trying to achieve would look something like that:
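var facebookIds = fbids.Select(a => a.fbid).ToList();
var ffUsers = FFdb.FFUsers.Where(x => facebookIds.Contains(x.FacebookID)).Select(x => new { x.FacebookID, x.FFUserID }).ToList();
// Materialize the ids first: the second query runs against a different context.
var ffUserIds = ffUsers.Select(x => x.FFUserID).ToList();
var statuses = EEdb.StatusTable.Where(x => ffUserIds.Contains(x.ffID)).Select(x => new { x.ffID, x.Status }).ToList();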
And then some simple code to match results in memory - but that will be simple.
Please note that this code is sample - if I've mismatched some ids or something, but idea should be clear.
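The in-memory matching step mentioned above could look roughly like this. It is a sketch under assumptions: the FriendStatus class and the tuple shapes are stand-ins for the real entity types, and it assumes each Facebook id and each ffID appears at most once in the query results (ToDictionary throws on duplicates).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for the fbFriendsFollowStatus type from the question.
public class FriendStatus
{
    public long FbId;
    public long FfId;
    public int Status;
}

public static class FriendMatcher
{
    // Fills FfId and Status on each friend from the two already-loaded result sets.
    public static void Fill(
        IEnumerable<FriendStatus> friends,
        IEnumerable<(long FacebookId, long FfUserId)> users,
        IEnumerable<(long FfId, int Status)> statuses)
    {
        // Dictionaries give constant-time lookups instead of nested loops.
        var ffIdByFacebookId = users.ToDictionary(u => u.FacebookId, u => u.FfUserId);
        var statusByFfId = statuses.ToDictionary(s => s.FfId, s => s.Status);

        foreach (var friend in friends)
        {
            // Friends without a matching user in the first database are left untouched.
            if (!ffIdByFacebookId.TryGetValue(friend.FbId, out var ffId))
                continue;

            friend.FfId = ffId;

            if (statusByFfId.TryGetValue(ffId, out var status))
                friend.Status = status;
        }
    }
}
```

Unlike the original loop, this skips friends that have no match instead of throwing on a null FirstOrDefault() result.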
