MongoDB and returning collections efficiently - c#

I am very new to Mongo (this is actually day 1) and using the C# driver that is available for it. One thing that I want to know (as I am not sure how to word it in Google) is how does mongo handle executing queries when I want to grab a part of the collection.
What I mean by this is that I know that with NHibernate and EF Core, the query is first built and it will only fire when you cast it. So say like an IQueryable to IEnnumerable, .ToList(), etc.
Ex:
//Query is fired when I call .ToList, until that point it is just building it
context.GetLinqQuery<MyObject>().Where(x => x.a == 'blah').ToList();
However, with Mongo's examples it appears to me that if I want to grab a filtered result I will first need to get the collection, and then filter it down.
Ex:
var collection = _database.GetCollection<MyObject>("MyObject");
//Empty filter for ease of typing for example purposes
var filter = Builders<MyObject>.Filter.Empty;
var collection.Find(filter).ToList();
Am I missing something here, I do not think I saw any overload in the GetCollection method that will accept a filter. Does this mean that it will first load the whole collection into memory, then filter it? Or will it still be building the query and only execute it once I call either .Find or .ToList on it?
I ask this because at work we have had situations where improper positioning of .ToList() would result is seriously weak performance. Apologies if this is not the right place to ask.
References:
https://docs.mongodb.com/guides/server/read_queries/

The equivalent to your context.GetLinqQuery<MyObject>() would be to use AsQueryable:
collection.AsQueryable().Where(x => x.a == "blah").ToList();
The above query will be executed server side* and is equivalent to:
collection.Find(Builders<MyObject>.Filter.Eq(x => x.a, "blah")).ToEnumerable().ToList();
* The docs state that:
Only LINQ queries that can be translated to an equivalent MongoDB query are supported. If you write a LINQ query that can’t be translated you will get a runtime exception and the error message will indicate which part of the query wasn’t supported.

Related

IQueryable Intersect is currently not supported

When I trying to do this
//data.Photos it's IEnumerable<Photo>. Comparer worked by Id.
List<Photo> inDb = db.Photos
.Intersect(data.Photos, new PhotoComparer())
.ToList();
I get an exception:
NotSupportedException: Could not parse expression
'value(Microsoft.EntityFrameworkCore.Query.Internal.EntityQueryable`1[ReportViewer.Models.DbContexts.Photo]).Intersect(__p_0, __p_1)'
This overload of the method #x27;System.Linq.Queryable.Intersect' is currently not supported.
// This works
List<Photo> inDb = db.Photos
.ToList()
.Intersect(data.Photos, new PhotoComparer())
.ToList();
// But will it take a long time - or not ?
What did I need to use Intersect with IQueryable and IEnumerable collection?
Due to the "custom comparer", although it's functionality might be trivial, the framework is currently not able to translate your statement to SQL (which I suspect you are using).
Next, it seems that you have a in memory collection, on which you want to perform this intersect.
So if you're wondering about speed, in order to get it working you'll need to send your data to the database server, and based on the Id's retrieve your data.
So basically, you are looking for a way to perform an inner join, which would be the SQL equivalent of the intersect.
Which you could do with the flowing linq query:
//disclaimer: from the top of my head
var list= from dbPhoto in db.Photos
join dataPhoto in data.Photos on dbPhoto.Id equals dataPhoto.Id
select dbPhoto;
This will not work though, since as far as I know EF isn't able to perform an join against an in-memory dataset.
So, alternatively you could:
fetch the data as IEnumerable (but yes, you'll be retrieving the whole set first)
use a Contains, be carefull though, if you're not using primitive types this can translate to a bunch of SQL OR statements
But basically it depends on the amount of data you're querying. You might want to reconsider your setup and try to be able to query the data based on some ownership, like user or other means.

Selecting the first item from a query eficiently

Been reading Microsoft's LINQ docs for a while in search for the correct way to do this. Microsoft's example is the following:
Customer custQuery =
(from custs in db.Customers
where custs.CustomerID == "BONAP"
select custs)
.First();
This obviosly works and it's the obvious way to do it(except for using FirstOrDefault() rather than First()), however to me, it looks like this runs the query and after it's done it selects the first.
Is there a way to return the first result and not continue the query?
however to me, it looks like this runs the query and after it's done it selects the first
Nope. The query inside the parentheses returns an IQueryable object, which is basically the representation of a query that hasn't been run yet. It's only when you call .First() does it actually process the IQueryable object and translate it into a database query, and without looking I guarantee you it only asks the database for the first item.
However, if you were to write .ToList().First() instead of just .First() (and you see beginners making this mistake in less obvious ways), it would indeed load everything into memory and then pull the first object from it.
But the code you've pasted is perfectly efficient.

Linq: Method has no supported translation to SQL - but how to dump to memory?

I have a Linq query that reads from a SQL table and 1 of the fields it returns are from a custom function (in C#).
Something like:
var q = from my in MyTable
select new
{
ID = my.ID,
Amount = GetAmount(ID)
};
If I do a q.Dump() in LinqPad, it shows the results, which tells me that it runs the custom function without trying to send it to SQL.
Now I want to union this to another query, with:
var q1 = (from p in AnotherQuery.Union(q)...
and the I get the error that Method has no supported translation to SQL.
So, my logic tells me that I need to dump q in memory and then try to union to that. I've tried doing that with ToList() and creating a secondary query that populates itself from the List, but that leads to a long list of different errors. Am I on the right track, by trying to get q in memory and union on that, or are there better ways of doing this?
You can't use any custom functions in a LINQ query that gets translated - only the functions supported by the given LINQ provider. If you want your query to happen on the server, you need to stick with the supported functions (even if it sometimes means having to inline code that would otherwise be reused).
The difference between your two queries boils down to when (and where) the projection happens. In your first case, the data from MyTable is returned from the DB - in your sample, just the ID. Then, the projection happens on top of this - the GetAmount method is called in your application for each of ID.
On the other hand, there's no such way for this to happen in your second query, since you're not using GetAmount in the final projection.
You either need to replace the custom function with inlined query the provider understands, or refactor all your queries to use the supported functions in addition with whatever you need to do in-memory. There's no point in giving you any sample code, since it depends entirely on your actual query, and what you're really trying to query for.

Should I be using LINQ queries on RavenDB?

I'm learning RavenDB and I am rather confused. As far as I understand, one should create indexes in order to have really efficient queries. However, it is possible to simply make LINQ queries, such as
using(IDocumentSession session = _store.OpenSession())
{
MyDocument doc = session.Query<MyDocument>()
.Where(d => d.Property == value)
.Single();
}
This type of query works perfectly fine. I have, however, never created an index for it (and never reference an index when making the query, of course).
Should I be using this kind of query when working with RavenDB? If not, why is it even available in the API?
There's two things you are asking, here.
Can we use Indexes .. which are suppose to be more efficient than dynamic queries?
If we use indexes .. then should we use Linq and chaining?
Indexes
As Matt Warren correctly said, you're not using any indexes in your sample query. Right now, with your sample query, RavenDb is smart enough to create a temp (dynamic) index. If that dynamic index is used enough, it get auto-promoted to a static / perminent index.
So .. should you use indexes? If you can, then yeah!
here's your statement again, this time with an Index defined.
using(IDocumentSession session = _store.OpenSession())
{
MyDocument doc = session.Query<MyDocument>("ByProperty")
.Where(d => d.Property == value)
.Single();
}
In this case an index called MyDocument_ByProperty was created somewhere. I'm not going to explain the details of indexes .. go and read all about them here.
Linq and chaining
(Not sure if that is the correct terminology ... )
If you create a linq statement (which I did above) with OR without an index .. a query is still generated .. which then is translated into an HTTP RESTful request to the RavenDB Server. If you have an index .. then the query is smart enough to ask to use that. None? Then the server will create a dynamic index .. which means it will also have to go through the motions of indexing first, then retrieving your results.
TL;DR;
Yes use indexes. Yes use Linq chaining.
RavenDb comes with a native support for .Net and Linq.
The Linq provider, under the hood, does normal REST calls to the ravendb server, but for you it's easier to code on it since you can use IQueryable<T> with strongly typed classes.
So yes, you can and you should use linq/lambda to work with RavenDB in a .Net envorinment.
Something to be aware of that caught me out is that if you include a linq statement such as .Where(d => d.SomeProperty == null) then you might expect that if the document does not have the property then you would return a match. However this is not the case. If the document does not have the property then its value is not considered to be null (or any other value).

Dynamically Generating a Linq/Lambda Where Clause

I've been searching here and Google, but I'm at a loss. I need to let users search a database for reports using a form. If a field on the form has a value, the app will get any reports with that field set to that value. If a field on a form is left blank, the app will ignore it. How can I do this? Ideally, I'd like to just write Where clauses as Strings and add together those that are not empty.
.Where("Id=1")
I've heard this is supposed to work, but I keep getting an error: "could not be resolved in the current scope of context Make sure all referenced variables are in scope...".
Another approach is to pull all the reports then filter it one where clause at a time. I'm hesitant to do this because 1. that's a huge chunk of data over the network and 2. that's a lot of processing on the user side. I'd like to take advantage of the server's processing capabilities. I've heard that it won't query until it's actually requested. So doing something like this
var qry = ctx.Reports
.Select(r => r);
does not actually run the query until I do:
qry.First()
But if I start doing:
qry = qry.Where(r => r.Id = 1).Select(r => r);
qry = qry.Where(r => r.reportDate = '2010/02/02').Select(r => r);
Would that run the query? Since I'm adding a where clause to it. I'd like a simple solution...in the worst case I'd use the Query Builder things...but I'd rather avoid that (seems complex).
Any advice? :)
Linq delays record fetching until a record must be fetched.
That means stacking Where clauses is only adding AND/OR clauses to the query, but still not executing.
Execution of the generated query will be done in the precise moment you try to get a record (First, Any etc), a list of records(ToList()), or enumerate them (foreach).
.Take(N) is not considered fetching records - but adding a (SELECT TOP N / LIMIT N) to the query
No, this will not run the query, you can structure your query this way, and it is actually preferable if it helps readability. You are taking advantage of lazy evaluation in this case.
The query will only run if you enumerate results from it by using i.e. foreach or you force eager evaluation of the query results, i.e. using .ToList() or otherwise force evaluation, i.e evaluate to a single result using i.e First() or Single().
Try checking out this dynamic Linq dll that was released a few years back - it still works just fine and looks to be exactly what you are looking for.

Categories

Resources