Using Facets in the Aggregation Framework C#

I would like to run an aggregation that counts how many times each tag occurs across a collection of books in my .NET application.
I have the following Book class.
public class Book
{
public string Id { get; set; }
public string Name { get; set; }
[BsonDictionaryOptions(DictionaryRepresentation.Document)]
public Dictionary<string, string> Tags { get; set; }
}
And when the data is saved, it is stored in the following format in MongoDB.
{
"_id" : ObjectId("574325a36fdc967af03766dc"),
"Name" : "My First Book",
"Tags" : {
"Edition" : "First",
"Type" : "HardBack",
"Published" : "2017"
}
}
I've been using facets directly in MongoDB and I am able to get the results that I need by using the following query:
db.{myCollection}.aggregate(
[
{
$match: {
"Name" : "SearchValue"
}
},
{
$facet: {
"categorizedByTags" : [
{
$project :
{
Tags: { $objectToArray: "$Tags" }
}
},
{ $unwind : "$Tags"},
{ $sortByCount : "$Tags"}
]
}
},
]
);
However, I am unable to translate this to the MongoDB .NET C# driver. How can I do this using the driver?
Edit - I will ultimately be looking to query the DB on other properties of the books as part of a faceted book listings page, such as Publisher, Author, Page count etc... hence the usage of $facet, unless there is a better way of doing this?

I would personally not use $facet here since you've only got one pipeline, which defeats the purpose of $facet in the first place.
The following is simpler and scales better ($facet returns all of its results in one single document, which can become very large and is subject to the 16MB BSON document limit).
db.collection.aggregate([
{
$match: {
"Name" : "My First Book"
}
}, {
$project: {
"Tags": {
$objectToArray: "$Tags"
}
}
}, {
$unwind: "$Tags"
}, {
$sortByCount: "$Tags"
}, {
$group: { // not really needed unless you need to have all results in one single document
"_id": null,
"categorizedByTags": {
$push: "$$ROOT"
}
}
}, {
$project: { // not really needed, either: remove _id field
"_id": 0
}
}])
This could be written using the C# driver as follows:
var collection = new MongoClient().GetDatabase("test").GetCollection<Book>("test");
var pipeline = collection.Aggregate()
.Match(b => b.Name == "My First Book")
.Project("{Tags: { $objectToArray: \"$Tags\" }}")
.Unwind("Tags")
.SortByCount<BsonDocument>("$Tags");
var output = pipeline.ToList().ToJson(new JsonWriterSettings {Indent = true});
Console.WriteLine(output);
Here's the version using a facet:
var collection = new MongoClient().GetDatabase("test").GetCollection<Book>("test");
var project = PipelineStageDefinitionBuilder.Project<Book, BsonDocument>("{Tags: { $objectToArray: \"$Tags\" }}");
var unwind = PipelineStageDefinitionBuilder.Unwind<BsonDocument, BsonDocument>("Tags");
var sortByCount = PipelineStageDefinitionBuilder.SortByCount<BsonDocument, BsonDocument>("$Tags");
var pipeline = PipelineDefinition<Book, AggregateSortByCountResult<BsonDocument>>.Create(new IPipelineStageDefinition[] { project, unwind, sortByCount });
// string based alternative version
//var pipeline = PipelineDefinition<Book, BsonDocument>.Create(
// "{ $project :{ Tags: { $objectToArray: \"$Tags\" } } }",
// "{ $unwind : \"$Tags\" }",
// "{ $sortByCount : \"$Tags\" }");
var facetPipeline = AggregateFacet.Create("categorizedByTags", pipeline);
var aggregation = collection.Aggregate().Match(b => b.Name == "My First Book").Facet(facetPipeline);
var output = aggregation.Single().Facets.ToJson(new JsonWriterSettings { Indent = true });
Console.WriteLine(output);
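To make it clearer what the $project / $unwind / $sortByCount stages compute, here is a minimal in-memory sketch in plain JavaScript. It does not talk to MongoDB at all; the `books` array and the `countTags` helper are made up for illustration, and the order of entries with equal counts is not something the server guarantees.

```javascript
// Hypothetical sample data shaped like the documents in the question.
const books = [
  { Name: "My First Book", Tags: { Edition: "First", Type: "HardBack", Published: "2017" } },
  { Name: "My First Book", Tags: { Edition: "Second", Type: "HardBack", Published: "2017" } },
];

// Emulates $objectToArray -> $unwind -> $sortByCount on the Tags field.
function countTags(docs) {
  const counts = new Map();
  for (const doc of docs) {
    // $objectToArray turns { Edition: "First" } into [{ k: "Edition", v: "First" }];
    // $unwind then emits one document per { k, v } pair.
    for (const [k, v] of Object.entries(doc.Tags ?? {})) {
      const key = JSON.stringify({ k, v });
      counts.set(key, (counts.get(key) ?? 0) + 1);
    }
  }
  // $sortByCount groups identical { k, v } values and sorts by count, descending.
  return [...counts.entries()]
    .map(([key, count]) => ({ _id: JSON.parse(key), count }))
    .sort((a, b) => b.count - a.count);
}

const tagCounts = countTags(books);
console.log(tagCounts);
```

Each result has the same shape as a $sortByCount output document: an `_id` holding the `{ k, v }` pair and a `count`.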

Related

MongoDB .NET Driver - Use $in in the match stage

I have two collections, one for posts (PostInfo) and one for users (UserInfo). I join the two collections and want to find the posts where the given user ID is in AsUser.Friends:
var docs = await _dbContext.PostInfos.Aggregate()
.Lookup("UserInfo", "UserId", "UserId", "AsUser")
.Unwind("AsUser")
.Match(
new BsonDocument() {
{ "$expr", new BsonDocument() {
{ "$in", new BsonArray(){ "$AsUser.Friends", BsonArray.Create(user.UserId) } }
}
}
}
)
.As<PostInfo>()
.Project<PostInfo>(Builders<PostInfo>.Projection.Exclude("AsUser"))
.ToListAsync();
This is a UserInfo document:
{
"_id" : ObjectId("62d64398772c29b212332ec2"),
"UserId" : "18F1FDB9-E5DE-4116-9486-271FE6738785",
"IsDeleted" : false,
"UserName" : "kaveh",
"Followers" : [],
"Followings" : [],
"Friends" : [
"9e3163b9-1ae6-4652-9dc6-7898ab7b7a00",
"2B5F6867-E804-48AF-BED3-672EBD770D10"
]
}
I am having a problem working with the $in operator.
Update
Also, I think this would work too (from here):
db.inventory.find( { tags: { $eq: [ "A", "B" ] } } )
But I can't convert this to C# format.
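The arguments to $in are in the wrong order. As an aggregation expression, $in takes the value to look for first and the array to search second, so you should check whether the user ID is in the AsUser.Friends array as below: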
{
$match: {
$expr: {
$in: [
"9e3163b9-1ae6-4652-9dc6-7898ab7b7a00", // UserId
"$AsUser.Friends"
]
}
}
}
For MongoDB C# syntax,
.Match(
new BsonDocument()
{
{
"$expr", new BsonDocument()
{
{ "$in", new BsonArray() { user.UserId, "$AsUser.Friends" } }
}
}
}
)
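As a rough illustration of why the argument order matters, here is a small JavaScript sketch of how the aggregation form of $in evaluates its two arguments. The `inExpr` helper is hypothetical and only mimics the semantics; the error text is my own wording, not the server's.

```javascript
// Sketch of the aggregation expression { $in: [ <value>, <array> ] }.
function inExpr(value, array) {
  if (!Array.isArray(array)) {
    // The second argument has to resolve to an array.
    throw new TypeError("second argument of $in must be an array");
  }
  return array.includes(value);
}

// Values taken from the UserInfo document in the question.
const friends = [
  "9e3163b9-1ae6-4652-9dc6-7898ab7b7a00",
  "2B5F6867-E804-48AF-BED3-672EBD770D10",
];
const userId = "9e3163b9-1ae6-4652-9dc6-7898ab7b7a00";

// Correct order: is userId an element of the Friends array?
const correctOrder = inExpr(userId, friends);

// Order from the question: is the whole Friends array an element of [userId]?
const swappedOrder = inExpr(friends, [userId]);

console.log(correctOrder, swappedOrder);
```

With the swapped order nothing fails outright; the expression is just false for every document, so the $match stage returns no posts.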

MongoDB $geoNear result separate distance from returned document

I am running a MongoDB query in a .NET application that uses the $geoNear aggregation stage and returns a list of objects within a specified distance. It returns them in the following format (I am using results from the MongoDB examples page for simplicity):
{
"_id" : 8,
"name" : "Sara D. Roosevelt Park",
"location" : {
"type" : "Point",
"coordinates" : [
-73.9928,
40.7193
]
},
"category" : "Parks",
"distance" : 974.175764916902
}
which is essentially my output type with added "distance" field. What I want to do is separate them into a different class so it looks like this:
public class EventLocationSearchResult
{
public Event Event { get; set; }
public double Distance { get; set; }
}
My code so far looks like this:
var result = await events
.Aggregate()
.AppendStage(GeoHelper.GetGeoNearStage<Event>(lon, lat, distance))
.Lookup<User, Event>(_usersCollectionName, "memberIds", "id", "Members")
.ToListAsync();
with GeoHelper.GetGeoNearStage(..) looking like so:
public static BsonDocumentPipelineStageDefinition<TNewResult, TNewResult> GetGeoNearStage<TNewResult>(double lon, double lat, double distance, string dbField)
{
var geoNearOptions = new BsonDocument
{
{ "near", new GeoJsonPoint<GeoJson2DGeographicCoordinates>(new GeoJson2DGeographicCoordinates(lon, lat)).ToBsonDocument()},
{ "maxDistance", distance },
{ "distanceField", "Distance" },
{ "spherical", true }
};
var stage = new BsonDocumentPipelineStageDefinition<TNewResult, TNewResult>(new BsonDocument { { "$geoNear", geoNearOptions } });
return stage;
}
Can I do this somehow? Thanks for any help!

Replace Root with C# driver for .net core AggregateExpressionDefinition

I'm trying to perform a simple unwind and replace root in .net core 2.2.
I've already tried the query in MongoDB and it works, but I'm finding it difficult to translate it 100% to C# without using magic strings.
This is my document:
{
"_id" : ObjectId("5cb6475b20b49a5cec99eb89"),
"name" : "Route A",
"isActive" : true,
"buses" : [
{
"capacity" : "15",
"time" : "08:00:00",
"direction" : "Inbound"
},
{
"capacity" : "15",
"time" : "08:30:00",
"direction" : "Inbound"
},
{
"capacity" : "15",
"time" : "08:00:00",
"direction" : "Outbound"
},
{
"capacity" : "15",
"time" : "08:30:00",
"direction" : "Outbound"
}
]
}
I also have a class for the root document called Routes and another one for the subdocument called Bus.
The query I'm running in mongo is this one:
db.routes.aggregate([
{ $match : { "_id" : ObjectId("5cb4e818cb95b3572c8f0f2c") } },
{ $unwind: "$buses" },
{ $replaceRoot: { "newRoot": "$buses" } }
])
The expected result is a simple array of buses. So far I'm getting it with this query in C#:
_routes.Aggregate()
.Match(r => r.Id == routeId)
.Unwind<Route, UnwoundRoute>(r => r.Buses)
.ReplaceRoot<Bus>("$buses")
.ToListAsync();
I want to know if it's possible to replace the string "$buses" with something that's not hardcoded.
I've tried using the AggregateExpressionDefinition class which is one of the possible parameters that the ReplaceRoot method can receive but I wasn't able to understand it completely.
Any help will be appreciated.
Posting this here in case someone ends up making the same mistakes I did.
I basically created a new "UnwoundRoute" entity to hold the results of the unwind operation and then used a simple LINQ expression. Thanks to the reddit user u/Nugsly for the suggestion about changing the way I should call unwind.
This works:
_routes.Aggregate()
.Match(r => r.Id == routeId)
.Unwind<Route, UnwoundRoute>(r => r.Buses)
.ReplaceRoot(ur => ur.Buses)
.ToListAsync();
You can also filter the result of the replace root afterwards:
_routes.Aggregate()
.Match(r => r.Id == routeId)
.Unwind<Route, UnwoundRoute>(r => r.Buses)
.ReplaceRoot(ur => ur.Buses)
.Match(b => b.Direction == direction)
.ToListAsync();
And it will return an array of documents.
{
"capacity" : "15",
"time" : "08:00:00",
"direction" : "Inbound"
},
{
"capacity" : "15",
"time" : "08:30:00",
"direction" : "Inbound"
}
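For readers who want to see what $unwind followed by $replaceRoot does to the data, here is a minimal in-memory sketch in plain JavaScript. The `routes` sample and the `unwindAndReplaceRoot` helper are illustrative only, and the sketch assumes the unwound field is always an array.

```javascript
// Hypothetical sample shaped like the route document in the question.
const routes = [
  {
    _id: "5cb6475b20b49a5cec99eb89",
    name: "Route A",
    isActive: true,
    buses: [
      { capacity: "15", time: "08:00:00", direction: "Inbound" },
      { capacity: "15", time: "08:30:00", direction: "Inbound" },
      { capacity: "15", time: "08:00:00", direction: "Outbound" },
      { capacity: "15", time: "08:30:00", direction: "Outbound" },
    ],
  },
];

// $unwind emits one document per array element; $replaceRoot then promotes
// that element to be the whole document, dropping the route-level fields.
function unwindAndReplaceRoot(docs, field) {
  return docs.flatMap((doc) => (Array.isArray(doc[field]) ? doc[field] : []));
}

const buses = unwindAndReplaceRoot(routes, "buses");

// Equivalent of the trailing .Match(b => b.Direction == direction).
const inbound = buses.filter((bus) => bus.direction === "Inbound");

console.log(inbound);
```

The filter runs on plain bus documents, which is why the final Match in the C# version can be written against Bus rather than the route type.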
Also, if you add the result type to ReplaceRoot, Visual Studio reports an error saying that the lambda expression can't be converted because it's not a delegate type.
This doesn't work (which is what I had in the beginning):
_routes.Aggregate()
.Match(r => r.Id == routeId)
.Unwind<Route, UnwoundRoute>(r => r.Buses)
.ReplaceRoot<Bus>(ur => ur.Buses)
.ToListAsync();
I can offer you a solution that uses MongoDAL, which is a wrapper around the C# driver that hides most of the driver's complexity.
using System;
using System.Linq;
using MongoDAL;
namespace BusRoutes
{
class Route : Entity
{
public string name { get; set; }
public bool isActive { get; set; }
public Bus[] buses { get; set; }
}
class Bus
{
public int capacity { get; set; }
public string time { get; set; }
public string direction { get; set; }
}
class Program
{
static void Main(string[] args)
{
new DB("busroutes");
var routeA = new Route
{
name = "Route A",
buses = new Bus[]
{
new Bus { capacity = 15, direction = "Inbound", time = "8:00:00"},
new Bus { capacity = 25, direction = "Outbound", time = "9:00:00" },
new Bus { capacity = 35, direction = "Inbound", time = "10:00:00" }
}
};
routeA.Save();
var query = routeA.Collection()
.Where(r => r.ID == routeA.ID)
.SelectMany(r => r.buses);
Console.WriteLine(query.ToString());
var busesOfRouteA = query.ToArray();
foreach (var bus in busesOfRouteA)
{
Console.WriteLine(bus.time.ToString());
}
Console.ReadKey();
}
}
}

MongoDb - Joining ObjectId references in the list with related collections [duplicate]

I have the following MongoDb query working:
db.Entity.aggregate(
[
{
"$match":{"Id": "12345"}
},
{
"$lookup": {
"from": "OtherCollection",
"localField": "otherCollectionId",
"foreignField": "Id",
"as": "ent"
}
},
{
"$project": {
"Name": 1,
"Date": 1,
"OtherObject": { "$arrayElemAt": [ "$ent", 0 ] }
}
},
{
"$sort": {
"OtherObject.Profile.Name": 1
}
}
]
)
This retrieves a list of objects joined with a matching object from another collection.
Does anybody know how I can use this in C# using either LINQ or by using this exact string?
I tried using the following code but it can't seem to find the types for QueryDocument and MongoCursor - I think they've been deprecated?
BsonDocument document = MongoDB.Bson.Serialization.BsonSerializer.Deserialize<BsonDocument>("{ name : value }");
QueryDocument queryDoc = new QueryDocument(document);
MongoCursor toReturn = _connectionCollection.Find(queryDoc);
There is no need to parse the JSON. Everything here can actually be done directly with either LINQ or the Aggregate Fluent interfaces.
Just using some demonstration classes because the question does not really give much to go on.
Setup
Basically we have two collections here, being
entities
{ "_id" : ObjectId("5b08ceb40a8a7614c70a5710"), "name" : "A" }
{ "_id" : ObjectId("5b08ceb40a8a7614c70a5711"), "name" : "B" }
and others
{
"_id" : ObjectId("5b08cef10a8a7614c70a5712"),
"entity" : ObjectId("5b08ceb40a8a7614c70a5710"),
"name" : "Sub-A"
}
{
"_id" : ObjectId("5b08cefd0a8a7614c70a5713"),
"entity" : ObjectId("5b08ceb40a8a7614c70a5711"),
"name" : "Sub-B"
}
And a couple of classes to bind them to, just as very basic examples:
public class Entity
{
public ObjectId id;
public string name { get; set; }
}
public class Other
{
public ObjectId id;
public ObjectId entity { get; set; }
public string name { get; set; }
}
public class EntityWithOthers
{
public ObjectId id;
public string name { get; set; }
public IEnumerable<Other> others;
}
public class EntityWithOther
{
public ObjectId id;
public string name { get; set; }
public Other others;
}
Queries
Fluent Interface
var listNames = new[] { "A", "B" };
var query = entities.Aggregate()
.Match(p => listNames.Contains(p.name))
.Lookup(
foreignCollection: others,
localField: e => e.id,
foreignField: f => f.entity,
@as: (EntityWithOthers eo) => eo.others
)
.Project(p => new { p.id, p.name, other = p.others.First() } )
.Sort(new BsonDocument("other.name",-1))
.ToList();
Request sent to server:
[
{ "$match" : { "name" : { "$in" : [ "A", "B" ] } } },
{ "$lookup" : {
"from" : "others",
"localField" : "_id",
"foreignField" : "entity",
"as" : "others"
} },
{ "$project" : {
"id" : "$_id",
"name" : "$name",
"other" : { "$arrayElemAt" : [ "$others", 0 ] },
"_id" : 0
} },
{ "$sort" : { "other.name" : -1 } }
]
Probably the easiest to understand since the fluent interface is basically the same as the general BSON structure. The $lookup stage has all the same arguments and the $arrayElemAt is represented with First(). For the $sort you can simply supply a BSON document or other valid expression.
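If it helps to see the join semantics without a database, the following JavaScript sketch mimics the $lookup, $project with $arrayElemAt, and $sort stages on in-memory arrays. The string ids and the `lookupFirst` helper are stand-ins I made up; real documents would use ObjectId values.

```javascript
// Hypothetical data mirroring the "entities" and "others" collections.
const entities = [
  { _id: "e1", name: "A" },
  { _id: "e2", name: "B" },
];
const others = [
  { _id: "o1", entity: "e1", name: "Sub-A" },
  { _id: "o2", entity: "e2", name: "Sub-B" },
];

function lookupFirst(localDocs, foreignDocs) {
  return localDocs
    .map((e) => {
      // $lookup: collect every foreign document whose "entity" equals the local _id.
      const matches = foreignDocs.filter((o) => o.entity === e._id);
      // $arrayElemAt [ "$others", 0 ]: keep only the first match (undefined if none).
      return { id: e._id, name: e.name, other: matches[0] };
    })
    // $sort { "other.name": -1 }: descending by the joined document's name.
    .sort((a, b) => {
      const an = a.other ? a.other.name : "";
      const bn = b.other ? b.other.name : "";
      return an < bn ? 1 : an > bn ? -1 : 0;
    });
}

const joinedFirst = lookupFirst(entities, others);
console.log(joinedFirst);
```

Note that the full `matches` array is built before the first element is taken, which is the cost the later $unwind variants try to avoid.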
An alternative is the newer expressive form of $lookup with a sub-pipeline statement, available in MongoDB 3.6 and above.
BsonArray subpipeline = new BsonArray();
subpipeline.Add(
new BsonDocument("$match",new BsonDocument(
"$expr", new BsonDocument(
"$eq", new BsonArray { "$$entity", "$entity" }
)
))
);
var lookup = new BsonDocument("$lookup",
new BsonDocument("from", "others")
.Add("let", new BsonDocument("entity", "$_id"))
.Add("pipeline", subpipeline)
.Add("as","others")
);
var query = entities.Aggregate()
.Match(p => listNames.Contains(p.name))
.AppendStage<EntityWithOthers>(lookup)
.Unwind<EntityWithOthers, EntityWithOther>(p => p.others)
.SortByDescending(p => p.others.name)
.ToList();
Request sent to server:
[
{ "$match" : { "name" : { "$in" : [ "A", "B" ] } } },
{ "$lookup" : {
"from" : "others",
"let" : { "entity" : "$_id" },
"pipeline" : [
{ "$match" : { "$expr" : { "$eq" : [ "$$entity", "$entity" ] } } }
],
"as" : "others"
} },
{ "$unwind" : "$others" },
{ "$sort" : { "others.name" : -1 } }
]
The Fluent "Builder" does not support the syntax directly yet, nor do LINQ Expressions support the $expr operator, however you can still construct using BsonDocument and BsonArray or other valid expressions. Here we also "type" the $unwind result in order to apply a $sort using an expression rather than a BsonDocument as shown earlier.
Aside from other uses, a primary task of a "sub-pipeline" is to reduce the documents returned in the target array of $lookup. Also the $unwind here serves a purpose of actually being "merged" into the $lookup statement on server execution, so this is typically more efficient than just grabbing the first element of the resulting array.
Queryable GroupJoin
var query = entities.AsQueryable()
.Where(p => listNames.Contains(p.name))
.GroupJoin(
others.AsQueryable(),
p => p.id,
o => o.entity,
(p, o) => new { p.id, p.name, other = o.First() }
)
.OrderByDescending(p => p.other.name);
Request sent to server:
[
{ "$match" : { "name" : { "$in" : [ "A", "B" ] } } },
{ "$lookup" : {
"from" : "others",
"localField" : "_id",
"foreignField" : "entity",
"as" : "o"
} },
{ "$project" : {
"id" : "$_id",
"name" : "$name",
"other" : { "$arrayElemAt" : [ "$o", 0 ] },
"_id" : 0
} },
{ "$sort" : { "other.name" : -1 } }
]
This is almost identical, just using a different interface, and it produces a slightly different BSON statement only because of the simplified naming in the functional statements. This does bring up the other possibility of simply using an $unwind as produced from a SelectMany():
var query = entities.AsQueryable()
.Where(p => listNames.Contains(p.name))
.GroupJoin(
others.AsQueryable(),
p => p.id,
o => o.entity,
(p, o) => new { p.id, p.name, other = o }
)
.SelectMany(p => p.other, (p, other) => new { p.id, p.name, other })
.OrderByDescending(p => p.other.name);
Request sent to server:
[
{ "$match" : { "name" : { "$in" : [ "A", "B" ] } } },
{ "$lookup" : {
"from" : "others",
"localField" : "_id",
"foreignField" : "entity",
"as" : "o"
}},
{ "$project" : {
"id" : "$_id",
"name" : "$name",
"other" : "$o",
"_id" : 0
} },
{ "$unwind" : "$other" },
{ "$project" : {
"id" : "$id",
"name" : "$name",
"other" : "$other",
"_id" : 0
}},
{ "$sort" : { "other.name" : -1 } }
]
Normally placing an $unwind directly following $lookup is actually an "optimized pattern" for the aggregation framework. However the .NET driver does mess this up in this combination by forcing a $project in between rather than using the implied naming on the "as". If not for that, this is actually better than the $arrayElemAt when you know you have "one" related result. If you want the $unwind "coalescence", then you are better off using the fluent interface, or a different form as demonstrated later.
Queryable Natural
var query = from p in entities.AsQueryable()
where listNames.Contains(p.name)
join o in others.AsQueryable() on p.id equals o.entity into joined
select new { p.id, p.name, other = joined.First() }
into p
orderby p.other.name descending
select p;
Request sent to server:
[
{ "$match" : { "name" : { "$in" : [ "A", "B" ] } } },
{ "$lookup" : {
"from" : "others",
"localField" : "_id",
"foreignField" : "entity",
"as" : "joined"
} },
{ "$project" : {
"id" : "$_id",
"name" : "$name",
"other" : { "$arrayElemAt" : [ "$joined", 0 ] },
"_id" : 0
} },
{ "$sort" : { "other.name" : -1 } }
]
All pretty familiar and really just down to functional naming. Just as with using the $unwind option:
var query = from p in entities.AsQueryable()
where listNames.Contains(p.name)
join o in others.AsQueryable() on p.id equals o.entity into joined
from sub_o in joined.DefaultIfEmpty()
select new { p.id, p.name, other = sub_o }
into p
orderby p.other.name descending
select p;
Request sent to server:
[
{ "$match" : { "name" : { "$in" : [ "A", "B" ] } } },
{ "$lookup" : {
"from" : "others",
"localField" : "_id",
"foreignField" : "entity",
"as" : "joined"
} },
{ "$unwind" : {
"path" : "$joined", "preserveNullAndEmptyArrays" : true
} },
{ "$project" : {
"id" : "$_id",
"name" : "$name",
"other" : "$joined",
"_id" : 0
} },
{ "$sort" : { "other.name" : -1 } }
]
Which actually is using the "optimized coalescence" form. The translator still insists on adding a $project since we need the intermediate select in order to make the statement valid.
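The difference that DefaultIfEmpty() makes, namely the preserveNullAndEmptyArrays option on $unwind, can be sketched in plain JavaScript as follows. The `unwind` helper and the sample documents are hypothetical, and the sketch only models array fields; it does not reproduce how the server treats null or non-array values.

```javascript
// Simplified $unwind for array fields only (an assumption of this sketch).
function unwind(docs, field, preserveNullAndEmptyArrays = false) {
  return docs.flatMap((doc) => {
    const values = doc[field];
    if (!Array.isArray(values) || values.length === 0) {
      // Without the option the document is dropped, like an inner join.
      if (!preserveNullAndEmptyArrays) return [];
      // With the option the document is kept, minus the empty field, like a left join.
      const { [field]: _dropped, ...rest } = doc;
      return [rest];
    }
    // One output document per array element.
    return values.map((value) => ({ ...doc, [field]: value }));
  });
}

// Hypothetical output of the $lookup stage: "C" has no related documents.
const afterLookup = [
  { _id: "e1", name: "A", joined: [{ name: "Sub-A" }] },
  { _id: "e3", name: "C", joined: [] },
];

const strict = unwind(afterLookup, "joined");
const preserved = unwind(afterLookup, "joined", true);

console.log(strict.length, preserved.length);
```

So with the option set, entities that have no match in "others" still appear in the result, just without a joined document.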
Summary
So there are quite a few ways to essentially arrive at what is basically the same query statement with exactly the same results. Whilst you "could" parse the JSON to BsonDocument form and feed this to the fluent Aggregate() command, it's generally better to use the natural builders or the LINQ interfaces as they do easily map onto the same statement.
The options with $unwind are largely shown because even with a "singular" match that "coalescence" form is actually far more optimal than using $arrayElemAt to take the "first" array element. This becomes even more important with considerations such as the BSON limit, where the $lookup target array could cause the parent document to exceed 16MB without further filtering. There is another post, "Aggregate $lookup Total size of documents in matching pipeline exceeds maximum document size", where I discuss how to avoid hitting that limit by using such options or other Lookup() syntax available only to the fluent interface at this time.

Understanding MongoDB Aggregate and GroupBy

I'm trying to do a query in MongoDB to first group by id and then sort descending. I have a functional LINQ expression here:
var list = this.GetPackages().ToList();
list = list.OrderByDescending(package => package.PackageVersion)
.GroupBy(g => g.PackageId)
.Select(packages => packages.First()).ToList();
return list;
But I can't seem to come up with the equivalent MongoDB expression, nor can I even get the $project function to work:
db.packages.aggregate([
{
$sort : { packageVersion : -1 }
},
{
$group: { _id: "$PackageId" }
},
{
$project: { PackageVersion: 1, Title: 1 }
}
])
My result is this:
{ "_id" : "e3afb1fe-dce7-476e-8372-cd8201abc131" }
{ "_id" : "e3722179-0903-4894-9a86-3a3ffd94de83" }
{ "_id" : "3e65e93a-4c2c-4a02-8b21-e5858a4058dd" }
Is the MongoDB query of the correct format, and is there an equivalent way to do this using the C# MongoDB driver?
Use the $first operator together with the $$ROOT variable to get the first document in each group.
$$ROOT is a system variable that: "References the root document, i.e. the top-level document, currently being processed in the aggregation pipeline stage."
Then project the first document.
db.packages.aggregate([
{
$sort : { PackageVersion : -1 } // assumes the stored field is "PackageVersion"; field names are case-sensitive
},
{
$group: { "_id": "$PackageId","firstPackage":{$first:"$$ROOT"}}
},
{
$project: { "firstPackage": 1, "_id": 0}
}
])
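To show what $sort followed by $group with $first: "$$ROOT" produces, here is a small in-memory JavaScript sketch. The sample packages, the numeric PackageVersion values, and the `latestPerPackage` helper are all assumptions made for the example.

```javascript
// Hypothetical sample data; versions are plain numbers to keep the sort simple.
const packages = [
  { PackageId: "pkg-a", PackageVersion: 1, Title: "A v1" },
  { PackageId: "pkg-a", PackageVersion: 3, Title: "A v3" },
  { PackageId: "pkg-b", PackageVersion: 2, Title: "B v2" },
  { PackageId: "pkg-a", PackageVersion: 2, Title: "A v2" },
];

function latestPerPackage(docs) {
  // $sort { PackageVersion: -1 }: highest version first.
  const sorted = [...docs].sort((a, b) => b.PackageVersion - a.PackageVersion);

  // $group with $first: "$$ROOT": remember the first whole document seen per PackageId.
  const firstByGroup = new Map();
  for (const doc of sorted) {
    if (!firstByGroup.has(doc.PackageId)) {
      firstByGroup.set(doc.PackageId, doc);
    }
  }

  // $project { firstPackage: 1, _id: 0 }: keep only the captured document.
  return [...firstByGroup.values()].map((firstPackage) => ({ firstPackage }));
}

const latest = latestPerPackage(packages);
console.log(latest);
```

This mirrors the LINQ in the question: OrderByDescending, then GroupBy, then First() on each group.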
