Sorting by aggregate of two fields - c#

I have a mongo database with documents that look like this:
{
PublishedDate: [date],
PublishedDateOverride: [NullableDate],
...
}
The reason I have the override as a separate field is that it is important to know the original published date as well as the overridden one.
When I get these documents back I want to sort them by their "apparent" published date. That is, if there is an override it should use that; otherwise it should use the original.
Our current system just sorts by PublishedDateOverride and then by PublishedDate which of course groups all of those with a null override together.
For a concrete example take the following four documents:
A = {
PublishedDate: 2014-03-14,
PublishedDateOverride: 2014-03-24,
...
}
B = {
PublishedDate: 2014-01-21,
PublishedDateOverride: 2014-02-02,
...
}
C = {
PublishedDate: 2014-03-01,
PublishedDateOverride: null,
...
}
D = {
PublishedDate: 2014-03-27,
PublishedDateOverride: null,
...
}
The desired sort order would be D (2014-03-27), A (2014-03-24), C (2014-03-01), B (2014-02-02).
I need to do this in the database because I am also paging the data, so I can't just sort after getting it out of the database.
So the question:
What is the best way to achieve this goal? Is there a way to sort by an expression? Is there a way to have a calculated field such that whenever I update a document it will put the appropriate date in there to sort on?
I'm doing this in C# in case that is relevant but I would assume any solution would be a mongo one, not in my client code.

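If you want to project whichever of the two dates is greater, use aggregate with the $cond and $gt operators. A basic shell example (translating it to C# is not hard):
db.collection.aggregate([
    { "$project": {
        "date": { "$cond": [
            { "$gt": [
                "$PublishedDate",
                "$PublishedDateOverride"
            ]},
            "$PublishedDate",
            "$PublishedDateOverride"
        ]}
    }},
    { "$sort": { "date": -1 } } // newest first, as in the question
])
This reduces each document to a "date" field holding whichever of the two fields has the greater value, and you can then sort on the result, all processed server side. Because null compares lower than any date, a null override falls back to PublishedDate. Note that an override earlier than the original date would lose to the original here; if the override must always win when present, { "$ifNull": [ "$PublishedDateOverride", "$PublishedDate" ] } expresses that directly.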

Try:
datesCollection.OrderByDescending(d => d.PublishedDateOverride != null ? d.PublishedDateOverride : d.PublishedDate)
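As a minimal in-memory sketch of the "override if present, otherwise original" rule, the four sample documents from the question can be sorted with LINQ to Objects. The Article record, the ApparentDateSort class and its method names are hypothetical, introduced only for illustration. This only checks the ordering rule; it does not address paging in the database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for the stored documents.
public record Article(string Name, DateTime PublishedDate, DateTime? PublishedDateOverride);

public static class ApparentDateSort
{
    // The override wins whenever it is set; otherwise fall back to the original date.
    public static DateTime ApparentDate(Article a) =>
        a.PublishedDateOverride ?? a.PublishedDate;

    // Newest apparent date first, which is the order asked for in the question.
    public static List<string> NamesNewestFirst(IEnumerable<Article> articles) =>
        articles.OrderByDescending(a => ApparentDate(a))
                .Select(a => a.Name)
                .ToList();

    // The four example documents A, B, C and D from the question.
    public static List<Article> Sample() => new List<Article>
    {
        new Article("A", new DateTime(2014, 3, 14), new DateTime(2014, 3, 24)),
        new Article("B", new DateTime(2014, 1, 21), new DateTime(2014, 2, 2)),
        new Article("C", new DateTime(2014, 3, 1), null),
        new Article("D", new DateTime(2014, 3, 27), null),
    };
}
```

Calling ApparentDateSort.NamesNewestFirst(ApparentDateSort.Sample()) yields D, A, C, B for this data.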

Use Indexes to Sort Query Results
To sort on multiple fields, create a compound index.

Related

C# MongoDB.Driver sortBy element in property that is an array

I've got a document structure like this:
public class CountryDomain{
public string Iso2 {get;set;}
public List<EntityName> Names {get;set;}
}
public class EntityName{
public string Culture {get;set;}
public string Name {get;set;}
}
This results in a document like this:
{
Iso2: "th",
Names: [
{Culture: "nl", Name: "Thailand"},
{Culture: "en", Name: "Thailand"}
]
}
Now I want to get a list of countries sorted by the name of the "nl" culture.
Sorting by ISO2 works by doing this:
var collection = Context.Database.GetCollection<CountryDomain>(Table);
return collection.Aggregate().SortByDescending(e => e.Iso2).ToList();
But how do I sort by the Name value in the Names array where the Culture is nl?
I tried something like this but that cannot be serialized by the MongoDB Driver.
var collection = Context.Database.GetCollection<CountryDomain>(Table);
return collection.Aggregate().SortByDescending(e => e.Names.FirstOrDefault(x => x.Culture == "nl")).ToList();
In terms of MongoDB querying directly, I think you will need to take an approach demonstrated in this playground example. Basically you'll need to do an $addFields in the aggregation first to extract out the value you want to sort on from the array. You can then use that generated field name in the subsequent sort specification and then remove it afterwards if desired.
The reason that I asked about the size of the result set in the comments is because this operation is not going to be able to use indexes to provide the sort. So if the results being returned (as opposed to just the number of documents in the collection) is large, then this operation may not be particularly performant and may consume meaningful system resources while doing so. That's not a reason to avoid it entirely, but certainly something to be aware of.

MongoDB: Combine Aggregation and Filter

Please see the following post for some background: MongoDB C# Driver - Return last modified rows only
After almost two years of running this code, we've been experiencing performance problems lately and as much as I keep on saying that the code is not the issue, Infrastructure are insisting it's because I'm doing full table scans.
The thing is that the problem is environment specific. Our QA environment runs like a dream all the time, but Dev and Prod are very slow at times and fine at others; it's very erratic. They have the same data and code, but Dev and Prod have another app that is also running on the database.
My data has an Id as well as an _id (or AuditId) - I group the data by Id and then return the last _id for that record where it was not deleted. We have multiple historic records for the same ID and I would like to return the last one (see original post).
So I have the following method:
private static FilterDefinition<T> ForLastAuditIds<T>(IMongoCollection<T> collection) where T : Auditable, IMongoAuditable
{
var pipeline = new[] { new BsonDocument { { "$group", new BsonDocument { { "_id", "$Id" }, { "LastAuditId", new BsonDocument { { "$max", "$_id" } } } } } } };
var lastAuditIds = collection.Aggregate<Audit>(pipeline).ToListAsync().Result.ToList().Select(_ => _.LastAuditId);
var forLastAuditIds = Builders<T>.Filter.Where(_ => lastAuditIds.Contains(_.AuditId) && _.Status != "DELETE");
return forLastAuditIds;
}
This method is called by the one below, which accepts an Expression that it appends to the FilterDefinition created by ForLastAuditIds.
protected List<T> GetLatest<T>(IMongoCollection<T> collection,
Expression<Func<T, bool>> filter, ProjectionDefinition<T, T> projection = null,
bool disableRoleCheck = false) where T : Auditable, IMongoAuditable
{
var forLastAuditIds = ForLastAuditIds(collection);
var limitedList = (
projection != null
? collection.Find(forLastAuditIds & filter, new FindOptions()).Project(projection)
: collection.Find(forLastAuditIds & filter, new FindOptions())
).ToListAsync().Result.ToList();
return limitedList;
}
Now, all of this works really well and is re-used by all of my code that calls Collections, but this specific collection is a lot bigger than the others and we are getting slowdowns just on that one.
My question is: Is there a way for me to take the aggregate and Filter Builder and combine them to return a single FilterDefinition that I could use without running the full table scan first?
I really hope I am making sense.
Assuming I fully understand what you want, this should be as easy as this:
First, put a descending index on the LastAuditId field:
db.collection.createIndex({ "LastAuditId": -1 /* for sorting */ })
Or even extend the index to cover for other fields that you have in your filter:
db.collection.createIndex({ "Status": 1, "LastAuditId": -1 /* for sorting */ })
Make sure, however, that you understand how indexes can/cannot support certain queries. And always use explain() to see what's really going on.
The next step is to realize that you must always filter as much as possible as the very first step to reduce the amount of sorting required.
So, if you need to filter by Name, for example, then by all means do it as the very first step if your business requirements permit it. Be careful, however: filtering at the start changes your semantics, in the sense that you will get the last modified documents for each Id that passed the preceding $match stage, as opposed to the last documents for each Id that happen to also pass the following $match stage.
Anyway, most importantly, once you've got a sorted set, you can easily and quickly get the latest full document by using $group with $first which - with the right index in place - will not do a collection scan anymore (it'll be an index scan for now and hence way faster).
Finally, you want to run the equivalent of the following MongoDB query through C# leveraging the $$ROOT variable in order to avoid a second query (I can put the required code together for you once you post your Audit, Auditable and IMongoAuditable types as well as any potential serializers/conventions):
db.getCollection('collection').aggregate({
    $match: {
        /* some criteria that you currently get in the "Expression<Func<BsonDocument, bool>> filter" */
    }
}, {
    $sort: {
        "ModifiedDate": -1 // this will use the index!
    }
}, {
    $group: {
        "_id": "$Id",
        "document": { $first: "$$ROOT" } // no need to do a separate subsequent query or a $max/$min across the entire group because we're sorted!
    }
}, {
    $match: { // some additional filtering depending on your needs
        "document.Status": { $ne: "DELETE" }
    }
})
Lastly, kindly note that it might be a good idea to move to the latest version of MongoDB because they are currently putting a lot of effort into optimizing aggregation cases like yours, e.g. this one: https://jira.mongodb.org/browse/SERVER-9507

Iterate over 2 nested lists

I am writing a weather app and need to go through 2 nested loops. For the return value I want to iterate over the first list, looking at the corresponding second list data. When the data in the second list matches the bool, I need to get data from the corresponding first list. Now I think that my code works... but would like to ask if this is a good way to do this. I am also not sure if this LINQ query will work in general, with even more nested lists. Here's my approach in LINQ:
public static async Task<string> UpdateWeather(string lat, string lon)
{
WeatherObject weather = await WeatherAPI.GetWeatherAsync(lat, lon);
var first = (from l in weather.list
from w in l.weather
where w.id == 800
select l.city.name).First();
return first;
}
Your code is OK; it is a LINQ query. One more thing, though: use FirstOrDefault() instead of First(). First() throws an exception if no matching element is found, whereas FirstOrDefault() returns the element or the default value.
You can also write in LINQ Method syntax if you prefer this.
public static async Task<string> UpdateWeather(string lat, string lon)
{
WeatherObject weather = await WeatherAPI.GetWeatherAsync(lat, lon);
var first = weather.list.Where(l => l.weather.Any(w => w.id == 800))
.Select(l => l.city.name)
.FirstOrDefault();
return first;
}
I believe your query should work, and it should generally work with more nested lists following a similar structure. As to if it is a good way to do this - it depends on the data structure and any data constraints.
For example, if two elements in weather.list each have an element with the same id in their nested weather list, then your code will only return the first one, which may not be correct.
e.g. in JSON:
[
    {
        "city": {
            "name": "Chicago"
        },
        "weather": [
            { "id": 799 },
            { "id": 800 }
        ]
    },
    {
        "city": {
            "name": "New York"
        },
        "weather": [
            { "id": 800 },
            { "id": 801 }
        ]
    }
]
For this dataset, your code will return "Chicago", but "New York" also matches. This may not be possible with the data API you are accessing, but given that there are no data constraints to ensure exclusivity of the nested lists, you might want to defensively check that there is only 0 or 1 elements in the returned list that match the expected criteria.
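One way to make that defensive check explicit is SingleOrDefault(), which returns the default value when nothing matches and throws InvalidOperationException when more than one element matches. The sketch below is only an illustration: the City, Condition and Entry records and the ClearSkyLookup class are hypothetical shapes modelled on the sample data, and the real WeatherObject may look different.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shapes mirroring the sample data; the real API types may differ.
public record City(string name);
public record Condition(int id);
public record Entry(City city, List<Condition> weather);

public static class ClearSkyLookup
{
    // Returns the only city whose weather list contains the given id,
    // or null when no entry matches.
    // Throws InvalidOperationException when more than one entry matches,
    // so ambiguous data is surfaced instead of silently picking the first hit.
    public static string FindOnlyCity(IEnumerable<Entry> list, int conditionId) =>
        list.Where(l => l.weather.Any(w => w.id == conditionId))
            .Select(l => l.city.name)
            .SingleOrDefault();
}
```

With the Chicago/New York data, looking up id 799 returns "Chicago", while looking up id 800 throws because both entries match. Whether throwing is the right reaction is a design choice; logging and returning the first match may suit some applications better.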
Another suggestion
On another note, not strictly an answer to your question - if you think your code will work but aren't sure, write a unit test. In this case, you'd wrap the call to WeatherAPI in a class that implements an interface you define. Update your method to call the method on a reference to the interface.
For your real application, ensure that an instance of the wrapper/proxy class is set on the reference.
For the unit test, use a framework like Moq to create a mock implementation of the interface that returns a known set of data and use that instead. You can then define a suite of unit tests that use mocks that return different data structures and ensure your code works under all expected structures.
This will be a lot easier if your class is not a static method as well, and if you can use dependency injection (Ninject, Autofac or one of many others...) to manage injecting the appropriate implementation of the service.
Further explanations of unit testing, dependency injection and mocking will take more than I can write in this answer, but I recommend reading up on it - you'll never find yourself thinking "I think this code works" again!
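To make the wrapper idea concrete, here is a minimal sketch that uses a hand-written fake rather than Moq so that it stays self-contained. IWeatherClient, WeatherService, FakeWeatherClient and the record types are all hypothetical names, and the data shapes are assumptions based on the query in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical data shapes; the real WeatherObject may differ.
public record City(string name);
public record Condition(int id);
public record Entry(City city, List<Condition> weather);
public record WeatherObject(List<Entry> list);

// Hypothetical abstraction over the static WeatherAPI call.
public interface IWeatherClient
{
    Task<WeatherObject> GetWeatherAsync(string lat, string lon);
}

// Non-static service that receives its dependency through the constructor.
public sealed class WeatherService
{
    private readonly IWeatherClient _client;

    public WeatherService(IWeatherClient client) => _client = client;

    public async Task<string> UpdateWeather(string lat, string lon)
    {
        WeatherObject weather = await _client.GetWeatherAsync(lat, lon);
        return weather.list.Where(l => l.weather.Any(w => w.id == 800))
                           .Select(l => l.city.name)
                           .FirstOrDefault();
    }
}

// Test double that returns canned data instead of calling the network.
public sealed class FakeWeatherClient : IWeatherClient
{
    private readonly WeatherObject _canned;

    public FakeWeatherClient(WeatherObject canned) => _canned = canned;

    public Task<WeatherObject> GetWeatherAsync(string lat, string lon) =>
        Task.FromResult(_canned);
}
```

A test would construct WeatherService with a FakeWeatherClient holding a known WeatherObject and assert on the returned city name; the production code would pass in a class that forwards to the real WeatherAPI.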

mongodb c# select specific field dot notation

Following up on my previous question, "mongodb c# select specific field":
I'm writing a generic method for selecting a specific field.
The requirements are:
Field can be of any type
Return type is T
Field can be inside a sub field
Field can be inside array items; in that case it's OK to select the specific field of all the items in the array
In short, I'm looking for the "select" / dot notation capability.
For example:
the wanted method:
T GetFieldValue<T>(string id, string fieldName)
the document:
persons
{
    "id": "avi",
    "Freinds": [
        {
            "Name": "joni",
            "age": "33"
        },
        {
            "Name": "daniel",
            "age": "27"
        }
    ]
}
The goal is to call the method like this:
string[] myFriends = GetFieldValue<string[]>("avi", "Freinds.Name");
myFriends == ["joni","daniel"]
As far as I know, a projection expression with a lambda is no good for items in an array,
so I was thinking more of a dot notation approach.
Note:
I'm using the new C# driver (2.0).
Thanks a lot.
I don't see a good approach using dot notation in a string, because it has more issues with collections than a generic approach:
For example, take Person.Friends.Name.
Which element in this chain is the array?
You have to apply explicit conversions for collection elements (a likely source of bugs).
Generic methods are more reliable to use and maintain:
var friends = await GetFieldValue<Person, Friend[]>("avi", x => x.Friends);
var names = friends.Select(x=>x.Name).ToArray();

How to sort a list using my own logic (not alphabetically or numerically)

I've spent the last 30 mins looking through existing answers for what I think is a common question, but nothing quite hits it for me. Apologies if this is a dupe.
I've got a list of objects.
List<Journey> journeys;
The object has a 'status' property - this is a string.
class Journey
{
public string Name;
public string Status;
}
I want to sort based on this string, however not alphabetically. The status depicts the object's position through a journey, either "Enroute", "Finished", or "Error". When sorted ascending I want them to appear in the following order: "Error", "Enroute", "Finished". (In practice there are more statuses than this so renaming them to fall in alphabetical order isn't an option)
Aside from creating a class for 'status' with value and sort order properties, and then sorting based on that, how do I do this? Or is that the best method?
You can define your sorting logic inside a custom function that is passed as a Comparison delegate:
List<Journey> list = new List<Journey>();
list.Sort(new Comparison<Journey>((Journey source, Journey compare) =>
{
    // here is my custom compare logic
    return 0; // placeholder: return -1, 0 or 1 according to your logic
}));
Just another thought:
class Journey
{
public enum JourneyStatus
{
Enroute,
Finished,
Error
}
public string Name;
public JourneyStatus Status;
}
Used with OrderBy:
var journeys = new List<Journey>();
journeys.Add(new Journey() { Name = "Test1", Status = Journey.JourneyStatus.Enroute });
journeys.Add(new Journey() { Name = "Test2", Status = Journey.JourneyStatus.Error });
journeys.Add(new Journey() { Name = "Test3", Status = Journey.JourneyStatus.Finished });
journeys.Add(new Journey() { Name = "Test4", Status = Journey.JourneyStatus.Enroute });
journeys = journeys.OrderBy(x => x.Status).ToList();
foreach (var j in journeys)
Console.WriteLine("{0} : {1}", j.Name, j.Status);
Output:
Test1 : Enroute
Test4 : Enroute
Test3 : Finished
Test2 : Error
Or you might modify the lambda passed to OrderBy to map the value of Status string to an int.
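A minimal sketch of that last suggestion, keeping Status as a string and mapping it to an int inside the OrderBy lambda. The JourneyOrdering class, its rank table and the choice to sort unknown statuses last are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Journey
{
    public string Name;
    public string Status;
}

public static class JourneyOrdering
{
    // Desired ascending order from the question: Error, Enroute, Finished.
    private static readonly Dictionary<string, int> Rank = new Dictionary<string, int>
    {
        ["Error"] = 0,
        ["Enroute"] = 1,
        ["Finished"] = 2,
    };

    // Unknown or null statuses sort last; that fallback is an assumption of this sketch.
    public static int RankOf(string status) =>
        status != null && Rank.TryGetValue(status, out var r) ? r : int.MaxValue;

    // OrderBy is a stable sort, so journeys with the same status keep their input order.
    public static List<Journey> SortByStatus(IEnumerable<Journey> journeys) =>
        journeys.OrderBy(j => RankOf(j.Status)).ToList();
}
```

Adding a status then only means adding one entry to the rank table, without touching the Journey class.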
In some situations you might want to implement IComparer<T>, like Jon said. It can help keep the sorting logic and the class definition itself in one place.
You need to create a class that implements IComparer<Journey> and implement the Compare method accordingly.
You don't mention how you are sorting exactly, but pretty much all methods of the BCL that involve sorting have an overload that accepts an IComparer so that you can plug in your logic.
Aside from creating a class for 'status' with value and sort order properties, and then sorting based on that, how do I do this?
As this is a custom order, creating such a class is a good idea.
Derive from the Comparer<T> or StringComparer class, implement your custom logic in it, and pass an instance of that class to the sort method.
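A minimal sketch of the Comparer<T> route, assuming the three statuses from the question. JourneyStatusComparer is a hypothetical name, and placing unknown statuses last is an assumption of this sketch.

```csharp
using System;
using System.Collections.Generic;

public class Journey
{
    public string Name;
    public string Status;
}

public sealed class JourneyStatusComparer : Comparer<Journey>
{
    // Desired ascending order from the question.
    private static readonly string[] Order = { "Error", "Enroute", "Finished" };

    // Position in the Order array; statuses not listed there sort last (assumption).
    private static int RankOf(string status)
    {
        int index = Array.IndexOf(Order, status);
        return index < 0 ? Order.Length : index;
    }

    public override int Compare(Journey x, Journey y)
    {
        if (ReferenceEquals(x, y)) return 0;
        if (x is null) return -1; // null journeys sort first
        if (y is null) return 1;
        return RankOf(x.Status).CompareTo(RankOf(y.Status));
    }
}
```

It can be passed to journeys.Sort(new JourneyStatusComparer()) or to OrderBy(j => j, new JourneyStatusComparer()). Note that List<T>.Sort is not a stable sort, so items with equal status may change their relative order, whereas OrderBy preserves it.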
One way to do this is to assign an order number to each item. So create a database table to hold your items and their order:
Column Name, Column Order
Enroute, 0
Finished, 1
Error, 2
Then when I populate a drop down I can sort by the Order in my SQL select rather than the name.
Or if you don't want to use a database, you could change Status to include the order and then parse out the Status name. Status values might look like this: "0,Enroute", "1,Finished", "2,Error", so they sort naturally. Then use Split to separate the order from the name:
string[] statusArr = status.Split(',');
string order = statusArr[0];
string statusname = statusArr[1];
There's a lot of ways to skin this cat.
