MongoDB Lookup with sorting and grouping in C#

I have the following db config:
db={
"order": [
{
"id": 1,
"version": 1
},
{
"id": 1,
"version": 2
},
{
"id": 2,
"version": 1
},
{
"id": 2,
"version": 2
}
],
"orderDetail": [
{
"orderId": 1,
"orderDate": new Date("2020-01-18T16:00:00Z")
},
{
"orderId": 1,
"orderDate": new Date("2020-01-11T16:00:00Z")
},
{
"orderId": 1,
"orderDate": new Date("2020-01-12T16:00:00Z")
}
]
}
I'm using the fluent interface to perform a Lookup joining the orderDetail collection to the order collection (as shown in this post). Now that I have the join in place, what's the best way to:
1. Sort the joined array so that the details are ordered by orderDate
2. Group the orders (by order id) and sort by version to select the latest (largest version number)
The workaround I implemented for #1 sorts the list after performing the lookup, but only because I wasn't able to apply a sort to the "as" collection as part of the Lookup.
If anyone has any ideas, I'd appreciate it. Thanks!

If you are using MongoDB v3.6 or higher, you can use the pipeline form of $lookup (documented under "Join Conditions and Uncorrelated Sub-queries") and sort inside the inner pipeline to achieve what you want.
Since you didn't provide what collections or fields you are using, I will give a generic example:
db.customers.aggregate([
  {
    $lookup: {
      from: "orders",
      let: { customer_id: "$_id" },
      pipeline: [
        { $match: { $expr: { $eq: [ "$customer_id", "$$customer_id" ] } } },
        { $sort: { orderDate: -1 } }
      ],
      as: "orders"
    }
  }
]);
I hope that gives you a way to get where you want. =]
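Applied to the order and orderDetail collections from the question, the same idea might look like the sketch below. It assumes the join is order.id = orderDetail.orderId and that "latest" means the highest version; the names latest, details, order_id and latestOrdersWithSortedDetails are made up for illustration. The pipeline is plain JavaScript data, so it can be run in the shell as is or ported to the C# driver.

```javascript
// Sketch only: field names come from the question's sample data;
// "latest" and "details" are illustrative names, not from the original.
const latestOrdersWithSortedDetails = [
  // Put the highest version of each order id first.
  { $sort: { id: 1, version: -1 } },
  // Keep only the first (latest) document per order id.
  { $group: { _id: "$id", latest: { $first: "$$ROOT" } } },
  { $replaceRoot: { newRoot: "$latest" } },
  // Join the details and sort them inside the lookup's own pipeline.
  {
    $lookup: {
      from: "orderDetail",
      let: { order_id: "$id" },
      pipeline: [
        { $match: { $expr: { $eq: ["$orderId", "$$order_id"] } } },
        { $sort: { orderDate: 1 } }
      ],
      as: "details"
    }
  }
];

// In the mongo shell: db.order.aggregate(latestOrdersWithSortedDetails)
```

Grouping before the lookup means only the latest version of each order is joined, so superseded versions never pay for the join.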

Related

How can I create index on CosmosDb for a single element of key-value pair array

I use Azure Cosmos DB with the MongoDB API and have a collection of documents with the structure below.
I have to filter documents by a parameter, e.g.
x => x.Parameters.Any(xx => xx.Key == ParameterNames.ShiftId && (int)xx.Value == shiftId)
It seems to me that I need to create an index for better performance, but I cannot find any information on how to do it.
{
"_id": "08d8c696-2b7b-f227-d5dd-0647a8d51c1c",
"State": 2,
"Created": {
"$date": "2021-02-01T09:45:54.986Z"
},
"TailId": "e8fb236e-4d48-417b-bf5a-73f1d48fe239",
"Parameters": [{
"k": "ShiftId",
"v": 181
}, {
"k": "Id",
"v": "147814878155"
}, ....
]
}
For the Cosmos DB Mongo API, you could try adding a wildcard index on Parameters as described in the docs.
If you are in a position to change your model, you'll likely have better performance by refactoring the Parameters array into properties closer to the document root. For example:
{
"_id": "08d8c696-2b7b-f227-d5dd-0647a8d51c1c",
"State": 2,
...
"ShiftId": 181,
}
or
{
"_id": "08d8c696-2b7b-f227-d5dd-0647a8d51c1c",
"State": 2,
...
"Parameters": {
"ShiftId": 181,
"Id": "147814878155"
}
}
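If you go with the second shape, existing documents would need a one-off conversion. A minimal sketch of that conversion in plain JavaScript, assuming every entry in Parameters has the k/v shape shown in the question (flattenParameters is a hypothetical helper name):

```javascript
// Hypothetical helper: turns Parameters: [{ k, v }, ...] into
// Parameters: { [k]: v, ... } and leaves every other field untouched.
function flattenParameters(doc) {
  const { Parameters = [], ...rest } = doc;
  const byKey = Object.fromEntries(Parameters.map(({ k, v }) => [k, v]));
  return { ...rest, Parameters: byKey };
}

const before = {
  _id: "08d8c696-2b7b-f227-d5dd-0647a8d51c1c",
  State: 2,
  Parameters: [
    { k: "ShiftId", v: 181 },
    { k: "Id", v: "147814878155" }
  ]
};

const after = flattenParameters(before);
console.log(after.Parameters.ShiftId); // 181
```

With this shape the filter becomes an equality match on the Parameters.ShiftId path rather than a search inside an array of key-value pairs.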

How to make this CosmosDB SQL Query work without knowing ARRAY index?

I am querying Cosmos DB: I get a string in and need to return some data through a C# Web API. The query that works for me is below:
SELECT *
FROM c IN jongel.OriginalData.base.sales.variants
WHERE c.globalTradeItemNumber.globalTradeItemNumberType[0].GTIN = '1111111111111'
The problem is that I have to know the array index into globalTradeItemNumberType ([0] in this example) for it to work. It is not always 0; it could be any number from 0 to 9, and I cannot figure out how to rewrite the query so that it works regardless of the index where the matching data is found.
How can I rewrite this query so that I do not need to know the ARRAY INDEX beforehand?
--- EDIT ---
A sample document shortened to only include the needed parts
{
"id": "635af816-8db7-49c6-8284-ab85116b499b",
"brand": "XXX",
"IntegrationSource": "XXX",
"DocumentType": "Item",
"ItemInformationType": "",
"ItemLevel": "Article",
"ItemNo": "0562788040",
"UpdatedDate": "1/1/2020 4:00:01 AM",
"UpdatedDateUtc": "2020-01-01T04:00:01.82Z",
"UpdatedBy": "XXX",
"OriginalData": {
"corporateBrandId": "2",
"productId": "0562788",
"articleId": "0562788040",
"season": "201910",
"base": {
"sales": {
"SAPArticleNumber": "562788040190",
"simpleColour": {
"simpleColourId": "99",
"simpleColourDescription": "Green",
"translatedColourDescription": [
{
"languageCode": "sr",
"simpleColourDescription": "Zeleno"
},
{
"languageCode": "zh-Hans",
"simpleColourDescription": "绿色"
},
{
"languageCode": "vi-VN",
"simpleColourDescription": "Xanh la cay"
}
]
},
"variants": [
{
"variantId": "0562788040001",
"variantNumber": "562788040190001",
"variantDescription": "YYYYYYYYY, XXS",
"sizeScaleAndCode": "176-001",
"netWeight": 0.491,
"unitsOfMeasure": {
"unitsOfMeasureType": [
{
"alternativeUOM_ISO": "PCE",
"length": 320,
"width": 290,
"height": 31,
"unitOfDimension": "MM",
"volume": 2876.8,
"volumeUnit": "CCM",
"weightUnit": "KG"
}
]
},
"globalTradeItemNumber": {
"globalTradeItemNumberType": [
{
"GTIN": "1111111111111",
"GTINCategory": "Z3"
},
{
"GTIN": "2222222222222",
"GTINCategory": "Z3"
},
{
"GTIN": "3333333333333",
"GTINCategory": "IE"
}
]
}
}
]
}
}
}
}
I tried the following query based on the suggested answer below, but it did not work:
SELECT *
FROM c
WHERE ARRAY_CONTAINS(c.OriginalData.base.sales.variants.globalTradeItemNumber.globalTradeItemNumberType, {GTIN:"1111111111111"}, true)
I guess the above fails because the variants part of the tree is also an array?
NOTE: the variants array can hold several objects, so it's not always index [0].
You could try using the ARRAY_CONTAINS function.
SELECT *
FROM c IN jongel.OriginalData.base.sales.variants
WHERE ARRAY_CONTAINS(c.globalTradeItemNumber.globalTradeItemNumberType, {GTIN:"1111111111111"}, true)
This will allow the query to search all items in the array for a matching GTIN value.
https://learn.microsoft.com/en-us/azure/cosmos-db/sql-query-array-contains
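To see why the FROM c attempt in the question's edit returns nothing while the FROM c IN ...variants form works, here is a rough in-memory imitation in JavaScript. It is not Cosmos DB code: arrayContainsPartial is a hypothetical stand-in for ARRAY_CONTAINS(..., ..., true) that only compares top-level properties, and the document is a trimmed copy of the sample.

```javascript
// Rough stand-in for ARRAY_CONTAINS(arr, partial, true): true when some
// element has every property of `partial` with an equal value.
function arrayContainsPartial(arr, partial) {
  return Array.isArray(arr) && arr.some(item =>
    Object.entries(partial).every(([key, value]) => item[key] === value));
}

const doc = { OriginalData: { base: { sales: { variants: [
  { variantId: "0562788040001", globalTradeItemNumber: { globalTradeItemNumberType: [
    { GTIN: "1111111111111", GTINCategory: "Z3" },
    { GTIN: "2222222222222", GTINCategory: "Z3" }
  ] } }
] } } } };

const variants = doc.OriginalData.base.sales.variants;

// "FROM c": variants is an array, so reading .globalTradeItemNumber
// from it yields undefined and nothing can match.
const fromRoot = arrayContainsPartial(
  variants.globalTradeItemNumber?.globalTradeItemNumberType,
  { GTIN: "1111111111111" });

// "FROM c IN ...variants": each variant is checked on its own.
const perVariant = variants.filter(v => arrayContainsPartial(
  v.globalTradeItemNumber.globalTradeItemNumberType,
  { GTIN: "1111111111111" }));

console.log(fromRoot, perVariant.length); // false 1
```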

How can I push an array into aggregate pipeline and pull it one level up?

I am new to MongoDB and am developing software with C# and MongoDB. My data structure is like this:
{
"Id": 1,
"Title": "myTitle",
"Geners": [ "Drama", "Action" ],
"Category": 1,
"Casts": [
{
"Id": 1,
"Name": "myName",
"Gender": "Male",
"Age": 35
},
{
"Id": 2,
"Name": "herName",
"Gender": "Female",
"Age": 30
},
{
"Id": 3,
"Name": "hisName",
"Gender": "Male",
"Age": 45
}
]
}
This is just one document, and I have about 5 million documents. I want to run a query like the one below that counts the records per Category, showing how many movies I have in each category, and I want to include the Casts field in the result.
db.getCollection('myCollection').aggregate([
  {
    $group: { "_id": "$Category", "count": { $sum: 1 },
              "Casts": { $push: "$Casts" } }
  }
])
This is close to what I want, but the problem is that it puts the Casts data in a second level of array, like {"Id":1, ... , "Casts":[[{},{},...]]}, whereas I need {"Id":1, ... , "Casts":[{},{},...]}.
How can I get the data in that shape?
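If duplicates are acceptable and the count does not need to reflect the number of movies (after $unwind, $sum counts cast entries rather than documents), the following aggregation will suffice:
db.getCollection('myCollection').aggregate([
  { $unwind: "$Casts" },
  {
    $group: { "_id": "$Category", "count": { $sum: 1 },
              "Casts": { $push: "$Casts" } }
  }
])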
Update:
Since you need the count to be valid, there are a few more hoops to jump through.
db.getCollection('myCollection').aggregate([
  { $group: { "_id": "$Category", "count": { $sum: 1 }, "Casts": { $addToSet: "$Casts" } } },
  { $unwind: "$Casts" },
  { $unwind: "$Casts" },
  { $group: { "_id": "$_id", "count": { $first: "$count" }, "Casts": { $addToSet: "$Casts" } } }
])
Let me know if that helps
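An alternative that avoids the second round of $unwind and $group is to flatten the array of arrays in place with $reduce and $concatArrays. This is a sketch, assuming MongoDB 3.4 or later and that every document has a Casts array; groupAndFlattenCasts is just an illustrative variable name. Unlike the $addToSet version above, it keeps duplicate cast entries.

```javascript
// Sketch: group once (so count is the number of movies per category),
// then concatenate the pushed Casts arrays into a single flat array.
const groupAndFlattenCasts = [
  { $group: { _id: "$Category", count: { $sum: 1 }, Casts: { $push: "$Casts" } } },
  {
    $project: {
      count: 1,
      Casts: {
        $reduce: {
          input: "$Casts",
          initialValue: [],
          in: { $concatArrays: ["$$value", "$$this"] }
        }
      }
    }
  }
];

// In the shell: db.getCollection('myCollection').aggregate(groupAndFlattenCasts)
```

With around 5 million documents, a single category's Casts array may grow past the 16 MB document size limit, so it is worth checking whether the full cast list per category is really needed.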

DocumentDB query nested object

I'm having trouble querying nested objects in DocumentDB. I have no control over the format of the data. Let's say an object looks like this in DocumentDB:
{
"SCHEMA_ID": {
"PROJECT": "A",
"MODEL": "B",
"GUID":"A GUID"
},
"STATE": {
"Active": "True"
},
"OBJECTS": {
"OBJECT": [
{
"ATTR_VALS": {
"NAME": "Header",
"ID": "0",
"VALUE": [
{
"NAME": "JobId",
"VAL": "1011656"
},
{
"NAM": "Region",
"VAL": "West Coast"
}
]
}
},
{
"ATTR_VALS": {
"NAME": "SampleData",
"ID": "0",
"VALUE": [
{
"NAME": "Height",
"VAL": "5"
},
{
"NAM": "Length",
"VAL": "3"
}
]
}
}
]
}
}
I want to find all the objects that have an ATTR_VALS entry named 'SampleData' and where that entry has a 'Height' value of 5.
So far I have:
SELECT test.GUID
FROM test
join OBJECTS in test.OBJECTS
join OBJECT in OBJECTS
join ATTR_VALS in OBJECT
join VALUE in ATTR_VALS
WHERE ATTR_VALS.NAME = 'SampleData' AND VALUE.NAME='Height' AND VALUE.VAL='5'
But this doesn't work, and returns no results. Thanks!
The query must be:
SELECT test.SCHEMA_ID.GUID
FROM test
join OBJ in test.OBJECTS.OBJECT
join VAL in OBJ.ATTR_VALS["VALUE"]
WHERE OBJ.ATTR_VALS.NAME = "SampleData" AND VAL.NAME='Height' AND VAL.VAL='5'
A couple things I changed:
JOIN must be performed against arrays, not objects. Objects can be expanded using the “.” operator.
VALUE is a special keyword and must be escaped.
The projection clause had a small typo: it was missing SCHEMA_ID.
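The first point can be pictured with a plain JavaScript traversal of a trimmed copy of the sample document. This is only an analogy, not DocumentDB code: each flatMap plays the role of one JOIN over an array, while objects such as OBJECTS and ATTR_VALS are reached with ordinary property access.

```javascript
// Trimmed copy of the sample document from the question.
const test = {
  SCHEMA_ID: { PROJECT: "A", MODEL: "B", GUID: "A GUID" },
  OBJECTS: { OBJECT: [
    { ATTR_VALS: { NAME: "Header", ID: "0",
                   VALUE: [{ NAME: "JobId", VAL: "1011656" }] } },
    { ATTR_VALS: { NAME: "SampleData", ID: "0",
                   VALUE: [{ NAME: "Height", VAL: "5" }] } }
  ] }
};

const guids = test.OBJECTS.OBJECT            // JOIN OBJ IN test.OBJECTS.OBJECT
  .flatMap(obj => obj.ATTR_VALS.VALUE        // JOIN VAL IN OBJ.ATTR_VALS["VALUE"]
    .filter(val => obj.ATTR_VALS.NAME === "SampleData"
      && val.NAME === "Height" && val.VAL === "5")
    .map(() => test.SCHEMA_ID.GUID));        // SELECT test.SCHEMA_ID.GUID

console.log(guids); // [ 'A GUID' ]
```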

Elasticsearch - Rolling up "other" results

I'm trying to roll up some of my 'other' results using Elasticsearch. Ideally, I'd like my query to return the top N hits and then roll the rest of the data up into an N+1 hit titled "Other".
So for example, if I'm trying to aggregate "Institutions by Total Value", I'd get back 10 Institutions with the most value and then the total aggregated value of the other institutions as another record. The purpose is that I'd like to see the total value aggregated across all institutions but not have to list thousands.
An example search I've been using is:
GET my_index/institution/_search?pretty=true
{
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
... terms queries ...
]
}
}
}
},
"aggs": {
"dimension_type_name_agg": {
"terms": {
"field": "institution_name",
"order": {
"metric_sum_total_value_agg": "desc"
},
"size": 0
},
"aggs": {
"metric_sum_total_value_agg": {
"sum": {
"field": "total_value"
}
},
"metric_count_account_id_agg": {
"value_count": {
"field": "institution_id"
}
}
}
}
}
}
I'm curious whether this can be done by modifying a query like the one given above. Also, I'm using C# and NEST/Elasticsearch.NET, so any tips on how this translates to that side are appreciated as well.
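For reference, the roll-up described above can be sketched on the client side, assuming the terms aggregation returns every bucket already ordered by the sum (as the query above requests). rollUpOthers is a hypothetical helper and the bucket values are made-up sample numbers; this is one possible approach, not something Elasticsearch does for you.

```javascript
// Hypothetical helper: keeps the first `topN` buckets and folds the
// remainder into one synthetic bucket whose key is "Other".
function rollUpOthers(buckets, topN, metric = "metric_sum_total_value_agg") {
  const top = buckets.slice(0, topN);
  const rest = buckets.slice(topN);
  if (rest.length === 0) return top;
  const other = {
    key: "Other",
    doc_count: rest.reduce((n, b) => n + b.doc_count, 0),
    [metric]: { value: rest.reduce((sum, b) => sum + b[metric].value, 0) }
  };
  return [...top, other];
}

// Made-up buckets in the shape a terms aggregation with a sum sub-aggregation returns.
const buckets = [
  { key: "A", doc_count: 3, metric_sum_total_value_agg: { value: 300 } },
  { key: "B", doc_count: 2, metric_sum_total_value_agg: { value: 200 } },
  { key: "C", doc_count: 1, metric_sum_total_value_agg: { value: 50 } },
  { key: "D", doc_count: 1, metric_sum_total_value_agg: { value: 25 } }
];

const rolled = rollUpOthers(buckets, 2);
console.log(rolled.map(b => [b.key, b.metric_sum_total_value_agg.value]));
// [ [ 'A', 300 ], [ 'B', 200 ], [ 'Other', 75 ] ]
```

The same slicing and summing carries over directly to C# once the buckets have been read from the response.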
