I'm having a hard time understanding how I should write some MongoDB queries.
Maybe my mind is too accustomed to relational databases.
Anyway, I want to retrieve all documents (whole documents, not a subset of elements) but only one per distinct value of an element.
For example, I have the following 3 documents in a collection:
{
"person": {
"name": "james",
"age": "21",
"city": "London"
}
},
{
"person": {
"name": "edith",
"age": "18",
"city": "London"
}
},
{
"person": {
"name": "steve",
"age": "29",
"city": "Berlin"
}
}
I want to retrieve whole documents, but only one for each distinct "city" value. The rest of the data should be there (which is why I can't just $group them); it just doesn't matter which document in each subset gets returned.
So the desired output, assuming we always take the first document for each distinct value, should be the following (the first document could just as well be edith; it doesn't matter):
{
"person": {
"name": "james",
"age": "21",
"city": "London"
}
},
{
"person": {
"name": "steve",
"age": "29",
"city": "Berlin"
}
}
Did that make sense?
(Dummy data, but the problem is a real one)
I believe what you are looking for is:
GetCollection<Person>("person").DistinctAsync("person.city",filter)
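Note that a distinct call returns the distinct field values rather than whole documents. If whole documents are needed, one common approach is a $group stage that keeps the first document per city via $$ROOT, followed by $replaceRoot. The sketch below is only an illustration: the pipeline is written as plain Python dictionaries (the shape an aggregate call would accept, not run against a server here), and first_per_city is a hypothetical pure-Python helper that mimics the same logic on the sample data.

```python
# Pipeline shape (assumption: passed to an aggregate() call on the collection).
# Group by city, keep the first whole document seen, then unwrap it again.
pipeline = [
    {"$group": {"_id": "$person.city", "doc": {"$first": "$$ROOT"}}},
    {"$replaceRoot": {"newRoot": "$doc"}},
]


def first_per_city(docs):
    """Pure-Python equivalent: keep the first document for each city."""
    seen = {}
    for doc in docs:
        city = doc["person"]["city"]
        # setdefault stores the document only if the city is not present yet
        seen.setdefault(city, doc)
    return list(seen.values())


docs = [
    {"person": {"name": "james", "age": "21", "city": "London"}},
    {"person": {"name": "edith", "age": "18", "city": "London"}},
    {"person": {"name": "steve", "age": "29", "city": "Berlin"}},
]
result = first_per_city(docs)
```

Without a preceding $sort stage, which document counts as "first" in each group is not guaranteed on the server, which matches the requirement that any document per city is acceptable.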
Related
I use Azure Cosmos DB with the MongoDB API and have a collection of documents with the structure below.
I have to filter documents by a parameter, e.g.:
x => x.Parameters.Any(xx => xx.Key == ParameterNames.ShiftId && (int)xx.Value == shiftId)
It seems that I need to create an index for better performance, but I cannot find any information on how to do it.
{
"_id": "08d8c696-2b7b-f227-d5dd-0647a8d51c1c",
"State": 2,
"Created": {
"$date": "2021-02-01T09:45:54.986Z"
},
"TailId": "e8fb236e-4d48-417b-bf5a-73f1d48fe239",
"Parameters": [{
"k": "ShiftId",
"v": 181
}, {
"k": "Id",
"v": "147814878155"
}, ....
]
}
For the Cosmos DB Mongo API, you could try adding a wildcard index on Parameters as described in the docs.
If you are in a position to change your model, you'll likely have better performance by refactoring the Parameters array into properties closer to the document root. For example:
{
"_id": "08d8c696-2b7b-f227-d5dd-0647a8d51c1c",
"State": 2,
...
"ShiftId": 181
}
or
{
"_id": "08d8c696-2b7b-f227-d5dd-0647a8d51c1c",
"State": 2,
...
"Parameters": {
"ShiftId": 181,
"Id": "147814878155"
}
}
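As a rough illustration of the second refactoring, the k/v array can be reshaped into a nested object before the document is stored. This is a minimal sketch in plain Python; parameters_to_object is a hypothetical helper name, and the sample document is trimmed to the fields shown above.

```python
def parameters_to_object(doc):
    """Return a copy of doc whose Parameters k/v array becomes a nested object."""
    converted = dict(doc)  # shallow copy so the original document is untouched
    converted["Parameters"] = {p["k"]: p["v"] for p in doc.get("Parameters", [])}
    return converted


original = {
    "_id": "08d8c696-2b7b-f227-d5dd-0647a8d51c1c",
    "State": 2,
    "Parameters": [
        {"k": "ShiftId", "v": 181},
        {"k": "Id", "v": "147814878155"},
    ],
}
reshaped = parameters_to_object(original)
```

With this shape, a filter can address the value directly by its path (Parameters.ShiftId) instead of searching array elements.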
I am querying Cosmos DB: I get a string in and need to return some data through a C# Web API. The query that works for me is below:
SELECT *
FROM c IN jongel.OriginalData.base.sales.variants
WHERE c.globalTradeItemNumber.globalTradeItemNumberType[0].GTIN = '1111111111111'
The problem is that I have to know the array index into globalTradeItemNumberType ([0] in this example) for it to work, but it is not always 0; it could be any number from 0 to 9, and I cannot figure out how to rewrite the query so that it works regardless of the index at which the matching data is found.
How can I rewrite this query so that I do not need to know the ARRAY INDEX beforehand?
--- EDIT ---
A sample document, shortened to include only the needed parts:
{
"id": "635af816-8db7-49c6-8284-ab85116b499b",
"brand": "XXX",
"IntegrationSource": "XXX",
"DocumentType": "Item",
"ItemInformationType": "",
"ItemLevel": "Article",
"ItemNo": "0562788040",
"UpdatedDate": "1/1/2020 4:00:01 AM",
"UpdatedDateUtc": "2020-01-01T04:00:01.82Z",
"UpdatedBy": "XXX",
"OriginalData": {
"corporateBrandId": "2",
"productId": "0562788",
"articleId": "0562788040",
"season": "201910",
"base": {
"sales": {
"SAPArticleNumber": "562788040190",
"simpleColour": {
"simpleColourId": "99",
"simpleColourDescription": "Green",
"translatedColourDescription": [
{
"languageCode": "sr",
"simpleColourDescription": "Zeleno"
},
{
"languageCode": "zh-Hans",
"simpleColourDescription": "绿色"
},
{
"languageCode": "vi-VN",
"simpleColourDescription": "Xanh la cay"
}
]
},
"variants": [
{
"variantId": "0562788040001",
"variantNumber": "562788040190001",
"variantDescription": "YYYYYYYYY, XXS",
"sizeScaleAndCode": "176-001",
"netWeight": 0.491,
"unitsOfMeasure": {
"unitsOfMeasureType": [
{
"alternativeUOM_ISO": "PCE",
"length": 320,
"width": 290,
"height": 31,
"unitOfDimension": "MM",
"volume": 2876.8,
"volumeUnit": "CCM",
"weightUnit": "KG"
}
]
},
"globalTradeItemNumber": {
"globalTradeItemNumberType": [
{
"GTIN": "1111111111111",
"GTINCategory": "Z3"
},
{
"GTIN": "2222222222222",
"GTINCategory": "Z3"
},
{
"GTIN": "3333333333333",
"GTINCategory": "IE"
}
]
}
}
]
}
}
}
}
I tried the following query, based on the suggested answer below, but it did not work:
SELECT *
FROM c
WHERE ARRAY_CONTAINS(c.OriginalData.base.sales.variants.globalTradeItemNumber.globalTradeItemNumberType, {GTIN:"1111111111111"}, true)
I guess the above fails because the variants part of the tree is also an array?
NOTE: the variants array can hold several objects, so it's not always index [0].
You could try using the ARRAY_CONTAINS function.
SELECT *
FROM c IN jongel.OriginalData.base.sales.variants
WHERE ARRAY_CONTAINS(c.globalTradeItemNumber.globalTradeItemNumberType, {GTIN:"1111111111111"}, true)
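This lets the query search every element of the array for a matching GTIN value. The third argument (true) enables partial matching, so an element matches if it contains the given GTIN regardless of its other properties.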
https://learn.microsoft.com/en-us/azure/cosmos-db/sql-query-array-contains
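To make the semantics concrete, here is a small pure-Python sketch of what the query computes: it iterates over every variant (the FROM c IN part) and keeps those whose GTIN array contains a partially matching element (the ARRAY_CONTAINS part). It is not Cosmos DB SDK code; variants_with_gtin is a hypothetical helper, and the second variant in the sample data is invented for illustration.

```python
def variants_with_gtin(document, gtin):
    """Return every variant that has the given GTIN at any array index."""
    variants = document["OriginalData"]["base"]["sales"]["variants"]
    matches = []
    for variant in variants:
        entries = variant.get("globalTradeItemNumber", {}).get(
            "globalTradeItemNumberType", []
        )
        # Partial match: only the GTIN property is compared, others are ignored
        if any(entry.get("GTIN") == gtin for entry in entries):
            matches.append(variant)
    return matches


document = {
    "OriginalData": {"base": {"sales": {"variants": [
        {
            "variantId": "0562788040001",
            "globalTradeItemNumber": {"globalTradeItemNumberType": [
                {"GTIN": "1111111111111", "GTINCategory": "Z3"},
                {"GTIN": "2222222222222", "GTINCategory": "Z3"},
                {"GTIN": "3333333333333", "GTINCategory": "IE"},
            ]},
        },
        {
            # Hypothetical second variant, not part of the original sample
            "variantId": "hypothetical-variant",
            "globalTradeItemNumber": {"globalTradeItemNumberType": [
                {"GTIN": "4444444444444", "GTINCategory": "Z3"},
            ]},
        },
    ]}}}
}
found = variants_with_gtin(document, "3333333333333")
```

The match succeeds although the GTIN sits at index 2, which is the behaviour the indexed [0] lookup could not provide.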
I am new to MongoDB and I am developing software with C# and MongoDB. My data structure looks like this:
{
"Id": 1,
"Title": "myTitle",
"Geners": [ "Drama", "Action" ],
"Category": 1,
"Casts": [
{
"Id": 1,
"Name": "myName",
"Gender": "Male",
"Age": 35
},
{
"Id": 2,
"Name": "herName",
"Gender": "Female",
"Age": 30
},
{
"Id": 3,
"Name": "hisName",
"Gender": "Male",
"Age": 45
}
]
}
This is just one document; I have about 5 million. I want to run a query like the one below that counts the records per Category, showing how many movies I have in each category, and I want to include the Casts field in the result.
db.getCollection('myCollection').aggregate([
{
$group:{"_id":"$Category", "count": {$sum:1},
"Casts":{$push:"$Casts"}}
}
])
This is close to what I want, but the problem is that it puts the Casts data in a second level of array, like {"Id":1, ... , "Casts":[[{},{},...]]}, whereas I need it like this: {"Id":1, ... , "Casts":[{},{},...]}
How can I show the data like that?
If duplicates are acceptable, then the following aggregation will suffice:
db.getCollection('myCollection').aggregate([
{ $unwind:"$Casts"},
{
$group:{"_id":"$Category", "count": {$sum:1},
"Casts":{$push:"$Casts"}}
}
])
Update:
Since you need the count to be valid (unwinding first makes count tally cast members rather than movies), there are a few more hoops to jump through.
db.getCollection('myCollection').aggregate([
{ $group:{"_id":"$Category", "count": {$sum:1}, "Casts":{$addToSet:"$Casts"}}},
{$unwind:"$Casts"},
{$unwind:"$Casts"},
{ $group:{"_id":"$_id", "count": {$first:"$count"}, "Casts":{$addToSet:"$Casts"}}},
])
Let me know if that helps
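For reference, the intended result of the updated pipeline can be mimicked in plain Python: one entry per category, a count of movies, and a single flat list of cast members without duplicates. This is only a sketch of the logic, not driver code; group_by_category and the sample movies are hypothetical.

```python
def group_by_category(movies):
    """Count movies per Category and collect their cast members in one flat list."""
    groups = {}
    for movie in movies:
        category = movie["Category"]
        group = groups.setdefault(
            category, {"_id": category, "count": 0, "Casts": []}
        )
        group["count"] += 1  # counts movies, not cast members
        for cast in movie["Casts"]:
            if cast not in group["Casts"]:  # set-like behaviour, as with $addToSet
                group["Casts"].append(cast)
    return list(groups.values())


movies = [
    {"Id": 1, "Category": 1, "Casts": [
        {"Id": 1, "Name": "myName"}, {"Id": 2, "Name": "herName"}]},
    {"Id": 2, "Category": 1, "Casts": [
        {"Id": 2, "Name": "herName"}, {"Id": 3, "Name": "hisName"}]},
    {"Id": 3, "Category": 2, "Casts": [{"Id": 1, "Name": "myName"}]},
]
summary = group_by_category(movies)
```

Unlike this sketch, $addToSet on the server does not guarantee the order of the collected elements.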
I have JSON data with nested arrays (see example below). What I am trying to accomplish is to deserialize this data into a DataSet where each nested array gets inserted into a corresponding datatable.
Example:
[
{
"Id": "1",
"LastName": "John",
"FirstName": "Doe",
"MiddleInitial": "I",
"DateOfBirth": "2000-10-05",
"Gender": "M",
"LastModifiedDate": "2017-03-13 14:36:53",
"Classes": [
{
"ClassNumber": "21",
"TeacherID": "15"
},
{
"ClassNumber": "12",
"TeacherID": "10"
}
]
},
{
"Id": "2",
"LastName": "Jane",
"FirstName": "Doe",
"MiddleInitial": "K",
"DateOfBirth": "2000-10-05",
"Gender": "F",
"LastModifiedDate": "2017-03-13 14:36:53",
"Classes": [
{
"ClassNumber": "11",
"TeacherID": "8"
},
{
"ClassNumber": "4",
"TeacherID": "26"
}
]
}]
So the dataset would contain 2 datatables. One with all of the records from the main array and the second with all of the records from the "Classes" array.
You kind of have to create the DataSet and its DataTables manually and fill them manually. There's no automatic way of doing it, if that's what you were hoping for. If you need to maintain the relationships between the objects you have to add a foreign key to the Class anyway, otherwise there's no way to know which person a Class belongs to.
The table columns could be generated from the JSON properties, of course, if you're careful enough with how you write the code or if the JSON structure is always the same without exception.
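The manual fill could look roughly like the following. This sketch uses plain Python lists of rows rather than an actual DataSet, just to show the shape of the two tables and the foreign key; split_into_tables and the StudentId column name are assumptions, not part of the original data.

```python
import json


def split_into_tables(json_text):
    """Split the nested JSON into a parent table and a child table of rows."""
    students, classes = [], []
    for record in json.loads(json_text):
        # Parent row: every property except the nested array
        students.append({k: v for k, v in record.items() if k != "Classes"})
        for cls in record.get("Classes", []):
            # Child row: carries the parent's Id as a foreign key
            classes.append({"StudentId": record["Id"], **cls})
    return students, classes


json_text = """
[
  {"Id": "1", "LastName": "John", "FirstName": "Doe",
   "Classes": [{"ClassNumber": "21", "TeacherID": "15"},
               {"ClassNumber": "12", "TeacherID": "10"}]},
  {"Id": "2", "LastName": "Jane", "FirstName": "Doe",
   "Classes": [{"ClassNumber": "11", "TeacherID": "8"},
               {"ClassNumber": "4", "TeacherID": "26"}]}
]
"""
students, classes = split_into_tables(json_text)
```

The column names of each table fall out of the JSON property names, which only works cleanly if every record has the same structure.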
I'm having trouble querying nested objects in DocumentDB. I have no control over the format of the data. Let's say an object looks like this in DocumentDB:
{
"SCHEMA_ID": {
"PROJECT": "A",
"MODEL": "B",
"GUID":"A GUID"
},
"STATE": {
"Active": "True"
},
"OBJECTS": {
"OBJECT": [
{
"ATTR_VALS": {
"NAME": "Header",
"ID": "0",
"VALUE": [
{
"NAME": "JobId",
"VAL": "1011656"
},
{
"NAM": "Region",
"VAL": "West Coast"
}
]
}
},
{
"ATTR_VALS": {
"NAME": "SampleData",
"ID": "0",
"VALUE": [
{
"NAME": "Height",
"VAL": "5"
},
{
"NAM": "Length",
"VAL": "3"
}
]
}
}
]
}
}
I want to find all the objects that have ATTR_VALS.NAME = 'SampleData' and where those items have 'Height' = 5.
So far I have:
SELECT test.GUID
FROM test
join OBJECTS in test.OBJECTS
join OBJECT in OBJECTS
join ATTR_VALS in OBJECT
join VALUE in ATTR_VALS
WHERE ATTR_VALS.NAME = 'SampleData' AND VALUE.NAME='Height' AND VALUE.VAL='5'
But this doesn't work, and returns no results. Thanks!
The query must be:
SELECT test.SCHEMA_ID.GUID
FROM test
join OBJ in test.OBJECTS.OBJECT
join VAL in OBJ.ATTR_VALS["VALUE"]
WHERE OBJ.ATTR_VALS.NAME = "SampleData" AND VAL.NAME='Height' AND VAL.VAL='5'
A couple of things I changed:
JOIN must be performed against arrays, not objects. Objects can be traversed using the "." operator.
VALUE is a reserved keyword and must be escaped, hence ATTR_VALS["VALUE"].
There was a small typo in the projection clause: SCHEMA_ID was missing.
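The corrected query behaves like two nested loops, one per JOIN, with the WHERE clause applied inside. The sketch below mirrors that in plain Python on a trimmed copy of the sample document; matching_guids is a hypothetical helper and this is not DocumentDB SDK code.

```python
def matching_guids(documents, attr_name, value_name, value):
    """Mimic: JOIN over OBJECTS.OBJECT, then JOIN over ATTR_VALS["VALUE"]."""
    guids = []
    for doc in documents:
        for obj in doc["OBJECTS"]["OBJECT"]:          # first JOIN (an array)
            attr_vals = obj["ATTR_VALS"]              # plain object, reached with "."
            if attr_vals["NAME"] != attr_name:
                continue
            for val in attr_vals["VALUE"]:            # second JOIN (an array)
                # .get is used because some entries use the key "NAM"
                if val.get("NAME") == value_name and val.get("VAL") == value:
                    guids.append(doc["SCHEMA_ID"]["GUID"])
    return guids


documents = [{
    "SCHEMA_ID": {"PROJECT": "A", "MODEL": "B", "GUID": "A GUID"},
    "OBJECTS": {"OBJECT": [
        {"ATTR_VALS": {"NAME": "Header", "ID": "0", "VALUE": [
            {"NAME": "JobId", "VAL": "1011656"},
            {"NAM": "Region", "VAL": "West Coast"}]}},
        {"ATTR_VALS": {"NAME": "SampleData", "ID": "0", "VALUE": [
            {"NAME": "Height", "VAL": "5"},
            {"NAM": "Length", "VAL": "3"}]}},
    ]},
}]
guids = matching_guids(documents, "SampleData", "Height", "5")
```

As with the JOIN, one result is produced per matching combination, so a document with several matching VALUE entries would appear more than once.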