Azure Cosmos DB Add Composite Index for Array of String - c#

I am trying to add a new composite index to support searching on multiple fields.
What should I consider when adding a new composite index, and will it work for an array of strings?
Sample Cosmos Document
{
"id": "ed78b9b5-764b-4ebc-a4f2-6b764679",
"OrderReference": "X000011380",
"SettlementReferences": [
"000066474884"
],
"TransactionReference": "ed78b9b5-764b-4ebc-6b7644f06679",
"TransactionType": "Debit",
"Amount": 73.65,
"Currency": "USD",
"BrandCode": "TestBrand",
"PartitionKey": "Test-21052020-255",
"SettlementDateTime": "2020-05-21T04:35:35.133Z",
"ReasonCode": "TestReason",
"IsProcessed": true
}
My existing index policy
{
"indexingMode": "consistent",
"automatic": true,
"includedPaths": [
{
"path": "/PartitionKey/?"
},
{
"path": "/BrandCode/?"
}
],
"excludedPaths": [
{
"path": "/*"
},
{
"path": "/\"_etag\"/?"
}
],
"compositeIndexes": [
[
{
"path": "/PartitionKey",
"order": "ascending"
},
{
"path": "/IsProcessed",
"order": "ascending"
}
]
]
}
The query filters on the string array SettlementReferences together with IsProcessed and ReasonCode:
SELECT * FROM c WHERE ARRAY_CONTAINS(c.SettlementReferences, '00884') and c.IsProcessed = true and c.ReasonCode = 'TestReason'
I am planning to add the following policy
{
"indexingMode": "consistent",
"automatic": true,
"includedPaths": [
{
"path": "/PartitionKey/?"
},
{
"path": "/BrandCode/?"
}
],
"excludedPaths": [
{
"path": "/*"
},
{
"path": "/\"_etag\"/?"
}
],
"compositeIndexes": [
[
{
"path": "/PartitionKey",
"order": "ascending"
},
{
"path": "/IsProcessed",
"order": "ascending"
}
],
[
{
"path": "/SettlementReferences",
"order": "ascending"
},
{
"path": "/IsProcessed",
"order": "ascending"
},
{
"path": "/ReasonCode",
"order": "ascending"
}
]
]
}
Is this change sufficient?
I also compared the RU charge before and after the change and don't see any significant difference: both come to around 133.56 RUs.
Is there anything else I need to consider to optimize performance?

Composite indexes will not help with this query and, in general, have no impact on equality filters. They are useful when your queries use ORDER BY. This is why you don't see any reduction in the RU charge for your query. However, you will notice a higher RU charge on writes.
If you want to improve query performance, add every property used in your WHERE clauses to "includedPaths" in your index policy.
Another thing to point out is that it is generally a best practice to index everything by default and selectively add properties to excludedPaths. This way, if your schema changes, new properties are indexed automatically without your having to rebuild the index.

As Mark mentioned, we need to add an included path for the array: "/SettlementReferences/[]/?". After adding it, the RU charge dropped from 115 to 5 RUs.
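The change described above can be sketched in Python by treating the indexing policy as a plain dictionary. This is only an illustration: the helper name with_included_paths is hypothetical, no Cosmos DB SDK call is made, and the list of paths is assumed from the properties used in the question's WHERE clause.

```python
import copy
import json

# Existing policy from the question, reduced to the parts that matter here.
existing_policy = {
    "indexingMode": "consistent",
    "automatic": True,
    "includedPaths": [{"path": "/PartitionKey/?"}, {"path": "/BrandCode/?"}],
    "excludedPaths": [{"path": "/*"}, {"path": "/\"_etag\"/?"}],
}


def with_included_paths(policy, paths):
    """Return a copy of policy with each path added to includedPaths once (hypothetical helper)."""
    updated = copy.deepcopy(policy)
    present = {entry["path"] for entry in updated["includedPaths"]}
    for path in paths:
        if path not in present:
            updated["includedPaths"].append({"path": path})
            present.add(path)
    return updated


# Array elements are addressed with the [] wildcard; scalar properties end in /?.
new_policy = with_included_paths(
    existing_policy,
    ["/SettlementReferences/[]/?", "/IsProcessed/?", "/ReasonCode/?"],
)

print(json.dumps(new_policy, indent=2))
```

The printed JSON could then be pasted into the container's indexing policy; the original policy object is left unchanged so the two can be compared.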

Related

Get nodes and relationships at n-hops away from a root node in a Neo4j graph

I want to query a Neo4j graph for nodes and their relationships at a given number of hops from a root node. I can get the nodes using apoc.neighbors.byhop, but I am not sure how to get the relationships between the nodes.
Specifically, in the following graph, I'm interested to know that A is connected to C through B1 or B2. The output of apoc.neighbors.byhop does not seem to contain this information.
merge (p1:Person {label:"A"})
merge (p2:Person {label:"B1"})
merge (p3:Person {label:"B2"})
merge (p4:Person {label:"C"})
merge (p1)-[:Knows]->(p2)
merge (p1)-[:Knows]->(p3)
merge (p2)-[:Knows]->(p4)
To retrieve nodes at an n-hop distance:
match (p:Person {label:"A"})
call apoc.neighbors.byhop(p, "Knows", 3)
yield nodes
return nodes
This returns an object like the following, which does not include relationship information.
[
[
{
"identity":11,
"labels":[
"Person"
],
"properties":{
"label":"B1"
}
},
{
"identity":12,
"labels":[
"Person"
],
"properties":{
"label":"B2"
}
}
],
[
{
"identity":0,
"labels":[
"Person"
],
"properties":{
"label":"C"
}
}
]
]
I'm interfacing with Neo4j through its .NET driver.
That APOC function (apoc.neighbors.byhop) returns only nodes, NOT relationships. I tried to replicate what it does in plain Cypher and also return the relationships along the path.
MATCH path=(p:Person {label:"A"})-[:Knows*1..]->(p2:Person)
WITH [n in nodes(path) where n <> p | n] as nodes, relationships(path) as relationships
WITH size(nodes) as cnt, collect(nodes[-1]) as nodes, collect(distinct relationships[-1]) as relationships
RETURN nodes, relationships
Below is the result:
[
{
"nodes": [
{
"identity": 2,
"labels": [
"Person"
],
"properties": {
"label": "B1"
}
},
{
"identity": 3,
"labels": [
"Person"
],
"properties": {
"label": "B2"
}
}
],
"relationships": [
{
"identity": 0,
"start": 1,
"end": 2,
"type": "Knows",
"properties": {
}
},
{
"identity": 1,
"start": 1,
"end": 3,
"type": "Knows",
"properties": {
}
}
]
},
{
"nodes": [
{
"identity": 4,
"labels": [
"Person"
],
"properties": {
"label": "C"
}
}
],
"relationships": [
{
"identity": 2,
"start": 2,
"end": 4,
"type": "Knows",
"properties": {
}
}
]
}
]
Something like this in pure Cypher will show you all relationships between nodes. You can tweak the node properties and relationship names as you'd like.
MATCH path = (p:Person)-[*1..]->(p2:Person) RETURN p, relationships(path), p2
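To make the "nodes and relationships per hop" idea concrete, here is a minimal in-memory sketch in Python. It does not use Neo4j or APOC; it runs a breadth-first walk over the sample edges from the question, and the function name neighbors_by_hop is a hypothetical stand-in.

```python
from collections import defaultdict

# Directed "Knows" edges from the question: A->B1, A->B2, B1->C.
edges = [("A", "B1"), ("A", "B2"), ("B1", "C")]


def neighbors_by_hop(edges, root, max_hops):
    """Breadth-first walk returning, per hop, the new nodes and the edges that reached them."""
    outgoing = defaultdict(list)
    for start, end in edges:
        outgoing[start].append(end)

    visited = {root}
    frontier = [root]
    hops = []
    for _ in range(max_hops):
        nodes, rels = [], []
        for start in frontier:
            for end in outgoing[start]:
                if end in visited:
                    continue  # already reached at an earlier hop
                rels.append((start, end))
                if end not in nodes:
                    nodes.append(end)
        if not nodes:
            break
        visited.update(nodes)
        hops.append({"nodes": nodes, "relationships": rels})
        frontier = nodes
    return hops


result = neighbors_by_hop(edges, "A", 3)
for hop in result:
    print(hop)
```

If a node is reached by several edges within the same hop, every such edge is kept in relationships while the node is listed once, which is the information the question asks for.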

Solr .Net Facet Term for obtaining numBuckets

I am trying to implement the following terms facet. It works when I query through Postman, but I cannot find an equivalent in Solr.NET, which has no terms facet query; I can only see facets by field, range, and pivot. I need to get numBuckets and do not see how this can be achieved with the Solr.NET library.
Postman body example.
{
"query": "*:*",
"fields": [
"user_id"
],
"filter": [
""
],
"facet": {
"user_id": {
"mincount": 1,
"numBuckets": true,
"allBuckets": false,
"offset": 0,
"type": "terms",
"field": "user_id",
"limit": 1
}
},
"params": {
"echoParams": "none"
},
"limit": 100,
"offset": 0,
"sort": "ismail_bius asc,createdon_zius desc"
}
I have tried passing it in the ExtraParams property of the query options, but without success.
Thanks.

Extract Items in Nested Json Array

How can I extract items from a nested JSON array using Newtonsoft.Json? I have JSON like this:
{
"Bounds": {
"TextLength": 1379
},
"DocumentTypeName": "Invoice",
"DocumentTypeField": {
"Value": "Invoice",
"Confidence": 1
},
"Fields": [
{
"FieldId": "RPA.DocumentUnderstanding.Invoice.LineItems",
"FieldName": "Line Items",
"Values": [
{
"Components": [
{
"FieldId": "RPA.DocumentUnderstanding.Invoice.LineItems.Body",
"FieldName": "Body",
"Values": [
{
"Components": [
{
"FieldId": "RPA.DocumentUnderstanding.Invoice.LineItems.Item",
"FieldName": "Item",
"Values": [
{
"Components": [],
"Value": "Film 4C for the publication \"Racing World\" Visual: PNSP 02 05 Ref. 2004/021 Graphic designer honoraries 560010",
"Confidence": 0.962736368
}
]
},
{
"FieldId": "RPA.DocumentUnderstanding.Invoice.LineItems.UnitPrice",
"FieldName": "Unit Price",
"Values": [
{
"Components": [],
"Value": "400.00",
"Confidence": 0.9779528
}
]
}
],
"Confidence": 0.9432406
}
]
}
],
"Confidence": 0.920952857
}
]
}
]
}
and I want to extract the red highlighted fields from it.
Any help will be much appreciated.
Thanks
Since avoiding deserialization is not a requirement, you can simply deserialize. Create a C# class that has the same structure as your JSON and then call
var yourObject = JsonConvert.DeserializeObject<YourCSharpClassHere>(yourJsonString);
Then it's just a simple matter of getting the values, starting from the root Fields array:
var fieldName = yourObject.Fields[0].Values[0].Components[0].Values[0].Components[0].FieldName;
You can also use a JSONPath query with SelectTokens.
Example
var o = JObject.Parse(yourJsonString);
var fieldNames = o.SelectTokens("Fields[*].Values[*].Components[*].Values[*].Components[*].FieldName");
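The traversal behind both answers can also be sketched independently of Newtonsoft.Json. The Python below is only an illustration of the walk, not the C# API: it uses a trimmed copy of the question's JSON, and because the highlighted fields are not visible in the text, it assumes the leaf FieldName/Value pairs are what is wanted. The function name leaf_values is hypothetical.

```python
import json

# Trimmed version of the JSON from the question (same nesting, shorter values).
document = json.loads("""
{
  "Fields": [{
    "FieldName": "Line Items",
    "Values": [{
      "Components": [{
        "FieldName": "Body",
        "Values": [{
          "Components": [
            {"FieldName": "Item",
             "Values": [{"Components": [], "Value": "Film 4C", "Confidence": 0.96}]},
            {"FieldName": "Unit Price",
             "Values": [{"Components": [], "Value": "400.00", "Confidence": 0.98}]}
          ]
        }]
      }]
    }]
  }]
}
""")


def leaf_values(fields):
    """Yield (FieldName, Value) for every value that has no nested Components."""
    for field in fields:
        for value in field.get("Values", []):
            components = value.get("Components", [])
            if components:
                # Descend: nested components are themselves a list of fields.
                yield from leaf_values(components)
            elif "Value" in value:
                yield field["FieldName"], value["Value"]


print(list(leaf_values(document["Fields"])))
```

A recursive walk like this does not depend on the nesting depth, whereas a fixed path has to name every level explicitly.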

How to make this CosmosDB SQL Query work without knowing ARRAY index?

I am querying Cosmos DB from a C# Web API: a string comes in and I need to return the matching data. The query that works for me is below.
SELECT *
FROM c IN jongel.OriginalData.base.sales.variants
WHERE c.globalTradeItemNumber.globalTradeItemNumberType[0].GTIN = '1111111111111'
The problem is that, for this to work, I have to know the array index into globalTradeItemNumberType ([0] in this example). It is not always 0; it can be any number from 0 to 9, and I cannot figure out how to rewrite the query so that it works regardless of the index at which the matching data is found.
How can I rewrite this query so that I do not need to know the array index beforehand?
--- EDIT ---
A sample document shortened to only include the needed parts
{
"id": "635af816-8db7-49c6-8284-ab85116b499b",
"brand": "XXX",
"IntegrationSource": "XXX",
"DocumentType": "Item",
"ItemInformationType": "",
"ItemLevel": "Article",
"ItemNo": "0562788040",
"UpdatedDate": "1/1/2020 4:00:01 AM",
"UpdatedDateUtc": "2020-01-01T04:00:01.82Z",
"UpdatedBy": "XXX",
"OriginalData": {
"corporateBrandId": "2",
"productId": "0562788",
"articleId": "0562788040",
"season": "201910",
"base": {
"sales": {
"SAPArticleNumber": "562788040190",
"simpleColour": {
"simpleColourId": "99",
"simpleColourDescription": "Green",
"translatedColourDescription": [
{
"languageCode": "sr",
"simpleColourDescription": "Zeleno"
},
{
"languageCode": "zh-Hans",
"simpleColourDescription": "绿色"
},
{
"languageCode": "vi-VN",
"simpleColourDescription": "Xanh la cay"
}
]
},
"variants": [
{
"variantId": "0562788040001",
"variantNumber": "562788040190001",
"variantDescription": "YYYYYYYYY, XXS",
"sizeScaleAndCode": "176-001",
"netWeight": 0.491,
"unitsOfMeasure": {
"unitsOfMeasureType": [
{
"alternativeUOM_ISO": "PCE",
"length": 320,
"width": 290,
"height": 31,
"unitOfDimension": "MM",
"volume": 2876.8,
"volumeUnit": "CCM",
"weightUnit": "KG"
}
]
},
"globalTradeItemNumber": {
"globalTradeItemNumberType": [
{
"GTIN": "1111111111111",
"GTINCategory": "Z3"
},
{
"GTIN": "2222222222222",
"GTINCategory": "Z3"
},
{
"GTIN": "3333333333333",
"GTINCategory": "IE"
}
]
}
}
]
}
}
}
}
I tried the following query based on the suggested answer below, but it did not work:
SELECT *
FROM c
WHERE ARRAY_CONTAINS(c.OriginalData.base.sales.variants.globalTradeItemNumber.globalTradeItemNumberType, {GTIN:"1111111111111"}, true)
I guess the above fails because the variants part of the tree is also an array?
NOTE: the variants array can hold several objects, so the match is not always at index [0].
You could try using the ARRAY_CONTAINS function.
SELECT *
FROM c IN jongel.OriginalData.base.sales.variants
WHERE ARRAY_CONTAINS(c.globalTradeItemNumber.globalTradeItemNumberType, {GTIN:"1111111111111"}, true)
This will allow the query to search all items in the array for a matching GTIN value.
https://learn.microsoft.com/en-us/azure/cosmos-db/sql-query-array-contains
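The semantics of that query can be approximated in memory to see why no index is needed. The Python sketch below is not Cosmos DB code: partial_match and variants_with_gtin are hypothetical helpers that mimic iterating over variants and applying ARRAY_CONTAINS with the partial-match flag set to true, and the second variant in the sample is invented for illustration.

```python
def partial_match(element, pattern):
    """True when every key in pattern is present in element with an equal value."""
    return isinstance(element, dict) and all(
        key in element and element[key] == value for key, value in pattern.items()
    )


def variants_with_gtin(document, gtin):
    """Mimic: FROM c IN doc.OriginalData.base.sales.variants
    WHERE ARRAY_CONTAINS(c.globalTradeItemNumber.globalTradeItemNumberType, {GTIN: gtin}, true)"""
    variants = document["OriginalData"]["base"]["sales"]["variants"]
    return [
        variant
        for variant in variants
        if any(
            partial_match(entry, {"GTIN": gtin})
            for entry in variant.get("globalTradeItemNumber", {}).get(
                "globalTradeItemNumberType", []
            )
        )
    ]


sample = {
    "id": "635af816-8db7-49c6-8284-ab85116b499b",
    "OriginalData": {"base": {"sales": {"variants": [
        {
            "variantId": "0562788040001",
            "globalTradeItemNumber": {"globalTradeItemNumberType": [
                {"GTIN": "1111111111111", "GTINCategory": "Z3"},
                {"GTIN": "2222222222222", "GTINCategory": "Z3"},
                {"GTIN": "3333333333333", "GTINCategory": "IE"},
            ]},
        },
        {
            # Hypothetical second variant, added only to show that non-matching variants are skipped.
            "variantId": "0562788040002",
            "globalTradeItemNumber": {"globalTradeItemNumberType": [
                {"GTIN": "4444444444444", "GTINCategory": "Z3"},
            ]},
        },
    ]}}},
}

matches = variants_with_gtin(sample, "3333333333333")
print([variant["variantId"] for variant in matches])
```

The GTIN at position 2 of the first variant is found without naming its position, and the pattern {GTIN: ...} matches even though each array element also carries GTINCategory.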

How can I push an array into aggregate pipeline and pull it one level up?

I am new to MongoDB and I am developing software with C# and MongoDB. My data structure looks like this:
{
"Id": 1,
"Title": "myTitle",
"Geners": [ "Drama", "Action" ],
"Category": 1,
"Casts": [
{
"Id": 1,
"Name": "myName",
"Gender": "Male",
"Age": 35
},
{
"Id": 2,
"Name": "herName",
"Gender": "Female",
"Age": 30
},
{
"Id": 3,
"Name": "hisName",
"Gender": "Male",
"Age": 45
}
]
}
This is just one document; I have about 5 million of them. I want to run a query like the one below that groups the records by Category, shows how many movies I have in each category, and includes the Casts field in the result.
db.getCollection('myCollection').aggregate([
{
$group:{"_id":"$Category", "count": {$sum:1},
"Casts":{$push:"$Casts"}}
}
])
This is close to what I want, but the problem is that it puts the Casts data in a second level of array, like {"Id":1, ... , "Casts":[[{},{},...]]}, whereas I need it like this: {"Id":1, ... , "Casts":[{},{},...]}
How can I get the data in that shape?
If duplicates are acceptable, then the following aggregation will suffice:
db.getCollection('myCollection').aggregate([
{ $unwind:"$Casts"},
{
$group:{"_id":"$Category", "count": {$sum:1},
"Casts":{$push:"$Casts"}}
}
])
Update:
Since you need the count to be valid (unwinding first makes it count cast entries rather than movies), there are a few more hoops to jump through.
db.getCollection('myCollection').aggregate([
{ $group:{"_id":"$Category", "count": {$sum:1}, "Casts":{$addToSet:"$Casts"}}},
{$unwind:"$Casts"},
{$unwind:"$Casts"},
{ $group:{"_id":"$_id", "count": {$first:"$count"}, "Casts":{$addToSet:"$Casts"}}},
])
Let me know if that helps
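The difference between the two pipelines can be checked on a tiny in-memory example. The Python below is an emulation, not MongoDB driver code: the two movies are hypothetical, group_by_category is an invented helper, and unlike $addToSet (whose output order is unspecified) it keeps first-seen order.

```python
from collections import defaultdict

# Two hypothetical movies in the same category; field names follow the question.
movies = [
    {"Id": 1, "Category": 1, "Casts": [{"Id": 1, "Name": "myName"}, {"Id": 2, "Name": "herName"}]},
    {"Id": 2, "Category": 1, "Casts": [{"Id": 2, "Name": "herName"}, {"Id": 3, "Name": "hisName"}]},
]


def group_by_category(docs):
    """Count documents per category and flatten their Casts into one de-duplicated list."""
    groups = defaultdict(lambda: {"count": 0, "Casts": []})
    for doc in docs:
        group = groups[doc["Category"]]
        group["count"] += 1  # one per movie, like {$sum: 1} applied before any $unwind
        for cast in doc["Casts"]:
            if cast not in group["Casts"]:  # set semantics, like $addToSet
                group["Casts"].append(cast)
    return [{"_id": key, **value} for key, value in groups.items()]


grouped = group_by_category(movies)

# Counting after unwinding would count cast entries instead of movies.
count_after_unwind = sum(len(doc["Casts"]) for doc in movies)

print(grouped)
print(count_after_unwind)
```

Here the grouped count is the number of movies (2), while counting after an unwind gives the number of cast entries (4), which is why the updated pipeline computes the count before unwinding and carries it through with $first.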
