C# MongoDB driver [2.7.0] CountDocumentsAsync unexpected native query - c#

I encountered a weird thing when using the C# MongoDB CountDocumentsAsync function. I enabled query logging on MongoDB, and this is what I got:
{
"op" : "command",
"ns" : "somenamespace",
"command" : {
"aggregate" : "reservations",
"pipeline" : [
{
"some_query_key": "query_value"
},
{
"$group" : {
"_id" : null,
"n" : {
"$sum" : 1
}
}
}
],
"cursor" : {}
},
"keyUpdates" : 0,
"writeConflicts" : 0,
"numYield" : 9,
"locks" : {
"Global" : {
"acquireCount" : {
"r" : NumberLong(24)
}
},
"Database" : {
"acquireCount" : {
"r" : NumberLong(12)
}
},
"Collection" : {
"acquireCount" : {
"r" : NumberLong(12)
}
}
},
"responseLength" : 138,
"protocol" : "op_query",
"millis" : 2,
"execStats" : {},
"ts" : ISODate("2018-09-27T14:08:48.099Z"),
"client" : "172.17.0.1",
"allUsers" : [ ],
"user" : ""
}
A simple count is converted into an aggregate.
More interestingly, when I use the CountAsync function (which, by the way, is marked obsolete with a remark that I should be using CountDocumentsAsync instead), it produces:
{
"op" : "command",
"ns" : "somenamespace",
"command" : {
"count" : "reservations",
"query" : {
"query_key": "query_value"
}
},
"keyUpdates" : 0,
"writeConflicts" : 0,
"numYield" : 9,
"locks" : {
"Global" : {
"acquireCount" : {
"r" : NumberLong(20)
}
},
"Database" : {
"acquireCount" : {
"r" : NumberLong(10)
}
},
"Collection" : {
"acquireCount" : {
"r" : NumberLong(10)
}
}
},
"responseLength" : 62,
"protocol" : "op_query",
"millis" : 2,
"execStats" : {
},
"ts" : ISODate("2018-09-27T13:58:27.758Z"),
"client" : "172.17.0.1",
"allUsers" : [ ],
"user" : ""
}
which is what I would expect. Does anyone know the reason for this behavior? I browsed the documentation but didn't find anything about it.

This is the documented behaviour for drivers supporting 4.0 features. The reason for the change is to remove confusion and make it clear when an estimate is used and when it is not.
When counting based on a query filter (rather than just counting the entire collection) both methods will cause the server to iterate over matching documents to count them and therefore have similar performance.
From MongoDb docs: db.collection.count()
NOTE:
MongoDB drivers compatible with the 4.0 features deprecate their
respective cursor and collection count() APIs in favor of new APIs for
countDocuments() and estimatedDocumentCount(). For the specific API
names for a given driver, see the driver documentation.
From MongoDb docs: db.collection.countDocuments()
db.collection.countDocuments(query, options)
New in version 4.0.3.
Returns the count of documents that match the query for a collection
or view. The method wraps the $group aggregation stage with a $sum
expression to perform the count and is available for use in
Transactions.
A more detailed explanation for this change in API can be found on the MongoDb JIRA site:
Drivers supporting MongoDB 4.0 must deprecate the count() helper and
add two new helpers - estimatedDocumentCount() and countDocuments().
Both helpers are supported with MongoDB 2.6+.
The names of the new helpers were chosen to make it clear how they
behave and exactly what they do. The estimatedDocumentCount helper
returns an estimate of the count of documents in the collection using
collection metadata, rather than counting the documents or consulting
an index. The countDocuments helper counts the documents that match
the provided query filter using an aggregation pipeline.
The count() helper is deprecated. It has always been implemented using
the count command. The behavior of the count command differs depending
on the options passed to it and the topology in use and may or may not
provide an accurate count. When no query filter is provided the count
command provides an estimate using collection metadata. Even when
provided with a query filter the count command can return inaccurate
results with a sharded cluster if orphaned documents exist or if a
chunk migration is in progress. The countDocuments helper avoids these
sharded cluster problems entirely when used with MongoDB 3.6+, and
when using Primary read preference with older sharded clusters.
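In the C# driver (2.7+), the split described above maps onto the two new methods. A minimal sketch, reusing the "reservations" collection and filter key from the question's log (the connection string and database name are illustrative):

```csharp
using System;
using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Driver;

class CountHelpersSketch
{
    static async Task Main()
    {
        var client = new MongoClient("mongodb://localhost:27017");
        var coll = client.GetDatabase("somedb")
                         .GetCollection<BsonDocument>("reservations");

        // Fast, metadata-based estimate of the whole collection (no filter accepted).
        long estimate = await coll.EstimatedDocumentCountAsync();

        // Accurate filtered count; the driver sends the aggregate/$group/$sum
        // command seen in the log above.
        var filter = Builders<BsonDocument>.Filter.Eq("some_query_key", "query_value");
        long exact = await coll.CountDocumentsAsync(filter);

        Console.WriteLine($"estimate={estimate}, exact={exact}");
    }
}
```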

Related

MongoDB C# driver - Obsolete CountAsync method is around 6 times faster than CountDocumentsAsync. Why?

I have a collection with only 1.2M records and I need to perform two count queries: one without any filter and one with a filter on an indexed date-type field.
I am using the C# driver, and for some reason the fastest results I got were using the CountAsync method.
Results were:
CountAsync was the fastest by far, but this method is marked as obsolete.
CountDocumentsAsync was around 6 times slower than CountAsync.
Using aggregation was around 10-15 times slower than CountAsync.
Any idea why an obsolete method is the faster way to count documents in a collection?
Side note -
Within the aggregation I used the following:
Count without filter used the $count aggregation pipeline stage
Count with filter used the $match then $count pipeline stages.
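The two pipelines from the side note could be sketched with the fluent aggregation API like this (collection and field names are illustrative, not from the original post):

```csharp
using System;
using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Driver;

class AggregateCountSketch
{
    static async Task Main()
    {
        var client = new MongoClient("mongodb://localhost:27017");
        var coll = client.GetDatabase("test").GetCollection<BsonDocument>("items");

        // Count without filter: a single $count stage.
        var total = await coll.Aggregate().Count().FirstOrDefaultAsync();

        // Count with filter: $match then $count.
        var filter = Builders<BsonDocument>.Filter.Gte("CreatedDate", new DateTime(2018, 1, 1));
        var matched = await coll.Aggregate().Match(filter).Count().FirstOrDefaultAsync();

        // Each result is an AggregateCountResult; Count holds the value
        // (null result means the pipeline matched no documents).
        Console.WriteLine($"{total?.Count ?? 0} / {matched?.Count ?? 0}");
    }
}
```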
To get the behavior closest to CountAsync you can use EstimatedDocumentCountAsync.
When you turn logging on you can see the queries are different:
var count1 = await cnx.GetDatabase("flow").GetCollection<BsonDocument>("Page")
.CountAsync(_ => true);
query = { "count" : "Page", "query" : { }, "$db" : "flow" }
var count2 = await cnx.GetDatabase("flow").GetCollection<BsonDocument>("Page")
.EstimatedDocumentCountAsync();
query = { "count" : "Page", "$db" : "flow" }
var count3 = await cnx.GetDatabase("flow").GetCollection<BsonDocument>("Page")
.CountDocumentsAsync(_=>true);
query = { "aggregate" : "Page", "pipeline" : [{ "$match" : { } }, { "$group" : { "_id" : 1, "n" : { "$sum" : 1 } } }], "cursor" : { }, "$db" : "flow" }

Mongodb Update or insert in c#

I want to update or insert into the mongo collection "Member". Under this collection I have an array MagazineSubscription, where MagazineCode is unique. Please refer to the sample JSON.
To update or insert into mongo using the C# mongo driver:
1. First I need to check whether this code exists.
2. If it exists, update it.
3. If it does not exist, insert it.
Is there any way I can do this in one step - update if it already exists, otherwise insert - instead of hitting the database twice? Because my collection is very big.
{
"_id" : ObjectId("5c44f7017en0893524d4e9b1"),
"Code" : "WH01",
"Name" : "Lara",
"LastName" : "John",
"DOB" : "12-10-2017",
"Gender" : "Male",
"Dependents" : [
{
"RelationShip" : "Son",
"Name" : "JOHN",
"DOB" : "01-01-1970",
"Gender" : "Male",
"Address" : "Paris",
"ContactNumber" : "+312233445666"
},
{
"RelationShip" : "Wife",
"Name" : "Marry",
"DOB" : "01-01-1980",
"Gender" : "Female",
"Address" : "Paris",
"ContactNumber" : "+312233445666"
}
],
"Matrimony" : [
{
"Fee" : 1000.0,
"FromDate" : "01-01-2015",
"ToDate" : "01-01-2017",
"Status" : false
}
],
"MagazineSubscription" : [
{
"MagazineCode" : "WSS",
"DateFrom" : "01-05-2018",
"DateTo" : "01-01-2020",
"PaidAmount" : 1000.0,
"ActualCost" : 1500.0,
"Status" : false,
"DeliveryStatus" : [
{
"ReturnedDate" : "10-01-2019",
"Comment" : "Returned because of invalid address"
},
{
"ReturnedDate" : "10-02-2019",
"Comment" : "Returned because of invalid address"
}
]
}
]
}
Use mongodb's update operation with upsert:true.
Please refer here: https://docs.mongodb.com/manual/reference/method/db.collection.update/
Here's a sample from the page:
db.collection.update(
<query>,
<update>,
{
upsert: <boolean>, //you need this option
multi: <boolean>,
writeConcern: <document>,
collation: <document>,
arrayFilters: [ <filterdocument1>, ... ]
}
)
And here's a similar question according to what you need:
Upserting in Mongo DB using official C# driver
EDIT 1
Steps:
First you need to write a filter to scan if the document exists. you can check any number of keys (Essentially a document).
Write the update section with the keys you'd like to update (Essentially a document).
Set upsert to true.
MongoDB will use your filter to search for the document. If found, it will use the update section to perform the update you specified.
If the document does not exist, a new document will be created from the filter keys plus the keys in the update part.
Hope that makes things clear. I have never used the C# mongo driver, so I won't be able to provide you the exact syntax.
EDIT 2
I'm providing #jeffsaracco's solution here:
MongoCollection matchCollection = db.GetCollection("matches");
var query = new QueryDocument("recordId", recordId); //this is the filter
var update = Update.Set("FirstName", "John").Set("LastName", "Doe"); //these are the keys to be updated
matchCollection.Update(query, update, UpdateFlags.Upsert, SafeMode.False);
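That snippet uses the legacy 1.x API. With the current 2.x driver the same upsert might look like this; a sketch only, with the "Code" value and updated fields taken from the sample document for illustration:

```csharp
using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Driver;

class UpsertSketch
{
    static async Task Main()
    {
        var client = new MongoClient("mongodb://localhost:27017");
        var members = client.GetDatabase("test").GetCollection<BsonDocument>("Member");

        var filter = Builders<BsonDocument>.Filter.Eq("Code", "WH01");
        var update = Builders<BsonDocument>.Update
            .Set("Name", "Lara")
            .Set("LastName", "John");

        // IsUpsert = true: update the matching document in place, or insert a
        // new one (filter keys + update keys) when no document matches.
        var result = await members.UpdateOneAsync(filter, update,
            new UpdateOptions { IsUpsert = true });

        // result.UpsertedId is non-null only when a new document was inserted.
    }
}
```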

Using MongoDB C# Driver write ElementMatch with Regex query

I need to construct the following query using MongoDB C# driver
db.Notes.find({ "Group._id" : 74, "CustomFields" : { "$elemMatch" : { "Value" : /batch/i } }, "IsDeleted" : false }).sort({ "CreatedDateTimeUtc" : -1 })
I used a query like this
builder.ElemMatch(x => x.CustomFields, x => x.Value.Contains(filterValue))
It generated mongo query as
db.Notes.find({ "Group._id" : 74, "CustomFields" : { "$elemMatch" : { "Value" : /batch/s } }, "IsDeleted" : false }).sort({ "CreatedDateTimeUtc" : -1 })
If you notice, it is appending s (/batch/s) instead of i (/batch/i).
How can I get this to work? I need to do this for filters like:
contains, using .Contains()
equals, thinking of using .Equals()
doesn't contain, thinking of using !Field.contains(value)
not equals to
starts with
ends with
Can I do something like this, so that I can apply all my regex patterns for all above filters.
builder.Regex(x => x.CustomFields[-1].Value, new BsonRegularExpression($"/{filterValue}/i"));
This converts the query to the one below, but it doesn't return any results:
db.Notes.find({ "Project._id" : 74, "CustomFields.$.Value" : /bat/i, "IsDeleted" : false }).sort({ "CreatedDateTimeUtc" : -1 })
FYI: builder is FilterDefinition<Note>
My sample Notes Collection is like this:
{
Name:"",
Email:"",
Tel:"",
Date:02 /21/1945,
CustomFields:[
{
Name:"",
Value:"",
IsSearchable:true,
},
{
Name:"",
Value:"",
IsSearchable:true,
},
{
Name:"",
Value:"",
IsSearchable:true,
},
{
Name:"",
Value:"",
IsSearchable:true,
}
]
}
It sounds like all you're missing is the insensitive part. Have you tried this?
ToLower, ToLowerInvariant, ToUpper, ToUpperInvariant (string method)
These methods are used to test whether a string field or property of
the document matches a value in a case-insensitive manner.
According to the 1.1 documentation here, this will perform a case-insensitive regex match.
The current documentation doesn't mention it, so just to be sure, I checked GitHub, and the code to create an insensitive match is still there.
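Assuming a Note class with a CustomFields list as in the sample document, either of these should produce the /i behavior; a sketch, with the class shapes reconstructed from the question rather than taken from real code:

```csharp
using System.Collections.Generic;
using MongoDB.Bson;
using MongoDB.Driver;

// Hypothetical document classes matching the question's shape.
class CustomField { public string Name { get; set; } public string Value { get; set; } }
class Note { public List<CustomField> CustomFields { get; set; } }

static class InsensitiveFilterSketch
{
    public static FilterDefinition<Note> Build(string filterValue)
    {
        // Option 1: ToLower on both sides, relying on the driver's documented
        // case-insensitive translation of the string methods quoted above.
        var f1 = Builders<Note>.Filter.ElemMatch(
            x => x.CustomFields,
            c => c.Value.ToLower().Contains(filterValue.ToLower()));

        // Option 2: build the regex explicitly with the "i" option. Consider
        // Regex.Escape(filterValue) if the value may contain metacharacters.
        var f2 = Builders<Note>.Filter.ElemMatch(
            x => x.CustomFields,
            Builders<CustomField>.Filter.Regex(
                c => c.Value, new BsonRegularExpression(filterValue, "i")));

        return f2;
    }
}
```

Option 2 is the more direct fit for the other operators listed (starts with, ends with, doesn't contain), since the anchored or negated pattern can be built into the BsonRegularExpression.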

How to store billions of JSON files and query them

I currently have an API which accepts JSON files (JSON-serialized objects containing some user transaction data) and stores them on the server. Every such JSON file has a unique global id and a unique user to which it is associated. There are billions of such files generated every day. The user should then be able to query all JSON files associated with him and produce aggregated results calculated on top of those files.
A typical JSON file that needs to be stored looks something like:
[ { "currencyCode" : "INR",
"receiptNumber" : { "value" : "1E466GDX5X2C" },
"retailTransaction" : [ { "grandTotal" : 90000.0,
"lineItem" : [ { "otherAttributes" : { },
"sale" : { "description" : "Samsung galaxy S3",
"discountAmount" : { "currency" : "INR",
"value" : 2500
},
"itemSubType" : "SmartPhone",
"otherAttributes" : { },
"unitCostPrice" : { "quantity" : 1,
"value" : 35000
}
},
"sequenceNumber" : 1000
},
{ "customerOrderForPickup" : { "description" : "iPhone5",
"discountAmount" : { "currency" : "INR",
"value" : 5000
},
"itemSubType" : "SmartPhone",
"otherAttributes" : { },
"unitCostPrice" : { "quantity" : 1,
"value" : 55000
}
},
"otherAttributes" : { },
"sequenceNumber" : 1000
}
],
"otherAttributes" : { },
"reason" : "Delivery",
"total" : [ { "otherAttributes" : { },
"type" : "TransactionGrossAmount",
"value" : 35000
} ]
},
null
],
"sequenceNumber" : 125435,
"vatRegistrationNumber" : "10868758650"
} ]
The above JSON is the serialised version of a complex object whose attributes are single objects or arrays of objects of other classes. The 'receiptNumber' is the universal id of the JSON file.
I would need to query things like the quantity and value of the customerOrderForPickup, or the grandTotal of the transaction, as an aggregate over many such transaction JSONs.
I would like some suggestions on: 1) how to store these JSON files on the server, i.e. on the file system, and 2) what kind of database I should use to query JSON files with such a complex structure.
My research has turned up a couple of possibilities: 1) Use a MongoDB database to store the JSON representations of the objects and query the database. How would the JSON files be stored? What would be the best way to store the transaction JSONs in MongoDB? 2) Couple a SQL database (containing the unique global id, user id and the address of the JSON file on the server) with aggregating code over those files. I doubt this can scale.
Would be glad if someone has any insights on the problem. Thanks.
I would say your question is very general and really a matter of style and preferences. You could do this in 10 different ways and every one would be perfectly good.
I'm gonna give my personal preference and how I would do it:
Since there is a lot of data, I would use a relational database - SQL Server. I like Microsoft tools and ASP.NET MVC (I know there are a lot of people who don't, but it's my preference), and it has a serializer which can turn JSON into C# objects. Since I also like to use Entity Framework, and Entity Framework can translate C# objects into database rows, I would structure the database the same way my JSON object looks. I would then have an API that accepts those JSON entities; ASP.NET MVC would automatically turn them into C# objects, and Entity Framework would automatically turn them into database rows. This way the whole upload API wouldn't take more than a few lines of code.
I would then make more API methods for the different ways of querying the data. LINQ and Entity Framework often make the different queries as easy as one line of code.

Exact Contains Match on a Sub Collection

Using mongodb with the NoRM driver I have this document:
{
"_id" : ObjectId("0758030341b870c019591900"),
"TmsId" : "EP000015560091",
"RootId" : "1094362",
"ConnectorId" : "SH000015560000",
"SeasonId" : "7894681",
"SeriesId" : "184298",
"Titles" : [
{
"Size" : 120,
"Type" : "full",
"Lang" : "en",
"Description" : "House"
},
{
"Size" : 10,
"Type" : "red",
"Lang" : "en",
"Description" : "House M.D."
}
], yadda yadda yadda
and I am querying like:
var query = new Expando();
query["Titles.Description"] = Q.In(showNames);
var fuzzyMatches = db.GetCollection<Program>("program").Find(query).ToList();
where showNames is a string[] containing something like {"House", "Glee", "30 Rock"}
My results contain fuzzy matches. For example, the term "House" returns every show with a title containing the word House (like it's doing a Contains).
What I would like is exact matches. So if document.Titles contains "A big blue House" it should not return a match; only if Titles.Description contains exactly "House" would I like a match.
I haven't been able to reproduce the problem, perhaps because we're using different versions of MongoDB and/or NoRM. However, here are some steps that may help you to find the origin of the fuzzy results.
Turn on profiling, using the MongoDB shell:
> db.setProfilingLevel(2)
Run your code again.
Set the profiling level back to 0.
Review the queries that were executed:
> db.system.profile.find()
The profiling information should look something like this:
{
"ts" : "Wed Dec 08 2010 09:13:13 GMT+0100",
"info" : "query test.program ntoreturn:2147483647 reslen:175 nscanned:3 \nquery: { query: { Titles.Description: { $in: [ \"House\", \"Glee\", \"30 Rock\" ] } } } nreturned:1 bytes:159",
"millis" : 0
}
The actual query is in the info property and should be:
{ Titles.Description: { $in: [ "House", "Glee", "30 Rock" ] } }
If your query looks different, then the 'problem' is in the NoRM driver. For example, if NoRM translates your code to the following regex query, it will do a substring match:
{ Titles.Description: { $in: [ /House/, /Glee/, /30 Rock/ ] } }
I have used NoRM myself, but I haven't come across a setting to control this. Perhaps you're using a different version, that does come with such functionality.
If your query isn't different from what it should be, try running the query from the shell. If it still comes up with fuzzy results, then we're definitely using different versions of MongoDB ;)
in shell syntax:
db.mycollection.find( { "Titles.Description" : "House" } )
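For reference, with the current official C# driver (not NoRM) the same $in query does exact string matching; only regex values produce substring behavior. A sketch, with the document classes reduced to the relevant fields from the question:

```csharp
using System.Collections.Generic;
using MongoDB.Driver;

class Title { public string Description { get; set; } }
class Program { public List<Title> Titles { get; set; } }

static class ExactMatchSketch
{
    public static List<Program> Run(IMongoDatabase db, string[] showNames)
    {
        // Plain strings in $in match exactly; "A big blue House" would not
        // match "House" here. A regex value (/House/) would be needed to get
        // the fuzzy behavior described above.
        var filter = Builders<Program>.Filter.In("Titles.Description", showNames);
        return db.GetCollection<Program>("program").Find(filter).ToList();
    }
}
```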
