MongoDB C# Case Insensitive Sort and Index - c#

So far I've been using this code to find my documents and then sort them:
var options = new FindOptions
{
Modifiers = new BsonDocument("$hint", "PathTypeFilenameIndex")
};
return await Collection
.Find(f => f.Metadata["path"] == path, options)
.SortBy(f => f.Metadata["type"])
.ThenBy(f => f.Filename)
.ToListAsync();
I have a class that has Metadata field with path and type fields, also the class has a Filename field. I want all documents with a given path inside the metadata sorted by type and then by Filename.
An example result would be a list of documents ordered by the Name field like this:
a, Ab, B, c, D
Unfortunately, I get something like this:
Ab, B, D, a, c
And that's because MongoDB sorts the data with a simple binary comparison, where 'A' < 'a' because of their ASCII codes.
So my question is: Is there a way to make a case insensitive sort and keep using the "$hint"?
That options I pass to the Find method should tell MongoDB which index to use. I found this post: MongoDB and C#: Case insensitive search but the method here doesn't work for sorting and I couldn't tell MongoDB which index to use.

I think you can use aggregation pipeline with $addFields, $toLower (to convert filename to lowercase in temporary field), and $sort to sort them irrespective of the case
In mongodb shell you would write something like this :
db.collection.aggregate([{
$addFields : {
"lowercaseFileName" : {
$loLower : "$fileName"
}
},{
$sort : {
"metadata.type" : 1,
lowercaseFileName : 1
}
}
}])
Please write the similar code in c#, and see if it works. I dont know c#, otherwise i would have given you the exact query, but i cant.
The idea is to transform the filename to lowercase, save it in temporary field, using addFields and sort by that field.
Hope this helps you out.
Read more about $addFields, $toLower here.
Update
For whoever wants a working code in C# , thanks to #kaloyan-manev
You can use this :
return await Collection.Aggregate()
.Match(f => f.Metadata["path"] == path)
.AppendStage<BsonDocument>(new BsonDocument("$addFields", new BsonDocument("lowercaseFileName", new BsonDocument("$toLower", "$filename"))))
.AppendStage<GridFSFileInfo>(new BsonDocument("$sort", new BsonDocument { {"metadata.type", 1}, {"lowercaseFileName", 1} }))
.ToListAsync();

Did you try to set the CollationStrenght = 2?
Your code would be similar all you need is to set the Collation in the FindObject:
var options = new FindOptions
{
Modifiers = new BsonDocument("$hint", "PathTypeFilenameIndex"),
Collation = new Collation("en", strength: CollationStrength.Secondary)
};

Related

How to create MongoDB MultiKey index on attribute of items in an array .NET Driver

I have a MongoDB collection "foos" containing items which each have an array of "bars". That is, "foo" has the following schema:
{
"id": UUID
"name": string
...
"bars": [
"id": UUID
"key": string
...
]
}
I need to create an index on name and bar.key using the MongoDB C# .NET Mongo driver.
I presumed I could use a Linq Select function to do this as follows:
Indexes.Add(Context.Collection<FooDocument>().Indexes.CreateOne(
Builders<FooDocument>.IndexKeys
.Descending(x => x.Bars.Select(y => y.Key))));
However this results in an InvalidOperationException:
System.InvalidOperationException: 'Unable to determine the serialization information for x => x.Bars.Select(y => y.Id).'
The Mongo documentation on MultiKey indexes shows how to create such an index using simple dot notation, i.e.
db.foos.createIndex( { "name": 1, "bars.key": 1 } )
However the MongoDB driver documentation seems to suggest that using a Linq function as I'm doing is correct.
How can I create a multikey index on my collection using the MongoDB .NET Driver, preferably using a Linq function?
This is an example how to do it with C#
var indexDefinition = Builders<FooDocument>.IndexKeys.Combine(
Builders<FooDocument>.IndexKeys.Ascending(f => f.Key1),
Builders<FooDocument>.IndexKeys.Ascending(f => f.Key2));
await collection.Indexes.CreateOneAsync(indexDefinition);
UPDATE
Regarding index within the array, closest what i was able to find is to use "-1" as index whene you building your index key. As i understand from github source code is is a valid option in case of building queries.
var indexDefinition = Builders<FooDocument>.IndexKeys.Combine(
Builders<FooDocument>.IndexKeys.Ascending(f => f.Key1),
Builders<FooDocument>.IndexKeys.Ascending(f => f.Key2[-1].Key));
await collection.Indexes.CreateOneAsync(indexDefinition);
"-1" is a hardcoded constant in side mongodb C# drivers which means "$" (proof). So this code would try to create index:
{ "Key1": 1, "Key2.$.Key": 1 }
which is fine for querying info from database, but not allowed (will throw an exception "Index key contains an illegal field name: field name starts with '$'") to use in indexes. So i assume it should be changed in mongodb drivers to make it work. Something like "-2" means empty operator. In that case we could use
var indexDefinition = Builders<FooDocument>.IndexKeys.Combine(
Builders<FooDocument>.IndexKeys.Ascending(f => f.Key1),
Builders<FooDocument>.IndexKeys.Ascending(f => f.Key2[-2].Key));
await collection.Indexes.CreateOneAsync(indexDefinition);
which would generate index like:
{ "Key1": 1, "Key2.Key": 1 }
So basically i don't think it is possible right now to build index you want with pure Linq without changing mongo C# drivers.
So i think your only option do like this, still C# but without Linq
await collection.Indexes.CreateOneAsync(new BsonDocument {{"name", 1}, {"bars.key", 1}});
This appears to be a requested feature for the C# driver, although it hasn't seen any progress lately. That said, someone did submit a rough-and-ready solution there on the JIRA thread, so perhaps that will do the job for you.
You can create a string index and use nameof() in C# 6:
Indexes.Add(Context.Collection<FooDocument>().Indexes.CreateOne($"{nameof(FooDocument.Bars)}.{nameof(Bars.Key)}"));
As of "MongoDB.Driver" Version="2.8.0" syntax has been changed and some of the methods has been deprecated.Following is the way to achieve the same .
Please note the CreateOneAsync(new CreateIndexModel<FooDocument>(indexDefinition)); part
var indexDefinition = Builders<FooDocument>.IndexKeys.Combine(
Builders<FooDocument>.IndexKeys.Ascending(f => f.Key1),
Builders<FooDocument>.IndexKeys.Ascending(f => f.Key2[-1].Key));
await collection.Indexes.CreateOneAsync(new CreateIndexModel<FooDocument>(indexDefinition));
As per a previous comment, C# driver still doesn't support a strongly typed way of doing multikey indexes.
Also, using something like [-1] seems bit hacky and not really what you're after as it'll substitute it with $.
As such I suggest doing this (as per MongoDB.Driver 2.8.0 onwards):
var indexDefinition = Builders<FooDocument>.IndexKeys.Combine(
Builders<FooDocument>.IndexKeys.Ascending(f => f.Key1),
Builders<FooDocument>.IndexKeys.Ascending($"{nameof(FooDocument.Key2)}.{nameof(BarType.Key)}"));
await collection.Indexes.CreateOneAsync(new CreateIndexModel<FooDocument>(indexDefinition), cancellationToken: token);

mongodb c# select specific field dot notation

In addition for my previous question:
mongodb c# select specific field.
I'm writing a generic method for selecting a specific field.
the requirements are:
Field can be of any type
Return type is T
Field can be inside a sub field
Field can be inside array items - in that case its OK to select the specific field of all the items in the array
for shorts, im looking for the "select" / dot notation capability.
for example:
the wanted method:
T GetFieldValue<T>(string id, string fieldName)
the document:
persons
{
"id": "avi"
"Freinds" : [
{
"Name" : "joni",
"age" : "33"
},
{
"Name" : "daniel",
"age" : "27"
}]
}
The goal is to call the method like this:
string[] myFriends = GetFieldValue<string[]>("avi", "Freinds.Name");
myFriends == ["joni","daniel"]
as far as i know, using projection expression with lambda is no good for items in array,
I was thinking more dot notation way.
note:
I'm using the new c# driver (2.0)
Thanks A lot.
I don't see good approach with don notation in string, because it has more issues with collections than generic approach:
For example Persion.Friends.Name
Which element is array in this chain?
You should apply explicit conversion for collection elements (possible place of bugs)
Generic methods are more reliable in support and using:
var friends = await GetFieldValue<Person, Friend[]>("avi", x => x.Friends);
var names = friends.Select(x=>x.Name).ToArray();

MongoDB C# Driver multiple field query

Using the MongoDB C# driver How can I include more than one field in the query (Im using vb.net)
I know how to do (for name1=value1)
Dim qry = Query.EQ("name1","value1")
How can I modify this query so I can make it find all documents where name1=value1 and name2=value2?
( Similar to )
db.collection.find({"name1":"value1","name2":"value2"})
I wanted to search a text in different fields and Full Text Search doesn't work for me even after wasting so much time. so I tried this.
var filter = Builders<Book>.Filter.Or(
Builders<Book>.Filter.Where(p=>p.Title.ToLower().Contains(queryText.ToLower())),
Builders<Book>.Filter.Where(p => p.Publisher.ToLower().Contains(queryText.ToLower())),
Builders<Book>.Filter.Where(p => p.Description.ToLower().Contains(queryText.ToLower()))
);
List<Book> books = Collection.Find(filter).ToList();
You can use:
var arrayFilter = Builders<BsonDocument>.Filter.Eq("student_id", 10000)
& Builders<BsonDocument>.Filter.Eq("scores.type", "quiz");
Reference: https://www.mongodb.com/blog/post/quick-start-csharp-and-mongodb--update-operation
And doesn't always do what you want (as I found was the case when doing a not operation on top of an and). You can also create a new QueryDocument, as shown below. This is exactly the equivalent of what you were looking for.
Query.Not(new QueryDocument {
{ "Results.Instance", instance },
{ "Results.User", user.Email } }))

c# Lambda and grouping

trying to get my head around using Lambda expressions to fetch data from my database.
Say I have a table that looks a bit like this (notice the spaces and casing):
name, count:
iPhone 4, 15
iphone 4, 2
iPhone4, 8
If I try to find items by name (using StartsWith()), I only want to fetch the result with the highest count, independent of casing and spaces. So searches for "iphone4" "i p h o n e 4", "iPhone4" sholud all return the "iPhone 4"-record
If you have a MS Sql Server 2005+ the following would work for your stated example:
var inputString = "iPhone 4";
var token = inputString.ToLower().Replace(" ", "");
var tokenizedQuery = DataContext.Devices.Select(d => new { Device = d, Token = d.Name.ToLower().Replace(" ", "") });
var filteredQuery = tokenizedQuery.Where(d => d.Token == token);
var resultsQuery = filteredQuery.Select(d => d.Device).OrderByDescending(d => d.Count);
var result = resultsQuery.FirstOrDefault();
Here is what is going on:
You are creating a tokenized version of your input string by lower-casing it and then removing spaces.
Then you are creating a pseudo-column on your table to create a similar token column
Filter your results based on this token
Finally, select only the record with the highest count
However it is very important that you realize the ToLower() and Replace() methods are being translated to T-SQL commands that run on the sql server and not in your app. This means should you need more sophisticated tokenizing routines, or you are not using MS SQL this may not work!
As others have noted, you may want to clean up your design somewhat. You are essentially storing a key or search keyword that can have many permutations. Doing the tokenizing in a query is not portable or performant, so you should ideally store the tokenized version of this string in its own column. Alternatively, look into Full Text Indexes, as they may also address your problem (again, if using MSSQL).
Let's assume that you have a Collapse string extension, which is not hard to write. One thing you'll note is that there won't be a mapping from this to SQL so the final filtering will have to be done in LINQ to Objects. You might be able to make the DB query more efficient by doing partial filtering (i.e., on iphone), then complete the filtering in memory.
db.Table.ToList().Where( t => t.Name.Collapse().StartsWith( searchString.Collapse() )
.OrderByDescending( t => t.Count )
.Take( 1 );
Where Collapse is
public static class StringExtensions
{
public static string Collapse( this string source )
{
if (string.IsNullOrWhiteSpace( source ))
{
return string.Empty;
}
var builder = new StringBuilder();
foreach (char c in source)
{
if (!char.IsWhiteSpace( c ))
{
builder.Append( c );
}
}
return builder.ToString();
}
}
Note: you'd be better off sanitizing your database if possible AND you really want these to map to the same thing.

MongoDB and C#: Case insensitive search

I am using MongoDB and the C# driver for MongoDB.
I recently discovered that all queries in MongoDB are case-sensitive. How can I make a case-insensitive search?
I found one way to do this:
Query.Matches(
"FirstName",
BsonRegularExpression.Create(new Regex(searchKey,RegexOptions.IgnoreCase)));
The simplest and safest way to do that is using Linq:
var names = namesCollection.AsQueryable().Where(name =>
name.FirstName.ToLower().Contains("hamster"));
As explained in the tutorial ToLower, ToLowerInvariant, ToUpper and ToUpperInvariant all perform matches in a case insensitive way. After that you can use all the supported string methods like Contains or StartsWith.
This example will generate:
{
"FirstName" : /hamster/is
}
The i option makes it case insensitive.
I've just implemented this much simpler than any of the other suggestions. However I realise due to the age of this question, this functionality may not have been available at the time.
Use the options of the Bson Regular Expression constructor to pass in case insensitivity. I just had a look at the source code and found that 'i' is all you need. For example.
var regexFilter = Regex.Escape(filter);
var bsonRegex = new BsonRegularExpression(regexFilter, "i");
Query.Matches("MyField", bsonRegex);
You shouldn't have to keep records twice for searching.
try to use something like this:
Query.Matches("FieldName", BsonRegularExpression.Create(new Regex(searchKey, RegexOptions.IgnoreCase)))
You will probably have to store the field twice, once with its real value, and again in all lowercase. You can then query the lowercased version for case-insensitive search (don't forget to also lowercase the query string).
This approach works (or is necessary) for many database systems, and it should perform better than regular expression based techniques (at least for prefix or exact matching).
As i3arnon answered, you can use Queryable to do a case insensitive comparison/search. What i found out was, that i could not use string.Equals() method, because is it not supported. If you need to do a comparison, Contains() will unfortunately not be suitable which kept me struggling for a solution, for quite some time.
For anyone wanting to do a string comparison, simply use == instead of .Equals().
Code:
var names = namesCollection.AsQueryable().Where(name =>
name.FirstName.ToLower() == name.ToLower());
For MongoDB 3.4+ the recommended way is to use indexes.
See https://jira.mongodb.org/browse/DOCS-11105?focusedCommentId=1859745&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-1859745
I am successfully searching with case insensitive by:
1. Creating an index with Collation for a locale (e.g: "en") and with a strength of 1 or 2. See https://docs.mongodb.com/manual/core/index-case-insensitive/ for further details
Using the same Collation when performing searches on the MongoDb collection.
As an example:
Create a collation with strength 1 or 2 for case insensitive
private readonly Collation _caseInsensitiveCollation = new Collation("en", strength: CollationStrength.Primary);
Create an index. In my case I index several fields:
private void CreateIndex()
{
var indexOptions = new CreateIndexOptions {Collation = _caseInsensitiveCollation};
var indexDefinition
= Builders<MyDto>.IndexKeys.Combine(
Builders<MyDto>.IndexKeys.Ascending(x => x.Foo),
Builders<MyDto>.IndexKeys.Ascending(x => x.Bar));
_myCollection.Indexes.CreateOne(indexDefinition, indexOptions);
}
When querying make sure you use the same Collation:
public IEnumerable<MyDto> GetItems()
{
var anyFilter = GetQueryFilter();
var anySort = sortBuilder.Descending(x => x.StartsOn);
var findOptions = new FindOptions {Collation = _caseInsensitiveCollation};
var result = _salesFeeRules
.Find(anyFilter, findOptions)
.Sort(anySort)
.ToList();
return result;
}
You can also use MongoDB's built in filters. It may make it easier for using some of mongo's methods.
var filter = Builders<Model>.Filter.Where(p => p.PropertyName.ToLower().Contains(s.ToLower()));
var list = collection.Find(filter).Sort(mySort).ToList();
The easiest way for MongoDB 3.4+ is to use one of ICU Comparison Levels
return await Collection()
.Find(filter, new FindOptions { Collation = new Collation("en", strength: CollationStrength.Primary) })
.ToListAsync();
More info https://docs.mongodb.com/manual/reference/method/cursor.collation/index.html
In case anyone else wondering, using fluent-mongo add-on, you can use Linq to query like that:
public User FindByEmail(Email email)
{
return session.GetCollection<User>().AsQueryable()
.Where(u => u.EmailAddress.ToLower() == email.Address.ToLower()).FirstOrDefault();
}
Which results in correct JS-query. Unfortunately, String.Equals() isn't supported yet.
A way to do it is to use the MongoDB.Bson.BsonJavaScript class as shown below
store.FindAs<Property>(Query.Where(BsonJavaScript.Create(string.Format("this.City.toLowerCase().indexOf('{0}') >= 0", filter.City.ToLower()))));
this is exact text search and case insensitive (see this link).
{ “FieldName” : /^keywordHere$/i }

Categories

Resources