Find method using Full-Text search to be case and diacritic insensitive - c#

Using the MongoDB .NET driver, I need to build a full-text search method that ignores both case and diacritics.
I've already tried a regular expression, but it does not work for strings that contain accents (diacritics) or unusual casing.
var filterDefinition = new BsonDocument("$expr", new BsonDocument("$regexMatch", new BsonDocument
{
{ "input", new BsonDocument("$toString", "$field") },
{ "regex", searchTerm },
{ "options", "i" }
}));
If I create a text index that ignores the diacritics, the full-text search method doesn't work.
db.BsonDocument.createIndex({name: 'text'},{default_language: 'pt'})
var filter = Builders<BsonDocument>.Filter.Text(searchTerm);
Is there any way of combining these two approaches into something that meets the desired criteria?
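One common workaround, sketched here under the assumption that you can add a field to your documents, is to store a second copy of the text with case folded and diacritics stripped. You then run the case-insensitive regex against that copy using a search term normalized the same way. The helper below uses only the .NET base class library; `SearchNormalizer` and `NormalizeForSearch` are hypothetical names, and the field you would store the result in (for example a `nameNormalized` field) is an assumption.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class SearchNormalizer
{
    // Hypothetical helper: folds case and strips combining diacritical marks,
    // so "São Paulo" and "sao paulo" produce the same search key.
    public static string NormalizeForSearch(string value)
    {
        if (value == null) return null;

        // FormD splits a precomposed letter such as "ã" into "a" + U+0303.
        string decomposed = value.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);

        foreach (char c in decomposed)
        {
            // Drop the combining marks and keep every other character.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append(c);
        }

        // Recompose what is left and fold case independently of the current culture.
        return sb.ToString().Normalize(NormalizationForm.FormC).ToLowerInvariant();
    }
}
```

Letters that have no decomposition into a base letter plus a mark (for example "ø" or "ß") pass through unchanged, so this sketch only covers accents built from combining marks.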

Related

MongoDb ToString Version 3.2

At the moment I am using the Mongo C# driver, and I have a search feature that goes through all the columns, converts each value to a string, and applies a regex search. Unfortunately, the $toString operator was only introduced in MongoDB 4.0, not in 3.2, and all our Mongo servers currently run 3.2.
Because $toString isn't available, I'm having trouble searching for integer values that contain the search text.
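var columnFilters = new List<BsonDocument>();
foreach (var column in columns)
{
    // $expr needs MongoDB 3.6+, $toString 4.0+ and $regexMatch 4.2+,
    // so this filter cannot run on a 3.2 server
    var regexFilter = new BsonDocument("$expr",
        new BsonDocument("$regexMatch",
            new BsonDocument
            {
                {"input", new BsonDocument("$toString", $"${column.name}") },
                {"regex", searchString},
                {"options", "i" }
            })
    );
    columnFilters.Add(regexFilter);
}
// A document matches if any column contains the search text
var filter = new BsonDocument("$or", new BsonArray(columnFilters));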
Any suggestions on how to check whether an integer contains a certain text would be greatly appreciated.

MongoDB C# Case Insensitive Sort and Index

So far I've been using this code to find my documents and then sort them:
var options = new FindOptions
{
Modifiers = new BsonDocument("$hint", "PathTypeFilenameIndex")
};
return await Collection
.Find(f => f.Metadata["path"] == path, options)
.SortBy(f => f.Metadata["type"])
.ThenBy(f => f.Filename)
.ToListAsync();
My class has a Metadata field containing path and type fields, and also a Filename field. I want all documents with a given path in the metadata, sorted by type and then by Filename.
The result I expect is a list of documents ordered by Filename, like this:
a, Ab, B, c, D
Unfortunately, I get something like this:
Ab, B, D, a, c
That's because MongoDB sorts the data with a simple binary comparison, in which 'A' < 'a' because of their ASCII codes (65 and 97).
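The ordering difference can be reproduced in memory with the .NET base class library. `StringComparer.Ordinal` mimics the binary comparison described above, while sorting on a lowercased copy of the key mirrors the $addFields/$toLower workaround. This only illustrates the comparison rules and is not a replacement for a server-side sort; `SortDemo` and its method names are hypothetical.

```csharp
using System;
using System.Linq;

public static class SortDemo
{
    // Binary (ordinal) comparison: uppercase ASCII letters (65-90)
    // sort before lowercase ones (97-122), so "D" comes before "a".
    public static string[] BinarySort(string[] names) =>
        names.OrderBy(n => n, StringComparer.Ordinal).ToArray();

    // Sort on a temporary lowercased key, like $addFields + $toLower + $sort.
    public static string[] LowercaseKeySort(string[] names) =>
        names.OrderBy(n => n.ToLowerInvariant(), StringComparer.Ordinal).ToArray();
}
```

Sorting `c, Ab, D, a, B` gives `Ab, B, D, a, c` with the first method and `a, Ab, B, c, D` with the second, matching the two lists in the question.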
So my question is: Is there a way to make a case insensitive sort and keep using the "$hint"?
The options I pass to the Find method tell MongoDB which index to use. I found this post: MongoDB and C#: Case insensitive search, but the approach there doesn't work for sorting, and it doesn't let me tell MongoDB which index to use.
I think you can use an aggregation pipeline with $addFields and $toLower (to store a lowercased copy of the filename in a temporary field), followed by $sort, to sort irrespective of case.
In the mongo shell you would write something like this:
db.collection.aggregate([
  {
    // GridFS stores the file name in "filename"
    $addFields: {
      lowercaseFileName: { $toLower: "$filename" }
    }
  },
  {
    $sort: {
      "metadata.type": 1,
      lowercaseFileName: 1
    }
  }
])
Please write the equivalent code in C# and see if it works. I don't know C#; otherwise I would have given you the exact query.
The idea is to transform the filename to lowercase, save it in a temporary field using $addFields, and sort by that field.
Hope this helps you out.
Read more about $addFields, $toLower here.
Update
For anyone who wants working C# code (thanks to #kaloyan-manev):
You can use this :
return await Collection.Aggregate()
.Match(f => f.Metadata["path"] == path)
.AppendStage<BsonDocument>(new BsonDocument("$addFields", new BsonDocument("lowercaseFileName", new BsonDocument("$toLower", "$filename"))))
.AppendStage<GridFSFileInfo>(new BsonDocument("$sort", new BsonDocument { {"metadata.type", 1}, {"lowercaseFileName", 1} }))
.ToListAsync();
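Did you try setting CollationStrength = 2 (CollationStrength.Secondary)?
Your code would be similar; all you need to do is set the Collation in the FindOptions object (for the hinted index to support the case-insensitive sort, it must have been created with the same collation):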
var options = new FindOptions
{
Modifiers = new BsonDocument("$hint", "PathTypeFilenameIndex"),
Collation = new Collation("en", strength: CollationStrength.Secondary)
};

How to fetch data from MongoDB collection in C# using Regular Expression?

I am using the MongoDB.Driver NuGet package in my MVC (C#) web application to communicate with a MongoDB database. Now I want to fetch data based on a specific column and its value. I used the code below to fetch the data.
var findValue = "John";
var clientTest1 = new MongoClient("mongodb://localhost:XXXXX");
var dbTest1 = clientTest1.GetDatabase("Temp_DB");
var empCollection = dbTest1.GetCollection<Employee>("Employee");
var builder1 = Builders<Employee>.Filter;
var filter1 = builder1.Empty;
var regexFilter = new BsonRegularExpression(findValue, "i");
filter1 = filter1 & builder1.Regex(x => x.FirstName, regexFilter);
filter1 = filter1 & builder1.Eq(x => x.IsDeleted,false);
var collectionObj = await empCollection.FindAsync(filter1);
var dorObj = collectionObj.FirstOrDefault();
However, the code above behaves like a LIKE query: it works as (select * from Employee where FirstName like '%John%'). I don't want this. I want to fetch only the documents whose FirstName value matches exactly (in this case, FirstName should equal John).
How can I do this? Can anyone offer suggestions?
Note: I used new BsonRegularExpression(findValue, "i") to make search case-insensitive.
Any help would be highly appreciated.
Thanks
I would recommend storing a normalized version of your data, and index/search upon that. It will likely be considerably faster than using regex. Sure, you'll eat up a little more storage space by including "john" alongside "John", but your data access will be faster since you would just be able to use a standard $eq query.
If you insist on regex, I recommend putting ^ (start of line) and $ (end of line) around your search term. Remember, though, to escape your find value so that its contents aren't treated as regex syntax.
This should work:
string escapedFindValue = System.Text.RegularExpressions.Regex.Escape(findValue);
new BsonRegularExpression(string.Format("^{0}$", escapedFindValue), "i");
Or, if you're using C# 6 or later, you can use string interpolation:
string escapedFindValue = System.Text.RegularExpressions.Regex.Escape(findValue);
new BsonRegularExpression($"^{escapedFindValue}$", "i");
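To see the effect of the anchors and the escaping, the same pattern can be checked locally with .NET's regex engine. MongoDB evaluates the pattern on the server with its own (PCRE-based) engine, so this is only a sanity check that should behave the same way for plain text such as names. `ExactMatch` and its methods are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class ExactMatch
{
    // Escape the user's value and anchor it: "^" + escaped value + "$"
    // must match the whole field, not just a substring of it.
    public static string BuildExactPattern(string findValue) =>
        "^" + Regex.Escape(findValue) + "$";

    // Local check with IgnoreCase, the .NET counterpart of the "i" option.
    public static bool IsExactMatch(string candidate, string findValue) =>
        Regex.IsMatch(candidate, BuildExactPattern(findValue), RegexOptions.IgnoreCase);
}
```

With this pattern "john" still matches "John", but "Johnny" no longer does, and a search value such as "J.hn" is treated as literal text rather than as "J, any character, hn".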

MongoDB C# Driver multiple field query

Using the MongoDB C# driver, how can I include more than one field in a query? (I'm using VB.NET.)
I know how to query a single field (for name1=value1):
Dim qry = Query.EQ("name1","value1")
How can I modify this query so that it finds all documents where name1=value1 and name2=value2?
(Similar to:)
db.collection.find({"name1":"value1","name2":"value2"})
I wanted to search for text across several fields, and full-text search didn't work for me even after spending a lot of time on it, so I tried this:
var filter = Builders<Book>.Filter.Or(
Builders<Book>.Filter.Where(p=>p.Title.ToLower().Contains(queryText.ToLower())),
Builders<Book>.Filter.Where(p => p.Publisher.ToLower().Contains(queryText.ToLower())),
Builders<Book>.Filter.Where(p => p.Description.ToLower().Contains(queryText.ToLower()))
);
List<Book> books = Collection.Find(filter).ToList();
You can use:
var arrayFilter = Builders<BsonDocument>.Filter.Eq("student_id", 10000)
& Builders<BsonDocument>.Filter.Eq("scores.type", "quiz");
Reference: https://www.mongodb.com/blog/post/quick-start-csharp-and-mongodb--update-operation
Query.And doesn't always do what you want (as I found when applying a Not on top of an And). You can also create a new QueryDocument, as shown below, which is the exact equivalent of what you were looking for.
Query.Not(new QueryDocument {
{ "Results.Instance", instance },
{ "Results.User", user.Email }
})

MongoDB and C#: Case insensitive search

I am using MongoDB and the C# driver for MongoDB.
I recently discovered that all queries in MongoDB are case-sensitive. How can I make a case-insensitive search?
I found one way to do this:
Query.Matches(
"FirstName",
BsonRegularExpression.Create(new Regex(searchKey,RegexOptions.IgnoreCase)));
The simplest and safest way to do that is using Linq:
var names = namesCollection.AsQueryable().Where(name =>
name.FirstName.ToLower().Contains("hamster"));
As explained in the driver's LINQ tutorial, ToLower, ToLowerInvariant, ToUpper and ToUpperInvariant are all translated into case-insensitive matches. After that you can use any of the supported string methods, such as Contains or StartsWith.
This example will generate:
{
"FirstName" : /hamster/is
}
The i option makes it case insensitive.
I've just implemented this in a much simpler way than any of the other suggestions. However, I realise that, given the age of this question, this functionality may not have been available at the time.
Use the options argument of the BsonRegularExpression constructor to request case insensitivity. I had a look at the source code and found that "i" is all you need. For example:
var regexFilter = Regex.Escape(filter);
var bsonRegex = new BsonRegularExpression(regexFilter, "i");
Query.Matches("MyField", bsonRegex);
You shouldn't have to keep records twice for searching.
Try something like this:
Query.Matches("FieldName", BsonRegularExpression.Create(new Regex(searchKey, RegexOptions.IgnoreCase)))
You will probably have to store the field twice, once with its real value, and again in all lowercase. You can then query the lowercased version for case-insensitive search (don't forget to also lowercase the query string).
This approach works (or is necessary) for many database systems, and it should perform better than regular expression based techniques (at least for prefix or exact matching).
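A minimal sketch of the "store it twice" idea on the C# side, assuming you control the document class: the FirstName setter also writes the lowercased copy, so every save persists both values. `Person`, `FirstNameLower` and `ToSearchKey` are hypothetical names. `FirstNameLower` keeps a public setter on the assumption that your serializer needs one to populate it when reading documents back. With the driver you would then index FirstNameLower and filter on it with a plain equality, using the lowercased search string.

```csharp
using System;

// Hypothetical document class: FirstNameLower is the extra, lowercased copy
// that gets indexed and queried with a plain equality ($eq) filter.
public class Person
{
    private string _firstName;

    public string FirstName
    {
        get => _firstName;
        set
        {
            _firstName = value;
            // Keep the search copy in sync whenever the real value changes.
            FirstNameLower = value?.ToLowerInvariant();
        }
    }

    public string FirstNameLower { get; set; }

    // Lowercase the query string the same way before comparing.
    public static string ToSearchKey(string query) => query?.ToLowerInvariant();
}
```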
As i3arnon answered, you can use AsQueryable() to do a case-insensitive comparison or search. What I found, though, is that I could not use the string.Equals() method, because it is not supported. If you need an exact comparison, Contains() is unfortunately not suitable, which kept me struggling to find a solution for quite some time.
For anyone wanting to do a string comparison, simply use == instead of .Equals().
Code:
var searchName = "hamster"; // the value you are comparing against
var names = namesCollection.AsQueryable().Where(name =>
name.FirstName.ToLower() == searchName.ToLower());
For MongoDB 3.4+, the recommended way is to use case-insensitive indexes (indexes created with a collation).
See https://jira.mongodb.org/browse/DOCS-11105?focusedCommentId=1859745&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-1859745
I am successfully performing case-insensitive searches by:
1. Creating an index with a Collation for a locale (e.g. "en") and a strength of 1 or 2. See https://docs.mongodb.com/manual/core/index-case-insensitive/ for further details.
2. Using the same Collation when performing searches on the MongoDB collection.
As an example:
Create a collation with strength 1 or 2 for case-insensitive matching:
private readonly Collation _caseInsensitiveCollation = new Collation("en", strength: CollationStrength.Primary);
Create an index. In my case I index several fields:
private void CreateIndex()
{
var indexOptions = new CreateIndexOptions {Collation = _caseInsensitiveCollation};
var indexDefinition
= Builders<MyDto>.IndexKeys.Combine(
Builders<MyDto>.IndexKeys.Ascending(x => x.Foo),
Builders<MyDto>.IndexKeys.Ascending(x => x.Bar));
_myCollection.Indexes.CreateOne(indexDefinition, indexOptions);
}
When querying make sure you use the same Collation:
public IEnumerable<MyDto> GetItems()
{
// GetQueryFilter() stands for your own FilterDefinition<MyDto> builder
var anyFilter = GetQueryFilter();
var anySort = Builders<MyDto>.Sort.Descending(x => x.StartsOn);
// Same collation as the index, so the index can be used
var findOptions = new FindOptions {Collation = _caseInsensitiveCollation};
var result = _myCollection
.Find(anyFilter, findOptions)
.Sort(anySort)
.ToList();
return result;
}
You can also use the driver's built-in filter builders, which may make it easier to combine the filter with other driver methods such as Sort.
var filter = Builders<Model>.Filter.Where(p => p.PropertyName.ToLower().Contains(s.ToLower()));
var list = collection.Find(filter).Sort(mySort).ToList();
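The easiest way for MongoDB 3.4+ is to use one of the ICU comparison levels through a collation. CollationStrength.Primary ignores both case and diacritics; CollationStrength.Secondary ignores only case: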
return await Collection()
.Find(filter, new FindOptions { Collation = new Collation("en", strength: CollationStrength.Primary) })
.ToListAsync();
More info https://docs.mongodb.com/manual/reference/method/cursor.collation/index.html
In case anyone else is wondering: using the fluent-mongo add-on, you can query with LINQ like this:
public User FindByEmail(Email email)
{
return session.GetCollection<User>().AsQueryable()
.Where(u => u.EmailAddress.ToLower() == email.Address.ToLower()).FirstOrDefault();
}
This results in a correct JS query. Unfortunately, String.Equals() isn't supported yet.
Another way to do it is to use the MongoDB.Bson.BsonJavaScript class, as shown below:
store.FindAs<Property>(Query.Where(BsonJavaScript.Create(string.Format("this.City.toLowerCase().indexOf('{0}') >= 0", filter.City.ToLower()))));
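The snippet above splices the search value into the JavaScript source with string.Format, so a city name containing a quote or backslash would break the generated expression. One way to harden it, sketched here with System.Text.Json (available from .NET Core 3.0), is to serialize the value as a JSON string literal, which is also a valid JavaScript string literal. `WhereBuilder` and `BuildCityWhere` are hypothetical names; the result would be passed to BsonJavaScript in place of the string.Format call.

```csharp
using System.Text.Json;

public static class WhereBuilder
{
    // Hypothetical helper: builds the same predicate as the snippet above,
    // but serializes the value as a JSON string literal so quotes and
    // backslashes in the input cannot escape the JavaScript string.
    public static string BuildCityWhere(string city)
    {
        string literal = JsonSerializer.Serialize(city.ToLowerInvariant());
        return "this.City.toLowerCase().indexOf(" + literal + ") >= 0";
    }
}
```

With the default encoder, characters such as `'` and `"` are emitted as `\uXXXX` escapes, so they stay inside the string literal.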
This is an exact, case-insensitive match:
{ "FieldName" : /^keywordHere$/i }
