In Luke, the following search expression returns 23 results:
docurl:www.siteurl.com docfile:Tomatoes*
If I pass this same expression into my C# Lucene.NET app with the following implementation:
IndexReader reader = IndexReader.Open(indexName);
Searcher searcher = new IndexSearcher(reader);
try
{
QueryParser parser = new QueryParser("docurl", new StandardAnalyzer());
BooleanQuery bquery = new BooleanQuery();
Query parsedQuery = parser.Parse(query);
bquery.Add(parsedQuery, Lucene.Net.Search.BooleanClause.Occur.MUST);
int _max = searcher.MaxDoc();
BooleanQuery.SetMaxClauseCount(Int32.MaxValue);
TopDocs hits = searcher.Search(parsedQuery, _max);
...
}
I get 0 results.
Luke is using StandardAnalyzer and this is what the Explain Structure window looks like:
Must I manually create BooleanClause objects for each field I search on, specifying Should for each one then add them to the BooleanQuery object with .Add()? I thought the QueryParser would do this for me. What am I missing?
Edit:
Simplifying a tad, docfile:Tomatoes* returns 23 docs in Luke, yet 0 in my app. Per Gene's suggestion, I've changed from MUST to SHOULD:
QueryParser parser = new QueryParser("docurl", new StandardAnalyzer());
BooleanQuery bquery = new BooleanQuery();
Query parsedQuery = parser.Parse(query);
bquery.Add(parsedQuery, Lucene.Net.Search.BooleanClause.Occur.SHOULD);
int _max = searcher.MaxDoc();
BooleanQuery.SetMaxClauseCount(Int32.MaxValue);
TopDocs hits = searcher.Search(parsedQuery, _max);
parsedQuery is simply docfile:tomatoes*
Edit2:
I think I've finally gotten to the root problem:
QueryParser parser = new QueryParser("docurl", new StandardAnalyzer());
Query parsedQuery = parser.Parse(query);
In the second line, query is "docfile:Tomatoes*", but parsedQuery is {docfile:tomatoes*}. Notice the difference? Lower case 't' in the parsed query. I never noticed this before. If I change the value in the IDE to 'T', 23 results return.
I've verified that StandardAnalyzer is being used when indexing and reading the index. How do I force queryParser to keep the case of the value of query?
Edit3:
Wow, how frustrating. According to the documentation, I can accomplish this with:
parser.setLowercaseExpandedTerms(false);
Whether terms of wildcard, prefix, fuzzy and range queries are to be automatically lower-cased or not. Default is true.
I won't argue whether that's a sensible default or not. I suppose SimpleAnalyzer should have been used to lowercase everything in and out of the index. The frustrating part is, at least with the version I'm using, Luke defaults the other way! At least I learned a bit more about Lucene.
Using Occur.MUST is equivalent to using the + operator with the standard query parser. Thus your code is evaluating +docurl:www.siteurl.com +docfile:Tomatoes* rather than the expression you typed into Luke. To get that behavior, try Occur.SHOULD when adding your clauses.
QueryParser will indeed take a query like "docurl:www.siteurl.com docfile:Tomatoes*" and build a proper query out of it (boolean query, range query, etc.) depending on the query given (see query syntax).
Your first step should be to attach a debugger and inspect the value and type of parsedQuery.
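The case mismatch found in Edit2 can be illustrated without Lucene at all. The following is a toy model only, assuming the index stored the terms with their original capitalisation; the class name, method name and sample terms are hypothetical, and this is not how Lucene implements prefix queries internally.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PrefixCaseDemo
{
    // Toy stand-in for a term dictionary lookup (NOT Lucene's implementation).
    // Counts the stored terms that start with the given prefix, case-sensitively.
    public static int CountPrefixMatches(IEnumerable<string> indexedTerms, string prefix)
    {
        return indexedTerms.Count(t => t.StartsWith(prefix, StringComparison.Ordinal));
    }
}
```

With hypothetical stored terms such as "Tomatoes.pdf" and "TomatoesAndBasil.doc", the prefix "Tomatoes" finds both, while the lower-cased prefix "tomatoes" finds none. That mirrors what happens when the parser lower-cases the wildcard term but the index holds mixed-case terms.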
I am using the MongoDB.Driver NuGet package in my MVC (C#) web application to communicate with a MongoDB database. Now I want to fetch data based on a specific column and its value. I used the code below to fetch the data.
var findValue = "John";
var clientTest1 = new MongoClient("mongodb://localhost:XXXXX");
var dbTest1 = clientTest1.GetDatabase("Temp_DB");
var empCollection = dbTest1.GetCollection<Employee>("Employee");
var builder1 = Builders<Employee>.Filter;
var filter1 = builder1.Empty;
var regexFilter = new BsonRegularExpression(findValue, "i");
filter1 = filter1 & builder1.Regex(x => x.FirstName, regexFilter);
filter1 = filter1 & builder1.Eq(x => x.IsDeleted,false);
var collectionObj = await empCollection.FindAsync(filter1);
var dorObj = collectionObj.FirstOrDefault();
But the above code behaves like a LIKE query.
In other words, it works like (select * from Employee where FirstName like '%John%'), which I don't want. I want to fetch only the records whose FirstName matches exactly (in this case, FirstName should equal John).
How can I do this? Can anyone offer suggestions?
Note: I used new BsonRegularExpression(findValue, "i") to make search case-insensitive.
Any help would be highly appreciated.
Thanks
I would recommend storing a normalized version of your data, and index/search upon that. It will likely be considerably faster than using regex. Sure, you'll eat up a little more storage space by including "john" alongside "John", but your data access will be faster since you would just be able to use a standard $eq query.
If you insist on regex, I recommend using ^ (start of line) and $ (end of line) around your search term. Remember, though, that you should escape your find value so that its contents aren't treated as regex syntax.
This should work:
string escapedFindValue = System.Text.RegularExpressions.Regex.Escape(findValue);
var regexFilter = new BsonRegularExpression(string.Format("^{0}$", escapedFindValue), "i");
Or if you're using a newer framework version, you can use string interpolation:
string escapedFindValue = System.Text.RegularExpressions.Regex.Escape(findValue);
var regexFilter = new BsonRegularExpression($"^{escapedFindValue}$", "i");
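The effect of anchoring and escaping can be checked locally before sending anything to the database. This is a minimal sketch that uses the .NET regex engine rather than MongoDB's, so it only illustrates the pattern construction; the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ExactMatchPattern
{
    // Builds an anchored pattern in which the search value is escaped,
    // so characters such as '.' are matched literally.
    public static string Build(string findValue)
    {
        return "^" + Regex.Escape(findValue) + "$";
    }

    // Local check with the .NET regex engine, ignoring case.
    public static bool IsExactMatch(string candidate, string findValue)
    {
        return Regex.IsMatch(candidate, Build(findValue), RegexOptions.IgnoreCase);
    }
}
```

Here IsExactMatch("john", "John") is true, while IsExactMatch("Johnny", "John") is false because of the anchors, and IsExactMatch("axb", "a.b") is false because the dot has been escaped.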
I found this code below in a file called Filter.cs in a project created with Microsoft App Studio. Although I am a veteran C# programmer, I'm short on experience with LINQ predicate expression builders. I can tell that the code below it is "meta-logic" for flexibly building a query given a list of filter predicates containing type field info and a set of data values to inject into the sub-expressions. What I can't figure out is how the "expression" variable in the following statement:
query = query.Where(expression).AsQueryable();
.. is concatenating the per-field expressions into a more complex query expression that is finally executed at the end of the code to create the ObservableCollection result. If it was "query +=" I could infer a chaining action like an Event handler field, but as a straight assignment statement it baffles me since I would expect it to replace the last value the expression variable got from the last loop iteration, thereby resetting it in the process and losing its previous value(s). What is going on here?
public class Filter<T>
{
public static ObservableCollection<T> FilterCollection(
FilterSpecification filter, IEnumerable<T> data)
{
IQueryable<T> query = data.AsQueryable();
foreach (var predicate in filter.Predicates)
{
Func<T, bool> expression;
var predicateAux = predicate;
switch (predicate.Operator)
{
case ColumnOperatorEnum.Contains:
expression = x => predicateAux.GetFieldValue(x).ToLower().Contains(predicateAux.Value.ToString().ToLower());
break;
case ColumnOperatorEnum.StartsWith:
expression = x => predicateAux.GetFieldValue(x).ToLower().StartsWith(predicateAux.Value.ToString().ToLower());
break;
case ColumnOperatorEnum.GreaterThan:
expression = x => String.Compare(predicateAux.GetFieldValue(x).ToLower(), predicateAux.Value.ToString().ToLower(), StringComparison.Ordinal) > 0;
break;
case ColumnOperatorEnum.LessThan:
expression = x => String.Compare(predicateAux.GetFieldValue(x).ToLower(), predicateAux.Value.ToString().ToLower(), StringComparison.Ordinal) < 0;
break;
case ColumnOperatorEnum.NotEquals:
expression = x => !predicateAux.GetFieldValue(x).Equals(predicateAux.Value.ToString(), StringComparison.InvariantCultureIgnoreCase);
break;
default:
expression = x => predicateAux.GetFieldValue(x).Equals(predicateAux.Value.ToString(), StringComparison.InvariantCultureIgnoreCase);
break;
}
// Why doesn't this assignment wipe out the expression function value from the last loop iteration?
query = query.Where(expression).AsQueryable();
}
return new ObservableCollection<T>(query);
}
}
My understanding is that you are having trouble understanding why this line executed in a loop
query = query.Where(expression).AsQueryable();
produces an effect similar to "concatenation" of expressions. A short answer is that it is similar to why
str = str + suffix;
produces a longer string, even though it is an assignment.
A longer answer is that the loop is building an expression one predicate at a time, and appends a Where to the sequence of conditions. Even though it is an assignment, it is built from the previous state of the object, so the previous expression is not "lost", because it is used as a base of a bigger, more complex, filter expression.
To understand it better, imagine that the individual expressions produced by the switch statement are placed into an array of Func<T, bool> delegates called parts, instead of being applied to query straight away. Once the array of parts is built, you would be able to do this:
var query = data.AsQueryable()
.Where(parts[0]).AsQueryable()
.Where(parts[1]).AsQueryable()
...
.Where(parts[N]).AsQueryable();
Now observe that each parts[i] is used only once; after that, it is no longer needed. That is why you can build the chain of expressions incrementally in a loop: after the first iteration, query contains a chain that includes the first term; after the second iteration, it contains the first two terms, and so on.
It doesn't "wipe it out" since it is chaining. It's handling it by assigning back to query. It's effectively like writing:
var queryTmp = query;
query = queryTmp.Where(expression).AsQueryable();
Each time you call .Where(expression).AsQueryable(), a new IQueryable<T> is being returned, and set to query. This IQueryable<T> is the result of the last .Where call. This means you effectively get a query that looks like:
query.Where(expression1).AsQueryable().Where(expression2).AsQueryable()...
The code essentially generates a sequence of Where/AsQueryable calls. I'm not sure why you expect each loop iteration to append to a single expression.
Essentially, the result is
query = query
.Where(expression0).AsQueryable()
.Where(expression1).AsQueryable()
.Where(expression2).AsQueryable()
whereas I think you expected something more like
query = query
.Where(v => expression0(v) && expression1(v) && expression2(v) ...).AsQueryable()
The query variable name is a bit misleading. This code doesn't build up one long filter in the expression variable and then run it against the data set. Instead, each iteration wraps the previous query in one more filter, so the data passes through the filters one after another. Because LINQ execution is deferred, nothing is actually evaluated until the ObservableCollection constructor enumerates query.
So this line:
query = query.Where(expression).AsQueryable();
is layering the filter on top of whatever query already represents, and then saving the new (filtered) sequence back into the variable. The expression variable gets a new value each time through the loop, but that doesn't matter, because the Where call has already captured the previous delegate.
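The chaining described in these answers can be reproduced with plain integers. The following is a minimal sketch, not the App Studio code: the class and method names are hypothetical, and it simply compares the loop-and-reassign pattern with a single combined predicate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WhereChainDemo
{
    // Same shape as the loop in the question: reassign query on every iteration.
    public static List<int> FilterChained(IEnumerable<int> data, IEnumerable<Func<int, bool>> predicates)
    {
        IQueryable<int> query = data.AsQueryable();
        foreach (var predicate in predicates)
        {
            // The right-hand side is evaluated first and wraps the previous query;
            // only then is the variable replaced by the longer chain.
            query = query.Where(predicate).AsQueryable();
        }
        // Deferred execution: the filters actually run here, during enumeration.
        return query.ToList();
    }

    // Equivalent single-pass form: keep an item only if every predicate accepts it.
    public static List<int> FilterCombined(IEnumerable<int> data, IEnumerable<Func<int, bool>> predicates)
    {
        var all = predicates.ToList();
        return data.Where(x => all.All(p => p(x))).ToList();
    }
}
```

For the numbers 1 to 10 with the predicates x > 3 and x % 2 == 0, both methods return 4, 6, 8, 10, which shows that no earlier predicate is lost when query is reassigned.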
Is there any way to use the LINQ dynamic query library (System.Linq.Dynamic) to evaluate a condition based on the properties of an ExpandoObject? The following code throws an exception on the "var e..." line, saying "No property or field 'Weight' exists in type ExpandoObject":-
const string TestCondition = "MyStateBag.Foo >= 50 && MyStateBag.Bar >= 100";
dynamic myStateBag = new ExpandoObject();
myStateBag.Foo = 70;
myStateBag.Bar = 100;
var p = Expression.Parameter(typeof(ExpandoObject), "MyStateBag");
var e = DynamicExpression.ParseLambda(new[] { p }, null, TestCondition);
var result = e.Compile().DynamicInvoke(myStateBag);
Assert.IsTrue((bool)result);
The alternative would be to implement the "statebag" as a dictionary, but this will result in a slightly more verbose condition string, e.g. MyStateBag["Foo"] >= 50 && MyStateBag["Bar"] >= 100. As this is going to be used as the basis of a user scripting environment, I would prefer the simpler ExpandoObject syntax if it's possible to achieve.
Not directly. The dynamic LINQ library boils down to an expression-tree, and expression trees do not directly support dynamic. Most likely, the dynamic query library is using Expression.PropertyOrField to handle .Foo etc, and that will not work with dynamic.
You could perhaps write a custom expression parser that replaces this with lots of lookup code if it finds the parameter is a dictionary; not fun, though.
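One detail that may help with the lookup approach: ExpandoObject itself implements IDictionary<string, object>, so its members can be read by name without going through the dynamic binder. The sketch below hard-codes the condition from the question purely for illustration; it does not parse the condition string, and the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Dynamic;

public static class StateBagDemo
{
    // Evaluates "Foo >= 50 && Bar >= 100" against an ExpandoObject by
    // treating it as the dictionary it already is.
    public static bool Evaluate(ExpandoObject stateBag)
    {
        var values = (IDictionary<string, object>)stateBag;
        // Assumes both members exist and hold boxed ints, as in the question.
        return (int)values["Foo"] >= 50 && (int)values["Bar"] >= 100;
    }
}
```

A state bag populated through dynamic (myStateBag.Foo = 70; myStateBag.Bar = 100;) can be passed in after casting it back to ExpandoObject. A custom parser could emit the same kind of dictionary lookups whenever it sees a member access on the state bag parameter.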
I am querying a Lucene index via NHibernate.Search using the code below:
var fts = NHibernate.Search.Search.CreateFullTextSession(this._session);
var luceneQuery = "Search:name~0.7 AND Moderated:true NOT PlaceType:WrongType";
var places = fts.CreateFullTextQuery<Place>(luceneQuery)
.List<Place>();
The problem is that the query returns all types of Places, including WrongType. When I run the same query against the same index in Luke, everything is fine: Places of type WrongType are not returned.
The Search field is a concatenation of many fields in the Place object. I am using the Moderated and PlaceType fields to filter out some records because I have discovered that this preserves the original sort order (by score) from the Lucene query.
How can I exclude Places by PlaceType from results using NHibernate.Search?
OK, so I have found a solution.
I have indexed all fields using WhitespaceAnalyzer. It seems that NHibernate.Search uses StandardAnalyzer by default, regardless of the fact that I have set the global AnalyzerClass to WhitespaceAnalyzer. After parsing, the query looked like this:
"+Search:name~0.7 +Moderated:true -PlaceType:wrongtype"
which didn't work, because values in PlaceType field were not lowercased.
Changing the code in the question to something like this:
var fts = NHibernate.Search.Search.CreateFullTextSession(this._session);
var queryParser = new QueryParser("text", new WhitespaceAnalyzer());
var luceneQuery = "Search:name~0.7 AND Moderated:true NOT PlaceType:WrongType";
var query = queryParser.Parse(luceneQuery);
var places = fts.CreateFullTextQuery(query, typeof(Place))
.List<Place>();
solved the situation.
I am using MongoDB and the C# driver for MongoDB.
I recently discovered that all queries in MongoDB are case-sensitive. How can I make a case-insensitive search?
I found one way to do this:
Query.Matches(
"FirstName",
BsonRegularExpression.Create(new Regex(searchKey,RegexOptions.IgnoreCase)));
The simplest and safest way to do that is using Linq:
var names = namesCollection.AsQueryable().Where(name =>
name.FirstName.ToLower().Contains("hamster"));
As explained in the tutorial, ToLower, ToLowerInvariant, ToUpper and ToUpperInvariant all perform matches in a case-insensitive way. After that you can use all the supported string methods, like Contains or StartsWith.
This example will generate:
{
"FirstName" : /hamster/is
}
The i option makes it case insensitive.
I've just implemented this in a much simpler way than any of the other suggestions. However, I realise that, given the age of this question, this functionality may not have been available at the time.
Use the options of the BsonRegularExpression constructor to pass in case insensitivity. I just had a look at the source code and found that 'i' is all you need. For example:
var regexFilter = Regex.Escape(filter);
var bsonRegex = new BsonRegularExpression(regexFilter, "i");
Query.Matches("MyField", bsonRegex);
You shouldn't have to keep records twice for searching.
Try something like this:
Query.Matches("FieldName", BsonRegularExpression.Create(new Regex(searchKey, RegexOptions.IgnoreCase)))
You will probably have to store the field twice, once with its real value, and again in all lowercase. You can then query the lowercased version for case-insensitive search (don't forget to also lowercase the query string).
This approach works (or is necessary) for many database systems, and it should perform better than regular expression based techniques (at least for prefix or exact matching).
As i3arnon answered, you can use Queryable to do a case-insensitive comparison/search. What I found was that I could not use the string.Equals() method, because it is not supported. If you need an exact comparison, Contains() is unfortunately not suitable, which kept me struggling for a solution for quite some time.
For anyone wanting to do a string comparison, simply use == instead of .Equals().
Code:
// name is the search string; n is the document being tested
var names = namesCollection.AsQueryable().Where(n =>
n.FirstName.ToLower() == name.ToLower());
For MongoDB 3.4+ the recommended way is to use indexes.
See https://jira.mongodb.org/browse/DOCS-11105?focusedCommentId=1859745&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-1859745
I am successfully performing case-insensitive searches by:
1. Creating an index with Collation for a locale (e.g. "en") and with a strength of 1 or 2. See https://docs.mongodb.com/manual/core/index-case-insensitive/ for further details.
2. Using the same Collation when performing searches on the MongoDB collection.
As an example:
Create a collation with strength 1 or 2 for case insensitivity:
private readonly Collation _caseInsensitiveCollation = new Collation("en", strength: CollationStrength.Primary);
Create an index. In my case I index several fields:
private void CreateIndex()
{
var indexOptions = new CreateIndexOptions {Collation = _caseInsensitiveCollation};
var indexDefinition
= Builders<MyDto>.IndexKeys.Combine(
Builders<MyDto>.IndexKeys.Ascending(x => x.Foo),
Builders<MyDto>.IndexKeys.Ascending(x => x.Bar));
_myCollection.Indexes.CreateOne(indexDefinition, indexOptions);
}
When querying make sure you use the same Collation:
public IEnumerable<MyDto> GetItems()
{
var anyFilter = GetQueryFilter();
var anySort = Builders<MyDto>.Sort.Descending(x => x.StartsOn);
// Must be the same collation the index was created with
var findOptions = new FindOptions {Collation = _caseInsensitiveCollation};
var result = _myCollection
.Find(anyFilter, findOptions)
.Sort(anySort)
.ToList();
return result;
}
You can also use MongoDB's built-in filters. This may make it easier to use some of Mongo's other methods.
var filter = Builders<Model>.Filter.Where(p => p.PropertyName.ToLower().Contains(s.ToLower()));
var list = collection.Find(filter).Sort(mySort).ToList();
The easiest way for MongoDB 3.4+ is to use one of the ICU comparison levels:
return await Collection()
.Find(filter, new FindOptions { Collation = new Collation("en", strength: CollationStrength.Primary) })
.ToListAsync();
More info https://docs.mongodb.com/manual/reference/method/cursor.collation/index.html
In case anyone else is wondering: using the fluent-mongo add-on, you can use Linq to query like this:
public User FindByEmail(Email email)
{
return session.GetCollection<User>().AsQueryable()
.Where(u => u.EmailAddress.ToLower() == email.Address.ToLower()).FirstOrDefault();
}
This results in a correct JS query. Unfortunately, String.Equals() isn't supported yet.
A way to do it is to use the MongoDB.Bson.BsonJavaScript class as shown below
store.FindAs<Property>(Query.Where(BsonJavaScript.Create(string.Format("this.City.toLowerCase().indexOf('{0}') >= 0", filter.City.ToLower()))));
This is an exact text search and is case insensitive (see this link).
{ "FieldName" : /^keywordHere$/i }