How do I properly roll up and/or combinations in NEST? - C#

Background
What I'm Trying to Do
I have a list of vehicles.
I have an API (WebAPI v2) that takes in a list of make-and-model filters.
A filter consists of 1 make and 0 or more models (e.g. "Honda" and ["Civic", "Accord"]).
If a filter is passed in with a make and no models, I want it to match all models for that make.
If a filter is passed in with a make and models, I want it to match only those models for that make.
The Filter Object I'm using
public class MakeModelFilter : IMakeModelFilter
{
    public string Make { get; set; }
    public List<string> Models { get; set; }
}
What the entire API Call Looks Like
{
  "MakeModelFilters": [
    {"Make": "BMW", "Models": ["X3", "X5"]}
  ],
  "TypeFilter": [],
  "GenericColorFilter": [],
  "FeaturesFilter": [],
  "MaxMileage": 100000,
  "PriceRange": {"Min": 1, "Max": 1000000},
  "SearchText": ""
}
The portion I'm concerned with is the MakeModelFilters list (the rest currently works as designed).
How I'm currently obtaining search results:
var vehicles = _esClient.Search<Vehicle>(s => s
    .From(0).Size(10000)
    .Query(q => q
        .Filtered(fq => fq
            .Filter(ff => ff
                .Bool(b => b
                    .Must(m => m.And(
                        m.Or(makeModelFilterList.ToArray()),
                        m.Or(featureFilters.ToArray()),
                        m.Or(typeFilters.ToArray()),
                        priceRangeFilter,
                        mileageFilter))
                )
            )
            .Query(qq => qq
                .QueryString(qs => qs.Query(criteria.SearchText))
            )
        )
    )
);
The Problem
No matter how I structure the filter, it seems to filter out all documents -- not in our best interest. :) Something in my boolean logic is wrong.
Where I think the problem lies
The list of make and model filters that I OR together is generated by this method:
private List<FilterContainer> GenerateMakeModelFilter(List<MakeModelFilter> makeModelFilters)
{
    var filterList = new List<FilterContainer>();
    foreach (var filter in makeModelFilters)
    {
        filterList.Add(GenerateMakeModelFilter(filter));
    }
    return filterList;
}
This method calls the individual method to generate a bool for each make/model filter I have.
What I think the problem method is
The below method, as far as I'm aware, does the following:
If no make is passed in, throw an exception.
If only a make is passed in, return a bool for only that make.
If a make and models are passed in, return a bool of the make filter + an or of all the model terms, e.g. Make:BMW AND (model:X3 OR model:X5).
Code is below:
private FilterContainer GenerateMakeModelFilter(MakeModelFilter makeModelFilter)
{
    if (string.IsNullOrWhiteSpace(makeModelFilter.Make)) { throw new ArgumentNullException(nameof(makeModelFilter)); }
    var makeFilter = new TermFilter { Field = Property.Path<Vehicle>(it => it.Make), Value = makeModelFilter.Make };
    var boolMake = new BoolFilter { Must = new List<FilterContainer> { makeFilter } };
    var modelFilters = GenerateFilterList(Property.Path<Vehicle>(it => it.Model), makeModelFilter.Models);
    if (!modelFilters.Any())
    {
        // If it has a make but no model, generate a bool filter for the make only.
        return boolMake;
    }
    var orModels = new OrFilter { Filters = modelFilters };
    var boolModels = new BoolFilter { Must = new List<FilterContainer> { orModels } };
    var boolMakeAndModels = new AndFilter { Filters = new List<FilterContainer> { boolMake, boolModels } };
    return new BoolFilter { Must = new List<FilterContainer> { boolMakeAndModels } };
}
FYI, GenerateFilterList just creates a list of Term filters and returns the list.
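For completeness, a minimal sketch of what that helper might look like in NEST 1.x — the exact signature and body are assumptions, since the method isn't shown in the question:

```csharp
// Hypothetical sketch of GenerateFilterList as described above: one TermFilter
// per value, returned as a list. PropertyPathMarker/TermFilter are NEST 1.x types.
private List<FilterContainer> GenerateFilterList(PropertyPathMarker path, List<string> values)
{
    var filters = new List<FilterContainer>();
    if (values == null) return filters;
    foreach (var value in values)
    {
        filters.Add(new TermFilter { Field = path, Value = value.ToLowerInvariant() });
    }
    return filters;
}
```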
FYI: Generated ElasticSearch JSON
This might be a clue to where I'm going wrong (though it's huge). I've just been staring at it so long that I can't see it anymore, I think.
{
  "from": 0,
  "size": 10000,
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "and": {
                "filters": [
                  {
                    "or": {
                      "filters": [
                        {
                          "bool": {
                            "must": [
                              {
                                "and": {
                                  "filters": [
                                    {
                                      "bool": {
                                        "must": [
                                          { "term": { "make": "BMW" } }
                                        ]
                                      }
                                    },
                                    {
                                      "bool": {
                                        "must": [
                                          {
                                            "or": {
                                              "filters": [
                                                { "term": { "model": "x3" } },
                                                { "term": { "model": "x5" } }
                                              ]
                                            }
                                          }
                                        ]
                                      }
                                    }
                                  ]
                                }
                              }
                            ]
                          }
                        }
                      ]
                    }
                  },
                  { },
                  { },
                  {
                    "range": {
                      "sellingPriceUSD": { "lte": "1000000", "gte": "1" }
                    }
                  },
                  {
                    "range": {
                      "miles": { "lte": "100000" }
                    }
                  }
                ]
              }
            }
          ]
        }
      }
    }
  }
}
Refactor 1: Move more Towards Bitwise operations
Per Martijn's answer and Zachary's post that he references, I've updated my GenerateFilterList to return a single combined FilterContainer:
private FilterContainer GenerateFilterList(PropertyPathMarker path, List<string> filter)
{
    if (filter == null || filter.Count <= 0) { return null; }
    FilterContainer returnFilter = null;
    foreach (var aFilter in filter)
    {
        returnFilter |= new TermFilter { Field = path, Value = aFilter.ToLowerInvariant() };
    }
    return returnFilter;
}
And then for my GenerateMakeModelFilter, I perform an "and" against the model filters, which are already combined with a bitwise "or" by the code above:
private FilterContainer GenerateMakeModelFilter(MakeModelFilter makeModelFilter)
{
    if (string.IsNullOrWhiteSpace(makeModelFilter.Make)) { throw new ArgumentNullException(nameof(makeModelFilter)); }
    var makeFilter = new TermFilter { Field = Property.Path<Vehicle>(it => it.Make), Value = makeModelFilter.Make };
    var modelFilters = GenerateFilterList(Property.Path<Vehicle>(it => it.Model), makeModelFilter.Models);
    return makeFilter && modelFilters;
}
This shortens the part that retrieves the query:
QueryContainer textQuery = new QueryStringQuery() { Query = criteria.SearchText };
FilterContainer boolFilter = makeModelFilter || featureFilter || typeFilter || priceRangeFilter || mileageFilter;
var vehicles = _esClient.Search<Vehicle>(s => s
    .From(0).Size(10000) // TODO: Extract this into a constant or setting in case the inventory grows to 10k+. This prevents it from paging.
    .Query(q => q
        .Filtered(fq => fq
            .Filter(filter => filter.Bool(bf => bf.Must(boolFilter)))
            .Query(qq => textQuery)
        )
    )
);
return vehicles.Documents.ToList<IVehicle>();
...but I still have no documents returned. What the heck am I missing? If I have a make of "Honda" with models of "Civic" and "Accord", and a make of "BMW" with no models, I should receive all vehicles matching (Honda + Civic) || (Honda + Accord) || (BMW + any model). I'll keep at it.

And, or, & not filters might not be doing what you want. They are a special filter construct that performs better when combining filters that do not operate on bitsets. A must-read on this topic:
https://www.elastic.co/blog/all-about-elasticsearch-filter-bitsets
Knowing when to use and/or/not filters vs bool filters can be quite confusing, and with Elasticsearch 2.0 you can use the bool filter in ALL contexts and it will know how to best execute the filters/queries in its clauses. No more need for you to hint!
Furthermore, although the bool filter/query is named bool, it does a unary bool, whereas you might expect it to be a binary bool.
This is why the bool clauses are must/should/must_not vs and/or/not.
In NEST, if you use the &&, ||, and ! operators combined with parentheses, we will compose one or many bool queries so that it acts in the binary bool fashion you write down in C#.
e.g:
.Query(q =>
    (q.Term("language", "php")
     && !q.Term("name", "Elastica"))
    || q.Term("name", "NEST")
)
If you need a more dynamic list you can use the assignment operators |= and &=:
private FilterContainer GenerateMakeModelFilter(List<MakeModelFilter> makeModelFilters)
{
    FilterContainer filter = null;
    foreach (var makeModelFilter in makeModelFilters)
    {
        filter |= GenerateMakeModelFilter(makeModelFilter);
    }
    return filter;
}
Similarly, if you refactor GenerateMakeModelFilter to take advantage of the C# boolean operator overloads, you'll end up with a query that is easier to read and debug, both in terms of the C# and the query that gets sent to Elasticsearch.
Our documentation goes into it in some more detail http://nest.azurewebsites.net/nest/writing-queries.html
UPDATE
Awesome refactor! Now we can focus on mappings in Elasticsearch. When you index a JSON property, it goes through an analysis chain that takes the single string and tries to make 1 or more terms out of it; those terms are what get stored in Lucene's inverted index.
By default Elasticsearch will analyze all string fields using the standard analyzer.
In your case BMW will go through the standard analyzer, which splits on whitespace (Unicode Standard Annex #29, to be exact) and lowercases it.
So the term in the inverted index is bmw. In Elasticsearch some queries are also analyzed at query time, so e.g. a match query for BMW is also analyzed and transformed to bmw before consulting the inverted index, and thus will find documents no matter the casing of BMW at query time.
The term query/filter that you are using is not analyzed at query time, so it will try to find BMW in the inverted index, where the inverted index only has bmw. This is great if you only want exact term matches. If you set up your mapping so that a field is not analyzed, you could for instance do exact matches on New York without worrying that it's actually stored as two separate terms, new and york, and inadvertently also getting results from New New York.
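To illustrate, a hedged NEST 1.x mapping sketch that stores the make and model fields not_analyzed, so the term filters above compare against the original values. The Vehicle property names come from the question; treat the rest as an assumption:

```csharp
// Map Make and Model as not_analyzed so the inverted index stores "BMW",
// not the standard-analyzed "bmw", and term filters match the raw value.
_esClient.Map<Vehicle>(m => m
    .Properties(props => props
        .String(s => s.Name(v => v.Make).NotAnalyzed())
        .String(s => s.Name(v => v.Model).NotAnalyzed())));
```

Alternatively, keep the default mapping and lowercase the values in your TermFilter, which the refactored GenerateFilterList already does with ToLowerInvariant.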

Related

translating mongo query to C# by using Filter

Is there any way to use Filters in C# and translate this mongo query?
{'EmailData.Attachments.Files': {$all: [{Name: 'a.txt'},{Name: 'b.txt'},{Name:'c.txt'}], $size: 3}}
my data model is like:
{
  "_id": ObjectId("5f0a9c07b001406068c073c1"),
  "EmailData": [
    {
      "Attachments": {
        "Files": [
          { "Name": "a.txt" },
          { "Name": "b.txt" },
          { "Name": "c.txt" }
        ]
      }
    }
  ]
}
I have something like this in my mind:
var Filter =
Builders<EmailEntity>.Filter.All(s => s.EmailData????);
or something like:
var Filter =
Builders<EmailEntity>.Filter.ElemMatch(s => s.EmailData???)
I was wondering is there any way in the above filter to use All inside ElemMatch?
The difficulty here is that EmailData.Attachments.Files is an array within another array, so the C# compiler will get lost when you try to use Expression Trees.
Thankfully there's another approach when you need to define a field using MongoDB .NET driver. You can take advantage of StringFieldDefinition<T> class.
Try:
var files = new[] { new FileData() { Name = "a.txt" }, new FileData() { Name = "b.txt" }, new FileData() { Name = "c.txt" } };
FieldDefinition<EmailEntity> fieldDef = new StringFieldDefinition<EmailEntity>("EmailData.Attachments.Files");
var filter = Builders<EmailEntity>.Filter.And(
    Builders<EmailEntity>.Filter.All(fieldDef, files),
    Builders<EmailEntity>.Filter.Size(fieldDef, 3));
var result = collection.Find(filter).ToList();

Get value in Nested JSON/JTOKEN using LINQ

I am new to LINQ queries and would like to know if what I am trying to achieve is possible via a LINQ query.
So, I have a JSON doc as below.
I am trying to get all the values that match "$type" and return the path and the value for each $type.
I know an iterative way of doing this, but it seems LINQ is preferred and is supposed to make this easy.
{
  "$type": "type1",
  "title": "US version",
  "_object1": [
    {
      "$type": "type2",
      "rootModule": {
        "id": "page",
        "modules": [
          { "id": "header", "$type": "module-header" },
          { "id": "footer", "$type": "module-footer" }
        ]
      }
    },
    { "$type": "type2", "_id": "ab134" },
    { "$type": "type3", "_id": "ab567" }
  ],
  "_object2": [
    { "$type": "module1", "constraintsId": "page" },
    {
      "name": "header1 1",
      "nestedobject": {
        "$type": "nestedobject-type",
        "dataBinder": { "id": "ab244" }
      }
    }
  ]
}
Thanks guys,
I was able to get the list as below:
var root = (JContainer)JToken.FromObject(document, CommonSerializerSetting.GetCommonSerializer());
var descendant = "$type";
var query = root
    // Recursively descend the JSON hierarchy
    .DescendantsAndSelf()
    // Select all properties named descendant
    .OfType<JProperty>()
    .Where(p => p.Name == descendant)
    // Select their value
    .Select(p => p.Value);

MongoDB C# Driver - Return last modified rows only

The data:
The collection contains a list of audit records, and I want to return the last modified items from the collection.
For example, the query needs to return Audit 1235 and 1237 only.
The following statement works in the Mongo shell and returns the data sub-millisecond; I just need to also figure out how to return the entire collection item instead of just the Id.
db.Forms.aggregate(
{ $group: { _id: "$Id", lastModifiedId: { $last: "$_id" } } }
)
However, I need to convert this to the C# Driver's syntax.
I have the following at the moment, but it's not working and returns (for lack of a better term) weird data.
var results = collection.Aggregate()
    .Group(new BsonDocument { { "_id", "$Id" }, { "lastModifiedId", new BsonDocument("$last", "_id") } })
    .ToListAsync().Result.ToList();
My current solution gets the full collection back and then runs it through an extension method to get the latest records (where list is the full collection):
var lastModifiedOnlyList =
from listItem in list.OrderByDescending(_ => _.AuditId)
group listItem by listItem.Id into grp
select grp.OrderByDescending(listItem => listItem.AuditId)
.FirstOrDefault();
While this code works, it is EXTREMELY slow because of the sheer amount of data that is being returned from the collection, so I need to do the grouping on the list as part of the collection get/find.
Please let me know if I can provide any additional information.
Update: With Axel's help I managed to get it resolved:
var pipeline = new[] { new BsonDocument { { "$group", new BsonDocument { { "_id", "$Id" }, { "LastAuditId", new BsonDocument { { "$last", "$_id" } } } } } } };
var lastAuditIds = collection.Aggregate<Audit>(pipeline).ToListAsync().Result.ToList().Select(_=>_.LastAuditId);
I moved that to its own method and then use the IDs to get the collection items back, with my projection working as well:
var forLastAuditIds = ForLastAuditIds(collection);
var limitedList = (
    projection != null
        ? collection.Find(forLastAuditIds & filter, new FindOptions()).Project(projection)
        : collection.Find(forLastAuditIds & filter, new FindOptions())
    ).ToListAsync().Result.ToList();
"filter" in this case is either an Expression or a BsonDocument. The performance is great as well - sub-second for the whole thing. Thanks for the help, Axel!
I think you're doing an extra OrderBy, this should do:
var lastModifiedOnlyList =
from listItem in list
group listItem by listItem.Id into grp
select grp.OrderByDescending(listItem => listItem.AuditId)
.FirstOrDefault();
EDIT:
To gain performance in the query, you could use the Aggregate function differently:
var match = new BsonDocument
{
    {
        "$group",
        new BsonDocument
        {
            { "_id", "$Id" },
            { "lastModifiedId", new BsonDocument
                {
                    { "$last", "$_id" }
                }
            }
        }
    }
};
var pipeline = new[] { match };
var result = collection.Aggregate(pipeline);
That should be the equivalent of your Mongo Shell query.

ElasticSearch combining MultiMatch with Must

So I have this object model:
string Name; // name of the person
int Age; // age of the person
string CreatedBy; // operator who created person
My query sounds like this: all documents WHERE Age > 40 AND CreatedBy == 'callum' AND Name contains 'll'
CreatedBy is necessary; it's a scope-of-control (security) restriction.
Age is also necessary (but isn't a security issue).
Name is where it can get fuzzy, because that is what the user is querying. Akin to a sort of "contains".
The query below works for the first two parts:
"query": {
  "bool": {
    "must": [
      {
        "range": {
          "age": { "gt": 40 }
        }
      },
      {
        "match": {
          "createdBy": "Callum"
        }
      }
    ]
  }
}
I tried adding a multi_match because ultimately it may be a search across Name, Address, and other bits of information, but I couldn't make sense of where to fit it in.
In my mind, nested queries would be useful: first filter out irrelevant users, then filter out irrelevant ages, then do some fuzzier matching on the relevant fields.
So, the answer to this isn't straightforward.
First of all you need to create an Analyser for Compound Words.
So in the .NET client it looks like:
this.elasticClient.CreateIndex("customer", p => p
    .Settings(s => s
        .Analysis(a => a
            .TokenFilters(t => t
                .NGram("bigrams_filter", ng => ng
                    .MaxGram(2)
                    .MinGram(2)))
            .Analyzers(al => al
                .Custom("bigrams", l => l
                    .Tokenizer("standard")
                    .Filters("lowercase", "bigrams_filter"))))));

this.elasticClient.Map<Person>(m => m
    .Properties(props => props
        .String(s => s
            .Name(p => p.Name)
            .Index(FieldIndexOption.Analyzed)
            .Analyzer("bigrams"))
        .String(s => s
            .Name(p => p.CreatedBy)
            .NotAnalyzed())
        .Number(n => n
            .Name(p => p.Age))));
Which is a sort of direct translation of the first link provided. This now means that all names will be broken into their bigram representation:
Callum
ca
al
ll
lu
um
Then you need the actual query to take advantage of this. Now this is the bit I like: because we've set up that index on the name field, all term queries can have partial words in them. Take this for example (Sense query):
GET customer/_search
{
  "query": {
    "filtered": {
      "query": {
        "multi_match": {
          "query": "ll",
          "fields": ["name"]
        }
      },
      "filter": {
        "bool": {
          "must": [
            {
              "range": {
                "age": { "gt": 40 }
              }
            },
            {
              "match": {
                "createdBy": "Callum"
              }
            }
          ]
        }
      }
    }
  }
}
Here, we have a filtered query. So the query is always the first to be run (I can't find documentation to cite for that yet, but I have read it), and this will be the partial-terms match. Then we simply filter -- which is done after the query -- to get the subset of results we need.
Because the ngrams analyser is only set on name, that is the only field that will be partially matched against. So CreatedBy won't be, and thus we get our security around the results.
Basically what you can do is put the query into two blocks:
"query": {
  "filtered": {
    "filter": {
      "bool": {
        "must": [
          {
            "range": {
              "age": { "gt": 40 }
            }
          }
        ]
      }
    },
    "query": {
      "bool": {
        "must": [
          {
            "multi_match": {
              "query": "ll",
              "fields": ["createdBy", "Address", "Name"],
              "fuzziness": 2
            }
          }
        ]
      }
    }
  }
}
What you can do is use conditions in the filter to filter things out, and then apply your multi-match query to the filtered data. The main reason I included age in the filter is that you don't need to perform any kind of free-text search on it; you just need to check against a static value. You can include more conditions within the must block of the filter.
You can also look into this article, which might give you some overview:
https://www.elastic.co/blog/found-optimizing-elasticsearch-searches
Hope it helps!

How can I determine which value occurs the most in my collection?

So, I have a JSON file that has a list of fruits. The fruits key can map to a single fruit or a collection of fruits.
E.g:
[
  {
    "fruits": [
      "banana"
    ]
  },
  {
    "fruits": [
      "apple"
    ]
  },
  {
    "fruits": [
      "orange",
      "apple"
    ]
  }
]
I was wondering, how can I determine which fruit(s) occur the most in my JSON structure? That is, how do I know how often a value occurs and which one is leading above the others?
Not sure if you're interested in having a class to deserialize into, but here's how you would do it. Feel free to skip the class and use dynamic deserialization. Note that the JSON is an array, so you deserialize to a list, and the Fruits property must be public for Json.NET to populate it:
class FruitCollection
{
    public string[] Fruits { get; set; }
}
var fruitColls = JsonConvert.DeserializeObject<List<FruitCollection>>(json);
var mostCommon = fruitColls
    .SelectMany(fc => fc.Fruits)
    .GroupBy(f => f)
    .OrderByDescending(g => g.Count())
    .First()
    .Key;
EDIT:
This question's pretty old, but I'll mention that the OrderByDescending, First thing is doing redundant work: you don't really need to sort to get the maximum. This is an age-old lazy hack that people keep doing because LINQ does not provide a nice MaxBy extension method.
Usually your input size is small enough and the other stuff adds enough overhead that you don't really care, but the "correct" way (e.g. if you had billions of fruit types) would be to use a proper MaxBy extension method or hack something out of Aggregate. Finding the max is worst-case linear, whereas sorting is worst case O(n log(n)).
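As a sketch of that point, here is a single-pass max-by-count over plain LINQ groups using Aggregate, with no sort (a minimal illustration, not part of the original answer):

```csharp
using System;
using System.Linq;

class Program
{
    static void Main()
    {
        var fruits = new[] { "orange", "apple", "banana", "apple" };

        // Group by value, then keep the largest group in one linear pass
        // over the groups -- no O(n log n) sort just to find the winner.
        var mostCommon = fruits
            .GroupBy(f => f)
            .Aggregate((best, g) => g.Count() > best.Count() ? g : best)
            .Key;

        Console.WriteLine(mostCommon); // prints "apple"
    }
}
```

Aggregate here plays the role of the missing MaxBy: it threads the best-so-far group through the sequence, which is exactly the worst-case-linear behavior described above.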
If you use Json.NET, you can load your json using LINQ to JSON, then use SelectTokens to recursively find all "fruits" properties, then recursively collect all descendants string values (those of type JValue), group them by their string value, and put them in descending order:
var token = JToken.Parse(jsonString);
var fruits = token.SelectTokens("..fruits")      // Recursively find all "fruits" properties
    .SelectMany(f => f.DescendantsAndSelf())     // Recursively find all string literals underneath each
    .OfType<JValue>()
    .GroupBy(f => (string)f)                     // Group by string value
    .OrderByDescending(g => g.Count())           // Descending order by count
    .ToList();
Or, if you prefer to put your results into an anonymous type for clarity:
var fruits = token.SelectTokens("..fruits")      // Recursively find all "fruits" properties
    .SelectMany(f => f.DescendantsAndSelf())     // Recursively find all string literals underneath each
    .OfType<JValue>()
    .GroupBy(f => (string)f)                     // Group by string value
    .Select(g => new { Fruit = (string)g.Key, Count = g.Count() })
    .OrderByDescending(f => f.Count)             // Descending order by count
    .ToList();
Then afterwards:
Console.WriteLine(JsonConvert.SerializeObject(fruits, Formatting.Indented));
Produces:
[
  {
    "Fruit": "apple",
    "Count": 2
  },
  {
    "Fruit": "banana",
    "Count": 1
  },
  {
    "Fruit": "orange",
    "Count": 1
  }
]
** Update **
Forgot to include the following extension method:
public static class JsonExtensions
{
    public static IEnumerable<JToken> DescendantsAndSelf(this JToken node)
    {
        if (node == null)
            return Enumerable.Empty<JToken>();
        var container = node as JContainer;
        if (container != null)
            return container.DescendantsAndSelf();
        else
            return new[] { node };
    }
}
The original question was a little vague on the precise structure of the JSON, which is why I suggested using LINQ rather than deserialization.
The serialization class for this structure is simple:
public class RootObject
{
    public List<string> fruits { get; set; }
}
So to deserialize (the JSON document is an array, so deserialize to a list):
var fruitListContainer = JsonConvert.DeserializeObject<List<RootObject>>(jsonString);
Then you can put all fruits in one list:
List<string> fruits = fruitListContainer.SelectMany(r => r.fruits).ToList();
Now you have all fruits in one list, and you can do whatever you want. For sorting, see the other answers.
Assuming that the data is in a file named fruits.json, that jq (http://stedolan.github.io/jq/) is on the PATH, and that you're using a Mac or Linux-style shell:
$ jq 'reduce (.[].fruits[]) as $fruit ({}; .[$fruit] += 1)' fruits.json
{
  "banana": 1,
  "apple": 2,
  "orange": 1
}
On Windows, the same thing will work if the quotation marks are suitably adjusted. Alternatively, if the one-line jq program is put in a file, say fruits.jq, the following command could be run in any supported environment:
jq -f fruits.jq fruits.json
If the data is coming from some other process, you can pipe it into jq, e.g. like so:
jq -f fruits.jq
One way to find the maximum count is to add a couple of filters, e.g. as follows:
$ jq 'reduce (.[].fruits[]) as $fruit ({}; .[$fruit] += 1) |
to_entries | max_by(.value)' fruits.json
{
  "key": "apple",
  "value": 2
}
