So I have this object model:
string Name; // name of the person
int Age; // age of the person
string CreatedBy; // operator who created person
My query reads like this: all documents WHERE Age > 40 AND CreatedBy == 'callum' AND Name contains 'll'
CreatedBy is a mandatory condition; it scopes what the operator is allowed to see.
Age is also mandatory (but isn't a security issue).
Name is where it can get fuzzy, because that is what the user is querying. It should behave something like a contains.
The query below works for the first two parts:
"query": {
"bool": {
"must": [
{
"range": {
"age": {
"gt": 40
}
}
},
{
"match": {
"createdBy": "Callum"
}
}
]
}
}
I tried adding a multi_match because ultimately it may be a search across Name, Address and other bits of information, but I couldn't make sense of where to fit it in.
In my mind, nested queries would be useful: first filter out all irrelevant users, then filter out irrelevant ages, then do some fuzzier matching on the relevant fields.
So, the answer to this isn't straightforward.
First of all you need to create an Analyser for Compound Words.
So in the .NET client it looks like:
this.elasticClient.CreateIndex("customer", p => p
.Settings(s => s
.Analysis(a => a
.TokenFilters(t => t
.NGram("bigrams_filter", ng => ng
.MaxGram(2)
.MinGram(2)))
.Analyzers(al => al
.Custom("bigrams", l => l
.Tokenizer("standard")
.Filters("lowercase", "bigrams_filter"))))));
this.elasticClient.Map<Person>(m => m
.Properties(props => props
.String(s => s
.Name(p => p.Name)
.Index(FieldIndexOption.Analyzed)
.Analyzer("bigrams"))
.String(s => s
.Name(p => p.CreatedBy)
.NotAnalyzed())
.Number(n => n
.Name(p => p.Age))));
This is more or less a direct translation of the first link provided. It means that every name is broken into its lowercased bigram representation:
Callum
ca
al
ll
lu
um
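To make the bigram step concrete, here is a minimal Python sketch of what the analyser does to a name. It is only an approximation: splitting on whitespace stands in for the standard tokenizer, and the function name bigrams is made up for illustration.

```python
def bigrams(text, n=2):
    """Lowercase each whitespace-separated word and emit its character n-grams.

    Rough stand-in for: standard tokenizer -> lowercase -> nGram(min=2, max=2).
    """
    grams = []
    for word in text.lower().split():
        # Slide a window of width n across the word.
        grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


print(bigrams("Callum"))  # ['ca', 'al', 'll', 'lu', 'um']
```

A search for "ll" is analysed the same way, so it produces the single term ll, which is one of the indexed terms for Callum.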
Then you need the actual query to take advantage of this. This is the bit I like: because we've set up that analyser on the name field, queries against it can match on partial words. Take this for example (Sense query):
GET customer/_search
{
"query": {
"filtered": {
"query": {
"multi_match": {
"query": "ll",
"fields": ["name"]
}
},
"filter": {
"bool": {
"must": [
{
"range": {
"age": {
"gt": 40
}
}
},
{
"match": {
"createdBy": "Callum"
}
}
]
}
}
}
}
}
Here we have a filtered query: the query part does the partial-term match, and the filter narrows the results down to the subset we need. (I have read that the query always runs first and the filter afterwards, but I can't find documentation to cite for that yet, so don't rely on the execution order.)
Because the ngrams analyser is only set on name, that is the only field that will be partially matched against. CreatedBy won't be, and that gives us our security around the results.
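If the request body is assembled in code, the same structure could be built like this. This is a sketch only: build_person_search is a hypothetical helper, and the field names are the ones used in the Sense query above. The point is that created_by comes from the operator's session and only text comes from user input.

```python
import json


def build_person_search(created_by, min_age, text):
    """Build the filtered-query body: partial match on name, hard filters on the rest."""
    return {
        "query": {
            "filtered": {
                # User-supplied text only ever reaches the n-gram analysed field.
                "query": {"multi_match": {"query": text, "fields": ["name"]}},
                "filter": {
                    "bool": {
                        "must": [
                            {"range": {"age": {"gt": min_age}}},
                            {"match": {"createdBy": created_by}},
                        ]
                    }
                },
            }
        }
    }


body = build_person_search("Callum", 40, "ll")
print(json.dumps(body, indent=2))
```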
Basically what you can do is split the query into two blocks inside a filtered query:
"query": {
  "filtered": {
    "filter": {
      "bool": {
        "must": [
          {
            "range": {
              "age": {
                "gt": 40
              }
            }
          }
        ]
      }
    },
    "query": {
      "bool": {
        "must": [
          {
            "multi_match": {
              "query": "ll",
              "fields": [ "createdBy", "Address", "Name" ],
              "fuzziness": 2
            }
          }
        ]
      }
    }
  }
}
In the filter you can use conditions to narrow things down, and then apply your multi_match query to the filtered data. The main reason I put age in the filter is that it needs no free-text search; it is just a comparison against a static value. You can include more conditions within the must block of the filter.
You can also look at this article, which might give you an overview:
https://www.elastic.co/blog/found-optimizing-elasticsearch-searches
Hope it helps!
Related
I am very new to Elasticsearch. I want to find results based on a partial word of a sentence. For example, if the search string is
"val"
it should find a document with the string value
"value is grater than 100"
But if I use a query like
var searchDescriptor = new SearchDescriptor<ElasticsearchProject>();
searchDescriptor.Query(q =>
    q.Wildcard(m => m.OnField(p => p.PNumber).Value(string.Format("*{0}*", searchString)))
);
it works only for a single-word string like
"ValueIsGraterThan100"
If I use something like this
var searchDescriptor = new SearchDescriptor<ElasticsearchProject>();
searchDescriptor.Query(q =>
    q.QueryString(m => m.OnFields(p => p.PName).Query(searchString))
);
it works only for a whole word; I have to provide the search string
"value"
to find
"value is grater than 100"
Providing only val does not work. How can I fulfil my requirement?
Your field is currently not_analyzed. You can use an edge n-gram analyzer, built from an edge n-gram token filter, to tokenize the field before its terms are stored in the inverted index. You can use the following settings:
PUT index_name1323
{
"settings": {
"analysis": {
"analyzer": {
"autocomplete_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"standard",
"lowercase",
"filter_edgengram"
]
}
},
"filter": {
"filter_edgengram": {
"type": "edgeNGram",
"min_gram": 2,
"max_gram": 15
}
}
}
},
"mappings": {
"test_type": {
"properties": {
"props": {
"type": "string",
"analyzer": "autocomplete_analyzer"
}
}
}
}
}
Now you can use either query_string or a term filter to match your documents on val:
POST index_name1323/_search
{
"query": {
"query_string": {
"default_field": "props",
"query": "val"
}
}
}
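To see why val now matches, here is a rough Python simulation of the index-time analysis with the settings above (min_gram 2, max_gram 15). It is a sketch, not the Lucene implementation: whitespace splitting stands in for the standard tokenizer, and the function names are made up.

```python
def edge_ngrams(word, min_gram=2, max_gram=15):
    """Return the prefixes of word whose length is between min_gram and max_gram."""
    upper = min(len(word), max_gram)
    return [word[:n] for n in range(min_gram, upper + 1)]


def index_terms(text):
    """Approximate the terms stored for a field analysed with autocomplete_analyzer."""
    terms = set()
    for word in text.lower().split():
        terms.update(edge_ngrams(word))
    return terms


terms = index_terms("value is grater than 100")
print("val" in terms)   # True: 'val' is a prefix of 'value'
print("alue" in terms)  # False: edge n-grams only cover word prefixes
```

Note that this only covers prefixes of each word. A search string taken from the middle of a word would need a plain n-gram filter instead.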
Hope this helps
I'm trying to rollup some of my 'other' results using Elasticsearch. Ideally, I'd like my query to return the top N hits and then roll the rest of the data up into an N+1 hit titled "Other".
So for example, if I'm trying to aggregate "Institutions by Total Value", I'd get back 10 Institutions with the most value and then the total aggregated value of the other institutions as another record. The purpose is that I'd like to see the total value aggregated across all institutions but not have to list thousands.
An example search I've been using is:
GET my_index/institution/_search?pretty=true
{
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
... terms queries ...
]
}
}
}
},
"aggs": {
"dimension_type_name_agg": {
"terms": {
"field": "institution_name",
"order": {
"metric_sum_total_value_agg": "desc"
},
"size": 0
},
"aggs": {
"metric_sum_total_value_agg": {
"sum": {
"field": "total_value"
}
},
"metric_count_account_id_agg": {
"value_count": {
"field": "institution_id"
}
}
}
}
}
}
I'm curious whether this can be done by modifying a query like the one given above. Also, I'm using C# and NEST/Elasticsearch.NET, so any tips on how this translates to that side are appreciated as well.
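For reference, the desired result can also be described as a client-side post-processing step over the buckets the query above returns. The sketch below is an illustration of that rollup, not a server-side solution: rollup_other is a hypothetical helper, it assumes the buckets are already sorted by the sum in descending order (as the order clause requests), and the sample values are invented.

```python
def rollup_other(buckets, top_n, metric="metric_sum_total_value_agg"):
    """Keep the first top_n buckets and fold the remainder into one 'Other' bucket."""
    top = list(buckets[:top_n])
    rest = buckets[top_n:]
    if not rest:
        return top
    other = {
        "key": "Other",
        "doc_count": sum(b["doc_count"] for b in rest),
        metric: {"value": sum(b[metric]["value"] for b in rest)},
    }
    return top + [other]


# Invented sample in the shape of a terms aggregation response.
sample = [
    {"key": "A", "doc_count": 5, "metric_sum_total_value_agg": {"value": 900.0}},
    {"key": "B", "doc_count": 3, "metric_sum_total_value_agg": {"value": 400.0}},
    {"key": "C", "doc_count": 2, "metric_sum_total_value_agg": {"value": 100.0}},
    {"key": "D", "doc_count": 1, "metric_sum_total_value_agg": {"value": 50.0}},
]
for bucket in rollup_other(sample, top_n=2):
    print(bucket["key"], bucket["metric_sum_total_value_agg"]["value"])
```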
My goal is to search for a word irrespective of the analyzer applied to the field.
I used a match query with the keyword analyzer, but I think it still works with the default analyzer configured on that property.
In Elasticsearch, my author document structure is like
"_source": {
"Id": 3,
"Organization": "let123"
}
Index mapping :
createIndexDescriptor.NumberOfReplicas(1)
.NumberOfShards(1)
.Settings(
settings =>
settings
.Add("analysis.filter.autocomplete_filter_ngram.type", "edge_ngram")
.Add("analysis.filter.autocomplete_filter_ngram.min_gram", "2")
.Add("analysis.filter.autocomplete_filter_ngram.max_gram", "7")
.Add("analysis.analyzer.title_analyzer.type", "custom")
.Add("analysis.analyzer.title_analyzer.char_filter.0", "html_strip")
.Add("analysis.analyzer.title_analyzer.tokenizer", "standard")
.Add("analysis.analyzer.title_analyzer.filter.0", "lowercase")
.Add("analysis.analyzer.title_analyzer.filter.1", "asciifolding")
.Add("analysis.analyzer.title_analyzer.filter.2", "autocomplete_filter_ngram"))
.AddMapping<Author>(
m =>
m.MapFromAttributes()
.AllField(f => f.Enabled(true))
.Properties(
props =>
props.MultiField(
mf =>
mf.Name(t => t.Organization)
.Fields(fs => fs.String(s => s.Name(t => t.Organization).Analyzer("title_analyzer"))
))));
Note that one of my title analyzer filters is an edge n-gram filter.
But I used the keyword analyzer in my match query to avoid autocomplete behaviour when searching.
GET /author/_search
{
  "query": {
    "match": {
      "Organization": {
        "query": "le",
        "analyzer": "keyword"
      }
    }
  }
}
But when I searched, the above document was matched.
What I am expecting is only documents whose Organization has the exact value 'le'.
Why is this matched? Any idea how to achieve my goal?
By specifying the analyser in the query you are instructing Elasticsearch how to analyse the query you've sent.
For example:
GET /author/_search
{
"query": {
"match": {
"Organization": {
"query": "le",
"analyzer": "keyword"
}
}
}
}
This tells Elasticsearch to use the keyword analyser on the le string. It doesn't affect the indexed terms that have already been created from your stored data (let123).
The only way to change the way that stored data is analysed is to update your mapping and re-index your data.
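A small Python sketch of why the document still matches: it approximates the terms title_analyzer stores for let123 (edge n-grams of length 2 to 7, after lowercasing) and compares them with the single term the keyword analyser produces for the query. The helper name is made up, and html_strip and asciifolding are left out because they don't change this value.

```python
def edge_ngram_terms(value, min_gram=2, max_gram=7):
    """Approximate the index-time terms: lowercase, then prefixes of length 2..7."""
    value = value.lower()
    upper = min(len(value), max_gram)
    return [value[:n] for n in range(min_gram, upper + 1)]


indexed_terms = edge_ngram_terms("let123")
query_term = "le"  # the keyword analyser keeps the query string as one token

print(indexed_terms)                # ['le', 'let', 'let1', 'let12', 'let123']
print(query_term in indexed_terms)  # True, so the document is a hit
```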
Multifields
It's not possible to have multiple analyzers against the same field but data can instead be easily stored in multiple fields (each having a single analyser).
for example:
{
"tweet" : {
"properties" : {
"name" : {
"type" : "string",
"index" : "analyzed",
"fields" : {
"raw" : {"type" : "string", "index" : "not_analyzed"}
}
}
}
}
}
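The name data is automatically stored in two places: in the field name (where it is analysed) and in name.raw (where no analysis takes place). See Multi Fields.
With a not_analyzed sub-field on Organization (assumed here to be named raw, as in the example above), an exact match becomes a term query on that sub-field:
GET /author/_search
{
  "query": {
    "term": {
      "Organization.raw": "le"
    }
  }
}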
I am trying to write a search query on an Elasticsearch index that will return results matching any part of the field value.
I have a Path field that contains values like C:\temp\ab-cd\abc.doc
I want to be able to send a query that returns any document where part of the value matches what I typed
QueryContainer currentQuery = new QueryStringQuery
{
DefaultField = "Path",
Query = string.Format("*{0}*", "abc"),
};
The above will return results, this will not:
QueryContainer currentQuery = new QueryStringQuery
{
DefaultField = "Path",
Query = string.Format("*{0}*", "ab-cd"),
};
The same goes for any other special character like ##$%^&* and so on.
Is there some generic way to send a query and find exactly what I searched for?
Each of my fields is a multi-field and I could use the *.raw options, but I do not know exactly how, or whether I should.
Use nGrams to split the text into smaller chunks and use a term filter to query it. Pro: it should be faster. Con: the index will be larger (more disk space), because the nGram filter generates more terms.
PUT /test
{
"settings": {
"analysis": {
"analyzer": {
"my_ngram_analyzer": {
"tokenizer": "keyword",
"filter": [
"substring"
]
}
},
"filter": {
"substring": {
"type": "nGram",
"min_gram": 1,
"max_gram": 50
}
}
}
},
"mappings": {
"test": {
"properties": {
"Path": {
"type": "string",
"index_analyzer": "my_ngram_analyzer",
"search_analyzer": "keyword"
}
}
}
}
}
And the query (note that the backslash has to be escaped in JSON, otherwise \t is read as a tab):
GET /test/test/_search
{
  "query": {
    "term": {
      "Path": {
        "value": "\\temp"
      }
    }
  }
}
If you wish, you can use the config above as a sub-field for whatever mapping you already have.
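To illustrate what this analyzer stores, here is a rough Python simulation: the keyword tokenizer keeps the whole path as one token, and the nGram filter (1 to 50) then emits every substring of it. This is a sketch with a made-up function name, not the Lucene implementation.

```python
def ngrams(value, min_gram=1, max_gram=50):
    """Return every substring of value with length between min_gram and max_gram."""
    grams = set()
    for start in range(len(value)):
        for size in range(min_gram, max_gram + 1):
            if start + size > len(value):
                break
            grams.add(value[start:start + size])
    return grams


path = "C:\\temp\\ab-cd\\abc.doc"  # Python escaping for C:\temp\ab-cd\abc.doc
terms = ngrams(path)
print("ab-cd" in terms)   # True: special characters are just part of the term
print("\\temp" in terms)  # True
print("AB-CD" in terms)   # False: there is no lowercase filter in this analyzer
```

Because the search side uses the keyword analyzer, the search text is compared as-is against these substrings, which is why a term filter with ab-cd matches.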
If you want to use query_string there is one thing you need to be aware of: you need to escape special characters, for example -, \ and : (complete list here). Also, when indexing, the \ char needs escaping, otherwise it will issue an error. This is what I tested, especially with query_string: https://gist.github.com/astefan/a52fa4989bf5298102d1
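A minimal sketch of such escaping in Python. The helper escape_query_string is hypothetical, and the reserved-character set is my reading of the query_string documentation (which also says < and > cannot be escaped and should be removed), so check it against the version you are running.

```python
# Characters that query_string treats as operators (assumed list, see lead-in).
RESERVED = '\\+-=&|!(){}[]^"~*?:/'


def escape_query_string(text):
    """Backslash-escape reserved characters; drop < and > because they cannot be escaped."""
    escaped = []
    for ch in text:
        if ch in "<>":
            continue
        if ch in RESERVED:
            escaped.append("\\" + ch)
        else:
            escaped.append(ch)
    return "".join(escaped)


print(escape_query_string("ab-cd"))  # ab\-cd
```

The escaped text can then be wrapped in * on both sides as in the question. When the result is placed in a JSON request body, each backslash has to be escaped once more for JSON.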
I am trying to sort the buckets of an aggregation by a sub-aggregation that computes a sum, in descending order of that sum.
If I try it like below, the aggregation result gets sorted by doc count.
"order": {
"revrsenestedowners": "desc"
}
The code below illustrates the problem I am facing (field names have been changed for the example).
"machines" is my nested object, but "owners" is not nested; it belongs to the parent object.
I need the top 10 machine names by owners' machine count (a sum is required because owners is a list and can have more than one value).
{
"query": {
"range": {
"createdDate": {
"gte": "2015-04-28T00:00:00",
"lte": "2015-05-01T23:59:59"
}
}
},
"aggs": {
"nestedagg": {
"nested": {
"path": "machines"
},
"aggs": {
"terms": {
"terms": {
"field": "machines.machineName",
"size": 10,
"order": {
"sumowners": "desc"
}
},
"aggs": {
"revrsenestedowners": {
"reverse_nested": {},
"aggs": {
"sumowners": {
"sum": {
"field": "owners.machinesCount"
}
}
}
}
}
}
}
}
}
}
I require ordering by the sum, not by doc count.
For it to work I may need something like:
"order": {
"revrsenestedowners.sumowners": "desc"
}
Is there a way to achieve what I'm looking for?
Or is this a limitation of Elasticsearch, or a bug?
I'm stuck and would really appreciate any help.
I raised the same issue on the Elasticsearch issue tracker and they replied with the correct syntax:
https://github.com/elastic/elasticsearch/issues/11059
The answer is:
"order": {
"revrsenestedowners > sumowners.value" : "desc"
}
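Put back into the aggregation from the question, the corrected order clause would sit like this. The Python below only builds the request body as a dict (the function name is made up) so the nesting is easy to see. The path names the reverse_nested aggregation first, then the sum inside it.

```python
import json


def top_machines_agg(size=10):
    """Build the aggs section, ordering term buckets by the sum under reverse_nested."""
    return {
        "nestedagg": {
            "nested": {"path": "machines"},
            "aggs": {
                "terms": {
                    "terms": {
                        "field": "machines.machineName",
                        "size": size,
                        # '>' steps into a child aggregation, '.value' picks the metric.
                        "order": {"revrsenestedowners > sumowners.value": "desc"},
                    },
                    "aggs": {
                        "revrsenestedowners": {
                            "reverse_nested": {},
                            "aggs": {
                                "sumowners": {"sum": {"field": "owners.machinesCount"}}
                            },
                        }
                    },
                }
            },
        }
    }


print(json.dumps({"aggs": top_machines_agg()}, indent=2))
```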
Hope it helps others.