ElasticSearch NEST - Find results with special characters - c#

I am trying to write a search query against an Elasticsearch index that matches on any part of a field's value.
I have a Path field that contains values like C:\temp\ab-cd\abc.doc
I want to be able to send a query that returns every document whose Path contains the text I typed, wherever it appears in the value.
QueryContainer currentQuery = new QueryStringQuery
{
DefaultField = "Path",
Query = string.Format("*{0}*", "abc"),
};
The above will return results, this will not:
QueryContainer currentQuery = new QueryStringQuery
{
DefaultField = "Path",
Query = string.Format("*{0}*", "ab-cd"),
};
The same goes for any other special character, such as ##$%^&* and so on.
Is there a generic way to send a query and find exactly what I searched for?
Each of my fields is a multi-field, so I could use the *.raw sub-fields, but I do not know exactly how, or whether I should.

Use nGrams to split the text into smaller chunks and use a term query (or term filter) to look them up. Pro: it should be faster than a leading-wildcard query. Con: the index will take more disk space, because the nGram filter generates many more terms.
PUT /test
{
"settings": {
"analysis": {
"analyzer": {
"my_ngram_analyzer": {
"tokenizer": "keyword",
"filter": [
"substring"
]
}
},
"filter": {
"substring": {
"type": "nGram",
"min_gram": 1,
"max_gram": 50
}
}
}
},
"mappings": {
"test": {
"properties": {
"Path": {
"type": "string",
"index_analyzer": "my_ngram_analyzer",
"search_analyzer": "keyword"
}
}
}
}
}
And the query:
GET /test/test/_search
{
"query": {
"term": {
"Path": {
"value": "\\temp"
}
}
}
}
If you wish, you can use the config above as a sub-field for whatever mapping you already have.
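To see why the term query above can find a fragment such as ab-cd, here is a minimal Python sketch of what a keyword tokenizer followed by an nGram filter (min_gram 1, max_gram 50) would emit for one Path value. The function name ngrams is made up for illustration; this only mimics the analysis chain and does not talk to Elasticsearch.

```python
def ngrams(text, min_gram=1, max_gram=50):
    """Return every substring of `text` with a length between min_gram and max_gram."""
    tokens = set()
    for start in range(len(text)):
        longest = min(max_gram, len(text) - start)
        for length in range(min_gram, longest + 1):
            tokens.add(text[start:start + length])
    return tokens


# The keyword tokenizer keeps the whole value as one token, special characters included.
path = r"C:\temp\ab-cd\abc.doc"
indexed_terms = ngrams(path)

# A term query is a plain lookup of the search text among the indexed terms.
print("ab-cd" in indexed_terms)   # True
print(r"\temp" in indexed_terms)  # True
print("xyz" in indexed_terms)     # False
```

Because every substring is stored as its own term, the number of terms grows roughly with the square of the value's length, which is the disk-space cost mentioned above.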
If you want to use query_string, there is one thing you need to be aware of: you need to escape special characters, for example -, \ and : (complete list here). Also, when indexing, the \ character needs escaping in the JSON body, otherwise the request will fail with an error. This is what I tested, especially with query_string: https://gist.github.com/astefan/a52fa4989bf5298102d1
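As a rough illustration of the escaping step, here is a small Python helper (escape_query_string is a hypothetical name, not part of any client library). It assumes the reserved characters listed in the Elasticsearch query_string documentation: + - = && || > < ! ( ) { } [ ] ^ " ~ * ? : \ / and it removes < and >, which according to that documentation cannot be escaped at all. The same logic could be ported to C# before building the QueryStringQuery.

```python
import re

# Operators that can be escaped by prefixing them with a backslash.
_ESCAPABLE = re.compile(r'(&&|\|\||[+\-=!(){}\[\]^"~*?:\\/])')


def escape_query_string(text):
    """Escape user input so query_string treats it as literal text."""
    # < and > have no escaped form, so they are dropped.
    text = text.replace("<", "").replace(">", "")
    return _ESCAPABLE.sub(r"\\\1", text)


user_text = "ab-cd"
# Escape first, then add the wildcards, so the * characters stay operators.
query = "*{0}*".format(escape_query_string(user_text))
print(query)
```

Escaping only gets the text past the query parser; whether the query then matches still depends on how the field was analyzed at index time.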

Related

How to make this CosmosDB SQL Query work without knowing ARRAY index?

I am querying a Cosmos DB container: a string comes in and I need to return some data through a C# Web API. The query that works for me is below:
SELECT *
FROM c IN jongel.OriginalData.base.sales.variants
WHERE c.globalTradeItemNumber.globalTradeItemNumberType[0].GTIN = '1111111111111'
The problem is that I have to know the array index into globalTradeItemNumberType ([0] in this example) for it to work. It is not always 0; it could be anything from 0 to 9, and I cannot figure out how to rewrite the query so that it works regardless of the index at which the matching data is found.
How can I rewrite this query so that I do not need to know the ARRAY INDEX beforehand?
--- EDIT ---
A sample document shortened to only include the needed parts
{
"id": "635af816-8db7-49c6-8284-ab85116b499b",
"brand": "XXX",
"IntegrationSource": "XXX",
"DocumentType": "Item",
"ItemInformationType": "",
"ItemLevel": "Article",
"ItemNo": "0562788040",
"UpdatedDate": "1/1/2020 4:00:01 AM",
"UpdatedDateUtc": "2020-01-01T04:00:01.82Z",
"UpdatedBy": "XXX",
"OriginalData": {
"corporateBrandId": "2",
"productId": "0562788",
"articleId": "0562788040",
"season": "201910",
"base": {
"sales": {
"SAPArticleNumber": "562788040190",
"simpleColour": {
"simpleColourId": "99",
"simpleColourDescription": "Green",
"translatedColourDescription": [
{
"languageCode": "sr",
"simpleColourDescription": "Zeleno"
},
{
"languageCode": "zh-Hans",
"simpleColourDescription": "绿色"
},
{
"languageCode": "vi-VN",
"simpleColourDescription": "Xanh la cay"
}
]
},
"variants": [
{
"variantId": "0562788040001",
"variantNumber": "562788040190001",
"variantDescription": "YYYYYYYYY, XXS",
"sizeScaleAndCode": "176-001",
"netWeight": 0.491,
"unitsOfMeasure": {
"unitsOfMeasureType": [
{
"alternativeUOM_ISO": "PCE",
"length": 320,
"width": 290,
"height": 31,
"unitOfDimension": "MM",
"volume": 2876.8,
"volumeUnit": "CCM",
"weightUnit": "KG"
}
]
},
"globalTradeItemNumber": {
"globalTradeItemNumberType": [
{
"GTIN": "1111111111111",
"GTINCategory": "Z3"
},
{
"GTIN": "2222222222222",
"GTINCategory": "Z3"
},
{
"GTIN": "3333333333333",
"GTINCategory": "IE"
}
]
}
}
]
}
}
}
}
I tried the following query, based on the suggested answer below, but it did not work:
SELECT *
FROM c
WHERE ARRAY_CONTAINS(c.OriginalData.base.sales.variants.globalTradeItemNumber.globalTradeItemNumberType, {GTIN:"1111111111111"}, true)
I guess the above fails because the variants part of the tree is also an array?
NOTE: the variants array can hold several objects, so it's not always index [0].
You could try using the ARRAY_CONTAINS function.
SELECT *
FROM c IN jongel.OriginalData.base.sales.variants
WHERE ARRAY_CONTAINS(c.globalTradeItemNumber.globalTradeItemNumberType, {GTIN:"1111111111111"}, true)
This will allow the query to search all items in the array for a matching GTIN value.
https://learn.microsoft.com/en-us/azure/cosmos-db/sql-query-array-contains
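The behaviour of that query can be sketched in plain Python. This is only a model of the matching logic under my reading of the docs (FROM c IN ...variants iterates the variants array, and the third argument true makes ARRAY_CONTAINS accept a partial object match); the helper names and the trimmed sample document are illustrative.

```python
def array_contains(array, expected, partial=False):
    """Model of ARRAY_CONTAINS: with partial=True, an element matches
    when it contains every key/value pair of `expected`."""
    for element in array:
        if partial and isinstance(element, dict) and isinstance(expected, dict):
            if all(key in element and element[key] == value
                   for key, value in expected.items()):
                return True
        elif element == expected:
            return True
    return False


def matching_variants(document, gtin):
    """Return every variant that holds the GTIN at any array index."""
    variants = document["OriginalData"]["base"]["sales"]["variants"]
    return [
        variant for variant in variants
        if array_contains(
            variant["globalTradeItemNumber"]["globalTradeItemNumberType"],
            {"GTIN": gtin},
            partial=True,
        )
    ]


# Trimmed version of the sample document from the question.
document = {"OriginalData": {"base": {"sales": {"variants": [{
    "variantId": "0562788040001",
    "globalTradeItemNumber": {"globalTradeItemNumberType": [
        {"GTIN": "1111111111111", "GTINCategory": "Z3"},
        {"GTIN": "2222222222222", "GTINCategory": "Z3"},
        {"GTIN": "3333333333333", "GTINCategory": "IE"},
    ]},
}]}}}}

print([v["variantId"] for v in matching_variants(document, "3333333333333")])
```

The GTIN at index 2 is found without the caller knowing its position, which is the point of replacing [0] with ARRAY_CONTAINS.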

Search By Partial word of a sentence in elastic search

I am very new to Elasticsearch. I want to find results based on part of a word in a sentence. For example, the search string is
"val"
and it should find the document whose value is
"value is grater than 100"
But if I use a query like this
var searchDescriptor = new SearchDescriptor<ElasticsearchProject>();
searchDescriptor.Query(q =>
q.Wildcard(m => m.OnField(p => p.PNumber).Value(string.Format("*{0}*", searchString)))
);
it works only for a single-word string like
"ValueIsGraterThan100"
If I use something like this
var searchDescriptor = new SearchDescriptor<ElasticsearchProject>();
searchDescriptor.Query(q =>
q.QueryString(m => m.OnFields(p => p.PName).Query(searchString))
);
it works only for whole words; that is, I have to provide the search string
"value"
to find
"value is grater than 100"
Providing only val does not work. How can I fulfill my requirement?
Your field is currently not_analyzed. You can use an edge n-gram analyzer, built around an edge n-gram token filter, to tokenize the field's value before it is stored in the inverted index. You can use the following settings:
PUT index_name1323
{
"settings": {
"analysis": {
"analyzer": {
"autocomplete_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"standard",
"lowercase",
"filter_edgengram"
]
}
},
"filter": {
"filter_edgengram": {
"type": "edgeNGram",
"min_gram": 2,
"max_gram": 15
}
}
}
},
"mappings": {
"test_type": {
"properties": {
"props": {
"type": "string",
"analyzer": "autocomplete_analyzer"
}
}
}
}
}
Now you can use either a query_string query or a term filter to match both of your documents against val:
POST index_name1323/_search
{
"query": {
"query_string": {
"default_field": "props",
"query": "val"
}
}
}
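To see why val now matches, here is a rough Python approximation of the autocomplete_analyzer defined above (split into words, lowercase, then edge n-grams of length 2 to 15). The regular expression is only a stand-in for the standard tokenizer, and the function names are made up for this sketch.

```python
import re


def edge_ngrams(token, min_gram=2, max_gram=15):
    """Return the prefixes of `token` with a length between min_gram and max_gram."""
    longest = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, longest + 1)]


def analyze(text):
    """Approximate the analyzer: split into words, lowercase, edge n-grams."""
    terms = set()
    for word in re.findall(r"\w+", text.lower()):
        terms.update(edge_ngrams(word))
    return terms


print(edge_ngrams("value"))                         # ['va', 'val', 'valu', 'value']
print("val" in analyze("value is grater than 100"))  # True
print("val" in analyze("ValueIsGraterThan100"))      # True
print("alu" in analyze("value is grater than 100"))  # False
```

Only prefixes are indexed, so val matches but a fragment from the middle of a word, such as alu, does not; that would need a full nGram filter instead of edgeNGram.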
Hope this helps

ElasticSearch combining MultiMatch with Must

So I have this object model:
string Name; // name of the person
int Age; // age of the person
string CreatedBy; // operator who created person
My query reads like this: all documents WHERE Age > 40 AND CreatedBy == 'callum' AND Name contains 'll'
CreatedBy is mandatory; it defines the scope of control.
Age is also mandatory (but isn't a security issue).
Name is where it can get fuzzy, because that is what the user is searching on. It is akin to a sort of contains.
The query below works for the first two parts:
"query": {
"bool": {
"must": [
{
"range": {
"age": {
"gt": 40
}
}
},
{
"match": {
"createdBy": "Callum"
}
}
]
}
}
I tried adding a multi_match because ultimately it may be a search across Name, Address and other bits of information, but I couldn't make sense of where to fit it in.
In my view, nested queries would be useful: first filter out all irrelevant users, then filter out irrelevant ages, then do some fuzzier matching on the relevant fields.
So, the answer to this isn't straightforward.
First of all you need to create an Analyser for Compound Words.
So in the .NET client it looks like:
this.elasticClient.CreateIndex("customer", p => p
.Settings(s => s
.Analysis(a => a
.TokenFilters(t => t
.NGram("bigrams_filter", ng => ng
.MaxGram(2)
.MinGram(2)))
.Analyzers(al => al
.Custom("bigrams", l => l
.Tokenizer("standard")
.Filters("lowercase", "bigrams_filter"))))));
this.elasticClient.Map<Person>(m => m
.Properties(props => props
.String(s => s
.Name(p => p.Name)
.Index(FieldIndexOption.Analyzed)
.Analyzer("bigrams"))
.String(s => s
.Name(p => p.CreatedBy)
.NotAnalyzed())
.Number(n => n
.Name(p => p.Age))));
Which is a sort of direct translation of the first link provided. This now means that all names will be broken into their bigram representation:
Callum
ca
al
ll
lu
um
Then you need the actual query to take advantage of this. Now this is the bit I like: because we've set up that analyzer on the name field, queries against it can contain partial words. Take this for example (Sense query):
GET customer/_search
{
"query": {
"filtered": {
"query": {
"multi_match": {
"query": "ll",
"fields": ["name"]
}
},
"filter": {
"bool": {
"must": [
{
"range": {
"age": {
"gt": 40
}
}
},
{
"match": {
"createdBy": "Callum"
}
}
]
}
}
}
}
}
Here we have a filtered query. The multi_match query does the partial-term match, and the filter narrows the results to the subset we need. (I had read that the query always runs first and the filter afterwards, but I can't find documentation to cite for that, so don't rely on the execution order.)
Because the ngrams analyser is only set on name, that is the only field that will be partially matched against. CreatedBy won't be, and thus we keep our security around the results.
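The combined behaviour can be modelled in a few lines of Python. This is a sketch over an in-memory list with made-up sample people, assuming single-word names, a strict age > 40 range, an exact (not analyzed) createdBy comparison, and the default OR behaviour of the match on bigrams.

```python
def bigrams(text):
    """Lowercase `text` and return the set of its 2-character substrings."""
    text = text.lower()
    return {text[i:i + 2] for i in range(len(text) - 1)}


def search(people, name_query, min_age, created_by):
    """Exact filters on age and createdBy, bigram match on name."""
    wanted = bigrams(name_query)
    hits = []
    for person in people:
        # Filter part: hard conditions, no partial matching.
        if person["age"] <= min_age or person["createdBy"] != created_by:
            continue
        # Query part: the name matches if it shares any bigram with the query.
        if wanted & bigrams(person["name"]):
            hits.append(person["name"])
    return hits


# Hypothetical sample data, not taken from the question.
people = [
    {"name": "Callum", "age": 45, "createdBy": "Callum"},
    {"name": "Allan", "age": 50, "createdBy": "Callum"},
    {"name": "Billy", "age": 30, "createdBy": "Callum"},
    {"name": "Sally", "age": 60, "createdBy": "Dana"},
    {"name": "Peter", "age": 70, "createdBy": "Callum"},
]

print(sorted(bigrams("Callum")))          # ['al', 'ca', 'll', 'lu', 'um']
print(search(people, "ll", 40, "Callum"))  # ['Callum', 'Allan']
```

Billy and Sally contain ll but are removed by the exact filters, which is the security property described above.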
Basically, what you can do is split the query into two blocks inside a filtered query:
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"range": {
"age": {
"gt": 40
}
}
}
]
}
},
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "ll",
"fields": [ "createdBy", "Address", "Name" ],
"fuzziness": 2
}
}
]
}
}
}
}
In the filter block you put the conditions that narrow things down, and the multi_match query is then applied to the filtered data. The main reason I put age in the filter is that it needs no free-text search; you only compare it with a static value. You can add more conditions to the must block of the filter.
You can also look into this article, which might give you some overview.
https://googleweblight.com/?lite_url=https://www.elastic.co/blog/found-optimizing-elasticsearch-searches&ei=EBaRAJDx&lc=en-IN&s=1&m=75&host=www.google.co.in&ts=1465153335&sig=APY536wHUUfGEjoafiVIzGx2H77aieiymw
Hope it helps!

Elasticsearch - Rolling up "other" results

I'm trying to roll up some of my 'other' results using Elasticsearch. Ideally, I'd like my query to return the top N hits and then roll the rest of the data up into an (N+1)th hit titled "Other".
So for example, if I'm trying to aggregate "Institutions by Total Value", I'd get back 10 Institutions with the most value and then the total aggregated value of the other institutions as another record. The purpose is that I'd like to see the total value aggregated across all institutions but not have to list thousands.
An example search I've been using is:
GET my_index/institution/_search?pretty=true
{
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
... terms queries ...
]
}
}
}
},
"aggs": {
"dimension_type_name_agg": {
"terms": {
"field": "institution_name",
"order": {
"metric_sum_total_value_agg": "desc"
},
"size": 0
},
"aggs": {
"metric_sum_total_value_agg": {
"sum": {
"field": "total_value"
}
},
"metric_count_account_id_agg": {
"value_count": {
"field": "institution_id"
}
}
}
}
}
}
I'm curious as to if this can be done by modifying a query like the one given above. Also, I'm using C# and Nest/Elasticsearch.NET so any tips on how this translates to that side is appreciated as well.
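One possible approach, sketched in Python rather than C#/NEST and not verified against a cluster: in the Elasticsearch versions that accept "size": 0 on a terms aggregation, the query above already returns every bucket, so the roll-up could be done on the client after the response arrives. The function name and the (name, value) pair format are assumptions for illustration; the real buckets would have to be mapped into that shape first.

```python
def roll_up_other(buckets, top_n, label="Other"):
    """Keep the top_n buckets by value and sum the remainder into one extra row.

    `buckets` is a list of (name, total_value) pairs.
    """
    ranked = sorted(buckets, key=lambda bucket: bucket[1], reverse=True)
    top, rest = ranked[:top_n], ranked[top_n:]
    if rest:
        top.append((label, sum(value for _, value in rest)))
    return top


# Made-up institutions and totals.
buckets = [("Alpha", 5), ("Beta", 50), ("Gamma", 20), ("Delta", 1)]
print(roll_up_other(buckets, 2))  # [('Beta', 50), ('Gamma', 20), ('Other', 6)]
```

The grand total is preserved (50 + 20 + 6 = 76), so the rolled-up view still shows the total value across all institutions.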

Elastic search : Match query with analyzer is not working

My goal is to search for a word irrespective of the analyzer configured on the field.
I used a match query with the keyword analyzer, but I think it still works with the analyzer configured on that property.
In Elasticsearch, my author document looks like this:
"_source": {
"Id": 3,
"Organization": "let123"
}
Index mapping :
createIndexDescriptor.NumberOfReplicas(1)
.NumberOfShards(1)
.Settings(
settings =>
settings
.Add("analysis.filter.autocomplete_filter_ngram.type", "edge_ngram")
.Add("analysis.filter.autocomplete_filter_ngram.min_gram", "2")
.Add("analysis.filter.autocomplete_filter_ngram.max_gram", "7")
.Add("analysis.analyzer.title_analyzer.type", "custom")
.Add("analysis.analyzer.title_analyzer.char_filter.0", "html_strip")
.Add("analysis.analyzer.title_analyzer.tokenizer", "standard")
.Add("analysis.analyzer.title_analyzer.filter.0", "lowercase")
.Add("analysis.analyzer.title_analyzer.filter.1", "asciifolding")
.Add("analysis.analyzer.title_analyzer.filter.2", "autocomplete_filter_ngram"))
.AddMapping<Author>(
m =>
m.MapFromAttributes()
.AllField(f => f.Enabled(true))
.Properties(
props =>
props.MultiField(
mf =>
mf.Name(t => t.Organization)
.Fields(fs => fs.String(s => s.Name(t => t.Organization).Analyzer("title_analyzer"))
))));
Note that one of the filters in my title analyzer is an ngram filter (edge_ngram).
But I used the keyword analyzer in my match query to avoid autocomplete-style matching in my search.
GET /author/_search
{
"query": {
"match": {
"Organization": {
"query": "le",
"analyzer": "keyword"
}
}
}
}
But when I searched, the above document was matched.
What I am expecting is only documents whose Organization has the exact value 'le'.
Why is this matched? Any idea how to achieve my goal?
By specifying the analyser in the query you are instructing Elasticsearch how to analyse the query you've sent.
For example:
GET /author/_search
{
"query": {
"match": {
"Organization": {
"query": "le",
"analyzer": "keyword"
}
}
}
}
This tells Elasticsearch to use the keyword analyser on the le string. It doesn't affect the indexed terms that have already been created from your stored data (let123).
The only way to change how stored data is analysed is to update your mapping and re-index your data.
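A small Python model makes the mismatch visible. It assumes the title_analyzer from the question turns let123 into one lowercased token and then into edge n-grams of length 2 to 7; the function names are invented for this sketch.

```python
def index_terms(value, min_gram=2, max_gram=7):
    """Terms stored at index time: lowercase, then edge n-grams."""
    token = value.lower()
    longest = min(max_gram, len(token))
    return {token[:n] for n in range(min_gram, longest + 1)}


def keyword_analyze(query):
    """The keyword analyzer emits the whole query text as a single term."""
    return query


indexed = index_terms("let123")
print(sorted(indexed))                   # ['le', 'let', 'let1', 'let12', 'let123']
print(keyword_analyze("le") in indexed)  # True: 'le' is one of the stored terms
```

The keyword analyzer leaves the query text untouched, but le was already written to the index as a prefix of let123, so the document still matches.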
Multifields
It's not possible to have multiple analyzers against the same field but data can instead be easily stored in multiple fields (each having a single analyser).
for example:
{
"tweet" : {
"properties" : {
"name" : {
"type" : "string",
"index" : "analyzed",
"fields" : {
"raw" : {"type" : "string", "index" : "not_analyzed"}
}
}
}
}
}
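The name data is automatically stored in two places: in the field name (where it is analysed) and in name.raw (where no analysis takes place). See Multi Fields.
Assuming Organization is mapped the same way, with a not_analyzed raw sub-field, a term query on that sub-field matches only the exact value:
GET /author/_search
{
"query": {
"term": {
"Organization.raw": "le"
}
}
}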
