Turkish character problem in elasticsearch - c#

When I search in Elasticsearch using Turkish characters, nothing matches. For example, searching for "yazilim" returns results, but searching for "Yazılım" returns none, even though "Yazılım" is the correct spelling.
My index creation code:
var createIndexDescriptor = new CreateIndexDescriptor(INDEX_NAME).Mappings(ms => ms.Map<T>(m => m.AutoMap()
.Properties(pprops => pprops
.Text(ps => ps
.Name("Title")
.Fielddata(true)
.Fields(f => f
.Keyword(k => k
.Name("keyword")))))
)).Settings(st => st
.Analysis(an => an
.Analyzers(anz => anz
.Custom("tab_delim_analyzer", td => td
.Filters("lowercase", "asciifolding")
.Tokenizer("standard")
)
)
)
);
My search query code:
var searchResponse = eClient.Search<GlobalCompany>(s => s.Index(INDEX_NAME).From(0).Size(10)
.Query(q => q
.MultiMatch(m => m
.Fields(f => f
.Field(u => u.Title)
.Field(u => u.RegisterNumber))
.Type(TextQueryType.PhrasePrefix)
.Query(value))));

You are using an asciifolding filter, which converts non-ASCII characters to their ASCII equivalents where one exists (see docs).

For that filter to take effect, Title must be mapped as a text field rather than a keyword field, with its analyzer set to tab_delim_analyzer.
I don't know how to translate this to .NET, but here is what I mean as a Kibana Dev Console script:
DELETE deneme
PUT deneme
{
  "settings": {
    "analysis": {
      "analyzer": {
        "tab_delim_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "Title": {
        "type": "text",
        "analyzer": "tab_delim_analyzer"
      }
    }
  }
}
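For readers who want the same thing in NEST, here is a minimal sketch that mirrors the descriptor from the question and adds the analyzer to the Title field. The GlobalCompany class shown here is a hypothetical stand-in (the real class is not shown in the question), CompanyIndex and Describe are made-up names, and the 6.x-style Mappings/Map<T> calls are an assumption based on the question's code.

using Nest;

// Hypothetical stand-in for the asker's GlobalCompany class.
public class GlobalCompany
{
    public string Title { get; set; }
    public string RegisterNumber { get; set; }
}

public static class CompanyIndex
{
    // Builds the index definition from the question, with the analyzer applied to Title.
    public static CreateIndexDescriptor Describe(string indexName)
    {
        return new CreateIndexDescriptor(indexName)
            .Settings(st => st
                .Analysis(an => an
                    .Analyzers(anz => anz
                        .Custom("tab_delim_analyzer", td => td
                            .Tokenizer("standard")
                            .Filters("lowercase", "asciifolding")))))
            .Mappings(ms => ms
                .Map<GlobalCompany>(m => m
                    .AutoMap()
                    .Properties(p => p
                        .Text(t => t
                            // Inferring the name from the property keeps it consistent with
                            // .Field(u => u.Title) in the search query (NEST camel-cases
                            // inferred field names by default, so this resolves to "title").
                            .Name(n => n.Title)
                            .Fielddata(true)
                            // The setting the original mapping was missing:
                            .Analyzer("tab_delim_analyzer")
                            .Fields(f => f
                                .Keyword(k => k.Name("keyword")))))));
    }
}

Because the analyzer is applied at index time, documents that were indexed under the old mapping need to be reindexed before "Yazılım" will match.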

Related

Dynamic Elastic search query in c# NEST

I started working with the NEST API for Elasticsearch recently and got stuck on the following query. The values matched against data.e are populated dynamically from the client's input in an HttpGet request.
For example, if the user sends eventA,eventB,eventC, we would add these to the should part:
GET events/_search
{
  "_source": false,
  "query": {
    "bool": {
      "must": [
        {"range": {
          "timestamp": {
            "gte": 1604684158527,
            "lte": 1604684958731
          }
        }},
        {"nested": {
          "path": "data",
          "query": {
            "bool": {
              "should": [
                {"match": {
                  "data.e": "eventA"
                }},
                {"match": {
                  "data.e": "eventB"
                }},
                {"match": {
                  "data.e": "eventC"
                }}
              ]
            }
          },
          "inner_hits": {}
        }}
      ]
    }
  }
}
This is what I have come up with so far:
var graphDataSearch = _esClient.Search<Events>(s => s
.Source(src => src
.Includes(i => i
.Field("timestamp")
)
)
.Query(q => q
.Bool(b => b
.Must(m => m
.Range(r => r
.Field("timestamp")
.GreaterThanOrEquals(startTime)
.LessThanOrEquals(stopTime)
),
m => m
.Nested(n => n
.Path("data")
.Query(q => q
.Bool(bo => bo
.Should(
// what to add here?
)
)
)
)
)
));
Can someone please show me how to build the should part dynamically, based on the input the user sends?
Thanks.
You can replace the nested query in the above snippet as shown below:
// You may modify the parameters of this method as per your needs to reflect user input
// Field can be hardcoded as shown here or can be fetched from Event type as below
// m.Field(f => f.Data.e)
public static QueryContainer Blah(params string[] param)
{
    return new QueryContainerDescriptor<Events>().Bool(
        b => b.Should(
            s => s.Match(m => m.Field("field1").Query(param[0])),
            s => s.Match(m => m.Field("field2").Query(param[1])),
            s => s.Match(m => m.Field("field3").Query(param[2]))));
}
What we are essentially doing here is returning a QueryContainer object that is passed to the nested query:
.Query(q => Blah(<your parameters>))
The same can be done inline, without a separate method. You may choose whichever route you prefer. In general, though, a separate method improves readability and keeps things cleaner.
You can read more about Match usage here
Edit:
Since you want to add the match queries dynamically, here is one way to do it:
private static QueryContainer[] InnerBlah(string field, string[] param)
{
    QueryContainer orQuery = null;
    List<QueryContainer> queryContainerList = new List<QueryContainer>();
    foreach (var item in param)
    {
        // One match clause per user-supplied value
        orQuery = new MatchQuery() {Field = field, Query = item};
        queryContainerList.Add(orQuery);
    }
    return queryContainerList.ToArray();
}
Now, call this method from inside the method above, as shown below:
public static QueryContainer Blah(params string[] param)
{
    return new QueryContainerDescriptor<Events>().Bool(
        b => b.Should(
            InnerBlah("field", param)));
}
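To tie this back to the JSON in the question, here is a minimal sketch that builds the whole nested clause (path, inner_hits, and one match on data.e per event name) with NEST's object initializer syntax. The helper name BuildNestedEventQuery and the class EventQueries are made up for illustration, and the empty Events class is only a placeholder for the asker's own type.

using System.Collections.Generic;
using System.Linq;
using Nest;

// Placeholder for the asker's document type; its real shape is not shown in the question.
public class Events { }

public static class EventQueries
{
    // Hypothetical helper: one match clause on data.e per requested event name,
    // wrapped in a nested query on "data" with inner_hits enabled.
    public static QueryContainer BuildNestedEventQuery(IEnumerable<string> eventNames)
    {
        QueryContainer[] shouldClauses = eventNames
            .Select(name => (QueryContainer)new MatchQuery { Field = "data.e", Query = name })
            .ToArray();

        return new NestedQuery
        {
            Path = "data",
            InnerHits = new InnerHits(),
            Query = new BoolQuery { Should = shouldClauses }
        };
    }
}

With this in place, the second Must clause in the question's search could become m => EventQueries.BuildNestedEventQuery(names), where names is whatever collection the HttpGet input was split into.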

Nest Elasticsearch wildcard query works as querystring but not with fluent API

I have about a hundred test documents in my index, built using NBuilder:
[
{
"title" : "Title1",
"text" : "Text1"
},
{
"title" : "Title2",
"text" : "Text2"
},
{
"title" : "Title3",
"text" : "Text3"
}
]
I want to query them with a wildcard to find all items whose "text" starts with "Text". But when I use two different wildcard approaches in NEST, I get two different results.
var response = await client.SearchAsync<FakeFile>(s => s.Query(q => q
.QueryString(d => d.Query("text:Text*")))
.From((page - 1) * pageSize)
.Size(pageSize));
This returns 100 results. However, I would like to use the fluent API rather than a query string.
var response = await client.SearchAsync<FakeFile>(s => s
.Query(q => q
.Wildcard(c => c
.Field(f => f.Text)
.Value("Text*"))));
This returns 0 results. I'm new to Elasticsearch. I've tried to make the example as simple as possible to make sure I understand it piece by piece. I don't know why the second query returns nothing. Please help.
Assuming your text field is of type text, Elasticsearch analyzes the value at index time and stores Text1 as text1 in the inverted index. The same analysis is applied to the input of a query string query, but not to the input of a wildcard query.
.QueryString(d => d.Query("text:Text*")) looks for text*, whereas .Wildcard(c => c.Field(f => f.Text).Value("Text*")) looks for Text*, and Elasticsearch stores only the lowercase form.
Hope that helps.
Suppose your mapping looks like this:
{
"mappings": {
"doc": {
"properties": {
"title": {
"type": "text"
},
"text":{
"type": "text"
}
}
}
}
}
Try this (Value should be in lowercase):
var response = await client.SearchAsync<FakeFile>(s => s
.Query(q => q
.Wildcard(c => c
.Field(f => f.Text)
.Value("text*"))));
Or this (I don't know whether f.Text resolves to the text field):
var response = await client.SearchAsync<FakeFile>(s => s
.Query(q => q
.Wildcard(c => c
.Field("text")
.Value("text*"))));
Kibana syntax:
GET index/_search
{
"query": {
"wildcard": {
"text": {
"value": "text*"
}
}
}
}
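When the prefix comes from user input rather than a literal, one option is to lowercase it before building the wildcard pattern. The sketch below assumes the field is analyzed with the default standard analyzer, so lowercasing only approximates what the analyzer does; the class and method names (WildcardQueries, StartsWith) are made up for illustration.

using Nest;

public static class WildcardQueries
{
    // Hypothetical helper: lowercases the user's prefix so it can match the
    // lowercased terms that the standard analyzer stores in the index.
    public static QueryContainer StartsWith(string field, string userInput)
    {
        string pattern = userInput.ToLowerInvariant() + "*";
        return new WildcardQuery { Field = field, Value = pattern };
    }
}

It could then be used as .Query(q => WildcardQueries.StartsWith("text", "Text")) in the search call above.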

Elasticsearch can't read ScriptedMetric from BucketScript?

It should work, since the ScriptedMetric is a metric and it does return a single numeric value, but I can't get it to work.
I'm using NEST (5.5.0 via NuGet, targeting Elasticsearch 6.0.0) in C#. I also tried building the same query in Kibana to rule out an issue with NEST, and I get exactly the same error there.
The error:
buckets_path must reference either a number value or a single value numeric metric aggregation, got: org.elasticsearch.search.aggregations.metrics.scripted.InternalScriptedMetric
My code:
ISearchResponse<LogItem> aggregationResponse = await client.SearchAsync<LogItem>(s => s
.Size(0)
.Type("errordoc")
.Query(...)
.Aggregations(a => a
.Terms("Hash", st => st
.Field(o => o.messageHash.Suffix("keyword")).OrderDescending("Avg-Score")
.Aggregations(aa => aa
.Terms("Friendly", ff => ff
.Field(oo => oo.friendly.Suffix("keyword"))
)
.Max("Max-DateTime", ff => ff
.Field(oo => oo.dateTimeStamp)
)
.Average("Avg-Score", sc => sc
.Script("_score")
)
.ScriptedMetric("Urgency-Level", sm => sm
.InitScript(i => i.Inline("params._agg.data = []").Lang("painless"))
.MapScript(i => i.Inline("params._agg.data.add(doc.urgency.value)").Lang("painless"))
.CombineScript(i => i.Inline("int urgency = 0; for (u in params._agg.data) { urgency += u } return urgency").Lang("painless"))
.ReduceScript(i =>i.Inline("int urgency = 0; for (a in params._aggs) { urgency += a } return urgency").Lang("painless"))
)
.BucketScript("finalScore", scb => scb
.BucketsPath(bp => bp
.Add("maxDateTime", "Max-DateTime")
.Add("avgScore", "Avg-Score")
.Add("urgencyLevel", "Urgency-Level")
)
.Script(i => i.Inline("params.avgScore").Lang("painless"))
)
)
)
)
);
The ScriptedMetric aggregation is returning an 11 with my data set.
Am I doing something wrong? Or is this not possible? If it is not possible, what would be an alternative?
Also, I know this ScriptedMetric does pretty much do what a Sum would do, but that's going to change of course...
Index mapping:
PUT /live
{
"mappings": {
"errordoc": {
"properties": {
"urgency": {
"type": "integer"
},
"dateTimeStamp": {
"type": "date",
"format": "MM/dd/yyyy hh:mm:ss a"
} }
}
}
}
Test data:
POST /live/errordoc
{ "messageID": "M111111", "messageHash": "1463454562\/-1210136287\/-1885530817\/-275007043\/-57589585", "friendly": "0", "urgency": "1", "organisation": "Organisation Name", "Environment": "ENV02", "Task": "TASK01", "Action": "A12", "dateTimeStamp": "11\/29\/2017 10:24:21 AM", "machineName": "DESKTOP-SMOM9R9", "parameters": "{ " }
Copy this document a couple of times, maybe changing the urgency/dateTimeStamp. As long as the hash stays the same, it should reproduce my environment.
Somebody on the Elasticsearch forums replied after I created 2 GitHub issues...
I did try everything, including this solution, but I must have done something else wrong during that try... Well whatever...
The solution
Change:
.Add("urgencyLevel", "Urgency-Level")
to:
.Add("urgencyLevel", "Urgency-Level.value")

Searching for an input keyword in all fields of an elasticsearch document using c# nest client

I have a nested Elasticsearch document and I want to search within all of its fields, i.e., in both the top-level and the nested fields. My index name is people and my type name is person.
My documents look like this :
{
"id": 1,
"fname": "elizabeth",
"mname": "nicolas",
"lname": "thomas",
"houseno": "beijing",
"car": [
{
"carname": "audi",
"carno": 4444,
"color": "black"
},
{
"carname": "mercedez",
"carno": 5555,
"color": "pink"
}
]
}
Then I have the following query in .NET, which searches for a user-supplied keyword in the Elasticsearch documents. Basically, I want to search in each and every field of a document. I use inner_hits in my query so that only the matching nested documents are returned.
I have designed my query as :
var result = client.Search<person>
(s => s
.From(from)
.Size(size)
.Source(false)
.Query(query => query.Filtered(filtered => filtered
.Query(q => q.MatchAll())
.Filter(f => f.Nested(nf => nf
.InnerHits()
.Path(p => p.car)
.Query(qq => qq.Match(m => m.OnField(g => g.car.First().carname).Query(searchKeyword))))))));
And my corresponding JSON query which i use in the head plugin is :
POST-people/person/_search:
{
"_source":false,
"query": {
"filtered": {
"query": {"match_all": {}},
"filter": {
"nested": {
"path": "car",
"filter": {
"term": {
"car.carname": "searchKeyword"
}
},
"inner_hits" : {}
}
}
}
}
}
But I want to search in all the fields (id, fname, mname, lname, houseno, carname, carno, color), not just in a single field such as carname, as in the query above.
I also want to do partial matching, like %xyz%.
How can I do this?
Can anyone help me modify this query so that a single query searches within all the fields and also supports partial matching?
I'm new to both Elasticsearch and .NET, so I would be thankful for any help.
Did you try using Query without the OnField method?
I mean, having your query this way:
var result = client.Search<person>
(s => s
.From(from)
.Size(size)
.Source(false)
.Query(query => query.Filtered(filtered => filtered
.Query(q => q.MatchAll())
.Filter(f => f.Nested(nf => nf
.InnerHits()
.Path(p => p.car)
.Query(qq => qq.Match(m => m.Query(searchKeyword))))))));

MultiMatch query with Nest and Field Suffix

Using Elasticsearch, I have a string field with an .english sub-field that uses the english analyzer, as shown in the following mapping:
...
"valueString": {
"type": "string",
"fields": {
"english": {
"type": "string",
"analyzer": "english"
}
}
}
...
The following query snippet won't compile because ValueString has no English property.
...
sh => sh
.Nested(n => n
.Path(p => p.ScreenData)
.Query(nq => nq
.MultiMatch(mm => mm
.Query(searchPhrase)
.OnFields(
f => f.ScreenData.First().ValueString,
f => f.ScreenData.First().ValueString.english)
.Type(TextQueryType.BestFields)
)
)
)...
Is there a way to strongly type the suffix at query time in NEST or do I have to use magic strings?
Did you try the Suffix extension method?
This is how you can modify your query:
...
.OnFields(
f => f.ScreenData.First().ValueString,
f => f.ScreenData.First().ValueString.Suffix("english"))
.Type(TextQueryType.BestFields)
...
Hope it helps.
