Make Elasticsearch diacritics insensitive


I am using Elasticsearch 6.6.0 and NEST in a .NET MVC project.
I am indexing some products using this code:
var esSettings = new ConnectionSettings(node);
esSettings = esSettings.DefaultIndex(IndexInstanceName);
esSettings = esSettings
.DefaultMappingFor<SearchableProduct>(s => s.IdProperty("Id").IndexName(IndexInstanceName + "-products-" + ConfigurationManager.AppSettings["DefaultCulture"]));
var elastic = new ElasticClient(esSettings);
var mapResponse = elastic.Map<SearchableProduct>(x => x.AutoMap().Index(IndexInstanceName + "-products-" + culture));
var indexState = new IndexState
{
Settings = new IndexSettings()
};
indexState.Settings.Analysis = new Analysis
{
Analyzers = new Analyzers()
};
indexState.Settings.Analysis.Analyzers.Add("nospecialchars", new CustomAnalyzer
{
Tokenizer = "standard",
Filter = new List<string> { "standard", "lowercase", "stop", "asciifolding" }
});
//products
if (!elastic.IndexExists(IndexInstanceName + "-products-" + culture).Exists)
{
var response = elastic.CreateIndex(
IndexInstanceName + "-products-" + culture,
s => s.InitializeUsing(indexState)
.Mappings(m => m.Map<SearchableProduct>(sc => sc.AutoMap())));
}
await this.IndexProductsAsync(context, products, elastic, culture);
await elastic.RefreshAsync(new RefreshRequest(IndexInstanceName + "-products-" + culture));
For the search I use the code below:
ISearchResponse<SearchableProduct> result = await elastic.SearchAsync<SearchableProduct>(s => s
.Index(elasticIndexName + "-products-" + culture)
.Take(DefaultPageSize)
.Source(src => src.IncludeAll())
.Query(query =>
query.QueryString(qs =>
qs.Query(q).DefaultOperator(Operator.And).Fuzziness(Fuzziness.EditDistance(0)).Fields(x => x.Field(d => d.Name, 2)
.Field(d => d.MetaTitle, 1)
.Field(d => d.Image, 1)
.Field(d => d.SystemId, 2)
.Field(d => d.Manufacturer, 1)
)
))
.Sort(d => d.Ascending(SortSpecialField.Score))
);
When I search for a Greek word with an accent (e.g. παγωτό) I get results, because the product is indexed with the accent. When I search for the same word without the accent (e.g. παγωτο) I get no results.
Is anything wrong with the indexing settings or the search code?
Can I index my data without accents, or alternatively index it as is but make the search or the index accent-insensitive?

Creating a field with the greek analyzer ensures that the indexed text and the query string go through the same analysis. For παγωτό, that means the text is reduced to the token παγωτ both at index time and when the query is analyzed.
Please check my example below, which maps a field with the greek analyzer; it returns both documents (παγωτό and παγωτο) whether you search for παγωτό or παγωτο.
using System;
using System.Text;
using System.Threading.Tasks;
using Elasticsearch.Net;
using Nest;

class Program
{
    static async Task Main(string[] args)
    {
        var connectionPool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
        var settings = new ConnectionSettings(connectionPool)
            .DefaultIndex("index_name")
            .DisableDirectStreaming()
            .PrettyJson();
        var client = new ElasticClient(settings);

        // Start from a clean index so the mapping below is always applied.
        await client.Indices.DeleteAsync("index_name");
        var createIndexResponse = await client.Indices.CreateAsync("index_name",
            c => c
                .Map(map => map.AutoMap<Document>()));

        // One document with the accent and one without.
        await client.IndexManyAsync(new[]
        {
            new Document {Id = 1, Text = "παγωτό"},
            new Document {Id = 2, Text = "παγωτο"},
        });
        await client.Indices.RefreshAsync();

        // Search with the accented form.
        var query = "παγωτό";
        var searchResponse = await client.SearchAsync<Document>(s => s
            .Query(q => q.Match(m => m.Field(f => f.Text).Query(query))));
        Console.OutputEncoding = Encoding.UTF8;
        Print(query, searchResponse);

        // Search with the unaccented form.
        query = "παγωτο";
        var searchResponse2 = await client.SearchAsync<Document>(s => s
            .Query(q => q.Match(m => m.Field(f => f.Text).Query(query))));
        Print(query, searchResponse2);
    }

    private static void Print(string query, ISearchResponse<Document> searchResponse)
    {
        Console.WriteLine($"For {query} found:");
        foreach (var document in searchResponse.Documents)
        {
            Console.WriteLine($"Document {document.Id} {document.Text}");
        }
    }
}

public class Document
{
    public int Id { get; set; }

    // The greek analyzer is applied at index time and, by default, at search time.
    [Text(Analyzer = "greek")]
    public string Text { get; set; }
}
Prints:
For παγωτό found:
Document 1 παγωτό
Document 2 παγωτο
For παγωτο found:
Document 1 παγωτό
Document 2 παγωτο
Hope that helps.
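If you would rather normalize the text in the application before indexing and querying (the "index my data without accents" option raised in the question), a minimal sketch could look like the following. It uses only the .NET base class library; `DiacriticsDemo` and `StripDiacritics` are hypothetical names, not part of NEST, and the same function would have to be applied to both the indexed text and the query string for the two to match.

```csharp
using System.Globalization;
using System.Text;

// Hypothetical helper, not part of NEST or Elasticsearch.
public static class DiacriticsDemo
{
    // Decomposes each character (e.g. "ό" becomes "ο" plus a combining accent),
    // drops the combining marks, and recomposes what is left.
    public static string StripDiacritics(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return text;
        }

        var decomposed = text.Normalize(NormalizationForm.FormD);
        var builder = new StringBuilder(decomposed.Length);
        foreach (var ch in decomposed)
        {
            // NonSpacingMark covers combining accents such as U+0301.
            if (CharUnicodeInfo.GetUnicodeCategory(ch) != UnicodeCategory.NonSpacingMark)
            {
                builder.Append(ch);
            }
        }

        return builder.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

For example, `DiacriticsDemo.StripDiacritics("παγωτό")` returns `"παγωτο"`. Unlike the greek analyzer, this only removes accents; it does not lowercase or stem, so it is a complement to server-side analysis rather than a replacement for it.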

Related

Elasticsearch Nest client Search

I am trying to compose a dynamic query with NEST (the Elasticsearch library for .NET 5.0), but only the first case works:
Case 1 (works):
var response1 = await client.SearchAsync<VideoManifestElasticDto>(s =>
s.Query(q => q
.Bool(b => b
.Must(mu => mu
.Wildcard(f => f.Title, '*' + dtoSearch.Title + '*')
))));
var aaa1 = response1.Documents;
This returns 4 documents, as expected.
Case 2 (does not work):
var response2 = await client.SearchAsync<VideoManifestElasticDto>(s =>
s.Query(q => q
.Bool(b => b
.Must(mu => new WildcardQuery() { Field = nameof(VideoManifestElasticDto.Title), CaseInsensitive = true, Value = '*' + dtoSearch.Title + '*' }
))));
var aaa2 = response2.Documents;
This returns 0 documents. Why?
Case 3 (does not work):
The last case is my actual goal: I want to create a dynamic query.
var response3 = await client.SearchAsync<VideoManifestElasticDto>(Blah(dtoSearch));
var aaa3 = response3.Documents;
public static SearchDescriptor<VideoManifestElasticDto> Blah(VideoManifestElasticDto videoManifestElasticDto)
{
return new SearchDescriptor<VideoManifestElasticDto>().Query(b => b.Bool( c => c.Must(Orso(videoManifestElasticDto))));
}
public static QueryContainer[] Orso(VideoManifestElasticDto videoManifestElasticDto)
{
List<QueryContainer> queryContainerList = new List<QueryContainer>();
if (videoManifestElasticDto == null)
{
return queryContainerList.ToArray();
}
if (!string.IsNullOrWhiteSpace(videoManifestElasticDto.Title))
{
var orQuery = new WildcardQuery() { Field = nameof(VideoManifestElasticDto.Title), CaseInsensitive = true, Value = '*' + videoManifestElasticDto.Title + '*' };
queryContainerList.Add(orQuery);
}
else if (!string.IsNullOrWhiteSpace(videoManifestElasticDto.Description))
{
var orQuery = new MatchQuery() { Field = "Description", Query = videoManifestElasticDto.Description };
queryContainerList.Add(orQuery);
}
else if (!string.IsNullOrWhiteSpace(videoManifestElasticDto.VideoId))
{
var orQuery = new MatchQuery() { Field = "VideoId", Query = videoManifestElasticDto.VideoId };
queryContainerList.Add(orQuery);
}
return queryContainerList.ToArray();
}

How to write the equivalent query in NEST,C# for the date_histogram by week

I need to convert the following query to C# using NEST.
"aggs": {
    "number_of_weeks": {
        "date_histogram": {
            "field": "@timestamp",
            "interval": "week"
        }
    }
}
I wrote the following query, but it gives me zero buckets, while in Kibana the same aggregation returns many buckets:
var query3 = EsClient.Search<doc>(q => q
    .Index("SomeIndex")
    .Size(0)
    .Aggregations(agg => agg
        .DateHistogram("group_by_week", e => e
            .Field(p => p.timestamp)
            .Interval(DateInterval.Week))));
var resultquery3 = query3.Aggregations.DateHistogram("group_by_week");

The problem is likely that
e => e.Field(p => p.timestamp)
does not serialize to the "@timestamp" field in Elasticsearch. For this to work, you would need to either map it with an attribute on the POCO
public class Doc
{
    [Date(Name = "@timestamp")]
    public DateTime timestamp { get; set; }
}
or map it on ConnectionSettings
var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
var settings = new ConnectionSettings(pool)
    .DefaultMappingFor<Doc>(m => m
        .PropertyName(e => e.timestamp, "@timestamp")
    );
var client = new ElasticClient(settings);
Alternatively, you can simply pass a string to .Field(), which implicitly converts to a field name
.Field("@timestamp")

How to get the first N result from google search using C#?

How can I get the first N results from a Google search using C#?
using (var webclient = new WebClient())
{
const string url = "https://www.google.com.au/search?num=100&q=my+search+term";
var result = webclient.DownloadString(url);
}
Update:
How can I find where, and how many times, a specific URL appears in the results?

The following loads the first 100 results for the search 'my search term' (using HtmlAgilityPack's HtmlWeb) and prints the positions at which a specified target, 'mytarget', appears:
internal class Program
{
private const string Url = "http://www.google.com/search?num=100&q=my+search+term";
private static void Main(string[] args)
{
var result = new HtmlWeb().Load(Url);
var nodes = result.DocumentNode.SelectNodes("//html//body//div[@class='g']");
var indexes = nodes == null
? new List<int> { 0 }
: nodes.Select((x, i) => new { i, x.InnerHtml })
.Where(x => x.InnerHtml.Contains("mytarget"))
.Select(x => x.i + 1)
.ToList();
Console.WriteLine(String.Join(", ", indexes));
Console.ReadLine();
}
}
Another way to do it, using a regex:
string html;
using (var webClient = new WebClient())
{
html = webClient.DownloadString(searchUrl);
}
var regex = new Regex("<div class=\"g\">(.*?)</div>");
var matches = regex.Matches(html).Cast<Match>().ToList();
var indexes = matches.Select((x, i) => new { i, x })
.Where(x => x.ToString().Contains("mytarget"))
.Select(x => x.i + 1)
.ToList();
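The regex approach above can be wrapped in a small function and tried against a fixed HTML string, which also answers the "how many times" part of the update. This is only a sketch: `ResultPositions` and `FindPositions` are hypothetical names, `RegexOptions.Singleline` is added here so a result block may span several lines, and the `<div class="g">` markup is taken from the answer as an assumption about the page. Nested `div` elements inside a result would end the lazy match early.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class ResultPositions
{
    // Returns the 1-based positions of the result blocks that contain the target text.
    public static List<int> FindPositions(string html, string target)
    {
        // Singleline lets "." match newlines inside a result block.
        var regex = new Regex("<div class=\"g\">(.*?)</div>", RegexOptions.Singleline);
        return regex.Matches(html)
            .Cast<Match>()
            .Select((match, i) => new { Position = i + 1, match.Value })
            .Where(x => x.Value.Contains(target))
            .Select(x => x.Position)
            .ToList();
    }
}
```

The number of appearances is then the `Count` of the returned list, and the list itself gives the positions.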

Elasticsearch nest can't find filter

I'm trying to configure my index with certain mappings and filters, but whenever I try to create the index I get the following error:
"[amgindex] failed to create index]; nested: IllegalArgumentException[Custom Analyzer [amgsearch] failed to find filter under name [synonym]"
This is the code I'm using to create the index:
public void newIndex() {
var amgBasic = new CustomAnalyzer {
Tokenizer = "edgeNGram",
Filter = new string[] { "lowercase", "worddelimiter", "stemmerEng", "stemmerNl", "stopper", "snowball" } //
};
var amgBasicText = new CustomAnalyzer {
Tokenizer = "standard",
Filter = new string[] { "lowercase", "worddelimiter" }
};
var amgSearch = new CustomAnalyzer {
Tokenizer = "whitespace",
Filter = new string[] { "lowercase", "synonym" }
};
var synonmyfilter = new SynonymTokenFilter() {
Format = "Solr",
SynonymsPath = "analysis/synonym.txt"
};
try {
var result = client.CreateIndex("amgindex", i => i
.Analysis(descriptor => descriptor
.Analyzers(bases => bases
//.Add("amgBasic", amgBasic)
//.Add("amgBasicText", amgBasicText)
.Add("amgsearch", amgSearch)
)
.TokenFilters(c => c.Add("stemmereng", new StemmerTokenFilter() { Language = "english" }))
.TokenFilters(c => c.Add("stemmernl", new StemmerTokenFilter() { Language = "english" }))
.TokenFilters(c => c.Add("stopper", new StopTokenFilter() { Stopwords = new List<string>() { "_english_", "_dutch_" } }))
.TokenFilters(c => c.Add("snowball", new SnowballTokenFilter() { Language = "english" }))
.TokenFilters(c => c.Add("worddelimiter ", new WordDelimiterTokenFilter() { }))
.TokenFilters(c => c.Add("synonym ", synonmyfilter))
)
.AddMapping<general_document>(m => m
.Properties(o => o
.String(p => p.Name(x => x.object_name).IndexAnalyzer("amgSearch"))
.String(p => p.Name(x => x.title).IndexAnalyzer("amgSearch"))
.String(p => p.Name(x => x.Text).IndexAnalyzer("amgSearch"))
)
)
);
Log.Info("Index created? " + result.Acknowledged);
} catch (Exception ex) {
Log.Error("[index-creation] " + ex.Message);
throw;
}
}
Every time I use one of my own filters, the error pops up.
Any clue why this is happening?

After spending way too much time on this, I found it! :)
There was a trailing space in the names of two filters ("worddelimiter " and "synonym "), and that was causing the issue:
.TokenFilters(c => c.Add("worddelimiter ", new WordDelimiterTokenFilter() { }))
.TokenFilters(c => c.Add("synonym ", synonmyfilter))
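A cheap guard against this kind of mistake is to check the filter names in plain C# before building the index request. The sketch below is independent of NEST; `FilterNameCheck` and its methods are hypothetical names, and the list of defined names is assumed to include any built-in filters (such as lowercase) that the analyzers reference.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of NEST.
public static class FilterNameCheck
{
    // Returns the defined filter names that have leading or trailing whitespace.
    public static List<string> FindPaddedNames(IEnumerable<string> defined)
    {
        return defined.Where(name => name != name.Trim()).ToList();
    }

    // Returns the referenced names that have no exact (ordinal) match among the
    // defined names, so "synonym" is reported when only "synonym " is defined.
    public static List<string> FindMissingFilters(IEnumerable<string> referenced, IEnumerable<string> defined)
    {
        var definedSet = new HashSet<string>(defined, StringComparer.Ordinal);
        return referenced.Where(name => !definedSet.Contains(name)).ToList();
    }
}
```

With the names from the question, `FindPaddedNames` would report "worddelimiter " and "synonym ", and `FindMissingFilters` would report "synonym" for the amgsearch analyzer, which is the filter named in the error message.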

Elasticsearch.NET NEST Object Initializer syntax for a highlight request

I've got:
var result = _client.Search<ElasticFilm>(new SearchRequest("blaindex", "blatype")
{
From = 0,
Size = 100,
Query = titleQuery || pdfQuery,
Source = new SourceFilter
{
Include = new []
{
Property.Path<ElasticFilm>(p => p.Url),
Property.Path<ElasticFilm>(p => p.Title),
Property.Path<ElasticFilm>(p => p.Language),
Property.Path<ElasticFilm>(p => p.Details),
Property.Path<ElasticFilm>(p => p.Id)
}
},
Timeout = "20000"
});
And I'm trying to add a highlighter filter but I'm not that familiar with the Object Initializer (OIS) C# syntax. I've checked NEST official pages and SO but can't seem to return any results for specifically the (OIS).
I can see the Highlight property in the Nest.SearchRequest class but I'm not experienced enough (I guess) to simply construct what I need from there - some examples and explanations as to how to employ a highlighter with OIS would be hot!

This is the fluent syntax:
var response= client.Search<Document>(s => s
.Query(q => q.Match(m => m.OnField(f => f.Name).Query("test")))
.Highlight(h => h.OnFields(fields => fields.OnField(f => f.Name).PreTags("<tag>").PostTags("</tag>"))));
and this is by object initialization:
var searchRequest = new SearchRequest
{
Query = new QueryContainer(new MatchQuery{Field = Property.Path<Document>(p => p.Name), Query = "test"}),
Highlight = new HighlightRequest
{
Fields = new FluentDictionary<PropertyPathMarker, IHighlightField>
{
{
Property.Path<Document>(p => p.Name),
new HighlightField {PreTags = new List<string> {"<tag>"}, PostTags = new List<string> {"</tag>"}}
}
}
}
};
var searchResponse = client.Search<Document>(searchRequest);
UPDATE
NEST 7.x syntax:
var searchQuery = new SearchRequest
{
Highlight = new Highlight
{
Fields = new FluentDictionary<Field, IHighlightField>()
.Add(Nest.Infer.Field<Document>(d => d.Name),
new HighlightField {PreTags = new[] {"<tag>"}, PostTags = new[] {"</tag>"}})
}
};
My document class:
public class Document
{
public int Id { get; set; }
public string Name { get; set; }
}
