ElasticSearch accent insensitive query with NEST C# client

I'm trying to run an accent-insensitive query in Elasticsearch with the NEST C# client. My data contains Portuguese words with accents. See the code below:
var result = client.Search<Book>(s => s
.From(0)
.Size(20)
.Fields(f => f.Title)
.FacetTerm(f => f.OnField(of => of.Genre))
.Query(q => q.QueryString(qs => qs.Query("sao")))
);
This search did not find anything. My index contains many titles such as "São Cristóvan" and "São Gonçalo".
How can I query "sao" and find "são" using Elasticsearch?
I think I have to create the index with the right properties, but I have already tried many settings, such as:
var settings = new IndexSettings();
settings.NumberOfReplicas = 1;
settings.NumberOfShards = 5;
settings.Analysis.Analyzers.Add("snowball", new Nest.SnowballAnalyzer { Language = "Portuguese" });
var idx5 = client.CreateIndex("idx5", settings);
or in raw mode:
{
"idx" : {
"settings" : {
"index.analysis.filter.jus_stemmer.name" : "brazilian",
"index.analysis.filter.jus_stop._lang_" : "brazilian"
}
}
}
How can I make the search and ignore accents?
Thanks Friends,

See the solution:
Connect to the Elasticsearch server (for example with PuTTY) and execute:
curl -XPOST 'localhost:9200/idx30/_close'
curl -XPUT 'localhost:9200/idx30/_settings' -d '{
"index.analysis.analyzer.default.filter.0": "standard",
"index.analysis.analyzer.default.tokenizer": "standard",
"index.analysis.analyzer.default.filter.1": "lowercase",
"index.analysis.analyzer.default.filter.2": "stop",
"index.analysis.analyzer.default.filter.3": "asciifolding",
"index.number_of_replicas": "1"
}'
curl -XPOST 'localhost:9200/idx30/_open'
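Replace "idx30" with the name of your index. Documents that were indexed before this change keep their old tokens, so reindex them for the new analyzer to take effect.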
Done!
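To see why the asciifolding filter helps: it maps accented characters to ASCII equivalents at both index time and query time, so "são" and "sao" end up as the same token. The sketch below is only a rough illustration of that idea in plain C#. It is not the Lucene implementation, and the AccentFolding class and Fold method are hypothetical names introduced here. It assumes that Unicode decomposition plus removal of combining marks is a close enough approximation for Portuguese text.

```csharp
using System.Globalization;
using System.Text;

public static class AccentFolding
{
    // Rough stand-in for the "lowercase" + "asciifolding" filters:
    // decompose each character, drop the combining marks, then lowercase.
    public static string Fold(string input)
    {
        string decomposed = input.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            // Accents such as the tilde in "ã" become separate NonSpacingMark characters.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(c);
            }
        }
        return sb.ToString().Normalize(NormalizationForm.FormC).ToLowerInvariant();
    }
}
```

With this sketch, AccentFolding.Fold("São Gonçalo") and AccentFolding.Fold("sao goncalo") give the same string, which is the property the analyzer needs for the query "sao" to match the indexed title.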

I stumbled upon this thread because I had the same problem.
Here's the NEST code to create an index with an AsciiFolding Analyzer:
// Create the Client
string indexName = "testindex";
var uri = new Uri("http://localhost:9200");
var settings = new ConnectionSettings(uri).SetDefaultIndex(indexName);
var client = new ElasticClient(settings);
// Create new Index Settings
IndexSettings set = new IndexSettings();
// Create a Custom Analyzer ...
var an = new CustomAnalyzer();
// ... based on the standard Tokenizer
an.Tokenizer = "standard";
// ... with Filters from the StandardAnalyzer
an.Filter = new List<string>();
an.Filter.Add("standard");
an.Filter.Add("lowercase");
an.Filter.Add("stop");
// ... just adding the additional AsciiFoldingFilter at the end
an.Filter.Add("asciifolding");
// Add the Analyzer with a name
set.Analysis.Analyzers.Add("nospecialchars", an);
// Create the Index
client.CreateIndex(indexName, set);
Now you can map your entity to this index (it's important to do this after you have created the index):
client.MapFromAttributes<TestEntity>();
And here's what such an entity could look like:
[ElasticType(Name = "TestEntity", DisableAllField = true)]
public class TestEntity
{
public TestEntity(int id, string desc)
{
ID = id;
Description = desc;
}
public int ID { get; set; }
[ElasticProperty(Analyzer = "nospecialchars")]
public string Description { get; set; }
}
There you go: the Description field is now indexed without accents.
You can test this if you check the Mapping of your index:
http://localhost:9200/testindex/_mapping
Which then should look something like:
{
testindex: {
TestEntity: {
_all: {
enabled: false
},
properties: {
description: {
type: "string",
analyzer: "nospecialchars"
},
iD: {
type: "integer"
}
}
}
}
}
Hope this will help someone.

You'll want to incorporate an ASCII folding filter into your analyzer to accomplish this. That means constructing the snowball analyzer from tokenizers and filters (unless NEST allows you to add filters to non-custom analyzers; Elasticsearch doesn't, as far as I know).
A SnowballAnalyzer incorporates:
StandardTokenizer
StandardFilter
(Add the ASCIIFolding Filter here)
LowercaseFilter
StopFilter (with the appropriate stopword set)
SnowballFilter (with the appropriate language)
(Or maybe here)
I would probably try adding the ASCIIFoldingFilter just before LowercaseFilter, although it might be better to add it as the very last step (after SnowballFilter). Try it both ways and see which works better. I don't know enough about the Portuguese stemmer to say for sure which would be best.

Related

Can I disable stemming / stop words filtering with MongoDB 4 text search, in C#

I would like to use the C# driver in MongoDB to make a full text search.
But I see that when I create the index, I can't select 'none' as a language.
I would like terms to be matched as they are and without removing the stop words either.

Given a type
public class Entity
{
public string Text;
}
You can do this:
var collection = new MongoClient().GetDatabase("test").GetCollection<Entity>("collection");
var indexKeysDefinition = new IndexKeysDefinitionBuilder<Entity>().Text(x => x.Text);
var createIndexOptions = new CreateIndexOptions { DefaultLanguage = "none" };
collection.Indexes.CreateOne(new CreateIndexModel<Entity>(indexKeysDefinition, createIndexOptions));

How to use Addfields in MongoDB C# Aggregation Pipeline

MongoDB's aggregation pipeline has an "AddFields" stage that allows you to add new fields to the pipeline's output documents without knowing what fields already existed.
It seems this has not been included in the C# driver for MongoDB (using version 2.7).
Does anyone know if there are any alternatives to this? Maybe a flag on the "Project" stage?

I'm not sure all the BsonDocument usage is required. Certainly not in this example where I append the textScore of a text search to the search result.
private IAggregateFluent<ProductTypeSearchResult> CreateSearchQuery(string query)
{
FilterDefinition<ProductType> filter = Builders<ProductType>.Filter.Text(query);
return _collection
.Aggregate()
.Match(filter)
.AppendStage<ProductType>("{$addFields: {score: {$meta:'textScore'}}}")
.Sort(Sort)
.Project(pt => new ProductTypeSearchResult
{
Description = pt.ExternalProductTypeDescription,
Id = pt.Id,
Name = pt.Name,
ProductFamilyId = pt.ProductFamilyId,
Url = !string.IsNullOrEmpty(pt.ShopUrl) ? pt.ShopUrl : pt.TypeUrl,
Score = pt.Score
});
}
Note that ProductType does have a Score property defined as
[BsonIgnoreIfNull]
public double Score { get; set; }
It's unfortunate that $addFields is not directly supported and that we have to resort to "magic strings".

As discussed in "Using $addFields in MongoDB Driver for C#", you can build the aggregation stage yourself with a BsonDocument.
The example from https://docs.mongodb.com/manual/reference/operator/aggregation/addFields/
{
$addFields: {
totalHomework: { $sum: "$homework" } ,
totalQuiz: { $sum: "$quiz" }
}
}
would look something like this:
BsonDocument expression = new BsonDocument(new List<BsonElement>() {
new BsonElement("totalHomework", new BsonDocument(new BsonElement("$sum", "$homework"))),
new BsonElement("totalQuiz", new BsonDocument(new BsonElement("$sum", "$quiz")))
});
BsonDocument addFieldsStage = new BsonDocument(new BsonElement("$addFields", expression));
IAggregateFluent<BsonDocument> aggregate = col.Aggregate().AppendStage(addFieldsStage);
expression being the BsonDocument representing
{
totalHomework: { $sum: "$homework" } ,
totalQuiz: { $sum: "$quiz" }
}
You can append additional stages onto the IAggregateFluent object as normal:
IAggregateFluent<BsonDocument> aggregate = col.Aggregate()
.Match(filterDefintion)
.AppendStage(addFieldsStage)
.Project(projectionDefintion);

Comparing two fields of mongo collection using c# driver in mono

I am completely new to MongoDB and the C# driver.
Development is being done using MonoDevelop on Ubuntu 14.04, and the MongoDB version is 3.2.10.
Currently my code has a POCO as below:
public class User
{
public String Name { get; set;}
public DateTime LastModifiedAt { get; set;}
public DateTime LastSyncedAt { get; set;}
public User ()
{
}
}
I have been able to create a collection and add users.
How do I find users whose LastModifiedAt timestamp is greater than their LastSyncedAt timestamp? I did some searching but haven't been able to find the answer.
Any suggestions would be of immense help.
Thanks

Actually, it is not that simple. This should be possible with a query such as:
var users = collection.Find(user => user.LastModifiedAt > user.LastSyncedAt).ToList();
But unfortunately the MongoDB driver cannot translate this expression.
You could either query all Users and filter on the client side:
var users = collection.Find(Builders<User>.Filter.Empty)
.ToEnumerable()
.Where(user => user.LastModifiedAt > user.LastSyncedAt)
.ToList();
Or send a JSON query, because MongoDB itself is able to do it:
var jsonFliter = "{\"$where\" : \"this.LastModifiedAt>this.LastSyncedAt\"}";
var users = collection.Find(new JsonFilterDefinition<User>(jsonFliter))
.ToList();
And yes, you need an Id property on your model class. I didn't mention it at first because I assumed you have one and just didn't post it in the question.

There is another way to do it. First, let's declare the collection:
var collection = Database.GetCollection<BsonDocument>("CollectionName");
Now let's add our projection, which computes a boolean field "gt1" by comparing the two dates:
var pro = new BsonDocument
{
    { "gt1", new BsonDocument { { "$gt", new BsonArray { "$LastModifiedAt", "$LastSyncedAt" } } } },
    { "Name", true },
    { "LastModifiedAt", true },
    { "LastSyncedAt", true }
};
Now let's add our filter:
var filter = Builders<BsonDocument>.Filter.Eq("gt1", true);
We'll aggregate our query:
var aggregate = collection.Aggregate(new AggregateOptions { AllowDiskUse = true })
    .Project(pro)
    .Match(filter);
Now our query is ready. We can inspect it as follows:
var queryText = aggregate.ToString();
Let's run the query as follows:
var results = aggregate.ToList();
This will return the required data as a list of BSON documents.
This solution will work with mongo c# driver 3.6 or above. Please comment if anything is unclear and I'll try to explain.

NEST - IndexMany doesn't index my objects

I've used NEST for elasticsearch for a while now and up until now I've used the regular ElasticSearchClient.Index(...) function, but now I want to index many items in a bulk operation.
I found the IndexMany(...) function, but I must be doing something wrong, because nothing is added to the Elasticsearch database as it is with the regular Index(...) function.
Does anyone have any idea?
Thanks in advance!

I found the problem. I had to specify the index name in the call to IndexMany:
var res = ElasticClient.CreateIndex("pages", i => i.Mappings(m => m.Map<ESPageViewModel>(mm => mm.AutoMap())));
var page = new ESPageViewModel
{
Id = dbPage.Id,
PageId = dbPage.PageId,
Name = dbPage.Name,
Options = pageTags,
CustomerCategoryId = saveTagOptions.CustomerCategoryId,
Link = dbPage.Link,
Price = dbPage.Price
};
var pages = new List<ESPageViewModel>() { page };
var res2 = ElasticClient.IndexManyAsync<ESPageViewModel>(pages, "pages");
This works as expected. Guess I could specify a default index name in the configuration to avoid specifying the index for the IndexMany call.

If you are using C#, you should create a list of the objects you want to insert and then call the IndexMany function.
Example:
List<Business> businessList = new List<Business>();
#region Fill the business list
...............................
#endregion
if (businessList.Count == 1000) // the size of the bulk.
{
EsClient.IndexMany<Business>(businessList, IndexName);
businessList.Clear();
}
And at the end, check again for any remaining items:
if (businessList.Count > 0)
{
EsClient.IndexMany<Business>(businessList, IndexName);
}
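The fill-then-flush pattern above can be wrapped in a small helper so the "flush what is left" step is not forgotten. This is a minimal sketch under stated assumptions: BulkBatcher and IndexInBatches are hypothetical names, and the indexBatch delegate stands in for whatever bulk call you use (for example the IndexMany call shown above), so the helper itself does not depend on NEST.

```csharp
using System;
using System.Collections.Generic;

public static class BulkBatcher
{
    // Passes items to indexBatch in chunks of at most batchSize
    // and returns how many times indexBatch was called.
    public static int IndexInBatches<T>(
        IEnumerable<T> items, int batchSize, Action<IReadOnlyList<T>> indexBatch)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var buffer = new List<T>(batchSize);
        int calls = 0;
        foreach (T item in items)
        {
            buffer.Add(item);
            if (buffer.Count == batchSize)
            {
                indexBatch(buffer.ToArray()); // copy, so clearing the buffer is safe
                calls++;
                buffer.Clear();
            }
        }

        // Flush the final, partially filled batch.
        if (buffer.Count > 0)
        {
            indexBatch(buffer.ToArray());
            calls++;
        }
        return calls;
    }
}
```

Assuming the same EsClient and IndexName as in the answer above, a call could look like BulkBatcher.IndexInBatches(businesses, 1000, batch => EsClient.IndexMany<Business>(batch, IndexName)), where businesses is whatever sequence you are indexing.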

Refactor linq statement

I have a linq expression that I've been playing with in LINQPad and I would like to refactor the expression to replace all the tests for idx == -1 with a single test. The input data for this is the result of a free text search on a database used for caching Active Directory info. The search returns a list of display names and associated summary data from the matching database rows. I want to extract from that list the display name and the matching Active Directory entry. Sometimes the match will only occur on the display name so there may be no further context. In the example below, the string "Sausage" is intended to be the search term that returned the two items in the matches array. Clearly this wouldn't be the case for a real search because there is no match for Sausage in the second array item.
var matches = new []
{
new { displayName = "Sausage Roll", summary = "|Title: Network Coordinator|Location: Best Avoided|Department: Coordination|Email: Sausage.Roll@somewhere.com|" },
new { displayName = "Hamburger Pattie", summary = "|Title: Network Development Engineer|Location: |Department: Planning|Email: Hamburger.Pattie@somewhere.com|" },
};
var context = (from match in matches
let summary = match.summary
let idx = summary.IndexOf("Sausage")
let start = idx == -1 ? 0 : summary.LastIndexOf('|', idx) + 1
let stop = idx == -1 ? 0 : summary.IndexOf('|', idx)
let ctx = idx == -1 ? "" : string.Format("...{0}...", summary.Substring(start, stop - start))
select new { displayName = match.displayName, summary = ctx, })
.Dump();
I'm trying to create a list of names and some context for the search results if any exists. The output below is indicative of what Dump() displays and is the correct result:
displayName       summary
----------------  ------------------------------------------
Sausage Roll      ...Email: Sausage.Roll@somewhere.com...
Hamburger Pattie
Edit: Regex version is below, definitely tidier:
Regex reg = new Regex(@"\|((?:[^|]*)Sausage[^|]*)\|");
var context = (from match in matches
let m = reg.Match(match.summary)
let ctx = m.Success ? string.Format("...{0}...", m.Groups[1].Value) : ""
select new { displayName = match.displayName, context = ctx, })
.Dump();

(I know this doesn't answer your specific question), but here's my contribution anyway:
You haven't really described how your data comes in. As @Joe suggested, you could use a regex, or split the fields as I've done below.
Either way, I would suggest refactoring your code to allow unit testing.
Otherwise, if your data is invalid or corrupt, you will get a runtime error in your LINQ query.
[TestMethod]
public void TestMethod1()
{
var matches = new[]
{
new { displayName = "Sausage Roll", summary = "|Title: Network Coordinator|Location: Best Avoided|Department: Coordination|Email: Sausage.Roll@somewhere.com|" },
new { displayName = "Hamburger Pattie", summary = "|Title: Network Development Engineer|Location: |Department: Planning|Email: Hamburger.Pattie@somewhere.com|" },
};
IList<Person> persons = new List<Person>();
foreach (var m in matches)
{
string[] fields = m.summary.Split('|');
persons.Add(new Person { displayName = m.displayName, Title = fields[1], Location = fields[2], Department = fields[3] });
}
Assert.AreEqual(2, persons.Count());
}
public class Person
{
public string displayName { get; set; }
public string Title { get; set; }
public string Location { get; set; }
public string Department { get; set; }
/* etc. */
}

Or something like this:
Regex reg = new Regex(@"^|Email.*|$");
foreach (var match in matches)
{
System.Console.WriteLine(match.displayName + " ..." + reg.Match(match.summary) + "... ");
}
I haven't tested this, and it's probably not even correct syntax, but it should give you an idea of how you could do it with regex.
Update
OK, I've seen your answer, and it's good that you posted it, because I think I didn't explain it clearly.
I expected your answer to look something like this in the end (tested using LINQPad now, and now I understand what you mean by using LINQPad, because it actually runs a C# program, not just LINQ commands. Awesome!). Anyway, this is what it should look like:
foreach (var match in matches)
{
    Console.WriteLine(string.Format("{0,-20}...{1}...", match.displayName, Regex.Match(match.summary, @"Email:(.*)[|]").Groups[1]));
}
That's it, the whole thing. Take LINQ out of it completely!
I hope this clears it up: you do not need LINQ at all.

Like this?
var context = (from match in matches
let summary = match.summary
let idx = summary.IndexOf("Sausage")
let test = idx == -1
let start = test ? 0 : summary.LastIndexOf('|', idx) + 1
let stop = test ? 0 : summary.IndexOf('|', idx)
let ctx = test ? "" : string.Format("...{0}...", summary.Substring(start, stop - start))
select new { displayName = match.displayName, summary = ctx, })
.Dump();
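Another way to get down to a single idx == -1 test is to move the index arithmetic into a helper method and return early when the term is not found. The sketch below assumes that approach. SummaryContext and Extract are hypothetical names, not part of the original question, and the fallback for a missing trailing '|' is an addition of this sketch.

```csharp
using System;

public static class SummaryContext
{
    // Returns the '|'-delimited field that contains term, wrapped in "...",
    // or an empty string when term does not occur in summary.
    public static string Extract(string summary, string term)
    {
        int idx = summary.IndexOf(term, StringComparison.Ordinal);
        if (idx == -1) return ""; // the single not-found test

        // LastIndexOf returns -1 when there is no leading '|', so start becomes 0.
        int start = summary.LastIndexOf('|', idx) + 1;
        int stop = summary.IndexOf('|', idx);
        if (stop == -1) stop = summary.Length; // tolerate a missing trailing delimiter

        return string.Format("...{0}...", summary.Substring(start, stop - start));
    }
}
```

The query then shrinks to a single projection: from match in matches select new { match.displayName, summary = SummaryContext.Extract(match.summary, "Sausage") }. A helper like this is also straightforward to unit test on its own.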
