I have an Elasticsearch database with some documents in it. Each document has its own timestamp field.
I currently have a WebApi which requires two timestamps, startTime and endTime. The WebApi simply performs a query on ES to grab the documents whose timestamps fall in the given range.
This is my current query:
var readRecords = ElasticClient.Search<SegmentRecord>(s => s
    .Index(ElasticIndexName)
    .Filter(f => f
        .Range(i => i
            .OnField(a => a.DateTime)
            .GreaterOrEquals(startTime)
            .LowerOrEquals(endTime)))
    .Size(MaximumNumberOfReturnedDocs)
    .SortAscending(p => p.DateTime)).Documents;
Very simple, it's basically a range query based on the startTime and endTime parameters. And it works. :-)
Now the problem is: I also need to retrieve the latest document with a timestamp lower than startTime.
So basically the final query should return:
all the documents in the range [startTime, endTime]
AND
the latest document in time with a timestamp < startTime
The first part can obviously return any number of records: zero, just one, or many.
The second part should return just one document (or zero if no document exists prior to startTime).
This is what I meant in my comment above:
{
  "query": {
    "filtered": {
      "filter": {
        "range": {
          "time": {
            "gte": "2015-06-04",
            "lte": "2015-06-05"
          }
        }
      }
    }
  },
  "aggs": {
    "global_all_docs_agg": {
      "global": {},
      "aggs": {
        "filter_for_min": {
          "filter": {
            "range": {
              "time": {
                "lte": "2015-06-04"
              }
            }
          },
          "aggs": {
            "min_date": {
              "top_hits": {
                "size": 1,
                "sort": [
                  {
                    "time": "asc"
                  }
                ]
              }
            }
          }
        }
      }
    }
  }
}
The result looks like this:
"hits": [
{
"_index": "sss",
"_type": "test",
"_id": "1",
"_score": 1,
"_source": {
"time": "2015-06-05"
}
},
{
"_index": "sss",
"_type": "test",
"_id": "2",
"_score": 1,
"_source": {
"time": "2015-06-04"
}
},
{
"_index": "sss",
"_type": "test",
"_id": "4",
"_score": 1,
"_source": {
"time": "2015-06-05"
}
}
]
},
"aggregations": {
"global_all_docs_agg": {
"doc_count": 6,
"filter_for_min": {
"doc_count": 4,
"min_date": {
"hits": {
"total": 4,
"max_score": null,
"hits": [
{
"_index": "sss",
"_type": "test",
"_id": "5",
"_score": null,
"_source": {
"time": "2015-06-01"
},
"sort": [
1433116800000
]
}
]
}
}
}
}
}
The list between startTime and endTime is under hits; the single document below startTime is under aggregations. Note that with "sort": "asc" the top_hits aggregation returns the earliest document below startTime; to get the latest document before startTime, as the question asks, sort "desc" instead (and use "lt" rather than "lte" if the boundary should be excluded).
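If you'd rather stay entirely in NEST without aggregations, an alternative (at the cost of a second round-trip) is a separate size-1 search for the latest document before startTime. A rough sketch in the same NEST 1.x fluent syntax as the question; Lower and SortDescending are assumed to be the counterparts of the methods already used above:
var latestBefore = ElasticClient.Search<SegmentRecord>(s => s
    .Index(ElasticIndexName)
    .Filter(f => f
        .Range(r => r
            .OnField(a => a.DateTime)
            .Lower(startTime)))              // strictly before startTime
    .Size(1)                                 // only the single latest document
    .SortDescending(p => p.DateTime)).Documents;
The single-request aggregation approach shown above avoids the extra round-trip, so prefer it if latency matters.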
I'm having a hard time implementing autocomplete in Elasticsearch for the DisplayName property for text that has spaces in it. Here is the setup of the field:
"DisplayName": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256,
"normalizer": "caseinsensitive"
},
"prefix": {
"type": "text",
"analyzer": "startswith"
}
}
},
"id": {
"type": "keyword"
}
Here is the startswith analyzer definition:
"analysis": {
"analyzer": {
"startswith": {
"char_filter": [
"html_strip"
],
"filter": [
"lowercase"
],
"tokenizer": "keyword",
"type": "custom"
}
},
"normalizer": {
"caseinsensitive": {
"filter": [
"lowercase",
"asciifolding"
],
"type": "custom"
}
}
},
"creation_date": "1565034410554",
"mapping": {
"total_fields": {
"limit": "5000"
}
},
"number_of_shards": "5",
"provided_name": "streetsmart"
In my query builder, here is the query that tries to grab the result:
_type:User AND (DisplayName.prefix:Joseph adam* OR UserPrincipalName.prefix:Joseph adam*)
The result I get is all the names that contain Adam, whereas the result should be Joseph Adam Jr.
Does anyone know what I should do?
I am not familiar with C# and .NET syntax, but I am adding a working example with index data, search query, and search result in JSON format.
You can use the match phrase prefix query (here via a multi_match with "type": "phrase_prefix"), which:
Returns documents that contain the words of a provided text, in the
same order as provided. The last term of the provided text is treated
as a prefix, matching any words that begin with that term.
Index Data:
{
  "name": "Adam"
}
{
  "name": "Joseph Adam Sr"
}
{
  "name": "Joseph Adam Jr"
}
Search Query:
{
  "query": {
    "multi_match": {
      "query": "Joseph Adam",
      "fields": [
        "name"
      ],
      "type": "phrase_prefix"
    }
  }
}
Search Result:
"hits": [
{
"_index": "stof_64163994",
"_type": "_doc",
"_id": "1",
"_score": 0.54037446,
"_source": {
"name": "Joseph Adam Jr"
}
},
{
"_index": "stof_64163994",
"_type": "_doc",
"_id": "3",
"_score": 0.54037446,
"_source": {
"name": "Joseph Adam Sr"
}
}
]
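Since the question uses the NEST client, roughly the same query in C# might look like the sketch below (NEST 6.x-style syntax assumed; the User class, the client variable, and the field choice are stand-ins for your own types):
var response = client.Search<User>(s => s
    .Query(q => q
        .MultiMatch(m => m
            .Fields(f => f.Field(u => u.DisplayName))  // add UserPrincipalName etc. as needed
            .Query("Joseph Adam")
            .Type(TextQueryType.PhrasePrefix))));      // last term is matched as a prefix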
I want to extract the field names where the search text appears in the Elasticsearch (stored) indexed documents.
Is this type of querying possible in Elasticsearch? I am using the NEST client in C#.
Please refer to the example below:
Example: employee document
{
  "first_name": "emp first",
  "last_name": "emp last"
}
Input search text: "first"
Expected output: ["first_name"]
Input search text: "emp"
Expected output: ["first_name", "last_name"]
Thanks,
AT
There is a feature in Elasticsearch called "named queries": you can name each query and Elasticsearch will return the names of the queries each hit matched.
For your case you can use this query:
GET index/doc_type/_search
{
  "_source": [
    "first_name",
    "last_name"
  ],
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "first_name": {
              "query": "emp",
              "_name": "first_name"
            }
          }
        },
        {
          "match": {
            "last_name": {
              "query": "emp",
              "_name": "last_name"
            }
          }
        }
      ]
    }
  }
}
Elasticsearch will return a result like this one:
{
  "took": 90,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 16.399673,
    "hits": [
      {
        "_index": "index",
        "_type": "doc_type",
        "_id": "1",
        "_score": 16.399673,
        "_routing": "1",
        "_source": {
          "first_name": "emp first",
          "last_name": "emp last"
        },
        "matched_queries": [
          "first_name",
          "last_name"
        ]
      }
    ]
  }
}
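For the NEST client mentioned in the question, a minimal sketch of the same named-queries approach could look like this (NEST 6.x-style syntax assumed; the Employee type is a hypothetical stand-in for your own document class):
var response = client.Search<Employee>(s => s
    .Source(sf => sf.Includes(i => i.Fields("first_name", "last_name")))
    .Query(q =>
        q.Match(m => m.Field("first_name").Query("emp").Name("first_name")) ||
        q.Match(m => m.Field("last_name").Query("emp").Name("last_name"))));

foreach (var hit in response.Hits)
{
    // MatchedQueries lists the _name values of the queries each hit satisfied,
    // e.g. ["first_name", "last_name"] for the sample document above.
    Console.WriteLine($"{hit.Id}: {string.Join(", ", hit.MatchedQueries)}");
}
The || operator between the two match queries is NEST shorthand for a bool query with two should clauses.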
You can also do the same thing with highlighting:
GET index/doc_type/_search
{
  "_source": [
    "first_name",
    "last_name"
  ],
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "first_name": "emp"
          }
        },
        {
          "match": {
            "last_name": "emp"
          }
        }
      ]
    }
  },
  "highlight": {
    "fields": {
      "first_name": {},
      "last_name": {}
    }
  }
}
Sample response:
{
  "took": 90,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 16.399673,
    "hits": [
      {
        "_index": "index",
        "_type": "doc_type",
        "_id": "1",
        "_score": 16.399673,
        "_routing": "1",
        "_source": {
          "first_name": "emp first",
          "last_name": "emp last"
        },
        "highlight": {
          "first_name": ["<em>emp</em> first"],
          "last_name": ["<em>emp</em> last"]
        }
      }
    ]
  }
}
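The highlighting variant translates to NEST along the same lines (again a sketch under the same assumptions as above):
var response = client.Search<Employee>(s => s
    .Source(sf => sf.Includes(i => i.Fields("first_name", "last_name")))
    .Query(q =>
        q.Match(m => m.Field("first_name").Query("emp")) ||
        q.Match(m => m.Field("last_name").Query("emp")))
    .Highlight(h => h
        .Fields(
            f => f.Field("first_name"),
            f => f.Field("last_name"))));  // then inspect the highlight section of each hit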
When I use Kibana to execute the following search request against Elasticsearch
GET _search
{
  "query": {
    "query_string": {
      "query": "PDB_W2237.docx",
      "default_operator": "AND"
    }
  }
}
it returns:
{
  "took": 14,
  "timed_out": false,
  "_shards": {
    "total": 15,
    "successful": 15,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 6.3527603,
    "hits": [
      {
        "_index": "proconact",
        "_type": "proconact",
        "_id": "68cecf2c-7e5a-11e5-80fa-000c29bd9450",
        "_score": 6.3527603,
        "_source": {
          "Id": "68cecf2c-7e5a-11e5-80fa-000c29bd9450",
          "ActivityId": "1bad9115-7e5a-11e5-80fa-000c29bd9450",
          "ProjectId": "08938a1d-2429-11e5-80f9-000c29bd9450",
          "Filename": "PDB_W2237.docx"
        }
      }
    ]
  }
}
When I use the NEST ElasticClient like
var client = new ElasticClient();
var searchResponse = client.Search<Hit>(new SearchRequest
{
    Query = new QueryStringQuery
    {
        Query = "DB_W2237.docx",
        DefaultOperator = Operator.And
    }
});
it returns 0 hits.
Here is the index mapping for the 4 fields in the Hit class:
{
  "proconact": {
    "mappings": {
      "proconact": {
        "properties": {
          "ActivityId": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "Filename": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "Id": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "ProjectId": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}
Are the two search requests not the same?
The problem is that your mapping doesn't allow matching a token that differs from what is actually present in your index.
In your Kibana query:
GET _search
{
  "query": {
    "query_string": {
      "query": "PDB_W2237.docx",
      "default_operator": "AND"
    }
  }
}
you're querying PDB_W2237.docx, but in your NEST code you're querying DB_W2237.docx.
If you want to query DB_W2237.docx and expect results, then you might have to change the analyzer from the standard analyzer (which is applied by default) to something else; a possible candidate depends on your use case.
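To see which tokens the standard analyzer actually indexed for the filename, you can run it through the _analyze API. A hedged NEST sketch (the Analyze call below matches older NEST versions; in NEST 7.x it moved to client.Indices.Analyze):
var analysis = client.Analyze(a => a
    .Analyzer("standard")
    .Text("PDB_W2237.docx"));

foreach (var token in analysis.Tokens)
{
    // A query_string term only matches if it equals one of these emitted tokens.
    Console.WriteLine(token.Token);
}
If the emitted tokens don't include partial forms like DB_W2237.docx, you would need a different analyzer on that field (an ngram-based one, for example) to make such queries match.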
I have a WebAPI method that returns Json in a flexible structure that depends on the request.
Part of the problem is that there could be any number of columns, and they could be any type. The 2 given below (Code and Count) are just one example.
This structure is based on the underlying classes but there could be any number of columns in the output. So, rather than the usual properties you might expect, these are objects in a collection with Name and Value properties.
The downside of this flexible approach is that it gives a non-standard format.
Is there a way to transform this into a more normalised shape? Are there maybe some attributes I can add to the class properties to change the way they are serialised?
For example, where there are 2 columns - Code (string) and Count (numeric):
Current Json:
{
  "Rows": [
    {
      "Columns": [
        {
          "Value": "1",
          "Name": "Code"
        },
        {
          "Value": 13,
          "Name": "Count"
        }
      ]
    },
    {
      "Columns": [
        {
          "Value": "2",
          "Name": "Code"
        },
        {
          "Value": 12,
          "Name": "Count"
        }
      ]
    },
    {
      "Columns": [
        {
          "Value": "9",
          "Name": "Code"
        },
        {
          "Value": 1,
          "Name": "Count"
        }
      ]
    },
    {
      "Columns": [
        {
          "Value": "5",
          "Name": "Code"
        },
        {
          "Value": 2,
          "Name": "Count"
        }
      ]
    }
  ]
}
Ideally I'd like to transform it to this:
{
  "Rows": [
    {
      "Code": "1",
      "Count": 13
    },
    {
      "Code": "2",
      "Count": 12
    },
    {
      "Code": "9",
      "Count": 1
    },
    {
      "Code": "5",
      "Count": 2
    }
  ]
}
The controller method (C#)
public ReportResponse Get(ReportRequest request)
{
    var result = ReportLogic.GetReport(request);
    return result;
}
The output classes
public class ReportResponse
{
    public List<ReportRow> Rows { get; set; }

    public ReportResponse()
    {
        Rows = new List<ReportRow>();
    }
}

public class ReportRow
{
    public List<ReportColumn> Columns { get; set; }

    public ReportRow()
    {
        Columns = new List<ReportColumn>();
    }
}

public class ReportColumn<T> : ReportColumn
{
    public T Value { get; set; }

    public ReportColumn(string name)
    {
        Name = name;
    }
}

public abstract class ReportColumn
{
    public string Name { get; internal set; }
}
I think the easiest way would be to map your class to a dictionary before serializing. Something like:
var dictionaries = new List<Dictionary<string, object>>();
foreach (var row in result.Rows)
{
    var dict = new Dictionary<string, object>();
    foreach (var column in row.Columns)
    {
        dict[column.Name] = ((dynamic)column).Value; // Value is declared on ReportColumn<T>
    }
    dictionaries.Add(dict);
}
Then serializing the dictionaries variable should do the trick.
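Since the question asks about attributes: assuming the default Json.NET serializer of classic ASP.NET Web API, here is a sketch of a custom converter that flattens each ReportRow during serialization (ReportRowConverter is a hypothetical name, and note it needs access to each column's Value, which is declared on ReportColumn<T>):
using System;
using Newtonsoft.Json;

public class ReportRowConverter : JsonConverter
{
    public override bool CanConvert(Type objectType) => objectType == typeof(ReportRow);

    public override void WriteJson(JsonWriter writer, object value, JsonSerializer serializer)
    {
        var row = (ReportRow)value;
        writer.WriteStartObject();
        foreach (var column in row.Columns)
        {
            // Emit each column as a plain JSON property: "Code": "1", "Count": 13, ...
            writer.WritePropertyName(column.Name);
            serializer.Serialize(writer, ((dynamic)column).Value);
        }
        writer.WriteEndObject();
    }

    public override bool CanRead => false;

    public override object ReadJson(JsonReader reader, Type objectType, object existingValue, JsonSerializer serializer)
        => throw new NotSupportedException();
}
You could then decorate ReportRow with [JsonConverter(typeof(ReportRowConverter))] so the flattened shape is produced by the normal response pipeline.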
If you're using the output in JavaScript, you could translate as follows:
var
    data = {
        "Rows": [
            { "Columns": [{ "Value": "1", "Name": "Code" }, { "Value": 13, "Name": "Count" }] },
            { "Columns": [{ "Value": "2", "Name": "Code" }, { "Value": 12, "Name": "Count" }] },
            { "Columns": [{ "Value": "9", "Name": "Code" }, { "Value": 1, "Name": "Count" }] },
            { "Columns": [{ "Value": "5", "Name": "Code" }, { "Value": 2, "Name": "Count" }] }
        ]
    },
    output = [];

data.Rows.forEach(function (row) {
    var newRow = {};
    row.Columns.forEach(function (column) {
        newRow[column.Name] = column.Value;
    });
    output.push(newRow);
});

console.log(JSON.stringify(output));
My mapping model:
// TypeLog: Error, Info, Warn
{
  "onef-sora": {
    "mappings": {
      "Log": {
        "properties": {
          "application": {
            "type": "string",
            "index": "not_analyzed"
          },
          "typeLog": {
            "type": "string"
          }
        }
      }
    }
  }
}
My query:
{
  "size": 0,
  "aggs": {
    "application": {
      "terms": {
        "field": "application",
        "order": { "_count": "desc" },
        "size": 5
      },
      "aggs": {
        "typelogs": {
          "terms": {
            "field": "typeLog",
            "order": { "_term": "asc" }
          }
        }
      }
    }
  }
}
I want to get the top 5 applications with the most errors, but the terms aggregation order only supports three keys: _count, _term, and _key. How do I order by the typeLog doc_count in my query? Thanks!
The result I want:
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 10000,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "application": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 5000,
      "buckets": [
        {
          "key": "OneF0",
          "doc_count": 1000,
          "typelogs": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              { "key": "error", "doc_count": 334 },
              { "key": "info", "doc_count": 333 },
              { "key": "warn", "doc_count": 333 }
            ]
          }
        },
        {
          "key": "OneF1",
          "doc_count": 1000,
          "typelogs": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              { "key": "error", "doc_count": 333 },
              { "key": "info", "doc_count": 334 },
              { "key": "warn", "doc_count": 333 }
            ]
          }
        },
        {
          "key": "OneF2",
          "doc_count": 1000,
          "typelogs": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              { "key": "error", "doc_count": 332 },
              { "key": "info", "doc_count": 333 },
              { "key": "warn", "doc_count": 334 }
            ]
          }
        }
      ]
    }
  }
}
As you want the top 5 applications with the most errors, you can restrict the query to error logs only (a filter would work as well). Then you only need to order your sub-terms aggregation by descending count:
{
  "size": 0,
  "query": {
    "term": {
      "typeLog": "Error"
    }
  },
  "aggs": {
    "application": {
      "terms": {
        "field": "application",
        "order": { "_count": "desc" },
        "size": 5
      },
      "aggs": {
        "typelogs": {
          "terms": {
            "field": "typeLog",
            "order": { "_count": "desc" }
          }
        }
      }
    }
  }
}
To keep all typeLogs, you may need to structure your query the other way around:
{
  "size": 0,
  "aggs": {
    "typelogs": {
      "terms": {
        "field": "typeLog",
        "order": { "_count": "asc" }
      },
      "aggs": {
        "application": {
          "terms": {
            "field": "application",
            "order": { "_count": "desc" },
            "size": 5
          }
        }
      }
    }
  }
}
You will have 3 first-level buckets (one per log type), then the top 5 applications within each type of log.
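For reference, the first (error-only) variant might look roughly like this in NEST (a sketch: the LogEntry type is a hypothetical stand-in, and the TermsOrder helper matches NEST 2.x-6.x):
var response = client.Search<LogEntry>(s => s
    .Size(0)
    .Query(q => q.Term("typeLog", "Error"))
    .Aggregations(a => a
        .Terms("application", t => t
            .Field("application")
            .Order(TermsOrder.CountDescending)  // top applications by error count
            .Size(5)
            .Aggregations(aa => aa
                .Terms("typelogs", tt => tt
                    .Field("typeLog")
                    .Order(TermsOrder.CountDescending))))));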