batch search for documents elastic search - c#

I am using this hopelessly inefficient code to establish if a document is already indexed:
foreach (var entry in dic)
{
    var response = client.Search<Document>(s => s.Query(q => q.QueryString(d =>
        d.Query(string.Format("{0}", entry.Key)))));
    if (response.Documents.Count == 0)
    {
        not_found++;
    }
    else
    {
        found++;
    }
}
Is it possible to send several entry.Key values in one batch rather than hitting the endpoint once for every ID (entry.Key)? Thanks.

Sure!
You can use a terms query:
client.Search<Document>(s => s.Query(
    q => q.Terms(
        c => c
            .Field(doc => doc.Id)
            .Terms(keys))));
If you are specifically looking for IDs, you can use the ids query:
client.Search<Document>(s => s.Query(
    q => q.Ids(c => c.Values(keys)))
);
If you are only interested in whether or not the document(s) have been indexed, consider limiting the returned fields to only the ID field so you don't waste bandwidth returning the full document:
response = client.Search<Document>(s => s
    .Query(q => q.Ids(c => c.Values(keys)))        // look for these IDs
    .StoredFields(sf => sf.Fields(doc => doc.Id))  // return only the Id field
);
Lastly, if you're only interested in the number of matching documents, then you can ask Elasticsearch to not return any results, and only use the response metadata to count how many documents matched:
response = client.Search<Document>(s => s
    .Query(q => q.Ids(c => c.Values(keys))) // look for these IDs
    .Size(0)                                // return 0 hits
);
found += response.Total; // number of total hits
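If the dictionary holds a very large number of keys, the counting query can be issued once per batch and the not-found count derived from the total. The following is only a sketch under assumptions: `IndexCheck`, `Batches`, `Count`, and the `countIndexed` delegate are hypothetical names that are not part of NEST, the batch size is arbitrary, and the keys are assumed to be unique strings (as dictionary keys are).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IndexCheck
{
    // Splits the keys into consecutive batches of at most batchSize items.
    public static IEnumerable<List<string>> Batches(IEnumerable<string> keys, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
        var batch = new List<string>(batchSize);
        foreach (var key in keys)
        {
            batch.Add(key);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<string>(batchSize);
            }
        }
        if (batch.Count > 0) yield return batch; // last, possibly smaller batch
    }

    // countIndexed is a hypothetical stand-in for one search round trip:
    // given a batch of IDs, it returns how many of them are indexed.
    public static (long Found, long NotFound) Count(
        IReadOnlyCollection<string> keys,
        int batchSize,
        Func<IReadOnlyList<string>, long> countIndexed)
    {
        long found = 0;
        foreach (var batch in Batches(keys, batchSize))
        {
            found += countIndexed(batch);
        }
        // Every key that was not found is, by definition, not indexed.
        return (found, keys.Count - found);
    }
}
```

Assuming the client and Document type from the question, the delegate could wrap the Size(0) search shown above, for example: IndexCheck.Count(dic.Keys, 1000, batch => client.Search<Document>(s => s.Query(q => q.Ids(c => c.Values(batch))).Size(0)).Total).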

Related

Elasticsearch Nest Client - searching nested properties

I'm having a tough time finding information on how to search nested properties using the Nest client in C#.
I have email objects in an index with approximately this shape:
{
    "subject": "This is a test",
    "content": "This is an email written as a test of the Elasticsearch system. Thanks, Mr Tom Jones",
    "custodians": [
        {
            "firstName": "Tom",
            "lastName": "Jones",
            "routeType": 0
        },
        {
            "firstName": "Matthew",
            "lastName": "Billsley",
            "routeType": 1
        }
    ]
}
You should be able to see that there is an array in there called “custodians” which is a list of all the senders and recipients of the email. In the Fluent-style query builder in .Net I can build the query just fine when I’m using subject, content, and other “first tier” properties. But I may only want to include custodians who have the routeType = 0 in some queries. I can’t seem to find any guidance on how to accomplish this. Any ideas?
For instance, a query for the term “picnic” in the subject field would look like:
Client.SearchAsync(m => m
    .Query(q => q
        .Match(f => f
            .Field(msg => msg.Subject)
            .Query("picnic"))));
What would the query to only get messages from the index with routeType = 0 and lastName = “Jones” be?
FYI: This is crossposted to the Elasticsearch forums. If I get a good suggestion there, I will add it here.
If you want to get messages that have a custodian with routeType == 0:
Client.SearchAsync(m => m
    .Query(q => q
        .Term(t => t
            .Field(msg => msg.Custodians[0].RouteType)
            .Value(0))));
If you want to get messages that have a custodian with lastName == "jones":
Client.SearchAsync(m => m
    .Query(q => q
        .Term(t => t
            .Field(msg => msg.Custodians[0].LastName)
            .Value("jones"))));
If you want to get messages that have a custodian with lastName == "jones" AND routeType == 0:
Client.SearchAsync(m => m
    .Query(q => q
        .Nested(n => n
            .Path(msg => msg.Custodians)
            .Query(nq =>
                nq.Term(t => t.Field(msg => msg.Custodians[0].RouteType).Value(0)) &&
                nq.Term(t => t.Field(msg => msg.Custodians[0].LastName).Value("jones"))
            )
        )
    )
);
Note that custodians will need to be mapped as a nested field for the last query to work as expected. See the Elasticsearch documentation on the nested field type for more.
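The reason the mapping matters: with the default object mapping, Elasticsearch flattens an array of objects into per-field lists of values, so the association between one custodian's lastName and routeType is lost. The sketch below only simulates the two matching rules with LINQ to Objects to make the difference visible. The `Custodian` class and both helper methods are illustrative assumptions, not part of NEST, and the comparison ignores text analysis (such as lowercasing).

```csharp
using System.Collections.Generic;
using System.Linq;

public sealed class Custodian
{
    public Custodian(string firstName, string lastName, int routeType)
    {
        FirstName = firstName;
        LastName = lastName;
        RouteType = routeType;
    }

    public string FirstName { get; }
    public string LastName { get; }
    public int RouteType { get; }
}

public static class MappingSemantics
{
    // Object (flattened) mapping: each condition may be satisfied
    // by a different custodian of the same message.
    public static bool MatchesFlattened(IEnumerable<Custodian> custodians, string lastName, int routeType)
    {
        var list = custodians.ToList();
        return list.Any(c => c.LastName == lastName)
            && list.Any(c => c.RouteType == routeType);
    }

    // Nested mapping: both conditions must hold for the same custodian.
    public static bool MatchesNested(IEnumerable<Custodian> custodians, string lastName, int routeType)
    {
        return custodians.Any(c => c.LastName == lastName && c.RouteType == routeType);
    }
}
```

For the sample email in the question (Tom Jones with routeType 0, Matthew Billsley with routeType 1), asking for lastName "Jones" and routeType 1 matches under the flattened rule but not under the nested rule, which is why the combined query needs the nested mapping.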

elasticsearch NEST get nested document

The situation is this: I have groups in Elasticsearch, and each group has a nested list of items.
Both groups and items have an attribute named serial, which is unique.
I am given a serial for the group and a serial for the item, and with those two serials I'm supposed to return the item.
Currently I'm doing it the following way:
public item findItem(string groupSerial, string itemSerial)
{
    var searchResponse = _elasticClient.Search<Group>(s => s
        .Index(_config.groupIndexName)
        .Query(q => q
            .ConstantScore(cs => cs
                .Filter(f => f
                    .Term(t => t
                        .Field(fi => fi.serial)
                        .Value(groupSerial)
                    )
                )
            )
        ).Query(q => q
            .Nested(c => c
                .InnerHits(i => i.Explain())
                .Path(p => p.items)
                .Query(nq => nq.Term(t => t
                    .Field(field => field.items.First().serial)
                    .Value(itemSerial)))))
    );
    var result = searchResponse.Documents.FirstOrDefault();
    return result?.items.Find(item => item.serial == itemSerial);
}
I get the feeling that there is supposed to be a more efficient way. Like getting the item straight from the search in elastic. Does anyone know how?

linq-to-sql group by list of strings from a joined table

I have a DB table [Table1] with a one to many relationship.
This related table [Table2] has a type field which is a string.
Table 1          Table 2
Field | Type     Field    | Type
------|-----     ---------|-------
Id    | int      Id       | int
                 Table1Id | int
                 Type     | string
I am trying to create a summary of how often each combination of types occurs,
and am attempting to do as much work on the DB as possible as it is too slow to bring it all into memory.
The code I have works but seems repetitive to me.
items
    .Where(x => x.Table2s.Count > 1)
    .Select(x => new
    {
        Type = x.Table2s.Select(y => y.Type)
    })
    .ToList() // into memory as string.Join has no translation into SQL
    .Select(x => new
    {
        Type = string.Join(",", x.Type) // no SQL translation
    })
    .GroupBy(x => x.Type) // can't work on IEnumerable<string>
    .Select(x => new Result()
    {
        Type = x.Key,
        Count = x.Count()
    })
    .OrderByDescending(x => x.Count)
    .ToList();
Is there a way to group by this list of strings so that I can do the grouping on the DB and also reduce the number of Select statements in my code?
Linq to SQL doesn't support Aggregate or String.Join or corresponding SQL tricks, so unless you want to use a stored procedure, some of the work has to happen on the client side.
One alternative would be to create the groupings first and then send them back to the server to count the matches, but that doesn't seem like a gain.
I think the best you can do is something like:
var sqlans = items.Where(x => x.Table2s.Count > 0).AsEnumerable();
var ans = sqlans
    .Select(x => String.Join(",", x.Table2s.Select(t2 => t2.Type).OrderBy(ty => ty).ToArray()))
    .GroupBy(ty => ty, ty => ty, (key, g) => new { key, Count = g.Count() });
I ordered the Types belonging to an item so they would match up when grouped.
The sqlans portion would be executed by the server, but the rest has to execute on the client to process the String.Join.
Instead of using EF, I would be tempted to work on Table2 directly (doing a left semi join if you might have orphan Table2 entries):
var sqlans = Table2.GroupBy(t2 => t2.Table1Id, t2 => t2.Type, (key, g) => g).AsEnumerable();
var ans = sqlans
    .Select(x => String.Join(",", x.OrderBy(ty => ty).ToArray()))
    .GroupBy(ty => ty, ty => ty, (key, g) => new { key, Count = g.Count() });
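To see what the client-side step produces, here is a small in-memory sketch of the same join-then-group logic using LINQ to Objects. The `TypeCombinations.Summarize` helper and the tuple shape of the rows are hypothetical, chosen only for illustration; the point is that ordering the types before joining makes "A,B" and "B,A" land in the same group.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TypeCombinations
{
    // rows: (Table1Id, Type) pairs, as they might come back from Table2.
    // Returns each distinct combination of types with its number of occurrences.
    public static Dictionary<string, int> Summarize(IEnumerable<(int Table1Id, string Type)> rows)
    {
        return rows
            .GroupBy(r => r.Table1Id, r => r.Type)
            // Order the types so that the same combination always yields the same key.
            .Select(g => string.Join(",", g.OrderBy(t => t, StringComparer.Ordinal)))
            .GroupBy(combo => combo)
            .ToDictionary(g => g.Key, g => g.Count());
    }
}
```

For the made-up rows (1, "A"), (1, "B"), (2, "B"), (2, "A"), (3, "C"), the result contains "A,B" with a count of 2 and "C" with a count of 1.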

What can I do to improve the speed of this query?

I have a linq query that returns the last page a user looked at based on a table of page hits. The fields are simply TimeStamp, UserID and URL which are logged from user activity. The query looks like this:
public static IQueryable GetUserStatus()
{
    var ctx = new AppEntities();
    var currentPageHits = ctx.Pagehits
        .GroupBy(x => x.UserID)
        .Select(x => x.Where(y => y.TimeStamp == x.Max(z => z.TimeStamp)))
        .SelectMany(x => x);
    return currentPageHits.OrderByDescending(o => o.TimeStamp);
}
The query works perfectly but runs slowly. Our DBA assures us that the table has indexes in all the right places and that the trouble must be with the query.
Is there anything inherently wrong or BAD with this, or is there a more efficient way of getting the same results?
You could try:
var currentPageHits2 = ctx.Pagehits
    .GroupBy(x => x.UserID)
    .Select(x => x.OrderByDescending(y => y.TimeStamp).First())
    .OrderByDescending(x => x.TimeStamp);
But the speed should be the same.
Note that there is a subtle difference between this query and yours. With yours, if a UserID has two PageHits that share the same maximum TimeStamp, two rows are returned; with this one, only one is returned.
So you are trying to implement DENSE_RANK() OVER (PARTITION BY UserID ORDER BY TimeStamp DESC) with LINQ, that is, to get all the latest records per user according to the TimeStamp? You could try:
public static IQueryable GetUserStatus()
{
    var ctx = new AppEntities();
    var currentPageHits = ctx.Pagehits
        .GroupBy(x => x.UserID)
        .SelectMany(x => x.GroupBy(y => y.TimeStamp).OrderByDescending(g => g.Key).FirstOrDefault())
        .OrderByDescending(x => x.TimeStamp);
    return currentPageHits;
}
So it groups each user's hits by TimeStamp, then takes the latest group (one or more records in case of ties). The SelectMany flattens the groups into records. I think this is more efficient than your query.
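The difference in tie handling between the three queries can be checked in memory with LINQ to Objects. This is only a sketch: the `PageHit` class is an assumed shape based on the fields named in the question, `LatestHits` is a hypothetical helper, and running in memory says nothing about the SQL each query generates or how fast it is.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class PageHit
{
    public int UserID { get; set; }
    public DateTime TimeStamp { get; set; }
    public string URL { get; set; } = "";
}

public static class LatestHits
{
    // Question's query: every hit whose TimeStamp equals the user's maximum (ties kept).
    public static List<PageHit> WhereMax(IEnumerable<PageHit> hits)
    {
        return hits.GroupBy(x => x.UserID)
            .SelectMany(g => g.Where(y => y.TimeStamp == g.Max(z => z.TimeStamp)))
            .ToList();
    }

    // First suggestion: exactly one hit per user, even when the latest TimeStamp is tied.
    public static List<PageHit> FirstPerUser(IEnumerable<PageHit> hits)
    {
        return hits.GroupBy(x => x.UserID)
            .Select(g => g.OrderByDescending(y => y.TimeStamp).First())
            .ToList();
    }

    // Second suggestion: the whole group sharing the latest TimeStamp (ties kept).
    public static List<PageHit> LatestGroup(IEnumerable<PageHit> hits)
    {
        return hits.GroupBy(x => x.UserID)
            .SelectMany(g => g.GroupBy(y => y.TimeStamp).OrderByDescending(t => t.Key).First())
            .ToList();
    }
}
```

With one user who has two hits at the same latest TimeStamp plus one older hit, and a second user with a single hit, WhereMax and LatestGroup each return three rows while FirstPerUser returns two.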

Why does this combination of Select, Where and GroupBy cause an exception?

I have a simple table structure of services, each with a number of facilities. In the database, this is a Service table and a Facility table, where the Facility table has a reference to a row in the Service table.
In our application, we have the following LINQ working:
Services
    .Where(s => s.Facilities.Any(f => f.Name == "Sample"))
    .GroupBy(s => s.Type)
    .Select(g => new { Type = g.Key, Count = g.Count() })
But for reasons beyond my control, the source set is projected to a non-entity object before the Where call, in this way:
Services
    .Select(s => new { Id = s.Id, Type = s.Type, Facilities = s.Facilities })
    .Where(s => s.Facilities.Any(f => f.Name == "Sample"))
    .GroupBy(s => s.Type)
    .Select(g => new { Type = g.Key, Count = g.Count() })
But this raises the following exception, with no inner exception:
EntityCommandCompilationException: The nested query is not supported. Operation1='GroupBy' Operation2='MultiStreamNest'
Removing the Where, however, makes it work, which makes me believe it's only in this specific combination of method calls:
Services
    .Select(s => new { Id = s.Id, Type = s.Type, Facilities = s.Facilities })
    //.Where(s => s.Facilities.Any(f => f.Name == "Sample"))
    .GroupBy(s => s.Type)
    .Select(g => new { Type = g.Key, Count = g.Count() })
Is there a way to make the above work: select to a non-entity object, and then use Where and GroupBy on the resulting queryable? Adding ToList after the Select works, but the large source set makes this infeasible (it would execute the query on the database and then do the grouping logic in C#).
This exception originates from this piece of code in the EF source...
// <summary>
// Not Supported common processing
// For all those cases where we don't intend to support
// a nest operation as a child, we have this routine to
// do the work.
// </summary>
private Node NestingNotSupported(Op op, Node n)
{
    // First, visit my children
    VisitChildren(n);
    m_varRemapper.RemapNode(n);
    // Make sure we don't have a child that is a nest op.
    foreach (var chi in n.Children)
    {
        if (IsNestOpNode(chi))
        {
            throw new NotSupportedException(Strings.ADP_NestingNotSupported(op.OpType.ToString(), chi.Op.OpType.ToString()));
        }
    }
    return n;
}
I have to admit: it's not obvious what happens here and there's no technical design document disclosing all of EF's query building strategies. But this piece of code...
// We can only pull the nest over a Join/Apply if it has keys, so
// we can order things; if it doesn't have keys, we throw a NotSupported
// exception.
foreach (var chi in n.Children)
{
    if (op.OpType != OpType.MultiStreamNest
        && chi.Op.IsRelOp)
    {
        var keys = Command.PullupKeys(chi);
        if (null == keys
            || keys.NoKeys)
        {
            throw new NotSupportedException(Strings.ADP_KeysRequiredForJoinOverNest(op.OpType.ToString()));
        }
    }
}
gives a little peek behind the curtains. I just tried an OrderBy in a case of my own that exactly reproduced yours, and it worked. So I'm pretty sure that if you do...
Services
    .Select(s => new { Id = s.Id, Type = s.Type, Facilities = s.Facilities })
    .OrderBy(x => x.Id)
    .Where(s => s.Facilities.Any(f => f.Name == "Sample"))
    .GroupBy(s => s.Type)
    .Select(g => new { Type = g.Key, Count = g.Count() })
the exception will be gone.
