How to avoid posting duplicates into elasticsearch using Nest .NET 6.x? - c#

When data from a device is indexed into Elasticsearch, duplicates appear, and I would like to avoid them. I'm using an IElasticClient instance with .NET and NEST to index the data.
I searched for a method like ElasticClient.SetDocumentId(), but can't find one.
_doc doc = (_doc)obj;
HashObject hashObject = new HashObject { DataRecordId = doc.DataRecordId, TimeStamp = doc.Timestamp };
// hashId should be the document ID.
int hashId = hashObject.GetHashCode();
ElasticClient.IndexDocumentAsync(doc);
I would like to update the existing document in Elasticsearch instead of adding another copy of the same object.

Assuming the following set up
var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
var settings = new ConnectionSettings(pool)
.DefaultIndex("example")
.DefaultTypeName("_doc");
var client = new ElasticClient(settings);
public class HashObject
{
public int DataRecordId { get; set; }
public DateTime TimeStamp { get; set; }
}
If you want to set the Id for a document explicitly on the request, you can do so with
Fluent syntax
var indexResponse = client.Index(new HashObject(), i => i.Id("your_id"));
Object initializer syntax
var indexRequest = new IndexRequest<HashObject>(new HashObject(), id: "your_id");
var indexResponse = client.Index(indexRequest);
both result in a request
PUT http://localhost:9200/example/_doc/your_id
{
"dataRecordId": 0,
"timeStamp": "0001-01-01T00:00:00"
}
As Rob pointed out in the question comments, NEST has a convention whereby it can infer the Id from the document itself, by looking for a property on the CLR POCO named Id. If it finds one, it will use that as the Id for the document. This does mean that an Id value ends up being stored in _source (and indexed, but you can disable this in the mappings), but it is useful because the Id value is automatically associated with the document and used when needed.
If HashObject is updated to have an Id value, now we can just do
Fluent syntax
var indexResponse = client.IndexDocument(new HashObject { Id = 1 });
Object initializer syntax
var indexRequest = new IndexRequest<HashObject>(new HashObject { Id = 1});
var indexResponse = client.Index(indexRequest);
which will send the request
PUT http://localhost:9200/example/_doc/1
{
"id": 1,
"dataRecordId": 0,
"timeStamp": "0001-01-01T00:00:00"
}
If your documents do not have an id field in the _source, you'll need to handle the _id values from the hits metadata from each hit yourself. For example
var searchResponse = client.Search<HashObject>(s => s
.MatchAll()
);
foreach (var hit in searchResponse.Hits)
{
var id = hit.Id;
var document = hit.Source;
// do something with them
}

Thank you very much Russ for this detailed and easy to understand description! :-)
The HashObject was meant to be just a helper to get a unique ID from my real _doc object. I have now added an Id property to my _doc class; the rest is shown in my code below. I no longer get duplicates in Elasticsearch.
public void Create(object obj)
{
_doc doc = (_doc)obj;
string idAsString = doc.DataRecordId.ToString() + doc.Timestamp.ToString();
int hashId = idAsString.GetHashCode();
doc.Id = hashId;
ElasticClient.IndexDocumentAsync(doc);
}
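One caveat with the approach above: string.GetHashCode() is not guaranteed to return the same value across processes or runtime versions, and Timestamp.ToString() depends on the current culture, so the same record could receive a different Id after a restart. A minimal sketch of a deterministic alternative follows. It assumes DataRecordId is an int and Timestamp is a DateTime (as in HashObject above); the DocumentIds class name is hypothetical.

```csharp
using System;
using System.Globalization;
using System.Security.Cryptography;
using System.Text;

public static class DocumentIds
{
    // Builds a deterministic string ID from the two fields that identify a record.
    // Assumption: dataRecordId + timestamp uniquely identify one device reading.
    public static string Create(int dataRecordId, DateTime timestamp)
    {
        // "O" is the round-trip format, so the text does not depend on the current culture.
        // Timestamps should be normalized (for example to UTC) before calling this method,
        // because the DateTimeKind is part of the formatted text.
        string key = dataRecordId.ToString(CultureInfo.InvariantCulture)
                     + "|"
                     + timestamp.ToString("O", CultureInfo.InvariantCulture);

        using (SHA256 sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(key));
            // Hex-encode the digest so it can be used as a string _id.
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }
}
```

The resulting string could then be passed explicitly with the fluent syntax shown in the answer, for example client.Index(doc, i => i.Id(DocumentIds.Create(doc.DataRecordId, doc.Timestamp))), in which case the Id property on the document would need to be a string or be omitted.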

Related

How to obtain ETag for individual document when using FeedResponse from Microsoft.Azure.Cosmos

I'm looking at samples from https://github.com/Azure/azure-cosmos-dotnet-v3/blob/dc3468bd5ce828e504ddef92ef792c35370de055/Microsoft.Azure.Cosmos.Samples/Usage/ItemManagement/Program.cs#L590
Here is the example of using ETag when using container.ReadItemAsync:
ItemResponse<SalesOrder> itemResponse = await container.ReadItemAsync<SalesOrder>(
partitionKey: new PartitionKey("Account1"),
id: "SalesOrder1");
Console.WriteLine("ETag of read item - {0}", itemResponse.ETag);
SalesOrder item = itemResponse;
//Update the total due
itemResponse.Resource.TotalDue = 1000000;
//persist the change back to the server
ItemResponse<SalesOrder> updatedDoc = await container.ReplaceItemAsync<SalesOrder>(
partitionKey: new PartitionKey(item.AccountNumber),
id: item.Id,
item: item);
Console.WriteLine("ETag of item now that is has been updated - {0}", updatedDoc.ETag);
//now, using the originally retrieved item do another update
//but set the AccessCondition class with the ETag of the originally read item and also set the AccessConditionType
//this tells the service to only do this operation if ETag on the request matches the current ETag on the item
//in our case it won't, because we updated the item and therefore gave it a new ETag
try
{
itemResponse.Resource.TotalDue = 9999999;
updatedDoc = await container.ReplaceItemAsync<SalesOrder>(itemResponse, item.Id, new PartitionKey(item.AccountNumber), new ItemRequestOptions { IfMatchEtag = itemResponse.ETag });
}
catch (CosmosException cre)
{
// now notice the failure when attempting the update
// this is because the ETag on the server no longer matches the ETag of doc (b/c it was changed in step 2)
if (cre.StatusCode == HttpStatusCode.PreconditionFailed)
{
Console.WriteLine("As expected, we have a pre-condition failure exception\n");
}
}
In my scenario I run a query and need each document's ETag for a later update. In this sample we get a FeedResponse<SalesOrder>, which has an ETag, but that tag is associated with the last transaction, not with an individual document. So my question is: how do I get the ETag of an individual document?
private static async Task QueryItems()
{
//******************************************************************************************************************
// 1.4 - Query for items by a property other than Id
//
// NOTE: Operations like AsEnumerable(), ToList(), ToArray() will make as many trips to the database
// as required to fetch the entire result-set. Even if you set MaxItemCount to a smaller number.
// MaxItemCount just controls how many results to fetch each trip.
//******************************************************************************************************************
Console.WriteLine("\n1.4 - Querying for a item using its AccountNumber property");
QueryDefinition query = new QueryDefinition(
"select * from sales s where s.AccountNumber = @AccountInput ")
.WithParameter("@AccountInput", "Account1");
FeedIterator<SalesOrder> resultSet = container.GetItemQueryIterator<SalesOrder>(
query,
requestOptions: new QueryRequestOptions()
{
PartitionKey = new PartitionKey("Account1"),
MaxItemCount = 1
});
List<SalesOrder> allSalesForAccount1 = new List<SalesOrder>();
while (resultSet.HasMoreResults)
{
FeedResponse<SalesOrder> response = await resultSet.ReadNextAsync();
SalesOrder sale = response.First();
Console.WriteLine($"\n1.4.1 Account Number: {sale.AccountNumber}; Id: {sale.Id};");
if(response.Diagnostics != null)
{
Console.WriteLine($" Diagnostics {response.Diagnostics.ToString()}");
}
allSalesForAccount1.Add(sale);
}
}
I have found a solution for now. Not perfect as I call JObject.ToObject<T> twice for:
SalesOrder - business object
ETagCosmosDocument - object with ETag
public class ETagCosmosDocument
{
public string _etag {get; set;}
}
public class CosmosDocument<T>
{
public T Document {get; private set;}
public string ETag {get; private set;}
public CosmosDocument(T document, string etag)
{
Document = document;
ETag = etag;
}
public static implicit operator CosmosDocument<T> (JObject o)
{
T document = o.ToObject<T>();
ETagCosmosDocument etagDoc = o.ToObject<ETagCosmosDocument>();
return new CosmosDocument<T>(document, etagDoc._etag);
}
}
then using FeedIterator<dynamic> instead of FeedIterator<SalesOrder>
QueryDefinition query = new QueryDefinition(...)
FeedIterator<dynamic> resultSet = container.GetItemQueryIterator<dynamic>(
query,
requestOptions: new QueryRequestOptions()
{
PartitionKey = new PartitionKey("Account1"),
MaxItemCount = 1
});
while (resultSet.HasMoreResults)
{
FeedResponse<dynamic> response = await resultSet.ReadNextAsync();
JObject r = response.First();
CosmosDocument<SalesOrder> cd = r;
...
}
When you execute a query, the FeedResponse contains a list of objects of type T. You can obtain the ETag if the T contains a property that matches with the _etag system property. Source Github issue comment.
For instance, using Newtonsoft.Json's JsonProperty attribute, I had to add the following property to my type T:
[JsonProperty("_etag")]
public string ETag { get; set; }
The ETag is a property of the response object. You can get it this way:
string etag = response.ETag;
Thanks.

How to convert key value pair to a custom object list

I have key/value pairs of hotel details:
// This is where the data comes from:
JavaScriptSerializer json_serializer = new JavaScriptSerializer();
dynamic hotels1 = (dynamic)json_serializer.DeserializeObject(jso);
var keyValuePairs = hotels1["hotels"];
var hotelList = keyValuePairs["hotels"];
// hotelList[0] = { 'key'='Code'; value='123' }, { 'key'='Name'; value='Sheraton' }
How do I convert this to a list of Hotel?
List<Hotel> hotaals = new List<Hotel>();
where Hotel is
public class Hotel
{
public int code { get; set; }
public string name { get; set; }
}
I use a foreach loop to map the fields, but my great boss says it's inefficient and that I have to use LINQ.
The loop I use:
foreach (dynamic h in hotelList)
{
oneHotel = new Hotel();
oneHotel.code = h["code"];
oneHotel.name = h["name"];
myHotels.Add(oneHotel);
}
Well the brute force way would be to just project the dictionary to objects by hard-coding the properties:
List<Hotel> hotaals = hotelList.Select(kvp => new Hotel {
code = kvp["Code"],
name = kvp["Name"]
})
.ToList();
I would also challenge what your "great boss" means by "inefficient".
So first you get your initial data:
var hotelList = keyValuePairs["hotels"];
Then use linq to create your new list:
var hotelObjects = hotelList.Select(hotel => new Hotel { code = hotel.key, name = hotel.name});
Now to be clear, what LINQ does under the hood is iterate through the objects (just like foreach), create a new Hotel object for each item in hotelList, and return them as a lazily evaluated IEnumerable<Hotel>. Apply .ToArray() or .ToList() if you want a materialized collection instead.
Now from what it sounds like your initial List of hotel details isn't structured so you might have to modify my supplied linq query above to suit the structure of the list.
You may need something closer to this:
// Gives IEnumerable<Hotel> as result
var hotelObjects = hotelList.Select(hotel => new Hotel{code = hotel["key"], name = hotel["name"]});
// Gives Hotel[] as result
var hotelObjects = hotelList.Select(hotel => new Hotel{code = hotel["key"], name = hotel["name"]}).ToArray();
// Gives List<Hotel> as result
var hotelObjects = hotelList.Select(hotel => new Hotel{code = hotel["key"], name = hotel["name"]}).ToList();
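One practical wrinkle: when hotelList is typed as dynamic, the compiler rejects a lambda passed to an extension method such as Select, so the collection has to be cast to a static type first. The following is a rough sketch of that idea. It assumes each element is a Dictionary<string, object> with the keys "Code" and "Name" (the question shows both capitalized and lowercase keys, so adjust as needed); HotelMapper is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Hotel
{
    public int code { get; set; }
    public string name { get; set; }
}

public static class HotelMapper
{
    // Projects loosely typed key/value records onto Hotel objects.
    public static List<Hotel> ToHotels(IEnumerable<object> records)
    {
        return records
            // Assumption: every element is a dictionary keyed by field name.
            .Cast<IDictionary<string, object>>()
            .Select(d => new Hotel
            {
                // Convert.ToInt32 accepts a boxed number as well as a numeric string such as "123".
                code = Convert.ToInt32(d["Code"]),
                name = Convert.ToString(d["Name"])
            })
            .ToList();
    }
}
```

With the data from the question, the call might look like HotelMapper.ToHotels((IEnumerable<object>)hotelList), assuming the deserializer produced an object[] or another IEnumerable<object>. This is still a single pass over the items, just like the foreach loop, so any difference is in readability rather than speed.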

How to execute RawSql against the context

I currently have a project that I'm working on, which has a database connected to it. In said database I need to query some tables that don't have a relationship. I need to get a specific set of data in order to display it on my user interface. However I need to be able to reference the returned data put it into a list and convert it into json. I have a stored procedure that needs to just be executed against the context because it's retrieving data from many different tables.
I've tried using ExecuteSqlCommand but that doesn't work, because it returns -1 and can't put it into a list.
I've tried using linq to select the columns I want however it's really messy and I cannot retrieve the data as easily.
I've tried using FromSql, however that needs a model to execute against the context which is exactly what I don't want.
public string GetUserSessions(Guid memberId)
{
string sql = "EXECUTE dbo.GetUserTrackByMemberID @p0";
var session = _context.Database.ExecuteSqlCommand(sql, memberId);
var json = JsonConvert.SerializeObject(session);
return json;
}
This is the ExecuteSqlCommand example. It returns -1, and the result cannot be put into a list even though there will be more than one session.
public string GetUserSessions(Guid memberId)
{
var session = _context.MemberSession.Where(ms => ms.MemberId == memberId).Select(s => new Session() { SessionId =
s.SessionId, EventId = s.Session.EventId, CarCategory = s.Session.CarCategory, AirTemp = s.Session.AirTemp,
TrackTemp = s.Session.TrackTemp, Weather = s.Session.Weather, NumberOfLaps = s.Session.NumberOfLaps, SessionLength = s.Session.SessionLength,
Event = new Event() { EventId = s.Session.Event.EventId, TrackId = s.Session.Event.TrackId, Name = s.Session.Event.Name, NumberOfSessions =
s.Session.Event.NumberOfSessions, DateStart = s.Session.Event.DateStart, DateFinish = s.Session.Event.DateFinish, TyreSet = s.Session.Event.TyreSet,
Track = new Track() { TrackId = s.Session.Event.Track.TrackId, Name = s.Session.Event.Track.Name, Location = s.Session.Event.Track.Location, TrackLength
= s.Session.Event.Track.TrackLength, NumberOfCorners = s.Session.Event.Track.NumberOfCorners} } });
var json = JsonConvert.SerializeObject(session);
return json;
}
This is using Linq, however it's really messy and I feel there's probably a better way to do this, and then when retrieving the data from json it's a lot bigger pain.
public string GetUserSessions(Guid memberId)
{
var session = _context.MemberSession.FromSql($"EXECUTE dbo.GetUserSessionByMemberID {memberId}").ToList();
var json = JsonConvert.SerializeObject(session);
return json;
}
This is the ideal way I would like to do it, however since I'm using the MemberSession model it will only retrieve that data from the stored procedure which is in the MemberSession table, however I want data that is in other tables as well....
public string GetUserSessions(Guid memberId)
{
var session = _context.MemberSession.Where(ms => ms.MemberId == memberId).Include("Session").Include("Event").ToList();
var json = JsonConvert.SerializeObject(session);
return json;
}
I tried this way but because the Event table has no reference / relationship to MemberSession it returns an error.
As I've previously stated in the RawSql example I'm only getting the table data that is in the MemberSession table, no other tables.
There are no error messages.
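using (var context = new DBEntities())
{
// Database.SqlQuery<T> (Entity Framework 6) maps each returned row onto ResponseList by column name,
// so ResponseList can be a plain class that is not part of the model.
string query = "Exec [dbo].[YOUR_SP]";
List<ResponseList> obj = context.Database.SqlQuery<ResponseList>(query).ToList();
string JSONString = JsonConvert.SerializeObject(obj);
}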

Query MongoDB Using 'ObjectId'

I have inserted documents into MongoDB without an id, and I want to retrieve them by searching on the MongoDB ObjectId that was assigned by default.
Here is my attempt-
var query_id = Query.EQ("_id", "50ed4e7d5baffd13a44d0153");
var entity = dbCollection.FindOne(query_id);
return entity.ToString();
And I get following error-
A first chance exception of type 'System.NullReferenceException' occurred
What is the problem?
You need to create an instance of ObjectId and then query using that instance, otherwise your query compares ObjectIds to string and fails to find matching documents.
This should work:
var query_id = Query.EQ("_id", ObjectId.Parse("50ed4e7d5baffd13a44d0153"));
var entity = dbCollection.FindOne(query_id);
return entity.ToString();
In C# for latest official MongoDB.Driver write this-
var filter_id = Builders<MODEL_NAME>.Filter.Eq("id", ObjectId.Parse("50ed4e7d5baffd13a44d0153"));
var entity = dbCollection.Find(filter_id).FirstOrDefault();
return entity.ToString();
We can accomplish the same result without converting the id from string to ObjectId, but then we have to add [BsonRepresentation(BsonType.ObjectId)] to the id property in the model class.
The code can even be further simplified using lambda expression-
var entity = dbCollection.Find(document => document.id == "50ed4e7d5baffd13a44d0153").FirstOrDefault();
return entity.ToString();
If you're here in 2018 and want copy/paste code that still works or pure string syntax;
[Fact]
public async Task QueryUsingObjectId()
{
var filter = Builders<CosmosParkingFactory>.Filter.Eq("_id", new ObjectId("5b57516fd16cb04bfc35fcc6"));
var entity = stocksCollection.Find(filter);
var stock = await entity.SingleOrDefaultAsync();
Assert.NotNull(stock);
var idString = "5b57516fd16cb04bfc35fcc6";
var stringFilter = "{ _id: ObjectId('" + idString + "') }";
var entityStringFiltered = stocksCollection.Find(stringFilter);
var stockStringFiltered = await entityStringFiltered.SingleOrDefaultAsync();
Assert.NotNull(stockStringFiltered);
}
The selected answer is correct. For anyone confused by the Query.EQ, here is another way to write a basic update (updates the entire mongodb document):
string mongoDocumentID = "123455666767778";
var query = new QueryDocument("_id", ObjectId.Parse(mongoDocumentID));
var update = new UpdateDocument { { "$set", documentToSave } };
mongoCollection.Update(query, update, UpdateFlags.Multi);
The ObjectId object is needed when you want to actually search by object ID, otherwise it is comparing string to objectid type, and it won't match. Mongo is very type-strict in this way, regardless if the field name is correct.
You can also do it this way:
public static ObjectId GetInternalId(string id)
{
if (!ObjectId.TryParse(id, out ObjectId internalId))
internalId = ObjectId.Empty;
return internalId;
}
then in your method you can do something like this
ObjectId internalId = GetMongoId.GetInternalId(id);
return await YourContext.YourTable.Find(c => c.InternalId == internalId).FirstOrDefaultAsync();
Note: the id passed to GetInternalId is a parameter of the calling method, in case you need it like this:
public async Task<YourTable> Find(string id)
{
ObjectId internalId = GetMongoId.GetInternalId(id);
return await YourContext.YourTable.Find(c => c.InternalId == internalId).FirstOrDefaultAsync();
}
Hope it helps also.

How to work around NotMapped properties in queries?

I have method that looks like this:
private static IEnumerable<OrganizationViewModel> GetOrganizations()
{
var db = new GroveDbContext();
var results = db.Organizations.Select(org => new OrganizationViewModel
{
Id = org.OrgID,
Name = org.OrgName,
SiteCount = org.Sites.Count(),
DbSecureFileCount = 0,
DbFileCount = 0
});
return results;
}
This returns results pretty promptly.
However, you'll notice the OrganizationViewModel has two properties which are being set to 0. The Organization model has properties which I added via a partial class and decorated with [NotMapped]: UnsecureFileCount and SecureFileCount.
If I change those 0s to something useful...
DbSecureFileCount = org.SecureFileCount,
DbFileCount = org.UnsecureFileCount
... I get the "Only initializers, entity members, and entity navigation properties are supported" exception. I find this a little confusing because I don't feel I'm asking the database about them, I'm only setting properties of the view model.
However, since EF isn't listening to my argument I tried a different approach:
private static IEnumerable<OrganizationViewModel> GetOrganizations()
{
var db = new GroveDbContext();
var results = new List<OrganizationViewModel>();
foreach (var org in db.Organizations)
{
results.Add(new OrganizationViewModel
{
Id = org.OrgID,
Name = org.OrgName,
DbSecureFileCount = org.SecureFileCount,
DbFileCount = org.UnsecureFileCount,
SiteCount = org.Sites.Count()
});
}
return results;
}
Technically this gives me the correct results without an exception but it takes forever. (By "forever" I mean more than 60 seconds whereas the first version delivers results in under a second.)
Is there a way to optimize the second approach? Or is there a way to get the first approach to work?
Another option would be to load the values back as an anonymous type and then loop through those to populate your view model (n+1 queries are most likely the reason for the slowness).
For example:
var results = db.Organizations.Select(org => new
{
Id = org.OrgID,
Name = org.OrgName,
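// Caution: SecureFileCount and UnsecureFileCount are [NotMapped], so EF cannot translate them
// inside this query; project the mapped data they are computed from instead.
DbSecureFileCount = org.SecureFileCount,
DbFileCount = org.UnsecureFileCount,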
SiteCount = org.Sites.Count()
}).ToList();
var viewmodels = results.Select( x=> new OrganizationViewModel
{
Id = x.Id,
Name = x.Name,
DbSecureFileCount = x.DbSecureFileCount,
DbFileCount = x.DbFileCount,
SiteCount = x.SiteCount
});
Sorry about the formatting; I'm typing on a phone.
You are basically lazy loading each object at each iteration of the loop, causing n+1 queries.
What you should do is bring in the entire collection into memory, and use it from there.
Sample code:
// Load() returns void; it pulls the entities into the context, where Local exposes them.
db.Organizations.Load();
foreach (var org in db.Organizations.Local)
{
//Here you are free to do whatever you want
}
