Creating a sequence -- SetOnInsert appears to do nothing - c#

I'm having a problem with what boils down to "increment a field in a document, or insert a whole document if it doesn't exist". The context is inserting the initial document for a sequence, or incrementing the sequence number of an existing sequence.
This code:
private async Task<int> GetSequenceNumber(string sequenceName)
{
    var filter = new ExpressionFilterDefinition<Sequence>(x => x.Id == sequenceName);
    var builder = Builders<Sequence>.Update;
    var update = builder
        .SetOnInsert(x => x.CurrentValue, 1000)
        .Inc(x => x.CurrentValue, 1);
    var sequence = await _context.SequenceNumbers.FindOneAndUpdateAsync(
        filter,
        update,
        new FindOneAndUpdateOptions<Sequence>
        {
            IsUpsert = true,
            ReturnDocument = ReturnDocument.After,
        });
    return sequence.CurrentValue;
}
results in the exception
MongoDB.Driver.MongoCommandException: Command findAndModify failed: Updating the path 'currentvalue' would create a conflict at 'currentvalue'.
at MongoDB.Driver.Core.WireProtocol.CommandUsingCommandMessageWireProtocol`1.ProcessResponse(ConnectionId connectionId, CommandMessage responseMessage)
Removing the SetOnInsert results in no errors, but inserts a document with the currentValue equal to 1 instead of the expected 1000.
It almost appears as if SetOnInsert is not being honored, and that what actually happens is that a default document is inserted and then CurrentValue is incremented via Inc as the new document is created.
How do I overcome these issues? A non-C# solution would also be welcome, as I could translate that...

OK, thanks to @dododo in the comments, I now realize that an Inc and a SetOnInsert can't both target the same field: on an upsert both operators would apply to the newly inserted document, so MongoDB rejects the update as a conflict. It's unintuitive, because you'd expect the former to apply only on update and the latter only on insert.
I went with the solution below, which can cost more than one round-trip, but it at least works, and it holds up under my concurrency-based tests.
public async Task<int> GetSequenceNumber(string sequenceName, int tryCount)
{
    if (tryCount > 5) throw new InvalidOperationException();

    var filter = new ExpressionFilterDefinition<Sequence>(x => x.Id == sequenceName);
    var builder = Builders<Sequence>.Update;
    // optimistically assume the value was already initialized
    var update = builder.Inc(x => x.CurrentValue, 1);
    var sequence = await _context.SequenceNumbers.FindOneAndUpdateAsync(
        filter,
        update,
        new FindOneAndUpdateOptions<Sequence>
        {
            // no upsert here: if the sequence doesn't exist yet we get null back
            // and fall through to the explicit insert below
            IsUpsert = false,
            ReturnDocument = ReturnDocument.After,
        });
    if (sequence == null)
    {
        try
        {
            // we have to try to save a new sequence...
            sequence = new Sequence { Id = sequenceName, CurrentValue = 1001 };
            await _context.SequenceNumbers.InsertOneAsync(sequence);
        }
        // ...but something else could beat us to it
        catch (MongoWriteException e) when (e.WriteError.Code == DuplicateKeyCode)
        {
            // ...so we have to retry an update
            return await GetSequenceNumber(sequenceName, tryCount + 1);
        }
    }
    return sequence.CurrentValue;
}
I'm sure there are other options. It may be possible to use an aggregation pipeline, for example.
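For completeness, here is a rough sketch of the aggregation-pipeline variant. It assumes MongoDB 4.2+ (pipeline updates) and a driver version that exposes Builders<T>.Update.Pipeline (roughly 2.9 or later); the element name "CurrentValue" must match however your class map serializes that property, so treat this as an untested sketch rather than a drop-in replacement.
private async Task<int> GetSequenceNumberViaPipeline(string sequenceName)
{
    var filter = Builders<Sequence>.Filter.Eq(x => x.Id, sequenceName);

    // Single $set stage: start from the stored value (or 1000 if the field/document
    // doesn't exist yet, via $ifNull) and add 1. Adjust "CurrentValue" to your mapped element name.
    var setStage = new BsonDocument("$set",
        new BsonDocument("CurrentValue",
            new BsonDocument("$add", new BsonArray
            {
                new BsonDocument("$ifNull", new BsonArray { "$CurrentValue", 1000 }),
                1
            })));

    PipelineDefinition<Sequence, Sequence> pipeline = new[] { setStage };
    var update = Builders<Sequence>.Update.Pipeline(pipeline);

    var sequence = await _context.SequenceNumbers.FindOneAndUpdateAsync(
        filter,
        update,
        new FindOneAndUpdateOptions<Sequence>
        {
            IsUpsert = true,
            ReturnDocument = ReturnDocument.After,
        });

    return sequence.CurrentValue;
}
Because the whole increment-or-seed decision happens inside one $set stage, there is no operator conflict and no second round-trip.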

Related

How to convert SQLQuery to SortedList through EF6

I have an Entity Framework 6 class called Materials, which is reflected in my database as a table with the same name. Using a parent parameter, I need to return a sorted list of materials from a SQL Query, so that I can later check that edits the user makes do not affect the order. My SQL is a stored procedure that looks like this:
CREATE PROC [dbo].[GET_SortedMaterials](@FinishedGoodCode VARCHAR(50))
AS
SELECT
    ROW_NUMBER() OVER (ORDER BY Component.Percentage_of_Parent DESC, Material.Material) AS _sortField
    ,Material.*
FROM
    Components AS Component
    INNER JOIN Materials AS Material ON Component.Child_Material = Material.Material
WHERE
    Component.Parent_Code = @FinishedGoodCode
ORDER BY
    Component.Percentage_of_Parent DESC
    ,Material.Material
As you can see, the ORDER BY fields are not included in Material. For this reason I felt I could not return just a set of Material objects and still keep the sorting, so I performed the ordering in SQL and added the _sortField column (I suspect that column may be a bad idea).
My C# code to read the SQL looks like this:
public async Task<SortedList<int, Materials>> GET_SortedMaterials(IProgress<Report> progress, string finishedGoodCode)
{
    try
    {
        var report = new Report { Message = "Retrieving Sorted Materials", NewLine = true, StatusCode = Enums.StatusCode.Working };
        progress.Report(report);
        using (var context = new DBContext())
        {
            var ingredientList = await context.Database.SqlQuery<(int _sortField, Materials mat)>("[app].[GET_Customers]").ToListAsync();
            var sorted = new SortedList<int, Raw_Materials>();
            foreach (var (_sortField, mat) in ingredientList.OrderBy(x => x._sortField))
            {
                sorted.Add(_sortField, mat);
            }
            return sorted;
        }
    }
    catch (Exception ex)
    {
        [EXCLUDED CODE]
    }
}
When the code executes, I get the correct number of rows back, but I do not get a SortedList where the key corresponds to the _sortField value and the value to the Material. I have tried various versions of basically the same code and I cannot get it to return the materials together with their sort order; instead, the conversion to the EF class fails entirely and I only get null values back.
Any advice on how to return a sorted list from SQL and keep that sorting in C#, when the sort field is not part of the returned entity, would be very gratefully received.
Use
var ingredientList = context.Database.SqlQuery<Materials>("[app].[GET_Customers]").Select((mat, _sortField) => (_sortField, mat)).ToDictionary(x => x._sortField, x => x.mat);
or, if you want to load asynchronously, use
var ingredientList = (await context.Database.SqlQuery<Materials>("[app].[GET_Customers]").ToListAsync()).Select((mat, _sortField) => (_sortField, mat)).ToDictionary(x => x._sortField, x => x.mat);
Full code:
public async Task<SortedList<int, Materials>> GET_SortedMaterials(IProgress<Report> progress, string finishedGoodCode)
{
    try
    {
        var report = new Report { Message = "Retrieving Sorted Materials", NewLine = true, StatusCode = Enums.StatusCode.Working };
        progress.Report(report);
        using (var context = new DBContext())
        {
            var ingredientList = (await context.Database.SqlQuery<Materials>("[app].[GET_Customers]").ToListAsync())
                .Select((mat, _sortField) => (_sortField, mat))
                .ToDictionary(x => x._sortField, x => x.mat);
            var sorted = new SortedList<int, Materials>();
            foreach (var item in ingredientList.OrderBy(x => x.Key))
            {
                sorted.Add(item.Key, item.Value);
            }
            return sorted;
        }
    }
    catch (Exception ex)
    {
        [EXCLUDED CODE]
    }
}
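If you need the row number that the stored procedure itself computed (rather than the position in the returned list), another option is to map the result set onto a plain DTO that includes the _sortField column. This is only a sketch: SortedMaterialRow is a hypothetical class, EF6's SqlQuery<T> matches columns to properties by name, and the call assumes the [dbo].[GET_SortedMaterials] proc from the question.
using System.Data.SqlClient;

// Hypothetical DTO mirroring the proc's result set: the ROW_NUMBER() column
// plus whichever Materials columns you actually need, named as the proc returns them.
public class SortedMaterialRow
{
    public long _sortField { get; set; }   // ROW_NUMBER() comes back as bigint
    public string Material { get; set; }
    // ... remaining Materials columns
}

public async Task<SortedList<int, SortedMaterialRow>> GetSortedMaterialRows(string finishedGoodCode)
{
    using (var context = new DBContext())
    {
        var rows = await context.Database
            .SqlQuery<SortedMaterialRow>(
                "[dbo].[GET_SortedMaterials] @FinishedGoodCode",
                new SqlParameter("@FinishedGoodCode", finishedGoodCode))
            .ToListAsync();

        var sorted = new SortedList<int, SortedMaterialRow>();
        foreach (var row in rows)
        {
            sorted.Add((int)row._sortField, row);
        }
        return sorted;
    }
}
The trade-off is that the rows are no longer tracked Materials entities, but for a read-only ordering check that may be acceptable.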

C# Parallel.ForEach with shared function throws IndexOutOfRangeException

I need help solving a problem with a shared function inside Parallel.ForEach. I get the error below; how can I change the function so it is safe to call from multiple threads?
public IEnumerable<Datamodel> LoadLibrary(IEnumerable<Items> items)
{
    var allLibReferences = new List<LibraryReferenceModel>();
    var baseData = LoadBaseLibData();
    Parallel.ForEach(baseData, data =>
    {
        var item = items.ToList().FindAll(c => c.Name == data.Name);
        CreateLibraryReference(allLibReferences, item, data.Name); // Problem to call function in Parallel.ForEach
    });
    return allLibReferences;
}

private static void CreateLibraryReference(ICollection<LibraryReferenceModel> allLibReferences,
    IReadOnlyCollection<Item> item, string libraryName)
{
    allLibReferences.Add(item.Count == 0
        ? new LibraryReferenceModel
        {
            LibName = libraryName,
            HasReference = false,
            References = item.Count
        }
        : new LibraryReferenceModel
        {
            LibName = libraryName,
            HasReference = true,
            References = item.Count
        });
}
I get this exception (the index is outside the bounds of the array):
Thank you
As you've found, because multiple threads are attempting to add items to the shared allLibReferences collection, you'll hit erratic thread-safety issues like the error you've described.
This is why it's really important to make your code thread safe before you consider parallelising it. One of the best techniques is to rely on immutable constructs, i.e. never try to change (mutate) the value of a shared variable inside parallel code.
So I would change the way the code works: instead of sharing a collection, we project the items we need immutably, which can be safely parallelised (I've used .AsParallel, as it's simpler), and then collate the results and return them.
Furthermore, since the whole point of parallelism is to make code run as quickly as possible, you'll also want to remove inefficiencies such as materialising the same items into a list on every iteration (items.ToList()), and to avoid O(N) scans inside the loop where possible; I've replaced .FindAll(c => c.Name == data.Name) with a pre-calculated dictionary.
Putting that all together, you'll wind up with something like this:
public IEnumerable<LibraryReferenceModel> LoadLibrary(IEnumerable<Item> items)
{
    var keyedItems = items.GroupBy(i => i.Name)
        .ToDictionary(grp => grp.Key, grp => grp.ToList());
    var baseData = LoadBaseLibData();
    var allLibReferences = baseData
        .AsParallel()
        .SelectMany(data =>
        {
            if (keyedItems.TryGetValue(data.Name, out var matchedItems))
            {
                return new[] { ProjectLibraryReference(matchedItems, data.Name) };
            }
            // No matches found
            return new[]
            {
                new LibraryReferenceModel
                {
                    LibName = data.Name,
                    HasReference = false,
                    References = 0
                }
            };
        })
        .ToList();
    return allLibReferences;
}

private static LibraryReferenceModel ProjectLibraryReference(IReadOnlyCollection<Item> items,
    string libraryName)
{
    return new LibraryReferenceModel
    {
        LibName = libraryName,
        HasReference = items.Count > 0,
        References = items.Count
    };
}
I've assumed that multiple items can have the same name, hence the grouping before creating the dictionary; each branch of the projection returns a sequence, which .SelectMany flattens at the end.
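If you would rather keep the Parallel.ForEach shape from the question, another option is to swap the List<T> for a thread-safe collection such as ConcurrentBag<T>. A minimal sketch, reusing the hypothetical types above (note the result order is not deterministic):
using System.Collections.Concurrent;

public IEnumerable<LibraryReferenceModel> LoadLibrary(IEnumerable<Item> items)
{
    // Pre-compute the name lookup once, as above.
    var keyedItems = items.GroupBy(i => i.Name)
        .ToDictionary(grp => grp.Key, grp => grp.ToList());
    var baseData = LoadBaseLibData();

    // ConcurrentBag<T> can be added to safely from multiple threads.
    var allLibReferences = new ConcurrentBag<LibraryReferenceModel>();
    Parallel.ForEach(baseData, data =>
    {
        var matched = keyedItems.TryGetValue(data.Name, out var found)
            ? (IReadOnlyCollection<Item>)found
            : Array.Empty<Item>();
        allLibReferences.Add(ProjectLibraryReference(matched, data.Name));
    });

    return allLibReferences.ToList();
}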

How to batch retrieve entities?

In Azure table storage, how can I query for a set of entities that match specific row keys within a partition?
I'm using Azure table storage and need to retrieve a set of entities that match a set of row keys within the partition.
Basically if this were SQL it may look something like this:
SELECT TOP 1 SomeKey
FROM TableName WHERE SomeKey IN (1, 2, 3, 4, 5);
I figured to save on costs and reduce doing a bunch of table retrieve operations that I could just do it using a table batch operation. For some reason I'm getting an exception that says:
"A batch transaction with a retrieve operation cannot contain any other operations"
Here is my code:
public async Task<IList<GalleryPhoto>> GetDomainEntitiesAsync(int someId, IList<Guid> entityIds)
{
    try
    {
        var client = _storageAccount.CreateCloudTableClient();
        var table = client.GetTableReference("SomeTable");
        var batchOperation = new TableBatchOperation();
        var counter = 0;
        var myDomainEntities = new List<MyDomainEntity>();
        foreach (var id in entityIds)
        {
            if (counter < 100)
            {
                batchOperation.Add(TableOperation.Retrieve<MyDomainEntityTableEntity>(someId.ToString(CultureInfo.InvariantCulture), id.ToString()));
                ++counter;
            }
            else
            {
                var batchResults = await table.ExecuteBatchAsync(batchOperation);
                var batchResultEntities = batchResults.Select(o => ((MyDomainEntityTableEntity)o.Result).ToMyDomainEntity()).ToList();
                myDomainEntities.AddRange(batchResultEntities);
                batchOperation.Clear();
                counter = 0;
            }
        }
        return myDomainEntities;
    }
    catch (Exception ex)
    {
        _logger.Error(ex);
        throw;
    }
}
How can I achieve what I'm after without manually looping through the set of row keys and doing an individual Retrieve table operation for each one? I don't want to incur the cost associated with doing this since I could have hundreds of row keys that I want to filter on.
I made a helper method to do it in a single request per partition.
Use it like this:
var items = table.RetrieveMany<MyDomainEntity>(partitionKey, nameof(TableEntity.RowKey),
    rowKeysList, columnsToSelect);
Here are the helper methods:
public static List<T> RetrieveMany<T>(this CloudTable table, string partitionKey,
    string propertyName, IEnumerable<string> valuesRange,
    List<string> columnsToSelect = null)
    where T : TableEntity, new()
{
    var entities = table.ExecuteQuery(new TableQuery<T>()
        .Where(TableQuery.CombineFilters(
            TableQuery.GenerateFilterCondition(
                nameof(TableEntity.PartitionKey),
                QueryComparisons.Equal,
                partitionKey),
            TableOperators.And,
            GenerateIsInRangeFilter(
                propertyName,
                valuesRange)
        ))
        .Select(columnsToSelect))
        .ToList();
    return entities;
}

public static string GenerateIsInRangeFilter(string propertyName,
    IEnumerable<string> valuesRange)
{
    string finalFilter = valuesRange.NotNull(nameof(valuesRange))
        .Distinct()
        .Aggregate((string)null, (filterSeed, value) =>
        {
            string equalsFilter = TableQuery.GenerateFilterCondition(
                propertyName,
                QueryComparisons.Equal,
                value);
            return filterSeed == null ?
                equalsFilter :
                TableQuery.CombineFilters(filterSeed,
                    TableOperators.Or,
                    equalsFilter);
        });
    return finalFilter ?? "";
}
I have tested it with fewer than 100 values in rowKeysList; if it turns out to throw for larger sets, we can always split the request into parts.
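A rough sketch of that splitting, reusing the helper above and assuming rowKeysList is a List<string>; the chunk size of 90 is just an assumption to keep the generated filter string well under the query length limits:
const int chunkSize = 90; // assumed; tune to your key lengths and URL limits
var items = new List<MyDomainEntity>();
for (var i = 0; i < rowKeysList.Count; i += chunkSize)
{
    var chunk = rowKeysList.Skip(i).Take(chunkSize).ToList();
    items.AddRange(table.RetrieveMany<MyDomainEntity>(partitionKey,
        nameof(TableEntity.RowKey), chunk, columnsToSelect));
}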
With hundreds of row keys, that rules out using $filter with a list of row keys (which would result in partial partition scan anyway).
With the error you're getting, it seems like the batch contains both queries and other types of operations (which isn't permitted). I don't see why you're getting that error, from your code snippet.
Your only other option is to execute individual queries. You can do these asynchronously though, so you wouldn't have to wait for each to return. Table storage provides upwards of 2,000 transactions / sec on a given partition, so it's a viable solution.
Not sure how I missed this in the first place, but here is a snippet from the MSDN documentation for the TableBatchOperation type:
A batch operation may contain up to 100 individual table operations, with the requirement that each operation entity must have same partition key. A batch with a retrieve operation cannot contain any other operations. Note that the total payload of a batch operation is limited to 4MB.
I ended up executing individual retrieve operations asynchronously as suggested by David Makogon.
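For reference, a sketch of what that looks like with the classic table SDK (WindowsAzure.Storage / Microsoft.Azure.Cosmos.Table), reusing the hypothetical types from the question: each row key gets its own point query and the tasks are awaited together.
public async Task<IList<MyDomainEntity>> GetDomainEntitiesAsync(int someId, IList<Guid> entityIds)
{
    var client = _storageAccount.CreateCloudTableClient();
    var table = client.GetTableReference("SomeTable");
    var partitionKey = someId.ToString(CultureInfo.InvariantCulture);

    // One point query (PartitionKey + RowKey) per id, issued concurrently.
    var tasks = entityIds
        .Select(id => table.ExecuteAsync(
            TableOperation.Retrieve<MyDomainEntityTableEntity>(partitionKey, id.ToString())))
        .ToList();
    var results = await Task.WhenAll(tasks);

    // Skip ids that weren't found (Result is null for a missing entity).
    return results
        .Where(r => r.Result != null)
        .Select(r => ((MyDomainEntityTableEntity)r.Result).ToMyDomainEntity())
        .ToList();
}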
I made my own ghetto link-table. I know it's not that efficient (maybe it's fine), but I only make this request if the data is not cached locally, which only happens when switching devices. Anyway, this seems to work. Checking the lengths of the two arrays lets me defer the context.done();
var query = new azure.TableQuery()
    .top(1000)
    .where('PartitionKey eq ?', 'link-' + req.query.email.toLowerCase());

tableSvc.queryEntities('linkUserMarker', query, null, function(error, result, response) {
    if (!error && result) {
        var markers = [];
        result.entries.forEach(function(e) {
            tableSvc.retrieveEntity('markerTable', e.markerPartition._, e.RowKey._.toString(), function(error, marker, response) {
                markers.push(marker);
                if (markers.length == result.entries.length) {
                    context.res = {
                        status: 200,
                        body: {
                            status: 'error',
                            markers: markers
                        }
                    };
                    context.done();
                }
            });
        });
    } else {
        notFound(error);
    }
});
I saw your post while looking for a solution; in my case I needed to look up multiple ids at the same time.
Because there is no Contains LINQ support (https://learn.microsoft.com/en-us/rest/api/storageservices/query-operators-supported-for-the-table-service), I just built a massive or-equals chain.
It seems to be working for me so far; hope it helps someone.
public async Task<ResponseModel<ICollection<TAppModel>>> ExecuteAsync(
    ICollection<Guid> ids,
    CancellationToken cancellationToken = default
)
{
    if (!ids.Any())
        throw new ArgumentOutOfRangeException();

    // https://learn.microsoft.com/en-us/rest/api/storageservices/query-operators-supported-for-the-table-service
    // Contains is not supported, so make a massive or-equals statement...lol
    var item = Expression.Parameter(typeof(TTableModel), typeof(TTableModel).FullName);
    var expressions = ids
        .Select(
            id => Expression.Equal(
                Expression.Constant(id.ToString()),
                // reuse the lambda's parameter here, so the member access is bound
                // to the same parameter the lambda declares below
                Expression.MakeMemberAccess(
                    item,
                    typeof(TTableModel).GetProperty(nameof(ITableEntity.RowKey))
                )
            )
        )
        .ToList();
    var builderExpression = expressions.First();
    builderExpression = expressions
        .Skip(1)
        .Aggregate(
            builderExpression,
            Expression.Or
        );
    var finalExpression = Expression.Lambda<Func<TTableModel, bool>>(builderExpression, item);

    var result = await _azureTableService.FindAsync(
        finalExpression,
        cancellationToken
    );

    return new(
        result.Data?.Select(_ => _mapper.Map<TAppModel>(_)).ToList(),
        result.Succeeded,
        result.User,
        result.Messages.ToArray()
    );
}

public async Task<ResponseModel<ICollection<TTableEntity>>> FindAsync(
    Expression<Func<TTableEntity, bool>> filter,
    CancellationToken ct = default
)
{
    try
    {
        var queryResultsFilter = _tableClient.QueryAsync<TTableEntity>(
            FilterExpressionTree(filter),
            cancellationToken: ct
        );
        var items = new List<TTableEntity>();
        await foreach (TTableEntity qEntity in queryResultsFilter)
            items.Add(qEntity);
        return new ResponseModel<ICollection<TTableEntity>>(items);
    }
    catch (Exception exception)
    {
        _logger.Error(
            nameof(FindAsync),
            exception,
            exception.Message
        );
        // OBFUSCATE
        // TODO PASS ERROR ID
        throw new Exception();
    }
}

Find POCO with MongoDB .Net driver

MongoDB was harder than I remembered! I've tried various versions of if-exists-replace-else-insert with various functions and options. It should be easy, shouldn't it?
It's my personal opinion that the following should work.
var collection = storageClient.GetCollection<Observer>("observers");
await collection.Indexes.CreateOneAsync(Builders<Observer>.IndexKeys.Ascending(_ => _.MyId), new CreateIndexOptions { Unique = true });
foreach (var observer in _observers)
{
    observer.Timestamp = DateTime.Now;
    var res = await collection.FindAsync(o => o.MyId == observer.MyId);
    if (res == null || res.Current == null)
    {
        await collection.InsertOneAsync(observer); //Part 1, fails 2nd time solved with res.MoveNextAsync()
    }
    else
    {
        observer.ID = res.Current.Single().ID;
        var res2 = await collection.ReplaceOneAsync(o => o.MyId == observer.MyId, observer);
        var res3 = await collection.FindAsync(o => o.MyId == observer.MyId);
        await res3.MoveNextAsync();
        Debug.Assert(res3.Current.Single().Timestamp == observer.Timestamp); //Part 2, Assert fails.
    }
}
Observer looks approximately like this:
public class Observer : IObserver
{
    [BsonId]
    public Guid ID { get; set; }
    public int MyId { get; set; }
    public DateTime Timestamp { get; set; }
}
The second time I run this with the exact same collection I unexpectedly get:
E11000 duplicate key error index: db.observers.$MyId_1 dup key: { : 14040 }
Edit:
Added original part two code: replacement.
Edit 2:
Now my code looks like this. Still fails.
var collection = storageClient.GetCollection<Observer>("gavagai_mentions");
await collection.Indexes.CreateOneAsync(Builders<Observer>.IndexKeys.Ascending(_ => _.MyId), new CreateIndexOptions { Unique = true });
foreach (var observer in _observers)
{
    observer.Timestamp = DateTime.Now;
    // Create a BsonDocument version of the POCO that we can manipulate
    // and then remove the _id field so it can be used in a $set.
    var bsonObserver = observer.ToBsonDocument();
    bsonObserver.Remove("_id");
    // Create an update object that sets all fields on an insert, and everything
    // but the immutable _id on an update.
    var update = new BsonDocument("$set", bsonObserver);
    update.Add(new BsonDocument("$setOnInsert", new BsonDocument("_id", observer.ID)));
    // Enable the upsert option to create the doc if it's not found.
    var options = new UpdateOptions { IsUpsert = true };
    var res = await collection.UpdateOneAsync(o => o.MyId == observer.MyId,
        update, options);
    var res2 = await collection.FindAsync(o => o.MyId == observer.MyId);
    await res2.MoveNextAsync();
    Debug.Assert(res2.Current.Single().Timestamp == observer.Timestamp); // Assert fails, but only because MongoDB stores dates as UTC, or so I deduce. It works!!
}
You can do this atomically with UpdateOneAsync by using the IsUpsert option to create the doc if it doesn't already exist.
foreach (var observer in _observers)
{
    // Create a BsonDocument version of the POCO that we can manipulate
    // and then remove the _id field so it can be used in a $set.
    var bsonObserver = observer.ToBsonDocument();
    bsonObserver.Remove("_id");
    // Create an update object that sets all fields on an insert, and everything
    // but the immutable _id on an update.
    var update = new BsonDocument("$set", bsonObserver);
    update.Add(new BsonDocument("$setOnInsert", new BsonDocument("_id", observer.ID)));
    // Enable the upsert option to create the doc if it's not found.
    var options = new UpdateOptions { IsUpsert = true };
    var res = await collection.UpdateOneAsync(o => o.MyId == observer.MyId,
        update, options);
}
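As a side note, for a small POCO like Observer the same upsert can be written with the typed update builder instead of hand-built BsonDocuments. This is just a sketch of the equivalent update; each mutable field is listed explicitly, so it scales poorly for large documents.
foreach (var observer in _observers)
{
    // $set the mutable fields, $setOnInsert the immutable _id,
    // and let the upsert create the document when it's missing.
    var update = Builders<Observer>.Update
        .Set(o => o.MyId, observer.MyId)
        .Set(o => o.Timestamp, observer.Timestamp)
        .SetOnInsert(o => o.ID, observer.ID);

    await collection.UpdateOneAsync(
        o => o.MyId == observer.MyId,
        update,
        new UpdateOptions { IsUpsert = true });
}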
Ok, it's just been a long time since I worked with cursors.
await res.MoveNextAsync();
helped.
This stuff is not hard to Google, but the fact of the matter is, I failed, so I'm going to leave the question up.
If you have an answer for part two (https://stackoverflow.com/questions/32586064/replace-poco-with-mongodb-net-driver-2) and would like to post it here, I'll be happy to edit the question.
As was pointed out in the comments, the information is in fact readily available in the docs: http://mongodb.github.io/mongo-csharp-driver/2.0/reference/driver/crud/reading/#finding-documents.

Why does this EF upsert logic result in deletions instead of updates?

The following code results in deletions instead of updates.
My question is: is this a bug in the way I'm coding against Entity Framework or should I suspect something else?
Update: I got this working, but I'm leaving the question now with both the original and the working versions in hopes that I can learn something I didn't understand about EF.
In this, the original non-working code, all the additions of SearchDailySummary objects succeed when the database is fresh, but on the second time through, when my code was supposed to perform the updates, the net result is a once-again-empty table in the database, i.e. this logic manages to be the equivalent of removing each entity.
//Logger.Info("Upserting SearchDailySummaries..");
using (var db = new ClientPortalContext())
{
    foreach (var item in items)
    {
        var campaignName = item["campaign"];
        var pk1 = db.SearchCampaigns.Single(c => c.SearchCampaignName == campaignName).SearchCampaignId;
        var pk2 = DateTime.Parse(item["day"].Replace('-', '/'));
        var source = new SearchDailySummary
        {
            SearchCampaignId = pk1,
            Date = pk2,
            Revenue = decimal.Parse(item["totalConvValue"]),
            Cost = decimal.Parse(item["cost"]),
            Orders = int.Parse(item["conv1PerClick"]),
            Clicks = int.Parse(item["clicks"]),
            Impressions = int.Parse(item["impressions"]),
            CurrencyId = item["currency"] == "USD" ? 1 : -1 // NOTE: non USD (if exists) -1 for now
        };
        var target = db.Set<SearchDailySummary>().Find(pk1, pk2) ?? new SearchDailySummary();
        if (db.Entry(target).State == EntityState.Detached)
        {
            db.SearchDailySummaries.Add(target);
            addedCount++;
        }
        else
        {
            // TODO?: compare source and target and change the entity state to unchanged if no diff
            updatedCount++;
        }
        AutoMapper.Mapper.Map(source, target);
        itemCount++;
    }
    Logger.Info("Saving {0} SearchDailySummaries ({1} updates, {2} additions)", itemCount, updatedCount, addedCount);
    db.SaveChanges();
}
Here is the working version (although I'm not 100% sure it's optimized, it works reliably and performs fine as long as I batch it out in groups of 500 or fewer items at a time; beyond that it slows down exponentially, but I think that may be a different question/subject)...
//Logger.Info("Upserting SearchDailySummaries..");
using (var db = new ClientPortalContext())
{
    foreach (var item in items)
    {
        var campaignName = item["campaign"];
        var pk1 = db.SearchCampaigns.Single(c => c.SearchCampaignName == campaignName).SearchCampaignId;
        var pk2 = DateTime.Parse(item["day"].Replace('-', '/'));
        var source = new SearchDailySummary
        {
            SearchCampaignId = pk1,
            Date = pk2,
            Revenue = decimal.Parse(item["totalConvValue"]),
            Cost = decimal.Parse(item["cost"]),
            Orders = int.Parse(item["conv1PerClick"]),
            Clicks = int.Parse(item["clicks"]),
            Impressions = int.Parse(item["impressions"]),
            CurrencyId = item["currency"] == "USD" ? 1 : -1 // NOTE: non USD (if exists) -1 for now
        };
        var target = db.Set<SearchDailySummary>().Find(pk1, pk2);
        if (target == null)
        {
            db.SearchDailySummaries.Add(source);
            addedCount++;
        }
        else
        {
            AutoMapper.Mapper.Map(source, target);
            db.Entry(target).State = EntityState.Modified;
            updatedCount++;
        }
        itemCount++;
    }
    Logger.Info("Saving {0} SearchDailySummaries ({1} updates, {2} additions)", itemCount, updatedCount, addedCount);
    db.SaveChanges();
}
The thing that keeps popping up in my mind is that maybe the Entry(entity) or Find(pk) method has some side effects. I should probably be consulting the documentation, but any advice is appreciated.
It's a slight assumption on my part (without looking into your models/entities), but have a look at what's going on within this block (see if the objects being attached here are related to the deletions):
if (db.Entry(target).State == EntityState.Detached)
{
    db.SearchDailySummaries.Add(target);
    addedCount++;
}
Your detached object won't be able to use its navigation properties to locate its related objects; you'll be re-attaching an object in a potentially conflicting state (without the correct relationships).
You haven't mentioned what is being deleted above, so I may be way off. Just off out, so this is a little rushed, hope there's something useful in there.
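One way to see what is actually being deleted is to dump EF's tracked entries and their states just before SaveChanges; deleted SearchDailySummary rows, or related entities, will show up there. A small diagnostic sketch (Logger stands in for the question's existing logger):
// Diagnostic sketch: log what EF intends to do before committing.
foreach (var entry in db.ChangeTracker.Entries())
{
    Logger.Info("{0} -> {1}", entry.Entity.GetType().Name, entry.State);
}
db.SaveChanges();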
