How to batch retrieve entities? - c#

In Azure Table storage, how can I query for a set of entities that match specific row keys in a partition?
I'm using Azure Table storage and need to retrieve a set of entities that match a set of row keys within a partition.
If this were SQL, it might look something like this:
SELECT TOP 1 SomeKey
FROM TableName WHERE SomeKey IN (1, 2, 3, 4, 5);
I figured that, to save on costs and avoid a series of individual table retrieve operations, I could use a table batch operation. For some reason I'm getting an exception that says:
"A batch transaction with a retrieve operation cannot contain any other operations"
Here is my code:
public async Task<IList<MyDomainEntity>> GetDomainEntitiesAsync(int someId, IList<Guid> entityIds)
{
try
{
var client = _storageAccount.CreateCloudTableClient();
var table = client.GetTableReference("SomeTable");
var batchOperation = new TableBatchOperation();
var counter = 0;
var myDomainEntities = new List<MyDomainEntity>();
foreach (var id in entityIds)
{
if (counter < 100)
{
batchOperation.Add(TableOperation.Retrieve<MyDomainEntityTableEntity>(someId.ToString(CultureInfo.InvariantCulture), id.ToString()));
++counter;
}
else
{
var batchResults = await table.ExecuteBatchAsync(batchOperation);
var batchResultEntities = batchResults.Select(o => ((MyDomainEntityTableEntity)o.Result).ToMyDomainEntity()).ToList();
myDomainEntities.AddRange(batchResultEntities);
batchOperation.Clear();
counter = 0;
}
}
return myDomainEntities;
}
catch (Exception ex)
{
_logger.Error(ex);
throw;
}
}
How can I achieve what I'm after without manually looping through the set of row keys and doing an individual Retrieve table operation for each one? I don't want to incur the cost associated with doing this since I could have hundreds of row keys that I want to filter on.

I made a helper method to do it in a single request per partition.
Use it like this:
var items = table.RetrieveMany<MyDomainEntity>(partitionKey, nameof(TableEntity.RowKey),
rowKeysList, columnsToSelect);
Here are the helper methods:
public static List<T> RetrieveMany<T>(this CloudTable table, string partitionKey,
string propertyName, IEnumerable<string> valuesRange,
List<string> columnsToSelect = null)
where T : TableEntity, new()
{
var entities = table.ExecuteQuery(new TableQuery<T>()
.Where(TableQuery.CombineFilters(
TableQuery.GenerateFilterCondition(
nameof(TableEntity.PartitionKey),
QueryComparisons.Equal,
partitionKey),
TableOperators.And,
GenerateIsInRangeFilter(
propertyName,
valuesRange)
))
.Select(columnsToSelect))
.ToList();
return entities;
}
public static string GenerateIsInRangeFilter(string propertyName,
IEnumerable<string> valuesRange)
{
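// Plain null check in place of the custom NotNull extension, which is not shown here
string finalFilter = (valuesRange ?? throw new ArgumentNullException(nameof(valuesRange)))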
.Distinct()
.Aggregate((string)null, (filterSeed, value) =>
{
string equalsFilter = TableQuery.GenerateFilterCondition(
propertyName,
QueryComparisons.Equal,
value);
return filterSeed == null ?
equalsFilter :
TableQuery.CombineFilters(filterSeed,
TableOperators.Or,
equalsFilter);
});
return finalFilter ?? "";
}
I have only tested it with fewer than 100 values in rowKeysList. If it does throw an exception with more values, the request can always be split into parts.
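Splitting the request into parts could look roughly like this. It is a minimal, self-contained sketch that builds the OData filter strings by hand rather than through TableQuery; the class RowKeyFilters, its methods, and the chunk size are hypothetical names for illustration and are not part of the storage SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the Azure storage SDK.
public static class RowKeyFilters
{
    // Splits the distinct row keys into groups of at most chunkSize, preserving order.
    public static IEnumerable<List<string>> Chunk(IEnumerable<string> rowKeys, int chunkSize)
    {
        if (rowKeys == null) throw new ArgumentNullException(nameof(rowKeys));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        var current = new List<string>(chunkSize);
        foreach (string key in rowKeys.Distinct())
        {
            current.Add(key);
            if (current.Count == chunkSize)
            {
                yield return current;
                current = new List<string>(chunkSize);
            }
        }
        if (current.Count > 0) yield return current;
    }

    // Builds "PartitionKey eq '...' and (RowKey eq '...' or RowKey eq '...')".
    public static string BuildFilter(string partitionKey, IReadOnlyCollection<string> rowKeys)
    {
        if (rowKeys.Count == 0)
            throw new ArgumentException("At least one row key is required.", nameof(rowKeys));

        string rowKeyFilter = string.Join(
            " or ",
            rowKeys.Select(key => $"RowKey eq '{Escape(key)}'"));
        return $"PartitionKey eq '{Escape(partitionKey)}' and ({rowKeyFilter})";
    }

    // OData string literals escape a single quote by doubling it.
    private static string Escape(string value) => value.Replace("'", "''");
}
```

Each chunk would then be sent as its own query (for example by passing the string from BuildFilter to TableQuery<T>.Where) and the results concatenated.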

With hundreds of row keys, using $filter with a list of row keys is ruled out (it would result in a partial partition scan anyway).
Given the error you're getting, it seems the batch contains both queries and other types of operations (which isn't permitted). I don't see why you're getting that error from your code snippet, though.
Your only other option is to execute individual queries. You can do these asynchronously though, so you wouldn't have to wait for each to return. Table storage provides upwards of 2,000 transactions / sec on a given partition, so it's a viable solution.
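Running the individual lookups concurrently might be sketched like this. The storage call is passed in as a delegate so the example stays self-contained: retrieveAsync is a hypothetical wrapper that would perform one point lookup for a row key (for instance around a single retrieve operation) and return null when nothing is found. The names BulkRetrieve and maxConcurrency are likewise assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: runs one point lookup per row key with bounded concurrency.
public static class BulkRetrieve
{
    public static async Task<List<TEntity>> RetrieveManyAsync<TEntity>(
        IEnumerable<string> rowKeys,
        Func<string, Task<TEntity>> retrieveAsync, // assumed to return null when not found
        int maxConcurrency = 16)
        where TEntity : class
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            // ToList() starts every lookup; the semaphore limits how many run at once.
            List<Task<TEntity>> lookups = rowKeys
                .Distinct()
                .Select(async rowKey =>
                {
                    await gate.WaitAsync();
                    try { return await retrieveAsync(rowKey); }
                    finally { gate.Release(); }
                })
                .ToList();

            // WhenAll returns results in the same order as the tasks.
            TEntity[] entities = await Task.WhenAll(lookups);
            return entities.Where(entity => entity != null).ToList();
        }
    }
}
```

The cap on concurrency is a design choice to stay well below the per-partition throughput mentioned above; the right value depends on the workload.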

Not sure how I missed this in the first place, but here is a snippet from the MSDN documentation for the TableBatchOperation type:
A batch operation may contain up to 100 individual table operations, with the requirement that each operation entity must have same partition key. A batch with a retrieve operation cannot contain any other operations. Note that the total payload of a batch operation is limited to 4MB.
I ended up executing individual retrieve operations asynchronously as suggested by David Makogon.

I made my own makeshift link table. I know it's not that efficient (maybe it's fine), but I only make this request if the data is not cached locally, which only happens when switching devices. Anyway, this seems to work. Comparing the lengths of the two arrays lets me defer context.done() until every lookup has returned.
var query = new azure.TableQuery()
.top(1000)
.where('PartitionKey eq ?', 'link-' + req.query.email.toLowerCase() );
tableSvc.queryEntities('linkUserMarker',query, null, function(error, result, response) {
if( !error && result ){
var markers = [];
result.entries.forEach(function(e){
tableSvc.retrieveEntity('markerTable', e.markerPartition._, e.RowKey._.toString() , function(error, marker, response){
markers.push( marker );
if( markers.length == result.entries.length ){
context.res = {
status:200,
body:{
status:'error',
markers: markers
}
};
context.done();
}
});
});
} else {
notFound(error);
}
});

I found your post while looking for a solution; in my case I needed to look up multiple ids at the same time.
Because there is no LINQ Contains support (https://learn.microsoft.com/en-us/rest/api/storageservices/query-operators-supported-for-the-table-service), I just built one long chain of OR'd equality checks.
It seems to be working for me so far; I hope it helps someone.
public async Task<ResponseModel<ICollection<TAppModel>>> ExecuteAsync(
ICollection<Guid> ids,
CancellationToken cancellationToken = default
)
{
if (!ids.Any())
throw new ArgumentOutOfRangeException();
// https://learn.microsoft.com/en-us/rest/api/storageservices/query-operators-supported-for-the-table-service
// Contains is not supported, so build one long chain of OR'd equality checks
var item = Expression.Parameter(typeof(TTableModel), typeof(TTableModel).FullName);
var expressions = ids
.Select(
id => Expression.Equal(
Expression.Constant(id.ToString()),
Expression.MakeMemberAccess(
item, // reuse the lambda parameter declared above so the body and the lambda refer to the same parameter
typeof(TTableModel).GetProperty(nameof(ITableEntity.RowKey))
)
)
)
.ToList();
var builderExpression = expressions.First();
builderExpression = expressions
.Skip(1)
.Aggregate(
builderExpression,
Expression.Or
);
var finalExpression = Expression.Lambda<Func<TTableModel, bool>>(builderExpression, item);
var result = await _azureTableService.FindAsync(
finalExpression,
cancellationToken
);
return new(
result.Data?.Select(_ => _mapper.Map<TAppModel>(_)).ToList(),
result.Succeeded,
result.User,
result.Messages.ToArray()
);
}
public async Task<ResponseModel<ICollection<TTableEntity>>> FindAsync(
Expression<Func<TTableEntity,bool>> filter,
CancellationToken ct = default
)
{
try
{
var queryResultsFilter = _tableClient.QueryAsync<TTableEntity>(
FilterExpressionTree(filter),
cancellationToken: ct
);
var items = new List<TTableEntity>();
await foreach (TTableEntity qEntity in queryResultsFilter)
items.Add(qEntity);
return new ResponseModel<ICollection<TTableEntity>>(items);
}
catch (Exception exception)
{
_logger.Error(
nameof(FindAsync),
exception,
exception.Message
);
// OBFUSCATE
// TODO PASS ERROR ID
throw new Exception();
}
}

Related

Merge data from two arrays or something else

How can I combine the Ids from the list I read from the file /test.json with the ids from ourOrders[i].id?
Or is there another way?
private RegionModel FilterByOurOrders(RegionModel region, List<OurOrderModel> ourOrders, MarketSettings market, bool byOurOrders)
{
var result = new RegionModel
{
updatedTs = region.updatedTs,
orders = new List<OrderModel>(region.orders.Count)
};
var json = File.ReadAllText("/test.json");
var otherBotOrders = JsonSerializer.Deserialize<OrdersTimesModel>(json);
OtherBotOrders = new Dictionary<string, OrderTimesInfoModel>();
foreach (var otherBotOrder in otherBotOrders.OrdersTimesInfo)
{
//OtherBotOrders.Add(otherBotOrder.Id, otherBotOrder);
BotController.WriteLine($"{otherBotOrder.Id}"); //Output ID orders to the console works
}
foreach (var order in region.orders)
{
if (ConvertToDecimal(order.price) < 1 || !byOurOrders)
{
int i = 0;
var isOurOrder = false;
while (i < ourOrders.Count && !isOurOrder)
{
if (ourOrders[i].id.Equals(order.id, StringComparison.InvariantCultureIgnoreCase))
{
isOurOrder = true;
}
++i;
}
if (!isOurOrder)
{
result.orders.Add(order);
}
}
}
return result;
}
OrdersTimesModel looks like this:
public class OrdersTimesModel
{
public List<OrderTimesInfoModel> OrdersTimesInfo { get; set; }
}
test.json:
{"OrdersTimesInfo":[{"Id":"1"},{"Id":"2"}]}
Added:
I'll try to clarify the question:
There are three lists with ID:
First (all orders): region.orders, as order.id
Second (our orders): ourOrders, as ourOrders[i].id in a while loop
Third (our orders 2): from the /test.json file, as an array {"Orders":[{"Id":"12345..."...},{"Id":"12345..." ...}...]}
There is a foreach containing a while loop, in which the first list (all orders) is compared with the second list (our orders). If the ids match, the order is one of ours: isOurOrder = true;
Accordingly, the orders for which isOurOrder = false are added to the result: result.orders.Add(order)
I need:
The check if (ourOrders[i].id.Equals(order.id, StringComparison.InvariantCultureIgnoreCase)) to also cover the Ids from the third list (our orders 2).
Or is there another way to do it?
You should be able to completely avoid writing loops if you use LINQ (there will be loops running in the background, but it's way easier to read)
You can access some documentation here: https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/linq/introduction-to-linq-queries
and you have some pretty cool extension methods for arrays: https://learn.microsoft.com/en-us/dotnet/api/system.linq.enumerable?view=net-6.0 (these are great to get your code easy to read)
Solution
using System.Linq;
private RegionModel FilterByOurOrders(RegionModel region, List<OurOrderModel> ourOrders, MarketSettings market, bool byOurOrders)
{
var result = new RegionModel
{
updatedTs = region.updatedTs,
orders = new List<OrderModel>(region.orders.Count)
};
var json = File.ReadAllText("/test.json");
var otherBotOrders = JsonSerializer.Deserialize<OrdersTimesModel>(json);
// The ids from the JSON file (OrdersTimesInfo holds the list)
var idsFromJsonFile = otherBotOrders.OrdersTimesInfo.Select(x => x.Id);
// The ids from our own orders
var idsFromOurOrders = ourOrders.Select(x => x.id);
// Union keeps only unique values; the HashSet makes each lookup cheap
// and keeps the case-insensitive comparison of the original loop.
var mergedIds = new HashSet<string>(idsFromJsonFile.Union(idsFromOurOrders), StringComparer.InvariantCultureIgnoreCase);
// Now we just need to query the region orders:
// keep every order whose id is NOT in the merged set
var filteredRegionOrders = region.orders.Where(x => !mergedIds.Contains(x.id));
result.orders.AddRange(filteredRegionOrders);
return result;
}
You can add conditions to any of those steps (like checking the order price or the boolean flag you get as a parameter), and of course you can do it without assigning so many variables; I did it that way just to make it easier to explain.

How to convert SQLQuery to SortedList through EF6

I have an Entity Framework 6 class called Materials, which is reflected in my database as a table with the same name. Using a parent parameter, I need to return a sorted list of materials from a SQL Query, so that I can later check that edits the user makes do not affect the order. My SQL is a stored procedure that looks like this:
CREATE PROC [dbo].[GET_SortedMaterials](@FinishedGoodCode VARCHAR(50))
AS
SELECT
ROW_NUMBER() OVER (ORDER BY Component.Percentage_of_Parent DESC,Material.Material) AS _sortField
,Material.*
FROM
Components AS Component
INNER JOIN Materials AS Material ON Component.Child_Material = Material.Material
WHERE
Component.Parent_Code = @FinishedGoodCode
ORDER BY
Component.Percentage_of_Parent DESC
,Material.Material
As you can see, the orderby field is not included in the Material. For this reason, I felt I could not return just a set of Material objects and still keep the sorting - I have performed the ordering in SQL and added the _sortField (I think that field may be a bad idea).
My C# code to read the SQL looks like this:
public async Task<SortedList<int, Materials>> GET_SortedMaterials(IProgress<Report> progress, string finishedGoodCode)
{
try
{
var report = new Report { Message = "Retrieving Sorted Materials", NewLine = true, StatusCode = Enums.StatusCode.Working };
progress.Report(report);
using (var context = new DBContext())
{
var ingredientList = await context.Database.SqlQuery<(int _sortField,Materials mat)>("[app].[GET_Customers]").ToListAsync();
var sorted = new SortedList<int, Raw_Materials>();
foreach (var (_sortField, mat) in ingredientList.OrderBy(x=>x._sortField))
{
sorted.Add(_sortField, mat);
}
return sorted;
}
}
catch (Exception ex)
{ [EXCLUDED CODE]
}
}
When the code executes, I get the correct number of rows back, but not a sorted list where the Key corresponds to the _sortField value and the Value to the Material. I have tried several versions of essentially the same code and cannot get it to return a list of materials together with their sort order; instead, the conversion to the EF class fails entirely and I only get null values back.
Any advice about how to return a sorted list from SQL and maintain the sorting in C#, when the sort field is not in the return values would be very gratefully received.
Use:
var ingredientList = context.Database.SqlQuery<Materials>("[app].[GET_Customers]").Select((mat, _sortField) => (_sortField, mat)).ToDictionary(x => x._sortField, x => x.mat);
or, if you want to load asynchronously, use:
var ingredientList = (await context.Database.SqlQuery<Materials>("[app].[GET_Customers]").ToListAsync()).Select((mat, _sortField) => (_sortField, mat)).ToDictionary(x => x._sortField, x => x.mat);
Full code:
public async Task<SortedList<int, Materials>> GET_SortedMaterials(IProgress<Report> progress, string finishedGoodCode)
{
try
{
var report = new Report { Message = "Retrieving Sorted Materials", NewLine = true, StatusCode = Enums.StatusCode.Working };
progress.Report(report);
using (var context = new DBContext())
{
var ingredientList = (await context.Database.SqlQuery<Materials>("[app].[GET_Customers]").ToListAsync()).Select((mat, _sortField) => (_sortField, mat)).ToDictionary(x => x._sortField, x => x.mat);
var sorted = new SortedList<int, Materials>();
foreach (var item in ingredientList.OrderBy(x => x.Key))
{
sorted.Add(item.Key, item.Value);
}
return sorted;
}
}
catch (Exception ex)
{
[EXCLUDED CODE]
}
}
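The approach above works because the rows arrive in the order produced by the stored procedure's ORDER BY, so each row's position can serve as the sort key. A minimal sketch of just that step, independent of Entity Framework, might look like this; the extension method ToSortedListByPosition is a hypothetical name, and it assumes the input sequence is already in the desired order.

```csharp
using System.Collections.Generic;

// Hypothetical helper: keys each row by its 1-based position in an already ordered sequence.
public static class SortedListExtensions
{
    public static SortedList<int, T> ToSortedListByPosition<T>(this IEnumerable<T> orderedRows)
    {
        var sorted = new SortedList<int, T>();
        int position = 1; // 1-based, like ROW_NUMBER()
        foreach (T row in orderedRows)
        {
            sorted.Add(position++, row);
        }
        return sorted;
    }
}
```

With this, the materialized query result could be converted in one call instead of going through an intermediate dictionary.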

Creating a sequence -- SetOnInsert appears to do nothing

I'm having a problem with what boils down to either incrementing a field in an existing document or inserting an entire new document. The context is inserting an initial document for a sequence, or incrementing the sequence number of an existing sequence.
This code:
private async Task<int> GetSequenceNumber(string sequenceName)
{
var filter = new ExpressionFilterDefinition<Sequence>(x => x.Id == sequenceName);
var builder = Builders<Sequence>.Update;
var update = builder
.SetOnInsert(x => x.CurrentValue, 1000)
.Inc(x => x.CurrentValue, 1);
var sequence = await _context.SequenceNumbers.FindOneAndUpdateAsync(
filter,
update,
new FindOneAndUpdateOptions<Sequence>
{
IsUpsert = true,
ReturnDocument = ReturnDocument.After,
});
return sequence.CurrentValue;
}
results in the exception
MongoDB.Driver.MongoCommandException: Command findAndModify failed: Updating the path 'currentvalue' would create a conflict at 'currentvalue'.
at MongoDB.Driver.Core.WireProtocol.CommandUsingCommandMessageWireProtocol`1.ProcessResponse(ConnectionId connectionId, CommandMessage responseMessage)
Removing the SetOnInsert results in no errors, but inserts a document with the currentValue equal to 1 instead of the expected 1000.
It almost appears as if SetOnInsert is not being honored, and that what's happening is that a default document is inserted and currentValue is then incremented via Inc atomically as the new document is created.
How do I overcome these issues? A non-C# solution would also be welcome, as I could translate that...
OK, thanks to @dododo in the comments, I now realize that an Inc and a SetOnInsert can't both be applied to the same field in one update. It's unintuitive because you'd think the former would apply on update only and the latter on insert only.
I went with the solution below, which costs more than one round-trip but at least works, including in my concurrency tests.
public async Task<int> GetSequenceNumber(string sequenceName, int tryCount)
{
if (tryCount > 5) throw new InvalidOperationException();
var filter = new ExpressionFilterDefinition<Sequence>(x => x.Id == sequenceName);
var builder = Builders<Sequence>.Update;
// optimistically assume value was already initialized
var update = builder.Inc(x => x.CurrentValue, 1);
var sequence = await _context.SequenceNumbers.FindOneAndUpdateAsync(
filter,
update,
new FindOneAndUpdateOptions<Sequence>
{
IsUpsert = false, // no upsert here: a missing sequence is inserted explicitly below
ReturnDocument = ReturnDocument.After,
});
if (sequence == null)
try
{
// we have to try to save a new sequence...
sequence = new Sequence { Id = sequenceName, CurrentValue = 1001 };
await _context.SequenceNumbers.InsertOneAsync(sequence);
}
// ...but something else could beat us to it
catch (MongoWriteException e) when (e.WriteError.Code == DuplicateKeyCode)
{
// ...so we have to retry an update
return await GetSequenceNumber(sequenceName, tryCount + 1);
}
return sequence.CurrentValue;
}
I'm sure there are other options. It may be possible to use an aggregation pipeline, for example.

Why does my Azure Cosmos query return empty results when it should return many results?

I am running a query against my Cosmos db instance, and I am occasionally getting 0 results back, when I know that I should be getting some results.
var options = new QueryRequestOptions()
{
MaxItemCount = 25
};
var query = @"
select c.id,c.callTime,c.direction,c.action,c.result,c.duration,c.hasR,c.hasV,c.callersIndexed,c.callers,c.files
from c
where
c.ownerId=@ownerId
and c.callTime>=@dateFrom
and c.callTime<=@dateTo
and (CONTAINS(c.phoneNums_s, @name)
or CONTAINS(c.names_s, @name)
or CONTAINS(c.xNums_s, @name))
order by c.callTime desc";
var queryIterator = container.GetItemQueryIterator<CallIndex>(new QueryDefinition(query)
.WithParameter("@ownerId", "62371255008")
.WithParameter("@name", "harr")
.WithParameter("@dateFrom", dateFrom) // 5/30/2020 5:00:00 AM +00:00
.WithParameter("@dateTo", dateTo) // 8/29/2020 4:59:59 AM +00:00
.WithParameter("@xnum", null), requestOptions: options, continuationToken: null);
if (queryIterator.HasMoreResults)
{
var feed = queryIterator.ReadNextAsync().Result;
model.calls = feed.ToList(); //feed.Resource is empty; feed.Count is 0;
model.CosmosContinuationToken = feed.ContinuationToken; //feed.ContinuationToken is populated with a large token value, indicating that there are more results, even though this fetch returned 0 items.
model.TotalRecords = feed.Count(); // 0
}
As you can see, even though I received 0 results, the continuation token indicates that there is more data there after this first request. And, after visually inspecting the data directly in the database (data explorer in the Azure portal), I see records that should match, but they are not found in this query. To further test, I ran the same exact query a few seconds later, and received results:
var query = @"
select c.id,c.callTime,c.direction,c.action,c.result,c.duration,c.hasR,c.hasV,c.callersIndexed,c.callers,c.files
from c
where
c.ownerId=@ownerId
and c.callTime>=@dateFrom
and c.callTime<=@dateTo
and (CONTAINS(c.phoneNums_s, @name)
or CONTAINS(c.names_s, @name)
or CONTAINS(c.xNums_s, @name))
order by c.callTime desc";
var queryIterator = container.GetItemQueryIterator<CallIndex>(new QueryDefinition(query)
.WithParameter("@ownerId", "62371255008")
.WithParameter("@name", "harr")
.WithParameter("@dateFrom", dateFrom) // 5/30/2020 5:00:00 AM +00:00
.WithParameter("@dateTo", dateTo) // 8/29/2020 4:59:59 AM +00:00
.WithParameter("@xnum", null), requestOptions: options, continuationToken: null);
if (queryIterator.HasMoreResults)
{
var feed = queryIterator.ReadNextAsync().Result;
model.calls = feed.ToList(); //feed.Resource has 25 items; feed.Count is 25;
model.CosmosContinuationToken = feed.ContinuationToken; //feed.ContinuationToken is populated, but it is considerably smaller than the token I received from the first request.
model.TotalRecords = feed.Count(); // 25
}
This is the exact query as before, but this time the feed gave me the results I expected. This has happened more than once, and continues to happen intermittently. What gives with this? Is this a bug in Azure Cosmos? If so, it seems like a serious bug that breaks the very core functionality of Cosmos (and databases in general).
Or, is this expected? Is it possible that in the first query, I need to continue to ReadNextAsync until I get some results back using the continuation token?
Any help is appreciated, as this is breaking very basic functionality in my app.
Also, I would like to add that the data returned from the query has not been newly added between the times of my first query attempt, and my second query attempt. That data has been there for a while.
Your code is correct; you are expected to drain the query, checking HasMoreResults as you go (although I would replace .Result with await to avoid a possible deadlock). In cross-partition queries, you can get an empty page if the first partitions checked have no matching results.
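Draining until a page's worth of items has been collected could be sketched as below. To keep the example self-contained, the SDK call is hidden behind a delegate: fetchPageAsync is a hypothetical function that takes a continuation token (null for the first request), performs one round-trip, and returns the items together with the next token (null when the query is exhausted). The types Page<T> and PagedQuery are assumptions for illustration, not part of the Cosmos SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical container for one round-trip's worth of results.
public sealed class Page<T>
{
    public Page(IReadOnlyList<T> items, string continuationToken)
    {
        Items = items;
        ContinuationToken = continuationToken;
    }

    public IReadOnlyList<T> Items { get; }
    public string ContinuationToken { get; }
}

public static class PagedQuery
{
    // Keeps fetching while there is a continuation token and fewer than pageSize items
    // have been collected, so an empty intermediate page is never mistaken for "no results".
    public static async Task<Page<T>> ReadUntilFilledAsync<T>(
        Func<string, Task<Page<T>>> fetchPageAsync,
        int pageSize,
        string continuationToken = null)
    {
        var items = new List<T>();
        do
        {
            Page<T> page = await fetchPageAsync(continuationToken);
            items.AddRange(page.Items);
            continuationToken = page.ContinuationToken;
        }
        while (continuationToken != null && items.Count < pageSize);

        // May hold more than pageSize items if the last round-trip overshot;
        // the returned token resumes after everything collected here.
        return new Page<T>(items, continuationToken);
    }
}
```

With the query iterator from the question, the delegate would presumably wrap a single ReadNextAsync call and return the response's items and ContinuationToken.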
Sometimes queries may have empty pages even when there are results on a future page. Reasons for this could be:
The SDK could be doing multiple network calls.
The query might be taking a long time to retrieve the documents.
Reference: https://learn.microsoft.com/azure/cosmos-db/troubleshoot-query-performance#common-sdk-issues
Try using the code below:
Query Cosmos DB method:
public async Task<DocDbQueryResult> QueryCollectionBaseWithPagingInternalAsync(FeedOptions feedOptions, string queryString, IDictionary<string, object> queryParams, string collectionName)
{
string continuationToken = feedOptions.RequestContinuation;
List<JObject> documents = new List<JObject>();
IDictionary<string, object> properties = new Dictionary<string, object>();
int executionCount = 0;
double requestCharge = default(double);
double totalRequestCharge = default(double);
do
{
feedOptions.RequestContinuation = continuationToken;
var query = this.documentDbClient.CreateDocumentQuery<JObject>(
UriFactory.CreateDocumentCollectionUri(this.databaseName, collectionName),
new SqlQuerySpec
{
QueryText = queryString,
Parameters = ToSqlQueryParamterCollection(queryParams),
},
feedOptions)
.AsDocumentQuery();
var response = await query.ExecuteNextAsync<JObject>().ConfigureAwait(false);
documents.AddRange(response.AsEnumerable());
executionCount++;
requestCharge = executionCount == 1 ? response.RequestCharge : requestCharge;
totalRequestCharge += response.RequestCharge;
continuationToken = response.ResponseContinuation;
}
while (!string.IsNullOrWhiteSpace(continuationToken) && documents.Count < feedOptions.MaxItemCount);
var pagedDocuments = documents.Take(feedOptions.MaxItemCount.Value);
var result = new DocDbQueryResult
{
ResultSet = new JArray(pagedDocuments),
TotalResults = Convert.ToInt32(pagedDocuments.Count()),
ContinuationToken = continuationToken
};
// if query params are not null, use existing query params also to be passed as properties.
if (queryParams != null)
{
properties = queryParams;
}
properties.Add("TotalRequestCharge", totalRequestCharge);
properties.Add("ExecutionCount", executionCount);
return result;
}
ToSqlQueryParamterCollection method:
private static SqlParameterCollection ToSqlQueryParamterCollection(IDictionary<string, object> queryParams)
{
var coll = new SqlParameterCollection();
if (queryParams != null)
{
foreach (var paramKey in queryParams.Keys)
{
coll.Add(new SqlParameter(paramKey, queryParams[paramKey]));
}
}
return coll;
}

C# Parallel.ForEach with shared function throws IndexOutOfRangeException

I need help solving a problem with a shared function in Parallel.ForEach. I get the error below; how can I change the function so that it is safe to use with threads?
public IEnumerable<Datamodel> LoadLibrary(IEnumerable<Items> items)
{
var allLibReferences = new List<LibraryReferenceModel>();
var baseData = LoadBaseLibData();
Parallel.ForEach(baseData, data =>
{
var item = items.ToList().FindAll(c => c.Name == data.Name);
CreateLibraryReference(allLibReferences, item, data.Name); // Problem to call function in Parallel.ForEach
});
return allLibReferences;
}
private static void CreateLibraryReference(ICollection<LibraryReferenceModel> allLibReferences,
IReadOnlyCollection<Item> item, string libraryName)
{
allLibReferences.Add(item.Count == 0
? new LibraryReferenceModel
{
LibName = libraryName,
HasReference = false,
References = item.Count
}
: new LibraryReferenceModel
{
LibName = libraryName,
HasReference = true,
References = item.Count
});
}
I get this exception: the index is outside the bounds of the array (IndexOutOfRangeException).
Thank you
As you've found, since multiple threads are attempting to add new items to the shared allLibReferences collection, you'll find erratic thread safety issues like the error you've described.
This is why it's really important to make your code thread safe before you consider parallelising it. One of the best techniques is to ensure that you rely on immutable code constructs, i.e. never try and change (mutate) the value of a shared variable during parallel code.
So I would change the way the code works, so that instead of sharing a collection, we project the items needed immutably, which can be safely parallelised (I've used .AsParallel, as it's simpler), and then collate the results and return them.
Furthermore, since the whole point of parallelism is to make code run as quickly as possible, you'll also want to remove inefficiencies such as materialising the same items in a list during each iteration (items.ToList()), and you'll also want to avoid O(N) iterations during a loop if possible - I've replaced .FindAll(c => c.Name == data.Name) with a pre-calculated dictionary.
Putting that all together, you'll wind up with something like this:
public IEnumerable<LibraryReferenceModel> LoadLibrary(IEnumerable<Item> items)
{
var keyedItems = items.GroupBy(i => i.Name)
.ToDictionary(grp => grp.Key, grp => grp.ToList());
var baseData = LoadBaseLibData();
var allLibReferences = baseData
.AsParallel()
.Select(data =>
{
// The dictionary is only read here, which is safe from multiple threads
if (keyedItems.TryGetValue(data.Name, out var matchedItems))
{
return ProjectLibraryReference(matchedItems, data.Name);
}
// No matches found
return new LibraryReferenceModel
{
LibName = data.Name,
HasReference = false,
References = 0
};
})
.ToList();
return allLibReferences;
}
private static LibraryReferenceModel ProjectLibraryReference(IReadOnlyCollection<Item> item,
string libraryName)
{
return new LibraryReferenceModel
{
LibName = libraryName,
HasReference = item.Count > 0,
References = item.Count
};
}
I've assumed that multiple items can have the same name, hence we're grouping before creating the Dictionary. Each library then projects to exactly one LibraryReferenceModel whose References value is the size of its group, which matches what the original CreateLibraryReference produced.
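If you would rather keep Parallel.ForEach, another option is to collect the results in a thread-safe collection such as ConcurrentBag<T>. The sketch below is self-contained and uses simplified, hypothetical types (LibraryReference, LibraryLoader, and plain strings for names) rather than the models from the question.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Simplified stand-in for LibraryReferenceModel.
public sealed class LibraryReference
{
    public string LibName { get; set; }
    public bool HasReference { get; set; }
    public int References { get; set; }
}

public static class LibraryLoader
{
    public static List<LibraryReference> CountReferences(
        IEnumerable<string> libraryNames,
        IEnumerable<string> itemNames)
    {
        // Built once up front; inside the loop the dictionary is only read.
        Dictionary<string, int> countsByName = itemNames
            .GroupBy(name => name)
            .ToDictionary(group => group.Key, group => group.Count());

        // ConcurrentBag allows Add from several threads at once, unlike List<T>.
        var results = new ConcurrentBag<LibraryReference>();
        Parallel.ForEach(libraryNames, libraryName =>
        {
            // count stays 0 when the name has no matching items.
            countsByName.TryGetValue(libraryName, out int count);
            results.Add(new LibraryReference
            {
                LibName = libraryName,
                HasReference = count > 0,
                References = count
            });
        });

        // A bag has no defined order, so sort for a deterministic result.
        return results.OrderBy(r => r.LibName, StringComparer.Ordinal).ToList();
    }
}
```

The PLINQ projection above avoids shared mutable state altogether, whereas this variant keeps the original loop shape at the cost of an explicit sort if order matters.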
