C# Parallel.ForEach with shared function throws IndexOutOfRangeException


I need help solving a problem with a shared function called inside Parallel.ForEach. I get the error shown below. How can I change the function so that it is safe to use from multiple threads?
public IEnumerable<LibraryReferenceModel> LoadLibrary(IEnumerable<Item> items)
{
var allLibReferences = new List<LibraryReferenceModel>();
var baseData = LoadBaseLibData();
Parallel.ForEach(baseData, data =>
{
var item = items.ToList().FindAll(c => c.Name == data.Name);
CreateLibraryReference(allLibReferences, item, data.Name); // Calling this from Parallel.ForEach is the problem
});
return allLibReferences;
}
private static void CreateLibraryReference(ICollection<LibraryReferenceModel> allLibReferences,
IReadOnlyCollection<Item> item, string libraryName)
{
allLibReferences.Add(item.Count == 0
? new LibraryReferenceModel
{
LibName = libraryName,
HasReference = false,
References = item.Count
}
: new LibraryReferenceModel
{
LibName = libraryName,
HasReference = true,
References = item.Count
});
}
I get this exception (the index was outside the bounds of the array):
Thank you

As you've found, because multiple threads add new items to the shared allLibReferences list at the same time, you get erratic thread-safety failures like the error you've described. List<T> is not thread-safe: if two Add calls race while the list is growing its internal array, one of them can write past the end of that array, which surfaces as an IndexOutOfRangeException (or as silently lost items).
This is why it's really important to make your code thread-safe before you consider parallelising it. One of the best techniques is to rely on immutable code constructs, i.e. never try to change (mutate) the value of a shared variable inside parallel code.
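For comparison, here is a minimal sketch of the two usual ways to make a loop like the original Parallel.ForEach safe without restructuring it: serializing access to the shared List<T> with a lock, or collecting into a ConcurrentBag<T>. These are swapped-in alternatives, not the approach this answer goes on to recommend; the string data and the ToUpperInvariant call are hypothetical stand-ins for the real per-item work.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical input standing in for baseData
var names = Enumerable.Range(0, 10_000).Select(i => $"lib{i}").ToList();

// Option 1: keep List<T>, but serialize every Add with a lock
var guarded = new List<string>();
var gate = new object();
Parallel.ForEach(names, name =>
{
    var entry = name.ToUpperInvariant(); // per-item work runs in parallel, outside the lock
    lock (gate)
    {
        guarded.Add(entry); // only one thread at a time touches the list
    }
});

// Option 2: collect into a collection designed for concurrent adds
var bag = new ConcurrentBag<string>();
Parallel.ForEach(names, name => bag.Add(name.ToUpperInvariant()));

Console.WriteLine($"lock: {guarded.Count}, bag: {bag.Count}"); // lock: 10000, bag: 10000
```

Neither option preserves the input order, and both still funnel every result through one shared structure, which is why the projection approach below scales better.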
So I would change the way the code works: instead of sharing a collection, project the items needed immutably, which can be safely parallelised (I've used .AsParallel(), as it's simpler), and then collect the results and return them.
Furthermore, since the whole point of parallelism is to make code run as quickly as possible, you'll also want to remove inefficiencies such as materialising the same items in a list during each iteration (items.ToList()), and you'll want to avoid an O(N) scan inside every iteration if possible; I've replaced .FindAll(c => c.Name == data.Name) with a pre-calculated dictionary.
Putting that all together, you'll wind up with something like this:
public IEnumerable<LibraryReferenceModel> LoadLibrary(IEnumerable<Item> items)
{
    // Build the lookup once, before the parallel part: name -> all items with that name
    var keyedItems = items.GroupBy(i => i.Name)
        .ToDictionary(grp => grp.Key, grp => grp.ToList());
    var baseData = LoadBaseLibData();
    var allLibReferences = baseData
        .AsParallel()
        .Select(data =>
        {
            // Many threads may read the dictionary at once, as long as nobody modifies it
            if (keyedItems.TryGetValue(data.Name, out var matchedItems))
            {
                return ProjectLibraryReference(matchedItems, data.Name);
            }
            // No matches found
            return new LibraryReferenceModel
            {
                LibName = data.Name,
                HasReference = false,
                References = 0
            };
        })
        .ToList();
    return allLibReferences;
}
private static LibraryReferenceModel ProjectLibraryReference(IReadOnlyCollection<Item> item,
    string libraryName)
{
    return new LibraryReferenceModel
    {
        LibName = libraryName,
        HasReference = item.Count > 0,
        References = item.Count
    };
}
I've assumed that multiple items can have the same name, hence we're grouping before creating the Dictionary. Each library then produces exactly one LibraryReferenceModel, whose References is the number of matching items, as in your original loop. Because every iteration returns a new object instead of adding to a shared list, there is nothing to lock. PLINQ does not preserve the input order by default, so add .AsOrdered() after .AsParallel() if the results must follow the order of baseData.
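A self-contained sketch of the same pattern, with hypothetical string data in place of Item and LoadBaseLibData(), and a named tuple in place of LibraryReferenceModel: group once up front, then let each parallel worker only read the lookup and return a new value. .AsOrdered() is included so the output lines up with the input.

```csharp
using System;
using System.Linq;

// Hypothetical stand-ins for `items` and LoadBaseLibData()
var itemNames = new[] { "Json", "Json", "Logging" };
var libraryNames = new[] { "Json", "Logging", "Http" };

// Built once, before the parallel part: name -> number of items with that name
var countsByName = itemNames
    .GroupBy(name => name)
    .ToDictionary(g => g.Key, g => g.Count());

// Each worker only reads countsByName and returns a new tuple;
// nothing shared is mutated inside the query.
var references = libraryNames
    .AsParallel()
    .AsOrdered() // keep results in the same order as libraryNames
    .Select(lib =>
    {
        countsByName.TryGetValue(lib, out var count); // count stays 0 when the key is missing
        return (LibName: lib, HasReference: count > 0, References: count);
    })
    .ToList();

foreach (var r in references)
    Console.WriteLine($"{r.LibName}: HasReference={r.HasReference}, References={r.References}");
```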

Related

Merge data from two arrays or something else

How can I combine the IDs from the list I read from the file /test.json with the IDs in ourOrders[i].id?
Or is there another way?
private RegionModel FilterByOurOrders(RegionModel region, List<OurOrderModel> ourOrders, MarketSettings market, bool byOurOrders)
{
var result = new RegionModel
{
updatedTs = region.updatedTs,
orders = new List<OrderModel>(region.orders.Count)
};
var json = File.ReadAllText("/test.json");
var otherBotOrders = JsonSerializer.Deserialize<OrdersTimesModel>(json);
OtherBotOrders = new Dictionary<string, OrderTimesInfoModel>();
foreach (var otherBotOrder in otherBotOrders.OrdersTimesInfo)
{
//OtherBotOrders.Add(otherBotOrder.Id, otherBotOrder);
BotController.WriteLine($"{otherBotOrder.Id}"); //Output ID orders to the console works
}
foreach (var order in region.orders)
{
if (ConvertToDecimal(order.price) < 1 || !byOurOrders)
{
int i = 0;
var isOurOrder = false;
while (i < ourOrders.Count && !isOurOrder)
{
if (ourOrders[i].id.Equals(order.id, StringComparison.InvariantCultureIgnoreCase))
{
isOurOrder = true;
}
++i;
}
if (!isOurOrder)
{
result.orders.Add(order);
}
}
}
return result;
}
OrdersTimesModel looks like this:
public class OrdersTimesModel
{
public List<OrderTimesInfoModel> OrdersTimesInfo { get; set; }
}
test.json:
{"OrdersTimesInfo":[{"Id":"1"},{"Id":"2"}]}
Added:
I'll try to clarify the question:
There are three lists of IDs:
First (all orders): region.orders, as order.id
Second (our orders): ourOrders, as ourOrders[i].id in a while loop
Third (our orders 2): from the /test.json file, as an array {"Orders":[{"Id":"12345..."...},{"Id":"12345..." ...}...]}
There is a foreach containing a while loop that compares the First (all orders) list with the Second (our orders) list. If the IDs match, then these are our orders: isOurOrder = true;
Accordingly, the orders with isOurOrder = false; are added to the result: result.orders.Add(order)
I need:
The check if (ourOrders[i].id.Equals(order.id, StringComparison.InvariantCultureIgnoreCase)) to also include the IDs from the Third (our orders 2) list.
Or is there another way to do it?

You should be able to avoid writing loops entirely if you use LINQ (there will still be loops running in the background, but the code is much easier to read).
You can find some documentation here: https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/linq/introduction-to-linq-queries
and there are some very useful extension methods for sequences such as arrays and lists: https://learn.microsoft.com/en-us/dotnet/api/system.linq.enumerable?view=net-6.0 (these are great for keeping your code easy to read)
Solution
using System.Linq;
private RegionModel FilterByOurOrders(RegionModel region, List<OurOrderModel> ourOrders, MarketSettings market, bool byOurOrders)
{
var result = new RegionModel
{
updatedTs = region.updatedTs,
orders = new List<OrderModel>(region.orders.Count)
};
var json = File.ReadAllText("/test.json");
var otherBotOrders = JsonSerializer.Deserialize<OrdersTimesModel>(json);
// This gets JUST the ids from the JSON file
var idsFromJsonFile = otherBotOrders.OrdersTimesInfo.Select(x => x.Id);
// Here you'll get the ids of your own orders
var idsFromOurOrders = ourOrders.Select(x => x.id);
// Union only keeps unique values, so you avoid repetition.
// The comparer keeps your original case-insensitive matching, and the
// HashSet makes each lookup below fast instead of re-running the Union.
var mergedIds = idsFromJsonFile
    .Union(idsFromOurOrders, StringComparer.InvariantCultureIgnoreCase)
    .ToHashSet(StringComparer.InvariantCultureIgnoreCase);
// Now we just need to query the region orders:
// keep every order whose id is NOT in either list
var filteredRegionOrders = region.orders.Where(x => !mergedIds.Contains(x.id));
result.orders.AddRange(filteredRegionOrders);
return result;
}
Note that, unlike your original loop, this doesn't apply the price check or the byOurOrders flag; you can add those conditions to the Where (or to any of the other steps). Of course, you can also do this without assigning so many variables; I did it that way just to make it easier to explain.
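A self-contained sketch of the merge-and-filter step, using plain string lists as hypothetical stand-ins for region.orders, ourOrders and the ids from /test.json. Passing StringComparer.InvariantCultureIgnoreCase to both Union and the HashSet keeps the question's case-insensitive comparison.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical ids standing in for region.orders, ourOrders and /test.json
var regionOrderIds = new List<string> { "1", "2", "A7", "99" };
var ourOrderIds = new List<string> { "a7" };
var jsonOrderIds = new List<string> { "1", "2" };

// Merge our two id sources; both the Union and the set ignore case,
// matching the question's InvariantCultureIgnoreCase comparison.
var ourIds = jsonOrderIds
    .Union(ourOrderIds, StringComparer.InvariantCultureIgnoreCase)
    .ToHashSet(StringComparer.InvariantCultureIgnoreCase);

// Keep only the orders that are not ours
var notOurs = regionOrderIds.Where(id => !ourIds.Contains(id)).ToList();

Console.WriteLine(string.Join(",", notOurs)); // 99
```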

Fastest equivalent of comparing all elements of array

What is the fastest way in C#/LINQ to compare every combination of elements in an array, as below, and add each matching element to a bucket if it is not already there? In other words, how could I optimize this piece of code in C#?
// pseudocode
List<T> elements = { .... }
HashSet<T> bucket = {}
foreach (T element in elements)
{
foreach (var myelement in elements.Where(e => e.id != element.id))
{
if (!element.notInTheList)
{
_elementIsTheSame = element.Equals(myelement);
if (_elementIsTheSame)
{
// append element to a bucket
if (!elementIsInTheBucket(bucket, element))
{
element.notInTheList = true;
addToBucket(bucket, element);
}
}
}
}
}
// takes about 150ms on a fast workstation with only 300 elements in the LIST!
The final order of the elements in the bucket is important

elements.GroupBy(x=>x).SelectMany(x=>x);
https://dotnetfiddle.net/yZ9JDp
This works because GroupBy preserves order: groups come out in the order their keys first appear, and the elements within each group keep their original order.
Note that this puts the first element of each equivalence class first. Your code puts the first element last, and skips classes with just a single element.
Skipping the classes with just a single element can be done with a Where before the SelectMany.
elements.GroupBy(x=>x).Where(x=>x.Skip(1).Any()).SelectMany(x=>x);
Getting the first element last is a bit more tricky, but I suspect it's a bug in your code so I will not try to write it out.
Depending on how you use the result you might want to throw a ToList() at the end.
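A quick check of the ordering behaviour described above, using strings (which already implement Equals and GetHashCode). For your own element type, GroupBy(x => x) needs Equals and GetHashCode overridden consistently; otherwise distinct instances end up in separate groups.

```csharp
using System;
using System.Linq;

var elements = new[] { "b", "a", "b", "c", "a", "b" };

// Groups come out in order of each key's first appearance (b, a, c),
// and each group keeps the elements' original relative order.
var grouped = elements.GroupBy(x => x).SelectMany(g => g).ToList();

// Same, but drop equivalence classes that have only one element ("c").
var duplicatesOnly = elements
    .GroupBy(x => x)
    .Where(g => g.Skip(1).Any())
    .SelectMany(g => g)
    .ToList();

Console.WriteLine(string.Join(",", grouped));        // b,b,b,a,a,c
Console.WriteLine(string.Join(",", duplicatesOnly)); // b,b,b,a,a
```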

It sounds like you're effectively after DistinctBy (built into LINQ as Enumerable.DistinctBy since .NET 6), which can be simulated with something like:
var list = new List<MainType>();
var known = new HashSet<PropertyType>();
foreach (var item in source)
{
if (known.Add(item.TheProperty))
list.Add(item);
}
You now have a list that keeps only the first item for each value of the selected property, preserving order.
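A generic, self-contained version of the loop above. DistinctByKey is a hypothetical name chosen for this sketch, and the tuples stand in for MainType and TheProperty.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Generic form of the HashSet loop (DistinctByKey is a hypothetical helper name):
// keeps the first item for each key, in source order.
static List<T> DistinctByKey<T, TKey>(IEnumerable<T> source, Func<T, TKey> keySelector)
{
    var list = new List<T>();
    var known = new HashSet<TKey>();
    foreach (var item in source)
    {
        // HashSet<T>.Add returns false if the key has already been seen
        if (known.Add(keySelector(item)))
            list.Add(item);
    }
    return list;
}

var people = new[] { (Id: 1, Name: "Ann"), (Id: 2, Name: "Bob"), (Id: 1, Name: "Ann again") };
var firstPerId = DistinctByKey(people, p => p.Id);

Console.WriteLine(string.Join(", ", firstPerId.Select(p => p.Name))); // Ann, Bob
```

On .NET 6 or later, people.DistinctBy(p => p.Id) gives the same first-wins result without a helper.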

If the intent is to find the fastest solution, as the Stack Overflow title suggests, then I would consider using C#'s Parallel.ForEach to perform a map and reduce.
For example:
var resultsCache = new IRecord[_allRecords.Length];
var resultsCount = 0;
var parallelOptions = new ParallelOptions
{
MaxDegreeOfParallelism = Environment.ProcessorCount // use an appropriate value; 1 would make the loop sequential
};
Parallel.ForEach(
_allRecords,
parallelOptions,
// Part1: initialize thread local storage
() => { return new FilterMapReduceState(); },
// Part2: define task to perform
(record, parallelLoopState, index, results) =>
{
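if (_abortFilterOperation)
{
// Stop() ends the loop as soon as possible (Break() would still run all earlier iterations)
parallelLoopState.Stop();
return results; // skip the rest of this iteration
}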
if (strategy.CanKeep(record))
{
resultsCache[index] = record;
results.Count++;
}
else
{
resultsCache[index] = null;
}
return results;
},
// Part3: merge the results
(results) =>
{
Interlocked.Add(ref resultsCount, results.Count);
}
);
where
class FilterMapReduceState
{
public FilterMapReduceState()
{
this.Count = 0;
}
/// <summary>
/// Represents the number of records that meet the search criteria.
/// </summary>
internal int Count { get; set; }
}
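A self-contained version of the same localInit / body / localFinally pattern, with int records and a hypothetical `record % 3 == 0` test standing in for strategy.CanKeep(record). Each worker counts into its own local value, and the counts are merged once per worker with Interlocked.Add, so no lock is taken per item. Writing to resultsCache[index] is safe because each index is written by exactly one iteration.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var records = Enumerable.Range(0, 1000).ToArray();
var resultsCache = new int?[records.Length];
var resultsCount = 0;

Parallel.ForEach(
    records,
    new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount },
    // localInit: one counter per worker
    () => 0,
    // body ("map"): each index is written by exactly one iteration
    (record, loopState, index, localCount) =>
    {
        if (record % 3 == 0) // hypothetical stand-in for strategy.CanKeep(record)
        {
            resultsCache[index] = record;
            return localCount + 1;
        }
        resultsCache[index] = null;
        return localCount;
    },
    // localFinally ("reduce"): merge each worker's counter once
    localCount => Interlocked.Add(ref resultsCount, localCount));

// The kept records, still in their original order
var kept = resultsCache.Where(r => r.HasValue).Select(r => r.Value).ToList();
Console.WriteLine($"{resultsCount} records kept");
```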

As I understand what you did, these are the pieces you need:
// ids that occur more than once
var multipleIds = elements.GroupBy(x => x.id)
    .Where(g => g.Count() > 1)
    .Select(g => g.Key);
// one element per id (its first occurrence), in original order
var distinctElements = elements.GroupBy(x => x.id)
    .Select(g => g.First());

Comparing attributes from a List inside a List

Description
My goal is to check the languages of each menu object in the menuList. Since each menu stores the languages it offers as another list, this is a bit more complicated. I tried creating a new object with the same values so that I could use menuList.Languages.Contains(languageObject), but I quickly found out that this doesn't work. I also tried a for loop inside a for loop, which didn't work either, though that could have been my mistake.
Obviously I can't write something like: MenuList.Languages.Name.Equals("English").
So I am looking for a way to check whether the Name property of any entry in the Languages list inside the menuList equals a value of my choice.
The Object
private LanguageBox LangEng = new LanguageBox
{
IsoCode = "eng",
Name = "English"
};
The List
var MenuList = menuDataClient.GetMenuByCity(city)
.Select(nap => new MenuBox()
{
Menu = nap.Menu,
Languages = nap.Languages
.Select(lang => new LanguageBox()
{
IsoCode = lang.IsoCode,
Name = lang.Name
}).ToList()
})
.ToList();
The Loop
for (int i = 0; i < MenuList.Count; i++)
{
if (MenuList[i].Languages.Contains(LangEng))
{
System.Console.WriteLine("Success");
}
}

Maybe LINQ's Where could do the trick? Something like:
foreach(var item in MenuList)
{
var x = item.Languages.Where(obj => obj.Name == <desired language>);
if (x.Any()) // Any() stops at the first match instead of counting them all
{
//Success code
break;
}
}

I have found a solution. This LINQ option works if you only want to keep the menus whose Languages list contains English or Russian.
Solution
.Where(menu => menu.Languages.Any(lang => lang.Name.Equals("English") || lang.Name.Equals("Russian")))
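Why the original Contains attempt fails: List<T>.Contains uses the default equality comparer, and for a class that neither overrides Equals nor implements IEquatable<T> (assumed here for LanguageBox, whose definition isn't shown) that means reference equality, so a newly created LanguageBox never matches an element already in the list. Comparing the Name property with Any avoids that. Below is a self-contained sketch with hypothetical menus; tuples stand in for MenuBox and LanguageBox purely as data holders.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical menus, each with its list of (IsoCode, Name) languages
var menus = new List<(string Menu, List<(string IsoCode, string Name)> Languages)>
{
    ("Lunch", new List<(string IsoCode, string Name)> { ("eng", "English"), ("deu", "German") }),
    ("Dinner", new List<(string IsoCode, string Name)> { ("rus", "Russian") }),
    ("Brunch", new List<(string IsoCode, string Name)> { ("fra", "French") }),
};

var wanted = new HashSet<string> { "English", "Russian" };

// Keep a menu if ANY of its languages has a wanted Name
var matching = menus
    .Where(menu => menu.Languages.Any(lang => wanted.Contains(lang.Name)))
    .Select(menu => menu.Menu)
    .ToList();

Console.WriteLine(string.Join(", ", matching)); // Lunch, Dinner
```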

How to batch retrieve entities?

In Azure Table storage, how can I query for a set of entities that match specific row keys in a partition?
I'm using Azure table storage and need to retrieve a set of entities that match a set of row keys within the partition.
Basically if this were SQL it may look something like this:
SELECT *
FROM TableName WHERE SomeKey IN (1, 2, 3, 4, 5);
To save on costs and avoid doing a bunch of separate table retrieve operations, I figured I could use a table batch operation instead. For some reason I'm getting an exception that says:
"A batch transaction with a retrieve operation cannot contain any other operations"
Here is my code:
public async Task<IList<MyDomainEntity>> GetDomainEntitiesAsync(int someId, IList<Guid> entityIds)
{
try
{
var client = _storageAccount.CreateCloudTableClient();
var table = client.GetTableReference("SomeTable");
var batchOperation = new TableBatchOperation();
var counter = 0;
var myDomainEntities = new List<MyDomainEntity>();
foreach (var id in entityIds)
{
if (counter < 100)
{
batchOperation.Add(TableOperation.Retrieve<MyDomainEntityTableEntity>(someId.ToString(CultureInfo.InvariantCulture), id.ToString()));
++counter;
}
else
{
var batchResults = await table.ExecuteBatchAsync(batchOperation);
var batchResultEntities = batchResults.Select(o => ((MyDomainEntityTableEntity)o.Result).ToMyDomainEntity()).ToList();
myDomainEntities.AddRange(batchResultEntities);
batchOperation.Clear();
counter = 0;
}
}
return myDomainEntities;
}
catch (Exception ex)
{
_logger.Error(ex);
throw;
}
}
How can I achieve what I'm after without manually looping through the set of row keys and doing an individual Retrieve table operation for each one? I don't want to incur the cost associated with doing this since I could have hundreds of row keys that I want to filter on.

I made a helper method to do it in a single request per partition.
Use it like this:
var items = table.RetrieveMany<MyDomainEntityTableEntity>(partitionKey, nameof(TableEntity.RowKey),
rowKeysList, columnsToSelect);
Here are the helper methods:
public static List<T> RetrieveMany<T>(this CloudTable table, string partitionKey,
string propertyName, IEnumerable<string> valuesRange,
List<string> columnsToSelect = null)
where T : TableEntity, new()
{
var entities = table.ExecuteQuery(new TableQuery<T>()
.Where(TableQuery.CombineFilters(
TableQuery.GenerateFilterCondition(
nameof(TableEntity.PartitionKey),
QueryComparisons.Equal,
partitionKey),
TableOperators.And,
GenerateIsInRangeFilter(
propertyName,
valuesRange)
))
.Select(columnsToSelect))
.ToList();
return entities;
}
public static string GenerateIsInRangeFilter(string propertyName,
IEnumerable<string> valuesRange)
{
string finalFilter = (valuesRange ?? throw new ArgumentNullException(nameof(valuesRange)))
.Distinct()
.Aggregate((string)null, (filterSeed, value) =>
{
string equalsFilter = TableQuery.GenerateFilterCondition(
propertyName,
QueryComparisons.Equal,
value);
return filterSeed == null ?
equalsFilter :
TableQuery.CombineFilters(filterSeed,
TableOperators.Or,
equalsFilter);
});
return finalFilter ?? "";
}
I have tested it with fewer than 100 values in rowKeysList; if it throws an exception with more values, you can always split the request into parts.
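The "split the request into parts" idea can be sketched without the storage SDK by building one OData filter per chunk of row keys. BuildRowKeyFilter is a hypothetical hand-written helper (not an SDK method) that produces an equivalent PartitionKey-and-RowKey-or-chain filter; it assumes .NET 6+ for Enumerable.Chunk, and the chunk size of 2 is only for the demo, so choose a size your service accepts.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: builds an OData $filter string by hand (not the Azure SDK)
static string BuildRowKeyFilter(string partitionKey, IEnumerable<string> rowKeys)
{
    // OData string literals escape a single quote by doubling it
    static string Quote(string s) => "'" + s.Replace("'", "''") + "'";
    var orChain = string.Join(" or ", rowKeys.Distinct().Select(k => $"(RowKey eq {Quote(k)})"));
    return $"(PartitionKey eq {Quote(partitionKey)}) and ({orChain})";
}

var rowKeys = Enumerable.Range(1, 5).Select(i => $"key{i}").ToList();

// One filter (and therefore one query) per chunk of keys
var filters = rowKeys.Chunk(2).Select(part => BuildRowKeyFilter("p1", part)).ToList();
foreach (var f in filters)
    Console.WriteLine(f);
```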

With hundreds of row keys, that rules out using $filter with a list of row keys (which would result in partial partition scan anyway).
With the error you're getting, it seems like the batch contains both queries and other types of operations (which isn't permitted). I don't see why you're getting that error, from your code snippet.
Your only other option is to execute individual queries. You can do these asynchronously though, so you wouldn't have to wait for each to return. Table storage provides upwards of 2,000 transactions / sec on a given partition, so it's a viable solution.

Not sure how I missed this in the first place, but here is a snippet from the MSDN documentation for the TableBatchOperation type:
A batch operation may contain up to 100 individual table operations, with the requirement that each operation entity must have same partition key. A batch with a retrieve operation cannot contain any other operations. Note that the total payload of a batch operation is limited to 4MB.
I ended up executing individual retrieve operations asynchronously as suggested by David Makogon.
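A minimal sketch of that approach: issue the individual retrieve operations concurrently and await them together. RetrieveAsync is a hypothetical placeholder that only simulates latency (swap in your real single-entity read), and the SemaphoreSlim cap of 8 concurrent requests is an arbitrary assumption, not a Table storage limit.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical placeholder for one point read (PartitionKey + RowKey).
// It only simulates latency; rows whose key ends in "3" are treated as missing.
static async Task<string> RetrieveAsync(string partitionKey, string rowKey)
{
    await Task.Delay(10);
    return rowKey.EndsWith("3") ? null : $"{partitionKey}/{rowKey}";
}

var rowKeys = Enumerable.Range(1, 20).Select(i => i.ToString()).ToList();

// Cap how many requests are in flight at once (8 is an arbitrary choice)
using var throttle = new SemaphoreSlim(8);

var tasks = rowKeys.Select(async rowKey =>
{
    await throttle.WaitAsync();
    try
    {
        return await RetrieveAsync("p1", rowKey);
    }
    finally
    {
        throttle.Release();
    }
});

// WhenAll returns the results in the same order as rowKeys
var results = await Task.WhenAll(tasks);
var found = results.Where(r => r != null).ToList();
Console.WriteLine($"{found.Count} of {results.Length} found"); // 18 of 20 found
```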

I made my own makeshift link table. I know it's not that efficient (maybe it's fine), but I only make this request if the data is not cached locally, which only happens when switching devices. Anyway, this seems to work. Comparing the lengths of the two arrays lets me defer context.done();
var query = new azure.TableQuery()
.top(1000)
.where('PartitionKey eq ?', 'link-' + req.query.email.toLowerCase() );
tableSvc.queryEntities('linkUserMarker',query, null, function(error, result, response) {
if( !error && result ){
var markers = [];
result.entries.forEach(function(e){
tableSvc.retrieveEntity('markerTable', e.markerPartition._, e.RowKey._.toString() , function(error, marker, response){
markers.push( marker );
if( markers.length == result.entries.length ){
context.res = {
status:200,
body:{
status:'ok',
markers: markers
}
};
context.done();
}
});
});
} else {
notFound(error);
}
});

I saw your post while looking for a solution; in my case I needed to look up multiple IDs at the same time.
Because the Table service doesn't support Contains in LINQ queries (https://learn.microsoft.com/en-us/rest/api/storageservices/query-operators-supported-for-the-table-service), I just built a big chain of Or-ed equality checks.
It seems to be working for me so far; I hope it helps someone.
public async Task<ResponseModel<ICollection<TAppModel>>> ExecuteAsync(
ICollection<Guid> ids,
CancellationToken cancellationToken = default
)
{
if (!ids.Any())
throw new ArgumentOutOfRangeException();
// https://learn.microsoft.com/en-us/rest/api/storageservices/query-operators-supported-for-the-table-service
// Contains is not supported, so build one big chain of Or-ed equality checks instead
var item = Expression.Parameter(typeof(TTableModel), typeof(TTableModel).FullName);
var expressions = ids
    .Select(
        id => Expression.Equal(
            Expression.Constant(id.ToString()),
            // Reuse the lambda's own parameter `item`; a new Parameter per id
            // would leave those parameters unbound in the final lambda
            Expression.MakeMemberAccess(
                item,
                typeof(TTableModel).GetProperty(nameof(ITableEntity.RowKey))
            )
        )
    )
    .ToList();
var builderExpression = expressions.First();
builderExpression = expressions
    .Skip(1)
    .Aggregate(
        builderExpression,
        Expression.OrElse // the expression form of ||; Expression.Or is the non-short-circuiting |
    );
var finalExpression = Expression.Lambda<Func<TTableModel, bool>>(builderExpression, item);
var result = await _azureTableService.FindAsync(
finalExpression,
cancellationToken
);
return new(
result.Data?.Select(_ => _mapper.Map<TAppModel>(_)).ToList(),
result.Succeeded,
result.User,
result.Messages.ToArray()
);
}
public async Task<ResponseModel<ICollection<TTableEntity>>> FindAsync(
Expression<Func<TTableEntity,bool>> filter,
CancellationToken ct = default
)
{
try
{
var queryResultsFilter = _tableClient.QueryAsync<TTableEntity>(
FilterExpressionTree(filter),
cancellationToken: ct
);
var items = new List<TTableEntity>();
await foreach (TTableEntity qEntity in queryResultsFilter)
items.Add(qEntity);
return new ResponseModel<ICollection<TTableEntity>>(items);
}
catch (Exception exception)
{
_logger.Error(
nameof(FindAsync),
exception,
exception.Message
);
// OBFUSCATE
// TODO PASS ERROR ID
throw new Exception();
}
}
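A small self-contained sketch of the Or-chain technique with System.Linq.Expressions, compiled and run in memory rather than sent to Table storage. KeyValuePair<string, int>.Key stands in for the entity's RowKey property (an assumption for the demo). Every comparison uses the same ParameterExpression that is passed to Expression.Lambda; compiling a lambda whose body refers to a different parameter object fails with an InvalidOperationException.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

var ids = new[] { "a", "c" };

// ONE parameter shared by every comparison and by the final lambda: x
var x = Expression.Parameter(typeof(KeyValuePair<string, int>), "x");
// x.Key plays the role of the RowKey property here
var key = Expression.Property(x, nameof(KeyValuePair<string, int>.Key));

// Builds: x.Key == "a" || x.Key == "c"
var body = ids
    .Select(id => (Expression)Expression.Equal(key, Expression.Constant(id)))
    .Aggregate(Expression.OrElse);

var predicate = Expression.Lambda<Func<KeyValuePair<string, int>, bool>>(body, x);

var rows = new Dictionary<string, int> { ["a"] = 1, ["b"] = 2, ["c"] = 3 };
var matched = rows
    .Where(predicate.Compile())
    .Select(row => row.Value)
    .OrderBy(v => v)
    .ToList();

Console.WriteLine(predicate);
Console.WriteLine(string.Join(",", matched)); // 1,3
```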

c# linq list with varying where conditions

private void getOrders()
{
    //headerFileReader is assigned with a CSV file (not shown here).
    while (!headerFileReader.EndOfStream)
    {
        headerRow = headerFileReader.ReadLine();
        getOrderItems(headerRow.Substring(0, 8));
    }
}
private void getOrderItems(string ordNum)
{
// lines is an array assigned with a CSV file...not shown here.
var sorted = lines.Skip(1).Select(line =>
new
{
SortKey = (line.Split(delimiter)[1]),
Line = line
})
.OrderBy(x => x.SortKey)
.Where(x => x.SortKey == ordNum);
//Note ordNum is different every time when it is passed.
foreach (var orderItems in sorted) {
//Process each line here.
}
}
Above is my code. For every order number in the header file, I process the matching detail lines, and I would like to search only the lines for that order number. The logic above works, but it re-splits, filters and sorts all the detail lines for every order number, which is unnecessary and slows the process down.
Basically I want getOrderItems to look something like the code below, but I can't get it to work because sorted can't be passed in. I think it should be possible?
private void getOrderItems(string ordNum)
{
// I would like to have sorted uploaded with data elsewhere and I pass it this function and reference it by other means but I am not able to get it.
var newSorted = sorted.Where(x => x.SortKey == docNum);
foreach (var orderItems in newSorted) {
//Process each line here.
}
}
Please suggest.
UPDATE: Thanks for the responses and improvements, but my main point is that I don't want to create the list every time (as my code does now). I want to create the list once, and then only search within it for a particular value (here docNum, as shown). Please suggest.

It might be a good idea to preprocess your input lines and build a dictionary, where each distinct sort key maps to a list of lines. Building the dictionary is O(n), and after that you get constant time O(1) lookups:
// these are your unprocessed file lines
private string[] lines;
// this dictionary will map each `string` key to a `List<string>`
private Dictionary<string, List<string>> groupedLines;
// this is the method where you are loading your files (you didn't include it)
void PreprocessInputData()
{
// you already have this part somewhere
lines = LoadLinesFromCsv();
// after loading, group the lines by `line.Split(delimiter)[1]`
groupedLines = lines
.Skip(1)
.GroupBy(line => line.Split(delimiter)[1])
.ToDictionary(x => x.Key, x => x.ToList());
}
private void ProcessOrders()
{
while (!headerFileReader.EndOfStream)
{
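var headerRow = headerFileReader.ReadLine();
// the order number is the first 8 characters of the header row,
// as in getOrderItems(headerRow.Substring(0, 8)) in the question
var orderNumber = headerRow.Substring(0, 8);
if (groupedLines.TryGetValue(orderNumber, out List<string> itemsToProcess))
{
// if you are here, then
// itemsToProcess contains all lines where
// (line.Split(delimiter)[1]) == orderNumber
}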
else
{
// no such key in the dictionary
}
}
}
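A runnable version of the dictionary idea, with a hypothetical in-memory CSV in place of the real files (assumed layout: line number, order number, product). The dictionary is built once; each header row then costs one lookup.

```csharp
using System;
using System.Linq;

const char delimiter = ',';

// Hypothetical detail file: a header line, then "lineNo,orderNumber,product"
string[] lines =
{
    "line,order,product",
    "1,ORD00002,apple",
    "2,ORD00001,pear",
    "3,ORD00002,plum",
};

// Built once: order number -> all detail lines for that order, in file order
var groupedLines = lines
    .Skip(1)
    .GroupBy(line => line.Split(delimiter)[1])
    .ToDictionary(g => g.Key, g => g.ToList());

// Each header row then costs a single dictionary lookup
foreach (var orderNumber in new[] { "ORD00001", "ORD00002", "ORD00003" })
{
    if (groupedLines.TryGetValue(orderNumber, out var items))
        Console.WriteLine($"{orderNumber}: {items.Count} line(s)");
    else
        Console.WriteLine($"{orderNumber}: no lines");
}
```

If you'd rather not handle the missing-key case, lines.Skip(1).ToLookup(line => line.Split(delimiter)[1]) builds an ILookup whose indexer returns an empty sequence for unknown keys.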

The following does what you want and is also more efficient, because it filters the lines before projecting and sorting them:
var sorted = lines.Skip(1)
.Where(line => (line.Split(delimiter)[1] == ordNum))
.Select(
line =>
new
{
SortKey = (line.Split(delimiter)[1]),
Line = line
}
)
.OrderBy(x => x.SortKey); // every remaining line has the same SortKey, so this no longer changes the order
