Best data structure for caching objects with a composite unique id - c#

I have a slow function that makes an expensive trip to the server to retrieve RecordHdr objects. These objects are sorted by rid first and then by aid. They are then returned in batches of 5.
| rid | aid |
-------------->
| 1 | 1 | >
| 1 | 3 | >
| 1 | 5 | > BATCH of 5 returned
| 1 | 6 | >
| 2 | 2 | >
-------------->
| 2 | 3 |
| 2 | 4 |
| 3 | 1 |
| 3 | 2 |
| 3 | 5 |
| 3 | 6 |
| 4 | 1 |
| 4 | 2 |
| 4 | 5 |
| 4 | 6 |
After I retrieve the objects, I have to wrap them in another class called WrappedRecordHdr. I'm wondering what is the best data structure I can use to maintain a cache of WrappedRecordHdr objects such that if I'm asked for an object by rid and aid, I return a particular object for it. Also if I'm asked for the rid, I should return all objects that have that rid.
So far I have created two structures, one for each scenario (this may not be the best way, but it's what I'm using for now):
// key: (rid, aid)
private CacheMap<int, int, WrappedRecordHdr> m_ridAidCache =
    new CacheMap<int, int, WrappedRecordHdr>();
// key: (rid)
private CacheMap<int, WrappedRecordHdr[]> m_ridCache =
    new CacheMap<int, WrappedRecordHdr[]>();
Also, I'm wondering if there is a way I can rewrite this to be more efficient. Right now I have to get a number of records that I need to wrap within another object. Then, I need to group them in a dictionary by id so that if I am asked for a certain rid I can return all objects that have the same rid. The records have been already sorted, so I'm hoping the GroupBy doesn't attempt to sort them beforehand.
RecordHdr[] records = server.GetRecordHdrs(sessId, BATCH_SIZE); // expensive call to server.
// After all RecordHdr objects are retrieved, loop through them and create a
// WrappedRecordHdr for each one.
WrappedRecordHdr[] wrappedRecords = new WrappedRecordHdr[records.Length];
for (int i = 0; i < wrappedRecords.Length; i++)
{
    if (records[i] == null || records[i].aid == 0 || records[i].rid == 0) continue; // skip invalid results.
    wrappedRecords[i] = new WrappedRecordHdr(AccessorManager, records[i], projectId);
}
// Group all records found into a dictionary of rid => array of WrappedRecordHdrs, so that
// all objects associated with a particular rid are returned together.
Dictionary<int, WrappedRecordHdr[]> dict = wrappedRecords
    .Where(obj => obj != null) // slots for skipped (invalid) records are still null
    .GroupBy(obj => obj.rid)
    .ToDictionary(gdc => gdc.Key, gdc => gdc.ToArray());
m_ridCache = dict;

As to the data structure, I think there are really two different questions here:
What structure to use;
Should there be one or two caches;
It seems to me that you want one cache, typed as a MemoryCache. The key would be the RID, and the value would be a Dictionary, where the key is an AID and the value is the header.
This has the following advantages:
The WrappedRecordHdrs are stored only once;
The MemoryCache already has all of the caching logic implemented, so you don't need to rewrite that;
When provided with only an RID, you know the AID of each WrappedRecordHdr (which you don't get with the array in the initial post);
These things are always compromises, so this has disadvantages too of course:
Cache access (get or set) requires constructing a string each time;
RID + AID lookups require indexing twice. The alternative is a fast hashing function that takes an RID and an AID and returns a single key into the cache, but that would require either two caches (one RID only, one RID + AID) or storing the same WrappedRecordHdr twice per AID (once for RID + AID and once for null + AID).
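A minimal sketch of the single-cache layout described above may make the trade-off easier to see. It assumes a plain Dictionary as a stand-in for MemoryCache (so there is no expiration or eviction), and the class name RidAidCache and its methods are hypothetical names chosen for illustration, not part of the original code.

```csharp
using System.Collections.Generic;

// Hypothetical two-level cache: rid -> (aid -> value).
// A plain Dictionary stands in for MemoryCache, so nothing ever expires.
public sealed class RidAidCache<T>
{
    private readonly Dictionary<int, Dictionary<int, T>> _byRid =
        new Dictionary<int, Dictionary<int, T>>();

    // Store one value under its (rid, aid) pair; each value is held only once.
    public void Set(int rid, int aid, T value)
    {
        if (!_byRid.TryGetValue(rid, out var byAid))
        {
            byAid = new Dictionary<int, T>();
            _byRid[rid] = byAid;
        }
        byAid[aid] = value;
    }

    // Lookup by (rid, aid): two dictionary probes, one per level.
    public bool TryGet(int rid, int aid, out T value)
    {
        value = default(T);
        return _byRid.TryGetValue(rid, out var byAid)
            && byAid.TryGetValue(aid, out value);
    }

    // Lookup by rid alone: every cached value for that rid, keyed by aid.
    // Returns an empty dictionary when the rid is not cached.
    public IReadOnlyDictionary<int, T> GetAll(int rid)
    {
        return _byRid.TryGetValue(rid, out var byAid)
            ? byAid
            : new Dictionary<int, T>();
    }
}
```

With this shape, a batch from the server would be added with one Set(rid, aid, wrapped) call per record, TryGet answers the rid + aid request, and GetAll answers the rid-only request without storing any object twice.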

Related

Why is my IEnumerable variable also getting updated?

I am a little confused about why the logic here isn't working, and I feel like I have been staring at this bit of code for so long that I am missing something here. I have this method that gets called by another method:
private async Task<bool> Method1(int start, int end, int increment, IEnumerable<ModelExample> examples)
{
    for (int i = start; i <= end; i++)
    {
        ModelExample example = examples.First(x => x.id == i);
        example.id = example.id + increment; //Line X
        // do stuff
    }
    return true;
}
I debugged the code above, and it seems that when "Line X" is executed, not only is example.id changed, but the corresponding entry in the list "examples" is also updated with the new id value.
I am not sure why. I want the list "examples" to remain the same for the entire for loop, so why does updating example.id also update it in the list?
(i.e. if the list before "Line X" had an entry with id = 0, after "Line X" that same entry has its id updated to 1. How can I keep the variable "examples" constant here?)
Any help appreciated, thanks.
This is what your list looks like:
+---------------+
| List examples |          +-----------+
+---------------+          |  Example  |
|               |          +-----------+
| [0] ------------------>  | Id: 1     |
|               |          | ...       |
+---------------+          +-----------+
|               |
| [1] ------------------>  +-----------+
|               |          |  Example  |
+---------------+          +-----------+
|               |          | Id: 2     |
| ...           |          | ...       |
|               |          +-----------+
+---------------+
In other words, your list just contains references to your examples, not copies. Thus, your variable example refers to one of the entities on the right-hand side and modifies it in-place.
If you need a copy, you need to create one yourself.
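As a rough illustration of "create one yourself", the sketch below copies the object before changing it. The ModelExample class here is a minimal stand-in with only an id property, and the Copy method and CopyDemo class are hypothetical helpers, since the original type is not shown.

```csharp
using System.Collections.Generic;
using System.Linq;

// Minimal stand-in for the asker's type; the real class is not shown.
public class ModelExample
{
    public int id { get; set; }

    // Hypothetical helper: returns a new object with the same property values.
    public ModelExample Copy()
    {
        return new ModelExample { id = this.id };
    }
}

public static class CopyDemo
{
    // Returns the shifted id while leaving the objects in 'examples' untouched.
    public static int ShiftedId(IEnumerable<ModelExample> examples, int targetId, int increment)
    {
        // First() hands back a reference to the object stored in the list,
        // so the object is copied before anything is changed.
        ModelExample copy = examples.First(x => x.id == targetId).Copy();
        copy.id += increment;
        return copy.id;
    }
}
```

If the class has more members, the copy has to carry each of them over, and any member that is itself a reference type needs its own copy if it must not be shared.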

EF retrieves rows with id zero (sometimes) causing new entities to be added to database during update operation

I have an API endpoint that updates a sequence for rows associated with an account.
The logic is similar to this
public async Task<ServiceResult<IOrderedEnumerable<SomeItem>>> ChangeSequence(
    ChangeSequenceRequest request,
    CancellationToken cancellationToken)
{
    IOrderedEnumerable<SomeItem> originalPriorityList = await
        someItemRepository.Get(request.AccountId, cancellationToken);
    //logic to check whether the lock token is valid
    //new sequence order is updated
    await someItemRepository.SetSequence(newItemSequence,
        originalItemSequence,
        request.AccountId,
        cancellationToken);
    //release the token
    //create service result
}
The request contains the following structure:
public class ChangeSequenceRequest
{
    public string token { get; set; }
    public IEnumerable<SomeItem> newSequence { get; set; } = Enumerable.Empty<SomeItem>();
}
Then SomeItem has the following structure
public class SomeItem
{
    public int Id { get; set; }
    public string Name { get; set; }
    public int Sequence { get; set; }
}
Here's the logic behind someItemRepository.SetSequence call
public async Task SetSequence(
    IOrderedEnumerable<SomeItem> newSequence,
    IOrderedEnumerable<SomeItem> originalSequence,
    int accountId,
    CancellationToken cancellationToken)
{
    await _dbContextFactory.DbContext(cancellationToken).Exec(async (dbx, ct) =>
    {
        Dictionary<string, SomeItem> originalSequenceByName = originalSequence.ToDictionary(x => x.Name);
        for (int i = 0; i < newSequence.Count(); i++)
        {
            SomeItem newSomeItem = newSequence.ElementAt(i);
            //logic to skip unchanged items
            newSomeItem.Id = originalSequenceByName[newSomeItem.Name].Id;
            dbx.Update(newSomeItem);
        }
        await dbx.SaveChangesAsync();
    });
}
Strange behavior observed
I have set up a console application to call the API endpoint hundreds of times concurrently with the same lock key but a randomized sequence (which still follows a proper arithmetic sequence). Note that the lock key is destroyed after every call to the API endpoint. What I have noticed is that sometimes the sequence gets out of whack, i.e. I observe results like this in the database:
What I expect in SomeItemTable:
| Id  | Name | Sequence | AccountId |
|-----|------|----------|-----------|
| 99  | A    | 1        | 3         |
| 100 | B    | 2        | 3         |
| 101 | C    | 3        | 3         |
| 102 | D    | 4        | 3         |
But I observe:
| Id  | Name | Sequence | AccountId |
|-----|------|----------|-----------|
| 99  | A    | 2        | 3         |
| 100 | B    | 4        | 3         |
| 101 | C    | 3        | 3         |
| 102 | D    | 1        | 3         |
| 103 | D    | 1        | 3         |
Or any other variation that ruins the sequence.
What I currently understand about the problem
My current understanding is that when the database context is updated, the entity state gets set to "Added" instead of "Modified". This occurs when you don't give the entity a primary key, i.e. leave it as 0. You can see that the id is being mapped when calling 'SetSequence'. However, sometimes when retrieving originalSequence via someItemRepository.Get(request.AccountId, cancellationToken); I get some entries in the collection that have id 0. This is the crux of the problem, and I do not understand why it happens.
What I expected to happen
I expect the database to simply hold the sequence from the last call that finished executing on the table, as per "last in wins".
I've figured it out. The repository used to retrieve the sequence was decorated with a cache interface (which under this configuration uses Redis), and the records within the cache had not been assigned ids. This meant that whenever I retrieved records, some of them would end up in the Added state instead of Modified because of the unassigned id.
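One way to surface this kind of problem earlier is to check the retrieved items before their ids are copied onto the entities being updated. The sketch below is only an illustration under the assumption that ids are database-generated, so that 0 means "not assigned"; SequenceGuard and EnsureIdsAssigned are hypothetical names, and SomeItem is repeated so the snippet stands alone.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class SomeItem
{
    public int Id { get; set; }
    public string Name { get; set; }
    public int Sequence { get; set; }
}

public static class SequenceGuard
{
    // Hypothetical check to run on the list read from the repository (or its
    // cache) before it is used as the source of ids. Throws if any item still
    // has the default id 0, which is assumed here to mean "never assigned".
    public static void EnsureIdsAssigned(IEnumerable<SomeItem> items)
    {
        List<string> missing = items
            .Where(item => item.Id == 0)
            .Select(item => item.Name)
            .ToList();

        if (missing.Count > 0)
        {
            throw new InvalidOperationException(
                "Items without an id: " + string.Join(", ", missing));
        }
    }
}
```

Calling SequenceGuard.EnsureIdsAssigned(originalSequence) at the top of SetSequence would turn the silent insert of a duplicate row into an immediate exception that points at the stale cache entries.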

Get index array of value changes or first distinct values from array of Objects

I have a sorted array of objects that each contain 3 doubles; let's call them x, y and z. I need to create an array of indexes for each transition point. Example:
index | x | y | z
------------------
0 | 3 | 2 | 1
1 | 4 | 3 | 1
2 | 3 | 1 | 1
3 | 1 | 3 | 2
4 | 3 | 1 | 2
5 | 1 | 3 | 3
6 | 1 | 3 | 3
This should give me the array {3,5}, as those are the points where z changes. I have tried
var ans = myObjectArr.Select(a => a.z).Distinct().ToList();
but that simply gives me a list of the values themselves and not the indexes at which they occur in the object array. The array is very large and I would like to avoid iterating my way through it discretely. I feel like I'm just missing something silly and any help would be appreciated. Generating either a list or an array would be acceptable.
EDIT: I was under the impression that using Linq was faster than doing it iteratively. After reading the comments I find this not to be the case, and because of this Charlieface's answer is the best one for my problem.
var lastZ = myObjectArr[0].z;
var zIndexes = new List<int>();
for (var i = 1; i < myObjectArr.Length; i++)
{
    // compare the z values, not the objects themselves
    if (myObjectArr[i].z != lastZ)
    {
        lastZ = myObjectArr[i].z;
        zIndexes.Add(i);
    }
}
// or you can use obscure Linq code if it makes you feel better
// (an alternative to the loop above; use one or the other)
var zIndexes = myObjectArr
    .Select((o, i) => (o, i))
    .Skip(1)
    .Aggregate(
        new List<int>(),
        (list, tuple) => {
            if (tuple.o.z != lastZ)
            {
                lastZ = tuple.o.z;
                list.Add(tuple.i);
            }
            return list;
        });
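For reference, the same idea can be packaged as a small reusable method. This is a sketch rather than the answerer's code: it compares each element with its predecessor instead of tracking a running lastZ, and the names Transitions, ChangeIndexes and key are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class Transitions
{
    // Returns every index i >= 1 at which key(items[i]) differs from
    // key(items[i - 1]). Doubles are compared exactly, which suits values
    // that are copied around rather than recomputed.
    public static List<int> ChangeIndexes<T>(IReadOnlyList<T> items, Func<T, double> key)
    {
        var indexes = new List<int>();
        for (int i = 1; i < items.Count; i++)
        {
            if (key(items[i]) != key(items[i - 1]))
            {
                indexes.Add(i);
            }
        }
        return indexes;
    }
}
```

It would be called as Transitions.ChangeIndexes(myObjectArr, o => o.z). On the sample table in the question, where z is 1, 1, 1, 2, 2, 3, 3, this yields the indexes 3 and 5.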

C# Retrieving the absolute value closest to 0 from a sum in a list

What I'm trying to do is get the Time where Math.Abs(A + B + C) is closest to 0 for each ID. It's quite hefty. I have a list (that I got from a CSV file) that kind of looks like this:
|------------|------------|-------|-------|-------|
| Time | ID | A | B | C |
|------------|------------|-------|-------|-------|
| 100 | 1 | 1 | 2 | 2 |
|------------|------------|-------|-------|-------|
| 100 | 2 | 3 | 4 | 3 |
|------------|------------|-------|-------|-------|
| 200 | 1 | 1 | 0 | 3 |
|------------|------------|-------|-------|-------|
| 200 | 2 | 1 | 2 | 0 |
|------------|------------|-------|-------|-------|
I have the following code, and while it is not complete yet, it technically prints the value that I want. However, when I debug it, it doesn't seem to be looping through the IDs, but does it all in one loop. My idea was that I could get all the distinct IDs, then go through the distinct Time for each, put them all in a temporary list, then just use Aggregate to get the value closest to 0. But I feel there should be a more efficient approach than this. Is there a quicker LINQ function I can use to achieve what I want?
for (int i = 0; i < ExcelRecords.Select(x => x.Id).Distinct().Count(); i++)
{
    for (int j = 0; j < ExcelRecords.Select(y => y.Time).Distinct().Count(); j++)
    {
        List<double> test = new List<double>();
        var a = ExcelRecords[j].a;
        var b = ExcelRecords[j].b;
        var c = ExcelRecords[j].c;
        test.Add(Math.Abs(a + b + c));
        Console.WriteLine(Math.Abs(a + b + c));
    }
}
Using explicit for loops like this is throwing the power and flexibility of LINQ out the window. LINQ's power comes from enumerators which can be used on their own or in conjunction with a foreach loop.
You are also doing way more work than necessary using Distinct to first get a list and then using that list to selectively group the rows into batches. This is what GroupBy was designed for.
var times = ExcelRecords.GroupBy((row) => row.Id)
    .Select((g) => g.Aggregate((min, row) => Math.Abs(row.a + row.b + row.c) < Math.Abs(min.a + min.b + min.c) ? row : min).Time)
    .ToList();
I think the most readable way would be to group by Id, order each group and select the first Time from each:
var times = ExcelRecords.GroupBy(x => x.Id)
    .Select(grp => grp.OrderBy(item => Math.Abs(item.A + item.B + item.C)))
    .Select(grp => grp.First().Time);
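To try the grouping approach against the sample table, a self-contained version might look like the sketch below. The ExcelRecord class is a guess at the shape of the CSV rows (property names and types are assumptions), and returning a dictionary keyed by Id is a small variation so that each Time stays attached to its ID.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of one CSV row; the real class is not shown in the question.
public class ExcelRecord
{
    public int Time { get; set; }
    public int Id { get; set; }
    public double A { get; set; }
    public double B { get; set; }
    public double C { get; set; }
}

public static class ClosestToZero
{
    // For each Id, returns the Time of the row whose |A + B + C| is smallest.
    public static Dictionary<int, int> TimeById(IEnumerable<ExcelRecord> records)
    {
        return records
            .GroupBy(r => r.Id)
            .ToDictionary(
                g => g.Key,
                g => g.OrderBy(r => Math.Abs(r.A + r.B + r.C)).First().Time);
    }
}
```

For the four rows in the question, the sums for ID 1 are 5 (Time 100) and 4 (Time 200), and for ID 2 they are 10 (Time 100) and 3 (Time 200), so both IDs map to Time 200.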

Query a list object by a list int

I have a list of objects (each with an Id) and a list of ints. What is the best way to query the list of objects using the list of ints?
class CommonPartRecord {
    public int Id {get;set;}
    public object Others {get;set;}
}
var listObj = new List<CommonPartRecord>();
// Load listObj
var listId = new List<int>();
// Load listId
Now I want to select the items in listObj whose Id is contained in listId. I currently do it this way:
var filterItems = listObj.Where(x => listId.Contains(x.Id));
What would be the faster way to perform this?
Thanks,
Huy
var tmpList = new HashSet<int>(listId);
var filterItems = listObj.Where(x => tmpList.Contains(x.Id));
This could give you a performance boost or a performance drop, it depends heavily on the size of both listObj and listId.
You will need to profile it to see if you get any improvement from it.
Explaining the boost or drop:
I am going to use some really exaggerated numbers to make the math easier. Let's say the following:
listObj.Where(x => listId.Contains(x.Id)); takes 5 seconds per row.
listObj.Where(x => tmpList.Contains(x.Id)) takes 2 seconds per row.
var tmpList = new HashSet<int>(listId); takes 10 seconds to build.
Let's plot how long it would take to process the data as the number of rows in listObj varies:
+----------------+------------------------------+----------------------------------+
| Number of rows | Seconds to process with list | Seconds to process with hash set |
+----------------+------------------------------+----------------------------------+
| 1 | 5 | 12 |
| 2 | 10 | 14 |
| 3 | 15 | 16 |
| 4 | 20 | 18 |
| 5 | 25 | 20 |
| 6 | 30 | 22 |
| 7 | 35 | 24 |
| 8 | 40 | 26 |
+----------------+------------------------------+----------------------------------+
So you can see that if listObj has 1 to 3 rows your old way is faster; however, once you have 4 rows or more the new way is faster.
(Note: I totally made these numbers up. I can guarantee that the per-row cost for HashSet is lower than the per-row cost for List, but I cannot tell you by how much. You will need to test to see whether the point at which you get better performance is 4 rows or 4,000 rows. The only way to know is to try both ways and measure.)
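To make the comparison concrete, here is a small sketch with both variants side by side so they can be profiled against each other. CommonPartRecord is repeated so the snippet stands alone; IdFilter, WithList and WithHashSet are hypothetical names, and no timings are claimed.

```csharp
using System.Collections.Generic;
using System.Linq;

public class CommonPartRecord
{
    public int Id { get; set; }
    public object Others { get; set; }
}

public static class IdFilter
{
    // List<int>.Contains scans listId for every row, so the total work
    // grows with listObj.Count * listId.Count.
    public static List<CommonPartRecord> WithList(List<CommonPartRecord> listObj, List<int> listId)
    {
        return listObj.Where(x => listId.Contains(x.Id)).ToList();
    }

    // Pays once to build the set, then each row costs one hash lookup.
    public static List<CommonPartRecord> WithHashSet(List<CommonPartRecord> listObj, List<int> listId)
    {
        var idSet = new HashSet<int>(listId);
        return listObj.Where(x => idSet.Contains(x.Id)).ToList();
    }
}
```

Both methods return the same records in the same order, so wrapping each call in a System.Diagnostics.Stopwatch with realistic list sizes is enough to find where the crossover lies for your data.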
