List<> optimization, what are the possibilities? - c#

In the Add() method of my generic EF repository, I check whether the row I'm about to insert already exists in the table; if it does, I update it with the currently available info.
private List<T> _previousEntries;
//Try and find one with same PK in the temporary tables...
var previousRow = _previousEntries.Single(n => (int)n.GetType().GetProperty(_PKName).GetValue(n, null) == entPKValue);
//If we did find one...
if (previousRow != null)
{
//Update the row...
return;
}
//Add it...
So I know, I'm using reflection, which is slow but I have not found another way since different entities have different SQL PK names.
But I'm not sure that reflection is the biggest issue here, sometimes, _previousEntries will hold up to 800,000 items.
_previousEntries has its items assigned to it in the class constructor of the repository class. _PKName is also assigned a value in the class constructor depending on the type of T.
If I just set a breakpoint on the Single() statement, it can be processing for 2-3 seconds for sure so I don't know how I could determine what is the bottleneck here: reflection or Single() on 800,000 items... It sure goes way faster on a 5,000 items list.
Any opinions ? Is there anything I can do to optimize my List ?

You could move the reflection out of the LINQ statement to avoid it being evaluated repeatedly:
var property = typeof(T).GetProperty(_PKName);
var previousRow = _previousEntries.Single(n => (int)property.GetValue(n, null) == entPKValue);
Or perhaps pass a Func<T, int> to your class constructor and avoid the need for reflection altogether.
private Func<T, int> _getKeyForItem; // Set by constructor
...
var previousRow = _previousEntries.Single(n => _getKeyForItem(n) == entPKValue);

Provide a primary key accessor as a delegate
public class Repository<T>
{
    private Func<T, int> _getPK;
    private Dictionary<int, T> _previousEntries;
    public Repository(Func<T, int> getPK)
    {
        _getPK = getPK;
        _previousEntries = new Dictionary<int, T>();
    }
    public void Add(T item)
    {
        ...
        int pk = _getPK(item);
        T previousEntry;
        if (_previousEntries.TryGetValue(pk, out previousEntry))
        {
            // Update
        }
        else
        {
            // Add
            _previousEntries.Add(pk, item);
        }
    }
}
You would create a repository like this:
var clientRepository = new Repository<Client>(c => c.ClientID);

There is no way to make searching an unsorted list fast: a lookup is O(n) in the number of items.
You need a different data structure to make lookups faster; a Dictionary, or a list sorted by PK and searched with a binary search, are possible options.
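As a rough sketch of the Dictionary option: build the index once in a single pass, and each later lookup becomes a hash probe instead of a scan over all items. The names `Entry`, `Id`, and `PkIndex` are hypothetical and exist only for this illustration; the key selector stands in for whatever identifies the PK on the real entities, and the sketch assumes the keys are unique.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity, used only for this illustration.
public class Entry
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class PkIndex
{
    // Builds a PK -> item index in one O(n) pass.
    // Assumes keys are unique: ToDictionary throws ArgumentException on a duplicate key.
    public static Dictionary<int, T> Build<T>(IEnumerable<T> items, Func<T, int> getKey)
    {
        return items.ToDictionary(getKey);
    }

    // Example of the update-or-add decision using the index.
    public static bool Exists<T>(Dictionary<int, T> index, int pk)
    {
        T existing;
        return index.TryGetValue(pk, out existing);
    }
}
```

With 800,000 items the one-time build still costs a full pass, so this only pays off when the index is reused across many Add() calls.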

Related

Reassign object in foreach loop c#

Not sure I understand why I can do this with a for loop and not a foreach loop?
This is the code that works. Looping through a BindingList Products, finding a match and then assigning that product at index i to the new product that's passed in.
public static void UpdateProduct(int productToUpdateID, Product productToUpdate)
{
for (int i = 0; i < Products.Count; i++)
{
if (Products[i].ProductID == productToUpdateID)
{
Products[i] = productToUpdate;
}
}
}
If I try to do this with a foreach loop I get an error that I cannot assign to the iterator variable. What is the reasoning for this and is there a way to get around it or is using a for loop for this kind of problem the best solution?
This is essentially what I'm trying to do.
public static void UpdateProduct(int productToUpdateID, Product productToUpdate)
{
foreach(Product product in Products)
{
if (product.ProductID == productToUpdateID)
{
product = productToUpdate;
}
}
}
I can do something like this and reassign all the properties explicitly but want to see if there is another way to do it.
foreach(Product product in Products)
{
if (product.ProductID == productToUpdateID)
{
product.Name = productToUpdate.Name;
}
}
Thanks!
The foreach construct is for when you want to do something with each item in the list. That does not seem to be what you are doing. You are modifying the list itself, by removing an item and replacing it.
Personally I would not use a loop at all, I'd just remove the old item and add the new one.
public static void UpdateProduct(int productToUpdateID, Product productToUpdate)
{
Products.RemoveAll( x => x.ProductID == productToUpdateID );
Products.Add( productToUpdate );
}
Or if you wish to preserve order:
public static void UpdateProduct(int productToUpdateID, Product productToUpdate)
{
var index = Products.FindIndex( x => x.ProductID == productToUpdateID );
Products[index] = productToUpdate;
}
The reasons have already been given, but as a minor detail: this is sometimes possible; there is an alternative syntax in recent C# that uses a ref-local for the iterator value:
foreach (ref [readonly] SomeType value in source)
which is only available for some scenarios (naked arrays, spans, or custom iterator types with a ref-return Current); as long as the optional readonly modifier is not used, you can assign directly via the value variable, since it is a direct reference to the underlying source. The uses for this are rare and niche. If Products is a List<T>, you could combine this with CollectionsMarshal.AsSpan(...) to achieve what you want, but frankly I'd consider that hacky (apart from other things, it would bypass the list's internal change protections). Basically: don't do this, but it isn't entirely impossible.
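For illustration only, here is a minimal sketch of the ref-local form over an array viewed as a span. It assumes C# 7.3 or later and a runtime where `Span<T>` is available; `Product` here is a hypothetical stand-in for the question's type, and `RefForeachDemo` is an invented name.

```csharp
using System;

// Hypothetical stand-in for the question's Product type.
public class Product
{
    public int ProductID { get; set; }
    public string Name { get; set; }
}

public static class RefForeachDemo
{
    // Replaces every element whose ID matches and returns how many were replaced.
    public static int Replace(Product[] products, int id, Product replacement)
    {
        int replaced = 0;
        // 'p' is a ref local aliasing the array slot, so assigning to it writes into the array.
        foreach (ref Product p in products.AsSpan())
        {
            if (p.ProductID == id)
            {
                p = replacement;
                replaced++;
            }
        }
        return replaced;
    }
}
```

This works on an array, not on the BindingList from the question, so the plain for loop remains the practical choice there.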
The foreach loop iterates over the elements of a collection, and the iteration variable is simply a reference to the current element in the collection.
The reason you cannot modify the iteration variable itself is that it is a read-only reference to the element in the collection. Modifying the iteration variable would not change the element in the collection; it would only change the reference.
Alternative ways are already mentioned in the above answers.
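To make the point about references concrete, here is a small sketch (with a hypothetical `Item` type and invented method names) contrasting rebinding a local variable, which is all that assigning to an iteration variable could ever do, with assigning through the list's indexer.

```csharp
using System.Collections.Generic;

// Hypothetical type, used only for this illustration.
public class Item
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class IterationVariableDemo
{
    public static string NameAfterLocalReassign()
    {
        var items = new List<Item> { new Item { Id = 1, Name = "old" } };
        Item current = items[0];                      // a copy of the reference, like the foreach variable
        current = new Item { Id = 1, Name = "new" };  // rebinds the local only
        return items[0].Name;                         // the list still holds the original object
    }

    public static string NameAfterIndexAssign()
    {
        var items = new List<Item> { new Item { Id = 1, Name = "old" } };
        items[0] = new Item { Id = 1, Name = "new" }; // writes into the list's slot
        return items[0].Name;
    }
}
```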
Just for the record. IMHO the best way is to use a foreach loop with a modified code like this. It only makes one iteration
int i=-1;
foreach (var product in products )
{
i++;
if (product.ProductID == productToUpdate.ProductID)
{
products[i]=productToUpdate;
break;
}
}
But if you want to use linq for some reason, you can do it in one line
products = products.Select(x => x.ProductID == productToUpdate.ProductID ? productToUpdate : x).ToList();

C# Recursive Search an array of Objects.parent_id for value, then search those and so on till none left

Looking for a solution to find an object.id and get all the parent_id's in an array of objects, and then set object.missed = true.
Object.id, and Object parent_id. If the object doesn't have a parent_id, parent_id = id.
I know how to do it for one level of parent_id's. How can I go unlimited levels deep? Below is the code I have for searching the 1 level.
public class EPlan
{
public int id;
public int parent_id;
public bool is_repeatable;
public bool missed;
}
EPlan[] plans = Array.FindAll(eventsPlan, item => item.parent_id == event_id);
foreach (EPlan plan in plans)
{
plan.missed = true;
plan.is_repeatable = false;
}
I'm trying to search for event_id an int. So I search all of the object.id's for event_id. Once I find object.id == event_id. I need to set object.is_repeatable = false and object.missed = true.
Then I need to search all of the objects.parent_id for current object.id (event_id). Change all of those object to the same as above.
Then I need to check all of those object.id's against all of the object.parent_id's and do the same to those. Like a tree effect: 1 event was missed, and any of the events that are parented to that event need to be set as missed as well.
So far, all I can do is get 1 level deep, or code multiple foreach loops in. But it could be 10 or more levels deep. So that doesn't make sense.
Any help is appreciated. There has to be a better way than the multiple loops.
I too was confused by the question, save for the one line you said:
1 event was missed, and any of the events that are parented to that event need to be set as missed as well.
With that in mind, I suggest the following code will do what you're looking for. Each time you call the method, it will find all of the objects in the array that match the ID and set the event as Missed and Is_Repeatable appropriately.
It also keeps a running list of the IDs of the child events it found during this scan. Once the inner loop is finished, it calls itself with that list of child IDs instead of the list of event IDs it was passed. That is the trick that makes the recursion work here.
To start the process off, you call the method with the single event ID you used for the 1-level search.
findEvents(new List<int> { event_id }, eventsPlan);
private void findEvents(List<int> eventIDs, EPlan[] eventsPlan)
{
    foreach (int eventID in eventIDs)
    {
        EPlan[] plans = Array.FindAll(eventsPlan, item => item.parent_id == eventID);
        List<int> childIDs = new List<int>();
        foreach (EPlan plan in plans)
        {
            plan.missed = true;
            plan.is_repeatable = false;
            // Recurse into the children's own IDs; skip a root that is its own parent (parent_id == id).
            if (plan.id != plan.parent_id)
                childIDs.Add(plan.id);
        }
        if (childIDs.Count > 0)
            findEvents(childIDs, eventsPlan);
    }
}
I also recommend that if you have the chance to reengineer this code to not use arrays, but a Generic Collection (like List<EPlan>) you can avoid the performance penalty this code has because it's building new arrays in memory each time you call the Array.FindAll method. Using the Generic Collection, or even using old-school foreach loop will work faster when processing a lot of data here.
Update 1:
To answer your question about how you might go about this using a Generic Collection instead:
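private void findEventsAsList(List<int> eventIDs, List<EPlan> eventsPlans)
{
    List<int> childIDs = new List<int>();
    foreach (EPlan plan in eventsPlans.Where(p => eventIDs.Contains(p.parent_id)))
    {
        plan.missed = true;
        plan.is_repeatable = false;
        // Recurse into the children's own IDs; skip a root that is its own parent (parent_id == id).
        if (plan.id != plan.parent_id)
            childIDs.Add(plan.id);
    }
    // Stop when this level produced no children; otherwise the recursion never ends.
    if (childIDs.Count > 0)
        findEventsAsList(childIDs, eventsPlans);
}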
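An alternative sketch, not taken from the answer above: the same tree walk can be done without recursion by grouping the plans by parent_id once and working through a queue. It also marks the starting event itself, which the question asks for. `MissedEvents`, `MarkMissed`, and `Mark` are invented names; `EPlan` is repeated from the question so the block is self-contained.

```csharp
using System.Collections.Generic;
using System.Linq;

public class EPlan
{
    public int id;
    public int parent_id;
    public bool is_repeatable;
    public bool missed;
}

public static class MissedEvents
{
    // Marks the event with the given id, and every event below it in the tree, as missed.
    public static void MarkMissed(IEnumerable<EPlan> plans, int eventId)
    {
        List<EPlan> all = plans.ToList();
        // Group once by parent so each level is a lookup instead of a full scan.
        ILookup<int, EPlan> childrenOf = all.ToLookup(p => p.parent_id);
        var visited = new HashSet<int>();   // guards against self-parented roots and cycles
        var pending = new Queue<int>();
        pending.Enqueue(eventId);

        foreach (EPlan self in all.Where(p => p.id == eventId))
            Mark(self);

        while (pending.Count > 0)
        {
            int current = pending.Dequeue();
            if (!visited.Add(current))
                continue;                   // already processed this id
            foreach (EPlan child in childrenOf[current])
            {
                Mark(child);
                pending.Enqueue(child.id);
            }
        }
    }

    private static void Mark(EPlan plan)
    {
        plan.missed = true;
        plan.is_repeatable = false;
    }
}
```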

Create a copy of IEnumerable<T> to modify collection from different threads?

I am using a third-party library which uses its own custom data model. The hierarchy of the data model is as follows:
Model
---Tables(type of Table)
-----Rows(type of Row)
-------Cells( type of Cell)
Table has a Rows property, like DataTable, and I have to access this property from more than one task. Now I need a row from the table whose column value equals the specified value.
To do this, I have created a method with a lock statement so that it is accessible from only one thread at a time.
public static Row GetRowWithColumnValue(Model model, string tableKey, string indexColumnKey, string indexColumnValue)
{
Row simObj = null;
lock (syncRoot)
{
SimWrapperFromValueFactory wrapperSimSystem = new SimWrapperFromValueFactory(model, tableKey, indexColumnKey);
simObj = wrapperSimSystem.GetWrapper(indexColumnValue);
}
return simObj;
}
To create the lookup for one of the columns in Table, I have created a method which always tries to create a copy of the rows to avoid the collection-modified exception:
Private Function GetTableRows(table As Table) As List(Of Row)
Dim rowsList As New List(Of Row)(table.Rows) 'Case 1
'rowsList.AddRange(table.Rows) 'Case 2
' Case 3
'For i As Integer = 0 To table.Rows.Count - 1
'rowsList.Add(table.Rows.ElementAt(i))
'Next
Return rowsList
End Function
But other threads can modify the table (e.g. add or remove rows, or update a column value in any row). I am getting the following "Collection modified" exception:
at System.ThrowHelper.ThrowInvalidOperationException(ExceptionResource resource)
at System.Collections.Generic.List`1.Enumerator.MoveNextRare()
at System.Collections.Generic.List`1.InsertRange(Int32 index, IEnumerable`1 collection)
I cannot change this third-party library to use concurrent collections, and the same data model is shared between multiple projects.
Question: I am looking for a solution that allows multiple readers on this collection even while it is being modified by other threads. Is it possible to get a copy of the collection without getting the exception?
I looked at the SO threads below but did not find an exact solution:
Lock vs. ToArray for thread safe foreach access of List collection
Can ToArray() throw an exception?
Is returning an IEnumerable<> thread-safe?
The simplest solution is to retry on exception, like this:
private List<Row> CopyVolatileList(IEnumerable<Row> original)
{
while (true)
{
try
{
List<Row> copy = new List<Row>();
foreach (Row row in original) {
copy.Add(row);
}
// Validate.
if (copy.Count != 0 && copy[copy.Count - 1] == null) // Assuming Row is a reference type.
{
// At least one element was removed from the list while were copying.
continue;
}
return copy;
}
catch (InvalidOperationException)
{
// Check ex.Message?
}
// Keep trying.
}
}
Eventually you'll get a run where the exception isn't thrown and the data integrity validation passes.
Alternatively, you can dive deep (and I mean very, very deep).
DISCLAIMER: Never ever use this in production. Unless you're desperate and really have no other option.
So we've established that you're working with a custom collection (TableRowCollection) which ultimately uses List<Row>.Enumerator to iterate through the rows. This strongly suggests that your collection is backed by a List<Row>.
First things first, you need to get a reference to that list. Your collection will not expose it publicly, so you'll need to fiddle a bit. You will need to use Reflection to find and get the value of the backing list. I recommend looking at your TableRowCollection in the debugger. It will show you non-public members and you will know what to reflect.
If you can't find your List<Row>, then take a closer look at TableRowCollection.GetEnumerator() - specifically GetEnumerator().GetType(). If that returns List<Row>.Enumerator, then bingo: we can get the backing list out of it, like so:
List<Row> list;
using (IEnumerator<Row> enumerator = table.GetEnumerator())
{
list = (List<Row>)typeof(List<Row>.Enumerator)
.GetField("list", BindingFlags.Instance | BindingFlags.NonPublic)
.GetValue(enumerator);
}
If the above methods of getting your List<Row> have failed, there is no need to read further. You might as well give up.
In case you've succeeded, now that you have the backing List<Row>, we'll have to look at Reference Source for List<T>.
What we see is 3 fields being used:
private T[] _items;
private int _size; // Accessible via "Count".
private int _version;
Our goal is to copy the items whose indexes are between zero and _size - 1 from the _items array into a new array, and to do so in between _version changes.
Observations re thread safety: List<T> does not use locks, none of the fields are marked as volatile and _version is incremented via ++, not Interlocked.Increment. Long story short this means that it is impossible to read all 3 field values and confidently say that we're looking at stable data. We'll have to read the field values repeatedly in order to be somewhat confident that we're looking at a reasonable snapshot (we will never be 100% confident, but you might choose to settle for "good enough").
using System;
using System.Collections.Generic;
using System.Linq.Expressions;
using System.Reflection;
using System.Threading;
private Row[] CopyVolatileList(List<Row> original)
{
while (true)
{
// Get _items and _size values which are safe to use in tandem.
int version = GetVersion(original); // _version.
Row[] items = GetItems(original); // _items.
int count = original.Count; // _size.
if (items.Length < count)
{
// Definitely a torn read. Copy will fail.
continue;
}
// Copy.
Row[] copy = new Row[count];
Array.Copy(items, 0, copy, 0, count);
// Stabilization window.
Thread.Sleep(1);
// Validate.
if (version == GetVersion(original)) {
return copy;
}
// Keep trying.
}
}
static Func<List<Row>, int> GetVersion = CompilePrivateFieldAccessor<List<Row>, int>("_version");
static Func<List<Row>, Row[]> GetItems = CompilePrivateFieldAccessor<List<Row>, Row[]>("_items");
static Func<TObject, TField> CompilePrivateFieldAccessor<TObject, TField>(string fieldName)
{
ParameterExpression param = Expression.Parameter(typeof(TObject), "o");
MemberExpression fieldAccess = Expression.PropertyOrField(param, fieldName);
return Expression
.Lambda<Func<TObject, TField>>(fieldAccess, param)
.Compile();
}
Note re stabilization window: the bigger it is, the more confidence you have that you're not dealing with a torn read (because the list is in the process of modifying all 3 fields). I've settled on the smallest value that never failed in my tests, where I called CopyVolatileList in a tight loop on one thread and used another thread to add items to the list, remove them or clear the list at random intervals between 0 and 20ms.
If you remove the stabilization window, you will occasionally get a copy with uninitialized elements at the end of the array because the other thread has removed a row while you were copying - that's why it's needed.
You should obviously validate the copy once it's built, to the best of your ability (at least check for uninitialized elements at the end of the array in case the stabilization window fails).
Good luck.

Exception when modifying collection in foreach loop

I know the basic principle of not modifying collection inside a foreach, that's why I did something like this:
public void UpdateCoverages(Dictionary<PlayerID, double> coverages)
{
// TODO: temp
var keys = coverages.Select(pair => pair.Key);
foreach (var key in keys)
{
coverages[key] = 0.84;
}
}
And:
class PlayerID : IEquatable<PlayerID>
{
public PlayerID(byte value)
{
Value = value;
}
public byte Value { get; private set; }
public bool Equals(PlayerID other)
{
return Value == other.Value;
}
}
First I save all my keys not to have the Collection modified exception and then I go through it. But I still get the exception which I cannot understand.
How to correct this and what is causing the problem?
First I save all my keys
No you don't; keys is a live sequence that is actively iterating the collection as it is iterated by the foreach. To create an isolated copy of the keys, you need to add .ToList() (or similar) to the end:
var keys = coverages.Select(pair => pair.Key).ToList();
Although personally I'd probably go for:
var keys = new PlayerID[coverages.Count];
coverages.Keys.CopyTo(keys, 0);
(which allows for correct-length allocation, and memory-copy)
What is a live sequence actually?
The Select method creates a non-buffered spooling iterator over another sequence... that is a really complicated thing to understand, but basically: when you first start iterating var key in keys, it grabs the inner sequence of coverages (aka coverages.GetEnumerator()), and then every time the foreach asks for the next item, it in turn asks the inner sequence for the next item. Yeah, that sounds complicated. The good news is the C# compiler has it all built in automatically, generating state machines etc. for you, mainly via the yield return syntax. Jon Skeet gives an excellent discussion of this in Chapter 6 of C# in Depth. IIRC this used to be the "free chapter", but now it is not.
However, consider the following:
static IEnumerable<int> OneTwoOneTwoForever()
{
while(true) {
yield return 1;
yield return 2;
}
}
It might surprise you to learn that you can consume the above, using the same non-buffered "when you ask for another value, it runs just enough code to give you the next value" approach:
var firstTwenty = OneTwoOneTwoForever().Take(20).ToList(); // works!
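A small sketch of the same "live sequence" behaviour, using a List<int> rather than the dictionary from the question so that the result does not depend on the runtime's enumerator invalidation rules. `DeferredDemo` and `Run` are invented names for this illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Returns how many items each sequence yields after the source has grown.
    public static (int lazyCount, int snapshotCount) Run()
    {
        var source = new List<int> { 1, 2, 3 };

        // Deferred: nothing is read from 'source' until the query is enumerated.
        IEnumerable<int> lazy = source.Select(x => x * 10);

        // Buffered: ToList() enumerates immediately and keeps its own copy.
        List<int> snapshot = source.Select(x => x * 10).ToList();

        source.Add(4);

        // The lazy query reads the source now and sees 4 items; the snapshot still has 3.
        return (lazy.Count(), snapshot.Count);
    }
}
```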

how to add an associative index to an array. c#

I have an array of custom objects. I'd like to be able to reference this array by a particular data member, for instance myArray["Item1"]
"Item1" is actually the value stored in the Name property of this custom type and I can write a predicate to mark the appropriate array item. However I am unclear as to how to let the array know I'd like to use this predicate to find the array item.
I'd like to just use a dictionary or hashtable or NameValuePair for this array, and get around this whole problem, but it's generated and it must remain as CustomObj[]. I'm also trying to avoid loading a dictionary from this array as it's going to happen many times and there could be many objects in it.
For clarification
myArray[5] = new CustomObj() // easy!
myArray["ItemName"] = new CustomObj(); // how to do this?
Can the above be done? I'm really just looking for something similar to how DataRow.Columns["MyColumnName"] works
Thanks for the advice.
What you really want is an OrderedDictionary. The version that .NET provides in System.Collections.Specialized is not generic - however there is a generic version on CodeProject that you could use. Internally, this is really just a hashtable married to a list ... but it is exposed in a uniform manner.
If you really want to avoid using a dictionary - you're going to have to live with O(n) lookup performance for an item by key. In that case, stick with an array or list and just use the LINQ Where() method to lookup a value. You can use either First() or Single() depending on whether duplicate entries are expected.
var myArrayOfCustom = ...
var item = myArrayOfCustom.Where( x => x.Name == "yourSearchValue" ).First();
It's easy enough to wrap this functionality into a class so that external consumers are not burdened by this knowledge, and can use simple indexers to access the data. You could then add features like memoization if you expect the same values are going to be accessed frequently. In this way you could amortize the cost of building the underlying lookup dictionary over multiple accesses.
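One way such a wrapper might look, as a sketch under stated assumptions: `NamedArray` is an invented name, `CustomObj` is reduced to a hypothetical type with just a Name property, items are assumed non-null, and the first occurrence wins if names repeat. The name-to-index map is built on the first string lookup and discarded whenever the wrapper itself replaces an element.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal version of the generated type.
public class CustomObj
{
    public string Name { get; set; }
}

public class NamedArray
{
    private readonly CustomObj[] _items;          // the generated array, used as-is
    private Dictionary<string, int> _indexByName; // built lazily on first lookup by name

    public NamedArray(CustomObj[] items)
    {
        if (items == null) throw new ArgumentNullException(nameof(items));
        _items = items;
    }

    public CustomObj this[int index]
    {
        get { return _items[index]; }
        set { _items[index] = value; _indexByName = null; } // names may have changed
    }

    public CustomObj this[string name]
    {
        get { return _items[IndexOf(name)]; }
        set { int i = IndexOf(name); _items[i] = value; _indexByName = null; }
    }

    private int IndexOf(string name)
    {
        if (_indexByName == null)
        {
            _indexByName = new Dictionary<string, int>();
            for (int i = 0; i < _items.Length; i++)
            {
                // First occurrence wins if names repeat.
                if (!_indexByName.ContainsKey(_items[i].Name))
                    _indexByName.Add(_items[i].Name, i);
            }
        }
        int found;
        if (_indexByName.TryGetValue(name, out found))
            return found;
        throw new KeyNotFoundException(name);
    }
}
```

If other code writes to the underlying array directly, or renames an item, the cached map goes stale; in that case the wrapper would need an explicit way to reset it.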
If you do not want to use a Dictionary, then you should create a class (say, "MyArray") that holds the data, and add indexers of type int for index access and of type string for associative access.
public CustomObj this [string index]
{
get
{
return data[searchIdxByName(index)];
}
set
{
data[searchIdxByName(index)] = value;
}
}
First link in google for indexers is: http://www.csharphelp.com/2006/04/c-indexers/
you could use a dictionary for this, although it might not be the best solution in the world this is the first i came up with.
Dictionary<string, int> d = new Dictionary<string, int>();
d.Add("cat", 2);
d.Add("dog", 1);
d.Add("llama", 0);
d.Add("iguana", -1);
the ints could be objects, what you like :)
http://dotnetperls.com/dictionary-keys
Perhaps OrderedDictionary is what you're looking for.
You can use a Hashtable:
System.Collections.Hashtable o_Hash_Table = new Hashtable();
o_Hash_Table.Add("Key", "Value");
There is a class in the System.Collections.Generic namespace called Dictionary<K,V> that you should use.
var d = new Dictionary<string, MyObj>();
MyObj o = d["a string variable"];
Another way would be to code two methods/a property:
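public MyObj this[string index]
{
    get
    {
        foreach (var o in My_Enumerable)
        {
            if (o.Name == index)
            {
                return o;
            }
        }
        // No match: every code path must return a value or throw.
        throw new KeyNotFoundException(index);
    }
    set
    {
        // Use an index loop: changing the list inside a foreach would invalidate its enumerator.
        // Assumes My_Enumerable is an indexable list (e.g. List<MyObj>).
        for (int i = 0; i < My_Enumerable.Count; i++)
        {
            if (My_Enumerable[i].Name == index)
            {
                My_Enumerable[i] = value;
                return;
            }
        }
    }
}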
I hope it helps!
It depends on the collection: some collections allow accessing by name and some don't. Accessing by string is only meaningful when the collection stores a name for each item; the column collection identifies columns by their name, which is what lets you select a column by name. In a normal array this would not work because items are only identified by their index number.
My best recommendation, if you can't change it to use a dictionary, is to either use a Linq expression:
var item1 = myArray.Where(x => x.Name == "Item1").FirstOrDefault();
or, make an extension method that uses a linq expression:
public static class CustomObjExtensions
{
    public static CustomObj Get(this CustomObj[] Array, string Name)
    {
        return Array.Where(x => x.Name == Name).FirstOrDefault();
    }
}
then in your app:
var item2 = myArray.Get("Item2");
Note however that performance wouldn't be as good as using a dictionary, since behind the scenes .NET will just loop through the list until it finds a match, so if your list isn't going to change frequently, then you could just make a Dictionary instead.
I have two ideas:
1) I'm not sure you're aware but you can copy dictionary objects to an array like so:
Dictionary<string, int> dict = new Dictionary<string, int>();
dict.Add("tesT",40);
int[] myints = new int[dict.Count];
dict.Values.CopyTo(myints, 0);
This might allow you to use a Dictionary for everything while still keeping the output as an array.
2) You could also actually create a DataTable programmatically if that's the exact functionality you want:
DataTable dt = new DataTable();
DataColumn dc1 = new DataColumn("ID", typeof(int));
DataColumn dc2 = new DataColumn("Name", typeof(string));
dt.Columns.Add(dc1);
dt.Columns.Add(dc2);
DataRow row = dt.NewRow();
row["ID"] = 100;
row["Name"] = "Test";
dt.Rows.Add(row);
You could also create this outside of the method so you don't have to make the table over again every time.
