I have a list with 1,000,000 items and I need to find out whether an item is in it, compared by reference. I can't use Contains, because Contains doesn't always compare by reference (e.g. when the list is of type string). I tried list.Any(x => object.ReferenceEquals(x, item)), but that is too slow.
Take a look here:
for(int i = 0; i < 1000000; i++)
{
if(does list contains this item anotherList[i])
{
list.Add(anotherList[i]);
}
}
How do I perform this really fast?
Use a dictionary with an IdentityEqualityComparer so that the dictionary compares keys by reference. The main difference between this approach and yours is that each lookup is O(1) instead of O(n), because you no longer have to scan the entire list for each item.
Put the following code inside a sample Console app project; it basically splits a master dictionary into two.
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
public class IdentityEqualityComparer<T> : IEqualityComparer<T> where T : class
{
    public int GetHashCode(T value)
    {
        // Identity-based hash; ignores any GetHashCode override on T
        return RuntimeHelpers.GetHashCode(value);
    }
    public bool Equals(T left, T right)
    {
        return left == right; // Reference identity comparison (T is constrained to class)
    }
}
public class RefKeyType
{
    public int ID { get; set; }
}
class Program
{
    public static void Main()
    {
        var comparer = new IdentityEqualityComparer<RefKeyType>();
        var refDictionary = new Dictionary<RefKeyType, int>(1000000, comparer);
        var testDictionary = new Dictionary<RefKeyType, int>(1000000, comparer);
        var store = new Dictionary<RefKeyType, int>(1000000, comparer);
        for (var i = 0; i < 1000000; i++)
        {
            var key = new RefKeyType() { ID = i };
            refDictionary[key] = i;
            // Load the test dictionary only if i is divisible by 2
            if (i % 2 == 0)
            {
                testDictionary[key] = i;
            }
        }
        // Copy every entry whose key (by reference) is missing from the test dictionary
        foreach (var pair in refDictionary)
        {
            if (!testDictionary.ContainsKey(pair.Key))
            {
                store[pair.Key] = pair.Value;
            }
        }
        Console.WriteLine("Master dictionary has " + refDictionary.Count);
        Console.WriteLine("Test dictionary has " + testDictionary.Count);
        Console.WriteLine("Store dictionary has " + store.Count);
        Console.WriteLine("Press any key to exit.");
        Console.ReadKey();
    }
}
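The loop from the question can use the same idea with a HashSet<T> instead of a dictionary, since only membership matters. This is a minimal sketch, assuming the items are reference types; ReferenceComparer, ReferenceMerge and AddMissingByReference are illustrative names, not part of the answer above.

```csharp
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Illustrative comparer: two items are equal only if they are the same instance.
public sealed class ReferenceComparer<T> : IEqualityComparer<T> where T : class
{
    public bool Equals(T x, T y) => ReferenceEquals(x, y);
    public int GetHashCode(T obj) => RuntimeHelpers.GetHashCode(obj);
}

public static class ReferenceMerge
{
    // Appends each item of source to target unless that exact instance is already in target.
    public static void AddMissingByReference<T>(List<T> target, IEnumerable<T> source) where T : class
    {
        // One O(n) pass to build the set; afterwards each check is O(1) on average.
        var seen = new HashSet<T>(target, new ReferenceComparer<T>());
        foreach (var item in source)
        {
            // HashSet<T>.Add returns false when the same reference is already present.
            if (seen.Add(item))
                target.Add(item);
        }
    }
}
```

Two strings with equal contents but different instances are both kept, which List<string>.Contains would not do.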
(This problem is an adaptation of a real-life scenario; I reduced it so it is easy to understand, otherwise this question would be 10,000 lines long.)
I have a pipe delimited text file that looks like this (the header is not in the file):
Id|TotalAmount|Reference
1|10000
2|50000
3|5000|1
4|5000|1
5|10000|2
6|10000|2
7|500|9
8|500|9
9|1000
The reference is optional and is the Id of another entry in this text file. The entries that have a reference are considered "children" of that reference, and the reference is their parent. I need to validate each parent in the file: the sum of the TotalAmount of its children must equal the parent's TotalAmount. A parent can come either before or after its children in the file; for example, the entry with Id 9 comes after its children.
In the provided file, the entry with Id 1 is valid, because the sum of the total amounts of its children (Ids 3 and 4) is 10000, and the entry with Id 2 is invalid, because the sum of its children (Ids 5 and 6) is 20000.
For a small file like this, I could just parse everything into objects like this (pseudocode; I have no way to run it right now):
class Entry
{
public int Id { get; set; }
public int TotalAmout { get; set; }
public int Reference { get; set; }
}
class Validator
{
public void Validate()
{
List<Entry> entries = GetEntriesFromFile(@"C:\entries.txt");
foreach (var entry in entries)
{
var children = entries.Where(e => e.Reference == entry.Id).ToList();
if (children.Count > 0)
{
var sum = children.Sum(e => e.TotalAmout);
if (sum == entry.TotalAmout)
{
Console.WriteLine("Entry with Id {0} is valid", entry.Id);
}
else
{
Console.WriteLine("Entry with Id {0} is INVALID", entry.Id);
}
}
else
{
Console.WriteLine("Entry with Id {0} is valid", entry.Id);
}
}
}
public List<Entry> GetEntriesFromFile(string file)
{
var entries = new List<Entry>();
using (var r = new StreamReader(file))
{
while (!r.EndOfStream)
{
var line = r.ReadLine();
var splited = line.Split('|');
var entry = new Entry();
entry.Id = int.Parse(splited[0]);
entry.TotalAmout = int.Parse(splited[1]);
if (splited.Length == 3)
{
entry.Reference = int.Parse(splited[2]);
}
entries.Add(entry);
}
}
return entries;
}
}
The problem is that I am dealing with large files (10 GB), and that would load way too many objects into memory.
Performance itself is NOT a concern here. I know that I could use dictionaries instead of the Where() method, for example. My only problem is performing the validation without loading everything into memory, and I have no idea how to do it, because an entry at the bottom of the file may reference an entry at the top, so I need to keep track of everything.
So my question is: is it possible to keep track of each line in a text file without loading its information into memory?
Since performance is not an issue here, I would approach this in the following way:
First, I would sort the file so all the parents go right before their children. There are classical methods for sorting huge external data, see https://en.wikipedia.org/wiki/External_sorting
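As an illustration of that first step, here is a minimal sketch of an external merge sort for this file. It assumes one level of nesting (a child never has children of its own, as in the sample) and that chunkSize lines fit in memory; ExternalSorter and ParentFirstComparer are illustrative names. The comparer groups lines by the parent's Id (the Reference for a child, the Id for a parent) and puts the parent first in each group.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Orders "Id|TotalAmount[|Reference]" lines by parent Id, with the parent line first in each group.
public sealed class ParentFirstComparer : IComparer<string>
{
    public int Compare(string a, string b)
    {
        var ka = Key(a);
        var kb = Key(b);
        int byGroup = ka.Group.CompareTo(kb.Group);
        // Within a group, false (the parent) sorts before true (a child).
        return byGroup != 0 ? byGroup : ka.IsChild.CompareTo(kb.IsChild);
    }

    private static (int Group, bool IsChild) Key(string line)
    {
        var parts = line.Split('|');
        return parts.Length == 3 ? (int.Parse(parts[2]), true) : (int.Parse(parts[0]), false);
    }
}

public static class ExternalSorter
{
    // Sorts the lines of inputPath into outputPath, holding at most chunkSize lines in memory
    // during the first phase and one line per chunk file during the merge.
    public static void Sort(string inputPath, string outputPath, int chunkSize, IComparer<string> comparer)
    {
        // Phase 1: cut the input into chunks, sort each chunk in memory, write it to a temp file.
        var chunkFiles = new List<string>();
        var buffer = new List<string>(chunkSize);
        foreach (var line in File.ReadLines(inputPath))
        {
            buffer.Add(line);
            if (buffer.Count == chunkSize)
            {
                chunkFiles.Add(WriteSortedChunk(buffer, comparer));
                buffer.Clear();
            }
        }
        if (buffer.Count > 0)
            chunkFiles.Add(WriteSortedChunk(buffer, comparer));

        // Phase 2: k-way merge; repeatedly emit the smallest current line among all chunks.
        var readers = chunkFiles.Select(f => new StreamReader(f)).ToList();
        try
        {
            var heads = readers.Select(r => r.ReadLine()).ToList();
            using (var writer = new StreamWriter(outputPath))
            {
                while (true)
                {
                    int best = -1;
                    for (int i = 0; i < heads.Count; i++)
                        if (heads[i] != null && (best < 0 || comparer.Compare(heads[i], heads[best]) < 0))
                            best = i;
                    if (best < 0)
                        break; // every chunk is exhausted
                    writer.WriteLine(heads[best]);
                    heads[best] = readers[best].ReadLine();
                }
            }
        }
        finally
        {
            foreach (var r in readers) r.Dispose();
            foreach (var f in chunkFiles) File.Delete(f);
        }
    }

    private static string WriteSortedChunk(List<string> lines, IComparer<string> comparer)
    {
        lines.Sort(comparer);
        var path = Path.GetTempFileName();
        File.WriteAllLines(path, lines);
        return path;
    }
}
```

The merge finds the smallest head with a linear scan, which costs O(k) per line for k chunks; with many chunks a priority queue would be the usual choice.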
After that, the task becomes pretty trivial: read a parent's data and remember it, read and sum its children's data one by one, compare, repeat.
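Once the file has been sorted so that each parent is immediately followed by its children, that single pass could look like this minimal sketch. It assumes the same line format and one level of nesting, keeps only the current parent and a running sum in memory, and uses long for the sum to reduce the risk of overflow; SortedValidator is an illustrative name.

```csharp
using System;
using System.Collections.Generic;

public static class SortedValidator
{
    // Input: lines "Id|TotalAmount[|Reference]", sorted so each parent is directly followed by its children.
    // Yields (parent Id, whether the children's amounts add up to the parent's amount).
    public static IEnumerable<(int Id, bool Valid)> Validate(IEnumerable<string> sortedLines)
    {
        int? parentId = null;
        long expected = 0, childSum = 0;
        bool hasChildren = false;

        foreach (var line in sortedLines)
        {
            var parts = line.Split('|');
            long amount = long.Parse(parts[1]);

            if (parts.Length == 3)
            {
                // Child line: it must belong to the parent currently being tracked.
                if (parentId != int.Parse(parts[2]))
                    throw new InvalidOperationException("Input is not sorted parent-first: " + line);
                childSum += amount;
                hasChildren = true;
                continue;
            }

            // A new parent line means the previous parent's group is complete.
            if (parentId.HasValue)
                yield return (parentId.Value, !hasChildren || childSum == expected);

            parentId = int.Parse(parts[0]);
            expected = amount;
            childSum = 0;
            hasChildren = false;
        }

        // Report the last group.
        if (parentId.HasValue)
            yield return (parentId.Value, !hasChildren || childSum == expected);
    }
}
```

Like the pseudocode in the question, a parent without children is reported as valid.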
All you really need to keep in memory is the expected total for each non-child entity, and the running sum of the child totals for each parent entity. Everything else you can throw out, and if you use the File.ReadLines API, you can stream over the file and 'forget' each line once you've processed it. Since the lines are read on demand, you don't have to keep the entire file in memory.
using System;
using System.Collections.Generic;
using System.IO;
public class Entry
{
public int Id { get; set; }
public int TotalAmount { get; set; }
public int? Reference { get; set; }
}
public static class EntryValidator
{
public static void Validate(string file)
{
var entries = GetEntriesFromFile(file);
var childAmounts = new Dictionary<int, int>();
var nonChildAmounts = new Dictionary<int, int>();
foreach (var e in entries)
{
if (e.Reference is int p)
childAmounts.AddOrUpdate(p, e.TotalAmount, (_, n) => n + e.TotalAmount);
else
nonChildAmounts[e.Id] = e.TotalAmount;
}
foreach (var id in nonChildAmounts.Keys)
{
var expectedTotal = nonChildAmounts[id];
if (childAmounts.TryGetValue(id, out var childTotal) &&
childTotal != expectedTotal)
{
Console.WriteLine($"Entry with Id {id} is INVALID");
}
else
{
Console.WriteLine($"Entry with Id {id} is valid");
}
}
}
private static IEnumerable<Entry> GetEntriesFromFile(string file)
{
foreach (var line in File.ReadLines(file))
yield return GetEntryFromLine(line);
}
private static Entry GetEntryFromLine(string line)
{
var parts = line.Split('|');
var entry = new Entry
{
Id = int.Parse(parts[0]),
TotalAmount = int.Parse(parts[1])
};
if (parts.Length == 3)
entry.Reference = int.Parse(parts[2]);
return entry;
}
}
This uses a nifty extension method for IDictionary<K, V>:
public static class DictionaryExtensions
{
public static TValue AddOrUpdate<TKey, TValue>(
this IDictionary<TKey, TValue> dictionary,
TKey key,
TValue addValue,
Func<TKey, TValue, TValue> updateCallback)
{
if (dictionary == null)
throw new ArgumentNullException(nameof(dictionary));
if (updateCallback == null)
throw new ArgumentNullException(nameof(updateCallback));
if (dictionary.TryGetValue(key, out var value))
value = updateCallback(key, value);
else
value = addValue;
dictionary[key] = value;
return value;
}
}
I have a model like this (which I cannot change):
public class SomeObject
{
public int Amount { get; set; }
public int TotalAmount { get; set; }
}
I need to iterate over a collection of SomeObject to populate some fields and accumulate others (the real calculations are not this simple).
static void Main(string[] args)
{
List<SomeObject> myCollection = new List<SomeObject>()
{
new SomeObject() { Amount = 3 },
new SomeObject() { Amount = 6 },
new SomeObject() { Amount = 9 }
};
int totalAccumulated = 0;
for (int i = 0; i < myCollection.Count; i++)
{
PopulateAndCalculate(myCollection[i], ref totalAccumulated);
}
// I don't want a second for loop here that iterates over myCollection again just to set each TotalAmount property.
// Is there another way?
Console.WriteLine($"The total accumulated is: {totalAccumulated}");
}
private static void PopulateAndCalculate(SomeObject prmObject, ref int accumulatedTotal)
{
// Populate a lot of other fields
accumulatedTotal += prmObject.Amount;
prmObject.TotalAmount = accumulatedTotal; // This doesn't work (it stores the running total, not the final one), but I need something like it
}
I don't want a second for statement to update TotalAmount property of each item in myCollection.
The main requirement is to iterate over the whole collection as few times as possible. Don't mind the string interpolation, this is just a short demo; the real code must run on .NET 2.0.
Is there a clean/better way?
The solution is actually simple, though it's not exactly a good coding practice.
What you really need is for TotalAmount to be a static property. Without that, there's this:
static void Main(string[] args)
{
List<SomeObject> myCollection = new List<SomeObject>()
{
new SomeObject() { Amount = 3 },
new SomeObject() { Amount = 6 },
new SomeObject() { Amount = 9 }
};
int totalAccumulated = 0;
for (int i = 0; i < myCollection.Count; i++)
{
PopulateAndCalculate(myCollection[i], ref totalAccumulated);
}
/*****This is the new part*******/
myCollection[0].TotalAmount = totalAccumulated;
myCollection[1].TotalAmount = totalAccumulated;
myCollection[2].TotalAmount = totalAccumulated;
Console.WriteLine($"The total accumulated is: {totalAccumulated}");
}
private static void PopulateAndCalculate(SomeObject prmObject, ref int accumulatedTotal)
{
// Populate a lot of other fields
accumulatedTotal += prmObject.Amount;
//no need to mess with the total here as far as the properties are concerned.
}
You can set fields inside a lambda expression. Could you consider this, please?
int total = myCollection.Sum(a => a.Amount); // compute the sum once instead of once per element
myCollection.ForEach(c => c.TotalAmount = total);
Console.WriteLine($"Total accumulated: {myCollection.First().TotalAmount}");
I found a solution using the Observer Pattern.
Firstly I created a global delegate to be used by an event:
public delegate void UpdateTotalAmountDelegate(int totalAmount);
Then a new class called: 'CalculatorSetter'
public class CalculatorSetter
{
public event UpdateTotalAmountDelegate UpdateTotalAmounthHandler;
public void UpdateTotalAmount(int prmTotalAmount)
{
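// Raise the event only if at least one SomeObject has subscribed
UpdateTotalAmountDelegate handler = UpdateTotalAmounthHandler;
if (handler != null)
handler(prmTotalAmount);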
}
}
I refactored the data object 'SomeObject', adding a field of type CalculatorSetter.
public class SomeObject
{
private CalculatorSetter finalCalculator;
public void SetCalculator(CalculatorSetter prmCalculator)
{
this.finalCalculator = prmCalculator;
finalCalculator.UpdateTotalAmounthHandler += FinalCalculator_UpdateTotalAmounthHandler;
}
private void FinalCalculator_UpdateTotalAmounthHandler(int totalAmount)
{
this.TotalAmount = totalAmount;
}
//Some Other Fields
public int Amount { get; set; }
public int TotalAmount { get; set; }
}
And here is my original code, now with a single for loop:
static void Main(string[] args)
{
List<SomeObject> myCollection = new List<SomeObject>()
{
new SomeObject() { Amount = 3 },
new SomeObject() { Amount = 6 },
new SomeObject() { Amount = 9 }
};
CalculatorSetter commonCalculator = new CalculatorSetter();
int totalToAccumulate = 0;
for (int i = 0; i < myCollection.Count; i++)
{
PopulateAndCalculate(myCollection[i], commonCalculator, ref totalToAccumulate);
}
commonCalculator.UpdateTotalAmount(totalToAccumulate);
Console.WriteLine($"The total accumulated is: {totalToAccumulate}");
Console.WriteLine($"The first total accumulated is: {myCollection[0].TotalAmount}");
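}
// Assumed shape of the three-argument PopulateAndCalculate (not shown in the original answer):
// each item subscribes to the shared calculator, then adds its Amount to the running total.
private static void PopulateAndCalculate(SomeObject prmObject, CalculatorSetter prmCalculator, ref int accumulatedTotal)
{
prmObject.SetCalculator(prmCalculator);
accumulatedTotal += prmObject.Amount;
}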
Many thanks.
Use a wrapper and keep it simple. (If you want, you can change it a little to use static methods or a static class, but I don't see the point.)
The result is:
The Amount is 3, The total amount is 18
The Amount is 6, The total amount is 18
The Amount is 9, The total amount is 18
using System;
using System.Collections.Generic;
namespace Prueba1
{
    class Program
    {
        // Reference type holding the shared total, so every SomeObject sees the same value
        public class WrapperInt
        {
            public int Value { get; set; }
        }
        public class SomeObject
        {
            public int Amount { get; set; }
            public WrapperInt TotalAmount { get; set; }
        }
        public Program()
        {
            WrapperInt TotalAmountAllArrays = new WrapperInt();
            List<SomeObject> myCollection = new List<SomeObject>()
            {
                new SomeObject() { Amount = 3, TotalAmount = TotalAmountAllArrays },
                new SomeObject() { Amount = 6, TotalAmount = TotalAmountAllArrays },
                new SomeObject() { Amount = 9, TotalAmount = TotalAmountAllArrays }
            };
            // One pass: every item updates the same shared WrapperInt instance
            for (int i = 0; i < myCollection.Count; i++)
            {
                myCollection[i].TotalAmount.Value += myCollection[i].Amount;
            }
            foreach (var c in myCollection)
            {
                Console.WriteLine("The Amount is " + c.Amount + ", The total amount is " + c.TotalAmount.Value);
            }
        }
        static void Main(string[] args)
        {
            new Program();
        }
    }
}
Hopefully this will work for you. One possible solution is to create a wrapper class called MyTotalList, which contains a List named amounts and an int named total. MyTotalList does not expose its amounts list as an editable list. If it did, other code could change the list without MyTotalList being aware of it, and the class would end up with an incorrect total. For the class to work as expected, callers must use MyTotalList's Add and Remove methods. To ensure this, the Items property returns a read-only view of the private amounts list, so the list cannot be changed from outside the MyTotalList class.
My solution is a class that wraps a List. MyTotalList has a parameterless constructor. Once you have created a MyTotalList instance, you can use it to add MyObject items to its list. Every time an item is added, the total field is updated with the added item's amount. Example:
Create a new MyTotalList object:
MyTotalList listOfObjects = new MyTotalList();
Then add some MyObject instances to the listOfObjects
listOfObjects.Add(new MyObject(1,3));
listOfObjects.Add(new MyObject(2,6));
listOfObjects.Add(new MyObject(3,9));
After you add the items, you can then use the listOfObjects Total property to get the total sum of all MyObject items in the list with:
listOfObjects.Total
If you need to pass or use the List of MyTotalList items you can use:
listOfObjects.Items
Bear in mind that, as discussed above, Items is a read-only collection, so you cannot add or remove items through it as you would with an editable list. The code below will not compile, because IReadOnlyCollection<T> does not expose these methods:
listOfObjects.Items.Remove(new MyObject(4, 10));
listOfObjects.Items.Add(new MyObject(4, 10));
The above lines will cause the compiler to complain: xxx… does not contain a definition for Add/Remove. This ensures callers use MyTotalList.Add and MyTotalList.Remove, which eliminates any possibility of the list changing outside the MyTotalList class.
MyObject Class
using System;
using System.Collections.Generic;
class MyObject : IComparable {
public int id { get; }
public int amount { get; }
public MyObject(int inID, int inAmount) {
id = inID;
amount = inAmount;
}
public override string ToString() {
return amount.ToString();
}
public override int GetHashCode() {
return id.GetHashCode();
}
public override bool Equals(object other) {
MyObject o = other as MyObject; // null for null or for any other type, instead of throwing
return o != null && this.id == o.id;
}
public int CompareTo(object other) {
if (this.id > ((MyObject)other).id)
return 1;
if (this.id < ((MyObject)other).id)
return -1;
return 0;
}
}
MyTotalList Class
class MyTotalList {
private int total;
private List<MyObject> amounts;
public MyTotalList() {
total = 0;
amounts = new List<MyObject>();
}
public int ListCount {
get { return amounts.Count; }
}
public IReadOnlyCollection<MyObject> Items {
get { return amounts.AsReadOnly(); }
}
public int Total {
get { return total; }
}
public void Add(MyObject other) {
if (other != null) {
if (!(amounts.Contains(other))) {
total += other.amount;
amounts.Add(other);
}
else {
Console.WriteLine("Duplicate id's not allowed!");
}
}
}
public void Remove(MyObject other) {
if (amounts.Contains(other)) {
total -= amounts[amounts.IndexOf(other)].amount;
amounts.Remove(other);
}
else {
Console.WriteLine("Item to remove not found!");
}
}
}
Examples
MyTotalList listOfObjects = new MyTotalList();
listOfObjects.Add(new MyObject(1,3));
listOfObjects.Add(new MyObject(2,6));
listOfObjects.Add(new MyObject(3,9));
Console.WriteLine("----------------------------------------");
Console.WriteLine("Initial list with total");
Console.WriteLine("items in list:");
foreach (MyObject mo in listOfObjects.Items)
Console.Write(mo.ToString() + " ");
Console.WriteLine();
Console.WriteLine("Total from list of " + listOfObjects.ListCount +
" items is: " + listOfObjects.Total);
Console.WriteLine("----------------------------------------");
Console.WriteLine("Add three more items");
listOfObjects.Add(new MyObject(4, 10));
listOfObjects.Add(new MyObject(5, 11));
listOfObjects.Add(new MyObject(6, 12));
Console.WriteLine("items in list:");
foreach (MyObject mo in listOfObjects.Items)
Console.Write(mo.ToString() + " ");
Console.WriteLine();
Console.WriteLine("Total from list of " + listOfObjects.ListCount +
" items is: " + listOfObjects.Total);
Console.WriteLine("----------------------------------------");
Console.WriteLine("Remove id 4 (10) from the list");
listOfObjects.Remove(new MyObject(4, 10));
Console.WriteLine("items in list:");
foreach (MyObject mo in listOfObjects.Items)
Console.Write(mo.ToString() + " ");
Console.WriteLine();
Console.WriteLine("Total from list of " + listOfObjects.ListCount +
" items is: " + listOfObjects.Total);
A side note on your original post, about the class you cannot change:
public class SomeObject {
public int Amount { get; set; }
public int TotalAmount { get; set; }
}
Regardless of how you get the total for the int variable TotalAmount, having every instance of the SomeObject class hold the same value, and having to ensure this stays true for all existing SomeObject instances, is, well, a poor design. It creates redundant data, wastes space, and makes no sense, because the value has absolutely nothing to do with that particular SomeObject instance. This class design is counterintuitive. As @Tim Schmelter’s comment points out, "a single object should not know anything about the total amount of other objects." This kind of redundant data is something a programmer should try to avoid, not promote.
I have the following class:
class EUInput
{
public EUInput()
{
RtID = 0;
}
public int RtID { get; set; }
}
I want to store instances of this class with different RtID values in a list. I tried the following:
static void Main(string[] args)
{
EUInput clsEUInput = new EUInput();
List<EUInput> list = new List<EUInput>();
for (int i = 0; i < 5; i++)
{
clsEUInput.RtID = i;
list.Add(clsEUInput);
}
foreach (EUInput obj in list)
{
Console.WriteLine(obj.RtID.ToString());
}
Console.ReadLine();
}
I am getting this output:
4
4
4
4
4
But I need this output:
0
1
2
3
4
You need to move the declaration of clsEUInput inside the for loop. Right now, there is only one EUInput object and you're adding the same object to the list multiple times.
List<EUInput> list = new List<EUInput>();
for (int i = 0; i < 5; i++)
{
EUInput clsEUInput = new EUInput();
clsEUInput.RtID = i;
list.Add(clsEUInput);
}
Change EUInput to be a struct (and keep your Main method as it is):
public struct EUInput
{
public int RtID;
}
A struct is a value type (a class is a reference type), so when you add it to a list, you add a "copy" of the whole structure, not just a reference. So when you keep changing the RtID in the loop, you still change the one local variable you created, but the copies already in the list won't be affected.
Either your boss is playing a trick on you, i.e. wants to test your knowledge about value types and reference types, or he doesn't know the difference between them himself...
You need new instances of the class,
or the whole list will hold references to the one instance:
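A minimal sketch contrasting the two behaviors: the same local variable is mutated and added five times, once as a struct and once as a class. PointStruct, PointClass and CopySemanticsDemo are illustrative names.

```csharp
using System.Collections.Generic;

public struct PointStruct { public int Value; }
public class PointClass { public int Value; }

public static class CopySemanticsDemo
{
    public static List<int> StructValues()
    {
        var item = new PointStruct();
        var list = new List<PointStruct>();
        for (int i = 0; i < 5; i++)
        {
            item.Value = i;
            list.Add(item); // copies the whole struct into the list
        }
        return list.ConvertAll(p => p.Value);
    }

    public static List<int> ClassValues()
    {
        var item = new PointClass();
        var list = new List<PointClass>();
        for (int i = 0; i < 5; i++)
        {
            item.Value = i;
            list.Add(item); // stores another reference to the same object
        }
        return list.ConvertAll(p => p.Value);
    }
}
```

StructValues returns 0, 1, 2, 3, 4 and ClassValues returns 4, 4, 4, 4, 4. Mutable structs have their own pitfalls (for example, list[0].RtID = 5 does not compile for a List of structs), so creating a new class instance per iteration is often the simpler fix.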
private class EUInput
{
public EUInput()
{
RtID = 0;
}
public int RtID { get; set; }
}
//I want to store this class with different RtID values in a list. I tried as below,
private static void Main(string[] args)
{
List<EUInput> list = new List<EUInput>();
for (int i = 0; i < 5; i++)
{
EUInput clsEUInput = new EUInput();
clsEUInput.RtID = i;
list.Add(clsEUInput);
}
foreach (EUInput obj in list)
{
Console.WriteLine(obj.RtID.ToString());
}
Console.ReadLine();
}
I am holding two lists in my program - one master list and another temporary list which is constantly being updated. Every so often, the temporary list flushes into the master list.
The master list is a HashSet (to prevent duplicates) and the temporary list is a List (for indexing). I flush the latter into the former by calling
HashSet<T>.UnionWith(List<T>)
In my testing, I find that duplicates make their way into the master list, yet I thought this wasn't possible in a HashSet. Can someone please confirm/correct this? I haven't been able to find it on MSDN.
It isn't possible if your type overrides GetHashCode() and Equals() correctly. My guess is that your type hasn't done this properly. (Or your hash set has been created with a custom equality comparer which doesn't do what you want.)
If you believe that's not the case, please post the code :)
But yes, it really will prevent duplicates when used normally.
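If the element type can't be changed, the other route mentioned above is to pass an IEqualityComparer<T> to the HashSet<T> constructor. A minimal sketch with a simple Cat class and a hypothetical CatNumberComparer:

```csharp
using System.Collections.Generic;

public class Cat
{
    public int CatNumber { get; set; }
    public Cat(int catNum) { CatNumber = catNum; }
}

// Hypothetical comparer: two cats are equal when their CatNumber matches.
public sealed class CatNumberComparer : IEqualityComparer<Cat>
{
    public bool Equals(Cat x, Cat y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.CatNumber == y.CatNumber;
    }

    public int GetHashCode(Cat obj) => obj == null ? 0 : obj.CatNumber.GetHashCode();
}

public static class UnionDemo
{
    // Unions two batches of "identical" cats (same numbers, new instances) into one set.
    public static int CountAfterTwoUnions(IEqualityComparer<Cat> comparer)
    {
        var allCats = new HashSet<Cat>(comparer); // a null comparer means the default one
        allCats.UnionWith(new List<Cat> { new Cat(1), new Cat(2) });
        allCats.UnionWith(new List<Cat> { new Cat(1), new Cat(2) });
        return allCats.Count;
    }
}
```

With the comparer the set ends up with 2 cats; with the default (reference) comparer it holds 4. Because the hash depends on CatNumber, changing CatNumber on a cat that is already in the set would leave it in the wrong bucket.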
List (for indexing capability).
You'd want a dictionary for indexing.
On that note though, here's a very simple program that illustrates your problem:
using System;
using System.Collections.Generic;
class Program
{
static void Main(string[] args)
{
int totalCats = 0;
HashSet<Cat> allCats = new HashSet<Cat>();
List<Cat> tempCats = new List<Cat>();
//put 10 cats in
for (int i = 0; i < 10; i++)
{
tempCats.Add(new Cat(i));
totalCats += 1;
}
//add the cats to the final hashset & empty the temp list
allCats.UnionWith(tempCats);
tempCats = new List<Cat>();
//create 10 identical cats
for (int i = 0; i < 10; i++)
{
tempCats.Add(new Cat(i));
totalCats += 1;
}
//join them again
allCats.UnionWith(tempCats);
//print the result
Console.WriteLine("Total cats: " + totalCats);
foreach (Cat curCat in allCats)
{
Console.WriteLine(curCat.CatNumber);
}
}
}
public class Cat
{
public int CatNumber { get; set; }
public Cat(int catNum)
{
CatNumber = catNum;
}
}
Your problem is that you aren't overriding GetHashCode() and Equals(). You need to have both for the hash set to stay unique.
This will work; however, the GetHashCode() implementation for a real type should be more robust. I'd recommend reading up on how .NET does it:
using System;
using System.Collections.Generic;
class Program
{
static void Main(string[] args)
{
int totalCats = 0;
HashSet<Cat> allCats = new HashSet<Cat>();
List<Cat> tempCats = new List<Cat>();
//put 10 cats in
for (int i = 0; i < 10; i++)
{
tempCats.Add(new Cat(i));
totalCats += 1;
}
//add the cats to the final hashset & empty the temp list
allCats.UnionWith(tempCats);
tempCats = new List<Cat>();
//create 10 identical cats
for (int i = 0; i < 10; i++)
{
tempCats.Add(new Cat(i));
totalCats += 1;
}
//join them again
allCats.UnionWith(tempCats);
//print the result
Console.WriteLine("Total cats: " + totalCats);
foreach (Cat curCat in allCats)
{
Console.WriteLine(curCat.CatNumber);
}
Console.ReadKey();
}
}
public class Cat
{
public int CatNumber { get; set; }
public Cat(int catNum)
{
CatNumber = catNum;
}
public override int GetHashCode()
{
return CatNumber;
}
public override bool Equals(object obj)
{
if (obj is Cat)
{
return ((Cat)obj).CatNumber == CatNumber;
}
return false;
}
}
Problem: I have 2 kinds of objects, let's call them Building and Improvement. There are roughly 30 Improvement instances, while there can be 1-1000 Buildings. For each combination of Building and Improvement, I have to perform some heavy calculation and store the result in a Result object.
Both Buildings and Improvements can be represented by an integer ID.
I then need to be able to:
Access the Result for a given Building and Improvement efficiently (EDIT: see comment further down)
Perform aggregations on the Results for all Improvements for a given Building, like .Sum() and .Average()
Perform the same aggregations on the Results for all Buildings for a given Improvement
This will happen on a web-server back-end, so memory may be a concern, but speed is most important.
Thoughts so far:
Use a Dictionary<Tuple<int, int>, Result> with <BuildingID, ImprovementID> as key. This should give me speedy inserts and single lookups, but I am concerned about .Where() and .Sum() performance.
Use a two-dimensional array, with one dimension for BuildingIDs and one for ImprovementIDs, and the Result as value. In addition, build two Dictionary<int, int> that map BuildingIDs and ImprovementIDs to their respective array row/column indexes. This could potentially mean max 1000+ Dictionaries; will this be a problem?
Use a List<Tuple<int, int, Result>>. I think this may be the least efficient, with O(n) inserts, though I could be wrong.
Am I missing an obvious better option here?
EDIT: Turns out it is only the aggregated values (per Building and per Improvement) I am interested in; see my answer.
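The asker's own answer is not included here. As one possible illustration: if only per-Building and per-Improvement aggregates are needed, Sum and Average can be maintained incrementally while the results are computed, so no individual Result values or scans are required. A minimal sketch with hypothetical RunningStats and ResultAggregator types, reducing Result to a double:

```csharp
using System.Collections.Generic;

// Running sum and count for one group of results.
public sealed class RunningStats
{
    public double Sum { get; private set; }
    public int Count { get; private set; }
    public double Average => Count == 0 ? 0.0 : Sum / Count;

    public void Add(double value)
    {
        Sum += value;
        Count++;
    }
}

// Keeps only per-building and per-improvement aggregates, not the individual results.
public sealed class ResultAggregator
{
    private readonly Dictionary<int, RunningStats> byBuilding = new Dictionary<int, RunningStats>();
    private readonly Dictionary<int, RunningStats> byImprovement = new Dictionary<int, RunningStats>();

    // Call once per (building, improvement) pair with the computed result value.
    public void Add(int buildingId, int improvementId, double value)
    {
        GetOrCreate(byBuilding, buildingId).Add(value);
        GetOrCreate(byImprovement, improvementId).Add(value);
    }

    // O(1) lookups; an unknown id yields empty stats (Count == 0) without being stored.
    public RunningStats ForBuilding(int id) => byBuilding.TryGetValue(id, out var s) ? s : new RunningStats();
    public RunningStats ForImprovement(int id) => byImprovement.TryGetValue(id, out var s) ? s : new RunningStats();

    private static RunningStats GetOrCreate(Dictionary<int, RunningStats> map, int id)
    {
        if (!map.TryGetValue(id, out var stats))
        {
            stats = new RunningStats();
            map[id] = stats;
        }
        return stats;
    }
}
```

Memory then grows with the number of buildings plus improvements (about 1030 entries) rather than with their product.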
Generally, a Dictionary is the most efficient choice for lookups: both lookup and modification are O(1) on average when you access it by key. This helps with access, your first point.
For the second and third points you need to walk through all of the items, which is O(n), so there is no way to speed that up. The exception is if you want to walk them in a specific order: then you can use a SortedDictionary, which keeps its keys ordered so an in-order walk is O(n), but you give up some lookup and modification efficiency (O(log n)).
So I would go with the first solution you posted.
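A minimal sketch of that first option, using a C# 7 value tuple (int, int) as the key instead of Tuple<int, int> and reducing Result to a double; PairKeyedResults and the placeholder values are hypothetical. A single lookup is a hash lookup, but an aggregation over one half of the key has to scan every entry.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class PairKeyedResults
{
    // Keyed by (BuildingId, ImprovementId).
    public static Dictionary<(int Building, int Improvement), double> Build(int buildings, int improvements)
    {
        var results = new Dictionary<(int, int), double>();
        for (int b = 0; b < buildings; b++)
            for (int i = 0; i < improvements; i++)
                results[(b, i)] = b * 100 + i; // placeholder for the "heavy calculation"
        return results;
    }

    // O(n) over every entry: the hash on the full key cannot narrow a scan by one component.
    public static double SumForImprovement(Dictionary<(int Building, int Improvement), double> results, int improvementId) =>
        results.Where(kv => kv.Key.Improvement == improvementId).Sum(kv => kv.Value);
}
```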
You could use a "dictionary of dictionaries" to hold the Result data, for example:
// Building ID ↓ ↓ Improvement ID
var data = new Dictionary<int, Dictionary<int, Result>>();
This would let you quickly find the improvements for a particular building.
However, finding the buildings that contain a particular improvement would require iterating over all the buildings. Here's some sample code:
using System;
using System.Linq;
using System.Collections.Generic;
namespace Demo
{
sealed class Result
{
public double Data;
}
sealed class Building
{
public int Id;
public int Value;
}
sealed class Improvement
{
public int Id;
public int Value;
}
class Program
{
void run()
{
// Building ID ↓ ↓ Improvement ID
var data = new Dictionary<int, Dictionary<int, Result>>();
for (int buildingKey = 1000; buildingKey < 2000; ++buildingKey)
{
var improvements = new Dictionary<int, Result>();
for (int improvementKey = 5000; improvementKey < 5030; ++improvementKey)
improvements.Add(improvementKey, new Result{ Data = buildingKey + improvementKey/1000.0 });
data.Add(buildingKey, improvements);
}
// Aggregate data for all improvements for building with ID == 1500:
int buildingId = 1500;
var sum = data[buildingId].Sum(result => result.Value.Data);
Console.WriteLine(sum);
// Aggregate data for all buildings with a given improvement.
int improvementId = 5010;
sum = data.Sum(improvements =>
{
Result result;
return improvements.Value.TryGetValue(improvementId, out result) ? result.Data : 0.0;
});
Console.WriteLine(sum);
}
static void Main()
{
new Program().run();
}
}
}
To speed up the second aggregation (for summing data for all improvements with a given ID) we can use a second dictionary:
// Improvement ID ↓ ↓ Building ID
var byImprovementId = new Dictionary<int, Dictionary<int, Result>>();
You would have an extra dictionary to maintain, but it's not too complicated. Having several nested dictionaries like this might use too much memory, but it's worth considering.
As noted in the comments below, it would be better to define types for the IDs and also for the dictionaries themselves. Putting that together gives:
using System;
using System.Linq;
using System.Collections.Generic;
namespace Demo
{
sealed class Result
{
public double Data;
}
sealed class BuildingId
{
public BuildingId(int id)
{
Id = id;
}
public readonly int Id;
public override int GetHashCode()
{
return Id.GetHashCode();
}
public override bool Equals(object obj)
{
var other = obj as BuildingId;
if (other == null)
return false;
return this.Id == other.Id;
}
}
sealed class ImprovementId
{
public ImprovementId(int id)
{
Id = id;
}
public readonly int Id;
public override int GetHashCode()
{
return Id.GetHashCode();
}
public override bool Equals(object obj)
{
var other = obj as ImprovementId;
if (other == null)
return false;
return this.Id == other.Id;
}
}
sealed class Building
{
public BuildingId Id;
public int Value;
}
sealed class Improvement
{
public ImprovementId Id;
public int Value;
}
sealed class BuildingResults : Dictionary<BuildingId, Result>{}
sealed class ImprovementResults: Dictionary<ImprovementId, Result>{}
sealed class BuildingsById: Dictionary<BuildingId, ImprovementResults>{}
sealed class ImprovementsById: Dictionary<ImprovementId, BuildingResults>{}
class Program
{
void run()
{
var byBuildingId = CreateTestBuildingsById(); // Create some test data.
var byImprovementId = CreateImprovementsById(byBuildingId); // Create the alternative lookup dictionaries.
// Aggregate data for all improvements for building with ID == 1500:
BuildingId buildingId = new BuildingId(1500);
var sum = byBuildingId[buildingId].Sum(result => result.Value.Data);
Console.WriteLine(sum);
// Aggregate data for all buildings with a given improvement.
ImprovementId improvementId = new ImprovementId(5010);
sum = byBuildingId.Sum(improvements =>
{
Result result;
return improvements.Value.TryGetValue(improvementId, out result) ? result.Data : 0.0;
});
Console.WriteLine(sum);
// Aggregate data for all buildings with a given improvement using byImprovementId.
// This will be much faster than the above Linq.
sum = byImprovementId[improvementId].Sum(result => result.Value.Data);
Console.WriteLine(sum);
}
static BuildingsById CreateTestBuildingsById()
{
var byBuildingId = new BuildingsById();
for (int buildingKey = 1000; buildingKey < 2000; ++buildingKey)
{
var improvements = new ImprovementResults();
for (int improvementKey = 5000; improvementKey < 5030; ++improvementKey)
{
improvements.Add
(
new ImprovementId(improvementKey),
new Result
{
Data = buildingKey + improvementKey/1000.0
}
);
}
byBuildingId.Add(new BuildingId(buildingKey), improvements);
}
return byBuildingId;
}
static ImprovementsById CreateImprovementsById(BuildingsById byBuildingId)
{
var byImprovementId = new ImprovementsById();
foreach (var improvements in byBuildingId)
{
foreach (var improvement in improvements.Value)
{
if (!byImprovementId.ContainsKey(improvement.Key))
byImprovementId[improvement.Key] = new BuildingResults();
byImprovementId[improvement.Key].Add(improvements.Key, improvement.Value);
}
}
return byImprovementId;
}
static void Main()
{
new Program().run();
}
}
}
Finally, here's a modified version which determines the time it takes to aggregate data for all instances of a building/improvement combination for a particular improvement and compares the results for dictionary of tuples with dictionary of dictionaries.
My results for a RELEASE build run outside any debugger:
Dictionary of dictionaries took 00:00:00.2967741
Dictionary of tuples took 00:00:07.8164672
It's significantly faster to use a dictionary of dictionaries, but this is only of importance if you intend to do many of these aggregations.
using System;
using System.Diagnostics;
using System.Linq;
using System.Collections.Generic;
namespace Demo
{
    sealed class Result
    {
        public double Data;
    }
    sealed class BuildingId
    {
        public BuildingId(int id)
        {
            Id = id;
        }
        public readonly int Id;
        public override int GetHashCode()
        {
            return Id.GetHashCode();
        }
        public override bool Equals(object obj)
        {
            var other = obj as BuildingId;
            if (other == null)
                return false;
            return this.Id == other.Id;
        }
    }
    sealed class ImprovementId
    {
        public ImprovementId(int id)
        {
            Id = id;
        }
        public readonly int Id;
        public override int GetHashCode()
        {
            return Id.GetHashCode();
        }
        public override bool Equals(object obj)
        {
            var other = obj as ImprovementId;
            if (other == null)
                return false;
            return this.Id == other.Id;
        }
    }
    sealed class Building
    {
        public BuildingId Id;
        public int Value;
    }
    sealed class Improvement
    {
        public ImprovementId Id;
        public int Value;
    }
    sealed class BuildingResults : Dictionary<BuildingId, Result>{}
    sealed class ImprovementResults: Dictionary<ImprovementId, Result>{}
    sealed class BuildingsById: Dictionary<BuildingId, ImprovementResults>{}
    sealed class ImprovementsById: Dictionary<ImprovementId, BuildingResults>{}
    class Program
    {
        void run()
        {
            var byBuildingId = CreateTestBuildingsById(); // Create some test data.
            var byImprovementId = CreateImprovementsById(byBuildingId); // Create the alternative lookup dictionaries.
            var testTuples = CreateTestTuples();
            ImprovementId improvementId = new ImprovementId(5010);
            int count = 10000;
            Stopwatch sw = Stopwatch.StartNew();
            for (int i = 0; i < count; ++i)
                byImprovementId[improvementId].Sum(result => result.Value.Data);
            Console.WriteLine("Dictionary of dictionaries took " + sw.Elapsed);
            sw.Restart();
            for (int i = 0; i < count; ++i)
                testTuples.Where(result => result.Key.Item2.Equals(improvementId)).Sum(item => item.Value.Data);
            Console.WriteLine("Dictionary of tuples took " + sw.Elapsed);
        }
        static Dictionary<Tuple<BuildingId, ImprovementId>, Result> CreateTestTuples()
        {
            var result = new Dictionary<Tuple<BuildingId, ImprovementId>, Result>();
            for (int buildingKey = 1000; buildingKey < 2000; ++buildingKey)
                for (int improvementKey = 5000; improvementKey < 5030; ++improvementKey)
                    result.Add(
                        new Tuple<BuildingId, ImprovementId>(new BuildingId(buildingKey), new ImprovementId(improvementKey)),
                        new Result
                        {
                            Data = buildingKey + improvementKey/1000.0
                        });
            return result;
        }
        static BuildingsById CreateTestBuildingsById()
        {
            var byBuildingId = new BuildingsById();
            for (int buildingKey = 1000; buildingKey < 2000; ++buildingKey)
            {
                var improvements = new ImprovementResults();
                for (int improvementKey = 5000; improvementKey < 5030; ++improvementKey)
                {
                    improvements.Add
                    (
                        new ImprovementId(improvementKey),
                        new Result
                        {
                            Data = buildingKey + improvementKey/1000.0
                        }
                    );
                }
                byBuildingId.Add(new BuildingId(buildingKey), improvements);
            }
            return byBuildingId;
        }
        static ImprovementsById CreateImprovementsById(BuildingsById byBuildingId)
        {
            var byImprovementId = new ImprovementsById();
            foreach (var improvements in byBuildingId)
            {
                foreach (var improvement in improvements.Value)
                {
                    if (!byImprovementId.ContainsKey(improvement.Key))
                        byImprovementId[improvement.Key] = new BuildingResults();
                    byImprovementId[improvement.Key].Add(improvements.Key, improvement.Value);
                }
            }
            return byImprovementId;
        }
        static void Main()
        {
            new Program().run();
        }
    }
}
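To make the cost difference concrete without a stopwatch, here is a stripped-down sketch that counts how many entries each layout visits when summing one improvement across all buildings. It uses plain `int` keys instead of the `BuildingId`/`ImprovementId` classes (an assumption purely for brevity), and `LookupCost` and its methods are hypothetical names, not part of the program above.

```csharp
using System;
using System.Collections.Generic;

static class LookupCost
{
    // Builds both layouts with the same test data as the benchmark:
    // buildings 1000..1999, improvements 5000..5029, Data = building + improvement / 1000.0.
    public static void Build(out Dictionary<int, Dictionary<int, double>> byImprovement,
                             out Dictionary<Tuple<int, int>, double> byPair)
    {
        byImprovement = new Dictionary<int, Dictionary<int, double>>();
        byPair = new Dictionary<Tuple<int, int>, double>();
        for (int b = 1000; b < 2000; ++b)
        {
            for (int i = 5000; i < 5030; ++i)
            {
                double data = b + i / 1000.0;
                if (!byImprovement.TryGetValue(i, out var inner))
                {
                    inner = new Dictionary<int, double>();
                    byImprovement[i] = inner;
                }
                inner[b] = data;
                byPair[Tuple.Create(b, i)] = data;
            }
        }
    }

    // Nested layout: one hash lookup, then only this improvement's entries are visited.
    public static double SumNested(Dictionary<int, Dictionary<int, double>> byImprovement,
                                   int improvement, out int visited)
    {
        visited = 0;
        double sum = 0;
        if (byImprovement.TryGetValue(improvement, out var byBuilding))
        {
            foreach (double value in byBuilding.Values)
            {
                visited++;
                sum += value;
            }
        }
        return sum;
    }

    // Tuple layout: with no secondary index, every entry is inspected, matching or not.
    public static double SumTuples(Dictionary<Tuple<int, int>, double> byPair,
                                   int improvement, out int visited)
    {
        visited = 0;
        double sum = 0;
        foreach (var entry in byPair)
        {
            visited++;
            if (entry.Key.Item2 == improvement)
                sum += entry.Value;
        }
        return sum;
    }
}
```

With the benchmark's data, `SumNested` visits 1,000 entries per aggregation while `SumTuples` visits all 30,000 (1,000 buildings × 30 improvements), yet both produce the same total.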
Thanks for the answers; the test code was really informative :)
The solution for me turned out to be to forgo LINQ and aggregate manually, directly after the heavy calculation, since I had to iterate over every combination of Building and Improvement anyway.
Also, I had to use the objects themselves as keys, because the calculations ran before the objects were persisted with Entity Framework, so their IDs were all still 0.
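Keying a dictionary by unsaved entities works as long as the entity classes don't override `Equals`/`GetHashCode`: the defaults inherited from `object` compare references, so two new objects whose ID is still 0 stay distinct keys. If a class does override them to compare by ID, a reference comparer restores identity semantics. A minimal sketch; `IdEqualBuilding` is a hypothetical stand-in that deliberately compares by ID, and `ReferenceComparer<T>` is a hypothetical name (on .NET 5 and later, the built-in `ReferenceEqualityComparer.Instance` serves the same purpose).

```csharp
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Hypothetical entity that, unlike a plain class, treats equal IDs as equal objects.
public class IdEqualBuilding
{
    public int ID { get; set; }
    public override bool Equals(object obj) => obj is IdEqualBuilding other && other.ID == ID;
    public override int GetHashCode() => ID.GetHashCode();
}

// Compares keys by reference, ignoring any Equals/GetHashCode overrides on T.
public sealed class ReferenceComparer<T> : IEqualityComparer<T> where T : class
{
    public bool Equals(T x, T y) => ReferenceEquals(x, y);

    // RuntimeHelpers.GetHashCode returns the identity hash, as object.GetHashCode would.
    public int GetHashCode(T obj) => RuntimeHelpers.GetHashCode(obj);
}
```

With the default comparer, two fresh `IdEqualBuilding` instances (both ID 0) collapse into a single dictionary entry; passing `new ReferenceComparer<IdEqualBuilding>()` to the `Dictionary` constructor keeps them apart.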
Code:
using System.Collections.Generic;

public class Building {
    public int ID { get; set; }
    // ...
}
public class Improvement {
    public int ID { get; set; }
    // ...
}
public class Result {
    public decimal Foo { get; set; }
    public long Bar { get; set; }
    // ...
    public void Add(Result result) {
        Foo += result.Foo;
        Bar += result.Bar;
        // ...
    }
}
public class Calculator {
    public Dictionary<Building, Result> ResultsByBuilding;
    public Dictionary<Improvement, Result> ResultsByImprovement;
    public void CalculateAndAggregate(IEnumerable<Building> buildings, IEnumerable<Improvement> improvements) {
        ResultsByBuilding = new Dictionary<Building, Result>();
        ResultsByImprovement = new Dictionary<Improvement, Result>();
        foreach (var building in buildings) {
            foreach (var improvement in improvements) {
                Result result = DoHeavyCalculation(building, improvement);
                // Give each dictionary its own running total. Storing `result` itself in both
                // dictionaries would make them share one object, so a later Add() on the
                // building total would also change the improvement total (and vice versa).
                Result buildingTotal;
                if (!ResultsByBuilding.TryGetValue(building, out buildingTotal)) {
                    buildingTotal = new Result();
                    ResultsByBuilding[building] = buildingTotal;
                }
                buildingTotal.Add(result);
                Result improvementTotal;
                if (!ResultsByImprovement.TryGetValue(improvement, out improvementTotal)) {
                    improvementTotal = new Result();
                    ResultsByImprovement[improvement] = improvementTotal;
                }
                improvementTotal.Add(result);
            }
        }
    }
}
public static void Main() {
    var calculator = new Calculator();
    IList<Building> buildings = GetBuildingsFromRepository();
    IList<Improvement> improvements = GetImprovementsFromRepository();
    calculator.CalculateAndAggregate(buildings, improvements);
    DoStuffWithResults(calculator);
}
I did it this way because I knew exactly which aggregations I wanted; if I had needed a more dynamic approach, I would probably have gone with something like Matthew Watson's dictionary of dictionaries.
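For the more dynamic variant, one option is to keep the single pass but record each result in both directions of a dictionary of dictionaries, so per-building and per-improvement totals (or any other aggregation) can be computed afterwards. This is a rough sketch, not the approach used above: `ResultIndex` and its members are hypothetical names, the key types are generic, and a plain `double` stands in for the `Result` type.

```csharp
using System.Collections.Generic;
using System.Linq;

public sealed class ResultIndex<TBuilding, TImprovement>
{
    private readonly Dictionary<TBuilding, Dictionary<TImprovement, double>> _byBuilding =
        new Dictionary<TBuilding, Dictionary<TImprovement, double>>();
    private readonly Dictionary<TImprovement, Dictionary<TBuilding, double>> _byImprovement =
        new Dictionary<TImprovement, Dictionary<TBuilding, double>>();

    // Call once per (building, improvement) pair, right after the heavy calculation.
    // Recording the same pair again overwrites the earlier value in both directions.
    public void Record(TBuilding building, TImprovement improvement, double value)
    {
        Inner(_byBuilding, building)[improvement] = value;
        Inner(_byImprovement, improvement)[building] = value;
    }

    // Total over all improvements for one building (0 if the building was never recorded).
    public double SumForBuilding(TBuilding building) =>
        _byBuilding.TryGetValue(building, out var row) ? row.Values.Sum() : 0.0;

    // Total over all buildings for one improvement (0 if the improvement was never recorded).
    public double SumForImprovement(TImprovement improvement) =>
        _byImprovement.TryGetValue(improvement, out var column) ? column.Values.Sum() : 0.0;

    // Returns the inner dictionary for a key, creating it on first use.
    private static Dictionary<TInner, double> Inner<TOuter, TInner>(
        Dictionary<TOuter, Dictionary<TInner, double>> outer, TOuter key)
    {
        if (!outer.TryGetValue(key, out var inner))
        {
            inner = new Dictionary<TInner, double>();
            outer[key] = inner;
        }
        return inner;
    }
}
```

The trade-off is memory: every value is referenced from two inner dictionaries, in exchange for answering either kind of query with one outer lookup instead of a full scan.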