I have a class A that works with hundreds or thousands of classes, each of which has a method that performs some calculation.
Class A has a method that chooses which of those hundreds or thousands of classes to run, and that method of class A runs many times in a short period.
The solution I thought of at first was to keep the instances already created inside class A, to avoid creating and destroying objects every time the event fires and having the garbage collector consume CPU. But, as I say, class A is going to work with hundreds or thousands of classes, and keeping them all loaded is too high a cost in memory (I think).
My question is: can you think of an optimal way to work with hundreds or thousands of classes, some of which will run every second, without having to create and destroy an instance on each execution of the method that works with them?
Edit:
First example: create and store the instances up front, then reuse them. I think this is a memory expense, but it keeps the garbage collector from working too much.
public class ClassA {
    Class1 class1;
    Class2 class2;
    // ... more classes
    Class100 class100;

    public ClassA() {
        class1 = new Class1();
        // ... initializations
        class100 = new Class100();
    }

    public void ChooseClass(int numberClass) {
        switch (numberClass) {
            case 1:
                class1.calculate();
                break;
            case 2:
                class2.run();
                break;
            // ... more cases, one for each class
            case 100:
                class100.method();
                break;
            default:
                break;
        }
    }
}
Second example: create the instance only when it is used. This saves memory, but the garbage collector consumes a lot of CPU.
public class ClassA {
    public void ChooseClass(int numberClass) {
        switch (numberClass) {
            case 1:
                Class1 class1 = new Class1();
                class1.calculate();
                break;
            case 2:
                Class2 class2 = new Class2();
                class2.run();
                break;
            // ... more cases, one for each class
            case 100:
                Class100 class100 = new Class100();
                class100.method();
                break;
            default:
                break;
        }
    }
}
The basic problem you face when you start increasing the number of class instances is that they all need to be tracked during garbage collection. Even if you never free those instances, the garbage collector still needs to track them, and there comes a point when the program spends more time performing garbage collection than actual work. We experienced this kind of performance problem with a binary search tree that ended up containing several million nodes that were originally class instances.
We were able to circumvent this by using a List&lt;T&gt; of structs rather than classes. (The memory of a list is backed by an array, and for structs the garbage collector only needs to track a single reference: the array's.) Instead of references to class instances, we store indices into this list to access a desired struct.
In fact, we also faced the problem (note that newer versions of the .NET Framework do away with this limitation) that the backing array couldn't grow beyond 2 GB, even under 64 bits, so we split storage across several lists (256) and used a 32-bit index where 8 bits act as a list selector and the remaining 24 bits serve as an index into the selected list.
Of course it is convenient to build a class that abstracts away these details, and you need to be aware that when modifying a struct you actually need to copy it to a local variable, modify it, and then replace the original struct with the modified copy; otherwise your changes happen on a temporary copy of the struct and are not reflected in your data collection. There is also a performance impact, which fortunately pays for itself once the collection is large enough, thanks to extremely fast garbage collection cycles.
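To make the pitfall concrete, here is a minimal sketch (Item is a hypothetical struct, not part of our code):

using System.Collections.Generic;

struct Item { public int Value; }

class CopyModifyReplaceDemo
{
    static void Demo()
    {
        var items = new List<Item> { new Item() };
        // items[0].Value = 5;   // compile error: the indexer returns a temporary copy
        Item tmp = items[0];     // 1. copy the struct to a local variable
        tmp.Value = 5;           // 2. modify the local copy
        items[0] = tmp;          // 3. write the modified copy back into the list
    }
}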
Here is some (quite old) code showing these ideas in place. Just by migrating our search tree to this approach, a server went from spending nearly 100% of its CPU time to around 15%.
public class SplitList<T> where T : struct {
    // A virtual list divided into several sublists, removing the 2GB capacity limit
    private List<T>[] _lists;
    private Queue<int> _free = new Queue<int>();
    private int _maxId = 0;

    private const int _hashingBits = 8;
    private const int _listSelector = 32 - _hashingBits;
    private const int _subIndexMask = (1 << _listSelector) - 1;

    public SplitList() {
        int listCount = 1 << _hashingBits;
        _lists = new List<T>[listCount];
        for (int i = 0; i < listCount; i++)
            _lists[i] = new List<T>();
    }

    // Access a struct by index.
    // Remember that the getter returns a local copy of the struct: to make changes,
    // copy the struct to a local variable, modify it, and assign it back through
    // the setter.
    public T this[int idx] {
        get {
            return _lists[idx >> _listSelector][idx & _subIndexMask];
        }
        set {
            _lists[idx >> _listSelector][idx & _subIndexMask] = value;
        }
    }

    // Returns an index to a "new" struct inside the collection.
    public int New() {
        int result;
        T newElement = new T();
        // Are there any free indexes available?
        if (_free.Count > 0) {
            // Yes: reuse a free index and reset the reused struct to default values.
            result = _free.Dequeue();
            this[result] = newElement;
        } else {
            // No: grow the capacity. (Post-increment, so the first index is 0 and
            // logical sub-indices line up with positions in the backing sub-list.)
            result = _maxId++;
            List<T> list = _lists[result >> _listSelector];
            list.Add(newElement);
        }
        return result;
    }

    // Free an index and allow the struct slot to be reused.
    public void Free(int idx) {
        _free.Enqueue(idx);
    }
}
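For clarity, here is a hedged usage sketch of this container, using the TreeNode struct from the tree example that follows:

var nodes = new SplitList<TreeNode>();

int idx = nodes.New();        // allocate a slot and get its packed 32-bit index
TreeNode node = nodes[idx];   // copy the struct out
node.HashValue = 1234;        // modify the local copy
nodes[idx] = node;            // write the modified copy back

nodes.Free(idx);              // release the slot for reuse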
Here is a snippet of how our binary tree implementation ended up looking using this SplitList backing container class:
public class CLookupTree {
    public struct TreeNode {
        public int HashValue;
        public int LeftIdx;
        public int RightIdx;
        public int firstSpotIdx;
    }

    SplitList<TreeNode> _nodes;
    …

    private int RotateLeft(int idx) {
        // Performs a tree rotation to the left. Here you can see how we need
        // to retrieve the struct into a local copy (thisNode), modify it, and
        // push the modifications back to the node storage list.
        // Also note that we are working with indexes rather than references to
        // the nodes.
        TreeNode thisNode = _nodes[idx];
        int result = thisNode.RightIdx;
        TreeNode rightNode = _nodes[result];

        thisNode.RightIdx = rightNode.LeftIdx;
        rightNode.LeftIdx = idx;

        _nodes[idx] = thisNode;
        _nodes[result] = rightNode;
        return result;
    }
}
I'm currently working on a Windows Service that will handle the acquisition of data from multiple measurement instruments connected via USB to my computer. It will send some data to an SQL database, but I am also creating another application so the machine can view the data locally in real time, and I'll be dealing with arrays/lists of over 7,000,000 elements in the worst-case scenarios.
Currently I'm using WCF with NetNamedPipeBinding for inter-process communication, and it works great (I can transfer an array of over 7 million doubles in less than a quarter of a second). So I don't need answers urgently, but I am curious whether there are faster or easier ways to access the data in the service.
I have been thinking of delving into unmanaged memory and having the service return a pointer to the array, or something similar. However, I don't want to bother with that if the gains are minimal. It's just that when passing a class (I bet a struct would have less overhead) the performance tanks, and I am trying to build a good foundation in case I start dealing with more complex data types.
Service-related code
public class testclass
{
    public double dub1 { get; set; }
    public double dub2 { get; set; }
}

private readonly Stopwatch sw = new Stopwatch(); // assumed: timing field used by the methods below

public testclass[] GetList(int n)
{
    sw.Restart();
    testclass[] numbers = new testclass[n];
    for (var i = 0; i < n; i++)
    {
        numbers[i] = new testclass { dub1 = i, dub2 = i };
    }
    numbers[0].dub1 = (double)sw.ElapsedMilliseconds;
    return numbers;
}

public double[] GetDoubles(int n)
{
    sw.Restart();
    double[] numbers = new double[n];
    numbers[0] = (double)sw.ElapsedMilliseconds;
    return numbers;
}
Client-related code
class Program
{
    static void Main(string[] args)
    {
        while (true)
        {
            Console.WriteLine("Size of List");
            var number1 = int.Parse(Console.ReadLine());
            var sw = new Stopwatch();
            var test = new ServiceReference1.CalculatorClient();

            sw.Restart();
            var list = test.GetDoubles(number1).ToList();
            Console.WriteLine("Response Time: " + sw.ElapsedMilliseconds);
            Console.WriteLine("Time to make list: " + list[0]);
        }
    }
}
(Screenshots in the original post show the GetDoubles and GetList performance numbers.)
I'm just guessing here, but I expect serialization may be part of the issue. There are (at least) two steps to dealing with this: one is to use DTOs so that you only pass what is needed, using performant data types; the other is to look at using a different serializer. I have heard of protobuf-net but have never used it.
I know your example above is trivial, but if your test class could create a byte-array DTO, and the client side could accept one as a constructor argument, you might have a pattern that helps.
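To illustrate, here is a rough sketch of what I mean; the method names and byte layout are my own assumptions, not anything from your service. It flattens the two doubles of each testclass into a byte[] DTO with Buffer.BlockCopy and reassembles them on the client:

public static byte[] ToDto(testclass[] items)
{
    var buffer = new byte[items.Length * 16]; // 2 doubles (8 bytes each) per element
    for (int i = 0; i < items.Length; i++)
    {
        Buffer.BlockCopy(BitConverter.GetBytes(items[i].dub1), 0, buffer, i * 16, 8);
        Buffer.BlockCopy(BitConverter.GetBytes(items[i].dub2), 0, buffer, i * 16 + 8, 8);
    }
    return buffer;
}

public static testclass[] FromDto(byte[] buffer)
{
    var items = new testclass[buffer.Length / 16];
    for (int i = 0; i < items.Length; i++)
    {
        items[i] = new testclass
        {
            dub1 = BitConverter.ToDouble(buffer, i * 16),
            dub2 = BitConverter.ToDouble(buffer, i * 16 + 8)
        };
    }
    return items;
}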
Okay, I have these classes and this Main. I'm on VS 2010 Ultimate and .NET 4 Client Profile.
internal class tezt
{
    private int[] _numeros = new int[5];

    public int[] Numeros
    {
        get { return _numeros; }
    }
}

public class tezt2
{
    private int[] _numeros = new int[5];

    public int[] Numeros
    {
        get { return _numeros; }
    }
}

class tezt3
{
    private int[] _numeros = new int[5];

    public int[] Numeros
    {
        get { return _numeros; }
    }
}

internal static class Program
{
    private static void Main()
    {
        var arrNums = new tezt();
        var arrNums2 = new tezt2();
        var arrNums3 = new tezt3();

        Console.WriteLine(arrNums.Numeros[0]);
        arrNums.Numeros[0] = 5;
        Console.WriteLine(arrNums.Numeros[0]);

        Console.WriteLine(arrNums2.Numeros[0]);
        arrNums2.Numeros[0] = 6;
        Console.WriteLine(arrNums2.Numeros[0]);

        Console.WriteLine(arrNums3.Numeros[0]);
        arrNums3.Numeros[0] = 7;
        Console.WriteLine(arrNums3.Numeros[0]);

        Console.ReadKey(true);
    }
}
What's happening with these lines:
arrNums.Numeros[0] = 5;
arrNums2.Numeros[0] = 6;
arrNums3.Numeros[0] = 7;
Isn't it supposed to be that, since the classes these objects are created from don't expose a setter, those assignments should not be allowed?
What can be done to restrict that, so that doing things like arrNums.Numeros[0] = 5; throws an error?
You're not setting the Numeros property; you are modifying an element at an index inside that property. You are only using the getter for that property.
The assignment that isn't allowed is assigning a new value to the object's property itself:
arrNums.Numeros = new int[5]; // will not compile.
You could make the getter return an IEnumerable<int> to protect it:
class tezt3
{
    private int[] _numeros = new int[5];

    public IEnumerable<int> Numeros
    {
        get { return _numeros; }
    }
}
You could also use a ReadOnlyCollection. The pros and cons of using IEnumerable<T> vs ReadOnlyCollection<T> are discussed in depth in this question.
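For completeness, a minimal sketch of the ReadOnlyCollection&lt;T&gt; alternative (tezt4 is just a hypothetical name following your naming scheme):

using System;
using System.Collections.ObjectModel;

class tezt4
{
    private int[] _numeros = new int[5];

    public ReadOnlyCollection<int> Numeros
    {
        get { return Array.AsReadOnly(_numeros); }
    }
}

// new tezt4().Numeros[0] = 5;  // no longer compiles: ReadOnlyCollection<int> has no indexer setter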
May I also advise against using arrays as properties. Here is a quote from Framework Design Guidelines:
Properties that return arrays can be very misleading. Usually it is
necessary to return a copy of an internal array so that the user
cannot change the internal state. This could lead to inefficient code.
In the following example, the Employees property is accessed twice in
every iteration of the loop. That would be 2n + 1 copies for the
following short code sample:
Company microsoft = GetCompanyData("MSFT");
for (int i = 0; i < microsoft.Employees.Length; i++)
{
    if (microsoft.Employees[i].Alias == "kcwalina")
    { ... }
}
Brad Abrams, one of the book's annotators, remarks on this pitfall:
Some of the guidelines in this book were debated and agreed on in the
abstract; others were learned in the school of hard knocks. The
guideline on properties that return arrays is in the school of hard
knocks camp. When we were investigating some performance issues in
version 1.0 of the .NET Framework, we noticed that thousands of arrays
were being created and quickly trashed. It turns out that many places
in the Framework itself ran into this pattern. Needless to say, we
fixed those instances and the guidelines. - Brad Abrams
I was doing other experiments when this strange behaviour caught my eye.
The code is compiled as x64 Release.
If the key is 1, the third run of the List method costs 40% more time than the first two. The output is:
List costs 9312
List costs 9289
Array costs 12730
List costs 11950
If the key is 2, the third run of the Array method costs 30% more time than the first two. The output is:
Array costs 8082
Array costs 8086
List costs 11937
Array costs 12698
You can see the pattern. The complete code is attached below (just compile and run).
(The code presented here is the minimum needed to run the test. The code actually used to get reliable results is more complicated: I wrapped the method and tested it 100+ times after a proper warm-up.)
class ListArrayLoop
{
    readonly int[] myArray;
    readonly List<int> myList;
    readonly int totalSessions;

    public ListArrayLoop(int loopRange, int totalSessions)
    {
        myArray = new int[loopRange];
        for (int i = 0; i < myArray.Length; i++)
        {
            myArray[i] = i;
        }
        myList = myArray.ToList();
        this.totalSessions = totalSessions;
    }

    public void ArraySum()
    {
        var pool = myArray;
        long sum = 0;
        for (int j = 0; j < totalSessions; j++)
        {
            sum += pool.Sum();
        }
    }

    public void ListSum()
    {
        var pool = myList;
        long sum = 0;
        for (int j = 0; j < totalSessions; j++)
        {
            sum += pool.Sum();
        }
    }
}
class Program
{
    static void Main(string[] args)
    {
        Stopwatch sw = new Stopwatch();
        ListArrayLoop test = new ListArrayLoop(10000, 100000);
        string input = Console.ReadLine();
        if (input == "1")
        {
            sw.Start();
            test.ListSum();
            sw.Stop();
            Console.WriteLine("List costs {0}", sw.ElapsedMilliseconds);

            sw.Reset();
            sw.Start();
            test.ListSum();
            sw.Stop();
            Console.WriteLine("List costs {0}", sw.ElapsedMilliseconds);

            sw.Reset();
            sw.Start();
            test.ArraySum();
            sw.Stop();
            Console.WriteLine("Array costs {0}", sw.ElapsedMilliseconds);

            sw.Reset();
            sw.Start();
            test.ListSum();
            sw.Stop();
            Console.WriteLine("List costs {0}", sw.ElapsedMilliseconds);
        }
        else
        {
            sw.Start();
            test.ArraySum();
            sw.Stop();
            Console.WriteLine("Array costs {0}", sw.ElapsedMilliseconds);

            sw.Reset();
            sw.Start();
            test.ArraySum();
            sw.Stop();
            Console.WriteLine("Array costs {0}", sw.ElapsedMilliseconds);

            sw.Reset();
            sw.Start();
            test.ListSum();
            sw.Stop();
            Console.WriteLine("List costs {0}", sw.ElapsedMilliseconds);

            sw.Reset();
            sw.Start();
            test.ArraySum();
            sw.Stop();
            Console.WriteLine("Array costs {0}", sw.ElapsedMilliseconds);
        }
        Console.ReadKey();
    }
}
Contrived problems give you contrived answers.
Optimization should be done after code is written, not before. Write your solution the way that is easiest to understand and maintain. Then, if the program is not fast enough for your use case, use a profiling tool and go back to see where the actual bottleneck is, not where you "think" it is.
Most optimizations people attempt in your situation spend six hours to shave one second off the run time. Most small programs will not be run enough times to offset the cost spent trying to "optimize" them.
That being said, this is a strange edge case. I modified it a bit and am running it through a profiler, but I need to downgrade my VS2010 install so I can get .NET Framework source stepping back.
I ran the larger example through the profiler, and I can find no good reason why it would take longer.
Your issue is your test. When you are benchmarking code, there are a few guiding principles you should always follow:
Processor Affinity: Use only a single processor, usually not #1.
Warmup: Always perform the test a small number of times up front.
Duration: Make sure your test duration is at least 500ms.
Average: Average together multiple runs to remove anomalies.
Cleanup: Force the GC to collect allocated objects between tests.
Cooldown: Allow the process to sleep for a short-period of time.
So using these guidelines and rewriting your tests I get the following results:
Run 1
Enter test number (1|2): 1
ListSum averages 776
ListSum averages 753
ArraySum averages 1102
ListSum averages 753
Press any key to continue . . .
Run 2
Enter test number (1|2): 2
ArraySum averages 1155
ArraySum averages 1102
ListSum averages 753
ArraySum averages 1067
Press any key to continue . . .
So here is the final test code used:
static void Main(string[] args)
{
    // We just need a single thread for this test.
    Process.GetCurrentProcess().ProcessorAffinity = new IntPtr(2);
    System.Threading.Thread.BeginThreadAffinity();

    Console.Write("Enter test number (1|2): ");
    string input = Console.ReadLine();

    // Perform the action just a few times to JIT the code.
    ListArrayLoop warmup = new ListArrayLoop(10, 10);
    Console.WriteLine("Performing warmup...");
    Test(warmup.ListSum);
    Test(warmup.ArraySum);
    Console.WriteLine("Warmup complete...");
    Console.WriteLine();

    ListArrayLoop test = new ListArrayLoop(10000, 10000);
    if (input == "1")
    {
        Test(test.ListSum);
        Test(test.ListSum);
        Test(test.ArraySum);
        Test(test.ListSum);
    }
    else
    {
        Test(test.ArraySum);
        Test(test.ArraySum);
        Test(test.ListSum);
        Test(test.ArraySum);
    }
}

private static void Test(Action test)
{
    long totalElapsed = 0;
    for (int counter = 10; counter > 0; counter--)
    {
        try
        {
            var sw = Stopwatch.StartNew();
            test();
            totalElapsed += sw.ElapsedMilliseconds;
        }
        finally { }

        GC.Collect(0, GCCollectionMode.Forced);
        GC.WaitForPendingFinalizers();

        // Cooldown
        for (int i = 0; i < 100; i++)
            System.Threading.Thread.Sleep(0);
    }
    Console.WriteLine("{0} averages {1}", test.Method.Name, totalElapsed / 10);
}
Note: some people may debate the usefulness of the cooldown; however, everyone agrees that even if it's not helpful, it is not harmful. I find that on some tests it can yield a more reliable result; in the example above, though, I doubt it makes any difference.
Short answer: it is because the CLR has an optimization for dispatching methods called through an interface-typed variable. As long as calls to a particular interface method are made on the same concrete type (which implements that interface), the CLR uses a fast dispatching routine (only 3 instructions) that merely checks the actual type of the instance and, on a match, jumps directly to the precomputed address of the implementing method. But when the same interface method is called on an instance of another type, the CLR switches dispatching to a slower routine (which can dispatch methods for any actual instance type).
Long answer:
Firstly, take a look at how the method System.Linq.Enumerable.Sum() is declared (I omitted validity checking of the source parameter because it's not important in this case):
public static int Sum(this IEnumerable<int> source)
{
    int num = 0;
    foreach (int num2 in source)
        num += num2;
    return num;
}
So all types that implement IEnumerable&lt;int&gt; can call this extension method, including int[] and List&lt;int&gt;. The foreach keyword is just an abbreviation for getting an enumerator via IEnumerable&lt;T&gt;.GetEnumerator() and iterating through all values. So this method actually does the following:
public static int Sum(this IEnumerable<int> source)
{
    int num = 0;
    IEnumerator<int> enumerator = source.GetEnumerator();
    while (enumerator.MoveNext())
        num += enumerator.Current;
    return num;
}
Now you can clearly see that the method body contains three member calls on interface-typed variables: GetEnumerator(), MoveNext(), and Current (although Current is actually a property, not a method; reading a value from a property just calls the corresponding getter method).
GetEnumerator() typically creates a new instance of some auxiliary class, which implements IEnumerator&lt;T&gt; and is thus able to return all values one by one. It is important to note that, in the case of int[] and List&lt;int&gt;, the types of the enumerators returned by GetEnumerator() for these two classes are different. If the source argument is of type int[], GetEnumerator() returns an instance of type SZGenericArrayEnumerator&lt;int&gt;, and if source is of type List&lt;int&gt;, it returns an instance of type List&lt;int&gt;+Enumerator&lt;int&gt;.
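You can verify this claim with a small sketch (my own example, not from the original code): printing the runtime types of the two enumerators shows that they differ.

int[] array = { 1, 2, 3 };
List<int> list = array.ToList();
IEnumerable<int> a = array;
IEnumerable<int> b = list;
Console.WriteLine(a.GetEnumerator().GetType()); // SZGenericArrayEnumerator`1[System.Int32]
Console.WriteLine(b.GetEnumerator().GetType()); // List`1+Enumerator[System.Int32]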
The two other members (MoveNext() and Current) are called repeatedly in a tight loop, so their speed is crucial for overall performance. Unfortunately, calling a method through an interface-typed variable (such as IEnumerator&lt;int&gt;) is not as straightforward as an ordinary instance method call. The CLR must dynamically find out the actual type of the object in the variable and then find out which of that object's methods implements the corresponding interface method.
The CLR tries to avoid doing this time-consuming lookup on every call with a little trick. When a particular method (such as MoveNext()) is called for the first time, the CLR finds the actual type of the instance on which the call is made (for example, SZGenericArrayEnumerator&lt;int&gt; if you called Sum on an int[]) and finds the address of the method that implements the corresponding interface method for this type (that is, the address of SZGenericArrayEnumerator&lt;int&gt;.MoveNext()). It then uses this information to generate an auxiliary dispatching stub, which simply checks whether the actual instance type is the same as on the first call (that is, SZGenericArrayEnumerator&lt;int&gt;) and, if it is, jumps directly to the method address found earlier. So on subsequent calls, no complicated method lookup is made as long as the type of the instance remains the same. But when a call is made on an enumerator of a different type (such as List&lt;int&gt;+Enumerator&lt;int&gt; when calculating the sum of a List&lt;int&gt;), the CLR no longer uses this fast dispatching stub. Instead, another, general-purpose and much slower, dispatching routine is used.
So, as long as Sum() is called only on arrays, the CLR dispatches the calls to GetEnumerator(), MoveNext(), and Current using the fast routine. When Sum() is called on a list too, the CLR switches to the slower dispatching routine, and performance decreases.
If performance is your concern, implement your own separate Sum() extension method for every type on which you want to call Sum(). This ensures that the CLR will use the fast dispatching routine. For example:
public static class FasterSumExtensions
{
    public static int Sum(this int[] source)
    {
        int num = 0;
        foreach (int num2 in source)
            num += num2;
        return num;
    }

    public static int Sum(this List<int> source)
    {
        int num = 0;
        foreach (int num2 in source)
            num += num2;
        return num;
    }
}
Or, even better, avoid using the IEnumerable&lt;T&gt; interface at all (it still brings noticeable overhead). For example:
public static class EvenFasterSumExtensions
{
    public static int Sum(this int[] source)
    {
        int num = 0;
        for (int i = 0; i < source.Length; i++)
            num += source[i];
        return num;
    }

    public static int Sum(this List<int> source)
    {
        int num = 0;
        for (int i = 0; i < source.Count; i++)
            num += source[i];
        return num;
    }
}
Here are results from my computer:
Your original program: 9844, 9841, 12545, 14384
FasterSumExtensions: 6149, 6445, 754, 6145
EvenFasterSumExtensions: 1557, 1561, 553, 1574
Way too much for a comment, so it's CW; feel free to incorporate it and I'll delete this. The given code looks a little off to me, but the problem is still interesting: if you mix the calls, you get poorer performance. This code highlights it:
static void Main(string[] args)
{
    var input = Console.ReadLine();
    var test = new ListArrayLoop(10000, 1000);
    switch (input)
    {
        case "1":
            Test(test.ListSum);
            break;
        case "2":
            Test(test.ArraySum);
            break;
        case "3":
            // adds about 40 ms
            test.ArraySum();
            Test(test.ListSum);
            break;
        default:
            // adds about 35 ms
            test.ListSum();
            Test(test.ArraySum);
            break;
    }
}

private static void Test(Action toTest)
{
    for (int i = 0; i < 100; i++)
    {
        var sw = Stopwatch.StartNew();
        toTest();
        sw.Stop();
        Console.WriteLine("costs {0}", sw.ElapsedMilliseconds);
        sw.Reset();
    }
}
Lists are implemented in .NET on top of arrays, so the average performance should be the same (since you do not change the length of either).
It looks like you have averaged the Sum() times sufficiently; this could be a GC issue with an enumerator used in the Sum() method.
Hm, it really looks strange...
My guess: you are calling Sum() on the pool variable, declared with var. As long as you are working with only one type (either list or array), the call to Sum() is unambiguous and can be optimized. Once a new class comes into play, the call is ambiguous and must be resolved, so further calls will cause a performance hit.
I do not have a compiler at hand, so try loading another class that supports Sum() and compare the times. If I am right, I would expect another performance hit, but this time not so large.
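For instance, something along these lines (a sketch of the experiment I am suggesting; LinkedList&lt;int&gt; is just one convenient third type implementing IEnumerable&lt;int&gt;):

var array = Enumerable.Range(0, 10000).ToArray();
var linked = new LinkedList<int>(array); // a third IEnumerable<int> implementation

long total = 0;
var sw = Stopwatch.StartNew();
for (int i = 0; i < 1000; i++)
    total += linked.Sum();
Console.WriteLine("LinkedList Sum: {0} ms", sw.ElapsedMilliseconds);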
In my opinion, that's caching (because of read-ahead).
The first time you access an array, many of its elements get into the cache at once (read-ahead). This prefetching mechanism expects the program to access memory near the address just requested.
Further calls already benefit from this (provided the array fits into the cache). When you change the method, the cache is invalidated and you need to fetch everything from memory again.
So calling: list, array, list, array, list, array
should be slower than: list, list, list, array, array, array.
But this is not deterministic from the programmer's point of view, as you don't know the state of the cache or of the other units affecting caching decisions.
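Here is a rough sketch of that ordering experiment, reusing the ListArrayLoop class from the question (my own framing, so treat it as an assumption): if the cache explanation holds, the interleaved sequence should come out slower than the grouped one.

var t = new ListArrayLoop(10000, 1000);

var sw = Stopwatch.StartNew();
t.ListSum(); t.ArraySum(); t.ListSum(); t.ArraySum(); t.ListSum(); t.ArraySum();
Console.WriteLine("interleaved: {0} ms", sw.ElapsedMilliseconds);

sw.Restart();
t.ListSum(); t.ListSum(); t.ListSum(); t.ArraySum(); t.ArraySum(); t.ArraySum();
Console.WriteLine("grouped:     {0} ms", sw.ElapsedMilliseconds);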
Everyone, recently I was debugging a program to improve its performance. I noticed an interesting thing about assignment performance. The code below is my test code.
CODE A
public class Word { .... }

public class Chunk
{
    private Word[] _items;
    private int _size;

    public Chunk()
    {
        _items = new Word[3];
    }

    public void Add(Word word)
    {
        _items[_size++] = word;
    }
}
main
Chunk chunk = new Chunk();
for (int i = 0; i < 3; i++)
{
    chunk.Add(new Word() { });
}
CODE B
public class Chunk
{
    private Word[] _items;
    private int _size;

    public Chunk()
    {
        _items = new Word[3];
    }

    public Word[] Words
    {
        get { return _items; }
    }

    public int Size
    {
        get { return _size; }
        set { _size = value; }
    }
}
main
Chunk chunk = new Chunk();
for (int i = 0; i < 3; i++)
{
    chunk.Words[i] = new Word() { };
    chunk.Size += 1;
}
In my test with Visual Studio's profiling tool, calling the main method 32,000 times, the results show that CODE B is faster than CODE A. Why is CODE B faster than CODE A? Can anyone offer a suggestion?
Thanks.
Update: sorry, I forgot the code that increases _size in CODE B; I have updated CODE B.
Update: #Shiv Kuma Yes, Code A is similar to Code B at around 30,000 calls. I tested with a 700K file, and the code gets called 29,000 times or so.
Meanwhile, Code B is 100 milliseconds faster than Code A, and Code B really is better during the actual segmenting.
One more thing I'm wondering: why is Code B faster than Code A even for the same assignment?
Anyway, thanks for your reply.
Three reasons I can think of:
Chunk.Add() is a method call, and a method call is always expensive compared to the same code running inline.
There are two increments in the first code sample (_size++ and i++).
The chunk.Words array might be cached locally in the second example, so there is no need to evaluate the chunk's _items field (as in the first example) every time an element is added; see the sketch below.
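A hedged sketch of that third point: hoisting the Words array into a local variable so the property getter is evaluated once, rather than once per iteration.

Chunk chunk = new Chunk();
Word[] words = chunk.Words;   // evaluate the property once
for (int i = 0; i < 3; i++)
{
    words[i] = new Word();
    chunk.Size += 1;
}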
In CODE A you are incrementing twice. Once in your for loop:
for (int i = 0; i < 3; i++)
And once in your method:
_items[_size++] = word;
In CODE B you are only incrementing once in the for loop.
It isn't much but it would definitely cause the performance difference.
Yes, the method call would also add a small amount of overhead.
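If you want to isolate the double-increment effect, one possibility (a hypothetical variant, not from the original code) is an Add overload that reuses the caller's loop counter instead of maintaining _size with a second increment:

// Inside Chunk:
public void Add(Word word, int index)
{
    _items[index] = word;
    _size = index + 1;   // plain assignment; no second increment per iteration
}

// Caller:
Chunk chunk = new Chunk();
for (int i = 0; i < 3; i++)
{
    chunk.Add(new Word(), i);
}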