"Immutable" Byte Array or Object Locks? - c#

I'm developing a multithreaded Modbus server, and I will need to manage a block of bytes for the client(s) to read. Each Modbus device will have a thread to update its respective portion of the byte array, so I will need to implement some form of locking.
When the application is initialized, the number of devices and the number of bytes of memory allocated to each device will be constant. I did some research, and it seems that it's safe to lock the rows of a two-dimensional (jagged) array with an array of ReaderWriterLockSlim objects used as locks:
private static ReaderWriterLockSlim[] memLock;
private static byte[][] memoryBlock;
...
// Modify or read the row of memoryBlock corresponding to deviceIndex as needed using:
memLock[deviceIndex].EnterWriteLock();
try
{
    // Write to the device's array of bytes in memoryBlock
    memoryBlock[deviceIndex][3] = 0x05;
}
finally
{
    memLock[deviceIndex].ExitWriteLock();
}

memLock[deviceIndex].EnterReadLock();
try
{
    // Read the device's array of bytes from memoryBlock
}
finally
{
    memLock[deviceIndex].ExitReadLock();
}
I have not written a lot of multithreaded applications, and I only recently "discovered" the concept of immutability. Here is my stab at turning the above into an immutable class, assuming a fixed list of devices and a memory size that never changes after the memory is initialized. The reason the class is static is that there will only be one instance of it in my application:
private static class DeviceMemory
{
    private static bool initialized_ = false;
    private static int memSizeInBytes_;
    private static List<byte[]> registers_;

    public static void Initialize(int deviceCount, int memSizeInBytes)
    {
        if (initialized_) throw new Exception("DeviceMemory already initialized");
        if (memSizeInBytes <= 0) throw new Exception("Invalid memory size in bytes");
        memSizeInBytes_ = memSizeInBytes;
        registers_ = new List<byte[]>();
        for (int i = 0; i < deviceCount; ++i)
        {
            byte[] scannerRegs = new byte[memSizeInBytes];
            registers_.Add(scannerRegs);
        }
        initialized_ = true;
    }

    public static byte[] GetBytes(int deviceIndex)
    {
        if (initialized_) return registers_[deviceIndex];
        else return null;
    }

    public static void UpdateBytes(int deviceIndex, byte[] memRegisters)
    {
        if (!initialized_) throw new Exception("Memory has not been initialized");
        if (memRegisters.Length != memSizeInBytes_)
            throw new Exception("Memory register size does not match the defined memory size in bytes: " + memSizeInBytes_);
        registers_[deviceIndex] = memRegisters;
    }
}
Here are my questions about the above:
Am I correct that I can lock a row of the 2 dimensional array as shown above?
Have I properly implemented an immutable class? I.e., do you see any issues with the DeviceMemory class that would prevent me from writing via the UpdateBytes method from the device thread while reading simultaneously from multiple clients on different threads?
Is this immutable class a wise choice over the more traditional multi-dimensional byte array/lock? Specifically, I'm concerned about memory usage/garbage collection since updates to the byte arrays will actually be "new" byte arrays that replace the reference to the old array. The reference to the old array should be released immediately after the client reads it, however. There will be about 35 devices updating every second and 4 clients per device reading at approximately 1 second intervals.
Would one solution or the other perform better, especially if the server were to scale out?
Thank you for reading!

First, what you've shown in your second code example is not an "immutable class", but rather a "static class". Two very different things; you'll want to review your terminology to ensure that you are communicating effectively, as well as not becoming confused when researching techniques related to either.
As for your questions:
1. Am I correct that I can lock a row of the 2 dimensional array as shown above?
ReaderWriterLockSlim (which you're using) is the right choice; prefer it over the older ReaderWriterLock. Note, though, that you haven't shown any handling for a failed or timed-out acquisition, should you ever switch to the TryEnter… variants. But otherwise, yes… you can use an individual lock for each byte[] element of the byte[][] object.
More generally, you can use a lock to represent whatever unit of data or other resource you want. The lock object doesn't care.
2. Have I properly implemented an immutable class? I.e., do you see any issues with the DeviceMemory class that would prevent me from writing via the UpdateBytes method from the device thread while reading simultaneously from multiple clients on different threads?
If you really mean "immutable", then no. If you really mean "static", then yes, but it's not clear to me that knowing that is useful. The class isn't thread-safe, which seems to be your greater concern, so in that respect you haven't done it right.
As far as the UpdateBytes() method itself goes, there's nothing wrong with it per se. And indeed, since copying a reference from one variable to another is an atomic operation in .NET, the UpdateBytes() method can "safely" update the array element at the same time some other thread is trying to retrieve it. "Safely" in the sense that a reader won't get corrupted data.
But there's nothing in your class that ensures that Initialize() is called only once. In addition, without synchronization (a lock or marking variables volatile) you have no guarantees that values written in one thread will ever be observed by another thread. That includes all of the fields, as well as the individual byte[] array elements.
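For illustration, here is one minimal way to close both gaps; this is a sketch, not the only option. It guards Initialize with a lock and publishes each device's array with Volatile.Write / Volatile.Read. It assumes registers_ is changed to a byte[][] rather than a List<byte[]>, because Volatile needs ref access to the storage location, which a List indexer can't provide:
using System;
using System.Threading;

static class DeviceMemory
{
    private static readonly object initLock = new object();
    private static bool initialized_;
    private static int memSizeInBytes_;
    private static byte[][] registers_;   // array, not List, so slots can be passed by ref

    // Assumes Initialize completes before any device/client threads start.
    public static void Initialize(int deviceCount, int memSizeInBytes)
    {
        lock (initLock)   // serializes racing initializers and publishes the writes below
        {
            if (initialized_) throw new InvalidOperationException("DeviceMemory already initialized");
            memSizeInBytes_ = memSizeInBytes;
            registers_ = new byte[deviceCount][];
            for (int i = 0; i < deviceCount; ++i)
                registers_[i] = new byte[memSizeInBytes];
            initialized_ = true;
        }
    }

    // Readers are guaranteed to observe the most recently published array.
    public static byte[] GetBytes(int deviceIndex)
    {
        return Volatile.Read(ref registers_[deviceIndex]);
    }

    // Writers swap in a fresh array; readers still holding the old one are unaffected.
    public static void UpdateBytes(int deviceIndex, byte[] newRegisters)
    {
        Volatile.Write(ref registers_[deviceIndex], newRegisters);
    }
}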
3. Is this immutable class a wise choice over the more traditional multi-dimensional byte array/lock? Specifically, I'm concerned about memory usage/garbage collection since updates to the byte arrays will actually be "new" byte arrays that replace the reference to the old array. The reference to the old array should be released immediately after the client reads it, however. There will be about 35 devices updating every second and 4 clients per device reading at approximately 1 second intervals.
There's not enough context in your question to make a comparison, as we have no idea how you'd otherwise access the memoryBlock array. If you're just copying new arrays into the array, the two should be similar. Even if it's only in the second example that you are creating new arrays, then assuming the arrays are not large, I would expect the generation of ~100 new objects per second to be well within the bandwidth of the garbage collector.
As far as whether the approach in your second code example is a desirable approach, I will with all due respect suggest that you should probably stick to conventionally synchronized code (i.e. with ReaderWriterLockSlim, or even just a plain lock statement). Concurrency is hard enough to get right with regular locks, never mind trying to write lock-free code. Given the update rates you're describing, I would expect a plain lock statement to work just fine.
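For illustration, a minimal sketch of that conventional approach, with one plain lock object per device guarding its row (the names and structure here are illustrative, not prescriptive):
using System;
using System.Linq;

class DeviceRegisters
{
    private readonly object[] deviceLocks;
    private readonly byte[][] memoryBlock;   // one row per device

    public DeviceRegisters(int deviceCount, int memSizeInBytes)
    {
        deviceLocks = Enumerable.Range(0, deviceCount).Select(_ => new object()).ToArray();
        memoryBlock = Enumerable.Range(0, deviceCount).Select(_ => new byte[memSizeInBytes]).ToArray();
    }

    // The device thread copies its data into the shared block under the device's lock.
    public void Write(int deviceIndex, byte[] source)
    {
        lock (deviceLocks[deviceIndex])
            Buffer.BlockCopy(source, 0, memoryBlock[deviceIndex], 0, source.Length);
    }

    // Each client gets a private snapshot, so it can use the bytes outside the lock.
    public byte[] Read(int deviceIndex)
    {
        lock (deviceLocks[deviceIndex])
            return (byte[])memoryBlock[deviceIndex].Clone();
    }
}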
If you run into some bandwidth problems, then at least you have a known-good implementation with which to compare new implementations, and will have a better idea of just how complicated an implementation you really require.
4. Would one solution or the other perform better, especially if the server were to scale out?
Impossible to say without more details.

Related

Where is the List<MyClass> object buffer maintained? Is it on RAM or HDD?

My question might sound a little vague. But what I want to know is where the List<> buffer is maintained.
I have a list List<MyClass> to which I am adding items from an infinite loop. But the RAM consumption of the Windows Service (inside which I am creating the List) never goes beyond 17 MB. In fact it hovers between 15-16 MB even if I continue adding items to the List.
I was trying to do some Load Testing of My Service and came across this thing.
Can anyone tell me whether it dumps the data to some temporary location on the machine and reads it back from there, since I don't see an increase in RAM consumption?
The method which I am calling infinitely is AddMessageToList().
class MainClass
{
    List<MessageDetails> messageList = new List<MessageDetails>();

    private void AddMessageToList()
    {
        SendMessage(ApplicationName, Address, Message);
        MessageDetails obj = new MessageDetails();
        obj.ApplicationName = ApplicationName;
        obj.Address = Address;
        obj.Message = Message;
        lock (messageList)
        {
            messageList.Add(obj);
        }
    }
}
class MessageDetails
{
    public string Message { get; set; }
    public string ApplicationName { get; set; }
    public string Address { get; set; }
}
The answer to your question is: "In Memory".
That can mean RAM, and it can also mean the hard drive (Virtual Memory). The OS memory manager decides when to page memory to Virtual Memory, which mostly has to do with how often the memory is accessed (though I don't pretend to know Microsoft's specific algorithm).
You also asked why your memory usage isn't going up. First off, a megabyte is a HUGE amount of memory. Unless your class is quite large, you will need a LOT of instances to make a MB appear. Eventually your memory usage should go up, though.
In general, C# objects are created on the heap, which resides in memory. If you want to store things on disk there are ways to go about it, but a standard List<T> will live in memory.
When you create an object it will occupy a certain number of bytes in memory plus the size of the pointers used to reference it. Adding it to a list only adds a pointer to the object you've already created, so if you're adding lots of copies of the same instance into the list, it won't grow as fast as you expect.
If you really want to test the impact of large data structures on your memory, you're going to have to ramp up the numbers. A few thousand average objects aren't going to occupy much memory, but a few million might.
You might also be interested in the GC.GetTotalMemory() method and its friends.
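For example, a quick diagnostic sketch using the MessageDetails class from the question (GC.GetTotalMemory(true) forces a collection before measuring, which is fine for diagnostics but not something to call in production code):
using System;
using System.Collections.Generic;

long before = GC.GetTotalMemory(true);   // collect first, then measure

var list = new List<MessageDetails>();
for (int i = 0; i < 100000; i++)
{
    list.Add(new MessageDetails
    {
        ApplicationName = "App",
        Address = "addr" + i,
        Message = "message number " + i
    });
}

long after = GC.GetTotalMemory(true);
Console.WriteLine("Managed heap grew by roughly " + (after - before) / 1024 + " KiB");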
Note that pretty much all memory on Windows (and .NET) is Virtual Memory - its "real, physical" location is arbitrary, Windows memory management handles that. However, regardless of whether it's currently using physical RAM or a page file on the HDD, it will show up as committed private memory.
So it comes down to how you're actually creating the items and adding them to the List<T>. How many objects are there? Are you adding the same object over and over again, or creating a new one every time? Are you using the same List instance, or are you creating others? Do you keep references to the created objects (and List instances), or are you throwing them away? Do you actually do anything with the object / list? If not, the optimizer might have removed the code altogether (it's very conservative, though, so I wouldn't count on that when adding items to a list - that's a very complex scenario with possible side effects).
In the lowest-memory ideal case you could be using about four bytes per list item; that's not much - you'd need 262,144 items to consume a single MiB of memory!
Show us your code - the whole loop and its surroundings. Then we can tell you what you're actually doing.
EDIT: This is in a WCF service? You should have said so before. Where do you store the MainClass class? If it's inside the WCF service class, it might not last longer than a single request. And even if you fix that, and store it in something a bit more persistent, like a static class, you get into the complexities of when everything is collected, how the service is being restarted etc. If you need the data to be safely held for longer than a single request, storing it in process memory isn't good enough. If you don't care that the data can get thrown away once in a while, you can make the List instance static (it's not going to be shared nor persisted otherwise). Otherwise, use a database.
Speculating from your sparse description, there are two sorts of things you might not realize:
The 15-16 MB usage you see might have nothing to do with the size of your list: it could be the memory requirements for the rest of the program, and your list only consumes a negligible amount of memory in comparison. Even if you don't explicitly create objects, your program still has to load libraries and stuff, which takes memory.
I don't know C#, so I don't know if this applies to List, but one of the standard container implementations is to dynamically allocate an array to hold the objects... and if the array is ever filled, a new array twice the size is allocated, everything is copied over to it, and you continue along (the actual ratio may be something other than 2). This can have the effect that your memory usage remains constant for a long time, until you finally fill up the array and then it suddenly jumps in size, only to remain constant again for a long time. (As shown below, this is exactly what C#'s List<T> does.)
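For what it's worth, C#'s List<T> does double its backing array whenever it fills, and you can watch the jumps through the Capacity property. A quick demonstration:
using System;
using System.Collections.Generic;

var list = new List<int>();
int lastCapacity = -1;
for (int i = 0; i < 200; i++)
{
    list.Add(i);
    if (list.Capacity != lastCapacity)   // capacity only changes when the backing array is reallocated
    {
        Console.WriteLine("Count = " + list.Count + ", Capacity = " + list.Capacity);
        lastCapacity = list.Capacity;
    }
}
// Prints capacities 4, 8, 16, 32, 64, 128, 256: doubling each time the array fills.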

Need to delete objects: implement Dispose or create objects in a function?

I have some objects that read a file, save the data in arrays and make some operations. The sequence is Create object A, operate with object A. Create object B, operate with object B...
The data read by each object may be around 10 MB, so the best option would be to delete each object after operating with it. Let's say I want my program to allocate around 10 MB in memory, not 10 MB * 1000 objects = 1 GB.
The objects are something like:
class MyClass
{
    List<string[]> data;

    public MyClass(string datafile)
    {
        using (CsvReader csv = new CsvReader(new StreamReader(datafile), true))
        {
            data = csv.ToList<string[]>();
        }
    }

    public List<string> Operate()
    {
        ...
    }
}
My question is: should I implement Dispose and do something like:
List<string> results = new List<string>();
using (MyClass m = new MyClass("fileM.txt"))
{
    results.AddRange(m.Operate());
}
using (MyClass d = new MyClass("fileD.txt"))
{
    results.AddRange(d.Operate());
}
...
I've read that implementing IDisposable is recommended when you use unmanaged resources (sockets, streams, ...), but in my class I only have big data arrays.
Another way would be to create a function for each object (I suppose the GC will automatically collect an object created inside a function once it goes out of scope):
List<string> results = new List<string>();
results.AddRange(myFunction("fileM.txt"));
results.AddRange(myFunction("fileD.txt"));

public List<string> myFunction(string file)
{
    MyClass c = new MyClass(file);
    return c.Operate();
}
IDisposable etc will not help you here, since it doesn't cause anything to get collected. In this type of scenario, maybe the best approach is to use a pool to reduce allocations - essentially becoming your own memory manager. For example, if your List<string> is big, you can avoid a lot of the arrays by re-using the lists - after clearing them, obviously. If you call .Clear(), the backing array is not reset - it just sets a logical marker to consider it as empty. In your specific case, a lot of your objects are going to be the individual strings; that is trickier, but at least they are small and should be collectable in generation-zero.
In your case I'd allocate a single buffer array. For example, allocate a 10 MB array once and fill it with the data you want. Then, when you get to the next object, just reuse the array. If you ever need a bigger array, you can just allocate a new, bigger, array and use that instead. The garbage collector will eventually remove your smaller one.
You can also use a List<T>; it will internally do the same (allocate an array, keep it until it becomes too small, allocate a new one). Just Clear it before creating the next object.
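A sketch of that reuse pattern (the names are illustrative, it assumes you refactor so the caller owns the buffer, and the row-reading loop must be adapted to your CsvReader's actual API):
var data = new List<string[]>();       // allocated once, reused for every file
var results = new List<string>();

foreach (string file in new[] { "fileM.txt", "fileD.txt" })
{
    data.Clear();                      // empties the list but keeps its backing array
    using (var reader = new StreamReader(file))
    using (var csv = new CsvReader(reader, true))
    {
        foreach (string[] row in csv)  // hypothetical iteration; adapt to your CsvReader
            data.Add(row);
    }
    results.AddRange(Operate(data));   // hypothetical Operate that reads the shared buffer
}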
Note that you cannot force1 the garbage collector to collect an object. IDisposable is indeed only used to clean up unmanaged resources, as the garbage collector does not know about them, or to close (file) handles. Calling Dispose does not guarantee (or imply) that the object is removed from memory.
However, if you change nothing, your code will still be correct and work properly. The garbage collector is responsible for removing unused objects whenever it feels like doing that, and it will ensure that there is plenty of memory available at all times. The only thing you have to do to let the collector do its work is to let go of any references to old objects (by overwriting them or setting them to null, or letting them go out of scope).
1) You can force the garbage collector to collect your data by calling GC.Collect(). However, this is not recommended. Let the garbage collector figure it out by itself.
If you're using .NET 4.0 or greater, have a look at the BlockingCollection class. The constructor that takes the Int32 parameter allows you to specify an upper bound on the size of the collection. The Add and Take methods work as throttles. Add will only succeed if the upper bound has not been reached. If it has, it'll block. Take will only succeed if an item exists. If no item exists, it'll block until one is available. Of course, the class has some variations of these methods, so fully examine the documentation to see which, if any, make sense.
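A small sketch of that bounded producer/consumer shape (the capacity and payloads are illustrative):
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

var queue = new BlockingCollection<string>(100);   // upper bound of 100 items

var producer = Task.Factory.StartNew(() =>
{
    for (int i = 0; i < 1000; i++)
        queue.Add("item " + i);        // blocks whenever 100 items are pending
    queue.CompleteAdding();            // lets the consumer's loop complete
});

var consumer = Task.Factory.StartNew(() =>
{
    foreach (string item in queue.GetConsumingEnumerable())   // blocks until items arrive
        Console.WriteLine(item);
});

Task.WaitAll(producer, consumer);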

C# Growing List and Pointers to Elements

I need to have a growing array, or list (the built-in ones are sufficient). Furthermore, I need to be able to manipulate elements in the array through pointers to a specific element, as in the following code:
List<int> l1=new List<int>();
List<bool> l2=new List<bool>();
l1.Add(8);
l2.Add(true);
l1.Add(234);
l2.Add(true);
Console.WriteLine(l1[0]); //output=8
int* pointer = (int *) l1[0];
Console.WriteLine(*pointer); //Needs to output 8
Console.WriteLine(l2[0]); //output=true
bool* pointer2 = (bool *) l2[0];
Console.WriteLine(*pointer2); //Needs to output true
Thanks in advance for any help
I'm trying to use an array to store packet data and pass it off to threads; these threads need to be able to modify the data without trashing the array.
In this case, I would just pass your List<T> into the threaded routine, as well as a starting and ending index that thread should use.
Provided you always work by index, and stay within the bounds, there shouldn't be any problem with "trashing the array."
First off: you are applying a C++ approach to a problem that is solved differently in C#. In C# you generally don't want to do things involving explicit pointers, because they make life difficult, especially as it pertains to garbage collection.
That said, if you must do it this way, what you'd want to do is pass the entire list to the other thread as a parameter, along with the index (or offset) it should work on. You would also want to be sure to lock the list appropriately in all accessing threads, to avoid dirty reads/writes.
The right solution is just to pass the item that you actually want to process. Reference types are passed by value, but the value being copied is the reference itself, which still points to the same object on the heap. No new object is created.
So for example:
var myList = new List<MyClass> { someInstanceofMyClass1, someInstanceofMyClass2 };
var t = new Thread(()=> SomeMethod(myList[0])); // Assuming MyClass is a reference type, the value passed here is the same instance as the one in myList
t.Start();
...
It already works the way you want for reference types. Therefore one potential solution is to create a class to box these values as reference types. If the items are related by index (and I suspect they are) it's a good idea to keep one list to hold a type that groups both values rather than two lists anyway. Going with that:
public class MyClass
{
    public int IntValue { get; set; }
    public bool BoolValue { get; set; }

    public MyClass(int intValue, bool boolValue)
    {
        IntValue = intValue;
        BoolValue = boolValue;
    }
}
List<MyClass> l1 = new List<MyClass>();
l1.Add(new MyClass(8, true));
MyClass pointer = l1[0];
Console.WriteLine(pointer.IntValue); //writes 8
Console.WriteLine(pointer.BoolValue); //writes True
If you are using .NET 4, you might want to look into the classes in System.Collections.Concurrent namespace. They provide thread-safe data structures that might help you achieve your goal with less code.
For what you are trying to accomplish (network packets), it sounds like you would benefit from a System.IO.BinaryWriter instead of a List. (You could even pass a NetworkStream to a BinaryWriter)
It supports seeking so you can go back and re-write, and you can't trash the array since it grows automatically.
Performance-wise, I would assume BinaryWriter is faster than a List, since it writes to an underlying Stream, and a MemoryStream is faster than a List<byte>.
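For instance, a sketch of building a packet that way: write the fields, then seek back to patch a length prefix (the field layout is made up purely for illustration):
using System;
using System.IO;

var ms = new MemoryStream();
var writer = new BinaryWriter(ms);

writer.Write(0);            // placeholder for the payload length
writer.Write((byte)0x01);   // packet type
writer.Write(12345);        // some payload field
writer.Write(true);         // another payload field

long end = ms.Position;
writer.Seek(0, SeekOrigin.Begin);          // jump back to the placeholder
writer.Write((int)(end - sizeof(int)));    // patch in the real payload length
ms.Position = end;

byte[] packet = ms.ToArray();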

How do these people avoid creating any garbage?

Here's an interesting article that I found on the web.
It talks about how this firm is able to parse a huge amount of financial data in a managed environment, essentially by object reuse and avoiding immutables such as string. They then go on to show that their program doesn't do any GC during the continuous operation phase.
This is pretty impressive, and I'd like to know if anyone else here has some more detailed guidelines as to how to do this. For one, I'm wondering how the heck you can avoid using string, when blatantly some of the data inside the messages are strings, and whatever client application is looking at the messages will want to be passed those strings. Also, what do you allocate in the startup phase? How will you know it's enough? Is it simply a matter of claiming a big chunk of memory and keeping a reference to it so that the GC doesn't kick in? What about whatever client application is using the messages? Does it also need to be written according to these stringent standards?
Also, would I need a special tool to look at the memory? I've been using SciTech memory profiler thus far.
I found the paper you linked to rather deficient:
It assumes, and wants you to assume, that garbage collection is the ultimate latency killer. They have not explained why they think so, nor have they explained in what way their system is not basically a custom-made garbage collector in disguise.
It talks about the amount of memory cleaned up in garbage collection, which is irrelevant: the time taken to garbage collect depends more on the number of objects, irrespective of their size.
The table of “results” at the bottom provides no comparison to a system that uses .NET’s garbage collector.
Of course, this doesn’t mean they’re lying and it’s nothing to do with garbage collection, but it basically means that the paper is just trying to sound impressive without actually divulging anything useful that you could use to build your own.
One thing to note from the beginning is where they say "Conventional wisdom has been developing low latency messaging technology required the use of unmanaged C++ or assembly language". In particular, they are talking about a sort of case where people would often dismiss a .NET (or Java) solution out of hand. For that matter, a relatively naïve C++ solution probably wouldn't make the grade either.
Another thing to consider here is that they haven't so much gotten rid of the GC as replaced it - there's code there managing object lifetime, but it's their own code.
There are several different ways one could do this instead. Here's one. Say I need to create and destroy several Foo objects as my application runs. Foo creation is parameterised by an int, so the normal code would be:
public class Foo
{
    private readonly int _bar;

    public Foo(int bar)
    {
        _bar = bar;
    }
    /* other code that makes this class actually interesting. */
}

public class UsesFoo
{
    public void FooUsedHere(int param)
    {
        Foo baz = new Foo(param);
        // Do something here.
        // baz falls out of scope and is liable to GC collection.
    }
}
A much different approach is:
public class Foo
{
    // Illustrative pool size; in practice, sized to the worst case you expect.
    private const int MOST_POSSIBLY_NEEDED = 1024;

    private static readonly Foo[] FOO_STORE = new Foo[MOST_POSSIBLY_NEEDED];
    private static Foo FREE;

    static Foo()
    {
        // Build the store and chain every element into a free list.
        FOO_STORE[MOST_POSSIBLY_NEEDED - 1] = new Foo();   // last element; its _next stays null
        int idx = MOST_POSSIBLY_NEEDED - 1;
        while (idx != 0)
        {
            Foo newFoo = FOO_STORE[--idx] = new Foo();
            newFoo._next = FOO_STORE[idx + 1];
        }
        FREE = FOO_STORE[0];   // head of the free list
    }

    private Foo _next;
    // Note _bar is no longer readonly. We lose that advantage
    // as a cost of reusing objects. Even if Foo acts immutable,
    // it isn't really.
    private int _bar;

    public static Foo GetFoo(int bar)
    {
        Foo ret = FREE;
        FREE = ret._next;
        ret._bar = bar;    // re-initialize the recycled object
        return ret;
    }

    public void Release()
    {
        _next = FREE;
        FREE = this;
    }
    /* other code that makes this class actually interesting. */
}

public class UsesFoo
{
    public void FooUsedHere(int param)
    {
        Foo baz = Foo.GetFoo(param);
        // Do something here.
        baz.Release();
    }
}
Further complication can be added if you are multithreaded (though for really high performance in a non-interactive environment, you may want either a single thread or a separate store of Foo objects per thread), and if you cannot predict MOST_POSSIBLY_NEEDED in advance (the simplest fix is to create a new Foo() as needed, but never release it back for GC; in the code above that can be done by allocating a fresh Foo in GetFoo whenever FREE is null).
If we allow for unsafe code, we can have even greater advantages by making Foo a struct (and hence the array a contiguous block of memory), making _next a pointer to Foo, and having GetFoo() return a pointer.
Whether this is what these people are actually doing, I of course cannot say, but the above does prevent the GC from activating. It will only be faster under very high throughput conditions; if yours aren't, letting the GC do its job is probably better (the GC really does help you, despite 90% of questions about it treating it as a Big Bad).
There are other approaches that similarly avoid GC. In C++ the new and delete operators can be overridden, which allows for the default creation and destruction behaviour to change, and discussions of how and why one might do so might interest you.
A practical take-away from this is when objects either hold resources other than memory that are expensive (e.g. connections to databases) or "learn" as they continue to be used (e.g. XmlNameTables). In this case pooling objects is useful (ADO.NET connections do so behind the scenes by default). In this case though a simple Queue is the way to go, as the extra overhead in terms of memory doesn't matter. You can also abandon objects on lock contention (you're looking to gain performance, and lock contention will hurt it more than abandoning the object), which I doubt would work in their case.
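A minimal sketch of such a Queue-based pool (the locking policy and the factory delegate are illustrative; real pools like ADO.NET's connection pool do considerably more):
using System;
using System.Collections.Generic;

class SimplePool<T> where T : class
{
    private readonly Queue<T> _items = new Queue<T>();
    private readonly Func<T> _factory;

    public SimplePool(Func<T> factory)
    {
        _factory = factory;
    }

    public T Take()
    {
        lock (_items)
            return _items.Count > 0 ? _items.Dequeue() : _factory();
    }

    public void Return(T item)
    {
        lock (_items)
            _items.Enqueue(item);
    }
}

// Usage: var pool = new SimplePool<System.Xml.NameTable>(() => new System.Xml.NameTable());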
From what I understood, the article doesn't say they don't use strings. They don't use immutable strings. The problem with immutable strings is that when you're doing parsing, most of the strings generated are just throw-away strings.
I'm guessing they're using some sort of pre-allocation combined with free lists of mutable strings.
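For example, a guess at the shape of that kind of parsing (not taken from the article itself): decode fields out of the receive buffer into a preallocated char[] rather than creating throw-away strings:
// Allocated once at startup and reused for every message.
byte[] receiveBuffer = new byte[4096];   // filled by each socket read
char[] fieldChars = new char[256];       // scratch space for one decoded field

// Decode one ASCII field in place; no string objects are created.
int DecodeField(int start, int length)
{
    for (int i = 0; i < length; i++)
        fieldChars[i] = (char)receiveBuffer[start + i];
    return length;   // the caller consumes fieldChars[0..length)
}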
I worked for a while with a CEP product called StreamBase. One of their engineers told me that they were migrating their C++ code to Java because they were getting better performance, fewer bugs and better portability on the JVM by pretty much avoiding GC altogether. I imagine the arguments apply to the CLR as well.
It seemed counter-intuitive, but their product was blazingly fast.
Here's some information from their site:
StreamBase avoids garbage collection in two ways: Not using objects, and only using the minimum set of objects we need.
First, we avoid using objects by using Java primitive types (Boolean, byte, int, double, and long) to represent our data for processing. Each StreamBase data type is represented by one or more primitive type. By only manipulating the primitive types, we can store data efficiently in stack or array allocated regions of memory. We can then use techniques like parallel arrays or method calling to pass data around efficiently.
Second, when we do use objects, we are careful about their creation and destruction. We tend to pool objects rather than releasing them for garbage collection. We try to manage object lifecycle such that objects are either caught by the garbage collector in the young generation, or kept around forever.
Finally, we test this internally using a benchmarking harness that measures per-tuple garbage collection. In order to achieve our high speeds, we try to eliminate all per-tuple garbage collection, generally with good success.
99% of the time, you will be wasting your boss's money if you try to achieve this. The article describes an absolute extreme scenario where they need the last drop of performance. As you can read in the article, there are large parts of the .NET framework that can't be used when trying to be GC-free. Some of the most basic parts of the BCL use memory allocations (or 'produce garbage', as the paper calls it), and you will need to find a way around those methods.
And even when you need an absolutely blazingly fast application, you'd better first try to build an application/architecture that can scale out (use multiple machines) before trying to walk the no-GC route. The sole reason for them to use the no-GC route is that they need absolute low latency. IMO, when you need absolute speed but don't care about the absolute minimum response time, it will be hard to justify a no-GC architecture. Besides this, if you try to build a GC-free client application (such as a Windows Forms or WPF app): forget it, those presentation frameworks create new objects constantly.
But if you really want this, it is actually quite simple. Here is a simple how-to:
1. Find out which parts of the .NET API can't be used (you can write a tool that analyzes the .NET assemblies using an introspection engine).
2. Write a program that verifies the code you or your developers write to ensure they don't allocate directly or use 'forbidden' .NET methods, using the safe list created in the previous point (FxCop is a great tool for this).
3. Create object pools that you initialize at startup time. The rest of the program can then reuse existing objects, so no new allocations are needed.
4. If you need to manipulate strings, use byte arrays instead and store them in a pool (WCF uses this technique also). You will have to create an API that allows manipulating those byte arrays.
5. And last but not least: profile, profile, profile.
Good luck

Is the C# compiler deciding to use stackalloc by itself?

I found a blog entry which suggests that the C# compiler may sometimes decide to put an array on the stack instead of the heap:
Improving Performance Through Stack Allocation (.NET Memory Management: Part 2)
This guy claims that:
The compiler will also sometimes decide to put things on the stack on its own. I did an experiment with TestStruct2 in which I allocated it both an unsafe and normal context. In the unsafe context the array was put on the heap, but in the normal context when I looked into memory the array had actually been allocated on the stack.
Can someone confirm that?
I was trying to repeat his example, but every time I tried, the array was allocated on the heap.
If the C# compiler can do such a trick without the 'unsafe' keyword, I'm especially interested in it. I have code that works on many small byte arrays (8-10 bytes long), so using the heap for each new byte[...] is a waste of time and memory (especially since each object on the heap has 8 bytes of overhead needed by the garbage collector).
EDIT: I just want to describe why it's important to me:
I'm writing a library that communicates with a Gemalto .NET smart card, which can have .NET code running in it. When I call a method that returns something, the smart card returns 8 bytes that describe the exact Type of the return value. These 8 bytes are calculated using an MD5 hash and some byte array concatenations.
The problem is that when I get a descriptor I don't recognize, I must scan all the types in all the assemblies loaded in the application, calculating those 8 bytes for each type until I find a match.
I don't know another way to find the type, so I'm trying to speed this up as much as possible.
Author of the linked-to article here.
It seems impossible to force stack allocation outside of an unsafe context. This is likely the case to prevent some classes of stack overflow condition.
Instead, I recommend using a memory recycler class which would allocate byte arrays as needed but also allow you to "turn them in" afterward for reuse. It's as simple as keeping a stack of unused byte arrays and, when the list is empty, allocating new ones.
Stack<Byte[]> _byteStack = new Stack<Byte[]>();

Byte[] AllocateArray()
{
    Byte[] outArray;
    if (_byteStack.Count > 0)
        outArray = _byteStack.Pop();
    else
        outArray = new Byte[8];
    return outArray;
}

void RecycleArray(Byte[] inArray)
{
    _byteStack.Push(inArray);
}
If you are trying to match a hash with a type it seems the best idea would be to use a Dictionary for fast lookups. In this case you could load all relevant types at startup, if this causes program startup to become too slow you might want to consider caching them the first time each type is used.
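A sketch of that startup-time table; ComputeDescriptor below stands in for your MD5-based 8-byte calculation, and the string key sidesteps the fact that a byte[] doesn't compare by content in a Dictionary:
using System;
using System.Collections.Generic;
using System.Linq;

var typeByDescriptor = new Dictionary<string, Type>();

foreach (Type t in AppDomain.CurrentDomain.GetAssemblies()
                            .SelectMany(a => a.GetTypes()))
{
    byte[] descriptor = ComputeDescriptor(t);                 // your MD5-based 8-byte hash
    typeByDescriptor[BitConverter.ToString(descriptor)] = t;  // e.g. "0A-1B-..." as the key
}

// Later, for each 8-byte reply from the card:
// Type match = typeByDescriptor[BitConverter.ToString(replyBytes)];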
From your line:
I have a code that is working on many small byte arrays (8-10 bytes long)
Personally, I'd be more interested in allocating a spare buffer somewhere that different parts of your code can re-use (while processing the same block). Then you don't have any creation/GC to worry about. In most cases (where the buffer is used for very discrete operations) with a scratch-buffer, you can even always assume that it is "all yours" - i.e. every method that needs it can assume that it can start writing at zero.
I use this single-buffer approach in some binary serialization code (while encoding data); it is a big boost to performance. In my case, I pass a "context" object between the layers of serialization (that encapsulates the scratch-buffer, the output-stream (with some additional local buffering), and a few other oddities).
System.Array (the class representing an array) is a reference type and lives on the heap. You can only have an array on the stack if you use unsafe code.
I can't see where it says otherwise in the article that you refer to. If you want to have a stack allocated array, you can do something like this:
decimal* stackAllocatedDecimals = stackalloc decimal[4];
Personally I wouldn't bother - how much performance do you think you will gain by this approach?
This CodeProject article might be useful to you though.
