Serializing a C# Socket

I'm looking at options for optimizing the number of concurrent connections my socket servers can handle, and I had an idea that hinges on being able to serialize C# sockets so that they can be removed from memory and then restored as needed. This scenario is acceptable for me only because sessions last for hours, and during that time the sockets are used very infrequently for sending to clients and never for receiving. My current implementation is memory bound because I hold each socket in memory for the lifetime of the corresponding client's session. My thought is that if I could serialize the socket and write it to disk, or put it in a distributed cache/database/file store, I could free up memory on the servers at the expense of some extra time to process each send (i.e. deserialize the socket and then send on it). I've tried a couple of options, but ran into roadblocks with each:
Serialize/Deserialize the socket by reading and writing through a pointer to the object in memory. I can't seem to restore the socket after serialization.
Use the Socket.DuplicateAndClose() method to get the SocketInformation, then serialize that and when needed restore the socket to the same process using the SocketInformation. I can't seem to use the socket once it is restored and I'm not sure this is going to amount to a significant memory savings as it seems to leave unmanaged resources in memory.
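To be concrete, the second attempt looked roughly like this (simplified, error handling omitted; persisting the SocketInformation fields to disk is only sketched, and restoring in the same process is why the current process id is used):
// sketch of the DuplicateAndClose attempt
// requires: using System.Diagnostics; using System.Net.Sockets;
SocketInformation info = socket.DuplicateAndClose(Process.GetCurrentProcess().Id);
byte[] protocolBlob = info.ProtocolInformation;   // this byte[] (plus info.Options) is what I would persist
SocketInformationOptions options = info.Options;
// ... later, restore in the same process ...
var restored = new Socket(new SocketInformation
{
    ProtocolInformation = protocolBlob,
    Options = options
});
// restored.Send(...) is where things fall apart for me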
It seems to me that there should be a way to accomplish this. Ultimately, I'm looking for someone to either point me in the right direction or confirm that it is or isn't possible.
Any help is greatly appreciated!

This sounds like a good continuation of Alice's Adventures in Wonderland - a wonderful nonsense. You can't serialize a socket because the operation simply doesn't make sense. The socket class (I mean not the .NET Socket class specifically, but the kind of object we call a socket) doesn't support serialization because sockets are, if we think in real-world terms, not data containers but gates to a communication channel. You can make a copy of a book, but it is very hard to make a paper copy of a door.
Now about memory. You can have roughly 64K sockets on a Windows system (I may be wrong on the exact number, but that is the approximate figure). Even at 100 bytes per socket that is only about 6 MB of memory. On a modern server OS (Windows, Linux, you name it), 6 MB of user-mode memory is less than nothing. You will gain much more by reviewing the overall application architecture.

If I understand the question correctly, you are trying to serialize a Socket object, saving off its information (object contents), and then later trying to reconstitute that object with the saved info.
This won't work, because you can't simply save the contents of the Socket object and restore it later. Deep down, the socket wraps an actual socket handle (an open file descriptor) owned by the operating system. Saving and restoring the managed data won't reconnect that handle within the operating system.
A socket requires physically connecting it (opening it) at the operating system level. This is similar to a Stream object. You can't simply save off the contents of the object and restore it later; it requires an attachment to a file descriptor within the operating system.

Your sockets are not the problem; they use very little memory. It's more likely how you handle inbound and outbound data that is the problem.
Are you allocating new (byte) buffers for each operation?
Create a buffer pool instead. Let the pool create a new buffer if it's empty. Don't forget to return a buffer when you are done with it.
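A minimal sketch of such a pool (BufferPool is just an illustrative name; on newer frameworks System.Buffers.ArrayPool<byte> gives you this for free):
// minimal fixed-size buffer pool sketch
// requires: using System.Collections.Concurrent;
public sealed class BufferPool
{
    private readonly ConcurrentBag<byte[]> free = new ConcurrentBag<byte[]>();
    private readonly int bufferSize;

    public BufferPool(int bufferSize) { this.bufferSize = bufferSize; }

    // hand out a pooled buffer, or create a fresh one if the pool is empty
    public byte[] Rent() => free.TryTake(out var buffer) ? buffer : new byte[bufferSize];

    // give the buffer back once you are done with it
    public void Return(byte[] buffer) => free.Add(buffer);
}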
What are you doing once you got the data in a buffer?
Are you building a string? If you have lots of incoming data or large strings you might want to switch to a StringBuilder. string.Format("{0}kdkd{1}jdjd{2}", var1, var2, var3) allocates less memory than var1 + "kdkd" + var2 + "jdjd" + var3.
How are you wrapping the socket?
You got a fat class with lots of stuff in it? Then it's your fat class that is the problem.

Related

How to write from multiple threads to a TcpListener?

Let's say I have a static list List<string> dataQueue, where data keeps getting added at random intervals and also at a varying rate (1-1000 entries/second).
My main objective is to send the data from the list to the server; I'm using the TcpClient class.
What I've done so far is send the data synchronously via the TcpClient on a single thread:
byte[] bytes = Encoding.ASCII.GetBytes(message);
tcpClient.GetStream().Write(bytes, 0, bytes.Length);
//The client is already connected at the start
And I remove the entry from the list once the data is sent.
This works fine, but the sending is not fast enough: the list keeps growing and consuming more memory, because entries are iterated and sent one at a time.
My question is: can I use the same tcpClient object to write concurrently from another thread, or can I use another tcpClient object with a new connection to the same server in another thread? What is the most efficient (quickest) way to send this data to the server?
PS: I don't want to use UDP
Right; this is a fun topic which I think I can opine about. It sounds like you are sharing a single socket between multiple threads - perfectly valid as long as you do it very carefully. A TCP socket is a logical stream of bytes, so you can't use it concurrently as such, but if your code is fast enough, you can share the socket very effectively, with each message being consecutive.
Probably the very first thing to look at is: how are you actually writing the data to the socket? what is your framing/encoding code like? If this code is simply bad/inefficient: it can probably be improved. For example, is it indirectly creating a new byte[] per string via a naive Encode call? Are there multiple buffers involved? Is it calling Send multiple times while framing? How is it approaching the issue of packet fragmentation? etc
As a very first thing to try - you could avoid some buffer allocations:
// requires: using System.Buffers; and using System.Text;
var enc = Encoding.ASCII;
byte[] bytes = ArrayPool<byte>.Shared.Rent(enc.GetMaxByteCount(message.Length));
// note: leased buffers can be oversized; and in general, GetMaxByteCount will
// also be oversized; so it is *very* important to track how many bytes you've used
int byteCount = enc.GetBytes(message, 0, message.Length, bytes, 0);
tcpClient.GetStream().Write(bytes, 0, byteCount);
ArrayPool<byte>.Shared.Return(bytes);
This uses a leased buffer to avoid creating a byte[] each time - which can massively improve GC impact. If it was me, I'd also probably be using a raw Socket rather than the TcpClient and Stream abstractions, which frankly don't gain you a lot. Note: if you have other framing to do: include that in the size of the buffer you rent, use appropriate offsets when writing each piece, and only write once - i.e. prepare the entire buffer once - avoid multiple calls to Send.
Right now, it sounds like you have a queue and dedicated writer; i.e. your app code appends to the queue, and your writer code dequeues things and writes them to the socket. This is a reasonable way to implement things, although I'd add some notes:
List<T> is a terrible way to implement a queue - removing things from the start requires a reshuffle of everything else (which is expensive); if possible, prefer Queue<T>, which is implemented perfectly for your scenario
it will require synchronization, meaning you need to ensure that only one thread alters the queue at a time - this is typically done via a simple lock, i.e. lock(queue) {queue.Enqueue(newItem);} and SomeItem next; lock(queue) { next = queue.Count == 0 ? null : queue.Dequeue(); } if (next != null) {...write it...}.
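Putting those two notes together, a rough sketch of the queue plus dedicated writer (SomeItem and WriteToSocket are placeholders for your message type and your actual socket write):
// requires: using System.Collections.Generic; using System.Threading;
private readonly Queue<SomeItem> queue = new Queue<SomeItem>();

// app-code, any thread:
public void Enqueue(SomeItem item)
{
    lock (queue) { queue.Enqueue(item); }
}

// dedicated writer thread:
private void WriterLoop()
{
    while (true)
    {
        SomeItem next;
        lock (queue) { next = queue.Count == 0 ? null : queue.Dequeue(); }
        if (next == null) { Thread.Sleep(1); continue; } // or use Monitor.Wait/Pulse instead of polling
        WriteToSocket(next); // encode + single Send, as discussed above
    }
}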
This approach is simple, and has some advantages in terms of avoiding packet fragmentation - the writer can use a staging buffer, and only actually write to the socket when a certain threshold is buffered, or when the queue is empty, for example - but it has the possibility of creating a huge backlog when stalls occur.
However! The fact that a backlog has occurred indicates that something isn't keeping up; this could be the network (bandwidth), the remote server (CPU) - or perhaps the local outbound network hardware. If this is only happening in small blips that then resolve themselves - fine (especially if it happens when some of the outbound messages are huge), but: one to watch.
If this kind of backlog is recurring, then frankly you need to consider that you're simply saturated for the current design, so you need to unblock one of the pinch points:
making sure your encoding code is efficient is step zero
you could move the encode step into the app-code, i.e. prepare a frame before taking the lock, encode the message, and only enqueue an entirely prepared frame; this means that the writer thread doesn't have to do anything except dequeue, write, recycle - but it makes buffer management more complex (obviously you can't recycle buffers until they've been completely processed)
reducing packet fragmentation may help significantly, if you're not already taking steps to achieve that
otherwise, you might need (after investigating the blockage):
better local network hardware (NIC) or physical machine hardware (CPU etc)
multiple sockets (and queues/workers) to round-robin between, distributing load
perhaps multiple server processes, with a port per server, so your multiple sockets are talking to different processes
a better server
multiple servers
Note: in any scenario that involves multiple sockets, you want to be careful not to go mad and have too many dedicated worker threads; if that number goes above, say, 10 threads, you probably want to consider other options - perhaps involving async IO and/or pipelines (below).
For completeness, another basic approach is to write from the app-code; this approach is even simpler, and avoids the backlog of unsent work, but: it means that now your app-code threads themselves will back up under load. If your app-code threads are actually worker threads, and they're blocked on a sync/lock, then this can be really bad; you do not want to saturate the thread-pool, as you can end up in the scenario where no thread-pool threads are available to satisfy the IO work required to unblock whichever writer is active, which can land you in real problems. This is not usually a scheme that you want to use for high load/volume, as it gets problematic very quickly - and it is very hard to avoid packet fragmentation since each individual message has no way of knowing whether more messages are about to come in.
Another, more recent option to consider is "pipelines"; this is a new IO framework in .NET that is designed for high volume networking, giving particular attention to things like async IO, buffer re-use, and a well-implemented buffer/back-log mechanism that makes it possible to use the simple writer approach (synchronize while writing) and have that not translate into direct sends - it manifests as an async writer with access to a backlog, which makes packet fragmentation avoidance simple and efficient. This is quite an advanced area, but it can be very effective. The problematic part for you will be: it is designed for async usage throughout, even for writes - so if your app-code is currently synchronous, this could be a pain to implement. But: it is an area to consider. I have a number of blog posts talking about this topic, and a range of OSS examples and real-life libraries that make use of pipelines that I can point you at, but: this isn't a "quick fix" - it is a radical overhaul of your entire IO layer. It also isn't a magic bullet - it can only remove overhead due to local IO processing costs.
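For a flavour of the write side with pipelines, a minimal sketch (this assumes a PipeWriter obtained from whatever socket/pipe wrapper you use - for example Pipelines.Sockets.Unofficial - since TcpClient does not hand you one directly):
// requires: using System.IO.Pipelines; using System.Text; using System.Threading.Tasks;
static async ValueTask WriteMessageAsync(PipeWriter writer, string message)
{
    int written = Encode(writer, message);  // span work kept out of the async state machine
    writer.Advance(written);                // commit only the bytes actually used
    await writer.FlushAsync();              // hands the data to the pipe's backlog/flusher

    static int Encode(PipeWriter w, string msg)
    {
        var enc = Encoding.ASCII;
        Span<byte> span = w.GetSpan(enc.GetMaxByteCount(msg.Length)); // pooled buffer, no per-message byte[]
        return enc.GetBytes(msg, span);
    }
}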

High-performance TCP Socket programming in .NET C#

I know this topic has been asked about before, and I have read almost all the threads and comments, but I still haven't found the answer to my problem.
I'm working on a high-performance network library that must have a TCP server and client, has to be able to accept even 30000+ connections, and the throughput has to be as high as possible.
I know very well I have to use async methods, and I have already implemented all kinds of solutions that I have found and tested them.
In my benchmarking only minimal code was used to avoid any overhead; I used profiling to minimize the CPU load; there is no more room for simple optimization; and on the receiving socket the buffered data was always read, counted and discarded to prevent the socket buffer from filling up completely.
The case is very simple: one TCP socket listens on localhost, another TCP socket connects to the listening socket (from the same program, on the same machine of course), then one infinite loop starts to send 256kB sized packets from the client socket to the server socket.
A timer with 1000ms interval prints a byte counter from both sockets to the console to make the bandwidth visible then resets them for the next measurement.
I've realized the sweet-spot for packet size is 256kB and the socket's buffer size is 64kB to have the maximum throughput.
With the async/await type methods I could reach
~370MB/s (~3.2gbps) on Windows, ~680MB/s (~5.8gbps) on Linux with mono
With the BeginReceive/EndReceive/BeginSend/EndSend type methods I could reach
~580MB/s (~5.0gbps) on Windows, ~9GB/s (~77.3gbps) on Linux with mono
With the SocketAsyncEventArgs/ReceiveAsync/SendAsync type methods I could reach
~1.4GB/s (~12gbps) on Windows, ~1.1GB/s (~9.4gbps) on Linux with mono
Problems are the following:
async/await methods were the slowest, so I will not work with them
The BeginReceive/EndReceive methods (together with BeginAccept/EndAccept) spin up new async threads; under Linux/mono every new socket instance was extremely slow (when there were no more threads in the ThreadPool, mono started new ones, but creating 25 connection instances took about 5 minutes, and creating 50 connections was impossible - the program just stopped doing anything after ~30 connections).
Changing the ThreadPool size did not help at all, and I would not change it (it was just a debug move)
The best solution so far is SocketAsyncEventArgs: it gives the highest throughput on Windows, but on Linux/mono it is slower than on Windows, whereas before it was the opposite.
I've benchmarked both my Windows and Linux machine with iperf,
Windows machine produced ~1GB/s (~8.58gbps), Linux machine produced ~8.5GB/s (~73.0gbps)
The weird thing is that on Windows iperf produced a weaker result than my application, while on Linux it produced a much higher one.
First of all, I would like to know if the results are normal, or can I get better results with a different solution?
If I decide to use the BeginReceive/EndReceive methods (they produced relatively the best result on Linux/mono), how can I fix the threading problem, so that creating connection instances is fast and the stalled state after creating multiple instances is eliminated?
I will continue running benchmarks and will share the results if there is anything new.
================================= UPDATE ==================================
I promised code snippets, but after many hours of experimenting the overall code is kind of a mess, so I would just share my experience in case it can help someone.
I had to realize that under Windows 7 the loopback device is slow; I could not get a result higher than 1GB/s with iperf or NTttcp. Only Windows 8 and newer versions have a fast loopback path, so I don't care about the Windows results anymore until I can test on a newer version. SIO_LOOPBACK_FAST_PATH should be enabled via Socket.IOControl, but it throws an exception on Windows 7.
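For reference, enabling it looks roughly like this (0x98000010 is the documented SIO_LOOPBACK_FAST_PATH control code; this is the call that throws on Windows 7):
// enable the loopback fast path; typically done on both sockets before Listen/Connect
const int SIO_LOOPBACK_FAST_PATH = unchecked((int)0x98000010);
socket.IOControl(SIO_LOOPBACK_FAST_PATH, BitConverter.GetBytes(1), null); // 1 = enable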
It turned out that the most powerful solution is the completion-event-based SocketAsyncEventArgs implementation, both on Windows and on Linux/Mono. Creating a few thousand client instances never messed up the ThreadPool, and the program did not stop suddenly as mentioned above. This implementation is very gentle on the threading.
Creating 10 connections to the listening socket and feeding data from 10 separate ThreadPool threads through the clients together could produce ~2GB/s of traffic on Windows, and ~6GB/s on Linux/Mono.
Increasing the client connection count did not improve the overall throughput, but the total traffic became distributed among the connections; this might be because the CPU load was 100% on all cores/threads even with 5, 10 or 200 clients.
I think the overall performance is not bad: 100 clients could produce around ~500mbit/s of traffic each. (Of course this was measured on local connections; a real-life scenario over a network would be different.)
One more observation I would share: experimenting with both the socket send/receive buffer sizes and with the program's read/write buffer sizes and loop cycles affected performance greatly, and very differently on Windows and on Linux/Mono.
On Windows the best performance has been reached with 128kB socket-receive, 32kB socket-send, 16kB program-read and 64kB program-write buffers.
On Linux the previous settings produced very weak performance, but 512kB socket-receive and -send both, 256kB program-read and 128kB program-write buffer sizes worked the best.
Now my only problem is that if I try to create 10000 connecting sockets, after around 7005 it just stops creating instances; it does not throw any exceptions, and the program keeps running as if there were no problem, but I don't understand how it can exit a specific for loop without a break - yet it does.
Any help would be appreciated regarding anything I was talking about!
Because this question gets a lot of views I decided to post an "answer", although technically it isn't an answer but my final conclusion for now, so I will mark it as the answer.
About the approaches:
The async/await functions produce awaitable Tasks assigned to the TaskScheduler of the dotnet runtime, so thousands of simultaneous connections, and therefore thousands of read/write operations, start up thousands of Tasks. As far as I know this creates thousands of state machines stored in RAM and countless context switches on the threads they are assigned to, resulting in very high CPU overhead. With a few connections/async calls it is better balanced, but as the awaitable Task count grows it slows down exponentially.
The BeginReceive/EndReceive/BeginSend/EndSend socket methods are technically async methods with no awaitable Tasks, but with callbacks at the end of the call, which actually handles the multithreading better; still, the dotnet design of these socket methods is poor in my opinion, although for simple solutions (or a limited number of connections) it is the way to go.
The SocketAsyncEventArgs/ReceiveAsync/SendAsync type of socket implementation is the best on Windows for a reason. It utilizes Windows IOCP in the background to achieve the fastest async socket calls and uses overlapped I/O and a special socket mode. This solution is the "simplest" and fastest under Windows. But under mono/linux it will never be that fast, because mono emulates Windows IOCP on top of linux epoll; epoll itself is actually much faster than IOCP, but mono has to emulate IOCP to achieve dotnet compatibility, and this causes some overhead.
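For readers who haven't used it, the basic receive pattern looks roughly like this (a bare-bones sketch; real implementations pool the SocketAsyncEventArgs instances and buffers, loop instead of recursing, and handle errors properly):
void StartReceive(Socket socket, SocketAsyncEventArgs args)
{
    args.UserToken = socket;
    // ReceiveAsync returns false when the operation completed synchronously,
    // in which case Completed will NOT fire, so handle the result inline
    if (!socket.ReceiveAsync(args))
        OnReceiveCompleted(socket, args);
}

void OnReceiveCompleted(object sender, SocketAsyncEventArgs args)
{
    if (args.SocketError != SocketError.Success || args.BytesTransferred == 0)
        return; // error or the remote side closed the connection
    // process args.Buffer from args.Offset, length args.BytesTransferred ...
    StartReceive((Socket)args.UserToken, args); // post the next receive
}

// setup, once per accepted connection:
var args = new SocketAsyncEventArgs();
args.SetBuffer(new byte[64 * 1024], 0, 64 * 1024);
args.Completed += OnReceiveCompleted;
StartReceive(acceptedSocket, args); // acceptedSocket: the connected Socket for this client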
About buffer sizes:
There are countless ways to handle data on sockets. Reading is straightforward: data arrives, you know its length, and you just copy the bytes from the socket buffer into your application and process them.
Sending data is a bit different.
You can pass your complete data to the socket, and it will cut it into chunks and copy the chunks into the socket buffer until there is nothing more to send; the socket's send method returns when all data is sent (or when an error happens).
Or you can take your data, cut it into chunks yourself, and call the socket's send method with one chunk at a time, sending the next chunk when the previous call returns, until there is nothing left.
In any case you should consider what socket buffer size to choose. If you are sending a large amount of data, then the bigger the buffer, the fewer chunks have to be sent, which means fewer calls in your (or the socket's internal) loop, fewer memory copies and less overhead.
But allocating large socket buffers and program data buffers results in large memory usage, especially if you have thousands of connections, and repeatedly allocating (and freeing) large blocks of memory is always expensive.
On the sending side a 1-2-4-8kB socket buffer size is ideal for most cases, but if you are going to send large files (over a few MB) regularly then a 16-32-64kB buffer size is the way to go. There is usually no point in going over 64kB.
But this is only an advantage if the receiving side also has relatively large receive buffers.
Usually over internet connections (not a local network) there is no point in going over 32kB; even 16kB is ideal.
Going under 4-8kB can result in an exponentially higher call count in the read/write loop, causing high CPU load and slow data processing in the application.
Go under 4kB only if you know your messages will usually be smaller than 4kB, or only very rarely larger.
My conclusion:
In my experiments the built-in socket classes/methods/solutions in dotnet are OK, but not efficient at all. My simple linux C test programs using non-blocking sockets could outperform the fastest "high-performance" dotnet socket solution (SocketAsyncEventArgs).
This does not mean fast socket programming is impossible in dotnet, but under Windows I had to make my own implementation of Windows IOCP: communicating directly with the Windows kernel via InteropServices/Marshaling, calling Winsock2 methods directly, using a lot of unsafe code to pass my connections' context structs as pointers between my classes/calls, creating my own ThreadPool, creating I/O event handler threads, and creating my own TaskScheduler to limit the number of simultaneous async calls and avoid pointlessly many context switches.
This was a lot of work, with a lot of research, experimentation and testing. If you want to do it on your own, do it only if you really think it is worth it. Mixing unsafe/unmanaged code with managed code is a pain in the ass, but in the end it was worth it, because with this solution my own http server could reach about 36000 http requests/sec on a 1gbit lan, on Windows 7, with an i7 4790.
That is a level of performance I could never reach with the built-in dotnet sockets.
When running my dotnet server on an i9 7900X on Windows 10, connected via 10gbit lan to a 4c/8t Intel Atom NAS on Linux, I can use the full bandwidth (copying data at 1GB/s) whether I have 1 or 10000 simultaneous connections.
My socket library also detects whether the code is running on linux, and then instead of Windows IOCP (obviously) it uses linux kernel calls via InteropServices/Marshalling to create and use sockets, and handles the socket events directly with linux epoll, which managed to max out the performance of the test machines.
Design tip:
As it turned out, it is difficult to design a networking library from scratch, especially one that is meant to be universal for all purposes. You have to design it to have many settings, or tailor it to the task you need.
This means finding the proper socket buffer sizes, I/O processing thread count, worker thread count and allowed async task count; all of these have to be tuned to the machine the application is running on, to the connection count, and to the type of data you want to transfer over the network. This is why the built-in sockets do not perform that well: they must be universal, and they do not let you set these parameters.
In my case, assigning more than 2 dedicated threads to I/O event processing actually makes overall performance worse, because only 2 RSS queues are in use, and it causes more context switching than is ideal.
Choosing the wrong buffer sizes will also result in performance loss.
Always benchmark different implementations against a simulation of the task you need, to find out which solution or setting is best.
Different settings may produce different performance results on different machines and/or operating systems!
Mono vs Dotnet Core:
Since I've written my socket library in an FW/Core compatible way, I could test it under linux both with mono and with core native compilation. Interestingly, I could not observe any remarkable performance differences; both were fast, but of course leaving mono and compiling with core should be the way to go.
Bonus performance tip:
If your network card is capable of RSS (Receive Side Scaling), then enable it in Windows in the network device settings under the advanced properties, and raise the RSS Queue count from 1 to as high as you can / as high as is best for your performance.
If your network card supports it, it is usually set to 1, which means the kernel assigns network events to be processed by only one CPU core. If you can raise this queue count, network events will be distributed across more CPU cores, resulting in much better performance.
On linux it is also possible to set this up, but in different ways; it is better to search for your linux distro/lan driver information.
I hope my experience will help some of you!
I had the same problem. You should take a look at:
NetCoreServer
Every thread in the .NET CLR thread pool can handle one task at a time. So to handle more async connects/reads etc., you have to change the thread pool size by using:
ThreadPool.SetMinThreads(Int32, Int32)
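For example (the numbers are illustrative; tune them to your workload):
// raise the minimum worker and IO-completion thread counts so bursts of async work
// don't wait on the thread pool's slow thread-injection rate
ThreadPool.SetMinThreads(workerThreads: 200, completionPortThreads: 200);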
Using EAP (the event-based asynchronous pattern) is the way to go on Windows. I would use it on Linux too because of the problems you mentioned, and accept the performance hit.
The best would be io completion ports on Windows, but they are not portable.
PS: when it comes to serializing objects, you are highly encouraged to use protobuf-net. It binary-serializes objects up to 10x faster than the .NET binary serializer and saves a little space too!

Record and replay UDP packets with timing

I need to implement a way of recording a UDP stream so that it can later be replayed on request. The recording and replay must use the same timing (up to a sensible resolution, hardly perceivable by a human user, let's say 10ms): if the original stream had data for seconds 00:00 to 00:35, then went mute until 00:55, and then sent more data from 00:55 to 01:34, the same distribution must be observed when my application replays that stream.
If it were just a matter of saving the UDP stream to disk and then replaying it, without this timing, it would be trivial with Socket, NetworkStream or UdpClient. The issue is that I cannot devise a way to modify the standard receive algorithm to record that timing so that the send algorithm can easily reproduce it later. As a bonus, a way to start the replay at any time mark (from 00:15 on, for example) should also be supported.
1) Is there any way to easily implement this functionality on C#? We do not have any severe non-functional requirement for this, we just need for it to simply work. Any hint about a way to implement it would be really appreciated.
2) If this is indeed not a simple matter and anyone suggests the use of any third-party component for this, the requirements would be for it to have an API for C# (or a way to operate the component from code), and hopefully be opensource or freely used software.
Thank you very much.
This is not exactly a native feature of C#. But I will attempt to do your homework.
The way I would attack this is still to store the packets to disk. Design a custom binary file format holding the time each datagram was received followed by the UDP datagram itself. You can then have a program read this file back and find out at what time each UDP payload was delivered. This gives you what you need to replay the packets with the original timing, all in native C# code, no third-party modules. It will of course require you to know how to read from and write to a file and how to parse the relevant data into typed objects in C# (TimeSpan, maybe byte[] for the datagram, etc.). All of these decisions are up to you, the program writer.
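A rough sketch of that idea (the method names, the binary layout and the handling of the "start at a given mark" bonus are all illustrative, and error handling is omitted):
// requires: System, System.Diagnostics, System.IO, System.Net, System.Net.Sockets, System.Threading
static void Record(int port, string path, TimeSpan duration)
{
    using (var udp = new UdpClient(port))
    using (var writer = new BinaryWriter(File.Create(path)))
    {
        var clock = Stopwatch.StartNew();
        var remote = new IPEndPoint(IPAddress.Any, 0);
        while (clock.Elapsed < duration)             // Receive blocks, so this is only re-checked per datagram
        {
            byte[] datagram = udp.Receive(ref remote);
            writer.Write(clock.ElapsedMilliseconds); // when it arrived (relative to start)
            writer.Write(datagram.Length);           // how long it is
            writer.Write(datagram);                  // the payload itself
        }
    }
}

static void Replay(string path, IPEndPoint target, TimeSpan startAt)
{
    using (var udp = new UdpClient())
    using (var reader = new BinaryReader(File.OpenRead(path)))
    {
        var clock = Stopwatch.StartNew();
        while (reader.BaseStream.Position < reader.BaseStream.Length)
        {
            long dueMs = reader.ReadInt64();
            int length = reader.ReadInt32();
            byte[] datagram = reader.ReadBytes(length);
            if (dueMs < startAt.TotalMilliseconds) continue;  // skip everything before the requested mark
            long waitMs = dueMs - (long)startAt.TotalMilliseconds - clock.ElapsedMilliseconds;
            if (waitMs > 0) Thread.Sleep((int)waitMs);        // reproduce the original spacing (~ms resolution)
            udp.Send(datagram, datagram.Length, target);
        }
    }
}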
The long and the short of it is I'm 99% certain there's no native functionality for this kind of requirement. This is exactly why programming languages exist: so we as programmers can produce amazing solutions for our clients/customers/teachers ;)

Send large byte arrays between AppDomains in the same process

I'm building a network server and starting a lot of AppDomains on the server to which requests are routed. What will be the fastest way to send off a request payload to one of the AppDomains for processing?
1. Read in the payload from the socket into a byte array and marshal it.
2. Marshal the network stream (it inherits from MarshalByRefObject) to the AppDomain.
3. Read the payload, decode it into objects, and marshal the decoded objects.
4. Use named pipes to transfer the byte array.
5. Use loopback sockets.
6. Maybe there is a way to marshal the actual socket connection?
The decoding mostly creates immutable objects that are used to determine how to fulfill the client's request; the AppDomain then creates a response and marshals it back to the host AppDomain, which sends it back through the socket.
The method should prefer less memory over less CPU.
WCF is not an option.
TCP binary remoting is certainly fast; I don't know how it compares to raw sockets, which are probably the fastest option but a royal PIA.
I have run 1500-2000 requests per second in production using HTTP binary remoting between two boxes. On the same box you should get much higher performance using a TCP or named pipes channel, depending on the CPU cycles it takes to process the data.
If I were you I would take a look at how Cassini is implemented. It does pretty much exactly what you are talking about doing.
Actually, Cassini has been more or less superseded by Webhost, which is the built-in web server that ships with Visual Studio now. Take a look at this post on Phil Haack's blog for more.
Very good question. If I were coming at this problem I would probably use a BufferedStream/MemoryStream and marshal the stream into the AppDomain that consumes the object, to reduce marshaling or serializing many object graphs created in a different AppDomain.
But then again, it sounds like you are almost completely duplicating the functionality of IIS, so I would look/reflector into the System.Web.Hosting namespace and see how they handle it and their WorkerThreadPool etc....
6. Maybe there is a way to marshal the actual socket connection?
The 6th option is IMO the best one.
From the process's perspective a socket is just a handle. AppDomains reside in a single process, which means AppDomains can exchange socket handles.
If socket marshalling does not work, you can try recreating the socket in the other AppDomain. You can use DuplicateAndClose to do this.
If that does not work either, you should do some performance testing to choose the best data transfer method (I would choose named pipes or memory mapped files).

Buffer pool management using C#

We need to develop some kind of buffer management for an application we are developing using C#.
Essentially, the application receives messages from devices as and when they come in (there could be many in a short space of time). We need to queue them up in some kind of buffer pool so that we can process them in a managed fashion.
We were thinking of allocating a block of memory in 256 byte chunks (all messages are less than that) and then using buffer pool management to have a pool of available buffers that can be used for incoming messages and a pool of buffers ready to be processed.
So the flow would be "Get a buffer" (process it) "Release buffer" or "Leave it in the pool". We would also need to know when the buffer was filling up.
Potentially, we would also need a way to "peek" into the buffers to see what the highest priority buffer in the pool is rather than always getting the next buffer.
Is there already support for this in .NET or is there some open source code that we could use?
C#'s memory management is actually quite good, so instead of having a pool of buffers you could just allocate exactly what you need and stick it into a queue. Once you are done with the buffer, just let the garbage collector handle it.
One other option (knowing only very little about your application) is to process the messages minimally as you get them and turn them into full-fledged objects (with priorities and all); then your queue could prioritize them just by inspecting the right set of attributes or methods.
If your messages come in too fast even for minimal processing you could have a two queue system. One is just a queue of unprocessed buffers, and the next queue is the queue of message objects built from the buffers.
I hope this helps.
@grieve: Networking is native, meaning that when buffers are used to receive/send data on the network, they are pinned in memory. See my comments below for elaboration.
Why wouldn't you just receive the messages, create a DeviceMessage (for lack of a better name) object, and put that object into a Queue? If prioritization is important, implement a PriorityQueue class that handles it automatically (by placing the DeviceMessage objects in priority order as they're inserted into the queue). Seems like a more OO approach, and it would simplify maintenance over time with regard to the prioritization.
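A sketch of that suggestion (DeviceMessage and PriorityMessageQueue are illustrative names; on .NET 6+ the built-in PriorityQueue<TElement, TPriority> behind a lock would also do):
// requires: using System.Collections.Generic;
public class DeviceMessage
{
    public int Priority { get; set; }   // lower value = more urgent
    public byte[] Payload { get; set; }
}

public class PriorityMessageQueue
{
    // one FIFO bucket per priority keeps arrival order stable within a priority level
    private readonly SortedDictionary<int, Queue<DeviceMessage>> buckets =
        new SortedDictionary<int, Queue<DeviceMessage>>();
    private readonly object gate = new object();

    public void Enqueue(DeviceMessage msg)
    {
        lock (gate)
        {
            Queue<DeviceMessage> q;
            if (!buckets.TryGetValue(msg.Priority, out q))
                buckets[msg.Priority] = q = new Queue<DeviceMessage>();
            q.Enqueue(msg);
        }
    }

    public DeviceMessage TryDequeue()   // returns null when the queue is empty
    {
        lock (gate)
        {
            foreach (var pair in buckets)   // SortedDictionary iterates in ascending priority order
                if (pair.Value.Count > 0)
                    return pair.Value.Dequeue();
            return null;
        }
    }
}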
I know this is an old post, but I think you should take a look at the memory pool implemented in the ILNumerics project. I think they did exactly what you need and it is a very nice piece of code.
Download the code at http://ilnumerics.net/ and take a look at the file ILMemoryPool.cs
I'm doing something similar. I have messages coming in on MTA threads that need to be serviced on STA threads.
I used a BlockingCollection (part of the parallel fx extensions) that is monitored by several STA threads (configurable, but defaults to xr * the number of cores). Each thread tries to pop a message off the queue. They either time out and try again or successfully pop a message off and service it.
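The consumer side of that scheme looks roughly like this (a sketch; DeviceMessage, Service(...) and the 500ms timeout are illustrative):
// requires: using System.Collections.Concurrent; using System.Threading;
private readonly BlockingCollection<DeviceMessage> queue = new BlockingCollection<DeviceMessage>();

// one of these loops runs per STA consumer thread
private void ConsumeLoop(CancellationToken ct)
{
    while (!ct.IsCancellationRequested)
    {
        DeviceMessage msg;
        // wait up to 500ms for work, then loop so cancellation gets re-checked
        if (queue.TryTake(out msg, 500))
            Service(msg);
    }
}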
I've got it wired with perfmon counters to keep track of idle time, job lengths, incoming messages, etc, which can be used to tweak the queue's settings.
You'd have to implement a custom collection, or perhaps extend BC, to implement queue item priorities.
One of the reasons why I implemented it this way is that, as I understand it, queueing theory generally favors single-line, multiple-servers (why do I feel like I'm going to catch crap about that?).
