Performance limits for NetMQ - C#

I am sending messages from various places on my 10G network. I am using a Pub/Sub pattern.
The messages I send are serialized with ZeroFormatter and are approximately 270 bytes long.
Once I start to send over 150K messages per second, I notice the subscriber starting to miss messages.
How do I work out the limits of what I can expect to be able to send?
EDIT 1:
I am sending just under 1 billion bits per second, which is a tenth of the capacity of my network. Beyond this point I start to miss messages. Would this be due to CPU issues? Neither the sender nor the receiver seems highly utilized...
private void BackgroundProcess()
{
    int msgSeqNum = 0;
    using (var server = new PublisherSocket())
    {
        server.Options.SendHighWatermark = 1000;
        server.Bind(Connection);
        var address = Key;
        FastTickData fastTickData;
        while (true)
        {
            if (O.TryTake(out fastTickData, 60000))
            {
                msgSeqNum++;
                server.SendMoreFrame(address)
                      .SendMoreFrame(msgSeqNum.ToString())
                      .SendMoreFrame(DateTime.UtcNow.ToString("yyyyMMddTHHmmssffffff"))
                      .SendFrame(ZeroFormatterSerializer.Serialize(fastTickData));
            }
        }
    }
}
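One thing worth checking before blaming the network: with PUB/SUB over TCP, once a socket queue fills to its high watermark, the publisher silently drops messages for the slow subscriber, so with SendHighWatermark = 1000 the drop point is often the HWM rather than bandwidth or CPU. Below is a minimal subscriber-side sketch (NetMQ; the endpoint and topic are illustrative, not from the question) that raises the receive HWM to absorb bursts:

using (var sub = new SubscriberSocket())
{
    // Default HWM is 1000 messages; raise it so short bursts are
    // buffered at the socket instead of dropped.
    sub.Options.ReceiveHighWatermark = 100000;
    sub.Connect("tcp://publisher-host:5556");
    sub.Subscribe("Key");
    while (true)
    {
        // Frames: address, sequence number, timestamp, payload
        var msg = sub.ReceiveMultipartMessage(4);
    }
}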

Related

Kafka very high latency C#

I am doing some performance tests on Apache Kafka to compare it with alternatives like RabbitMQ and ActiveMQ. The idea is to use it in a messaging system for agents' communication.
I am testing multiple scenarios (one-to-one, broadcast, and many-to-one) with different numbers of publishers and subscribers, and therefore different loads. Even in the lowest-load scenario, one-to-one with 10 pairs of agents sending 500 messages with a 1 ms delay between sends, I am experiencing very high latencies (an average of ~200 ms). If we go up to 100 pairs, the numbers rise to ~1500 ms. The same thing happens with broadcast and many-to-one.
I am using Windows with Kafka 2.12-2.5.0 and ZooKeeper 3.6.1, with the C# .NET client Confluent.Kafka 1.4.2. I have already tried some properties, like LingerMs = 0, following some posts I found. Both Kafka and ZooKeeper run with default settings.
Here is a simple test program that reproduces the problem:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using Confluent.Kafka;

namespace KafkaSetupAgain
{
    class Program
    {
        static void Main(string[] args)
        {
            int numberOfMessages = 500;
            int numberOfPublishers = 10;
            int numberOfSubscribers = 10;
            int timeOfRun = 30000;

            List<MCVESubscriber> Subscribers = new List<MCVESubscriber>();
            for (int i = 0; i < numberOfSubscribers; i++)
            {
                int id = i; // copy the loop variable so each thread sees its own value
                MCVESubscriber ZeroMqSubscriber = new MCVESubscriber();
                new Thread(() =>
                {
                    ZeroMqSubscriber.read(id.ToString());
                }).Start();
                Subscribers.Add(ZeroMqSubscriber);
            }

            Thread.Sleep(10000); // to make sure all subscribers started

            for (int i = 0; i < numberOfPublishers; i++)
            {
                int id = i; // same loop-variable copy as above
                MCVEPublisher ZeroMqPublisherBroadcast = new MCVEPublisher();
                new Thread(() =>
                {
                    ZeroMqPublisherBroadcast.publish(numberOfMessages, id.ToString());
                }).Start();
            }

            Thread.Sleep(timeOfRun);

            foreach (MCVESubscriber Subscriber in Subscribers)
            {
                Subscriber.PrintMessages("file.csv");
            }
        }

        public class MCVEPublisher
        {
            public void publish(int numberOfMessages, string topic)
            {
                var config = new ProducerConfig
                {
                    BootstrapServers = "localhost:9092",
                    LingerMs = 0,
                    Acks = 0,
                };
                var producer = new ProducerBuilder<Null, string>(config).Build();

                int success = 0;
                int failure = 0;
                Thread.Sleep(3500);

                for (int i = 0; i < numberOfMessages; i++)
                {
                    Thread.Sleep(1);
                    // Convert Stopwatch ticks to ms via Stopwatch.Frequency
                    // (Stopwatch ticks are not TimeSpan ticks).
                    long milliseconds = System.Diagnostics.Stopwatch.GetTimestamp() * 1000 / System.Diagnostics.Stopwatch.Frequency;
                    var t = producer.ProduceAsync(topic, new Message<Null, string> { Value = milliseconds.ToString() });
                    t.ContinueWith(task =>
                    {
                        if (task.IsFaulted)
                            failure++;
                        else
                            success++;
                    });
                }

                producer.Flush(TimeSpan.FromSeconds(10)); // wait for outstanding deliveries before reporting
                Console.WriteLine("Success: " + success + " Failure:" + failure);
            }
        }

        public class MCVESubscriber
        {
            private List<string> prints = new List<string>();

            public void read(string topic)
            {
                var config = new ConsumerConfig()
                {
                    BootstrapServers = "localhost:9092",
                    EnableAutoCommit = false,
                    FetchErrorBackoffMs = 1,
                };
                var consumerConfig = new ConsumerConfig(config);
                consumerConfig.GroupId = Guid.NewGuid().ToString();
                consumerConfig.AutoOffsetReset = AutoOffsetReset.Earliest;
                consumerConfig.EnableAutoCommit = false;

                // Build from consumerConfig so GroupId and AutoOffsetReset actually apply.
                using (var consumer = new ConsumerBuilder<Ignore, string>(consumerConfig).Build())
                {
                    consumer.Subscribe(new[] { topic });
                    while (true)
                    {
                        var consumeResult = consumer.Consume();
                        long milliseconds = System.Diagnostics.Stopwatch.GetTimestamp() * 1000 / System.Diagnostics.Stopwatch.Frequency;
                        prints.Add(consumeResult.Message.Value + ";" + milliseconds.ToString());
                    }
                }
            }

            public void PrintMessages(string path)
            {
                Console.WriteLine("printing " + prints.Count);
                File.AppendAllLines(path, prints);
            }
        }
    }
}
Does anyone know what the problem could be? What configs can I change to improve latency?
Thanks,
Davide Costa
Kafka is not really built for low-latency message distribution, but for high availability. It can be configured to have lower latency, but you start losing a lot of the advantages Kafka offers.
A few tips/comments below:
On the KafkaProducer side, in general, you want to wait until there are enough messages to send, so as to batch messages more efficiently. That's the linger.ms property you already mentioned. Typically it is set to something like 50 ms, so by setting it to zero you're effectively telling the producer to send data as fast as it gets it. This may make the producer more "chatty", but you have the assurance that it will send the data to the cluster as soon as it gets it.
However, once a message is "produced" into Kafka, the producer waits until it gets an ACK from the lower layer confirming that the broker has received the message successfully. There are multiple options here:
* acks=0: consider a message "received" once it has been sent by the producer. That is, once the local network layer has finished sending it, the producer considers it "sent and acknowledged".
* acks=1: wait for an ACK from the leader broker you're sending the message to, depending on which partition it gets assigned, so you at least know one broker has it. THIS IS THE DEFAULT.
* acks=all: wait for an ACK from the leader broker you're sending the message to, PLUS an ACK from each of that partition's replicas on the other brokers. This means that if your cluster has a replication factor of 3 and the message is sent to broker 1, for example, broker 1 replicates it to brokers 2 and 3 (which hold copies of the same partition), waits for those brokers to reply saying they got the message, and only THEN replies to the producer saying the message has been ACK'd. This is typically used in environments where you never want to lose a single message, so you always guarantee that there will be three copies of your message before the producer moves on.
Official acks explanation from the Kafka docs:
https://kafka.apache.org/25/documentation.html#acks
There are other settings to consider, like producer compression and broker compression, that might add more latency/overhead, but if you're using the defaults (no producer compression, and the broker compression option set to "producer"), there should be no additional latency in those steps.
Having said all that, I would suggest you try setting the acks option in the producer to 0 and see how your latency changes. My guess is you will get much better latency, BUT also understand that there is no guarantee your messages are actually being received and stored correctly. A flaky network, a network partition, etc., could cause you to lose data. That might be OK for your use case, but make sure you're aware of it.
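A minimal sketch of those producer settings with Confluent.Kafka (broker address as in the question; Acks.None is the enum form of acks=0):

var config = new ProducerConfig
{
    BootstrapServers = "localhost:9092",
    LingerMs = 0,     // send immediately rather than waiting to batch
    Acks = Acks.None, // acks=0: don't wait for any broker acknowledgement
};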

.NET Core (on Windows): DataReceived call frequency of one SerialPort affected by another SerialPort

I have a .NET Core 2.1 console application that communicates with 2 devices, both half-duplex, via 2 COM ports (RS-232).
Device A is on COM1 at baud rate 9600; my app polls it every 200 ms and gets a response within 50 ms.
Device B is on COM2 at baud rate 1200; my app receives its poll every 400 ms and responds within 50 ms.
The code for the two COM ports is totally separate: no shared variables, no references, etc.
For device A:
private ConcurrentQueue<object> outgoingMessageQueue = new ConcurrentQueue<object>();

this.comPort1.DataReceived += (a, b) =>
{
    while (this.comPort1.BytesToRead > 0)
    {
        var value = this.comPort1.ReadByte();
        buffer.Add((byte)value);
    }
    if (CheckIsFullMessage(buffer))
    {
        // fire event for consumers
    }
};

ThreadPool.QueueUserWorkItem((_) =>
{
    while (true)
    {
        Thread.Sleep(200);
        if (this.outgoingMessageQueue.TryDequeue(out object pendingForWrite))
            this.comPort1.Write(pendingForWrite);
        else
            this.comPort1.Write(new PollMsg());
    }
});

// business logic, queue a request at any time:
this.outgoingMessageQueue.Enqueue(new requestMessage());
For device B:
this.comPort2.DataReceived += (a, b) =>
{
    while (this.comPort2.BytesToRead > 0)
    {
        var value = this.comPort2.ReadByte();
        buffer.Add((byte)value);
    }
    if (CheckIsFullMessage(buffer))
    {
        // trigger business logic, consume the buffer and construct a response:
        // this.comPort2.Write(response, 0, response.Length);
    }
};
I noticed that if I turn on device B, the DataReceived for device A (comPort1) is randomly delayed (from milliseconds to seconds). During the delay period, device A's 200 ms polling never stops, so I suddenly get a huge amount of data from device A in a single DataReceived call.
Could anyone help explain why these two COM ports affect each other?
----- more tests -----
I did a test connecting 3 of device A on 3 COM ports in the app; they work fine, with no delayed DataReceived.
After more testing and some posts from the web, I confirmed this behavior on .NET Core: multiple SerialPort writes & receives can delay the firing of DataReceived. So rather than wait, I added code to actively pull:
public void Start()
{
    this.comPort.DataReceived += (_, __) => { this.PullPortDataBuffer(); };

    // For very time-accurate calls (my app sends & receives data every 200 ms),
    // use a dedicated thread rather than a timer.
    this.pullingSerialPortBufferLoop = new Thread(() =>
    {
        while (true)
        {
            Thread.Sleep(200);
            this.PullPortDataBuffer();
        }
    });
    this.pullingSerialPortBufferLoop.Start();
}

private List<byte> buffer = new List<byte>();
private int onPullingComPortBuffer; // 0 = idle, 1 = pulling

private void PullPortDataBuffer()
{
    if (0 == Interlocked.CompareExchange(ref this.onPullingComPortBuffer, 1, 0))
    {
        try
        {
            while (this.comPort.BytesToRead > 0)
            {
                var value = this.comPort.ReadByte();
                this.buffer.Add((byte)value);
            }
            this.ConsumeBufferIfFull(this.buffer);
        }
        catch (Exception ex)
        {
        }
        finally
        {
            this.onPullingComPortBuffer = 0;
        }
    }
    else
    {
        if (logger.IsDebugEnabled)
            logger.Debug(this.comPort.PortName + " concurrent enter Port_DataReceived");
    }
}
From my testing, the issue is gone.

communicating with multiple slave (Modbus protocol based)

I am developing an application in which, let's say, 50-60 Modbus-supporting devices (slaves) are connected to a COM port and communicate with my application in a request-response mechanism.
I want a request to be sent to every meter every 15 minutes, with the responses received from the meters one by one.
To do this I am using a System.Timers.Timer to call a method, say ReadAllSlave(), every 15 minutes.
In ReadAllSlave() I use a for loop to send the requests and receive the responses, with Thread.Sleep to maintain the delay between them, but it's not working: the loop executes in a very strange way.
private void StartPoll()
{
    double txtSampleRate = 15 * 60 * 1000;
    timer.Interval = txtSampleRate;
    timer.AutoReset = true;
    timer.Start();
}

void timer_Elapsed(object sender, ElapsedEventArgs e)
{
    for (int index = 0; index < meterCount; index++)
    {
        // Sending request to the connected meter..
        mb.SendFc3(m_slaveID[0], m_startRegAdd[0], m_noOfReg[0], ref value_meter);
        if (mb.modbusStatus == "Read successful")
        {
            // Some code for writing the values to the SQL Express database
        }
        // Wait for some time so that we will not get a timeout error for the next request..
        Thread.Sleep(10000);
    }
}
Can anyone please suggest the best approach to implement this?
Thanks in advance.
It looks like your problem is a trivial one... you're always interrogating the same slave!
"index" is never used inside your loop...
What about something like this:
    mb.SendFc3(m_slaveID[index], m_startRegAdd[index], m_noOfReg[index], ref value_meter);
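For completeness, a sketch of the corrected loop body (same array names as in the question):

for (int index = 0; index < meterCount; index++)
{
    // Each iteration now addresses a different slave.
    mb.SendFc3(m_slaveID[index], m_startRegAdd[index], m_noOfReg[index], ref value_meter);
    if (mb.modbusStatus == "Read successful")
    {
        // write the values to the database
    }
    Thread.Sleep(10000); // give the bus time before the next request
}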

ZeroMQ/0MQ Push/Pull memory and routing issues

I have played for a while with ZeroMQ and have a couple of questions/problems that I came up with. I would appreciate it if any contributor to ZeroMQ could chime in, or anyone who has used or currently uses the library.
* Let's say I have one router/forwarder and 2 different clients (c1, c2). I want to push messages from client1 to client2 through the routing device. The router pulls messages from whichever client (here client1) and publishes them to any subscribed client (here client2). Currently the only way I see to route such messages to the appropriate client is through pub/sub. However: a) I want to decide how to route at runtime, by sending a routingTo tag along with the message body; b) I want to use push/pull to forward to clients, not pub/sub, because I want the blocking behavior you get when the high-water-mark property is reached; c) I want c1 and c2 to connect on exactly 1 port for pushing and 1 port for subscribing. Can I somehow make changes on the router side so that I don't have to use pub/sub, or is pub/sub the only way to route to clients, even when I know on the routing side where a message is supposed to be forwarded to? I read that pub/sub drops messages when the queue size exceeds the HWM, which I don't want. I also do not want to implement the request/reply pattern, because it adds unnecessary overhead and I do not need replies.
* After running the code below (Push/Pull -> Pub/Sub), having sent all messages and received confirmation that all messages arrived, the client that pushed the messages still shows a huge memory footprint; apparently there are still huge numbers of messages in the PUSH socket's queue. Why is that, and what can I do to fix it?
Here is my code:
ROUTER:
class Program
{
    static void Main(string[] args)
    {
        using (var context = new Context(1))
        {
            using (Socket socketIn = context.Socket(SocketType.PULL),
                          socketOut = context.Socket(SocketType.XPUB))
            {
                socketIn.HWM = 10000;
                socketOut.Bind("tcp://*:5560"); // forwards on this port
                socketIn.Bind("tcp://*:5559");  // listens on this port
                Console.WriteLine("Router started and running...");
                while (true)
                {
                    // Receive Message
                    byte[] address = socketIn.Recv();
                    byte[] body = socketIn.Recv();
                    // Forward Message
                    socketOut.SendMore(address);
                    socketOut.Send(body);
                }
            }
        }
    }
}
CLIENT1:
class Program
{
    static void Main(string[] args)
    {
        using (var context = new Context(1))
        {
            using (Socket socketIn = context.Socket(SocketType.SUB),
                          socketOut = context.Socket(SocketType.PUSH))
            {
                byte[] iAM = Encoding.Unicode.GetBytes("Client1");
                byte[] youAre = Encoding.Unicode.GetBytes("Client2");
                byte[] msgBody = new byte[16];
                socketOut.HWM = 10000;
                socketOut.Connect("tcp://localhost:5559");
                socketIn.Connect("tcp://localhost:5560");
                socketIn.Subscribe(iAM);
                Console.WriteLine("Press key to kick off Test Client1 Sending Routine");
                Console.ReadLine();
                for (int counter = 1; counter <= 10000000; counter++)
                {
                    // Send Message
                    socketOut.SendMore(youAre);
                    socketOut.Send(msgBody);
                }
                Console.WriteLine("Client1: Finished Sending");
                Console.ReadLine();
            }
        }
    }
}
CLIENT2:
class Program
{
    public static int msgCounter;

    static void Main(string[] args)
    {
        msgCounter = 0;
        using (var context = new Context(1))
        {
            using (Socket socketIn = context.Socket(SocketType.SUB),
                          socketOut = context.Socket(SocketType.PUSH))
            {
                byte[] iAM = Encoding.Unicode.GetBytes("Client2");
                socketOut.Connect("tcp://localhost:5559");
                socketIn.Connect("tcp://localhost:5560");
                socketIn.Subscribe(iAM);
                Console.WriteLine("Client2: Started Listening");

                // Receive First Message
                byte[] address = socketIn.Recv();
                byte[] body = socketIn.Recv();
                msgCounter += 1;
                Console.WriteLine("Received first message");

                Stopwatch watch = new Stopwatch();
                watch.Start();
                while (msgCounter < 10000000)
                {
                    // Receive Message
                    address = socketIn.Recv();
                    body = socketIn.Recv();
                    msgCounter += 1;
                }
                watch.Stop();
                Console.WriteLine("Elapsed Time: " + watch.ElapsedMilliseconds + "ms");
                Console.ReadLine();
            }
        }
    }
}
I'm going to suggest that your architecture may be a bit off here.
1) If you need exactly one PUSH and exactly one PULL, remove the device from the middle (see the sketch after this answer). Devices are added to an architecture explicitly to manage multiple consumers, so that you don't have to update producers each time you add a node. When/if you do get to where you need multiple consumers and/or producers, you're going to need a connection to each node on your device - that's just how they work. In this case, it sounds as though the device is overly complicating your solution.
2) The idea of having the "route to" tag really boggles my mind. Probably the biggest reason to choose messaging over other integration options is to decouple your producers and consumers so that neither side has to know anything about the other (other than where to send the messages, in the case of broker-less designs). Adding routing information directly to your logic breaks this.
As to the memory overhead, I've never experienced this. But then, I've never used the .NET driver for ZeroMQ before, so an uneducated guess would be to look at the .NET driver itself.
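To illustrate point 1, here is a minimal sketch in the same old clrzmq-style API as the question (ports illustrative): the producer binds PUSH and the consumer connects PULL directly, with no device in between. A PUSH socket blocks at its HWM instead of dropping:

// Producer: bind PUSH directly; Send blocks once the HWM is reached.
using (var context = new Context(1))
using (Socket socketOut = context.Socket(SocketType.PUSH))
{
    socketOut.HWM = 10000;
    socketOut.Bind("tcp://*:5559");
    socketOut.Send(new byte[16]);
}

// Consumer: connect PULL straight to the producer.
using (var context = new Context(1))
using (Socket socketIn = context.Socket(SocketType.PULL))
{
    socketIn.Connect("tcp://localhost:5559");
    byte[] body = socketIn.Recv();
}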

c# async sockets, read all buffer

I have a client that sends a lot of data to the server from different threads.
Each packet uses the following format:
PACKET_ID
CONTENT
END_OF_PACKET_INDICATOR
I have the following OnDataReceived function:
public void OnDataReceived(IAsyncResult asyn)
{
    SocketPacket socketData = (SocketPacket)asyn.AsyncState;
    int iRx = 0;
    iRx = socketData.m_currentSocket.EndReceive(asyn);
    char[] chars = new char[iRx + 1];
    System.Text.Decoder d = System.Text.Encoding.UTF8.GetDecoder();
    int charLen = d.GetChars(socketData.dataBuffer, 0, iRx, chars, 0);
    MessageBox.Show("Incoming data: " + socketData.dataBuffer.Length.ToString() + " from socket(" + socketData.socket_id + ")");
    char[] PACKET_END_IDENTIFIER = { (char)2, (char)1, (char)2, (char)1 };
    for (int i = 0; i < iRx; i++)
    {
        GLOBAL_BUFFER.Add(chars[i]); // index with i; chars[iRx] would re-add the same trailing element
    }
    if (PacketEndReached(chars, PACKET_END_IDENTIFIER))
    {
        // done reading the data
        PROCESS_PACKET(GLOBAL_BUFFER);
    }
    WaitForData(socketData.m_currentSocket, socketData.socket_id);
}
My socket buffer size is set to 100. If I send 1000 bytes, they get split into 10 chunks and OnDataReceived gets triggered 10 times.
All I need to do is keep reading the data into a buffer for each individual packet sent by the client until PacketEndReached fires,
then pass the buffer to another function that will process the data.
If I define a GLOBAL_BUFFER for storing incoming data, then if the client sends data from multiple threads, wouldn't the data get mixed up? I need a way to read all the data for each individual packet sent by the client.
Thanks!
UPDATE:
This is my current class:
public partial class TCP_SERVER
{
    const int MAX_CLIENTS = 3000;
    const int MAX_SOCKET_BUFFER_SIZE = 10;
    public AsyncCallback pfnWorkerCallBack;
    private Socket m_mainSocket;
    private Socket[] m_workerSocket = new Socket[MAX_CLIENTS];
    private int m_clientCount = 0;
    public List<char> GLOBAL_BUFFER; // buffer type not shown in the original post

    public void StartServer(int listen_port) { /* ... */ }
    public void OnClientConnect(IAsyncResult asyn) { /* ... */ }
    public void ProcessIncomingData(char[] INCOMING_DATA, int CLIENT_ID) { /* ... */ }
    public void OnDataReceived(IAsyncResult asyn) { /* ... */ }
}
As you can see, GLOBAL_BUFFER is defined 'globally'. If the client sends packet_1, which takes 10 seconds, and at the same time packet_2, which takes 2 seconds, the data would get mixed up. I need to collect the data for each packet individually.
If at all possible, I would recommend allowing each client thread to have its own connection to the server. Doing so will help the Winsock stack differentiate the messages from each thread and avoid any bleeding of packets between messages. This will effectively allow you to benefit from the stack's ability to decipher which messages (and message segments) are intended to be grouped together before passing them to your application as a complete message.
The message design you describe, while very primitive, can only work (reliably) if you separate your threads onto different connections (or otherwise guarantee that only a single message is sent by the client at a time). You employ a very primitive message-framing technique in your communication, which will aid in determining message boundaries, but the reason it is failing is that socketData.m_currentSocket.EndReceive(asyn); only tells you the number of bytes received when the event is raised (not necessarily the total number of bytes in the message). Rather than relying on it to tell you how many bytes have been read, I'd suggest reading the incoming message incrementally from a loop within your async handler, reading very small message segments until it discovers your end-of-message byte sequence. Doing so will tell your event when to quit reading and pass the data on to something else to process it.
The way I typically approach message framing is to have a before-message eye-catcher (some value that will rarely, if ever, be seen in the messaging), followed by the length of the message (encoded to taste; I personally use binary encoding for its efficiency), the message content, and finally a second eye-catcher at the end of the message. The eye-catchers serve as logical cues for message breaks in the protocol, and the message length tells your server explicitly how many bytes to wait for. Doing it this way, you are guaranteed to receive the number of bytes necessary (if you don't, it is a problem, so discard and/or throw an exception), and it provides a very explicit boundary between messages that you can code to, which allows intermittent spot checking and validation.
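A minimal sketch of that framing scheme (the magic value and helper name are illustrative, not from the answer):

// Layout: [4-byte eye-catcher][4-byte length][payload][4-byte eye-catcher]
static byte[] FrameMessage(byte[] payload)
{
    const uint EyeCatcher = 0xBEEFCAFE; // any value unlikely to occur in data
    using (var ms = new System.IO.MemoryStream())
    using (var writer = new System.IO.BinaryWriter(ms))
    {
        writer.Write(EyeCatcher);     // opening eye-catcher
        writer.Write(payload.Length); // explicit, binary-encoded length
        writer.Write(payload);        // message content
        writer.Write(EyeCatcher);     // closing eye-catcher
        return ms.ToArray();
    }
}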
Simply use a Dictionary<string, List<char>> to replace your current GLOBAL_BUFFER, and store each PACKET_ID's data in its own List.
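A hypothetical sketch of that idea (field and method names are illustrative):

private Dictionary<string, List<char>> packetBuffers = new Dictionary<string, List<char>>();

private void AppendToPacket(string packetId, char[] data)
{
    // Each PACKET_ID gets its own buffer, so interleaved packets don't mix.
    if (!packetBuffers.TryGetValue(packetId, out List<char> buffer))
    {
        buffer = new List<char>();
        packetBuffers[packetId] = buffer;
    }
    buffer.AddRange(data);
}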
I also strongly recommend a great socket framework, SuperSocket; you won't need to write any socket code yourself, and it will significantly improve your development efficiency.
