I am trying to design a multithreaded Windows application that mostly serves to let our clients send emails quickly to their customers (there can be millions of them, as one client is a big telecommunication company), and I need design hints. (Sorry the question is long.)
I have read a fair number of articles about multithreaded applications. I also read about SmartThreadPool, the .NET ThreadPool, the Task Parallel Library and other SO questions. But I could not come up with a correct design. My logic is like this:
When the program (the email engine) starts, a timer starts and checks whether there are any email campaigns in the database (Campaigns table) that have Status 1 (new campaign).
If there are, the campaign's subscribers should be queried from the DB and written to another table (via SqlBulkCopy) called SubscriberReports, and the campaign's Status should be updated to 2 in the Campaigns table.
The timer also listens for campaigns with Status 2 and calls another method that customizes the campaign for each subscriber, creating a struct that holds the customized properties of that subscriber.
Thirdly, a SendEmail method is invoked to send the email via SMTP. What I have tried so far is below (I know that ThreadPool is wrong here, and I have a bunch of other mistakes). Can you please suggest how to design such an application? I highly appreciate any help. Thanks a lot for your time.
private void ProcessTimer(object Source, ElapsedEventArgs e)
{
IEnumerable<Campaigns> campaignsListStatusOne = // get campaigns with Status 1 (new)
IEnumerable<Campaigns> campaignsListStatusTwo = // get campaigns with Status 2 (subscribers copied)
foreach (Campaigns _campaign in campaignsListStatusOne)
{
ThreadPool.QueueUserWorkItem(new WaitCallback(CheckNewCampaign), _campaign.CampaignID);
}
foreach (Campaigns _campaign in campaignsListStatusTwo)
{
ThreadPool.QueueUserWorkItem(new WaitCallback(CustomizeMail), _campaign.CampaignID);
}
}
private void CheckNewCampaign(object state)
{
int campaignID = (int)state;
Campaigns campaign = new Campaigns(); // declared here so it is in scope for UpdateStatus
DataTable dtCampaignSubscribers = // get subscribers based on Campaign ID (SqlBulkCopy into SubscriberReports)
campaign.UpdateStatus(campaignID, 2);
}
private void CustomizeMail(object state)
{
int campaignID = (int)state;
CampaignCustomazition campaignCustomizer;
IEnumerable<SubscriberReports> reportList = // get subscribers to be sent from Reports table
foreach (SubscriberReports report in reportList)
{ // 3 database related methods are here
campaignCustomizer = new CampaignCustomazition(report.CampaignID, report.SubscriberID);
campaignCustomizer.CustomizeSource(report.CampaignID, report.SubscriberID, out campaignCustomizer.source, out campaignCustomizer.format);
campaignCustomizer.CustomizeCampaignDetails(report.CampaignID, report.SubscriberID, out campaignCustomizer.subject, out campaignCustomizer.fromName, out campaignCustomizer.fromEmail, out campaignCustomizer.replyEmail);
campaignCustomizer.CustomizeSubscriberDetails(report.SubscriberID, out campaignCustomizer.email, out campaignCustomizer.fullName);
ThreadPool.QueueUserWorkItem(new WaitCallback(SendMail), campaignCustomizer);
}
}
private void SendMail(object state)
{
CampaignCustomazition campaignCustomizer = (CampaignCustomazition)state;
//send email based on info at campaignCustomizer via SMTP and update DB record if it is success.
}
There is little to be gained here by using threading. What threads buy you is more CPU cycles, assuming you have a machine with multiple cores, which is pretty standard these days. But that's not what you need to get the job done quicker. You need more database and email servers, and surely you only have one of each. Your program will burn very little CPU; it is constantly waiting for the database query and the email server to complete their jobs.
The only way to get ahead is to overlap the two delays: one thread could constantly be waiting for the database engine while the other is constantly waiting for the email server, which is better than one thread waiting on both in turn.
That's not likely to buy you much either, though, because there's a big mismatch between the two. The database engine can give you thousands of email addresses per second, while the email server can send only a few hundred emails per second. Everything is throttled by how fast the email server works.
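If you do decide to overlap the two waits anyway, a minimal producer/consumer sketch could look like the one below. It assumes .NET 4's BlockingCollection, a made-up LoadCustomizedMessagesFromDb helper for the database side and a placeholder SMTP host name; it is a sketch of the overlap idea, not a full design.

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Net.Mail;
using System.Threading.Tasks;

static class OverlappedSender
{
    public static void Run()
    {
        // One thread waits on the database, the other waits on the SMTP server,
        // so the two delays overlap instead of adding up.
        var queue = new BlockingCollection<MailMessage>(boundedCapacity: 1000);

        var dbReader = Task.Factory.StartNew(() =>
        {
            foreach (MailMessage message in LoadCustomizedMessagesFromDb())
                queue.Add(message);                  // blocks when the SMTP side falls behind
            queue.CompleteAdding();
        });

        var smtpSender = Task.Factory.StartNew(() =>
        {
            var smtp = new SmtpClient("smtp.example.local");   // assumed server name
            foreach (MailMessage message in queue.GetConsumingEnumerable())
                smtp.Send(message);                  // this is where almost all the time goes
        });

        Task.WaitAll(dbReader, smtpSender);
    }

    // Placeholder for the real query against the SubscriberReports table.
    static IEnumerable<MailMessage> LoadCustomizedMessagesFromDb()
    {
        yield break;
    }
}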
Given the low odds of getting ahead, I'd recommend you don't try to get yourself into trouble with threading at all. It has a knack for producing very hard to diagnose failures if you don't lock properly. The amount of time you can spend troubleshooting this can greatly exceed the operational gain from moving a wee bit faster.
If you are contemplating threading to avoid freezing a user interface then that's a reasonable use for threading. Use BackgroundWorker. The MSDN Library has excellent help for it.
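A minimal BackgroundWorker sketch, in case the UI-freeze scenario is what you are after (the form and the ProcessCampaigns call are stand-ins for your own code):

using System.ComponentModel;
using System.Windows.Forms;

public partial class MainForm : Form
{
    private void StartCampaignProcessing()
    {
        var worker = new BackgroundWorker();

        worker.DoWork += (sender, e) =>
        {
            // Runs on a thread-pool thread; do NOT touch UI controls here.
            ProcessCampaigns();                      // stand-in for the database + SMTP work
        };

        worker.RunWorkerCompleted += (sender, e) =>
        {
            // Back on the UI thread; safe to update controls here.
            if (e.Error != null)
                MessageBox.Show(e.Error.Message);
            else
                Text = "Campaign processed.";        // e.g. update the form title or a label
        };

        worker.RunWorkerAsync();                     // kick it off without blocking the UI
    }

    private void ProcessCampaigns() { /* query the DB, customize, send via SMTP */ }
}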
As I am new to multithreaded applications, I would like some advice from more experienced people before starting to write the code...
I need to queue data received on the serial port in the serial port event for further processing.
So I have the following event handler:
void jmPort_ReceivedEvent(object source, SerialEventArgs e)
{
SetStatusLabel("Iddle...", lbStatus);
SetPicVisibility(ledNotReceiving, true);
SetPicVisibility(ledReceiving, false);
String st = jmPort.ReadLine();
if (st != null)
{
lines.Enqueue(st); //"lines" is the ConcurrentQueue<string> object
StartDataProcessing(lines); //???
SetStatusLabel("Receiving data...", lbStatus);
SetPicVisibility(ledNotReceiving, false);
SetPicVisibility(ledReceiving, true);
}
else
{
jmPort.Close();
jmPort.Open();
}
}
Within StartDataProcessing I need to dequeue strings and update MANY UI controls (using InvokeRequired... this I already know :-)).
What is the best and collision-free (deadlock-free) approach to achieve this?
How do I call the StartDataProcessing method on multiple threads and safely dequeue (TryDequeue) the lines queue, do all the needed computations and update the UI controls?
I have to point out that the communication is very fast and that I am not using the standard SerialPort class. If I simply write all received strings to the console window without further processing, it works just fine.
I am working in .NET 4.5.
Thank you for any advice...
Updated question: OK, so what would be the best way to run the task from the DataReceived event using TPL? Is it necessary to create another class (object) that processes the data and uses callbacks to update the UI, or is it possible to call some form method from the event? I would be very happy if someone could point me to exactly what to do within the DataReceived event, and what to do as the first step, because studying all the possible ways is not a solution I have time for. I need to begin with some particular way... There are so many different possible multithreading approaches, and after reading about them I am still more confused and don't know which would be the best and fastest solution... Plain Thread(s), BackgroundWorker, TPL, async-await...? :-( Because my application uses .NET 4.5, I would like to use a state-of-the-art solution :-) Thank you for any advice...
So after a lot of trying it is working to my satisfaction now.
Finally I've used the standard .NET SerialPort class, as the third-party Serial class caused some problems at higher baud rates (115200). It uses WinAPI directly, so the final code was mixed managed and unmanaged. Now even the standard .NET 4.5 SerialPort class works well (I've let my application run successfully through a whole night).
So, for everyone who needs to deal with C#, SerialPort and higher rates (just for clarification: the device sending messages to the PC is an STM32F407, using USART 2; I've also tried it with an Arduino Due and it works as well), my DataReceived event is in the following form now:
private void serialPort1_DataReceived(object sender, System.IO.Ports.SerialDataReceivedEventArgs e)
{
//the SetXXXXX functions are using the .InvokeRequired approach
//because the UI components are updated from another thread than
//the thread they were created in
SetStatusLabel("Iddle...", lbStatus);
SetPicVisibility(Form1.frm.ledNotReceiving, true);
SetPicVisibility(Form1.frm.ledReceiving, false);
String st = serialPort1.ReadLine();
if (st != null)
{
lines.Enqueue(st);
Task.Factory.StartNew(() => StartDataProcessing(lines)); // lines is global ConcurrentQueue object so in fact there is no need to pass it as parameter
SetStatusLabel("Receiving data...", lbStatus);
SetPicVisibility(Form1.frm.ledNotReceiving, false);
SetPicVisibility(Form1.frm.ledReceiving, true);
}
}
Within the StartDataProcessing function:
1. TryDequeue(lines, out str).
2. Call ThreadPool.QueueUserWorkItem(lCallBack1, tmp); where tmp is the needed part of str (without the EOF, without the message number etc.) and lCallBack1 = new WaitCallback(DisplayData);.
3. Within the DisplayData function all the UI controls are updated.
This approach mixes the ThreadPool and TPL ways, but that is not a problem because TPL uses the ThreadPool under the hood anyway.
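For anyone who wants the shape of it, a simplified sketch of that StartDataProcessing/DisplayData pair is below; the tmp clean-up and the single lbStatus update are placeholders for the real parsing and for the MANY control updates.

using System;
using System.Collections.Concurrent;
using System.Threading;

private void StartDataProcessing(ConcurrentQueue<string> lines)
{
    string str;
    while (lines.TryDequeue(out str))                     // drain whatever has arrived so far
    {
        string tmp = str.TrimEnd('\r', '\n');             // simplified clean-up of the raw line
        ThreadPool.QueueUserWorkItem(new WaitCallback(DisplayData), tmp);
    }
}

private void DisplayData(object state)
{
    string data = (string)state;
    if (lbStatus.InvokeRequired)                          // marshal back to the UI thread
        lbStatus.Invoke(new Action(() => lbStatus.Text = data));
    else
        lbStatus.Text = data;
    // ...update the many other controls the same way
}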
Another working method I've tried was the following:
ThreadPool.QueueUserWorkItem(lCallBack, lines);
instead of :
Task.Factory.StartNew(() => StartDataProcessing(lines));
This method was working well, but I have not tested it in an overnight run.
By my subjective perception the Task-based method updated the controls more smoothly, but that may just be my personal feeling :-)
So, I hope this answer will help someone, as I know from forums that many people are dealing with unreliable communication between a microcontroller and a PC.
My (surprising :-) ) conclusion is that the standard .NET SerialPort is able to handle messages even at higher baud rates. If you still run into trouble with buffer overruns, try playing with the SerialPort buffer size and threshold (ReadBufferSize and ReceivedBytesThreshold). For me the settings 1024/500 are satisfactory (the max size of a message sent by the microcontroller is 255 bytes, so 500 bytes means 2 messages are in the buffer before the event is fired).
You can also remove all the SetXXXXX calls from the DataReceived event, as they are not really needed and they can slow down the communication a little...
I am very close to real-time data capturing now and it is exactly what I've needed.
Good luck to everyone :-)
Within the StartDataProcessing I need to dequeue strings and update MANY UI controls
No, you do not. You need to dequeue strings and then enqueue them again into the multiple queues for the different segments of the UI.
If you want to be fast, you scatter all operations and definitely the UI into separate windows that run their own separate message pumps and thus can update independently in separate UI threads.
The general process would be:
1. One thread handles the serial port, takes the data and queues it.
2. Another one dequeues it and distributes it to separate processing threads.
3. From those, the data goes to multiple output queues, each responsible for one part of the UI (depending on whether the UI turns into a bottleneck).
There is no need to be thread safe in dequeuing. How serial is the data? Can you skip data when another update for the same piece arrives?
Read up on TPL and tasks - there are base libraries for parallel processing which come with a ton of documentation.
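A bare-bones sketch of that pipeline, using BlockingCollection for the queues (the two output queues and the parsing are placeholders for whatever your UI segments actually need):

using System.Collections.Concurrent;
using System.Threading.Tasks;

class SerialPipeline
{
    // Raw lines from the serial port thread go in here.
    public readonly BlockingCollection<string> RawLines = new BlockingCollection<string>();

    // One queue per UI segment, each drained by its own window / UI thread.
    public readonly BlockingCollection<string> StatusQueue = new BlockingCollection<string>();
    public readonly BlockingCollection<double> ChartQueue = new BlockingCollection<double>();

    public void Start()
    {
        // The distributor: dequeues raw lines, does the computations, and routes
        // the results to whichever output queue(s) need them.
        Task.Factory.StartNew(() =>
        {
            foreach (string line in RawLines.GetConsumingEnumerable())
            {
                StatusQueue.Add(line);                    // e.g. a status panel shows the raw text
                double value;
                if (double.TryParse(line, out value))     // placeholder parsing
                    ChartQueue.Add(value);                // e.g. a chart window gets numeric samples
            }
        });
    }
}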
Background
Hi.
I am writing a program that analyzes packets for specific words contained therein. I need to analyze outgoing email, Jabber and ICQ traffic. If the words are found, the packet is blocked. I have this working, but I have a problem with file attachments and with email sent through the web.
Problems
Simple code:
while (Ndisapi.ReadPacket(hNdisapi, ref Request))
{
// some work
switch (protocol)
{
//....
case "HTTP":
// parse packet(byte[])
HTTP.HttpField field = HTTP.ParseHttp(ret);
if (field != null && field.Method == HTTP.HttpMethod.POST)
{
// analyze packet and drop if needed
DoWork();
}
break;
} // end switch
} // end while (ReadPacket)
The problem is the following. For example, I attach a 500 KB file to an email. The file will be split into approximately 340 packets. In the code above, DoWork() will be executed only for the first packet.
OK, so I need to reassemble the session completely and pass the whole session to DoWork(). I did that. But I can't wait until the session is finished, because all other packets (HTTP, ARP, everything) will be suspended (and after a couple of minutes the Internet connection drops).
Therefore, the first question:
How do I solve this problem (maybe with advice on how to design the program)?
Now for the email part, suppose this code:
switch (protocol)
{
//....
case "HTTP":
// parse packet(byte[])
var httpMimeMessage = Mime.Parse(ret);
// analyze packet and drop if needed
DoSomeWork();
break;
}
For example, we are looking for the word "finance". Then, if we open any website that happens to contain the word finance, the packet is blocked.
Second question: how do I determine that the packet is actually email?
Thanks and sorry for my English.
To be able to analyze more than one packet/stream at the same time, you'll need to refactor your solution to use threading or some other form of multitasking, and since your task appears to be both compute- and I/O-intensive, you'll probably want to take a hard look at how to leverage event handling at the operating system level (select, epoll, or the equivalent for your target platform).
And to answer your second question regarding email, you'll need to be able to identify and track the TCP session used to deliver email messages from client to server, assuming the session hasn't been encrypted.
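Just to illustrate the kind of check involved, assuming you have already pulled the TCP destination port and the (unencrypted) payload out of the packet, a rough heuristic might look like the sketch below; the helper name is made up and the port list is not exhaustive.

using System;

// Very rough heuristic: SMTP normally runs on ports 25, 465 or 587, and an
// unencrypted SMTP exchange contains commands like MAIL FROM / RCPT TO / DATA.
static bool LooksLikeOutgoingEmail(ushort destinationPort, string payloadText)
{
    if (destinationPort == 25 || destinationPort == 465 || destinationPort == 587)
        return true;

    return payloadText.StartsWith("MAIL FROM:", StringComparison.OrdinalIgnoreCase)
        || payloadText.StartsWith("RCPT TO:", StringComparison.OrdinalIgnoreCase)
        || payloadText.StartsWith("DATA", StringComparison.OrdinalIgnoreCase);
}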
As I'm sure you already know, the problem you're trying to solve is a very complicated one, requiring very specialized skills like realtime programming, deep knowledge of networking protocols, etc.
Of course, there are several "deep packet inspection" solutions out there already that do all of this for you, (typically used by public companies to fulfill regulatory requirements like Sarbanes-Oxley), but they are quite expensive.
I hope you guys will bear with me given my total lack of direction when it comes to threading.
I have to implement a Mail Queue Processing System where I have to send emails queued up in a database through a Windows Service.
It is not a producer-consumer pattern. I fetch, say, 10 rows at a time into a datatable.
The datatable contains the serialized MailMessage object and the SMTP sending details. If I have to use a fixed number of threads (say 6), how would I go about fetching a row from the datatable in a thread, sending the mail, and then returning to see whether any more rows remain?
Any simple logic to implement this will do, preferably with a simple example in C#.
I am using .NET 3.5.
Since sending emails is an I/O-bound process, spawning threads to send emails won't achieve much (if any) speedup.
If you're using the SMTP server that's part of Windows then when you "send" an email it doesn't actually get sent at that instant. It sits in a queue on the server and the server sends them as fast as it can. Sending emails is actually a slow process.
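For what it's worth, System.Net.Mail can hand messages straight to that local pickup queue instead of blocking on a remote server; a minimal sketch (the addresses are placeholders):

using System.Net.Mail;

var client = new SmtpClient
{
    // Drop the .eml file into the local SMTP service's pickup folder;
    // the Windows SMTP server then delivers it in the background.
    DeliveryMethod = SmtpDeliveryMethod.PickupDirectoryFromIis
};

client.Send(new MailMessage("noreply@example.com", "customer@example.com",
                            "Subject goes here", "Body goes here"));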
I guess what I'm saying is there are two options:
Just send them sequentially and see if that meets your performance requirements.
You could use a parallel programming concept called "data parallelism". I've explained it with examples in a blog post, Data Parallel – Parallel Programming in C#/.NET.
Basically, you get all of your data in one go. The reason is that getting data in batches will also slow your process down, so if you're interested in performance (which I'm guessing is why you're attempting to use threads), don't do multiple round trips to the database server (which is also I/O-bound on two levels: network I/O as well as disk I/O).
So get your data and split it into chunks or partitions. This is all explained in the article I pointed to. The naive implementation would be for the number of chunks to equal the number of cores on the machine.
Each chunk is processed by one thread.
When all threads are done, you're done.
With the new features of the ThreadPool in .NET 4.0 (if you use Parallel.For or PLINQ or Tasks) you'll get some other benefits, such as "work stealing", to further speed up the work.
Parallel.For/Parallel.ForEach will work well for you I'd think.
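On .NET 4 that boils down to roughly this one-liner, given the same hypothetical SendEmail method and List<Recipient> as in the sample further down:

using System.Threading.Tasks;

// The runtime partitions the recipient list over the available cores for you.
Parallel.ForEach(recipients, recipient => SendEmail(recipient));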
EDIT
Just noticed the .NET 3.5 requirement. Well, the concepts still apply, but you don't have Parallel.For/ForEach. So here is an implementation (modified from my blog post) that uses the ThreadPool and a data parallel technique.
private static void SendEmailsUsingThreadPool(List<Recipient> recipients)
{
var coreCount = Environment.ProcessorCount;
var itemCount = recipients.Count;
var batchSize = itemCount / coreCount;
var pending = coreCount;
using (var mre = new ManualResetEvent(false))
{
for (int batchCount = 0; batchCount < coreCount; batchCount++)
{
var lower = batchCount * batchSize;
var upper = (batchCount == coreCount - 1) ? itemCount : lower + batchSize;
ThreadPool.QueueUserWorkItem(st =>
{
for (int i = lower; i < upper; i++)
SendEmail(recipients[i]);
if (Interlocked.Decrement(ref pending) == 0)
mre.Set();
});
}
mre.WaitOne();
}
}
private static void SendEmail(Recipient recipient)
{
//Send your Emails here
}
class Recipient
{
public string FirstName { get; set; }
public string LastName { get; set; }
public string EmailAddress { get; set; }
}
So, get your data and call SendEmailsUsingThreadPool(), passing it your data. Of course, don't call your method that :). If you have a DataSet/DataTable, then simply modify the implementation to accept a DataSet/DataTable. This method takes care of partitioning your data into chunks so you don't have to worry about any of that. Simply call it.
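For example, one way to adapt it to a DataTable is to project the rows into the Recipient list first; a sketch, with assumed column names and a hypothetical mailTable variable:

using System.Collections.Generic;
using System.Data;
using System.Linq;

// Requires a reference to System.Data.DataSetExtensions (available in .NET 3.5).
List<Recipient> recipients = mailTable.AsEnumerable()
    .Select(row => new Recipient
    {
        FirstName    = row.Field<string>("FirstName"),      // assumed column names
        LastName     = row.Field<string>("LastName"),
        EmailAddress = row.Field<string>("EmailAddress")
    })
    .ToList();

SendEmailsUsingThreadPool(recipients);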
You need to fetch messages into memory in one place, and then route them to separate threads, I guess.
I've read and looked at quite a few examples of thread pooling, but I just can't seem to understand them the way I need to. What I have managed to get working is not really what I need. It just runs the function in its own thread.
public static void Main()
{
while (true)
{
try
{
ThreadPool.QueueUserWorkItem(new WaitCallback(Process));
Console.WriteLine("ID has been queued for fetching");
}
catch (Exception ex)
{
Console.WriteLine("Error: " + ex.Message);
}
Console.ReadLine();
}
}
public static void Process(object state)
{
var s = StatsFecther("byId", "0"); //returns all player stats
Console.WriteLine("Account: " + s.nickname);
Console.WriteLine("ID: " + s.account_id);
Console.ReadLine();
}
What I'm trying to do is have about 50 threads going (maybe more) that fetch serialized PHP data containing player stats, starting from user 0 all the way up to a user ID I specify (300,000). My question is not about how to fetch the stats (I know how to get the stats and read them), but about how to write a ThreadPool that will keep fetching stats until it gets to the 300,000th user ID without stepping on the toes of the other threads, and that saves the stats to a database as it retrieves them.
static int _globalId = 0;
public static void Process(object state)
{
// each queued Process call atomically claims its own player ID to fetch
int processId = Interlocked.Increment(ref _globalId);
var s = StatsFecther("byId", processId.ToString()); //returns all player stats (StatsFecther takes the ID as a string)
Console.WriteLine("Account: " + s.nickname);
Console.WriteLine("ID: " + s.account_id);
Console.ReadLine();
}
This is the simplest thing to do, but it is far from optimal. You are using synchronous calls, you are relying on the ThreadPool to throttle your call rate, you have no retry policy for failed calls, and your application will behave extremely badly under error conditions (when the web calls are failing).
First you should consider using the async methods of WebRequest: BeginGetRequestStream (if you POST and have a request body) and/or BeginGetResponse. These methods scale much better and you'll get higher throughput for less CPU (if the back end can keep up, of course).
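A bare-bones sketch of that async pattern (the URL and the parsing/persisting steps are placeholders):

using System;
using System.IO;
using System.Net;

static void BeginFetchStats(int playerId)
{
    var request = (HttpWebRequest)WebRequest.Create(
        "http://stats.example.com/byId/" + playerId);         // placeholder URL

    request.BeginGetResponse(ar =>
    {
        try
        {
            using (var response = (HttpWebResponse)request.EndGetResponse(ar))
            using (var reader = new StreamReader(response.GetResponseStream()))
            {
                string serializedStats = reader.ReadToEnd();
                // parse the serialized PHP data and queue the database update here
            }
        }
        catch (WebException)
        {
            // feed this into your retry/throttling policy rather than swallowing it
        }
    }, null);
}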
Second, you should consider self-throttling. On a similar project I used a pending request count. On success, each call would submit 2 more calls, capped by the throttling count. On failure the call would not submit anything. If no calls are pending, a timer-based retry submits a new call every minute. This way you only attempt once per minute when the service is down, saving your own resources from spinning without traction, and you ramp the throughput back up to the throttling cap when the service is up.
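A compressed sketch of that self-throttling idea, with all names and numbers made up:

using System;
using System.Threading;

class SelfThrottlingFetcher
{
    private readonly int _cap = 20;       // throttling cap: max requests in flight
    private int _pending;                 // requests currently in flight
    private readonly Timer _retryTimer;

    public SelfThrottlingFetcher()
    {
        // when nothing is pending (service looks down), probe again once per minute
        _retryTimer = new Timer(_ => { if (_pending == 0) TrySubmit(1); }, null, 60000, 60000);
    }

    public void TrySubmit(int count)
    {
        for (int i = 0; i < count; i++)
        {
            if (Interlocked.Increment(ref _pending) > _cap)
            {
                Interlocked.Decrement(ref _pending);
                return;                   // at the cap, don't submit more
            }
            BeginFetchNextPlayer(OnCompleted);       // hypothetical async web call
        }
    }

    private void OnCompleted(bool success)
    {
        Interlocked.Decrement(ref _pending);
        if (success)
            TrySubmit(2);                 // each success fans out into two more calls, capped above
        // on failure: submit nothing; the timer probes again in a minute
    }

    private void BeginFetchNextPlayer(Action<bool> onCompleted)
    {
        // issue the async web request here and invoke onCompleted(true/false) when it finishes
    }
}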
You should also know that the .NET Framework will limit the number of concurrent connections it makes to any resource. You must find your destination ServicePoint and change the ConnectionLimit from its default value (2) to the max value you are willing to throttle at.
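For example (the URI is a placeholder):

using System;
using System.Net;

// Raise the per-host connection limit from the default of 2 to your throttling cap.
ServicePoint sp = ServicePointManager.FindServicePoint(new Uri("http://stats.example.com/"));
sp.ConnectionLimit = 20;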
About the database update part: there are way too many variables at play and way too little information to give any meaningful advice. Some general advice would be: use asynchronous methods for the database calls as well, size your connection pool to allow for your throttling cap, and make sure your updates use the player ID as a key so you don't deadlock on updating the same record from different threads.
How do you determine the user ID? One option is to segment all the threads so that thread X deals with IDs from 0 to N, and so on, as a fraction of how many threads you have.
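A quick sketch of that segmentation (FetchAndSaveStats stands in for your fetch-plus-database-save code):

using System.Threading;

int threadCount = 6;
int maxId = 300000;
int chunk = maxId / threadCount;

for (int t = 0; t < threadCount; t++)
{
    int lower = t * chunk;
    int upper = (t == threadCount - 1) ? maxId : lower + chunk;   // last thread takes the remainder
    ThreadPool.QueueUserWorkItem(_ =>
    {
        for (int id = lower; id < upper; id++)
            FetchAndSaveStats(id);      // hypothetical: fetch the stats and write them to the DB
    });
}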
Having set up a ReferenceDataRequest, I send it along to an EventQueue:
Service refdata = _session.GetService("//blp/refdata");
Request request = refdata.CreateRequest("ReferenceDataRequest");
// append the appropriate symbol and field data to the request
EventQueue eventQueue = new EventQueue();
Guid guid = Guid.NewGuid();
CorrelationID id = new CorrelationID(guid);
_session.SendRequest(request, eventQueue, id);
long _eventWaitTimeout = 60000;
myEvent = eventQueue.NextEvent(_eventWaitTimeout);
Normally I can grab the message from the queue, but I'm now hitting a situation where, if I make a number of requests in the same run of the app (normally around the tenth request), I see a TIMEOUT EventType:
if (myEvent.Type == Event.EventType.TIMEOUT)
throw new Exception("Timed Out - need to rethink this strategy");
else
msg = myEvent.GetMessages().First();
These are being made on the same thread, but I'm assuming that there's something somewhere along the line that I'm consuming and not releasing.
Anyone have any clues or advice?
There aren't many references on SO to BLP's API, but hopefully we can start to rectify that situation.
I just wanted to share something, thanks to the code you included in your initial post.
If you make a request for historical intraday data for a long duration (which results in many events generated by Bloomberg API), do not use the pattern specified in the API documentation, as it may end up making your application very slow to retrieve all events.
Basically, do not call NextEvent() on a Session object! Use a dedicated EventQueue instead.
Instead of doing this:
var cID = new CorrelationID(1);
session.SendRequest(request, cID);
Event eventObj;
do {
eventObj = session.NextEvent();
...
} while (eventObj.Type != Event.EventType.RESPONSE); // keep pumping until the final RESPONSE event
Do this:
var cID = new CorrelationID(1);
var eventQueue = new EventQueue();
session.SendRequest(request, eventQueue, cID);
Event eventObj;
do {
eventObj = eventQueue.NextEvent();
...
} while (eventObj.Type != Event.EventType.RESPONSE); // keep pumping until the final RESPONSE event
This can result in some performance improvement, though the API is known to not be particularly deterministic...
I didn't really ever get around to solving this question, but we did find a workaround.
Based on a small, apparently throwaway, comment in the Server API documentation, we opted to create a second session. One session is responsible for static requests, the other for real-time. e.g.
_marketDataSession.OpenService("//blp/mktdata");
_staticSession.OpenService("//blp/refdata");
This means one session operates in subscription mode, the other more synchronously - I think it was this duality which was at the root of our problems.
Since making that change, we've not had any problems.
My reading of the docs agrees that you need separate sessions for the "//blp/mktdata" and "//blp/refdata" services.
A client appeared to have a similar problem. I solved it by making hundreds of sessions rather than passing hundreds of requests in one session. Bloomberg may not be too happy with this BFI (brute force and ignorance) approach, as we are sending the field requests for each session, but it works.
Nice to see another person on stackoverflow enjoying the pain of bloomberg API :-)
I'm ashamed to say I use the following pattern (I suspect copied from the example code). It seems to work reasonably robustly, but probably ignores some important messages. But I don't get your time-out problem. It's Java, but all the languages work basically the same.
cid = session.sendRequest(request, null);
while (true) {
Event event = session.nextEvent();
MessageIterator msgIter = event.messageIterator();
while (msgIter.hasNext()) {
Message msg = msgIter.next();
if (msg.correlationID() == cid) {
processMessage(msg, fieldStrings, result);
}
}
if (event.eventType() == Event.EventType.RESPONSE) {
break;
}
}
This may work because it consumes all messages off each event.
It sounds like you are making too many requests at once. BB will only process a certain number of requests per connection at any given time. Note that opening more and more connections will not help, because there are limits per subscription as well. If you make a large number of time-consuming requests simultaneously, some may time out. Also, you should process each request completely (until you receive the RESPONSE message), or cancel it. A partial request that is outstanding is wasting a slot.
Since splitting into two sessions seems to have helped you, it sounds like you are also making a lot of subscription requests at the same time. Are you using subscriptions as a way to take snapshots? That is, subscribe to an instrument, get the initial values, and unsubscribe. If so, you should try to find a different design; this is not the way subscriptions are intended to be used. An outstanding subscription request also uses a request slot. That is why it is best to batch as many subscriptions as possible into a single subscription list instead of making many individual requests. Hope this helps with your use of the API.
By the way, I can't tell from your sample code, but while you are blocked on messages from your dedicated event queue, are you also reading from the main event queue (since you are using a separate event queue)? You must process all the messages out of the queue, especially if you have outstanding subscriptions. Responses can queue up really fast. If you are not processing messages, the session may hit some queue limits, which may be why you are getting timeouts. Also, if you don't read messages, you may be marked a slow consumer and not receive more data until you start consuming the pending messages. The API is async. Event queues are just a way to block on specific requests without having to process all messages from the main queue, in a context where blocking is OK and where it would otherwise be difficult to interrupt the logic flow to process parts asynchronously.