C# Threadpooling HttpWebRequests

I've read and looked at quite a few examples of thread pooling, but I just can't seem to understand it the way I need to. What I have managed to get working is not really what I need: it just runs the function on its own thread.
public static void Main()
{
    while (true)
    {
        try
        {
            ThreadPool.QueueUserWorkItem(new WaitCallback(Process));
            Console.WriteLine("ID has been queued for fetching");
        }
        catch (Exception ex)
        {
            Console.WriteLine("Error: " + ex.Message);
        }
        Console.ReadLine();
    }
}
public static void Process(object state)
{
    var s = StatsFecther("byId", "0"); //returns all player stats
    Console.WriteLine("Account: " + s.nickname);
    Console.WriteLine("ID: " + s.account_id);
    Console.ReadLine();
}
What I'm trying to do is have about 50 threads going (maybe more) that fetch serialized PHP data containing player stats, starting from user 0 all the way up to a user ID I specify (300,000). My question is not about how to fetch the stats; I know how to get and read them. It's about how to write a thread pool that keeps fetching stats until it reaches the 300,000th user ID without the threads stepping on each other's toes, and that saves the stats to a database as it retrieves them.

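// requires: using System.Threading;
// -1 because Interlocked.Increment returns the incremented value, so the first ID fetched is 0
static int _globalId = -1;
public static void Process(object state)
{
    // each queued Process call atomically claims its own player ID to fetch
    int processId = Interlocked.Increment(ref _globalId);
    var s = StatsFecther("byId", processId.ToString()); //returns all player stats
    Console.WriteLine("Account: " + s.nickname);
    Console.WriteLine("ID: " + s.account_id);
    // no Console.ReadLine() here: it would park a pool thread waiting for keyboard input
}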
This is the simplest thing to do, but it is far from optimal. You are using synchronous calls, you are relying on the ThreadPool to throttle your call rate, you have no retry policy for failed calls, and your application will behave extremely badly under error conditions (when the web calls are failing).
First, you should consider using the async methods of WebRequest: BeginGetRequestStream (if you POST and have a request body) and/or BeginGetResponse. These methods scale much better, and you'll get higher throughput for less CPU (if the back end can keep up, of course).
Second, you should consider self-throttling. On a similar project I used a pending-request count. On success, each call would submit two more calls, capped at the throttling count. On failure, the call would not submit anything. If no calls are pending, a timer-based retry submits a new call every minute. This way you only make one attempt per minute while the service is down, saving your own resources from spinning without traction, and you ramp throughput back up to the throttling cap once the service is up again.
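A minimal sketch of the pending-count scheme described above, assuming the fetch-and-save step is exposed as an async delegate that returns true on success (`fetch` and `SelfThrottlingFetcher` are hypothetical names, not an existing API). One change beyond the description: a failed ID is queued for a later attempt instead of being dropped, so no player ID is skipped.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public sealed class SelfThrottlingFetcher : IDisposable
{
    private readonly Func<int, Task<bool>> _fetch;   // hypothetical fetch-and-save; true = success
    private readonly int _lastId;
    private readonly int _maxPending;                // the throttling cap
    private readonly ConcurrentQueue<int> _retry = new ConcurrentQueue<int>();
    private readonly TaskCompletionSource<bool> _done =
        new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
    private readonly Timer _retryTimer;
    private int _nextId = -1;   // Interlocked.Increment hands out 0 first
    private int _pending;       // calls currently in flight
    private int _succeeded;

    public SelfThrottlingFetcher(Func<int, Task<bool>> fetch, int lastId, int maxPending, TimeSpan retryInterval)
    {
        _fetch = fetch;
        _lastId = lastId;
        _maxPending = maxPending;
        // While nothing is pending (e.g. the service is down), make one attempt per interval.
        _retryTimer = new Timer(_ => { if (Volatile.Read(ref _pending) == 0) TrySubmit(); },
                                null, retryInterval, retryInterval);
    }

    public Task Completion => _done.Task;

    public void Start() => TrySubmit();

    private void TrySubmit()
    {
        // Reserve a pending slot without ever exceeding the cap.
        int seen;
        do
        {
            seen = Volatile.Read(ref _pending);
            if (seen >= _maxPending) return;
        } while (Interlocked.CompareExchange(ref _pending, seen + 1, seen) != seen);

        // Retry IDs that failed earlier before handing out fresh ones.
        if (!_retry.TryDequeue(out int id))
        {
            id = Interlocked.Increment(ref _nextId);
            if (id > _lastId) { Interlocked.Decrement(ref _pending); return; }
        }
        // Run on the pool so fast completions do not recurse deeper and deeper.
        _ = Task.Run(() => RunOneAsync(id));
    }

    private async Task RunOneAsync(int id)
    {
        bool ok;
        try { ok = await _fetch(id).ConfigureAwait(false); }
        catch (Exception) { ok = false; }

        if (!ok) _retry.Enqueue(id); // failure: keep the ID, submit nothing new
        else if (Interlocked.Increment(ref _succeeded) == _lastId + 1) _done.TrySetResult(true);

        Interlocked.Decrement(ref _pending);
        if (ok) { TrySubmit(); TrySubmit(); } // success: ramp up by submitting two more
    }

    public void Dispose() => _retryTimer.Dispose();
}
```

With a cap of 50 and a retry interval of one minute, this matches the behaviour described: throughput doubles up to the cap while calls succeed, and drops to one probe per minute while every call fails.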
You should also know that the .NET Framework limits the number of concurrent connections it makes to any one host. You must find your destination ServicePoint and change its ConnectionLimit from the default value (2) to the maximum you are willing to throttle at.
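A sketch of raising the limit for one host, using the real `ServicePointManager.FindServicePoint` and `ServicePoint.ConnectionLimit` members; the helper name, the URL and the value 50 are placeholders. This applies to HttpWebRequest on the .NET Framework; HttpClient-based code on .NET Core and later sets the equivalent cap on its handler instead (for example `SocketsHttpHandler.MaxConnectionsPerServer`).

```csharp
using System;
using System.Net;

public static class ConnectionLimits
{
    // Raises the per-host connection cap used by HttpWebRequest for one destination.
    public static ServicePoint RaiseLimit(Uri destination, int maxConnections)
    {
        ServicePoint servicePoint = ServicePointManager.FindServicePoint(destination);
        servicePoint.ConnectionLimit = maxConnections; // .NET Framework client default is 2
        return servicePoint;
    }
}
```

Call it once before issuing requests, e.g. `ConnectionLimits.RaiseLimit(new Uri("http://stats.example.com/"), 50);`, and keep the value in line with your throttling cap.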
As for the database update part, there are way too many variables at play and too little information to give any meaningful advice. Some general advice: use asynchronous methods for the database calls as well, size your connection pool to allow for your throttling cap, and make sure your updates use the player ID as a key so you don't deadlock updating the same record from different threads.

How do you determine the user ID? One option is to segment the ID range so that each thread deals with its own block of IDs (thread 1 handles IDs 0 to N, thread 2 the next block, and so on), each block being the total range divided by the number of threads.
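A sketch of that segmentation, assuming IDs are contiguous integers; `IdPartitioner` and the `fetchAndSave` callback are hypothetical stand-ins for your own fetch-and-store code.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class IdPartitioner
{
    // Splits [firstId, lastId] into `workers` contiguous blocks whose sizes differ by at most one.
    public static List<(int Start, int End)> Partition(int firstId, int lastId, int workers)
    {
        if (workers <= 0) throw new ArgumentOutOfRangeException(nameof(workers));
        var ranges = new List<(int Start, int End)>(workers);
        int total = lastId - firstId + 1;
        int size = total / workers, extra = total % workers;
        int start = firstId;
        for (int i = 0; i < workers; i++)
        {
            int count = size + (i < extra ? 1 : 0);
            if (count == 0) break; // more workers than IDs
            ranges.Add((start, start + count - 1));
            start += count;
        }
        return ranges;
    }

    // Gives each block to its own long-running task; blocks never overlap,
    // so no two tasks ever fetch the same ID.
    public static Task RunAsync(int firstId, int lastId, int workers, Action<int> fetchAndSave)
    {
        var tasks = new List<Task>();
        foreach (var (start, end) in Partition(firstId, lastId, workers))
        {
            tasks.Add(Task.Factory.StartNew(() =>
            {
                for (int id = start; id <= end; id++) fetchAndSave(id);
            }, TaskCreationOptions.LongRunning));
        }
        return Task.WhenAll(tasks);
    }
}
```

For 300,000 IDs and 50 threads each block holds 6,000 IDs. Compared with a shared Interlocked counter, fixed blocks need no coordination, but a thread that draws a slow block finishes last while the others sit idle.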

Related

Azure EventHub: Send Async performance

I have pretty naive code:
public async Task Produce(string topic, object message, MessageHeader messageHeaders)
{
    try
    {
        var producerClient = _EventHubProducerClientFactory.Get(topic);
        var eventData = CreateEventData(message, messageHeaders);
        messageHeaders.Times?.Add(DateTime.Now);
        await producerClient.SendAsync(new EventData[] { eventData });
        messageHeaders.Times?.Add(DateTime.Now);
        //.....
        Log.Info($"Milliseconds spent: {(messageHeaders.Times[1] - messageHeaders.Times[0]).TotalMilliseconds}");
    }
}
private EventData CreateEventData(object message, MessageHeader messageHeaders)
{
    var eventData = new EventData(Encoding.UTF8.GetBytes(JsonConvert.SerializeObject(message)));
    eventData.Properties.Add("CorrelationId", messageHeaders.CorrelationId);
    if (messageHeaders.DateTime != null)
        eventData.Properties.Add("DateTime", messageHeaders.DateTime?.ToString("s"));
    if (messageHeaders.Version != null)
        eventData.Properties.Add("Version", messageHeaders.Version);
    return eventData;
}
In the logs I see values of almost one second (~800 milliseconds).
What could be the reason for such a long execution time?

The EventHubProducerClient opens connections to the Event Hubs service lazily, waiting until the first time an operation requires it. In your snippet, the call to SendAsync triggers an AMQP connection to be created, an AMQP link to be created, and authentication to be performed.
Unless the client is closed, most future calls won't incur that overhead as the connection and link are persistent. Most being an important distinction in that statement, as the client may need to reconnect in the face of a network error, when activity is low and the connection idles out, or if the Event Hubs service terminates the connection/link.
As Serkant mentions, if you're looking to understand timings, you'd probably be best served by a library like BenchmarkDotNet that runs over a large number of iterations to derive statistically meaningful results.

You are measuring the first Send. That incurs overhead that later Sends won't, so always warm up first (for example, send a single event) and then measure the following ones.
Another important point: measuring a single Send call is not meaningful. Measure a batch of calls instead and compute latency percentiles; that gives a much better picture for your tests.
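A rough sketch of the warm-up-then-measure approach from these two answers; `LatencyProbe` is a hypothetical helper, `operation` is whatever call you want to time, and the percentile uses the nearest-rank definition. It is a quick harness, not a replacement for BenchmarkDotNet.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class LatencyProbe
{
    // Runs `operation` a few times untimed (warm-up: connection, link, auth), then times
    // `iterations` calls and returns the samples in milliseconds, sorted ascending.
    public static async Task<double[]> MeasureAsync(Func<Task> operation, int warmups, int iterations)
    {
        for (int i = 0; i < warmups; i++)
            await operation().ConfigureAwait(false);

        var samples = new double[iterations];
        var stopwatch = new Stopwatch();
        for (int i = 0; i < iterations; i++)
        {
            stopwatch.Restart();
            await operation().ConfigureAwait(false);
            stopwatch.Stop();
            samples[i] = stopwatch.Elapsed.TotalMilliseconds;
        }
        Array.Sort(samples);
        return samples;
    }

    // Nearest-rank percentile (p in 0..100) over an ascending-sorted array.
    public static double Percentile(double[] sorted, double p)
    {
        if (sorted.Length == 0) throw new ArgumentException("No samples.", nameof(sorted));
        int rank = (int)Math.Ceiling(p * sorted.Length / 100.0);
        return sorted[Math.Min(Math.Max(rank, 1), sorted.Length) - 1];
    }
}
```

For the code in the question, pass something like `() => producerClient.SendAsync(new EventData[] { CreateEventData(message, messageHeaders) })` and report the 50th and 99th percentiles rather than a single timing.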

Monitor.TryEnter and Threading.Timer race condition

I have a Windows service that every 5 seconds checks for work. It uses System.Threading.Timer for handling the check and processing and Monitor.TryEnter to make sure only one thread is checking for work.
Just assume it has to be this way as the following code is part of 8 other workers that are created by the service and each worker has its own specific type of work it needs to check for.
readonly object _workCheckLocker = new object();
public Timer PollingTimer { get; private set; }
void InitializeTimer()
{
    if (PollingTimer == null)
        PollingTimer = new Timer(PollingTimerCallback, null, 0, 5000);
    else
        PollingTimer.Change(0, 5000);
    Details.TimerIsRunning = true;
}
void PollingTimerCallback(object state)
{
    if (!Details.StillGettingWork)
    {
        if (Monitor.TryEnter(_workCheckLocker, 500))
        {
            try
            {
                CheckForWork();
            }
            catch (Exception ex)
            {
                Log.Error(EnvironmentName + " -- CheckForWork failed. " + ex);
            }
            finally
            {
                Monitor.Exit(_workCheckLocker);
                Details.StillGettingWork = false;
            }
        }
    }
    else
    {
        Log.Standard("Continuing to get work.");
    }
}
void CheckForWork()
{
    Details.StillGettingWork = true;
    //Hit web server to grab work.
    //Log Processing
    //Process Work
}
Now here's the problem:
The code above is allowing 2 Timer threads to get into the CheckForWork() method. I honestly don't understand how this is possible, but I have experienced this with multiple clients where this software is running.
The logs I got today when I pushed some work showed that it checked for work twice and I had 2 threads independently trying to process which kept causing the work to fail.
Processing 0-3978DF84-EB3E-47F4-8E78-E41E3BD0880E.xml for Update Request. - at 09/14 10:15:501255801
Stopping environments for Update request - at 09/14 10:15:501255801
Processing 0-3978DF84-EB3E-47F4-8E78-E41E3BD0880E.xml for Update Request. - at 09/14 10:15:501255801
Unloaded AppDomain - at 09/14 10:15:10:15:501255801
Stopping environments for Update request - at 09/14 10:15:501255801
AppDomain is already unloaded - at 09/14 10:15:501255801
=== Starting Update Process === - at 09/14 10:15:513756009
Downloading File X - at 09/14 10:15:525631183
Downloading File Y - at 09/14 10:15:525631183
=== Starting Update Process === - at 09/14 10:15:525787359
Downloading File X - at 09/14 10:15:525787359
Downloading File Y - at 09/14 10:15:525787359
The logs are written asynchronously and are queued, so don't dig too deep on the fact that the times match exactly, I just wanted to point out what I saw in the logs to show that I had 2 threads hit a section of code that I believe should have never been allowed. (The log and times are real though, just sanitized messages)
Eventually what happens is that the 2 threads start downloading a big enough file where one ends up getting access denied on the file and causes the whole update to fail.
How can the above code actually allow this? I experienced this problem last year when I used a lock instead of Monitor.TryEnter, and I assumed it was because the Timer eventually drifted far enough, due to the lock blocking, that timer threads were stacking up: one blocked for 5 seconds and went through right as the Timer triggered another callback, and they both somehow got in. That's why I switched to Monitor.TryEnter, so I wouldn't keep stacking timer threads.
Any clue? In every case where I have tried to solve this issue before, System.Threading.Timer has been the one constant, and I think it's the root cause, but I don't understand why.

I can see in the log you've provided that you got an AppDomain restart there; is that correct? If so, are you sure there is one and only one instance of your service during the AppDomain restart? I suspect that not all threads are stopped at the same moment, and some of them may continue polling the work queue, so two different threads in different AppDomains got the same work ID.
You could probably fix this by marking your _workCheckLocker with the static keyword, like this:
static object _workCheckLocker;
and introducing a static constructor for your class that initializes this field (with inline initialization you could face some more complicated problems). But I'm not sure this would be enough in your case: during an AppDomain restart, the static class is reloaded too. As I understand it, this is not an option for you.
Maybe you could introduce a static dictionary instead of an object for your workers, so you can check the IDs of documents already being processed.
Another approach is to handle the Stopping event for your service, which is probably raised during the AppDomain restart: introduce a CancellationToken there and use it to stop all work in such circumstances.
Also, as fernando.reyes said, you could use a heavier synchronization primitive, a mutex, but this will degrade your performance.
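A sketch combining two of the ideas above inside one process: an Interlocked guard that makes overlapping timer ticks return immediately, and a CancellationToken that is cancelled from the service's stop path. `GuardedPoller` and its callback are hypothetical names. Like any in-process lock, it cannot stop two different workers or AppDomains from receiving the same work item from the database, which is what the asker's own TL;DR answer identified as the actual cause.

```csharp
using System;
using System.Threading;

public sealed class GuardedPoller : IDisposable
{
    private readonly Action<CancellationToken> _checkForWork; // hypothetical work callback
    private readonly CancellationTokenSource _cts = new CancellationTokenSource();
    private readonly Timer _timer;
    private int _busy; // 0 = idle, 1 = a check is running

    public GuardedPoller(Action<CancellationToken> checkForWork, TimeSpan period)
    {
        _checkForWork = checkForWork;
        _timer = new Timer(_ =>
        {
            try { Tick(); }
            catch (Exception) { /* log here, as the original callback does */ }
        }, null, period, period);
    }

    // Returns true if this call ran the check; false if it was skipped.
    public bool Tick()
    {
        if (_cts.IsCancellationRequested) return false;
        // Only the caller that flips 0 -> 1 proceeds; overlapping ticks return immediately.
        if (Interlocked.CompareExchange(ref _busy, 1, 0) != 0) return false;
        try
        {
            _checkForWork(_cts.Token); // long-running work should observe the token
            return true;
        }
        finally
        {
            Volatile.Write(ref _busy, 0);
        }
    }

    // Call from the service's OnStop (or an AppDomain.DomainUnload handler).
    public void Stop()
    {
        _cts.Cancel();
        _timer.Change(Timeout.Infinite, Timeout.Infinite);
    }

    public void Dispose()
    {
        Stop();
        _timer.Dispose();
        _cts.Dispose();
    }
}
```

Unlike Monitor.TryEnter with a 500 ms timeout, the compare-exchange never waits, so a tick that arrives while a check is running is dropped instead of queued behind it.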

TL;DR
Production stored procedure has not been updated in years. Workers were getting work they should have never gotten and so multiple workers were processing update requests.
I was able to finally find the time to properly set myself up locally to act as a production client through Visual Studio. Although, I wasn't able to reproduce it like I've experienced, I did accidentally stumble upon the issue.
Those who assumed that multiple workers were picking up the same work were indeed correct, and that should never have been able to happen, as each worker is unique in the work it does and requests.
It turns out that in our production environment, the stored procedure that retrieves work by work type had not been updated in years (yes, years!) of deploys. Anything that checked for work automatically got updates, which meant that when the Update worker and worker Foo checked at the same time, they both ended up with the same work.
Thankfully, the fix is database side and not a client update.

Better Technique: Reading Data in a Thread

I've got a routine called GetEmployeeList that loads when my Windows Application starts.
This routine pulls in basic employee information from our Active Directory server and retains this in a list called m_adEmpList.
We have a few Windows accounts set up as Public Profiles that most of our employees on our manufacturing floor use. This m_adEmpList gives our employees the ability to log in to select features using those Public Profiles.
Once all of the Active Directory data is loaded, I attempt to "auto logon" that employee based on the System.Environment.UserName if that person is logged in under their private profile. (employees love this, by the way)
If I do not thread GetEmployeeList, the Windows Form will appear unresponsive until the routine is complete.
The problem with GetEmployeeList is that we have had times when the Active Directory server was down, the network was down, or a particular computer was not able to connect over our network.
To get around these issues, I have included a ManualResetEvent m_mre with the THREADSEARCH_TIMELIMIT timeout so that the process does not go off forever. I cannot login someone using their Private Profile with System.Environment.UserName until I have the list of employees.
I realize I am not showing ALL of the code, but hopefully it is not necessary.
public static ADUserList GetEmployeeList()
{
    if ((m_adEmpList == null) ||
        (((m_adEmpList.Count < 10) || !m_gotData) &&
         ((m_thread == null) || !m_thread.IsAlive))
       )
    {
        m_adEmpList = new ADUserList();
        m_thread = new Thread(new ThreadStart(fillThread));
        m_mre = new ManualResetEvent(false);
        m_thread.IsBackground = true;
        m_thread.Name = FILLTHREADNAME;
        try {
            m_thread.Start();
            m_gotData = m_mre.WaitOne(THREADSEARCH_TIMELIMIT * 1000);
        } catch (Exception err) {
            Global.LogError(_CODEFILE + "GetEmployeeList", err);
        } finally {
            if ((m_thread != null) && (m_thread.IsAlive)) {
                // m_thread.Abort();
                m_thread = null;
            }
        }
    }
    return m_adEmpList;
}
I would like to just put a basic lock using something like m_adEmpList, but I'm not sure if it is a good idea to lock something that I need to populate, and the actual data population is going to happen in another thread using the routine fillThread.
If the ManualResetEvent's WaitOne timer fails to collect the data I need in the time allotted, there is probably a network issue, and m_adEmpList does not have many records (if any). So I would need to try to pull this information again the next time.
If anyone understands what I'm trying to explain, I'd like to see a better way of doing this.
It just seems too forced, right now. I keep thinking there is a better way to do it.

I think you're approaching the multithreading part the wrong way. It's hard to explain, but threads should cooperate rather than compete for resources, and competing is exactly what's bothering you here. Another problem is that your timeout is both too long (long enough to annoy users) and too short (if the AD server is a bit slow but still there and serving). Your goal should be to let the thread run in the background and update the list when it has finished. In the meantime, present some fallbacks to the user along with a notification that the user list is still being populated.
A few more notes on your code above:
You have a variable m_thread that is only used locally. Further, your code contains a redundant check whether that variable is null.
If you create a user list with defaults/fallbacks first and then update it through a function (make sure you check the InvokeRequired flag of the displaying control!), you won't need a lock. This means the thread does not access the list stored in the member variable, but a separate local list it has exclusive access to. The update function then replaces (!) the member list, so from then on it is for exclusive use by the UI.
Lastly, if the AD server is really not there, try to forward the error from the background thread to the UI in some way, so that the user knows what's broken.
If you want, you can add an event to signal the thread to stop, but in most cases that won't even be necessary.
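A sketch of the approach this answer describes: the background task fills a private list, then publishes it by swapping a single reference, so the UI only ever sees a complete list and no lock is needed. `EmployeeListLoader` and `queryDirectory` are hypothetical; the events are raised on the worker thread, so WinForms code still has to marshal to the UI thread.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public sealed class EmployeeListLoader
{
    private IReadOnlyList<string> _current; // published snapshot; starts as the fallback list

    public EmployeeListLoader(IReadOnlyList<string> fallback) { _current = fallback; }

    public IReadOnlyList<string> Current => Volatile.Read(ref _current);

    // Both events are raised on the background thread. A WinForms handler must marshal
    // to the UI thread (check InvokeRequired, then BeginInvoke) before touching controls.
    public event Action<IReadOnlyList<string>> Loaded;
    public event Action<Exception> Failed;

    // queryDirectory stands in for the Active Directory lookup (hypothetical).
    public Task StartAsync(Func<List<string>> queryDirectory)
    {
        return Task.Run(() =>
        {
            try
            {
                List<string> fresh = queryDirectory();          // private to this thread
                IReadOnlyList<string> snapshot = fresh.AsReadOnly();
                Volatile.Write(ref _current, snapshot);        // replace, never mutate in place
                Loaded?.Invoke(snapshot);
            }
            catch (Exception ex)
            {
                Failed?.Invoke(ex); // AD or network down: keep the fallback, tell the user
            }
        });
    }
}
```

No timeout is needed here: the form stays responsive, the fallback list is usable immediately, and the auto-logon can run from the Loaded handler once the real list arrives.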

Design Model and Hints For Multithreaded Win Form Application

I am trying to design a multithreaded Windows application that mainly lets our clients send emails quickly to their customers (there can be millions, as one of them is a big telecommunications company), and I need design hints. (Sorry that the question is long.)
I have read a fair number of articles about multithreaded applications. I also read about Smart Thread Pool, the .NET ThreadPool, the Task Parallel Library and other SO questions, but I could not come up with a correct design. My logic is as follows:
When the program (the email engine) starts, a timer starts and checks whether there are any email campaigns in the database (Campaigns table) with Status 1 (new campaign).
If there are, the campaign's subscribers are queried from the DB and written (with SqlBulkCopy) to another table called SubscriberReports, and the campaign's Status is updated to 2 in the Campaigns table.
The timer also watches for campaigns with Status 2 and calls another method to customize the campaign for each subscriber, creating a struct that holds that subscriber's customized properties.
Third, a SendEmail method is invoked to send the email via SMTP. What I have tried so far is below (I know ThreadPool is wrong here, and I have a bunch of other mistakes). Can you please suggest how to design such an application? I highly appreciate any help. Thanks a lot for your time.
private void ProcessTimer(object Source, ElapsedEventArgs e)
{
    Campaigns campaign = new Campaigns();
    IEnumerable<Campaigns> campaignsListStatusOne = // Get Campaign Properties to a List
    IEnumerable<Campaigns> campaignsListStatusTwo = // Get Campaign Properties to a List
    foreach (Campaigns _campaign in campaignsListStatusOne)
    {
        ThreadPool.QueueUserWorkItem(new WaitCallback(CheckNewCampaign), _campaign.CampaignID);
    }
    foreach (Campaigns _campaign in campaignsListStatusTwo)
    {
        ThreadPool.QueueUserWorkItem(new WaitCallback(CustomizeMail), _campaign.CampaignID);
    }
}
private void CheckNewCampaign(object state)
{
    int campaignID = (int)state;
    Campaigns campaign = new Campaigns(); // the one in ProcessTimer is not in scope here
    DataTable dtCampaignSubscribers = // get subscribers based on Campaign ID
    campaign.UpdateStatus(campaignID, 2);
}
private void CustomizeMail(object state)
{
    int campaignID = (int)state;
    CampaignCustomazition campaignCustomizer;
    IEnumerable<SubscriberReports> reportList = // get subscribers to be sent from Reports table
    foreach (SubscriberReports report in reportList)
    { // 3 database related methods are here
        campaignCustomizer = new CampaignCustomazition(report.CampaignID, report.SubscriberID);
        campaignCustomizer.CustomizeSource(report.CampaignID, report.SubscriberID, out campaignCustomizer.source, out campaignCustomizer.format);
        campaignCustomizer.CustomizeCampaignDetails(report.CampaignID, report.SubscriberID, out campaignCustomizer.subject, out campaignCustomizer.fromName, out campaignCustomizer.fromEmail, out campaignCustomizer.replyEmail);
        campaignCustomizer.CustomizeSubscriberDetails(report.SubscriberID, out campaignCustomizer.email, out campaignCustomizer.fullName);
        ThreadPool.QueueUserWorkItem(new WaitCallback(SendMail), campaignCustomizer);
    }
}
private void SendMail(object state)
{
    var campaignCustomizer = (CampaignCustomazition)state; // no need to allocate a new instance first
    //send email based on info at campaignCustomizer via SMTP and update DB record if it is success.
}

There is little to be gained here by using threading. What threads buy you is more CPU cycles, assuming you have a machine with multiple cores, which is pretty standard these days. But that's not what you need to get the job done quicker. You need more database and email servers, and surely you only have one of each. Your program will burn very little CPU; it is constantly waiting for the database query and the email server to complete their jobs.
The only way to get ahead is to overlap the delays of each. One thread could be constantly waiting on the database engine while the other is constantly waiting on the email server, which is better than one thread waiting on both.
That's not likely to buy you much either, though, because there's a big mismatch between the two. The database engine can give you thousands of email addresses per second; the email server can only send a few hundred emails per second. Everything is throttled by how fast the email server works.
Given the low odds of getting ahead, I'd recommend not getting yourself into trouble with threading at all. It has a knack for producing very hard-to-diagnose failures if you don't lock properly, and the time you can spend troubleshooting can greatly exceed the operational gain from moving a wee bit faster.
If you are considering threading to avoid freezing a user interface, that's a reasonable use for it. Use BackgroundWorker; the MSDN Library has excellent help for it.
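If you do want to overlap the two waits as described, a bounded producer/consumer queue is one way to sketch it: one task waits on the database while another waits on the SMTP server. `CampaignPipeline` and both delegates are hypothetical stand-ins for your query and send code. Because the email server sets the pace, the bounded buffer mainly keeps memory flat rather than making sending faster.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class CampaignPipeline
{
    // Producer: reads messages from the database. Consumer: sends them via SMTP.
    // The bounded queue keeps the fast reader from running far ahead of the slow sender.
    public static async Task<int> RunAsync<TMessage>(
        IEnumerable<TMessage> readFromDatabase, // hypothetical: streamed, customized messages
        Action<TMessage> sendViaSmtp,           // hypothetical: the step that sets the pace
        int bufferSize = 1000)
    {
        using (var cts = new CancellationTokenSource())
        using (var queue = new BlockingCollection<TMessage>(bufferSize))
        {
            Task producer = Task.Run(() =>
            {
                try
                {
                    foreach (var message in readFromDatabase)
                        queue.Add(message, cts.Token); // blocks while the queue is full
                }
                finally { queue.CompleteAdding(); }
            });

            Task<int> consumer = Task.Run(() =>
            {
                int sent = 0;
                try
                {
                    foreach (var message in queue.GetConsumingEnumerable(cts.Token))
                    {
                        sendViaSmtp(message);
                        sent++;
                    }
                }
                catch { cts.Cancel(); throw; } // unblock the producer if sending fails
                return sent;
            });

            try { return await consumer.ConfigureAwait(false); }
            finally
            {
                try { await producer.ConfigureAwait(false); }
                catch (OperationCanceledException) { } // expected after a send failure
            }
        }
    }
}
```

A single consumer keeps the send order and avoids any locking around the SMTP client; adding consumers only helps if the mail server accepts parallel connections.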

Good advice for using EF in a multithreaded program?

Do you have any good advice for using EF in a multithreaded program?
I have 2 layers:
an EF layer to read/write my database
a multithreaded service that uses my entities (read/write) and performs some computations (I use the Task Parallel Library in the framework)
How can I synchronize my object contexts across threads?
Do you know a good pattern to make this work?

Good advice is: just don't :-) EF barely manages to survive one thread; that's the nature of the beast.
If you absolutely have to use it, make the lightest possible DTOs, close the ObjectContext (OC) as soon as you have the data, repack the data, spawn your threads just to do calculations and nothing else, wait until they are done, then create another OC and dump the data back into the DB, reconcile it, etc.
If another "main" thread (the one that spawns N calculation threads via TPL) needs to know when some other thread is done, don't fire an event; just set a flag in the other thread and let its code check the flag in its loop and react by creating a new OC and reconciling data if it has to.
If your situation is simpler, you can adapt this. The key is that you only set a flag and let the other thread react when it's ready. That means it's in a stable state, has finished a round of whatever it was doing, and can act without risking race conditions. Reset the flag (an int) with Interlocked operations, and keep some timing data to make sure your threads don't react again within some time T; otherwise they can spend their lifetime just querying the DB.
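A sketch of that flag: the main thread only raises it, and the worker checks it at a stable point in its loop, clears it atomically with Interlocked.Exchange, and ignores it if it reacted less than T ago. `RefreshSignal` is a hypothetical name; use one instance per worker, since its Stopwatch is only touched by that worker.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public sealed class RefreshSignal
{
    private int _flag; // 0 = nothing pending, 1 = refresh requested (an int so Interlocked works)
    private readonly TimeSpan _minInterval; // the "time T" between reactions
    private readonly Stopwatch _sinceLastRefresh = new Stopwatch();

    public RefreshSignal(TimeSpan minInterval) { _minInterval = minInterval; }

    // Called by the main thread, e.g. after it has saved changes.
    public void Raise() => Interlocked.Exchange(ref _flag, 1);

    // Called by the worker at a stable point in its loop. True means: create a new
    // context and reconcile the data now.
    public bool ShouldRefresh()
    {
        if (_sinceLastRefresh.IsRunning && _sinceLastRefresh.Elapsed < _minInterval)
            return false; // too soon; leave the flag set for a later iteration
        if (Interlocked.Exchange(ref _flag, 0) == 0)
            return false; // nothing was requested
        _sinceLastRefresh.Restart();
        return true;
    }
}
```

Because a request that arrives too early stays set rather than being cleared, it is not lost; it is simply honoured once T has passed.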

This is how I implemented it in my scenario.
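// IDs currently being saved. TryAdd is atomic, so only one thread claims a given ID
// (Contains-then-Enqueue on a ConcurrentQueue is a check-then-act race, and
// TryDequeue removes whichever ID is at the head, not necessarily this one).
var processing = new ConcurrentDictionary<int, byte>();
//possible multi threaded enumeration only processed non-queued records
Parallel.ForEach(dataEnumeration, dataItem =>
{
    if (!processing.TryAdd(dataItem.Id, 0))
        return; // another thread is already processing this ID
    try
    {
        // one context per iteration; a DbContext is not thread-safe
        using (var myEntityResource = new EntityResource())
        {
            myEntityResource.EntityRecords.Add(new EntityRecord
            {
                Field1 = "Value1",
                Field2 = "Value2"
            });
            Exception error;
            if (!SaveContext(myEntityResource, out error))
                Console.WriteLine(error); // log the failure
        }
    }
    finally
    {
        processing.TryRemove(dataItem.Id, out _);
    }
});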
public void RefreshContext(DbContext context)
{
    // Reload() overwrites an entry's current values with the database values and marks it
    // Unchanged, so any pending edits on these entries are discarded, not merged.
    var modifiedEntries = context.ChangeTracker.Entries()
        .Where(e => e.State == EntityState.Modified || e.State == EntityState.Deleted)
        .ToList(); // materialize first: Reload() changes the states being filtered on
    foreach (var modifiedEntry in modifiedEntries)
    {
        modifiedEntry.Reload();
    }
}
public bool SaveContext(DbContext context, out Exception error, bool reloadContextFirst = true)
{
    error = null;
    var saved = false;
    try
    {
        if (reloadContextFirst)
            this.RefreshContext(context);
        context.SaveChanges();
        saved = true;
    }
    catch (DbUpdateConcurrencyException) // System.Data.Entity.Infrastructure (EF6)
    {
        // DbContext.SaveChanges reports conflicts as DbUpdateConcurrencyException (wrapping the
        // OptimisticConcurrencyException), so a catch for the latter would not match here.
        // Retry once; a second failure propagates to the caller.
        if (reloadContextFirst)
            this.RefreshContext(context);
        context.SaveChanges();
        saved = true;
    }
    catch (DbEntityValidationException dbValEx)
    {
        var outputLines = new StringBuilder();
        foreach (var eve in dbValEx.EntityValidationErrors)
        {
            outputLines.AppendFormat("{0}: Entity of type \"{1}\" in state \"{2}\" has the following validation errors:",
                DateTime.Now, eve.Entry.Entity.GetType().Name, eve.Entry.State).AppendLine();
            foreach (var ve in eve.ValidationErrors)
            {
                outputLines.AppendFormat("- Property: \"{0}\", Error: \"{1}\"", ve.PropertyName, ve.ErrorMessage).AppendLine();
            }
        }
        throw new DbEntityValidationException(string.Format("Validation errors\r\n{0}", outputLines.ToString()), dbValEx);
    }
    catch (Exception ex)
    {
        error = new Exception("Error saving changes to the database.", ex);
    }
    return saved;
}

I think Craig might be right about your application not needing threads, but you might look into using ConcurrencyCheck in your models to make sure you don't overwrite each other's changes.

I don't know how much of your application is actually number crunching. If speed is the motivation for using multithreading, it might pay off to take a step back and gather data about where the bottleneck is.
In a lot of cases I have found that the limiting factor in applications using a database server is the speed of the storage I/O system. For example, the speed of the hard disk(s) and their configuration can have a huge impact: a single 7,200 RPM hard disk can handle about 60 transactions per second (a ballpark figure that depends on many factors).
So my suggestion is to first measure and find out where the bottleneck is. Chances are you don't even need threads, which would make the code substantially easier to maintain and, in all likelihood, of much higher quality.

"How can I synchronize my object contexts in each thread ?"
This is going to be tough. First of all, stored procedures (SPs) or DB queries can have parallel execution plans. So if you also have parallelism on the object-context side, you have to manually make sure that you have sufficient isolation, but only just enough, so that you don't hold locks long enough to cause deadlocks.
So I would say: don't do it.
But that might not be the answer you want. Can you explain a bit more about what you want to achieve with this multithreading? Is it more compute-bound or I/O-bound? If it is I/O-bound with long-running operations, look at the APM (Asynchronous Programming Model) as described by Jeff Richter.

I think your question is more about synchronization between threads; EF is irrelevant here. If I understand correctly, you want to notify a group of threads when the main thread has performed some operation, in this case SaveChanges(). The threads here are like a client-server application, where one thread is the server and the other threads are clients, and you want the client threads to react to server activity.
As someone noted, you probably do not need threads, but let's leave that aside.
There is no fear of deadlocks as long as you use a separate OC per thread.
I also assume that your client threads are long-running threads in some kind of loop. If you want code to be executed on a client thread, you can't use C# events, because an event handler runs on the thread that raises the event.
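class ClientThread
{
    // volatile so the main thread's write is seen by this thread's loop
    public volatile bool SomethingHasChanged;
    public void MainLoop(CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            if (SomethingHasChanged)
            {
                SomethingHasChanged = false; // reset first so a signal raised during Refresh is not lost
                Refresh(); // e.g. create a new OC and reload the data
            }
            // your business logic here
        }
    }
    void Refresh() { /* reload data with a fresh object context */ }
}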
Now the question is how to set the flag in all your client threads. You could keep references to the client threads in your main thread, loop through them, and set all their flags to true.

Back when I used EF, I simply had one ObjectContext, and I synchronized all access to it.
This isn't ideal: your database layer effectively becomes single-threaded. But it did keep it thread-safe in a multithreaded environment. In my case the heavy computation was not in the database code at all (this was a game server, so game logic was of course the primary resource hog), so I had no particular need for a multithreaded DB layer.
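A sketch of that single-context arrangement: every access goes through one lock, so the context is only ever touched by one thread at a time. `SerializedAccess` is a hypothetical wrapper and `TContext` stands in for the ObjectContext; nothing in it is EF-specific.

```csharp
using System;

public sealed class SerializedAccess<TContext>
{
    private readonly object _sync = new object();
    private readonly TContext _context; // e.g. the one shared ObjectContext

    public SerializedAccess(TContext context) { _context = context; }

    // Every read or write goes through here. Callers should not keep tracked
    // entities and modify them later outside the lock.
    public TResult Run<TResult>(Func<TContext, TResult> work)
    {
        lock (_sync)
        {
            return work(_context);
        }
    }
}
```

Usage would look like `shared.Run(ctx => ctx.SaveChanges())`. As the answer says, this serializes the whole data layer, which is only acceptable when the expensive work happens outside the lock.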
