I hope you guys will bear with me for my total lack of direction in case of Threading.
I have to implement a Mail Queue Processing System where I have to send emails queued up in a database through a Windows Service.
It is not a producer-consumer pattern. I have fetched say 10 rows at a time into a datable.
The datatable contains the serialized MailMessage object and SMTP sending details. If I have to use say a fixed number of threads (say 6 threads) Now how would I go about fetching a row from the datatable in a thread and then send the mail and again return to see if any more rows are remaining?
Any simple logic to implement this will do, preferably with a simple example in C#.
i am using .NET 3.5
Since sending emails is an I/O bound process, so spawning threads to send emails won't achieve much (if any) speedup.
If you're using the SMTP server that's part of Windows then when you "send" an email it doesn't actually get sent at that instant. It sits in a queue on the server and the server sends them as fast as it can. Sending emails is actually a slow process.
I guess what I'm saying is there are two options:
Just send them sequentially and see if that meets your performance requirements.
You could use a parallel programming concept called "Data Parallel" I've exampled it with examples in an blog post Data Parallel – Parallel Programming in C#/.NET
Basically, what you're doing is, you'll get all of your data (in one go). The reason is getting data in batches will also slow your process down so if you're interested in performance (which is why I'm guessing you're attempting to use threads), then don't do multiple rounds trips to the database server (which is also I/O bound on two levels, network I/O as well as disk I/O).
So get your data, and split it into chunks or partitions. This is all explained in the article I pointed to. The naive implementation would be the number of chunks equals the number of cores on the machine.
Each chunck is processed by one thread.
When all threads are done, you're done.
Withthe new features of the ThreadPool in .NET 4.0 (if you use Parallel.For or PLINQ or Tasks) you'll get some other benefits such as "work stealing" to further speed up the work.
Parallel.For/Parallel.ForEach will work well for you I'd think.
EDIT
Just noticed the .NET 3.5 requirement. Well the concepts still apply, but you don't have Parallel.For/ForEach. So here is an implementation (modified from my blog post) that uses the ThreadPool and using a Data Parallel technique.
private static void SendEmailsUsingThreadPool(List<Recipient> recipients)
{
var coreCount = Environment.ProcessorCount;
var itemCount = recipients.Count;
var batchSize = itemCount / coreCount;
var pending = coreCount;
using (var mre = new ManualResetEvent(false))
{
for (int batchCount = 0; batchCount < coreCount; batchCount++)
{
var lower = batchCount * batchSize;
var upper = (batchCount == coreCount - 1) ? itemCount : lower + batchSize;
ThreadPool.QueueUserWorkItem(st =>
{
for (int i = lower; i < upper; i++)
SendEmail(recipients[i]);
if (Interlocked.Decrement(ref pending) == 0)
mre.Set();
});
}
mre.WaitOne();
}
}
private static void SendEmail(Recipient recipient)
{
//Send your Emails here
}
}
class Recipient
{
public string FirstName { get; set; }
public string LastName { get; set; }
public string EmailAddress { get; set; }
}
So, get your data and call SendEmailUsingThreadPool() passing it your data. Of course don't call your method that :). If you have a DataSet/DataTable then simply modify the implementation to accept a DataSet/DataTable. This methods takes care of partioning your data into chunks so you don't have to worry about any of that. Simply call it.
You need to fetch messages in to memory in one place, and then route them to separate threads.
I guess
Related
I want to extend my experience with the .NET framework and want to build a client/server application.
Actually, the client/server is a small Point Of Sale system but first, I want to focus on the communication between server and client.
In the future, I want to make it a WPF application but for now, I simply started with a console application.
2 functionalities:
client(s) receive(s) a dataset and every 15/30min an update with changed prices/new products
(So the code will be in a Async method with a Thread.sleep for 15/30 mins).
when closing the client application, sending a kind of a report (for example, an xml)
On the internet, I found lots of examples but i can't decide which one is the best/safest/performanced manner of working so i need some advice for which techniques i should implement.
CLIENT/SERVER
I want 1 server application that handles max 6 clients. I read that threads use a lot of mb and maybe a better way will be tasks with async/await functionallity.
Example with ASYNC/AWAIT
http://bsmadhu.wordpress.com/2012/09/29/simplify-asynchronous-programming-with-c-5-asyncawait/
Example with THREADS
mikeadev.net/2012/07/multi-threaded-tcp-server-in-csharp/
Example with SOCKETS
codereview.stackexchange.com/questions/5306/tcp-socket-server
This seems to be a great example of sockets, however, the revisioned code isn't working completely because not all the classes are included
msdn.microsoft.com/en-us/library/fx6588te(v=vs.110).aspx
This example of MSDN has a lot more with Buffersize and a signal for the end of a message. I don't know if this just an "old way" to do this because in my previous examples, they just send a string from the client to the server and that's it.
.NET FRAMEWORK REMOTING/ WCF
I found also something about the remoting part of .NET and WCF but don' know if I need to implement this because i think the example with Async/Await isn't bad.
SERIALIZED OBJECTS / DATASET / XML
What is the best way to send data between it? Juse an XML serializer or just binary?
Example with Dataset -> XML
stackoverflow.com/questions/8384014/convert-dataset-to-xml
Example with Remoting
akadia.com/services/dotnet_dataset_remoting.html
If I should use the Async/Await method, is it right to something like this in the serverapplication:
while(true)
{
string input = Console.ReadLine();
if(input == "products")
SendProductToClients(port);
if(input == "rapport")
{
string Example = Console.ReadLine();
}
}
Here are several things anyone writing a client/server application should consider:
Application layer packets may span multiple TCP packets.
Multiple application layer packets may be contained within a single TCP packet.
Encryption.
Authentication.
Lost and unresponsive clients.
Data serialization format.
Thread based or asynchronous socket readers.
Retrieving packets properly requires a wrapper protocol around your data. The protocol can be very simple. For example, it may be as simple as an integer that specifies the payload length. The snippet I have provided below was taken directly from the open source client/server application framework project DotNetOpenServer available on GitHub. Note this code is used by both the client and the server:
private byte[] buffer = new byte[8192];
private int payloadLength;
private int payloadPosition;
private MemoryStream packet = new MemoryStream();
private PacketReadTypes readState;
private Stream stream;
private void ReadCallback(IAsyncResult ar)
{
try
{
int available = stream.EndRead(ar);
int position = 0;
while (available > 0)
{
int lengthToRead;
if (readState == PacketReadTypes.Header)
{
lengthToRead = (int)packet.Position + available >= SessionLayerProtocol.HEADER_LENGTH ?
SessionLayerProtocol.HEADER_LENGTH - (int)packet.Position :
available;
packet.Write(buffer, position, lengthToRead);
position += lengthToRead;
available -= lengthToRead;
if (packet.Position >= SessionLayerProtocol.HEADER_LENGTH)
readState = PacketReadTypes.HeaderComplete;
}
if (readState == PacketReadTypes.HeaderComplete)
{
packet.Seek(0, SeekOrigin.Begin);
BinaryReader br = new BinaryReader(packet, Encoding.UTF8);
ushort protocolId = br.ReadUInt16();
if (protocolId != SessionLayerProtocol.PROTOCAL_IDENTIFIER)
throw new Exception(ErrorTypes.INVALID_PROTOCOL);
payloadLength = br.ReadInt32();
readState = PacketReadTypes.Payload;
}
if (readState == PacketReadTypes.Payload)
{
lengthToRead = available >= payloadLength - payloadPosition ?
payloadLength - payloadPosition :
available;
packet.Write(buffer, position, lengthToRead);
position += lengthToRead;
available -= lengthToRead;
payloadPosition += lengthToRead;
if (packet.Position >= SessionLayerProtocol.HEADER_LENGTH + payloadLength)
{
if (Logger.LogPackets)
Log(Level.Debug, "RECV: " + ToHexString(packet.ToArray(), 0, (int)packet.Length));
MemoryStream handlerMS = new MemoryStream(packet.ToArray());
handlerMS.Seek(SessionLayerProtocol.HEADER_LENGTH, SeekOrigin.Begin);
BinaryReader br = new BinaryReader(handlerMS, Encoding.UTF8);
if (!ThreadPool.QueueUserWorkItem(OnPacketReceivedThreadPoolCallback, br))
throw new Exception(ErrorTypes.NO_MORE_THREADS_AVAILABLE);
Reset();
}
}
}
stream.BeginRead(buffer, 0, buffer.Length, new AsyncCallback(ReadCallback), null);
}
catch (ObjectDisposedException)
{
Close();
}
catch (Exception ex)
{
ConnectionLost(ex);
}
}
private void Reset()
{
readState = PacketReadTypes.Header;
packet = new MemoryStream();
payloadLength = 0;
payloadPosition = 0;
}
If you're transmitting point of sale information, it should be encrypted. I suggest TLS which is easily enabled on through .Net. The code is very simple and there are quite a few samples out there so for brevity I'm not going to show it here. If you are interested, you can find an example implementation in DotNetOpenServer.
All connections should be authenticated. There are many ways to accomplish this. I've use Windows Authentication (NTLM) as well as Basic. Although NTLM is powerful as well as automatic it is limited to specific platforms. Basic authentication simply passes a username and password after the socket has been encrypted. Basic authentication can still, however; authenticate the username/password combination against the local server or domain controller essentially impersonating NTLM. The latter method enables developers to easily create non-Windows client applications that run on iOS, Mac, Unix/Linux flavors as well as Java platforms (although some Java implementations support NTLM). Your server implementation should never allow application data to be transferred until after the session has been authenticated.
There are only a few things we can count on: taxes, networks failing and client applications hanging. It's just the nature of things. Your server should implement a method to clean up both lost and hung client sessions. I've accomplished this in many client/server frameworks through a keep-alive (AKA heartbeat) protocol. On the server side I implement a timer that is reset every time a client sends a packet, any packet. If the server doesn't receive a packet within the timeout, the session is closed. The keep-alive protocol is used to send packets when other application layer protocols are idle. Since your application only sends XML once every 15 minutes sending a keep-alive packet once a minute would able the server side to issue an alert to the administrator when a connection is lost prior to the 15 minute interval possibly enabling the IT department to resolve a network issue in a more timely fashion.
Next, data format. In your case XML is great. XML enables you to change up the payload however you want whenever you want. If you really need speed, then binary will always trump the bloated nature of string represented data.
Finally, as #NSFW already stated, threads or asynchronous doesn't really matter in your case. I've written servers that scale to 10000 connections based on threads as well as asynchronous callbacks. It's all really the same thing when it comes down to it. As #NSFW said, most of us are using asynchronous callbacks now and the latest server implementation I've written follows that model as well.
Threads are not terribly expensive, considering the amount of RAM available on modern systems, so I don't think it's helpful to optimize for a low thread count. Especially if we're talking about a difference between 1 thread and 2-5 threads. (With hundreds or thousands of threads, the cost of a thread starts to matter.)
But you do want to optimize for minimal blocking of whatever threads you do have. So for example instead of using Thread.Sleep to do work on 15 minute intervals, just set a timer, let the thread return, and trust the system to invoke your code 15 minutes later. And instead of blocking operations for reading or writing information over the network, use non-blocking operations.
The async/await pattern is the new hotness for asynchronous programming on .Net, and it is a big improvement over the Begin/End pattern that dates back to .Net 1.0. Code written with async/await is still using threads, it is just using features of C# and .Net to hide a lot of the complexity of threads from you - and for the most part, it hides the stuff that should be hidden, so that you can focus your attention on your application's features rather than the details of multi-threaded programming.
So my advice is to use the async/await approach for all of your IO (network and disk) and use timers for periodic chores like sending those updates you mentioned.
And about serialization...
One of the biggest advantages of XML over binary formats is that you can save your XML transmissions to disk and open them up using readily-available tools to confirm that the payload really contains the data that you thought would be in there. So I tend to avoid binary formats unless bandwidth is scarce - and even then, it's useful to develop most of the app using a text-friendly format like XML, and then switch to binary after the basic mechanism of sending and receiving data have been fleshed out.
So my vote is for XML.
And regarding your code example, well ther's no async/await in it...
But first, note that a typical simple TCP server will have a small loop that listens for incoming connections and starts a thread to hanadle each new connection. The code for the connection thread will then listen for incoming data, process it, and send an appropriate response. So the listen-for-new-connections code and the handle-a-single-connection code are completely separate.
So anyway, the connection thread code might look similar to what you wrote, but instead of just calling ReadLine you'd do something like "string line = await ReadLine();" The await keyword is approximately where your code lets one thread exit (after invoking ReadLine) and then resumes on another thread (when the result of ReadLine is available). Except that awaitable methods should have a name that ends with Async, for example ReadLineAsync. Reading a line of text from the network is not a bad idea, but you'll have to write ReadLineAsync yourself, building upon the existing network API.
I hope this helps.
According to my benchmark of creating nodes using
GraphClient.Create()
performance leaves much to be desired.
I've got about 10 empty nodes per second on my machine (Core i3, 8 GB RAM).
Even when I use multithreading to perform create time to each Create() call speed icreases linearly (~N times when used N threads).
I've tested both stable 1.9.2 and 2.0.0-M04. The results exactly the same.
Does anybody know what's wrong?
EDIT: I tried to use neo4j REST API and I got similar results: ~ 20 empty nodes per second and multithreading also gives no benefits.
EDIT 2: At the same time Batch REST API, that allows batch creations provides much better performance: about 250 nodes per second. It looks like there is incredible big overhead in handling single request...
Poor performance caused by overhead in processing RESTful Cypher query. Mostly it is network overhead but overhead caused by need to parse query also exists.
Use Core Java API when you interested in high performance. Core Java API provides more than 10 times faster requests processing than Cypher query language.
See this articles:
Performance of Graph Query Languages
Get the full neo4j power by using the Core Java API for traversing
your Graph data base instead of Cypher Query Language
The neo4jclient itself uses the REST API, so you're already limited in performance (by bandwidth, network latency etc) when compared to a direct API call (for which you'd need Java).
What performance are you after?
What code are you running?
Some initial thoughts & tests to try:
Obviously there are things like CPU etc which will cause some throttling, some things to consider:
Is the Neo4J server on the same machine?
Have you tried your application not through Visual Studio? (i.e. no debugging)
In my test code (below), I get 10 entries in ~200ms - can you try this code in a simple console app and see what you get?
private static void Main()
{
var client = new GraphClient(new Uri("http://localhost.:7474/db/data"));
client.Connect();
for (int i = 0; i < 10; i++)
CreateEmptyNodes(10, client);
}
private static void CreateEmptyNodes(int numberToCreate, IGraphClient client)
{
var start = DateTime.Now;
for (int i = 0; i < numberToCreate; i++)
client.Create(new object());
var timeTaken = DateTime.Now - start;
Console.WriteLine("For {0} items, I took: {1}ms", numberToCreate, timeTaken.TotalMilliseconds);
}
EDIT:
This is a raw HttpClient approach to calling the 'Create', which I believe is analagous to what neo4jclient is doing under the hood:
private async static void StraightHttpClient(int iterations, int amount)
{
var client = new HttpClient {BaseAddress = new Uri("http://localhost.:7474/db/data/")};
for (int j = 0; j < iterations; j++)
{
DateTime start = DateTime.Now;
for (int i = 0; i < amount; i++)
{
var response = await client.SendAsync(new HttpRequestMessage(HttpMethod.Post, "cypher/") { Content = new StringContent("{\"query\":\"create me\"}", Encoding.UTF8, "application/json") });
if(response.StatusCode != HttpStatusCode.OK)
Console.WriteLine("Not ok");
}
TimeSpan timeTaken = DateTime.Now - start;
Console.WriteLine("took {0}ms", timeTaken.TotalMilliseconds);
}
}
Now, if you didn't care about the response, you could just call Client.SendAsync(..) without the await, and that gets you to a spiffy ~2500 per second. However obviously the big issue here is that you haven't necessarily sent any of those creates, you've basically queued them, so shut down your program straight after, and chances are you'll have either no entries, or a very small number.
So.. clearly the code can handle firing x thousand calls a second with no problems, (I've done a similar test to the above using ServiceStack and RestSharp, both take similar times to the HttpClient).
What it can't do is send those to the actual server at the same rate, so we're limited by the windows http stack and / or how fast n4j can process the request and supply a response.
I've been trying to get what I believe to be the simplest possible form of threading to work in my application but I just can't do it.
What I want to do: I have a main form with a status strip and a progress bar on it. I have to read something between 3 and 99 files and add their hashes to a string[] which I want to add to a list of all files with their respective hashes. Afterwards I have to compare the items on that list to a database (which comes in text files).
Once all that is done, I have to update a textbox in the main form and the progressbar to 33%; mostly I just don't want the main form to freeze during processing.
The files I'm working with always sum up to 1.2GB (+/- a few MB), meaning I should be able to read them into byte[]s and process them from there (I have to calculate CRC32, MD5 and SHA1 of each of those files so that should be faster than reading all of them from a HDD 3 times).
Also I should note that some files may be 1MB while another one may be 1GB. I initially wanted to create 99 threads for 99 files but that seems not wise, I suppose it would be best to reuse threads of small files while bigger file threads are still running. But that sounds pretty complicated to me so I'm not sure if that's wise either.
So far I've tried workerThreads and backgroundWorkers but neither seem to work too well for me; at least the backgroundWorkers worked SOME of the time, but I can't even figure out why they won't the other times... either way the main form still froze.
Now I've read about the Task Parallel Library in .NET 4.0 but I thought I should better ask someone who knows what he's doing before wasting more time on this.
What I want to do looks something like this (without threading):
List<string[]> fileSpecifics = new List<string[]>();
int fileMaxNumber = 42; // something between 3 and 99, depending on file set
for (int i = 1; i <= fileMaxNumber; i++)
{
string fileName = "C:\\path\\to\\file" + i.ToString("D2") + ".ext"; // file01.ext - file99.ext
string fileSize = new FileInfo(fileName).Length.ToString();
byte[] file = File.ReadAllBytes(fileName);
// hash calculations (using SHA1CryptoServiceProvider() etc., no problems with that so I'll spare you that, return strings)
file = null; // I didn't yet check if this made any actual difference but I figured it couldn't hurt
fileSpecifics.Add(new string[] { fileName, fileSize, fileCRC, fileMD5, fileSHA1 });
}
// look for files in text database mentioned above, i.e. first check for "file bundles" with the same amount of files I have here; then compare file sizes, then hashes
// again, no problems with that so I'll spare you that; the database text files are pretty small so parsing them doesn't need to be done in an extra thread.
Would anybody be kind enough to point me in the right direction? I'm looking for the easiest way to read and hash those files quickly (I believe the hashing takes some time in which other files could already be read) and save the output to a string[], without the main form freezing, nothing more, nothing less.
I'm thankful for any input.
EDIT to clarify: by "backgroundWorkers working some of the time" I meant that (for the very same set of files), maybe the first and fourth execution of my code produces the correct output and the UI unfreezes within 5 seconds, for the second, third and fifth execution it freezes the form (and after 60 seconds I get an error message saying some thread didn't respond within that time frame) and I have to stop execution via VS.
Thanks for all your suggestions and pointers, as you all have correctly guessed I'm completely new to threading and will have to read up on the great links you guys posted.
Then I'll give those methods a try and flag the answer that helped me the most. Thanks again!
With .NET Framework 4.X
Use Directory.EnumerateFiles Method for efficient/lazy files enumeration
Use Parallel.For() to delegate parallelism work to PLINQ framework or use TPL to delegate single Task per pipeline Stage
Use Pipelines pattern to pipeline following stages: calculating hashcodes, compare with pattern, update UI
To avoid UI freeze use appropriate techniques: for WPF use Dispatcher.BeginInvoke(), for WinForms use Invoke(), see this SO answer
Considering that all this stuff has UI it might be useful adding some cancellation feature to abandon long running operation if needed, take a look at the CreateLinkedTokenSource class which allows triggering CancellationToken from the "external scope"
I can try adding an example but it's worth do it yourself so you would learn all this stuff rather than simply copy/paste - > got it working -> forgot about it.
PS: Must read - Pipelines paper at MSDN
TPL specific pipeline implementation
Pipeline pattern implementation: three stages: calculate hash, match, update UI
Three tasks, one per stage
Two Blocking Queues
//
// 1) CalculateHashesImpl() should store all calculated hashes here
// 2) CompareMatchesImpl() should read input hashes from this queue
// Tuple.Item1 - hash, Typle.Item2 - file path
var calculatedHashes = new BlockingCollection<Tuple<string, string>>();
// 1) CompareMatchesImpl() should store all pattern matching results here
// 2) SyncUiImpl() method should read from this collection and update
// UI with available results
var comparedMatches = new BlockingCollection<string>();
var factory = new TaskFactory(TaskCreationOptions.LongRunning,
TaskContinuationOptions.None);
var calculateHashesWorker = factory.StartNew(() => CalculateHashesImpl(...));
var comparedMatchesWorker = factory.StartNew(() => CompareMatchesImpl(...));
var syncUiWorker= factory.StartNew(() => SyncUiImpl(...));
Task.WaitAll(calculateHashesWorker, comparedMatchesWorker, syncUiWorker);
CalculateHashesImpl():
private void CalculateHashesImpl(string directoryPath)
{
foreach (var file in Directory.EnumerateFiles(directoryPath))
{
var hash = CalculateHashTODO(file);
calculatedHashes.Add(new Tuple<string, string>(hash, file.Path));
}
}
CompareMatchesImpl():
private void CompareMatchesImpl()
{
foreach (var hashEntry in calculatedHashes.GetConsumingEnumerable())
{
// TODO: obviously return type is up to you
string matchResult = GetMathResultTODO(hashEntry.Item1, hashEntry.Item2);
comparedMatches.Add(matchResult);
}
}
SyncUiImpl():
private void UpdateUiImpl()
{
foreach (var matchResult in comparedMatches.GetConsumingEnumerable())
{
// TODO: track progress in UI using UI framework specific features
// to do not freeze it
}
}
TODO: Consider using CancellationToken as a parameter for all GetConsumingEnumerable() calls so you easily can stop a pipeline execution when needed.
First off, you should be using a higher level of abstraction to solve this problem. You have a bunch of tasks to complete, so use the "task" abstraction. You should be using the Task Parallel Library to do this sort of thing. Let the TPL deal with the question of how many worker threads to create -- the answer could be as low as one if the work is gated on I/O.
If you do want to do your own threading, some good advice:
Do not ever block on the UI thread. That's is what is freezing your application. Come up with a protocol by which working threads can communicate with your UI thread, which then does nothing except for responding to UI events. Remember that methods of user interface controls like task completion bars must never be called by any other thread other than the UI thread.
Do not create 99 threads to read 99 files. That's like getting 99 pieces of mail and hiring 99 assistants to write responses: an extraordinarily expensive solution to a simple problem. If your work is CPU intensive then there is no point in "hiring" more threads than you have CPUs to service them. (That's like hiring 99 assistants in an office that only has four desks. The assistants spend most of their time waiting for a desk to sit at instead of reading your mail.) If your work is disk-intensive then most of those threads are going to be idle most of the time waiting for the disk, which is an even bigger waste of resources.
First, I hope you are using a built-in library for calculating hashes. It's possible to write your own, but it's far safer to use something that has been around for a while.
You may need only create as many threads as CPUs if your process is CPU intensive. If it is bound by I/O, you might be able to get away with more threads.
I do not recommend loading the entire file into memory. Your hashing library should support updating a chunk at a time. Read a chunk into memory, use it to update the hashes of each algorighm, read the next chunk, and repeat until end of file. The chunked approach will help lower your program's memory demands.
As others have suggested, look into the Task Parallel Library, particularly Data Parallelism. It might be as easy as this:
Parallel.ForEach(fileSpecifics, item => CalculateHashes(item));
Check out TPL Dataflow. You can use a throttled ActionBlock which will manage the hard part for you.
If my understanding that you are looking to perform some tasks in the background and not block your UI, then the UI BackgroundWorker would be an appropriate choice. You mentioned that you got it working some of the time, so my recommendation would be to take what you had in a semi-working state, and improve upon it by tracking down the failures. If my hunch is correct, your worker was throwing an exception, which it does not appear you are handling in your code. Unhandled exceptions that bubble out of their containing threads make bad things happen.
This code hashing one file (stream) using two tasks - one for reading, second for hashing, for more robust way you should read more chunks forward.
Because bandwidth of processor is much higher than of disk, unless you use some high speed Flash drive you gain nothing from hashing more files concurrently.
public void TransformStream(Stream a_stream, long a_length = -1)
{
Debug.Assert((a_length == -1 || a_length > 0));
if (a_stream.CanSeek)
{
if (a_length > -1)
{
if (a_stream.Position + a_length > a_stream.Length)
throw new IndexOutOfRangeException();
}
if (a_stream.Position >= a_stream.Length)
return;
}
System.Collections.Concurrent.ConcurrentQueue<byte[]> queue =
new System.Collections.Concurrent.ConcurrentQueue<byte[]>();
System.Threading.AutoResetEvent data_ready = new System.Threading.AutoResetEvent(false);
System.Threading.AutoResetEvent prepare_data = new System.Threading.AutoResetEvent(false);
Task reader = Task.Factory.StartNew(() =>
{
long total = 0;
for (; ; )
{
byte[] data = new byte[BUFFER_SIZE];
int readed = a_stream.Read(data, 0, data.Length);
if ((a_length == -1) && (readed != BUFFER_SIZE))
data = data.SubArray(0, readed);
else if ((a_length != -1) && (total + readed >= a_length))
data = data.SubArray(0, (int)(a_length - total));
total += data.Length;
queue.Enqueue(data);
data_ready.Set();
if (a_length == -1)
{
if (readed != BUFFER_SIZE)
break;
}
else if (a_length == total)
break;
else if (readed != BUFFER_SIZE)
throw new EndOfStreamException();
prepare_data.WaitOne();
}
});
Task hasher = Task.Factory.StartNew((obj) =>
{
IHash h = (IHash)obj;
long total = 0;
for (; ; )
{
data_ready.WaitOne();
byte[] data;
queue.TryDequeue(out data);
prepare_data.Set();
total += data.Length;
if ((a_length == -1) || (total < a_length))
{
h.TransformBytes(data, 0, data.Length);
}
else
{
int readed = data.Length;
readed = readed - (int)(total - a_length);
h.TransformBytes(data, 0, data.Length);
}
if (a_length == -1)
{
if (data.Length != BUFFER_SIZE)
break;
}
else if (a_length == total)
break;
else if (data.Length != BUFFER_SIZE)
throw new EndOfStreamException();
}
}, this);
reader.Wait();
hasher.Wait();
}
Rest of code here: http://hashlib.codeplex.com/SourceControl/changeset/view/71730#514336
I have a database-synchronisation task that takes some time to process, as there are in the region of 120k leaf records, but they are remote and relatively slow to access.
Currently, my app does a fairly naive process of
Get list of all the local Contacts
For each local contact, get all the related data
Then get the matching remote contact
Compare the two and do stuff to bring them in sync
Step 1 returns data before it's finished, and step 4 doesn't involve comparisons between different contacts in the same set.
What I was hoping to do was use some sort of queue construct and start populating it in step 1, then immediately move onto step 2 and start processing items as they come in, using multiple threads.
The process then becomes:
Start populating the queue with contacts
While there are items in the queue
Start a thread and:
Take the front contact from the queue
Fetch the remote contact
Compare them
Perform the required updates
Am I correct in the assumption that I can create a new ConcurrentQueue, start populating it, then loop over it as I might a single-threaded simple collection?
(I've not put in any error-checking or the actual threading, to keep the example simple)
class Program
{
static void Main(string[] args)
{
Processor p = new Processor();
p.Process();
}
}
class Processor
{
bool FetchComplete = false;
ConcurrentQueue<Contact> q = new ConcurrentQueue<Contact>();
public void Process()
{
this.PopulateQueue(); // this will be fired off using QueueUserWorkItem for example
while (FetchComplete == false)
{
if (q.Count > 0)
{
Contact contact;
q.TryDequeue(out contact);
ProcessContact(contact); // this will also be in QueueUserWorkItem
}
}
}
// a long running process that fills the queue with Contacts
private void PopulateQueue()
{
this.FetchComplete = false;
// foreach contact in database
Contact contact = new Contact(); // contact will come from DB
this.q.Enqueue(contact);
// end foreach
this.FetchComplete = true;
}
private void ProcessContact(Contact contact)
{
// do magic with contact
}
}
You might be better off using BlockingCollection instead of ConcurrentQueue. The reason being that the former will block the thread calling Take until an item appears in the queue. This would be useful when the thread processing the Contract instances clears out the queue before the fetching thread has retrieved them all.
In general your strategy is pretty solid. I use it all the time. It is often referred to as the producer-consumer pattern. When there are more than 2 stages involved in the processing then it is called the pipeline pattern. In that case you would have 2 or more queues instead of the typical one. You can imagine scenarios where each stage forwards the work item onto the next stage via another queue.
I am trying to design a multithreaded windows application mostly serves for our clients to send emails fastly to their customers(there can be millions as there is a big telecommunication company), and I need design hints.(I am sorry that Q is long)
I fairly read articles about the multithreaded applications. I also read about SmartThread Pool, .NET ThreadPool, Task Parallel Library and other SO questions. But I could not come with a correct design. My logic is like that :
Within start of the program(Email engine), a timer starts and check if there any email campaigns in database(Campaigns table) that has Status 1(new campaign).
If there are, Campaign Subscribers should be queried from DB and should be written to another table(SqlBulkCopy) called SubscriberReports table and update the Campaign's Status to 2 in Campaigns table.
Timer also listens Campaigns with Status 2 to call another method to customize the campaign for each Subscriber, creates a Struct that has customized properties of the Subscriber.
Thirdly a SendEmail method is invoked to send the email via SMTP. What I tried so far is below(I know that ThreadPool is wrong here, and I have bunch of other mistakes). Can you pls suggest and help me how to design such an application. Highly appreciate any help. Thanks alot for your time.
private void ProcessTimer(object Source, ElapsedEventArgs e)
{
Campaigns campaign = new Campaigns();
IEnumerable<Campaigns> campaignsListStatusOne = // Get Campaign Properties to a List
IEnumerable<Campaigns> campaignsListStatusTwo = // Get Campaign Properties to a List
foreach (Campaigns _campaign in campaignsListStatusOne)
{
ThreadPool.QueueUserWorkItem(new WaitCallback(CheckNewCampaign), _campaign.CampaignID);
}
foreach (Campaigns _campaign in campaignsListStatusTwo)
{
ThreadPool.QueueUserWorkItem(new WaitCallback(CustomizeMail), _campaign.CampaignID);
}
}
private void CheckNewCampaign(object state)
{
int campaignID = (int)state;
DataTable dtCampaignSubscribers = // get subscribers based on Campaign ID
campaign.UpdateStatus(campaignID, 2);
}
private void CustomizeMail(object state)
{
int campaignID = (int)state;
CampaignCustomazition campaignCustomizer;
IEnumerable<SubscriberReports> reportList = // get subscribers to be sent from Reports table
foreach (SubscriberReports report in reportList)
{ // 3 database related methods are here
campaignCustomizer = new CampaignCustomazition(report.CampaignID, report.SubscriberID);
campaignCustomizer.CustomizeSource(report.CampaignID, report.SubscriberID, out campaignCustomizer.source, out campaignCustomizer.format);
campaignCustomizer.CustomizeCampaignDetails(report.CampaignID, report.SubscriberID, out campaignCustomizer.subject, out campaignCustomizer.fromName, out campaignCustomizer.fromEmail, out campaignCustomizer.replyEmail);
campaignCustomizer.CustomizeSubscriberDetails(report.SubscriberID, out campaignCustomizer.email, out campaignCustomizer.fullName);
ThreadPool.QueueUserWorkItem(new WaitCallback(SendMail), campaignCustomizer);
}
}
private void SendMail(object state)
{
CampaignCustomazition campaignCustomizer = new CampaignCustomazition();
campaignCustomizer = (CampaignCustomazition)state;
//send email based on info at campaignCustomizer via SMTP and update DB record if it is success.
}
There is little to be gained here by using threading. What threads buy you is more cpu cycles. Assuming you have a machine with multiple cores, pretty standard these days. But that's not what you need to get the job done quicker. You need more dbase and email servers. Surely you only have one of each. Your program will burn very little core, it is constantly waiting for the dbase query and the email server to complete their job.
The only way to get ahead is to overlap the delays of each. One thread could constantly be waiting for the dbase engine, the other could be constantly waiting for the email server. Which is better than one thread waiting for both.
That's not likely to buy you much either though, there's a big mismatch between the two. The dbase engine can give you thousands of email addresses in a second, the email server can only a few hundred emails in a second. Everything is throttled by how fast the email server works.
Given the low odds of getting ahead, I'd recommend you don't try to get yourself into trouble with threading at all. It has a knack for producing very hard to diagnose failure if you don't lock properly. The amount of time you can spend on troubleshooting this can greatly exceed the operational gains from moving a wee bit faster.
If you are contemplating threading to avoid freezing a user interface then that's a reasonable use for threading. Use BackgroundWorker. The MSDN Library has excellent help for it.