I have a table of urls which I need to loop through, download each file, update the table and return the results. I want to run up to 10 downloads at a time so am thinking about using delegates as follows:
DataTable photos;
bool scanning = false,
complete = false;
int rowCount = 0;
public delegate int downloadFileDelegate();
public void page_load(){
photos = Database.getData...
downloadFileDelegate d = downloadFile;
d.BeginInvoke(downloadFileComplete, d);
d.BeginInvoke(downloadFileComplete, d);
d.BeginInvoke(downloadFileComplete, d);
d.BeginInvoke(downloadFileComplete, d);
d.BeginInvoke(downloadFileComplete, d);
while(!complete){}
//handle results...
}
int downloadFile(){
while(scanning){} scanning = true;
DataRow r;
for (int ii = 0; ii < rowCount; ii++) {
r = photos.Rows[ii];
if ((string)r["status"] == "ready"){
r["status"] = "running";
scanning = false; return ii;
}
if ((string)r["status"] == "running"){
scanning = false; return -2;
}
}
scanning = false; return -1;
}
void downloadFileComplete(IAsyncResult ar){
if (ar == null){ return; }
downloadFileDelegate d = (downloadFileDelegate)ar.AsyncState;
int i = d.EndInvoke(ar);
if (i == -1){ complete = true; return; }
//download file...
//update row
DataRow r = photos.Rows[i];
r["status"] = "complete";
//invoke delegate again
d.BeginInvoke(downloadFileComplete, d);
}
However when I run this it takes the same amount of time to run 5 as it does 1. I was expecting it to take 5 times faster.
Any ideas?
You look like you're trying to use lockless syncronization (using while(scanning) to check a boolean that's set at the start of the function and reset at the end), but all this succeeds in doing is only running one retrieval at a time.
Is there a reason that you don't want them to run concurrently? This seems like the entire point of your exercise. I can't see a reason for this, so I'd lose the scanning flag (and associated logic) entirely.
If you're going to take this approach, your boolean flags need to be declared as volatile (otherwise their reads could be cached and you could wait endlessly)
Your data update operations (updating the value in the DataRow) must take place on the UI thread. You'll have to wrap these operations up in a Control.Invoke or Control.BeginInvoke call, otherwise you'll be interacting with the control across thread boundaries.
BeginInvoke returns an AsyncWaitHandle. Use this for your logic that will take place when the operations are all complete. Something like this
-
WaitHandle[] handles = new WaitHandle[]
{
d.BeginInvoke(...),
d.BeginInvoke(...),
d.BeginInvoke(...),
d.BeginInvoke(...),
d.BeginInvoke(...)
}
WaitHandle.WaitAll(handles);
This will cause the calling thread to block until all of the operations are complete.
It will take the same amount of time if you are limited by network bandwidth. If you are downloading all 10 files from the same site then it won't be any quicker. Multi threading is useful when you either want the User Interface to respond or you have something processor intensive and multiple cores
WaitHandle[] handles = new WaitHandle[5];
handles[0] = d.BeginInvoke(downloadFileComplete, d).AsyncWaitHandle;
handles[1] = d.BeginInvoke(downloadFileComplete, d).AsyncWaitHandle;
handles[2] = d.BeginInvoke(downloadFileComplete, d).AsyncWaitHandle;
handles[3] = d.BeginInvoke(downloadFileComplete, d).AsyncWaitHandle;
handles[4] = d.BeginInvoke(downloadFileComplete, d).AsyncWaitHandle;
WaitHandle.WaitAll(handles);
There are numerous things in this implementation that are not safe for concurrent use.
But the one that is likely causing the effect you describe is the fact that downloadFile() has a while() loop that examines the scanning variable. This variable is shared by all instances of the running delegate. This while loop prevents the delegates from running concurrently.
This "scanning loop" is not a proper threading construct - it is possible for two threads to both read the variable and both set it at the same time. You should really use a Semaphore or lock() statement to to protected the DataTable from concurrent access. Since each thread is spending most of it's time waiting for scanning to be false, the downloadFile method cannot run concurrently.
You should reconsider how you've structured this code so that downloading the data and updating the structures in your application are not in contention.
Something like this would work better I think.
public class PhotoDownload
{
public ManualResetEvent Complete { get; private set; }
public Object RequireData { get; private set; }
public Object Result { get; private set; }
}
public void DownloadPhotos()
{
var photos = new List<PhotoDownload>();
// build photo download list
foreach (var photo in photos)
{
ThreadPool.QueueUserWorkItem(DownloadPhoto, photo);
}
// wait for the downloads to complete
foreach (var photo in photos)
{
photo.Complete.WaitOne();
}
// make sure everything happened correctly
}
public void DownloadPhoto(object state)
{
var photo = state as PhotoDownload;
try
{
// do not access fields in this class
// everything should be inside the photo object
}
finally
{
photo.Complete.Set();
}
}
Related
I have a thread that handles the message receiving every 10 seconds and have another one write these messages to the database every minute.
Each message has a different sender which is named serialNumber in my case.
Therefore, I created a ConcurrentDictionary like below.
public ConcurrentDictionary<string, ConcurrentQueue<PacketModel>> _dicAllPackets;
The key of the dictionary is serialNumber and the value is the collection of 1-minute messages. The reason I want to collect a minute of data is instead of going database every 10 seconds is go once in every minute so I can reduce the process by 1/6 times.
public class ShotManager
{
private const int SLEEP_THREAD_FOR_FILE_LIST_DB_SHOOTER = 25000;
private bool ACTIVE_FILE_DB_SHOOT_THREAD = false;
private List<Devices> _devices = new List<Devices>();
public ConcurrentDictionary<string, ConcurrentQueue<PacketModel>> _dicAllPackets;
public ShotManager()
{
ACTIVE_FILE_DB_SHOOT_THREAD = Utility.GetAppSettings("AppConfig", "0", "ACTIVE_LIST_DB_SHOOT") == "1";
init();
}
private void init()
{
using (iotemplaridbContext dbContext = new iotemplaridbContext())
_devices = (from d in dbContext.Devices select d).ToList();
if (_dicAllPackets is null)
_dicAllPackets = new ConcurrentDictionary<string, ConcurrentQueue<PacketModel>>();
foreach (var device in _devices)
{
if(!_dicAllPackets.ContainsKey(device.SerialNumber))
_dicAllPackets.TryAdd(device.SerialNumber, new ConcurrentQueue<PacketModel> { });
}
}
public void Spinner()
{
while (ACTIVE_FILE_DB_SHOOT_THREAD)
{
try
{
Parallel.ForEach(_dicAllPackets, devicePacket =>
{
Thread.Sleep(100);
readAndShot(devicePacket);
});
Thread.Sleep(SLEEP_THREAD_FOR_FILE_LIST_DB_SHOOTER);
//init();
}
catch (Exception ex)
{
//init();
tLogger.EXC("Spinner exception for write...", ex);
}
}
}
public void EnqueueObjectToQueue(string serialNumber, PacketModel model)
{
if (_dicAllPackets != null)
{
if (!_dicAllPackets.ContainsKey(serialNumber))
_dicAllPackets.TryAdd(serialNumber, new ConcurrentQueue<PacketModel> { });
else
_dicAllPackets[serialNumber].Enqueue(model);
}
}
private void readAndShot(KeyValuePair<string, ConcurrentQueue<PacketModel>> keyValuePair)
{
StringBuilder sb = new StringBuilder();
if (keyValuePair.Value.Count() <= 0)
{
return;
}
sb.AppendLine($"INSERT INTO ......) VALUES(");
//the reason why I don't use while(TryDequeue(out ..)){..} is there's constantly enqueue to this dictionary, so the thread will be occupied with a single device for so long
for (int i = 0; i < 10; i++)
{
keyValuePair.Value.TryDequeue(out PacketModel packet);
if (packet != null)
{
/*
*** do something and fill the sb...
*/
}
else
{
Console.WriteLine("No packet found! For Device: " + keyValuePair.Key);
break;
}
}
insertIntoDB(sb.ToString()[..(sb.Length - 5)] + ";");
}
}
EnqueueObjectToQueue caller is from a different class like below.
private void packetToDictionary(string serialNumber, string jsonPacket, string messageTimeStamp)
{
PacketModel model = new PacketModel {
MachineData = jsonPacket,
DataInsertedAt = messageTimeStamp
};
_shotManager.EnqueueObjectToQueue(serialNumber, model);
}
How I call the above function is from the handler function itself.
private void messageReceiveHandler(object sender, MessageReceviedEventArgs e){
//do something...parse from e and call the func
string jsonPacket = ""; //something parsed from e
string serialNumber = ""; //something parsed from e
string message_timestamp = DateTime.Now().ToString("yyyy-MM-dd HH:mm:ss");
ThreadPool.QueueUserWorkItem(state => packetToDictionary(serialNumber, str, message_timestamp));
}
The problem is sometimes some packets are enqueued under the wrong serialNumber or repeat itself(duplicate entry).
Is it clever to use ConcurrentQueue in a ConcurrentDictionary like this?
No, it's not a good idea to use a ConcurrentDictionary with nested ConcurrentQueues as values. It's impossible to update atomically this structure. Take this for example:
if (!_dicAllPackets.ContainsKey(serialNumber))
_dicAllPackets.TryAdd(serialNumber, new ConcurrentQueue<PacketModel> { });
else
_dicAllPackets[serialNumber].Enqueue(model);
This little piece of code is riddled with race conditions. A thread that is running this code can be intercepted by another thread at any point between the ContainsKey, TryAdd, the [] indexer and the Enqueue invocations, altering the state of the structure, and invalidating the conditions on which the correctness of the current thread's work is based.
A ConcurrentDictionary is a good idea when you have a simple Dictionary that contains immutable values, you want to use it concurrently, and using a lock around each access could potentially create significant contention. You can read more about this here: When should I use ConcurrentDictionary and Dictionary?
My suggestion is to switch to a simple Dictionary<string, Queue<PacketModel>>, and synchronize it with a lock. If you are careful and you avoid doing anything irrelevant while holding the lock, the lock will be released so quickly that rarely other threads will be blocked by it. Use the lock just to protect the reading and updating of a specific entry of the structure, and nothing else.
Alternative designs
A ConcurrentDictionary<string, Queue<PacketModel>> structure might be a good option, under the condition that you never removed queues from the dictionary. Otherwise there is still space for race conditions to occur. You should use exclusively the GetOrAdd method to get or add atomically a queue in the dictionary, and also use always the queue itself as a locker before doing anything with it (either reading or writing):
Queue<PacketModel> queue = _dicAllPackets
.GetOrAdd(serialNumber, _ => new Queue<PacketModel>());
lock (queue)
{
queue.Enqueue(model);
}
Using a ConcurrentDictionary<string, ImmutableQueue<PacketModel>> is also possible because in this case the value of the ConcurrentDictionary is immutable, and you won't need to lock anything. You'll need to use always the AddOrUpdate method, in order to update the dictionary with a single call, as an atomic operation.
_dicAllPackets.AddOrUpdate
(
serialNumber,
key => ImmutableQueue.Create<PacketModel>(model),
(key, queue) => queue.Enqueue(model)
);
The queue.Enqueue(model) call inside the updateValueFactory delegate does not mutate the queue. Instead it creates a new ImmutableQueue<PacketModel> and discards the previous one. The immutable collections are not very efficient in general. But if your goal is to minimize the contention between threads, at the cost of increasing the work that each thread has to do, then you might find them useful.
I've built a program that
takes in a list of record data from a file
parses and cleans up each record in a parsing object
outputs it to an output file
So far this has worked on a single thread, but considering the fact that records can exceed 1 million in some cases, we want to implement this in a multi threading context. Multi threading is new to me in .Net, and I've given it a shot but its not working. Below I will provide more details and code:
Main Class (simplified):
public class MainClass
{
parseObject[] parseObjects;
Thread[] threads;
List<InputLineItem> inputList = new List<InputLineItem>();
FileUtils fileUtils = new FileUtils();
public GenParseUtilsThreaded(int threadCount)
{
this.threadCount = threadCount;
Init();
}
public void Init()
{
inputList = fileUtils.GetInputList();
parseObjects = new parseObject[threadCount - 1];
threads = new Thread[threadCount - 1];
InitParseObjects();
Parse();
}
private void InitParseObjects()
{
//using a ref of fileUtils to use as my lock expression
parseObjects[0] = new ParseObject(ref fileUtils);
parseObjects[0].InitValues();
for (int i = 1; i < threadCount - 1; i++)
{
parseObjects[i] = new parseObject(ref fileUtils);
parseObjects[i].InitValues();
}
}
private void InitThreads()
{
for (int i = 0; i < threadCount - 1; i++)
{
Thread t = new Thread(new ThreadStart(parseObjects[0].CleanupAndParseInput));
threads[i] = t;
}
}
public void Parse()
{
try
{
InitThreads();
int objectIndex = 0;
foreach (InputLineItem inputLineItem in inputList)
{
parseObjects[0].inputLineItem = inputLineItem;
threads[objectIndex].Start();
objectIndex++;
if (objectIndex == threadCount)
{
objectIndex = 0;
InitThreads(); //do i need to re-init the threads after I've already used them all once?
}
}
}
catch (Exception e)
{
Console.WriteLine("(286) The following error occured: " + e);
}
}
}
}
And my Parse object class (also simplified):
public class ParseObject
{
public ParserLibrary parser { get; set; }
public FileUtils fileUtils { get; set; }
public InputLineItem inputLineItem { get; set; }
public ParseObject( ref FileUtils fileUtils)
{
this.fileUtils = fileUtils;
}
public void InitValues()
{
//relevant config of parser library object occurs here
}
public void CleanupFields()
{
parser.Clean(inputLineItem.nameValue);
inputLineItem.nameValue = GetCleanupUpValueFromParser();
}
private string GetCleanupFieldValue()
{
//code to extract cleanup up value from parses
}
public void CleanupAndParseInput()
{
CleanupFields();
ParseInput();
}
public void ParseInput()
{
try
{
parser.Parse(InputLineItem.NameValue);
}
catch (Exception e)
{
}
try
{
lock (fileUtils)
{
WriteOutputToFile(inputLineItem);
}
}
catch (Exception e)
{
Console.WriteLine("(414) Failed to write to output: " + e);
}
}
public void WriteOutputToFile(InputLineItem inputLineItem)
{
//writes updated value to output file
}
}
The error I get is when trying to run the Parse function, I get this message:
An unhandled exception of type 'System.AccessViolationException' occurred in GenParse.NET.dll
Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
That being said, I feel like there's a whole lot more that I'm doing wrong here aside from what is causing that error.
I also have further questions:
Do I create multiple parse objects and iteratively feed them to each thread as I'm attempting to do, or should I use one Parse object that gets shared or cloned across each thread?
If, outside the thread, I change a value in the object that I'm passing to the thread, will that change reflect in the object passed to the thread? i.e, is the object passed by value or reference?
Is there a more efficient way for each record to be assigned to a thread and its parse object than I am currently doing with the objectIndex iterator?
THANKS!
Do I create multiple parse objects and iteratively feed them to each thread as I'm attempting to do, or should I use one Parse object that gets shared or cloned across each thread?
You initialize each thread with new ThreadStart(parseObjects[0].CleanupAndParseInput) so all threads will share the same parse object. It is a fairly safe bet that the parse objects are not threadsafe. So each thread should have a separate object. Note that this might not be sufficient, if the parse library uses any global fields it might be non-threadsafe even when using separate objects.
If, outside the thread, I change a value in the object that I'm passing to the thread, will that change reflect in the object passed to the thread? i.e, is the object passed by value or reference?
Objects (i.e. classes) are passed by reference. But any changes to an object are not guaranteed to be visible in other threads unless a memoryBarrier is issued. Most synchronization code (like lock) will issue memory barriers. Keep in mind that any non-atomic operation is unsafe if a field is written an read concurrently.
Is there a more efficient way for each record to be assigned to a thread and its parse object than I am currently doing with the objectIndex iterator?
Using manual threads in this way is very old-school. The modern, easier, and probably faster way is to use a parallel-for loop. This will try to be smart about how many threads it will use and try to adapt chunk sizes to keep the synchronization overhead low.
var items = new List<int>();
ParseObject LocalInit()
{
// Do initalization, This is run once for each thread used
return new ParseObject();
}
ParseObject ThreadMain(int value, ParallelLoopState state, ParseObject threadLocalObject)
{
// Do whatever you need to do
// This is run on multiple threads
return threadLocalObject;
}
void LocalFinally(ParseObject obj)
{
// Do Cleanup for each thread
}
Parallel.ForEach(items, LocalInit, ThreadMain, LocalFinally);
As a final note, I would advice against using multithreading unless you are familiar with the potential dangers and pitfalls it involves, at least for any project where the result is important. There are many ways to screw up and make a program that will work 99.9% of the time, and silently corrupt data the remaining 0.1% of the time.
Will parallelism help with performance for a locked object, should it be run single threaded, or is there another technique?
I noticed that when accessing a dataset and adding rows from multiple threads exceptions were thrown. Therefore I created a "thread-safe" version to add rows by locking the table prior to updating the row. This implementation works but is appears slow with many transactions.
public partial class HaMmeRffl
{
public partial class PlayerStatsDataTable
{
public void AddPlayerStatsRow(int PlayerID, int Year, int StatEnum, int Value, DateTime Timestamp)
{
lock (TeamMemberData.Dataset.PlayerStats)
{
HaMmeRffl.PlayerStatsRow testrow = TeamMemberData.Dataset.PlayerStats.FindByPlayerIDYearStatEnum(PlayerID, Year, StatEnum);
if (testrow == null)
{
HaMmeRffl.PlayerStatsRow newRow = TeamMemberData.Dataset.PlayerStats.NewPlayerStatsRow();
newRow.PlayerID = PlayerID;
newRow.Year = Year;
newRow.StatEnum = StatEnum;
newRow.Value = Value;
newRow.Timestamp = Timestamp;
TeamMemberData.Dataset.PlayerStats.AddPlayerStatsRow(newRow);
}
else
{
testrow.Value = Value;
testrow.Timestamp = Timestamp;
}
}
}
}
}
Now I can call this safely from multiple threads, but does it actually buy me anything? Can I do this differently for better performance. For instance is there any way to use System.Collections.Concurrent namespace to optimize performance or any other methods?
In addition, I update the underlying database after the entire dataset is updated and that takes a very long time. Would that be considered an I/O operation and be worth using parallel processing by updating it after each row is updated in the dataset (or some number of rows).
UPDATE
I wrote some code to test concurrent vs sequential processing which shows it takes about 30% longer to do concurrent processing and I should use sequential processing here. I assume this is because the lock on the database is causing the overhead on the ConcurrentQueue to be more costly than the gains from parallel processing. Is this conclusion correct and is there anything that I can do to speed up the processing, or am I stuck as for a Datatable "You must synchronize any write operations".
Here is my test code which might not be scientifically correct. Here is the timer and calls between them.
dbTimer.Restart();
Queue<HaMmeRffl.PlayersRow.PlayerValue> addPlayerRow = InsertToPlayerQ(addUpdatePlayers);
Queue<HaMmeRffl.PlayerStatsRow.PlayerStatValue> addPlayerStatRow = InsertToPlayerStatQ(addUpdatePlayers);
UpdatePlayerStatsInDB(addPlayerRow, addPlayerStatRow);
dbTimer.Stop();
System.Diagnostics.Debug.Print("Writing to the dataset took {0} seconds single threaded", dbTimer.Elapsed.TotalSeconds);
dbTimer.Restart();
ConcurrentQueue<HaMmeRffl.PlayersRow.PlayerValue> addPlayerRows = InsertToPlayerQueue(addUpdatePlayers);
ConcurrentQueue<HaMmeRffl.PlayerStatsRow.PlayerStatValue> addPlayerStatRows = InsertToPlayerStatQueue(addUpdatePlayers);
UpdatePlayerStatsInDB(addPlayerRows, addPlayerStatRows);
dbTimer.Stop();
System.Diagnostics.Debug.Print("Writing to the dataset took {0} seconds concurrently", dbTimer.Elapsed.TotalSeconds);
In both examples I add to the Queue and ConcurrentQueue in an identical manner single threaded. The only difference is the insertion into the datatable. The single-threaded approach inserts as follows:
private static void UpdatePlayerStatsInDB(Queue<HaMmeRffl.PlayersRow.PlayerValue> addPlayerRows, Queue<HaMmeRffl.PlayerStatsRow.PlayerStatValue> addPlayerStatRows)
{
try
{
HaMmeRffl.PlayersRow.PlayerValue row;
while (addPlayerRows.Count > 0)
{
row = addPlayerRows.Dequeue();
TeamMemberData.Dataset.Players.AddPlayersRow(
row.PlayerID, row.Name, row.PosEnum, row.DepthEnum,
row.TeamID, row.RosterTimestamp, row.DepthTimestamp,
row.Active, row.NewsUpdate);
}
}
catch (Exception)
{
TeamMemberData.Dataset.Players.RejectChanges();
}
try
{
HaMmeRffl.PlayerStatsRow.PlayerStatValue row;
while (addPlayerStatRows.Count > 0)
{
row = addPlayerStatRows.Dequeue();
TeamMemberData.Dataset.PlayerStats.AddUpdatePlayerStatsRow(
row.PlayerID, row.Year, row.StatEnum, row.Value, row.Timestamp);
}
}
catch (Exception)
{
TeamMemberData.Dataset.PlayerStats.RejectChanges();
}
TeamMemberData.Dataset.Players.AcceptChanges();
TeamMemberData.Dataset.PlayerStats.AcceptChanges();
}
The concurrent adds as follows
private static void UpdatePlayerStatsInDB(ConcurrentQueue<HaMmeRffl.PlayersRow.PlayerValue> addPlayerRows, ConcurrentQueue<HaMmeRffl.PlayerStatsRow.PlayerStatValue> addPlayerStatRows)
{
Action actionPlayer = () =>
{
HaMmeRffl.PlayersRow.PlayerValue row;
while (addPlayerRows.TryDequeue(out row))
{
TeamMemberData.Dataset.Players.AddPlayersRow(
row.PlayerID, row.Name, row.PosEnum, row.DepthEnum,
row.TeamID, row.RosterTimestamp, row.DepthTimestamp,
row.Active, row.NewsUpdate);
}
};
Action actionPlayerStat = () =>
{
HaMmeRffl.PlayerStatsRow.PlayerStatValue row;
while (addPlayerStatRows.TryDequeue(out row))
{
TeamMemberData.Dataset.PlayerStats.AddUpdatePlayerStatsRow(
row.PlayerID, row.Year, row.StatEnum, row.Value, row.Timestamp);
}
};
Action[] actions = new Action[Environment.ProcessorCount * 2];
for (int i = 0; i < Environment.ProcessorCount; i++)
{
actions[i * 2] = actionPlayer;
actions[i * 2 + 1] = actionPlayerStat;
}
try
{
// Start ProcessorCount concurrent consuming actions.
Parallel.Invoke(actions);
}
catch (Exception)
{
TeamMemberData.Dataset.Players.RejectChanges();
TeamMemberData.Dataset.PlayerStats.RejectChanges();
}
TeamMemberData.Dataset.Players.AcceptChanges();
TeamMemberData.Dataset.PlayerStats.AcceptChanges();
}
The difference in time is 4.6 seconds for the single-threaded and 6.1 for the parallel.Invoke.
Lock & transactions are not good for parallelism and performance.
1)Try avoid lock:Will different threads need to update the same Row in dataset?
2)minimize lock time.
For db operation use may try Batch Update future of ADO.NET: http://msdn.microsoft.com/en-us/library/ms810297.aspx
Multithreading can help upto an extent because once the data across your app boundary , you will start waiting for I/O , here you can do asynchronous processing because your app does not have control over various parameters ( Resource access , Network speed etc),this will give better user experience (If UI app).
Now for your scenario , you may want to use some sort of producer/consumer queue , as soon as a row is available in queue , a different thread start processing it but again this will work upto an extent.
I have a multithread application, compare CompareRow in the two List o1 and o2, and get the similarity, then store o1.CompareRow,o2.CompareRow, and Similarity in List, because the getSimilarity Process is very time consuming, and the data usually more than 1000, so using multithread should be helping, but in fact, it is not, pls
help point out, several things i already consider are
1. Database shouldnot be a problem, cause i already load the data into
two List<>
2. There is no shared writable data to complete
3. the order of the records is not a problem
pls help and it is urgent, the deadline is close....
public class OriginalRecord
{
public int PrimaryKey;
public string CompareRow;
}
===============================================
public class Record
{
// public ManualResetEvent _doneEvent;
public string r1 { get; set; }
public string r2 { get; set; }
public float similarity { get; set; }
public CountdownEvent CountDown;
public Record(string r1, string r2, ref CountdownEvent _countdown)
{
this.r1 = r1;
this.r2 = r2;
//similarity = GetSimilarity(r1, r2);
CountDown = _countdown;
}
public void ThreadPoolCallback(Object threadContext)
{
int threadIndex = (int)threadContext;
similarity = GetSimilarity(r1, r2);
CountDown.Signal();
}
private float GetSimilarity(object obj1, object obj2)
{
//Very time-consuming
ComparisionLibrary.MatchsMaker match
= new ComparisionLibrary.MatchsMaker (obj1.ToString(), obj2.ToString());
return match.Score;
}
}
================================================================
public partial class FormMain : Form
{
public FormMain()
{
InitializeComponent();
List<OriginalRecord> oList1=... //get loaded from database
List<OriginalRecord> oList2 =... //get loaded from database
int recordNum = oList1.Count * oList2.Count;
CountdownEvent _countdown = new CountdownEvent(recordNum);
Record[] rArray = new Record[recordNum];
int num = 0;
for (int i = 0; i <oList1.Count; i++)
{
for (int j = 0; j < oList2.Count; j++)
{
//Create a record instance
Record r
=new Record(oList1[i].CompareRow,oList2[j].CompareRow,ref _countdown);
rArray[num]=r;
//here use threadpool
ThreadPool.QueueUserWorkItem(r.ThreadPoolCallback, num);
num++;
}
}
_countdown.Wait();
List<Record> rList = rArray.ToList();
PopulateGridView(rList);
}
Here are the photos i capture in the debug mode
two things bring to my attention are
1. there are only 4 threads created to work but i set the minthreads in the threadpool is 10
2. as you can see, even 4 threads are created , but only one thread working at any time,
what is worse is sometimes none of the thread is working
by the way, the ComparisionLibrary is the library i download to do the heavy work
i cant post photo, would you pls leave me an email or sth that i can send the photos to you,thanks.
If you want to split your large task into several small tasks, do mind that parallelization only works if the small tasks are not too short in terms of run time. If you cut your 2 sec runtime job into 100.000 0.02 ms jobs, the overhead of distributing the jobs over the workers can be so great that the process runs much slower in parallel then it would normally do. Only if the overhead of parallelization is much smaller than the average runtime of one of the small tasks, you will see a performance gain. Try to cut up your problem in larger chunks.
Hi, my assumptions:
1) If your computer has 1 CPU than you won't gain any performance improvement. Indeed, you will lose in the performance, because of Context Switch.
2) Try to set a max threads value of the ThreadPool class.
ThreadPool.SetMaxThreads(x);
3) Try to use PLINQ.
As a guess, try to use TPL instead of plain ThreadPool, e.g.:
Replace
ThreadPool.QueueUserWorkItem(r.ThreadPoolCallback, num);
by such thing:
int nums = new int[1];
nums[0] = num;
Task t = null;
t = new Task(() =>
{
r.ThreadPoolCallback(nums[0]);
});
t.Start();
EDIT
You're trying to compare two list1, list2, each item in separate thread from pool, which is list1.Count*list2.Count thread pool calls, which looks for me like to much paralelization(this could cause many context switches) .
Try to split all comparison tasks onto several groups(e.g. 4-8) and start each one in separate thread. Maybe this could help. But again, this is only an assumption.
I'm building a multithreaded app in .net.
I have a thread that listens to a connection (abstract, serial, tcp...).
When it receives a new message, it adds it to via AddMessage. Which then call startSpool. startSpool checks to see if the spool is already running and if it is, returns, otherwise, starts it in a new thread. The reason for this is, the messages HAVE to be processed serially, FIFO.
So, my questions are...
Am I going about this the right way?
Are there better, faster, cheaper patterns out there?
My apologies if there is a typo in my code, I was having problems copying and pasting.
ConcurrentQueue<IMyMessage > messages = new ConcurrentQueue<IMyMessage>();
const int maxSpoolInstances = 1;
object lcurrentSpoolInstances;
int currentSpoolInstances = 0;
Thread spoolThread;
public void AddMessage(IMyMessage message)
{
this.messages.Add(message);
this.startSpool();
}
private void startSpool()
{
bool run = false;
lock (lcurrentSpoolInstances)
{
if (currentSpoolInstances <= maxSpoolInstances)
{
this.currentSpoolInstances++;
run = true;
}
else
{
return;
}
}
if (run)
{
this.spoolThread = new Thread(new ThreadStart(spool));
this.spoolThread.Start();
}
}
private void spool()
{
Message.ITimingMessage message;
while (this.messages.Count > 0)
{
// TODO: Is this below line necessary or does the TryDequeue cover this?
message = null;
this.messages.TryDequeue(out message);
if (message != null)
{
// My long running thing that does something with this message.
}
}
lock (lcurrentSpoolInstances)
{
this.currentSpoolInstances--;
}
}
This would be easier using BlockingCollection<T> instead of ConcurrentQueue<T>.
Something like this should work:
class MessageProcessor : IDisposable
{
BlockingCollection<IMyMessage> messages = new BlockingCollection<IMyMessage>();
public MessageProcessor()
{
// Move this to constructor to prevent race condition in existing code (you could start multiple threads...
Task.Factory.StartNew(this.spool, TaskCreationOptions.LongRunning);
}
public void AddMessage(IMyMessage message)
{
this.messages.Add(message);
}
private void Spool()
{
foreach(IMyMessage message in this.messages.GetConsumingEnumerable())
{
// long running thing that does something with this message.
}
}
public void FinishProcessing()
{
// This will tell the spooling you're done adding, so it shuts down
this.messages.CompleteAdding();
}
void IDisposable.Dispose()
{
this.FinishProcessing();
}
}
Edit: If you wanted to support multiple consumers, you could handle that via a separate constructor. I'd refactor this to:
public MessageProcessor(int numberOfConsumers = 1)
{
for (int i=0;i<numberOfConsumers;++i)
StartConsumer();
}
private void StartConsumer()
{
// Move this to constructor to prevent race condition in existing code (you could start multiple threads...
Task.Factory.StartNew(this.spool, TaskCreationOptions.LongRunning);
}
This would allow you to start any number of consumers. Note that this breaks the rule of having it be strictly FIFO - the processing will potentially process "numberOfConsumer" elements in blocks with this change.
Multiple producers are already supported. The above is thread safe, so any number of threads can call Add(message) in parallel, with no changes.
I think that Reed's answer is the best way to go, but for the sake of academics, here is an example using the concurrent queue -- you had some races in the code that you posted (depending upon how you handle incrementing currnetSpoolInstances)
The changes I made (below) were:
Switched to a Task instead of a Thread (uses thread pool instead of incurring the cost of creating a new thread)
added the code to increment/decrement your spool instance count
changed the "if currentSpoolInstances <= max ... to just < to avoid having one too many workers (probably just a typo)
changed the way that empty queues were handled to avoid a race: I think you had a race, where your while loop could have tested false, (you thread begins to exit), but at that moment, a new item is added (so your spool thread is exiting, but your spool count > 0, so your queue stalls).
private ConcurrentQueue<IMyMessage> messages = new ConcurrentQueue<IMyMessage>();
const int maxSpoolInstances = 1;
object lcurrentSpoolInstances = new object();
int currentSpoolInstances = 0;
public void AddMessage(IMyMessage message)
{
this.messages.Enqueue(message);
this.startSpool();
}
private void startSpool()
{
lock (lcurrentSpoolInstances)
{
if (currentSpoolInstances < maxSpoolInstances)
{
this.currentSpoolInstances++;
Task.Factory.StartNew(spool, TaskCreationOptions.LongRunning);
}
}
}
private void spool()
{
IMyMessage message;
while (true)
{
// you do not need to null message because it is an "out" parameter, had it been a "ref" parameter, you would want to null it.
if(this.messages.TryDequeue(out message))
{
// My long running thing that does something with this message.
}
else
{
lock (lcurrentSpoolInstances)
{
if (this.messages.IsEmpty)
{
this.currentSpoolInstances--;
return;
}
}
}
}
}
Check 'Pipelines pattern': http://msdn.microsoft.com/en-us/library/ff963548.aspx
Use BlockingCollection for the 'buffers'.
Each Processor (e.g. ReadStrings, CorrectCase, ..), should run in a Task.
HTH..