Using C# / .NET 3.5.
Currently I'm populating 2 DataTables one after the other using SqlDataAdapter.Fill().
I want to populate both of these DataTables in parallel, at the same time, by doing each one asynchronously. However, there is no asynchronous version of the Fill() method; something like a BeginFill() would be great!
One approach I've tried is (pseudo):
1. SqlCommand1.BeginExecuteReader // 1st query, for DataTable1
2. SqlCommand2.BeginExecuteReader // 2nd query, for DataTable2
3. SqlCommand1.EndExecuteReader
4. SqlCommand2.EndExecuteReader
5. DataTable1.Load(DataReader1)
6. DataTable2.Load(DataReader2)
However, DataTable.Load() takes a long time:
It takes 3 seconds to do steps 1 to 4.
Step 5 then takes 22 seconds.
Step 6 takes 17 seconds.
So steps 5 and 6 take 39 seconds combined.
The end result is that this gives me no benefit over just doing two SqlDataAdapter.Fill() calls one after the other. I want the whole process to take only as long as the longest query (or as close to that as possible).
I'm looking for recommended ways to end up with a truly asynchronous approach to filling a DataTable.
Or do I just manage it myself and roll 2 separate threads, each one filling a DataTable?
I would suggest having a separate worker thread for each. You could use ThreadPool.QueueUserWorkItem.
List<AutoResetEvent> events = new List<AutoResetEvent>();

AutoResetEvent loadTable1 = new AutoResetEvent(false);
events.Add(loadTable1);
ThreadPool.QueueUserWorkItem(delegate
{
    // Each command needs its own open SqlConnection to run concurrently
    using (SqlDataReader dataReader1 = SqlCommand1.ExecuteReader())
    {
        DataTable1.Load(dataReader1); // query and Load both run on this worker thread
    }
    loadTable1.Set();
});

AutoResetEvent loadTable2 = new AutoResetEvent(false);
events.Add(loadTable2);
ThreadPool.QueueUserWorkItem(delegate
{
    using (SqlDataReader dataReader2 = SqlCommand2.ExecuteReader())
    {
        DataTable2.Load(dataReader2);
    }
    loadTable2.Set();
});

// wait until both tables have loaded
// (WaitAll with several handles is not supported on an STA/UI thread)
WaitHandle.WaitAll(events.ToArray());
This is because the DataTable has a lot of objects to create (rows, values). You should have both the execution of the command and the population of the DataTable done on a separate thread, and wait for each operation to finish before you continue.
The following code was originally written in Notepad, so treat it as a sketch, but hopefully you get the idea...
// Parameter object: the command to run, the table to fill and the event to signal when done
class DataWorkerParameters
{
    public SqlCommand Command { get; set; }
    public DataTable Table { get; set; }
    public AutoResetEvent AutoResetEvent { get; set; }
}

void DoWork()
{
    var params1 = new DataWorkerParameters
    {
        Command = GetCommand1(), // each command on its own SqlConnection
        Table = new DataTable(),
        AutoResetEvent = new AutoResetEvent(false)
    };
    var params2 = new DataWorkerParameters
    {
        Command = GetCommand2(),
        Table = new DataTable(),
        AutoResetEvent = new AutoResetEvent(false)
    };

    // The same worker is queued once per parameter object
    WaitCallback worker = state =>
    {
        var input = (DataWorkerParameters)state;
        PopulateTable(input);
        input.AutoResetEvent.Set(); // the caller waits on these with WaitHandle.WaitAll()
    };
    ThreadPool.QueueUserWorkItem(worker, params1);
    ThreadPool.QueueUserWorkItem(worker, params2);

    // Block until both tables are populated
    WaitHandle.WaitAll(new WaitHandle[] { params1.AutoResetEvent, params2.AutoResetEvent });
}

void PopulateTable(DataWorkerParameters parameters)
{
    using (SqlDataReader reader = parameters.Command.ExecuteReader())
    {
        parameters.Table.Load(reader);
    }
}
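To check that the two loads really overlap, here is a minimal sketch that runs without a database. It assumes a DataTableReader over an in-memory table is an acceptable stand-in for the SqlDataReader returned by each query; BuildSource and LoadInParallel are hypothetical helper names, not part of ADO.NET.

```csharp
using System;
using System.Data;
using System.Threading;

public static class ParallelTableLoader
{
    // Hypothetical stand-in for one query's result set: an in-memory table.
    public static DataTable BuildSource(string name, int rows)
    {
        var table = new DataTable(name);
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Value", typeof(string));
        for (int i = 0; i < rows; i++)
        {
            table.Rows.Add(i, name + "-" + i);
        }
        return table;
    }

    // Loads one DataTable per reader on thread-pool threads and blocks until every
    // load has finished, so the total time is roughly that of the slowest load.
    public static DataTable[] LoadInParallel(params IDataReader[] readers)
    {
        var targets = new DataTable[readers.Length];
        var errors = new Exception[readers.Length];
        var done = new ManualResetEvent[readers.Length];

        for (int i = 0; i < readers.Length; i++)
        {
            int index = i; // copy the loop variable for the closure
            targets[index] = new DataTable();
            done[index] = new ManualResetEvent(false);
            ThreadPool.QueueUserWorkItem(_ =>
            {
                try
                {
                    targets[index].Load(readers[index]); // creating the rows is the slow part
                }
                catch (Exception ex)
                {
                    errors[index] = ex; // rethrown on the calling thread below
                }
                finally
                {
                    readers[index].Dispose();
                    done[index].Set();
                }
            });
        }

        // WaitAll with several handles is not supported on an STA (UI) thread.
        WaitHandle.WaitAll(done);
        foreach (var handle in done)
        {
            handle.Dispose();
        }
        foreach (var error in errors)
        {
            if (error != null)
            {
                throw new InvalidOperationException("Loading a table failed.", error);
            }
        }
        return targets;
    }
}
```

Here the readers are created by the caller; with real SqlCommands it is better to call ExecuteReader inside the worker too, as the answers above do, so the queries themselves also run concurrently.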
Related
I want to increase the performance of a procedure which invokes a web service multiple times sequentially and store the result in a list.
Because a single call to the WS lasts about 1 second and I need to make something like 300 calls, doing the job sequentially takes 300 seconds. That's why I changed the procedure to use multithreading, with the following piece of code:
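List<WCFResult> resultList = new List<WCFResult>();
List<Task> tasks = new List<Task>();
using (var ws = new WCFService(binding, endpoint))
{
    foreach (var singleElement in listOfelements)
    {
        var element = singleElement; // copy for the closure (needed before C# 5)
        Action action = () =>
        {
            var singleResult = ws.Call(element);
            lock (resultList) // List<T> is not thread-safe
            {
                resultList.Add(singleResult);
            }
        };
        tasks.Add(Task.Factory.StartNew(action, TaskCreationOptions.LongRunning));
    }
    // Wait inside the using block so ws is not disposed while calls are still running
    Task.WaitAll(tasks.ToArray());
}
//Do other stuff with the resultList...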
With this code I only save about 0.1 seconds per element, which is less than I expected. Do you know any further optimization I can make, or can you suggest an alternative?
With the following code, all the requests are handled in half the time:
ParallelOptions ops = new ParallelOptions();
ops.MaxDegreeOfParallelism = 16;
ConcurrentBag<WCFResult> resultList = new ConcurrentBag<WCFResult>();
Parallel.ForEach(allItems, ops, item =>
{
    // One client per call, so no client instance is shared between threads
    var ws = new WCFClient(binding, endpoint);
    WCFResult result = ws.Call(item);
    ws.Close();
    resultList.Add(result); // ConcurrentBag<T>.Add is thread-safe
});
//Do other stuff with the resultList...
Mission accomplished. I also changed the result list to a ConcurrentBag instead of a List, since List<T> is not safe for concurrent Add calls.
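The effect of MaxDegreeOfParallelism can be seen without a WCF service. The sketch below assumes a hypothetical FakeServiceCall that just sleeps to imitate a blocking call; CallAll and the peak-concurrency counter are illustrative names, not part of the original code.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelCalls
{
    // Hypothetical stand-in for ws.Call(item): a blocking call taking about delayMs.
    public static string FakeServiceCall(int item, int delayMs)
    {
        Thread.Sleep(delayMs);
        return "result-" + item;
    }

    // Runs one blocking call per item, at most maxParallel at a time, and collects
    // the results in a thread-safe ConcurrentBag (which does not keep input order).
    public static List<string> CallAll(IEnumerable<int> items, int maxParallel, int delayMs, out int peakConcurrency)
    {
        var results = new ConcurrentBag<string>();
        int running = 0;
        int peak = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        Parallel.ForEach(items, options, item =>
        {
            int now = Interlocked.Increment(ref running);
            // Record the highest number of calls seen in flight at once.
            int seen;
            while (now > (seen = Volatile.Read(ref peak)) &&
                   Interlocked.CompareExchange(ref peak, now, seen) != seen)
            {
                // another thread updated peak first; re-read and retry
            }
            results.Add(FakeServiceCall(item, delayMs));
            Interlocked.Decrement(ref running);
        });

        peakConcurrency = peak;
        return new List<string>(results);
    }
}
```

Because ConcurrentBag does not preserve order, sort the results afterwards if the order of the input items matters.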
I am writing a script in C# for Unity to read messages from multiple sources. I have a function ReadMessage which takes in a port string and returns a string ID number. The issue is, I have a dozen different connections I need to read from and the messages have a 10ms timeout timer before it stops trying and lets the code continue. This causes a reduction in frame rate when I have a dozen threads waiting a few ms for the previous one to finish Joining. My thread code is as follows:
string threadOneString= null;
Thread ThreadOne;
//Repeat for 11 more threads
void Update () {
ThreadOne = new Thread(
() =>
{
threadOneString = ReadMessage(someClass.port);
});
ThreadOne.Start();
//Repeat for 11 more threads
}
void LateUpdate () {
ThreadOne.Join();
//Repeat for 11 more threads
UpdateClass(threadOneString);
//Repeat for 11 more threads
UpdateTextDisplay(); //Just updates a Unity Text object
}
And my ReadMessage code in case it matters.
private string ReadMessage(string port) //Change this name to ReadButtonPress
{
string fullMessage = "";
someStruct parsedMessage;
var timeout = new System.TimeSpan(0, 0, 0, 0, 10); // Less than 10 and it starts to miss most messages, ideally this would be a bit higher
string connectionString = PROTOCOL + CONTROLLER_IP + ":" + port;
AsyncIO.ForceDotNet.Force();
using (var subSocket = new SubscriberSocket())
{
subSocket.Connect(connectionString);
subSocket.Subscribe("");
subSocket.TryReceiveFrameString(timeout, out fullMessage);
UnityEngine.Debug.Log("Message: " + fullMessage);
subSocket.Close();
}
// Some message parsing and checking...
return parsedMessage.someString;
}
What I'd like to do, but I don't know if it's possible, is to call the Joins for each thread at the same time instead of calling one, waiting, then calling the next. The threads don't interact with each other, so I'm hoping this is possible. If not, I'd greatly appreciate another solution or suggestion.
EDIT: Clarification on what's happening. When I only run a few threads I get 65-70 FPS. When I run all 12 threads my FPS drops to ~50, and I have a hard requirement of 60 FPS.
Thanks to some feedback here I am no longer creating a socket every update. Instead I create the sockets in Start() and just read what I need from a public function in the class. I based my code on https://stackoverflow.com/a/14797475/8635796
Don't do what I did and recreate the exact same threads and sockets every Update(). In hindsight, it was a really poor choice.
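A minimal sketch of the create-once idea, assuming one long-lived background thread per port. The Func<string, string> passed in stands in for the blocking, timeout-limited read (for example a SubscriberSocket created once and reused); PortReader and Latest are hypothetical names, and nothing here uses Unity APIs.

```csharp
using System;
using System.Threading;

// One long-lived background reader per port. The read delegate stands in
// for a blocking socket receive with a timeout (hypothetical here).
public sealed class PortReader : IDisposable
{
    private readonly Func<string, string> _read;
    private readonly string _port;
    private readonly Thread _thread;
    private volatile bool _stopping;
    private volatile string _latest; // last message received, null until the first one

    public PortReader(string port, Func<string, string> read)
    {
        _port = port;
        _read = read;
        _thread = new Thread(Loop) { IsBackground = true, Name = "reader-" + port };
        _thread.Start();
    }

    // Safe to call every frame: never blocks, just returns the most recent value.
    public string Latest { get { return _latest; } }

    private void Loop()
    {
        while (!_stopping)
        {
            string message = _read(_port); // may return null on timeout
            if (message != null)
            {
                _latest = message;
            }
        }
    }

    public void Dispose()
    {
        _stopping = true;
        _thread.Join(); // returns once the current read finishes or times out
    }
}
```

In Unity you would create the readers in Start(), read Latest in Update() without blocking, and call Dispose() in OnDestroy().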
Use tasks instead, and put them into a list.
List<Task> myTasks;
Task allUpdateTasks;
...
void Update () {
    // Start a new list each frame; otherwise it keeps growing with finished tasks
    myTasks = new List<Task>();
    myTasks.Add(
        Task.Factory.StartNew(() =>
        {
            threadOneString = ReadMessage(someClass.port);
        }));
    //Repeat for 11 more tasks

    // Create a single task that completes when all the tasks above have completed
    // (Task.WhenAll needs .NET 4.5 or later)
    allUpdateTasks = Task.WhenAll(myTasks);
}

void LateUpdate() {
    allUpdateTasks.Wait();
    ...
}
Define a ManualResetEvent for each thread and signal it when you have the result; you can then wait for up to 64 wait handles at once using WaitHandle.WaitAll.
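A sketch of that suggestion, assuming a hypothetical read delegate in place of ReadMessage: each worker sets its own ManualResetEvent, and one WaitHandle.WaitAll call replaces the twelve sequential Join() calls. WaitAll accepts at most 64 handles, and waiting on more than one handle is not supported on an STA thread.

```csharp
using System;
using System.Threading;

public static class WaitForAllReads
{
    // One worker per port; each worker stores its result and sets its own
    // ManualResetEvent. A single WaitHandle.WaitAll call then waits for the
    // whole batch instead of joining threads one at a time.
    public static string[] ReadAll(string[] ports, Func<string, string> read)
    {
        var results = new string[ports.Length];
        var done = new ManualResetEvent[ports.Length];
        for (int i = 0; i < ports.Length; i++)
        {
            int index = i; // copy the loop variable for the closure
            done[index] = new ManualResetEvent(false);
            ThreadPool.QueueUserWorkItem(_ =>
            {
                try
                {
                    results[index] = read(ports[index]);
                }
                catch (Exception)
                {
                    results[index] = null; // treat a failed read like a timeout
                }
                finally
                {
                    done[index].Set();
                }
            });
        }

        WaitHandle.WaitAll(done); // at most 64 handles per call
        foreach (var handle in done)
        {
            handle.Dispose();
        }
        return results;
    }
}
```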
I have a producer/consumer scenario. The producer never stops, which means that even if there is a time when there are no items in the BlockingCollection (BC), more items can be added later.
Moving from .NET Framework 3.5 to 4.0, I decided to use a BlockingCollection as a concurrent queue between the consumer and the producer. I even added some parallel extensions so I could use the BC with a Parallel.ForEach.
The problem is that, in the consumer thread, I need a kind of hybrid model:
I'm always checking the BC to process any item that arrives, with a
Parallel.ForEach(bc.GetConsumingEnumerable(), item => etc
Inside this foreach, I execute all the tasks that don't depend on each other.
Here comes the problem. After parallelizing the previous tasks, I need to handle their results in the same FIFO order in which the items were in the BC. The processing of these results should happen synchronously, on a single thread.
A little example in pseudo code follows:
producer:
//This event is triggered each time a page is scanned. Any batch of new pages can be added at any time at the scanner
private void Current_OnPageScanned(object sender, ScannedPage scannedPage)
{
//The object to add has a property with the sequence number
_concurrentCollection.TryAdd(scannedPage);
}
consumer:
private void Init()
{
_cancelTasks = false;
_checkTask = Task.Factory.StartNew(() =>
{
while (!_cancelTasks)
{
//BlockingCollections with Parallel ForEach
var bc = _concurrentCollection;
Parallel.ForEach(bc.GetConsumingEnumerable(), item =>
{
ScannedPage currentPage = item;
// process a batch of images from the bc and check if an image has a valid barcode. T
});
//Here should go the code that takes the results from each tasks, process them in the same FIFO order in which they entered the BC and save each image to a file, all of this in this same thread.
}
});
}
Obviously, this can't work as it is, because .GetConsumingEnumerable() blocks until there is another item in the BC. I assume I could do it with tasks and just fire 4 or 5 tasks in the same batch, but:
How could I do this with tasks and still have a waiting point before the start of the tasks that blocks until there is an item to be consumed in the BC? (I don't want to start processing if there is nothing. Once there is something in the BC I would start the batch of 4 tasks and use a TryTake inside each one, so that if there is nothing to take they don't block, because the number of items in the BC won't always match the number of tasks in the batch; for example, just one item left in the BC and a batch of 4 tasks.)
How could I do this and take advantage of the efficiency that Parallel.For offers?
How could I save the results of the tasks in the same FIFO order in which the items were extracted from the BC?
Is there any other concurrency class more suited to this kind of hybrid processing of items in the consumer?
Also, this is my first question ever on StackOverflow, so if you need more data or you think my question is not correct, just let me know.
I think I follow what you're asking. Why not create a ConcurrentBag and add to it while processing, like this:
while (!_cancelTasks)
{
//BlockingCollections with Parallel ForEach
var bc = _concurrentCollection;
var q = new ConcurrentBag<ScannedPage>();
Parallel.ForEach(bc.GetConsumingEnumerable(), item =>
{
ScannedPage currentPage = item;
q.Add(item);
// process a batch of images from the bc and check if an image has a valid barcode. T
});
//Here should go the code that takes the results from each tasks, process them in the same FIFO order in which they entered the BC and save each image to a file, all of this in this same thread.
//process items in your list here by sorting using some sequence key
var items = q.OrderBy( o=> o.SeqNbr).ToList();
foreach( var item in items){
...
}
}
This obviously doesn't enqueue them in the exact order they were added to the BC but you could add some sequence nbr to the ScannedPage object like Alex suggested and then sort the results after.
Here's how I'd handle the sequence:
Add this to the ScannedPage class:
public static int _counter; //public because this is just an example but it would work.
Get a sequence number and assign it here:
private void Current_OnPageScanned(object sender, ScannedPage scannedPage)
{
    // Interlocked.Increment returns the incremented value atomically, so every page
    // gets a unique number without a lock, even if the event fires on several threads
    scannedPage.SeqNbr = System.Threading.Interlocked.Increment(ref ScannedPage._counter);
    ...
}
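Putting the two pieces together: a sketch, assuming ScannedPage exposes a writable SeqNbr (as the answer suggests) and using a placeholder HasBarcode check in place of real barcode detection. The pages are stamped with Interlocked.Increment, processed in parallel in any order, and then sorted back into FIFO order.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Minimal stand-in for the question's ScannedPage (only the fields used here).
public class ScannedPage
{
    public static int _counter;
    public int SeqNbr;
    public string Name;
    public bool HasBarcode; // hypothetical result of the parallel processing step
}

public static class PageSequencing
{
    // Producer side: Interlocked.Increment returns the new value atomically,
    // so concurrent callers never receive the same sequence number.
    public static void Stamp(ScannedPage page)
    {
        page.SeqNbr = Interlocked.Increment(ref ScannedPage._counter);
    }

    // Consumer side: process pages in any order, then restore FIFO order by SeqNbr.
    public static List<ScannedPage> ProcessBatch(IEnumerable<ScannedPage> batch)
    {
        var done = new ConcurrentBag<ScannedPage>();
        Parallel.ForEach(batch, page =>
        {
            page.HasBarcode = page.Name.EndsWith("0"); // placeholder for barcode detection
            done.Add(page);
        });
        return done.OrderBy(p => p.SeqNbr).ToList();
    }
}
```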
Whenever you need the results of a parallel operation, using PLINQ is generally more convenient than using the Parallel class. Here is how you could refactor your code using PLINQ:
private void Init()
{
_cancelTasks = new CancellationTokenSource();
_checkTask = Task.Run(() =>
{
while (true)
{
_cancelTasks.Token.ThrowIfCancellationRequested();
var bc = _concurrentCollection;
var partitioner = Partitioner.Create(
bc.GetConsumingEnumerable(_cancelTasks.Token),
EnumerablePartitionerOptions.NoBuffering);
ScannedPage[] results = partitioner
.AsParallel()
.AsOrdered()
.Select(scannedPage =>
{
// Process the scannedPage
return scannedPage;
})
.ToArray();
// Process the results
}
});
}
The .AsOrdered() is what ensures that you'll get the results in the same order as the input.
Be aware that when you consume a BlockingCollection<T> with the Parallel class or PLINQ, it is important to use the Partitioner and the EnumerablePartitionerOptions.NoBuffering configuration, otherwise there is a risk of deadlocks. The default greedy behavior of the Parallel/PLINQ and the blocking behavior of the BlockingCollection<T>, do not interact well.
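A small self-contained check of the ordering claim, assuming integer items in place of ScannedPage and a finite producer that calls CompleteAdding so the query can finish. ProcessInOrder is a hypothetical name; the per-item Thread.Sleep only makes completion order differ from input order.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;

public static class OrderedPlinqConsumer
{
    // Consumes a BlockingCollection in parallel with PLINQ and returns the results
    // in the same order the items were taken from the collection.
    public static int[] ProcessInOrder(BlockingCollection<int> source, CancellationToken token)
    {
        var partitioner = Partitioner.Create(
            source.GetConsumingEnumerable(token),
            EnumerablePartitionerOptions.NoBuffering); // no greedy chunking on a blocking source

        return partitioner
            .AsParallel()
            .AsOrdered()
            .WithCancellation(token)
            .Select(item =>
            {
                // Uneven per-item work, so items finish out of order.
                Thread.Sleep(item % 3 == 0 ? 5 : 0);
                return item * 10;
            })
            .ToArray();
    }
}
```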
I have a ListBox with a list of URLs.
I have 2 threads that take these URLs and process them in a function.
My Thread 1 takes items[0] of the ListBox, and my Thread 2 takes items[1].
After a thread picks up an item, it immediately removes it using Items.RemoveAt(0) or Items.RemoveAt(1).
My problem with this method is that some of the URLs are processed twice, and some not at all.
Isn't there a way to flag a URL or something like that? I'm not so familiar with multi threading
PS: In my example I said I was using 2 threads; in reality I use 5 threads.
Thanks in advance
EDIT:
I switched to a ConcurrentQueue:
Thread th1;
Thread th2;
Thread th3;
Thread th4;
Thread th5;
ConcurrentQueue<string> myQueue= new ConcurrentQueue<string>();
int queueCount = 0;
private void button2_Click(object sender, EventArgs e)
{
//initialize objects and query the database
DBconnect conn;
conn = new DBconnect();
string query = "SELECT Url FROM Pages WHERE hash = ''";
List<string> result = conn.Select(query);
for (int i = 0; i < result.Count(); i++)
{
//For all rows found, add them to the queue
myQueue.Enqueue(result[i]);
}
//start the 5 threads to process the queue
th1 = new Thread(ProcessTorrent);
th2 = new Thread(ProcessTorrent);
th3 = new Thread(ProcessTorrent);
th4 = new Thread(ProcessTorrent);
th5 = new Thread(ProcessTorrent);
th1.Start();
th2.Start();
th3.Start();
th4.Start();
th5.Start();
}
private void ProcessTorrent()
{
string queueURL;
// TryDequeue is atomic: each URL is handed to exactly one thread,
// and the loop ends (so the thread exits) once the queue is empty
while (myQueue.TryDequeue(out queueURL))
{
//parse the URL and increment the number of items processed from the queue
get_torrent_detail(queueURL);
Interlocked.Increment(ref queueCount);
// Controls may only be updated on the UI thread, hence Invoke; show the remaining count
this.Invoke(new Action(() => label_total.Text = myQueue.Count.ToString()));
}
}
It sounds like you're using a UI control to coordinate tasks between multiple threads.
That is an extremely bad idea.
Instead, you should queue up the tasks into a ConcurrentQueue<T> or BlockingCollection<T>, and have other threads take items from the queue and process them.
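A minimal sketch of that pattern, assuming a hypothetical process delegate in place of get_torrent_detail. Each worker loops on TryDequeue, which hands every URL to exactly one thread, and the dictionary counts how often each URL was processed so the "twice or never" problem can be checked.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public static class QueueWorkers
{
    // Fills a ConcurrentQueue and lets workerCount threads drain it. TryDequeue
    // hands each item to exactly one thread, so nothing is processed twice or skipped.
    public static ConcurrentDictionary<string, int> ProcessAll(IEnumerable<string> urls, int workerCount, Action<string> process)
    {
        var queue = new ConcurrentQueue<string>(urls);
        var timesProcessed = new ConcurrentDictionary<string, int>();
        var workers = new List<Thread>();

        for (int w = 0; w < workerCount; w++)
        {
            var worker = new Thread(() =>
            {
                string url;
                while (queue.TryDequeue(out url)) // the loop ends when the queue is empty
                {
                    process(url);
                    timesProcessed.AddOrUpdate(url, 1, (_, count) => count + 1);
                }
            });
            workers.Add(worker);
            worker.Start();
        }

        foreach (var worker in workers)
        {
            worker.Join(); // blocks the caller; in a UI app, don't do this on the UI thread
        }
        return timesProcessed;
    }
}
```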
Yes, that happens because you do not synchronize access to the list.
Basically, read the C# documentation on the lock statement. Take a lock while accessing the list. That prevents multiple threads from accessing it at the same time.
Then you ALWAYS get the top item (items[0]) and immediately remove it, while still holding the lock.
I'm not so familiar with multi threading
I really love when people show that attitude. Can you imagine a cook, working in a restaurant as a professional cook, saying "ah, I am not familiar with an oven, you know"? Or a doctor saying "ok, I have a problem here, I have no real idea how to give an injection"? Given that today we live in a multicore world, this sentence just SCREAMS in a bad way.
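The lock-based approach from this answer can be sketched as follows, assuming the URLs are first copied from the ListBox into a plain List<string> (WinForms controls should only be touched from the UI thread, so locking around ListBox.Items from worker threads is not enough). SharedUrlList and TryTakeFirst are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// A plain List<string> shared by several threads, guarded by one lock object.
public class SharedUrlList
{
    private readonly object _sync = new object();
    private readonly List<string> _items;

    public SharedUrlList(IEnumerable<string> items)
    {
        _items = new List<string>(items);
    }

    // Reads and removes the top item as one atomic step, so two threads can
    // never take the same URL. Returns false when the list is empty.
    public bool TryTakeFirst(out string item)
    {
        lock (_sync)
        {
            if (_items.Count == 0)
            {
                item = null;
                return false;
            }
            item = _items[0];
            _items.RemoveAt(0);
            return true;
        }
    }
}
```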
I'm currently working on a project, where we have the challenge to process items in parallel. So far not a big deal ;)
Now to the problem. We have a list of IDs, and periodically (every 2 seconds) we want to call a stored procedure for each ID.
The 2 seconds need to be tracked for each item individually, as items are added and removed at runtime.
In addition, we want to configure the maximum degree of parallelism, as the DB should not be flooded with 300 concurrent threads.
An item which is being processed should not be rescheduled for processing until it has finished with the previous execution. The reason is that we want to prevent queueing up a lot of items in case of delays on the DB.
Right now we are using a self-developed component that has a main thread, which periodically checks which items need to be scheduled for processing. Once it has the list, it drops those items on a custom IOCP-based thread pool, and then uses wait handles to wait for the items to be processed. Then the next iteration starts. IOCP because of the work stealing it provides.
I would like to replace this custom implementation with a TPL/.NET 4 version, and I would like to know how you would solve it (ideally simple and nicely readable/maintainable).
I know about this article: http://msdn.microsoft.com/en-us/library/ee789351.aspx, but it only limits the number of threads being used. It leaves out work stealing, periodically executing the items, ...
Ideally it will become a generic component that can be used for all the tasks that need to be done periodically for a list of items.
any input welcome,
tia
Martin
I don't think you actually need to get down and dirty with direct TPL Tasks for this. For starters I would set up a BlockingCollection around a ConcurrentQueue (the default) with no BoundedCapacity set on the BlockingCollection to store the IDs that need to be processed.
// Setup the blocking collection somewhere when your process starts up (OnStart for a Windows service)
BlockingCollection<string> idsToProcess = new BlockingCollection<string>();
From there I would just use Parallel::ForEach on the enumeration returned from BlockingCollection::GetConsumingEnumerable. In the ForEach call you set up your ParallelOptions::MaxDegreeOfParallelism. Inside the body of the ForEach you execute your stored procedure.
Now, once the stored procedure execution completes, you're saying you don't want to re-schedule the execution for at least two seconds. No problem, schedule a System.Threading.Timer with a callback which will simply add the ID back to the BlockingCollection in the supplied callback.
Parallel.ForEach(
idsToProcess.GetConsumingEnumerable(),
new ParallelOptions
{
MaxDegreeOfParallelism = 4 // read this from config
},
(id) =>
{
// ... execute sproc ...
// Need to declare/assign this before the delegate so that we can dispose of it inside
Timer timer = null;
timer = new Timer(
_ =>
{
// Add the id back to the collection so it will be processed again
idsToProcess.Add(id);
// Cleanup the timer
timer.Dispose();
},
null, // no state, the id we need is "captured" in the anonymous delegate
2000, // probably should read this from config
Timeout.Infinite);
});
Finally, when the process is shutting down you would call BlockingCollection::CompleteAdding so that the enumerable being processed stops blocking and completes, and the Parallel::ForEach exits. If this were a Windows service, for example, you would do this in OnStop.
// When ready to shutdown you just signal you're done adding
idsToProcess.CompleteAdding();
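A runnable sketch of this design, assuming a dictionary counter in place of the stored procedure call and a totalRuns limit so that it terminates (a service would call CompleteAdding from OnStop instead). It also swaps in Partitioner.Create with EnumerablePartitionerOptions.NoBuffering, which needs .NET 4.5, so Parallel.ForEach takes one id at a time from the blocking collection instead of buffering chunks. PeriodicIdProcessor and Run are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class PeriodicIdProcessor
{
    // Runs each id, then re-adds it after delayMs with a one-shot Timer, so an id
    // is never queued again while its previous run is still executing.
    public static ConcurrentDictionary<string, int> Run(string[] ids, int maxParallel, int delayMs, int totalRuns)
    {
        var idsToProcess = new BlockingCollection<string>();
        var runs = new ConcurrentDictionary<string, int>();
        int executed = 0;
        foreach (var initial in ids)
        {
            idsToProcess.Add(initial);
        }

        Parallel.ForEach(
            Partitioner.Create(idsToProcess.GetConsumingEnumerable(), EnumerablePartitionerOptions.NoBuffering),
            new ParallelOptions { MaxDegreeOfParallelism = maxParallel },
            id =>
            {
                runs.AddOrUpdate(id, 1, (_, n) => n + 1); // stand-in for executing the sproc

                if (Interlocked.Increment(ref executed) >= totalRuns)
                {
                    idsToProcess.CompleteAdding(); // the consuming enumerable drains and ends
                    return;
                }

                // Create the timer disarmed, then arm it, so the callback always sees it assigned.
                Timer timer = null;
                timer = new Timer(_ =>
                {
                    try
                    {
                        idsToProcess.Add(id);
                    }
                    catch (InvalidOperationException)
                    {
                        // CompleteAdding was already called: shutting down
                    }
                    timer.Dispose();
                }, null, Timeout.Infinite, Timeout.Infinite);
                timer.Change(delayMs, Timeout.Infinite);
            });

        return runs;
    }
}
```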
Update
You raised a valid concern in your comment that you might be processing a large amount of IDs at any given point and fear that there would be too much overhead in a timer per ID. I would absolutely agree with that. So in the case that you are dealing with a large list of IDs concurrently, I would change from using a timer-per-ID to using another queue to hold the "sleeping" IDs which is monitored by a single short interval timer instead. First you'll need a ConcurrentQueue onto which to place the IDs that are asleep:
ConcurrentQueue<Tuple<string, DateTime>> sleepingIds = new ConcurrentQueue<Tuple<string, DateTime>>();
Now, I'm using a two-part Tuple here for illustration purposes, but you may want to create a more strongly typed struct for it (or at least alias it with a using statement) for better readability. The tuple has the id and a DateTime which represents when it was put on the queue.
Now you'll also want to setup the timer that will monitor this queue:
Timer wakeSleepingIdsTimer = new Timer(
    _ =>
    {
        DateTime utcNow = DateTime.UtcNow;
        Tuple<string, DateTime> entry;
        // Entries are queued in time order, so stop at the first one that hasn't slept 2 seconds yet.
        // Peek first: enumerating a ConcurrentQueue (e.g. with TakeWhile) does not remove anything.
        while (sleepingIds.TryPeek(out entry) && (utcNow - entry.Item2).TotalSeconds >= 2)
        {
            if (sleepingIds.TryDequeue(out entry))
            {
                // Add this id back to the processing queue
                idsToProcess.Add(entry.Item1);
            }
        }
    },
    null, // no state
    100, // first wake-up after 100ms (Timeout.Infinite here would never start the timer)
    100 // wake up every 100ms, probably should read this from config
);
Then you would simply change the Parallel::ForEach to do the following instead of setting up a timer for each one:
(id) =>
{
// ... execute sproc ...
sleepingIds.Enqueue(Tuple.Create(id, DateTime.UtcNow));
}
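The wake-up step can be tested without waiting on a real timer by passing the current time in. A sketch, assuming the same Tuple<string, DateTime> layout as above; WakeDueIds is a hypothetical name, and the timer callback would simply call it with DateTime.UtcNow and a 2-second delay.

```csharp
using System;
using System.Collections.Concurrent;

public static class SleepingIdSweeper
{
    // Moves every id that has slept for at least `delay` from the FIFO sleeping queue
    // back to the processing queue. Entries are enqueued in time order, so the sweep
    // can stop at the first entry that is not due yet.
    public static int WakeDueIds(
        ConcurrentQueue<Tuple<string, DateTime>> sleepingIds,
        BlockingCollection<string> idsToProcess,
        DateTime utcNow,
        TimeSpan delay)
    {
        int woken = 0;
        Tuple<string, DateTime> entry;
        while (sleepingIds.TryPeek(out entry) && utcNow - entry.Item2 >= delay)
        {
            // Assumes a single sweeper, so the peeked entry is the one dequeued.
            if (sleepingIds.TryDequeue(out entry))
            {
                idsToProcess.Add(entry.Item1);
                woken++;
            }
        }
        return woken;
    }
}
```

If a sweep could take longer than the timer period, guard it (for example with Monitor.TryEnter) so two callbacks don't run at once.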
This is pretty similar to the approach you said you already had in your question, but does it with TPL tasks. A task just adds itself back to a list of things to schedule when it's done.
The use of locking on a plain list is fairly ugly in this example; you would probably want a better collection to hold the list of things to schedule.
// Fill the idsToSchedule
for (int id = 0; id < 5; id++)
{
idsToSchedule.Add(Tuple.Create(DateTime.MinValue, id));
}
// LongRunning will tell TPL to create a new thread to run this on
Task.Factory.StartNew(SchedulingLoop, TaskCreationOptions.LongRunning);
That starts up the SchedulingLoop, which does the actual check of whether it's been two seconds since something last ran.
// Tuple of the last time an id was processed and the id of the thing to schedule
static List<Tuple<DateTime, int>> idsToSchedule = new List<Tuple<DateTime, int>>();
static int currentlyProcessing = 0;
const int ProcessingLimit = 3;
// An event loop that performs the scheduling
public static void SchedulingLoop()
{
while (true)
{
lock (idsToSchedule)
{
DateTime currentTime = DateTime.Now;
for (int index = idsToSchedule.Count - 1; index >= 0; index--)
{
var scheduleItem = idsToSchedule[index];
var timeSincePreviousRun = (currentTime - scheduleItem.Item1).TotalSeconds;
// start it executing in a background task
if (timeSincePreviousRun > 2 && currentlyProcessing < ProcessingLimit)
{
Interlocked.Increment(ref currentlyProcessing);
Console.WriteLine("Scheduling {0} after {1} seconds", scheduleItem.Item2, timeSincePreviousRun);
// Schedule this task to be processed
Task.Factory.StartNew(() =>
{
Console.WriteLine("Executing {0}", scheduleItem.Item2);
// simulate the time taken to call this procedure
Thread.Sleep(new Random((int)DateTime.Now.Ticks).Next(0, 5000) + 500);
lock (idsToSchedule)
{
idsToSchedule.Add(Tuple.Create(DateTime.Now, scheduleItem.Item2));
}
Console.WriteLine("Done Executing {0}", scheduleItem.Item2);
Interlocked.Decrement(ref currentlyProcessing);
});
// remove this from the list of things to schedule
idsToSchedule.RemoveAt(index);
}
}
}
Thread.Sleep(100);
}
}