Performing ForEach - C#

I need to increase the performance of a ForEach.
//Pseudocode
foreach (item i in items)
{
//Call service to open DB conn and get data
}
Within this loop I make a call to a service that opens a SQL Server session, gets data from the database, and closes the session, once per iteration.
What can I do?
Thanks.

Well that does sound like a perfectly good use of Parallel.ForEach - so have you tried it?
Parallel.ForEach(queries, query =>
{
    // Perform query
});
You may well want to specify options around the level of parallelism etc - and make sure your connection pool supports as many connections as you want. And of course, measure the performance before and after to make sure it's actually helping.
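For example, a degree-of-parallelism cap can be set via ParallelOptions; in this sketch the query list and the work inside the loop are placeholders for the real service call:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

class Program
{
    // Runs all "queries" in parallel, capped at maxParallelism,
    // collecting results in a thread-safe bag.
    public static List<int> RunQueries(IEnumerable<int> queries, int maxParallelism)
    {
        var results = new ConcurrentBag<int>();
        var options = new ParallelOptions
        {
            // Keep this at or below your connection pool size.
            MaxDegreeOfParallelism = maxParallelism
        };
        Parallel.ForEach(queries, options, query =>
        {
            // Placeholder for: open connection, run query, close connection.
            results.Add(query * 10);
        });
        return new List<int>(results);
    }

    static void Main()
    {
        var results = RunQueries(new List<int> { 1, 2, 3, 4, 5 }, maxParallelism: 4);
        Console.WriteLine(results.Count); // 5
    }
}
```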

Perhaps you could start a new thread in each iteration:
foreach (item i in collection)
{
    Thread t = new Thread(functionToCall);
    t.Start();
}

functionToCall()
{
    database = openSQLSession();
    data databaseData = database.getData();
    dataCollection.Add(databaseData); // dataCollection must be thread-safe, e.g. ConcurrentBag<T>
    closeSQLSession();
}
Of course this is a simple example and pretty pseudocode-y, but I hope you get the gist of it. Note that spawning an unbounded number of threads scales poorly; Parallel.ForEach or the thread pool will manage the degree of parallelism for you.

Related

Which is quickest, Loop or FirstOrDefault()

I'm trying to improve performance in a WPF app, as my users are saddened by the fact that one part of the system seems to have a performance issue. The part in question is a screen which shows logged-in users. The slow part logs them out by scanning in their employee ref, finding their child control, and removing it from the parent, i.e. logging them out. This currently uses a loop:
foreach (var userControl in UsersStackPanel.Children)
{
    if (userControl.Employee.EmpRef == employee.EmpRef)
    {
        // plus some DB stuff here
        UsersStackPanel.Children.Remove(userControl);
        break;
    }
}
but I've an alternative which does this,
var LoggedInUser = (UI.Controls.Generic.User)UsersStackPanel.Children
    .OfType<FrameworkElement>()
    .FirstOrDefault(e => e.Name == "EmpRef" + employee.EmpRef);
if (LoggedInUser != null)
{
    // some DB stuff here
    UsersStackPanel.Children.Remove(LoggedInUser);
}
I've timed both using the Stopwatch class, but the results don't point to which is better; they both return 0 milliseconds. I am wondering if the DB part is the bottleneck; I just thought I'd start with the screen to improve things there as well as the DB updates.
Any thoughts appreciated.
Dan
It seems to me that your second example should look more like this:
UI.Controls.Generic.User LoggedInUser = UsersStackPanel.Children
    .OfType<UI.Controls.Generic.User>()
    .FirstOrDefault(e => e.Employee.EmpRef == employee.EmpRef);
if (LoggedInUser != null)
{
    // some DB stuff here
    UsersStackPanel.Children.Remove(LoggedInUser);
}
But regardless, unless you have hundreds of thousands of controls in your StackPanel (and if you do, you have bigger fish to fry than this loop), the database access will completely swamp and make irrelevant any performance difference in the two looping techniques.
There's not enough context in your question to know for sure what the correct thing to do here is, but in terms of keeping the UI responsive, most likely you'll want to wrap the DB access in a helper method and then execute that method as an awaited task, e.g. await Task.Run(() => DoSomeDBStuff());. That will let the UI thread (presumably the thread that executes the code you posted) continue working while the DB operations go on. When the DB stuff is done, your method will continue execution at the next statement, i.e. the call to Remove().
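A runnable console sketch of that shape (DoSomeDBStuff is a stand-in for the real database work; in the actual app the await would live inside the WPF event handler):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    // Stand-in for the blocking database call.
    static void DoSomeDBStuff()
    {
        Thread.Sleep(100); // simulate slow DB I/O
    }

    public static async Task<string> HandleLogoutAsync()
    {
        // Offload the blocking work to a thread-pool thread; in WPF the
        // UI thread would keep pumping messages while this awaits.
        await Task.Run(() => DoSomeDBStuff());

        // Execution resumes here afterwards; in WPF this is where you
        // would call UsersStackPanel.Children.Remove(LoggedInUser).
        return "done";
    }

    static void Main()
    {
        Console.WriteLine(HandleLogoutAsync().Result); // done
    }
}
```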

How can I queue MySQL queries to make them sequential rather than concurrent and prevent excessive I/O usage?

I have multiple independent processes each submitting bulk insert queries (millions of rows) into a MySQL database, but it is much slower to have them run concurrently than in sequence.
How can I throttle the execution of these queries so that only one at a time can be executed, in sequence?
I have thought of checking if PROCESSLIST contains any running queries but it may not be the best way to properly queue queries on a real first-come, first-queued, first-served basis.
I am using C# and the MySQL Connector for .NET.
I'm guessing that you're using InnoDb (which allows you to do concurrent writes). MyISAM only has table level locking so would queue up the writes.
I'd recommend an approach similar to ruakh's but that you use a table in the database to manage the locking with. The table would be called something like lock_control
Just before you try to do a bulk insert to the table you request a LOCK TABLES lock_control WRITE on this lock_control table. If you are granted the lock then continue with your bulk write and afterwards release the lock. If the table is write locked by another thread then the LOCK TABLES command will block until the lock is released.
You could do this locking with the table you're inserting into directly but I believe that no other thread would be able to read from the table either whilst you hold the lock.
The advantage over doing this locking in the db rather than on the filesystem is that you could have inserts coming in from multiple client machines and it somehow feels a little simpler to handle the locking/inserting all within MySQL.
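Sketched in C# with Connector/NET; the connection string, table name, and method names are placeholders, and this assumes the MySql.Data package is referenced:

```csharp
using MySql.Data.MySqlClient;

class BulkInserter
{
    // Acquires the write lock on lock_control, runs the bulk insert,
    // then releases the lock even if the insert fails.
    public static void InsertWithLock(string connectionString, string bulkInsertSql)
    {
        using (var conn = new MySqlConnection(connectionString))
        {
            conn.Open();

            // Blocks until no other session holds the write lock.
            using (var cmd = new MySqlCommand("LOCK TABLES lock_control WRITE", conn))
            {
                cmd.ExecuteNonQuery();
            }
            try
            {
                // The bulk insert runs while we hold the lock.
                using (var cmd = new MySqlCommand(bulkInsertSql, conn))
                {
                    cmd.ExecuteNonQuery();
                }
            }
            finally
            {
                // Always release; MySQL also drops the lock if the connection closes.
                using (var cmd = new MySqlCommand("UNLOCK TABLES", conn))
                {
                    cmd.ExecuteNonQuery();
                }
            }
        }
    }
}
```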
I am not sure if this will help, but here goes. I had a similar problem where my program was throwing exceptions because MySQL queries were running out of order. So I decided to run my queries in sequence so that succeeding queries wouldn't fail. I found a solution here:
https://michaelscodingspot.com/c-job-queues/
public class job_queue
{
    private ConcurrentQueue<Action> _jobs = new ConcurrentQueue<Action>();
    private bool _delegateQueuedOrRunning = false;

    public void Enqueue(Action job)
    {
        lock (_jobs)
        {
            _jobs.Enqueue(job);
            if (!_delegateQueuedOrRunning)
            {
                _delegateQueuedOrRunning = true;
                ThreadPool.UnsafeQueueUserWorkItem(ProcessQueuedItems, null);
            }
        }
    }

    private void ProcessQueuedItems(object ignored)
    {
        while (true)
        {
            Action item;
            lock (_jobs)
            {
                if (_jobs.Count == 0)
                {
                    _delegateQueuedOrRunning = false;
                    break;
                }
                _jobs.TryDequeue(out item);
            }
            try
            {
                // Do the job.
                item();
            }
            catch
            {
                // Requeue the processor so remaining jobs still run, then rethrow.
                ThreadPool.UnsafeQueueUserWorkItem(ProcessQueuedItems, null);
                throw;
            }
        }
    }
}
This is a class to run methods one after another in a queue. You add methods that contain MySQL queries to the job_queue like this:
var mysql_tasks = new job_queue();
mysql_tasks.Enqueue(() => { Your_MYSQL_METHOD_HERE(); });
Create a bulk insert service/class.
Get the clients to throw the data at it.
It does them one at a time, and messages back "done" if you need it.
You don't want to be choking your DB down to one thread; that will kill everything else.
Not being much of a C#-er, I can't say if this is the best way to do this; but if no one gives a better answer, one common, non-language-specific approach to this sort of thing is to use a temporary file on the file-system. Before performing one of these INSERTs, grab a write-lock on the file, and after the INSERT is done, release the write-lock. (You'll want to use a using or finally block for this.) This answer gives sample code for obtaining a write-lock in C# in a blocking way.
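A minimal sketch of that file-lock idea in C# (the lock-file path is arbitrary; note that with FileShare.None a competing process gets an IOException rather than blocking, so in practice you would retry in a loop):

```csharp
using System;
using System.IO;

class Program
{
    // Holds an exclusive lock on lockPath for the duration of the insert.
    public static bool RunBulkInsertWithLock(string lockPath)
    {
        // FileShare.None gives this process an exclusive lock; any other
        // process opening the same file will fail until we release it.
        using (var lockFile = new FileStream(lockPath, FileMode.OpenOrCreate,
                                             FileAccess.Write, FileShare.None))
        {
            // Perform the bulk INSERT here while holding the lock.
        } // Dispose releases the lock.
        return true;
    }

    static void Main()
    {
        string lockPath = Path.Combine(Path.GetTempPath(), "bulk_insert.lock");
        Console.WriteLine(RunBulkInsertWithLock(lockPath)); // True
    }
}
```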

Best way to manage thread pool against database queue

I have a data table full of summary entries and my software needs to go through and reach out to a web service to get details, then record those details back to the database. Looping through the table synchronously while calling the web service and waiting for the response is too slow (there are thousands of entries) so I'd like to take the results (10 or so at a time) and thread it out so it performs 10 operations at the same time.
My experience with C# threads is limited to say the least, so what's the best approach? Does .NET have some sort of threadsafe queue system that I can use to make sure that the results get handled properly and in order?
Depending on which version of the .NET Framework you have, there are two pretty good options.
You can use ThreadPool.QueueUserWorkItem in any version.
int pending = table.Rows.Count;
var finished = new ManualResetEvent(false);
foreach (DataRow row in table.Rows)
{
    DataRow capture = row; // Required to close over the loop variable correctly.
    ThreadPool.QueueUserWorkItem(
        (state) =>
        {
            try
            {
                ProcessDataRow(capture);
            }
            finally
            {
                if (Interlocked.Decrement(ref pending) == 0)
                {
                    finished.Set(); // Signal completion of all work items.
                }
            }
        }, null);
}
finished.WaitOne(); // Wait for all work items to complete.
If you are using .NET Framework 4.0 you can use the Task Parallel Library.
var tasks = new List<Task>();
foreach (DataRow row in table.Rows)
{
    DataRow capture = row; // Required to close over the loop variable correctly.
    tasks.Add(
        Task.Factory.StartNew(
            () =>
            {
                ProcessDataRow(capture);
            }));
}
Task.WaitAll(tasks.ToArray()); // Wait for all work items to complete.
There are many other reasonable ways to do this. I highlight the patterns above because they are easy and work well. In the absence of specific details I cannot say for certain that either will be a perfect match for your situation, but they should be a good starting point.
Update:
I had a short period of subpar cerebral activity. If you have the TPL available you could also use Parallel.ForEach as a simpler method than all of that Task hocus-pocus I mentioned above.
Parallel.ForEach(table.Rows.Cast<DataRow>(), // Cast needed: DataRowCollection is non-generic
    row =>
    {
        ProcessDataRow(row);
    });
Does .NET have some sort of threadsafe queue system that I can use to make sure that the results get handled properly and in order?
This was something added in .NET 4. The BlockingCollection<T> class, by default, acts as a thread-safe queue for producer/consumer scenarios.
It makes it fairly easy to create a number of elements that "consume" from the collection and process, with one or more elements adding to the collection.
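A runnable sketch of that producer/consumer pattern; the doubling stands in for the web-service call, and note that with multiple consumers results are not completed in strict order, so use a single consumer if ordering matters:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

class Program
{
    // Feeds items through a BlockingCollection to consumerCount workers
    // and returns how many were processed.
    public static int ProcessAll(IEnumerable<int> items, int consumerCount)
    {
        var queue = new BlockingCollection<int>();
        var processed = new ConcurrentBag<int>();

        // Consumers: GetConsumingEnumerable blocks until items arrive
        // and ends once CompleteAdding has been called.
        var consumers = new List<Task>();
        for (int i = 0; i < consumerCount; i++)
        {
            consumers.Add(Task.Run(() =>
            {
                foreach (int item in queue.GetConsumingEnumerable())
                {
                    processed.Add(item * 2); // stand-in for the web-service call
                }
            }));
        }

        // Producer: enqueue the "summary entries".
        foreach (int item in items)
        {
            queue.Add(item);
        }
        queue.CompleteAdding(); // signal no more items

        Task.WaitAll(consumers.ToArray());
        return processed.Count;
    }

    static void Main()
    {
        Console.WriteLine(ProcessAll(new List<int> { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 }, 3)); // 10
    }
}
```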

How can a user cancel a long running query? [duplicate]

I'm in the process of writing a query manager for a WinForms application that, among other things, needs to be able to deliver real-time search results to the user as they're entering a query (think Google's live results, though obviously in a thick client environment rather than the web). Since the results need to start arriving as the user types, the search will get more and more specific, so I'd like to be able to cancel a query if it's still executing while the user has entered more specific information (since the results would simply be discarded, anyway).
If this were ordinary ADO.NET, I could obviously just use the DbCommand.Cancel function and be done with it, but we're using EF4 for our data access and there doesn't appear to be an obvious way to cancel a query. Additionally, opening System.Data.Entity in Reflector and looking at EntityCommand.Cancel shows a discouragingly empty method body, despite the docs claiming that calling this would pass it on to the provider command's corresponding Cancel function.
I have considered simply letting the existing query run and spinning up a new context to execute the new search (and just disposing of the existing query once it finishes), but I don't like the idea of a single client having a multitude of open database connections running parallel queries when I'm only interested in the results of the most recent one.
All of this is leading me to believe that there's simply no way to cancel an EF query once it's been dispatched to the database, but I'm hoping that someone here might be able to point out something I've overlooked.
TL/DR Version: Is it possible to cancel an EF4 query that's currently executing?
Looks like you have found a bug in EF, though when you report it to MS it will probably be treated as a documentation bug. Anyway, I don't like the idea of interacting directly with EntityCommand. Here is my example of how to kill the current query:
var thread = new Thread((param) =>
{
    var currentString = param as string;
    if (currentString == null)
    {
        // TODO OMG exception
        throw new Exception();
    }

    AdventureWorks2008R2Entities entities = null;
    try // Don't use "using" because it can cause a race condition.
    {
        entities = new AdventureWorks2008R2Entities();
        ObjectQuery<Person> query = entities.People
            .Include("Password")
            .Include("PersonPhone")
            .Include("EmailAddress")
            .Include("BusinessEntity")
            .Include("BusinessEntityContact");

        // Improves performance of a readonly query where objects
        // do not have to be tracked by the context.
        // Edit: But it doesn't work for this query because of the includes.
        // query.MergeOption = MergeOption.NoTracking;

        foreach (var record in query
            .Where(p => p.LastName.StartsWith(currentString)))
        {
            // TODO fill some buffer and invoke UI update
        }
    }
    finally
    {
        if (entities != null)
        {
            entities.Dispose();
        }
    }
});

thread.Start("P");

// Just for test
Thread.Sleep(500);
thread.Abort();
This is the result of half an hour of playing with it, so it should probably not be considered a final solution; I'm posting it to at least get some feedback on the problems this approach might cause. The main points are:
The context is handled inside the thread.
The result is not tracked by the context.
If you kill the thread, the query is terminated and the context is disposed (connection released).
If you kill the thread before you start a new one, you should still be using only one connection.
I checked in SQL Profiler that the query is started and terminated.
Edit:
Btw, another approach to simply stopping the current query is from inside the enumeration:
public IEnumerable<T> ExecuteQuery<T>(IQueryable<T> query)
{
    foreach (T record in query)
    {
        // Handle the stop condition somehow.
        if (ShouldStop())
        {
            // Once you close the enumerator, the query is terminated.
            yield break;
        }
        yield return record;
    }
}
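One concrete way to implement that stop condition is a CancellationToken. Below is a runnable sketch where an in-memory IQueryable stands in for the EF query; closing the enumerator over a real EF query would terminate it the same way:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

class Program
{
    // Stops yielding as soon as cancellation is requested.
    public static IEnumerable<T> ExecuteQuery<T>(IQueryable<T> query, CancellationToken token)
    {
        foreach (T record in query)
        {
            if (token.IsCancellationRequested)
            {
                yield break;
            }
            yield return record;
        }
    }

    static void Main()
    {
        var cts = new CancellationTokenSource();
        IQueryable<int> query = Enumerable.Range(0, 100).AsQueryable();

        var results = new List<int>();
        foreach (int n in ExecuteQuery(query, cts.Token))
        {
            results.Add(n);
            if (n == 4) cts.Cancel(); // e.g. the user typed more characters
        }

        Console.WriteLine(results.Count); // 5
    }
}
```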

Threading: allow one thread to access data while blocking others, and then stop blocked threads from executing the same code

imagine the simplest DB access code with some in-memory caching -
if exists in cache
    return object
else
    get from DB
    add to cache
    return object
Now, if the DB access takes a second and I have, say, 5 ASP.Net requests/threads hitting that same code within that second, how can I ensure only the first one does the DB call? I have a simple thread lock around it, but that simply queues them up in an orderly fashion, allowing each to call the DB in turn. My data repositories basically read in entire tables in one go, so we're not talking about Get by Id data requests.
Any ideas on how I can do this? Thread wait handles sound almost what I'm after but I can't figure out how to code it.
Surely this must be a common scenario?
Existing pseudocode:
lock (threadLock)
{
    get collection of entities using Fluent NHib
    add collection to cache
}
Thanks,
Col
You've basically answered your own question. The "lock()" is fine, it prevents the other threads proceeding into that code while any other thread is in there. Then, inside the lock perform your first pseudo-code. Check if it's cached already, if not, retrieve the value and cache it. The next thread will then come in, check the cache, find it's available and use that.
Surely this must be a common scenario?
Not necessarily as common as you may think.
In many similar caching scenarios:
the race condition you describe doesn't happen frequently (it requires multiple requests to arrive when the cache is cold)
the data returned from the database is readonly, and data returned by multiple requests is essentially interchangeable.
the cost of hitting the database is not so prohibitive that it matters.
But if your scenario absolutely needs to prevent this race condition, then use a lock as suggested by Roger Perkins.
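If you do take the lock route, a common refinement is double-checked locking, so the lock is only contended while the cache is cold. A runnable sketch where the names and the in-memory "DB" are illustrative:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

class Cache
{
    private static readonly object threadLock = new object();
    private static volatile List<string> cached; // the "entire table" cache
    public static int DbCalls;                   // counts real DB hits, for illustration

    // Stand-in for the Fluent NHibernate call in the question.
    private static List<string> LoadFromDb()
    {
        Interlocked.Increment(ref DbCalls);
        return new List<string> { "a", "b", "c" };
    }

    public static List<string> Get()
    {
        // First check without the lock avoids contention on the hot path.
        if (cached == null)
        {
            lock (threadLock)
            {
                // Second check: another thread may have filled the cache
                // while we were waiting for the lock.
                if (cached == null)
                {
                    cached = LoadFromDb();
                }
            }
        }
        return cached;
    }
}

class Program
{
    static void Main()
    {
        var threads = new List<Thread>();
        for (int i = 0; i < 5; i++)
        {
            var t = new Thread(() => Cache.Get());
            threads.Add(t);
            t.Start();
        }
        threads.ForEach(t => t.Join());
        Console.WriteLine(Cache.DbCalls); // 1
    }
}
```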
I'd use Monitor/Mutex over lock. Using lock you need to specify a resource (you may also use the this pointer, which is not recommended).
Try the following instead:
Mutex myMutex = new Mutex();
// If you want it system-wide, use a named mutex:
// Mutex myMutex = new Mutex(false, "SomeUniqueName");
myMutex.WaitOne();
// or
//if (myMutex.WaitOne(<ms>))
//{
//    // thread has access
//}
//else
//{
//    // thread has no access
//}
<INSERT CODE HERE>
myMutex.ReleaseMutex();
I don't know of a general solution or an established algorithm for this, but I personally use the code pattern below to solve problems like this.
1) Define an integer variable that can be accessed by all threads.
int accessTicket = 0;
2) Modify the code block:
int myTicket = accessTicket;
lock (threadLock)
{
    if (myTicket == accessTicket)
    {
        ++accessTicket;
        // get collection of entities using Fluent NHib
        // add collection to cache
    }
}
UPDATE
The purpose of this code is not to prevent multiple DB accesses or duplicate caching; we can do that with a normal thread lock. By using the access ticket like this, we can prevent other threads from redoing work that has already been finished.
UPDATE #2
Note that there is a lock (threadLock) in the code. Please read it carefully before commenting or voting down.
