How can a user cancel a long-running query? [duplicate] - c#

I'm in the process of writing a query manager for a WinForms application that, among other things, needs to be able to deliver real-time search results to the user as they're entering a query (think Google's live results, though obviously in a thick client environment rather than the web). Since the results need to start arriving as the user types, the search will get more and more specific, so I'd like to be able to cancel a query if it's still executing while the user has entered more specific information (since the results would simply be discarded, anyway).
If this were ordinary ADO.NET, I could obviously just use the DbCommand.Cancel function and be done with it, but we're using EF4 for our data access and there doesn't appear to be an obvious way to cancel a query. Additionally, opening System.Data.Entity in Reflector and looking at EntityCommand.Cancel shows a discouragingly empty method body, despite the docs claiming that calling this would pass it on to the provider command's corresponding Cancel function.
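For reference, the plain ADO.NET pattern alluded to here, calling DbCommand.Cancel from a second thread, looks roughly like this sketch (the query text, parameter, and buffering comment are placeholders, not code from the question):

```csharp
using System.Data.SqlClient;
using System.Threading;

void RunCancellableQuery(SqlConnection connection)
{
    var command = new SqlCommand(
        "SELECT LastName FROM Person WHERE LastName LIKE @prefix", connection);
    command.Parameters.AddWithValue("@prefix", "Smi%");

    var worker = new Thread(() =>
    {
        try
        {
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    // stream each row into a result buffer for the UI
                }
            }
        }
        catch (SqlException)
        {
            // Cancel typically surfaces as an exception on the executing thread.
        }
    });
    worker.Start();

    // Later, when the user has typed a more specific prefix:
    command.Cancel();
}
```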
I have considered simply letting the existing query run and spinning up a new context to execute the new search (and just disposing of the existing query once it finishes), but I don't like the idea of a single client having a multitude of open database connections running parallel queries when I'm only interested in the results of the most recent one.
All of this is leading me to believe that there's simply no way to cancel an EF query once it's been dispatched to the database, but I'm hoping that someone here might be able to point out something I've overlooked.
TL/DR Version: Is it possible to cancel an EF4 query that's currently executing?

It looks like you've found a bug in EF, but when you report it to Microsoft it will likely be treated as a bug in the documentation. In any case, I don't like the idea of interacting directly with EntityCommand. Here is my example of how to kill the current query:
var thread = new Thread((param) =>
{
    var currentString = param as string;
    if (currentString == null)
    {
        // TODO OMG exception
        throw new Exception();
    }

    AdventureWorks2008R2Entities entities = null;
    try // Don't use using because it can cause race condition
    {
        entities = new AdventureWorks2008R2Entities();
        ObjectQuery<Person> query = entities.People
            .Include("Password")
            .Include("PersonPhone")
            .Include("EmailAddress")
            .Include("BusinessEntity")
            .Include("BusinessEntityContact");
        // Improves performance of readonly query where
        // objects do not have to be tracked by context
        // Edit: But it doesn't work for this query because of includes
        // query.MergeOption = MergeOption.NoTracking;
        foreach (var record in query
            .Where(p => p.LastName.StartsWith(currentString)))
        {
            // TODO fill some buffer and invoke UI update
        }
    }
    finally
    {
        if (entities != null)
        {
            entities.Dispose();
        }
    }
});
thread.Start("P");
// Just for test
Thread.Sleep(500);
thread.Abort();
This is the result of about 30 minutes of playing with it, so it probably shouldn't be considered a final solution. I'm posting it to at least get some feedback on possible problems with this approach. The main points are:
The context is handled inside the thread.
The result is not tracked by the context.
If you kill the thread, the query is terminated and the context is disposed (the connection is released).
If you kill the thread before you start a new one, you should still be using only one connection.
I checked in SQL Profiler that the query is started and then terminated.
Edit:
By the way, another approach to simply stopping the current query is from inside the enumeration:
public IEnumerable<T> ExecuteQuery<T>(IQueryable<T> query)
{
    foreach (T record in query)
    {
        // Handle stop condition somehow
        if (ShouldStop())
        {
            // Once you close enumerator, query is terminated
            yield break;
        }
        yield return record;
    }
}
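A minimal way to wire up ShouldStop (the class and method names below are assumptions, not part of the answer) is a volatile flag that the UI thread sets whenever the user submits a newer search:

```csharp
using System.Collections.Generic;
using System.Linq;

public class SearchRunner
{
    private volatile bool _stopRequested;

    // Called from the UI thread when the user types a more specific query.
    public void RequestStop()
    {
        _stopRequested = true;
    }

    private bool ShouldStop()
    {
        return _stopRequested;
    }

    public IEnumerable<T> ExecuteQuery<T>(IQueryable<T> query)
    {
        _stopRequested = false;
        foreach (T record in query)
        {
            if (ShouldStop())
            {
                yield break; // disposing the enumerator ends the query
            }
            yield return record;
        }
    }
}
```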

Related

Difference between Find and FindAsync

I am writing a very, very simple query which just gets a document from a collection according to its unique Id. After some frustration (I am new to Mongo and the async/await programming model), I figured this out:
IMongoCollection<TModel> collection = // ...
FindOptions<TModel> options = new FindOptions<TModel> { Limit = 1 };
IAsyncCursor<TModel> cursor = await collection.FindAsync(x => x.Id.Equals(id), options);
List<TModel> list = await cursor.ToListAsync();
TModel result = list.FirstOrDefault();
return result;
It works, great! But I keep seeing references to a "Find" method, and I worked this out:
IMongoCollection<TModel> collection = // ...
IFindFluent<TModel, TModel> findFluent = collection.Find(x => x.Id == id);
findFluent = findFluent.Limit(1);
TModel result = await findFluent.FirstOrDefaultAsync();
return result;
As it turns out, this too works, great!
I'm sure that there's some important reason that we have two different ways to achieve these results. What is the difference between these methodologies, and why should I choose one over the other?
The difference is in the syntax.
Find and FindAsync both let you build an asynchronous query with the same performance; the difference is what they hand back.
FindAsync returns a cursor, which doesn't load all documents at once and gives you an interface for retrieving documents one by one from the DB cursor. It's helpful when the query result is huge.
Find gives you a simpler syntax through the ToListAsync method, which internally retrieves the documents from the cursor and returns all of them at once.
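To make the cursor point concrete, here is a sketch (the filter and the Process method are assumptions) that consumes a FindAsync cursor batch by batch instead of materializing everything with ToListAsync:

```csharp
// Iterate the cursor without loading the whole result set into memory.
using (IAsyncCursor<TModel> cursor = await collection.FindAsync(x => x.IsActive))
{
    while (await cursor.MoveNextAsync())
    {
        // cursor.Current holds one server batch of documents.
        foreach (TModel doc in cursor.Current)
        {
            Process(doc);
        }
    }
}
```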
Imagine that you execute this code in a web request. If you invoke the find synchronously, the request thread is blocked until the database returns its results; for a long database operation that takes seconds to complete, one of the threads available for serving web requests does nothing but wait for the database, wasting valuable resources (the number of threads in the thread pool is limited).
With FindAsync, the thread handling your web request is freed while waiting for the database to return results, which means that during the database call it can serve another web request. When the database returns the result, execution continues.
For long operations like reads/writes to the file system, database operations, or communication with other services, it's a good idea to use async calls, because while you are waiting for the results the threads are available to serve other web requests. This is more scalable.
Take a look at this Microsoft article: https://msdn.microsoft.com/en-us/magazine/dn802603.aspx

Which is quickest, Loop or FirstOrDefault()

I'm trying to improve performance in a WPF app, as my users are saddened by the fact that one part of the system seems to have a performance issue. The part in question is a screen which shows logged-in users. The slow part logs a user out: it scans their employee ref, finds their child control, and removes it from the parent, i.e. logs them out. This currently uses a loop.
foreach (var userControl in UsersStackPanel.Children)
{
    if (userControl.Employee.EmpRef == employee.EmpRef)
    {
        // plus some DB stuff here
        UsersStackPanel.Children.Remove(userControl);
        break;
    }
}
but I've an alternative which does this,
var LoggedInUser = (UI.Controls.Generic.User)UsersStackPanel.Children
    .OfType<FrameworkElement>()
    .FirstOrDefault(e => e.Name == "EmpRef" + employee.EmpRef);
if (LoggedInUser != null)
{
    // some DB stuff here
    UsersStackPanel.Children.Remove(LoggedInUser);
}
I've timed both using the Stopwatch class, but the results don't point to which is better; they both return 0 milliseconds. I am wondering if the DB part is the bottleneck; I just thought I'd start with the screen to improve things there as well as the DB updates.
Any thoughts appreciated.
Dan
It seems to me that your second example should look more like this:
UI.Controls.Generic.User LoggedInUser = UsersStackPanel.Children
    .OfType<UI.Controls.Generic.User>()
    .FirstOrDefault(e => e.Employee.EmpRef == employee.EmpRef);
if (LoggedInUser != null)
{
    // some DB stuff here
    UsersStackPanel.Children.Remove(LoggedInUser);
}
But regardless, unless you have hundreds of thousands of controls in your StackPanel (and if you do, you have bigger fish to fry than this loop), the database access will completely swamp and make irrelevant any performance difference in the two looping techniques.
There's not enough context in your question to know for sure what the correct thing to do here is, but in terms of keeping the UI responsive, most likely what you'll want to wind up doing is wrapping the DB access in a helper method, and then execute that method as an awaited task, e.g. await Task.Run(() => DoSomeDBStuff()); That will let the UI thread (which is presumably the thread that executes the code you posted) continue working while the DB operations go on. When the DB stuff is done, your method will continue execution at the next statement, i.e. the call to the Remove() method.
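A sketch of that shape (the handler and method names are placeholders, not from the question):

```csharp
private async void LogoutButton_Click(object sender, RoutedEventArgs e)
{
    // Fast, UI-thread lookup (either the loop or the LINQ version).
    var loggedInUser = FindUserControl(employee);
    if (loggedInUser == null) return;

    // The DB work runs on a thread-pool thread; the UI stays responsive.
    await Task.Run(() => DoSomeDBStuff(loggedInUser));

    // Execution resumes here on the UI thread, so it's safe to touch controls.
    UsersStackPanel.Children.Remove(loggedInUser);
}
```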

How to use Table.ExecuteQuerySegmentedAsync() with Azure Table Storage

Working with the Azure Storage Client library 2.1, I'm making a query of Table storage async. I created this code:
public async Task<List<TAzureTableEntity>> GetByPartitionKey(string partitionKey)
{
    var theQuery = _table.CreateQuery<TAzureTableEntity>()
        .Where(tEnt => tEnt.PartitionKey == partitionKey);
    TableQuerySegment<TAzureTableEntity> querySegment = null;
    var returnList = new List<TAzureTableEntity>();
    while (querySegment == null || querySegment.ContinuationToken != null)
    {
        querySegment = await theQuery.AsTableQuery()
            .ExecuteSegmentedAsync(querySegment != null
                ? querySegment.ContinuationToken
                : null);
        returnList.AddRange(querySegment);
    }
    return returnList;
}
Let's assume there is a large set of data coming back so there will be a lot of round trips to Table Storage. The problem I have is that we're awaiting a set of data, adding it to an in-memory list, awaiting more data, adding it to the same list, awaiting yet more data, adding it to the list... and so on and so forth. Why not just wrap a Task.Factory.StartNew() around a regular TableQuery? Like so:
public async Task<List<TAzureTableEntity>> GetByPartitionKey(string partitionKey)
{
    var returnList = await Task.Factory.StartNew(() =>
        table.CreateQuery<TAzureTableEntity>()
            .Where(ent => ent.PartitionKey == partitionKey)
            .ToList());
    return returnList;
}
Doing it this way seems like we're not bouncing the SynchronizationContext back and forth so much. Or does it really matter?
Edit to Rephrase Question
What's the difference between the two scenarios mentioned above?
The difference between the two is that your second version will block a ThreadPool thread for the whole time the query is executing. This might be acceptable in a GUI application (where all you want is to execute the code somewhere other than the UI thread), but it will negate any scalability advantages of async in a server application.
Also, if you don't want your first version to return to the UI context for each roundtrip (which is a reasonable requirement), then use ConfigureAwait(false) whenever you use await:
querySegment = await theQuery.AsTableQuery()
    .ExecuteSegmentedAsync(…)
    .ConfigureAwait(false);
This way, all iterations after the first one will (most likely) execute on a ThreadPool thread and not on the UI context.
BTW, in your second version you don't actually need await at all; you could just return the Task directly:
public Task<List<TAzureTableEntity>> GetByPartitionKey(string partitionKey)
{
    return Task.Run(() => table.CreateQuery<TAzureTableEntity>()
        .Where(ent => ent.PartitionKey == partitionKey)
        .ToList());
}
Not sure if this is the answer you're looking for, but I still want to mention it. :)
As you may already know, the 2nd method (using Task) handles continuation tokens internally and returns from the method only when all entities have been fetched, whereas the 1st method fetches one set of entities (up to a maximum of 1,000) and then returns, giving you the result set as well as a continuation token.
If you're interested in fetching all entities from a table, both methods can be used; however, the 1st one gives you the flexibility to break out of the loop gracefully at any time, which you don't get with the 2nd one. So using the 1st function you could essentially introduce a pagination concept.
Let's assume you're building a web application that shows data from a table, and that the table contains a large number of entities (say 100,000). Using the 1st method, you can fetch just 1,000 entities, return the result to the user, and, if the user wants, fetch the next 1,000 and show those too. You can continue doing that for as long as the user wants and there's data in the table. With the 2nd method, the user would have to wait until all 100,000 entities are fetched from the table.
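A pagination-style sketch of the 1st approach (the method shape is an assumption): fetch one segment and hand the continuation token back to the caller, who passes it in again to request the next page.

```csharp
public async Task<Tuple<List<TAzureTableEntity>, TableContinuationToken>>
    GetPageByPartitionKey(string partitionKey, TableContinuationToken token)
{
    TableQuerySegment<TAzureTableEntity> segment = await _table
        .CreateQuery<TAzureTableEntity>()
        .Where(ent => ent.PartitionKey == partitionKey)
        .AsTableQuery()
        .ExecuteSegmentedAsync(token)
        .ConfigureAwait(false);

    // Up to 1,000 entities per page; a null ContinuationToken means no more data.
    return Tuple.Create(segment.Results, segment.ContinuationToken);
}
```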

NHibernate concurrent threaded insertions with unique constraint

I have a multi-threaded application that may perform concurrent insertions of the same type of object, which has a property marked as unique.
public class Foo
{
    ...
    string PropertyThatshouldBeUnique { get; set; }
    ...
}
Each thread has its own session and does:
Foo myFooInstance = new Foo();
myFooInstance.PropertyThatshouldBeUnique = "Bar";
myThreadSession.SaveOrUpdate(myFooInstance);
I have a unique constraint on my database table that prevents the multiple insertions, and therefore I get an exception on the second insert, which triggers a rollback of the whole transaction (which is not good).
Concurrent insertions can be really close together (a few milliseconds apart).
I haven't configured any specific NHibernate concurrency strategy (I'm not sure whether that could solve my issue, or which one to use).
My problem is: how and where in the code should I check for previously inserted Foo objects with the same property value?
Could you do something like this alongside NHibernate without breaking your current architecture?
if (!Update(connection))
{
    using (var command = new SqlCommand(@"INSERT INTO foo VALUES ('Bar', ...)", connection))
    {
        try
        {
            command.ExecuteNonQuery();
        }
        catch (BlahUniqueConstraintException) // dummy exception, please replace with relevant
        {
            // very rarely will get in here
            Update(connection);
        }
    }
}

private bool Update(SqlConnection connection)
{
    // use update to do two things at once: find out if the record exists and also ... update
    using (var command = new SqlCommand(@"UPDATE foo SET ... WHERE PropertyThatshouldBeUnique = 'Bar'", connection))
    {
        // if the record exists and is updated then returns 1, otherwise 0
        return command.ExecuteNonQuery() > 0;
    }
}
This was posted long ago, but since I found myself in the same kind of situation, let me tell you what solution I found.
For DB operations, instead of letting each thread do the job via NHibernate, the threads send their requests to another thread which does a kind of service job. It exposes methods like Insert.
Thread 1 wants to insert Object1.
Thread 2 wants to insert Object1.
They both call something like Thread3.Insert(Object1) concurrently.
On the first call Thread3 receives, it opens a session to the DB (which is cheap), checks with a SELECT that Object1 is not in the DB, and then adds it.
On the second call, even if it arrives a nanosecond later, Thread3 does the same but stops after the SELECT and either does nothing or sends back an exception.
The cost of waiting for the first call to be processed is quite low; after all, your multi-threading speed is limited anyway by the DB constraints that need to be checked (and thus by DB access). It's like having two very fast PCs and one slow server.
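One way to sketch that dedicated writer thread (all the names here are assumptions, not from the answer) is a BlockingCollection drained by a single consumer, so each SELECT-then-INSERT pair runs to completion before the next request is handled:

```csharp
using System.Collections.Concurrent;
using System.Threading;

public class InsertService
{
    private readonly BlockingCollection<Foo> _pending = new BlockingCollection<Foo>();
    private readonly ISessionFactory _sessionFactory;

    public InsertService(ISessionFactory sessionFactory)
    {
        _sessionFactory = sessionFactory;
        var worker = new Thread(ProcessPending) { IsBackground = true };
        worker.Start();
    }

    // Any thread may call this; items are processed strictly one at a time.
    public void Insert(Foo item)
    {
        _pending.Add(item);
    }

    private void ProcessPending()
    {
        foreach (Foo item in _pending.GetConsumingEnumerable())
        {
            using (var session = _sessionFactory.OpenSession())
            using (var tx = session.BeginTransaction())
            {
                // The duplicate check and the insert cannot interleave with
                // another insert, because only this thread touches the DB.
                bool exists = session.QueryOver<Foo>()
                    .Where(f => f.PropertyThatshouldBeUnique == item.PropertyThatshouldBeUnique)
                    .RowCount() > 0;
                if (!exists)
                {
                    session.Save(item);
                }
                tx.Commit();
            }
        }
    }
}
```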
I think this is more of a design question than an NHibernate-functionality one.
Synchronization among threads is already complicated, but synchronizing DB access among threads while the unit-of-work pattern is in use is a pain (this is one of your issues here, as you don't commit your transaction every time you insert).
Maybe some gurus here at SO have better ideas, though.

How can I queue MySQL queries to make them sequential rather than concurrent and prevent excessive I/O usage?

I have multiple independent processes each submitting bulk insert queries (millions of rows) into a MySQL database, but it is much slower to have them run concurrently than in sequence.
How can I throttle the execution of these queries so that only one at a time can be executed, in sequence?
I have thought of checking whether the PROCESSLIST contains any running queries, but that may not be the best way to queue queries properly on a true first-come, first-queued, first-served basis.
I am using C# and the MySQL Connector for .NET.
I'm guessing that you're using InnoDB (which allows concurrent writes). MyISAM has only table-level locking, so it would queue up the writes.
I'd recommend an approach similar to ruakh's, but using a table in the database to manage the locking. The table could be called something like lock_control.
Just before you attempt a bulk insert, request a LOCK TABLES lock_control WRITE lock on this lock_control table. If you are granted the lock, continue with your bulk write and afterwards release the lock. If the table is write-locked by another thread, the LOCK TABLES command will block until the lock is released.
You could do this locking directly on the table you're inserting into, but I believe no other thread would be able to read from that table while you hold the lock.
The advantage of doing this locking in the DB rather than on the filesystem is that inserts can come from multiple client machines, and it somehow feels a little simpler to handle the locking/inserting entirely within MySQL.
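A sketch of that sequence with MySQL Connector/NET (the RunBulkInsert helper is a placeholder for the actual bulk INSERT):

```csharp
using MySql.Data.MySqlClient;

void BulkInsertWithLock(MySqlConnection connection)
{
    using (var cmd = new MySqlCommand("LOCK TABLES lock_control WRITE", connection))
    {
        cmd.ExecuteNonQuery(); // blocks here until the write lock is granted
    }
    try
    {
        RunBulkInsert(connection); // placeholder for the actual bulk INSERT
    }
    finally
    {
        using (var cmd = new MySqlCommand("UNLOCK TABLES", connection))
        {
            cmd.ExecuteNonQuery(); // always release, even if the insert throws
        }
    }
}
```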
I am not sure if this will help, but here goes. I had a similar problem where my program was throwing exceptions because MySQL queries ran out of order, so I decided to run my queries in sequence so that succeeding queries don't fail. I found a solution here:
https://michaelscodingspot.com/c-job-queues/
public class job_queue
{
    private ConcurrentQueue<Action> _jobs = new ConcurrentQueue<Action>();
    private bool _delegateQueuedOrRunning = false;

    public void Enqueue(Action job)
    {
        lock (_jobs)
        {
            _jobs.Enqueue(job);
            if (!_delegateQueuedOrRunning)
            {
                _delegateQueuedOrRunning = true;
                ThreadPool.UnsafeQueueUserWorkItem(ProcessQueuedItems, null);
            }
        }
    }

    private void ProcessQueuedItems(object ignored)
    {
        while (true)
        {
            Action item;
            lock (_jobs)
            {
                if (_jobs.Count == 0)
                {
                    _delegateQueuedOrRunning = false;
                    break;
                }
                _jobs.TryDequeue(out item);
            }
            try
            {
                // do job
                item();
            }
            catch
            {
                ThreadPool.UnsafeQueueUserWorkItem(ProcessQueuedItems, null);
                throw;
            }
        }
    }
}
This is a class that runs methods one after another in a queue. You add methods containing MySQL queries to a job_queue like this:
var mysql_tasks = new job_queue();
mysql_tasks.Enqueue(() => { Your_MYSQL_METHOD_HERE(); });
Create a bulk insert service/class.
Get the clients to throw their data at it.
It does them one at a time, and can message back "done" if you need it.
You don't want to choke your whole DB down to one thread, though; that will kill everything else.
Not being much of a C#-er, I can't say if this is the best way to do this; but if no one gives a better answer, one common, language-agnostic approach to this sort of thing is to use a temporary file on the file system. Before performing one of these INSERTs, grab a write lock on the file, and after the INSERT is done, release the write lock. (You'll want to use a using or finally block for this.) This answer gives sample code for obtaining a write lock in C# in a blocking way.
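The file-based lock described above can be sketched like this (the lock-file path and RunBulkInsert are placeholders): opening the file with FileShare.None fails while another process holds it open, so we retry until the open succeeds, and disposing the stream releases the lock.

```csharp
using System.IO;
using System.Threading;

static FileStream AcquireLock(string path)
{
    while (true)
    {
        try
        {
            // Exclusive open: no other process can open the file meanwhile.
            return new FileStream(path, FileMode.OpenOrCreate,
                                  FileAccess.ReadWrite, FileShare.None);
        }
        catch (IOException)
        {
            Thread.Sleep(100); // another process holds the lock; retry
        }
    }
}

// Disposing the stream releases the lock for the next process in line.
using (AcquireLock(Path.Combine(Path.GetTempPath(), "bulk_insert.lock")))
{
    RunBulkInsert(); // placeholder for the INSERT work
}
```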
