Which is quickest, Loop or FirstOrDefault() - c#

I'm trying to improve performance in a WPF app, as my users are unhappy that one part of the system seems slow. The part in question is a screen that shows logged-in users. The slow part logs a user out: it scans in their employee ref, finds their child control, and removes it from the parent. This currently uses a loop:
foreach (UI.Controls.Generic.User userControl in UsersStackPanel.Children)
{
    if (userControl.Employee.EmpRef == employee.EmpRef)
    {
        // plus some DB stuff here
        UsersStackPanel.Children.Remove(userControl);
        break;
    }
}
But I have an alternative that does this:
var LoggedInUser = (UI.Controls.Generic.User)UsersStackPanel.Children
    .OfType<FrameworkElement>()
    .FirstOrDefault(e => e.Name == "EmpRef" + employee.EmpRef);
if (LoggedInUser != null)
{
    // some DB stuff here
    UsersStackPanel.Children.Remove(LoggedInUser);
}
I've timed both using the Stopwatch class, but the results don't show which is better: both return 0 milliseconds. I suspect the DB part is the bottleneck; I just thought I'd start with the screen and improve things there as well as the DB updates.
Any thoughts appreciated.
Dan

It seems to me that your second example should look more like this:
UI.Controls.Generic.User LoggedInUser = UsersStackPanel.Children
    .OfType<UI.Controls.Generic.User>()
    .FirstOrDefault(e => e.Employee.EmpRef == employee.EmpRef);
if (LoggedInUser != null)
{
    // some DB stuff here
    UsersStackPanel.Children.Remove(LoggedInUser);
}
But regardless, unless you have hundreds of thousands of controls in your StackPanel (and if you do, you have bigger fish to fry than this loop), the database access will completely swamp and make irrelevant any performance difference in the two looping techniques.
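On the 0 ms readings: a single lookup over a few controls is far below what one Stopwatch reading in milliseconds can show. A rough way to compare the two styles is to repeat each lookup many times and divide. The sketch below does that on a plain list of strings; the list, the names and the iteration count are stand-ins I made up, not the asker's controls, so treat the numbers as indicative only.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Stand-in for the panel's children: 200 hypothetical control names.
List<string> names = Enumerable.Range(0, 200).Select(i => "EmpRef" + i).ToList();
string target = "EmpRef199";      // worst case: the last element
const int iterations = 100_000;   // repeat so the total becomes measurable

// Plain loop, returns true when the target is found.
bool FindWithLoop()
{
    foreach (string name in names)
    {
        if (name == target)
        {
            return true;
        }
    }
    return false;
}

// LINQ equivalent of the same search.
bool FindWithLinq() => names.FirstOrDefault(n => n == target) != null;

TimeSpan Measure(Func<bool> find)
{
    find(); // warm-up call so JIT compilation is not timed
    Stopwatch sw = Stopwatch.StartNew();
    for (int i = 0; i < iterations; i++)
    {
        find();
    }
    sw.Stop();
    return sw.Elapsed;
}

TimeSpan loopTime = Measure(FindWithLoop);
TimeSpan linqTime = Measure(FindWithLinq);

Console.WriteLine($"foreach:        {loopTime.TotalMilliseconds / iterations:F6} ms per lookup");
Console.WriteLine($"FirstOrDefault: {linqTime.TotalMilliseconds / iterations:F6} ms per lookup");
```

Whatever the exact figures on a given machine, both should come out as tiny fractions of a millisecond per lookup, which is why a database round trip dominates.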
There's not enough context in your question to know for sure what the correct thing to do here is, but in terms of keeping the UI responsive, most likely what you'll want to wind up doing is wrapping the DB access in a helper method, and then execute that method as an awaited task, e.g. await Task.Run(() => DoSomeDBStuff()); That will let the UI thread (which is presumably the thread that executes the code you posted) continue working while the DB operations go on. When the DB stuff is done, your method will continue execution at the next statement, i.e. the call to the Remove() method.
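A minimal console sketch of that shape, assuming the DB work is a blocking call. DoSomeDbStuff and LogOutAsync are hypothetical names, and Thread.Sleep stands in for the real database access:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the blocking database work in the question.
void DoSomeDbStuff(string empRef)
{
    Thread.Sleep(200); // simulate a slow, synchronous DB call
}

// Shape of an async handler: the DB call runs on a thread-pool thread,
// and the code after the await resumes once it has finished.
async Task<string> LogOutAsync(string empRef)
{
    await Task.Run(() => DoSomeDbStuff(empRef));
    // In the WPF app this is where UsersStackPanel.Children.Remove(...) would go.
    return "removed " + empRef;
}

Task<string> pending = LogOutAsync("1234");
Console.WriteLine("Caller is free while the DB work runs: " + !pending.IsCompleted);
string result = await pending;
Console.WriteLine(result);
```

In a WPF event handler the code after the await resumes on the UI thread, so touching the StackPanel there is safe; in this console sketch there is no UI thread, so it only shows the ordering.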

Related

Is it a good idea to async-ize a sync method?

In my ASP.NET Core app, at some points I'm querying a couple ADs for data. This being AD, the queries take some time to complete and the DirectoryServices API contains only synchronous calls.
Is it a good practice to try and wrap the AD sync calls as async? I think it's done like this (just an example, not the real query):
private async Task<string[]> GetUserGroupsAsync(string samAccountName)
{
    var func = new Func<string, string[]>(sam =>
    {
        // Named groupNames so it does not clash with the outer 'result' below (CS0136).
        var groupNames = new List<string>();
        using (var ctx = new PrincipalContext(ContextType.Domain, "", "", ""))
        {
            var p = new UserPrincipal(ctx)
            {
                SamAccountName = sam
            };
            using (var search_obj = new PrincipalSearcher(p))
            {
                var query_result = search_obj.FindOne();
                if (query_result != null)
                {
                    var usuario = query_result as UserPrincipal;
                    var directory_entry = usuario.GetUnderlyingObject() as DirectoryEntry;
                    var grupos = usuario.GetGroups(ctx).OfType<GroupPrincipal>().ToArray();
                    if (grupos != null)
                    {
                        foreach (GroupPrincipal g in grupos)
                        {
                            groupNames.Add(g.Name);
                        }
                    }
                }
            }
        }
        return groupNames.ToArray();
    });
    var result = await Task.Run(() => func(samAccountName));
    return result;
}
Is it a good practice
Usually not.
In a desktop app where you don't want to hold up the UI thread, then this idea can actually be a good idea. That Task.Run moves the work to a different thread and the UI thread can continue responding to user input while you're waiting for a response.
You tagged ASP.NET. The answer there is also "it depends". ASP.NET has a limited amount of worker threads that it's allowed to use. The benefit of asynchronous code is to allow a thread to go and work on some other request while you're waiting for a response. Thus, you can serve more requests with the same amount of available threads. It helps the overall performance of your application.
If you're calling await GetUserGroupsAsync(), then there is absolutely no benefit to doing what you're doing. You're freeing up the calling thread, but you've tied up another thread-pool thread that is going to sit blocked until a response is returned. So your net thread savings is zero, and you have the additional CPU overhead of setting up the task.
If you intend on calling GetUserGroupsAsync() and then going out and getting other data while you wait for a response, then this can save time. It won't save threads, but just time. But you should be conscious that you are now taking up two threads for each request instead of just one, which means you can hit the ASP.NET max thread count faster, potentially hurting the overall performance of your application.
But whether you want to save time in ASP.NET, or if you want to free up the UI thread in a desktop app, I would still argue that you should not use Task.Run inside GetUserGroupsAsync(). If the caller wants to offload that waiting to another thread so it can then go get other data, then the caller can use Task.Run, like this:
var groupsTask = Task.Run(() => GetUserGroupsAsync());
// make HTTP request or get some other external data while we wait
var groups = await groupsTask;
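To make the caller-side pattern concrete, here is a small self-contained sketch. It assumes the lookup is exposed as a plain synchronous method; GetUserGroups and GetOtherDataAsync are hypothetical stand-ins, with Thread.Sleep simulating the blocking directory call:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical synchronous lookup standing in for the blocking AD query.
string[] GetUserGroups(string samAccountName)
{
    Thread.Sleep(300); // simulate the slow directory call
    return new[] { "Domain Users", "Developers" };
}

// Hypothetical second piece of work that is genuinely asynchronous.
async Task<string> GetOtherDataAsync()
{
    await Task.Delay(300);
    return "other data";
}

// The caller decides to offload: start the blocking call on the thread pool,
// do the other work in the meantime, then await both results.
Task<string[]> groupsTask = Task.Run(() => GetUserGroups("jdoe"));
string other = await GetOtherDataAsync();
string[] groups = await groupsTask;

Console.WriteLine($"{other}; groups: {string.Join(", ", groups)}");
```

The two waits overlap, so the total is roughly the longer of the two rather than their sum, at the cost of holding a second thread while the blocking call runs.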
The decision on whether you should create a method for a class should depend on the answer to the question: if someone thinks of what this class represents, would he think that this class will have this functionality?
Compare this with class string and methods about string equality. Most people would think that two strings are equal if they have exactly the same characters in the same order. However, for a lot of applications, it might be handy to be able to compare two strings with case insensitivity. Instead of changing the equality method of string, a new class is created. This StringComparer class contains a lot of methods to compare strings using different definitions of equality.
If someone would say: "Okay, I've just created a class that represents several methods to compare two strings for equality". Would you expect that comparing with case insensitivity is one of the methods of this class? Of course you would!
The same should apply to your class. I don't know what your class represents. However, apparently you thought that someone who has an object of this class would be happy to "Get User Groups". They are happy that someone made this method for them, and that they don't need to know the insides of the class to be able to get the user groups.
This information hiding is an important aspect of classes. It gives the creator of the class the freedom to change how the class works internally, without having to change how the class is used.
So if everyone who knows what your class represents would think: "of course getting user groups will take a considerable amount of time", and "of course, my thread will be waiting idly when getting user groups", then users of your class would expect the presence of async-await, to prevent idly waiting.
On the other hand, it might be that users of your class would say: "Well, I know that getting user groups will take some heavy calculations. It will take some time, but my thread will be very busy". In that case, they won't expect an async method.
Assuming that you have a non-async method to get the user groups:
string[] GetUserGroups(string samAccountName) {...}
The async method would be very simple:
Task<string[]> GetUserGroupsAsync(string samAccountName)
{
    return Task.Run(() => GetUserGroups(samAccountName));
}
The only thing you would have to decide is: do the users of my class expect this method?
Advantages and Disadvantages
Disadvantage of having a Sync and an Async method:
People who learn about your class have to learn about more methods
Users of your class can't decide how the async method calls the sync one, without creating an extra async method, which will only add to the confusion
You'll have to add an extra unit test
You'll have to maintain the async method forever.
Advantages of having an async method:
If in future a user group would be fetched from another process, for instance a database, or an XML file, or maybe the internet, then you can internally change the class, without having to change the many, many users (after all, all your classes are very popular, aren't they :)
Conclusion
If people look at your class, and they wouldn't even think that fetching user groups would be an async method, then don't create it.
If you think that in future another process might provide the user groups, then it would be wise to prepare your users for this.

Using TPL to batch/de-parallelise separate invocations

Maybe the TPL isn't the right tool, but at least from one not particularly familiar with it, it seems like it ought to have what I'm looking for. I'm open to answers that don't use it though.
Given a method like this:
public Task Submit(IEnumerable<WorkItem> work)
This can execute an expensive async operation on a collection of items. Normally the caller batches up these items and submits as many as it can at once, and there's a fairly long delay between such batches, so it executes fairly efficiently.
However there are some occasions where no external batching happens and Submit gets called for a small number of items (typically only one) many times in quick succession, possibly even concurrently from separate threads.
What I'd like to do is to defer processing (while accumulating the arguments) until there has been a certain amount of time with no calls, and then execute the operation with the whole batch, in the originally specified order.
Or in other words, each time the method is called it should add its arguments to the list of pending items and then restart the delay from zero, such that a certain idle time is required before anything is processed.
I don't want a size limit on the batch (so I don't think BatchBlock is the right answer), I just want a delay/timeout. I'm certain that the calling pattern is such that there will be an idle period at some point.
I'm not sure whether it's better to defer even the first call, or if it should start the operation immediately and only defer subsequent calls if the operation is still in progress.
If it makes the problem easier, I'm ok with making Submit return void instead of a Task (ie. not being able to observe when it completes).
I'm sure I can muddle together something that works like this, but it seems like the sort of thing that ought to already exist somewhere. Can anyone point me in the right direction? (I'd prefer not to use non-core libraries, though.)
Ok, so for lack of finding anything suitable I ended up implementing something myself. Seems to do the trick. (I implemented it a bit more generically than shown here in my actual code, so I could reuse it more easily, but this illustrates the concept.)
private readonly ConcurrentQueue<WorkItem> _Items
    = new ConcurrentQueue<WorkItem>();
private CancellationTokenSource _CancelSource;

public async Task Submit(IEnumerable<WorkItem> items)
{
    var cancel = ReplacePreviousTasks();
    foreach (var item in items)
    {
        _Items.Enqueue(item);
    }
    await Task.Delay(TimeSpan.FromMilliseconds(250), cancel.Token);
    if (!cancel.IsCancellationRequested)
    {
        await RunOperation();
    }
}

private CancellationTokenSource ReplacePreviousTasks()
{
    var cancel = new CancellationTokenSource();
    var old = Interlocked.Exchange(ref _CancelSource, cancel);
    if (old != null)
    {
        old.Cancel();
    }
    return cancel;
}

private async Task RunOperation()
{
    var items = new List<WorkItem>();
    WorkItem item;
    while (_Items.TryDequeue(out item))
    {
        items.Add(item);
    }
    // do the operation on items
}
If multiple submissions occur within 250ms, the earlier ones are cancelled, and the operation executes once on all of the items after the 250ms is up (counting from the latest submit).
If another submit occurs while the operation is running, it will continue to run without cancelling (there's a tiny chance it will steal some of the items from the later call, but that's ok).
(Technically checking cancel.IsCancellationRequested isn't really necessary, since the await above will throw an exception if it was cancelled during the delay. But it doesn't hurt, and there is a tiny window it might catch.)
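For anyone who wants to try the idea in isolation, here is a runnable console sketch of the same debounce. It is a variation rather than my production code: it uses int items, a 100 ms idle time, and it catches the cancellation so that superseded Submit calls complete quietly instead of throwing. The batches list exists only to record what ran.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

var pending = new ConcurrentQueue<int>();
var batches = new List<int[]>();              // records each executed batch
var cancelSource = new CancellationTokenSource();
TimeSpan idle = TimeSpan.FromMilliseconds(100);

// Swap in a fresh token source and cancel the previous submitter's delay.
CancellationTokenSource ReplacePrevious()
{
    var fresh = new CancellationTokenSource();
    CancellationTokenSource old = Interlocked.Exchange(ref cancelSource, fresh);
    old.Cancel();
    return fresh;
}

async Task Submit(IEnumerable<int> items)
{
    CancellationTokenSource mine = ReplacePrevious();
    foreach (int item in items)
    {
        pending.Enqueue(item);
    }
    try
    {
        await Task.Delay(idle, mine.Token);
    }
    catch (OperationCanceledException)
    {
        return; // a later Submit took over; it will process our items
    }
    // Drain everything queued so far and run the operation once.
    var batch = new List<int>();
    while (pending.TryDequeue(out int next))
    {
        batch.Add(next);
    }
    lock (batches)
    {
        batches.Add(batch.ToArray());
    }
}

// Three rapid calls followed by an idle period should yield a single batch.
Task first = Submit(new[] { 1 });
Task second = Submit(new[] { 2 });
Task third = Submit(new[] { 3, 4 });
await Task.WhenAll(first, second, third);

Console.WriteLine($"{batches.Count} batch(es); first has {batches[0].Length} items");
```

One consequence of swallowing the cancellation is that the earlier callers' tasks complete before their items have actually been processed, so only the last caller can observe completion of the batch.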

locking in Parallel Linq when updating UI elements

I'm using Parallel LINQ to load a list of links from a text file. I check whether each line is a valid link (Uri), and if it is, it is added to a ListBox. I'm just wondering whether I should lock ListBox.Items while adding a link to it.
Here is my code.
if (openFile.ShowDialog() == DialogResult.OK)
{
    File.ReadLines(openFile.FileName).AsParallel().AsOrdered().ForAll(x =>
    {
        if (x.IsValidUri())
        {
            //lock(siteList.Items) <-should I?
            siteList.Invoke(new Action<string>(s => siteList.Items.Add(s)), x);
        }
    });
}
There is no need to lock in this case. Using Invoke() already forces all changes to the Items collection to occur synchronously on the GUI thread.
Because of that though, you're not really gaining anything by using AsParallel(). You may want to consider using BeginInvoke() instead, which may speed things up a bit. That way, the calling thread isn't waiting for the invoke to complete.
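A different option, offered here as an alternative rather than as what the answer above proposes, is to keep the parallel part to validation only and touch the list once at the end. The sketch below is console-only: Uri.TryCreate stands in for the asker's IsValidUri() extension (whose real rules are unknown), and a plain List stands in for siteList.Items.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for the asker's IsValidUri() extension: absolute http/https only.
bool IsValidUri(string s) =>
    Uri.TryCreate(s, UriKind.Absolute, out var parsed)
    && (parsed.Scheme == Uri.UriSchemeHttp || parsed.Scheme == Uri.UriSchemeHttps);

string[] lines =
{
    "http://example.com/a",
    "not a link",
    "https://example.org/b",
    "",
    "http://example.net/c",
};

// Validate in parallel; AsOrdered() keeps the surviving lines in file order.
List<string> valid = lines.AsParallel().AsOrdered().Where(IsValidUri).ToList();

// In the WinForms app this would be a single UI-thread call, for example
// siteList.Items.AddRange(valid.ToArray()), instead of one Invoke per line.
var siteListItems = new List<string>(valid);

Console.WriteLine(string.Join(Environment.NewLine, siteListItems));
```

Because nothing touches the control until the query has finished, no locking and no per-line Invoke is needed, and the order of the file is preserved.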

How to use Table.ExecuteQuerySegmentedAsync() with Azure Table Storage

Working with the Azure Storage Client library 2.1, I'm working on making a query of Table storage async. I created this code:
public async Task<List<TAzureTableEntity>> GetByPartitionKey(string partitionKey)
{
    var theQuery = _table.CreateQuery<TAzureTableEntity>()
        .Where(tEnt => tEnt.PartitionKey == partitionKey);
    TableQuerySegment<TAzureTableEntity> querySegment = null;
    var returnList = new List<TAzureTableEntity>();
    while (querySegment == null || querySegment.ContinuationToken != null)
    {
        querySegment = await theQuery.AsTableQuery()
            .ExecuteSegmentedAsync(querySegment != null ?
                querySegment.ContinuationToken : null);
        returnList.AddRange(querySegment);
    }
    return returnList;
}
Let's assume there is a large set of data coming back so there will be a lot of round trips to Table Storage. The problem I have is that we're awaiting a set of data, adding it to an in-memory list, awaiting more data, adding it to the same list, awaiting yet more data, adding it to the list... and so on and so forth. Why not just wrap a Task.Factory.StartNew() around a regular TableQuery? Like so:
public async Task<List<TAzureTableEntity>> GetByPartitionKey(string partitionKey)
{
    var returnList = await Task.Factory.StartNew(() =>
        _table.CreateQuery<TAzureTableEntity>()
            .Where(ent => ent.PartitionKey == partitionKey)
            .ToList());
    return returnList;
}
Doing it this way seems like we're not bouncing the SynchronizationContext back and forth so much. Or does it really matter?
Edit to Rephrase Question
What's the difference between the two scenarios mentioned above?
The difference between the two is that your second version will block a ThreadPool thread for the whole time the query is executing. This might be acceptable in a GUI application (where all you want is to execute the code somewhere other than the UI thread), but it will negate any scalability advantages of async in a server application.
Also, if you don't want your first version to return to the UI context for each roundtrip (which is a reasonable requirement), then use ConfigureAwait(false) whenever you use await:
querySegment = await theQuery.AsTableQuery()
    .ExecuteSegmentedAsync(…)
    .ConfigureAwait(false);
This way, all iterations after the first one will (most likely) execute on a ThreadPool thread and not on the UI context.
BTW, in your second version, you don't actually need await at all, you could just directly return the Task:
public Task<List<TAzureTableEntity>> GetByPartitionKey(string partitionKey)
{
    return Task.Run(() => _table.CreateQuery<TAzureTableEntity>()
        .Where(ent => ent.PartitionKey == partitionKey)
        .ToList());
}
Not sure if this is the answer you're looking for but I still want to mention it :).
As you may already know, the 2nd method (using Task) handles continuation tokens internally and returns only when all entities have been fetched, whereas the 1st method fetches one set of entities (up to a maximum of 1000) per call and gives you the result set along with a continuation token.
If you're interested in fetching all entities from a table, both methods can be used. However, the 1st one gives you the flexibility to break out of the loop gracefully at any time, which you don't get with the 2nd. So with the 1st method you could essentially introduce pagination.
Let's assume you're building a web application that shows data from a table, and that the table contains a large number of entities (say 100,000). Using the 1st method, you can fetch 1000 entities, return the result to the user, and, if the user wants, fetch the next set of 1000 entities and show those. You can keep doing that for as long as the user wants and there is data in the table. With the 2nd method the user would have to wait until all 100,000 entities are fetched from the table.
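The paging idea can be sketched without any storage account. Everything below is an in-memory stand-in, not the Azure Storage API: FetchSegmentAsync is a hypothetical method that returns one page plus a continuation token (here simply the offset of the next page), and the loop stops early once it has enough rows.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// In-memory stand-in for a table with 2,500 rows; not the Azure Storage API.
List<int> table = Enumerable.Range(1, 2500).ToList();
const int segmentSize = 1000;

// Hypothetical segmented fetch: returns one page plus a continuation token
// (the offset of the next page), or a null token when there is no more data.
async Task<(List<int> Rows, int? Continuation)> FetchSegmentAsync(int? token)
{
    await Task.Delay(10); // simulate one round trip
    int start = token ?? 0;
    List<int> rows = table.Skip(start).Take(segmentSize).ToList();
    int next = start + rows.Count;
    return (rows, next < table.Count ? next : (int?)null);
}

// Fetch page by page, stopping early once enough rows have been collected.
var collected = new List<int>();
int? continuation = null;
int roundTrips = 0;
do
{
    var segment = await FetchSegmentAsync(continuation);
    collected.AddRange(segment.Rows);
    continuation = segment.Continuation;
    roundTrips++;
} while (continuation != null && collected.Count < 2000);

Console.WriteLine($"{collected.Count} rows in {roundTrips} round trips; more data: {continuation != null}");
```

The leftover continuation token is what a web page would hand back to the server to request the next page later.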

How can a user cancel a long running query? [duplicate]

I'm in the process of writing a query manager for a WinForms application that, among other things, needs to be able to deliver real-time search results to the user as they're entering a query (think Google's live results, though obviously in a thick client environment rather than the web). Since the results need to start arriving as the user types, the search will get more and more specific, so I'd like to be able to cancel a query if it's still executing while the user has entered more specific information (since the results would simply be discarded, anyway).
If this were ordinary ADO.NET, I could obviously just use the DbCommand.Cancel function and be done with it, but we're using EF4 for our data access and there doesn't appear to be an obvious way to cancel a query. Additionally, opening System.Data.Entity in Reflector and looking at EntityCommand.Cancel shows a discouragingly empty method body, despite the docs claiming that calling this would pass it on to the provider command's corresponding Cancel function.
I have considered simply letting the existing query run and spinning up a new context to execute the new search (and just disposing of the existing query once it finishes), but I don't like the idea of a single client having a multitude of open database connections running parallel queries when I'm only interested in the results of the most recent one.
All of this is leading me to believe that there's simply no way to cancel an EF query once it's been dispatched to the database, but I'm hoping that someone here might be able to point out something I've overlooked.
TL;DR version: Is it possible to cancel an EF4 query that's currently executing?
It looks like you have found a bug in EF, but when you report it to MS it will probably be treated as a bug in the documentation. Anyway, I don't like the idea of interacting directly with EntityCommand. Here is my example of how to kill the current query:
var thread = new Thread((param) =>
{
    var currentString = param as string;
    if (currentString == null)
    {
        // TODO OMG exception
        throw new Exception();
    }
    AdventureWorks2008R2Entities entities = null;
    try // Don't use using because it can cause race condition
    {
        entities = new AdventureWorks2008R2Entities();
        ObjectQuery<Person> query = entities.People
            .Include("Password")
            .Include("PersonPhone")
            .Include("EmailAddress")
            .Include("BusinessEntity")
            .Include("BusinessEntityContact");
        // Improves performance of readonly query where
        // objects do not have to be tracked by context
        // Edit: But it doesn't work for this query because of includes
        // query.MergeOption = MergeOption.NoTracking;
        foreach (var record in query
            .Where(p => p.LastName.StartsWith(currentString)))
        {
            // TODO fill some buffer and invoke UI update
        }
    }
    finally
    {
        if (entities != null)
        {
            entities.Dispose();
        }
    }
});
thread.Start("P");
// Just for test
Thread.Sleep(500);
// Note: Thread.Abort works on .NET Framework only; on .NET Core / .NET 5+
// it throws PlatformNotSupportedException.
thread.Abort();
This is the result of about 30 minutes of playing with it, so it probably shouldn't be considered a final solution. I'm posting it to at least get some feedback on possible problems with this approach. The main points are:
Context is handled inside the thread
Result is not tracked by context
If you kill the thread query is terminated and context is disposed (connection released)
If you kill the thread before you start a new one, you should still be using only one connection.
I checked that query is started and terminated in SQL profiler.
Edit:
Btw. another approach to simply stop current query is inside enumeration:
public IEnumerable<T> ExecuteQuery<T>(IQueryable<T> query)
{
    foreach (T record in query)
    {
        // Handle stop condition somehow
        if (ShouldStop())
        {
            // Once you close enumerator, query is terminated
            yield break;
        }
        yield return record;
    }
}
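One possible way to fill in the stop condition is a CancellationToken. The sketch below is an assumption-laden illustration, not tested against EF: Source() is an in-memory lazy sequence standing in for the streaming query, and the produced counter only shows how far the source was actually read.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

int produced = 0;

// Lazy stand-in for a streaming query: counts how many records were read.
IEnumerable<int> Source()
{
    for (int i = 1; i <= 1000; i++)
    {
        produced++;
        yield return i;
    }
}

// Same shape as ExecuteQuery above, with a CancellationToken as the stop condition.
IEnumerable<T> ExecuteQuery<T>(IEnumerable<T> query, CancellationToken token)
{
    foreach (T record in query)
    {
        if (token.IsCancellationRequested)
        {
            yield break; // leaving the loop disposes the underlying enumerator
        }
        yield return record;
    }
}

using var cts = new CancellationTokenSource();
var results = new List<int>();
foreach (int record in ExecuteQuery(Source(), cts.Token))
{
    results.Add(record);
    if (results.Count == 5)
    {
        cts.Cancel(); // e.g. the user typed another character
    }
}

Console.WriteLine($"consumed {results.Count}, read from source {produced}");
```

The source is read only a little past the point of cancellation instead of to the end, which is the effect the enumeration approach relies on.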
