I have the following code which takes a long time to execute. Is there an alternative, using LINQ or any other approach, that I can replace the code below with to increase performance?
var usedLists = new HashSet<long>();
foreach (var test in Tests)
{
    var requiredLists = this.GetLists(test.test, selectedTag);
    foreach (var List in requiredLists)
    {
        if (!usedLists.Contains(List.Id))
        {
            usedLists.Add(List.Id);
            var toRecipients = List.RecepientsTo;
            var ccRecipients = List.RecipientsCC;
            var bccRecipients = List.RecipientsBCC;
            var replyTo = new List<string>() { List.ReplyTo };
            var mailMode = isPreviewMode ? MailMode.Display : MailMode.Drafts;
            OutlookModel.Instance.CreateEmail(toRecipients, ccRecipients, bccRecipients, this.Draft, mailMode, replyTo);
        }
    }
}
What are you actually doing?
Looking at
foreach (var test in Tests)
{
    var requiredLists = this.GetLists(test.test, selectedTag);
    foreach (var List in requiredLists)
    {
        if (!usedLists.Contains(List.Id))
it looks to me like you are trying to get unique "List"s (with all those "var"s I cannot tell the actual type). So you could replace that with a new function (to be written by you) and do
var uniqueLists = this.GetUniqueLists(Tests, selectedTag);
and finally call
this.SendMails(uniqueLists);
which would iterate through that list and send the mails.
If that could speed up the code or not, depends severely on the underlying GetLists / GetUniqueLists functions.
But in any case, it would be a great improvement to your code: it becomes readable and testable.
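A minimal sketch of what those two helpers could look like, keeping the HashSet-based de-duplication from the question. "MailingList" and "Test" are placeholder type names here, since the actual types aren't shown:
private IEnumerable<MailingList> GetUniqueLists(IEnumerable<Test> tests, string selectedTag)
{
    var seen = new HashSet<long>();
    foreach (var test in tests)
    {
        foreach (var list in this.GetLists(test.test, selectedTag))
        {
            if (seen.Add(list.Id))   // Add returns false when the Id was already seen
                yield return list;
        }
    }
}

private void SendMails(IEnumerable<MailingList> uniqueLists)
{
    foreach (var list in uniqueLists)
    {
        var replyTo = new List<string> { list.ReplyTo };
        var mailMode = isPreviewMode ? MailMode.Display : MailMode.Drafts;
        OutlookModel.Instance.CreateEmail(list.RecepientsTo, list.RecipientsCC, list.RecipientsBCC,
                                          this.Draft, mailMode, replyTo);
    }
}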
Assuming the result from GetLists is large/slow enough in comparison to the CreateEmail call, skipping your Contains/Add HashSet bookkeeping will speed things up. Try a call to Distinct([Your ListComparer]) or GroupBy():
var requiredLists =
    Tests.SelectMany(test => this.GetLists(test.test, selectedTag))
         .Distinct([Your ListComparer]);

// or

var requiredLists =
    Tests.SelectMany(test => this.GetLists(test.test, selectedTag))
         .GroupBy(x => x.Id)
         .Select(g => g.First());
foreach (var List in requiredLists)
{
    // (...)
    OutlookModel.Instance.CreateEmail(toRecipients, ccRecipients, bccRecipients, this.Draft, mailMode, replyTo);
}
You could also try PLINQ to speed things up, but I think you might run into trouble with the CreateEmail call, as it's probably a COM object, so multithreading will be a hassle. If GetLists is slow enough to be worth the multithreading overhead, you may experiment with it in the first call when creating requiredLists.
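A minimal sketch of that experiment, assuming GetLists is safe to call from multiple threads: only the GetLists calls are parallelised, the results are de-duplicated and materialised, and CreateEmail still runs sequentially on the calling thread.
var requiredLists =
    Tests.AsParallel()
         .SelectMany(test => this.GetLists(test.test, selectedTag))
         .GroupBy(x => x.Id)
         .Select(g => g.First())
         .ToList();   // materialise before the sequential CreateEmail loop

foreach (var List in requiredLists)
{
    // (...) build recipients and call CreateEmail as before
}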
It might be possible to do this with the BackgroundWorker class or something similar.
A very rough outline would be to wrap the create email arguments into a separate class, then asynchronously start a background task while the loop continues processing the list.
Edit
The Task class might be of use too. This is assuming the CreateEmail function is the one that's taking up all the time. If it's the loop that's taking all the time then there's no point in going this route though.
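A rough sketch of that idea using Task (requires System.Threading.Tasks), assuming CreateEmail can safely be called from a background thread, which, as noted above, is doubtful for an Outlook COM object, so treat this as an experiment. The arguments are captured in locals rather than a wrapper class, which amounts to the same thing:
var tasks = new List<Task>();
foreach (var List in requiredLists)
{
    // Capture the arguments so each task gets its own copy.
    var toRecipients = List.RecepientsTo;
    var ccRecipients = List.RecipientsCC;
    var bccRecipients = List.RecipientsBCC;
    var replyTo = new List<string> { List.ReplyTo };
    var mailMode = isPreviewMode ? MailMode.Display : MailMode.Drafts;

    // Start the slow call in the background while the loop keeps going.
    tasks.Add(Task.Run(() =>
        OutlookModel.Instance.CreateEmail(toRecipients, ccRecipients, bccRecipients, this.Draft, mailMode, replyTo)));
}

// Wait for all outstanding emails before continuing.
Task.WaitAll(tasks.ToArray());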
Edit
Looking at the algorithmic complexity, the GetLists() calls run in O(n) time, but the CreateEmail() calls run in O(n^2) time, so - all other things being equal - the code block containing the CreateEmail() call would be a better candidate to optimise first.
Each foreach loop creates an Enumerator object, so using a regular for-loop might speed it up a little, but this might be a very minor improvement.
Like stormenet said, the CreateEmail function could be the bottleneck. You might want to make that asynchronous.
Related
I have the working foreach as follows.
List<Thing> things = new List<Thing>();
foreach (Original original in originals)
    things.Add(new Thing(original));
Then, I got smart and LINQified it into the following, still working code.
IEnumerable<Thing> things = originals.Select(origo => new Thing(origo));
Feeling really proud of decimating the number of lines and LINQing myself to clearer code, I realized that there's also the requirement to update the process table. It's imperative that the update occurs simultaneously as we proceed through the transformation. So, with my tail between my legs and feeling much less pride, I went back to the original code and added the notification method.
List<Thing> things = new List<Thing>();
foreach (Original original in originals)
{
    things.Add(new Thing(original));
    DoTell();
}
My question is whether it's possible to keep the LINQy syntax and still be able to incorporate the teller somehow. I'm not hopeful that the code will look nicer or be more readable that way (although it'd be awesome if it was). However, now it's a matter of academic curiosity and pure stubbornness - I'd like to know if it can be done at all.
There is a really ugly way to do this. It is purely academic.
IEnumerable<Thing> things = originals.Select(origo => { DoTell(); return new Thing(origo); });
You will update your table and return a new Thing in the Select. The Select won't fail because you are using braces in it, telling it that you are returning only the new Thing and not the result of DoTell().
I suggest you go with foreach because it will really look cleaner in the end. Using LINQ everywhere is not a solution for cleaner/clearer code.
EDIT:
If you still want to go with LINQ, you can do an even uglier version of it (a one-liner):
var things = originals.Select(x => new { o = new Thing(x), b = DoTell() }).Select(x => x.o);
This approach is pure evil. However, it works without a return statement (note it only compiles if DoTell() returns a value; a void method can't be assigned to an anonymous-type property) :)
This should work, using the original Select syntax:
IEnumerable<Thing> things = originals.Select(origo =>
{
DoTell();
return new Thing(origo);
});
In case DoTell() needs to be a post call and it needs to use the new Thing created for some internal usage, which is not apparent in this code, then do the following:
IEnumerable<Thing> things = originals.Select(origo =>
{
var thing = new Thing(origo);
DoTell();
return thing;
});
originals.ForEach(q =>
{
//do first thing
//do second thing
});
originals.ToList().ForEach(x => { things.Add(new Thing(x)); DoTell(); });
Additional explanation: ForEach's return type is void, whereas Select's return type is IEnumerable<T>.
So the usage of each depends on context.
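A small illustration of that difference, reusing the question's originals/Thing/DoTell names: ForEach is only useful for its side effects, while Select produces a sequence you can keep working with.
// List<T>.ForEach returns void - useful only for side effects.
originals.ToList().ForEach(o => DoTell());

// Select returns an IEnumerable<Thing> that can be consumed further.
IEnumerable<Thing> things = originals.Select(o => new Thing(o));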
I have about 100 items (allRights) in the database and about 10 ids to be searched (inputRightsIds). Which one is better: first getting all rights and then searching the items (Variant 1), or making 10 checking requests to the database (Variant 2)?
Here is some example code:
DbContext db = new DbContext();
int[] inputRightsIds = new int[10]{...};
Variant 1
var allRights = db.Rights.ToList();
foreach (var right in allRights)
{
    for (int i = 0; i < inputRightsIds.Length; i++)
    {
        if (inputRightsIds[i] == right.Id)
        {
            // Do something
        }
    }
}
Variant 2
for (int i = 0; i < inputRightsIds.Length; i++)
{
    if (db.Rights.Any(r => r.Id == inputRightsIds[i]))
    {
        // Do something
    }
}
Thanks in advance!
As others have already stated, you should do the following.
var matchingIds = from r in db.Rights
                  where inputRightsIds.Contains(r.Id)
                  select r.Id;
foreach(var id in matchingIds)
{
// Do something
}
But this is different from both of your approaches. In your first approach you are making one SQL call to the DB that returns more results than you are interested in. The second is making multiple SQL calls, returning part of the information you want with each call. The query above will make one SQL call to the DB and return only the data you are interested in. This is the best approach as it reduces the two bottlenecks of making multiple calls to the DB and having too much data returned.
You can use the following:
db.Rights.Where(right => inputRightsIds.Contains(right.Id));
They should run at very similar speeds since both must enumerate the arrays the same number of times. There might be subtle differences in speed between the two depending on the input data, but in general I would go with Variant 2. I think you should almost always prefer LINQ over manual enumeration when possible. Also, consider using the following LINQ statement to simplify the whole search to a single line.
var matches = db.Rights.Where(r => inputRightsIds.Contains(r.Id));
...//Do stuff with matches
Don't forget to pull all your items into memory if you need to process the list further:
var itemsFromDatabase = db.Rights.Where(r => inputRightsIds.Contains(r.Id)).ToList();
Or you could even enumerate through the collection and do some stuff on each item:
db.Rights.Where(r => inputRightsIds.Contains(r.Id)).ToList().ForEach(item =>
{
    // your code here
});
Current Code:
For each element in the MapEntryTable, check the properties IsDisplayedColumn and IsReturnColumn, and if they are true then add the element to the corresponding list. Its running time is O(n). There are many elements with both properties false, so they will not get added to any of the lists in the loop.
foreach (var mapEntry in MapEntryTable)
{
    if (mapEntry.IsDisplayedColumn)
        Type1.DisplayColumnId.Add(mapEntry.OutputColumnId);

    if (mapEntry.IsReturnColumn)
        Type1.ReturnColumnId.Add(mapEntry.OutputColumnId);
}
The following is the LINQ version of doing the same:
MapEntryTable.Where(x => x.IsDisplayedColumn == true).ToList().ForEach(mapEntry => Type1.DisplayColumnId.Add(mapEntry.OutputColumnId));
MapEntryTable.Where(x => x.IsReturnColumn == true).ToList().ForEach(mapEntry => Type1.ReturnColumnId.Add(mapEntry.OutputColumnId));
I am converting all such foreach code to LINQ as I am learning it, but my questions are:
Do I get any advantage from the LINQ conversion in this case, or is it a disadvantage?
Is there a better way to do the same using LINQ?
UPDATE:
Consider the condition where, out of 1000 elements in the list, 80% have both properties false; does Where then provide the benefit of quickly finding elements with a given condition?
Type1 is a custom type with a set of List<int> members, DisplayColumnId and ReturnColumnId.
ForEach isn't a LINQ method. It's a method of List<T>. And not only is it not part of LINQ, it's very much against the values and patterns of LINQ. Eric Lippert explains this in a blog post written when he was a principal developer on the C# compiler team.
Your "LINQ" approach also:
Completely unnecessarily copies all of the items to be added into a list, which is both wasteful in time and memory and also conflicts with LINQ's goals of deferred execution when executing queries.
Isn't actually a query with the exception of the Where operator. You're acting on the items in the query, rather than performing a query. LINQ is a querying tool, not a tool for manipulating data sets.
Iterates the source sequence twice. This may or may not be a problem, depending on what the source sequence actually is and what the costs of iterating it are.
A solution that uses LINQ as much as it is designed for would be to use it like so:
foreach (var mapEntry in MapEntryTable.Where(entry => entry.IsDisplayedColumn))
    list1.DisplayColumnId.Add(mapEntry.OutputColumnId);

foreach (var mapEntry in MapEntryTable.Where(entry => entry.IsReturnColumn))
    list2.ReturnColumnId.Add(mapEntry.OutputColumnId);
I would say stick with the original way with the foreach loop, since you only iterate through the list once.
Also, your LINQ should look more like this:
list1.DisplayColumnId.AddRange(MapEntryTable.Where(x => x.IsDisplayedColumn).Select(mapEntry => mapEntry.OutputColumnId));
list2.ReturnColumnId.AddRange(MapEntryTable.Where(x => x.IsReturnColumn).Select(mapEntry => mapEntry.OutputColumnId));
The performance of foreach vs LINQ's ForEach is almost exactly the same, within nanoseconds of each other, assuming you have the same internal logic in the loop in both versions when testing.
However, a for loop outperforms both by a LARGE margin. for (int i = 0; i < count; ++i) is much faster than both, because a for loop doesn't rely on an IEnumerable implementation (overhead). The for loop compiles to x86 register index/jump code. It maintains an incrementer, and then it's up to you to retrieve the item by its index in the loop.
Using a LINQ ForEach loop, though, does have a big disadvantage: you cannot break out of the loop. If you need to do that, you have to maintain a boolean like "breakLoop = false", set it to true, and have each iteration exit early if breakLoop is true... bad performance there. Secondly, you cannot use continue; instead you use "return".
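A hedged sketch of that "breakLoop" workaround (SomeStopCondition is a hypothetical predicate), mainly to show how clumsy it is compared to a plain foreach with break:
bool breakLoop = false;
things.ForEach(thing =>
{
    if (breakLoop) return;            // stands in for "break", but later items are still visited
    if (SomeStopCondition(thing))     // hypothetical predicate
    {
        breakLoop = true;
        return;
    }
    // do something with thing
});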
I never use LINQ's ForEach loop.
If you are dealing with LINQ, e.g.
List<Thing> things = .....;
var oldThings = things.Where(p => p.DateTime.Year < DateTime.Now.Year);
That internally will foreach with LINQ and give you back only the items with a year less than the current year. Cool.
But if I am doing this:
List<Thing> things = new List<Thing>();
foreach (XElement node in Results) {
    things.Add(new Thing(node));
}
I don't need to use a LINQ ForEach loop. Even if I did...
foreach (var node in thingNodes.Where(p => p.NodeType == "Thing")) {
    if (node.Ignore) {
        continue;
    }
    things.Add(new Thing(node));
}
even though I could write that cleaner like
foreach (var node in thingNodes.Where(p => p.NodeType == "Thing" && !p.Ignore)) {
    things.Add(new Thing(node));
}
There is no real reason I can think of to do this:
things.ForEach(thing => {
//do something
//can't break
//can't continue
return; //<- continue
});
And if I want the fastest loop possible,
for (int i = 0; i < things.Count; ++i) {
var thing = things[i];
//do something
}
Will be faster.
Your LINQ isn't quite right as you're converting the results of Where to a List and then pseudo-iterating over those results with ForEach to add to another list. Use ToList or AddRange for converting or adding sequences to lists.
Example, overwriting list1 (if it were actually a List<T>):
list1 = MapEntryTable.Where(x => x.IsDisplayedColumn == true)
.Select(mapEntry => mapEntry.OutputColumnId).ToList();
or to append:
list1.AddRange(MapEntryTable.Where(x => x.IsDisplayedColumn == true)
.Select(mapEntry => mapEntry.OutputColumnId));
In C#, to do what you want functionally in one call, you have to write your own partition method. If you are open to using F#, you can use List.Partition<'T>
https://msdn.microsoft.com/en-us/library/ee353782.aspx
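A minimal sketch of such a partition helper in C# (the name Partition and its shape are one possible design, not a standard API). Note that for the question's case an entry can be both a display and a return column, so a single two-way partition doesn't fully replace the two checks:
static class EnumerableExtensions
{
    // Splits a sequence into the items that match the predicate and the rest,
    // iterating the source only once.
    public static Tuple<List<T>, List<T>> Partition<T>(this IEnumerable<T> source, Func<T, bool> predicate)
    {
        var matches = new List<T>();
        var rest = new List<T>();
        foreach (var item in source)
        {
            if (predicate(item))
                matches.Add(item);
            else
                rest.Add(item);
        }
        return Tuple.Create(matches, rest);
    }
}

// Usage:
var split = MapEntryTable.Partition(x => x.IsDisplayedColumn);
// split.Item1 holds the displayed columns, split.Item2 the rest.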
Update 2011-05-20 12:49AM: The foreach is still 25% faster than the parallel solution for my application. And don't use the collection count for max parallelism; use something closer to the number of cores on your machine.
I have an IO bound task that I would like to run in parallel. I want to apply the same operation to every file in a folder. Internally, the operation results in a Dispatcher.Invoke that adds the computed file info to a collection on the UI thread. So, in a sense, the work result is a side effect of the method call, not a value returned directly from the method call.
This is the core loop that I want to run in parallel
foreach (ShellObject sf in sfcoll)
    ProcessShellObject(sf, curExeName);
The context for this loop is here:
var curExeName = Path.GetFileName(Assembly.GetEntryAssembly().Location);

using (ShellFileSystemFolder sfcoll = ShellFileSystemFolder.FromFolderPath(_rootPath))
{
    // This works, but is not parallel.
    foreach (ShellObject sf in sfcoll)
        ProcessShellObject(sf, curExeName);

    // This doesn't work.
    // My attempt at PLINQ. This code never calls method ProcessShellObject.
    var query = from sf in sfcoll.AsParallel().WithDegreeOfParallelism(sfcoll.Count())
                let p = ProcessShellObject(sf, curExeName)
                select p;
}
private String ProcessShellObject(ShellObject sf, string curExeName)
{
    String unusedReturnValueName = sf.ParsingName;
    try
    {
        DesktopItem di = new DesktopItem(sf);
        // Update DesktopItem stuff
        di.PropertyChanged += new PropertyChangedEventHandler(DesktopItem_PropertyChanged);
        ControlWindowHelper.MainWindow.Dispatcher.Invoke(
            (Action)(() => _desktopItemCollection.Add(di)));
    }
    catch (Exception ex)
    {
    }
    return unusedReturnValueName;
}
Thanks for any help!
+tom
EDIT: Regarding the update to your question. I hadn't spotted that the task was IO-bound - and presumably all the files are from a single (traditional?) disk. Yes, that would go slower - because you're introducing contention in a non-parallelizable resource, forcing the disk to seek all over the place.
IO-bound tasks can still be parallelized effectively sometimes - but it depends on whether the resource itself is parallelizable. For example, an SSD (which has much smaller seek times) may completely change the characteristics you're seeing - or if you're fetching over the network from several individually-slow servers, you could be IO-bound but not on a single channel.
You've created a query, but never used it. The simplest way of forcing the query to be evaluated would be to call Count() or ToList(), or something similar. However, a better approach would be to use Parallel.ForEach:
var options = new ParallelOptions { MaxDegreeOfParallelism = sfcoll.Count() };
Parallel.ForEach(sfcoll, options, sf => ProcessShellObject(sf, curExeName));
I'm not sure that setting the max degree of parallelism like that is the right approach though. It may work, but I'm not sure. A different way of approaching this would be to start all the operations as tasks, specifying TaskCreationOptions.LongRunning.
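A rough sketch of that alternative, assuming ProcessShellObject is safe to call concurrently (the Dispatcher.Invoke inside it already marshals the collection update to the UI thread); note this blocks the calling thread until all tasks finish:
var tasks = sfcoll
    .Cast<ShellObject>()
    .Select(sf => Task.Factory.StartNew(() => ProcessShellObject(sf, curExeName),
                                        TaskCreationOptions.LongRunning))
    .ToArray();

Task.WaitAll(tasks);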
Your query object created via LINQ is an IEnumerable. It gets evaluated only if you enumerate it (e.g. via a foreach loop):
var query = from sf in sfcoll.AsParallel().WithDegreeOfParallelism(sfcoll.Count())
            let p = ProcessShellObject(sf, curExeName)
            select p;

foreach (var q in query)
{
    // ....
}

// or:
var results = query.ToArray(); // also enumerates the query
You should add a line at the end:
var results = query.ToList();
I am trying to write a utility to see if a user has logged in to Windows since a date that I have stored in a database.
private void bwFindDates_DoWork(object sender, DoWorkEventArgs e)
{
    UserPrincipal u = new UserPrincipal(context);
    u.SamAccountName = "WebLogin*";
    PrincipalSearcher ps = new PrincipalSearcher(u);
    var result = ps.FindAll();

    foreach (WebAccess.WebLoginUsersRow usr in webAccess.WebLoginUsers)
    {
        UserPrincipal b = (UserPrincipal)result.
            Single((a) => a.SamAccountName == usr.WEBUSER);

        if (b.LastLogon.HasValue)
        {
            if (b.LastLogon.Value < usr.MODIFYDATE)
                usr.LastLogin = "Never";
            else
                usr.LastLogin = b.LastLogon.Value.ToShortDateString();
        }
        else
        {
            usr.LastLogin = "Never";
        }
    }
}
However, the performance is very slow. The user list I am pulling from has about 150 Windows users, so when I hit the line UserPrincipal b = (UserPrincipal)result.Single((a) => a.SamAccountName == usr.WEBUSER); it takes 10 to 15 seconds to complete per user (stepping through, I can see the check a.SamAccountName == usr.WEBUSER is run for every person, so the worst case is O(n^2)).
Any recommendations on ways to improve my efficiency?
I would suggest:
var result = ps.FindAll().ToList();
Since PrincipalSearchResult doesn't cache like other things, this will bring you down near an O(n) performance level.
It's surprising that Single() should take quite so long on such a small list. I have to believe something else is going on here. The call to ps.FindAll() may be returning an object that does not cache its results, and is forcing you to make an expensive call to some resource on each iteration within Single().
You may want to use a profiler to investigate where the time is going when you hit that line. I would also suggest looking at the implementation of FindAll() because it's returning something unusually expensive to iterate over.
So after reading your code a little more closely, it makes sense why Single() is so expensive. The PrincipalSearcher class uses the directory services store as the repository against which to search. It does not cache these results. That's what's affecting your performance.
You probably want to materialize the list using either ToList() or ToDictionary() so that accessing the principal information happens locally.
You could also avoid this kind of code entirely, and use the FindOne() method instead, which allows you to query directly for the principal you want.
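A rough sketch of the FindOne() route, using a query-by-example filter per row (note it still makes one directory query per user, so the dictionary approach below is usually the better fix):
foreach (WebAccess.WebLoginUsersRow usr in webAccess.WebLoginUsers)
{
    // Ask the directory for exactly this account instead of filtering FindAll() in memory.
    var filter = new UserPrincipal(context) { SamAccountName = usr.WEBUSER };
    using (var searcher = new PrincipalSearcher(filter))
    {
        var b = (UserPrincipal)searcher.FindOne();
        // ... same LastLogon handling as before
    }
}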
But if you can't use that, then something like this should work better:
result.ToDictionary(u => u.SamAccountName)[usr.WEBUSER]
Building that dictionary once, outside the loop, avoids rebuilding it for every row:
var userMap = result.ToDictionary(u => u.SamAccountName);
foreach (WebAccess.WebLoginUsersRow usr in webAccess.WebLoginUsers)
{
    UserPrincipal b = userMap[usr.WEBUSER];
    // ...
}