I am trying to write a utility to see if a user has logged in to Windows since a date that I have stored in a database.
private void bwFindDates_DoWork(object sender, DoWorkEventArgs e)
{
    UserPrincipal u = new UserPrincipal(context);
    u.SamAccountName = "WebLogin*";
    PrincipalSearcher ps = new PrincipalSearcher(u);
    var result = ps.FindAll();

    foreach (WebAccess.WebLoginUsersRow usr in webAccess.WebLoginUsers)
    {
        UserPrincipal b = (UserPrincipal)result
            .Single(a => a.SamAccountName == usr.WEBUSER);
        if (b.LastLogon.HasValue)
        {
            if (b.LastLogon.Value < usr.MODIFYDATE)
                usr.LastLogin = "Never";
            else
                usr.LastLogin = b.LastLogon.Value.ToShortDateString();
        }
        else
        {
            usr.LastLogin = "Never";
        }
    }
}
However, the performance is very slow. The user list I am pulling from has about 150 Windows users, so when I hit the line UserPrincipal b = (UserPrincipal)result.Single(a => a.SamAccountName == usr.WEBUSER); it takes 10 to 15 seconds to complete per user. Stepping through, I can see that the predicate a.SamAccountName == usr.WEBUSER is evaluated for every person, so the worst case is O(n^2) comparisons.
Any recommendations on ways to improve my efficiency?
I would suggest:
var result = ps.FindAll().ToList();
Since PrincipalSearchResult doesn't cache its results the way most collections do, materializing it once with ToList() will bring you down to near O(n) performance.
It's surprising that Single() should take quite so long on such a small list. I have to believe something else is going on here. The call to ps.FindAll() may be returning an object that does not cache its results, forcing an expensive call to some resource on each iteration within Single().
You may want to use a profiler to investigate where the time is going when you hit that line. I would also suggest looking at the implementation of FindAll(), because it seems to be returning something unusually expensive to iterate over.
So after reading your code a little more closely, it makes sense why Single() is so expensive. The PrincipalSearcher class uses the directory services store as the repository against which to search. It does not cache these results. That's what's affecting your performance.
You probably want to materialize the list using either ToList() or ToDictionary() so that accessing the principal information happens locally.
You could also avoid this kind of code entirely, and use the FindOne() method instead, which allows you to query directly for the principal you want.
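For example, something along these lines (a sketch, reusing the same context; it assumes WEBUSER holds the SAM account name):
var filter = new UserPrincipal(context);
filter.SamAccountName = usr.WEBUSER;
using (var searcher = new PrincipalSearcher(filter))
{
    // One directory query per user instead of scanning the full result set
    UserPrincipal b = (UserPrincipal)searcher.FindOne();
}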
But if you can't use that, then build the dictionary once, outside the loop, and index into it for each user:
var userMap = result.ToDictionary(u => u.SamAccountName);
foreach (WebAccess.WebLoginUsersRow usr in webAccess.WebLoginUsers)
{
    UserPrincipal b = (UserPrincipal)userMap[usr.WEBUSER];
    // ...
}
I have about 100 items (allRights) in the database and about 10 ids to be searched (inputRightsIds). Which is better: first fetching all rights and then searching through them in memory (Variant 1), or making 10 separate check queries against the database (Variant 2)?
Here is some example code:
DbContext db = new DbContext();
int[] inputRightsIds = new int[10]{...};
Variant 1
var allRights = db.Rights.ToList();
foreach (var right in allRights)
{
    for (int i = 0; i < inputRightsIds.Length; i++)
    {
        if (inputRightsIds[i] == right.Id)
        {
            // Do something
        }
    }
}
Variant 2
for (int i = 0; i < inputRightsIds.Length; i++)
{
    if (db.Rights.Any(r => r.Id == inputRightsIds[i]))
    {
        // Do something
    }
}
Thanks in advance!
As others have already stated, you should do the following.
var matchingIds = from r in db.Rights
                  where inputRightsIds.Contains(r.Id)
                  select r.Id;

foreach (var id in matchingIds)
{
    // Do something
}
But this is different from both of your approaches. In your first approach you are making one SQL call that returns more results than you are interested in. The second makes multiple SQL calls, each returning part of the information you want. The query above makes one SQL call and returns only the data you are interested in. This is the best approach, as it avoids both bottlenecks: making multiple round trips to the DB and returning too much data.
You can use the following:
db.Rights.Where(right => inputRightsIds.Contains(right.Id));
They should be very similar speeds since both must enumerate the arrays the same number of times. There might be subtle differences in speed between the two depending on the input data but in general I would go with Variant 2. I think you should almost always prefer LINQ over manual enumeration when possible. Also consider using the following LINQ statement to simplify the whole search to a single line.
var matches = db.Rights.Where(r => inputRightsIds.Contains(r.Id));
// ... Do stuff with matches
Don't forget to pull all your items into memory if you need to process the list further:
var itemsFromDatabase = db.Rights.Where(r => inputRightsIds.Contains(r.Id)).ToList();
Or you can enumerate the collection and act on each item:
db.Rights.Where(r => inputRightsIds.Contains(r.Id)).ToList().ForEach(item => {
    // your code here
});
I have the following code, which takes a long time to execute. Is there an alternative, using LINQ or any other way, that would improve the performance?
var usedLists = new HashSet<long>();
foreach (var test in Tests)
{
    var requiredLists = this.GetLists(test.test, selectedTag);
    foreach (var List in requiredLists)
    {
        if (!usedLists.Contains(List.Id))
        {
            usedLists.Add(List.Id);
            var toRecipients = List.RecepientsTo;
            var ccRecipients = List.RecipientsCC;
            var bccRecipients = List.RecipientsBCC;
            var replyTo = new List<string>() { List.ReplyTo };
            var mailMode = isPreviewMode ? MailMode.Display : MailMode.Drafts;
            OutlookModel.Instance.CreateEmail(toRecipients, ccRecipients, bccRecipients, this.Draft, mailMode, replyTo);
        }
    }
}
What are you actually trying to do?
Looking at
foreach (var test in Tests)
{
    var requiredLists = this.GetLists(test.test, selectedTag);
    foreach (var List in requiredLists)
    {
        if (!usedLists.Contains(List.Id))
it looks to me like you are trying to collect the unique "List"s (with all those "var"s I cannot tell the actual types). So you could replace all of that with a new function (to be written by you) and do
var uniqueLists = this.GetUniqueLists(Tests, selectedTag);
and finally call
this.Sendmails(uniqueLists);
which would iterate through that list and send the mails.
Whether that speeds up the code depends heavily on the underlying GetLists / GetUniqueLists functions.
But in any case, it would greatly improve your code: it becomes readable and testable.
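A minimal sketch of what GetUniqueLists might look like (TestItem and MailingList are placeholder type names, since the original code does not show the actual types):
private IEnumerable<MailingList> GetUniqueLists(IEnumerable<TestItem> tests, string selectedTag)
{
    var seenIds = new HashSet<long>();
    foreach (var test in tests)
    {
        foreach (var list in this.GetLists(test.test, selectedTag))
        {
            if (seenIds.Add(list.Id)) // Add returns false for an Id we have already seen
                yield return list;
        }
    }
}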
Assuming the result of GetLists is large/slow enough in comparison to the CreateEmail call, skipping your contains-add HashSet logic will speed things up. Try a call to Distinct([Your ListComparer]) or GroupBy():
var requiredLists =
    Tests.SelectMany(test => this.GetLists(test.test, selectedTag))
         .Distinct([Your ListComparer]);
// or
var requiredLists =
    Tests.SelectMany(test => this.GetLists(test.test, selectedTag))
         .GroupBy(x => x.Id)
         .Select(x => x.First());
foreach (var List in requiredLists)
{
    // (...)
    OutlookModel.Instance.CreateEmail(toRecipients, ccRecipients, bccRecipients, this.Draft, mailMode, replyTo);
}
You could also try PLINQ to speed things up, but I think you might run into trouble with the CreateEmail call, as it's probably a COM object, so multithreading will be a hassle. If GetLists is slow enough to be worth the multithreading overhead, you may experiment with PLINQ in the first call, when creating requiredLists.
It might be possible to do this with the BackgroundWorker class or something similar.
A very rough outline would be to wrap the create-email arguments in a separate class, then asynchronously start a background task while the loop continues processing the list.
Edit
The Task class might be of use too. This is assuming the CreateEmail function is the one that's taking up all the time. If it's the loop that's taking all the time then there's no point in going this route though.
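A rough sketch of that idea with Task (EmailArgs and CreateEmailFromArgs are hypothetical names, and this assumes CreateEmail is safe to call off the UI thread, which may not hold for an Outlook COM object):
var pending = new List<Task>();
foreach (var list in requiredLists)
{
    // Capture the arguments in a separate object, then queue the send
    var args = new EmailArgs(list.RecepientsTo, list.RecipientsCC, list.RecipientsBCC, list.ReplyTo);
    pending.Add(Task.Factory.StartNew(() => CreateEmailFromArgs(args)));
}
Task.WaitAll(pending.ToArray());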
Edit
Looking at the algorithmic complexity, the GetLists() calls run in O(n) time, but the CreateEmail() calls run in O(n^2) time, so - all other things being equal - the code block containing the CreateEmail() call would be a better candidate to optimise first.
Each foreach loop creates an Enumerator object, so using a regular for-loop might speed it up a little, but this might be a very minor improvement.
Like stormenet said, the CreateEmail function could be the bottleneck. You might want to make that asynchronous.
I am in a situation where looping through the result of a LINQ query is getting on my nerves. Here is my scenario:
I have a DataTable that comes from the database, from which I am taking data as:
var results = from d in dtAllData.AsEnumerable()
              select new MyType
              {
                  ID = d.Field<Decimal>("ID"),
                  Name = d.Field<string>("Name")
              };
After that I apply the order by, depending on the sort order:
if (orderBy != "")
{
    string[] ord = orderBy.Split(' ');
    if (ord != null && ord.Length == 2 && ord[0] != "")
    {
        if (ord[1].ToLower() != "desc")
        {
            results = from sorted in results
                      orderby GetPropertyValue(sorted, ord[0])
                      select sorted;
        }
        else
        {
            results = from sorted in results
                      orderby GetPropertyValue(sorted, ord[0]) descending
                      select sorted;
        }
    }
}
The GetPropertyValue method is as:
private object GetPropertyValue(object obj, string property)
{
    System.Reflection.PropertyInfo propertyInfo = obj.GetType().GetProperty(property);
    return propertyInfo.GetValue(obj, null);
}
After this I take the 25 records for the first page:
results = from sorted in results
              .Skip(0)
              .Take(25)
          select sorted;
So far things are going well. Now I have to pass these results to a method which does some manipulation on the data and returns the desired data. In this method, looping over these 25 records takes a good amount of time. My method definition is:
public MyTypeCollection GetMyTypes(IEnumerable<MyType> myData, String dateFormat, String offset)
I have tried foreach and it takes 8-10 secs on my machine; the time is spent at this line:
foreach(var _data in myData)
I tried a while loop and it does the same thing; I used it like:
var enumerator = myData.GetEnumerator();
while (enumerator.MoveNext())
{
    var item = enumerator.Current;
    Console.WriteLine(item);
}
This piece of code takes its time at MoveNext.
Then I went for a for loop:
int length = myData.Count();
for (int i = 0; i < 25; i++)
{
    var temp = myData.ElementAt(i);
}
This code takes its time at ElementAt.
Can anyone please guide me as to what I am doing wrong? I am using Framework 3.5 in VS 2008.
Thanks in advance
EDIT: I suspect the problem is in how you're ordering. You're using reflection to first fetch and then invoke a property for every record. Even though you only want the first 25 records, it has to call GetPropertyValue on all the records first, in order to order them.
It would be much better if you could do this without reflection at all... but if you do need to use reflection, at least call Type.GetProperty() once instead of for every record.
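For example, a sketch that fetches the PropertyInfo once and reuses it for every record (it assumes ord[0] names a public property of MyType):
var propertyInfo = typeof(MyType).GetProperty(ord[0]);
results = ord[1].ToLower() != "desc"
    ? results.OrderBy(r => propertyInfo.GetValue(r, null))
    : results.OrderByDescending(r => propertyInfo.GetValue(r, null));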
(In some ways this is more to do with helping you diagnose the problem more easily than a full answer as such...)
As Henk said, this is very odd:
results = from sorted in results
              .Skip(0)
              .Take(25)
          select sorted;
You almost certainly really just want:
results = results.Take(25);
(Skip(0) is pointless.)
It may not actually help, but it will make the code simpler to debug.
The next problem is that we can't actually see all your code. You've written:
After doing the order by depending on the sort order
... but you haven't shown how you're performing the ordering.
You should show us a complete example going from DataTable to its use.
Changing how you iterate over the sequence will not help - it's going to do the same thing either way, really - although it's surprising that in your last attempt, Count() apparently works quickly. Stick to the foreach - but work out exactly what that's going to be doing. LINQ uses a lot of lazy evaluation, and if you've done something which makes that very heavy going, that could be the problem. It's hard to know without seeing the whole pipeline.
The problem is that your "results" IEnumerable isn't actually being evaluated until it is passed into your method and enumerated. That means the whole operation, getting all the data from dtAllData, selecting out the new type (which happens on the whole enumerable, not just the first 25), and finally the Take(25), all happens on the first enumeration of the IEnumerable (foreach, while, whatever).
That's why your method is taking so long: it's actually doing, inside the method, work that was defined elsewhere. If you want that work to happen before your method runs, call ToList() beforehand.
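For example (a sketch; dateFormat and offset stand for whatever you already pass in):
var pageOfResults = results.ToList(); // the query pipeline executes here, once
var myTypes = GetMyTypes(pageOfResults, dateFormat, offset);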
You might find it easier to adopt a hybrid approach. In order:
1) Sort your DataTable in situ. It's probably best to do this at the database level but, if you can't, DataTable.DefaultView.Sort is pretty efficient:
dtAllData.DefaultView.Sort = ord[0] + " " + ord[1];
This assumes that ord[0] is the column name, and ord[1] is either ASC or DESC
2) Page through the DefaultView by index:
int pageStart = 0;
List<DataRowView> pageRows = new List<DataRowView>();
for (int i = pageStart; i < dtAllData.DefaultView.Count; i++)
{
    // Exit once we have a full page of 25 rows; the loop condition handles the end of the rows
    if (i >= pageStart + 25) { break; }
    pageRows.Add(dtAllData.DefaultView[i]);
}
...and create your objects from this much smaller list (I've assumed the columns are called Id and Name, and mapped them to the question's MyType):
List<MyType> myObjects = new List<MyType>();
foreach (DataRowView pageRow in pageRows)
{
    myObjects.Add(new MyType() { ID = Convert.ToDecimal(pageRow["Id"]), Name = Convert.ToString(pageRow["Name"]) });
}
You can then proceed with the rest of what you were doing.
Update 2011-05-20 12:49AM: The foreach is still 25% faster than the parallel solution for my application. And don't use the collection count for max parallelism; use something closer to the number of cores on your machine.
I have an IO bound task that I would like to run in parallel. I want to apply the same operation to every file in a folder. Internally, the operation results in a Dispatcher.Invoke that adds the computed file info to a collection on the UI thread. So, in a sense, the work result is a side effect of the method call, not a value returned directly from the method call.
This is the core loop that I want to run in parallel
foreach (ShellObject sf in sfcoll)
    ProcessShellObject(sf, curExeName);
The context for this loop is here:
var curExeName = Path.GetFileName(Assembly.GetEntryAssembly().Location);
using (ShellFileSystemFolder sfcoll = ShellFileSystemFolder.FromFolderPath(_rootPath))
{
    //This works, but is not parallel.
    foreach (ShellObject sf in sfcoll)
        ProcessShellObject(sf, curExeName);

    //This doesn't work.
    //My attempt at PLINQ. This code never calls method ProcessShellObject.
    var query = from sf in sfcoll.AsParallel().WithDegreeOfParallelism(sfcoll.Count())
                let p = ProcessShellObject(sf, curExeName)
                select p;
}
private String ProcessShellObject(ShellObject sf, string curExeName)
{
    String unusedReturnValueName = sf.ParsingName;
    try
    {
        DesktopItem di = new DesktopItem(sf);
        // Update DesktopItem stuff
        di.PropertyChanged += new PropertyChangedEventHandler(DesktopItem_PropertyChanged);
        ControlWindowHelper.MainWindow.Dispatcher.Invoke(
            (Action)(() => _desktopItemCollection.Add(di)));
    }
    catch (Exception ex)
    {
    }
    return unusedReturnValueName;
}
Thanks for any help!
+tom
EDIT: Regarding the update to your question. I hadn't spotted that the task was IO-bound - and presumably all the files are from a single (traditional?) disk. Yes, that would go slower - because you're introducing contention in a non-parallelizable resource, forcing the disk to seek all over the place.
IO-bound tasks can still be parallelized effectively sometimes - but it depends on whether the resource itself is parallelizable. For example, an SSD (which has much smaller seek times) may completely change the characteristics you're seeing - or if you're fetching over the network from several individually-slow servers, you could be IO-bound but not on a single channel.
You've created a query, but never used it. The simplest way of forcing the query to be evaluated would be to use Count() or ToList(), or something similar. However, a better approach would be to use Parallel.ForEach:
var options = new ParallelOptions { MaxDegreeOfParallelism = sfcoll.Count() };
Parallel.ForEach(sfcoll, options, sf => ProcessShellObject(sf, curExeName));
I'm not sure that setting the max degree of parallelism like that is the right approach though. It may work, but I'm not sure. A different way of approaching this would be to start all the operations as tasks, specifying TaskCreationOptions.LongRunning.
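A sketch of that alternative, under the same caveat (the Dispatcher.Invoke inside ProcessShellObject still serializes the UI work):
var tasks = sfcoll
    .Select(sf => Task.Factory.StartNew(
        () => ProcessShellObject(sf, curExeName),
        TaskCreationOptions.LongRunning))
    .ToArray();
Task.WaitAll(tasks); // block until every file has been processed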
Your query object created via LINQ is an IEnumerable. It gets evaluated only when you enumerate it (e.g. via a foreach loop):
var query = from sf in sfcoll.AsParallel().WithDegreeOfParallelism(sfcoll.Count())
            let p = ProcessShellObject(sf, curExeName)
            select p;

foreach (var q in query)
{
    // ....
}
// or:
var results = query.ToArray(); // also enumerates the query
You should add a line at the end:
var results = query.ToList();
This is my code:
void fixInstellingenTabel(object source, ElapsedEventArgs e)
{
    NASDataContext _db = new NASDataContext();
    List<Instellingen> newOnes = new List<Instellingen>();
    List<InstellingGegeven> li = _db.InstellingGegevens.ToList();
    foreach (InstellingGegeven i in li) {
        if (_db.Instellingens.Count(q => q.INST_LOC_REF == i.INST_LOC_REF && q.INST_LOCNR == i.INST_LOCNR && q.INST_REF == i.INST_REF && q.INST_TYPE == i.INST_TYPE) <= 0) {
            // There is no item yet. Create one.
            Instellingen newInst = new Instellingen();
            newInst.INST_LOC_REF = i.INST_LOC_REF;
            newInst.INST_LOCNR = i.INST_LOCNR;
            newInst.INST_REF = i.INST_REF;
            newInst.INST_TYPE = i.INST_TYPE;
            newInst.Opt_KalStandaard = false;
            newOnes.Add(newInst);
        }
    }
    _db.Instellingens.InsertAllOnSubmit(newOnes);
    _db.SubmitChanges();
}
Basically, the InstellingGegevens table gets filled by a procedure from another server.
What I then need to do is check whether there are new records in that table, and insert the new ones into Instellingens.
This code runs for about 4 minutes on 15k records. How do I optimize it? Or is the only way a stored procedure?
The code runs on a timer, firing every 6 hours. If a stored procedure is best, how do I call it from a timer?
Timer Tim = new Timer(21600000); // 6 hours
Tim.Elapsed += new ElapsedEventHandler(fixInstellingenTabel);
Tim.Start();
Doing this in a stored procedure would be a lot faster. We do something quite similar, only there are about 100k items in the table, it's updated every five minutes, and it has a lot more fields. Our job takes about two minutes to run and then performs updates in several tables across three databases, so your job would reasonably take only a couple of seconds.
The query you need would just be something like:
create procedure UpdateInstellingens as
insert into Instellingens (
    INST_LOC_REF, INST_LOCNR, INST_REF, INST_TYPE, Opt_KalStandaard
)
select q.INST_LOC_REF, q.INST_LOCNR, q.INST_REF, q.INST_TYPE, cast(0 as bit)
from InstellingGegevens q
left join Instellingens i
    on q.INST_LOC_REF = i.INST_LOC_REF and q.INST_LOCNR = i.INST_LOCNR
    and q.INST_REF = i.INST_REF and q.INST_TYPE = i.INST_TYPE
where i.INST_LOC_REF is null
You can run the procedure from a job in the SQL server, without involving any application at all, or you can use ADO.NET to execute the procedure from your timer.
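For the latter, a sketch of calling the procedure from the existing timer handler via ADO.NET (uses System.Data and System.Data.SqlClient; the connection string is an assumption, adjust to your configuration):
void fixInstellingenTabel(object source, ElapsedEventArgs e)
{
    using (var conn = new SqlConnection(connectionString)) // your connection string here
    using (var cmd = new SqlCommand("UpdateInstellingens", conn))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        conn.Open();
        cmd.ExecuteNonQuery(); // the set-based insert runs entirely on the server
    }
}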
One way you could optimise this is by changing the Count(...) <= 0 check into !Any(...). However, an even better optimisation would be to retrieve this information in a single query outside the loop:
var instellingens = _db.Instellingens
    .Select(q => new { q.INST_LOC_REF, q.INST_LOCNR, q.INST_REF, q.INST_TYPE })
    .Distinct()
    .ToDictionary(q => q, q => true);
(On second thought, a HashSet would be most appropriate here, but there is unfortunately no ToHashSet() extension method. You can write one of your own if you like!)
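A sketch of such an extension method, should you want to go that route:
public static class EnumerableExtensions
{
    // Materializes any sequence into a HashSet for O(1) Contains checks
    public static HashSet<T> ToHashSet<T>(this IEnumerable<T> source)
    {
        return new HashSet<T>(source);
    }
}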
And then inside your loop:
if (!instellingens.ContainsKey(new { i.INST_LOC_REF, i.INST_LOCNR,
                                     i.INST_REF, i.INST_TYPE })) {
    // There is no item yet. Create one.
    // ...
}
Then you can optimise the loop itself by making it lazy-retrieve:
// No need for the List<InstellingGegeven>
foreach (InstellingGegeven i in _db.InstellingGegevens) {
// ...
}
What Guffa said; but using LINQ here is not the best course if performance is what you are after. LINQ to SQL, like every other ORM, sacrifices performance for usability, which is usually a great tradeoff for typical application execution paths. On the other hand, SQL is very, very good at set-based operations, so that really is the way to fly here.