Efficient and Accurate way to add items into a TList - c#

I have 2 lists, and the entities in those lists have IDs. For instance,
Client.ID, where ID is a property of Client, and PopulationClient.ID, where ID is a property of the class PopulationClient. So I have two lists:
TList<Client> clients = clientsHelper.GetAllClients();
TList<PopulationClient> populationClients = populationHelper.GetAllPopulationClients();
So then I have a temp List
TList<Client> temp_list = new TList<Client>();
So the problem I am having is doing this efficiently and correctly. This is what I have tried, but I am not getting the correct results:
foreach(PopulationClient pClients in populationClients)
{
foreach(Client client in clients)
{
if(pClients.ID != client.ID && !InTempList(temp_list, pClients.ID))
{
temp_list.Add(client);
}
}
}
public bool InTempList(TList<Client> c, int id)
{
bool IsInList = false;
foreach(Client client in c)
{
if(client.ID == id)
{
IsInList = true;
}
}
return IsInList;
}
I am trying to do it right, but I cannot come up with a good way of doing it. This is not returning the correct data because the condition in the first loop at the top is true for almost every pair: for each population client, at least one client has a different ID, so it gets added anyway. What conditions should I check here so that I only end up with a list of clients that are in population clients but not in Clients?
For instance population clients would have 4 clients and Clients 2, those 2 are also in population clients but I need to get a list of population clients not in Clients.
Any help or pointers would be appreciated.

First, let's concentrate on getting the right results, and then we'll optimize.
Consider your nested loops: you will get too many positives, because in most (pclient, client) pairs the IDs wouldn't match. I think you wanted to code it like this:
foreach(PopulationClient pClients in populationClients)
{
if(!InTempList(clients, pClients.ID) && !InTempList(temp_list, pClients.ID))
{
temp_list.Add(pClients); // temp_list must then hold PopulationClient items, and InTempList must accept both list types
}
}
Now for the efficiency of that code: InTempList uses linear search through lists. This is not efficient - consider using structures that are faster to search, for example, hash sets.
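A minimal sketch of the hash-set version, assuming plain List<T> instead of the .netTiers TList<T>, and hypothetical (int ID, string Name) tuples in place of the Client and PopulationClient classes. The client IDs go into a HashSet<int> once; each population client is then kept only if its ID is not in that set. A second set replaces the InTempList scan over temp_list, because HashSet<T>.Add returns false for an ID that is already present.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data: tuples stand in for the Client and PopulationClient classes.
var clients = new List<(int ID, string Name)> { (2, "Bob"), (4, "Dan") };
var populationClients = new List<(int ID, string Name)>
{
    (1, "Ann"), (2, "Bob"), (3, "Cat"), (4, "Dan"), (3, "Cat") // ID 3 appears twice
};

// Built once in O(m); each Contains is O(1) on average.
var clientIds = new HashSet<int>(clients.Select(c => c.ID));

// IDs already added to the result (replaces the InTempList scan over temp_list).
var seen = new HashSet<int>();
var tempList = new List<(int ID, string Name)>();

foreach (var pClient in populationClients)
{
    // Keep population clients not in clients; Add returns false for duplicates.
    if (!clientIds.Contains(pClient.ID) && seen.Add(pClient.ID))
    {
        tempList.Add(pClient);
    }
}

Console.WriteLine(string.Join(", ", tempList.Select(p => p.Name))); // prints "Ann, Cat"
```

Overall this is O(n + m) instead of the O(n * m) nested loops.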

If I understand what you're looking for, here is a way to do it with LINQ...
tempList = populationList.Where(p => !clientList.Any(p2 => p2.ID == p.ID));

Just to offer another LINQ-based answer... I think your intent is to populate tempList based on all the items in 'clients' (returned from GetAllClients) that don't show up (based on the 'ID' value) in the populationClients collection.
If that's the case, then I'm going to assume that populationClients is large enough to warrant a hash-based lookup (if it's less than 10 items, the linear scan may not be a big deal, for instance).
So we want a fast-lookup version of all the ID values from the populationClients collection:
var populationClientIDs = populationClients.Select(pc => pc.ID);
var populationClientIDHash = new HashSet<int>(populationClientIDs); // assumes ID is an int
Now that we have the ID values we want to ignore in a fast lookup data structure, we can then use that as a filter for the clients:
var filteredClients = clients.Where(c => populationClientIDHash.Contains(c.ID) == false);
Based on the usage/need, you could either populate the tempList from 'filteredClients', or do a ToList, or whatever.

Related

Fastest way to match members of two lists

I have two lists, orderHeaders and orderLines. These are two related tables in the database, but I have to pull them separately as two different lists and then map them to each other later. I have a solution right now, but the performance is a little disappointing given that I have around 400k headers and 1 million+ lines.
Here's my code below. Is this the standard way to iterate over and find members inside two lists or is there a more optimized approach in C#?
var OutboundOrderHeaders =
DbContext.Context.Database.SqlQuery<OutboundOrderDTO>(queryString, parameter);
var OutboundOrderHeadersList = OutboundOrderHeaders.ToList();
var OutboundOrderLine =
DbContext.Context.Database.SqlQuery<OutboundOrderLineDTO>(queryStringLine, parameter2);
var OutboundOrderLineList = OutboundOrderLine.ToList();
for(var i = 0; i < OutboundOrderHeadersList.Count(); i++)
{
var LineToAdd = OutboundOrderLineList
.Where(x => x.OutboundNumber == OutboundOrderHeadersList[i].OutboundNumber)
.ToList() ;
OutboundOrderHeadersList[i].OrderLine = LineToAdd;
}
return OutboundOrderHeadersList;
As noted in comments, I'd really try hard to do this in the database rather than in memory. But to do it in memory, ToLookup is probably the right way to go:
// Note: here I've used outboundOrderLines where you've got OutboundOrderLineList,
// and orderHeaders where you've got OutboundOrderHeadersList, as simpler
// and more conventional variable names.
var linesByOutboundNumber = outboundOrderLines.ToLookup(line => line.OutboundNumber);
foreach (var orderHeader in orderHeaders)
{
orderHeader.OrderLine = linesByOutboundNumber[orderHeader.OutboundNumber].ToList();
}
This builds a map going from outbound number to "all the lines with that outbound number" by going through outboundOrderLines once, rather than iterating over it for every order header.
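A self-contained sketch of the ToLookup idea. Assumptions: the headers are reduced to their outbound numbers, the lines are hypothetical (OutboundNumber, Sku) tuples, and the result is collected in a dictionary rather than assigned to an OrderLine property. It also shows that indexing an ILookup with a missing key returns an empty sequence rather than throwing, so headers without lines need no special case.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data standing in for OutboundOrderDTO / OutboundOrderLineDTO.
var orderHeaders = new List<string> { "OUT-1", "OUT-2", "OUT-3" }; // outbound numbers only
var outboundOrderLines = new List<(string OutboundNumber, string Sku)>
{
    ("OUT-1", "A"), ("OUT-2", "B"), ("OUT-1", "C")
};

// One pass over all lines builds the outbound number -> lines map.
var linesByOutboundNumber = outboundOrderLines.ToLookup(line => line.OutboundNumber);

// Each header now costs one hash lookup instead of a scan over every line.
var linesPerHeader = new Dictionary<string, List<(string OutboundNumber, string Sku)>>();
foreach (var outboundNumber in orderHeaders)
{
    // A missing key ("OUT-3") yields an empty sequence, not an exception.
    linesPerHeader[outboundNumber] = linesByOutboundNumber[outboundNumber].ToList();
}

foreach (var outboundNumber in orderHeaders)
{
    Console.WriteLine($"{outboundNumber}: {string.Join(",", linesPerHeader[outboundNumber].Select(l => l.Sku))}");
}
```

Within each group, ToLookup keeps the lines in their original order.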

EF - A proper way to search several items in database

I have about 100 items (allRights) in the database and about 10 IDs to be searched (inputRightsIds). Which one is better: first getting all rights and then searching the items (Variant 1), or making 10 checking requests to the database (Variant 2)?
Here is some example code:
DbContext db = new DbContext();
int[] inputRightsIds = new int[10]{...};
Variant 1
var allRights = db.Rights.ToList();
foreach( var right in allRights)
{
for(int i = 0; i < inputRightsIds.Length; i++)
{
if(inputRightsIds[i] == right.Id)
{
// Do something
}
}
}
Variant 2
for(int i = 0; i < inputRightsIds.Length; i++)
{
if(db.Rights.Any(r => r.Id == inputRightsIds[i]))
{
// Do something
}
}
Thanks in advance!
As others have already stated, you should do the following.
var matchingIds = from r in db.Rights
where inputRightsIds.Contains(r.Id)
select r.Id;
foreach(var id in matchingIds)
{
// Do something
}
But this is different from both of your approaches. Your first approach makes one SQL call to the DB that returns more results than you are interested in. The second makes multiple SQL calls, each returning part of the information you want. The query above makes one SQL call to the DB and returns only the data you are interested in. This is the best approach, as it avoids both bottlenecks: multiple calls to the DB and too much data returned.
You can use the following:
db.Rights.Where(right => inputRightsIds.Contains(right.Id));
They should be very similar in speed, since both must enumerate the arrays the same number of times. There might be subtle differences in speed between the two depending on the input data, but in general I would go with Variant 2. I think you should almost always prefer LINQ over manual enumeration when possible. Also consider using the following LINQ statement to simplify the whole search to a single line.
var matches = db.Rights.Where(r => inputRightsIds.Contains(r.Id));
...//Do stuff with matches
Don't forget to load your items into memory (ToList) if you need to process the list further:
var itemsFromDatabase = db.Rights.Where(r => inputRightsIds.Contains(r.Id)).ToList();
Or you could even enumerate through the collection and do something with each item:
db.Rights.Where(r => inputRightsIds.Contains(r.Id)).ToList().ForEach(item => {
//your code here
});

A better way to loop through lists

So I have a couple of different lists that I'm trying to process and merge into 1 list.
Below is a snippet of code; I want to see if there is a better way of doing it.
The reason why I'm asking is that some of these lists are rather large. I want to see if there is a more efficient way of doing this.
As you can see I'm looping through a list, and the first thing I'm doing is to check to see if the CompanyId exists in the list. If it does, then I find item in the list that I'm going to process.
pList is my processing list. I'm adding the values from my different lists into this list.
I'm wondering if there is a "better way" of accomplishing the Exist and Find.
bool tstFind = false;
foreach (parseAC item in pACList)
{
tstFind = pList.Exists(x => (x.CompanyId == item.key.ToString()));
if (tstFind == true)
{
pItem = pList.Find(x => (x.CompanyId == item.key.ToString()));
//Processing done here. pItem gets updated here
...
}
}
Just as a side note, I'm going to be researching a way to use joins to see if that is faster. But I haven't gotten there yet. The above code is my first cut at solving this issue and it appears to work. However, since I have the time I want to see if there is a better way still.
Any input is greatly appreciated.
Time Findings:
My current Find and Exists code takes about 84 minutes to loop through the 5.5M items in pACList.
Using pList.FirstOrDefault(x => x.CompanyId == item.key.ToString()); takes 54 minutes to loop through the 5.5M items in pACList.
You can retrieve item with FirstOrDefault instead of searching for item two times (first time to define if item exists, and second time to get existing item):
var tstFind = pList.FirstOrDefault(x => x.CompanyId == item.key.ToString());
if (tstFind != null)
{
//Processing done here. pItem gets updated here
}
Yes, use a hashtable so that your algorithm is O(n) instead of O(n*m) which it is right now.
var pListByCompanyId = pList.ToDictionary(x => x.CompanyId);
foreach (parseAC item in pACList)
{
if (pListByCompanyId.ContainsKey(item.key.ToString()))
{
pItem = pListByCompanyId[item.key.ToString()];
//Processing done here. pItem gets updated here
...
}
}
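The same dictionary idea as a runnable sketch, with one change: Dictionary.TryGetValue does the existence check and the fetch in a single hash lookup, where ContainsKey plus the indexer does two. Assumptions: pList rows are hypothetical (CompanyId, Name) tuples, and pACList is reduced to its int keys. Because the tuples are value types, changes to pItem here would not flow back into pList; the question's pItem is presumably a class, where they would.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the question's types.
var pList = new List<(string CompanyId, string Name)> { ("10", "Acme"), ("20", "Globex") };
var pACList = new List<int> { 10, 30, 20, 10 }; // item.key values

// O(N) once. ToDictionary throws ArgumentException if two rows share a CompanyId.
var pListByCompanyId = pList.ToDictionary(x => x.CompanyId);

var matched = new List<string>();
foreach (var key in pACList)
{
    // One hash lookup replaces Exists + Find (two linear scans per item).
    if (pListByCompanyId.TryGetValue(key.ToString(), out var pItem))
    {
        matched.Add(pItem.Name); // processing of pItem would go here
    }
}

Console.WriteLine(string.Join(", ", matched)); // prints "Acme, Globex, Acme"
```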
You can iterate through a filtered list using LINQ:
foreach (parseAC item in pACList.Where(i=>pList.Any(x => (x.CompanyId == i.key.ToString()))))
{
pItem = pList.Find(x => (x.CompanyId == item.key.ToString()));
//Processing done here. pItem gets updated here
...
}
Using lists for this type of operation is O(MxN) (M is the count of pACList, N is the count of pList). Additionally, you are searching pList twice. To avoid that, use pList.FirstOrDefault as recommended by @lazyberezovsky.
However, if possible I would avoid using lists. A Dictionary indexed by the key you're searching on would greatly improve the lookup time.
Doing a linear search on the list for each item in another list is not efficient for large data sets. What is preferable is to put the keys into a hash table or Dictionary that can be searched much more efficiently, allowing you to join the two lists. You don't even need to code this yourself; what you want is a Join operation. You want to get all of the pairs of items from each sequence that map to the same key.
Either pull out the implementation of the method below, or change Foo and Bar to the appropriate types and use it as a method.
public static IEnumerable<Tuple<Bar, Foo>> Merge(IEnumerable<Bar> pACList
, IEnumerable<Foo> pList)
{
return pACList.Join(pList, item => item.key.ToString()
, item => item.CompanyId
, (a, b) => Tuple.Create(a, b));
}
You can use the results of this call to merge the two items together, as they will have the same key.
Internally the method will create a lookup table that allows for efficient searching before actually doing the searching.
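A small runnable illustration of Enumerable.Join, assuming hypothetical int keys for pACList and (CompanyId, Name) tuples for pList. Join builds a lookup over the inner sequence once, then streams the outer sequence in order and yields one result per matching pair; keys with no partner (30 here) simply produce nothing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data.
var pACList = new List<int> { 10, 30, 20 };
var pList = new List<(string CompanyId, string Name)> { ("20", "Globex"), ("10", "Acme") };

var pairs = pACList.Join(
        pList,
        item => item.ToString(), // outer key (pACList)
        p => p.CompanyId,        // inner key (pList), hashed once
        (item, p) => (Key: item, p.Name))
    .ToList();

foreach (var (key, name) in pairs)
{
    Console.WriteLine($"{key} -> {name}");
}
// 10 -> Acme
// 20 -> Globex
```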
Convert pList to a HashSet, then query pHashSet.Contains(). Complexity: O(N) + O(n).
Sort pList on CompanyId and do Array.BinarySearch() = O(N Log N) + O(n * Log N )
If the maximum company ID is not prohibitively large, simply create an array where the item with company ID i sits at position i. Nothing is faster.
where N is the size of pList and n is the size of pACList.
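The sort-and-binary-search option can be sketched as follows, assuming hypothetical (CompanyId, Name) tuples with string IDs. The one rule that matters is that Array.BinarySearch must use the same comparer the array was sorted with (ordinal here); with string keys that order is lexicographic, which is fine as long as sort and search agree.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data.
var pList = new List<(string CompanyId, string Name)>
{
    ("30", "Initech"), ("10", "Acme"), ("20", "Globex")
};
var pACKeys = new List<int> { 20, 40, 10 };

// O(N log N) once: sort by CompanyId with an ordinal comparer.
var sorted = pList.OrderBy(p => p.CompanyId, StringComparer.Ordinal).ToArray();
var sortedIds = sorted.Select(p => p.CompanyId).ToArray();

// O(log N) per lookup, using the same comparer as the sort.
var found = new List<string>();
foreach (var key in pACKeys)
{
    int index = Array.BinarySearch(sortedIds, key.ToString(), StringComparer.Ordinal);
    if (index >= 0) // a negative result means "not found"
    {
        found.Add(sorted[index].Name);
    }
}

Console.WriteLine(string.Join(", ", found)); // prints "Globex, Acme"
```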

How to compare two sorted large lists efficiently in C#?

I have got two generic lists with 20,000 and 30,000 objects in each list.
class Employee
{
string name;
double salary;
}
List<Employee> newEmployeeList = new List<Employee>() {....}; // contains 20,000 objects
List<Employee> oldEmployeeList = new List<Employee>() {....}; // contains 30,000 objects
Lists can also be sorted by name if it improves the speed.
I want to compare these two lists to find out
employees whose name and salary matching
employees whose name is matching but not salary
What is the fastest way to compare such large data lists with above conditions?
I would sort both newEmployeeList and oldEmployeeList lists by name - O(n*log(n)). And then you can use linear algorithm to search for matches. So the total would be O(n+n*log(n)) if both lists are about the same size. This should be faster than O(n^2) "brute force" algorithm.
I'd probably recommend the two lists be stored in a Dictionary<string, Employee> based on the name to begin with, then you can iterate over the keys in one and lookup to see if they exist and the salaries match in the other. This would also save the cost of sorting them later or putting them in a more efficient structure.
This is pretty much O(n): linear to build both dictionaries, and linear to go through the keys of one and look them up in the other. O(n + m + n) reduces to O(n + m), which is linear in the total size.
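A minimal sketch of the dictionary approach, assuming names are unique and using hypothetical (Name, Salary) tuples instead of the Employee class (whose fields are private in the question). For this particular check only the old list needs indexing: one pass builds the dictionary, and one pass over the new list does a hash lookup per employee.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data.
var oldEmployees = new List<(string Name, double Salary)>
{
    ("Alice", 50000), ("Bob", 42000), ("Carol", 61000)
};
var newEmployees = new List<(string Name, double Salary)>
{
    ("Bob", 42000), ("Carol", 65000), ("Dave", 39000)
};

// O(m): index the old list by name (throws if a name repeats).
var oldByName = oldEmployees.ToDictionary(e => e.Name);

var namesAndSalaries = new List<string>(); // name and salary match
var namesOnly = new List<string>();        // name matches, salary differs

// O(n): one lookup per new employee.
foreach (var emp in newEmployees)
{
    if (oldByName.TryGetValue(emp.Name, out var old))
    {
        // Exact double comparison, as in the question; use a tolerance for computed values.
        if (old.Salary == emp.Salary) namesAndSalaries.Add(emp.Name);
        else namesOnly.Add(emp.Name);
    }
}

Console.WriteLine($"match: {string.Join(",", namesAndSalaries)}; salary differs: {string.Join(",", namesOnly)}");
// prints "match: Bob; salary differs: Carol"
```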
But, if you must use List<T> to hold the lists for other reasons, you could also use the Join() LINQ method, and build a new list with a Match field that tells you whether they were a match or mismatch...
var results = newEmpList.Join(
oldEmpList,
n => n.Name,
o => o.Name,
(n, o) => new
{
Name = n.Name,
Salary = n.Salary,
Match = o.Salary == n.Salary
});
You can then filter this with a Where() clause for Match or !Match.
Update: I assume (by the title of your question) that the 2 lists are already sorted. Perhaps they're stored in a database with a clustered index or something. This answer, therefore, relies on that assumption.
Here is an implementation that has O(n) complexity, and is also very fast, AND is pretty simple too.
I believe this is a variant of the Merge Algorithm.
Here's the idea:
Start enumerating both lists
Compare the 2 current items.
If they match, add to your results.
If the 1st item is "smaller", advance the 1st list.
If the 2nd item is "smaller", advance the 2nd list.
Since both lists are known to be sorted (using the same string comparison the code below uses), this will work very well. This implementation assumes that name is unique in each list.
var comparer = StringComparer.OrdinalIgnoreCase;
var namesAndSalaries = new List<Tuple<Employee, Employee>>();
var namesOnly = new List<Tuple<Employee, Employee>>();
// Create 2 iterators; one for old, one for new:
using (IEnumerator<Employee> A = oldEmployeeList.GetEnumerator()) {
using (IEnumerator<Employee> B = newEmployeeList.GetEnumerator()) {
// Start enumerating both:
if (A.MoveNext() && B.MoveNext()) {
while (true) {
int compared = comparer.Compare(A.Current.name, B.Current.name);
if (compared == 0) {
// Names match
if (A.Current.salary == B.Current.salary) {
namesAndSalaries.Add(Tuple.Create(A.Current, B.Current));
} else {
namesOnly.Add(Tuple.Create(A.Current, B.Current));
}
if (!A.MoveNext() || !B.MoveNext()) break;
} else if (compared < 0) { // Compare only guarantees a negative value, not -1
// Keep searching A
if (!A.MoveNext()) break;
} else {
// Keep searching B
if (!B.MoveNext()) break;
}
}
}
}
}
One of the fastest possible solutions on sorted lists is to use BinarySearch to find an item in the other list.
But as others have mentioned, you should measure it against your project requirements, as performance often tends to be a subjective thing.
You could create a Dictionary using
var lookupDictionary = list1.ToDictionary(x=>x.name);
That would give you close to O(1) lookup and a close to O(n) behavior if you're looking up values from a loop over the other list.
(I'm assuming here that ToDictionary is O(n), which would make sense with a straightforward implementation, but I have not tested this to be the case.)
This would make for a very straightforward algorithm, and I think going below O(n) with two unsorted lists is pretty hard.

C# Sort List Based on Another List

I have a class that has multiple List<> contained within it. It's basically a table stored with each column as a List<>. Each column does not contain the same type. Each list is also the same length (has the same number of elements).
For example:
I have 3 List<> objects: one List<DateTime>, two List<double>, and three List<string>.
//Not syntactically correct
List<DateTime> one = new List...{4/12/2010, 4/9/2006, 4/13/2008};
List<double> two = new List...{24.5, 56.2, 47.4};
List<string> three = new List...{"B", "K", "Z"};
I want to be able to sort list one from oldest to newest:
one = {4/9/2006, 4/13/2008, 4/12/2010};
So to do this I moved element 0 to the end.
I then want to sort list two and three the same way; moving the first to the last.
So when I sort one list, I want the data in the corresponding index in the other lists to also change in accordance with how the one list is sorted.
I'm guessing I have to implement IComparer somehow, but I feel like there's a shortcut I haven't realized.
I've handled this design in the past by keeping or creating a separate index list. You first sort the index list, and then use it to sort (or just access) the other lists. You can do this by creating a custom IComparer for the index list. What you do inside that IComparer is to compare based on indexes into the key list. In other words, you are sorting the index list indirectly. Something like:
// This is the compare function for the separate *index* list.
int Compare (object x, object y)
{
return KeyList[(int) x].CompareTo(KeyList[(int) y]);
}
So you are sorting the index list based on the values in the key list. Then you can use that sorted index list to re-order the other lists. If this is unclear, I'll try to add a more complete example when I get in a situation to post one.
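A more complete version of the index-list approach might look like this, using the question's sample data with a Comparison<int> lambda instead of a separate IComparer class. The index list 0..n-1 is sorted by looking up the key-list value each index points at; every column is then rebuilt in that order. Note that List<T>.Sort is not stable, so rows with equal dates may come out in any order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var one = new List<DateTime> { new DateTime(2010, 4, 12), new DateTime(2006, 4, 9), new DateTime(2008, 4, 13) };
var two = new List<double> { 24.5, 56.2, 47.4 };
var three = new List<string> { "B", "K", "Z" };

// The separate index list: 0, 1, 2, ...
var index = Enumerable.Range(0, one.Count).ToList();

// Sort indexes indirectly by the key-list values they refer to.
index.Sort((x, y) => one[x].CompareTo(one[y]));

// Rebuild any column in the order given by the sorted index list.
List<T> Reorder<T>(List<T> column) => index.Select(i => column[i]).ToList();

one = Reorder(one);
two = Reorder(two);
three = Reorder(three);

Console.WriteLine(string.Join(" ", three)); // prints "K Z B"
```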
Here's a way to do it using LINQ and projections. The first query generates an array with the original indexes reordered by the datetime values; in your example, the newOrdering array would have members:
{ 4/9/2006, 1 }, { 4/13/2008, 2 }, { 4/12/2010, 0 }
The second set of statements generate new lists by picking items using the reordered indexes (in other words, items 1, 2, and 0, in that order).
var newOrdering = one
.Select((dateTime, index) => new { dateTime, index })
.OrderBy(item => item.dateTime)
.ToArray();
// now, order each list
one = newOrdering.Select(item => one[item.index]).ToList();
two = newOrdering.Select(item => two[item.index]).ToList();
three = newOrdering.Select(item => three[item.index]).ToList();
I am sorry to say, but this feels like a bad design. Especially because List<T> does not guarantee element order before you have called one of the sorting operations (so you have a problem when inserting):
From MSDN:
The List is not guaranteed to be sorted. You must sort the List before performing operations (such as BinarySearch) that require the List to be sorted.
In many cases you won't run into trouble based on this, but you might, and if you do, it could be a very hard bug to track down. For example, I think the current framework implementation of List<T> maintains insert order until sort is called, but it could change in the future.
I would seriously consider refactoring to use another data structure. If you still want to implement sorting based on this data structure, I would create a temporary object (maybe using an anonymous type), sort this, and re-create the lists (see this excellent answer for an explanation of how).
First you should create a Data object to hold everything.
private class Data
{
public DateTime DateTime { get; set; }
public int Int32 { get; set; }
public string String { get; set; }
}
Then you can sort like this.
var l = new List<Data>();
l.Sort(
(a, b) =>
{
var r = a.DateTime.CompareTo(b.DateTime);
if (r == 0)
{
r = a.Int32.CompareTo(b.Int32);
if (r == 0)
{
r = a.String.CompareTo(b.String);
}
}
return r;
}
);
I wrote a sort algorithm that does this for Nito.LINQ (not yet released). It uses a simple-minded QuickSort to sort the lists, and keeps any number of related lists in sync. Source code starts here, in the IList<T>.Sort extension method.
Alternatively, if copying the data isn't a huge concern, you could project it into a LINQ query using the Zip operator (requires .NET 4.0 or Rx), order it, and then pull each result out:
List<DateTime> one = ...;
List<double> two = ...;
List<string> three = ...;
var combined = one.Zip(two, (first, second) => new { first, second })
.Zip(three, (pair, third) => new { pair.first, pair.second, third });
var ordered = combined.OrderBy(x => x.first);
var orderedOne = ordered.Select(x => x.first);
var orderedTwo = ordered.Select(x => x.second);
var orderedThree = ordered.Select(x => x.third);
Naturally, the best solution is to not separate related data in the first place.
Using generic arrays, this can get a bit cumbersome.
One alternative is using the Array.Sort() method that takes an array of keys and an array of values to sort. It first sorts the key array into ascending order and makes sure the array of values is reorganized to match this sort order.
If you're willing to incur the cost of converting your List<T>s to arrays (and then back), you could take advantage of this method.
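A sketch of that approach for more than two lists, using the question's sample data. Array.Sort(keys, items) only moves one items array along with the keys, so here the items array holds the original positions; those positions are then used to rebuild every other column.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var one = new List<DateTime> { new DateTime(2010, 4, 12), new DateTime(2006, 4, 9), new DateTime(2008, 4, 13) };
var two = new List<double> { 24.5, 56.2, 47.4 };
var three = new List<string> { "B", "K", "Z" };

// Copy the key column to an array and pair it with the original row positions.
DateTime[] keys = one.ToArray();
int[] positions = Enumerable.Range(0, keys.Length).ToArray();

// Sorts keys ascending and moves positions in lockstep.
Array.Sort(keys, positions);

// positions[k] is the original row that now belongs at position k.
one = keys.ToList();
two = positions.Select(i => two[i]).ToList();
three = positions.Select(i => three[i]).ToList();

Console.WriteLine(string.Join(", ", three)); // prints "K, Z, B"
```

With only two lists you could pass the second list's array as items directly and skip the positions array.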
Alternatively, you could use LINQ to combine the values from multiple arrays into a single anonymous type using Zip(), sort the list of anonymous types using the key field, and then split that apart into separate arrays.
If you want to do this in-place, you would have to write a custom comparer and create a separate index array to maintain the new ordering of items.
I hope this helps:
one.Sort(delegate(DateTime d1, DateTime d2)
{
return d1.CompareTo(d2); // ascending: oldest first
});
