What I am trying to do is implement a heuristic approach to an NP-complete problem: I have a list of objects (matches), each of which has a double score. I take the first element of the list sorted by score descending and remove it from the list. Then all elements bound to that first one are removed as well. I iterate through the list until no elements remain.
I need a data structure which can efficiently solve this problem, so basically it should have the following properties:
1. Generic
2. Is always sorted
3. Has a fast key access
Right now SortedSet<T> looks like the best fit.
The question is: is it the optimal choice in my case?
List<Match> result = new List<Match>();
while (sortedItems.Any())
{
    var first = sortedItems.First();
    result.Add(first);
    sortedItems.Remove(first);
    foreach (var dependentFirst in first.DependentElements)
    {
        sortedItems.Remove(dependentFirst);
    }
}
What I need is something like sorted hash table.
I assume you're not just wanting to clear the list, but you want to do something with each item as it's removed.
var toDelete = new HashSet<T>();
foreach (var item in sortedItems)
{
    if (!toDelete.Contains(item))
    {
        toDelete.Add(item);
        // do something with item here
    }
    foreach (var dependentFirst in item.DependentElements)
    {
        if (!toDelete.Contains(dependentFirst))
        {
            toDelete.Add(dependentFirst);
            // do something with dependentFirst here
        }
    }
}
sortedItems.RemoveAll(i => toDelete.Contains(i));
I think you should use two data structures: a heap and a set. The heap keeps the items sorted; the set keeps track of the removed items. Fill the heap with the items, then remove the top one and add it and all its dependents to the set. Take the next one: if it's already in the set, ignore it and move on; otherwise add it and its dependents to the set.
Each time you add an item to the set, also do whatever it is you plan to do with the items.
The complexity here is O(N log N), and you won't get any better than that, as you have to sort the list of items anyway. If you want better constant factors, you can add a 'Removed' boolean to each item and set it to true, instead of using a set to keep track of the removed items. I don't know if this is applicable in your case.
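For illustration, here is a minimal sketch of this approach using .NET 6's PriorityQueue<TElement, TPriority> (a min-heap, so scores are negated to pop the highest score first); the Match type and its Score/DependentElements members are assumed from the question's code:
// Fill the heap; negate the score because PriorityQueue pops the minimum.
var heap = new PriorityQueue<Match, double>();
foreach (var m in matches)
{
    heap.Enqueue(m, -m.Score);
}

var removed = new HashSet<Match>();
var result = new List<Match>();
while (heap.Count > 0)
{
    var top = heap.Dequeue();
    if (removed.Contains(top))
        continue; // already eliminated by an earlier pick

    result.Add(top); // "do whatever you plan to do" with the item
    removed.Add(top);
    foreach (var dep in top.DependentElements)
    {
        removed.Add(dep);
    }
}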
If I'm not mistaken, you want something like this:
var dictionary = new Dictionary<string, int>();
dictionary.Add("car", 2);
dictionary.Add("apple", 1);
dictionary.Add("zebra", 0);
dictionary.Add("mouse", 5);
dictionary.Add("year", 3);
dictionary = dictionary.OrderBy(o => o.Key).ToDictionary(o => o.Key, o => o.Value);
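Note that a plain Dictionary does not guarantee enumeration order. If the collection must stay sorted at all times, SortedDictionary<TKey, TValue> keeps its keys ordered by contract, for example:
// Keys stay sorted as entries are added; no re-sorting step is needed.
var sorted = new SortedDictionary<string, int>(dictionary);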
In the foreach loop, I want to add the Products to a List, but I want this List to not contain duplicate Products. So far I have two ideas:
1. In the loop, before adding the Product to the List, I check whether it already exists in the List; if not, I add it.
foreach (var product in products)
{
    // code logic
    if (!listProduct.Any(x => x.Id == product.Id))
    {
        listProduct.Add(product);
    }
}
2. In the loop, I add all the Products to the List even if there are duplicates. Then, outside of the loop, I use Distinct to remove duplicate records.
foreach (var product in products)
{
    // code logic
    listProduct.Add(product);
}
listProduct = listProduct.Distinct().ToList();
I wonder which of these two ways is more efficient. Or is there another way to add records to the List while avoiding duplication?
I'd go for a third approach: the HashSet. It has a constructor overload that accepts an IEnumerable. This constructor removes duplicates:
If the input collection contains duplicates, the set will contain one
of each unique element. No exception will be thrown.
Source: HashSet<T> Constructor
Usage:
List<Product> myProducts = ...;
var setOfProducts = new HashSet<Product>(myProducts);
After removing duplicates there is no proper meaning of setOfProducts[4].
Therefore a HashSet is not an IList<Product> but an ICollection<Product>: you can Count, Add, Remove, etc., everything you can do with a List. The only thing you can't do is fetch by index.
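One caveat: this constructor uses the type's default equality (reference equality for classes), so two Product instances with the same Id will not be treated as duplicates unless you say so. A sketch of a comparer, assuming Product has an int Id:
// Hypothetical comparer: Products with equal Ids count as duplicates.
class ProductIdComparer : IEqualityComparer<Product>
{
    public bool Equals(Product x, Product y) => x?.Id == y?.Id;
    public int GetHashCode(Product p) => p.Id.GetHashCode();
}

var setOfProducts = new HashSet<Product>(myProducts, new ProductIdComparer());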
First take the elements that are not already in the collection:
var newProducts = products.Where(x => !listProduct.Any(y => x.Id == y.Id));
And then just add them using AddRange:
listProduct.AddRange(newProducts);
Or you can use a foreach loop:
foreach (var product in newProducts)
{
    listProduct.Add(product);
}
One more easy solution, with no need for Distinct:
var newProductList = products.Union(listProduct).ToList();
But Union does not have great performance.
From what you have included, you are storing everything in memory. If that is the case, or you persist the data only after the list is ready, you can consider using BinarySearch:
https://msdn.microsoft.com/en-us/library/w4e7fxsh(v=vs.110).aspx and you also get an ordered list at the end. If ordering is not important, you can use HashSet, which is very fast and meant especially for this purpose.
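For illustration, a minimal sketch of the BinarySearch route; it assumes Product implements IComparable<Product> (e.g. comparing by Id):
// List<T>.BinarySearch returns the item's index if found; otherwise it
// returns the bitwise complement of the insertion point.
int index = listProduct.BinarySearch(product);
if (index < 0)
{
    listProduct.Insert(~index, product); // keeps the list sorted, no duplicate added
}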
Check also: https://www.dotnetperls.com/hashset
This should be pretty fast and take care of any ordering:
// build a HashSet of your primary keys type (I'm assuming integers here) containing all your list elements' keys
var hashSet = new HashSet<int>(listProduct.Select(p => p.Id));
// add all items from the products list whose Id can be added to the hashSet (so it's not a duplicate)
listProduct.AddRange(products.Where(p => hashSet.Add(p.Id)));
What you might want to consider doing instead, though, is implementing IEquatable<Product> and overriding GetHashCode() on your Product type, which would make the above code a little easier and put the equality checks where they should be (inside the respective type):
var hashSet = new HashSet<Product>(listProduct);
listProduct.AddRange(products.Where(hashSet.Add));
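For reference, a sketch of what those equality members might look like; the Product shape (an int Id) is assumed from the question's code:
class Product : IEquatable<Product>
{
    public int Id { get; set; }

    public bool Equals(Product other) => other != null && Id == other.Id;

    public override bool Equals(object obj) => Equals(obj as Product);

    // Equal Ids must produce equal hash codes for HashSet to work.
    public override int GetHashCode() => Id.GetHashCode();
}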
Consider two lists in C#: the first contains elements of TypeOne and the second contains elements of TypeTwo:
class TypeOne
{
    public int foo;
    public int bar;
}

class TypeTwo
{
    public int baz;
    public int qux;
}
Now I need to find elements (with some property value) in the first list that don't exist in the second list, and similarly I want to find elements in the second list that don't exist in the first list. (There are only zero or one occurrences in either list.)
What I tried so far is to iterate both lists like this:
foreach (var item in firstList)
{
    if (!secondList.Any(a => a.baz == item.foo))
    {
        // Item is in the first list but not in the second list.
    }
}
and again:
foreach (var item in secondList)
{
    if (!firstList.Any(a => a.foo == item.baz))
    {
        // Item is in the second list but not in the first list.
    }
}
I hardly think this is a good way to do what I want. I'm iterating both lists, and the Any call inside each loop iterates the other list again, so there are far too many iterations.
What is the most efficient way to achieve this?
I am afraid there is no prebuilt solution for this, so the best we can do is optimize as much as possible. We only have to iterate the first list, because everything in the second list will already have been compared:
// First we need copies to operate on
var firstCopy = new List<TypeOne>(firstList);
var secondCopy = new List<TypeTwo>(secondList);

// Now we iterate the first list once, completely
foreach (var typeOne in firstList)
{
    var match = secondCopy.FirstOrDefault(s => s.baz == typeOne.foo);
    if (match == null)
    {
        // Item in first but not in second
    }
    else
    {
        // Match is a duplicate and shall be removed from both
        firstCopy.Remove(typeOne);
        secondCopy.Remove(match);
    }
}
After running this, both copies will only contain the values that are unique to their list. This not only halves the number of iterations but also improves steadily as it runs, because the second copy shrinks with each match.
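If the lists are large, a hash-based pass avoids the repeated linear FirstOrDefault altogether; a sketch assuming the foo/baz fields from the question:
// O(n + m): build key sets once, then filter each list against the other.
var secondKeys = new HashSet<int>(secondList.Select(s => s.baz));
var firstKeys = new HashSet<int>(firstList.Select(f => f.foo));

// Items in the first list with no counterpart in the second
var onlyInFirst = firstList.Where(f => !secondKeys.Contains(f.foo)).ToList();

// Items in the second list with no counterpart in the first
var onlyInSecond = secondList.Where(s => !firstKeys.Contains(s.baz)).ToList();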
Use this LINQ query:
var result1 = secondList.Where(p2 => !firstList.Any(p1 => p1.foo == p2.baz));
var result2 = firstList.Where(p1 => !secondList.Any(p2 => p2.baz == p1.foo));
I'm trying to do something very simple but it seems that I don't understand SortedDictionary.
What I'm trying to do is the following:
Create a sorted dictionary that sorts my items by some floating number, so I create a dictionary that looks like this
SortedDictionary<float, Node<T>> allNodes = new SortedDictionary<float, Node<T>>();
And now, after I add items, I want to remove them one by one, from the smallest to the largest (every removal should have complexity O(log n)).
How do I do it? I thought that simply allNodes[0] would give me the smallest, but it doesn't.
Moreover, it seems like the dictionary can't handle duplicate keys. I feel like I'm using the wrong data structure...
Should I use something else if I have bunch of nodes that I want to be sorted by their distance (floating point)?
allNodes[0] will not give you the first item in the dictionary - it will give you the item with a float key value of 0.
If you want the first item, try allNodes.Values.First() instead. Or, to find the first key, use allNodes.Keys.First().
To remove the items one by one, loop over a copy of the Keys collection and call allNodes.Remove(key);
foreach (var key in allNodes.Keys.ToList())
{
    allNodes.Remove(key);
}
To answer the addendum to your question: yes, SortedDictionary (any flavor of Dictionary, for that matter) will not handle duplicate keys. Assigning via the indexer with an existing key overwrites the previous value, and Add will throw an ArgumentException.
You could use a SortedDictionary<float, List<Node<T>>> but then you have the complexity of extracting collections versus items, needing to initialize each list rather than just adding an item, etc. It's all possible and may still be the fastest structure for adds and gets, but it does add a bit of complexity.
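A minimal sketch of that bucketed variant, assuming a generic context where Node<T> is defined and each node's float distance is at hand:
// One list per distance value, so duplicate keys become list entries.
var buckets = new SortedDictionary<float, List<Node<T>>>();

// Adding a node under its distance: create the bucket on first use.
if (!buckets.TryGetValue(distance, out var bucket))
{
    bucket = new List<Node<T>>();
    buckets[distance] = bucket;
}
bucket.Add(node);

// buckets.First() is the entry with the smallest distance.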
Yes, you're right about complexity.
In SortedDictionary all the keys are sorted. If you want to iterate from the smallest to the largest, foreach will be enough:
foreach (KeyValuePair<float, Node<T>> kvp in allNodes)
{
    // Do something...
}
You wrote that you want to remove items. It's forbidden to remove from a collection during iteration with foreach, so first create a copy of it to do so.
EDIT:
Yes, if you have duplicate keys you can't use a SortedDictionary. Create a wrapper Node type that holds the Node<T> and its float distance, then write a comparer:
public class NodeComparer : IComparer<Node>
{
    public int Compare(Node n1, Node n2)
    {
        return n1.dist.CompareTo(n2.dist); // ascending: smallest distance first
    }
}
And then put everything in a simple List<Node> allNodes and sort:
allNodes.Sort(new NodeComparer());
As a Dictionary<TKey, TValue> must have unique keys, I'd use List<Node<T>> instead. For instance, if your Node<T> class has a Value property:
class Node<T>
{
    public float Value { get; set; }
    // other properties
}
and you want to sort by this property, use LINQ:
var list = new List<Node<T>>();
// populate list
var smallest = list.OrderBy(n => n.Value).FirstOrDefault();
To remove the nodes one by one, smallest first, sort the list and then iterate through it:
list = list.OrderBy(n => n.Value).ToList(); // ensure smallest-first order
while (list.Count > 0)
{
    // index 0 is always the current smallest
    list.RemoveAt(0);
}
So I have a couple of different lists that I'm trying to process and merge into one list.
Below is a snippet of code that I'd like to improve if there is a better way of doing this.
The reason why I'm asking is that some of these lists are rather large. I want to see if there is a more efficient way of doing this.
As you can see, I'm looping through a list, and the first thing I do is check whether the CompanyId exists in the other list. If it does, I then find the item in that list that I'm going to process.
pList is my processing list. I'm adding the values from my different lists into this list.
I'm wondering if there is a "better way" of accomplishing the Exists and Find.
bool tstFind = false;
foreach (parseAC item in pACList)
{
    tstFind = pList.Exists(x => x.CompanyId == item.key.ToString());
    if (tstFind == true)
    {
        pItem = pList.Find(x => x.CompanyId == item.key.ToString());
        //Processing done here. pItem gets updated here
        ...
    }
}
Just as a side note, I'm going to research whether a join would be faster, but I haven't gotten there yet. The above code is my first cut at solving this issue and it appears to work. However, since I have the time, I want to see if there is a better way.
Any input is greatly appreciated.
Time Findings:
My current Find and Exists code takes about 84 minutes to loop through the 5.5M items in the pACList.
Using pList.FirstOrDefault(x => x.CompanyId == item.key.ToString()); takes 54 minutes to loop through 5.5M items in the pACList.
You can retrieve the item with FirstOrDefault instead of searching for it twice (once to determine whether it exists, and a second time to get the existing item):
var tstFind = pList.FirstOrDefault(x => x.CompanyId == item.key.ToString());
if (tstFind != null)
{
    //Processing done here. pItem gets updated here
}
Yes, use a hashtable so that your algorithm is O(n) instead of O(n*m) which it is right now.
var pListByCompanyId = pList.ToDictionary(x => x.CompanyId);
foreach (parseAC item in pACList)
{
    if (pListByCompanyId.ContainsKey(item.key.ToString()))
    {
        pItem = pListByCompanyId[item.key.ToString()];
        //Processing done here. pItem gets updated here
        ...
    }
}
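One caveat worth sketching: ToDictionary throws if two pList items share a CompanyId. If that can happen, ToLookup groups items by key instead:
// ToLookup tolerates duplicate keys; indexing a missing key yields an
// empty sequence, so no ContainsKey test is needed.
var pListByCompanyId = pList.ToLookup(x => x.CompanyId);
foreach (parseAC item in pACList)
{
    foreach (var pItem in pListByCompanyId[item.key.ToString()])
    {
        //Processing done here. pItem gets updated here
    }
}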
You can iterate through the filtered list using LINQ:
foreach (parseAC item in pACList.Where(i => pList.Any(x => x.CompanyId == i.key.ToString())))
{
    pItem = pList.Find(x => x.CompanyId == item.key.ToString());
    //Processing done here. pItem gets updated here
    ...
}
Using lists for this type of operation is O(MxN) (M is the count of pACList, N is the count of pList). Additionally, you are searching pList twice per item. To avoid that, use pList.FirstOrDefault as recommended by @lazyberezovsky.
However, if possible I would avoid using lists. A Dictionary indexed by the key you're searching on would greatly improve the lookup time.
Doing a linear search on the list for each item in another list is not efficient for large data sets. It is preferable to put the keys into a table or dictionary that can be searched much more efficiently, allowing you to join the two tables. You don't even need to code this yourself; what you want is a Join operation. You want to get all of the pairs of items from the two sequences that map to the same key.
Either pull out the implementation of the method below, or change Foo and Bar to the appropriate types and use it as a method.
public static IEnumerable<Tuple<Bar, Foo>> Merge(IEnumerable<Bar> pACList,
    IEnumerable<Foo> pList)
{
    return pACList.Join(pList,
        item => item.Key.ToString(),
        item => item.CompanyID.ToString(),
        (a, b) => Tuple.Create(a, b));
}
You can use the results of this call to merge the two items together, as they will have the same key.
Internally the method will create a lookup table that allows for efficient searching before actually doing the searching.
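Hypothetical usage of the Merge method above:
foreach (var pair in Merge(pACList, pList))
{
    var a = pair.Item1; // the pACList (Bar) item
    var b = pair.Item2; // the matching pList (Foo) item
    //Processing done here
}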
Convert pList to a HashSet, then query pHashSet.Contains(). Complexity: O(N) + O(n).
Sort pList on CompanyId and do Array.BinarySearch(): O(N log N) + O(n log N).
If the maximum company id is not prohibitively large, simply create an array in which the item with company id i sits at the i-th position. Nothing can be faster.
where N is the size of pList and n is the size of pACList.
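A sketch of that last option; the PItem element type and integer ids are assumptions, since the question stores CompanyId as a string:
// Direct-address table: the item with company id i lives at index i.
// Assumes ids are non-negative ints with a known, reasonable maximum.
var byId = new PItem[maxCompanyId + 1];
foreach (var p in pList)
{
    byId[int.Parse(p.CompanyId)] = p;
}

foreach (parseAC item in pACList)
{
    var pItem = byId[item.key]; // O(1), no hashing at all
    if (pItem != null)
    {
        //Processing done here
    }
}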
I have listA and listB, where listA is a subset of listB. For example, one element is deleted from listB and two elements are added to it. Then, if listA contains the element that was deleted from listB, it should be deleted from listA too. Also, listA should gain the newly added elements.
At present I am using foreach { if (list.Contains(...)) } twice: once to add and once to delete. This will be O(2n), which is OK.
But is there a better way to do this, ideally in O(n), with LINQ or any other way?
To be more clear:
Actually I have a list of a custom class, from which I am forming listA (using one field of that class). listB is just a list of strings which I get from a web service.
Code:
// First foreach loop which I was talking about.
foreach (string A in listA)
{
    if (listB.Contains(A))
    {
    }
    else
    {
        // getting items that were deleted from listB
    }
}
// Second foreach loop which I was talking about.
foreach (string A in listB)
{
    if (listA.Contains(A))
    {
    }
    else
    {
        // getting items that were newly added to listB
    }
}
And then I am updating that List<custom class> accordingly. My main question is: instead of using two foreach loops, can I do something better?
This might be more efficient (although that depends):
var notInA = listB.Except(listA).ToList();
var notInB = listA.Except(listB).ToList();
foreach (var a in notInA)
    listA.Add(a);
foreach (var b in notInB)
    listA.Remove(b);
Note that you need to implement a custom IEqualityComparer<T> if T is a custom class.
EDIT: So this is just synchronizing both lists. Maybe I've misunderstood the question, but can't you simply do:
listA = new List<T>(listB);
Can you use events/delegates instead of foreach? Read a discussion here.
You should be able to make these lists observable. When one list is updated, the CollectionChanged event is triggered, and you can add code to update your other list. You should be able to make this two-way. See Observable Collections: here.
Furthermore, observable collections allow you to detect which kind of event occurred on your collection (i.e. Add, Remove, Replace, etc.). This should help you in the process of updating the other list with the same information.
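A minimal sketch of that idea; the string element type is assumed from the question (ObservableCollection lives in System.Collections.ObjectModel):
var listB = new ObservableCollection<string>();
var listA = new List<string>();

// Mirror additions and removals from listB into listA.
listB.CollectionChanged += (sender, e) =>
{
    if (e.NewItems != null)
        foreach (string added in e.NewItems)
            listA.Add(added);
    if (e.OldItems != null)
        foreach (string removed in e.OldItems)
            listA.Remove(removed);
    // Note: a Reset (e.g. Clear()) reports no items and needs extra handling.
};

listB.Add("x");    // listA now contains "x"
listB.Remove("x"); // listA is empty again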