Check if an object equals another object through a transitive relationship - c#

I have a class ValuePair with two properties defined in it:
public class ValuePair: IEquatable<ValuePair>
{
public string value1;
public string value2;
public ValuePair(string v1, string v1)
{
this.value1 = v1;
this.value2 = v2;
}
...
}
I have some test data in a List as defined below:
List<ValuePair> pairs = new ValuePair<ValuePair>();
pairs.Add(new ValuePair("A","B"));
pairs.Add(new ValuePair("A","C"));
pairs.Add(new ValuePair("B","C"));
pairs.Add(new ValuePair("C","D"));
My goal is to keep pairs[0] and pairs[1] because the pairs "A,B" and "A,C" are unique, but to remove pair[2] because the relationship "B,C" has already been captured in the first two relationships. pairs[3] should remain since the "C,D" relationship is unique.
I have a feeling the solution to this problem will be recursive, which is something that I have very little experience with. I started going down a path of adding a method to the class ValuePair that looks something like this:
public string EqualToEither(ValuePair v)
{
if (v.value1 == this.value1 || v.value1 == this.value2)
return v.value1;
else if (v.value2 == this.value1 || v.value2 == this.value2)
return v.value2;
else
return string.Empty;
}
I've started to try to use the above method inside of a function like this, but I am getting hung up on what to do next:
for (int i = 0; i < pairs.Count; i++)
{
for (int j = pairs.Count - 1; j >= 0; j--)
{
if (pairs[j].EqualToEither(pairs[i]) != string.Empty)
{
pairs[j].EqualToEither(pairs[i]);
}
else
{
continue;
}
}
}
I feel like I am close but still unable to get it. Can anyone please offer some guidance? If I'm approaching this the completely wrong way please let me know, thank you!

I had to solve a similar problem recently, here is how I solved it:
Transitivity is best represented, in my opinion, by grouping interrelated elements together.
For each pair you have to validate if it already belongs to a group (both values are already in the group) or if it extends the relation of a group (only one of the values belong to the group).
In the case it does not belong in any group, it becomes a new group.
In the case both values belong in different groups then you have to merge them.
As mentioned, this is closely related to a spanning tree.
One solution could be to use HashSets to represent the transitivity of your relations (I did not use HashSets in my case, there are many possible solutions).
Each HashSet would represent a group of interrelated elements.
Example implementation using HashSets:
List<ValuePair> pairs = new List<ValuePair>();
pairs.Add(new ValuePair("A", "B"));
pairs.Add(new ValuePair("A", "C"));
pairs.Add(new ValuePair("B", "C"));
pairs.Add(new ValuePair("C", "D"));
List<ValuePair> uniquePairs = new List<ValuePair>();
// this list is not really needed if all you care about
// is getting the resulting groups
List<HashSet<string>> sets = new List<HashSet<string>>();
foreach (ValuePair pair in pairs)
{
int value1Set = -1;
int value2Set = -1;
for (int i = 0; i < sets.Count; i++)
{
HashSet<string> set = sets[i];
if (set.Contains(pair.value1))
value1Set = i;
if (set.Contains(pair.value2))
value2Set = i;
}
if (value1Set == -1 && value2Set == -1)
{
// we have a new set
sets.Add(new HashSet<string> {pair.value1, pair.value2});
}
else if (value1Set == -1)
{
sets[value2Set].Add(pair.value1);
}
else if (value2Set == -1)
{
sets[value1Set].Add(pair.value2);
}
else
{
if (value1Set == value2Set)
{
// duplicate entry, skip the add
continue;
}
// merge the sets at value1Set and value2Set
foreach (string value in sets[value2Set])
{
sets[value1Set].Add(value);
}
sets.RemoveAt(value2Set);
}
uniquePairs.Add(pair);
}

Related

Logical Inversion of Symbol Tree

I have a class, Symbol_Group, that represents an invertible expression of the nature AB(C+DE) + FG. Symbol_Group contains a List<List<iSymbol>>, where iSymbol is an interface applied to Symbol_Group, and Symbol.
The above equation would be represented as A,B,Sym_Grp + F,G; Sym_Grp = C + D,E, where each + represents a new List<iSymbol>
I need to be able to invert and expand this equation using an algorithm that can handle any amount of nesting, and any amount of symbols anded or ored together, to produce a set of Symbol_Group, with each containing a unique expansion. For the above question the answer set would be !A!F; !B!F; !C!D!F; !C!E!F; !A!G; !B!G; !C!D!G; !C!E!G;
I know that I will need to use recursion, but I have had very little experience with it. Any help figuring out this algorithm would be appreciated.
Unless you are somehow required to use a List<List<iSymbol>>, I recommend switching to a different class structure, with a base class (or interface) Expression and subclasses (or implementors) SymbolExpression, NotExpression, OrExpression, and AndExpression. A SymbolExpression contains a single symbol; a NotExpression contains one Expression, and OrExpression and AndExpression contain two expressions each. This is a much more standard structure for working with mathematical expressions, and it is probably simpler to perform the transformations on it.
With the above classes, you can model any expression as a binary tree. Negate the expression by replacing the root by a NotExpression whose child is the original root. Then, traverse the tree with a depth-first search, and whenever you hit a NotExpression whose child is an OrExpression or an AndExpression, you can replace that by an AndExpression or an OrExpression (respectively) whose children are NotExpressions with the original children below them. You might also want to eliminate double negations (look for NotExpressions whose child is a NotExpression, and remove both).
(Whether this answer is understandable probably depends on how comfortable you are with working with trees. Let me know if you need clarification.)
After much work, this is the method I used to get the minimum terms of inversion.
public List<iSymbol> GetInvertedGroup()
{
TrimSymbolList();
List<List<iSymbol>> symbols = this.CopyListMembers(Symbols);
List<iSymbol> SymList;
while (symbols.Count > 1)
{
symbols.Add(MultiplyLists(symbols[0], symbols[1]));
symbols.RemoveRange(0, 2);
}
SymList = symbols[0];
for(int i=0;i<symbols[0].Count;i++)
{
if (SymList[i] is Symbol)
{
Symbol sym = SymList[i] as Symbol;
SymList.RemoveAt(i--);
Symbol_Group symgrp = new Symbol_Group(null);
symgrp.AddSymbol(sym);
SymList.Add(symgrp);
}
}
for (int i = 0; i < SymList.Count; i++)
{
if (SymList[i] is Symbol_Group)
{
Symbol_Group SymGrp = SymList[i] as Symbol_Group;
if (SymGrp.Symbols.Count > 1)
{
List<iSymbol> list = SymGrp.GetInvertedGroup();
SymList.RemoveAt(i--);
AddElementsOf(list, SymList);
}
}
}
return SymList;
}
public List<iSymbol> MultiplyLists(List<iSymbol> L1, List<iSymbol> L2)
{
List<iSymbol> Combined = new List<iSymbol>(L1.Count + L2.Count);
foreach (iSymbol S1 in L1)
{
foreach (iSymbol S2 in L2)
{
Symbol_Group newGrp = new Symbol_Group(null);
newGrp.AddSymbol(S1);
newGrp.AddSymbol(S2);
Combined.Add(newGrp);
}
}
return Combined;
}
This resulted in a List of Groups of Symbols, with each group representing 1 or term in the final result (e.g !A!F). Some further code was used to reduce this to a List>, as there was a reasonable amount of nesting in the answer. To reduce it, I used:
public List<List<Symbol>> ReduceList(List<iSymbol> List)
{
List<List<Symbol>> Output = new List<List<Symbol>>(List.Count);
foreach (iSymbol iSym in List)
{
if (iSym is Symbol_Group)
{
List<Symbol> L = new List<Symbol>();
(iSym as Symbol_Group).GetAllSymbols(L);
Output.Add(L);
}
else
{
throw (new Exception());
}
}
return Output;
}
public void GetAllSymbols(List<Symbol> List)
{
foreach (List<iSymbol> SubList in Symbols)
{
foreach (iSymbol iSym in SubList)
{
if (iSym is Symbol)
{
List.Add(iSym as Symbol);
}
else if (iSym is Symbol_Group)
{
(iSym as Symbol_Group).GetAllSymbols(List);
}
else
{
throw(new Exception());
}
}
}
}
Hope this helps someone else!
I came to this simpler solution after a bit of rejigging. I hope it helps out somebody else with a similar problem! This is the class structure (plus a few other properties)
public class SymbolGroup : iSymbol
{
public SymbolGroup(SymbolGroup Parent, SymRelation Relation)
{
Symbols = new List<iSymbol>();
this.Parent = Parent;
SymbolRelation = Relation;
if (SymbolRelation == SymRelation.AND)
Name = "AND Group";
else
Name = "OR Group";
}
public int Depth
{
get
{
foreach (iSymbol s in Symbols)
{
if (s is SymbolGroup)
{
return (s as SymbolGroup).Depth + 1;
}
}
return 1;
}
}
}
The method of inversion is also contained within this class. It replaces an unexpanded group in the results list with all of the expanded results of that result. It only strips away one level at a time.
public List<SymbolGroup> InvertGroup()
{
List<SymbolGroup> Results = new List<SymbolGroup>();
foreach (iSymbol s in Symbols)
{
if (s is SymbolGroup)
{
SymbolGroup sg = s as SymbolGroup;
sg.Parent = null;
Results.Add(s as SymbolGroup);
}
else if (s is Symbol)
{
SymbolGroup sg = new SymbolGroup(null, SymRelation.AND);
sg.AddSymbol(s);
Results.Add(sg);
}
}
bool AllChecked = false;
while (!AllChecked)
{
AllChecked = true;
for(int i=0;i<Results.Count;i++)
{
SymbolGroup result = Results[i];
if (result.Depth > 1)
{
AllChecked = false;
Results.RemoveAt(i--);
}
else
continue;
if (result.SymbolRelation == SymRelation.OR)
{
Results.AddRange(result.MultiplyOut());
continue;
}
for(int j=0;j<result.nSymbols;j++)
{
iSymbol s = result.Symbols[j];
if (s is SymbolGroup)
{
result.Symbols.RemoveAt(j--); //removes the symbolgroup that is being replaced, so that the rest of the group can be added to the expansion.
AllChecked = false;
SymbolGroup subResult = s as SymbolGroup;
if(subResult.SymbolRelation == SymRelation.OR)
{
List<SymbolGroup> newResults;
newResults = subResult.MultiplyOut();
foreach(SymbolGroup newSg in newResults)
{
newSg.Symbols.AddRange(result.Symbols);
}
Results.AddRange(newResults);
}
break;
}
}
}
}
return Results;
}

Removing duplicates from string array

I'm new to C#, have looked at numerous posts but am still confused.
I have a array list:
List<Array> moves = new List<Array>();
I'm adding moves to it using the following:
string[] newmove = { piece, axis.ToString(), direction.ToString() };
moves.Add(newmove);
And now I wish to remove duplicates using the following:
moves = moves.Distinct();
However it's not letting me do it. I get this error:
Cannot implicitly convert type 'System.Collections.Generic.IEnumerable' to 'System.Collections.Generic.List'. An explicit conversion exists (are you missing a cast?)
Help please? I'd be so grateful.
Steve
You need to call .ToList() after the .Distinct method as it returns IEnumerable<T>. I would also recommend you using a strongly typed List<string[]> instead of List<Array>:
List<string[]> moves = new List<string[]>();
string[] newmove = { piece, axis.ToString(), direction.ToString() };
moves.Add(newmove);
moves.Add(newmove);
moves = moves.Distinct().ToList();
// At this stage moves.Count = 1
Your code has two errors. The first is the missing call to ToList, as already pointed out. The second is subtle. Unique compares objects by identity, but your duplicate list items have are different array instances.
There are multiple solutions for that problem.
Use a custom equality comparer in moves.Distinct().ToList(). No further changes necessary.
Sample implementation:
class ArrayEqualityComparer<T> : EqualityComparer<T> {
public override bool Equals(T[] x, T[] y) {
if ( x == null ) return y == null;
else if ( y == null ) return false;
return x.SequenceEquals(y);
}
public override int GetHashCode(T[] obj) {
if ( obj == null) return 0;
return obj.Aggregate(0, (hash, x) => hash ^ x.GetHashCode());
}
}
Filtering for unique items:
moves = moves.Distinct(new ArrayEqualityComparer<string>()).ToList();
Use Tuple<string,string,string> instead of string[]. Tuple offers built-in structural equality and comparison. This variant might make your code cluttered because of the long type name.
Instantiation:
List<Tuple<string, string, string>> moves =
new List<Tuple<string, string, string>>();
Adding new moves:
Tuple<string, string, string> newmove =
Tuple.Create(piece, axis.ToString(), direction.ToString());
moves.Add(newmove);
Filtering for unique items:
moves = moves.Distinct().ToList();
Use a custom class to hold your three values. I'd actually recommend this variant, because it makes all your code dealing with moves much more readable.
Sample implementation:
class Move {
public Move(string piece, string axis, string direction) {
Piece = piece;
Axis = axis;
Direction = direction;
}
string Piece { get; private set; }
string Axis { get; private set; }
string Direction { get; private set; }
public override Equals(object obj) {
Move other = obj as Move;
if ( other != null )
return Piece == other.Piece &&
Axis == other.Axis &&
Direction == other.Direction;
return false;
}
public override GetHashCode() {
return Piece.GetHashCode() ^
Axis.GetHashCode() ^
Direction.GetHashCode();
}
// TODO: override ToString() as well
}
Instantiation:
List<Move> moves = new List<Move>();
Adding new moves:
Move newmove = new Move(piece, axis.ToString(), direction.ToString());
moves.Add(newmove);
Filtering for unique items:
moves = moves.Distinct().ToList();
The compiler error is because you need to convert the result to a list:
moves = moves.Distinct().ToList();
However it probably won't work as you want, because arrays don't have Equals defined in the way that you are hoping (it compares the references of the array objects, not the values inside the array). Instead of using an array, create a class to hold your data and define Equals and GetHashCode to compare the values.
Old question, but this is an O(n) solution using O(1) additional space:
public static void RemoveDuplicates(string[] array)
{
int c = 0;
int i = -1;
for (int n = 1; n < array.Length; n++)
{
if (array[c] == array[n])
{
if (i == -1)
{
i = n;
}
}
else
{
if (i == -1)
{
c++;
}
else
{
array[i] = array[n];
c++;
i++;
}
}
}
}

Performance ideas (in-memory C# hashset and contains too slow)

I have the following code
private void LoadIntoMemory()
{
//Init large HashSet
HashSet<document> hsAllDocuments = new HashSet<document>();
//Get first rows from database
List<document> docsList = document.GetAllAboveDocID(0, 500000);
//Load objects into dictionary
foreach (document d in docsList)
{
hsAllDocuments.Add(d);
}
Application["dicAllDocuments"] = hsAllDocuments;
}
private HashSet<document> documentHits(HashSet<document> hsRawHit, HashSet<document> hsAllDocuments, string query, string[] queryArray)
{
int counter = 0;
const int maxCount = 1000;
foreach (document d in hsAllDocuments)
{
//Headline
if (d.Headline.Contains(query))
{
if (counter >= maxCount)
break;
hsRawHit.Add(d);
counter++;
}
//Description
if (d.Description.Contains(query))
{
if (counter >= maxCount)
break;
hsRawHit.Add(d);
counter++;
}
//splitted query word by word
//string[] queryArray = query.Split(' ');
if (queryArray.Count() > 1)
{
foreach (string q in queryArray)
{
if (d.Headline.Contains(q))
{
if (counter >= maxCount)
break;
hsRawHit.Add(d);
counter++;
}
//Description
if (d.Description.Contains(q))
{
if (counter >= maxCount)
break;
hsRawHit.Add(d);
counter++;
}
}
}
}
return hsRawHit;
}
First I load all the data into a hashset (via Application for later use) - runs fine - totally OK to be slow for what I'm doing.
Will be running 4.0 framework in C# (can't update to the new upgrade for 4.0 with the async stuff).
The documentHits method runs fairly slow on my current setup - considering that it's all in memory. What can I do to speed up this method?
Examples would be awesome - thanks.
I see that you are using a HashSet, but you are not using any of it's advantages, so you should just use a List instead.
What's taking time is looping through all the documents and looking for strings in other strings, so you should try to elliminate as much as possible of that.
One possibility is to set up indexes of which documents contains which character pairs. If the string query contains Hello, you would be looking in the documents that contains He, el, ll and lo.
You could set up a Dictionary<string, List<int>> where the dictionary key is the character combinations and the list contains indexes to the documents in your document list. Setting up the dictionary will take some time, of course, but you can focus on the character combinations that are less common. If a character combination exists in 80% of the documents, it's pretty useless for elliminating documents, but if a character combination exists in only 2% of the documents it has elliminated 98% of your work.
If you loop through the documents in the list and add occurances to the lists in the dictionary, the lists of indexes will be sorted, so it will be very easy to join the lists later on. When you add indexes to the list, you can throw away lists when they get too large and you see that they would not be useful for elliminating documents. That way you will only be keeping the shorter lists and they will not consume so much memory.
Edit:
It put together a small example:
public class IndexElliminator<T> {
private List<T> _items;
private Dictionary<string, List<int>> _index;
private Func<T, string> _getContent;
private static HashSet<string> GetPairs(string value) {
HashSet<string> pairs = new HashSet<string>();
for (int i = 1; i < value.Length; i++) {
pairs.Add(value.Substring(i - 1, 2));
}
return pairs;
}
public IndexElliminator(List<T> items, Func<T, string> getContent, int maxIndexSize) {
_items = items;
_getContent = getContent;
_index = new Dictionary<string, List<int>>();
for (int index = 0;index<_items.Count;index++) {
T item = _items[index];
foreach (string pair in GetPairs(_getContent(item))) {
List<int> list;
if (_index.TryGetValue(pair, out list)) {
if (list != null) {
if (list.Count == maxIndexSize) {
_index[pair] = null;
} else {
list.Add(index);
}
}
} else {
list = new List<int>();
list.Add(index);
_index.Add(pair, list);
}
}
}
}
private static List<int> JoinLists(List<int> list1, List<int> list2) {
List<int> result = new List<int>();
int i1 = 0, i2 = 0;
while (i1 < list1.Count && i2 < list2.Count) {
switch (Math.Sign(list1[i1].CompareTo(list2[i2]))) {
case 0: result.Add(list1[i1]); i1++; i2++; break;
case -1: i1++; break;
case 1: i2++; break;
}
}
return result;
}
public List<T> Find(string query) {
HashSet<string> pairs = GetPairs(query);
List<List<int>> indexes = new List<List<int>>();
bool found = false;
foreach (string pair in pairs) {
List<int> list;
if (_index.TryGetValue(pair, out list)) {
found = true;
if (list != null) {
indexes.Add(list);
}
}
}
List<T> result = new List<T>();
if (found && indexes.Count == 0) {
indexes.Add(Enumerable.Range(0, _items.Count).ToList());
}
if (indexes.Count > 0) {
while (indexes.Count > 1) {
indexes[indexes.Count - 2] = JoinLists(indexes[indexes.Count - 2], indexes[indexes.Count - 1]);
indexes.RemoveAt(indexes.Count - 1);
}
foreach (int index in indexes[0]) {
if (_getContent(_items[index]).Contains(query)) {
result.Add(_items[index]);
}
}
}
return result;
}
}
Test:
List<string> items = new List<string> {
"Hello world",
"How are you",
"What is this",
"Can this be true",
"Some phrases",
"Words upon words",
"What to do",
"Where to go",
"When is this",
"How can this be",
"Well above margin",
"Close to the center"
};
IndexElliminator<string> index = new IndexElliminator<string>(items, s => s, items.Count / 2);
List<string> found = index.Find("this");
foreach (string s in found) Console.WriteLine(s);
Output:
What is this
Can this be true
When is this
How can this be
You are running linearly through all documents to find matches - this is O(n), you could do better if you solved the inverse problem, similar to how a fulltext index works: start with the query terms and preprocess the set of documents that match each query term - since this might get complicated I would suggest just using a DB with fulltext capability, this will be much faster than your approach.
Also you are abusing HashSet - instead just use a List, and don't put in duplicates - all the cases in documentHits() that produce a match should be exclusive.
If you have a whole lot of time at the start to create the database, you can look into using a Trie.
A Trie will make the string search much faster.
There's a little explanation and an implementation in the end here.
Another implementation: Trie class
You should not test each document for all test steps!
Instead it you should go to the next document after first successeful test result.
hsRawHit.Add(d);
counter++;
you should add continue; after counter++;
hsRawHit.Add(d);
counter++;
continue;

How to refactor this routine to avoid the use of recursion?

So I was writing a mergesort in C# as an exercise and although it worked, looking back at the code, there was room for improvement.
Basically, the second part of the algorithm requires a routine to merge two sorted lists.
Here is my way too long implementation that could use some refactoring:
private static List<int> MergeSortedLists(List<int> sLeft, List<int> sRight)
{
if (sLeft.Count == 0 || sRight.Count == 0)
{
sLeft.AddRange(sRight);
return sLeft;
}
else if (sLeft.Count == 1 && sRight.Count == 1)
{
if (sLeft[0] <= sRight[0])
sLeft.Add(sRight[0]);
else
sLeft.Insert(0, sRight[0]);
return sLeft;
}
else if (sLeft.Count == 1 && sRight.Count > 1)
{
for (int i=0; i<sRight.Count; i++)
{
if (sLeft[0] <= sRight[i])
{
sRight.Insert(i, sLeft[0]);
return sRight;
}
}
sRight.Add(sLeft[0]);
return sRight;
}
else if (sLeft.Count > 1 && sRight.Count == 1)
{
for (int i=0; i<sLeft.Count; i++)
{
if (sRight[0] <= sLeft[i])
{
sLeft.Insert(i, sRight[0]);
return sLeft;
}
}
sLeft.Add(sRight[0]);
return sLeft;
}
else
{
List<int> list = new List<int>();
if (sLeft[0] <= sRight[0])
{
list.Add(sLeft[0]);
sLeft.RemoveAt(0);
}
else
{
list.Add(sRight[0]);
sRight.RemoveAt(0);
}
list.AddRange(MergeSortedLists(sLeft, sRight));
return list;
}
}
Surely this routine can be improved/shortened by removing recursion, etc. There are even other ways to merge 2 sorted lists. So any refactoring is welcome.
Although I do have an answer, I'm curious as to how would other programmers would go about improving this routine.
Thank you!
Merging two sorted lists can be done in O(n).
List<int> lList, rList, resultList;
int r,l = 0;
while(l < lList.Count && r < rList.Count)
{
if(lList[l] < rList[r]
resultList.Add(lList[l++]);
else
resultList.Add(rList[r++]);
}
//And add the missing parts.
while(l < lList.Count)
resultList.Add(lList[l++]);
while(r < rList.Count)
resultList.Add(rList[r++]);
My take on this would be:
private static List<int> MergeSortedLists(List<int> sLeft, List<int> sRight)
{
List<int> result = new List<int>();
int indexLeft = 0;
int indexRight = 0;
while (indexLeft < sLeft.Count || indexRight < sRight.Count)
{
if (indexRight == sRight.Count ||
(indexLeft < sLeft.Count && sLeft[indexLeft] < sRight[indexRight]))
{
result.Add(sLeft[indexLeft]);
indexLeft++;
}
else
{
result.Add(sRight[indexRight]);
indexRight++;
}
}
return result;
}
Exactly what I'd do if I had to do it by hand. =)
Are you really sure your code works at all? Without testing it, i see the following:
...
else if (sLeft.Count > 1 && sRight.Count == 0) //<-- sRight is empty
{
for (int i=0; i<sLeft.Count; i++)
{
if (sRight[0] <= sLeft[i]) //<-- IndexError?
{
sLeft.Insert(i, sRight[0]);
return sLeft;
}
}
sLeft.Add(sRight[0]);
return sLeft;
}
...
As a starting point, I would remove your special cases for when either of the lists has Count == 1 - they can be handled by your more general (currently recursing) case.
The if (sLeft.Count > 1 && sRight.Count == 0) will never be true because you've checked for sRight.Count == 0 at the start - so this code will never be reached and is redundant.
Finally, instead of recursing (which is very costly in this case due to the number of new Lists you create - one per element!), I'd do something like this in your else (actually, this could replace your entire method):
List<int> list = new List<int>();
while (sLeft.Count > 0 && sRight.Count > 0)
{
if (sLeft[0] <= sRight[0])
{
list.Add(sLeft[0]);
sLeft.RemoveAt(0);
}
else
{
list.Add(sRight[0]);
sRight.RemoveAt(0);
}
}
// one of these two is already empty; the other is in sorted order...
list.AddRange(sLeft);
list.AddRange(sRight);
return list;
(Ideally I'd refactor this to use integer indexes against each list, instead of using .RemoveAt, because it's more performant to loop through the list than destroy it, and because it might be useful to leave the original lists intact. This is still more efficient code than the original, though!)
You were asking for differrent approaches as well. I might do as below depending on the usage. The below code is lazy so it will not sort the entire list at once but only when elements are requested.
class MergeEnumerable<T> : IEnumerable<T>
{
public IEnumerator<T> GetEnumerator()
{
var left = _left.GetEnumerator();
var right = _right.GetEnumerator();
var leftHasSome = left.MoveNext();
var rightHasSome = right.MoveNext();
while (leftHasSome || rightHasSome)
{
if (leftHasSome && rightHasSome)
{
if(_comparer.Compare(left.Current,right.Current) < 0)
{
yield return returner(left);
} else {
yield return returner(right);
}
}
else if (rightHasSome)
{
returner(right);
}
else
{
returner(left);
}
}
}
private T returner(IEnumerator<T> enumerator)
{
var current = enumerator.Current;
enumerator.MoveNext();
return current;
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return ((IEnumerable<T>)this).GetEnumerator();
}
private IEnumerable<T> _left;
private IEnumerable<T> _right;
private IComparer<T> _comparer;
MergeEnumerable(IEnumerable<T> left, IEnumerable<T> right, IComparer<T> comparer)
{
_left = left;
_right = right;
_comparer = comparer;
}
}
EDIT: It's basically the same implementatin as Sergey Osypchuk his will from start to finish when looking only at the sorting be fastest but the latency will be higher as well due to the fact of sorting the entire list upfront. So as I said depending on the usage I might go with this approach and an alternative would be something similar to Sergey Osypchuk
Often you can use a stack instead of use recursion
Merge list (by theory, input lists are sorted in advance) sorting could be implemented in following way:
List<int> MergeSorting(List<int> a, List<int> b)
{
int apos = 0;
int bpos = 0;
List<int> result = new List<int>();
while (apos < a.Count && bpos < b.Count)
{
int avalue = int.MaxValue;
int bvalue = int.MaxValue;
if (apos < a.Count)
avalue = a[apos];
if (bpos < b.Count)
bvalue = b[bpos];
if (avalue < bvalue)
{
result.Add(avalue);
apos++;
}
else
{
result.Add(bvalue);
bpos++;
}
}
return result;
}
In case you start with not sorted list you need to split it by sorted subsequence and than marge them using function above
I never use recursion for merge sort. You can make iterative passes over the input, taking advantage of the fact that the sorted block size doubles with every merge pass. Keep track of the block size and the count of items you've processed from each input list; when they're equal, the list is exhausted. When both lists are exhausted you can move on to the next pair of blocks. When the block size is greater than or equal to your input size, you're done.
Edit: Some of the information I had left previously was incorrect, due to my misunderstanding - a List in C# is similar to an array and not a linked list. My apologies.

Remove duplicates from a List<T> in C#

Anyone have a quick method for de-duplicating a generic List in C#?
If you're using .Net 3+, you can use Linq.
List<T> withDupes = LoadSomeData();
List<T> noDupes = withDupes.Distinct().ToList();
Perhaps you should consider using a HashSet.
From the MSDN link:
using System;
using System.Collections.Generic;
class Program
{
static void Main()
{
HashSet<int> evenNumbers = new HashSet<int>();
HashSet<int> oddNumbers = new HashSet<int>();
for (int i = 0; i < 5; i++)
{
// Populate numbers with just even numbers.
evenNumbers.Add(i * 2);
// Populate oddNumbers with just odd numbers.
oddNumbers.Add((i * 2) + 1);
}
Console.Write("evenNumbers contains {0} elements: ", evenNumbers.Count);
DisplaySet(evenNumbers);
Console.Write("oddNumbers contains {0} elements: ", oddNumbers.Count);
DisplaySet(oddNumbers);
// Create a new HashSet populated with even numbers.
HashSet<int> numbers = new HashSet<int>(evenNumbers);
Console.WriteLine("numbers UnionWith oddNumbers...");
numbers.UnionWith(oddNumbers);
Console.Write("numbers contains {0} elements: ", numbers.Count);
DisplaySet(numbers);
}
private static void DisplaySet(HashSet<int> set)
{
Console.Write("{");
foreach (int i in set)
{
Console.Write(" {0}", i);
}
Console.WriteLine(" }");
}
}
/* This example produces output similar to the following:
* evenNumbers contains 5 elements: { 0 2 4 6 8 }
* oddNumbers contains 5 elements: { 1 3 5 7 9 }
* numbers UnionWith oddNumbers...
* numbers contains 10 elements: { 0 2 4 6 8 1 3 5 7 9 }
*/
How about:
var noDupes = list.Distinct().ToList();
In .net 3.5?
Simply initialize a HashSet with a List of the same type:
var noDupes = new HashSet<T>(withDupes);
Or, if you want a List returned:
var noDupsList = new HashSet<T>(withDupes).ToList();
Sort it, then check two and two next to each others, as the duplicates will clump together.
Something like this:
list.Sort();
Int32 index = list.Count - 1;
while (index > 0)
{
if (list[index] == list[index - 1])
{
if (index < list.Count - 1)
(list[index], list[list.Count - 1]) = (list[list.Count - 1], list[index]);
list.RemoveAt(list.Count - 1);
index--;
}
else
index--;
}
Notes:
Comparison is done from back to front, to avoid having to resort list after each removal
This example now uses C# Value Tuples to do the swapping, substitute with appropriate code if you can't use that
The end-result is no longer sorted
I like to use this command:
List<Store> myStoreList = Service.GetStoreListbyProvince(provinceId)
.GroupBy(s => s.City)
.Select(grp => grp.FirstOrDefault())
.OrderBy(s => s.City)
.ToList();
I have these fields in my list: Id, StoreName, City, PostalCode
I wanted to show list of cities in a dropdown which has duplicate values.
solution: Group by city then pick the first one for the list.
It worked for me. simply use
List<Type> liIDs = liIDs.Distinct().ToList<Type>();
Replace "Type" with your desired type e.g. int.
As kronoz said in .Net 3.5 you can use Distinct().
In .Net 2 you could mimic it:
public IEnumerable<T> DedupCollection<T> (IEnumerable<T> input)
{
var passedValues = new HashSet<T>();
// Relatively simple dupe check alg used as example
foreach(T item in input)
if(passedValues.Add(item)) // True if item is new
yield return item;
}
This could be used to dedupe any collection and will return the values in the original order.
It's normally much quicker to filter a collection (as both Distinct() and this sample does) than it would be to remove items from it.
An extension method might be a decent way to go... something like this:
public static List<T> Deduplicate<T>(this List<T> listToDeduplicate)
{
return listToDeduplicate.Distinct().ToList();
}
And then call like this, for example:
List<int> myFilteredList = unfilteredList.Deduplicate();
In Java (I assume C# is more or less identical):
list = new ArrayList<T>(new HashSet<T>(list))
If you really wanted to mutate the original list:
List<T> noDupes = new ArrayList<T>(new HashSet<T>(list));
list.clear();
list.addAll(noDupes);
To preserve order, simply replace HashSet with LinkedHashSet.
This takes distinct (the elements without duplicating elements) and convert it into a list again:
List<type> myNoneDuplicateValue = listValueWithDuplicate.Distinct().ToList();
Use Linq's Union method.
Note: This solution requires no knowledge of Linq, aside from that it exists.
Code
Begin by adding the following to the top of your class file:
using System.Linq;
Now, you can use the following to remove duplicates from an object called, obj1:
obj1 = obj1.Union(obj1).ToList();
Note: Rename obj1 to the name of your object.
How it works
The Union command lists one of each entry of two source objects. Since obj1 is both source objects, this reduces obj1 to one of each entry.
The ToList() returns a new List. This is necessary, because Linq commands like Union returns the result as an IEnumerable result instead of modifying the original List or returning a new List.
As a helper method (without Linq):
public static List<T> Distinct<T>(this List<T> list)
{
return (new HashSet<T>(list)).ToList();
}
Here's an extension method for removing adjacent duplicates in-situ. Call Sort() first and pass in the same IComparer. This should be more efficient than Lasse V. Karlsen's version which calls RemoveAt repeatedly (resulting in multiple block memory moves).
public static void RemoveAdjacentDuplicates<T>(this List<T> List, IComparer<T> Comparer)
{
int NumUnique = 0;
for (int i = 0; i < List.Count; i++)
if ((i == 0) || (Comparer.Compare(List[NumUnique - 1], List[i]) != 0))
List[NumUnique++] = List[i];
List.RemoveRange(NumUnique, List.Count - NumUnique);
}
Installing the MoreLINQ package via Nuget, you can easily distinct object list by a property
IEnumerable<Catalogue> distinctCatalogues = catalogues.DistinctBy(c => c.CatalogueCode);
If you have tow classes Product and Customer and we want to remove duplicate items from their list
public class Product
{
public int Id { get; set; }
public string ProductName { get; set; }
}
public class Customer
{
public int Id { get; set; }
public string CustomerName { get; set; }
}
You must define a generic class in the form below
public class ItemEqualityComparer<T> : IEqualityComparer<T> where T : class
{
private readonly PropertyInfo _propertyInfo;
public ItemEqualityComparer(string keyItem)
{
_propertyInfo = typeof(T).GetProperty(keyItem, BindingFlags.GetProperty | BindingFlags.Instance | BindingFlags.Public);
}
public bool Equals(T x, T y)
{
var xValue = _propertyInfo?.GetValue(x, null);
var yValue = _propertyInfo?.GetValue(y, null);
return xValue != null && yValue != null && xValue.Equals(yValue);
}
public int GetHashCode(T obj)
{
var propertyValue = _propertyInfo.GetValue(obj, null);
return propertyValue == null ? 0 : propertyValue.GetHashCode();
}
}
then, You can remove duplicate items in your list.
var products = new List<Product>
{
new Product{ProductName = "product 1" ,Id = 1,},
new Product{ProductName = "product 2" ,Id = 2,},
new Product{ProductName = "product 2" ,Id = 4,},
new Product{ProductName = "product 2" ,Id = 4,},
};
var productList = products.Distinct(new ItemEqualityComparer<Product>(nameof(Product.Id))).ToList();
var customers = new List<Customer>
{
new Customer{CustomerName = "Customer 1" ,Id = 5,},
new Customer{CustomerName = "Customer 2" ,Id = 5,},
new Customer{CustomerName = "Customer 2" ,Id = 5,},
new Customer{CustomerName = "Customer 2" ,Id = 5,},
};
var customerList = customers.Distinct(new ItemEqualityComparer<Customer>(nameof(Customer.Id))).ToList();
this code remove duplicate items by Id if you want remove duplicate items by other property, you can change nameof(YourClass.DuplicateProperty) same nameof(Customer.CustomerName) then remove duplicate items by CustomerName Property.
If you don't care about the order you can just shove the items into a HashSet, if you do want to maintain the order you can do something like this:
var unique = new List<T>();
var hs = new HashSet<T>();
foreach (T t in list)
if (hs.Add(t))
unique.Add(t);
Or the Linq way:
var hs = new HashSet<T>();
list.All( x => hs.Add(x) );
Edit: The HashSet method is O(N) time and O(N) space while sorting and then making unique (as suggested by #lassevk and others) is O(N*lgN) time and O(1) space so it's not so clear to me (as it was at first glance) that the sorting way is inferior
Might be easier to simply make sure that duplicates are not added to the list.
if(items.IndexOf(new_item) < 0)
items.add(new_item)
You can use Union
obj2 = obj1.Union(obj1).ToList();
Another way in .Net 2.0
static void Main(string[] args)
{
List<string> alpha = new List<string>();
for(char a = 'a'; a <= 'd'; a++)
{
alpha.Add(a.ToString());
alpha.Add(a.ToString());
}
Console.WriteLine("Data :");
alpha.ForEach(delegate(string t) { Console.WriteLine(t); });
alpha.ForEach(delegate (string v)
{
if (alpha.FindAll(delegate(string t) { return t == v; }).Count > 1)
alpha.Remove(v);
});
Console.WriteLine("Unique Result :");
alpha.ForEach(delegate(string t) { Console.WriteLine(t);});
Console.ReadKey();
}
There are many ways to solve - the duplicates issue in the List, below is one of them:
List<Container> containerList = LoadContainer();//Assume it has duplicates
List<Container> filteredList = new List<Container>();
foreach (var container in containerList)
{
Container duplicateContainer = containerList.Find(delegate(Container checkContainer)
{ return (checkContainer.UniqueId == container.UniqueId); });
//Assume 'UniqueId' is the property of the Container class on which u r making a search
if(!containerList.Contains(duplicateContainer) //Add object when not found in the new class object
{
filteredList.Add(container);
}
}
Cheers
Ravi Ganesan
Here's a simple solution that doesn't require any hard-to-read LINQ or any prior sorting of the list.
private static void CheckForDuplicateItems(List<string> items)
{
if (items == null ||
items.Count == 0)
return;
for (int outerIndex = 0; outerIndex < items.Count; outerIndex++)
{
for (int innerIndex = 0; innerIndex < items.Count; innerIndex++)
{
if (innerIndex == outerIndex) continue;
if (items[outerIndex].Equals(items[innerIndex]))
{
// Duplicate Found
}
}
}
}
David J.'s answer is a good method, no need for extra objects, sorting, etc. It can be improved on however:
for (int innerIndex = items.Count - 1; innerIndex > outerIndex ; innerIndex--)
So the outer loop goes top bottom for the entire list, but the inner loop goes bottom "until the outer loop position is reached".
The outer loop makes sure the entire list is processed, the inner loop finds the actual duplicates, those can only happen in the part that the outer loop hasn't processed yet.
Or if you don't want to do bottom up for the inner loop you could have the inner loop start at outerIndex + 1.
A simple intuitive implementation:
public static List<PointF> RemoveDuplicates(List<PointF> listPoints)
{
List<PointF> result = new List<PointF>();
for (int i = 0; i < listPoints.Count; i++)
{
if (!result.Contains(listPoints[i]))
result.Add(listPoints[i]);
}
return result;
}
All answers copy lists, or create a new list, or use slow functions, or are just painfully slow.
To my understanding, this is the fastest and cheapest method I know (also, backed by a very experienced programmer specialized on real-time physics optimization).
// Duplicates will be noticed after a sort O(nLogn)
list.Sort();
// Store the current and last items. Current item declaration is not really needed, and probably optimized by the compiler, but in case it's not...
int lastItem = -1;
int currItem = -1;
int size = list.Count;
// Store the index pointing to the last item we want to keep in the list
int last = size - 1;
// Travel the items from last to first O(n)
for (int i = last; i >= 0; --i)
{
currItem = list[i];
// If this item was the same as the previous one, we don't want it
if (currItem == lastItem)
{
// Overwrite last in current place. It is a swap but we don't need the last
list[i] = list[last];
// Reduce the last index, we don't want that one anymore
last--;
}
// A new item, we store it and continue
else
lastItem = currItem;
}
// We now have an unsorted list with the duplicates at the end.
// Remove the last items just once
list.RemoveRange(last + 1, size - last - 1);
// Sort again O(n logn)
list.Sort();
Final cost is:
nlogn + n + nlogn = n + 2nlogn = O(nlogn) which is pretty nice.
Note about RemoveRange:
Since we cannot set the count of the list and avoid using the Remove funcions, I don't know exactly the speed of this operation but I guess it is the fastest way.
Using HashSet this can be done easily.
List<int> listWithDuplicates = new List<int> { 1, 2, 1, 2, 3, 4, 5 };
HashSet<int> hashWithoutDuplicates = new HashSet<int> ( listWithDuplicates );
List<int> listWithoutDuplicates = hashWithoutDuplicates.ToList();
Using HashSet:
list = new HashSet<T>(list).ToList();
public static void RemoveDuplicates<T>(IList<T> list )
{
if (list == null)
{
return;
}
int i = 1;
while(i<list.Count)
{
int j = 0;
bool remove = false;
while (j < i && !remove)
{
if (list[i].Equals(list[j]))
{
remove = true;
}
j++;
}
if (remove)
{
list.RemoveAt(i);
}
else
{
i++;
}
}
}
If you need to compare complex objects, you will need to pass a Comparer object inside the Distinct() method.
private void GetDistinctItemList(List<MyListItem> _listWithDuplicates)
{
//It might be a good idea to create MyListItemComparer
//elsewhere and cache it for performance.
List<MyListItem> _listWithoutDuplicates = _listWithDuplicates.Distinct(new MyListItemComparer()).ToList();
//Choose the line below instead, if you have a situation where there is a chance to change the list while Distinct() is running.
//ToArray() is used to solve "Collection was modified; enumeration operation may not execute" error.
//List<MyListItem> _listWithoutDuplicates = _listWithDuplicates.ToArray().Distinct(new MyListItemComparer()).ToList();
return _listWithoutDuplicates;
}
Assuming you have 2 other classes like:
public class MyListItemComparer : IEqualityComparer<MyListItem>
{
public bool Equals(MyListItem x, MyListItem y)
{
return x != null
&& y != null
&& x.A == y.A
&& x.B.Equals(y.B);
&& x.C.ToString().Equals(y.C.ToString());
}
public int GetHashCode(MyListItem codeh)
{
return codeh.GetHashCode();
}
}
And:
public class MyListItem
{
public int A { get; }
public string B { get; }
public MyEnum C { get; }
public MyListItem(int a, string b, MyEnum c)
{
A = a;
B = b;
C = c;
}
}
I think the simplest way is:
Create a new list and add unique item.
Example:
class MyList{
int id;
string date;
string email;
}
List<MyList> ml = new Mylist();
ml.Add(new MyList(){
id = 1;
date = "2020/09/06";
email = "zarezadeh#gmailcom"
});
ml.Add(new MyList(){
id = 2;
date = "2020/09/01";
email = "zarezadeh#gmailcom"
});
List<MyList> New_ml = new Mylist();
foreach (var item in ml)
{
if (New_ml.Where(w => w.email == item.email).SingleOrDefault() == null)
{
New_ml.Add(new MyList()
{
id = item.id,
date = item.date,
email = item.email
});
}
}

Categories

Resources