Removing duplicates from string array

Removing duplicates from string array - c#

I'm new to C#, have looked at numerous posts but am still confused.
I have a array list:
List<Array> moves = new List<Array>();
I'm adding moves to it using the following:
string[] newmove = { piece, axis.ToString(), direction.ToString() };
moves.Add(newmove);
And now I wish to remove duplicates using the following:
moves = moves.Distinct();
However it's not letting me do it. I get this error:
Cannot implicitly convert type 'System.Collections.Generic.IEnumerable' to 'System.Collections.Generic.List'. An explicit conversion exists (are you missing a cast?)
Help please? I'd be so grateful.
Steve

You need to call .ToList() after the .Distinct method as it returns IEnumerable<T>. I would also recommend you using a strongly typed List<string[]> instead of List<Array>:
List<string[]> moves = new List<string[]>();
string[] newmove = { piece, axis.ToString(), direction.ToString() };
moves.Add(newmove);
moves.Add(newmove);
moves = moves.Distinct().ToList();
// At this stage moves.Count = 1

Your code has two errors. The first is the missing call to ToList, as already pointed out. The second is subtle. Unique compares objects by identity, but your duplicate list items have are different array instances.
There are multiple solutions for that problem.
Use a custom equality comparer in moves.Distinct().ToList(). No further changes necessary.
Sample implementation:
class ArrayEqualityComparer<T> : EqualityComparer<T> {
public override bool Equals(T[] x, T[] y) {
if ( x == null ) return y == null;
else if ( y == null ) return false;
return x.SequenceEquals(y);
}
public override int GetHashCode(T[] obj) {
if ( obj == null) return 0;
return obj.Aggregate(0, (hash, x) => hash ^ x.GetHashCode());
}
}
Filtering for unique items:
moves = moves.Distinct(new ArrayEqualityComparer<string>()).ToList();
Use Tuple<string,string,string> instead of string[]. Tuple offers built-in structural equality and comparison. This variant might make your code cluttered because of the long type name.
Instantiation:
List<Tuple<string, string, string>> moves =
new List<Tuple<string, string, string>>();
Adding new moves:
Tuple<string, string, string> newmove =
Tuple.Create(piece, axis.ToString(), direction.ToString());
moves.Add(newmove);
Filtering for unique items:
moves = moves.Distinct().ToList();
Use a custom class to hold your three values. I'd actually recommend this variant, because it makes all your code dealing with moves much more readable.
Sample implementation:
class Move {
public Move(string piece, string axis, string direction) {
Piece = piece;
Axis = axis;
Direction = direction;
}
string Piece { get; private set; }
string Axis { get; private set; }
string Direction { get; private set; }
public override Equals(object obj) {
Move other = obj as Move;
if ( other != null )
return Piece == other.Piece &&
Axis == other.Axis &&
Direction == other.Direction;
return false;
}
public override GetHashCode() {
return Piece.GetHashCode() ^
Axis.GetHashCode() ^
Direction.GetHashCode();
}
// TODO: override ToString() as well
}
Instantiation:
List<Move> moves = new List<Move>();
Adding new moves:
Move newmove = new Move(piece, axis.ToString(), direction.ToString());
moves.Add(newmove);
Filtering for unique items:
moves = moves.Distinct().ToList();

The compiler error is because you need to convert the result to a list:
moves = moves.Distinct().ToList();
However it probably won't work as you want, because arrays don't have Equals defined in the way that you are hoping (it compares the references of the array objects, not the values inside the array). Instead of using an array, create a class to hold your data and define Equals and GetHashCode to compare the values.

Old question, but this is an O(n) solution using O(1) additional space:
public static void RemoveDuplicates(string[] array)
{
int c = 0;
int i = -1;
for (int n = 1; n < array.Length; n++)
{
if (array[c] == array[n])
{
if (i == -1)
{
i = n;
}
}
else
{
if (i == -1)
{
c++;
}
else
{
array[i] = array[n];
c++;
i++;
}
}
}
}

Related

Check if an object equals another object through a transitive relationship

I have a class ValuePair with two properties defined in it:
public class ValuePair: IEquatable<ValuePair>
{
public string value1;
public string value2;
public ValuePair(string v1, string v1)
{
this.value1 = v1;
this.value2 = v2;
}
...
}
I have some test data in a List as defined below:
List<ValuePair> pairs = new ValuePair<ValuePair>();
pairs.Add(new ValuePair("A","B"));
pairs.Add(new ValuePair("A","C"));
pairs.Add(new ValuePair("B","C"));
pairs.Add(new ValuePair("C","D"));
My goal is to keep pairs[0] and pairs[1] because the pairs "A,B" and "A,C" are unique, but to remove pair[2] because the relationship "B,C" has already been captured in the first two relationships. pairs[3] should remain since the "C,D" relationship is unique.
I have a feeling the solution to this problem will be recursive, which is something that I have very little experience with. I started going down a path of adding a method to the class ValuePair that looks something like this:
public string EqualToEither(ValuePair v)
{
if (v.value1 == this.value1 || v.value1 == this.value2)
return v.value1;
else if (v.value2 == this.value1 || v.value2 == this.value2)
return v.value2;
else
return string.Empty;
}
I've started to try to use the above method inside of a function like this, but I am getting hung up on what to do next:
for (int i = 0; i < pairs.Count; i++)
{
for (int j = pairs.Count - 1; j >= 0; j--)
{
if (pairs[j].EqualToEither(pairs[i]) != string.Empty)
{
pairs[j].EqualToEither(pairs[i]);
}
else
{
continue;
}
}
}
I feel like I am close but still unable to get it. Can anyone please offer some guidance? If I'm approaching this the completely wrong way please let me know, thank you!

I had to solve a similar problem recently, here is how I solved it:
Transitivity is best represented, in my opinion, by grouping interrelated elements together.
For each pair you have to validate if it already belongs to a group (both values are already in the group) or if it extends the relation of a group (only one of the values belong to the group).
In the case it does not belong in any group, it becomes a new group.
In the case both values belong in different groups then you have to merge them.
As mentioned, this is closely related to a spanning tree.
One solution could be to use HashSets to represent the transitivity of your relations (I did not use HashSets in my case, there are many possible solutions).
Each HashSet would represent a group of interrelated elements.
Example implementation using HashSets:
List<ValuePair> pairs = new List<ValuePair>();
pairs.Add(new ValuePair("A", "B"));
pairs.Add(new ValuePair("A", "C"));
pairs.Add(new ValuePair("B", "C"));
pairs.Add(new ValuePair("C", "D"));
List<ValuePair> uniquePairs = new List<ValuePair>();
// this list is not really needed if all you care about
// is getting the resulting groups
List<HashSet<string>> sets = new List<HashSet<string>>();
foreach (ValuePair pair in pairs)
{
int value1Set = -1;
int value2Set = -1;
for (int i = 0; i < sets.Count; i++)
{
HashSet<string> set = sets[i];
if (set.Contains(pair.value1))
value1Set = i;
if (set.Contains(pair.value2))
value2Set = i;
}
if (value1Set == -1 && value2Set == -1)
{
// we have a new set
sets.Add(new HashSet<string> {pair.value1, pair.value2});
}
else if (value1Set == -1)
{
sets[value2Set].Add(pair.value1);
}
else if (value2Set == -1)
{
sets[value1Set].Add(pair.value2);
}
else
{
if (value1Set == value2Set)
{
// duplicate entry, skip the add
continue;
}
// merge the sets at value1Set and value2Set
foreach (string value in sets[value2Set])
{
sets[value1Set].Add(value);
}
sets.RemoveAt(value2Set);
}
uniquePairs.Add(pair);
}

OutOfMemoryException when updating a large list?

I have a large list and I would like to overwrite one value if required. To do this, I create two subsets of the list which seems to give me an OutOfMemoryException. Here is my code snippet:
if (ownRG != "")
{
List<string> maclist = ownRG.Split(',').ToList();
List<IVFile> temp = powlist.Where(a => maclist.Contains(a.Machine)).ToList();
powlist = powlist.Where(a => !maclist.Contains(a.Machine)).ToList(); // OOME Here
temp.ForEach(a => { a.ReportingGroup = ownRG; });
powlist.AddRange(temp);
}
Essentially I'm splitting the list into the part that needs updating and the part that doesn't, then I perform the update and put the list back together. This works fine for smaller lists, but breaks with an OutOfMemoryException on the third row within the if for a large list. Can I make this more efficient?
NOTE
powlist is the large list (>1m) items. maclist only has between 1 and 10 but even with 1 item this breaks.

Solving your issue
Here is how to rearrange your code using the enumerator code from my answer:
if (!string.IsNullOrEmpty(ownRG))
{
var maclist = new CommaSeparatedStringEnumerable(str);
var temp = powlist.Where(a => maclist.Contains(a.Machine));
foreach (var p in temp)
{
p.ReportingGroup = ownRG;
}
}
You should not use ToList in your code.
You don't need to remove thee contents of temp from powlist (you are re-adding them anyway)
Streaming over a large comma-separated string
You can iterate over the list manually instead of doing what you do now, by looking for , characters and remembering the position of the last found one and the one before. This will definitely make your app work because then it won't need to store the entire set in the memory at once.
Code example:
var str = "aaa,bbb,ccc";
var previousComma = -1;
var currentComma = 0;
for (; (currentComma = str.IndexOf(',', previousComma + 1)) != -1; previousComma = currentComma)
{
var currentItem = str.Substring(previousComma + 1, currentComma - previousComma - 1);
Console.WriteLine(currentItem);
}
var lastItem = str.Substring(previousComma + 1);
Console.WriteLine(lastItem);
Custom iterator
If you want to do it 'properly' in a fancy way, you can even write a custom enumerator:
public class CommaSeparatedStringEnumerator : IEnumerator<string>
{
int previousComma = -1;
int currentComma = -1;
string bigString = null;
bool atEnd = false;
public CommaSeparatedStringEnumerator(string s)
{
if (s == null)
throw new ArgumentNullException("s");
bigString = s;
this.Reset();
}
public string Current { get; private set; }
public void Dispose() { /* No need to do anything here */ }
object IEnumerator.Current { get { return this.Current; } }
public bool MoveNext()
{
if (atEnd)
return false;
atEnd = (currentComma = bigString.IndexOf(',', previousComma + 1)) == -1;
if (!atEnd)
Current = bigString.Substring(previousComma + 1, currentComma - previousComma - 1);
else
Current = bigString.Substring(previousComma + 1);
previousComma = currentComma;
return true;
}
public void Reset()
{
previousComma = -1;
currentComma = -1;
atEnd = false;
this.Current = null;
}
}
public class CommaSeparatedStringEnumerable : IEnumerable<string>
{
string bigString = null;
public CommaSeparatedStringEnumerable(string s)
{
if (s == null)
throw new ArgumentNullException("s");
bigString = s;
}
public IEnumerator<string> GetEnumerator()
{
return new CommaSeparatedStringEnumerator(bigString);
}
IEnumerator IEnumerable.GetEnumerator()
{
return this.GetEnumerator();
}
}
Then you can iterate over it like this:
var str = "aaa,bbb,ccc";
var enumerable = new CommaSeparatedStringEnumerable(str);
foreach (var item in enumerable)
{
Console.WriteLine(item);
}
Other thoughts
Can I make this more efficient?
Yes, you can. I suggest to either work with a more efficient data format (you can take a look around databases or XML, JSON, etc. depending on your needs). If you really want to work with comma-separated items, see my code examples above.

There's no need to create a bunch of sub-lists from powlist and reconstruct it. Simply loop over the powlist and update the ReportingGroup property accordingly.
var maclist = new HashSet<string>( ownRG.Split(',') );
foreach( var item in powlist) {
if( maclist.Contains( item.Machine ) ){
item.ReportingGroup = ownRG;
}
}
Since this changes powlist in place, you won't allocate any extra memory and shouldn't run into an OutOfMemoryException.

In a loop find the next ',' char. Take the substring between the ',' and the previous ',' position. At the end of the loop save a reference to the previous ',' position (which is initially set to 0). So you parse the items one-by-one rather than all at once.

You can try looping the items of your lists, but this will increase processing time.
foreach(var item in powlist)
{
//do your opeartions
}

BinarySearch on List<T> seems to be returning strange result

I am very new to C#. I have created a List object and then I am performing BinarySearch on a particular item. But the search results seem to strange. Here is the code :
class Element
{
public int x;
public Element(int val) { x = val; }
}
class MyContainer : IComparable<MyContainer>
{
public Element elem;
public MyContainer(int val) { elem = new Element(val); }
public MyContainer(Element e) { elem = e; }
public int CompareTo(MyContainer obj)
{
if (elem.x < obj.elem.x) { return -1; }
else if (elem.x == obj.elem.x) { return 0; }
else { return 1; }
}
}
class Program
{
static void Main(string[] args)
{
MyContainer container1 = new MyContainer(100);
MyContainer container2 = new MyContainer(21);
MyContainer container3 = new MyContainer(-122);
Element elemObj = new Element(-122);
List<MyContainer> list = new List<MyContainer>();
list.Add(new MyContainer(80));
list.Add(container1);
list.Add(container2);
list.Add(container3);
list.Add(new MyContainer(90));
foreach(MyContainer c in list) Console.WriteLine(c.elem.x);
if (list.Contains(container3) == true) Console.WriteLine("present");
else Console.WriteLine("NOT present");
Console.WriteLine("Search result:::"+list.BinarySearch(new MyContainer(elemObj)));
Console.WriteLine("Search result:::" + list.BinarySearch(container1));
Console.WriteLine("Search result:::" + list.BinarySearch(container2));
Console.WriteLine("Search result:::" + list.BinarySearch(container3));
}
}
The output is as follows :
80
100
21
-122
90
present
Search result:::-1
Search result:::-6
Search result:::2
Search result:::-1
Why is only the element corresponding to value 21 found and why not the others

Your list isn't sorted to start with. Binary search only works when the original input is sorted. The whole point is that you know that if list[x] = y, then list[a] <= y for all a < x and list[a] >= y for all a > x.
So either you need to sort your list first, or you need to choose a different way of searching (e.g. using a separate hash-based dictionary, or just doing a linear search).
Also note that your CompareTo method can be implemented a lot more simply:
public int CompareTo(MyContainer obj)
{
return elem.x.CompareTo(obj.elem.x);
}

A BinarySearch only works on a sorted list. Since your list is not sorted in the way it should be, it may or may not find the result correctly. You could fix this by sorting first:
list.Add(new MyContainer(90));
// after adding all elements
list.Sort();
foreach(MyContainer c in list) Console.WriteLine(c.elem.x);

fastest way for accessing double array as key in dictionary

I have a double[] array, i want to use it as key （not literally, but in the way that the key is matched when all the doubles in the double array need to be matched)
What is the fastest way to use the double[] array as key to dictionary?
Is it using
Dictionary<string, string> (convert double[] to a string)
or
anything else like converting it

Given that all key arrays will have the same length, either consider using a Tuple<,,, ... ,>, or use a structural equality comparer on the arrays.
With tuple:
var yourDidt = new Dictionary<Tuple<double, double, double>, string>();
yourDict.Add(Tuple.Create(3.14, 2.718, double.NaN), "da value");
string read = yourDict[Tuple.Create(3.14, 2.718, double.NaN)];
With (strongly typed version of) StructuralEqualityComparer:
class DoubleArrayStructuralEqualityComparer : EqualityComparer<double[]>
{
public override bool Equals(double[] x, double[] y)
{
return System.Collections.StructuralComparisons.StructuralEqualityComparer
.Equals(x, y);
}
public override int GetHashCode(double[] obj)
{
return System.Collections.StructuralComparisons.StructuralEqualityComparer
.GetHashCode(obj);
}
}
...
var yourDict = new Dictionary<double[], string>(
new DoubleArrayStructuralEqualityComparer());
yourDict.Add(new[] { 3.14, 2.718, double.NaN, }, "da value");
string read = yourDict[new[] { 3.14, 2.718, double.NaN, }];
Also consider the suggestion by Sergey Berezovskiy to create a custom class or (immutable!) struct to hold your set of doubles. In that way you can name your type and its members in a natural way that makes it more clear what you do. And your class/struct can easily be extended later on, if needed.

Thus all arrays have same length and each item in array have specific meaning, then create class which holds all items as properties with descriptive names. E.g. instead of double array with two items you can have class Point with properties X and Y. Then override Equals and GetHashCode of this class and use it as key (see What is the best algorithm for an overriding GetHashCode):
Dictionary<Point, string>
Benefits - instead of having array, you have data structure which makes its purpose clear. Instead of referencing items by indexes, you have nice named property names, which also make their purpose clear. And also speed - calculating hash code is fast. Compare:
double[] a = new [] { 12.5, 42 };
// getting first coordinate a[0];
Point a = new Point { X = 12.5, Y = 42 };
// getting first coordinate a.X

[Do not consider this a separate answer; this is an extension of #JeppeStigNielsen's answer]
I'd just like to point out that you make Jeppe's approach generic as follows:
public class StructuralEqualityComparer<T>: IEqualityComparer<T>
{
public bool Equals(T x, T y)
{
return StructuralComparisons.StructuralEqualityComparer.Equals(x, y);
}
public int GetHashCode(T obj)
{
return StructuralComparisons.StructuralEqualityComparer.GetHashCode(obj);
}
public static StructuralEqualityComparer<T> Default
{
get
{
StructuralEqualityComparer<T> comparer = _defaultComparer;
if (comparer == null)
{
comparer = new StructuralEqualityComparer<T>();
_defaultComparer = comparer;
}
return comparer;
}
}
private static StructuralEqualityComparer<T> _defaultComparer;
}
(From an original answer here: https://stackoverflow.com/a/5601068/106159)
Then you would declare the dictionary like this:
var yourDict = new Dictionary<double[], string>(new StructuralEqualityComparer<double[]>());
Note: It might be better to initialise _defaultComparer using Lazy<T>.
[EDIT]
It's possible that this might be faster; worth a try:
class DoubleArrayComparer: IEqualityComparer<double[]>
{
public bool Equals(double[] x, double[] y)
{
if (x == y)
return true;
if (x == null || y == null)
return false;
if (x.Length != y.Length)
return false;
for (int i = 0; i < x.Length; ++i)
if (x[i] != y[i])
return false;
return true;
}
public int GetHashCode(double[] data)
{
if (data == null)
return 0;
int result = 17;
foreach (var value in data)
result += result*23 + value.GetHashCode();
return result;
}
}
...
var yourDict = new Dictionary<double[], string>(new DoubleArrayComparer());

Ok this is what I found so far:
I input an entry (length 4 arrray) to the dictionary, and access it for 999999 times on my machine:
Dictionary<double[], string>(
new DoubleArrayStructuralEqualityComparer()); takes 1.75 seconds
Dictionary<Tuple<double...>,string> takes 0.85 seconds
The code below takes 0.1755285 seconds, which is the fastest now! (in line with the comment with Sergey.)
The fastest - The code of DoubleArrayComparer by Matthew Watson takes 0.15 seconds!
public class DoubleArray
{
private double[] d = null;
public DoubleArray(double[] d)
{
this.d = d;
}
public override bool Equals(object obj)
{
if (!(obj is DoubleArray)) return false;
DoubleArray dobj = (DoubleArray)obj;
if (dobj.d.Length != d.Length) return false;
for (int i = 0; i < d.Length; i++)
{
if (dobj.d[i] != d[i]) return false;
}
return true;
}
public override int GetHashCode()
{
unchecked // Overflow is fine, just wrap
{
int hash = 17;
for (int i = 0; i < d.Length;i++ )
{
hash = hash*23 + d[i].GetHashCode();
}
return hash;
}
}
}

Remove duplicates from a List<T> in C#

Anyone have a quick method for de-duplicating a generic List in C#?

If you're using .Net 3+, you can use Linq.
List<T> withDupes = LoadSomeData();
List<T> noDupes = withDupes.Distinct().ToList();

Perhaps you should consider using a HashSet.
From the MSDN link:
using System;
using System.Collections.Generic;
class Program
{
static void Main()
{
HashSet<int> evenNumbers = new HashSet<int>();
HashSet<int> oddNumbers = new HashSet<int>();
for (int i = 0; i < 5; i++)
{
// Populate numbers with just even numbers.
evenNumbers.Add(i * 2);
// Populate oddNumbers with just odd numbers.
oddNumbers.Add((i * 2) + 1);
}
Console.Write("evenNumbers contains {0} elements: ", evenNumbers.Count);
DisplaySet(evenNumbers);
Console.Write("oddNumbers contains {0} elements: ", oddNumbers.Count);
DisplaySet(oddNumbers);
// Create a new HashSet populated with even numbers.
HashSet<int> numbers = new HashSet<int>(evenNumbers);
Console.WriteLine("numbers UnionWith oddNumbers...");
numbers.UnionWith(oddNumbers);
Console.Write("numbers contains {0} elements: ", numbers.Count);
DisplaySet(numbers);
}
private static void DisplaySet(HashSet<int> set)
{
Console.Write("{");
foreach (int i in set)
{
Console.Write(" {0}", i);
}
Console.WriteLine(" }");
}
}
/* This example produces output similar to the following:
* evenNumbers contains 5 elements: { 0 2 4 6 8 }
* oddNumbers contains 5 elements: { 1 3 5 7 9 }
* numbers UnionWith oddNumbers...
* numbers contains 10 elements: { 0 2 4 6 8 1 3 5 7 9 }
*/

How about:
var noDupes = list.Distinct().ToList();
In .net 3.5?

Simply initialize a HashSet with a List of the same type:
var noDupes = new HashSet<T>(withDupes);
Or, if you want a List returned:
var noDupsList = new HashSet<T>(withDupes).ToList();

Sort it, then check two and two next to each others, as the duplicates will clump together.
Something like this:
list.Sort();
Int32 index = list.Count - 1;
while (index > 0)
{
if (list[index] == list[index - 1])
{
if (index < list.Count - 1)
(list[index], list[list.Count - 1]) = (list[list.Count - 1], list[index]);
list.RemoveAt(list.Count - 1);
index--;
}
else
index--;
}
Notes:
Comparison is done from back to front, to avoid having to resort list after each removal
This example now uses C# Value Tuples to do the swapping, substitute with appropriate code if you can't use that
The end-result is no longer sorted

I like to use this command:
List<Store> myStoreList = Service.GetStoreListbyProvince(provinceId)
.GroupBy(s => s.City)
.Select(grp => grp.FirstOrDefault())
.OrderBy(s => s.City)
.ToList();
I have these fields in my list: Id, StoreName, City, PostalCode
I wanted to show list of cities in a dropdown which has duplicate values.
solution: Group by city then pick the first one for the list.

It worked for me. simply use
List<Type> liIDs = liIDs.Distinct().ToList<Type>();
Replace "Type" with your desired type e.g. int.

As kronoz said in .Net 3.5 you can use Distinct().
In .Net 2 you could mimic it:
public IEnumerable<T> DedupCollection<T> (IEnumerable<T> input)
{
var passedValues = new HashSet<T>();
// Relatively simple dupe check alg used as example
foreach(T item in input)
if(passedValues.Add(item)) // True if item is new
yield return item;
}
This could be used to dedupe any collection and will return the values in the original order.
It's normally much quicker to filter a collection (as both Distinct() and this sample does) than it would be to remove items from it.

An extension method might be a decent way to go... something like this:
public static List<T> Deduplicate<T>(this List<T> listToDeduplicate)
{
return listToDeduplicate.Distinct().ToList();
}
And then call like this, for example:
List<int> myFilteredList = unfilteredList.Deduplicate();

In Java (I assume C# is more or less identical):
list = new ArrayList<T>(new HashSet<T>(list))
If you really wanted to mutate the original list:
List<T> noDupes = new ArrayList<T>(new HashSet<T>(list));
list.clear();
list.addAll(noDupes);
To preserve order, simply replace HashSet with LinkedHashSet.

This takes distinct (the elements without duplicating elements) and convert it into a list again:
List<type> myNoneDuplicateValue = listValueWithDuplicate.Distinct().ToList();

Use Linq's Union method.
Note: This solution requires no knowledge of Linq, aside from that it exists.
Code
Begin by adding the following to the top of your class file:
using System.Linq;
Now, you can use the following to remove duplicates from an object called, obj1:
obj1 = obj1.Union(obj1).ToList();
Note: Rename obj1 to the name of your object.
How it works
The Union command lists one of each entry of two source objects. Since obj1 is both source objects, this reduces obj1 to one of each entry.
The ToList() returns a new List. This is necessary, because Linq commands like Union returns the result as an IEnumerable result instead of modifying the original List or returning a new List.

As a helper method (without Linq):
public static List<T> Distinct<T>(this List<T> list)
{
return (new HashSet<T>(list)).ToList();
}

Here's an extension method for removing adjacent duplicates in-situ. Call Sort() first and pass in the same IComparer. This should be more efficient than Lasse V. Karlsen's version which calls RemoveAt repeatedly (resulting in multiple block memory moves).
public static void RemoveAdjacentDuplicates<T>(this List<T> List, IComparer<T> Comparer)
{
int NumUnique = 0;
for (int i = 0; i < List.Count; i++)
if ((i == 0) || (Comparer.Compare(List[NumUnique - 1], List[i]) != 0))
List[NumUnique++] = List[i];
List.RemoveRange(NumUnique, List.Count - NumUnique);
}

Installing the MoreLINQ package via Nuget, you can easily distinct object list by a property
IEnumerable<Catalogue> distinctCatalogues = catalogues.DistinctBy(c => c.CatalogueCode);

If you have tow classes Product and Customer and we want to remove duplicate items from their list
public class Product
{
public int Id { get; set; }
public string ProductName { get; set; }
}
public class Customer
{
public int Id { get; set; }
public string CustomerName { get; set; }
}
You must define a generic class in the form below
public class ItemEqualityComparer<T> : IEqualityComparer<T> where T : class
{
private readonly PropertyInfo _propertyInfo;
public ItemEqualityComparer(string keyItem)
{
_propertyInfo = typeof(T).GetProperty(keyItem, BindingFlags.GetProperty | BindingFlags.Instance | BindingFlags.Public);
}
public bool Equals(T x, T y)
{
var xValue = _propertyInfo?.GetValue(x, null);
var yValue = _propertyInfo?.GetValue(y, null);
return xValue != null && yValue != null && xValue.Equals(yValue);
}
public int GetHashCode(T obj)
{
var propertyValue = _propertyInfo.GetValue(obj, null);
return propertyValue == null ? 0 : propertyValue.GetHashCode();
}
}
then, You can remove duplicate items in your list.
var products = new List<Product>
{
new Product{ProductName = "product 1" ,Id = 1,},
new Product{ProductName = "product 2" ,Id = 2,},
new Product{ProductName = "product 2" ,Id = 4,},
new Product{ProductName = "product 2" ,Id = 4,},
};
var productList = products.Distinct(new ItemEqualityComparer<Product>(nameof(Product.Id))).ToList();
var customers = new List<Customer>
{
new Customer{CustomerName = "Customer 1" ,Id = 5,},
new Customer{CustomerName = "Customer 2" ,Id = 5,},
new Customer{CustomerName = "Customer 2" ,Id = 5,},
new Customer{CustomerName = "Customer 2" ,Id = 5,},
};
var customerList = customers.Distinct(new ItemEqualityComparer<Customer>(nameof(Customer.Id))).ToList();
this code remove duplicate items by Id if you want remove duplicate items by other property, you can change nameof(YourClass.DuplicateProperty) same nameof(Customer.CustomerName) then remove duplicate items by CustomerName Property.

If you don't care about the order you can just shove the items into a HashSet, if you do want to maintain the order you can do something like this:
var unique = new List<T>();
var hs = new HashSet<T>();
foreach (T t in list)
if (hs.Add(t))
unique.Add(t);
Or the Linq way:
var hs = new HashSet<T>();
list.All( x => hs.Add(x) );
Edit: The HashSet method is O(N) time and O(N) space while sorting and then making unique (as suggested by #lassevk and others) is O(N*lgN) time and O(1) space so it's not so clear to me (as it was at first glance) that the sorting way is inferior

Might be easier to simply make sure that duplicates are not added to the list.
if(items.IndexOf(new_item) < 0)
items.add(new_item)

You can use Union
obj2 = obj1.Union(obj1).ToList();

Another way in .Net 2.0
static void Main(string[] args)
{
List<string> alpha = new List<string>();
for(char a = 'a'; a <= 'd'; a++)
{
alpha.Add(a.ToString());
alpha.Add(a.ToString());
}
Console.WriteLine("Data :");
alpha.ForEach(delegate(string t) { Console.WriteLine(t); });
alpha.ForEach(delegate (string v)
{
if (alpha.FindAll(delegate(string t) { return t == v; }).Count > 1)
alpha.Remove(v);
});
Console.WriteLine("Unique Result :");
alpha.ForEach(delegate(string t) { Console.WriteLine(t);});
Console.ReadKey();
}

There are many ways to solve - the duplicates issue in the List, below is one of them:
List<Container> containerList = LoadContainer();//Assume it has duplicates
List<Container> filteredList = new List<Container>();
foreach (var container in containerList)
{
Container duplicateContainer = containerList.Find(delegate(Container checkContainer)
{ return (checkContainer.UniqueId == container.UniqueId); });
//Assume 'UniqueId' is the property of the Container class on which u r making a search
if(!containerList.Contains(duplicateContainer) //Add object when not found in the new class object
{
filteredList.Add(container);
}
}
Cheers
Ravi Ganesan

Here's a simple solution that doesn't require any hard-to-read LINQ or any prior sorting of the list.
private static void CheckForDuplicateItems(List<string> items)
{
if (items == null ||
items.Count == 0)
return;
for (int outerIndex = 0; outerIndex < items.Count; outerIndex++)
{
for (int innerIndex = 0; innerIndex < items.Count; innerIndex++)
{
if (innerIndex == outerIndex) continue;
if (items[outerIndex].Equals(items[innerIndex]))
{
// Duplicate Found
}
}
}
}

David J.'s answer is a good method, no need for extra objects, sorting, etc. It can be improved on however:
for (int innerIndex = items.Count - 1; innerIndex > outerIndex ; innerIndex--)
So the outer loop goes top bottom for the entire list, but the inner loop goes bottom "until the outer loop position is reached".
The outer loop makes sure the entire list is processed, the inner loop finds the actual duplicates, those can only happen in the part that the outer loop hasn't processed yet.
Or if you don't want to do bottom up for the inner loop you could have the inner loop start at outerIndex + 1.

A simple intuitive implementation:
public static List<PointF> RemoveDuplicates(List<PointF> listPoints)
{
List<PointF> result = new List<PointF>();
for (int i = 0; i < listPoints.Count; i++)
{
if (!result.Contains(listPoints[i]))
result.Add(listPoints[i]);
}
return result;
}

All answers copy lists, or create a new list, or use slow functions, or are just painfully slow.
To my understanding, this is the fastest and cheapest method I know (also, backed by a very experienced programmer specialized on real-time physics optimization).
// Duplicates will be noticed after a sort O(nLogn)
list.Sort();
// Store the current and last items. Current item declaration is not really needed, and probably optimized by the compiler, but in case it's not...
int lastItem = -1;
int currItem = -1;
int size = list.Count;
// Store the index pointing to the last item we want to keep in the list
int last = size - 1;
// Travel the items from last to first O(n)
for (int i = last; i >= 0; --i)
{
currItem = list[i];
// If this item was the same as the previous one, we don't want it
if (currItem == lastItem)
{
// Overwrite last in current place. It is a swap but we don't need the last
list[i] = list[last];
// Reduce the last index, we don't want that one anymore
last--;
}
// A new item, we store it and continue
else
lastItem = currItem;
}
// We now have an unsorted list with the duplicates at the end.
// Remove the last items just once
list.RemoveRange(last + 1, size - last - 1);
// Sort again O(n logn)
list.Sort();
Final cost is:
nlogn + n + nlogn = n + 2nlogn = O(nlogn) which is pretty nice.
Note about RemoveRange:
Since we cannot set the count of the list and avoid using the Remove funcions, I don't know exactly the speed of this operation but I guess it is the fastest way.

Using HashSet this can be done easily.
List<int> listWithDuplicates = new List<int> { 1, 2, 1, 2, 3, 4, 5 };
HashSet<int> hashWithoutDuplicates = new HashSet<int> ( listWithDuplicates );
List<int> listWithoutDuplicates = hashWithoutDuplicates.ToList();

Using HashSet:
list = new HashSet<T>(list).ToList();

public static void RemoveDuplicates<T>(IList<T> list )
{
if (list == null)
{
return;
}
int i = 1;
while(i<list.Count)
{
int j = 0;
bool remove = false;
while (j < i && !remove)
{
if (list[i].Equals(list[j]))
{
remove = true;
}
j++;
}
if (remove)
{
list.RemoveAt(i);
}
else
{
i++;
}
}
}

If you need to compare complex objects, you will need to pass a Comparer object inside the Distinct() method.
private void GetDistinctItemList(List<MyListItem> _listWithDuplicates)
{
//It might be a good idea to create MyListItemComparer
//elsewhere and cache it for performance.
List<MyListItem> _listWithoutDuplicates = _listWithDuplicates.Distinct(new MyListItemComparer()).ToList();
//Choose the line below instead, if you have a situation where there is a chance to change the list while Distinct() is running.
//ToArray() is used to solve "Collection was modified; enumeration operation may not execute" error.
//List<MyListItem> _listWithoutDuplicates = _listWithDuplicates.ToArray().Distinct(new MyListItemComparer()).ToList();
return _listWithoutDuplicates;
}
Assuming you have 2 other classes like:
public class MyListItemComparer : IEqualityComparer<MyListItem>
{
public bool Equals(MyListItem x, MyListItem y)
{
return x != null
&& y != null
&& x.A == y.A
&& x.B.Equals(y.B);
&& x.C.ToString().Equals(y.C.ToString());
}
public int GetHashCode(MyListItem codeh)
{
return codeh.GetHashCode();
}
}
And:
public class MyListItem
{
public int A { get; }
public string B { get; }
public MyEnum C { get; }
public MyListItem(int a, string b, MyEnum c)
{
A = a;
B = b;
C = c;
}
}

I think the simplest way is:
Create a new list and add unique item.
Example:
class MyList{
int id;
string date;
string email;
}
List<MyList> ml = new Mylist();
ml.Add(new MyList(){
id = 1;
date = "2020/09/06";
email = "zarezadeh#gmailcom"
});
ml.Add(new MyList(){
id = 2;
date = "2020/09/01";
email = "zarezadeh#gmailcom"
});
List<MyList> New_ml = new Mylist();
foreach (var item in ml)
{
if (New_ml.Where(w => w.email == item.email).SingleOrDefault() == null)
{
New_ml.Add(new MyList()
{
id = item.id,
date = item.date,
email = item.email
});
}
}

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Removing duplicates from string array - c#

Related

Check if an object equals another object through a transitive relationship

OutOfMemoryException when updating a large list?

BinarySearch on List<T> seems to be returning strange result

fastest way for accessing double array as key in dictionary

Remove duplicates from a List<T> in C#

Categories

Resources