Comparison between two string lists - c#

I have two string lists and I need to compare them to check whether at least one position holds the same value in both lists.
Can someone help me?
List<string> listA = new List<string>();
listA.Add("b");
listA.Add("c");
listA.Add("a");
List<string> listB = new List<string>();
listB.Add("h");
listB.Add("b");
listB.Add("d");
Expected output:
b = h false
b = b true (break)
b = d false

You should use LINQ:
if (listA.Intersect(listB).Any())
{
//do smth;
}
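For example, a quick sketch with the sample lists from the question (assuming using System.Linq; the Console.WriteLine is only there to show the result):
bool anyCommon = listA.Intersect(listB).Any();
Console.WriteLine(anyCommon); // prints True, because "b" appears in both lists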

Given that the lists are the same size, some function/method like:
for (int index = 0; index < listA.Count; index++) {
if (listA[index] == listB[index]) {
return true;
}
}
Should work.
If the lists aren't the same size, you can then do something like:
int smallListCount = listA.Count;
if (smallListCount > listB.Count) {
smallListCount = listB.Count;
}
for (int index = 0; index < smallListCount; index++) { /*... codeblah*/ }
That way the loop only runs over as many elements as the smaller list can provide for comparison.
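Putting the two pieces together, a minimal sketch might look like this (the method name HasMatchAtSamePosition is just illustrative):
private bool HasMatchAtSamePosition(List<string> listA, List<string> listB) {
    int smallListCount = Math.Min(listA.Count, listB.Count);
    for (int index = 0; index < smallListCount; index++) {
        if (listA[index] == listB[index]) {
            return true; // found a position where both lists hold the same value
        }
    }
    return false; // no position matched
}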
Erp! If you don't care about the order of the indexes then...
You'll need some function/method that can take an item, then iterates over listB to see if there is a match, returning true... if, well... it finds a match.
You'd then want to feed in the values of listA, and store the result for each element.
private bool CheckListForItem(List<string> listToCheck, string itemToCheckFor) {
foreach(string item in listToCheck) {
if (item == itemToCheckFor) {
return true;
}
}
return false;
}
which can then be used like:
bool[] results = new bool[listA.Count];
for(int index = 0; index < listA.Count; index++) {
results[index] = CheckListForItem(listB, listA[index]);
}
to obtain results for each individual item.
Note that this approach has a complexity of O(n·m) over the two lists (each lookup is linear), I'd assume LINQ is faster. But hey, this exposes a potential implementation! Kinda.

Try the Except LINQ extension method, which returns only the items from the first list that are not present in the second:
List<string> resultList = listA.Except(listB).ToList();
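With the sample lists from the question, a rough sketch of the outcome (the count check is just one way to answer the original question, and assumes listA has no duplicates, since Except also de-duplicates):
// resultList contains "c" and "a"; "b" is gone because it also appears in listB
bool anyShared = resultList.Count != listA.Count; // true here, since "b" was removed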

Here is something that does exactly what you want and doesn't use LINQ, if you aren't familiar with it.
We extend the List<string> class with a CompareLists extension method and then call it to see whether any value is the same in both lists.
public static class Extensions
{
public static bool CompareLists(this List<string> listA, List<string> listB)
{
foreach (string s in listA)
{
foreach (string s1 in listB)
{
if (s.Equals(s1))
{
return true;
}
}
}
return false;
}
}
and then use it like
var result = listA.CompareLists(listB);
With a little tweaking you can make it return a list of the values that appear in both lists.
public static class Extensions
{
public static List<string> CompareLists(this List<string> listA, List<string> listB)
{
List<string> results = new List<string>();
foreach (string s in listA)
{
foreach (string s1 in listB)
{
if (s.Equals(s1))
{
results.Add(s);
}
}
}
return results;
}
}
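Used the same way as before, a quick usage sketch with the sample data:
List<string> commonItems = listA.CompareLists(listB);
// commonItems contains just "b" for the lists in the question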


C# no duplicates in array

I have a file with a lot of sentences. I need to make a dictionary of the words from that file. Until now I've separated the words and sorted them using the Split() and Sort() methods. My problem is to make a list without duplicate words. How can I do that?
static int n = 0;
public static string[] NoDuplicate(string[] array)
{
int i;
string[] res = (string[])array.Clone();
for (i = 0; i < array.Length-1; i++)
{
if (array[i + 1] != array[i])
res[n++] = (string)array[i];
}
return res;
}
How can I do it more neatly?
I don't like that method because the result is initialized using Clone() and its length is too big.
You can also use a HashSet besides the .Distinct() feature of LINQ:
HashSet:
This is an optimized set collection. It helps eliminate duplicate strings or elements in an array. It is a set that hashes its contents.
public static string[] NoDuplicate(string[] array)
{
string[] result = new HashSet<string>(array).ToArray();
return result;
}
If you want to eliminate duplicates case-insensitively, you can pass an IEqualityComparer argument like this:
Using HashSet:
public static string[] NoDuplicate(string[] array)
{
string[] result = new HashSet<string>(array, StringComparer.OrdinalIgnoreCase)
.ToArray();
return result;
}
Using LINQ's Distinct feature:
public static string[] NoDuplicate(string[] array)
{
string[] result = array.Distinct(StringComparer.OrdinalIgnoreCase)
.ToArray();
return result;
}
Try this:
private static string[] NoDuplicate(string[] inputArray)
{
var result = inputArray.Distinct().ToArray();
return result;
}
Instead of a dictionary, create a trie of words.
At each level keep a count (sameWord) of how many times the word repeats. This way you can avoid using too much space, and looking up a word is fast: the cost depends on the word's length rather than on the number of distinct words. (The snippet below is written in Java-style syntax.)
public class WordList {
private int sameWord = 0;
String name = "";
WordList [] child = new WordList[26];
public void add( String s, WordList c, int index )
{
sameWord++;
if(index > 0)
{
name += ""+s.charAt(index-1);
}
if(index == s.length())
{
return;
}
if(c.child[s.charAt(index)-'a'] ==null)
{
c.child[s.charAt(index)-'a'] = new WordList();
}
add(s,c.child[s.charAt(index)-'a'],index+1);
}
public WordList findChar(char c)
{
return child[(int)(c-'a')];
}
}
You can try the solution below:
private static string[] NoDuplicate(string[] inputArray)
{
List<string> stringList = new List<string>();
foreach (string s in inputArray)
{
if (!stringList.Contains(s))
{
stringList.Add(s);
}
}
return stringList.ToArray();
}

Is there another way to take N at a time than a for loop?

Is there a more elegant way to implement going 5 items at a time than a for loop like this?
var q = Campaign_stats.OrderByDescending(c=>c.Leads).Select(c=>c.PID).Take(23);
var count = q.Count();
for (int i = 0; i < (count/5)+1; i++)
{
q.Skip(i*5).Take(5).Dump();
}
for(int i = 0; i <= count; i+=5)
{
}
So you want to efficiently call Dump() on every 5 items in q.
The solution you have now will re-iterate the IEnumerable<T> every time through the for loop. It may be more efficient to do something like this: (I don't know what your type is so I'm using T)
const int N = 5;
T[] ar = new T[N]; // Temporary array of N items.
int i=0;
foreach(var item in q) { // Just one iterator.
ar[i++] = item; // Store a reference to this item.
if (i == N) { // When we have N items,
ar.Dump(); // dump them,
i = 0; // and reset the array index.
}
}
// Dump the remaining items
if (i > 0) {
ar.Take(i).Dump();
}
This only uses one iterator. Considering your variable is named q, I'm assuming that is short for "query", which implies this is against a database. So using just one iterator may be very beneficial.
I may keep this code, and wrap it up in an extension method. How about "clump"?
public static IEnumerable<IEnumerable<T>> Clump<T>(this IEnumerable<T> items, int clumpSize) {
T[] ar = new T[clumpSize];
int i=0;
foreach(var item in items) {
ar[i++] = item;
if (i == clumpSize) {
yield return ar;
ar = new T[clumpSize]; // start a fresh buffer so the array just yielded isn't overwritten later
i = 0;
}
}
if (i > 0)
yield return ar.Take(i);
}
Calling it in the context of your code:
foreach (var clump in q.Clump(5)) {
clump.Dump();
}
try iterating by 5 instead!
for(int i = 0; i < count; i += 5)
{
//etc
}
Adding more LINQ with GroupBy and Zip:
q
// add indexes
.Zip(Enumerable.Range(0, Int32.MaxValue),(a,index)=> new {Index=index, Value=a})
.GroupBy(m=>m.Index /5) // divide in groups by 5 items each
.Select(k => {
k.Select(v => v.Value).Dump(); // Perform operation on 5 elements
return k.Key; // return something to satisfy Select.
})
.ToList(); // force enumeration: Select is lazy, so the Dump calls never run without it

Performance ideas (in-memory C# hashset and contains too slow)

I have the following code
private void LoadIntoMemory()
{
//Init large HashSet
HashSet<document> hsAllDocuments = new HashSet<document>();
//Get first rows from database
List<document> docsList = document.GetAllAboveDocID(0, 500000);
//Load objects into the hash set
foreach (document d in docsList)
{
hsAllDocuments.Add(d);
}
Application["dicAllDocuments"] = hsAllDocuments;
}
private HashSet<document> documentHits(HashSet<document> hsRawHit, HashSet<document> hsAllDocuments, string query, string[] queryArray)
{
int counter = 0;
const int maxCount = 1000;
foreach (document d in hsAllDocuments)
{
//Headline
if (d.Headline.Contains(query))
{
if (counter >= maxCount)
break;
hsRawHit.Add(d);
counter++;
}
//Description
if (d.Description.Contains(query))
{
if (counter >= maxCount)
break;
hsRawHit.Add(d);
counter++;
}
//splitted query word by word
//string[] queryArray = query.Split(' ');
if (queryArray.Count() > 1)
{
foreach (string q in queryArray)
{
if (d.Headline.Contains(q))
{
if (counter >= maxCount)
break;
hsRawHit.Add(d);
counter++;
}
//Description
if (d.Description.Contains(q))
{
if (counter >= maxCount)
break;
hsRawHit.Add(d);
counter++;
}
}
}
}
return hsRawHit;
}
First I load all the data into a hashset (via Application for later use) - runs fine - totally OK to be slow for what I'm doing.
This will be running on the .NET 4.0 framework in C# (I can't update to the newer release with the async additions).
The documentHits method runs fairly slow on my current setup - considering that it's all in memory. What can I do to speed up this method?
Examples would be awesome - thanks.
I see that you are using a HashSet, but you are not using any of its advantages, so you should just use a List instead.
What's taking time is looping through all the documents and looking for strings in other strings, so you should try to eliminate as much of that as possible.
One possibility is to set up indexes of which documents contain which character pairs. If the string query contains Hello, you would be looking in the documents that contain He, el, ll and lo.
You could set up a Dictionary<string, List<int>> where the dictionary key is the character combinations and the list contains indexes to the documents in your document list. Setting up the dictionary will take some time, of course, but you can focus on the character combinations that are less common. If a character combination exists in 80% of the documents, it's pretty useless for eliminating documents, but if a character combination exists in only 2% of the documents it has eliminated 98% of your work.
If you loop through the documents in the list and add occurrences to the lists in the dictionary, the lists of indexes will be sorted, so it will be very easy to join the lists later on. When you add indexes to the list, you can throw away lists when they get too large and you see that they would not be useful for eliminating documents. That way you will only be keeping the shorter lists and they will not consume so much memory.
Edit:
I put together a small example:
public class IndexElliminator<T> {
private List<T> _items;
private Dictionary<string, List<int>> _index;
private Func<T, string> _getContent;
private static HashSet<string> GetPairs(string value) {
HashSet<string> pairs = new HashSet<string>();
for (int i = 1; i < value.Length; i++) {
pairs.Add(value.Substring(i - 1, 2));
}
return pairs;
}
public IndexElliminator(List<T> items, Func<T, string> getContent, int maxIndexSize) {
_items = items;
_getContent = getContent;
_index = new Dictionary<string, List<int>>();
for (int index = 0;index<_items.Count;index++) {
T item = _items[index];
foreach (string pair in GetPairs(_getContent(item))) {
List<int> list;
if (_index.TryGetValue(pair, out list)) {
if (list != null) {
if (list.Count == maxIndexSize) {
_index[pair] = null;
} else {
list.Add(index);
}
}
} else {
list = new List<int>();
list.Add(index);
_index.Add(pair, list);
}
}
}
}
private static List<int> JoinLists(List<int> list1, List<int> list2) {
List<int> result = new List<int>();
int i1 = 0, i2 = 0;
while (i1 < list1.Count && i2 < list2.Count) {
switch (Math.Sign(list1[i1].CompareTo(list2[i2]))) {
case 0: result.Add(list1[i1]); i1++; i2++; break;
case -1: i1++; break;
case 1: i2++; break;
}
}
return result;
}
public List<T> Find(string query) {
HashSet<string> pairs = GetPairs(query);
List<List<int>> indexes = new List<List<int>>();
bool found = false;
foreach (string pair in pairs) {
List<int> list;
if (_index.TryGetValue(pair, out list)) {
found = true;
if (list != null) {
indexes.Add(list);
}
}
}
List<T> result = new List<T>();
if (found && indexes.Count == 0) {
indexes.Add(Enumerable.Range(0, _items.Count).ToList());
}
if (indexes.Count > 0) {
while (indexes.Count > 1) {
indexes[indexes.Count - 2] = JoinLists(indexes[indexes.Count - 2], indexes[indexes.Count - 1]);
indexes.RemoveAt(indexes.Count - 1);
}
foreach (int index in indexes[0]) {
if (_getContent(_items[index]).Contains(query)) {
result.Add(_items[index]);
}
}
}
return result;
}
}
Test:
List<string> items = new List<string> {
"Hello world",
"How are you",
"What is this",
"Can this be true",
"Some phrases",
"Words upon words",
"What to do",
"Where to go",
"When is this",
"How can this be",
"Well above margin",
"Close to the center"
};
IndexElliminator<string> index = new IndexElliminator<string>(items, s => s, items.Count / 2);
List<string> found = index.Find("this");
foreach (string s in found) Console.WriteLine(s);
Output:
What is this
Can this be true
When is this
How can this be
You are running linearly through all documents to find matches - this is O(n). You could do better if you solved the inverse problem, similar to how a fulltext index works: start with the query terms and preprocess the set of documents that match each query term. Since this might get complicated, I would suggest just using a DB with fulltext capability; it will be much faster than your approach.
Also you are abusing HashSet - instead just use a List, and don't put in duplicates - all the cases in documentHits() that produce a match should be exclusive.
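To illustrate the inverted-index idea, a minimal in-memory sketch might look like this (a rough sketch only, not the poster's code; it reuses the lowercase document type and the Headline/Description properties from the question and splits on spaces for simplicity):
Dictionary<string, HashSet<int>> BuildIndex(List<document> docs)
{
    // word -> indexes of the documents that contain it
    var index = new Dictionary<string, HashSet<int>>(StringComparer.OrdinalIgnoreCase);
    for (int i = 0; i < docs.Count; i++)
    {
        string text = docs[i].Headline + " " + docs[i].Description;
        foreach (string word in text.Split(' '))
        {
            HashSet<int> postings;
            if (!index.TryGetValue(word, out postings))
            {
                postings = new HashSet<int>();
                index.Add(word, postings);
            }
            postings.Add(i);
        }
    }
    return index;
}
// Lookup: documents for a query term are found without scanning every document:
// HashSet<int> hits;
// if (index.TryGetValue(queryTerm, out hits)) { /* hits holds the matching document indexes */ }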
If you have a whole lot of time at the start to create the database, you can look into using a Trie.
A Trie will make the string search much faster.
There's a little explanation and an implementation in the end here.
Another implementation: Trie class
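For reference, a bare-bones trie might be sketched like this (illustrative only, not the implementation behind either link):
public class TrieNode
{
    private readonly Dictionary<char, TrieNode> children = new Dictionary<char, TrieNode>();
    private bool isWordEnd;
    public void Add(string word)
    {
        TrieNode node = this;
        foreach (char c in word)
        {
            TrieNode child;
            if (!node.children.TryGetValue(c, out child))
            {
                child = new TrieNode();
                node.children.Add(c, child);
            }
            node = child;
        }
        node.isWordEnd = true; // mark the end of a complete word
    }
    public bool Contains(string word)
    {
        TrieNode node = this;
        foreach (char c in word)
        {
            if (!node.children.TryGetValue(c, out node))
                return false; // no branch for this character
        }
        return node.isWordEnd;
    }
}
Lookup time depends only on the length of the word, not on how many words are stored.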
You should not test each document against all of the test steps!
Instead, you should move on to the next document after the first successful test result.
hsRawHit.Add(d);
counter++;
you should add continue; after counter++;
hsRawHit.Add(d);
counter++;
continue;

Remove duplicates from a List<T> in C#

Anyone have a quick method for de-duplicating a generic List in C#?
If you're using .NET 3.5+, you can use LINQ.
List<T> withDupes = LoadSomeData();
List<T> noDupes = withDupes.Distinct().ToList();
Perhaps you should consider using a HashSet.
From the MSDN link:
using System;
using System.Collections.Generic;
class Program
{
static void Main()
{
HashSet<int> evenNumbers = new HashSet<int>();
HashSet<int> oddNumbers = new HashSet<int>();
for (int i = 0; i < 5; i++)
{
// Populate numbers with just even numbers.
evenNumbers.Add(i * 2);
// Populate oddNumbers with just odd numbers.
oddNumbers.Add((i * 2) + 1);
}
Console.Write("evenNumbers contains {0} elements: ", evenNumbers.Count);
DisplaySet(evenNumbers);
Console.Write("oddNumbers contains {0} elements: ", oddNumbers.Count);
DisplaySet(oddNumbers);
// Create a new HashSet populated with even numbers.
HashSet<int> numbers = new HashSet<int>(evenNumbers);
Console.WriteLine("numbers UnionWith oddNumbers...");
numbers.UnionWith(oddNumbers);
Console.Write("numbers contains {0} elements: ", numbers.Count);
DisplaySet(numbers);
}
private static void DisplaySet(HashSet<int> set)
{
Console.Write("{");
foreach (int i in set)
{
Console.Write(" {0}", i);
}
Console.WriteLine(" }");
}
}
/* This example produces output similar to the following:
* evenNumbers contains 5 elements: { 0 2 4 6 8 }
* oddNumbers contains 5 elements: { 1 3 5 7 9 }
* numbers UnionWith oddNumbers...
* numbers contains 10 elements: { 0 2 4 6 8 1 3 5 7 9 }
*/
How about:
var noDupes = list.Distinct().ToList();
In .net 3.5?
Simply initialize a HashSet with a List of the same type:
var noDupes = new HashSet<T>(withDupes);
Or, if you want a List returned:
var noDupsList = new HashSet<T>(withDupes).ToList();
Sort it, then check adjacent items pairwise, as the duplicates will clump together.
Something like this:
list.Sort();
Int32 index = list.Count - 1;
while (index > 0)
{
if (list[index] == list[index - 1])
{
if (index < list.Count - 1)
(list[index], list[list.Count - 1]) = (list[list.Count - 1], list[index]);
list.RemoveAt(list.Count - 1);
index--;
}
else
index--;
}
Notes:
Comparison is done from back to front, to avoid having to re-sort the list after each removal
This example now uses C# Value Tuples to do the swapping, substitute with appropriate code if you can't use that
The end-result is no longer sorted
I like to use this command:
List<Store> myStoreList = Service.GetStoreListbyProvince(provinceId)
.GroupBy(s => s.City)
.Select(grp => grp.FirstOrDefault())
.OrderBy(s => s.City)
.ToList();
I have these fields in my list: Id, StoreName, City, PostalCode
I wanted to show a list of cities in a dropdown which had duplicate values.
Solution: group by city, then pick the first one for the list. It worked for me.
Simply use
liIDs = liIDs.Distinct().ToList<Type>();
Replace "Type" with your desired type e.g. int.
As kronoz said, in .NET 3.5 you can use Distinct().
In .Net 2 you could mimic it:
public IEnumerable<T> DedupCollection<T> (IEnumerable<T> input)
{
var passedValues = new HashSet<T>();
// Relatively simple dupe check alg used as example
foreach(T item in input)
if(passedValues.Add(item)) // True if item is new
yield return item;
}
This could be used to dedupe any collection and will return the values in the original order.
It's normally much quicker to filter a collection (as both Distinct() and this sample does) than it would be to remove items from it.
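A quick usage sketch, calling it from within the same class (withDupes is assumed to be a List<string> here):
List<string> noDupes = DedupCollection(withDupes).ToList(); // original order preserved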
An extension method might be a decent way to go... something like this:
public static List<T> Deduplicate<T>(this List<T> listToDeduplicate)
{
return listToDeduplicate.Distinct().ToList();
}
And then call like this, for example:
List<int> myFilteredList = unfilteredList.Deduplicate();
In Java (I assume C# is more or less identical):
list = new ArrayList<T>(new HashSet<T>(list))
If you really wanted to mutate the original list:
List<T> noDupes = new ArrayList<T>(new HashSet<T>(list));
list.clear();
list.addAll(noDupes);
To preserve order, simply replace HashSet with LinkedHashSet.
This takes the distinct elements (the elements without duplicates) and converts them into a list again:
List<type> myNoneDuplicateValue = listValueWithDuplicate.Distinct().ToList();
Use Linq's Union method.
Note: This solution requires no knowledge of Linq, aside from that it exists.
Code
Begin by adding the following to the top of your class file:
using System.Linq;
Now, you can use the following to remove duplicates from an object called, obj1:
obj1 = obj1.Union(obj1).ToList();
Note: Rename obj1 to the name of your object.
How it works
The Union command lists one of each entry from two source objects. Since obj1 is both source objects, this reduces obj1 to one of each entry.
The ToList() returns a new List. This is necessary because LINQ commands like Union return the result as an IEnumerable instead of modifying the original List or returning a new List.
As a helper method (without Linq):
public static List<T> Distinct<T>(this List<T> list)
{
return (new HashSet<T>(list)).ToList();
}
Here's an extension method for removing adjacent duplicates in-situ. Call Sort() first and pass in the same IComparer. This should be more efficient than Lasse V. Karlsen's version which calls RemoveAt repeatedly (resulting in multiple block memory moves).
public static void RemoveAdjacentDuplicates<T>(this List<T> List, IComparer<T> Comparer)
{
int NumUnique = 0;
for (int i = 0; i < List.Count; i++)
if ((i == 0) || (Comparer.Compare(List[NumUnique - 1], List[i]) != 0))
List[NumUnique++] = List[i];
List.RemoveRange(NumUnique, List.Count - NumUnique);
}
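A quick usage sketch (assuming a List<int> named list):
IComparer<int> comparer = Comparer<int>.Default;
list.Sort(comparer);
list.RemoveAdjacentDuplicates(comparer); // list is now sorted with duplicates removed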
Installing the MoreLINQ package via NuGet, you can easily get a distinct object list by a property:
IEnumerable<Catalogue> distinctCatalogues = catalogues.DistinctBy(c => c.CatalogueCode);
If you have two classes, Product and Customer, and want to remove duplicate items from their lists
public class Product
{
public int Id { get; set; }
public string ProductName { get; set; }
}
public class Customer
{
public int Id { get; set; }
public string CustomerName { get; set; }
}
You must define a generic class in the form below
public class ItemEqualityComparer<T> : IEqualityComparer<T> where T : class
{
private readonly PropertyInfo _propertyInfo;
public ItemEqualityComparer(string keyItem)
{
_propertyInfo = typeof(T).GetProperty(keyItem, BindingFlags.GetProperty | BindingFlags.Instance | BindingFlags.Public);
}
public bool Equals(T x, T y)
{
var xValue = _propertyInfo?.GetValue(x, null);
var yValue = _propertyInfo?.GetValue(y, null);
return xValue != null && yValue != null && xValue.Equals(yValue);
}
public int GetHashCode(T obj)
{
var propertyValue = _propertyInfo.GetValue(obj, null);
return propertyValue == null ? 0 : propertyValue.GetHashCode();
}
}
Then you can remove duplicate items from your list.
var products = new List<Product>
{
new Product{ProductName = "product 1" ,Id = 1,},
new Product{ProductName = "product 2" ,Id = 2,},
new Product{ProductName = "product 2" ,Id = 4,},
new Product{ProductName = "product 2" ,Id = 4,},
};
var productList = products.Distinct(new ItemEqualityComparer<Product>(nameof(Product.Id))).ToList();
var customers = new List<Customer>
{
new Customer{CustomerName = "Customer 1" ,Id = 5,},
new Customer{CustomerName = "Customer 2" ,Id = 5,},
new Customer{CustomerName = "Customer 2" ,Id = 5,},
new Customer{CustomerName = "Customer 2" ,Id = 5,},
};
var customerList = customers.Distinct(new ItemEqualityComparer<Customer>(nameof(Customer.Id))).ToList();
This code removes duplicate items by Id. If you want to remove duplicate items by another property, change nameof(YourClass.DuplicateProperty), e.g. nameof(Customer.CustomerName), to remove duplicate items by the CustomerName property.
If you don't care about the order you can just shove the items into a HashSet, if you do want to maintain the order you can do something like this:
var unique = new List<T>();
var hs = new HashSet<T>();
foreach (T t in list)
if (hs.Add(t))
unique.Add(t);
Or the Linq way:
var hs = new HashSet<T>();
list.All( x => hs.Add(x) );
Edit: The HashSet method is O(N) time and O(N) space while sorting and then making unique (as suggested by #lassevk and others) is O(N*lgN) time and O(1) space so it's not so clear to me (as it was at first glance) that the sorting way is inferior
Might be easier to simply make sure that duplicates are not added to the list.
if (items.IndexOf(new_item) < 0)
items.Add(new_item);
You can use Union
obj2 = obj1.Union(obj1).ToList();
Another way in .Net 2.0
static void Main(string[] args)
{
List<string> alpha = new List<string>();
for(char a = 'a'; a <= 'd'; a++)
{
alpha.Add(a.ToString());
alpha.Add(a.ToString());
}
Console.WriteLine("Data :");
alpha.ForEach(delegate(string t) { Console.WriteLine(t); });
alpha.ForEach(delegate (string v)
{
if (alpha.FindAll(delegate(string t) { return t == v; }).Count > 1)
alpha.Remove(v);
});
Console.WriteLine("Unique Result :");
alpha.ForEach(delegate(string t) { Console.WriteLine(t);});
Console.ReadKey();
}
There are many ways to solve the duplicates issue in a List; below is one of them:
List<Container> containerList = LoadContainer();//Assume it has duplicates
List<Container> filteredList = new List<Container>();
foreach (var container in containerList)
{
Container duplicateContainer = containerList.Find(delegate(Container checkContainer)
{ return (checkContainer.UniqueId == container.UniqueId); });
//Assume 'UniqueId' is the property of the Container class on which you are searching
if (!filteredList.Contains(duplicateContainer)) //Add the object when it is not already in the filtered list
{
filteredList.Add(container);
}
}
Cheers
Ravi Ganesan
Here's a simple solution that doesn't require any hard-to-read LINQ or any prior sorting of the list.
private static void CheckForDuplicateItems(List<string> items)
{
if (items == null ||
items.Count == 0)
return;
for (int outerIndex = 0; outerIndex < items.Count; outerIndex++)
{
for (int innerIndex = 0; innerIndex < items.Count; innerIndex++)
{
if (innerIndex == outerIndex) continue;
if (items[outerIndex].Equals(items[innerIndex]))
{
// Duplicate Found
}
}
}
}
David J.'s answer is a good method, no need for extra objects, sorting, etc. It can be improved on however:
for (int innerIndex = items.Count - 1; innerIndex > outerIndex ; innerIndex--)
So the outer loop goes from top to bottom over the entire list, but the inner loop goes from the bottom up "until the outer loop position is reached".
The outer loop makes sure the entire list is processed, the inner loop finds the actual duplicates, those can only happen in the part that the outer loop hasn't processed yet.
Or if you don't want to do bottom up for the inner loop you could have the inner loop start at outerIndex + 1.
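Putting that together, a rough sketch of the improved version (building on the CheckForDuplicateItems method above, and removing the duplicates it finds as an example):
private static void RemoveDuplicateItems(List<string> items)
{
    if (items == null || items.Count == 0)
        return;
    for (int outerIndex = 0; outerIndex < items.Count; outerIndex++)
    {
        // only scan the part of the list the outer loop has not reached yet
        for (int innerIndex = items.Count - 1; innerIndex > outerIndex; innerIndex--)
        {
            if (items[outerIndex].Equals(items[innerIndex]))
            {
                items.RemoveAt(innerIndex); // removing from the end keeps the remaining indexes valid
            }
        }
    }
}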
A simple intuitive implementation:
public static List<PointF> RemoveDuplicates(List<PointF> listPoints)
{
List<PointF> result = new List<PointF>();
for (int i = 0; i < listPoints.Count; i++)
{
if (!result.Contains(listPoints[i]))
result.Add(listPoints[i]);
}
return result;
}
All answers copy lists, or create a new list, or use slow functions, or are just painfully slow.
To my understanding, this is the fastest and cheapest method I know (also, backed by a very experienced programmer specialized in real-time physics optimization).
// Duplicates will be noticed after a sort O(nLogn)
list.Sort();
// Store the current and last items. Current item declaration is not really needed, and probably optimized by the compiler, but in case it's not...
int lastItem = -1;
int currItem = -1;
int size = list.Count;
// Store the index pointing to the last item we want to keep in the list
int last = size - 1;
// Travel the items from last to first O(n)
for (int i = last; i >= 0; --i)
{
currItem = list[i];
// If this item was the same as the previous one, we don't want it
if (currItem == lastItem)
{
// Overwrite last in current place. It is a swap but we don't need the last
list[i] = list[last];
// Reduce the last index, we don't want that one anymore
last--;
}
// A new item, we store it and continue
else
lastItem = currItem;
}
// We now have an unsorted list with the duplicates at the end.
// Remove the last items just once
list.RemoveRange(last + 1, size - last - 1);
// Sort again O(n logn)
list.Sort();
Final cost is:
n log n + n + n log n = n + 2n log n = O(n log n), which is pretty nice.
Note about RemoveRange:
Since we cannot set the count of the list and avoid using the Remove functions, I don't know exactly the speed of this operation but I guess it is the fastest way.
Using HashSet this can be done easily.
List<int> listWithDuplicates = new List<int> { 1, 2, 1, 2, 3, 4, 5 };
HashSet<int> hashWithoutDuplicates = new HashSet<int> ( listWithDuplicates );
List<int> listWithoutDuplicates = hashWithoutDuplicates.ToList();
Using HashSet:
list = new HashSet<T>(list).ToList();
public static void RemoveDuplicates<T>(IList<T> list )
{
if (list == null)
{
return;
}
int i = 1;
while(i<list.Count)
{
int j = 0;
bool remove = false;
while (j < i && !remove)
{
if (list[i].Equals(list[j]))
{
remove = true;
}
j++;
}
if (remove)
{
list.RemoveAt(i);
}
else
{
i++;
}
}
}
If you need to compare complex objects, you will need to pass a Comparer object inside the Distinct() method.
private List<MyListItem> GetDistinctItemList(List<MyListItem> _listWithDuplicates)
{
//It might be a good idea to create MyListItemComparer
//elsewhere and cache it for performance.
List<MyListItem> _listWithoutDuplicates = _listWithDuplicates.Distinct(new MyListItemComparer()).ToList();
//Choose the line below instead, if you have a situation where there is a chance to change the list while Distinct() is running.
//ToArray() is used to solve "Collection was modified; enumeration operation may not execute" error.
//List<MyListItem> _listWithoutDuplicates = _listWithDuplicates.ToArray().Distinct(new MyListItemComparer()).ToList();
return _listWithoutDuplicates;
}
Assuming you have 2 other classes like:
public class MyListItemComparer : IEqualityComparer<MyListItem>
{
public bool Equals(MyListItem x, MyListItem y)
{
return x != null
&& y != null
&& x.A == y.A
&& x.B.Equals(y.B)
&& x.C.ToString().Equals(y.C.ToString());
}
public int GetHashCode(MyListItem codeh)
{
return codeh.GetHashCode();
}
}
And:
public class MyListItem
{
public int A { get; }
public string B { get; }
public MyEnum C { get; }
public MyListItem(int a, string b, MyEnum c)
{
A = a;
B = b;
C = c;
}
}
I think the simplest way is:
Create a new list and add only the unique items.
Example:
class MyList{
public int id;
public string date;
public string email;
}
List<MyList> ml = new List<MyList>();
ml.Add(new MyList(){
id = 1,
date = "2020/09/06",
email = "zarezadeh#gmailcom"
});
ml.Add(new MyList(){
id = 2,
date = "2020/09/01",
email = "zarezadeh#gmailcom"
});
List<MyList> New_ml = new List<MyList>();
foreach (var item in ml)
{
if (New_ml.Where(w => w.email == item.email).SingleOrDefault() == null)
{
New_ml.Add(new MyList()
{
id = item.id,
date = item.date,
email = item.email
});
}
}

Fastest way to find common items across multiple lists in C#

Given the following:
List<List<Option>> optionLists;
what would be a quick way to determine the subset of Option objects that appear in all N lists? Equality is determined through some string property such as option1.Value == option2.Value.
So we should end up with List<Option> where each item appears only once.
Ok, this will find the list of Option objects that have a Value appearing in every list.
var x = from list in optionLists
from option in list
where optionLists.All(l => l.Any(o => o.Value == option.Value))
orderby option.Value
select option;
It doesn't do a "distinct" select so it'll return multiple Option objects, some of them with the same Value.
Building on Matt's answer, since we are only interested in options that all lists have in common, we can simply check for any options in the first list that the others share:
var sharedOptions =
from option in optionLists.First( ).Distinct( )
where optionLists.Skip( 1 ).All( l => l.Contains( option ) )
select option;
If an option list cannot contain duplicate entries, the Distinct call is unnecessary. If the lists vary greatly in size, it would be better to iterate over the options in the shortest list, rather than whatever list happens to be First. Sorted or hashed collections could be used to improve the lookup time of the Contains call, though it should not make much difference for a moderate number of items.
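For example, a sketch of the hashed-lookup variant (assuming Option exposes a string Value property, as in the question):
// one HashSet of values per remaining list, so Contains becomes an O(1) probe
List<HashSet<string>> valueSets = optionLists
    .Skip(1)
    .Select(l => new HashSet<string>(l.Select(o => o.Value)))
    .ToList();
List<Option> sharedOptions = optionLists
    .First()
    .Where(option => valueSets.All(set => set.Contains(option.Value)))
    .ToList();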
Here's a much more efficient implementation:
static SortedDictionary<T,bool>.KeyCollection FindCommon<T> (List<List<T>> items)
{
SortedDictionary<T, bool>
current_common = new SortedDictionary<T, bool> (),
common = new SortedDictionary<T, bool> ();
foreach (List<T> list in items)
{
if (current_common.Count == 0)
{
foreach (T item in list)
{
common [item] = true;
}
}
else
{
foreach (T item in list)
{
if (current_common.ContainsKey(item))
common[item] = true;
else
common[item] = false;
}
}
if (common.Count == 0)
{
current_common.Clear ();
break;
}
SortedDictionary<T, bool>
swap = current_common;
current_common = common;
common = swap;
common.Clear ();
}
return current_common.Keys;
}
It works by creating a set of all items common to all lists processed so far and comparing each list with this set, creating a temporary set of the items common to the current list and the set of common items so far. Effectively it is O(n·m), where n is the number of lists and m is the number of items in the lists.
An example of using it:
static void Main (string [] args)
{
Random
random = new Random();
List<List<int>>
items = new List<List<int>>();
for (int i = 0 ; i < 10 ; ++i)
{
List<int>
list = new List<int> ();
items.Add (list);
for (int j = 0 ; j < 100 ; ++j)
{
list.Add (random.Next (70));
}
}
SortedDictionary<int, bool>.KeyCollection
common = FindCommon (items);
foreach (List<int> list in items)
{
list.Sort ();
}
for (int i = 0 ; i < 100 ; ++i)
{
for (int j = 0 ; j < 10 ; ++j)
{
System.Diagnostics.Trace.Write (String.Format ("{0,-4:D} ", items [j] [i]));
}
System.Diagnostics.Trace.WriteLine ("");
}
foreach (int item in common)
{
System.Diagnostics.Trace.WriteLine (String.Format ("{0,-4:D} ", item));
}
}
Fastest to write :)
var subset = optionLists.Aggregate((x, y) => x.Intersect(y).ToList());
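Since the question defines equality via the Value property, a hedged variation would pass a custom IEqualityComparer (Option and Value are the types from the question):
class OptionValueComparer : IEqualityComparer<Option>
{
    public bool Equals(Option x, Option y) { return x.Value == y.Value; }
    public int GetHashCode(Option o) { return o.Value == null ? 0 : o.Value.GetHashCode(); }
}
var subsetByValue = optionLists.Aggregate((x, y) => x.Intersect(y, new OptionValueComparer()).ToList());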
What about using a HashSet? That way you can do what you want in O(n), where n is the number of items in all the lists combined, and I think that's the fastest way to do it.
You just have to iterate over every list and insert the values you find into the HashSet.
When you insert a key that already exists, you will receive false as the return value of the Add method; otherwise true is returned.
Sort, then do something akin to a merge-sort.
Basically you would do this:
Retrieve the first item from each list
Compare the items, if equal, output
If any of the items are before the others, sort-wise, retrieve a new item from the corresponding list to replace it; otherwise, retrieve new items to replace them all, from all the lists.
As long as you still have items, go back to step 2.
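A minimal sketch of that idea for lists of strings that are already sorted (illustrative only; it assumes no duplicates within a single list):
static List<string> SortedIntersection(List<List<string>> lists)
{
    var result = new List<string>();
    if (lists.Count == 0) return result;
    int[] pos = new int[lists.Count]; // current position in each list
    while (true)
    {
        // find the largest current head; stop as soon as any list is exhausted
        string max = null;
        for (int i = 0; i < lists.Count; i++)
        {
            if (pos[i] >= lists[i].Count) return result;
            if (max == null || string.CompareOrdinal(lists[i][pos[i]], max) > 0)
                max = lists[i][pos[i]];
        }
        // advance every list whose head is behind the largest head
        bool allEqual = true;
        for (int i = 0; i < lists.Count; i++)
        {
            while (pos[i] < lists[i].Count && string.CompareOrdinal(lists[i][pos[i]], max) < 0)
                pos[i]++;
            if (pos[i] >= lists[i].Count) return result;
            if (string.CompareOrdinal(lists[i][pos[i]], max) != 0) allEqual = false;
        }
        if (allEqual)
        {
            result.Add(max); // all heads match: it's a common item
            for (int i = 0; i < lists.Count; i++) pos[i]++;
        }
    }
}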
I don't have the performance stats, but if you don't want to roll your own method, various collections libraries have a 'Set' or 'Set(T)' object that offer the usual set procedures. (listed in the order I would use them).
IESI Collections (literally just Set classes)
PowerCollections (not updated in a while)
C5 (never personally used)
You can do this by counting occurrences of all items in all lists - those items whose occurrence count is equal to the number of lists, are common to all lists:
static List<T> FindCommon<T>(IEnumerable<List<T>> lists)
{
Dictionary<T, int> map = new Dictionary<T, int>();
int listCount = 0; // number of lists
foreach (IEnumerable<T> list in lists)
{
listCount++;
foreach (T item in list)
{
// Item encountered, increment count
int currCount;
if (!map.TryGetValue(item, out currCount))
currCount = 0;
currCount++;
map[item] = currCount;
}
}
List<T> result= new List<T>();
foreach (KeyValuePair<T,int> kvp in map)
{
// Items whose occurrence count is equal to the number of lists are common to all the lists
if (kvp.Value == listCount)
result.Add(kvp.Key);
}
return result;
}
/// <summary>
/// The method FindCommonItems, returns a list of all the COMMON ITEMS in the lists contained in the listOfLists.
/// The method expects lists containing NO DUPLICATE ITEMS.
/// </summary>
/// <typeparam name="T"></typeparam>
/// <param name="allSets"></param>
/// <returns></returns>
public static List<T> FindCommonItems<T>(IEnumerable<List<T>> allSets)
{
Dictionary<T, int> map = new Dictionary<T, int>();
int listCount = 0; // Number of lists.
foreach (IEnumerable<T> currentSet in allSets)
{
int itemsCount = currentSet.ToList().Count;
HashSet<T> uniqueItems = new HashSet<T>();
bool duplicateItemEncountered = false;
listCount++;
foreach (T item in currentSet)
{
if (!uniqueItems.Add(item))
{
duplicateItemEncountered = true;
}
if (map.ContainsKey(item))
{
map[item]++;
}
else
{
map.Add(item, 1);
}
}
if (duplicateItemEncountered)
{
uniqueItems.Clear();
List<T> duplicateItems = new List<T>();
StringBuilder currentSetItems = new StringBuilder();
List<T> currentSetAsList = new List<T>(currentSet);
for (int i = 0; i < itemsCount; i++)
{
T currentItem = currentSetAsList[i];
if (!uniqueItems.Add(currentItem))
{
duplicateItems.Add(currentItem);
}
currentSetItems.Append(currentItem);
if (i < itemsCount - 1)
{
currentSetItems.Append(", ");
}
}
StringBuilder duplicateItemsNamesEnumeration = new StringBuilder();
int j = 0;
foreach (T item in duplicateItems)
{
duplicateItemsNamesEnumeration.Append(item.ToString());
if (j < duplicateItems.Count - 1)
{
duplicateItemsNamesEnumeration.Append(", ");
}
j++;
}
throw new Exception("The list " + currentSetItems.ToString() + " contains the following duplicate items: " + duplicateItemsNamesEnumeration.ToString());
}
}
List<T> result= new List<T>();
foreach (KeyValuePair<T, int> itemAndItsCount in map)
{
if (itemAndItsCount.Value == listCount) // Items whose occurrence count is equal to the number of lists are common to all the lists.
{
result.Add(itemAndItsCount.Key);
}
}
return result;
}
@Skizz The method is not correct. It also returns items that are not common to all the lists in items.
Here is the corrected method:
/// <summary>.
/// The method FindAllCommonItemsInAllTheLists, returns a HashSet that contains all the common items in the lists contained in the listOfLists,
/// regardless of the order of the items in the various lists.
/// </summary>
/// <typeparam name="T"></typeparam>
/// <param name="listOfLists"></param>
/// <returns></returns>
public static HashSet<T> FindAllCommonItemsInAllTheLists<T>(List<List<T>> listOfLists)
{
if (listOfLists == null || listOfLists.Count == 0)
{
return null;
}
HashSet<T> currentCommon = new HashSet<T>();
HashSet<T> common = new HashSet<T>();
foreach (List<T> currentList in listOfLists)
{
if (currentCommon.Count == 0)
{
foreach (T item in currentList)
{
common.Add(item);
}
}
else
{
foreach (T item in currentList)
{
if (currentCommon.Contains(item))
{
common.Add(item);
}
}
}
if (common.Count == 0)
{
currentCommon.Clear();
break;
}
currentCommon.Clear(); // Empty currentCommon for a new iteration.
foreach (T item in common) /* Copy all the items contained in common to currentCommon.
* currentCommon = common;
* does not work because thus currentCommon and common would point at the same object and
* the next statement:
* common.Clear();
* will also clear currentCommon.
*/
{
if (!currentCommon.Contains(item))
{
currentCommon.Add(item);
}
}
common.Clear();
}
return currentCommon;
}
After searching the 'net and not really coming up with something I liked (or that worked), I slept on it and came up with this. My SearchResult is similar to your Option. It has an EmployeeId in it and that's the thing I need to be common across lists. I return all records that have an EmployeeId in every list. It's not fancy, but it's simple and easy to understand, just what I like. For small lists (my case) it should perform just fine, and anyone can understand it!
private List<SearchResult> GetFinalSearchResults(IEnumerable<IEnumerable<SearchResult>> lists)
{
Dictionary<int, SearchResult> oldList = new Dictionary<int, SearchResult>();
Dictionary<int, SearchResult> newList = new Dictionary<int, SearchResult>();
oldList = lists.First().ToDictionary(x => x.EmployeeId, x => x);
foreach (IEnumerable<SearchResult> list in lists.Skip(1))
{
foreach (SearchResult emp in list)
{
if (oldList.Keys.Contains(emp.EmployeeId))
{
newList.Add(emp.EmployeeId, emp);
}
}
oldList = new Dictionary<int, SearchResult>(newList);
newList.Clear();
}
return oldList.Values.ToList();
}
