Removing discrepancies from my list - C#

Hi there. I was hoping I could ask for some advice regarding a problem I am struggling with.
I have a List with more than a thousand values, and some of them are duplicates: not exact duplicates, but values that differ only in upper and lower case.
So, for example, I would have
"Training" and "training" in the same list, or
"Vision and Values" and "Vision and values".
So there are various instances of minor discrepancies caused by differences in case.
How could I go about removing these 'excess' values?

Use LINQ:
var listWithDups = new List<string> { "blah", "Blah", "etc", "etc." };
// Distinct with a case-insensitive comparer treats "blah" and "Blah" as the same value
var listWithoutDups = listWithDups.Distinct(StringComparer.CurrentCultureIgnoreCase).ToList();

Tried this in LINQPad:
var list = new List<String> { "Hello", "World", "HELLO", "beautiful", "WORLD" };
var l = list.Distinct(StringComparer.CurrentCultureIgnoreCase).ToList();
Console.WriteLine(l);

I would add all entries to a HashSet<string>.
A HashSet is a collection that stores at most one of every item added to it.
You pass an "ignore case" equality comparer into the HashSet constructor. You don't need to write your own: StringComparer.CurrentCultureIgnoreCase already implements IEqualityComparer<string>.
Like:
var set = new HashSet<string>(yourListWithDuplicates, StringComparer.CurrentCultureIgnoreCase);
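Both suggestions above can be put side by side in a small, self-contained sketch. The class and method names are hypothetical; the comparer is the same StringComparer.CurrentCultureIgnoreCase used in the answers. In the current .NET implementation Distinct keeps the first spelling it encounters, while a HashSet's enumeration order is not specified.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch: two ways to drop values that differ only by case (hypothetical helper names).
public static class CaseInsensitiveDedup
{
    // Distinct with a case-insensitive comparer; yields each value once, first spelling wins in practice.
    public static List<string> WithDistinct(IEnumerable<string> values) =>
        values.Distinct(StringComparer.CurrentCultureIgnoreCase).ToList();

    // A HashSet built with the same comparer silently ignores later case-variants of an existing entry.
    public static HashSet<string> WithHashSet(IEnumerable<string> values) =>
        new HashSet<string>(values, StringComparer.CurrentCultureIgnoreCase);
}
```

Use Distinct when you want a List back in roughly the original order; use the HashSet when you also need fast case-insensitive lookups afterwards.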

Related

Comparing two lists of strings and doing some processing if any item matches

I have two lists of strings. I want to compare each element in one list with the elements of the other, and if at least one of them matches, do some processing; otherwise, do nothing.
I don't know how to do this. I have the following lists, and the code I used was SequenceEqual, but my lead said it's wrong, as it only checks whether the lists are equal and does nothing else. I couldn't disagree, and I want to achieve the functionality I described above. Please help. As you can see, order doesn't matter: here 123 is in both lists, but at a different position, so it counts as a match and the processing should happen, as per my requirement.
List<string> list1 = new List<string> () { "123", "234" };
List<string> list2 = new List<string> () { "333", "234" , "123"};
You can use the Any method for this:
var matchFound = list1.Any(x => list2.Contains(x));
Now you can branch on matchFound: if it is true, do whatever processing is required.
If you want a case-insensitive comparison, use String.Equals, which lets you specify that case does not matter when comparing the strings.
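The case-insensitive variant is only described in words above. Here is a minimal sketch, assuming ordinal (culture-independent) ignore-case matching is acceptable; MatchCheck and its methods are hypothetical names. The second method swaps in Intersect with a comparer, which gives the same answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch: "at least one item in common" without regard to case (hypothetical names).
public static class MatchCheck
{
    // Any + string.Equals with a StringComparison, as described in the answer.
    public static bool AnyMatchIgnoreCase(List<string> list1, List<string> list2) =>
        list1.Any(x => list2.Any(y => string.Equals(x, y, StringComparison.OrdinalIgnoreCase)));

    // Same result via Intersect, which uses a set with the given comparer internally.
    public static bool AnyMatchIgnoreCaseIntersect(List<string> list1, List<string> list2) =>
        list1.Intersect(list2, StringComparer.OrdinalIgnoreCase).Any();
}
```

The nested Any scans list2 once per item of list1, so for large lists the Intersect version is usually the better choice.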
You can use Intersect to find common elements:
var intersecting = list1.Intersect(list2);
If you just want to know if there are common elements append .Any():
bool atLeastOneCommonElement = intersecting.Any();
If you want to process them:
foreach(var commonElement in intersecting)
{
// do something ...
}
You could check with Intersect and Any:
var matchFound = list1.Intersect(list2).Any();
For example,
List<string> list1 = new List<string>{ "123", "234" };
List<string> list2 = new List<string>{ "333", "234" , "123"};
var result = list1.Intersect(list2).Any();
Output: True
List<string> list3 = new List<string>{"5656","8989"};
result = list1.Intersect(list3).Any();
Output: False
Take all the items that appear in both lists, and then run your code for each match, like this:
foreach (var item in list1.Where(x => list2.Contains(x)))
{
//do some processing here
Console.WriteLine($"Match found: {item}");
}
In the code above, the foreach body runs once for each item that is present in both lists.
Output:
Match found: 123
Match found: 234
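Use LINQ to find the matches, and then check the size of the result as follows:
var intersect = list1.Where(el1 => list2.Any(el2 => el2 == el1));
// intersect is an IEnumerable<string>, so Count is the LINQ extension method and needs parentheses
var isMatch = intersect.Count() > 0;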

Most efficient way to compare two lists and remove the matching items

I want to compare two lists and get the valid words into a new list.
var words = new List<string>();
var badWords = new List<string>();
//this is just an example list; the actual list contains 700 records
words.Add("Apple");
words.Add("Moron");
words.Add("Seafood");
words.Add("Cars");
words.Add("Chicken");
words.Add("Twat");
words.Add("Watch");
words.Add("Android");
words.Add("c-sharp");
words.Add("Fool");
badWords.Add("Idiot");
badWords.Add("Retarded");
badWords.Add("Twat");
badWords.Add("Fool");
badWords.Add("Moron");
I am looking for the most efficient way to compare the lists and put all the 'good' words into a new list. The finalList shouldn't contain "Moron", "Twat" and "Fool".
var finalList = new List<string>();
Or is it unnecessary to create a new List? I am happy to hear your ideas!
Thank you in advance
Use the Enumerable.Except method from the System.Linq namespace:
finalList = words.Except(badWords).ToList();
It's the most efficient way in terms of your time, and also the fastest way to do it, because the Except implementation uses a set internally, which makes the lookups fast.
Use Enumerable.Except:
List<string> cleanList = words.Except(badWords).ToList();
This is efficient because Except uses a set-based approach.
An even more efficient approach is to avoid adding "bad" words to the first list at all, for example by using a HashSet<string> with a case-insensitive comparer:
var badWords = new HashSet<string>(StringComparer.InvariantCultureIgnoreCase){ "Idiot", "Retarded", "Twat", "Fool", "Moron" };
string word = "idiot";
if (!badWords.Contains(word))
words.Add(word);
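Two properties of Except are worth knowing before relying on it here: it returns a set difference, so a word that appears several times in words comes out only once, and it compares case-sensitively unless you pass a comparer. A minimal sketch, assuming ordinal ignore-case matching suits your data (ExceptDemo is a hypothetical name):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch: Except with an explicit comparer (hypothetical helper name).
public static class ExceptDemo
{
    // Set difference: each remaining word appears at most once in the result,
    // and bad words match regardless of case because of the comparer argument.
    public static List<string> RemoveBadWords(IEnumerable<string> words, IEnumerable<string> badWords) =>
        words.Except(badWords, StringComparer.OrdinalIgnoreCase).ToList();
}
```

If you need to keep repeated good words, filter with Where or RemoveAll instead, since those don't collapse duplicates.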
https://msdn.microsoft.com/library/bb908822(v=vs.90).aspx
var result = words.Except(badWords).ToList();
Edit: Got in late.
You can use the Contains method:
finalList = words.Where(g => !badWords.Contains(g)).ToList();
If you don't want to create a new List, you can remove the bad words from your existing List with RemoveAll():
words.RemoveAll(badWords.Contains);
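Passing badWords.Contains to RemoveAll scans the bad-word list once for every word. A small variation, sketched under the assumption that case should be ignored, builds a HashSet first so each lookup is constant time; InPlaceFilter is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

// Sketch: in-place removal with a HashSet lookup (hypothetical helper name).
public static class InPlaceFilter
{
    // Removes every element of words found in badWords; repeated good words are kept.
    public static int RemoveBadWords(List<string> words, IEnumerable<string> badWords)
    {
        // Case-insensitive set of bad words for constant-time Contains checks.
        var badSet = new HashSet<string>(badWords, StringComparer.OrdinalIgnoreCase);
        // RemoveAll accepts the Contains method group as a Predicate<string>
        // and returns the number of items it removed.
        return words.RemoveAll(badSet.Contains);
    }
}
```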

Sort Dictionary based on values in a list of integers

I'm having a problem sorting a dictionary based on the number of 1s in the lists of integers stored in that same Dictionary. First I want to count the 1s in each list, and then sort the dictionary based on the result.
I've found some solutions on Stack Overflow, but they don't answer my question.
The dictionary looks like the following:
Dictionary<int, List<int>> myDic = new Dictionary<int, List<int>>();
List<int> myList = new List<int>();
myList = new List<int>();//Should appear third
myList.Add(0);
myList.Add(0);
myList.Add(1);
myDic.Add(0, myList);
myList = new List<int>();//Should appear second
myList.Add(1);
myList.Add(1);
myList.Add(0);
myDic.Add(1, myList);
myList = new List<int>();//Should appear first
myList.Add(1);
myList.Add(1);
myList.Add(1);
myDic.Add(2, myList);
I tried this code but it seems it doesn't do anything.
List<KeyValuePair<int, List<int>>> myList2 = myDic.ToList();
myList2.Sort((firstPair, nextPair) =>
{
return firstPair.Value.Where(i=>i==1).Sum().CompareTo(nextPair.Value.Where(x=>x==1).Sum());
});
You are sorting the list items in ascending order, i.e. items with more 1s go to the end of the list. You should use descending order. Just compare nextPair to firstPair (or change the sign of the comparison result):
myList2.Sort((firstPair, nextPair) =>
{
return nextPair.Value.Where(i => i==1).Sum().CompareTo(
firstPair.Value.Where(x => x==1).Sum());
});
This approach has one problem: the sum of 1s in a value is recalculated every time two items are compared. Better to use Enumerable.OrderByDescending. It's simpler to use, and it computes the comparison values (i.e. keys) only once per item. Since a Dictionary is an enumerable of KeyValuePairs, you can use OrderByDescending directly on the dictionary:
var result = myDic.OrderByDescending(kvp => kvp.Value.Where(i => i == 1).Sum());
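A self-contained version of the OrderByDescending approach, using the question's data, might look like this. It is a sketch: SortByOnes is a hypothetical name, and it swaps Where(...).Sum() for Count(predicate), which counts the 1s directly and gives the same number. The result is a new list; the dictionary itself is not reordered.

```csharp
using System.Collections.Generic;
using System.Linq;

// Sketch: order dictionary entries by how many 1s each value list contains (hypothetical name).
public static class SortByOnes
{
    // Most 1s first; the key for each entry is computed once by OrderByDescending.
    public static List<KeyValuePair<int, List<int>>> SortEntries(Dictionary<int, List<int>> dic) =>
        dic.OrderByDescending(kvp => kvp.Value.Count(i => i == 1)).ToList();
}
```

With the question's three lists, the keys come out in the order 2, 1, 0.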
Your sort is backward, which is why you think it's not doing anything. Reverse the firstPair/nextPair values in your lambda and you'll get the result you expect.
Though @Sergey Berezovskiy is correct that you could just use OrderBy, your example code could perhaps benefit from a different pattern overall.
class SummedKV
{
public KeyValuePair<int, List<int>> Kvp {get; set;}
public int Sum {get; set;}
}
var myList =
myDic
.Select(kvp => new SummedKV { Kvp = kvp, Sum = kvp.Value.Sum() })
.ToList();
// List<T>.Sort takes a comparison, not a key selector; sort descending by the precomputed Sum
myList.Sort((a, b) => b.Sum.CompareTo(a.Sum));
Or maybe something simpler (OrderByDescending returns a new sequence, so keep its result):
var sorted = myList2.OrderByDescending(x => x.Value.Sum()).ToList();
Your code does do something. It creates a list of the items that used to be in the dictionary, sorted by the number of 1s contained in each value list. The code you have correctly creates this list and sorts it by that count, although in ascending order, so the entry with the most 1s ends up last. (Note that using OrderByDescending would let you do the same thing more simply.)
It has no effect on the dictionary that you pulled the items out of, of course. Dictionaries are unordered, so you can't "reorder" their items even if you wanted to. If it were some other type of ordered collection, it would be possible to change the order of its items, but just creating a new structure and ordering that wouldn't do it; you'd need to use some sort of operation on the collection itself to change the order of its items.

Problems with Lists and Dictionaries

I'm having a problem with a Dictionary that uses Lists for both the Key and the Value.
My dictionary is set up like this:
Dictionary<List<string>,List<double>> f = new Dictionary<List<string>,List<double>>();
(it's like this for a very specific reason).
My problem is how to get the two lists out into their own lists. I have tried the following:
List<string> s = new List<string>(f.Keys);
List<string> s = f.Select(kvp=>kvp.Keys).ToList()
List<string> s = f.Select(kvp=>kvp.Keys);
List<string> s = f.Keys;
as well as a variant using IEnumerable. No matter what I do, I can't seem to retrieve the Keys (or using f.Values, the values).
Any help here would be appreciated.
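A list of strings seems like a VERY odd key for a dictionary and will have complexities of its own (for example, List<string> compares by reference, so a different list with the same contents won't match an existing key), but you seem confident that it's correct, so I'll focus on your actual question.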
Since Keys is a collection of key values, each of which is a List<string>, either of these should work:
List<List<string>> s = f.Select(kvp=>kvp.Key).ToList();
List<List<string>> s = f.Keys.ToList();
If you want ALL strings as a single list (essentially joining all of the lists together), you can use:
List<string> s2 = f.SelectMany(kvp => kvp.Key).ToList();
SelectMany flattens the lists: it returns every string from every key list across the whole dictionary.
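The extraction calls from this answer can be collected into one compilable sketch. ListKeyedDictionary and its method names are hypothetical. The last method addresses the reference-equality caveat: since List<string> doesn't compare contents, it looks for a key with the same items via SequenceEqual instead of ContainsKey.

```csharp
using System.Collections.Generic;
using System.Linq;

// Sketch: pulling keys and values out of a Dictionary<List<string>, List<double>> (hypothetical names).
public static class ListKeyedDictionary
{
    // One List<string> per dictionary key.
    public static List<List<string>> KeyLists(Dictionary<List<string>, List<double>> f) =>
        f.Keys.ToList();

    // One List<double> per dictionary value.
    public static List<List<double>> ValueLists(Dictionary<List<string>, List<double>> f) =>
        f.Values.ToList();

    // All strings from all key lists, flattened into a single list.
    public static List<string> AllKeyStrings(Dictionary<List<string>, List<double>> f) =>
        f.SelectMany(kvp => kvp.Key).ToList();

    // ContainsKey(new List<string> { ... }) is false even for equal contents,
    // because keys are compared by reference; compare the items explicitly instead.
    public static bool HasKeyWithSameContents(Dictionary<List<string>, List<double>> f, List<string> probe) =>
        f.Keys.Any(k => k.SequenceEqual(probe));
}
```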
Lol, this is probably the funniest thing I've seen in a while.
Alright. In C# there is a struct called KeyValuePair<TKey, TValue>. You can iterate through the entire dictionary with foreach and get access to what you want:
foreach (KeyValuePair<List<string>, List<double>> item in f) {
List<string> key = item.Key;
List<double> value = item.Value;
}
If you have only one key, meaning one list of strings:
List<string> newf = f.Keys.ElementAt(0);
If you have more, use another index.
Or find the key list that contains a particular item, since that's the list to retrieve:
List<string> newf = f.Keys.Single(k => k.Contains("SomeString"));
//exactly one key list must contain it, or Single will throw an exception
Get the keys whose corresponding values sum to more than (or less than, or equal to) some threshold:
var newf1 = f.Where(k => k.Value.Sum() > 10).Select(v => v.Key);

Check if one collection of values contains another

Suppose I have two collections as follows:
Collection1:
"A1"
"A1"
"M1"
"M2"
Collection2:
"M2"
"M3"
"M1"
"A1"
"A1"
"A2"
All the values are strings. I want to know whether all the elements in Collection1 are contained in Collection2, but I have no guarantee on the order, and a collection may have multiple entries with the same value. In this case, Collection2 does contain Collection1, because Collection2 has two A1's, an M1 and an M2. There's the obvious way: sorting both collections and popping off values as I find matches, but I was wondering if there's a faster, more efficient way to do this. Again, with the initial collections I have no guarantee on the order or on how many times a given value will appear.
EDIT: Changed "set" to "collection" to make clear that these aren't sets, as they can contain duplicate values.
The most concise way I know of (note that it only checks membership, so it ignores how many times a value is repeated):
//determine if Set2 contains all of the elements in Set1
bool containsAll = Set1.All(s => Set2.Contains(s));
Yes, there is a faster way, provided you're not space-constrained. (See space/time tradeoff.)
The algorithm:
Just insert all the elements of Set2 into a hashtable (in .NET 3.5, that's a HashSet<string>), and then go through all the elements of Set1 and check whether they're in the hashtable. This method is faster (Θ(m + n) time complexity, where m is the size of Set1 and n the size of Set2), but uses O(n) space.
Alternatively, just say:
bool isSuperset = new HashSet<string>(set2).IsSupersetOf(set1);
Edit 1:
For those people concerned about the possibility of duplicates (and hence the misnomer "set"), the idea can easily be extended:
Just make a new Dictionary<string, int> representing the count of each word in the super-list (add one to the count each time you see an instance of an existing word, and add the word with a count of 1 if it's not in the dictionary), and then go through the sub-list and decrement the count each time. If every word exists in the dictionary and the count is never zero when you try to decrement it, then the subset is in fact a sub-list; otherwise, you had too many instances of a word (or it didn't exist at all), so it's not a real sub-list.
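The counting idea from Edit 1 can be sketched as follows. This is a minimal illustration, not the answerer's own code; MultisetContainment and ContainsAll are hypothetical names, and strings are compared with the default ordinal, case-sensitive equality. It runs in Θ(m + n) time and, unlike the HashSet version, respects duplicates.

```csharp
using System.Collections.Generic;

// Sketch: multiset containment with a Dictionary<string, int> of counts (hypothetical names).
public static class MultisetContainment
{
    // True if every value in subList occurs in superList at least as many times.
    public static bool ContainsAll(IEnumerable<string> superList, IEnumerable<string> subList)
    {
        // Count each value in the super-list; TryGetValue yields 0 for unseen values.
        var counts = new Dictionary<string, int>();
        foreach (var s in superList)
        {
            counts.TryGetValue(s, out int c);
            counts[s] = c + 1;
        }

        // Consume one count per value in the sub-list.
        foreach (var s in subList)
        {
            if (!counts.TryGetValue(s, out int c) || c == 0)
                return false; // value missing, or more copies in subList than in superList
            counts[s] = c - 1;
        }
        return true;
    }
}
```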
Edit 2:
If the strings are very big and you're concerned about space efficiency, and an algorithm that works with (very) high probability works for you, then try storing a hash of each string instead. It's technically not guaranteed to work, but the probability of it not working is pretty darn low.
The problem I see with the HashSet, Intersect, and other Set theory answers is that you do contain duplicates, and "A set is a collection that contains no duplicate elements". Here's a way to handle the duplicate cases.
var list1 = new List<string> { "A1", "A1", "M1", "M2" };
var list2 = new List<string> { "M2", "M3", "M1", "A1", "A1", "A2" };
// Remove returns true if it was able to remove it, and it won't be there to be matched again if there's a duplicate in list1
bool areAllPresent = list1.All(i => list2.Remove(i));
EDIT: I renamed from Set1 and Set2 to list1 and list2 to appease Mehrdad.
EDIT 2: The comment implies it, but I wanted to explicitly state that this does alter list2. Only do it this way if you're using it as a comparison or control but don't need the contents afterwards.
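If you do need list2 afterwards, one option is to run the same Remove-based check against a copy. This is a small sketch of that variation; RemoveBasedCheck is a hypothetical name. It still costs O(m·n) in the worst case, because each Remove scans the list.

```csharp
using System.Collections.Generic;
using System.Linq;

// Sketch: the Remove-based duplicate-aware check, without modifying the caller's list (hypothetical name).
public static class RemoveBasedCheck
{
    public static bool AreAllPresent(List<string> list1, List<string> list2)
    {
        // Work on a copy so list2 itself is left untouched.
        var remaining = new List<string>(list2);
        // Remove deletes the first matching occurrence and returns false if none is left.
        return list1.All(i => remaining.Remove(i));
    }
}
```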
Check out LINQ:
string[] set1 = {"A1", "A1", "M1", "M2" };
string[] set2 = { "M2", "M3", "M1", "A1", "A1", "A2" };
var matching = set1.Intersect(set2);
foreach (string x in matching)
{
Console.WriteLine(x);
}
A similar one:
string[] set1 = new string[] { "a1","a2","a3","a4","a5","aa","ab" };
string[] set2 = new string[] {"m1","m2","a4","a6","a1" };
var a = set1.Select(set => set2.Contains(set));
