How to use custom IComparer for SortedDictionary? - c#

I am having difficulty using my custom IComparer with a SortedDictionary<>. The goal is to use email addresses in a specific format (firstname.lastname#domain.com) as keys and sort them by last name.
When I do something like this:
public class Program
{
    public static void Main(string[] args)
    {
        SortedDictionary<string, string> list = new SortedDictionary<string, string>(new SortEmailComparer());
        list.Add("a.johansson#domain.com", "value1");
        list.Add("b.johansson#domain.com", "value2");
        foreach (KeyValuePair<string, string> kvp in list)
        {
            Console.WriteLine(kvp.Key);
        }
        Console.ReadLine();
    }
}
public class SortEmailComparer : IComparer<string>
{
    public int Compare(string x, string y)
    {
        Regex regex = new Regex("\\b\\w*#\\b",
            RegexOptions.IgnoreCase
            | RegexOptions.CultureInvariant
            | RegexOptions.IgnorePatternWhitespace
            | RegexOptions.Compiled
        );
        string xLastname = regex.Match(x).ToString().Trim('#');
        string yLastname = regex.Match(y).ToString().Trim('#');
        return xLastname.CompareTo(yLastname);
    }
}
I get this ArgumentException when adding the second item:
An entry with the same key already exists.
I haven't worked with a custom IComparer for a SortedDictionary before, and I fail to see my error. What am I doing wrong?

If the two last names are equal, fall back to comparing something else, for example the whole email address:
int comp = xLastname.CompareTo(yLastname);
if (comp == 0)
    return x.CompareTo(y);
return comp;
A SortedDictionary uses the comparer not only for ordering but also to tell keys apart*, so you must specify a complete comparison (not only your sorting strategy).
EDIT:
* I mean that in a SortedDictionary two keys are equal if the comparer returns 0.
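A minimal sketch of a complete comparer along those lines, assuming the key format from the question. The class name LastNameThenKeyComparer and the helper LastName are hypothetical, and plain string operations stand in for the question's regex:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer: orders by last name, then by the whole key as a tie-break.
public class LastNameThenKeyComparer : IComparer<string>
{
    // Assumes keys look like "firstname.lastname#domain.com" (format from the question).
    private static string LastName(string email)
    {
        int hash = email.IndexOf('#');
        string local = hash >= 0 ? email.Substring(0, hash) : email;
        int dot = local.LastIndexOf('.');
        return local.Substring(dot + 1); // whole local part when there is no '.'
    }

    public int Compare(string x, string y)
    {
        int byLastName = string.CompareOrdinal(LastName(x), LastName(y));
        // Return 0 only when the keys are truly identical, so that two
        // different addresses with the same last name can coexist.
        return byLastName != 0 ? byLastName : string.CompareOrdinal(x, y);
    }
}
```

With this comparer, adding both a.johansson#domain.com and b.johansson#domain.com to the SortedDictionary no longer throws, and the two keys are enumerated in that order.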

Well, I haven't taken apart your comparer - but it looks like it's just comparing by last name, and you're trying to add the same last name (johansson) twice. That should give you an ArgumentException.
What did you want to happen - and what do you want your comparer to do?
Perhaps you want to sort by last name and then first name? That way you can have two email addresses with the same last name but different first names, and have them still be in the dictionary together, ordered by first name.

Related

Check if the number is contained within Dictionary array C#

I have a Dictionary whose key is an array of int and whose value is a string. How can I get the value by checking whether an int is contained in the key array?
public static Dictionary<int[], string> MyDic = new Dictionary<int[], string>
{
    { new int[] { 2, 25 }, "firstValue" },
    { new int[] { 3, 91, 315, 322 }, "secondValue" }
};
I have:
int number = 91;
string value = ?;
I need value to end up as "secondValue".
I think this is a bad design choice. If the numbers don't repeat between keys (as you said in your comment for the question) then just flatten the keys into a simple Dictionary<int,string>. Just have the different integers all be keys for the same strings.
For example:
var flat = new Dictionary<int, string>
{
    [2] = "firstValue",
    [25] = "firstValue",
};
To avoid storing the same value as separate objects, keep a single object and store a reference to it under each key:
string firstValue = "firstValue";
var flat = new Dictionary<int, string>
{
    [2] = firstValue,
    [25] = firstValue,
};
In this case, changing the value's content through one key changes it for all keys (not for a string, which is immutable, but for a mutable object).
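If the array-keyed dictionary already exists, the flattening can be done in code. This is only a sketch under the assumption stated above (no number appears in more than one key array); the names Lookup and Flatten are hypothetical:

```csharp
using System;
using System.Collections.Generic;

public static class Lookup
{
    // Builds a Dictionary<int, string> in which every number from a key array
    // maps to that array's value.
    public static Dictionary<int, string> Flatten(Dictionary<int[], string> source)
    {
        var flat = new Dictionary<int, string>();
        foreach (KeyValuePair<int[], string> entry in source)
        {
            foreach (int number in entry.Key)
            {
                // Add throws ArgumentException if a number repeats between keys,
                // which surfaces a violation of the assumption early.
                flat.Add(number, entry.Value);
            }
        }
        return flat;
    }
}
```

After flattening once, a lookup such as Lookup.Flatten(MyDic).TryGetValue(91, out string value) is a single hash lookup instead of a scan over every key array.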
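Use Contains and a foreach loop (more readable than some other solutions):
string value = null; // stays null if no key array contains the number
int number = 91;
foreach (KeyValuePair<int[], string> entry in MyDic)
{
    // Contains on an array is the LINQ extension method (needs using System.Linq)
    if (entry.Key.Contains(number))
    {
        value = entry.Value;
        break; // stop at the first match
    }
}
However, maybe a dictionary isn't the right choice for this.
Check out Gilad's answer for another structure that you could use.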
string value = MyDic.FirstOrDefault(x => x.Key.Contains(number)).Value;
The ? is not needed; it cannot be applied to a KeyValuePair, which is a struct.
Something like
value = MyDic.FirstOrDefault(x => x.Key.Contains(number)).Value;
will return the value of the first match, or null if there is none.

Handle Collision using Hashtable Class in c#

In the scenario below, how can I handle or implement collisions in C# using the Hashtable class? If the key value is the same, I get an ArgumentException.
static void Main(string[] args)
{
    Console.Write("Enter a string:");
    string input = Console.ReadLine();
    checkString(input);
    Console.ReadLine();
}
static void checkString(string input)
{
    Hashtable hashTbl = new Hashtable();
    foreach (char c in input)
    {
        hashTbl.Add(c.GetHashCode(), c);
    }
    printHash(hashTbl);
}
static void printHash(Hashtable hash)
{
    foreach (int key in hash.Keys)
    {
        Console.WriteLine("Key: {0} Value: {1}", key, hash[key]);
    }
}
My Expectation:
What do I need to do in the 'Value' argument to get around the 'Collision' issue. I am trying to check if the string consists of unique characters.
It seems you are misunderstanding how the Hashtable class works (it has also been superseded by the generic Dictionary<TKey, TValue> since .NET 2.0 in 2005, so prefer that in new code, although its behavior here is identical).
It seems you're expecting it to be your job to get an object's hashcode and add it to the hashtable. It isn't. All you need to do is add the object you want to use as key (each character), and the internal implementation will extract the hashcode.
However, what you're actually doing won't work even if you added the key object yourself. You're taking an input string (say, "test"), and for each character, you're adding it to the hashtable as a key. But since keys are, by definition, unique, you'll be adding the character 't' twice (it shows up twice in the input), so you'll get an exception.
"I am trying to check if the string consists of unique characters."
Then you need only keys, without values; that's what HashSet<T> is for.
var chars = new HashSet<char>();
foreach (char c in input)
{
    if (chars.Contains(c))
    {
        // c is not unique
    }
    else
    {
        chars.Add(c);
    }
}
But I'd prefer using LINQ in this case:
var hasUniqueChars = input.Length == input.Distinct().Count();
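A variation on the HashSet<T> loop, sketched here as a hypothetical helper named HasUniqueChars: HashSet<T>.Add returns false when the item is already present, so the check can stop at the first repeated character instead of scanning the whole string.

```csharp
using System;
using System.Collections.Generic;

public static class UniqueChars
{
    // Returns true when no character occurs more than once in the input.
    public static bool HasUniqueChars(string input)
    {
        var seen = new HashSet<char>();
        foreach (char c in input)
        {
            // Add returns false if c was already in the set.
            if (!seen.Add(c))
            {
                return false;
            }
        }
        return true;
    }
}
```

For the example input "test" this returns false because 't' appears twice; for "abc" it returns true.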
As previously stated, you should probably switch to the Dictionary<TKey, TValue> class for this.
If you want to get around the collision issue, you have to check whether the key already exists.
Dictionary<string, object> dictValues = new Dictionary<string, object>();
Then you can check for a collision:
if (dictValues.ContainsKey(YourKey))
{
    /* ... your collision handling here ... */
}
else
{
    // No collision
}
Another possibility would be, if you are not interested in preserving previous values for the same key:
dictValues[YourKey] = YourValue;
This will add the key entry if it is not there already. If it is, it will overwrite its value with the given input.
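As one possible form of "collision handling", the sketch below counts how often each character occurs instead of throwing on a repeated key. The names CharCounter and CountChars are hypothetical; it relies only on Dictionary<TKey, TValue>.TryGetValue and the indexer described above.

```csharp
using System;
using System.Collections.Generic;

public static class CharCounter
{
    // Maps each character of the input to the number of times it occurs.
    public static Dictionary<char, int> CountChars(string input)
    {
        var counts = new Dictionary<char, int>();
        foreach (char c in input)
        {
            // TryGetValue leaves 'seen' at 0 when c is not in the dictionary yet.
            counts.TryGetValue(c, out int seen);
            // The indexer adds the key or overwrites the existing value.
            counts[c] = seen + 1;
        }
        return counts;
    }
}
```

A string consists of unique characters exactly when every count in the result is 1.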

How to sort case insensitive with System.Dynamic.Linq?

I use System.Linq.Dynamic to order a list of items.
items = items.AsQueryable().OrderBy("Name ASC");
To my surprise, lowercase names get ordered after the capitalized names, so the items are returned in something like this order:
Ape
Cat
Dog
alligator
ant
beetle
I expected this order:
alligator
ant
Ape
beetle
Cat
Dog
Is there a way to get the correct order? Checked all method signatures for OrderBy and googled around, but nada.
You do not need to create a custom comparer because there's already a StringComparer class, which implements IComparer<string>.
words.OrderBy (x => x, StringComparer.OrdinalIgnoreCase)
This way, you do not need to create different IComparer implementations if you wanted to use other string comparison methods, like StringComparer.InvariantCultureIgnoreCase.
However, this might be desirable depending on your situation. For example, I do have multiple extension methods defined in LINQPad, like OrderBySelfInvariantCultureIgnoreCase, because it is convenient to use this with code completion rather than typing out the equivalent code by hand:
public static IEnumerable<string> OrderBySelfInvariantCultureIgnoreCase(this IEnumerable<string> source)
{
    return source.OrderBy(x => x, StringComparer.InvariantCultureIgnoreCase);
}
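To make the StringComparer approach concrete, here is a small sketch applied to the names from the question. It uses plain LINQ to Objects rather than the string-based System.Linq.Dynamic syntax, and the names CaseInsensitiveSortDemo and SortIgnoringCase are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CaseInsensitiveSortDemo
{
    // Orders the names without regard to case, using the built-in comparer.
    public static List<string> SortIgnoringCase(IEnumerable<string> names)
    {
        return names.OrderBy(n => n, StringComparer.OrdinalIgnoreCase).ToList();
    }
}
```

Called with Ape, Cat, Dog, alligator, ant, beetle, this returns alligator, ant, Ape, beetle, Cat, Dog, which is the order the question expects.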
You can create a custom comparer, such as:
public void Main()
{
    String[] words = { "aPPLE", "AbAcUs", "bRaNcH", "BlUeBeRrY", "ClOvEr", "cHeRry" };
    var sortedWords = words.OrderBy(a => a, new CaseInsensitiveComparer());
    ObjectDumper.Write(sortedWords);
}
public class CaseInsensitiveComparer : IComparer<string>
{
    public int Compare(string x, string y)
    {
        return string.Compare(x, y, StringComparison.OrdinalIgnoreCase);
    }
}
Found at https://code.msdn.microsoft.com/SQL-Ordering-Operators-050af19e
I faced the same issue and found no easy solution on the internet. After trying many approaches, I finally found a very simple one that worked for me: apply ToUpper() to the property inside the dynamic ordering expression.
string sort = "Name ASC";
string[] data = sort.Split(' ');
// Builds the expression "Name.ToUpper() ASC"
items = items.OrderBy($"{data[0]}.ToUpper() {data[1]}");
Now the output is:
alligator
ant
Ape
beetle
Cat
Dog

Custom List<string[]> Sort

I have a list of string[].
List<string[]> cardDataBase;
I need to sort that list by each list-item's second string value (item[1]) in custom order.
The custom order is a bit complicated, order by those starting characters:
"MW1"
"FW"
"DN"
"MWSTX1CK"
"MWSTX2FF"
then order by these letters following above starting letters:
"A"
"Q"
"J"
"C"
"E"
"I"
"A"
and then by the numbers following above.
A sample, with the unordered list on the left and the ordered list on the right:
MW1E10         MW1Q04
MWSTX2FFI06    MW1Q05
FWQ02          MW1E10
MW1Q04         MW1I06
MW1Q05         FWQ02
FWI01          FWI01
MWSTX2FFA01    DNC03
DNC03          MWSTX1CKC02
MWSTX1CKC02    MWSTX2FFI03
MWSTX2FFI03    MWSTX2FFI06
MW1I06         MWSTX2FFA01
I tried LINQ, but I am not that good at it right now and cannot solve this on my own. Do I need a dictionary, a regex, or a dictionary with a regex in it? What would be the best approach?
I think you're approaching this incorrectly. You're not sorting strings, you're sorting structured objects that are misrepresented as strings (somebody aptly named this antipattern "stringly typed"). Your requirements show that you know this structure, yet it's not represented in the data structure List<string[]>, and that's making your life hard. You should parse that structure into a real type (struct or class), and then sort that.
enum PrefixCode { MW1, FW, DN, MWSTX1CK, MWSTX2FF, }
enum TheseLetters { Q, J, C, E, I, A, }
struct CardRecord : IComparable<CardRecord> {
    public readonly PrefixCode Code;
    public readonly TheseLetters Letter;
    public readonly uint Number;
    public CardRecord(string input) {
        Code = ParseEnum<PrefixCode>(ref input);
        Letter = ParseEnum<TheseLetters>(ref input);
        Number = uint.Parse(input);
    }
    static T ParseEnum<T>(ref string input) { //assumes non-overlapping prefixes
        foreach (T val in Enum.GetValues(typeof(T))) {
            if (input.StartsWith(val.ToString())) {
                input = input.Substring(val.ToString().Length);
                return val;
            }
        }
        throw new InvalidOperationException("Failed to parse: " + input);
    }
    public int CompareTo(CardRecord other) {
        var codeCmp = Code.CompareTo(other.Code);
        if (codeCmp != 0) return codeCmp;
        var letterCmp = Letter.CompareTo(other.Letter);
        if (letterCmp != 0) return letterCmp;
        return Number.CompareTo(other.Number);
    }
    public override string ToString() {
        return Code.ToString() + Letter + Number.ToString("00");
    }
}
A program using the above to process your example might then be:
static class Program {
    static void Main() {
        var inputStrings = new []{ "MW1E10", "MWSTX2FFI06", "FWQ02", "MW1Q04", "MW1Q05",
            "FWI01", "MWSTX2FFA01", "DNC03", "MWSTX1CKC02", "MWSTX2FFI03", "MW1I06" };
        var outputStrings = inputStrings
            .Select(s => new CardRecord(s))
            .OrderBy(c => c)
            .Select(c => c.ToString());
        Console.WriteLine(string.Join("\n", outputStrings));
    }
}
This generates the same ordering as in your example. In real code, I'd recommend you name the types according to what they represent, and not, for example, TheseLetters.
This solution - with a real parse step - is superior because it's almost certain that you'll want to do more with this data at some point, and this allows you to actually access the components of the data easily. Furthermore, it's comprehensible to a future maintainer since the reason behind the ordering is somewhat clear. By contrast, if you chose to do complex string-based processing it's often very hard to understand what's going on (especially if it's part of a larger program, and not a tiny example as here).
Making new types is cheap. If your method's return value doesn't quite "fit" in an existing type, just make a new one, even if that means thousands of types.
This is a bit of spoon-feeding, but I found the question pretty interesting and perhaps it will be useful for others. I also added some comments to explain:
void Main()
{
    var cardDatabase = new List<string>{
        "MW1E10",
        "MWSTX2FFI06",
        "FWQ02",
        "MW1Q04",
        "MW1Q05",
        "FWI01",
        "MWSTX2FFA01",
        "DNC03",
        "MWSTX1CKC02",
        "MWSTX2FFI03",
        "MW1I06",
    };
    var orderTable = new List<string>[]{
        new List<string>
        {
            "MW1",
            "FW",
            "DN",
            "MWSTX1CK",
            "MWSTX2FF"
        },
        new List<string>
        {
            "Q",
            "J",
            "C",
            "E",
            "I",
            "A"
        }
    };
    var test = cardDatabase.Select(input => {
        var r = Regex.Match(input, "^(MW1|FW|DN|MWSTX1CK|MWSTX2FF)(A|Q|J|C|E|I|A)([0-9]+)$");
        if (!r.Success) throw new Exception("Invalid data!");
        // for each input string,
        // we are going to split it into "substrings",
        // eg: MWSTX1CKC02 will be
        // [MWSTX1CK, C, 02]
        // after that, we use IndexOf on each component
        // to calculate "real" order,
        // note that thirdComponent (aka number component)
        // does not need IndexOf because it already represents the real order,
        // we still want to convert string to integer though, because we don't like
        // "string ordering" for numbers.
        return new
        {
            input = input,
            firstComponent = orderTable[0].IndexOf(r.Groups[1].Value),
            secondComponent = orderTable[1].IndexOf(r.Groups[2].Value),
            thirdComponent = int.Parse(r.Groups[3].Value)
        };
        // and after it's done,
        // we start using LINQ OrderBy and ThenBy functions
        // to have our custom sorting.
    })
    .OrderBy(calculatedInput => calculatedInput.firstComponent)
    .ThenBy(calculatedInput => calculatedInput.secondComponent)
    .ThenBy(calculatedInput => calculatedInput.thirdComponent)
    .Select(calculatedInput => calculatedInput.input)
    .ToList();
    // Print one entry per line; passing the list itself would only print its type name
    Console.WriteLine(string.Join("\n", test));
}
You can use the Array.Sort() method, where the first parameter is the array you're sorting and the second parameter (an IComparer<T> or a Comparison<T> delegate) contains the logic that determines the order. List<T>.Sort accepts the same kinds of comparer.
You can also use the Enumerable.OrderBy extension method provided by the System.Linq namespace.
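A sketch of the comparison-based approach applied directly to the List<string[]> from the question, sorting in place by item[1]. The prefix and letter orders are taken from the question (with A last, as in the sample output); the names CardSort, Key and SortByCode are hypothetical, and value tuples require C# 7 or later:

```csharp
using System;
using System.Collections.Generic;

public static class CardSort
{
    // Custom orders from the question; the position in each list is the rank.
    private static readonly string[] Prefixes = { "MW1", "FW", "DN", "MWSTX1CK", "MWSTX2FF" };
    private const string Letters = "QJCEIA";

    // Splits a code such as "MWSTX1CKC02" into (prefix rank, letter rank, number).
    public static (int Prefix, int Letter, int Number) Key(string code)
    {
        int prefix = Array.FindIndex(Prefixes, p => code.StartsWith(p, StringComparison.Ordinal));
        if (prefix < 0) throw new FormatException("Unknown prefix: " + code);
        string rest = code.Substring(Prefixes[prefix].Length);
        if (rest.Length < 2) throw new FormatException("Missing letter or number: " + code);
        int letter = Letters.IndexOf(rest[0]);
        if (letter < 0) throw new FormatException("Unknown letter: " + code);
        return (prefix, letter, int.Parse(rest.Substring(1)));
    }

    // Sorts the list in place by the second string of each item.
    public static void SortByCode(List<string[]> cards)
    {
        // Value tuples compare element by element, left to right.
        cards.Sort((a, b) => Key(a[1]).CompareTo(Key(b[1])));
    }
}
```

Key is re-evaluated on every comparison, which keeps the sketch short; for large lists, computing the keys once (for example with OrderBy) would avoid the repeated parsing.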

Finding duplicates in List<string>

In a list with some hundred thousand entries, how does one go about comparing each entry with the rest of the list for duplicates?
For example, the list fileNames contains both "00012345.pdf" and "12345.pdf", and these are considered duplicates. What is the best strategy for flagging this kind of duplicate?
Thanks
Update: The naming of files is restricted to numbers. They are padded with zeros. Duplicates are where the padding is missing. Thus, "123.pdf" & "000123.pdf" are duplicates.
You probably want to implement your own substring comparer to test equality based on whether a substring is contained within another string.
This isn't necessarily optimised, but it will work. You could also possibly consider using Parallel Linq if you are using .NET 4.0.
EDIT: Answer updated to reflect refined question after it was edited
void Main()
{
    List<string> stringList = new List<string> { "00012345.pdf", "12345.pdf", "notaduplicate.jpg", "3453456363234.jpg" };
    IEqualityComparer<string> comparer = new NumericFilenameEqualityComparer();
    var duplicates = stringList.GroupBy(s => s, comparer).Where(grp => grp.Count() > 1);
    // do something with grouped duplicates...
}
// Not safe for nulls!
// NB: do your own parameter / null checks / string-case options etc.!
public class NumericFilenameEqualityComparer : IEqualityComparer<string> {
    private static Regex digitFilenameRegex = new Regex(@"\d+", RegexOptions.Compiled);
    // Numeric value of the first run of digits; names without digits all map to long.MaxValue
    private static long NumericValue(string value) {
        Match digitsMatch = digitFilenameRegex.Match(value);
        return digitsMatch.Success ? long.Parse(digitsMatch.Value) : long.MaxValue;
    }
    public bool Equals(string left, string right) {
        return NumericValue(left) == NumericValue(right);
    }
    public int GetHashCode(string value) {
        // Hash the same value that Equals compares, so equal names share a bucket
        // and unequal names are spread out instead of all colliding
        return NumericValue(value).GetHashCode();
    }
}
I understand you are looking for duplicates in order to remove them?
One way to go about it could be the following:
Create a class MyString that takes care of the duplication rules. That is, it overrides Equals and GetHashCode to reproduce exactly the duplication rules you have in mind. (I understand from your question that 00012345.pdf and 12345.pdf should be considered duplicates?)
Make this class explicitly or implicitly convertible to string (or override ToString() for that matter).
Create a HashSet<MyString> and fill it by iterating through your original List<string>, checking for duplicates as you go.
Might be dirty but it will do the trick. The only "hard" part here is correctly implementing your duplication rules.
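A sketch of the same idea that swaps the wrapper class for a normalized string key stored in a HashSet<string>. It assumes the rule from the question's update (names are numeric and differ only by zero padding); the names DuplicateFinder, Normalize and FindDuplicates are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class DuplicateFinder
{
    // Strips the zero padding from the name: "00012345.pdf" -> "12345.pdf".
    public static string Normalize(string fileName)
    {
        string stem = Path.GetFileNameWithoutExtension(fileName).TrimStart('0');
        if (stem.Length == 0) stem = "0"; // a name made only of zeros
        return stem + Path.GetExtension(fileName);
    }

    // Returns every name whose normalized form was already seen earlier in the list.
    public static List<string> FindDuplicates(IEnumerable<string> fileNames)
    {
        var seen = new HashSet<string>(StringComparer.Ordinal);
        var duplicates = new List<string>();
        foreach (string name in fileNames)
        {
            // Add returns false when the normalized name is already in the set.
            if (!seen.Add(Normalize(name)))
            {
                duplicates.Add(name);
            }
        }
        return duplicates;
    }
}
```

This is a single pass with one hash lookup per entry, so it should remain practical for a list with some hundred thousand names.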
I have a simple solution for finding duplicate words and characters in a string (this example is in Java).
For words:
public class Test {
    public static void main(String[] args) {
        findDuplicateWords("i am am a a learner learner learner");
    }
    private static void findDuplicateWords(String string) {
        HashMap<String,Integer> hm = new HashMap<>();
        String[] s = string.split(" ");
        for (String tempString : s) {
            if (hm.get(tempString) != null) {
                hm.put(tempString, hm.get(tempString) + 1);
            }
            else {
                hm.put(tempString, 1);
            }
        }
        System.out.println(hm);
    }
}
For characters, use a for loop over the string's length and read each character with charAt().
Maybe something like this:
List<string> theList = new List<string>() { "00012345.pdf", "00012345.pdf", "12345.pdf", "1234567.pdf", "12.pdf" };
theList.GroupBy(txt => txt)
    .Where(grouping => grouping.Count() > 1)
    .ToList()
    .ForEach(groupItem => Console.WriteLine("{0} duplicated {1} times with these values {2}",
        groupItem.Key,
        groupItem.Count(),
        string.Join(" ", groupItem.ToArray())));
