The specificity of sorting - c#

The code of the character '-' is 45 and the code of the character 'a' is 97, so '-' < 'a' is clearly true.
Console.WriteLine((int)'-' + " " + (int)'a');
Console.WriteLine('-' < 'a');
45 97
True
Hence the result of the following sort is correct:
var a1 = new string[] { "a", "-" };
Console.WriteLine(string.Join(" ", a1));
Array.Sort(a1);
Console.WriteLine(string.Join(" ", a1));
a -
- a
But why is the result of the following sort wrong?
var a2 = new string[] { "ab", "-b" };
Console.WriteLine(string.Join(" ", a2));
Array.Sort(a2);
Console.WriteLine(string.Join(" ", a2));
ab -b
ab -b

The "-" is ignored,
so "-" = "" < "a"
and "-b" = "b" > "ab".
This is because Array.Sort uses a culture-sensitive comparison (word sort) by default.
https://msdn.microsoft.com/en-us/library/system.globalization.compareoptions(v=vs.110).aspx
The .NET Framework uses three distinct ways of sorting: word sort, string
sort, and ordinal sort. Word sort performs a culture-sensitive
comparison of strings. Certain nonalphanumeric characters might have
special weights assigned to them. For example, the hyphen ("-") might
have a very small weight assigned to it so that "coop" and "co-op"
appear next to each other in a sorted list. String sort is similar to
word sort, except that there are no special cases. Therefore, all
nonalphanumeric symbols come before all alphanumeric characters.
Ordinal sort compares strings based on the Unicode values of each
element of the string.
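To get the code-point ordering the question expects, an ordinal comparer can be passed to Array.Sort. The following is a minimal sketch; the class name SortModes and the helper SortedWith are made up for illustration. Only the ordinal result is annotated, because the culture-sensitive result depends on the current culture and on the runtime's globalization library.

```csharp
using System;

public static class SortModes
{
    // Hypothetical helper: returns a sorted copy so the caller's array is left untouched.
    public static string[] SortedWith(string[] items, StringComparer comparer)
    {
        var copy = (string[])items.Clone();
        Array.Sort(copy, comparer);
        return copy;
    }

    public static void Main()
    {
        var words = new[] { "ab", "-b" };

        // Ordinal: compares UTF-16 code units, so '-' (45) sorts before 'a' (97).
        Console.WriteLine(string.Join(" ", SortedWith(words, StringComparer.Ordinal)));
        // prints "-b ab"

        // Culture-sensitive (what Array.Sort uses when no comparer is given):
        // the order depends on the culture and runtime, so no output is claimed here.
        Console.WriteLine(string.Join(" ", SortedWith(words, StringComparer.CurrentCulture)));
    }
}
```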

Related

Find in the List of words with letters in certain positions

I'm writing a crossword puzzle maker. The user selects cells for words, and the program compiles a crossword puzzle from a dictionary (all the words that can be used in the crossword), stored as a List<string>.
I need to find the word (or words) in the dictionary that match a given mask (pattern).
For example, I need to find all words which match
#a###g
pattern, i.e. all words of length 6 in the dictionary with "a" at index 1 and "g" at index 5
The number of letters and their positions are not known in advance.
How can I implement this?
You can convert the word description (mask)
#a###g
into corresponding regular expression pattern:
^\p{L}a\p{L}{3}g$
Pattern explained:
^ - anchor, word beginning
\p{L} - arbitrary letter
a - letter 'a'
\p{L}{3} - exactly 3 arbitrary letters
g - letter 'g'
$ - anchor, word ending
and then get all the words from the dictionary that match this pattern:
Code:
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
...
private static string[] Variants(string mask, IEnumerable<string> availableWords) {
    // Turn each run of n '#' characters into \p{L}{n}, i.e. exactly n arbitrary letters
    Regex regex = new Regex("^" + Regex.Replace(mask, "#+", m => $@"\p{{L}}{{{m.Length}}}") + "$");
    return availableWords
        .Where(word => regex.IsMatch(word))
        .OrderBy(word => word)
        .ToArray();
}
Demo:
string[] allWords = new [] {
"quick",
"brown",
"fox",
"jump",
"rating",
"coding",
"lazy",
"paring",
"fang",
"dog",
};
string[] variants = Variants("#a###g", allWords);
Console.Write(string.Join(Environment.NewLine, variants));
Outcome:
paring
rating
I need to find a word in a list with "a" at index 1 and "g" at index 5, like the following
wordList.Where(word => word.Length == 6 && word[1] == 'a' && word[5] == 'g')
Checking the length first is critical to prevent a crash (indexing past the end of a shorter word throws), unless your words are already arranged into separate lists by length.
If you mean that you literally will pass "#a###g" as the parameter that conveys the search term:
var term = "#a###g";
var search = term.Select((c,i) => (Chr:c,Idx:i)).Where(t => t.Chr != '#').ToArray();
var words = wordList.Where(word => word.Length == term.Length && search.All(t => word[t.Idx] == t.Chr));
How it works:
Take "#a###g" and project it to a sequence of the index of the char and the char itself, so ('#', 0),('a', 1),('#', 2),('#', 3),('#', 4),('g', 5)
Discard the '#', leaving only ('a', 1),('g', 5)
This means "'a' at position 1 and 'g' at 5"
Search the word list, demanding that the word length is the same as that of "#a###g", and also that All the search terms match when we "get the char out of the word at Idx and check it matches the Chr in the search term".
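The steps above can be put together into a small self-contained program. This is only a sketch: the class name MaskSearch, the method name Find and the sample word list are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MaskSearch
{
    // '#' in the mask means "any character"; every other character must match exactly.
    public static List<string> Find(string mask, IEnumerable<string> words)
    {
        // Keep only the fixed positions, e.g. ('a', 1) and ('g', 5) for "#a###g".
        var fixedChars = mask
            .Select((c, i) => (Chr: c, Idx: i))
            .Where(t => t.Chr != '#')
            .ToArray();

        return words
            .Where(w => w.Length == mask.Length   // length check first, so indexing is safe
                     && fixedChars.All(t => w[t.Idx] == t.Chr))
            .ToList();
    }

    public static void Main()
    {
        var words = new[] { "quick", "rating", "coding", "paring", "fang" };
        Console.WriteLine(string.Join(" ", Find("#a###g", words)));
        // prints "rating paring"
    }
}
```

The fixed positions are computed once per mask rather than once per word, which keeps the per-word check to a length comparison and a few character comparisons.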

How to sort a list of strings by prefix number then alphabetically

My goal is to sort a List<string> in messy order to an order like this ["1", "1a", "2", "2a", "2b", "a", "b"]
My code is a little long, so I've included it at this link https://dotnetfiddle.net/wZ0dTG.
What I'm trying to do is split the strings using Regex.Split(string, "([0-9]+)")[0] then based on which strings pass int.TryParse, I sort the list numerically or alphabetically.
The regex matches all the integers contained within the string.
Before I apply the regex, the sort runs without errors, but it doesn't order the items properly.
When I apply the regex, I get this error:
ArgumentException: Unable to sort because the IComparer.Compare() method returns inconsistent results. Either a value does not compare equal to itself, or one value repeatedly compared to another value yields different results. IComparer: 'System.Comparison`1[Ndot_Partnering_Tool.Data.Models.Project
So you have to split your strings into an (optional) numeric part and an (optional) rest. This can be done by a Regex:
var match = Regex.Match(item, @"(?<number>\d+)?(?<rest>.*)$");
The "number" part matches one or more digits, but is optional (question mark), the "rest" part matches the whole rest of the string.
Sorting via LINQ:
var input = new List<string>{ "12a", "1", "1a", "2", "2a", "2b", "a", "b", "12a" };
var sorted = input.OrderBy(item =>
{
var match = Regex.Match(item, @"(?<number>\d+)?(?<rest>.*)$");
return Tuple.Create(match.Groups["number"].Success ? int.Parse(match.Groups["number"].Value) : -1, match.Groups["rest"].Value);
}).ToList();
(I deliberately decided to put the items without a leading number before the rest; this was not specified in the question).
Output: a, b, 1, 1a, 2, 2a, 2b, 12a, 12a
For this particular task the OrderBy method is perfect for you. I would use that instead of Regex.
OrderBy uses lambda expressions as a key to sort.
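Because digits sort before letters in the default string ordering, you can simply sort with the default comparer. Note that strings are compared character by character, so this only gives the desired order while the numeric prefixes are single digits ("12a" would sort before "2").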
You can do:
List<string> List = new List<string>() {"a", "2b", "1a", "1", "2", "2a", "b", "1b" };
List<string> OrderedList = List.OrderBy(x => x).ToList();
The OrderBy method returns IEnumerable so you have to convert it back to List.
Output:
The original list: a 2b 1a 1 2 2a b 1b
The ordered list: 1 1a 1b 2 2a 2b a b
Two problems:
SplitRegex() fails on the argument "a" because it contains no digits, so Regex.Split returns an array with only one element. You can use this code:
return Regex.Split(x, "([0-9]+)").ElementAtOrDefault(1) ?? string.Empty;
When neither x nor y can be converted to an integer, you call CompareString() for x and y, but at that point x and y are not the whole strings; they are only the numeric parts (and therefore empty). You need to pass the list items to the comparer as they are and extract the numbers there:
bool leftcanconvertflag = Int32.TryParse(SplitRegex(x), out leftconvertresult);
bool rightcanconvertflag = Int32.TryParse(SplitRegex(y), out rightconvertresult);
if (leftcanconvertflag && !rightcanconvertflag)
{
compareresult = -1;
}
if (!leftcanconvertflag && rightcanconvertflag)
{
compareresult = 1;
}
if (leftcanconvertflag && rightcanconvertflag)
{
compareresult = leftconvertresult.CompareTo(rightconvertresult);
}
if (!leftcanconvertflag && !rightcanconvertflag)
{
compareresult = CompareString(x, y);
}
and sort list like this:
list.Sort(CompareContractNumbers);
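For reference, here is a self-contained sketch of a comparer along these lines. It is not the asker's code: the class name PrefixSort and the helper Split are hypothetical, the numeric prefix is assumed to fit in an int, and the non-numeric remainder is compared ordinally. Every pair of inputs always gets the same answer, which is what avoids the "inconsistent results" exception.

```csharp
using System;
using System.Collections.Generic;

public static class PrefixSort
{
    // Splits "12ab" into (12, "ab"); a string without leading digits gets Number == null.
    // Assumption: the leading digit run is short enough to fit in an int.
    private static (int? Number, string Rest) Split(string s)
    {
        int i = 0;
        while (i < s.Length && s[i] >= '0' && s[i] <= '9') i++;
        int? number = i > 0 ? int.Parse(s.Substring(0, i)) : (int?)null;
        return (number, s.Substring(i));
    }

    public static int Compare(string x, string y)
    {
        var (nx, rx) = Split(x);
        var (ny, ry) = Split(y);

        // Items with a numeric prefix come before items without one.
        if (nx.HasValue != ny.HasValue) return nx.HasValue ? -1 : 1;

        // Both have a prefix: compare numerically first.
        if (nx.HasValue)
        {
            int byNumber = nx.Value.CompareTo(ny.Value);
            if (byNumber != 0) return byNumber;
        }

        // Same number (or no number on either side): compare the remaining text.
        return string.CompareOrdinal(rx, ry);
    }

    public static void Main()
    {
        var list = new List<string> { "b", "2a", "12a", "a", "1", "2b", "1a", "2" };
        list.Sort(Compare);
        Console.WriteLine(string.Join(" ", list));
        // prints "1 1a 2 2a 2b 12a a b"
    }
}
```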

sorting on List<string> with middle 2 character

I would like to sort a list by the number in the middle of each string. For example, the list contains the following:
body1text
body2text
body11text
body3text
body12text
body13text
if I apply list.OrderBy(r => r.body), it will sort as follows:
body1text
body11text
body12text
body13text
body2text
body3text
But I need the following result:
body1text
body2text
body3text
body11text
body12text
body13text
Is there an easy way to sort by the number in the middle?
Regards
Shuvra
The issue here is that your numbers are compared as strings, so string.Compare("11", "2") returns -1, meaning that "11" is less than "2". Assuming that your string is always in the format "body" + n digits + "text", you can match the digits with a regex and parse an integer from the result:
new[]
{
"body1text"
,"body2text"
,"body3text"
,"body11text"
,"body12text"
,"body13text"
}
.OrderBy(s => int.Parse(Regex.Match(s, @"\d+").Value))
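If some strings might not contain a number at all, int.Parse would throw on the empty match. A slightly more defensive variant could look like the sketch below; the class name EmbeddedNumberSort, the helper NumberIn and the choice to sort number-less strings last are assumptions made for the example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class EmbeddedNumberSort
{
    // Returns the first run of digits as an int, or int.MaxValue when there is none
    // (so such strings sort last instead of making int.Parse throw).
    public static int NumberIn(string s)
    {
        Match m = Regex.Match(s, @"\d+");
        return m.Success && int.TryParse(m.Value, out int n) ? n : int.MaxValue;
    }

    public static string[] Sort(string[] items) =>
        items.OrderBy(s => NumberIn(s))
             .ThenBy(s => s, StringComparer.Ordinal)   // tie-breaker for equal numbers
             .ToArray();

    public static void Main()
    {
        var items = new[]
        {
            "body11text", "body2text", "bodytext", "body1text",
            "body13text", "body3text", "body12text"
        };
        Console.WriteLine(string.Join(" ", Sort(items)));
        // prints "body1text body2text body3text body11text body12text body13text bodytext"
    }
}
```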

Can we use StringComparer to sort all kind of strings including special characters?

I have used StringComparer.Ordinal to sort a list of strings. It sorts the strings, including those with special characters, except for \\.
Are there any other options to sort \\ without writing a custom comparer?
I tried the following code:
var strings = new[] { "#a", "\\b", "c", "1" };
Array.Sort(strings, StringComparer.Ordinal);
I expect output as
#a \\b 1 c
but the actual output is
#a 1 c \\b
The code-point of # is 35, 1 is 49, \ is 92, a/b/c is 97/98/99
The output from:
var arr = new[] { "#a", "\\b", "c", "1" };
Array.Sort(arr, StringComparer.Ordinal);
Console.WriteLine(string.Join(" ", arr));
is:
#a 1 \b c
So... it is working as expected, sorting them by their ordinal values.

Regex masking of words that contain a digit

Trying to come up with a 'simple' regex to mask bits of text that look like they might contain account numbers.
In plain English:
any word containing a digit (or a train of such words) should be matched
leave the last 4 digits intact
replace all previous part of the matched string with four X's (xxxx)
So far
I'm using the following:
[\-0-9 ]+(?<m1>[\-0-9]{4})
replacing with
xxxx${m1}
But this misses on the last few samples below
sample data:
123456789
a123b456
a1234b5678
a1234 b5678
111 22 3333
this is a a1234 b5678 test string
Actual results
xxxx6789
a123b456
a1234b5678
a1234 b5678
xxxx3333
this is a a1234 b5678 test string
Expected results
xxxx6789
xxxxb456
xxxx5678
xxxx5678
xxxx3333
this is a xxxx5678 test string
Is such an arrangement possible with a regex replace?
I think I'm going to need some greediness and lookahead functionality, but I have zero experience in those areas.
This works for your example:
var result = Regex.Replace(
input,
@"(?<!\b\w*\d\w*)(?<m1>\s?\b\w*\d\w*)+",
m => "xxxx" + m.Value.Substring(Math.Max(0, m.Value.Length - 4)));
If you have a value like 111 2233 33, it will print xxxx3 33. If you want this to be free from spaces, you could turn the lambda into a multi-line statement that removes whitespace from the value.
To explain the regex pattern a bit, it's got a negative lookbehind, so it makes sure that the word behind it does not have a digit in it (with optional word characters around the digit). Then it's got the m1 portion, which looks for words with digits in them. The last four characters of this are grabbed via some C# code after the regex pattern resolves the rest.
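The whitespace-removing variant mentioned above could be sketched as follows. The class name AccountMask and the method name Mask are made up for the example; the pattern is the one from this answer, unchanged.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class AccountMask
{
    // Same pattern as above: a run of words that each contain at least one digit.
    private static readonly Regex DigitWords =
        new Regex(@"(?<!\b\w*\d\w*)(?<m1>\s?\b\w*\d\w*)+");

    public static string Mask(string input)
    {
        return DigitWords.Replace(input, m =>
        {
            // Drop whitespace so the last four characters come from the words themselves.
            string compact = new string(m.Value.Where(c => !char.IsWhiteSpace(c)).ToArray());
            return "xxxx" + compact.Substring(Math.Max(0, compact.Length - 4));
        });
    }

    public static void Main()
    {
        Console.WriteLine(Mask("123456789"));    // prints "xxxx6789"
        Console.WriteLine(Mask("111 2233 33"));  // prints "xxxx3333"
    }
}
```

Because the optional \s? can also consume the space in front of the first matched word, that space is removed along with the others; if the separator has to be preserved, it would need to be added back inside the lambda.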
I don't think that regex is the best way to solve this problem, which is why I am posting this answer. For situations this complex, building the corresponding regex is too difficult and, what is worse, its clarity and adaptability are much lower than those of a longer-code approach.
The code below these lines delivers the exact functionality you are after, it is clear enough and can be easily extended.
string input = "this is a a1234 b5678 test string";
string output = "";
string[] temp = input.Trim().Split(' ');
bool previousNum = false;
string tempOutput = "";   // concatenation of the current run of digit-containing words
foreach (string word in temp)
{
    if (word.Any(c => char.IsDigit(c)))   // requires using System.Linq;
    {
        previousNum = true;
        tempOutput = tempOutput + word;
    }
    else
    {
        if (previousNum)
        {
            // The run has ended: mask it, append it, and reset for the next run
            if (tempOutput.Length >= 4) tempOutput = "xxxx" + tempOutput.Substring(tempOutput.Length - 4, 4);
            output = output + " " + tempOutput;
            previousNum = false;
            tempOutput = "";
        }
        output = output + " " + word;
    }
}
if (previousNum)
{
    // Flush a run that reaches the end of the input
    if (tempOutput.Length >= 4) tempOutput = "xxxx" + tempOutput.Substring(tempOutput.Length - 4, 4);
    output = output + " " + tempOutput;
}
output = output.TrimStart();   // drop the leading separator
Have you tried this:
.*(?<m1>[\d]{4})(?<m2>.*)
with replacement
xxxx${m1}${m2}
This produces
xxxx6789
xxxx5678
xxxx5678
xxxx3333
xxxx5678 test string
You are not going to get 'a123b456' to match ... until 'b' becomes a number. ;-)
Here is my really quick attempt:
(\s|^)([a-z]*\d+[a-z,0-9]+\s)+
This will select all of those test cases. Now as for C# code, you'll need to check each match to see if there is a space at the beginning or end of the match sequence (e.g., the last example will have the space before and after selected)
Here is the C# code to do the replace:
var redacted = Regex.Replace(record, @"(\s|^)([a-z]*\d+[a-z,0-9]+\s)+",
match => "xxxx" /*new String("x",match.Value.Length - 4)*/ +
match.Value.Substring(Math.Max(0, match.Value.Length - 4)));
