I have some strings, entered by users, that may look like this:
++7
7++
1++7
1+7
1++7+10++15+20+30++
Those are to mean:
Anything up to and including 7
Anything from 7 and up
1 and 7 and anything in between
1 and 7 only
1 to 7, 10 to 15, 20 and 30 and above
I need to parse those strings into actual ranges. That is I need to create a list of objects of type Range which have a start and an end. For single items I just set the start and end to the same, and for those that are above or below, I set start or end to null. For example for the first one I would get one range which had start set to null and end set to 7.
I currently have a kind of messy method using a regular expression to do this splitting and parsing and I want to simplify it. My problem is that I need to split on + first, and then on ++. But if I split on + first, then the ++ instances are ruined and I end up with a mess.
Looking at those strings it should be really easy to parse them, I just can't come up with a smart way to do it. There just has to be an easier (cleaner, easier to read) way. Probably involving some easy concept I just haven't heard about before :P
The regular expression looks like this:
private readonly Regex Pattern = new Regex(@" ( [+]{2,} )?
([^+]+)
(?:
(?: [+]{2,} [^+]* )*
[+]{2,} ([^+]+)
)?
( [+]{2,} )? ", RegexOptions.IgnorePatternWhitespace);
That is then used like this:
public IEnumerable<Range<T>> Parse(string subject, TryParseDelegate<string, T> itemParser)
{
if (string.IsNullOrEmpty(subject))
yield break;
for (var item = RangeStringConstants.Items.Match(subject); item.Success; item = item.NextMatch())
{
var startIsOpen = item.Groups[1].Success;
var endIsOpen = item.Groups[4].Success;
var startItem = item.Groups[2].Value;
var endItem = item.Groups[3].Value;
if (endItem == string.Empty)
endItem = startItem;
T start, end;
if (!itemParser(startItem, out start) || !itemParser(endItem, out end))
continue;
yield return Range.Create(startIsOpen ? default(T) : start,
endIsOpen ? default(T) : end);
}
}
It works, but I don't think it is particularly readable or maintainable. For example changing the '+' and '++' into ',' and '-' would not be that trivial to do.
My problem is that I need to split on + first, and then on ++. But if I split on + first, then the ++ instances are ruined and I end up with a mess.
You could split on this regex first:
(?<!\+)\+(?!\+)
That way, only the 'single' +'s are being split on, leaving you to parse the ++'s. Note that I am assuming that there cannot be three successive +'s.
The regex above simply says: "split on the '+' only if there's no '+' ahead or behind it".
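As a minimal sketch of that split in C#, assuming the input never contains three or more consecutive '+' characters (the class and method names here are made up for illustration):

```csharp
using System;
using System.Text.RegularExpressions;

public static class PlusSplitDemo
{
    // Matches a '+' that has no '+' directly before or after it.
    private static readonly Regex SinglePlus = new Regex(@"(?<!\+)\+(?!\+)");

    // Splits on single '+' only; every "++" stays inside the returned parts.
    public static string[] SplitOnSinglePlus(string input)
    {
        return SinglePlus.Split(input);
    }
}
```

For the last example in the question, SplitOnSinglePlus("1++7+10++15+20+30++") gives the four parts 1++7, 10++15, 20 and 30++, each of which can then be parsed as a single range.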
Edit:
After reading that there can be more than 2 successive +'s, I recommend writing a small grammar and letting a parser-generator create a lexer+parser for your little language. ANTLR can generate C# source code as well.
Edit 2:
But before implementing any solution (parser or regex) you'd first have to define what is and what isn't valid input. If you're going to let more than two successive +'s be valid, i.e. 1+++++5, which is [1++, +, ++5], I'd write a little grammar. See this tutorial for how that works: http://www.antlr.org/wiki/display/ANTLR3/Quick+Starter+on+Parser+Grammars+-+No+Past+Experience+Required
And if you're going to reject input of more than 2 successive +'s, you can use either Lasse's or my (first) regex-suggestion.
Here's some code that uses regular expressions.
Note that the issue raised by Bart in the comments to your question, ie. "How do you handle 1+++5", is not handled at all.
To fix that, unless your code is already out in the wild and not subject to change of behaviour, I would suggest you change your syntax to the following:
use .. to denote ranges
allow both + and - for numbers, for positive and negative numbers
use comma and/or semicolon to separate distinct numbers or ranges
allow whitespace
Look at the difference between the two following strings:
1++7+10++15+20+30++
1..7, 10..15, 20, 30..
The second string is much easier to parse, and much easier to read.
It would also remove all ambiguity:
1+++5 = 1++ + 5 = 1.., 5
1+++5 = 1 + ++5 = 1, ..5
There's no way to misparse the second syntax.
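To show how little code the suggested syntax needs, here is a rough sketch of a parser for it. It is not part of the regex-based code in this answer; the class name, the tuple shape and the choice of null for an open bound are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class DotRangeParser
{
    // Parses e.g. "1..7, 10..15, 20, 30.." into (From, To) pairs.
    // A null bound means the range is open on that side.
    public static List<(int? From, int? To)> Parse(string input)
    {
        var result = new List<(int? From, int? To)>();
        var items = input.Split(new[] { ',', ';' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (var raw in items)
        {
            var item = raw.Trim();
            if (item.Length == 0) continue;

            int dots = item.IndexOf("..", StringComparison.Ordinal);
            if (dots < 0)
            {
                // A single number: start and end are the same.
                int single = ParseNumber(item);
                result.Add((single, single));
            }
            else
            {
                string left = item.Substring(0, dots).Trim();
                string right = item.Substring(dots + 2).Trim();
                result.Add((
                    left.Length == 0 ? (int?)null : ParseNumber(left),
                    right.Length == 0 ? (int?)null : ParseNumber(right)));
            }
        }
        return result;
    }

    private static int ParseNumber(string text)
    {
        // AllowLeadingSign accepts both "+5" and "-5".
        return int.Parse(text, NumberStyles.AllowLeadingSign, CultureInfo.InvariantCulture);
    }
}
```

Invalid numbers make int.Parse throw; a real implementation would probably want TryParse and a clearer error.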
Anyway, here's my code. Basically it works by adding four regex patterns for the four types of patterns:
num
num++
++num
num++num
For "num", it will handle negative numbers with a leading minus sign, and one or more digits. It does not, for obvious reasons, handle the plus sign as part of the number.
I've interpreted "and up" to mean "up to Int32.MaxValue" and same for down to Int32.MinValue.
public class Range
{
public readonly Int32 From;
public readonly Int32 To;
public Range(Int32 from, Int32 to)
{
From = from;
To = to;
}
public override string ToString()
{
if (From == To)
return From.ToString();
else if (From == Int32.MinValue)
return String.Format("++{0}", To);
else if (To == Int32.MaxValue)
return String.Format("{0}++", From);
else
return String.Format("{0}++{1}", From, To);
}
}
public static class RangeSplitter
{
public static Range[] Split(String s)
{
if (s == null)
throw new ArgumentNullException("s");
String[] parts = new Regex(@"(?<!\+)\+(?!\+)").Split(s);
List<Range> result = new List<Range>();
var patterns = new Dictionary<Regex, Action<Int32[]>>();
patterns.Add(new Regex(@"^(-?\d+)$"),
values => result.Add(new Range(values[0], values[0])));
patterns.Add(new Regex(@"^(-?\d+)\+\+$"),
values => result.Add(new Range(values[0], Int32.MaxValue)));
patterns.Add(new Regex(@"^\+\+(-?\d+)$"),
values => result.Add(new Range(Int32.MinValue, values[0])));
patterns.Add(new Regex(@"^(-?\d+)\+\+(-?\d+)$"),
values => result.Add(new Range(values[0], values[1])));
foreach (String part in parts)
{
foreach (var kvp in patterns)
{
Match ma = kvp.Key.Match(part);
if (ma.Success)
{
Int32[] values = ma.Groups
.OfType<Group>()
.Skip(1) // group 0 is the entire match
.Select(g => Int32.Parse(g.Value))
.ToArray();
kvp.Value(values);
}
}
}
return result.ToArray();
}
}
Unit-tests:
[TestFixture]
public class RangeSplitterTests
{
[Test]
public void Split_NullString_ThrowsArgumentNullException()
{
Assert.Throws<ArgumentNullException>(() =>
{
var result = RangeSplitter.Split(null);
});
}
[Test]
public void Split_EmptyString_ReturnsEmptyArray()
{
Range[] result = RangeSplitter.Split(String.Empty);
Assert.That(result.Length, Is.EqualTo(0));
}
[TestCase(01, "++7", Int32.MinValue, 7)]
[TestCase(02, "7", 7, 7)]
[TestCase(03, "7++", 7, Int32.MaxValue)]
[TestCase(04, "1++7", 1, 7)]
public void Split_SinglePatterns_ProducesExpectedRangeBounds(
Int32 testIndex, String input, Int32 expectedLower,
Int32 expectedUpper)
{
Range[] result = RangeSplitter.Split(input);
Assert.That(result.Length, Is.EqualTo(1));
Assert.That(result[0].From, Is.EqualTo(expectedLower));
Assert.That(result[0].To, Is.EqualTo(expectedUpper));
}
[TestCase(01, "++7")]
[TestCase(02, "7++")]
[TestCase(03, "1++7")]
[TestCase(04, "1+7")]
[TestCase(05, "1++7+10++15+20+30++")]
public void Split_ExamplesFromQuestion_ProducesCorrectResults(
Int32 testIndex, String input)
{
Range[] ranges = RangeSplitter.Split(input);
String rangesAsString = String.Join("+",
ranges.Select(r => r.ToString()).ToArray());
Assert.That(rangesAsString, Is.EqualTo(input));
}
[TestCase(01, 10, 10, "10")]
[TestCase(02, 1, 10, "1++10")]
[TestCase(03, Int32.MinValue, 10, "++10")]
[TestCase(04, 10, Int32.MaxValue, "10++")]
public void RangeToString_Patterns_ProducesCorrectResults(
Int32 testIndex, Int32 lower, Int32 upper, String expected)
{
Range range = new Range(lower, upper);
Assert.That(range.ToString(), Is.EqualTo(expected));
}
}
I am not very experienced with C# and have got a task to develop a simple program that can take two parameters; one string and one integer. The string is to be returned, and the integer is supposed to define the max number of each specific character that is returned in the string, i.e.:
input: "aaabbbbc", "2" output: aabbc
input: "acc", "1" output: ac
I've tried looking at different collections like IEnumerator to help make the code easier to write, but as I'm not very experienced I can't sort out how to utilize them.
This is the code that I've written so far:
public static string TwoParameters()
{
Console.Write("Write some characters (i.e. 'cccaabbb'): ");
string myString = Console.ReadLine();
return myString;
Console.Write("Write a number - ");
int max = Convert.ToInt32(Console.Read());
}
public static void Counter(string myString, int max)
{
string myString = TwoParameters(myString);
foreach (char a in myString)
{
var occurences = myString.Count(x => x == 'a');
if (occurences > max)
max = occurences;
}
}
Errors I get when running:
CS0136: Local or parameter 'myString' cannot be declared in scope because of enclosing local scope.
CS1501: No overload for method 'TwoParameters' takes 1 arg.
CS1061: 'string' does not contain a definition for count.
CS0165: Use of unassigned local variable 'myString'.
CS7036: There is no argument given that corresponds to the required formal parameter 'myString' of 'Program.Counter(string, int)'
Any kind of pointers to what I'm doing wrong, suggestions to how I can improve my code and/or finish it up for the program to make the output will be hugely appreciated.
A string can be treated as an IEnumerable<char>. You can use LINQ to first group the characters and then take only max (here 2) from each group, e.g.:
var input="aaabbbbc";
var max=2;
var chars=input.GroupBy(c=>c)
.SelectMany(g=>g.Take(max))
.ToArray();
var result=new String(chars);
This produces
aabbc
This query groups the characters together with GroupBy and then takes at most max from each group with Take. SelectMany flattens all the IEnumerable<char> sequences returned from Take into a single IEnumerable<char> that can be used to create a string.
This function also respects the order within the string, so aabcabc with a count of 2 results in aabcbc:
static string ReturnMaxOccurences(string source, int count)
{
return source.Aggregate(new Accumulator(), (acc, c) =>
{
acc.Histogram.TryGetValue(c, out int charCount);
if (charCount < count)
acc.Result.Append(c);
acc.Histogram[c] = ++charCount;
return acc;
}, acc => acc.Result.ToString());
}
But you need also this little helper class:
public class Accumulator
{
public Dictionary<char, int> Histogram { get; } = new Dictionary<char, int>();
public StringBuilder Result { get; } = new StringBuilder();
}
This method iterates over the whole string and records the occurrences of each character in a histogram. If a character's count is still below the desired maximum, the character is appended to the result string; otherwise it is skipped and the iteration continues with the next character.
Pointers to what you are doing wrong:
you reset your given max to the counts in your string
you only handle "a"
your TwoParameters function has unreachable code
you declare a local variable named myString even though the function already receives a parameter with that name
you do not build a string to output
Using Linq is probably somewhat overkill for your level of knowledge. This is a simpler version of Oliver's answer that respects the order of letters as well:
public static void Main()
{
var input = "abbaaabcbcccd";
var max = 2;
// stores count of characters that we added already
var occurences = new Dictionary<char,int>();
var sb = new StringBuilder();
foreach (var c in input)
{
// add with a count of 0 if not yet encountered
if (!occurences.ContainsKey(c))
{
// if you want only "consecutive" letters not repeated over max:
// uncomment the following line that resets your dict - you might
// want to use a single integer and character instead in that case
// occurences.Clear(); // better use a single int-counter instead
occurences[c] = 0;
}
// add the character if it has been added fewer than max times
if (occurences[c] < max)
{
sb.Append(c);
occurences[c]+=1;
}
}
Console.WriteLine(sb.ToString());
}
Output:
abbaccd
I want to write a regex that covers the following scenarios:
The searched word is "potato".
It matches if the user searches for "potaot" (he typed quickly and the "o" finger was faster than the "t" finger). (done)
It matches if the user searches for "ptato" (he forgot one letter). (done)
With my knowledge of regex, the furthest I could get was:
(?=[potato]{5,6})p?o?t?a?t?o?
The problem with this is that it matches reversed words like "otatop", which is a little clever but a little bizarre, and "ooooo", which is totally undesirable. So now I will describe what I don't want.
I don't want strings with repeated letters, like "ooooo" or "ooopp", to match. (unable)
By the way I'm using C#.
Don't use regular expressions.
The best solution is the simple one. There are only eleven possible inexact matches, so just enumerate them:
List<string> inexactMatches = new List<string> {
"otato", "ptato", "poato", "potto", "potao", "potat",
"optato", "ptoato", "poatto", "pottao", "potaot"};
...
bool hasInexactMatch = inexactMatches.Contains(whatever);
It took less than a minute to type those out; use the easy specific solution rather than trying to do some crazy regular expression that's going to take you hours to find and debug.
If you're going to insist on using a regular expression, here's one that works:
otato|ptato|poato|potto|potao|potat|optato|ptoato|poatto|pottao|potaot
Again: simpler is better.
Now, one might suppose that you want to solve this problem for more words than "potato". In that case, you might have said so -- but regardless, we can come up with some easy solutions.
First, let's enumerate all the strings that have an omission of one letter from a target string. Strings are IEnumerable<char> so let's solve the general problem:
static IEnumerable<T> OmitAt<T>(this IEnumerable<T> items, int i) =>
items.Take(i).Concat(items.Skip(i + 1));
That's a bit gross enumerating the sequence twice but I'm not going to stress about it. Now let's make a specific version for strings:
static IEnumerable<string> Omits(this string s) =>
Enumerable
.Range(0, s.Length)
.Select(i => new string(s.OmitAt(i).ToArray()));
Great. Now we can say "frob".Omits() and get back rob, fob, frb, fro.
Now let's do the swaps. Again, solve the general problem first:
static void Swap<T>(ref T x, ref T y)
{
T t = x;
x = y;
y = t;
}
static T[] SwapAt<T>(this IEnumerable<T> items, int i)
{
T[] newItems = items.ToArray();
Swap(ref newItems[i], ref newItems[i + 1]);
return newItems;
}
And now we can solve it for strings:
static IEnumerable<string> Swaps(this string s) =>
Enumerable
.Range(0, s.Length - 1)
.Select(i => new string(s.SwapAt(i)));
And now we're done:
string source = "potato";
string target = whatever;
bool match =
source.Swaps().Contains(target) ||
source.Omits().Contains(target);
Easy peasy. Solve general problems using simple, straightforward, correct algorithms that can be composed into larger solutions. None of my algorithms there was more than three lines long and they can easily be seen to be correct.
The weapon of choice here is a similarity (or distance) matching algorithm.
Compare similarity algorithms gives a good overview of the most common distance metrics/algorithms.
The problem is that there is no single best metric. The choice depends, e.g. on input type, accuracy requirements, speed, resources availability, etc. Nevertheless, comparing algorithms can be messy.
Two of the most commonly suggested metrics are the Levenshtein distance and Jaro-Winkler:
Levenshtein distance, which provides a similarity measure between two strings, is arguably less forgiving, and more intuitive to understand than some other metrics. (There are modified versions of the Levenshtein distance like the Damerau-Levenshtein distance, which includes transpositions, that could be even more appropriate to your use case.)
Jaro-Winkler distance provides a similarity measure between two strings that allows for character transpositions, to a degree, and adjusts the weighting for common prefixes. Some claim it is "one of the most performant and accurate approximate string matching algorithms currently available [Cohen, et al.], [Winkler]." However, the choice still depends very much on the use case, and one cannot draw general conclusions from specific studies such as the name-matching study by Cohen, et al. 2003.
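For the "potato" case specifically, a distance that counts adjacent transpositions as a single edit might look roughly like the sketch below. It implements the restricted Damerau-Levenshtein variant (often called optimal string alignment); the class and method names are assumptions for illustration and do not come from any particular NuGet package.

```csharp
using System;

public static class EditDistance
{
    // Optimal string alignment distance: counts insertions, deletions,
    // substitutions and swaps of two adjacent characters as one edit each.
    public static int Osa(string a, string b)
    {
        var d = new int[a.Length + 1, b.Length + 1];
        for (int i = 0; i <= a.Length; i++) d[i, 0] = i;
        for (int j = 0; j <= b.Length; j++) d[0, j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                int best = Math.Min(
                    Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
                    d[i - 1, j - 1] + cost);

                // Adjacent transposition, e.g. "to" versus "ot".
                if (i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1])
                    best = Math.Min(best, d[i - 2, j - 2] + 1);

                d[i, j] = best;
            }
        }
        return d[a.Length, b.Length];
    }
}
```

Accepting any input with Osa("potato", input) <= 1 covers both "potaot" and "ptato" and rejects "ooooo". Note that it also accepts a single substituted or inserted letter, which is broader than what the question asked for.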
You can find plenty of packages on NuGet that offer a variety of similarity algorithms, fuzzy matching, phonetic matching, etc. to add this feature to your site or app.
Fuzzy matching can also be used directly on the database layer. An implementation of the Levenshtein distance can be found for most database systems (e.g. MySQL, SQL Server) or is already built-in (Oracle, PostgreSQL).
Depending on your exact use case, you could also use a cloud-based solution (i.e. use a microservice based on AWS, Azure, etc. or roll-your-own) to get autosuggest-like fuzzy search/matching.
It's easiest to do it this way:
static void Main(string[] args)
{
string correctWord = "Potato";
string incorrectSwappedWord = "potaot";
string incorrectOneLetter = "ptato";
// Returns true
bool swapped = SwappedLettersMatch(correctWord, incorrectSwappedWord);
// Returns true
bool oneLetter = OneLetterOffMatch(correctWord, incorrectOneLetter);
}
public static bool OneLetterOffMatch(string str, string input)
{
int ndx = 0;
// Check for null/blank first: calling ToLower() on a null string would throw.
if (string.IsNullOrWhiteSpace(str) || string.IsNullOrWhiteSpace(input))
{
return false;
}
str = str.ToLower();
input = input.ToLower();
while (ndx < str.Length)
{
string newStr = str.Remove(ndx, 1);
if (input == newStr)
{
return true;
}
ndx++;
}
return false;
}
public static bool SwappedLettersMatch(string str, string input)
{
if (string.IsNullOrWhiteSpace(str) || string.IsNullOrWhiteSpace(input))
{
return false;
}
if (str.Length != input.Length)
{
return false;
}
str = str.ToLower();
input = input.ToLower();
int ndx = 0;
while (ndx < str.Length)
{
if (ndx == str.Length - 1)
{
return false;
}
string newStr = str[ndx + 1].ToString() + str[ndx];
if (ndx > 0)
{
newStr = str.Substring(0, ndx) + newStr;
}
if (str.Length > ndx + 2)
{
newStr = newStr + str.Substring(ndx + 2);
}
if (newStr == input)
{
return true;
}
ndx++;
}
return false;
}
OneLetterOffMatch will return true if the input matches the word with exactly one character missing. SwappedLettersMatch will return true if the input matches the word with two adjacent letters swapped. These functions are case-insensitive; if you want a case-sensitive version, just remove the calls to .ToLower().
I have a text file full of strings, one on each line. Some of these strings will contain an unknown number of "#" characters. Each "#" can represent the numbers 1, 2, 3, or 4. I want to generate all possible combinations (permutations?) of strings for each of those "#"s. If there were a set number of "#"s per string, I'd just use nested for loops (quick and dirty). I need help finding a more elegant way to do it with an unknown number of "#"s.
Example 1: Input string is a#bc
Output strings would be:
a1bc
a2bc
a3bc
a4bc
Example 2: Input string is a#bc#d
Output strings would be:
a1bc1d
a1bc2d
a1bc3d
a1bc4d
a2bc1d
a2bc2d
a2bc3d
...
a4bc3d
a4bc4d
Can anyone help with this one? I'm using C#.
This is actually a fairly good place for a recursive function. I don't write C#, but I would create a function List<String> expand(String str) which accepts a string and returns a list containing the expanded strings.
expand can then search the string for the first # and create a list containing the part of the string before it followed by each possible replacement. Then it can call expand on the rest of the string and append each element of that expansion to each of those prefixes.
Example implementation using Java ArrayLists:
ArrayList<String> expand(String str) {
/* Find the first "#" */
int i = str.indexOf("#");
ArrayList<String> expansion = new ArrayList<String>(4);
/* If the string doesn't have any "#" */
if(i < 0) {
expansion.add(str);
return expansion;
}
/* New list to hold the result */
ArrayList<String> result = new ArrayList<String>();
/* Expand the "#" */
for(int j = 1; j <= 4; j++)
expansion.add(str.substring(0, i) + j); /* substring's end index is exclusive, so this keeps everything before the "#" */
/* Combine every expansion with every suffix expansion */
for(String a : expand(str.substring(i+1)))
for(String b : expansion)
result.add(b + a);
return result;
}
I offer you here a minimalist approach for the problem at hand.
Yes, like others have said, recursion is the way to go here.
Recursion is a perfect fit here, since we can solve this problem by providing the solution for a short part of the input and start over again with the other part until we are done and merge the results.
Every recursion must have a stop condition - meaning no more recursion needed.
Here my stop condition is that there are no more "#" in the string.
I'm using string as my set of values (1234) since it is an IEnumerable<char>.
All the other solutions here are great; I just wanted to show you a short approach.
internal static IEnumerable<string> GetStrings(string input)
{
var values = "1234";
var permutations = new List<string>();
var index = input.IndexOf('#');
if (index == -1) return new []{ input };
for (int i = 0; i < values.Length; i++)
{
var newInput = input.Substring(0, index) + values[i] + input.Substring(index + 1);
permutations.AddRange(GetStrings(newInput));
}
return permutations;
}
An even shorter and cleaner approach with LINQ:
internal static IEnumerable<string> GetStrings(string input)
{
var values = "1234";
var index = input.IndexOf('#');
if (index == -1) return new []{ input };
return
values
.Select(ReplaceFirstWildCardWithValue)
.SelectMany(GetStrings);
string ReplaceFirstWildCardWithValue(char value) => input.Substring(0, index) + value + input.Substring(index + 1);
}
This is shouting out loud for a recursive solution.
First, let's make a method that generates all combinations of a certain length from a given set of values. Because we are only interested in generating strings, let's take advantage of the fact that string is immutable (see P.D.2); this makes recursive functions so much easier to implement and reason about:
static IEnumerable<string> GetAllCombinations<T>(
ISet<T> set, int length)
{
IEnumerable<string> getCombinations(string current)
{
if (current.Length == length)
{
yield return current;
}
else
{
foreach (var s in set)
{
foreach (var c in getCombinations(current + s))
{
yield return c;
}
}
}
}
return getCombinations(string.Empty);
}
Study carefully how this method works. Work it out by hand for small examples to understand it.
Now, once we know how to generate all possible combinations, building the strings is easy:
Figure out the number of wildcards in the specified string: this will be our combination length.
For every combination, insert in order each character into the string where we encounter a wildcard.
OK, let's do just that:
public static IEnumerable<string> GenerateCombinations<T>(
this string s,
ISet<T> set, // must match the ISet<T> parameter of GetAllCombinations
char wildcard)
{
var length = s.Count(c => c == wildcard);
var combinations = GetAllCombinations(set, length);
var builder = new StringBuilder();
foreach (var combination in combinations)
{
var index = 0;
foreach (var c in s)
{
if (c == wildcard)
{
builder.Append(combination[index]);
index += 1;
}
else
{
builder.Append(c);
}
}
yield return builder.ToString();
builder.Clear();
}
}
And we're done. Usage would be:
var set = new HashSet<int>(new[] { 1, 2, 3, 4 });
Console.WriteLine(
string.Join("; ", "a#bc#d".GenerateCombinations(set, '#')));
And sure enough, the output is:
a1bc1d; a1bc2d; a1bc3d; a1bc4d; a2bc1d; a2bc2d; a2bc3d;
a2bc4d; a3bc1d; a3bc2d; a3bc3d; a3bc4d; a4bc1d; a4bc2d;
a4bc3d; a4bc4d
Is this the most performant or efficient implementation? Probably not, but it's readable and maintainable. Unless you have a specific performance goal you are not meeting, write code that works and is easy to understand.
P.D. I’ve omitted all error handling and argument validation.
P.D.2: If the length of the combinations is big, concatenating strings inside GetAllCombinations might not be a good idea. In that case I’d have GetAllCombinations return an IEnumerable<IEnumerable<T>>, implement a trivial ImmutableStack<T>, and use that as the combination buffer instead of string.
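As a rough illustration of the IEnumerable<IEnumerable<T>> shape mentioned in P.D.2, the sketch below builds the combinations lazily with LINQ's Aggregate. It is a different technique from the ImmutableStack<T> buffer suggested there, and the class and method names are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SequenceCombinations
{
    // Returns every sequence of the given length drawn from 'set',
    // built lazily one position at a time.
    public static IEnumerable<IEnumerable<T>> CartesianPower<T>(
        IEnumerable<T> set, int length)
    {
        // Start with a single empty combination.
        IEnumerable<IEnumerable<T>> seed = new[] { Enumerable.Empty<T>() };

        // For each position, extend every prefix with every value.
        return Enumerable.Repeat(set, length).Aggregate(
            seed,
            (prefixes, values) =>
                prefixes.SelectMany(prefix =>
                    values.Select(value => prefix.Concat(new[] { value }))));
    }
}
```

Each inner sequence can then be walked alongside the wildcard positions instead of indexing into a string. The nested Concat calls are not free either, so this is about avoiding intermediate strings rather than a guaranteed speed-up.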
In C#, given the array :
string[] myStrings = new string[] {
"test#test",
"##test",
"######", // Winner (outputs 6)
};
How can I find the maximum number of occurrences that the character # appears in a single string ?
My current solution is :
int maxOccurrences = 0;
foreach (var myString in myStrings)
{
var occurrences = myString.Count(x => x == '#');
if (occurrences > maxOccurrences)
{
maxOccurrences = occurrences;
}
}
return maxOccurrences;
Is there a simpler way using LINQ that can act directly on the myStrings[] array?
And can this be made into an extension method that can work on any IEnumerable<string> ?
First of all let's project your strings into a sequence with count of matches:
myStrings.Select(s => s.Count(c => c == '#')) // {1, 2, 6} in your example
Then pick maximum value:
int maximum = myStrings
.Select(s => s.Count(x => x == '#'))
.Max(); // 6 in your example
Let's make an extension method:
public static int CountMaximumOccurrencesOf(this IEnumerable<string> strings, char ch)
{
return strings
.Select(s => s.Count(c => c == ch))
.Max();
}
However, there is a big HOWEVER. What C# calls a char is not what you would call a character in your language. This has been widely discussed in other posts, for example: Fastest way to split a huge text into smaller chunks and How can I perform a Unicode aware character by character comparison?, so I won't repeat everything here. To be "Unicode aware" you need to make your code more complicated (please note that this code was written here directly, so it's untested):
private static IEnumerable<string> EnumerateCharacters(this string s)
{
var enumerator = StringInfo.GetTextElementEnumerator(s.Normalize());
while (enumerator.MoveNext())
yield return (string)enumerator.Value;
}
Then change our original code to:
public static int CountMaximumOccurrencesOf(this IEnumerable<string> strings, string character)
{
return strings
.Select(s => s.EnumerateCharacters().Count(c => String.Equals(c, character, StringComparison.CurrentCulture)))
.Max();
}
Note that Max() alone requires the collection to be non-empty (use DefaultIfEmpty() if the collection may be empty and that's not an error). To avoid arbitrarily deciding what to do in this situation (throw an exception or just return 0), you can make this method less specialized and leave that responsibility to the caller:
public static IEnumerable<int> CountOccurrencesOf(this IEnumerable<string> strings,
string character,
StringComparison comparison = StringComparison.CurrentCulture)
{
Debug.Assert(character.EnumerateCharacters().Count() == 1);
return strings
.Select(s => s.EnumerateCharacters().Count(c => String.Equals(c, character, comparison)));
}
Used like this:
var maximum = myStrings.CountOccurrencesOf("#").Max();
If you need it case-insensitive:
var maximum = myStrings.CountOccurrencesOf("à", StringComparison.CurrentCultureIgnoreCase)
.Max();
As you can imagine, this comparison isn't limited to some esoteric languages; it also applies to the invariant culture (en-US), so for strings that must always be compared with the invariant culture you should specify StringComparison.InvariantCulture. Don't forget that you may also need to call String.Normalize() on the input character.
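To make the char versus character distinction concrete, here is a small sketch comparing the two ways of counting. The class and method names are made up for illustration; StringInfo lives in System.Globalization.

```csharp
using System;
using System.Globalization;

public static class TextElementDemo
{
    // Number of UTF-16 code units (what string.Length and LINQ over char see).
    public static int CountChars(string s) => s.Length;

    // Number of text elements, i.e. user-perceived characters.
    public static int CountTextElements(string s) =>
        new StringInfo(s).LengthInTextElements;
}
```

For the string "e\u0301" (the letter e followed by a combining acute accent) the first method reports 2 while the second reports 1. That is why counting with c == 'é' can miss a character that looks identical on screen unless the string is normalized first.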
You can write something like this. Note the usage of DefaultIfEmpty, to not throw an exception if myStrings is empty, but revert to 0.
var maximum = myStrings.Select(e => e.Count(ee => ee == '#')).DefaultIfEmpty().Max();
You can do that with Linq combined with Regex:
myStrings.Select(x => Regex.Matches(x, "#").Count).Max();
I have a list that needs ordering say:
R1-1
R1-11
R2-2
R1-2
this needs to be ordered:
R1-1
R1-2
R1-11
R2-2
Currently I am using the C# Regex.Replace method and adding a 0 before the occurrence of single numbers at the end of a string with something like:
Regex.Replace(inString, @"([1-9]$)", @"0$1")
I'm sure there is a nicer way to do this which I just can't figure out.
Does anyone have a nice way of sorting letter and number strings with regex?
I have used Greg's method below to complete this and just thought I should add the code I am using for completeness:
public static List<Rack> GetRacks(Guid aisleGUID)
{
log.Debug("Getting Racks with aisleId " + aisleGUID);
List<Rack> result = dataContext.Racks.Where(
r => r.aisleGUID == aisleGUID).ToList();
return result.OrderBy(r => r.rackName, new NaturalStringComparer()).ToList();
}
I think what you're after is natural sort order, like Windows Explorer does? If so then I wrote a blog entry a while back showing how you can achieve this in a few lines of C#.
Note: I just checked and using the NaturalStringComparer in the linked entry does return the order you are looking for with the example strings.
You can write your own comparator and use regular expressions to compare the number between "R" and "-" first, followed by the number after "-", if the first numbers are equal.
Sketch:
public int Compare(string x, string y)
{
int releaseX = ...;
int releaseY = ...;
int revisionX = ...;
int revisionY = ...;
if (releaseX == releaseY)
{
return revisionX - revisionY;
}
else
{
return releaseX - releaseY;
}
}
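One way the sketch above could be fleshed out, without assuming the exact "R<number>-<number>" layout, is to compare the strings chunk by chunk, treating runs of digits as numbers. This is only an illustration: the class name is made up, and it is not the NaturalStringComparer from the linked blog entry.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public sealed class ChunkedNaturalComparer : IComparer<string>
{
    // Splits "R1-11" into "R", "1", "-", "11": runs of digits or non-digits.
    private static readonly Regex Chunks = new Regex(@"\d+|\D+");

    public int Compare(string x, string y)
    {
        if (ReferenceEquals(x, y)) return 0;
        if (x == null) return -1;
        if (y == null) return 1;

        MatchCollection xs = Chunks.Matches(x);
        MatchCollection ys = Chunks.Matches(y);
        int shared = Math.Min(xs.Count, ys.Count);

        for (int i = 0; i < shared; i++)
        {
            string a = xs[i].Value, b = ys[i].Value;
            int result;
            if (char.IsDigit(a[0]) && char.IsDigit(b[0]))
            {
                // Compare digit runs by value: drop leading zeros,
                // then the shorter run is the smaller number.
                a = a.TrimStart('0');
                b = b.TrimStart('0');
                result = a.Length != b.Length
                    ? a.Length.CompareTo(b.Length)
                    : string.CompareOrdinal(a, b);
            }
            else
            {
                result = string.CompareOrdinal(a, b);
            }
            if (result != 0) return result;
        }
        // All shared chunks are equal: the string with fewer chunks sorts first.
        return xs.Count.CompareTo(ys.Count);
    }
}
```

Sorting the four example strings with new ChunkedNaturalComparer() gives R1-1, R1-2, R1-11, R2-2. Comparing digit runs as text avoids overflow on very long numbers, and letters are compared ordinally, so the comparer is case-sensitive.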