C# - Dealing with contradictions in string.replace


I'm getting started with C# and programming in general, and I've been playing with "if" statements and arrays and generally getting to grips with things. However, one thing that has stumped me is how you would go about performing a replace operation which is inherently contradictory.
For example: I have the string "AAABBB", but I want to search through my text and replace all "A"s with "B"s and vice versa. So my intended output would be "BBBAAA".
I'm currently trying to use string.Replace and if statements, but it's not working (it follows the order of the statements, so in the above example I'd get all "A" or all "B").
Code examples:
if (string.Contains("a"));
{
string = string.Replace("a", "b");
}
if (string.Contains("b"));
{
string = string.Replace("b", "a");
}
Any help would be super welcome!

If you're always replacing one character with another, it's probably simplest to convert it to a char[] and go through it one character at a time, fixing each one appropriately - rather than doing "all the As" and then "all the Bs".
public static string PerformReplacements(string text)
{
char[] chars = text.ToCharArray();
for (int i = 0; i < chars.Length; i++)
{
switch (chars[i])
{
case 'A':
chars[i] = 'B';
break;
case 'B':
chars[i] = 'A';
break;
}
}
return new string(chars);
}

Consider using Linq:
s = new string(s.Select(x => x == 'A' ? 'B' : x == 'B' ? 'A' : x).ToArray());

This fails because the first Replace turns every A into a B, and the second Replace then turns every B (including the ones just created) back into an A.
A generic way to solve this is the following:
using System;
using System.Linq;
using System.Text;
using System.Diagnostics.Contracts;
public class Foo {
public static string ParallelReplace (string text, char[] fromc, char[] toc) {
Contract.Requires(text != null);
Contract.Requires(fromc != null);
Contract.Requires(toc != null);
Contract.Requires(fromc.Length == toc.Length);
Contract.Ensures(Contract.Result<string>().Length == text.Length);
// Sorts fromc and reorders toc to match; note that this modifies the caller's arrays.
Array.Sort(fromc,toc);
StringBuilder sb = new StringBuilder();
foreach(char c in text) {
int i = Array.BinarySearch(fromc,c);
if(i >= 0) {
sb.Append(toc[i]);
} else {
sb.Append(c);
}
}
return sb.ToString();
}
}
Demo with csharp interactive shell:
csharp> Foo.ParallelReplace("ABasdsadsadaABABB",new char[] {'b','a','s'},new char[] {'f','s','a'});
"ABsadasdasdsABABB"
This represents a mapping {b->f,a->s,s->a}. The method works in O(s*log(n)+n*log(n)), with s the length of the string and n the number of rules.
The contracts are not necessary, but they can help catch errors if one uses a static code analysis tool.
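To make the failure described above concrete, here is a minimal sketch that prints the result of two chained Replace calls next to a single-pass lookup. ReplaceInOnePass and the SwapDemo class are hypothetical names, and the dictionary lookup is a swapped-in alternative to the sorted-array binary search, not the code from the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class SwapDemo
{
    // Hypothetical helper: applies every rule in one pass over the text.
    public static string ReplaceInOnePass(string text, IReadOnlyDictionary<char, char> rules)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            // Each input character is looked up exactly once,
            // so the output of one rule is never fed into another rule.
            sb.Append(rules.TryGetValue(c, out char mapped) ? mapped : c);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string text = "AAABBB";

        // Sequential: the second call also rewrites the B's produced by the first.
        Console.WriteLine(text.Replace("A", "B").Replace("B", "A")); // AAAAAA

        var rules = new Dictionary<char, char> { ['A'] = 'B', ['B'] = 'A' };
        Console.WriteLine(ReplaceInOnePass(text, rules)); // BBBAAA
    }
}
```

With a dictionary the lookup per character is constant time on average, so the whole pass is roughly linear in the length of the string.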

Related

Change indexed character of string

I'm working on a program that changes a specified character: if it's uppercase, it changes it to lowercase, and vice versa.
I have written a piece of code that in theory should work. I have also written it with a foreach(char c in x) loop, but that won't work either. Any tips?
Expected output teSTSTring
Given output TEststRING
string x = "TEststRING";
for (int i = 0; i < x.Length; i++)
{
if (char.IsUpper(x[i]))
{
char.ToLower(x[i]);
}
if (char.IsLower(x[i]))
{
char.ToUpper(x[i]);
}
}
Console.WriteLine(x);

Here is a solution with StringBuilder
string x = "TEststRING";
var sb = new StringBuilder();
foreach (var c in x)
{
if (char.IsUpper(c))
{
sb.Append(char.ToLower(c));
}
else
{
sb.Append(char.ToUpper(c));
}
}
Console.WriteLine(sb.ToString());

You are not changing the input string. I would suggest you use the code below.
It will also handle the cases where you have characters other than letters.
string output = new string(x.Select(c => char.IsLetter(c) ? (char.IsUpper(c) ? char.ToLower(c) : char.ToUpper(c)) : c).ToArray());
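The point about not changing the input string can be spelled out: strings in C# are immutable, and char.ToLower / char.ToUpper return a new character rather than modifying anything, so the result has to be stored somewhere. A minimal sketch that keeps the original loop shape but works on a mutable char[] copy (SwapCase and CaseSwapDemo are hypothetical names):

```csharp
using System;

public static class CaseSwapDemo
{
    public static string SwapCase(string input)
    {
        char[] chars = input.ToCharArray(); // mutable copy of the string's characters
        for (int i = 0; i < chars.Length; i++)
        {
            char c = chars[i];
            // "else if" matters: with two separate ifs, a character that was just
            // lowered would immediately be raised again by the second test.
            if (char.IsUpper(c))
            {
                chars[i] = char.ToLower(c);
            }
            else if (char.IsLower(c))
            {
                chars[i] = char.ToUpper(c);
            }
        }
        return new string(chars);
    }

    public static void Main()
    {
        Console.WriteLine(SwapCase("TEststRING")); // teSTSTring
    }
}
```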

Here is an alternative to my foreach version. This alternative uses Linq, like vivek nuna's answer. Note that Linq is a powerful part of C#; it is well worth digging into, but it is fairly advanced. Here is my version of it:
char SwitchCase(char c) => char.IsUpper(c) ? char.ToLower(c) : char.ToUpper(c);
var x = "TEststRING(1)";
var output = string.Concat(x.Select(SwitchCase));
Console.WriteLine(output);
Changes include:
Use string.Concat to, well, concat the IEnumerable into a string instead of using ToArray() with new string()
Remove the char.IsLetter check, ToUpper already handles it.
Create a SwitchCase function to name the logic

Recommended way of checking if a certain string has a specified character more than once

Firstly, I understand that there are several ways to do this, and I do have some code which runs, but I wanted to find out if anyone has a recommended way to do it. Say I have a string which I already know will contain a specific character (a ‘,’ in this case). Now I just want to validate that this comma is used only once and not more. I know iterating through each character is an option, but why go through all that work when I just want to make sure that this special character is not used more than once? I’m not exactly interested in the count per se. The best I could think of was to use Split, and here is some sample code that works. I'm just curious to find out if there is a better way to do this.
In summary,
I have a certain string in which I know has a special character (‘,’ in this case)
I want to validate that this special character has only been used once in this string
const char characterToBeEvaluated = ',';
string myStringToBeTested = "HelloWorldLetus,code";
var countOfIdentifiedCharacter = myStringToBeTested.Split(characterToBeEvaluated).Length - 1;
if (countOfIdentifiedCharacter == 1)
{
Console.WriteLine("Used exactly once as expected");
}
else
{
Console.WriteLine("Used either less than or more than once");
}

You can use string's IndexOf methods:
const char characterToBeEvaluated = ',';
string myStringToBeTested = "HelloWorldLetus,code";
string substringToFind = characterToBeEvaluated.ToString();
int firstIdx = myStringToBeTested.IndexOf(substringToFind, StringComparison.Ordinal);
bool foundOnce = firstIdx >= 0;
bool foundTwice = foundOnce && myStringToBeTested.IndexOf(substringToFind, firstIdx + 1, StringComparison.Ordinal) >= 0;

You could use the LINQ Count() method:
const char characterToBeEvaluated = ',';
string myStringToBeTested = "HelloWorldLetus,code";
var countOfIdentifiedCharacter = myStringToBeTested.Count(x => x == characterToBeEvaluated);
if (countOfIdentifiedCharacter == 1)
{
Console.WriteLine("Used exactly once as expected");
}
else
{
Console.WriteLine("Used either less than or more than once");
}
This is the most readable and simplest approach and is great if you need to know the exact count, but for your specific case @ProgrammingLlama's answer is better in terms of efficiency.

Adding another answer using a custom method:
public static void Main()
{
const char characterToBeEvaluated = ',';
string myStringToBeTested = "HelloWorldLetus,code";
var characterAppearsOnlyOnce = DoesCharacterAppearOnlyOnce(characterToBeEvaluated, myStringToBeTested);
if (characterAppearsOnlyOnce)
{
Console.WriteLine("Used exactly once as expected");
}
else
{
Console.WriteLine("Used either less than or more than once");
}
}
public static bool DoesCharacterAppearOnlyOnce(char characterToBeEvaluated, string stringToBeTested)
{
int count = 0;
for (int i = 0; i < stringToBeTested.Length && count < 2; ++i)
{
if (stringToBeTested[i] == characterToBeEvaluated)
{
++count;
}
}
return count == 1;
}
The custom method DoesCharacterAppearOnlyOnce() performs better than the method using IndexOf() for smaller strings - probably due to the overhead calling IndexOf. As the strings get larger the IndexOf method is better.
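A variation on the IndexOf idea, offered as a sketch rather than a benchmarked recommendation: a character appears exactly once when it is found at all and its first and last positions coincide. AppearsExactlyOnce and OnceCheck are hypothetical names; IndexOf(char) and LastIndexOf(char) are the standard string methods and compare ordinally.

```csharp
using System;

public static class OnceCheck
{
    public static bool AppearsExactlyOnce(string text, char target)
    {
        int first = text.IndexOf(target);
        // Found at least once, and searching from the end lands on the same position.
        return first >= 0 && first == text.LastIndexOf(target);
    }

    public static void Main()
    {
        Console.WriteLine(AppearsExactlyOnce("HelloWorldLetus,code", ',')); // True
        Console.WriteLine(AppearsExactlyOnce("a,b,c", ','));                // False
    }
}
```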

Generate all combinations with unknown number of slots

I have a text file full of strings, one on each line. Some of these strings will contain an unknown number of "#" characters. Each "#" can represent the numbers 1, 2, 3, or 4. I want to generate all possible combinations (permutations?) of strings for each of those "#"s. If there were a set number of "#"s per string, I'd just use nested for loops (quick and dirty). I need help finding a more elegant way to do it with an unknown number of "#"s.
Example 1: Input string is a#bc
Output strings would be:
a1bc
a2bc
a3bc
a4bc
Example 2: Input string is a#bc#d
Output strings would be:
a1bc1d
a1bc2d
a1bc3d
a1bc4d
a2bc1d
a2bc2d
a2bc3d
...
a4bc3d
a4bc4d
Can anyone help with this one? I'm using C#.

This is actually a fairly good place for a recursive function. I don't write C#, but I would create a function List<String> expand(String str) which accepts a string and returns a list containing the expanded strings.
expand can then search the string for the first # and create a list containing the part of the string before it followed by each possible value. Then it can call expand on the rest of the string and append each element of that result to each element of the first list.
Example implementation using Java ArrayLists:
ArrayList<String> expand(String str) {
/* Find the first "#" */
int i = str.indexOf("#");
ArrayList<String> expansion = new ArrayList<String>(4);
/* If the string doesn't have any "#" */
if(i < 0) {
expansion.add(str);
return expansion;
}
/* New list to hold the result */
ArrayList<String> result = new ArrayList<String>();
/* Expand the "#" */
for(int j = 1; j <= 4; j++)
expansion.add(str.substring(0, i) + j); /* substring's end index is exclusive, so this is everything before the "#" */
/* Combine every expansion with every suffix expansion */
for(String a : expand(str.substring(i+1)))
for(String b : expansion)
result.add(b + a);
return result;
}

I offer you here a minimalist approach for the problem at hand.
Yes, like others have said, recursion is the way to go here.
Recursion is a perfect fit here, since we can solve this problem by providing the solution for a short part of the input and start over again with the other part until we are done and merge the results.
Every recursion must have a stop condition - meaning no more recursion needed.
Here my stop condition is that there are no more "#" in the string.
I'm using string as my set of values (1234) since it is an IEnumerable<char>.
All the other solutions here are great; I just wanted to show you a short approach.
internal static IEnumerable<string> GetStrings(string input)
{
var values = "1234";
var permutations = new List<string>();
var index = input.IndexOf('#');
if (index == -1) return new []{ input };
for (int i = 0; i < values.Length; i++)
{
var newInput = input.Substring(0, index) + values[i] + input.Substring(index + 1);
permutations.AddRange(GetStrings(newInput));
}
return permutations;
}
An even shorter and cleaner approach with LINQ:
internal static IEnumerable<string> GetStrings(string input)
{
var values = "1234";
var index = input.IndexOf('#');
if (index == -1) return new []{ input };
return
values
.Select(ReplaceFirstWildCardWithValue)
.SelectMany(GetStrings);
string ReplaceFirstWildCardWithValue(char value) => input.Substring(0, index) + value + input.Substring(index + 1);
}

This is shouting out loud for a recursive solution.
First, let's make a method that generates all combinations of a certain length from a given set of values. Because we are only interested in generating strings, let's take advantage of the fact that string is immutable (see P.D.2); this makes recursive functions so much easier to implement and reason about:
static IEnumerable<string> GetAllCombinations<T>(
ISet<T> set, int length)
{
IEnumerable<string> getCombinations(string current)
{
if (current.Length == length)
{
yield return current;
}
else
{
foreach (var s in set)
{
foreach (var c in getCombinations(current + s))
{
yield return c;
}
}
}
}
return getCombinations(string.Empty);
}
Study carefully how this method works. Work it out by hand for small examples to understand it.
Now, once we know how to generate all possible combinations, building the strings is easy:
Figure out the number of wildcards in the specified string: this will be our combination length.
For every combination, insert in order each character into the string where we encounter a wildcard.
OK, let's do just that:
public static IEnumerable<string> GenerateCombinations<T>(
this string s,
ISet<T> set,
char wildcard)
{
var length = s.Count(c => c == wildcard);
var combinations = GetAllCombinations(set, length);
var builder = new StringBuilder();
foreach (var combination in combinations)
{
var index = 0;
foreach (var c in s)
{
if (c == wildcard)
{
builder.Append(combination[index]);
index += 1;
}
else
{
builder.Append(c);
}
}
yield return builder.ToString();
builder.Clear();
}
}
And we're done. Usage would be:
var set = new HashSet<int>(new[] { 1, 2, 3, 4 });
Console.WriteLine(
string.Join("; ", "a#bc#d".GenerateCombinations(set, '#')));
And sure enough, the output is:
a1bc1d; a1bc2d; a1bc3d; a1bc4d; a2bc1d; a2bc2d; a2bc3d;
a2bc4d; a3bc1d; a3bc2d; a3bc3d; a3bc4d; a4bc1d; a4bc2d;
a4bc3d; a4bc4d
Is this the most performant or efficient implementation? Probably not, but it's readable and maintainable. Unless you have a specific performance goal you are not meeting, write code that works and is easy to understand.
P.D. I’ve omitted all error handling and argument validation.
P.D.2: if the length of the combinations is big, concatenating strings inside GetAllCombinations might not be a good idea. In that case I’d have GetAllCombinations return an IEnumerable<IEnumerable<T>>, implement a trivial ImmutableStack<T>, and use that as the combination buffer instead of string.
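As a rough illustration of the IEnumerable<IEnumerable<T>> shape mentioned in P.D.2, here is a sketch that builds the combinations with LINQ's Aggregate instead of string concatenation. It is a swapped-in technique (a Cartesian-product fold), not the ImmutableStack<T> approach described, and CombinationSketch is a hypothetical name. It assumes a runtime where Enumerable.Append is available.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CombinationSketch
{
    // Yields every sequence of the given length drawn from values (repetition allowed).
    public static IEnumerable<IEnumerable<T>> GetAllCombinations<T>(IEnumerable<T> values, int length)
    {
        var items = values.ToList();

        // Start with a single empty prefix, then extend every prefix by every item, length times.
        IEnumerable<IEnumerable<T>> seed = new[] { Enumerable.Empty<T>() };
        return Enumerable.Range(0, length).Aggregate(
            seed,
            (prefixes, _) => prefixes.SelectMany(
                prefix => items.Select(item => prefix.Append(item))));
    }

    public static void Main()
    {
        foreach (var combination in GetAllCombinations(new[] { 1, 2, 3, 4 }, 2))
        {
            Console.Write(string.Concat(combination) + " ");
        }
    }
}
```

Each combination stays a sequence of T until the caller decides how to render it, so elements whose text form is longer than one character do not confuse a length check.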

Slimming down a switch statement

Wondering if there are good alternatives to this that perform no worse than what I have below? The real switch statement has additional sections for other non-English characters.
Note that I'd love to put multiple case statements per line, but StyleCop doesn't like it and will fail our release build as a result.
var retVal = String.Empty;
switch(valToCheck)
{
case "é":
case "ê":
case "è":
case "ë":
retVal = "e";
break;
case "à":
case "â":
case "ä":
case "å":
retVal = "a";
break;
default:
retVal = "-";
break;
}

The first thing that comes to mind is a Dictionary<char,char>()
(I prefer char instead of strings because you are dealing with chars)
Dictionary<char,char> dict = new Dictionary<char,char>();
dict.Add('å', 'a');
......
then you could remove your entire switch
char retValue;
char testValue = 'å';
if(dict.TryGetValue(testValue, out retValue) == false)
retValue = '-';

Well, start off by doing this transformation.
public class CharacterSanitizer
{
private static Dictionary<string, string> characterMappings = new Dictionary<string, string>();
static CharacterSanitizer()
{
characterMappings.Add("é", "e");
characterMappings.Add("ê", "e");
//...
}
public static string mapCharacter(string input)
{
string output;
if (characterMappings.TryGetValue(input, out output))
{
return output;
}
else
{
return input;
}
}
}
Now you're in the position where the character mappings are part of the data, rather than the code. I've hard coded the values here, but at this point it is simple enough to store the mappings in a file, read in the file and then populate the dictionary accordingly. This way you can not only clean up the code a lot by reducing the case statement to one bit text file (outside of code) but you can modify it without needing to re-compile.
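A minimal sketch of the "mappings as data" idea, under the assumption of a made-up file format with one from=to pair per line. MappingLoader, Parse and the file name mappings.txt are all hypothetical; only File.ReadLines and Dictionary are real APIs.

```csharp
using System;
using System.Collections.Generic;

public static class MappingLoader
{
    // Assumed format: one "from=to" pair per line; blank lines are skipped.
    public static Dictionary<string, string> Parse(IEnumerable<string> lines)
    {
        var map = new Dictionary<string, string>();
        foreach (string line in lines)
        {
            if (string.IsNullOrWhiteSpace(line)) continue;

            string[] parts = line.Split('=');
            if (parts.Length != 2)
            {
                throw new FormatException("Expected 'from=to' but got: " + line);
            }
            map[parts[0].Trim()] = parts[1].Trim();
        }
        return map;
    }

    public static void Main()
    {
        // In practice the lines might come from File.ReadLines("mappings.txt")
        // (hypothetical file name); an in-memory array keeps the sketch self-contained.
        var map = Parse(new[] { "é=e", "ê=e", "", "å=a" });
        Console.WriteLine(map.Count); // 3
    }
}
```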

You could do a small range check on the character codes (these are Latin-1/Unicode code points rather than ASCII values).
Assuming InRange(val, min, max) checks if a number is, yep, in range..
if(InRange(System.Convert.ToInt32(valToCheck),232,235))
return 'e';
else if(InRange(System.Convert.ToInt32(valToCheck),224,229))
return 'a';
This makes the code a little confusing, and depends on the standard used, but perhaps something to consider.
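For completeness, a sketch of what the assumed InRange helper might look like, together with the two range checks. It assumes the value being checked is a single char (not a string) and that the '-' default from the question is wanted; RangeCheck and Simplify are hypothetical names.

```csharp
using System;

public static class RangeCheck
{
    // Inclusive range test, as assumed by the snippet above.
    public static bool InRange(int value, int min, int max) => value >= min && value <= max;

    public static char Simplify(char c)
    {
        int code = c; // implicit char-to-int conversion gives the UTF-16 code unit

        if (InRange(code, 232, 235)) return 'e'; // è é ê ë
        if (InRange(code, 224, 229)) return 'a'; // à á â ã ä å
        return '-';
    }

    public static void Main()
    {
        Console.WriteLine(Simplify('é')); // e
        Console.WriteLine(Simplify('x')); // -
    }
}
```

Note that the 224 to 229 range also covers á and ã, which the original switch did not list, so the ranges are slightly broader than the explicit cases.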

This answer presumes that you are going to apply that switch statement to a string, not just to single characters (though that would also work).
The best approach seems to be the one outlined in this StackOverflow answer.
I adapted it to use LINQ:
var chars = from character in valToCheck.Normalize(NormalizationForm.FormD)
where CharUnicodeInfo.GetUnicodeCategory(character)
!= UnicodeCategory.NonSpacingMark
select character;
return string.Join("", chars).Normalize(NormalizationForm.FormC);
You'll need using directives for System.Globalization (CharUnicodeInfo, UnicodeCategory) and System.Text (NormalizationForm).
Sample input:
string valToCheck = "êéÈöü";
Sample output:
eeEou

Based on Michael Kaplan's RemoveDiacritics(), you could do something like this:
static char RemoveDiacritics(char c)
{
string stFormD = c.ToString().Normalize(NormalizationForm.FormD);
StringBuilder sb = new StringBuilder();
for (int ich = 0; ich < stFormD.Length; ich++)
{
UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(stFormD[ich]);
if (uc != UnicodeCategory.NonSpacingMark)
{
sb.Append(stFormD[ich]);
}
}
return (sb.ToString()[0]);
}
switch(RemoveDiacritics(valToCheck))
{
case 'e':
//...
break;
case 'a':
//...
break;
//...
}
or, potentially even:
retval = RemoveDiacritics(valToCheck);

Use Contains instead of switch.
var retVal = String.Empty;
string es = "éêèë";
if (es.Contains(valToCheck)) retVal = "e";
//etc.

Should a regular expression used to break up lines account for unix/dos issue?

I didn't feel like using XML for the input file of my T4 so I made this snippet that splits up a document into chunks separated by a blank line.
Am I appropriately making the carriage return optional here?
string s = @"Default
Default

CurrencyConversion
Details of currency conversions.

BudgetReportCache
Indicates wheather the budget report is taken from query results or cache.";
string oneLine = @"[\r]\n";
string twoLines = @"[\r]\n[\r]\n";
var chunks = Regex.Split(s, twoLines, RegexOptions.Multiline);
var items = chunks.Select(c=>Regex.Split(c, oneLine, RegexOptions.Multiline)).ToDictionary(c=>c[0], c=>c[1]);
Note: I would never have thought of this, but since I started using Git, I have seen it "say" things that reminded me of the unix2dos issues, which in turn made me think of Mono and finally if I needed to deal with portability (assuming the goal is perfection).

Your regular expressions don't do what you think they do. Putting \r inside a set doesn't accomplish anything; the expression [\r]\n means the same thing as just \r\n.
You can make them work using the ? operator:
string oneLine = @"\r?\n";
string twoLines = @"\r?\n\r?\n";
However, I would suggest that you use the regular String.Split method instead of regular expressions:
string[] oneLine = { "\r\n", "\n" };
string[] twoLines = { "\r\n\r\n", "\n\n" };
var chunks = s.Split(twoLines, StringSplitOptions.None);
var items =
chunks.Select(c => c.Split(oneLine, StringSplitOptions.None))
.ToDictionary(c => c[0], c => c[1]);

Yes, you should allow for different line separators, but that's not how you do it. The square brackets don't make their contents optional, and you aren't taking the old Mac-style \r into account. I'd use these regexes:
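string oneLine = @"\r\n|[\r\n]";
string twoLines = @"(?>\r\n|[\r\n]){2}";
That's "carriage-return + linefeed OR carriage-return OR linefeed". The atomic group (?>...) in twoLines stops the regex engine from backtracking and matching a single \r\n as two separate line breaks (a \r followed by a \n).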
Also, you don't need the Multiline option. It only changes the meaning of the ^ and $ anchors, which you aren't using (and don't need to use).
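To see how patterns like these behave on mixed line endings, here is a small sketch. SplitChunks and ChunkSplitDemo are hypothetical names and the sample text is made up. The sketch uses an atomic group (?>...) for the two-break pattern so that a lone \r\n cannot be matched as two separate breaks.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ChunkSplitDemo
{
    public static string[][] SplitChunks(string text)
    {
        const string oneLine = @"\r\n|[\r\n]";
        // Atomic group: once \r\n has matched as one break, the engine may not
        // go back and count the \r and the \n as two breaks.
        const string twoLines = @"(?>\r\n|[\r\n]){2}";

        return Regex.Split(text, twoLines)
            .Select(chunk => Regex.Split(chunk, oneLine))
            .ToArray();
    }

    public static void Main()
    {
        // Windows, Unix and old Mac line endings mixed in one input.
        string text = "K1\r\nV1\r\n\r\nK2\nV2\n\nK3\rV3";
        foreach (string[] chunk in SplitChunks(text))
        {
            Console.WriteLine(chunk[0] + " => " + chunk[1]);
        }
    }
}
```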

If you want to go whole hog on portability (and yes, I'm only adding this answer in response to Alan's mention of old Mac-style \r) then you want to cover:
*nix style: \n
DOS/Windows style: \r\n
Old Mac style: \r
EBCDIC style: \u0085 (probably slightly more current-day use than old mac, I'd guess).
Line-separator formatting character: \u2028
Paragraph-separator formatting character: \u2029
Let's not dwell on the precise semantics of \u000B and \u000C, and instead turn this into something sensible (eventually). If we were to try to deal with all of those, how would we do it?
With 6 different line-breaks, one of which is a combination of two of the others, but which should not be treated as two line-breaks, dealing with this in the reg-ex itself could be nasty.
Much better would be to filter them all out in a TextReader wrapper:
public class LineBreakNormaliser : TextReader
{
private readonly TextReader _source;
private bool isNewLine(int charAsInt)
{
switch(charAsInt)
{
case '\n': case '\r':
case '\u0085': case '\u2028': case '\u2029':
case '\u000B': case '\u000C':
return true;
default:
return false;
}
}
public LineBreakNormaliser(TextReader source)
{
_source = source;
}
public override void Close()
{
_source.Close();
base.Close();
}
protected override void Dispose(bool disposing)
{
if(disposing)
_source.Dispose();
base.Dispose(disposing);
}
public override int Peek()
{
int i = _source.Peek();
if(i == -1)
return -1;
if(isNewLine(i))
return '\n';
return i;
}
public override int Read()
{
int i = _source.Read();
if(i == -1)
return -1;
if(i == '\r')
{
if(_source.Peek() == '\n')
_source.Read(); //eat next half of CRLF pair.
return '\n'; //report CR and CRLF alike as a single \n
}
if(isNewLine(i))
return '\n';
return i;
}
public override int Read(char[] buffer, int index, int count)
{
//We take advantage of the fact that we are allowed to return fewer than requested.
//ReadBlock does the work for us for those who need the full amount:
char[] tmpBuffer = new char[count];
int cChars = count = _source.Read(tmpBuffer, 0, count);
if(cChars == 0)
return 0;
for(int i = 0; i != cChars; ++i)
{
char cur = tmpBuffer[i];
if(cur == '\r')
{
if(i == cChars -1)
{
if(_source.Peek() == '\n')
{
_source.Read(); //eat second half of CRLF
--count;
}
}
else if(tmpBuffer[i + 1] == '\n')
{
++i; //skip second half of CRLF
--count;
}
buffer[index++] = '\n';
}
else if(isNewLine(cur))
buffer[index++] = '\n';
else
buffer[index++] = cur; //ordinary characters pass through unchanged
}
return count;
}
}
If you read the file via this text reader, then from this point on your regex can depend on the only newline being \n, and so can any other code.
This done, the regex can actually be simpler than ever, and while it's totally overkill for this single case (and only written because after Alan's mention of OS9 and earlier the idea of supporting IBM EBCDIC machines amused me), it is reusable for all other cases, in which context it's actually not overkill at all, because it becomes "just use the well-tested line-normaliser to make things simpler". (Once it is well-tested, that is; I haven't tested any of the above.)
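When the whole text is already in memory, the same normalisation can be sketched with a single Regex.Replace. This is a simplification, not a substitute for the reader wrapper: it does not stream, and it deliberately leaves \u000B and \u000C alone. LineBreaks and Normalise are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineBreaks
{
    // \r\n is listed first so a CRLF pair collapses to one \n rather than two.
    public static string Normalise(string text) =>
        Regex.Replace(text, @"\r\n|[\r\u0085\u2028\u2029]", "\n");

    public static void Main()
    {
        string mixed = "a\r\nb\rc\u0085d";
        // Count the lines after normalisation.
        Console.WriteLine(Normalise(mixed).Split('\n').Length); // 4
    }
}
```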
