Replace several substrings in a string - c#

I need to replace several substrings in a string.
Let's say:
replace all A in the original string with B
replace all B in the original string with D
so, for example, "AB" gives "BD".
The "naive" approach doesn't work properly:
string S="AB";
S=S.Replace("A","B");
S=S.Replace("B","D");
as it gives DD instead of BD (the first A is changed to B, but that B is then unnecessarily changed to D).
How should such cases be handled? Does it make sense to do separate replacements like this for substrings of any size?
EDIT: My example was not realistic, since it would in fact work in reverse order (first B to D, then A to B). But as others noticed, I'm interested in more general solutions: for any list of character substitutions and for any list of word substitutions.
For characters, I now suppose a good approach is to go through all the characters in the string and build a new string, making replacements where necessary.
For words I suppose it could be more difficult: what if one replaced word is part of another word?
For example
string S="man superman woman superwoman";
and I want to replace "man" with "boy" and "woman" with "girl", but only as whole words.

Assuming there are no cycles, you need to do it the other way around.
Meaning:
string S="AB";
S=S.Replace("B","D");
S=S.Replace("A","B");
This way, B switches to D, then A switches to B and you have no unwanted changes.
As Niklas B. rightly pointed out, for general substrings a different approach is probably better.
I would iterate over the string, recording the indexes where any of the substrings appear, and only then perform the actual replacements. This way you cannot "run over" changes that you have already made.
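A minimal sketch of that single-pass idea, written as a C# 9+ top-level program. The helper name ReplaceAll and the "longest key wins" tie-break are assumptions made for illustration; they are not part of the answer above. Each position of the original string is examined once, and replacement text is never rescanned.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: walks the input left to right and, at each position,
// looks for the longest key of the map that starts there.
string ReplaceAll(string input, IReadOnlyDictionary<string, string> map)
{
    var result = new StringBuilder(input.Length);
    int i = 0;
    while (i < input.Length)
    {
        string best = ""; // longest key found at position i (empty = none)
        foreach (var key in map.Keys)
        {
            if (key.Length > best.Length
                && string.CompareOrdinal(input, i, key, 0, key.Length) == 0)
            {
                best = key;
            }
        }

        if (best.Length > 0)
        {
            // Emit the replacement and skip past the matched text.
            result.Append(map[best]);
            i += best.Length;
        }
        else
        {
            // No key starts here: copy the character unchanged.
            result.Append(input[i]);
            i++;
        }
    }
    return result.ToString();
}

var rules = new Dictionary<string, string> { { "A", "B" }, { "B", "D" } };
Console.WriteLine(ReplaceAll("AB", rules)); // prints "BD"
```

Because the output is built separately from the input, even a cyclic mapping such as A to B and B to A behaves predictably. Note that this sketch matches substrings, not whole words.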

One simple way is to loop through the string yourself and use an if-else or switch statement to test each character and change it accordingly.
This way each character changes only once.
var testString = "Hello World";
var newString = new StringBuilder();
foreach (char c in testString)
{
    switch (c)
    {
        case 'e':
            newString.Append('l');
            break;
        case 'l':
            newString.Append('e');
            break;
        default:
            newString.Append(c);
            break;
    }
}
// testString will be "Hleeo Wored"
testString = newString.ToString();

You could first replace "A" with a token that certainly will not occur in the source string.
For example:
S=S.Replace("A","#");
S=S.Replace("B","D");
S=S.Replace("#","B");

Use Regex.Replace
As in the code below, you can map each string that you want to replace to its replacement.
This is cleaner and, for some, more readable.
IDictionary<string,string> map = new Dictionary<string,string>()
{
    {"A","B"},
    {"B","D"},
};
// Escape the keys so that regex metacharacters in them are matched literally (needs System.Linq)
var regex = new Regex(String.Join("|", map.Keys.Select(k => Regex.Escape(k))));
var newStr = regex.Replace(str, m => map[m.Value]);
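The question also asks about replacing whole words only ("man" but not "superman"). One way to extend the dictionary-plus-regex idea, sketched here as a C# 9+ top-level program, is to wrap the alternation in \b word-boundary anchors. The helper name ReplaceWholeWords is hypothetical, and the sketch assumes every key starts and ends with a word character.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: replaces each key of wordMap only where it appears
// as a whole word in the input.
string ReplaceWholeWords(string input, IDictionary<string, string> wordMap)
{
    if (wordMap.Count == 0) return input;

    // Escape each key so it is matched literally; try longer keys first so
    // that a key which is a prefix of another key cannot shadow it.
    var alternatives = wordMap.Keys
        .OrderByDescending(k => k.Length)
        .Select(k => Regex.Escape(k));

    // \b on both sides restricts matches to whole words.
    var pattern = @"\b(?:" + string.Join("|", alternatives) + @")\b";
    return Regex.Replace(input, pattern, m => wordMap[m.Value]);
}

var words = new Dictionary<string, string> { { "man", "boy" }, { "woman", "girl" } };
Console.WriteLine(ReplaceWholeWords("man superman woman superwoman", words));
// prints "boy superman girl superwoman"
```

The match is case-sensitive; a case-insensitive variant would also need a dictionary built with a case-insensitive comparer so that the lookup of m.Value succeeds.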

Just change the order of the code lines:
string S="AB";
S=S.Replace("B","D");
S=S.Replace("A","B");

Related

Check array for string that starts with given one (ignoring case)

I am trying to see if my string starts with a string in an array of strings I've created. Here is my code:
string x = "Table a";
string y = "a table";
string[] arr = { "table", "chair", "plate" };
if (arr.Contains(x.ToLower())){
    // this should be true
}
if (arr.Contains(y.ToLower())){
    // this should be false
}
How can I make it so my if statement comes up true? I'd like to just match the beginning of string x against the contents of the array while ignoring the case and the following characters. I thought I needed regex to do this but I could be mistaken. I'm a bit of a newbie with regex.
It seems you want to check if your string contains an element from your list, so this should be what you are looking for:
if (arr.Any(c => x.ToLower().Contains(c)))
Or simpler:
if (arr.Any(x.ToLower().Contains))
Or based on your comments you may use this:
if (arr.Any(x.ToLower().Split(' ')[0].Contains))
Because you said you want regex...
you can set a regex with var regex = new Regex("(table|plate|fork)");
and check with if(regex.IsMatch(myString)) { ... }
But for the issue at hand you don't have to use Regex, as you are searching for an exact substring. You can use
(as @S.Akbari mentioned): if (arr.Any(c => x.ToLower().Contains(c))) { ... }
Enumerable.Contains matches exact values (and there is no built-in comparer that checks for "starts with"), so you need Any, which takes a predicate that receives each array element as a parameter and performs the check. So the first step is to turn "contains" the other way around, so that the given string contains an element from the array, like:
var myString = "some string";
if (arr.Any(arrayItem => myString.Contains(arrayItem)))...
Now, you are actually asking for "string starts with given word" and not just contains, so you need StartsWith (which, unlike Contains, conveniently lets you specify case sensitivity; see Case insensitive 'Contains(string)'):
if (arr.Any(arrayItem => myString.StartsWith(
arrayItem, StringComparison.CurrentCultureIgnoreCase))) ...
Note that this code will accept "tableAAA bob". If you really need to break on a word boundary, a regular expression may be a better choice. Building regular expressions dynamically is trivial as long as you properly escape all the values.
The regex should consist of:
beginning of string - ^
properly escaped word you are searching for - Escape Special Character in Regex
word break - \b
if (arr.Any(arrayItem => Regex.IsMatch(myString,
    String.Format(@"^{0}\b", Regex.Escape(arrayItem)),
    RegexOptions.IgnoreCase))) ...
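As a variation on the per-item check above, the three parts (start anchor, escaped words, word break) can be combined into one pattern for the whole array. This is a sketch written as a C# 9+ top-level program; the helper name StartsWithAnyWord is hypothetical, and it assumes every array element ends with a word character so that \b behaves as intended.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: true when text begins with one of the candidates,
// ignoring case, and the candidate ends at a word boundary.
bool StartsWithAnyWord(string text, string[] candidates)
{
    if (candidates.Length == 0) return false;

    // ^ anchors at the start, the alternation lists the escaped words,
    // and \b requires a word break right after the matched word.
    var pattern = "^(?:"
        + string.Join("|", candidates.Select(c => Regex.Escape(c)))
        + @")\b";
    return Regex.IsMatch(text, pattern, RegexOptions.IgnoreCase);
}

string[] furniture = { "table", "chair", "plate" };
Console.WriteLine(StartsWithAnyWord("Table a", furniture));      // prints "True"
Console.WriteLine(StartsWithAnyWord("a table", furniture));      // prints "False"
Console.WriteLine(StartsWithAnyWord("tableAAA bob", furniture)); // prints "False"
```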
You can do something like the below using TypeScript. Instead of startsWith you can also use includes, an equality check, etc.
public namesList: Array<string> = ['name1','name2','name3','name4','name5'];
// SomeString = 'name1, Hello there';
private isNamePresent(SomeString : string):boolean{
    if (this.namesList.find(name => SomeString.startsWith(name)))
        return true;
    return false;
}
I think I understand what you are trying to say here, although there is still some ambiguity. Are you trying to see if one word in your string (which is a sentence) exists in your array?
@Amy is correct, this might not have to do with Regex at all.
I think this segment of code will do what you want in Java (which can easily be translated to C#):
Java:
x = x.toLowerCase();
String[] words = x.split("\\s+");
for (String word : words) {
    for (String element : arr) {
        if (element.equals(word)) {
            return true;
        }
    }
}
return false;
You can also use a Set to store the elements of your array, which can make lookup more efficient.
Java:
x = x.toLowerCase();
String[] words = x.split("\\s+");
Set<String> set = new HashSet<>(Arrays.asList(arr));
for (String word : words) {
    if (set.contains(word)) {
        return true;
    }
}
return false;
Edit: (12/22, 11:05am)
I rewrote my solution in C#, thanks to reminders by @Amy and @JohnyL. Since the author only wants to match the first word of the string, this edited code should work :)
C#:
static bool contains(string x, string[] arr){
    x = x.ToLower();
    string[] words = x.Split(' ');
    var set = new HashSet<string>(arr);
    // only the first word of the string is checked
    return set.Contains(words[0]);
}
Sorry my question was so vague, but here is the solution, thanks to help from a few of the people who answered.
var regex = new Regex("^(table|chair|plate) *.*");
if (regex.IsMatch(x.ToLower())){}

Replace multiple characters in C# using .replace without creating a loop

I need to replace multiple characters in C# using .Replace without writing a loop. Chaining the calls turns every character into the final character of the chain.
Example code:
string t1="ABCD";
t1 = t1.Replace('A','B').Replace('B','C').Replace('C','D').Replace('D','E');
Result: EEEE
Expected result: BCDE
How do I get the expected result? I do this for a large number of characters in a string (<= 100), so I need an easy way. Can I do it with the Replace method, or is there some other way?
Before going to the answer, let me describe what went wrong in your case. Replace returns a new instance of the string, so after the first call, t1.Replace('A','B'), the resulting string is BBCD (A is replaced with B). The next Replace operates on that string, so every B is replaced with C, and so on. Before the final Replace the string has therefore become DDDD.
Here is a simple solution using LINQ with String.Join. You can take a look at the working example here
string inputString = "ABCD";
var ReplacedString = String.Join("", inputString.Select(x => x == 'A' ? 'B' :
x == 'B' ? 'C' :
x == 'C' ? 'D' :
x == 'D' ? 'E' :
x));
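For a larger number of characters, the chain of conditionals above can become hard to maintain. A possible alternative, sketched here as a C# 9+ top-level program, keeps the substitutions in a lookup table and still reads each character of the original string only once. The helper name MapChars and the table contents are illustrative assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: maps every character through the table; characters
// without an entry are copied unchanged.
string MapChars(string input, IReadOnlyDictionary<char, char> table)
{
    return new string(input
        .Select(c => table.TryGetValue(c, out var mapped) ? mapped : c)
        .ToArray());
}

var shift = new Dictionary<char, char>
{
    { 'A', 'B' }, { 'B', 'C' }, { 'C', 'D' }, { 'D', 'E' }
};
Console.WriteLine(MapChars("ABCD", shift)); // prints "BCDE"
```

Since every lookup is made against the original character, the order of the table entries does not matter.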
If you don't want to write it yourself, probably the simplest way to code it would be with regexes:
Regex.Replace(mystring, "[ABCD]", s =>
{
    // s is a Match, so switch on the matched text
    switch (s.Value)
    {
        case "A": return "B";
        case "B": return "C";
        case "C": return "D";
        case "D": return "E";
        default: return s.Value;
    }
});
In this particular example, it should work if you just reverse the order of your Replace(...) calls.
string t1="ABCD";
t1 = t1.Replace('D','E').Replace('C','D').Replace('B','C').Replace('A','B');
This might do the trick for you:
string t1 = "ABCD";
var ans = string.Join("", t1.Select(x => (char)(x + 1)));
This code replaces each character with the next character. It fails for the last letter of the alphabet, z or Z: instead of wrapping around to a or A, it gives { or [. In most other cases it can be used to get the next character.
The answers already posted will solve the immediate example that you give, but you also say that you have to do this for a large number of characters in a string. I may be misunderstanding your requirements, but it sounds like you are just trying to "increment" each letter. That is, A becomes B, I becomes J, etc.
If this is the case, a loop (not sure why you want to avoid loops; they seem like the best option here) will be much better than stringing a bunch of replaces together, especially for longer strings.
The code below assumes your only input will be capital Latin letters, and for the letter Z, I've just wrapped the alphabet, so it will be replaced with A.
string t1 = "ABCDEFGXYZ";
StringBuilder sb = new StringBuilder();
foreach (char character in t1)
{
    if (character == 'Z')
    {
        sb.Append('A');
    }
    else
    {
        sb.Append((Char)(Convert.ToUInt16(character) + 1));
    }
}
Console.WriteLine(sb.ToString());
The code above takes the input ABCDEFGXYZ and outputs BCDEFGHYZA. This is extensible to much larger inputs as well.

Using string.ToUpper on substring

Have an assignment to allow a user to input a word in C# and then display that word with the first and third characters changed to uppercase. Code follows:
namespace Capitalizer
{
    class Program
    {
        static void Main(string[] args)
        {
            string text = Console.ReadLine();
            char[] delimiterChars = { ' ' };
            string[] words = text.Split(delimiterChars);
            string Upper = text.ToUpper();
            Console.WriteLine(Upper);
            Console.ReadKey();
        }
    }
}
This of course generates the entire word in uppercase, which is not what I want. I can't seem to make text.ToUpper(0,2) work, and even then that'd capitalize the first three letters. Only solution I can think of now that would make the word appear on one line (and I don't know if it works) is to move the capitalized letters and lowercase letters into a character array and try to get that to print all values in a modified order.
The simplest way I can think of to address your exact question as described — to convert to upper case the first and third characters of the input — would be something like the following:
StringBuilder sb = new StringBuilder(text);
sb[0] = char.ToUpper(sb[0]);
sb[2] = char.ToUpper(sb[2]);
text = sb.ToString();
The StringBuilder class is essentially a mutable string object, so it is the most fluid way to approach these kinds of operations: it provides the most straightforward conversions to and from string, as well as the full range of string operations. Changing individual characters is easy in many data structures, but insertions, deletions, appending, formatting, etc. all also come with StringBuilder, so it's a good habit to use that versus other approaches.
But frankly, it's hard to see how that's a useful operation. I can't help but wonder if you have stated the requirements incorrectly and there's something more to this question than is seen here.
You could use LINQ:
var upperCaseIndices = new[] { 0, 2 };
var message = "hello";
var newMessage = new string(message.Select((c, i) =>
upperCaseIndices.Contains(i) ? Char.ToUpper(c) : c).ToArray());
Here is how it works. message.Select (inline LINQ query) selects characters from message one by one and passes into selector function:
upperCaseIndices.Contains(i) ? Char.ToUpper(c) : c
written as C# ?: shorthand syntax for if. It reads as "If index is present in the array, then select upper case character. Otherwise select character as is."
(c, i) => condition
is a lambda expression. See also:
Understand Lambda Expressions in 3 minutes
The rest is very simple - represent result as array of characters (.ToArray()), and create a new string based off that (new string(...)).
Only solution I can think of now that would make the word appear on one line (and I don't know if it works) is to move the capitalized letters and lowercase letters into a character array and try to get that to print all values in a modified order.
That seems a lot more complicated than necessary. Once you have a character array, you can simply change the elements of that character array. In a separate function, it would look something like
string MakeFirstAndThirdCharacterUppercase(string word) {
    var chars = word.ToCharArray();
    // ToUpper is a static method on char, not an instance method
    chars[0] = char.ToUpper(chars[0]);
    chars[2] = char.ToUpper(chars[2]);
    return new string(chars);
}
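The function above indexes positions 0 and 2 directly, so a word shorter than three characters would throw an IndexOutOfRangeException. A sketch of a more defensive variant, written as a C# 9+ top-level program, is shown below. The helper name UppercaseAt and its positions parameter are hypothetical, and the choice to silently skip out-of-range positions is an assumption about the desired behaviour.

```csharp
using System;

// Hypothetical helper: upper-cases the characters at the given positions,
// ignoring positions that fall outside the word.
string UppercaseAt(string word, int[] positions)
{
    var chars = word.ToCharArray();
    foreach (var p in positions)
    {
        if (p >= 0 && p < chars.Length)
        {
            chars[p] = char.ToUpper(chars[p]);
        }
    }
    return new string(chars);
}

Console.WriteLine(UppercaseAt("hello", new[] { 0, 2 })); // prints "HeLlo"
Console.WriteLine(UppercaseAt("hi", new[] { 0, 2 }));    // prints "Hi"
```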
My simple solution:
string text = Console.ReadLine();
char[] delimiterChars = { ' ' };
string[] words = text.Split(delimiterChars);
foreach (string s in words)
{
    char[] chars = s.ToCharArray();
    // consecutive spaces produce empty entries, so check the length first
    if (chars.Length > 0)
    {
        chars[0] = char.ToUpper(chars[0]);
    }
    if (chars.Length > 2)
    {
        chars[2] = char.ToUpper(chars[2]);
    }
    Console.Write(new string(chars));
    Console.Write(' ');
}
Console.ReadKey();

C# Replace with regex

I'm new to VB, C#, and am struggling with regex. I think I've got the following code format to replace the regex match with blank space in my file.
EDIT: Per comments this code block has been changed.
var fileContents = System.IO.File.ReadAllText(@"C:\path\to\file.csv");
fileContents = fileContents.Replace(fileContents, @"regex", "");
regex = new Regex(pattern);
regex.Replace(filecontents, "");
System.IO.File.WriteAllText(@"C:\path\to\file.csv", fileContents);
My files are formatted like this:
"1111111","22222222222","Text that may, have a comma, or two","2014-09-01",,,,,,
So far, I have a regex that finds any string between ," and ", that contains a comma (there are never commas in the first or last cell, so I'm not worried about excluding those two). I'm testing the regex in Expresso:
(?<=,")([^"]+,[^"]+)(?=",)
I'm just not sure how to isolate that comma as what needs to be replaced. What would be the best way to do this?
SOLVED:
Combined [^"]+ with look behind/ahead:
(?<=,"[^"]+)(,)(?=[^"]+",)
FINAL EDIT:
Here's my final complete solution:
//read file contents
var fileContents = System.IO.File.ReadAllText(@"C:\path\to\file.csv");
//find all commas between double quotes
var regex = new Regex("(?<=,\")([^\"]+,[^\"]+)(?=\",)");
//replace all commas with ""
fileContents = regex.Replace(fileContents, m => m.ToString().Replace(",", ""));
//write result back to file
System.IO.File.WriteAllText(@"C:\path\to\file.csv", fileContents);
Figured it out by combining the [^"]+ with the look ahead ?= and look behind ?<= so that it finds strings beginning with ,"[anything that's not double quotes, one or more times] then has a comma, then ends with [anything that's not double quotes, one or more times]",
(?<=,"[^"]+)(,)(?=[^"]+",)
Try to parse out all your columns with this:
Regex regex = new Regex("(?<=\").*?(?=\")");
Then you can just do:
foreach(Match match in regex.Matches(fileContents))
{
    fileContents = fileContents.Replace(match.ToString(), match.ToString().Replace(",",string.Empty));
}
Might not be as fast but should work.
I would probably use the overload of Regex.Replace that takes a delegate to return the replaced text.
This is useful when you have a simple regex to identify the pattern but you need to do something less straightforward (complex logic) for the replace.
I find keeping your regexes simple will pay benefits when you're trying to maintain them later.
Note: this is similar to the answer by @Florian, but this version restricts the replacement to the matched text.
string exp = "(?<=,\")([^\"]+,[^\"]+)(?=\",)";
var regex = new Regex(exp);
string replacedtext = regex.Replace(fileContents, m => m.ToString().Replace(",",""));
What you have there is an irregular language. This is because a comma can mean different things depending upon where it is in the text stream. Strangely Regular Expressions are designed to parse regular languages where a comma would mean the same thing regardless of where it is in the text stream. What you need for an irregular language is a parser. In fact Regular expressions are mostly used for tokenizing strings before they are entered into a parser.
While what you are trying to do can be done using regular expressions it is likely to be very slow. For example you can use the following (which will work even if the comma is the first or last character in the field). However every time it finds a comma it will have to scan backwards and forwards to check if it is between two quotation characters.
(?<=,"[^"]*),(?=[^"]*",)
Note also that there may be a flaw in this approach that you have not yet spotted. I don't know if you have this issue, but CSV files often have quotation characters in the middle of fields that may also contain a comma. In these cases applications like MS Excel will typically double the quote to show that it is not the end of the field. Like this:
"1111111","22222222222","Text that may, have a comma, Quote"" or two","2014-09-01",,,,,,
In this case you are going to be out of luck with a regular expression.
Thankfully the code to deal with CSV files is very simple:
public static IList<string> ParseCSVLine(string csvLine)
{
    List<string> result = new List<string>();
    StringBuilder buffer = new StringBuilder();
    bool inQuotes = false;
    char lastChar = '\0';
    foreach (char c in csvLine)
    {
        switch (c)
        {
            case '"':
                if (inQuotes)
                {
                    inQuotes = false;
                }
                else
                {
                    if (lastChar == '"')
                    {
                        buffer.Append('"');
                    }
                    inQuotes = true;
                }
                break;
            case ',':
                if (inQuotes)
                {
                    buffer.Append(',');
                }
                else
                {
                    result.Add(buffer.ToString());
                    buffer.Clear();
                }
                break;
            default:
                buffer.Append(c);
                break;
        }
        lastChar = c;
    }
    result.Add(buffer.ToString());
    buffer.Clear();
    return result;
}
PS. There are a couple of other issues often run into with CSV files which the code I have given doesn't solve. The first is what happens if a field has an end-of-line character in the middle of it. The second is how you know what character encoding a CSV file is in. The former of these two issues is easy to solve by modifying my code slightly. The second, however, is near impossible to solve without coming to some agreement with the person supplying the file to you.
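One possible variant that handles the first issue (line breaks inside quoted fields) is sketched below as a C# 9+ top-level program. It is not the answerer's modification: the helper name ParseCsv is hypothetical, and it takes the whole file text instead of a single line so that a line break ends a record only when it occurs outside quotes. Encoding detection is not addressed.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: parses a whole CSV text into records of fields.
List<List<string>> ParseCsv(string text)
{
    var records = new List<List<string>>();
    var fields = new List<string>();
    var buffer = new StringBuilder();
    bool inQuotes = false;

    for (int i = 0; i < text.Length; i++)
    {
        char c = text[i];
        if (inQuotes)
        {
            if (c == '"')
            {
                if (i + 1 < text.Length && text[i + 1] == '"')
                {
                    buffer.Append('"'); // doubled quote means a literal quote
                    i++;
                }
                else
                {
                    inQuotes = false;   // closing quote
                }
            }
            else
            {
                buffer.Append(c);       // commas and line breaks are data here
            }
        }
        else if (c == '"')
        {
            inQuotes = true;
        }
        else if (c == ',')
        {
            fields.Add(buffer.ToString());
            buffer.Clear();
        }
        else if (c == '\n')
        {
            // An unquoted line break ends the current record.
            fields.Add(buffer.ToString());
            buffer.Clear();
            records.Add(fields);
            fields = new List<string>();
        }
        else if (c != '\r')
        {
            buffer.Append(c);
        }
    }

    // Flush the last record when the text does not end with a line break.
    if (buffer.Length > 0 || fields.Count > 0)
    {
        fields.Add(buffer.ToString());
        records.Add(fields);
    }
    return records;
}

var rows = ParseCsv("\"1\",\"a, b\nc\",x\n\"2\",\"say \"\"hi\"\"\",y\n");
Console.WriteLine(rows.Count);  // prints 2
Console.WriteLine(rows[1][1]);  // prints say "hi"
```

For production data a maintained CSV library is usually a safer choice than a hand-written parser.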

Normalize Two Strings Then Compare

I have 2 strings which both are some kind of reference number (have a prefix and digits).
string a = "R&D123";
string b = "R&D 123";
string a and string b are two different user inputs, and I'm trying to check whether the two strings match.
I know I can use String.Compare() to check whether two strings are the same, but as in the example above, they could be different strings that are technically the same thing.
Because they are both user inputs (from different users), there can be several different formats.
"R&D123"
"R&D 123" //with space in between
"R.D.123 " //using period or other character
"r&d123" //different case
"RD123" //no special character
...etc
Is there a way I can somehow "normalize" the two strings first and then compare them?
I know an easy-to-understand way is to use string.Replace() to remove special characters and spaces, and string.ToLower() so I don't have to worry about case. But the problem with this method is that if I have many special characters, I'll be calling .Replace() quite a few times, and that's not ideal.
Another problem is that R&D is not the only prefix I need to worry about, there are others such as A.P., K-D, etc. Not sure if this will make a difference :/
Any help is appreciated, thanks!
If you want to keep just letters and digits, you can do it with LINQ:
var array1 = a.Where(x => char.IsLetterOrDigit(x)).ToArray();
var array2 = b.Where(x => char.IsLetterOrDigit(x)).ToArray();
var normalizedStr1 = new String(array1).ToLower();
var normalizedStr2 = new String(array2).ToLower();
// returns 0 when the normalized strings are equal
String.Compare(normalizedStr1,normalizedStr2);
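The same normalize-then-compare idea can be wrapped in a pair of small helpers, sketched here as a C# 9+ top-level program. The names NormalizeReference and SameReference are hypothetical, and using ToUpperInvariant with an ordinal comparison is an assumption chosen so the result does not depend on the current culture.

```csharp
using System;
using System.Linq;

// Hypothetical helper: keeps only letters and digits and folds case.
string NormalizeReference(string value)
{
    return new string(value.Where(ch => char.IsLetterOrDigit(ch)).ToArray())
        .ToUpperInvariant();
}

// Hypothetical helper: two inputs match when their normalized forms are equal.
bool SameReference(string left, string right)
{
    return string.Equals(
        NormalizeReference(left),
        NormalizeReference(right),
        StringComparison.Ordinal);
}

Console.WriteLine(SameReference("R&D123", "R&D 123"));  // prints "True"
Console.WriteLine(SameReference("R.D.123 ", "r&d123")); // prints "True"
Console.WriteLine(SameReference("RD123", "RD124"));     // prints "False"
```

Because every non-alphanumeric character is dropped, other prefixes such as A.P. or K-D are handled the same way without listing their separators.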
This might not be the prettiest way to do it, but it's the fastest:
static void Main(string[] args)
{
    string sampleResult = NormalizeAlphaNumeric("Hello wordl 3242348&&))&)*^&#R&#&R#)R##)R##R#R##");
}
public static string NormalizeAlphaNumeric(string someValue)
{
    var sb = new StringBuilder(someValue.Length);
    foreach (var ch in someValue)
    {
        if(char.IsLetterOrDigit(ch))
        {
            sb.Append(ch);
        }
    }
    return sb.ToString().ToLower();
}
Try this...
string s2 = Regex.Replace(s, @"[^a-zA-Z0-9]+", String.Empty);
It removes all the special characters and gives you the normalized string.
