How can I replace the character at the nth index using only Regex?
string input = "%fdfdfdfdfdfdfdfdfdfdfdffd";
string result = Regex.Replace(input, "^%", "");
The above code replaces the first character with an empty string, but I want to specify an index (the nth index) so that the character at that position is replaced with an empty string.
Can someone help me out here?
It's possible to create a regex pattern that captures all characters before and after the replaced character and then replaces the whole string with the two captures separated by the new character. For example:
Regex.Replace("abcdefgh", @"^(.{4}).(.*)$", @"$1E$2") // returns "abcdEfgh"
You could then create a method that replaces the character at a specific index:
string ReplaceCharacter(string text, int index, char value)
    => Regex.Replace(text, $@"^(.{{{index}}}).(.*)$", $@"${{1}}{value}${{2}}");
// Usage:
ReplaceCharacter("Foo-bar", 3, 'l') // returns "Foolbar"
As Johan Wentholt said in the comments, you can use Regex.Replace to match a number of characters from the start of the string and replace them with a capture group that is one character shorter than the full match:
String result = Regex.Replace(input, "^(.{" + index + "}).", "$1");
This matches "index times any character, followed by another character, at the start of the string", but replaces it by only the "index times any character" without that last character, since that last dot is outside of the capture group.
If you want to replace it with something other than an empty string, just concatenate that text to the end of the "$1" replacement string. To be safe, though, write the group reference as "${1}", since a replacement that starts with a digit would otherwise be read as part of the capture group number.
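A minimal sketch of the two cases above, assuming a zero-based index. The class and method names (NthCharRegex, RemoveAt, ReplaceAt) are made up for illustration. RegexOptions.Singleline is added so that "." also matches line breaks, and "$" signs in the replacement text are doubled so that they are taken literally:

```csharp
using System;
using System.Text.RegularExpressions;

public static class NthCharRegex
{
    // Removes the character at the given zero-based index.
    public static string RemoveAt(string input, int index) =>
        Regex.Replace(input, "^(.{" + index + "}).", "$1", RegexOptions.Singleline);

    // Replaces the character at the given zero-based index with `replacement`.
    // "${1}" keeps the group reference unambiguous even if `replacement`
    // starts with a digit; "$$" is the substitution for a literal '$'.
    public static string ReplaceAt(string input, int index, string replacement) =>
        Regex.Replace(
            input,
            "^(.{" + index + "}).",
            "${1}" + replacement.Replace("$", "$$"),
            RegexOptions.Singleline);
}
```

If the index is past the end of the string, the pattern does not match and the input comes back unchanged.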
What you want to do may not be possible with Regex alone. This is sort of a cheat:
var input = "%fdfd678dfdfdfdfdfdfdfdffd";
var result = Regex.Replace(input, "^.{7}", input.Substring(0,6));
Console.WriteLine($"result = {result}");
I have string of the following format:
string test = "test.BO.ID";
My aim is to get the part of the string that comes after the first dot.
So ideally I am expecting output as "BO.ID".
Here is what I have tried:
// Checking for the first occurrence and take whatever comes after dot
var output = Regex.Match(test, @"^(?=.).*?");
The output I am getting is empty.
What modification do I need to make to the Regex?
You get an empty output because the pattern you have can match an empty string at the start of a string, and that is enough since .*? is a lazy subpattern and . matches any char.
Use (the value will be in Match.Groups[1].Value)
\.(.*)
or (with a lookbehind, to get the string as a Match.Value)
(?<=\.).*
A non-regex approach is to use String.Split with the count argument:
var s = "test.BO.ID";
var res = s.Split(new[] {"."}, 2, StringSplitOptions.None);
if (res.GetLength(0) > 1)
    Console.WriteLine(res[1]);
If you only want the part after the first dot you don't need a regex at all:
x.Substring(x.IndexOf('.') + 1)
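As a rough sketch, the lookbehind pattern and the IndexOf approach from the answers above can be wrapped so that a string without a dot yields an empty string. The class and method names are hypothetical, and returning "" for the no-dot case is an assumption, not something the question specifies:

```csharp
using System;
using System.Text.RegularExpressions;

public static class AfterFirstDot
{
    // Regex version: the lookbehind requires a '.' immediately before the match,
    // so the first possible match starts right after the first dot.
    public static string WithRegex(string s)
    {
        Match m = Regex.Match(s, @"(?<=\.).*", RegexOptions.Singleline);
        return m.Success ? m.Value : "";
    }

    // Non-regex version: skip past the first dot, if there is one.
    public static string WithIndexOf(string s)
    {
        int i = s.IndexOf('.');
        return i < 0 ? "" : s.Substring(i + 1);
    }
}
```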
What is the best way to trim ALL non-alphanumeric characters from the beginning and end of a string? I tried manually listing the characters I do not need, but that doesn't work well. I just need to trim anything that is not alphanumeric.
I tried using this function:
string something = "()&*1#^#47*^#21%Littering aaaannnndóú(*&^1#*32%#**)7(#9&^";
string somethingNew = Regex.Replace(something, @"[^\p{L}-\s]+", "");
But it removes all characters that are non alpha numeric from the string. What I basically want is like this:
"test1" -> test1
#!#!2test# -> 2test
(test3) -> test3
##test4---- -> test4
I do want to support Unicode characters, but not symbols.
EDIT:
The output of the example should be:
Littering aaaannnndóú
Regards
Assuming you want to trim non-alphanumeric characters from the start and end of your string:
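// Skip leading non-alphanumerics, then reverse and skip the trailing ones.
// (TakeWhile would stop at the first inner space or symbol.)
s = new string(s.SkipWhile(c => !char.IsLetterOrDigit(c))
    .Reverse()
    .SkipWhile(c => !char.IsLetterOrDigit(c))
    .Reverse()
    .ToArray());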
@"[^\p{L}\s-]+(test\d*)|(test\d*)[^\p{L}\s-]+","$1$2"
You can use the String.Trim(Char[]) method from the .NET library to trim unwanted characters from a given string.
From MSDN : String.Trim Method (Char[])
Removes all leading and trailing occurrences of a set of characters
specified in an array from the current String object.
Before trimming, you first need to identify whether each character is a letter or digit; if it is not alphanumeric, you can pass it to String.Trim(Char[]) to remove it.
Use the Char.IsLetterOrDigit() function to check whether a character is alphanumeric.
From MSDN: Char.IsLetterOrDigit()
Indicates whether a Unicode character is categorized as a letter or a
decimal digit.
Try This:
string str = "()&*1#^#47*^#21%Littering aaaannnndóú(*&^1#*32%#**)7(#9&^";
foreach (char ch in str)
{
    if (!char.IsLetterOrDigit(ch))
        str = str.Trim(ch);
}
Output:
1#^#47*^#21%Littering aaaannnndóú(*&^1#*32%#**)7(#9
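The output above still has symbols at both ends because each character is trimmed only once, in the order it appears. One way around that, sketched here with a hypothetical helper name, is to scan inward from both ends with Char.IsLetterOrDigit and take the substring in between:

```csharp
using System;

public static class AlnumTrim
{
    // Returns `s` without leading and trailing non-alphanumeric characters.
    // Characters in the middle of the string are left untouched.
    public static string TrimNonAlphanumeric(string s)
    {
        int start = 0;
        int end = s.Length - 1;

        // Move `start` forward to the first letter or digit.
        while (start <= end && !char.IsLetterOrDigit(s[start])) start++;

        // Move `end` backward to the last letter or digit.
        while (end >= start && !char.IsLetterOrDigit(s[end])) end--;

        return s.Substring(start, end - start + 1);
    }
}
```

A string with no letters or digits at all comes back as an empty string.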
If you need to remove any character which is not alphanumeric, you can use IsLetterOrDigit paired with a Where to go through every character. And because we're working at the char level, we'll need a little Concat at the end to bring everything back into a string.
string result = string.Concat(input.Where(char.IsLetterOrDigit));
which you can easily convert into an extension method
public static class Extensions
{
    public static string ToAlphaNum(this string input)
    {
        return string.Concat(input.Where(char.IsLetterOrDigit));
    }
}
that you can use like this :
string testString = "#!#!\"(test123)\"";
string result = testString.ToAlphaNum(); //test123
Note: this will remove every non-alphanumeric character from your string, if you really need to remove only those at the beginning/end, please add more details about what defines a beginning or an end and add more examples.
And you could also replace all the non-letters/numbers at the beginning and/or end of the line:
^[^\p{L}\p{N}]*|[^\p{L}\p{N}]*$
used as
resultString = Regex.Replace(subjectString, @"^[^\p{L}\p{N}]*|[^\p{L}\p{N}]*$", "", RegexOptions.Multiline);
If you really want to remove characters only at the beginning and end of the whole string, rather than line by line, then remove the "^$ match at line breaks" option (RegexOptions.Multiline).
If you wanted to include leading or trailing underscores, as characters to be retained, you could simplify the regex to:
^\W+|\W+$
The core of the regex:
[^\p{L}\p{N}]
is a negated character class that matches any character that is not in the Unicode categories Letter \p{L} or Number \p{N}
In other words:
Trim non-unicode alphanumeric characters
^[^\p{L}\p{N}]*|[^\p{L}\p{N}]*$
Options: Case sensitive; Exact spacing; Dot doesn't match line breaks; ^$ match at line breaks; Parentheses capture
Match this alternative «^[^\p{L}\p{N}]*»
Assert position at the beginning of a line «^»
Match any single character NOT present in the list below «[^\p{L}\p{N}]*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
A character from the Unicode category “letter” «\p{L}»
A character from the Unicode category “number” «\p{N}»
Or match this alternative «[^\p{L}\p{N}]*$»
Match any single character NOT present in the list below «[^\p{L}\p{N}]*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
A character from the Unicode category “letter” «\p{L}»
A character from the Unicode category “number” «\p{N}»
Assert position at the end of a line «$»
Created with RegexBuddy
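A small sketch of the whole-string variant (no RegexOptions.Multiline) next to the \W variant that keeps underscores. The class and method names are invented for this example, and + is used instead of * only to avoid empty matches:

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexTrim
{
    // Trims everything that is not a Unicode letter or number
    // from the start and end of the whole string.
    public static string TrimEnds(string s) =>
        Regex.Replace(s, @"^[^\p{L}\p{N}]+|[^\p{L}\p{N}]+$", "");

    // Same idea with \W, which treats '_' as a word character,
    // so leading and trailing underscores are retained.
    public static string TrimEndsKeepUnderscore(string s) =>
        Regex.Replace(s, @"^\W+|\W+$", "");
}
```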
Without using regex:
In Java, you could do the following (the C# syntax would be nearly the same, with the same functionality):
while (true) {
    if (word.length() == 0) {
        return ""; // nothing alphanumeric was left
    }
    // isLetterOrDigit (not isLetter) so that leading/trailing digits are kept
    if (!Character.isLetterOrDigit(word.charAt(0))) {
        word = word.substring(1);
        continue; // so we are doing front first
    }
    if (!Character.isLetterOrDigit(word.charAt(word.length()-1))) {
        word = word.substring(0, word.length()-1);
        continue; // then we are doing end
    }
    break; // if front is done, and end is done
}
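You could use this pattern (POSIX bracket-expression syntax; .NET regular expressions do not support [:alnum:], so use [^\p{L}\p{N}] there):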
^[^[:alnum:]]+|[^[:alnum:]]+$
with g option
I am trying to use a regular expression to replace all characters except the number, but the number should not start with 0.
How can I achieve this using Regular Expression?
I have tried several things, like @"^([1-9]+)(0+)(\d*)" and "(?<=[1-9])0+", but they do not work.
Some examples of the text could be hej:\\\\0.0.0.22, hej:22, hej:\\\\?022 and hej:\\\\?22, and the result should in all cases be 22.
Rather than replace, try and match against [1-9][0-9]*$ on your string. Grab the matched text.
Note that as .NET regexes match Unicode number characters if you use \d, here the regex restricts what is matched to a simple character class instead.
(note: regex assumes matches at end of line only)
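A minimal sketch of that matching approach, using a hypothetical helper that returns an empty string when the input does not end in a number starting with a non-zero digit (that fallback is an assumption):

```csharp
using System;
using System.Text.RegularExpressions;

public static class TrailingNumberDemo
{
    // Grabs the digits at the end of the string, starting at the
    // first non-zero digit of that trailing run.
    public static string TrailingNumber(string s)
    {
        Match m = Regex.Match(s, @"[1-9][0-9]*$");
        return m.Success ? m.Value : "";
    }
}
```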
According to one of your comments hej:\\\\0.011.0.022 should yield 110022. First select the relevant string part from the first non zero digit up to the last number not being zero:
([1-9].*[1-9]\d*)|[1-9]
[1-9] is the first non zero digit
.* are any number of any characters
[1-9]\d* are numbers, starting at the first non-zero digit
|[1-9] includes cases consisting of only one single non zero digit
Then remove all non digits (\D)
Match match = Regex.Match(input, @"([1-9].*[1-9]\d*)|[1-9]");
if (match.Success) {
    // Verbatim string: "\D" in a regular string literal is an invalid escape
    result = Regex.Replace(match.Value, @"\D", "");
} else {
    result = "";
}
Use the following:
[1-9][0-9]*$
You don't need to do any recursion, just match that.
Here is something you can try, The87Boy; you can play around with the pattern or add to it as you like.
string strTargetString = @"hej:\\\\*?0222\";
string pattern = "[\\\\hej:0.?*]";
string replacement = " ";
Regex regEx = new Regex(pattern);
string newRegStr = Regex.Replace(regEx.Replace(strTargetString, replacement), @"\s+", " ").Trim();
Result from the above example = 222
How can I get the string before the character "-" using regular expressions?
For example, I have "text-1" and I want to return "text".
So I see many possibilities to achieve this.
string text = "Foobar-test";
Regex Match everything till the first "-"
Match result = Regex.Match(text, @"^.*?(?=-)");
^ match from the start of the string
.*? match any character (.), zero or more times (*) but as few as possible (?)
(?=-) till the next character is a "-" (this is a positive look ahead)
Regex Match anything that is not a "-" from the start of the string
Match result2 = Regex.Match(text, @"^[^-]*");
[^-]* matches any character that is not a "-" zero or more times
Regex Match anything that is not a "-" from the start of the string till a "-"
Match result21 = Regex.Match(text, @"^([^-]*)-");
Will only match if there is a dash in the string, but the result is then found in capture group 1.
Split on "-"
string[] result3 = text.Split('-');
The result is an array; the part before the first "-" is its first item.
Substring till the first "-"
string result4 = text.Substring(0, text.IndexOf("-"));
Get the substring from text from the start till the first occurrence of "-" (text.IndexOf("-"))
You get then all the results (all the same) with this
Console.WriteLine(result);
Console.WriteLine(result2);
Console.WriteLine(result21.Groups[1]);
Console.WriteLine(result3[0]);
Console.WriteLine(result4);
I would prefer the first method.
You need to think also about the behavior, when there is no dash in the string. The fourth method will throw an exception in that case, because text.IndexOf("-") will be -1. Method 1 and 2.1 will return nothing and method 2 and 3 will return the complete string.
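To make the no-dash case explicit, here is a hedged sketch of two helpers (names invented for illustration) that both return the complete string when there is no dash. That fallback is a design choice for this example, not something the question requires:

```csharp
using System;
using System.Text.RegularExpressions;

public static class BeforeDashDemo
{
    // Substring version with a guard, so IndexOf returning -1 does not throw.
    public static string WithSubstring(string text)
    {
        int i = text.IndexOf('-');
        return i < 0 ? text : text.Substring(0, i);
    }

    // Regex version: ^[^-]* always matches, possibly with an empty value.
    public static string WithRegex(string text) =>
        Regex.Match(text, @"^[^-]*").Value;
}
```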
Here is my suggestion; it's as simple as that:
[^-]*
This is something like the regular expression you need:
([^-]*)-
Quick tests in JavaScript:
/([^-]*)-/.exec('text-1')[1] // 'text'
/([^-]*)-/.exec('foo-bar-1')[1] // 'foo'
/([^-]*)-/.exec('-1')[1] // ''
/([^-]*)-/.exec('quux')[1] // throws a TypeError: exec returns null when nothing matches
I don't think you need regex to achieve this. I would look at the Substring method along with the IndexOf method. If you need more help, add a comment showing what you have attempted and I will offer more help.
You could just use another non-regex based method. Someone gave the suggestion of using Substring, but you could also use Split:
string testString = "my-string";
string[] splitString = testString.Split('-');
string resultingString = splitString[0]; //my
See http://msdn.microsoft.com/en-US/library/ms228388%28v=VS.80%29.aspx for another good example.
If you want to use Regex in .NET:
Regex rx = new Regex(@"^([\w]+)(\-)*");
var match = rx.Match("thisis-thefirst");
var text = match.Groups[1].Value;
Assert.AreEqual("thisis", text);
Find all word and space characters up to and including a -
^[\w ]+-
Say I have a string such as
abc123def456
What's the best way to split the string into an array such as
["abc", "123", "def", "456"]
string input = "abc123def456";
Regex re = new Regex(@"\D+|\d+");
string[] result = re.Matches(input).OfType<Match>()
.Select(m => m.Value).ToArray();
string[] result = Regex.Split("abc123def456", "([0-9]+)");
The above will use any sequence of numbers as the delimiter, though wrapping it in () says that we still would like to keep our delimiter in our returned array.
Note: In the example snippet we will get an empty element as the last entry of our array.
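One way to drop that empty entry, sketched with a hypothetical helper name, is to filter out zero-length pieces after the split:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SplitKeepDelimiters
{
    // Splits on runs of digits, keeps the digit runs (because of the
    // capturing group), and removes empty strings produced when the
    // input starts or ends with a digit run.
    public static string[] SplitDigits(string input) =>
        Regex.Split(input, "([0-9]+)")
             .Where(part => part.Length > 0)
             .ToArray();
}
```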
The boundary you look for can be described as "A position where a digit follows a non-digit, or where a non-digit follows a digit."
So:
string[] result = Regex.Split("abc123def456", @"(?<=\D)(?=\d)|(?<=\d)(?=\D)");
Use [0-9] and [^0-9], respectively, if \d and \D are not specific enough.
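Add spaces around the digit runs, then split on the space. Note that .NET replacement strings use $1 rather than \1, and that empty entries need to be removed:
Regex.Replace("abc123def456", @"(\d+)", " $1 ").Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
I hope it works.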
You could convert the string to a char array and then loop through the characters. As long as the characters are of the same type (letter or number) keep adding them to a string. When the next character no longer is of the same type (or you've reached the end of the string), add the temporary string to the array and reset the temporary string to null.
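The loop described above could look roughly like this. It is a sketch with made-up names that treats every character as either "digit" or "not digit" via char.IsDigit and collects the runs in a list:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class RunSplitter
{
    // Splits the input into consecutive runs of digits and non-digits.
    public static List<string> SplitRuns(string input)
    {
        var parts = new List<string>();
        var current = new StringBuilder();

        foreach (char c in input)
        {
            // When the character type changes, close the current run.
            if (current.Length > 0 &&
                char.IsDigit(c) != char.IsDigit(current[current.Length - 1]))
            {
                parts.Add(current.ToString());
                current.Clear();
            }
            current.Append(c);
        }

        // Add the final run, if any.
        if (current.Length > 0)
        {
            parts.Add(current.ToString());
        }
        return parts;
    }
}
```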