Remove single alphabets from a string - c#

I need help removing single letters, but not whole words, from an incoming data string, like the following:
String A = "1 2 3A 4 5C 6 ABCD EFGH 7 8D 9";
to
String A = "1 2 3 4 5 6 ABCD EFGH 7 8 9";

You need to match a letter and ensure that there is no letter before and after. So match
(?<!\p{L})\p{L}(?!\p{L})
and replace with an empty string.
Lookaround assertions on regular-expressions.info
Unicode properties on regular-expressions.info
In C#:
string s = "1 2 3A 4 5C 6 ABCD EFGH 7 8D 9";
string result = Regex.Replace(s, @"(?<!\p{L}) # Negative lookbehind assertion to ensure not a letter before
\p{L} # Unicode property, matches a letter in any language
(?!\p{L}) # Negative lookahead assertion to ensure not a letter following
", String.Empty, RegexOptions.IgnorePatternWhitespace);

The "obligatory" Linq approach:
string[] words = A.Split();
string result = string.Join(" ",
words.Select(w => w.Any(c => Char.IsDigit(c)) ?
new string(w.Where(c => Char.IsDigit(c)).ToArray()) : w));
This approach checks whether each word contains a digit. If it does, the non-digit characters are filtered out and a new string is created from the remaining digits. Otherwise the word is kept unchanged.
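The two approaches above agree on the sample input but are not equivalent in general. The following is a minimal sketch to compare them; the class name, method names, and the second input string are hypothetical and only serve as illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SingleLetterDemo
{
    // Regex approach: delete any letter that has no letter directly before or after it.
    public static string RemoveWithRegex(string s) =>
        Regex.Replace(s, @"(?<!\p{L})\p{L}(?!\p{L})", string.Empty);

    // LINQ approach: in every word that contains a digit, keep only the digits.
    public static string RemoveWithLinq(string s) =>
        string.Join(" ", s.Split().Select(w =>
            w.Any(c => char.IsDigit(c))
                ? new string(w.Where(c => char.IsDigit(c)).ToArray())
                : w));

    public static void Main()
    {
        const string sample = "1 2 3A 4 5C 6 ABCD EFGH 7 8D 9";
        Console.WriteLine(RemoveWithRegex(sample)); // 1 2 3 4 5 6 ABCD EFGH 7 8 9
        Console.WriteLine(RemoveWithLinq(sample));  // 1 2 3 4 5 6 ABCD EFGH 7 8 9

        // Hypothetical input where the two approaches differ.
        const string tricky = "X 3A AB1";
        Console.WriteLine("[" + RemoveWithRegex(tricky) + "]"); // [ 3 AB1]
        Console.WriteLine("[" + RemoveWithLinq(tricky) + "]");  // [X 3 1]
    }
}
```

The regex removes a lone letter wherever it stands (including the standalone "X") but leaves "AB1" alone, while the LINQ version keeps standalone letters and strips every letter from any word that contains a digit.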

And here comes the old school:
Dim A As String = "1 2 3A 4 5C 6 ABCD EFGH 7 8D 9"
Dim B As String = "1 2 3 4 5 6 ABCD EFGH 7 8 9"
Dim sb As New StringBuilder
Dim letterCount As Integer = 0
For i = 0 To A.Length - 1
Dim ch As Char = CStr(A(i)).ToLower
If ch >= "a" And ch <= "z" Then
letterCount += 1
Else
If letterCount > 1 Then sb.Append(A.Substring(i - letterCount, letterCount))
letterCount = 0
sb.Append(A(i))
End If
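Next
' Flush a trailing multi-letter word; without this, a string ending in a word would lose it
If letterCount > 1 Then sb.Append(A.Substring(A.Length - letterCount))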
Debug.WriteLine(B = sb.ToString) 'prints True

Related

Removing whitespace between consecutive numbers

I have a string, from which I want to remove the whitespaces between the numbers:
string test = "Some Words 1 2 3 4";
string result = Regex.Replace(test, @"(\d)\s(\d)", @"$1$2");
the expected/desired result would be:
"Some Words 1234"
but I retrieve the following:
"Some Words 12 34"
What am I doing wrong here?
Further examples:
Input: "Some Words That Should not be replaced 12 9 123 4 12"
Output: "Some Words That Should not be replaced 129123412"
Input: "test 9 8"
Output: "test 98"
Input: "t e s t 9 8"
Output: "t e s t 98"
Input: "Another 12 000"
Output: "Another 12000"
Regex.Replace continues to search after the previous match:
Some Words 1 2 3 4
           ^^^
         first match, replace by "12"
Some Words 12 3 4
             ^
             +-- continue searching here
Some Words 12 3 4
              ^^^
            next match, replace by "34"
You can use a zero-width positive lookahead assertion to avoid that:
string result = Regex.Replace(test, @"(\d)\s(?=\d)", @"$1");
Now the final digit is not part of the match:
Some Words 1 2 3 4
           ^^?
         first match, replace by "1"
Some Words 12 3 4
            ^
            +-- continue searching here
Some Words 12 3 4
            ^^?
            next match, replace by "2"
...
Your regex consumes the digit on the right. (\d)\s(\d) matches and captures 1 in Some Words 1 2 3 4 into Group 1, then matches 1 whitespace, and then matches and consumes (i.e. adds to the match value and advances the regex index) 2. Then, the regex engine tries to find another match from the current index, that is already after 1 2. So, the regex does not match 2 3, but finds 3 4.
Use lookarounds instead that are non-consuming:
(?<=\d)\s+(?=\d)
Details
(?<=\d) - a positive lookbehind that matches a location in string immediately preceded with a digit
\s+ - 1+ whitespaces
(?=\d) - a positive lookahead that matches a location in string immediately followed with a digit.
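To make the difference between the consuming and the non-consuming pattern visible, the position and text of each match can be listed. This is only an illustrative sketch; the class and method names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class MatchPositions
{
    // Returns "index:'value'" for every match of the pattern in the input.
    public static string[] Describe(string input, string pattern) =>
        Regex.Matches(input, pattern).Cast<Match>()
             .Select(m => m.Index + ":'" + m.Value + "'")
             .ToArray();

    public static void Main()
    {
        const string test = "Some Words 1 2 3 4";

        // The consuming pattern swallows the right-hand digit, so "2 3" is skipped.
        Console.WriteLine(string.Join(", ", Describe(test, @"(\d)\s(\d)")));
        // 11:'1 2', 15:'3 4'

        // The lookarounds consume only the whitespace, so every gap is matched.
        Console.WriteLine(string.Join(", ", Describe(test, @"(?<=\d)\s+(?=\d)")));
        // 12:' ', 14:' ', 16:' '
    }
}
```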
C# demo:
string test = "Some Words 1 2 3 4";
string result = Regex.Replace(test, @"(?<=\d)\s+(?=\d)", "");
See the online demo:
var strs = new List<string> {"Some Words 1 2 3 4", "Some Words That Should not be replaced 12 9 123 4 12", "test 9 8", "t e s t 9 8", "Another 12 000" };
foreach (var test in strs)
{
Console.WriteLine(Regex.Replace(test, @"(?<=\d)\s+(?=\d)", ""));
}
Output:
Some Words 1234
Some Words That Should not be replaced 129123412
test 98
t e s t 98
Another 12000

Regex replacement in strings

I have string pairs as :
s_1 : "He graduated in 1994 with 32 courses"
s_2 : "I graduated in 0000 with 00 courses"
What I want to do is modify s_2, such that 0000 gets changed to 1994 and 00 to 32.
modified_s_2 : "I graduated in 1994 with 32 courses"
Basically, a run of n zeros in s_2 is to be replaced by the corresponding n-digit number in s_1.
I can implement this by looping.
I am looking for an efficient implementation; I think a regex would make this easy.
Note: There can be any number of numbers in the strings, and each number can have any number of digits.
I think you mean this:
var s_1 = "He graduated in 1994 with 32 courses";
var s_2 = "I graduated in 0000 with 00 courses 0000";
//// Find the runs of '0's to be replaced
var regexes =
Regex.Matches(s_2, @"\b0+\b")
.OfType<Match>()
.Select(c => new { c.Value, Reg = new Regex(c.Value.Replace("0", @"\d")) })
.ToList();
//// Now replace each run of '0's with the first matching number
var curS1 = s_1;
foreach (var regex in regexes)
{
var s1Value = regex.Reg.Match(curS1).Value;
curS1 = regex.Reg.Replace(curS1, "", 1); //// Remove the first match from s_1 so it is not matched again
s_2 = new Regex(regex.Value).Replace(s_2, s1Value, 1);
}
A test case can be:
var s_1 = "He graduated in 1994 with 32 courses then 254 for 1998";
var s_2 = "I graduated in 0000 with 00 courses then 000 for 0000";
The result will be:
I graduated in 1994 with 32 courses then 254 for 1998
Hopefully this can get you started since you're looking for regex. You can modify it to be in a loop for whatever kind of "string pairs" you are using.
This is how the regex looks visually: Regex101 (this is why we do the .Trim() below). I changed it so it's less tied to that specific example and can work with a variety of numbers in different places
var s_1 = "He graduated number 1 in class in 1900 with 20 courses over the course of 12 weeks";
var s_2 = "I graduated number 0 in class in 0000 with 00 courses over the course of 00 weeks";
// Finds the matches in s_1 with the year and the number of courses
// The spaces are important in the regex so we match properly
var regex = new Regex("( \\d{1,} )");
var matches = regex.Matches(s_1);
var lastIndex = 0; // This is necessary so we aren't replacing previously replaced values
foreach(var match in matches.Cast<Match>())
{
// The matched value, removing extra spaces
var trimmedMatch = match.Value.Trim();
// The n-length 0 string in s_2
var zeroString = new String('0', trimmedMatch.Length);
// A simpler way to replace a string within a string
var sb = new StringBuilder(s_2);
var replaceIndex = s_2.IndexOf(zeroString, lastIndex);
sb.Remove(replaceIndex, zeroString.Length);
sb.Insert(replaceIndex, trimmedMatch);
s_2 = sb.ToString();
// This is necessary otherwise we could end up overwriting previously done work
lastIndex = replaceIndex + zeroString.Length;
}
Disclaimer: I leave it to you to handle the error case where the pattern string "00" is not found in the string.
I don't have information about the actual performance issue you encountered in your implementation, but you can count the digits of each number in your input and in the templated output to check that they match.
string input = "He graduated in 1994 with 32 coursesHe graduated in 1994 with 32 coursesHe graduated in 1994 with 32 courses ";
string ouput = "He 0000 with 00 He in 0000 with 00 He in 0000 with 00";
string regex = @"(\d+)";
var matches = Regex.Matches(input, regex).Cast<Match>();
var tempSB = new StringBuilder(ouput);
foreach(var i in matches)
{
var strI = i.Value;
var strILength = strI.Length;
var template = new string('0', strILength );
var index = ouput.IndexOf(template); // if (index ==-1) exception;
tempSB.Remove(index, strILength);
tempSB.Insert(index, strI);
ouput = tempSB.ToString();
}
For a 50 MB input it takes about 10 seconds, which sounds reasonable.
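For comparison, here is a single-pass sketch that is not taken from any of the answers above. It uses a MatchEvaluator to replace each run of zeros with the next number from the source string. The class and method names are hypothetical, and it assumes the numbers in both strings appear in the same order; a placeholder whose length does not match the next number is left unchanged.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PlaceholderFiller
{
    public static string Fill(string source, string template)
    {
        // Collect the numbers of the source string in order of appearance.
        var numbers = new Queue<string>(
            Regex.Matches(source, @"\d+").Cast<Match>().Select(m => m.Value));

        // Replace each standalone run of zeros with the next number,
        // but only if that number has the same count of digits.
        return Regex.Replace(template, @"\b0+\b", m =>
            numbers.Count > 0 && numbers.Peek().Length == m.Length
                ? numbers.Dequeue()
                : m.Value);
    }

    public static void Main()
    {
        var s_1 = "He graduated in 1994 with 32 courses then 254 for 1998";
        var s_2 = "I graduated in 0000 with 00 courses then 000 for 0000";
        Console.WriteLine(Fill(s_1, s_2));
        // I graduated in 1994 with 32 courses then 254 for 1998
    }
}
```

Because the template is scanned only once, a value that was already inserted (for example 1900) can never be mistaken for a placeholder later on.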

How to filter hidden characters in a String using C#

I am new to C# and trying to learn how to filter data that I read from a file. I have a file that contains data similar to the following:
3 286 858 95.333 0.406 0.427 87.00 348 366 4 b
9 23 207 2.556 0.300 1.00 1.51 62 207 41 a
9 37 333 4.111 0.390 0.811 2.03 130 270 64 a
10 21 210 2.100 0.348 0.757 3.17 73 159 23 a
9 79 711 8.778 0.343 0.899 2.20 244 639 111 a
10 66 660 6.600 0.324 0.780 2.25 214 515 95 a
When I read this data, some of the values have carriage return or line feed characters hidden in them. Can you please tell me if there is a way to remove them? For example, one of my variables may hold the following value because of a newline character in it:
mystringval = "9
"
I want this mystringval variable to be converted back to
mystringval = "9"
If you want to get rid of all special characters, you can learn regular expressions and use Regex.Replace.
var value = "&*^)#abcd.";
var filtered = System.Text.RegularExpressions.Regex.Replace(value, @"[^\w]", "");
REGEXPLANATION
the @ before the string means that you're using a verbatim string literal, in which C# escape sequences are not processed, leaving only the regex escape sequences
[^abc] matches any character that is not a, b, or c (so that it can be replaced with an empty string)
\w is a regex shorthand that matches a letter, digit, or underscore
you can also use @"[^A-Za-z0-9\.]", which removes everything except letters, digits and the decimal point. See http://rubular.com/ for more details.
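The difference between the two character classes matters for the decimal values in the question. The sketch below compares them on a hypothetical field value, alongside a plain Trim() call, which is enough when the unwanted characters only sit at the ends of the value. The class and method names are assumptions made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FieldCleaner
{
    // Removes everything that is not a letter, digit, or underscore (also drops the '.').
    public static string StripNonWord(string s) => Regex.Replace(s, @"[^\w]", "");

    // Removes everything except ASCII letters, digits, and the decimal point.
    public static string KeepAlnumAndDot(string s) => Regex.Replace(s, @"[^A-Za-z0-9\.]", "");

    public static void Main()
    {
        string field = "95.333\r\n"; // hypothetical value with a trailing CR/LF

        Console.WriteLine(StripNonWord(field));    // 95333
        Console.WriteLine(KeepAlnumAndDot(field)); // 95.333
        Console.WriteLine(field.Trim());           // 95.333
    }
}
```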
As well as using RegEx, you can use LINQ to do something like
var goodCharacters = input
.Replace("\r", " ")
.Replace("\n", " ")
.Where(c => char.IsLetterOrDigit(c) || c == ' ' || c == '.')
.ToArray();
var result = new string(goodCharacters).Trim();
The first two Replace calls guard against a number at the end of one line running into a number at the start of the next, e.g. "123\r\n987" would otherwise become "123987", whereas I assume you want the two numbers kept apart. Note that a "\r\n" pair becomes two spaces, so that example yields "123  987".
Try my sample here on ideone.com.

regular expression to match numbers of length 2 from given string

How can I create a regular expression that will match numbers of length 2 from a given string.
Example input:
givenpercentage#60or•70and 8090
Desired output:
60 70 80 90
Try this:
string x = "givenpercentage#60or•70and 8090";
Regex r = new Regex(@"\d{2}");
foreach(Match m in r.Matches(x))
{
string temp = m.Value;
//Do something
}
\d -> matches a single digit
{2} -> exactly two of them in a row
Output will be:
60 70 80 90
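A self-contained sketch of the same idea that collects the matches and joins them into the desired output string. The class and method names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TwoDigitNumbers
{
    // Returns every non-overlapping pair of consecutive digits, left to right.
    public static string[] Extract(string input) =>
        Regex.Matches(input, @"\d{2}").Cast<Match>()
             .Select(m => m.Value)
             .ToArray();

    public static void Main()
    {
        string x = "givenpercentage#60or•70and 8090";
        Console.WriteLine(string.Join(" ", Extract(x))); // 60 70 80 90
    }
}
```

Note that the pattern pairs digits greedily from the left, so a run with an odd number of digits such as "123" yields only "12".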

How to replace a numeric character with empty character in C#?

I have string like
1 69 / EMP1094467 EMP1094467 : 2 69 / ScreenLysP
Here the standalone numeric characters should be replaced with empty characters, like:
/ EMP1094467
I tried like this
var output = Regex.Replace(input, @"[\d-]", string.Empty);
which produced the following result:
/ EMP
Please suggest a better solution.
You can try using word boundaries:
var input = "1 69 / EMP1094467 EMP1094467 : 2 69 / ScreenLysP ";
var output = Regex.Replace(input, @"\b[\d]+\b", string.Empty);
string.Substring seems fitting here:
var str = "1 69 / EMP1094467";
var result = str.Substring(str.IndexOf("/")); // "/ EMP1094467"
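The word-boundary approach removes the standalone numbers but leaves the surrounding spaces behind. The following sketch shows this and adds an optional whitespace clean-up step; the clean-up and all names are assumptions for illustration, not part of the answers above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StandaloneNumbers
{
    // Removes digit runs that form a whole word; digits inside "EMP1094467" stay.
    public static string Remove(string input) =>
        Regex.Replace(input, @"\b\d+\b", string.Empty);

    // Optional clean-up: collapse whitespace runs to one space and trim the ends.
    public static string Tidy(string input) =>
        Regex.Replace(input, @"\s+", " ").Trim();

    public static void Main()
    {
        var input = "1 69 / EMP1094467 EMP1094467 : 2 69 / ScreenLysP ";
        Console.WriteLine(Tidy(Remove(input)));
        // / EMP1094467 EMP1094467 : / ScreenLysP
    }
}
```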
