Regex replacement in strings - c#

I have string pairs such as:
s_1 : "He graduated in 1994 with 32 courses"
s_2 : "I graduated in 0000 with 00 courses"
What I want to do is modify s_2, such that 0000 gets changed to 1994 and 00 to 32.
modified_s_2 : "I graduated in 1994 with 32 courses"
Basically, a run of n zeros (0000...0, n times) in s_2 means it should be replaced by the number with n digits in s_1.
I can implement this with a loop, but I am looking for an efficient implementation. I think a regex implementation would make this easy.
Note: there can be any number of numbers in the strings, and each number can have any number of digits.

I think you mean this:
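using System.Linq;
using System.Text.RegularExpressions;

var s_1 = "He graduated in 1994 with 32 courses";
var s_2 = "I graduated in 0000 with 00 courses 0000";
// Find each whole run of '0's in s_2 and build a regex that matches a whole number with that many digits
var regexes =
    Regex.Matches(s_2, @"\b0+\b")
    .OfType<Match>()
    .Select(c => new { c.Value, Reg = new Regex(@"\b" + c.Value.Replace("0", @"\d") + @"\b") })
    .ToList();
// Now replace each run of '0's with the first unused number of the same length in s_1
var curS1 = s_1;
foreach (var regex in regexes)
{
    var s1Match = regex.Reg.Match(curS1);
    if (!s1Match.Success) continue; // No number of this length left in s_1: leave the zeros as they are
    curS1 = regex.Reg.Replace(curS1, "", 1); // Remove the first match from s_1 so it is not matched again
    // \b keeps "00" from matching inside an already inserted number such as 1900
    s_2 = new Regex(@"\b" + regex.Value + @"\b").Replace(s_2, s1Match.Value, 1);
}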
A test case could be:
var s_1 = "He graduated in 1994 with 32 courses then 254 for 1998";
var s_2 = "I graduated in 0000 with 00 courses then 000 for 0000";
The result will be:
I graduated in 1994 with 32 courses then 254 for 1998
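Since the question asks for an efficient implementation, here is an alternative sketch (not the approach above) that scans each string once: it queues the whole numbers of s_1 by digit count, then uses Regex.Replace with a MatchEvaluator to fill each run of zeros in s_2. Like the answer above, it assumes numbers of the same length appear in the same order in both strings; the helper name FillZeroRuns is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: fills each run of n zeros in `template` with the next
// unused n-digit number from `source`.
static string FillZeroRuns(string source, string template)
{
    // Queue the whole numbers of `source` by digit count, keeping their order.
    var byLength = new Dictionary<int, Queue<string>>();
    foreach (Match number in Regex.Matches(source, @"\b\d+\b"))
    {
        if (!byLength.TryGetValue(number.Length, out var queue))
        {
            queue = new Queue<string>();
            byLength[number.Length] = queue;
        }
        queue.Enqueue(number.Value);
    }

    // Replace every whole run of zeros in one pass; runs with no number of
    // the same length left are kept unchanged.
    return Regex.Replace(template, @"\b0+\b", run =>
        byLength.TryGetValue(run.Length, out var q) && q.Count > 0
            ? q.Dequeue()
            : run.Value);
}

var s_1 = "He graduated in 1994 with 32 courses then 254 for 1998";
var s_2 = "I graduated in 0000 with 00 courses then 000 for 0000";
Console.WriteLine(FillZeroRuns(s_1, s_2)); // I graduated in 1994 with 32 courses then 254 for 1998
```

Each string is scanned once and no intermediate strings are rebuilt per number, which is the main cost in loop-and-replace versions.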

Hopefully this can get you started since you're looking for regex. You can modify it to be in a loop for whatever kind of "string pairs" you are using.
You can see how the regex matches on Regex101; it captures the spaces around each number, which is why we call .Trim() below. I changed the approach so it's less tied to that specific example and works with a variety of numbers in different places.
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

var s_1 = "He graduated number 1 in class in 1900 with 20 courses over the course of 12 weeks";
var s_2 = "I graduated number 0 in class in 0000 with 00 courses over the course of 00 weeks";
// Finds the numbers in s_1 (the rank, the year, the number of courses and the number of weeks)
// The spaces are important in the regex so we match whole numbers only
var regex = new Regex("( \\d{1,} )");
var matches = regex.Matches(s_1);
var lastIndex = 0; // This is necessary so we aren't replacing previously replaced values
foreach (var match in matches.Cast<Match>())
{
    // The matched value, with the surrounding spaces removed
    var trimmedMatch = match.Value.Trim();
    // The n-length string of zeros to look for in s_2
    var zeroString = new string('0', trimmedMatch.Length);
    var replaceIndex = s_2.IndexOf(zeroString, lastIndex);
    if (replaceIndex == -1)
        break; // No matching run of zeros left in s_2
    // A simpler way to replace a string within a string
    var sb = new StringBuilder(s_2);
    sb.Remove(replaceIndex, zeroString.Length);
    sb.Insert(replaceIndex, trimmedMatch);
    s_2 = sb.ToString();
    // This is necessary, otherwise we could end up overwriting previously done work
    lastIndex = replaceIndex + zeroString.Length;
}

Disclaimer: I leave it to you to handle the error when the pattern string "00" is not in the string.
I don't know the details of the actual performance issue you ran into, but you can count the number of digits in each number of your input and look for the template of zeros with the same length in the output, so you know which ones match.
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

string input = "He graduated in 1994 with 32 coursesHe graduated in 1994 with 32 coursesHe graduated in 1994 with 32 courses ";
string output = "He 0000 with 00 He in 0000 with 00 He in 0000 with 00";
string regex = @"(\d+)";
var matches = Regex.Matches(input, regex).Cast<Match>();
var tempSB = new StringBuilder(output);
var startIndex = 0; // Search after the previous replacement so an inserted number such as 1900 is not matched again
foreach (var i in matches)
{
    var strI = i.Value;
    var strILength = strI.Length;
    var template = new string('0', strILength);
    var index = output.IndexOf(template, startIndex); // if (index == -1): the template is missing, handle the error here
    tempSB.Remove(index, strILength);
    tempSB.Insert(index, strI);
    output = tempSB.ToString();
    startIndex = index + strILength;
}
For a 50 MB input it takes about 10 seconds, which sounds reasonable.
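The digit-count check mentioned above could look roughly like this minimal sketch. DigitCountsMatch is a hypothetical helper name, and it assumes the template contains no digits other than the zero placeholders.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: true when the digit counts of the numbers in `input` line up,
// in order, with the lengths of the zero runs in `template`.
static bool DigitCountsMatch(string input, string template)
{
    var numberLengths = Regex.Matches(input, @"\d+").Cast<Match>().Select(m => m.Length);
    var zeroRunLengths = Regex.Matches(template, @"0+").Cast<Match>().Select(m => m.Length);
    return numberLengths.SequenceEqual(zeroRunLengths);
}

Console.WriteLine(DigitCountsMatch("He graduated in 1994 with 32 courses", "He 0000 with 00")); // True
```

Running this once before the replacement loop lets you reject a mismatched pair up front instead of hitting an index of -1 halfway through.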

Related

How to filter hidden characters in a String using C#

I am new to C# and trying to learn how to filter data that I read from a file. The file has data similar to the following:
3 286 858 95.333 0.406 0.427 87.00 348 366 4 b
9 23 207 2.556 0.300 1.00 1.51 62 207 41 a
9 37 333 4.111 0.390 0.811 2.03 130 270 64 a
10 21 210 2.100 0.348 0.757 3.17 73 159 23 a
9 79 711 8.778 0.343 0.899 2.20 244 639 111 a
10 66 660 6.600 0.324 0.780 2.25 214 515 95 a
When I read this data, some values have hidden carriage return or line feed characters in them. Is there a way to remove them? For example, one of my variables may hold the following value because of a newline character:
mystringval = "9
"
I want this mystringval variable to be converted back to
mystringval = "9"
If you want to get rid of all special characters, you can learn regular expressions and use Regex.Replace.
var value = "&*^)#abcd.";
var filtered = System.Text.RegularExpressions.Regex.Replace(value, @"[^\w]", "");
REGEXPLANATION
The @ before the string makes it a verbatim string literal, so C# escape sequences are not processed, leaving only the regex escape sequences.
[^abc] matches any character that is not a, b, or c (here, so it can be replaced with an empty string).
\w is a special regex code that means a letter, digit, or underscore.
You can also use @"[^A-Za-z0-9\.]", which keeps only letters, digits and the decimal point. See http://rubular.com/ for more details.
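If you only want to get rid of the hidden carriage returns and line feeds, a narrower approach is enough; note that [^\w] also removes spaces and the decimal point, so a value like "95.333" would become "95333". A minimal sketch using the sample value from the question:

```csharp
using System;
using System.Text.RegularExpressions;

var mystringval = "9\r\n";

// Option 1: strip only carriage returns and line feeds, wherever they occur.
var noNewlines = Regex.Replace(mystringval, @"[\r\n]", "");

// Option 2: string.Trim() removes leading and trailing whitespace, which includes \r and \n.
var trimmed = mystringval.Trim();

Console.WriteLine($"[{noNewlines}] [{trimmed}]"); // [9] [9]
```

Option 2 only touches the ends of the value, so it is the safer choice when line breaks inside a value should be kept.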
As well as using Regex, you can use LINQ to do something like:
using System.Linq;

var input = "9\r\n"; // Sample value from the question
var goodCharacters = input
    .Replace("\r", " ")
    .Replace("\n", " ")
    .Where(c => char.IsLetterOrDigit(c) || c == ' ' || c == '.')
    .ToArray();
var result = new string(goodCharacters).Trim();
The first two Replace calls will guard against having a number at the end of one line and a number at the start of the next, e.g. "123\r\n987" would otherwise be "123987", whereas I assume you want "123 987".

regular expression to match numbers of length 2 from given string

How can I create a regular expression that will match numbers of length 2 in a given string?
Example input:
givenpercentage#60or•70and 8090
Desired output:
60 70 80 90
Try this:
string x = "givenpercentage#60or•70and 8090";
Regex r = new Regex(@"\d{2}");
foreach(Match m in r.Matches(x))
{
string temp = m.Value;
//Do something
}
\d -> a digit
{2} -> exactly two of the preceding token, i.e. two digits
Output will be:
60 70 80 90
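To produce the desired output string directly instead of handling each match in a loop, the matches can be joined. A short sketch, assuming a space-separated result is what is wanted:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string x = "givenpercentage#60or•70and 8090";
// Each match is a pair of consecutive digits; "8090" yields "80" and then "90".
var pairs = Regex.Matches(x, @"\d{2}").Cast<Match>().Select(m => m.Value);
string result = string.Join(" ", pairs);
Console.WriteLine(result); // 60 70 80 90
```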

How to replace a numeric character with empty character in C#?

I have a string like
1 69 / EMP1094467 EMP1094467 : 2 69 / ScreenLysP
Here the numeric characters should be replaced with empty strings, like:
/ EMP1094467
I tried like this
var output = Regex.Replace(input, @"[\d-]", string.Empty);
which produced the following result:
/ EMP
Please suggest a better solution.
You can try using word boundaries:
var input = "1 69 / EMP1094467 EMP1094467 : 2 69 / ScreenLysP ";
var output = Regex.Replace(input, @"\b[\d]+\b", string.Empty);
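With the word-boundary pattern, the standalone numbers are removed but the spaces around them remain. A sketch that also collapses the leftover whitespace (the second Replace and the Trim are an addition, not part of the answer above):

```csharp
using System;
using System.Text.RegularExpressions;

var input = "1 69 / EMP1094467 EMP1094467 : 2 69 / ScreenLysP ";
// \b\d+\b removes only standalone numbers; the digits in "EMP1094467" follow a letter,
// so there is no word boundary before them and they are kept.
var noNumbers = Regex.Replace(input, @"\b\d+\b", string.Empty);
// Collapse the leftover runs of whitespace and trim the ends.
var tidy = Regex.Replace(noNumbers, @"\s+", " ").Trim();
Console.WriteLine(tidy); // / EMP1094467 EMP1094467 : / ScreenLysP
```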
string.Substring seems fitting here:
var str = "1 69 / EMP1094467";
var result = str.Substring(str.IndexOf("/")); // "/ EMP1094467"

How to format a string to include blank spaces in C#

My string is "2345000012999922"
I want to convert it to: 2345 0000 12 9999 22. The pattern is always the same, AAAA BBBB CC DDDD EE, but EE is optional and may not be filled.
I tried:
string.Format("{0:#### #### ## #### ##}");
with no success. I used a long variable instead of a string, but still had no success.
Try this:
void Main()
{
var d = decimal.Parse("2345000012999922");
Console.Out.WriteLine("{0:#### #### ## #### ##}", d);
}
First convert to decimal, then use your own strategy.
Formatting of numbers works right-to-left, meaning if you had 2 numbers as follows:
23450000129999
234500001299
And we did something like:
void Main()
{
var d1 = decimal.Parse("23450000129999");
var d2 = decimal.Parse("234500001299");
Console.Out.WriteLine("{0:#### #### ## #### ##}", d1);
Console.Out.WriteLine("{0:#### #### ## #### ##}", d2);
Console.Out.WriteLine("{0:0000 0000 00 0000 00}", d1);
Console.Out.WriteLine("{0:0000 0000 00 0000 00}", d2);
}
We'd get:
23 4500 00 1299 99
2345 00 0012 99
0023 4500 00 1299 99
0000 2345 00 0012 99
(Notice the 0-padding).
In a format string, "0" means put the corresponding digit here, if present, otherwise pad with a 0. A "#" means, put the corresponding digit here, if present, otherwise ignore it.
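A quick way to see the difference between the two placeholders, shown on a plain integer rather than the 16-digit value:

```csharp
using System;
using System.Globalization;

// "0" pads a missing digit with a zero; "#" simply drops it.
Console.WriteLine(42.ToString("0000", CultureInfo.InvariantCulture)); // 0042
Console.WriteLine(42.ToString("####", CultureInfo.InvariantCulture)); // 42
```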
With this in mind, I think your best strategy would be something like:
void Main()
{
var s1 = "23450000129999";
var s2 = "234500001299";
var n1 = s1.Length;
var n2 = s2.Length;
var c = 12;
var f1 = "{0:#### #### ## #### ##}";
var f2 = "{0:#### #### ## ####}";
var d1 = decimal.Parse(s1);
var d2 = decimal.Parse(s2);
Console.Out.WriteLine(n1 > c ? f1 : f2, d1);
Console.Out.WriteLine(n2 > c ? f1 : f2, d2);
}
This will give:
23 4500 00 1299 99
23 4500 00 1299
The idea is that you check the string-length of the input string first. If it is 12, then you have the last optional bit absent, so you use the truncated format-string. If it is more than 12 (or equal to 14) then use the full format-string.
The other approaches such as regex and string manipulation are good approaches too, though I would suspect that they are less performant. You should test all approaches though, especially if this piece of code will run many, many times (e.g., if you are showing data in a table).
You can improve the readability of the code further using extension methods by defining something like
public static class FormattingHelper
{
public static string GetFormatString(this string s)
{
if (s.Length == 12)
return "{0:#### #### ## ####}";
else
return "{0:#### #### ## #### ##}";
}
}
void Main()
{
var s1 = "23450000129999";
var s2 = "234500001299";
var d1 = decimal.Parse(s1);
var d2 = decimal.Parse(s2);
Console.Out.WriteLine(s1.GetFormatString(), d1);
Console.Out.WriteLine(s2.GetFormatString(), d2);
}
string s = "2345000012999922";
s = s.Insert(14, " ").Insert(10, " ").Insert(8, " ").Insert(4, " ");
Console.WriteLine(s);
Note: the spaces are inserted from the end (i.e., the indices decrease) so that you can use the indices from the original string. If you inserted them in the other order, you would have to add 1 to each subsequent index to account for every space already added before it. Not critical, but I think it's easier to understand when the indices match the positions in the original string.
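The question says the final EE group is optional. Assuming the input is always either 14 or 16 digits, the same right-to-left Insert idea can skip the last space when EE is missing (GroupDigits is a hypothetical name):

```csharp
using System;

// Hypothetical helper: groups a 14- or 16-digit string as AAAA BBBB CC DDDD [EE].
// Spaces are inserted from the right so the original indices stay valid.
static string GroupDigits(string s)
{
    if (s.Length > 14)
        s = s.Insert(14, " "); // Only when the optional EE part is present
    return s.Insert(10, " ").Insert(8, " ").Insert(4, " ");
}

Console.WriteLine(GroupDigits("2345000012999922")); // 2345 0000 12 9999 22
Console.WriteLine(GroupDigits("23450000129999"));   // 2345 0000 12 9999
```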
Just as another way of doing it (albeit a slightly daft one):
string input = "2345000012999922";
string Formatted = new Regex(@"(\d{4})(\d{4})(\d{2})(\d{4})(\d{2})")
    .Replace(input, "$1 $2 $3 $4 $5");
//Formatted = 2345 0000 12 9999 22
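The regex above only matches when all 16 digits are present. A sketch that makes the last group optional and uses a MatchEvaluator so no trailing space is added when EE is absent (FormatGroups is a hypothetical name; the ^...$ anchors assume the string contains only the digits):

```csharp
using System;
using System.Text.RegularExpressions;

// The final (\d{2})? group is optional, so 14-digit input also matches.
var pattern = new Regex(@"^(\d{4})(\d{4})(\d{2})(\d{4})(\d{2})?$");

// Hypothetical helper: adds the last space only when group 5 actually captured something.
string FormatGroups(string input) => pattern.Replace(input, m =>
    $"{m.Groups[1].Value} {m.Groups[2].Value} {m.Groups[3].Value} {m.Groups[4].Value}"
    + (m.Groups[5].Success ? " " + m.Groups[5].Value : ""));

Console.WriteLine(FormatGroups("2345000012999922")); // 2345 0000 12 9999 22
Console.WriteLine(FormatGroups("23450000129999"));   // 2345 0000 12 9999
```

Input that does not match the pattern is returned unchanged, which makes bad data easy to spot.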
Works for me when using long (PowerShell test, should be the same for C#):
PS> '{0:#### #### ## #### ##}' -f 2345000012999922
2345 0000 12 9999 22
string.Format("{0:#### #### ## #### ##}", 2345000012999922)
output
2345 0000 12 9999 22
This would also work for you
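string str = "2345000012999922";
// Parse as long rather than double: on some runtimes a double is formatted with only
// 15 significant digits, which could change the last digit of a 16-digit value
string str2 = string.Format("{0:#### #### ## #### ##}", long.Parse(str));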

Remove single alphabets from a string

I need help removing single letters, but not words, from an incoming data string, like the following:
String A = "1 2 3A 4 5C 6 ABCD EFGH 7 8D 9";
to
String A = "1 2 3 4 5 6 ABCD EFGH 7 8 9";
You need to match a letter and ensure that there is no letter before and after. So match
(?<!\p{L})\p{L}(?!\p{L})
and replace with an empty string.
Lookaround assertions on regular-expressions.info
Unicode properties on regular-expressions.info
In C#:
string s = "1 2 3A 4 5C 6 ABCD EFGH 7 8D 9";
string result = Regex.Replace(s, @"(?<!\p{L}) # Negative lookbehind assertion to ensure not a letter before
\p{L} # Unicode property, matches a letter in any language
(?!\p{L}) # Negative lookahead assertion to ensure not a letter following
", String.Empty, RegexOptions.IgnorePatternWhitespace);
The "obligatory" Linq approach:
string[] words = A.Split();
string result = string.Join(" ",
words.Select(w => w.Any(c => Char.IsDigit(c)) ?
new string(w.Where(c => Char.IsDigit(c)).ToArray()) : w));
This approach looks if each word contains a digit. Then it filters out the non-digit chars and creates a new string from the result. Otherwise it just takes the word.
And here comes the old school:
Dim A As String = "1 2 3A 4 5C 6 ABCD EFGH 7 8D 9"
Dim B As String = "1 2 3 4 5 6 ABCD EFGH 7 8 9"
Dim sb As New StringBuilder
Dim letterCount As Integer = 0
For i = 0 To A.Length - 1
Dim ch As Char = CStr(A(i)).ToLower
If ch >= "a" And ch <= "z" Then
letterCount += 1
Else
If letterCount > 1 Then sb.Append(A.Substring(i - letterCount, letterCount))
letterCount = 0
sb.Append(A(i))
End If
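Next
' Flush a word that ends the string (the loop only flushes on a non-letter)
If letterCount > 1 Then sb.Append(A.Substring(A.Length - letterCount, letterCount))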
Debug.WriteLine(B = sb.ToString) 'prints True
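For readers who want the same letter-counting idea in C#, here is a sketch of a port (not the answerer's code). It uses char.IsLetter, which accepts any Unicode letter rather than only a to z, and it flushes a word that ends the string; RemoveSingleLetters is a hypothetical name.

```csharp
using System;
using System.Text;

// Hypothetical helper: letters are counted and only written out
// when the run is longer than one letter.
static string RemoveSingleLetters(string a)
{
    var sb = new StringBuilder();
    var letterCount = 0;
    for (var i = 0; i < a.Length; i++)
    {
        if (char.IsLetter(a[i]))
        {
            letterCount++;
            continue;
        }
        if (letterCount > 1) sb.Append(a, i - letterCount, letterCount);
        letterCount = 0;
        sb.Append(a[i]);
    }
    // Flush a word that ends the string (the loop above only flushes on a non-letter).
    if (letterCount > 1) sb.Append(a, a.Length - letterCount, letterCount);
    return sb.ToString();
}

Console.WriteLine(RemoveSingleLetters("1 2 3A 4 5C 6 ABCD EFGH 7 8D 9")); // 1 2 3 4 5 6 ABCD EFGH 7 8 9
```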
