I have a string like:
1 69 / EMP1094467 EMP1094467 : 2 69 / ScreenLysP
Here the standalone numbers (but not the digits inside EMP1094467) should be replaced with empty strings, like:
/ EMP1094467
I tried like this
var output = Regex.Replace(input, @"[\d-]", string.Empty);
which produced the following result:
/ EMP
Please suggest a better solution.
You can try using word boundaries:
var input = "1 69 / EMP1094467 EMP1094467 : 2 69 / ScreenLysP ";
var output = Regex.Replace(input, @"\b\d+\b", string.Empty);
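Removing the numbers leaves runs of spaces behind ("  / EMP1094467 ..."). A minimal sketch (a C# 9+ top-level program; the helper name RemoveStandaloneNumbers is hypothetical) that applies the word-boundary pattern and then collapses the leftover whitespace:

```csharp
using System;
using System.Text.RegularExpressions;

// Remove whole numbers that stand alone (e.g. "69") but leave digits that are part of
// a word (e.g. "EMP1094467"), then collapse the leftover runs of whitespace.
string RemoveStandaloneNumbers(string input)
{
    var withoutNumbers = Regex.Replace(input, @"\b\d+\b", string.Empty);
    return Regex.Replace(withoutNumbers, @"\s+", " ").Trim();
}

var input = "1 69 / EMP1094467 EMP1094467 : 2 69 / ScreenLysP ";
Console.WriteLine(RemoveStandaloneNumbers(input));
// prints / EMP1094467 EMP1094467 : / ScreenLysP
```

The \b anchors only match between a word character and a non-word character, so the digits in EMP1094467 (preceded by the letter P) are never matched.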
string.Substring seems fitting here:
var str = "1 69 / EMP1094467";
var result = str.Substring(str.IndexOf("/")); // "/ EMP1094467"
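Substring throws if IndexOf returns -1, i.e. when the string contains no "/". A small guarded sketch (C# 9+ top-level program; FromFirstSlash is a hypothetical name, and returning the whole string when there is no "/" is an assumption):

```csharp
using System;

// Return the text from the first '/' onward. Falling back to the whole string when there
// is no '/' is an assumption; choose whatever suits your data.
string FromFirstSlash(string s)
{
    int i = s.IndexOf('/');
    return i >= 0 ? s.Substring(i) : s;
}

Console.WriteLine(FromFirstSlash("1 69 / EMP1094467")); // prints / EMP1094467
```

Note that on the question's full input this keeps everything after the first "/", including the later "2 69".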
I have string pairs such as:
s_1 : "He graduated in 1994 with 32 courses"
s_2 : "I graduated in 0000 with 00 courses"
What I want to do is modify s_2, such that 0000 gets changed to 1994 and 00 to 32.
modified_s_2 : "I graduated in 1994 with 32 courses"
Basically, a run of n zeros (0000...0) means it should be matched with a number of n digits in s_1.
I can implement this by looping.
I am looking for an efficient implementation. I think a regex implementation would be easy for this.
Note: There can be any number of numbers in the strings, and each number can have any number of digits.
I think you mean this:
var s_1 = "He graduated in 1994 with 32 courses";
var s_2 = "I graduated in 0000 with 00 courses 0000";
// Find each run of '0's to be replaced, and build a regex of the same number of \d's
var regexes =
Regex.Matches(s_2, @"\b0+\b")
.OfType<Match>()
.Select(c => new { c.Value, Reg = new Regex(c.Value.Replace("0", @"\d")) })
.ToList();
// Now replace each run of '0's with the first matching number left in s_1
var curS1 = s_1;
foreach (var regex in regexes)
{
var s1Value = regex.Reg.Match(curS1).Value;
curS1 = regex.Reg.Replace(curS1, "", 1); // remove the first match from s_1 so it isn't matched again
s_2 = new Regex(regex.Value).Replace(s_2, s1Value, 1);
}
A test case could be:
var s_1 = "He graduated in 1994 with 32 courses then 254 for 1998";
var s_2 = "I graduated in 0000 with 00 courses then 000 for 0000";
The result will be:
I graduated in 1994 with 32 courses then 254 for 1998
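Since the question asks for something efficient, here is an alternative sketch (a swapped-in technique, not the approach above) that avoids building one Regex per zero run: it queues the whole numbers of s_1 by digit count, then fills all zero runs of s_2 in a single Regex.Replace pass with a MatchEvaluator. It assumes each zero run should take the next unused whole number with the same number of digits, in order of appearance; FillZeroRuns is a hypothetical name (C# 9+ top-level program).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Fill each run of zeros in 'template' with the next unused number from 'source'
// that has the same number of digits. Runs with no matching number are left as-is.
string FillZeroRuns(string source, string template)
{
    // Queue the whole numbers from source by digit count, in order of appearance.
    var byLength = new Dictionary<int, Queue<string>>();
    foreach (Match m in Regex.Matches(source, @"\b\d+\b"))
    {
        if (!byLength.TryGetValue(m.Length, out var queue))
            byLength[m.Length] = queue = new Queue<string>();
        queue.Enqueue(m.Value);
    }

    // One pass over the template: the evaluator supplies the text for each zero run.
    return Regex.Replace(template, @"\b0+\b", z =>
        byLength.TryGetValue(z.Length, out var q) && q.Count > 0 ? q.Dequeue() : z.Value);
}

var s_1 = "He graduated in 1994 with 32 courses then 254 for 1998";
var s_2 = "I graduated in 0000 with 00 courses then 000 for 0000";
Console.WriteLine(FillZeroRuns(s_1, s_2));
// prints I graduated in 1994 with 32 courses then 254 for 1998
```

Unlike the \d\d\d\d regexes above, \b\d+\b only picks up whole numbers, so a four-zero run will not be filled with the first four digits of a longer number.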
Hopefully this can get you started since you're looking for regex. You can modify it to be in a loop for whatever kind of "string pairs" you are using.
The regex can be visualized on Regex101; the pattern includes the surrounding spaces, which is why we call .Trim() below. I changed it so it's less tied to that specific example and can work with a variety of numbers in different places:
var s_1 = "He graduated number 1 in class in 1900 with 20 courses over the course of 12 weeks";
var s_2 = "I graduated number 0 in class in 0000 with 00 courses over the course of 00 weeks";
// Finds the matches in s_1 with the year and the number of courses
// The spaces are important in the regex so we match properly
var regex = new Regex("( \\d{1,} )");
var matches = regex.Matches(s_1);
var lastIndex = 0; // This is necessary so we aren't replacing previously replaced values
foreach(var match in matches.Cast<Match>())
{
// The matched value, removing extra spaces
var trimmedMatch = match.Value.Trim();
// The n-length 0 string in s_2
var zeroString = new String('0', trimmedMatch.Length);
// A simpler way to replace a string within a string
var sb = new StringBuilder(s_2);
var replaceIndex = s_2.IndexOf(zeroString, lastIndex);
sb.Remove(replaceIndex, zeroString.Length);
sb.Insert(replaceIndex, trimmedMatch);
s_2 = sb.ToString();
// This is necessary otherwise we could end up overwriting previously done work
lastIndex = replaceIndex + zeroString.Length;
}
Disclaimer: I leave it to you to handle the case where the pattern string (e.g. "00") is not in the string; IndexOf then returns -1 and sb.Remove throws.
I don't have information about the actual performance issue you encounter in your implementation, but you can count the number of digits in each input and its templated output so you know whether they match.
string input = "He graduated in 1994 with 32 coursesHe graduated in 1994 with 32 coursesHe graduated in 1994 with 32 courses ";
string output = "He 0000 with 00 He in 0000 with 00 He in 0000 with 00";
string regex = @"(\d+)";
var matches = Regex.Matches(input, regex).Cast<Match>();
var tempSB = new StringBuilder(output);
var searchFrom = 0; // skip past numbers already filled in (e.g. the zeros of 1900)
foreach (var i in matches)
{
var strI = i.Value;
var strILength = strI.Length;
var template = new string('0', strILength);
var index = output.IndexOf(template, searchFrom, StringComparison.Ordinal);
if (index == -1) throw new InvalidOperationException("No zero template for " + strI);
tempSB.Remove(index, strILength);
tempSB.Insert(index, strI);
output = tempSB.ToString();
searchFrom = index + strILength;
}
For a 50 MB input it takes about 10 seconds, which sounds reasonable.
I am new to C# and trying to learn how to filter data that I read from a file. The file I read from has data similar to the following:
3 286 858 95.333 0.406 0.427 87.00 348 366 4 b
9 23 207 2.556 0.300 1.00 1.51 62 207 41 a
9 37 333 4.111 0.390 0.811 2.03 130 270 64 a
10 21 210 2.100 0.348 0.757 3.17 73 159 23 a
9 79 711 8.778 0.343 0.899 2.20 244 639 111 a
10 66 660 6.600 0.324 0.780 2.25 214 515 95 a
When I read this data, some values have carriage return or line feed characters hidden in them. Is there a way to remove them? For example, one of my variables may hold the following value because of a newline character in it:
mystringval = "9
"
I want this mystringval variable to be converted back to
mystringval = "9"
If you want to get rid of all special characters, you can learn regular expressions and use Regex.Replace.
var value = "&*^)#abcd.";
var filtered = System.Text.RegularExpressions.Regex.Replace(value, @"[^\w]", "");
REGEXPLANATION
The @ before the string means that you're using a verbatim string literal, so C# escape sequences don't apply, leaving only the regex escape sequences.
[^abc] matches all characters that are not a, b, or c (so that they are replaced with an empty string).
\w is a special regex code that means a letter, digit, or underscore.
You can also use @"[^A-Za-z0-9\.]", which keeps only letters, digits and the decimal point. See http://rubular.com/ for more details.
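If only the line breaks should go, a narrower pattern is enough; note that [^\w] would also strip the "." in values such as 95.333 from the data above. A minimal sketch (C# 9+ top-level program; StripLineBreaks is a hypothetical name):

```csharp
using System;
using System.Text.RegularExpressions;

// Remove only carriage returns and line feeds; everything else, including '.', is kept.
string StripLineBreaks(string value) => Regex.Replace(value, @"[\r\n]+", string.Empty);

var mystringval = "9\r\n";
Console.WriteLine(StripLineBreaks(mystringval)); // prints 9
Console.WriteLine(StripLineBreaks("95.333\n"));  // prints 95.333
```

If the line breaks only ever appear at the ends of a value, mystringval.Trim() does the same job without a regex.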
As well as using RegEx, you can use LINQ to do something like
var goodCharacters = input
.Replace("\r", " ")
.Replace("\n", " ")
.Where(c => char.IsLetterOrDigit(c) || c == ' ' || c == '.')
.ToArray();
var result = new string(goodCharacters).Trim();
The first two Replace calls guard against a number at the end of one line running into a number at the start of the next: "123\r\n987" would otherwise become "123987", whereas I assume you want the two numbers kept apart (here "123  987", since \r and \n each become a space).
I have a regex match string:
public static string RegExMatchString = "(?<NVE>.{20})(?<SN>.{20})(?<REGION>.{4})(?<YY>\\d{4})(?<Mo" +
"n>\\d{2})(?<DD>\\d{1,2})(?<HH>\\d{2})(?<Min>\\d{2})(?<SS>\\d" +
"{2}).{6}(?<USER>.{10})(?<SCANTYPE>.{2})(?<IN>.{4})(?<OU" +
"T>.{4})(?<DISPO>.{2})(?<ROUTE>.{7})(?<LP>.{16})(?<POOL>.{3})" +
"(?<CONT>.{9})(?<REGION_L>.{18})(?<CAT>.{2})";
And I'm replacing it with:
public string RegExReplacementString = "LogBarcodeID ( \"${NVE}\", ID2: \"${SN}\", Scanner: \"${USER}" +
"\", AreaName: \"${REGION_L}${CAT}${SCANTYPE}\", TimeStamp: \"${YY}/${Mon}/${D" +
"D} ${HH}:${Min}:${SS} \") ";
I need to remove all leading and trailing whitespace from these three groups:
${REGION_L}
${CAT}
${SCANTYPE}
How should I change RegExReplacementString (or maybe RegExMatchString) so that this can be achieved?
Sample input is:
0034025876080795786104041811071 135 20150304111404 DFRANZ 61 9990020569910 DA ST6007 135 F
Currently I'm getting the relevant part as
AreaName: "135 F61", however I need to get AreaName: "135F61"
EDIT:
I'm reading the regex match string from a text file and initializing the regex:
RegExMatchString = File.ReadAllText(regexMatchStringPath);
regex = new Regex( RegExMatchString ,
RegexOptions.IgnoreCase | RegexOptions.CultureInvariant
| RegexOptions.IgnorePatternWhitespace | RegexOptions.Compiled
);
string replaced = regex.Replace("0034025876080795786104041811071 135 20150304111404 DFRANZ 61 9990020569910 DA ST6007 135 F", RegExReplacementString);
I think the fixed-length info of each field would be useful to solve the problem here.
Use a regex like "^(.{20})(.{10})(.{2})(.{2})(.{2})$" to isolate each field.
This is for an example with 5 fields that you know are of length 20, 10, 2, 2 and 2.
Then use some LINQ to get a list of (trimmed) fields.
Example:
var testRegex = "^(.{20})(.{10})(.{2})(.{2})(.{2})$";
var testData = "Field of length 20  FieldLen10123456";
var fields = Regex.Match(testData, testRegex).Groups.Cast<Group>().Skip(1).Select(i => i.Value.Trim());
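As far as I know, .NET substitutions such as ${CAT} insert the captured text verbatim and have no trim option, so another way to keep the existing pattern is to build the replacement in code with a MatchEvaluator and trim each group there. A sketch under stated assumptions (C# 9+ top-level program): the layout below is a hypothetical, shortened version containing only REGION_L (18 chars), CAT (2) and SCANTYPE (2), and BuildAreaName is a hypothetical name; the full LogBarcodeID string would be assembled the same way.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical, shortened fixed-width layout: REGION_L (18 chars), CAT (2), SCANTYPE (2).
var regex = new Regex(@"(?<REGION_L>.{18})(?<CAT>.{2})(?<SCANTYPE>.{2})");

// Build the replacement text in code so each captured field can be trimmed first.
string BuildAreaName(Match m) =>
    "AreaName: \"" +
    m.Groups["REGION_L"].Value.Trim() +
    m.Groups["CAT"].Value.Trim() +
    m.Groups["SCANTYPE"].Value.Trim() +
    "\"";

// "135" padded to 18 chars, then " F", then "61"
var line = "135".PadRight(18) + " F" + "61";
Console.WriteLine(regex.Replace(line, BuildAreaName)); // prints AreaName: "135F61"
```

Trim only removes leading and trailing whitespace; spaces inside a single field would survive.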
How can I create a regular expression that will match numbers of length 2 from a given string?
Example input:
givenpercentage#60or•70and 8090
Desired output:
60 70 80 90
Try this:
string x = "givenpercentage#60or•70and 8090";
Regex r = new Regex(@"\d{2}");
foreach(Match m in r.Matches(x))
{
string temp = m.Value;
//Do something
}
\d -> a digit
{2} -> exactly 2 of them
Output will be:
60 70 80 90
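To get the output as a single space-separated string rather than handling each Match in a loop, the matches can be joined with LINQ. A minimal sketch (C# 9+ top-level program; TwoDigitGroups is a hypothetical name):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Collect every non-overlapping pair of digits, scanning left to right, and join them with spaces.
string TwoDigitGroups(string x) =>
    string.Join(" ", Regex.Matches(x, @"\d{2}").Cast<Match>().Select(m => m.Value));

Console.WriteLine(TwoDigitGroups("givenpercentage#60or•70and 8090")); // prints 60 70 80 90
```

Because matches do not overlap, 8090 yields 80 and 90, and a three-digit run such as 123 yields only 12. A pattern such as (?<!\d)\d{2}(?!\d) would match only numbers that are exactly two digits long, but it would then skip 8090 entirely.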
I need help removing single letters, but not whole words, from an incoming data string, like the following:
String A = "1 2 3A 4 5C 6 ABCD EFGH 7 8D 9";
to
String A = "1 2 3 4 5 6 ABCD EFGH 7 8 9";
You need to match a letter and ensure that there is no letter directly before or after it. So match
(?<!\p{L})\p{L}(?!\p{L})
and replace with an empty string.
Lookaround assertions on regular-expressions.info
Unicode properties on regular-expressions.info
In C#:
string s = "1 2 3A 4 5C 6 ABCD EFGH 7 8D 9";
string result = Regex.Replace(s, @"(?<!\p{L}) # Negative lookbehind assertion to ensure not a letter before
\p{L} # Unicode property, matches a letter in any language
(?!\p{L}) # Negative lookahead assertion to ensure not a letter following
", String.Empty, RegexOptions.IgnorePatternWhitespace);
The "obligatory" Linq approach:
string[] words = A.Split();
string result = string.Join(" ",
words.Select(w => w.Any(c => Char.IsDigit(c)) ?
new string(w.Where(c => Char.IsDigit(c)).ToArray()) : w));
This approach checks whether each word contains a digit. If it does, it filters out the non-digit chars and creates a new string from the result; otherwise it just keeps the word.
And here is the old-school way, in VB.NET:
Dim A As String = "1 2 3A 4 5C 6 ABCD EFGH 7 8D 9"
Dim B As String = "1 2 3 4 5 6 ABCD EFGH 7 8 9"
Dim sb As New StringBuilder
Dim letterCount As Integer = 0
For i = 0 To A.Length - 1
Dim ch As Char = Char.ToLower(A(i))
If ch >= "a"c AndAlso ch <= "z"c Then
letterCount += 1
Else
If letterCount > 1 Then sb.Append(A.Substring(i - letterCount, letterCount))
letterCount = 0
sb.Append(A(i))
End If
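Next
' Append a word that ends the string, which the loop above never flushes
If letterCount > 1 Then sb.Append(A.Substring(A.Length - letterCount))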
Debug.WriteLine(B = sb.ToString) 'prints True
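For readers following the C# answers, here is a sketch of the same character-scanning idea ported to C# (C# 9+ top-level program; RemoveSingleLetters is a hypothetical name). Like the VB loop, it assumes only ASCII a-z/A-Z count as letters, and it appends a word that ends the string after the loop.

```csharp
using System;
using System.Text;

// Keep runs of two or more letters (words) and drop single letters.
// Only ASCII a-z / A-Z count as letters, as in the VB loop.
string RemoveSingleLetters(string a)
{
    var sb = new StringBuilder();
    int letterCount = 0;
    for (int i = 0; i < a.Length; i++)
    {
        char ch = char.ToLowerInvariant(a[i]);
        if (ch >= 'a' && ch <= 'z')
        {
            letterCount++; // still inside a run of letters
            continue;
        }
        // A run of letters just ended: keep it only if it is a word (2+ letters).
        if (letterCount > 1) sb.Append(a, i - letterCount, letterCount);
        letterCount = 0;
        sb.Append(a[i]);
    }
    // A word at the very end of the string still needs to be appended.
    if (letterCount > 1) sb.Append(a, a.Length - letterCount, letterCount);
    return sb.ToString();
}

Console.WriteLine(RemoveSingleLetters("1 2 3A 4 5C 6 ABCD EFGH 7 8D 9"));
// prints 1 2 3 4 5 6 ABCD EFGH 7 8 9
```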