I want to write code like this:
var text = "14. hello my friends we meet 1 test, 2 baby 3 wiki 4 marvel";
string[] split = text.Split('14.', 1, 2, 3, 4);
var needText = split[0].Replace('14.', '');
The separators "1", "2", "3" and "4" are static text,
but "14." is dynamic text.
For example:
var text2 = "1972. google youtube. 1 phone, 2 star 3 tv 4 mouse";
string[] split = text2.Split('1972.', 1, 2, 3, 4);
var needText = split[0].Replace('1972.', '');
If you have dynamic separators like this, String.Split is not suitable. Use Regex.Split instead.
You can give a pattern to Regex.Split and it will treat every substring that matches the pattern as a separator.
In this case, you need a pattern like this:
\d+\. |1|2|3|4
| is the alternation ("or") operator. \d matches any digit character. + means one or more repetitions. \. matches a literal dot, because an unescaped . has a special meaning in regex (it matches any character).
Usage:
var split = Regex.Split(text, "\\d+\\. |1|2|3|4");
Because the text starts with a separator, split[0] is an empty string, so the text you need is at index 1 of split.
Remember to add a using directive for System.Text.RegularExpressions!
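As a minimal sketch of the Regex.Split approach above (the class and method names SplitDemo and SplitText are made up for illustration), this shows where the wanted text ends up:

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // Splits on a leading "<digits>. " marker and on the digits 1 to 4.
    public static string[] SplitText(string text)
    {
        return Regex.Split(text, @"\d+\. |1|2|3|4");
    }

    public static void Main()
    {
        string[] parts = SplitText("14. hello my friends we meet 1 test, 2 baby 3 wiki 4 marvel");
        // parts[0] is empty because the input starts with a separator.
        Console.WriteLine("[" + parts[1] + "]"); // [hello my friends we meet ]
    }
}
```

Note that 1|2|3|4 matches those digits anywhere, so a value such as 12 inside the text would also be treated as separators.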
If you use IndexOf() with Substring(), you can very easily grab the information you need. If it's any more complex than your examples then use Regex.
var text = "14. hello my friends we meet 1 test, 2 baby 3 wiki 4 marvel";
var strArr = text.Substring(text.IndexOf(' ')).Split('1', '2', '3', '4');
So I want the formats xxxxxx-xxxx AND xxxxxxxx-xxxx to be possible. I've managed to fix the first section before the dash, but the last four digits are troublesome. It does require to match at least 4 characters, but I also want the regex to return false if there's more than 4 characters. How do I do it?
This is how it looks so far:
var regex = new Regex(@"^\d{6,8}[-|(\s)]{0,1}\d{4}");
And this is the results:
var regex = new Regex(@"^\d{6,8}[-|(\s)]{0,1}\d{4}");
Match m = regex.Match("840204-2344");
Console.WriteLine(m.Success); // Outputs True
m = regex.Match("19840204-2344");
Console.WriteLine(m.Success); // Outputs True
m = regex.Match("19840204-23");
Console.WriteLine(m.Success); // Outputs False
m = regex.Match("19840204-2323423423");
Console.WriteLine(m.Success); // Outputs True, and this is what I don't want
The \d{6,8} pattern matches 6, 7 or 8 digits, so it also accepts a 7-digit prefix, which is not one of your two formats. Besides, [-|(\s)]{0,1} matches 1 or 0 -, (, ), | or whitespace chars, so the pattern will also match strings like 19840204|2323, 19840204(2323 and 19840204)2323. Finally, without a $ anchor at the end, anything may follow the last 4 digits.
You may use
^\d{6}(?:\d{2})?[-\s]?\d{4}$
See the regex demo.
Details
^ - start of string
\d{6} - 6 digits
(?:\d{2})? - optional 2 digits
[-\s]? - 1 or 0 - or whitespaces
\d{4} - 4 digits
$ - end of string.
To make \d only match ASCII digits, pass the RegexOptions.ECMAScript option. Example:
var res = Regex.IsMatch(s, @"^\d{6}(?:\d{2})?[-\s]?\d{4}$", RegexOptions.ECMAScript);
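A small sketch that runs the suggested pattern against the four inputs from the question. The class name NumberFormatCheck and the method IsValid are assumptions made for this example:

```csharp
using System;
using System.Text.RegularExpressions;

public static class NumberFormatCheck
{
    // Pattern from the answer above: 6 or 8 digits, optional separator, exactly 4 digits.
    private static readonly Regex Pattern =
        new Regex(@"^\d{6}(?:\d{2})?[-\s]?\d{4}$", RegexOptions.ECMAScript);

    public static bool IsValid(string s)
    {
        return Pattern.IsMatch(s);
    }

    public static void Main()
    {
        var samples = new[] { "840204-2344", "19840204-2344", "19840204-23", "19840204-2323423423" };
        foreach (var s in samples)
        {
            // Prints True, True, False, False in that order.
            Console.WriteLine(s + " -> " + IsValid(s));
        }
    }
}
```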
You are forgetting the $ at the end:
var regex = new Regex(@"^(\d{6}|\d{8})-\d{4}$");
If you want to match the social security number anywhere in a string, you can also use \b to test for boundaries:
var regex = new Regex(@"\b(\d{6}|\d{8})-\d{4}\b");
Edit: I corrected the RegEx to fix the problems mentioned in the comments. The commenters are right, of course. In my earlier post I just wanted to explain why the RegEx matched the longer string.
I have the following RegEx pattern in order to determine some 3-digit exchanges of phone numbers:
(?:2(?:04|[23]6|[48]9|50)|3(?:06|43|65)|4(?:03|1[68]|3[178]|50)|5(?:06|1[49]|79|8[17])|6(?:0[04]|13|39|47)|7(?:0[59]|78|8[02])|8(?:[06]7|19|73)|90[25])
It looks pretty daunting, but it only yields around 40 or 50 numbers. Is there a way in C# to generate all numbers that match this pattern? Offhand, I know I can loop through the numbers 001 thru 999, and check each number against the pattern, but is there a cleaner, built-in way to just generate a list or array of matches?
ie - {"204","226","236",...}
No, there is no off the shelf tool to determine all matches given a regex pattern. Brute force is the only way to test the pattern.
Update
It is unclear why you are using (?: ), which means "match but don't capture": it groups part of a pattern without creating a capture group. It can be used to anchor a match. For example, take the text phone:303-867-5309, where we don't care about the phone: prefix but we want the number.
The pattern used would be
(?:phone\:)(\d{3}-\d{3}-\d{4})
which would match the whole line, but the only capture group returned (group 1) would be just the phone number 303-867-5309.
So (?: ), as mentioned, is used to anchor a capture at a specific point; the text it matches is part of the overall match but is thrown away as far as captures go.
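To make the difference between the overall match and the capture concrete, here is a short sketch using the phone pattern above (NonCapturingDemo and MatchPhone are names invented for this example):

```csharp
using System;
using System.Text.RegularExpressions;

public static class NonCapturingDemo
{
    public static Match MatchPhone(string text)
    {
        // (?:phone\:) takes part in the match but creates no capture group.
        return Regex.Match(text, @"(?:phone\:)(\d{3}-\d{3}-\d{4})");
    }

    public static void Main()
    {
        Match m = MatchPhone("phone:303-867-5309");
        Console.WriteLine(m.Value);           // phone:303-867-5309
        Console.WriteLine(m.Groups[1].Value); // 303-867-5309
        Console.WriteLine(m.Groups.Count);    // 2 (group 0 plus one capture group)
    }
}
```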
With that said, I have redone your pattern with comments and a test to 2000:
string pattern = @"
^ # Start at beginning of line so no mid number matches erroneously found
(
2(04|[23]6|49|[58]0) # 2 series only match 204, 226, 236, 249, 250, 280
| # Or it is not 2, then match:
3(06|43|65) # 3 series only match 306, 343, 365
)
$ # Further Anchor it to the end of the string to keep it to 3 numbers";
// RegexOptions.IgnorePatternWhitespace allows us to put the pattern over multiple lines and comment it. Does not
// affect regex parsing/processing.
var results = Enumerable.Range(0, 2000) // Test to 2000 so we don't get any non 3 digit matches.
.Select(num => num.ToString().PadLeft(3, '0'))
.Where (num => Regex.IsMatch(num, pattern, RegexOptions.IgnorePatternWhitespace))
.ToArray();
Console.WriteLine ("These results found {0}", string.Join(", ", results));
// These results found 204, 226, 236, 249, 250, 280, 306, 343, 365
I took the advice of @LucasTrzesniewski and just looped through the possible values. Since I know I’m dealing w/ 3-digit numbers, I just looped through the numbers/strings “000” thru “999” and checked for matches like this:
private static void FindRegExMatches(string pattern)
{
for (var i = 0; i < 1000; i++)
{
var numberString = i.ToString().PadLeft(3, '0');
if (!Regex.IsMatch(numberString, pattern)) continue;
Console.WriteLine("Found a match: {0}", numberString);
}
}
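If a list is needed rather than console output, the same brute-force loop can be written with LINQ. This is only a sketch: ExchangeEnumerator and FindMatches are hypothetical names, the pattern is wrapped in ^(?:...)$ on the assumption that only whole 3-digit strings should count, and the demo uses a shortened pattern rather than the full one from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class ExchangeEnumerator
{
    // Returns every zero-padded 3-digit string that the pattern matches in full.
    public static List<string> FindMatches(string pattern)
    {
        var regex = new Regex("^(?:" + pattern + ")$");
        return Enumerable.Range(0, 1000)
            .Select(i => i.ToString("D3")) // "000" .. "999"
            .Where(s => regex.IsMatch(s))
            .ToList();
    }

    public static void Main()
    {
        // Shortened illustrative pattern, not the full one from the question.
        List<string> matches = FindMatches(@"2(?:04|[23]6)|3(?:06|43|65)");
        Console.WriteLine(string.Join(", ", matches)); // 204, 226, 236, 306, 343, 365
    }
}
```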
Trying to come up with a 'simple' regex to mask bits of text that look like they might contain account numbers.
In plain English:
any word containing a digit (or a train of such words) should be matched
leave the last 4 digits intact
replace all previous part of the matched string with four X's (xxxx)
So far
I'm using the following:
[\-0-9 ]+(?<m1>[\-0-9]{4})
replacing with
xxxx${m1}
But this misses on the last few samples below
sample data:
123456789
a123b456
a1234b5678
a1234 b5678
111 22 3333
this is a a1234 b5678 test string
Actual results
xxxx6789
a123b456
a1234b5678
a1234 b5678
xxxx3333
this is a a1234 b5678 test string
Expected results
xxxx6789
xxxxb456
xxxx5678
xxxx5678
xxxx3333
this is a xxxx5678 test string
Is such an arrangement possible with a regex replace?
I think I'm going to need some greediness and lookahead functionality, but I have zero experience in those areas.
This works for your example:
var result = Regex.Replace(
input,
@"(?<!\b\w*\d\w*)(?<m1>\s?\b\w*\d\w*)+",
m => "xxxx" + m.Value.Substring(Math.Max(0, m.Value.Length - 4)));
If you have a value like 111 2233 33, it will print xxxx3 33. If you want this to be free from spaces, you could turn the lambda into a multi-line statement that removes whitespace from the value.
To explain the regex pattern a bit, it's got a negative lookbehind, so it makes sure that the word behind it does not have a digit in it (with optional word characters around the digit). Then it's got the m1 portion, which looks for words with digits in them. The last four characters of this are grabbed via some C# code after the regex pattern resolves the rest.
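The whitespace-removing variant mentioned above could look roughly like the sketch below. Note the assumptions: AccountMasker and Mask are made-up names, and the pattern is a simplified one written for this sketch (a digit-containing word followed by further digit-containing words), not the pattern from the answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class AccountMasker
{
    // Assumed pattern: a word containing a digit, plus any following
    // digit-containing words separated by whitespace.
    private static readonly Regex Run = new Regex(@"\b\w*\d\w*(?:\s+\w*\d\w*)*");

    public static string Mask(string input)
    {
        return Run.Replace(input, m =>
        {
            // Remove whitespace inside the run, then keep only the last 4 characters.
            string compact = new string(m.Value.Where(c => !char.IsWhiteSpace(c)).ToArray());
            return "xxxx" + compact.Substring(Math.Max(0, compact.Length - 4));
        });
    }

    public static void Main()
    {
        Console.WriteLine(Mask("this is a a1234 b5678 test string")); // this is a xxxx5678 test string
        Console.WriteLine(Mask("111 2233 33"));                       // xxxx3333
    }
}
```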
I don't think that regex is the best way to solve this problem, and that's why I am posting this answer. For situations this complex, building the corresponding regex is too difficult and, what is worse, its clarity and adaptability are much lower than those of a longer-code approach.
The code below these lines delivers the exact functionality you are after, it is clear enough and can be easily extended.
// Requires: using System.Linq;
string input = "this is a a1234 b5678 test string";
string output = "";
string[] temp = input.Trim().Split(' ');
bool previousNum = false;
string tempOutput = "";
foreach (string word in temp)
{
    if (word.Any(char.IsDigit))
    {
        // The word contains a digit: append it to the current run.
        previousNum = true;
        tempOutput = tempOutput + word;
    }
    else
    {
        if (previousNum)
        {
            // The run has ended: mask it, keeping the last 4 characters.
            if (tempOutput.Length >= 4) tempOutput = "xxxx" + tempOutput.Substring(tempOutput.Length - 4, 4);
            output = output + " " + tempOutput;
            tempOutput = ""; // reset so the next run starts empty
            previousNum = false;
        }
        output = output + " " + word;
    }
}
if (previousNum)
{
    // Flush a run that reaches the end of the input.
    if (tempOutput.Length >= 4) tempOutput = "xxxx" + tempOutput.Substring(tempOutput.Length - 4, 4);
    output = output + " " + tempOutput;
}
output = output.Trim(); // drop the leading space added by the concatenation
Have you tried this:
.*(?<m1>[\d]{4})(?<m2>.*)
with replacement
xxxx${m1}${m2}
This produces
xxxx6789
xxxx5678
xxxx5678
xxxx3333
xxxx5678 test string
You are not going to get 'a123b456' to match ... until 'b' becomes a number. ;-)
Here is my really quick attempt:
(\s|^)([a-z]*\d+[a-z,0-9]+\s)+
This will select all of those test cases. Now as for C# code, you'll need to check each match to see if there is a space at the beginning or end of the match sequence (e.g., the last example will have the space before and after selected)
Here is the C# code to do the replace:
var redacted = Regex.Replace(record, @"(\s|^)([a-z]*\d+[a-z,0-9]+\s)+",
match => "xxxx" /*new String("x",match.Value.Length - 4)*/ +
match.Value.Substring(Math.Max(0, match.Value.Length - 4)));
I'm making a small app for myself, and I want to find strings which match a pattern, but I could not find the right regular expression.
Stargate.SG-1.S01E08.iNT.DVDRip.XviD-LOCK.avi
That is an example of the strings I have, and I only want to know if it contains a substring of the form S[NUMBER]E[NUMBER], with each number at most 2 digits long.
Can you give me a clue?
Regex
Here is the regex using named groups:
S(?<season>\d{1,2})E(?<episode>\d{1,2})
Usage
Then, you can get named groups (season and episode) like this:
string sample = "Stargate.SG-1.S01E08.iNT.DVDRip.XviD-LOCK.avi";
Regex regex = new Regex(@"S(?<season>\d{1,2})E(?<episode>\d{1,2})");
Match match = regex.Match(sample);
if (match.Success)
{
string season = match.Groups["season"].Value;
string episode = match.Groups["episode"].Value;
Console.WriteLine("Season: " + season + ", Episode: " + episode);
}
else
{
Console.WriteLine("No match!");
}
Explanation of the regex
S // match 'S'
( // start of a capture group
?<season> // name of the capture group: season
\d{1,2} // match 1 to 2 digits
) // end of the capture group
E // match 'E'
( // start of a capture group
?<episode> // name of the capture group: episode
\d{1,2} // match 1 to 2 digits
) // end of the capture group
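If only a yes/no answer is needed, Regex.IsMatch is enough. The sketch below also adds a negative lookahead (?!\d), which is an addition made here and not part of the answer, so that an episode number with a third digit is rejected. EpisodeTagCheck and HasEpisodeTag are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EpisodeTagCheck
{
    // (?!\d) is an added assumption: it rejects a third digit after the episode number.
    private static readonly Regex Tag = new Regex(@"S\d{1,2}E\d{1,2}(?!\d)");

    public static bool HasEpisodeTag(string fileName)
    {
        return Tag.IsMatch(fileName);
    }

    public static void Main()
    {
        Console.WriteLine(HasEpisodeTag("Stargate.SG-1.S01E08.iNT.DVDRip.XviD-LOCK.avi")); // True
        Console.WriteLine(HasEpisodeTag("Show.S01E123.avi"));                              // False
    }
}
```

The match is case-sensitive as written; passing RegexOptions.IgnoreCase would also accept s01e08.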
There's a great online test site here: http://gskinner.com/RegExr/
Using that, here's the regex you'd want:
S\d\dE\d\d
You can do lots of fancy tricks beyond that though!
Take a look at some of the media software like XBMC; they all have pretty robust regex filters for TV shows.
See here, here
The regex I would put for S[NUMBER1]E[NUMBER2] is
S(\d\d?)E(\d\d?) // (\d\d?) means one or two digit
You can get NUMBER1 by <matchresult>.group(1), NUMBER2 by <matchresult>.group(2) (in C#, that is match.Groups[1].Value and match.Groups[2].Value).
I would like to propose a slightly more complex regex. I don't handle ". : - _"
because I replace them with a space first:
str_replace(
array('.', ':', '-', '_', '(', ')'), ' ',
This is the capture regex that splits the title into title, season and episode:
(.*)\s(?:s?|se)(\d+)\s?(?:e|x|ep)\s?(\d+)
e.g. Da Vinci's Demons se02ep04 and variants
https://regex101.com/r/UKWzLr/3
The only case that I can't cover is a space between the season marker and the number, because the letter s or se then becomes part of the title, which does not work for me. I haven't seen such a case, but it is still an issue.
Edit:
I managed to get around it with a second line:
$title = $matches[1];
$title = preg_replace('/(\ss|\sse)$/i', '', $title);
This way I remove a trailing ' s' or ' se' from the title if the name is part of a series.
I'm currently running a simple find-and-replace, on strings like this:
1. User.Name "John"
2. User.Age 20
3. Name.Length 5
However, trying to replace Name with WHATEVER results in this:
1. User.WHATEVER "John"
2. User.Age 20
3. WHATEVER.Length 5
I needed to change line 3, but not line 1. How do I check if the current word is after a dot (.) and skip replacing that word?
I'm in .NET 4.0 and my regex currently looks like this:
result = new Regex(@"\b" + oldWord + @"\b").Replace(text, newWord);
You can use a negative lookbehind on .: (?<!\.)
That gives:
result = new Regex(@"\b(?<!\.)" + oldWord + @"\b").Replace(text, newWord);
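A self-contained sketch of the lookbehind approach, run against the lines from the question. WordReplacer and ReplaceWord are invented names, and wrapping oldWord in Regex.Escape is an extra precaution added here in case the word contains regex metacharacters:

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordReplacer
{
    // Replaces whole-word occurrences of oldWord that are not directly preceded by a dot.
    public static string ReplaceWord(string text, string oldWord, string newWord)
    {
        string pattern = @"\b(?<!\.)" + Regex.Escape(oldWord) + @"\b";
        return Regex.Replace(text, pattern, newWord);
    }

    public static void Main()
    {
        string text = "1. User.Name \"John\"\n2. User.Age 20\n3. Name.Length 5";
        // Only the Name on line 3 is replaced; User.Name is left alone.
        Console.WriteLine(ReplaceWord(text, "Name", "WHATEVER"));
    }
}
```

Keep in mind that the replacement string is still interpreted by Regex.Replace, so a $ in newWord would be treated as a substitution marker.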