Regular Expressions + Including one space in pattern - c#

I'm trying to figure out how to write a pattern to match to the following: "3Z 5Z". The numbers in this can vary, but the Z's are constant. The issue I'm having is trying to include the white space... Currently I have this as my pattern
pattern = #"\b*Z\s*Z\b";
The '*' represent the wildcard for the number preceding the "Z", but it doesn't seem to want to work with the space in it. For example, I can use the following pattern successfully for matching to the same thing without the space (i.e. 3Z5Z)
pattern = #"\b*Z*Z\b";
I am writing this program in .NET 4.0 (C#). Any help is much appreciated!
EDIT: This pattern is part of a larger string, for example:
3Z 10Z lock 425"

Try this:
pattern = #"\b\d+Z\s+\d+Z\b";
Explanation:
"
\b # Assert position at a word boundary
\d # Match a single digit 0..9
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
Z # Match the character “Z” literally
\s # Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
\d # Match a single digit 0..9
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
Z # Match the character “Z” literally
\b # Assert position at a word boundary
"
By the way:
\b*
Should throw an exception. \b is a word anchor. You can't quantify it.

Try this code.
using System;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
string txt="3Z 5Z";
string re1="(\\d+)"; // Integer Number 1
string re2="(Z)"; // Any Single Character 1
string re3="( )"; // Any Single Character 2
string re4="(\\d+)"; // Integer Number 2
string re5="(Z)"; // Any Single Character 3
Regex r = new Regex(re1+re2+re3+re4+re5,RegexOptions.IgnoreCase|RegexOptions.Singleline);
Match m = r.Match(txt);
if (m.Success)
{
String int1=m.Groups[1].ToString();
String c1=m.Groups[2].ToString();
String c2=m.Groups[3].ToString();
String int2=m.Groups[4].ToString();
String c3=m.Groups[5].ToString();
Console.Write("("+int1.ToString()+")"+"("+c1.ToString()+")"+"("+c2.ToString()+")"+"("+int2.ToString()+")"+"("+c3.ToString()+")"+"\n");
}
Console.ReadLine();
}
}
}

I addition to other posts I would add characters of the Begin and End of string.
patter = "^\d+Z\s\d+Z$"

Related

I need help for building a regex

It is my first time working with regex and I am a little lost. To give you a little background, I am making a program that reads a text file line by line and it saves in a string called "line". If the line starts with either a tab o or a whitespace, followed by a number or number and dots (such as 1 or 1.2.1, for instance) followed by another tab or whitespace, it copies the line to another file.
So far I build this regex, but it does not work
string pattern = #"(\t| ) *[0-9.] (\t| )";
if (line.StartsWith(pattern))
{
//copy line
}
Also, is line.StartsWith correct? Or should I use something like rgx.Matches(pattern)?
Your pattern contains a character class without a quantifier, which will match either a single digit or dot.
To prevent matching for example only dots you could first match digits followed by an optional part which matches a dot and then again digits [0-9]+(?:\.[0-9]+)*
Note that in this part (\t| ) there are 2 characters expected to match as the space in that part has meaning.
You could simplify the pattern to use a character class to match either a tab or space instead of using an alternation and if you don't need the capturing group you could omit it.
Instead of using StartsWith you could usefor example IsMatch
^[ \t][0-9]+(?:\.[0-9]+)*[ \t]
^ Start of string
[ \t] Match a single tab or space
[0-9]+ Match 1+ digits 0-9
(?:\.[0-9]+)* Repeat 0+ times a dot and 1+ digits
[ \t] Match a single tab or space
Regex demo | C# demo
For example
string s = "\t1.2.1 ";
Regex regex = new Regex(#"^[ \t][0-9]+(?:\.[0-9]+)*[ \t]");
if (regex.IsMatch(s)) {
//copy line
}

C# - Removing single word in string after certain character

I have string that I would like to remove any word following a "\", whether in the middle or at the end, such as:
testing a\determiner checking test one\pronoun
desired result:
testing a checking test one
I have tried a simple regex that removes anything between the backslash and whitespace, but it gives the following result:
string input = "testing a\determiner checking test one\pronoun";
Regex regex = new Regex(#"\\.*\s");
string output = regex.Replace(input, " ");
Result:
testing a one\pronoun
It looks like this regex matches from the backslash until the last whitespace in the string. I cannot seem to figure out how to match from the backlash to the next whitespace. Also, I am not guaranteed a whitespace at the end, so I would need to handle that. I could continue processing the string and remove any text after the backslash, but I was hoping I could handle both cases with one step.
Any advice would be appreciated.
Change .* which match any characters, to \w*, which only match word characters.
Regex regex = new Regex(#"\\\w*");
string output = regex.Replace(input, "");
".*" matches zero or more characters of any kind. Consider using "\w+" instead, which matches one or more "word" characters (not including whitespace).
Using "+" instead of "*" would allow a backslash followed by a non-"word" character to remain unmatched. For example, no matches would be found in the sentence "Sometimes I experience \ an uncontrollable compulsion \ to intersperse backslash \ characters throughout my sentences!"
With your current pattern, .* tells the parser to be "greedy," that is, to take as much of the string as possible until it hits a space. Adding a ? right after that * tells it instead to make the capture as small as possible--to stop as soon as it hits the first space.
Next, you want to end at not just a space, but at either a space or the end of the string. The $ symbol captures the end of the string, and | means or. Group those together using parentheses and your group collectively tells the parser to stop at either a space or the end of the string. Your code will look like this:
string input = #"testing a\determiner checking test one\pronoun";
Regex regex = new Regex(#"\\.*?(\s|$)");
string output = regex.Replace(input, " ");
Try this regex (\\[^\s]*)
(\\[^\s]*)
1st Capturing group (\\[^\s]*)
\\ matches the character \ literally
[^\s]* match a single character not present in the list below
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
\s match any white space character [\r\n\t\f ].

How to extract numbers from a string using regular expressions?

This little challenge just screams regular expressions to me, but so far I am stumped.
I have an arbitrary string that contains two numbers embedded in it. I need to extract those two numbers, which will be n and m digits long (n,m are unknown in advance). The format of the string is always
FixedWord[n digits]anotherfixedword[m digits]alotmorestuffontheend
The first number is of the format 1.2.3.4 (the number of digits varying) eg 5.3.20 or 5.3.10.1 or 5.4.
and the second is a simpler 'm' digits (eg 25 or 2)
eg "AppName5.2.6dbVer44Oracle.Group"
It shouts 'pattern matching' and hence "extraction using regexes". Can anyone guide me further?
TIA
The following pattern:
(\d+(?>\.\d+)*)\w+?(\d+)
Will match this:
AppName5.2.6dbVer44Oracle.Group
\__________/ <-- match
\___/ \/ <-- captures
Demo
And will capture the two values you're interested in in capture groups.
Use it like this:
var match = Regex.Match(input, #"(\d+(?>\.\d+)*)\w+?(\d+)");
if (match.Success)
{
var first = match.Groups[1].Value;
var second = match.Groups[2].Value;
// ...
}
Pattern explanation:
( # Start of group 1
\d+ # a series of digits
(?> # start of atomic group
\.\d+ # dot followed by digits
)* # .. 0 to n times
)
\w+? # some word characters (as few as possible)
(\d+) # a series of digits captured in group 2
Try this:
\w*?([\d|\.]+)\w*?([\d{1,4}]+).*
You could start from the following:
^[a-zA-Z]+((?:\d+\.)+\d)[a-zA-Z]+(\d+).*$
I assumed that the fixed words are just made of letters and that you want to match the entire string. If you prefer, you could substitute the parts not in parentheses with the actual fixed words or change the character sets as desired. I recommend using a tool like https://regex101.com to fine-tune the expression.
Keep it basic by specifing a match ( ) by looking for a digit \d, then zero or more * digits or periods in a set [\d.] (the set is \d -or- a literal period):
var data = "AppName5.2.6dbVer44Oracle.Group";
var pattern = #"(\d[\d.]*)";
// Outputs:
// 5.2.6
// 44
Console.WriteLine (Regex.Matches(data, pattern)
.OfType<Match>()
.Select (mt => mt.Groups[1].Value));
Each match will be a number within the sentence. So if the total set of numbers change, the pattern will not fail and dutifully report 1 to N numbers.
Simply look for the numbers, since you only care for the numbers and don't want to check the syntax of the whole input string.
Matches matches = Regex.Matches(input, #"\d+(\.\d+)*");
if (matches.Count >= 2) {
string number1 = matches[0].Value;
string number2 = matches[1].Value;
} else {
// Less than two numbers found
}
The expression \d+(\.\d+)* means:
\d+ one or more digits.
( )* repeat zero, one or more times.
\.\d+ one decimal point (escaped with \) followed by one or more digits.
and
\d one digit.
( ) grouping.
+ repeat the expression to the left one or more times.
* repeat the expression to the left zero, one or more times.
\ escapes characters that have a special meaning in regex.
. any character (without escaping).
\. period character (".").

c# String - Split Pascal Case

I've been trying to get a C# regex command to turn something like
EYDLessThan5Days
into
EYD Less Than 5 Days
Any ideas?
The code I used :
public static string SplitPascalCase(this string value) {
Regex NameExpression = new Regex("([A-Z]+(?=$|[A-Z][a-z])|[A-Z]?[a-z0-9]+)",
RegexOptions.Compiled);
return NameExpression.Replace(value, " $1").Trim();
}
Out:
EYD Less Than5 Days
But still give me wrong result.
Actually I already asked about this in javascript code but when i implemented in c# code with same logic, it's failed.
Please help me. Thanks.
Use lookarounds in your regex so that it won't consume any characters and it allows overlapping of matches.
(?<=[A-Za-z])(?=[A-Z][a-z])|(?<=[a-z0-9])(?=[0-9]?[A-Z])
Replace the matched boundaries with a space.
Regex.Replace(yourString, #"(?<=[A-Za-z])(?=[A-Z][a-z])|(?<=[a-z0-9])(?=[0-9]?[A-Z])", " ");
DEMO
Explanation:
(?<=[A-Za-z])(?=[A-Z][a-z]) Matches the boundary which was exists inbetween an upper or lowercase letter and an Uppercase letter which was immediately followed by a lowercase letter. For example. consider this ABc string. And this regex would match, the boundary exists inbetween A and Bc. For this aBc example , this regex would match, the boundary exists inbetween a and Bc
| Logical OR operator.
(?<=[a-z0-9])(?=[0-9]?[A-Z]) Matches the boundary which was exists inbetween an lower case letter or digit and an optional digit which was immediately followed by an Uppercase letter. For example. consider this a9A string. And this regex would match, the boundary exists inbetween a and 9A , and also the boundary exists inbetween 9 and A, because we gave [0-9] as optional in positive lookahead.
You can just match and join..
var arr = Regex.Matches(str, #"[A-Z]+(?=[A-Z][a-z]+)|\d|[A-Z][a-z]+").Cast<Match>().Select(m => m.Value).ToArray();
Console.WriteLine(String.Join(" ",arr));
The regex isn't complex at all, it is just capturing each and joining them with a " "
DEMO
Something like this should do
string pattern=#"(?<=\d)(?=[a-zA-Z])|(?<=[a-zA-Z])(?=\d)|(?=[A-Z][a-z])|(?<=[a-z])(?=[A-Z])";
Regex.Replace(input,pattern," ");

Using Regular Expression Match a String that contains numbers letters and dashes

I need to match this string 011Q-0SH3-936729 but not 345376346 or asfsdfgsfsdf
It has to contain characters AND numbers AND dashes
Pattern could be 011Q-0SH3-936729 or 011Q-0SH3-936729-SDF3 or 000-222-AAAA or 011Q-0SH3-936729-011Q-0SH3-936729-011Q-0SH3-936729-011Q-0SH3-936729 and I want it to be able to match anyone of those. Reason for this is that I don't really know if the format is fixed and I have no way of finding out either so I need to come up with a generic solution for a pattern with any number of dashes and the pattern recurring any number of times.
Sorry this is probably a stupid question, but I really suck at Regular expressions.
TIA
foundMatch = Regex.IsMatch(subjectString,
#"^ # Start of the string
(?=.*\p{L}) # Assert that there is at least one letter
(?=.*\p{N}) # and at least one digit
(?=.*-) # and at least one dash.
[\p{L}\p{N}-]* # Match a string of letters, digits and dashes
$ # until the end of the string.",
RegexOptions.IgnorePatternWhitespace);
should do what you want. If by letters/digits you meant "only ASCII letters/digits" (and not international/Unicode letters, too), then use
foundMatch = Regex.IsMatch(subjectString,
#"^ # Start of the string
(?=.*[A-Z]) # Assert that there is at least one letter
(?=.*[0-9]) # and at least one digit
(?=.*-) # and at least one dash.
[A-Z0-9-]* # Match a string of letters, digits and dashes
$ # until the end of the string.",
RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase);
EDIT:
this will match any of the key provided in your comments:
^[0-9A-Z]+(-[0-9A-Z]+)+$
this means the key starts with the digit or letter and have at leats one dash symbol:
Without more info about the regularity of the dashes or otherwise, this is the best we can do:
Regex.IsMatch(input,#"[A-Z0-9\-]+\-[A-Z0-9]")
Although this will also match -A-0
Most naive implementation EVER (might get you started):
([0-9]|[A-Z])+(-)([0-9]|[A-Z])+(-)([0-9]|[A-Z])+
Tested with Regex Coach.
EDIT:
That does match only three groups; here another, slightly better:
([0-9A-Z]+\-)+([0-9A-Z]+)
Are you applying the regex to a whole string (i.e., validating or filtering)? If so, Tim's answer should put you right. But if you're plucking matches from a larger string, it gets a bit more complicated. Here's how I would do that:
string input = #"Pattern could be 011Q-0SH3-936729 or 011Q-0SH3-936729-SDF3 or 000-222-AAAA or 011Q-0SH3-936729-011Q-0SH3-936729-011Q-0SH3-936729-011Q-0SH3-936729 but not 345-3763-46 or ASFS-DFGS-FSDF or ASD123FGH987.";
Regex pluckingRegex = new Regex(
#"(?<!\S) # start of 'word'
(?=\S*\p{L}) # contains a letter
(?=\S*\p{N}) # contains a digit
(?=\S*-) # contains a hyphen
[\p{L}\p{N}-]+ # gobble up letters, digits and hyphens only
(?!\S) # end of 'word'
", RegexOptions.IgnorePatternWhitespace);
foreach (Match m in pluckingRegex.Matches(input))
{
Console.WriteLine(m.Value);
}
output: 011Q-0SH3-936729
011Q-0SH3-936729-SDF3
000-222-AAAA
011Q-0SH3-936729-011Q-0SH3-936729-011Q-0SH3-936729-011Q-0SH3-936729
The negative lookarounds serve as 'word' boundaries: they insure the matched substring starts either at the beginning of the string or after a whitespace character ((?<!\S)), and ends either at the end of the string or before a whitespace character ((?!\S)).
The three positive lookaheads work just like Tim's, except they use \S* to skip whatever precedes the first letter/digit/hyphen. We can't use .* in this case because that would allow it to skip to the next word, or the next, etc., defeating the purpose of the lookahead.

Categories

Resources