Regular expression C# how to - c#

static void Main(string[] args)
{
    int count = 0;
    String s = "writeln('Helloa a') tung ('main')";
    String patern = @"\'+[\S+\s]*\'";
    Regex myRegex = new Regex(patern);
    foreach (Match regex in myRegex.Matches(s)) {
        Console.WriteLine(regex.Value.ToString());
    }
}
When run, it shows:
'Helloa a') tung ('main'
This is not what I want. I want it to print:
'Helloa a'
'main'
Can you help me?

Try using this regex:
@"\'[^']+\'"
It will print:
'Helloa a'
'main'

Add a ? after the * to make the * non-greedy:
@"\'+[\S+\s]*?\'"
http://rubular.com/r/tso5Uvc88v
REGEXPLANATION:
A greedy regex operator takes the largest possible string that it can between two single quotes, which in your case runs from the first quote to the last one in:
writeln('Helloa a') tung ('main')
A non-greedy operator takes the smallest possible section, which is what you wanted.
To make a + or * non-greedy, just put a ? after it.
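To see the difference side by side, here is a minimal sketch that runs the question's pattern in its greedy and lazy forms against the same input. The class name `GreedyVsLazy` and the helper `FindAll` are made up for this illustration; only `Regex.Matches` comes from .NET.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GreedyVsLazy
{
    // Hypothetical helper: returns the text of every match of pattern in input, in order.
    public static List<string> FindAll(string input, string pattern)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            results.Add(m.Value);
        }
        return results;
    }

    public static void Main()
    {
        string s = "writeln('Helloa a') tung ('main')";

        // Greedy *: a single match that runs from the first quote to the last one.
        foreach (string v in FindAll(s, @"'+[\S+\s]*'"))
        {
            Console.WriteLine("greedy: " + v);
        }

        // Lazy *?: each match stops at the nearest closing quote.
        foreach (string v in FindAll(s, @"'+[\S+\s]*?'"))
        {
            Console.WriteLine("lazy:   " + v);
        }
    }
}
```

The greedy form yields one match and the lazy form yields two, one per quoted string.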

You can use a lazy quantifier: replace * with *?, as Sam I am suggests, or use this solution, which allows escaped quotes:
@"\'(?>[^']+|(?<=\\)')*\'"
Details
(?> open an atomic group
[^']+ all that is not a quote one or more times
| OR
(?<=\\)' a quote preceded by a backslash
) close the atomic group
* repeat the group zero or more times
More information about atomic groups here.
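As a rough check of the atomic-group pattern, the sketch below applies it to an input containing a backslash-escaped quote. The sample input, the class name `EscapedQuotes` and the helper `FindQuoted` are assumptions made for this example, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class EscapedQuotes
{
    // Pattern from the answer above; the escapes on the quotes themselves are optional.
    private static readonly Regex Quoted = new Regex(@"'(?>[^']+|(?<=\\)')*'");

    // Hypothetical helper: returns every quoted string found in input.
    public static List<string> FindQuoted(string input)
    {
        var results = new List<string>();
        foreach (Match m in Quoted.Matches(input))
        {
            results.Add(m.Value);
        }
        return results;
    }

    public static void Main()
    {
        // Verbatim string, so the backslash before the inner quote is a literal backslash.
        string s = @"say('it\'s') and ('ok')";
        foreach (string v in FindQuoted(s))
        {
            Console.WriteLine(v);
        }
    }
}
```

The escaped quote stays inside the first match instead of ending it, so the input produces two matches rather than three fragments.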

I take it you only want to capture anything within the quotes and parentheses? Try this: \(\'(.+?)\'\)

Related

Regular Expression to find string and set as variable

I am trying to make a regular expression that will tell me if a string has {0#} where zero can be repeated. Once I confirm that a string has this I am then trying to set it to a variable so I can count the number of 0s and replace the # with another number. I have /([{0]})([#}])/g which works on detection but not on pulling it out to another variable.
Edit:
Thanks to all, the answer was
Regex regex = new Regex(@"\{(0+)(#)\}");
Match match = regex.Match(text);
if (match.Success)
{
    int zeros = Regex.Matches(match.Value, "0").Count;
}
Use this:
\{(0+)(#)\}
character {
then one or more occurrences of 0
a # sign
character }
You are super close. The problem you are having is because your capture group - the ( ) needs to be just around the zeroes. You also don't strictly need the other capture group unless you are doing something with it. You can rewrite your regex like this:
{(0+)#}
{ - match '{'
(0+) - match and capture one or more '0'
# - match '#'
} - match '}'
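Putting the capture group to use, here is a minimal sketch that counts the zeros and swaps the # for a number. The class `ZeroPlaceholder`, its methods, and the choice to keep the zeros and braces in the output are assumptions made for illustration, since the question does not say what the final string should look like.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ZeroPlaceholder
{
    // Group 1 captures only the run of zeros between '{' and '#'.
    private static readonly Regex Pattern = new Regex(@"\{(0+)#\}");

    // Returns how many zeros the first {0...0#} token contains, or 0 if there is none.
    public static int CountZeros(string text)
    {
        Match m = Pattern.Match(text);
        return m.Success ? m.Groups[1].Length : 0;
    }

    // Replaces the '#' inside every token with the given number (assumed output format).
    public static string FillIn(string text, int number)
    {
        return Pattern.Replace(text, m => "{" + m.Groups[1].Value + number + "}");
    }

    public static void Main()
    {
        string text = "id {000#} end";
        Console.WriteLine(CountZeros(text));
        Console.WriteLine(FillIn(text, 7));
    }
}
```

Reading `Groups[1].Length` avoids running a second regex just to count the zeros.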

.NET Regex - get parts of string that do not match pattern

I have this string
TEST_TEXT_ONE_20112017
I want to eliminate _20112017, which is an underscore followed by digits; the digits can vary. My goal is to have only
TEST_TEXT_ONE
So far I have this but I get the entire string, is there something I'm missing?
Regex r = new Regex(@"\b\w+[0-9]+\b");
MatchCollection words = r.Matches("TEST_TEXT_ONE_20112017");
foreach(Match word in words)
{
    string w = word.Groups[0].Value;
    //I still get the entire string
}
Notes for your consideration:
You should use parentheses to mark groups for capture, or use a named group. The first group (index 0) is the entire match; you probably want index 1 instead.
\w stands for a word character and already includes both underscores and digits. If you want to match anything before the numbers, consider using . instead of \w.
By default + is greedy, so your \w+ will consume the last underscore and all but the very last digit as well. You probably want to explicitly require an underscore before the last block of digits.
Consider whether you want to find a matching substring or require the entire string to match. If the latter, use the start and end markers ^ and $.
If you know you want to eliminate exactly 8 digits, you could give an explicit count, like \d{8}.
For example this should work:
Regex r = new Regex(@"^(.+)_\d+$");
MatchCollection words = r.Matches("TEST_TEXT_ONE_20112017");
foreach (Match word in words)
{
    string w = word.Groups[1].Value;
}
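The note about an explicit digit count could look like the sketch below. The class `DateSuffix` and the method `Strip` are hypothetical names, and returning the input unchanged when there is no match is an assumption about the desired behaviour.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DateSuffix
{
    // Anchored at both ends: some prefix, one underscore, then exactly 8 digits.
    private static readonly Regex Suffix = new Regex(@"^(.+)_\d{8}$");

    // Returns the part before the trailing _ddmmyyyy block, or the input if it has none.
    public static string Strip(string input)
    {
        Match m = Suffix.Match(input);
        return m.Success ? m.Groups[1].Value : input;
    }

    public static void Main()
    {
        Console.WriteLine(Strip("TEST_TEXT_ONE_20112017"));
        Console.WriteLine(Strip("TEST_TEXT_ONE"));
    }
}
```

With \d{8} a shorter numeric suffix such as _123 is left alone, which may or may not be what you want.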
Alternative
Use a zero-width positive lookahead assertion to check what comes next without capturing it. The syntax is (?=stuff). This makes the code shorter and avoids digging through Groups altogether:
Regex r = new Regex(@"^.+(?=_\d+$)");
String result = r.Match("TEST_TEXT_ONE_20112017").Value;
Note that we require the end marker $ within the positive lookahead group.
Regex r = new Regex(@"(\b.+)_([0-9]+)\b");
String w = r.Match("TEST_TEXT_ONE_20112017").Groups[1].Value; //TEST_TEXT_ONE
or:
String w = r.Match("TEST_TEXT_ONE_20112017").Groups[2].Value; //20112017
This seems a bit overkill for Regex in my opinion. As an alternative you could just split on the _ character and rebuild the string:
// Take is a LINQ extension method, so this needs "using System.Linq;"
private static string RemoveDate(string input)
{
    string[] parts = input.Split('_');
    return string.Join("_", parts.Take(parts.Length - 1));
}
Or if the date suffix is always the same length, you could also just substring:
private static string RemoveDateFixedLength(string input)
{
    //Removes last 9 characters (8 for date, 1 for underscore)
    return input.Substring(0, input.Length - 9);
}
However I feel like the first approach is better, this is just another option.

Detect Two Consecutive Single Quotes Inside Single Quotes

I'm struggling to get this regex pattern exactly right, and am open to other options outside of regex if someone has a better alternative.
The situation:
I'm basically looking to parse a T-SQL "in" clause against a text column in C#. So, I need to take a string value like this:
"'don''t', 'do', 'anything', 'stupid'"
And interpret that as a list of values (I'll take care of the double single quotes later):
"don''t"
"do"
"anything"
"stupid"
I have a regex that works for most cases, but I'm struggling to generalize it to the point where it will accept any character OR a doubled-up single quote inside my group: (?:')([a-z0-9\s(?:'(?='))]+)(?:')[,\w]*
I'm fairly experienced with regexes, but have rarely, if ever, found a need for look-arounds (so downgrade my assessment of my regex experience accordingly).
So, to put this another way, I'm wanting to take a string of comma-delimited values, each enclosed in single quotes but can contain doubled single quotes, and output each such value.
EDIT
Here's a non-working example with my current regex (my problem is I need to handle all characters in my grouping and stop when I encounter a single quote not followed by a second single quote):
"'don''t', 'do?', 'anything!', '#stupid$'"
If you still think about a regex-based solution, you can use the following regex:
'(?:''|[^'])*'
Or an "un-rolled" version suggested by @sln:
'[^']*(?:''[^']*)*'
It is fairly simple: it matches either a doubled single quotation mark or anything that is not a single quotation mark. There is no need for look-behinds or look-aheads. It does not take care of any escaped entities, but I do not see this requirement in your question.
Moreover, this regex will return matches that are easy to access and deal with:
var text = "'don''t', 'do', 'anything', 'stupid'";
var re = new Regex(@"'[^']*(?:''[^']*)*'"); // Updated thanks to @sln, previous (@"'(?:''|[^'])*'");
var match_values = re.Matches(text).Cast<Match>().Select(p => p.Value).ToList();
Output: four matches, each still wrapped in its single quotes: 'don''t', 'do', 'anything' and 'stupid'.
If you want to use the Capture Collection feature, you can grab them all in a single pass.
# @"""\s*(?:'([^']*(?:''[^']*)*)'\s*(?:,\s*|(?="")))+"""
"
\s*
(?:
     '
     (                     # (1 start)
          [^']*
          (?:
               '' [^']*
          )*
     )                     # (1 end)
     '
     \s*
     (?:
          , \s*
       |  (?= " )
     )
)+
"
C# code:
string strSrc = "\"'don''t', 'do', 'anything', 'stupid'\"";
Regex rx = new Regex(@"""\s*(?:'([^']*(?:''[^']*)*)'\s*(?:,\s*|(?="")))+""");
Match srcMatch = rx.Match(strSrc);
if (srcMatch.Success)
{
    CaptureCollection cc = srcMatch.Groups[1].Captures;
    for (int i = 0; i < cc.Count; i++)
        Console.WriteLine("{0} = '{1}'", i, cc[i].Value);
}
Output:
0 = 'don''t'
1 = 'do'
2 = 'anything'
3 = 'stupid'
Why don't you split on ', ':
Regex regex = new Regex(@"'\s*,\s*'");
string[] substrings = regex.Split(str);
And then take care of the extra single quotes by Trimming
Looks to me like you're over-thinking the problem. A quoted string with an escaped quote looks just like two strings without escaped quotes, one right after the other (not even spaces between them).
(?:'[^']*')+
Of course, you'll have to remove the enclosing quotes, but you probably had to do some post-processing anyway, to unescape the escaped quotes.
Also note that I'm not trying to validate the input or work around possible errors; for example, I don't bother matching the commas between the strings. If the input is well formed, this regex should be all you need.
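A minimal sketch of that idea, including the post-processing it mentions, might look like this. The class `AdjacentQuotes` and the method `ParseValues` are hypothetical names, and like the regex itself the code assumes well-formed input.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AdjacentQuotes
{
    // One or more quoted chunks with nothing between them count as a single value.
    private static readonly Regex Value = new Regex(@"(?:'[^']*')+");

    public static List<string> ParseValues(string input)
    {
        var values = new List<string>();
        foreach (Match m in Value.Matches(input))
        {
            // Drop the enclosing quotes, then unescape doubled quotes.
            string inner = m.Value.Substring(1, m.Value.Length - 2);
            values.Add(inner.Replace("''", "'"));
        }
        return values;
    }

    public static void Main()
    {
        foreach (string v in ParseValues("'don''t', 'do?', 'anything!', '#stupid$'"))
        {
            Console.WriteLine(v);
        }
    }
}
```

Because the character class is [^'], punctuation such as ?, ! and $ inside a value needs no special handling.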
In the interest of maintainability, I decided against a regex and followed the advice of using a state machine. Here's the crux of my implementation:
string currentTerm = string.Empty;
State currentState = State.BetweenTerms;
int index = 0; // position of c in valueToParse; assumed to start at 0, the declaration is not shown in the excerpt
foreach (char c in valueToParse)
{
    switch (currentState)
    {
        // if between terms, only need to do something if we encounter a single quote, signalling to start a new term
        // encloser is client-specified char to look for (e.g. ')
        case State.BetweenTerms:
            if (c == encloser)
            {
                currentState = State.InTerm;
            }
            break;
        case State.InTerm:
            if (c == encloser)
            {
                if (valueToParse.Length > index + 1 && valueToParse[index + 1] == encloser && valueToParse.Length > index + 2)
                {
                    // if next character is also encloser then add it and move on
                    currentTerm += c;
                }
                else if (currentTerm.Length > 0 && currentTerm[currentTerm.Length - 1] != encloser)
                {
                    // on an encloser and didn't just add encloser, so we are done
                    // converterFunc is a client-specified Func<string,T> to return terms in the specified type (to allow for converting to int, for example)
                    yield return converterFunc(currentTerm);
                    currentTerm = string.Empty;
                    currentState = State.BetweenTerms;
                }
            }
            else
            {
                currentTerm += c;
            }
            break;
    }
    index++;
}
if (currentTerm.Length > 0)
{
    yield return converterFunc(currentTerm);
}
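For comparison, here is a self-contained sketch of the same state-machine idea, simplified to return plain strings. It is not the author's implementation: the class `QuotedListParser`, the method `Parse`, the boolean state flag and the decision to collapse a doubled quote into a single one are all assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class QuotedListParser
{
    // Splits input such as "'don''t', 'do'" into its terms, unescaping doubled enclosers.
    public static List<string> Parse(string input, char encloser = '\'')
    {
        var terms = new List<string>();
        var current = new StringBuilder();
        bool inTerm = false; // false = between terms, true = inside a term

        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (!inTerm)
            {
                // Between terms: ignore everything until an opening encloser.
                if (c == encloser) inTerm = true;
            }
            else if (c != encloser)
            {
                current.Append(c);
            }
            else if (i + 1 < input.Length && input[i + 1] == encloser)
            {
                // Doubled encloser: keep one and skip the second.
                current.Append(encloser);
                i++;
            }
            else
            {
                // Lone encloser: the term is complete.
                terms.Add(current.ToString());
                current.Clear();
                inTerm = false;
            }
        }
        return terms;
    }

    public static void Main()
    {
        foreach (string term in Parse("'don''t', 'do?', 'anything!', '#stupid$'"))
        {
            Console.WriteLine(term);
        }
    }
}
```

Looking one character ahead and skipping it avoids having to inspect the last character already added to the term.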

Regex instance can not find more than one match even though it exists

I was using Regex and I wrote this:
static void Main(string[] args)
{
    string test = "this a string meant to test long space recognition    n    a";
    Regex regex = new Regex(@"[a-z][\s]{4,}[a-z]$");
    MatchCollection matches = regex.Matches(test);
    if (matches.Count > 1)
        Console.WriteLine("yes");
    else
    {
        Console.WriteLine("no");
        Console.WriteLine("the number of matches is "+matches.Count);
    }
}
In my opinion, the Matches method should find both "n    n" and "n    a". Nevertheless, it only manages to find "n    n", and I do not understand why.
The $ in your regular expression means that the pattern must occur at the end of the line. If you want to find all the long spaces, this simple expression suffices:
\s{4,}
If you really need to know whether the spaces are enclosed by a-z, you can search like this:
(?<=[a-z])\s{4,}(?=[a-z])
This uses the pattern...
(?<=prefix)find(?=suffix)
...and finds positions enclosed between a prefix and a suffix. The prefix and suffix are not part of the match, i.e. match.Value contains only the contiguous spaces. Therefore you don't get the "n is consumed" problem mentioned by Jon Skeet.
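The effect of the lookarounds can be checked with a small sketch that counts matches for both forms of the pattern. The class `LongGaps`, the helper `CountMatches` and the shortened sample string are assumptions made for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LongGaps
{
    // Hypothetical helper: number of non-overlapping matches of pattern in input.
    public static int CountMatches(string input, string pattern)
    {
        return Regex.Matches(input, pattern).Count;
    }

    public static void Main()
    {
        // Two gaps of four spaces that share the middle letter "n".
        string text = "recognition    n    a";

        // The letters are part of each match, so the middle "n" is used up by the first one.
        Console.WriteLine(CountMatches(text, @"[a-z]\s{4,}[a-z]"));

        // The lookarounds only test the letters, so both gaps are found.
        Console.WriteLine(CountMatches(text, @"(?<=[a-z])\s{4,}(?=[a-z])"));
    }
}
```

The first pattern reports a single match and the second reports two, because zero-width assertions never consume the surrounding letters.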
You have two problems:
1) You're anchoring the match to the end of the string. So actually, the value that's matched is "n...a", not "n...n"
2) The middle "n" is consumed by the first match, so can't be part of the second match. If you change that "n" to "nx" (and remove the $) you'll see "n...n" and "x...a"
Short but complete example:
using System;
using System.Text.RegularExpressions;
public class Test
{
    static void Main(string[] args)
    {
        string text = "ignored a    bc    d";
        Regex regex = new Regex(@"[a-z][\s]{4,}[a-z]");
        foreach (Match match in regex.Matches(text))
        {
            Console.WriteLine(match);
        }
    }
}
Result:
a    b
c    d
"I just do not understand why is that.."
I think the reason it is consumed by the first match is to prevent regexes like "\\w+s", designed to get every word that ends with an 's', from returning "ts", "ats" and "cats" when matched against "cats".
The regex machinery does one match; if you want more, you have to restart it yourself after the first match.

How to match an enclosing substring using regex in .NET

I'm trying to match * in id=resultsStats>*<nobr> to extract the middle bit.
This would match e.g.
id=resultsStats>3<nobr>
id=resultsStats>anything<nobr>
so I can extract the middle "3" or "anything"
How do I do this in .NET regex or otherwise?
(?<=id=resultsStats>).+?(?=<nobr>)
Use * instead of + if content is optional rather than required.
Example of use (F#):
open System.Text.RegularExpressions
let tryFindResultsStats input =
    let m = Regex.Match (input,
                         "(?<=id=resultsStats>).+?(?=<nobr>)",
                         RegexOptions.Singleline)
    if m.Success then Some m.Value else None
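For readers working in C#, the same lookaround pattern can be sketched as follows. The class `ResultsStats` and the method `TryExtract` are hypothetical names, and returning null when nothing matches is an assumption that mirrors the F# function's None.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ResultsStats
{
    // Returns the text between "id=resultsStats>" and "<nobr>", or null if it is absent.
    public static string TryExtract(string input)
    {
        Match m = Regex.Match(
            input,
            @"(?<=id=resultsStats>).+?(?=<nobr>)",
            RegexOptions.Singleline);
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(TryExtract("id=resultsStats>3<nobr>"));
        Console.WriteLine(TryExtract("id=resultsStats>anything<nobr>"));
    }
}
```

Since the delimiters sit inside lookarounds, `m.Value` is already the middle part and no capture group is needed.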
I'm not a regex expert but something like this might work:
@"\>\*{1}\<"
This means "match a single asterisk between the lt/gt characters". You just need to make sure you escape the asterisk because it has special meaning in regular expressions.
Hope this helps!
If you are looking to match a literal *, you need to escape it with a backslash. Note that inside a regular C# string literal you have to escape the backslash as well (a bare \* is not a recognized escape sequence there):
"\\*"
Try this:
using System;
using System.Text.RegularExpressions;
namespace SO6312611
{
    class Program
    {
        static void Main()
        {
            string input = "id=resultsStats>anything<nobr>";
            Regex r = new Regex("id=resultsStats>(?<data>[^<]*)<nobr>");
            Match m = r.Match(input);
            Console.WriteLine("Matched: >{0}<", m.Groups["data"]);
        }
    }
}
