Regex.split, how to read left of the matched pattern - c#

I am trying to convert a Perl script to a C# 3.5 routine.
The Perl code I have is:
if($work =~ /\<[0-9][0-9][0-9]\>/){
$left = $`;
$match = $&;
$work = $';
}
In C# I wrote the following code:
string[] sSplit = Regex.Split(work, @"\<[0-9][0-9][0-9]\>");
if (sSplit.Length > 2)
{
left = sSplit[0];
match = sSplit[1];
work = sSplit[2];
}
However, the above does not give me the matched pattern in sSplit[1]; it gives me the content to the right of the matched string instead.

Regex.Split is not what you need. The equivalent to =~ /.../ is Regex.Match.
However, Regex.Match has no equivalent to Perl’s $` or $', so you need to use a workaround, but I think it’s a fair one:
var m = Regex.Match(work, @"^(.*?)(\<[0-9][0-9][0-9]\>)(.*)$", RegexOptions.Singleline);
if (m.Success)
{
// Groups[0] is the entire match, so the captures start at index 1
left = m.Groups[1].Value;
match = m.Groups[2].Value; // perhaps with Convert.ToInt32()?
work = m.Groups[3].Value;
}
Alternatively, you can use the match index and length to get the stuff:
var m = Regex.Match(work, @"\<[0-9][0-9][0-9]\>");
if (m.Success)
{
left = work.Substring(0, m.Index);
match = m.Value; // perhaps with Convert.ToInt32()?
work = work.Substring(m.Index + m.Length);
}
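The index-based approach above can be wrapped in a small helper. This is only a minimal sketch: the class name PerlStyleMatch and the method TrySplitAround are made up for illustration and are not part of .NET or of the original answer.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper (not part of .NET): mimics Perl's $`, $& and $'.
public static class PerlStyleMatch
{
    // Returns false when the pattern does not match. Otherwise fills in
    // the text before the match, the match itself, and the text after it.
    public static bool TrySplitAround(string input, string pattern,
        out string left, out string match, out string right)
    {
        Match m = Regex.Match(input, pattern);
        if (!m.Success)
        {
            left = match = right = null;
            return false;
        }
        left = input.Substring(0, m.Index);           // like $`
        match = m.Value;                              // like $&
        right = input.Substring(m.Index + m.Length);  // like $'
        return true;
    }
}
```

For example, calling TrySplitAround on "abc<123>def" with the pattern <[0-9][0-9][0-9]> should yield "abc", "<123>" and "def".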

When trying out regular expressions, I always recommend RegexHero, which is an online tool that visualizes your .NET regular expressions. In this case, use Regex.Match and use Groups. That'll give you what you want.
Note that the backslashes in \< and \> are not needed in C# (nor in Perl, btw).
Also note that $`, $& and $' have equivalents in C# when used in a replacement expression. If that's what you need in the end, you can use these "magic variables", but only in Regex.Replace.

A split is usually asking to throw away the delimiters. Perl acts just the same way (without the verboten $& type variables.)
You capture delimiters in Perl by putting parens around them:
my @parts = split /(<[0-9][0-9][0-9]>)/; # includes the delimiter
my @parts = split /<[0-9][0-9][0-9]>/; # doesn't include the delimiter
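The same trick carries over to .NET: when the pattern passed to Regex.Split contains a capturing group, the captured text is included in the returned array. A minimal sketch follows; the class and method names are illustrative only.

```csharp
using System.Text.RegularExpressions;

// Illustrative wrapper showing Regex.Split with a capturing group.
public static class SplitKeepingDelimiter
{
    public static string[] Split(string input)
    {
        // The parentheses make Regex.Split keep each matched delimiter
        // as its own element between the surrounding pieces.
        return Regex.Split(input, @"(<[0-9][0-9][0-9]>)");
    }
}
```

For the input "abc<123>def" this returns "abc", "<123>" and "def". Note that Regex.Split splits at every occurrence, so if the input contains more than one delimiter, the third element is only the text up to the next delimiter, not everything to the right as with Perl's $'.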

Related

match first digits before # symbol

How to match all first digits before # in this line
26909578#Sbrntrl_7x06-lilla.avi#356028416#2012-10-24 09:06#0#http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html#[URL=http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html]http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html[/URL]#http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html#http://bitshare.com/?f=dvk9o1oz#http://bitshare.com/delete/dvk9o1oz/4511e6f3612961f961a761adcb7e40a0/Sbrntrl_7x06-lilla.avi.html
I'm trying to get this number: 26909578
My attempt:
string text = @"26909578#Sbrntrl_7x06-lilla.avi#356028416#2012-10-24 09:06#0#http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html#[URL=http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html]http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html[/URL]#http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html#http://bitshare.com/?f=dvk9o1oz#http://bitshare.com/delete/dvk9o1oz/4511e6f3612961f961a761adcb7e40a0/Sbrntrl_7x06-lilla.avi.html";
MatchCollection m1 = Regex.Matches(text, @"(.+?)#", RegexOptions.Singleline);
but then it outputs all of the text.
Make it explicit that it has to start at the beginning of the string:
@"^(.+?)#"
Alternatively, if you know that this will always be a number, restrict the possible characters to digits:
@"^\d+"
Alternatively use the function Match instead of Matches. Matches explicitly says, "give me all the matches", while Match will only return the first one.
Or, in a trivial case like this, you might also consider a non-RegEx approach. The IndexOf() method will locate the '#' and you could easily strip off what came before.
I even wrote a sscanf() replacement for C#, which you can see in my article A sscanf() Replacement for .NET.
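The IndexOf() suggestion above could look roughly like this. It is a sketch only; the class and method names are invented for the example, and returning the whole string when no '#' is present is an assumption about the desired behaviour.

```csharp
// Illustrative helper: extracts the first '#'-separated field without regex.
public static class FirstField
{
    // Returns the text before the first '#', or the whole string
    // when the input contains no '#' at all (an assumed fallback).
    public static string BeforeHash(string text)
    {
        int pos = text.IndexOf('#');
        return pos < 0 ? text : text.Substring(0, pos);
    }
}
```

With the line from the question, FirstField.BeforeHash(text) should return "26909578".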
If you don't want to (or don't like to) use regex, use a StringBuilder and just loop until you hit the #.
Like this:
StringBuilder sb = new StringBuilder();
string yourdata = "yourdata";
int i = 0;
// stop at the first '#' or at the end of the string, whichever comes first
while (i < yourdata.Length && yourdata[i] != '#')
{
sb.Append(yourdata[i]);
i++;
}
// when you get to that # your StringBuilder will have the number you want in it, so return it with .ToString()
string answer = sb.ToString();
The entire string (except the final url) is composed of segments that can be matched by (.+?)#, so you will get several matches. Retrieve only the first match from the collection returned by matching .+?(?=#)

.NET regex replace using backreference

I have a fairly long string that contains sub strings with the following format:
project[1]/someword[1]
project[1]/someotherword[1]
There will be about 10 or so instances of this pattern in the string.
What I want to do is to be able to replace the second integer in square brackets with a different one. So the string would look like this for instance:
project[1]/someword[2]
project[1]/someotherword[2]
I'm thinking that regular expressions are what I need here. I came up with the regex:
project\[1\]/.*\[([0-9])\]
Which should capture the group [0-9] so I can replace it with something else. I'm looking at MSDN Regex.Replace() but I'm not seeing how to replace part of a string that is captured with a value of your choosing. Any advice on how to accomplish this would be appreciated. Thanks much.
*Edit:* After working with @Tharwen some, I have changed my approach a bit. Here is the new code I am working with:
String yourString = @"<element w:xpath=""/project[1]/someword[1]""/> <anothernode></anothernode> <another element w:xpath=""/project[1]/someotherword[1]""/>";
int yourNumber = 2;
string anotherString = string.Empty;
anotherString = Regex.Replace(yourString, @"(?<=project\[1\]/.*\[)\d(?=\]"")", yourNumber.ToString());
Matched groups are replaced using the $1, $2 syntax as follows :-
csharp> Regex.Replace("Meaning of life is 42", @"([^\d]*)(\d+)", "$1($2)");
"Meaning of life is (42)"
If you are new to regular expressions in .NET I recommend http://www.ultrapico.com/Expresso.htm
Also http://www.regular-expressions.info/dotnet.html has some good stuff for quick reference.
I've adapted yours to use a lookbehind and a lookahead so that it only matches a digit which is preceded by 'project[1]/xxxxx[' and followed by ']"' (the closing quote of the xpath attribute in your edit):
(?<=project\[1\]/.*\[)\d(?=\]")
Then, you can use:
String yourString = @"<element w:xpath=""/project[1]/someword[1]""/>";
int yourNumber = 2;
yourString = Regex.Replace(yourString, @"(?<=project\[1\]/.*\[)\d(?=\]"")", yourNumber.ToString());
I think maybe you were confused because Regex.Replace has lots of overloads which do slightly different things. I've used this one.
If you want to process the value of a captured group before replacing it, you'll have to separate the different parts of the string, make your modifications and put them back together.
string test = "project[1]/someword[1]\nproject[1]/someotherword[1]\n";
string result = string.Empty;
foreach (Match match in Regex.Matches(test, @"(project\[1\]/.*\[)([0-9])(\]\n)"))
{
result += match.Groups[1].Value;
result += (int.Parse(match.Groups[2].Value) + 1).ToString();
result += match.Groups[3].Value;
}
If you just want to replace text verbatim, it's easier: Regex.Replace(test, @"abc(.*)cba", @"cba$1abc").
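As an alternative to the loop above (this is not from the original answer), Regex.Replace also has an overload that takes a MatchEvaluator delegate, which lets you compute each replacement from the captured groups. A minimal sketch, with an invented class and method name:

```csharp
using System.Text.RegularExpressions;

// Illustrative helper: bumps the last bracketed digit on each line by one.
public static class IndexBumper
{
    public static string IncrementLastIndex(string input)
    {
        // Group 1 = everything up to the opening bracket,
        // group 2 = the single digit, group 3 = the closing bracket.
        return Regex.Replace(
            input,
            @"(project\[1\]/.*\[)([0-9])(\])",
            m => m.Groups[1].Value
                 + (int.Parse(m.Groups[2].Value) + 1)
                 + m.Groups[3].Value);
    }
}
```

Unlike the loop, this keeps any text between matches intact, because Regex.Replace only rewrites the matched parts of the input.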
You can use String.Replace(String, String).
For example:
yourString = yourString.Replace("someword[1]", "someword[2]");

Regex accepting all strings, wrongly

I am trying to extract substrings if they have a certain format. The substring I'm looking for is [CENAOD(xyz)]. I have written the following code, but when I run it in a loop, it says all results match, which is wrong. What have I done wrong?
string strRegex = @"(\[CENAOD\((\S|\W)*\)\])*";
string strCenaOd = sReader["intro"].ToString();
if (Regex.IsMatch(strCenaOd, strRegex, RegexOptions.IgnoreCase))
{
// string = (want to read the content of ( ), i.e. xyz in the example)
}
Remove the outer ( ... )*.
That says no match is a good match too.
Or use + instead of *.
Adding to @Kent's and @leppie's answers, the code surrounding the regex needs work, too. I think this is what you were trying for:
string strRegex = @"\[CENAOD\(([^)]*)\)\]";
string strCenaOd = sReader["intro"].ToString();
Match m = Regex.Match(strCenaOd, strRegex, RegexOptions.IgnoreCase);
if (m.Success)
{
string content = m.Groups[1].Value;
// ...
}
IsMatch() is a simple yes-or-no check; it doesn't provide any way to retrieve the matched text.
I especially want to comment on (\S|\W)*, from your regex. First, \S|\W is a very inefficient way to match any character. . is usually all you need, but as Kent pointed out, [^)] (i.e., any character except )) is more appropriate in this case. Also, by placing the * outside the round brackets, you'll only ever capture the last character. ([^)]*) captures all of them. For more details, read this.
if you said "all strings", how about:
\[CENAOD\([^\)]*\)\]

Make groups with regular expression like in perl?

In Perl, if I use the regex /(\w+)\.(\w+)/ on the string "1A3.25D", the global variable $1 stores "1A3" and $2 stores "25D".
Is there a way to do this in C#?
Certainly, look at this example:
var pattern = @"^\D*(\d+)$";
var result = Regex.Match("Some text 10", pattern);
var num = int.Parse(result.Groups[1].Value); // 10
Groups[0] is the entire match (in this case the entire line, because I use ^ and $).
If you use Regex.Replace(...) you can use $X to incorporate the groups as you are used to :-)
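Applied to the pattern and input from the question, the same idea could be sketched like this. The class and method names are made up for the example.

```csharp
using System.Text.RegularExpressions;

// Illustrative helper: reads the two groups that Perl would put in $1 and $2.
public static class DottedPair
{
    public static bool TryParse(string input, out string first, out string second)
    {
        Match m = Regex.Match(input, @"(\w+)\.(\w+)");
        first = m.Success ? m.Groups[1].Value : null;   // Perl's $1
        second = m.Success ? m.Groups[2].Value : null;  // Perl's $2
        return m.Success;
    }
}
```

For the input "1A3.25D", first should come back as "1A3" and second as "25D".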

How can I find a string after a specific string/character using regex

I am hopeless with regex (C#), so I would appreciate some help:
Basically, I need to parse a text and I need to find the following information inside the text:
Sample text:
KeywordB:***TextToFind* the rest is not relevant but **KeywordB: Text ToFindB and then some more text.
I need to find the word(s) after a certain keyword which may end with a “:”.
[UPDATE]
Thanks Andrew and Alan: Sorry for reopening the question but there is quite an important thing missing in that regex. As I wrote in my last comment, Is it possible to have a variable (how many words to look for, depending on the keyword) as part of the regex?
Or: I could have a different regex for each keyword (there will only be a handful). But I still don't know how to have the "words to look for" constant inside the regex.
The basic regex is this:
var pattern = @"KeywordB:\s*(\w*)";
\s* = any amount of whitespace
\w* = 0 or more word characters (letters, digits and underscore)
() = make a group, so you can extract the part that matched
var pattern = @"KeywordB:\s*(\w*)";
var test = @"KeywordB: TextToFind";
var match = Regex.Match(test, pattern);
if (match.Success) {
Console.Write("Value found = {0}", match.Groups[1]);
}
If you have more than one of these on a line, you can use this:
var test = @"KeywordB: TextToFind KeyWordF: MoreText";
var matches = Regex.Matches(test, @"(?:\s*(?<key>\w*):\s?(?<value>\w*))");
foreach (Match f in matches ) {
Console.WriteLine("Keyword '{0}' = '{1}'", f.Groups["key"], f.Groups["value"]);
}
Also, check out the regex designer here: http://www.radsoftware.com.au/. It is free, and I use it constantly. It works great to prototype expressions. You need to rearrange the UI for basic work, but after that it's easy.
(fyi) The "@" before a string means that \ no longer means something special, so you can type @"c:\fun.txt" instead of "c:\\fun.txt".
Let me know if I should delete the old post, but perhaps someone wants to read it.
The way to do a "words to look for" inside the regex is like this:
regex = @"(Key1|Key2|Key3|LastName|FirstName|Etc):"
What you are doing probably isn't worth the effort in a regex, though it can probably be done the way you want (still not 100% clear on requirements, though). It involves looking ahead to the next match, and stopping at that point.
Here is a re-write as a regex + regular functional code that should do the trick. It doesn't care about spaces, so if you ask for "Key2" like below, it will separate it from the value.
string[] keys = {"Key1", "Key2", "Key3"};
string source = "Key1:Value1Key2: ValueAnd A: To Test Key3: Something";
FindKeys(keys, source);
private void FindKeys(IEnumerable<string> keywords, string source) {
var found = new Dictionary<string, string>(10);
var keys = string.Join("|", keywords.ToArray());
var matches = Regex.Matches(source, @"(?<key>" + keys + "):",
RegexOptions.IgnoreCase);
foreach (Match m in matches) {
var key = m.Groups["key"].ToString();
var start = m.Index + m.Length;
var nx = m.NextMatch();
var end = (nx.Success ? nx.Index : source.Length);
found.Add(key, source.Substring(start, end - start));
}
foreach (var n in found) {
Console.WriteLine("Key={0}, Value={1}", n.Key, n.Value);
}
}
And the output from this is:
Key=Key1, Value=Value1
Key=Key2, Value= ValueAnd A: To Test
Key=Key3, Value= Something
/KeywordB\: (\w+)/
This matches the word that comes after your keyword. As you didn't mention any terminator, I assumed that you wanted only the word next to the keyword.
