string manipulation in regex - c#

I have a problem with string manipulation. Here is the code:
string str = "LDAP://company.com/OU=MyOU1 Control,DC=MyCompany,DC=com";
Regex regex = new Regex("OU=\\w+");
var result = regex.Matches(str);
var strList = new List<string>();
foreach (var item in result)
{
strList.Add(item.ToString().Remove(0,3));
}
Console.WriteLine(string.Join("/",strList));
The result I am getting is "MyOU1" instead of "MyOU1 Control".
Please help, thanks.

If you want the space character to be matched as well, you need to include it in your regex. \w only matches word characters, which do not include spaces.
Regex regex = new Regex(@"OU=[\w\s]+");
This matches word characters (\w) and whitespace characters (\s).
(The @ in front of the string makes it a verbatim string literal. It is just for convenience: with it, you don't need to escape backslashes.)
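A self-contained sketch of this suggestion, assuming you want the OU names joined with "/" as in the question. It uses a capturing group instead of the question's Remove(0, 3), so the "OU=" prefix never lands in the result. The class and method names are illustrative, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class OuWithSpaces
{
    // Capture word characters and whitespace that follow "OU=".
    private static readonly Regex OuPattern = new Regex(@"OU=([\w\s]+)");

    public static string ExtractOus(string ldapPath)
    {
        var names = new List<string>();
        foreach (Match m in OuPattern.Matches(ldapPath))
        {
            // Group 1 holds the OU name without the "OU=" prefix.
            names.Add(m.Groups[1].Value);
        }
        return string.Join("/", names);
    }

    public static void Main()
    {
        string str = "LDAP://company.com/OU=MyOU1 Control,DC=MyCompany,DC=com";
        Console.WriteLine(ExtractOus(str)); // MyOU1 Control
    }
}
```

Because \s is included, any whitespace right before the comma would also end up in the group; call Trim() on the value if that matters.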

Either add the space to the allowed list (\w doesn't match a space), or use the fact that the comma acts as a separator.
Regex regex = new Regex("OU=(\\w|\\s)+");
OR
Regex regex = new Regex("OU=[^,]+");
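A minimal sketch of the comma-separator approach, run against a hypothetical path with two nested OUs to show that each OU is picked up separately. It assumes the distinguished name contains no escaped commas (LDAP DNs can contain "\,", which [^,]+ would split on).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class OuCommaSeparated
{
    // Everything after "OU=" up to the next comma is treated as the OU name.
    private static readonly Regex OuPattern = new Regex(@"OU=([^,]+)");

    public static string ExtractOus(string ldapPath) =>
        string.Join("/", OuPattern.Matches(ldapPath)
                                  .Cast<Match>()
                                  .Select(m => m.Groups[1].Value));

    public static void Main()
    {
        // Hypothetical path with two OUs, used only for illustration.
        string path = "LDAP://company.com/OU=Sub Unit,OU=MyOU1 Control,DC=MyCompany,DC=com";
        Console.WriteLine(ExtractOus(path)); // Sub Unit/MyOU1 Control
    }
}
```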

Related

Matching words with a forward-slash via Regular Expression

I am trying to match words that start with a forward slash in C#.
For example, /exit. I have tried using the regex \b(/exit)\b, but for some reason it doesn't match.
Here's a sample code that I am trying out:
static void Main(string[] args)
{
var commands= new List<string>();
commands.Add("/exit");
var listOfString = commands.Select(Regex.Escape).ToList();
var joinTheWords = string.Join("|", listOfString);
var regexPattern = $@"\b({joinTheWords})\b";
var theRegex= new Regex(regexPattern, RegexOptions.IgnoreCase);
Console.WriteLine(theRegex);
Console.WriteLine(theRegex.Match(@"/exit").Success);
Console.WriteLine("Press any key to exit.");
Console.ReadLine();
}

At the beginning of the string "/exit" there's no word boundary (\b), because "/" isn't a letter, digit, or underscore. (There's a word boundary just after the "/".)
You could roll your own "smart word boundary" that treats these forward slashes as valid "word" characters:
(?:((?<!/)\B(?=/))|\b(?=\w))
In English, this means that you must have either a "non-word boundary followed by a slash that doesn't have a preceding slash" ((?<!/)\B(?=/)), OR "a regular word boundary, provided you can 'see' a word character after it" (\b(?=\w)). By using \B with "/", we get "pseudo word boundary" behavior:
var commands = new List<string>();
commands.Add("/exit");
List<String> listOfString = commands.Select(Regex.Escape).ToList();
String joinTheWords = string.Join("|", listOfString);
var regexPattern = $@"(?:(?:(?<!/)\B)(?=/)|\b(?=\w))({joinTheWords})\b";
var theRegex = new Regex(regexPattern, RegexOptions.IgnoreCase);
Console.WriteLine(theRegex);
Console.WriteLine(theRegex.Match("/exit").Success);
Console.WriteLine("Press any key to exit.");
Console.ReadLine();
There may be (and probably are) simpler ways to approach this, especially if you can "preprocess" the list of pattern fragments first: replace special characters with static tokens, match with regular \b's, then replace them back.
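To check how the "smart word boundary" pattern from this answer behaves, the sketch below builds it for a command list and tries a few inputs. The class name and the sample inputs are illustrative assumptions; the pattern itself is the one from the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class SlashCommandMatcher
{
    // Builds the answer's pattern: (?<!/)\B(?=/) handles a leading "/", \b(?=\w) handles plain words.
    public static Regex Build(IEnumerable<string> commands)
    {
        string joined = string.Join("|", commands.Select(c => Regex.Escape(c)));
        return new Regex($@"(?:(?:(?<!/)\B)(?=/)|\b(?=\w))({joined})\b", RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        Regex regex = Build(new[] { "/exit" });
        foreach (string input in new[] { "/exit", "please /exit now", "/EXIT", "//exit", "/exits", "a/exit" })
        {
            Console.WriteLine($"{input,-18} {regex.IsMatch(input)}");
        }
    }
}
```

Tracing the lookarounds, "/exit", "please /exit now" and "/EXIT" should print True, while "//exit", "/exits" and "a/exit" should print False. The last one may be surprising: \B only succeeds there when the "/" follows a non-word character or the start of the string, so a slash glued to a preceding letter is rejected.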

Since you already know the / is included in all the words, you can factor it out of your command list.
Change commands.Add("/exit"); to commands.Add("exit");, then escape the metacharacters and join as normal.
Since you only care that the / is not preceded by another /, all that's needed at the beginning is (?<!/)/.
For the end, I'd use a conditional word boundary, (?(?<=\w)\b), which only requires \b when the preceding character is a word character.
That's all you really need.
Putting it all together, the regex line would be:
var regexPattern = $@"(?<!/)(/(?:{joinTheWords}))(?(?<=\w)\b)";

A not-so-clean (but simple) way to find words with forward slashes is to replace the forward slash with an accepted but never-used string, and use that in your regex search. The snippet is JavaScript:
let str = "this is a search string with /exit and/exit";
const key = "/exit";
const value = "/EXIT";
str = str.replace(/\//g, "_a_a_");
const k = key.replace(/\//g, "_a_a_");
const regex = new RegExp("\\b" + k + "\\b", "g");
str = str.replace(regex, value);
str = str.replace(/_a_a_/g, "/"); // global, so every placeholder is restored
console.log(str);
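For C#, the language of the question, the same placeholder trick could look like the sketch below. The placeholder "_a_a_" is the answer's arbitrary choice and is assumed never to occur in the input; the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SlashWordReplacer
{
    // Hypothetical placeholder; this sketch assumes it never occurs in the input text.
    private const string Placeholder = "_a_a_";

    public static string ReplaceSlashWord(string input, string key, string value)
    {
        // Swap "/" for word characters so \b treats the slash as part of the word.
        string text = input.Replace("/", Placeholder);
        string pattern = @"\b" + Regex.Escape(key.Replace("/", Placeholder)) + @"\b";

        // A MatchEvaluator inserts value literally, so "$" in value is not a substitution.
        text = Regex.Replace(text, pattern, m => value);

        // string.Replace restores every remaining placeholder, not just the first.
        return text.Replace(Placeholder, "/");
    }

    public static void Main()
    {
        Console.WriteLine(ReplaceSlashWord("this is a search string with /exit and/exit", "/exit", "/EXIT"));
        // this is a search string with /EXIT and/exit
    }
}
```

The second "/exit" is left alone because, after substitution, it reads "and_a_a_exit": "d" and "_" are both word characters, so there is no \b between them.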

Parsing with regex between html tags

I'm trying to extract text between html tags on this way:
var regex = new Regex(@"<td>ID zahtjeva: <b>"".*?""</b></td>");
var match = regex.Match(@"<td>ID zahtjeva: <b>438398694</b></td>");
var result = match.Groups[1].Value;
The result should be the text between the <b> and </b> tags, but I get an empty string. I'm not sure what I'm missing in the regex.

Your regex should be the following (assuming you're only matching numbers):
var regex = new Regex(@"<td>ID zahtjeva: <b>(\d+)</b></td>");
Your previous regex was searching for " characters, which don't exist in your sample input. You also need to define a capturing group with (); otherwise match.Groups[1] is empty.
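A runnable sketch of this answer, assuming the ID is always numeric as the answer does. The class and method names are illustrative. It checks Success before reading the group, so a non-matching input yields an empty string.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RequestIdParser
{
    // Group 1 captures only the digits between <b> and </b>.
    private static readonly Regex IdPattern = new Regex(@"<td>ID zahtjeva: <b>(\d+)</b></td>");

    public static string ExtractId(string html)
    {
        Match match = IdPattern.Match(html);
        return match.Success ? match.Groups[1].Value : string.Empty;
    }

    public static void Main()
    {
        Console.WriteLine(ExtractId("<td>ID zahtjeva: <b>438398694</b></td>")); // 438398694
    }
}
```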

Change your regex like this...
var regex = new Regex(@"<td>ID zahtjeva: <b>(.*?)</b></td>");

As per MSDN, the pattern \b(?<word>\w+)\s+(\k<word>)\b breaks down as:
\b - Start the match at a word boundary.
(?<word>\w+) - Match one or more word characters up to a word boundary. Name this captured group word.
\s+ - Match one or more white-space characters.
(\k<word>) - Match the captured group that is named word.
\b - Match a word boundary.
So for your problem, it would be:
var regex = new Regex(@"<td>ID zahtjeva: <b>(.*?)</b></td>");
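The MSDN breakdown introduces named groups ((?<word>...)). Applying that idea to this question gives a sketch like the following; the group name id and the class name are arbitrary choices for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // (?<id>...) names the group so it can be read by name instead of by index.
    private static readonly Regex BoldPattern = new Regex(@"<td>ID zahtjeva: <b>(?<id>.*?)</b></td>");

    public static string ExtractBold(string html)
    {
        Match match = BoldPattern.Match(html);
        return match.Success ? match.Groups["id"].Value : string.Empty;
    }

    public static void Main()
    {
        Console.WriteLine(ExtractBold("<td>ID zahtjeva: <b>438398694</b></td>")); // 438398694
    }
}
```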

C# Regex search string for text including surrounding brackets

I would like to search a string for '[E1010]', '[E1011]' or '[E1012]'. Currently, I can only search successfully without the brackets []. How can I adjust my regex to include the text surrounded by the brackets, as it appears in my sClientErrors variable?
Thanks!
string sClientErrors = "Bla Blah \"30\" [E1011]\r\nBlah Blah \"44\" [E1012]";
Regex myRegexE10 = new Regex(@"\bE1010\b");
Regex myRegexE11 = new Regex(@"\bE1011\b");
Regex myRegexE12 = new Regex(@"\bE1012\b");
if (myRegexE10.IsMatch(sClientErrors) || myRegexE11.IsMatch(sClientErrors) || myRegexE12.IsMatch(sClientErrors))
{
// do code here...
}

By adding the brackets:
Regex myRegexE10 = new Regex(@"\[E1010]");
or
Regex myRegexE1x = new Regex(@"\[E101[012]]");
if (myRegexE1x.IsMatch(sClientErrors)) { ...
Note that once you add the brackets, word boundaries are no longer necessary. Note too that you don't need to escape a closing square bracket that is outside a character class.
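If you also want to know which codes were found, a sketch along these lines could work. The capturing group around E101[012] is an addition for this example (it pulls out the code without the brackets), and the class name is hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ClientErrorScanner
{
    // \[ is a literal "[", [012] is a character class, and the final "]" is a literal.
    private static readonly Regex ErrorCode = new Regex(@"\[(E101[012])]");

    public static string[] FindCodes(string text) =>
        ErrorCode.Matches(text).Cast<Match>().Select(m => m.Groups[1].Value).ToArray();

    public static void Main()
    {
        string sClientErrors = "Bla Blah \"30\" [E1011]\r\nBlah Blah \"44\" [E1012]";
        Console.WriteLine(string.Join(", ", FindCodes(sClientErrors))); // E1011, E1012
    }
}
```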

You can put a "\" in front of a character you want to match literally, so you would use:
Regex myRegexE10 = new Regex(@"\[\bE1010\b\]");
You can also use "\\" if you need to match a literal backslash, for example to find the text "\s", which on its own is a regex escape (whitespace).

regex pattern for tags needed

Howzit,
I need help with the following please.
I need to find tags in a string. These tags start with {{ and end with }}, there will be multiple tags in the string I receive.
So far I have this, but it doesn't find any matches, what am I missing here?
List<string> list = new List<string>();
string pattern = "{{*}}";
Regex r = new Regex(pattern, RegexOptions.IgnoreCase);
Match m = r.Match(text);
while (m.Success)
{
list.Add(m.Groups[0].Value);
m = m.NextMatch();
}
return list;
I even tried string pattern = "{{[A-Za-z0-9]}}";
Thanks
PS. I know close to nothing about regex.

Not only do you want to use {{.+?}} as your regex; if a tag can span several lines, you also need to pass RegexOptions.Singleline. That treats your entire string as a single line, so . also matches \n (which it normally does not).

Try {{.+}}. The .+ means there has to be at least one character as part of the tag.
EDIT:
To capture the string containing your tags you can do {{(.+)}} and then tokenize your match with the Tokenize or Scanner class?

I would recommend trying something like the following:
List<string> list = new List<string>();
string pattern = "{{(.*?)}}";
Regex r = new Regex(pattern, RegexOptions.IgnoreCase);
Match m = r.Match(text);
while (m.Success)
{
list.Add(m.Groups[1].Value);
m = m.NextMatch();
}
return list;
the regex specifies:
{{ # match {{ literally
( # begin capturing into group #1
.*? # match any characters, from zero to infinite, but be lazy*
) # end capturing group
}} # match }} literally
"Lazy" means that, at each step, the engine first tries to match the rest of the pattern (}}) and only adds one more character to the capturing group when }} does not match at the current position. Hope that makes sense.
I changed your code by modifying the regex and by extracting the first capturing group from the match object (m.Groups[1].Value) instead of the entire match.
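A self-contained sketch of the lazy-group approach on a hypothetical template string, with a greedy version alongside for contrast. The braces are escaped here only to be explicit, and Singleline is included on the assumption that tags may span lines; the class and method names are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagExtractor
{
    // Lazy .*? stops at the first "}}"; Singleline lets "." also match line breaks inside a tag.
    private static readonly Regex TagPattern = new Regex(@"\{\{(.*?)\}\}", RegexOptions.Singleline);

    public static List<string> ExtractTags(string text)
    {
        var list = new List<string>();
        foreach (Match m in TagPattern.Matches(text))
        {
            list.Add(m.Groups[1].Value); // group 1 = tag content without the braces
        }
        return list;
    }

    public static void Main()
    {
        string text = "Hello {{name}}, your order {{orderId}} ships on {{date}}.";
        Console.WriteLine(string.Join(" | ", ExtractTags(text))); // name | orderId | date

        // Greedy .* runs to the LAST "}}", swallowing everything in between.
        Console.WriteLine(Regex.Match(text, @"\{\{(.*)\}\}").Groups[1].Value);
        // name}}, your order {{orderId}} ships on {{date
    }
}
```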

{{.*?}} or
{{.+?}}
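. - means any character (except \n, unless RegexOptions.Singleline is set)
? - makes the preceding quantifier lazy, so the match stops at the first }} instead of running on into the next tag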

Regex Pattern - Alphanumeric

The pattern is [username], where username is any string containing only alphanumeric characters, between 1 and 12 characters long.
My code:
Regex pat = new Regex(@"\[[a-zA-Z0-9_]{1,12}\]");
MatchCollection matches = pat.Matches(accountFileData);
foreach (Match m in matches)
{
string username = m.Value.Replace("[", "").Replace("]", "");
MessageBox.Show(username);
}
This gives me one blank match.

This gets you a name inside brackets (the match doesn't contain the square brackets):
(?<=\[)[A-Za-z0-9]{1,12}(?=\])
You could use it like this:
Regex pat = new Regex(@"(?<=\[)[A-Za-z0-9]{1,12}(?=\])");
MatchCollection matches = pat.Matches(accountFileData);
foreach (Match m in matches)
{
MessageBox.Show(m.Value);
}
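A console-friendly sketch of the lookaround pattern above. The question's accountFileData isn't shown, so the sample string here is hypothetical, and MessageBox.Show is replaced with Console.WriteLine so it runs outside WinForms.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class UsernameExtractor
{
    // Lookbehind/lookahead require the brackets but keep them out of the match.
    private static readonly Regex UsernamePattern = new Regex(@"(?<=\[)[A-Za-z0-9]{1,12}(?=\])");

    public static string[] Extract(string data) =>
        UsernamePattern.Matches(data).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        // Hypothetical data: "bob_99" has an underscore and the third name is longer than 12 characters.
        string accountFileData = "[alice] [bob_99] [averyveryverylongname] [x]";
        Console.WriteLine(string.Join(", ", Extract(accountFileData))); // alice, x
    }
}
```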

You have too many brackets, and you may want to match the beginning (^) and end ($) of the string.
^[a-zA-Z0-9]{1,12}$
If you are expecting square brackets in the string you are matching, then escape them with a backslash.
\[[a-zA-Z0-9]{1,12}\]
// In C#
new Regex(@"\[[a-zA-Z0-9]{1,12}\]");

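You have too many brackets.
[a-zA-Z0-9]{1,12}
(Write the quantifier without a space: .NET does not treat {1, 12} as a quantifier, so it would be matched as literal text.)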

If you're trying to match the brackets, they need to be escaped properly:
\[[a-zA-Z0-9]{1,12}\]
