I need a regular expression that matches a string but excludes specific words from it.
For example:
dfm HSBC12323
I need to extract
HSBC12323
and not include dfm. If the string is HSBC12323, it needs to match as is, since dfm may not be present.
If the string is dfm123213, I need to match 123213.
adx 212321: I need to match 212321, not adx.
adx hsbc123uy: I need to match hsbc123uy.
hsbc1237: I need to match it as is.
I tried:
(?<!dfm\s*?|adx\s*?|\w)\d+
but it doesn't work the way I want.
Actual string : dfm HSBC12323 expected HSBC12323
Actual string : HSBC12323 expected HSBC12323
Actual string : dfm123213 expected 123213
Actual string : adx 212321 expected 212321
Actual string : usa1237 expected usa1237
Your pattern (?<!dfm\s*?|adx\s*?|\w)\d+ matches 1+ digits only if what is on the left is not dfm or adx (each optionally followed by whitespace chars) and not a word character. You don't have to make \s*? non-greedy, as \s can never match the digits that \d+ has to match.
It matches none of your examples: at every digit, what is on the left is either a word character (so \w matches) or the dfm / adx prefix with optional whitespace, so the negative lookbehind fails. By contrast, the 22 in $22 would match.
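To see this behaviour, here is a minimal sketch that runs the pattern from the question against a few inputs. The class and method names (LookbehindCheck, FirstMatchOrNull) are made up for illustration; only the pattern comes from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookbehindCheck
{
    // The pattern from the question, unchanged.
    public const string Original = @"(?<!dfm\s*?|adx\s*?|\w)\d+";

    // Hypothetical helper: returns the first match, or null when nothing matches.
    public static string FirstMatchOrNull(string input)
    {
        Match m = Regex.Match(input, Original);
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        foreach (string s in new[] { "dfm HSBC12323", "dfm123213", "adx 212321", "$22" })
        {
            // Only "$22" yields a match ("22"); the other three print "(no match)".
            Console.WriteLine("{0,-15} -> {1}", s, FirstMatchOrNull(s) ?? "(no match)");
        }
    }
}
```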
One option to match your values could be using an alternation in combination with a positive lookbehind and a negative lookahead.
(?<=\b(?:dfm|adx) *)\w+|\b(?!(?:dfm|adx))\w+
Explanation
(?<= Positive lookbehind, assert what is on the left
\b(?:dfm|adx) * Word boundary, match either dfm or adx followed by 0+ times a space
) Close positive lookbehind
\w+ Match 1+ word chars
| Or
\b Word boundary
(?! Negative lookahead, assert what is directly on the right is not
(?:dfm|adx) Match either dfm or adx
) Close negative lookahead
\w+ Match 1+ word characters
See a .NET regex demo
You might also add (?!\S) after matching \w+ if the match should not be followed by a non whitespace char.
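As a rough check of the alternation above, the following sketch applies it to the sample strings from the question. PrefixStripper and Extract are illustrative names, not part of any library, and the code assumes each input holds a single value with at most one dfm/adx prefix.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PrefixStripper
{
    // Lookbehind branch: text right after dfm/adx (and optional spaces).
    // Second branch: a word that does not start with dfm/adx.
    public const string Pattern = @"(?<=\b(?:dfm|adx) *)\w+|\b(?!(?:dfm|adx))\w+";

    // Hypothetical helper: returns the first match, or an empty string if none.
    public static string Extract(string input)
    {
        return Regex.Match(input, Pattern).Value;
    }

    public static void Main()
    {
        string[] samples =
        {
            "dfm HSBC12323", "HSBC12323", "dfm123213",
            "adx 212321", "adx hsbc123uy", "usa1237"
        };
        foreach (string s in samples)
        {
            Console.WriteLine("{0,-15} -> {1}", s, Extract(s));
        }
    }
}
```

Note that the prefixes are matched case-sensitively here; RegexOptions.IgnoreCase could be passed if DFM or ADX should be stripped too.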
My guess is that with this expression, or one similar to it, we can capture what we want step by step, and then tighten the expression with additional constraints, just to be safe:
(?=dfm\s+|adx\s+)(?:dfm\s+([A-Z0-9]+)|adx\s+([0-9]+))|(?=dfm)dfm([0-9]+)|[A-Za-z0-9]+
Demo
Test
using System;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
        string pattern = @"(?=dfm\s+|adx\s+)(?:dfm\s+([A-Z0-9]+)|adx\s+([0-9]+))|(?=dfm)dfm([0-9]+)|[A-Za-z0-9]+";
        // Verbatim string: the continuation lines stay unindented so the input has no leading spaces.
        string input = @"dfm HSBC12323
HSBC12323
dfm123213
adx 212321
usa1237";
        RegexOptions options = RegexOptions.Multiline;

        foreach (Match m in Regex.Matches(input, pattern, options))
        {
            // m.Value is the whole match, including any dfm/adx prefix;
            // the text after a prefix is in capture group 1, 2, or 3.
            Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
        }
    }
}
Related
I'm trying to match fully qualified C# type names, but the + after \w+ captures too much:
global((::|\.)\w+(?!\s|\())+
Tried to play with quantifiers and negative lookahead but without success.
Online sandbox:
https://regex101.com/r/L6Y8kv/1
Sample:
public global::libebur128.EBUR128StateInternal D
{
get
{
var __result0 = global::libebur128.EBUR128StateInternal.__GetOrCreateInstance(((__Internal*)__Instance)->d, false);
return __result0;
}
Result:
global::libebur128.EBUR128StateInterna
global::libebur128.EBUR128StateInternal.__GetOrCreateInstanc
Expected:
global::libebur128.EBUR128StateInternal
global::libebur128.EBUR128StateInternal
For the example data, you might use:
\bglobal::[^\W_]+(?:\.[^\W_]+)*
The pattern matches:
\bglobal:: A word boundary, followed by matching global::
[^\W_]+ Match 1+ word characters excluding _
(?:\.[^\W_]+)* Optionally repeat matching . and 1+ word characters excluding _
See a regex101 demo.
If the last part should not be followed by ( and you don't want to take the underscore into account, you might add a word boundary and a negative lookahead:
\bglobal::\w+(?:\.\w+)*\b(?!\()
The pattern matches:
\b A word boundary
global:: Match literally
\w+ Match 1+ word chars
(?:\.\w+)* Optionally repeat . and 1+ word chars
\b A word boundary (to prevent backtracking to make the next assertion true)
(?!\() Negative lookahead, assert not ( directly to the right of the current position
regex101 demo
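Both patterns can be tried against the two sample lines from the question with a sketch like the one below. TypeNameFinder and Find are hypothetical names chosen for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TypeNameFinder
{
    // Variant 1: name parts may not contain an underscore.
    public const string NoUnderscore = @"\bglobal::[^\W_]+(?:\.[^\W_]+)*";
    // Variant 2: underscores allowed, but the last part may not be followed by "(".
    public const string NotBeforeParen = @"\bglobal::\w+(?:\.\w+)*\b(?!\()";

    // Hypothetical helper: collects every match of the given pattern.
    public static List<string> Find(string source, string pattern)
    {
        var found = new List<string>();
        foreach (Match m in Regex.Matches(source, pattern))
        {
            found.Add(m.Value);
        }
        return found;
    }

    public static void Main()
    {
        string source =
            "public global::libebur128.EBUR128StateInternal D\n" +
            "var __result0 = global::libebur128.EBUR128StateInternal" +
            ".__GetOrCreateInstance(((__Internal*)__Instance)->d, false);";

        // Both variants print global::libebur128.EBUR128StateInternal twice.
        foreach (string pattern in new[] { NoUnderscore, NotBeforeParen })
        {
            Console.WriteLine(string.Join(Environment.NewLine, Find(source, pattern)));
        }
    }
}
```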
I have a string that has variables inserted within it. They are surrounded by double curly braces, i.e. {{VARIABLE}}.
What regex could be used to return the variable names within the double curly braces?
You can use lookahead and lookbehind assertions to match text that comes after and before certain patterns. You can also use a negated character class to match characters that aren't }, so that your matched string isn't too greedy.
(?<=\{\{)[^}]+(?=\}\})
You can see this pattern in action here
You could also use a capturing group:
\{\{(.+?)}}
Regex demo
If the placeholder must be delimited by whitespace (or the start or end of the string) on both sides, and the placeholder itself can contain a { or }, you might use:
(?<!\S)\{\{(.+?)}}(?!\S)
Explanation
(?<!\S) Assert what is on the left is not a non whitespace char
\{\{ Match {{
(.+?) Capture in group 1 matching any char 1+ times non greedy
}} Match literally
(?!\S) Assert what is on the right is not a non whitespace char
Regex demo
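In C#, the two basic approaches could look like the sketch below. PlaceholderReader, its two methods, and the sample template text are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PlaceholderReader
{
    // Capturing-group approach: the name is in group 1.
    public static List<string> NamesByGroup(string template)
    {
        var names = new List<string>();
        foreach (Match m in Regex.Matches(template, @"\{\{(.+?)}}"))
        {
            names.Add(m.Groups[1].Value);
        }
        return names;
    }

    // Lookaround approach: the whole match is the name.
    public static List<string> NamesByLookaround(string template)
    {
        var names = new List<string>();
        foreach (Match m in Regex.Matches(template, @"(?<=\{\{)[^}]+(?=\}\})"))
        {
            names.Add(m.Value);
        }
        return names;
    }

    public static void Main()
    {
        string template = "Hello {{FIRST_NAME}}, your order {{ORDER_ID}} has shipped.";
        Console.WriteLine(string.Join(", ", NamesByGroup(template)));      // FIRST_NAME, ORDER_ID
        Console.WriteLine(string.Join(", ", NamesByLookaround(template))); // FIRST_NAME, ORDER_ID
    }
}
```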
I'm trying to create what I think is a repeating non-capturing group, and I just can't figure out how.
In plain words, I want to match:
Any number which is both:
Preceded by any number of blocks that contain no space and are not just a number.
Followed by any number of blocks that contain no space and are not just a number.
Here is what I tried:
Pattern: (?:\w.)+(\d+)(?:.\w+)+
Test Set:
3.AAA
AAA.BBB
AAA.3.BBB
AAA.3.B555B
AAA.3.BBB.4
AAA.3.BBB.4.CCC
AAA.3.BBB.CCC
AAA.3.BBB.CCC.4
AAA.3.BBB.CCC.4.DDD
ZZZ.AAA.3.BBB
ZZZ.AAA.3.BBB.4
ZZZ.AAA.3.BBB.4.CCC
ZZZ.AAA.3.BBB.CCC
ZZZ.AAA.3.BBB.CCC.4
ZZZ.AAA.3.BBB.CCC.4.DDD
I would want it to match only to:
AAA.3.BBB
AAA.3.B555B
AAA.3.BBB.CCC
ZZZ.AAA.3.BBB
ZZZ.AAA.3.BBB.CCC
Note: I saw some other posts asking the same-ish question, but I can't use the answers because they were all like "Instead of trying to repeat a group, just match 'this' and it will work for your specific case".
Code
See regex in use here
^(?:(?!(?:\.|^)\d+\.)\S)+\.\d+\.(?:(?!\.\d+(?:\.|$))\S)+$
Results
Input
3.AAA
AAA.BBB
AAA.3.BBB
AAA.3.B555B
AAA.3.BBB.4
AAA.3.BBB.4.CCC
AAA.3.BBB.CCC
AAA.3.BBB.CCC.4
AAA.3.BBB.CCC.4.DDD
ZZZ.AAA.3.BBB
ZZZ.AAA.3.BBB.4
ZZZ.AAA.3.BBB.4.CCC
ZZZ.AAA.3.BBB.CCC
ZZZ.AAA.3.BBB.CCC.4
ZZZ.AAA.3.BBB.CCC.4.DDD
Output
AAA.3.BBB
AAA.3.B555B
AAA.3.BBB.CCC
ZZZ.AAA.3.BBB
ZZZ.AAA.3.BBB.CCC
Explanation
^ Assert position at the start of the line
(?:(?!(?:\.|^)\d+\.)\S)+ Match the following one or more times
    (?!(?:\.|^)\d+\.) Negative lookahead ensuring what follows doesn't match
        (?:\.|^) Match either of the following
            \. Match a literal dot character .
            ^ Assert position at the start of the line
        \d+ Match one or more digits
        \. Match a literal dot character .
    \S Match any non-whitespace character
\. Match a literal dot character .
\d+ Match one or more digits
\. Match a literal dot character .
(?:(?!\.\d+(?:\.|$))\S)+ Match the following one or more times
    (?!\.\d+(?:\.|$)) Negative lookahead ensuring what follows doesn't match
        \. Match a literal dot character .
        \d+ Match one or more digits
        (?:\.|$) Match either of the following
            \. Match a literal dot character .
            $ Assert position at the end of the line
    \S Match any non-whitespace character
$ Assert position at the end of the line
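A small sketch of how this pattern might be used to filter standalone strings in C#. BlockFilter and IsWanted are hypothetical names, and only a subset of the test set is shown.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class BlockFilter
{
    // Same pattern as above; without RegexOptions.Multiline, ^ and $ anchor the whole string.
    private static readonly Regex Rx = new Regex(
        @"^(?:(?!(?:\.|^)\d+\.)\S)+\.\d+\.(?:(?!\.\d+(?:\.|$))\S)+$");

    // Hypothetical helper: true when the string has exactly one number-only block
    // with non-number blocks on both sides.
    public static bool IsWanted(string s)
    {
        return Rx.IsMatch(s);
    }

    public static void Main()
    {
        string[] inputs =
        {
            "3.AAA", "AAA.BBB", "AAA.3.BBB", "AAA.3.B555B",
            "AAA.3.BBB.4", "ZZZ.AAA.3.BBB.CCC"
        };
        // Prints AAA.3.BBB, AAA.3.B555B and ZZZ.AAA.3.BBB.CCC.
        foreach (string s in inputs.Where(IsWanted))
        {
            Console.WriteLine(s);
        }
    }
}
```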
There is a bit simpler solution:
^(?:(?!\d+\.)\w+\.)+\d+(?:\.(?!\d+(?=\.|$))\w+)+$
See the .NET regex demo (since it is a multiline demo, \r? has to be added before $, it is not necessary when matching standalone strings).
Details
^ - start of string
(?:(?!\d+\.)\w+\.)+ - 1 or more occurrences (due to (?:...)+) of any 1+ word chars (letters, digits, _ - due to \w+) that are not all digits followed with a dot (note that to match only letters and digits, you need to use [\w-[_]] or [^\W_] instead of \w, or if you are really after matching the blocks that may even have symbols or punctuation, replace \w with [^\s.] - any char but whitespace or dot)
\d+ - 1 or more digits
(?:\.(?!\d+(?=\.|$))\w+)+ - 1 or more occurrences of
\. - a dot
(?!\d+(?=\.|$)) - not followed with 1+ digits (\d+) followed with a dot or end of string
\w+ - 1 or more word chars
$ - end of string.
C# demo:
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public class Test
{
    public static void Main()
    {
        var lst = new List<string> {"3.AAA", "AAA.BBB", "AAA.3.BBB", "AAA.3.B555B", "AAA.3.BBB.4",
            "AAA.3.BBB.4.CCC", "AAA.3.BBB.CCC", "AAA.3.BBB.CCC.4", "AAA.3.BBB.CCC.4.DDD",
            "ZZZ.AAA.3.BBB","ZZZ.AAA.3.BBB.4","ZZZ.AAA.3.BBB.4.CCC", "ZZZ.AAA.3.BBB.CCC",
            "ZZZ.AAA.3.BBB.CCC.4", "ZZZ.AAA.3.BBB.CCC.4.DDD"};
        // [^\s.] is used instead of \w so blocks may contain symbols or punctuation.
        var rx = new Regex(@"^(?:(?!\d+\.)[^\s.]+\.)+\d+(?:\.(?!\d+(?=\.|$))[^\s.]+)+$",
            RegexOptions.Compiled | RegexOptions.ECMAScript);
        foreach (var s in lst)
        {
            // Print only the strings that match as a whole.
            if (rx.IsMatch(s))
                Console.WriteLine(s);
        }
    }
}
Results:
AAA.3.BBB
AAA.3.B555B
AAA.3.BBB.CCC
ZZZ.AAA.3.BBB
ZZZ.AAA.3.BBB.CCC
Using Regex, how can I escape special characters in xml attribute values?
Given the following xml as string:
"<node attr=\"<Sample>\"></node>"
I want to get:
"<node attr=\"&lt;Sample&gt;\"></node>"
The System.Security.SecurityElement.Escape function won't work, as it tries to escape every special character (including the tags' opening/closing angle brackets).
string text = "<node attr=\"<Sample>\"></node>";
string pattern = @"(?<=\b\w+\s*=\s*"")<\w+>(?="")";
string result = Regex.Replace(text, pattern, m => SecurityElement.Escape(m.Value));
Console.WriteLine(text);
Console.WriteLine(result);
Where:
?<= - positive lookbehind
\b - start the match at a word boundary
\w+ - match one or more word characters
\s* - match zero or more white-space characters
?= - positive lookahead
I was going through this question
C#, Regex.Match whole words
It says to use "\bpattern\b" to match a whole word.
This works fine for whole words without any special characters, since \b is meant for word characters only!
I need an expression to match words with special characters also. My code is as follows
class Program
{
static void Main(string[] args)
{
string str = Regex.Escape("Hi temp% dkfsfdf hi");
string pattern = Regex.Escape("temp%");
var matches = Regex.Matches(str, "\\b" + pattern + "\\b" , RegexOptions.IgnoreCase);
int count = matches.Count;
}
}
But it fails because of %. Do we have any workaround for this?
There can be other special characters like 'space','(',')', etc
If you have non-word characters then you cannot use \b. You can use the following:
@"(?<=^|\s)" + pattern + @"(?=\s|$)"
Edit: As Tim mentioned in comments, your regex is failing precisely because \b fails to match the boundary between % and the white-space next to it because both of them are non-word characters. \b matches only the boundary between word character and a non-word character.
See more on word boundaries here.
Explanation
@"
(?<= # Assert that the regex below can be matched, with the match ending at this position (positive lookbehind)
# Match either the regular expression below (attempting the next alternative only if this one fails)
^ # Assert position at the beginning of the string
| # Or match regular expression number 2 below (the entire group fails if this one fails to match)
\s # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
)
temp% # Match the characters “temp%” literally
(?= # Assert that the regex below can be matched, starting at this position (positive lookahead)
# Match either the regular expression below (attempting the next alternative only if this one fails)
\s # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
| # Or match regular expression number 2 below (the entire group fails if this one fails to match)
$ # Assert position at the end of the string (or before the line break at the end of the string, if any)
)
"
If the pattern can contain characters that are special to Regex, run it through Regex.Escape first.
This you did, but do not escape the string that you search through - you don't need that.
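Putting both points together, a minimal sketch could look like this: only the search term goes through Regex.Escape, the input is left untouched, and whitespace lookarounds replace \b. WholeTokenCounter and its method names are made up for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WholeTokenCounter
{
    // Counts occurrences of `literal` delimited by whitespace or the string boundaries.
    public static int Count(string input, string literal)
    {
        string pattern = @"(?<=^|\s)" + Regex.Escape(literal) + @"(?=\s|$)";
        return Regex.Matches(input, pattern, RegexOptions.IgnoreCase).Count;
    }

    // The \b version from the question, kept only for comparison.
    public static int CountWithWordBoundaries(string input, string literal)
    {
        string pattern = @"\b" + Regex.Escape(literal) + @"\b";
        return Regex.Matches(input, pattern, RegexOptions.IgnoreCase).Count;
    }

    public static void Main()
    {
        string input = "Hi temp% dkfsfdf hi";   // searched as is, not escaped

        Console.WriteLine(Count(input, "temp%"));                    // 1
        Console.WriteLine(CountWithWordBoundaries(input, "temp%"));  // 0: no \b between % and the space
        Console.WriteLine(Count(input, "hi"));                       // 2 (IgnoreCase)

        // Escaping the input inserts backslashes before the spaces, so the lookahead no longer sees whitespace.
        Console.WriteLine(Count(Regex.Escape(input), "temp%"));      // 0
    }
}
```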
output = Regex.Replace(output, "(?<!\w)-\w+", "")
output = Regex.Replace(output, " -"".*?""", "")