Regex for a URL with illegal characters \\ - c#

From the following string:
google.com/local/reviews?placeid\\u003dChIJ070npYRaeEgRZNoxwuYYrew\\u0026q\\u003d
To extract u003dChIJ070npYRaeEgRZNoxwuYYrew although this value will change every time.
I have tried
Regex r = new Regex(#"("(?<=\placeid\\\s+)\p{L}+");
Which does not work.
I am guilty of neglecting my knowledge is regex so I apologise if this is painfully easy.

There are no whitespace chars in the string that you want to match with \s+ and there are 2 backslashes.
Using \p{L}+ only matches any letter and the string that you want also contains numbers.
(?<=placeid\\\\\s*)[\p{L}\p{N}]+
Regex demo
For example
string pattern = #"(?<=placeid\\\\\s*)[\p{L}\p{N}]+";
string input = #"google.com/local/reviews?placeid\\u003dChIJ070npYRaeEgRZNoxwuYYrew\\u0026q\\u003d";
Match m = Regex.Match(input, pattern);
Console.WriteLine(m.Value);
Output
u003dChIJ070npYRaeEgRZNoxwuYYrew

Related

Regex exclude ":" and a whitespace if they exist

So I have a regex here:
var text = new Regex(#"(?<=Paybacks).*", RegexOptions.IgnoreCase);
This looks for the line where it starts with Paybacks. Now it currently prints ": blah".
The context sometimes can be "Paybacks" or "Paybacks:" or "Paybacks " or I don't know "Paybacks (with thousands of whitespaces). How can I modify this regex to be like.. after "Paybacks" ignore a colon and a whitespace (or whitespaces) that may or may not exist.
I've been playing with it in regex101 and this seems to be working, but is there a better way?
(?<=Volatility(:\s)).*
In these situations, you'd better use a regex with a capturing group:
var pattern = new Regex(#"Paybacks[\s:]*(.*)", RegexOptions.IgnoreCase);
Then, you can use
var output = Regex.Match(text, pattern)?.Groups[1].Value;
See the .NET regex demo:
See the C# demo:
var texts = new List<string> { "Paybacks: blah","Paybacks:blah","Paybacks blah"};
var pattern = new Regex(#"Paybacks[\s:]*(.*)", RegexOptions.IgnoreCase);
texts.ForEach(text => Console.WriteLine(pattern.Match(text)?.Groups[1].Value));
printing 3 blahs.
You might also match optional colons and whitspace chars in the lookbehind, and start matching the first chars being any non whitspace char other than :
(?<=Paybacks[:\s]*)[^\s:].*
The pattern matches:
(?<= Positive lookbehind, assert what is on the left is
Paybacks Match literally
[:\s]* Optionally match either : or a whitespace char using a character class
) Close lookbehind
[^\s:].* Match a single non whitespace char other than : and the rest of the line
Regex demo | C# demo
var regex = new Regex(#"(?<=Paybacks[:\s]*)[^\s:].*", RegexOptions.IgnoreCase);
string[] strings = {"Paybacks: blah", "Paybacks blah", "Paybacks blah"};
foreach (String s in strings)
{
Console.WriteLine(regex.Match(s)?.Value);
}
Output
blah
blah
blah
If the order should be a single optional colon and optional whitespace chars, you can make the colon optional and the quantifier for the whitespace chars 0 or more using :?\s*
(?<=Paybacks:?\s*)[^\s:].*
Regex demo

C# Regex - starts with pattern1 not contain pattern2

for the following input string contains all of these:
a1.aaa[SUBSCRIBED]
a1.bbb
a1.ccc
b1.ddd
d1.ddd[SUBSCRIBED]
I want to get the output:
bbb
ccc
which means: all the words that come after "a1." And not contain the substring "[SUBSCRIBED]"
all the words comes after "a1." And not contains the substring
"[SUBSCRIBED]"
Why regex? Following is crystal clear:
var result = strings
.Where(s => s.StartsWith("a1.") && !s.Contains("[SUBSCRIBED]"))
.Select(s => s.Substring(3));
Tim's answer makes sense. However if you insist on it I would venture that a Regex would look like this though.
^a1\.(.*)(?<!\[SUBSCRIBED\])$
with ^a1 meaning starts with a1
\.(.*) taking any number of character
and the negative lookbehind (?<!\[SUBSCRIBED\])$ would refuse text ending with [SUBSCRIBED]
You may use
^a1\.(?!.*\[SUBSCRIBED])(.*)
See the regex demo.
Details
^ - start of string
a1\. - a literal a1. substring
(?!.*\[SUBSCRIBED]) - a negative lookahead that fails the match if there is a [SUBSCRIBED] substring is present after any 0+ chars (other than newline if the RegexOptions.Singleline option is not used)
(.*) - Group 1: the rest of the line up to the end (if you use RegexOptions.Singleline option, . will match newlines as well).
C# code:
var result = string.Empty;
var m = Regex.Match(s, #"^a1\.(?!.*\[SUBSCRIBED])(.*)");
if (m.Success)
{
result = m.Groups[1].Value;
}

Regular Expression to just complete word C#

I would like to know how to extract complete words using a Regex in C#
For example, my String input:
This$#23 is-kt jkdls
I want to get Regex Match as
This$#23
is-kt
jkdls
I need to extract non space words [which can have numbers or special characters]
by specifying Regex Match pattern
Regex myrex = new Regex("pattern")
MatchCollection matches = Regex.Matches("This$#23 is-kt jkdls", #"\S+");
foreach(Match match in matches)
Console.WriteLine(match.Value);
Use \S+ to match words.
var words = string.Split(' ');

C# regex pattern to getting words

I am trying to figure out the pattern that will get words from a string. Say for instance my string is:
string text = "HI/how.are.3.a.d.you.&/{}today 2z3";
I tried to eliminate anything under 1 letter or number but it doesn't work:
Regex.Split(s, #"\b\w{1,1}\b");
I also tried this:
Regex.Splits(text, #"\W+");
But it outputs:
"HI how are a d you today"
I just want to get all the words so that my final string is:
"HI how are you today"
To get all words that are at least 2 characters long you can use this pattern: \b[a-zA-Z]{2,}\b.
string text = "HI/how.are.3.a.d.you.&/{}today 2z3";
var matches = Regex.Matches(text, #"\b[a-zA-Z]{2,}\b");
string result = String.Join(" ", matches.Cast<Match>().Select(m => m.Value));
Console.WriteLine(result);
As others have pointed out in the comments, "A" and "I" are valid words. In case you decide to match those you can use this pattern instead:
var matches = Regex.Matches(text, #"\b(?:[a-z]{2,}|[ai])\b",
RegexOptions.IgnoreCase);
In both patterns I've used \b to match word-boundaries. If you have input such as "1abc2" then "abc" wouldn't be matched. If you want it to be matched then remove the \b metacharacters. Doing so from the first pattern is straightforward. The second pattern would change to [a-z]{2,}|[ai].

Regex Pattern - Alphanumeric

[username] where username is any string containing only alphanumeric chars between 1 and 12 characters long
My code:
Regex pat = new Regex(#"\[[a-zA-Z0-9_]{1,12}\]");
MatchCollection matches = pat.Matches(accountFileData);
foreach (Match m in matches)
{
string username = m.Value.Replace("[", "").Replace("]", "");
MessageBox.Show(username);
}
Gives me one blank match
This gets you a name inside brackets (the match does't contain the square brackets symbol):
(?<=\[)[A-Za-z0-9]{1,12}(?=\])
You could use it like:
Regex pat = new Regex(#"(?<=\[)[A-Za-z0-9]{1,12}(?=\])");
MatchCollection matches = pat.Matches(accountFileData);
foreach (Match m in matches)
{
MessageBox.Show(m.Value);
}
You have too many brackets, and you may want to match the beginning (^) and end ($) of the string.
^[a-zA-Z0-9]{1,12}$
If you are expecting square brackets in the string you are matching, then escape them with a backslash.
\[[a-zA-Z0-9]{1,12}\]
// In C#
new Regex(#"\[[a-zA-Z0-9]{1,12}\]")
You have too many brackets.
[a-zA-Z0-9]{1, 12}
If you're trying to match the brackets, they need to be escaped properly:
\[[a-zA-Z0-9]{1, 12}\]

Categories

Resources