Regex exclude ":" and a whitespace if they exist - c#

So I have a regex here:
var text = new Regex(@"(?<=Paybacks).*", RegexOptions.IgnoreCase);
This looks for the line that starts with Paybacks and returns what follows it. It currently prints ": blah".
The input can be "Paybacks", "Paybacks:", "Paybacks ", or even "Paybacks" followed by thousands of whitespace characters. How can I modify this regex so that, after "Paybacks", it ignores a colon and a whitespace (or whitespaces) that may or may not exist?
I've been playing with it in regex101 and this seems to be working, but is there a better way?
(?<=Volatility(:\s)).*

In these situations, you'd better use a regex with a capturing group:
var pattern = new Regex(@"Paybacks[\s:]*(.*)", RegexOptions.IgnoreCase);
Then, you can use
var output = pattern.Match(text).Groups[1].Value;
See the C# demo:
var texts = new List<string> { "Paybacks: blah","Paybacks:blah","Paybacks blah"};
var pattern = new Regex(@"Paybacks[\s:]*(.*)", RegexOptions.IgnoreCase);
texts.ForEach(text => Console.WriteLine(pattern.Match(text).Groups[1].Value));
This prints blah three times.
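The capturing-group approach can be wrapped in a small helper. This is only a sketch: the class name LabelValue and the method After are made up for illustration, and returning null when the label is missing is an assumption about what the caller wants.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LabelValue
{
    // Hypothetical helper: returns the text that follows `label`, skipping any
    // run of colons and whitespace, or null when the label does not occur.
    public static string After(string line, string label)
    {
        // Regex.Escape keeps labels containing regex metacharacters literal.
        var pattern = Regex.Escape(label) + @"[\s:]*(.*)";
        var match = Regex.Match(line, pattern, RegexOptions.IgnoreCase);
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

Calling LabelValue.After("Paybacks: blah", "Paybacks") yields "blah", and a line that is just "Paybacks" yields an empty string rather than null, because the group matches zero characters.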

You might also match optional colons and whitespace chars in the lookbehind, and start the match at the first char that is neither a whitespace char nor a colon:
(?<=Paybacks[:\s]*)[^\s:].*
The pattern matches:
(?<= Positive lookbehind, assert what is on the left is
Paybacks Match literally
[:\s]* Optionally match either : or a whitespace char using a character class
) Close lookbehind
[^\s:].* Match a single non whitespace char other than : and the rest of the line
var regex = new Regex(@"(?<=Paybacks[:\s]*)[^\s:].*", RegexOptions.IgnoreCase);
string[] strings = {"Paybacks: blah", "Paybacks blah", "Paybacks blah"};
foreach (String s in strings)
{
    Console.WriteLine(regex.Match(s).Value);
}
Output
blah
blah
blah
If the order should be a single optional colon and optional whitespace chars, you can make the colon optional and the quantifier for the whitespace chars 0 or more using :?\s*
(?<=Paybacks:?\s*)[^\s:].*
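The difference between the two lookbehinds only shows up on input where the separator is not "one optional colon, then whitespace". The sketch below puts both patterns side by side; the class and field names are invented for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookbehindVariants
{
    // Any mix of colons and whitespace may follow "Paybacks".
    public static readonly Regex AnyOrder =
        new Regex(@"(?<=Paybacks[:\s]*)[^\s:].*", RegexOptions.IgnoreCase);

    // At most one colon, then optional whitespace, may follow "Paybacks".
    public static readonly Regex ColonThenSpace =
        new Regex(@"(?<=Paybacks:?\s*)[^\s:].*", RegexOptions.IgnoreCase);
}
```

For "Paybacks: blah" both patterns return blah. For "Paybacks:: blah" only AnyOrder matches, because the stricter lookbehind cannot account for the second colon.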

Related

Regex for a URL with illegal characters \\

From the following string:
google.com/local/reviews?placeid\\u003dChIJ070npYRaeEgRZNoxwuYYrew\\u0026q\\u003d
I want to extract u003dChIJ070npYRaeEgRZNoxwuYYrew, although this value will change every time.
I have tried
Regex r = new Regex(@"("(?<=\placeid\\\s+)\p{L}+");
Which does not work.
I am guilty of neglecting my knowledge of regex, so I apologise if this is painfully easy.
There are no whitespace chars in the string that you want to match with \s+ and there are 2 backslashes.
Using \p{L}+ only matches any letter and the string that you want also contains numbers.
(?<=placeid\\\\\s*)[\p{L}\p{N}]+
For example
string pattern = @"(?<=placeid\\\\\s*)[\p{L}\p{N}]+";
string input = @"google.com/local/reviews?placeid\\u003dChIJ070npYRaeEgRZNoxwuYYrew\\u0026q\\u003d";
Match m = Regex.Match(input, pattern);
Console.WriteLine(m.Value);
Output
u003dChIJ070npYRaeEgRZNoxwuYYrew

c# Regex of value after certain words

I have a question about regex. I have a string that looks like:
Slot:0 Module:No module in slot
What I need is a regex that will get the values after Slot and Module. Slot will always be a number, but I have a problem with Module (this can be words with spaces). I tried:
var pattern = "(?<=:)[a-zA-Z0-9]+";
foreach (string config in backplaneConfig)
{
List<string> values = Regex.Matches(config, pattern).Cast<Match>().Select(x => x.Value).ToList();
modulesInfo.Add(new ModuleIdentyfication { ModuleSlot = Convert.ToInt32(values.First()), ModuleType = values.Last() });
}
So the slot part works, but the module part works only if it is a single word with no spaces; in my example it gives me only "No". Is there a way to do that?
You may use a regex to capture the necessary details in the input string:
var pattern = @"Slot:(\d+)\s*Module:(.+)";
foreach (string config in backplaneConfig)
{
var values = Regex.Match(config, pattern);
if (values.Success)
{
modulesInfo.Add(new ModuleIdentyfication { ModuleSlot = Convert.ToInt32(values.Groups[1].Value), ModuleType = values.Groups[2].Value });
}
}
Group 1 is the ModuleSlot and Group 2 is the ModuleType.
Details
Slot: - literal text
(\d+) - Capturing group 1: one or more digits
\s* - 0+ whitespaces
Module: - literal text
(.+) - Capturing group 2: the rest of the string to the end.
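The same pattern can be written with named groups so the call site does not depend on group numbers. This is a sketch under assumptions: BackplaneLine and TryParse are hypothetical names, and trimming the module text is a choice made here, not something the question requires.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BackplaneLine
{
    private static readonly Regex Pattern =
        new Regex(@"Slot:(?<slot>\d+)\s*Module:(?<module>.+)");

    // Hypothetical helper: returns false when the line does not have the
    // expected shape or the slot number does not fit in an int.
    public static bool TryParse(string config, out int slot, out string module)
    {
        slot = 0;
        module = string.Empty;
        var match = Pattern.Match(config);
        if (!match.Success || !int.TryParse(match.Groups["slot"].Value, out slot))
        {
            return false;
        }
        module = match.Groups["module"].Value.Trim();
        return true;
    }
}
```

Using int.TryParse instead of Convert.ToInt32 avoids an exception if the digits overflow an int.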
The simplest way would be to add a space to the character class in your pattern
var pattern = "(?<=:)[a-zA-Z0-9 ]+";
But the best solution would probably be the answer from @Wiktor Stribiżew
Another option is to match either 1+ digits followed by a word boundary or match a repeating pattern using your character class but starting with [a-zA-Z]
(?<=:)(?:\d+\b|[a-zA-Z][a-zA-Z0-9]*(?: [a-zA-Z0-9]+)*)
(?<=:) Assert a : on the left
(?: Non capturing group
\d+\b Match 1+ digits followed by a word boundary
| Or
[a-zA-Z][a-zA-Z0-9]* Start a match with a-zA-Z
(?: [a-zA-Z0-9]+)* Optionally repeat a space and what is listed in the character class
) Close non capturing group
Please replace this:
// regular exp.
(\d+)\s*(.+)
You don't need to use regex for such simple parsing. Try below:
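var str = "Slot:0 Module:No module in slot";
// Splitting on both labels leaves only the values (needs using System.Linq).
var values = str.Split(new string[] { "Slot:", "Module:" }, StringSplitOptions.RemoveEmptyEntries)
    .Select(s => s.Trim())
    .ToList(); // values[0] is "0", values[1] is "No module in slot"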

Find pipe in quotes ignore false positives [duplicate]

This question already has answers here:
Need C# Regex for replacing spaces inside of strings
(2 answers)
C# Regex Split - commas outside quotes
(7 answers)
Closed 3 years ago.
I'm trying to replace the pipe delimiter character inside quotes with a space. The issue is that I get too many false positives because some strings are empty. I only want to replace the pipe if there is text between the quotes. The regex pattern I'm using is from another Stack Overflow post, as my regex skills are lacking.
data sample:
"Hello"|"Green | Blue"|123.45|""|""|""|5|45
Code I'm using:
internal class Program
{
public static void Main()
{
string pattern = @"(?: (?<= "")|\G(?!^))(\s*[^"" |\s]+(?:\s +[^
""|\s]+)*)\s*\|\s*(?=[^""] * "")";
string substitution = @"\1 \2";
string input = @"""20190430|""Test Text""|""""|""""|""Manual""|""""|""Machine""|""""|""""|10.00|""""|0.00|||0.00||5600.00||||""A+""|""""|40.00||""""|""Vision Service |Troubleshoot""|57|""Y""|838|""Yellow Maroon""|850||""FL""||||0.00|||||||||||""""||""""||""""|||""""||||||""""||""""|""""||""""|""""||||||""""|""""|""""||||||||1||""";
RegexOptions options = RegexOptions.Multiline;
Regex regex = new Regex(pattern, options);
string result = regex.Replace(input, substitution);
Console.WriteLine("Result:" + result);
Console.ReadKey();
}
}
It replaces the 'Green | Blue' pipe just fine. But it also replaces the pipes between the later quotes, which breaks the file because columns get removed.
Updated the code with an actual sample of my file I'm processing. The regex finds it but doesn't replace the pipe. Missing something.
If there should be text between the double quotes and the text should be on both sides of the pipe, you might use:
(?<=")(\s*[^"\s|]+)\s*\|\s*([^\s"|]+\s*)(?=")
In the replacement use $1 $2
Explanation
(?<=") Positive lookbehind, assert what is on the left is "
(\s*[^"\s|]+) Capture in group 1 matching 0+ times a whitespace char, 1+ times not ", | or a whitespace char
\s*\|\s* Match a | between 0+ times a whitespace char
([^\s"|]+\s*) Capture in group 2 matching 1+ times not ", | or a whitespace char and match 0+ times a whitespace char
(?=") Positive lookahead, assert what is on the right is "
Edit
If you want to replace multiple pipes with a space between the double quotes you could make use of the \G anchor to assert the position at the end of previous match.
In the replacement use the first capturing group followed by a space $1
(?:(?<=")|\G(?!^))(\s*[^"|\s]+(?:\s+[^"|\s]+)*)\s*\|\s*(?=[^"]*")
Explanation
(?: Non capturing group
(?<=") Assert what is on the left is "
| Or
\G(?!^) Assert position at the end of the previous match
) Close non capturing group
( Capture group 1
\s*[^"|\s]+ Match 0+ times a whitespace char, followed by 1+ times not a ", | or whitespace char
(?:\s+[^"|\s]+)* Repeat 0+ times matching 1+ whitespace chars followed by 1+ times not a ", | or whitespace char
) Close capturing group 1
\s*\|\s* Match a | between 0+ times a whitespace char
(?=[^"]*") Assert what is on the right is a "
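A minimal C# sketch of the \G pattern above, applied with the replacement $1 followed by a space. The class name QuotedPipes and the method Clean are hypothetical; the pattern is the one from this answer, with the double quotes doubled for a verbatim string.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuotedPipes
{
    private static readonly Regex Pattern = new Regex(
        @"(?:(?<="")|\G(?!^))(\s*[^""|\s]+(?:\s+[^""|\s]+)*)\s*\|\s*(?=[^""]*"")");

    // Replaces each pipe that sits between text inside double quotes with a
    // single space. .NET uses $1 (not \1) to refer to group 1 in a replacement.
    public static string Clean(string line)
    {
        return Pattern.Replace(line, "$1 ");
    }
}
```

On the sample line from the question this turns "Green | Blue" into "Green Blue" and leaves the empty "" fields and the unquoted columns alone.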
My guess is that we might also want to keep only one space in the text, in which case this expression,
"([^"]+?)\s+\|\s+([^"]+?)"
with a replacement of $1 $2 might work.
Example
using System;
using System.Text.RegularExpressions;
public class Example
{
public static void Main()
{
string pattern = @"""([^""]+?)\s+\|\s+([^""]+?)""";
string substitution = @"$1 $2"; // .NET replacement strings use $1, not \1
string input = @"""Hello""|""Green | Blue""|123.45|""""|""""|""""|5|45";
RegexOptions options = RegexOptions.Multiline;
Regex regex = new Regex(pattern, options);
string result = regex.Replace(input, substitution);
}
}

I think my regular expression pattern in C# is incorrect

I'm checking to see if my regular expression matches my string.
I have a filename that looks like somename_something.txt and I want to match it to somename_*.txt, but my code is failing when I try to pass something that should match. Here is my code.
string pattern = "somename_*.txt";
Regex r = new Regex(pattern, RegexOptions.IgnoreCase);
using (ZipFile zipFile = ZipFile.Read(fullPath))
{
foreach (ZipEntry e in zipFile)
{
Match m = r.Match("somename_something.txt");
if (!m.Success)
{
throw new FileNotFoundException("A filename with format: " + pattern + " not found.");
}
}
}
The asterisk applies to the underscore (zero or more underscores), which throws the match off.
Try:
somename_(\w+)\.txt
The (\w+) here captures one or more word characters at this location, and \. matches a literal dot.
You can see it match here: https://regex101.com/r/qS8wA5/1
In General
The regex given in this code applies the * to the _, meaning zero or more underscores, instead of what you intended. The * denotes zero or more of the previous item. Instead try
^somename_(.*)\.txt$
This matches exactly the first part "somename_".
Then anything (.*)
And finally the end ".txt". The backslash escapes the 'dot'.
More Specific
If you only want letters, and not numbers or symbols, in the middle part of the match, use:
^somename_[a-z]*\.txt$
As written, your regular expression
somename_*.txt
matches (in a case-insensitive manner):
the literal text somename, followed by
zero or more underscore characters (_), followed by
any single character (other than newline), followed by
the literal text txt
And it will match that anywhere in the source text. You probably want to write something like
Regex myPattern = new Regex( @"
^ # anchor the match to start-of-text, followed by
somename # the literal 'somename', followed by
_ # a literal underscore character, followed by
.* # zero or more of any character (except newline), followed by
\. # a literal period/fullstop, followed by
txt # the literal text 'txt'
$ # with the match anchored at end-of-text
" , RegexOptions.IgnoreCase|RegexOptions.IgnorePatternWhitespace
) ;
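If the goal is to keep writing file-style wildcards such as somename_*.txt, one option is to translate the wildcard into an anchored regex. The helper below is a sketch, not part of any library: the names Wildcard and ToRegex are invented, and treating ? as "exactly one character" is an assumption borrowed from common wildcard conventions.

```csharp
using System;
using System.Text.RegularExpressions;

public static class Wildcard
{
    // Hypothetical helper: turns a wildcard like "somename_*.txt" into a regex.
    public static Regex ToRegex(string wildcard)
    {
        // Escape everything first, then re-enable only the wildcard characters.
        var body = Regex.Escape(wildcard)
            .Replace(@"\*", ".*")  // * becomes "any run of characters"
            .Replace(@"\?", ".");  // ? becomes "exactly one character"
        return new Regex("^" + body + "$", RegexOptions.IgnoreCase);
    }
}
```

With this, Wildcard.ToRegex("somename_*.txt").IsMatch("somename_something.txt") is true, while a name such as somename_something.txt.bak is rejected because of the end anchor.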
Hi I think the pattern should be
string pattern = "somename_.*\\.txt";
Regards

Regexp skip pattern

Problem
I need to replace all asterisk symbols('*') with percent symbol('%'). The asterisk symbols in square brackets should be ignored.
Example
[Test]
public void Replace_all_asterisks_outside_the_square_brackets()
{
var input = "Hel[*o], w*rld!";
var output = Regex.Replace(input, "What_pattern_should_be_there?", "%");
Assert.AreEqual("Hel[*o], w%rld!", output);
}
Try using a look ahead:
\*(?![^\[\]]*\])
Here's a bit stronger solution, which takes care of [] blocks better, and even escaped \[ characters:
string text = @"h*H\[el[*o], w*rl\]d!";
string pattern = @"
\\. # Match an escaped character. (to skip over it)
|
\[ # Match a character class
(?:\\.|[^\]])* # which may also contain escaped characters (to skip over it)
\]
|
(?<Asterisk>\*) # Match `*` and add it to a group.
";
text = Regex.Replace(text, pattern,
match => match.Groups["Asterisk"].Success ? "%" : match.Value,
RegexOptions.IgnorePatternWhitespace);
If you don't care about escaped characters you can simplify it to:
\[ # Skip a character class
[^\]]* # until the first ']'
\]
|
(?<Asterisk>\*)
Which can be written without comments as: @"\[[^\]]*\]|(?<Asterisk>\*)".
To understand why it works we need to understand how Regex.Replace works: for every position in the string it tries to match the regex. If it fails, it moves one character. If it succeeds, it moves over the whole match.
Here, we have dummy matches for the [...] blocks so we may skip over the asterisks we don't want to replace, and match only the lonely ones. That decision is made in a callback function that checks if Asterisk was matched or not.
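Put together, the simplified pattern and the callback look roughly like this. The wrapper class AsteriskReplacer is a made-up name for the sketch; the pattern and the group name Asterisk come from this answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AsteriskReplacer
{
    // Either a whole [...] block (a dummy match that is kept as-is)
    // or a lone asterisk captured in the Asterisk group.
    private static readonly Regex Pattern =
        new Regex(@"\[[^\]]*\]|(?<Asterisk>\*)");

    public static string Replace(string input)
    {
        // The callback decides per match: swap lone asterisks, echo blocks back.
        return Pattern.Replace(input,
            match => match.Groups["Asterisk"].Success ? "%" : match.Value);
    }
}
```

AsteriskReplacer.Replace("Hel[*o], w*rld!") returns "Hel[*o], w%rld!", which is the result the test in the question expects.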
I couldn't come up with a pure RegEx solution. Therefore I am providing you with a pragmatic solution. I tested it and it works:
[Test]
public void Replace_all_asterisks_outside_the_square_brackets()
{
var input = "H*]e*l[*o], w*rl[*d*o] [o*] [o*o].";
var actual = ReplaceAsterisksNotInSquareBrackets(input);
var expected = "H%]e%l[*o], w%rl[*d*o] [o*] [o*o].";
Assert.AreEqual(expected, actual);
}
private static string ReplaceAsterisksNotInSquareBrackets(string s)
{
Regex rx = new Regex(@"(?<=\[[^\[\]]*)(?<asterisk>\*)(?=[^\[\]]*\])");
var matches = rx.Matches(s);
s = s.Replace('*', '%');
foreach (Match match in matches)
{
s = s.Remove(match.Groups["asterisk"].Index, 1);
s = s.Insert(match.Groups["asterisk"].Index, "*");
}
return s;
}
EDITED
Okay here is my final attempt ;)
Using negative lookbehind (?<!) and negative lookahead (?!).
var output = Regex.Replace(input, @"(?<!\[)\*(?!\])", "%");
This also passes the test in the comment to another answer "Hel*o], w*rld!"
