Find pipe in quotes ignore false positives [duplicate] - c#

This question already has answers here:
Need C# Regex for replacing spaces inside of strings
(2 answers)
C# Regex Split - commas outside quotes
(7 answers)
Closed 3 years ago.
I'm trying to replace the pipe delimiter character inside quotes with a space. The issue is that I get too many false positives because some strings are empty. I only want to replace the pipe if there is text between the quotes. The regex pattern I'm using is from another Stack Overflow post, as my regex skills are lacking.
Data sample:
"Hello"|"Green | Blue"|123.45|""|""|""|5|45
Code I'm using:
using System;
using System.Text.RegularExpressions;

internal class Program
{
    public static void Main()
    {
        string pattern = @"(?:(?<="")|\G(?!^))(\s*[^""|\s]+(?:\s+[^""|\s]+)*)\s*\|\s*(?=[^""]*"")";
        string substitution = @"\1 \2";
        string input = @"""20190430|""Test Text""|""""|""""|""Manual""|""""|""Machine""|""""|""""|10.00|""""|0.00|||0.00||5600.00||||""A+""|""""|40.00||""""|""Vision Service |Troubleshoot""|57|""Y""|838|""Yellow Maroon""|850||""FL""||||0.00|||||||||||""""||""""||""""|||""""||||||""""||""""|""""||""""|""""||||||""""|""""|""""||||||||1||""";
        RegexOptions options = RegexOptions.Multiline;
        Regex regex = new Regex(pattern, options);
        string result = regex.Replace(input, substitution);
        Console.WriteLine("Result:" + result);
        Console.ReadKey();
    }
}
It replaces the pipe in "Green | Blue" just fine. But it also replaces pipes between later quotes, which breaks the file because columns get removed.
Update: I replaced the code with an actual sample of the file I'm processing. The regex finds the pipe but doesn't replace it. I'm missing something.

If there should be text between the double quotes and the text should be on both sides of the pipe, you might use:
(?<=")(\s*[^"\s|]+)\s*\|\s*([^\s"|]+\s*)(?=")
In the replacement use $1 $2
Explanation
(?<=") Positive lookbehind, assert what is on the left is "
(\s*[^"\s|]+) Capture in group 1 matching 0+ times a whitespace char, 1+ times not ", | or a whitespace char
\s*\|\s* Match a | between 0+ times a whitespace char
([^\s"|]+\s*) Capture in group 2 matching 1+ times not ", | or a whitespace char and match 0+ times a whitespace char
(?=") Positive lookahead, assert what is on the right is "
.NET Regex demo
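As a minimal sketch of how this pattern could be applied in C#, here it is run against the data sample from the question. The class and method names (PipeInQuotes, ReplaceSinglePipe) are illustrative assumptions, not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PipeInQuotes
{
    // Replaces one pipe that has text on both sides inside double quotes.
    // Empty fields such as ""|"" are left alone because group 1 needs
    // at least one character that is not a quote, pipe or whitespace.
    public static string ReplaceSinglePipe(string input)
    {
        const string pattern = @"(?<="")(\s*[^""\s|]+)\s*\|\s*([^\s""|]+\s*)(?="")";
        // .NET replacement strings reference groups as $1, $2.
        return Regex.Replace(input, pattern, "$1 $2");
    }

    public static void Main()
    {
        string input = @"""Hello""|""Green | Blue""|123.45|""""|""""|""""|5|45";
        Console.WriteLine(ReplaceSinglePipe(input));
        // prints: "Hello"|"Green Blue"|123.45|""|""|""|5|45
    }
}
```

Only the pipe inside "Green | Blue" is touched; the delimiters between the empty quoted fields stay in place.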
Edit
If you want to replace multiple pipes with a space between the double quotes you could make use of the \G anchor to assert the position at the end of previous match.
In the replacement use the first capturing group followed by a space $1
(?:(?<=")|\G(?!^))(\s*[^"|\s]+(?:\s+[^"|\s]+)*)\s*\|\s*(?=[^"]*")
Explanation
(?: Non capturing group
(?<=") Assert what is on the left is "
| Or
\G(?!^) Assert position at the end of the previous match
) Close non capturing group
( Capture group 1
\s*[^"|\s]+ Match 0+ times a whitespace char, followed by 1+ times not a ", | or whitespace char
(?:\s+[^"|\s]+)* Repeat 0+ times matching 1+ whitespace chars followed by 1+ times not a ", | or whitespace char
) Close capturing group 1
\s*\|\s* Match a | between 0+ times a whitespace char
(?=[^"]*") Positive lookahead, assert that a " follows after 0+ chars other than "
See another .NET regex demo
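The \G variant can be sketched in C# as follows. The names MultiPipeInQuotes and ReplacePipes are hypothetical. The first input is an invented example with two pipes in one quoted field, and the second is a fragment of the question's data.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MultiPipeInQuotes
{
    // Either start right after a double quote, or continue where the
    // previous match ended (\G), so several pipes in one field are handled.
    private static readonly Regex PipeInsideQuotes = new Regex(
        @"(?:(?<="")|\G(?!^))(\s*[^""|\s]+(?:\s+[^""|\s]+)*)\s*\|\s*(?=[^""]*"")");

    public static string ReplacePipes(string input)
    {
        // "$1 " keeps the text before the pipe and puts one space in its place.
        return PipeInsideQuotes.Replace(input, "$1 ");
    }

    public static void Main()
    {
        Console.WriteLine(ReplacePipes(@"""A | B | C""|""""|""x""|5"));
        // prints: "A B C"|""|"x"|5
        Console.WriteLine(ReplacePipes(
            @"""Vision Service |Troubleshoot""|57|""Y""|838|""Yellow Maroon""|850||""FL"""));
        // prints: "Vision Service Troubleshoot"|57|"Y"|838|"Yellow Maroon"|850||"FL"
    }
}
```

This sketch assumes every closing quote is followed directly by a pipe or the end of the line. Unquoted text placed immediately after a closing quote could still be matched, because the lookahead only checks that some later quote exists.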

My guess is that we might also want to keep only one space in the text. This expression,
"([^"]+?)\s+\|\s+([^"]+?)"
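with a replacement of $1 $2 might work. Because the pattern also consumes the surrounding double quotes, use "$1 $2" instead if the quotes must be kept in the output.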
Demo
Example
using System;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
        string pattern = @"""([^""]+?)\s+\|\s+([^""]+?)""";
        // .NET replacement strings reference groups as $1 and $2; \1 would be inserted literally.
        string substitution = "$1 $2";
        string input = @"""Hello""|""Green | Blue""|123.45|""""|""""|""""|5|45";
        RegexOptions options = RegexOptions.Multiline;
        Regex regex = new Regex(pattern, options);
        string result = regex.Replace(input, substitution);
        Console.WriteLine(result);
    }
}

Related

Regular expression : Excluding last part

I'm looking to apply a regular expression to an input string.
Regular expression:(.*)\\(.*)_(.*)_(.*)-([0-9]{4}).*
Test entries:
Parkman\L9\B137598_00_T-3298-B
Parkman\L9\B137598_00_T-3298
The result should be B137598_00_T-3298 for both test entries. The problem is that if I append 4 more digits to a test entry, the result becomes, for example, B137598_00_T-3298-5555.
I need anything after the 3298 to be ignored.
What changes can I make to achieve that?
You can use a single capture group with a bit more specific pattern:
\w\\\w+\\((?:[^\W_]+_){2}[^\W_]+-[0-9]{4})\b
The pattern matches:
\w Match a single word char
\\\w+\\ Match 1+ word chars between backslashes
( Capture group 1
(?:[^\W_]+_){2} Repeat 2 times word chars without _ followed by a single _
[^\W_]+ Match 1+ word chars without _
-[0-9]{4} Match - and 4 digits
) Close group 1
\b A word boundary
Regex demo
Or a bit broader pattern with a match only, where \w also matches an underscore, and asserting \ to the left:
(?<=\\)\w+-[0-9]{4}\b
Regex demo
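A short sketch of how the capture-group pattern could be used from C#. The class and method names (IdExtractor, ExtractId) are made up for illustration, and returning an empty string when nothing matches is an assumption rather than something the answer specifies.

```csharp
using System;
using System.Text.RegularExpressions;

public static class IdExtractor
{
    // Group 1 holds the part up to and including the 4 digits;
    // \b stops the match from ending in the middle of a longer digit run.
    private const string Pattern = @"\w\\\w+\\((?:[^\W_]+_){2}[^\W_]+-[0-9]{4})\b";

    public static string ExtractId(string path)
    {
        Match m = Regex.Match(path, Pattern);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }

    public static void Main()
    {
        Console.WriteLine(ExtractId(@"Parkman\L9\B137598_00_T-3298-B"));    // B137598_00_T-3298
        Console.WriteLine(ExtractId(@"Parkman\L9\B137598_00_T-3298"));      // B137598_00_T-3298
        Console.WriteLine(ExtractId(@"Parkman\L9\B137598_00_T-3298-5555")); // B137598_00_T-3298
    }
}
```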
C# code:
string s1 = @"Parkman\L9\B137598_00_T-3298-B";
string s2 = @"Parkman\L9\B137598_00_T-3298";
string pattern = @"\w+_[0-9]{2}_T-[0-9]{4}";
var matches = Regex.Matches(s1, pattern);
Console.WriteLine("s1: {0}", matches[0]);
matches = Regex.Matches(s2, pattern);
Console.WriteLine("s2: {0}", matches[0]);
Then the result:
s1: B137598_00_T-3298
s2: B137598_00_T-3298

Validating a string with comma-separated alphanumeric words only OR just spaces

I am working on a regex that allows alphanumeric words separated by commas, or just spaces, without a comma as the first character.
What I am trying to do:
"101010101sadadsasd,120120310231023a,adasdads1231,asdasdasda1231"
" " <-- case of just spaces, any number of them
What I am trying to avoid:
"&###$,asdasdads,asdsd#!#"
",aasdas,asdasd"
" asda asdsad asdasd ,asdasd"
What's acceptable but not wanted: (can live with it)
",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"
"asdasdasdas,asdasd123123,adsasd23123," <-- I can just trim(",")
Below is the code from the screenshot of my implementation, where IsMatch returns true even though the value consists of symbols rather than alphanumeric characters:
bool result = true;
Regex regx = new Regex(@"(^[a-zA-Z0-9]+[a-zA-Z0-9,-,]*$| *)");
if (regx.IsMatch(rowUpdate.ConNoteNumber))
{
    result = false;
}
return result;
You can use
^(?:[a-zA-Z0-9]+(?:,[a-zA-Z0-9]+)*|\s*)$
Details:
^ - start of a string
(?:
[a-zA-Z0-9]+(?:,[a-zA-Z0-9]+)* - one or more letters/digits, and then zero or more occurrences of a comma and one or more letters/digits
| - or
\s* - zero or more whitespaces
) - end of the group
$ - end of string (if the regex is executed on the server side, $ can be replaced with \z, too).
See the regex demo.
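To check the pattern against the samples from the question, a small sketch like the following could be used. The wrapper names (ConNoteValidator, IsValid) are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ConNoteValidator
{
    // Either comma-separated alphanumeric words, or nothing but whitespace.
    private static readonly Regex Valid =
        new Regex(@"^(?:[a-zA-Z0-9]+(?:,[a-zA-Z0-9]+)*|\s*)$");

    public static bool IsValid(string value) => Valid.IsMatch(value);

    public static void Main()
    {
        string[] samples =
        {
            "101010101sadadsasd,120120310231023a,adasdads1231,asdasdasda1231",
            "   ",
            "&###$,asdasdads,asdsd#!#",
            ",aasdas,asdasd",
            " asda asdsad asdasd ,asdasd",
            "asdasdasdas,asdasd123123,adsasd23123,"
        };
        foreach (string s in samples)
        {
            Console.WriteLine("{0} -> {1}", s, IsValid(s));
        }
    }
}
```

The first two samples are accepted and the remaining four are rejected. This includes the trailing-comma case that the question lists as tolerable, so the value would need to be trimmed before validation if that input should pass.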

Regex exclude ":" and a whitespace if they exist

So I have a regex here:
var text = new Regex(@"(?<=Paybacks).*", RegexOptions.IgnoreCase);
This looks for the line that starts with Paybacks. It currently prints ": blah".
The context can sometimes be "Paybacks", "Paybacks:", "Paybacks " or even "Paybacks" followed by thousands of whitespaces. How can I modify this regex so that, after "Paybacks", it ignores a colon and a whitespace (or whitespaces) that may or may not exist?
I've been playing with it in regex101 and this seems to be working, but is there a better way?
(?<=Volatility(:\s)).*
In these situations, you'd better use a regex with a capturing group:
var pattern = new Regex(@"Paybacks[\s:]*(.*)", RegexOptions.IgnoreCase);
Then, you can use
var output = pattern.Match(text).Groups[1].Value;
See the .NET regex demo and the C# demo:
var texts = new List<string> { "Paybacks: blah", "Paybacks:blah", "Paybacks blah" };
var pattern = new Regex(@"Paybacks[\s:]*(.*)", RegexOptions.IgnoreCase);
texts.ForEach(text => Console.WriteLine(pattern.Match(text)?.Groups[1].Value));
printing 3 blahs.
You might also match optional colons and whitespace chars in the lookbehind, and start the match at the first char that is neither a whitespace char nor :
(?<=Paybacks[:\s]*)[^\s:].*
The pattern matches:
(?<= Positive lookbehind, assert what is on the left is
Paybacks Match literally
[:\s]* Optionally match either : or a whitespace char using a character class
) Close lookbehind
[^\s:].* Match a single non whitespace char other than : and the rest of the line
Regex demo | C# demo
var regex = new Regex(@"(?<=Paybacks[:\s]*)[^\s:].*", RegexOptions.IgnoreCase);
string[] strings = {"Paybacks: blah", "Paybacks blah", "Paybacks blah"};
foreach (string s in strings)
{
    Console.WriteLine(regex.Match(s)?.Value);
}
Output
blah
blah
blah
If the order should be a single optional colon and optional whitespace chars, you can make the colon optional and the quantifier for the whitespace chars 0 or more using :?\s*
(?<=Paybacks:?\s*)[^\s:].*
Regex demo
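The difference between the two lookbehind variants can be seen in a small comparison. This is only a sketch: the names PaybacksLookbehind, ValueLoose and ValueStrict are invented here, and the "Paybacks:: blah" input is an assumed edge case that does not appear in the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PaybacksLookbehind
{
    // Any mix of colons and whitespace after "Paybacks".
    private static readonly Regex Loose =
        new Regex(@"(?<=Paybacks[:\s]*)[^\s:].*", RegexOptions.IgnoreCase);

    // At most one colon, then optional whitespace.
    private static readonly Regex Strict =
        new Regex(@"(?<=Paybacks:?\s*)[^\s:].*", RegexOptions.IgnoreCase);

    // Match.Value is an empty string when there is no match.
    public static string ValueLoose(string line) => Loose.Match(line).Value;
    public static string ValueStrict(string line) => Strict.Match(line).Value;

    public static void Main()
    {
        foreach (string line in new[] { "Paybacks: blah", "Paybacks   blah", "Paybacks:: blah" })
        {
            Console.WriteLine("[{0}] loose='{1}' strict='{2}'",
                line, ValueLoose(line), ValueStrict(line));
        }
    }
}
```

Both variants return blah for the first two lines. For the doubled colon, only the [:\s]* version finds a match, and the :?\s* version yields an empty string.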

c# Regex of value after certain words

I have a regex question. I have a string that looks like:
Slot:0 Module:No module in slot
What I need is a regex that will get the values after Slot and Module. Slot will always be a number, but I have a problem with Module (it can be several words with spaces). I tried:
var pattern = "(?<=:)[a-zA-Z0-9]+";
foreach (string config in backplaneConfig)
{
    List<string> values = Regex.Matches(config, pattern).Cast<Match>().Select(x => x.Value).ToList();
    modulesInfo.Add(new ModuleIdentyfication { ModuleSlot = Convert.ToInt32(values.First()), ModuleType = values.Last() });
}
So the slot part works, but the module part works only if it is a single word with no spaces; in my example it gives me only "No". Is there a way to do that?
You may use a regex to capture the necessary details in the input string:
var pattern = @"Slot:(\d+)\s*Module:(.+)";
foreach (string config in backplaneConfig)
{
    var values = Regex.Match(config, pattern);
    if (values.Success)
    {
        modulesInfo.Add(new ModuleIdentyfication { ModuleSlot = Convert.ToInt32(values.Groups[1].Value), ModuleType = values.Groups[2].Value });
    }
}
See the regex demo. Group 1 is the ModuleSlot and Group 2 is the ModuleType.
Details
Slot: - literal text
(\d+) - Capturing group 1: one or more digits
\s* - 0+ whitespaces
Module: - literal text
(.+) - Capturing group 2: the rest of the string to the end.
The simplest way would be to add a space to your character class:
var pattern = "(?<=:)[a-zA-Z0-9 ]+";
Note that on the sample string the first match then becomes "0 Module" rather than "0", so the best solution is probably the answer from @Wiktor Stribiżew.
Another option is to match either 1+ digits followed by a word boundary or match a repeating pattern using your character class but starting with [a-zA-Z]
(?<=:)(?:\d+\b|[a-zA-Z][a-zA-Z0-9]*(?: [a-zA-Z0-9]+)*)
(?<=:) Assert a : on the left
(?: Non capturing group
\d+\b Match 1+ digits followed by a word boundary
| Or
[a-zA-Z][a-zA-Z0-9]* Start a match with a-zA-Z
(?: [a-zA-Z0-9]+)* Optionally repeat a space and what is listed in the character class
) Close non capturing group
Regex demo
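A minimal sketch of the alternation pattern applied to the sample string from the question. SlotModuleParser and ValuesAfterColons are hypothetical names chosen for this illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SlotModuleParser
{
    // After a colon, take either a number or words separated by single spaces.
    private const string Pattern =
        @"(?<=:)(?:\d+\b|[a-zA-Z][a-zA-Z0-9]*(?: [a-zA-Z0-9]+)*)";

    public static string[] ValuesAfterColons(string config)
    {
        return Regex.Matches(config, Pattern)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToArray();
    }

    public static void Main()
    {
        string[] values = ValuesAfterColons("Slot:0 Module:No module in slot");
        Console.WriteLine(string.Join(" | ", values));
        // prints: 0 | No module in slot
    }
}
```

Because the number alternative is tried first and ends at a word boundary, the slot value stays "0" and does not absorb the following word.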
Please replace this:
// regular exp.
(\d+)\s*(.+)
You don't need a regex for such simple parsing. Try the following:
var str = "Slot:0 Module:No module in slot";
// parts[0] is the slot ("0"), parts[1] is the module ("No module in slot")
var parts = str.Split(new string[] { "Slot:", "Module:" }, StringSplitOptions.RemoveEmptyEntries)
    .Select(s => s.Trim())
    .ToList();

How can I remove quoted string literals from a string in C#?

I have a string:
Hello "quoted string" and 'tricky"stuff' world
and want to get the string minus the quoted parts back. E.g.,
Hello and world
Any suggestions?
resultString = Regex.Replace(subjectString,
    @"([""']) # Match a quote, remember which one
    (?:       # Then...
     (?!\1)   # (as long as the next character is not the same quote as before)
     .        # match any character
    )*        # any number of times
    \1        # until the corresponding closing quote
    \s*       # plus optional whitespace
    ",
    "", RegexOptions.IgnorePatternWhitespace);
will work on your example.
resultString = Regex.Replace(subjectString,
    @"([""']) # Match a quote, remember which one
    (?:       # Then...
     (?!\1)   # (as long as the next character is not the same quote as before)
     \\?.     # match any escaped or unescaped character
    )*        # any number of times
    \1        # until the corresponding closing quote
    \s*       # plus optional whitespace
    ",
    "", RegexOptions.IgnorePatternWhitespace);
will also handle escaped quotes.
So it will correctly transform
Hello "quoted \"string\\" and 'tricky"stuff' world
into
Hello and world
Use a regular expression to match any quoted strings within the string and replace them with the empty string. Use the Regex.Replace() method to do the pattern matching and replacement.
In case, like me, you're afraid of regex, I've put together a functional way to do it, based on your example string. There's probably a way to make the code shorter, but I haven't found it yet.
private static string RemoveQuotes(IEnumerable<char> input)
{
    string part = new string(input.TakeWhile(c => c != '"' && c != '\'').ToArray());
    var rest = input.SkipWhile(c => c != '"' && c != '\'');
    if (string.IsNullOrEmpty(new string(rest.ToArray())))
        return part;
    char delim = rest.First();
    var afterIgnore = rest.Skip(1).SkipWhile(c => c != delim).Skip(1);
    StringBuilder full = new StringBuilder(part);
    return full.Append(RemoveQuotes(afterIgnore)).ToString();
}
