Combine multiple regex expressions into a single line - c#

I am trying to get the following Regex expressions into a single line.
filename = Regex.Replace(filename, #"[^a-z0-9\s-!/\-_\.\*\(\)']", ""); // Remove all non valid chars
filename = Regex.Replace(filename, #"\s+", " ").Trim(); // convert multiple spaces into one space
filename = Regex.Replace(filename, #"\s", "_"); // //Replace spaces by dashes

You can use
filename = Regex.Replace(filename, #"[^a-z0-9\s!/_.*()'-]|^\s+|\s+$|(\s+)", m => m.Groups[1].Success ? "_" : "");
The regex matches
[^a-z0-9\s!/_.*()'-]| - any char but lowercase ASCII letters, digits, whitespace, !, /, _, ., *, (, ), ' and -, or
^\s+| - start of string and then one or more whitespaces, or
\s+$| - one or more whitespaces and then end of string
(\s+) - Group 1: one or more whitespaces in any other context.
If Group 1 matches, the replacement is a _ char, else, the replacement is an empty string (the match is removed).
See the regex demo.

Related

Regex.Split string into substrings by a delimiter while preserving whitespace

I created a Regex to split a string by a delimiter ($), but it's not working the way I want.
var str = "sfdd fgjhk fguh $turn.bak.orm $hahr*____f";
var list = Regex.Split(str, #"(\$\w+)").Where(x => !string.IsNullOrEmpty(x)).ToList();
foreach (var item in list)
{
Console.WriteLine(item);
}
Output:
"sfdd fgjhk fguh "
"$turn"
".bak.orm "
"$hahr"
"*____f"
The problem is \w+ is not matching any periods or stars. Here's the output I want:
"sfdd fgjhk fguh "
"$turn.bak.orm"
" "
"$hahr*____f"
Essentially, I want to split a string by $ and make sure $ appears at the beginning of a substring and nowhere else (it's okay for a substring to be $ only). I also want to make sure whitespace characters are preserved as in the first substring, but any match should not contain whitespace as in the second and fourth cases. I don't care for case sensitivity.
It appears you want to split with a pattern that starts with a dollar and then has any 0 or more chars other than whitespace and dollar chars:
var list = Regex.Split(s, #"(\$[^\s$]*)")
.Where(x => !string.IsNullOrEmpty(x))
.ToList();
Details
( - start of a capturing group (so that Regex.Split tokenized the string, could keep the matches inside the resulting array)
\$ - a dollar sign
[^\s$]* - a negated character class matching 0 or more chars other than whitespace (\s) and dollar symbols
) - end of the capturing group.
See the regex demo:
To include a second delimiter, you may use #"([€$][^\s€$]*)".

Getting the substring after a character in C# using regex

I have the following input string:
string val = "[01/02/70]\nhello world ";
I want to get the all words after the last ] character.
Example output for a sample string above:
\nhello world
In C#, use Substring() with IndexOf:
string val = val.Substring(val.IndexOf(']') + 1);
If you have multiple ] symbols, and you want to get all the string after the last one, use LastIndexOf:
string val = "[01/02/70]\nhello [01/02/80] world ";
string val = val.Substring(val.LastIndexOf(']') + 1); // => " world "
If you are a fan of Regex, you might want to use a Regex.Replace like
string val = "[01/02/70]\nhello [01/02/80] world ";
val = Regex.Replace(val, #"^.*\]", string.Empty, RegexOptions.Singleline); // => " world "
See demo
Notes on REGEX:
RegexOptions.Singleline makes . match a linebreak
^ - matches beginning of string
.* - matches 0 or more characters but as many as possible (greedy matching)
\] - matches literal ] (as it is a special regex metacharacter, it must be escaped).
You need to use lookbehind assertion. And not only that, you have to enable DOTALL modifier also, so that it would also match the newline character present inbetween.
"(?s)(?<=\\]).*"
(?s) - DOTALL modifier.
(?<=\\]) - lookbehind which asserts that the match must be preceeded by a close bracket
.* - Matches any chracater zero or more times.
or
"(?s)(?<=\\])[\\s\\S]*"
Try this if you don't want to match the following newline character.
#"(?<=\][\n\r]*).*"

I think my regular expression pattern in C# is incorrect

I'm checking to see if my regular expression matches my string.
I have a filename that looks like somename_somthing.txt and I want to match it to somename_*.txt, but my code is failing when I try to pass something that should match. Here is my code.
string pattern = "somename_*.txt";
Regex r = new Regex(pattern, RegexOptions.IgnoreCase);
using (ZipFile zipFile = ZipFile.Read(fullPath))
{
foreach (ZipEntry e in zipFile)
{
Match m = r.Match("somename_something.txt");
if (!m.Success)
{
throw new FileNotFoundException("A filename with format: " + pattern + " not found.");
}
}
}
The asterisk is matching the underscore and throwing it off.
Try:
somename_(\w+).txt
The (\w+) here will match the group at this location.
You can see it match here: https://regex101.com/r/qS8wA5/1
In General
Regex give in this code matches the _ with an * meaning zero or more underscores instead of what you intended. The * is used to denote zero or more of the previous item. Instead try
^somename_(.*)\.txt$
This matches exactly the first part "somename_".
Then anything (.*)
And finally the end ".txt". The backslash escapes the 'dot'.
More Specific
You can also say if you only want letters and not numbers or symbols in the middle part of the match with:
^somename_[a-z]*\.txt$
As written, your regular expression
somename_*.txt
matches (in a case-insensitive manner):
the literal text somename, followed by
zero or more underscore characters (_), followed
any character (other than newline), followed
the literal text txt
And it will match that anywhere in the source text. You probably want to write something like
Regex myPattern = new Regex( #"
^ # anchor the match to start-of-text, followed by
somename # the literal 'somename', followed by
_ # a literal underscore character, followed by
.* # zero or of any character (except newline), followed by
\. # a literal period/fullstop, followed by
txt # the literal text 'txt'
$ # with the match anchored at end-of-text
" , RegexOptions.IgnoreCase|RegexOptions.IgnorePatternWhitespace
) ;
Hi I think the pattern should be
string pattern = "somename_.*\\.txt";
Regards

Replace special character with white space through regex

I have a function which replace character.
public static string Replace(string value)
{
value = Regex.Replace(value, "[\n\r\t]", " ");
return value;
}
value="abc\nbcd abcd abcd\ " if in string there is any unwanted white space they are also remove.Means I want result like this
value="abcabcdabcd".Help to change Regex Pattern to get desire result.Thanks a lot.
If you need to remove any number of whitespace characters from the string, probably you're looking for something like this:
value = Regex.Replace(value, #"\s+", "");
where \s matches any whitespace character and + means one or more times.
Instead of replacing your newline, tab, etc. characters with a space, just replace all whitespace with nothing:
public static string RemoveWhitespace(string value)
{
return Regex.Replace(value, "\\s", "");
}
\s is a special character group that matches all whitespace characters. (The backslash is doubled because the backslash has a special meaning in C# strings as well.) The following MSDN link contains the exact definition of that character group:
Character Classes: White-Space Character: \s
You may want to try \s indicating white spaces. With the statement Regex.Replace(value, #"\s", ""), the output will be "abcabcdabcd".

How can I remove quoted string literals from a string in C#?

I have a string:
Hello "quoted string" and 'tricky"stuff' world
and want to get the string minus the quoted parts back. E.g.,
Hello and world
Any suggestions?
resultString = Regex.Replace(subjectString,
#"([""'])# Match a quote, remember which one
(?: # Then...
(?!\1) # (as long as the next character is not the same quote as before)
. # match any character
)* # any number of times
\1 # until the corresponding closing quote
\s* # plus optional whitespace
",
"", RegexOptions.IgnorePatternWhitespace);
will work on your example.
resultString = Regex.Replace(subjectString,
#"([""'])# Match a quote, remember which one
(?: # Then...
(?!\1) # (as long as the next character is not the same quote as before)
\\?. # match any escaped or unescaped character
)* # any number of times
\1 # until the corresponding closing quote
\s* # plus optional whitespace
",
"", RegexOptions.IgnorePatternWhitespace);
will also handle escaped quotes.
So it will correctly transform
Hello "quoted \"string\\" and 'tricky"stuff' world
into
Hello and world
Use a regular expression to match any quoted strings with the string and replace them with the empty string. Use the Regex.Replace() method to do the pattern matching and replacement.
In case, like me, you're afraid of regex, I've put together a functional way to do it, based on your example string. There's probably a way to make the code shorter, but I haven't found it yet.
private static string RemoveQuotes(IEnumerable<char> input)
{
string part = new string(input.TakeWhile(c => c != '"' && c != '\'').ToArray());
var rest = input.SkipWhile(c => c != '"' && c != '\'');
if(string.IsNullOrEmpty(new string(rest.ToArray())))
return part;
char delim = rest.First();
var afterIgnore = rest.Skip(1).SkipWhile(c => c != delim).Skip(1);
StringBuilder full = new StringBuilder(part);
return full.Append(RemoveQuotes(afterIgnore)).ToString();
}

Categories

Resources