Replace and remove a string via regex

This is my first time working with regex.
The string below
var value = "abc ltd as yes";
needs to be changed to
var value = "abc Limited";
I have the following code:
public static string Attempt_Prefix_Removal( string prefix, string replacement, bool remove = false)
{
if (remove == true)
{
var yesy = $"(?<!prefix )" + prefix + ".*";
var test = Regex.Replace(prefix.ToLower(), $"(?<!prefix )" + prefix + ".*", replacement );
}
var output = (remove == true) ? Regex.Replace(prefix.ToLower(), $"(?<!prefix )" + prefix + ".*", replacement) : Regex.Replace(prefix.ToLower(), $"(?<!prefix )" + prefix + "", replacement);
return output;
}
The values passed to the method are
prefix = "ltd", replacement = "Limited", remove = true
After running the code, the result is
abc Limited as yes
What do I need to change to get rid of "as yes"?
Thanks

You may leverage this code:
var prefix ="ltd"; var replacement = "Limited";
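var pat = $@"(?s)(?<!\w){Regex.Escape(prefix)}(?!\w){(remove ? ".*" : string.Empty)}";
return Regex.Replace(val, pat, replacement.Replace("$", "$$"));
See the C# demo online
The main points here are:
(?s) - allows . to match a newline (in case you use .* in the pattern)
(?<!\w){Regex.Escape(prefix)}(?!\w) - the (?<!\w) negative lookbehind fails the match if the current location is preceded by a word char, and the (?!\w) negative lookahead fails it if the prefix is followed by a word char (you may further tweak the lookarounds as per your requirements)
{(remove ? ".*" : string.Empty)} - this appends .* if remove is true and nothing otherwise; the parentheses are required because a bare : inside an interpolation hole is read as the start of a format specifier
replacement.Replace("$", "$$") - doubles any $ in the replacement so that it is inserted literally rather than read as a group reference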
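Putting the pieces together, a self-contained version might look like the sketch below. The class name PrefixReplacer and the method name ReplacePrefix are placeholders of mine, not something from the question, and the optional tail is built in a separate variable only to keep the interpolated string simple.

```csharp
using System.Text.RegularExpressions;

static class PrefixReplacer
{
    // Hypothetical helper (name and signature are illustrative only).
    // Replaces the whole-word prefix with the replacement; when remove is true,
    // everything after the prefix is swallowed by the match as well.
    public static string ReplacePrefix(string value, string prefix, string replacement, bool remove = false)
    {
        // ".*" consumes the rest of the input; (?s) lets it cross newlines.
        string tail = remove ? ".*" : string.Empty;
        string pattern = $@"(?s)(?<!\w){Regex.Escape(prefix)}(?!\w){tail}";

        // Double any "$" so the replacement text is inserted literally.
        return Regex.Replace(value, pattern, replacement.Replace("$", "$$"));
    }
}
```

With the values from the question, ReplacePrefix("abc ltd as yes", "ltd", "Limited", true) should give abc Limited, while leaving remove at false keeps the trailing "as yes".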

private string regexOp(string sentence, string word, string wordtoReplace, bool isRemove)
{
var retValue = sentence;
if (isRemove)
{
// Lazily capture everything before the first occurrence of the word
var pattern = "^.*?(?=" + Regex.Escape(word) + ")";
Match result = Regex.Match(sentence, pattern);
// Check Success rather than the value, which is empty when the word starts the sentence
if (result.Success)
retValue = result.Value + wordtoReplace;
}
else
retValue = Regex.Replace(sentence, Regex.Escape(word), wordtoReplace);
return retValue;
}
Try this method; it should work as you expect with dynamic input.

Related

Removing parts of the path from string

Consider the following string
string path = @"\\ParentDirectory\All_Attachments$\BATCH_NUMBERS\TS0001\SubDirectory\FileName.txt";
I am trying to modify the path by removing the \\ParentDirectory\All_Attachments$\. So I want my final string to look like:
BATCH_NUMBERS\TS0001\SubDirectory\FileName.txt
I have come up with the following regex
string pattern = @"(?<=\$)(\\)";
string returnValue = Regex.Replace(path, pattern, "", RegexOptions.IgnoreCase);
With the above if I do Console.WriteLine(returnValue) I get
\\ParentDirectory\All_Attachments$BATCH_NUMBERS\TS0001\SubDirectory\FileName.txt
So it only removes the \ after the $. Can someone tell me how to achieve this, please?

The code below should do the trick.
string path = @"\\ParentDirectory\All_Attachments$\BATCH_NUMBERS\TS0001\SubDirectory\FileName.txt";
var result = Regex.Replace(path,
@"^ # Start of string
[^$]+ # Anything that is not '$' at least one time
\$ # The '$' sign
\\ # The \ after the '$'
", String.Empty, RegexOptions.IgnorePatternWhitespace);
When executed in LinqPad it gives the following result:
BATCH_NUMBERS\TS0001\SubDirectory\FileName.txt

As an alternative avoiding an RE or split/join you could just run along the string until you have seen 4 slashes:
string result = null;
for (int i = 0, m = 0; i < path.Length; i++)
if (path[i] == '\\' && ++m == 4) {
result = path.Substring(i + 1);
break;
}

You can use a regex that matches the first two groups of backslash(es) followed by one or more non-backslashes, plus the $ and the backslash after it:
string returnValue = Regex.Replace(path, @"^(?:\\+[^\\]+){2}\$\\", "");
Or split on $, join the string array without its first element, and then trim \ from the start:
string returnValue = string.Join(null, path.Split('$').Skip(1)).TrimStart('\\');
Note that the second method needs using System.Linq to work.

You can use a combination of Substring() and IndexOf() to accomplish your goal, skipping both the $ and the backslash that follows it:
string result = path.Substring(path.IndexOf('$') + 2);
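One thing the Substring/IndexOf approach does not cover is a path without the marker, where IndexOf returns -1. A small sketch that guards against that case is shown below; the class and method names are hypothetical, and treating "$\" as the marker is my assumption based on the sample path.

```csharp
using System;

static class PathTrimmer
{
    // Hypothetical helper: returns the part of the path after the first "$\".
    // If the marker is absent, the path is returned unchanged.
    public static string AfterShare(string path)
    {
        const string marker = @"$\";
        int index = path.IndexOf(marker, StringComparison.Ordinal);
        return index < 0 ? path : path.Substring(index + marker.Length);
    }
}
```

For the path in the question this returns BATCH_NUMBERS\TS0001\SubDirectory\FileName.txt, and a path without "$\" comes back as it was.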

Extra delimiter in MAC Address reformat

I've looked at several questions on here about formatting and validating MAC addresses, which is where I developed my regex from. The problem is that when I update the field, the newly formatted MAC contains extra delimiters, or, if the input has no delimiters, it fails to validate. I'm new to regex, so can someone clarify why this is happening?
if (checkMac(NewMacAddress.Text) == true)
{
string formattedMAC = NewMacAddress.Text;
formattedMAC.Replace(" ", "").Replace(":", "").Replace("-", ""); //attempt to remove the delimiters before formatting
var regex = "(.{2})(.{2})(.{2})(.{2})(.{2})(.{2})";
var replace = "$1:$2:$3:$4:$5:$6";
var newformat = Regex.Replace(formattedMAC, regex, replace);
NewMacAddress.Text = newformat.ToString();
}
Here is the checkmac function
protected bool checkMac(string macaddress)
{
macaddress.Replace(" ", "").Replace(":", "").Replace("-", "");
Regex r = new Regex("^([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2})$");
if (r.Match(macaddress).Success)
{
return true;
}
else
{
return false;
}
}
This is sample output for the extra delimiter that I was talking about. 00::5:0::56::b:f:00:7f
I was able to get the original MAC from a textbox. This also occurs with the MAC address I get from screen scrapes.

Your code is not working as intended for two reasons:
String.Replace does not modify the string you pass in, but returns a new string instead (strings are immutable). You have to assign the result of String.Replace to a variable.
Your checkMac function only allows mac addresses with delimiters. You can simply remove this restriction to resolve your problems.
The working code then becomes something along the lines of:
string newMacAddress = "00::5:0::56::b:f:00:7f";
if (checkMac(newMacAddress) == true)
{
string formattedMAC = newMacAddress;
formattedMAC = formattedMAC.Replace(" ", "").Replace(":", "").Replace("-", ""); //attempt to remove the delimiters before formatting
var regex = "(.{2})(.{2})(.{2})(.{2})(.{2})(.{2})";
var replace = "$1:$2:$3:$4:$5:$6";
var newformat = Regex.Replace(formattedMAC, regex, replace);
newMacAddress = newformat.ToString();
}
protected static bool checkMac(string macaddress)
{
macaddress = macaddress.Replace(" ", "").Replace(":", "").Replace("-", "");
Regex r = new Regex("^([0-9A-Fa-f]{12})$");
if (r.Match(macaddress).Success)
{
return true;
}
else
{
return false;
}
}

You're close. I'm first going to answer using Ruby because that's what I'm most familiar with at the moment, and it should be sufficient for you to understand how to get it working in C#. Maybe I can convert it to C# later.
using these elements:
\A - start of entire string
[0-9a-fA-F] - any hex digit
{2} - twice
[:-]? - either ":" or "-" or "" (no delimiter)
\Z - end of entire string, before ending newline if it exists
() - parenthetical match in order to reference parts of the regex, e.g. match[1]
This regex will do what you need:
mac_address_regex = /\A([0-9a-fA-F]{2})[:-]?([0-9a-fA-F]{2})[:-]?([0-9a-fA-F]{2})[:-]?([0-9a-fA-F]{2})[:-]?([0-9a-fA-F]{2})[:-]?([0-9a-fA-F]{2})\Z/
You can both validate and sanitize input using this regex:
match = mac_address_regex.match(new_mac_address.text)
if match.present?
sanitized_mac_addr = (1..6).map { |i| match[i] }.join(":") # join match[i] for i = (1,2,3,4,5,6)
sanitized_mac_addr.upcase! # uppercase
else
sanitized_mac_addr = nil
end
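A rough C# counterpart of the Ruby snippet above could look like this. The class and method names are made up for illustration; the regex itself is carried over unchanged, since .NET also supports the \A and \Z anchors.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

static class MacFormatter
{
    // Six hex pairs, each optionally followed by a single ":" or "-" delimiter.
    private static readonly Regex MacRegex = new Regex(
        @"\A([0-9a-fA-F]{2})[:-]?([0-9a-fA-F]{2})[:-]?([0-9a-fA-F]{2})[:-]?" +
        @"([0-9a-fA-F]{2})[:-]?([0-9a-fA-F]{2})[:-]?([0-9a-fA-F]{2})\Z");

    // Hypothetical helper: returns the MAC as upper-case AA:BB:CC:DD:EE:FF,
    // or null when the input does not validate.
    public static string Sanitize(string input)
    {
        Match match = MacRegex.Match(input);
        if (!match.Success)
            return null;

        // Groups 1..6 hold the six hex pairs.
        var pairs = Enumerable.Range(1, 6).Select(i => match.Groups[i].Value);
        return string.Join(":", pairs).ToUpperInvariant();
    }
}
```

This validates and reformats in one step, so input with or without delimiters ends up in the same colon-separated form.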

C# Regex to replace invalid character to make it as perfect float number

for example if the string is "-234.24234.-23423.344"
the result should be "-234.2423423423344"
if the string is "898.4.44.4"
the result should be "898.4444"
if the string is "-898.4.-"
the result should be "-898.4"
The result should always make sense as a double.
What I have so far is this:
string pattern = String.Format(@"[^\d\{0}\{1}]",
NumberFormatInfo.CurrentInfo.NumberDecimalSeparator,
NumberFormatInfo.CurrentInfo.NegativeSign);
string result = Regex.Replace(value, pattern, string.Empty);
// this will not be able to deal with something like this "-.3-46821721.114.4"
Is there any perfect way to deal with those cases?

It's probably a bad idea, but you can do this with regex like this:
Regex.Replace(input, @"[^-.0-9]|(?<!^)-|(?<=\..*)\.", "")
The regex matches:
[^-.0-9] # anything which isn't ., -, or a digit.
| # or
(?<!^)- # a - which is not at the start of the string
| # or
(?<=\..*)\. # a dot which is not the first dot in the string
This works on your examples, and additionally this case: "9-1.1" becomes "91.1".
You could also change (?<!^)- to (?<!^[^-.0-9]*)- if you'd like "asd-8" to become "-8" rather than "8".
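As a quick way to try that pattern against the examples from the question, it can be wrapped in a small helper. This is only a sketch: FloatSanitizer and Clean are names I made up, and it hard-codes "." and "-" rather than using the culture's separators.

```csharp
using System.Text.RegularExpressions;

static class FloatSanitizer
{
    // Removes: any char that is not a digit, "." or "-";
    // any "-" that is not at the very start; any "." after the first one.
    private static readonly Regex Invalid =
        new Regex(@"[^-.0-9]|(?<!^)-|(?<=\..*)\.");

    // Hypothetical helper name; returns the cleaned-up text.
    public static string Clean(string input)
    {
        return Invalid.Replace(input, "");
    }
}
```

The result can then be handed to double.Parse or double.TryParse with CultureInfo.InvariantCulture; inputs that contain no digits at all would still need separate handling.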

Using regex alone is not a good way to achieve your goal, since regular expressions lack convenient AND and NOT logic.
Try the code below; it does the same thing.
var str = @"-.3-46821721.114.4";
var validHead = new Regex(@"(\d\.)" /* use @"\." if you think "-.5" is also valid */, RegexOptions.Compiled);
// Regex.Replace returns a new string, so the result must be assigned
str = Regex.Replace(str, @"[^0-9\.-]", "");
var match = validHead.Match(str);
// Everything before the first "digit + dot" head; a leading minus sign is kept
var rawBefore = str.Substring(0, match.Index);
var beforeHead = Regex.Replace(rawBefore, @"[^0-9]", "");
if (rawBefore.StartsWith("-"))
{
beforeHead = "-" + beforeHead;
}
// Everything after the head, reduced to digits only
var afterHead = Regex.Replace(str.Substring(match.Index + match.Length), @"[^0-9]", "");
var validFloatNumber = beforeHead + match.Value + afterHead;
The string must be trimmed before the operation, and the code assumes the input contains at least one match for the head pattern.

Need to perform Wildcard (*,?, etc) search on a string using Regex

I need to perform Wildcard (*, ?, etc.) search on a string.
This is what I have done:
string input = "Message";
string pattern = "d*";
Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
if (regex.IsMatch(input))
{
MessageBox.Show("Found");
}
else
{
MessageBox.Show("Not Found");
}
With the above code the "Found" block is hit, but it should not be!
Only if my pattern is "e*" should "Found" be hit.
My understanding (and requirement) is that a d* search should find text containing "d" followed by any characters.
Should I change my patterns to "d.*" and "e.*"? Is there any wildcard support in .NET that does this internally when using the Regex class?

From http://www.codeproject.com/KB/recipes/wildcardtoregex.aspx:
public static string WildcardToRegex(string pattern)
{
return "^" + Regex.Escape(pattern)
.Replace(@"\*", ".*")
.Replace(@"\?", ".")
+ "$";
}
So something like foo*.xls? will get transformed to ^foo.*\.xls.$.
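To see how this conversion behaves on the input from the question, here is a minimal sketch. The wrapper class WildcardDemo and the IsWildcardMatch helper are my own additions for illustration; only WildcardToRegex comes from the snippet above.

```csharp
using System.Text.RegularExpressions;

static class WildcardDemo
{
    // Same conversion as above: escape everything, then re-enable * and ?.
    public static string WildcardToRegex(string pattern)
    {
        return "^" + Regex.Escape(pattern)
            .Replace(@"\*", ".*")
            .Replace(@"\?", ".")
            + "$";
    }

    // Hypothetical convenience wrapper: case-insensitive whole-string match.
    public static bool IsWildcardMatch(string input, string wildcard)
    {
        return Regex.IsMatch(input, WildcardToRegex(wildcard), RegexOptions.IgnoreCase);
    }
}
```

Because the converted pattern is anchored at both ends, "d*" no longer matches "Message", "m*" does, and a "contains" search has to be written as "*e*".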

You can do a simple wildcard match without Regex using a Visual Basic function called LikeString (this requires a reference to the Microsoft.VisualBasic assembly).
using Microsoft.VisualBasic;
using Microsoft.VisualBasic.CompilerServices;
if (Operators.LikeString("This is just a test", "*just*", CompareMethod.Text))
{
Console.WriteLine("This matched!");
}
If you use CompareMethod.Text it will compare case-insensitive. For case-sensitive comparison, you can use CompareMethod.Binary.
More info here: http://www.henrikbrinch.dk/Blog/2012/02/14/Wildcard-matching-in-C
MSDN: http://msdn.microsoft.com/en-us/library/microsoft.visualbasic.compilerservices.operators.likestring%28v=vs.100%29.ASPX

The correct regular expression formulation of the glob expression d* is ^d, which means match anything that starts with d.
string input = "Message";
string pattern = @"^d";
Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
(The @ quoting is not necessary in this case, but good practice since many regexes use backslash escapes that need to be left alone, and it also indicates to the reader that this string is special).

Windows and *nix treat wildcards differently. Windows processes *, ? and . in a very complex way, where the presence or position of one changes the meaning of another, while *nix keeps it simple and does just one plain pattern match. Besides that, Windows matches ? against 0 or 1 characters, whereas Linux matches it against exactly 1 character.
I didn't find authoritative documents on this matter; this is just my conclusion based on days of tests on Windows 8/XP (command line, the dir command to be specific; the Directory.GetFiles method uses the same rules too) and Ubuntu Server 12.04.1 (ls command). I made dozens of common and uncommon cases work, although there are many failed cases too.
The current answer by Gabe works like *nix. If you also want a Windows-style one, and are willing to accept the imperfection, here it is:
/// <summary>
/// <para>Tests if a file name matches the given wildcard pattern, uses the same rule as shell commands.</para>
/// </summary>
/// <param name="fileName">The file name to test, without folder.</param>
/// <param name="pattern">A wildcard pattern which can use char * to match any amount of characters; or char ? to match one character.</param>
/// <param name="unixStyle">If true, use the *nix style wildcard rules; otherwise use windows style rules.</param>
/// <returns>true if the file name matches the pattern, false otherwise.</returns>
public static bool MatchesWildcard(this string fileName, string pattern, bool unixStyle)
{
if (fileName == null)
throw new ArgumentNullException("fileName");
if (pattern == null)
throw new ArgumentNullException("pattern");
if (unixStyle)
return WildcardMatchesUnixStyle(pattern, fileName);
return WildcardMatchesWindowsStyle(fileName, pattern);
}
private static bool WildcardMatchesWindowsStyle(string fileName, string pattern)
{
var dotdot = pattern.IndexOf("..", StringComparison.Ordinal);
if (dotdot >= 0)
{
for (var i = dotdot; i < pattern.Length; i++)
if (pattern[i] != '.')
return false;
}
var normalized = Regex.Replace(pattern, @"\.+$", "");
var endsWithDot = normalized.Length != pattern.Length;
var endWeight = 0;
if (endsWithDot)
{
var lastNonWildcard = normalized.Length - 1;
for (; lastNonWildcard >= 0; lastNonWildcard--)
{
var c = normalized[lastNonWildcard];
if (c == '*')
endWeight += short.MaxValue;
else if (c == '?')
endWeight += 1;
else
break;
}
if (endWeight > 0)
normalized = normalized.Substring(0, lastNonWildcard + 1);
}
var endsWithWildcardDot = endWeight > 0;
var endsWithDotWildcardDot = endsWithWildcardDot && normalized.EndsWith(".");
if (endsWithDotWildcardDot)
normalized = normalized.Substring(0, normalized.Length - 1);
normalized = Regex.Replace(normalized, @"(?!^)(\.\*)+$", @".*");
var escaped = Regex.Escape(normalized);
string head, tail;
if (endsWithDotWildcardDot)
{
head = "^" + escaped;
tail = @"(\.[^.]{0," + endWeight + "})?$";
}
else if (endsWithWildcardDot)
{
head = "^" + escaped;
tail = "[^.]{0," + endWeight + "}$";
}
else
{
head = "^" + escaped;
tail = "$";
}
if (head.EndsWith(@"\.\*") && head.Length > 5)
{
head = head.Substring(0, head.Length - 4);
tail = @"(\..*)?" + tail;
}
var regex = head.Replace(@"\*", ".*").Replace(@"\?", "[^.]?") + tail;
return Regex.IsMatch(fileName, regex, RegexOptions.IgnoreCase);
}
private static bool WildcardMatchesUnixStyle(string pattern, string text)
{
var regex = "^" + Regex.Escape(pattern)
.Replace("\\*", ".*")
.Replace("\\?", ".")
+ "$";
return Regex.IsMatch(text, regex);
}
There's a funny thing, even the Windows API PathMatchSpec does not agree with FindFirstFile. Just try a1*., FindFirstFile says it matches a1, PathMatchSpec says not.

d* means that it should match zero or more "d" characters. So any string is a valid match. Try d+ instead!
In order to have support for wildcard patterns I would replace the wildcards with the RegEx equivalents. Like * becomes .* and ? becomes .?. Then your expression above becomes d.*
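The difference between the quantifiers is easy to check directly. The sketch below uses a helper of my own (QuantifierDemo.FirstMatchLength, not part of any library) that reports how long the first match is, or -1 if there is none.

```csharp
using System.Text.RegularExpressions;

static class QuantifierDemo
{
    // Returns the length of the first match, or -1 when the pattern does not match at all.
    public static int FirstMatchLength(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern, RegexOptions.IgnoreCase);
        return m.Success ? m.Length : -1;
    }
}
```

For the input "Message", the pattern d* succeeds with an empty (zero-length) match, d+ fails, and e.* matches "essage".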

You need to convert your wildcard expression to a regular expression. For example:
private bool WildcardMatch(String s, String wildcard, bool case_sensitive)
{
// Replace the * with an .* and the ? with a dot. Put ^ at the
// beginning and a $ at the end
String pattern = "^" + Regex.Escape(wildcard).Replace(@"\*", ".*").Replace(@"\?", ".") + "$";
// Now, run the Regex as you already know
Regex regex;
if(case_sensitive)
regex = new Regex(pattern);
else
regex = new Regex(pattern, RegexOptions.IgnoreCase);
return(regex.IsMatch(s));
}

You must escape special Regex symbols in the input wildcard pattern (for example, the pattern *.txt is equivalent to ^.*\.txt$).
So slashes, braces and many other special symbols must be replaced with @"\" + s, where s is the special Regex symbol.

I think @Dmitri has a nice solution at
Matching strings with wildcard https://stackoverflow.com/a/30300521/1726296
Based on his solution, I have created two extension methods (credit goes to him).
They may be helpful.
public static String WildCardToRegular(this String value)
{
return "^" + Regex.Escape(value).Replace("\\?", ".").Replace("\\*", ".*") + "$";
}
public static bool WildCardMatch(this String value,string pattern,bool ignoreCase = true)
{
if (ignoreCase)
return Regex.IsMatch(value, WildCardToRegular(pattern), RegexOptions.IgnoreCase);
return Regex.IsMatch(value, WildCardToRegular(pattern));
}
Usage:
string pattern = "file.*";
var isMatched = "file.doc".WildCardMatch(pattern);
or
string xlsxFile = "file.xlsx";
var isMatched = xlsxFile.WildCardMatch(pattern);

None of the code above is fully correct.
That is because searching for zz*foo* or zz* does not give correct results.
Also, if you search for "abcd*" in Total Commander it finds a file named abcd, so the code above is wrong in that respect too.
Here is the corrected code.
public string WildcardToRegex(string pattern)
{
string result = Regex.Escape(pattern).
Replace(@"\*", ".+?").
Replace(@"\?", ".");
if (result.EndsWith(".+?"))
{
result = result.Remove(result.Length - 3, 3);
result += ".*";
}
return result;
}

You may want to use WildcardPattern from System.Management.Automation assembly. See my answer here.

The most accepted answer works fine for most cases and can be used in most scenarios:
"^" + Regex.Escape(pattern).Replace(@"\*", ".*").Replace(@"\?", ".") + "$";
However, if you allow escaping in your input wildcard pattern, e.g. "find \*", meaning you want to search for the string "find *" with a literal asterisk, it won't work. Regex.Escape turns the already escaped \* into \\\*, and after the replacement the pattern ends in \\.* (an escaped backslash followed by .*), which is wrong.
The following code (which for sure can be optimized and rewritten) handles that special case:
public static string WildcardToRegex(string wildcard)
{
var sb = new StringBuilder();
for (var i = 0; i < wildcard.Length; i++)
{
// If wildcard has an escaped \* or \?, preserve it like it is in the Regex expression
var character = wildcard[i];
if (character == '\\' && i < wildcard.Length - 1)
{
if (wildcard[i + 1] == '*')
{
sb.Append("\\*");
i++;
continue;
}
if (wildcard[i + 1] == '?')
{
sb.Append("\\?");
i++;
continue;
}
}
switch (character)
{
// If it's unescaped * or ?, change it to Regex equivalents. Add more wildcard characters (like []) if you need to support them.
case '*':
sb.Append(".*");
break;
case '?':
sb.Append('.');
break;
default:
// Escape all other symbols because wildcard could contain Regex special symbols like '.'
sb.Append(Regex.Escape(character.ToString()));
break;
}
}
return $"^{sb}$";
}
A solution to the problem using only Regex substitutions is proposed here: https://stackoverflow.com/a/15275806/1105564

Regular Expression to Match Exact Word - Search String Highlight

I'm using the following two methods to highlight the search keywords. They work fine but also match partial words.
For example:
Text: "This is .net Programming"
Search keyword: "is"
It highlights the "is" inside "This" as well as the standalone "is".
Please let me know the correct regular expression to highlight only whole-word matches.
private string HighlightSearchKeyWords(string searchKeyWord, string text)
{
Regex exp = new Regex(@", ?");
searchKeyWord = "(\b" + exp.Replace(searchKeyWord, @"|") + "\b)";
exp = new Regex(searchKeyWord, RegexOptions.Singleline | RegexOptions.IgnoreCase);
return exp.Replace(text, new MatchEvaluator(MatchEval));
}
private string MatchEval(Match match)
{
if (match.Groups[1].Success)
{
return "<span class='search-highlight'>" + match.ToString() + "</span>";
}
return ""; //no match
}

You really just need @ before your "(\b" and "\b)", because in a regular string literal "\b" is a backspace character rather than the regex word boundary you would expect. But I have also made another version that uses a replacement pattern instead of a full-blown MatchEvaluator method.
How about this one:
private string keywordPattern(string searchKeyword)
{
var keywords = searchKeyword.Split(',').Select(k => k.Trim()).Where(k => k != "").Select(k => Regex.Escape(k));
return @"\b(" + string.Join("|", keywords) + @")\b";
}
private string HighlightSearchKeyWords(string searchKeyword, string text)
{
var pattern = keywordPattern(searchKeyword);
Regex exp = new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.Singleline);
return exp.Replace(text, @"<span class=""search-highlight"">$0</span>");
}
Usage:
var res = HighlightSearchKeyWords("is,this", "Is this programming? This is .net Programming.");
Result:
<span class="search-highlight">Is</span> <span class="search-highlight">this</span> programming? <span class="search-highlight">This</span> <span class="search-highlight">is</span> .net Programming.
Updated to use \b and a simplified replace pattern. (The old one used (^|\s) instead of the first \b and ($|\s) instead of the last \b, so it would also work on search terms that include more than just word characters.)
Updated to your comma notation for search terms.
Updated: I forgot Regex.Escape; added now. Otherwise searches for "\w" would blow up the thing :)
Updated due to a comment ;)

Try this fixed line:
searchKeyWord = @"(\b" + exp.Replace(searchKeyWord, @"|") + @"\b)";

You need to enclose the keywords in a non-capturing group, otherwise you will get false positives (if you are using multiple keywords separated by commas, as indicated in the sample)!
private string EscapeKeyWords(string searchKeyWord)
{
string[] keyWords = searchKeyWord.Split(',');
for (int i = 0; i < keyWords.Length; i++) keyWords[i] = Regex.Escape(keyWords[i].Trim());
return String.Join("|", keyWords);
}
private string HighlightSearchKeyWords(string searchKeyWord, string text)
{
searchKeyWord = @"(\b(?:" + EscapeKeyWords(searchKeyWord) + @")\b)";
Regex exp = new Regex(searchKeyWord, RegexOptions.Singleline | RegexOptions.IgnoreCase);
return exp.Replace(text, @"<span class=""search-highlight"">$0</span>");
}
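The root cause in the question, "\b" versus @"\b", can be shown in a few lines. The sketch below is illustrative only: HighlightDemo and its members are names I invented, and it combines the escaping and grouping ideas from the answers above into one compact method.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

static class HighlightDemo
{
    // In a regular literal "\b" is the single backspace character (U+0008);
    // in a verbatim literal @"\b" is the two characters the regex engine needs.
    public static readonly string Backspace = "\b";
    public static readonly string WordBoundary = @"\b";

    // Hypothetical helper: wraps whole-word matches of the comma-separated keywords.
    public static string Highlight(string keywords, string text)
    {
        var escaped = keywords.Split(',')
            .Select(k => k.Trim())
            .Where(k => k.Length > 0)
            .Select(k => Regex.Escape(k));

        // The non-capturing group keeps both \b anchors applied to every alternative.
        string pattern = @"\b(?:" + string.Join("|", escaped) + @")\b";
        return Regex.Replace(text, pattern,
            @"<span class=""search-highlight"">$0</span>", RegexOptions.IgnoreCase);
    }
}
```

With the text from the question and the keyword "is", only the standalone word is wrapped and "This" is left alone. An empty keyword list is not handled here and would need its own check.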
