Removing Sub-string with some pattern from a string - c#

I have a string something like JSON format:
XYZ DIV Parameters: width=\"1280\" height=\"720\", session=\"1\"
Now I want to remove width=\"1280\" height=\"720\" from this string.
Note: There can be any number in place of 1280 and 720. So, I can't just replace it with null.
Please tell me how to solve it? Either by Regex or any other better method possible.

Regex to be replaced with empty string:
(width|height)=\\"\d+\\"
Code:
string input = @"XYZ DIV Parameters: width=\""1280\"" height=\""720\"", session=\""1\""";
string output = Regex.Replace(input, @"(width|height)=\\""\d+\\""", string.Empty);

You could do a find and replace using the following regex:
width=\\"\d+\\" replaced with an empty string, and likewise height=\\"\d+\\" replaced with an empty string.
This removes the entire text of width=\"XYZ\". If you only want to blank out the numbers, you can replace with a string that suits your needs (width=\"\" for example).
If you can guarantee the width and height will ALWAYS be in that format and ALWAYS follow each other separated by a space, you can combine them into one bigger regex find/replace using width=\\"\d+\\" height=\\"\d+\\".
A little more explanation on the regex so you take something away, not just a quick fix :)
width=\\"\d+\\" breaks down to:
width= pretty simple, just find the text you are looking for to start your removal.
\\" since \ is a special char in regex you have to escape it, then the " char can just follow it up like normal.
\d+ digits \d, one or more of them +. Because \d can only match digits, the match always stops at the closing \", so no lazy quantifier is needed here. (Don't write \d*+: that is a possessive quantifier in some other engines, not a non-greedy one, and the .NET engine rejects it as a nested quantifier.)
\\" to end the regex out
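For what it's worth, here is a minimal sketch of the combined width/height pattern in C#. The class and method names (DimensionStripper, StripDimensions) are made up for illustration, and it assumes the input really does contain a literal backslash before each quote, as shown in the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DimensionStripper
{
    // width=\"<digits>\" followed by one space and height=\"<digits>\".
    // In a verbatim string, "" is one quote character and \\ is a regex-escaped backslash.
    private const string Pattern = @"width=\\""\d+\\"" height=\\""\d+\\""";

    // Hypothetical helper: removes the width/height pair, whatever the numbers are.
    public static string StripDimensions(string input)
    {
        return Regex.Replace(input, Pattern, string.Empty);
    }

    public static void Main()
    {
        string input = @"XYZ DIV Parameters: width=\""1280\"" height=\""720\"", session=\""1\""";
        Console.WriteLine(StripDimensions(input));
    }
}
```

The comma that followed the pair is left behind, so the result still reads "Parameters: , session=...". If that matters, the comma and space can be added to the pattern.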

This will do it (it keeps the attribute names and blanks out their values):
string resultString = null;
try {
    // $1 and $2 are the text before and after the width/height pair
    Regex regexObj = new Regex(@"^(.*?)width=\\"".*?\\"" height=\\"".*?\\""(.*?)$", RegexOptions.IgnoreCase);
    resultString = regexObj.Replace(subjectString, @"$1width=\""\"" height=\""\""$2");
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}

Related

Regex to match a word beginning with a period and ending with an underscore?

I'm quite the Regex novice, but I have a series of strings similar to this "[$myVar.myVar_STATE]" I need to replace the 2nd myVar that begins with a period and ends with an underscore. I need it to match it exactly, as sometimes I'll have "[$myVar.myVar_moreMyVar_STATE]" and in that case I wouldn't want to replace anything.
I've tried things like "\b.myVar_\b", "\.\bmyVar_\b" and several more, all to no luck.
How about this:
\[\$myVar\.([^_]+)_STATE\]
Matches:
[$myVar.myVar_STATE] // matches and captures 'myVar'
[$myVar.myVar_moreMyVar_STATE] // no match
Working regex example:
http://regex101.com/r/yM9jQ3
Or if _STATE was variable, you could use this: (as long as the text in the STATE part does not have underscores in it.)
\[\$myVar\.([^_]+)_[^_]+\]
Working regex example:
http://regex101.com/r/kW8oE1
Edit: Conforming to OP's comments below, This should be what he's going for:
(\[\$myVar\.)([^_]+)(_[^_]+\])
Regex replace example:
http://regex101.com/r/pU6yL8
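C#
var pattern = @"(\[\$myVar\.)([^_]+)(_[^_]+\])";
// ${1} rather than $1, so a newVar that starts with a digit is not read as part of the group number
var replaced = Regex.Replace(input, pattern, "${1}" + newVar + "${3}");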
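As a rough sketch of the three-group idea, the same replacement can be written with a MatchEvaluator, which inserts the new name literally so that any $ in it is not treated as a substitution. VarRenamer and Rename are hypothetical names, not part of the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VarRenamer
{
    // Group 1: "[$myVar."   Group 2: the name to replace   Group 3: "_STATE]"
    private const string Pattern = @"(\[\$myVar\.)([^_]+)(_[^_]+\])";

    // Hypothetical helper: swaps group 2 for newVar and keeps groups 1 and 3 as matched.
    public static string Rename(string input, string newVar)
    {
        return Regex.Replace(input, Pattern,
            m => m.Groups[1].Value + newVar + m.Groups[3].Value);
    }

    public static void Main()
    {
        Console.WriteLine(Rename("[$myVar.myVar_STATE]", "other"));
        // A second underscore-separated part prevents the match, so this is printed unchanged
        Console.WriteLine(Rename("[$myVar.myVar_moreMyVar_STATE]", "other"));
    }
}
```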
What about something like:
.*\.(myVar_).*
This looks for anything, then a literal . and "myVar_", followed by anything. (The dot has to be escaped; an unescaped . matches any character.)
It matches:
"[$myVar.myVar_STATE]"
And only the first myVar_ here:
"[$myVar.myVar_moremyVar_STATE]"
This should do it:
\[\$myVar\.(.*?)_STATE\]
You can use this little trick to pick out the groups, and build the replacement at the end, like so:
var replacement = "something";
var input = @"[$myVar.myVar_STATE]";
var pattern = @"(\[\$myVar\.)(.*?)_(.*?)]";
// Group 1 is the prefix, group 2 the old name, group 3 the state
var replaced = Regex.Replace(input, pattern, "${1}" + replacement + "_$3]");
C# already has built-in methods to do this:
string text = ".asda_";
Response.Write((text.StartsWith(".") && text.EndsWith("_")));
Is Regex really required?
string input = "[$myVar.myVar_STATE]";
string oldVar = "myVar";
string newVar = "myNewVar";
string result = input.Replace("." + oldVar + "_STATE]", "." + newVar + "_STATE]");
In case "STATE" is a variable part, then we'll need to use Regex. The easiest way is to use this Regex pattern, which matches a position between a prefix and a suffix. Prefix and suffix are used for searching but are not included in the resulting match:
(?<=prefix)find(?=suffix)
result =
    Regex.Replace(input, @"(?<=\.)" + Regex.Escape(oldVar) + "(?=_[A-Z]+])", newVar);
Explanation:
The prefix part is \., which stands for ".".
The find part is the escaped old variable to be replaced. Regex escaping makes sure that characters with a special meaning in Regex are escaped.
The suffix part is _[A-Z]+], an underscore followed by at least one uppercase letter followed by "]". Note: the second ] does not need to be escaped. An opening bracket [ would have to be escaped like this: \[. We cannot use \w for word characters for the STATE part, as \w includes underscores. You might have to adapt the [A-Z] part to exactly match all possible states (e.g. if a state can contain digits, use [A-Z0-9]).
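Here is a small sketch of the prefix/suffix lookaround approach wrapped in a method. StateRenamer and Rename are illustrative names only. One assumption added on top of the answer: $ in newVar is doubled, because $ is special in a .NET replacement string.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StateRenamer
{
    // Hypothetical helper: replaces oldVar only where it sits between "." and "_<STATE>]".
    public static string Rename(string input, string oldVar, string newVar)
    {
        // The lookbehind and lookahead are checked but are not part of the match
        string pattern = @"(?<=\.)" + Regex.Escape(oldVar) + @"(?=_[A-Z]+\])";

        // "$$" in a replacement string stands for one literal "$"
        string literalReplacement = newVar.Replace("$", "$$");

        return Regex.Replace(input, pattern, literalReplacement);
    }

    public static void Main()
    {
        Console.WriteLine(Rename("[$myVar.myVar_STATE]", "myVar", "myNewVar"));
        // Not replaced: what follows the underscore is not all uppercase up to "]"
        Console.WriteLine(Rename("[$myVar.myVar_moreMyVar_STATE]", "myVar", "myNewVar"));
    }
}
```

Only the name itself is matched, so the replacement string does not need to repeat the dot or the state.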

Match <keyword> with whitespace at end/start of line

I can't figure out how to get a C# regex IsMatch to match a <keyword> followed by an end of line or whitespace.
I currently have [\s]+keyword[\s]+ which works for spaces, but does not work for keyword<end of string> or <start of string>keyword.
I have tried [\s^]+keyword[\s$]+, but this makes it fail to match with the spaces, and doesn't work at the end or start of a string.
Here's the code I tried:
string pattern = string.Format("[\\s^]+{0}[\\s$]+",keyword);
if(Regex.IsMatch(Text, pattern, RegexOptions.IgnoreCase))
The problem is that ^ and $ inside character classes are not treated as anchors but as literal characters. You could simply use alternation instead of a character class:
string pattern = string.Format(@"(?:\s|^){0}(?:\s|$)",keyword);
Note that there is no need for the +, because you just want to make sure there is one space. You don't care if there are more of them. The ?: is just good practice and suppresses capturing, which you don't need here. And the @ makes the string a verbatim string, where you don't have to double-escape your backslashes.
There is another way, which I find slightly neater. You can use lookarounds to ensure that there is not a non-space character to the left and right of your keyword (yes, double negation, think about it). This condition holds if there is a whitespace character or an end of the string on that side:
string pattern = string.Format(@"(?<!\S){0}(?!\S)",keyword);
This does exactly the same, but might be slightly more efficient (you'd have to profile that to be certain, though - if it even matters).
You can also use the first pattern (with non-inverted logic) with (positive) lookarounds:
string pattern = string.Format(@"(?<=\s|^){0}(?=\s|$)",keyword);
However, this doesn't really make a difference compared to the first pattern, unless you want to find multiple matches in a string.
By the way, if your keyword might contain regex meta-characters (like |, $, + and so on), make sure to escape it first using Regex.Escape.
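Putting the negative lookarounds and Regex.Escape together, a minimal sketch could look like this. KeywordMatcher and ContainsKeyword are made-up names; the IgnoreCase option is carried over from the question's code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class KeywordMatcher
{
    // Hypothetical helper: true when keyword is delimited by whitespace or the string ends.
    public static bool ContainsKeyword(string text, string keyword)
    {
        // (?<!\S) = no non-whitespace character directly before,
        // (?!\S)  = no non-whitespace character directly after.
        string pattern = @"(?<!\S)" + Regex.Escape(keyword) + @"(?!\S)";
        return Regex.IsMatch(text, pattern, RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        Console.WriteLine(ContainsKeyword("keyword at the start", "keyword"));
        Console.WriteLine(ContainsKeyword("ends with keyword", "keyword"));
        Console.WriteLine(ContainsKeyword("nokeywordhere", "keyword"));
        Console.WriteLine(ContainsKeyword("learning c++ today", "c++"));
    }
}
```

Concatenating the pieces instead of using string.Format also sidesteps any trouble with braces in the pattern.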
I am not exactly sure what you are really trying to accomplish with this regex, but the following code will match the string 'keyword' when it stands as a whole word, i.e. with whitespace, punctuation, or the start or end of the string on either side of it:
string resultString = null;
try {
    Regex regexObj = new Regex(@"\b(keyword)\b");
    resultString = regexObj.Match(subjectString).Value;
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}
It can be generally explained as: \b asserts the position at a word boundary, here at the beginning and end of the word. In this case I assumed the word of interest was keyword.
I also thought from my interpretation of your question that you might be interested in matching the entire series of characters that follow the keyword up to the line break. If that is the case, the following regex code will return that match:
string resultString = null;
try {
    Regex regexObj = new Regex(@"\bkeyword\b(.*)$");
    resultString = regexObj.Match(subjectString).Value;
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}
This regular expression can be interpreted as follows: the \b on either side asserts the beginning and ending word boundaries of keyword. The (.*)$ part matches every character that is not a line break, as many as there are, up to the end of the line $. (A group such as (\w*\s*) would not work here: directly after the trailing \b the next character cannot be a word character, so \w* would never consume anything.)
This next bit of code reads in the entire line of data that contains the keyword; lines of data that do not contain the keyword will not match.
string resultString = null;
try {
    Regex regexObj = new Regex("^.*keyword.*$");
    resultString = regexObj.Match(subjectString).Value;
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}
Explained: the ^ asserts the position at the beginning of the string, the .* matches any characters that are not line breaks, then comes the keyword, followed by another .* so the remaining non-line-break characters are included, and the $ asserts the position at the end of the string, which would be the entire line in this example.
I hope the above is helpful if not this time maybe in the future. I am always trying to discover alternative practices to achieve the same results, so if you have any constructive criticism please post it.
Best wishes,
Steve
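On the whole-line pattern in the answer above: if the subject string holds several lines, RegexOptions.Multiline makes ^ match at the start of every line rather than only at the start of the string. The sketch below is a variant under that assumption. LineFinder and FirstLineContaining are hypothetical names, and it uses [^\r\n]* in place of .* so that a trailing \r from Windows line endings is not included.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineFinder
{
    // Hypothetical helper: returns the first line containing keyword, or "" if there is none.
    public static string FirstLineContaining(string text, string keyword)
    {
        // With Multiline, ^ also matches right after each '\n'
        string pattern = @"^[^\r\n]*" + Regex.Escape(keyword) + @"[^\r\n]*";

        // Match.Value is an empty string when nothing matched
        return Regex.Match(text, pattern, RegexOptions.Multiline).Value;
    }

    public static void Main()
    {
        string text = "first line\r\nhas keyword here\r\nlast";
        Console.WriteLine(FirstLineContaining(text, "keyword"));
    }
}
```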
Try this:
string pattern = string.Format("^\\s*{0}\\s*$",keyword);
I found this other post:
How to specify "Space or end of string" and "space or start of string"?
and that answered the question, so my code is now:
string pattern = string.Format("\\b+{0}\\b+",keyword);
if(Regex.IsMatch(UserText, pattern, RegexOptions.IgnoreCase))

Regex to find percent sign with actual math value

I have a string like 30+20%. Now I want to replace 20% with (20/100). That's it.
If the percent doesn't occur in any other situation in the string, you don't even need a regular expression:
s = s.Replace("%", "/100");
To add the parentheses you need the regular expression though:
s = Regex.Replace(s, @"(\d{1,3})%", "($1/100)");
string s="30+20%";
s=s.Replace("%","/100)");
s=s.Replace("+","+(");
I'll just assume you run Perl
input="30+20%"
echo $input | perl -pe 's#(\d+)%#\($1/100\)#g'
EDIT: just read the tags, anyways, the regex should work in C#
That should be an easy regex to try. 1 to 3 digits followed by a percentage sign.
You need to capture the 1-3 digits group for backreference, and use it to create
(DIGITS/100) string.
You can play here: http://gskinner.com/RegExr/ to learn regexes.
I'm not sure what programming language you are using, but this is how you would do this in Python:
import re
re.sub(r'(\d*)%', r'\1/100', '30+20%')
The returned string will be '30+20/100'.
Explanation:
Let's look at the regex. r'\d*%' is a regex that matches a series of digits followed by the % sign. I put parentheses around (\d*) to tell the regex compiler that the series of digits (aka the number) is the first group. The second argument tells the sub function how to replace the matched string. The argument '\1/100' tells the sub function I want it to replace the matched string with the value of the first group matched by the regex (through the \1 part) followed by /100.
You can check the Python re module for more information.
Try this:
string resultString = null;
try {
    resultString = Regex.Replace(subjectString, @"\b(\d+)%", "($1/100)");
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}
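To tie the capture-group answers together, here is a small sketch that wraps the replacement in a method. PercentRewriter and Rewrite are names invented for this example. Accepting decimals such as 12.5% is an extra assumption that the question does not mention.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PercentRewriter
{
    // Hypothetical helper: turns every "<number>%" into "(<number>/100)".
    public static string Rewrite(string expression)
    {
        // Group 1 captures the digits, optionally with a decimal part (an assumption)
        return Regex.Replace(expression, @"(\d+(?:\.\d+)?)%", "($1/100)");
    }

    public static void Main()
    {
        Console.WriteLine(Rewrite("30+20%"));
        Console.WriteLine(Rewrite("12.5%*2"));
    }
}
```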

Looking for a quote matching Reg Ex

I'm after a regex for C# which will turn this:
"*one*" *two** two and a bit "three four"
into this:
"*one*" "*two**" two and a bit "three four"
IE a quoted string should be unchanged whether it contains one or many words.
Any words with asterisks to be wrapped in double quotes.
Any unquoted words with no asterisks to be unchanged.
Nice to haves:
If multiple asterisks could be merged into one in the same step that would be better.
Noise words - eg and, a, the - which are not part of a quoted string should be dumped.
Thanks for any help / advice.
Julio
The following regex will do what you're looking for:
\*+ # Match 1 or more *
(
\w+ # Capture character string
)
\*+ # Match 1 or more *
If you use this in conjunction with this replace statement, all the words matched by (\w+) will be wrapped as "*word*":
string s = "\"one\" *two** two and a bit \"three four\"";
Regex r = new Regex(@"\*+(\w+)\*+");
var output = r.Replace(s, @"""*$1*""");
Note: This will leave the below string unquoted:
*two two*
If you wish to match those strings as well, use this regex:
\*+([^*]+)\*+
EDIT: updated code.
This solution works for your request, as well as the nice to have items:
string text = @"test the ""one"" and a *two** two and a the bit ""three four"" a";
string result = Regex.Replace(text, @"\*+(.*?)\*+", @"""*$1*""");
string noiseWordsPattern = @"(?<!"") # match if double quote prefix is absent
\b # word boundary to prevent partial word matches
(and|a|the) # noise words
\b # word boundary
(?!"") # match if double quote suffix is absent
";
// to use the commented pattern use RegexOptions.IgnorePatternWhitespace
result = Regex.Replace(result, noiseWordsPattern, "", RegexOptions.IgnorePatternWhitespace);
// or use this one line version instead
// result = Regex.Replace(result, @"(?<!"")\b(and|a|the)\b(?!"")", "");
// remove extra spaces resulting from noise words replacement
result = Regex.Replace(result, @"\s+", " ");
Console.WriteLine("Original: {0}", text);
Console.WriteLine("Result: {0}", result);
Output:
Original: test the "one" and a *two** two and a the bit "three four" a
Result: test "one" "*two*" two bit "three four"
The 2nd regex replacement, for noise words, can leave duplicate blank spaces behind. To remedy this side effect I added the 3rd regex replacement to clean them up.
Something like this. ArgumentReplacer is a callback that is called for each match. The return value is substituted into the returned string.
void Main() {
    string text = "\"one\" *two** and a bit \"three *** four\"";
    string finderRegex = @"
(""[^""]*"") # quoted
| ([^\s""*]*\*[^\s""]*) # with asterisks
| ([^\s""]+) # without asterisks
";
    string result = Regex.Replace(text, finderRegex, ArgumentReplacer,
        RegexOptions.IgnorePatternWhitespace);
    Console.WriteLine(result);
}
public static String ArgumentReplacer(Match theMatch) {
    // Don't touch quoted arguments, and arguments with no asterisks
    if (theMatch.Groups[2].Value.Length == 0)
        return theMatch.Value;
    // Quote arguments with asterisks, and replace runs of asterisks
    // by a single one.
    return String.Format("\"{0}\"",
        Regex.Replace(theMatch.Value, @"\*\*+", "*"));
}
Alternatives to the left in the pattern have priority over those to the right. This is why I just needed to write "[^\s""]+" in the last alternative.
The quotes, on the other hand, are only matched as a quoted argument if they occur at the beginning of the argument. They will not be detected if they occur in the middle of an argument, so an unquoted argument must stop before them.
Given that you wish to match pairs of quotes, I don’t think your language is regular, therefore I don’t think RegEx is a good solution. As the saying goes:
Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems.
See "When not to use Regex in C# (or Java, C++ etc)"
I've decided to follow the advice of a couple of responses and go with a parser solution. I've tried the regexes contributed so far and they seem to fail in some cases. That's probably an indication that regexes aren't the appropriate solution to this problem. Thanks for all responses.
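The parser the asker ended up with is not shown, so the following is only a guess at what a hand-written scanner for this task might look like. QueryNormalizer and Normalize are hypothetical names, and the noise-word list contains just the three examples from the question. It walks the input once, copies quoted phrases untouched, quotes bare words that contain an asterisk (collapsing runs of asterisks), and drops unquoted noise words.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QueryNormalizer
{
    // Assumed noise words: only the examples given in the question
    private static readonly HashSet<string> NoiseWords =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "and", "a", "the" };

    public static string Normalize(string input)
    {
        var tokens = new List<string>();
        int i = 0;
        while (i < input.Length)
        {
            if (char.IsWhiteSpace(input[i])) { i++; continue; }

            int start = i;
            if (input[i] == '"')
            {
                // Quoted phrase: copy through the closing quote (or to the end if unterminated)
                int close = input.IndexOf('"', i + 1);
                i = close < 0 ? input.Length : close + 1;
                tokens.Add(input.Substring(start, i - start));
                continue;
            }

            // Bare word: runs until whitespace or an opening quote
            while (i < input.Length && !char.IsWhiteSpace(input[i]) && input[i] != '"') i++;
            string word = input.Substring(start, i - start);

            if (word.Contains("*"))
                tokens.Add("\"" + Regex.Replace(word, @"\*{2,}", "*") + "\"");
            else if (!NoiseWords.Contains(word))
                tokens.Add(word);
        }
        return string.Join(" ", tokens);
    }

    public static void Main()
    {
        Console.WriteLine(Normalize("\"*one*\" *two** two and a bit \"three four\""));
    }
}
```

Since quoted phrases are consumed as a unit before anything else is looked at, asterisks and noise words inside quotes are never touched. That is the case the pure regex attempts had trouble with.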

C# regex replace unexpected behavior

Given $displayHeight = "800";, replace whatever number is at 800 with int value y_res.
resultString = Regex.Replace(
    im_cfg_contents,
    @"\$displayHeight[\s]*=[\s]*""(.*)"";",
    Convert.ToString(y_res));
In Python I'd use re.sub and it would work. In .NET it replaces the whole line, not the matched group.
What is a quick fix?
Building on a couple of the answers already posted: a zero-width assertion lets you require some text around the match without including those characters in the match itself. Here the first part of the string goes into a zero-width lookbehind assertion, which separates it from the digits that you want to replace: the regular expression proceeds as normal but leaves those characters out of the match. Similarly, the last part of the string goes into a zero-width lookahead assertion. Grouping Constructs on MSDN shows the groups as well as the assertions.
resultString = Regex.Replace(
    im_cfg_contents,
    @"(?<=\$displayHeight[\s]*=[\s]*"")(.*)(?="";)",
    Convert.ToString(y_res));
Another approach would be to use the following code. The modification to the regular expression is just placing the first part in a group and the last part in a group. Then in the replace string, we add back in the first and third groups. Not quite as nice as the first approach, but not quite as bad as writing out the $displayHeight part. Substitutions on MSDN shows how the $ characters work.
resultString = Regex.Replace(
    im_cfg_contents,
    @"(\$displayHeight[\s]*=[\s]*"")(.*)("";)",
    "${1}" + Convert.ToString(y_res) + "${3}");
Try this:
resultString = Regex.Replace(
    im_cfg_contents,
    @"\$displayHeight[\s]*=[\s]*""(.*)"";",
    // In a replacement string, $$ is a literal $; a backslash would be copied as-is
    @"$$displayHeight = """ + Convert.ToString(y_res) + @""";");
It replaces the whole string because you've matched the whole string. Nothing about this statement tells C# to replace just the matched group; it will find and store that group, sure, but the overall match is still the whole string.
You can either change your replacement to (in a .NET replacement string, $$ is a literal $ and a backslash is not an escape character):
@"$$displayHeight = """ + Convert.ToString(y_res) + @""";"
..or you can change your pattern to just match the digits, i.e.:
@"[0-9]+"
..or, since the .NET regex engine supports lookarounds, you could change your match so that only the number is part of it.
You could also try this, though I think it is a little slower than my other method:
resultString = Regex.Replace(
im_cfg_contents,
"(?<=\\$displayHeight[\\s]*=[\\s]*\").*(?=\";)",
Convert.ToString(y_res));
Check this pattern out
(?<=(\$displayHeight\s*=\s*"))\d+(?=";)
A word about "lookaround".
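One more way to replace only the number, sketched here with a MatchEvaluator so that the prefix and suffix are put back exactly as they were matched. ConfigPatcher and SetDisplayHeight are names invented for this example. Using [^"]* instead of .* is my own choice, so the match cannot run past the first closing quote.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ConfigPatcher
{
    // Hypothetical helper: replaces the quoted value of $displayHeight and nothing else.
    public static string SetDisplayHeight(string contents, int yRes)
    {
        // Group 1 = everything up to and including the opening quote,
        // group 2 = the closing quote and the semicolon.
        return Regex.Replace(
            contents,
            @"(\$displayHeight\s*=\s*"")[^""]*("";)",
            m => m.Groups[1].Value + yRes + m.Groups[2].Value);
    }

    public static void Main()
    {
        Console.WriteLine(SetDisplayHeight("$displayHeight = \"800\";", 600));
    }
}
```

Because the new value is concatenated in code rather than spliced into a replacement pattern, there is no risk of the digits running into a group reference such as $1.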
