Regular Expression without braces - c#

I have the following sample cases:
1) "Sample"
2) "[10,25]"
I want a single regular expression pattern that, applied to the examples above, returns "Sample" and "10,25".
Note: the input strings do not include the quotes.
I came up with the expression (?<=\[)(.*?)(?=\]). It satisfies the second case and retrieves only "10,25", but it returns an empty result for the first case, where I want "Sample" to be returned. Can anyone help?
I am using C#.

Here you go: a small regex using a positive lookbehind. These are sometimes very handy.
Regex
(?<=^|\[)([\w,]+)
Test string
Sample
[10,25]
Result
MATCH 1
[0-6] Sample
MATCH 2
[8-13] 10,25
try at regex101.com
If " is included in your original string, use the regex below, which looks for the " mark as well. You may remove ^| from the lookbehind if the " mark is always present, or leave it in if your text mixes quoted and unquoted input.
Regex
(?<=^|\[|\")([\w,]+)
try at regex101.com
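Since the question asks for C#, here is a minimal sketch of how the lookbehind pattern above might be used from .NET. The class and method names (BracketFreeExtractor, Extract) are made up for illustration, and the capturing group is dropped because the whole match is already the wanted text.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class BracketFreeExtractor
{
    // .NET allows alternation inside a lookbehind, so a match may begin
    // either at the start of the input or immediately after a '['.
    private static readonly Regex Pattern = new Regex(@"(?<=^|\[)[\w,]+");

    // Returns the first run of word characters and commas, or an empty
    // string when nothing matches.
    public static string Extract(string input)
    {
        Match m = Pattern.Match(input);
        return m.Success ? m.Value : string.Empty;
    }
}
```

Calling BracketFreeExtractor.Extract("Sample") should give Sample, and BracketFreeExtractor.Extract("[10,25]") should give 10,25.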

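As far as I can tell, the regex below should help:
Regex regex = new Regex(@"^\w+|[[](\w)+\,(\w)+[]]$");
This matches either a single word at the start of the input, or two alphanumeric words separated by a comma and enclosed in square brackets. Note that in the second case the match includes the brackets themselves.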

One Java example:
// String input = "Sample";
String input = "[10,25]";
String text = "[^,\\[\\]]+";
Pattern pMod = Pattern.compile("(" + text + ")|(?>\\[(" + text + "," + text + ")\\])");
Matcher mMod = pMod.matcher(input);
while (mMod.find()) {
    if (mMod.group(1) != null) {
        System.out.println(mMod.group(1));
    }
    if (mMod.group(2) != null) {
        System.out.println(mMod.group(2));
    }
}
If the input is "[hello&bye,25|35]", then the output is hello&bye,25|35

Related

Replacing a portion of a string with an exact matching

I just want to replace a portion of a string only if it matches the given text.
My use case is as follows:
var text = "<wd:response><wd:response-data></wd:response-data></wd:response >";
string result = text.Replace("wd:response", "response");
/*
* expecting the below text
<response><wd:response-data></wd:response-data></response>
*
*/
I followed the following answers:
Way to have String.Replace only hit "whole words"
Regular expression for exact match of a string
But I failed to achieve what I want.
Please share your thoughts/solutions.
Sample on
https://dotnetfiddle.net/pMkO8Q
In general, you should really be parsing and manipulating XML as XML, using functions that know how XML works and what's legal in the language. Regex and other naive text manipulation will often lead you into trouble.
That said, for a very simple solution to this specific problem, you can do this with two replaces:
var text = "<wd:response><wd:response-data></wd:response-data></wd:response >";
string result = text.Replace("wd:response>", "response>").Replace("wd:response ", "response ");
(Note the spaces at the end of the parameters to the second replace.)
Alternatively use a regex similar to "wd:response\s*>"
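The regex alternative mentioned above could look roughly like the sketch below. The names TagRenamer and StripPrefix are invented for the example, and the pattern assumes the tag is written exactly as wd:response with no attributes.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper illustrating the "wd:response\s*>" idea.
public static class TagRenamer
{
    // Group 1 keeps "<" or "</", group 2 keeps the optional whitespace and ">".
    // Requiring "\s*>" right after the name leaves "wd:response-data" alone.
    public static string StripPrefix(string xml)
    {
        return Regex.Replace(xml, @"(</?)wd:response(\s*>)", "${1}response${2}");
    }
}
```

For the text in the question, TagRenamer.StripPrefix(text) should return <response><wd:response-data></wd:response-data></response >, keeping the stray space in the closing tag.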
The easiest way to achieve your result, as per your .NET fiddle, is to use Replace as below.
string result = text.Replace("wd:response>", "response>");
But the proper way to achieve this is to parse the text as XML.
You can capture the string wd:response in a capturing group and replace it using Regex.Replace with a MatchEvaluator, like this.
Regex explanation - <[/]?(wd:response)[\s+]?>
Match < literally
Match / optionally, hence the ?
Match the string wd:response and place it in a capturing group enclosed with ()
Optionally match one whitespace character with [\s+]? (inside a character class + is a literal plus sign, not a quantifier, so \s* would express "optional whitespace" more directly)
Match > literally
using System;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main(string[] args)
    {
        string text = "<wd:response><wd:response-data></wd:response-data></wd:response >";
        string replacePattern = "response";
        string pattern = @"<[/]?(wd:response)[\s+]?>";
        string replacedPattern = Regex.Replace(text, pattern, match =>
        {
            // Extract the first group
            Group group = match.Groups[1];
            // Keep the text before and after the group, and swap the group value for replacePattern
            return string.Format("{0}{1}{2}", match.Value.Substring(0, group.Index - match.Index), replacePattern, match.Value.Substring(group.Index - match.Index + group.Length));
        });
        Console.WriteLine(replacedPattern);
    }
}
Outputting:
<response><wd:response-data></wd:response-data></response >

Using Regex to replace part of the entire string/expression

Regexes are simple yet complex at times. I am stuck trying to replace within an expression that contains variables, assuming a variable has the following pattern:
\w+(\.\w+)*
I want to replace the dot (.) in every occurrence of my variables, because I eventually have to tokenize the expression and the tokenizer does not recognize variables containing dots. So I thought I would replace the dots with underscores before parsing. After tokenizing, however, I want to get the variable tokens back with their original values.
Expression:
(x1.y2.z3 + 9.99) + y2_z1 - x1.y2.z3
Three Variables:
x1.y2.z3
y2_z1
x1.y2.z3
Desired Output:
(x1_y2_z3 + 9.99) + y2_z1 - x1_y2_z3
Question 1: How to use Regex replace in this case?
Question 2: Is there a better way to address the problem above? Variables can already contain underscores, so replacing dots with underscores does not let me recover the original variable names from the tokens.
This regex pattern seems to work: [a-zA-Z]+\d+\S+
To replace a dot found only in a match you use MatchEvaluator:
private static char charToReplaceWith = '_';
static void Main(string[] args)
{
    string s = "(x1.y2.z3 + 9.99) + y2_z1 - x1.y2.z3";
    Console.WriteLine(Regex.Replace(s, @"[a-zA-Z]+\d+\S+", new MatchEvaluator(ReplaceDotWithCharInMatch)));
    Console.Read();
}
private static string ReplaceDotWithCharInMatch(Match m)
{
    return m.Value.Replace('.', charToReplaceWith);
}
Which gives this output:
(x1_y2_z3 + 9.99) + y2_z1 - x1_y2_z3
I don't fully understand your second question, or how to handle variables that already contain underscores, but you should be able to choose the replacement character. For example, if s.Contains('_') is true, pick a different character; you would probably also have to maintain a dictionary that records something like "I replaced all dots with underscores, and all underscores with ^", and so on.
Try this:
string input = "(x1.y2.z3 + 9.99) + y2_z1 - x1.y2.z3";
string output = Regex.Replace(input, "\\.(?=[a-z])", "_");
This replaces only periods that are followed by a letter (a-z), so the dot in 9.99 is left alone.
Use a regex negative lookahead, which is a group that starts with (?!
Matching a dot followed by something non-numeric is as simple as this:
// matches any dot NOT followed by a character in the range 0-9
string output = Regex.Replace(input, "\\.(?![0-9])", "_");
The advantage is that although [0-9] is part of the expression, it is only checked against the text that follows the dot and is not actually part of the match.
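For Question 2, one possible approach (a different technique from the answers above, not something they propose) is to replace every variable with a generated placeholder and keep a dictionary for mapping tokens back. The sketch below assumes variables start with a letter or underscore, and the names VariableMasker, Mask and the v0, v1, ... placeholders are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: swaps variables for dot-free placeholders.
public static class VariableMasker
{
    // Assumed variable shape: a letter or underscore, more word characters,
    // then optional ".segment" parts. Numeric literals such as 9.99 do not match.
    private static readonly Regex Variable = new Regex(@"\b[A-Za-z_]\w*(?:\.\w+)*");

    // Returns the masked expression and fills placeholderToOriginal so that
    // tokens can be translated back after tokenizing.
    public static string Mask(string expression, Dictionary<string, string> placeholderToOriginal)
    {
        var originalToPlaceholder = new Dictionary<string, string>();
        return Variable.Replace(expression, m =>
        {
            string placeholder;
            if (!originalToPlaceholder.TryGetValue(m.Value, out placeholder))
            {
                // First time this variable is seen: assign the next placeholder.
                placeholder = "v" + originalToPlaceholder.Count;
                originalToPlaceholder[m.Value] = placeholder;
                placeholderToOriginal[placeholder] = m.Value;
            }
            return placeholder;
        });
    }
}
```

With the expression from the question this should produce (v0 + 9.99) + v1 - v0, with v0 mapped to x1.y2.z3 and v1 mapped to y2_z1. Because every variable is replaced, a placeholder cannot collide with an existing name such as y2_z1.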

Remove special characters from string with unicode

I found the most popular answer to this question is:
Regex.Replace(value, "[^a-zA-Z0-9]+", " ", RegexOptions.Compiled);
However, if users type a non-English name when billing, this method treats the non-English characters as special characters and removes them.
Is there a way to make this work for most users, since my website is multi-language?
Make it Unicode aware:
var res = Regex.Replace(value, @"[^\p{L}\p{M}\p{N}]+", " ");
If you plan to keep only regular digits, keep [0-9].
The regex matches one or more symbols other than Unicode letters (\p{L}), diacritics (\p{M}) and digits (\p{N}).
You might consider var res = Regex.Replace(value, @"\W+", " "), but it will keep _ since the underscore is a "word" character.
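As a rough illustration of the Unicode-aware pattern above, the sketch below wraps it in a helper. The names NameCleaner and Clean are invented here, and the final Trim() is an extra assumption (the answer itself does not trim).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper built around the Unicode category pattern.
public static class NameCleaner
{
    // Replaces each run of characters that are not letters (\p{L}),
    // combining marks (\p{M}) or numbers (\p{N}) with a single space,
    // then trims leading and trailing spaces (an added assumption).
    public static string Clean(string value)
    {
        return Regex.Replace(value, @"[^\p{L}\p{M}\p{N}]+", " ").Trim();
    }
}
```

For example, NameCleaner.Clean("John_Smith #42") should return John Smith 42, and accented or non-Latin letters such as those in "Ren\u00E9e" or "\u5317\u4EAC" should pass through unchanged.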
I found that the best way to achieve this and make it work with all languages is to create a string with all banned characters. Look at this code:
string input = @"heya's #FFFFF , CUL8R M8 how are you?'"; // This is the input string
string regex = @"[!""#$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~]"; // Banned characters; add every character you don't want displayed here.
Match m;
// Regex.Match never returns null, so loop on Success instead
while ((m = Regex.Match(input, regex)).Success)
{
    input = input.Remove(m.Index, m.Length);
}
input = Regex.Replace(input, " {2,}", " "); // collapse runs of two or more spaces into one
MessageBox.Show(input);
Hope it works!
PS: As others posted here, there are other ways. I personally prefer this one even though it's much more code. Choose the one that best fits your needs.

Regex pattern for text between 2 strings

I am trying to extract all of the text (shown as xxxx) in the following pattern:
Session["xxxx"]
using C#.
This may also be Request.Querystring["xxxx"], so I am trying to build the expression dynamically. When I do so, I get all sorts of problems about unescaped characters or no matches :(
an example might be:
string patternstart = "Session[";
string patternend = "]";
string regexexpr = @"\\" + patternstart + @"(.*?)\\" + patternend ;
string sText = "Text to be searched containing Session[\"xxxx\"] the result would be xxxx";
MatchCollection matches = Regex.Matches(sText, @regexexpr);
Can anyone help with this as I am stumped (as I always seem to be with RegEx :) )
With some little modifications to your code.
string patternstart = Regex.Escape("Session[");
string patternend = Regex.Escape("]");
string regexexpr = patternstart + @"(.*?)" + patternend;
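Put together, the Regex.Escape approach above might be wrapped as in the following sketch, so that the same code serves Session[" and Request.Querystring[". The class and method names (IndexerExtractor, Extract) and the choice to pass the quotes as part of the prefix and suffix are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: extracts the text between a literal prefix and suffix.
public static class IndexerExtractor
{
    public static List<string> Extract(string text, string prefix, string suffix)
    {
        // Regex.Escape turns the literal delimiters into safe pattern text,
        // so "[" and "." lose their special meaning.
        string pattern = Regex.Escape(prefix) + "(.*?)" + Regex.Escape(suffix);
        var results = new List<string>();
        foreach (Match m in Regex.Matches(text, pattern))
        {
            results.Add(m.Groups[1].Value); // the lazily captured text between the delimiters
        }
        return results;
    }
}
```

With the sText from the question, IndexerExtractor.Extract(sText, "Session[\"", "\"]") should return a list containing only xxxx.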
The pattern you construct in your example looks something like this:
\\Session[(.*?)\\]
There are a couple of problems with this. First, it assumes the string starts with a literal backslash. Second, it wraps the entire (.*?) in a character class, which means it will match any single open parenthesis, period, asterisk, question mark, close parenthesis or backslash. You'd need to escape the brackets in your pattern if you want to match a literal [.
You could use a pattern like this:
Session\[(.*?)]
For example:
string regexexpr = @"Session\[(.*?)]";
string sText = "Text to be searched containing Session[\"xxxx\"] the result would be xxxx";
MatchCollection matches = Regex.Matches(sText, @regexexpr);
Console.WriteLine(matches[0].Groups[1].Value); // "xxxx"
The characters [ and ] have a special meaning in regular expressions: they define a character class, where one of the contained characters must match. To work around this, simply escape them with a leading \ character (using verbatim strings so that C# keeps the backslash):
string patternstart = @"Session\[";
string patternend = @"\]";
An example "final string" could then be:
Session\["(.*)"\]
However, you could easily write your regex to handle Session, Querystring, etc. automatically if you require (without also matching every other array you throw at it), and avoid having to build up the string in the first place:
(Querystring|Session|Form)\["(.*)"\]
and then take the second capture group.

Regular Expression to get all characters before "-"

How can I get the string before the character "-" using regular expressions?
For example, I have "text-1" and I want to return "text".
So I see many possibilities to achieve this.
string text = "Foobar-test";
Regex Match everything till the first "-"
Match result = Regex.Match(text, @"^.*?(?=-)");
^ match from the start of the string
.*? match any character (.), zero or more times (*) but as few as possible (?)
(?=-) till the next character is a "-" (this is a positive look ahead)
Regex Match anything that is not a "-" from the start of the string
Match result2 = Regex.Match(text, @"^[^-]*");
[^-]* matches any character that is not a "-" zero or more times
Regex Match anything that is not a "-" from the start of the string till a "-"
Match result21 = Regex.Match(text, @"^([^-]*)-");
Will only match if there is a dash in the string, but the result is then found in capture group 1.
Split on "-"
string[] result3 = text.Split('-');
Result is an Array the part before the first "-" is the first item in the Array
Substring till the first "-"
string result4 = text.Substring(0, text.IndexOf("-"));
Get the substring from text from the start till the first occurrence of "-" (text.IndexOf("-"))
You get then all the results (all the same) with this
Console.WriteLine(result);
Console.WriteLine(result2);
Console.WriteLine(result21.Groups[1]);
Console.WriteLine(result3[0]);
Console.WriteLine(result4);
I would prefer the first method.
You also need to think about the behavior when there is no dash in the string. The fourth method will throw an ArgumentOutOfRangeException in that case, because text.IndexOf("-") will be -1. Methods 1 and 2.1 will return no match, and methods 2 and 3 will return the complete string.
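To make the no-dash case explicit, the Substring approach can be guarded as in this small sketch. The names DashPrefix and Before are made up, and returning the whole string when there is no dash is an assumed policy (it mirrors what methods 2 and 3 do).

```csharp
using System;

// Hypothetical helper: text before the first '-', without throwing.
public static class DashPrefix
{
    public static string Before(string text)
    {
        int index = text.IndexOf('-');
        // IndexOf returns -1 when there is no dash; fall back to the full string
        // instead of passing a negative length to Substring.
        return index < 0 ? text : text.Substring(0, index);
    }
}
```

DashPrefix.Before("text-1") should return text, and DashPrefix.Before("quux") should return quux rather than throwing.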
Here is my suggestion - it's quite simple as that:
[^-]*
This is something like the regular expression you need:
([^-]*)-
Quick tests in JavaScript:
/([^-]*)-/.exec('text-1')[1] // 'text'
/([^-]*)-/.exec('foo-bar-1')[1] // 'foo'
/([^-]*)-/.exec('-1')[1] // ''
/([^-]*)-/.exec('quux')[1] // explodes
I don't think you need regex to achieve this. I would look at the Substring method along with the IndexOf method. If you need more help, add a comment showing what you have attempted and I will offer more help.
You could just use another non-regex based method. Someone gave the suggestion of using Substring, but you could also use Split:
string testString = "my-string";
string[] splitString = testString.Split('-');
string resultingString = splitString[0]; //my
See http://msdn.microsoft.com/en-US/library/ms228388%28v=VS.80%29.aspx for another good example.
If you want to use regex in .NET:
Regex rx = new Regex(@"^([\w]+)(\-)*");
var match = rx.Match("thisis-thefirst");
var text = match.Groups[1].Value;
Assert.AreEqual("thisis", text);
Find all word and space characters up to and including a -
^[\w ]+-
