What will be regular expression of domain URL of "www.google.com"? - c#

What will be the regex for RegularExpressionValidator in asp.net for domain name like "www.google.com"?
Valid Cases:
www.google.com
www.youwebsite.com
Invalid Cases:
http://www.google.com
https://www.google.com
google.com
www.google
Currently I used (?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z0-9][a-z0-9-]{0,61}[a-z0-9] but it fails for invalid case number 3 and 4.

The pattern that you tried accepts the third and fourth invalid cases because it only requires one or more dot-terminated labels followed by a final label. Nothing in it requires a leading www. or a further label after the second one, so both google.com and www.google match.
If you want to keep your pattern, anchor it and require a literal www. at the start:
^www\.(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z0-9][a-z0-9-]{0,61}[a-z0-9]$
You might shorten your pattern and make the match a bit broader:
^www\.[a-z0-9-]+(?:\.[a-z0-9-]+)*\.com$
You can always extend the character class if you want to allow matching more characters.
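As a rough check of the shorter pattern against the cases listed in the question, here is a minimal sketch. The names DomainCheck and IsWwwDotCom are made up for illustration, and RegexOptions.IgnoreCase is an added assumption so that uppercase input is accepted as well.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DomainCheck
{
    // Shorter pattern from the answer above; IgnoreCase is an added assumption.
    private static readonly Regex WwwDotCom = new Regex(
        @"^www\.[a-z0-9-]+(?:\.[a-z0-9-]+)*\.com$",
        RegexOptions.IgnoreCase);

    // Hypothetical helper name, not part of any framework API.
    public static bool IsWwwDotCom(string candidate)
    {
        return candidate != null && WwwDotCom.IsMatch(candidate);
    }

    public static void Main()
    {
        string[] samples =
        {
            "www.google.com",
            "www.youwebsite.com",
            "http://www.google.com",
            "https://www.google.com",
            "google.com",
            "www.google"
        };

        // Print each sample together with the validation result.
        foreach (string sample in samples)
        {
            Console.WriteLine("{0} -> {1}", sample, IsWwwDotCom(sample));
        }
    }
}
```

Only the first two samples should pass; the other four are the invalid cases from the question.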

Assuming that the valid URLs are the ones listed, we can start with a simple expression such as:
^www\..+\.com
Then, we can add additional boundaries, if desired. For instance, we could add a character class and an end anchor:
^www\..+\.com$
^www\.[A-Za-z_]+\.com$
If necessary, we would continue adding more constraints and test:
using System;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
        string pattern = @"^www\.[A-Za-z_]+\.com";
        // Verbatim string: every candidate sits on its own line, with no leading spaces
        string input = @"www.google.com
www.youwebsite.com
http://www.google.com
https://www.google.com
google.com
www.google";
        // Multiline lets ^ match at the start of each line, not only at the start of the input
        RegexOptions options = RegexOptions.Multiline;
        foreach (Match m in Regex.Matches(input, pattern, options))
        {
            Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
        }
    }
}

This matches the part you want.
\bwww\.[a-zA-Z0-9]{2,256}\.com\b
But an easier way to handle such a simple pattern is to use StartsWith and EndsWith, and then check what is in between.
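A minimal sketch of that regex-free approach follows. The class and method names are hypothetical, the case-insensitive comparison is an assumption, and the middle part is only required to be non-empty (the regex above asks for at least two characters).

```csharp
using System;
using System.Linq;

public static class DomainCheckNoRegex
{
    // Hypothetical helper: "www." + ASCII letters/digits + ".com"
    public static bool IsWwwDotCom(string candidate)
    {
        const string prefix = "www.";
        const string suffix = ".com";

        // Prefix and suffix must not overlap, and something must sit between them.
        if (candidate == null || candidate.Length <= prefix.Length + suffix.Length)
        {
            return false;
        }

        if (!candidate.StartsWith(prefix, StringComparison.OrdinalIgnoreCase) ||
            !candidate.EndsWith(suffix, StringComparison.OrdinalIgnoreCase))
        {
            return false;
        }

        string middle = candidate.Substring(
            prefix.Length,
            candidate.Length - prefix.Length - suffix.Length);

        // Mirror the [a-zA-Z0-9] class of the regex: ASCII letters and digits only.
        return middle.All(c =>
            (c >= 'a' && c <= 'z') ||
            (c >= 'A' && c <= 'Z') ||
            (c >= '0' && c <= '9'));
    }

    public static void Main()
    {
        Console.WriteLine(IsWwwDotCom("www.google.com"));
        Console.WriteLine(IsWwwDotCom("google.com"));
    }
}
```

The length check comes first so that a short input such as "www.com" cannot satisfy both StartsWith and EndsWith with overlapping characters.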

Related

How to exclude regex match inside nested parentheses

I have a text like this:
UseProp1?(Prop1?Prop1:Test):(UseProp2?Prop2:(Test Text: '{TextProperty}' Test Reference:{Reference}))
I'm trying to use regex in c# to extract the nested if/else-segments.
To find '?' I've used:
Pattern 1: \?\s*(?![^()]*\))
and to find ':' I've used:
Pattern 2: \:\s*(?![^()]*\))
This works fine when there is one level of parentheses but not when nesting them.
I've used this online tool to simplify the testing: http://regexstorm.net/tester (and insert pattern-1 and input from above)
As you can see, it highlights two matches but I only want the first. You'll also notice that first parentheses is overlooked but not the next one with the nested levels
I expect the match list to be:
1) UseProp1
2) (Prop1?Prop1:Test):(UseProp2?Prop2:(Test Text: '{TextProperty}' Test Reference:{Reference}))
What I'm getting now is:
1) UseProp1
2) (Prop1?Prop1:Test):(UseProp2
3) Prop2:(Test Text: '{TextProperty}' Test Reference:{Reference}))
Expanding on @bobble bubble's comment, here's my regex:
It will capture the first layer of ternary functions. Capture groups: $1 is the conditional, $2 is the true clause, and $3 is the false clause. You will then have to match the regex on each of those to step further down the tree:
((?:\((?>\((?<c>)|[^()]+|\)(?<-c>))*(?(c)(?!))\))+|\b[^)(?:]+)+\?((?:\((?>\((?<c>)|[^()]+|\)(?<-c>))*(?(c)(?!))\))+|\b[^)(?:]+)+\:((?:\((?>\((?<c>)|[^()]+|\)(?<-c>))*(?(c)(?!))\))+|\b[^)(?:]+)+
That being said, if you are evaluating math in these expressions as well, it may be more valuable to use a runtime compiler to do all the heavy lifting for you. This answer will help you design in that direction if you so choose.
If I understand it correctly and we only need to capture the two listed formats, we can start with a simple expression using alternation and then adjust its parts as needed:
UseProp1|(\(?Prop1\?Prop1(:Test)\)):(\(UseProp2\?Prop2):\((Test\sText):\s+'\{(.+?)}'\s+Test\sReference:\{(.+?)}\)\)
Test
using System;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
        string pattern = @"UseProp1|(\(?Prop1\?Prop1(:Test)\)):(\(UseProp2\?Prop2):\((Test\sText):\s+'\{(.+?)}'\s+Test\sReference:\{(.+?)}\)\)";
        // Verbatim string: the two expected segments, one per line
        string input = @"UseProp1
(Prop1?Prop1:Test):(UseProp2?Prop2:(Test Text: '{TextProperty}' Test Reference:{Reference}))
";
        RegexOptions options = RegexOptions.Multiline;
        foreach (Match m in Regex.Matches(input, pattern, options))
        {
            Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
        }
    }
}

Regular expression to match Url's, except a certain domain

I have the following regular expression that matches URLs. What I want to do is make it not match when a URL belongs to a certain domain, let's say google.com.
How can I do that? I've been reading other questions and regular expression references, and so far I couldn't achieve it. My regular expression:
^(https?:\/\/)?([\da-zA-Z\.-]+)\.([a-zA-Z\.]{2,6})([\/\w \.-]*)*\/?$
I use this to filter messages in a chat, I'm using C# to do so. Here's a tool in case you want to dig further: http://regexr.com/3faji
C# extension method:
using System.Text.RegularExpressions;

static class String
{
    public static string ClearUrl(string text)
    {
        Regex regx = new Regex(@"^(https?:\/\/)?([\da-zA-Z\.-]+)\.([a-zA-Z\.]{2,6})([\/\w \.-]*)*\/?$",
            RegexOptions.IgnoreCase);
        // Replace a message that consists of a URL with a single asterisk
        string output = regx.Replace(text, "*");
        return output;
    }
}
Thanks for any help
You can use negative lookahead in your regex to avoid matching certain domains:
^(https?:\/\/)?(?!(?:www\.)?google\.com)([\da-zA-Z.-]+)\.([a-zA-Z\.]{2,6})([\/\w .-]*)*\/?$
Or else:
^(https?:\/\/)?(?!.*google\.com)([\da-zA-Z.-]+)\.([a-zA-Z\.]{2,6})([\/\w .-]*)*\/?$
(?!(?:www\.)?google\.com) is negative lookahead that will assert failure when we have www.google.com or google.com ahead.
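To see the effect of the lookahead, here is a small sketch that wraps the first pattern in a helper. UrlFilter and ShouldFilter are made-up names, and the sample inputs are chosen for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlFilter
{
    // First pattern from the answer above: the lookahead rejects google.com and www.google.com.
    private static readonly Regex UrlExceptGoogle = new Regex(
        @"^(https?:\/\/)?(?!(?:www\.)?google\.com)([\da-zA-Z.-]+)\.([a-zA-Z\.]{2,6})([\/\w .-]*)*\/?$",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: true when the text is a URL that should be masked.
    public static bool ShouldFilter(string text)
    {
        return text != null && UrlExceptGoogle.IsMatch(text);
    }

    public static void Main()
    {
        string[] samples =
        {
            "http://example.com",
            "example.org/path",
            "http://google.com",
            "https://www.google.com",
            "google.com"
        };

        foreach (string sample in samples)
        {
            Console.WriteLine("{0} -> {1}", sample, ShouldFilter(sample));
        }
    }
}
```

Note that this first variant only checks the host right after the optional protocol, so a subdomain such as mail.google.com still matches; the second variant with (?!.*google\.com) is the broader one.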
This should work using negative lookahead, and also includes URLs that start with www instead of the protocol, and also that are not the first character of a line:
((http|ftp|https):\/\/|www.)(?!google|www.google)[\w\-_]+(\.[\w\-_]+)+([\w\-\.,#?^=%&:/~\+#]*[\w\-\#?^=%&/~\+#])?

Regex working in Regexr but not C#, why?

From the below mentioned input string, I want to extract the values specified in {} for s:ds field. I have attached my regex pattern. Now the pattern I used for testing on http://www.regexr.com/ is:
s:ds=\\\"({[\d\w]{8}\-([\d\w]{4}\-){3}[\d\w]{12}})\\\"
and it works absolutely fine.
But the same pattern does not work in C# code. I have also used \\ instead of \ for the C# code and replaced \" with \"". Let me know if I'm doing something wrong. Below is the code snippet.
string inputString = "s:ds=\"{46C01EB7-6D43-4E2A-9267-608DE8AFA311}\" s:ds=\"{37BA4BA0-581C-40DC-A542-FFD9E99BC345}\" s:id=\"{C091E71D-4817-49BC-B120-56CE88BC52C2}\"";
string regex = @"s:ds=\\\""({[\d\w]{8}\-(?:[\d\w]{4}\-){3}[\d\w]{12}})\\\""";
MatchCollection matchCollection = Regex.Matches(layoutField, regex);
if (matchCollection.Count > 1)
{
Log.Info("Collection Found.", this);
}
If you only want to match the values...
You should be able to just use ([\d\w]{8}\-([\d\w]{4}\-){3}[\d\w]{12}) for your expression if you only want to match the values within your curly braces:
string input = "s:ds=\"{46C01EB7-6D43-4E2A-9267-608DE8AFA311} ...";
// Use the following expression to just match your GUID values
string regex = @"([\d\w]{8}\-([\d\w]{4}\-){3}[\d\w]{12})";
// Store your matches
MatchCollection matchCollection = Regex.Matches(input, regex);
// Iterate through each one
foreach (Match match in matchCollection)
{
    // Output the match
    Console.WriteLine("Collection Found : {0}", match);
}
If you want to only match those following s:ds...
If you only want to capture the values for s:ds sections, you could consider prepending (?<=(s:ds=""{)) to your expression. This is a look-behind, so only values that are preceded by s:ds="{ will match:
string regex = @"(?<=(s:ds=""{))([\d\w]{8}\-([\d\w]{4}\-){3}[\d\w]{12})";
Notice that this does not match the s:id element.
Another Consideration
Currently you are using \w to match "word" characters within your expression, and while this might work for your uses, it will match all digits \d, letters a-zA-Z and underscores _. It's unlikely that you need all of these, so you may want to narrow your character sets to just what you expect, such as [A-Z\d] to match only uppercase letters and digits, or [0-9A-Fa-f] if you only expect GUID values (i.e. hex).
Looks like you might be over-escaping.
Give this a shot:
@"s:ds=\""({[\d\w]{8}\-([\d\w]{4}\-){3}[\d\w]{12}})\"""
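The quoting rules are easy to get wrong, so here is a minimal sketch of the verbatim-string form. It assumes the input contains plain double quotes (no literal backslashes), uses the hex-only character class suggested earlier, and the names GuidExtractor and ExtractDsValues are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GuidExtractor
{
    // In a verbatim string, "" stands for one double quote and backslashes are passed through as-is.
    // Assumption: the input holds plain quotes, so no backslash needs to be matched.
    private static readonly Regex DsValue = new Regex(
        @"s:ds=""(\{[0-9A-Fa-f]{8}-(?:[0-9A-Fa-f]{4}-){3}[0-9A-Fa-f]{12}\})""");

    // Hypothetical helper: returns the braced GUID of every s:ds attribute.
    public static List<string> ExtractDsValues(string input)
    {
        var values = new List<string>();
        foreach (Match m in DsValue.Matches(input))
        {
            values.Add(m.Groups[1].Value);
        }
        return values;
    }

    public static void Main()
    {
        string input = "s:ds=\"{46C01EB7-6D43-4E2A-9267-608DE8AFA311}\" " +
                       "s:ds=\"{37BA4BA0-581C-40DC-A542-FFD9E99BC345}\" " +
                       "s:id=\"{C091E71D-4817-49BC-B120-56CE88BC52C2}\"";

        foreach (string value in ExtractDsValues(input))
        {
            Console.WriteLine(value);
        }
    }
}
```

With the sample input from the question, only the two s:ds values are returned and the s:id value is skipped.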

Regex to capture groups of parentheses including inner and outer parentheses

I want to match all parentheses including the inner and outer parentheses.
Input: abc(test)def(rst(another test)uv)xy
Desired Output: (test)
(rst(another test)uv)
(another test)
My following c# code returns only (test) and (rst(another test)uv):
string input = "abc(test)def(rst(another test)uv)xy";
Regex regex = new Regex(@"\(([^()]+| (?<Level>\()| (?<-Level>\)))+(?(Level)(?!))\)", RegexOptions.IgnorePatternWhitespace);
foreach (Match c in regex.Matches(input))
{
Console.WriteLine(c.Value);
}
You are looking for overlapping matches. Thus, just place your regex into a capturing group and put it inside a non-anchored positive lookahead:
Regex regex = new Regex(@"(?=(\(([^()]+| (?<Level>\()| (?<-Level>\)))+(?(Level)(?!))\)))", RegexOptions.IgnorePatternWhitespace);
The value you need will be inside match.Groups[1].Value.
See the IDEONE demo:
using System;
using System.Text.RegularExpressions;
using System.IO;
using System.Linq;

public class Test
{
    public static void Main()
    {
        var input = "abc(test)def(rst(another test)uv)xy";
        var regex = new Regex(@"(?=(\(([^()]+| (?<Level>\()| (?<-Level>\)))+(?(Level)(?!))\)))", RegexOptions.IgnorePatternWhitespace);
        var results = regex.Matches(input).Cast<Match>()
            .Select(p => p.Groups[1].Value)
            .ToList();
        Console.WriteLine(String.Join(", ", results));
    }
}
Results: (test), (rst(another test)uv), (another test).
Note that unanchored positive look-aheads can be used to find overlapping matches with capturing in place because they do not consume text (i.e. the regex engine index stays at its current position when trying to match with all the subpatterns inside the lookahead) and the regex engine automatically moves its index after match/failure making the matching process "global" (i.e. tests for a match at every position inside an input string).
Although lookahead subexpressions do not match, they still can capture into groups.
Thus, when the look-ahead comes to the (, it may match a zero-width string and place the value you need into Group 1. Then, it goes on and finds another ( inside the first (...), and can capture a substring inside it again.
You could use this one : \((?>[^()]+|\((?<P>)|(?<C-P>)\))*(?(P)(?!))\) but you'll have to dig through captures, groups and groups' captures to get what you want (see demo)
Edit: This answer is flat out wrong for .Net regular expressions - see nam's comment below.
Original answer:
Regular expressions match regular languages. Nested parentheses are not a regular language, they require a context-free grammar to match. So the short answer is there is no way to do what you're attempting.
https://stackoverflow.com/a/133684/361631

Want any alphanumeric and underscore, dash and period to pass

In my controller, I currently set a constraint that has the following regex:
@"\w+"
I want to also allow underscore, dash and period.
The more efficient the better, since this is called a lot.
Any tips?
Try this:
@"[._\w-]+"
(?:\w|[-_.])+
Will match either one or more word characters or a hyphen, underscore or period. They are bundled in a non-capturing group.
We don't really need to add _ to the expression here: it is already part of the \w construct, which covers uppercase letters, lowercase letters, digits and underscore [A-Za-z0-9_], so
[\w.-]+
would work just fine.
We can also add start and end anchors, if you want to:
^[\w.-]+$
Test
using System;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
        string pattern = @"[\w.-]+";
        // Verbatim string: one sample per line, with no leading spaces
        string input = @"abcABC__.
ABCabc_.-
-_-_-abc.
";
        RegexOptions options = RegexOptions.Multiline;
        foreach (Match m in Regex.Matches(input, pattern, options))
        {
            Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
        }
    }
}
Does the following help?
@"[\w_-.]+"
P.S. I use Rad Software Regular Expression Designer to design complex Regexes.
@"[\w_-.]+"
I'm no regex guru and this was just a guess, so verify that it works...
I'd use @"[\w\-._]+" as the regex, since the dash might be interpreted as a range delimiter. It is of no danger with \w, but if you later on add, say, # it's safer to have the dash already escaped.
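A short sketch of the escaped-dash variant, with anchors added so that the whole value has to consist of allowed characters. The names SegmentCheck and IsAllowed are made up, and the underscore is omitted from the class because \w already covers it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SegmentCheck
{
    // Escaped dash: always a literal, wherever it sits in the character class.
    private static readonly Regex Allowed = new Regex(@"^[\w\-.]+$");

    // Hypothetical helper: true when value contains only word characters, dashes and periods.
    public static bool IsAllowed(string value)
    {
        return value != null && Allowed.IsMatch(value);
    }

    public static void Main()
    {
        Console.WriteLine(IsAllowed("abc_DEF-1.2"));
        Console.WriteLine(IsAllowed("abc def"));
    }
}
```

Since the question mentions that the check is called often, the Regex instance is created once and reused rather than rebuilt on every call.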
There's a few suggestions that have _-. already on the page and I believe that will be interpreted as a "word character" or anything from "_" to "." in a range.
Pattern to include all: [_.-\w]+
You can suffix the _ \. and - with ? to make any of the characters optional (none or more) e.g. for the underscore:
[_?\.-\w]+
but see Skurmedel's pattern to make all optional.
