How to find a string between some characters using regex in C#

Good day~
I am not really good at regular expressions.
So I need your help, please.
Condition:
Users can input their email address and name together.
I want to extract the email address and user name out of the string.
string pattern1 = "Peter Jackson<peter@jackson.com>";
From that string I want to get "Peter Jackson" and "<peter@jackson.com>".
However, people always make mistakes, like below:
string pattern2 = "Peter Jackson(peter@jackson.com)";
They can also use "[" instead of "<".
So...
string pattern3 = "Peter Jackson[peter@jackson.com]";
Some careless users might even input something like...
string pattern4 = "Peter Jackson{peter@jackson.com}";
So I have to look for the characters "<", "(", "[" and "{".
I tried
string regularExpressionPattern = @"^(<|(|[|{)(.*?)^(}|]|)|>)";
But I think I've done something wrong.
I also expect that people could make more mistakes, like...
string pattern5 = "Peter Jackson<peter@jackson.com>mistake";
Could anyone help with this problem?
Thanks in advance.
PS: I know how to split a string on a character, so that won't help. I need a proper regular expression.

I believe the regular expression you are looking for is as follows:
(.*?)[<([{](.*?)[>)\]}]
You would want group 1 and group 2.

(.*?)[([<{](.*?)[)\]>}]
I believe this should be adequate.
The name will be in the first captured group, and the email will be in the second.

Seems like (.*)[<\(\[{](.*)[>\)\]}] works for me.
It gives you groups for the name and for the email address.
http://regexr.com?33sq4 is my test.
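A minimal, runnable sketch of the same idea, assuming you want both parts without the brackets. NameEmailParser and TryParse are illustrative names, not an existing API. The name is everything before the first opening bracket, the address is everything up to the next closing bracket, and any trailing text (like "mistake" in pattern5) is simply not matched. It does not check that the bracket kinds pair up, nor that the address itself is valid.

```csharp
using System.Text.RegularExpressions;

public static class NameEmailParser
{
    // name  : text before the first opening bracket, surrounding spaces dropped
    // email : text between that bracket and the next closing bracket
    // Anything after the closing bracket is ignored because the pattern stops there.
    // Note: bracket kinds are not required to match, so "<x)" is accepted too.
    private static readonly Regex Pattern = new Regex(
        @"^\s*(?<name>[^<(\[{]*?)\s*[<(\[{]\s*(?<email>[^>)\]}]*?)\s*[>)\]}]");

    public static bool TryParse(string input, out string name, out string email)
    {
        Match m = Pattern.Match(input);
        name = m.Success ? m.Groups["name"].Value : string.Empty;
        email = m.Success ? m.Groups["email"].Value : string.Empty;
        return m.Success;
    }
}
```

If you need the closing bracket to match the opening one, an alternation with one branch per bracket type is the usual next step.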

Related

C# string masking/formatting/filtering with or without regex

Hopefully this isn't too complicated, I just can't seem to find the answer I need.
I have a string with variables in it, such as: this is a %variable% string
The format of the variables within the string is arbitrary, although in this example we're using the filter %{0}%
I want to match variable names to properties, and ideally I don't want to loop through GetProperties, formatting and testing each name. What I'd like to do is obtain "variable" as a string and test that.
I already use RegEx to get a list of the variables in a string, using the given filter:
string regExSyntax = string.Format(syntax, @"(?<word>\w+)");
but this returns them WITH the '%' (e.g. '%variable%') and as I said, that filter is arbitrary so I can't just do a string.Replace.
This feels like it should be straight-forward....
Thanks!
"(?<word>\w+)"
Is just capturing any run of word characters (letters, digits, underscore) and putting it into a named capturing group called "word"
You might be interested in learning about lookbehind and lookahead. For example:
"(?<=%)(?<word>\w+)(?=%)"
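You can make it a bit more generic by putting your filter in a separate variable (Regex.Escape keeps characters like "$" or "{" from being read as regex syntax):
string Boundie = "%";
string Expression = @"(?<=" + Regex.Escape(Boundie) + @")(?<word>\w+)(?=" + Regex.Escape(Boundie) + @")";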
I hope this is anywhere near what you are looking for.
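Here is a small sketch of the lookaround approach with the boundaries passed in as parameters. VariableExtractor and Extract are made-up names for illustration. Regex.Escape is used so that boundaries such as "$" or "{" are treated literally. One caveat: with the same character on both sides, input like %a%b% yields both a and b.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class VariableExtractor
{
    // Returns the words that sit between 'open' and 'close', without the delimiters.
    public static List<string> Extract(string input, string open, string close)
    {
        // Lookbehind/lookahead check the delimiters without including them in the match.
        string pattern = "(?<=" + Regex.Escape(open) + @")(?<word>\w+)(?=" + Regex.Escape(close) + ")";
        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Select(m => m.Groups["word"].Value)
                    .ToList();
    }
}
```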
Given that your regex syntax is string regExSyntax = string.Format(syntax, @"(?<word>\w+)");, I assume you're then going to create a Regex and use it to match against some string:
Regex reExtractVars = new Regex(regExSyntax);
Match m = reExtractVars.Match(inputString);
while (m.Success)
{
// get the matched variable
string wholeVar = m.Value; // returns "%variable%"
// get just the "word"
string wordOnly = m.Groups["word"].Value; // returns "variable"
m = m.NextMatch();
}
Or have I completely misunderstood the problem?
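If the filter really is arbitrary, one option is to split it around its {0} slot, escape the literal parts, and read the named group as in the loop above. This is a sketch assuming the filter always contains exactly one {0}; FilterVariables.Extract is a hypothetical helper. It does not call string.Format, so doubled braces in a filter are not unescaped.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FilterVariables
{
    // Assumption: 'filter' contains exactly one "{0}" slot, e.g. "%{0}%" or "[[{0}]]".
    // The text on either side of the slot is matched literally via Regex.Escape.
    public static List<string> Extract(string input, string filter)
    {
        int slot = filter.IndexOf("{0}", StringComparison.Ordinal);
        if (slot < 0)
            throw new ArgumentException("filter must contain {0}", nameof(filter));

        string prefix = Regex.Escape(filter.Substring(0, slot));
        string suffix = Regex.Escape(filter.Substring(slot + 3));
        var regex = new Regex(prefix + @"(?<word>\w+)" + suffix);

        var names = new List<string>();
        for (Match m = regex.Match(input); m.Success; m = m.NextMatch())
        {
            // m.Value would be "%variable%"; the named group is just "variable".
            names.Add(m.Groups["word"].Value);
        }
        return names;
    }
}
```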
Acron,
If you're going to roll your own script parser... apart from being "a bit mad", unless that's the point of the exercise (is it?), I strongly suggest that you KISS it... Keep It Simple Stoopid.
So what denotes a VARIABLE in your scripting syntax? Is it the percent signs? And they're fixed, yes? So %name% is a variable, but #comment# is NOT a variable... correct? The phrase "that filter is arbitrary" has me worried. What's a "filter"?
If this isn't homework then just use an existing scripting engine, with existing, well defined, well known syntax. Something like Jint, for example.
Cheers. Keith.

c# regex - matching optionals after a named group

I'm sure this has been asked numerous times, but though I've checked all the similar questions, I couldn't come up with a solution.
The problem is that I have input URLs similar to:
http://www.justin.tv/peacefuljay
http://www.justin.tv/peacefuljay#/w/778713616/3
http://de.justin.tv/peacefuljay#/w/778713616/3
I want to match the slug part of it (in above examples, it's peacefuljay).
The regexes I've tried so far are:
http://.*\.justin\.tv/(?<Slug>.*)(?:#.)?
http://.*\.justin\.tv/(?<Slug>.*)(?:#.)
But I can't come up with a solution. Either it fails on the first URL or on the others.
Help appreciated.
The easiest way of parsing a Uri is by using the Uri class:
string justin = "http://www.justin.tv/peacefuljay#/w/778713616/3";
Uri uri = new Uri(justin);
string s1 = uri.LocalPath; // "/peacefuljay"
string s2 = uri.Segments[1]; // "peacefuljay"
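Wrapped into a helper and checked against all three URLs from the question, the Uri approach might look like this (JustinUrl.Slug is an illustrative name). The "#/w/..." part is the URI fragment, so it never shows up in Segments.

```csharp
using System;

public static class JustinUrl
{
    // Returns the first path segment ("peacefuljay"), or "" if the path is just "/".
    public static string Slug(string url)
    {
        var uri = new Uri(url);
        // Segments for "http://www.justin.tv/peacefuljay" are { "/", "peacefuljay" }.
        return uri.Segments.Length > 1 ? uri.Segments[1].TrimEnd('/') : string.Empty;
    }
}
```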
If you insist on a regex, you can try something a bit more specific:
Match mate = Regex.Match(str, @"http://(\w+\.)*justin\.tv(?:/(?<Slug>[^#]*))?");
(\w+\.)* - Ensures you match the domain, not anywhere else in the string (eg, hash or query string).
(?:/(?<Slug>[^#]*))? - Optional group with the string you need. [^#] limits the characters you expect to see in your slug, so it should eliminate the need for the extra group after it.
As I see it, there's no reason to handle the parts after the slug.
Therefore you only need to match the characters after the host that aren't "/" or "#":
http://.*\.justin\.tv/(?<Slug>[^/#]+)
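A sketch of the character-class approach as a reusable method, assuming the slug is the first path segment and may be followed by "/", "#" or "?". Anchoring with ^ and using (?:[\w-]+\.)* for the subdomain keeps the host from being matched anywhere later in the string; JustinSlugRegex is an illustrative name.

```csharp
using System.Text.RegularExpressions;

public static class JustinSlugRegex
{
    // The slug stops at the first '/', '#' or '?', so no trailing optional group is needed.
    private static readonly Regex Pattern = new Regex(
        @"^https?://(?:[\w-]+\.)*justin\.tv/(?<Slug>[^/#?]+)", RegexOptions.IgnoreCase);

    public static string Slug(string url)
    {
        Match m = Pattern.Match(url);
        return m.Success ? m.Groups["Slug"].Value : string.Empty;
    }
}
```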
http://.*\.justin\.tv/(?<Slug>[^#]*)#?
or
http://.*\.justin\.tv/(?<Slug>.*?)(#|$)

How to Extract the Word Following a Symbol?

I have a string that could have any sentence in it, but somewhere in that string will be the # symbol followed by an attached word, sort of like the #username you see on some sites.
So maybe the string is "hey how are you" or it's "#john hey how are you".
If there's a "#" in the string, I want to pull what comes immediately after it into its own new string.
In this instance, how can I pull "john" into a different string so I could theoretically notify this person of his new message? I'm trying to play with string.Contains or .Replace, but I'm pretty new and having a hard time.
This is in C# ASP.NET, by the way.
You can use the Substring and IndexOf methods together to achieve this.
I hope this helps.
Thanks,
Damian
Here's how you do it without regex:
string input = "hi there #john how are you";
string tag = GetTag(input); // "john"

static string GetTag(string s)
{
int atSign = s.IndexOf('#');
if (atSign == -1) return "";
// start just after the #, stop at sentence or phrase end
// I'm assuming this is English, of course,
// so we leave in ' and -
int start = atSign + 1;
int wordEnd = s.IndexOfAny(" .,;:!?".ToCharArray(), start);
if (wordEnd > -1)
return s.Substring(start, wordEnd - start);
else
return s.Substring(start);
}
You should really learn regular expressions. This will work for you:
using System.Text.RegularExpressions;
var res = Regex.Match("hey #john how are you", @"#(\S+)");
if (res.Success)
{
//john
var name = res.Groups[1].Value;
}
This finds the first occurrence. If you want to find all of them, you can use Regex.Matches. \S means anything other than whitespace. This means it also turns "hey #john, how are you" into "john," (comma included) and "#john123" into "john123", which may be wrong. Maybe [a-zA-Z]+ or similar would suit you better (it depends on which characters the usernames are made of). If you give more examples, I could tune it :)
I can recommend this page:
http://www.regular-expressions.info/
and this tool where you can test your statements:
http://regexlib.com/RESilverlight.aspx
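To collect every name rather than just the first, Regex.Matches can be combined with a tighter character class than \S. This sketch assumes usernames consist of word characters (letters, digits, underscore), so the comma in "#mary_2," is left out; Mentions.Find is an illustrative name.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class Mentions
{
    // Returns every word that directly follows a '#', without the '#' itself.
    public static List<string> Find(string text)
    {
        var names = new List<string>();
        foreach (Match m in Regex.Matches(text, @"#(\w+)"))
        {
            names.Add(m.Groups[1].Value); // group 1 is the part inside the parentheses
        }
        return names;
    }
}
```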
The best way to solve this is using Regular Expressions. You can find a great resource here.
Using RegEx, you can search for the pattern you are after. I always have to refer to some documentation to write one...
Here is a pattern to start with - "#(\w+)" - the # will get matched, and then the parentheses indicate that you want what comes after it. The "\w" means you want only word characters to match (letters, digits, or the underscore), and the "+" indicates that there should be one or more word characters in a row.
You can try Regex...
I think it will be something like this:
string userName = Regex.Match(yourString, "#(.+?)(?:\\s|$)").Groups[1].Value;
RegularExpressions. I don't know C#, but the RegEx would be
/(#[\w]+) / - Everything in the parens is captured in a special variable, or attached to the RegEx object.
Use this:
var r = new Regex(@"#\w+");
foreach (Match m in r.Matches(stringToSearch))
DoSomething(m.Value);
DoSomething(string foundName) is your own method that handles each name found after #; note that m.Value includes the # itself.
This will find all #names in stringToSearch

How can I use RegEx to make sure a valid email is written in my TextBox?

I'm a complete newbie to RegEx and I'm sure it'll be brilliant to use once I know how to use it. :P
I have a couple of TextBoxes and I was wondering if anyone could help me accomplish what I need.
In the EMail TextBox, I'd like to make sure the user writes in a valid email, like xxx@yyy.zzz.
Is there a way for RegEx to help me out?
I'd also really like a way to format the name the user writes down. So if a user writes in "SerGIo TAPIA gutTIerrez", I want to format that string (behind the scenes, before saving it) to "Sergio Tapia Gutierrez". Can RegEx do this?
Thanks so much SO.
(inb4 Rex :P )
A complete and accurate regex for email validation is surprisingly difficult, I trust you can use google to find some examples.
The general rule for email validation is to actually try to send an email.
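If you only need a syntactic sanity check before actually sending a message, one alternative to a hand-written regex is to let System.Net.Mail.MailAddress parse the input. This is a swapped-in technique, not a regex, and like any syntax check it cannot tell whether the mailbox exists. EmailCheck.LooksLikeEmail is an illustrative name; the comparison against Address rejects inputs such as "Peter Jackson <peter@jackson.com>", which MailAddress would otherwise accept as a display name plus address.

```csharp
using System;
using System.Net.Mail;

public static class EmailCheck
{
    // Syntactic check only: says nothing about whether the address is deliverable.
    public static bool LooksLikeEmail(string input)
    {
        if (string.IsNullOrWhiteSpace(input)) return false;
        try
        {
            var address = new MailAddress(input);
            // Require a bare address, not "Display Name <user@host>".
            return address.Address == input.Trim();
        }
        catch (FormatException)
        {
            return false;
        }
    }
}
```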
Well, this is an easy one! :)
no, there exists no regex that can validate* e-mail addresses;
no, regex cannot transform "SerGIo TAPIA gutTIerrez" into "Sergio Tapia Gutierrez". Sure, some languages like Perl (and perhaps others) can mix in some fancy stuff inside regexes to do this, but it is not the regex that actually performs the transformation. Regex only matches text, plain and simple.
* by 'valid' I mean see if the address actually exists.
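For the name formatting, the usual .NET tool is TextInfo.ToTitleCase rather than a regex. A minimal sketch (NameFormat.ToProperName is an illustrative name): ToTitleCase leaves words that are entirely upper case unchanged, so the input is lower-cased first. Names like "McDonald" or "van der Berg" would still need special handling.

```csharp
using System.Globalization;

public static class NameFormat
{
    // Lower-case everything, then capitalize the first letter of each word.
    public static string ToProperName(string input, CultureInfo culture)
    {
        return culture.TextInfo.ToTitleCase(input.ToLower(culture));
    }
}
```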
This is one way, but there are many others.
public static bool isEmail(string emailAddress)
{
if(string.IsNullOrEmpty(emailAddress))
return false;
Regex EmailAddress = new Regex(@"^([0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*@([0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,9})$");
return EmailAddress.IsMatch(emailAddress);
}
http://www.cambiaresearch.com/c4/bf974b23-484b-41c3-b331-0bd8121d5177/Parsing-Email-Addresses-with-Regular-Expressions.aspx
public bool TestEmailRegex(string emailAddress)
{
// string patternLenient = @"\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*";
// Regex reLenient = new Regex(patternLenient);
string patternStrict = @"^(([^<>()[\]\\.,;:\s@""]+"
+ @"(\.[^<>()[\]\\.,;:\s@""]+)*)|("".+""))@"
+ @"((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}"
+ @"\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+"
+ @"[a-zA-Z]{2,}))$";
Regex reStrict = new Regex(patternStrict);
// bool isLenientMatch = reLenient.IsMatch(emailAddress);
// return isLenientMatch;
bool isStrictMatch = reStrict.IsMatch(emailAddress);
return isStrictMatch;
}

C# Extracting a name from a string

I want to extract 'James\, Brown' from the string below, but I don't always know what the name will be. The comma is causing me some difficulty, so what would you suggest to extract James\, Brown?
OU=James\, Brown,OU=Test,DC=Internal,DC=Net
Thanks
A regex is likely your best approach
static string ParseName(string arg) {
var regex = new Regex(@"^OU=([a-zA-Z\\]+\,\s+[a-zA-Z\\]+)\,.*$");
var match = regex.Match(arg);
return match.Groups[1].Value;
}
You can use a regex:
string input = @"OU=James\, Brown,OU=Test,DC=Internal,DC=Net";
Match m = Regex.Match(input, "^OU=(.*?),OU=.*$");
Console.WriteLine(m.Groups[1].Value);
A quite brittle way to do this might be...
string name = @"OU=James\, Brown,OU=Test,DC=Internal,DC=Net";
string[] splitUp = name.Split("=".ToCharArray(),3);
string namePart = splitUp[1].Replace(",OU","");
Console.WriteLine(namePart);
I wouldn't necessarily advocate this method, but I've just come back from a departmental Christmas lunch and my brain is not fully engaged yet.
I'd start off with a regex to split up the groups:
Regex rx = new Regex(@"(?<!\\),");
String test = "OU=James\\, Brown,OU=Test,DC=Internal,DC=Net";
String[] segments = rx.Split(test);
But from there I would split the parameters in the array manually, so that you don't have to use a regex that depends on more than the separator character used. Since this looks like an LDAP query, it might not matter if you always look at segments[0], but there is a chance that the name might be set as "CN=". You can cover both cases by just reading the query like this:
String name = segments[0].Split(new[] { '=' }, 2)[1];
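Put together, the split-then-read approach might look like this sketch (DnFirstValue and its methods are illustrative names). It returns the first RDN value whether the DN starts with OU= or CN=. The lookbehind only checks for a single preceding backslash, so a value ending in an escaped backslash (\\ followed by the separator comma) would be mis-split.

```csharp
using System.Text.RegularExpressions;

public static class DnFirstValue
{
    // A comma counts as a separator only when it is not preceded by a backslash.
    private static readonly Regex UnescapedComma = new Regex(@"(?<!\\),");

    // Returns the value of the first RDN ("OU=..." or "CN=..."), still escaped.
    public static string FirstRdnValue(string dn)
    {
        string[] segments = UnescapedComma.Split(dn);
        // Split only on the first '=' so an '=' inside the value is kept.
        string[] typeAndValue = segments[0].Split(new[] { '=' }, 2);
        return typeAndValue.Length == 2 ? typeAndValue[1] : string.Empty;
    }

    // For display only: turn the escaped "\," back into a plain comma.
    public static string UnescapeCommas(string value)
    {
        return value.Replace(@"\,", ",");
    }
}
```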
That looks suspiciously like an LDAP or Active Directory distinguished name formatted according to RFC 2253/4514.
Unless you're working with well known names and/or are okay with a fragile hackaround (like the regex solutions) - then you should start by reading the spec.
If you, like me, generally hate implementing code according to RFCs - then hope this guy did a better job following the spec than you would. At least he claims to be 2253 compliant.
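If you want something closer to the spec without a full parser, a character-by-character scan handles backslash escapes more reliably than a lookbehind, because it consumes each escape pair exactly once. This is a minimal sketch under stated assumptions (DnReader.Parse is a hypothetical helper): it handles single-character escapes only, not RFC 4514 hex escapes such as \2C, multi-valued RDNs joined with "+", or RFC 2253 quoted values.

```csharp
using System.Collections.Generic;
using System.Text;

public static class DnReader
{
    // Splits a DN into (type, value) pairs on unescaped commas and removes
    // single-character backslash escapes ("\," -> ",", "\\" -> "\").
    public static List<KeyValuePair<string, string>> Parse(string dn)
    {
        var rdns = new List<KeyValuePair<string, string>>();
        var type = new StringBuilder();
        var value = new StringBuilder();
        bool inValue = false;

        for (int i = 0; i < dn.Length; i++)
        {
            char c = dn[i];
            if (c == '\\' && i + 1 < dn.Length)
            {
                // Escaped character: keep the next char literally and skip past it.
                (inValue ? value : type).Append(dn[++i]);
            }
            else if (c == '=' && !inValue)
            {
                inValue = true; // the first unescaped '=' separates type from value
            }
            else if (c == ',')
            {
                // An unescaped comma ends the current RDN.
                rdns.Add(new KeyValuePair<string, string>(type.ToString().Trim(), value.ToString()));
                type.Clear();
                value.Clear();
                inValue = false;
            }
            else
            {
                (inValue ? value : type).Append(c);
            }
        }
        rdns.Add(new KeyValuePair<string, string>(type.ToString().Trim(), value.ToString()));
        return rdns;
    }
}
```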
If the backslash is always there, I would look at potentially using RegEx to do the match; you can use a match group for the last and first names.
^OU=([a-zA-Z]+)\\,\s([a-zA-Z]+)
That RegEx will match names that contain letters only; you will need to refine it a bit for better matching of non-standard names. Here is a RegEx tester to help you along the way if you go this route.
Replace \, with your own preferred magic string (perhaps &#44;), split on the remaining commas or search until the first comma, then replace your magic string with a single comma.
i.e. Something like:
string originalStr = @"OU=James\, Brown,OU=Test,DC=Internal,DC=Net";
string replacedStr = originalStr.Replace(@"\,", "&#44;");
string name = replacedStr.Substring(0, replacedStr.IndexOf(","));
Console.WriteLine(name.Replace("&#44;", ","));
Assuming you're running in Windows, use PInvoke with DsUnquoteRdnValueW. For code, see my answer to another question: https://stackoverflow.com/a/11091804/628981
If the format is always the same:
string line = GetStringFromWherever();
int start = line.IndexOf("=") + 1;//+1 to get start of name
int end = line.IndexOf("OU=",start) -1; //-1 to remove comma
string name = line.Substring(start, end - start);
Forgive if syntax is not quite right - from memory. Obviously this is not very robust and fails if the format ever changes.
