Correction in this simple regular expression - C#

I am new to regular expressions, and the one I have written is probably very simple, but I do not know where I went wrong.
@"^([a-zA-Z._]+)#([\d]+)"
This RE is for the following string:
somename#somenumber
Now I am trying to retrieve somename and somenumber. This is what I did:
ac.name = m.Groups[0].Value;
ac.number = m.Groups[1].Value;
Here ac.name reads the complete string, and ac.number reads somenumber. Where am I wrong in ac.name?

I think the regex is correct. The problem is that you read ac.name from Groups[0], which always holds the entire match, rather than from Groups[1]. Try this:
ac.name = m.Groups[1].Value;
ac.number = m.Groups[2].Value;
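
To make the group numbering concrete, here is a minimal sketch that reuses the pattern from the question. The class name GroupIndexDemo and the helper Parse are made up for illustration; only Regex.Match and Match.Groups come from .NET.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupIndexDemo
{
    // Hypothetical helper: splits inputs such as "somename#123" and also
    // returns the whole match so the three group indexes can be compared.
    public static (string Whole, string Name, string Number) Parse(string input)
    {
        Match m = Regex.Match(input, @"^([a-zA-Z._]+)#([\d]+)");
        if (!m.Success)
            throw new FormatException("Input does not look like name#number: " + input);

        // Groups[0] is the entire match; the parenthesised groups start at 1.
        return (m.Groups[0].Value, m.Groups[1].Value, m.Groups[2].Value);
    }

    public static void Main()
    {
        var parts = Parse("some.name#42");
        Console.WriteLine(parts.Whole);  // some.name#42
        Console.WriteLine(parts.Name);   // some.name
        Console.WriteLine(parts.Number); // 42
    }
}
```

Checking m.Success before reading the groups avoids silently getting empty strings when the input does not match.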

This regex is correct. I think your mistake is somewhere else. You seem to be using C#, so look at how the language exposes regex matches.
Looking at the code sample on MSDN, captured groups are numbered from 1 when you access Groups, because index 0 holds the entire match (as Kent also suggested). So, use this:
String name = m.Groups[1].Value;
String number = m.Groups[2].Value;

Use this regex: (\w+)#(\d+([.,]\d+)?)
Groups[1] will contain the name.
Groups[2] will contain the number.

I think you should move the + into the capture group:
@"^([a-zA-Z._]+)#([\d]+)"

If this is C#, try without the ^
([a-zA-Z\._]+)#([\d]+)
I just tried it out and it groups properly
Update: escaped the .
If you want only one match (and hence the ^ in original expression), use .Match instead of .Matches method. See MSDN documentation on Regular Expression Classes.

Related

Regular expression for filenames that doesn't exclude whitespaces

I have been using this regular expression to extract file names out of file path strings:
Regex r = new Regex(@"\w+[.]\w+$+");
This works, as long as there is no space in the file name. For example:
r.Match("c:\somestuff\myfile.doc").Value = "myfile.doc"
r.Match("c:\somestuff\my file.doc").Value = "file.doc"
I need my regular expression to give me "my file.doc", and not just "file.doc"
I tried messing around with the expression myself. In particular I tried adding \s+ after learning that that is for matching whitespaces. I didn't get the results I hoped for.
I did devise a solution just to get the job done: I started at the end of the string, went backwards until a backslash was reached. This gave me the file name in reverse order (i.e. cod.elifym) into an array of chars, then I used Array.Reverse() to turn it around. However I'd like to learn how to achieve this by simply modifying my original regular expression.

Does it have to be a regular expression? Use System.IO.Path.GetFileName() instead.

Regex r = new Regex(@"[\w ]+\.\w+$");

A working regex might simply look like:
[^\\]+$
Consider using:
System.IO.Path.GetFileName(path)
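
As a rough sketch of the regex suggested above, the following takes everything after the last backslash, spaces included. The class and method names are invented for this example, and it assumes Windows-style paths that use backslashes as separators.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FileNameDemo
{
    // One or more non-backslash characters anchored at the end of the string.
    private static readonly Regex LastSegment = new Regex(@"[^\\]+$");

    // Hypothetical helper: returns "" when the path ends with a backslash.
    public static string GetName(string windowsPath)
    {
        return LastSegment.Match(windowsPath).Value;
    }

    public static void Main()
    {
        Console.WriteLine(GetName(@"c:\somestuff\my file.doc")); // my file.doc
        Console.WriteLine(GetName(@"c:\somestuff\myfile.doc"));  // myfile.doc
    }
}
```

For real code, System.IO.Path.GetFileName is still the simpler choice; the regex is shown only because the question asked how to do it that way.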

C# string masking/formatting/filtering with or without regex

Hopefully this isn't too complicated, I just can't seem to find the answer I need.
I have a string with variables in, such as: this is a %variable% string
The format of the variables within the string is arbitrary, although in this example we're using the filter %{0}%
I am wanting to match variable names to properties and ideally I don't want to loop through GetProperties, formatting and testing each name. What I'd like to do is obtain "variable" as a string and test that.
I already use RegEx to get a list of the variables in a string, using the given filter:
string regExSyntax = string.Format(syntax, @"(?<word>\w+)");
but this returns them WITH the '%' (e.g. '%variable%') and as I said, that filter is arbitrary so I can't just do a string.Replace.
This feels like it should be straight-forward....
Thanks!

"(?<word>\w+)"
This just captures one or more word characters (letters, digits, or the underscore) and puts them into a named capturing group called "word".
You might be interested in learning about lookbehind and lookahead. For example:
"(?<=%)(?<word>\w+)(?=%)"
You can make it a bit more generic by putting your filter in a separate variable:
string Boundie = Regex.Escape("%"); // escaped in case the delimiter is a regex metacharacter
string Expression = @"(?<=" + Boundie + @")(?<word>\w+)(?=" + Boundie + @")";
I hope this is anywhere near what you are looking for.

Given that your regex syntax is: string regExSyntax = string.Format(syntax, @"(?<word>\w+)");, I assume you're then going to create a Regex and use it to match against some string:
Regex reExtractVars = new Regex(regExSyntax);
Match m = reExtractVars.Match(inputString);
while (m.Success)
{
    // get the matched variable
    string wholeVar = m.Value; // returns "%variable%"
    // get just the "word"
    string wordOnly = m.Groups["word"].Value; // returns "variable"
    m = m.NextMatch();
}
Or have I completely misunderstood the problem?
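
Since the filter is described as arbitrary, one possible approach is to escape the text around the {0} slot before building the pattern, so that delimiters such as $( and ) are not interpreted as regex syntax. This is only a sketch: PlaceholderDemo and ExtractNames are hypothetical names, and it assumes the filter contains exactly one {0}.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PlaceholderDemo
{
    // syntax is a filter such as "%{0}%"; "{0}" marks where the name goes.
    public static List<string> ExtractNames(string input, string syntax)
    {
        int slot = syntax.IndexOf("{0}", StringComparison.Ordinal);
        if (slot < 0)
            throw new ArgumentException("syntax must contain {0}", nameof(syntax));

        // Escape the literal delimiters on both sides of the slot.
        string prefix = Regex.Escape(syntax.Substring(0, slot));
        string suffix = Regex.Escape(syntax.Substring(slot + 3));
        var regex = new Regex(prefix + @"(?<word>\w+)" + suffix);

        var names = new List<string>();
        foreach (Match m in regex.Matches(input))
            names.Add(m.Groups["word"].Value); // the name without delimiters
        return names;
    }

    public static void Main()
    {
        foreach (string n in ExtractNames("this is a %variable% string", "%{0}%"))
            Console.WriteLine(n); // variable
        foreach (string n in ExtractNames("copy $(root) to $(target)", "$({0})"))
            Console.WriteLine(n); // root, then target
    }
}
```

The named group gives the bare variable name regardless of which delimiters the filter uses, so no string.Replace on the delimiters is needed.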

Acron,
If you're going to roll-your own script parser... apart from being "a bit mad", unless that's the point of the exercise (is it?), then I strongly suggest that you KISS it... Keep It Simple Stoopid.
So what denotes a VARIABLE in your scripting syntax? Is it the percent signs? And they're fixed, yes? So %name% is a variable, but #comment# is NOT a variable... correct? The phrase "that filter is arbitrary" has me worried. What's a "filter"?
If this isn't homework then just use an existing scripting engine, with existing, well defined, well known syntax. Something like Jint, for example.
Cheers. Keith.

Regex for a string

It would be great if someone could provide me the Regular expression for the following string.
Sample 1: <div>abc</div><br>
Sample 2: <div>abc</div></div></div></div></div><br>
As you can see in the samples above, I need to match the string no matter how many </div> tags occur. If any other text occurs between </div> and <br>, like <div>abc</div></div></div>DEF</div></div><br> or <div>abc</div></div></div></div></div>DEF<br>, then the regex should not match.
Thanks in advance.

Try this:
<div>([^<]+)(?:<\/div>)*<br>
As seen on rubular
Notes:
This only works if there are no tags in the abc part (or anything that has a < symbol).
You might want to use start and end of string anchors (^<div>([^<]+)(?:<\/div>)*<br>$) if you want your string to match the pattern exactly.
If you want to allow the abc part to be empty, use * instead of +
That being said, you should be wary of using regex to parse HTML.
In this example, you can use regex because you are parsing a (hopefully) known, regular subset of HTML. But a more robust solution (ie: an [X]HTML parser like HtmlAgilityPack) is preferred when it comes to parsing HTML.
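
The anchored variant can be checked against the four samples from the question with a small sketch like the one below. DivPatternDemo and IsValid are illustrative names, and the / is left unescaped because .NET does not need it escaped.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DivPatternDemo
{
    // <div>, text without '<', any number of </div>, then <br>, nothing else.
    private static readonly Regex Pattern =
        new Regex(@"^<div>([^<]+)(?:</div>)*<br>$");

    public static bool IsValid(string html)
    {
        return Pattern.IsMatch(html);
    }

    public static void Main()
    {
        Console.WriteLine(IsValid("<div>abc</div><br>"));                                  // True
        Console.WriteLine(IsValid("<div>abc</div></div></div></div></div><br>"));          // True
        Console.WriteLine(IsValid("<div>abc</div></div></div>DEF</div></div><br>"));       // False
        Console.WriteLine(IsValid("<div>abc</div></div></div></div></div>DEF<br>"));       // False
    }
}
```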

You need to use a real parser. Things like infinitely nested tags can't be handled via regex.

You could also include a named group in the expression, e.g.:
<div>(?<text>[^<]*)(?:<\/div>)*<br>
Implemented in C#:
var regex = new Regex(@"<div>(?<text>[^<]*)(?:<\/div>)*<br>");
Func<Match, string> getGroupText = m => (m.Success && m.Groups["text"] != null) ? m.Groups["text"].Value : null;
Func<string, string> getText = s => getGroupText(regex.Match(s));
Console.WriteLine(getText("<div>abc</div><br>"));
Console.WriteLine(getText("<div>123</div></div></div></div></div><br>"));

NullUserException's answer is good. Here are a couple of questions, and variations, depending on what you want.
Do you want to prevent anything from occurring before the open div tag? If so, keep the ^ at the beginning of the regex. If not, drop it.
The rest of this post refers to the following section of the regex:
([^<]+?)
Do you want to capture the contents of the div, or just know that it matches your form? To capture, leave it as is. If you don't need to capture, drop the parentheses from the above.
Do you want to match if there is nothing inside the div? If so change the + in the above to *
Finally, although it will work fine, you don't need the ? in the above.

I think, this regex is more flexible:
<div\b[^><]*+>(?>.*?</div>)(?:\s*+</div>)*+\s*+<br(?:\s*+/)?>
I don't include the ^ and $ at the beginning and end of my regex because we cannot be sure that your sample will always be on a single line.

How to Extract the Word Following a Symbol?

I have a string that could have any sentence in it, but somewhere in that string will be the # symbol, followed by an attached word, sort of like the #username you see on some sites.
So maybe the string is "hey how are you" or it's "#john hey how are you".
If there's a "#" in the string, I want to pull what comes immediately after it into its own new string.
In this instance, how can I pull "john" into a different string so I could theoretically notify this person of his new message? I'm trying to play with string.Contains or .Replace, but I'm pretty new and having a hard time.
This, by the way, is in C# ASP.NET.

You can use the Substring and IndexOf methods together to achieve this.
I hope this helps.
Thanks,
Damian

Here's how you do it without regex:
string s = "hi there #john how are you";
string getTag(string s)
{
    int atSign = s.IndexOf("#");
    if (atSign == -1) return "";
    // start at #, stop at sentence or phrase end
    // I'm assuming this is English, of course
    // so we leave in ' and -
    // IndexOfAny takes a char[], so convert the delimiter string first
    int wordEnd = s.IndexOfAny(" .,;:!?".ToCharArray(), atSign);
    if (wordEnd > -1)
        return s.Substring(atSign, wordEnd - atSign); // includes the leading '#'; use atSign + 1 to drop it
    else
        return s.Substring(atSign);
}

You should really learn regular expressions. This will work for you:
using System.Text.RegularExpressions;
var res = Regex.Match("hey #john how are you", @"#(\S+)");
if (res.Success)
{
    //john
    var name = res.Groups[1].Value;
}
This finds the first occurrence. If you want to find all of them, you can use Regex.Matches. \S means anything other than whitespace. This means "hey #john, how are you" yields "john," (with the trailing comma) and "#john123" yields "john123", which may be wrong. Maybe [a-zA-Z] or similar would suit you better (it depends on which characters the usernames are made of). If you give more examples, I could tune it :)
I can recommend this page:
http://www.regular-expressions.info/
and this tool where you can test your statements:
http://regexlib.com/RESilverlight.aspx

The best way to solve this is using Regular Expressions. You can find a great resource here.
Using RegEx, you can search for the pattern you are after. I always have to refer to some documentation to write one...
Here is a pattern to start with - "#(\w+)" - the # will get matched, and then the parentheses indicate that you want what comes after. The "\w" means you want only word characters to match (letters, digits, and the underscore), and the "+" indicates that there should be one or more word characters in a row.

You can try Regex...
I think it will be something like this:
string userName = Regex.Match(yourString, "#(\\S+)").Groups[1].Value;

Regular expressions. I don't know C#, but the regex would be
/(#[\w]+) / - Everything in the parens is captured in a special variable, or attached to the RegEx object.

Use this:
var r = new Regex(@"#\w+");
foreach (Match m in r.Matches(stringToSearch))
    DoSomething(m.Value);
DoSomething(string foundName) is a function that handles name (found after #).
This will find all #names in stringToSearch
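
If you want the names without the leading #, a capture group can be added to the same idea. The sketch below is one way to do that; MentionDemo and FindMentions are invented names, and it assumes usernames consist only of word characters.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MentionDemo
{
    // Returns every word that directly follows a '#', without the symbol.
    public static List<string> FindMentions(string text)
    {
        var names = new List<string>();
        foreach (Match m in Regex.Matches(text, @"#(\w+)"))
            names.Add(m.Groups[1].Value); // group 1 excludes the '#'
        return names;
    }

    public static void Main()
    {
        foreach (string name in FindMentions("hey #john, how are you? cc #jane_doe42"))
            Console.WriteLine(name); // john, then jane_doe42
    }
}
```

Because \w stops at punctuation, the comma after #john is not included in the captured name.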

C# Extracting a name from a string

I want to extract 'James\, Brown' from the string below, but I don't always know what the name will be. The comma is causing me some difficulty, so what would you suggest to extract James\, Brown?
OU=James\, Brown,OU=Test,DC=Internal,DC=Net
Thanks

A regex is likely your best approach
static string ParseName(string arg) {
    var regex = new Regex(@"^OU=([a-zA-Z\\]+\,\s+[a-zA-Z\\]+)\,.*$");
    var match = regex.Match(arg);
    return match.Groups[1].Value;
}

You can use a regex:
string input = @"OU=James\, Brown,OU=Test,DC=Internal,DC=Net";
Match m = Regex.Match(input, "^OU=(.*?),OU=.*$");
Console.WriteLine(m.Groups[1].Value);

A quite brittle way to do this might be...
string name = @"OU=James\, Brown,OU=Test,DC=Internal,DC=Net";
string[] splitUp = name.Split("=".ToCharArray(),3);
string namePart = splitUp[1].Replace(",OU","");
Console.WriteLine(namePart);
I wouldn't necessarily advocate this method, but I've just come back from a departmental Christmas lunch and my brain is not fully engaged yet.

I'd start off with a regex to split up the groups:
Regex rx = new Regex(@"(?<!\\),");
String test = "OU=James\\, Brown,OU=Test,DC=Internal,DC=Net";
String[] segments = rx.Split(test);
But from there I would split up the parameters in the array manually, so that you don't have to use a regex that depends on more than the separator character used. Since this looks like an LDAP query, it might not matter if you always look at segments[0], but there is a chance that the name might be set as "CN=". You can cover both cases by just reading the query like this:
String name = segments[0].Split('=', 2)[1];
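
Put together, the split-then-read approach might look like the sketch below. DnDemo and FirstRdnValue are hypothetical names, and this is not a full RFC 4514 parser: it only handles the backslash-comma escape and ignores quoted values, hex escapes, and an escaped backslash directly before a comma.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DnDemo
{
    // Returns the value of the first component, e.g. "James, Brown".
    public static string FirstRdnValue(string dn)
    {
        // Split on commas that are not preceded by a backslash.
        string[] segments = Regex.Split(dn, @"(?<!\\),");

        // Split "OU=value" or "CN=value" at the first '=' only.
        string[] pair = segments[0].Split(new[] { '=' }, 2);
        if (pair.Length != 2)
            throw new FormatException("Expected key=value but got: " + segments[0]);

        // Turn the escaped comma back into a plain comma.
        return pair[1].Replace(@"\,", ",");
    }

    public static void Main()
    {
        Console.WriteLine(FirstRdnValue(@"OU=James\, Brown,OU=Test,DC=Internal,DC=Net")); // James, Brown
        Console.WriteLine(FirstRdnValue("CN=Smith,DC=Internal,DC=Net"));                  // Smith
    }
}
```

Drop the final Replace call if you want to keep the escaped form James\, Brown exactly as it appears in the input.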

That looks suspiciously like an LDAP or Active Directory distinguished name formatted according to RFC 2253/4514.
Unless you're working with well known names and/or are okay with a fragile hackaround (like the regex solutions) - then you should start by reading the spec.
If you, like me, generally hate implementing code according to RFCs - then hope this guy did a better job following the spec than you would. At least he claims to be 2253 compliant.

If the slash is always there, I would look at potentially using RegEx to do the match. You can use a match group each for the last and first names.
^OU=([a-zA-Z]+)\\,\s([a-zA-Z]+)
That RegEx will match names that consist of letters only; you will need to refine it a bit for better matching of non-standard names. Here is a RegEx tester to help you along the way if you go this route.

Replace \, with your own preferred magic string (perhaps &#44;), split on remaining commas or search until the first comma, then replace your magic string with a single comma.
i.e. Something like:
string originalStr = @"OU=James\, Brown,OU=Test,DC=Internal,DC=Net";
string replacedStr = originalStr.Replace(@"\,", "&#44;");
string name = replacedStr.Substring(0, replacedStr.IndexOf(","));
Console.WriteLine(name.Replace("&#44;", ","));

Assuming you're running in Windows, use PInvoke with DsUnquoteRdnValueW. For code, see my answer to another question: https://stackoverflow.com/a/11091804/628981

If the format is always the same:
string line = GetStringFromWherever();
int start = line.IndexOf("=") + 1;//+1 to get start of name
int end = line.IndexOf("OU=",start) -1; //-1 to remove comma
string name = line.Substring(start, end - start);
Forgive if syntax is not quite right - from memory. Obviously this is not very robust and fails if the format ever changes.
