Regex: optional-ignore double quotes - c#

Hello, I have a question about matching groups based on the following regular expression:
static string partPattern = @"^(?<Key>\w+)\s*(?<Value>.*)$";
Sample data is as follows:
TEST_REPLICATE
{
REPLICATE_ID 1986
ASSAY_NUMBER 877
ASSAY_VERSION 4
ASSAY_STATUS "Research"
}
I am able to retrieve values correctly, and it also works when a value is missing. What I am trying to do now is also retrieve a value that is wrapped in double quotes, such as the last one in the sample, without the quotes. I am not sure I am doing it correctly. Would this be the right regex for the scenario above? I just added quotes before the w. Please correct me, thanks.
static string partPattern = @"^(?<Key>\"w+)\s*(?<Value>.*)$";

Your regex is not correct, at least for the input you have provided.
If I have understood your question, this is the regex that you need:
^\s*(?<Key>\w+)\s*\"?(?<Value>.*?)\"?$
It works in multiline mode.
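As a rough sketch of how the pattern above behaves, the snippet below applies it to a few lines of the sample data. Applying it one line at a time instead of using RegexOptions.Multiline is a choice made here, because \s* can also consume a line break; the lines array and the parsed dictionary are illustrative names, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Pattern from the answer, written as a verbatim string ("" is one literal quote).
var partPattern = new Regex(@"^\s*(?<Key>\w+)\s*""?(?<Value>.*?)""?$");

// A few lines taken from the sample data in the question.
string[] lines =
{
    "TEST_REPLICATE",
    "{",
    "REPLICATE_ID 1986",
    "ASSAY_STATUS \"Research\"",
    "}"
};

// Collect key/value pairs; lines such as "{" and "}" do not match and are skipped.
var parsed = new Dictionary<string, string>();
foreach (string line in lines)
{
    Match m = partPattern.Match(line);
    if (m.Success)
    {
        parsed[m.Groups["Key"].Value] = m.Groups["Value"].Value;
    }
}

foreach (var pair in parsed)
{
    Console.WriteLine($"{pair.Key} = [{pair.Value}]");
}
```

A key with no value, such as TEST_REPLICATE, still matches and yields an empty Value group.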

Not sure where your problem is. This works for me:
\s*(?<Key>[^\s]+)\s*(?<Value>.*)

Related

How do I see if a string contains another string with quotes in it?

I am trying to see if a large string contains this line of HTML:
<label ng-class="choiceCaptionClass" class="ng-binding choice-caption">Was this information helpful?</label>
As you can see, this snippet has quotations in multiple places and it's causing problems when I do something like this:
Assert.IsTrue(responseContent.Contains("<label ng-class="choiceCaptionClass" class="ng - binding choice - caption">Was this information helpful?</label>"));
I've tried both of these ways of defining the string:
@"<label ng-class=""choiceCaptionClass"" class=""ng - binding choice - caption"">Was this information helpful?</label>"
and
"<label ng-class=\"choiceCaptionClass\" class=\"ng - binding choice - caption\">Was this information helpful?</label>"
But in each case the Contains() method looks for the literal string with either the double quotes or the backslashes. Is there another way I could define this string so I can correctly search for it?
Escaping the double-quotes with backslashes is the proper thing to do.
The reason your search may be failing is that the strings don't actually match. For example, in your version with backslashes, you have spaces around some of the dashes but your HTML string does not.
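To illustrate the point, here is a small sketch showing that the backslash-escaped and verbatim forms produce the same string, and that Contains() fails only when the text itself differs. The responseContent value is a made-up stand-in for the real page content.

```csharp
using System;

// Hypothetical response body containing the label from the question.
string responseContent = "<div><label ng-class=\"choiceCaptionClass\" class=\"ng-binding choice-caption\">Was this information helpful?</label></div>";

// The same literal written with backslash escapes and as a verbatim string.
string escaped = "<label ng-class=\"choiceCaptionClass\" class=\"ng-binding choice-caption\">Was this information helpful?</label>";
string verbatim = @"<label ng-class=""choiceCaptionClass"" class=""ng-binding choice-caption"">Was this information helpful?</label>";

// The variant with stray spaces around the dashes, as in the question.
string withSpaces = "<label ng-class=\"choiceCaptionClass\" class=\"ng - binding choice - caption\">Was this information helpful?</label>";

Console.WriteLine(escaped == verbatim);                  // both literals denote the same characters
Console.WriteLine(responseContent.Contains(escaped));    // exact text is present
Console.WriteLine(responseContent.Contains(withSpaces)); // extra spaces break the match
```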
Try using regular expressions. I made this one for you, but you can test your own regex here.
var regex = new Regex(@"<label\s+ng-class\s*=\s*""choiceCaptionClass""\s+class\s*=\s*""ng-binding choice-caption""\s*>\s*Was this information helpful\?\s*</label>", RegexOptions.IgnoreCase);
Assert.IsTrue(regex.IsMatch(responseContent));
If this does not work, use the tester tool to figure out which part of the pattern fails to match.
Hope this helps!

Parsing out Excel functions from Formula string

I have a string which contains an Excel formula. How to parse out each particular function name from within the string?
I can't figure out how to write the regex for this. Basically it has to be the string of characters before a ( that isn't in a single or double quote.
For example:
=VLOOKUP($A9,'Summary'!$A$10:$C$30,3,FALSE) - Should return VLOOKUP
=IFERROR((C10/B10),"N/A") - should return IFERROR
='New Chart Data (Date)'!L70 - Should return nothing because there is no function
=IFERROR((C10/B10),Len(E30)) - should return IFERROR and LEN
='New Chart Data(Date)'!L70 + Len(5) - should return Len. This is the tricky one. A lot will return Data as well which is wrong.
Any ideas?
Thanks in advance.
You can use something like this I guess...
(?<=[=,])[A-Za-z2]+(?=\()
regex101 demo (with descriptions of regex)
Actually, there's one catch: a formula such as =IFERROR((C10/B10), Len(E30)) won't get Len. You can use this one instead and trim any spaces if any:
(?<=[=,])\s*[A-Za-z2]+(?=\()
Or since C# accepts variable length lookbehinds...
(?<=[=,]\s*)[A-Za-z2]+(?=\()
Which I think takes a bit more resources than the previous.
EDIT: I didn't think of the fact that sheetnames can take the form =Sheet(2) e.g. ='=Sheet(2)'!A1
(?<=[=,])\s*[A-Za-z2]+(?=\()(?![^']*'!)
revised regex101
EDIT2: Forgot operators as well... I guess I'll use a word boundary like Andy's, since the only issue is
\b[A-Za-z2]+(?=\()(?![^']*'!)
updated regex101
I think it could be simplified by using a word boundary \b rather than a look-behind:
\b([A-Za-z2]+)(?=\()
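As a quick check of the word-boundary pattern with the sheet-name guard from EDIT2, the sketch below runs it against the five formulas in the question. GetFunctions is a hypothetical helper name introduced only for this illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Pattern from EDIT2: a name followed by "(", not inside a quoted sheet name.
var functionName = new Regex(@"\b[A-Za-z2]+(?=\()(?![^']*'!)");

// Hypothetical helper: returns every function name found in a formula.
string[] GetFunctions(string formula) =>
    functionName.Matches(formula).Cast<Match>().Select(m => m.Value).ToArray();

string[] formulas =
{
    "=VLOOKUP($A9,'Summary'!$A$10:$C$30,3,FALSE)",
    "=IFERROR((C10/B10),\"N/A\")",
    "='New Chart Data (Date)'!L70",
    "=IFERROR((C10/B10),Len(E30))",
    "='New Chart Data(Date)'!L70 + Len(5)"
};

foreach (string formula in formulas)
{
    Console.WriteLine($"{formula} -> [{string.Join(", ", GetFunctions(formula))}]");
}
```

Note that the character class only allows letters and the digit 2, so a name containing other digits would not be captured whole by this pattern.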

Correction in this simple regular expression

I am new to regular expressions. The one I have written is probably very simple, but I do not know where I went wrong.
@"^([a-zA-Z._]+)#([\d]+)"
This RE is for the following string:
somename#somenumber
Now i am trying to retrieve the somename and somenumber. This is what i did:
ac.name = m.Groups[0].Value;
ac.number = m.Groups[1].Value;
Here ac.name reads the complete string, and ac.number reads somenumber. Where am I wrong in ac.name?
I guess the regex is correct. The problem is that you read ac.name from group 0, which is the whole match, instead of from group 1. Try this:
ac.name = m.Groups[1].Value;
ac.number = m.Groups[2].Value;
This regex is correct. I think your mistake is somewhere else. You seem to be using C#, so you should look at how the language exposes regex results.
Looking at the code sample on MSDN, Groups[0] holds the entire match, so the capture groups start at index 1 (as Kent also suggested). So, use this:
String name = m.Groups[1].Value;
String number = m.Groups[2].Value;
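A minimal sketch of the group indexing, using the pattern from the question. The input string is an invented example in the question's name#number shape.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input shaped like "somename#somenumber".
Match match = Regex.Match("some.name#12345", @"^([a-zA-Z._]+)#([\d]+)");

string whole = match.Groups[0].Value;   // index 0 is always the entire match
string name = match.Groups[1].Value;    // first pair of parentheses
string number = match.Groups[2].Value;  // second pair of parentheses

Console.WriteLine($"{whole} | {name} | {number}");
```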
Use this regex: (\w+)#(\d+([.,]\d+)?)
Groups[1] will contain the name.
Groups[2] will contain the number.
I think you should move the + into the capture group:
@"^([a-zA-Z._]+)#([\d]+)"
If this is C#, try without the ^
([a-zA-Z\._]+)#([\d]+)
I just tried it out and it groups properly
Update: escaped the .
If you want only one match (and hence the ^ in original expression), use .Match instead of .Matches method. See MSDN documentation on Regular Expression Classes.

Regex for a string

It would be great if someone could provide me the Regular expression for the following string.
Sample 1: <div>abc</div><br>
Sample 2: <div>abc</div></div></div></div></div><br>
As you can see in the samples above, I need to match the string no matter how many times </div> occurs. If any other string occurs between </div> and <br>, for example <div>abc</div></div></div>DEF</div></div><br> or <div>abc</div></div></div></div></div>DEF<br>, then the regex should not match.
Thanks in advance.
Try this:
<div>([^<]+)(?:<\/div>)*<br>
As seen on rubular
Notes:
This only works if there are no tags in the abc part (or anything else that contains a < symbol).
You might want to use start and end of string anchors (^<div>([^<]+)(?:<\/div>)*<br>$) if you want your string to match the pattern exactly.
If you want to allow the abc part to be empty, use * instead of +.
That being said, you should be wary of using regex to parse HTML.
In this example, you can use regex because you are parsing a (hopefully) known, regular subset of HTML. But a more robust solution (ie: an [X]HTML parser like HtmlAgilityPack) is preferred when it comes to parsing HTML.
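As a sketch of the anchored variant from the notes above, the snippet below checks it against the two samples and the two counterexamples from the question. IsValid is a hypothetical helper name used only here.

```csharp
using System;
using System.Text.RegularExpressions;

// Anchored pattern from the notes: text, any number of </div>, then <br>.
var anchored = new Regex(@"^<div>([^<]+)(?:<\/div>)*<br>$");

// Hypothetical helper wrapping the match test.
bool IsValid(string s) => anchored.IsMatch(s);

Console.WriteLine(IsValid("<div>abc</div><br>"));
Console.WriteLine(IsValid("<div>abc</div></div></div></div></div><br>"));
Console.WriteLine(IsValid("<div>abc</div></div></div>DEF</div></div><br>"));
Console.WriteLine(IsValid("<div>abc</div></div></div></div></div>DEF<br>"));
```

The first two inputs are accepted and the last two are rejected, because nothing other than </div> may sit between the text and <br>.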
You need to use a real parser. Things like infinitely nested tags can't be handled via regex.
You could also include a named group in the expression, e.g.:
<div>(?<text>[^<]*)(?:<\/div>)*<br>
Implemented in C#:
var regex = new Regex(@"<div>(?<text>[^<]*)(?:<\/div>)*<br>");
Func<Match, string> getGroupText = m => (m.Success && m.Groups["text"] != null) ? m.Groups["text"].Value : null;
Func<string, string> getText = s => getGroupText(regex.Match(s));
Console.WriteLine(getText("<div>abc</div><br>"));
Console.WriteLine(getText("<div>123</div></div></div></div></div><br>"));
NullUserException's answer is good. Here are a couple of questions, and variations, depending on what you want.
Do you want to prevent anything from occurring before the open div tag? If so, keep the ^ at the beginning of the regex. If not, drop it.
The rest of this post refers to the following section of the regex:
([^<]+?)
Do you want to capture the contents of the div, or just know that it matches your form? To capture, leave it as is. If you don't need to capture, drop the parentheses from the above.
Do you want to match if there is nothing inside the div? If so change the + in the above to *
Finally, although it will work fine, you don't need the ? in the above.
I think this regex is more flexible:
<div\b[^><]*+>(?>.*?</div>)(?:\s*+</div>)*+\s*+<br(?:\s*+/)?>
I did not include ^ and $ at the beginning and end of my regex because we cannot be sure that your sample will always be on a single line.

Regex Question

In my C# console app I'm trying to use Regex to search a string and determine whether there is a match. My code is below, but it is not working quite right. sSearchString is set to "_One-Call_Pipeline_Locations" and pDS.Name is the filename it is searched against. With the code below, the result is true for both Nevada_One-Call_Pipeline_Locations and Nevada_One-Call_Pipeline_LocationsMAXIMUM. There should be a match for Nevada_One-Call_Pipeline_Locations but not for Nevada_One-Call_Pipeline_LocationsMAXIMUM. How can I change my code to do this properly?
Thanks in advance
if (Regex.IsMatch(pDS.Name, sSearchString))
Change the sSearchString to ".*_One-Call_Pipeline_Locations$"
You need to specify that a matching name must end with the text you have entered using the dollar token.
sSearchString = "_One-Call_Pipeline_Locations$";
Since you provided no details about what else should match, we can only assume that a name matches if it ends with "Nevada_One-Call_Pipeline_Locations". Is this correct?
If so, you don't need Regex:
if (pDS.Name.EndsWith("Nevada_One-Call_Pipeline_Locations"))
{ //...
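The two suggestions above can be compared side by side in a short sketch. Wrapping the search text in Regex.Escape is an addition made here so that any special characters in it are treated literally; MatchesSuffix is a hypothetical helper name.

```csharp
using System;
using System.Text.RegularExpressions;

string sSearchString = "_One-Call_Pipeline_Locations";

// Anchor the (escaped) search text to the end of the name with "$".
string pattern = Regex.Escape(sSearchString) + "$";

// Hypothetical helper: true only when the name ends with the search text.
bool MatchesSuffix(string name) => Regex.IsMatch(name, pattern);

string good = "Nevada_One-Call_Pipeline_Locations";
string bad = "Nevada_One-Call_Pipeline_LocationsMAXIMUM";

Console.WriteLine(MatchesSuffix(good));
Console.WriteLine(MatchesSuffix(bad));

// The same check without Regex.
Console.WriteLine(good.EndsWith(sSearchString, StringComparison.Ordinal));
Console.WriteLine(bad.EndsWith(sSearchString, StringComparison.Ordinal));
```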
