How to allow only one / in regex - c#

I have a regex that looks like this: "[a-æøåA-ÆØÅ0-9-/().\s]{1,100}$". I would like to allow ONE / in user input from a textbox, e.g. "3/4 inch fitting bla bla".
How can I do that in a safe way, and is it safe at all?
My query looks something like this:
XmlElement bemærkning = xmldoc.CreateElement("Bemærkning");
bemærkning.InnerText = txtBemærkningWT.Text;
//XmlElement usernamePCxml = xmldoc.CreateElement("UsernamepcXML");
//usernamePCxml.InnerText = usernamePC.ToString();
parentelement.AppendChild(type);
parentelement.AppendChild(art);
parentelement.AppendChild(l);
parentelement.AppendChild(bemærkning);
parentelement.AppendChild(varenummer);
parentelement.AppendChild(opretter);
parentelement.AppendChild(date);
//parentelement.AppendChild(usernamePCxml);
xmldoc.DocumentElement.AppendChild(parentelement);
xmldoc.Save(Server.MapPath(map));

Since you also check the total number of characters {1,100}, there is no simple solution (that I can think of) with a single regex. The easiest way is probably to do a separate check, either for the occurrence of / or for the overall length. If you check the total length separately, you could use a regex like this:
"^[a-æøåA-ÆØÅ0-9-().\s]*\/?[a-æøåA-ÆØÅ0-9-().\s]*$"
Notice that I added a ^ at the beginning to indicate the start of the input. I don't know if this is necessary in your case, but it probably is.
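A minimal sketch of the separate-length-check approach described above. One caveat worth checking: .NET character ranges go by code point, so a-æ spans U+0061 to U+00E6 and A-Æ spans U+0041 to U+00C6, which also admits characters such as [ \ ] ^ _ { | } ~ that were probably not intended. The sketch assumes the intent was a-z and A-Z plus æøå/ÆØÅ, and it escapes the hyphen for clarity. The names Allowed and IsValidRemark are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Allowed characters; ranges tightened to a-z / A-Z (an assumption about intent).
const string Allowed = @"[a-zæøåA-ZÆØÅ0-9\-().\s]";

// Allowed characters, then at most one '/', then allowed characters again.
var oneSlash = new Regex("^" + Allowed + "*/?" + Allowed + "*$");

// Hypothetical helper: the length is checked separately from the pattern.
bool IsValidRemark(string input) =>
    input.Length >= 1 && input.Length <= 100 && oneSlash.IsMatch(input);

Console.WriteLine(IsValidRemark("3/4 inch fitting bla bla")); // True
Console.WriteLine(IsValidRemark("1/2/3 inch"));               // False
```

Checking the length first also keeps the regex from ever running on very long input.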
Regarding the safety: you are using InnerText, which escapes any markup contained in the input; InnerXml does not. So you should be fine.
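To illustrate the difference, here is a small sketch using XmlDocument. The element name "Raw" and the sample input are made up for the demonstration.

```csharp
using System;
using System.Xml;

var doc = new XmlDocument();

// InnerText stores the value as a single text node; markup characters
// are escaped when the element is serialized.
XmlElement remark = doc.CreateElement("Bemærkning");
remark.InnerText = "3/4 <b>inch</b> & more";
Console.WriteLine(remark.OuterXml);

// InnerXml parses the value as XML, so the <b> tag becomes a real child element.
XmlElement raw = doc.CreateElement("Raw");
raw.InnerXml = "3/4 <b>inch</b>";
Console.WriteLine(raw.OuterXml);
```

With InnerText the user cannot inject elements into the document, whatever the regex lets through.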

Related

Stop regex from spanning across unrequired content

I need to extract a series of meaningful values from a file. The basic pattern for the values I need to match looks like:
"indicator\..+?"\[true\]
Unfortunately, in places this is spanning across quite a bit of content to get a true match, and the lazy quantifier (?) is not being as lazy as I'd like.
How do I modify the above so that out of the following:
"indicator.value here"[false],"other content","more other
content","indicator don't match this one because the full stop is missing"[true],"indicator.this is the
value I want matched"[true]
only this value is returned: "indicator.this is the value I want matched"[true]
Currently, that whole string is being returned by my above regex.
Assuming commas are the delimiter - simply avoid matching on them:
@"""indicator\.[^,]+?""\[true\]"
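A short sketch of this approach on the sample from the question (written here on one line, assuming the line breaks in the question are just wrapping):

```csharp
using System;
using System.Text.RegularExpressions;

string input =
    "\"indicator.value here\"[false],\"other content\",\"more other content\"," +
    "\"indicator don't match this one because the full stop is missing\"[true]," +
    "\"indicator.this is the value I want matched\"[true]";

// [^,] cannot cross a comma, so a match can no longer start in one field
// and end in a later one.
var regex = new Regex(@"""indicator\.[^,]+?""\[true\]");

foreach (Match m in regex.Matches(input))
    Console.WriteLine(m.Value);
```

This only holds as long as the values themselves never contain a comma.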
Try using "indicator\.(.*)?"\[true\] instead and see if that helps. I think the lazy only applies to the * operator. I vaguely remember having this issue years ago.
You can leverage the discard technique by discarding the pattern you don't want. So, you could have something like this:
"indicator\..+?"\[false\]|"indicator\.(.+?)"\[true\]
Discard this pattern --^ Capture this --^
Working demo
Match information
MATCH 1
1. [150-182] `this is the value I want matched`
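In C#, the discard technique could be sketched like this: every match is inspected, and only those where group 1 took part are kept. The input is the sample from the question on a single line, and the list name captured is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string input =
    "\"indicator.value here\"[false],\"other content\",\"more other content\"," +
    "\"indicator don't match this one because the full stop is missing\"[true]," +
    "\"indicator.this is the value I want matched\"[true]";

// Left alternative: consume the [false] entries without capturing anything.
// Right alternative: capture the value of a [true] entry in group 1.
var regex = new Regex(@"""indicator\..+?""\[false\]|""indicator\.(.+?)""\[true\]");

var captured = new List<string>();
foreach (Match m in regex.Matches(input))
{
    // Group 1 only succeeds when the right alternative matched.
    if (m.Groups[1].Success)
        captured.Add(m.Groups[1].Value);
}

Console.WriteLine(string.Join(Environment.NewLine, captured));
```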

how to replace the exact word by another in a list?

I have a list like :
george fg
michel fgu
yasser fguh
I would like to replace fg, fgu, and fguh with "fguhCool". I already tried something like this:
foreach (var ignore in NameToPoulate)
{
tempo = ignore.Replace("fg", "fguhCool");
NameToPoulate_t.Add(tempo);
}
But then "fgu" becomes "fguhCoolu" and "fguh" becomes "fguhCooluh". Is there a better idea?
Thanks for your help.
I assume that this is a homework assignment and that you are being tested on the specific algorithm rather than on any code that does the job.
This is probably what your teacher has in mind:
Students will realize that the code should check for "fguh" first, then "fgu" then "fg". The order is important because replacing "fg" will, as you have noticed, destroy a "fguh".
Some students will implement this as a loop with if-else conditions, so that a "fg" inside an already replaced "fguhCool" is not replaced again.
But then you will find that the algorithm breaks down if "fg" and "fgu" are both within the same string. You cannot let the presence of "fgu" prevent you from checking for "fg" in a different part of the string.
The answer that your teacher is looking for is probably that you should first locate "fguh", "fgu" and "fg" (in that order) and replace them with an intermediary string that doesn't contain "fg". Then after you have done that, you can search for that intermediary string and replace it with "fguhCool".
You could use regular expressions:
Regex.Replace(ignore, @"\bfg\b", "fguhCool");
The \b matches a so-called word boundary, which means it matches the beginning or end of a word (roughly, but enough for this purpose).
Use a regular expression:
Regex.Replace(ignore, "fg(uh?)?", "fguhCool");
An alternative would be replacing the long words for the short ones first, then replacing the short for the end value (I'm assuming all words - "fg", "fgu" and "fguh" - would map to the same value "fguhCool", right?)
tempo = ignore
.Replace("fguh", "fg")
.Replace("fgu", "fg")
.Replace("fg", "fguhCool");
Obs.: That assumes those words can appear anywhere in the string. If you're worried about whole words (i.e. cases where those words are not substrings of a bigger word), then see @Joey's answer (in this case, simple substitutions won't do, regexes are really the best option).
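The two regex suggestions above can be combined: word boundaries to match whole words only, plus the optional "u"/"uh" suffix so that all three variants are covered. A sketch using the sample list from the question (the variable names are made up):

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

var names = new List<string> { "george fg", "michel fgu", "yasser fguh" };
var replaced = new List<string>();

// \b ... \b restricts the match to whole words; (uh?)? makes "u" and "uh"
// optional, so fg, fgu and fguh are each matched as one complete word.
var pattern = new Regex(@"\bfg(uh?)?\b");

foreach (var name in names)
    replaced.Add(pattern.Replace(name, "fguhCool"));

Console.WriteLine(string.Join(", ", replaced));
// george fguhCool, michel fguhCool, yasser fguhCool
```

Because of the trailing \b, a value that already reads "fguhCool" is left alone if the replacement runs a second time.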

RegEx: Correct usage of lookbehind assertion and group definitions

I have the following string:
i:0#.w|domain\x123456
I know about the possibility to group search terms by using <mysearchterm> and calling it via RegEx.Match(myRegEx).Result("${mysearchterm}");.
I also know that I can use lookbehind assertions like (?<= subexpression), as described on MSDN. Could someone help me get the following parts (including the possibility to retrieve them via groups as shown above):
domain ("domain")
user account ("x123456")
I don't need anything from before the pipe character (nor the pipe character itself) - so basically I am interested in domain\x123456.
As others have noted, this can be done without regex, or without lookbehinds. That being said, I can think of reasons you might want them: to write a RegexValidator instead of having to roll up a CustomValidator, for example. In ASP.NET, CustomValidators can be a little longer to write, and sometimes a RegexValidator does the job just fine.
As far as lookbehinds, the main reason you'd want one for something like this is if the target string could contain irrelevant copies of the |domain\x123456 pattern:
foo#bar|domain\x999999 says: 'i:0#.w|domain\x888888i:0#.w|domain\x123456|domain\x000000'
If you only wanted to grab domain\x888888 and domain\x123456 out of that, a lookbehind could be useful. Or maybe you just want to learn about lookbehinds. Anyway, since we only have one sample input, I can only guess at the rules; so perhaps something like this:
@"(?<=[a-z]:\d#\.[a-z]\|)(?<domain>[^\\]*)\\(?<user>x\d+)"
Lookarounds are one of the most subtle and misunderstood features of regex, IMHO. I've gotten a lot of use out of them in preventing false positives, or in limiting the length of matches when I'm not trying to match the entire string (for example, if I want only the 3-digit numbers in blah 1234 123 1234567 123 foo, I can use (?<!\d)\d{3}(?!\d)). Here's a good reference if you want to learn more about named groups and lookarounds.
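A quick sketch of both patterns from this answer, run against the noisy sample string and the 3-digit example. The variable names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

string noisy =
    @"foo#bar|domain\x999999 says: 'i:0#.w|domain\x888888i:0#.w|domain\x123456|domain\x000000'";

// The lookbehind requires a prefix like "i:0#.w|" directly before the match,
// without making that prefix part of the match itself.
var claims = new Regex(@"(?<=[a-z]:\d#\.[a-z]\|)(?<domain>[^\\]*)\\(?<user>x\d+)");

foreach (Match m in claims.Matches(noisy))
    Console.WriteLine(m.Groups["domain"].Value + " / " + m.Groups["user"].Value);

// Negative lookarounds: exactly three digits, not part of a longer number.
var threeDigits = new Regex(@"(?<!\d)\d{3}(?!\d)");
Console.WriteLine(threeDigits.Matches("blah 1234 123 1234567 123 foo").Count); // 2
```

On the noisy string only x888888 and x123456 are reported, since x999999 and x000000 lack the required prefix.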
You can just use the regex @"\|([^\\]+)\\(.+)".
The domain and user will be in groups 1 and 2, respectively.
You don't need regular expressions for that.
var myString = @"i:0#.w|domain\x123456";
var relevantParts = myString.Split('|')[1].Split('\\');
var domain = relevantParts[0];
var user = relevantParts[1];
Explanation: String.Split(separator) returns an array of substrings separated by separator.
If you insist on using regular expressions, this is how you do it with named groups and Match.Result, based on SLaks's answer (+1, by the way):
var myString = @"i:0#.w|domain\x123456";
var r = new Regex(@"\|(?<domain>[^\\]+)\\(?<user>.+)");
var match = r.Matches(myString)[0]; // get first match
var domain = match.Result("${domain}");
var user = match.Result("${user}");
Personally, however, I would prefer the following syntax, if you are just extracting the values:
var domain = match.Groups["domain"].Value;
var user = match.Groups["user"].Value;
And you really don't need lookbehind assertions here.

Regex for a string

It would be great if someone could provide me the Regular expression for the following string.
Sample 1: <div>abc</div><br>
Sample 2: <div>abc</div></div></div></div></div><br>
As you can see in the samples provided above, I need to match the string no matter how many number of </div> occurs. If there occurs any other string between </div> and <br>, say like this <div>abc</div></div></div>DEF</div></div><br> OR <div>abc</div></div></div></div></div>DEF<br>, then the Regex should not match.
Thanks in advance.
Try this:
<div>([^<]+)(?:<\/div>)*<br>
As seen on rubular
Notes:
This only works if there are no tags in the abc part (or anything that has a < symbol).
You might want to use start and end of string anchors (^<div>([^<]+)(?:<\/div>)*<br>$) if you want your string to match the pattern exactly.
If you want to allow the abc part to be empty, use * instead of +
That being said, you should be wary of using regex to parse HTML.
In this example, you can use regex because you are parsing a (hopefully) known, regular subset of HTML. But a more robust solution (ie: an [X]HTML parser like HtmlAgilityPack) is preferred when it comes to parsing HTML.
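As a quick check, the anchored variant can be run against the samples from the question. This is only a sketch; the array name samples is made up.

```csharp
using System;
using System.Text.RegularExpressions;

// Anchored variant: <div>, text without '<', any number of </div>, then <br>.
var regex = new Regex(@"^<div>([^<]+)(?:</div>)*<br>$");

string[] samples =
{
    "<div>abc</div><br>",
    "<div>abc</div></div></div></div></div><br>",
    "<div>abc</div></div></div>DEF</div></div><br>",
    "<div>abc</div></div></div></div></div>DEF<br>",
};

foreach (var s in samples)
    Console.WriteLine(regex.IsMatch(s) + "  " + s);
```

The first two samples match; the two containing DEF do not, because only </div> may appear between the text and <br>.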
You need to use a real parser. Things like infinitely nested tags can't be handled via regex.
You could also include a named group in the expression, e.g.:
<div>(?<text>[^<]*)(?:<\/div>)*<br>
Implemented in C#:
var regex = new Regex(@"<div>(?<text>[^<]*)(?:<\/div>)*<br>");
Func<Match, string> getGroupText = m => (m.Success && m.Groups["text"] != null) ? m.Groups["text"].Value : null;
Func<string, string> getText = s => getGroupText(regex.Match(s));
Console.WriteLine(getText("<div>abc</div><br>"));
Console.WriteLine(getText("<div>123</div></div></div></div></div><br>"));
NullUserException's answer is good. Here are a couple of questions, and variations, depending on what you want.
Do you want to prevent anything from occurring before the open div tag? If so, keep the ^ at the beginning of the regex. If not, drop it.
The rest of this post refers to the following section of the regex:
([^<]+?)
Do you want to capture the contents of the div, or just know that it matches your form? To capture, leave it as is. If you don't need to capture, drop the parentheses from the above.
Do you want to match if there is nothing inside the div? If so change the + in the above to *
Finally, although it will work fine, you don't need the ? in the above.
I think, this regex is more flexible:
<div\b[^><]*+>(?>.*?</div>)(?:\s*+</div>)*+\s*+<br(?:\s*+/)?>
I don't include the ^ and $ at the beginning and the end of my regex because we cannot be sure that your sample will always be on a single line.

Quick & Dirty way to update "IDs" in a string formatted as XML (C#)

For a one-shot operation, I need to parse the contents of an XML string and change the numbers of the "ID" field. However, I cannot risk changing anything else in the string, e.g. whitespace, line feeds, etc. MUST remain as they are!
Since, in my experience, XmlReader tends to mess up whitespace and may even reformat your XML, I don't want to use it (but feel free to convince me otherwise). This also screams for RegEx, but I'm not good at RegEx, particularly not with the .NET implementation.
Here's a short part of the string; the number of the ID field needs to be updated in some cases. There can be many such VAR entries in the string. So I need to convert each ID to Int32, compare and modify it, then put it back into the string.
<VAR NAME="sf_name" ID="1001210">
I am looking for the simplest (in terms of coding time) and safest way to do this.
The regex pattern you are looking for is:
ID="(\d+)"
Match group 1 would contain the number. Use a MatchEvaluator Delegate to replace matches with dynamically calculated replacements.
Regex r = new Regex("ID=\"(\\d+)\"");
string outputXml = r.Replace(inputXml, new MatchEvaluator(ReplaceFunction));
where ReplaceFunction is something like this:
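public string ReplaceFunction(Match m)
{
    // m.Groups[1].Value holds the digits of the ID
    int id = int.Parse(m.Groups[1].Value);
    // compare and modify id here as needed
    // the whole match is replaced, so return the complete attribute
    return "ID=\"" + id + "\"";
}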
If you need I can expand the Regex to match more specifically. Currently all ID values (that contain numbers only) are replaced. You can also build that bit of "extra intelligence" into the match evaluator function and make it return the match unchanged if you don't want to change it.
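Put together, a runnable sketch might look like the following. The rule applied here (add 5 to every ID above 1000000) is purely hypothetical and stands in for whatever comparison is actually needed; ReplaceId and the second VAR entry are made up for the demonstration.

```csharp
using System;
using System.Text.RegularExpressions;

string inputXml =
    "<VAR NAME=\"sf_name\" ID=\"1001210\">\r\n  <VAR NAME=\"other\"   ID=\"42\">";

// Hypothetical rule for illustration: shift IDs above 1000000 by 5.
string ReplaceId(Match m)
{
    int id = int.Parse(m.Groups[1].Value);
    if (id <= 1000000)
        return m.Value; // leave this match exactly as it was
    // The whole match is replaced, so rebuild the complete attribute.
    return "ID=\"" + (id + 5) + "\"";
}

var regex = new Regex("ID=\"(\\d+)\"");
string outputXml = regex.Replace(inputXml, new MatchEvaluator(ReplaceId));

Console.WriteLine(outputXml);
```

Everything outside the matched ID="..." attributes, including the line break and the extra spaces, is copied through untouched.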
Take a look at the PreserveWhitespace property of the XmlDocument class.
