C# Regex needs additional characters - c#

My current string regex is working, but I want to add <?php and <?PHP. Here is the code:
@"\b(public|private)\b"
string Keywords = @"\b(public|private)\b";
This is how you can do a div
@"<\s*div[^>]*>(.*?)<\s*/div\s*>";
How can I add <?php and <?PHP in the same regex?

To specify a set of acceptable characters in your pattern, you can either build a character class yourself or use a predefined one. A character class lets you represent a bunch of characters as a single item in a regular expression. You can build your own character class by enclosing the acceptable characters in square brackets.
Read Using Regular Expressions with PHP for more details and learn how to use php in regex.
Hope it helps.

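<?php does not start with a word character, so you cannot wrap it in \b; add it as a separate alternative instead. RegexOptions.IgnoreCase covers both <?php and <?PHP:
var regex = new Regex(@"\b(public|private)\b|<\?php", RegexOptions.IgnoreCase);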
var match = regex.Match("public");
Console.WriteLine(match.Value);
match = regex.Match("<?php");
Console.WriteLine(match.Value);
Console.ReadLine();

Related

Regex working in Regexr but not C#, why?

From the below mentioned input string, I want to extract the values specified in {} for s:ds field. I have attached my regex pattern. Now the pattern I used for testing on http://www.regexr.com/ is:
s:ds=\\\"({[\d\w]{8}\-([\d\w]{4}\-){3}[\d\w]{12}})\\\"
and it works absolutely fine.
But the same pattern does not work in C# code. I have also added \\ instead of \ for the C# code and replaced \" with \"". Let me know if I'm doing something wrong. Below is the code snippet.
string inputString is "s:ds=\"{46C01EB7-6D43-4E2A-9267-608DE8AFA311}\" s:ds=\"{37BA4BA0-581C-40DC-A542-FFD9E99BC345}\" s:id=\"{C091E71D-4817-49BC-B120-56CE88BC52C2}\"";
string regex = @"s:ds=\\\""({[\d\w]{8}\-(?:[\d\w]{4}\-){3}[\d\w]{12}})\\\""";
MatchCollection matchCollection = Regex.Matches(layoutField, regex);
if (matchCollection.Count > 1)
{
Log.Info("Collection Found.", this);
}
If you only want to match the values...
You should be able to just use ([\d\w]{8}\-([\d\w]{4}\-){3}[\d\w]{12}) for your expression if you only want to match the values within your curly braces:
string input = "s:ds=\"{46C01EB7-6D43-4E2A-9267-608DE8AFA311} ...";
// Use the following expression to just match your GUID values
string regex = @"([\d\w]{8}\-([\d\w]{4}\-){3}[\d\w]{12})";
// Store your matches
MatchCollection matchCollection = Regex.Matches(input, regex);
// Iterate through each one
foreach(var match in matchCollection)
{
// Output the match
Console.WriteLine("Collection Found : {0}", match);
}
You can see a working example of this in action here and example output demonstrated below :
If you want to only match those following s:ds...
If you only want to capture the values for s:ds sections, you could consider prepending (?<=(s:ds=""{)) to your expression. This is a look-behind, so the expression only matches values that are preceded by s:ds="{ :
string regex = @"(?<=(s:ds=""{))([\d\w]{8}\-([\d\w]{4}\-){3}[\d\w]{12})";
You can see an example of this approach here and demonstrated below (notice it doesn't match the s:id element):
Another Consideration
Currently you are using \w to match "word" characters within your expression, and while this might work for your uses, it will match all digits \d, letters a-zA-Z and underscores _. It's unlikely that you would need all of these, so you may want to consider revising your character sets to use just what you expect, like [A-Z\d] to only match uppercase letters and numbers, or [0-9A-Fa-f] if you are only expecting GUID values (i.e. hex).
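To make the difference between the two character sets concrete, here is a minimal sketch. The second value in the sample input is made up for illustration and is not a real GUID; the class name is hypothetical as well.

```csharp
using System;
using System.Text.RegularExpressions;

class GuidCharacterClassDemo
{
    static void Main()
    {
        // Hypothetical input: one real GUID and one GUID-shaped value
        // built from non-hex letters and underscores.
        string input = "s:ds=\"{46C01EB7-6D43-4E2A-9267-608DE8AFA311}\" "
                     + "s:ds=\"{ZZZZZZZZ-____-ZZZZ-____-ZZZZZZZZZZZZ}\"";

        // Broad pattern: \w also accepts G-Z and underscores.
        string broad = @"[\d\w]{8}-(?:[\d\w]{4}-){3}[\d\w]{12}";
        // Narrow pattern: hexadecimal digits only.
        string narrow = @"[0-9A-Fa-f]{8}-(?:[0-9A-Fa-f]{4}-){3}[0-9A-Fa-f]{12}";

        // The broad pattern counts both values.
        Console.WriteLine(Regex.Matches(input, broad).Count);
        // The narrow pattern counts only the real GUID.
        Console.WriteLine(Regex.Matches(input, narrow).Count);
    }
}
```

If the values are always well-formed GUIDs, Guid.TryParse on each match may be a simpler validation step than tightening the pattern further.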
Looks like you might be over-escaping.
Give this a shot:
@"s:ds=\""({[\d\w]{8}\-([\d\w]{4}\-){3}[\d\w]{12}})\"""

Regex for ConstantText dot any text

I need a C# regex for these 2 cases.
1)MyConstantText
2)MyConstantText.[a-zA-Z]
ex.
My const text is Hello, then regex must match
Hello
Hello.ashdkajshd
Do not forget to escape when creating regular expressions:
String text = "Hello";
// Escape text as well as dot (\.)
// Technically, you don't have to escape "Hello", but since
// text can be an arbitrary string, you'd better do it
String pattern = Regex.Escape(text) + @"(\.[a-zA-Z]+)?";
// Simple test
Console.Write(Regex.Match("Hello.ashdkajshd", pattern).Value);
Remark: Please note that the pattern provided in the question (MyConstantText.[a-zA-Z]) doesn't match the whole sample in the question ("Hello.ashdkajshd") but "Hello.a" only. So I've changed the corresponding subpattern to [a-zA-Z]+ (note the +).
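The pattern above finds the text anywhere inside a longer string. If the whole input has to be exactly one of the two forms, anchoring is one option. The sketch below assumes that requirement, which the question does not state explicitly; the ^ and $ anchors and the extra test strings are additions for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

class ConstantTextDemo
{
    static void Main()
    {
        string text = "Hello";

        // ^ and $ are an assumption: they require the entire input to match.
        string pattern = "^" + Regex.Escape(text) + @"(\.[a-zA-Z]+)?$";

        // "Hello." and "HelloX" are hypothetical negative cases.
        foreach (string candidate in new[] { "Hello", "Hello.ashdkajshd", "Hello.", "HelloX" })
        {
            // Prints True for the first two candidates and False for the last two.
            Console.WriteLine("{0}: {1}", candidate, Regex.IsMatch(candidate, pattern));
        }
    }
}
```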
Here is a tutorial for regex in C# ... If you get an error, you can post it.

Regex Match for HTML string with newline

I am trying to match:
<h4>Manufacturer</h4>\n\n Gigabyte\n\n\n
My Regex ATM is:
Match regex = Regex.Match(cleanedUpHtml, "Manufacturer(.*?)\n\n\n", RegexOptions.IgnoreCase);
However it does not work.
The (.*?) should match all in between.
Here are 2 things I find important:
Whenever you declare a regex pattern in C#, it is advisable to use verbatim string literals, i.e. @"PATTERN". This simplifies writing regex patterns because backslashes do not have to be doubled.
RegexOptions.Singleline must be used so that the dot also matches a line break, i.e. the multiline text is treated as a single line.
Here is my code snippet:
var str = "<h4>Manufacturer</h4>\n\n Gigabyte\n\n\n";
var regex = Regex.Match(str, @"Manufacturer(.*?)\n\n\n",
RegexOptions.IgnoreCase | RegexOptions.Singleline);
if (regex.Success)
MessageBox.Show("\"" + regex.Value + "\"");
The regex.Value is
"Manufacturer</h4>
Gigabyte
"
Best regards.
I replaced \n with another value and then Regex searched my replaced value. It is working for the time being, but it may not be the best approach. Any recommendations appreciated.
cleanedUpHtml = cleanedUpHtml.Replace("\n", "p19o9");
Match regex = Regex.Match(cleanedUpHtml, "Manufacturer(.*?)p19o9p19o9p19o9", RegexOptions.IgnoreCase);
Generally I prefer to clean the HTML tags and new-line characters out of the string before using the regex.
(.*?) stops capturing at a \n character, so you might use a more generic group instead, like ([\w\W]*?)
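The behaviour of the dot described in these answers can be checked with a short sketch. The input string is the one from the question; the third pattern, which skips the closing tag and leading whitespace to capture only the value, is an assumption about what the asker ultimately wants.

```csharp
using System;
using System.Text.RegularExpressions;

class DotAndNewlineDemo
{
    static void Main()
    {
        string html = "<h4>Manufacturer</h4>\n\n Gigabyte\n\n\n";

        // Default options: '.' does not match \n, so this prints False.
        Console.WriteLine(Regex.IsMatch(html, @"Manufacturer(.*?)\n\n\n"));

        // Singleline: '.' matches any character, including \n, so this prints True.
        Console.WriteLine(Regex.IsMatch(html, @"Manufacturer(.*?)\n\n\n", RegexOptions.Singleline));

        // [\s\S] matches any character without changing the options.
        // \s* skips the whitespace after the closing tag; prints Gigabyte.
        Match m = Regex.Match(html, @"Manufacturer</h4>\s*([\s\S]*?)\n\n\n");
        Console.WriteLine(m.Groups[1].Value);
    }
}
```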

Regex Pattern Using Brackets

Simple question here guys. I'm attempting to create a pattern to use with a Regex in C#.
Here is my attempt:
"(value\":\[\[\"([A-Za-z0-9]+(?:-{0,1})[A-Za-z0-9]+)\"\]\])"
However for some reason when I go to compile this I get "Unrecognized escape sequence" on the brackets. Can I not simply use \ to escape the brackets?
The strings I'm searching for have the form of
value":[["AB-AB"]]
or
value":[["ABAB"]]
and I'd like to grab group[1] from the results.
The C# compiler by default disallows escape sequences it does not recognize. You can override this behavior by using "@" like this:
@"(value\"":\[\[\""([A-Za-z0-9]+(?:-{0,1})[A-Za-z0-9]+)\""\]\])"
Edit:
The @ sign is a little more complicated than that. To quote @Guffa:
A @ delimited string simply doesn't use backslash for escape sequences.
Furthermore it should be noted that the replacement for \" in such a string is ""
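As a small illustration of the two literal styles, the sketch below writes the same pattern once as a regular string and once as a verbatim string and compares them. The pattern is a simplified version of the one in the question, and the class name is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

class VerbatimStringDemo
{
    static void Main()
    {
        // Regular literal: every backslash and quote is escaped for the compiler.
        string regular = "value\":\\[\\[\"([^\"]+)\"\\]\\]";

        // Verbatim literal: backslashes are literal, and "" stands for one quote.
        string verbatim = @"value"":\[\[""([^""]+)""\]\]";

        // Both literals produce the same pattern text, so this prints True.
        Console.WriteLine(regular == verbatim);

        // Prints AB-AB, the content of the first capture group.
        Match m = Regex.Match("value\":[[\"AB-AB\"]]", verbatim);
        Console.WriteLine(m.Groups[1].Value);
    }
}
```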
I would recommend placing your pattern inside a verbatim string literal while implementing a negated character class to match the context; then reference the first group to grab the match results.
String s = @"I have value"":[[""AB-AB""]] and value"":[[""ABAB""]]";
foreach (Match m in Regex.Matches(s, @"value"":\[\[""([^""]+)""]]"))
Console.WriteLine(m.Groups[1].Value);
Output
AB-AB
ABAB

Regex to exclude all chars except letters

I'm a real regex n00b so I ask your help:
I need a regex which matches only letters and numbers and excludes punctuation, non-ASCII characters and spaces.
"ilikestackoverflow2012" would be a valid string.
"f### you °§è" not valid.
"hello world" not valid
"hello-world" and "*hello_world*" not valid
and so on.
I need it to make a possibly complex business name url friendly.
Thanks in advance!
You don't need regex for this.
string s = "......";
var isValid = s.All(Char.IsLetterOrDigit);
-
I need it to make a possibly complex business name url friendly
You can also use HttpUtility.UrlEncode
var urlFriendlyString = HttpUtility.UrlEncode(yourString);
To validate a string you can use the following regular expression with Regex.IsMatch:
"^[0-9A-Za-z]+$"
Explanation:
^ is a start of string anchor.
[...] is a character class.
+ means one or more.
$ is an end of string anchor.
I need it to make a possibly complex business name url friendly
Then you want to replace the characters that don't match. Use Regex.Replace with the following regular expression:
"[^0-9A-Za-z]+"
Explanation:
[^...] is a negated character class.
Code:
string result = Regex.Replace(input, "[^0-9A-Za-z]+" , "");
See it working online: ideone
Note that different business names could give the same resulting string. For example, businesses whose names contain only Chinese characters will all give the empty string.
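A short sketch of both caveats follows. The business names and the Strip helper are hypothetical. The last two lines also show that Char.IsLetterOrDigit is Unicode-aware and accepts non-ASCII letters, so it is not equivalent to the ASCII-only character class when non-ASCII characters must be rejected.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

class UrlFriendlyDemo
{
    // Hypothetical helper: removes everything except ASCII letters and digits.
    static string Strip(string input)
    {
        return Regex.Replace(input, "[^0-9A-Za-z]+", "");
    }

    static void Main()
    {
        // Two different names collapse to the same result, SmithSons.
        Console.WriteLine(Strip("Smith & Sons"));
        Console.WriteLine(Strip("Smith-Sons"));

        // A name made only of Chinese characters becomes empty; prints 0.
        Console.WriteLine(Strip("\u5317\u4EAC").Length);

        // Char.IsLetterOrDigit accepts the non-ASCII letter U+00E8; prints True.
        Console.WriteLine("\u00E8".All(Char.IsLetterOrDigit));
        // The ASCII-only character class rejects it; prints False.
        Console.WriteLine(Regex.IsMatch("\u00E8", "^[0-9A-Za-z]+$"));
    }
}
```

If uniqueness matters, one option is to check the stripped result against existing names and append a distinguishing suffix when it is empty or already taken.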
You can use below regex.
^[a-zA-Z0-9]+$
^[0-9a-zA-Z]+$
Matches one or more alphanumeric characters with no spaces or non-alpha characters.
Try this:
var regex = new Regex(@"^[a-zA-Z0-9]+$");
var test = new[] {"ilikestack", "hello world", "hello-world", "###"};
foreach (var s in test)
Console.WriteLine("{0}: {1}", s, regex.IsMatch(s));
EDIT: If you want something like @Andre_Miller said, you should use the negated character class with Regex.Replace():
Regex.Replace(s, @"[^a-zA-Z0-9]+", "")
OR
var regex = new Regex(@"[^a-zA-Z0-9]+");
regex.Replace("input-string-##$##", "");
Try
^[a-zA-Z0-9]+$
www.regexr.com is a GREAT resource.
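What's wrong with [:alnum:]? It's a POSIX standard, so in a POSIX engine your whole regex would be ^[[:alnum:]]+$. Note, however, that the .NET regex engine does not support POSIX bracket expressions: in C#, [:alnum:] is read as an ordinary character class containing the characters :, a, l, n, u and m.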
The wikipedia article for regular expressions includes lots of examples and details.
