Pattern matching for swedish character - c#

I need a help regarding regular expression.
I have to match string like this:
âãa34dc
Pattern that i have used:
\s*[a-zA-Z]+[a-zA-Z_0-9]*\s
but this pattern is not good enough to identify this kind of string e.g. âãa34dc
P.S. âã these are swedish character.
Please help me for find out correct pattern for this kind of string.

Do you actually want to restrict it to Swedish characters? In other words, should a German character not match? If so, then you'll probably have to enumerate the whole alphabet, and include that.
If what you really want is to match every alphabetic character, use the regular expression terms for matching all letters.
\w matches any word character, but that includes numbers & some punctuation. That's close, but not exactly what you want for your second term.
For the first term, where you don't want to include numbers, specifying that the character should be a Unicode 'letter' class will work. \p{L} specifies all Unicode characters that are a letter. This includes [a-zA-Z], and all the Swedish characters, and German, and Russian, etc.
Therefore, I think this regular expression is what you want:
\s*[\p{L}][\p{L}_0-9]*\s
If you want to include digits from other character sets, and some other punctuation, then you can use [\w]* for the second term.

please give a set of rules.
according to your question :
[X-Ya-zA-Z]{3}[0-9]{2}[a-zA-Z]{2}
Replace X with the first swedish letter
Replace Y with the last swedish letter

John Machin provides a great answer for this. Adapting his pattern, what you need is probably something similar to: \s*[^\W\d_]\w*\s*
P.S. I removed the + quantifier from your first part. Any subsequent letters would be matched by the subsequent quantified \w.

Related

Simple phone number regex to match numbers, spaces, etc

I'm trying to modify a fairly basic regex pattern in C# that tests for phone numbers.
The patterns is -
[0-9]+(\.[0-9][0-9]?)?
I have two questions -
1) The existing expression does work (although it is fairly restrictive) but I can't quite understand how it works. Regexps for similar issues seem to look more like this one -
/^[0-9()]+$/
2) How could I extend this pattern to allow brackets, periods and a single space to separate numbers. I tried a few variations to include -
[0-9().+\s?](\.[0-9][0-9]?)?
Although i can't seem to create a valid pattern.
Any help would be much appreciated.
Thanks,
[0-9]+(\.[0-9][0-9]?)?
First of all, I recommend checking out either regexr.com or regex101.com, so you yourself get an understanding of how regex works. Both websites will give you a step-by-step explanation of what each symbol in the regex does.
Now, one of the main things you have to understand is that regex has special characters. This includes, among others, the following: []().-+*?\^$. So, if you want your regex to match a literal ., for example, you would have to escape it, since it's a special character. To do so, either use \. or [.]. Backslashes serve to escape other characters, while [] means "match any one of the characters in this set". Some special characters don't have a special meaning inside these brackets and don't require escaping.
Therefore, the regex above will match any combination of digits of length 1 or more, followed by an optional suffix (foobar)?, which has to be a dot, followed by one or two digits. In fact, this regex seems more like it's supposed to match decimal numbers with up to two digits behind the dot - not phone numbers.
/^[0-9()]+$/
What this does is pretty simple - match any combination of digits or round brackets that has the length 1 or greater.
[0-9().+\s?](\.[0-9][0-9]?)?
What you're matching here is:
one of: a digit, round bracket, dot, plus sign, whitespace or question mark; but exactly once only!
optionally followed by a dot and one or two digits
A suitable regex for your purpose could be:
(\+\d{2})?((\(0\)\d{2,3})|\d{2,3})?\d+
Enter this in one of the websites mentioned above to understand how it works. I modified it a little to also allow, for example +49 123 4567890.
Also, for simplicity, I didn't include spaces - so when using this regex, you have to remove all the spaces in your input first. In C#, that should be possible with yourString.Replace(" ", ""); (simply replacing all spaces with nothing = deleting spaces)
The + after the character set is a quantifier (meaning the preceeding character, character set or group is repeated) at least one, and unlimited number of times and it's greedy (matched the most possible).
Then [0-9().+\s]+ will match any character in set one or more times.

Difficulty finding where to insert "word exclusion" in a regex

I know the regex for excluding words, roughly anyway, It would be (!?wordToIgnore|wordToIgnore2|wordToIgnore3)
But I have an existing, complicated regex that I need to add this to, and I am a bit confused about how to go about that. I'm still pretty new to regex, and it took me a very long time to make this particular one, but I'm not sure where to insert it or how ...
The regex I have is ...
^(?!.*[ ]{2})(?!.*[']{2})(?!.*[-]{2})(?:[a-zA-Z0-9 \:/\p{L}'-]{1,64}$)$
This should only allow the person typing to insert between 1 and 64 letters that match that pattern, cannot start with a space, quote, double quote, special character, a dash, an escape character, etc, and only allows a-z both upper and lowercase, can include a space, ":", a dash, and a quote anywhere but the beginning.
But I want to forbid them from using certain words, so I have this list of words that I want to be forbidden, I just cannot figure out how to get that to fit into here.. I tried just pasting the whole .. "block" in, and that didn't work.
?!the|and|or|a|given|some|that|this|then|than
Has anyone encountered this before?
ciel, first off, congratulations for getting this far trying to build your regex rule. If you want to read something detailed about all kinds of exclusions, I suggest you have a look at Match (or replace) a pattern except in situations s1, s2, s3 etc
Next, in your particular situation, here is how we could approach your regex.
For consision, let's make all the negative lookarounds more compact, replacing them with a single (?!.*(?: |-|'){2})
In your character class, the \: just escapes the colon, needlessly so as : is enough. I assume you wanted to add a backslash character, and if so we need to use \\
\p{L} includes [a-zA-Z], so you can drop [a-zA-Z]. But are you sure you want to match all letters in any script? (Thai etc). If so, remember to set the u flag after the regex string.
For your "bad word exclusion" applying to the whole string, place it at the same position as the other lookarounds, i.e., at the head of the string, but using the .* as in your other exclusions: (?!.*(?:wordToIgnore|wordToIgnore2|wordToIgnore3)) It does not matter which lookahead comes first because lookarounds do not change your position in the string. For more on this, see Mastering Lookahead and Lookbehind
This gives us this glorious regex (I added the case-insensitive flag):
^(?i)(?!.*(?:wordToIgnore|wordToIgnore2|wordToIgnore3))(?!.*(?: |-|'){2})(?:[\\0-9 :/\p{L}'-]{1,64}$)$
Of course if you don't want unicode letters, replace \p{L} with a-z
Also, if you want to make sure that the wordToIgnore is a real word, as opposed to an embedded string (for instance you don't want cat but you are okay with catalog), add boundaries to the lookahead rule: (?!.*\b(?:wordToIgnore|wordToIgnore2|wordToIgnore3)\b)
use this:
^(?!.*(the|and|or|a|given|some|that|this|then|than))(?!.*[ ]{2})(?!.*[']{2})(?!.*[-]{2})(?:[a-zA-Z0-9 \:\p{L}'-]{1,64}$)$
see demo

Regular Expression to not allow 3 consecutive characters

I have the following regex:
Regex pattern = new Regex(#"^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])[0-9a-zA-Z]{8,20}/(.)$");
(?=.*\d) //should contain at least one digit
(?=.*[a-z]) //should contain at least one lower case
(?=.*[A-Z]) //should contain at least one upper case
[a-zA-Z0-9]{8,20} //should contain at least 8 characters and maximum of 20
My problem is I also need to check if 3 consecutive characters are identical. Upon searching, I saw this solution:
/(.)\1\1/
However, I can't make it to work if I combined it to my existing regex, still no luck:
Regex(#"^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])[0-9a-zA-Z]{8,20}$/(.)\1\1/");
What did I missed here? Thanks!
The problem is that /(.)\1\1/ includes the surrounding / characters which are used to quote literal regular expressions in some languages (like Perl). But even if you don't use the quoting characters, you can't just add it to a regular expression.
At the beginning of your regex, you have to say "What follows cannot contain a character followed by itself and then itself again", like this: (?!.*(.)\1\1). The (?! starts a zero-width negative lookahead assertion. The "zero-width" part means that it does not consume any characters in the input string, and the "negative lookahead assertions" means that it looks forward in the input string to make sure that the given pattern does not appear anywhere.
All told, you want a regex like this:
new Regex(#"^(?!.*(.)\1\1)(?=.*\d)(?=.*[a-z])(?=.*[A-Z])[0-9a-zA-Z]{8,20}$")
I solved by using trial and error:
Regex pattern = new Regex(#"^(?!.*(.)\1\1)(?=.*\d)(?=.*[a-z])(?=.*[A-Z])[0-9a-zA-Z]{8,20}$");

Regular expression - match character, digits and special characters

I have the following string:
test123 test ödo 123teö"st 123 m.1212 123t.est
I only want to match strings as a whole that have either characters, digits and special character mixed together. So the regex should match the following string of the example above:
test123 test ödo 123teö"st 123 m.1212 123t.est
Could someone help me out please?
Update
Sorry for not giving a clear explanation of what I need.
I am using C#.
I need to find words that contain alphanumeric strings (eg abc123, 123abc, a1b2c3, 1abc23 etc). Also I need to find strings that contain any kind of symbols (symbols = anythings else than word characters and digits) (eg abc"123, "abc, ab?dd, 100mm", 345t{asd]dd)
If I find a match, I need to "tokenize" (separate digits, word characters and symbols with whitespace) these strings so abc123 becomes abc 123 or 345t{asd]dd becomes 345 t { asd ] dd etc
Assuming you're using a regex flavor that supports lookaheads and Unicode properties, this should get you started:
(?!(?:\pL+|\pN+|\pP+)(?!\S))\S+
\S+ matches one or more non-whitespace characters, but only after the negative lookahead asserts that those characters are not all letters (\pL), digits (\pN), or punctuation (\pP). The inner negative lookahead--(?!\S)--ensures that the outer one examines all the characters in the word.
Although it might satisfy your requirements, this regex is really just a demonstration of the technique you'll probably want to use. As it is, it can be fooled by "words" with (for example) control characters or dingbats in them.
The answer to the question you’ve actually asked is (?s).+, but perhaps you would care to refine your question.

Regex - Match a string only when it contains any alphabetic characters

example strings
785*()&!~`a
##$%$~2343
455frt&*&*
i want to capture the first and the third but not the second since it doesnt contain any alphabet character plz help
In fact, I think [a-zA-Z] might suffice to match your strings.
To capture the whole thing, try: ^.*[a-zA-Z].*$
Here is one possible way:
.*[a-zA-Z]+
You should maybe clarify a bit what you mean by 'catpuring': do you want the whole string of just the ascii bits?
Also, you don't say if it should match just plain Roman alphabet (A to Z) or if it should also match Unicode chars to match strings in other languages.
If you just need to test your string, in C# you would do:
bool matching = Regex.IsMatch(myString, "[a-zA-Z]");
You wouldn't need anything else, since just one letter anywhere in the myString string will match (according to your definition).
This is my favorite RegEx testing site: Javascript Regexp Tester and Cheat Sheet
If you want to match all letters (including non-ascii ones), use p{L} instead of [a-zA-Z]. See Unicode categories.

Categories

Resources