Regular expression to check if a string contains all specific symbols

The problem is that the possible strings look like:
abcdefghijklmnopqrstuvwxyz
(sorted)
and I have another string holding the chars to find, like: adef
What is the regex to check whether all of the specified characters are in the string?
Test cases:
string: amnosxy
find chars: osy
result: true
string: amnosxy
find chars: anz
result: false (z not found).
It looks like a containsAll method.
What is the regex to check this? (It should be possible to build it dynamically from the find-chars string.)
I don't like solutions that loop over each char and check IndexOf.

No need to use regex:
bool containsAll = !"osy".Except("amnosxy").Any();
Another efficient approach is using a HashSet<char> and its IsSubsetOf method:
HashSet<char> chars = new HashSet<char>("osy");
bool containsAll = chars.IsSubsetOf("amnosxy");

I wouldn't use regular expressions for this if it is guaranteed that both strings are sorted. Just advance input[x] until you find toFind[y], or until input[x] is bigger than toFind[y], which would mean there is no such element.
Edit: alternative RegEx: .*o.*s.*y.*, so just put .* between all those chars.
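If a regex is still wanted, one way to build it dynamically from the find-chars string is a chain of lookaheads, which (unlike .*o.*s.*y.*) does not depend on the characters being sorted. This is only a sketch; the class and method names are made up for illustration, and Regex.Escape is there in case the find chars include regex metacharacters.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class ContainsAllRegex
{
    // Builds e.g. ^(?=.*o)(?=.*s)(?=.*y) from "osy".
    public static string BuildPattern(string charsToFind)
    {
        var lookaheads = charsToFind
            .Distinct()
            .Select(c => "(?=.*" + Regex.Escape(c.ToString()) + ")");
        return "^" + string.Concat(lookaheads);
    }

    // True when every character of charsToFind occurs somewhere in input.
    public static bool ContainsAll(string input, string charsToFind)
    {
        // Singleline lets '.' also cross newlines inside the input.
        return Regex.IsMatch(input, BuildPattern(charsToFind), RegexOptions.Singleline);
    }
}
```

With the test cases from the question, ContainsAllRegex.ContainsAll("amnosxy", "osy") is true and ContainsAllRegex.ContainsAll("amnosxy", "anz") is false.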

Related

Using Regular Expression to match fields with an arbitrary delimiter

I suppose this should be an old question, however, I didn't find suitable solution in the forums after several hours searching.
I'm using C# and I know the Regex.Split and String.Split methods can be used to achieve the expected results. For some reason, I need to use a regular expression to match the required fields by specifying an arbitrary delimiter. For example, here is the string:
#DIV#This#DIV#is#DIV#"A "#DIV#string#DIV#
Here, #DIV# is the delimiter and is going to be split as:
This
is
"A "
string
How can I use a regular expression to match these values?
By the way, a leading or trailing #DIV# should be ignored; for example, the source strings below should give the same result as above:
#DIV#This#DIV#is#DIV#"A "#DIV#string
This#DIV#is#DIV#"A "#DIV#string#DIV#
This#DIV#is#DIV#"A "#DIV#string

UPDATE:
I think I found a way (mind it is not efficient!) to get rid of empty values with a regex.
var splits = Regex.Matches(strIn, @"(?<=#DIV#|^)(?:(?!#DIV#).)+?(?=$|#DIV#)");
See demo on regexstorm (mind the \r? is only to demo in Multiline mode, you do not need it when using in real life)
ORIGINAL ANSWER
Here is another approach using a regular Split:
var strIn = "#DIV#This#DIV#is#DIV#\"A # \"#DIV#string#DIV#";
var splitText = strIn.Split(new[] {"#DIV#"}, StringSplitOptions.RemoveEmptyEntries);
Or else, you can use a regex to match the fields you need and then remove empty items with LINQ:
var spltsTxt2 = Regex.Matches(strIn, @"(?<=#DIV#|^).*?(?=#DIV#|$)").Cast<Match>().Where(p => !string.IsNullOrEmpty(p.Value)).Select(p => p.Value).ToList();

#DIV#|(.+?)(?=#DIV#|$)
Try this. Grab the captures or groups. See demo.
https://www.regex101.com/r/fJ6cR4/21

You can use the following to match:
/#?DIV#?/g
And replace with ' ' (space)
But this will give trailing and leading spaces sometimes.. which can be removed by using String.Trim()
Edit1: If you want to match the field values you can use the following:
(?<=(#?DIV#?)|^)[^#]*?(?=(#?DIV#?)|$)
See DEMO
Edit2: More generalized regex for matching # in fields:
(?m)(?<=(^(?!#?DIV#)|(#?DIV#)))(.*?)(?=($|(#DIV#?)))
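To make the delimiter truly arbitrary, the lookaround pattern from the UPDATE above can be assembled at run time. The following is a rough sketch, not a tuned implementation: the helper name is hypothetical, and it assumes fields contain no line breaks and that empty fields should be dropped.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class DelimitedFields
{
    // Returns the non-empty fields found between occurrences of delimiter.
    public static List<string> Match(string input, string delimiter)
    {
        // Escape the delimiter so characters like '#', '|' or '.' are taken literally.
        string d = Regex.Escape(delimiter);

        // A field starts right after a delimiter (or at the start of the string),
        // consumes characters that do not begin a delimiter,
        // and ends right before a delimiter (or at the end of the string).
        string pattern = "(?<=" + d + "|^)(?:(?!" + d + ").)+?(?=$|" + d + ")";

        return Regex.Matches(input, pattern)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();
    }
}
```

Calling DelimitedFields.Match on any of the four sample strings from the question with "#DIV#" as the delimiter yields the four fields This, is, "A " and string.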

Regular Expression - Remove zeroes inside an expression

I need to remove leading zeroes from the numerical part of an expression (using .net 2.0 C# Regex class).
Ex:
PAR0000034 -> PAR34
WP0003204 -> WP3204
I tried the following:
//keep starting characters, get rid of leading zeroes, keep remaining digits
string result = Regex.Replace(inputStr, "^(.+)(0+)(/d*)", "$1$3", RegexOptions.IgnoreCase)
Obviously, it did not work. I need a bit of help to find the mistake.

You don't need a regular expression for that, the Split method can do that for you.
Splitting on '0', removing empty entries (i.e. between the multiple zeroes), and limiting the result to two strings will give you the two strings before and after the leading zeroes. Then you just put those two strings together again:
string result = String.Concat(
input.Split(new char[] { '0' }, 2, StringSplitOptions.RemoveEmptyEntries)
);

In your expression the .+ part is greedy, so it swallows everything up to the last zero and leaves only one zero for 0+ to remove. Further,
use a backslash instead of a slash for the digit class: \d
string result = Regex.Replace(inputStr, @"^([^0]+)(0+)(\d*)", "$1$3");
Or use look behind instead:
string result = Regex.Replace(inputStr, "(?<=[a-zA-Z])0+", "");

This works for me:
Regex.Replace("PPP00001001", "([^0]*)0+(.*)", "$1$2");

The phrase "leading zeroes" is confusing, since the zeroes you're talking about aren't actually at the beginning of the string. But if I understand you correctly, you want this:
string result = Regex.Replace(inputStr, "^(.*?)0+", "$1");
There are actually several ways to do it, with and without regex, but the above is probably the shortest and easiest to understand. The important part is the .*? lazy quantifier. This will ensure that it a) finds only the first string of zeroes, and b) deletes all the "leading" zeroes in the string.
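As a quick check against the examples in the question, here is a small sketch of an anchored variant. It rests on assumptions the question does not state: the prefix consists only of ASCII letters, and at least one digit is kept when the numeric part is all zeroes. The class and method names are illustrative only.

```csharp
using System.Text.RegularExpressions;

public static class ZeroStripper
{
    // Removes the zeroes that directly follow the leading letters,
    // e.g. "PAR0000034" becomes "PAR34".
    public static string StripZeroes(string input)
    {
        // ^([A-Za-z]+)  the letter prefix, kept via $1
        // 0+            the run of zeroes to drop
        // (?=\d)        require a digit afterwards, so "PAR000" keeps one zero
        return Regex.Replace(input, @"^([A-Za-z]+)0+(?=\d)", "$1");
    }
}
```

ZeroStripper.StripZeroes("PAR0000034") gives PAR34 and ZeroStripper.StripZeroes("WP0003204") gives WP3204; the zero inside 3204 is untouched because the pattern is anchored at the start of the string.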

Check Formatting of a String

This has probably been answered somewhere before, but since there are millions of unrelated posts about string formatting, I couldn't find it.
Take the following string:
24:Something(true;false;true)[0,1,0]
I want to be able to do two things in this case. I need to check whether or not all the following conditions are true:
There is only one : (done: achieved using Split(), which I needed to use anyway to separate the two parts)
The integer before the : is a 1-3 digit int (done: simple int.Parse logic)
The () exists, and that the "Something", in this case any string less than 10 characters, is there
The [] exists and has at least 1 integer in it. Also, make sure the elements in the [] are integers separated by ,
How can I best do this?
EDIT: I have marked as done what I've achieved so far.

A regular expression is the quickest way. Depending on the complexity it may also be the most computationally expensive.
This seems to do what you need (I'm not that good so there might be better ways to do this):
^\d{1,3}:\w{1,9}\((true|false)(;true|;false)*\)\[\d(,[\d])*\]$
Explanation
\d{1,3}
1 to 3 digits
:
followed by a colon
\w{1,9}
followed by a 1-9 character alpha-numeric string,
\((true|false)(;true|;false)*\)
followed by parenthesis containing "true" or "false" followed by any number of ";true" or ";false",
\[\d(,[\d])*\]
followed by a set of square brackets containing a digit, followed by any number of comma+digit.
The ^ and $ at the beginning and end of the string indicate the start and end of the string which is important since we're trying to verify the entire string matches the format.
Code Sample
var input = "24:Something(true;false;true)[0,1,0]";
var regex = new System.Text.RegularExpressions.Regex(@"^\d{1,3}:.{1,9}\(.*\)\[\d(,[\d])*\]$");
bool isFormattedCorrectly = regex.IsMatch(input);
Credit: @Ian Nelson

This is one of those cases where your only sensible option is to use a Regular Expression.
My hasty attempt is something like:
var input = "24:Something(true;false;true)[0,1,0]";
var regex = new System.Text.RegularExpressions.Regex(@"^\d{1,3}:.{1,9}\(.*\)\[\d(,[\d])*\]$");
System.Diagnostics.Debug.Assert(regex.IsMatch(input));
This online RegEx tester should help refine the expression.

I think, the best way is to use regular expressions like this:
string s = "24:Something(true;false;true)[0,1,0]";
Regex pattern = new Regex(@"^\d{1,3}:[a-zA-Z]{1,9}\((true|false)(;true|;false)*\)\[\d(,\d)*\]$");
if (pattern.IsMatch(s))
{
// s is valid
}
If you want anything inside (), you can use following regex:
@"^\d{1,3}:[a-zA-Z]{1,9}\([^:\(]*\)\[\d(,\d)*\]$"
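Putting the pieces from the answers together, a validator could look roughly like the sketch below. It is not taken from any one answer: the class name is invented, the name part is assumed to be 1-9 ASCII letters, the () part is assumed to hold only true/false values, and \d+ is used so that multi-digit integers in the [] are accepted.

```csharp
using System.Text.RegularExpressions;

public static class FormatChecker
{
    // 1-3 digits, a colon, a 1-9 letter name, (true/false;...), [int,int,...]
    private static readonly Regex Pattern = new Regex(
        @"^\d{1,3}:[A-Za-z]{1,9}\((?:true|false)(?:;(?:true|false))*\)\[\d+(?:,\d+)*\]$");

    // True when the whole input follows the expected layout.
    public static bool IsValid(string input)
    {
        return Pattern.IsMatch(input);
    }
}
```

FormatChecker.IsValid("24:Something(true;false;true)[0,1,0]") returns true, while an input with an empty [] or a four-digit leading number is rejected.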

Regex which ensures no character is repeated

I need to ensure that a input string follows these rules:
It should contain upper case characters only.
NO character should be repeated in the string.
eg. ABCA is not valid because 'A' is being repeated.
For the upper case thing, [A-Z] should be fine.
But i am lost at how to ensure no repeating characters.
Can someone suggest some method using regular expressions ?

You can do this with .NET regular expressions although I would advise against it:
string s = "ABCD";
bool result = Regex.IsMatch(s, @"^(?:([A-Z])(?!.*\1))*$");
Instead I'd advise checking that the length of the string is the same as the number of distinct characters, and checking the A-Z requirement separately:
bool result = s.Cast<char>().Distinct().Count() == s.Length;
Alternatively, if performance is a critical issue, iterate over the characters one by one and keep a record of which you have seen.
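The one-pass idea could be sketched like this, assuming "upper case" means the ASCII range A-Z and that an empty string should be rejected (neither is spelled out in the question). The class and method names are placeholders.

```csharp
using System.Collections.Generic;

public static class UniqueUppercase
{
    // True when s is non-empty, contains only A-Z, and repeats no character.
    public static bool IsValid(string s)
    {
        var seen = new HashSet<char>();
        foreach (char c in s)
        {
            // Reject anything outside A-Z.
            if (c < 'A' || c > 'Z') return false;

            // HashSet.Add returns false when the character was already seen.
            if (!seen.Add(c)) return false;
        }
        return s.Length > 0;
    }
}
```

The loop stops at the first offending character, so it never scans more of the string than necessary.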

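This cannot reasonably be done with regular expressions in the formal sense, because they describe regular languages and have no memory of which characters were already matched; backreferences are an engine extension that goes beyond that. Without them, the only way to achieve this is by writing the function by hand.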
See formal grammar for background theory.

Why not check instead for a character which is repeated or not in uppercase? With something like ([A-Z])?.*?([^A-Z]|\1)

Use negative lookahead and backreference.
string pattern = @"^(?!.*(.).*\1)[A-Z]+$";
string s1 = "ABCDEF";
string s2 = "ABCDAEF";
string s3 = "ABCDEBF";
Console.WriteLine(Regex.IsMatch(s1, pattern));//True
Console.WriteLine(Regex.IsMatch(s2, pattern));//False
Console.WriteLine(Regex.IsMatch(s3, pattern));//False
\1 matches the first captured group. Thus the negative lookahead fails if any character is repeated.

This isn't regex, and would be slow, but you could create an array of the contents of the string, sort it, and then iterate through the array comparing element n to element n+1.
=Waldo

It can be done using what is called a backreference.
I am a Java programmer, so I will show you how it is done in Java (for C#, see here).
final Pattern aPattern = Pattern.compile("([A-Z]).*\\1");
final Matcher aMatcher1 = aPattern.matcher("ABCDA");
System.out.println(aMatcher1.find());
final Matcher aMatcher2 = aPattern.matcher("ABCDA");
System.out.println(aMatcher2.find());
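The regular expression is ([A-Z]).*\\1, which translates to: anything from 'A' to 'Z' as group 1 (([A-Z])), then anything else (.*), then group 1 again (\\1). find() returns true when some character is repeated.
In C# the backreference inside the pattern is also written \1; $1 is only used in replacement strings.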
Hope this helps.

Replacing numbers in strings with C#

I thought I'd do a regex replace
Regex r = new Regex("[0-9]");
return r.Replace(sz, "#");
on a file named aa514a3a.4s5. It works exactly as I expect: it replaces all the numbers, including the numbers in the ext. How do I make it NOT replace the numbers in the ext? I tried numerous regex strings, but I am beginning to think that it's an all-or-nothing pattern, so I can't do this. Do I need to separate the ext from the string, or can I use regex?

This one does it for me:
(?<!\.[0-9a-z]*)[0-9]
This does a negative lookbehind (the string must not occur before the matched string) on a period, followed by zero or more alphanumeric characters. This ensures only numbers are matched that are not in your extension.
Obviously, the [0-9a-z] must be replaced by which characters you expect in your extension.

I don't think you can do that with a single regular expression.
Probably best to split the original string into base and extension; do the replace on the base; then join them back up.

Yes, I think you'd be better off separating the extension.
If you are sure there is always a 3-character extension at the end of your string, the easiest, most readable/maintainable solution would be to only perform the replace on
yourString.Substring(0, yourString.Length-4)
..and then append
yourString.Substring(yourString.Length-4, 4)

Why not run the regex on the substring?
String filename = "aa514a3a.4s5";
String nameonly = filename.Substring(0,filename.Length-4);
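The split-then-replace approach from the answers above can also be written without assuming a 3-character extension, by cutting at the last dot. A minimal sketch, with a made-up helper name, assuming the input is a bare file name and that everything after the last dot counts as the extension:

```csharp
using System.Text.RegularExpressions;

public static class DigitMasker
{
    // Replaces digits with '#' in the name part only; the extension is left as-is.
    public static string MaskDigitsInName(string fileName)
    {
        int dot = fileName.LastIndexOf('.');

        // No dot: there is no extension to protect, so mask the whole string.
        if (dot < 0) return Regex.Replace(fileName, "[0-9]", "#");

        string name = fileName.Substring(0, dot);
        string ext = fileName.Substring(dot); // includes the dot itself
        return Regex.Replace(name, "[0-9]", "#") + ext;
    }
}
```

For the file from the question, DigitMasker.MaskDigitsInName("aa514a3a.4s5") gives aa###a#a.4s5.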
