Regex Comparison between two strings - c#

I need to compare strings against a standard string structure.
string structure = Bank*Loan*1.pdf;
string name = BankAutoLoanByCustomer1.pdf;
How can I compare the name string against the standard structure string? I want to check whether the skeleton of name is the same as the skeleton of structure.

You test it like this:
bool matches = Regex.IsMatch(name, structure);
However, regex syntax is different to what you're using.
A few special characters you need:
. = any single character
* = 0 or more times
\ = escape character: treat the next character literally, not as a special character.
So, your structure should be more like:
string structure = "Bank.*Loan.*1\\.pdf";
Note that you actually have to use two backslashes to escape a character, because C# strings also use \ as an escape character.
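If the structure strings always use * as the only wildcard, the conversion to a regex can be automated. The following is a minimal sketch under that assumption; WildcardToRegex is a hypothetical helper name, not something from the question or the .NET library. It escapes the whole pattern with Regex.Escape, turns each escaped \* back into .* and anchors both ends.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WildcardDemo
{
    // Hypothetical helper: escape every regex metacharacter, then turn
    // each escaped '*' into ".*" and anchor the pattern at both ends.
    public static string WildcardToRegex(string wildcard)
    {
        return "^" + Regex.Escape(wildcard).Replace(@"\*", ".*") + "$";
    }

    public static void Main()
    {
        string structure = "Bank*Loan*1.pdf";
        string name = "BankAutoLoanByCustomer1.pdf";

        string pattern = WildcardToRegex(structure);
        Console.WriteLine(pattern);                      // ^Bank.*Loan.*1\.pdf$
        Console.WriteLine(Regex.IsMatch(name, pattern)); // True
    }
}
```

Anchoring with ^ and $ is a design choice: without the anchors, a name such as "MyBankLoan1.pdf.bak" would also count as a match.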

You can utilize Operators.LikeString method.
bool isMatch = Operators.LikeString("BankAutoLoanByCustomer1.pdf", "Bank*Loan*1.pdf", Microsoft.VisualBasic.CompareMethod.Text);

Based on your loose criteria:
Regex.IsMatch(name, @"^Bank.*Loan.*1\.pdf$");

So you want to match Bank anything Loan anything 1.pdf?
In that case, you can use the Regex
Bank.*Loan.*1\.pdf
The period means "ANY character except newline". The star means "0 or more times". The \ escapes the period before pdf, because we want an actual period, not an "ANY character except newline".

First you should learn how to declare strings in c# - you need to use double quotes ""
string structure = "Bank*Loan*1.pdf";
string name = "BankAutoLoanByCustomer1.pdf";
Secondly you should ask the right questions - as I see you basically want to find files in a directory that match a specific pattern and then you can use
Directory.GetFiles("directoryName", structure)

Failure To Get Specific Text From Regex Group

My example works fine with a greedy pattern when I capture the whole string plus a group (Groups[1] only) enclosed in a single pair of single quotes.
But when the string contains multiple pairs of single quotes, the group captures only the text enclosed by the last pair, not everything between the first and the last single quote.
string val1 = "Content:abc'23'asad";
string val2 = "Content:'Scale['#13212']'ta";
Match match1 = Regex.Match(val1, @".*'(.*)'.*");
Match match2 = Regex.Match(val2, @".*'(.*)'.*");
if (match1.Success)
{
string value1 = match1.Value;
string GroupValue1 = match1.Groups[1].Value;
Console.WriteLine(value1);
Console.WriteLine(GroupValue1);
string value2 = match2.Value;
string GroupValue2 = match2.Groups[1].Value;
Console.WriteLine(value2);
Console.WriteLine(GroupValue2);
Console.ReadLine();
// Using greedy matching, val1 gives exactly the values I want:
// value1--->Content:abc'23'asad
// GroupValue1--->23
// BUT for val2, greedy matching gives only the text enclosed by the last pair of single quotes:
// value2--->Content:'Scale['#13212']'ta
// GroupValue2---> ]
// But I want GroupValue2--->Scale['#13212']
}
The problem with your existing regex is that you are using too many greedy modifiers. That first one is going to grab everything it can until it runs into the second to last apostrophe in the string. That's why your end result of the second example is just the stuff within the last pair of quotes.
There are a few ways to approach this. The simplest way is to use Slai's suggestion - just a pattern to grab anything and everything within the most "apart" apostrophes available:
'(.*)'
A more explicitly defined approach would be to slightly tweak the pattern you are currently using. Just change the first greedy modifier into a lazy one:
.*?'(.*)'.*
Alternatively, you could change the dot in that first and last section to instead match every character other than an apostrophe:
[^']*'(.*)'[^']*
Which one you end up using depends on what you're personally going after. One thing of note, though, is that according to Regex101, the first option involves the fewest steps, so it will be the most efficient method. However, it also dumps the rest of the string, but I don't know if that matters to you.
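To see the difference between these patterns side by side, here is a small sketch that runs each one against the second sample string from the question. The class name and the Capture helper are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyDemo
{
    // Illustrative helper: returns the text of capture group 1.
    public static string Capture(string input, string pattern)
    {
        return Regex.Match(input, pattern).Groups[1].Value;
    }

    public static void Main()
    {
        string val2 = "Content:'Scale['#13212']'ta";

        // Original pattern: the leading greedy .* swallows as much as it can.
        Console.WriteLine(Capture(val2, @".*'(.*)'.*"));       // ]

        // The three alternatives all capture from the first to the last apostrophe.
        Console.WriteLine(Capture(val2, @"'(.*)'"));           // Scale['#13212']
        Console.WriteLine(Capture(val2, @".*?'(.*)'.*"));      // Scale['#13212']
        Console.WriteLine(Capture(val2, @"[^']*'(.*)'[^']*")); // Scale['#13212']
    }
}
```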
First off, use named match capture groups such as (?<Data> ... ); then you can access that group by its name in C#, for example match1.Groups["Data"].Value.
Secondly try not to use * which means zero to many. Is there really going to be no data? For a majority of the cases, that answer is no, there is data.
Use the +, one to many instead.
IMHO * screws up more patterns because it has to find zero data, when it does that, it skips ungodly amounts of data. When you know there is data use +.
It is better to match on what is known, than unknown and we will create a pattern to what is known. Also in that light use the negation set [^ ] to capture text such as [^']+ which says capture everything that is not a ', one to many times.
Pattern
Content:\x27?[^\x27?]+\x27(?<Data>[^\x27]+?)\x27
The results on your two sets of data are 23 and #13212 and placed into match capture group[1] and group["Data"].
Note \x27 is the hex escape of the single quote '. \x22 is for the double quote ", which I bet is what you are really running into.
I use the hex escapes when dealing with quotes so not to have to mess with the C# compiler thinking they are quotes while parsing.
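As a rough check of the named-group approach, the sketch below applies the pattern (with every quote written as \x27) to both sample strings from the question. The class and method names are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // \x27 is the hex escape for the single quote character.
    private const string Pattern =
        @"Content:\x27?[^\x27?]+\x27(?<Data>[^\x27]+?)\x27";

    // Illustrative helper: returns the named group "Data", or null if no match.
    public static string ExtractData(string input)
    {
        Match m = Regex.Match(input, Pattern);
        return m.Success ? m.Groups["Data"].Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(ExtractData("Content:abc'23'asad"));         // 23
        Console.WriteLine(ExtractData("Content:'Scale['#13212']'ta")); // #13212
    }
}
```

Note that for the second string this yields #13212 rather than the Scale['#13212'] the question asked for, so it fits only if the innermost quoted value is what you need.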

Replace with wildcards

I need some advice. Suppose I have the following string: Read Variable
I want to find all pieces of text like this in a string and turn each of them into the following: Variable = MessageBox.Show. Some additional examples:
"Read Dog" --> "Dog = MessageBox.Show"
"Read Cat" --> "Cat = MessageBox.Show"
Can you help me? I need some quick advice on using Regex in C#. I think it is a job involving wildcards, but I do not know how to use them very well... Also, I need this for a school project tomorrow... Thanks!
Edit: This is what I have done so far and it does not work: Regex.Replace(String, "Read ", " = Messagebox.Show").
You can do this
string ns = Regex.Replace(yourString, @"Read\s+(.*?)(?:\s|$)", "$1 = MessageBox.Show");
\s+ matches 1 to many space characters
(.*?)(?:\s|$) matches 0 to many characters till the first space (i.e \s) or till the end of the string is reached(i.e $)
$1 represents the first captured group i.e (.*?)
You might want to clarify your question... but here goes:
If you want to match the next word after "Read " in regex, use Read (\w*) where \w is the word character class and * is the greedy match operator.
If you want to match everything after "Read " in regex, use Read (.*)$ where . will match all characters and $ means end of line.
With either regex, you can use a replace of $1 = MessageBox.Show as $1 will reference the first matched group (which was denoted by the parenthesis).
Complete code:
replacedString = Regex.Replace(inStr, @"Read (.*)$", "$1 = MessageBox.Show");
The problem with your attempt is, that it cannot know that the replacement string should be inserted after your variable. Let's assume that valid variable names contain letters, digits and underscores (which can be conveniently matched with \w). That means, any other character ends the variable name. Then you could match the variable name, capture it (using parentheses) and put it in the replacement string with $1:
output = Regex.Replace(input, @"Read\s+(\w+)", "$1 = MessageBox.Show");
Note that \s+ matches one or more arbitrary whitespace characters. \w+ matches one or more letters, digits and underscores. If you want to restrict variable names to letters only, this is the place to change it:
output = Regex.Replace(input, @"Read\s+([a-zA-Z]+)", "$1 = MessageBox.Show");
Here is a good tutorial.
Finally, note that in C# it is advisable to write regular expressions as verbatim strings (@"..."). Otherwise, you will have to double escape everything so that the backslashes get through to the regex engine, and that really lessens the readability of the regex.
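Putting the capture-group replacement together, a minimal runnable sketch might look like this. It assumes variable names consist of word characters only; the Rewrite helper name is invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReadReplaceDemo
{
    // Illustrative helper: rewrites every "Read <name>" into
    // "<name> = MessageBox.Show", using $1 to reinsert the captured name.
    public static string Rewrite(string input)
    {
        return Regex.Replace(input, @"Read\s+(\w+)", "$1 = MessageBox.Show");
    }

    public static void Main()
    {
        Console.WriteLine(Rewrite("Read Dog"));           // Dog = MessageBox.Show
        Console.WriteLine(Rewrite("Read Dog; Read Cat")); // Dog = MessageBox.Show; Cat = MessageBox.Show
    }
}
```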

Retrieve a Digit from a String using Regex

What I am trying to do is fairly simple, although I am running into difficulty. I have a string that is a URL; it will have the format http://www.somedomain.com?id=someid and what I want to retrieve is the someid part. I figure I can use a regular expression but I'm not very good with them. This is what I tried:
Match match = Regex.Match(theString, @"*.?id=(/d.)");
I get a regex exception saying there was an error parsing the regex. The way I am reading this is "any number of characters", then the literal "?id=", followed by "any number of digits". I put the digits in a group so I could pull them out. I'm not sure what is wrong with this. If anyone could tell me what I'm doing wrong I would appreciate it, thanks!
No need for Regex. Just use built-in utilities.
string query = new Uri("http://www.somedomain.com?id=someid").Query;
var dict = HttpUtility.ParseQueryString(query);
var value = dict["id"];
You've got a couple of errors in your regex. Try this:
Match match = Regex.Match(theString, @".*\?id=(\d+)");
Specifically, I:
changed *. to .* (dot matches all non-newline chars and * means zero or more of the preceding)
added an escape sequence before the ? because the question mark is a special character in regular expressions. It means zero or one of the preceding.
changed /d. to \d+ (you had the slash going the wrong way and you used dot, which was explained above, instead of +, which means one or more of the preceding)
Try
var match = Regex.Match(theString, @".*\?id=(\d+)");
The error is probably due to preceding *. The * character in regex matches zero or more occurrences of previous character; so it cannot be the first character.
Probably a typo, but shortcut for digit is \d, not /d
. matches any character, you need to match one or more digits - so use a +
? is a special character, so it needs to be escaped.
So it becomes:
Match match = Regex.Match(theString, @".*\?id=(\d+)");
That being said, regex is not the best tool for this; use a proper query string parser or things will eventually become difficult to manage.
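For completeness, here is a small sketch of how the corrected pattern could be used to pull the id out and check whether the match succeeded. It assumes the id is purely numeric, as in the question; TryGetId is a hypothetical helper, and the pattern is loosened slightly to [?&]id= so the parameter is also found when it is not the first one.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QueryIdDemo
{
    // Hypothetical helper: extracts a numeric id parameter, if present.
    public static bool TryGetId(string url, out string id)
    {
        // [?&] accepts the parameter in first or later position;
        // (\d+) captures one or more digits as group 1.
        Match m = Regex.Match(url, @"[?&]id=(\d+)");
        id = m.Success ? m.Groups[1].Value : null;
        return m.Success;
    }

    public static void Main()
    {
        string id;
        if (TryGetId("http://www.somedomain.com?id=12345", out id))
        {
            Console.WriteLine(id); // 12345
        }
    }
}
```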

Regex which ensures no character is repeated

I need to ensure that a input string follows these rules:
It should contain upper case characters only.
NO character should be repeated in the string.
eg. ABCA is not valid because 'A' is being repeated.
For the upper case thing, [A-Z] should be fine.
But i am lost at how to ensure no repeating characters.
Can someone suggest some method using regular expressions ?
You can do this with .NET regular expressions although I would advise against it:
string s = "ABCD";
bool result = Regex.IsMatch(s, @"^(?:([A-Z])(?!.*\1))*$");
Instead I'd advise checking that the length of the string is the same as the number of distinct characters, and checking the A-Z requirement separately:
bool result = s.Distinct().Count() == s.Length;
Alternatively, if performance is a critical issue, iterate over the characters one by one and keep a record of which you have seen.
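The character-by-character approach could be sketched as follows. This is one possible implementation, not the answerer's own code: IsUniqueUpper is a made-up name, and treating the empty string as valid is an assumption (it mirrors what the regex above accepts).

```csharp
using System;
using System.Collections.Generic;

public static class UniqueUpperDemo
{
    // Illustrative helper: true if s contains only A-Z and no character twice.
    // Assumption: the empty string counts as valid.
    public static bool IsUniqueUpper(string s)
    {
        var seen = new HashSet<char>();
        foreach (char c in s)
        {
            if (c < 'A' || c > 'Z') return false; // not an upper-case ASCII letter
            if (!seen.Add(c)) return false;       // Add returns false for a repeat
        }
        return true;
    }

    public static void Main()
    {
        Console.WriteLine(IsUniqueUpper("ABCD")); // True
        Console.WriteLine(IsUniqueUpper("ABCA")); // False
        Console.WriteLine(IsUniqueUpper("ABcD")); // False
    }
}
```

The loop stops at the first offending character, so it never scans more of the string than it has to.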
This is not something regular expressions in the strict, formal sense handle well, because they keep no memory of which characters have already been matched. Unless your engine offers extensions such as backreferences, the practical way to achieve this is to write the function by hand.
See formal grammar for background theory.
Why not check for a character which is repeated or not in uppercase instead ? With something like ([A-Z])?.*?([^A-Z]|\1)
Use negative lookahead and backreference.
string pattern = @"^(?!.*(.).*\1)[A-Z]+$";
string s1 = "ABCDEF";
string s2 = "ABCDAEF";
string s3 = "ABCDEBF";
Console.WriteLine(Regex.IsMatch(s1, pattern));//True
Console.WriteLine(Regex.IsMatch(s2, pattern));//False
Console.WriteLine(Regex.IsMatch(s3, pattern));//False
\1 matches the first captured group. Thus the negative lookahead fails if any character is repeated.
This isn't regex, and it would be slow, but you could copy the contents of the string into an array and then iterate through the array, comparing each element with every later one.
=Waldo
It can be done using what is called a backreference.
I am a Java programmer so I will show you how it is done in Java (for C#, see here).
final Pattern aPattern = Pattern.compile("([A-Z]).*\\1");
final Matcher aMatcher1 = aPattern.matcher("ABCDA");
System.out.println(aMatcher1.find());
final Matcher aMatcher2 = aPattern.matcher("ABCDA");
System.out.println(aMatcher2.find());
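The regular expression is ([A-Z]).*\\1, which translates to: anything between 'A' and 'Z' as group 1 ('([A-Z])'), then anything else (.*), then group 1 again.
In C# the backreference inside the pattern is also written \1; $1 is only used in replacement strings.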
Hope this helps.

Replacing numbers in strings with C#

I thought I'd do a regex replace
Regex r = new Regex("[0-9]");
return r.Replace(sz, "#");
on a file named aa514a3a.4s5. It works exactly as I expect: it replaces all the numbers, including the numbers in the extension. How do I make it NOT replace the numbers in the extension? I tried numerous regex strings, but I am beginning to think that it's an all-or-nothing pattern, so I can't do this. Do I need to separate the extension from the string, or can I use regex?
This one does it for me:
(?<!\.[0-9a-z]*)[0-9]
This does a negative lookbehind (the string must not occur before the matched string) on a period, followed by zero or more alphanumeric characters. This ensures only numbers are matched that are not in your extension.
Obviously, the [0-9a-z] must be replaced by which characters you expect in your extension.
I don't think you can do that with a single regular expression.
Probably best to split the original string into base and extension; do the replace on the base; then join them back up.
Yes, I think you'd be better off separating the extension.
If you are sure there is always a 3-character extension at the end of your string, the easiest, most readable/maintainable solution would be to only perform the replace on
yourString.Substring(0, yourString.Length - 4)
..and then append
yourString.Substring(yourString.Length - 4, 4)
Why not run the regex on the substring?
String filename = "aa514a3a.4s5";
String nameonly = filename.Substring(0,filename.Length-4);
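A variation on the split-and-rejoin idea that does not depend on a fixed three-character extension is sketched below. It uses System.IO.Path to separate the two parts; the MaskDigits name is invented, and the sketch assumes a bare file name without a directory part. The lookbehind pattern from the earlier answer is included for comparison.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

public static class ExtensionDemo
{
    // Illustrative helper: replaces digits in the base name only.
    // Assumption: fileName has no directory component.
    public static string MaskDigits(string fileName)
    {
        string baseName = Path.GetFileNameWithoutExtension(fileName);
        string extension = Path.GetExtension(fileName); // includes the leading '.'
        return Regex.Replace(baseName, "[0-9]", "#") + extension;
    }

    public static void Main()
    {
        string fileName = "aa514a3a.4s5";

        Console.WriteLine(MaskDigits(fileName)); // aa###a#a.4s5

        // Single-regex alternative using a negative lookbehind.
        Console.WriteLine(Regex.Replace(fileName, @"(?<!\.[0-9a-z]*)[0-9]", "#")); // aa###a#a.4s5
    }
}
```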
