Use RegEx to uppercase and lowercase the string - c#

I am trying to convert a string to uppercase and lowercase based on the index.
My string is a LanguageCode like cc-CC where cc is the language code and CC is the country code. The user can enter in any format like "cC-Cc". I am using the regular expression to match whether the data is in the format cc-CC.
var regex = new Regex("^[a-z]{2}-[A-Z]{2}$", RegexOptions.IgnoreCase);
//I can use CultureInfos from .net framework and compare it's valid or not.
//But the requirement is it should allow invalid language codes also as long
//The enterd code is cc-CC format
Now when the user enters something cC-Cc I'm trying to lowercase the first two characters and then uppercase last two characters.
I can split the string using - and then concatenate them.
var languageDetails = languageCode.Split('-');
var languageCodeUpdated = $"{languageDetails[0].ToLowerInvariant()}-{languageDetails[1].ToUpperInvariant()}";
I thought can I avoid multiple strings creation and use RegEx itself to uppercase and lowercase accordingly.
While searching for the same I found some solutions to use \L and \U but I am not able to use them as the C# compiler showing error. Also, RegEx.Replace() has a parameter or delegate MatchEvaluator which I'm not able to understand.
Is there any way in C# we can use RegEx to replace uppercase with lowercase and vice versa.

.NET regex does not support case modifying operators.
You may use MatchEvaluator:
var result = Regex.Replace(s, #"(?i)^([a-z]{2})-([a-z]{2})$", m =>
$"{m.Groups[1].Value.ToLower()}-{m.Groups[2].Value.ToUpper()}");
See the C# demo.
Details
(?i) - the inline version of RegexOptions.IgnoreCase mopdiofier
^ - start of the string
([a-z]{2}) - Capturing group #1: 2 ASCII letters
- - a hyphen
([a-z]{2}) - Capturing group #2: 2 ASCII letters
$ - end of string.

TLDR: This is Regex.Replace with \U and \L support.
private static string EnhancedReplace(string input, string pattern, string replacement, RegexOptions options)
{
replacement = Regex.Replace(replacement, #"(?<mode>\\[UL])(?<group>\$((\d+)|({[^}]+})))", #"<!<mode:${mode}>%&${group}&%>");
var output = Regex.Replace(input, pattern, replacement, options);
output = Regex.Replace(output, #"<!<mode:\\L>%&(?<value>[\w\W]*?)&%>", x => x.Groups["value"].Value.ToLower());
output = Regex.Replace(output, #"<!<mode:\\U>%&(?<value>[\w\W]*?)&%>", x => x.Groups["value"].Value.ToUpper());
return output;
}
How To Use
Call the function with \U followed by the group to be uppercase
var result = EnhancedReplace(input, #"(public \w+ )(\w)", #"$1\U$2", RegexOptions.None);
Will replace this:
public string test12 { get; set; } = "test3";
With that:
public string Test12 { get; set; } = "test3";
Details
I'm currently working on an app which allows the user to define a batch of Regex Replace operations.
For example the user enters json and the batch converts it to a C#-Class.
Therefore, speed is no key requirement. But it would be very handy to be able to use \U and \L.
This method will apply Regex.Replace 3 times to the whole content and one time to the replacement string. Therefore it’s at least three times slower than Regex.Replace without \U \L support.
Step by Step
The first Regex.Replace enhances the replacement string.
It replaces: \U$1 with <!<mode:\\U>%&$1&%>
(Also works for named groups: ${groupName})
The new replacement will be applied to the content.
& 4. The inserted placeholder is now relatively unique. That allows you to search only for <!<mode:\\U>%&Actual Value&%> and use the MatchEvaluator to replace it with its uppercase version. The same will be done for \L
Regex101 Demo:
Step 1: Enhance pattern with placeholder
https://regex101.com/r/ZtqigN/1
Step 2 Use new replacement pattern
https://regex101.com/r/PWLTFD/1
Step 3&4 Resolve new placeholders
https://regex101.com/r/5DIIUo/1
Answer
var result = EnhancedReplace(input, #"(cc)(-)(cc)", #"\L$1$2\U$3", RegexOptions.IgnoreCase);

Related

Extract groups with regex and construct URL in a single line

I am currently trying to extract values from a string and construct a URL that includes those values. I went through a dozen regex question, but I am not quite satisfied with the answers.
I have custom encoded strings with more than one information and I want to construct a new URL that contains those information.
For example 35afe06d-8393-4559-b6d7-74d35ce131d8|Master should become http://my-server/media/guid/35afe06d-8393-4559-b6d7-74d35ce131d8?v=Master. My first assumption was
var input = "35afe06d-8393-4559-b6d7-74d35ce131d8|Master"
var pattern = #"((?:[a-f0-9]+-?){5})|(\w+)"
var replacement = "http://my-server/media/guid/$1?v=$2"
var output = Regex.Replace(input, pattern, replacement)
However this replaces each group with the full URL. Limitation is, that I am not aware of input, pattern, replacement or output. pattern and replacement are two config values and I don't want to make it x pairs of config values, input comes from somewhere else in the application and could have any custom encoding (pipe, colon, ...) output depends on the use case. It can have any number of groups in the pattern and doesn't even have to be a URL in the end.
I can think of different ways to do this, like parsing the string myself, or trying to create a replacement dictionary, or using regex to find the groups and then string replace for $1 => match.Groups[0]. I just feel like there must be an obvious 1-liner solution for that in .NET since I even remember doing that in PHP.
Answer: It's not a .NET limitation, it was simply the unescaped pipe.
In your pattern (([a-f0-9]+-?){5})|\w+ the second group should be capturing the word characters after the pipe (escape the pipe to match it literally).
If you repeat this part ([a-f0-9]+-?) 5 times, the match could also end on a hyphen.
To match the values separated by the dash, you could match the character class [a-f0-9]+ and repeat matching that {4} times prepended by a -
([a-f0-9]+(?:-[a-f0-9]+){4})\|(\w+)
.NET Regex demo | C# demo
var input = "35afe06d-8393-4559-b6d7-74d35ce131d8|Master";
var pattern = #"([a-f0-9]+(?:-[a-f0-9]+){4})\|(\w+)";
var replacement = "http://my-server/media/guid/$1?v=$2";
var output = Regex.Replace(input, pattern, replacement);
Console.WriteLine(output);
Result
http://my-server/media/guid/35afe06d-8393-4559-b6d7-74d35ce131d8?v=Master
This expression might also work here:
^(\b[0-9a-f]{8}\b-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-\b[0-9a-f]{12}\b)\s*\|\s*(.*?)\s*$
The expression is explained on the top right panel of this demo if you wish to explore/simplify/modify it.
Test
using System;
using System.Text.RegularExpressions;
public class Example
{
public static void Main()
{
string pattern = #"^(\b[0-9a-f]{8}\b-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-\b[0-9a-f]{12}\b)\s*\|\s*(.*?)\s*$";
string substitution = #"http://my-server/media/guid/\1?v=$2";
string input = #"35afe06d-8393-4559-b6d7-74d35ce131d8|Master
35afe06d-8393-4559-b6d7-74d35ce131d8| Master ";
RegexOptions options = RegexOptions.Multiline;
Regex regex = new Regex(pattern, options);
string result = regex.Replace(input, substitution);
}
}
Reference
Searching for UUIDs in text with regex

Splitting of a string using Regex

I have string of the following format:
string test = "test.BO.ID";
My aim is string that part of the string whatever comes after first dot.
So ideally I am expecting output as "BO.ID".
Here is what I have tried:
// Checking for the first occurence and take whatever comes after dot
var output = Regex.Match(test, #"^(?=.).*?");
The output I am getting is empty.
What is the modification I need to make it for Regex?
You get an empty output because the pattern you have can match an empty string at the start of a string, and that is enough since .*? is a lazy subpattern and . matches any char.
Use (the value will be in Match.Groups[1].Value)
\.(.*)
or (with a lookahead, to get the string as a Match.Value)
(?<=\.).*
See the regex demo and a C# online demo.
A non-regex approach can be use String#Split with count argument (demo):
var s = "test.BO.ID";
var res = s.Split(new[] {"."}, 2, StringSplitOptions.None);
if (res.GetLength(0) > 1)
Console.WriteLine(res[1]);
If you only want the part after the first dot you don't need a regex at all:
x.Substring(x.IndexOf('.'))

Regex to text before set of numbers

I have text like this
Inc12345_Month
Ted12345_Month
J8T12345_Month
What I need to do is extract the 12345 and also remove everything before it. This will be done in C#
.+?(?=\d_Monthly) was working in a regex tester online but when I put it in my code it only returned 5_Month.
Edit: the 12345 could be a variable length so I cannot [0-9] multiple times.
Edit2: Code this was just to try and remove everything before the 12345
string text = /* the above text pulled in from a file */;
Regex reg = new Regex(#".+?(?=\d+_Monthly)");
text = reg.Replace(string, "");
You can use this function to strip it:
private static Regex getNumberAndBeyondRegex = new Regex(^.{2}\D+(\d.*)$", RegexOptions.Compiled);
public static string GetNumberAndBeyond(string input)
{
var match = getNumberAndBeyondRegex.Match(input);
if (!match.Success) throw new ArgumentException("String isn't in the correct format.", "input");
return match.Groups[1].Value;
}
The regex at work is ^.{2}\D+(\d.*)$
It works by grabbing anything that's a number, after at least one character that isn't a number. It'll not only match _Month but also other endings.
The regex exists out of a few parts:
^ matches the beginning of the string
.{2} matches any two characters, to prevent a digit from matching if it's the first or 2nd character, you can increase this number to be equal to the minimum prefix length - 1
\D+ matches at least one character that isn't a number
( starts capturing a group
\d.* matches at least one number and any values beyond that
) closes the capturing group
$ matches the end of the string
There are a lot of different regex flavors, many of them have slight differences in terms of escaping, capturing, replacing and quite surely some others.
For testing .NET regexes online I use the free version of the tool RegexHero, it has an popup every now and then, but it makes up for that time by showing you live results, capture groups, and instant replacing. Next to having quite a lot of features.
If you want to match anywhere within the string, you can use the regex \d+_Month, it is very similiar to your original regex. In code:
new Regex("\d+_Month").Match(input).Value
Edit:
Based on the format you supplied in the comment I've created a regex and function to parse the entire file name:
private static Regex parseFileNameRegex = new Regex(#"^.*\D(\d+)_Month_([a-zA-Z]+)\.(\w+)$", RegexOptions.Compiled);
public static bool TryParseFileName(string fileName, out int id, out string month, out string fileExtension)
{
id = 0; month = null; fileExtension = null;
if (fileName == null) return false;
var match = parseFileNameRegex.Match(fileName);
if (!match.Success) return false;
if (!int.TryParse(match.Groups[1].Value, out id) || id < 1) return false; // Convert the ID into a number
month = match.Groups[2].Value;
fileExtension = match.Groups[3].Value;
return true;
}
In the parse function it requires the ID to be at least 1, 0 isn't accepted (and negative numbers won't match the regex), if you don't want this restriction, simply remove || id < 1 from the function.
Using the function would look like:
int id; string month, fileExtension;
if (!TryParseFileName("CompanyName_ClientName12345_Month_Nov.pdf", out id, out month, out fileExtension))
throw new FormatException("File name is incorrectly formatted."); // Do whatever you want when you get an invalid filename
// Use id, month and fileExtension here :)
The regex ^.*\D(\d+)_Month_([a-zA-Z]+)\.(\w+)$ works like:
^ matches the beginning of the string
.*\D matches at least one non-numeric character
(\d+) captures at least 1 number, this is the ID
_Month_ is the literal text in between
([a-zA-Z]+) matches and captures at least 1 letter, this is the month
\. matches a . character
(\w+) matches and captures any alphanumeric (letters and numbers), this is the file extension
$ matches the end of the string
Using :
Regex reg = new Regex(#"\D+(?=(\d+)_Monthly)");
is more explicit, the result is in Groups[1].
Part by part:
.+?
Match anything, maybe. This doesn't make any sense to me. It would be equivalent to ".*", which may or may not be what you meant.
(?=
start a group
\d
Match exactly 1 decimal, which explains what you are seeing, and the rest of the number is matched by .+? which is outside the group
_Monthly
match the literal text
)
end group
I think what you want is:
.*(?=\d+_Monthly)
I guess you are missing the + sign after \d
.+?(?=\d+_Monthly)
This should ask for one or more digits.
If you don't need anything before the number, this should work:
(\d+_Month)
I use Derek Slager's regex tester when I'm working with C# regex.
Better dotnet regular expression tester

Validate filename in c# through regex

I want to validate a filename with this format : LetterNumber_Enrollment_YYYYMMDD_HHMM.xml
string filename = "Try123_Enrollment_20130102_1200.xml";
Regex pattern = new Regex(#"[a-zA-z]_Enrollment_[0-9]{6}_[0-9]{4}\\.xml");
if (pattern.IsMatch(filename))
{
return isValid = true;
}
However, I can't make it to work.
Any thing that i missed here?
You are not matching digits at the beginning. Your pattern should be: ^[A-Za-z0-9]+_Enrollment_[0-9]{8}_[0-9]{4}\.xml$ to match given string.
Changes:
Your string starts with alphanumeric string before first _ symbol so you need to check both (letters and digits).
After Environment_ part you have digits with the length of 8 not 6.
No need of double \. You need to escape just dot (i.e. \.).
Demo app:
using System;
using System.Text.RegularExpressions;
class Test {
static void Main() {
string filename = "Try123_Enrollment_20130102_1200.xml";
Regex pattern = new Regex(#"^[A-Za-z0-9]+_Enrollment_[0-9]{8}_[0-9]{4}\.xml$");
if (pattern.IsMatch(filename))
{
Console.WriteLine("Matched");
}
}
}
Your Regex is nowhere near your actual string:
you only match a single letter at the start (and no digits) so Try123 doesn't match
you match 6 digits instead of 8 at the date part so 20130102 doesn't match
you have escaped your backslash near the end (\\.xml) but you've also used # on your string: with # you don't need to escape.
Try this instead:
#"[a-zA-Z]{3}\d{3}_Enrollment_[0-9]{8}_[0-9]{4}\.xml"
I've assumed you want only three letters and three numbers at the start; in fact you may want this:
#"[\w]*_Enrollment_[0-9]{8}_[0-9]{4}\.xml"
You can try the following, it matches letters and digits at the beginning and also ensures that the date is valid.
[A-Za-z0-9]+_Enrollment_(19|20)\d\d(0[1-9]|1[012])(0[1-9]|[12][0-9]|3[01])_[0-9]{4}\.xml
As an aside, to test your regular expressions try the free regular expression designer from Rad Software, I find that it helps me work out complex expressions beforehand.
http://www.radsoftware.com.au/regexdesigner/

match first digits before # symbol

How to match all first digits before # in this line
26909578#Sbrntrl_7x06-lilla.avi#356028416#2012-10-24 09:06#0#http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html#[URL=http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html]http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html[/URL]#http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html#http://bitshare.com/?f=dvk9o1oz#http://bitshare.com/delete/dvk9o1oz/4511e6f3612961f961a761adcb7e40a0/Sbrntrl_7x06-lilla.avi.html
Im trying to get this number 26909578
My try
string text = #"26909578#Sbrntrl_7x06-lilla.avi#356028416#2012-10-24 09:06#0#http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html#[URL=http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html]http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html[/URL]#http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html#http://bitshare.com/?f=dvk9o1oz#http://bitshare.com/delete/dvk9o1oz/4511e6f3612961f961a761adcb7e40a0/Sbrntrl_7x06-lilla.avi.html";
MatchCollection m1 = Regex.Matches(text, #"(.+?)#", RegexOptions.Singleline);
but then its outputs all text
Make it explicit that it has to start at the beginning of the string:
#"^(.+?)#"
Alternatively, if you know that this will always be a number, restrict the possible characters to digits:
#"^\d+"
Alternatively use the function Match instead of Matches. Matches explicitly says, "give me all the matches", while Match will only return the first one.
Or, in a trivial case like this, you might also consider a non-RegEx approach. The IndexOf() method will locate the '#' and you could easily strip off what came before.
I even wrote a sscanf() replacement for C#, which you can see in my article A sscanf() Replacement for .NET.
If you dont want to/dont like to use regex, use a string builder and just loop until you hit the #.
so like this
StringBuilder sb = new StringBuilder();
string yourdata = "yourdata";
int i = 0;
while(yourdata[i]!='#')
{
sb.Append(yourdata[i]);
i++;
}
//when you get to that # your stringbuilder will have the number you want in it so return it with .toString();
string answer = sb.toString();
The entire string (except the final url) is composed of segments that can be matched by (.+?)#, so you will get several matches. Retrieve only the first match from the collection returned by matching .+?(?=#)

Categories

Resources