Extract specific number from string with fixed pattern in C# - c#

This might sound like a very basic question, but it's one that's given me quite a lot of trouble in C#.
Assume I have, for example, the following Strings known as my chosenTarget.titles:
2008/SD128934 - Wordz aaaaand more words (1233-26-21)
20998/AD1234 - Wordz and less words (1263-21-21)
208/ASD12345 - Wordz and more words (1833-21-21)
Now as you can see, all three Strings are different in some ways.
What I need is to extract a very specific part of these Strings, but getting the subtleties right is what confuses me, and I was wondering if some of you knew better than I.
What I know is that the Strings will always come in the following pattern:
yearNumber + "/" + aFewLetters + theDesiredNumber + " - " + descriptiveText + " (" + someDate + ")"
In the above example, what I would want to return to me would be:
128934
1234
12345
I need to extract theDesiredNumber.
Now, I'm not (that) lazy so I have made a few attempts myself:
var a = chosenTarget.title.Substring(chosenTarget.title.IndexOf("/") + 1, chosenTarget.title.Length - chosenTarget.title.IndexOf("/"));
What this has done is sliced out yearNumber and the /, leaving me with aFewLetter before theDesiredNumber.
I have a hard time properly removing the rest however, and I was wondering if any of you could aid me in the matter?

It sounds as if you only need to extract the number behind the first / which ends at -. You could use a combination of string methods and LINQ:
int startIndex = str.IndexOf("/");
string number = null;
if (startIndex >= 0 )
{
int endIndex = str.IndexOf(" - ", startIndex);
if (endIndex >= 0)
{
startIndex++;
string token = str.Substring(startIndex, endIndex - startIndex); // SD128934
number = String.Concat(token.Where(char.IsDigit)); // 128934
}
}
Another mainly LINQ approach using String.Split:
number = String.Concat(
str.Split(new[] { " - " }, StringSplitOptions.None)[0]
.Split('/')
.Last()
.Where(char.IsDigit));

Try this:
int indexSlash = chosenTarget.title.IndexOf("/");
int indexDash = chosenTarget.title.IndexOf("-");
string out = new string(chosenTarget.title.Substring(indexSlash,indexDash-indexSlash).Where(c => Char.IsDigit(c)).ToArray());

You can use a regex:
var pattern = "(?:[0-9]+/\w+)[0-9]";
var matcher = new Regex(pattern);
var result = matcher.Matches(yourEntireSetOfLinesInAString);
Or you can loop every line and use Match instead of Matches. In this case you don't need to build a "matcher" in every iteration but build it outside the loop

Regex is your friend:
(new [] {"2008/SD128934 - Wordz aaaaand more words (1233-26-21)",
"20998/AD1234 - Wordz and less words (1263-21-21)",
"208/ASD12345 - Wordz and more words (1833-21-21)"})
.Select(x => new Regex(#"\d+/[A-Z]+(\d+)").Match(x).Groups[1].Value)

The pattern you had recognized is very important, here is the solution:
const string pattern = #"\d+\/[a-zA-Z]+(\d+).*$";
string s1 = #"2008/SD128934 - Wordz aaaaand more words(1233-26-21)";
string s2 = #"20998/AD1234 - Wordz and less words(1263-21-21)";
string s3 = #"208/ASD12345 - Wordz and more words(1833-21-21)";
var strings = new List<string> { s1, s2, s3 };
var desiredNumber = string.Empty;
foreach (var s in strings)
{
var match = Regex.Match(s, pattern);
if (match.Success)
{
desiredNumber = match.Groups[1].Value;
}
}

I would use a RegEx for this, the string you're looking for is in Match.Groups[1]
string composite = "2008/SD128934 - Wordz aaaaand more words (1233-26-21)";
Match m= Regex.Match(composite,#"^\d{4}\/[a-zA-Z]+(\d+)");
if (m.Success) Console.WriteLine(m.Groups[1]);
The breakdown of the RegEx is as follows
"^\d{4}\/[a-zA-Z]+(\d+)"
^ - Indicates that it's the beginning of the string
\d{4} - Four digits
\/ - /
[a-zA-Z]+ - More than one letters
(\d+) - More than one digits (the parenthesis indicate that this part is captured as a group - in this case group 1)

Related

Replace text in camelCase inside the string

I have string like:
/api/agencies/{AgencyGuid}/contacts/{ContactGuid}
I need to change text in { } to cameCase
/api/agencies/{agencyGuid}/contacts/{contactGuid}
How can I do that? What is the best way to do that? Please help
I have no experience with Regex. So, I have tried so far:
string str1 = "/api/agencies/{AgencyGuid}/contacts/{ContactGuid}";
string str3 = "";
int i = 0;
while(i < str1.Length)
{
if (str1[i] == '{')
{
str3 += "{" + char.ToLower(str1[i + 1]);
i = i + 2;
} else
{
str3 += str1[i];
i++;
}
}
You can do it with regex of course.
But you can do it also with LINQ like this:
var result = String.Join("/{",
str1.Split(new string[1] { "/{" }, StringSplitOptions.RemoveEmptyEntries)
.Select(k => k = !k.StartsWith("/") ? Char.ToLowerInvariant(k[0]) + k.Substring(1) : k));
What is done here is: Splitting into 3 parts:
"/api/agencies/"
"AgencyGuid}/contactpersons"
"ContactPersonGuid}"
After that we are selecting from each element such value: "If you start with "/" it means you are the first element. If so - you should be returned without tampering. Otherwise : take first char (k[0]) change it to lowercase ( Char.ToLowerInvariant() ) and concatenate with the rest.
At the end Join those three (one unchanged and two changed) strings
With Regex you can do it as:
var regex = new Regex(#"\/{(\w)");
var result = regex.Replace(str1, m => m.ToString().ToLower());
in regex we search for pattern "/{\w" meaning find "/{" and one letter (\w). This char will be taken into a group ( because of () surrounding) and after that run Regex and replace such group to m.ToString().ToLower()
I probably wouldn't use regex, but since you asked
Regex.Replace(
"/api/agencies/{AgencyGuid}/contactpersons/{ContactPersonGuid}",
#"\{[^\}]+\}",
m =>
$"{{{m.Value[1].ToString().ToLower()}{m.Value.Substring(2, m.Value.Length-3)}}}",
RegexOptions.ExplicitCapture
)
This assumes string interpolation in c# 6, but you can do the same thing by concatenating.
Explanation:
{[^}]+} - grab all letters that follow an open mustache that are not a close mustache and then the close mustache
m => ... - A lambda to run on each match
"{{{m.Value[1].ToString().ToLower()}{m.Value.Substring(2, m.Value.Length-3)}}}" - return a new string by taking the an open mustache, the first letter lowercased, then the rest of the string, then a close mustache.

regex match partial or whole word

I am trying to figure out a regular expression which can match either the whole word or a (predefined in length, e.g first 4 chars) part of the word.
For example, if I am trying to match the word between and my offset is set to 4, then
between betwee betwe betw
are matches, but not the
bet betweenx bet12 betw123 beta
I have created an example in regex101, where I am trying (with no luck) a combination of positive lookahead (?=) and a non-word boundary \B.
I found a similar question which proposes a word around in its accepted answer. As I understand, it overrides the matcher somehow, to run all the possible regular expressions, based on the word and an offset.
My code has to be written in C#, so I am trying to convert the aforementioned code. As I see Regex.Replace (and I assume Regex.Match also) can accept delegates to override the default functionality, but I can not make it work.
You could take the first 4 characters, and make the remaining ones optional.
Then wrap these in word boundaries and parenthesis.
So in the case of "between", it would be
#"\b(betw)(?:(e|ee|een)?)\b"
The code to achieve that would be:
public string createRegex(string word, int count)
{
var mandatory = word.Substring(0, count);
var optional = "(" + String.Join("|", Enumerable.Range(1, count - 1).Select(i => word.Substring(count, i))) + ")?";
var regex = #"\b(" + mandatory + ")(?:" + optional + #")\b";
return regex;
}
The code in the answer you mentioned simply builds up this:
betw|betwe|betwee|between
So all you need is to write a function, to build up a string with a substrings of given word given minimum length.
static String BuildRegex(String word, int min_len)
{
String toReturn = "";
for(int i = 0; i < word.Length - min_len +1; i++)
{
toReturn += word.Substring(0, min_len+i);
toReturn += "|";
}
toReturn = toReturn.Substring(0, toReturn.Length-1);
return toReturn;
}
Demo
You can use this regex
\b(bet(?:[^\s]){1,4})\b
And replace bet and the 4 dynamically like this:
public static string CreateRegex(string word, int minLen)
{
string token = word.Substring(0, minLen - 1);
string pattern = #"\b(" + token + #"(?:[^\s]){1," + minLen + #"})\b";
return pattern;
}
Here's a demo: https://regex101.com/r/lH0oL2/1
EDIT: as for the bet1122 match, you can edit the pattern this way:
\b(bet(?:[^\s0-9]){1,4})\b
If you don't want to match some chars, just enqueue them into the [] character class.
Demo: https://regex101.com/r/lH0oL2/2
For more info, see http://www.regular-expressions.info/charclass.html

pick up a particular string before a hyphen and after a star in c#

I want to pick up the three strings from a string for example Vks - Vks * Son
txtName.Text = objDoctor.DocName.Substring(0, objDoctor.DocName.IndexOf("-")).Trim();
I have successfully obtained the first part i.e.
txtMidName.Text = objDoctor.DocName.Substring(1, objDoctor.DocName.IndexOf("-")).Trim();
txtLastName.Text = objDoctor.DocName.Substring(0,objDoctor.DocName.LastIndexOf("*")).Trim();
Note: The second part is the MidName and 3rd part of Vks - Vks * Son i.e. Son is the Last Name
Do check this out :
Name = Name.Trim();
arrNames = Name.Split(' ', '-' , '*');
if (arrNames.Length > 0) {
GivenName = arrNames[0];
}
if (arrNames.Length > 1) {
FamilyName = arrNames[arrNames.Length - 1];
}
if (arrNames.Length > 2) {
MiddleName = string.Join(" ", arrNames, 1, arrNames.Length - 2);
}
Would really help someone
Simply use string.Split:
var parts = objDoctor.DocName.Split('-', '*');
txtName.Text = parts[0];
txtMidName.Text = parts[1];
txtLastName.Text = parts[2];
Please note that this will throw an exception if the string doesn't contain at least three parts.
Use the String.Split method. I'm also assuming you want to get rid of the spaces too?
String data = "Vks - Vks * Son";
String[] pieces = (from piece in data.Split(new char[]{'-', '*'}, StringSplitOptions.RemoveEmptyEntries)
select piece.Trim()).ToArray();
Console.WriteLine(pieces[0]);
Console.WriteLine(pieces[1]);
Console.WriteLine(pieces[2]);
So I used a little LINQ to help trim the spaces off of the string pieces after they have been split.
This gives the following results on the command line:
Vks
Vks
Son
I would use Regular Expressions for parsing strings like these.
But of course there is a learning curve. But once mastered, it is so much easier. So here is the alternative:
string parse = "Vks - Vks * Son";
string pattern = #"(?'first'\w*)\s*-\s*(?'second'\w*)\s*\*\s*(?'last'\w*)";
Match m = Regex.Match(parse, pattern);
if (m.Success)
{
Console.WriteLine(m.Groups["first"].Value);
Console.WriteLine(m.Groups["second"].Value);
Console.WriteLine(m.Groups["last"].Value);
}
Quick explanation of the pattern:
We are breaking the string in 3 parts: first, second, and last.
A word (0 or more characters) is \w* - if at least character is required use \w+
\s*-\s* is space (0 or more whitespace character) followed by '-' and then space (again 0 or more white space character.
\s*\*\s* - as '*' is a special character we have to escape it.
Hope this helps.
BD

Getting a value from a string using regular expressions?

I have a string "Page 1 of 15".
I need to get the value 15 as this value could be any number between 1 and 100. I'm trying to figure out:
If regular expressions are best suited here. Considering string will never change maybe just split the string by spaces? Or a better solution.
How to get the value using a regular expression.
Regular expression you can use: Page \d+ of (\d+)
Regex re = new Regex(#"Page \d+ of (\d+)");
string input = "Page 1 of 15";
Match match = re.Match(input);
string pages = match.Groups[1].Value;
Analysis of expression: between ( and ) you capture a group. The \d stands for digit, the + for one or more digits. Note that it is important to be exact, so copy the spaces as well.
The code is a tad verbose, but I figured it'd be better understandable this way. With a split you just need: var pages = input.Split(' ')[3];, looks easier, but is error-prone. The regex is easily extended to grab other parts of the string in one go.
var myString = "Page 1 of 15";
var number = myString.SubString(myString.LastIndexOf(' ') + 1);
If there is a risk of whitespace at the end of the string then apply a TrimEnd method:
var number = myString.SubString(myString.TrimEnd().LastIndexOf(' ') + 1);
I think a simple String.Replace() is the best, most readable solution for something so simple.
string myString = "Page 1 of 15";
string pageNumber = myString.Replace("Page 1 of ", "");
EDIT:
The above solution assumes that the string will never be Page 2 of 15. If the first page number can change, you'll need to use String.Split() instead.
string myString = "Page 1 of 15";
string pageNumber = myString.Split(new string[] {"of"},
StringSplitOptions.None).Last().Trim();
if the string format will never change then you can try this...
string input = "Page 1 of 15";
string[] splits = input.Split(' ');
int totalPages = int.Parse(splits[splits.Length - 1]);
If this is the only case you will ever have to handle, just split the string by spaces and use the parse the 4th part to an integer. Something like that will work:
string input = "Page 1 of 15";
string[] splitInput = string.Split(' ');
int numberOfPages = int.Parse(splitInput[3]);
In c# it should looks like this (using Regex class):
Regex r = new Regex(#"Page \d+ of (\d+)");
var str = "Page 1 of 15";
var mathes = r.Matches(str);
Your resoult will be in: mathes[0].Groups[1]
You dont need a regex for this. Just the the index of the last space
string var = "1 of100";
string secondvar = string.Empty;
int startindex = var.LastIndexOf(" ");
if (startindex > -1)
{
secondvar = var.Substring(startindex +1);
}

Why Group.Value always the last matched group string?

Recently, I found one C# Regex API really annoying.
I have regular expression (([0-9]+)|([a-z]+))+. I want to find all matched string. The code is like below.
string regularExp = "(([0-9]+)|([a-z]+))+";
string str = "abc123xyz456defFOO";
Match match = Regex.Match(str, regularExp, RegexOptions.None);
int matchCount = 0;
while (match.Success)
{
Console.WriteLine("Match" + (++matchCount));
Console.WriteLine("Match group count = {0}", match.Groups.Count);
for (int i = 0; i < match.Groups.Count; i++)
{
Group group = match.Groups[i];
Console.WriteLine("Group" + i + "='" + group.Value + "'");
}
match = match.NextMatch();
Console.WriteLine("go to next match");
Console.WriteLine();
}
The output is:
Match1
Match group count = 4
Group0='abc123xyz456def'
Group1='def'
Group2='456'
Group3='def'
go to next match
It seems that all group.Value is the last matched string ("def" and "456"). I spent some time to figure out that I should count on group.Captures instead of group.Value.
string regularExp = "(([0-9]+)|([a-z]+))+";
string str = "abc123xyz456def";
//Console.WriteLine(str);
Match match = Regex.Match(str, regularExp, RegexOptions.None);
int matchCount = 0;
while (match.Success)
{
Console.WriteLine("Match" + (++matchCount));
Console.WriteLine("Match group count = {0}", match.Groups.Count);
for (int i = 0; i < match.Groups.Count; i++)
{
Group group = match.Groups[i];
Console.WriteLine("Group" + i + "='" + group.Value + "'");
CaptureCollection cc = group.Captures;
for (int j = 0; j < cc.Count; j++)
{
Capture c = cc[j];
System.Console.WriteLine(" Capture" + j + "='" + c + "', Position=" + c.Index);
}
}
match = match.NextMatch();
Console.WriteLine("go to next match");
Console.WriteLine();
}
This will output:
Match1
Match group count = 4
Group0='abc123xyz456def'
Capture0='abc123xyz456def', Position=0
Group1='def'
Capture0='abc', Position=0
Capture1='123', Position=3
Capture2='xyz', Position=6
Capture3='456', Position=9
Capture4='def', Position=12
Group2='456'
Capture0='123', Position=3
Capture1='456', Position=9
Group3='def'
Capture0='abc', Position=0
Capture1='xyz', Position=6
Capture2='def', Position=12
go to next match
Now, I am wondering why the API design is like this. Why Group.Value only returns the last matched string? This design doesn't look good.
The primary reason is historical: regexes have always worked that way, going back to Perl and beyond. But it's not really bad design. Usually, if you want every match like that, you just leave off the outermost quantifier (+ in ths case) and use the Matches() method instead of Match(). Every regex-enabled language provides a way to do that: in Perl or JavaScript you do the match in /g mode; in Ruby you use the scan method; in Java you call find() repeatedly until it returns false. Similarly, if you're doing a replace operation, you can plug the captured substrings back in as you go with placeholders ($1, $2 or \1, \2, depending on the language).
On the other hand, I know of no other Perl 5-derived regex flavor that provides the ability to retrieve intermediate capture-group matches like .NET does with its CaptureCollections. And I'm not surprised: it's actually very seldom that you really need to capture all the matches in one go like that. And think of all the storage and/or processing power it can take to keep track of all those intermediate matches. It is a nice feature though.

Categories

Resources