Extract multiple values from a string - c#

I need to extract values from a string.
string sTemplate = "Hi [FirstName], how are you and [FriendName]?";
Values I need returned:
FirstName
FriendName
Any ideas on how to do this?

You can use the following regex globally:
\[(.*?)\]
Explanation:
\[ : [ is a meta char and needs to be escaped if you want to match it literally.
(.*?) : match everything in a non-greedy way and capture it.
\] : ] is a meta char and needs to be escaped if you want to match it literally.
Example:
string input = "Hi [FirstName], how are you and [FriendName]?";
string pattern = @"\[(.*?)\]";
Regex rgx = new Regex(pattern);
MatchCollection matches = rgx.Matches(input);
if (matches.Count > 0)
{
    Console.WriteLine("{0} ({1} matches):", input, matches.Count);
    foreach (Match match in matches)
        Console.WriteLine(" " + match.Groups[1].Value); // group 1 is the text inside the brackets
}
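As a rough sketch of the same idea wrapped in a reusable method: the class and method names below are made up for illustration, and the pattern uses a negated character class instead of the lazy quantifier, which is an alternative rather than the answer's exact pattern.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TemplateFields
{
    // Hypothetical helper: returns the text inside each [ ... ] pair, in order.
    public static List<string> Extract(string template)
    {
        // [^\]]+ matches one or more characters that are not a closing bracket,
        // so empty [] pairs are skipped and no lazy quantifier is needed.
        return Regex.Matches(template, @"\[([^\]]+)\]")
                    .Cast<Match>()
                    .Select(m => m.Groups[1].Value)
                    .ToList();
    }
}
```

Calling TemplateFields.Extract("Hi [FirstName], how are you and [FriendName]?") should give a list containing FirstName and FriendName, regardless of where the placeholders sit in the sentence.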

If the format/structure of the text won't be changing at all, and assuming the square brackets were used as markers for the variable, you could try something like this:
string sTemplate = "Hi FirstName, how are you and FriendName?";
// Split the string into two parts. Before and after the comma.
string[] clauses = sTemplate.Split(',');
// Grab the last word in each part.
string[] names = new string[]
{
clauses[0].Split(' ').Last(), // Using LINQ for .Last()
clauses[1].Split(' ').Last().TrimEnd('?')
};
return names;

You will need to tokenize the text and then extract the terms.
char delimiter = ' ';
string[] tokenizedTerms = sTemplate.Split(delimiter);
string firstName = tokenizedTerms[1];
string friendName = tokenizedTerms[6];
// Drop the trailing punctuation character from each term.
char[] firstNameChars = firstName.ToCharArray();
firstName = new String(firstNameChars, 0, firstNameChars.Length - 1);
char[] friendNameChars = friendName.ToCharArray();
friendName = new String(friendNameChars, 0, friendNameChars.Length - 1);
Explanation:
Tokenizing separates the string into a string array in which each element is the character sequence between two delimiters, in this case between spaces, i.e. the words. From this word array we know that we want the 2nd word (index 1) and the 7th word (index 6). However, each of these terms has punctuation at the end, so we convert each string to a char array and then back to a string minus the last character, which is the punctuation.
Note:
This method assumes that each name is a single word. If the name is just Will, it will work. But if one of the names is Will Fisher (first and last name), it will not work. If the template still contains the square brackets, they remain part of each term and have to be trimmed separately, for example with Trim('[', ']').

Related

Regex How to Match 2 fields

How would I capture both the filenames inside the quotes and the numbers following them as named captures (Regex / C#)?
Files("fileone.txt", 5969784, "file2.txt", 45345333)
For every occurrence in the string, I need to capture the filename (e.g. "fileone.txt") and the integer that follows it, so that a loop can process each pair in turn.
I am trying to use https://regex101.com/r/MwMzBo/1 but am having issues matching without the '[' and ']'.
I need to be able to loop over each filename + size pair and then move on to the next.
Any help is appreciated!
UPDATE
string file = "Files(\"fileone.txt\", 5969784, \"file2.txt\", 45345333, \"file2.txt\", 45345333)";
var regex = new Regex(@"(?:\G(?!\A)\s*,\s*|\w+\()(?:""(?<file>.*?)""|'(?<file>.*?)')\s*,\s*(?<number>\d+)");
var match = regex.Match(file);
var names = match.Groups["file"].Captures.Cast<Capture>();
var lengths = match.Groups["number"].Captures.Cast<Capture>();
var filelist = names.Zip(lengths, (f, n) => new { file = f.Value, length = long.Parse(n.Value) }).ToArray();
foreach (var item in filelist)
{
// Only returning 1 pair result, ignoring the rest
}
Reading match.Value to confirm what is being read. Only first pair is being picked up.
while (match.Success)
{
MessageBox.Show(match.Value);
match = match.NextMatch();
}
Now we are getting all results properly. I read that Regex.Match only returns the first match, which explains a lot.
You can use
(?:\G(?!\A)\s*,\s*|\w+\()(?:""(?<file>.*?)""|'(?<file>.*?)')\s*,\s*(?<number>\d+)
See the regex demo
Details:
(?:\G(?!\A)\s*,\s*|\w+\() - end of the previous successful match and a comma enclosed with zero or more whitespaces, or a word and an opening ( char
(?:""(?<file>.*?)""|'(?<file>.*?)') - ", Group "file" capturing any zero or more chars other than a newline char as few as possible and then a ", or a ', Group "file" capturing any zero or more chars other than a newline char as few as possible and then a '
\s*,\s* - a comma enclosed with zero or more whitespaces
(?<number>\d+) - Group "number": one or more digits.
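A minimal sketch of how the pattern above might be used to loop over every filename/size pair. It relies on Regex.Matches (which returns all matches) rather than Regex.Match; the class and method names are assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FilePairs
{
    // Same pattern as above, written as a verbatim string ("" is an escaped quote).
    private static readonly Regex PairRegex = new Regex(
        @"(?:\G(?!\A)\s*,\s*|\w+\()(?:""(?<file>.*?)""|'(?<file>.*?)')\s*,\s*(?<number>\d+)");

    // Hypothetical helper: one entry per filename/size pair, in input order.
    public static List<KeyValuePair<string, long>> Parse(string input)
    {
        var pairs = new List<KeyValuePair<string, long>>();
        foreach (Match m in PairRegex.Matches(input))
        {
            string file = m.Groups["file"].Value;
            long size = long.Parse(m.Groups["number"].Value);
            pairs.Add(new KeyValuePair<string, long>(file, size));
        }
        return pairs;
    }
}
```

Each Match here holds exactly one pair, so there is no need to zip the Captures collections of a single match.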
I like doing it in smaller pieces :
string input = "cov('Age', ['5','7','9'])";
string pattern1 = @"\((?'key'[^,]+),\s+\[(?'values'[^\]]+)";
Match match = Regex.Match(input, pattern1);
string key = match.Groups["key"].Value.Trim(new char[] {'\''});
string pattern2 = @"'(?'value'[^']+)'";
string values = match.Groups["values"].Value;
MatchCollection matches = Regex.Matches(values, pattern2);
int[] number = matches.Cast<Match>().Select(x => int.Parse(x.Value.Replace("'",string.Empty))).ToArray();

How to split string by another string

I have this string (it's from EDI data):
ISA*ESA?ISA*ESA?
The * indicates it could be any character and can be of any length.
? indicates any single character.
Only the ISA and ESA are guaranteed not to change.
I need this split into two strings which could look like this: "ISA~this is date~ESA|" and
"ISA~this is more data~ESA|"
How do I do this in c#?
I can't use string.Split, because there isn't really a delimiter.
You can use Regex.Split to accomplish this:
string splitStr = "|", inputStr = "ISA~this is date~ESA|ISA~this is more data~ESA|";
var regex = new Regex($@"(?<=ESA){Regex.Escape(splitStr)}(?=ISA)", RegexOptions.Compiled);
var items = regex.Split(inputStr);
foreach (var item in items) {
Console.WriteLine(item);
}
Output:
ISA~this is date~ESA
ISA~this is more data~ESA|
Note that if the data between ISA and ESA can itself contain the sequence we split on (ESA, the separator, then ISA), this approach will split there too, and you will need a stricter pattern.
To explain the regex a bit:
(?<=ESA) Look-behind assertion. It requires ESA immediately before the separator but does not consume it, so ESA stays in the first item.
(?=ISA) Look-ahead assertion. It requires ISA immediately after the separator but does not consume it, so ISA stays in the second item.
Using these look-around assertions, only the | character that sits between ESA and ISA is used for splitting.
Simply use
int x = whateverString.IndexOf("?ISA"); // replace ? with the actual character here
and then take the substrings from 0 to x + 1 and from x + 1 to the end, so that the separator stays with the first part.
Edit:
If ? is not known,
we can use the regex Pattern and Matcher classes.
Matcher matcher = Pattern.compile("ISA.*?ESA").matcher(whateverString);
if (matcher.find() && matcher.find()) {
    int x = matcher.start(); // start of the second ISA...ESA block
}
Here x is the start index of the second match, which is where the string should be split.
Edit: I mistakenly read this as a Java question. For C#:
string pattern = @"ISA.*?ESA"; // lazy, so each ISA...ESA block is matched separately
Regex myRegex = new Regex(pattern, RegexOptions.IgnoreCase);
Match m = myRegex.Match(whateverString); // m is the first match
while (m.Success)
{
    Console.WriteLine(m.Value);
    m = m.NextMatch(); // more matches
}
RegEx will probably be the best for this. See this link
Mask would be
ISA(?<data1>.*?)ESA.ISA(?<data2>.*?)ESA.
This will give you 2 groups with data you need
Match match = Regex.Match(input, @"ISA(?<data1>.*?)ESA.ISA(?<data2>.*?)ESA.",RegexOptions.IgnoreCase);
if (match.Success)
{
var data1 = match.Groups["data1"].Value;
var data2 = match.Groups["data2"].Value;
}
Use Regex.Matches if you need to find multiple matches, and specify different RegexOptions if needed.
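For an input with an unknown number of ISA...ESA blocks, a Regex.Matches variant could look roughly like the sketch below. The pattern ISA.*?ESA. (lazy data, then one arbitrary terminator character) is an assumption derived from the question's description, and the class and method names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class EdiSplitter
{
    // Hypothetical helper: returns each "ISA...ESA?" block, terminator included.
    public static string[] SplitInterchanges(string input)
    {
        // Singleline lets '.' match a newline too, in case the terminator is '\n'.
        return Regex.Matches(input, @"ISA.*?ESA.", RegexOptions.Singleline)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }
}
```

Like the other patterns in this thread, this will cut a block short if the data between ISA and ESA itself contains the text ESA.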
It's kinda hacky but you could do...
string x = "ISA*ESA?ISA*ESA?";
x = x.Replace("*","~"); // OR SOME OTHER DELIMITER
string[] y = x.Split('~');
Not perfect in all situations, but it could solve your problem simply.
You could split by "ISA" and "ESA" and then put the parts back together.
string input = "ISA~this is date~ESA|ISA~this is more data~ESA|";
string start = "ISA",
end = "ESA";
var splitedInput = input.Split(new[] { start, end }, StringSplitOptions.None);
var firstPart = $"{start}{splitedInput[1]}{end}{splitedInput[2]}";
var secondPart = $"{start}{splitedInput[3]}{end}{splitedInput[4]}";
// firstPart  == "ISA~this is date~ESA|"
// secondPart == "ISA~this is more data~ESA|"
Use a Regex like ISA(.+?)ESA and select the first group
string input = "ISA~mycontent+ESA";
Match match = Regex.Match(input, @"ISA(.+?)ESA",RegexOptions.IgnoreCase);
if (match.Success)
{
string key = match.Groups[1].Value;
}
Instead of "splitting" by a string, I would instead describe your question as "grouping" by a string. This can easily be done using a regular expression:
Regular expression: ^(ISA.*?(?=ESA)ESA.)(ISA.*?(?=ESA)ESA.)$
Explanation:
^ - asserts position at start of the string
( - start capturing group
ISA - match string ISA exactly
.*?(?=ESA) - match any character 0 or more times, positive lookahead on the
string ESA (basically match any character until the string ESA is found)
ESA - match string ESA exactly
. - match any character
) - end capturing group
repeat one more time...
$ - asserts position at end of the string
Try it on Regex101
Example:
string input = "ISA~this is date~ESA|ISA~this is more data~ESA|";
Regex regex = new Regex(@"^(ISA.*?(?=ESA)ESA.)(ISA.*?(?=ESA)ESA.)$",
    RegexOptions.Compiled);
Match match = regex.Match(input);
if (match.Success)
{
string firstValue = match.Groups[1].Value; // "ISA~this is date~ESA|"
string secondValue = match.Groups[2].Value; // "ISA~this is more data~ESA|"
}
There are two answers to the question "How to split a string by another string".
var matches = input.Split(new [] { "ISA" }, StringSplitOptions.RemoveEmptyEntries);
and
var matches = Regex.Split(input, "ISA").ToList();
However, the first removes empty entries, while the second does not.
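To make that difference concrete, here is a small side-by-side sketch. The wrapper class and method names are invented for the example; the two calls inside are the ones quoted above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitComparison
{
    // string.Split with RemoveEmptyEntries drops the empty piece before the first "ISA".
    public static string[] WithStringSplit(string input)
    {
        return input.Split(new[] { "ISA" }, StringSplitOptions.RemoveEmptyEntries);
    }

    // Regex.Split keeps the empty piece before the first "ISA".
    public static string[] WithRegexSplit(string input)
    {
        return Regex.Split(input, "ISA");
    }
}
```

For the input "ISA~a~ESA|ISA~b~ESA|", the first method returns two elements ("~a~ESA|" and "~b~ESA|"), while the second returns three, starting with an empty string. In both cases the "ISA" marker itself is removed from the pieces.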

How would I split TO extract spaces in C#

Split or Regex.Split is normally used to extract the words of a sentence and store them in an array. I would instead like to extract the spaces in a sentence and store them in an array (the sentence may contain runs of multiple spaces). Is there an easy way of doing this? I first tried splitting normally and then calling string.Split(theSplittedStrings, StringSplitOptions.RemoveEmptyEntries); however, that did not preserve the number of spaces.
---------- EDIT -------------
for example. If there is a sentence "This is a test".
I would like to make an array of string { " ", " ", " "}.
---------- EDIT END ---------
Any help is appreciated.
Thank you.
EDIT:
Based on your edited question, I believe you can do that with simple iteration like:
string str = "This is a test";
List<string> spaceList = new List<string>();
List<char> charList = new List<char>();
foreach (char c in str)
{
    if (c == ' ')
    {
        charList.Add(c);
    }
    else if (charList.Any()) // requires using System.Linq
    {
        // A run of spaces just ended; store it and start a new run.
        spaceList.Add(new string(charList.ToArray()));
        charList = new List<char>();
    }
}
// Store a run of trailing spaces, if the string ends with one.
if (charList.Any())
{
    spaceList.Add(new string(charList.ToArray()));
}
That gives you each run of spaces as a separate element of the List<string>. If you need an array back, you can call ToArray.
(Old Answer)
You don't need string.Split. You can count the spaces in the string and then create array like:
int spaceCount = str.Count(r => r == ' ');
char[] array = Enumerable.Repeat<char>(' ', spaceCount).ToArray();
If you want to consider White-Space (Space, LineBreak, Tabs) as space then you can use:
int whiteSpaceCount = str.Count(char.IsWhiteSpace);
This code matches all spaces in the input string and outputs their indexes:
const string sentence = "This is a test sentence.";
MatchCollection matches = Regex.Matches(sentence, @"\s");
foreach (Match match in matches)
{
Console.WriteLine("Space at character {0}", match.Index);
}
This code retrieves all space groups as an array:
const string sentence = "This is a test sentence.";
string[] spaceGroups = Regex.Matches(sentence, @"\s+").Cast<Match>().Select(arg => arg.Value).ToArray();
In either case, you can look at the Match instances' Index property values to get the location of the space/space group in the string.
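A short sketch of that last point, pairing each run of spaces with its position. The class and method names are hypothetical, and the pattern matches literal space characters only (swap in \s+ if tabs and line breaks should count too).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SpaceRuns
{
    // Hypothetical helper: returns (start index, run) for each run of one or more spaces.
    public static List<KeyValuePair<int, string>> Find(string sentence)
    {
        var runs = new List<KeyValuePair<int, string>>();
        foreach (Match m in Regex.Matches(sentence, " +"))
        {
            runs.Add(new KeyValuePair<int, string>(m.Index, m.Value));
        }
        return runs;
    }
}
```

The Value of each entry preserves the exact number of spaces in the run, which is what the plain Split approach loses.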

How to replace the text between two characters in c#

I am a bit confused about how to write a regex that finds the text between the two delimiters { and } and replaces it with other text in C#. How do I do the replacement?
I tried this:
StreamReader sr = new StreamReader(@"C:abc.txt");
string line;
line = sr.ReadLine();
while (line != null)
{
if (line.StartsWith("<"))
{
if (line.IndexOf('{') == 29)
{
string s = line;
int start = s.IndexOf("{");
int end = s.IndexOf("}");
string result = s.Substring(start+1, end - start - 1);
}
}
//write the line to console window
Console.WriteLine(line);
//Read the next line
line = sr.ReadLine();
}
//close the file
sr.Close();
Console.ReadLine();
I want to replace the found text (result) with another text.
Use Regex with pattern: \{([^\}]+)\}
Regex yourRegex = new Regex(@"\{([^\}]+)\}");
string result = yourRegex.Replace(yourString, "anyReplacement");
string s = "data{value here} data";
int start = s.IndexOf("{");
int end = s.IndexOf("}", start);
string result = s.Substring(start+1, end - start - 1);
s = s.Replace(result, "your replacement value");
To get the string between the braces to be replaced, use the Regex pattern
string errString = "This {match here} uses 3 other {match here} to {match here} the {match here}ation";
string toReplace = Regex.Match(errString, @"\{([^\}]+)\}").Groups[1].Value;
Console.WriteLine(toReplace); // prints 'match here'
To then replace the text found you can simply use the Replace method as follows:
string correctString = errString.Replace(toReplace, "document");
Explanation of the Regex pattern:
\{ # Escaped curly brace, means "starts with a '{' character"
( # Parentheses in a regex mean "put (capture) the stuff
# in between into the Groups array"
[^\}] # Any character that is not a '}' character
+ # One or more occurrences of the aforementioned "non '}' char"
) # Close the capturing group
\} # "Ends with a '}' character"
The following regular expression will match the criteria you specified:
string pattern = @"^(\<.{27})(\{[^}]*\})(.*)";
The following would perform a replace:
string result = Regex.Replace(input, pattern, "$1 REPLACE $3");
For the input: "<012345678901234567890123456{sdfsdfsdf}sadfsdf" this gives the output "<012345678901234567890123456 REPLACE sadfsdf"
You need two calls to Substring(), rather than one: One to get textBefore, the other to get textAfter, and then you concatenate those with your replacement.
int start = s.IndexOf("{");
int end = s.IndexOf("}");
// I skip the check that end is valid to avoid clutter
string textBefore = s.Substring(0, start);
string textAfter = s.Substring(end+1);
string replacedText = textBefore + newText + textAfter;
If you want to keep the braces, you need a small adjustment:
int start = s.IndexOf("{");
int end = s.IndexOf("}");
string textBefore = s.Substring(0, start + 1); // include the opening brace
string textAfter = s.Substring(end);
string replacedText = textBefore + newText + textAfter;
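Both variants can be folded into one method, sketched below with the validity check that the snippets above skip. The class name, method name, and the keepBraces parameter are assumptions introduced for this example, and only the first {...} pair is handled.

```csharp
using System;

public static class BraceReplacer
{
    // Hypothetical helper: replaces the text between the first '{' and the next '}'.
    public static string ReplaceFirst(string s, string newText, bool keepBraces)
    {
        int start = s.IndexOf('{');
        int end = start < 0 ? -1 : s.IndexOf('}', start + 1);
        if (end < 0)
        {
            return s; // no complete {...} pair, leave the input unchanged
        }

        int keep = keepBraces ? 1 : 0;
        string textBefore = s.Substring(0, start + keep);  // optionally includes '{'
        string textAfter = s.Substring(end + 1 - keep);    // optionally includes '}'
        return textBefore + newText + textAfter;
    }
}
```

With the input "data{value here} data" and the replacement "X", this should return "dataX data" when keepBraces is false and "data{X} data" when it is true.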
The simplest way, if you want to avoid regex, is to use the Split method. This is one approach:
string s = "sometext {getthis}";
string result= s.Split(new char[] { '{', '}' })[1];
You can use the Regex expression that some others have already posted, or you can use a more advanced Regex that uses balancing groups to make sure the opening { is balanced by a closing }.
That expression is then (?<BRACE>\{)([^\}]*)(?<-BRACE>\})
You can test this expression online at RegexHero.
You simply match your input string with this Regex pattern, then use the replace methods of Regex, for instance:
var result = Regex.Replace(input, @"(?<BRACE>\{)([^\}]*)(?<-BRACE>\})", textToReplaceWith);
For more C# Regex Replace examples, see http://www.dotnetperls.com/regex-replace.

Regex Pattern - Alphanumeric

[username] where username is any string containing only alphanumeric chars between 1 and 12 characters long
My code:
Regex pat = new Regex(@"\[[a-zA-Z0-9_]{1,12}\]");
MatchCollection matches = pat.Matches(accountFileData);
foreach (Match m in matches)
{
string username = m.Value.Replace("[", "").Replace("]", "");
MessageBox.Show(username);
}
Gives me one blank match.
This gets you a name inside brackets (the match doesn't contain the square bracket symbols):
(?<=\[)[A-Za-z0-9]{1,12}(?=\])
You could use it like:
Regex pat = new Regex(@"(?<=\[)[A-Za-z0-9]{1,12}(?=\])");
MatchCollection matches = pat.Matches(accountFileData);
foreach (Match m in matches)
{
MessageBox.Show(m.Value);
}
You have too many brackets, and you may want to match the beginning (^) and end ($) of the string.
^[a-zA-Z0-9]{1,12}$
If you are expecting square brackets in the string you are matching, then escape them with a backslash.
\[[a-zA-Z0-9]{1,12}\]
// In C#
new Regex(@"\[[a-zA-Z0-9]{1,12}\]")
You have too many brackets.
[a-zA-Z0-9]{1,12}
If you're trying to match the brackets, they need to be escaped properly (note that the quantifier must not contain a space, otherwise .NET treats it as literal text):
\[[a-zA-Z0-9]{1,12}\]
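As a sketch of how the escaped-bracket pattern might be used to pull out the usernames directly, a capturing group avoids the Replace("[", "") calls from the question. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class Usernames
{
    // Hypothetical helper: returns every 1 to 12 character alphanumeric name
    // that appears between square brackets.
    public static List<string> Extract(string data)
    {
        return Regex.Matches(data, @"\[([A-Za-z0-9]{1,12})\]")
                    .Cast<Match>()
                    .Select(m => m.Groups[1].Value) // group 1 excludes the brackets
                    .ToList();
    }
}
```

Names containing other characters (such as an underscore), names longer than 12 characters, and empty brackets are not matched by this pattern.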
