I am splitting a string on a period followed by a space (". "), but I want to skip the split when the period belongs to one of a few patterns such as MR., JR., Dr., or a single letter followed by a period.
The pattern list is static (case insensitive).
Examples:
1) My Name is MR. ABC and working for XYZ.
Output: No split. Just one line
2) My Name is Mr. ABC. I work for XYZ.
Output: string[0] = My Name is Mr. ABC.
string[1] = I work for XYZ.
3) My Name is ABC. I work for XYZ.
Output: string[0] = My Name is ABC.
string[1] = I work for XYZ.
4) My Name is MR. ABC Jr. DEF. I work for XYZ.
Output: string[0] = My Name is MR. ABC Jr. DEF. (MR. and Jr. are ignored cases)
string[1] = I work for XYZ.
Using sln's regex pattern, here's a mock-up of how it should work:
List<string> ignores = new List<string>(){ "MR", "MS", "MRS", "DR", "PROF" };
ignores = ignores.Select(x => @"\b" + x).ToList();
string alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
foreach (char letter in alphabet.ToCharArray())
{
ignores.Add(@"\b" + letter);
}
string test = "This is a test for Prof. Plum. Here is a test for Ms. White. This is A. Test. Welcome to GMR. Next Line.";
string regexPattern = $@"(?<!{string.Join("|", ignores)})\.\s";
string[] results = Regex.Split(test, regexPattern, RegexOptions.IgnoreCase);
results contains the five sentences (though you need to re-add the . to the end of all but the last value)
Edited to add all single character ignores
Edited to only account for whole words on ignore list
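If re-adding the period afterwards is inconvenient, one possible variation is to split on the whitespace after the period instead of on the period itself. The sketch below is only an illustration under assumptions: the class name SentenceSplitter is hypothetical, JR is added to the ignore list because the question mentions it, and the single-letter case is written as [A-Z] rather than generated in a loop.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the answer above.
public static class SentenceSplitter
{
    // Split on whitespace that directly follows a period, unless that period
    // ends a whole-word abbreviation or a single letter (case insensitive).
    private static readonly Regex Boundary = new Regex(
        @"(?<!\b(?:MR|MS|MRS|DR|PROF|JR|[A-Z])\.)(?<=\.)\s+",
        RegexOptions.IgnoreCase);

    // The period stays attached to each sentence because only the
    // whitespace is consumed by the split.
    public static string[] Split(string text) => Boundary.Split(text);
}
```

Calling SentenceSplitter.Split("My Name is MR. ABC Jr. DEF. I work for XYZ.") should give the two strings from example 4, each still ending in its period.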
I am trying to extract a first name from a text snippet, which optionally has a last name in the same line as: <first_name>name<last_name>
E.g.:
Text: JohnnameSnow -> Result: John
Text: John -> Result: John
So I want to extract the <first_name> part from that line, but if there is no name<last_name> it should return the full line.
I have tried the following Regex:
([A-zÀ-ÿ-]{2,})(?=(?:name))
That works fine if there's actually a last name in the same line, but does not return me the full line when there is not. Unfortunately the solution doesn't seem to be as easy as adding |$.
Can I look for an optional end word and ignore it if it does not occur?
You can use
^(?<first>\p{L}+?)(?:name(?<last>\p{L}+))?$
See the regex demo.
Details:
^ - start of string
(?<first>\p{L}+?) - Group "first": one or more letters, but as few as possible
(?:name(?<last>\p{L}+))? - an optional non-capturing group:
name - a substring
(?<last>\p{L}+) - Group "last": one or more letters
$ - end of string.
See the C# demo:
var strings = new List<string> { "JohnnameSnow", "John" };
foreach (var s in strings)
{
Console.WriteLine(s);
var m = Regex.Match(s, @"^(?<first>\p{L}+?)(?:name(?<last>\p{L}+))?$");
if (m.Success)
{
Console.WriteLine("First name: {0}, Last name = {1}", m.Groups["first"].Value, m.Groups["last"].Value);
}
else
{
Console.WriteLine("No match!");
}
}
Output:
JohnnameSnow
First name: John, Last name = Snow
John
First name: John, Last name =
How do I only get numbers and include whitespaces in one string and only text and white spaces in another?
I've tried this:
string value1 = "123 45 New York";
string result1 = Regex.Match(value1, @"^[\w\s]*$").Value;
string value2 = "123 45 New York";
string result2 = Regex.Match(value2, @"^[\w\s]*$").Value;
result1 needs to be "123 45"
result2 needs to be " New York"
Try the following code:
string value1 = "123 45 New York";
string digitsAndSpaces = Regex.Match(value1, @"([0-9 ]+)").Value;
string value2 = "123 45 New York";
string lettersAndSpaces = Regex.Match(value2, @"([A-Za-z ])+([A-Za-z ]+)").Value;
Update:
How do I allow characters like å ä ö in the result from value2?
string value3 = "å ä ö";
string speclettersAndSpaces = Regex.Match(value3, @"([a-zÀ-ÿ ])+([a-zÀ-ÿ ]+)").Value;
The following regex allows only digits with spaces between them; the same goes for letters.
Regex: (?:\d[0-9 ]*\d)|(?:[A-Za-z][A-Za-z ]*[A-Za-z])
Details:
(?:) Non-capturing group
\d matches a digit (equal to [0-9])
[] Match a single character present in the list
* Matches between zero and unlimited times
| or
Output:
Match 1
Full match 0-6 `123 45`
Match 2
Full match 7-15 `New York`
Regex demo
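In C#, the same idea could be sketched as below. This is a minimal illustration under assumptions: the class TextParts and its method Extract are hypothetical names, [0-9] is used instead of \d because .NET's \d also matches non-ASCII digits by default, and \p{L} replaces [A-Za-z] so that letters such as å ä ö are accepted as well.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper illustrating the digit run / letter run idea.
public static class TextParts
{
    public static (string Digits, string Letters) Extract(string input)
    {
        // A run that starts and ends with a digit; spaces only in between.
        string digits = Regex.Match(input, @"[0-9](?:[0-9 ]*[0-9])?").Value;
        // A run that starts and ends with a letter; spaces only in between.
        string letters = Regex.Match(input, @"\p{L}(?:[\p{L} ]*\p{L})?").Value;
        // Match.Value is an empty string when nothing matched.
        return (digits, letters);
    }
}
```

For "123 45 New York" this should return "123 45" and "New York", without the leading or trailing spaces.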
I want to code
var text = "14. hello my friends we meet 1 test, 2 baby 3 wiki 4 marvel";
string[] split = text.Split('14.', 1, 2, 3, 4);
var needText = split[0].Replace('14.', '');
"1" "2" "3" "4" is static text.
but, "14." is dynamic text.
ex)
var text2 = "1972. google youtube. 1 phone, 2 star 3 tv 4 mouse";
string[] split = text.Split('1972.', 1, 2, 3, 4);
var needText = split[0].Replace('1972.', '');
If you have dynamic separators like this, String.Split is not suitable. Use Regex.Split instead.
You can give a pattern to Regex.Split and it will treat every substring that matches the pattern as a separator.
In this case, you need a pattern like this:
\d+\. |1|2|3|4
| is the alternation (or) operator. \d matches any digit character. + means match one or more times. \. matches the dot literally, because an unescaped . has a special meaning in regex.
Usage:
var split = Regex.Split(text, "\\d+\\. |1|2|3|4");
And I think the text you need is at index 1 of split.
Remember to add a using directive to System.Text.RegularExpressions!
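Put together, a self-contained version might look like the sketch below. The class DynamicPrefix and its method names are hypothetical; the pattern is the one from this answer, written as a verbatim string. Because the dynamic prefix matches at the very start of the input, Regex.Split returns an empty string at index 0, which is why the wanted text sits at index 1.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the Regex.Split call shown above.
public static class DynamicPrefix
{
    // Separators: a leading "<digits>. " prefix, or one of the static digits 1-4.
    public static string[] SplitParts(string text) =>
        Regex.Split(text, @"\d+\. |1|2|3|4");

    // Index 0 is the empty string before the prefix; index 1 is the text after it.
    public static string LeadingText(string text) => SplitParts(text)[1].Trim();
}
```

Note that this pattern also splits on any 1, 2, 3 or 4 that appears inside a longer number in the body of the text, so it only suits input shaped like the examples.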
If you use IndexOf() with Substring(), you can very easily grab the information you need. If it's any more complex than your examples then use Regex.
var text = "14. hello my friends we meet 1 test, 2 baby 3 wiki 4 marvel";
var strArr = text.Substring(text.IndexOf(' ')).Split('1', '2', '3', '4');
I need to parse a German address that I get in one string like "Example Street 5b". I want to split it in groups: Street, Number and Additional Information.
For example: address = Test Str. 5b
-> Street: "Test Str." Number: "5", Add.: "b"
My code looks like that:
string street = "";
string number = "";
string addition = "";
//this works:
string address = "Test Str. 5b";
//this doesn't match, but I want it in the street group:
//string address = "Test Str.";
Match adressMatch = Regex.Match(address, @"(?<street>.*?\.*)\s*(?<number>[1-9][0-9]*)\s*(?<addition>.*)");
street = adressMatch.Groups["street"].Value;
number = adressMatch.Groups["number"].Value;
addition = adressMatch.Groups["addition"].Value;
That code works well for the example and most other cases.
My problem:
If the address does not contain a number, the match fails. I tried adding *? after the number group and several other things, but then the whole string was parsed into "addition", and "street" and "number" remained empty. If the number is missing, I want the string to be parsed into "street" and "number", and "addition" shall remain empty.
Thanks in advance :)
I would do it like this: I'd match the street into the street group, then match the number - if any - into the number group, and then the rest into the addition group.
Then, if the number group does not succeed, the addition value should be moved to the number group, which can be done easily within C# code.
So, use
(?<street>.*\.)(?:\s*(?<number>[1-9][0-9]*))?\s*(?<addition>.*)
See the regex demo here (note the changes: the first .*? is turned greedy, the * quantifier after \. is removed, the number group is made optional together with the \s* in front).
Then, use this logic (C# sample snippet):
string street = "";
string number = "";
string addition = "";
//string address = "Test Str. 5b"; // => Test Str. | 5 | b
string address = "Test Str. b"; // => Test Str. | b |
Match adressMatch = Regex.Match(address, @"(?<street>.*\.)(?:\s*(?<number>[1-9][0-9]*))?\s*(?<addition>.*)");
if (adressMatch.Success) {
street = adressMatch.Groups["street"].Value;
addition = adressMatch.Groups["addition"].Value;
if (adressMatch.Groups["number"].Success)
number = adressMatch.Groups["number"].Value;
else
{
number = adressMatch.Groups["addition"].Value;
addition = string.Empty;
}
}
Console.WriteLine("Street: {0}\nNumber: {1}\nAddition: {2}", street, number, addition);
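The pattern above requires a period in the street part, so an input like "Example Street 5b" would not match. A possible alternative, sketched here under assumptions, anchors the pattern and makes the number and addition one optional tail. AddressParser and Parse are hypothetical names, and unlike the snippet above this version leaves both "number" and "addition" empty when no number is found.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; the pattern is an assumption, not the answer's regex.
public static class AddressParser
{
    // Lazy street, then an optional tail: whitespace, house number, addition.
    private static readonly Regex Pattern = new Regex(
        @"^(?<street>.*?)(?:\s+(?<number>[1-9][0-9]*)\s*(?<addition>.*))?$");

    public static (string Street, string Number, string Addition) Parse(string address)
    {
        Match m = Pattern.Match(address.Trim());
        // Groups that did not take part in the match have an empty Value.
        return (m.Groups["street"].Value,
                m.Groups["number"].Value,
                m.Groups["addition"].Value);
    }
}
```

Because the street group is lazy, the first whitespace-delimited number wins, so street names that themselves contain a number (for example "Straße des 17. Juni 135") would be split too early and need separate handling.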
Let's say my texts are:
New York, NY is where I live.
Boston, MA is where I live.
Kentwood in the Pines, CA is where I live.
How do I extract just "New York", "Boston", "Kentwood in the Pines".
I can extract the state name with the pattern @"\b,\s(?<state>\w\w)\s\w+\s\w+\s\w\s\w+"
I am using regular expression but I'm not able to figure out how to extract city names as city names can be more than two words or three.
Just substring from the beginning of the string to the first comma:
var city = input.Substring(0, input.IndexOf(','));
This will work if your format is always [City], [State] is where I live. and [City] never contains a comma.
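If the input might not contain a comma, IndexOf returns -1 and Substring throws an ArgumentOutOfRangeException. A guarded variant could look like the following sketch; CityParser and TryExtractCity are hypothetical names chosen for illustration.

```csharp
using System;

// Hypothetical helper guarding the Substring call against a missing comma.
public static class CityParser
{
    public static bool TryExtractCity(string input, out string city)
    {
        int comma = input.IndexOf(',');
        if (comma < 0)
        {
            // No comma: the input does not follow the "[City], [State] ..." format.
            city = string.Empty;
            return false;
        }
        // Everything before the first comma, without surrounding whitespace.
        city = input.Substring(0, comma).Trim();
        return true;
    }
}
```

For "Kentwood in the Pines, CA is where I live." this should return true with city set to "Kentwood in the Pines".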
This is what you need:
static void Main(string[] args)
{
string exp = "New York, NY is where I live. Boston, MA is where I live. Kentwood in the Pines, CA is where I live.";
string reg = @"\b[\w ]+(?=,)"; // + and \b avoid the empty matches and leading spaces that [\w\s]* would produce
var matches = Regex.Matches(exp, reg);
foreach (Match m in matches)
{
Console.WriteLine(m.ToString());
}
Console.ReadLine();
}