Regex Help String Matching - c#

I've got a long string in the format of:
WORD_1#WORD_3#WORD_5#CAT_DOG_FISH#WORD_2#WORD_3#CAT_DOG_FISH_2#WORD_7
I'm trying to dynamically match a string so I can return its position within the string.
I know the string will start with CAT_DOG_ but the FISH is dynamic and could be anything. It's also important not to match CAT_DOG_FISH_2, i.e. a word that ends in _ followed by an integer.
Basically, I need to get back a match on any word starting with [CAT_DOG_] but not ending in [_(int)].
I've tried a few different things and I don't seem to be getting anywhere; any help is appreciated.
Once I have the regex match, I'll be able to get the index of the match and then work out where the next # (delimiter) is. That gives me the start/end position of the word, so I can then use Substring to return the full word.
I hope that makes sense?

Personally, I avoid regular expressions whenever possible because I find them hard to read and maintain unless you use them a lot, so here is a non-regex solution:
using System.Linq;

string words = "WORD_1#WORD_3#WORD_5#CAT_DOG_FISH#WORD_2#WORD_3#CAT_DOG_FISH_2#WORD_7";
var result = words.Split('#')
    .Select((w, p) => new { WholeWord = w, SplitWord = w.Split('_'), Position = p, Dynamic = w.Split('_').Last() })
    .FirstOrDefault(
        x => x.SplitWord.Length == 3 &&   // CAT_DOG_FISH_2 splits into 4 parts, so it is rejected
             x.SplitWord[0] == "CAT" &&
             x.SplitWord[1] == "DOG");
That gives you the whole word, the dynamic part and the position (the index of the word in the split array, not a character offset). It does assume the dynamic part doesn't contain underscores.
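Position above is the word's index in the split array. If you need the character offset within the original string, as the question asks, one option is to keep a running total while walking the split words. A minimal sketch that keeps this answer's assumption that the dynamic part contains no underscore; CatDogFinder and Find are hypothetical names:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original answer.
public static class CatDogFinder
{
    private const string Prefix = "CAT_DOG_";

    // Returns the character offset and text of every '#'-delimited word that
    // starts with "CAT_DOG_" and whose dynamic part is non-empty and has no
    // further underscore (the same assumption as the LINQ answer).
    public static List<(int Offset, string Word)> Find(string input)
    {
        var results = new List<(int Offset, string Word)>();
        int offset = 0;
        foreach (string word in input.Split('#'))
        {
            bool isMatch = word.StartsWith(Prefix, StringComparison.Ordinal)
                && word.Length > Prefix.Length
                && word.IndexOf('_', Prefix.Length) < 0;
            if (isMatch)
                results.Add((offset, word));
            offset += word.Length + 1; // +1 skips the '#' delimiter
        }
        return results;
    }
}
```

For the sample string from the question this returns a single entry: offset 21 and CAT_DOG_FISH.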

You can use the following regex:
\bCAT_DOG_[a-zA-Z]+(?!_\d)\b
Or (if the FISH is really anything, but not _ or #):
\bCAT_DOG_[^_#]+(?!_\d)\b
The word boundaries \b, together with the negative lookahead (?!_\d) (the match must not be followed by _ and a digit), make sure only the required strings are returned. The [^_#] character class matches any character except _ or #.
You can get the indices using LINQ:
using System.Linq;
using System.Text.RegularExpressions;

var s = "WORD_1#WORD_3#WORD_5#CAT_DOG_FISH#WORD_2#WORD_3#CAT_DOG_FISH_2#WORD_7";
var rx1 = new Regex(@"\bCAT_DOG_[^_#]+(?!_\d)\b");
var indices = rx1.Matches(s).Cast<Match>().Select(p => p.Index).ToList();
Values can be obtained like this:
var values = rx1.Matches(s).Cast<Match>().Select(p => p.Value).ToList();
Or together:
var pairs = rx1.Matches(s).OfType<Match>().Select(p => new { p.Index, p.Value }).ToList();
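Put together as one self-contained sketch (CatDogRegex and Find are illustrative names, not from the answer), the pattern from this answer returns each match's index and value and leaves CAT_DOG_FISH_2 out:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Illustrative wrapper around the answer's pattern.
public static class CatDogRegex
{
    // \b ... \b keeps the match inside one word; (?!_\d) rejects a dynamic
    // part that is followed by "_" and a digit, as in CAT_DOG_FISH_2.
    private static readonly Regex Pattern = new Regex(@"\bCAT_DOG_[^_#]+(?!_\d)\b");

    public static List<(int Index, string Value)> Find(string input) =>
        Pattern.Matches(input)
               .Cast<Match>()
               .Select(m => (m.Index, m.Value))
               .ToList();
}
```

For the question's sample, Find returns one pair: index 21 and CAT_DOG_FISH. Backtracking cannot rescue CAT_DOG_FISH_2, because any shorter prefix such as CAT_DOG_FIS would need a word boundary between two letters.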

Thanks for the help, guys. Since I know the int the string will end with, I've settled on this:
int i = 0; // the trailing integer to exclude, known in advance
string[] words = textBox1.Text.Split('#');
foreach (string word in words)
{
    // Keep CAT_DOG_ words that do not end in "_<i>".
    if (word.StartsWith("CAT_DOG_") && !word.EndsWith("_" + i))
    {
        //process here
        MessageBox.Show("match is: " + word);
    }
}
Thanks to Eser for pointing me towards String.Split().
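The loop above needs the trailing integer to be known in advance. If it can be any integer, one option is to test for an underscore followed by digits at the end of the word instead. A sketch of that variation; CatDogFilter and Matches are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: excludes any word ending in "_<digits>".
public static class CatDogFilter
{
    // Matches "_" followed by one or more digits at the very end of a word.
    private static readonly Regex TrailingInt = new Regex(@"_\d+$");

    public static List<string> Matches(string text)
    {
        var matches = new List<string>();
        foreach (string word in text.Split('#'))
        {
            if (word.StartsWith("CAT_DOG_", StringComparison.Ordinal) && !TrailingInt.IsMatch(word))
                matches.Add(word);
        }
        return matches;
    }
}
```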

Related

Get a number and string from string

I have a kinda simple problem, but I want to solve it in the best way possible. Basically, I have a string in this format: <some letters><some numbers>, e.g. q1 or qwe12. What I want to do is get two strings from that (then I can convert the number part to an integer, or not, whatever). The first one would be the "string part" of the given string, e.g. qwe, and the second one would be the "number part", so 12. And there won't be a situation where the numbers and letters are mixed up, like qw1e2.
Of course, I know that I can use a StringBuilder and a for loop that checks whether each character is a digit or a letter. Easy. But I don't think that is a very clean solution, so I am asking: is there a way, a built-in method or something like that, to do this in 1-3 lines? Or at least without using a loop?
You can use a regular expression with named groups to identify the different parts of the string you are interested in.
For example:
using System;
using System.Text.RegularExpressions;

string input = "qew123";
var match = Regex.Match(input, "(?<letters>[a-zA-Z]+)(?<numbers>[0-9]+)");
if (match.Success)
{
    Console.WriteLine(match.Groups["letters"].Value);
    Console.WriteLine(match.Groups["numbers"].Value);
}
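If the whole input must be exactly letters followed by digits, anchoring the pattern with ^ and $ rejects mixed input such as qw1e2 instead of matching only part of it. A sketch that also converts the number part; LetterNumberSplitter and TrySplit are hypothetical names:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper built on the named-group pattern above.
public static class LetterNumberSplitter
{
    // ^ and $ force the whole string to be letters followed by digits.
    private static readonly Regex Pattern =
        new Regex(@"^(?<letters>[a-zA-Z]+)(?<numbers>[0-9]+)$");

    public static bool TrySplit(string input, out string letters, out int number)
    {
        letters = string.Empty;
        number = 0;
        Match match = Pattern.Match(input);
        if (!match.Success)
            return false;
        letters = match.Groups["letters"].Value;
        // int.TryParse also guards against digit runs too long for an int.
        return int.TryParse(match.Groups["numbers"].Value, out number);
    }
}
```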
You can try LINQ as an alternative to regular expressions:
using System.Linq;

string source = "qwe12";
// Letters: everything before the first digit; digits: everything from the first digit on.
string letters = string.Concat(source.TakeWhile(c => c < '0' || c > '9'));
string digits = string.Concat(source.SkipWhile(c => c < '0' || c > '9'));
You can use the Where() extension method from the System.Linq namespace (https://learn.microsoft.com/en-us/dotnet/api/system.linq.enumerable.where) to keep only the characters that are (or are not) digits, then turn the resulting IEnumerable<char> into a char array and use it to create a new string:
using System;
using System.Linq;

string source = "qwe12";
string stringPart = new string(source.Where(c => !Char.IsDigit(c)).ToArray());
string numberPart = new string(source.Where(Char.IsDigit).ToArray());
MessageBox.Show($"String part: '{stringPart}', Number part: '{numberPart}'");
Source:
https://stackoverflow.com/a/15669520/8133067
If possible, add a space between the letters and numbers (q 3, zet 64, etc.) and use string.Split.
Otherwise, use the for loop; it isn't that hard.
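Because the digits always come after the letters, the loop only has to find the first digit; Substring does the rest, with no StringBuilder needed. A sketch assuming ASCII digits; FirstDigitSplitter and Split are hypothetical names:

```csharp
// Hypothetical helper: splits "<letters><digits>" at the first ASCII digit.
public static class FirstDigitSplitter
{
    public static (string Letters, string Digits) Split(string input)
    {
        // Index of the first digit; defaults to the end of the string.
        int firstDigit = input.Length;
        for (int i = 0; i < input.Length; i++)
        {
            if (input[i] >= '0' && input[i] <= '9')
            {
                firstDigit = i;
                break;
            }
        }
        // Substring(input.Length) is allowed and returns an empty string.
        return (input.Substring(0, firstDigit), input.Substring(firstDigit));
    }
}
```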
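You can also classify each character as part of an aggregation (Assert.Equal here is from xUnit):
using System;
using System.Linq;

var z = "qwe12345";
// acc[0] collects letters, acc[1] collects digits.
var b = z.Aggregate(new[] { "", "" }, (acc, s) => {
    if (Char.IsDigit(s)) {
        acc[1] += s;
    } else {
        acc[0] += s;
    }
    return acc;
});
Assert.Equal(new[] { "qwe", "12345" }, b);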

How to get the count of only special character in a string using Regex?

If my input string is ~!@#$%^&*()_+{}:"<>?
How do I get the count of each special character using Regex? For example:
Regex.Matches(inputText, "each special character").Count;
This should be the answer to your question:
Regex.Matches("Little?~ birds! like to# sing##", "[~!@#$%^&*()_+{}:\"<>?]").Count
Count returns 6 matches here; replace the sample sentence with your own variable.
You can find more info about regular expressions here:
http://www.zytrax.com/tech/web/regex.htm
Best Regards!
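The question asks for the count of each special character, whereas the call above gives a single total. One way to get per-character counts is to group the regex matches. A sketch that treats anything other than a letter, digit or whitespace as special (that definition is an assumption; swap in the explicit character class above if you need a fixed set); SpecialCharCounter and CountEach are hypothetical names:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: per-character counts of "special" characters.
public static class SpecialCharCounter
{
    // Assumed definition of special: [^\p{L}\p{N}\s] matches one character
    // that is not a letter, a digit, or whitespace.
    public static Dictionary<char, int> CountEach(string input) =>
        Regex.Matches(input, @"[^\p{L}\p{N}\s]")
             .Cast<Match>()
             .GroupBy(m => m.Value[0])
             .ToDictionary(g => g.Key, g => g.Count());
}
```

For the sentence above this gives ? 1, ~ 1, ! 1 and # 3, which adds up to the same total of 6.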
Instead of thinking of every special character and adding them up, do it the other way: count every letter and digit and subtract that from the total length.
You can do that with a simple one-liner (it needs using System.Linq;):
string input = "abc?&;3";
int numberOfSpecialCharacters = input.Length - input.Count(char.IsLetterOrDigit); // gives 3
Which you can also change to
int numberOfSpecialCharacters = input.Count(c => !char.IsLetterOrDigit(c));
Regex is not the best tool for this; here is a LINQ-based solution that prints each special character with its count:
using System;
using System.Linq;

string chars = "~!@#$%^&*()_+{}:\"<>?";
foreach (var item in chars.Where(x => !char.IsLetterOrDigit(x)).GroupBy(x => x))
{
    Console.WriteLine(string.Format("{0},{1}", item.Key, item.Count()));
}
I understand that you need the count of each special character. Correct me if I am mistaken.
The non-regex way (which sounds much easier) is to make a list of the characters you want to check and use LINQ to count them.
string inputString = "asdf1!%jkl(!*";
List<char> charsToCheckFor = new List<char>() { '!', '@', '#', ..... };
int charCount = inputString.Count(x => charsToCheckFor.Contains(x));
I am leaving it to you to fill in all the characters you need to check for, because you need to decide what counts as special.
If you want to follow another approach, you can use a plain loop:
string str = "#123:*&^789'!##$*()_+=";
int count = 0;
foreach (char c in str)
{
    if (!char.IsLetterOrDigit(c))
    {
        count++;
    }
}
MessageBox.Show(count.ToString());
It's been a while and I needed a similar answer for handling password validation. Pretty much what VITA said, but here was my specific take for others needing it for the same thing:
var pwdSpecialCharacterCount = Regex.Matches(item, "[~!@#$%^&*()_+{}:\"<>?]").Count;
var pwdMinNumericalCharacters = Regex.Matches(item, "[0-9]").Count;
var pwdMinUpperCaseCharacters = Regex.Matches(item, "[A-Z]").Count;
var pwdMinLowerCaseCharacters = Regex.Matches(item, "[a-z]").Count;
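Those counts can then be compared against a policy. A minimal sketch; the minimum of one character per category is a hypothetical policy rather than something from the answer, the special-character set is the one used above, and PasswordPolicy and IsValid are illustrative names:

```csharp
using System.Text.RegularExpressions;

// Illustrative validator built on the counts above.
public static class PasswordPolicy
{
    // Hypothetical policy: at least one character from every category.
    private const int MinPerCategory = 1;

    public static bool IsValid(string password)
    {
        int special = Regex.Matches(password, "[~!@#$%^&*()_+{}:\"<>?]").Count;
        int digits = Regex.Matches(password, "[0-9]").Count;
        int upper = Regex.Matches(password, "[A-Z]").Count;
        int lower = Regex.Matches(password, "[a-z]").Count;
        return special >= MinPerCategory && digits >= MinPerCategory
            && upper >= MinPerCategory && lower >= MinPerCategory;
    }
}
```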

how to extract a whole sentence by a single word match in a string?

So I have a whole string (about 10k chars) and I'm searching for a word (or many words) in that string with new Regex(word).Matches(scrappedstring).
But how do I extract the whole sentence that contains that word? I was thinking of taking a substring from the searched word up to the first dot/exclamation mark/question mark/etc. But how do I get the part of the sentence before the searched word?
Or maybe there's a better logic ?
If your boundaries are e.g. ., !, ? and ;, match sentences with the expression [^.!?;]*(wordmatch)[^.!?;]*.
It will return every sentence that contains the desired wordmatch.
Example:
using System.Linq;
using System.Text.RegularExpressions;

var s = "First sentence. Second with wordmatch ? Third one; The last wordmatch, EOM!";
var r = new Regex("[^.!?;]*(wordmatch)[^.!?;]*");
var m = r.Matches(s);
var result = Enumerable.Range(0, m.Count).Select(index => m[index].Value).ToList();
You can get the substrings between sentence terminators (dot/exclamation mark/question mark/etc.) and search for the word in each sentence inside a loop.
Then return the substring when you find the matching word.
Once you have a position, you would then read forward up to the next . or the end of the string, but you also need to read backwards from the beginning of the word to a . or the beginning of the string. Those two positions let you extract the sentence.
Note that this is not foolproof: in its simplest form, as outlined above, an abbreviation such as "e.g." would make the sentence appear to start after the "g.", which is probably not the case.
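The read-backwards-and-forwards idea can be sketched with LastIndexOfAny and IndexOfAny. This assumes '.', '!' and '?' are the only sentence terminators and, as noted, is still fooled by abbreviations; SentenceFinder, SentenceAround and FindSentences are hypothetical names:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for the scan-around-the-match approach.
public static class SentenceFinder
{
    // Assumed terminators; abbreviations such as "e.g." will still split sentences.
    private static readonly char[] Terminators = { '.', '!', '?' };

    // Returns the sentence containing the character at `index`.
    public static string SentenceAround(string text, int index)
    {
        // Search backwards for the previous terminator; start just after it (or at 0).
        int start = index == 0 ? 0 : text.LastIndexOfAny(Terminators, index - 1) + 1;
        // Search forwards for the next terminator; include it, or run to the end.
        int end = text.IndexOfAny(Terminators, index);
        end = end < 0 ? text.Length : end + 1;
        return text.Substring(start, end - start).Trim();
    }

    // Finds every whole-word occurrence of `word` and collects its sentence once.
    public static List<string> FindSentences(string text, string word)
    {
        var sentences = new List<string>();
        foreach (Match m in Regex.Matches(text, $@"\b{Regex.Escape(word)}\b"))
        {
            string sentence = SentenceAround(text, m.Index);
            if (!sentences.Contains(sentence))
                sentences.Add(sentence);
        }
        return sentences;
    }
}
```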
Extract the sentences from the input, then search for the specified word(s) within each sentence.
Return the sentences where the word(s) are present.
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public List<string> GetMatchedString(string match, string input)
{
    // Split on sentence terminators (the terminators themselves are dropped).
    var sentenceList = input.Split(new char[] { '.', '?', '!' });
    var regex = new Regex(match);
    return sentenceList.Where(sentence => regex.IsMatch(sentence)).ToList();
}
You can do that with a two-step process:
first split the text into phrases, then keep only the phrases that contain the word.
Something like this:
using System.Linq;
using System.Text.RegularExpressions;

var input = "A large text with many sentences. Many chars in a string!. A sentence without the pattern word.";
//Step 1: fragment phrase.
var patternPhrase = @"(?<=(^|[.!?]\s*))[^ .!?][^.!?]+[.!?]";
//Step 2: keep only the phrases containing the word.
var patternWord = @"many";
var result = Regex
.Matches(input, patternPhrase) // step 1
.Cast<Match>()
.Select(s => s.Value)
.Where(w => Regex.IsMatch(w, patternWord, RegexOptions.IgnoreCase)); // step 2
foreach (var item in result)
{
//do something with any phrase.
}

Extract digit in a string

I have a list of strings:
goal0=1234.4334abc12423423
goal1=-234234
asdfsdf
I want to extract the number part from the strings that start with goal;
in the above case that is
1234.4334, -234234
(if there are two fragments of digits, take the first one).
How should I do it easily?
Note that "goal0=" is part of the string; goal0 is not a variable.
Therefore I would like to get the first digit fragment that comes after "=".
You can do the following:
using System.Globalization;
using System.Linq;

string input = "goal0=1234.4334abc12423423";
input = input.Substring(input.IndexOf('=') + 1);
// Take characters from the front while they can be part of the number.
string result = string.Concat(input.TakeWhile(c => char.IsDigit(c) || c == '.' || c == '-'));
// Parse with the invariant culture so '.' is always the decimal separator.
double dResult = double.Parse(result, CultureInfo.InvariantCulture);
Try this:
string s = "goal0=-1234.4334abc12423423";
string matches = Regex.Match(s, @"(?<=^goal\d+=)-?\d+(\.\d+)?").Value;
The regex means:
(?<=^goal\d+=) - A positive lookbehind: look back and make sure goal(1 or more digits)= is at the start of the string, but don't make it part of the match
-? - An optional minus sign (the ? means zero or one)
\d+ - One or more digits
(\.\d+)? - An optional decimal point followed by 1 or more digits
This also works if your string contains multiple decimal points: only the first set of digits after the first decimal point, if any, is taken.
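Applied to the list from the question, the pattern above can skip lines that don't start with goal and parse the first number after =. A sketch that parses with the invariant culture so '.' is always the decimal separator; GoalParser and ParseAll are hypothetical names:

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper around the lookbehind pattern above.
public static class GoalParser
{
    // Optional '-', digits, optional fractional part, right after "goal<n>=".
    private static readonly Regex Pattern = new Regex(@"(?<=^goal\d+=)-?\d+(\.\d+)?");

    public static List<double> ParseAll(IEnumerable<string> lines)
    {
        var values = new List<double>();
        foreach (string line in lines)
        {
            Match m = Pattern.Match(line);
            if (m.Success) // lines such as "asdfsdf" are skipped
                values.Add(double.Parse(m.Value, CultureInfo.InvariantCulture));
        }
        return values;
    }
}
```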
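Use a regex for extracting:
x = Regex.Match(input, @"\d+").Value;
Now convert the resulting string to a number by using:
finalNumber = Int32.Parse(x);
Note that \d+ finds the first run of digits anywhere in the string (the 0 in goal0=, for instance) and ignores any sign or decimal part, so apply it to the text after the =.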
Please try this:
string sample = "goal0=1234.4334abc12423423goal1=-234234asdfsdf";
Regex test = new Regex(@"(?<=\=)\-?\d*(\.\d*)?", RegexOptions.Singleline);
MatchCollection matchlist = test.Matches(sample);
string[] result = new string[matchlist.Count];
if (matchlist.Count > 0)
{
for (int i = 0; i < matchlist.Count; i++)
result[i] = matchlist[i].Value;
}
Hope it helps.
I didn't get the question at first. Sorry, but it works now.
I think this simple expression should work:
Regex.Match(input, @"\d+")
You can use the old VB Val() function from C# (it lives in the Microsoft.VisualBasic assembly). It extracts a number from the front of a string, so pass it the text after the "=":
result0 = Microsoft.VisualBasic.Conversion.Val(goal0.Substring(goal0.IndexOf('=') + 1));
result1 = Microsoft.VisualBasic.Conversion.Val(goal1.Substring(goal1.IndexOf('=') + 1));
string s = "1234.4334abc12423423";
var result = System.Text.RegularExpressions.Regex.Match(s, @"-?\d+");
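using System.Collections.Generic;
using System.Text.RegularExpressions;

List<string> list = new List<string>();
list.Add("goal0=1234.4334abc12423423");
list.Add("goal1=-23423");
list.Add("asdfsdf");
// Optional minus, digits, optional fractional part (single-digit values also match).
Regex regex = new Regex(@"^goal\d+=(?<GoalNumber>-?\d+(\.\d+)?)");
foreach (string s in list)
{
    Match match = regex.Match(s);
    if (match.Success)
    {
        string numberPart = match.Groups["GoalNumber"].Value;
        // do something with numberPart
    }
}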

C# string does not contain possible?

I'd like to know when a string does not contain either of two strings. For example:
string firstString = "pineapple";
string secondString = "mango";
string compareString = "The wheels on the bus go round and round";
So, I want to know when the first string and second string are not in the compareString.
How?
This should do the trick for you.
For one word:
if (!compareString.Contains("One"))
For two words (true only when neither word is present):
if (!(compareString.Contains("One") || compareString.Contains("Two")))
You should put all your words into some kind of collection or list and then call it like this (this needs using System.Linq;):
var searchFor = new List<string>();
searchFor.Add("pineapple");
searchFor.Add("mango");
bool containsAnySearchString = searchFor.Any(word => compareString.Contains(word));
If you need a case- or culture-independent search, call it like this:
bool containsAnySearchString =
    searchFor.Any(word => compareString.IndexOf(
        word, StringComparison.InvariantCultureIgnoreCase) >= 0);
The question's condition is then !containsAnySearchString.
Using && lets you take advantage of short-circuiting (the second Contains call is skipped when the first one returns false):
bool containsBoth = compareString.Contains(firstString) &&
    compareString.Contains(secondString);
Use the string.Contains method and negate the combined check:
var result =
    !(compareString.Contains(firstString) || compareString.Contains(secondString));
bool isFirst = compareString.Contains(firstString);
bool isSecond = compareString.Contains(secondString );
Option with a regex if you want to discriminate between Mango and Mangosteen:
var reg = new Regex(@"\b(pineapple|mango)\b",
RegexOptions.IgnoreCase | RegexOptions.Multiline);
if (!reg.Match(compareString).Success)
...
The accepted answer, and most others, fail logically when an unrelated word contains one of the search words, such as "low" in "follow". Those are separate words, yet .Contains and IndexOf still report a match.
Word Boundary
What is needed is to say that a word must stand alone and not be inside another word. The straightforward way to handle that is a regular expression that uses the word boundary \b to isolate each word properly.
Tests And Example
string first = "name";
var second = "low";
var sentance = "Follow your surname";
var ignorableWords = new List<string> { first, second };
The following are two tests culled from other answers (to show the failure) and then the suggested answer.
// To work, there must be *NO* words that match.
ignorableWords.Any(word => sentance.Contains(word)); // Returns True (wrong)
ignorableWords.Any(word => // Returns True (wrong)
sentance.IndexOf(word,
StringComparison.InvariantCultureIgnoreCase) >= 0);
// The only one that returns False (correct)
ignorableWords.Any(word =>
    Regex.IsMatch(sentance, $@"\b{Regex.Escape(word)}\b", RegexOptions.IgnoreCase));
Summary
.Any(word => Regex.IsMatch(sentance, $@"\b{Regex.Escape(word)}\b", RegexOptions.IgnoreCase))
Works with one or many words to check against.
No false matches inside other words.
Case is ignored.
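For the original question (true only when none of the words appear as whole words), the summary pattern can be wrapped in a small helper. A sketch; WordCheck and ContainsNone are hypothetical names, and Regex.Escape is added so words containing regex metacharacters are matched literally:

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper around the word-boundary check.
public static class WordCheck
{
    // True when none of `words` occurs in `text` as a whole word (case-insensitive).
    public static bool ContainsNone(string text, params string[] words) =>
        !words.Any(word =>
            Regex.IsMatch(text, $@"\b{Regex.Escape(word)}\b", RegexOptions.IgnoreCase));
}
```

With the test data above, ContainsNone("Follow your surname", "name", "low") returns true, because neither word stands alone.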
