string.IndexOf search for whole word match


I am seeking a way to search a string for an exact match or whole word match. Regex.Match and Regex.IsMatch don't seem to get me where I want to be. Consider the following scenario:
using System;
using System.Text.RegularExpressions;

namespace test
{
    class Program
    {
        static void Main(string[] args)
        {
            string str = "SUBTOTAL 34.37 TAX TOTAL 37.43";
            // IndexOf returns the first occurrence, which is inside SUBTOTAL (index 3).
            int indx = str.IndexOf("TOTAL");
            string amount = str.Substring(indx + "TOTAL".Length, 10);
            // Keep only digits and the decimal point.
            string strAmount = Regex.Replace(amount, "[^.0-9]", "");
            Console.WriteLine(strAmount);
            Console.WriteLine("Press any key to continue...");
            Console.ReadKey();
        }
    }
}
The output of the above code is:
// 34.37
// Press any key to continue...
The problem is that I don't want SUBTOTAL, but IndexOf finds the first occurrence of TOTAL, which is inside SUBTOTAL, and that yields the incorrect value of 34.37.
So the question is: is there a way to force IndexOf to find only an exact match, or is there another way to force an exact whole word match, so that I can find the index of that match and then do something useful with it? Regex.IsMatch and Regex.Match are, as far as I can tell, simply boolean searches. In this case it isn't enough to know that the exact match exists; I need to know where it exists in the string.
Any advice would be appreciated.

You can use Regex:
string str = "SUBTOTAL 34.37 TAX TOTAL 37.43";
var indx = Regex.Match(str, @"\WTOTAL\W").Index; // will be 18, the index of the space before TOTAL (the word itself starts at 19)

This method avoids Regex, so it should be faster than the accepted answer.
string str = "SUBTOTAL 34.37 TAX TOTAL 37.43";
var indx = str.IndexOfWholeWord("TOTAL"); // 19
// Extension methods must be declared in a static class (the class name is arbitrary).
public static class StringExtensions
{
    public static int IndexOfWholeWord(this string str, string word)
    {
        for (int j = 0; j < str.Length &&
            (j = str.IndexOf(word, j, StringComparison.Ordinal)) >= 0; j++)
        {
            // Accept the hit only if no letter or digit touches it on either side.
            if ((j == 0 || !char.IsLetterOrDigit(str, j - 1)) &&
                (j + word.Length == str.Length || !char.IsLetterOrDigit(str, j + word.Length)))
                return j;
        }
        return -1;
    }
}

You can use word boundaries, \b, and the Match.Index property:
var text = "SUBTOTAL 34.37 TAX TOTAL 37.43";
var idx = Regex.Match(text, @"\bTOTAL\b").Index;
// => 19
The \bTOTAL\b pattern matches TOTAL only when it is not adjacent to other letters, digits or underscores.
If a word enclosed in underscores should still count as a whole word, use
var idx = Regex.Match(text, @"(?<![^\W_])TOTAL(?![^\W_])").Index;
where (?<![^\W_]) is a negative lookbehind that fails the match if there is a letter or digit immediately to the left of the current location (so the word may be preceded by the start of the string or by any character that is neither a letter nor a digit), and (?![^\W_]) is the corresponding negative lookahead, which only succeeds if the end of the string or a character other than a letter or digit is immediately to the right of the current location.
If the boundaries are whitespace or the start/end of the string, use
var idx = Regex.Match(text, @"(?<!\S)TOTAL(?!\S)").Index;
where (?<!\S) requires the start of the string or a whitespace character immediately on the left, and (?!\S) requires the end of the string or a whitespace character on the right.
NOTE: \b, (?<!...) and (?!...) are non-consuming (zero-width) patterns: the regex index does not advance when they match, so you get the exact position of the word you search for.
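Since the end goal in the question is the amount that follows TOTAL, the boundary match and the number can be captured in one go. This also sidesteps the fixed-length Substring(..., 10) call, which would throw an ArgumentOutOfRangeException for the sample string once the right TOTAL is found, because only 6 characters follow it. A minimal sketch: the names AmountFinder and TryFindAmount are made up for illustration, and the number format (digits with an optional decimal part) is an assumption.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

static class AmountFinder
{
    // Hypothetical helper: returns true and sets 'index' and 'amount' when 'label'
    // occurs as a whole word followed by a number such as 37.43.
    public static bool TryFindAmount(string text, string label, out int index, out decimal amount)
    {
        // Regex.Escape keeps metacharacters in the label from being read as pattern syntax.
        string pattern = @"\b" + Regex.Escape(label) + @"\b\s*(\d+(?:\.\d+)?)";
        Match m = Regex.Match(text, pattern);
        index = m.Success ? m.Index : -1;
        amount = m.Success
            ? decimal.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture)
            : 0m;
        return m.Success;
    }
}

static class AmountFinderDemo
{
    static void Main()
    {
        string str = "SUBTOTAL 34.37 TAX TOTAL 37.43";
        if (AmountFinder.TryFindAmount(str, "TOTAL", out int index, out decimal amount))
            Console.WriteLine($"{index}: {amount.ToString(CultureInfo.InvariantCulture)}"); // 19: 37.43
    }
}
```

The \b anchors only behave as whole-word checks when the label starts and ends with a letter, digit or underscore, which holds for TOTAL.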

To make the accepted answer a little bit safer (Match.Index is 0 when nothing matched, whereas IndexOf returns -1 for unmatched):
string pattern = String.Format(@"\b{0}\b", findTxt);
Match mtc = Regex.Match(queryTxt, pattern);
if (mtc.Success)
{
    return mtc.Index;
}
else
    return -1;
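One more thing to watch with this approach: the search text is pasted into the pattern as is, so a term containing regex metacharacters (for example "C++") is not read literally. A hedged sketch of a variant that escapes the term and uses lookarounds instead of \b, so that terms starting or ending with punctuation still work. WholeWordSearch and its method are hypothetical names, not part of any library.

```csharp
using System;
using System.Text.RegularExpressions;

static class WholeWordSearch
{
    // Index of the first whole-word occurrence of 'term' in 'text', or -1 if there is none.
    public static int IndexOfWholeWord(string text, string term)
    {
        // (?<!\w) and (?!\w) require that no word character touches the term on either side.
        string pattern = @"(?<!\w)" + Regex.Escape(term) + @"(?!\w)";
        Match m = Regex.Match(text, pattern);
        return m.Success ? m.Index : -1;
    }
}

static class WholeWordSearchDemo
{
    static void Main()
    {
        Console.WriteLine(WholeWordSearch.IndexOfWholeWord("SUBTOTAL 34.37 TAX TOTAL 37.43", "TOTAL")); // 19
        Console.WriteLine(WholeWordSearch.IndexOfWholeWord("SUBTOTAL 34.37", "TOTAL"));                 // -1
        Console.WriteLine(WholeWordSearch.IndexOfWholeWord("uses C++ daily", "C++"));                   // 5
    }
}
```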

While this may be a hack that works only for your example, try
string amount = str.Substring(indx + " TOTAL".Length, 10);
that is, put an extra space before TOTAL (the same " TOTAL" string has to be used in the IndexOf call). Since that does not occur in SUBTOTAL, the search skips the word you don't want and only finds an isolated TOTAL.

I'd recommend the Regex solution from L.B. too, but if you can't use Regex, then you could use String.LastIndexOf("TOTAL"). Assuming the TOTAL always comes after SUBTOTAL?
http://msdn.microsoft.com/en-us/library/system.string.lastindexof(v=vs.110).aspx

Related

regex match partial or whole word

I am trying to figure out a regular expression which can match either the whole word or a (predefined in length, e.g first 4 chars) part of the word.
For example, if I am trying to match the word between and my offset is set to 4, then
between betwee betwe betw
are matches, but not the
bet betweenx bet12 betw123 beta
I have created an example in regex101, where I am trying (with no luck) a combination of positive lookahead (?=) and a non-word boundary \B.
I found a similar question which proposes a workaround in its accepted answer. As I understand it, it overrides the matcher somehow to run all the possible regular expressions, based on the word and an offset.
My code has to be written in C#, so I am trying to convert the aforementioned code. As far as I can see, Regex.Replace (and I assume Regex.Match too) can accept delegates to override the default functionality, but I cannot make it work.

You could take the first 4 characters, and make the remaining ones optional.
Then wrap these in word boundaries and parentheses.
So in the case of "between", it would be
@"\b(betw)(?:(e|ee|een)?)\b"
The code to achieve that would be (it needs using System.Linq):
public string createRegex(string word, int count)
{
    var mandatory = word.Substring(0, count);
    // One alternative per possible tail length: 1 .. word.Length - count characters.
    var optional = "(" + String.Join("|", Enumerable.Range(1, word.Length - count).Select(i => word.Substring(count, i))) + ")?";
    var regex = @"\b(" + mandatory + ")(?:" + optional + @")\b";
    return regex;
}
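The same idea can also be written without an alternation, by nesting one optional group per remaining character. A minimal sketch, checked against the examples from the question; PrefixPattern.Build is a hypothetical name, and the word is passed through Regex.Escape on the assumption that it may contain metacharacters.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

static class PrefixPattern
{
    // Builds e.g. \bbetw(?:e(?:e(?:n)?)?)?\b for ("between", 4).
    public static string Build(string word, int minLen)
    {
        if (minLen < 1 || minLen > word.Length)
            throw new ArgumentOutOfRangeException(nameof(minLen));

        var sb = new StringBuilder(@"\b");
        sb.Append(Regex.Escape(word.Substring(0, minLen)));
        // Each remaining character opens an optional group that contains the rest of the word.
        for (int i = minLen; i < word.Length; i++)
            sb.Append("(?:").Append(Regex.Escape(word[i].ToString()));
        // Close the groups again, making each one optional.
        for (int i = minLen; i < word.Length; i++)
            sb.Append(")?");
        return sb.Append(@"\b").ToString();
    }
}

static class PrefixPatternDemo
{
    static void Main()
    {
        string pattern = PrefixPattern.Build("between", 4);
        Console.WriteLine(pattern); // \bbetw(?:e(?:e(?:n)?)?)?\b
        foreach (string s in new[] { "between", "betwee", "betwe", "betw", "bet", "betweenx", "bet12", "betw123", "beta" })
            Console.WriteLine($"{s}: {Regex.IsMatch(s, pattern)}");
    }
}
```

With these inputs the first four candidates report True and the remaining five report False, which is the split the question asks for.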

The code in the answer you mentioned simply builds up this:
betw|betwe|betwee|between
So all you need is a function that builds up a string from the prefixes of the given word, starting at the given minimum length.
static String BuildRegex(String word, int min_len)
{
    String toReturn = "";
    // Append every prefix, from min_len characters up to the whole word.
    for (int i = 0; i < word.Length - min_len + 1; i++)
    {
        toReturn += word.Substring(0, min_len + i);
        toReturn += "|";
    }
    // Drop the trailing '|'.
    toReturn = toReturn.Substring(0, toReturn.Length - 1);
    return toReturn;
}

You can use this regex
\b(bet(?:[^\s]){1,4})\b
And replace bet and the 4 dynamically like this:
public static string CreateRegex(string word, int minLen)
{
    string token = word.Substring(0, minLen - 1);
    string pattern = @"\b(" + token + @"(?:[^\s]){1," + minLen + @"})\b";
    return pattern;
}
Here's a demo: https://regex101.com/r/lH0oL2/1
EDIT: as for the bet1122 match, you can edit the pattern this way:
\b(bet(?:[^\s0-9]){1,4})\b
If you don't want to match certain characters, just add them to the negated [^...] character class.
Demo: https://regex101.com/r/lH0oL2/2
For more info, see http://www.regular-expressions.info/charclass.html

For looping and string

So I have a string, and I want to replace its last 3 chars with dots. I did something, but my result is not what I wanted it to be.
Here is my code:
string word = "To je";
for (int k = word.Length; k > (word.Length) - 3; k--)
{
    string newWord = word.Replace(word[k - 1], '.');
    Console.WriteLine(newWord);
}
The output I get is:
To j.
To .e
To.je
But the output I want is:
To...
How do I get there? So the program is doing something similar to what I actually want it to do, but not quite. I've really been struggling with this and any help would be appreciated.

Look at this:
string newWord = word.Replace(word[k - 1], '.');
You're always replacing a single character from word... but word itself doesn't change, so on the next iteration the replacement has "gone".
You could use:
word = word.Replace(word[k - 1], '.');
(And then move the output to the end, just writing out word.)
However, note that this will replace all occurrences of any of the last three characters with a '.'.
The simplest way to fix all of this is to use Substring of course, but if you really want to loop, you could use a StringBuilder:
StringBuilder builder = new StringBuilder(word);
for (int k = word.Length; k > (word.Length) - 3; k--)
{
    builder[k - 1] = '.';
}
word = builder.ToString();

You're replacing all instances of the character at each of those three last positions with a period. You only want to replace that one character at the end. "aaaaa" shouldn't become "....." but rather "aa...".
You're printing out newWord after calculating an intermediate value and then never doing anything with it, leaving word unchanged. You'll want to assign it back to word, after correctly adjusting the character in question.
Of course the far easier solution (both for you, and for the computer) is to simply concat a substring of the string you have that excludes the last three characters with three periods.

Assuming the string is always at least 3 characters, you could substring everything but the last three characters and then append the three dots (periods) to the end of that string.
string word = "To je";
string newWord = word.Substring(0, word.Length - 3); // Grabs everything but the last three chars
newWord += "..."; // Appends three dots at the end of the new string
Console.WriteLine(newWord);
Note: this assumes that the input string word is at least three characters. If you were to supply a shorter string, you would need to supply an extra check on the string's length.

If you don't need the original word afterwards,
using Jon Skeet's method:
string word = "To je";
word = word.Substring(0,word.Length - 3);
word += "...";
Console.WriteLine(word);

As @jon-skeet said, you could do this with substrings. Here are 3 ways that you could do this with Substring.
You could use String.Concat
string word = "To je";
word = String.Concat(word.Substring(0, word.Length - 3), "...");
Console.WriteLine(word);
You could use the + operator
string word2 = "To je";
word2 = word2.Substring(0, word2.Length - 3) + "...";
Console.WriteLine(word2);
You could use String.Format
string word3 = "To je";
word3 = String.Format("{0}...", word3.Substring(0, word3.Length - 3));
Console.WriteLine(word3);

I'm a bit late to the party but all of the other solutions posted so far do not elegantly handle the case that the string is shorter than the requested number of replacements, or an arbitrary number of replacements. Here is a general function for replacing the last n characters at the end of a string with a user specified value:
static String replace_last_n(String s, int nchars, char replacement = '.')
{
    // Clamp nchars to the range 0 .. s.Length.
    nchars = Math.Min(s.Length, nchars > 0 ? nchars : 0);
    return s.Substring(0, s.Length - nchars) + new String(replacement, nchars);
}
Console.WriteLine(replace_last_n("wow wow wow", 3));
Console.WriteLine(replace_last_n("wow", 3, 'x'));
Console.WriteLine(replace_last_n("", 3));
Console.WriteLine(replace_last_n("w", 3));
Console.WriteLine(replace_last_n("wow", 0));
Console.WriteLine(replace_last_n("wow", -2));
Console.WriteLine(replace_last_n("wow", 33, '-'));
Output:
wow wow ...
xxx
(empty line)
.
wow
wow
---

if (word.Length > 3)
    Console.WriteLine(word.Substring(0, word.Length - 3) + "...");
or something along those lines, no need for a loop!

Get partial string from string

I have the following string:
This isMyTest testing
I want to get isMyTest as a result. I only have the first two characters available ("is"). The rest of the word can vary.
Basically, I need to select the first word delimited by spaces which starts with chk.
I've started with the following:
if (text.Contains(" is"))
{
    text.LastIndexOf(" is"); // Should give me the index.
}
now I cannot find the right bound of the word since I need to match on something like

You can use regular expressions:
string pattern = @"\bis\S*"; // "is" at a word boundary, plus the rest of the word up to the next whitespace
string input = "This isMyTest testing";
return Regex.Matches(input, pattern);

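You can use IndexOf to get the index of the next space:
int startPosition = text.LastIndexOf(" is");
if (startPosition != -1)
{
    int endPosition = text.IndexOf(' ', startPosition + 1); // Find next space
    if (endPosition == -1)
        endPosition = text.Length; // No further space: the word runs to the end of the string
    // Skip the leading space and take everything up to endPosition.
    string result = text.Substring(startPosition + 1, endPosition - startPosition - 1);
}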

What about using a regex match? Generally, if you are searching for a pattern in a string (i.e. a space followed by some other characters), regexes are perfectly suited to this. Regexes really only fall apart in context-sensitive areas (such as HTML) but are perfect for a regular string search.
// First we set up the input string.
string input = "This isMyTest testing";
// Here we call Regex.Match. The parentheses capture the word without the leading space.
Match match = Regex.Match(input, @"[ ](is[A-Za-z0-9]*)", RegexOptions.IgnoreCase);
// Here we check the Match instance.
if (match.Success)
{
    // Finally, we get the Group value and display it.
    string key = match.Groups[1].Value;
    Console.WriteLine(key); // isMyTest
}
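If the word should be delimited strictly by whitespace (or the start/end of the text) rather than by a fixed set of letters and digits, a lookbehind plus \S* is enough. A small sketch under that assumption; PrefixWord.Find is a hypothetical helper name, and it returns null when nothing is found.

```csharp
using System;
using System.Text.RegularExpressions;

static class PrefixWord
{
    // Returns the first whitespace-delimited word that starts with 'prefix', or null if there is none.
    public static string Find(string text, string prefix)
    {
        // (?<!\S) : the word must start at the beginning of the text or right after whitespace.
        // \S*     : then take everything up to the next whitespace.
        Match m = Regex.Match(text, @"(?<!\S)" + Regex.Escape(prefix) + @"\S*");
        return m.Success ? m.Value : null;
    }
}

static class PrefixWordDemo
{
    static void Main()
    {
        Console.WriteLine(PrefixWord.Find("This isMyTest testing", "is")); // isMyTest
    }
}
```

The "is" inside "This" is skipped because it is preceded by a non-whitespace character.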

Regular expression to split long strings in several lines

I'm not an expert in regular expressions, and today in my project I need to split a long string into several lines in order to check whether the text fits the page height.
I need a C# regular expression that splits long strings into several lines, separated by "\n" or "\r\n", with at most 150 characters per line. If character 150 falls in the middle of a word, the entire word should be moved to the next line.
Can anyone help me?

It's actually quite a simple problem. Look for up to 150 characters followed by whitespace (or the end of the string). Since quantifiers are greedy by default, the regex takes as many characters as fit, which is exactly what you want. Replace each match with itself plus a newline:
.{0,150}(\s+|$)
Replace with
$0\r\n
See also: http://regexhero.net/tester/?id=75645133-1de2-4d8d-a29d-90fff8b2bab5
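For use from C#, the same greedy idea can be wrapped in a helper that collects the matches instead of replacing them. This is a sketch, not the answer's exact pattern: it uses a lookahead so the separating whitespace is not part of the line, and it adds a second alternative that cuts words longer than the limit (an assumption, since the question does not say what should happen to them). TextWrapper.Wrap is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class TextWrapper
{
    public static string Wrap(string text, int maxLength = 150)
    {
        if (maxLength < 1)
            throw new ArgumentOutOfRangeException(nameof(maxLength));

        // Alternative 1: up to maxLength characters that end right before whitespace or the end of the text.
        // Alternative 2: a run without any usable break is cut hard after maxLength characters.
        string pattern = @"(\S.{0," + (maxLength - 1) + @"})(?=\s|$)|\S{" + maxLength + "}";

        var lines = new List<string>();
        foreach (Match m in Regex.Matches(text, pattern))
            lines.Add(m.Value.TrimEnd());
        return string.Join("\r\n", lines);
    }
}

static class TextWrapperDemo
{
    static void Main()
    {
        // A small limit keeps the example readable; the default is 150.
        Console.WriteLine(TextWrapper.Wrap("aaa bbb ccc", 7));
        // aaa bbb
        // ccc
    }
}
```

Because . does not match \n, existing line breaks in the input still end a line; blank lines are dropped by this sketch.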

var regex = new Regex(@".{0,150}", RegexOptions.Multiline);
var strings = regex.Replace(sourceString, "$0\r\n");

Here you go:
^.{1,150}\n
This will match the longest initial string like this.

If you just want to split a long string into lines of 150 chars, then I'm not sure why you'd need a regular expression:
private string stringSplitter(string inString)
{
    int lineLength = 150;
    StringBuilder sb = new StringBuilder();
    while (inString.Length > 0)
    {
        // The remainder fits on one line: emit it as is and stop.
        if (inString.Length <= lineLength)
        {
            sb.AppendLine(inString);
            break;
        }
        // Look for the last gap within the first lineLength characters.
        var lastGap = inString.Substring(0, lineLength).LastIndexOfAny(new char[] {' ', '\n'});
        if (lastGap == -1)
        {
            // No gap at all: cut the word at the line length.
            sb.AppendLine(inString.Substring(0, lineLength));
            inString = inString.Substring(lineLength);
        }
        else
        {
            sb.AppendLine(inString.Substring(0, lastGap));
            inString = inString.Substring(lastGap + 1);
        }
    }
    return sb.ToString();
}
edited to account for word breaks

This code should help you. It will check the length of the current string. If it is greater than your maxLength (150) in this case, it will start at the 150th character and (going backwards) find the first non-word character (as described by the OP, this is a sequence of non-space characters). It will then store the string up to that character and start over again with the remaining string, repeating until we end up with a substring that is less than maxLength characters. Finally, join them all back together again in a final string.
string line = "This is a really long run-on sentence that should go for longer than 150 characters and will need to be split into two lines, but only at a word boundary.";
int maxLength = 150;
string delimiter = "\r\n";
List<string> lines = new List<string>();
// As long as we still have more than 'maxLength' characters, keep splitting
while (line.Length > maxLength)
{
    bool split = false;
    // Starting at this character and going backwards, if the character
    // is whitespace, insert a newline here.
    for (int charIndex = maxLength; charIndex > 0; charIndex--)
    {
        if (char.IsWhiteSpace(line[charIndex]))
        {
            // Split the line after this character
            // and continue on with the remainder
            lines.Add(line.Substring(0, charIndex + 1));
            line = line.Substring(charIndex + 1);
            split = true;
            break;
        }
    }
    // No whitespace found: cut hard so the loop always makes progress.
    if (!split)
    {
        lines.Add(line.Substring(0, maxLength));
        line = line.Substring(maxLength);
    }
}
lines.Add(line);
// Join the list back together with delimiter ("\r\n") between each line
string final = string.Join(delimiter , lines);
// Check the results
Console.WriteLine(final);
Note: If you run this code in a console application, you may want to change "maxLength" to a smaller number so that the console doesn't wrap on you.
Note: This code does not take any tab characters into account. If tabs are also included, your situation gets a bit more complicated.
Update: I fixed a bug where new lines were starting with a space.

How to remove words based on a word count

Here is what I'm trying to accomplish. I have an object coming back from the database with a string description. This description can be up to 1000 characters long, but we only want to display a short view of it. So I coded up the following, but I'm having trouble actually removing the extra words once the regular expression has found the total word count. Does anyone have a good way of displaying only the first words, up to the limit checked against Regex.Matches?
Thanks!
if (!string.IsNullOrEmpty(myObject.Description))
{
    string original = myObject.Description;
    MatchCollection wordColl = Regex.Matches(original, @"[\S]+");
    if (wordColl.Count < 70) // 70 words?
    {
        uxDescriptionDisplay.Text =
            string.Format("<p>{0}</p>", myObject.Description);
    }
    else
    {
        string shortendText = original.Remove(200); // 200 characters?
        uxDescriptionDisplay.Text =
            string.Format("<p>{0}</p>", shortendText);
    }
}
EDIT:
So this is what I got working on my own:
else
{
    int count = 0;
    StringBuilder builder = new StringBuilder();
    string[] workingText = original.Split(' ');
    foreach (string word in workingText)
    {
        if (count < 70)
        {
            builder.AppendFormat("{0} ", word);
        }
        count++;
    }
    string shortendText = builder.ToString();
}
It's not pretty, but it worked. I would call it a pretty naive way of doing this. Thanks for all of the suggestions!

I would opt to go by a strict character count rather than a word count because you might happen to have a lot of long words.
I might do something like (pseudocode)
if text.Length > someLimit
    find first whitespace after someLimit (or perhaps last whitespace immediately before)
    display substring of text
else
    display text
Possible code implementation:
string TruncateText(string input, int characterLimit)
{
    if (input.Length > characterLimit)
    {
        // find last whitespace immediately before limit
        int whitespacePosition = input.Substring(0, characterLimit).LastIndexOf(" ");
        // or find first whitespace after limit (what is spec?)
        // int whitespacePosition = input.IndexOf(" ", characterLimit);
        if (whitespacePosition > -1)
            return input.Substring(0, whitespacePosition);
    }
    return input;
}

One method, if you're using at least C# 3.0, would be a LINQ query like the following. This assumes you're going strictly by word count, not character count.
if (wordColl.Count > 70)
{
    foreach (var subWord in wordColl.Cast<Match>().Select(r => r.Value).Take(70))
    {
        // Build string here out of subWord
    }
}
I did a test using a simple Console.WriteLine with your Regex and your question body (which is over 70 words, it turns out).
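Since each Match already carries its position, another option is to cut the original string right after the last word you want to keep, which preserves the original spacing and avoids rebuilding the text word by word. A minimal sketch; WordTruncator.TakeWords is a made-up name, and "word" here means any run of non-whitespace characters, as in the question's pattern.

```csharp
using System;
using System.Text.RegularExpressions;

static class WordTruncator
{
    // Keeps the first maxWords whitespace-delimited words and cuts off everything after them.
    public static string TakeWords(string text, int maxWords)
    {
        if (maxWords <= 0)
            return string.Empty;

        MatchCollection words = Regex.Matches(text, @"\S+");
        if (words.Count <= maxWords)
            return text; // Nothing to cut.

        // End of the last word to keep = its start index plus its length.
        Match last = words[maxWords - 1];
        return text.Substring(0, last.Index + last.Length);
    }
}

static class WordTruncatorDemo
{
    static void Main()
    {
        Console.WriteLine(WordTruncator.TakeWords("one two  three four", 2)); // one two
        Console.WriteLine(WordTruncator.TakeWords("a b", 5));                 // a b
    }
}
```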

You can use Regex Capture Groups to hold the match and access it later.
For your application, I'd recommend instead simply splitting the string by spaces and returning the first n elements of the array:
if (!string.IsNullOrEmpty(myObject.Description))
{
    string original = myObject.Description;
    string[] words = original.Split(' ');
    if (words.Length < 70)
    {
        uxDescriptionDisplay.Text =
            string.Format("<p>{0}</p>", original);
    }
    else
    {
        string shortDesc = string.Empty;
        for (int i = 0; i < 70; i++) shortDesc += words[i] + " ";
        uxDescriptionDisplay.Text =
            string.Format("<p>{0}</p>", shortDesc.Trim());
    }
}

Do you want to remove 200 characters, or to start truncating at the 200th character? When you call original.Remove(200), everything from index 200 onward is removed, so only the first 200 characters are kept. This is how you use Remove() to remove a certain number of characters:
string shortendText = original.Remove(0,200);
This starts at the first character and removes 200 characters from there. I imagine that's not what you're trying to do, since you're shortening a description; it merely shows the other overload of Remove().
Instead of using a Regex MatchCollection, why not just split the string? It's a lot easier and more straightforward. You can set the delimiter to a space character and split that way. I'm not sure what the data in your description looks like, so this may not completely cover your need, but it just might. You split this way:
String[] wordArray = original.Split(' ');
From there you can determine the word count with wordArray's Length property value.

If I were you, I would go by characters, as you may have many one-letter words or many long words in your text.
Go through until the character count reaches your limit, then either find the next space and copy the characters up to it into a new string (possibly using the Substring method), or take the characters up to the limit and add a few full stops to make the new string. The latter could look unprofessional, I suppose.
