String.IndexOf() returns unexpected index of string

String.IndexOf() method is not acting as I expected.
I expected it not to find a match, since the exact word you is not in str.
string str = "I am your Friend";
int index = str.IndexOf("you",0,StringComparison.OrdinalIgnoreCase);
Console.WriteLine(index);
Output: 5
My expected result is -1, because the string doesn't contain the word you.

The issue you're facing is because IndexOf matches a single character, or sequence of characters (a search string) within the greater string. Therefore "I am your friend" contains the sequence "you". To match words only, you have to consider things at a word level.
For example, you could use regular expressions to match at word boundaries:
private static int IndexOfWord(string val, int startAt, string search)
{
// escape the match expression in case it contains any characters meaningful
// to regular expressions, and then create an expression with the \b boundary
// characters
var escapedMatch = string.Format(@"\b{0}\b", Regex.Escape(search));
// create a case-insensitive regular expression object using the pattern
var exp = new Regex(escapedMatch, RegexOptions.IgnoreCase);
// perform the match from the start position
var match = exp.Match(val, startAt);
// if it's successful, return the match index
if (match.Success)
{
return match.Index;
}
// if it's unsuccessful, return -1
return -1;
}
// overload without startAt, for when you just want to start from the beginning
private static int IndexOfWord(string val, string search)
{
return IndexOfWord(val, 0, search);
}
In your example you would try to match \byou\b, which because of the boundary requirements won't match your.

you is a valid substring of I am your Friend. If you want to check whether a whole word is in a string, you can break the string into words with the Split method.
char[] delimiterChars = { ' ', ',', '.', ':', '\t' };
string[] words = text.Split(delimiterChars);
Then look inside the array, or turn it into a more lookup-friendly data structure.
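As a minimal sketch of "looking inside the array", Array.IndexOf returns the position of an exact, case-sensitive match among the split words. The class and method names below are only illustrative and are not part of the original answer:

```csharp
using System;

public static class SplitLookup
{
    private static readonly char[] DelimiterChars = { ' ', ',', '.', ':', '\t' };

    // Returns the position of `word` among the words of `text`, or -1 if absent.
    // Note: this is a word position, not a character offset into `text`.
    public static int WordPosition(string text, string word)
    {
        string[] words = text.Split(DelimiterChars);
        return Array.IndexOf(words, word);
    }
}
```

With the string from the question, SplitLookup.WordPosition("I am your Friend", "you") gives -1, while searching for "your" gives 2.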
To search case-insensitively, you can use the following code:
char[] delimiterChars = { ' ', ',', '.', ':', '\t' };
string text = "I am your Friend";
// HashSet allows faster lookups for big strings
var words = text.Split(delimiterChars).ToHashSet(StringComparer.OrdinalIgnoreCase);
Console.WriteLine(words.Contains("you"));
Console.WriteLine(words.Contains("friend"));
False
True
By building a dictionary, as in the following snippet, you can quickly look up every word position for every word.
char[] delimiterChars = { ' ', ',', '.', ':', '\t' };
string text = "i am your friend. I Am Your Friend.";
var words = text.Split(delimiterChars);
var dict = new Dictionary<string, List<int>>(StringComparer.InvariantCultureIgnoreCase);
for (int i = 0; i < words.Length; ++i)
{
if (dict.ContainsKey(words[i])) dict[words[i]].Add(i);
else dict[words[i]] = new List<int>() { i };
}
Console.WriteLine("youR: ");
dict["youR"].ForEach(i => Console.WriteLine("\t{0}", i));
Console.WriteLine("friend");
dict["friend"].ForEach(i => Console.WriteLine("\t{0}", i));
youR:
2
7
friend
3
8

Related

C# IndexOf, when word is part of another word, How to?

Let's say I have the string "soak oak" and I want the index of "oak". IndexOf returns the index where "oak" starts inside "soak" (1), but I want the index of the exact word "oak" (5). What do I need to do?
string text = "soak oak";
char[] seperators = {' ', '.', ',', '!', '?', ':',
';', '(', ')', '\t', '\r', '\n', '"', '„', '“'};
string[] parts = text.Split(seperators,
StringSplitOptions.RemoveEmptyEntries);
text.IndexOf("oak"); // gets '1' because "oak" is in "soak"
// but I want to get 5 because of exact word "oak"

Regex approach
string text = "soak oak";
int result = Regex.Match(text, @"\boak\b").Index;

You can use the regex below to find an exact word in your string.
string text = "soak oak";
string searchText = "oak";
var index = Regex.Match(text, @"\b" + Regex.Escape(searchText) + @"\b").Index;
Output:
5
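One caveat with reading .Index directly: a failed match also reports an Index of 0, so "not found" cannot be told apart from a word at the start of the string. A small sketch that checks Success first (the WordSearch class name is an assumption for illustration):

```csharp
using System.Text.RegularExpressions;

public static class WordSearch
{
    // Returns the character index of the whole word, or -1 when it is absent.
    public static int IndexOfWord(string text, string word)
    {
        Match match = Regex.Match(text, @"\b" + Regex.Escape(word) + @"\b");
        return match.Success ? match.Index : -1;
    }
}
```

Here WordSearch.IndexOfWord("soak oak", "oak") gives 5, and searching "soak" alone for "oak" gives -1 rather than 0.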

We can test indexes (IndexOf) in a loop:
static HashSet<char> s_Separtors = new HashSet<char>() {
' ', '.', ',', '!', '?', ':', ';', '(', ')', '\t', '\r', '\n', '"', '„', '“'
};
private static int WordIndexOf(string source, string toFind) {
if (string.IsNullOrEmpty(source) || string.IsNullOrEmpty(toFind))
return -1;
for (int index = source.IndexOf(toFind);
index >= 0;
index = source.IndexOf(toFind, index + 1)) {
if (index < 0)
return -1;
if ((index == 0 || s_Separtors.Contains(source[index - 1])) &&
(index >= source.Length - toFind.Length ||
s_Separtors.Contains(source[index + toFind.Length])))
return index;
}
return -1;
}
Demo:
// 5
Console.Write(WordIndexOf("soak oak", "oak"));

You can use regular expressions. The word boundaries they define are useful here:
string text = "soak oak";
var pattern = @"\boak\b";
var regex = new Regex(pattern);
foreach(Match m in regex.Matches(text)){
Console.WriteLine(m.Index);
Console.WriteLine(m.Value);
}

You could find the string in your array by converting it to a list and using the IndexOf() method.
parts.ToList().IndexOf("oak");
That tells you which array item it is, rather than the index in the original string.

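Another regex approach:
string text = "soak oak";
// \s[oak] would be a character class (white space followed by o, a, or k),
// so spell the word out and close it with a word boundary
var match = Regex.Match(text, @"\soak\b");
if (match.Success)
{
Console.WriteLine(match.Index); // 4 (the white space; the word itself starts at 5)
}
\s matches a white-space character, so this only finds the word when it follows white space.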
Hope it helps.

How can I convert text to Pascal case?

I have a variable name, say "WARD_VS_VITAL_SIGNS", and I want to convert it to Pascal case format: "WardVsVitalSigns"
WARD_VS_VITAL_SIGNS -> WardVsVitalSigns
How can I make this conversion?

You do not need a regular expression for that.
var yourString = "WARD_VS_VITAL_SIGNS".ToLower().Replace("_", " ");
TextInfo info = CultureInfo.CurrentCulture.TextInfo;
yourString = info.ToTitleCase(yourString).Replace(" ", string.Empty);
Console.WriteLine(yourString);

Here is my quick LINQ & regex solution to save someone's time:
using System;
using System.Linq;
using System.Text.RegularExpressions;
public string ToPascalCase(string original)
{
Regex invalidCharsRgx = new Regex("[^_a-zA-Z0-9]");
Regex whiteSpace = new Regex(@"(?<=\s)");
Regex startsWithLowerCaseChar = new Regex("^[a-z]");
Regex firstCharFollowedByUpperCasesOnly = new Regex("(?<=[A-Z])[A-Z0-9]+$");
Regex lowerCaseNextToNumber = new Regex("(?<=[0-9])[a-z]");
Regex upperCaseInside = new Regex("(?<=[A-Z])[A-Z]+?((?=[A-Z][a-z])|(?=[0-9]))");
// replace white spaces with underscore, then replace all invalid chars with empty string
var pascalCase = invalidCharsRgx.Replace(whiteSpace.Replace(original, "_"), string.Empty)
// split by underscores
.Split(new char[] { '_' }, StringSplitOptions.RemoveEmptyEntries)
// set first letter to uppercase
.Select(w => startsWithLowerCaseChar.Replace(w, m => m.Value.ToUpper()))
// replace second and all following upper case letters to lower if there is no next lower (ABC -> Abc)
.Select(w => firstCharFollowedByUpperCasesOnly.Replace(w, m => m.Value.ToLower()))
// set upper case the first lower case following a number (Ab9cd -> Ab9Cd)
.Select(w => lowerCaseNextToNumber.Replace(w, m => m.Value.ToUpper()))
// lower second and next upper case letters except the last if it follows by any lower (ABcDEf -> AbcDef)
.Select(w => upperCaseInside.Replace(w, m => m.Value.ToLower()));
return string.Concat(pascalCase);
}
Example output:
"WARD_VS_VITAL_SIGNS" "WardVsVitalSigns"
"Who am I?" "WhoAmI"
"I ate before you got here" "IAteBeforeYouGotHere"
"Hello|Who|Am|I?" "HelloWhoAmI"
"Live long and prosper" "LiveLongAndProsper"
"Lorem ipsum dolor..." "LoremIpsumDolor"
"CoolSP" "CoolSp"
"AB9CD" "Ab9Cd"
"CCCTrigger" "CccTrigger"
"CIRC" "Circ"
"ID_SOME" "IdSome"
"ID_SomeOther" "IdSomeOther"
"ID_SOMEOther" "IdSomeOther"
"CCC_SOME_2Phases" "CccSome2Phases"
"AlreadyGoodPascalCase" "AlreadyGoodPascalCase"
"999 999 99 9 " "999999999"
"1 2 3 " "123"
"1 AB cd EFDDD 8" "1AbCdEfddd8"
"INVALID VALUE AND _2THINGS" "InvalidValueAnd2Things"

First off, you are asking for title case and not camel-case, because in camel-case the first letter of the word is lowercase and your example shows you want the first letter to be uppercase.
At any rate, here is how you could achieve your desired result:
string textToChange = "WARD_VS_VITAL_SIGNS";
System.Text.StringBuilder resultBuilder = new System.Text.StringBuilder();
foreach(char c in textToChange)
{
// Replace anything, but letters and digits, with space
if(!Char.IsLetterOrDigit(c))
{
resultBuilder.Append(" ");
}
else
{
resultBuilder.Append(c);
}
}
string result = resultBuilder.ToString();
// Make result string all lowercase, because ToTitleCase does not change all uppercase correctly
result = result.ToLower();
// Creates a TextInfo based on the "en-US" culture.
TextInfo myTI = new CultureInfo("en-US",false).TextInfo;
result = myTI.ToTitleCase(result).Replace(" ", String.Empty);
Note: result is now WardVsVitalSigns.
If you did, in fact, want camel-case, then after all of the above, just use this helper function:
public string LowercaseFirst(string s)
{
if (string.IsNullOrEmpty(s))
{
return string.Empty;
}
char[] a = s.ToCharArray();
a[0] = char.ToLower(a[0]);
return new string(a);
}
So you could call it, like this:
result = LowercaseFirst(result);

Single semicolon solution:
public static string PascalCase(this string word)
{
return string.Join("" , word.Split('_')
.Select(w => w.Trim())
.Where(w => w.Length > 0)
.Select(w => w.Substring(0,1).ToUpper() + w.Substring(1).ToLower()));
}

Extension method for System.String with .NET Core compatible code by using System and System.Linq.
Does not modify the original string.
using System;
using System.Linq;
public static class StringExtensions
{
/// <summary>
/// Converts a string to PascalCase
/// </summary>
/// <param name="str">String to convert</param>
public static string ToPascalCase(this string str){
// Replace all non-letter and non-digits with an underscore and lowercase the rest.
string sample = string.Join("", str?.Select(c => Char.IsLetterOrDigit(c) ? c.ToString().ToLower() : "_").ToArray());
// Split the resulting string by underscore
// Select first character, uppercase it and concatenate with the rest of the string
var arr = sample?
.Split(new []{'_'}, StringSplitOptions.RemoveEmptyEntries)
.Select(s => $"{s.Substring(0, 1).ToUpper()}{s.Substring(1)}");
// Join the resulting collection
sample = string.Join("", arr);
return sample;
}
}
public class Program
{
public static void Main()
{
Console.WriteLine("WARD_VS_VITAL_SIGNS".ToPascalCase()); // WardVsVitalSigns
Console.WriteLine("Who am I?".ToPascalCase()); // WhoAmI
Console.WriteLine("I ate before you got here".ToPascalCase()); // IAteBeforeYouGotHere
Console.WriteLine("Hello|Who|Am|I?".ToPascalCase()); // HelloWhoAmI
Console.WriteLine("Live long and prosper".ToPascalCase()); // LiveLongAndProsper
Console.WriteLine("Lorem ipsum dolor sit amet, consectetur adipiscing elit.".ToPascalCase()); // LoremIpsumDolorSitAmetConsecteturAdipiscingElit
}
}

var xs = "WARD_VS_VITAL_SIGNS".Split('_');
var q =
from x in xs
let first_char = char.ToUpper(x[0])
let rest_chars = new string(x.Skip(1).Select(c => char.ToLower(c)).ToArray())
select first_char + rest_chars;

Several answers lower-case the text before calling ToTitleCase, and that step is needed: ToTitleCase leaves words that are entirely uppercase unchanged, because it treats them as acronyms. With the ToLower() call in place, the conversion is:
var text = "WARD_VS_VITAL_SIGNS".ToLower().Replace("_", " ");
TextInfo textInfo = CultureInfo.CurrentCulture.TextInfo;
text = textInfo.ToTitleCase(text).Replace(" ", string.Empty);
Console.WriteLine(text);

You can use this:
public static string ConvertToPascal(string underScoreString)
{
string[] words = underScoreString.Split('_');
StringBuilder returnStr = new StringBuilder();
foreach (string wrd in words)
{
returnStr.Append(wrd.Substring(0, 1).ToUpper());
returnStr.Append(wrd.Substring(1).ToLower());
}
return returnStr.ToString();
}

This answer uses Unicode categories to skip connector characters such as _ while processing the text. In regex syntax, \p{...} matches a Unicode category, and Pc is the "Punctuation, Connector" category, so \p{Pc} matches connector characters. A MatchEvaluator is invoked for each match. Note that characters such as - or | are not in Pc and are not treated as separators by this pattern.
During the match phase we capture each word and skip the connector characters that follow it, so the replace operation also removes the connectors. Once we have the matched word, we lower-case it and then upper-case only its first character as the replacement:
public static class StringExtensions
{
public static string ToPascalCase(this string initial)
=> Regex.Replace(initial,
// (Match any run of non-connector characters) & then skip any connector punctuation
@"([^\p{Pc}]+)[\p{Pc}]*",
new MatchEvaluator(mtch =>
{
var word = mtch.Groups[1].Value.ToLower();
return $"{Char.ToUpper(word[0])}{word.Substring(1)}";
}));
}
Usage:
"TOO_MUCH_BABY".ToPascalCase(); // TooMuchBaby
"HELLO|ITS|ME".ToPascalCase(); // Hello|its|me ('|' is not in \p{Pc}, so nothing is split)
See Word Character in Character Classes in Regular Expressions
Pc Punctuation, Connector. This category includes ten characters, the
most commonly used of which is the LOWLINE character (_), u+005F.
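To check which characters the \p{Pc} class covers, the category of a single character can be inspected directly. This is a small sketch; the CategoryCheck class and IsConnector method are illustrative names, not part of the answer:

```csharp
using System;
using System.Globalization;

public static class CategoryCheck
{
    // True only for characters in the Unicode "Punctuation, Connector" (Pc) category.
    public static bool IsConnector(char c)
    {
        return char.GetUnicodeCategory(c) == UnicodeCategory.ConnectorPunctuation;
    }
}
```

IsConnector('_') is true, while '-' (DashPunctuation) and '|' (MathSymbol) are not connectors, so a pattern built on \p{Pc} alone does not split on them.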

If you want to convert a string in any of several formats to Pascal case, you can do this:
public static string ToPascalCase(this string original)
{
string newString = string.Empty;
bool makeNextCharacterUpper = false;
for (int index = 0; index < original.Length; index++)
{
char c = original[index];
if(index == 0)
newString += $"{char.ToUpper(c)}";
else if (makeNextCharacterUpper)
{
newString += $"{char.ToUpper(c)}";
makeNextCharacterUpper = false;
}
else if (char.IsUpper(c))
newString += $" {c}";
else if (char.IsLower(c) || char.IsNumber(c))
newString += c;
else if (char.IsNumber(c))
newString += $"{c}";
else
{
makeNextCharacterUpper = true;
newString += ' ';
}
}
return newString.TrimStart().Replace(" ", "");
}
Tested with strings
I|Can|Get|A|String
ICan_GetAString
i-can-get-a-string
i_can_get_a_string
I Can Get A String
ICanGetAString

I found this gist useful after adding a ToLower() to it.
"WARD_VS_VITAL_SIGNS"
.ToLower()
.Split(new [] {"_"}, StringSplitOptions.RemoveEmptyEntries)
.Select(s => char.ToUpperInvariant(s[0]) + s.Substring(1, s.Length - 1))
.Aggregate(string.Empty, (s1, s2) => s1 + s2)

Use more than one character to replace other

I have an app with a function that replaces some characters in a word with another character:
var newCharacter = "H";
if (/*something happens here and than the currentCharacter will be replaced*/)
{
// Replace the currentCharacter in the word with a random newCharacter.
wordString = wordString.Replace(currentCharacter, newCharacter);
}
With the code above, every matching character is replaced with "H". But I want to use more letters, for example H, E, A and S.
What is the best way to do this?
When I do this:
var newCharacter = "H" + "L" + "S";
it replaces the currentCharacter with H and L and S together, but I want it to be replaced with H or L or S, not all three.
So if the word is HELLO and I replace the O with newCharacter, my output is currently HELLHLS:
O -> HLS
but O needs to be -> H or L or S

Here is a way to do it using LINQ. Add the characters you want to remove to the array excpChar:
char[] excpChar= new[] { 'O','N' };
string word = "LONDON";
var result = excpChar.Select(ch => word = word.Replace(ch.ToString(), ""));
Console.WriteLine(result.Last());

The Replace function replaces all occurrences at once, which is not what we want. Let's write a ReplaceFirst function that replaces only the first occurrence (one could make an extension method out of this):
static string ReplaceFirst(string word, char find, char replacement)
{
int location = word.IndexOf(find);
if (location > -1)
return word.Substring(0, location) + replacement + word.Substring(location + 1);
else
return word;
}
Then we can use a random generator to replace the target letter with different letters through successive calls of ReplaceFirst:
string word = "TpqsdfTsqfdTomTmeT";
char find = 'T';
char[] replacements = { 'H', 'E', 'A', 'S' };
Random random = new Random();
while (word.Contains(find))
word = ReplaceFirst(word, find, replacements[random.Next(replacements.Length)]);
word now may be EpqsdfSsqfdEomHmeS or SpqsdfSsqfdHomHmeE or ...
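A single pass over the characters is another option. It avoids rescanning the string on every replacement and still terminates when the replacement set happens to contain the character being replaced. This is only a sketch; RandomReplace and ReplaceEach are hypothetical names:

```csharp
using System;
using System.Linq;

public static class RandomReplace
{
    // Replaces every occurrence of `find` with a randomly chosen replacement character.
    public static string ReplaceEach(string word, char find, char[] replacements, Random random)
    {
        return new string(word
            .Select(c => c == find ? replacements[random.Next(replacements.Length)] : c)
            .ToArray());
    }
}
```

For example, RandomReplace.ReplaceEach("HELLO", 'O', new[] { 'H', 'L', 'S' }, new Random()) yields HELLH, HELLL or HELLS.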

You can do it like this:
string test = "abcde";
var result = ChangeFor(test, new char[] {'b', 'c'}, 'z');
// result = "azzde"
with ChangeFor defined as:
private string ChangeFor(string input, IEnumerable<char> before, char after)
{
string result = input;
foreach (char c in before)
{
result = result.Replace(c, after);
}
return result;
}

Get first 250 words of a string?

How do I get the first 250 words of a string?

You need to split the string. You can use the overload without parameters (white space is assumed as the delimiter).
IEnumerable<string> words = str.Split().Take(250);
Note that you need to add using System.Linq for Enumerable.Take.
You can use ToList() or ToArray() to create a new collection from the query, or save memory and enumerate it directly:
foreach(string word in words)
Console.WriteLine(word);
Update
Since this seems to be quite popular, I'm adding the following extension method, which is more efficient than the Enumerable.Take approach and also returns a collection instead of the (deferred-execution) query.
It uses String.Split, where white-space characters are assumed to be the delimiters if the separator parameter is null or contains no characters. The method also allows passing different delimiters:
public static string[] GetWords(
this string input,
int count = -1,
string[] wordDelimiter = null,
StringSplitOptions options = StringSplitOptions.None)
{
if (string.IsNullOrEmpty(input)) return new string[] { };
if(count < 0)
return input.Split(wordDelimiter, options);
string[] words = input.Split(wordDelimiter, count + 1, options);
if (words.Length <= count)
return words; // not so many words found
// remove last "word" since that contains the rest of the string
Array.Resize(ref words, words.Length - 1);
return words;
}
It can be used easily:
string str = "A B C D E F";
string[] words = str.GetWords(5, null, StringSplitOptions.RemoveEmptyEntries); // A,B,C,D,E

yourString.Split(' ').Take(250);
I guess. You should provide more info.

String.Join(" ", yourstring.Split().Take(250).ToArray())

In addition to Tim's answer, this is what you can try:
IEnumerable<string> words = str.Split().Take(250);
StringBuilder firstwords = new StringBuilder();
foreach(string s in words)
{
firstwords.Append(s + " ");
}
firstwords.Append("...");
Console.WriteLine(firstwords.ToString());

Try this one:
public string TakeWords(string str,int wordCount)
{
char lastChar='\0';
int spaceFound=0;
var strLen= str.Length;
int i=0;
for(; i<strLen; i++)
{
if(str[i]==' ' && lastChar!=' ')
{
spaceFound++;
}
lastChar=str[i];
if(spaceFound==wordCount)
break;
}
return str.Substring(0,i);
}

It's possible without calling Take(), because Split accepts a maximum number of substrings. Be aware that the last substring returned holds the whole remainder of the input, so ask for 251 and ignore the final element:
string[] separatedWords = stringToProcess.Split(new char[] {' '}, 251, StringSplitOptions.RemoveEmptyEntries);
This also allows splitting on more than just a space, which helps when spaces are missing after punctuation in the input.
string[] separatedWords = stringToProcess.Split(new char[] {' ', '.', ',', ':', ';'}, 251, StringSplitOptions.RemoveEmptyEntries);
Use StringSplitOptions.None if you want empty strings to be returned as well.
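A minimal sketch of a helper built on the count overload, assuming words are separated by single spaces. FirstWords and FirstN are hypothetical names. It requests one extra substring and discards it, because the final element of a counted Split contains the unsplit rest of the input:

```csharp
using System;
using System.Linq;

public static class FirstWords
{
    // Returns at most `count` words from `input`, split on spaces.
    public static string[] FirstN(string input, int count)
    {
        // Ask for count + 1 pieces: the extra last piece absorbs the remainder.
        string[] parts = input.Split(new[] { ' ' }, count + 1, StringSplitOptions.RemoveEmptyEntries);
        return parts.Length > count ? parts.Take(count).ToArray() : parts;
    }
}
```

FirstWords.FirstN("A B C D E F", 3) returns A, B, C. Calling Split with a count of 3 directly would instead return A, B and "C D E F".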

Using String Split

I have a text
Category2,"Something with ,comma"
When I split this by ',' it should give me two strings:
Category2
"Something with ,comma"
but it actually splits the string at every comma.
How can I achieve my expected result?
Thanks

Just call variable.Split(new char[] { ',' }, 2). Complete documentation in MSDN.

There are a number of things you might want to do here, so I will address a few:
Split on the first comma
string[] parts = text.Split(new char[] { ',' }, 2);
Split on every comma
string[] parts = text.Split(new char[] { ',' });
Split on a comma that is not inside double quotes
var result = Regex.Split(samplestring, ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");
The last one is taken from C# Regex Split.

Specify the maximum number of strings you want in the array:
string[] parts = text.Split(new char[] { ',' }, 2);

String.Split works at the simplest, fastest level - so it splits the text on all of the delimiters you pass into it, and it has no concept of special rules like double-quotes.
If you need a CSV parser which understands double-quotes, then you can write your own or there are some excellent open source parsers available - e.g. http://www.codeproject.com/KB/database/CsvReader.aspx - this is one I've used in several projects and recommend.
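If adding a third-party library is not an option, one parser that ships with .NET is TextFieldParser in the Microsoft.VisualBasic.FileIO namespace. The sketch below assumes that assembly is available to your project. CsvLine and ParseFields are illustrative names:

```csharp
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class CsvLine
{
    // Parses one comma-delimited line, honoring fields wrapped in double quotes.
    public static string[] ParseFields(string line)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;
            // ReadFields returns null when there is nothing left to read.
            return parser.ReadFields() ?? new string[0];
        }
    }
}
```

For the line in the question this should produce two fields, Category2 and Something with ,comma, with the enclosing quotes removed from the second one.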

Try this:
public static class StringExtensions
{
public static IEnumerable<string> SplitToSubstrings(this string str)
{
int startIndex = 0;
bool isInQuotes = false;
for (int index = 0; index < str.Length; index++ )
{
if (str[index] == '\"')
isInQuotes = !isInQuotes;
bool isStartOfNewSubstring = (!isInQuotes && str[index] == ',');
if (isStartOfNewSubstring)
{
yield return str.Substring(startIndex, index - startIndex).Trim();
startIndex = index + 1;
}
}
yield return str.Substring(startIndex).Trim();
}
}
Usage is pretty simple:
foreach(var str in text.SplitToSubstrings())
Console.WriteLine(str);
