I'm wondering how I can replace (remove) multiple words (like 500+) from a string. I know I can use the replace function to do this for a single word, but what if I want to replace 500+ words? I'm interested in removing all generic keywords from an article (such as "and", "I", "you" etc).
Here is the code for one replacement; I'm looking to do 500+:
string a = "why and you it";
string b = a.Replace("why", "");
MessageBox.Show(b);
Thanks
@Sergey Kucher Text size will vary from a few hundred words to a few thousand. I am removing these words from random articles.
I would normally do something like:
// If you want the search/replace to be case sensitive, remove the
// StringComparer.OrdinalIgnoreCase
Dictionary<string, string> replaces = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase) {
// The format is word to be searched, word that should replace it
// or String.Empty to simply remove the offending word
{ "why", "xxx" },
{ "you", "yyy" },
};
void Main()
{
string a = "why and you it and You it";
// This will search for blocks of letters and numbers (abc/abcd/ab1234)
// and pass it to the replacer
string b = Regex.Replace(a, @"\w+", Replacer);
}
string Replacer(Match m)
{
string found = m.ToString();
string replace;
// If the word found is in the dictionary then it's placed in the
// replace variable by the TryGetValue
if (!replaces.TryGetValue(found, out replace))
{
// otherwise replace the word with the same word (so do nothing)
replace = found;
}
else
{
// The word is in the dictionary. replace now contains the
// word that will substitute it.
// At this point you could add some code to maintain upper/lower
// case between the words (so that if you -> xxx then You becomes Xxx
// and YOU becomes XXX)
}
return replace;
}
As someone else wrote, but without problems with substrings (the ass principle... You don't want to remove asses from classes :-) ), and working only if you only need to remove words:
var escapedStrings = yourReplaces.Select(Regex.Escape);
string result = Regex.Replace(yourInput, @"\b(" + string.Join("|", escapedStrings) + @")\b", string.Empty);
I use the \b word boundary anchor. It matches the position between a word character (\w) and a non-word character (or the start/end of the string), so only whole words are matched :-)
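As a rough illustration of the word-boundary approach for a large stop-word list, here is a minimal sketch. The class and method names (`StopWordRemover`, `BuildPattern`, `Remove`) are made up for this example, and the case-insensitive matching and the whitespace clean-up are assumptions on my part, not something the question asked for.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class StopWordRemover
{
    // Builds one pattern such as \b(?:and|I|you)\b from the word list.
    // Regex.Escape keeps words containing regex metacharacters literal.
    public static Regex BuildPattern(IEnumerable<string> stopWords)
    {
        string alternation = string.Join("|", stopWords.Select(Regex.Escape));
        return new Regex(@"\b(?:" + alternation + @")\b", RegexOptions.IgnoreCase);
    }

    public static string Remove(string input, Regex pattern)
    {
        string stripped = pattern.Replace(input, string.Empty);
        // Collapse the runs of spaces left behind by the removed words.
        return Regex.Replace(stripped, @" {2,}", " ").Trim();
    }
}
```

Building the pattern once and reusing it for every article avoids re-parsing a 500-word alternation on each call, e.g. `StopWordRemover.Remove(article, pattern)`.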
Put all the words you want to replace into a list. You can do this very simply or make it as complex as you like. A trivial example would be:
var sentence = "mysentence hi";
var words = File.ReadAllLines("pathtowordlist.txt"); // one word per line
foreach (var word in words)
    sentence = sentence.Replace(word, "x"); // strings are immutable, so keep the result
You could create two lists if you wanted a dual mapping scheme.
Try this:
string text = "word1 word2 you it";
List<string> words = new System.Collections.Generic.List<string>();
words.Add("word1");
words.Add("word2");
words.ForEach(w => text = text.Replace(w, ""));
Edit
If you want to replace text with another text, you can create class Word:
public class Word
{
public string SearchWord { get; set; }
public string ReplaceWord { get; set; }
}
And change above code to this:
string text = "word1 word2 you it";
List<Word> words = new System.Collections.Generic.List<Word>();
words.Add(new Word() { SearchWord = "word1", ReplaceWord = "replaced" });
words.Add(new Word() { SearchWord = "word2", ReplaceWord = "replaced" });
words.ForEach(w => text = text.Replace(w.SearchWord, w.ReplaceWord));
If you are talking about a single string, the solution is to remove the words with the simple Replace method. As the documentation says:
"Returns a new string in which all occurrences of a specified string in the current instance are replaced with another specified string".
You will probably need to replace several words, so make a list of them:
List<string> wordsToRemove = new List<string>();
wordsToRemove.Add("why");
wordsToRemove.Add("how");
and so on,
and then remove them from the string:
foreach (string curr in wordsToRemove)
    a = a.ToLower().Replace(curr, "");
Important
If you want to keep your string as it was, without lowercasing it and without struggling with upper and lower case, use:
foreach (string curr in wordsToRemove)
{
    // You can reuse this object; Regex.Escape keeps special characters literal
    Regex regex = new Regex(Regex.Escape(curr), RegexOptions.IgnoreCase);
    myString = regex.Replace(myString, "");
}
It depends on the situation, of course, but if your text is long, you have many words, and you want to optimize performance, you should build a trie from the words and search the trie for a match.
It won't lower the order of complexity (it is still O(nm)), but for large groups of words it can check multiple words against each character instead of one by one.
I assume a couple of hundred words should be enough to make this faster.
This is the fastest method in my opinion, and I have written a function for you to start with. Note that it tracks one partial match per word rather than building an actual trie, so a match that overlaps a failed partial match (for example "ab" in "aab") can be missed:
public struct FindRecord
{
public int WordIndex;
public int PositionInString;
}
public static FindRecord[] FindAll(string input, string[] words)
{
LinkedList<FindRecord> result = new LinkedList<FindRecord>();
int[] matchs = new int[words.Length];
for (int i = 0; i < input.Length; i++)
{
for (int j = 0; j < words.Length; j++)
{
if (input[i] == words[j][matchs[j]])
{
matchs[j]++;
if(matchs[j] == words[j].Length)
{
FindRecord findRecord = new FindRecord {WordIndex = j, PositionInString = i - matchs[j] + 1};
result.AddLast(findRecord);
matchs[j] = 0;
}
}
else
matchs[j] = 0;
}
}
return result.ToArray();
}
Another option:
It might be one of the rare cases where a regex is faster than hand-written code.
Try using
public static string ReplaceAll(string input, string[] words)
{
string wordlist = string.Join("|", words.Select(Regex.Escape).ToArray()); // escape regex metacharacters (needs using System.Linq)
Regex rx = new Regex(wordlist, RegexOptions.Compiled);
return rx.Replace(input, m => "");
}
Regex can do this better. You just need all the words to replace in a list, and then:
var escapedStrings = yourReplaces.Select(PadAndEscape);
// Replace with a single space so the neighbouring words stay separated
string result = Regex.Replace(yourInput, string.Join("|", escapedStrings), " ");
This requires a function that space-pads the strings before escaping them:
public string PadAndEscape(string s)
{
return Regex.Escape(" " + s + " ");
}
Related
I'm having some issues with replacing words in a string with values from a dictionary. Here's a small sample of my current code:
Dictionary<string, string> replacements = new Dictionary<string, string>()
{
{"ACFT", "AIRCRAFT"},
{"FT", "FEET"},
};
foreach(string s in replacements.Keys)
{
inputBox.Text = inputBox.Text.Replace(s, replacements[s]);
}
When I execute the code, if I have ACFT in the textbox, it is replaced with AIRCRAFEET because it sees the FT part in the string. I need to somehow differentiate this and only replace the whole word.
So for example, if I have ACFT in the box, it should replace it with AIRCRAFT. And, if I have FT in the box, replace it with FEET.
So my question is, how can I match whole words only when replacing words?
EDIT: I want to be able to use and replace multiple words.
use the if condition..
foreach(string s in replacements.Keys) {
if(inputBox.Text==s){
inputBox.Text = inputBox.Text.Replace(s, replacements[s]);
}
}
UPDATE after you modified your question..
string str = "ACFT FTT";
Dictionary<string, string> replacements = new Dictionary<string, string>()
{
{"ACFT", "AIRCRAFT"},
{"FT", "FEET"},
};
string[] temp = str.Split(' ');
string newStr = "";
for (int i = 0; i < temp.Length; i++)
{
try
{
temp[i] = temp[i].Replace(temp[i], replacements[temp[i]]);
}
catch (KeyNotFoundException e)
{
// not found..
}
newStr+=temp[i]+" ";
}
Console.WriteLine( newStr);
how can I match whole words only when replacing words?
Use regular expressions (as was suggested by David Pilkington)
Dictionary<string, string> replacements = new Dictionary<string, string>()
{
{"ACFT", "AIRCRAFT"},
{"FT", "FEET"},
};
foreach(string s in replacements.Keys)
{
var pattern = @"\b" + Regex.Escape(s) + @"\b"; // match on word boundaries; the verbatim string keeps \b from being read as a backspace
inputBox.Text = Regex.Replace(inputBox.Text, pattern, replacements[s]);
}
However, if you have control over the design, I would much rather use keys like "{ACFT}","{FT}" (which have explicit boundaries), so you could just use them with String.Replace.
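The loop in this answer runs one regex per key, so text produced by an earlier replacement is scanned again by later keys. A possible alternative, sketched below under my own assumptions, is to join all keys into a single alternation and look each match up in the dictionary. `WholeWordReplacer` and `ReplaceAll` are hypothetical names, and the sketch assumes the dictionary is non-empty and that keys start and end with word characters (otherwise \b will not behave as intended).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class WholeWordReplacer
{
    // Assumes a non-empty dictionary whose keys begin and end with word characters.
    public static string ReplaceAll(string input, IDictionary<string, string> replacements)
    {
        // Longer keys first, so that if two keys could match at the same
        // position the longer one is tried before its prefix.
        string alternation = string.Join("|",
            replacements.Keys
                .OrderByDescending(k => k.Length)
                .Select(Regex.Escape));
        string pattern = @"\b(?:" + alternation + @")\b";

        // Each match is looked up once; replaced text is never rescanned.
        return Regex.Replace(input, pattern, m => replacements[m.Value]);
    }
}
```

Because the input is scanned only once, a value that happens to equal another key (say "A" -> "B" and "B" -> "C") is not replaced a second time.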
I think you may want to replace the max length subStr in inputText.
int maxLength = 0;
string reStr = "";
foreach (string s in replacements.Keys)
{
if (textBox2.Text.Contains(s))
{
if (maxLength < s.Length)
{
maxLength = s.Length;
reStr = s;
}
}
}
if (reStr != "")
textBox2.Text = textBox2.Text.Replace(reStr, replacements[reStr]);
The problem with this is that you are replacing every instance of the substring in the entire string. If what you want is to replace only whole, space-delimited instances of "ACFT" or "FT", you would want to use String.Split() to create a set of tokens.
For example:
string tempString = textBox1.Text;
StringBuilder finalString = new StringBuilder();
foreach (string word in tempString.Split(new char[] { ' ' }))
{
    string replacement;
    // Replace the token only when the whole token is a key
    finalString.Append(replacements.TryGetValue(word, out replacement) ? replacement : word);
    finalString.Append(' ');
}
textBox1.Text = finalString.ToString().TrimEnd();
I've used a StringBuilder here because concatenation requires the creation of a new string every single time, and this gets extremely inefficient over long periods. If you expect to have a small number of concatenations to make, you can probably get away with using string.
Note that there's a slight wrinkle in your design - if you have a KeyValuePair with a value that's identical to a key that occurs later in the dictionary iteration, the replacement will be overwritten.
Here's a very funky way of doing this.
First up, you need to use regular expressions (Regex) as this has good built-in features for matching word boundaries.
So the key line of code would be to define a Regex instance:
var regex = new Regex(String.Format(@"\b{0}\b", Regex.Escape("FT")));
The \b marker looks for word boundaries. The Regex.Escape call ensures that if any of your keys contain special Regex characters, they are escaped.
You could then replace the text like this:
var replacedtext = regex.Replace("A FT AFT", "FEET");
You would get replacedtext == "A FEET AFT".
Now, here's the funky part. If you start with your current dictionary then you can define a single function that will do all of the replacements in one go.
Do it this way:
Func<string, string> funcreplaceall =
replacements
.ToDictionary(
kvp => new Regex(String.Format(@"\b{0}\b", Regex.Escape(kvp.Key))),
kvp => kvp.Value)
.Select(kvp =>
(Func<string, string>)(x => kvp.Key.Replace(x, kvp.Value)))
.Aggregate((f0, f1) => x => f1(f0(x)));
Now you can just call it like so:
inputBox.Text = funcreplaceall(inputBox.Text);
No looping required!
Just as a sanity check I got this:
funcreplaceall("A ACFT FT RACFT B") == "A AIRCRAFT FEET RACFT B"
I have a list of names (cyclists) in the order Lastname, Firstname. I want to run code so that it puts Lastname in front of Firstname. The Lastname is always written in uppercase and can contain one or more words. So I decided to split the string into an array, and that works. Only putting it back together is hard.
Here is my code so far (I tried it with both for and foreach):
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
string fullName = "BELAMONTE VALVERDE Allechandro Jesus";
string[] words = fullName.Split(' ');
foreach (string word in words)
if (word.ToUpper() == word)
{
string lastname = string.Join(" ", word);
Console.WriteLine(word);
}
Console.ReadLine();
string fullName2 = "GONZALEZ GALDEANO Igor Anton";
string[] words2 = fullName2.Split(' ');
for (int i = 0; i < words2.Length; i++)
{
string word2 = words2[i];
if (word2.ToUpper() == word2)
{
string lastname2 = string.Join(" ", word2);
Console.WriteLine(lastname2);
}
}
Console.ReadLine();
}
}
}
It gives an output like
BELAMONTE VALVERDE
BELAMONTE VALVERDE
I want that to be on one line. The actual use will be to read a record from a table, convert it, and replace the loaded item with the result.
The first thing you want to do is encapsulate the logic that's testing whether part of a string is uppercase:-
Detecting if a string is all CAPS
public bool IsAllUppercase(string value)
{
return value.All(x => char.IsUpper(x));
}
Then you want to encapsulate the logic that's extracting the uppercase part of your name
public string GetUppercasePart(string value)
{
return string.Join(" ", value.Split(' ').Where(x => IsAllUppercase(x)));
}
Then getting the uppercase part of the name is really simple:-
var lastName = GetUppercasePart("BELAMONTE VALVERDE Allechandro Jesus");
I get the impression, though, that there's more to your problem than just getting all of the uppercase words in a string.
WARNING: If this is code for a production application that you're going to run anywhere other than your computer, then you want to take into account that IsUpper means different things in different locales. You might want to read up on how internationalisation concerns affect string manipulation:-
http://support.microsoft.com/kb/312890
In C# what is the difference between ToUpper() and ToUpperInvariant()?
C# String comparisons: Difference between CurrentCultureIgnoreCase and InvariantCultureIgnoreCase
If you know that lastname will be all UPPERCASED and will be in front of first name, you can use regex for parsing uppercased letters and the rest of the name.
This is the regex:
([A-Z\s]+) (.*)
This one will match uppercased words where a space can be between them, that's the \s
([A-Z\s]+)
This one will match the rest of the name
(.*)
So the final code for switching one name could look like this:
static void Main(string[] args)
{
string fullName = "BELAMONTE VALVERDE Allechandro Jesus";
string pattern = @"([A-Z\s]+) (.*)";
var parsedName = Regex.Match(fullName,pattern);
string firstName = parsedName.Groups[2].ToString();
string lastName = parsedName.Groups[1].ToString();
string result = firstName + " " + lastName;
}
You have a code design problem here:
foreach (string word in words)
if (word.ToUpper() == word)
{
string lastname = string.Join(" ", word);
Console.WriteLine(word);
}
What you want to do is to write the lastname once, right? So let's split the algorithm:
1. Get all words from the string: done, with string[] words = fullName.Split(' ');
2. Read the first word; if it's uppercase, save it
3. Repeat step 2 for the next word until it isn't uppercase
4. Join all the saved words
5. Print the result
We don't need to "save" the words thanks to a handy class named StringBuilder, so it would go like this:
string fullName = "BELAMONTE VALVERDE Allechandro Jesus";
string[] words = fullName.Split(' ');
StringBuilder sb = new StringBuilder();
foreach (string word in words)
if (word.ToUpper() == word)
{
sb.Append(word + " ");
}
else
break; // That's assuming you will never have a last name's part after a first name's part :p
if (sb.Length > 0)
sb.Length--; // removes the last " " added in the loop, but maybe you want it ;)
Console.WriteLine(sb.ToString());
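For the full round trip (surname extracted, then the name reassembled with the first name in front, as in the regex answer), a compact sketch using LINQ might look like the following. `NameFormatter` and `FirstNameFirst` are hypothetical names, and the code assumes, as the question states, that the surname tokens come first and are entirely uppercase.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of any library.
public static class NameFormatter
{
    // Assumes the surname tokens come first and are entirely uppercase.
    public static string FirstNameFirst(string fullName)
    {
        string[] words = fullName.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        // Count the leading tokens that are unchanged by upper-casing.
        int surnameCount = words.TakeWhile(w => w == w.ToUpperInvariant()).Count();

        string lastName = string.Join(" ", words.Take(surnameCount));
        string firstName = string.Join(" ", words.Skip(surnameCount));

        // Trim covers the case where one of the two parts is empty.
        return (firstName + " " + lastName).Trim();
    }
}
```

TakeWhile stops at the first token that is not uppercase, which mirrors the break in the loop above.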
string[] newArr = (from x in asda
select x.ToUpper()).ToArray();
I'm working with an array of strings, and would like to do the following:
//Regex regex; List<string> strList; List<string> strList2;
foreach (string str in strList){
if (regex.IsMatch(str)) { //only need in new array if matches...
strList2.Add(regex.Replace(str, myMatchEvaluator));
//but still have to apply transformation
}
}
Now, I know that works, but that effectively means running the same regex twice on each string in the array. Is there a way to collapse both of these steps - the filtering and the transformation - into one regex-parsing call?
(One that would work most of the time is
string str2 = regex.Replace(str, myMatchEvaluator);
if (str2 != str)
strList2.Add(str2);
But that would often throw out some valid matches that still didn't need replacement.)
EDIT: A regex example, roughly similar to mine, to illustrate why this is tricky:
Imagine looking for words at the beginning of lines in a log file, and wanting to capitalize them.
The regex would be new Regex("^[a-z]+", RegexOptions.IgnorePatternWhitespace), and the replacement function would be match => match.Value.ToUpper().
Now some first words are already capitalized, and I don't want to throw them away. On the other hand, I don't want to upper-case all instances of the word on the line, just the first one.
you can create your own match evaluator:
private class DetectEvaluator {
    public bool HasBeenEvaluated { get; private set; }
    private MatchEvaluator evaluator;
    public DetectEvaluator(MatchEvaluator evaluator) {
        HasBeenEvaluated = false;
        this.evaluator = evaluator;
    }
    public string Evaluate(Match m) {
        // Only called when the regex actually matched
        HasBeenEvaluated = true;
        return evaluator(m);
    }
}
and then create a new one for each of your checks:
var de1 = new DetectEvaluator(myMatchEvaluator);
string str2 = regex.Replace(str, de1.Evaluate);
if( de1.HasBeenEvaluated ) strList2.Add(str2);
but I do not see improved readability here.
You can use a lambda function as match evaluator that updates a list of words.
IEnumerable<string> Replaces(string source)
{
var rx = new Regex(@"\w+m", RegexOptions.IgnoreCase); // match words ending with 'm'
var result = new List<string>();
rx.Replace(source, m => { result.Add(m.ToString().ToUpper()); return m.ToString(); });
return result;
}
List<string> GetReplacements(List<string> sources) {
var rx = new Regex(@"\w+m", RegexOptions.IgnoreCase); // match words ending with 'm'.
var replacements = new List<string>(sources.Count); // no need to allocate more space than needed.
foreach(string source in sources)
// for each string in sources that matches 'rx', add the ToUpper() version to the result and replace 'source' with itself.
rx.Replace(source, m => {replacements.Add(m.ToString().ToUpper()); return m.ToString(); });
return replacements;
}
List<string> GetReplacements2(List<string> sources) {
var rx = new Regex(@"\w+m", RegexOptions.IgnoreCase); // match words ending with 'm'.
var replacements = new List<string>(sources.Count); // no need to allocate more space than needed.
foreach(string source in sources) {
var m = rx.Match(source); // do one rx match
if (m.Success) // if successful
replacements.Add(m.ToString().ToUpper()); // add to result.
}
return replacements;
}
If you need to modify the original source and gather the unmodified matches then swap the parts in the lambda expression.
Would something like this work ?
foreach (string str in strList)
{
regex.Replace(str, delegate(Match thisMatch) { // result discarded: the foreach variable str cannot be reassigned
// only gets here if matched the regex already
string str2 = yourReplacementFunction(thisMatch);
strList2.Add(str2);
return thisMatch.Value;
});
}
Building off of all of the answers I've received, the following works:
void AddToIfMatch(List<string> list, string str, Regex regex,
    MatchEvaluator evaluator)
{
    bool hasBeenEvaluated = false;
    string str2 = regex.Replace(
        str,
        m => { hasBeenEvaluated = true; return evaluator(m); }
    );
    if (hasBeenEvaluated) { list.Add(str2); }
}
I have a List of words I want to ignore like this one :
public List<String> ignoreList = new List<String>()
{
"North",
"South",
"East",
"West"
};
For a given string, say "14th Avenue North" I want to be able to remove the "North" part, so basically a function that would return "14th Avenue " when called.
I feel like there is something I should be able to do with a mix of LINQ, regex and replace, but I just can't figure it out.
The bigger picture is, I'm trying to write an address matching algorithm. I want to filter out words like "Street", "North", "Boulevard", etc. before I use the Levenshtein algorithm to evaluate the similarity.
How about this:
string.Join(" ", text.Split().Where(w => !ignoreList.Contains(w)));
or for .Net 3:
string.Join(" ", text.Split().Where(w => !ignoreList.Contains(w)).ToArray());
Note that this method splits the string up into individual words so it only removes whole words. That way it will work properly with addresses like Northampton Way #123 that string.Replace can't handle.
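If the addresses are not normalized for case, one possible variation on the split-and-filter idea is to keep the ignore words in a HashSet with a case-insensitive comparer. This is only a sketch: `AddressFilter`, `IgnoreSet` and `RemoveIgnored` are names invented here, and ignoring case is my assumption rather than a stated requirement.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of any library.
public static class AddressFilter
{
    // OrdinalIgnoreCase makes "north", "North" and "NORTH" equivalent.
    private static readonly HashSet<string> IgnoreSet =
        new HashSet<string>(new[] { "North", "South", "East", "West" },
                            StringComparer.OrdinalIgnoreCase);

    public static string RemoveIgnored(string text)
    {
        // Split into whole words, drop the ignored ones, and rejoin.
        var kept = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                       .Where(w => !IgnoreSet.Contains(w));
        return string.Join(" ", kept);
    }
}
```

A HashSet lookup also stays cheap as the ignore list grows, whereas List.Contains scans the whole list for every word.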
Regex r = new Regex(string.Join("|", ignoreList.Select(s => Regex.Escape(s)).ToArray()));
string s = "14th Avenue North";
s = r.Replace(s, string.Empty);
Something like this should work:
string FilterAllValuesFromIgnoreList(string someStringToFilter)
{
return ignoreList.Aggregate(someStringToFilter, (str, filter)=>str.Replace(filter, ""));
}
What's wrong with a simple for loop?
string street = "14th Avenue North";
foreach (string word in ignoreList)
{
street = street.Replace(word, string.Empty);
}
If you know that the list of word contains only characters that do not need escaping inside a regular expression then you can do this:
string s = "14th Avenue North";
Regex regex = new Regex(string.Format(@"\b({0})\b",
string.Join("|", ignoreList.ToArray())));
s = regex.Replace(s, "");
Result:
14th Avenue
If there are special characters you will need to fix two things:
Use Regex.Escape on each element of ignore list.
The word-boundary \b will not match a whitespace followed by a symbol or vice versa. You may need to check for whitespace (or other separating characters such as punctuation) using lookaround assertions instead.
Here's how to fix these two problems:
Regex regex = new Regex(string.Format(@"(?<= |^)({0})(?= |$)",
string.Join("|", ignoreList.Select(x => Regex.Escape(x)).ToArray())));
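To see the difference between the two patterns, here is a small sketch that tests a single word against both. `BoundaryDemo` and its two methods are hypothetical names, and "#12" is just an example token I picked to stand in for a word that starts with a symbol.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class BoundaryDemo
{
    // \b only exists between a word character and a non-word character,
    // so it cannot sit between a space and a symbol such as '#'.
    public static bool MatchesWithWordBoundary(string input, string word)
    {
        return Regex.IsMatch(input, @"\b(" + Regex.Escape(word) + @")\b");
    }

    // The lookarounds require a space (or the string start/end) on both
    // sides, regardless of which characters the word itself contains.
    public static bool MatchesWithLookaround(string input, string word)
    {
        return Regex.IsMatch(input, @"(?<= |^)(" + Regex.Escape(word) + @")(?= |$)");
    }
}
```

With the input "Suite #12 North", the \b version does not find "#12" while the lookaround version does; both find "North", and the lookaround version does not find "North" inside "Northampton".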
If it's a short string as in your example, you can just loop through the strings and replace one at a time. If you want to get fancy you can use the LINQ Aggregate method to do it:
address = ignoreList.Aggregate(address, (a, s) => a.Replace(s, String.Empty));
If it's a large string, that would be slow. Instead you can replace all strings in a single run through the string, which is much faster. I made a method for that in this answer.
LINQ makes this easy and readable. This requires normalized data though, particularly in that it is case-sensitive.
List<string> ignoreList = new List<string>()
{
"North",
"South",
"East",
"West"
};
string s = "123 West 5th St"
.Split(' ') // Separate the words to an array
.ToList() // Convert array to List<>
.Except(ignoreList) // Remove ignored keywords
.Aggregate((s1, s2) => s1 + " " + s2); // Reconstruct the string
Why not just Keep It Simple?
public static string Trim(string text)
{
    var rv = text.Trim();
    foreach (var ignore in ignoreList) {
        if (rv.EndsWith(ignore)) {
            rv = rv.Replace(ignore, string.Empty);
        }
    }
    return rv;
}
You can do this using an expression if you like, but it's easier to turn it around than to use Aggregate. I would do something like this:
string s = "14th Avenue North";
ignoreList.ForEach(i => s = s.Replace(i, ""));
//result is "14th Avenue "
public static string Trim(string text)
{
var rv = text;
foreach (var ignore in ignoreList)
rv = rv.Replace(ignore, "");
return rv;
}
Updated For Gabe
public static string Trim(string text)
{
    var rv = "";
    var words = text.Split(' ');
    foreach (var word in words)
    {
        var present = false;
        foreach (var ignore in ignoreList)
            if (word == ignore)
                present = true;
        if (!present)
            rv += word + " "; // keep a separator between the kept words
    }
    return rv.TrimEnd();
}
If you have a list, I think you're going to have to touch all the items. You could create a massive RegEx with all your ignore keywords and replace to String.Empty.
Here's a start:
(^|\s+)(North|South|East|West){1,2}(ern)?(\s+|$)
If you have a single RegEx for ignore words, you can do a single replace for each phrase you want to pass to the algorithm.
Is there an easy way to capitalize the first letter of a string and lower the rest of it? Is there a built in method or do I need to make my own?
TextInfo.ToTitleCase() capitalizes the first character in each token of a string.
If there is no need to maintain Acronym Uppercasing, then you should include ToLower().
string s = "JOHN DOE";
s = CultureInfo.CurrentCulture.TextInfo.ToTitleCase(s.ToLower());
// Produces "John Doe"
If CurrentCulture is unavailable, use:
string s = "JOHN DOE";
s = new System.Globalization.CultureInfo("en-US", false).TextInfo.ToTitleCase(s.ToLower());
See the MSDN Link for a detailed description.
CultureInfo.CurrentCulture.TextInfo.ToTitleCase("hello world");
String test = "HELLO HOW ARE YOU";
string s = CultureInfo.CurrentCulture.TextInfo.ToTitleCase(test);
The above code won't work, because ToTitleCase leaves words that are entirely uppercase unchanged (it treats them as acronyms).
So convert the string to lowercase first, then apply the function:
String test = "HELLO HOW ARE YOU";
string s = CultureInfo.CurrentCulture.TextInfo.ToTitleCase(test.ToLower());
There are some cases that CultureInfo.CurrentCulture.TextInfo.ToTitleCase cannot handle, for example : the apostrophe '.
string input = CultureInfo.CurrentCulture.TextInfo.ToTitleCase("o'reilly, m'grego, d'angelo");
// input = O'reilly, M'grego, D'angelo
A regex can also be used \b[a-zA-Z] to identify the starting character of a word after a word boundary \b, then we need just to replace the match by its upper case equivalence thanks to the Regex.Replace(string input,string pattern,MatchEvaluator evaluator) method :
string input = "o'reilly, m'grego, d'angelo";
input = Regex.Replace(input.ToLower(), @"\b[a-zA-Z]", m => m.Value.ToUpper());
// input = O'Reilly, M'Grego, D'Angelo
The regex can be tuned if needed, for instance, if we want to handle the MacDonald and McFry cases the regex becomes : (?<=\b(?:mc|mac)?)[a-zA-Z]
string input = "o'reilly, m'grego, d'angelo, macdonald's, mcfry";
input = Regex.Replace(input.ToLower(), @"(?<=\b(?:mc|mac)?)[a-zA-Z]", m => m.Value.ToUpper());
// input = O'Reilly, M'Grego, D'Angelo, MacDonald'S, McFry
If we need to handle more prefixes we only need to modify the group (?:mc|mac), for example to add french prefixes du, de : (?:mc|mac|du|de).
Finally, we can realize that this regex will also match the case MacDonald'S for the last 's so we need to handle it in the regex with a negative look behind (?<!'s\b). At the end we have :
string input = "o'reilly, m'grego, d'angelo, macdonald's, mcfry";
input = Regex.Replace(input.ToLower(), @"(?<=\b(?:mc|mac)?)[a-zA-Z](?<!'s\b)", m => m.Value.ToUpper());
// input = O'Reilly, M'Grego, D'Angelo, MacDonald's, McFry
Mc and Mac are common surname prefixes throughout the US, and there are others. TextInfo.ToTitleCase doesn't handle those cases and shouldn't be used for this purpose. Here's how I'm doing it:
public static string ToTitleCase(string str)
{
string result = str;
if (!string.IsNullOrEmpty(str))
{
var words = str.Split(' ');
for (int index = 0; index < words.Length; index++)
{
var s = words[index];
if (s.Length > 0)
{
words[index] = s[0].ToString().ToUpper() + s.Substring(1);
}
}
result = string.Join(" ", words);
}
return result;
}
ToTitleCase() should work for you.
http://support.microsoft.com/kb/312890
The most direct option is to use the ToTitleCase function available in .NET, which should take care of the name most of the time. As edg pointed out, there are some names that it will not work for, but these are fairly rare, so unless you are targeting a culture where such names are common it is not necessarily something that you have to worry too much about.
However, if you are not working with a .NET language, then it depends on what the input looks like. If you have two separate fields for the first name and the last name, then you can just capitalize the first letter and lowercase the rest of it using substrings.
firstName = firstName.Substring(0, 1).ToUpper() + firstName.Substring(1).ToLower();
lastName = lastName.Substring(0, 1).ToUpper() + lastName.Substring(1).ToLower();
However, if you are provided multiple names as part of the same string, then you need to know how you are getting the information and split it accordingly. So if you are getting a name like "John Doe", you can split the string on the space character. If it is in a format such as "Doe, John", you are going to need to split it on the comma. Once you have it split apart, you just apply the code shown previously.
CultureInfo.CurrentCulture.TextInfo.ToTitleCase ("my name");
returns ~ My Name
But the problem still exists with names like McFly as stated earlier.
I use my own method to get this fixed:
For example, the phrase "hello world. hello this is the stackoverflow world." will become "Hello World. Hello This Is The Stackoverflow World.". The regex \b (start of a word) \w (first character of the word) will do the trick.
/// <summary>
/// Makes each first letter of a word uppercase. The rest will be lowercase
/// </summary>
/// <param name="Phrase"></param>
/// <returns></returns>
public static string FormatWordsWithFirstCapital(string Phrase)
{
MatchCollection Matches = Regex.Matches(Phrase, "\\b\\w");
Phrase = Phrase.ToLower();
foreach (Match Match in Matches)
Phrase = Phrase.Remove(Match.Index, 1).Insert(Match.Index, Match.Value.ToUpper());
return Phrase;
}
The suggestions to use ToTitleCase won't work for strings that are all upper case. So you are going to have to call ToUpper on the first char and ToLower on the remaining characters.
This class does the trick. You can add new prefixes to the _prefixes static string array.
public static class StringExtensions
{
public static string ToProperCase( this string original )
{
if( String.IsNullOrEmpty( original ) )
return original;
string result = _properNameRx.Replace( original.ToLower( CultureInfo.CurrentCulture ), HandleWord );
return result;
}
public static string WordToProperCase( this string word )
{
if( String.IsNullOrEmpty( word ) )
return word;
if( word.Length > 1 )
return Char.ToUpper( word[0], CultureInfo.CurrentCulture ) + word.Substring( 1 );
return word.ToUpper( CultureInfo.CurrentCulture );
}
private static readonly Regex _properNameRx = new Regex( @"\b(\w+)\b" );
private static readonly string[] _prefixes = {
"mc"
};
private static string HandleWord( Match m )
{
string word = m.Groups[1].Value;
foreach( string prefix in _prefixes )
{
if( word.StartsWith( prefix, StringComparison.CurrentCultureIgnoreCase ) )
return prefix.WordToProperCase() + word.Substring( prefix.Length ).WordToProperCase();
}
return word.WordToProperCase();
}
}
If you're using VS2008, you can use an extension method to add it to the String class:
public static string FirstLetterToUpper(this String input)
{
return input = input.Substring(0, 1).ToUpper() +
input.Substring(1, input.Length - 1);
}
To get round some of the issues/problems that have been highlighted, I would suggest converting the string to lower case first and then calling the ToTitleCase method. You could then use IndexOf(" Mc") or IndexOf(" O\'") to determine special cases that need more specific attention.
inputString = inputString.ToLower();
inputString = CultureInfo.CurrentCulture.TextInfo.ToTitleCase(inputString);
int indexOfMc = inputString.IndexOf(" Mc");
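if (indexOfMc >= 0 && indexOfMc + 3 < inputString.Length)
{
    // Upper-case the letter that follows " Mc" and keep the result
    inputString = inputString.Substring(0, indexOfMc + 3) + inputString[indexOfMc + 3].ToString().ToUpper() + inputString.Substring(indexOfMc + 4);
}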
I like this way:
using System.Globalization;
...
TextInfo myTi = new CultureInfo("en-Us",false).TextInfo;
string raw = "THIS IS ALL CAPS";
string firstCapOnly = myTi.ToTitleCase(raw.ToLower());
Lifted from this MSDN article.
Hope this helps you.
String fName = "firstname";
String lName = "lastname";
String capitalizedFName = CultureInfo.CurrentCulture.TextInfo.ToTitleCase(fName);
String capitalizedLName = CultureInfo.CurrentCulture.TextInfo.ToTitleCase(lName);
public static string ConvertToCaptilize(string input)
{
if (!string.IsNullOrEmpty(input))
{
string[] arrUserInput = input.Split(' ');
// Initialize a string builder object for the output
StringBuilder sbOutPut = new StringBuilder();
// Loop through each word in the string array
foreach (string str in arrUserInput)
{
if (!string.IsNullOrEmpty(str))
{
var charArray = str.ToCharArray();
int k = 0;
foreach (var cr in charArray)
{
char c;
c = k == 0 ? char.ToUpper(cr) : char.ToLower(cr);
sbOutPut.Append(c);
k++;
}
}
sbOutPut.Append(" ");
}
return sbOutPut.ToString();
}
return string.Empty;
}
Like edg indicated, you'll need a more complex algorithm to handle special names (this is probably why many places force everything to upper case).
Something like this untested C# should handle the simple case you requested:
public string SentenceCase(string input)
{
    return input.Substring(0, 1).ToUpper() + input.Substring(1).ToLower();
}
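A slightly more defensive version of the same idea is sketched below. It guards against null and empty input, which would otherwise make Substring throw. `Casing` and `CapitalizeFirst` are hypothetical names, and the choice of the invariant culture is an assumption on my part; pass a specific CultureInfo if the names are locale-sensitive.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of any library.
public static class Casing
{
    public static string CapitalizeFirst(string input)
    {
        // Nothing to capitalize: return null or "" unchanged.
        if (string.IsNullOrEmpty(input))
            return input;

        // Upper-case the first character, lower-case everything after it.
        return char.ToUpper(input[0], CultureInfo.InvariantCulture)
             + input.Substring(1).ToLower(CultureInfo.InvariantCulture);
    }
}
```

Like the original, this treats the whole string as one word, so "JOHN DOE" becomes "John doe"; split on spaces first if each word needs its own capital.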