Checking if string contains multiple letters - c#

I have a collection of an object called dictionaryWords.
For each word in this collection I need to check if it does not contain certain letters. If it does contain one or more of a certain letter it is removed from the collection.
Example:
Collection before removal: ['abc','dee',fff']
letters to check for: e,f
Collection after removal: ['abc']
Rather than specifying multiple letters is there a way to check against an array?
My code:
foreach(DictionaryWord word in dictionaryWords)
{
if (!word.Contains("D") && !word.Contains("E") // optimize this line
{
// Word does not contain letters, word is good
}
}
How can I replace the "optimize this line" to say "if word contains any letter from an array of values"
Thanks,
Andrew

Try something like this:
// isolate the letters
string[] letters = new string[] { "D", "E", "F" }; // other strings or letters
// interate between your data
foreach(DictionaryWord word in dictionaryWords)
{
// check if the work does not contain any letter
if (!letters.Any(x => word.Contains(x))
{
// Word does not contain letters, word is good
}
}

Why not abstract this out to a function?
void CheckForLetters(IEnumerable<DictionaryWord> source, IEnumerable<char> letters) {
foreach (var word in source) {
if (letters.any(c => word.Contains(c)) {
// It has the letter
}
}
}

Create a method that accepts an array of characters to compare against.
The prototype would look like the following: bool isLetterInWord(DictionaryWord word, char[] letters);

Assuming that you can map your DictionaryWord class to a string, you can use a Linq query for this:
var words = new string[] { "abc", "def", "fe" };
var lettersToExclude = new char[] { 'e', 'f' };
var goodWords = from x in words where !lettersToExclude.Intersect(x).Any() select x;

Use HashSets and make use of IsSubsetOf method
http://msdn.microsoft.com/en-us/library/bb358446(v=vs.110).aspx
string str = "teststing";
var strset = new HashSet<char>(str.ToCharArray());
var charstoCheck = new HashSet<char>(new Char[] {'z'});
bool isstrcontainschars = charstoCheck.IsSubsetOf(strset);

You could use Regex for that.
string value = "abcdef";
var result = new Regex("[d-e-f]").Replace(value, string.Empty);
It checks if the string contains the letters at the Regex's constructor, and replace the letters for what you want. In my example, I replaced the letter to string.Empty.
In your case, you could do like this:
if (new Regex("[d-e-f]").IsMatch(word))
{
//Do something.
}

Related

How to check if a string contains a word and ignore special characters?

I need to check if a sentence contains any of the word from a string array but while checking it should ignore special characters like comma. But the result should have original sentence.
For example, I have a sentence "Tesla car price is $ 250,000."
In my word array I've wrdList = new string[5]{ "250000", "Apple", "40.00"};
I have written the below line of code, but it is not returning the result because 250,000 and 250000 are not matching.
List<string> res = row.ItemArray.Where(itmArr => wrdList.Any(wrd => itmArr.ToString().ToLower().Contains(wrd.ToString()))).OfType<string>().ToList();
And one important thing is, I need to get original sentence if it matches with string array.
For example, result should be "Tesla car price is $ 250,000."
not like "Tesla car price is $ 250000."
How about Replace(",", "")
itmArr.ToString().ToLower().Replace(",", "").Contains(wrd.ToString())
side note: .ToLower() isn't required since digits are case insensitive and a string doesn't need .ToString()
so the resuld could also be
itmArr.Replace(",", "").Contains(wrd)
https://dotnetfiddle.net/A2zN0d
Update
sice the , could be a different character - culture based, you can also use
ystem.Threading.Thread.CurrentThread.CurrentCulture.NumberFormat.NumberGroupSeparator
instead
The first option to consider for most text matching problems is to use regular expressions. This will work for your problem. The core part of the solution is to construct an appropriate regular expression to match what you need to match.
You have a list of words, but I'll focus on just one word. Your requirements specify that you want to match on a "word". So to start with, you can use the "word boundary" pattern \b. To match the word "250000", the regular expression would be \b250000\b.
Your requirements also specify that the word can "contain" characters that are "special". For it to work correctly, you need to be clear what it means to "contain" and which characters are "special".
For the "contain" requirement, I'll assume you mean that the special character can be between any two characters in the word, but not the first or last character. So for the word "250000", any of the question marks in this string could be a special character: "2?5?0?0?0?0".
For the "special" requirement, there are options that depend on your requirements. If it's simply punctuation, you can use the character class \p{P}. If you need to specify a specific list of special characters, you can use a character group. For example, if your only special character is comma, the character group would be [,].
To put all that together, you would create a function to build the appropriate regular expression for each target word, then use that to check your sentence. Something like this:
public static void Main()
{
string sentence = "Tesla car price is $ 250,000.";
var targetWords = new string[]{ "250000", "350000", "400000"};
Console.WriteLine($"Contains target word? {ContainsTarget(sentence, targetWords)}");
}
private static bool ContainsTarget(string sentence, string[] targetWords)
{
return targetWords.Any(targetWord => ContainsTarget(sentence, targetWord));
}
private static bool ContainsTarget(string sentence, string targetWord)
{
string targetWordExpression = TargetWordExpression(targetWord);
var re = new Regex(targetWordExpression);
return re.IsMatch(sentence);
}
private static string TargetWordExpression(string targetWord)
{
var sb = new StringBuilder();
// If special characters means a specific list, use this:
string specialCharacterMatch = $"[,]?";
// If special characters means any punctuation, then you can use this:
//string specialCharactersMatch = "\\p{P}?";
bool any = false;
foreach (char c in targetWord)
{
if (any)
{
sb.Append(specialCharacterMatch);
}
any = true;
sb.Append(c);
}
return $"\\b{sb}\\b";
}
Working code: https://dotnetfiddle.net/5UJSur
Hope below solution can help,
Used Regular expression for removing non alphanumeric characters
Returns the original string if it contains any matching word from wrdList.
string s = "Tesla car price is $ 250,000.";
string[] wrdList = new string[3] { "250000", "Apple", "40.00" };
Regex rgx = new Regex("[^a-zA-Z0-9 -]");
string str = rgx.Replace(s, "");
if (wrdList.Any(str.Contains))
{
Console.Write(s);
}
else
{
Console.Write("No Match Found!");
}
Uplodade on fiddle for more exploration
https://dotnetfiddle.net/zbwuDy
In addition for paragraph, can split into array of sentences and iterate through. Check the same on below fiddle.
https://dotnetfiddle.net/AvO6FJ

Splitting a string with a space/two spaces after the character

Consider a number of strings, which are assumed to contain "keys" of the form "Wxxx", where x are digits from 0-9. Each one can contain either one only, or multiple ones, separated by ',' followed by two spaces. For example:
W123
W432
W546, W234, W167
The ones that contain multiple "keys" need to be split up, into an array. So, the last one in the above examples should be split into an array like this: {"W546", "W234", "W167"}.
As a quick solution, String.Split comes to mind, but as far as I am aware, it can take one character, like ','. The problem is that it would return an array with like this: {"W546", " W234", " W167"}. The two spaces in all the array entries from the second one onwards can probably be removed using Substring, but is there a better solution?
For context, these values are being held in a spreadsheet, and are assumed to have undergone data validation to ensure the "keys" are separated by a comma followed by two spaces.
while ((ws.Cells[row,1].Value!=null) && (ws.Cells[row,1].Value.ToString().Equals("")))
{
// there can be one key, or multiple keys separated by ','
if (ws.Cells[row,keysCol].Value.ToString().Contains(','))
{
// there are multiple
// need to split the ones in this cell separated by a comma
}
else
{
// there is one
}
row++;
}
You can just specify ',' and ' ' as separators and RemoveEmptyEntries.
Using your sample of single keys and a string containing multiple keys you can just handle them all the same and get your list of individual keys:
List<string> cells = new List<string>() { "W123", "W432", "W546, W234, W167" };
List<string> keys = new List<string>();
foreach (string cell in cells)
{
keys.AddRange(cell.Split(new char[] { ',', ' ' }, StringSplitOptions.RemoveEmptyEntries));
}
Split can handle strings where's nothing to split and AddRange will accept your single keys as well as the multi-key split results.
You could use an old favorite--Regular Expressions.
Here are two flavors 'Loop' or 'LINQ'.
static void Main(string[] args)
{
var list = new List<string>{"W848","W998, W748","W953, W9484, W7373","W888"};
Console.WriteLine("LINQ");
list.ForEach(l => TestSplitRegexLinq(l));
Console.WriteLine();
Console.WriteLine("Loop");
list.ForEach(l => TestSplitRegexLoop(l));
}
private static void TestSplitRegexLinq(string s)
{
string pattern = #"[W][0-9]*";
var reg = new Regex(pattern);
reg.Matches(s).ToList().ForEach(m => Console.WriteLine(m.Value));
}
private static void TestSplitRegexLoop(string s)
{
string pattern = #"[W][0-9]*";
var reg = new Regex(pattern);
foreach (Match m in reg.Matches(s))
{
Console.WriteLine(m.Value);
}
}
Just replace the Console.Write with anything you want: eg. myList.Add(m.Value).
You will need to add the NameSpace: using System.Text.RegularExpressions;
Eliminate the extra space first (using Replace()), then use split.
var input = "W546, W234, W167";
var normalized = input.Replace(", ",",");
var array = normalized.Split(',');
This way, you treat a comma followed by a space exactly the same as you'd treat a comma. If there might be two spaces you can also replace that:
var input = "W546, W234, W167";
var normalized = input.Replace(" "," ").Replace(", ",",");
var array = normalized.Split(',');
After trying this in .NET fiddle, I think I may have a solution:
// if there are multiple
string keys = ws.Cells[row,keysCol].Value.ToString();
// remove spaces
string keys_normalised = keys.Replace(" ", string.Empty);
Console.WriteLine("Checking that spaces have been removed: " + keys3_normalised + "\n");
string[] splits = keys3_normalised.Split(',');
for (int i = 0; i < splits.Length; i++)
{
Console.WriteLine(splits[i]);
}
This produces the following output in the console:
Checking that spaces have been removed: W456,W234,W167
W456
W234
W167

Using string.ToUpper on substring

Have an assignment to allow a user to input a word in C# and then display that word with the first and third characters changed to uppercase. Code follows:
namespace Capitalizer
{
class Program
{
static void Main(string[] args)
{
string text = Console.ReadLine();
char[] delimiterChars = { ' ' };
string[] words = text.Split(delimiterChars);
string Upper = text.ToUpper();
Console.WriteLine(Upper);
Console.ReadKey();
}
}
}
This of course generates the entire word in uppercase, which is not what I want. I can't seem to make text.ToUpper(0,2) work, and even then that'd capitalize the first three letters. Only solution I can think of now that would make the word appear on one line (and I don't know if it works) is to move the capitalized letters and lowercase letters into a character array and try to get that to print all values in a modified order.
The simplest way I can think of to address your exact question as described — to convert to upper case the first and third characters of the input — would be something like the following:
StringBuilder sb = new StringBuilder(text);
sb[0] = char.ToUpper(sb[0]);
sb[2] = char.ToUpper(sb[2]);
text = sb.ToString();
The StringBuilder class is essentially a mutable string object, so when doing these kinds of operations is the most fluid way to approach the problem, as it provides the most straightforward conversions to and from, as well as the full range of string operations. Changing individual characters is easy in many data structures, but insertions, deletions, appending, formatting, etc. all also come with StringBuilder, so it's a good habit to use that versus other approaches.
But frankly, it's hard to see how that's a useful operation. I can't help but wonder if you have stated the requirements incorrectly and there's something more to this question than is seen here.
You could use LINQ:
var upperCaseIndices = new[] { 0, 2 };
var message = "hello";
var newMessage = new string(message.Select((c, i) =>
upperCaseIndices.Contains(i) ? Char.ToUpper(c) : c).ToArray());
Here is how it works. message.Select (inline LINQ query) selects characters from message one by one and passes into selector function:
upperCaseIndices.Contains(i) ? Char.ToUpper(c) : c
written as C# ?: shorthand syntax for if. It reads as "If index is present in the array, then select upper case character. Otherwise select character as is."
(c, i) => condition
is a lambda expression. See also:
Understand Lambda Expressions in 3 minutes
The rest is very simple - represent result as array of characters (.ToArray()), and create a new string based off that (new string(...)).
Only solution I can think of now that would make the word appear on one line (and I don't know if it works) is to move the capitalized letters and lowercase letters into a character array and try to get that to print all values in a modified order.
That seems a lot more complicated than necessary. Once you have a character array, you can simply change the elements of that character array. In a separate function, it would look something like
string MakeFirstAndThirdCharacterUppercase(string word) {
var chars = word.ToCharArray();
chars[0] = chars[0].ToUpper();
chars[2] = chars[2].ToUpper();
return new string(chars);
}
My simple solution:
string text = Console.ReadLine();
char[] delimiterChars = { ' ' };
string[] words = text.Split(delimiterChars);
foreach (string s in words)
{
char[] chars = s.ToCharArray();
chars[0] = char.ToUpper(chars[0]);
if (chars.Length > 2)
{
chars[2] = char.ToUpper(chars[2]);
}
Console.Write(new string(chars));
Console.Write(' ');
}
Console.ReadKey();

How to Replace Multiple Words in a String Using C#?

I'm wondering how I can replace (remove) multiple words (like 500+) from a string. I know I can use the replace function to do this for a single word, but what if I want to replace 500+ words? I'm interested in removing all generic keywords from an article (such as "and", "I", "you" etc).
Here is the code for 1 replacement.. I'm looking to do 500+..
string a = "why and you it";
string b = a.Replace("why", "");
MessageBox.Show(b);
Thanks
# Sergey Kucher Text size will vary between a few hundred words to a few thousand. I am replacing these words from random articles.
I would normally do something like:
// If you want the search/replace to be case sensitive, remove the
// StringComparer.OrdinalIgnoreCase
Dictionary<string, string> replaces = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase) {
// The format is word to be searched, word that should replace it
// or String.Empty to simply remove the offending word
{ "why", "xxx" },
{ "you", "yyy" },
};
void Main()
{
string a = "why and you it and You it";
// This will search for blocks of letters and numbers (abc/abcd/ab1234)
// and pass it to the replacer
string b = Regex.Replace(a, #"\w+", Replacer);
}
string Replacer(Match m)
{
string found = m.ToString();
string replace;
// If the word found is in the dictionary then it's placed in the
// replace variable by the TryGetValue
if (!replaces.TryGetValue(found, out replace))
{
// otherwise replace the word with the same word (so do nothing)
replace = found;
}
else
{
// The word is in the dictionary. replace now contains the
// word that will substitute it.
// At this point you could add some code to maintain upper/lower
// case between the words (so that if you -> xxx then You becomes Xxx
// and YOU becomes XXX)
}
return replace;
}
As someone else wrote, but without problems with substrings (the ass principle... You don't want to remove asses from classes :-) ), and working only if you only need to remove words:
var escapedStrings = yourReplaces.Select(Regex.Escape);
string result = Regex.Replace(yourInput, #"\b(" + string.Join("|", escapedStrings) + #")\b", string.Empty);
I use the \b word boundary... It's a little complex to explain what it's, but it's useful to find word boundaries :-)
Create a list of all text you want and load it into a list, you do this fairly simple or get very complex. A trivial example would be:
var sentence = "mysentence hi";
var words = File.ReadAllText("pathtowordlist.txt").Split(Enviornment.NewLine);
foreach(word in words)
sentence.replace("word", "x");
You could create two lists if you wanted a dual mapping scheme.
Try this:
string text = "word1 word2 you it";
List<string> words = new System.Collections.Generic.List<string>();
words.Add("word1");
words.Add("word2");
words.ForEach(w => text = text.Replace(w, ""));
Edit
If you want to replace text with another text, you can create class Word:
public class Word
{
public string SearchWord { get; set; }
public string ReplaceWord { get; set; }
}
And change above code to this:
string text = "word1 word2 you it";
List<Word> words = new System.Collections.Generic.List<Word>();
words.Add(new Word() { SearchWord = "word1", ReplaceWord = "replaced" });
words.Add(new Word() { SearchWord = "word2", ReplaceWord = "replaced" });
words.ForEach(w => text = text.Replace(w.SearchWord, w.ReplaceWord));
if you are talking about a single string the solution is to remove them all by a simple replace method. as you can read there:
"Returns a new string in which all occurrences of a specified string in the current instance are replaced with another specified string".
you may be needing to replace several words, and you can make a list of these words:
List<string> wordsToRemove = new List<string>();
wordsToRemove.Add("why");
wordsToRemove.Add("how);
and so on
and then remove them from the string
foreach(string curr in wordsToRemove)
a = a.ToLower().Replace(curr, "");
Importent
if you want to keep your string as it was, without lowering words and without struggling with lower and upper case use
foreach(string curr in wordsToRemove)
// You can reuse this object
Regex regex = new Regex(curr, RegexOptions.IgnoreCase);
myString = regex.Replace(myString, "");
depends on the situation ofcourse,
but if your text is long and you have many words,
and you want optimize performance.
you should build a trie from the words, and search the Trie for a match.
it won't lower the Order of complexity, still O(nm), but for large groups of words, it will be able to check multiple words against each char instead of one by one.
i can assume couple of houndred words should be enough to get this faster.
This is the fastest method in my opinion and
i written a function for you to start with:
public struct FindRecord
{
public int WordIndex;
public int PositionInString;
}
public static FindRecord[] FindAll(string input, string[] words)
{
LinkedList<FindRecord> result = new LinkedList<FindRecord>();
int[] matchs = new int[words.Length];
for (int i = 0; i < input.Length; i++)
{
for (int j = 0; j < words.Length; j++)
{
if (input[i] == words[j][matchs[j]])
{
matchs[j]++;
if(matchs[j] == words[j].Length)
{
FindRecord findRecord = new FindRecord {WordIndex = j, PositionInString = i - matchs[j] + 1};
result.AddLast(findRecord);
matchs[j] = 0;
}
}
else
matchs[j] = 0;
}
}
return result.ToArray();
}
Another option:
it might be the rare case where regex will be faster then building the code.
Try using
public static string ReplaceAll(string input, string[] words)
{
string wordlist = string.Join("|", words);
Regex rx = new Regex(wordlist, RegexOptions.Compiled);
return rx.Replace(input, m => "");
}
Regex can do this better, you just need all the replace words in a list, and then:
var escapedStrings = yourReplaces.Select(PadAndEscape);
string result = Regex.Replace(yourInput, string.Join("|", escapedStrings);
This requires a function that space-pads the strings before escaping them:
public string PadAndEscape(string s)
{
return Regex.Escape(" " + s + " ");
}

string replace using a List<string>

I have a List of words I want to ignore like this one :
public List<String> ignoreList = new List<String>()
{
"North",
"South",
"East",
"West"
};
For a given string, say "14th Avenue North" I want to be able to remove the "North" part, so basically a function that would return "14th Avenue " when called.
I feel like there is something I should be able to do with a mix of LINQ, regex and replace, but I just can't figure it out.
The bigger picture is, I'm trying to write an address matching algorithm. I want to filter out words like "Street", "North", "Boulevard", etc. before I use the Levenshtein algorithm to evaluate the similarity.
How about this:
string.Join(" ", text.Split().Where(w => !ignoreList.Contains(w)));
or for .Net 3:
string.Join(" ", text.Split().Where(w => !ignoreList.Contains(w)).ToArray());
Note that this method splits the string up into individual words so it only removes whole words. That way it will work properly with addresses like Northampton Way #123 that string.Replace can't handle.
Regex r = new Regex(string.Join("|", ignoreList.Select(s => Regex.Escape(s)).ToArray()));
string s = "14th Avenue North";
s = r.Replace(s, string.Empty);
Something like this should work:
string FilterAllValuesFromIgnoreList(string someStringToFilter)
{
return ignoreList.Aggregate(someStringToFilter, (str, filter)=>str.Replace(filter, ""));
}
What's wrong with a simple for loop?
string street = "14th Avenue North";
foreach (string word in ignoreList)
{
street = street.Replace(word, string.Empty);
}
If you know that the list of word contains only characters that do not need escaping inside a regular expression then you can do this:
string s = "14th Avenue North";
Regex regex = new Regex(string.Format(#"\b({0})\b",
string.Join("|", ignoreList.ToArray())));
s = regex.Replace(s, "");
Result:
14th Avenue
If there are special characters you will need to fix two things:
Use Regex.Escape on each element of ignore list.
The word-boundary \b will not match a whitespace followed by a symbol or vice versa. You may need to check for whitespace (or other separating characters such as punctuation) using lookaround assertions instead.
Here's how to fix these two problems:
Regex regex = new Regex(string.Format(#"(?<= |^)({0})(?= |$)",
string.Join("|", ignoreList.Select(x => Regex.Escape(x)).ToArray())));
If it's a short string as in your example, you can just loop though the strings and replace one at a time. If you want to get fancy you can use the LINQ Aggregate method to do it:
address = ignoreList.Aggregate(address, (a, s) => a.Replace(s, String.Empty));
If it's a large string, that would be slow. Instead you can replace all strings in a single run through the string, which is much faster. I made a method for that in this answer.
LINQ makes this easy and readable. This requires normalized data though, particularly in that it is case-sensitive.
List<string> ignoreList = new List<string>()
{
"North",
"South",
"East",
"West"
};
string s = "123 West 5th St"
.Split(' ') // Separate the words to an array
.ToList() // Convert array to TList<>
.Except(ignoreList) // Remove ignored keywords
.Aggregate((s1, s2) => s1 + " " + s2); // Reconstruct the string
Why not juts Keep It Simple ?
public static string Trim(string text)
{
var rv = text.trim();
foreach (var ignore in ignoreList) {
if(tv.EndsWith(ignore) {
rv = rv.Replace(ignore, string.Empty);
}
}
return rv;
}
You can do this using and expression if you like, but it's easier to turn it around than using a Aggregate. I would do something like this:
string s = "14th Avenue North"
ignoreList.ForEach(i => s = s.Replace(i, ""));
//result is "14th Avenue "
public static string Trim(string text)
{
var rv = text;
foreach (var ignore in ignoreList)
rv = rv.Replace(ignore, "");
return rv;
}
Updated For Gabe
public static string Trim(string text)
{
var rv = "";
var words = text.Split(" ");
foreach (var word in words)
{
var present = false;
foreach (var ignore in ignoreList)
if (word == ignore)
present = true;
if (!present)
rv += word;
}
return rv;
}
If you have a list, I think you're going to have to touch all the items. You could create a massive RegEx with all your ignore keywords and replace to String.Empty.
Here's a start:
(^|\s+)(North|South|East|West){1,2}(ern)?(\s+|$)
If you have a single RegEx for ignore words, you can do a single replace for each phrase you want to pass to the algorithm.

Categories

Resources