Find only words in a list that exactly match words in a string - C#

I have several lists, each containing about 2000-3000 words:
var list1 = new List<string> {"able", "adorable", "adventurous", ...};
Given string inputStr = "do, dream"; I want to check whether it contains any value from the list. I split the string with string[] words = inputStr.Split(' '); and test each word with foreach (string word in words) and if (list1.Any(word.Contains)).
I'm not sure whether the problem is that I use a list or that Contains is the wrong method for this case, but the result includes words that are not equal to the words in the input string and merely contain them as a substring. For example, for the words "do" and "dream":
(do) adorable, doubt, fully, do, doh, freedom, down, double
(dream) dreamily, dream
I'm not sure how to avoid this; maybe a Dictionary or SortedDictionary would be better if the list is the problem. I get the same result if I check it this way: var val1 = list1.FirstOrDefault(stringToCheck => stringToCheck.Contains(word)); It seems every search variant gives me the same result with the list: all words that contain a word from the input string as a substring. The desired result is to find only equal words:
(do) do
(dream) dream

The IndexOf() method will give you the index of the first exactly matching string in the collection (or -1 if there is none).
You could also do something like this with LINQ:
list.Any(x => x == "testString");

To find the words that contain your "word" as a substring, use LINQ's Where:
// (do) adorable, doubt, fully, do, doh, freedom, down, double
var result = list1.Where(word => word.Contains("do"));
But if you're trying to get only the words that match fully:
var result = list1.Where(word => word.Equals("do"));
Combining this with your list of input words:
var result = input.SelectMany(x => list1.Where(w => w.Equals(x)));

You can get it done with a single LINQ line:
List<string> list1 = new List<string> { "able", "adorable", "adventurous" };
string inputstr = "the adorable adventurous cat";
var found_words = inputstr.Split(' ').Where(word => list1.Contains(word)).ToList();
// found_words[0] = "adorable"
// found_words[1] = "adventurous"

if (list1.Contains(word))
will only match whole, exact strings in the list.
But in that case, you should make list1 a HashSet<string> instead; lookups will perform much better.
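
As a rough sketch of the HashSet suggestion, assuming the input words are separated by spaces and/or commas (as in "do, dream"): the class and method names here (ExactWordMatcher, FindExactMatches) are made up for illustration and are not from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExactWordMatcher
{
    // Returns the words of inputStr that appear in wordList as whole entries.
    // Assumption: words in inputStr are separated by spaces and/or commas.
    public static List<string> FindExactMatches(string inputStr, IEnumerable<string> wordList)
    {
        // A HashSet gives O(1) average lookups instead of scanning the list per word.
        var set = new HashSet<string>(wordList);

        return inputStr
            .Split(new[] { ' ', ',' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(word => set.Contains(word))   // exact, case-sensitive match
            .ToList();
    }
}
```

With a list containing "adorable", "do", "doubt", "dream" and "dreamily", the input "do, dream" should yield only "do" and "dream". If the set is reused for many inputs, it would make sense to build it once rather than on every call.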

LINQ is still your best bet. Assuming you want case sensitivity but want to ignore leading and trailing whitespace:
public string Foo(string input, List<string> list)
{
    // Returns the first list entry equal to the input after trimming, or null if none matches.
    return list.FirstOrDefault(t => t.Trim() == input.Trim());
}
I personally prefer comparing strings by value with == over using Equals most of the time, though for string comparisons you may want to narrow down the culture as necessary.

Related

Split two strings into List<string> and compare using Linq

I have a list of objects with a string property named "Color". I need to split the string into a list using a space delimiter and compare the list to another list to see if any of the contained strings match using Linq.
string searchString = "I like sand";
List<string> searches = searchString.Split(' ').ToList();
//Determine if matches exists anywhere between the 2 strings using linq
List<myObject> obj = myObjectList.Where(x => searches.Any(a => x.Color.Contains(a))).ToList();
With my current LINQ query, I can only find exact matches. Say one of my objects' Color properties equals "sand": the query returns a match. But if Color is a two-word name like "sand dune", the query does not return a match.
This example should kind of help explain what needs to return as a match.
//Two strings should return a match as the word sand is in both
"I like sand"
"sand dune"
//Two strings should NOT return a match as no common words exist
"I like sand"
"Ice cream"
Any help is appreciated.
Try splitting both strings, then use LINQ's Intersect() to get the words that are in both strings and Any() to check whether there is such an intersection:
var first = "I like sand";
var second = "sand dune";
var result = first.Split(' ').Intersect(second.Split(' ')).Any();
I would suggest splitting on null instead of a single space character; that way you split on all whitespace. You can also extract this into a function:
private static bool CompareStrings(string a, string b)
{
return a.Split(null).Intersect(b.Split(null)).Any();
}
Then you can just call it like this:
bool result = CompareStrings("I like sand", "sand dune");
bool result2 = CompareStrings("I like sand", "Ice cream");
Keep in mind this solution will be case sensitive, so Sand and sand would not be a match.
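
If case should be ignored, one possible variation is to pass a comparer to Intersect. This is only a sketch; the names WordOverlap and ShareWord are hypothetical, and the choice of StringComparer.OrdinalIgnoreCase is an assumption (a culture-aware comparer may fit better for some text).

```csharp
using System;
using System.Linq;

public static class WordOverlap
{
    // True when the two strings share at least one whole word, ignoring case.
    public static bool ShareWord(string a, string b)
    {
        // A null char[] separator makes Split break on any whitespace;
        // RemoveEmptyEntries drops empty strings caused by repeated spaces.
        var wordsA = a.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        var wordsB = b.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

        return wordsA.Intersect(wordsB, StringComparer.OrdinalIgnoreCase).Any();
    }
}
```

Here ShareWord("I like Sand", "sand dune") is expected to be true, while ShareWord("I like sand", "Ice cream") stays false because "I" and "Ice" are different words.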

Check array for string that starts with given one (ignoring case)

I am trying to see if my string starts with a string in an array of strings I've created. Here is my code:
string x = "Table a";
string y = "a table";
string[] arr = { "table", "chair", "plate" };
if (arr.Contains(x.ToLower())){
// this should be true
}
if (arr.Contains(y.ToLower())){
// this should be false
}
How can I make it so my first if statement comes up true? I'd like to match just the beginning of string x against the contents of the array while ignoring the case and the characters that follow. I thought I needed regex to do this, but I could be mistaken. I'm a bit of a newbie with regex.
It seems you want to check if your string contains an element from your list, so this should be what you are looking for:
if (arr.Any(c => x.ToLower().Contains(c)))
Or simpler:
if (arr.Any(x.ToLower().Contains))
Or based on your comments you may use this:
if (arr.Any(x.ToLower().Split(' ')[0].Contains))
Because you said you want regex:
you can define a regex with var regex = new Regex("(table|plate|fork)");
and check it with if (regex.IsMatch(myString)) { ... }
But for the issue at hand you don't have to use Regex, as you are searching for an exact substring. You can use the following
(as @S.Akbari mentioned): if (arr.Any(c => x.ToLower().Contains(c))) { ... }
Enumerable.Contains matches exact values (and there is no built-in comparer that checks for "starts with"), so you need Any, which takes a predicate that receives each array element as a parameter and performs the check. So the first step is to turn "contains" the other way around, so that the given string contains an element from the array:
var myString = "some string";
if (arr.Any(arrayItem => myString.Contains(arrayItem)))...
Now, you are actually asking for "string starts with given word" and not just "contains", so you need StartsWith (which conveniently allows you to specify case sensitivity, unlike Contains; see Case insensitive 'Contains(string)'):
if (arr.Any(arrayItem => myString.StartsWith(
    arrayItem, StringComparison.CurrentCultureIgnoreCase))) ...
Note that this code will accept "tableAAA bob". If you really need to break on a word boundary, a regular expression may be a better choice. Building regular expressions dynamically is trivial as long as you properly escape all the values.
The regex should consist of:
beginning of string - ^
properly escaped word you are searching for - Escape Special Character in Regex
word break - \b
if (arr.Any(arrayItem => Regex.IsMatch(myString,
    String.Format(@"^{0}\b", Regex.Escape(arrayItem)),
    RegexOptions.IgnoreCase))) ...
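
The three pieces listed above (start anchor, escaped word, word break) can also be combined into one pattern for the whole array. The following is a sketch under that assumption; PrefixWordMatcher and StartsWithAnyWord are hypothetical names, and the \b check assumes every array entry ends in a word character.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class PrefixWordMatcher
{
    // True when text begins with one of the words, followed by a word boundary.
    public static bool StartsWithAnyWord(string text, string[] words)
    {
        // Escape each word so regex metacharacters are treated literally,
        // then join them into a single alternation: ^(?:table|chair|plate)\b
        var alternatives = string.Join("|", words.Select(w => Regex.Escape(w)));
        var pattern = @"^(?:" + alternatives + @")\b";

        return Regex.IsMatch(text, pattern, RegexOptions.IgnoreCase);
    }
}
```

With the array from the question, "Table a" should match, while "a table" and "tableAAA bob" should not.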
You can do something like the following in TypeScript. Instead of startsWith you can also use includes, a strict equality check, etc.
public namesList: Array<string> = ['name1','name2','name3','name4','name5'];
// SomeString = 'name1, Hello there';
private isNamePresent(SomeString : string):boolean{
if (this.namesList.find(name => SomeString.startsWith(name)))
return true;
return false;
}
I think I understand what you are trying to say here, although there is still some ambiguity. Are you trying to see if one word in your string (which is a sentence) exists in your array?
@Amy is correct, this might not have to do with Regex at all.
I think this segment of code will do what you want in Java (which can easily be translated to C#):
Java:
x = x.toLowerCase();
String[] words = x.split("\\s+");
for (String word : words) {
    for (String element : arr) {
        // exact match between a word of the sentence and an array element
        if (element.equals(word)) {
            return true;
        }
    }
}
return false;
You can also use a Set to store the elements of your array, which can make lookup more efficient.
Java:
x = x.toLowerCase();
String[] words = x.split("\\s+");
// requires java.util.Arrays, java.util.HashSet and java.util.Set
Set<String> set = new HashSet<>(Arrays.asList(arr));
for (String word : words) {
    if (set.contains(word)) {
        return true;
    }
}
return false;
Edit: (12/22, 11:05am)
I rewrote my solution in C#, thanks to reminders by @Amy and @JohnyL. Since the author only wants to match the first word of the string, this edited code should work :)
C#:
static bool Contains(string x, string[] arr)
{
    x = x.ToLower();
    string[] words = x.Split(' ');
    var set = new HashSet<string>(arr);
    // only the first word of the string is checked
    return set.Contains(words[0]);
}
Sorry my question was so vague but here is the solution thanks to some help from a few people that answered.
var regex = new Regex("^(table|chair|plate) *.*");
if (regex.IsMatch(x.ToLower())){}

What is a better-performing alternative to the following Regex?

I have a list and a text file, and I want to:
Find all list items that are also in the string (matched words) and store them in a list or array
Replace all the matched words with "Names"
Count the matched words
The code works, but it takes about 10 minutes to execute, and I want to improve its performance. I have also tried using Contains instead of the regex, but that breaks the behavior, because I am trying to match full words, not substrings.
Here is the code:
List<string> names = new List<string>();
// names = millions of values from the database
string text = System.IO.File.ReadAllText(@"D:\record-13.txt");
var letter = new Regex(@"(?<letter>\W)");
var pattern = string.Join("|", names
    .Select(n => $@"((?<=(^|\W)){letter.Replace(n, "[${letter}]")}(?=($|\W)))"));
var regex = new Regex(pattern);
var matchedWords = regex
    .Matches(text)
    .Cast<Match>()
    .Select(m => m.Value)
    //.Distinct()
    .ToList();
text = regex.Replace(text, "Names");
Console.WriteLine($"Matched Words: {string.Join(", ", matchedWords.Distinct())}");
Console.WriteLine($"Count: {matchedWords.Count}");
Console.WriteLine($"Replaced Text: {text}");
Is there an alternate way to do the same thing as the above code do, with improved performance?
What you are doing is building a regular expression with "millions" of strings embedded in it, if Names really contains "millions" of strings. This is going to perform very poorly, even just to parse the regular expression, let alone evaluate it.
What you should do instead is load your names into a HashSet<string>, then parse through the document once, pulling out whole words. You can use a regular expression or write a state machine to do this. For each "word" you read, check whether it exists in the HashSet<string> of names, and if so, write "Names" to your output (a StringBuilder would be ideal for this). If the word is not in the hash set, write the actual word to your output. Be sure to also write any non-word characters to the output as you encounter them. When you are done, your output will contain the sanitized result, and it should complete in milliseconds rather than minutes.
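
A minimal sketch of that single-pass idea follows. Note that it swaps the hand-written StringBuilder loop for Regex.Replace with a MatchEvaluator, which leaves non-word characters untouched on its own. The names NameRedactor and Redact are hypothetical, and treating \w+ as a "word" is an assumption: names containing spaces or punctuation would need a different tokenizer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NameRedactor
{
    // Replaces every whole word found in names with "Names" in one pass over text.
    // matchedWords receives each matched occurrence, in document order.
    public static string Redact(string text, HashSet<string> names, List<string> matchedWords)
    {
        // The pattern is tiny and fixed; the (possibly huge) name list is only
        // consulted through O(1) average hash lookups.
        return Regex.Replace(text, @"\w+", m =>
        {
            if (!names.Contains(m.Value))
                return m.Value;          // not a name: keep the original word

            matchedWords.Add(m.Value);   // record the occurrence for counting
            return "Names";
        });
    }
}
```

After the call, matchedWords.Count gives the number of replaced occurrences, and matchedWords.Distinct() the set of names that were found.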
If I understand what you really want, I think you can use this code instead.
If you can ignore the resulting matched words and count:
text = names.Select(name => $@"\b{Regex.Escape(name)}\b")
    .Aggregate(text, (current, pattern) => Regex.Replace(current, pattern, "Names"));
Otherwise:
var count = 0;
var matchedWord = new List<string>();
foreach (var name in names)
{
    // Regex.Escape keeps names with special characters from breaking the pattern
    var regex = new Regex($@"\b{Regex.Escape(name)}\b");
    if (regex.IsMatch(text))
    {
        count++; // counts distinct names found, not individual occurrences
        matchedWord.Add(name);
    }
    text = regex.Replace(text, "Names");
}

How can I compare a string to a "filter" list in linq?

I'm trying to filter a collection of strings by a "filter" list, that is, a list of bad words. If a string contains a word from the list, I don't want it.
This is how far I've gotten; the bad word here is "frakk":
string[] filter = { "bad", "words", "frakk" };
string[] foo =
{
"this is a lol string that is allowed",
"this is another lol frakk string that is not allowed!"
};
var items = from item in foo
where (item.IndexOf( (from f in filter select f).ToString() ) == 0)
select item;
But this isn't working. Why?
You can use Any + Contains:
var items = foo.Where(s => !filter.Any(w => s.Contains(w)));
If you want to compare case-insensitively:
var items = foo.Where(s => !filter.Any(w => s.IndexOf(w, StringComparison.OrdinalIgnoreCase) >= 0));
Update: If you want to exclude sentences where at least one word is in the filter list, you can use String.Split() and Enumerable.Intersect:
var items = foo.Where(sentence => !sentence.Split().Intersect(filter).Any());
Enumerable.Intersect is very efficient since it uses a set under the hood. It's more efficient to put the long sequence first. Due to LINQ's deferred execution, it stops at the first matching word.
(Note that the "empty" Split also splits on other white-space characters like tab or newline.)
The first problem you need to solve is breaking up the sentence into a series of words. The simplest way to do this is to split on spaces:
string[] words = sentence.Split(new char[] {' '}, StringSplitOptions.RemoveEmptyEntries);
From there you can use a simple LINQ expression to find the profanities:
var badWords = words.Where(x => filter.Contains(x));
However, this is a fairly primitive solution. It won't handle a number of more complex cases that you likely need to think about:
There are many characters that qualify as whitespace. My solution only uses ' '.
The split doesn't handle punctuation, so dog! won't be viewed as dog. It is probably much better to break up words on legal characters.
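
One way to address both caveats is to extract runs of word characters instead of splitting on spaces. This is only a sketch: BadWordFilter and RemoveOffending are hypothetical names, \w+ is an assumed definition of "legal characters", and the case-insensitive comparer is an extra assumption not stated in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class BadWordFilter
{
    // Keeps only the sentences that contain none of the filter words as whole words.
    public static List<string> RemoveOffending(IEnumerable<string> sentences, IEnumerable<string> filter)
    {
        var banned = new HashSet<string>(filter, StringComparer.OrdinalIgnoreCase);

        return sentences
            // \w+ yields each run of letters, digits and underscores, so
            // punctuation such as '!' and any kind of whitespace is skipped.
            .Where(s => !Regex.Matches(s, @"\w+")
                              .Cast<Match>()
                              .Any(m => banned.Contains(m.Value)))
            .ToList();
    }
}
```

With this approach, a sentence ending in "frakk!" is rejected even though the word is followed by punctuation.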
The reason your initial attempt didn't work is that this line:
(from f in filter select f).ToString()
evaluates to the type name of the array iterator that the LINQ expression produces. So you're actually searching your phrases for the following string:
System.Linq.Enumerable+WhereSelectArrayIterator`2[System.String,System.String]
rather than for the words of the filter.

C# search into a string for a specific pattern, and put in an Array

I'm having the following string as an example:
<tr class="row_odd"><td>08:00</td><td>08:10</td><td>TEST1</td></tr><tr class="row_even"><td>08:10</td><td>08:15</td><td>TEST2</td></tr><tr class="row_odd"><td>08:15</td><td>08:20</td><td>TEST3</td></tr><tr class="row_even"><td>08:20</td><td>08:25</td><td>TEST4</td></tr><tr class="row_odd"><td>08:25</td><td>08:30</td><td>TEST5</td></tr>
I need to have the output as a one-dimensional array.
Like 11111=myArray(0) , 22222=myArray(1) , 33333=myArray(2) ,......
I have already tried myString.Replace, but it seems I can only replace a single char that way. So I need to use regular expressions and a for loop to fill the array, but since this is my first C# project, that is a bridge too far for me.
Thanks,
It seems like you want to use a Regex search pattern and then return the matches (using a named group) as an array.
var regex = new Regex(@"act=\?(?<Id>\d+)");
var ids = regex.Matches(input).Cast<Match>()
    .Select(m => m.Groups["Id"])
    .Where(g => g.Success)
    .Select(g => Int32.Parse(g.Value))
    .ToArray();
(PS. I'm not positive about the regex pattern; you should check it yourself.)
There are several ways you could do this. A couple are:
a) Use a regular expression to look for what you want in the string. Use a named group so you can access the matches directly.
http://www.regular-expressions.info/dotnet.html
b) Split the expression at the location where your substrings are (e.g. split on "act="). You'll have to do a bit more parsing to get what you want, but that won't be too difficult, since it will be at the beginning of each split string (and you will have other strings that don't have your substring in them).
Use a combination of IndexOf and Substring; something like this would work (I'm not sure how much your string varies). This will probably be quicker than any Regex you come up with, although, looking at the length of your string, it might not really be an issue.
public static List<string> GetList(string data)
{
data = data.Replace("\"", ""); // get rid of annoying "'s
string[] S = data.Split(new string[] { "act=" }, StringSplitOptions.None);
var results = new List<string>();
foreach (string s in S)
{
if (!s.Contains("<tr"))
{
string output = s.Substring(0, s.IndexOf(">"));
results.Add(output);
}
}
return results;
}
Split your string on HTML tags like "<tr>", "</tr>", "<td>", "</td>", "<a>", "</a>" with the string variable's Split() function. This gives you an array of strings.
Split html row into string array
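
For the sample string shown in the question, a rough sketch that pulls the text of every <td> cell into a one-dimensional array might look like this. It assumes simple, well-formed markup with plain <td> tags (no attributes, no nested cells); CellExtractor and GetCells are hypothetical names, and a real HTML parser would be more robust for anything beyond that.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CellExtractor
{
    // Returns the inner text of every <td>...</td> pair, in document order.
    public static string[] GetCells(string html)
    {
        // (.*?) is lazy, so each match stops at the nearest closing </td>.
        return Regex.Matches(html, @"<td>(.*?)</td>", RegexOptions.Singleline)
                    .Cast<Match>()
                    .Select(m => m.Groups[1].Value)
                    .ToArray();
    }
}
```

For the first row of the sample, the resulting array would hold "08:00", "08:10" and "TEST1" at indexes 0 to 2.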
