Split two strings into List<string> and compare using Linq - c#

I have a list of objects with a string property named "Color". I need to split the string into a list using a space delimiter and compare the list to another list to see if any of the contained strings match using Linq.
string searchString = "I like sand";
List<string> searches = searchString.Split(' ').ToList();
//Determine if matches exists anywhere between the 2 strings using linq
bool hasMatch = myObjectList.Where(x => searches.Any(a => x.Color.Contains(a))).Any();
Using my current Linq query, I can only find exact matches. Say one of my objects' Color properties equals "sand": the query returns a match. But if Color is a two-word name like "sand dune", the query does not return a match.
The following examples show what should and should not count as a match.
//Two strings should return a match as the word sand is in both
"I like sand"
"sand dune"
//Two strings should NOT return a match as no common words exist
"I like sand"
"Ice cream"
Any help is appreciated.

Try splitting both strings, then use LINQ's Intersect() to get the words that appear in both strings and Any() to check whether that intersection is non-empty:
var first = "I like sand";
var second = "sand dune";
var result = first.Split(' ').Intersect(second.Split(' ')).Any();

I would suggest splitting on null instead of a space character; that way you split on all whitespace. You can also extract this into a function:
private static bool CompareStrings(string a, string b)
{
    return a.Split(null).Intersect(b.Split(null)).Any();
}
Then you can just call it like this:
bool result = CompareStrings("I like sand", "sand dune");
bool result2 = CompareStrings("I like sand", "Ice cream");
Keep in mind this solution will be case sensitive, so Sand and sand would not be a match.
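If case should not matter, one option is to pass a comparer to Intersect. The following is a minimal sketch, not part of the original answer; the class name WordOverlap and the method name SharesWord are made up for illustration, and ordinal (culture-independent) comparison is an assumption that may not suit every input.

```csharp
using System;
using System.Linq;

public static class WordOverlap
{
    // Returns true when the two strings share at least one whole word, ignoring case.
    public static bool SharesWord(string a, string b)
    {
        // A null separator array splits on any whitespace; empty entries are dropped
        // so that repeated spaces do not produce "" words that would match each other.
        var wordsA = a.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        var wordsB = b.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

        // The comparer makes "Sand" and "sand" count as the same word.
        return wordsA.Intersect(wordsB, StringComparer.OrdinalIgnoreCase).Any();
    }
}
```

With this sketch, WordOverlap.SharesWord("I like sand", "Sand dune") is true and WordOverlap.SharesWord("I like sand", "Ice cream") is false.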

Related

Check array for string that starts with given one (ignoring case)

I am trying to see if my string starts with a string in an array of strings I've created. Here is my code:
string x = "Table a";
string y = "a table";
string[] arr = { "table", "chair", "plate" };
if (arr.Contains(x.ToLower())){
// this should be true
}
if (arr.Contains(y.ToLower())){
// this should be false
}
How can I make it so my if statement comes up true? I'd like to just match the beginning of string x to the contents of the array while ignoring the case and the following characters. I thought I needed regex to do this, but I could be mistaken. I'm a bit of a newbie with regex.
It seems you want to check if your string contains an element from your list, so this should be what you are looking for:
if (arr.Any(c => x.ToLower().Contains(c)))
Or simpler:
if (arr.Any(x.ToLower().Contains))
Or based on your comments you may use this:
if (arr.Any(x.ToLower().Split(' ')[0].Contains))
Because you said you want regex:
you can define one as var regex = new Regex("(table|plate|fork)");
and check with if (regex.IsMatch(myString)) { ... }
But for the issue at hand you don't have to use Regex, as you are searching for an exact substring. You can use
(as @S.Akbari mentioned): if (arr.Any(c => x.ToLower().Contains(c))) { ... }
Enumerable.Contains matches exact values (and there is no built-in comparer that checks for "starts with"), so you need Any, which takes a predicate that receives each array element as a parameter and performs the check. The first step is to turn "contains" the other way around, so that the given string contains an element from the array:
var myString = "some string";
if (arr.Any(arrayItem => myString.Contains(arrayItem)))...
Now, you are actually asking for "string starts with given word" and not just contains, so you need StartsWith (which, unlike Contains, conveniently lets you specify case sensitivity; see Case insensitive 'Contains(string)'):
if (arr.Any(arrayItem => myString.StartsWith(
    arrayItem, StringComparison.CurrentCultureIgnoreCase))) ...
Note that this code will accept "tableAAA bob". If you really need to break on a word boundary, a regular expression may be a better choice. Building regular expressions dynamically is trivial as long as you properly escape all the values.
The regex should consist of:
beginning of string - ^
properly escaped word you are searching for - Escape Special Character in Regex
word break - \b
if (arr.Any(arrayItem => Regex.IsMatch(myString,
    String.Format(@"^{0}\b", Regex.Escape(arrayItem)),
    RegexOptions.IgnoreCase))) ...
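The three regex parts listed above can also be combined into a single pattern for the whole array instead of one pattern per element. This is only a sketch under that assumption; PrefixWordMatcher and StartsWithAnyWord are illustrative names, not from the original answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class PrefixWordMatcher
{
    // Returns true when input begins with one of the words, followed by a word boundary.
    public static bool StartsWithAnyWord(string input, string[] words)
    {
        // Escape each word so characters such as '.' or '+' are matched literally.
        string alternatives = string.Join("|", words.Select(w => Regex.Escape(w)));

        // ^ anchors at the start of the string, \b requires a word break after the word.
        string pattern = "^(?:" + alternatives + @")\b";

        return Regex.IsMatch(input, pattern, RegexOptions.IgnoreCase);
    }
}
```

With arr = { "table", "chair", "plate" }, this accepts "Table a" and rejects both "a table" and "tableAAA bob". Note that \b only behaves as a word break here when the searched words end in a letter, digit, or underscore.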
You can do something like the following in TypeScript. Instead of startsWith you can also use includes, an equality check, etc.
public namesList: Array<string> = ['name1','name2','name3','name4','name5'];
// SomeString = 'name1, Hello there';
private isNamePresent(SomeString: string): boolean {
    if (this.namesList.find(name => SomeString.startsWith(name)))
        return true;
    return false;
}
I think I understand what you are trying to say here, although there is still some ambiguity. Are you trying to see if one word in your string (which is a sentence) exists in your array?
@Amy is correct, this might not have to do with Regex at all.
I think this segment of code will do what you want in Java (which can easily be translated to C#):
Java:
x = x.toLowerCase();
String[] words = x.split("\\s+");   // split on any run of whitespace
for (String word : words) {
    for (String element : arr) {
        if (element.equals(word)) {
            return true;
        }
    }
}
return false;
You can also use a Set to store the elements in your array, which can make lookup more efficient.
Java:
x = x.toLowerCase();
String[] words = x.split("\\s+");
Set<String> set = new HashSet<>(Arrays.asList(arr));   // needs java.util.*
for (String word : words) {
    if (set.contains(word)) {
        return true;
    }
}
return false;
Edit: (12/22, 11:05am)
I rewrote my solution in C#, thanks to reminders by @Amy and @JohnyL. Since the author only wants to match the first word of the string, this edited code should work :)
C#:
static bool contains(string x, string[] arr){
    x = x.ToLower();
    string[] words = x.Split(' ');
    var set = new HashSet<string>(arr);
    // Only the first word of the input is checked against the array.
    if(set.Contains(words[0])){
        return true;
    }
    return false;
}
Sorry my question was so vague, but here is the solution, thanks to help from a few people who answered.
var regex = new Regex("^(table|chair|plate) *.*");
if (regex.IsMatch(x.ToLower())){}

Find only equal words in list exist in string

I have several lists, each containing about 2000-3000 words:
var list1 = new List<string> {"able", "adorable", "adventurous", ...};
I want to find out whether string inputStr = "do, dream"; contains any value from the list. To do that, I split the input with string[] words = inputStr.Split(' '); and loop with foreach (string word in words), checking if (list1.Any(word.Contains)).
I'm not sure whether it is because I use a list or because Contains is not the right search method for this case, but the results include words that are not equal to the words in the input string and merely contain them as a part, for example for the word "do" or the word "dream":
(do) adorable, doubt, fully, do, doh, freedom, down, double
(dream) dreamily, dream
I'm not sure how to avoid this; maybe it is better to use a Dictionary or SortedDictionary if the list is the problem. I get the same result if I check it this way: var val1 = list1.FirstOrDefault(stringToCheck => stringToCheck.Contains(word)); Different searches seem to give the same results with a list: all words that contain the input words as a part. The desired result is to find only equal words:
(do) do
(dream) dream
IndexOf() method will get you the index of any equivalent strings within the collection.
You could also do something like this with LINQ:
list.Any(x => x == "testString");
To find the words that contain your "word", you should use Linq:
// (do) adorable, doubt, fully, do, doh, freedom, down, double
var result = list1.Where(word => word.Contains("do"));
But if you're trying to get only the words that match fully:
var result = list1.Where(word => word.Equals("do"));
Combining this with your input list:
var result = input.SelectMany(x => list1.Where(w => w.Equals(x)));
You can get it done with a single Linq line:
List<string> list1 = new List<string> { "able", "adorable", "adventurous" };
string inputstr = "the adorable adventurous cat";
var found_words = inputstr.Split(' ').Where(word => list1.Contains(word)).ToList();
// found_words[0] = "adorable"
// found_words[1] = "adventurous"
if (list1.Contains(word))
will only match whole, exact strings in the list.
In that case, you should make list1 a HashSet instead; it will have much better performance.
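A rough sketch of the HashSet approach follows. ExactWordFinder and FindKnownWords are hypothetical names, and splitting on both spaces and commas is an assumption made here so that an input like "do, dream" yields "do" rather than "do,".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExactWordFinder
{
    // Returns the words of input that are exactly equal to an entry in knownWords.
    public static List<string> FindKnownWords(string input, IEnumerable<string> knownWords)
    {
        // Build the set once; each lookup is then a hash probe instead of a list scan.
        var lookup = new HashSet<string>(knownWords, StringComparer.Ordinal);

        return input
            .Split(new[] { ' ', ',' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(word => lookup.Contains(word))   // whole-word equality, no substring match
            .ToList();
    }
}
```

For a list containing "adorable", "do", "doubt", "dream", "dreamily" and "freedom", the input "do, dream" gives only "do" and "dream". If the same word list is queried many times, the set would be better built once outside the method.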
Linq is still your best bet. Assuming you want case sensitivity but don't want to observe hanging whitespace:
public string Foo(string input, List<string> list)
{
    return list.FirstOrDefault(t => t.Trim() == input.Trim());
}
I personally prefer comparing strings with == over Equals most of the time, though for string comparisons you may want to specify the culture as necessary.

Match all of desired characters in string even when they are not next to each other

I am iterating through a list of words and I need to find words that contain ALL of the desired characters. I know how to find a substring, but that only finds words that have the characters next to each other. I want to create something that determines whether the string contains all of the characters, even when they are not next to each other.
For example, if I have the string "ent", words in the list like "element", "nintendo", and "telephone" would show up.
I currently have this logic:
String textLine = "element";
Regex regX = new Regex("e|n|t");
bool containsAny = regX.IsMatch(textLine);
This currently returns true if ANY of the characters exist in the string. I want to create a Regex (or anything else) that will find words that match ALL desired characters. I'm writing this in C#.
Thanks!
You can use Linq
var desiredChars = "ent";
var word = "element";
bool contains = desiredChars.All(word.Contains);
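Applied to the whole word list from the question, the same idea might look like the sketch below. LetterFilter and WordsContainingAll are illustrative names; the explicit lambda with IndexOf is just one way to spell the per-character check.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LetterFilter
{
    // Keeps the words that contain every character of requiredChars, in any position.
    public static List<string> WordsContainingAll(IEnumerable<string> words, string requiredChars)
    {
        return words
            .Where(word => requiredChars.All(c => word.IndexOf(c) >= 0))
            .ToList();
    }
}
```

For the list "element", "nintendo", "telephone", "tea", "dog" and the characters "ent", the first three words are kept. The check is case-sensitive and does not count repeated characters, so "ee" is satisfied by a word with a single e.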

How can I compare a string to a "filter" list in linq?

I'm trying to filter a collection of strings by a "filter" list, that is, a list of bad words. If a string contains a word from the list, I don't want it.
I've gotten this far; the bad word here is "frakk":
string[] filter = { "bad", "words", "frakk" };
string[] foo =
{
"this is a lol string that is allowed",
"this is another lol frakk string that is not allowed!"
};
var items = from item in foo
where (item.IndexOf( (from f in filter select f).ToString() ) == 0)
select item;
But this isn't working. Why?
You can use Any + Contains:
var items = foo.Where(s => !filter.Any(w => s.Contains(w)));
If you want to compare case-insensitively:
var items = foo.Where(s => !filter.Any(w => s.IndexOf(w, StringComparison.OrdinalIgnoreCase) >= 0));
Update: If you want to exclude sentences where at least one word is in the filter list, you can use String.Split() and Enumerable.Intersect:
var items = foo.Where(sentence => !sentence.Split().Intersect(filter).Any());
Enumerable.Intersect is very efficient since it uses a Set under the hood. It's more efficient to put the long sequence first. Due to Linq's deferred execution, it stops on the first matching word.
(Note that the "empty" Split includes other white-space characters like tab or newline.)
The first problem you need to solve is breaking up the sentence into a series of words. The simplest way to do this is based on spaces
string[] words = sentence.Split(new char[] {' '}, StringSplitOptions.RemoveEmptyEntries);
From there you can use a simple LINQ expression to find the profanities
var badWords = words.Where(x => filter.Contains(x));
However, this is a bit of a primitive solution. It won't handle a number of complex cases that you likely need to think about:
There are many characters which qualify as a space. My solution only uses ' '.
The split doesn't handle punctuation, so dog! won't be viewed as dog. It is probably much better to break up words on legal characters.
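One way to address both caveats is to split on runs of non-word characters rather than on ' ' alone. The sketch below assumes that the regex class \W is an acceptable definition of "not part of a word" and that matching should ignore case; BadWordFilter, ContainsBadWord and KeepClean are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class BadWordFilter
{
    // True when any whole word of the sentence is in badWords.
    public static bool ContainsBadWord(string sentence, ISet<string> badWords)
    {
        // \W+ splits on any run of non-word characters: spaces, tabs, punctuation.
        return Regex.Split(sentence, @"\W+")
            .Where(token => token.Length > 0)   // leading/trailing separators yield ""
            .Any(token => badWords.Contains(token));
    }

    // Returns the sentences that contain none of the filter words.
    public static List<string> KeepClean(IEnumerable<string> sentences, IEnumerable<string> filter)
    {
        var badWords = new HashSet<string>(filter, StringComparer.OrdinalIgnoreCase);
        return sentences.Where(s => !ContainsBadWord(s, badWords)).ToList();
    }
}
```

With the filter { "bad", "words", "frakk" }, a sentence containing "Frakk!" is dropped, while a sentence containing "badge" is kept because only whole words are compared.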
The reason your initial attempt didn't work is that this line:
(from f in filter select f).ToString()
evaluates to a string of the Array Iterator type name that's implied by the Linq expression portion. So you're actually comparing the characters of the following string:
System.Linq.Enumerable+WhereSelectArrayIterator`2[System.String,System.String]
rather than the words of the filter when examining your phrases.

Multiple pattern matching using RegEx

I'm trying to use RegEx to split a string into several objects. Each record is separated by a :, and each field is separated by a ~.
So sample data would look like:
:1~Name1:2~Name2:3~Name3
The RegEx I have so far is
:(?<id>\d+)~(?<name>.+)
This however will only match the first record, when really I would expect 3. How do I get the RegEx to return all matches rather than just the first?
Your last .+ is greedy, so it gobbles up the Name1 as well as the rest of the string.
Try
:(?<id>\d+)~(?<name>[^:]+)
This means that the Name can't have a : in it (which is probably OK for your data), and makes sure the name doesn't grab into the next field.
(And also use the Regex.Matches method which grabs all matches, not just the first).
Use:
var result = Regex.Matches(input, @":(?<id>\d+)~(?<name>[^:]+)").Cast<Match>()
    .Select(m => new
    {
        Id = m.Groups["id"].Value,
        Name = m.Groups["name"].Value
    });
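You are better off using the split() method for strings (Java shown here):
String[] records = myString.split(":");
for (String rec : records)
{
    if (rec.isEmpty()) continue;   // the data starts with ':', so the first entry is empty
    String[] fields = rec.split("~");
    //use fields
}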
