Find uncommon characters between two strings


I have the following code:
public static void Main(string[] args)
{
    string word1 = "AN";
    string word2 = "ANN";
    // First try:
    var intersect = word1.Intersect(word2);
    var unCommon1 = word1.Except(intersect).Union(word2.Except(intersect));
    // Second try:
    var unCommon = word1.Except(word2).Union(word2.Except(word1));
}
The result I am trying to get is N. I have tried a few approaches from online posts, but I can't figure it out. Is there a way to get the uncommon characters between two strings using LINQ?
The order of characters in the strings does not matter.
Here are a few more scenarios:
FOO & BAR will result in F,O,O,B,A,R.
ANN & NAN will result in an empty string.

Here's a straightforward LINQ query (with using System; and using System.Linq; in scope):
string word1 = "AN";
string word2 = "ANN";
// get all the characters in both strings
var group = string.Concat(word1, word2)
    // remove duplicates
    .Distinct()
    // count how many times each character appears in word1 and in word2,
    // take the difference, and repeat the character that many times
    .SelectMany(i => Enumerable.Repeat(i, Math.Abs(
        word1.Count(j => j == i) -
        word2.Count(j => j == i))));
// turn the sequence of chars back into a string ("N" for this input)
string result = string.Concat(group);
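The query above calls Count on both strings once for every distinct character. As a sketch of a single-pass alternative (not from the original answer; the UncommonChars class and Find method are hypothetical names), a Dictionary<char, int> can hold "count in word1 minus count in word2" for each character, and each character is then repeated by the absolute value of that difference, in the order it first appears in word1 + word2:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class UncommonChars
{
    // Returns each character repeated |count in a - count in b| times,
    // in the order the characters first appear in a + b.
    public static string Find(string a, string b)
    {
        var diff = new Dictionary<char, int>();
        foreach (char c in a)
        {
            diff.TryGetValue(c, out int n); // n is 0 if c has not been seen yet
            diff[c] = n + 1;
        }
        foreach (char c in b)
        {
            diff.TryGetValue(c, out int n);
            diff[c] = n - 1;
        }

        var result = new StringBuilder();
        foreach (char c in (a + b).Distinct())
            result.Append(c, Math.Abs(diff[c])); // append c that many times
        return result.ToString();
    }
}
```

Under these assumptions, Find("AN", "ANN") returns "N", Find("FOO", "BAR") returns "FOOBAR" and Find("ANN", "NAN") returns an empty string, matching the scenarios in the question.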

Related

Organizing a list of strings (sentences) by the number of times one word occurs in the sentence C#

I currently have a List<string> (searchResults) of sentences from a large piece of text (SentenceList) that contain the most popular word (mostPopular). This all works well: in the inner foreach loop I count how many times the word occurs in each sentence (the d++). However, I'm having trouble ordering the sentences in searchResults by that count, d.
List<string> searchResults = SentenceList.FindAll(s => s.Contains(mostPopular));
foreach (string i in searchResults)
{
    int d = 0;
    string[] T = i.Split(" ");
    foreach (string l in T)
    {
        if (l.Contains(mostPopular)) { d++; }
        else { continue; }
    }
    Console.WriteLine(i + d);
}
Any help would be greatly appreciated, as would any recommendations for improving the question to help me find an answer!
My overall goal is to find the sentence that has the most occurrences of the most popular word. I need it in an ordered list because I then want to select a number of the sentences depending on the value typed in by the user.
Many thanks

This is not very efficient, since every sentence is split into a new array just to count the matches. In any case, don't write the sorting yourself; use the library functions.
List<(string s, int c)> searchResults = SentenceList
    .Where(s => s.Contains(mostPopular))
    // FindAll materializes a list; Where does not
    .Select(s => (s, s.Split(" ").Count(word => word.Contains(mostPopular))))
    // select a tuple of the sentence and its number of matches
    .ToList(); // materialize only at the end
searchResults.Sort((a, b) => a.c.CompareTo(b.c));
// This lambda takes two tuples, a and b, and compares their counts.
// To invert the order (highest first), add a - (minus) after the =>
// If you just need the top one (for this you could keep the IEnumerable and remove ToList above):
(string s, int c) highest = default;
foreach (var tuple in searchResults)
{
    if (tuple.c > highest.c)
        highest = tuple;
}

You can do it using LINQ as follows:
string result = searchResults // the list of candidate sentences
    .Select(s => (count: s.Split(' ').Count(w => w == mostPopular), sentence: s))
    .OrderByDescending(e => e.count)
    .First() // throws InvalidOperationException if the list is empty
    .sentence;
This forms a tuple of the count and the sentence, sorts by the count, takes the top entry (or use Take(n) to get as many entries as you want), and then decomposes the tuple to get the sentence back.
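Since the stated goal is to take a user-chosen number of sentences, Take(n) can stand in for First(). The sketch below assumes whole-word, case-sensitive matching on single spaces (as in the answer above) and drops sentences that do not contain the word; the SentenceRanking class and TopSentences method are hypothetical names:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SentenceRanking
{
    // Returns up to n sentences, ordered by how often `word` occurs in each.
    public static List<string> TopSentences(IEnumerable<string> sentences, string word, int n)
    {
        return sentences
            .Select(s => (sentence: s, count: s.Split(' ').Count(w => w == word)))
            .Where(e => e.count > 0)         // keep only sentences containing the word
            .OrderByDescending(e => e.count) // highest count first
            .Take(n)                         // n would come from the user's input
            .Select(e => e.sentence)
            .ToList();
    }
}
```

OrderByDescending is a stable sort, so sentences with equal counts keep their original relative order.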

Not sure if this is what you are looking for, but the following counts the occurrences of the 'mostPopular' word in each string of the list and sorts the strings from highest to lowest count.
What I am doing is creating, for each string, a new object that holds the string and the number of occurrences of the popular word in it. Once you have these objects, you can use LINQ to do the ordering and then the printing.
List<string> SentenceList = new List<string>()
{
    "This contains 1 mostPopular",
    "This contains 2 mostPopular mostPopular",
    "This contains 4 mostPopular mostPopular mostPopular mostPopular",
    "This contains 3 mostPopular mostPopular mostPopular"
};
var listOfPopular = SentenceList.Select(x => new { str = x, count = x.Split(' ').Where(z => z.Equals("mostPopular")).Count() });
Console.WriteLine(string.Join(Environment.NewLine, listOfPopular.OrderByDescending(x => x.count).Select(x => x.str)));
// Prints
This contains 4 mostPopular mostPopular mostPopular mostPopular
This contains 3 mostPopular mostPopular mostPopular
This contains 2 mostPopular mostPopular
This contains 1 mostPopular

There are many ways to do this; however, there are also many ways to get it wrong, depending on your needs.
Maybe this is a job for a regex with word boundaries, \b:
The word boundary \b matches positions where one side is a word
character (usually a letter, digit or underscore) and the other side
is not a word character (for instance, it may be the beginning of the
string or a space character).
var mostPopular = "bob";
var sentenceList = new List<string>()
{
    "Bob is awesome, we like bob bobbing and stuff",
    "This is just some line",
    "bob is great",
    "I like bobbing"
};
// create a compiled, case-insensitive regex (requires using System.Text.RegularExpressions;)
// Regex.Escape guards against special regex characters in the word
var regex = new Regex($@"\b{Regex.Escape(mostPopular)}\b", RegexOptions.Compiled | RegexOptions.IgnoreCase);
// get the sentences that contain your favorite word
var searchResults = sentenceList.Where(s => regex.IsMatch(s));
// project the count and sentence to a tuple
var results = searchResults
    .Select(x => (Sentence: x, regex.Matches(x).Count))
    .OrderByDescending(x => x.Count); // order by the count
// print the results
foreach (var (sentence, count) in results)
    Console.WriteLine($"{count} : {sentence}");
Output
2 : Bob is awesome, we like bob bobbing and stuff
1 : bob is great
Note that "bob" is found only twice in the first sentence ("bobbing" is not counted) and not at all in the last one. This may or may not matter to you.

find repeated substring in a string

I have a substring:
string subString = "ABC";
Every time all three chars appear in an input, you get one point.
For example, if the input is:
"AABKM" = 0 points
"AAKLMBDC" = 1 point
"ABCC" = 1 point, because each of the three occurs at least once
"AAZBBCC" = 2 points, because ABC is repeated twice
etc.
The only solution I could come up with is
Regex.Matches(input, "[ABC]").Count
but it does not give me what I'm looking for.
Thanks

You could use a ternary (conditional) operator: first determine that all the characters are present in the string (otherwise return 0), then select only those characters, group by character, and return the minimum count among the groups:
For example:
string subString = "ABC";
var inputStrings = new[] {"AABKM", "AAKLMBDC", "ABCC", "AAZBBCC"};
foreach (var input in inputStrings)
{
    var result = subString.All(input.Contains)
        ? input
            .Where(subString.Contains)
            .GroupBy(c => c)
            .Min(g => g.Count())
        : 0;
    Console.WriteLine($"{input}: {result}");
}
Output
AABKM: 0
AAKLMBDC: 1
ABCC: 1
AAZBBCC: 2

It can be done in a single statement using LINQ. However, I am not very confident that this is a good solution:
string subString = "ABC";
string input = "AAZBBBCCC";
var arr = input.ToCharArray()
.Where(x => subString.Contains(x))
.GroupBy(x => x)
.OrderBy(a => a.Count())
.First()
.Count();
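The result is 2 because the letter A is present only two times. Note that if one of the characters is missing from the input entirely, it forms no group, so the minimum is taken over the other characters only (for "AABKM" this returns 1 rather than 0), and First() throws if none of the characters are present.
Let's try to explain the LINQ expression.
First, transform the input string into a sequence of chars, then take only the chars that are contained in the substring. Next, group these chars and order the groups by the number of occurrences. Finally, take the first group and read the count of chars in that group.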
Let's see if someone has a better solution.

Try this code:
string subString = "ABC";
var input = new[] { "AABKM", "AAKLMBDC", "ABCC", "AAZBBCC" };
foreach (var item in input)
{
    List<int> a = new List<int>();
    for (int i = 0; i < subString.Length; i++)
    {
        // count the occurrences of each character of subString in item
        a.Add(Regex.Matches(item, Regex.Escape(subString[i].ToString())).Count);
    }
    // the number of points is the smallest of those counts
    Console.WriteLine($"{item} : {a.Min()}");
}
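The same minimum-count idea can be written without Regex by counting each character of subString directly, which also gives 0 when a character is missing from the input. As an added assumption beyond the question, repeated letters in subString are handled by dividing by how many times each letter is required (so a pattern like "AAB" needs two A's per point). SubstringPoints and CountPoints are hypothetical names:

```csharp
using System.Linq;

public static class SubstringPoints
{
    // Points = how many complete sets of subString's characters can be
    // taken from input, ignoring order.
    public static int CountPoints(string input, string subString)
    {
        if (subString.Length == 0)
            return 0; // assumption: an empty pattern scores nothing
        return subString
            .GroupBy(c => c) // each distinct required character
            .Min(g => input.Count(x => x == g.Key) / g.Count());
    }
}
```

For the inputs in the question this gives 0, 1, 1 and 2, and 2 for "AAZBBBCCC".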

C# Find Like Strings In Array

Question:
I have an array of strings, and I am trying to find the closest match to a provided string. I have made a few attempts below, and I have also looked into other solutions such as Levenshtein distance, which seems to work only if all the strings are of similar length.
Expectation:
If I use "two are better" as the match string, it should match "Two are better than one".
Thought:
Would it help at all to break stringToMatch apart at the spaces and then check whether each of those parts is found in the current element of the array (arrayOfStrings[i])?
// Test array and string to search
string[] arrayOfStrings = new string[] { "A hot potato", "Two are better than one", "Best of both worlds", "Curiosity killed the cat", "Devil's Advocate", "It takes two to tango", "a twofer" };
string stringToMatch = "two are better";
// Contains attempt
List<string> likeNames = new List<string>();
for (int i = 0; i < arrayOfStrings.Count(); i++)
{
    if (arrayOfStrings[i].Contains(stringToMatch))
    {
        Console.WriteLine("Hit1");
        likeNames.Add(arrayOfStrings[i]);
    }
    if (stringToMatch.Contains(arrayOfStrings[i]))
    {
        Console.WriteLine("Hit2");
        likeNames.Add(arrayOfStrings[i]);
    }
}
// StringComparison attempt
var matches = arrayOfStrings.Where(s => s.Equals(stringToMatch, StringComparison.InvariantCultureIgnoreCase)).ToList();
// Display matched array items
Console.WriteLine("List likeNames");
likeNames.ForEach(Console.WriteLine);
Console.WriteLine("\n");
Console.WriteLine("var matches");
matches.ForEach(Console.WriteLine);

You can try the code below.
I split stringToMatch into a List<string> (toMatch) and then selected every string in arrayOfStrings that contains all the words in toMatch, ignoring case.
List<string> toMatch = stringToMatch.Split(' ').ToList();
List<string> match = arrayOfStrings
    .Where(x => toMatch.All(ele => x.ToLower().Contains(ele.ToLower())))
    .ToList();

For your implementation, I split stringToMatch into words and then counted how many of them each string contains.
The code below gives you the strings with their match counts, ordered from the highest count down.
string[] arrayOfStrings = new string[] { "A hot potato", "Two are better than one", "Best of both worlds", "Curiosity killed the cat", "Devil's Advocate", "It takes two to tango", "a twofer" };
string stringToMatch = "two are better";
var matches = arrayOfStrings
    .Select(s =>
    {
        int count = 0;
        foreach (var item in stringToMatch.Split(' '))
        {
            if (s.Contains(item))
                count++;
        }
        return new { count, s };
    })
    .OrderByDescending(d => d.count);
I have used a very simple (case-sensitive) string comparison. The algorithm can vary with your exact requirements (for example, whether the matched words must appear in sequence).
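The Contains checks above also match inside longer words ("two" is found in "a twofer"), and the counting version is case-sensitive. A sketch that instead compares whole space-separated words, ignoring case, and ranks candidates by how many distinct query words they share is below. Splitting on spaces only (so punctuation stays attached to words) and dropping candidates with no shared words are assumptions; LikeStrings and RankByWordOverlap are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LikeStrings
{
    // Orders candidates by how many distinct words they share with query
    // (case-insensitive, whole words only); candidates sharing none are dropped.
    public static List<string> RankByWordOverlap(IEnumerable<string> candidates, string query)
    {
        var queryWords = new HashSet<string>(
            query.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries),
            StringComparer.OrdinalIgnoreCase);

        return candidates
            .Select(c => (text: c, score: c
                .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                .Distinct(StringComparer.OrdinalIgnoreCase)
                .Count(w => queryWords.Contains(w))))
            .Where(e => e.score > 0)
            .OrderByDescending(e => e.score)
            .Select(e => e.text)
            .ToList();
    }
}
```

With the question's array and "two are better", this returns "Two are better than one" (3 shared words) followed by "It takes two to tango" (1 shared word).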

Regex Help String Matching

I've got a long string in the format of:
WORD_1#WORD_3#WORD_5#CAT_DOG_FISH#WORD_2#WORD_3#CAT_DOG_FISH_2#WORD_7
I'm trying to dynamically match a string so I can return its position within the string.
I know the string will start with CAT_DOG_, but the FISH part is dynamic and could be anything. It's also important not to match CAT_DOG_FISH_2 (a word ending in _ and an int).
Basically, I need a match on any word that starts with CAT_DOG_ but does not end in _(int).
I've tried a few different things and don't seem to be getting anywhere; any help is appreciated.
Once I have a regex that matches, I'll be able to get the index of the match and then work out where the next # (the delimiter) is, which gives me the start and end positions of the word; I can then use Substring to return the full word.
I hope that makes sense?

Personally, I avoid regexes whenever possible, as I find them hard to read and maintain unless you use them a lot, so here is a non-regex solution:
string words = "WORD_1#WORD_3#WORD_5#CAT_DOG_FISH#WORD_2#WORD_3#CAT_DOG_FISH_2#WORD_7";
var result = words.Split('#')
.Select((w,p) => new { WholeWord = w, SplitWord = w.Split('_'), Position = p, Dynamic = w.Split('_').Last() })
.FirstOrDefault(
x => x.SplitWord.Length == 3 &&
x.SplitWord[0] == "CAT" &&
x.SplitWord[1] == "DOG");
That gives you the whole word, the dynamic part, and the position (the index of the word among the #-separated parts, not a character offset). It does assume that the dynamic part contains no underscores.
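If the character offset is needed instead (the question wants to Substring the word out), one option is to keep a running offset while walking the #-separated parts. This sketch assumes the same rule as the regex answer: the word starts with CAT_DOG_, has a non-empty dynamic part, and does not end in _ followed by digits. CatDogFinder and FindCatDog are hypothetical names:

```csharp
using System;
using System.Linq;

public static class CatDogFinder
{
    // Returns the first '#'-separated word that starts with "CAT_DOG_" and does not
    // end in "_<digits>", plus its character offset; (null, -1) if none matches.
    public static (string Word, int Offset) FindCatDog(string text)
    {
        const string prefix = "CAT_DOG_";
        int offset = 0;
        foreach (string word in text.Split('#'))
        {
            if (word.StartsWith(prefix, StringComparison.Ordinal) && word.Length > prefix.Length)
            {
                string rest = word.Substring(prefix.Length);      // the dynamic part
                int lastUnderscore = rest.LastIndexOf('_');
                string tail = lastUnderscore >= 0 ? rest.Substring(lastUnderscore + 1) : "";
                bool endsWithUnderscoreNumber = tail.Length > 0 && tail.All(char.IsDigit);
                if (!endsWithUnderscoreNumber)
                    return (word, offset);
            }
            offset += word.Length + 1; // +1 for the '#' delimiter
        }
        return (null, -1);
    }
}
```

For the sample string in the question, it returns ("CAT_DOG_FISH", 21), the same position that Match.Index reports for the regex approach.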

You can use the following regex:
\bCAT_DOG_[a-zA-Z]+(?!_\d)\b
Or (if the FISH is really anything, but not _ or #):
\bCAT_DOG_[^_#]+(?!_\d)\b
The word boundaries \b together with the look-ahead (?!_\d) (meaning the match must not be followed by _ and a digit) ensure that only the required strings are returned. The [^_#] character class matches any character except _ or #.
You can get the indices using LINQ:
var s = "WORD_1#WORD_3#WORD_5#CAT_DOG_FISH#WORD_2#WORD_3#CAT_DOG_FISH_2#WORD_7";
var rx1 = new Regex(@"\bCAT_DOG_[^_#]+(?!_\d)\b");
var indices = rx1.Matches(s).Cast<Match>().Select(p => p.Index).ToList();
Values can be obtained like this:
var values = rx1.Matches(s).Cast<Match>().Select(p => p.Value).ToList();
Or both together:
var indexedValues = rx1.Matches(s).OfType<Match>().Select(p => new { p.Index, p.Value }).ToList();

Thanks for the help, guys. Since I know which int the string will end with, I've settled on this:
int i = 0; // the known trailing number
string[] words = textBox1.Text.Split('#');
foreach (string word in words)
{
    if (word.StartsWith("CAT_DOG_") && !word.EndsWith(i.ToString()))
    {
        // process here
        MessageBox.Show("match is: " + word);
    }
}
Thanks to Eser for pointing me towards String.Split().

How to get words from a text box

In my WPF application I have a text box named textBox.
I'm trying to get each individual word of a sentence typed by the user into a string array, say arrayWords.
I found a piece of code on Stack Overflow that counts the number of words, but I want to copy each individual word.
Below is the code for counting the number of words.
String text = textBox.Text.Trim();
int wordCount = 0, index = 0;
while (index < text.Length)
{
    // check if current char is part of a word
    while (index < text.Length && Char.IsWhiteSpace(text[index]) == false)
        index++;
    wordCount++;
    // skip whitespace until next word
    while (index < text.Length && Char.IsWhiteSpace(text[index]) == true)
        index++;
}
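The counting loop above can be adapted to copy each word out rather than only counting it: remember where a word starts and take a Substring once its end is reached. A minimal sketch; WordScanner and GetWords are hypothetical names:

```csharp
using System.Collections.Generic;

public static class WordScanner
{
    // Same scanning idea as the word-count loop, but each word is copied out.
    public static string[] GetWords(string text)
    {
        var words = new List<string>();
        int index = 0;
        while (index < text.Length)
        {
            // skip whitespace before the next word
            while (index < text.Length && char.IsWhiteSpace(text[index]))
                index++;
            int start = index;
            // advance over the word's characters
            while (index < text.Length && !char.IsWhiteSpace(text[index]))
                index++;
            if (index > start)
                words.Add(text.Substring(start, index - start));
        }
        return words.ToArray();
    }
}
```

Because whitespace is skipped before each word, leading, trailing and repeated spaces do not produce empty entries, so the Trim() call is not needed.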

You can use the String.Split method.
String text = textBox.Text.Trim();
var words = text.Split(' ');
or
var words = text.Split(); // with no arguments, Split uses whitespace characters as delimiters

While @dotNET's answer is on the right track, it assumes that you maintain the list of punctuation marks yourself (and the list in that answer isn't complete). Besides, there could be words with hyphens.
I'd recommend using a regular expression:
// \w+ followed by any number of -\w+ parts, so one-letter words ("I") and hyphenated words both match
var words = Regex.Matches(textBox.Text, @"\w+(?:-\w+)*")
    .OfType<Match>()
    .Select(m => m.Value)
    .ToArray();

String.Split() can chop your sentence(s) into words. You should, however, take care of trimming punctuation characters from your words. For example, if you use Split() on the sentence "StackOverflow is good, I like it.", two of the words in the resulting array will have a comma and a period attached to them. So you should use something like this to get "pure" words:
string[] words = textBox.Text.Split().Select(x => x.TrimEnd(",.;:-".ToCharArray())).ToArray();
The statement above uses LINQ, so you need a using System.Linq; directive.
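Instead of listing punctuation characters by hand, a sketch can trim whatever char.IsPunctuation reports at either end of each word, while splitting on any whitespace and discarding empty entries so that repeated spaces do not yield empty words. Note that symbols such as $ or + are not punctuation by that test and would be kept; WordSplitter and GetWords are hypothetical names:

```csharp
using System;
using System.Linq;

public static class WordSplitter
{
    // Splits on any whitespace, trims leading/trailing punctuation from each
    // piece, and drops pieces that consisted of punctuation only.
    public static string[] GetWords(string text) =>
        text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries) // null separator = whitespace
            .Select(w => TrimPunctuation(w))
            .Where(w => w.Length > 0)
            .ToArray();

    static string TrimPunctuation(string word)
    {
        int start = 0, end = word.Length - 1;
        while (start <= end && char.IsPunctuation(word[start])) start++;
        while (end >= start && char.IsPunctuation(word[end])) end--;
        return word.Substring(start, end - start + 1);
    }
}
```

For "StackOverflow is good, I like it." this yields StackOverflow, is, good, I, like and it; hyphens and apostrophes inside a word (well-known, don't) are left alone because only the ends are trimmed.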

The following code will give you an array of words from your textBox.
string[] words = textBox.Text.Split(' ');

string[] words = textBox.Text.Split(new char[] { ' ' });

The logic behind getting words from a sentence is to first split the sentence into words and store them in a string array; then you can do anything you want with them. The code below should help you sort out your problem (it needs using System.Text.RegularExpressions;).
static void Main(string[] args)
{
    string sentence = "Thats the sentence";
    string[] operands = Regex.Split(sentence, @" ");
    foreach (string i in operands)
    {
        Console.WriteLine(i);
    }
    Console.ReadLine();
}
It extracts the words from the sentence, stores them in an array, and displays them.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.
