Get the difference between 2 strings - c#

I'm attempting to calculate the difference between two strings.
For example:
string val1 = "Have a good day";
string val2 = "Have a very good day, Joe";
The result would be a list of strings with two items: "very " and ", Joe".
So far my research into this task hasn't turned up much.
Edit: The result would probably need to be two separate lists of strings, one that holds additions and one that holds removals.

This is the simplest version I can think of:
class Program
{
static void Main(string[] args)
{
string val1 = "Have a good day";
string val2 = "Have a very good day, Joe";
MatchCollection words1 = Regex.Matches(val1, @"\b(\w+)\b");
MatchCollection words2 = Regex.Matches(val2, @"\b(\w+)\b");
var hs1 = new HashSet<string>(words1.Cast<Match>().Select(m => m.Value));
var hs2 = new HashSet<string>(words2.Cast<Match>().Select(m => m.Value));
// Optionally, you can use a custom comparer for the words:
// var hs2 = new HashSet<string>(words2.Cast<Match>().Select(m => m.Value), new MyComparer());
// After this operation, hs2 contains only 'very' and 'Joe'.
hs2.ExceptWith(hs1);
}
}
Custom comparer:
public class MyComparer : IEqualityComparer<string>
{
public bool Equals(string one, string two)
{
return one.Equals(two, StringComparison.OrdinalIgnoreCase);
}
public int GetHashCode(string item)
{
// The hash must agree with the case-insensitive Equals above.
return StringComparer.OrdinalIgnoreCase.GetHashCode(item);
}
}

I followed these steps:
(i) Extract all words from both strings, ignoring special characters.
(ii) Find the difference between the two word lists.
Code:
static void Main()
{
string s1 = "Have a good day";
string s2 = "Have a very good day, Joe";
IEnumerable<string> diff;
MatchCollection matches = Regex.Matches(s1, @"\b[\w']*\b");
IEnumerable<string> first= from m in matches.Cast<Match>()
where !string.IsNullOrEmpty(m.Value)
select TrimSuffix(m.Value);
MatchCollection matches1 = Regex.Matches(s2, @"\b[\w']*\b");
IEnumerable<string> second = from m in matches1.Cast<Match>()
where !string.IsNullOrEmpty(m.Value)
select TrimSuffix(m.Value);
if (second.Count() > first.Count())
{
diff = second.Except(first).ToList();
}
else
{
diff = first.Except(second).ToList();
}
}
static string TrimSuffix(string word)
{
int apostropheLocation = word.IndexOf('\'');
if (apostropheLocation != -1)
{
word = word.Substring(0, apostropheLocation);
}
return word;
}
OUTPUT:
very, Joe

This code:
enum Where { None, First, Second, Both } // somewhere in your source file
//...
var val1 = "Have a good calm day calm calm calm";
var val2 = "Have a very good day, Joe Joe Joe Joe";
var words1 = from m in Regex.Matches(val1, "(\\w+)|(\\S+\\s+\\S+)").Cast<Match>()
where m.Success
select m.Value.ToLower();
var words2 = from m in Regex.Matches(val2, "(\\w+)|(\\S+\\s+\\S+)").Cast<Match>()
where m.Success
select m.Value.ToLower();
var dic = new Dictionary<string, Where>();
foreach (var s in words1)
{
dic[s] = Where.First;
}
foreach (var s in words2)
{
Where b;
if (!dic.TryGetValue(s, out b)) b = Where.None;
switch (b)
{
case Where.None:
dic[s] = Where.Second;
break;
case Where.First:
dic[s] = Where.Both;
break;
}
}
foreach (var kv in dic.Where(x => x.Value != Where.Both))
{
Console.WriteLine(kv.Key);
}
Gives us 'calm', 'very', ', joe' and 'joe' (lowercased by ToLower), which are the differences between the two strings: 'calm' from the first one and 'very', ', joe' and 'joe' from the second. It also removes repeated cases.
And to get two separate lists that show which word came from which text:
var list1 = dic.Where(x => x.Value == Where.First).ToList();
var list2 = dic.Where(x => x.Value == Where.Second).ToList();
foreach (var kv in list1)
{
Console.WriteLine("{0}: {1}", kv.Key, kv.Value);
}
foreach (var kv in list2)
{
Console.WriteLine("{0}: {1}", kv.Key, kv.Value);
}

Put the characters into two sets, then compute the relative complement of those sets.
The relative complement will be available in any good set library.
You might want to take care to preserve the order of the characters.
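A minimal sketch of the set-complement idea, applied to words rather than characters, as the other answers here do. It keeps the order of first appearance and returns additions and removals separately. The names WordDiff, Compute and OnlyIn are made up for this illustration, and treating `\w+` as "a word" is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordDiff
{
    // Splits text into words; \w+ is an assumed definition of "word".
    private static List<string> Words(string text)
    {
        return Regex.Matches(text, @"\w+").Cast<Match>().Select(m => m.Value).ToList();
    }

    // Relative complement: distinct words of 'source' that are not in 'other',
    // in order of first appearance in 'source'.
    private static List<string> OnlyIn(List<string> source, List<string> other)
    {
        var seen = new HashSet<string>(other);
        // HashSet.Add returns false for words already in 'other' or already emitted.
        return source.Where(w => seen.Add(w)).ToList();
    }

    // Added: words only in 'after'. Removed: words only in 'before'.
    public static (List<string> Added, List<string> Removed) Compute(string before, string after)
    {
        List<string> w1 = Words(before);
        List<string> w2 = Words(after);
        return (OnlyIn(w2, w1), OnlyIn(w1, w2));
    }
}
```

For the strings in the question, `WordDiff.Compute(val1, val2)` yields "very" and "Joe" as additions and no removals. Punctuation and whitespace are dropped, so this does not reproduce the exact "very " and ", Joe" fragments.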

You have to remove the ',' in order to get the expected result:
string s1 = "Have a good day";
string s2 = "Have a very good day, Joe";
int index = s2.IndexOf(','); // get the index of the char to be removed
IEnumerable<string> diff;
IEnumerable<string> first = s1.Split(' ').Distinct();
IEnumerable<string> second = s2.Remove(index, 1).Split(' ').Distinct(); // remove it
if (second.Count() > first.Count())
{
diff = second.Except(first).ToList();
}
else
{
diff = first.Except(second).ToList();
}

Related

Get string from another string array if value matches

String Array 1: (In this format: <MENU>|<Not Served?>|<Alternate item served>)
Burger|True|Sandwich
Pizza|True|Hot Dog
String Array 2: (Contains Menu)
Burger
Pizza
Grill Chicken
Pasta
For each menu item, I need to know whether it is served or, if not, which alternate item is served instead.
Code:
for(int i = 0; i < strArr2.Length; i++)
{
if(strArr2.Any(_r => _r.Split('|').Any(_rS => _rS.Contains(strArr1[i]))))
{
var menu = strArr2[i];
var alternate = ? // need to get alternate item
}
}
As I commented in the code, how do I get the alternate item from that string array? Please help, thanks in advance.
P.S.: Any help trimming the if condition is also welcome.
Instead of Any, you can use Where to get the matching value.
@Markus has the detailed answer; I am just using your code to give you a quick fix.
// strArr1 holds the "Menu|NotServed|Alternate" entries, strArr2 the menu names.
for(int i = 0; i < strArr2.Length; i++)
{
if(strArr1.Any(_r => _r.Split('|').Any(_rS => _rS.Contains(strArr2[i]))))
{
var menu = strArr2[i];
var alternate = strArr1.Where(_rs => _rs.Split('|').Any(_rS => _rS.Contains(strArr2[i]))).First().Split('|').Last();
}
}
In order to simplify your code, it is a good idea to better separate the tasks. For instance, it will be much easier to handle the contents of string array 1 after you have converted the contents into objects, e.g.
class NotServedMenu
{
public string Menu { get; set; }
public bool NotServed { get; set; }
public string AlternateMenu { get; set; }
}
Instead of having an array of strings, you can read the strings to a list first:
private IEnumerable<NotServedMenu> NotServedMenusFromStrings(IEnumerable<string> strings)
{
return (from x in strings select ParseNotServedMenuFromString(x)).ToArray();
}
private NotServedMenu ParseNotServedMenuFromString(string str)
{
var parts = str.Split('|');
// Validate
if (parts.Length != 3)
throw new ArgumentException(string.Format("Unable to parse \"{0}\" to an object of type {1}", str, typeof(NotServedMenu).FullName));
bool notServedVal;
if (!bool.TryParse(parts[1], out notServedVal))
throw new ArgumentException(string.Format("Unable to read bool value from \"{0}\" in string \"{1}\".", parts[1], str));
// Create object
return new NotServedMenu() { Menu = parts[0],
NotServed = notServedVal,
AlternateMenu = parts[2] };
}
Once you can use the objects, the subsequent code will be much cleaner to read:
var notServedMenusStr = new[]
{
"Burger|True|Sandwich",
"Pizza|True|Hot Dog"
};
var notServedMenus = NotServedMenusFromStrings(notServedMenusStr);
var menus = new[]
{
"Burger",
"Pizza",
"Grill Chicken",
"Pasta"
};
var alternateMenus = (from m in menus join n in notServedMenus on m equals n.Menu select n);
foreach(var m in alternateMenus)
Console.WriteLine("{0}, {1}, {2}", m.Menu, m.NotServed, m.AlternateMenu);
In this sample, I've used a Linq join to find the matching items.
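If only the alternate item for a given menu is needed, a dictionary keyed by menu name is a possible alternative to the join. The sketch below is an illustration rather than part of the answer: the names MenuLookup, BuildAlternates and AlternateFor are made up, malformed entries are silently skipped (an assumed policy), and menu names are assumed to be unique, since ToDictionary throws on duplicate keys.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MenuLookup
{
    // Maps each menu name to its alternate item,
    // reading entries of the form "Menu|NotServed|Alternate".
    public static Dictionary<string, string> BuildAlternates(IEnumerable<string> entries)
    {
        return entries
            .Select(e => e.Split('|'))
            .Where(p => p.Length == 3)          // skip malformed entries (assumed policy)
            .ToDictionary(p => p[0], p => p[2]);
    }

    // Returns the alternate item for a menu, or null when the menu has no entry.
    public static string AlternateFor(Dictionary<string, string> alternates, string menu)
    {
        string alt;
        return alternates.TryGetValue(menu, out alt) ? alt : null;
    }
}
```

With the sample data, looking up "Pizza" gives "Hot Dog", while "Pasta" has no entry and gives null. Unlike the Contains-based checks, this matches whole menu names only.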
You could do something like this:
string[] strArr1 = { "Burger|True|Sandwich", "Pizza|True|Hot Dog" };
string[] strArr2 = { "Burger", "Pizza", "Grill Chicken", "Pasta" };
foreach (string str2 in strArr2)
{
string str1 = strArr1.FirstOrDefault(str => str.Contains(str2));
if (str1 != null)
{
string[] splited = str1.Split('|');
string first = splited[0];
bool condition = Convert.ToBoolean(splited[1]);
string second = splited[2];
}
}

How to find maximum number of repeated string in a string in a list of string in c#

If we have a list of strings, how can we find the strings that contain the maximum number of occurrences of a given symbol, using LINQ?
List <string> mylist=new List <string>();
mylist.Add("%1");
mylist.Add("%136%250%3"); //s0
mylist.Add("%1%5%20%1%10%50%8%3"); // s1
mylist.Add("%4%255%20%1%14%50%8%4"); // s2
string symbol="%";
List <string> List_has_MAX_num_of_symbol= mylist.OrderByDescending(s => s.Length ==max_num_of(symbol)).ToList();
//the result should be a list of s1 + s2 since they have **8** repeated '%'
I tried
var longest = mylist.Where(s => s.Length == mylist.Max(m => m.Length)) ;
but this gives me only one string, not both.
Here's a very simple solution, but not exactly efficient. Every element has the Count operation performed twice...
List<string> mylist = new List<string>();
mylist.Add("%1");
mylist.Add("%136%250%3"); //s0
mylist.Add("%1%5%20%1%10%50%8%3"); // s1
mylist.Add("%4%255%20%1%14%50%8%4"); // s2
char symbol = '%';
var maxRepeat = mylist.Max(item => item.Count(c => c == symbol));
var longest = mylist.Where(item => item.Count(c => c == symbol) == maxRepeat);
It will return 2 strings:
"%1%5%20%1%10%50%8%3"
"%4%255%20%1%14%50%8%4"
Here is an implementation that depends upon SortedDictionary<,> to get what you're after.
var mylist = new List<string> {"%1", "%136%250%3", "%1%5%20%1%10%50%8%3", "%4%255%20%1%14%50%8%4"};
var mappedValues = new SortedDictionary<int, IList<string>>();
mylist.ForEach(str =>
{
var count = str.Count(c => c == '%');
if (mappedValues.ContainsKey(count))
{
mappedValues[count].Add(str);
}
else
{
mappedValues[count] = new List<string> { str };
}
});
// output to validate output
foreach (var str in mappedValues.Last().Value)
{
Console.WriteLine(str);
}
Here's one using LINQ that gets the result you're after.
var result = (from str in mylist
group str by str.Count(c => c == '%')
into g
// sort the groups by count so that the last one is the maximum
orderby g.Key
select new
{
Count = g.Key,
List = g.ToList()
}).LastOrDefault();
OK, here's my answer:
char symbol = '%';
var recs = mylist.Select(s => new { Str = s, Count = s.Count(c => c == symbol) });
var maxCount = recs.Max(x => x.Count);
var longest = recs.Where(x => x.Count == maxCount).Select(x => x.Str).ToList();
It is complicated because it has three lines (the char symbol = '%'; line excluded), but it counts each string only once. EZI's answer has only two lines, but it is complicated because it counts each string twice. If you really want a one-liner, here it is:
var longest = mylist.Where(x => x.Count(c => c == symbol) == mylist.Max(y => y.Count(c => c == symbol))).ToList();
but it counts each string many times. You can choose whatever complexity you want.
We can't assume that the % is always going to be the most repeated character in your list. First, we have to determine what character appears the most in an individual string for each string.
Once we have the character and its maximum occurrence, we can apply LINQ to the List<string> and grab the strings in which that character occurs exactly that many times.
using System;
using System.Collections.Generic;
using System.Linq;
public class Program
{
public static void Main()
{
List <string> mylist=new List <string>();
mylist.Add("%1");
mylist.Add("%136%250%3");
mylist.Add("%1%5%20%1%10%50%8%3");
mylist.Add("%4%255%20%1%14%50%8%4");
// Determine what character appears most in a single string in the list
char maxCharacter = ' ';
int maxCount = 0;
foreach (string item in mylist)
{
// Get the max occurrence of each character
int max = item.Max(m => item.Count(c => c == m));
if (max > maxCount)
{
maxCount = max;
// Store the character whose occurrence equals the max
maxCharacter = item.Select(c => c).Where(c => item.Count(i => i == c) == max).First();
}
}
// Print the strings containing the max character
mylist.Where(item => item.Count(c => c == maxCharacter) == maxCount)
.ToList().ForEach(Console.WriteLine);
}
}
Results:
%1%5%20%1%10%50%8%3
%4%255%20%1%14%50%8%4
var newList = myList.maxBy(x=>x.Count(y=>y.Equals('%'))).ToList();
This should work. Please correct syntax if wrong anywhere and update here too if it works for you.

How can I compare and update the first occurrence in an array overlap?

Say I have the following 2 arrays...
string[] A = ["word1", "word2", "word3"];
string[] B = ["word0", "word1", "word2", "word3", "word4", "word5", "word6", "word1", "word2", "word3"];
If I want to compare A to B and remove the first occurrence in B so it looks like this...
string[] B = ["word0", " ", " ", " ", "word4", "word5", "word6", "word1", "word2", "word3"];
How would I go about this?
A straightforward way would be to use Array.IndexOf to find the first occurrence of each word from A in B:
foreach (var word in A)
{
var index = Array.IndexOf(B, word);
if (index >= 0) {
B[index] = " "; // or whatever other value
}
}
Note this might not work as expected if the replacement value is itself present inside A -- if that is possible you should specify what you want to happen.
Update: It looks like you want to find and replace the subsequence A as a whole inside B, and not individual elements. This is a very different problem. One (naive) implementation would be:
var start = Enumerable.Range(0, B.Length - A.Length + 1)
.Where(i => B.Skip(i).Take(A.Length).SequenceEqual(A))
.DefaultIfEmpty(-1)
.First();
if (start != -1)
{
for (var i = 0; i < A.Length; ++i)
{
B[start + i] = " ";
}
}
I fully support the answer @Jon gave. It is fast, succinct and precise.
Still, I had a totally different, more functional approach, just in case
by string[] you actually meant something more stream-like:
Say you have a possibly infinite sequence of strings instead of a primitive array in the role of B. It could be anything: A reading of entities coming straight from a database, a monadic string generator, anything:
string[] A = ["word1", "word2", "word3"];
IEnumerable<string> B = ...;
You could write yourself a nice little extension method:
public static class MyHelpers {
public static IEnumerable<string> ReplaceFirstOccurrencesWithEmpty(this IEnumerable<string> @this, IEnumerable<string> a) {
// prepare a HashSet<string> to track which elements of A have not been matched yet
var set = new HashSet<string>(a);
// iterate and apply the rule you asked about
// virtually forever (if needed)
foreach (var value in @this) {
if (set.Remove(value))
yield return "";
else
yield return value;
}
}
}
And then you could use it like so, even on your initial A and B arrays:
string[] A = ["word1", "word2", "word3"];
string[] B = ["word0", "word1", "word2", "word3", "word4", "word5", "word6", "word1", "word2", "word3"];
var cQuery = B.ReplaceFirstOccurrencesWithEmpty(A);
string[] c = cQuery.ToArray();
With LINQ (note that this only works when the overlap starts at index 1 of B, as it does in the example):
string[] A = new string[] { "word1", "word2", "word3" };
string[] B = new string[] { "word0", "word1", "word2", "word3", "word4", "word5", "word6", "word1", "word2", "word3" };
string[] result = B.Select((word, i) => i <= A.Length && i > 0 && A[i-1] == word ? "" : word).ToArray();

Given collection of strings, count number of times each word appears in List<T>

Input 1: List<string>, e.g:
"hello", "world", "stack", "overflow".
Input 2: List<Foo> (two properties, string a, string b), e.g:
Foo 1:
a: "Hello there!"
b: string.Empty
Foo 2:
a: "I love Stack Overflow"
b: "It's the best site ever!"
So I want to end up with a Dictionary<string,int>: the word, and the number of times it appears in the List<Foo>, in either the a or the b field.
Current first-pass/top of my head code, which is far too slow:
var occurences = new Dictionary<string, int>();
foreach (var word in uniqueWords /* input1 */)
{
var aOccurances = foos.Count(x => !string.IsNullOrEmpty(x.a) && x.a.Contains(word));
var bOccurances = foos.Count(x => !string.IsNullOrEmpty(x.b) && x.b.Contains(word));
occurences.Add(word, aOccurances + bOccurances);
}
Roughly:
Build a dictionary (occurrences) from the first input, optionally with a case-insensitive comparer.
For each Foo in the second input, use RegEx to split a and b into words.
For each word, check if the key exists in occurrences. If it exists, increment and update the value in the dictionary.
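The steps above could be sketched roughly as follows. The Foo class here is an assumed shape based on the question (two string properties, a and b), WordCounter and Count are made-up names, and `\w+` is an assumed definition of "word". Note that this counts whole-word occurrences, which is not quite the same as the substring test (Contains) in the question's code.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Assumed shape of the question's Foo: two string properties, a and b.
public class Foo
{
    public string a { get; set; }
    public string b { get; set; }
}

public static class WordCounter
{
    public static Dictionary<string, int> Count(IEnumerable<string> uniqueWords, IEnumerable<Foo> foos)
    {
        // Step 1: seed the dictionary with the words of interest, ignoring case.
        var occurrences = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        foreach (string word in uniqueWords)
        {
            occurrences[word] = 0;
        }

        // Steps 2 and 3: split a and b into words and bump the matching counters.
        foreach (Foo foo in foos)
        {
            string text = (foo.a ?? "") + " " + (foo.b ?? "");
            foreach (Match m in Regex.Matches(text, @"\w+"))
            {
                int n;
                if (occurrences.TryGetValue(m.Value, out n))
                {
                    occurrences[m.Value] = n + 1;
                }
            }
        }
        return occurrences;
    }
}
```

Each Foo is scanned once and every word costs one dictionary lookup, instead of scanning the whole list once per word as in the question's loop.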
You could try concatenating the two strings, a + b, then using a regex to pull out all the words into a collection, and finally indexing that with a group-by query.
For example:
void Main()
{
var a = "Hello there!";
var b = "It's the best site ever!";
var ab = a + " " + b;
var matches = Regex.Matches(ab, "[A-Za-z]+");
var occurences = from x in matches.OfType<System.Text.RegularExpressions.Match>()
let word = x.Value.ToLowerInvariant()
group word by word into g
select new { Word = g.Key, Count = g.Count() };
var result = occurences.ToDictionary(x => x.Word, x => x.Count);
Console.WriteLine(result);
}
Example with some changes suggested...
Edit. Just reread the requirement....kinda strange but hey...
void Main()
{
// Foo is assumed to expose two string properties, A and B.
var counts = GetCount(new [] {
new Foo { A = "Hello there!", B = "It's the best site ever!" }
});
Console.WriteLine(counts);
}
public IDictionary<string, int> GetCount(IEnumerable<Foo> inputs)
{
var allWords = from input in inputs
let matchesA = Regex.Matches(input.A, "[A-Za-z']+").OfType<System.Text.RegularExpressions.Match>()
let matchesB = Regex.Matches(input.B, "[A-Za-z']+").OfType<System.Text.RegularExpressions.Match>()
from x in matchesA.Concat(matchesB)
select x.Value;
var occurences = allWords.GroupBy(x => x, (x, y) => new{Key = x, Count = y.Count()}, StringComparer.OrdinalIgnoreCase);
var result = occurences.ToDictionary(x => x.Key, x => x.Count, StringComparer.OrdinalIgnoreCase);
return result;
}

Remove duplicated elements from a List<String>

I would like to remove the duplicate elements from a List. Some elements of the list look like this:
Book 23
Book 22
Book 19
Notebook 22
Notebook 19
Pen 23
Pen 22
Pen 19
To get rid of duplicate elements I've done this:
List<String> nodup = dup.Distinct().ToList();
I would like to keep in the list just:
Book 23
Notebook 22
Pen 23
How can I do that?
You can do something like
string firstElement = dup.Distinct().ToList().First();
and add it to another list if you want.
It's not 100% clear what you want here - however...
If you want to keep the "largest" number in the list, you could do:
List<string> noDup = dup.Select(s => s.Split(new[] {' '}, StringSplitOptions.RemoveEmptyEntries))
.Select(p => new { Name=p[0], Val=int.Parse(p[1]) })
.GroupBy(p => p.Name)
.Select(g => string.Join(" ", g.Key, g.Max(p => p.Val).ToString()))
.ToList();
This would transform the List<string> by parsing the numeric portion into a number, taking the max per item, and creating the output string as you have specified.
You can use LINQ in combination with some String operations to group all your items by name and MAX(Number):
var q = from str in list
let Parts = str.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
let item = Parts[ 0 ]
let num = int.Parse(Parts[ 1 ])
group new { Name = item, Number = num } by item into Grp
select new {
Name = Grp.Key,
Value = Grp.Max(i => i.Number).ToString()
};
var highestGroups = q.Select(g =>
String.Format("{0} {1}", g.Name, g.Value)).ToList();
(Same as Reed's approach, but in query syntax, which is more readable in my opinion.)
Edit: I cannot reproduce your comment that it does not work, here is sample data:
List<String> list = new List<String>();
list.Add("Book 23");
list.Add("Book 22");
list.Add("Book 19");
list.Add("Notebook 23");
list.Add("Notebook 22");
list.Add("Notebook 19");
list.Add("Pen 23");
list.Add("Pen 22");
list.Add("Pen 19");
list.Add("sheet 3");
var q = from str in list
let Parts = str.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
let item = Parts[ 0 ]
let num = int.Parse(Parts[ 1 ])
group new { Name = item, Number = num } by item into Grp
select new {
Name = Grp.Key,
Value = Grp.Max(i => i.Number).ToString()
};
var highestGroups = q.Select(g => String.Format("{0} {1}", g.Name, g.Value));
MessageBox.Show(String.Join(Environment.NewLine, highestGroups));
The result:
Book 23
Notebook 23
Pen 23
sheet 3
You may want to add a custom comparer as a parameter, as you can see in the example on MSDN.
In this example I assumed Foo is a class with two members.
class Program
{
static void Main(string[] args)
{
var list = new List<Foo>()
{
new Foo("Book", 23),
new Foo("Book", 22),
new Foo("Book", 19)
};
foreach(var element in list.Distinct(new Comparer()))
{
Console.WriteLine(element.Type + " " + element.Value);
}
}
}
public class Foo
{
public Foo(string type, int value)
{
this.Type = type;
this.Value = value;
}
public string Type { get; private set; }
public int Value { get; private set; }
}
public class Comparer : IEqualityComparer<Foo>
{
public bool Equals(Foo x, Foo y)
{
if(x == null || y == null)
return x == y;
else
return x.Type == y.Type;
}
public int GetHashCode(Foo obj)
{
return obj.Type.GetHashCode();
}
}
This works on an IList, assuming that we want the first item each, not the one with the highest number. Be careful with different collection types (like ICollection or IEnumerable), as they do not guarantee you any order. Therefore any of the Foos may remain after the Distinct.
You could also override both Equals and GetHashCode of Foo instead of using a custom IEqualityComparer. However, I would not actually recommend this for a local distinct. Consumers of your class may not recognize that two instances with same value for Type are always equal, regardless of their Value.
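When it matters which element survives, grouping by Type makes the choice explicit instead of leaving it to Distinct. The sketch below is only an illustration: FooDedup, FirstPerType and MaxPerType are made-up names, and Foo is repeated with the same shape as in the answer so that the block stands on its own. For in-memory sequences, GroupBy yields groups in order of first key appearance and keeps source order within each group.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same shape as the Foo class in the answer above.
public class Foo
{
    public Foo(string type, int value)
    {
        Type = type;
        Value = value;
    }
    public string Type { get; private set; }
    public int Value { get; private set; }
}

public static class FooDedup
{
    // Keeps the first Foo encountered for each Type.
    public static List<Foo> FirstPerType(IEnumerable<Foo> items)
    {
        return items.GroupBy(f => f.Type)
                    .Select(g => g.First())
                    .ToList();
    }

    // Keeps the Foo with the highest Value for each Type.
    public static List<Foo> MaxPerType(IEnumerable<Foo> items)
    {
        return items.GroupBy(f => f.Type)
                    .Select(g => g.OrderByDescending(f => f.Value).First())
                    .ToList();
    }
}
```

MaxPerType corresponds to what the question asks for (the highest number per name), while FirstPerType mirrors the behavior of the comparer-based Distinct on an ordered list.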
A bit old-fashioned, but it should work,
if I understand correctly:
Dictionary<string,int> dict=new Dictionary<string,int>();
// Split takes a single character here; assume each line contains a key/value pair separated by one space, with no other whitespace
input=input.Replace("\r\n","\n");
string[] lines=input.Split('\n');
// break into categories and find the largest number in each
foreach(string line in lines)
{
string[] parts=line.Split(' ');
string key=parts[0].Trim();
int value=Convert.ToInt32(parts[1].Trim());
if (!dict.ContainsKey(key))
{
// first time this key is seen
dict.Add(key, value);
}
else
{
if (dict[key]<value)
{
dict[key]=value;
}
}
}
// do something with dict
