Comparing two strings with different orders

Comparing two strings with different orders - c#

I have a dictionary with a list of strings that each look something like:
"beginning|middle|middle2|end"
Now what I wanted was to do this:
List<string> stringsWithPipes = new List<string>();
stringWithPipes.Add("beginning|middle|middle2|end");
...
if(stringWithPipes.Contains("beginning|middle|middle2|end")
{
return true;
}
problem is, the string i'm comparing it against is built slightly different so it ends up being more like:
if(stringWithPipes.Contains(beginning|middle2|middle||end)
{
return true;
}
and obviously this ends up being false. However, I want to consider it true, since its only the order that is different.
What can I do?

You can split your string on | and then split the string to be compared, and then use Enumerable.Except along with Enumerable.Any like
List<string> stringsWithPipes = new List<string>();
stringsWithPipes.Add("beginning|middle|middle2|end");
stringsWithPipes.Add("beginning|middle|middle3|end");
stringsWithPipes.Add("beginning|middle2|middle|end");
var array = stringsWithPipes.Select(r => r.Split('|')).ToArray();
string str = "beginning|middle2|middle|end";
var compareArray = str.Split('|');
foreach (var subArray in array)
{
if (!subArray.Except(compareArray).Any())
{
//Exists
Console.WriteLine("Item exists");
break;
}
}
This can surely be optimized, but the above is one way to do it.

Try this instead::
if(stringWithPipes.Any(P => P.split('|')
.All(K => "beginning|middle2|middle|end".split('|')
.contains(K)))
Hope this will help !!

You need to split on a delimeter:
var searchString = "beginning|middle|middle2|end";
var searchList = searchString.Split('|');
var stringsWithPipes = new List<string>();
stringsWithPipes.Add("beginning|middle|middle2|end");
...
return stringsWithPipes.Select(x => x.Split('|')).Any(x => Match(searchList,x));
Then you can implement match in multiple ways
First up must contain all the search phrases but could include others.
bool Match(string[] search, string[] match) {
return search.All(x => match.Contains(x));
}
Or must be all the search phrases cannot include others.
bool Match(string[] search, string[] match) {
return search.All(x => match.Contains(x)) && search.Length == match.Length;
}

That should work.
List<string> stringsWithPipes = new List<string>();
stringsWithPipes.Add("beginning|middle|middle2|end");
string[] stringToVerifyWith = "beginning|middle2|middle||end".Split(new[] { '|' },
StringSplitOptions.RemoveEmptyEntries);
if (stringsWithPipes.Any(s => !s.Split('|').Except(stringToVerifyWith).Any()))
{
return true;
}
The Split will remove any empty entries created by the doubles |. You then check what's left if you remove every common element with the Except method. If there's nothing left (the ! [...] .Any(), .Count() == 0 would be valid too), they both contain the same elements.

Related

Is It possible to find out what are the common part in String List

I was working on finding out the Common string part in the String list. If we take a sample data set
private readonly List<string> Xpath = new List<string>()
{
"BODY>MAIN:nth-of-type(1)>DIV>SECTION>DIV>SECTION>DIV>DIV:nth-of-type(1)>DIV>DIV:nth-of-type(3)>DIV>ARTICLE>DIV>DIV>DIV>SECTION:nth-of-type(1)>H2:nth-of-type(1)",
"BODY>MAIN:nth-of-type(1)>DIV>SECTION>DIV>SECTION>DIV>DIV:nth-of-type(1)>DIV>DIV:nth-of-type(3)>DIV>ARTICLE>DIV>DIV>DIV>SECTION:nth-of-type(2)>H2:nth-of-type(1)",
"BODY>MAIN:nth-of-type(1)>DIV>SECTION>DIV>SECTION>DIV>DIV:nth-of-type(1)>DIV>DIV:nth-of-type(3)>DIV>ARTICLE>DIV>DIV>DIV>SECTION:nth-of-type(3)>H2:nth-of-type(1)",
"BODY>MAIN:nth-of-type(1)>DIV>SECTION>DIV>SECTION>DIV>DIV:nth-of-type(1)>DIV>DIV:nth-of-type(3)>DIV>ARTICLE>DIV>DIV>DIV>SECTION:nth-of-type(4)>H2:nth-of-type(1)",
"BODY>MAIN:nth-of-type(1)>DIV>SECTION>DIV>SECTION>DIV>DIV:nth-of-type(1)>DIV>DIV:nth-of-type(3)>DIV>ARTICLE>DIV>DIV>DIV>SECTION:nth-of-type(5)>H2:nth-of-type(1)",
"BODY>MAIN:nth-of-type(1)>DIV>SECTION>DIV>SECTION>DIV>DIV:nth-of-type(1)>DIV>DIV:nth-of-type(3)>DIV>ARTICLE>DIV>DIV>DIV>SECTION:nth-of-type(6)>H2:nth-of-type(1)",
"BODY>MAIN:nth-of-type(1)>DIV>SECTION>DIV>SECTION>DIV>DIV:nth-of-type(1)>DIV>DIV:nth-of-type(3)>DIV>ARTICLE>DIV>DIV>DIV>SECTION:nth-of-type(7)>H2:nth-of-type(1)",
"BODY>MAIN:nth-of-type(1)>DIV>SECTION>DIV>SECTION>DIV>DIV:nth-of-type(1)>DIV>DIV:nth-of-type(3)>DIV>ARTICLE>DIV>DIV>DIV>SECTION:nth-of-type(8)>H2:nth-of-type(1)",
"BODY>MAIN:nth-of-type(1)>DIV>SECTION>DIV>SECTION>DIV>DIV:nth-of-type(1)>DIV>DIV:nth-of-type(3)>DIV>ARTICLE>DIV>DIV>DIV>SECTION:nth-of-type(9)>H2:nth-of-type(1)"
};
From this, I want to find out to which children these are similar. data is an Xpath list.
Programmatically I should be able to tell
Expected output:
BODY>MAIN:nth-of-type(1)>DIV>SECTION>DIV>SECTION>DIV>DIV:nth-of-type(1)>DIV>DIV:nth-of-type(3)>DIV>ARTICLE>DIV>DIV>DIV
In order to get this What I did was like this. I separate each item by > and then create a list of items for each dataset originally.
Then using this find out what are the unique items
private IEnumerable<T> GetCommonItems<T>(IEnumerable<T>[] lists)
{
HashSet<T> hs = new HashSet<T>(lists.First());
for (int i = 1; i < lists.Length; i++)
{
hs.IntersectWith(lists[i]);
}
return hs;
}
Able to find out the unique values and create a dataset again. But what happened is if this contains Ex:- Div in two places and it also in every originally dataset even then this method will pick up only one Div.
From then I would get something like this:
BODY>MAIN:nth-of-type(1)>DIV>SECTION
But I need this
BODY>MAIN:nth-of-type(1)>DIV>SECTION>DIV>SECTION>DIV>DIV:nth-of-type(1)>DIV>DIV:nth-of-
type(3)>DIV>ARTICLE>DIV>DIV>DIV

Disclaimer: This is not the most performant solution but it works :)
Let's start with splitting the first path by > character
Do the same with all the paths
char separator = '>';
IEnumerable<string> firstPathChunks = Xpath[0].Split(separator);
var chunks = Xpath.Select(path => path.Split(separator).ToList()).ToArray();
Iterate through the firstPathChunks
Iterate through the chunks
if there is a match then remove the first element
if all first element is removed then append the matching prefix to sb
void Process(StringBuilder sb)
{
foreach (var pathChunk in firstPathChunks)
{
foreach (var chunk in chunks)
{
if (chunk[0] != pathChunk)
{
return;
}
chunk.RemoveAt(0);
}
sb.Append(pathChunk);
sb.Append(separator);
}
}
Sample usage
var sb = new StringBuilder();
Process(sb);
Console.WriteLine(sb.ToString());
Output
BODY>MAIN:nth-of-type(1)>DIV>SECTION>DIV>SECTION>DIV>DIV:nth-of-type(1)>DIV>DIV:nth-of-type(3)>DIV>ARTICLE>DIV>DIV>DIV>

Parsing the string by the seperator > is a good idea. Instead of then creating a list of unique items you should create a list of all items contained in the string which would result in
{
"BODY",
"MAIN:nth-of-type(1)",
"DIV",
"SECTTION",
"DIV",
...
}
for the first entry of your XPath list.
This way you create a List<List<string>> containing every element of each entry of your XPath list. You then can compare all first elements of the inner lists. If they are equal save that elements value to you output and proceed with all second elements and so on until you find an element that is not equal in all outer lists.
Edit:
After seperating your list by the > seperator this could look something like this:
List<List<string>> XPathElementsLists;
List<string> resultElements = new List<string>();
string result;
XPathElementsLists = ParseElementsFormXPath(XPath);
for (int i = 0; i < XPathElementsLists[0].Count; i++)
{
bool isEqual = true;
string compareElemment = XPathElementsLists[0][i];
foreach (List<string> element in XPathElementsLists)
{
if (!String.Equals(compareElemment, element))
{
isEqual = false;
break;
}
}
if (!isEqual)
{
break;
}
resultElements.Add(compareElemment);
}
result = String.Join(">", resultElements.ToArray());

How would I return the number of a sentence(s) that contains a word?

I have been trying to get better with programming (specifically c#) and am stuck on this project.
I am trying to create a windows form program where a user inputs a paragraph into a textbox and the program then returns all the unique words in the inputted paragraph as well as the count of unique words. I then need to list what sentence(s) contain these unique words.
I believe I have the first two parts down (listing unique words and their counts) however, I am struggling to think of how to complete the last part. I've listed my code below. Any advice would be great. Thanks!
namespace MrSteamyRayVaughn
{
public partial class MrSteamyRayVaughn : Form
{
public string[] paragraphWords { get; set; }
public string[] paragraphSentences { get; set; }
public MrSteamyRayVaughn()
{
InitializeComponent();
}
private void btnRun_Click(object sender, EventArgs e)
{
ParagraphSplitter();
UniqueWordCounter();
}
private void ParagraphSplitter()
{
string inputParagraph = richTextBox1.Text;
paragraphWords = inputParagraph.Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries);
paragraphSentences = inputParagraph.Split(new[] { "." }, StringSplitOptions.RemoveEmptyEntries);
}
private void UniqueWordCounter()
{
var uniqueWords = paragraphWords
.OrderBy(w => w)
.GroupBy(w => w, StringComparer.InvariantCultureIgnoreCase)
.Select(grp => new
{
Word = grp.Key,
Count = grp.Count()
}
);
foreach (var u in uniqueWords)
{
richTextBox1.AppendText($"{Environment.NewLine}Word {u.Word}: Count {u.Count}");
}
}
}
}

You can use the Where clause to filter out sentences where the sentence contains Any of the items in uniqueWords:
List<string> sentencesThatContainAUniqueWord = paragraphSentences
.Where(sentence => uniqueWords.Any(sentence.Contains))
.ToList();
Or, if you want to do a case-insensitive comparison, you can use the IndexOf method, which returns the index of the first match (or -1 if no matches were found):
List<string> sentencesThatContainAUniqueWord = sentences
.Where(sentence => words.Any(word =>
sentence.IndexOf(word, StringComparison.OrdinalIgnoreCase) > -1))
.ToList();

I'm not really sure to understand what you are trying, but if I do, I will go for a different approach.
The easiest, and obvious way to count the unique words will be just to add each work to a list if they aren't already included, and then count the elements in the list.
Yeah, it's less fancy but It will do the trick and if someone comes after you will understand right away what you were trying to do.

I am responding to this on my mobile, so apologies in advance on any formatting errors.
I think that if you just say
Dictionary<string,string> uWordSentenceDictionary = new Dictionary<string,string>();
foreach (string sentence in paragraphSentences) {
string[] sentenceWords = sentence.Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries);
List<string> matches = sentenceWords.Intersect(uniqueWords).ToList();
If(matches.Any()) {
foreach(string match in matches){
uWordSentenceDictionary.Add(match, sentence);
}
}
}
uWordSentenceDictionary.ForEach( x => {
Console.Writeline($"Unique word: {x.Key}, Sentence: {x.Value}");
});
That would allow you to separate it out nicely I would think. I saw others using contains which will also solve your problem. I'm offering this as an alternative, broken down approach.

Remove names that contain another in a list

I have a file with "Name|Number" in each line and I wish to remove the lines with names that contain another name in the list.
For example, if there is "PEDRO|3" , "PEDROFILHO|5" , "PEDROPHELIS|1" in the file, i wish to remove the lines "PEDROFILHO|5" , "PEDROPHELIS|1".
The list has 1.8 million lines, I made it like this but its too slow :
List<string> names = File.ReadAllLines("firstNames.txt").ToList();
List<string> result = File.ReadAllLines("firstNames.txt").ToList();
foreach (string name in names)
{
string tempName = name.Split('|')[0];
List<string> temp = names.Where(t => t.Contains(tempName)).ToList();
foreach (string str in temp)
{
if (str.Equals(name))
{
continue;
}
result.Remove(str);
}
}
File.WriteAllLines("result.txt",result);
Does anyone know a faster way? Or how to improve the speed?

Since you are looking for matches everywhere in the word, you will end up with O(n2) algorithm. You can improve implementation a bit to avoid string deletion inside a list, which is an O(n) operation in itself:
var toDelete = new HashSet<string>();
var names = File.ReadAllLines("firstNames.txt");
foreach (string name in names) {
var tempName = name.Split('|')[0];
toDelete.UnionWith(
// Length constraint removes self-matches
names.Where(t => t.Length > name.Length && t.Contains(tempName))
);
}
File.WriteAllLines("result.txt", names.Where(name => !toDelete.Contains(name)));

This works but I don't know if it's quicker. I haven't tested on millions of lines. Remove the tolower if the names are in the same case.
List<string> names = File.ReadAllLines(#"C:\Users\Rob\Desktop\File.txt").ToList();
var result = names.Where(w => !names.Any(a=> w.Split('|')[0].Length> a.Split('|')[0].Length && w.Split('|')[0].ToLower().Contains(a.Split('|')[0].ToLower())));
File.WriteAllLines(#"C:\Users\Rob\Desktop\result.txt", result);
test file had
Rob|1
Robbie|2
Bert|3
Robert|4
Jan|5
John|6
Janice|7
Carol|8
Carolyne|9
Geoff|10
Geoffrey|11
Result had
Rob|1
Bert|3
Jan|5
John|6
Carol|8
Geoff|10

C# dedupe List based on split

I'm having a hard time deduping a list based on a specific delimiter.
For example I have 4 strings like below:
apple|pear|fruit|basket
orange|mango|fruit|turtle
purple|red|black|green
hero|thor|ironman|hulk
In this example I should want my list to only have unique values in column 3, so it would result in an List that looks like this,
apple|pear|fruit|basket
purple|red|black|green
hero|thor|ironman|hulk
In the above example I would have gotten rid of line 2 because line 1 had the same result in column 3. Any help would be awesome, deduping is tough in C#.
how i'm testing this:
static void Main(string[] args)
{
BeginListSet = new List<string>();
startHashSet();
}
public static List<string> BeginListSet { get; set; }
public static void startHashSet()
{
string[] BeginFileLine = File.ReadAllLines(#"C:\testit.txt");
foreach (string begLine in BeginFileLine)
{
BeginListSet.Add(begLine);
}
}
public static IEnumerable<string> Dedupe(IEnumerable<string> list, char seperator, int keyIndex)
{
var hashset = new HashSet<string>();
foreach (string item in list)
{
var array = item.Split(seperator);
if (hashset.Add(array[keyIndex]))
yield return item;
}
}

Something like this should work for you
static IEnumerable<string> Dedupe(this IEnumerable<string> input, char seperator, int keyIndex)
{
var hashset = new HashSet<string>();
foreach (string item in input)
{
var array = item.Split(seperator);
if (hashset.Add(array[keyIndex]))
yield return item;
}
}
...
var list = new string[]
{
"apple|pear|fruit|basket",
"orange|mango|fruit|turtle",
"purple|red|black|green",
"hero|thor|ironman|hulk"
};
foreach (string item in list.Dedupe('|', 2))
Console.WriteLine(item);
Edit: In the linked question Distinct() with Lambda, Jon Skeet presents the idea in a much better fashion, in the form of a DistinctBy custom method. While similar, his is far more reusable than the idea presented here.
Using his method, you could write
var deduped = list.DistinctBy(item => item.Split('|')[2]);
And you could later reuse the same method to "dedupe" another list of objects of a different type by a key of possibly yet another type.

Try this:
var list = new string[]
{
"apple|pear|fruit|basket",
"orange|mango|fruit|turtle",
"purple|red|black|green",
"hero|thor|ironman|hulk "
};
var dedup = new List<string>();
var filtered = new List<string>();
foreach (var s in list)
{
var filter = s.Split('|')[2];
if (dedup.Contains(filter)) continue;
filtered.Add(s);
dedup.Add(filter);
}
// Console.WriteLine(filtered);

Can you use a HashSet instead? That will eliminate dupes automatically for you as they are added.

May be you can sort the words with delimited | on alphabetical order. Then store them onto grid (columns). Then when you try to insert, just check if there is column having a word which starting with this char.

If LINQ is an option, you can do something like this:
// assume strings is a collection of strings
List<string> list = strings.Select(a => a.Split('|')) // split each line by '|'
.GroupBy(a => a[2]) // group by third column
.Select(a => a.First()) // select first line from each group
.Select(a => string.Join("|", a))
.ToList(); // convert to list of strings
Edit (per Jeff Mercado's comment), this can be simplified further:
List<string> list =
strings.GroupBy(a => a.split('|')[2]) // group by third column
.Select(a => a.First()) // select first line from each group
.ToList(); // convert to list of strings

SQL Like search in LIST and LINQ in C#

I have a List which contains the list of words that needs to be excluded.
My approach is to have a List which contains these words and use Linq to search.
List<string> lstExcludeLibs = new List<string>() { "CONFIG", "BOARDSUPPORTPACKAGE", "COMMONINTERFACE", ".H", };
string strSearch = "BoardSupportPackageSamsung";
bool has = lstExcludeLibs.Any(cus => lstExcludeLibs.Contains(strSearch.ToUpper()));
I want to find out part of the string strSearch is present in the lstExcludedLibs.
It turns out that .any looks only for exact match. Is there any possibilities of using like or wildcard search
Is this possible in linq?
I could have achieved it using a foreach and contains but I wanted to use LINQ to make it simpler.
Edit: I tried List.Contains but it also doesn't seem to work

You've got it the wrong way round, it should be:-
List<string> lstExcludeLibs = new List<string>() { "CONFIG", "BOARDSUPPORTPACKAGE", "COMMONINTERFACE", ".H", };
string strSearch = "BoardSupportPackageSamsung";
bool has = lstExcludeLibs.Any(cus => strSearch.ToUpper().Contains(cus));
Btw - this is just an observation but, IMHO, your variable name prefixes 'lst' and 'str' should be ommitted. This is a mis-interpretation of Hungarian notation and is redundant.

I think the line should be:
bool has = lstExcludeLibs.Any(cus => cus.Contains(strSearch.ToUpper()));

Is this useful to you ?
bool has = lstExcludeLibs.Any(cus => strSearch.ToUpper().Contains(cus));
OR
bool has = lstExcludeLibs.Where(cus => strSearch.ToUpper().IndexOf(cus) > -1).Count() > 0;
OR
bool has = lstExcludeLibs.Count(cus => strSearch.ToUpper().IndexOf(cus) > -1) > 0;

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Comparing two strings with different orders - c#

Try this instead:: if(stringWithPipes.Any(P => P.split('|') .All(K => "beginning|middle2|middle|end".split('|') .contains(K))) Hope this will help !!

Related

Is It possible to find out what are the common part in String List

How would I return the number of a sentence(s) that contains a word?

Remove names that contain another in a list

C# dedupe List based on split

SQL Like search in LIST and LINQ in C#

Categories

Resources