How do I remove duplicates from an Excel range in C#?

I've read the cells in my Excel range as strings and split each one on commas to build a list of strings. I'm starting to think I have not actually separated the items and that each cell's contents are still one whole string. I'm trying to figure out how to do this properly so that each item (e.g. the_red_bucket_01) is its own string.
Example of the original strings in cells 1 and 2:
Cell1 :
the_red_bucket_01, the_blue_duck_01,_the green_banana_02, the orange_bear_01
Cell2 :
the_purple_chair_01, the_blue_coyote_01,_the green_banana_02, the orange_bear_01
The new list looks like this, though I'm not sure they are separate items:
the_red_bucket_01
the_blue_duck_01
the green_banana_02
the orange_bear_01
the_red_chair_01
the_blue_coyote_01
the green_banana_02
the orange_bear_01
Now I want to remove duplicates so that the console shows only one of each item, no matter how many copies there are. I can't seem to get my foreach/if statements to work. It prints multiple copies of the items, I assume because it iterates once for each item in the list and so returns the data that many times.
foreach (Excel.Range item in xlRng)
{
string itemString = (string)item.Text;
List<String> fn = new List<String>(itemString.Split(','));
List<string> newList = new List<string>();
foreach (string s in fn)
if (!newList.Contains(s))
{
newList.Add(s);
}
foreach (string combo in newList)
{
Console.Write(combo);
}
}

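You probably need to trim the strings: they have leading whitespace, so "string1" is different from " string1". Add the trimmed value as well, otherwise the untrimmed copies still end up in the list.
foreach (string s in fn)
{
string trimmed = s.Trim();
if (!newList.Contains(trimmed))
{
newList.Add(trimmed);
}
}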

You can do this much more simply with LINQ by using Distinct, which the documentation describes as follows:
"Returns distinct elements from a sequence by using the default equality comparer to compare values."
foreach (Excel.Range item in xlRng)
{
string itemString = (string)item.Text;
List<String> fn = new List<String>(itemString.Split(','));
foreach (string combo in fn.Distinct())
{
Console.Write(combo);
}
}
As mentioned in another answer, you may also need to Trim any whitespace, in which case you would do:
fn.Select(x => x.Trim()).Distinct()

When you need to store keys and values, it's better to use a Dictionary. Try changing the code from List<T> to Dictionary<TKey, TValue>, i.e.
From:
List<string> newList = new List<string>();
foreach (string s in fn)
if (!newList.Contains(s))
{
newList.Add(s);
}
to
Dictionary<string, string> newList = new Dictionary<string, string>();
foreach (string s in fn)
if (!newList.ContainsKey(s))
{
newList.Add(s, s);
}

If you are only concerned about distinct items within each cell while you are reading, just use the Distinct operator, like fn.Distinct().
For processing the whole data set, I can suggest two methods:
1. Read in all of the data, then use LINQ's Distinct operator.
2. Use a set data structure (such as HashSet<string>) and store each element in it while reading the Excel range.
I suggest that you take a look at the LINQ documentation if you are processing data. It has really great extensions. For even more methods, you can check out the MoreLINQ package.

I think your code would probably work as you expect if you moved newList out of the loop. You create a new variable named newList on each iteration, so it can't find duplicates from earlier iterations.
You can do all of this more concisely with LINQ:
//set up some similar data
string list1 = "a,b,c,d,a,f";
string list2 = "a,b,c,d,a,f";
List<string> lists = new List<string> {list1,list2};
// find unique items
var result = lists.SelectMany(i=>i.Split(',')).Distinct().ToList();
SelectMany() "flattens" the list of lists into a list.
Distinct() removes duplicates.
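If you'd rather keep the original loop structure, here is a minimal sketch of the "move newList out of the loop" fix. It assumes the cell texts have already been read into plain strings (the cellTexts parameter and the HoistedListDemo/CollectUnique names are made up for this example, standing in for the Excel range), and it also trims each item as suggested in the other answers.

```csharp
using System;
using System.Collections.Generic;

public static class HoistedListDemo
{
    // cellTexts stands in for the text of each Excel cell (an assumption for this sketch).
    public static List<string> CollectUnique(IEnumerable<string> cellTexts)
    {
        // Declared once, outside the loop, so it remembers items from earlier cells.
        var newList = new List<string>();
        foreach (string cellText in cellTexts)
        {
            foreach (string s in cellText.Split(','))
            {
                string trimmed = s.Trim();
                if (!newList.Contains(trimmed))
                {
                    newList.Add(trimmed);
                }
            }
        }
        return newList;
    }

    public static void Main()
    {
        var cells = new[] { "a, b, c", "b, c, d" };
        foreach (string item in CollectUnique(cells))
        {
            Console.WriteLine(item); // prints a, b, c, d on separate lines
        }
    }
}
```

For large inputs a HashSet<string> would avoid the repeated linear Contains scans, but the List version stays closest to the code in the question.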

var uniqueItems = new HashSet<string>();
foreach (Excel.Range cell in xlRng)
{
var cellText = (string)cell.Text;
foreach (var item in cellText.Split(',').Select(s => s.Trim()))
{
uniqueItems.Add(item);
}
}
foreach (var item in uniqueItems)
{
Console.WriteLine(item);
}

Related

Create a 2-column List using a variable for list name

Because the original post (Create List with name from variable) was so old, I didn't want to approach this as an answer.
But, I wanted to add this use of the above solution because it was non-obvious to me. And, it may help some of my fellow noobs... Also, I ran into some issues I don't know how to address.
I needed a way to create a list using a variable name, in this case "mstrClock", for timing diagrams.
I was not able to get .NET to accept a two-column list, though, so I ended up with two dictionaries.
Is there a way to structure this so that I can use a single dictionary for both columns?
dictD.Add("mstrClock", new List<double>());
dictL.Add("mstrClock", new List<string>());
Then as I develop the timing diagram, I add to the lists as follows:
dictD["mstrClock"].Add(x); // This value will normally be the time value.
dictL["mstrClock"].Add("L"); // This value will be the "L", "F" or "H" logic level
Then to get at the data I did this:
for (int n = 0; n < dictD["mstrClock"].Count; n++)
{
listBox1.Items.Add(dictL["mstrClock"][n] + "\t" + dictD["mstrClock"][n].ToString());
}

Why not just store what you want to display, in the dictionary?
dict.Add("mstrClock", new List<string>());
dict["mstrClock"].Add($"L\t{x}");
for (int n = 0; n < dict["mstrClock"].Count; n++)
{
listBox1.Items.Add(dict["mstrClock"][n]);
}
On another point, do you even need a dictionary? What is the point of having a dictionary with one key? If you only need a List<string>, then only create that.
var items = new List<string>();
items.Add($"L\t{x}");
foreach (var item in items)
{
listBox1.Items.Add(item);
}

You can use Tuples in modern C# to create your two-column list as follows:
var list = new List<(double time, string logicLevel)>();
list.Add((1, "L"));
list.Add((2, "F"));
foreach (var element in list)
{
listBox1.Items.Add($"{element.time} \t {element.logicLevel}");
}
If using a dictionary is a must, you can change the above code to something like:
var dict = new Dictionary<string, List<(double time, string logicLevel)>>();
dict["mstrClock"] = new List<(double time, string logicLevel)>();
dict["mstrClock"].Add((1, "L"));
dict["mstrClock"].Add((2, "F"));
var list = dict["mstrClock"];
foreach (var element in list)
{
listBox1.Items.Add($"{element.time} \t {element.logicLevel}");
}

One approach to creating a 2-column list would be to create a list of key/value pairs:
var list = new List<KeyValuePair<double, string>>();
list.Add(new KeyValuePair<double, string>(1, "L"));
foreach (KeyValuePair<double, string> element in list)
{
listBox1.Items.Add($"{element.Key} \t {element.Value}");
}

C# - Duplicates in List of string Lists instead of proper values

I'm reading from an XML file using foreach (as shown below) and writing the info I find into a List, which is later added to a list of lists. My problem is that when the foreach loop adds another element to my list of lists, it somehow erases the content of the previous elements and instead adds x copies of the same list. E.g. the first iteration is OK, on the second it erases the first element and adds 2 of the same, on the third it adds 3 identical lists, etc.
It might be a simple problem, but I really cannot think of a solution at the moment.
Code:
static List<List<string>> AddPapers(XmlNodeList nodelist)
{
var papers = new List<List<string>>();
var paper = new List<string>();
foreach (XmlNode node in nodelist)
{
paper.Clear();
for (int i = 0; i < node.ChildNodes.Count; i++)
{
paper.Add(node.ChildNodes[i].InnerText);
}
papers.Add(paper);
}
return papers;
}
More info: This is a simplified version without all the fancy stuff I'd do with the XML, but the problem is the same.
The paper list is correct every time I check, so the problem should be with adding to papers. I honestly have no idea why or how it can erase the contents of papers and add the same values on its own.

The problem is that you're only calling paper.Clear, which clears the list that you just added, but then you re-populate it with new items and add it again.
Instead, you should create a new instance of the list on each iteration, so you're not always modifying the same list over and over again (remember a List<T> is a reference type, so you're only adding a reference to the list).
For example:
static List<List<string>> AddPapers(XmlNodeList nodelist)
{
var papers = new List<List<string>>();
foreach (XmlNode node in nodelist)
{
// Create a new list on each iteration
var paper = new List<string>();
for (int i = 0; i < node.ChildNodes.Count; i++)
{
paper.Add(node.ChildNodes[i].InnerText);
}
papers.Add(paper);
}
return papers;
}
Also, using System.Linq extension methods, your code can be reduced to:
static List<List<string>> GetChildrenInnerTexts(XmlNodeList nodes)
{
return nodes.Cast<XmlNode>()
.Select(node => node.ChildNodes.Cast<XmlNode>()
.Select(child => child.InnerText)
.ToList())
.ToList();
}

The issue is with references. You need to create a new 'paper' list instead of clearing the existing one.
Inside your first foreach loop, change
paper.Clear()
to
paper = new List<string>();
When you clear the object and add it again, every index of papers holds a reference to the same list object.
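To see the reference behaviour in isolation, here is a small sketch without any XML. The names SharedReferenceDemo, BuildWithClear and BuildWithNew are invented for this example; the input is just an array of string arrays standing in for the child nodes.

```csharp
using System;
using System.Collections.Generic;

public static class SharedReferenceDemo
{
    // Reuses one list and clears it: every entry in the result is the same object.
    public static List<List<string>> BuildWithClear(string[][] rows)
    {
        var result = new List<List<string>>();
        var row = new List<string>();
        foreach (string[] values in rows)
        {
            row.Clear();
            row.AddRange(values);
            result.Add(row); // adds another reference to the same list
        }
        return result;
    }

    // Creates a fresh list per iteration, so each entry keeps its own contents.
    public static List<List<string>> BuildWithNew(string[][] rows)
    {
        var result = new List<List<string>>();
        foreach (string[] values in rows)
        {
            result.Add(new List<string>(values));
        }
        return result;
    }

    public static void Main()
    {
        var rows = new[] { new[] { "a", "b" }, new[] { "c", "d" } };
        var shared = BuildWithClear(rows);
        var separate = BuildWithNew(rows);

        Console.WriteLine(ReferenceEquals(shared[0], shared[1]));     // True
        Console.WriteLine(string.Join(",", shared[0]));               // c,d
        Console.WriteLine(ReferenceEquals(separate[0], separate[1])); // False
        Console.WriteLine(string.Join(",", separate[0]));             // a,b
    }
}
```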

Find out if string list items startswith another item from another list

I'd like to loop over a string list, and find out if the items from this list start with one of the item from another list.
So I have something like:
List<string> firstList = new List<string>();
firstList.Add("txt random");
firstList.Add("text ok");
List<string> keyWords = new List<string>();
keyWords.Add("txt");
keyWords.Add("Text");

You can do that using a couple of simple foreach loops.
foreach (var t in firstList) {
foreach (var u in keyWords) {
if (t.StartsWith(u)) {
// Do something here.
}
}
}

If you just want a list and you'd rather not use query expressions (I don't like them myself; they just don't look like real code to me)
var matches = firstList.Where(fl => keyWords.Any(kw => fl.StartsWith(kw)));

from item in firstList
from word in keyWords
where item.StartsWith(word)
select item

Try this one; it works fine.
var result = firstList.Where(x => keyWords.Any(y => x.StartsWith(y)));
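One thing to watch with the sample data in the question: the single-argument StartsWith is case-sensitive, so the keyword "Text" does not match "text ok". If a case-insensitive match is what you want (that is an assumption about the intent, the question doesn't say), a sketch using the StringComparison overload could look like this. PrefixMatchDemo and MatchIgnoringCase are names made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PrefixMatchDemo
{
    // Returns the items that start with any keyword, ignoring case.
    public static List<string> MatchIgnoringCase(IEnumerable<string> items, IEnumerable<string> keyWords)
    {
        // Materialize the keywords once so they are not re-enumerated per item.
        var words = keyWords.ToList();
        return items
            .Where(item => words.Any(word => item.StartsWith(word, StringComparison.OrdinalIgnoreCase)))
            .ToList();
    }

    public static void Main()
    {
        var firstList = new List<string> { "txt random", "text ok" };
        var keyWords = new List<string> { "txt", "Text" };

        foreach (string match in MatchIgnoringCase(firstList, keyWords))
        {
            Console.WriteLine(match); // prints "txt random" then "text ok"
        }
    }
}
```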

C# dedupe List based on split

I'm having a hard time deduping a list based on a specific delimiter.
For example I have 4 strings like below:
apple|pear|fruit|basket
orange|mango|fruit|turtle
purple|red|black|green
hero|thor|ironman|hulk
In this example I want my list to have only unique values in column 3, so it would result in a List that looks like this:
apple|pear|fruit|basket
purple|red|black|green
hero|thor|ironman|hulk
In the above example I would have gotten rid of line 2 because line 1 had the same result in column 3. Any help would be awesome, deduping is tough in C#.
How I'm testing this:
static void Main(string[] args)
{
BeginListSet = new List<string>();
startHashSet();
}
public static List<string> BeginListSet { get; set; }
public static void startHashSet()
{
string[] BeginFileLine = File.ReadAllLines(@"C:\testit.txt");
foreach (string begLine in BeginFileLine)
{
BeginListSet.Add(begLine);
}
}
public static IEnumerable<string> Dedupe(IEnumerable<string> list, char seperator, int keyIndex)
{
var hashset = new HashSet<string>();
foreach (string item in list)
{
var array = item.Split(seperator);
if (hashset.Add(array[keyIndex]))
yield return item;
}
}

Something like this should work for you
static IEnumerable<string> Dedupe(this IEnumerable<string> input, char seperator, int keyIndex)
{
var hashset = new HashSet<string>();
foreach (string item in input)
{
var array = item.Split(seperator);
if (hashset.Add(array[keyIndex]))
yield return item;
}
}
...
var list = new string[]
{
"apple|pear|fruit|basket",
"orange|mango|fruit|turtle",
"purple|red|black|green",
"hero|thor|ironman|hulk"
};
foreach (string item in list.Dedupe('|', 2))
Console.WriteLine(item);
Edit: In the linked question Distinct() with Lambda, Jon Skeet presents the idea in a much better fashion, in the form of a DistinctBy custom method. While similar, his is far more reusable than the idea presented here.
Using his method, you could write
var deduped = list.DistinctBy(item => item.Split('|')[2]);
And you could later reuse the same method to "dedupe" another list of objects of a different type by a key of possibly yet another type.
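For reference, here is a sketch of how such a key-based distinct helper is commonly written. It is not necessarily the implementation from the linked answer. The method is named DistinctByKey here (a made-up name) so it doesn't clash with the built-in Enumerable.DistinctBy that ships with .NET 6 and later; on those versions you can call the built-in one directly.

```csharp
using System;
using System.Collections.Generic;

public static class DedupeExtensions
{
    // Hypothetical helper: yields the first element seen for each distinct key.
    public static IEnumerable<TSource> DistinctByKey<TSource, TKey>(
        this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        var seenKeys = new HashSet<TKey>();
        foreach (TSource element in source)
        {
            // HashSet.Add returns false when the key is already present.
            if (seenKeys.Add(keySelector(element)))
            {
                yield return element;
            }
        }
    }
}

public static class DistinctByDemo
{
    public static void Main()
    {
        var list = new[]
        {
            "apple|pear|fruit|basket",
            "orange|mango|fruit|turtle",
            "purple|red|black|green",
            "hero|thor|ironman|hulk"
        };

        // Keeps the first line for each distinct value in column 3 (index 2).
        foreach (string item in list.DistinctByKey(line => line.Split('|')[2]))
        {
            Console.WriteLine(item);
        }
    }
}
```

Because the key selector is generic, the same helper works for any element type and any key type, which is the reusability point made above.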

Try this:
var list = new string[]
{
"apple|pear|fruit|basket",
"orange|mango|fruit|turtle",
"purple|red|black|green",
"hero|thor|ironman|hulk "
};
var dedup = new List<string>();
var filtered = new List<string>();
foreach (var s in list)
{
var filter = s.Split('|')[2];
if (dedup.Contains(filter)) continue;
filtered.Add(s);
dedup.Add(filter);
}
// Console.WriteLine(filtered);

Can you use a HashSet instead? That will eliminate dupes automatically for you as they are added.

Maybe you can sort the '|'-delimited words in alphabetical order and then store them in a grid (columns). When you try to insert, just check whether a column already has a word starting with the same character.

If LINQ is an option, you can do something like this:
// assume strings is a collection of strings
List<string> list = strings.Select(a => a.Split('|')) // split each line by '|'
.GroupBy(a => a[2]) // group by third column
.Select(a => a.First()) // select first line from each group
.Select(a => string.Join("|", a))
.ToList(); // convert to list of strings
Edit (per Jeff Mercado's comment), this can be simplified further:
List<string> list =
strings.GroupBy(a => a.Split('|')[2]) // group by third column
.Select(a => a.First()) // select first line from each group
.ToList(); // convert to list of strings

how to dissect string values

How can I dissect or retrieve string values?
Here's the sample code that I'm working on now:
private void SplitStrings()
{
List<string> listvalues = new List<string>();
listvalues = (List<string>)Session["mylist"];
string[] strvalues = listvalues.ToArray();
for (int x = 0; x < strvalues.Length; x++)
{
}
}
Now that I am able to retrieve the list values from my session, how can I get each value in the list separately using a foreach or for statement?
What I want is to programmatically split the string values, however many there are in the list.

If you have a list of string values, you can do the following:
private void SplitStrings()
{
List<string> listValues = (List<string>) Session["mylist"];
// always check session values for null
if(listValues != null)
{
// go through each list item
foreach(string stringElement in listValues)
{
// do something with variable 'stringElement'
System.Console.WriteLine(stringElement);
}
}
}
Note that I test the result of casting the session and that I don't create a new list first-off, which is not necessary. Also note that I don't convert to an array, simply because looping a list is actually easier, or just as easy, as looping an array.
Note that you named your method SplitStrings, but we're not splitting anything. Did you mean to split something like "one;two;three;four" in a four-element list, based on the separator character?
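If splitting is indeed what you meant, a minimal sketch might look like the following. It assumes ';' is the separator and that each list entry can hold several values (both are guesses, since the question doesn't show the data). SplitDemo and SplitAll are hypothetical names, and a plain List<string> stands in for the session value.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SplitDemo
{
    // Splits every string on the separator and flattens the parts into one list.
    public static List<string> SplitAll(IEnumerable<string> values, char separator)
    {
        return values
            .SelectMany(value => value.Split(separator))
            .Select(part => part.Trim())
            .Where(part => part.Length > 0) // drop empty pieces such as a trailing separator
            .ToList();
    }

    public static void Main()
    {
        // Stand-in for (List<string>)Session["mylist"].
        var listValues = new List<string> { "one;two;three;four", "five; six" };

        foreach (string part in SplitAll(listValues, ';'))
        {
            Console.WriteLine(part); // one, two, three, four, five, six on separate lines
        }
    }
}
```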

I'm not sure what you're trying to obtain in this code, I don't know why you're converting your List to an Array.
You can loop through your listValues collection with a foreach block:
foreach(string value in listValues)
{
//do something with value, I.e.
Response.Write(value);
}

I don't know what's in the strings but you can start by simplifying. There is no point allocating a new List if you're going to overwrite it immediately.
private void SplitStrings()
{
List<string> list = (List<string>)Session["mylist"];
foreach(string value in list)
{
}
}

List<string> listvalues = (List<string>)Session["mylist"];
foreach (string s in listvalues)
{
//do what you want with s here
}
