Get Duplicates in List and Edit Item - c#

I need to get any items in a list that already exist and edit BOTH items to include a duplicate property.
I've attempted to get duplicates by doing:
var duplicates = LIST.Select((t, i) => new { Index = i, Text = t }).GroupBy(g => g.Text.PROPERTYTOSEARCGBY).Where(g => g.Count() > 1);
However, this returns the duplicated property value, whereas I need the index of each duplicated record so that I can edit them using:
LIST[index1].FlaggedData = true;
LIST[index2].FlaggedData = true;
etc...
How can I get the index of BOTH duplicate or multiple records?

The variable duplicates is a sequence of IGrouping objects that can be iterated. Each element within a grouping is the anonymous type you defined, which has two properties: Index and Text.
foreach (var grouping in duplicates)
{
// This will contain the value that was grouped by:
// - grouping.Key
foreach (var pair in grouping)
{
// These properties are available
// - pair.Index
// - pair.Text
// set the FlaggedData property
pair.Text.FlaggedData = true;
}
}

You already have the correct LINQ query for what you want; it seems the code that updates the duplicate flag is what is incorrect. You can use code like the following to update the flag:
foreach (var group in duplicates) {
foreach (var item in group) {
LIST[item.Index].FlaggedData = true;
}
}
or, more concisely:
foreach (var item in duplicates.SelectMany(item => item))
item.Text.FlaggedData = true;

Use SelectMany to flatten the groups into a single sequence of indexes:
var duplicates = LIST.Select((t, i) => new { Index = i, Text = t })
.GroupBy(g => g.Text.PROPERTYTOSEARCGBY)
.Where(g => g.Count() > 1)
.SelectMany(g => g, (g, x) => x.Index);

You don't need the index.
It's enough to group as you did and select the items themselves.
var duplicates = LIST.GroupBy(t => t.PROPERTYTOSEARCGBY)
.Where(g => g.Count() > 1)
.SelectMany(t=>t);
foreach (var item in duplicates)
{
item.FlaggedData = true;
}
Or even shorter:
LIST.GroupBy(t => t.PROPERTYTOSEARCGBY)
.Where(g => g.Count() > 1)
.SelectMany(t=>t)
.ToList()
.ForEach(t=>t.FlaggedData = true);
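For reference, here is a minimal, self-contained sketch of the approach the answers above describe. The `Record` class and its `Name` property are hypothetical stand-ins for the list's element type and for PROPERTYTOSEARCGBY. It also assumes the list holds reference types, so setting the flag through the index changes the object stored in the list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type standing in for the items of LIST.
public class Record
{
    public string Name { get; set; }        // stands in for PROPERTYTOSEARCGBY
    public bool FlaggedData { get; set; }
}

public static class DuplicateFlagDemo
{
    // Flags every record whose Name occurs more than once and
    // returns the indexes of the flagged records in list order.
    public static List<int> FlagDuplicates(List<Record> list)
    {
        var indexes = list
            .Select((item, index) => new { Item = item, Index = index })
            .GroupBy(x => x.Item.Name)
            .Where(g => g.Count() > 1)
            .SelectMany(g => g)
            .Select(x => x.Index)
            .OrderBy(i => i)
            .ToList();

        foreach (var i in indexes)
        {
            list[i].FlaggedData = true;
        }
        return indexes;
    }

    public static void Main()
    {
        var list = new List<Record>
        {
            new Record { Name = "a" },
            new Record { Name = "b" },
            new Record { Name = "a" },
        };
        Console.WriteLine(string.Join(",", FlagDuplicates(list))); // prints: 0,2
    }
}
```

Returning the indexes is optional; if only the flag matters, the shorter index-free variants above do the same job.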

Related

How do I pick out values between a duplicate value in a collection?

I have a method that returns a collection that has a duplicate value.
static List<string> GenerateItems()
{
var _items = new List<string>();
_items.Add("Tase");
_items.Add("Ray");
_items.Add("Jay");
_items.Add("Bay");
_items.Add("Tase");
_items.Add("Man");
_items.Add("Ran");
_items.Add("Ban");
return _items;
}
I want to search through that collection and find the first place that duplicate value is located and start collecting all the values from the first appearance of the duplicate value to its next appearance. I want to put this in a collection but I only want the duplicate value to appear once in that collection.
This is what I have so far:
static void Main(string[] args)
{
string key = "Tase";
var collection = GenerateItems();
int index = collection.FindIndex(a => a == key);
var matchFound = false;
var itemsBetweenKey = new List<string>();
foreach (var item in collection)
{
if (item == key)
{
matchFound = !matchFound;
}
if (matchFound)
{
itemsBetweenKey.Add(item);
}
}
foreach (var item in itemsBetweenKey)
{
Console.WriteLine(item);
}
Console.ReadLine();
}
There must be an easier way of doing this. Perhaps with Indexing or a LINQ query?
You can do something like this:
string key = "Tase";
var collection = GenerateItems();
int indexStart = collection.FindIndex(a => a == key);
int indexEnd = collection.FindIndex(indexStart+1, a => a == key);
var result = collection.GetRange(indexStart, indexEnd-indexStart);
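The snippet above assumes the key occurs at least twice. FindIndex returns -1 when nothing matches, and GetRange would then throw. A hedged sketch that handles those cases follows; the method name `ItemsBetween` and the choice to return an empty list are assumptions of this sketch, not part of the answer.

```csharp
using System;
using System.Collections.Generic;

public static class RangeBetweenDemo
{
    // Returns the items from the first occurrence of key up to, but not
    // including, its second occurrence. Returns an empty list when the
    // key occurs fewer than two times (an assumption of this sketch).
    public static List<string> ItemsBetween(List<string> collection, string key)
    {
        int start = collection.FindIndex(a => a == key);
        if (start < 0)
        {
            return new List<string>();
        }

        int end = collection.FindIndex(start + 1, a => a == key);
        if (end < 0)
        {
            return new List<string>();
        }

        return collection.GetRange(start, end - start);
    }

    public static void Main()
    {
        var items = new List<string> { "Tase", "Ray", "Jay", "Bay", "Tase", "Man", "Ran", "Ban" };
        Console.WriteLine(string.Join(", ", ItemsBetween(items, "Tase")));
        // prints: Tase, Ray, Jay, Bay
    }
}
```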
You can use LINQ Select and GroupBy to find the first and last index of every duplicate. (Keep in mind that if a value appears in the list more than twice, the middle occurrences are ignored.)
But I personally think the LINQ for this seems overcomplicated. I would stick with simple for loops and if statements (just turn it into a method so it reads better).
Here is a solution with LINQ that gets every duplicate and all values between its occurrences, including the duplicate itself once, as you mentioned.
var collection = GenerateItems();
var Duplicates = collection.Select((x,index) => new { index, value = x })
.GroupBy(x => x.value)//group by the strings
.Where(x => x.Count() > 1)//only take duplicates
.Select(x=>new {
Value = x.Key,
FirstIndex = x.Min(y => y.index),//take first occurrence
LastIndex = x.Max(y => y.index)//take last occurrence
}).ToList();
var resultlist = new List<List<string>>();
foreach (var duplicaterange in Duplicates)
resultlist.Add(collection.GetRange(duplicaterange.FirstIndex, duplicaterange.LastIndex - duplicaterange.FirstIndex));
Try this function
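public List<string> PickOut(List<string> collection, string key)
{
var index = 0;
foreach (var item in collection)
{
if (item == key)
{
// Keep the key once, then take everything up to its next occurrence.
// Skipping only to index would make TakeWhile stop at once on the key itself.
return new[] { key }
.Concat(collection.Skip(index + 1).TakeWhile(x => x != key))
.ToList();
}
index++;
}
return null;
}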
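First find the duplicated key, then find its first and second occurrences, and take the items between them.
var firstduplicate = collection.GroupBy(x => x)
.Where(g => g.Count() > 1)
.Select(g => g.Key).First();
var first = collection.IndexOf(firstduplicate);
// index of the second occurrence, or 0 if there is none
var second = collection.Select((b, i) => b == firstduplicate ? i : -1).Where(i => i != -1).Skip(1).FirstOrDefault();
if (second > 0)
{
// Skip(first) so the result starts at the first occurrence rather than at the start of the list
var result = collection.Skip(first).Take(second - first).ToList();
}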

Count duplicate items in a List<string> and save them in different files (Or Lists/arrays/strings)

I need to count the duplicate values in a List and save them in different files, with each file name containing the email provider and the number of duplicates.
The list always changes and has different values, but it can look like this:
List<string> email_domains = new List<string>()
{
"gmail.com",
"gmail.com",
"outlook.com",
"outlook.com",
"outlook.com",
"outlook.com",
"ineria.pl",
"mail.ru"
};
The result I want to get is something like this:
gmail.com [2]
outlook.com [4]
var email_domains = new List<string>()
{
"gmail.com",
"gmail.com",
"outlook.com",
"outlook.com",
"outlook.com",
"outlook.com",
"ineria.pl",
"mail.ru"
};
var results = email_domains.GroupBy(x => x);
foreach (var domain in results)
{
Console.WriteLine("{0} [{1}]", domain.Key, domain.Count());
}
Instead of Console.WriteLine() you can write to a file.
If you only want items that have at least one duplicate, add an additional condition:
foreach (var domain in email_domains.GroupBy(x => x).Where(x => x.Count() > 1))
{
Console.WriteLine("{0} [{1}]", domain.Key, domain.Count());
}
var result = email_domains.GroupBy(_ => _)
.Select(g => new { Domain = g.Key, Count = g.Count() })
.Where(_ => _.Count > 1);
Instead of an anonymous type you could also select into a Dictionary<string, int>:
var result = email_domains.GroupBy(_ => _)
.Where(g => g.Count() > 1)
.ToDictionary(g => g.Key, g => g.Count());
You can try this using Linq to Objects:
var query = from item in email_domains
group item by item into g
where g.Count() > 1
select new { email = g.Key, count = g.Count() };
foreach ( var item in query )
File.WriteAllText($"c:\\{item.email} ({item.count}).txt", item.email);
The query selects the items grouped by email that occur more than once.
Then we save the result to files.
You can replace the second argument (item.email) with whatever content you want, or use File.WriteAllLines instead of File.WriteAllText if you have multiple lines.
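Putting the pieces above together, here is a sketch that writes one file per duplicated domain. The output directory, the `domain [count].txt` naming, and the helper name `WriteDuplicateFiles` are assumptions made for illustration; the file content is simply the domain repeated once per occurrence.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class DomainFileDemo
{
    // Writes one file per domain that occurs more than once and
    // returns the paths of the files it created.
    public static List<string> WriteDuplicateFiles(IEnumerable<string> domains, string directory)
    {
        Directory.CreateDirectory(directory);
        var paths = new List<string>();

        foreach (var group in domains.GroupBy(d => d).Where(g => g.Count() > 1))
        {
            // File name such as "gmail.com [2].txt"
            var path = Path.Combine(directory, $"{group.Key} [{group.Count()}].txt");
            File.WriteAllLines(path, group); // one line per occurrence
            paths.Add(path);
        }
        return paths;
    }

    public static void Main()
    {
        var emailDomains = new List<string>
        {
            "gmail.com", "gmail.com",
            "outlook.com", "outlook.com", "outlook.com", "outlook.com",
            "ineria.pl", "mail.ru"
        };
        // Hypothetical output location under the temp directory.
        var directory = Path.Combine(Path.GetTempPath(), "domain-counts");
        foreach (var path in WriteDuplicateFiles(emailDomains, directory))
        {
            Console.WriteLine(path);
        }
    }
}
```

Path.Combine avoids hard-coding a drive root such as c:\, which often is not writable without elevated permissions.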

Groupby and selectMany and orderby doesn't bring back the data I need

I have two lists, rows1 and rows2. This is the data for rows1:
and the data for rows2:
I concatenate these two lists into one:
var rows = rows1.Concat(rows2).ToList();
The result would be this:
Then I want to group by a few fields, order by other fields, and make some changes to some of the data. This is my code:
var results = rows.GroupBy(row => new { row.FromBayPanel, row.TagNo })
.SelectMany(g => g.OrderBy(row => row.RowNo)
.Select((x, i) =>
new
{
TagGroup = x.TagGroup,
RowNo = (i == 0) ? (j++).ToString() : "",
TagNo = (i == 0) ? x.TagNo.ToString() : "",
FromBayPanel = x.FromBayPanel,
totalItem = x.totalItem
}).ToList());
which brings me back this result:
This is not what I want; I want to have this result. I want all data with the same "FromBayPanel" to be listed together.
Which part of my code is wrong?
I think when you want to order the elements within your group you have to use a different approach as SelectMany will simply flatten your grouped items into one single list. Thus instead of rows.GroupBy(row => new { row.FromBayPanel, row.TagNo }).SelectMany(g => g.OrderBy(row => row.RowNo) you may use this:
rows.OrderBy(x => x.FromBayPanel).ThenBy(x => x.TagNo) // this preserves the actual group-condition
.ThenBy(x => x.RowNo) // here you order the items within every group by RowNo
.GroupBy(row => new { row.FromBayPanel, row.TagNo })
.Select(...)
EDIT: You have to make your select WITHIN every group, not afterwards:
rows.GroupBy(row => new { row.FromBayPanel, row.TagNo })
.ToDictionary(x => x.Key,
x => x.OrderBy(y => y.RowNo)
.Select((y, i) =>
new
{
TagGroup = y.TagGroup,
RowNo = (i == 0) ? (j++).ToString() : "",
TagNo = (i == 0) ? y.TagNo.ToString() : "",
FromBayPanel = y.FromBayPanel,
totalItem = y.totalItem
})
)
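As a rough illustration of ordering first and grouping afterwards, the sketch below sorts by FromBayPanel, TagNo and RowNo and only then groups, relying on the fact that LINQ to Objects GroupBy yields groups in order of first key appearance and keeps elements in source order. The `Row` class, the sample values, and the pipe-separated output format are all assumptions, since the question's real data is not shown.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type; the real one is not shown in the question.
public class Row
{
    public string FromBayPanel { get; set; }
    public int TagNo { get; set; }
    public int RowNo { get; set; }
}

public static class GroupOrderDemo
{
    // Returns one "groupNo|TagNo|FromBayPanel" line per row; group number
    // and TagNo are filled in only for the first row of each group.
    public static List<string> Describe(IEnumerable<Row> rows)
    {
        int j = 1; // running group number, like j in the question
        return rows
            .OrderBy(r => r.FromBayPanel)
            .ThenBy(r => r.TagNo)
            .ThenBy(r => r.RowNo)
            .GroupBy(r => new { r.FromBayPanel, r.TagNo })
            .SelectMany(g => g.Select((r, i) => string.Format(
                "{0}|{1}|{2}",
                i == 0 ? (j++).ToString() : "",
                i == 0 ? r.TagNo.ToString() : "",
                r.FromBayPanel)))
            .ToList();
    }

    public static void Main()
    {
        var rows = new List<Row>
        {
            new Row { FromBayPanel = "B", TagNo = 2, RowNo = 5 },
            new Row { FromBayPanel = "A", TagNo = 1, RowNo = 3 },
            new Row { FromBayPanel = "B", TagNo = 2, RowNo = 4 },
            new Row { FromBayPanel = "A", TagNo = 1, RowNo = 1 },
            new Row { FromBayPanel = "A", TagNo = 3, RowNo = 2 },
        };
        foreach (var line in Describe(rows))
        {
            Console.WriteLine(line);
        }
    }
}
```

With this sample data, all rows for panel A come out before the rows for panel B, and each (FromBayPanel, TagNo) group is numbered once.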

What is the most elegant way to find index of duplicate items in C# List

I've got a List<string> that contains duplicates and I need to find the indexes of each.
What is the most elegant, efficient way other than looping through all the items? I'm on .NET 4.0 so LINQ is an option. I've done tons of searching and cannot find anything.
Sample data:
var data = new List<string>{"fname", "lname", "home", "home", "company"};
I need to get the indexes of "home".
You can create an object from each item containing its index, then group on the value and keep only the groups containing more than one object. Now you have a list of groupings whose objects contain the text and their original index:
var duplicates = data
.Select((t,i) => new { Index = i, Text = t })
.GroupBy(g => g.Text)
.Where(g => g.Count() > 1);
using System;
using System.Collections.Generic;
class Program
{
static void Main(string[] args)
{
var data = new List<string> { "fname", "lname", "home", "home", "company" };
foreach (var duplicate in FindDuplicates(data))
{
Console.WriteLine("Duplicate: {0} at index {1}", duplicate.Item1, duplicate.Item2);
}
}
public static IEnumerable<Tuple<T, int>> FindDuplicates<T>(IEnumerable<T> data)
{
var hashSet = new HashSet<T>();
int index = 0;
foreach (var item in data)
{
if (hashSet.Contains(item))
{
yield return Tuple.Create(item, index);
}
else
{
hashSet.Add(item);
}
index++;
}
}
}
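Note that the HashSet version above reports only the second and later occurrences of a value, not the first. If every index is needed, one possible sketch collects the indexes per value in a single pass and keeps only the values seen more than once. The method name `FindAllDuplicateIndexes` is hypothetical, and the sketch assumes the data contains no null items, since Dictionary keys cannot be null.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DuplicateIndexDemo
{
    // Maps each duplicated value to the list of all indexes where it occurs.
    public static Dictionary<T, List<int>> FindAllDuplicateIndexes<T>(IEnumerable<T> data)
    {
        var indexesByValue = new Dictionary<T, List<int>>();
        int index = 0;
        foreach (var item in data)
        {
            List<int> indexes;
            if (!indexesByValue.TryGetValue(item, out indexes))
            {
                indexes = new List<int>();
                indexesByValue[item] = indexes;
            }
            indexes.Add(index);
            index++;
        }

        // Keep only the values that occurred more than once.
        return indexesByValue
            .Where(pair => pair.Value.Count > 1)
            .ToDictionary(pair => pair.Key, pair => pair.Value);
    }

    public static void Main()
    {
        var data = new List<string> { "fname", "lname", "home", "home", "company" };
        foreach (var pair in FindAllDuplicateIndexes(data))
        {
            Console.WriteLine("{0}: {1}", pair.Key, string.Join(", ", pair.Value));
            // prints: home: 2, 3
        }
    }
}
```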
How about something like this
var data = new List<string>{"fname", "lname", "home", "home", "company"};
var duplicates = data
.Select((x, index) => new { Text = x, index})
.Where( x => ( data
.GroupBy(i => i)
.Where(g => g.Count() > 1)
.Select(g => g.Key).ToList()
).Contains(x.Text));
I needed to find and remove the duplicates from a list of strings myself. I first looked up the indexes of the duplicate items and then filtered the list in a functional way using LINQ, without mutating the original list:
public static IEnumerable<string> RemoveDuplicates(IEnumerable<string> items)
{
var duplicateIndexes = items.Select((item, index) => new { item, index })
.GroupBy(g => g.item)
.Where(g => g.Count() > 1)
.SelectMany(g => g.Skip(1), (g, item) => item.index);
return items.Where((item, index) => !duplicateIndexes.Contains(index));
}

How to convert an IEnumerable<IEnumerable<T>> to an IEnumerable<T>

I have an IEnumerable<IEnumerable<T>> collection that I want to convert to a single dimension collection. Is it possible to achieve this with a generic extension method? Right now I'm doing this to achieve it.
List<string> filteredCombinations = new List<string>();
//For each collection in the combinated results collection
foreach (var combinatedValues in combinatedResults)
{
List<string> subCombinations = new List<string>();
//For each value in the combination collection
foreach (var value in combinatedValues)
{
if (value > 0)
{
subCombinations.Add(value.ToString());
}
}
if (subCombinations.Count > 0)
{
filteredCombinations.Add(String.Join(",",subCombinations.ToArray()));
}
}
If it's not possible to get a generic solution, how can I optimize this in an elegant way?
You can use the Enumerable.SelectMany extension method for this.
If I read your code correctly, the code for that would be:
var filteredCombinations = combinatedResults.SelectMany(o => o)
.Where(value => value > 0)
.Select(v => v.ToString());
Edit: As commented, the above code is not joining each element of the subsets to a string, as the original code does. Using the built-in methods, you can do that using:
var filteredCombinations = combinatedResults
.Where(resultSet => resultSet.Any(value => value > 0))
.Select(resultSet => String.Join(",",
resultSet.Where(value => value > 0)
.Select(v => v.ToString()).ToArray()));
Here you go:
var strings = combinatedResults.Select
(
c => c.Where(i => i > 0)
.Select(i => i.ToString())
).Where(s => s.Any())
.Select(s => String.Join(",", s.ToArray()));
I would personally use Enumerable.SelectMany, as suggested by driis.
However, if you wanted to implement this yourself, it would be much cleaner to do:
IEnumerable<T> MakeSingleEnumerable<T>(IEnumerable<IEnumerable<T>> combinatedResults)
{
foreach (var combinatedValues in combinatedResults) {
foreach (var value in combinatedValues)
yield return value;
}
}
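For comparison, here is a small sketch of the SelectMany call the answers refer to, wrapped as a generic extension method, next to a filter-and-join variant matching the question's loop. The int element type, the sample data, and the names `Flatten` and `JoinPositive` are assumptions; the question never shows the declared type of combinatedResults.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FlattenDemo
{
    // Generic flattening: IEnumerable<IEnumerable<T>> to IEnumerable<T>.
    public static IEnumerable<T> Flatten<T>(this IEnumerable<IEnumerable<T>> source)
    {
        return source.SelectMany(inner => inner);
    }

    // Keeps the positive values of each inner collection, joins them with
    // commas, and drops inner collections that have no positive values.
    public static IEnumerable<string> JoinPositive(IEnumerable<IEnumerable<int>> source)
    {
        return source
            .Select(values => values.Where(v => v > 0).Select(v => v.ToString()).ToArray())
            .Where(parts => parts.Length > 0)
            .Select(parts => string.Join(",", parts));
    }

    public static void Main()
    {
        IEnumerable<IEnumerable<int>> combinatedResults = new List<List<int>>
        {
            new List<int> { 1, 0, 2 },
            new List<int> { 0, 0 },
            new List<int> { 3 },
        };

        Console.WriteLine(string.Join(" ", combinatedResults.Flatten()));
        // prints: 1 0 2 0 0 3
        Console.WriteLine(string.Join(" | ", JoinPositive(combinatedResults)));
        // prints: 1,2 | 3
    }
}
```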
You asked two different questions. The one you described in the title is already answered by driis.
But your example code is a different problem. We can refactor it in stages. Step 1, build the subCombinations list using some LINQ:
List<string> filteredCombinations = new List<string>();
//For each collection in the combinated results collection
foreach (var combinatedValues in combinatedResults)
{
var subCombinations = combinatedValues.Where(v => v > 0)
.Select(v => v.ToString())
.ToList();
if (subCombinations.Count > 0)
filteredCombinations.Add(string.Join(",",subCombinations.ToArray()));
}
Now the outer loop, leaving us with just this:
var filteredCombinations = combinatedResults
.Select(values => values.Where(v => v > 0)
.Select(v => v.ToString())
.ToArray())
.Where(a => a.Length > 0)
.Select(a => string.Join(",", a));
Use LINQ's SelectMany.
