LINQ: Enumerate through duplicates in List and remove them


I need to remove duplicates, but also log which ones I am removing. I have two solutions right now: one that goes through each duplicate and one that removes duplicates. I know that removing in place inside a foreach is dangerous, so I am a bit stuck on how to do this as efficiently as possible.
What I have right now:
var duplicates = ListOfThings
.GroupBy(x => x.ID)
.Where(g => g.Skip(1).Any())
.SelectMany(g => g);
foreach (var duplicate in duplicates)
{
Log.Append(Logger.Type.Error, "Conflicts with another", "N/A", duplicate.ID);
}
ListOfThings = ListOfThings.GroupBy(x => x.ID).Select(y => y.First()).ToList();

Well, ToList() will materialize the query, so if you allow side effects (i.e. writing to the log) it could look like this:
var cleared = ListOfThings
.GroupBy(x => x.ID)
.Select(chunk => {
// Side effect: writing to log while selecting
if (chunk.Skip(1).Any())
Log.Append(Logger.Type.Error, "Conflicts with another", "N/A", chunk.Key);
// if there're duplicates by Id take the 1st one
return chunk.First();
})
.ToList();
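
If side effects inside a LINQ projection feel uncomfortable, a plain loop over a HashSet can do the same job in one pass. The following is only a sketch under assumptions: the Thing class, the Deduplicate helper and the log callback are hypothetical stand-ins for the element type of ListOfThings and for Log.Append.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the element type of ListOfThings.
public class Thing
{
    public int ID { get; set; }
}

public static class DedupDemo
{
    // Keeps the first item per ID and reports every later item with the same ID.
    // The log callback is an assumption standing in for Log.Append.
    public static List<Thing> Deduplicate(IEnumerable<Thing> source, Action<int> log)
    {
        var seen = new HashSet<int>();
        var result = new List<Thing>();
        foreach (var item in source)
        {
            if (seen.Add(item.ID))   // Add returns false when the ID is already present
                result.Add(item);
            else
                log(item.ID);
        }
        return result;
    }

    public static void Main()
    {
        var things = new List<Thing>
        {
            new Thing { ID = 5 }, new Thing { ID = 3 }, new Thing { ID = 5 }
        };
        var cleared = Deduplicate(things, id => Console.WriteLine("Conflicts with another: " + id));
        Console.WriteLine(cleared.Count);
    }
}
```

Because the loop builds a new list instead of removing from the one being enumerated, it avoids the in-place removal problem. Note that this variant logs only the later occurrences of an ID, not the first one that is kept.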

Why group when one can use the Aggregate function to determine the duplicates for the report and the result?
Example
var items = new List<string>() { "Alpha", "Alpha", "Beta", "Gamma", "Alpha"};
var duplicatesDictionary =
items.Aggregate (new Dictionary<string, int>(),
(results, itm) =>
{
if (results.ContainsKey(itm))
results[itm]++;
else
results.Add(itm, 1);
return results;
});
The resulting dictionary holds the number of occurrences of each item (Alpha: 3, Beta: 1, Gamma: 1).
Now extract the duplicates report for any count above 1.
duplicatesDictionary.Where (kvp => kvp.Value > 1)
.Select (kvp => string.Format("{0} occurred {1} times", kvp.Key, kvp.Value));
Now the final result is to just extract all the keys.
duplicatesDictionary.Select (kvp => kvp.Key);

You can use a hash set and union it with a list to get unique items; just override the reference comparison. Implementing IEqualityComparer<T> is flexible; if it's just ID that makes two objects unique then ok; but if it's more you can extend it, too.
You can get duplicates with LINQ.
void Main()
{
//your original class:
List<Things> originalList = new List<Things> { new Things(5), new Things(3), new Things(5) };
//i'm doing this in LINQPad; if you're using VS you may need to foreach the object
Console.WriteLine(originalList);
//put your duplicates back in a list and log them as you did.
var duplicateItems = originalList.GroupBy(x => x.ID).Where(x => x.Count() > 1).ToList();//.Select(x => x.GetHashCode());
Console.WriteLine(duplicateItems);
//create a custom comparer to compare your list; if you care about more than ID then you can extend this
var tec = new ThingsEqualityComparer();
var listThings = new HashSet<Things>(tec);
listThings.UnionWith(originalList);
Console.WriteLine(listThings);
}
// Define other methods and classes here
public class Things
{
public int ID {get;set;}
public Things(int id)
{
ID = id;
}
}
public class ThingsEqualityComparer : IEqualityComparer<Things>
{
public bool Equals(Things thing1, Things thing2)
{
if (thing1.ID == thing2.ID)
{
return true;
}
else
{
return false;
}
}
public int GetHashCode(Things thing)
{
int hCode = thing.ID;
return hCode.GetHashCode();
}
}

Related

Remove duplicate by matching string part of text

Check the code below. I am writing a method that should simply remove the duplicates from the list foo. The list values are a product id and a quantity separated by a colon: the number before the : is the product id and the number after the : is the product quantity. I pass this list to the RemoveDuplicateItems() method for processing. The method should remove every item whose product id matches that of another item, but my current method just returns exactly the same list it was given. How can I fix my method so that it removes the items whose first part (the number before the :) matches?
In the final output in the doo variable, the first item, 22:10, should be removed because its product id matches that of the second one.
C#:
[HttpPost]
public JsonResult DoSomething()
{
var foo = new List<string>();
foo.Add("22:10");//this should be removed by RemoveDuplicateItems() since its `22` matches the second one
foo.Add("22:15");
foo.Add("25:30");
foo.Add("26:30");
var doo = RemoveDuplicateItems(foo);
return Json("done");
}
public List<string> RemoveDuplicateItems(List<string> AllItems)
{
var FinalList = new List<string>();
var onlyProductIds = new List<string>();
foreach (var item in AllItems)
{
Match result = Regex.Match(item, @"^.*?(?=:)");
onlyProductIds.Add(result.Value);
}
var unique_onlyProductIds = onlyProductIds.Distinct().ToList();
foreach (var item in AllItems)
{
Match result = Regex.Match(item, @"^.*?(?=:)");
var id = unique_onlyProductIds.Where(x => x.Contains(result.Value)).FirstOrDefault();
if (id != null)
{
FinalList.Add(item);
}
}
return FinalList;
}

Does this work for you?
List<string> doo =
foo
.Select(x => x.Split(':'))
.GroupBy(x => x[0], x => x[1])
.Select(x => $"{x.Key}:{x.Last()}")
.ToList();

There are multiple ways to achieve this. One, as suggested by @Aluan Haddad, is to use LINQ. His comment uses the query syntax, but you could use the method syntax too (I assume you use C# 8):
List<string> doo = foo.GroupBy(str => str[0..2])
.Select(entry => entry.Last())
.ToList();
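Note that this relies on GroupBy preserving the order of the elements within each group, which Enumerable.GroupBy (LINQ to Objects) is documented to do. The slice str[0..2] also assumes that every product id is exactly two characters long.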

You can do it using LINQ:
var doo = foo.Select(x =>
{
var split = x.Split(':');
return new { Key = split[0], Value = split[1] };
})
.GroupBy(x => x.Key)
.OrderBy(x => x.Key)
.Select(x =>
{
var max = x.LastOrDefault();
return $"{max.Key}:{max.Value}";
}
).ToList();

C#, Split List<Class> into List<List<Class>> (group)

I have a list of class instances. How can I split them into groups?
class CLA
{
public string GroupName;
public double Class;
public double Value;
}
...
public List<List<CLA>> Dividr(List<CLA> a)
{
List<List<CLA>> Clist = new List<List<CLA>>();
Clist.AddRange(...) //Here
return Clist;
}
The list should be split by its properties GroupName and Class.
For example, elements that have the same GroupName and Class should end up in the same List<>.

Simply use GroupBy. Then, as you want an inner list and not IEnumerable use ToList() for each group:
List<CLA> data = new List<CLA>();
var result = data.GroupBy(item => new { item.GroupName, item.Class })
.Select(group => group.ToList()).ToList();
Unless you have a specific reason to need lists, consider returning IEnumerable<IEnumerable<CLA>> instead. It would be a shame to execute the query before the results are actually needed:
var result = data.GroupBy(item => new { item.GroupName, item.Class })
.Select(group => group);
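
To make the composite key concrete, here is a small self-contained sketch. The sample data and the Divide helper name are made up for illustration, and the fields are declared public so the lambda can read them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class CLA
{
    public string GroupName;
    public double Class;
    public double Value;
}

public static class GroupDemo
{
    // Items with equal GroupName and Class end up in the same inner list.
    // Anonymous types compare by value, so they work as a composite key.
    public static List<List<CLA>> Divide(List<CLA> items)
    {
        return items.GroupBy(item => new { item.GroupName, item.Class })
                    .Select(group => group.ToList())
                    .ToList();
    }

    public static void Main()
    {
        // Sample data, invented for this example.
        var data = new List<CLA>
        {
            new CLA { GroupName = "A", Class = 1, Value = 10 },
            new CLA { GroupName = "A", Class = 1, Value = 20 },
            new CLA { GroupName = "A", Class = 2, Value = 30 },
            new CLA { GroupName = "B", Class = 1, Value = 40 },
        };
        foreach (var group in Divide(data))
            Console.WriteLine(group[0].GroupName + "/" + group[0].Class + ": " + group.Count + " item(s)");
    }
}
```

With this data the first two items share a key, so Divide returns three inner lists.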

Identifying and grouping similar items in a collection of strings

I have a collection of strings like the following:
List<string> codes = new List<string>
{
"44.01", "44.02", "44.03", "44.04", "44.05", "44.06", "44.07", "44.08", "46", "47.10"
};
Each string is made up of two components separated by a full stop - a prefix code and a subcode. Some of the strings don't have sub codes.
I want to be able to combine the strings whose prefixes are the same and output them as follows, together with the other codes:
44(01,02,03,04,05,06,07,08),46,47.10
I'm stuck at the first hurdle of this, which is how to identify and group together the codes whose prefix values are the same, so that I can combine them into a single string as you can see above.

You can do:
var query = codes.Select(c =>
new
{
SplitArray = c.Split('.'), //to avoid multiple split
Value = c
})
.Select(c => new
{
Prefix = c.SplitArray.First(), //you can avoid multiple split if you split first and use it later
PostFix = c.SplitArray.Last(),
Value = c.Value,
})
.GroupBy(r => r.Prefix)
.Select(grp => new
{
Key = grp.Key,
Items = grp.Count() > 1 ? String.Join(",", grp.Select(t => t.PostFix)) : "",
Value = grp.First().Value,
});
This is how it works:
Split each item in the list on the delimiter and populate an anonymous type with Prefix, PostFix and the original value.
Then group on Prefix.
Finally, select the key and join the PostFix values using String.Join.
For output:
foreach (var item in query)
{
if(String.IsNullOrWhiteSpace(item.Items))
Console.WriteLine(item.Value);
else
Console.WriteLine("{0}({1})", item.Key, item.Items);
}
Output would be:
44(01,02,03,04,05,06,07,08)
46
47.10

Try this:-
var result = codes.Select(x => new { SplitArr = x.Split('.'), OriginalValue = x })
.GroupBy(x => x.SplitArr[0])
.Select(x => new
{
Prefix= x.Key,
subCode = x.Count() > 1 ?
String.Join(",", x.Select(z => z.SplitArr[1])) : "",
OriginalValue = x.First().OriginalValue
});
You can print your desired output like this:-
foreach (var item in result)
{
Console.Write("{0}({1}),",item.Prefix,item.subCode);
}
Working Fiddle.

Outlined idea:
Use Dictionary<string, List<string>> for collecting your result
in a loop over your list, use string.split() .. the first element will be your Dictionary key ... create a new List<string> there if the key doesn't exist yet
if the result of split has a second element, append that to the List
use a second loop to format that Dictionary to your output string
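
The outline above could be sketched roughly as follows. The CodeGrouper class and the Combine method are hypothetical names, and the extra order list is an assumption added here because Dictionary does not guarantee any enumeration order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CodeGrouper
{
    public static string Combine(IEnumerable<string> codes)
    {
        var groups = new Dictionary<string, List<string>>();
        var order = new List<string>();   // remembers the first-seen order of prefixes

        foreach (var code in codes)
        {
            var parts = code.Split('.');
            if (!groups.TryGetValue(parts[0], out var subcodes))
            {
                // First time this prefix appears: create its list.
                subcodes = new List<string>();
                groups.Add(parts[0], subcodes);
                order.Add(parts[0]);
            }
            if (parts.Length > 1)
                subcodes.Add(parts[1]);
        }

        // Second pass: format each prefix depending on how many subcodes it has.
        var pieces = order.Select(prefix =>
        {
            var subcodes = groups[prefix];
            if (subcodes.Count == 0) return prefix;
            if (subcodes.Count == 1) return prefix + "." + subcodes[0];
            return prefix + "(" + string.Join(",", subcodes) + ")";
        });
        return string.Join(",", pieces);
    }

    public static void Main()
    {
        var codes = new List<string>
        {
            "44.01", "44.02", "44.03", "44.04", "44.05", "44.06", "44.07", "44.08", "46", "47.10"
        };
        Console.WriteLine(Combine(codes)); // prints 44(01,02,03,04,05,06,07,08),46,47.10
    }
}
```

Subcodes stay strings throughout, so leading zeros such as "01" are kept as they are.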
Of course, linq is possible too, e.g.
List<string> codes = new List<string>() {
"44.01", "44.05", "47", "42.02", "44.03" };
var result = string.Join(",",
codes.OrderBy(x => x)
.Select(x => x.Split('.'))
.GroupBy(x => x[0])
.Select((x) =>
{
if (x.Count() == 0) return x.Key;
else if (x.Count() == 1) return string.Join(".", x.First());
else return x.Key + "(" + string.Join(",", x.Select(e => e[1]).ToArray()) + ")";
}).ToArray());
Gotta love linq ... haha ... I think this is a monster.

You can do it all in one clever LINQ:
var grouped = codes.Select(x => x.Split('.'))
.Select(x => new
{
Prefix = int.Parse(x[0]),
Subcode = x.Length > 1 ? int.Parse(x[1]) : (int?)null
})
.GroupBy(k => k.Prefix)
.Select(g => new
{
Prefix = g.Key,
Subcodes = g.Where(s => s.Subcode.HasValue).Select(s => s.Subcode)
})
.Select(x =>
x.Prefix +
(x.Subcodes.Count() == 1 ? string.Format(".{0}", x.Subcodes.First()) :
x.Subcodes.Count() > 1 ? string.Format("({0})", string.Join(",", x.Subcodes))
: string.Empty)
).ToArray();
First it splits each string into code and subcode.
Then it groups by your code and collects all subcodes of each group.
Finally it selects each group in the appropriate format.
Looking at the problem, I think you should stop just before the last Select and let the data presentation be done in another part/method of your application.

The old fashioned way:
List<string> codes = new List<string>() { "44.01", "44.05", "47", "42.02", "44.03" };
string output = "";
for (int i = 0; i < codes.Count; i++)
{
// append ".." so that Split always yields at least two elements
string[] items = (codes[i] + "..").Split('.');
int pos1 = output.IndexOf("," + items[0] + "(");
if (pos1 < 0) output += "," + items[0] + "(" + items[1] + ")"; // first occurrence of code: add it
else
{ // code already inserted: find the insert point, searching from pos1
int pos2 = output.IndexOf(')', pos1);
output = output.Substring(0, pos2) + "," + items[1] + output.Substring(pos2);
}
}
if (output.Length > 0) output = output.Substring(1).Replace("()", "");

This will work, including the correct formats for no subcodes, a single subcode, multiple subcodes. It also doesn't assume the prefix or subcodes are numeric, so it leaves leading zeros as is. Your question didn't show what to do in the case you have a prefix without subcode AND the same prefix with subcode, so it may not work in that edge case (44,44.01). I have it so that it ignores the prefix without subcode in that edge case.
List<string> codes = new List<string>
{
"44.01", "44.02", "44.03", "44.04", "44.05", "44.06", "44.07", "44.08", "46", "47.10"
};
var result=codes.Select(x => (x+".").Split('.'))
.Select(x => new
{
Prefix = x[0],
Subcode = x[1]
})
.GroupBy(k => k.Prefix)
.Select(g => new
{
Prefix = g.Key,
Subcodes = g.Where(s => s.Subcode!="").Select(s => s.Subcode)
})
.Select(x =>
x.Prefix +
(x.Subcodes.Count() == 0 ? string.Empty :
string.Format(x.Subcodes.Count()>1?"({0})":".{0}",
string.Join(",", x.Subcodes)))
).ToArray();

General idea, but I'm sure replacing the Substring calls with Regex would be a lot better as well. Note that it assumes every prefix is exactly two characters long.
List<string> newCodes = new List<string>();
foreach (string sub1 in codes.Select(item => item.Substring(0, 2)).Distinct())
{
StringBuilder code = new StringBuilder();
code.Append(sub1 + "(");
// Substring(2) keeps the leading '.', so trim it from each subcode
code.Append(string.Join(",", codes.Where(item => item.Substring(0, 2) == sub1)
.Select(item => item.Substring(2).TrimStart('.'))));
code.Append(")");
newCodes.Add(code.ToString());
}

You could go a couple ways... I could see you making a Dictionary<string,List<string>> so that you could have "44" map to a list of {".01", ".02", ".03", etc.} This would require you processing the codes before adding them to this list (i.e. separating out the two parts of the code and handling the case where there is only one part).
Or you could put them into a SortedSet and provide your own comparer (an IComparer<string>) which knows that these are codes and how to sort them (at least that'd be more reliable than grouping them alphabetically). Iterating over this SortedSet would still require special logic, though, so perhaps the Dictionary to List option above is still preferable.
In either case you would still need to handle a special case "46" where there is no second element in the code. In the dictionary example, would you insert a String.Empty into the list? Not sure what you'd output if you got a list {"46", "46.1"} -- would you display as "46(null,1)" or... "46(0,1)"... or "46(,1)" or "46(1)"?

Alternatives to LINQ.SelectMany with constant number of inner elements

I am trying to determine if there is a better way to execute the following query:
I have a List of Pair objects.
A Pair is defined as
public class Pair
{
public int IDA;
public int IDB;
public double Stability;
}
I would like to extract a list of all distinct ID's (ints) contained in the List<Pair>.
I am currently using
var pIndices = pairs.SelectMany(p => new List<int>() { p.IDA, p.IDB }).Distinct().ToList();
Which works, but it seems unintuitive to me to create a new List<int> only to have it flattened out by SelectMany.
This is another option, which I find inelegant to say the least:
var pIndices = pairs.Select(p => p.IDA).ToList();
pIndices.AddRange(pairs.Select(p => p.IDB).ToList());
pIndices = pIndices.Distinct().ToList();
Is there a better way? And if not, which would you prefer?

You could use Union() to get both the A's and B's after selecting them individually.
var pIndices = pairs.Select(p => p.IDA).Union(pairs.Select(p => p.IDB));

You could possibly shorten the inner expression to p => new [] { p.IDA, p.IDB }.

If you don't want to create a 2-element array/list for each Pair, and don't want to iterate your pairs list twice, you could just do it by hand:
HashSet<int> distinctIDs = new HashSet<int>();
foreach (var pair in pairs)
{
distinctIDs.Add(pair.IDA);
distinctIDs.Add(pair.IDB);
}

This is one without a new collection:
var pIndices = pairs.Select(p => p.IDA)
.Concat(pairs.Select(p => p.IDB))
.Distinct();

Shorten it like this:
var pIndices = pairs.SelectMany(p => new[] { p.IDA, p.IDB }).Distinct().ToList();

Using Enumerable.Repeat is a little unorthodox, but here it is anyway:
var pIndices = pairs
.SelectMany(
p => Enumerable.Repeat(p.IDA, 1).Concat(Enumerable.Repeat(p.IDB, 1))
).Distinct()
.ToList();
Finally, if you do not mind a little helper class, you can do this:
public static class EnumerableHelper {
// usage: EnumerableHelper.AsEnumerable(obj1, obj2);
public static IEnumerable<T> AsEnumerable<T>(params T[] items) {
return items;
}
}
Now you can do this:
var pIndices = pairs
.SelectMany(p => EnumerableHelper.AsEnumerable(p.IDA, p.IDB))
.Distinct()
.ToList();

trim away duplicates using Linq

I am working with an API that is returning duplicate Ids. I need to insert these values into my database using Entity Framework (EF). Before trying to add the objects I want to trim away any duplicates.
I have a small example of the code I am trying to write.
var itemsToImport = new List<Item>(){};
itemsToImport.Add(new Item() { Description = "D-0", Id = 0 });
for (int i = 0; i < 5; i++)
{
itemsToImport.Add(new Item(){Id = i,Description = "D-"+i.ToString()});
}
var currentItems = new List<Item>
{
new Item() {Id = 1,Description = "D-1"},
new Item(){Id = 3,Description = "D-3"}
};
//returns the correct missing Ids
var missing = itemsToImport.Select(s => s.Id).Except(currentItems.Select(s => s.Id));
//toAdd contains the duplicate record.
var toAdd = itemsToImport.Where(x => missing.Contains(x.Id));
foreach (var item in toAdd)
{
Console.WriteLine(item.Description);
}
What do I need to change to fix my variable "toAdd" to only return a single record even if there is a repeat?

You can do this by grouping by the Id and then selecting the first item in each group.
var toAdd = itemsToImport
.Where(x => missing.Contains(x.Id));
becomes
var toAdd = itemsToImport
.Where(x => missing.Contains(x.Id))
.GroupBy(item => item.Id)
.Select(grp => grp.First());

Use DistinctBy from MoreLINQ, as recommended by Jon Skeet in https://stackoverflow.com/a/2298230/385844
The call would look something like this:
var toAdd = itemsToImport.Where(x => missing.Contains(x.Id)).DistinctBy(x => x.Id);
If you'd rather not (or can't) use MoreLINQ for some reason, DistinctBy is fairly easy to implement yourself:
static IEnumerable<T> DistinctBy<T, TKey>(this IEnumerable<T> sequence, Func<T, TKey> projection)
{
var set = new HashSet<TKey>();
foreach (var item in sequence)
if (set.Add(projection(item)))
yield return item;
}

You can use the Distinct function. You'll need to override Equals and GetHashCode in Item (Given they contain the same data) though.
Or use FirstOrDefault to get the first Item with the matching Id back.
itemsToImport.Where(x => missing.Contains(x.Id)).FirstOrDefault()
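
The Distinct route could look roughly like the sketch below. It assumes that two Item objects count as equal whenever their Ids match, and the MissingItems helper is a hypothetical name introduced for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Item
{
    public int Id { get; set; }
    public string Description { get; set; }

    // Assumption: items with the same Id are considered the same item.
    public override bool Equals(object obj)
    {
        return obj is Item other && other.Id == Id;
    }

    // Must be consistent with Equals so that Distinct can hash the items.
    public override int GetHashCode()
    {
        return Id.GetHashCode();
    }
}

public static class DistinctDemo
{
    // Returns the items whose Id is not in currentItems, one item per Id.
    public static List<Item> MissingItems(List<Item> itemsToImport, List<Item> currentItems)
    {
        var existingIds = new HashSet<int>(currentItems.Select(c => c.Id));
        return itemsToImport.Where(x => !existingIds.Contains(x.Id))
                            .Distinct()
                            .ToList();
    }

    public static void Main()
    {
        var itemsToImport = new List<Item> { new Item { Id = 0, Description = "D-0" } };
        for (int i = 0; i < 5; i++)
            itemsToImport.Add(new Item { Id = i, Description = "D-" + i });

        var currentItems = new List<Item>
        {
            new Item { Id = 1, Description = "D-1" },
            new Item { Id = 3, Description = "D-3" }
        };

        foreach (var item in MissingItems(itemsToImport, currentItems))
            Console.WriteLine(item.Description);
    }
}
```

Overriding Equals changes how Item behaves everywhere it is compared, so grouping by Id or a DistinctBy helper may be the less invasive choice when Item is shared with other code.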
