Trim away duplicates using LINQ - C#

I am working with an API that returns duplicate Ids. I need to insert these values into my database using Entity Framework. Before adding the objects, I want to trim away any duplicates.
I have a small example of the code I am trying to write.
var itemsToImport = new List<Item>();
itemsToImport.Add(new Item() { Description = "D-0", Id = 0 });
for (int i = 0; i < 5; i++)
{
    itemsToImport.Add(new Item() { Id = i, Description = "D-" + i.ToString() });
}
var currentItems = new List<Item>
{
    new Item() { Id = 1, Description = "D-1" },
    new Item() { Id = 3, Description = "D-3" }
};
//returns the correct missing Ids
var missing = itemsToImport.Select(s => s.Id).Except(currentItems.Select(s => s.Id));
//toAdd contains the duplicate record.
var toAdd = itemsToImport.Where(x => missing.Contains(x.Id));
foreach (var item in toAdd)
{
    Console.WriteLine(item.Description);
}
What do I need to change so that "toAdd" returns only a single record per Id, even when the import list contains a repeat?

You can do this by grouping by the Id and then selecting the first item in each group.
var toAdd = itemsToImport
    .Where(x => missing.Contains(x.Id));
becomes
var toAdd = itemsToImport
    .Where(x => missing.Contains(x.Id))
    .GroupBy(item => item.Id)
    .Select(grp => grp.First());

Use DistinctBy from MoreLINQ, as recommended by Jon Skeet in https://stackoverflow.com/a/2298230/385844
The call would look something like this:
var toAdd = itemsToImport.Where(x => missing.Contains(x.Id)).DistinctBy(x => x.Id);
If you'd rather not (or can't) use MoreLINQ for some reason, DistinctBy is fairly easy to implement yourself:
static IEnumerable<T> DistinctBy<T, TKey>(this IEnumerable<T> sequence, Func<T, TKey> projection)
{
    var set = new HashSet<TKey>();
    foreach (var item in sequence)
    {
        if (set.Add(projection(item)))
            yield return item;
    }
}

You can use the Distinct function, but you'll need to override Equals and GetHashCode in Item so that items containing the same data compare as equal (see the sketch below).
Or use FirstOrDefault to get back the first Item with a matching Id:
itemsToImport.Where(x => missing.Contains(x.Id)).FirstOrDefault()
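For the Distinct approach, the equality overrides might look like the following. This is a minimal sketch assuming Item has only the Id and Description members shown in the question and that Id alone determines identity:
public class Item : IEquatable<Item>
{
    public int Id { get; set; }
    public string Description { get; set; }

    // Items are considered equal when their Ids match, so Distinct() collapses duplicates by Id.
    public bool Equals(Item other) => other != null && Id == other.Id;

    public override bool Equals(object obj) => Equals(obj as Item);

    public override int GetHashCode() => Id.GetHashCode();
}

// With the overrides in place:
var toAdd = itemsToImport.Where(x => missing.Contains(x.Id)).Distinct();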

Related

How to OrderBy() as per the order requested in distinct string list

How can I use OrderBy to shape the output in the same order as the requested distinct list?
public DataCollectionList GetLatestDataCollection(List<string> requestedDataPointList)
{
    var dataPoints = _context.DataPoints.Where(c => requestedDataPointList.Contains(c.dataPointName))
        .OrderBy(----------) //TODO: RE-ORDER IN THE SAME ORDER AS REQUESTED requestedDataPointList
        .ToList();
    dataPoints.ForEach(dp =>
    {
        ....
    });
}
Do the sorting on the client side:
public DataCollectionList GetLatestDataCollection(List<string> requestedDataPointList)
{
    var dataPoints = _context.DataPoints.Where(c => requestedDataPointList.Contains(c.dataPointName))
        .AsEnumerable()
        .OrderBy(c => requestedDataPointList.IndexOf(c.dataPointName));
    foreach (var dp in dataPoints)
    {
        ....
    }
}
NOTE: Also, I don't think ToList().ForEach() is ever better than a plain foreach loop.
I think the fastest method is to join the result back with the request list. This makes use of the fact that LINQ's join preserves the order of the first (outer) list:
var dataPoints = _context.DataPoints
    .Where(c => requestedDataPointList.Contains(c.dataPointName))
    .ToList();
var ordered = from n in requestedDataPointList
              join dp in dataPoints on n equals dp.dataPointName
              select dp;
foreach (var dataPoint in ordered)
{
    ...
}
This doesn't involve any explicit sorting; the join does it all, and it runs in close to O(n).
Another fast method consists of creating a dictionary of sequence numbers:
var indexes = requestedDataPointList
    .Select((n, i) => new { n, i })
    .ToDictionary(x => x.n, x => x.i);
var ordered = dataPoints.OrderBy(dp => indexes[dp.dataPointName]);

Remove duplicates by matching part of a string

Check the code below. I am writing a method that should simply remove duplicates from the list foo. Each value in the list is a product id and a quantity separated by ':' (the number before ':' is the product id, the number after ':' is the quantity). I pass this list into the RemoveDuplicateItems() method for processing. The method should remove every item whose product id also appears in a later item, but my current method just returns exactly the same list it was given. How can I fix the method so it removes items whose first number (the part before ':') matches another item's?
For the sample below, the final doo list should no longer contain the first entry, "22:10", since its product id 22 matches the second entry, "22:15".
C#:
[HttpPost]
public JsonResult DoSomething()
{
    var foo = new List<string>();
    foo.Add("22:10"); // this should be removed by RemoveDuplicateItems() since its `22` matches the next entry
    foo.Add("22:15");
    foo.Add("25:30");
    foo.Add("26:30");
    var doo = RemoveDuplicateItems(foo);
    return Json("done");
}
public List<string> RemoveDuplicateItems(List<string> AllItems)
{
    var FinalList = new List<string>();
    var onlyProductIds = new List<string>();
    foreach (var item in AllItems)
    {
        Match result = Regex.Match(item, @"^.*?(?=:)");
        onlyProductIds.Add(result.Value);
    }
    var unique_onlyProductIds = onlyProductIds.Distinct().ToList();
    foreach (var item in AllItems)
    {
        Match result = Regex.Match(item, @"^.*?(?=:)");
        var id = unique_onlyProductIds.Where(x => x.Contains(result.Value)).FirstOrDefault();
        if (id != null)
        {
            FinalList.Add(item);
        }
    }
    return FinalList;
}
Does this work for you?
List<string> doo = foo
    .Select(x => x.Split(':'))
    .GroupBy(x => x[0], x => x[1])
    .Select(x => $"{x.Key}:{x.Last()}")
    .ToList();
There are multiple ways to achieve this. One, as suggested by @Aluan Haddad, is to use LINQ. His comment uses the query syntax, but you could use the method syntax too (I assumed you use C# 8):
List<string> doo = foo.GroupBy(str => str[0..2])
.Select(entry => entry.Last())
.ToList();
Note that this works because the current implementation of GroupBy preserves ordering.
You can do it using LINQ:
var doo = foo
    .Select(x =>
    {
        var split = x.Split(':');
        return new { Key = split[0], Value = split[1] };
    })
    .GroupBy(x => x.Key)
    .OrderBy(x => x.Key)
    .Select(x =>
    {
        var max = x.LastOrDefault();
        return $"{max.Key}:{max.Value}";
    })
    .ToList();

LINQ: Enumerate through duplicates in List and remove them

I need to remove duplicates, but also log which ones I am removing. Right now I have two separate steps: one that goes through each duplicate and one that removes the duplicates. I know that removing items in place inside a foreach is dangerous, so I am a bit stuck on how to do this as efficiently as possible.
What I have right now:
var duplicates = ListOfThings
    .GroupBy(x => x.ID)
    .Where(g => g.Skip(1).Any())
    .SelectMany(g => g);
foreach (var duplicate in duplicates)
{
    Log.Append(Logger.Type.Error, "Conflicts with another", "N/A", duplicate.ID);
}
ListOfThings = ListOfThings.GroupBy(x => x.ID).Select(y => y.First()).ToList();
Well, ToList() will materialize the query, so if you allow side effects (i.e. writing to the log) it could look like this:
var cleared = ListOfThings
    .GroupBy(x => x.ID)
    .Select(chunk =>
    {
        // Side effect: writing to the log while selecting
        if (chunk.Skip(1).Any())
            Log.Append(Logger.Type.Error, "Conflicts with another", "N/A", chunk.Key);
        // If there are duplicates by ID, take the first one
        return chunk.First();
    })
    .ToList();
Why group when one can use the Aggregate function to determine the duplicates for the report and the result?
Example
var items = new List<string>() { "Alpha", "Alpha", "Beta", "Gamma", "Alpha" };
var duplicatesDictionary =
    items.Aggregate(new Dictionary<string, int>(),
        (results, itm) =>
        {
            if (results.ContainsKey(itm))
                results[itm]++;
            else
                results.Add(itm, 1);
            return results;
        });
Here is the result of the above where each insert was counted and reported.
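For the sample items list, duplicatesDictionary ends up as { Alpha = 3, Beta = 1, Gamma = 1 }.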
Now extract the duplicates report for any count above 1.
duplicatesDictionary.Where(kvp => kvp.Value > 1)
                    .Select(kvp => string.Format("{0} had {1} duplicates", kvp.Key, kvp.Value))
Now the final result is to just extract all the keys.
duplicatesDictionary.Select (kvp => kvp.Key);
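For the sample data, the duplicates report contains the single line "Alpha had 3 duplicates", and the final key extraction yields "Alpha", "Beta", "Gamma".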
You can use a hash set and union it with the list to get unique items; you just need to override the default reference comparison. Implementing IEqualityComparer<T> is flexible: if it's just ID that makes two objects unique, that's enough, but if more fields matter you can extend it, too.
You can get duplicates with LINQ.
void Main()
{
    // Your original class:
    List<Things> originalList = new List<Things> { new Things(5), new Things(3), new Things(5) };
    // I'm doing this in LINQPad; if you're using VS you may need to foreach over the object
    Console.WriteLine(originalList);

    // Put your duplicates back in a list and log them as you did.
    var duplicateItems = originalList.GroupBy(x => x.ID).Where(x => x.Count() > 1).ToList(); //.Select(x => x.GetHashCode());
    Console.WriteLine(duplicateItems);

    // Create a custom comparer to compare your list; if you care about more than ID then you can extend this
    var tec = new ThingsEqualityComparer();
    var listThings = new HashSet<Things>(tec);
    listThings.UnionWith(originalList);
    Console.WriteLine(listThings);
}
// Define other methods and classes here
public class Things
{
    public int ID { get; set; }

    public Things(int id)
    {
        ID = id;
    }
}

public class ThingsEqualityComparer : IEqualityComparer<Things>
{
    public bool Equals(Things thing1, Things thing2)
    {
        return thing1.ID == thing2.ID;
    }

    public int GetHashCode(Things thing)
    {
        return thing.ID.GetHashCode();
    }
}

Does anything already exist to flatten a mix of single instances and arrays of instances?

I just wrote myself a utility function
private static IEnumerable<T> Flatten<T>(params object[] items) where T : class
{
    return items.SelectMany(c => c is T ? new[] { c as T } : (IEnumerable<T>)c);
}
It allowed me to go from this:
var lines = records
    .GroupBy(c => new { c.CODEID, c.DESCRIPTION })
    .SelectMany(c =>
        new[] { string.Format(insertRecord, c.Key.CODEID, c.Key.DESCRIPTION) }
        .Concat(c.Select(d => string.Format(insertDetail, d.CODEID, d.CODESEQ, d.DATAVALUE, d.DISPLAYVALUE)))
        .Concat(new[] { Environment.NewLine }));
To this:
var lines2 = records
    .GroupBy(c => new { c.CODEID, c.DESCRIPTION })
    .SelectMany(c => Flatten<string>(
        string.Format(insertRecord, c.Key.CODEID, c.Key.DESCRIPTION),
        c.Select(d => string.Format(insertDetail, d.CODEID, d.CODESEQ, d.DATAVALUE, d.DISPLAYVALUE)),
        Environment.NewLine));
Before I commit such an obscure-looking thing to my code base, I wanted to see if I was overlooking some other obvious way to avoid the use of Concat in the first version above...
NOTE: maybe this belongs on code review... not sure
You can make the code simpler, but more importantly make the whole thing statically typed, by creating a method that simply prepends a single item to the start of a sequence:
public static IEnumerable<T> Prepend<T>(
    this IEnumerable<T> sequence,
    T item)
{
    yield return item;
    foreach (var current in sequence)
        yield return current;
}
And one to append an item to the end of a sequence:
public static IEnumerable<T> Append<T>(
    this IEnumerable<T> sequence,
    T item)
{
    foreach (var current in sequence)
        yield return current;
    yield return item;
}
Now your method can be written as:
var lines = records
    .GroupBy(c => new { c.CODEID, c.DESCRIPTION })
    .SelectMany(c =>
        c.Select(d =>
            string.Format(insertDetail, d.CODEID, d.CODESEQ, d.DATAVALUE, d.DISPLAYVALUE))
        .Prepend(string.Format(insertRecord, c.Key.CODEID, c.Key.DESCRIPTION))
        .Append(Environment.NewLine));
The other route you can go is to write an AsSequence method that can more effectively turn an item into a sequence of size one.
public static IEnumerable<T> AsSequence<T>(this T item)
{
    yield return item;
}
This does clean up your original code a bit by making the entire query one fluent sequence of method calls:
var lines = records
    .GroupBy(c => new { c.CODEID, c.DESCRIPTION })
    .SelectMany(c =>
        string.Format(insertRecord, c.Key.CODEID, c.Key.DESCRIPTION)
        .AsSequence()
        .Concat(c.Select(d =>
            string.Format(insertDetail, d.CODEID, d.CODESEQ, d.DATAVALUE, d.DISPLAYVALUE)))
        .Concat(Environment.NewLine.AsSequence()));

Alternatives to LINQ.SelectMany with constant number of inner elements

I am trying to determine if there is a better way to execute the following query:
I have a List of Pair objects.
A Pair is defined as
public class Pair
{
    public int IDA;
    public int IDB;
    public double Stability;
}
I would like to extract a list of all distinct IDs (ints) contained in the List<Pair>.
I am currently using
var pIndices = pairs.SelectMany(p => new List<int>() { p.IDA, p.IDB }).Distinct().ToList();
Which works, but it seems unintuitive to me to create a new List<int> only to have it flattened out by SelectMany.
This is another option I find inelegant, to say the least:
var pIndices = pairs.Select(p => p.IDA).ToList();
pIndices.AddRange(pairs.Select(p => p.IDB).ToList());
pIndices = pIndices.Distinct().ToList();
Is there a better way? And if not, which would you prefer?
You could use Union() to get both the A's and B's after selecting them individually.
var pIndices = pairs.Select(p => p.IDA).Union(pairs.Select(p => p.IDB));
You could possibly shorten the inner expression to p => new [] { p.IDA, p.IDB }.
If you don't want to create a 2-element array/list for each Pair, and don't want to iterate your pairs list twice, you could just do it by hand:
HashSet<int> distinctIDs = new HashSet<int>();
foreach (var pair in pairs)
{
    distinctIDs.Add(pair.IDA);
    distinctIDs.Add(pair.IDB);
}
Here is one that doesn't create a new collection per pair:
var pIndices = pairs.Select(p => p.IDA)
                    .Concat(pairs.Select(p => p.IDB))
                    .Distinct();
Shorten it like this:
var pIndices = pairs.SelectMany(p => new[] { p.IDA, p.IDB }).Distinct().ToList();
Using Enumerable.Repeat is a little unorthodox, but here it is anyway:
var pIndices = pairs
    .SelectMany(p => Enumerable.Repeat(p.IDA, 1).Concat(Enumerable.Repeat(p.IDB, 1)))
    .Distinct()
    .ToList();
Finally, if you do not mind a little helper class, you can do this:
public static class EnumerableHelper
{
    // usage: EnumerableHelper.AsEnumerable(obj1, obj2);
    public static IEnumerable<T> AsEnumerable<T>(params T[] items)
    {
        return items;
    }
}
Now you can do this:
var pIndices = pairs
    .SelectMany(p => EnumerableHelper.AsEnumerable(p.IDA, p.IDB))
    .Distinct()
    .ToList();
