Remove duplicates from large lists efficiently? - c#

I have a list which contains ids and values, and I need to remove the duplicate ids. I am looking for an efficient way, preferably in LINQ, instead of my loop and if condition. Thank you for any help and advice.
var list = new List<Tuple<int, double>>();
Current values:
1, 3.6
1, 3.8
2, 5.6
3, 8.1
Wished values:
1, 3.6
2, 5.6
3, 8.1
for (int i = 0; i < list.Count - 1; i++)
{
    if (list[i].Item1 == list[i + 1].Item1)
        list.RemoveAt(i + 1);
}

If both the id and the value of an item match another item, Distinct removes that item from the list:
var distinctList = list.Distinct().ToList();
If only the ids are duplicated, group by id and keep the first item of each group. This does not compare the values. The version below also converts the result to a Dictionary, if you are okay with that:
var distinctDictionary = list.GroupBy(item => item.Item1)
    .Select(group => group.First())
    .ToDictionary(item => item.Item1, item => item.Item2);

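Given your view that LINQ is generally more readable and maintainable, and generally comparable in efficiency, here is a LINQ solution that (IMHO, compared to the others presented so far) is also more efficient in execution. Like your loop, it compares each entry only with the one before it, so it assumes that duplicate ids are adjacent (for example, that the list is sorted by id):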
list = list.Where((entry, i) => i == 0 || entry.Item1 != list[i - 1].Item1).ToList();
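
The one-liner above compares each entry only with its predecessor, so it relies on equal ids sitting next to each other. If that cannot be guaranteed, one possible alternative is a single pass that remembers the ids it has already seen. This is only a sketch: the class and method names (TupleDedup, DistinctById) are made up for illustration, and it keeps the first tuple for each id, as in the wished output of the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TupleDedup
{
    // Hypothetical helper: keeps the first tuple seen for each id,
    // in one pass, preserving the original order.
    public static List<Tuple<int, double>> DistinctById(
        IEnumerable<Tuple<int, double>> source)
    {
        var seenIds = new HashSet<int>();

        // HashSet<T>.Add returns false when the id is already present,
        // so later duplicates are filtered out.
        return source.Where(entry => seenIds.Add(entry.Item1)).ToList();
    }
}
```

Usage would look like list = TupleDedup.DistinctById(list);. The HashSet costs extra memory proportional to the number of distinct ids, which is the price for not needing a sorted list.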

Why are you using a List of tuples? For the requested functionality I would use a Dictionary, so you can't have duplicate ids in the first place.

// Distinct ids (Item1) and distinct values (Item2) of the tuples
var distinctByKey = list.Select(x => x.Item1).Distinct();
var distinctByValue = list.Select(x => x.Item2).Distinct();

Related

LINQ Select in descending order from collection, three different values that go up to the maximum value of the element

For example, I have a collection like this:
var c1 = new Collection<int>{0,0,2,2,2,3,3,4,4,4,4,5,5,6,6,7};
I would like to get a result like this:
(6,5,4)

You can do:
var result = c1.Distinct()
    .OrderByDescending(x => x)
    .Skip(1)
    .Take(3)
    .ToList();
First remove all the duplicates, then sort in descending order. Skip(1) drops the max element. Finally, take 3 elements from the rest.

In the old days, before LINQ, we might have done this on a sorted collection like you have:
var maxes = new int[4];
var idx = 0;
var max = 0;
foreach (var c in c1)
    if (c > max)
        max = maxes[(idx++) % 4] = c;
At the end of this you'll have an array with 4 max values: the 3 you want, and the one you don't (which is in (idx - 1) % 4). I don't know if I'd use it now, but it's more efficient than a "distinct, then sort, then skip, then take" approach, as it does its work in a single pass.
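
To make the last step concrete, here is a rough sketch of the same ring-buffer idea wrapped in a method that also reads the three wanted values back out in descending order. It is not the exact code above: the names (TopBelowMax, ThreeBelowMax) are invented for illustration, it detects a new value by comparing with the previous element instead of with a running max that starts at 0 (so zero and negative values are counted too), and it assumes the input is sorted ascending and has at least four distinct values.

```csharp
using System;
using System.Collections.Generic;

public static class TopBelowMax
{
    // Hypothetical helper. Assumes `sorted` is in ascending order.
    // Returns the three largest distinct values below the maximum,
    // in descending order, using a single pass and a 4-slot ring buffer.
    public static int[] ThreeBelowMax(IEnumerable<int> sorted)
    {
        var ring = new int[4];
        var count = 0;            // number of distinct values seen so far
        var hasPrevious = false;
        var previous = 0;

        foreach (var value in sorted)
        {
            // In a sorted sequence, duplicates are adjacent, so comparing
            // with the previous element is enough to skip them.
            if (hasPrevious && value == previous) continue;

            ring[count++ % 4] = value;
            previous = value;
            hasPrevious = true;
        }

        if (count < 4)
            throw new ArgumentException("Need at least four distinct values.");

        // The maximum sits at (count - 1) % 4; walk backwards from the
        // slot before it to collect the next three values.
        var result = new int[3];
        for (var k = 0; k < 3; k++)
            result[k] = ring[(count - 2 - k) % 4];
        return result;
    }
}
```

For the collection in the question, TopBelowMax.ThreeBelowMax(c1) gives 6, 5, 4.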

Partition Keys in List<KeyValuePair> c#

I have a List of KeyValuePairs
var hitCoord = new List<KeyValuePair<int, double>>()
and sorted like this (descending by Key)
hitCoord.Sort((a, b) => (b.Key.CompareTo(a.Key)));
I can find the total highest Value with
hitCoord.Sort((a, b) => (b.Value.CompareTo(a.Value)));
(^ maybe that can be used for the following query?)
I would like to partition the Keys in my list such that I can find Values that meet a condition within the specified range of keys.
i.e. I would like to find the highest Value and Lowest Value in a range of (int)Keys
for (int i = 0; i < hitCoord.Count; i++)
{
    if (hitCoord[i].Key > lowerbound && hitCoord[i].Key < upperBound)
    {
        // find highest Value?
    }
}
Not sure if that is at all on the right track. I am new to programming and very new to KeyValuePairs. Any help you can offer on this matter is much appreciated! Thank you!

Finding the max value in a specified range of keys could be solved by using LINQ (using System.Linq;) like this:
hitCoord.Where(c => c.Key > lowerbound && c.Key < upperbound).Max(c => c.Value);
The approach:
Use Where to filter all items with key in range
Use Max to get the max value
You could adapt and extend the query also with more checks and constraints. Some basic queries are described in Basic LINQ Query Operations (C#).

You don't need to actually sort - you can do this with Linq (adding using System.Linq; to the top of your .cs file). You just want a Where to filter by key and a Max to get the highest value:
var maxValue = hitCoord.Where(hc => hc.Key > lowerbound && hc.Key < upperBound)
    .Max(hc => hc.Value);
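
One caveat with the Where/Max query: Max throws an InvalidOperationException when no key falls inside the range. Since the question asks for both the highest and the lowest Value, a possible single-pass variant is sketched below. The names (RangeStats, MinMaxInKeyRange) are hypothetical, the bounds are exclusive as in the queries above, and the tuple syntax needs C# 7 or later.

```csharp
using System;
using System.Collections.Generic;

public static class RangeStats
{
    // Hypothetical helper: returns the lowest and highest Value among
    // pairs whose Key lies strictly between the bounds, or null when
    // no key is in range.
    public static (double Min, double Max)? MinMaxInKeyRange(
        IEnumerable<KeyValuePair<int, double>> pairs,
        int lowerBound,
        int upperBound)
    {
        (double Min, double Max)? result = null;

        foreach (var pair in pairs)
        {
            // Skip keys outside the exclusive range.
            if (pair.Key <= lowerBound || pair.Key >= upperBound) continue;

            if (!result.HasValue)
            {
                // First in-range pair seeds both min and max.
                result = (pair.Value, pair.Value);
            }
            else
            {
                result = (Math.Min(result.Value.Min, pair.Value),
                          Math.Max(result.Value.Max, pair.Value));
            }
        }

        return result;
    }
}
```

This does not need the list to be sorted, and it walks the list once instead of once for Min and once for Max.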

As others have suggested, this is all pretty easy to do with LINQ. Here's another sample of LINQ calls, including how to create a partition lookup.
var hitCoord = new List<KeyValuePair<int, double>>()
{
    new KeyValuePair<int, double>(1, 1.1),
    new KeyValuePair<int, double>(1, 1.2),
    new KeyValuePair<int, double>(2, 2.0),
    new KeyValuePair<int, double>(2, 2.1)
};
// Partition the pairs into two groups: even keys (0) and odd keys (1)
var partitions = hitCoord.ToLookup(kvp => kvp.Key % 2);
// Largest and smallest key in the whole list
var maxKey = hitCoord.Max(kvp => kvp.Key);
var minKey = hitCoord.Min(kvp => kvp.Key);
int lower = 1;
int higher = 2;
// Highest value among the pairs whose key is in [lower, higher]
var maxInRange = hitCoord.Where(kvp => kvp.Key >= lower && kvp.Key <= higher).Max(kvp => kvp.Value);
That said, if this is performance critical then you'll probably want to use something other than LINQ, so you can optimize it and avoid going through the list multiple times.

c# mongodb Find and remove one element from an array selected among several documents

I'm new to document-oriented databases in general and to MongoDB in particular.
I created this database myself: a collection of several disjoint segments containing integers.
I'd like to take one item that meets some conditions and remove it from its document.
For example, I tried to take an item that meets these conditions:
is in a segment overlapping [-105; 17]
is not zero
is contained in {-104, -97, -5, 0, 5}
like this:
var db = client.GetDatabase("mongodbPOC");
var collection = db.GetCollection<Document>("Int");
var contains = new List<int> { -104, -97, -5, 0, 5 };
var result = collection.AsQueryable()
    .Where(document => 17 >= document.Min && -105 <= document.Max)
    .SelectMany(document => document.Values)
    .First(val => val != 0 && contains.Contains(val));
and then find it again to remove it, but I am sure there is a more efficient way to do that.

To remove items from an array in MongoDB, you need to use Pull or PullFilter. In your case you need PullFilter, like this (YourModel stands for your document class):
var filterPull = Builders<int>.Filter
    .Where(x => x != 0 && contains.Contains(x));
var update = Builders<YourModel>.Update
    .PullFilter(c => c.Values, filterPull);
Then create another filter for the Min/Max condition. This filter selects the documents. Pass both to an update on the collection:
var filter = Builders<YourModel>.Filter
    .Where(document => 17 >= document.Min && -105 <= document.Max);
await collection.UpdateManyAsync(filter, update);
Note that this removes every matching element from each matched document, not just the first one.

Finding a solution for the removal was not easy, but I got help on the MongoDB forum in Slack. There are two ways to solve this problem:
using aggregation expressions in 4.2, for values where the position is known or unknown (Asya's answer): https://jira.mongodb.org/browse/SERVER-1014?focusedCommentId=2305681&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-2305681
using $unset followed by $pullAll to remove all nulls

LINQ changing local "let" variable

Using a LINQ query (with C#), how would I go about doing something like this (pseudocode)?
I'd like to do something like this in places where, for example, I might generate 1000s of lists of 100s of random (bounded) integers, and I want to track the list with the smallest sum as they're generated.
Best <- null value
Foreach N in Iterations
    NewList <- List of 100 randomly generated numbers
    If Best is null
        Best <- NewList
    If Sum(NewList) < Sum(Best)
        Best <- NewList
Select Best
I've tried all sorts of things, but I can't really get it working. This isn't for any kind of project or work, just for my own curiosity!
Example of what I was thinking:
let R = new Random()
let Best = Enumerable.Range(0, 100).Select(S => R.Next(-100, 100)).ToArray()
//Where this from clause is acting like a for loop
from N in Iterations
let NewList = Enumerable.Range(0, 100).Select(S => R.Next(-100, 100))
Best = (NewList.Sum() < Best.Sum())? NewList : Best;
select Best

I believe you are looking for fold (aka "reduce"), which is known as Aggregate in LINQ.
(Enumerable.Min/Max are special cases, but can be written in terms of fold/Aggregate.)
int Max (IEnumerable<int> x) {
return x.Aggregate(int.MinValue, (prev, cur) => prev > cur ? prev : cur);
}
Max(new int[] { 1, 42, 2, 3 }); // 42
Happy coding.
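
Applied to the question's pseudocode, the same fold might look roughly like the sketch below. The names (BestListPicker, PickSmallestSum, RandomLists) are invented for this example. Aggregate without a seed uses the first list as the initial "best", which plays the role of the null check in the pseudocode, and it throws on an empty sequence.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BestListPicker
{
    // Hypothetical helper: folds over the candidates and keeps the
    // one with the smallest sum (the first one wins on ties).
    public static int[] PickSmallestSum(IEnumerable<int[]> candidates)
    {
        return candidates.Aggregate(
            (best, next) => next.Sum() < best.Sum() ? next : best);
    }

    // Hypothetical helper: lazily generates `iterations` arrays of
    // `length` random integers in [-100, 100).
    public static IEnumerable<int[]> RandomLists(
        Random random, int iterations, int length)
    {
        return Enumerable.Range(0, iterations)
            .Select(i => Enumerable.Range(0, length)
                .Select(j => random.Next(-100, 100))
                .ToArray());
    }
}
```

A call such as BestListPicker.PickSmallestSum(BestListPicker.RandomLists(new Random(), 1000, 100)) generates the lists one at a time and only ever holds the current best and the newest list. The sketch recomputes best.Sum() on every step. Carrying the sum along in the accumulator would avoid that, at the cost of a slightly noisier lambda.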

Looks like you're just selecting the minimum value.
var minimum = collection.Min( c => c );

You are effectively finding the minimum value in the collection, if it exists:
int? best = null;
if (collection != null && collection.Length > 0) best = collection.Min();

Linq TakeWhile depending on sum (or aggregate) of elements

I have a list of elements and want to take elements while the sum (or any other aggregation of the elements) satisfies a certain condition. The following code does the job, but I am pretty sure this is not an unusual problem, so a proper pattern for it should exist.
var list = new List<int> { 1, 2, 3, 4, 5, 6, 7 };
int tmp = 0;
var listWithSum = from x in list
let sum = tmp+=x
select new {x, sum};
int MAX = 10;
var result = from x in listWithSum
where x.sum < MAX
select x.x;
Does somebody know how to solve the task in a nicer way, perhaps by combining TakeWhile and Aggregate into one query?
Thanks

It seems to me that you want something like the Scan method from Reactive Extensions (the System.Interactive part) - it's like Aggregate, but yields a sequence instead of a single result. You could then do:
var listWithSum = list.Scan(new { Value = 0, Sum = 0 },
    (current, next) => new { Value = next,
                             Sum = current.Sum + next });
var result = listWithSum.TakeWhile(x => x.Sum < MaxTotal)
    .Select(x => x.Value);
(MoreLINQ has a similar operator, btw - but currently it doesn't support the idea of the accumulator and input sequence not being the same type.)
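
If pulling in an extra library is not an option, the same idea can be sketched as a small iterator method. TakeWhileAggregate is a made-up name here, not an operator from LINQ, System.Interactive or MoreLINQ, and argument checks are left out to keep it short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RunningAggregateExtensions
{
    // Hypothetical helper: yields items for as long as the running
    // aggregate (including the current item) satisfies the predicate.
    public static IEnumerable<TSource> TakeWhileAggregate<TSource, TAccumulate>(
        this IEnumerable<TSource> source,
        TAccumulate seed,
        Func<TAccumulate, TSource, TAccumulate> accumulate,
        Func<TAccumulate, bool> predicate)
    {
        var accumulator = seed;
        foreach (var item in source)
        {
            accumulator = accumulate(accumulator, item);

            // Stop at the first item that pushes the aggregate
            // outside the allowed condition.
            if (!predicate(accumulator)) yield break;

            yield return item;
        }
    }
}
```

With the list from the question, list.TakeWhileAggregate(0, (sum, x) => sum + x, sum => sum < 10) yields 1, 2, 3. Unlike the where-based query in the question, it stops enumerating as soon as the condition fails, and it keeps the running sum local instead of in a captured variable.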
