LINQ: Removing duplicates with GroupBy - c#

I have code similar to this post:
List<MyObject> myList = new List<MyObject>()
{
    new MyObject { Number1 = 1, Number2 = 2, Number3 = 3 },
    new MyObject { Number1 = 1, Number2 = 2, Number3 = 3 }
};
var listWithoutDuplicated = myList
    .GroupBy(x => new { x.Number1, x.Number2, x.Number3 })
    .Select(x => x.First());
int counter = 0;
foreach (var item in listWithoutDuplicated)
{
    counter++;
}
That code returns counter = 1, so it works fine. But why is .Select(x => x.First()) necessary, rather than just .First() at the end?
// This code would not remove duplicates.
var listWithoutDuplicated = myList
.GroupBy(x => new { x.Number1, x.Number2, x.Number3 })
.First();
Thanks a lot.

.First() returns the first group from a sequence of groups.
Select(group => group.First()) takes a sequence of groups and returns a new sequence containing the first item of each group.
Both are entirely valid things to do, but they are extremely different. You can see this by printing out the results, or just by looking at the type of the result (this would be more visible in the code if you did not use var).
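To make the difference concrete, here is a minimal sketch. MyObject mirrors the class from the question; the class GroupByDemo and the helpers FirstGroup and FirstOfEachGroup are names made up for this illustration. Both helpers return lists so that their sizes can be compared directly.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class MyObject
{
    public int Number1 { get; set; }
    public int Number2 { get; set; }
    public int Number3 { get; set; }
}

public static class GroupByDemo
{
    // Hypothetical helper: the first group only. It still contains every duplicate of that key.
    public static List<MyObject> FirstGroup(IEnumerable<MyObject> items)
    {
        return items
            .GroupBy(x => new { x.Number1, x.Number2, x.Number3 })
            .First()
            .ToList();
    }

    // Hypothetical helper: one representative per group, i.e. the input without duplicates.
    public static List<MyObject> FirstOfEachGroup(IEnumerable<MyObject> items)
    {
        return items
            .GroupBy(x => new { x.Number1, x.Number2, x.Number3 })
            .Select(g => g.First())
            .ToList();
    }

    public static void Main()
    {
        var myList = new List<MyObject>
        {
            new MyObject { Number1 = 1, Number2 = 2, Number3 = 3 },
            new MyObject { Number1 = 1, Number2 = 2, Number3 = 3 },
            new MyObject { Number1 = 1, Number2 = 2, Number3 = 3 },
            new MyObject { Number1 = 4, Number2 = 5, Number3 = 6 }
        };

        Console.WriteLine(FirstGroup(myList).Count);       // 3: all items of the first group
        Console.WriteLine(FirstOfEachGroup(myList).Count); // 2: one item per distinct key
    }
}
```

With only two identical items, as in the question, .First() on its own would hand back a group that still holds both of them, which is why it does not remove duplicates.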

Related

How to return a unique element from a C# array

Problem
I am trying to return a unique element from a C# array with the built-in call array.Distinct().First(). It works for some arrays but returns an incorrect element for the array {11,11,11,14,11,11}. It gives me an output of 11, which is incorrect because the unique element in that array is supposed to be 14.
I know there is a way to do this with a predicate function, but I do not know how, which is why I am asking.
Code
public static int getUnique(IEnumerable<int> numbers){
return numbers.Distinct().First();
//you can bet i tried .FirstOrDefault()
//without any success
}
Any help to make that code return the correct value is highly appreciated.
That's not what Distinct does.
It'll return distinct elements from a sequence, thus removing the duplicates.
You only want the items that have no duplicates to begin with:
var numbers = new int[] { 11, 11, 11, 14, 11 };
foreach (var i in numbers.GroupBy(i => i).Where(g => g.Count() == 1).Select(g => g.Key))
Console.WriteLine(i);
You obviously want the first non-duplicate value, so you'd get something like this:
var numbers = new int[] { 11, 11, 11, 14, 11 };
// Cast the key to int? so that FirstOrDefault() yields null, not 0, when nothing is unique.
int? firstNonDuplicate = numbers.GroupBy(i => i).Where(g => g.Count() == 1).Select(g => (int?)g.Key).FirstOrDefault();
Console.WriteLine(firstNonDuplicate);
Be wary of your null checks here though; this just proves a point.
Distinct returns all distinct elements. You can get a unique value in the following way:
int[] numbers = { 11, 11, 11, 14, 11 };
Console.WriteLine(numbers.GroupBy(i => i).Where(g => g.Count() == 1).Select(g => g.Key).First());
Generally speaking you should group your collection by a key and then filter that by the number of containing elements:
var groupings= numbers.GroupBy(x => x)
.Where(x => x.Count() == 1)
.Select(x => x.Key);
Now that gives you all unique numbers. Your question is a bit vague, because you want "a" unique number, not "the" unique number, so there is room for interpretation about what should happen if there are multiple unique numbers.
Option 1: Just take the first result:
var uniques = numbers.GroupBy(x => x)
.Where(x => x.Count() == 1)
.Select(x => x.Key)
.First();
Option 1.1: Order the results and take the smallest (or largest):
var uniques = numbers.GroupBy(x => x)
.Where(x => x.Count() == 1)
.Select(x => x.Key)
.OrderBy(x => x)
.First();
Option 2: Ensure there is only one unique number and throw otherwise:
var uniques = numbers.GroupBy(x => x)
.Where(x => x.Count() == 1)
.Select(x => x.Key)
.Single();
Be aware: if there are no unique numbers, Single() and First() will throw, whereas SingleOrDefault() and FirstOrDefault() will return the default value of int, which is 0 and can lead to false results. Consider changing the element type to int? so that null is returned if there are no unique numbers.
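A minimal sketch of the int? variant, assuming you want the first value that occurs exactly once. The class UniqueFinder and the method FirstUniqueOrNull are hypothetical names used only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UniqueFinder
{
    // Hypothetical helper: first value that occurs exactly once, or null if there is none.
    public static int? FirstUniqueOrNull(IEnumerable<int> numbers)
    {
        return numbers
            .GroupBy(x => x)
            .Where(g => g.Count() == 1)
            .Select(g => (int?)g.Key) // cast so that the default value is null rather than 0
            .FirstOrDefault();
    }

    public static void Main()
    {
        int? found = FirstUniqueOrNull(new[] { 11, 11, 11, 14, 11, 11 });
        int? missing = FirstUniqueOrNull(new[] { 0, 0, 7, 7 });

        Console.WriteLine(found.HasValue ? found.Value.ToString() : "none");
        Console.WriteLine(missing.HasValue ? missing.Value.ToString() : "none");
    }
}
```

Because the cast happens before FirstOrDefault(), an input whose only unique value is 0 can still be told apart from an input with no unique value at all.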
You can try below code.
var numbers = new int[] { 11, 11, 11, 14, 11, 11 };
var uniqueList = numbers.GroupBy(n => n).Where(item => item.Count() == 1).Select(item => item.Key);
foreach (var item in uniqueList)
Console.WriteLine(item);
I made a special method for you. The approach is somewhat primitive, but with this method you can easily find the unique values.
public static int[] getUniqiue(int[] vs)
{
List<int> vs1 = new List<int>(vs);
List<int> vs2 = new List<int>(vs);
List<int> ee = new List<int>();
List<int> vs3 = new List<int>();
int i = 0;
foreach (var item in vs1)
{
vs2.Remove(item);
if(vs3.Contains(item) || vs2.Contains(item))
{
vs3.Add(item);
}
else
{
ee.Add(item);
}
i++;
}
return ee.ToArray();
}

Group the indexes of the same elements in a array in C#

There is an int[] array that stores different numbers.
What I want is to group the indexes of equal numbers in the array into the same groups.
For example, the array is int[5]{1,2,5,1,5}
I would like the output to be List<List<int>> { {0,3}, {1}, {2,4} } // don't mind syntax
It would be best if LINQ (or a more efficient way) could be used. Thanks for the help.
You can simply use GroupBy and the position obtained from the Select overload:
int[] array = { 1, 2, 5, 1, 5 };
var result = array.Select((v, idx) => new { Value = v, Index = idx })
    .GroupBy(g => g.Value)
    .Select(g => g.Select(x => x.Index).ToArray()) // inner array of indexes
    .ToArray(); // outer array
One way:
var result = myArray.Select((elem, idx) => new { Value = elem, Idx = idx})
.GroupBy(proxy => proxy.Value);
foreach (var grouped in result)
{
Console.WriteLine("Element {0} has indexes: {1}",
grouped.Key,
string.Join(", ", grouped.Select(proxy => proxy.Idx).ToArray()));
}
var myFinalList = result.Select(group => group.Select(proxy => proxy.Idx).ToList()).ToList();
You can use Enumerable.Range combined with GroupBy:
int[] arr = { 1, 2, 5, 1, 5 };
var result = Enumerable.Range(0, arr.Length)
.GroupBy(i => arr[i])
.Select(x => x.ToList()).ToList();

How to rank a list with original order in c#

I want to make a ranking from a list and output it on original order.
This is my code so far:
var data = new[] { 7.806468478, 7.806468478, 7.806468478, 7.173501754, 7.173501754, 7.173501754, 3.40877696, 3.40877696, 3.40877696,
4.097010736, 4.097010736, 4.097010736, 4.036494085, 4.036494085, 4.036494085, 38.94333318, 38.94333318, 38.94333318, 14.43588131, 14.43588131, 14.43588131 };
var rankings = data.OrderByDescending(x => x)
.GroupBy(x => x)
.SelectMany((g, i) =>
g.Select(e => new { Col1 = e, Rank = i + 1 }))
.ToList();
However, the result comes out in descending order.
What I want is to display the ranks in the original order of the data.
e.g.: Rank = 3, Rank = 3, Rank = 3, Rank = 4, Rank = 4, Rank = 4, etc...
Thank You.
Using what you have, one method would be to keep track of the original order and sort a second time (ugly and potentially slow):
var rankings = data.Select((x, i) => new {Item = x, Index = i})
.OrderByDescending(x => x.Item)
.GroupBy(x => x.Item)
.SelectMany((g, i) =>
g.Select(e => new {
Index = e.Index,
Item = new { Col1 = e.Item, Rank = i + 1 }
}))
.OrderBy(x => x.Index)
.Select(x => x.Item)
.ToList();
I would instead suggest creating a dictionary with your rankings and joining this back with your list:
var rankings = data.Distinct()
.OrderByDescending(x => x)
.Select((g, i) => new { Key = g, Rank = i + 1 })
.ToDictionary(x => x.Key, x => x.Rank);
var output = data.Select(x => new { Col1 = x, Rank = rankings[x] })
.ToList();
As @AntonínLejsek kindly pointed out, replacing the GroupBy call with Distinct(), as done above, is the way to go.
Note doubles are not a precise type and thus are really not a good candidate for values in a lookup table, nor would I recommend using GroupBy/Distinct with a floating-point value as a key. Be mindful of your precision and consider using an appropriate string conversion. In light of this, you may want to define an epsilon value and forgo LINQ's GroupBy entirely, opting instead to encapsulate each data point into a (non-anonymous) reference type, then loop through a sorted list and assign ranks. For example (disclaimer: untested):
class DataPoint
{
    public double Value { get; set; } // double, to match the data and the epsilon comparison below
    public int Rank { get; set; }
}
var dataPointsPreservingOrder = data.Select(x => new DataPoint {Value = x}).ToList();
var sortedDescending = dataPointsPreservingOrder.OrderByDescending(x => x.Value).ToList();
var epsilon = 1E-15; //use a value that makes sense here
int rank = 0;
double? currentValue = null;
foreach(var x in sortedDescending)
{
if(currentValue == null || Math.Abs(x.Value - currentValue.Value) > epsilon)
{
currentValue = x.Value;
++rank;
}
x.Rank = rank;
}
Looking at the data, you will need to iterate twice over the result set.
The first iteration captures the rankings:
var sorted = data
.OrderByDescending(x => x)
.GroupBy(x => x)
.Select((g, i) => new { Col1 = g.First(), Rank = i + 1 })
.ToList();
Now we have a ranking of highest to lowest with the correct rank value. Next we iterate the data again to find where the value exists in the overall ranks as:
var rankings = (from i in data
let rank = sorted.First(x => x.Col1 == i)
select new
{
Col1 = i,
Rank = rank.Rank
}).ToList();
This results in a ranked list in the original order of the data.
A bit shorter:
var L = data.Distinct().ToList(); // because SortedSet<T> doesn't have BinarySearch :[
L.Sort();
var rankings = Array.ConvertAll(data,
x => new { Col1 = x, Rank = L.Count - L.BinarySearch(x) });

Foreach Loop In LINQ in C#

I would like to replace the foreach loop in the following code with LINQ ForEach() Expression:
List<int> idList = new List<int>() { 1, 2, 3 };
IEnumerable<string> nameList = new List<string>();
foreach (int id in idList)
{
var Name = db.Books.Where(x => x.BookId == id).Select(x => x.BookName);
nameList.Add(Name);
}
Any Help Please!!
Your code doesn't quite work (you're adding an IEnumerable<string> to a List<string>). You also won't need ForEach, since you're constructing the list:
You can do this:
var nameList = idList.SelectMany(id => db.Books.Where(x => x.BookId == id)
.Select(x => x.BookName)).ToList();
But then you're hitting the database for each ID. You can grab all the books at once with:
var nameList = db.Books.Where(b => idList.Contains(b.BookId))
.Select(b => b.BookName).ToList();
Which will only hit the database once.
Why not a select?
List<int> idList = new List<int>() { 1, 2, 3 };
// SelectMany flattens the per-id query results into a single List<string>.
List<string> nameList = idList
    .SelectMany(id => db.Books.Where(x => x.BookId == id).Select(x => x.BookName))
    .ToList();
Or better yet, refactor and select:
int[] idList = new int[] { 1, 2, 3 };
List<string> nameList = db.Books
    .Where(x => idList.Contains(x.BookId))
    .Select(x => x.BookName)
    .ToList();
nameList.AddRange(
db.Books.Where(x => idList.Contains(x.BookId))
.Select(x => x.BookName)
.ToList());
This will generate an IN statement in the SQL, thereby only doing a single select.
One thing to be aware of is the performance of IN degrades as the set (idList in this case) gets bigger. In the case of a large set, you can batch the set and do multiple queries:
int start = 0;
int batch = 1000;
while (start < idList.Count())
{
var batchSet = idList.Skip(start).Take(batch);
nameList.AddRange(
db.Books.Where(x => batchSet.Contains(x.BookId))
.Select(x => x.BookName)
.ToList());
start += batch;
}
To answer your specific question, you can do this:
List<int> idList = new List<int>() { 1, 2, 3 };
List<string> nameList = new List<string>();
idList.ForEach(id => {
var names = db.Books.Where(x => x.BookId == id).Select(x => x.BookName);
nameList.AddRange(names); // the query returns a sequence, so use AddRange rather than Add
});

find the first available long in a List<long>

OK, this should be interesting.
Let's assume I have the following code.
In this example, the first available number would be 2:
List<long> myList = new List<long>(){0,1,10,3};
In this example, the first available number would be 4:
List<long> myList = new List<long>(){0,1,2,3};
Any ideas?
So by "available" you mean "the lowest non-negative number which doesn't already exist in the list"?
I'd be tempted to write something like:
HashSet<long> existing = new HashSet<long>(list);
for (long x = 0; x < long.MaxValue; x++)
{
if (!existing.Contains(x))
{
return x;
}
}
throw new InvalidOperationException("Somehow the list is enormous...");
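For reference, here is the same loop wrapped in a method so it can be run as-is. This is only a sketch: the class AvailableNumbers and the method FirstAvailable are hypothetical names, and the two inputs are the lists from the question.

```csharp
using System;
using System.Collections.Generic;

public static class AvailableNumbers
{
    // Hypothetical helper: lowest non-negative value that is not present in the list.
    public static long FirstAvailable(IEnumerable<long> list)
    {
        var existing = new HashSet<long>(list);
        for (long x = 0; x < long.MaxValue; x++)
        {
            if (!existing.Contains(x))
            {
                return x;
            }
        }
        throw new InvalidOperationException("Somehow the list is enormous...");
    }

    public static void Main()
    {
        Console.WriteLine(FirstAvailable(new List<long> { 0, 1, 10, 3 })); // 2
        Console.WriteLine(FirstAvailable(new List<long> { 0, 1, 2, 3 }));  // 4
    }
}
```

The HashSet makes each lookup O(1) on average, so the whole search is roughly linear in the size of the list.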
EDIT: Alternatively, you could order the list and then find the first value where the index isn't the same as the value...
var ordered = list.OrderBy(x => x);
var differences = ordered.Select((value, index) => new { value, index })
.Where(pair => pair.value != pair.index)
.Select(pair => (int?) pair.index);
var firstDifference = differences.FirstOrDefault();
long nextAvailable = firstDifference ?? list.Count;
The last line is to take care of the situation where the list is contiguous from 0. Another alternative would be:
var nextAvailable = list.Concat(new[] { long.MaxValue })
.OrderBy(x => x)
.Select((value, index) => new { value, index })
.Where(pair => pair.value != pair.index)
.Select(pair => pair.index)
.First();
This should be fine so long as the list doesn't contain long.MaxValue + 1 elements, which it can't in current versions of .NET. (That's a lot of memory...) To be honest, this will already have problems when it goes beyond int.MaxValue elements due to the Select part taking an int index...
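// The lowest missing non-negative value always lies in [0, list.Count], so no sorting is needed.
// Enumerable.Range works on int, so the candidates are converted to long before comparing.
var range = Enumerable.Range(0, list.Count + 1).Select(i => (long)i);
var number = range.Except(list).First();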
