Is there a LINQ equivalent to the Unix command uniq? - C#

Every search I make assumes "Distinct()", but this is NOT my requirement. I just want to collapse consecutive repeats. Are there any options using LINQ (i.e. the Enumerable extensions)?
For example (in C#):
int[] input = new [] {1,2,3,3,4,5,5,5,6,6,5,4,4,3,2,1,6};
int[] expected = new [] {1,2,3,4,5,6,5,4,3,2,1,6};

You are asking for non-repeating elements, not unique elements. LINQ-to-Objects operations are essentially iterators. You could write your own iterator method that only yields an item when it differs from the previous one, e.g.:
public static IEnumerable<int> DistinctUntilChanged(this IEnumerable<int> source)
{
    int? previous = null;
    foreach (var item in source)
    {
        if (item != previous)
        {
            previous = item;
            yield return item;
        }
    }
}

var input = new[] { 1, 2, 3, 3, 4, 5, 5, 5, 6, 6, 5, 4, 4, 3, 2, 1, 6 };
var result = input.DistinctUntilChanged().ToArray();
The result will be:
{1,2,3,4,5,6,5,4,3,2,1,6};
UPDATE
Another option is to use Observable.DistinctUntilChanged from the System.Reactive library, e.g.:
var input = new[] { 1, 2, 3, 3, 4, 5, 5, 5, 6, 6, 5, 4, 4, 3, 2, 1, 6 };
var result = input.ToObservable()
                  .DistinctUntilChanged()
                  .ToEnumerable()
                  .ToArray();
System.Reactive (Reactive Extensions) is meant to handle sequences of events using the basic LINQ operators and more. It's easy to convert between Observable and Enumerable though, with ToObservable() and ToEnumerable(), so they can be used to handle any collection. After all, an event sequence is similar to an "infinite" sequence.
UPDATE 2
In case there's any confusion about the use of int? to store the previous number, it's to allow easy comparison even with the first element of the source without actually calling First() on it. If it were, e.g., int previous = 0; and the first element was 0, the comparison would filter out the first element.
By using an int? in C#, an int option in F#, or a Maybe<int> if we have a Maybe monad, we can differentiate between no initial value and an initial value of 0.
Observable.DistinctUntilChanged uses a flag to track whether we are at the first element. The equivalent code would be:
public static IEnumerable<int> NonRepeating(this IEnumerable<int> source)
{
    int previous = 0;
    bool isAssigned = false;
    foreach (var item in source)
    {
        if (!isAssigned || item != previous)
        {
            isAssigned = true;
            previous = item;
            yield return item;
        }
    }
}
MoreLINQ
Finally, one can use the GroupAdjacent method from the MoreLinq library to group repeating items together. Each group contains the repeating source elements. In this particular case though we only need the key values:
var result = input.GroupAdjacent(i => i).Select(i => i.Key).ToArray();
The nice thing about GroupAdjacent is that the elements can be transformed while grouping, e.g.:
input.GroupAdjacent(i => i, i => $"Number {i}")
would return groupings of strings.
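For instance, a small sketch (assuming using MoreLinq; and the same input array as above) that prints each group of transformed strings:
// Each adjacent run becomes one grouping of "Number {i}" strings.
foreach (var group in input.GroupAdjacent(i => i, i => $"Number {i}"))
    Console.WriteLine($"{group.Key}: {string.Join(", ", group)}");
// e.g. the adjacent 3s print as "3: Number 3, Number 3"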

It is possible with LINQ, although for performance and readability a simple for loop would probably be the better option.
int[] input = new[] { 1, 2, 3, 3, 4, 5, 5, 5, 6, 6, 5, 4, 4, 3, 2, 1, 6 };
var result = input.Where((x, i) => i == 0 || x != input[i - 1]).ToArray();
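For comparison, a minimal sketch of the plain loop mentioned above (same input array; assumes using System.Collections.Generic; and the name loopResult is just illustrative):
// Keep an element only when it differs from the element immediately before it.
var kept = new List<int>();
for (int i = 0; i < input.Length; i++)
{
    if (i == 0 || input[i] != input[i - 1])
        kept.Add(input[i]);
}
int[] loopResult = kept.ToArray();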

Related

PriorityQueue containing array C#

I would like to create a PriorityQueue to store int[]. The first element in the array is going to be the criterion for the comparisons.
I could do that easily in Java, but I could not convert it to C#. Could you please guide me?
Priority queues don't work the same way in both languages. What you're trying to do is the Java way of giving PQ a lambda (function) to compare any two elements. In C#, you give each element a priority when adding it to the queue, and then make a comparer to compare different priorities.
PriorityQueue<int[], int> pq = new(Comparer<int>.Create((a, b) => a - b));
// The Comparer compares the *priorities*, not the elements
pq.Enqueue(new int[] { 1, 2, 3, 4 }, 5);
pq.Enqueue(new int[] { 1, 2, 3, 4 }, 0); // This has more priority
while (pq.TryDequeue(out int[]? arr, out int priority))
{
    Console.WriteLine(priority); // 0; 5
}
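Applied to the question's requirement (the array's first element drives the ordering), a hedged sketch is to pass that first element as the priority when enqueuing; the names here are illustrative:
// Use each array's first element as its priority, so arrays dequeue in
// ascending order of their first element (the default int comparer).
var byFirstElement = new PriorityQueue<int[], int>();
int[] first = { 5, 2, 3, 4 };
int[] second = { 1, 9, 9, 9 };
byFirstElement.Enqueue(first, first[0]);
byFirstElement.Enqueue(second, second[0]);
while (byFirstElement.TryDequeue(out int[]? array, out int key))
{
    Console.WriteLine(key); // prints 1, then 5
}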
You may be interested in just a simple List and LINQ:
using System.Linq; // at the top of your code to include LINQ
List<int[]> list = new();
list.Add(new int[] { 1, 2, 3, 4 });
list.Add(new int[] { 5, 2, 3, 4 });
IEnumerable<int[]> ordered = list.OrderBy(x => x[0]); // orders by the first element

Group elements of the data set if they are next to each other with LINQ

I have a data set (e.g. 1, 1, 4, 6, 3, 3, 1, 2, 2, 2, 6, 6, 6, 7) and I want to group items of the same value, but only if they appear next to each other at least 3 times.
Is there a way?
I've tried using combinations of Count and GroupBy and Select in every way I know, but I can't find the right one.
Or if it can't be done with LINQ then maybe some other way?
I don't think I'd strive for a 100% LINQ solution for this:
var r = new List<List<int>>() { new() { source.First() } };
foreach (var e in source.Skip(1))
{
    if (e == r.Last().Last()) r.Last().Add(e);
    else r.Add(new() { e });
}
return r.Where(l => l.Count > 2);
The .Last() calls can be replaced with [^1] if you like
This works like:
Have an output that is a list of lists
Put the first item of the input into the output
For the second input item onward, if the input item is the same as the last int in the output, add the input item to the last list in the output
Otherwise make a new list containing the input int and add it onto the end of the output lists
Keep only those output lists longer than 2
The output is then like:
[
[2,2,2],
[6,6,6]
]
Aggregate can be pushed into doing the same thing; it's simply an accumulator (r), an iteration (the foreach), and an operation on the result (the Where):
var result = source.Skip(1).Aggregate(
    new List<List<int>>() { new List<int> { source.First() } },
    (r, e) => {
        if (e == r.Last().Last()) r.Last().Add(e);
        else r.Add(new List<int>() { e });
        return r;
    },
    r => r.Where(l => l.Count > 2)
);
..but would you want to be the one to explain it to the new dev?
Another LINQy way would be to establish a counter that increments by one each time the value in the source array changes compared to the previous value, then group by this integer, and return only the groups of 3+, but I don't like this so much because it's a bit "WTF":
var source = new[] { 1, 1, 4, 6, 3, 3, 1, 2, 2, 2, 6, 6, 6, 7 };
int ctr = 0;
var result = source.Select(
        (e, i) => new[] { i == 0 || e != source[i - 1] ? ++ctr : ctr, e }
    )
    .GroupBy(
        arr => arr[0],
        arr => arr[1]
    )
    .Where(g => g.Count() > 2);
You could consider using the GroupAdjacent or the RunLengthEncode operators from the MoreLinq package. The former groups adjacent elements in the sequence that have the same key; the key is retrieved by invoking a keySelector lambda parameter. The latter compares adjacent elements and emits a single KeyValuePair<T, int> for each series of equal elements, where the int value represents the number of consecutive equal elements. Example:
var source = new[] { 1, 1, 4, 6, 3, 3, 1, 2, 2, 2, 6, 6, 6, 7 };
IEnumerable<IGrouping<int, int>> grouped = MoreLinq.MoreEnumerable
    .GroupAdjacent(source, x => x);
foreach (var group in grouped)
{
    Console.WriteLine($"Key: {group.Key}, Elements: {String.Join(", ", group)}");
}
Console.WriteLine();
IEnumerable<KeyValuePair<int, int>> pairs = MoreLinq.MoreEnumerable
    .RunLengthEncode(source);
foreach (var pair in pairs)
{
    Console.WriteLine($"Key: {pair.Key}, Value: {pair.Value}");
}
Output:
Key: 1, Elements: 1, 1
Key: 4, Elements: 4
Key: 6, Elements: 6
Key: 3, Elements: 3, 3
Key: 1, Elements: 1
Key: 2, Elements: 2, 2, 2
Key: 6, Elements: 6, 6, 6
Key: 7, Elements: 7
Key: 1, Value: 2
Key: 4, Value: 1
Key: 6, Value: 1
Key: 3, Value: 2
Key: 1, Value: 1
Key: 2, Value: 3
Key: 6, Value: 3
Key: 7, Value: 1
In the above example I've used the operators as normal methods, because I am not a fan of adding using MoreLinq; and "polluting" the IntelliSense of Visual Studio with all the specialized operators of the MoreLinq package. An alternative is to enable each operator selectively like this:
using static MoreLinq.Extensions.GroupAdjacentExtension;
using static MoreLinq.Extensions.RunLengthEncodeExtension;
If you don't like the idea of adding a dependency on a third-party package, you could grab the source code of these operators (1, 2), and embed it directly into your project.
If you're nostalgic and like stuff like the Obfuscated C code contest, you could solve it like this (no best-practice claims included):
int[] n = { 1, 1, 4, 6, 3, 3, 1, 2, 2, 2, 6, 6, 6, 7 };
var t = new int[n.Length][];
for (var i = 0; i < n.Length; i++)
    t[i] = new[] { n[i], i == 0 ? 0 : n[i] == n[i - 1] ? t[i - 1][1] : t[i - 1][1] + 1 };
var r = t.GroupBy(x => x[1], x => x[0])
    .Where(g => g.Count() > 2)
    .SelectMany(g => g);
Console.WriteLine(string.Join(", ", r));
In the end LINQ is likely not the best solution here.
A simple for loop with a few additional loop variables to track the "group index" and the last value likely makes more sense, even if it means a couple more lines of code.
I wouldn't use LINQ just to use LINQ.
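To illustrate, a minimal sketch of such a loop (assuming the same source array, the "3 or more adjacent" rule from the question, and the usual System.Linq / System.Collections.Generic usings; names are placeholders):
// Single pass tracking the current run of equal values,
// then keep only the runs with 3 or more elements.
var source = new[] { 1, 1, 4, 6, 3, 3, 1, 2, 2, 2, 6, 6, 6, 7 };
var groups = new List<List<int>>();
List<int>? currentRun = null;
foreach (var value in source)
{
    if (currentRun == null || currentRun[^1] != value)
    {
        currentRun = new List<int>(); // value changed: start a new run
        groups.Add(currentRun);
    }
    currentRun.Add(value);
}
var result = groups.Where(g => g.Count >= 3).ToList(); // [[2, 2, 2], [6, 6, 6]]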
I'd rather suggest using a simple for loop to loop over your input array and populate the output list. To keep track of which number is currently being repeated (if any), I'd use a variable (repeatedNumber) that's initially set to null.
In this approach, a number can only be assigned to repeatedNumber if it fulfills the minimum requirement of repeated items. Hence, for your example input, repeatedNumber would start at null, then eventually be set to 2, then be set to 6, and then be reset to null.
One place where LINQ is perhaps useful here is to check whether the minimum requirement of repeated items is fulfilled for a given item in input, by checking the necessary consecutive items in input:
input
.Skip(items up to and including current item)
.Take(minimum requirement of repeated items - 1)
.All(equal to current item)
I'll name this minimum requirement of repeated items repetitionRequirement. (In your question post, repetitionRequirement is 3.)
The logic in the for loop goes as follows:
number = input[i]
If number is equal to repeatedNumber, it means that the previously repeated item continues being repeated
Add number to output
Otherwise, if the minimum requirement of repeated items is fulfilled for number (i.e. if the repetitionRequirement - 1 items directly following number in input are all equal to number), it means that number is the first instance of a new repeated item
Set repeatedNumber equal to number
Add number to output
Otherwise, if repeatedNumber has value, it means that the previously repeated item just ended its repetition
Set repeatedNumber to null
Here is a suggested implementation:
(I'd suggest finding a more descriptive method name)
//using System.Collections.Generic;
//using System.Linq;
public static List<int> GetOutput(int[] input, int repetitionRequirement)
{
    var consecutiveCount = repetitionRequirement - 1;
    var output = new List<int>();
    int? repeatedNumber = null;
    for (var i = 0; i < input.Length; i++)
    {
        var number = input[i];
        if (number == repeatedNumber)
        {
            output.Add(number);
        }
        else if (i + consecutiveCount < input.Length &&
            input.Skip(i + 1).Take(consecutiveCount).All(num => num == number))
        {
            repeatedNumber = number;
            output.Add(number);
        }
        else if (repeatedNumber.HasValue)
        {
            repeatedNumber = null;
        }
    }
    return output;
}
By calling it with your example input:
var dataSet = new[] { 1, 1, 4, 6, 3, 3, 1, 2, 2, 2, 6, 6, 6, 7 };
var output = GetOutput(dataSet, 3);
you get the following output:
{ 2, 2, 2, 6, 6, 6 }

Will the result of a LINQ query always be guaranteed to be in the correct order?

Question: Will the result of a LINQ query always be guaranteed to be in the correct order?
Example:
int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };
var lowNums =
from n in numbers
where n < 5
select n;
Now, when we walk through the entries of the query result, will they be in the same order as the input array numbers?
foreach (var x in lowNums)
{
Console.WriteLine(x);
}
If someone can provide a note on the ordering in the documentation, this would be perfect.
For LINQ to Objects: Yep.
For Parallel LINQ: Nope.
For LINQ to Expression Trees (EF, L2S, etc): Nope.
I think the order of elements retrieved by a LINQ query is preserved, at least for LINQ to Objects; for LINQ to SQL or Entities, it may depend on the order of the records in the table. For LINQ to Objects, I'll try explaining why it preserves the order.
In fact, when the LINQ query is executed, GetEnumerator() is called on the IEnumerable source to start looping, and the next element is obtained with MoveNext(). This is how a foreach works on an IEnumerable source, and we all know that a foreach preserves the order of the elements in a list/collection. Digging more deeply into MoveNext(), it just keeps a position that stores the current index; each call advances the position and yields the corresponding element (at the new position). That's why it should preserve the order; the original order only changes when you explicitly call OrderBy or OrderByDescending.
If you think this
int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };
foreach(var i in numbers)
if(i < 5) Console.Write(i + " ");
prints out 4 1 3 2 0 you should think this
int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };
IEnumerator ie = numbers.GetEnumerator();
while (ie.MoveNext())
{
    if ((int)ie.Current < 5) Console.Write(ie.Current + " ");
}
also prints out 4 1 3 2 0. Hence this LINQ query
var lowNums = from n in numbers
              where n < 5
              select n;
foreach (var i in lowNums)
{
    Console.Write(i + " ");
}
should also print out 4 1 3 2 0.
Conclusion: The order of elements in a LINQ result depends on how MoveNext() of the IEnumerator obtained from the IEnumerable is implemented. However, it's certain that the order of elements in a LINQ to Objects result will be the same order in which a foreach loop visits the elements.

IComparable<T> gives stackoverflow when used for negative numbers?

This is a weird problem. I have implemented a simple quicksort as follows:
static void Main(string[] args)
{
    List<int> unsorted = new List<int> { 1, 3, 5, 7, 9, 8, 6, 4, 2 };
    List<int> sorted = quicksort(unsorted);
    Console.WriteLine(string.Join(",", sorted));
    Console.ReadKey();
}

private static List<T> quicksort<T>(List<T> arr) where T : IComparable<T>
{
    List<T> loe = new List<T>(), gt = new List<T>();
    if (arr.Count < 2)
        return arr;
    int pivot = arr.Count / 2;
    T pivot_val = arr[pivot];
    arr.RemoveAt(pivot);
    foreach (T i in arr)
    {
        if (i.CompareTo(pivot_val) <= 0)
            loe.Add(i);
        else
            gt.Add(i);
    }
    List<T> resultSet = new List<T>();
    resultSet.AddRange(quicksort(loe));
    gt.Add(pivot_val);
    resultSet.AddRange(quicksort(gt));
    return resultSet;
}
Output is: 1,2,3,4,5,6,7,8,9
But when I use any negative number in the unsorted list there is a stack overflow error,
for example
List<int> unsorted = new List<int> { 1, 3, 5, 7, 9, 8, 6, 4, 2, -1 };
Now there is a stack overflow.
What's going on? Why is this not working?
Your algorithm has a bug. Consider the simplest input list { 1, -1 }. Let's step through your logic.
You first choose a pivot index, Count / 2, which is 1.
You then remove the pivot element at index 1 (-1) from the arr list.
Next you compare each remaining element in the arr list (there's just the 1 at index 0) with the pivot.
The 1 is greater than the pivot (-1) so you add it to the gt list.
Next you quicksort the loe list, which is empty. That sort returns an empty list, which you add to the result set.
You then add the pivot value to the end of the gt list, so the gt list now looks like this: { 1, -1 }. Notice that this is the exact same list as you started with.
You then attempt to quicksort the gt list. Since you are calling the quicksort routine with the same input, the same sequence of steps happens again, until the stack overflows.
It seems the error in your logic is that you blindly add the pivot to the gt list without comparing it to anything. I'll leave it to you to figure out how to make it work.
Edited to add: I'm assuming this is a homework assignment, but if it's not, I would highly recommend using .NET's built in Sort() method on List<T>. It has been highly optimized and heavily tested, and will most likely perform better than anything home-brewed. Why reinvent the wheel?
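For what it's worth, a minimal sketch of that built-in route:
// List<T>.Sort() sorts in place using the default comparer and handles
// negative numbers without any custom partitioning logic.
var numbers = new List<int> { 1, 3, 5, 7, 9, 8, 6, 4, 2, -1 };
numbers.Sort();
Console.WriteLine(string.Join(",", numbers)); // -1,1,2,3,4,5,6,7,8,9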
If you don't have a debugger, try this...
foreach (T i in arr)
{
    if (i.CompareTo(pivot_val) <= 0)
    {
        loe.Add(i);
        Console.WriteLine("loe.add " + i.ToString());
    }
    else
    {
        gt.Add(i);
        Console.WriteLine("gt.add " + i.ToString());
    }
}

Check two List<int>'s for the same numbers

I have two List<int>s which I want to check for corresponding numbers.
For example:
List<int> a = new List<int>(){1, 2, 3, 4, 5};
List<int> b = new List<int>() {0, 4, 8, 12};
Should give the result 4.
Is there an easy way to do this without too much looping through the lists?
I'm on .NET 3.0 for the project where I need this, so no LINQ.
You can use the .NET 3.5 .Intersect() extension method:
List<int> a = new List<int>() { 1, 2, 3, 4, 5 };
List<int> b = new List<int>() { 0, 4, 8, 12 };
List<int> common = a.Intersect(b).ToList();
Jeff Richter's excellent PowerCollections has Set with Intersections. Works all the way back to .NET 2.0.
http://www.codeplex.com/PowerCollections
Set<int> set1 = new Set<int>(new[]{1,2,3,4,5});
Set<int> set2 = new Set<int>(new[]{0,4,8,12});
Set<int> set3 = set1.Intersection(set2);
You could do it the way that LINQ does it, effectively - with a set. Now before 3.5 we haven't got a proper set type, so you'd need to use a Dictionary<int,int> or something like that:
Create a Dictionary<int, int> and populate it from list a using the element as both the key and the value for the entry. (The value in the entry really doesn't matter at all.)
Create a new list for the intersections (or write this as an iterator block, whatever).
Iterate through list b, and check with dictionary.ContainsKey: if it does, add an entry to the list or yield it.
That should be O(N+M) (i.e. linear in both list sizes)
Note that that will give you repeated entries if list b contains duplicates. If you wanted to avoid that, you could always change the value of the dictionary entry when you first see it in list b.
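A rough sketch of those steps, assuming the a and b lists from the question and only .NET 2.0-era APIs (no LINQ):
// The dictionary's keys act as the set built from list a.
Dictionary<int, int> seen = new Dictionary<int, int>();
foreach (int x in a)
{
    seen[x] = x; // the value is irrelevant; only the key matters
}

List<int> intersection = new List<int>();
foreach (int y in b)
{
    if (seen.ContainsKey(y))
    {
        intersection.Add(y); // will repeat if b contains duplicates
    }
}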
You can sort the second list and loop through the first one and for each value do a binary search on the second one.
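A quick sketch of that idea, using List<T>.Sort and List<T>.BinarySearch (again assuming the a and b lists from the question):
// Sort a copy of b once, then binary-search it for each element of a.
List<int> sortedB = new List<int>(b);
sortedB.Sort();
List<int> found = new List<int>();
foreach (int x in a)
{
    if (sortedB.BinarySearch(x) >= 0) // a non-negative index means a hit
        found.Add(x);
}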
If both lists are sorted, you can easily do this in O(n) time with a modified merge from merge-sort: simply "remove" (step a counter past) the lower of the two leading numbers, and if they are ever equal, save that number to the result list and "remove" both of them. It takes fewer than n1 + n2 steps. This is of course assuming they are sorted, but sorting integer arrays isn't exactly expensive, O(n log n)... I think. If you'd like I can throw together some code on how to do this, but the idea is pretty simple.
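A minimal sketch of that merge, assuming a and b have already been sorted:
// Two-pointer walk over the sorted lists: advance whichever side has the
// smaller head, and record a match when the heads are equal.
List<int> shared = new List<int>();
int i = 0, j = 0;
while (i < a.Count && j < b.Count)
{
    if (a[i] < b[j]) i++;
    else if (a[i] > b[j]) j++;
    else
    {
        shared.Add(a[i]); // value present in both lists
        i++;
        j++;
    }
}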
Tested on 3.0
List<int> a = new List<int>() { 1, 2, 3, 4, 5, 12, 13 };
List<int> b = new List<int>() { 0, 4, 8, 12 };
List<int> intersection = new List<int>();
Dictionary<int, int> dictionary = new Dictionary<int, int>();
a.ForEach(x => { if (!dictionary.ContainsKey(x)) dictionary.Add(x, 0); });
b.ForEach(x => { if (dictionary.ContainsKey(x)) dictionary[x]++; });
foreach (var item in dictionary)
{
    if (item.Value > 0)
        intersection.Add(item.Key);
}
In a comment, the question author said that there will be "Max 15 in the first list and 20 in the second list".
In this case I wouldn't bother with optimizations and would just use List.Contains.
For larger lists hash can be used to take advantage of O(1) lookup that leads to O(N+M) algorithm as Jon noted.
Hash requires additional space. To reduce memory usage we should hash shortest list.
List<int> a = new List<int>() { 1, 2, 3, 4, 5 };
List<int> b = new List<int>() { 0, 4, 8, 12 };
List<int> shortestList;
List<int> longestList;
if (a.Count > b.Count)
{
    shortestList = b;
    longestList = a;
}
else
{
    shortestList = a;
    longestList = b;
}
Dictionary<int, bool> dict = new Dictionary<int, bool>();
shortestList.ForEach(x => dict.Add(x, true));
foreach (int i in longestList)
{
    if (dict.ContainsKey(i))
    {
        Console.WriteLine(i);
    }
}
var c = a.Intersect(b);
This only works in 3.5; I saw your requirement, my apologies.
The method recommended by ocdecio is a good one if you're going to implement it from scratch. Looking at the time complexity compared to the naive method we see:
Sort/binary search method:
T ~= O(n log n) + O(n) * O(log n) ~= O(n log n)
Looping through both lists (naive method):
T ~= O(n) * O(n) ~= O(n ^ 2)
There may be a quicker method, but I am not aware of it. Hopefully that should justify choosing his method.
(Previous answer - changed IndexOf to Contains, as IndexOf casts to an array first)
Seeing as they're two small lists, the code below should be fine. I'm not sure if there's a library with an intersection method like Java has (although List isn't a set, so it wouldn't work directly); as someone pointed out, the PowerCollections library has one.
List<int> a = new List<int>() { 1, 2, 3, 4, 5 };
List<int> b = new List<int>() { 0, 4, 8, 12 };
List<int> result = new List<int>();
for (int i = 0; i < a.Count; i++)
{
    if (b.Contains(a[i]))
        result.Add(a[i]);
}
foreach (int i in result)
    Console.WriteLine(i);
Update 2: HashSet was a dumb answer as it's 3.5 not 3.0
Update: HashSet seems like the obvious answer:
// Method 2 - HashSet from System.Core
HashSet<int> aSet = new HashSet<int>(a);
HashSet<int> bSet = new HashSet<int>(b);
aSet.IntersectWith(bSet);
foreach (int i in aSet)
    Console.WriteLine(i);
Here is a method that removes duplicate strings. Change this to accommodate int and it will work fine.
public List<string> removeDuplicates(List<string> inputList)
{
    Dictionary<string, int> uniqueStore = new Dictionary<string, int>();
    List<string> finalList = new List<string>();
    foreach (string currValue in inputList)
    {
        if (!uniqueStore.ContainsKey(currValue))
        {
            uniqueStore.Add(currValue, 0);
            finalList.Add(currValue);
        }
    }
    return finalList;
}
Update: Sorry, I am actually combining the lists and then removing duplicates. I am passing the combined list to this method. Not exactly what you are looking for.
Wow. The answers thus far look very complicated. Why not just use:
List<int> a = new List<int>() { 1, 2, 3, 4, 5, 12, 13 };
List<int> b = new List<int>() { 0, 4, 8, 12 };
...
public List<int> Dups(List<int> a, List<int> b)
{
    List<int> ret = new List<int>();
    foreach (int x in b)
    {
        if (a.Contains(x))
        {
            ret.Add(x);
        }
    }
    return ret;
}
This seems much more straight-forward to me... unless I've missed part of the question. Which is entirely possible.
