Everybody!
How can I get the minimum value of an int array within a specific range in C#?
For example:
int[] array = new int[] {1,2,3,4,5,6,7,8,76,45};
And I want to get the minimum value between the 3rd and the 8th element.
Maybe it is possible to get via LINQ queries?
array.Skip(2).Take(5).Min();
I figure I may as well add my tuppence to this. As Jason objects to the fact that we're saying how many we're skipping rather than the end index, we can add a simple extension method:
public static IEnumerable<T> WithIndexBetween<T>(this IEnumerable<T> source,
int startInclusive, int endExclusive)
{
// The two values can be the same, yielding no results... but they must
// indicate a reasonable range
if (endExclusive < startInclusive)
{
throw new ArgumentOutOfRangeException("endExclusive");
}
return source.Skip(startInclusive).Take(endExclusive - startInclusive);
}
Then:
int min = array.WithIndexBetween(2, 7).Min();
Adjust the extension method name to taste. (Naming is hard, and I'm not going to spend ages coming up with a nice one here :)
int[] arr = {0,1,2,3,4,5,6,7,8};
int start = 3;
int end = 8;
int min = arr.Skip(start - 1).Take(end - start).Min();
int min = array.Where((value, index) => index >= 2 && index <= 7).Min();
EDIT
Actually, the approach above is quite inefficient, because it enumerates the whole sequence, even though we're not interested in items with an index higher than 7. A better solution would be to use TakeWhile:
int min = array.TakeWhile((value, index) => index <= 7).Skip(2).Min();
Unfortunately it's not very readable... The best option to make it nicer is probably to create a custom extension method, as shown in Jon's answer.
Just to add another option:
int start = 3;
int end = 8;
var min = Enumerable.Range(start - 1,end - start).Select(idx => array[idx]).Min();
AFAIK, this is theoretically faster if the range you need is near the end of the array and the array is really, really long.
That's because (again AFAIK) Skip() doesn't take into account that the source is an array (i.e. can be accessed randomly in O(1)) and enumerates it anyway.
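If the source is known to be an array, one way to sidestep the enumeration cost described above is to wrap the range in an ArraySegment<T>, which (on .NET 4.5 and later) implements IEnumerable<T> and therefore works with Min(). This is only a minimal sketch: the class and method names are hypothetical, and the exclusive end index is an assumed convention.

```csharp
using System;
using System.Linq;

public static class RangeMinSketch
{
    // Hypothetical helper: minimum of array[startInclusive .. endExclusive - 1].
    public static int MinInRange(int[] array, int startInclusive, int endExclusive)
    {
        // ArraySegment validates offset and count against the array bounds
        // and enumerates only the requested window.
        var segment = new ArraySegment<int>(array, startInclusive, endExclusive - startInclusive);

        // Min() throws InvalidOperationException if the window is empty.
        return segment.Min();
    }
}
```

For the array from the question, RangeMinSketch.MinInRange(array, 2, 7) looks at the values 3, 4, 5, 6, 7 and returns 3.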
array.Skip(3).Take(4).Min();
Personally, I'd prefer this:
public static class ArrayExtensions {
public static bool ArrayAndIndexesAreValid<T>(
T[] array,
int startInclusive,
int endExclusive
) {
return array != null &&
array.Length > 0 &&
startInclusive >= 0 && startInclusive < array.Length &&
endExclusive >= 1 && endExclusive <= array.Length &&
startInclusive < endExclusive;
}
public static IEnumerable<T> Slice<T>(
this T[] array,
int startInclusive,
int endExclusive
) {
Contract.Requires<ArgumentException>(ArrayAndIndexesAreValid(
array,
startInclusive,
endExclusive)
);
for (int index = startInclusive; index < endExclusive; index++) {
yield return array[index];
}
}
public static T MinimumInIndexRange<T>(
this T[] array,
int startInclusive,
int endExclusive
) where T : IComparable {
Contract.Requires<ArgumentException>(ArrayAndIndexesAreValid(
array,
startInclusive,
endExclusive)
);
return array.Slice(startInclusive, endExclusive).Min();
}
public static T MaximumInIndexRange<T>(
this T[] array,
int startInclusive,
int endExclusive
) where T : IComparable {
Contract.Requires<ArgumentException>(ArrayAndIndexesAreValid(
array,
startInclusive,
endExclusive)
);
return array.Slice(startInclusive, endExclusive).Max();
}
}
Related
I've been going through www.testdome.com to test my skills and opened a list of public questions. One of the practice questions was:
Implement function CountNumbers that accepts a sorted array of
integers and counts the number of array elements that are less than
the parameter lessThan.
For example, SortedSearch.CountNumbers(new int[] { 1, 3, 5, 7 }, 4)
should return 2 because there are two array elements less than 4.
And my answer was:
using System;
public class SortedSearch
{
public static int CountNumbers(int[] sortedArray, int lessThan)
{
int count = 0;
int l = sortedArray.Length;
for (int i = 0; i < l; i++) {
if (sortedArray [i] < lessThan)
count++;
}
return count;
}
public static void Main(string[] args)
{
Console.WriteLine(SortedSearch.CountNumbers(new int[] { 1, 3, 5, 7 }, 4));
}
}
It seems that I've failed on two counts:
Performance test when sortedArray contains lessThan: Time limit exceeded
and
Performance test when sortedArray doesn't contain lessThan: Time limit exceeded
To be honest I'm not sure what to optimize there? Maybe I'm using a wrong method and there is a similar way to speed up the calculation?
If someone could point out my mistake or explain what I'm doing wrong, I'd really appreciate it!
Because the array is sorted, you can stop counting as soon as you reach or exceed the lessThan parameter.
else break would probably do it.
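The early-exit idea could look roughly like the sketch below. The class name is hypothetical, and the loop is still linear when most elements are smaller than lessThan, so it may not be enough to pass a strict time limit on its own.

```csharp
public static class EarlyExitCount
{
    // Counts elements smaller than lessThan, stopping at the first one that is not.
    public static int CountNumbers(int[] sortedArray, int lessThan)
    {
        int count = 0;
        foreach (int value in sortedArray)
        {
            if (value >= lessThan)
                break; // sorted input: nothing after this point can be smaller
            count++;
        }
        return count;
    }
}
```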
Does it really have to be a loop? You could use a lambda expression for that (note that this still scans the whole array):
public static int CountNumbers(int[] sortedArray, int lessThan)
{
return sortedArray.ToList().Where(x=>x < lessThan).Count();
}
Harold's answer and approach is spot on.
Find below another code sample in case you're practicing for technical interviews. It handles cases when the array is null or empty, when lessThan is present in the array (including duplicates), etc.
private static int CountNumbers(int[] sortedArray, int lessThan)
{
if (sortedArray == null)
{
throw new ArgumentNullException(nameof(sortedArray), "Sorted array cannot be null.");
}
if (sortedArray.Length == 0)
{
throw new ArgumentException("Sorted array cannot be empty.");
}
int start = 0;
int end = sortedArray.Length;
int middle = int.MinValue;
while (start < end)
{
middle = (start + end) / 2;
if (sortedArray[middle] == lessThan)
{
break; // Found the "lessThan" number in the array, we can stop and move left
}
else if (sortedArray[middle] < lessThan)
{
start = middle + 1;
}
else
{
end = middle; // end is exclusive, so this already drops middle from the next range
}
}
// Adjust the middle pointer based on the "current" and "lessThan" numbers in the sorted array
while (middle >= 0 && sortedArray[middle] >= lessThan)
{
middle--;
}
// +1 because middle is calculated through 0-based (e.g. start)
return middle + 1;
}
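A variation that avoids the final backwards walk (which can degrade to linear time when the array holds many duplicates of lessThan) is a "lower bound" binary search: it looks for the index of the first element that is not less than lessThan, and that index is exactly the count. This is a sketch only; the class name is hypothetical, and returning 0 for an empty array is an assumption rather than something the exercise states.

```csharp
using System;

public static class LowerBoundCount
{
    public static int CountNumbers(int[] sortedArray, int lessThan)
    {
        if (sortedArray == null)
            throw new ArgumentNullException(nameof(sortedArray));

        // Search the half-open range [low, high).
        int low = 0;
        int high = sortedArray.Length;
        while (low < high)
        {
            int middle = low + (high - low) / 2; // avoids overflow of low + high
            if (sortedArray[middle] < lessThan)
                low = middle + 1;   // everything up to middle is smaller
            else
                high = middle;      // middle might be the first element >= lessThan
        }

        // low is the index of the first element >= lessThan,
        // which equals the number of elements < lessThan.
        return low;
    }
}
```

Because the loop never stops early on an equal element, duplicates need no special handling and the running time stays O(log n).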
I was asked in an interview to write a function for finding all pairs of ints in an array that add up to N. My answer was kinda bulky:
HashSet<Tuple<int,int>> PairsThatSumToN ( int [] arr, int N )
{
HashSet<int> arrhash = new HashSet<int> (arr);
HashSet<Tuple<int,int>> result = new HashSet<Tuple<int,int>>();
foreach ( int i in arrhash )
{
int j = N - i;
if ( arrhash.Contains(j) ) result.Add(new Tuple<int,int> (i,j));
}
return result;
}
I'm a beginner to C#, come from a C++ background, and I have a few questions about how to make this better:
Is it inefficient to iterate through a HashSet? In other words, would my procedure be more efficient (although less compact) if I changed it to
HashSet<Tuple<int,int>> PairsThatSumToN ( int [] arr, int N )
{
HashSet<int> arrhash = new HashSet<int> ();
HashSet<Tuple<int,int>> result = new HashSet<Tuple<int,int>>();
foreach ( int i in arr )
{
int j = N - i;
if ( arrhash.Contains(j) ) result.Add(new Tuple<int,int> (i,j));
arrhash.Add(i);
}
return result;
}
?????
I realize that Add is more like an "Add if not already in there", so I have a useless operation whenever I run result.Add(new Tuple<int,int> (i,j)) for an i,j pair that is already in the set. The more repeated pairs in the array, the more useless operations, and there's all the overhead of allocating the new Tuple that may never be used. Is there a way to optimize this by checking whether the pair i,j is a Tuple in the set before creating a new Tuple out of said pair and trying to add it?
Speaking of the above allocation of a new Tuple on the heap, do I need to free this memory if I don't end up adding that Tuple to the result? Potential memory leak here?
There has to be some way of combining the two sets
HashSet<int> arrhash = new HashSet<int> (arr);
HashSet<Tuple<int,int>> result = new HashSet<Tuple<int,int>>();
In a sense, they contain redundant information since every int in the second one is also in the first one. Something feels "wrong" about having two sets here, yet I can't think of a better way to do it.
Better yet, does the .NET library have any way of doing a 1-line solution for the problem? ;)
Paging Dr. Skeet.
This is what I would try
public Dictionary<int, int> Pairs(int[] arr, int N)
{
// int arithmetic assumes no element of arr exceeds int.MaxValue / 2
// capacity hint only; clamp at 0 because a negative capacity throws
int len = Math.Max(0, arr.Length < N ? arr.Length / 2 : N / 2);
Dictionary<int, int> d = new Dictionary<int, int>(len);
// Add is O(1) if count <= capacity
Array.Sort(arr); // sorts the caller's array in place; O(n log n), I still take my chances with it
int start = 0;
int end = arr.Length - 1;
// the two pointers move inward; stopping before they meet
// ensures an element is never paired with itself
while (start < end)
{
int ttl = arr[start] + arr[end];
if (ttl == N)
{
if(!d.ContainsKey(arr[start]))
d.Add(arr[start], arr[end]);
// since start < end, the pair is uniquely defined by either value
start++;
end--;
}
else if (ttl > N)
end--;
else
start++;
}
return d;
}
Even with a HashSet-based solution, I would still use a Dictionary with capacity N/2 and Key <= Value,
or a Dictionary with capacity arr.Length / 2.
If you need a neat solution for your problem, here it is, implemented with LINQ.
The performance however, is 4 times worse than your second solution.
Since you have asked for a one liner, here it is anyway.
NOTE: I would appreciate any improvements, especially to get rid of that Distinct(), since it takes 50% of the overall CPU time
static List<Pair> PairsThatSumToN(int[] arr, int N)
{
return
(
from x in arr join y in arr on N - x equals y select new Pair(x, y)
)
.Distinct()
.ToList();
}
public class Pair : Tuple<int, int>
{
public Pair(int item1, int item2) : base(item1, item2) { }
public override bool Equals(object pair)
{
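Pair dest = pair as Pair;
// Every pair produced by the query sums to N, so sharing Item1 (in either position) means the same pair
return dest != null && (dest.Item1 == Item1 || dest.Item2 == Item1);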
}
public override int GetHashCode()
{
return Item1 + Item2;
}
}
First of all, a HashSet removes duplicate items, so iterating through the HashSet and iterating through the array may yield different results when the array contains duplicates.
Iterating through a HashSet is fine, but it should not be used only for iteration. Using a HashSet is the best option here because lookups are O(1).
Tuples in a HashSet are compared by their items, not by reference. However, the default comparison only checks x.Item1 against y.Item1 and x.Item2 against y.Item2, so (1,2) and (2,1) are not equal. You can make them equal by passing a custom IEqualityComparer to the HashSet.
You should not worry about memory leaks. When the HashSet declines to add a tuple, the garbage collector reclaims that tuple once nothing references it any more; not immediately, but when needed.
static HashSet<Tuple<int, int>> PairsThatSumToN(int[] arr, int N)
{
HashSet<int> hash = new HashSet<int>(arr);
HashSet<Tuple<int, int>> result = new HashSet<Tuple<int, int>>(new IntTupleComparer());
foreach(int i in arr)
{
int j = N - i;
if (hash.Contains(j)) result.Add(new Tuple<int, int>(i, j));
}
return result;
}
public class IntTupleComparer : IEqualityComparer<Tuple<int, int>>
{
public bool Equals(Tuple<int, int> x, Tuple<int, int> y)
{
return (x.Item1 == y.Item1 && x.Item2 == y.Item2) || (x.Item1 == y.Item2 && x.Item2 == y.Item1);
}
public int GetHashCode(Tuple<int, int> obj)
{
return (obj.Item1 + obj.Item2).GetHashCode();
}
}
If the input set contains unique numbers, or the function must return only unique pairs, I think your second algorithm is the best. Just the result doesn't need to be a HashSet<Tuple<int, int>>, because the uniqueness is guaranteed by the algorithm - a simple List<Tuple<int, int>> would do the same, and better abstraction would be IEnumerable<Tuple<int, int>>. Here is how it looks implemented with C# iterator function:
static IEnumerable<Tuple<int, int>> UniquePairsThatSumToN(int[] source, int N)
{
var set = new HashSet<int>();
for (int i = 0; i < source.Length; i++)
{
var a = source[i];
var b = N - a;
if (set.Add(a) && set.Contains(b))
yield return Tuple.Create(b, a);
}
}
The key point is the line if (set.Add(a) && set.Contains(b)). Since both HashSet<T>.Add and HashSet<T>.Contains are O(1), the whole algorithm is O(n) in the length of the array.
With a relatively small modification we can make a function that returns all pairs (not only unique) like this
static IEnumerable<Tuple<int, int>> AllPairsThatSumToN(int[] source, int N)
{
var countMap = new Dictionary<int, int>(source.Length);
for (int i = 0; i < source.Length; i++)
{
var a = source[i];
var b = N - a;
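// Pair a with the earlier occurrences of b first, so that a is never paired with itself
int countB;
if (countMap.TryGetValue(b, out countB))
while (--countB >= 0)
yield return Tuple.Create(b, a);
// Only then record this occurrence of a
int countA;
countMap.TryGetValue(a, out countA);
countMap[a] = countA + 1;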
}
}
I have a list of strings and need to count the number of list entries that contain a specific string (and only for a subset of the list, not the whole list).
The code below works, BUT its performance is sadly not at an acceptable level, as I need to parse 500k to 900k list entries. For these entries I need to run the code below about 10k times (as I have 10k parts of the list I need to analyse). That takes 177 seconds or more. So my question is: how can I do this fast?
private int ExtraktNumbers(List<string> myList, int start, int end)
{
return myList.Where((x, index) => index >= start && index <= end
&& x.Contains("MYNUMBER:")).Count();
}
Well, now that we know you are calling the method 10,000 times, here is my suggestion. Since you have hardcoded "MYNUMBER:", I assume you are checking different ranges with each call? If that's the case...
First, run an 'indexing' method and create a list of which indices are a match. Then you can easily count up the matches for the ranges you need.
NOTE: This is something quick, and you may even be able to further optimize this too:
List<int> matchIndex = new List<int>();
void RunIndex(List<string> myList)
{
for(int i = 0; i < myList.Count; i++)
{
if(myList[i].Contains("MYNUMBER:"))
{
matchIndex.Add(i);
}
}
}
int CountForRange(int start, int end)
{
return matchIndex.Count(x => x >= start && x <= end);
}
Then you can use like this, for example:
RunIndex(myList);
// I don't know what code you have here, this is just basic example.
for(int i = 0; i < 10000; i++)
{
int count = CountForRange(startOfRange, endOfRange);
// Do something with count.
}
In addition, if you have a lot of duplication in the ranges you check then you could consider caching range counts in a dictionary, but at this stage it's hard to tell if that will be worth doing anyway.
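One possible further optimization of the indexing idea is a prefix-count array, which answers each range query with a single subtraction instead of scanning the list of match indices. The sketch below is an assumption-laden illustration: the class name MatchPrefixCounts is hypothetical, and it assumes the list does not change between building the index and querying it.

```csharp
using System.Collections.Generic;

public class MatchPrefixCounts
{
    // prefix[i] = number of matching entries among the first i list items.
    private readonly int[] prefix;

    public MatchPrefixCounts(IList<string> lines, string marker)
    {
        prefix = new int[lines.Count + 1];
        for (int i = 0; i < lines.Count; i++)
        {
            bool isMatch = lines[i].Contains(marker);
            prefix[i + 1] = prefix[i] + (isMatch ? 1 : 0);
        }
    }

    // Number of matches with index in [start, end], both inclusive,
    // matching the index >= start && index <= end test in the question.
    public int CountForRange(int start, int end)
    {
        return prefix[end + 1] - prefix[start];
    }
}
```

Building the index costs one pass over the list; after that, every one of the 10k range queries is O(1).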
I am pretty sure a simple iterative solution will perform better:
private int ExtractNumbers(List<string> myList, int start, int end)
{
int count = 0;
for (int i = start; i <= end; i++)
{
if (myList[i].Contains("MYNUMBER:"))
{
count++;
}
}
return count;
}
Well, on my test stand, for 10 million lines (10 times more than you have)
var data = Enumerable
.Range(1, 10000000)
.Select(item => "123456789 bla-bla-bla " + "MYNUMBER:" + item.ToString())
.ToList();
Stopwatch sw = new Stopwatch();
sw.Start();
int result = ExtraktNumbers(data, 0, 10000000);
sw.Stop();
I've got these results:
2.78 seconds - your initial implementation
Naive loop (2.60 seconds):
private int ExtraktNumbers(List<string> myList, int start, int end) {
int result = 0;
for (int i = start; i < end; ++i)
if (myList[i].Contains("MYNUMBER:"))
result += 1;
return result;
}
PLinq (1.72 seconds):
private int ExtraktNumbers(List<string> myList, int start, int end) {
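return myList
.Skip(start) // same range as the naive loop: start <= index < end
.Take(end - start)
.AsParallel() // <- Do it in parallel; Skip/Take come first because they pick arbitrary items on an unordered parallel query
.Where(x => x.Contains("MYNUMBER:"))
.Count();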
}
Explicit parallel implementation (1.66 seconds):
private int ExtraktNumbers(List<string> myList, int start, int end) {
long result = 0;
Parallel.For(start, end, (i) => {
if (myList[i].Contains("MYNUMBER:"))
Interlocked.Increment(ref result);
});
return (int) result;
}
I just cannot reproduce your 177 seconds
If you know from the beginning the intervals you want to consider, it's probably a good idea to loop the list once, as Dmytro and musefan proposed above, so I won't repeat the same idea again.
However, I have a different suggestion for performance improvement. How do you create your list? Do you know the number of items in advance? For such a big list, you may get a significant performance boost by using the List<T> constructor that takes the initial capacity.
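As a rough illustration of the capacity suggestion, the sketch below reserves space before filling the list. The class name, method name and generated strings are hypothetical placeholders, since the thread does not show how the real list is built.

```csharp
using System.Collections.Generic;

public static class PreSizedListSketch
{
    public static List<string> BuildLines(int expectedCount)
    {
        // Reserving the capacity up front avoids the repeated
        // grow-and-copy steps a default-constructed list goes through.
        var lines = new List<string>(expectedCount);
        for (int i = 0; i < expectedCount; i++)
            lines.Add("MYNUMBER:" + i); // placeholder content
        return lines;
    }
}
```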
I wrote a SkipLast method. When I call it with an int array, I expect to get only 2 elements back, not 4.
What is wrong?
Thanks
public static class myclass
{
public static IEnumerable<T> SkipLast<T>(this IEnumerable<T> source, int n)
{
return source.Reverse().Skip(n).Reverse();
}
}
class Program
{
static void Main(string[] args)
{
int [] a = new int[] {5, 6, 7, 8};
ArrayList a1 = new ArrayList();
a.SkipLast(2);
for( int i = 0; i <a.Length; i++)
{
Console.Write(a[i]);
}
}
}
You need to store the result of the call and iterate over that:
var newlist = a.SkipLast(2).ToList();
for( int i = 0; i <newlist.Count; i++)
{
Console.Write(newlist[i]);
}
Your method returns a new sequence with the last items skipped; the original array is not modified.
If you want to update the same variable, assign the result back to it: a = a.SkipLast(2).ToArray();
You should assign the result, not just put a.SkipLast(2):
a = a.SkipLast(2).ToArray(); // <- if you want to change "a" and loop on a
for( int i = 0; i <a.Length; i++) { ...
When you do a.SkipLast(2) it creates IEnumerable<int> and then discards it;
The most readable solution, IMHO, is to use foreach which is very convenient with LINQ:
...
int [] a = new int[] {5, 6, 7, 8};
foreach(int item in a.SkipLast(2))
Console.Write(item);
The other replies have answered your question, but wouldn't a more efficient implementation be this (which doesn't involve making two copies of the array in order to reverse it twice). It does iterate the collection twice (or rather, once and then count-n accesses) though:
public static IEnumerable<T> SkipLast<T>(this IEnumerable<T> source, int n)
{
n = source.Count() - n;
return source.TakeWhile(_ => n-- > 0);
}
Actually, if source is a type that implements Count without iteration (such as an array or a List) this will only access the elements count-n times, so it will be extremely efficient for those types.
Here is a better solution that only iterates the sequence once. Its data requirements are such that it only needs a buffer with n elements, which makes it very efficient if n is small compared with the size of the sequence:
public static IEnumerable<T> SkipLast<T>(this IEnumerable<T> source, int n)
{
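if (n <= 0)
{
// nothing to skip; this also avoids the modulo-by-zero below
foreach (T item in source)
yield return item;
yield break;
}
int count = 0;
T[] buffer = new T[n]; // ring buffer holding the n most recent items
foreach (T item in source)
{
if (count >= n)
yield return buffer[count%n]; // the item seen n steps ago is safe to emit
buffer[count++%n] = item;
}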
}
Change your code to,
foreach (var r in a.SkipLast(2))
{
Console.Write(r);
}
for three reasons,
The SkipLast function returns a new sequence; it doesn't change the original directly.
What is the point of using an indexer with IEnumerable? It imposes a needless count.
This code is easy to read, easier to type and shows intent.
For a more efficient generic SkipLast see Matthew's buffer with enumerator.
Your example could use a more specialised SkipLast,
public static IEnumerable<T> SkipLast<T>(this IList<T> source, int n = 1)
{
for (var i = 0; i < (source.Count - n); i++)
{
yield return source[i];
}
}
I'm using .NET 3.5 and would like to be able to obtain every nth item from a List. I'm not bothered as to whether it's achieved using a lambda expression or LINQ.
Edit
Looks like this question provoked quite a lot of debate (which is a good thing, right?). The main thing I've learnt is that when you think you know every way to do something (even as simple as this), think again!
return list.Where((x, i) => i % nStep == 0);
I know it's "old school," but why not just use a for loop with stepping = n?
Sounds like
IEnumerable<T> GetNth<T>(List<T> list, int n) {
for (int i=0; i<list.Count; i+=n)
yield return list[i];
}
would do the trick. I do not see the need to use Linq or a lambda expressions.
EDIT:
Make it
public static class MyListExtensions {
public static IEnumerable<T> GetNth<T>(this List<T> list, int n) {
for (int i=0; i<list.Count; i+=n)
yield return list[i];
}
}
and you write in a LINQish way
from element in MyList.GetNth(10) select element;
2nd Edit:
To make it even more LINQish
from i in Enumerable.Range(0, (list.Count + n - 1) / n) select list[n * i];
You can use the Where overload which passes the index along with the element
var everyFourth = list.Where((x,i) => i % 4 == 0);
For Loop
for(int i = 0; i < list.Count; i += n)
//Nth Item..
I think if you provide a linq extension, you should be able to operate on the least specific interface, thus on IEnumerable. Of course, if you are up for speed especially for large N you might provide an overload for indexed access. The latter removes the need of iterating over large amounts of not needed data, and will be much faster than the Where clause. Providing both overloads lets the compiler select the most suitable variant.
public static class LinqExtensions
{
public static IEnumerable<T> GetNth<T>(this IEnumerable<T> list, int n)
{
if (n < 0)
throw new ArgumentOutOfRangeException("n");
if (n > 0)
{
int c = 0;
foreach (var e in list)
{
if (c % n == 0)
yield return e;
c++;
}
}
}
public static IEnumerable<T> GetNth<T>(this IList<T> list, int n)
{
if (n < 0)
throw new ArgumentOutOfRangeException("n");
if (n > 0)
for (int c = 0; c < list.Count; c += n)
yield return list[c];
}
}
I'm not sure if it's possible to do with a LINQ expression, but I know that you can use the Where extension method to do it. For example to get every fifth item:
List<T> list = originalList.Where((t,i) => (i % 5) == 0).ToList();
This will get the first item and every fifth from there. If you want to start at the fifth item instead of the first, you compare with 4 instead of comparing with 0.
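The offset variant described above might be written as follows. This is a small sketch with hypothetical names; the only change from the previous line is the remainder that the index is compared against.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class EveryFifthSketch
{
    // Returns the 5th, 10th, 15th, ... items (zero-based indexes 4, 9, 14, ...).
    public static List<T> EveryFifthFromFifth<T>(IEnumerable<T> source)
    {
        return source.Where((t, i) => i % 5 == 4).ToList();
    }
}
```

For the numbers 1 to 15, this returns 5, 10 and 15.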
IMHO no answer is right. All the solutions begin from index 0, but I want the real nth element:
public static IEnumerable<T> GetNth<T>(this IList<T> list, int n)
{
for (int i = n - 1; i < list.Count; i += n)
yield return list[i];
}
@belucha I like this, because the client code is very readable and the compiler chooses the most efficient implementation. I would build upon this by relaxing the requirement to IReadOnlyList<T> and by avoiding the division for high-performance LINQ:
public static IEnumerable<T> GetNth<T>(this IEnumerable<T> list, int n) {
if (n <= 0) throw new ArgumentOutOfRangeException(nameof(n), n, null);
int i = n;
foreach (var e in list) {
if (++i < n) { //save Division
continue;
}
i = 0;
yield return e;
}
}
public static IEnumerable<T> GetNth<T>(this IReadOnlyList<T> list, int n
, int offset = 0) { //use IReadOnlyList<T>
if (n <= 0) throw new ArgumentOutOfRangeException(nameof(n), n, null);
for (var i = offset; i < list.Count; i += n) {
yield return list[i];
}
}
private static readonly string[] sequence = "1,2,3,4,5,6,7,8,9,10,11,12,13,14,15".Split(',');
static void Main(string[] args)
{
var every4thElement = sequence
.Where((p, index) => index % 4 == 0);
foreach (string p in every4thElement)
{
Console.WriteLine("{0}", p);
}
Console.ReadKey();
}
output