Sampling from an array

Sampling from an array - c#

I have a float array containing 1M floats
I want to do sampling: for each 4 floats I want to take only 1. So i am doing this :
for(int i = 0; i< floatArray.Length; i++) {
if(i % 4 == 0) {
resultFloat.Add(floatArray[i])
}
}
This works fine, but it takes much time to run through all the elements , is there any other methods to make it with better results (if there are any)

I can see two factors that might be slowing down performance.
As you have already been offered, you should set the step to 4:
for (int i = 0; i < floatArray.Length; i += 4)
{
resultFloat.Add(floatArray[i]);
}
Looks like resultFloat is a list of float. I suggest to use array instead of list, like this:
int m = (floatArray.Length + 3) / 4;
float[] resultFloat = new float[m];
for (int i = 0, k = 0; i < floatArray.Length; i += 4, k++)
{
resultFloat[k] = floatArray[i];
}

Just increment your loop by 4 each iteration instead of by 1:
for(int i = 0; i< floatArray.Length; i+=4)
{
resultFloat.Add(floatArray[i]);
}

If you really have an issue with performance, then you'd be even better off not using a dynamic container for the results, but a statically sized array.
float[] resultFloat = new float[(floatArray.Length + 3) >> 2];
for(int i = 0; i < resultFloat.Length; i++)
resultFloat[i] = floatArray[i << 2];
Usually performance isn't an issue thow, and you shouldn't optimize until a profiler gave you proof that you should. In all other cases the more readable code is preferrable.

Just to add another option, if you want this to be the fastest, use Parallel.For instead of a normal for loop.
int resultLength = (floatArray.Length + 3) / 4;
var resultFloat = new float[resultLength];
Parallel.For(0, resultLength, i =>
{
resultFloat[i] = floatArray[i * 4];
});

List<decimal> list = new List<decimal>();
list.Add(1);
list.Add(2);
list.Add(3);
list.Add(4);
list.Add(5);
list.Add(6);
list.Add(7);
list.Add(8);
list.Add(9);
list.Add(10);
list.Add(11);
list.Add(12);
list.Add(13);
list.Add(14);
list.Add(15);
list.Add(16);
var sampleData = list.Where((x, i) => (i + 1) % (4) == 0).ToList();

Related

Duplicated elements from x array add into y array

I tried to find 2 or more same elements from array x and then that duplicate to add into new array Y
So if i have in x array number like: 2,5,7,2,8 I want to add numbers 2 into y array
int[] x = new int[20];
Random rnd = new Random();
int[] y = new int[20];
int counter = 0;
for (int i = 0; i < x.Length; i++)
{
x[i] = rnd.Next(1, 15);
for (int j=i+1; j< x.Length; j++)
{
if (x[i] == x[j])
{
y[counter] = x[i];
Console.WriteLine("Repeated numbers are " + y[counter]);
counter++;
}
else
{
Console.WriteLine("There is no repeated numbers, numbers that are in x are " + x[i]);
}
break;
}
}
But having problems with that, when it come to the if loop it doesn't want to proceed with executing if loop (even if condition is true)
If someone could give me some suggestion, that would be helpful, thank you

There are various logical errors in your use of for. You should work more on your logic, because while libraries can be learnt by rote, logical errors are more something that is inside you.
int[] x = new int[20];
Random rnd = new Random(5);
// You don't know the length of y!
// So you can't use arrays
List<int> y = new List<int>();
// First initialize
for (int i = 0; i < x.Length; i++)
{
x[i] = rnd.Next(1, 15);
}
// Then print the generated numbers, otherwise you won't know what numbers are there
Console.WriteLine("Numbers that are in x are: ");
for (int i = 0; i < x.Length; i++)
{
Console.WriteLine(x[i]);
}
// A blank line
Console.WriteLine();
// Then scan
for (int i = 0; i < x.Length; i++)
{
for (int j = i + 1; j < x.Length; j++)
{
if (x[i] == x[j])
{
y.Add(x[i]);
Console.WriteLine("Repeated numbers is " + x[i]);
}
}
}
// Success/failure in finding repeated numbers can be decided only at the end of the scan
if (y.Count == 0)
{
Console.WriteLine("There is no repeated numbers");
}
I've put some comments in the code (plus the changes)
And for debugging purpose, I suggest you use a fixed Random sequence. new Random(5) (or any other number) will return the same sequence every time you launch your program.
Note that if there are multiple repetitions of a number, like { 4, 4, 4 } then the y array will be { 4, 4 }

at first:
why do u use the 'break;' ?
second:
in the first for - loop u assign a random number to x[i]
but then in the nested second loop
u already ask x[j] to check for same values (but that doesn't exist yet)
there are so many ways to check if values are equal,
but i like your approach:
so what i would suggest:
make a for - loop and assign all the random numbers to int[] x
then think again how u can evaluate
x[0] = x[1] or x[2] or x[3] ...

Try to use Linq to find the duplicate in the Array
int[] x = new int[] { 2, 5, 7, 2, 8 };
int[] y;
var result = x.GroupBy(item => item)
.Select(grp => new { key = grp.Key, Count = grp.Count() });
y = result.Where(res => res.Count > 1).Select(res => res.key).ToArray();

int[] array = new int[5] {1,2,3,4,4};
List<int> duplitcateList = array.Where(x => array.Where(y => y == x).Count() > 1).Distinct().ToList();
or you can replace last line of above code with below.
List<int> duplitcateList = array.
GroupBy(x => x).Where(g => g.Count() > 1).Select(g => g.Key).ToList();
above code is using Linq.
suppose your first array (in question x) is array.
Linq will first check for all elements in to list which occur more then once, and select them distinctly and store it to duplicateList
if you need an array at the, you can simply convert this list to array by doing this,
int[] yArray = duplitcateList.ToArray();

Make use of linq in your code , as below
//first populate array x
var duplicates= xArray.GroupBy(x => x)
.Where(g => g.Count() > 1)
.Select(y => y.Key)
.ToArray();
linq query above make use of groupby and find duplicate i.e. element occuring more then one time in you array and then you select those element and return result

I think this will will be the most understandable solution without complicated extension methods:
int[] x = new int[20];
// there can be at most 10 duplicates in array of length of 20 :)
// you could use List<int> to easily add elements
int[] y = new int[10];
int counter = 0;
Random rnd = new Random();
// fill the array
for (int i = 0; i < x.Length; i++)
x[i] = rnd.Next(1, 15);
// iterate through distinct elements,
// otherwise, we would add multiple times duplicates
foreach (int i in x.Distinct())
// if the count of an elements is greater than one, then we have duplicate
if(x.Count(n => n == i) > 1)
{
y[counter] = i;
counter++;
}

Almost Ordered not sorting the exact amount of values i give it

this is a really easy question but i cant figure out a way around it. Apparently the almost ordered has a bug that it might randomize a little bit more than you ask it. the code is rather simple:
public void Section1Task1AlmostOrdered(int arraySize, int percentage)
{
int[] testArray = new int[arraySize];
Console.WriteLine("Ordered List: ");
for (int i = 1; i <= testArray.Length; i++)
{
testArray[i-1] = i;
Console.Write(i + "\t");
}
Console.WriteLine("Almost Ordered List: ");
testArray = shuffler.AlmostOrdered(arraySize, percentage);
for (int i = 0; i < testArray.Length; i++)
{
Console.Write(testArray[i] + "\t");
}
}
The shuffler is this part of the code:
public int[] AlmostOrdered(int n, double p)
{
if (p > 100)
{
throw new InvalidOperationException("Cannot shuffle more than 100% of the numbers");
}
int shuffled = 0;
//Create and Populate an array
int[] array = new int[n];
for(int i = 1; i <= n; i++)
{
array[i-1] = i;
}
//Calculate numbers to shuffle
int numsOutOfPlace = (int) Math.Ceiling(n * (p / 100));
int firstRandomIndex = 0;
int secondRandomIndex = 0;
do
{
firstRandomIndex = this.random.Next(n-1);
// to make sure that the two numbers are not the same
do
{
secondRandomIndex = this.random.Next(n - 1);
} while (firstRandomIndex == secondRandomIndex);
int temp = array[firstRandomIndex];
array[firstRandomIndex] = array[secondRandomIndex];
array[secondRandomIndex] = temp;
shuffled++;
}
while (shuffled < numsOutOfPlace);
return array;
}
When i enter values 10 for array size and 40 for percentage to be shuffled, it is shuffling 5 numbers instead of 4. Is there a way to perfect this method to make it more accurate?

Likely the problem is with the calculation:
int numsOutOfPlace = (int)Math.Ceiling(n * (p / 100));
So if p=40 and n=10, then in theory you should get 4. But you're dealing with floating point numbers. So if (p/100) returns 0.400000000001, then the result will be 4.000000001, and Math.Ceiling will round that up to 5.
You might want to replace Math.Ceiling with Math.Round and see how that works out.

For loop - doesn't do last loop

I have this for loop. TicketList starts with 109 tickets. nColumns = 100. I calculate the number of rows I will need depending on the number of tickets. So in this case I need 2 rows. Row one will be full and row two will only have 9 entries. I have the loop below. It only runs one time for the NumOfRows and fills the first 100 and never loops.
What am I missing?
for (int j = 0; j < NumOfRows; j++)
{
for (int i = 0; i < nColumns; i++)
{
if (TicketList.Count() > 0)
{
t = rand.Next(0, TicketList.Count() - 1);
numbers[i, j] = TicketList[t];
TicketList.Remove(TicketList[t]);
}
}
}

Try changing your code to use a more LINQ-like, functional approach. If might make the logic easier. Something like this:
TicketList
.OrderBy(x => rand.Next())
.Select((ticket, n) => new
{
ticket,
j = n / NumOfRows,
i = n % NumOfRows
})
.ToList()
.ForEach(x =>
{
numbers[x.i, x.j] = x.ticket;
});
You may need to flip around x.i & x.j or use nColumns instead of NumOfRows - I wasn't sure what your logic was looking for - but this code might work better.

Other than a few poor choices, your loops appear to be fine. I would venture that NumOfRows is not being calculated correctly.
The expression NumOfRows = (TotalTickets + (Columns - 1)) / Columns; should calculate the correct number of rows.
Also, you should use the property version of Count rather than the Linq extension method and use IList<T>.RemoveAt() or List<T>.RemoveAt rather than Remove(TicketList[T]).
Using Remove() requires that the list be enumerated to locate the element to remove, which may not be the same index that you are targeting. Not to mention that you will scan 50% (on average) of the list for each Remove call, when you already know the correct index to remove.
The functional approach listed earlier seems like overkill.
I've attempted to replicate your issue, assuming certain facts about the various variables in use. The loop repeats the expected number of times.
static void TestMe ()
{
List<object> TicketList = new List<object>();
for (int index = 0; index < 109; index++)
TicketList.Add(new object());
var rand = new Random();
int nColumns = 100;
int NumOfRows = (TicketList.Count + (nColumns - 1)) / nColumns;
object[,] numbers;
int t;
numbers = new object[nColumns, NumOfRows];
for (int j = 0; j < NumOfRows; j++)
{
Console.WriteLine("OuterLoop");
for (int i = 0; i < nColumns; i++)
{
if (TicketList.Count > 0)
{
t = rand.Next(0, TicketList.Count - 1);
numbers[i, j] = TicketList[t];
TicketList.RemoveAt(t);
}
}
}
}
The problem that you are seeing must be the result of something that you have not included in your sample.

Counting sort - implementation differences

I heard about Counting Sort and wrote my version of it based on what I understood.
public void my_counting_sort(int[] arr)
{
int range = 100;
int[] count = new int[range];
for (int i = 0; i < arr.Length; i++) count[arr[i]]++;
int index = 0;
for (int i = 0; i < count.Length; i++)
{
while (count[i] != 0)
{
arr[index++] = i;
count[i]--;
}
}
}
The above code works perfectly.
However, the algorithm given in CLRS is different. Below is my implementation
public int[] counting_sort(int[] arr)
{
int k = 100;
int[] count = new int[k + 1];
for (int i = 0; i < arr.Length; i++)
count[arr[i]]++;
for (int i = 1; i <= k; i++)
count[i] = count[i] + count[i - 1];
int[] b = new int[arr.Length];
for (int i = arr.Length - 1; i >= 0; i--)
{
b[count[arr[i]]] = arr[i];
count[arr[i]]--;
}
return b;
}
I've directly translated this from pseudocode to C#. The code doesn't work and I get an IndexOutOfRange Exception.
So my questions are:
What's wrong with the second piece of code ?
What's the difference algorithm wise between my naive implementation and the one given in the book ?

The problem with your version is that it won't work if the elements have satellite data.
CLRS version would work and it's stable.
EDIT:
Here's an implementation of the CLRS version in Python, which sorts pairs (key, value) by key:
def sort(a):
B = 101
count = [0] * B
for (k, v) in a:
count[k] += 1
for i in range(1, B):
count[i] += count[i-1]
b = [None] * len(a)
for i in range(len(a) - 1, -1, -1):
(k, v) = a[i]
count[k] -= 1
b[count[k]] = a[i]
return b
>>> print sort([(3,'b'),(2,'a'),(3,'l'),(1,'s'),(1,'t'),(3,'e')])
[(1, 's'), (1, 't'), (2, 'a'), (3, 'b'), (3, 'l'), (3, 'e')]

It should be
b[count[arr[i]]-1] = arr[i];
I'll leave it to you to track down why ;-).
I don't think they perform any differently. The second just pushes the correlation of counts out of the loop so that it's simplified a bit within the final loop. That's not necessary as far as I'm concerned. Your way is just as straightforward and probably more readable. In fact (I don't know about C# since I'm a Java guy) I would expect that you could replace that inner while-loop with a library array fill; something like this:
for (int i = 0; i < count.Length; i++)
{
arrayFill(arr, index, count[i], i);
index += count[i];
}
In Java the method is java.util.Arrays.fill(...).

The problem is that you have hard-coded the length of the array that you are using to 100. The length of the array should be m + 1 where m is the maximum element on the original array. This is the first reason that you would think using counting-sort, if you have information about the elements of the array are all minor that some constant and it would work great.

Is there a good radixsort-implementation for floats in C#

I have a datastructure with a field of the float-type. A collection of these structures needs to be sorted by the value of the float. Is there a radix-sort implementation for this.
If there isn't, is there a fast way to access the exponent, the sign and the mantissa.
Because if you sort the floats first on mantissa, exponent, and on exponent the last time. You sort floats in O(n).

Update:
I was quite interested in this topic, so I sat down and implemented it (using this very fast and memory conservative implementation). I also read this one (thanks celion) and found out that you even dont have to split the floats into mantissa and exponent to sort it. You just have to take the bits one-to-one and perform an int sort. You just have to care about the negative values, that have to be inversely put in front of the positive ones at the end of the algorithm (I made that in one step with the last iteration of the algorithm to save some cpu time).
So heres my float radixsort:
public static float[] RadixSort(this float[] array)
{
// temporary array and the array of converted floats to ints
int[] t = new int[array.Length];
int[] a = new int[array.Length];
for (int i = 0; i < array.Length; i++)
a[i] = BitConverter.ToInt32(BitConverter.GetBytes(array[i]), 0);
// set the group length to 1, 2, 4, 8 or 16
// and see which one is quicker
int groupLength = 4;
int bitLength = 32;
// counting and prefix arrays
// (dimension is 2^r, the number of possible values of a r-bit number)
int[] count = new int[1 << groupLength];
int[] pref = new int[1 << groupLength];
int groups = bitLength / groupLength;
int mask = (1 << groupLength) - 1;
int negatives = 0, positives = 0;
for (int c = 0, shift = 0; c < groups; c++, shift += groupLength)
{
// reset count array
for (int j = 0; j < count.Length; j++)
count[j] = 0;
// counting elements of the c-th group
for (int i = 0; i < a.Length; i++)
{
count[(a[i] >> shift) & mask]++;
// additionally count all negative
// values in first round
if (c == 0 && a[i] < 0)
negatives++;
}
if (c == 0) positives = a.Length - negatives;
// calculating prefixes
pref[0] = 0;
for (int i = 1; i < count.Length; i++)
pref[i] = pref[i - 1] + count[i - 1];
// from a[] to t[] elements ordered by c-th group
for (int i = 0; i < a.Length; i++){
// Get the right index to sort the number in
int index = pref[(a[i] >> shift) & mask]++;
if (c == groups - 1)
{
// We're in the last (most significant) group, if the
// number is negative, order them inversely in front
// of the array, pushing positive ones back.
if (a[i] < 0)
index = positives - (index - negatives) - 1;
else
index += negatives;
}
t[index] = a[i];
}
// a[]=t[] and start again until the last group
t.CopyTo(a, 0);
}
// Convert back the ints to the float array
float[] ret = new float[a.Length];
for (int i = 0; i < a.Length; i++)
ret[i] = BitConverter.ToSingle(BitConverter.GetBytes(a[i]), 0);
return ret;
}
It is slightly slower than an int radix sort, because of the array copying at the beginning and end of the function, where the floats are bitwise copied to ints and back. The whole function nevertheless is again O(n). In any case much faster than sorting 3 times in a row like you proposed. I dont see much room for optimizations anymore, but if anyone does: feel free to tell me.
To sort descending change this line at the very end:
ret[i] = BitConverter.ToSingle(BitConverter.GetBytes(a[i]), 0);
to this:
ret[a.Length - i - 1] = BitConverter.ToSingle(BitConverter.GetBytes(a[i]), 0);
Measuring:
I set up some short test, containing all special cases of floats (NaN, +/-Inf, Min/Max value, 0) and random numbers. It sorts exactly the same order as Linq or Array.Sort sorts floats:
NaN -> -Inf -> Min -> Negative Nums -> 0 -> Positive Nums -> Max -> +Inf
So i ran a test with a huge array of 10M numbers:
float[] test = new float[10000000];
Random rnd = new Random();
for (int i = 0; i < test.Length; i++)
{
byte[] buffer = new byte[4];
rnd.NextBytes(buffer);
float rndfloat = BitConverter.ToSingle(buffer, 0);
switch(i){
case 0: { test[i] = float.MaxValue; break; }
case 1: { test[i] = float.MinValue; break; }
case 2: { test[i] = float.NaN; break; }
case 3: { test[i] = float.NegativeInfinity; break; }
case 4: { test[i] = float.PositiveInfinity; break; }
case 5: { test[i] = 0f; break; }
default: { test[i] = test[i] = rndfloat; break; }
}
}
And stopped the time of the different sorting algorithms:
Stopwatch sw = new Stopwatch();
sw.Start();
float[] sorted1 = test.RadixSort();
sw.Stop();
Console.WriteLine(string.Format("RadixSort: {0}", sw.Elapsed));
sw.Reset();
sw.Start();
float[] sorted2 = test.OrderBy(x => x).ToArray();
sw.Stop();
Console.WriteLine(string.Format("Linq OrderBy: {0}", sw.Elapsed));
sw.Reset();
sw.Start();
Array.Sort(test);
float[] sorted3 = test;
sw.Stop();
Console.WriteLine(string.Format("Array.Sort: {0}", sw.Elapsed));
And the output was (update: now ran with release build, not debug):
RadixSort: 00:00:03.9902332
Linq OrderBy: 00:00:17.4983272
Array.Sort: 00:00:03.1536785
roughly more than four times as fast as Linq. That is not bad. But still not yet that fast as Array.Sort, but also not that much worse. But i was really surprised by this one: I expected it to be slightly slower than Linq on very small arrays. But then I ran a test with just 20 elements:
RadixSort: 00:00:00.0012944
Linq OrderBy: 00:00:00.0072271
Array.Sort: 00:00:00.0002979
and even this time my Radixsort is quicker than Linq, but way slower than array sort. :)
Update 2:
I made some more measurements and found out some interesting things: longer group length constants mean less iterations and more memory usage. If you use a group length of 16 bits (only 2 iterations), you have a huge memory overhead when sorting small arrays, but you can beat Array.Sort if it comes to arrays larger than about 100k elements, even if not very much. The charts axes are both logarithmized:
(source: daubmeier.de)

There's a nice explanation of how to perform radix sort on floats here:
http://www.codercorner.com/RadixSortRevisited.htm
If all your values are positive, you can get away with using the binary representation; the link explains how to handle negative values.

By doing some fancy casting and swapping arrays instead of copying this version is 2x faster for 10M numbers as Philip Daubmeiers original with grouplength set to 8. It is 3x faster as Array.Sort for that arraysize.
static public void RadixSortFloat(this float[] array, int arrayLen = -1)
{
// Some use cases have an array that is longer as the filled part which we want to sort
if (arrayLen < 0) arrayLen = array.Length;
// Cast our original array as long
Span<float> asFloat = array;
Span<int> a = MemoryMarshal.Cast<float, int>(asFloat);
// Create a temp array
Span<int> t = new Span<int>(new int[arrayLen]);
// set the group length to 1, 2, 4, 8 or 16 and see which one is quicker
int groupLength = 8;
int bitLength = 32;
// counting and prefix arrays
// (dimension is 2^r, the number of possible values of a r-bit number)
var dim = 1 << groupLength;
int groups = bitLength / groupLength;
if (groups % 2 != 0) throw new Exception("groups must be even so data is in original array at end");
var count = new int[dim];
var pref = new int[dim];
int mask = (dim) - 1;
int negatives = 0, positives = 0;
// counting elements of the 1st group incuding negative/positive
for (int i = 0; i < arrayLen; i++)
{
if (a[i] < 0) negatives++;
count[(a[i] >> 0) & mask]++;
}
positives = arrayLen - negatives;
int c;
int shift;
for (c = 0, shift = 0; c < groups - 1; c++, shift += groupLength)
{
CalcPrefixes();
var nextShift = shift + groupLength;
//
for (var i = 0; i < arrayLen; i++)
{
var ai = a[i];
// Get the right index to sort the number in
int index = pref[( ai >> shift) & mask]++;
count[( ai>> nextShift) & mask]++;
t[index] = ai;
}
// swap the arrays and start again until the last group
var temp = a;
a = t;
t = temp;
}
// Last round
CalcPrefixes();
for (var i = 0; i < arrayLen; i++)
{
var ai = a[i];
// Get the right index to sort the number in
int index = pref[( ai >> shift) & mask]++;
// We're in the last (most significant) group, if the
// number is negative, order them inversely in front
// of the array, pushing positive ones back.
if ( ai < 0) index = positives - (index - negatives) - 1; else index += negatives;
//
t[index] = ai;
}
void CalcPrefixes()
{
pref[0] = 0;
for (int i = 1; i < dim; i++)
{
pref[i] = pref[i - 1] + count[i - 1];
count[i - 1] = 0;
}
}
}

You can use an unsafe block to memcpy or alias a float * to a uint * to extract the bits.

I think your best bet if the values aren't too close and there's a reasonable precision requirement, you can just use the actual float digits before and after the decimal point to do the sorting.
For example, you can just use the first 4 decimals (be they 0 or not) to do the sorting.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Sampling from an array - c#

Just increment your loop by 4 each iteration instead of by 1: for(int i = 0; i< floatArray.Length; i+=4) { resultFloat.Add(floatArray[i]); }

Just to add another option, if you want this to be the fastest, use Parallel.For instead of a normal for loop. int resultLength = (floatArray.Length + 3) / 4; var resultFloat = new float[resultLength]; Parallel.For(0, resultLength, i => { resultFloat[i] = floatArray[i * 4]; });

Related

Duplicated elements from x array add into y array

Almost Ordered not sorting the exact amount of values i give it

For loop - doesn't do last loop

Counting sort - implementation differences

Is there a good radixsort-implementation for floats in C#

Categories

Resources