Order a list by a field then by random - c#

var r = new Random();
var orderedList = aListOfPeople.OrderBy(x => x.Age).ThenBy(x => r.Next());
What would be a better way of ordering a list by "age" and then by random?
My goal is to make sure that if PersonA age = PersonB age, PersonA will come first on some occasions and PersonB will come first on some other occasions.

Using the technique from SQL
var orderedList = aListOfPeople.OrderBy(x => x.Age).ThenBy(x => Guid.NewGuid());
Warning: this is not truly random, just a lazy approach; please refer to the comments on the question.

The simplest answer is to shuffle and then sort. If you use a stable sort then the sort must preserve the shuffled order for equal-keyed values. However, even though an unstable sort will perturb your shuffle I can't think of any reasonable case in which it could un-shuffle equal-keyed values.
That might be a little inefficient, though...
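Concretely, the shuffle-then-sort idea might look like this in C# (a rough sketch that relies on LINQ's OrderBy being a stable sort; aListOfPeople and Age come from the question, the rest is illustrative):
var rng = new Random();
var shuffled = aListOfPeople.ToList();
for (int i = shuffled.Count - 1; i > 0; i--)   // Fisher-Yates shuffle
{
    int k = rng.Next(i + 1);
    var tmp = shuffled[i];
    shuffled[i] = shuffled[k];
    shuffled[k] = tmp;
}
// OrderBy is stable, so people of equal age keep their shuffled, random order
var orderedList = shuffled.OrderBy(p => p.Age).ToList();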
If you're concerned about collisions, I might assume your ages are integer ages in years. In that case you might consider a radix sort (256 bins will be enough for any living human), and when it comes time to stitch the bins back together, you would remove elements from each bin in random order as you append them to the list.
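A rough sketch of that bucket idea, using GroupBy instead of an explicit 256-bin array for brevity (Person and Age come from the question; everything else is illustrative):
var rng = new Random();
var result = new List<Person>();
foreach (var group in aListOfPeople.GroupBy(p => p.Age).OrderBy(g => g.Key))
{
    var bucket = group.ToList();
    for (int i = bucket.Count - 1; i > 0; i--)   // empty each bucket in random order
    {
        int k = rng.Next(i + 1);
        var tmp = bucket[i];
        bucket[i] = bucket[k];
        bucket[k] = tmp;
    }
    result.AddRange(bucket);                     // buckets are appended in age order
}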
If your list is already sorted by age, and you just want to shuffle in-place, then you just need to iterate through the list, count how many of the following elements are equal, perform an in-place shuffle of that many elements, and then advance to the next non-matching element and repeat.
The latter would be something like this, I think (I'll write in C because I don't know C#):
int i = 0;
while (i < a.length) {
    /* find the end of the run of elements equal to a[i] */
    int j = i + 1;
    while (j < a.length && a[j] == a[i]) j++;
    /* in-place shuffle of the run a[i..j-1] */
    while (i + 1 < j) {
        int k = random(j - i) + i;   /* random index in [i, j) */
        swap(a[i], a[k]);
        i++;
    }
    i++;
}
Haven't tested it, but it should give the rough idea.

Related

Comparing 1 million integers in an array without sorting it first

I have a task to find the difference between every pair of integers in an array of random numbers and return the lowest difference. A requirement is that the integers can be between 0 and int.MaxValue and that the array will contain 1 million integers.
I put some code together which works fine for a small amount of integers but it takes a long time (so long most of the time I give up waiting) to do a million. My code is below, but I'm looking for some insight on how I can improve performance.
for(int i = 0; i < _RandomIntegerArray.Count(); i++) {
for(int ii = i + 1; ii < _RandomIntegerArray.Count(); ii++) {
if (_RandomIntegerArray[i] == _RandomIntegerArray[ii]) continue;
int currentDiff = Math.Abs(_RandomIntegerArray[i] - _RandomIntegerArray[ii]);
if (currentDiff < lowestDiff) {
Pairs.Clear();
}
if (currentDiff <= lowestDiff) {
Pairs.Add(new NumberPair(_RandomIntegerArray[i], _RandomIntegerArray[ii]));
lowestDiff = currentDiff;
}
}
}
Apologies to everyone that has pointed out that I don't sort; unfortunately sorting is not allowed.
Imagine that you have already found a pair of integers a and b from your random array such that a > b and a-b is the lowest among all possible pairs of integers in the array.
Does an integer c exist in the array such that a > c > b, i.e. c goes between a and b? Clearly, the answer is "no", because otherwise you'd pick the pair {a, c} or {c, b}.
This gives an answer to your problem: a and b must be next to each other in a sorted array. Sorting can be done in O(N log N), and the search can be done in O(N) - an improvement over the O(N²) algorithm that you have.
As per @JonSkeet, try sorting the array first and then comparing only consecutive array items, which means that you only need to iterate the array once:
Array.Sort(_RandomIntegerArray);
for (int i = 1; i < _RandomIntegerArray.Count(); i++)
{
int currentDiff = _RandomIntegerArray[i] - _RandomIntegerArray[i-1];
if (currentDiff < lowestDiff)
{
Pairs.Clear();
}
if (currentDiff <= lowestDiff)
{
Pairs.Add(new NumberPair(_RandomIntegerArray[i], _RandomIntegerArray[i-1]));
lowestDiff = currentDiff;
}
}
In my testing this results in < 200 ms elapsed for 1 million items.
You've got a million integers out of a possible 2.15 or 4.3 billion (signed or unsigned). By the pigeonhole principle, that means the largest possible minimum distance is about 2150 or 4300 respectively. Let's say that the max possible min distance is D.
Divide the legal integers into groups of length D. Create a hash h keyed on integers with arrays of ints as values. Process your array by taking each element x, and adding it to h[x/D].
The point of doing this is that any valid pair of points is either contained in h(k) for some k, or collectively in h(k) and h(k+1).
Find your pair of points by going through the keys of the hash and checking the points associated with adjacent keys. You can sort if you like, or use a bitvector, or any other method but now you're dealing with small arrays (on average 1 element per array).
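A rough C# sketch of that bucketing approach, assuming the values are non-negative ints in an array; the names (D, buckets) are illustrative, and the tiny per-bucket lists are sorted only because the answer above allows it:
long range = (long)int.MaxValue + 1;                 // values assumed in [0, int.MaxValue]
int n = _RandomIntegerArray.Length;
long D = Math.Max(1, range / (n - 1));               // largest possible minimum gap
var buckets = new Dictionary<long, List<int>>();
foreach (int x in _RandomIntegerArray)
{
    long key = x / D;
    List<int> bucket;
    if (!buckets.TryGetValue(key, out bucket))
        buckets[key] = bucket = new List<int>();
    bucket.Add(x);
}
long lowestDiff = long.MaxValue;
foreach (var kv in buckets)
{
    var candidates = new List<int>(kv.Value);
    List<int> next;
    if (buckets.TryGetValue(kv.Key + 1, out next))   // the closest pair spans at most
        candidates.AddRange(next);                   // two adjacent buckets
    candidates.Sort();                               // tiny lists, so this is cheap
    for (int i = 1; i < candidates.Count; i++)
    {
        long diff = (long)candidates[i] - candidates[i - 1];
        if (diff != 0 && diff < lowestDiff)          // equal values skipped, as in the question
            lowestDiff = diff;
    }
}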
As the elements of the array are between 0 and int.MaxValue, I will suppose here that the maximum value present is less than 1 million. If so, you just need to initialise an array of size maxValue + 1 to 0 and then, as you read the 1 million values, increment the corresponding slot in your array.
Now read this counts array and find the lowest difference between adjacent non-empty slots, as described by others, just as if the values were sorted. If any value is present more than once, its count will be > 1, so you can immediately say that the minimum difference is 0.
NOTE: this method uses no sorting, but it is only efficient if the maximum value is much smaller than 10^6 (1 million).
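A rough sketch of that counting-array idea, assuming maxValue (the largest value in the input) is known and small; the names are illustrative:
int[] counts = new int[maxValue + 1];
foreach (int v in _RandomIntegerArray)
    counts[v]++;                                   // tally each value

int lowestDiff = int.MaxValue;
int prev = -1;                                     // previous non-empty slot
for (int v = 0; v <= maxValue; v++)
{
    if (counts[v] == 0) continue;
    if (counts[v] > 1) { lowestDiff = 0; break; }  // a duplicate means the difference is 0
    if (prev >= 0 && v - prev < lowestDiff)
        lowestDiff = v - prev;
    prev = v;
}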
It helps a little if you do not call Count() on each iteration:
int countIntegers = _RandomIntegerArray.Count();
for(int i = 0; i < countIntegers; i++) {
//...
for(int ii = i + 1; ii < countIntegers; ii++) {
//...
This assumes that Count() recounts the ints in the array on every call, rather than caching the result until the array is modified.
How about splitting up the array into chunks of size (array size / number of processors) and running each chunk in a different thread? (Neil)
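A rough sketch of that chunking suggestion using Parallel.For with thread-local minimums (requires System.Threading.Tasks); it assumes _RandomIntegerArray is an int[] and only computes the lowest difference, since collecting the pairs as well would need extra synchronization:
int lowestDiff = int.MaxValue;
object sync = new object();
Parallel.For(0, _RandomIntegerArray.Length,
    () => int.MaxValue,                                       // per-thread minimum
    (i, state, localMin) =>
    {
        for (int j = i + 1; j < _RandomIntegerArray.Length; j++)
        {
            int diff = Math.Abs(_RandomIntegerArray[i] - _RandomIntegerArray[j]);
            if (diff != 0 && diff < localMin)                 // equal values skipped, as in the question
                localMin = diff;
        }
        return localMin;
    },
    localMin => { lock (sync) { if (localMin < lowestDiff) lowestDiff = localMin; } });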
Assume three parts A, B and C of sizes as close to equal as possible.
For each part, find the minimum "in-part" difference, and also the minimum over pairs whose first component is from the current part and whose second is from the next part (A being the next part after C).
With a method taking O(n²) time, a part of size n/3 takes one ninth of the effort; done 2*3 times, this amounts to two thirds, plus change for combining the results.
This calls to be applied recursively - remember Karatsuba multiplication?
Wait - maybe use two parts after all, for three fourths of the effort - very close to "Karatsuba". (When I did not see how to use an even number of parts, I was thinking of multiprocessing with every processor doing "the same".)

Randomly select a specific quantity of indices from an array?

I have an array of boolean values and need to randomly select a specific quantity of indices for values which are true.
What is the most efficient way to generate the array of indices?
For instance,
BitArray mask = GenerateSomeMask(length: 100000);
int[] randomIndices = RandomIndicesForTrue(mask, quantity: 10);
In this case the length of randomIndices would be 10.
There's a faster way to do this that requires only a single scan of the list.
Consider picking a line at random from a text file when you don't know how many lines are in the file, and the file is too large to fit in memory. The obvious solution is to read the file once to count the lines, pick a random number in the range of 0 to Count-1, and then read the file again up to the chosen line number. That works, but requires you to read the file twice.
A faster solution is to read the first line and save it as the selected line. You replace the selected line with the next line with probability 1/2. When you read the third line, you replace with probability 1/3, etc. When you've read the entire file, you have selected a line at random, and every line had equal probability of being selected. The code looks something like this:
string selectedLine = null;
int numLines = 0;
Random rnd = new Random();
foreach (var line in File.ReadLines(filename))
{
++numLines;
double prob = 1.0/numLines;
if (rnd.NextDouble() < prob)
selectedLine = line;
}
Now, what if you want to select 2 lines? You select the first two. Then, as each line is read the probability that it will replace one of the two lines is 2/n, where n is the number of lines already read. If you determine that you need to replace a line, you randomly select the line to be replaced. You can follow that same basic idea to select any number of lines at random. For example:
string[] selectedLines = new string[M];   // M = how many lines to select
int numLines = 0;
Random rnd = new Random();
foreach (var line in File.ReadLines(filename))
{
++numLines;
if (numLines <= M)
{
selectedLines[numLines-1] = line;
}
else
{
double prob = (double)M/numLines;
if (rnd.NextDouble() < prob)
{
int ix = rnd.Next(M);
selectedLines[ix] = line;
}
}
}
You can apply that to your BitArray quite easily:
int[] selected = new int[quantity];
int num = 0; // number of True items seen
Random rnd = new Random();
for (int i = 0; i < items.Length; ++i)
{
if (items[i])
{
++num;
if (num <= quantity)
{
selected[num-1] = i;
}
else
{
double prob = (double)quantity/num;
if (rnd.NextDouble() < prob)
{
int ix = rnd.Next(quantity);
selected[ix] = i;
}
}
}
}
You'll need some special case code at the end to handle the case where there aren't quantity set bits in the array, but you'll need that with any solution.
This makes a single pass over the BitArray, and the only extra memory it uses is for the list of selected indexes. I'd be surprised if it wasn't significantly faster than the LINQ version.
Note that I used the probability calculation to illustrate the math. You can change the inner loop code in the first example to:
if (rnd.Next(numLines+1) == numLines)
{
selectedLine = line;
}
++numLines;
You can make a similar change to the other examples. That does the same thing as the probability calculation, and should execute a little faster because it eliminates a floating point divide for each item.
There are two families of approaches you can use: deterministic and non-deterministic. The first one involves finding all the eligible elements in the collection and then picking N at random; the second involves randomly reaching into the collection until you have found N eligible items.
Since the size of your collection is not negligible at 100K and you only want to pick a few out of those, at first sight non-deterministic sounds like it should be considered because it can give very good results in practice. However, since there is no guarantee that N true values even exist in the collection, going non-deterministic could put your program into an infinite loop (less catastrophically, it could just take a very long time to produce results).
Therefore I am going to suggest going for a deterministic approach, even though you are going to pay for the guarantees you need through the nose with resource usage. In particular, the operation will involve in-place sorting of an auxiliary collection; this will practically undo the nice space savings you got by using BitArray.
Theory aside, let's get to work. The standard way to handle this is:
Filter all eligible indices into an auxiliary collection.
Randomly shuffle the collection with Fisher-Yates (there's a convenient implementation on StackOverflow).
Pick the first N items of the shuffled collection. If there are fewer than N, your input cannot satisfy your requirements.
Translated into LINQ:
var results = mask
.Cast<bool>() // BitArray only implements the non-generic IEnumerable
.Select((f, i) => Tuple.Create(i, f)) // project into index/bool pairs
.Where(t => t.Item2) // keep only those where bool == true
.Select(t => t.Item1) // extract indices
.ToList() // prerequisite for next step
.Shuffle() // Fisher-Yates
.Take(quantity) // pick N
.ToArray(); // into an int[]
if (results.Length < quantity)
{
// not enough true values in input
}
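The Shuffle() call above is the Fisher-Yates extension method the answer refers to; it is not part of the framework. A minimal version might look like this:
static class ShuffleExtensions
{
    private static readonly Random Rng = new Random();

    public static IList<T> Shuffle<T>(this IList<T> list)
    {
        for (int i = list.Count - 1; i > 0; i--)
        {
            int k = Rng.Next(i + 1);          // random index in [0, i]
            T tmp = list[i];
            list[i] = list[k];
            list[k] = tmp;
        }
        return list;                          // returned so it can be chained in the LINQ query
    }
}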
If you have 10 indices to choose from, you could generate a random number from 0 to 2^10 - 1, and use that as your mask.

Simultaneous Sorting/Removing data from a List

Currently I have a list of integers. This list contains index values that point to "active" objects in another, much larger list. If the smaller list of "active" values becomes too large, it triggers a loop that iterates through the small list and removes values that have become inactive. Currently, it removes them by simply ignoring the inactive values and adding them to a second list (and when the second list gets full again the same process is repeated, placing them back into the first list and so on).
After this trigger occurs, the list is then sorted using a Quicksort implementation. This is all fine and dandy.
-------Question---------
However, I see a potential speed gain: I am imagining combining the removal of inactive values with the sorting itself. Unfortunately, I cannot find a way to implement quicksort in this way, simply because quicksort works with pivots, which means that if values are removed from the list, the pivot will eventually try to access a slot in the list that no longer exists, etc. (unless I'm just thinking about it wrong).
So, any ideas on how to combine the two operations? I can't seem to find any sorting algorithms as fast as quicksort that could handle this, or perhaps I'm just not seeing how to implement it into a quicksort... any hints are appreciated!
Code for better understanding of whats currently going on:
(Current Conditions: values can range from 0 to 2 million, no 2 values are the same, and in general they are mostly sorted, since they are sorted every so often)
if (deactive > 50000)//if the number of inactive coordinates is greater than 50k
{
for (int i = 0; i < activeCoords1.Count; i++)
{
if (largeArray[activeCoords1[i]].active == true) //if coordinate is active, re-add to list
{
activeCoords2.Add(activeCoords1[i]);
}
}
//clears the old list for future use
activeCoords1.Clear();
deactive = 0;
//sorts the new list
Quicksort(activeCoords2, 0, activeCoords2.Count - 1);
}
static void Quicksort(List<int> elements, int left, int right)
{
int i = left, j = right;
int pivot = elements[(left + right) / 2];
while (i <= j)
{
// p < pivot
while (elements[i].CompareTo(pivot) < 0)
{
i++;
}
while (elements[j].CompareTo(pivot) > 0)
{
j--;
}
if (i <= j)
{
// Swap
int tmp = elements[i];
elements[i] = elements[j];
elements[j] = tmp;
i++;
j--;
}
}
// Recursive calls
if (left < j)
{
Quicksort(elements, left, j);
}
if (i < right)
{
Quicksort(elements, i, right);
}
}
It sounds like you might benefit from using a red-black tree (or another balanced binary tree): your search, insert and delete times will all be O(log n). The tree will always be sorted, so there will be no one-off big hits incurred to re-sort.
What is your split in terms of types of access (search, insert, delete), and what are your constraints for each?
I would use a List<T> or a SortedDictionary<TKey, TValue> as your data structure.
As your reason for sorting ("micro optimization based on feelings") is not a good one, I would refrain from it. A good reason would be "it has a measurable impact on performance".
In that case (or if you just want to do it), I recommend a SortedDictionary. All the sorting stuff is already done for you; no reason to reinvent the wheel.
There is no need to juggle two Lists if one appropriate data structure suffices. A red-black tree seems appropriate, and it is apparently what SortedDictionary uses according to this
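For completeness, a minimal sketch of that single-data-structure idea using SortedSet<int>, which is backed by a self-balancing tree; largeArray comes from the question and the index values are assumed unique (as stated), everything else is illustrative:
// Indices stay sorted at all times; Add/Remove are O(log n),
// so there is no periodic rebuild-and-resort step.
var activeCoords = new SortedSet<int>();

activeCoords.Add(someIndex);      // activate an index
activeCoords.Remove(someIndex);   // deactivate an index

// Enumeration is always in ascending order:
foreach (int index in activeCoords)
{
    var obj = largeArray[index];
    // ... work with the active object ...
}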

Most efficient sorting algorithm for sorted sub-sequences

I have several sorted sequences of numbers of type long (ascending order) and want to generate one master sequence that contains all elements in the same order. I am looking for the most efficient sorting algorithm to solve this problem. I am targeting C# and .NET 4.0, so I also welcome ideas involving parallelism.
Here is an example:
s1 = 1,2,3,5,7,13
s2 = 2,3,6
s3 = 4,5,6,7,8
resulting Sequence = 1,2,2,3,3,4,5,5,6,6,7,7,8,13
Edit: When there are two (or more) identical values then the order of those two (or more) does not matter.
Just merge the sequences. You do not have to sort them again.
There is no .NET Framework method that I know of to do a K-way merge. Typically, it's done with a priority queue (often a heap). It's not difficult to do, and it's quite efficient. Given K sorted lists, together holding N items, the complexity is O(N log K).
I show a simple binary heap class in my article A Generic Binary Heap Class. In Sorting a Large Text File, I walk through the creation of multiple sorted sub-files and using the heap to do the K-way merge. Given an hour (perhaps less) of study, you can probably adapt that for use in your program.
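A rough sketch of such a K-way merge. It uses PriorityQueue<TElement, TPriority>, which only exists in .NET 6 and later, so on .NET 4.0 you would substitute a binary heap such as the one from the articles mentioned above; the method name is illustrative:
static IEnumerable<long> MergeKSorted(IEnumerable<IEnumerable<long>> sequences)
{
    // Each enumerator is queued with its current head value as the priority.
    var queue = new PriorityQueue<IEnumerator<long>, long>();
    foreach (var seq in sequences)
    {
        var e = seq.GetEnumerator();
        if (e.MoveNext())
            queue.Enqueue(e, e.Current);
    }
    while (queue.Count > 0)
    {
        var e = queue.Dequeue();         // enumerator with the smallest head value
        yield return e.Current;
        if (e.MoveNext())
            queue.Enqueue(e, e.Current); // re-queue with its new head value
    }
}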
You just have to merge your sequences like in a merge sort.
And this is parallelizable:
merge sequences (1 and 2 in 1/2), (3 and 4 in 3/4), …
merge sequences (1/2 and 3/4 in 1/2/3/4), (5/6 and 7/8 in 5/6/7/8), …
…
Here is the merge function:
int j = 0;
int k = 0;
for (int i = 0; i < size_merged_seq; i++)
{
    // take from seq1 while it still has elements and either seq2 is exhausted
    // or seq1's head is the smaller one
    if (j < size_seq1 && (k >= size_seq2 || seq1[j] < seq2[k]))
    {
        merged_seq[i] = seq1[j];
        j++;
    }
    else
    {
        merged_seq[i] = seq2[k];
        k++;
    }
}
The easy way is to merge them with each other one by one. However, this will require O(n*k²) time, where k is the number of sequences and n is the average number of items per sequence. Using a divide and conquer approach, you can lower this time to O(n*k*log k). The algorithm is as follows:
Divide the k sequences into k/2 groups of 2 sequences each (plus 1 group with a single sequence if k is odd).
Merge the sequences within each group. This gives you roughly k/2 new sequences.
Repeat until you get a single sequence.
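A sketch of that divide-and-conquer driver, assuming a two-way Merge helper such as the merge function shown earlier (here taken to accept and return List<long>); the names are illustrative, and the pairs within each round could also be merged in parallel as suggested above:
static List<long> MergeAll(List<List<long>> sequences)
{
    while (sequences.Count > 1)
    {
        var next = new List<List<long>>();
        for (int i = 0; i + 1 < sequences.Count; i += 2)
            next.Add(Merge(sequences[i], sequences[i + 1]));   // merge each pair of sequences
        if (sequences.Count % 2 == 1)
            next.Add(sequences[sequences.Count - 1]);          // the odd one carries over
        sequences = next;                                      // next round has about half as many
    }
    return sequences.Count == 1 ? sequences[0] : new List<long>();
}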
UPDATE:
Turns out that, with all the algorithms considered, it's still faster to do it the simple way:
private static List<T> MergeSorted<T>(IEnumerable<IEnumerable<T>> sortedBunches)
{
var list = sortedBunches.SelectMany(bunch => bunch).ToList();
list.Sort();
return list;
}
And for legacy purposes...
Here is the final version, which works by keeping the list of enumerators prioritized by their current value:
private static IEnumerable<T> MergeSorted<T>(IEnumerable<IEnumerable<T>> sortedInts) where T : IComparable<T>
{
var enumerators = new List<IEnumerator<T>>(sortedInts.Select(ints => ints.GetEnumerator()).Where(e => e.MoveNext()));
enumerators.Sort((e1, e2) => e1.Current.CompareTo(e2.Current));
while (enumerators.Count > 1)
{
yield return enumerators[0].Current;
if (enumerators[0].MoveNext())
{
// bubble the advanced enumerator forward until the list is ordered by Current again,
// so that index 0 always holds the smallest current value
int pos = 0;
while (pos + 1 < enumerators.Count && enumerators[pos].Current.CompareTo(enumerators[pos + 1].Current) > 0)
{
var tmp = enumerators[pos];
enumerators[pos] = enumerators[pos + 1];
enumerators[pos + 1] = tmp;
pos++;
}
}
else
{
enumerators.RemoveAt(0);
}
}
if (enumerators.Count > 0)
{
do
{
yield return enumerators[0].Current;
} while (enumerators[0].MoveNext());
}
}

List.Sort and Bubble Sort, which is faster? (Closed)

For example, I have a List
List<int> list = new List<int>();
list.Add(1);
list.Add(5);
list.Add(7);
list.Add(3);
list.Add(17);
list.Add(10);
list.Add(13);
list.Add(9);
I use List.Sort method like this
private static int Compare(int x, int y)
{
if (x == y)
return 0;
else if (x > y)
return -1;
else
return 1;
}
list.Sort(Compare);
I use bubble sort like this
private static void Sort(List<int> list)
{
int size = list.Count; // Count, not Capacity: Capacity can be larger than the number of items
for (int i = 1; i < size; i++)
{
for (int j = 0; j < (size - i); j++)
{
if (list[j] > list[j+1])
{
int temp = list[j];
list[j] = list[j+1];
list[j+1] = temp;
}
}
}
}
My question is, as the title says: which one is faster?
Thank you
On the whole, bubble sort will be slower than almost anything else, including List.Sort which is implemented with a quick sort algorithm.
Bubble sort is simple to implement, but it's not very efficient. The List.Sort method uses QuickSort, which is a more complex and also more efficient algorithm.
However, when you have very few items in your list, like in your example, the efficiency of the algorithm doesn't really matter. What matters is how it's implemented and how much overhead there is, so you would just have to use the Stopwatch class to time your examples. This will of course only tell you which is faster for the exact list that you are testing, so it's not very useful for choosing an algorithm to use in an application.
Besides, when there are very few items in the list, it doesn't really matter which algorithm is faster because it takes so little time anyway. You should consider how many items there will be in the actual implementation, and whether the number of items will grow over time.
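For example, a quick Stopwatch comparison over the list from the question might look like this (it only tells you which is faster for this particular input):
var copy1 = new List<int>(list);
var copy2 = new List<int>(list);

var sw = System.Diagnostics.Stopwatch.StartNew();
copy1.Sort(Compare);                 // List<T>.Sort with the comparison above
sw.Stop();
Console.WriteLine("List.Sort:   {0} ticks", sw.ElapsedTicks);

sw.Restart();
Sort(copy2);                         // the bubble sort above
sw.Stop();
Console.WriteLine("Bubble sort: {0} ticks", sw.ElapsedTicks);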
Have a look at the documentation of List<T>.Sort():
On average, this method is an O(n log n) operation, where n is Count; in the worst case it is an O(n ^ 2) operation.
Since bubble sort is O(n ^ 2) (both on average and in the worst case), you can expect List<T>.Sort() to be (much) faster on large data sets. The speed of sorting 8 elements (as you have) is usually so minuscule (even using bubble sort), that it doesn't matter what you use.
What could affect the speed in this case is the fact that you use a delegate with List<T>.Sort(), but not with your bubble sort. Invoking delegates is relatively slow, so you should try to avoid them if possible when you are micro-optimizing (which you shouldn't do most of the time).
