Simultaneous Sorting/Removing data from a List - c#

Currently I have a list of integers. This list contains index values that point to "active" objects in another, much larger list. If the smaller list of "active" values becomes too large, it triggers a loop that iterates through the small list and removes values that have become inactive. Currently, it removes them by skipping the inactive values and copying only the still-active ones to a second list (when the second list fills up again, the same process is repeated in the other direction, copying back into the first list, and so on).
After this trigger occurs, the list is then sorted using a Quicksort implementation. This is all fine and dandy.
-------Question---------
However, I see a potential speed gain. I am imagining removing the inactive values while the sorting is taking place. Unfortunately, I cannot find a way to implement quicksort this way, because quicksort works with pivots: if values are removed from the list, the pivot will eventually try to access a slot in the list that no longer exists (unless I'm just thinking about it wrong).
So, any ideas on how to combine the two operations? I can't seem to find any sorting algorithms as fast as quicksort that could handle this, or perhaps I'm just not seeing how to implement it into a quicksort... any hints are appreciated!
Code for a better understanding of what's currently going on:
(Current Conditions: values can range from 0 to 2 million, no 2 values are the same, and in general they are mostly sorted, since they are sorted every so often)
if (deactive > 50000) // if the number of inactive coordinates is greater than 50k
{
    for (int i = 0; i < activeCoords1.Count; i++)
    {
        // if the coordinate is still active, re-add it to the new list
        if (largeArray[activeCoords1[i]].active)
        {
            activeCoords2.Add(activeCoords1[i]);
        }
    }
    // clears the old list for future use
    activeCoords1.Clear();
    deactive = 0;
    // sorts the new list
    Quicksort(activeCoords2, 0, activeCoords2.Count - 1);
}
static void Quicksort(List<int> elements, int left, int right)
{
    int i = left, j = right;
    int pivot = elements[(left + right) / 2];
    while (i <= j)
    {
        // p < pivot
        while (elements[i].CompareTo(pivot) < 0)
        {
            i++;
        }
        while (elements[j].CompareTo(pivot) > 0)
        {
            j--;
        }
        if (i <= j)
        {
            // Swap
            int tmp = elements[i];
            elements[i] = elements[j];
            elements[j] = tmp;
            i++;
            j--;
        }
    }
    // Recursive calls
    if (left < j)
    {
        Quicksort(elements, left, j);
    }
    if (i < right)
    {
        Quicksort(elements, i, right);
    }
}

It sounds like you might benefit from using a red-black tree (or another balanced binary tree): search, insert and delete are all O(log n). The tree is always sorted, so there are no one-off big hits incurred to re-sort.
What is your split in terms of types of access (search, insert, delete) and what are your constraints for reach?
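As a minimal sketch of the balanced-tree idea, assuming the active indices are plain ints as in the question: .NET's SortedSet<int> is a balanced binary search tree, so it stays sorted across every insert and delete. The class name ActiveIndexDemo and the sample values are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class ActiveIndexDemo
{
    // The set keeps its elements ordered after every Add/Remove,
    // so no separate re-sort pass is needed.
    public static List<int> Run()
    {
        var active = new SortedSet<int> { 1500000, 42, 7, 900 };

        active.Add(13);      // activate index 13: O(log n)
        active.Remove(900);  // deactivate index 900: O(log n)

        // Enumeration is always in ascending order.
        return new List<int>(active);
    }
}
```

Whether this beats the list-plus-periodic-sort approach depends on the access pattern, so it is worth measuring both.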

I would use a List<T> or a SortedDictionary<TKey, TValue> as your data structure.
As your reason for sorting ("micro optimization based on feelings") is not a good one, I would refrain from it. A good reason would be "it has a measurable impact on performance".
In that case (or if you just want to do it), I recommend a SortedDictionary. All the sorting is already done for you, so there is no reason to reinvent the wheel.
There is no need to juggle two Lists if one appropriate data structure suffices. A red-black tree seems appropriate and is apparently used in the SortedDictionary according to this
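If a single List<int> is kept, a rough sketch of the "one list instead of two" idea could look like this. It assumes a bool[] named isActive standing in for largeArray[i].active from the question; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class ActiveListCompaction
{
    // isActive[i] stands in for largeArray[i].active from the question.
    public static void Compact(List<int> activeCoords, bool[] isActive)
    {
        // RemoveAll compacts the list in place in a single pass
        // and keeps the relative order of the surviving items.
        activeCoords.RemoveAll(index => !isActive[index]);
        // The built-in sort replaces the hand-written Quicksort.
        activeCoords.Sort();
    }
}
```

Because RemoveAll preserves order, the Sort call only has real work to do for indices that were appended out of order since the last compaction.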

Related

How can I implement odd-even sorting in C# using threads?

I am practicing threads and concurrency in C# and tried to implement the basic odd-even sort algorithm using one thread for the even passes and another for the odd passes.
static bool Sort(int startPosition, List<int> list)
{
    bool result = true;
    do
    {
        for (int i = startPosition; i <= list.Count - 2; i = i + 2)
        {
            if (list[i] > list[i + 1])
            {
                int temp = list[i];
                list[i] = list[i + 1];
                list[i + 1] = temp;
                result = false;
            }
        }
    } while (!result);
    return result;
}
While the main method is like this:
static void Main(string[] args)
{
    bool isOddSorted = false;
    bool isEvenSorted = false;
    List<int> list = new List<int>();
    while (list.Count < 15)
    {
        list.Add(new Random().Next(0, 20));
    }
    var evenThread = new Thread(() =>
    {
        isEvenSorted = Sort(0, list);
    });
    evenThread.Start();
    var oddThread = new Thread(() =>
    {
        isOddSorted = Sort(1, list);
    });
    oddThread.Start();
    while (true)
    {
        if (isEvenSorted && isOddSorted)
        {
            foreach (int i in list)
            {
                Console.WriteLine(i);
            }
            break;
        }
    }
}
Understandably, the loop in Sort method works forever because the result variable is never set to true. However the way it works manages to sort the list. It just doesn't break at any time.
However the moment I add a "result = true" to the first line of do-scope of Sort function, the sorting messes up.
I couldn't figure out how to fix this.
You cannot do odd-even sort easily in a multi-threaded manner. Why?
Because the odd-even sort is in essence the repetition of two sorting passes (the odd and the even pass), with each pass depending on the result of the preceding one. In practical terms you cannot run the two passes in parallel, because each pass has to follow the previous one.
There are of course ways to employ multi-threading even with odd-even sort, although that probably wouldn't make much practical sense. For example, you could divide the list into several partitions and odd-even-sort each partition independently, each on its own thread. As a final step, the sorted partitions would have to be merged so that the result is the fully sorted list.
(By the way, the reason you eventually get a sorted list if you let the do-while loops in your Sort method run long enough is that, even with "overlapping" concurrent passes, the elements eventually get compared with each other and shuffled into the right positions. The result may not contain all the same numbers as the original list, though: since you have not synchronized list access, some numbers can be lost and replaced with duplicates of other numbers, depending on how the list accesses of the two threads interleave at run time.)
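The partition idea could be sketched roughly as follows. This is only an illustration under some assumptions: two partitions, each copied into its own array so the tasks share no memory, and Task.Run used instead of raw threads. The names PartitionedOddEvenSort, OddEvenSort and Merge are made up here.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class PartitionedOddEvenSort
{
    // Plain single-threaded odd-even sort of one array.
    static void OddEvenSort(int[] a)
    {
        bool sorted = false;
        while (!sorted)
        {
            sorted = true;                               // reset before every round
            for (int start = 0; start <= 1; start++)     // even pass, then odd pass
            {
                for (int i = start; i + 1 < a.Length; i += 2)
                {
                    if (a[i] > a[i + 1])
                    {
                        int temp = a[i];
                        a[i] = a[i + 1];
                        a[i + 1] = temp;
                        sorted = false;
                    }
                }
            }
        }
    }

    // Standard two-way merge of two sorted arrays.
    static List<int> Merge(int[] left, int[] right)
    {
        var result = new List<int>(left.Length + right.Length);
        int i = 0, j = 0;
        while (i < left.Length && j < right.Length)
            result.Add(left[i] <= right[j] ? left[i++] : right[j++]);
        while (i < left.Length) result.Add(left[i++]);
        while (j < right.Length) result.Add(right[j++]);
        return result;
    }

    public static List<int> Sort(List<int> input)
    {
        int mid = input.Count / 2;
        // Each task works on its own copy, so nothing is shared while sorting.
        int[] left = input.GetRange(0, mid).ToArray();
        int[] right = input.GetRange(mid, input.Count - mid).ToArray();

        Task.WaitAll(
            Task.Run(() => OddEvenSort(left)),
            Task.Run(() => OddEvenSort(right)));

        return Merge(left, right);
    }
}
```

The merge step runs only after both tasks have finished, which is what keeps the unsynchronized per-partition sorts safe.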
You are trying to modify a non-thread-safe collection across threads.
Even if the idea is sound (you are using a basic swap in the Sort method, though it is not implemented entirely correctly), you have to take into account that while one thread is in the middle of a swap, the other one could overwrite a value that is being held in the temp variable at that exact moment.
You would need to familiarize yourself with locks and/or thread-safe collections.
Look at your result variable and the logic you have implemented with regard to result.
The outer do ... while (!result) loop will only exit when result is true.
Now imagine your inner for loop finds two numbers that need swapping. So it swaps the numbers and sets result to false. And here is my question to you: after result has been set to false when two numbers have been swapped, when and where is result ever set to true again?
Also, while you sort the numbers at even list positions and the numbers at odd positions, your code does not do a final sort across the entire list. So if, after the even and odd sorting, a larger number at an even position n is followed by a smaller number at odd position n+1, your code leaves it at that, and the list is still (partially) unsorted.

Getting N x N dimension data from quad tree is very slow in c#

I am using a quad-tree structure for my data processing application in C#; it is similar to the Hashlife algorithm. Getting N x N (e.g. 2000 x 2000) data from the quad-tree is very slow.
How can I optimize it for extracting large amounts of data from the quad-tree?
Edit:
Here is the code I used to extract the data recursively:
public int Getvalue(long x, long y)
{
    if (level == 0)
    {
        return value;
    }
    long offset = 1 << (level - 2);
    if (x < 0)
    {
        if (y < 0)
        {
            return NW.Getvalue(x + offset, y + offset);
        }
        else
        {
            return SW.Getvalue(x + offset, y - offset);
        }
    }
    else
    {
        if (y < 0)
        {
            return NE.Getvalue(x - offset, y + offset);
        }
        else
        {
            return SE.Getvalue(x - offset, y - offset);
        }
    }
}
Outer code:
int limit = 500;
List<int> ExData = new List<int>();
for (int row = -limit; row < limit; row++)
{
    for (int col = -limit; col < limit; col++)
    {
        ExData.Add(Root.Getvalue(row, col));
        //sometimes two dimension array
    }
}
A quadtree or any other structure isn't going to help if you're going to visit every element (i.e. level 0 leaf node). Whatever code gets the value in a given cell, an exhaustive tour will visit 4,000,000 points. Your way does arithmetic over and over again as it goes down the tree at each visit.
So for element (-limit,-limit) the code visits every tier and then returns. For the next element it visits every tier again and then returns, and so on. That is very laborious.
It will be faster if the process of adding to the list is itself recursive, visiting each quadrant once.
NB: I'm not a C# programmer so please correct any errors here:
public void AppendValues(List<int> ExData)
{
    if (level == 0)
    {
        ExData.Add(value);
    }
    else
    {
        NW.AppendValues(ExData);
        NE.AppendValues(ExData);
        SW.AppendValues(ExData);
        SE.AppendValues(ExData);
    }
}
That will append all the values though not in the raster-scan (row-by-row) order of the original code!
A further speed up can be achieved if you are dealing with sparse data. So if in many cases nodes are empty or even 'solid' (all zero or one value) you could set the nodes to null and then use zero or the solid value.
That trick works well in Hashlife for Conway Life but depends on your application. Interesting patterns have large areas of 'dead' cells that will always propagate to dead and rarely need considering in detail.
I'm not sure what 25-40% means as 'duplicates'. If they aren't some fixed value or are scattered across the tree large 'solid' regions are likely to be rare and that trick may not help here.
Also, if you actually only need the values in some region (e.g. a rectangle), you need to be a bit cleverer about how you work out which sub-region of each quadrant you need using offset, but it will still be far more efficient than a brute-force tour of every element. Make sure the code recognises when the region of interest is entirely outside the node in hand and returns quickly.
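A rough sketch of such a region query follows. It does not use the asker's centered offset convention: as a simplifying assumption, each node here is told the top-left corner (x0, y0) of the square it covers, and a null child means an empty (all-zero) quadrant. QuadNode, Build and CopyRegion are hypothetical names.

```csharp
using System;

public sealed class QuadNode
{
    public int Value;                 // used only when Size == 1
    public int Size;                  // side length, a power of two
    public QuadNode NW, NE, SW, SE;   // null child = empty quadrant (all zeros)

    // Builds a full tree over grid[y, x] for the square starting at (x0, y0).
    public static QuadNode Build(int[,] grid, int x0, int y0, int size)
    {
        var node = new QuadNode { Size = size };
        if (size == 1) { node.Value = grid[y0, x0]; return node; }
        int h = size / 2;
        node.NW = Build(grid, x0, y0, h);
        node.NE = Build(grid, x0 + h, y0, h);
        node.SW = Build(grid, x0, y0 + h, h);
        node.SE = Build(grid, x0 + h, y0 + h, h);
        return node;
    }

    // Copies the cells of this node that fall inside the rectangle
    // [rx, rx + rw) x [ry, ry + rh) into output[row, col].
    public void CopyRegion(int x0, int y0, int rx, int ry, int rw, int rh, int[,] output)
    {
        // Early exit: this node lies entirely outside the region of interest.
        if (x0 >= rx + rw || x0 + Size <= rx || y0 >= ry + rh || y0 + Size <= ry)
            return;
        if (Size == 1) { output[y0 - ry, x0 - rx] = Value; return; }
        int h = Size / 2;
        NW?.CopyRegion(x0, y0, rx, ry, rw, rh, output);
        NE?.CopyRegion(x0 + h, y0, rx, ry, rw, rh, output);
        SW?.CopyRegion(x0, y0 + h, rx, ry, rw, rh, output);
        SE?.CopyRegion(x0 + h, y0 + h, rx, ry, rw, rh, output);
    }
}
```

Writing into a pre-sized int[rh, rw] array keeps the raster layout, and cells under null quadrants simply stay at the default value of zero.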
All this said, if creating a list of all the values in the quad-tree is a common activity in your application, a quad-tree may not be the answer you need. A map that simply maps (row,col) to value is readily available and again very efficient if there is some common default value (e.g. zero).
It may help to create an iterator object rather than add millions of items to a list; particularly if the list is transient and destroyed soon after.
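A minimal sketch of the iterator idea, assuming a node type with a Level field (0 for leaves) and four child references as in the question. LeafTree and Values are hypothetical names, and null children are simply skipped.

```csharp
using System;
using System.Collections.Generic;

public sealed class LeafTree
{
    public int Level;                 // 0 = leaf
    public int Value;
    public LeafTree NW, NE, SW, SE;

    // Lazily yields every leaf value without building an intermediate list.
    public IEnumerable<int> Values()
    {
        var pending = new Stack<LeafTree>();
        pending.Push(this);
        while (pending.Count > 0)
        {
            LeafTree node = pending.Pop();
            if (node == null) continue;          // empty quadrant
            if (node.Level == 0)
            {
                yield return node.Value;
                continue;
            }
            // Push in reverse so quadrants are visited NW, NE, SW, SE.
            pending.Push(node.SE);
            pending.Push(node.SW);
            pending.Push(node.NE);
            pending.Push(node.NW);
        }
    }
}
```

A caller can then write foreach (int v in root.Values()) and stop early, without millions of items ever being held in a list.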
More information about the actual application is required to understand if a quadtree is the answer here. The information provided so far suggests it isn't.

Why my implementation of linked list quick sort is much slower than array one?

I have been working on an algorithm problem that requires me to implement quicksort for both a linked list and an array.
I have done both parts and the algorithms work, but it seems there is some bug in my quicksort linked list implementation.
Here is my Quick sort linked list implementation.
public static void SortLinkedList(DataList items, DataList.Node low, DataList.Node high)
{
    if (low != null && low != high)
    {
        DataList.Node p = _PartitionLinkedList(items, low, high);
        SortLinkedList(items, low, p);
        SortLinkedList(items, p.Next(), null);
    }
}
private static DataList.Node _PartitionLinkedList(DataList items, DataList.Node low, DataList.Node high)
{
    DataList.Node pivot = low;
    DataList.Node i = low;
    for (DataList.Node j = i.Next(); j != high; j = j.Next())
    {
        if (j.Value().CompareTo(pivot.Value()) <= 0)
        {
            items.Swap(i.Next(), j);
            i = i.Next();
        }
    }
    items.Swap(pivot, i);
    return i;
}
Here is Quick Sort array implementation
public static void SortData(DataArray items, int low, int high)
{
    if (low < high)
    {
        int pi = _PartitionData(items, low, high);
        SortData(items, low, pi - 1);
        SortData(items, pi + 1, high);
    }
}
static int _PartitionData(DataArray arr, int low, int high)
{
    double pivot = arr[high];
    int i = (low - 1);
    for (int j = low; j <= high - 1; j++)
    {
        if (arr[j].CompareTo(pivot) <= 0)
        {
            i++;
            arr.Swap(i, j);
        }
    }
    arr.Swap(i + 1, high);
    return i + 1;
}
Here is Quick sort array and linked list performance. (left n, right time)
Picture
As you can see, the linked list quicksort took 10 minutes to sort 6400 elements. I don't think that's normal.
Also, I don't think it's because of the data structure, because I used the same structure for selection sort, and the performance of the linked list and array versions was similar.
GitHub repo in case i forgot to provide some code. Repo
10 minutes is a very long time for 6400 elements. It would normally require 2 or 3 horrible mistakes, not just one.
Unfortunately, I only see one horrible mistake: Your second recursive call to SortLinkedList(items, p.Next(), null); goes all the way to the end of the list. You meant for it to stop at high.
That might account for the 10 minutes, but it seems a little unlikely.
It also looks to me like your sort is incorrect, even after you fix the above bug -- be sure to test the output!
I would look at your linked list, particularly the swap method. Unless we see the implementation of the linked list, I think the problem area is there.
Is there a reason why you're using linked lists? They have O(n) search, which makes quicksort an O(n^2 log n) sort if nodes are located by searching from the head.
A different way to do it is to add all the items in your linked list to a List, sort that list, and recreate your linked list. List<T>.Sort() uses a quicksort-based algorithm (an introspective sort in current .NET versions).
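public static void SortLinkedList(DataList items)
{
    // Assumes Value() returns double, as in the array version of the question.
    List<double> actualList = new List<double>();
    // Head() is a hypothetical accessor for the first node; use whatever DataList provides.
    for (DataList.Node j = items.Head(); j != null; j = j.Next())
    {
        actualList.Add(j.Value());
    }
    actualList.Sort();
    items.Clear();
    for (int i = 0; i < actualList.Count; i++)
    {
        items.Add(actualList[i]);
    }
}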
Quick sort for a linked list is normally slightly different from quick sort for arrays. Use the first node's data value as the pivot value. Then the code creates 3 lists: one for values < pivot, one for values == pivot, one for values > pivot. It then makes recursive calls for the < pivot and > pivot lists. When the recursive calls return, all 3 lists are sorted, so the code only needs to concatenate the 3 lists.
To speed up concatenation of lists, keep track of a pointer to the last node. To simplify this, use circular lists, and use a pointer to the last node as the main way to access a list. This makes appending (joining) list simpler (no scanning). Once inside a function, use last->next in order to get a pointer to the first node of a list.
Two of the worst case data patterns are already sorted data or already reverse sorted data. If the circular list with pointer to last node method is used, then the average of last and first nodes could be used as a median of 2 which could help (note the list for nodes == pivot could end up empty).
Worst case time complexity is O(n^2). Worst case stack usage is O(n). The stack usage could be reduced by using recursion on the smaller of the list < pivot and list > pivot. After return, the now sorted smaller list would be concatenated with the list == pivot and saved in a 4th list. Then the sort process would iterate on the remaining unsorted list, then merging (or perhaps joining) with the saved list.
Sorting a linked list using any method, including bottom-up merge sort, will be slower than moving the linked list to an array, sorting the array, and then creating a linked list from the sorted array. However, the quick sort method I describe will be much faster than using an array-oriented algorithm with a linked list.
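The three-list scheme described above could be sketched like this. It is an illustration only, with two deliberate simplifications: a head/tail pair per list is used instead of the circular list with a last-node pointer, and the recursion is not limited to the smaller list, so stack usage is still O(n) in the worst case. Node, Segment and LinkedQuickSort are hypothetical names.

```csharp
using System;

public sealed class Node
{
    public double Value;
    public Node Next;
    public Node(double value) { Value = value; }
}

public static class LinkedQuickSort
{
    // A list tracked by head and tail so appends and joins are O(1).
    private struct Segment
    {
        public Node Head, Tail;

        public void Append(Node n)
        {
            n.Next = null;
            if (Head == null) Head = n; else Tail.Next = n;
            Tail = n;
        }

        public void Join(Segment other)
        {
            if (other.Head == null) return;
            if (Head == null) Head = other.Head; else Tail.Next = other.Head;
            Tail = other.Tail;
        }
    }

    public static Node Sort(Node head)
    {
        return SortSegment(head).Head;
    }

    private static Segment SortSegment(Node head)
    {
        var less = new Segment();
        var equal = new Segment();
        var greater = new Segment();
        if (head == null) return equal;

        double pivot = head.Value;          // first node's value is the pivot
        Node n = head;
        while (n != null)
        {
            Node next = n.Next;             // save before Append cuts the link
            int c = n.Value.CompareTo(pivot);
            if (c < 0) less.Append(n);
            else if (c == 0) equal.Append(n);
            else greater.Append(n);
            n = next;
        }

        // Sort the outer lists, then concatenate: less + equal + greater.
        Segment result = SortSegment(less.Head);
        result.Join(equal);
        result.Join(SortSegment(greater.Head));
        return result;
    }
}
```

Nodes are relinked rather than swapped by value, so no step ever has to walk the list to find a node by position.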

Which class type should I use for sorted collection with key updating and value lookup

I am making a game in Unity with c# code and I want a character to have something like aggro table.
It will be something like (float, int) table where int is other character id and float is actual aggro value.
Basically, I have few things in mind:
1. The table should be sorted by float keys so I can quickly get top entries
2. I need to be able to frequently lookup entries by its int value because of (3)
3. I want to change float key frequently and have the table sorted up all the time
4. I dont care about memory usage as there wont be that many entries
5. I do care about performance of operations like inserting/removing/changing entries and re-sorting the list
I am not very experienced in C#. I saw a few things like SortedList/Dictionary, but they don't seem to be optimized for all the things I want.
Do you have any advice?
EDIT: Not sure if I did a good job at explaining what I want to achieve. It might be very similar to a table of football player names and number of goals they scored during the season. I will frequently ask for "most productive players" and also I will frequently update their scores by looking up their names and changing their stats, needing the table to be sorted all the time.
You could use a List<Character>, or an array excluding the player's character. You keep the List<Character> sorted with the highest aggro value at the front. To keep everything sorted, you run quicksort first every frame. Once a Character has a lower aggro value than the player's aggro threshold, you escape out of the method.
If aggro is above the threshold just run the aggro check.
You could extend this to work for multiplayer by having a List<Player>.
Something like:
// Sorts enemies[first..last] by Aggro, highest first.
void quicksort(List<Enemy> enemies, int first, int last)
{
    if (first >= last)
        return; // zero or one element: nothing to sort
    int left = first;
    int right = last;
    int pivot = first;
    first++;
    while (last >= first)
    {
        if (enemies[first].Aggro <= enemies[pivot].Aggro &&
            enemies[last].Aggro > enemies[pivot].Aggro)
            swap(enemies, first, last);
        else if (enemies[first].Aggro <= enemies[pivot].Aggro)
            last--;
        else if (enemies[last].Aggro > enemies[pivot].Aggro)
            first++;
        else
        {
            last--;
            first++;
        }
    }
    swap(enemies, pivot, last);
    pivot = last;
    // the pivot is now in its final position, so exclude it from both halves
    if (pivot - 1 > left)
        quicksort(enemies, left, pivot - 1);
    if (right > pivot + 1)
        quicksort(enemies, pivot + 1, right);
}
void swap(List<Enemy> enemies, int left, int right)
{
    var temp = enemies[right];
    enemies[right] = enemies[left];
    enemies[left] = temp;
}
void CheckAggro()
{
    quicksort(enemies, 0, enemies.Count - 1);
    for (int i = 0; i < players.Count; i++)
    {
        for (int j = 0; j < enemies.Count; j++)
        {
            // enemies are sorted highest aggro first, so stop at the first one below the threshold
            if (enemies[j].Aggro < players[i].AggroThreshhold)
            {
                break;
            }
            // Perform what happens when enemy aggro is at or above the threshold.
        }
    }
}
If players have different aggro thresholds you could save all of the enemies who have aggro above the minimum to a separate List, and then do a check against that from the player with the lowest to highest threshold. Just keep the list of players sorted with the lowest aggro threshold first.
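I think the best solution here is the SortedList. From what I could gather:
SortedList<TKey, TValue> is faster than SortedDictionary<TKey, TValue> when it is populated all at once from sorted data; for inserting and removing unsorted data, SortedDictionary is O(log n) while SortedList is O(n).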
There is a question that i think will help: When to use a SortedList<TKey, TValue> over a SortedDictionary<TKey, TValue>?
hope I helped.
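Since both the sorted collections above are keyed by a single value, one way to get lookup by id and ordering by aggro at the same time is to pair a Dictionary with a SortedSet. The following is only a sketch under that assumption; AggroTable and its members are hypothetical names, and it needs a C# version with tuple support (7.0 or later).

```csharp
using System;
using System.Collections.Generic;

public sealed class AggroTable
{
    // id -> current aggro, for O(1) lookup by character id.
    private readonly Dictionary<int, float> aggroById = new Dictionary<int, float>();

    // (aggro, id) pairs kept in ascending order; the id breaks ties so that
    // two characters with equal aggro remain distinct entries.
    private readonly SortedSet<(float Aggro, int Id)> ordered =
        new SortedSet<(float Aggro, int Id)>();

    // Inserts a new character or updates an existing one: O(log n).
    public void Set(int id, float aggro)
    {
        if (aggroById.TryGetValue(id, out float old))
            ordered.Remove((old, id));   // drop the stale entry first
        aggroById[id] = aggro;
        ordered.Add((aggro, id));
    }

    public bool Remove(int id)
    {
        if (!aggroById.TryGetValue(id, out float old)) return false;
        ordered.Remove((old, id));
        return aggroById.Remove(id);
    }

    public bool TryGet(int id, out float aggro)
    {
        return aggroById.TryGetValue(id, out aggro);
    }

    // Yields up to count character ids, highest aggro first.
    public IEnumerable<int> Top(int count)
    {
        foreach (var entry in ordered.Reverse())
        {
            if (count-- <= 0) yield break;
            yield return entry.Id;
        }
    }
}
```

The table never needs an explicit re-sort: changing a value is a remove followed by an add, both O(log n).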

Why does failing to recognise equality mess up C# List<T> sort?

This is a somewhat obscure question, but after wasting an hour tracking down the bug, I thought it worth asking...
I wrote a custom ordering for a struct, and made one mistake:
My struct has a special state, let us call this "min".
If the struct is in the min state, then it's smaller than any other struct.
My CompareTo method made one mistake: a.CompareTo(b) would return -1 whenever a was "min", but of course if b is also "min" it should return 0.
Now, this mistake completely messed up a List<MyStruct> Sort() method: the whole list would (sometimes) come out in a random order.
My list contained exactly one object in "min" state.
It seems my mistake could only affect things if the one "min" object was compared to itself.
Why would this even happen when sorting?
And even if it did, how can it cause the relative order of two "non-min" objects to be wrong?
Using the LINQ OrderBy method can cause an infinite loop...
Small, complete, test example:
struct MyStruct : IComparable<MyStruct>
{
    public int State;
    public MyStruct(int s) { State = s; }
    public int CompareTo(MyStruct rhs)
    {
        // 10 is the "min" state. Otherwise order as usual
        if (State == 10) { return -1; } // Incorrect
        /*if (State == 10) // Correct version
        {
            if (rhs.State == 10) { return 0; }
            return -1;
        }*/
        if (rhs.State == 10) { return 1; }
        return this.State - rhs.State;
    }
    public override string ToString()
    {
        return String.Format("MyStruct({0})", State);
    }
}
class Program
{
    static int Main()
    {
        var list = new List<MyStruct>();
        var rnd = new Random();
        for (int i = 0; i < 20; ++i)
        {
            int x = rnd.Next(15);
            if (x >= 10) { ++x; }
            list.Add(new MyStruct(x));
        }
        list.Add(new MyStruct(10));
        list.Sort();
        // Never returns...
        //list = list.OrderBy(item => item).ToList();
        Console.WriteLine("list:");
        foreach (var x in list) { Console.WriteLine(x); }
        for (int i = 1; i < list.Count(); ++i)
        {
            Console.Write("{0} ", list[i].CompareTo(list[i - 1]));
        }
        return 0;
    }
}
It seems my mistake could only affect things if the one "min" object was compared to itself.
Not quite. It could also be caused if there were two different "min" objects. In the case of the list sorted this particular time, it can only happen if the item is compared to itself. But the other case is worth considering generally in terms of why supplying an inconsistent comparer (here, one where a.CompareTo(a) does not return 0) to a method that expects a consistent one is a very bad thing.
Why would this even happen when sorting?
Why not?
List<T>.Sort() works by using the Array.Sort<T> on its items. Array.Sort<T> in turn uses a mixture of Insertion Sort, Heapsort and Quicksort, but to simplify let's consider a general quicksort. For simplicity we'll use IComparable<T> directly, rather than via System.Collections.Generic.Comparer<T>.Default:
public static void Quicksort<T>(IList<T> list) where T : IComparable<T>
{
    Quicksort<T>(list, 0, list.Count - 1);
}
public static void Quicksort<T>(IList<T> list, int left, int right) where T : IComparable<T>
{
    int i = left;
    int j = right;
    T pivot = list[(left + right) / 2];
    while (i <= j)
    {
        while (list[i].CompareTo(pivot) < 0)
            i++;
        while (list[j].CompareTo(pivot) > 0)
            j--;
        if (i <= j)
        {
            T tmp = list[i];
            list[i] = list[j];
            list[j] = tmp;
            i++;
            j--;
        }
    }
    if (left < j)
        Quicksort(list, left, j);
    if (i < right)
        Quicksort(list, i, right);
}
This works as follows:
Pick an element, called a pivot, from the list (we use the middle).
Reorder the list so that all elements with values less than the pivot come before the pivot, while all elements with values greater than the pivot come after it.
The pivot is now in its final position, with an unsorted sub-list before and after it. Recursively apply the same steps to these two sub-lists.
Now, there are two things to note about the example code above.
The first is that we do not prevent pivot being compared with itself. We could do this, but why would we? For one thing, we need some sort of comparison code to do this, which is precisely what you've already provided in your CompareTo() method. In order to avoid the wasted CompareTo we'd have to either call CompareTo()* an extra time for each comparison (!) or else track the position of pivot which would add more waste than it removed.
And even if it did, how can it cause the relative order of two "non-min" objects to be wrong?
Because quicksort partitions, it doesn't do one massive sort, but a series of mini-sorts. Therefore an incorrect comparison gets a series of opportunities to mess up parts of those sorts, each time leading to a sub-list of incorrectly sorted values that the algorithm considers "dealt with". So in those cases where the bug in the comparer hits, its damage can be spread throughout much of the list. Just as it does its sort by a series of mini-sorts, so it will do a buggy sort by a series of buggy mini-sorts.
Using the LINQ OrderBy method can cause an infinite loop
It uses a variant of Quicksort that guarantees stability; two equivalent items will still have the same relative order after the sort as before. The extra complexity is presumably leading to it not only comparing the item to itself, but then continuing to do so forever, as it tries to make sure that it is both in front of itself, but also in the same order to itself as it was before. (Yes, that last sentence makes no sense, and that's exactly why it never returns).
*If this was a reference rather than a value type then we could do ReferenceEquals quickly. But that won't be any good with structs, and if it really was a time-saver for the type in question, CompareTo should have if(ReferenceEquals(this, other)) return 0; in it anyway. It also still wouldn't fix the bug once there was more than one "min" item in the list.
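To make the comparer requirement concrete, here is a sketch of a corrected comparison together with a small check that can be run over sample values in a unit test. MinAware mirrors the question's struct but is a separate, hypothetical type, and ComparerChecks.IsConsistent is a made-up helper, not a framework API. It only checks that x.CompareTo(x) is 0 and that swapping the operands flips the sign.

```csharp
using System;
using System.Collections.Generic;

public struct MinAware : IComparable<MinAware>
{
    public const int MinState = 10;   // the special "min" state from the question
    public int State;
    public MinAware(int state) { State = state; }

    public int CompareTo(MinAware other)
    {
        bool thisMin = State == MinState;
        bool otherMin = other.State == MinState;
        if (thisMin && otherMin) return 0;   // the case the buggy version missed
        if (thisMin) return -1;
        if (otherMin) return 1;
        // int.CompareTo avoids the overflow that State - other.State can hit.
        return State.CompareTo(other.State);
    }
}

public static class ComparerChecks
{
    // True when, for every pair in the sample, x.CompareTo(x) == 0 and
    // sign(x.CompareTo(y)) == -sign(y.CompareTo(x)).
    public static bool IsConsistent<T>(IReadOnlyList<T> sample) where T : IComparable<T>
    {
        foreach (T x in sample)
        {
            if (x.CompareTo(x) != 0) return false;
            foreach (T y in sample)
            {
                if (Math.Sign(x.CompareTo(y)) != -Math.Sign(y.CompareTo(x)))
                    return false;
            }
        }
        return true;
    }
}
```

Running the same check against the question's original CompareTo would fail on the very first "min" value, since it returns -1 when compared with itself.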
