Issue with c# quick sort algorithm - c#

I have created two functions that sort a list using bubble sort, but I would like to switch them to quick sort.
I found this quick sort algorithm
http://snipd.net/quicksort-in-c
These are my two functions:
protected void sort_by_section_name()
{
int num1, num2;
for (var i = section_names.Count - 1; i > 0; i -= 1)
{
for (var j = 0; j < i; j += 1)
{
num1 = get_number_from_section(section_names[j]);
num2 = get_number_from_section(section_names[j + 1]);
if (num1 > num2)
{
swap_list_strings(section_names, j, j + 1);
swap_object_items(item_group_list, j, j + 1);
}
}
}
}
protected void sort_items()
{
int num1, num2;
List<SPListItem> temp;
for (var k = 0; k < item_group_list.Count; k += 1)
{
temp = (List<SPListItem>)item_group_list[k];
for (var i = temp.Count - 1; i > 0; i -= 1)
{
for (var j = 0; j < i; j += 1)
{
num1 = Convert.ToInt32((temp[j])[ORDER_BY_COLUMN]);
num2 = Convert.ToInt32((temp[j + 1])[ORDER_BY_COLUMN]);
if (num1 > num2)
{
swap_list_items(temp, j, j + 1);
}
}
}
}
}
For sort_items, it's a list of lists, so the bubble sort sits inside an outer for loop.
I don't understand how to change these two functions to use quicksort.
Can someone please help me?

You don't need to write it yourself in .NET. You can use:
1. Array.Sort for a basic array of items
2. LINQ - OrderBy, for example with a List<string> (make sure you have using System.Linq at the top of the file)
3. If you're feeling adventurous, look into IComparable
4. myItems.Sort(), which sorts the list in place.
For what you want, the easiest way to get started is #2. Here's an example:
List<SPListItem> myItems = GetSomeItems();
myItems = myItems.OrderBy(i => i["MyField"]).ToList();
foreach (var item in myItems)
Console.WriteLine(item);
Without knowing the fields you're after, or much about the SharePoint object model, that's a bit of a guess. There are about five different ways of doing it in .NET with the comparable interfaces. As you can't change the SPListItem class, Sort or LINQ may be easiest.

So you have a List<SPListItem> and you want them sorted, using an efficient sorting algorithm (aka not bubblesort) based on the numeric value of some field. This is easy, and doesn't involve you re-implementing quicksort.
List<SPListItem> list = ...;
var sortedData = list.OrderBy(item => Convert.ToInt32(item["fieldName"]));
It's also worth noting that when possible it's usually better to sort your data on the database, rather than on the webserver. You should be able to add an Order By clause to the CAML query generating those SPListItems and let it do the sort.
It appears that you're sorting two different data structures that are "parallel" (the item at the same index of both structures "belong" together). This is generally undesirable. While there are ways to perform a sort on both structures, what you really should be doing is making a single structure such that each item holds onto everything that logically represents that one item. In many cases this means creating a new class that has properties for each piece of data. You can then populate a collection of this new composite class and sort that.
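As a rough sketch of that last suggestion: the composite class below keeps a section name and its items together, so one sort call reorders both. SectionGroup, SectionSorting and SortAll are hypothetical names, and TItem stands in for SPListItem so the sketch compiles without SharePoint. The two delegates are assumptions standing in for get_number_from_section and the ORDER_BY_COLUMN lookup from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical composite type: one section name together with its items.
class SectionGroup<TItem>
{
    public string Name { get; set; }
    public List<TItem> Items { get; set; }
}

static class SectionSorting
{
    // Sorts the groups by section number and the items inside each group
    // by their order value. Both key selectors are supplied by the caller.
    public static List<SectionGroup<TItem>> SortAll<TItem>(
        IEnumerable<SectionGroup<TItem>> groups,
        Func<string, int> sectionNumber,
        Func<TItem, int> itemOrder)
    {
        return groups
            .OrderBy(g => sectionNumber(g.Name))
            .Select(g => new SectionGroup<TItem>
            {
                Name = g.Name,
                Items = g.Items.OrderBy(itemOrder).ToList()
            })
            .ToList();
    }
}
```

With the question's types, the call would presumably look like SectionSorting.SortAll(groups, get_number_from_section, item => Convert.ToInt32(item[ORDER_BY_COLUMN])). OrderBy is a stable sort, so items with equal keys keep their relative order.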

Related

What should I use to optimize my c# code?

I was doing a codewars kata and it's working but I'm timing out.
I searched online for reference solutions, but they were all in JavaScript.
Here is the kata: https://i.stack.imgur.com/yGLmw.png
Here is my code:
public static int DblLinear(int n)
{
if(n > 0)
{
var list = new List<int>();
int[] next_two = new int[2];
list.Add(1);
for (int i = 0; i < n; i++)
{
for (int m = 0; m < next_two.Length; m++)
{
next_two[m] = ((m + 2) * list[i]) + 1;
}
if(list.Contains(next_two[0]))
{
list.Add(next_two[1]);
}
else if(list.Contains(next_two[1]))
{
list.Add(next_two[0]);
}
else
list.AddRange(next_two);
list.Sort();
}
return list[n];
}
return 1;
}
It's a really slow solution, but that's what seems to be working for me.
The first rule of performance optimization is to measure. Ideally using a profiler that can tell you where most of the time is spent, but for simple cases using some stopwatches can be sufficient.
I would guess that most of the time is spent in list.Contains, since it is a linear lookup that runs on every pass, together with the list.Sort call at the end of each pass. One approach would be to keep a HashSet<int> next to the list to get fast membership checks. Note that simply returning the maximum of the set would not give the same result: after n passes the collection holds more than n + 1 values, so you still need the values in order to pick out the n-th one.
You might also consider using some specialized data structure that fits the problem better than the general containers provided in .Net.
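One way to avoid both Contains and Sort is to build the sequence already in order. The sketch below assumes the kata's rule is what the question's code implies (start at 1, and every x in the sequence contributes 2x + 1 and 3x + 1), since the kata text itself is only available as an image. The class name DblLinearSketch is made up for the example.

```csharp
using System;
using System.Collections.Generic;

static class DblLinearSketch
{
    // Builds the sequence in ascending order with two read positions:
    // i2 points at the next element to feed into 2x + 1,
    // i3 points at the next element to feed into 3x + 1.
    public static int DblLinear(int n)
    {
        var u = new List<int>(n + 1) { 1 };
        int i2 = 0, i3 = 0;
        while (u.Count <= n)
        {
            int a = 2 * u[i2] + 1;
            int b = 3 * u[i3] + 1;
            int next = Math.Min(a, b);
            u.Add(next);
            // Advance both positions on a tie so duplicates are added only once.
            if (a == next) i2++;
            if (b == next) i3++;
        }
        return u[n];
    }
}
```

Each element is appended once and never searched for, so the work grows linearly with n instead of re-sorting the list on every pass.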

Lambda Expression to keep only modulo N index items? (without creating a new list)

I'm trying to pare a list down to just the modulo N items, i.e., keep in list A only the items whose index i satisfies i % N == 0.
My current solution is to create a new list (listB) and loop through the old list, copying the items that meet this condition. (I feel there must be a better way than creating a new list?)
List<string> listA ; /* the list is not actually a string,
but for our test case let's use this (populated with M=31 items for example)*/
List<string> listB = new List<string>();
int N = 3;
for(int i=0;i<listA.Count;i++){
if(i%N == 0)listB.Add(listA[i]);
}
Is there a better (performant?) way to write this in basically "one line" using lambda expressions? (Without needing to declare a new list)
You could create an extension method that only touches the items you're interested in and also avoids creating a new list (though if you create the list with the known size beforehand, it shouldn't be that bad performance-wise).
public static class ListExtensions
{
public static IEnumerable<T> GetNthItems<T>(this List<T> source, int n)
{
for (var i = 0; i < source.Count; i += n)
{
yield return source[i];
}
}
}
And you can then enumerate them directly:
foreach (var myItem in listA.GetNthItems(3))
{
//do something
}
But admittedly I doubt there is that much performance gain if you declare listB like listB = new List<string>((listA.Count / N)+1); and use i += N inside the loop.
Edit:
As requested a way to edit the existing list in place
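public static void ClearAllButNthItems<T>(this List<T> source, int n)
{
var i = 1;
// Copy every nth item forward; the item at index 0 is already in place.
for (; i * n < source.Count; i++)
{
source[i] = source[i * n];
}
// Guard against an empty list, where i (1) is already past the end.
if (i < source.Count)
{
source.RemoveRange(i, source.Count - i);
}
}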
Which can then be used like:
listA.ClearAllButNthItems(100);
This first moves all nth items, in order, to the front of the list, then removes all remaining items (this is a constant time action* since we're removing the end of the list, so no items require moving).
Edit 2:
*It seems the removal is not actually constant time: as far as I can tell, RemoveRange still clears the removed slots of the list's internal array, so the cost grows with the number of items removed. The list instance itself stays the same.
A one-liner:
myList.Where((item, index) => index % N == 0)
A performant solution:
var resultList = new List<SomeType>();
for (var n = 0; n < myList.Count; n += N)
resultList.Add(myList[n]);

Keep statistic about sorting algorithms

I have a homework assignment about object-oriented programming in C#. As part of it, I need to write two different sorting algorithms, feed random numbers into them, and collect statistics about the two algorithms.
About that, my teacher told me in an e-mail: "Non static sorting class can keep statistic about sorting how many numbers, how fast, min, max, average.."
Here are my sorting algorithms, insertion sort and counting sort. Please tell me how I can keep statistics about the sorting.
Don't forget that the main subject of my homework is OOP.
class InsertionSorting : Sort
{
public override List<int> Sorting(List<int> SortList)
{
for ( int i=0; i<SortList.Count-1; i++)
{
for (int j= i+1; j>0; j--)
{
if (SortList[j-1] > SortList [j])
{
int temp = SortList[j - 1];
SortList[j - 1] = SortList[j];
SortList[j] = temp;
}
}
}
return SortList;
}
}
class CountSorting : Sort
{
public override List<int> Sorting(List<int> SortList)
{
int n = SortList.Count;
List<int> output = new List<int>();
List<int> count = new List<int>();
for (int i = 0; i < 1000; ++i)
{
count.Add(0);
output.Add(0);
}
for (int i = 0; i < n; ++i)
++count[SortList[i]];
for (int i = 1; i <= 999; ++i)
count[i] += count[i - 1];
for (int i = 0; i < n; ++i)
{
output[count[SortList[i]] - 1] = SortList[i];
--count[SortList[i]];
}
for (int i = 0; i < SortList.Count; i++)
SortList[i] = output[i];
return SortList;
}
}
Your sorting is being done in two classes - InsertionSorting & CountSorting.
If you want to keep track of the statistics, declare a variable in the class and increment it on every iteration, and so on. Then you can see which one is more efficient.
E.g.
class InsertionSorting : Sort
{
private int iterations = 0;
...
for (int j= i+1; j>0; j--)
{
if (SortList[j-1] > SortList [j])
{
iterations++;
...
You could also declare a startTime and an endTime, allowing you to determine how long the sort took. At the start of "Sorting" record the start time, and just before you return record the end time. Write a method to report the difference.
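A minimal sketch of how the counter and the timing could live in the base class, so each algorithm only reports its own swaps. This restructures the question's Sort class, which is an assumption: here Sorting is no longer overridden, and the subclasses implement a hypothetical SortCore instead. Stopwatch is used in place of a manual start and end time.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical base class: it records statistics around the actual sort.
abstract class Sort
{
    public int Count { get; private set; }
    public int Min { get; private set; }
    public int Max { get; private set; }
    public double Average { get; private set; }
    public int Swaps { get; protected set; }
    public TimeSpan Elapsed { get; private set; }

    // Each algorithm implements only the sorting itself.
    protected abstract void SortCore(List<int> list);

    public List<int> Sorting(List<int> list)
    {
        Swaps = 0;
        Count = list.Count;
        if (Count > 0)
        {
            Min = list.Min();
            Max = list.Max();
            Average = list.Average();
        }
        var watch = Stopwatch.StartNew();
        SortCore(list);
        watch.Stop();
        Elapsed = watch.Elapsed;
        return list;
    }
}

class InsertionSorting : Sort
{
    protected override void SortCore(List<int> list)
    {
        for (int i = 1; i < list.Count; i++)
        {
            // Move list[i] left until the element before it is not larger.
            for (int j = i; j > 0 && list[j - 1] > list[j]; j--)
            {
                int temp = list[j - 1];
                list[j - 1] = list[j];
                list[j] = temp;
                Swaps++;
            }
        }
    }
}
```

Because the statistics are instance properties, each sorter object keeps its own numbers, which seems to be what the "non static sorting class" remark is getting at.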
Your prof has told you how when they said "...statistics about sorting how many numbers, how fast, min, max, average.." Your best bet here is to create a class such as "Statistics" which contains a method that allows user input, either through args or direct user prompt. The variables should be as easy as "count of numbers to sort" "lower bounds of number range", "upper bound of number range", and, if automating the testing process, "number of times to iterate".
Given answers to these questions, you should run the two sorting algos with them (eg use a random number generator, and max and min to generate a list.) Your sorting algos need an addition to "log" these statistics. Most likely a variable that tracks the number of position swaps that occurred in the array.
I'm not about to write out your homework for you (that's your job, and you should get good at it.) But if you have any more questions to this, I may be able to steer you in the right direction if this is too vague and you are still struggling.

Binary search slower, what am I doing wrong?

EDIT: so it looks like this is normal behavior, so can anyone just recommend a faster way to do these numerous intersections?
so my problem is this. I have 8000 lists (strings in each list). For each list (ranging from size 50 to 400), I'm comparing it to every other list and performing a calculation based on the intersection number. So I'll do
list1(intersect)list1= number
list1(intersect)list2= number
list1(intersect)list888= number
And I do this for every list. Previously, I had HashList and my code was essentially this (well, I was actually searching through properties of an object, so I had to modify the code a bit, but it's basically this):
I have my two versions below, but if anyone knows anything faster, please let me know!
Loop through AllLists, getting each list, starting with list1, and then do this:
foreach (List list in AllLists)
{
if (list1_length < list_length) //just a check to so I'm looping through the
//smaller list
{
foreach (string word in list1)
{
if (block.generator_list.Contains(word))
{
//simple integer count
}
}
}
// a little more code, but the same, but looping through the other list if it's smaller/bigger
Then I made the lists into regular lists and applied Sort(), which changed my code to
foreach (List list in AllLists)
{
if (list1_length < list_length) //just a check to so I'm looping through the
//smaller list
{
for (int i = 0; i < list1_length; i++)
{
var test = list.BinarySearch(list1[i]);
if (test > -1)
{
//simple integer count
}
}
}
The first version takes about 6 seconds; the other one takes more than 20 (I just stopped it there because otherwise it would take more than a minute!) and this is for a smallish subset of the data.
I'm sure there's a drastic mistake somewhere, but I can't find it.
Well I have tried three distinct methods for achieving this (assuming I understood the problem correctly). Please note I have used HashSet<int> in order to more easily generate random input.
setting up:
List<HashSet<int>> allSets = new List<HashSet<int>>();
Random rand = new Random();
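for(int i = 0; i < 8000; ++i) {
HashSet<int> ints = new HashSet<int>();
int size = rand.Next(50, 400); // pick the target size once, not on every loop check
for(int j = 0; j < size; ++j) {
ints.Add(rand.Next(0, 1000)); // duplicates are dropped, so the set may end up smaller
}
allSets.Add(ints);
}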
the three methods I checked (code is what runs in the inner loop):
the loop:
note that you are getting duplicated results in your code (intersecting set A with set B and later intersecting set B with set A).
It won't affect your performance thanks to the list length check you are doing. But iterating this way is clearer.
for(int i = 0; i < allSets.Count; ++i) {
for(int j = i + 1; j < allSets.Count; ++j) {
}
}
first method:
used Enumerable.Intersect() to get the intersection with the other set and Enumerable.Count() to get the size of the intersection.
var intersect = allSets[i].Intersect(allSets[j]);
count = intersect.Count();
this was the slowest one averaging 177s
second method:
cloned the smaller of the two sets I was intersecting, then used ISet.IntersectWith() and checked the resulting set's Count.
HashSet<int> intersect;
HashSet<int> intersectWith;
// Copy the smaller set so the originals are not modified.
if(allSets[i].Count < allSets[j].Count) {
intersect = new HashSet<int>(allSets[i]);
intersectWith = allSets[j];
} else {
intersect = new HashSet<int>(allSets[j]);
intersectWith = allSets[i];
}
intersect.IntersectWith(intersectWith);
count = intersect.Count;
this one was slightly faster, averaging 154s
third method:
did something very similar to what you did: iterated over the shorter set and checked ISet.Contains on the longer set.
int count;
HashSet<int> loopingSet, containsSet;
for(int i = 0; i < allSets.Count; ++i) {
for(int j = i + 1; j < allSets.Count; ++j) {
count = 0;
// Loop over the smaller set and probe the larger one.
if(allSets[i].Count < allSets[j].Count) {
loopingSet = allSets[i];
containsSet = allSets[j];
} else {
loopingSet = allSets[j];
containsSet = allSets[i];
}
foreach(int k in loopingSet) {
if(containsSet.Contains(k)) {
++count;
}
}
}
}
this method was by far the fastest (as expected), averaging 66s
conclusion
the method you're using is the fastest of these three. I certainly can't think of a faster single threaded way to do this. Perhaps there is a better concurrent solution.
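For what a concurrent version might look like, here is an untimed sketch that runs the third method's inner work under Parallel.For. The class and method names are made up, and it assumes the sets are not modified while the loop runs, so that concurrent reads are safe. Each outer iteration writes only to its own row, so no locking is needed.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

static class PairwiseIntersections
{
    // Returns counts[i][j - i - 1] = size of the intersection of sets[i] and sets[j], for j > i.
    public static int[][] Count(List<HashSet<int>> sets)
    {
        var counts = new int[sets.Count][];
        Parallel.For(0, sets.Count, i =>
        {
            var row = new int[sets.Count - i - 1];
            for (int j = i + 1; j < sets.Count; j++)
            {
                // Loop over the smaller set and probe the larger one.
                HashSet<int> small = sets[i], large = sets[j];
                if (small.Count > large.Count)
                {
                    small = sets[j];
                    large = sets[i];
                }
                int count = 0;
                foreach (int k in small)
                {
                    if (large.Contains(k)) count++;
                }
                row[j - i - 1] = count;
            }
            counts[i] = row; // each iteration fills only its own slot
        });
        return counts;
    }
}
```

Storing every pair costs memory (roughly 32 million ints for 8000 sets), so if only an aggregate is needed it would be cheaper to fold each count into a per-row result instead of keeping the full table.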
I've found that one of the most important considerations in iterating/searching any kind of collection is to choose the collection type very carefully. To iterate through a normal collection for your purposes will not be the most optimal. Try using something like:
System.Collections.Generic.HashSet<T>
Using the Contains() method while iterating over the shorter of the two lists (as you mentioned you're already doing) should give close to O(1) performance per lookup, the same as key lookups in the generic Dictionary type.

Parallel For losing values when looping

I'm facing a strange issue that I can't explain and I would like to know if some of you have the answer I'm lacking.
I have a small test app for testing multithreading modifications I'm making to a much larger codebase. In this app I've set up two functions: one that runs the loop sequentially and one that uses Parallel.For from the Task Parallel Library. Both print the elapsed time and the number of elements generated. What I'm seeing is that the function using Parallel.For generates fewer items than the sequential loop, and this is a huge problem for the real app (it's messing with some final results). So my question is whether anyone has an idea why this could be happening and, if so, whether there's any way to fix it.
Here is the code for the function that uses the parallel.for in my test app:
static bool[] values = new bool[52];
static List<int[]> combinations = new List<int[]>();
static void ParallelLoop()
{
combinations.Clear();
Parallel.For(0, 48, i =>
{
if (values[i])
{
for (int j = i + 1; j < 49; j++)
if (values[j])
{
for (int k = j + 1; k < 50; k++)
{
if (values[k])
{
for (int l = k + 1; l < 51; l++)
{
if (values[l])
{
for (int m = l + 1; m < 52; m++)
{
if (values[m])
{
int[] combination = { i, j, k, l, m };
combinations.Add(combination);
}
}
}
}
}
}
}
}
}); // Parallel.For
}
And here is the app output:
Executing sequential loop...
Number of elements generated: 1,712,304
Executing parallel loop...
Number of elements generated: 1,464,871
Thanks in advance and if you need some clarifications I'll do my best to explain in further detail.
You can't just add items to your list from multiple threads at the same time without any synchronization mechanism. List<T>.Add() actually does some non-trivial internal work (resizing its buffer, etc.), so adding an item is not an atomic, thread-safe operation.
Either:
Provide a way to synchronize your writes
Use a collection that supports concurrent writes (see System.Collections.Concurrent namespace)
Don't use multi-threading at all
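As a sketch of the first option, the Parallel.For overload with thread-local state lets each worker fill its own private list and take a lock only once, when merging. To keep the example short it generates pairs instead of the five-deep combinations from the question. CombinationDemo and ParallelPairs are made-up names.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

static class CombinationDemo
{
    public static List<int[]> ParallelPairs(bool[] values)
    {
        var combinations = new List<int[]>();
        var gate = new object();

        Parallel.For(0, values.Length - 1,
            // localInit: one private list per worker, so Add needs no lock.
            () => new List<int[]>(),
            (i, state, local) =>
            {
                if (values[i])
                {
                    for (int j = i + 1; j < values.Length; j++)
                    {
                        if (values[j]) local.Add(new[] { i, j });
                    }
                }
                return local;
            },
            // localFinally: merge each private list into the shared one under a lock.
            local =>
            {
                lock (gate)
                {
                    combinations.AddRange(local);
                }
            });

        return combinations;
    }
}
```

The order of the merged results is not deterministic, so sort them afterwards if the order matters. A ConcurrentBag<int[]> from System.Collections.Concurrent would be the drop-in alternative for the second option.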
