Keeping statistics about sorting algorithms - C#

I have a homework assignment about object-oriented programming in C#. As part of it, I need to write two different sorting algorithms, feed random numbers into them, and gather statistics about the two algorithms.
About that, my teacher told me in an e-mail: "Non static sorting class can keep statistic about sorting how many numbers, how fast, min, max, average.."
Below are my sorting algorithms, insertion sort and counting sort. Please tell me how I can keep statistics about the sorting.
Don't forget, the main subject of my homework is OOP.
class InsertionSorting : Sort
{
    public override List<int> Sorting(List<int> SortList)
    {
        for (int i = 0; i < SortList.Count - 1; i++)
        {
            for (int j = i + 1; j > 0; j--)
            {
                if (SortList[j - 1] > SortList[j])
                {
                    int temp = SortList[j - 1];
                    SortList[j - 1] = SortList[j];
                    SortList[j] = temp;
                }
            }
        }
        return SortList;
    }
}
class CountSorting : Sort
{
    public override List<int> Sorting(List<int> SortList)
    {
        int n = SortList.Count;
        List<int> output = new List<int>();
        List<int> count = new List<int>();
        for (int i = 0; i < 1000; ++i)
        {
            count.Add(0);
            output.Add(0);
        }
        for (int i = 0; i < n; ++i)
            ++count[SortList[i]];
        for (int i = 1; i <= 999; ++i)
            count[i] += count[i - 1];
        for (int i = 0; i < n; ++i)
        {
            output[count[SortList[i]] - 1] = SortList[i];
            --count[SortList[i]];
        }
        for (int i = 0; i < SortList.Count; i++)
            SortList[i] = output[i];
        return SortList;
    }
}

Your sorting is being done in two classes - InsertionSorting & CountSorting.
If you want to keep track of the statistics, declare a variable in the class and increment it on every iteration, etc. Then you can see which one is more effective.
E.g.
class InsertionSorting : Sort
{
    private int iterations = 0;
    ...
    for (int j = i + 1; j > 0; j--)
    {
        if (SortList[j - 1] > SortList[j])
        {
            iterations++;
            ...
You could also declare a startTime and endTime, allowing you to determine the time the sort took. At the start of "Sorting" record the start time, and just before you return record the end time. Write a method to report the difference.
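For illustration only, here is one way the counting and timing could be wired into the existing class. This is a sketch, not the required solution: it assumes the usual System.Diagnostics and System.Linq usings, and the property names are just suggestions.
class InsertionSorting : Sort
{
    // statistics gathered during the last call to Sorting
    public int Swaps { get; private set; }
    public long ElapsedMilliseconds { get; private set; }
    public int Count { get; private set; }
    public int Min { get; private set; }
    public int Max { get; private set; }
    public double Average { get; private set; }

    public override List<int> Sorting(List<int> SortList)
    {
        var stopwatch = Stopwatch.StartNew();       // record the start time
        Swaps = 0;
        for (int i = 0; i < SortList.Count - 1; i++)
        {
            for (int j = i + 1; j > 0; j--)
            {
                if (SortList[j - 1] > SortList[j])
                {
                    int temp = SortList[j - 1];
                    SortList[j - 1] = SortList[j];
                    SortList[j] = temp;
                    Swaps++;                        // count every swap
                }
            }
        }
        stopwatch.Stop();                           // record the end time
        ElapsedMilliseconds = stopwatch.ElapsedMilliseconds;
        Count = SortList.Count;
        if (Count > 0)
        {
            Min = SortList[0];                      // the list is sorted, so the first element is the minimum
            Max = SortList[Count - 1];
            Average = SortList.Average();
        }
        return SortList;
    }
}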

Your prof has already told you what to track when they said "...statistics about sorting how many numbers, how fast, min, max, average..". Your best bet here is to create a class such as "Statistics" which contains a method that takes user input, either through args or a direct user prompt. The inputs can be as simple as "count of numbers to sort", "lower bound of the number range", "upper bound of the number range" and, if automating the testing process, "number of times to iterate".
Given answers to these questions, you should run the two sorting algos with them (e.g. use a random number generator with the max and min to generate a list). Your sorting algos need an addition to "log" these statistics - most likely a variable that tracks the number of position swaps that occurred in the array.
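To make that structure concrete, a bare-bones outline might look like the following; the names are placeholders and the bodies are deliberately left empty.
class Statistics
{
    public int CountOfNumbers { get; set; }   // how many numbers to sort
    public int LowerBound { get; set; }       // lower bound of the number range
    public int UpperBound { get; set; }       // upper bound of the number range
    public int Iterations { get; set; }       // how many test runs to perform

    // read the settings above from args or a user prompt
    public void ReadInput(string[] args) { /* ... */ }

    // generate a random list, run it through a Sort implementation,
    // and record swaps, elapsed time, min, max and average
    public void RunTrial(Sort sorter) { /* ... */ }

    // print the collected statistics for comparison
    public void Report() { /* ... */ }
}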
I'm not about to write out your homework for you (that's your job, and you should get good at it), but if you have more questions, or this is too vague and you are still struggling, I may be able to steer you in the right direction.


What should I use to optimize my C# code?

I was doing a Codewars kata and it works, but I'm timing out.
I searched online for solutions, for some kind of reference, but they were all for JavaScript.
Here is the kata: https://i.stack.imgur.com/yGLmw.png
Here is my code:
public static int DblLinear(int n)
{
    if (n > 0)
    {
        var list = new List<int>();
        int[] next_two = new int[2];
        list.Add(1);
        for (int i = 0; i < n; i++)
        {
            for (int m = 0; m < next_two.Length; m++)
            {
                next_two[m] = ((m + 2) * list[i]) + 1;
            }
            if (list.Contains(next_two[0]))
            {
                list.Add(next_two[1]);
            }
            else if (list.Contains(next_two[1]))
            {
                list.Add(next_two[0]);
            }
            else
                list.AddRange(next_two);
            list.Sort();
        }
        return list[n];
    }
    return 1;
}
It's a really slow solution, but it's what seems to work for me.
The first rule of performance optimization is to measure. Ideally use a profiler that can tell you where most of the time is spent, but for simple cases some stopwatches can be sufficient.
I would guess that most of the time is spent in list.Contains, since this is a linear lookup and it sits in the innermost loop. So one approach would be to change the list to a HashSet<int> to provide better lookup performance, skip the .Sort() call, and return the maximum value in the hash set. As far as I can tell that should give the same result.
You might also consider using some specialized data structure that fits the problem better than the general containers provided in .NET.
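As a rough sketch of the lookup point (a conservative variant, not a full rework of the algorithm): keep the list for ordering and indexing, but mirror it in a HashSet<int> so the membership checks are O(1). This assumes the usual System.Collections.Generic using.
public static int DblLinear(int n)
{
    if (n <= 0) return 1;

    var list = new List<int> { 1 };       // keeps the ordering, so list[n] still works
    var seen = new HashSet<int> { 1 };    // O(1) membership checks instead of list.Contains

    for (int i = 0; i < n; i++)
    {
        int a = 2 * list[i] + 1;
        int b = 3 * list[i] + 1;
        if (seen.Add(a)) list.Add(a);     // Add returns false when the value is already present
        if (seen.Add(b)) list.Add(b);
        list.Sort();                      // still O(m log m); a sorted container would avoid this too
    }
    return list[n];
}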

What sorting method is being applied here, and what is the algorithmic complexity of the method?

I came across the code below for sorting an array.
I applied it to a very long array and it was able to sort it in under a second, maybe 20 milliseconds or less.
I have been reading about algorithmic complexity and Big O notation and would like to know:
Which sorting method (of the existing ones) is implemented in this code?
What is the complexity of the algorithm used here?
If you were to improve the algorithm/code below, what would you alter?
using System;
using System.Text;

// This program sorts an array
public class SortArray
{
    static void Main(String[] args)
    {
        // declaring and initializing the array
        //int[] arr = new int[] {3,1,4,5,7,2,6,1, 9,11, 7, 2,5,8,4};
int[] arr = new int[] {489,491,493,495,497,529,531,533,535,369,507,509,511,513,515,203,205,207,209,211,213,107,109,111,113,115,117,11913,415,417,419,421,423,425,427,15,17,19,21,4,517,519,521,523,525,527,4,39,441,443,445,447,449,451,453,455,457,459,461,537,539,541,543,545,547,1,3,5,7,9,11,13,463,465,467,23,399,401,403,405,407,409,411,499,501,503,505,333,335,337,339,341,343,345,347,65,67,69,71,73,75,77,79,81,83,85,87,89,91,93,95,9,171,173,175,177,179,181,183,185,187,269,271,273,275,277,279,281,283,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55,57,59,61,63,133,135,137,139,141,143,145,285,287,289,291,121,123,125,127,129,131,297,299,373,375,377,379,381,383,385,387,389,97,99,101,103,105,147,149,151,153,155,157,159,161,163,165,167,16,391,393,395,397,399,401,403,189,191,193,195,197,199,201,247,249,251,253,255,257,259,261,263,265,267,343,345,347,349,501,503,505,333,335,337,339,341,417,419,421,423,425,561,563,565,567,569,571,573,587,589,591,593,595,597,599,427,429,431,433,301,303,305,307,309,311,313,315,317,319,321,323,325,327,329,331,371,359,361,363,365,367,369,507,509,511,513,515,351,353,355,57,517,519,521,523,525,527,413,415,405,407,409,411,499,435,437,469,471,473,475,477,479,481,483,485,487,545,547,549,551,553,555,575,577,579,581,583,585,557,559,489,491,493,495,497,529,531,533,535,537,539,541,543,215,217,219,221,223,225,227,229,231,233,235,237,239,241,243,245,293,295};
        int temp;
        // traverse 0 to array length
        for (int i = 0; i < arr.Length; i++)
        {
            // traverse i+1 to array length
            //for (int j = i + 1; j < arr.Length; j++)
            for (int j = i + 1; j < arr.Length; j++)
            {
                // compare array element with
                // all next elements
                if (arr[i] > arr[j])
                {
                    ///Console.WriteLine(i+"i before"+arr[i]);
                    temp = arr[i];
                    arr[i] = arr[j];
                    arr[j] = temp;
                    //Console.WriteLine("i After"+arr[i]);
                }
            }
        }
        // print all elements of the array
        foreach (int value in arr)
        {
            Console.Write(value + " ");
        }
    }
}
This is selection sort. Its time complexity is O(𝑛²). It has nested loops over i and j, and you can see these produce every possible pair of two indices in the range {0,...,𝑛-1}, where 𝑛 is arr.Length. The number of pairs is a triangular number, and is equal to:
𝑛(𝑛-1)/2
...which is O(𝑛²)
If we stick to selection sort, we can still find some improvements.
We can see that the role of the outer loop is to store in arr[i] the value that belongs there in the final sorted array, and never touch that entry again. It does so by searching the minimum value in the right part of the array that starts at this index 𝑖.
Now during that search, which takes place in the inner loop, it keeps swapping lesser values into arr[i]. This may happen a few times, as it might find even lesser values as j walks to the right. That is a waste of operations, as we would prefer to only perform one swap. And this is possible: instead of swapping immediately, delay this operation. Instead keep track of where the minimum value is located (initially at i, but this may become some j index). Only when the inner loop completes, perform the swap.
There is a less important improvement: i does not have to reach arr.Length - 1, as then there are no iterations of the inner loop left. So the ending condition for the outer loop can exclude that last iteration from happening.
Here is how that looks:
for (int i = 0, last = arr.Length - 1; i < last; i++)
{
    int k = i; // index that has the least value in the range arr[i..n-1] so far
    for (int j = i + 1; j < arr.Length; j++)
    {
        if (arr[k] > arr[j])
        {
            k = j; // don't swap yet -- just track where the minimum is located
        }
    }
    if (k > i) // now perform the swap
    {
        int temp = arr[i];
        arr[i] = arr[k];
        arr[k] = temp;
    }
}
A further improvement can be to use the inner loop to not only locate the minimum value, but also the maximum value, and to move the found maximum to the right end of the array. This way both ends of the array get sorted gradually, and the inner loop will shorten twice as fast. Still, the number of comparisons remains the same, and the average number of swaps as well. So the gain is only in the iteration overhead.
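A rough sketch of that double-ended variant (not part of the answer above; note the corner case where the maximum starts out in the left slot and is displaced by the first swap):
static void DoubleEndedSelectionSort(int[] arr)
{
    int left = 0, right = arr.Length - 1;
    while (left < right)
    {
        int minIdx = left, maxIdx = left;
        for (int j = left + 1; j <= right; j++)
        {
            if (arr[j] < arr[minIdx]) minIdx = j;
            if (arr[j] > arr[maxIdx]) maxIdx = j;
        }
        // place the minimum at the left end
        int temp = arr[left];
        arr[left] = arr[minIdx];
        arr[minIdx] = temp;
        // if the maximum was sitting at 'left', the swap above moved it to minIdx
        if (maxIdx == left) maxIdx = minIdx;
        // place the maximum at the right end
        temp = arr[right];
        arr[right] = arr[maxIdx];
        arr[maxIdx] = temp;
        left++;
        right--;
    }
}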
This is "Bubble sort" with O(n^2). You can use "Mergesort" or "Quicksort" to improve your algorithm to O(n*log(n)). If you always know the minimum and maximum of your numbers, you can use "Digit sort" or "Radix Sort" with O(n)
See: Sorting alrogithms in c#
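If the goal is simply O(n log n) behaviour rather than a hand-written algorithm, the built-in sort is the simplest route (a minimal sketch with sample data):
// Array.Sort on an int[] uses an introspective sort (a quicksort/heapsort/insertion-sort
// hybrid), giving O(n log n) behaviour without writing the algorithm by hand.
int[] data = { 489, 491, 3, 1, 7, 2 };
Array.Sort(data);
Console.WriteLine(string.Join(" ", data));   // 1 2 3 7 489 491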

arrayMaxConsecutiveSum in C#

As part of learning C# I take on CodeSignal challenges. So far everything has gone well for me, except for the test stated in the title.
The problem is that my code is not efficient enough to run under 3 seconds when the length of the array is 10^5 and the number of consecutive elements (k) is 1000. My code is as follows:
int arrayMaxConsecutiveSum(int[] inputArray, int k) {
    int sum = 0;
    int max = 0;
    for (int i = 0; i <= inputArray.Length - k; i++)
    {
        sum = inputArray.Skip(i).Take(k).Sum();
        if (sum > max)
            max = sum;
    }
    return max;
}
All visible tests on the website run OK, but when it comes to the hidden tests, test 20 fails with an error stating that:
19/20 tests passed. Execution time limit exceeded on test 20: Program exceeded the execution time limit. Make sure that it completes execution in a few seconds for any possible input.
I also tried unlocking solutions; the C# one is somewhat similar to this, but it didn't use LINQ. I also tried running it against the hidden tests, but the same error occurred, which is weird, as it was accepted as a solution even though it didn't pass all the tests.
Is there any faster way to get the sum of an array?
I also thought of unlocking the hidden tests, but I don't think that would give me a specific solution, as the problem would still persist.
It would seem that you are doing the addition of k numbers on every loop iteration. This pseudocode should be more efficient:
Take the sum of the first k elements and set this to be the max.
Loop as you did before, but each time subtract from the existing sum the element at i-1 and add the element at i+k-1.
Check for max as before and repeat.
The difference here is the number of additions in each loop iteration. In the original code you add k elements on every iteration; in this code, within each iteration you subtract a single element and add a single element to an existing sum, so that is 2 operations versus k operations. Your code starts to slow down as k gets large for large arrays.
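A direct translation of those steps might look like this (a sketch: the window is seeded with the first k elements and then slid one position at a time):
public static int arrayMaxConsecutiveSum(int[] inputArray, int k)
{
    int sum = 0;
    for (int i = 0; i < k; i++)          // sum of the first k elements
        sum += inputArray[i];

    int max = sum;
    for (int i = 1; i <= inputArray.Length - k; i++)
    {
        sum = sum - inputArray[i - 1] + inputArray[i + k - 1];   // slide the window by one
        if (sum > max)
            max = sum;
    }
    return max;
}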
For this specific case, I would suggest that you not use the Skip method, as it iterates over the collection every time. You can check the Skip implementation here. Copying the code for reference:
public static IEnumerable<TSource> Skip<TSource>(this IEnumerable<TSource> source, int count) {
    if (source == null) throw Error.ArgumentNull("source");
    return SkipIterator<TSource>(source, count);
}

static IEnumerable<TSource> SkipIterator<TSource>(IEnumerable<TSource> source, int count) {
    using (IEnumerator<TSource> e = source.GetEnumerator()) {
        while (count > 0 && e.MoveNext()) count--;
        if (count <= 0) {
            while (e.MoveNext()) yield return e.Current;
        }
    }
}
As you can see, Skip iterates over the collection every time, so if you have a huge collection and k is a high number, the execution time becomes sluggish.
Instead of using Skip, you can write a simple for loop which iterates over the required items:
public static int arrayMaxConsecutiveSum(int[] inputArray, int k)
{
    int sum = 0;
    int max = 0;
    for (int i = 0; i <= inputArray.Length - k; i++)
    {
        sum = 0;
        for (int j = i; j < k + i; j++)
        {
            sum += inputArray[j];
        }
        if (sum > max)
            max = sum;
    }
    return max;
}
You can check this dotnet fiddle -- https://dotnetfiddle.net/RrUmZX -- where you can compare the time difference. For thorough benchmarking, I would suggest looking into Benchmark.Net.
You need to be careful when using LINQ and thinking about performance. Not that it's slow, but it can easily hide a big operation behind a single word. In this line:
sum = inputArray.Skip(i).Take(k).Sum();
Skip(i) and Take(k) will each take approximately as long as a for loop stepping through thousands of rows, and that line is run for every one of the thousands of items in the main loop.
There's no magic command that is faster; instead you have to rethink your approach so that you do the minimum number of steps inside the loop. In this case you can remember the sum from each step and just add or remove individual values, rather than recalculating the whole thing every time.
public static int arrayMaxConsecutiveSum(int[] inputArray, int k)
{
    int sum = 0;
    int max = 0;
    for (int i = 0; i < inputArray.Length; i++)
    {
        // Add the next item
        sum += inputArray[i];
        // Limit the sum to k items
        if (i >= k) sum -= inputArray[i - k];
        // Is this the highest sum of a full k-item window so far?
        if (i >= k - 1 && sum > max)
            max = sum;
    }
    return max;
}
This is my solution.
public int ArrayMaxConsecutiveSum(int[] inputArray, int k)
{
    int max = inputArray.Take(k).Sum();
    int sum = max;
    for (int i = 1; i <= inputArray.Length - k; i++)
    {
        sum = sum - inputArray[i - 1] + inputArray[i + k - 1];
        if (sum > max)
            max = sum;
    }
    return max;
}
Yes, you should never run Take & Skip repeatedly on large lists, but here's a purely LINQ-based solution that is both easy to understand and performs the task in sufficient time. Iterative code will still outperform it, so you have to weigh the trade-off for your use case: raw benchmark numbers on large inputs versus code that is easy to understand.
int arrayMaxConsecutiveSum(int[] inputArray, int k)
{
    var sum = inputArray.Take(k).Sum();
    return Math.Max(sum, Enumerable.Range(k, inputArray.Length - k)
        .Max(i => sum += inputArray[i] - inputArray[i - k]));
}

I can't seem to generate pseudorandom numbers based on coordinates

I am trying to create an infinite map of 1s and 0s, procedurally generated by a PRNG, but only storing the 7x7 grid which is viewable. E.g. on the screen you will see a 7x7 grid which shows part of the map of 1s and 0s, and when you push right it shifts the 1s and 0s across and puts the next row in place.
If you then shift left, since it is pseudo-randomly generated, it regenerates and shows the original. You will be able to shift either way, and up and down, infinitely.
The problem I have is that the initial map does not look very random, and the pattern repeats itself as you scroll right or left. I think it is something to do with how I generate the numbers: e.g. you could start viewing at x=5000 and y=10000, and it therefore needs to generate a unique seed from these two values.
(Complexity is the variable for the size of the viewport.)
void CreateMap() {
    if (complexity % 2 == 0) {
        complexity--;
    }
    if (randomseed) {
        levelseed = Time.time.ToString();
    }
    psuedoRandom = new System.Random(levelseed.GetHashCode());
    offsetx = psuedoRandom.Next();
    offsety = psuedoRandom.Next();
    levellocation = psuedoRandom.Next();
    level = new int[complexity, complexity];
    middle = complexity / 2;
    for (int x = 0; x < complexity; x++) {
        for (int y = 0; y < complexity; y++) {
            int hashme = levellocation + offsetx + x * offsety + y;
            psuedoRandom = new System.Random(hashme.GetHashCode());
            level[x, y] = psuedoRandom.Next(0, 2);
        }
    }
}
This is what I use for left and right movement,
void MoveRight() {
    offsetx++;
    for (int x = 0; x < complexity - 1; x++) {
        for (int y = 0; y < complexity; y++) {
            level[x, y] = level[x + 1, y];
        }
    }
    for (int y = 0; y < complexity; y++) {
        psuedoRandom = new System.Random(levellocation + offsetx * y);
        level[complexity - 1, y] = psuedoRandom.Next(0, 2);
    }
}

void MoveLeft() {
    offsetx--;
    for (int x = complexity - 1; x > 0; x--) {
        for (int y = 0; y < complexity; y++) {
            level[x, y] = level[x - 1, y];
        }
    }
    for (int y = 0; y < complexity; y++) {
        psuedoRandom = new System.Random(levellocation + offsetx * y);
        level[0, y] = psuedoRandom.Next(0, 2);
    }
}
To distill it down, I need to be able to set
Level[x,y] = Returnrandom(offsetx, offsety)
int RandomReturn(int x, int y) {
    psuedoRandom = new System.Random(Whatseedshouldiuse);
    return psuedoRandom.Next(0, 2);
}
Your problem, I think, is that you are creating a 'new' Random inside the loops. Don't re-declare it inside the loops as you are; instead declare it at class level and then simply use psuedoRandom.Next. See this SO post for an example of the issue you are experiencing.
So instead of re-instantiating a Random class at every iteration like you are doing:
for (int x = 0; x < complexity; x++) {
    for (int y = 0; y < complexity; y++) {
        int hashme = levellocation + offsetx + x * offsety + y;
        psuedoRandom = new System.Random(hashme.GetHashCode());
        level[x, y] = psuedoRandom.Next(0, 2);
    }
}
do something more like
for (int x = 0; x < complexity; x++) {
    for (int y = 0; y < complexity; y++) {
        // Give me the next random integer
        moreRandom = psuedoRandom.Next();
    }
}
EDIT: As Enigmativity has pointed out in a comment below this post, re-instantiating at every iteration is also a waste of time / resources.
P.S. If you really need to do it, then why not use the time-dependent default seed value instead of specifying one?
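For completeness, "declare it at class level" might look roughly like the following (the class and field names here are only illustrative):
class MapGenerator
{
    // one Random instance for the whole object, created once
    private readonly System.Random psuedoRandom = new System.Random();

    void FillLevel(int[,] level, int complexity)
    {
        for (int x = 0; x < complexity; x++)
            for (int y = 0; y < complexity; y++)
                level[x, y] = psuedoRandom.Next(0, 2);   // reuse the same instance
    }
}
Note that this gives better-looking randomness but is not reproducible per coordinate; for that, see the coordinate-hashing answers below.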
I'd use SipHash to hash the coordinates:
It's relatively simple to implement
It takes both a key (seed for your map) and a message (the coordinates) as inputs
You can increase or decrease the number of rounds as a quality/performance trade-off.
Unlike your code, the results will be reproducible across processes and machines.
The SipHash website lists two C# implementations:
https://github.com/tanglebones/ch-siphash
https://github.com/BrandonHaynes/siphash-csharp
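As a minimal sketch of the idea, here is a keyed coordinate hash using a simple integer mix rather than real SipHash, purely for illustration (the constants are just common mixing constants, not anything specific to SipHash):
static int RandomReturn(int x, int y, int seed)
{
    unchecked
    {
        uint h = (uint)seed;
        h ^= (uint)x * 0x9E3779B1;      // mix in x
        h = (h << 13) | (h >> 19);      // rotate
        h ^= (uint)y * 0x85EBCA6B;      // mix in y
        h *= 0xC2B2AE35;                // avalanche multiply
        h ^= h >> 16;
        return (int)(h & 1);            // map to 0 or 1
    }
}
The result is deterministic for a given (x, y, seed), so scrolling back over a coordinate regenerates the same value.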
I've played a bit to create rnd(x, y, seed):
class Program
{
    const int Min = -1000; // define the starting point

    static void Main(string[] args)
    {
        for (int x = 0; x < 10; x++)
        {
            for (int y = 0; y < 10; y++)
                Console.Write((RND(x, y, 123) % 100).ToString().PadLeft(4));
            Console.WriteLine();
        }
        Console.ReadKey();
    }

    static int RND(int x, int y, int seed)
    {
        var rndX = new Random(seed);
        var rndY = new Random(seed + 123); // a bit different
        // because Random is an LCG we can only move forward
        for (int i = Min; i < x - 1; i++)
            rndX.Next();
        for (int i = Min; i < y - 1; i++)
            rndY.Next();
        // return some f(x, y, seed) = function(rnd(x, seed), rnd(y, seed))
        return rndX.Next() + rndY.Next() << 1;
    }
}
It works, but it is of course an awful implementation (because Random is an LCG); it should give the general idea though. It's memory efficient (no arrays). An acceptable compromise is to implement value caching: while you are located in a certain part of the map, the surrounding values are taken from the cache instead of being regenerated.
For the same x, y and seed you will always get the same value (which in turn can be used as a cell seed).
TODO: find a way to make Rnd(x) and Rnd(y) without an LCG and its highly inefficient initialization (the loop).
I solved it with this, after all of the approaches you suggested.
void Drawmap() {
    for (int x = 0; x < complexity; x++) {
        for (int y = 0; y < complexity; y++) {
            psuedoRandom = new System.Random((x + offsetx) * (y + offsety));
            level[x, y] = psuedoRandom.Next(0, 2);
        }
    }
}

Binary search slower, what am I doing wrong?

EDIT: so it looks like this is normal behavior, so can anyone just recommend a faster way to do these numerous intersections?
So my problem is this: I have 8000 lists (strings in each list). For each list (ranging in size from 50 to 400), I'm comparing it to every other list and performing a calculation based on the intersection count. So I'll do
list1(intersect)list1= number
list1(intersect)list2= number
list1(intersect)list888= number
And I do this for every list. Previously, I had HashList and my code was essentially this (well, I was actually searching through properties of an object, so I had to modify the code a bit, but it's basically this):
I have my two versions below, but if anyone knows anything faster, please let me know!
Loop through AllLists, getting each list, starting with list1, and then do this:
foreach (List list in AllLists)
{
    if (list1_length < list_length) // just a check so I'm looping through the
                                    // smaller list
    {
        foreach (string word in list1)
        {
            if (block.generator_list.Contains(word))
            {
                // simple integer count
            }
        }
    }
    // a little more code, but the same, but looping through the other list if it's smaller/bigger
Then I made the lists into regular lists and applied Sort(), which changed my code to:
foreach (List list in AllLists)
{
    if (list1_length < list_length) // just a check so I'm looping through the
                                    // smaller list
    {
        for (int i = 0; i < list1_length; i++)
        {
            var test = list.BinarySearch(list1[i]);
            if (test > -1)
            {
                // simple integer count
            }
        }
    }
The first version takes about 6 seconds, the other one takes more than 20 (I just stopped there because otherwise it would take more than a minute!), and this is for a smallish subset of the data.
I'm sure there's a drastic mistake somewhere, but I can't find it.
Well, I have tried three distinct methods of achieving this (assuming I understood the problem correctly). Please note I have used HashSet<int> in order to more easily generate random input.
Setting up:
List<HashSet<int>> allSets = new List<HashSet<int>>();
Random rand = new Random();
for (int i = 0; i < 8000; ++i) {
    HashSet<int> ints = new HashSet<int>();
    for (int j = 0; j < rand.Next(50, 400); ++j) {
        ints.Add(rand.Next(0, 1000));
    }
    allSets.Add(ints);
}
the three methods I checked (the code shown is what runs in the inner loop):
the loop:
note that you are getting duplicated results in your code (intersecting set A with set B and later intersecting set B with set A).
It won't affect your performance thanks to the list-length check you are doing, but iterating this way is clearer:
for (int i = 0; i < allSets.Count; ++i) {
    for (int j = i + 1; j < allSets.Count; ++j) {
    }
}
first method:
used IEnumerable.Intersect() to get the intersection with the other list and checked IEnumerable.Count() to get the size of the intersection.
var intersect = allSets[i].Intersect(allSets[j]);
count = intersect.Count();
this was the slowest one averaging 177s
second method:
cloned the smaller of the two sets I was intersecting, then used ISet.IntersectWith() and checked the resulting set's Count.
HashSet<int> intersect;
HashSet<int> intersectWith;
if (allSets[i].Count < allSets[j].Count) {
    intersect = new HashSet<int>(allSets[i]);
    intersectWith = allSets[j];
} else {
    intersect = new HashSet<int>(allSets[j]);
    intersectWith = allSets[i];
}
intersect.IntersectWith(intersectWith);
count = intersect.Count;
this one was slightly faster, averaging 154s
third method:
did something very similar to what you did: iterated over the shorter set and checked ISet.Contains on the longer set.
for (int i = 0; i < allSets.Count; ++i) {
    for (int j = i + 1; j < allSets.Count; ++j) {
        int count = 0;
        HashSet<int> loopingSet;
        HashSet<int> containsSet;
        if (allSets[i].Count < allSets[j].Count) {
            loopingSet = allSets[i];
            containsSet = allSets[j];
        } else {
            loopingSet = allSets[j];
            containsSet = allSets[i];
        }
        foreach (int k in loopingSet) {
            if (containsSet.Contains(k)) {
                ++count;
            }
        }
    }
}
this method was by far the fastest (as expected), averaging 66s
conclusion
the method you're using is the fastest of these three. I certainly can't think of a faster single threaded way to do this. Perhaps there is a better concurrent solution.
I've found that one of the most important considerations in iterating/searching any kind of collection is to choose the collection type very carefully. Iterating through a normal collection will not be optimal for your purposes. Try using something like:
System.Collections.Generic.HashSet<T>
Using the Contains() method while iterating over the shorter of the two lists (as you mentioned you're already doing) should give close to O(1) performance per lookup, the same as key lookups in the generic Dictionary type.
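Applied to the string lists in the question, that might look roughly like this (a sketch; list1 and list2 stand in for any two of the 8000 lists):
// Build the sets once, then count the intersection with O(1) Contains lookups.
var set1 = new HashSet<string>(list1);
var set2 = new HashSet<string>(list2);

// iterate over the smaller set and probe the larger one
var smaller = set1.Count <= set2.Count ? set1 : set2;
var larger = ReferenceEquals(smaller, set1) ? set2 : set1;

int intersectionCount = 0;
foreach (string word in smaller)
{
    if (larger.Contains(word))
        intersectionCount++;
}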
