How can I sort an array of strings?

How can I sort an array of strings? - c#

I have a list of input words separated by comma. I want to sort these words by alphabetical and length. How can I do this without using the built-in sorting functions?

Good question!! Sorting is probably the most important concept to learn as an up-and-coming computer scientist.
There are actually lots of different algorithms for sorting a list.
When you break all of those algorithms down, the most fundamental operation is the comparison of two items in the list, defining their "natural order".
For example, in order to sort a list of integers, I'd need a function that tells me, given any two integers X and Y whether X is less than, equal to, or greater than Y.
For your strings, you'll need the same thing: a function that tells you which of the strings has the "lesser" or "greater" value, or whether they're equal.
Traditionally, these "comparator" functions look something like this:
int CompareStrings(String a, String b) {
if (a < b)
return -1;
else if (a > b)
return 1;
else
return 0;
}
I've left out some of the details (like, how do you compute whether a is less than or greater than b? clue: iterate through the characters), but that's the basic skeleton of any comparison function. It returns a value less than zero if the first element is smaller and a value greater than zero if the first element is greater, returning zero if the elements have equal value.
But what does that have to do with sorting?
A sort routing will call that function for pairs of elements in your list, using the result of the function to figure out how to rearrange the items into a sorted list. The comparison function defines the "natural order", and the "sorting algorithm" defines the logic for calling and responding to the results of the comparison function.
Each algorithm is like a big-picture strategy for guaranteeing that ANY input will be correctly sorted. Here are a few of the algorithms that you'll probably want to know about:
Bubble Sort:
Iterate through the list, calling the comparison function for all adjacent pairs of elements. Whenever you get a result greater than zero (meaning that the first element is larger than the second one), swap the two values. Then move on to the next pair. When you get to the end of the list, if you didn't have to swap ANY pairs, then congratulations, the list is sorted! If you DID have to perform any swaps, go back to the beginning and start over. Repeat this process until there are no more swaps.
NOTE: this is usually not a very efficient way to sort a list, because in the worst cases, it might require you to scan the whole list as many as N times, for a list with N elements.
Merge Sort:
This is one of the most popular divide-and-conquer algorithms for sorting a list. The basic idea is that, if you have two already-sorted lists, it's easy to merge them. Just start from the beginning of each list and remove the first element of whichever list has the smallest starting value. Repeat this process until you've consumed all the items from both lists, and then you're done!
1 4 8 10
2 5 7 9
------------ becomes ------------>
1 2 4 5 7 8 9 10
But what if you don't have two sorted lists? What if you have just one list, and its elements are in random order?
That's the clever thing about merge sort. You can break any single list into smaller pieces, each of which is either an unsorted list, a sorted list, or a single element (which, if you thing about it, is actually a sorted list, with length = 1).
So the first step in a merge sort algorithm is to divide your overall list into smaller and smaller sub lists, At the tiniest levels (where each list only has one or two elements), they're very easy to sort. And once sorted, it's easy to merge any two adjacent sorted lists into a larger sorted list containing all the elements of the two sub lists.
NOTE: This algorithm is much better than the bubble sort method, described above, in terms of its worst-case-scenario efficiency. I won't go into a detailed explanation (which involves some fairly trivial math, but would take some time to explain), but the quick reason for the increased efficiency is that this algorithm breaks its problem into ideal-sized chunks and then merges the results of those chunks. The bubble sort algorithm tackles the whole thing at once, so it doesn't get the benefit of "divide-and-conquer".
Those are just two algorithms for sorting a list, but there are a lot of other interesting techniques, each with its own advantages and disadvantages: Quick Sort, Radix Sort, Selection Sort, Heap Sort, Shell Sort, and Bucket Sort.
The internet is overflowing with interesting information about sorting. Here's a good place to start:
http://en.wikipedia.org/wiki/Sorting_algorithms

Create a console application and paste this into the Program.cs as the body of the class.
public static void Main(string[] args)
{
string [] strList = "a,b,c,d,e,f,a,a,b".Split(new [] { ',' }, StringSplitOptions.RemoveEmptyEntries);
foreach(string s in strList.Sort())
Console.WriteLine(s);
}
public static string [] Sort(this string [] strList)
{
return strList.OrderBy(i => i).ToArray();
}
Notice that I do use a built in method, OrderBy. As other answers point out there are many different sort algorithms you could implement there and I think my code snippet does everything for you except the actual sort algorithm.
Some C# specific sorting tutorials

There is an entire area of study built around sorting algorithms. You may want to choose a simple one and implement it.
Though it won't be the most performant, it shouldn't take you too long to implement a bubble sort.

If you don't want to use build-in-functions, you have to create one by your self. I would recommend Bubble sort or some similar algorithm. Bubble sort is not an effective algoritm, but it get the works done, and is easy to understand.
You will find much good reading on wikipedia.

I would recommend doing a wiki for quicksort.
Still not sure why you don't want to use the built in sort?

Bubble sort damages the brain.
Insertion sort is at least as simple to understand and code, and is actually useful in practice (for very small data sets, and nearly-sorted data). It works like this:
Suppose that the first n items are already in order (you can start with n = 1, since obviously one thing on its own is "in the correct order").
Take the (n+1)th item in your array. Call this the "pivot". Starting with the nth item and working down:
- if it is bigger than the pivot, move it one space to the right (to create a "gap" to the left of it).
- otherwise, leave it in place, put the "pivot" one space to the right of it (that is, in the "gap" if you moved anything, or where it started if you moved nothing), and stop.
Now the first n+1 items in the array are in order, because the pivot is to the right of everything smaller than it, and to the left of everything bigger than it. Since you started with n items in order, that's progress.
Repeat, with n increasing by 1 at each step, until you've processed the whole list.
This corresponds to one way that you might physically put a series of folders into a filing cabinet in order: put one in; then put another one into its correct position by pushing everything that belongs after it over by one space to make room; repeat until finished. Nobody ever sorts physical objects by bubble sort, so it's a mystery to me why it's considered "simple".
All that's left now is that you need to be able to work out, given two strings, whether the first is greater than the second. I'm not quite sure what you mean by "alphabetical and length" : alphabetical order is done by comparing one character at a time from each string. If there not the same, that's your order. If they are the same, look at the next one, unless you're out of characters in one of the strings, in which case that's the one that's "smaller".

Use NSort
I ran across the NSort library a couple of years ago in the book Windows Developer Power Tools. The NSort library implements a number of sorting algorithms. The main advantage to using something like NSort over writing your own sorting is that is is already tested and optimized.

Posting link to fast string sort code in C#:
http://www.codeproject.com/KB/cs/fast_string_sort.aspx
Another point:
The suggested comparator above is not recommended for non-English languages:
int CompareStrings(String a, String b) {
if (a < b) return -1;
else if (a > b)
return 1; else
return 0; }
Checkout this link for non-English language sort:
http://msdn.microsoft.com/en-us/goglobal/bb688122
And as mentioned, use nsort for really gigantic arrays that don't fit in memory.

Related

C# List - A more efficient multiple at once insertRange through shifts in lists

I have a list that I divided into a fixed number of sections (with some that might be empty).
Every section contains unordered elements, however the sections themselves are ordered backwards.
I am referencing each beginning of a section through a fixed dimension array, whose elements are the indexes at which the each section can be found in the list.
I regularly extract the whole section at the tail of the list and, when I do so, I set its index inside the array at 0 (so the section will start to regrow from the head of the list) and then I circularly increment the lastSection variable that I use to keep track of which section is at the tail of the list.
With the same frequency I also need to insert back into the list some new elements that will be spread across one or more sections.
I chose a single sectioned list (instead of a list of lists or something like that) because, even if the sections may vary a lot (from empty to a length of some thousands), the total number of elements has little variations during the application runtime AND because I also frequently need to get all the elements in the list, and didn't want to concatenate multiple lists in order to get the result.
Graphical representation of the data structure
Existential question:
Up to here did I do some mistakes in the choice of the data structure, since these described are all the operations I am doing with it?
Going forward:
The problem I am trying to address, since this is the core of the application I am building (and I want to squeeze out every slice of performance I can since it should run on smartphones), is: how can I do those multiple inserts as fast as possible?
Trivial solution:
For each new group of elements belonging to a certain section, just do an insertRange (sectionBeginning, groupOfElements).
Performance footprint:
every insertRange will force the list to shift all the content after the root of a section to the right, and with multiple insertRange this means that some data will be shifted even M times, where M is the number of insertRange done with index != list.Count.
Little smarter solution:
Knowing before every multiple-inserts step which and how many new elements per section I need to add, I can add empty elements to the back of the list, perform M shifts of determined size, then copy the new elements to the corresponding "holes" left inside the list.
I could extend the list class and implement a new insertRange (int [] index, IEnumerable [] collection) where each index points to the beginning of a section, however I am worried about some possible internal optimizations that the list class might have and that could transform my for loop shifts in worse performance, like an Array.Copy to which I do not think to have access. Is there a way to do a performant list shift in order to implement this and gain some advantages over multiple standard insertRanges?
Note: index and collections should be ordered by section.
Graphical representation of the multiple-at once insertRange approach
Another similar thread about insertRange:
Replace multiple InsertRange() into efficient way
Another similar thread about shifts in lists:
Does code exist, for shifting List elements to left or right by specified amount, in C#?

Magic square brute force algorithm

Basically for an assignment I need to create a C# program that will take a number as input (n) then create a 2D array size n*n with the numbers 1 to (n*n). It needs to use a brute force method. I have done this but at the moment the program will just randomly generate the order of the numbers each time, so it will sometimes check the same order more than once. Obviously this means it takes a really long time to check any number above 3, and even for 3 it can take a few minutes. Basically I'm wondering if there is any way for me to make it only check each order once. I am only allowed to use "basic" C# functions, so just things like *, /, +, - and nothing like .Shuffle etc.

Let me make sure I understand the question: you wish to enumerate all permutations of the numbers 1 through n squared, and check whether the permutation produces a magic square. You are now generating permutations randomly, but instead you wish to generate all permutations.
I wrote a series of articles on generating permutations; it is too long to easily summarize here.
http://ericlippert.com/2013/04/15/producing-permutations-part-one/

Choosing random order, as you found, is not a good idea.
I suggest that you put all the number 1 ... (n*n) in array , and than find all the permutation.
when you have all the permutation, it's easy to create square (1 .. n ==> the first row, n+1 ... 2n ==> the second row and so on).
Now, finding all the permutation can be done with the basic operation with recursion

Using Lookup or Dictionary to represent a hand of cards in C#

I apologize if there is a similar question already out there. There
are several questions about scoring hands but I don't need that.
The project I am working on takes in 10 cards and needs to report the
best possible 5-card hand found ("straight", "high card", "flush"
etc.). Luckily what the actual hand of cards is is irrelevant, I just
need a name.
I've already parsed and sorted all the cards out and have the tests
for all the possible hands laid out. All I need now is a convenient
way to store the hands. My mad method is as follows, in pseudocode
terms:
I want to have a dynamic list horizontally that I can populate with the NUMBER values of the cards, in order from highest to lowest. For
example, "Q J T 7 4 2 1". T is 10. Duplicates of values will be
ignored. Next, I want each of those values to have, underneath, a
list of the suits of each value that exist in the deck. For example,
J will have a sub-list with the values "D H" to represent that I have
a Jack of Diamonds and a Jack of Hearts.
I believe this to be the most elegant way to deal with these cards,
since most poker hands deal with only values and this way I don't have
to worry about cards of the same value in a row for say the straight
test. Then the two tests that do deal with suit can easily be
tested for by referring to the values under the keys.
Take a deep breath, almost there.
So an instance of Lookup appears to be perfect! It has the exact "one
key to multiple values" structure that I want. However, it doesn't
allow me to add the suits as I come to them. I have to add them all
at once or not at all since the lists are immutable after entry.
So I either
have to find all the suits at once before I even make the Lookup
Somehow add values to the Lookup lists or
Use something else.
Any ideas on any of these?
UPDATE
TL;DR SPARKNOTES VERSION: How can I add more values to the keys inside of a Lookup?
*IMOPRTANT NOTE:*The output of this program should be a string containing the name of the highest hand possible, for example "four of a kind", "two pair" or "high card."
I found one solution (which I unfortunately lost the link to and can't find again) where they suggested re-creating the entire Lookup with the new list. It may just be me but I find that solution to be very... ugly... Anyway several other solutions I have explored or tested are to:
METHOD 1
Roll through and populate another array with the suits associated with each value. Basically (in actual pseudocode this time >_>):
Create an array of ArrayLists (array 1)
Iterate through a sorted string array of "cards" (array 2)
For each card:
Take the char at string index [1] (representing the suit) and add into the ArrayList in array 1 at the index number extracted from string index [0].
This way I have the list of values with associated suits that I wanted. And the list of suits is the minimal size to boot, making iteration through that easier later. With some extra steps I can even make the umbrella array an ArrayList and populate it with the card values in order so there are no gaps and no duplicate numbers. This will leave me with a jagged array of what I want. To be clear, this is not a homework assignment. However, it IS from a coding class project my roommate completed in the past which is why I have the constrictions and requirements I have. Someone else I asked told me SE gets plagued by these kinds of homework questions around this time, so I understand your skepticism. This is a personal project because I want to learn C# (all I know is Java right now, and I like the sound of parameter pointer passing in C# methods, which Java does not do).
If it WERE for a grade I would end there because it works. But I don't really like arraylists of arraylists, they seem messy to me. So I want to know if there is another method.
METHOD 2
I also considered simply dealing with duplicates that inevitably appear. For example, here is my test for a straight:
for (int i = 0; i < 5; i++)
{
int counter = 0;
for (int j = i; j < i + 4; j++)
{
int secondCard = getValue(cardsArray[j + 1]);
int firstCard = getValue(cardsArray[j]);
if (secondCard == firstCard)
{
break;
}
if (secondCard == (firstCard + 1))
{
counter++;
if (counter == 4)
{
isStraight = true;
return "straight";
}
}
}
}
This code does not work. It needs some tweaks somewhere or other to work completely but I want to analyze if it is worth it before I try to fix it. It DOES accurately test for a straight, though. Also a couple notes: firstCard and secondCard are there for readability and debug purposes, and isStraight is there so that I don't reinvent the wheel later when I test for a straight flush.
This nested loop will iterate through all the cards up till the 5th card (since you can't have a straight out of ten sorted cards with less than 5 cards) and then check the next five cards as you would expect. If during this iteration I encounter a duplicate entry it means that it's the same card of another suit and I simply "break". What SHOULD happen as a result of this one statement is that now we have incremented our second iteration by one to check the next card instead of the current one. The count of in-order cards that we have will stay the same so that a list like " 1D 2D 3S 3H 4D 5C" will skip over the second 3 when finding the straight. Despite the break I was actually quite pleased with the elegance of this solution, whether I had a right to be or not.
It all goes back to the flaws of using a simple array of strings ("cards"), which is what my code is tailored to right now. And I hate fixing issues, I'd rather avoid them. Maybe I'm being unnecessarily picky but I'm learning along the way.
METHOD 3
My consideration of the weaknesses of an array of strings lead me to Dictionaries, which looked attractive. It can easily be made to hold my values in order, and easy to find if I have a certain suit for a key (TryGet), all in a neat, tailor-made package. Creating multiple array lists and doing things like "(find index of my value); array1[index].Add(value)" would be replaced by "Dictionary.Add(value, suit)". But I can only add a suit to a key at the point of creation. I couldn't make a "2" key and add "S" and then when I find out the next card is a "2D" add a D under the "2" key. Dictionary just doesn't support that, or even adding multiple values at all. I can make a dictionary of lists, but I still can't edit the list since Dictionaries are mostly query data structures. Lookups support multiple values per key but still cannot be changed after the initial "Add()". Again I could "re-create" the entire lookup or dictionary to add a suit and keep everything in order. But to me that seems like rebuilding the whole bridge because this one cable is too long and I don't have an industrial able cutter. It's a problem that SHOULD have an easier solution, like maybe go and GET some cutters (import a class maybe?).
CONCLUSION
Since you suggest that my needs are no different than what a hand scoring system could deliver leads me to another question:
Are hand scores directly tied to certain hands? Like I mentioned earlier the result I want is "The best hand you can make is a full house" not "This player has the highest hand." So can I calculate the highest scoring hand and extrapolate a "full house" from that score? If so then I guess this is all unnecessary code, but I would kind of like to solve this anyway in that case.
As I wrote this edit it dawned on me that this is basically a vanity issue. I don't "like" the solution I have. I also don't want to use the accepted solution (table lookup) because that is not a coding project that is a copypaste project. I would greatly appreciate any input.

Let's do this the simplest way possible. First, say you have an array of 10 strings, each of which is the name of a card. Like "Four of Hearts" and "Queen of Spades". That's really inconvenient to work with. So the first thing we do is convert those strings to numbers to represent each card. A very convenient way to do it is to use numbers 0-12 for hearts, 13-25 for diamonds, etc. So you have code (possibly a lookup table) that converts names to numbers:
Ace of Hearts = 0
Two of Hearts = 1
Three of Hearts = 2
...
Queen of Hearts = 11
King of Hearts = 12
Ace of Diamonds = 13
...
...
Ace of Clubs = 26
...
...
Kind of Spades = 51
So you have an array of numbers that represents the 10 cards. Call it cardsArray:
int[] cardsArray = new int[10];
// here, fill the cards array from the input
It's easy to check for flushes if you sort by suit and value. Remember, there are only 10 cards, so sorting isn't going to take a huge amount of time. The sort is really easy:
int[] sortedBySuit =
cardsArray
.OrderBy(x => x/13) // sorts by suit
.ThenByDescending(x => x % 13) // then by value, descending
.ToArray();
You can then go sequentially through the array and determine if you have a flush, straight flush, and what the high card in the flush (if any) is.
You have to save that information, because four-of-a-kind beats a flush, for example. So you need to check that, too.
Next, sort by value:
int[] sortedByValue =
cardsArray
.OrderByDescending(x => x % 13)
.ToArray();
Now you can go sequentially through that list to determine high card, pairs, three of a kind, four of a kind, or straights. As you find each type of hand, you save that hand information ("king high straight" or "three tens", along with the hand's value [1 for high card, 2 for pair, straight, flush, full house, etc. in the proper order]) to a list.
Then you just pick the hand with the highest value from those that you found.
That's definitely not the fastest way to do things, but it's simple, uses very little memory, and is fast enough for a prototype. It's certainly simpler than using a dictionary or array of arrays, etc.

To be clear, I didn't read your novel.
TL;DR SPARKNOTES VERSION: How can I add more values to the keys inside of a Lookup?
Usually, when I need a "dictionary" with a key that has multiple values, I use a List<KeyValuePair<string, int>>. You could use LINQ to Objects to select all the values. For example:
static void StackOverflowExample()
{
var cardList = new List<KeyValuePair<string,int>> ()
{
new KeyValuePair<string, int>("Club", 8),
new KeyValuePair<string, int>("Spade", 9),
new KeyValuePair<string, int>("Heart", 10)
};
var results = cardList.Where(p => p.Key == "Heart");
}
var results is an IEnumerable<KeyValuePair<string,int>>. Hopefully, this helps.

Efficiently pairing objects in lists based on key

So, here's the deal.
(My current use-case is in C#, but I'm also interested in the general algorithmic case)
I am given two Arrays of objects (I don't get to alter the code that creates these arrays, unfortunately).
Each object has (as part of it) a .Name property, a string.
These strings are unique per object, and they have zero or one matching strings in the other object.
What I need to do is efficiently pair these objects based on that string, into some sort of collection that allows me access to the paired objects. The strings need to match exactly to be considered a match, so I don't need any Upper or CaseInsensitive, etc.
Sadly, these lists are not sorted.
The lists themselves are maybe 30-50 items, but I need to repeat the algorithm on thousands of these array-pairs in a row, so efficiency is important.
Since I know that there's 0 or 1 match, and I know that most of them will be 1 match, I feel like there's a more efficient algorithm than x*y (Foreach item in x, foreach item in y, if x=y then x and y are a match)
I believe the most likely options are:
Keep the unsorted list and just do x*y, but drop items from the list once I've found them so I don't check ones already-found,
OR:
Convert both to Dictionaries and then do an indexed lookup on each (array2[currentArray1Item])
OR:
Sort the lists myself (Array.Sort()), and then having sorted arrays I can probably do something clever like jump to the index in B where I'd expect to find it (wherever it was in A) and then move up or down based on string until I either find it or pass where it should've been.
Then once that's done I need to figure out how to store it, I suppose I can make a custom ObjectPair class that just holds objects A and B. No need to do anything fancy here, since I'm just going to ForEach on the pairs.
So the questions are:
Are any of the above algorithms the fastest way to do this (if not, what is?) and is there some existing C# structure that'd conveniently hold the found pairs?
EDIT: Array.Sort() is a method that exists, so I don't need to convert the array to List to sort. Good to know. Updated above.

The question I have is: how much efficiency do we gain from the special handling if it requires us to sort both input arrays? According to the documentation for Array.Sort, it is O(n log n) on average and O(n ^ 2) in the worst case (quicksort). Once we have both arrays sorted, we then have another O(n) amount of work because we have to loop through the first one.
I interpret this to mean that the overall amount of work might actually increase because of the number of iterations required to sort, then process. This of course would be a different story if you could guarantee sorted arrays at the start, but as you said you cannot. (I should also note that you would need to create a custom IComparer<T> implementation to pass to Array.Sort so it knows to use the .Name property. That's not runtime work, but it's still work :-)
You might consider using a LINQ join, which only iterates the inner array a single time (see here for psuedocode). This is as opposed to the nested foreach statements, which would iterate the inner array for each element of the outer array. It's about as efficient as it can be in the general case and doesn't introduce the complexity of the special handling you suggested.
Here is an example implementation:
var pairs =
from item1 in array1
join item2 in array2 on item1.Name equals item2.Name
select new { item1, item2 };
foreach(var pair in pairs)
{
// Use the pair somehow
}
That very clearly states what you are doing with the data and also gives you an anonymous type representing each pair (so you don't have to invent a pairing). If you do end up going a different route, I would be interested in how it compares to this approach.

Sort the second array using Array.Sort method, then match objects in the second Array using Binary Search Algorithm.
Generally, for 30-50 items this would be a little faster than brute force x*y.

Understanding and solving K-Way merge sort

I want to:
count the number of comparisons needed by k-Way merge sort to sort random permutation of numbers from 0 to N-1.
to count the number of data moves needed by K-Way merge sort to sort random permutation of numbers from 0 to N-1.
I understand how 2-way merge sort works correctly, and understand the code very well. My problem now is I don't know how to start. How do I convert the 2-way merge sort into K-Way so that I can solve the above problems?
I have searched the web but can't find any tutorial to explain "k-Way merge sort" very well.
I need good explanation what to do so that I can take it from there and do it myself.
Like I said I understand the 2-Way, so how do I move to the K-Way merge sort? How do I implement the K-way?
Edit
I read some post http://bchalk.com/work/view/k_way_merge_sort
that BinaryHeap must be used to implement k-Way merge. Is that so or there are other ways?
How do I divide my list into K? Is there a special way of doing it?

When k > 2, the leading elements from each of the input streams are typically kept in a minheap structure. This makes it easy to find to the mininum of the the n-values, to pop that value off the heap, and insert a replacement value from the corresponding input stream.
A heap does O(lg2 k) comparisons for each insertion, so the total work for a k-way merge of n items is n * lg2(k).
Eventhough you asked about C# and Java, you can learn how to do it by looking at the Python standard library code for a k-way merge: http://hg.python.org/cpython/file/2.7/Lib/heapq.py#l323
To answer your other question, there is no special way to divide your list into K groups. Just take the first N/k elements in the first array, the next N/k elements into the next, etc. Sort each array and then merge them using heaps as mentioned above.

You could always think of a k-way merge as a series of 2-way merges, that is, do a 2-way merge with the result of the first and second, and the third: Merge(Merge(L1, L2), L3) and so on. Even faster would be to split it twice: Merge(Merge(L1, L2), Merge(L3, L4)). As you can see, for a k-way sort, you would then need some sort of loop (recursion).

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.