Algorithm to match teammate preferences - c#

I have a situation where I have to create a C# routine with the following logic: take a list of people and divide them into 2 teams based on preferences.
I have an array of 20 names:
var names = new List<string>() {"Joe", "Bill", "Scott", "Jonathan", . . .}
Each person can give 0 to 3 preferences, so for each name I have an array of 0 to 3 strings containing other names from the list (the people they want on their team).
I now need to solve for taking the list of 20 people and dividing them into 2 teams and creating the teams (sub lists) based on optimizing for people's preferences. So each person should get AT LEAST one person that they included in their preference on their team (if mathematically possible). There is no priority of one person above anyone else, just trying to optimize for the top number of matches.
I can convert the string lists into a list of objects
List<Person> list = CreateList(array);
where Person class is the following
public class Person
{
    public string Name;
    public List<Person> Preferences;
}
but now I am trying to figure out how to use this data structure to generate the 2 teams, so that I end up with 2 lists of 10 people each.

The optimal flow method can solve your problem: maximize the flow in a graph whose edge capacities express the preferences. There is an O(N^3) algorithm.
See here for an example : Algorithm for optimal matching with choices and ranking

Instead of setting up an adjacency matrix and trying to extract some meaningful vector solution via linear algebra, I think you might get some mileage out of an iterative "team captain" type model.
I would start by representing each individual's preferences as a row vector, mimicking the rows of an adjacency matrix.
For ease of explanation, I'll use an example with 4 people: A, B, C, and D. A and D want to be together, B wants to be with C, and C wants to be with A.
The rows of the adjacency matrix are [1,0,0,1],[0,1,1,0],[1,0,1,0],[1,0,0,1]. So I would carry on representing each individual as their row in the adjacency matrix. A is [1,0,0,1], B is [0,1,1,0], C is [1,0,1,0], and D is [1,0,0,1].
Now I would create team captains. I'm going to look for two vectors whose dot product is minimal. A zero dot product is not guaranteed to exist by your requirements. For instance, everyone could want to be with one particular individual. But you can still find two vectors whose dot product is minimal, and entirely orthogonal is likely in practice.
Now make those two people team captains. In this example, that means A=[1,0,0,1] and B=[0,1,1,0] are viable team captains.
Now iteratively allow them to start picking their teams. A can find his optimal new member by taking dot products with C, and D, and observing D yields the maximum dot product, resulting in the teams {A,D}, {B,C}.
Of course, in your case, it doesn't stop there. So let's consider what would happen if there were also an E, F, G, and H waiting to get picked. In that case, the team {A,D} would multiply the (column) matrix [A][D] with the (column) vectors [E],[F],[G], and [H], and see which product has maximal norm, selecting it and allowing team {B,C} to select its newest member through the same process: max{norm([B][C]*[candidate])}.
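The captain idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: the names CaptainDraft, Dot, PickCaptains and Draft are made up for the example, and instead of the matrix-norm criterion described above, each pick simply maximizes the sum of dot products with the current team members (a simplification, not the exact rule from the answer). It also does not guarantee that every person ends up with a preferred teammate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CaptainDraft
{
    // Dot product of two preference rows of equal length.
    public static int Dot(int[] a, int[] b) => a.Zip(b, (x, y) => x * y).Sum();

    // Returns the first pair of indices (i < j) whose rows have the smallest dot product.
    public static (int, int) PickCaptains(int[][] rows)
    {
        var best = (0, 1);
        int bestDot = int.MaxValue;
        for (int i = 0; i < rows.Length; i++)
            for (int j = i + 1; j < rows.Length; j++)
            {
                int d = Dot(rows[i], rows[j]);
                if (d < bestDot) { bestDot = d; best = (i, j); }
            }
        return best;
    }

    // Teams alternate picks; each pick maximizes the summed dot product
    // between the candidate's row and the rows of the current team members.
    public static (List<int>, List<int>) Draft(int[][] rows)
    {
        var (c1, c2) = PickCaptains(rows);
        var teams = new[] { new List<int> { c1 }, new List<int> { c2 } };
        var free = Enumerable.Range(0, rows.Length)
                             .Where(i => i != c1 && i != c2)
                             .ToList();
        int turn = 0;
        while (free.Count > 0)
        {
            var team = teams[turn];
            int pick = free
                .OrderByDescending(c => team.Sum(m => Dot(rows[m], rows[c])))
                .First();
            team.Add(pick);
            free.Remove(pick);   // removes the value 'pick', not the item at that index
            turn = 1 - turn;
        }
        return (teams[0], teams[1]);
    }
}
```

With the four rows from the example (A, B, C, D as indices 0 to 3), Draft picks A and B as captains and produces the teams {A, D} and {B, C}.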

Related

Best match between two sets of points

I've got two lists of points, let's call them L1( P1(x1, y1), ... Pn(xn, yn)) and L2(P'1(x'1, y'1), ... P'n(x'n, y'n)).
My task is to find the best match between their points for minimizing the sum of their distances.
Any clue on some algorithm? The two lists contain approx. 200-300 points.
Thanks and bests.
If your use case involves matching every point in list L1 with a point in list L2, then the Hungarian algorithm is a perfect fit.
The weights in your Hungarian matrix would be the distances between the point for the row and the point for the column. The overall runtime of the optimized Hungarian algorithm is O(n^3), which comfortably fits your given constraint of n = 300.
A pretty nice tutorial covering the idea and implementation of the Hungarian algorithm is https://www.topcoder.com/community/competitive-programming/tutorials/assignment-problem-and-hungarian-algorithm/
Besides the Hungarian algorithm, you can also recast the problem as a min-cost max-flow problem; I'll omit the details for now but can discuss them if required.
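For illustration, here is a compact sketch of the O(n^3) assignment algorithm in its common "potentials" formulation. It is not taken from the linked tutorial; the class and method names (Assignment, Distances, Solve) are invented for this example, and it assumes both lists have the same number of points so the cost matrix is square.

```csharp
using System;

public static class Assignment
{
    // cost[i, j] = Euclidean distance between a[i] and b[j].
    public static double[,] Distances((double X, double Y)[] a, (double X, double Y)[] b)
    {
        var cost = new double[a.Length, b.Length];
        for (int i = 0; i < a.Length; i++)
            for (int j = 0; j < b.Length; j++)
                cost[i, j] = Math.Sqrt(Math.Pow(a[i].X - b[j].X, 2) + Math.Pow(a[i].Y - b[j].Y, 2));
        return cost;
    }

    // Returns match[i] = column assigned to row i, minimizing the total cost.
    // Assumes a square matrix; rows and columns are 1-based internally.
    public static int[] Solve(double[,] cost)
    {
        int n = cost.GetLength(0);
        var u = new double[n + 1];   // row potentials
        var v = new double[n + 1];   // column potentials
        var p = new int[n + 1];      // p[j] = row matched to column j (0 = none)
        var way = new int[n + 1];    // previous column on the augmenting path
        for (int i = 1; i <= n; i++)
        {
            p[0] = i;
            int j0 = 0;
            var minv = new double[n + 1];
            var used = new bool[n + 1];
            for (int j = 0; j <= n; j++) minv[j] = double.PositiveInfinity;
            do
            {
                used[j0] = true;
                int i0 = p[j0], j1 = 0;
                double delta = double.PositiveInfinity;
                for (int j = 1; j <= n; j++)
                {
                    if (used[j]) continue;
                    double cur = cost[i0 - 1, j - 1] - u[i0] - v[j];
                    if (cur < minv[j]) { minv[j] = cur; way[j] = j0; }
                    if (minv[j] < delta) { delta = minv[j]; j1 = j; }
                }
                for (int j = 0; j <= n; j++)
                {
                    if (used[j]) { u[p[j]] += delta; v[j] -= delta; }
                    else minv[j] -= delta;
                }
                j0 = j1;
            } while (p[j0] != 0);
            do   // flip the matching along the path that was found
            {
                int j1 = way[j0];
                p[j0] = p[j1];
                j0 = j1;
            } while (j0 != 0);
        }
        var match = new int[n];
        for (int j = 1; j <= n; j++) match[p[j] - 1] = j - 1;
        return match;
    }
}
```

A call such as Assignment.Solve(Assignment.Distances(l1, l2)) would then return, for each point of L1, the index of its partner in L2.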

Cartesian product with restrictions

Hello, I am new in C# Programming and I would like to ask a question related to the cartesian product. I found the method for calculating through StackOverflow and I used it. The method I use is:
char[] letters = { 'A', 'B', 'C' };
int[] numbers = { 1, 2, 3, 4 };
string[] colours = { "Red", "Blue" };
var cartesianProduct = from letter in letters
from number in numbers
from colour in colours
select new { letter, number, colour };
In my case the cartesian product means something. For example, for the combination "A1Red" I upload to the program an array with the consistency values between A and 1 and between A and Red. What I need from the program is the sum of these values.
It works perfectly if I have 10 arrays to combine, but it gets stuck when I need to calculate over 23 arrays with a total of more than 100 trillion combinations.
Is there something that I can do to make it run fast?
Unfortunately, there is not much you can do. Your problem requires so-called "exponential time".
As @esel points out, you can do some optimizations, and you can use parallelism.
But unless there is an underlying structure to your arrays, some correlation between the data that you can exploit, you're simply stuck with this "exponential time". Every array added to your list will multiply the amount of time needed to compute. This escalates fast.
There is a very small consolation: as soon as any of your arrays is empty, this flattens the whole thing to the empty set.
See if there is an underlying structure to your data that you can exploit.
Edit:
In the comments, you mention taking the first 10.000 combinations. I'm not quite sure I understand the rest, but if you need the first 10.000 of cartesianProduct (i.e. unprocessed), then there is a way:
var first10000 = cartesianProduct.Take(10000);
This works because LINQ uses "lazy evaluation": it will not calculate the values in the cartesian product until it has to. As a consequence, no more than 10000 values will be calculated.
However, if some processing needs to be done first, like sorting, then I'm afraid you're out of luck.
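A small sketch can make the lazy-evaluation point tangible. The class LazyDemo and its Touch counter are hypothetical helpers added only for this illustration; the query spans a billion combinations, yet taking a handful of results should only produce about that many.

```csharp
using System;
using System.Linq;

public static class LazyDemo
{
    // Counts how many combinations were actually produced by the query.
    public static int Evaluated;

    public static int CountFirst(int take)
    {
        Evaluated = 0;
        int[] xs = Enumerable.Range(0, 1000).ToArray();

        // 1000 * 1000 * 1000 combinations in total, but nothing runs yet:
        // the query is only a description until it is enumerated.
        var product = from a in xs
                      from b in xs
                      from c in xs
                      select Touch(a + b + c);

        // Enumeration stops once 'take' items have been produced.
        return product.Take(take).ToList().Count;
    }

    private static int Touch(int value)
    {
        Evaluated++;
        return value;
    }
}
```

After LazyDemo.CountFirst(10), the Evaluated counter stays tiny compared with the billion possible combinations. Adding an OrderBy before the Take would force the whole product to be generated first, which is the "out of luck" case mentioned above.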
There are ways to make some algorithms run more quickly, using parallelism to process multiple parts of the domain at once by taking advantage of many cores and special CPU or GPU instructions.
However, there are temporal limits. If the answer space is indeed that large, you simply cannot compute the results in a reasonable amount of time.

Using Lookup or Dictionary to represent a hand of cards in C#

I apologize if there is a similar question already out there. There
are several questions about scoring hands but I don't need that.
The project I am working on takes in 10 cards and needs to report the
best possible 5-card hand found ("straight", "high card", "flush"
etc.). Luckily what the actual hand of cards is is irrelevant, I just
need a name.
I've already parsed and sorted all the cards out and have the tests
for all the possible hands laid out. All I need now is a convenient
way to store the hands. My mad method is as follows, in pseudocode
terms:
I want to have a dynamic list horizontally that I can populate with the NUMBER values of the cards, in order from highest to lowest. For
example, "Q J T 7 4 2 1". T is 10. Duplicates of values will be
ignored. Next, I want each of those values to have, underneath, a
list of the suits of each value that exist in the deck. For example,
J will have a sub-list with the values "D H" to represent that I have
a Jack of Diamonds and a Jack of Hearts.
I believe this to be the most elegant way to deal with these cards,
since most poker hands deal with only values and this way I don't have
to worry about cards of the same value in a row for say the straight
test. Then the two tests that do deal with suit can easily be
tested for by referring to the values under the keys.
Take a deep breath, almost there.
So an instance of Lookup appears to be perfect! It has the exact "one
key to multiple values" structure that I want. However, it doesn't
allow me to add the suits as I come to them. I have to add them all
at once or not at all since the lists are immutable after entry.
So I either
- have to find all the suits at once before I even make the Lookup,
- somehow add values to the Lookup lists, or
- use something else.
Any ideas on any of these?
UPDATE
TL;DR SPARKNOTES VERSION: How can I add more values to the keys inside of a Lookup?
*IMPORTANT NOTE:* The output of this program should be a string containing the name of the highest hand possible, for example "four of a kind", "two pair" or "high card."
I found one solution (which I unfortunately lost the link to and can't find again) where they suggested re-creating the entire Lookup with the new list. It may just be me but I find that solution to be very... ugly... Anyway several other solutions I have explored or tested are to:
METHOD 1
Roll through and populate another array with the suits associated with each value. Basically (in actual pseudocode this time >_>):
Create an array of ArrayLists (array 1)
Iterate through a sorted string array of "cards" (array 2)
For each card:
Take the char at string index [1] (representing the suit) and add into the ArrayList in array 1 at the index number extracted from string index [0].
This way I have the list of values with associated suits that I wanted. And the list of suits is the minimal size to boot, making iteration through that easier later. With some extra steps I can even make the umbrella array an ArrayList and populate it with the card values in order so there are no gaps and no duplicate numbers. This will leave me with a jagged array of what I want. To be clear, this is not a homework assignment. However, it IS from a coding class project my roommate completed in the past which is why I have the constrictions and requirements I have. Someone else I asked told me SE gets plagued by these kinds of homework questions around this time, so I understand your skepticism. This is a personal project because I want to learn C# (all I know is Java right now, and I like the sound of parameter pointer passing in C# methods, which Java does not do).
If it WERE for a grade I would end there because it works. But I don't really like arraylists of arraylists, they seem messy to me. So I want to know if there is another method.
METHOD 2
I also considered simply dealing with duplicates that inevitably appear. For example, here is my test for a straight:
for (int i = 0; i < 5; i++)
{
    int counter = 0;
    for (int j = i; j < i + 4; j++)
    {
        int secondCard = getValue(cardsArray[j + 1]);
        int firstCard = getValue(cardsArray[j]);
        if (secondCard == firstCard)
        {
            break;
        }
        if (secondCard == (firstCard + 1))
        {
            counter++;
            if (counter == 4)
            {
                isStraight = true;
                return "straight";
            }
        }
    }
}
This code does not work. It needs some tweaks somewhere or other to work completely but I want to analyze if it is worth it before I try to fix it. It DOES accurately test for a straight, though. Also a couple notes: firstCard and secondCard are there for readability and debug purposes, and isStraight is there so that I don't reinvent the wheel later when I test for a straight flush.
This nested loop will iterate through all the cards up till the 5th card (since you can't have a straight out of ten sorted cards with less than 5 cards) and then check the next five cards as you would expect. If during this iteration I encounter a duplicate entry it means that it's the same card of another suit and I simply "break". What SHOULD happen as a result of this one statement is that now we have incremented our second iteration by one to check the next card instead of the current one. The count of in-order cards that we have will stay the same so that a list like " 1D 2D 3S 3H 4D 5C" will skip over the second 3 when finding the straight. Despite the break I was actually quite pleased with the elegance of this solution, whether I had a right to be or not.
It all goes back to the flaws of using a simple array of strings ("cards"), which is what my code is tailored to right now. And I hate fixing issues, I'd rather avoid them. Maybe I'm being unnecessarily picky but I'm learning along the way.
METHOD 3
My consideration of the weaknesses of an array of strings led me to Dictionaries, which looked attractive. It can easily be made to hold my values in order, and easy to find if I have a certain suit for a key (TryGet), all in a neat, tailor-made package. Creating multiple array lists and doing things like "(find index of my value); array1[index].Add(value)" would be replaced by "Dictionary.Add(value, suit)". But I can only add a suit to a key at the point of creation. I couldn't make a "2" key and add "S" and then when I find out the next card is a "2D" add a D under the "2" key. Dictionary just doesn't support that, or even adding multiple values at all. I can make a dictionary of lists, but I still can't edit the list since Dictionaries are mostly query data structures. Lookups support multiple values per key but still cannot be changed after the initial "Add()". Again I could "re-create" the entire lookup or dictionary to add a suit and keep everything in order. But to me that seems like rebuilding the whole bridge because this one cable is too long and I don't have an industrial cable cutter. It's a problem that SHOULD have an easier solution, like maybe go and GET some cutters (import a class maybe?).
CONCLUSION
Your suggestion that my needs are no different from what a hand scoring system could deliver leads me to another question:
Are hand scores directly tied to certain hands? Like I mentioned earlier the result I want is "The best hand you can make is a full house" not "This player has the highest hand." So can I calculate the highest scoring hand and extrapolate a "full house" from that score? If so then I guess this is all unnecessary code, but I would kind of like to solve this anyway in that case.
As I wrote this edit it dawned on me that this is basically a vanity issue. I don't "like" the solution I have. I also don't want to use the accepted solution (table lookup) because that is not a coding project that is a copypaste project. I would greatly appreciate any input.
Let's do this the simplest way possible. First, say you have an array of 10 strings, each of which is the name of a card. Like "Four of Hearts" and "Queen of Spades". That's really inconvenient to work with. So the first thing we do is convert those strings to numbers to represent each card. A very convenient way to do it is to use numbers 0-12 for hearts, 13-25 for diamonds, etc. So you have code (possibly a lookup table) that converts names to numbers:
Ace of Hearts = 0
Two of Hearts = 1
Three of Hearts = 2
...
Queen of Hearts = 11
King of Hearts = 12
Ace of Diamonds = 13
...
...
Ace of Clubs = 26
...
...
King of Spades = 51
So you have an array of numbers that represents the 10 cards. Call it cardsArray:
int[] cardsArray = new int[10];
// here, fill the cards array from the input
It's easy to check for flushes if you sort by suit and value. Remember, there are only 10 cards, so sorting isn't going to take a huge amount of time. The sort is really easy:
int[] sortedBySuit =
    cardsArray
        .OrderBy(x => x/13) // sorts by suit
        .ThenByDescending(x => x % 13) // then by value, descending
        .ToArray();
You can then go sequentially through the array and determine if you have a flush, straight flush, and what the high card in the flush (if any) is.
You have to save that information, because four-of-a-kind beats a flush, for example. So you need to check that, too.
Next, sort by value:
int[] sortedByValue =
    cardsArray
        .OrderByDescending(x => x % 13)
        .ToArray();
Now you can go sequentially through that list to determine high card, pairs, three of a kind, four of a kind, or straights. As you find each type of hand, you save that hand information ("king high straight" or "three tens", along with the hand's value [1 for high card, 2 for pair, straight, flush, full house, etc. in the proper order]) to a list.
Then you just pick the hand with the highest value from those that you found.
That's definitely not the fastest way to do things, but it's simple, uses very little memory, and is fast enough for a prototype. It's certainly simpler than using a dictionary or array of arrays, etc.
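As a rough illustration of how the 0-51 encoding pays off, the sketch below checks for a flush and for the largest group of equal values. Note that it swaps the sequential scan described above for LINQ's GroupBy, purely for brevity; the class and method names are invented for this example, and the input is assumed to be non-empty.

```csharp
using System;
using System.Linq;

public static class HandChecks
{
    // Cards are encoded as 0..51: suit = card / 13, value = card % 13
    // (0 = Ace, 1 = Two, ..., 12 = King, as in the table above).

    // True if at least five cards share a suit.
    public static bool HasFlush(int[] cards) =>
        cards.GroupBy(c => c / 13).Any(g => g.Count() >= 5);

    // Size of the largest group of equal values:
    // 2 = pair, 3 = three of a kind, 4 = four of a kind.
    public static int LargestSetSize(int[] cards) =>
        cards.GroupBy(c => c % 13).Max(g => g.Count());
}
```

Checks like these only say which categories are present; picking the best hand still means comparing the categories found, as described above.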
To be clear, I didn't read your novel.
TL;DR SPARKNOTES VERSION: How can I add more values to the keys inside of a Lookup?
Usually, when I need a "dictionary" with a key that has multiple values, I use a List<KeyValuePair<string, int>>. You could use LINQ to Objects to select all the values. For example:
static void StackOverflowExample()
{
    var cardList = new List<KeyValuePair<string,int>> ()
    {
        new KeyValuePair<string, int>("Club", 8),
        new KeyValuePair<string, int>("Spade", 9),
        new KeyValuePair<string, int>("Heart", 10)
    };
    var results = cardList.Where(p => p.Key == "Heart");
}
var results is an IEnumerable<KeyValuePair<string,int>>. Hopefully, this helps.
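Another option for the "one key, many values, added as I go" requirement is a Dictionary whose values are lists: the list stored under a key remains mutable after it has been added, so suits can be appended whenever a card turns up. The sketch below assumes two-character card strings such as "JD" (value character, then suit character); the HandIndex name and that card format are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;

public static class HandIndex
{
    // Maps a card value character (e.g. 'Q', 'T', '7') to the suits seen for it.
    public static Dictionary<char, List<char>> Build(IEnumerable<string> cards)
    {
        var suitsByValue = new Dictionary<char, List<char>>();
        foreach (string card in cards)   // assumed format: value char, then suit char
        {
            char value = card[0];
            char suit = card[1];
            if (!suitsByValue.TryGetValue(value, out var suits))
            {
                // First card with this value: create its (initially empty) suit list.
                suits = new List<char>();
                suitsByValue[value] = suits;
            }
            suits.Add(suit);             // the stored list can still be modified
        }
        return suitsByValue;
    }
}
```

For the cards "JD", "JH" and "QS" this yields 'J' mapped to D and H, and 'Q' mapped to S. A plain Dictionary does not promise any key order, so the values would still need to be sorted when read (or a SortedDictionary used instead).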

Permutation/Algorithm to Solve Conditional Fill Puzzle

I've been digging around to see if something similar has been done previously, but have not seen anything with the mirrored conditions. To make the problem a little easier to digest, I'm going to describe it in the context of filling a baseball team roster.
The given roster structure is organized as such: C, 1B, 2B, 3B, SS, 2B/SS (either or), 1B/3B, OF, OF, OF, OF, UT (can be any position)
Every player has at least one of the non-backup positions (positions that allow more than one position) where they're eligible and in many cases more than one (i.e. a player that can play 1B and OF, etc.). Say that you are manager of a team, which already has some players on it and you want to see if you have room for a particular player at any of your slots or if you can move one or more players around to open up a slot where he is eligible.
My initial attempts were to use a conditional permutation and collect in a list all the possible unique "lineups" for each player, updating the open slots before moving to the next player. This also required (since the order that the player was moved would affect what positions were available for the next player) that the list being looped through was reordered and then looped through again. I still think that this is the way to go, but there are a number of pitfalls that have snagged the function.
The data to start the loop that you assume is given is:
1. List of positions the player being evaluated can play (the one being checked if he can fit)
2. List of players currently on the roster and the positions each of those is eligible at (I'm currently storing a list of lists and using the list index as the unique identifier of the player)
3. A list of the positions open as the roster currently is
It's proven a bigger headache than I originally anticipated. It was even suggested to me by a colleague that the situation I have (which involves, on a much larger scale, conditional assignments for each object) was NP-complete. I am certain that it is not, since once a player has been repositioned in a particular lineup being tested, the entire roster should not need to be iterated over again once another player has moved. That's the long and short of it and I finally decided to open it up to the forums.
Thanks for any help anyone can provide. Due to restrictions, I can't post portions of code (some of it is legacy). It is, however, being translated in .NET (C# at the moment). If there's additional information necessary, I'll try and rewrite some of the short pieces of the function for post.
Joseph G.
EDITED 07/24/2010
Thank you very much for the responses. I actually did look into using a genetic algorithm, but ultimately abandoned it because the work that would go into determining ordinal results was superfluous. The ultimate aim of the test is to determine if there is, in fact, a scenario that returns a positive. There's no need to determine the relative benefit of each working solution.
I appreciate the feedback on the likely lack of familiarity with the context I presented the problem. The actual model is in the distribution of build commands across multiple platform-specific build servers. It's accessible, but I'd rather not get into why certain build tasks can only be executed on certain systems and why certain systems can only execute certain types of build commands.
It appears that you have gotten the gist of what I was presenting, but here's a different model that's a little less specific. There are a set of discrete positions in an ordered array of lists as such (I'll refer to these as "positions"):
((2), (2), (3), (4), (5), (6), (4, 6), (3, 5), (7), (7), (7), (7), (7), (2, 3, 4, 5, 6, 7))
Additionally, there is an unordered array of lists (I'll refer to these as "employees"); an employee can only occupy one of the slots if its list has a member in common with the position list to which it would be assigned. After the initial assignments have been made, if an additional employee comes along, I need to determine if he can fill one of the open positions, and if not, if the current employees can be rearranged to allow one of the positions the employee CAN fill to be made available.
Brute force is something I'd like to avoid, because with this being on the order of 40 - 50 objects (and soon to be increasing), individual determinations will be very expensive to calculate at runtime.
I don't understand baseball at all so sorry if I'm on the wrong track. I do like rounders though, but there are only 2 positions to play in rounders, a batter or everyone else.
Have you considered using Genetic Algorithms to solve this problem? They are very good at solving NP hard problems and work surprisingly well for rota and time schedule type problems as well.
You have a solution model which can easily be scored and easily manipulated which is a great start for a genetic algorithm.
For more complex problems where the total permutations are too large to calculate, a genetic algorithm should find a near optimum or excellent solution (along with lots and lots of other valid solutions) in a fairly short amount of time. Although if you wish to find the optimum solution every time, you are going to have to brute force it in all likelihood (I have only skimmed the problem so this may not be the case but it sounds like it probably is).
In your example, you would have a solution class that represents a solution, i.e. a line-up for the baseball team. You randomly generate say 20 solutions, regardless of whether they are valid or not, then you have a rating algorithm that rates each solution. In your case, a better player in the line-up would score more than a worse player, and any invalid line-ups (for whatever reason) would force a score of 0.
Any 0 scoring solutions are killed off, and replaced with new random ones, and the rest of the solutions breed together to form new solutions. Theoretically and after enough time the pool of solutions should improve.
This has the benefit of not only finding lots of valid unique line-ups, but also rating them. You didn't specify in your problem the need to rate the solutions, but it offers plenty of benefits (for example if a player is injured, he can be temporarily rated as a -10 or whatever). All other players score based on their quantifiable stats.
It's scalable and performs well.
It sounds as though you have a bipartite matching problem. One partition has a vertex for each player on the roster. The other has a vertex for each roster position. There is an edge between a player vertex and a position vertex if and only if the player can play that position. You are interested in matchings: collections of edges such that no endpoint is repeated.
Given an assignment of players to positions (a matching) and a new player to be accommodated, there is a simple algorithm to determine if it can be done. Direct each edge in the current matching from the position to the player; direct the others from the player to the position. Now, using breadth-first search, look for a path from the new player to an unassigned position. If you find one, it tells you one possible series of reassignments. If you don't, there's no matching with all of the players.
For example, suppose player A can play positions 1 or 2
A--1
 \
  \
   2
We provisionally assign A to 2. Now B shows up and can only play 2. Direct the graph:
A->1
 <
  \
B->2
We find a path B->2->A->1, which means "assign B to 2, displacing A to 1".
There is lots of pretty theory for dealing with hypothetical matchings. Genetic algorithms need not apply.
EDIT: I should add that because of the use of BFS, it computes the least disruptive sequence of reassignments.
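The augmenting-path search described above might look roughly like this in C#. It is a sketch under assumptions: players and positions are identified by array indices, the RosterMatcher name and its parameters are invented for the example, and the new player must not already occupy a slot.

```csharp
using System;
using System.Collections.Generic;

public static class RosterMatcher
{
    // eligible[p] = slots that player p can fill.
    // playerAt[s] = player currently in slot s, or -1 if the slot is open.
    // Tries to seat newPlayer, reassigning current players along a shortest
    // augmenting path found by breadth-first search. Updates playerAt on success.
    public static bool TryAdd(int newPlayer, List<int>[] eligible, int[] playerAt)
    {
        var cameFrom = new Dictionary<int, int>();   // slot -> player who reached it
        var seen = new HashSet<int> { newPlayer };
        var queue = new Queue<int>();
        queue.Enqueue(newPlayer);

        while (queue.Count > 0)
        {
            int player = queue.Dequeue();
            foreach (int slot in eligible[player])
            {
                if (cameFrom.ContainsKey(slot)) continue;   // slot already reached
                cameFrom[slot] = player;

                int occupant = playerAt[slot];
                if (occupant == -1)
                {
                    // Open slot found: walk back, moving each player on the
                    // path into the slot through which they were reached.
                    int s = slot;
                    while (true)
                    {
                        int p = cameFrom[s];
                        int previous = Array.IndexOf(playerAt, p); // old slot, -1 for newPlayer
                        playerAt[s] = p;
                        if (previous == -1) return true;
                        s = previous;
                    }
                }
                // Occupied: the occupant may be able to move somewhere else.
                if (seen.Add(occupant)) queue.Enqueue(occupant);
            }
        }
        return false;   // no arrangement seats everyone
    }
}
```

For the example above (A eligible for positions 1 and 2 and currently in 2, B eligible only for 2), the search finds B->2->A->1 and ends with A in position 1 and B in position 2. Each call visits every player and slot at most once, so there is no need to enumerate lineups.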

How can I sort an array of strings?

I have a list of input words separated by comma. I want to sort these words by alphabetical and length. How can I do this without using the built-in sorting functions?
Good question!! Sorting is probably the most important concept to learn as an up-and-coming computer scientist.
There are actually lots of different algorithms for sorting a list.
When you break all of those algorithms down, the most fundamental operation is the comparison of two items in the list, defining their "natural order".
For example, in order to sort a list of integers, I'd need a function that tells me, given any two integers X and Y whether X is less than, equal to, or greater than Y.
For your strings, you'll need the same thing: a function that tells you which of the strings has the "lesser" or "greater" value, or whether they're equal.
Traditionally, these "comparator" functions look something like this:
int CompareStrings(String a, String b) {
    if (a < b)
        return -1;
    else if (a > b)
        return 1;
    else
        return 0;
}
I've left out some of the details (like, how do you compute whether a is less than or greater than b? clue: iterate through the characters), but that's the basic skeleton of any comparison function. It returns a value less than zero if the first element is smaller and a value greater than zero if the first element is greater, returning zero if the elements have equal value.
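To fill in the omitted detail, one possible character-by-character version is sketched below. It compares raw character codes (so uppercase letters sort before lowercase ones) and treats a shorter string as smaller when it is a prefix of the longer one; both choices are assumptions, and the ManualCompare class name is just for the example.

```csharp
public static class ManualCompare
{
    // Ordinal, character-by-character comparison without built-in comparers.
    // Returns -1 if a sorts first, 1 if b sorts first, 0 if they are equal.
    public static int CompareStrings(string a, string b)
    {
        int shared = a.Length < b.Length ? a.Length : b.Length;
        for (int i = 0; i < shared; i++)
        {
            if (a[i] < b[i]) return -1;   // first differing character decides
            if (a[i] > b[i]) return 1;
        }
        // All shared characters match: the shorter string sorts first.
        if (a.Length < b.Length) return -1;
        if (a.Length > b.Length) return 1;
        return 0;
    }
}
```

For example, "app" sorts before "apple", and "apple" sorts before "apply".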
But what does that have to do with sorting?
A sort routine will call that function for pairs of elements in your list, using the result of the function to figure out how to rearrange the items into a sorted list. The comparison function defines the "natural order", and the "sorting algorithm" defines the logic for calling and responding to the results of the comparison function.
Each algorithm is like a big-picture strategy for guaranteeing that ANY input will be correctly sorted. Here are a few of the algorithms that you'll probably want to know about:
Bubble Sort:
Iterate through the list, calling the comparison function for all adjacent pairs of elements. Whenever you get a result greater than zero (meaning that the first element is larger than the second one), swap the two values. Then move on to the next pair. When you get to the end of the list, if you didn't have to swap ANY pairs, then congratulations, the list is sorted! If you DID have to perform any swaps, go back to the beginning and start over. Repeat this process until there are no more swaps.
NOTE: this is usually not a very efficient way to sort a list, because in the worst cases, it might require you to scan the whole list as many as N times, for a list with N elements.
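A minimal sketch of that procedure, assuming the comparison is passed in as a Comparison<string> delegate (any function with the comparator shape shown earlier would fit; the BubbleSorter name is made up):

```csharp
using System;

public static class BubbleSorter
{
    // Sorts the array in place by repeatedly swapping out-of-order neighbours.
    public static void Sort(string[] items, Comparison<string> compare)
    {
        bool swapped;
        do
        {
            swapped = false;
            for (int i = 0; i + 1 < items.Length; i++)
            {
                // A positive result means the left item belongs after the right one.
                if (compare(items[i], items[i + 1]) > 0)
                {
                    (items[i], items[i + 1]) = (items[i + 1], items[i]);
                    swapped = true;
                }
            }
        } while (swapped);   // a full pass without swaps means the list is sorted
    }
}
```

It could be called as BubbleSorter.Sort(words, string.CompareOrdinal), or with a hand-written comparator if built-in comparisons are off limits too.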
Merge Sort:
This is one of the most popular divide-and-conquer algorithms for sorting a list. The basic idea is that, if you have two already-sorted lists, it's easy to merge them. Just start from the beginning of each list and remove the first element of whichever list has the smallest starting value. Repeat this process until you've consumed all the items from both lists, and then you're done!
1 4 8 10
2 5 7 9
------------ becomes ------------>
1 2 4 5 7 8 9 10
But what if you don't have two sorted lists? What if you have just one list, and its elements are in random order?
That's the clever thing about merge sort. You can break any single list into smaller pieces, each of which is either an unsorted list, a sorted list, or a single element (which, if you think about it, is actually a sorted list, with length = 1).
So the first step in a merge sort algorithm is to divide your overall list into smaller and smaller sub lists. At the tiniest levels (where each list only has one or two elements), they're very easy to sort. And once sorted, it's easy to merge any two adjacent sorted lists into a larger sorted list containing all the elements of the two sub lists.
NOTE: This algorithm is much better than the bubble sort method, described above, in terms of its worst-case-scenario efficiency. I won't go into a detailed explanation (which involves some fairly trivial math, but would take some time to explain), but the quick reason for the increased efficiency is that this algorithm breaks its problem into ideal-sized chunks and then merges the results of those chunks. The bubble sort algorithm tackles the whole thing at once, so it doesn't get the benefit of "divide-and-conquer".
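The split-and-merge idea can be sketched as follows. This version returns a sorted array rather than sorting in place, which is a choice made for readability; the MergeSorter name and the Comparison<string> parameter are assumptions of the example.

```csharp
using System;

public static class MergeSorter
{
    // Returns a sorted array; arrays of length 0 or 1 are returned as-is.
    public static string[] Sort(string[] items, Comparison<string> compare)
    {
        if (items.Length <= 1) return items;   // a single element is already sorted

        // Split into two halves and sort each half recursively.
        int mid = items.Length / 2;
        var left = new string[mid];
        var right = new string[items.Length - mid];
        Array.Copy(items, 0, left, 0, mid);
        Array.Copy(items, mid, right, 0, right.Length);

        return Merge(Sort(left, compare), Sort(right, compare), compare);
    }

    // Merges two sorted arrays by repeatedly taking the smaller head element.
    private static string[] Merge(string[] a, string[] b, Comparison<string> compare)
    {
        var result = new string[a.Length + b.Length];
        int i = 0, j = 0, k = 0;
        while (i < a.Length && j < b.Length)
            result[k++] = compare(a[i], b[j]) <= 0 ? a[i++] : b[j++];
        while (i < a.Length) result[k++] = a[i++];   // leftovers from a
        while (j < b.Length) result[k++] = b[j++];   // leftovers from b
        return result;
    }
}
```

Taking from the left array on ties (the <= 0 test) keeps equal items in their original relative order, so the sort is stable.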
Those are just two algorithms for sorting a list, but there are a lot of other interesting techniques, each with its own advantages and disadvantages: Quick Sort, Radix Sort, Selection Sort, Heap Sort, Shell Sort, and Bucket Sort.
The internet is overflowing with interesting information about sorting. Here's a good place to start:
http://en.wikipedia.org/wiki/Sorting_algorithms
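Create a console application and paste this into Program.cs as the body of the class. The class has to be declared static, because Sort is an extension method, and the file needs using directives for System and System.Linq.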
public static void Main(string[] args)
{
    string [] strList = "a,b,c,d,e,f,a,a,b".Split(new [] { ',' }, StringSplitOptions.RemoveEmptyEntries);
    foreach(string s in strList.Sort())
        Console.WriteLine(s);
}
public static string [] Sort(this string [] strList)
{
    return strList.OrderBy(i => i).ToArray();
}
Notice that I do use a built in method, OrderBy. As other answers point out there are many different sort algorithms you could implement there and I think my code snippet does everything for you except the actual sort algorithm.
Some C# specific sorting tutorials
There is an entire area of study built around sorting algorithms. You may want to choose a simple one and implement it.
Though it won't be the most performant, it shouldn't take you too long to implement a bubble sort.
If you don't want to use built-in functions, you have to create one yourself. I would recommend bubble sort or some similar algorithm. Bubble sort is not an efficient algorithm, but it gets the job done and is easy to understand.
You will find much good reading on wikipedia.
I would recommend doing a wiki for quicksort.
Still not sure why you don't want to use the built in sort?
Bubble sort damages the brain.
Insertion sort is at least as simple to understand and code, and is actually useful in practice (for very small data sets, and nearly-sorted data). It works like this:
Suppose that the first n items are already in order (you can start with n = 1, since obviously one thing on its own is "in the correct order").
Take the (n+1)th item in your array. Call this the "pivot". Starting with the nth item and working down:
- if it is bigger than the pivot, move it one space to the right (to create a "gap" to the left of it).
- otherwise, leave it in place, put the "pivot" one space to the right of it (that is, in the "gap" if you moved anything, or where it started if you moved nothing), and stop.
Now the first n+1 items in the array are in order, because the pivot is to the right of everything smaller than it, and to the left of everything bigger than it. Since you started with n items in order, that's progress.
Repeat, with n increasing by 1 at each step, until you've processed the whole list.
This corresponds to one way that you might physically put a series of folders into a filing cabinet in order: put one in; then put another one into its correct position by pushing everything that belongs after it over by one space to make room; repeat until finished. Nobody ever sorts physical objects by bubble sort, so it's a mystery to me why it's considered "simple".
All that's left now is that you need to be able to work out, given two strings, whether the first is greater than the second. I'm not quite sure what you mean by "alphabetical and length": alphabetical order is done by comparing one character at a time from each string. If they're not the same, that's your order. If they are the same, look at the next one, unless you're out of characters in one of the strings, in which case that's the one that's "smaller".
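Put together, the insertion sort steps described above might be coded like this. It is a sketch that takes the comparison as a Comparison<string> delegate; the InsertionSorter name and the variable names are made up for the example.

```csharp
using System;

public static class InsertionSorter
{
    // Sorts the array in place. Before each iteration the first n items are in order.
    public static void Sort(string[] items, Comparison<string> compare)
    {
        for (int n = 1; n < items.Length; n++)
        {
            string pivot = items[n];   // the (n+1)th item, to be inserted
            int gap = n;

            // Shift bigger items one space to the right, moving the gap left.
            while (gap > 0 && compare(items[gap - 1], pivot) > 0)
            {
                items[gap] = items[gap - 1];
                gap--;
            }

            // Everything left of the gap is <= pivot, so the pivot belongs here.
            items[gap] = pivot;
        }
    }
}
```

On nearly sorted input the inner loop exits almost immediately, which is why this approach does well on small or almost-ordered data.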
Use NSort
I ran across the NSort library a couple of years ago in the book Windows Developer Power Tools. The NSort library implements a number of sorting algorithms. The main advantage to using something like NSort over writing your own sorting is that it is already tested and optimized.
Posting link to fast string sort code in C#:
http://www.codeproject.com/KB/cs/fast_string_sort.aspx
Another point:
The suggested comparator above is not recommended for non-English languages:
int CompareStrings(String a, String b) {
    if (a < b)
        return -1;
    else if (a > b)
        return 1;
    else
        return 0;
}
Check out this link for non-English language sort:
http://msdn.microsoft.com/en-us/goglobal/bb688122
And as mentioned, use nsort for really gigantic arrays that don't fit in memory.
