More Details:
For this problem, I'm specifically looking for the fastest way to do this, in general and specifically in C#. I don't necessarily mean the "theoretical" or algorithmic fastest; I'm looking for practical implementation speed. In this specific situation, the arrays only have about 1000 elements each, which seems very small, but this computation will run very frequently and compare many arrays (it blows up in size very quickly). I ultimately need the indexes of each element that is different.
I can obviously do a very simple implementation like:
public List<int> FindDifferences(List<double> Original, List<double> NewList)
{
    List<int> Changes = new List<int>();
    for (int i = 0; i < Original.Count; i++)
    {
        if (Original[i] != NewList[i])
        {
            Changes.Add(i);
        }
    }
    return Changes;
}
But from what I can see, this will be really slow overall since it has to iterate once through each item in the list. Is there anything I can do to speed this up? Specifically, is there a way to do something like a parallel foreach that generates a list of the indexes of changes? I saw what I think was a similar question asked before, but I didn't quite understand the answer. Or would there be another way to run the calculation on all items of the list simultaneously (or somehow clustered)?
Assumptions
Each array or list being compared contains data of the same type (double, int, or string), so if array1 holds strings and is compared to array2, I know for certain that array2 will only hold strings and will be the same size (in terms of item count; I can check whether they are the same byte count too if that could come into play).
The vast majority of the items in these comparisons will remain the same. My resultant "differences" list will probably only contain a few(1-10) items, if any.
Concerns
1) After a comparison is made(old and new list in the block above), the new list will overwrite the old list. If computation time is slower than the time it takes to receive a new message(a new list to compare), I can have a problem with collision:
Let's say I have three lists, A, B, and C. A would be my global/"current state" list. When a message is received containing a new list (B), A is the list that B would be compared to.
In an ideal world, A would be compared to B, I would receive a list of integers representing the indexes that contain elements different between the two. After the method computes and returns this index list, A would become B(the values of B overwrite the values of A as my "current state"). When I receive another message(C), this would be compared to my new current state(A, but with the values previously belonging to B), I'd receive the list of differences and C's values would overwrite A's and become the new current state. If the comparison between A and B is still calculating when C is received, I would need to make sure the new calculation either:
Doesn't happen until after A and B's comparison finish and A is overwritten with its new values. or
The comparison is instead made between B and C, with C overwriting A after the comparison finishes(the difference list is fired off elsewhere, so I'd still receive both change lists)
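One way to get the first behaviour (the next comparison waits until the previous one has finished and the state has been overwritten) is to put the compare-and-overwrite step behind a lock. The following is only a sketch: `StateTracker` and `Update` are hypothetical names, and it assumes messages are handed to `Update` in the order they arrive.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical holder for the "current state" list (list A in the text).
public sealed class StateTracker
{
    private readonly object _gate = new object();
    private List<double> _current;

    public StateTracker(List<double> initial)
    {
        _current = new List<double>(initial);
    }

    // Compares the incoming list with the current state, then makes the
    // incoming list the new current state. The lock means a second message
    // cannot start its comparison until this one has finished and swapped.
    public List<int> Update(List<double> incoming)
    {
        lock (_gate)
        {
            if (incoming.Count != _current.Count)
                throw new ArgumentException("Lists must have the same length.");

            var changes = new List<int>();
            for (int i = 0; i < incoming.Count; i++)
            {
                if (_current[i] != incoming[i])
                    changes.Add(i);
            }

            _current = incoming; // swap the reference instead of copying values
            return changes;
        }
    }
}
```

Swapping the reference avoids copying 1000 values per message, but it assumes the caller does not modify the incoming list afterwards.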
2) If this comparison between lists can't be sped up, is there somewhere else I can speed up instead? These messages I'm receiving come as an object with three values, an Ascii-encoded byte array, a long string(the already parsed byte array), and a "type"(the name of the list it corresponds to-so I know the data type of its contents). I currently ignore the byte array and parse the string by splitting it at newline characters.
I know this is inefficient, but I have trouble converting the byte array into ints or doubles. The doubles because it has a lot of "noise"(a value of 1.50 could end up coming in as 1.4976789, so I actually have to round it to get its "real" value). The ints because there is no 0 padding, so I don't know the length to chunk the byte array into. Below is an example of what I'm doing:
public List<string> ListFromString(string request)
{
    List<string> fulllist = request.Split('\n').ToList<string>();
    return fulllist.GetRange(1, fulllist.Count - 2); // There's always a label tacked on the beginning so I start from 1
}

public List<double> RequestListAsDouble(string request)
{
    List<string> RequestAsString = ListFromString(request);
    List<double> RequestListAsDouble = new List<double>();
    foreach (string requestElement in RequestAsString)
    {
        double requestElementAsDouble = Math.Round(Double.Parse(requestElement), 2);
        RequestListAsDouble.Add(requestElementAsDouble);
    }
    return RequestListAsDouble;
}
Your single-threaded comparison of the two parsed lists is probably the fastest way to do it. It is certainly the easiest. As noted by another poster, you can get some speed advantage by pre-allocating the size of the "Changes" list to be some percentage of the size of your input list.
If you want to try parallel thread comparisons, you should set up "N" threads in advance and have them wait for a starting event, where "N" is the number of real processors on your system. Each thread should compare a portion of the lists and write its answers to the interlocked output list "Changes". On completion, the threads go back to sleep, waiting for the next starting event.
When all the threads have gone back to their starting positions, the main thread can pick up the "Changes" and pass it along. Repeat with the next list.
Be sure to clean up all the worker threads when your application is supposed to exit - or it won't exit.
There is a lot of overhead in starting and ending threads. It is all too easy to lose all the processing speed from that overhead. That's why you would want a pool of worker threads already set up and waiting on an event flag. Threads only improve processing speed up to the number of real CPUs in the system.
A small optimization would be to initialize the results list with the capacity of the original:
https://msdn.microsoft.com/en-us/library/4kf43ys3(v=vs.110).aspx
If the size of the collection can be estimated, using the
List(Int32) constructor and specifying the initial capacity
eliminates the need to perform a number of resizing operations while
adding elements to the List.
List<int> Changes = new List<int>(Original.Count);
.NET 4.5.1
I have a "bunch" of Int16 values that fit in a range from -4 to 32760. The numbers in the range are not consecutive, but they are ordered from -4 to 32760. In other words, the numbers from 16-302 are not in the "bunch", but numbers 303-400 are in there, number 2102 is not there, etc.
What is the all-out fastest way to determine if a particular value (eg 18400) is in the "bunch"? Right now it is in an Int16[] and the Linq Contains method is used to determine if a value is in the array, but if anyone can say why/how a different structure would deliver a single value faster I would appreciate it. Speed is the key for this lookup (the "bunch" is a static property on a static class).
Sample code that works
Int16[] someShorts = new[] { (short)4 ,(short) 5 , (short)6};
var isInIt = someShorts.Contains( (short)4 );
I am not sure if that is the most performant thing that can be done.
Thanks.
It sounds like you really want BitArray - just offset the value by 4 so you've got a range of [0, 32764] and you should be fine.
That will allocate an array which is effectively 4K in size (32764 / 8), with one bit per value in the array. It will handle finding the relevant element in the array, and applying bit masking. (I don't know whether it uses a byte[] internally or something else.)
This is a potentially less compact representation than storing ranges, but the only cost involved in getting/setting a bit will be computing an index (basically a shift), getting the relevant bit of memory to the CPU, and then bit masking. It takes 1/8th the size of a bool[], making your CPU cache usage more efficient.
Of course, if this is really a performance bottleneck for you, you should compare both this solution and a bool[] approach in your real application - microbenchmarks aren't nearly as important here as how your real app behaves.
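As a rough sketch of the BitArray idea: the class name `ShortSet` and the sample values are made up for illustration, while the offset of 4 and the upper bound of 32760 come from the question.

```csharp
using System;
using System.Collections;

public static class ShortSet
{
    private const int Offset = 4;                 // shifts -4 to index 0
    private const int Size = 32760 + Offset + 1;  // indexes 0..32764

    // Hypothetical sample values; the real "bunch" would be loaded here.
    private static readonly BitArray Bits = Build(new short[] { -4, 303, 400, 18400 });

    private static BitArray Build(short[] values)
    {
        var bits = new BitArray(Size);            // all bits start as false
        foreach (short v in values)
            bits[v + Offset] = true;
        return bits;
    }

    public static bool Contains(short value)
    {
        int index = value + Offset;
        // The range check keeps out-of-range shorts from throwing.
        return index >= 0 && index < Size && Bits[index];
    }
}
```

A lookup is then `ShortSet.Contains(18400)`. Swapping `BitArray` for `bool[]` gives the other variant worth measuring.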
Make one bool for each possible value:
var isPresentItems = new bool[32760-(-4)+1];
Set the corresponding element to true if the given item is present in the set. Lookup is easy:
var isPresent = isPresentItems[myIndex];
Can't be done any faster. The bools will fit into L1 or L2 cache.
I advise against using BitArray because it stores multiple values per byte. This means that each access is slower. Bit-arithmetic is required.
And if you want insane speed, don't make LINQ call a delegate once for each item. LINQ is not the first choice for performance-critical code. Many indirections that stall the CPU.
If you want to optimize for lookup time, pick a data structure with O(1) (constant-time) lookups. You have several choices since you only care about set membership, and not sorting or ordering.
A HashSet<Int16> will give this to you, as will a BitArray indexed on max - min + 1. The absolute fastest ad-hoc solution would probably be a simple array indexed on max - min + 1, as @usr suggests. Any of these should be plenty "fast enough". The HashSet<Int16> will probably use the most memory, as the size of the internal hash table is an implementation detail. BitArray would be the most space efficient out of these options.
If you only have a single lookup, then memory should not be a concern, and I suggest first going with a HashSet<Int16>. That solution is easy to reason about and deal with in a bug-free manner, as you don't have to worry about staying within array boundaries; you can simply check set.Contains(n). This is particularly useful if your value range might change in the future. You can fall back to one of the other solutions if you need to optimize further for speed or performance.
One option is to use a HashSet. Checking whether a value is in it is an O(1) operation.
The code example:
HashSet<Int16> evenNumbers = new HashSet<Int16>();
// Step by 2 so the set really does hold only even numbers.
for (Int16 i = 0; i < 20; i += 2)
{
    evenNumbers.Add(i);
}
if (evenNumbers.Contains(0))
{
    // the value is in the set
}
Because the numbers are sorted, I would loop through the list one time and generate a list of Range objects that have a start and end number. That list would be much smaller than having a list or dictionary of thousands of numbers.
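A sketch of that range idea follows. It stores the ranges as two parallel arrays instead of Range objects and finds a value by binary search; `RangeSet` and its method names are hypothetical, and the input is assumed to be sorted and free of duplicates.

```csharp
using System;
using System.Collections.Generic;

public static class RangeSet
{
    // Collapses a sorted array of distinct values into parallel arrays
    // holding the first and last value of each consecutive run.
    public static void BuildRanges(short[] sorted, out short[] starts, out short[] ends)
    {
        var s = new List<short>();
        var e = new List<short>();
        for (int i = 0; i < sorted.Length; i++)
        {
            if (i == 0 || sorted[i] != sorted[i - 1] + 1)
            {
                s.Add(sorted[i]);            // a gap: start a new range
                e.Add(sorted[i]);
            }
            else
            {
                e[e.Count - 1] = sorted[i];  // extend the current range
            }
        }
        starts = s.ToArray();
        ends = e.ToArray();
    }

    // Binary search over the ranges: O(log r) for r ranges.
    public static bool Contains(short[] starts, short[] ends, short value)
    {
        int lo = 0, hi = starts.Length - 1;
        while (lo <= hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (value < starts[mid]) hi = mid - 1;
            else if (value > ends[mid]) lo = mid + 1;
            else return true;
        }
        return false;
    }
}
```

This is the most compact of the options discussed, but each lookup takes several comparisons instead of a single array access, so it is worth measuring against the flat-array approaches.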
If your "bunch" of numbers can be identified as a series of intervals, I suggest you use Interval Trees. An interval tree allows dynamic insertions and deletions, and checking whether an interval intersects any interval in the tree is O(log(n)), where n is the number of intervals in the tree. In your case the number of intervals would be far smaller than the number of ints, so the search is much faster.
I apologize if there is a similar question already out there. There
are several questions about scoring hands but I don't need that.
The project I am working on takes in 10 cards and needs to report the
best possible 5-card hand found ("straight", "high card", "flush"
etc.). Luckily what the actual hand of cards is is irrelevant, I just
need a name.
I've already parsed and sorted all the cards out and have the tests
for all the possible hands laid out. All I need now is a convenient
way to store the hands. My mad method is as follows, in pseudocode
terms:
I want to have a dynamic list horizontally that I can populate with the NUMBER values of the cards, in order from highest to lowest. For
example, "Q J T 7 4 2 1". T is 10. Duplicates of values will be
ignored. Next, I want each of those values to have, underneath, a
list of the suits of each value that exist in the deck. For example,
J will have a sub-list with the values "D H" to represent that I have
a Jack of Diamonds and a Jack of Hearts.
I believe this to be the most elegant way to deal with these cards,
since most poker hands deal with only values and this way I don't have
to worry about cards of the same value in a row for say the straight
test. Then the two tests that do deal with suit can easily be
tested for by referring to the values under the keys.
Take a deep breath, almost there.
So an instance of Lookup appears to be perfect! It has the exact "one
key to multiple values" structure that I want. However, it doesn't
allow me to add the suits as I come to them. I have to add them all
at once or not at all since the lists are immutable after entry.
So I either
have to find all the suits at once before I even make the Lookup
Somehow add values to the Lookup lists or
Use something else.
Any ideas on any of these?
UPDATE
TL;DR SPARKNOTES VERSION: How can I add more values to the keys inside of a Lookup?
*IMPORTANT NOTE:* The output of this program should be a string containing the name of the highest hand possible, for example "four of a kind", "two pair" or "high card."
I found one solution (which I unfortunately lost the link to and can't find again) where they suggested re-creating the entire Lookup with the new list. It may just be me but I find that solution to be very... ugly... Anyway several other solutions I have explored or tested are to:
METHOD 1
Roll through and populate another array with the suits associated with each value. Basically (in actual pseudocode this time >_>):
Create an array of ArrayLists (array 1)
Iterate through a sorted string array of "cards" (array 2)
For each card:
Take the char at string index [1] (representing the suit) and add into the ArrayList in array 1 at the index number extracted from string index [0].
This way I have the list of values with associated suits that I wanted. And the list of suits is the minimal size to boot, making iteration through that easier later. With some extra steps I can even make the umbrella array an ArrayList and populate it with the card values in order so there are no gaps and no duplicate numbers. This will leave me with a jagged array of what I want. To be clear, this is not a homework assignment. However, it IS from a coding class project my roommate completed in the past which is why I have the constrictions and requirements I have. Someone else I asked told me SE gets plagued by these kinds of homework questions around this time, so I understand your skepticism. This is a personal project because I want to learn C# (all I know is Java right now, and I like the sound of parameter pointer passing in C# methods, which Java does not do).
If it WERE for a grade I would end there because it works. But I don't really like arraylists of arraylists, they seem messy to me. So I want to know if there is another method.
METHOD 2
I also considered simply dealing with duplicates that inevitably appear. For example, here is my test for a straight:
for (int i = 0; i < 5; i++)
{
    int counter = 0;
    for (int j = i; j < i + 4; j++)
    {
        int secondCard = getValue(cardsArray[j + 1]);
        int firstCard = getValue(cardsArray[j]);
        if (secondCard == firstCard)
        {
            break;
        }
        if (secondCard == (firstCard + 1))
        {
            counter++;
            if (counter == 4)
            {
                isStraight = true;
                return "straight";
            }
        }
    }
}
This code does not work. It needs some tweaks somewhere or other to work completely but I want to analyze if it is worth it before I try to fix it. It DOES accurately test for a straight, though. Also a couple notes: firstCard and secondCard are there for readability and debug purposes, and isStraight is there so that I don't reinvent the wheel later when I test for a straight flush.
This nested loop will iterate through all the cards up till the 5th card (since you can't have a straight out of ten sorted cards with less than 5 cards) and then check the next five cards as you would expect. If during this iteration I encounter a duplicate entry it means that it's the same card of another suit and I simply "break". What SHOULD happen as a result of this one statement is that now we have incremented our second iteration by one to check the next card instead of the current one. The count of in-order cards that we have will stay the same so that a list like " 1D 2D 3S 3H 4D 5C" will skip over the second 3 when finding the straight. Despite the break I was actually quite pleased with the elegance of this solution, whether I had a right to be or not.
It all goes back to the flaws of using a simple array of strings ("cards"), which is what my code is tailored to right now. And I hate fixing issues, I'd rather avoid them. Maybe I'm being unnecessarily picky but I'm learning along the way.
METHOD 3
My consideration of the weaknesses of an array of strings led me to Dictionaries, which looked attractive. It can easily be made to hold my values in order, and easy to find if I have a certain suit for a key (TryGet), all in a neat, tailor-made package. Creating multiple array lists and doing things like "(find index of my value); array1[index].Add(value)" would be replaced by "Dictionary.Add(value, suit)". But I can only add a suit to a key at the point of creation. I couldn't make a "2" key and add "S" and then when I find out the next card is a "2D" add a D under the "2" key. Dictionary just doesn't support that, or even adding multiple values at all. I can make a dictionary of lists, but I still can't edit the list since Dictionaries are mostly query data structures. Lookups support multiple values per key but still cannot be changed after the initial "Add()". Again I could "re-create" the entire lookup or dictionary to add a suit and keep everything in order. But to me that seems like rebuilding the whole bridge because this one cable is too long and I don't have an industrial cable cutter. It's a problem that SHOULD have an easier solution, like maybe go and GET some cutters (import a class maybe?).
CONCLUSION
Your suggestion that my needs are no different from what a hand scoring system could deliver leads me to another question:
Are hand scores directly tied to certain hands? Like I mentioned earlier the result I want is "The best hand you can make is a full house" not "This player has the highest hand." So can I calculate the highest scoring hand and extrapolate a "full house" from that score? If so then I guess this is all unnecessary code, but I would kind of like to solve this anyway in that case.
As I wrote this edit it dawned on me that this is basically a vanity issue. I don't "like" the solution I have. I also don't want to use the accepted solution (table lookup) because that is not a coding project; that is a copy-paste project. I would greatly appreciate any input.
Let's do this the simplest way possible. First, say you have an array of 10 strings, each of which is the name of a card. Like "Four of Hearts" and "Queen of Spades". That's really inconvenient to work with. So the first thing we do is convert those strings to numbers to represent each card. A very convenient way to do it is to use numbers 0-12 for hearts, 13-25 for diamonds, etc. So you have code (possibly a lookup table) that converts names to numbers:
Ace of Hearts = 0
Two of Hearts = 1
Three of Hearts = 2
...
Queen of Hearts = 11
King of Hearts = 12
Ace of Diamonds = 13
...
...
Ace of Clubs = 26
...
...
King of Spades = 51
So you have an array of numbers that represents the 10 cards. Call it cardsArray:
int[] cardsArray = new int[10];
// here, fill the cards array from the input
It's easy to check for flushes if you sort by suit and value. Remember, there are only 10 cards, so sorting isn't going to take a huge amount of time. The sort is really easy:
int[] sortedBySuit =
    cardsArray
        .OrderBy(x => x / 13)           // sorts by suit
        .ThenByDescending(x => x % 13)  // then by value, descending
        .ToArray();
You can then go sequentially through the array and determine if you have a flush, straight flush, and what the high card in the flush (if any) is.
You have to save that information, because four-of-a-kind beats a flush, for example. So you need to check that, too.
Next, sort by value:
int[] sortedByValue =
    cardsArray
        .OrderByDescending(x => x % 13)
        .ToArray();
Now you can go sequentially through that list to determine high card, pairs, three of a kind, four of a kind, or straights. As you find each type of hand, you save that hand information ("king high straight" or "three tens", along with the hand's value [1 for high card, 2 for pair, straight, flush, full house, etc. in the proper order]) to a list.
Then you just pick the hand with the highest value from those that you found.
That's definitely not the fastest way to do things, but it's simple, uses very little memory, and is fast enough for a prototype. It's certainly simpler than using a dictionary or array of arrays, etc.
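To make the straight check concrete, here is one possible sketch using the 0-51 encoding above (value = card % 13, Ace = 0). It marks which values are present rather than walking the sorted array, so duplicate values of different suits need no special handling. `HandChecks` is a hypothetical name, and letting the Ace play both low and high is an assumption about the rules.

```csharp
using System;

public static class HandChecks
{
    // True if the cards contain five consecutive values.
    public static bool HasStraight(int[] cards)
    {
        // Slot 13 mirrors the Ace so that 10-J-Q-K-A also counts
        // (assumption: the Ace may be used low or high).
        var present = new bool[14];
        foreach (int card in cards)
            present[card % 13] = true;
        present[13] = present[0];

        int run = 0; // length of the current run of consecutive values
        for (int v = 0; v < present.Length; v++)
        {
            run = present[v] ? run + 1 : 0;
            if (run >= 5) return true;
        }
        return false;
    }
}
```

The scan stops at slot 13, so a wrap-around such as Q-K-A-2-3 is not counted as a straight.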
To be clear, I didn't read your novel.
TL;DR SPARKNOTES VERSION: How can I add more values to the keys inside of a Lookup?
Usually, when I need a "dictionary" with a key that has multiple values, I use a List<KeyValuePair<string, int>>. You could use LINQ to Objects to select all the values. For example:
static void StackOverflowExample()
{
    var cardList = new List<KeyValuePair<string, int>>()
    {
        new KeyValuePair<string, int>("Club", 8),
        new KeyValuePair<string, int>("Spade", 9),
        new KeyValuePair<string, int>("Heart", 10)
    };
    var results = cardList.Where(p => p.Key == "Heart");
}
var results is an IEnumerable<KeyValuePair<string,int>>. Hopefully, this helps.
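Another option, sketched here as an alternative rather than as what this answer proposes, is a Dictionary whose values are lists. The dictionary itself maps one key to one value, but that value can be a `List<char>` that keeps growing as cards are read. The card format ("JD" = Jack of Diamonds) follows the question; `CardIndex` and `GroupSuitsByValue` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class CardIndex
{
    // Groups cards such as "JD" or "2S" by their value character,
    // collecting the suits seen for each value as they arrive.
    public static Dictionary<char, List<char>> GroupSuitsByValue(IEnumerable<string> cards)
    {
        var suitsByValue = new Dictionary<char, List<char>>();
        foreach (string card in cards)
        {
            char value = card[0];
            char suit = card[1];

            List<char> suits;
            if (!suitsByValue.TryGetValue(value, out suits))
            {
                // First card with this value: create its suit list.
                suits = new List<char>();
                suitsByValue.Add(value, suits);
            }
            suits.Add(suit); // later cards just append to the existing list
        }
        return suitsByValue;
    }
}
```

A plain Dictionary does not keep its keys in sorted order; if ordering by card value matters, a SortedDictionary with a suitable key type could be used instead.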
I have built an application that is used to simulate the number of products that a company can produce in different "modes" per month. This simulation is used to aid in finding the optimal series of modes to run in for a month to best meet the projected sales forecast for the month. This application has been working well, until recently when the plant was modified to run in additional modes. It is now possible to run in 16 modes. For a month with 22 work days this yields 9,364,199,760 possible combinations. This is up from 8 modes in the past that would have yielded a mere 1,560,780 possible combinations. The PC that runs this application is on the old side and cannot handle the number of calculations before an out of memory exception is thrown. In fact the entire application cannot support more than 15 modes because it uses integers to track the number of modes and it exceeds the upper limit for an integer. Barring that issue, I need to do what I can to reduce the memory utilization of the application and optimize this to run as efficiently as possible even if it cannot achieve the stated goal of 16 modes. I was considering writing the data to disk rather than storing the list in memory, but before I take on that overhead, I would like to get people's opinion on the method to see if there is any room for optimization there.
EDIT
Based on a suggestion by a few people to consider something more academic than merely calculating every possible answer, listed below is a brief explanation of how the optimal run (combination of modes) is chosen.
Currently the computer determines every possible way that the plant can run for the number of work days that month. For example, 3 modes for a max of 2 work days would result in the combinations (where the number represents the mode chosen) of (1,1), (1,2), (1,3), (2,2), (2,3), (3,3). In each mode a product is produced at a different rate; for example, in mode 1, product x may be produced at 50 units per hour, product y at 30 units per hour, and product z at 0 units per hour. Each combination is then multiplied by work hours and production rates. The run that produces numbers that most closely match the forecasted value for each product for the month is chosen. However, because some months the plant does not meet the forecasted value for a product, the algorithm increases the priority of a product for the next month to ensure that at the end of the year the product has met the forecasted value. Since warehouse space is tight, it is important that products not overproduce too much either.
Thank you
private List<List<int>> _modeIterations = new List<List<int>>();

private void CalculateCombinations(int modes, int workDays, string combinationValues)
{
    List<int> _tempList = new List<int>();
    if (modes == 1)
    {
        combinationValues += Convert.ToString(workDays);
        string[] _combinations = combinationValues.Split(',');
        foreach (string _number in _combinations)
        {
            _tempList.Add(Convert.ToInt32(_number));
        }
        _modeIterations.Add(_tempList);
    }
    else
    {
        for (int i = workDays + 1; --i >= 0; )
        {
            CalculateCombinations(modes - 1, workDays - i, combinationValues + i + ",");
        }
    }
}
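The counts quoted above can be checked without generating anything. Splitting the work days among the modes when order does not matter gives C(workDays + modes - 1, modes - 1) possibilities. The sketch below uses a hypothetical `CombinationCount` helper and returns a `long`, since the 16-mode figure no longer fits in an `int`.

```csharp
using System;

public static class CombinationCount
{
    // Number of ways to split workDays among modes when order does not
    // matter: C(workDays + modes - 1, modes - 1).
    public static long Count(int modes, int workDays)
    {
        int n = workDays + modes - 1;
        int k = Math.Min(modes - 1, workDays); // C(n, k) == C(n, n - k)
        long result = 1;
        for (int i = 1; i <= k; i++)
        {
            // After this step result equals C(n - k + i, i), so the
            // division is always exact.
            result = result * (n - k + i) / i;
        }
        return result;
    }
}
```

`Count(3, 2)` gives the 6 combinations listed in the example, and `Count(8, 22)` and `Count(16, 22)` give the 1,560,780 and 9,364,199,760 figures quoted for 8 and 16 modes.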
This kind of optimization problem is difficult but extremely well-studied. You should probably read up in the literature on it rather than trying to re-invent the wheel. The keywords you want to look for are "operations research" and "combinatorial optimization problem".
It is well known in the study of optimization problems that finding the optimal solution to a problem is almost always computationally infeasible as the problem grows large, as you have discovered for yourself. However, it is frequently the case that finding a solution guaranteed to be within a certain percentage of the optimal solution is feasible. You should probably concentrate on finding approximate solutions. (After all, your sales targets are already just educated guesses; therefore finding the optimal solution is already going to be impossible, because you haven't got complete information.)
What I would do is start by reading the wikipedia page on the Knapsack Problem:
http://en.wikipedia.org/wiki/Knapsack_problem
This is the problem of "I've got a whole bunch of items of different values and different weights, I can carry 50 pounds in my knapsack, what is the largest possible value I can carry while meeting my weight goal?"
This isn't exactly your problem, but clearly it is related -- you've got a certain amount of "value" to maximize, and a limited number of slots to pack that value into. If you can start to understand how people find near-optimal solutions to the knapsack problem, you can apply that to your specific problem.
You could process the permutation as soon as you have generated it, instead of collecting them all in a list first:
public delegate void Processor(List<int> args);

private void CalculateCombinations(int modes, int workDays, string combinationValues, Processor processor)
{
    if (modes == 1)
    {
        List<int> _tempList = new List<int>();
        combinationValues += Convert.ToString(workDays);
        string[] _combinations = combinationValues.Split(',');
        foreach (string _number in _combinations)
        {
            _tempList.Add(Convert.ToInt32(_number));
        }
        processor.Invoke(_tempList);
    }
    else
    {
        for (int i = workDays + 1; --i >= 0; )
        {
            CalculateCombinations(modes - 1, workDays - i, combinationValues + i + ",", processor);
        }
    }
}
I am assuming here, that your current pattern of work is something along the lines
CalculateCombinations(initial_value_1, initial_value_2, initial_value_3);
foreach( List<int> list in _modeIterations ) {
... process the list ...
}
With the direct-process-approach, this would be
private void ProcessPermutation(List<int> args)
{
... process ...
}
... somewhere else ...
CalculateCombinations(initial_value_1, initial_value_2, initial_value_3, ProcessPermutation);
I would also suggest that you try to prune the search tree as early as possible. If you can already tell that certain combinations of the arguments will never yield something that can be processed, you should catch those during generation and avoid the recursion altogether, if this is possible.
In new versions of C#, generation of the combinations using an iterator (?) function might be usable to retain the original structure of your code. I haven't really used this feature (yield) as of yet, so I cannot comment on it.
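For what it is worth, an iterator version might look like the sketch below. It is not the original code: `ModeCombinations` and `Generate` are hypothetical names, and it fills an int array directly instead of building and re-parsing a comma-separated string. It visits the combinations in the same order as the recursive method above.

```csharp
using System;
using System.Collections.Generic;

public static class ModeCombinations
{
    // Lazily yields every way to split workDays across modes. Element m of
    // each result is the number of days spent in mode m. The same buffer is
    // reused for every result, so callers must copy it if they need to keep
    // a result after moving on to the next one.
    public static IEnumerable<int[]> Generate(int modes, int workDays)
    {
        var buffer = new int[modes];
        return Fill(buffer, 0, workDays);
    }

    private static IEnumerable<int[]> Fill(int[] buffer, int mode, int remaining)
    {
        if (mode == buffer.Length - 1)
        {
            buffer[mode] = remaining; // the last mode takes whatever is left
            yield return buffer;
            yield break;
        }

        for (int days = remaining; days >= 0; days--)
        {
            buffer[mode] = days;
            foreach (int[] result in Fill(buffer, mode + 1, remaining - days))
                yield return result;
        }
    }
}
```

A caller would write `foreach (int[] combo in ModeCombinations.Generate(modes, workDays)) { ... }` and evaluate each combination as it arrives, so only one combination is held in memory at a time. The nested iterators add some per-item overhead, which would need measuring.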
The problem lies more in the brute-force approach than in the code itself. It's possible that brute force might be the only way to approach the problem, but I doubt it. Chess, for example, cannot be solved by brute force, but computers play it quite well using heuristics to discard the less promising approaches and focus on the good ones. Maybe you should take a similar approach.
On the other hand we need to know how each "mode" is evaluated in order to suggest any heuristics. In your code you're only computing all possible combinations which, anyway, will not scale if the modes go up to 32... even if you store it on disk.
if (modes == 1)
{
    List<int> _tempList = new List<int>();
    combinationValues += Convert.ToString(workDays);
    string[] _combinations = combinationValues.Split(',');
    foreach (string _number in _combinations)
    {
        _tempList.Add(Convert.ToInt32(_number));
    }
    processor.Invoke(_tempList);
}
Everything in this block of code is executed over and over again, so no line in that code should make use of memory without freeing it. The most obvious place to avoid memory craziness is to write out combinationValues to disk as it is processed (i.e. use a FileStream, not a string). I think that in general, doing string concatenation the way you are doing here is bad, since every concatenation allocates a new string. At least use a StringBuilder (see "Back to Basics", which discusses the same issue in terms of C). There may be other places with issues, though. The simplest way to figure out why you are getting an out of memory error may be to use a memory profiler (download link from download.microsoft.com).
By the way, my tendency with code like this is to have a global List object that is Clear()ed rather than having a temporary one that is created over and over again.
I would replace the List objects with my own class that uses preallocated arrays to hold the ints. I'm not really sure about this right now, but I believe that each integer in a List is boxed, which means much more memory is used than with a simple array of ints.
Edit: On the other hand it seems I am mistaken: Which one is more efficient : List<int> or int[]
I have a list of input words separated by commas. I want to sort these words alphabetically and by length. How can I do this without using the built-in sorting functions?
Good question!! Sorting is probably the most important concept to learn as an up-and-coming computer scientist.
There are actually lots of different algorithms for sorting a list.
When you break all of those algorithms down, the most fundamental operation is the comparison of two items in the list, defining their "natural order".
For example, in order to sort a list of integers, I'd need a function that tells me, given any two integers X and Y whether X is less than, equal to, or greater than Y.
For your strings, you'll need the same thing: a function that tells you which of the strings has the "lesser" or "greater" value, or whether they're equal.
Traditionally, these "comparator" functions look something like this:
int CompareStrings(String a, String b) {
    if (a < b)
        return -1;
    else if (a > b)
        return 1;
    else
        return 0;
}
I've left out some of the details (like, how do you compute whether a is less than or greater than b? clue: iterate through the characters), but that's the basic skeleton of any comparison function. It returns a value less than zero if the first element is smaller and a value greater than zero if the first element is greater, returning zero if the elements have equal value.
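Filling in that detail, one possible C# version is sketched below. It compares raw character codes (so uppercase letters sort before lowercase ones) and treats a shorter string as smaller when it is a prefix of the longer one. Both choices are assumptions; `WordComparer` is a hypothetical name.

```csharp
using System;

public static class WordComparer
{
    // Returns a negative number if a sorts before b, a positive number if
    // a sorts after b, and zero if the two strings are identical.
    public static int Compare(string a, string b)
    {
        int shared = Math.Min(a.Length, b.Length);
        for (int i = 0; i < shared; i++)
        {
            // The first differing character decides the order.
            if (a[i] != b[i])
                return a[i] < b[i] ? -1 : 1;
        }

        // All shared characters match, so the shorter string sorts first.
        if (a.Length == b.Length) return 0;
        return a.Length < b.Length ? -1 : 1;
    }
}
```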
But what does that have to do with sorting?
A sort routine will call that function for pairs of elements in your list, using the result of the function to figure out how to rearrange the items into a sorted list. The comparison function defines the "natural order", and the "sorting algorithm" defines the logic for calling and responding to the results of the comparison function.
Each algorithm is like a big-picture strategy for guaranteeing that ANY input will be correctly sorted. Here are a few of the algorithms that you'll probably want to know about:
Bubble Sort:
Iterate through the list, calling the comparison function for all adjacent pairs of elements. Whenever you get a result greater than zero (meaning that the first element is larger than the second one), swap the two values. Then move on to the next pair. When you get to the end of the list, if you didn't have to swap ANY pairs, then congratulations, the list is sorted! If you DID have to perform any swaps, go back to the beginning and start over. Repeat this process until there are no more swaps.
NOTE: this is usually not a very efficient way to sort a list, because in the worst cases, it might require you to scan the whole list as many as N times, for a list with N elements. That works out to roughly N*N comparisons.
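The description above could be sketched roughly like this. The class and method names are placeholders, and the comparison is passed in as a Comparison<string> delegate so that any comparator with the "negative, zero, positive" contract can be plugged in.

```csharp
using System;

public static class BubbleSortSketch
{
    // Sorts the array in place by repeatedly swapping adjacent pairs
    // that are out of order, until a full pass makes no swaps.
    public static void BubbleSort(string[] items, Comparison<string> compare)
    {
        bool swapped;
        do
        {
            swapped = false;
            for (int i = 0; i + 1 < items.Length; i++)
            {
                // A result greater than zero means the first element is larger.
                if (compare(items[i], items[i + 1]) > 0)
                {
                    string tmp = items[i];
                    items[i] = items[i + 1];
                    items[i + 1] = tmp;
                    swapped = true;
                }
            }
        } while (swapped); // a pass with no swaps means the list is sorted
    }
}
```

A call might look like BubbleSortSketch.BubbleSort(words, string.CompareOrdinal), where words is a string array you want sorted in place.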
Merge Sort:
This is one of the most popular divide-and-conquer algorithms for sorting a list. The basic idea is that, if you have two already-sorted lists, it's easy to merge them. Just start from the beginning of each list and move the first element of whichever list has the smaller starting value to the end of the output list. Repeat this process until you've consumed all the items from both lists, and then you're done!
1 4 8 10
2 5 7 9
------------ becomes ------------>
1 2 4 5 7 8 9 10
But what if you don't have two sorted lists? What if you have just one list, and its elements are in random order?
That's the clever thing about merge sort. You can break any single list into smaller pieces, each of which is either an unsorted list, a sorted list, or a single element (which, if you think about it, is actually a sorted list, with length = 1).
So the first step in a merge sort algorithm is to divide your overall list into smaller and smaller sub lists. At the tiniest levels (where each list only has one or two elements), they're very easy to sort. And once sorted, it's easy to merge any two adjacent sorted lists into a larger sorted list containing all the elements of the two sub lists.
NOTE: This algorithm is much better than the bubble sort method, described above, in terms of its worst-case-scenario efficiency. I won't go into a detailed explanation (which involves some fairly trivial math, but would take some time to explain), but the quick reason for the increased efficiency is that this algorithm breaks its problem into ideal-sized chunks and then merges the results of those chunks. The bubble sort algorithm tackles the whole thing at once, so it doesn't get the benefit of "divide-and-conquer".
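As a rough illustration of the split-then-merge idea, here is one way it might be written. This is a simple recursive sketch that allocates new lists at every level rather than an optimized implementation, and the names MergeSortSketch, MergeSort and Merge are invented for the example.

```csharp
using System;
using System.Collections.Generic;

public static class MergeSortSketch
{
    // Returns a new sorted list; the input list is left unchanged.
    public static List<string> MergeSort(List<string> items, Comparison<string> compare)
    {
        // A list with zero or one element is already sorted.
        if (items.Count <= 1)
            return new List<string>(items);

        int mid = items.Count / 2;
        List<string> left = MergeSort(items.GetRange(0, mid), compare);
        List<string> right = MergeSort(items.GetRange(mid, items.Count - mid), compare);
        return Merge(left, right, compare);
    }

    // Merges two already-sorted lists by repeatedly taking the smaller front item.
    public static List<string> Merge(List<string> left, List<string> right, Comparison<string> compare)
    {
        var result = new List<string>(left.Count + right.Count);
        int i = 0, j = 0;
        while (i < left.Count && j < right.Count)
        {
            if (compare(left[i], right[j]) <= 0)
                result.Add(left[i++]);
            else
                result.Add(right[j++]);
        }
        // One list is used up; whatever remains in the other is already in order.
        while (i < left.Count) result.Add(left[i++]);
        while (j < right.Count) result.Add(right[j++]);
        return result;
    }
}
```

Taking from the left list when the two front items compare as equal keeps equal items in their original relative order.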
Those are just two algorithms for sorting a list, but there are a lot of other interesting techniques, each with its own advantages and disadvantages: Quick Sort, Radix Sort, Selection Sort, Heap Sort, Shell Sort, and Bucket Sort.
The internet is overflowing with interesting information about sorting. Here's a good place to start:
http://en.wikipedia.org/wiki/Sorting_algorithms
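Create a console application and paste this into Program.cs as the body of the class. Because Sort is written as an extension method, the enclosing class must be declared static (and must not be nested inside another class), and the file needs using System; and using System.Linq; at the top.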
public static void Main(string[] args)
{
string [] strList = "a,b,c,d,e,f,a,a,b".Split(new [] { ',' }, StringSplitOptions.RemoveEmptyEntries);
foreach(string s in strList.Sort())
Console.WriteLine(s);
}
public static string [] Sort(this string [] strList)
{
return strList.OrderBy(i => i).ToArray();
}
Notice that I do use a built-in method, OrderBy. As other answers point out, there are many different sort algorithms you could implement there, and I think my code snippet does everything for you except the actual sort algorithm.
Some C# specific sorting tutorials
There is an entire area of study built around sorting algorithms. You may want to choose a simple one and implement it.
Though it won't be the most performant, it shouldn't take you too long to implement a bubble sort.
If you don't want to use built-in functions, you have to create one yourself. I would recommend bubble sort or some similar algorithm. Bubble sort is not an efficient algorithm, but it gets the job done and is easy to understand.
You will find plenty of good reading on Wikipedia.
I would recommend looking up quicksort on Wikipedia.
Still not sure why you don't want to use the built in sort?
Bubble sort damages the brain.
Insertion sort is at least as simple to understand and code, and is actually useful in practice (for very small data sets, and nearly-sorted data). It works like this:
Suppose that the first n items are already in order (you can start with n = 1, since obviously one thing on its own is "in the correct order").
Take the (n+1)th item in your array. Call this the "pivot". Starting with the nth item and working down:
- if it is bigger than the pivot, move it one space to the right (to create a "gap" to the left of it).
- otherwise, leave it in place, put the "pivot" one space to the right of it (that is, in the "gap" if you moved anything, or where it started if you moved nothing), and stop.
Now the first n+1 items in the array are in order, because the pivot is to the right of everything smaller than it, and to the left of everything bigger than it. Since you started with n items in order, that's progress.
Repeat, with n increasing by 1 at each step, until you've processed the whole list.
This corresponds to one way that you might physically put a series of folders into a filing cabinet in order: put one in; then put another one into its correct position by pushing everything that belongs after it over by one space to make room; repeat until finished. Nobody ever sorts physical objects by bubble sort, so it's a mystery to me why it's considered "simple".
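The steps above might translate into code along these lines. It is only a sketch: the names are placeholders, the variable called pivot follows the wording used above, and the comparison is passed in as a delegate.

```csharp
using System;

public static class InsertionSortSketch
{
    // Sorts the array in place. At the start of each outer iteration,
    // the first n items are already in order.
    public static void InsertionSort(string[] items, Comparison<string> compare)
    {
        for (int n = 1; n < items.Length; n++)
        {
            string pivot = items[n]; // the (n+1)th item
            int gap = n;             // index of the current empty slot

            // Working down from the nth item: move anything bigger than
            // the pivot one space to the right, which moves the gap left.
            while (gap > 0 && compare(items[gap - 1], pivot) > 0)
            {
                items[gap] = items[gap - 1];
                gap--;
            }

            // Drop the pivot into the gap (or back where it started).
            items[gap] = pivot;
        }
    }
}
```

On nearly-sorted input the inner loop exits almost immediately for most items, which is why this approach does well on that kind of data.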
All that's left now is that you need to be able to work out, given two strings, whether the first is greater than the second. I'm not quite sure what you mean by "alphabetical and length": alphabetical order is done by comparing one character at a time from each string. If they're not the same, that's your order. If they are the same, look at the next one, unless you're out of characters in one of the strings, in which case that's the one that's "smaller".
Use NSort
I ran across the NSort library a couple of years ago in the book Windows Developer Power Tools. The NSort library implements a number of sorting algorithms. The main advantage to using something like NSort over writing your own sorting is that it is already tested and optimized.
Posting link to fast string sort code in C#:
http://www.codeproject.com/KB/cs/fast_string_sort.aspx
Another point:
The suggested comparator above is not recommended for non-English languages:
int CompareStrings(String a, String b) {
if (a < b)
return -1;
else if (a > b)
return 1;
else
return 0;
}
Check out this link for sorting in non-English languages:
http://msdn.microsoft.com/en-us/goglobal/bb688122
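To make the difference concrete, here is a small sketch that sorts the same array two ways using the framework's StringComparer: once by raw character values (ordinal) and once using the rules of a named culture. The class name, method names and any culture name you pass in (for example "sv-SE") are illustrative assumptions, and the culture-aware result depends on the globalization data available on the machine.

```csharp
using System;
using System.Globalization;

public static class CultureSortSketch
{
    // Sorts a copy of the array by raw char values, ignoring language rules.
    public static string[] SortOrdinal(string[] items)
    {
        var copy = (string[])items.Clone();
        Array.Sort(copy, StringComparer.Ordinal);
        return copy;
    }

    // Sorts a copy of the array using the collation rules of the given culture.
    // cultureName is a standard culture name such as "sv-SE" (assumed to be
    // available on the machine running the code).
    public static string[] SortForCulture(string[] items, string cultureName)
    {
        var copy = (string[])items.Clone();
        StringComparer comparer = StringComparer.Create(new CultureInfo(cultureName), false);
        Array.Sort(copy, comparer);
        return copy;
    }
}
```

With the ordinal version, "Zebra" sorts before "apple" because every uppercase ASCII letter has a smaller character value than every lowercase one, which is rarely what a human reader expects.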
And as mentioned, use NSort for really gigantic arrays that don't fit in memory.