How to search through words or numbers quickly in C#

I'm sure something like this exists, but I don't know what it would be called (or how to find more info on it). If I have an alphabetically sorted list of words, and I'm checking to see if and where the word "test" is in that list, it doesn't make sense to start at the beginning, but to start in the T's, right? And the same for numbers, of course. Is there a way to implement something like this and tailor the start of the search? Or do hash sets and methods like Contains already do this by themselves?
EDIT:
For example, if I have a list of integers like {1,2,3,5,7,8,9,23..}, is there any automatic way to organize it so that when I check the list for the element "9", the search doesn't begin from the beginning...?
Sorry, this is a simple example, but I do intend to search thousands of times through a list that potentially contains thousands of elements.
EDIT 2:
From the replies, I learned about binary search, but since that apparently starts in the middle of your list, is it possible to implement something manually along the lines of, for example, splitting a list of words into 26 bins, so that when you search for a particular word, it can immediately start searching in the best place (or maybe 52 bins if each bin starts to become overpopulated)?

When you say you have a sorted list and you want to search it, the algorithm that immediately jumps to my mind is a binary search. Fortunately List<T> already has that implemented.
The example at that link actually appears to do exactly what you want (it deals with finding a word in a sorted list of words too).
In essence, you want something like this:
List<string> words = ...;
words.Sort(); // or not, depending on the source
var index = words.BinarySearch("word");
if (index > -1)
{
    // word was found, and its index is stored in index
}
else // you may or may not want this part
{
    // this will insert the word into the list, so that you don't have to re-sort it
    words.Insert(~index, "word");
}
This, of course, also works with ints. Simply replace List<string> with List<int> and your BinarySearch argument with an int.
Most Contains-type functions simply loop through the collection until coming across the item you're looking for. That works great in that you don't have to sort the collection first, but it's not so nice when you start off with it sorted. So in most cases, if you're searching the same list a lot, sort it and BinarySearch it, but if you're modifying the list a lot and only searching once or twice, a regular IndexOf or Contains will likely be your best bet.
If you're looking to group words by their first letter, I would probably use a Dictionary<char, List<string>> to store them. I chose List over an array for the purposes of mutability, so make that call on your own--there's also Array.BinarySearch if you choose to use an array. You could get into a proprietary tree model, but that may or may not be overkill. To do the dictionary keyed by first character, you'll want something like this:
Dictionary<char, List<string>> GetDict(IEnumerable<string> args)
{
    return args.GroupBy(c => c[0]).ToDictionary(c => c.Key, c => c.OrderBy(x => x).ToList());
}
Then you can use it pretty simply, similarly to before. The only change would reside in the fetching of your list.
Dictionary<char, List<string>> wordsByKey = GetDict(words);
List<string> keyed;
string word = "word";
if (wordsByKey.TryGetValue(word[0], out keyed))
{
    // same as before
}
else
{
    // or not, again, depending on whether you want the list to update
    wordsByKey.Add(word[0], new List<string>() { word });
}
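Spelled out, the "// same as before" branch is just the earlier binary search, now run against the single bucket (a minimal sketch):
// inside the TryGetValue branch: keyed is the bucket for word[0],
// already sorted because GetDict built it with OrderBy
var index = keyed.BinarySearch(word);
if (index > -1)
{
    // word found at keyed[index]
}
else
{
    keyed.Insert(~index, word); // keeps the bucket sorted
}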

When the list is sorted, you are looking for BinarySearch: http://msdn.microsoft.com/pl-pl/library/3f90y839%28v=vs.110%29.aspx. The complexity is O(log n) versus O(n) for a simple Contains.
List<string> myList = GetList();
string elementToSearch = "test";
if (myList.Contains(elementToSearch))
{
    // found, O(n), works on unsorted list
}
if (myList.BinarySearch(elementToSearch) >= 0)
{
    // found, O(log n), works only on sorted list
}
To clarify: What is the difference between linear search and binary search?
To your edit:
If your input collection is not sorted, you should use Contains or IndexOf because of the O(n) time mentioned above. It will loop over your collection once. Sorting the collection first is less efficient - it takes O(n log n). So it's not efficient to sort it just to search for one element.
Here is a sample to illustrate the performance:
var r = new Random();
var list = new List<int>();
for (var i = 1; i < 10000000; i++)
{
    list.Add(r.Next());
}

// O(log n) - we assume that the list is sorted, so sorting is performed outside the watch
var sortedList = new List<int>(list);
sortedList.Sort();
var elementToSearch = sortedList.Last();
var watcher = new Stopwatch();
watcher.Start();
sortedList.BinarySearch(elementToSearch);
watcher.Stop();
Console.WriteLine("BinarySearch on already sorted: {0} ms",
    watcher.Elapsed.TotalMilliseconds);

// O(n) - simple search
elementToSearch = list.Last();
watcher.Reset();
watcher.Start();
list.IndexOf(elementToSearch);
watcher.Stop();
Console.WriteLine("IndexOf on unsorted: {0} ms",
    watcher.Elapsed.TotalMilliseconds);

// O(n log n) + O(log n)
watcher.Reset();
watcher.Start();
list.Sort();
elementToSearch = list.Last();
list.BinarySearch(elementToSearch);
watcher.Stop();
Console.WriteLine("Sort + binary search on unsorted: {0} ms",
    watcher.Elapsed.TotalMilliseconds);
Console.ReadKey();
Result:
BinarySearch on already sorted: 0.0248 ms
IndexOf on unsorted: 6.144 ms
Sort + binary search on unsorted: 1157.3298 ms
Edit to edit 2:
I think you are rather looking for bucket sort.
You can implement it on your own, but I think that Matthew Haugen's Dictionary solution is simpler and faster to implement :)
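For illustration, a minimal sketch of the 26-bin idea from edit 2, assuming lowercase ASCII words (sortedWords stands in for your sorted input and word for the lookup):
// bucket by first letter; each bucket stays sorted because the input is sorted
var bins = new List<string>[26];
for (int i = 0; i < 26; i++)
    bins[i] = new List<string>();
foreach (var w in sortedWords)
    bins[w[0] - 'a'].Add(w);
// a lookup then only binary-searches the one relevant bucket
bool found = bins[word[0] - 'a'].BinarySearch(word) >= 0;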

Related

Compare 2 big string lists

I have two lists of strings - this is not mandatory, I can convert them to any collection (list, dictionary, etc.).
First is "text":
Birds sings
Dogs barks
Frogs jumps
Second is "words":
sing
dog
cat
I need to iterate through "text" and, if a line contains any of "words", do one thing; if not, do another thing.
Important: yes, in my case I need to find partial matches ignoring case, so the text "Dogs" is a match for the word "dog". This is why I use .Contains and .ToLower().
My naive try looks like this:
List<string> text = new List<string>();
List<string> words = new List<string>();
foreach (string line in text)
{
    bool found = false;
    foreach (string word in words)
    {
        if (line.ToLower().Contains(word.ToLower()))
        {
            ; // one thing
            found = true;
            break;
        }
    }
    if (!found)
        ; // another
}
The problem is size - 8000 in the first list and ~50000 in the second. This takes too much time.
How to make it faster?
I'm assuming that you only want to match on the specific words in your text list: that is, if text contains "dogs", and words contains "dog", then that shouldn't be a match.
Note that this is different to what your code currently does.
Given this, we can construct a HashSet<string> of all of the words in your text list. We can then query this very cheaply.
We'll also use StringComparer.OrdinalIgnoreCase to do our comparisons. This is a better way of doing a case-insensitive match than ToLower(), and ordinal comparisons are relatively cheap. If you're dealing with languages other than English, you'll need to consider whether you actually need StringComparer.CurrentCultureIgnoreCase or StringComparer.InvariantCultureIgnoreCase.
var textWords = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
foreach (var line in text)
{
    var lineWords = line.Split(' ');
    textWords.UnionWith(lineWords);
}
if (textWords.Overlaps(words))
{
    // One thing
}
else
{
    // Another
}
If this is not the case, and you do want to do a .Contains on each, then you can speed it up a bit by avoiding the calls to .ToLower(). Each call to .ToLower() creates a new string in memory, so you're creating two new, useless objects per comparison.
Instead, use:
if (line.IndexOf(word, StringComparison.OrdinalIgnoreCase) >= 0)
As above, you might have to use StringComparison.CurrentCultureIgnoreCase or StringComparison.InvariantCultureIgnoreCase depending on the language of your strings. However, you should see a significant speedup if your strings are entirely ASCII and you use OrdinalIgnoreCase as this makes the string search a lot quicker.
If you're using .NET Framework, another thing to try is moving to .NET Core. .NET Core introduced a lot of optimizations in this area, and you might find that it's quicker.
Another thing you can do is see if you have duplicates in either text or words. If you have a lot, you might be able to save a lot of time. Consider using a HashSet<string> for this, or LINQ's .Distinct() (you'll need to see which is quicker).
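For example, deduplicating the words up front might look like this (a sketch; measure both routes to see which wins for your data):
// HashSet route: duplicates are dropped as the set is built
var uniqueWords = new HashSet<string>(words, StringComparer.OrdinalIgnoreCase);
// LINQ route: Distinct also accepts an equality comparer
var distinctWords = words.Distinct(StringComparer.OrdinalIgnoreCase).ToList();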
You can try using LINQ for the second looping construct.
List<string> text = new List<string>();
List<string> words = new List<string>();
foreach (string line in text)
{
    bool found = words.FirstOrDefault(w => line.ToLower().Contains(w.ToLower())) != null;
    if (found)
    {
        // Do something
    }
    else
    {
        // Another
    }
}
Might not be as fast as you want but it will be faster than before.
You can improve the search algorithm.
public static int Search(string word, List<string> stringList)
{
    string wordCopy = word.ToLower();
    List<string> stringListCopy = new List<string>();
    stringList.ForEach(s => stringListCopy.Add(s.ToLower()));
    stringListCopy.Sort();
    int position = -1;
    int count = stringListCopy.Count;
    if (count > 0)
    {
        int min = 0;
        int max = count - 1;
        int middle = min + (max - min) / 2;
        int comparisonStatus = 0;
        do
        {
            comparisonStatus = string.Compare(wordCopy, stringListCopy[middle]);
            if (comparisonStatus == 0)
            {
                position = middle;
                break;
            }
            else if (comparisonStatus < 0)
            {
                max = middle - 1;
            }
            else
            {
                min = middle + 1;
            }
            middle = min + (max - min) / 2;
        } while (min <= max); // <= so the last remaining element is also checked
    }
    return position;
}
Inside this method we create a copy of the string list, with all elements lowercased. After that we sort the copied list in ascending order. This is crucial because the entire algorithm relies on ascending sort.
If the word exists in the list, the Search method will return its position in the list; otherwise it will return -1.
How does the algorithm work?
Instead of checking every element in the list, we split the list in half in every iteration.
In every iteration we take the element in the middle and compare the two strings (that element and our word). If our string is the same as the one in the middle, the search is finished. If our string sorts before the string in the middle, our string must be in the first half of the list, because the list is sorted ascending. If our string sorts after the string in the middle, it must be in the second half, again because the list is sorted ascending. We then take the appropriate half and repeat the process.
In the first iteration we take the entire list.
I've tested the Search method using these data:
List<string> stringList = new List<string>();
stringList.Add("Serbia");
stringList.Add("Greece");
stringList.Add("Egypt");
stringList.Add("Peru");
stringList.Add("Palau");
stringList.Add("Slovakia");
stringList.Add("Kyrgyzstan");
stringList.Add("Mongolia");
stringList.Add("Chad");
Search("Serbia", stringList);
This way you will search the entire list of ~50,000 elements in at most 16 iterations, since ceil(log2(50,000)) = 16.

How to improve performance of this algorithm?

I have a text file with 100000 pairs: word and frequency.
test.in file with words:
line 1 - total count of all word-frequency pairs
lines 2 to ~100,001 - word-frequency pairs
line 100,002 - total count of user input words
lines 100,003 to the end - user input words
I parse this file and put the words in
Dictionary<string,double> dictionary;
And I want to execute some search + order logic in the following code:
for (int i = 0; i < 15000; i++)
{
    tempInputWord = // take data from file (or other sources)
    var adviceWords = dictionary
        .Where(p => p.Key.StartsWith(tempInputWord, StringComparison.Ordinal))
        .OrderByDescending(ks => ks.Value)
        .ThenBy(ks => ks.Key, StringComparer.Ordinal)
        .Take(10)
        .ToList();
    // some output
}
The problem: This code must run in less than 10 seconds.
On my computer (Core i5 2400, 8 GB RAM), with Parallel.For() it takes about 91 sec.
Can you give me some advice on how to increase performance?
UPDATE:
Hooray! We did it!
Thank you @CodesInChaos, @usr, @T_D and everyone who was involved in solving the problem.
The final code:
var kvList = dictionary.OrderBy(ks => ks.Key, StringComparer.Ordinal).ToList();
var strComparer = new MyStringComparer();
var intComparer = new MyIntComparer();
var kvListSize = kvList.Count;
var allUserWords = new List<string>();
for (int i = 0; i < userWordQuantity; i++)
{
    var searchWord = Console.ReadLine();
    allUserWords.Add(searchWord);
}
var result = allUserWords
    .AsParallel()
    .AsOrdered()
    .Select(searchWord =>
    {
        int startIndex = kvList.BinarySearch(new KeyValuePair<string, int>(searchWord, 0), strComparer);
        if (startIndex < 0)
            startIndex = ~startIndex;
        var matches = new List<KeyValuePair<string, int>>();
        bool isNotEnd = true;
        for (int j = startIndex; j < kvListSize; j++)
        {
            isNotEnd = kvList[j].Key.StartsWith(searchWord, StringComparison.Ordinal);
            if (isNotEnd) matches.Add(kvList[j]);
            else break;
        }
        matches.Sort(intComparer);
        var res = matches.Select(s => s.Key).Take(10).ToList();
        return res;
    });
foreach (var adviceWords in result)
{
    foreach (var adviceWord in adviceWords)
    {
        Console.WriteLine(adviceWord);
    }
    Console.WriteLine();
}
6 sec (9 sec with LINQ instead of the manual loop)
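The two comparers aren't shown in the final code; a plausible reconstruction (an assumption, matching the original LINQ ordering: keys compared ordinally for the binary search, then descending frequency with ordinal key as tie-breaker) might look like this:
class MyStringComparer : IComparer<KeyValuePair<string, int>>
{
    // order pairs by key only, ordinally, for the BinarySearch
    public int Compare(KeyValuePair<string, int> x, KeyValuePair<string, int> y)
    {
        return string.CompareOrdinal(x.Key, y.Key);
    }
}
class MyIntComparer : IComparer<KeyValuePair<string, int>>
{
    // descending frequency, then ascending ordinal key
    public int Compare(KeyValuePair<string, int> x, KeyValuePair<string, int> y)
    {
        int byValue = y.Value.CompareTo(x.Value);
        return byValue != 0 ? byValue : string.CompareOrdinal(x.Key, y.Key);
    }
}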
You are not using any algorithmic strength of the dictionary at all. Ideally, you'd use a tree structure so that you can perform prefix lookups. On the other hand, you are within 3.7x of your performance goal. I think you can reach that by just optimizing the constant factor in your algorithm.
Don't use LINQ in perf-critical code. Manually loop over all collections and collect results into a List<T>. That turns out to give a major speed-up in practice.
Don't use a dictionary at all. Just use a KeyValuePair<T1, T2>[] and run through it using a foreach loop. This is the fastest possible way to traverse a set of pairs.
Could look like this:
KeyValuePair<T1, T2>[] items;
List<KeyValuePair<T1, T2>> matches = new List<KeyValuePair<T1, T2>>(); // consider pre-sizing this

// This could be a parallel loop as well.
// Make sure not to synchronize too much on matches.
// If there tend to be few matches, a lock will be fine.
foreach (var item in items)
{
    if (IsMatch(item))
    {
        matches.Add(item);
    }
}
matches.Sort(...); // sort in place
return matches.Take(10); // maybe matches.RemoveRange(10, matches.Count - 10) is better
That should exceed a 3.7x speedup.
If you need more, try stuffing the items into a dictionary keyed on the first char of Key. That way you can look up all items matching tempInputWord[0]. That should reduce search times by the selectivity that is in the first char of tempInputWord. For English text that would be on the order of 26 or 52. This is a primitive form of prefix lookup that has one level of lookup. Not pretty but maybe it is enough.
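A rough sketch of that one-level prefix lookup (names here are illustrative; items and tempInputWord are as above, with string/int standing in for T1/T2):
// bucket the pairs by first character once, up front
var byFirstChar = items
    .GroupBy(p => p.Key[0])
    .ToDictionary(g => g.Key, g => g.ToArray());
// per query, scan only the bucket for the first char
var matches = new List<KeyValuePair<string, int>>();
KeyValuePair<string, int>[] bucket;
if (byFirstChar.TryGetValue(tempInputWord[0], out bucket))
{
    foreach (var item in bucket)
    {
        if (item.Key.StartsWith(tempInputWord, StringComparison.Ordinal))
            matches.Add(item);
    }
}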
I think the best way would be to use a Trie data structure instead of a dictionary. A trie saves all the words in a tree structure, where a node can represent all the words that start with the same letters. So if you look up your search word tempInputWord in a trie, you get a node that represents all the words starting with tempInputWord, and you just have to traverse all the child nodes. So you have just one search operation. The link to the Wikipedia article also mentions some other advantages over hash tables (which is what a Dictionary basically is):
Looking up data in a trie is faster in the worst case, O(m) time (where m is the length of a search string), compared to an imperfect hash table. An imperfect hash table can have key collisions. A key collision is the hash function mapping of different keys to the same position in a hash table. The worst-case lookup speed in an imperfect hash table is O(N) time, but far more typically is O(1), with O(m) time spent evaluating the hash.
There are no collisions of different keys in a trie.
Buckets in a trie, which are analogous to hash table buckets that store key collisions, are necessary only if a single key is associated with more than one value.
There is no need to provide a hash function or to change hash functions as more keys are added to a trie.
A trie can provide an alphabetical ordering of the entries by key.
And here are some ideas for creating a trie in C#.
This should at least speed up the lookup; building the Trie, however, might be slower.
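For reference, a minimal trie sketch along those lines (illustrative only, assuming the usual System.Collections.Generic and System.Linq usings; the tested code below uses a ready-made Trie class instead):
class TrieNode<T>
{
    private readonly Dictionary<char, TrieNode<T>> children = new Dictionary<char, TrieNode<T>>();
    private readonly List<T> values = new List<T>();

    // walk (and create) one node per character, store the value at the end
    public void Add(string key, T value)
    {
        var node = this;
        foreach (char c in key)
        {
            TrieNode<T> child;
            if (!node.children.TryGetValue(c, out child))
            {
                child = new TrieNode<T>();
                node.children[c] = child;
            }
            node = child;
        }
        node.values.Add(value);
    }

    // all values whose keys start with the given prefix
    public IEnumerable<T> Retrieve(string prefix)
    {
        var node = this;
        foreach (char c in prefix)
        {
            if (!node.children.TryGetValue(c, out node))
                return Enumerable.Empty<T>();
        }
        return Collect(node);
    }

    private static IEnumerable<T> Collect(TrieNode<T> node)
    {
        foreach (var value in node.values)
            yield return value;
        foreach (var child in node.children.Values)
            foreach (var value in Collect(child))
                yield return value;
    }
}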
Update:
Ok, I tested it myself using a file with frequencies of english words that uses the same format as yours. This is my code which uses the Trie class that you also tried to use.
static void Main(string[] args)
{
    Stopwatch sw = new Stopwatch();
    sw.Start();
    var trie = new Trie<KeyValuePair<string, int>>();
    // build trie with your value pairs
    var lines = File.ReadLines("en.txt");
    foreach (var line in lines.Take(100000))
    {
        var split = line.Split(' ');
        trie.Add(split[0], new KeyValuePair<string, int>(split[0], int.Parse(split[1])));
    }
    Console.WriteLine("Time needed to read file and build Trie with 100000 words: " + sw.Elapsed);
    sw.Reset();

    // test with 10000 search words
    sw.Start();
    foreach (string line in lines.Take(10000))
    {
        var searchWord = line.Split(' ')[0];
        var allPairs = trie.Retrieve(searchWord);
        var bestWords = allPairs.OrderByDescending(kv => kv.Value).ThenBy(kv => kv.Key).Select(kv => kv.Key).Take(10);
        var output = bestWords.Aggregate("", (s1, s2) => s1 + ", " + s2);
        Console.WriteLine(output);
    }
    Console.WriteLine("Time to process 10000 different searchWords: " + sw.Elapsed);
}
My results on a pretty similar machine:
Time needed to read file and build Trie with 100000 words: 00:00:00.7397839
Time to process 10000 different searchWords: 00:00:03.0181700
So I think you are doing something wrong that we cannot see - for example, the way you measure the time or the way you read the file. As my results show, this stuff should be really fast. The 3 seconds are mainly due to the Console output in the loop, which I needed so that the bestWords variable is used; otherwise it would have been optimized away.
Replace the dictionary by a List<KeyValuePair<string, decimal>>, sorted by the key.
For the search I use the fact that, with ordinal comparisons, a prefix sorts directly before the strings that start with it. So I can use a binary search to find the first candidate. Since the candidates are contiguous, I can replace Where with TakeWhile.
int startIndex = dictionary.BinarySearch(searchWord, comparer);
if (startIndex < 0)
    startIndex = ~startIndex;
var adviceWords = dictionary
    .Skip(startIndex)
    .TakeWhile(p => p.Key.StartsWith(searchWord, StringComparison.Ordinal))
    .OrderByDescending(ks => ks.Value)
    .ThenBy(ks => ks.Key)
    .Select(s => s.Key)
    .Take(10).ToList();
Make sure to use ordinal comparison for all operations, including the initial sort, the binary search and the StartsWith check.
I would call Console.ReadLine outside the parallel loop. Probably using AsParallel().Select(...) on the collection of search words instead of Parallel.For.
If you want profiling, separate the reading of the file and see how long that takes.
Also, data calculation, collection, and presentation could be different steps.
If you want concurrency AND a dictionary, look at the ConcurrentDictionary, maybe even more for reliability than for performance, but probably for both:
http://msdn.microsoft.com/en-us/library/dd287191(v=vs.110).aspx
Assuming the 10 is constant, why is everyone storing the entire data set? Memory is not free. The fastest solution is to store the first 10 entries in a list and sort it. Then maintain that 10-element sorted list as you traverse the rest of the data set, removing the 11th element every time you insert one (a sketch follows below).
The above method works best for small values. If you had to take the first 5000 objects, consider using a binary heap instead of a list.
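A sketch of that bounded-list idea, keeping the top 10 by descending frequency (allItems stands in for the full stream of pairs):
var top = new List<KeyValuePair<string, int>>(11);
foreach (var item in allItems)
{
    // find the first slot whose frequency is lower than the new item's
    int pos = top.FindIndex(p => item.Value > p.Value);
    if (pos < 0) pos = top.Count;      // belongs at the end (or list still short)
    top.Insert(pos, item);
    if (top.Count > 10)
        top.RemoveAt(10);              // evict the 11th element
}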

How to match / connect / pair integers from a List<T>

I have a list with an even number of nodes (always even). My task is to "match" all the nodes in the least costly way.
So I could have listDegree(1,4,5,6), which represents all the odd-degree nodes in my graph. How can I pair the nodes in listDegree and save the least costly combination to a variable, say int totalCost?
Something like this, and I return the least totalCost amount.
totalCost = (1,4) + (5,6)
totalCost = (1,5) + (4,6)
totalCost = (1,6) + (4,5)
--------------- More details (or a rewriting of the upper) ---------------
I have a class that reads my input file and stores all the information I need, like the costMatrix for the graph, the edges, and the number of edges and nodes.
Next I have a DijkstrasShortestPath algorithm, which computes the shortest path in my graph (costMatrix) from a given start node to a given end node.
I also have a method that examines the graph (costMatrix) and stores all the odd-degree nodes in a list.
So what I was looking for was some hints on how to pair all the odd-degree nodes in the least costly way (shortest path). Using the data I have is easy once I know how to combine all the nodes in the list.
I don't need a solution, and this is not homework.
I just need a hint: when you have a list of, let's say, integers, how can you combine all the integers pairwise?
Hope this explanation is better... :D
Perhaps:
List<int> totalCosts = listDegree
    .Select((num, index) => new { num, index })
    .GroupBy(x => x.index / 2)
    .Select(g => g.Sum(x => x.num))
    .ToList();
Edit:
After you've edited your question, I understand your requirement. You need a total sum of all (pairwise) combinations of all elements in a list. I would use this combinatorics project, which is quite efficient and informative.
var listDegree = new[] { 1, 4, 5, 6 };
int lowerIndex = 2;
var combinations = new Facet.Combinatorics.Combinations<int>(
    listDegree,
    lowerIndex,
    Facet.Combinatorics.GenerateOption.WithoutRepetition
);
// get total costs overall
int totalCosts = combinations.Sum(c => c.Sum());
// get a List<List<int>> of all combinations (each inner list has Count 2 = lowerIndex since you want pairs)
List<List<int>> allLists = combinations.Select(c => c.ToList()).ToList();
// output the result for demo purposes
foreach (IList<int> combis in combinations)
{
    Console.WriteLine(String.Join(" ", combis));
}
(Without more details on the cost, I am going to assume cost(1,5) = 1-5, and you want the sum to get as close as possible to 0.)
You are describing the even partition problem, which is NP-Complete.
The problem says: given a list L, find two lists A and B such that sum(A) = sum(B) and #elements(A) = #elements(B), where each element from L must be in A or B (and never both).
The reduction to your problem is simple: each left element of a pair goes to A, and each right element of each pair goes to B.
Thus, there is no known polynomial solution to the problem, but you might want to try exponential exhaustive-search approaches (search all possible pairings; there are Choose(2n, n) = (2n)!/(n!*n!) of those).
An alternative is a pseudo-polynomial DP-based solution (feasible for small integers).
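For small lists, the exhaustive search over pairings can be written recursively; a sketch (cost is a placeholder for your shortest-path lookup, and the node count is assumed even):
// fix the first unpaired node, try every partner, recurse on the rest;
// there are (n-1) * (n-3) * ... * 1 pairings of n nodes
static int MinPairingCost(List<int> nodes, Func<int, int, int> cost)
{
    if (nodes.Count == 0)
        return 0;
    int best = int.MaxValue;
    for (int i = 1; i < nodes.Count; i++)
    {
        var rest = new List<int>(nodes);
        rest.RemoveAt(i);   // remove the partner first (higher index)
        rest.RemoveAt(0);   // then the fixed node
        int c = cost(nodes[0], nodes[i]) + MinPairingCost(rest, cost);
        if (c < best)
            best = c;
    }
    return best;
}
// e.g. MinPairingCost(listDegree, (a, b) => shortestPath[a, b])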

Select items from List of structs

I've got a List of structs. In the struct there is a field x. I would like to select those structs which are rather close to each other by parameter x. In other words, I'd like to cluster them by x.
I guess, there should be one-line solution.
Thanks in advance.
If I understood correctly what you want, then you might need to sort your list by the structure's field X.
Look at the GroupBy extension method:
var items = mylist.GroupBy(c => c.X);
This article gives a lot of examples using group by.
If you're doing graph-style clustering, the easiest way to do it is by building up a list of clusters which is initially empty. Then loop over the input and, for each value, find all of the clusters which have at least one element which is close to the current value. All those clusters should then be merged together with the value. If there aren't any, then the value goes into a cluster all by itself.
Here is some sample code for how to do it with a simple list of integers.
IEnumerable<int> input;
int threshold;
List<List<int>> clusters = new List<List<int>>();
foreach (var current in input)
{
    // Search the current list of clusters for ones which contain at least one
    // entry such that the difference between it and x is less than the threshold
    var matchingClusters =
        clusters.Where(
            cluster => cluster.Any(
                val => Math.Abs(current - val) <= threshold)
        ).ToList();
    // Merge all the clusters that were found, plus x, into a new cluster.
    // Replace all the existing clusters with this new one.
    IEnumerable<int> newCluster = new List<int>(new[] { current });
    foreach (var match in matchingClusters)
    {
        clusters.Remove(match);
        newCluster = newCluster.Concat(match);
    }
    clusters.Add(newCluster.ToList());
}

Counting occurrences of a string in an array and then removing duplicates

I am fairly new to C# programming and I am stuck on my little ASP.NET project.
My website currently examines Twitter statuses for URLs and then adds those URLs to an array, all via a regular-expression pattern-matching procedure. Clearly more than one person will post a status with a specific URL, so I do not want to list duplicates, and I want to count the number of times a particular URL is mentioned in, say, 100 tweets.
Now I have a List<String> which I can sort so that all duplicate URLs are next to each other. I was under the impression that I could compare list[i] with list[i+1], and if they match, increment a counter (count++); if they don't match, then the URL and the count value get added to a new array, assuming that this is the end of the duplicates.
This would remove duplicates and give me a count of the number of occurrences for each URL. At the moment, what I have is not working, and I do not know why (like I say, I am not very experienced with it all).
With the code below, assume that a JSON feed has been searched using a keyword, with the results in srchResponse.results. The results containing URLs get added to sList, a string List, which contains only the URLs, not the message as a whole.
I want to put one of each URL (no duplicates), a count integer (as a string) for the number of occurrences of that URL, and the username, message, and user image URL into my jagged array called 'urls[100][]'. I have made the array 100 rows long to make sure everything can fit, but generally this is too big. Each 'row' will have 5 elements.
The debugger gets stuck on the line: if (sList[i] == sList[i + 1]) which is the crux of my idea, so clearly the logic is not working. Any suggestions or anything will be seriously appreciated!
Here is sample code:
var sList = new ArrayList();
string[][] urls = new string[100][];
int ctr = 0;
int j = 1;
foreach (Result res in srchResponse.results)
{
    string content = res.text;
    string pattern = @"((https?|ftp|gopher|telnet|file|notes|ms-help):((//)|(\\\\))+[\w\d:#@%/;$()~_?\+-=\\\.&]*)";
    MatchCollection matches = Regex.Matches(content, pattern);
    foreach (Match match in matches)
    {
        GroupCollection groups = match.Groups;
        sList.Add(groups[0].Value.ToString());
    }
}
sList.Sort();
foreach (Result res in srchResponse.results)
{
    for (int i = 0; i < 100; i++)
    {
        if (sList[i] == sList[i + 1])
        {
            j++;
        }
        else
        {
            urls[ctr][0] = sList[i].ToString();
            urls[ctr][1] = j.ToString();
            urls[ctr][2] = res.text;
            urls[ctr][3] = res.from_user;
            urls[ctr][4] = res.profile_image_url;
            ctr++;
            j = 1;
        }
    }
}
The code then goes on to add each result to a StringBuilder with the HTML.
The description of your algorithm seems fine. I don't know what's wrong with the implementation; I haven't read it that carefully. (The fact that you are using an ArrayList is an immediate red flag; why aren't you using a more strongly typed generic collection?)
However, I have a suggestion. This is exactly the sort of problem that LINQ was intended to solve. Instead of writing all that error-prone code yourself, just describe the transformation you're interested in, and let the compiler work it out for you.
Suppose you have a list of strings and you wish to determine the number of occurrences of each:
var notes = new[] { "Do", "Fa", "La", "So", "Mi", "Do", "Re" };
var counts = from note in notes
             group note by note into g
             select new { Note = g.Key, Count = g.Count() };
foreach (var count in counts)
    Console.WriteLine("Note {0} occurs {1} times.", count.Note, count.Count);
Which I hope you agree is much easier to read than all that array logic you wrote. And of course, now you have your sequence of unique items; you have a sequence of counts, and each count contains a unique Note.
I'd recommend using a more sophisticated data structure than an array. A Set will guarantee that you have no duplicates.
It looks like the C# collections don't include a Set, but there are third-party implementations available, like this one.
Your loop fails because when i == 99, (i + 1) == 100 which is outside the bounds of your array.
But as others have pointed out, .NET 3.5 has ways of doing what you want more elegantly.
If you don't need to know how many duplicates a specific entry has, you could use the LINQ extension methods .Distinct() and .Count().
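For example, with the sList from the question (Cast is needed because it is an ArrayList):
int uniqueUrlCount = sList.Cast<string>().Distinct().Count();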
