Array.Contains runs very slowly, can anyone shed some light? - c#

I have done some benchmarks comparing List.Contains, Array.Contains, IEnumerable.Contains, ICollection.Contains and IList.Contains.
The results are:
array pure 00:00:45.0052754 // 45 sec, slow
array as IList 00:00:02.7900305
array as IEnumerable 00:00:46.5871087 // 45 sec, slow
array as ICollection 00:00:02.7449889
list pure 00:00:01.9907563
list as IList 00:00:02.6626009
list as IEnumerable 00:00:02.9541950
list as ICollection 00:00:02.3341203
As I found out, calling Array.Contains directly (which is equivalent to calling the IEnumerable extension method) is very slow.
I also find it strange that the MSDN page for Array doesn't list a Contains method in the extension methods section.
Sample code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Diagnostics;
namespace arrayList
{
class Program
{
static void Main(string[] args)
{
Stopwatch watch = new Stopwatch();
Int64 n = 100000000;
Int64[] myarray = new Int64[] { 1, 2, 3 };
List<Int64> mylist = new List<Int64>(myarray);
watch.Start();
for (Int64 j = 0; j < n; j++)
{
bool i = myarray.Contains(2);
}
watch.Stop();
Console.WriteLine("array pure {0}", watch.Elapsed);
watch.Restart();
for (Int64 j = 0; j < n; j++)
{
bool i = (myarray as IList<Int64>).Contains(2);
}
watch.Stop();
Console.WriteLine("array as IList {0}",watch.Elapsed);
watch.Restart();
for (Int64 j = 0; j < n; j++)
{
bool i = (myarray as IEnumerable<Int64>).Contains(2);
}
watch.Stop();
Console.WriteLine("array as IEnumerable {0}",watch.Elapsed);
watch.Restart();
for (Int64 j = 0; j < n; j++)
{
bool i = (myarray as ICollection<Int64>).Contains(2);
}
watch.Stop();
Console.WriteLine("array as ICollection {0}",watch.Elapsed);
watch.Restart();
for (Int64 j = 0; j < n; j++)
{
bool i = mylist.Contains(2);
}
watch.Stop();
Console.WriteLine("list pure {0}", watch.Elapsed);
watch.Restart();
for (Int64 j = 0; j < n; j++)
{
bool i = (mylist as IList<Int64>).Contains(2);
}
watch.Stop();
Console.WriteLine("list as IList {0}", watch.Elapsed);
watch.Restart();
for (Int64 j = 0; j < n; j++)
{
bool i = (mylist as IEnumerable<Int64>).Contains(2);
}
watch.Stop();
Console.WriteLine("list as IEnumerable {0}", watch.Elapsed);
watch.Restart();
for (Int64 j = 0; j < n; j++)
{
bool i = (mylist as ICollection<Int64>).Contains(2);
}
watch.Stop();
Console.WriteLine("list as ICollection {0}", watch.Elapsed);
Console.ReadKey();
}
}
}

The way you are timing these is not sufficient. You need significantly larger inputs to get times representative of the algorithms. Yes, Contains() will be slower than a simple linear search (something you've omitted), but the different calls are not going to produce the times you've shown. You're unlikely to see any variation between the calls to Contains() when cast to the different types because, in all likelihood, the same implementation is being called for all of them.
Try this code for size:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Diagnostics;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
const int iterations = 1000000;
const long target = 7192;
var arr = Enumerable.Range(0, 10000).Select(i => (long)i).ToArray();
var list = arr.ToList();
bool result;
var arr0 = Stopwatch.StartNew();
for (var i = 0; i < iterations; i++)
{
result = LinearSearchArr(arr, target);
}
arr0.Stop();
var arr1 = Stopwatch.StartNew();
for (var i = 0; i < iterations; i++)
{
// actually Enumerable.Contains()
result = arr.Contains(target);
}
arr1.Stop();
var arr2 = Stopwatch.StartNew();
for (var i = 0; i < iterations; i++)
{
result = ((IList<long>)arr).Contains(target);
}
arr2.Stop();
var arr3 = Stopwatch.StartNew();
for (var i = 0; i < iterations; i++)
{
result = ((IEnumerable<long>)arr).Contains(target);
}
arr3.Stop();
var arr4 = Stopwatch.StartNew();
for (var i = 0; i < iterations; i++)
{
result = ((ICollection<long>)arr).Contains(target);
}
arr4.Stop();
var list0 = Stopwatch.StartNew();
for (var i = 0; i < iterations; i++)
{
result = LinearSearchList(list, target);
}
list0.Stop();
var list1 = Stopwatch.StartNew();
for (var i = 0; i < iterations; i++)
{
result = list.Contains(target);
}
list1.Stop();
var list2 = Stopwatch.StartNew();
for (var i = 0; i < iterations; i++)
{
result = ((IList<long>)list).Contains(target);
}
list2.Stop();
var list3 = Stopwatch.StartNew();
for (var i = 0; i < iterations; i++)
{
result = ((IEnumerable<long>)list).Contains(target);
}
list3.Stop();
var list4 = Stopwatch.StartNew();
for (var i = 0; i < iterations; i++)
{
result = ((ICollection<long>)list).Contains(target);
}
list4.Stop();
Console.WriteLine("array linear {0} ({1})", arr0.Elapsed, arr0.ElapsedTicks);
Console.WriteLine("array pure {0} ({1})", arr1.Elapsed, arr1.ElapsedTicks);
Console.WriteLine("array as IList {0} ({1})", arr2.Elapsed, arr2.ElapsedTicks);
Console.WriteLine("array as IEnumerable {0} ({1})", arr3.Elapsed, arr3.ElapsedTicks);
Console.WriteLine("array as ICollection {0} ({1})", arr4.Elapsed, arr4.ElapsedTicks);
Console.WriteLine("list linear {0} ({1})", list0.Elapsed, list0.ElapsedTicks);
Console.WriteLine("list pure {0} ({1})", list1.Elapsed, list1.ElapsedTicks);
Console.WriteLine("list as IList {0} ({1})", list2.Elapsed, list2.ElapsedTicks);
Console.WriteLine("list as IEnumerable {0} ({1})", list3.Elapsed, list3.ElapsedTicks);
Console.WriteLine("list as ICollection {0} ({1})", list4.Elapsed, list4.ElapsedTicks);
}
static bool LinearSearchArr(long[] arr, long target)
{
for (var i = 0; i < arr.Length; i++)
{
if (arr[i] == target)
{
return true;
}
}
return false;
}
static bool LinearSearchList(List<long> list, long target)
{
for (var i = 0; i < list.Count; i++)
{
if (list[i] == target)
{
return true;
}
}
return false;
}
}
}
Specs:
Windows 7 Professional 64-bit
Intel Core 2 Quad Q9550 @ 2.83GHz
4x1GiB Corsair Dominator DDR2 1066 (PC2-8500)
Default .NET 4.0 Console App release build targeting x64:
array linear 00:00:07.7268891 (21379939)
array pure 00:00:12.1703848 (33674883)
array as IList 00:00:12.1764948 (33691789)
array as IEnumerable 00:00:12.5377771 (34691440)
array as ICollection 00:00:12.1827855 (33709195)
list linear 00:00:17.9288343 (49608242)
list pure 00:00:25.8427338 (71505630)
list as IList 00:00:25.8678260 (71575059)
list as IEnumerable 00:00:25.8500101 (71525763)
list as ICollection 00:00:25.8423424 (71504547)

Guess: the IList/List versions use ICollection.Contains, which goes through the elements of the collection directly by index.
The Array and IEnumerable versions use IEnumerable.Contains, which requires creating an enumerator and running generic iteration code (MoveNext calls and so on).
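A sketch of the two code paths this guess describes (illustrative only, not the actual BCL source; assumes using System.Collections.Generic):
long[] data = { 1, 2, 3 };

// Hypothesised fast path: the runtime-supplied generic interface
// implementation for arrays walks the elements by index.
bool viaCollection = ((ICollection<long>)data).Contains(2);

// Hypothesised slow path: scan through an IEnumerator<long>, the way a
// generic sequence would be searched.
bool viaEnumerator = false;
using (IEnumerator<long> e = ((IEnumerable<long>)data).GetEnumerator())
{
    while (e.MoveNext())
    {
        if (e.Current == 2) { viaEnumerator = true; break; }
    }
}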

Make sure you use the results of the Contains method somewhere in your code so that it doesn't get optimised away. I am guessing that in one situation it can use a hash table, and in the others it has to do a linear search. Either that, or it is simply not running your loop because the loop body doesn't do anything.
Either way, who is ever going to write code that casts and then runs Contains a million times...
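As for using the results: one simple way to keep them observable is to accumulate them and print the accumulator once, for example (a sketch reusing the variable names from the question's code):
watch.Restart();
int hits = 0;
for (Int64 j = 0; j < n; j++)
{
    if (myarray.Contains(2)) hits++;
}
watch.Stop();
// Printing the accumulator gives the loop body an observable side effect,
// so it cannot be optimised away.
Console.WriteLine("array pure {0} (hits: {1})", watch.Elapsed, hits);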

Related

In C# is it faster to create a Hash Set for searching through a list, rather than searching the list itself? [duplicate]

This question already has answers here:
HashSet vs. List performance
I have two lists of strings, and I need to check whether there are any matches. I have to do this at a minimum of sixty times a second, but it can scale up to thousands of times a second.
Right now the lists are both small; one has three elements and the other might have a few dozen at most, but the currently small list is probably going to grow.
Would it be faster to do this:
for (int i = 0; i < listA.Count; i++)
{
for (int j = 0; j < listB.Count; j++) {
if (listA[i] == listB[j])
{
// do stuff
}
}
}
Or to do this:
var hashSetB = new HashSet<string>(listB.Count);
for (int i = 0; i < listB.Count; i++)
{
hashSetB.Add(listB[i]);
}
for (int i = 0; i < listA.Length; i++)
{
if (hashSetB.Contains(listA[i])) {
// do stuff
}
}
ListA and ListB, when they come to me, will always be lists; I have no control over them.
I think the core of my question is that I don't know how long var hashSetB = new HashSet<string>(listB.Count); takes, so I'm not sure whether the change would be good or bad for smaller lists.
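As a side note, the set-based check can also be written without explicit loops. A minimal sketch (HashSet<T>.Overlaps returns true if the set shares at least one element with the argument, and the constructor that takes an IEnumerable<string> does the same work as an explicit Add loop):
var setB = new HashSet<string>(listB);
bool anyMatch = setB.Overlaps(listA);

// Or, if the matching elements themselves are needed:
foreach (var s in listA)
{
    if (setB.Contains(s))
    {
        // do stuff
    }
}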
I was curious, so here's some code I wrote to test it. From what I got back, the HashSet was near instantaneous, whereas the nested loops were slow. That makes sense, since you've essentially taken something that needed lengthA * lengthB operations and simplified it to lengthA + lengthB operations.
const int size = 20000;
var listA = new List<int>();
for (int i = 0; i < size; i++)
{
listA.Add(i);
}
var listB = new List<int>();
for (int i = size - 5; i < 2 * size; i++)
{
listB.Add(i);
}
var sw = new Stopwatch();
sw.Start();
for (int i = 0; i < listA.Count; i++)
{
for (int j = 0; j < listB.Count; j++)
{
if (listA[i] == listB[j])
{
Console.WriteLine("Nested loop match");
}
}
}
long timeTaken1 = sw.ElapsedMilliseconds;
sw.Restart();
var hashSetB = new HashSet<int>(listB.Count);
for (int i = 0; i < listB.Count; i++)
{
hashSetB.Add(listB[i]);
}
for (int i = 0; i < listA.Count; i++)
{
if (hashSetB.Contains(listA[i]))
{
Console.WriteLine("HashSet match");
}
}
long timeTaken2 = sw.ElapsedMilliseconds;
Console.WriteLine("Time Taken Nested Loop: " + timeTaken1);
Console.WriteLine("Time Taken HashSet: " + timeTaken2);
Console.ReadLine();
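Since the check runs at least sixty times a second, it may also be worth building the HashSet once and reusing it for as long as listB is unchanged, instead of rebuilding it on every pass. A sketch (the field and method names are made up for illustration):
private HashSet<string> setB;

public void OnListBChanged(List<string> listB)
{
    // Rebuild the set only when listB actually changes.
    setB = new HashSet<string>(listB);
}

public bool HasMatch(List<string> listA)
{
    for (int i = 0; i < listA.Count; i++)
    {
        if (setB.Contains(listA[i]))
        {
            return true;
        }
    }
    return false;
}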

c# For loop getting faster with Console.WriteLine afterwards

I wrote a test program to see how much faster different kinds of loops with pointers are, and encountered a strange behaviour: if I add a Console.WriteLine("1: " + sum) after the first loop, some of the loops get faster and another becomes slower.
How does this affect the performance of the loops? It seems like the line does not interact with the loops at all.
My example code has the line that affects loop performance commented out to show the normal performance; if I uncomment it, the speeds change.
class Program
{
static void Main(string[] args)
{
Class1 c = new Class1();
c.Do();
}
}
public class Class1
{
public unsafe int n = 1000000000;
public unsafe int[] arr;
public Class1 ()
{
arr = new int[n];
for (int i = 0; i < arr.Length; i++)
{
arr[i] = 2;
}
}
public void Do ()
{
int sum = 1;
Stopwatch sw = new Stopwatch();
sw.Start();
sum = 0;
for (int i = 0; i < arr.Length; i++)
{
sum += arr[i];
}
sw.Stop();
//Console.WriteLine("1: " + sum);
Console.WriteLine("1: " + sw.ElapsedMilliseconds);
unsafe
{
sw.Reset();
sw.Start();
sum = 0;
for (int i = 0; i < arr.Length; i++)
{
sum += arr[i];
}
sw.Stop();
Console.WriteLine("2: " + sw.ElapsedMilliseconds);
sw.Reset();
sw.Start();
sum = 0;
fixed (int* start = arr)
{
for (int i = 0; i < arr.Length; i++)
{
sum += *(start + i);
}
}
sw.Stop();
Console.WriteLine("3: " + sw.ElapsedMilliseconds);
sw.Reset();
sw.Start();
sum = 0;
fixed (int* start = arr)
{
int* iter = start;
var remaining = arr.Length;
while (remaining-- > 0)
{
sum += *iter;
iter++;
}
}
sw.Stop();
Console.WriteLine("4: " + sw.ElapsedMilliseconds);
sw.Reset();
sw.Start();
sum = 0;
fixed (int* start = arr)
{
int* end = start + n;
int* iter = start - 1;
while (iter++ < end)
{
sum += *iter;
}
}
sw.Stop();
Console.WriteLine("5: " + sw.ElapsedMilliseconds);
}
}
}
output:
1: 2082
2: 2038
3: 1800
4: 2061
5: 1724
output with uncommented line:
1: 2000000000
1: 1949
2: 1953
3: 1998
4: 2055
5: 1724
I tested it multiple times to see if it was just a random difference but it seems to be consistent.
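One way to rule out the compiler treating the loops differently is to give every loop's result the same observable use, for example by folding each sum into a checksum that is printed once at the end. A sketch of the pattern, using the arr field from Class1 above:
long checksum = 0;
var sw = new Stopwatch();

sw.Restart();
int sum = 0;
for (int i = 0; i < arr.Length; i++)
{
    sum += arr[i];
}
sw.Stop();
checksum += sum;                            // keep this variant's result live
Console.WriteLine("1: " + sw.ElapsedMilliseconds);

// ... repeat the same pattern for the other loop variants ...

Console.WriteLine("checksum: " + checksum); // one observable use at the very end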

Sorting many TValue arrays with one TKey array

I have an outer array of N inner arrays of size M. I want to sort each inner array according to another key array K, exactly the way the built-in Array.Sort<TKey, TValue>(TKey[], TValue[], IComparer<TKey>) method does.
That method modifies the key array after sorting, so I can use it to sort only a single inner array. To sort many inner arrays, I copy the key array into a KeyBuffer array for each inner array, reusing KeyBuffer on each sorting step and avoiding allocation and GC. Is that the most efficient approach if the typical N is 10K-100K and M < 1000? Given the small size of M, the copying and sorting should happen in the CPU cache; is that the fastest I can get?
My concern is that by doing so I am sorting the buffer and discarding the results (N-1) times, which is wasteful. I am also doing the actual sort N times, even though after the first sort I already know the map from old indexes to new indexes and could somehow reuse that mapping for the other (N-1) steps.
How would you avoid the unnecessary sorting and apply the known mapping from the first step to the other steps?
Here is the code showing how I do it now. The question is whether it can be done more efficiently.
using System;
using System.Collections.Generic;
namespace MultiSorting {
class Program {
static void Main(string[] args) {
var N = 10;
var M = 5;
var outer = new List<string[]>(N);
for (var i = 0; i < N; i++) {
string[] inner = { "a" + i, "d" + i, "c" + i, "b" + i, "e" + i };
outer.Add(inner);
}
int[] keys = { 1, 4, 3, 2, 5 };
var keysBuffer = new int[M];
for (int i = 0; i < N; i++) {
Array.Copy(keys, keysBuffer, M);
// doing sort N times, but we know the map
// old_index -> new_index from the first sorting
// plus we sort keysBuffer N times but use the result only one time
Array.Sort(keysBuffer, outer[i]);
}
keys = keysBuffer;
foreach (var key in keys) {
Console.Write(key + " "); // 1, 2, 3, 4, 5
}
Console.WriteLine("");
for (var i = 0; i < N; i++) {
foreach (var item in outer[i]) {
Console.Write(item + " "); // a{i}, b{i}, c{i}, d{i}, e{i}
}
Console.WriteLine("");
}
Console.ReadLine();
}
}
}
I just played with this and implemented the mapping reuse directly in a for loop. I didn't expect that a simple loop instead of the native built-in methods could speed things up; I had probably underestimated the algorithmic cost of sorting versus the cost of looping over an array, and I was used to relaxing whenever a profiler said the work was mostly done inside .NET methods...
Naive is the code from the question, ReuseMap is what is described in words in the question, and Linq is from the answer by @L.B. The ...InPlace variants modify the input; the ...Copy variants don't.
Results with N = 2000, M = 500, 10 runs, in milliseconds:
NaiveInPlace: 1005
ReuseMapInPlace: 129 (Log2(500) = 9.0, speed-up = 7.8x)
NaiveCopy: 1181
ReuseMapCopy: 304
LinqCopy: 3284
The entire test is below:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
namespace MultiSorting {
class Program {
static void Main() {
const int n = 2;
const int m = 10;
var keys = GenerateKeys(m);
foreach (var key in keys) {
Console.Write(key + " ");
}
Console.WriteLine("");
var keysBuffer = new int[keys.Length];
Array.Copy(keys, keysBuffer, keys.Length);
Array.Sort(keysBuffer);
foreach (var key in keysBuffer) {
Console.Write(key + " ");
}
Console.WriteLine("");
// warm up, check that output is the same
List<string[]> outer = MultiSortNaiveInPlace(keys, GenerateOuter(n, m));
PrintResults(outer);
outer = MultiSortNaiveCopy(keys, GenerateOuter(n, m));
PrintResults(outer);
outer = MultiSortReuseMapInPlace(keys, GenerateOuter(n, m));
PrintResults(outer);
outer = MultiSortReuseMapCopy(keys, GenerateOuter(n, m));
PrintResults(outer);
outer = MultiSortLinqCopy(keys, GenerateOuter(n, m));
PrintResults(outer);
// tests
keys = GenerateKeys(500);
NaiveInPlace(2000, 500, keys);
ReuseMapInPlace(2000, 500, keys);
NaiveCopy(2000, 500, keys);
ReuseMapCopy(2000, 500, keys);
LinqCopy(2000, 500, keys);
Console.ReadLine();
}
private static void NaiveInPlace(int n, int m, int[] keys) {
const int rounds = 10;
var source = new List<List<string[]>>(rounds);
for (int i = 0; i < rounds; i++) {
source.Add(GenerateOuter(n, m));
}
GC.Collect();
var sw = Stopwatch.StartNew();
for (int i = 0; i < rounds; i++) {
source[i] = MultiSortNaiveInPlace(keys, source[i]);
}
sw.Stop();
Console.WriteLine("NaiveInPlace: " + sw.ElapsedMilliseconds);
}
private static void ReuseMapInPlace(int n, int m, int[] keys) {
const int rounds = 10;
var source = new List<List<string[]>>(rounds);
for (int i = 0; i < rounds; i++) {
source.Add(GenerateOuter(n, m));
}
GC.Collect();
var sw = Stopwatch.StartNew();
for (int i = 0; i < rounds; i++) {
source[i] = MultiSortReuseMapInPlace(keys, source[i]);
}
sw.Stop();
Console.WriteLine("ReuseMapInPlace: " + sw.ElapsedMilliseconds);
}
private static void NaiveCopy(int n, int m, int[] keys) {
const int rounds = 10;
var source = new List<List<string[]>>(rounds);
for (int i = 0; i < rounds; i++) {
source.Add(GenerateOuter(n, m));
}
GC.Collect();
var sw = Stopwatch.StartNew();
for (int i = 0; i < rounds; i++) {
source[i] = MultiSortNaiveCopy(keys, source[i]);
}
sw.Stop();
Console.WriteLine("NaiveCopy: " + sw.ElapsedMilliseconds);
}
private static void ReuseMapCopy(int n, int m, int[] keys) {
const int rounds = 10;
var source = new List<List<string[]>>(rounds);
for (int i = 0; i < rounds; i++) {
source.Add(GenerateOuter(n, m));
}
GC.Collect();
var sw = Stopwatch.StartNew();
for (int i = 0; i < rounds; i++) {
source[i] = MultiSortReuseMapCopy(keys, source[i]);
}
sw.Stop();
Console.WriteLine("ReuseMapCopy: " + sw.ElapsedMilliseconds);
}
private static void LinqCopy(int n, int m, int[] keys) {
const int rounds = 10;
var source = new List<List<string[]>>(rounds);
for (int i = 0; i < rounds; i++) {
source.Add(GenerateOuter(n, m));
}
GC.Collect();
var sw = Stopwatch.StartNew();
for (int i = 0; i < rounds; i++) {
source[i] = MultiSortLinqCopy(keys, source[i]);
}
sw.Stop();
Console.WriteLine("LinqCopy: " + sw.ElapsedMilliseconds);
}
private static void PrintResults(List<string[]> outer) {
for (var i = 0; i < outer.Count; i++) {
foreach (var item in outer[i]) {
Console.Write(item + " "); // a{i}, b{i}, c{i}, d{i}, e{i}
}
Console.WriteLine("");
}
}
private static int[] GenerateKeys(int m) {
var keys = new int[m];
for (int i = 0; i < m; i++) { keys[i] = i; }
var rnd = new Random();
keys = keys.OrderBy(x => rnd.Next()).ToArray();
return keys;
}
private static List<string[]> GenerateOuter(int n, int m) {
var outer = new List<string[]>(n);
for (var o = 0; o < n; o++) {
var inner = new string[m];
for (int i = 0; i < m; i++) { inner[i] = "R" + o + "C" + i; }
outer.Add(inner);
}
return outer;
}
private static List<string[]> MultiSortNaiveInPlace(int[] keys, List<string[]> outer) {
var keysBuffer = new int[keys.Length];
foreach (var inner in outer) {
Array.Copy(keys, keysBuffer, keys.Length);
// doing sort N times, but we know the map
// old_index -> new_index from the first sorting
// plus we sort keysBuffer N times but use the result only one time
Array.Sort(keysBuffer, inner);
}
return outer;
}
private static List<string[]> MultiSortNaiveCopy(int[] keys, List<string[]> outer) {
var result = new List<string[]>(outer.Count);
var keysBuffer = new int[keys.Length];
for (var n = 0; n < outer.Count(); n++) {
var inner = outer[n];
var newInner = new string[keys.Length];
Array.Copy(keys, keysBuffer, keys.Length);
Array.Copy(inner, newInner, keys.Length);
// doing sort N times, but we know the map
// old_index -> new_index from the first sorting
// plus we sort keysBuffer N times but use the result only one time
Array.Sort(keysBuffer, newInner);
result.Add(newInner);
}
return result;
}
private static List<string[]> MultiSortReuseMapInPlace(int[] keys, List<string[]> outer) {
var itemsBuffer = new string[keys.Length];
var keysBuffer = new int[keys.Length];
Array.Copy(keys, keysBuffer, keysBuffer.Length);
var map = new int[keysBuffer.Length];
for (int m = 0; m < keysBuffer.Length; m++) {
map[m] = m;
}
Array.Sort(keysBuffer, map);
for (var n = 0; n < outer.Count(); n++) {
var inner = outer[n];
for (int m = 0; m < map.Length; m++) {
itemsBuffer[m] = inner[map[m]];
}
Array.Copy(itemsBuffer, outer[n], inner.Length);
}
return outer;
}
private static List<string[]> MultiSortReuseMapCopy(int[] keys, List<string[]> outer) {
var keysBuffer = new int[keys.Length];
Array.Copy(keys, keysBuffer, keysBuffer.Length);
var map = new int[keysBuffer.Length];
for (int m = 0; m < keysBuffer.Length; m++) {
map[m] = m;
}
Array.Sort(keysBuffer, map);
var result = new List<string[]>(outer.Count);
for (var n = 0; n < outer.Count(); n++) {
var inner = outer[n];
var newInner = new string[keys.Length];
for (int m = 0; m < map.Length; m++) {
newInner[m] = inner[map[m]];
}
result.Add(newInner);
}
return result;
}
private static List<string[]> MultiSortLinqCopy(int[] keys, List<string[]> outer) {
var result = outer.Select(arr => arr.Select((item, inx) => new { item, key = keys[inx] })
.OrderBy(x => x.key)
.Select(x => x.item)
.ToArray()) // allocating
.ToList(); // allocating
return result;
}
}
}

Performance Benchmarking of Contains, Exists and Any

I have been searching for a performance benchmark comparing the Contains, Exists and Any methods available on List<T>. I wanted to find this out just out of curiosity, as I was always confused about them. Many questions on SO describe the definitions of these methods, such as:
LINQ Ring: Any() vs Contains() for Huge Collections
Linq .Any VS .Exists - Whats the difference?
LINQ extension methods - Any() vs. Where() vs. Exists()
So I decided to do it myself, and I am adding it as an answer. Any more insight on the results is most welcome. I also ran the benchmark for arrays to see the results.
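For reference, the three calls under comparison look like this for a List<int> (a minimal sketch; requires using System.Collections.Generic and System.Linq):
List<int> numbers = new List<int> { 1, 2, 3, 4, 5 };
int target = 3;

bool viaContains = numbers.Contains(target);        // value-based List<T> instance method
bool viaExists = numbers.Exists(x => x == target);  // predicate-based List<T> instance method
bool viaAny = numbers.Any(x => x == target);        // predicate-based LINQ extension method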
The fastest way is to use a HashSet.
The Contains for a HashSet is O(1).
I took your code and added a benchmark for HashSet<int>
The performance cost of HashSet<int> set = new HashSet<int>(list); is nearly zero.
Code
void Main()
{
ContainsExistsAnyVeryShort();
ContainsExistsAnyShort();
ContainsExistsAny();
}
private static void ContainsExistsAny()
{
Console.WriteLine("***************************************");
Console.WriteLine("********* ContainsExistsAny ***********");
Console.WriteLine("***************************************");
List<int> list = new List<int>(6000000);
Random random = new Random();
for (int i = 0; i < 6000000; i++)
{
list.Add(random.Next(6000000));
}
int[] arr = list.ToArray();
HashSet<int> set = new HashSet<int>(list);
find(list, arr, set, (method, stopwatch) => $"{method}: {stopwatch.ElapsedMilliseconds}ms");
}
private static void ContainsExistsAnyShort()
{
Console.WriteLine("***************************************");
Console.WriteLine("***** ContainsExistsAnyShortRange *****");
Console.WriteLine("***************************************");
List<int> list = new List<int>(2000);
Random random = new Random();
for (int i = 0; i < 2000; i++)
{
list.Add(random.Next(6000000));
}
int[] arr = list.ToArray();
HashSet<int> set = new HashSet<int>(list);
find(list, arr, set, (method, stopwatch) => $"{method}: {stopwatch.ElapsedMilliseconds}ms");
}
private static void ContainsExistsAnyVeryShort()
{
Console.WriteLine("*******************************************");
Console.WriteLine("***** ContainsExistsAnyVeryShortRange *****");
Console.WriteLine("*******************************************");
List<int> list = new List<int>(10);
Random random = new Random();
for (int i = 0; i < 10; i++)
{
list.Add(random.Next(6000000));
}
int[] arr = list.ToArray();
HashSet<int> set = new HashSet<int>(list);
find(list, arr, set, (method, stopwatch) => $"{method}: {stopwatch.ElapsedTicks} ticks");
}
private static void find(List<int> list, int[] arr, HashSet<int> set, Func<string, Stopwatch, string> format)
{
Random random = new Random();
int[] find = new int[10000];
for (int i = 0; i < 10000; i++)
{
find[i] = random.Next(6000000);
}
Stopwatch watch = Stopwatch.StartNew();
for (int rpt = 0; rpt < 10000; rpt++)
{
list.Contains(find[rpt]);
}
watch.Stop();
Console.WriteLine(format("List/Contains", watch));
watch = Stopwatch.StartNew();
for (int rpt = 0; rpt < 10000; rpt++)
{
list.Exists(a => a == find[rpt]);
}
watch.Stop();
Console.WriteLine(format("List/Exists", watch));
watch = Stopwatch.StartNew();
for (int rpt = 0; rpt < 10000; rpt++)
{
list.Any(a => a == find[rpt]);
}
watch.Stop();
Console.WriteLine(format("List/Any", watch));
watch = Stopwatch.StartNew();
for (int rpt = 0; rpt < 10000; rpt++)
{
arr.Contains(find[rpt]);
}
watch.Stop();
Console.WriteLine(format("Array/Contains", watch));
watch = Stopwatch.StartNew();
for (int rpt = 0; rpt < 10000; rpt++)
{
Array.Exists(arr, a => a == find[rpt]);
}
watch.Stop();
Console.WriteLine(format("Array/Exists", watch));
watch = Stopwatch.StartNew();
for (int rpt = 0; rpt < 10000; rpt++)
{
Array.IndexOf(arr, find[rpt]);
}
watch.Stop();
Console.WriteLine(format("Array/IndexOf", watch));
watch = Stopwatch.StartNew();
for (int rpt = 0; rpt < 10000; rpt++)
{
arr.Any(a => a == find[rpt]);
}
watch.Stop();
Console.WriteLine(format("Array/Any", watch));
watch = Stopwatch.StartNew();
for (int rpt = 0; rpt < 10000; rpt++)
{
set.Contains(find[rpt]);
}
watch.Stop();
Console.WriteLine(format("HashSet/Contains", watch));
}
RESULTS
*******************************************
***** ContainsExistsAnyVeryShortRange *****
*******************************************
List/Contains: 1067 ticks
List/Exists: 2884 ticks
List/Any: 10520 ticks
Array/Contains: 1880 ticks
Array/Exists: 5526 ticks
Array/IndexOf: 601 ticks
Array/Any: 13295 ticks
HashSet/Contains: 6629 ticks
***************************************
***** ContainsExistsAnyShortRange *****
***************************************
List/Contains: 4ms
List/Exists: 28ms
List/Any: 138ms
Array/Contains: 6ms
Array/Exists: 34ms
Array/IndexOf: 3ms
Array/Any: 96ms
HashSet/Contains: 0ms
***************************************
********* ContainsExistsAny ***********
***************************************
List/Contains: 11504ms
List/Exists: 57084ms
List/Any: 257659ms
Array/Contains: 11643ms
Array/Exists: 52477ms
Array/IndexOf: 11741ms
Array/Any: 194184ms
HashSet/Contains: 3ms
Edit (2021-08-25)
I added a comparison for very short collections (10 items) and also added Array.Contains and Array.IndexOf. You can see that Array.IndexOf is the fastest for such small ranges. That is because, as @lucky-brian said, n is so small here that a for loop performs better than a somewhat complex search algorithm. However, I still advise using HashSet<T> whenever possible, as it better reflects the intent of only having unique values, and the difference for small collections is below 1 ms.
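For completeness, the IndexOf-based membership test used for the array is simply (a sketch):
int[] small = { 3, 7, 42 };
// Array-specific linear scan; IndexOf returns -1 when the value is absent.
bool found = Array.IndexOf(small, 42) >= 0;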
According to documentation:
List.Exists (Object method)
Determines whether the List(T) contains elements that match the
conditions defined by the specified predicate.
IEnumerable.Any (Extension method)
Determines whether any element of a sequence satisfies a condition.
List.Contains (Object Method)
Determines whether an element is in the List.
Benchmarking:
CODE:
static void Main(string[] args)
{
ContainsExistsAnyShort();
ContainsExistsAny();
}
private static void ContainsExistsAny()
{
Console.WriteLine("***************************************");
Console.WriteLine("********* ContainsExistsAny ***********");
Console.WriteLine("***************************************");
List<int> list = new List<int>(6000000);
Random random = new Random();
for (int i = 0; i < 6000000; i++)
{
list.Add(random.Next(6000000));
}
int[] arr = list.ToArray();
find(list, arr);
}
private static void ContainsExistsAnyShort()
{
Console.WriteLine("***************************************");
Console.WriteLine("***** ContainsExistsAnyShortRange *****");
Console.WriteLine("***************************************");
List<int> list = new List<int>(2000);
Random random = new Random();
for (int i = 0; i < 2000; i++)
{
list.Add(random.Next(6000000));
}
int[] arr = list.ToArray();
find(list, arr);
}
private static void find(List<int> list, int[] arr)
{
Random random = new Random();
int[] find = new int[10000];
for (int i = 0; i < 10000; i++)
{
find[i] = random.Next(6000000);
}
Stopwatch watch = Stopwatch.StartNew();
for (int rpt = 0; rpt < 10000; rpt++)
{
list.Contains(find[rpt]);
}
watch.Stop();
Console.WriteLine("List/Contains: {0:N0}ms", watch.ElapsedMilliseconds);
watch = Stopwatch.StartNew();
for (int rpt = 0; rpt < 10000; rpt++)
{
list.Exists(a => a == find[rpt]);
}
watch.Stop();
Console.WriteLine("List/Exists: {0:N0}ms", watch.ElapsedMilliseconds);
watch = Stopwatch.StartNew();
for (int rpt = 0; rpt < 10000; rpt++)
{
list.Any(a => a == find[rpt]);
}
watch.Stop();
Console.WriteLine("List/Any: {0:N0}ms", watch.ElapsedMilliseconds);
watch = Stopwatch.StartNew();
for (int rpt = 0; rpt < 10000; rpt++)
{
arr.Contains(find[rpt]);
}
watch.Stop();
Console.WriteLine("Array/Contains: {0:N0}ms", watch.ElapsedMilliseconds);
Console.WriteLine("Arrays do not have Exists");
watch = Stopwatch.StartNew();
for (int rpt = 0; rpt < 10000; rpt++)
{
arr.Any(a => a == find[rpt]);
}
watch.Stop();
Console.WriteLine("Array/Any: {0:N0}ms", watch.ElapsedMilliseconds);
}
RESULTS
***************************************
***** ContainsExistsAnyShortRange *****
***************************************
List/Contains: 96ms
List/Exists: 146ms
List/Any: 381ms
Array/Contains: 34ms
Arrays do not have Exists
Array/Any: 410ms
***************************************
********* ContainsExistsAny ***********
***************************************
List/Contains: 257,996ms
List/Exists: 379,951ms
List/Any: 884,853ms
Array/Contains: 72,486ms
Arrays do not have Exists
Array/Any: 1,013,303ms
It is worth mentioning that this comparison is a bit unfair, since the Array class doesn't have its own Contains() method. It uses the extension method for IEnumerable<T>, which goes through a sequential enumerator, so it is not optimized for Array instances. On the other hand, HashSet<T> has its own implementation fully optimized for all sizes.
To compare fairly, you could use the static Array.IndexOf() method, which is implemented for Array instances, even though it uses a for loop that is only slightly more efficient than an enumerator.
Using a fair comparison algorithm, the performance of HashSet<T>.Contains() for small sets of up to about 5 elements is similar to Array.IndexOf(), but HashSet<T> is much more efficient for larger sets.
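A sketch of what that fairer array measurement could look like, in the same style as the find() method above:
watch = Stopwatch.StartNew();
for (int rpt = 0; rpt < 10000; rpt++)
{
    // Array-specific search that avoids the IEnumerable<T> extension method path.
    Array.IndexOf(arr, find[rpt]);
}
watch.Stop();
Console.WriteLine("Array/IndexOf: {0:N0}ms", watch.ElapsedMilliseconds);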

How can I measure the performance of a HashTable in C#?

I am playing around with C# collections and I have decided to write a quick test to measure the performance of different collections.
My performance test goes like this:
int numOps= (put number here);
long start, end, numTicks1, numTicks2;
float ratio;
start = DateTime.Now.Ticks;
for(int i = 0; i < numOps; i++)
{
//add two elements to collection #1
//remove one element from collection #1
}
end = DateTime.Now.Ticks;
numTicks1 = end - start;
start = DateTime.Now.Ticks;
for(int i = 0; i < numOps; i++)
{
//add two elements to collection #2
//remove one element from collection #2
}
end = DateTime.Now.Ticks;
numTicks2 = end - start;
ratio = (float)numTicks2/(float)numTicks1;
Then I compare the ratio using different collections and different values of numOps to see how they stack up.
The problem is that when I use a small enough number (numOps = 500), the test results between a Hashtable and a List are sporadic (in other words, it's a coin flip which one is faster). Can anyone explain why this is?
EDIT: Thanks everyone! Stopwatch works like a charm.
Try taking a look at the Stopwatch class instead of using DateTime.
This example is straight out of MSDN:
Stopwatch stopWatch = new Stopwatch();
stopWatch.Start();
Thread.Sleep(10000); //your for loop
stopWatch.Stop();
// Get the elapsed time as a TimeSpan value.
TimeSpan ts = stopWatch.Elapsed;
// Format and display the TimeSpan value.
string elapsedTime = String.Format("{0:00}:{1:00}:{2:00}.{3:00}",
ts.Hours, ts.Minutes, ts.Seconds,
ts.Milliseconds / 10);
Console.WriteLine(elapsedTime, "RunTime");
I would start by trying out a higher resolution timer.
There are quite a few questions and answers about timers already on SO.
Here's one answer that has a list of options available to you.
In particular, check out System.Diagnostics.Stopwatch.
The proper way to time things diagnostically is to run the code many times iteratively (so that the total time is many multiples of the resolution of whatever timing mechanism you use) and then divide by the number of iterations to get an accurate time estimate.
Stopwatch returns times that are not quantized to the ~15 ms system timer resolution, so it's clearly more appropriate for timing events.
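A sketch of that iterate-and-divide pattern with Stopwatch (the Hashtable operations and the iteration count are placeholders; requires System.Collections and System.Diagnostics):
const int numOps = 100000;
var table = new Hashtable();

Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i < numOps; i++)
{
    table[i] = i;          // add two elements
    table[i + numOps] = i;
    table.Remove(i);       // remove one element
}
sw.Stop();

// Average the total over the iteration count so the result is not dominated
// by the timer's resolution.
double ticksPerOp = (double)sw.ElapsedTicks / numOps;
Console.WriteLine("Hashtable: {0:F3} ticks per iteration", ticksPerOp);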
I used the following code to test the performance of a Dictionary with three different GetHashCode implementations:
class TestGetHash
{
class First
{
int m_x;
}
class Second
{
static int s_allocated = 0;
int m_allocated;
int m_x;
public Second()
{
m_allocated = ++s_allocated;
}
public override int GetHashCode()
{
return m_allocated;
}
}
class Third
{
int m_x;
public override int GetHashCode()
{
return 0;
}
}
internal static void test()
{
testT<First>(100, 1000);
testT<First>(1000, 100);
testT<Second>(100, 1000);
testT<Second>(1000, 100);
testT<Third>(100, 100);
testT<Third>(1000, 10);
}
static void testT<T>(int objects, int iterations)
where T : new()
{
System.Diagnostics.Stopwatch stopWatch = System.Diagnostics.Stopwatch.StartNew();
for (int i = 0; i < iterations; ++i)
{
Dictionary<T, object> dictionary = new Dictionary<T, object>();
for (int j = 0; j < objects; ++j)
{
T t = new T();
dictionary.Add(t, null);
}
for (int k = 0; k < 100; ++k)
{
foreach (T t in dictionary.Keys)
{
object o = dictionary[t];
}
}
}
stopWatch.Stop();
string stopwatchMessage = string.Format("Stopwatch: {0} type, {1} objects, {2} iterations, {3} msec", typeof(T).Name, objects, iterations, stopWatch.ElapsedMilliseconds);
System.Console.WriteLine(stopwatchMessage);
stopWatch = System.Diagnostics.Stopwatch.StartNew();
for (int i = 0; i < iterations; ++i)
{
Dictionary<T, object> dictionary = new Dictionary<T, object>();
for (int j = 0; j < objects; ++j)
{
T t = new T();
dictionary.Add(t, null);
}
}
stopWatch.Stop();
stopwatchMessage = string.Format("Stopwatch (fill dictionary): {0} type, {1} objects, {2} iterations, {3} msec", typeof(T).Name, objects, iterations, stopWatch.ElapsedMilliseconds);
System.Console.WriteLine(stopwatchMessage);
{
Dictionary<T, object> dictionary = new Dictionary<T, object>();
for (int j = 0; j < objects; ++j)
{
T t = new T();
dictionary.Add(t, null);
}
stopWatch = System.Diagnostics.Stopwatch.StartNew();
for (int i = 0; i < iterations; ++i)
{
for (int k = 0; k < 100; ++k)
{
foreach (T t in dictionary.Keys)
{
object o = dictionary[t];
}
}
}
stopWatch.Stop();
stopwatchMessage = string.Format("Stopwatch (read from dictionary): {0} type, {1} objects, {2} iterations, {3} msec", typeof(T).Name, objects, iterations, stopWatch.ElapsedMilliseconds);
System.Console.WriteLine(stopwatchMessage);
}
}
}
