Working with IPv4 addresses, which can of course be stored in 32 bits.
I need to keep track of a list of IPs, which can be a fairly long list, so I would like to keep it as tight as possible. I also need quick lookups against the list, to check whether an IP is already loaded in it.
I'm currently looking at: convert the IP to a UInt32, then store the list in a HashSet.
I'm thinking there may be a better way though?
update: HashSets of course carry hashing and bookkeeping overhead per entry, which is more than the 4 bytes of a uint. So to truly optimize this, specifically for IPv4 addresses, a similar structure is needed that is optimized for 4-byte keys.
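For reference, the conversion described above might look like the following minimal sketch (ToUInt32 is a made-up helper name here; the byte order does not matter as long as the value is only used as a set key):
using System;
using System.Collections.Generic;
using System.Net;

class IpSetDemo
{
    // Hypothetical helper: pack an IPv4 address into a uint for use as a set key.
    static uint ToUInt32(string ip)
    {
        byte[] bytes = IPAddress.Parse(ip).GetAddressBytes(); // 4 bytes, network order
        return BitConverter.ToUInt32(bytes, 0);
    }

    static void Main()
    {
        var seen = new HashSet<uint>();
        seen.Add(ToUInt32("192.168.0.1"));
        Console.WriteLine(seen.Contains(ToUInt32("192.168.0.1"))); // True
        Console.WriteLine(seen.Contains(ToUInt32("10.0.0.1")));    // False
    }
}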
If the list is relatively static (i.e. doesn't change very often), then an array or a List<uint> would be a very simple way to store it. It gives you O(log n) lookup with BinarySearch, which is probably fast enough unless you're doing thousands of lookups per second. Inserting a new item in the list, though, is an O(n) operation. If you have to do a lot of inserts, this isn't the way to go.
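A minimal sketch of that approach, assuming the list is sorted once up front (knownIps and someIp are placeholder names):
var ips = new List<uint>(knownIps);          // knownIps: your loaded values
ips.Sort();                                  // sort once while the list is static
bool found = ips.BinarySearch(someIp) >= 0;  // O(log n); a negative result means not present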
A HashSet<uint> works well and has much faster lookup and insertion. But it will cost you. A HashSet<uint> will occupy about 3 times as much memory as a List<uint>.
Justification of 3X memory use:
The program below allocates a List<uint> that contains 89,478,457 items, which used to be the maximum-sized HashSet one could create (up through .NET 4.0). It then fills that list with unique values and creates a HashSet<uint> from the list.
The program calculates the total allocated memory by calling GC.GetTotalMemory(true), which forces a garbage collection. It then computes the amount of memory required for the list and for the hash set.
Tests run with .NET 4.5, Visual Studio 2012. Run in release mode without the debugger attached.
My output:
Max size = 89,478,457
Starting memory = 53,240
89,000,000
After list populated = 357,967,136
89,478,457 items in the HashSet
After HashSet populated = 1,789,622,704
List occupies 357,913,896
HashSet occupies 1,431,655,568
HashSet occupies 4.00 times the memory of List
Press Enter:
So I was wrong ... it's 4X for uint. It's 3.5X for ulong.
private void DoStuff()
{
    int maxSize = 89478457;
    //89000000;
    Console.WriteLine("Max size = {0:N0}", maxSize);
    var startMem = GC.GetTotalMemory(true);
    Console.WriteLine("Starting memory = {0:N0}", startMem);

    // Initialize a List<uint> with capacity for maxSize items
    var l = new List<uint>(maxSize);

    // now add items to the list
    for (uint i = 0; i < maxSize; i++)
    {
        if ((i % 1000000) == 0)
        {
            Console.Write("\r{0:N0}", i);
        }
        l.Add(i);
    }
    Console.WriteLine();

    var memAfterListAlloc = GC.GetTotalMemory(true);
    Console.WriteLine("After list populated = {0:N0}", memAfterListAlloc);

    // Construct a HashSet from that list
    var h = new HashSet<uint>(l);
    Console.WriteLine("{0:N0} items in the HashSet", h.Count);
    var memAfterHashAlloc = GC.GetTotalMemory(true);
    Console.WriteLine("After HashSet populated = {0:N0}", memAfterHashAlloc);

    var listMem = memAfterListAlloc - startMem;
    var hashMem = memAfterHashAlloc - memAfterListAlloc;
    Console.WriteLine("List occupies {0:N0}", listMem);
    Console.WriteLine("HashSet occupies {0:N0}", hashMem);
    Console.WriteLine("HashSet occupies {0:N2} times the memory of List", (double)hashMem / listMem);

    GC.KeepAlive(l);
    GC.KeepAlive(h);

    Console.Write("Press Enter:");
    Console.ReadLine();
}
I apologize if this is in the incorrect forum. Despite finding a lot of array-manipulation questions on this site, most of them are about averaging/summing an array of numerics as a set using LINQ, which works well across all values of a single array. But I need to process each index over multiple arrays (of the same size).
My routine receives array data from devices, typically double[512] or ushort[512]. A single device will always send arrays of the same size, but across devices the array sizes can range from 256 to 2048 depending on the device. I need to hold CountToAverage arrays to average. Each time an array is received, it must be pushed onto the queue and the oldest popped off, so that the number of arrays in the averaging process stays consistent (this part of the process is fixed in Setup() for this benchmark testing). For comparison purposes, the benchmark results are shown after the code.
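For illustration, that push-and-pop maintenance could look roughly like the sketch below (OnArrayReceived is just a hypothetical handler name; in the benchmark this is fixed in Setup()):
void OnArrayReceived(double[] data)
{
    calcRepo.Enqueue(data);                  // newest array in
    while (calcRepo.Count > CountToAverage)  // keep exactly CountToAverage arrays
        calcRepo.Dequeue();                  // oldest array out
}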
What I am looking for is the fastest, most efficient way to average the values at each index across all the arrays and return a new array (of the same size) where each index is the average over the set of arrays. The count of arrays to be averaged can range from 3 to 25 (the code below sets the benchmark param to 10). I have two different averaging methods in the test; the second is significantly faster, 6-7 times faster than the first. My first question is: is there any way to do this faster, ideally at O(1) or O(log n) time complexity?
Secondarily, I am using a Queue (which may be changed to a ConcurrentQueue for the implementation) as a holder for the arrays to be processed. My primary reason for using a queue is that I can guarantee FIFO processing of the feed of arrays, which is critical. Also, I can iterate over the values in the Queue with a foreach loop (just like a List) without having to dequeue until I am ready. I would be interested to know whether this hinders performance, as I haven't benchmarked it. Keep in mind it must be thread-safe. If you have an alternative way to process multiple sets of array data in a thread-safe manner, I am "all ears".
The reason for the performance requirement is that this is not the only process happening. I have multiple devices sending array results, "streamed" at an approximate rate of one every 1-5 milliseconds per device, coming from different threads/processes/connections, and each result still has several other, much more intensive algorithms to go through, so this cannot become a bottleneck.
Any insights on optimizations and performance are appreciated.
using System;
using System.Collections.Generic;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;

namespace ArrayAverage
{
    public class ArrayAverage
    {
        [Params(10)]
        public int CountToAverage;

        [Params(512, 2048)]
        public int PixelSize;

        static Queue<double[]> calcRepo = new Queue<double[]>();
        static List<double[]> spectra = new();

        [Benchmark]
        public double[] CalculateIndexAverages()
        {
            // This is too slow
            var avg = new double[PixelSize];
            for (int i = 0; i < PixelSize; i++)
            {
                foreach (var arrayData in calcRepo)
                {
                    avg[i] += arrayData[i];
                }
                avg[i] /= calcRepo.Count;
            }
            return avg;
        }

        [Benchmark]
        public double[] CalculateIndexAverages2()
        {
            // this is faster, but is it the fastest?
            var sum = new double[PixelSize];
            int cnt = calcRepo.Count;
            foreach (var arrayData in calcRepo)
            {
                for (int i = 0; i < PixelSize; i++)
                {
                    sum[i] += arrayData[i];
                }
            }
            var avg = new double[PixelSize];
            for (int i = 0; i < PixelSize; i++)
            {
                avg[i] = sum[i] / cnt;
            }
            return avg;
        }

        [GlobalSetup]
        public void Setup()
        {
            // Just generating some data as a simple triangular curve simulating a range of spectra
            for (double offset = 0; offset < CountToAverage; offset++)
            {
                var values = new double[PixelSize];
                var decrement = 0;
                for (int i = 0; i < PixelSize; i++)
                {
                    if (i > (PixelSize / 2))
                        decrement--;
                    values[i] = (offset / 7) + i + (decrement * 2);
                }
                calcRepo.Enqueue(values);
            }
        }
    }

    public class App
    {
        public static void Main()
        {
            BenchmarkRunner.Run<ArrayAverage>();
        }
    }
}
Benchmark results:
BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19043.1348 (21H1/May2021Update)
Intel Core i7-6700HQ CPU 2.60GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET SDK=6.0.100-preview.7.21379.14
[Host] : .NET 5.0.12 (5.0.1221.52207), X64 RyuJIT [AttachedDebugger]
DefaultJob : .NET 5.0.12 (5.0.1221.52207), X64 RyuJIT
| Method                  | Arrays To Average | Array Size |       Mean |     Error |    StdDev |
|-------------------------|------------------:|-----------:|-----------:|----------:|----------:|
| CalculateIndexAverages  |                10 |        512 |  32.164 μs | 0.5485 μs | 0.5130 μs |
| CalculateIndexAverages2 |                10 |        512 |   5.792 μs | 0.1135 μs | 0.2241 μs |
| CalculateIndexAverages  |                10 |       2048 | 123.628 μs | 2.3394 μs | 1.9535 μs |
| CalculateIndexAverages2 |                10 |       2048 |  22.311 μs | 0.4366 μs | 0.8093 μs |
When dealing with simple operations on a large amount of data, you'd be very interested in SIMD:
SIMD stands for "single instruction, multiple data". It’s a set of processor instructions that ... allows mathematical operations to execute over a set of values in parallel.
In your particular case, using the Vector<T> example would give you a quick win. Naively converting your fastest method to use Vector<T> already gives a ~2x speed-up on my PC.
public double[] CalculateIndexAverages4() {
    // Assumption: PixelSize is a round multiple of Vector<>.Count
    // If not, you'll have to add in the 'remainder' from the example.
    var batch = Vector<double>.Count;
    var sum = new double[PixelSize];
    foreach (var arrayData in calcRepo) {
        // Vectorised summing:
        for (int i = 0; i <= PixelSize - batch; i += batch) {
            var vSum = new Vector<double>(sum, i);
            var vData = new Vector<double>(arrayData, i);
            (vSum + vData).CopyTo(sum, i);
        }
    }
    var vCnt = Vector<double>.One * calcRepo.Count;
    // Reuse sum[] for averaging, so we don't incur memory allocation cost
    for (int i = 0; i <= PixelSize - batch; i += batch) {
        var vSum = new Vector<double>(sum, i);
        (vSum / vCnt).CopyTo(sum, i);
    }
    return sum;
}
The Vector<T>.Count gives you how many items are being parallelised into one instruction. In the case of double, it's likely to be 4 on most modern CPUs supporting AVX2.
If you're okay with losing precision and can go to float, you'll get a much bigger win by again doubling the amount of data processed in a single CPU op. All of this without even changing your algorithm.
You can further optimize the code by reducing memory allocations. If the method is called frequently, time spent on GC will dominate completely.
// Assuming the data fits on the stack. Some 100k pixels should be safe.
Span<double> sum = stackalloc double[PixelSize];
// ...
Span<double> avg = stackalloc double[PixelSize];
And possibly also remove the extra stack-allocation of avg and simply reuse the sum:
for (int i = 0; i < sum.Length; i++)
{
sum[i] /= cnt;
}
// TODO: Avoid array allocation! Maybe use a pre-allocated array and fill it here.
return sum.ToArray();
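Putting those pieces together, an allocation-free variant might look roughly like this (a sketch only: AverageInto and its result parameter are hypothetical, the caller owns the pre-allocated buffer, and PixelSize is assumed small enough to fit on the stack):
public void AverageInto(double[] result)
{
    Span<double> sum = stackalloc double[PixelSize]; // scratch space, no heap allocation
    sum.Clear();                                     // make sure the scratch space starts at zero
    foreach (var arrayData in calcRepo)
    {
        for (int i = 0; i < PixelSize; i++)
        {
            sum[i] += arrayData[i];
        }
    }
    int cnt = calcRepo.Count;
    for (int i = 0; i < PixelSize; i++)
    {
        result[i] = sum[i] / cnt;  // write into the caller's pre-allocated buffer
    }
}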
In my opinion this is already fairly well-optimized code. A major reason the second option is faster is that it accesses memory linearly instead of jumping between multiple different arrays. Another factor is that foreach loops have some overhead, so keeping the foreach in the outer loop also helps a bit.
You might gain a little performance by switching the queue and foreach loop to a list/array and a for loop, but since PixelSize is much larger than CountToAverage, I would expect the benefit to be fairly small.
Unrolling the loop to process, say, 4 values at a time might help a bit. The C# compiler can apply such optimizations automatically, but it is often difficult to tell which optimizations are applied, so it may be easier just to test.
The next step would be to look at parallelization. Simple summing code like this might benefit from SIMD to process multiple values at a time. The link shows that using processor-specific intrinsics has a much larger benefit than the more general Vector<T>, but may require separate code paths for each platform you are targeting. The link also has performance examples of summing values at various levels of optimization, with example code, so it is well worth a read.
Another option would be to use multiple threads with Parallel.For/Foreach, but at 6μs it is likely that the overhead will be larger than any gains unless the size of the data is significantly larger.
I have a list of 500000 randomly generated Tuple<long,long,string> objects on which I am performing a simple "between" search:
var data = new List<Tuple<long,long,string>>(500000);
...
var cnt = data.Count(t => t.Item1 <= x && t.Item2 >= x);
When I generate my random array and run my search for 100 randomly generated values of x, the searches complete in about four seconds. Knowing of the great wonders that sorting does to searching, however, I decided to sort my data - first by Item1, then by Item2, and finally by Item3 - before running my 100 searches. I expected the sorted version to perform a little faster because of branch prediction: my thinking has been that once we get to the point where Item1 == x, all further checks of t.Item1 <= x would predict the branch correctly as "no take", speeding up the tail portion of the search. Much to my surprise, the searches took twice as long on a sorted array!
I tried switching around the order in which I ran my experiments, and used a different seed for the random number generator, but the effect was the same: searches in an unsorted array ran nearly twice as fast as searches in the same array, but sorted!
Does anyone have a good explanation of this strange effect? The source code of my tests follows; I am using .NET 4.0.
private const int TotalCount = 500000;
private const int TotalQueries = 100;

private static long NextLong(Random r) {
    var data = new byte[8];
    r.NextBytes(data);
    return BitConverter.ToInt64(data, 0);
}

private class TupleComparer : IComparer<Tuple<long,long,string>> {
    public int Compare(Tuple<long,long,string> x, Tuple<long,long,string> y) {
        var res = x.Item1.CompareTo(y.Item1);
        if (res != 0) return res;
        res = x.Item2.CompareTo(y.Item2);
        return (res != 0) ? res : String.CompareOrdinal(x.Item3, y.Item3);
    }
}

static void Test(bool doSort) {
    var data = new List<Tuple<long,long,string>>(TotalCount);
    var random = new Random(1000000007);
    var sw = new Stopwatch();
    sw.Start();
    for (var i = 0 ; i != TotalCount ; i++) {
        var a = NextLong(random);
        var b = NextLong(random);
        if (a > b) {
            var tmp = a;
            a = b;
            b = tmp;
        }
        var s = string.Format("{0}-{1}", a, b);
        data.Add(Tuple.Create(a, b, s));
    }
    sw.Stop();
    if (doSort) {
        data.Sort(new TupleComparer());
    }
    Console.WriteLine("Populated in {0}", sw.Elapsed);
    sw.Reset();
    var total = 0L;
    sw.Start();
    for (var i = 0 ; i != TotalQueries ; i++) {
        var x = NextLong(random);
        var cnt = data.Count(t => t.Item1 <= x && t.Item2 >= x);
        total += cnt;
    }
    sw.Stop();
    Console.WriteLine("Found {0} matches in {1} ({2})", total, sw.Elapsed, doSort ? "Sorted" : "Unsorted");
}

static void Main() {
    Test(false);
    Test(true);
    Test(false);
    Test(true);
}
Populated in 00:00:01.3176257
Found 15614281 matches in 00:00:04.2463478 (Unsorted)
Populated in 00:00:01.3345087
Found 15614281 matches in 00:00:08.5393730 (Sorted)
Populated in 00:00:01.3665681
Found 15614281 matches in 00:00:04.1796578 (Unsorted)
Populated in 00:00:01.3326378
Found 15614281 matches in 00:00:08.6027886 (Sorted)
When you are using the unsorted list all tuples are accessed in memory-order. They have been allocated consecutively in RAM. CPUs love accessing memory sequentially because they can speculatively request the next cache line so it will always be present when needed.
When you are sorting the list you put it into random order because your sort keys are randomly generated. This means that the memory accesses to tuple members are unpredictable. The CPU cannot prefetch memory and almost every access to a tuple is a cache miss.
This is a nice example for a specific advantage of GC memory management: data structures which have been allocated together and are used together perform very nicely. They have great locality of reference.
The penalty from cache misses outweighs the saved branch prediction penalty in this case.
Try switching to a struct-tuple. This will restore performance because no pointer-dereference needs to occur at runtime to access tuple members.
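A hedged sketch of that idea (Range3 is a made-up name; the string stays a reference, but the two longs the predicate touches now live inline in the List's backing array):
readonly struct Range3
{
    public readonly long Item1;
    public readonly long Item2;
    public readonly string Item3;
    public Range3(long item1, long item2, string item3)
    {
        Item1 = item1;
        Item2 = item2;
        Item3 = item3;
    }
}

// var data = new List<Range3>(TotalCount);
// var cnt = data.Count(t => t.Item1 <= x && t.Item2 >= x); // no per-element dereference for the longs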
Chris Sinclair notes in the comments that "for TotalCount around 10,000 or less, the sorted version does perform faster". This is because a small list fits entirely into the CPU cache. The memory accesses might be unpredictable but the target is always in cache. I believe there is still a small penalty because even a load from cache takes some cycles. But that seems not to be a problem because the CPU can juggle multiple outstanding loads, thereby increasing throughput. Whenever the CPU hits a wait for memory it will still speed ahead in the instruction stream to queue as many memory operations as it can. This technique is used to hide latency.
This kind of behavior shows how hard it is to predict performance on modern CPUs. The fact that we are only 2x slower when going from sequential to random memory access tells me how much is going on under the covers to hide memory latency. A memory access can stall the CPU for 50-200 cycles; given that number, one could expect the program to become more than 10x slower when introducing random memory accesses.
LINQ doesn't know whether your list is sorted or not.
Since Count with a predicate parameter is an extension method on all IEnumerables, I think it doesn't even know whether it's running over a collection with efficient random access. So it simply checks every element, and Usr explained why performance got lower.
To exploit performance benefits of sorted array (such as binary search), you'll have to do a little bit more coding.
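For example, one possible direction (a sketch only; it is still linear in the size of the matching prefix, and a truly sub-linear count would need something like an interval tree):
// Assumes the list is sorted by Item1. Binary-search the first index with
// Item1 > x, then only that prefix needs to be scanned for Item2 >= x.
static int CountBetween(List<Tuple<long, long, string>> sorted, long x)
{
    int lo = 0, hi = sorted.Count;
    while (lo < hi)
    {
        int mid = lo + (hi - lo) / 2;
        if (sorted[mid].Item1 <= x) lo = mid + 1; else hi = mid;
    }
    int count = 0;
    for (int i = 0; i < lo; i++)
        if (sorted[i].Item2 >= x) count++;
    return count;
}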
I am wondering if there is a difference (in terms of performance and memory use) between defining a list with the default capacity or specifying one.
List<object> m_objects = new List<object>();
or
List<object> m_objects = new List<object>(100);
Both will increase the size by doubling up if more items are added, right?
Thanks,
If you know that you will have more than 100 items, the second one is faster.
Every time it "doubles up", it needs to copy the contents of the entire existing array. For large lists, this can be slow.
If you specify the capacity, it won't need to resize at all until it gets bigger than what you specified.
If you never add more than 100 items, it just wastes a bit of memory (specifically, IntPtr.Size * (Capacity - Count)).
The capacity of a list starts at 0 if you don't specify it in the constructor, and it grows when necessary (first to 4, then always doubling the previous value).
var list = new List<object>();
int capacity = list.Capacity;
Console.WriteLine("Initial capacity: {0}", list.Capacity);
for (int i = 0; i < 10000; i++)
{
    list.Add(new object());
    if (list.Capacity > capacity)
    {
        capacity = list.Capacity;
        Console.WriteLine("Capacity is {0} when count is {1}", list.Capacity, list.Count);
    }
}
List<T> is, under the covers, an array. Its initial size looks to be 4 elements once the first item is added. When that is exceeded, the underlying array is reallocated at twice the size. So if you know the likely maximum size of the list, you're better off specifying it, as you'll avoid relatively expensive allocations and copying.
If your list holds fewer than about 100,000 items, the difference in wall-clock milliseconds is negligible.
But if your list grows past, say, 1,000,000 items, the second way will be measurably faster.
Possible Duplicate:
When should I use a List vs a LinkedList
This question is related to my earlier question, which was merged: List vs LinkedList.
If I don't expect to use access by index for my data structure, how much do I save by using a LinkedList over a List? If I am not 100% sure I will never use access by index, I would like to know the difference.
Suppose I have N instances. Inserting and removing in a LinkedList is an O(1) op, whereas in a List it may be O(n); but since List is heavily optimized, it would be nice to know what the difference is for some values of n, say N = 1,000,000 and N = 1,000,000,000.
OK I did this experiment and here is the result:
These are the elapsed ticks for a list and a linked list with 1,000,000 items:
LinkedList 500 insert/remove operations: 10171
List 500 insert/remove operations: 968465
The linked list is about 100 times faster for 1,000,000 items.
Here is the code:
static void Main(string[] args)
{
    const int N = 1000 * 1000;
    Random r = new Random();
    LinkedList<int> linkedList = new LinkedList<int>();
    List<int> list = new List<int>();
    List<LinkedListNode<int>> linkedListNodes = new List<LinkedListNode<int>>();

    for (int i = 0; i < N; i++)
    {
        list.Add(r.Next());
        LinkedListNode<int> linkedListNode = linkedList.AddFirst(r.Next());
        if (r.Next() % 997 == 0)
            linkedListNodes.Add(linkedListNode);
    }

    Stopwatch stopwatch = new Stopwatch();
    stopwatch.Start();
    for (int i = 0; i < 500; i++)
    {
        linkedList.AddBefore(linkedListNodes[i], r.Next());
        linkedList.Remove(linkedListNodes[i]);
    }
    stopwatch.Stop();
    Console.WriteLine("LinkedList 500 insert/remove operations: {0}", stopwatch.ElapsedTicks);

    stopwatch.Reset();
    stopwatch.Start();
    for (int i = 0; i < 500; i++)
    {
        list.Insert(r.Next(0, list.Count), r.Next());
        list.RemoveAt(r.Next(0, list.Count));
    }
    stopwatch.Stop();
    Console.WriteLine("List 500 insert/remove operations: {0}", stopwatch.ElapsedTicks);

    Console.Read();
}
Don't worry about it. Write your app as you would normally. When it's finished, run it a few times with real data and profile the performance. Swap out the class you're using and compare results.
List<T> uses an array internally, and that array always has a fixed size. If, say, its capacity is 16, then even if you store only one item the list holds an array of 16 slots with 15 of them empty.
Now, when you have stored all 16 items and try to add the 17th, it creates a new array of 32 (16 + 16), copies all 16 items from the old array to the new one, and then puts the new item in the 17th slot of the 32-element array.
When you remove an item from the 1st position, all items after it are moved one step forward, so the 2nd becomes the 1st, the 3rd becomes the 2nd, and so on.
A linked list, on the other hand, is built from nodes rather than an array; it only occupies the nodes you create, and each node stores references to its previous and next nodes.
Performance-wise, List works great if you add/remove items rarely but iterate (read the entire list, or access items by index) frequently. List is better for random-access scenarios, since you can fetch an item very quickly by index.
A linked list works great if you add/remove items frequently but iterate the entire list rarely. A linked list is more appropriate for sequential access, where you navigate backwards or forwards from the current node, but it performs very poorly when you try to access items by index.
Back-inserting in a List<T> actually runs in amortized constant time. However, removing elements is expensive (O(n)).
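A minimal sketch of those cost differences (assuming you already hold a node reference for the linked-list operations, as the benchmark above does):
var linked = new LinkedList<int>();
LinkedListNode<int> node = linked.AddLast(1);
linked.AddAfter(node, 2);        // O(1) with a node reference in hand
linked.Remove(node);             // O(1)

var list = new List<int> { 1, 2, 3 };
list.Add(4);                     // amortized O(1) append
list.Insert(1, 99);              // O(n): shifts every later element
list.RemoveAt(0);                // O(n): shifts every later element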
Is the Lookup Time for a HashTable or Dictionary Always O(1) as long as it has a Unique Hash Code?
If a HashTable has 100 Million Rows would it take the same amount of time to look up as something that has 1 Row?
No. It is technically possible but it would be extremely rare to get the exact same amount of overhead. A hash table is organized into buckets. Dictionary<> (and Hashtable) calculate a bucket number for the object with an expression like this:
int bucket = key.GetHashCode() % totalNumberOfBuckets;
So two objects with different hash codes can end up in the same bucket. A bucket is effectively a list; the indexer then searches that list for the key, which is O(n) where n is the number of items in the bucket.
Dictionary<> dynamically increases the value of totalNumberOfBuckets to keep the bucket search efficient. When you pump a hundred million items in the dictionary, there will be thousands of buckets. The odds that the bucket is empty when you add an item will be quite small. But if it is by chance then, yes, it will take just as long to retrieve the item.
The amount of overhead increases very slowly as the number of items grows. This is called amortized O(1).
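To illustrate the collision point, here is a hedged sketch of a key type whose GetHashCode always returns the same value; every lookup then degenerates into walking a single bucket, i.e. O(n):
class BadKey
{
    public int Value;
    public override int GetHashCode() => 42;  // every instance lands in the same bucket
    public override bool Equals(object obj) => obj is BadKey other && other.Value == Value;
}

// var d = new Dictionary<BadKey, string>();
// Every lookup now compares against all colliding entries instead of being O(1).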
Might be helpful : .NET HashTable Vs Dictionary - Can the Dictionary be as fast?
As long as there are no collisions with the hashes, yes.
var dict = new Dictionary<string, string>();
for (int i = 0; i < 100; i++) {
    dict.Add("" + i, "" + i);
}

long start = DateTime.Now.Ticks;
string s = dict["10"];
Console.WriteLine(DateTime.Now.Ticks - start);

for (int i = 100; i < 100000; i++) {
    dict.Add("" + i, "" + i);
}

start = DateTime.Now.Ticks;
s = dict["10000"];
Console.WriteLine(DateTime.Now.Ticks - start);
This prints 0 in both cases, so it seems the answer would be yes.
[Got modded down, so I'll explain better]
It seems that it is constant. But it depends on the hash function giving a different result for every key. As there is no hash function that can guarantee that, it all boils down to the data that you feed to the Dictionary. So you will have to test with your own data to see if the lookup is constant.