In my quest for primes, I've already asked this question: Can't create huge arrays. That led me to create my own class of fake arrays, based on a dictionary of arrays: private Dictionary<int, Array> arrays = new Dictionary<int, Array>();
I can now create fake arrays holding a huge number of bools (like 10 000 000 000) using the code below:
public class CustomArray
{
    private Dictionary<int, Array> arrays = new Dictionary<int, Array>();

    public CustomArray(ulong length)
    {
        int i = 0;
        while (length > 0x7FFFFFC7)
        {
            length -= 0x7FFFFFC7;
            arrays[i] = new bool[0x7FFFFFC7];
            i++;
        }
        arrays[i] = new bool[length];
    }
}
But it crashes as soon as I ask for a CustomArray of 100 000 000 000 elements. It works well for the first 25 iterations (my Dictionary contains 25 arrays of 0x7FFFFFC7 elements), but then it crashes with an OutOfMemoryException.
As a reminder: I've got 16 GB of memory, VS2013, the program is compiled in 64 bits, I've enabled the gcAllowVeryLargeObjects option, and I don't see any memory peak in the Task Manager.
How can I avoid this error?
100 000 000 000 bools means ~93 GB of memory. You only have ~50 GB (including the default allocated virtual memory).
Storing them as bits (not as bytes) would get you down to ~12 GB.
Look at System.Collections.BitArray.
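A single BitArray cannot hold 100 000 000 000 bits either (its length is an int), but the chunking idea from the question applies just as well to BitArray as to bool[]. A minimal sketch, using a hypothetical ChunkedBitArray class and the same chunk size as the question:
using System.Collections;
using System.Collections.Generic;

public class ChunkedBitArray
{
    // Same maximum chunk length as the bool[] version, but each flag costs 1 bit instead of 1 byte,
    // so 100 000 000 000 flags come to roughly 12 GB instead of ~93 GB.
    private const int ChunkSize = 0x7FFFFFC7;
    private readonly List<BitArray> chunks = new List<BitArray>();

    public ChunkedBitArray(ulong length)
    {
        while (length > (ulong)ChunkSize)
        {
            chunks.Add(new BitArray(ChunkSize));
            length -= (ulong)ChunkSize;
        }
        chunks.Add(new BitArray((int)length));
    }

    public bool this[ulong index]
    {
        get { return chunks[(int)(index / (ulong)ChunkSize)][(int)(index % (ulong)ChunkSize)]; }
        set { chunks[(int)(index / (ulong)ChunkSize)][(int)(index % (ulong)ChunkSize)] = value; }
    }
}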
Related
I apologize if this is in the incorrect forum. Despite finding a lot of array-manipulation questions on this site, most of them average or sum the array of numerics as a single set using LINQ, which works well over all values in one array. But I need to process each index across multiple arrays (of the same size).
My routine receives array data from devices, typically double[512] or ushort[512]. A single device will always produce arrays of the same size, but the array sizes can range from 256 to 2048 depending on the device. I need to hold CountToAverage arrays to average. Each time an array is received, it must push to and pop from the queue so that the number of arrays in the averaging process stays consistent (this part of the process is fixed in Setup() for this benchmark testing). For comparison purposes, the benchmark results are shown after the code.
What I am looking for is the fastest, most efficient way to average the values at each index of all the arrays and return a new array (of the same size) where each index is the average over the set of arrays. The count of arrays to be averaged can range from 3 to 25 (the code below sets the benchmark parameter to 10). I have two different averaging methods in the test; the second is significantly faster, 6-7 times faster than the first. My first question is: is there any way to achieve this faster, at O(1) or O(log n) time complexity?
Secondly, I am using a Queue (which may be changed to ConcurrentQueue for the implementation) as a holder for the arrays to be processed. My primary reason for using a queue is that I can guarantee FIFO processing of the feed of arrays, which is critical. Also, I can process the values in the Queue through a foreach loop (just like a List) without having to dequeue until I am ready. I would be interested to know whether this hurts performance, as I haven't benchmarked it. Keep in mind it must be thread-safe. If you have an alternative way to process multiple sets of array data in a thread-safe manner, I am "all ears".
The reason for the performance requirement is that this is not the only process happening. I have multiple devices sending array results, streamed at a rate of roughly one every 1-5 milliseconds per device, coming from different threads/processes/connections, and each result still has several other much more intensive algorithms to go through, so this cannot be a bottleneck.
Any insights on optimizations and performance are appreciated.
using System;
using System.Collections.Generic;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;
namespace ArrayAverage
{
    public class ArrayAverage
    {
        [Params(10)]
        public int CountToAverage;

        [Params(512, 2048)]
        public int PixelSize;

        static Queue<double[]> calcRepo = new Queue<double[]>();
        static List<double[]> spectra = new();

        [Benchmark]
        public double[] CalculateIndexAverages()
        {
            // This is too slow
            var avg = new double[PixelSize];
            for (int i = 0; i < PixelSize; i++)
            {
                foreach (var arrayData in calcRepo)
                {
                    avg[i] += arrayData[i];
                }
                avg[i] /= calcRepo.Count;
            }
            return avg;
        }

        [Benchmark]
        public double[] CalculateIndexAverages2()
        {
            // this is faster, but is it the fastest?
            var sum = new double[PixelSize];
            int cnt = calcRepo.Count;
            foreach (var arrayData in calcRepo)
            {
                for (int i = 0; i < PixelSize; i++)
                {
                    sum[i] += arrayData[i];
                }
            }
            var avg = new double[PixelSize];
            for (int i = 0; i < PixelSize; i++)
            {
                avg[i] = sum[i] / cnt;
            }
            return avg;
        }

        [GlobalSetup]
        public void Setup()
        {
            // Just generating some data as simple Triangular curve simulating a range of spectra
            for (double offset = 0; offset < CountToAverage; offset++)
            {
                var values = new double[PixelSize];
                var decrement = 0;
                for (int i = 0; i < PixelSize; i++)
                {
                    if (i > (PixelSize / 2))
                        decrement--;
                    values[i] = (offset / 7) + i + (decrement * 2);
                }
                calcRepo.Enqueue(values);
            }
        }
    }

    public class App
    {
        public static void Main()
        {
            BenchmarkRunner.Run<ArrayAverage>();
        }
    }
}
Benchmark results:
BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19043.1348 (21H1/May2021Update)
Intel Core i7-6700HQ CPU 2.60GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET SDK=6.0.100-preview.7.21379.14
[Host] : .NET 5.0.12 (5.0.1221.52207), X64 RyuJIT [AttachedDebugger]
DefaultJob : .NET 5.0.12 (5.0.1221.52207), X64 RyuJIT
| Method                  | Arrays To Average | Array Size |       Mean |     Error |    StdDev |
|-------------------------|-------------------|------------|-----------:|----------:|----------:|
| CalculateIndexAverages  |                10 |        512 |  32.164 μs | 0.5485 μs | 0.5130 μs |
| CalculateIndexAverages2 |                10 |        512 |   5.792 μs | 0.1135 μs | 0.2241 μs |
| CalculateIndexAverages  |                10 |       2048 | 123.628 μs | 2.3394 μs | 1.9535 μs |
| CalculateIndexAverages2 |                10 |       2048 |  22.311 μs | 0.4366 μs | 0.8093 μs |
When dealing with simple operations on a large amount of data, you'd be very interested in SIMD:
SIMD stands for "single instruction, multiple data". It’s a set of processor instructions that ... allows mathematical operations to execute over a set of values in parallel.
In your particular case, using the Vector<T> example would give you a quick win. Naively converting your fastest method to use Vectors already gives a ~2x speed-up on my PC.
public double[] CalculateIndexAverages4() {
    // Assumption: PixelSize is a round multiple of Vector<>.Count
    // If not, you'll have to add in the 'remainder' from the example.
    var batch = Vector<double>.Count;
    var sum = new double[PixelSize];
    foreach (var arrayData in calcRepo) {
        // Vectorised summing:
        for (int i = 0; i <= PixelSize - batch; i += batch) {
            var vSum = new Vector<double>(sum, i);
            var vData = new Vector<double>(arrayData, i);
            (vSum + vData).CopyTo(sum, i);
        }
    }
    var vCnt = Vector<double>.One * calcRepo.Count;
    // Reuse sum[] for averaging, so we don't incur memory allocation cost
    for (int i = 0; i <= PixelSize - batch; i += batch) {
        var vSum = new Vector<double>(sum, i);
        (vSum / vCnt).CopyTo(sum, i);
    }
    return sum;
}
The Vector<T>.Count gives you how many items are being parallelised into one instruction. In the case of double, it's likely to be 4 on most modern CPUs supporting AVX2.
If you're okay with losing precision and can go to float, you'll get a much bigger win by again doubling the amount of data processed in a single CPU op. All of this without even changing your algorithm.
You can further optimize the code by reducing memory allocations. If the method is called frequently, time spent on GC will dominate completely.
// Assuming the data fits on the stack. Some 100k pixels should be safe.
Span<double> sum = stackalloc double[PixelSize];
// ...
Span<double> avg = stackalloc double[PixelSize];
And possibly also remove the extra stack-allocation of avg and simply reuse the sum:
for (int i = 0; i < sum.Length; i++)
{
    sum[i] /= cnt;
}

// TODO: Avoid array allocation! Maybe use a pre-allocated array and fill it here.
return sum.ToArray();
In my opinion this is already fairly well-optimized code. A major reason the second option is faster is that it accesses memory linearly, instead of jumping between multiple different arrays. Another factor is that foreach loops have some overhead, so placing the foreach in the outer loop also helps a bit.
You might gain a little bit of performance by switching the queue and foreach loop out for a list/array and a for loop, but since PixelSize is much larger than CountToAverage I would expect the benefit to be fairly small.
Unrolling the loop to process, say, 4 values at a time might help a bit. The C# compiler can sometimes apply such an optimization automatically, but it is often difficult to tell which optimizations are applied, so it might be easier just to test.
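For illustration, a manually unrolled version of the summing loop from CalculateIndexAverages2 might look roughly like the sketch below. It assumes PixelSize is a multiple of 4; otherwise a small remainder loop is needed.
foreach (var arrayData in calcRepo)
{
    // Process 4 elements per iteration to reduce loop overhead
    // and give the JIT more room to schedule the additions.
    for (int i = 0; i < PixelSize; i += 4)
    {
        sum[i]     += arrayData[i];
        sum[i + 1] += arrayData[i + 1];
        sum[i + 2] += arrayData[i + 2];
        sum[i + 3] += arrayData[i + 3];
    }
}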
The next step would be to look at parallelization. Simple summing code like this might benefit from SIMD, processing multiple values at a time. The link shows that using processor-specific intrinsics has a much larger benefit than the more general Vector<T>, but may require separate code paths for each platform you are targeting. The link also has performance examples of summing values at various levels of optimization, with example code, so it is well worth a read.
Another option would be to use multiple threads with Parallel.For/Foreach, but at 6μs it is likely that the overhead will be larger than any gains unless the size of the data is significantly larger.
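For completeness, if the arrays ever become large enough for threading to pay off, a partitioned Parallel.For over the index range could look like this sketch (it assumes System.Threading.Tasks and snapshots the queue first so it isn't enumerated while other threads modify it):
var snapshot = calcRepo.ToArray();  // Queue<T>.ToArray copies the current contents
int cnt = snapshot.Length;
var avg = new double[PixelSize];

// Each index i is owned by exactly one iteration, so no locking is needed on avg.
Parallel.For(0, PixelSize, i =>
{
    double s = 0;
    foreach (var arrayData in snapshot)
        s += arrayData[i];
    avg[i] = s / cnt;
});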
I'm looking for a solution for indexing a large set of strings - say 100 000 000 (probably more) with an average length of 50 bytes each (= 5 000 000 000 = 5 GB of data; and then in UTF-16 and with .NET memory allocation, even more).
I then want to use the index to allow other processes to query whether a string exists in the index -- and this as fast as possible.
I've done some simple testing with a large memory-based HashSet - about 1 000 000 strings - and looking up e.g. 50 000 strings in that HashSet is only a matter of milliseconds.
Here's some pseudo code for what I want to achieve:
// 1) create huge disk based HashSet / Index / Lookup
using (var hs = DiskBasedHashSet<string>(@"c:\index.bin", .create)) {
    foreach (var s in lotsOfStringsToIndex) {
        hs.Add(s);
    }
}

// 2) use index to check if items exist - this needs to be fast
public static class Query {
    static var hs = DiskBasedHashSet<string>(@"c:\index.bin", .read);

    // callable from anywhere, and really fast
    public static QueryItem(string s) {
        return hs.Contains(s);
    }
}

foreach (var s in checkForThese) {
    var result = Query.QueryItem(s);
}
I've tried SQL Server, Lucene.NET, and B+ trees, with and without partitioning the data. Anyhow, these solutions are too slow and, I think, overqualified for this task. Imagine the overhead of creating a SQL query or a Lucene filter just to check for a string in a set.
I'm storing some data in a Math.NET vector, as I have to do some calculations with it as a whole. This data comes with time information about when it was collected. So for example:
Initial = 5, Time 2 = 7, Time 3 = 8, Time 4 = 10
So when I store the data in a Vector it looks like this.
stateVectorData = [5,7,8,10]
Now sometimes I need to extract a single entry of the vector. But I don't have the index itself, only the time information. So what I'm trying is a dictionary mapping the time to the index of the data in my stateVector.
Dictionary<int, int> stateDictionary = new Dictionary<int, int>(); //Dict(Time, index)
Every time I get new data I add an entry to the dictionary (and of course to the stateVector). So at time 2 I did:
stateDictionary.Add(2,1);
Now this works as long as I don't change my vector. Unfortunately, I have to delete an entry in the vector when it gets too old. Assume time 2 is too old: I delete the second entry and end up with a vector of:
stateVector = [5,8,10]
Now my dictionary has the wrong index values stored.
I can think of two possible ways to solve this.
Loop through the dictionary and decrease every value (with key > 2) by 1.
What I think would be more elegant is storing a reference to a vector entry in the dictionary instead of the index.
So something like
Dictionary<int, ref int> stateDictionary =
new Dictionary<int, ref int>(); //Dict(Time, reference to vectorentry)
stateDictionary.Add(2, ref stateVector[1]);
Using something like this, I wouldn't have to care about deleting entries in the vector, as I would still have references to the remaining vector entries. Now, I know it's not possible to store a reference like this in C#.
So my question is, is there any alternative to looping through the whole dictionary? Or is there another solution without a dictionary I don't see at the moment?
Edit to answer juharr:
Time information doesn't always increase by one. It depends on some parallel running process and how long it takes. It probably increases by 1 to 3, but it could also be more.
There are some values in the vector which never get deleted. I tried to show this with the initial value of 5 which stays in the vector.
Edit 2:
The vector stores at least 5000 to 6000 elements. The maximum is not defined at the moment, as it is restricted by the number of elements I can handle in real time; in my case I have about 0.01 s for my further calculations. This is why I'm looking for an efficient way, so I can increase the number of elements in the vector (or increase the maximum "age" of my vector entries).
I need the whole vector for a calculation about 3 times as often as I need to add a value.
Deleting an entry happens with the lowest frequency. Finding a single value by its time key will be the most common case, maybe 30 to 100 times a second.
I know this all sounds very vague, but the frequency of the finding and deleting parts depends on another process, which can vary a lot.
I hope you can help me, though. Thanks so far.
Edit 3:
@Robinson
The exact number of times I need the whole vector also depends on the parallel process. The minimum would be twice every iteration (so twice in 0.01 s), the maximum at least 4 to 6 times every iteration.
Again, the size of the vector is what I want to maximize, so assume it to be very big.
Edit Solution:
First thanks to all, who helped me.
After experimenting a bit, I'm using the following construction.
I'm using a List in which I save the indexes into my state vector.
Additionally, I use a Dictionary to map my time key to the List entry.
So when I delete something from the state vector, I only loop over the List, which seems to be much faster than looping over the dictionary.
So it is:
stateVectorData = [5,7,8,10]
IndexList = [1,2,3];
stateDictionary = { Time 2, indexInList = 0; Time 3, indexInList = 1; Time 4, indexInList = 2 }
TimeKey->stateDictionary->indexInList -> IndexList -> indexInStateVector -> data
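For illustration, a minimal sketch of that lookup chain with hypothetical names (a plain List<double> stands in for the Math.NET vector, and the sample values are the ones from above):
using System.Collections.Generic;

public class TimedStateLookup
{
    // stateVectorData = [5, 7, 8, 10]
    public List<double> StateVectorData = new List<double> { 5, 7, 8, 10 };

    // Positions of the time-stamped entries inside the state vector (entry 0 is the initial value).
    public List<int> IndexList = new List<int> { 1, 2, 3 };

    // Time key -> position in IndexList.
    public Dictionary<int, int> StateDictionary = new Dictionary<int, int>
    {
        { 2, 0 }, { 3, 1 }, { 4, 2 }
    };

    public double GetValueAtTime(int time)
    {
        int indexInList = StateDictionary[time];          // O(1)
        int indexInStateVector = IndexList[indexInList];  // O(1)
        return StateVectorData[indexInStateVector];
    }
}
When an entry is deleted from the state vector, only IndexList has to be walked to fix up the stored vector indexes, which is what makes this construction faster than adjusting the dictionary.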
You can try this:
public class Vector
{
    private List<int> _timeElements = new List<int>();

    public Vector(int[] times)
    {
        Add(times);
    }

    public void Add(int time)
    {
        _timeElements.Add(time);
    }

    public void Add(int[] times)
    {
        _timeElements.AddRange(times);
    }

    public void Remove(int time)
    {
        _timeElements.Remove(time);
        if (OnRemove != null)
            OnRemove(this, time);
    }

    public List<int> Elements { get { return _timeElements; } }

    public event Action<Vector, int> OnRemove;
}

public class Vectors
{
    private Dictionary<int, List<Vector>> _timeIndex;

    public Vectors(int maxTimeSize)
    {
        _timeIndex = new Dictionary<int, List<Vector>>(maxTimeSize);
        for (var i = 0; i < maxTimeSize; i++)
            _timeIndex.Add(i, new List<Vector>());
        List = new List<Vector>();
    }

    public List<Vector> FindVectorsByTime(int time)
    {
        return _timeIndex[time];
    }

    public List<Vector> List { get; private set; }

    public void Add(Vector vector)
    {
        List.Add(vector);
        vector.Elements.ForEach(element => _timeIndex[element].Add(vector));
        vector.OnRemove += OnRemove;
    }

    private void OnRemove(Vector vector, int time)
    {
        _timeIndex[time].Remove(vector);
    }
}
To use:
var vectors = new Vectors(maxTimeSize: 6000);
var vector1 = new Vector(new[] { 5, 30, 8, 20 });
var vector2 = new Vector(new[] { 25, 5, 23, 11 });
vectors.Add(vector1);
vectors.Add(vector2);
var findsTwo = vectors.FindVectorsByTime(time: 5);
vector1.Remove(time: 5);
var findsOne = vectors.FindVectorsByTime(time: 5);
The same can be done for adding times; also, the code is just for illustration purposes.
As an exercise in personal education and experimentation, I want to create my own HashTable class. Specifically, I'd like to write this object, without using any existing code (i.e. this object will not inherit from another class) other than mapping to existing interfaces for testing purposes.
Since I'm planning on writing this in C#, my "benchmark" is going to be the .NET HashSet<T> class. I can easily test the execution time of add, remove and look-up requests, but I have no clue how to test the size of the HashSet benchmark object, including all the buckets that are empty, waiting for future add requests.
How can I track the size of a HashSet<T> object as it dynamically grows to make room for future insertions?
To be clear, I don't need to know the exact number of bytes (I understand that the .NET framework makes it a bit difficult to get the exact size of many types of objects); rather, I'd prefer to know how many buckets are in use and how many are empty, waiting to be used, as I execute various types of tests.
The best way to get the number and size of the buckets is to use reflection. The only trouble is that you need to understand the collection's behavior first. After reading the code a bit and doing some trial and error, it seems you need to take the length of the private m_buckets array to get the number of buckets, and count the number of non-zero values to get the number of used buckets. The method would look like:
static void CountBuckets<T>(HashSet<T> hashSet)
{
    var field = typeof(HashSet<T>).GetField("m_buckets", System.Reflection.BindingFlags.Instance | System.Reflection.BindingFlags.NonPublic);
    var buckets = (int[])field.GetValue(hashSet);

    int numberOfBuckets = 0;
    int numberOfBucketsUsed = 0;
    if (buckets != null)
    {
        numberOfBuckets = buckets.Length;
        numberOfBucketsUsed = buckets.Where(i => i != 0).Count();
    }

    Console.WriteLine("Number of buckets: {0} / Used: {1}", numberOfBuckets, numberOfBucketsUsed);
}
To test it, I first created a custom class where I could manually set the hash code:
public class Hash
{
    private readonly int hashCode;

    public Hash(int hashCode)
    {
        this.hashCode = hashCode;
    }

    public override int GetHashCode()
    {
        return this.hashCode;
    }
}
From there, I did some tests:
var hashSet = new HashSet<Hash>();
CountBuckets(hashSet);
// Number of buckets: 0 / Used: 0
var firstHash = new Hash(0);
hashSet.Add(firstHash);
CountBuckets(hashSet);
// Number of buckets: 3 / Used: 1
hashSet.Add(new Hash(1));
hashSet.Add(new Hash(2));
CountBuckets(hashSet);
// Number of buckets: 3 / Used: 3
hashSet.Add(new Hash(3));
CountBuckets(hashSet);
// Number of buckets: 7 / Used: 4
hashSet.Add(new Hash(1));
CountBuckets(hashSet);
// Number of buckets: 7 / Used: 4
hashSet.Remove(firstHash);
CountBuckets(hashSet);
// Number of buckets: 7 / Used: 3
This is consistent with the intuitive behavior. First, the number of buckets is 0. After adding an element, it's expanded to 3. The bucket count stays stable until a fourth element is added, which expands it to 7. When simulating a hash collision, the number of used buckets stays stable, as expected. And removing an element decreases the number of used buckets.
I am not very familiar with the internals of HashSet, but you can look at its source and use reflection to get its internal values:
HashSet<int> hashSet = new HashSet<int>();
var countField = typeof(HashSet<int>).GetField("m_count", BindingFlags.NonPublic | BindingFlags.Instance);
var freeListField = typeof(HashSet<int>).GetField("m_freeList", BindingFlags.NonPublic | BindingFlags.Instance);
var count = countField.GetValue(hashSet);
var freeList = freeListField.GetValue(hashSet);
Note: such a violation of private member access is of course very ugly, but I believe it is acceptable in your development/testing phase.
That is an interesting question... I have a radical suggestion for you:
Start your application and get the size of memory before initializing the HashSet. You can do so by using Process.GetCurrentProcess().WorkingSet64 (on MSDN: http://msdn.microsoft.com/en-us/library/system.diagnostics.process.workingset64(v=vs.110).aspx).
Then populate your HashSet and print Process.GetCurrentProcess().WorkingSet64 again. The difference would be the size you're looking for.
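A minimal sketch of that idea (keep in mind WorkingSet64 measures the whole process, so other allocations and GC activity add noise; forcing a collection before each reading helps a little):
using System;
using System.Collections.Generic;
using System.Diagnostics;

class WorkingSetProbe
{
    static void Main()
    {
        GC.Collect();
        long before = Process.GetCurrentProcess().WorkingSet64;

        var hashSet = new HashSet<int>();
        for (int i = 0; i < 1000000; i++)
            hashSet.Add(i);

        GC.Collect();
        long after = Process.GetCurrentProcess().WorkingSet64;

        Console.WriteLine("Approximate size of the HashSet: {0:N0} bytes", after - before);
        GC.KeepAlive(hashSet);
    }
}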
Working with IPv4 addresses, which obviously can be stored in 32 bits.
I need to keep track of a list of IPs, which can be a fairly long list, so I would like to keep it as tight as possible. I also need fast lookup in the list, to check whether an IP is already loaded in it.
I'm currently looking at converting each IP to a UInt32, then storing the list in a HashSet.
I'm thinking there may be a better way, though?
Update: HashSets of course store hash codes and per-entry bookkeeping, which take more space than the 4 bytes of the uint itself. So to truly optimize this, specifically for IPv4 addresses, a similar structure is needed that is optimized for 4 bytes.
If the list is relatively static (i.e. doesn't change very often), then an array or a List<uint> would be a very simple way to store it. It gives you O(log n) lookup with BinarySearch, which is probably fast enough unless you're doing thousands of lookups per second. Inserting a new item in the list, though, is an O(n) operation. If you have to do a lot of inserts, this isn't the way to go.
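A quick sketch of that approach (the exact IPv4-to-uint mapping doesn't matter, as long as inserts and lookups use the same one):
using System;
using System.Collections.Generic;
using System.Net;

static class IpListSketch
{
    static uint ToUInt32(string ip)
    {
        // Raw address bytes interpreted as a uint; byte order is irrelevant for set membership.
        return BitConverter.ToUInt32(IPAddress.Parse(ip).GetAddressBytes(), 0);
    }

    static void Main()
    {
        var ips = new List<uint> { ToUInt32("10.0.0.1"), ToUInt32("192.168.1.1") };
        ips.Sort();                                                   // BinarySearch requires a sorted list

        bool known = ips.BinarySearch(ToUInt32("192.168.1.1")) >= 0;  // O(log n) lookup
        Console.WriteLine(known);                                     // True
    }
}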
A HashSet<uint> works well and has much faster lookup and insertion. But it will cost you. A HashSet<uint> will occupy about 3 times as much memory as a List<uint>.
Justification of 3X memory use:
The program below allocates a List<uint> that contains 89,478,457 items, which used to be the maximum sized HashSet one could create. (Up through .NET 4.0.) It then fills that list with unique values and creates a HashSet<uint> from the list.
The program calculates the total allocated memory by calling GC.GetTotalMemory(true), which forces a garbage collection. It then computes the amount of memory required for the list and for the hash set.
Tests run with .NET 4.5, Visual Studio 2012. Run in release mode without the debugger attached.
My output:
Max size = 89,478,457
Starting memory = 53,240
89,000,000
After list populated = 357,967,136
89,478,457 items in the HashSet
After HashSet populated = 1,789,622,704
List occupies 357,913,896
HashSet occupies 1,431,655,568
HashSet occupies 4.00 times the memory of List
Press Enter:
So I was wrong ... it's 4X for uint. It's 3.5X for ulong.
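A rough back-of-the-envelope for the uint case, assuming the .NET Framework HashSet<T> layout (each entry slot stores the cached hash code, a 'next' index and the value, plus one int per bucket):
List<uint>:    4 bytes per item (just the value)
HashSet<uint>: 4 (hash code) + 4 (next index) + 4 (value) = 12 bytes per slot
               + ~4 bytes per bucket entry
               ≈ 16 bytes per item, i.e. roughly 4x the List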
private void DoStuff()
{
    int maxSize = 89478457;
    //89000000;
    Console.WriteLine("Max size = {0:N0}", maxSize);
    var startMem = GC.GetTotalMemory(true);
    Console.WriteLine("Starting memory = {0:N0}", startMem);

    // Initialize a List<uint> to hold maxSize items
    var l = new List<uint>(maxSize);

    // now add items to the list
    for (uint i = 0; i < maxSize; i++)
    {
        if ((i % 1000000) == 0)
        {
            Console.Write("\r{0:N0}", i);
        }
        l.Add(i);
    }
    Console.WriteLine();

    var memAfterListAlloc = GC.GetTotalMemory(true);
    Console.WriteLine("After list populated = {0:N0}", memAfterListAlloc);

    // Construct a HashSet from that list
    var h = new HashSet<uint>(l);
    Console.WriteLine("{0:N0} items in the HashSet", h.Count);

    var memAfterHashAlloc = GC.GetTotalMemory(true);
    Console.WriteLine("After HashSet populated = {0:N0}", memAfterHashAlloc);

    var listMem = memAfterListAlloc - startMem;
    var hashMem = memAfterHashAlloc - memAfterListAlloc;
    Console.WriteLine("List occupies {0:N0}", listMem);
    Console.WriteLine("HashSet occupies {0:N0}", hashMem);
    Console.WriteLine("HashSet occupies {0:N2} times the memory of List", (double)hashMem / listMem);

    GC.KeepAlive(l);
    GC.KeepAlive(h);

    Console.Write("Press Enter:");
    Console.ReadLine();
}