Fast reading of an array of structs from a binary file

Is it possible to read an array of structs from a binary file in one call?
For example, I have a file containing thousands of vertices:
struct Vector3 { float x, y, z; }
I need a C# port of this C++ code:
Vector3 *verts = new Vector3[num_verts];
fread ( verts, sizeof(Vector3), num_verts, f );

Here's one (of a few) ways:
void Main()
{
var pts =
(from x in Enumerable.Range(0, 10)
from y in Enumerable.Range(0, 10)
from z in Enumerable.Range(0, 10)
select new Vector3(){X = x, Y = y, Z = z}).ToArray();
// write it out...
var bigAssByteArray = new byte[Marshal.SizeOf(typeof(Vector3)) * pts.Length];
var pinnedHandle = GCHandle.Alloc(pts, GCHandleType.Pinned);
Marshal.Copy(pinnedHandle.AddrOfPinnedObject(), bigAssByteArray, 0, bigAssByteArray.Length);
pinnedHandle.Free();
File.WriteAllBytes(@"c:\temp\vectors.out", bigAssByteArray);
// ok, read it back...
var readBytes = File.ReadAllBytes(@"c:\temp\vectors.out");
var numVectors = readBytes.Length / Marshal.SizeOf(typeof(Vector3));
var readVectors = new Vector3[numVectors];
pinnedHandle = GCHandle.Alloc(readVectors, GCHandleType.Pinned);
Marshal.Copy(readBytes, 0, pinnedHandle.AddrOfPinnedObject(), readBytes.Length);
pinnedHandle.Free();
var allEqual =
pts.Zip(readVectors,
(v1,v2) => (v1.X == v2.X) && (v1.Y == v2.Y) && (v1.Z == v2.Z))
.All(p => p);
Console.WriteLine("pts == readVectors? {0}", allEqual);
}
struct Vector3
{
public float X;
public float Y;
public float Z;
}

Yes, it's possible, but you would have to add attributes to the structure that specify exactly how it's laid out in memory, so that there is no padding in the struct.
Often it's easier to just convert the data yourself. The vast majority of the processing time will be spent reading the data from the file, so the overhead of converting the data is small. Example:
byte[] bytes = File.ReadAllBytes(fileName);
Vector3[] data = new Vector3[bytes.Length / 12];
for (var i = 0; i < data.Length; i++) {
Vector3 item;
item.x = BitConverter.ToSingle(bytes, i * 12);
item.y = BitConverter.ToSingle(bytes, i * 12 + 4);
item.z = BitConverter.ToSingle(bytes, i * 12 + 8);
data[i] = item;
}
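
For the attribute-based route mentioned above, here is a minimal sketch that reads the file bytes directly into the struct array's memory, much like fread does. It assumes a runtime where Stream.Read(Span<byte>) and MemoryMarshal.AsBytes are available (.NET Core 2.1 or later), a file written with the same endianness as the machine reading it, and a hypothetical helper named VertexFile.Read that is not part of the original answers.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

// Sequential layout with Pack = 1 rules out padding between the three floats.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct Vector3
{
    public float X, Y, Z;
}

static class VertexFile
{
    // Hypothetical helper: fills a new array of 'count' vertices from the stream.
    public static Vector3[] Read(Stream stream, int count)
    {
        var verts = new Vector3[count];

        // View the struct array as raw bytes so the stream can write into it.
        Span<byte> raw = MemoryMarshal.AsBytes(verts.AsSpan());

        // Stream.Read may return fewer bytes than requested, so loop until full.
        int done = 0;
        while (done < raw.Length)
        {
            int n = stream.Read(raw.Slice(done));
            if (n == 0) throw new EndOfStreamException();
            done += n;
        }
        return verts;
    }
}
```

Usage would look like `using var f = File.OpenRead(path); var verts = VertexFile.Read(f, numVerts);`, where path and numVerts come from the caller.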

Related

Splitting an array of points into multiple arrays that have a specific distance from each other

I have an array of points that look like this (x,y):
6,12
6,13.25
6,14.5
6,15.75
6,17
6,18.25
18,12
18,13.25
18,14.5
18,15.75
18,17
18,18.25
This array represents two separate series of points that are on a 1.25" offset from each other. There are two series here, but there can be [n] series, and I need to split them up into the appropriate number of arrays based on the specific offset, like this:
6,12
6,13.25
6,14.5
6,15.75
6,17
6,18.25

18,12
18,13.25
18,14.5
18,15.75
18,17
18,18.25
The offset can be in the X or the Y, but not both. I have been working on it for a little while now and am kind of stuck.
Edit
What I have tried so far, is to get a point in the array, and search for all points that are on the interval specified (getDist calculates the distance from one point to another):
foreach(Point firstPoint in points){
foreach(Point nextPoint in points){
if(isSame(firstPoint, nextPoint))
continue;
if(getDist(firstPoint, nextPoint) % 1.25 == 0){
// add to new array
}
}
}
double getDist(Point p1, Point p2) => Math.Sqrt(Math.Pow(p2.X - p1.X, 2) + Math.Pow(p2.Y - p1.Y, 2));
bool isSame(Point p1, Point p2) => p1.X == p2.X && p1.Y == p2.Y;
The problem I run into is that this compares every point against every other point, which creates a huge set of arrays.

Using LINQ:
decimal[,] data = {{6,12}, {6,13.25M}, {6, 14.5M}, {6,15.75M}, {6,17}, {6,18.25M},
{18,12},{18,13.25M}, {18,14.5M},{18,15.75M},{18,17},{18,18.25M}};
var groups = data.Cast<decimal>()
.Select((x, i) => new { num = x, index = i })
.GroupBy(x => x.index / 2)
.Select(x => new decimal[] { x.FirstOrDefault().num, x.LastOrDefault().num })
.GroupBy(x => x.FirstOrDefault())
.Select(x => x.ToArray()).ToArray();

I tried to do it, but instead of storing the results in arrays, I used lists (the code might need improvements though):
List<List<PointF>> splitted_list = new List<List<PointF>>();
float x = 0;
float y = 0;
float offsetX = 0;
float offsetY = 0;
foreach(PointF p in points)
{
if (splitted_list.Count() > 0)
{
if (splitted_list.Last().Count() >= 2 && (p.X - x != offsetX || p.Y - y != offsetY))
{
List<PointF> list_points = new List<PointF>();
list_points.Add(p);
splitted_list.Add(list_points);
}
else
{
splitted_list.Last().Add(p);
offsetX = p.X - x;
offsetY = p.Y - y;
}
}
else
{
List<PointF> list_points = new List<PointF>();
list_points.Add(p);
splitted_list.Add(list_points);
}
x = p.X;
y = p.Y;
}
Every time one of the offsets changes, I create a new list and add it to my "list of lists" (it works regardless of the offset's value).
So the result is a list of series of points. If you HAVE to use those series as arrays of points, you can simply convert the lists to arrays by using the method ToArray().

Here's another implementation:
static void Main(string[] args)
{
List<PointF> points = new List<PointF>()
{
new PointF(6, 12),
new PointF(6, 13.25f),
new PointF(6, 14.5f),
new PointF(6, 15.75f),
new PointF(6, 17),
new PointF(6, 18.25f),
new PointF(18, 12),
new PointF(18, 13.25f),
new PointF(18, 14.5f),
new PointF(18, 15.75f),
new PointF(18, 17),
new PointF(18, 18.25f)
};
var result = new List<List<PointF>>();
int i, j = 0;
for (i = 1; i < points.Count; i++)
{
if (!IsInRange(points[i], points[i - 1]))
{
result.Add(points.GetRange(j, i - j));
j = i;
}
}
result.Add(points.GetRange(j, i - j));
}
static bool IsInRange(PointF a, PointF b) => Math.Abs(a.X + a.Y - b.X - b.Y) <= 1.25f;

So, thanks for all your responses; with them I was able to get this working. I used something similar to the code below. I can't share the exact code, as it contains code from my company. I get the points in a collection that is very similar to an array, and I HAVE to use them in this type of collection.
while(points.Length > 0){
Point currentPoint = points[0];
List<Point> selectedPoints = new List<Point>();
selectedPoints.Add(currentPoint);
for(int i = 1; i < points.Length; i++){
Point nextPoint = points[i];
double dist = getDist(currentPoint, nextPoint);
if(Math.Abs(dist % 1.25 - 0) < 1.19e-4){
selectedPoints.Add(nextPoint);
}
}
// remove selected points from all points
// add them to my list of points that I needed
}
Thanks for the help!
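
One caveat with the `dist % 1.25` test: with floating-point distances, a value that is a hair below a multiple of 1.25 leaves a remainder close to 1.25 rather than close to 0, so comparing the remainder only against 0 can miss it. A minimal sketch of a check that accepts both cases follows; the helper name IsMultipleOf and the default tolerance are assumptions chosen for illustration.

```csharp
using System;

static class OffsetCheck
{
    // Returns true when 'value' is within 'tolerance' of a whole multiple of 'step'.
    // The name and the default tolerance are illustrative assumptions.
    public static bool IsMultipleOf(double value, double step, double tolerance = 1e-6)
    {
        double remainder = Math.Abs(value) % step;

        // The remainder is near 0 just above a multiple and near 'step' just below one.
        return remainder < tolerance || step - remainder < tolerance;
    }
}
```

In the loop above, the condition would then read `if (OffsetCheck.IsMultipleOf(dist, 1.25))`.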

How to total a FloatResidentArray and retrieve the value to the device or host

I am using Hybridizer to total a FloatResidentArray and I am not able to return the calculated total to the device (or host) because of the need for a ref statement in the final AtomicExpr.apply statement.
Consider the following code which is based on the GenericReduce example provided by Altimesh.
The code takes a device resident array a, of float of length N and calculates the total – this value is placed in total[0].
[Kernel]
public static void Total(FloatResidentArray a, int N, float[] total)
{
var cache = new SharedMemoryAllocator<float>().allocate(blockDim.x);
int tid = threadIdx.x + blockDim.x * blockIdx.x;
int cacheIndex = threadIdx.x;
float sum = 0f;
while (tid < N)
{
sum = sum + a[tid];
tid += blockDim.x * gridDim.x;
}
cache[cacheIndex] = sum;
CUDAIntrinsics.__syncthreads();
int i = blockDim.x / 2;
while (i != 0)
{
if (cacheIndex < i)
{
cache[cacheIndex] = cache[cacheIndex] + cache[cacheIndex + i];
}
CUDAIntrinsics.__syncthreads();
i >>= 1;
}
if (cacheIndex == 0)
{
AtomicExpr.apply(ref total[0], cache[0], (x, y) => x + y);
}
}
The above code does not compile because you cannot pass a float[] and a FloatResidentArray in the same parameter list.
If total is defined as a FloatResidentArray itself, then the compiler will not allow the ref keyword to be used in the final line of code.
If I simply pass a float, then the returned variable is not updated with the total.
If I pass a ref float - then the program throws a runtime error at the point where the HybRunner wraps the above code to create the dynamic – the error message is
Value types by reference are not supported
How do I return the total? Either device or host memory is acceptable.

Well, you need to understand how marshalling works.
Objects and arrays (even resident arrays) all live in host memory when they are created in .NET.
We then marshal them (pin host memory, allocate device memory and copy host to device) right before kernel execution.
For a float[], that is done automatically.
For an IntPtr, we do nothing, and the user has to ensure the IntPtr is a valid device pointer containing the data.
For a resident array, we do nothing, and the user has to call RefreshDevice() and RefreshHost() manually whenever she wants to move the data back and forth.
Mixing ResidentArray and float[] is supported, as the generated DLL shows.
What is not supported is mixing managed types and IntPtr.
Here is a complete version of your code working, and returning the correct result:
using Hybridizer.Runtime.CUDAImports;
using System;
using System.Runtime.InteropServices;
namespace SimpleMetadataDecorator
{
class Program
{
[EntryPoint]
public static void Total(FloatResidentArray a, int N, float[] total)
{
var cache = new SharedMemoryAllocator<float>().allocate(blockDim.x);
int tid = threadIdx.x + blockDim.x * blockIdx.x;
int cacheIndex = threadIdx.x;
float sum = 0f;
while (tid < N)
{
sum = sum + a[tid];
tid += blockDim.x * gridDim.x;
}
cache[cacheIndex] = sum;
CUDAIntrinsics.__syncthreads();
int i = blockDim.x / 2;
while (i != 0)
{
if (cacheIndex < i)
{
cache[cacheIndex] = cache[cacheIndex] + cache[cacheIndex + i];
}
CUDAIntrinsics.__syncthreads();
i >>= 1;
}
if (cacheIndex == 0)
{
AtomicExpr.apply(ref total[0], cache[0], (x, y) => x + y);
}
}
static void Main(string[] args)
{
const int N = 1024 * 1024 * 32;
FloatResidentArray arr = new FloatResidentArray(N);
float[] res = new float[1];
for (int i = 0; i < N; ++i)
{
arr[i] = 1.0F;
}
arr.RefreshDevice();
var runner = HybRunner.Cuda();
cudaDeviceProp prop;
cuda.GetDeviceProperties(out prop, 0);
runner.SetDistrib(16 * prop.multiProcessorCount, 1, 128, 1, 1, 128 * sizeof(float));
var wrapped = runner.Wrap(new Program());
runner.saveAssembly();
cuda.ERROR_CHECK((cudaError_t)(int)wrapped.Total(arr, N, res));
cuda.ERROR_CHECK(cuda.DeviceSynchronize());
Console.WriteLine(res[0]);
}
}
}
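
To sanity-check the per-block part of the kernel on the host, the halving-stride loop over the shared cache can be mirrored in plain C#. This is only a sequential sketch of that loop, with no Hybridizer types involved; the class and method names are assumptions, and the cache length is assumed to be a power of two (as with the 128 threads per block used above).

```csharp
using System;

static class ReductionSketch
{
    // Mirrors the kernel's shared-memory loop: at each step, the first half of
    // the cache adds in the matching element from the second half.
    // Assumes cache.Length is a power of two. The array is modified in place.
    public static float ReduceBlock(float[] cache)
    {
        for (int i = cache.Length / 2; i != 0; i >>= 1)
        {
            for (int cacheIndex = 0; cacheIndex < i; cacheIndex++)
            {
                cache[cacheIndex] = cache[cacheIndex] + cache[cacheIndex + i];
            }
        }
        return cache[0];
    }
}
```

On the GPU each inner iteration is a separate thread and the `__syncthreads()` call plays the role of finishing the inner loop before the stride is halved.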

How do I deal with NaN results from FFT?

I am trying to implement a function which takes a WAV file and runs a hundredth of a second's worth of audio through AForge's FFT. When I change the offset to alter where in the audio the FFT is computed, I sometimes get results that I can show in my graph, but most of the time I get a complex array of NaNs. Why could this be?
Here is my code.
public double[] test()
{
OpenFileDialog file = new OpenFileDialog();
file.ShowDialog();
WaveFileReader reader = new WaveFileReader(file.FileName);
byte[] data = new byte[reader.Length];
reader.Read(data, 0, data.Length);
samepleRate = reader.WaveFormat.SampleRate;
bitDepth = reader.WaveFormat.BitsPerSample;
channels = reader.WaveFormat.Channels;
Console.WriteLine("audio has " + channels + " channels, a sample rate of " + samepleRate + " and bitdepth of " + bitDepth + ".");
float[] floats = new float[data.Length / sizeof(float)];
Buffer.BlockCopy(data, 0, floats, 0, data.Length);
size = 2048;
int inputSamples = samepleRate / 100;
int offset = samepleRate * 15 * channels;
int y = 0;
Complex[] complexData = new Complex[size];
float[] window = CalcWindowFunction(inputSamples);
for (int i = 0; i < inputSamples; i++)
{
complexData[y] = new Complex(floats[i * channels + offset] * window[i], 0);
y++;
}
while (y < size)
{
complexData[y] = new Complex(0, 0);
y++;
}
FourierTransform.FFT(complexData, FourierTransform.Direction.Forward);
double[] arr = new double[complexData.Length];
for (int i = 0; i < complexData.Length; i++)
{
arr[i] = complexData[i].Magnitude;
}
Console.Write("complete, ");
return arr;
}
private float[] CalcWindowFunction(int inputSamples)
{
float[] arr = new float[size];
for(int i =0; i<size;i++){
arr[i] = 1;
}
return arr;
}

A complex array of NaNs is usually the result of one of the inputs to the FFT being a NaN. To debug, you might check all the values in the input array before the FFT to make sure they are within some valid range, given the audio input scaling.
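
A sketch of such a check is below, together with one possible cause worth ruling out. The question does not state the file's bit depth, so this is an assumption: if the WAV holds 16-bit PCM, copying the raw bytes into a float[] with Buffer.BlockCopy reinterprets pairs of samples as 32-bit floats, and some of those bit patterns are NaN. The class and method names here are hypothetical.

```csharp
using System;

static class PcmSketch
{
    // Converts little-endian 16-bit PCM bytes to floats in the range [-1, 1).
    // Only applies if the file really is 16-bit PCM (an assumption here).
    public static float[] Pcm16ToFloats(byte[] data)
    {
        var samples = new float[data.Length / 2];
        for (int i = 0; i < samples.Length; i++)
        {
            // Assemble the sample from low byte and high byte.
            short s = (short)(data[2 * i] | (data[2 * i + 1] << 8));
            samples[i] = s / 32768f;
        }
        return samples;
    }

    // Returns the index of the first NaN or infinite sample, or -1 if all are finite.
    public static int FirstInvalidIndex(float[] samples)
    {
        for (int i = 0; i < samples.Length; i++)
        {
            if (float.IsNaN(samples[i]) || float.IsInfinity(samples[i]))
                return i;
        }
        return -1;
    }
}
```

Calling FirstInvalidIndex on the float array right before filling complexData shows whether the NaNs are already present in the input.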

Data histogram - optimized binwidth optimization

I'm looking to produce a data histogram from a given dataset. I've read about different options for constructing the histogram and I'm most interested in a method based on the work of
Shimazaki, H.; Shinomoto, S. (2007). "A method for selecting the bin
size of a time histogram"
The above method uses estimation to determine the optimal bin width and distribution, which is needed in my case because the sample data will vary in distribution, making it hard to determine the bin count and width in advance.
Can someone recommend a good source or a starting point for writing such a function in C#, or share C# histogram code that comes close?
Many thanks.

The following is a port I wrote of the Python version of this algorithm from here. I know the API could do with some work, but this should be enough to get you started. The results of this code are identical to those produced by the Python code for the same input data.
public class HistSample
{
public static void CalculateOptimalBinWidth(double[] x)
{
double xMax = x.Max(), xMin = x.Min();
int minBins = 4, maxBins = 50;
double[] N = Enumerable.Range(minBins, maxBins - minBins)
.Select(v => (double)v).ToArray();
double[] D = N.Select(v => (xMax - xMin) / v).ToArray();
double[] C = new double[D.Length];
for (int i = 0; i < N.Length; i++)
{
double[] binIntervals = LinearSpace(xMin, xMax, (int)N[i] + 1);
double[] ki = Histogram(x, binIntervals);
ki = ki.Skip(1).Take(ki.Length - 2).ToArray();
double mean = ki.Average();
double variance = ki.Select(v => Math.Pow(v - mean, 2)).Sum() / N[i];
C[i] = (2 * mean - variance) / (Math.Pow(D[i], 2));
}
double minC = C.Min();
int index = C.Select((c, ix) => new { Value = c, Index = ix })
.Where(c => c.Value == minC).First().Index;
double optimalBinWidth = D[index];
}
public static double[] Histogram(double[] data, double[] binEdges)
{
double[] counts = new double[binEdges.Length - 1];
for (int i = 0; i < binEdges.Length - 1; i++)
{
double lower = binEdges[i], upper = binEdges[i + 1];
for (int j = 0; j < data.Length; j++)
{
if (data[j] >= lower && data[j] <= upper)
{
counts[i]++;
}
}
}
return counts;
}
public static double[] LinearSpace(double a, double b, int count)
{
double[] output = new double[count];
for (int i = 0; i < count; i++)
{
output[i] = a + ((i * (b - a)) / (count - 1));
}
return output;
}
}
Run it like this:
double[] x =
{
4.37, 3.87, 4.00, 4.03, 3.50, 4.08, 2.25, 4.70, 1.73,
4.93, 1.73, 4.62, 3.43, 4.25, 1.68, 3.92, 3.68, 3.10,
4.03, 1.77, 4.08, 1.75, 3.20, 1.85, 4.62, 1.97, 4.50,
3.92, 4.35, 2.33, 3.83, 1.88, 4.60, 1.80, 4.73, 1.77,
4.57, 1.85, 3.52, 4.00, 3.70, 3.72, 4.25, 3.58, 3.80,
3.77, 3.75, 2.50, 4.50, 4.10, 3.70, 3.80, 3.43, 4.00,
2.27, 4.40, 4.05, 4.25, 3.33, 2.00, 4.33, 2.93, 4.58,
1.90, 3.58, 3.73, 3.73, 1.82, 4.63, 3.50, 4.00, 3.67,
1.67, 4.60, 1.67, 4.00, 1.80, 4.42, 1.90, 4.63, 2.93,
3.50, 1.97, 4.28, 1.83, 4.13, 1.83, 4.65, 4.20, 3.93,
4.33, 1.83, 4.53, 2.03, 4.18, 4.43, 4.07, 4.13, 3.95,
4.10, 2.27, 4.58, 1.90, 4.50, 1.95, 4.83, 4.12
};
HistSample.CalculateOptimalBinWidth(x);

Check the Histogram function. If any data elements are unlucky to be equal to a bin boundary (other than the first or last bin), they will be counted in both consecutive bins.
The code needs to check (lower <= data[j] && data[j] < upper) and handle the case that all elements equal to xMax go into the last bin.
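
A minimal sketch of a counting function along those lines is shown here. It keeps the same signature as the Histogram method above, but the class name is an assumption and the function has not been checked against the Python reference output.

```csharp
using System;

static class HistogramFix
{
    // Counts each value exactly once: bin i covers [binEdges[i], binEdges[i + 1]),
    // and the last bin additionally includes its upper edge so xMax is not lost.
    public static double[] Histogram(double[] data, double[] binEdges)
    {
        int bins = binEdges.Length - 1;
        double[] counts = new double[bins];
        foreach (double value in data)
        {
            for (int i = 0; i < bins; i++)
            {
                bool isLastBin = i == bins - 1;
                bool belowUpper = value < binEdges[i + 1]
                    || (isLastBin && value == binEdges[i + 1]);
                if (value >= binEdges[i] && belowUpper)
                {
                    counts[i]++;
                    break; // a value belongs to one bin only
                }
            }
        }
        return counts;
    }
}
```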

A small update to nick_w's answer, in case you actually need the bins afterwards. It also replaces the double loop in the Histogram function with a single pass and gets rid of the LinearSpace function.
/// <summary>
/// Calculate the optimal bins for the given data
/// </summary>
/// <param name="x">The data you have</param>
/// <param name="xMin">The minimum element</param>
/// <param name="optimalBinWidth">The width between each bin</param>
/// <returns>The bins</returns>
public static int[] CalculateOptimalBinWidth(List<double> x, out double xMin, out double optimalBinWidth)
{
var xMax = x.Max();
xMin = x.Min();
optimalBinWidth = 0;
const int MIN_BINS = 1;
const int MAX_BINS = 20;
int[] minKi = null;
var minOffset = double.MaxValue;
foreach (var n in Enumerable.Range(MIN_BINS, MAX_BINS - MIN_BINS).Select(v => v*5))
{
var d = (xMax - xMin)/n;
var ki = Histogram(x, n, xMin, d);
var ki2 = ki.Skip(1).Take(ki.Length - 2).ToArray();
var mean = ki2.Average();
var variance = ki2.Select(v => Math.Pow(v - mean, 2)).Sum()/n;
var offset = (2*mean - variance)/Math.Pow(d, 2);
if (offset < minOffset)
{
minKi = ki;
minOffset = offset;
optimalBinWidth = d;
}
}
return minKi;
}
private static int[] Histogram(List<double> data, int count, double xMin, double d)
{
var histogram = new int[count];
foreach (var t in data)
{
var bucket = (int) Math.Truncate((t - xMin)/d);
if (count == bucket) //fix xMax
bucket --;
histogram[bucket]++;
}
return histogram;
}

I would recommend binary search to speed up the assignment to the class intervals.
public void Add(double element)
{
if (element < Bins.First().LeftBound || element > Bins.Last().RightBound)
return;
var min = 0;
var max = Bins.Length - 1;
var index = 0;
while (min <= max)
{
index = min + ((max - min) / 2);
if (element >= Bins[index].LeftBound && element < Bins[index].RightBound)
break;
if (element < Bins[index].LeftBound)
max = index - 1;
else
min = index + 1;
}
Bins[index].Count++;
}
"Bins" is an array of items of type "HistogramItem", which defines properties such as "LeftBound", "RightBound" and "Count".
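
If the bins are stored as a plain array of edges rather than HistogramItem objects, the built-in Array.BinarySearch can do the lookup. The sketch below assumes strictly increasing edges with at least two entries; the class and method names are hypothetical.

```csharp
using System;

static class BinLookup
{
    // Bin i covers [edges[i], edges[i + 1]); the last bin also includes the final edge.
    // Returns -1 when the value lies outside the edges.
    public static int FindBin(double[] edges, double value)
    {
        if (value < edges[0] || value > edges[edges.Length - 1])
            return -1;

        int idx = Array.BinarySearch(edges, value);

        // A negative result is the bitwise complement of the index of the
        // first edge larger than value; the bin starts one edge earlier.
        if (idx < 0)
            idx = ~idx - 1;

        // A value equal to the final edge belongs to the last bin.
        if (idx == edges.Length - 1)
            idx--;

        return idx;
    }
}
```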

C# ModInverse Function

Is there a built in function that would allow me to calculate the modular inverse of a(mod n)?
e.g. 19^-1 = -11 = 19 (mod 30), since 19 * 19 = 361 = 1 (mod 30).

Since .NET 4.0, BigInteger provides a modular arithmetic function, ModPow (which produces “X power Y modulo Z”), so you don't need a third-party library to emulate ModInverse. If n is a prime (and a is not a multiple of n), all you need to do is to compute:
a_inverse = BigInteger.ModPow(a, n - 2, n)
For more details, look in Wikipedia: Modular multiplicative inverse, section Using Euler's theorem, the special case “when m is a prime”. By the way, there is a more recent SO topic on this: 1/BigInteger in c#, with the same approach suggested by CodesInChaos.
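
As a self-contained sketch of that one-liner (the wrapper class and method name are assumptions), note that it only applies to a prime modulus, so it does not cover the mod 30 example from the question:

```csharp
using System;
using System.Numerics;

static class ModInverseSketch
{
    // Valid only when n is prime and a is not a multiple of n:
    // by Fermat's little theorem, a^(n-2) * a = a^(n-1) = 1 (mod n).
    public static BigInteger InversePrimeModulus(BigInteger a, BigInteger n)
    {
        return BigInteger.ModPow(a, n - 2, n);
    }
}
```

For example, InversePrimeModulus(3, 7) gives 5, and 3 * 5 = 15 leaves remainder 1 when divided by 7.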

int modInverse(int a, int n)
{
int i = n, v = 0, d = 1;
while (a>0) {
int t = i/a, x = a;
a = i % x;
i = x;
x = d;
d = v - t*x;
v = x;
}
v %= n;
if (v<0) v = (v+n)%n;
return v;
}

The BouncyCastle Crypto library has a BigInteger implementation that has most of the modular arithmetic functions. It's in the Org.BouncyCastle.Math namespace.

Here is a slightly more polished version of Samuel Allan's algorithm. The TryModInverse method returns a bool value, that indicates whether a modular multiplicative inverse exists for this number and modulo.
public static bool TryModInverse(int number, int modulo, out int result)
{
if (number < 1) throw new ArgumentOutOfRangeException(nameof(number));
if (modulo < 2) throw new ArgumentOutOfRangeException(nameof(modulo));
int n = number;
int m = modulo, v = 0, d = 1;
while (n > 0)
{
int t = m / n, x = n;
n = m % x;
m = x;
x = d;
d = checked(v - t * x); // Just in case
v = x;
}
result = v % modulo;
if (result < 0) result += modulo;
if ((long)number * result % modulo == 1L) return true;
result = default;
return false;
}

There is no library for getting inverse mod, but the following code can be used to get it.
// Given a and b->ax+by=d
long[] u = { a, 1, 0 };
long[] v = { b, 0, 1 };
long[] w = { 0, 0, 0 };
long temp = 0;
while (v[0] > 0)
{
long t = u[0] / v[0]; // integer quotient
for (int i = 0; i < 3; i++)
{
w[i] = u[i] - t * v[i];
u[i] = v[i];
v[i] = w[i];
}
}
// u[0] is the gcd, u[1] is x and u[2] is y.
// If the gcd is 1, u[1] is the inverse of a mod b; if it is negative, the loop below shifts it to the first positive value.
if (u[1] < 0)
{
while (u[1] < 0)
{
temp = u[1] + b;
u[1] = temp;
}
}
