Fastest way to extract variable width signed integer from byte[] - c#

The title speaks for itself. I have a file containing a base64-encoded byte[] of variable-width integers (minimum 8 bits, maximum 32 bits).
The file is large (48 MB), and I am trying to find the fastest way of extracting the integers from the stream.
This is the fastest code so far from a perf test app:
static int[] Base64ToIntArray3(string base64, int size)
{
List<int> res = new List<int>();
byte[] buffer = new byte[4];
using (var ms = new System.IO.MemoryStream(Convert.FromBase64String(base64)))
{
while(ms.Position < ms.Length)
{
ms.Read(buffer, 0, size);
res.Add(BitConverter.ToInt32(buffer, 0));
}
}
return res.ToArray();
}
I can't see a faster way of padding the bytes to 32 bits. Any ideas, chaps and chapettes? Solutions should be in C#. I could drop down to C/C++ if I must, but I don't want to.

There is no reason to use a memory stream to move bytes from one array to another; just read from the array directly. Also, the size of the result is known up front, so there is no need to add the items to a list that is then converted to an array. You can use an array from the start:
static int[] Base64ToIntArray3(string base64, int size) {
byte[] data = Convert.FromBase64String(base64);
int cnt = data.Length / size;
int[] res = new int[cnt];
for (int i = 0; i < cnt; i++) {
switch (size) {
case 1: res[i] = data[i]; break;
case 2: res[i] = BitConverter.ToInt16(data, i * 2); break;
case 3: res[i] = data[i * 3] + data[i * 3 + 1] * 256 + data[i * 3 + 2] * 65536; break;
case 4: res[i] = BitConverter.ToInt32(data, i * 4); break;
}
}
return res;
}
Note: Untested code! You have to verify that it actually does what it is supposed to do, but at least it shows the principle.
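Since the title asks for signed integers, note that the 1-byte and 3-byte cases above zero-extend rather than sign-extend. The following is a minimal sketch of a sign-extending variant, assuming little-endian two's-complement values; the names VarWidth and Decode are made up for illustration and are not from the original answer.

```csharp
using System;

static class VarWidth
{
    // Decodes little-endian signed integers that are `size` bytes wide (1..4).
    public static int[] Decode(byte[] data, int size)
    {
        if (size < 1 || size > 4) throw new ArgumentOutOfRangeException(nameof(size));
        int count = data.Length / size;   // a trailing partial value, if any, is ignored
        int shift = 32 - 8 * size;        // number of unused high bits in a 32-bit int
        int[] res = new int[count];
        for (int i = 0, p = 0; i < count; i++, p += size)
        {
            int v = 0;
            for (int b = 0; b < size; b++)
                v |= data[p + b] << (8 * b);   // assemble the value, low byte first
            res[i] = (v << shift) >> shift;    // the arithmetic right shift copies the sign bit
        }
        return res;
    }
}
```

It could be called as VarWidth.Decode(Convert.FromBase64String(base64), size). Whether it beats the switch-based version would have to be measured.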

This is probably how I would do it. Not using a stream should improve performance. This seems like the sort of thing that should be easy to do with LINQ, but I couldn't figure it out.
static int[] Base64ToIntArray3(string base64, int size)
{
if (size < 1 || size > 4) throw new ArgumentOutOfRangeException("size");
byte[] data = Convert.FromBase64String(base64);
List<int> res = new List<int>();
byte[] buffer = new byte[4];
for (int i = 0; i < data.Length; i += size )
{
Buffer.BlockCopy(data, i, buffer, 0, size);
res.Add(BitConverter.ToInt32(buffer, 0));
}
return res.ToArray();
}

OK, so I believe this is the LINQ way to do this:
static int[] Base64ToIntArray3(string base64, int size)
{
byte[] data = Convert.FromBase64String(base64);
return data.Select((Value, Index) => new { Value, Index })
.GroupBy(p => p.Index / size)
.Select(g => BitConverter.ToInt32(g.Select(p => p.Value).Concat(new byte[4 - size]).ToArray(), 0)) // Concat, not Union: Union is a set operation and would drop duplicate bytes
.ToArray();
}

Related

What am I doing wrong when parsing a wav file?

I'm trying to parse a wav file. I'm not sure if there can be multiple data chunks in a wav file, but I originally assumed there was only 1 since the wav file format description I was reading only mentioned there being 1.
But I noticed that the subchunk2size was very small (like 26) when the wav file being parsed was something like 36MB and the sample rate was 44100.
So I tried to parse it assuming there were multiple chunks, but after the 1st chunk, there was no subchunk2id to be found.
To go chunk by chunk, I was using the below code
int chunkSize = System.BitConverter.ToInt32(strm, 40);
int widx = 44; //wav data starts at the 44th byte
//strm is a byte array of the wav file
while(widx < strm.Length)
{
widx += chunkSize;
if(widx < 1000)
{
//log "data" or "100 97 116 97" for the subchunkid
//This is only getting printed the 1st time though. All prints after that are garbage
Debug.Log( strm[widx] + " " + strm[widx+1] + " " + strm[widx+2] + " " + strm[widx+3]);
}
if(widx + 8 < strm.Length)
{
widx += 4;
chunkSize = System.BitConverter.ToInt32(strm, widx);
widx += 4;
}else
{
widx += 8;
}
}
A minimal .wav file has 3 chunks.
Each chunk starts with a 4-byte ID followed by a 4-byte size field.
The first chunk is the "RIFF" chunk. After the ID it holds 8 bytes: the file size (4 bytes) and the name of the format (4 bytes, usually "WAVE").
The next chunk is the "fmt " chunk (the space in the chunk name is important). It holds the audio format (2 bytes), the number of channels (2 bytes), the sample rate (4 bytes), the byte rate (4 bytes), the block align (2 bytes) and the bits per sample (2 bytes).
The third and last chunk is the "data" chunk. It holds the actual data, i.e. the amplitudes of the samples. Its 4-byte size field gives the number of bytes of sample data that follow.
You can find further explanations of the properties of a .wav-file here.
From this knowledge I have already created the following class:
public sealed class WaveFile
{
//privates
private int fileSize;
private string format;
private int fmtChunkSize;
private int audioFormat;
private int numChannels;
private int sampleRate;
private int byteRate;
private int blockAlign;
private int bitsPerSample;
private int dataSize;
private int[][] data;//One array per channel
//publics
public int FileSize => fileSize;
public string Format => format;
public int FmtChunkSize => fmtChunkSize;
public int AudioFormat => audioFormat;
public int NumChannels => numChannels;
public int SampleRate => sampleRate;
public int ByteRate => byteRate;
public int BitsPerSample => bitsPerSample;
public int DataSize => dataSize;
public int[][] Data => data;
public WaveFile(string path)
{
FileStream fs = File.OpenRead(path);
LoadChunk(fs); //read RIFF Chunk
LoadChunk(fs); //read fmt Chunk
LoadChunk(fs); //read data Chunk
fs.Close();
}
private void LoadChunk(FileStream fs)
{
ASCIIEncoding Encoder = new ASCIIEncoding();
byte[] bChunkID = new byte[4];
fs.Read(bChunkID, 0, 4);
string sChunkID = Encoder.GetString(bChunkID);
byte[] ChunkSize = new byte[4];
fs.Read(ChunkSize, 0, 4);
if (sChunkID.Equals("RIFF"))
{
fileSize = BitConverter.ToInt32(ChunkSize, 0);
byte[] Format = new byte[4];
fs.Read(Format, 0, 4);
this.format = Encoder.GetString(Format);
}
if (sChunkID.Equals("fmt "))
{
fmtChunkSize = BitConverter.ToInt32(ChunkSize, 0);
byte[] audioFormat = new byte[2];
fs.Read(audioFormat, 0, 2);
this.audioFormat = BitConverter.ToInt16(audioFormat, 0);
byte[] numChannels = new byte[2];
fs.Read(numChannels, 0, 2);
this.numChannels = BitConverter.ToInt16(numChannels, 0);
byte[] sampleRate = new byte[4];
fs.Read(sampleRate, 0, 4);
this.sampleRate = BitConverter.ToInt32(sampleRate, 0);
byte[] byteRate = new byte[4];
fs.Read(byteRate, 0, 4);
this.byteRate = BitConverter.ToInt32(byteRate, 0);
byte[] blockAlign = new byte[2];
fs.Read(blockAlign, 0, 2);
this.blockAlign = BitConverter.ToInt16(blockAlign, 0);
byte[] bitsPerSample = new byte[2];
fs.Read(bitsPerSample, 0, 2);
this.bitsPerSample = BitConverter.ToInt16(bitsPerSample, 0);
}
if (sChunkID.Equals("data"))
{
dataSize = BitConverter.ToInt32(ChunkSize, 0);
data = new int[this.numChannels][];
byte[] temp = new byte[dataSize];
for (int i = 0; i < this.numChannels; i++)
{
data[i] = new int[this.dataSize / (numChannels * bitsPerSample / 8)];
}
for (int i = 0; i < data[0].Length; i++)
{
for (int j = 0; j < numChannels; j++)
{
if (fs.Read(temp, 0, blockAlign / numChannels) > 0)
{
if (blockAlign / numChannels == 4) // 4 bytes per sample: read as Int32, otherwise as Int16
{ data[j][i] = BitConverter.ToInt32(temp, 0); }
else
{ data[j][i] = BitConverter.ToInt16(temp, 0); }
}
}
}
}
}
}
Needed using-directives:
using System;
using System.IO;
using System.Text;
This class reads the chunks field by field and sets the properties. You just have to instantiate it with a path, and it exposes all properties of the selected wave file.
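The class above expects exactly three chunks in a fixed order. RIFF files may carry additional chunks (for example "LIST") before the data chunk, and the "fmt " chunk can be longer than 16 bytes, which would explain a surprisingly small size value at a hard-coded offset. As a rough sketch, the whole file can be walked chunk by chunk using only the ID and size fields. The names RiffChunks and ReadChunks are invented for this example, and the code assumes the array starts with a valid 12-byte RIFF/WAVE header and that the host is little-endian (as BitConverter is on typical platforms).

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class RiffChunks
{
    // Lists every chunk after the 12-byte RIFF header as (ID, offset of its payload, payload size).
    public static List<(string Id, int DataOffset, int Size)> ReadChunks(byte[] wav)
    {
        var chunks = new List<(string Id, int DataOffset, int Size)>();
        long pos = 12;                              // skip "RIFF", the file size and "WAVE"
        while (pos + 8 <= wav.Length)
        {
            int p = (int)pos;
            string id = Encoding.ASCII.GetString(wav, p, 4);
            int size = BitConverter.ToInt32(wav, p + 4);
            if (size < 0) break;                    // corrupt or larger than 2 GB: stop
            chunks.Add((id, p + 8, size));
            pos += 8L + size + (size & 1);          // chunks are padded to an even length
        }
        return chunks;
    }
}
```

Looking up the entry whose Id is "data" then gives the offset and length of the sample data without assuming that it starts at byte 44.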
In the reference you added, I don't see any mention of the chunk size being repeated for each data chunk...
Try something like this:
int chunkSize = System.BitConverter.ToInt32(strm, 40);
int widx = 44; //wav data starts at the 44th byte
//strm is a byte array of the wav file
while(widx < strm.Length)
{
if(widx < 1000)
{
//log "data" or "100 97 116 97" for the subchunkid
//This is only getting printed the 1st time though. All prints after that are garbage
Debug.Log( strm[widx] + " " + strm[widx+1] + " " + strm[widx+2] + " " + strm[widx+3]);
}
widx += chunkSize;
}

Turning pairs of bytes into floats

Anyone have something equivalent to (and prettier than) the following code?
private static float[] PairsOfBytesToFloats(byte[] bytes)
{
if (bytes.Length.IsNotAMultipleOf(2)) throw new ArgumentException();
float[] result = new float[bytes.Length / 2];
for (int i = 0; i < bytes.Length; i += 2)
{
result[i / 2] = BitConverter.ToUInt16(bytes, i);
}
return result;
}
Maybe a bit of LINQ:
return Enumerable.Range(0, bytes.Length/2)
.Select(index => (float)BitConverter.ToUInt16(bytes, index*2))
.ToArray();
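You can use the Buffer.BlockCopy method instead of the decoding loop, but the destination must be a ushort[] (here called pairs, with bytes.Length / 2 elements), not the float[]. BlockCopy copies raw bytes, so copying straight into a float[] would reinterpret them as IEEE 754 bit patterns instead of converting the values. Convert pairs to float afterwards:
Buffer.BlockCopy(bytes, 0, pairs, 0, bytes.Length);
float[] result = Array.ConvertAll(pairs, p => (float)p);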

How to do a generic conversion from 32/24-bit bytes to 16-bit bytes

I have been searching for a solution for two days.
I want to convert my 32- or 24-bit wave data to 16-bit.
This is my code after reading a few Stack Overflow topics:
byte[] data = Convert.FromBase64String("-- Wav String encoded --"); // 32 or 24 bits
int conv = Convert.ToInt16(data);
byte[] intBytes = BitConverter.GetBytes(conv);
if (BitConverter.IsLittleEndian)
Array.Reverse(intBytes);
byte[] result = intBytes;
but when I write the result with WriteAllBytes, there is nothing to hear...
Here is a method that cuts the least significant bits:
byte[] data = ...
var skipBytes = 0;
byte[] data16bit;
int samples;
if( /* data was 32 bit */ ) {
skipBytes = 2;
samples = data.Length / 4;
} else if( /* data was 24 bit */ ) {
skipBytes = 1;
samples = data.Length / 3;
}
data16bit = new byte[samples * 2];
int writeIndex = 0;
int readIndex = 0;
for(var i = 0; i < samples; ++i) {
readIndex += skipBytes; //skip the least significant bytes
//read the two most significant bytes
data16bit[writeIndex++] = data[readIndex++];
data16bit[writeIndex++] = data[readIndex++];
}
This assumes a little endian byte order (least significant byte is the first byte, usual for WAV RIFF). If you have big endian, you have to put the readIndex += ... after the two read lines.
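For reference, here is a compact, runnable variant of the same idea with the sample width passed in as a parameter. It is only a sketch: the names PcmConvert and To16Bit are made up, and it assumes data holds nothing but little-endian integer PCM samples (no WAV header, no 32-bit float samples). The header fields that describe the sample width would still have to be rewritten separately.

```csharp
using System;

static class PcmConvert
{
    // Keeps the two most significant bytes of every little-endian sample.
    public static byte[] To16Bit(byte[] data, int bytesPerSample)
    {
        if (bytesPerSample != 3 && bytesPerSample != 4)
            throw new ArgumentOutOfRangeException(nameof(bytesPerSample));
        int skip = bytesPerSample - 2;               // least significant bytes to drop
        int samples = data.Length / bytesPerSample;  // a trailing partial sample is ignored
        byte[] result = new byte[samples * 2];
        for (int i = 0; i < samples; i++)
        {
            int src = i * bytesPerSample + skip;
            result[2 * i] = data[src];               // low byte of the 16-bit sample
            result[2 * i + 1] = data[src + 1];       // high byte, which carries the sign
        }
        return result;
    }
}
```

Dropping the low bytes truncates rather than rounds, which is usually acceptable for a quick conversion.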
You could implement your own conversion iterator for this task like so:
IEnumerable<byte> ConvertTo16Bit(byte[] data, int skipBytes)
{
int bytesToRead = 0;
int bytesToSkip = skipBytes;
int readIndex = 0;
while (readIndex < data.Length)
{
if (bytesToSkip > 0)
{
readIndex += bytesToSkip;
bytesToSkip = 0;
bytesToRead = 2;
continue;
}
if (bytesToRead == 0)
{
bytesToSkip = skipBytes;
continue;
}
yield return data[readIndex++];
bytesToRead--;
}
}
This way you don't have to create a new array if there is no need for it. And you could simply convert the data array to a new 16 bit array with the IEnumerable<T> extension methods:
var data16bit = ConvertTo16Bit(data, 1).ToArray();
Or if you don't need the array, you can iterate the data skipping the least significant bytes:
foreach (var b in ConvertTo16Bit(data, 1))
{
Console.WriteLine(b);
}

Search data from one array in another

What I'm trying to do is simple, but it's just slow. Basically I'm looping through data (a byte array), converting some parts to an int and then comparing them to RamCache, which is also a byte array. The reason I'm converting to an int is that it's 4 bytes, so if 4 bytes are equal in some part of the RamCache array I know there is already a match of length 4.
And then from there I can see how many bytes are equal.
In short, what this code must do:
Loop through the data array, take 4 bytes, then check whether they occur in the RamCache array. Currently the code below is slow when the data array and the RamCache array each contain 65535 bytes.
private unsafe SmartCacheInfo[] FindInCache(byte[] data, Action<SmartCacheInfo> callback)
{
List<SmartCacheInfo> ret = new List<SmartCacheInfo>();
fixed (byte* x = &(data[0]), XcachePtr = &(RamCache[0]))
{
Int32 Loops = data.Length >> 2;
int* cachePtr = (int*)XcachePtr;
int* dataPtr = (int*)x;
if (IndexWritten == 0)
return new SmartCacheInfo[0];
//this part is just horrible slow
for (int i = 0; i < data.Length; i++)
{
if (((data.Length - i) >> 2) == 0)
break;
int index = -1;
dataPtr = (int*)(x + i);
//get the index, alot faster then List.IndexOf
for (int j = 0; ; j++)
{
if (((IndexWritten - j) >> 2) == 0)
break;
if (dataPtr[0] == ((int*)(XcachePtr + j))[0])
{
index = j;
break;
}
}
if (index == -1)
{
//int not found, lets see how
SmartCacheInfo inf = new SmartCacheInfo(-1, i, 4, false);
inf.instruction = Instruction.NEWDATA;
i += inf.Length - 1; //-1... loop does +1
callback(inf);
}
else
{
SmartCacheInfo inf = new SmartCacheInfo(index, i, 0, true); //0 index for now just see what the length is of the MemCmp
inf.Length = MemCmp(data, i, RamCache, index);
ret.Add(inf);
i += inf.Length - 1; //-1... loop does +1
}
}
}
return ret.ToArray();
}
Double looping is what's making it so slow. The data array contains 65535 bytes, and so does the RamCache array. This code is, by the way, part of the cache system I'm working on for my SSP project.
Sort the RamCache array (or a copy of it) and use Array.BinarySearch. If you cannot sort it, create a HashSet from the RamCache.
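Sorting the raw bytes would lose the offsets the question needs, so one possible reading of the hashing suggestion is to index every 4-byte window of the cache once and then look up each window of data in constant time. This is a sketch under that assumption. WindowIndex, Build and Find are hypothetical names, and length stands in for the question's IndexWritten.

```csharp
using System;
using System.Collections.Generic;

static class WindowIndex
{
    // Maps each 4-byte window in cache[0..length) to the first offset where it occurs.
    public static Dictionary<int, int> Build(byte[] cache, int length)
    {
        var index = new Dictionary<int, int>();
        for (int j = 0; j + 4 <= length; j++)
        {
            int key = BitConverter.ToInt32(cache, j);
            if (!index.ContainsKey(key)) index[key] = j;   // keep the first occurrence only
        }
        return index;
    }

    // Returns the cache offset matching the 4 bytes at data[i], or -1 if there is none.
    public static int Find(Dictionary<int, int> index, byte[] data, int i)
    {
        if (i + 4 > data.Length) return -1;
        return index.TryGetValue(BitConverter.ToInt32(data, i), out int j) ? j : -1;
    }
}
```

Building the index is one pass over the cache, so the inner loop of the original code becomes a single dictionary lookup. The index has to be rebuilt or updated whenever RamCache changes.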

Represent a Guid as a set of integers

If I want to represent a guid as a set of integers how would I handle the conversion? I'm thinking along the lines of getting the byte array representation of the guid and breaking it up into the fewest possible 32 bit integers that can be converted back into the original guid. Code examples preferred...
Also, what will the length of the resulting integer array be?
As a GUID is just 16 bytes, you can convert it to four integers:
Guid id = Guid.NewGuid();
byte[] bytes = id.ToByteArray();
int[] ints = new int[4];
for (int i = 0; i < 4; i++) {
ints[i] = BitConverter.ToInt32(bytes, i * 4);
}
Converting back is just a matter of getting the integers as byte arrays and putting them together:
byte[] bytes = new byte[16];
for (int i = 0; i < 4; i++) {
Array.Copy(BitConverter.GetBytes(ints[i]), 0, bytes, i * 4, 4);
}
Guid id = new Guid(bytes);
System.Guid guid = System.Guid.NewGuid();
byte[] guidArray = guid.ToByteArray();
// condition
System.Diagnostics.Debug.Assert(guidArray.Length % sizeof(int) == 0);
int[] intArray = new int[guidArray.Length / sizeof(int)];
System.Buffer.BlockCopy(guidArray, 0, intArray, 0, guidArray.Length);
byte[] guidOutArray = new byte[guidArray.Length];
System.Buffer.BlockCopy(intArray, 0, guidOutArray, 0, guidOutArray.Length);
System.Guid guidOut = new System.Guid(guidOutArray);
// check
System.Diagnostics.Debug.Assert(guidOut == guid);
Somehow I had much more fun doing it this way:
byte[] bytes = guid.ToByteArray();
int[] ints = new int[bytes.Length / sizeof(int)];
for (int i = 0; i < bytes.Length; i++) {
ints[i / sizeof(int)] = ints[i / sizeof(int)] | (bytes[i] << 8 * ((sizeof(int) - 1) - (i % sizeof(int))));
}
and converting back:
byte[] bytesAgain = new byte[ints.Length * sizeof(int)];
for (int i = 0; i < bytes.Length; i++) {
bytesAgain[i] = (byte)((ints[i / sizeof(int)] & (byte.MaxValue << 8 * ((sizeof(int) - 1) - (i % sizeof(int))))) >> 8 * ((sizeof(int) - 1) - (i % sizeof(int))));
}
Guid guid2 = new Guid(bytesAgain);
Will the built-in Guid structure not suffice?
Constructor:
public Guid(
byte[] b
)
And
public byte[] ToByteArray()
which returns a 16-element byte array that contains the value of this instance.
Packing the bytes into integers and vice versa should be trivial.
A Guid is typically just a 128-bit number.
-- Edit
So in C#, you can get the 16 bytes via
byte[] b = Guid.NewGuid().ToByteArray();
