I have a byte array received from a C++ program.
arr[0..3] // a real32,
arr[4] // a uint8,
How can I interpret arr[4] as int?
(uint)arr[4] // Err: can't implicitly convert string to int.
BitConverter.ToUint16(arr[4]) // Err: Invalid argument.
buff[0+4] as int // Err: must be reference or nullable type
Do I have to pad it with a zero byte to interpret it as a UInt16?
OK, here is the confusion. Initially, I defined my class.
byte[] buff;
buff = getSerialBuffer();
public class Reading{
public string scale_id;
public string measure;
public int measure_revised;
public float wt;
}
rd = new Reading();
// !! here is the confusion... !!
// Err: Can't implicitly convert 'string' to 'int'
rd.measure = string.Format("{0}", buff[0 + 4]);
// then I thought, maybe I should convert buff[4] to int first?
// I threw every form of conversion at it here; none worked.
// but, later it turns out:
rd.measure_revised = buff[0+4]; // just ok.
So basically, I don't understand why this happens
rd.measure = string.Format("{0}", buff[0 + 4]);
//Err: Can't implicitly convert 'string' to 'int'
If buff[4] is a byte, and a byte is a uint8, what does "can't implicitly convert string to int" mean? It confuses me.
You were almost there. Assuming you wanted a 32-bit int from the first 4 bytes (it's hard to interpret your question):
BitConverter.ToInt32(arr, 0);
This says to take the 4 bytes from arr, starting at index 0, and turn them into a 32-bit int. (docs)
Note that BitConverter uses the endianness of the computer, so on x86/x64 this will be little-endian.
If you want to use an explicit endianness, you'll need to construct the int by hand:
int littleEndian = arr[0] | (arr[1] << 8) | (arr[2] << 16) | (arr[3] << 24);
int bigEndian = arr[3] | (arr[2] << 8) | (arr[1] << 16) | (arr[0] << 24);
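On runtimes that have the System.Buffers.Binary namespace (.NET Core 2.1 and later; a side note, since your target framework isn't stated), BinaryPrimitives expresses the same reads with the endianness named explicitly:
using System.Buffers.Binary; // needed for BinaryPrimitives

// Read the first 4 bytes with an explicit byte order, independent of the machine's.
int littleEndian = BinaryPrimitives.ReadInt32LittleEndian(arr.AsSpan(0, 4));
int bigEndian = BinaryPrimitives.ReadInt32BigEndian(arr.AsSpan(0, 4));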
If instead you wanted a 32-bit floating-point number from the first 4 bytes, see Dmitry Bychenko's answer.
If I've understood you right, you have a byte (not string) array:
byte[] arr = new byte[] {
182, 243, 157, 63, // Real32 - C# Single or float (e.g. 1.234f)
123 // uInt8 - C# byte (e.g. 123)
};
To get the float and byte back, you can try BitConverter:
// read float / single starting from 0th byte
float realPart = BitConverter.ToSingle(arr, 0);
byte bytePart = arr[4];
Console.Write($"Real Part: {realPart}; Integer Part: {bytePart}");
Outcome:
Real Part: 1.234; Integer Part: 123
Same idea (BitConverter class) if we want to encode arr:
float realPart = 1.234f;
byte bytePart = 123;
byte[] arr =
BitConverter.GetBytes(realPart)
.Concat(new byte[] { bytePart })
.ToArray();
Console.Write(string.Join(" ", arr));
Outcome:
182 243 157 63 123
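A side note on the original question about reading arr[4] as an int: since arr[4] is already a byte, it widens to int implicitly, so a plain assignment (or an explicit cast, if you prefer) is all that's needed:
int measure = arr[4];      // implicit widening conversion from byte to int
int viaCast = (int)arr[4]; // explicit cast, same result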
I am trying to move a trained model into a production environment and have encountered an issue trying to replicate the behavior of the Keras hashing_trick() function in C#. When I encode the sentence, my output is different in C# than it is in Python:
Text: "Information - The configuration processing is completed."
Python: [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 217 142 262 113 319 413]
C#: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 433, 426, 425, 461, 336, 146, 52]
(copied from debugger, both sequences have length 30)
What I've tried:
Changing the encoding of the text bytes in C# to match the Python string.encode() default (UTF-8)
Changing the capitalization of letters to lowercase and uppercase
Using Convert.ToUInt32 instead of BitConverter (resulted in an overflow error)
My code (below) is my implementation of the Keras hashing_trick function. A single input sentence is given, and the function returns the corresponding encoded sequence.
public uint[] HashingTrick(string data)
{
const int VOCAB_SIZE = 534; //Determined through python debugging of model
var filters = "!#$%&()*+,-./:;<=>?#[\\]^_`{|}~\t\n".ToCharArray().ToList();
filters.ForEach(x =>
{
data = data.Replace(x, '\0');
});
string[] parts = data.Split(' ');
var encoded = new List<uint>();
parts.ToList().ForEach(x =>
{
using (System.Security.Cryptography.MD5 md5 = System.Security.Cryptography.MD5.Create())
{
byte[] inputBytes = System.Text.Encoding.UTF8.GetBytes(x);
byte[] hashBytes = md5.ComputeHash(inputBytes);
uint val = BitConverter.ToUInt32(hashBytes, 0);
encoded.Add(val % (VOCAB_SIZE - 1) + 1);
}
});
return PadSequence(encoded, 30);
}
private uint[] PadSequence(List<uint> seq, int maxLen)
{
if (seq.Count < maxLen)
{
while (seq.Count < maxLen)
{
seq.Insert(0, 0);
}
return seq.ToArray();
}
else if (seq.Count > maxLen)
{
return seq.GetRange(seq.Count - maxLen - 1, maxLen).ToArray();
}
else
{
return seq.ToArray();
}
}
The Keras implementation of the hashing trick can be found here.
If it helps, I am using an ASP.NET Web API as my solution type.
The biggest problem with your code is that it fails to account for the fact that Python's int is an arbitrary precision integer, while C#'s uint has only 32 bits. This means that Python is calculating the modulo over all 128 bits of the hash, while C# is not (and BitConverter.ToUInt32 is the wrong thing to do in any case, as the endianness is wrong). The other problem that trips you up is that \0 does not terminate strings in C#, and \0 can't just be added to an MD5 hash without changing the outcome.
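To make the first point concrete, here is a rough sketch (the word "information" and the vocabulary size are only placeholders) contrasting the 32-bit value your code reduces with the full 128-bit hash value that Python reduces:
byte[] hash;
using (var md5 = MD5.Create())
    hash = md5.ComputeHash(Encoding.UTF8.GetBytes("information"));

const int vocabSize = 534;

// What the posted code does: only the first 4 bytes of the digest, in machine byte order.
uint first32 = BitConverter.ToUInt32(hash, 0);
Console.WriteLine(first32 % (vocabSize - 1) + 1);

// What Python effectively does: the entire 128-bit digest as one non-negative integer
// (bytes reversed because BigInteger expects little-endian; trailing 0 keeps it positive).
var full128 = new BigInteger(hash.Reverse().Concat(new byte[] { 0 }).ToArray());
Console.WriteLine(full128 % (vocabSize - 1) + 1);
The two results will generally disagree, which is the mismatch you're seeing.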
Translated in as straightforward a manner as possible:
int[] hashingTrick(string text, int n, string filters, bool lower, string split) {
var splitWords = String.Join("", text.Where(c => !filters.Contains(c)))
.Split(new[] { split }, StringSplitOptions.RemoveEmptyEntries);
return (
from word in splitWords
let bytes = Encoding.UTF8.GetBytes(lower ? word.ToLower() : word)
let hash = MD5.Create().ComputeHash(bytes)
// add a 0 byte to force a non-negative result, per the BigInteger docs
let w = new BigInteger(hash.Reverse().Concat(new byte[] { 0 }).ToArray())
select (int) (w % (n - 1) + 1)
).ToArray();
}
Sample use:
const int vocabSize = 534;
Console.WriteLine(String.Join(" ",
hashingTrick(
text: "Information - The configuration processing is completed.",
n: vocabSize,
filters: "!#$%&()*+,-./:;<=>?#[\\]^_`{|}~\t\n",
lower: true,
split: " "
).Select(i => i.ToString())
));
217 142 262 113 319 413
This code has various inefficiencies: filtering characters with LINQ is very inefficient compared to using a StringBuilder, and we don't really need BigInteger here since MD5 is always exactly 128 bits. Optimizing (if necessary) is left as an exercise to the reader, as is padding the outcome (which you already have a function for).
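For illustration, the filtering step rewritten with a StringBuilder could look roughly like this (a sketch only; same filters string, hypothetical helper name):
static string StripFilters(string text, string filters)
{
    // Copy only the characters that are not in the filter set.
    var sb = new StringBuilder(text.Length);
    foreach (char c in text)
    {
        if (filters.IndexOf(c) < 0)
            sb.Append(c);
    }
    return sb.ToString();
}
The rest of the method stays the same; only the String.Join/Where expression would be replaced by a call to this helper.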
Instead of fighting with C# to get the hashing right, I took a different approach to the problem. When making my data set to train the model (this is a machine learning project, after all), I decided to use @Jeron Mostert's implementation of the hashing function to pre-hash the data set before feeding it into the model.
This solution was much easier to implement and ended up working just as well as the original text hashing. A word of advice for those attempting cross-language hashing like me: don't do it, it's a lot of headache! Use one language for hashing your text data and find a way to create a valid data set with all of the information required.
First, sorry for my English.
I'm using C# and trying to decode RTP A-law packets, but my decoder produces noise.
I checked my code against the Wireshark code, which reproduces the voice without noise. I can't work out the difference between the Wireshark code (C++) and my code (C#), because I can't debug Wireshark, but I can see the difference in the bytes the two produce.
Below are the byte results from both, along with a sample of my code and of the Wireshark code.
For example:
when alaw_exp_table[data[i]] = -8
my code produces the bytes: 248, 255
the Wireshark code produces the bytes: 255, 248
You can see that 248, 255 versus 255, 248 looks like a simple swap, but the next example doesn't fit that pattern:
when alaw_exp_table[data[i]] = 8
my code produces the bytes: 8, 0
the Wireshark code produces the bytes: 0, 0
This is the Wireshark code:
int
decodeG711a(void *input, int inputSizeBytes, void *output, int *outputSizeBytes)
{
guint8 *dataIn = (guint8 *)input;
gint16 *dataOut = (gint16 *)output;
int i;
for (i=0; i<inputSizeBytes; i++)
{
dataOut[i] = alaw_exp_table[dataIn[i]];
}
*outputSizeBytes = inputSizeBytes * 2;
return 0;
}
static short[] alaw_exp_table= {
-5504, -5248, -6016, -5760, -4480, -4224, -4992, -4736,
-7552, -7296, -8064, -7808, -6528, -6272, -7040, -6784,
-2752, -2624, -3008, -2880, -2240, -2112, -2496, -2368,
-3776, -3648, -4032, -3904, -3264, -3136, -3520, -3392,
-22016,-20992,-24064,-23040,-17920,-16896,-19968,-18944,
-30208,-29184,-32256,-31232,-26112,-25088,-28160,-27136,
-11008,-10496,-12032,-11520, -8960, -8448, -9984, -9472,
-15104,-14592,-16128,-15616,-13056,-12544,-14080,-13568,
-344, -328, -376, -360, -280, -264, -312, -296,
-472, -456, -504, -488, -408, -392, -440, -424,
-88, -72, -120, -104, -24, -8, -56, -40,
-216, -200, -248, -232, -152, -136, -184, -168,
-1376, -1312, -1504, -1440, -1120, -1056, -1248, -1184,
-1888, -1824, -2016, -1952, -1632, -1568, -1760, -1696,
-688, -656, -752, -720, -560, -528, -624, -592,
-944, -912, -1008, -976, -816, -784, -880, -848,
5504, 5248, 6016, 5760, 4480, 4224, 4992, 4736,
7552, 7296, 8064, 7808, 6528, 6272, 7040, 6784,
2752, 2624, 3008, 2880, 2240, 2112, 2496, 2368,
3776, 3648, 4032, 3904, 3264, 3136, 3520, 3392,
22016, 20992, 24064, 23040, 17920, 16896, 19968, 18944,
30208, 29184, 32256, 31232, 26112, 25088, 28160, 27136,
11008, 10496, 12032, 11520, 8960, 8448, 9984, 9472,
15104, 14592, 16128, 15616, 13056, 12544, 14080, 13568,
344, 328, 376, 360, 280, 264, 312, 296,
472, 456, 504, 488, 408, 392, 440, 424,
88, 72, 120, 104, 24, 8, 56, 40,
216, 200, 248, 232, 152, 136, 184, 168,
1376, 1312, 1504, 1440, 1120, 1056, 1248, 1184,
1888, 1824, 2016, 1952, 1632, 1568, 1760, 1696,
688, 656, 752, 720, 560, 528, 624, 592,
944, 912, 1008, 976, 816, 784, 880, 848};
And this is my code:
public static void ALawDecode(byte[] data, out byte[] decoded)
{
int size = data.Length;
decoded = new byte[size * 2];
for (int i = 0; i < size; i++)
{
//First byte is the less significant byte
decoded[2 * i] = (byte)(alaw_exp_table[data[i]] & 0xff);
//Second byte is the more significant byte
decoded[2 * i + 1] = (byte)(alaw_exp_table[data[i]] >> 8);
}
}
The alaw_exp_table is the same in my code and in the Wireshark code.
Please tell me what is wrong in my code that produces that noise.
Thanks in advance.
You are probably handling the endianness incorrectly.
Try swapping the two decoding operations in your C# sample, e.g.:
decoded[2 * i + 1] = (byte)(alaw_exp_table[data[i]] & 0xff);
decoded[2 * i] = (byte)(alaw_exp_table[data[i]] >> 8);
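To see why the order matters, take the -8 sample from the question: as a 16-bit value it is 0xFFF8, so the low byte is 0xF8 (248) and the high byte is 0xFF (255). Which byte is written first is purely a byte-order choice:
short sample = -8;                        // 0xFFF8 as a 16-bit value
byte low = (byte)(sample & 0xFF);         // 0xF8 = 248
byte high = (byte)((sample >> 8) & 0xFF); // 0xFF = 255
// low, high  -> 248, 255 (little-endian, what your code currently writes)
// high, low  -> 255, 248 (big-endian, what the Wireshark dump shows)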
You are decoding 8-bit A-law samples into 16-bit signed PCM, so it would make sense to use an array of shorts for the output. That is close to what the C code is doing.
If you don't have a particular reason for using a byte array as output, I would suggest making the A-law lookup table a short array and moving 16-bit signed values around, instead of messing around with byte ordering.
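A minimal sketch of that approach, assuming the same alaw_exp_table shown above:
public static short[] ALawDecode(byte[] data)
{
    // One 16-bit signed PCM sample per 8-bit A-law input byte.
    short[] decoded = new short[data.Length];
    for (int i = 0; i < data.Length; i++)
    {
        decoded[i] = alaw_exp_table[data[i]];
    }
    return decoded;
}
If you later need a raw byte stream (say, for an audio API that expects 16-bit little-endian PCM), Buffer.BlockCopy from the short[] into a byte[] of twice the length will lay the samples out in the machine's native byte order in one call.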
If you really do care about bytes and byte ordering, you need to get the byte ordering right, as @leppie says. This will depend on what you actually do with the output.