How can we identify silence packet in a byte array (Buffer provided by WaveInEventArgs) using Naudio. basically I am trying to loop through the array and checking for 0 values in the array. Is that correct?
I'm not sure what you mean by "packet", but finding silence is usually a matter of looking for consecutive samples having an absolute value less than a "threshold" amount. 0.00006 is -84.437 dB, so silence detection can be done on most audio with that value (though you should feel free to adjust that threshold to fit your audio). Depending on exactly what you are doing, you'll want to see a sequence of anywhere from 440 to 48000 "silent" samples before deciding it's a silent "packet".
Related
Im using kinect to extract audio and classifie its features, but i have a question. On http://msdn.microsoft.com/en-us/library/hh855698.aspx it says the audio.start method Opens an audio data stream (16-bit PCM format, sampled at 16 kHz) and starts capturing audio data streamed out of a sensor. The problem is that i dont know how pcm data is represented and i dont know if the method returns pcm true values or not. Because using the sdk examples i get values like 200, 56, 17 and i think audio values are more like -3*10^-5 .
So does anyone know how do i get the true PCM values? Or am i doing something wrong?
Thanks
I wouldn't expect any particular values. 16-bit PCM means it's a series of 16-bit integers, so -3*10-5 (-0.00003) isn't representable.
I would guess it's encoded with 16-bit signed integers (like a WAV file) which have a range of -32768 to 32767. If you're being very quiet the values will probably be close to 0. If you make a lot of noise you will see some higher values too.
Check out this diagram (from Wikipedia's article on PCM) which shows a sine wave encoded as PCM using 4-bit unsigned integers, which have a range of 0 to 15.
See how that 4-bit sine wave oscillates around 7? That's it's equilibrium. If it was a signed 4-bit integer (which has a range of -8 to 7) it would have the same shape, but its equilibrium would be 0 - the values would be shifted by -8 so it would oscillate around 0.
You can measure the distance from the equilibrium to the highest or lowest points of the sine wave to get its amplitude, or broadly, it's volume (which is why if you're quiet you will mostly see values near 0 in your signed 16-bit data). This is probably the easiest sort of feature detection you can do. You can find plenty of good explanations on the web about this, for example http://scienceaid.co.uk/physics/waves/sound.html.
You could save it to a file and play it back with something like Audacity if you're not sure. Fiddle with the input settings and you'll soon figure out the format.
I have an in-memory byte[] and need to locate the offset where 13 and 10 are. I will then use the following to extract that line:
String oneLine = Encoding.ASCII.GetString(bytes, 0, max);
What is the fastest way to search for the two bytes on a x64 bit computer? ..and convert it to a string?
Is there anything I can do other than iterating through each byte, scanning for 13 and then scan for 10?
// Disclaimer:
// This is just for my curiosity. Perhaps I'll gain a better understanding of
// how .NET interfaces with RAM, the CPU instructions related to comparisons, etc.
//
// I don't suspect a performance problem, but I do suspect a lack of understanding
// (on my part) on how C# does low-level operations.
Not sure if it will be 'the fastest way' but you can look at Boyer-Moore algorithm to find the indexes of the required values.
Have a look at this SO thread Search longest pattern in byte array in C#
Boyer-Moore would be better than a linear array traversal because it can skip elements depending on the length of you 'needle' and it gets better as the 'haystack' gets bigger. HTH.
Since you are looking for a two byte sequence, you don't have to scan every byte, just every other one. If the target index contains a 13, then look at the next byte for a 10. If the target index contatins a 10, look at the previous byte for a 13. That should cut your scan time approximatly in half over a linear search.
Using System.IO BinaryReader object found in .NET mscorlib assembly, I ran a loop that dumped each byte value from a .wav file into Excel spreadsheet. For simplicity sake, I recorded a two second 4K signal from signal generator into software sequencer and saved as monaural wave file. The software I sequence music with shows a resolution of 1ms - which is 44.11 samples(assuming 44.1K sample rate). What I find curious is that the data extracted via ReadInt16() method(starting at position 44 in .wav file) shows varied numbers with integers switching signs seemingly at random- whereas the visual sine wave within sequencer is completely uniform with respect to amplitude and frequency. With 16 bit resolution, I determined that for each sample first byte was frequency resolution and the second amplitude, is correct?
Question: How can I intelligently interpret the integers pulled from wave file for the ultimate purpose of determining rhythmic beats?
Many thanks...........Mickey
For a WAV file with 16 bits per sample, it is not the case that the first byte of the sample is frequency resolution and the second byte is amplitude. Both bytes together indicate the sample's amplitude at that specific point in time. The two bytes are interpreted as a 2-byte integer, so the values will range from -32768 to +32767.
I do not know how your sequencer works or what it is displaying. From your description, it sounds as if your sequencer is using FFT to convert the audio from time-domain (which is what a WAV file is) to frequency-domain (which is a graph with frequency along the x-axis and frequency amplitude along the y-axis). A WAV file does not contain frequency information.
I have data 0f 340 bytes in string mostly consists of signs and numbers like "føàA¹º#ƒUë5§Ž§"
I want to compress into 250 or less bytes to save it on my RFID card.
As this data is related to finger print temp. I want lossless compression.
So is there any algorithm which i can implement in C# to compress it?
If the data is strictly numbers and signs, I highly recommend changing the numbers into int based values. eg:
+12939272-23923+927392
can be compress into 3 piece of 32-bit integers, which is 22 bytes => 16 bytes. Picking the right integer size (whether 32-bit, 24-bit, 16-bit) should help.
If the integer size varies greatly, you could possibly use 8-bit to begin and use the value 255 to specify that the next 8-bit becomes the 8 more significant bits of the integer, making it 15-bit.
alternatively, you could identify the most significant character and assign 0 for it. the second most significant character gets 10, and the third 110. This is a very crude compression, but if you data is very limited, this might just do the job for you.
Is there any other information you know about your string? For instance does it contain certain characters more often than others? Does it contain all 255 characters or just a subset of them?
If so, huffman encoding may help you, see this or this other link for implementations in C#.
To be honest it just depends on how your input string looks like. What I'd do is try the using rar, zip, 7zip (LZMA) with very small dictionary sizes (otherwise they'll just use up too much space for preprocessed information) and see how big the raw compressed file they produce is (will probably have to use their libraries in order to make them strip headers to conserve space). If any of them produce a file under 250b, then find the c# library for it and there you go.
I'm new to audio analysis, but need to perform a (seemingly) simple task. I have a byte array containing a 16 bit recording (single channel) and a sample rate of 44100. How do I perform a quick analysis to get the volume at any given moment? I need to calculate a threshold, so a function to return true if it's above a certain amplitude (volume) and false if not. I thought I could iterate through the byte array and check its value, with 255 being the loudest, but this doesn't seem to work as even when I don't record anything, background noise gets in and some of the array is filled with 255. Any suggestions would be great.
Thanks
As you have 16-bit data, you should expect the signal to vary between -32768 and +32767.
To calculate the volume you can take intervals of say 1000 samples, and calculate their RMS value. Sum the squared sample values divide by 1000 and take the square root. check this number against you threshold.
Typically one measures the energy of waves using root mean square.
If you want to be more perceptually accurate you can take the time-domain signal through a discrete fourier transform to a frequency-domain signal, and integrate over the magnitudes with some weighting function (since low-frequency waves are perceptually louder than high-frequency waves at the same energy).
But I don't know audio stuff either so I'm just making stuff up. ☺
I might try applying a standard-deviation sliding-window. OTOH, I would not have assumed that 255 = loudest. It may be, but I'd want to know what encoding is being used. If any compression is present, then I doubt 255 is "loudest."