I'm trying to determine the "beats per minute" from real-time audio in C#. It is not music that I'm detecting in though, just a constant tapping sound. My problem is determining the time between those taps so I can determine "taps per minute" I have tried using the WaveIn.cs class out there, but I don't really understand how its sampling. I'm not getting a set number of samples a second to analyze. I guess I really just don't know how to read in an exact number of samples a second to know the time between to samples.
Any help to get me in the right direction would be greatly appreciated.
I'm not sure which WaveIn.cs class you're using, but usually with code that records audio, you either A) tell the code to start recording, and then at some later point you tell the code to stop, and you get back an array (usually of type short[]) that comprises the data recorded during this time period; or B) tell the code to start recording with a given buffer size, and as each buffer is filled, the code makes a callback to a method you've defined with a reference to the filled buffer, and this process continues until you tell it to stop recording.
Let's assume that your recording format is 16 bits (aka 2 bytes) per sample, 44100 samples per second, and mono (1 channel). In the case of (A), let's say you start recording and then stop recording exactly 10 seconds later. You will end up with a short[] array that is 441,000 (44,100 x 10) elements in length. I don't know what algorithm you're using to detect "taps", but let's say that you detect taps in this array at element 0, element 22,050, element 44,100, element 66,150 etc. This means you're finding taps every .5 seconds (because 22,050 is half of 44,100 samples per second), which means you have 2 taps per second and thus 120 BPM.
In the case of (B) let's say you start recording with a fixed buffer size of 44,100 samples (aka 1 second). As each buffer comes in, you find taps at element 0 and at element 22,050. By the same logic as above, you'll calculate 120 BPM.
Hope this helps. With beat detection in general, it's best to record for a relatively long time and count the beats through a large array of data. Trying to estimate the "instantaneous" tempo is more difficult and prone to error, just like estimating the pitch of a recording is more difficult to do in realtime than with a recording of a full note.
I think you might be confusing samples with "taps."
A sample is a number representing the height of the sound wave at a given moment in time. A typical wave file might be sampled 44,100 times a second, so if you have two channels for stereo, you have 88,200 sixteen-bit numbers (samples) per second.
If you take all of these numbers and graph them, you will get something like this:
(source: vbaccelerator.com)
What you are looking for is this peak ------------^
That is the tap.
Assuming we're talking about the same WaveIn.cs, the constructor of WaveLib.WaveInRecorder takes a WaveLib.WaveFormat object as a parameter. This allows you to set the audio format, ie. samples rate, bit depth, etc. Just scan the audio samples for peaks or however you're detecting "taps" and record the average distance in samples between peaks.
Since you know the sample rate of the audio stream (eg. 44100 samples/second), take your average peak distance (in samples), multiply by 1/(samples rate) to get the time (in seconds) between taps, divide by 60 to get the time (in minutes) between taps, and invert to get the taps/minute.
Hope that helps
Related
I have a stream of data that I get at around 10 snapshots per second. I wrote a C# controller that takes this data and adjusts some data-gathering parameters based on the distance away from the expected result. Currently, I just do a linear scale operation on my data so that the farther away I am from the expected result the more it corrects. The problem is that the incoming data stream is delayed by somewhere between 0.5 and 2 seconds (I can calculate that at runtime). Because of this delay, it is correcting for results from a while ago and is constantly overcorrecting and even correcting in the wrong direction sometimes.
I am looking for an algorithm to do the following:
Implement a correction algorithm (I'd prefer PID) that will attempt to hone-in on the optimal value
Predict a certain amount of time (or number of datasets) in advance based on the history of corrections and results
What options do I have in terms of algorithms that can accomplish this?
C# code samples would be appreciated; I am not great at converting complex pseudo-code in to a working implementation.
I'm trying to get input from plug-in guitar, get the frequency from it and check whether the users is playing the right note or not. Something like a guitar tuner (I'll need to do a guitar tuner as well).
My first question is, how can I get the frequency of guitar input in real time?
and is it possible to do something like :
if (frequency == noteCFrequency)
{
//print This is a C note!!
}
I'm now able to get input from the soundcard, record and playback the input sound already.
For an implementation of FFT in C# you can have a look at this.
Whiel I think that you do not need to fully understand the FFT to use it, you should know about some basic limitations:
You always need a sample window. You may have a sliding window but the essence of being fast here is to take a chunk of signal and accept some error.
You have "buckets" of frequencies not exact ones. The result is something like "In the range 420Hz - 440Hz you have 30% of the signal". (The "width" of the buckets should be adjustable)
The window size must contain a number of samples that is a power of 2.
The window size must be at least two wavelengths of the longest wavelength you want to detect.
The highest frequency is given by the sampling rate. (You don't need to worry about this so much)
The more precise you want your frequencies separated, the longer shall your window be.
The other answers don't really explain how to do this, they just kinda waive their arms. For example, you would have no idea from reading those answers that the output of an FFT is a set of complex numbers, and you wouldn't have any clue how to interpret them.
Moreover FFT is not even the best available method, although for your purposes it works fine and most people find it to be the most intuitive. Anyway, this question has been asked to death, so I'll just refer you to other questions on SO. Not all apply to C#, but you are going to need to understand the (non-trivial) concepts first. You can find an answer to your question by reading answers to these questions and following the links.
frequency / pitch detection for dummies
Get the frequency of an audio file in every 1/4 seconds in android
How to detect sound frequency / pitch on an iPhone?
How to calculate sound frequency in android?
how to retrieve the different original frequency in each FFT caculating and without any frequency leakage in java
You must compute the FFT -Fast Fourier Transform- of a piece of signal and look for a peak. For kinds of FFT, window type, window size... you must read some documentation regarding signal processing. Anyway a 25 ms window is OK and use a Hamming window, for example. On the net there is lot of code for computing FFT. Good luck!
I have a wav file and all what i need is to perform a function when a remarkable intensity of sound plays.
For example : if there is a sound of intensity level 10 (supposed) is playing so i want that when ever the intensity level of sound increases from 10 then an event should be triggered to tell me that there is a remarkable sound.
I tried to google it and found that if we read the bytes of wav file and read the data chunk (after 44th byte) we get the user data (sound data). but when i analyse this data i got confused because there is also same data where there is no sound.
I hope my question is quite clear.
so please i need your suggestions/ideas and references.
You don't need an FFT for this - you can just compute the short term RMS power and when this exceeds a predetermined threshold then you have a "loud" sound.
power_RMS = sqrt(sum(x^2) / N)
where x is the sample value and N is the number of samples over which you want to compute RMS power - I would suggest using a period of say 10 ms which gives N = 441 samples at a 44.1 kHz sample rate.
I want to make a program that detects the note that is being played in front of the microphone. I am testing the FFT function of Naudio, but with the tests that I did in audacity it seems that FFT does not detect the pitch correctly. I played an C5, but the highest pick was at E7.
I changed the first dropdown box in the frequency analysis window to "enchanced autocorrelation" and after that the highest pick was at C5.
I googled "enchanced autocorrelation" and had no luck.
You are likely getting thrown off by harmonics. Have you tried testing with a sine wave to see if your NAudio's FFT is in the ballpark?
See these references:
http://cnx.org/content/m11714/latest/
http://www.gamedev.net/community/forums/topic.asp?topic_id=506592&whichpage=1�
Line 48 in Spectrum.cpp in the Audacity source code seems to be close to what you want. They also reference an IEEE paper by Tolonen and Karjalainen.
The highest peak in an audio spectrum is not necessarily the musical pitch as a human would perceive it, especially in a sound with strong overtones. That's because pitch is a human psycho-perceptual phenomena, the brain will often deduce frequencies that aren't even present in a waveform.
Auto-correlation methods of frequency or pitch estimation (roughly, finding how far apart even a funny-looking and/or non-sinusoidal waveform repeats in time) is usually a better match for what a human would call pitch. The reason for various enhancements to the autocorrelation algorithm is that simple autocorrelation will find an near infinite number of repeating wavelengths (e.g. if it repeats every 1 second it also repeats twice every 2 seconds, etc.) So the trick is to weight the correlation to somehow statistically better match what a human would guess about the same waveform.
Well, if you can live with GPLv2, why not take a peek at the Audacity source code?
http://audacity.sourceforge.net/download/beta_source
I have a C# project that plays Morse code for RSS feeds. I write it using Managed DirectX, only to discover that Managed DirectX is old and deprecated. The task I have is to play pure sine wave bursts interspersed with silence periods (the code) which are precisely timed as to their duration. I need to be able to call a function which plays a pure tone for so many milliseconds, then Thread.Sleep() then play another, etc. At its fastest, the tones and spaces can be as short as 40ms.
It's working quite well in Managed DirectX. To get the precisely timed tone I create 1 sec. of sine wave into a secondary buffer, then to play a tone of a certain duration I seek forward to within x milliseconds of the end of the buffer then play.
I've tried System.Media.SoundPlayer. It's a loser [edit - see my answer below] because you have to Play(), Sleep(), then Stop() for arbitrary tone lengths. The result is a tone that is too long, variable by CPU load. It takes an indeterminate amount of time to actually stop the tone.
I then embarked on a lengthy attempt to use NAudio 1.3. I ended up with a memory resident stream providing the tone data, and again seeking forward leaving the desired length of tone remaining in the stream, then playing. This worked OK on the DirectSoundOut class for a while (see below) but the WaveOut class quickly dies with an internal assert saying that buffers are still on the queue despite PlayerStopped = true. This is odd since I play to the end then put a wait of the same duration between the end of the tone and the start of the next. You'd think that 80ms after starting Play of a 40 ms tone that it wouldn't have buffers on the queue.
DirectSoundOut works well for a while, but its problem is that for every tone burst Play() it spins off a separate thread. Eventually (5 min or so) it just stops working. You can see thread after thread after thread exiting in the Output window while running the project in VS2008 IDE. I don't create new objects during playing, I just Seek() the tone stream then call Play() over and over, so I don't think it's a problem with orphaned buffers/whatever piling up till it's choked.
I'm out of patience on this one, so I'm asking in the hopes that someone here has faced a similar requirement and can steer me in a direction with a likely solution.
I can't believe it... I went back to System.Media.SoundPlayer and got it to do just what I want... no giant dependency library with 95% unused code and/or quirks waiting to be discovered :-). Furthermore, it runs on MacOSX under Mono (2.6)!!! [wrong - no sound, will ask separate question]
I used a MemoryStream and BinaryWriter to crib a WAV file, complete with the RIFF header and chunking. No "fact" chunk needed, this is 16-bit samples at 44100Hz. So now I have a MemoryStream with 1000ms of samples in it, and wrapped by a BinaryReader.
In a RIFF file there are two 4-byte/32-bit lengths, the "overall" length which is 4 bytes into the stream (right after "RIFF" in ASCII), and a "data" length just before the sample data bytes. My strategy was to seek in the stream and use the BinaryWriter to alter the two lengths to fool the SoundPlayer into thinking the audio stream is just the length/duration I want, then Play() it. Next time, the duration is different, so once again overwrite the lengths in the MemoryStream with the BinaryWriter, Flush() it and once again call Play().
When I tried this, I couldn't get the SoundPlayer to see the changes to the stream, even if I set its Stream property. I was forced to create a new SoundPlayer... every 40 milliseconds??? No.
Well I want back to that code today and started looking at the SoundPlayer members. I saw "SoundLocation" and read it. There it said that a side effect of setting SoundLocation would be to null the Stream property, and vice versa for Stream. So I added a line of code to set the SOundLocation property to something bogus, "x", then set the Stream property to my (just modified) MemoryStream. Damn if it didn't pick that up and play a tone precisely as long as I asked for. There don't seem to be any crazy side effects like dead time afterward or increasing memory, or ??? It does take 1-2 milliseconds to do that tweaking of the WAV stream and then load/start the player, but it's very small and the price is right!
I also implemented a Frequency property which re-generates the samples and uses the Seek/BinaryWriter trick to overlay the old data in the RIFF/WAV MemoryStream with the same number of samples but for a different frequency, and again did the same thing for an Amplitude property.
This project is on SourceForge. You can get to the C# code for this hack in SPTones.CS from this page in the SVN browser. Thanks to everyone who provided info on this, including #arke whose thinking was close to mine. I do appreciate it.
It's best to just generate the sine waves and silence together into a buffer which you play. That is, always play something, but write whatever you need next into that buffer.
You know the samplerate, and given the samplerate, you can calculate the amount of samples you need to write.
uint numSamples = timeWantedInSeconds * sampleRate;
That's the amount of samples you need to generate a sine wave or silence, whichever. Then just fill the buffer as needed. That way, you get the most accurate possible timing.
Try using XNA.
You will have to provide a file, or a stream to a static tone, that you can loop. You can then change the pitch and volume of that tone.
Since XNA is made for games, it will have no problem at all with 40 ms delays.
It should be pretty easy to convert from ManagedDX to SlimDX ...
Edit: What stops you, btw, just pre-generating 'n' samples of sine wave? (Where n is the closest to the number of milliseconds you want). It really doesn't take all that long to generate the data. Further than that if you have a 22Khz buffer and you want the final 100 samples why don't you just submit 'buffer + 21950' and set the buffer length to 100 samples?