I'm trying to create a small utility application (console app or WinForms) in C# that will load a single-cycle waveform. The user can then enter a chord, and the app will generate a "single-cycle chord", i.e. a perfectly loopable version of the chord.
A perfect fifth (not a chord, but two notes), for instance, would be the original single cycle looped twice, mixed with a second copy of the single cycle transposed 7 semitones up and looped 3 times in the same timeframe.
What I can't find is how to transpose the wave simply by playing it faster, just like most basic samplers do. What would be a good way to do this in C#?
I've tried NAudio and cscore, but can't find how to play the wave at a different pitch by playing it faster. The pitch shift and varispeed examples are not the thing I'm looking for because those either try to keep the length the same or try to keep the pitch the same.
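For reference, this is the kind of operation I mean, written as a rough sketch in plain array math (no library; the helper name and the linear interpolation are just for illustration):

```csharp
// Rough sketch: build a loopable single-cycle "perfect fifth" from one cycle.
// The root is looped twice; a copy read 1.5x faster (a just fifth, so it fits
// exactly 3 times in the same span) is mixed in. Linear interpolation only.
static float[] BuildFifth(float[] cycle)
{
    int outLen = cycle.Length * 2;        // two loops of the root note
    var result = new float[outLen];
    const double ratio = 1.5;             // exactly 3:2 so the result loops perfectly
                                          // (an equal-tempered fifth, 2^(7/12) ~ 1.4983, would not)

    for (int i = 0; i < outLen; i++)
    {
        // Root note: simple wrap-around lookup.
        float root = cycle[i % cycle.Length];

        // Fifth: read the same cycle "faster", with linear interpolation and wrapping.
        double srcPos = (i * ratio) % cycle.Length;
        int i0 = (int)srcPos;
        int i1 = (i0 + 1) % cycle.Length;
        double frac = srcPos - i0;
        float fifth = (float)(cycle[i0] * (1 - frac) + cycle[i1] * frac);

        result[i] = 0.5f * (root + fifth); // simple mix, scaled to avoid clipping
    }
    return result;
}
```

This is essentially what I'd like a library to do for me when "playing the wave faster".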
Related
I'm trying to capture sound from several interfaces of a single audio card. Can I get arrays that are not shifted relative to each other? I want to use 4 microphones (2 microphones per interface, one per channel) to detect the position of a sound emitter. I use Windows, so I can't create an aggregate device. I have also recorded sound from different threads, but the delay between the arrays was quite random. This is the main problem, because I want to apply a cross-correlation function to the arrays to find the delay (shift) that gives the maximum value; this shift defines the angle to the sound source. I can use anything, not necessarily ASIO, but it must be stable over the whole recording interval. If there isn't a solution for C#, I know C++. Please tell me how I can solve my problem.
If all mics are connected to the same hardware device you can use ASIO, assuming the device actually has ASIO drivers. If not, you can either try ASIO4All (but I have no idea whether it will synchronize independent devices) or use WASAPI and perform the synchronization manually. WASAPI's IAudioCaptureClient::GetBuffer method gives you both the stream position and the stream time at which that position was recorded; from there you should be able to work out the time shift between each of the 4 mics and then perform the "unshifting" yourself.
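To illustrate only the "unshifting" part (the capture calls themselves are omitted, and I'm assuming the QPC timestamps from GetBuffer are in 100-nanosecond units, which is what I recall from the docs), a rough sketch that trims each stream to a common start time:

```csharp
using System.Linq;

// Sketch of the "unshifting" step only: given the first QPC timestamp reported
// by IAudioCaptureClient::GetBuffer for each mic (assumed 100 ns units) and the
// captured samples, trim every stream so they all start at the same moment.
static float[][] AlignStreams(float[][] streams, long[] firstQpc100ns, int sampleRate)
{
    long latest = firstQpc100ns.Max();   // the stream that started last defines the common start

    var aligned = new float[streams.Length][];
    for (int i = 0; i < streams.Length; i++)
    {
        // How much earlier this stream started, converted to whole samples.
        long diff100ns = latest - firstQpc100ns[i];
        int samplesToSkip = (int)(diff100ns * sampleRate / 10_000_000L);

        aligned[i] = streams[i].Skip(samplesToSkip).ToArray();
    }
    return aligned;
}
```

After that, the cross-correlation should only see the constant geometric delays between mics rather than random capture offsets.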
I have been searching for answers for a long time now, but every solution I find seems too complex for what I want to do, or perhaps there is no "easier" way of doing it.
What I want to do is simply use my system microphone to get the volume or loudness (or whatever it is called) in the room. Then according to that volume, I want to adjust my system volume so that the sound from my system always "sounds the same" (the same loudness), no matter if a train passes by or an airplane flies over.
How do I get this loudness or volume in my room into a C# application to use that to change my system volume?
I am using C# and a laptop with a built in microphone.
It is better to use a library to read the input from the microphone; NAudio is probably the best one.
1. Calibrate the input by determining the microphone gain. [from #MSalters' comment]
2. Every second, iterate over the waveform recorded in memory: square the amplitudes (to get an energy), average the squared values and take the square root of that (or the log, to convert to dB). [from #MSalters' comment]
3. Based on that value, set the system volume with the Windows API (see the sketch below).
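Here is a rough sketch of steps 2 and 3, assuming NAudio for capture and its CoreAudioApi wrapper for the endpoint volume; the mapping from measured dB to output volume is made up and should be replaced with your own calibration:

```csharp
using System;
using NAudio.Wave;
using NAudio.CoreAudioApi;

class LoudnessMonitor
{
    static void Main()
    {
        // 16-bit mono capture; one DataAvailable callback roughly every second.
        var waveIn = new WaveInEvent
        {
            WaveFormat = new WaveFormat(44100, 16, 1),
            BufferMilliseconds = 1000
        };
        var speakers = new MMDeviceEnumerator()
            .GetDefaultAudioEndpoint(DataFlow.Render, Role.Multimedia);

        waveIn.DataAvailable += (s, e) =>
        {
            // Square the amplitudes, average, take the square root => RMS.
            double sumSquares = 0;
            int sampleCount = e.BytesRecorded / 2;
            for (int i = 0; i < e.BytesRecorded; i += 2)
            {
                double sample = BitConverter.ToInt16(e.Buffer, i) / 32768.0;
                sumSquares += sample * sample;
            }
            double rms = Math.Sqrt(sumSquares / Math.Max(1, sampleCount));
            double db = 20 * Math.Log10(rms + 1e-10);      // avoid log(0)

            // Made-up mapping from room loudness to output volume (0..1);
            // replace with a curve based on your own calibration.
            float volume = (float)Math.Min(1.0, Math.Max(0.0, (db + 60) / 60.0));
            speakers.AudioEndpointVolume.MasterVolumeLevelScalar = volume;
        };

        waveIn.StartRecording();
        Console.WriteLine("Monitoring... press Enter to stop.");
        Console.ReadLine();
        waveIn.StopRecording();
    }
}
```

Note that adjusting the output volume changes what the microphone picks up, so in practice you will want some smoothing or hysteresis to avoid the volume oscillating.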
So I have this mono audio file that contains people talking, pauses in the talking, and then them talking again. Both while they are talking and while they're not, there are background noises from time to time: children crying, cars' brakes squealing, the kinds of things you hear when you are outside.
My goal is to keep those parts when they are talking and to cut those parts when they are not talking out. It is not necessary to filter the background noises.
Basically my final goal is to have a cut list like this
Start in seconds, End in seconds
What have I tried?
1. I manually created a voice-only file by splicing together all of the parts that contain speech (10 seconds).
2. I manually created a noise-only file by splicing together all of the parts that do not contain speech (50 seconds).
3. I obtained the frequencies and their amplitudes by applying a Fast Fourier transform.
4. I walk through the audio file in 100 ms steps and take an FFT snapshot at each position.
5. I put all values of one snapshot (512 in my case) into a List and feed it to a machine learning library (numl) together with a label (voice = true for the first file, voice = false for the second).
6. Then I run my main audio file through the same pipeline, but this time I use the trained model to decide whether each snapshot is speech or not, and output the time in seconds at which it decides this (a rough sketch of the feature extraction is shown below).
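Here is that sketch of the feature extraction (simplified, not my exact code), using NAudio's FFT; the numl training itself is left out:

```csharp
using System;
using System.Collections.Generic;
using NAudio.Dsp;
using NAudio.Wave;

// Simplified sketch: walk the (mono) file in 100 ms hops, take a 1024-point FFT
// at each position and keep the 512 magnitude bins below Nyquist as the feature
// vector for that snapshot.
static List<float[]> ExtractFeatures(string path)
{
    var features = new List<float[]>();
    var samples = new List<float>();

    using (var reader = new AudioFileReader(path))      // returns 32-bit float samples
    {
        var buffer = new float[reader.WaveFormat.SampleRate];
        int read;
        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
            for (int i = 0; i < read; i++)
                samples.Add(buffer[i]);

        int hop = reader.WaveFormat.SampleRate / 10;     // 100 ms
        const int fftSize = 1024;                        // must be a power of two
        const int m = 10;                                // log2(fftSize)

        for (int pos = 0; pos + fftSize <= samples.Count; pos += hop)
        {
            var fft = new Complex[fftSize];
            for (int i = 0; i < fftSize; i++)
            {
                // Windowing reduces spectral leakage between snapshots.
                fft[i].X = (float)(samples[pos + i] *
                                   FastFourierTransform.HammingWindow(i, fftSize));
                fft[i].Y = 0;
            }
            FastFourierTransform.FFT(true, m, fft);

            var magnitudes = new float[fftSize / 2];     // the 512 values per snapshot
            for (int i = 0; i < magnitudes.Length; i++)
                magnitudes[i] = (float)Math.Sqrt(fft[i].X * fft[i].X + fft[i].Y * fft[i].Y);

            features.Add(magnitudes);
        }
    }
    return features;
}
```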
My problem is that I get a lot of false positives and false negatives. It seems to recognize voice when there is none and vice versa.
Is the reason for this probably a badly trained model (I use a decision tree), or do I need to take other measures to get a better result?
A common misconception about speech is to treat it as an unrelated sequence of data frames. The core property of speech is that it is a continuous process in time, not just an array of data points.
Any reasonable VAD should take that into account and use time-aware classifiers such as HMMs. In your case, any classifier that takes time into account, whether a simple energy-based voice activity detector that monitors the background level or a GMM-HMM based VAD, will do far better than any static classifier.
For description of simple algorithms you can check Wikipedia.
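For example, a minimal sketch of the energy-based variant with a simple hangover; the thresholds are illustrative and it assumes you already compute an RMS energy per 100 ms frame:

```csharp
// Minimal energy-based VAD with a hangover, the simplest of the time-aware
// approaches. Input: RMS energy per 100 ms frame and an initial noise floor.
// Thresholds and hangover length need tuning for your material.
static bool[] EnergyVad(double[] frameEnergy, double noiseFloor,
                        double onFactor = 3.0, int hangoverFrames = 5)
{
    var speech = new bool[frameEnergy.Length];
    int hangover = 0;

    for (int i = 0; i < frameEnergy.Length; i++)
    {
        if (frameEnergy[i] > noiseFloor * onFactor)
        {
            speech[i] = true;
            hangover = hangoverFrames;   // keep "speech" for a while after a hit
        }
        else if (hangover > 0)
        {
            speech[i] = true;            // bridge short pauses inside speech
            hangover--;
        }
        else
        {
            speech[i] = false;
            // Slowly adapt the noise floor during non-speech frames.
            noiseFloor = 0.95 * noiseFloor + 0.05 * frameEnergy[i];
        }
    }
    return speech;
}
```

Runs of consecutive true frames then give you the start/end seconds for the cut list; a GMM-HMM VAD replaces this threshold logic with proper statistical models of speech and noise.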
If you are looking for a good, sophisticated VAD implementation, you can find one in the WebRTC project; this VAD was developed by Google:
https://code.google.com/p/webrtc/source/browse/trunk/webrtc/common_audio/vad/
I wish to write a C# WinForms application that can play a WAV file. While playing the file, it shows a waveform (similar to an oscilloscope).
At the same time, the user can record sound via the microphone, attempting to follow the original sound being played (like karaoke). The program displays the waveform of the recorded sound in real time, so it can be compared visually with the waveform display of the original wave file. The comparison concerns the difference in time (the delay) between the original and the recorded sound. The waveform displays don't have to be very advanced (there is no need for cut, copy or paste); just being able to see them against a timeline would suffice.
I hope this is clear enough. Please do not hesitate to ask for more clarification if it's not clear. Thank you very much.
You can do what you want with C#, but it isn't going to work like you think. There is effectively no relationship at all between how a recording looks in an oscilloscope-type display and how that recording sounds to a human ear. So, for example, if I showed you two WAV files displayed in an oscilloscope display and told you that one recording was of a tuba playing and the other was of a person speaking a sentence, you would have no idea which was which just from looking at them.
If you want to compare a user's sounds to a pre-recorded WAV, you have to get more sophisticated and do FFT analysis of both and compare the frequency spectra, but even that won't really work for what you're trying to do.
Update: after some thought, I don't think I fully agree with my statements above. What you describe might sort of work if the goal is to use the oscilloscope-type display to compare the pitch (or frequency) of the WAV and the person's voice. If you tuned the oscilloscope to show a relatively small number of wavelengths at a time (say 20), the user would be able to quickly see the effect of raising or lowering the pitch of their voice.
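For what it's worth, a display like that is not much code in WinForms; here is a rough sketch (the names are made up) that just draws the most recent block of samples as a polyline:

```csharp
using System.Drawing;
using System.Windows.Forms;

// Rough sketch of an oscilloscope-style view: it draws whatever is currently in
// LatestSamples as a polyline. Copy the newest capture buffer into LatestSamples
// and call Invalidate() to refresh.
class ScopePanel : Panel
{
    // Samples in the range -1..1; the array length controls how many
    // wavelengths are visible at once.
    public float[] LatestSamples = new float[1024];

    public ScopePanel()
    {
        DoubleBuffered = true;      // avoid flicker when repainting often
        BackColor = Color.Black;
    }

    protected override void OnPaint(PaintEventArgs e)
    {
        base.OnPaint(e);
        if (LatestSamples.Length < 2 || Width < 2) return;

        var points = new PointF[LatestSamples.Length];
        for (int i = 0; i < LatestSamples.Length; i++)
        {
            float x = i * (float)Width / (LatestSamples.Length - 1);
            float y = Height / 2f - LatestSamples[i] * (Height / 2f);
            points[i] = new PointF(x, y);
        }
        e.Graphics.DrawLines(Pens.Lime, points);
    }
}
```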
I have a small sample C# app that I wrote about 2 years ago that does something kind of like this, only it displays an FFT-produced spectrograph instead of an oscilloscope (the difference is basically that a spectrograph shows frequency-domain information while an oscilloscope shows time-domain information). It's realtime, so you can talk/sing/whatever into a microphone and watch the spectrograph change dynamically.
I can dig this out and post the code here if you like. Or if you want the fun of doing it all yourself, I can post some links to the code resources you'd need.
The NAudio library has plenty of functionality that will (possibly) give you what you need. I've used it in the past for some simple operations, but it is much more powerful than I've had need to use.
#ZombieSheep
NAudio is indeed useful, but it has limitations. For example, there is not much control over the waveform display: it cannot be cleared and redrawn again. Besides, if it gets too long, it's impossible to scroll back to see the earlier part of the waveform. One more thing is that it only works for playing the sound, not for recording it.
Thank you.
I'm looking for an audio library that works with .NET that allows for smooth looping. I've tried DirectX AudioVideoPlayback and Mentalis. Both are easy to use, but the looping skips a bit. I'm wondering if that's my fault or theirs. I have sound samples that I know can loop cleanly (WinAmp can do it fine) but I can't get my C# app to do the same. What library could I use, or what could I fix in my app to get it to loop cleanly with the libraries I have?
UPDATE: FMOD has been able to loop my audio, but the problem is that the .net wrapper I have only loads files one way. I can't play a sound effect more than once because they get disposed when playback finishes, and sometimes it hangs whenever a sound is supposed to be played. I know I could just reload the sound to play it again, but I don't want to hit the disk every time a gunshot is fired. Should I just reach into the C++ layer myself and skip the .NET wrappers?
You could try FMOD which is free for non-commercial use.
I would double-check that the sound really loops cleanly - specifically, that the first sample and the last sample are close (or equal), otherwise you'll hear a click. WinAMP could conceivably do some special processing to eliminate the click.
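One quick way to check that is to compare the sample at the loop start with the sample at the loop end; a rough sketch, using NAudio purely as a convenient WAV reader and assuming a mono file:

```csharp
using System;
using NAudio.Wave;

// Quick sanity check of the loop point: compare the first and last samples.
// A large jump between them is what you hear as a click at the loop boundary.
static void CheckLoopPoint(string path)
{
    using (var reader = new AudioFileReader(path))   // samples come back as floats in -1..1
    {
        var buffer = new float[4096];
        float first = 0, last = 0;
        bool gotFirst = false;
        int read;
        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            if (!gotFirst) { first = buffer[0]; gotFirst = true; }
            last = buffer[read - 1];
        }
        Console.WriteLine($"first={first:F4}, last={last:F4}, jump={Math.Abs(first - last):F4}");
    }
}
```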
UPDATE: FMOD comes with a whole bunch of samples in C# that show the right way to do stuff. The example called "3d" shows, among other things, a cleanly looping sound. You should be able to rip out the code that handles the looping without utilising the 3D features.