Please help me with speech recognition using HMMs (hidden Markov models) or MFCCs (mel-frequency cepstral coefficients) in C# or C++.
I want to recognize the words "one", "two", ... up to "ten".
When I say "one", the program should show a MessageBox that says "one".
You should use a toolkit for this purpose, such as Kaldi (open source) or HTK (source available under its own license), or you could use a ready-made API such as the Google Speech API or the Microsoft Speech API (SAPI).
Doing speech recognition with HMMs from scratch is not easy. Also, MFCC is not a machine learning model like an HMM. It is a feature-extraction method that turns audio frames into the observation vectors used for HMM training and decoding.
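To make the feature-extraction point concrete, here is a minimal C++ sketch of the mel-scale step inside MFCC. It uses one common form of the Hz-to-mel formula (2595 * log10(1 + f/700)); the function names, the 0 to 8000 Hz range and the 26-filter count in the usage note are my own illustrative choices, not something taken from HTK or Kaldi. A full MFCC front end would also need framing, windowing, an FFT, the triangular filters themselves, a log and a DCT.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Convert a frequency in Hz to the mel scale (one common variant of the formula).
double hz_to_mel(double hz) {
    return 2595.0 * std::log10(1.0 + hz / 700.0);
}

// Inverse of hz_to_mel.
double mel_to_hz(double mel) {
    return 700.0 * (std::pow(10.0, mel / 2595.0) - 1.0);
}

// Center frequencies (in Hz) of num_filters filters spaced evenly on the mel
// scale between low_hz and high_hz. Hypothetical helper, for illustration only.
std::vector<double> mel_center_frequencies(double low_hz, double high_hz,
                                           int num_filters) {
    assert(num_filters > 0 && low_hz >= 0.0 && high_hz > low_hz);
    const double low_mel = hz_to_mel(low_hz);
    const double high_mel = hz_to_mel(high_hz);
    // num_filters centers need num_filters + 1 equal steps between the edges.
    const double step = (high_mel - low_mel) / (num_filters + 1);

    std::vector<double> centers;
    centers.reserve(num_filters);
    for (int i = 1; i <= num_filters; ++i) {
        centers.push_back(mel_to_hz(low_mel + i * step));
    }
    return centers;
}
```

Calling mel_center_frequencies(0.0, 8000.0, 26) gives centers that are packed closely at low frequencies and spread out at high frequencies, which is the property the mel scale is meant to provide.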
How can I detect the language and convert speech to text? Does the Google API support this? If so, can anyone post example code?
I need help converting speech to text.
Thanks
If you use .NET Framework 3.5 or later, you can add a reference to the System.Speech assembly using Add Reference in Solution Explorer.
Then take a look at these articles:
Speech recognition, speech to text, text to speech, and speech synthesis in C#
C# Speech to Text
I am looking to develop an app in C# for WinRT, and I am wondering what libraries are available for audio playback and more complicated manipulation. I am looking for a free library that can play a wide range of audio formats (for example mp3, wma, wav, ogg) and also analyze them. That's the basic functionality I need. If I could get picky, a library that can also convert audio files between formats would be handy. A Google search turned up the NAudio library, but it did not seem very compatible with WinRT. Thanks for any tips or advice on this.
The current alpha build of NAudio 1.7 (available via NuGet) does contain a Windows RT assembly and the source code includes a simple demo of playback and recording as a Windows Store app. Since it uses Media Foundation, you'll be able to play most of the file types you suggested (although ogg won't be supported out of the box), and you can construct your audio pipeline to access the audio as floating point samples for analysis.
Encoding with the Media Foundation encoders is not currently supported, and the various reader/writer classes still need to be rewritten to use the WinRT asynchronous streams and file I/O APIs instead of the regular .NET ones. Hopefully these features will be added to the library soon.
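Once the pipeline hands you floating-point samples, the analysis itself is plain arithmetic and does not depend on the audio library. As a rough illustration, here is a minimal C++ sketch that measures the RMS and peak level of one buffer. It is not NAudio code; the LevelStats struct and measure_level function are hypothetical names, and it assumes mono samples normalized to the range -1.0 to 1.0.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// RMS and peak level of one buffer of samples.
struct LevelStats {
    double rms;
    double peak;
};

// Measure a buffer of floating-point samples (assumed mono, range -1.0..1.0).
// An empty buffer yields zeros for both values.
LevelStats measure_level(const std::vector<float>& samples) {
    LevelStats stats{0.0, 0.0};
    if (samples.empty()) {
        return stats;
    }
    double sum_squares = 0.0;
    for (float sample : samples) {
        const double value = sample;
        assert(std::isfinite(value));  // NaN or infinity would poison the sums
        sum_squares += value * value;
        stats.peak = std::max(stats.peak, std::fabs(value));
    }
    stats.rms = std::sqrt(sum_squares / static_cast<double>(samples.size()));
    return stats;
}
```

The same loop structure works for other per-buffer measurements, such as zero-crossing counts, as long as the samples arrive as floats.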
I have been trying to find a method for online speech recognition, very similar to Google voice search, which does not require the user to install any plugin, software or Flash. The user just has to plug in the microphone and say something for the text to be recognized.
I thought of the following approach but don't know whether it is correct. I built a DLL which takes an input audio stream and outputs the text recognized from the audio. I referenced this DLL in my ASP.NET project, and I am thinking of uploading an audio file from the user's side to the server, where it would be passed to the 'recognizer' DLL. Is this approach sound, or is there another approach I could follow?
The main thing is that I can't have the user install anything, or depend on anything such as Flash or Silverlight, for this implementation.
If you can require that your users run Chrome 11 or later, you could use Chrome's WebKit speech input (the x-webkit-speech attribute on input elements) to speech-enable your application. Here is a link on how to use webkit for speech. This leverages the audio input capabilities that are available in HTML5. This blog explains how it works; the author reverse engineered it. The browser takes the audio input from the user and sends it to a service for processing, which returns the results as a JSON message. You could build your own service on the server side, as you are suggesting, to imitate what Google is doing. Be aware that building a scalable speech recognition service is no small feat.
I have built a voice recognition system in C# and I’m using the Microsoft Speech Platform 11.0 (Swedish language packs). I use a wav file as an input for the SpeechRecognitionEngine.
The problem is that about 40% of the words are not recognized at all.
I would like to record some commands (a limited number of Swedish words and/or numbers) to a sound file and import them so that the SpeechRecognitionEngine can understand them.
For example:
Record a user saying the word “Katt” (Swedish for cat) and then tell the recognition engine that this recording means “Katt”.
Is this possible or are there better solutions?
Yes, it's possible. Create a lookup file with entries for your words, implement Soundex in C#, and load the lookup as a grammar. The engine will then return whichever word you pronounced, provided that word is in the lookup.
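For reference, this is roughly what a Soundex routine looks like. The sketch below is in C++ rather than C#, but it ports almost line for line. It implements the standard American Soundex rules, which were designed for English names, so treat its usefulness for Swedish as an assumption to test: in particular, this version simply skips any non-ASCII bytes, which means å, ä and ö are ignored. The function names are my own.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Map an uppercase letter to its Soundex digit; '0' means "not coded"
// (vowels, H, W, Y).
char soundex_digit(char upper) {
    switch (upper) {
        case 'B': case 'F': case 'P': case 'V': return '1';
        case 'C': case 'G': case 'J': case 'K':
        case 'Q': case 'S': case 'X': case 'Z': return '2';
        case 'D': case 'T': return '3';
        case 'L': return '4';
        case 'M': case 'N': return '5';
        case 'R': return '6';
        default: return '0';
    }
}

// American Soundex: first letter plus three digits, padded with zeros.
// Returns an empty string if the input contains no ASCII letters.
std::string soundex(const std::string& word) {
    std::string code;
    char previous = '0';
    for (unsigned char raw : word) {
        if (!std::isalpha(raw)) continue;  // skips digits, spaces, non-ASCII bytes
        const char upper = static_cast<char>(std::toupper(raw));
        const char digit = soundex_digit(upper);
        if (code.empty()) {
            code.push_back(upper);         // keep the first letter as-is
            previous = digit;
            continue;
        }
        if (upper == 'H' || upper == 'W') continue;  // H and W do not separate duplicates
        if (digit != '0' && digit != previous) code.push_back(digit);
        previous = digit;                  // a vowel resets the duplicate check
        if (code.size() == 4) break;
    }
    if (code.empty()) return code;
    code.resize(4, '0');
    assert(code.size() == 4);
    return code;
}
```

With this, words that sound alike map to the same key (for example, "Katt" and "Kat" produce the same code), so the lookup can be keyed on the Soundex code rather than the exact spelling.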
I am looking to build a PC box that can play up to 16 audio files (mp3/wav) out to 16 analog devices (think of it as 16 sets of speakers). There will be a one-to-one relationship between audio file and analog output. The hardware solution being proposed to me is to use multiple sound cards, so my question is: is there a library available that will let me play an audio file to a specific sound card / channel?
Yes, the BASS library can do this.
http://www.un4seen.com
C# wrappers are also available, a commercial one:
Bass.NET
and an excellent alternative open source C# wrapper for Bass:
ManagedBass by Mathew Sachin
I found this one while I was looking as well.
http://www.alvas.net/alvas.audio.aspx
FMOD should do what you're looking for. It's also an extremely fast, solid and popular library overall.