In my speech engine I activate / deactivate multiple grammars.
In one particular step I'd like to run a grammar whose only purpose is to capture the audio of the next spoken sentence, according to the engine's properties.
But to start/stop matching something, I assume the engine needs "words", so I don't know how to do this.
(Underlying explanation: my application converts all unrecognized audio to text using the Google speech API, because dictation quality is too poor and dictation is not available on the Kinect recognizer.)
Well, actually, no: the SR engine only needs to know that the incoming audio is "speech-like" (usually determined by the spectral characteristics of the audio). In particular, you can use the AudioPosition property together with the SpeechDetected and SpeechRecognitionRejected events to send all rejected audio to the Google speech API.
So your workflow would look like this:
Ask the user a question.
Enable appropriate grammars.
Wait for recognition or recognition rejected.
If recognized, process accordingly.
If recognition is rejected, collect the retained audio and send it to the Google speech API (a sketch follows this list).
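Here is a minimal sketch of that last step with System.Speech, assuming an in-process recognizer and a placeholder yes/no command grammar; SendToExternalRecognizer stands in for whatever client you use for the Google speech API:

    using System;
    using System.IO;
    using System.Speech.Recognition;

    class RejectedAudioCapture
    {
        static void Main()
        {
            using (var recognizer = new SpeechRecognitionEngine())
            {
                recognizer.SetInputToDefaultAudioDevice();

                // Enable whatever command grammars the current step needs.
                recognizer.LoadGrammar(new Grammar(new GrammarBuilder(new Choices("yes", "no"))));

                recognizer.SpeechRecognized += (s, e) =>
                    Console.WriteLine("Recognized: " + e.Result.Text);

                recognizer.SpeechRecognitionRejected += (s, e) =>
                {
                    // The retained audio of the rejected utterance.
                    if (e.Result != null && e.Result.Audio != null)
                    {
                        using (var wav = new FileStream("rejected.wav", FileMode.Create))
                            e.Result.Audio.WriteToWaveStream(wav);
                        // SendToExternalRecognizer("rejected.wav"); // hypothetical helper
                    }
                };

                recognizer.RecognizeAsync(RecognizeMode.Multiple);
                Console.WriteLine("Listening; press Enter to quit.");
                Console.ReadLine();
            }
        }
    }

On the Kinect you would use SetInputToAudioStream with the sensor's audio source instead of the default device.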
How would one use the Microsoft Speech API from C# in Unity in a way that lets one lip-sync a character, i.e. detect when the speech has ended (or have the result available as a raw audio file, or have the audio output be monitorable)? For instance, this library lets one use the Microsoft Speech API in Unity, but the speak call has no apparent return object, and polling WindowsVoice.GetStatusMessage() on an interval crashes Unity for me. Thanks!
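In a plain .NET process, System.Speech.Synthesis exposes all three things the question asks for: SetOutputToWaveFile for the raw audio, SpeakCompleted to detect the end, and VisemeReached for lip-sync timings. A minimal sketch (whether that Unity wrapper surfaces any of this is a separate question):

    using System;
    using System.Speech.Synthesis;

    class LipSyncProbe
    {
        static void Main()
        {
            using (var synth = new SpeechSynthesizer())
            {
                // Render to a wave file instead of the speakers, so the raw
                // audio is available for the game engine to play back.
                synth.SetOutputToWaveFile("line.wav");

                // Viseme events give mouth-shape indices and timings for lip-sync.
                synth.VisemeReached += (s, e) =>
                    Console.WriteLine("Viseme " + e.Viseme + " at " + e.AudioPosition);

                // Note: with file output this fires when rendering finishes,
                // not when playback finishes.
                synth.SpeakCompleted += (s, e) => Console.WriteLine("Speech ended.");

                synth.SpeakAsync("Hello there, traveler.");
                Console.ReadLine(); // keep the process alive for the async events
            }
        }
    }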
I'm not sure if it is possible, but anyway:
I use using System.Speech.Recognition; in a WinForms C# app.
I'm wondering if it is possible not only to recognize speech, but also to recognize the voice itself, i.e. somehow tell different voices apart,
so as to get something close to reading separate content from each voice, for example treating two users speaking simultaneously or separately as two different speakers.
Or at least some method to deal with background loudness: the AudioLevelUpdated event lets me see the input volume, but maybe there is also some specific way to separate a loud voice from extra noise or voices in the background.
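On the volume part specifically, AudioLevelUpdated does report the input level (0-100). Here is a minimal sketch that gates recognition results on a crude volume threshold; note that this cannot separate overlapping speakers, it only ignores quiet input:

    using System;
    using System.Speech.Recognition;

    class LevelMonitor
    {
        static void Main()
        {
            using (var recognizer = new SpeechRecognitionEngine())
            {
                recognizer.SetInputToDefaultAudioDevice();
                recognizer.LoadGrammar(new DictationGrammar());

                // Track the most recent input volume (0..100).
                int level = 0;
                recognizer.AudioLevelUpdated += (s, e) => level = e.AudioLevel;

                recognizer.SpeechRecognized += (s, e) =>
                {
                    if (level > 30) // arbitrary threshold, tune per microphone
                        Console.WriteLine(e.Result.Text);
                };

                recognizer.RecognizeAsync(RecognizeMode.Multiple);
                Console.ReadLine();
            }
        }
    }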
System.Speech.Recognition will not help you with voice (speaker) recognition.
System.Speech.Recognition is intended for speech-to-text. Adding grammars to it improves its accuracy, and you can train the Windows desktop recognizer for better conversion (see Speech Recognition in the Control Panel).
There are a couple of 3rd-party libraries available for voice recognition.
For removal of noise, you can refer to the Sound Visualizer in C#.
You can find an interesting discussion on the MSDN forum.
I think you should take a look at CRIS, which is part of Microsoft Cognitive Services, at least for your question about noise.
CRIS is a Custom Speech Service whose basic use is to improve the quality of speech-to-text using custom acoustic models (e.g. for background noise) and to learn vocabulary from samples.
You can import:
Acoustic Datasets
Language Datasets
Pronunciation Datasets
For example, among the acoustic models you have:
Microsoft Conversational Model, for recognizing speech spoken in a conversational style (i.e. speech directed at another person).
Microsoft Search and Dictation Model, for speech directed at an application, such as commands, search queries, or dictation.
There is also a Speaker Recognition API, available in preview.
I've checked out the speech framework that Microsoft offers in C# and it appears to be VERY bad compared to other voice recognition solutions like Siri on the iPhone 4 or Google's search option on my Galaxy Tab. I guess these work better because they have a HUGE amount of voice samples and the processing is done on high-performance servers.
So I've been looking for their API, but it doesn't seem to support desktop applications at all. The only thing I could find was this post: Use google "Speak Now" in C#
----> I misunderstood what the person answering meant, so this part is invalid!
Where the answer simply says: "Once you run the recognition api on the sample of text you want to recognize, you can simply call google.com with the "q" parameter to do a query search"
I don't understand what he means, but I do know that he claims you can get a desktop API to use Google as the processor. That would be great!
So my question is: how can I use google.com's "Speak Now" voice recognition feature that I see on my tablet in a standalone desktop application, preferably written in C#?
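For what it's worth, the trick people used at the time was to POST audio to the unofficial endpoint behind Chrome's speech input. A hedged sketch, assuming the unofficial v2 recognize endpoint and a Chromium-project API key (YOUR_KEY is a placeholder; this is unsupported and may break or be rate-limited at any time):

    using System;
    using System.IO;
    using System.Net.Http;
    using System.Threading.Tasks;

    class GoogleSpeechClient
    {
        // Unofficial endpoint used by Chromium; not a supported public API.
        const string Endpoint =
            "https://www.google.com/speech-api/v2/recognize?output=json&lang=en-US&key=YOUR_KEY";

        static async Task Main()
        {
            using (var http = new HttpClient())
            {
                var audio = new ByteArrayContent(File.ReadAllBytes("utterance.flac"));
                // The endpoint expects FLAC (or raw PCM) with the sample rate declared.
                audio.Headers.TryAddWithoutValidation("Content-Type", "audio/x-flac; rate=16000");

                var response = await http.PostAsync(Endpoint, audio);
                Console.WriteLine(await response.Content.ReadAsStringAsync());
            }
        }
    }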
I know the first comment will be that I'm duplicating previous threads, but the code I found (from MSDN) uses Windows' speech recognition... I'm doing my graduation project, and speech recognition is part of it. I can't just include that code; I have to try to do it from scratch. I'm doing some research about it, and I would be really thankful if someone who has already done this could give me a link to a paper or some code I can benefit from!
Thanks in advance!
Microsoft Server Speech Platform 10.1 (SR and TTS in 26 languages)
http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=24003
The basic operations that speech recognition applications perform (sketched in code after this list):
Starting the speech recognizer.
Creating a recognition grammar.
Loading the grammar into a speech recognizer.
Registering for speech recognition event notification.
Creating a handler for the speech recognition event.
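A minimal sketch of those five steps against System.Speech (the Microsoft.Speech namespaces of the Server Speech Platform mirror this API almost exactly); the start/stop phrases are placeholders:

    using System;
    using System.Speech.Recognition; // Microsoft.Speech.Recognition is near-identical

    class FiveSteps
    {
        static void Main()
        {
            // 1. Start the speech recognizer.
            using (var recognizer = new SpeechRecognitionEngine())
            {
                recognizer.SetInputToDefaultAudioDevice();

                // 2. Create a recognition grammar.
                var grammar = new Grammar(new GrammarBuilder(new Choices("start", "stop")));

                // 3. Load the grammar into the recognizer.
                recognizer.LoadGrammar(grammar);

                // 4. Register for the recognition event, and
                // 5. handle it.
                recognizer.SpeechRecognized += (s, e) =>
                    Console.WriteLine("Heard: " + e.Result.Text);

                recognizer.RecognizeAsync(RecognizeMode.Multiple);
                Console.WriteLine("Say 'start' or 'stop'; press Enter to quit.");
                Console.ReadLine();
            }
        }
    }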
Language Packs
http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=3971
Runtime Download
http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=24974
These libraries could at least give you a starting point for understanding the makeup of the interfaces, and some core/base code to copy, steal, or emulate ;)
There's also this paper:
http://www.cs.nyu.edu/~mohri/pub/hbka.pdf
Best of Luck!
Writing the code that implements the basic recognition algorithm (Hidden Markov Model based recognizers are the norm these days) is only part of your challenge. Virtually every speech recognition system is trained on actual speech data, so you also have to identify a corpus (collection of audio files and transcriptions) to train your mathematical models.
Have a look at the open source Sphinx speech recognizer (and related tools) from CMU if you are still interested in doing it all by yourself.
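To make the HMM part concrete, here is a toy Viterbi decoder over a discrete two-state HMM. Real recognizers decode phoneme states over continuous acoustic feature vectors, and the probabilities come from training on a corpus, but the core dynamic program is the same:

    using System;

    class ViterbiToy
    {
        // Toy model: 2 hidden states, 2 observation symbols.
        static readonly double[] start = { 0.6, 0.4 };
        static readonly double[,] trans = { { 0.7, 0.3 }, { 0.4, 0.6 } };
        static readonly double[,] emit = { { 0.9, 0.1 }, { 0.2, 0.8 } };

        static int[] Viterbi(int[] obs)
        {
            int n = obs.Length, states = start.Length;
            var prob = new double[n, states]; // best probability ending in state s at time t
            var back = new int[n, states];    // predecessor state on that best path

            for (int s = 0; s < states; s++)
                prob[0, s] = start[s] * emit[s, obs[0]];

            for (int t = 1; t < n; t++)
                for (int s = 0; s < states; s++)
                    for (int p = 0; p < states; p++)
                    {
                        double cand = prob[t - 1, p] * trans[p, s] * emit[s, obs[t]];
                        if (cand > prob[t, s]) { prob[t, s] = cand; back[t, s] = p; }
                    }

            // Backtrack the most likely state sequence.
            var path = new int[n];
            for (int s = 1; s < states; s++)
                if (prob[n - 1, s] > prob[n - 1, path[n - 1]]) path[n - 1] = s;
            for (int t = n - 1; t > 0; t--)
                path[t - 1] = back[t, path[t]];
            return path;
        }

        static void Main()
        {
            Console.WriteLine(string.Join(" ", Viterbi(new[] { 0, 1, 1, 0 })));
        }
    }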
I am writing my first speech recognition application using the System.Speech namespace of .NET Framework 4.0.
I am using the shared speech recognizer, loading a default dictation grammar plus custom grammars I've built.
I also capture the text recognized by the Windows Speech Recognizer (WSR) by implementing a handler for the "SpeechRecognized" event.
I would like to change the recognized text (e.g. to replace "two" with "2"), but if I do that, the output is no longer written into the current app (e.g. MS Word).
I know I can do something SIMILAR by using the SendKeys method, but I don't think it's a good idea because the quality of the output is lower. For example, if you use WSR as a standard user, you will see that after a "." or a new line the following sentence starts with an uppercase character. There are tons of things you must take into account if you want to write your own output parser, so I would like to reuse the one WSR uses when you don't handle the SpeechRecognized event. But... HOW?
(I wouldn't mind using SAPI if necessary.)
Thanks!!
The short answer is that you can't: WSR doesn't expose a hook that allows 3rd parties to plug into its dictation pipeline.
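For completeness, here is roughly what the SendKeys workaround mentioned in the question looks like, using an in-process recognizer instead of the shared one. It bypasses WSR's own output entirely, which is exactly the quality trade-off described above: no automatic capitalization, SendKeys metacharacters such as { } + ^ % need escaping, and a naive Replace also hits substrings (e.g. "network" contains "two"), so a real rewriter should match on word boundaries:

    using System;
    using System.Speech.Recognition;
    using System.Windows.Forms; // for SendKeys

    class DictationRewriter
    {
        static void Main()
        {
            using (var recognizer = new SpeechRecognitionEngine())
            {
                recognizer.SetInputToDefaultAudioDevice();
                recognizer.LoadGrammar(new DictationGrammar());

                recognizer.SpeechRecognized += (s, e) =>
                {
                    // Rewrite the text, then type it into the focused window.
                    string text = e.Result.Text.Replace("two", "2");
                    SendKeys.SendWait(text + " ");
                };

                recognizer.RecognizeAsync(RecognizeMode.Multiple);
                Console.ReadLine();
            }
        }
    }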