I am writing my first speech recognition application, using the System.Speech namespace of .NET Framework 4.0.
I am using the shared speech recognizer, loading a default dictation grammar plus custom grammars I've built.
I also capture the text recognized by the Windows Speech Recognizer (WSR) by implementing a handler for the SpeechRecognized event.
I would like to change the recognized text (e.g. to replace "two" with "2"), but if I do that, the modified output is not written to the active application (e.g. MS Word).
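For illustration, a trimmed sketch of the setup described above (the Replace call stands in for the real substitution logic):

    using System;
    using System.Speech.Recognition;

    class DictationCapture
    {
        static void Main()
        {
            // Shared recognizer: the same engine WSR uses on the desktop.
            var recognizer = new SpeechRecognizer();
            recognizer.LoadGrammar(new DictationGrammar());

            recognizer.SpeechRecognized += (sender, e) =>
            {
                // The recognized text can be read and rewritten here...
                string text = e.Result.Text.Replace("two", "2");
                Console.WriteLine(text);
                // ...but the rewritten text never reaches the foreground
                // application (e.g. MS Word); WSR inserts its own output.
            };

            Console.ReadLine(); // keep the process alive while dictating
        }
    }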
I know I can do something SIMILAR by using the SendKeys method, but I think it's not a good idea because the quality of the output is lower. For example, if you use WSR as a standard user, you will see that after a "." or a new line, the following sentence starts with an uppercase character. There are tons of things you must take into account if you want to write your own output parser, so I would like to use the one WSR uses when you don't handle the SpeechRecognized event. But... HOW??
(I wouldn't mind using SAPI if necessary.)
Thanks!!
The short answer is that you can't. WSR doesn't have a hook that allows 3rd parties to connect into its dictation pipeline.
I'm not sure if this is possible, but anyway:
I use using System.Speech.Recognition; in a WinForms C# app.
I'm wondering if it is possible not only to recognize speech, but also to recognize the voice, i.e. to somehow tell different voices apart,
so as to get something close to reading separate content from each voice, for example treating two users speaking simultaneously or separately as two different sources.
Or, at least, is there some way to handle background loudness? The AudioLevelUpdated event lets me see the input volume, but maybe there is also a specific way to separate a loud voice from extra noise or voices in the background.
System.Speech.Recognition will not help you with voice (speaker) recognition.
System.Speech.Recognition is intended for speech-to-text. Adding grammars to it improves its accuracy, and you can train the Windows desktop for better conversion; see Speech Recognition in Control Panel.
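For example, a small fixed command grammar is recognized far more reliably than open dictation. A minimal sketch (the phrases are made up):

    using System;
    using System.Speech.Recognition;

    class CommandDemo
    {
        static void Main()
        {
            var engine = new SpeechRecognitionEngine();
            engine.SetInputToDefaultAudioDevice();

            // Constrain the engine to a handful of known phrases.
            var commands = new Choices("lights on", "lights off", "open door");
            engine.LoadGrammar(new Grammar(new GrammarBuilder(commands)));

            engine.SpeechRecognized += (sender, e) =>
                Console.WriteLine("Heard: " + e.Result.Text);

            engine.RecognizeAsync(RecognizeMode.Multiple);
            Console.ReadLine();
        }
    }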
There are a couple of 3rd-party libraries available for voice recognition.
For noise removal, you can refer to the Sound visualizer in C#.
You can find an interesting discussion on the MSDN forum.
I think you should take a look at CRIS, which is part of Microsoft Cognitive Services, at least for your question about noise.
CRIS is the Custom Speech Service; its basic use is to improve the quality of speech-to-text through custom acoustic models (e.g. for background noise) and by learning vocabulary from samples.
You can import:
Acoustic Datasets
Language Datasets
Pronunciation Datasets
For example, among the acoustic models you have:
Microsoft Conversational Model for recognizing speech spoken in a conversational style (i.e. speech directed at another person).
Microsoft Search and Dictation Model for speech directed to an application, such as commands, search queries or dictation.
There is also a Speaker Recognition API, available in preview.
In my speech engine I activate/deactivate multiple grammars.
At one particular step, I'd like to run a grammar ONLY to capture the audio of the next sentence spoken, according to the engine's properties.
But to start/stop matching something, I assume the engine needs "words", so I don't know how to do it.
(Underlying explanation: my application converts all garbage audio to text using the Google Speech API, because dictation is too poor and not available on Kinect.)
Well, actually, no: the SR engine only needs to know that the incoming audio is "speech-like" (usually determined by the spectral characteristics of the audio). In particular, you could use the AudioPosition property and the SpeechDetected and SpeechRecognitionRejected events to send all rejected audio to the Google Speech API.
So your workflow would look like this (see the sketch after the list):
Ask the user a question.
Enable the appropriate grammars.
Wait for a recognition or a rejection.
If recognized, process accordingly.
If rejected, collect the retained audio and send it to the Google Speech API.
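A rough sketch of those steps, assuming the System.Speech flavor of the API (Microsoft.Speech mirrors it closely); SendToGoogleSpeechApi is a hypothetical placeholder for your existing upload code:

    using System;
    using System.IO;
    using System.Speech.Recognition;

    class RejectedAudioDemo
    {
        static void Main()
        {
            var engine = new SpeechRecognitionEngine();
            engine.SetInputToDefaultAudioDevice();

            // Enable the grammar for the question you just asked.
            engine.LoadGrammar(new Grammar(new GrammarBuilder(new Choices("yes", "no"))));

            // A match: process it normally.
            engine.SpeechRecognized += (sender, e) =>
                Console.WriteLine("Recognized: " + e.Result.Text);

            // No match: the retained audio is available on the result.
            engine.SpeechRecognitionRejected += (sender, e) =>
            {
                if (e.Result.Audio == null) return; // no audio retained
                using (var wav = new FileStream("rejected.wav", FileMode.Create))
                {
                    e.Result.Audio.WriteToWaveStream(wav);
                }
                // SendToGoogleSpeechApi("rejected.wav"); // hypothetical upload helper
            };

            engine.RecognizeAsync(RecognizeMode.Multiple);
            Console.ReadLine();
        }
    }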
I've made an app that uses the SpeechRecognizer class to setup a simple grammar and recognize simple words.
When I run it on Win7, I notice two things.
1) The first time I start the app, the speech recognition bar (thingy) comes up, but the UI of my app is not shown (it is running, as I can see in the Task Manager).
When I start the app a 2nd time (after killing the first instance), it displays normally (with the Windows Speech Recognition toolbar already running).
2) When I speak one of the words my app recognizes a 2nd time, it does not trigger an event; instead, it selects the text in my app where I print the history of recognized words in a listbox.
Note: when I remove the history listbox from the main screen, it works as expected. Apparently Win7 tries to find the word in my UI first, and only when it cannot find it does it trigger my programmatic event...??
Both problems seem very weird to me.
More info on the app: it's a VS2008/.NET 3.0 WPF app written in C#. The application allows a user to edit setting groups (patches) for sending MIDI commands. Each patch is tagged with a phrase; when that phrase is spoken (recognized by the app), all configured MIDI commands are sent to the output(s). The history of patches recalled by the user is printed in a 'history' list on the app's main screen.
I hope someone can help me with this. Any suggestions are most welcome.
Thanx,
Marc Jacobi
I think you are using the shared speech recognizer (SpeechRecognizer). When you instantiate SpeechRecognizer, you get a recognizer that can be shared by other applications; it is typically used for building applications to control Windows and the applications running on the desktop.
It sounds like you want to use your own private recognition engine (SpeechRecognitionEngine). So instantiate a SpeechRecognitionEngine instead.
See http://msdn.microsoft.com/en-us/library/system.speech.recognition.speechrecognizer(v=vs.90).aspx
What is the difference between System.Speech.Recognition and Microsoft.Speech.Recognition? and Disable built-in speech recognition commands? may also have some helpful info.
I got it working, thanx!
The main differences between using the SpeechRecognizer and the SpeechRecognitionEngine are:
Construct the SpeechRecognitionEngine using a RecognizerInfo from InstalledRecognizers().
Call one of the SetInputToXxxx methods.
Call RecognizeAsync(RecognizeMode.Multiple) to mimic the SpeechRecognizer's SpeechRecognized events.
Call RecognizeAsyncCancel or RecognizeAsyncStop to quit.
Hope it helps.
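Putting those steps together, a minimal sketch (assuming a dictation grammar and the default microphone):

    using System;
    using System.Speech.Recognition;

    class InProcDemo
    {
        static void Main()
        {
            // 1. Construct the engine from an installed recognizer.
            RecognizerInfo info = SpeechRecognitionEngine.InstalledRecognizers()[0];
            using (var engine = new SpeechRecognitionEngine(info))
            {
                // 2. Set the audio input.
                engine.SetInputToDefaultAudioDevice();

                engine.LoadGrammar(new DictationGrammar());
                engine.SpeechRecognized += (sender, e) =>
                    Console.WriteLine(e.Result.Text);

                // 3. Recognize continuously, like the shared recognizer does.
                engine.RecognizeAsync(RecognizeMode.Multiple);
                Console.ReadLine();

                // 4. Stop recognizing.
                engine.RecognizeAsyncCancel();
            }
        }
    }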
Microsoft Speech Recognition comes with a Speech Reference Card. It consists of some pre-determined words that are recognized.
I want to know if it's possible to disable it. Is it?
EDIT:
I want to remove all pre-defined commands.
These ones:
http://windows.microsoft.com/en-us/windows-vista/Common-commands-in-Speech-Recognition
EDIT2:
I'm using SpeechLib!
You probably want an in-process recognizer, instead of a shared recognizer.
Since you're using C#, you need to use the SpeechRecognitionEngine class if you're using System.Speech.Recognition.
In particular, you also need to set the audio input of the recognizer using SetInputToDefaultAudioDevice, so that the in-proc recognizer knows where to get the audio from.
Trying to change the code to use what you said, I discovered what I needed!
With this command:
recGrammar.SetGrammarState(SPGRAMMARSTATE.SPGS_EXCLUSIVE);
everything worked!
You can find more info here!
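For anyone else landing here, a rough SpeechLib (SAPI COM interop) sketch of where that call fits. This is an assumption-laden sketch: member names vary between interop versions, and setting the State property is the automation-interface equivalent of the native SetGrammarState call above:

    using System;
    using SpeechLib; // SAPI COM interop (e.g. Interop.SpeechLib.dll)

    class ExclusiveGrammarDemo
    {
        static void Main()
        {
            // Shared recognition context, as used by the desktop recognizer.
            var context = new SpSharedRecoContext();
            ISpeechRecoGrammar recGrammar = context.CreateGrammar(0);
            recGrammar.DictationLoad("", SpeechLoadOption.SLOStatic);
            recGrammar.DictationSetState(SpeechRuleState.SGDSActive);

            // Exclusive state: only this grammar is matched, which
            // suppresses the built-in shell commands while it is active.
            recGrammar.State = SpeechGrammarState.SGSExclusive;

            Console.ReadLine();
        }
    }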
I am making a Smart House Control System right now, and I have a little problem.
I was thinking of using Cosmos as a base system and adding the needed namespace libraries to it, but as the usual System.Speech.Recognition namespace depends too much on the Windows Speech API, I have to forget about using it.
So my question is: is there any (free, if possible) voice recognition and/or speech synthesis library for C# that has the following:
support for multiple languages
extracting text content from speech sample
synthesizing speech with selectable (or user-written) speech pattern (voice)
A general-purpose, non-Windows-dependent library would be best, and of course free, too.
Voxeo offers developer accounts which you could use to develop a speech-powered home automation system. I've interfaced it to my own home automation system for a small subset of the commands my home understands, and it works great. You'll need to learn some VoiceXML to use it.
SAPI works OK for voice synthesis; I use SAPI in my system for spoken prompts in the house, like a weather forecast that comes over the speakers in the morning when you walk into the bathroom. If Cosmos doesn't allow you to include all the DLLs you need, maybe you could create a separate service using SAPI and then use WCF (or something else) to communicate between them?
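For the synthesis half, the managed System.Speech wrapper over SAPI is only a few lines. A minimal sketch (the forecast text is made up):

    using System.Speech.Synthesis;

    class Announcer
    {
        static void Main()
        {
            using (var synth = new SpeechSynthesizer())
            {
                synth.SetOutputToDefaultAudioDevice();
                synth.Speak("Good morning. Today: partly cloudy, high of 18 degrees.");
            }
        }
    }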
For the related problem of understanding natural language in typed form, I've developed a C# NLP engine which I hope to make available for non-commercial use at some point in the future.
Extracting text from speech without specifying a grammar up-front is a very hard problem and is going to be error-prone. Even if you could solve it, you'd still have the problem of trying to understand what was said using NLP. Constructing a grammar that guides the recognizer toward the kinds of sentences you want to recognize (as VoiceXML does) is likely to achieve much higher accuracy.
Check out this project: http://cmusphinx.sourceforge.net/
It's an open-source speech recognition project. It is trainable with any language you want; plus, since it's open source, you can modify it to suit your needs or extend it.