I've checked out the speech framework that Microsoft offers in C# and it appears to be VERY bad compared to other voice recognition solutions like Siri on iPhone4 or Google's search option on my Galaxy Tab. I guess these work better because they have a HUGE amount of voice samples and the audio is processed on high performance servers.
So I've been looking for their API but it doesn't seem to support desktop applications at all. The only thing I could find was this post: Use google "Speak Now" in C#
----> I misunderstood what the person answering meant so this part is invalid!
Where the answer simply says: "Once you run the recognition api on the
sample of text you want to recognize, you can simply call google.com
with the "q" parameter to do a query search"
I don't understand what he means but I do know that he claims you can
get a desktop API to use Google as processor. That would be great!
So my question is: How can I use google.com's "Speak now" Voice Recognition feature I see on my tablet, in a standalone desktop application preferably written in C#?
Related
I'm not sure if it is possible but anyway,
I use using System.Speech.Recognition; in winform C# app.
I'm wondering if it is possible not only to recognize speech, but also to recognize the voice, that is, to tell different voices apart,
so that I could get the content from each voice separately, for example from two users speaking simultaneously or one after the other, treated as two different speakers.
Or at least maybe there is some method to control background loudness. For example, the AudioLevelUpdated event allows me to see the input volume, but maybe there is also some specific way to separate a loud voice from extra noise or voices in the background.
System.Speech.Recognition will not help you in voice recognition.
System.Speech.Recognition is intended for speech to text. Adding a grammar to it improves its accuracy. You can train the Windows desktop for better conversion. See Speech Recognition in Control Panel.
There are a couple of 3rd party libraries available for voice recognition.
For removal of noise, you can refer to Sound visualizer in C#.
You can find an interesting discussion at msdn forum.
I think you should take a look at CRIS, which is part of Microsoft Cognitive Services, at least for your question about noise.
CRIS is a Custom Speech Service and its basic use is to improve the quality of speech-to-text using custom acoustic models (for example, for background noise) and learning vocabulary from samples.
You can import:
Acoustic Datasets
Language Datasets
Pronunciation Datasets
For example in acoustic models you have:
Microsoft Conversational Model for recognizing speech spoken in a conversational style (i.e. speech directed at another person).
Microsoft Search and Dictation Model for speech directed to an application, such as commands, search queries or dictation.
There is also a Speaker Recognition API available in preview
Since this topic is a bit outdated I would like to re-discuss it here.
After searching the web, I came across the following link:
http://archive.msdn.microsoft.com/nesl which runs only out of browser because Silverlight (in browser) can't access certain COM libraries that are related to windows.
I wish (for obvious performance purposes) to perform the speech recognition through Silverlight (on the client machine) and then send the result (text) to the server via a postback to perform the corresponding action.
I already achieved a way to get the voice from the microphone and store it in Silverlight in a byte array. Is there a way to convert the speech byte array to text?
HTML5 Google service is not an acceptable approach since IE is required.
My final goal is to implement a speech recognition in ASP.NET Web Application.
Any suggestion is appreciated.
You can't do it in Silverlight. You'll need to send the audio somewhere. You can call some third-party service (I'm sure there are plenty, and it shouldn't matter that you're using IE) or your own ASP.NET server (which can call System.Speech or any other free or commercial system). But before you do that, you should compress the audio. There aren't a lot of options in Silverlight. I recommend NSpeex, or at least convert it to 16 kHz PCM (either linear or a-law).
Here's a list of Speech SDKs (many of which have a cloud service component): http://www.toolsjournal.com/mobile-articles/item/918-top-10-sdks-to-voice-enable-mobile-apps-quickly
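As a rough illustration of the a-law option mentioned above, here is a C# sketch that encodes 16-bit linear PCM samples into 8-bit G.711 A-law, which halves the payload before it is sent to the server. The class and method names are made up for this example, it does not resample the audio to 16 kHz, and it is not a replacement for a real codec such as NSpeex.

```csharp
using System;

// Hypothetical helper, not part of Silverlight or NSpeex.
public static class ALawEncoder
{
    // Upper bounds of the eight A-law segments for a 13-bit magnitude.
    private static readonly int[] SegmentEnds =
        { 0x1F, 0x3F, 0x7F, 0xFF, 0x1FF, 0x3FF, 0x7FF, 0xFFF };

    // Encodes one 16-bit linear PCM sample as one G.711 A-law byte.
    public static byte Encode(short sample)
    {
        int value = sample >> 3;          // keep the top 13 bits
        int mask;
        if (value >= 0)
        {
            mask = 0xD5;                  // sign bit set, even bits inverted
        }
        else
        {
            mask = 0x55;                  // sign bit clear, even bits inverted
            value = -value - 1;
        }

        // Find the segment whose upper bound covers the magnitude.
        int segment = 0;
        while (segment < SegmentEnds.Length && value > SegmentEnds[segment])
        {
            segment++;
        }

        if (segment >= SegmentEnds.Length)
        {
            return (byte)(0x7F ^ mask);   // out of range: clip to the maximum
        }

        // Segments 0 and 1 share the same step size, hence the fixed shift of 1.
        int shift = segment < 2 ? 1 : segment;
        int encoded = (segment << 4) | ((value >> shift) & 0x0F);
        return (byte)(encoded ^ mask);
    }

    // Encodes a whole buffer, one output byte per input sample.
    public static byte[] EncodeAll(short[] samples)
    {
        var result = new byte[samples.Length];
        for (int i = 0; i < samples.Length; i++)
        {
            result[i] = Encode(samples[i]);
        }
        return result;
    }
}
```

If the microphone data is already a byte array of little-endian 16-bit samples, it would first need to be turned into a short[] (for example with BitConverter.ToInt16) before calling EncodeAll.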
To make Trusted In-browser Silverlight application:
http://msdn.microsoft.com/en-us/library/gg192793(v=vs.95).aspx
http://www.pitorque.de/MisterGoodcat/post/Silverlight-5-Tidbits-Trusted-applications.aspx
And for security background:
http://msdn.microsoft.com/en-us/library/ee721083%28v=vs.95%29.aspx
Note that NESL doesn't support DictionaryGrammar. Grammar needs to be pre-defined:
http://archive.msdn.microsoft.com/nesl/Thread/View.aspx?ThreadId=4905
I'm looking for speech (wave files) to text on Windows Server 2008 (or Windows Server 2008 R2) using C# (or at least an API that I can call from C#) that supports multiple languages.
As far as I know I can't use .NET speech (SAPI) because it works only on Vista \ Windows 7.
I can't use Microsoft Speech Platform because it does not support all the languages I need (as far as I checked there is no Hebrew (he) support).
It can't be a web based service (I need it on my server).
I'm looking for something that can be used in commercial software and I'm also willing to pay for a third party product.
Can you please help me with that?
Thanks
You have text-to-speech listed as a tag but the description sounds like speech recognition. If I understand correctly, what you want to do is take a wav file with speech in it and convert it to text. Actually this is not even normal speech recognition, because most speech reco systems work on targeted speech input and use grammars to restrict the search space that the speech engine has to cover. I think what you are describing is automatic transcription, akin to what Google Voice does to your voice mail messages when it sends you a text version in an email. This is a much more difficult problem and the state-of-the-art is not that advanced right now. Most of these solutions are offered as services and the best ones still use human transcribers when the speech recognition confidence is low. I think the leader in this area is Nuance. I would check with them for a solution. I know they recently bought out a company that provides this automated transcription service and perhaps they now offer it as a product. They are also a leader in transcribing doctors' orders/findings automatically to text with their product Dragon NaturallySpeaking.
I am making a Smart House Control System right now, and I have a little problem.
I was thinking of using Cosmos as a base system, and adding the needed namespace libraries to it, but as the usual System.Speech.Recognition namespace depends too much on the Windows Speech API, I have to forget about using it.
So my question is, is there any (free if possible) voice recognition and/or speech synthesizer library for C# that has the following:
support for multi-language speaking
extracting text content from speech sample
synthesizing speech with selectable (or user-written) speech pattern (voice)
A general usage, non-windows dependent library would be the best, and of course, if it was free too.
Voxeo offers developer accounts which you could use to develop a speech powered home automation system. I've interfaced it to my own home automation system for a small subset of the commands my home understands and it works great. You'll need to learn some VoiceXML to use it.
SAPI works OK for voice synthesis; I use SAPI in my system for spoken prompts in the house like a weather forecast that comes over the speakers in the morning when you walk into the bathroom. If Cosmos doesn't allow you to include all the DLLs you need, maybe you could create a separate service using SAPI and then use WCF (or something else) to communicate between them?
For the related problem of understanding natural language in a typed form I've developed a C# NLP Engine which I hope to be able to make available for non-commercial use at some point in the future.
Extracting text from speech without specifying any grammar up-front is a very hard problem and is going to be error prone. Even if you could solve that, you'd still have the problem of trying to understand what they said using NLP. Constructing a grammar that guides the recognizer to the kinds of sentences you want to recognize (like VoiceXML does) is likely to achieve much higher accuracy.
Check out this project: http://cmusphinx.sourceforge.net/
It's an open source speech recognition project. It is trainable with any language you want, plus since it's open source you can modify it to suit your needs or expand it.
I am making a robot that responds to a few voice commands. I am using Windows XP and C# to achieve that. My only problem is that I don't know how to use speech recognition with C#.
I've been searching Google and MSDN, but I have not found any beginner friendly tutorial yet.
Any suggestions?
Also, I know from my experience with Windows speech recognition in Microsoft Word that I need to train the computer before starting the speech recognition application. This may cause a big problem for me because I may need to present my robot on different computers, or different people may be the presenters.
So is there any way to make a predefined list of words that any user can say to the application without having to train it first?
Thanks for help!
Yes, you'll have to train anything that uses pattern recognition to respond to things. In Philadelphia, they pronounce "water" as "wudder". How could an algorithm figure that out? A predefined list would require you to have a working knowledge of every accent in the target sales countries.
SAPI 5.4 in Windows 7 does a very good job of recognizing limited command & control grammars without training.
If you keep your command set (grammar) small (say, no more than 10-15 commands), you should be able to get good results.
Dictation or a large command set requires training; there's just too much uncertainty.
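To make the small-grammar approach concrete, here is a minimal C# sketch using System.Speech.Recognition. It assumes a Windows machine with an en-US recognizer installed and a reference to System.Speech.dll; the phrase list, the class name and the 0.6 confidence cut-off are placeholders chosen for illustration, not recommended values.

```csharp
using System;
using System.Globalization;
using System.Speech.Recognition;   // requires a reference to System.Speech.dll (Windows only)

// Hypothetical example class for a small command & control grammar.
public static class RobotCommands
{
    // Placeholder vocabulary; keep it small, as suggested above.
    public static readonly string[] Phrases =
        { "forward", "backward", "turn left", "turn right", "stop" };

    // Builds a grammar that accepts exactly one of the phrases.
    public static Grammar BuildGrammar(CultureInfo culture)
    {
        var choices = new Choices(Phrases);
        var builder = new GrammarBuilder(choices) { Culture = culture };
        return new Grammar(builder);
    }

    // Listens on the default microphone until Enter is pressed.
    public static void Listen()
    {
        var culture = new CultureInfo("en-US");   // assumed installed recognizer language
        using (var recognizer = new SpeechRecognitionEngine(culture))
        {
            recognizer.LoadGrammar(BuildGrammar(culture));
            recognizer.SetInputToDefaultAudioDevice();

            recognizer.SpeechRecognized += (sender, e) =>
            {
                // Ignore low-confidence matches; 0.6 is an arbitrary threshold.
                if (e.Result.Confidence >= 0.6f)
                {
                    Console.WriteLine("Command: " + e.Result.Text);
                }
            };

            // Keep recognizing until the user stops the program.
            recognizer.RecognizeAsync(RecognizeMode.Multiple);
            Console.WriteLine("Listening; press Enter to quit.");
            Console.ReadLine();
        }
    }
}
```

Calling RobotCommands.Listen() from Main should be enough to try it out. Because only the listed phrases are loaded, no dictation grammar is involved, which is the case described above as working without training.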