I am trying to develop a humble AI to help with my family's daily routines.
Voice recognition is a must, and commands cannot be limited to a command library,
so command-library mode is off the table.
I tried dictation mode, which already has terrible recognition with a headset; it won't be able to understand anything with a room mic.
So I am trying to use Microsoft Cognitive Services' Bing Speech Recognition.
I downloaded the documents and the example, and I see everything is in XAML form. I don't understand why.
I am asking for some guidance from those who are experienced with this: is it possible to do this in a console app or a Windows Forms app? (I am using .NET 4.6.)
If not, do you have any suggestions for solving my problem?
Thank you for your time and patience.
You can use the NuGet packages to achieve the same. You can find a sample here: https://github.com/Azure-Samples/Cognitive-Speech-STT-Windows
Further details about the Bing Speech SDK: https://learn.microsoft.com/en-us/azure/cognitive-services/Speech/GetStarted/GetStartedCSharpDesktop
You can use the same for a console app as well; you just need to define the input segment from MicIn.
Related
I'm building a Windows Universal App that can show live captions of the lecture a student is currently watching or attending in person. I'm looking for a built-in, free solution for audio-to-text.
macOS has the Speech library https://developer.apple.com/documentation/speech , which we're going to use, but I cannot find a similar one on Windows. I found docs on the Windows.Media package, but cannot figure out whether it actually has an audio-to-text API or just command recognition: https://learn.microsoft.com/en-us/uwp/api/windows.media.speechrecognition?view=winrt-22621
Maybe someone has experience building this kind of capability on Windows?
Yes, you can use the Windows.Media.SpeechRecognition API for general speech recognition, not only for command recognition.
You could make a simple test with the official Speech Recognition Sample here: SpeechRecognitionAndSynthesis. Just remember to enable the Online speech recognition (Settings -> Privacy -> Speech).
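As a rough idea of what free-form dictation with this API can look like, here is a minimal sketch. The `LiveCaptioner` class and its `CaptionReady` event are hypothetical names made up for illustration; the sketch also assumes the app declares the microphone capability in its manifest and that online speech recognition is enabled as described above.

```csharp
using System;
using System.Threading.Tasks;
using Windows.Media.SpeechRecognition;

// Hypothetical helper: raises CaptionReady with each recognized phrase.
public sealed class LiveCaptioner
{
    private SpeechRecognizer recognizer;

    public event Action<string> CaptionReady;

    public async Task StartAsync()
    {
        recognizer = new SpeechRecognizer();

        // A dictation topic constraint asks for free-form text
        // instead of a fixed list of commands.
        recognizer.Constraints.Add(new SpeechRecognitionTopicConstraint(
            SpeechRecognitionScenario.Dictation, "dictation"));

        SpeechRecognitionCompilationResult compiled =
            await recognizer.CompileConstraintsAsync();
        if (compiled.Status != SpeechRecognitionResultStatus.Success)
        {
            throw new InvalidOperationException(
                "Could not compile constraints: " + compiled.Status);
        }

        // ResultGenerated is raised off the UI thread, so a real app
        // would marshal to the dispatcher before touching any controls.
        recognizer.ContinuousRecognitionSession.ResultGenerated +=
            (session, args) => CaptionReady?.Invoke(args.Result.Text);

        await recognizer.ContinuousRecognitionSession.StartAsync();
    }

    public async Task StopAsync()
    {
        await recognizer.ContinuousRecognitionSession.StopAsync();
        recognizer.Dispose();
    }
}
```

A page could subscribe to `CaptionReady` and append each phrase to a text block to get simple live captions.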
I want to develop an application for voice calls between two Android devices on my home network (WiFi). I'm new to programming, so I don't really know where to begin. I have researched around but cannot find anything that fits what I need to do.
The application has to be written in C#, as I have a basic understanding of that language and it's the language I want to expand my knowledge in. I'm using MonoDevelop, which allows the creation of Android apps in C#.
The call will be peer-to-peer, so very basic, and no security or encryption will be necessary in these early stages of development.
All help will be really appreciated!
I'd start with this link at CodeProject:
http://www.codeproject.com/Articles/138484/Simple-SIP-VOIP-based-phone-in-C
You will have to adapt it to Android, but it is probably a good starting point for understanding how to do voice over IP.
The big things that may differ on Android are:
User interface
method to get microphone input
method to play audio output
access to the IP stack
but the basics (encoding, decoding, etc.) should be there.
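To give a feel for the IP part only, here is a minimal sketch that pushes one frame of bytes over UDP on the loopback interface and reads it back. `AudioFrameDemo` and `SendAndReceive` are hypothetical names, the frame is just zeros standing in for captured audio, and microphone capture, playback, and encoding are deliberately left out because they are platform-specific.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

public static class AudioFrameDemo
{
    // Sends one frame over UDP to a local receiver and returns
    // the bytes that the receiver got.
    public static byte[] SendAndReceive(byte[] frame)
    {
        // Port 0 lets the OS pick a free port for the receiver.
        using (var receiver = new UdpClient(new IPEndPoint(IPAddress.Loopback, 0)))
        using (var sender = new UdpClient())
        {
            int port = ((IPEndPoint)receiver.Client.LocalEndPoint).Port;
            receiver.Client.ReceiveTimeout = 2000; // milliseconds

            sender.Send(frame, frame.Length, new IPEndPoint(IPAddress.Loopback, port));

            var remote = new IPEndPoint(IPAddress.Any, 0);
            return receiver.Receive(ref remote);
        }
    }

    public static void Main()
    {
        // 20 ms of 8 kHz, 16-bit mono PCM is 160 samples * 2 bytes = 320 bytes.
        byte[] frame = new byte[320];
        byte[] received = SendAndReceive(frame);
        Console.WriteLine("Received " + received.Length + " bytes");
    }
}
```

In a real call, each device would run the receiving side on a known port and send frames to the other device's WiFi address instead of the loopback address.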
I've checked out the speech framework that Microsoft offers in C# and it appears to be VERY bad compared to other voice recognition solutions like Siri on iPhone4 or Google's search option on my Galaxy Tab. I guess these work better as they have a HUGE amount of voice samples and it is being processed on high performance servers.
So I've been looking for their API, but it doesn't seem to support desktop applications at all. The only thing I could find was this post: Use google "Speak Now" in C#
----> I misunderstood what the person answering meant so this part is invalid!
Where the answer simply says: "Once you run the recognition api on the
sample of text you want to recognize, you can simply call google.com
with the "q" parameter to do a query search"
I don't understand what he means but I do know that he claims you can
get a desktop API to use Google as processor. That would be great!
So my question is: How can I use google.com's "Speak now" Voice Recognition feature I see on my tablet, in a standalone desktop application preferably written in C#?
I am making a Smart House Control System right now, and I have a little problem.
I was thinking of using Cosmos as a base system and adding the needed namespace libraries to it, but as the usual System.Speech.Recognition namespace depends too much on the Windows Speech API, I have to forget about using it.
So my question is: is there any (free if possible) voice recognition and/or speech synthesizer library for C# that has the following:
support for multi-language speaking
extracting text content from speech sample
synthesizing speech with selectable (or user-written) speech pattern (voice)
A general-purpose, non-Windows-dependent library would be best, and of course even better if it were free too.
Voxeo offers developer accounts which you could use to develop a speech powered home automation system. I've interfaced it to my own home automation system for a small subset of the commands my home understands and it works great. You'll need to learn some VoiceXML to use it.
SAPI works OK for voice synthesis; I use SAPI in my system for spoken prompts in the house, like a weather forecast that comes over the speakers in the morning when you walk into the bathroom. If Cosmos doesn't allow you to include all the DLLs you need, maybe you could create a separate service using SAPI and then use WCF (or something similar) to communicate between them?
For the related problem of understanding natural language in typed form, I've developed a C# NLP engine which I hope to be able to make available for non-commercial use at some point in the future.
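For the synthesis side, a minimal sketch using the managed System.Speech.Synthesis wrapper over SAPI might look like the following. It assumes a Windows machine with a reference to System.Speech.dll and at least one installed voice; `MorningAnnouncer`, `BuildGreeting`, and the forecast text are hypothetical and only there for illustration.

```csharp
using System;
using System.Speech.Synthesis;

public static class MorningAnnouncer
{
    // Hypothetical helper that builds the sentence to be spoken.
    public static string BuildGreeting(string forecast)
    {
        return "Good morning. Today's forecast: " + forecast;
    }

    public static void Main()
    {
        using (var synth = new SpeechSynthesizer())
        {
            synth.SetOutputToDefaultAudioDevice();

            // List the installed voices so one can be picked with SelectVoice.
            foreach (InstalledVoice voice in synth.GetInstalledVoices())
            {
                Console.WriteLine(voice.VoiceInfo.Name + " (" + voice.VoiceInfo.Culture + ")");
            }

            // Speak blocks until the prompt has finished playing.
            synth.Speak(BuildGreeting("sunny with light wind"));
        }
    }
}
```

Hosting this in a small separate service, as suggested above, would keep the Windows-only dependency out of the main system.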
Extracting text from speech without specifying any grammar up-front is a very hard problem and is going to be error prone. Even if you could solve that, you'd still have the problem of trying to understand what they said using NLP. Constructing a grammar that guides the recognizer to the kinds of sentences you want to recognize (like VoiceXML does) is likely to achieve much higher accuracy.
Check out this project: http://cmusphinx.sourceforge.net/
It's an open source speech recognition project. It is trainable with any language you want, and since it's open source you can modify it to suit your needs or expand it.
I am making a robot that responds to a few voice commands. I am using Windows XP and C# to achieve that. My only problem is that I don't know how to use speech recognition with C#.
I've been searching Google and MSDN, but I have not found any beginner-friendly tutorial yet.
Any suggestions?
Also, I know from my experience with Windows' speech recognition in MS Word that I need to train the computer before starting the speech recognition application. This may cause a big problem for me, because I may need to present my robot using different computers, or different people may be the presenters.
So is there any way to make a predefined list of words that any user can say to the application without having to train it first?
Thanks for the help!
Yes, you'll have to train anything that uses pattern recognition to respond to things. In Philadelphia, they pronounce "water" as "wudder". How could an algorithm figure that out? A predefined list would require you to have a working knowledge of every accent in the target sales countries.
SAPI 5.4 in Windows 7 does a very good job of recognizing limited command & control grammars without training.
If you keep your command set (grammar) small (say, no more than 10-15 commands), you should be able to get good results.
Dictation or a large command set requires training; there's just too much uncertainty.
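A small command grammar like that can be sketched with System.Speech.Recognition as follows. This assumes a reference to System.Speech.dll, an installed en-US recognizer, and a working default microphone; `RobotCommands`, the command words, and the 0.7 confidence cutoff are hypothetical choices for illustration, not recommended values.

```csharp
using System;
using System.Globalization;
using System.Speech.Recognition;

public static class RobotCommands
{
    // Builds a grammar that accepts exactly one of the given phrases.
    public static Grammar BuildGrammar(params string[] commands)
    {
        var builder = new GrammarBuilder(new Choices(commands));
        builder.Culture = new CultureInfo("en-US");
        return new Grammar(builder) { Name = "robot-commands" };
    }

    public static void Main()
    {
        using (var engine = new SpeechRecognitionEngine(new CultureInfo("en-US")))
        {
            engine.LoadGrammar(BuildGrammar("forward", "back", "left", "right", "stop"));
            engine.SetInputToDefaultAudioDevice();

            engine.SpeechRecognized += (sender, e) =>
            {
                // Ignore low-confidence matches; the cutoff is an arbitrary assumption.
                if (e.Result.Confidence >= 0.7f)
                {
                    Console.WriteLine("Command: " + e.Result.Text);
                }
            };

            // Keep listening until the user presses Enter.
            engine.RecognizeAsync(RecognizeMode.Multiple);
            Console.WriteLine("Listening; press Enter to quit.");
            Console.ReadLine();
        }
    }
}
```

Because the recognizer only has to choose among a handful of phrases, this kind of setup usually does not need per-user training.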