I'm building a Windows Universal app that shows live captions of the lecture a student is currently watching or attending in person. I'm looking for a built-in, free solution for speech-to-text.
macOS has the Speech framework (https://developer.apple.com/documentation/speech), which we're going to use, but I cannot find an equivalent on Windows. I found docs on the Windows.Media package, but cannot figure out whether it actually has a speech-to-text API or just command recognition: https://learn.microsoft.com/en-us/uwp/api/windows.media.speechrecognition?view=winrt-22621
Does anyone have experience building this kind of capability on Windows?
Yes, you can use the Windows.Media.SpeechRecognition API for free-form speech recognition, not only for command recognition.
You could make a simple test with the official Speech Recognition Sample here: SpeechRecognitionAndSynthesis. Just remember to enable the Online speech recognition (Settings -> Privacy -> Speech).
I am making an application with Unity and MRTK for HoloLens 2.
I would like to know if it is possible to disable general speech commands.
That is, I want to keep my own voice commands that I have created but I don't want phrases like "Take a picture", "Take a video".... to be recognized.
I've searched the internet but haven't found anything about it.
Does anyone know if there is an option to do this?
Speech commands in a Unity UWP app are powered by the same engine that supports speech in the Windows 10 system. This means that if you disable the Speech feature in Settings > Privacy > Speech, speech recognition will no longer be available in your Unity app. Apart from this, HoloLens does not currently provide any way to modify system commands. It is therefore recommended that you avoid using system commands in your app, or consider using the Unity plug-in for the Cognitive Speech Services SDK or another third-party speech engine.
You can go to the Settings page, then to Camera, find "Choose which apps can access your camera", and uncheck Mixed Reality Camera.
I am trying to develop a humble AI to help the daily routines of my family.
Voice recognition is a must, and commands cannot be limited to a command library.
So command library mode is off the table.
I tried dictation mode, which already has terrible recognition with a headset and won't be able to understand anything with a room mic.
So I am trying to use Microsoft Cognitive Services: Bing Speech Recognition:
I downloaded the documentation and the example, and I see everything is in XAML form. I don't understand why.
I am asking for some guidance from those who are experienced in this: is it possible to do this in a console app or a Windows Forms app? (I am using .NET 4.6.)
If not, do you have any suggestions for solving my problem?
Thank you for your time and patience.
You can use the NuGet packages to achieve the same thing. There is a sample here: https://github.com/Azure-Samples/Cognitive-Speech-STT-Windows
Further details about the Bing Speech SDK: https://learn.microsoft.com/en-us/azure/cognitive-services/Speech/GetStarted/GetStartedCSharpDesktop
You can use the same approach in a console app too; you just need to define the input segment from MicIn.
I am a good programmer, but I am new to robotics. I want to make a small project that will recognize my speech and turn my light on and off.
Please point me to some reference libraries for all of this.
What kind of hardware do I need for it?
If you are new to robotics, I would definitely recommend you check out the Arduino. The Arduino is an easily programmable circuit board. You can read more at http://www.arduino.cc
Regarding speech recognition, you could use Microsoft's speech recognition software, which your app can communicate with via a library. Find out how to install it here: http://support.microsoft.com/kb/306537
The example given in your comments of saying "Lights Off" to turn a light off in a room sounds more like home automation than robotics. I would look at the open source project called Mayhem on CodePlex. This application makes it easy to wire events to actions using plug-in modules and was developed by Microsoft Labs. Modules are easy to develop, and it comes out of the box with an event module for speech recognition. You just configure an event for speech recognition by telling the app what phrase you are looking for, like "Lights Off". There is already an extensive list of add-on modules for Mayhem, and one of them is a reaction module for Insteon devices. This add-on module lets you control INSTEON-brand lighting and home automation devices. There is also a PhidgetsModule, which allows you to control Phidgets, a set of "plug and play" building blocks for low-cost USB sensing and control from your PC. Microsoft had a contest to encourage developers to create new add-on modules, and you can see a demonstration here of a module called RemoteCommand, which shows how you could use speech recognition over a telephone to remotely control your home.
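The idea of wiring a recognized phrase to an action can be sketched independently of any particular tool. The following is a minimal, language-neutral sketch in Python, not Mayhem's actual module API: it assumes the speech engine hands back the recognized phrase as a plain string, and `light_state`, `turn_lights_on`, `turn_lights_off`, `normalize`, and `dispatch` are hypothetical names used only for illustration.

```python
# Hypothetical sketch: map recognized phrases to actions.
# The recognizer itself is out of scope; we assume it returns plain text.

light_state = {"on": False}  # stand-in for a real light controller


def turn_lights_on():
    light_state["on"] = True


def turn_lights_off():
    light_state["on"] = False


# Phrase -> action table, keyed by a normalized form of the phrase.
ACTIONS = {
    "lights on": turn_lights_on,
    "lights off": turn_lights_off,
}


def normalize(phrase):
    """Lowercase and collapse whitespace so 'Lights  Off' matches 'lights off'."""
    return " ".join(phrase.lower().split())


def dispatch(phrase):
    """Run the action for a recognized phrase; return True if one matched."""
    action = ACTIONS.get(normalize(phrase))
    if action is None:
        return False
    action()
    return True
```

In a real setup the body of each action would talk to the lighting hardware instead of flipping a flag, and unmatched phrases would simply be ignored.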
I've checked out the speech framework that Microsoft offers in C# and it appears to be VERY bad compared to other voice recognition solutions like Siri on iPhone4 or Google's search option on my Galaxy Tab. I guess these work better because they have a HUGE amount of voice samples and the audio is processed on high-performance servers.
So I've been looking for their API, but it doesn't seem to support desktop applications at all. The only thing I could find was this post: Use google "Speak Now" in C#
----> I misunderstood what the person answering meant so this part is invalid!
Where the answer simply says: "Once you run the recognition api on the
sample of text you want to recognize, you can simply call google.com
with the "q" parameter to do a query search"
I don't understand what he means but I do know that he claims you can
get a desktop API to use Google as processor. That would be great!
So my question is: How can I use google.com's "Speak now" Voice Recognition feature I see on my tablet, in a standalone desktop application preferably written in C#?
I'm looking for speech (wave files) to text on Windows Server 2008 (or Windows Server 2008 R2) using C# (or at least an API that I can call from C#) that supports multiple languages.
As far as I know, I can't use .NET speech (SAPI) because it works only on Vista / Windows 7.
I can't use Microsoft Speech Platform because it does not support all the languages I need (as far as I checked, there is no Hebrew (he) support).
It can't be a web-based service (I need it on my server).
I'm looking for something that can be used in commercial software, and I'm also willing to pay for a third-party product.
Can you please help me with that?
Thanks
You have text-to-speech listed as a tag, but the description sounds like speech recognition. If I understand correctly, what you want to do is take a wav file with speech in it and convert it to text. Actually, this is not even normal speech recognition, because most speech recognition systems work on targeted speech input and use grammars to restrict the search space that the speech engine has to cover. I think what you are describing is automatic transcription, akin to what Google Voice does to your voice mail messages when it sends you a text transcription in an email. This is a much more difficult problem, and the state of the art is not that advanced right now. Most of these solutions are offered as services, and the best ones still use human transcribers when the speech recognition confidence is low. I think the leader in this area is Nuance. I would check with them for a solution. I know they recently bought a company that provides this automated transcription service, and perhaps they now offer it as a product. They are also a leader in automatically transcribing doctors' orders and findings to text with their product Dragon NaturallySpeaking.
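The confidence-based fallback mentioned above (automatic transcription, with humans handling the low-confidence parts) can be sketched as follows. This is a minimal sketch in Python, not any vendor's API: the `(text, confidence)` pairs, the `route_segments` name, and the 0.80 threshold are all assumptions chosen for illustration.

```python
# Hypothetical sketch: split transcribed segments by recognizer confidence.
# Each segment is a (text, confidence) pair with confidence in [0.0, 1.0].

def route_segments(segments, threshold=0.80):
    """Return (accepted, needs_review) lists based on the confidence threshold."""
    accepted = []
    needs_review = []
    for text, confidence in segments:
        if confidence >= threshold:
            accepted.append(text)      # trust the automatic transcription
        else:
            needs_review.append(text)  # hand off to a human transcriber
    return accepted, needs_review
```

The right threshold depends on the engine and the audio quality, so it would need to be tuned against real recordings.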