Since this topic is a bit outdated, I would like to revisit it here.
After searching the web, I came across the following link:
http://archive.msdn.microsoft.com/nesl which runs only out of browser, because in-browser Silverlight can't access certain Windows-related COM libraries.
I wish (for obvious performance purposes) to perform the speech recognition through Silverlight (on the client machine) and then send the result (text) to the server via a postback to perform the corresponding action.
I have already managed to capture the voice from the microphone and store it in a byte array in Silverlight. Is there a way to convert that speech byte array to text?
HTML5 Google service is not an acceptable approach since IE is required.
My final goal is to implement speech recognition in an ASP.NET Web Application.
Any suggestion is appreciated.
You can't do it in Silverlight. You'll need to send the audio somewhere. You can call a third-party service (I'm sure there are plenty, and it shouldn't matter that you're using IE) or your own ASP.NET server (which can call System.Speech or any other free or commercial system). But before you do that, you should compress the audio. There aren't a lot of options in Silverlight. I recommend NSpeex, or at the very least convert the audio to 16 kHz PCM (either linear or A-law).
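As a rough illustration of the A-law option, here is a minimal sketch of G.711 A-law companding in C#. It assumes the captured audio is already 16-bit signed mono PCM at the desired sample rate (it does not resample), and the class and method names are made up for this example. Each 16-bit sample becomes one byte, so the payload is halved before upload.

```csharp
using System;

public static class ALawEncoder
{
    // Upper bound of each of the 8 A-law segments (13-bit magnitudes).
    private static readonly int[] SegmentEnd =
        { 0x1F, 0x3F, 0x7F, 0xFF, 0x1FF, 0x3FF, 0x7FF, 0xFFF };

    // Encodes one signed 16-bit linear PCM sample as one G.711 A-law byte.
    public static byte EncodeSample(short pcm)
    {
        int value = pcm >> 3;      // keep the 13 most significant bits
        int mask;
        if (value >= 0)
        {
            mask = 0xD5;           // sign bit (0x80) plus even-bit inversion (0x55)
        }
        else
        {
            mask = 0x55;           // even-bit inversion only
            value = -value - 1;    // magnitude of the negative sample
        }

        // Find the segment whose upper bound covers the magnitude.
        // For 16-bit input the magnitude never exceeds 0xFFF, so segment <= 7.
        int segment = 0;
        while (segment < SegmentEnd.Length - 1 && value > SegmentEnd[segment])
        {
            segment++;
        }

        // Segments 0 and 1 share the same step size; higher segments double it.
        int shift = segment < 2 ? 1 : segment;
        int encoded = (segment << 4) | ((value >> shift) & 0x0F);
        return (byte)(encoded ^ mask);
    }

    // Encodes a buffer of little-endian 16-bit PCM bytes (two bytes per sample).
    public static byte[] Encode(byte[] pcm16LittleEndian)
    {
        int sampleCount = pcm16LittleEndian.Length / 2;
        var output = new byte[sampleCount];
        for (int i = 0; i < sampleCount; i++)
        {
            short sample = (short)(pcm16LittleEndian[2 * i]
                                   | (pcm16LittleEndian[2 * i + 1] << 8));
            output[i] = EncodeSample(sample);
        }
        return output;
    }
}
```

The server then has to expand the bytes back to linear PCM (or use a recognizer that accepts A-law input) before running recognition. NSpeex would compress further, at the cost of an extra dependency.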
Here's a list of Speech SDKs (many of which have a cloud service component): http://www.toolsjournal.com/mobile-articles/item/918-top-10-sdks-to-voice-enable-mobile-apps-quickly
To make a trusted in-browser Silverlight application:
http://msdn.microsoft.com/en-us/library/gg192793(v=vs.95).aspx
http://www.pitorque.de/MisterGoodcat/post/Silverlight-5-Tidbits-Trusted-applications.aspx
And for security background:
http://msdn.microsoft.com/en-us/library/ee721083%28v=vs.95%29.aspx
Note that NESL doesn't support DictionaryGrammar. Grammar needs to be pre-defined:
http://archive.msdn.microsoft.com/nesl/Thread/View.aspx?ThreadId=4905
Related
I've checked out the speech framework that Microsoft offers in C# and it appears to be VERY bad compared to other voice recognition solutions like Siri on iPhone4 or Google's search option on my Galaxy Tab. I guess these work better as they have a HUGE amount of voice samples and it is being processed on high performance servers.
So I've been looking for their API, but it doesn't seem to support desktop applications at all. The only thing I could find was this post: Use google "Speak Now" in C#
----> I misunderstood what the person answering meant so this part is invalid!
Where the answer simply says: "Once you run the recognition api on the
sample of text you want to recognize, you can simply call google.com
with the "q" parameter to do a query search"
I don't understand what he means but I do know that he claims you can
get a desktop API to use Google as processor. That would be great!
So my question is: How can I use google.com's "Speak now" Voice Recognition feature I see on my tablet, in a standalone desktop application preferably written in C#?
I'm looking for speech-to-text (from wave files) on Windows Server 2008 (or Windows Server 2008 R2) using C# (or at least an API that I can call from C#) that supports multiple languages.
As far as I know, I can't use .NET speech (SAPI) because it works only on Vista / Windows 7.
I can't use Microsoft Speech Platform because it does not support all the languages I need (as far as I checked, there is no Hebrew (he) support).
It can't be a web-based service (I need it on my server).
I'm looking for something that can be used in commercial software, and I'm also willing to pay for a third-party product.
Can you please help me with that?
Thanks
You have text-to-speech listed as a tag, but the description sounds like speech recognition. If I understand correctly, you want to take a wav file with speech in it and convert it to text. Actually, this is not even normal speech recognition, because most speech recognition systems work on targeted speech input and use grammars to restrict the search space that the speech engine has to cover. I think what you are describing is automatic transcription, akin to what Google Voice does to your voice mail messages when it sends you a text version in an email. This is a much more difficult problem, and the state of the art is not that advanced right now. Most of these solutions are offered as services, and the best ones still use human transcribers when the speech recognition confidence is low. I think the leader in this area is Nuance. I would check with them for a solution. I know they recently bought a company that provides this automated transcription service, and perhaps they now offer it as a product. They are also a leader in automatically transcribing doctors' orders and findings to text with their product Dragon NaturallySpeaking.
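To make the grammar-versus-dictation distinction concrete, here is a minimal sketch using System.Speech purely as an illustration. It assumes a Windows machine where a desktop recognizer for the chosen culture is installed (which, as the question notes, may not be the case on the asker's server), and the class and method names are made up for this example. The only difference between the two methods is the grammar that is loaded: a closed list of phrases in one, free dictation in the other.

```csharp
using System.Globalization;
using System.Speech.Recognition;

public static class WaveFileRecognizer
{
    // Grammar-restricted recognition: the engine only chooses among known phrases.
    public static string RecognizeCommand(string wavPath, string[] phrases, CultureInfo culture)
    {
        var builder = new GrammarBuilder(new Choices(phrases));
        builder.Culture = culture; // must match the engine's culture
        return RecognizeFirstUtterance(wavPath, new Grammar(builder), culture);
    }

    // Open dictation: no phrase list, so the search space is far larger.
    public static string Transcribe(string wavPath, CultureInfo culture)
    {
        return RecognizeFirstUtterance(wavPath, new DictationGrammar(), culture);
    }

    // Returns the text of the first recognized utterance, or null if nothing was recognized.
    private static string RecognizeFirstUtterance(string wavPath, Grammar grammar, CultureInfo culture)
    {
        using (var engine = new SpeechRecognitionEngine(culture))
        {
            engine.LoadGrammar(grammar);
            engine.SetInputToWaveFile(wavPath);
            RecognitionResult result = engine.Recognize();
            return result == null ? null : result.Text;
        }
    }
}
```

With a short command list, the first method tends to be usable out of the box. The second is where accuracy typically drops, which is why transcription is usually bought as a specialized product or service.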
I want to build a web application that records audio through the microphone.
If anyone can suggest an appropriate approach or provide some links, that would be helpful.
Also, please suggest a free third-party control if you know of one.
Implementation technology: ASP.NET, C#
Since you are looking to use C#, check out Silverlight 4, which added microphone support. Here is a tutorial on accessing the microphone in Silverlight 4. Scratch Audio is a great example of a Silverlight application with microphone support.
JavaScript does not provide microphone support. You would have to include another technology to do this.
I know Flash supports microphone input. It might be the best place to start. I don't know if Silverlight has these features.
The only other things I can think of would be an ActiveX control for IE or a standalone application. Both of these look like much worse approaches.
You can use Flash to record from the microphone and upload the audio to a server. For the server you can use Red5, which is a great open-source server.
Here are some examples:
http:// => fms.denniehoopingarner.com/
http:// => mariofalomir.com/blog/?p=101
(Sorry but i can only post 1 hyperlink)
How can I create an ASP.NET page that allows users to communicate by voice?
What must I do to accomplish this?
Thanks.
Using ASP.NET only? I'm not sure you can do this without some sort of browser plugin. I suspect that it might be advantageous to leverage the Flash Player to make the call. You'd still need a communication server to do the plumbing. Check out Red5 (Open Source), FlashMediaServer (Adobe's product), Wowza Media Server (cheaper than FMS).
spender's answer is correct. Neither current versions of HTML nor the proposed HTML5 provides any support for audio pickup. Additionally, the audio playback features are insufficient to provide the kind of audio streaming controls that you will need. Look at the variety of plug-in technologies available, or write your own browser-specific, operating-system-specific plug-in.
If you are going to target devices such as the iPad and iPhone that won't support plug-ins, you are going to be forced to build native applications for those platforms.
I am working for a company where we are developing video chat support on an existing application. I have looked at various solutions for this like
Using managed DirectShow for video capture and streaming in C#
Some code samples on CodeProject where we take an image and pass it over the network (I would call this a rather crude solution, as it would eat up a lot of bandwidth).
Coding a compression algorithm from scratch and using it to compress and decompress the video.
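To put a rough number on the bandwidth concern with the image-per-frame approach, here is a small sketch that computes the raw, uncompressed bit rate of a video stream. The resolution, color depth, and frame rate in the usage note are illustrative assumptions, not figures from the project.

```csharp
public static class RawVideoBandwidth
{
    // Uncompressed bit rate: every pixel of every frame is sent as-is.
    public static long BitsPerSecond(int width, int height, int bitsPerPixel, int framesPerSecond)
    {
        return (long)width * height * bitsPerPixel * framesPerSecond;
    }
}
```

For example, `RawVideoBandwidth.BitsPerSecond(640, 480, 24, 30)` gives 221,184,000 bits per second, roughly 221 Mbit/s for a single modest stream. Sending each frame as an independently compressed image reduces this, but it still ignores the similarity between consecutive frames, which is what a real video codec exploits.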
Now the challenge is that we are looking to achieve very high quality video streaming and the container application is coded in C#.NET
This is what I have proposed so far: the network logic to stream data is written in C#, and the video compression is written in VC++, with the VC++ DLL called via P/Invoke or C++/CLI, whichever is feasible.
I am looking for someone more experienced than me in this field who can tell me whether I am on the right track or whether this can still be improved.
The ultimate goal is high quality video streaming.
The codec can be anything, such as H.263 or H.264.
I've used several ways to get video streaming/conferencing with .net easily, without need to dig into directshow. (ok, dig some, but not deep :)
1) Use of plain Windows Media Encoder components. This is documented with samples in the Windows Media Encoder SDK. Good for any high-resolution streaming, but the delay is too big for real-time chat (0.5-2 seconds at best). The more modern Expression Encoder SDK is another option.
2) Microsoft Research ConferenceXP http://cct.cs.washington.edu/ A full-featured conferencing API including application streaming. They took low-level Windows Media codec filters and wrapped them in managed code. Works well. Easily customizable. Looks a bit abandoned now.
3) Microsoft RTC Client up to version 1.3 - core of windows messenger.
pros: managed samples from Microsoft, good docs, reliable performance, freely redistributable, Microsoft-compatible (good) SIP stack included. Major conferencing vendors like Emblaze VCON based their solutions on it in the recent past; I'm not sure about these days, but I know that Tandberg licensed Microsoft's VC-1.
cons: versions up to 1.3 support H.261-H.263 video only. The modern version with support for the VC-1 (H.264-class) codec does not allow direct serverless IP-to-IP connections; it requires Microsoft Live Communications Server. The newer SDK does not cover video conferencing calls well.
http://msdn.microsoft.com/en-us/library/ms775892(VS.85).aspx
Please let us know which platform you choose. By the way, I've even used the ConferenceXP video RTP part together with RTC 1.3 voice/SIP features to improve video quality, so you have a wide choice of managed technologies here. Another option is Live Meeting, which I haven't had a chance to take a good look at yet.
Save yourself the trouble and use VLC. There are some decent .NET wrappers for it (http://forum.videolan.org/viewtopic.php?f=32&t=52021&start=30)
We are using C# and VLC for an IPTV network. We take input off DISH Network satellites via Osprey-450 video capture devices on a Windows XP server. From there, we have a .NET server component that we wrote in C# that uses VLC behind the scenes (starting separate processes in .NET to control the vlc.exe instances). The VLC processes transcode and stream the signals over a network (H.264 or MPEG-4, we've successfully done both).
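The post does not show the actual command line, so the following is only a sketch of what "starting separate processes in .NET to control the vlc.exe instances" might look like. The input MRL, multicast address, port, and bitrate are placeholders, the class and method names are made up, and the `--sout` chain follows VLC's stream-output syntax as commonly documented; verify it against the VLC version you install.

```csharp
using System.Diagnostics;

public static class VlcLauncher
{
    // Builds a vlc.exe argument string that transcodes the input to H.264
    // and sends it as an MPEG-TS stream over UDP to the given address and port.
    public static string BuildArguments(string inputMrl, string address, int port, int videoKbps)
    {
        string sout = "#transcode{vcodec=h264,vb=" + videoKbps + "}"
                    + ":std{access=udp,mux=ts,dst=" + address + ":" + port + "}";
        // "-I dummy" runs VLC without its graphical interface.
        return "-I dummy \"" + inputMrl + "\" --sout \"" + sout + "\"";
    }

    // Starts one vlc.exe instance; the caller keeps the Process to monitor or kill it.
    public static Process Start(string vlcExePath, string arguments)
    {
        var info = new ProcessStartInfo(vlcExePath, arguments)
        {
            UseShellExecute = false,
            CreateNoWindow = true
        };
        return Process.Start(info);
    }
}
```

A server component along these lines would call `BuildArguments` once per channel, start one process per capture device, and restart any process that exits unexpectedly.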
On the client side we have a C# WinForms application that uses an embedded VLC viewer to view multicast signals. This application is mainly for command and control. The real use of the multicast signals happens when the set-top boxes attached to our TVs decode and display the streams.
We thought we were going to have to write our own DirectX encoders too, but don't go to all the trouble. VLC works really well and has enough C# support to be very useful. Feel free to e-mail me if you have specific questions about implementation.
You should check out the Ucentrik SDK. This SDK will enable you to integrate rich-media functionality such as video, audio, chat, remote-desktop sharing and control, and video recording into your applications. The video codecs supported are VP8 (Google), Theora, and H.264. Additionally, the rich-media traffic is encapsulated within HTTP so that it can traverse firewalls that allow normal web traffic. This technology is completely free, and you can download the SDK and request an API key so that you can evaluate it without investing any time in setting up the infrastructure. In the next few months, we are releasing a server component so that you or your customer can download and host the infrastructure yourselves. The technology supports 1-to-many connections, which means that you can create video conferences if you like. The features are highly modular, so you can integrate just the video, audio, or desktop sharing, or a combination of them, using the same SDK. You can request the SDK here: www.ucentrik.com. Additionally, there are some videos here: http://www.youtube.com/user/ucentrik
Good luck.
Ucentrik has just released an open-source call-center application that integrates the CTX technology. This call-center application implements the video, audio, desktop/application sharing (with control), and text chat functionality available from the CTX API. The application also includes some business logic for routing calls to an agent who is available or has a specific skill set. The project is available at http://vcca.codeplex.com - please note that you will need access to the Ucentrik CTX SDK, which can be requested on the Ucentrik website (www.ucentrik.com). Good luck.