I would like to write a program in C# that includes limited vocabulary speech recognition of languages such as Finnish or Polish. Microsoft's Speech SDK works great for English but can it support other languages like those? If not, what other (hopefully affordable) software tools are available?
Have a look at Microsoft Server Speech Platform 10.2. It supports both STT and TTS for 26 languages, including Finnish and Polish!
Here's a link that will get you started.
http://www.codeproject.com/KB/audio-video/TTSandSR.aspx
A bit late post, sorry for that.
C# - Free speech recognition Engine library (SDK)
System.Speech.Recognition gives very poor results... I want another SDK that gives me good results and works with C# in Visual Studio,
and I want it OFFLINE, not online like the Google API.
Thanks
In the past I got quite good results using pocketsphinx (or Sphinx, if you have more resources available). Check it here:
https://cmusphinx.github.io/
I'm looking for speech-to-text (from wave files) on Windows Server 2008 (or Windows Server 2008 R2) using C# (or at least an API that I can call from C#) that supports multiple languages.
As far as I know, I can't use .NET speech (SAPI) because it works only on Vista / Windows 7.
I can't use Microsoft Speech Platform because it does not support all the languages I need (as far as I checked, there is no Hebrew (he) support).
It can't be a web-based service (I need it on my server).
I'm looking for something that can be used in commercial software, and I'm also willing to pay for a third-party product.
Can you please help me with that?
Thanks
You have text-to-speech listed as a tag, but the description sounds like speech recognition. If I understand correctly, you want to take a wav file with speech in it and convert it to text. Actually, this is not even normal speech recognition, because most speech recognition systems work on targeted speech input and use grammars to restrict the search space that the speech engine has to cover. I think what you are describing is automatic translation or transcription, akin to what Google Voice does to your voice mail messages when it sends you a text translation in an email. This is a much more difficult problem, and the state of the art is not that advanced right now. Most of these solutions are offered as services, and the best ones still use human translators when the speech recognition confidence is low. I think the leader in this area is Nuance, so I would check with them for a solution. I know they recently bought a company that provides this automated transcription service, and perhaps they now offer it as a product. They are also a leader in automatically transcribing doctors' orders/findings to text with their product Dragon NaturallySpeaking.
I know the first comment will be that I am duplicating previous threads, but the code I found (from MSDN) uses Windows' speech recognition... I'm doing my graduation project and speech recognition is part of it! I can't include this code; I have to try and do it from scratch. I am doing some research about it, and I would be really thankful if someone who has already done this could give me a link to a paper or code I can benefit from!
Thanks in advance!
Microsoft Server Speech Platform 10.1 (SR and TTS in 26 languages)
http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=24003
The basic operations that speech recognition applications perform:
Starting the speech recognizer.
Creating a recognition grammar.
Loading the grammar into a speech recognizer.
Registering for speech recognition event notification.
Creating a handler for the speech recognition event.
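To make the order of those five operations concrete, here is a minimal sketch in Python. It is not the Microsoft Speech Platform API: `ToyRecognizer`, `Grammar`, `load_grammar`, `on_recognized` and `feed` are hypothetical names invented for illustration, and the "audio" is plain text. It only shows how a grammar restricts what can be recognized and how an event handler receives the result.

```python
from typing import Callable, List, Optional


class Grammar:
    """A fixed list of phrases the recognizer is allowed to return."""

    def __init__(self, phrases: List[str]) -> None:
        self.phrases = {p.lower() for p in phrases}


class ToyRecognizer:
    """Hypothetical stand-in for a speech engine; it 'hears' text, not audio."""

    def __init__(self) -> None:
        self._grammars: List[Grammar] = []
        self._handlers: List[Callable[[str], None]] = []

    def load_grammar(self, grammar: Grammar) -> None:
        self._grammars.append(grammar)

    def on_recognized(self, handler: Callable[[str], None]) -> None:
        self._handlers.append(handler)

    def feed(self, utterance: str) -> Optional[str]:
        """Return the matched phrase and notify handlers, or None if rejected."""
        text = utterance.strip().lower()
        for grammar in self._grammars:
            if text in grammar.phrases:
                for handler in self._handlers:
                    handler(text)
                return text
        return None  # outside the grammar: nothing is recognized


recognized: List[str] = []


def handle_recognized(phrase: str) -> None:
    # Step 5: the handler that runs when a phrase is recognized.
    recognized.append(phrase)


recognizer = ToyRecognizer()                 # Step 1: start the recognizer
grammar = Grammar(["tak", "nie", "stop"])    # Step 2: create a grammar
recognizer.load_grammar(grammar)             # Step 3: load it into the recognizer
recognizer.on_recognized(handle_recognized)  # Step 4: register for the event

recognizer.feed("Tak")    # in the grammar, so the handler fires
recognizer.feed("hello")  # not in the grammar, so it is ignored
```

In the real SDK the same sequence applies, but the recognizer is fed audio and the grammar is built with the platform's own grammar classes.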
Language Packs
http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=3971
Runtime Download
http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=24974
These libraries could, at least, give you a start on understanding the makeup of the interfaces, and a starting point of core/base code to copy/steal or emulate ;)
There's also this paper:
http://www.cs.nyu.edu/~mohri/pub/hbka.pdf
Best of Luck!
Writing the code that implements the basic recognition algorithm (Hidden Markov Model based recognizers are the norm these days) is only part of your challenge. Virtually every speech recognition system is trained on actual speech data, so you also have to identify a corpus (collection of audio files and transcriptions) to train your mathematical models.
Have a look at the open source Sphinx speech recognizer (and related tools) from CMU if you are still interested in doing it all by yourself.
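As a rough illustration of the decoding step in an HMM-based recognizer, here is a small Viterbi sketch in Python. The two states ("silence", "speech"), the "low"/"high" energy observations and all probabilities are made-up assumptions for the example, not values from Sphinx or any trained model. A real system would estimate them from a corpus and work with acoustic feature vectors.

```python
from typing import Dict, List, Sequence


def viterbi(
    observations: Sequence[str],
    states: Sequence[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
) -> List[str]:
    """Return the most likely hidden state sequence for the observations."""
    # best[t][s]: probability of the best path that ends in state s at time t.
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    # back[t][s]: the predecessor of s on that best path.
    back: List[Dict[str, str]] = [{}]

    for obs in observations[1:]:
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] * trans_p[p][s])
            scores[s] = best[-1][prev] * trans_p[prev][s] * emit_p[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)

    # Trace back from the most probable final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy model: all numbers below are illustrative assumptions.
states = ["silence", "speech"]
start_p = {"silence": 0.8, "speech": 0.2}
trans_p = {
    "silence": {"silence": 0.7, "speech": 0.3},
    "speech": {"silence": 0.2, "speech": 0.8},
}
emit_p = {
    "silence": {"low": 0.9, "high": 0.1},
    "speech": {"low": 0.3, "high": 0.7},
}

decoded = viterbi(["low", "high", "high", "low"], states, start_p, trans_p, emit_p)
print(decoded)
```

Practical implementations usually sum log probabilities instead of multiplying raw ones, to avoid numeric underflow on long utterances. Training the model parameters is a separate step, which is where the corpus comes in.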
So, you've all probably seen Iron Man where Tony interacts with an AI system called Jarvis. Demo clip here (Sorry it's a commercial).
I'm very familiar with C#, C++ and Visual Basic, but I am unsure what options I have available for me to program something like this. Ideally, I'd like to have it assist me while working on some projects by automating a few things.
After doing a bit of research, I saw that a lot of people were using AppleScript. Well, I'm a Windows developer and I work on Windows, so that won't work.
Microsoft has a Speech SDK, but I hear that I can't program it to learn custom words... as in, it just uses its standard library. Is this true? What are the other limitations of speech recognition with the SDK? Is there something else I could use instead?
Also, which language would be better to use for a project like this? C# or VB?
The .NET 3.0 System.Speech.Recognition namespace has very elegant .NET wrapper classes around the SAPI SDK, including the Grammar class to customize the recognition. As usual, any .NET-enabled language can take advantage of it; the specific language doesn't matter.
We have an application that we were planning to use the Microsoft Speech API for. We tested it on Windows XP using the Microsoft Sam voice, and frankly it sounds terrible... It's almost impossible to understand what the voice is trying to say.
Are there other, better voices? Are there any updates or newer versions out there that are better? Are there other products, open source projects, etc. that can work as an alternative?
Just to clarify - It needs to have some sort of API so I actually can program against it.
On Windows about the best I have found was using the speech API and voices from AT&T Natural Voices: https://nextup.com/attnv.html
They are however VERY expensive if available at all. I have run into projects where the usage/business model was so far from what AT&T was thinking of that they wouldn't even sell a license.
There is a free software alternative, Festival: http://festvox.org/ . The quality, though, is horrible. It is about 10 years behind the current sound quality of commercial systems. It is, however, free.
A third alternative which has worked well for me was to shift the voice synthesis part of a few projects to OS X. OS X has a decent set of tools and speech APIs and a fairly decent set of stock voices. The downside, of course, is that programs written for these APIs run only under OS X, which runs only on Apple hardware.
The AT&T Natural Voices engine produces great speech, but it's not free.
There is also NeoSpeech, which is good as well - also not free.
You don't describe your licensing needs, so I don't know if any of these will be suitable in that regard, but all of the following are sources of SAPI 5 compatible voices:
Ivona (http://www.ivona.com/) - I'm using their Kendra voice on a SAPI project.
AT&T Natural Voices (http://www2.research.att.com/~ttsweb/tts/)
Loquendo (http://www.loquendo.com/)
Acapela (http://www.acapela-group.com/products/products.asp)
Cepstral (http://www.cepstral.com/)
fonix (http://www.fonixspeech.com/tts.php) - only if you loved the original Speak & Spell.
Nuance RealSpeak (I'm not sure about this one...)
You can use the free and open source Festival. The default Festival voice sounds a little like Stephen Hawking, but you can use some other, much better HTS voices. For example, try selecting the Peter HTS 2011 voice on this demo page: http://www.cstr.ed.ac.uk/projects/festival/morevoices.html. Most of the HTS voices for Festival that I've seen are not allowed for commercial use; however, this one seems to be free: http://homepages.inf.ed.ac.uk/jyamagis/software/page54/page54.html
You can check this youtube tutorial: http://www.youtube.com/watch?v=MmcLFJQpv2o