The Microsoft C# speech API provides a SpeechRecognitionEngine class to recognize an audio stream. One way to test recognition is to call the SpeechRecognizer.EmulateRecognize method.
According to the documentation:
recognizers ignore case and character width when applying grammar rules to the input phrase
I'd like to know if there is a way to handle fuzzier strings, because confidence is very low even for misspelled text! That's far from real life...
With audio, I could say Hello, Helo or Helllo and still get good confidence.
With text, the engine is very strict.
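A minimal repro of that behavior, assuming a Windows machine with System.Speech.dll referenced and an installed en-US recognizer. The class name EmulateDemo, the grammar phrase and the test spellings are hypothetical. It uses the in-process SpeechRecognitionEngine rather than the shared SpeechRecognizer (both expose EmulateRecognize) and prints whether each spelling matched the one-phrase grammar and with what confidence.

```csharp
using System;
using System.Globalization;
using System.Speech.Recognition;

public static class EmulateDemo
{
    public static void Run()
    {
        using (var engine = new SpeechRecognitionEngine(new CultureInfo("en-US")))
        {
            // A grammar with a single phrase; emulated text is matched against it.
            engine.LoadGrammar(new Grammar(new GrammarBuilder("hello")));
            engine.SetInputToNull(); // audio input is not used during emulation

            foreach (string input in new[] { "Hello", "HELLO", "Helo", "Helllo" })
            {
                // The text goes straight to grammar matching: no acoustic model is involved.
                RecognitionResult result = engine.EmulateRecognize(input);
                Console.WriteLine(result == null
                    ? input + ": no match"
                    : input + ": " + result.Text + " (confidence " + result.Confidence.ToString("F2") + ")");
            }
        }
    }
}
```

Calling EmulateDemo.Run() from a console Main shows how differently case changes and misspellings are treated.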
EDIT: For what purpose?
My speech engine is working fine, but I also want to trigger it from text input.
Let's say you're on a mobile phone and using HTML5 SpeechRecognition. I'd like to send the recognized text to the engine and get the same behavior as with speech.
OK, I found the answer!
I should have read the documentation more carefully!
SpeechRecognizer.EmulateRecognize
is really straightforward and tests the given string as-is, but
SpeechRecognizer.SimulateRecognize
will try to build an 'idealized' audio representation of the input phrase (based on the engine's lexicon and acoustic model).
And so it works very well!
When you send audio to the recognizer, the SR engine does a lot of work to create a set of phonemes (via acoustic modeling) and then a set of strings (via phoneme modeling). During that process, much of the ambiguity gets eliminated. EmulateRecognize doesn't generate audio that gets processed via the SR engine; it skips all the modeling and just does a string match.
There's no way to work around this that doesn't involve a lot of work (e.g., implementing a SAPI-compatible SR engine that only does EmulateRecognize).
Pass your string to SpeechSynthesizer.Speak() and use the resulting audio as input to the SpeechRecognitionEngine?
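A sketch of that suggestion, assuming Windows with System.Speech.dll referenced and an en-US recognizer installed; the class and method names are hypothetical. The synthesizer writes WAV audio into a MemoryStream, which is rewound and handed to SpeechRecognitionEngine.SetInputToWaveStream, so the text goes through acoustic modeling like real speech instead of a plain string match. The synthesized voice is not a human speaker, so results may still differ from live audio.

```csharp
using System.Globalization;
using System.IO;
using System.Speech.Recognition;
using System.Speech.Synthesis;

public static class SpeakThenRecognize
{
    // Turns `text` into audio, then recognizes that audio against `phrases`.
    // Returns null when nothing in the grammar matched.
    public static RecognitionResult Recognize(string text, params string[] phrases)
    {
        using (var synthesizer = new SpeechSynthesizer())
        using (var audio = new MemoryStream())
        using (var engine = new SpeechRecognitionEngine(new CultureInfo("en-US")))
        {
            // 1. Synthesize the text into an in-memory WAV stream.
            synthesizer.SetOutputToWaveStream(audio);
            synthesizer.Speak(text);

            // 2. Rewind so the recognizer reads the audio from the start.
            audio.Position = 0;

            // 3. Recognize the synthesized audio with a simple phrase grammar.
            engine.LoadGrammar(new Grammar(new GrammarBuilder(new Choices(phrases))));
            engine.SetInputToWaveStream(audio);
            return engine.Recognize();
        }
    }
}
```

For example, SpeakThenRecognize.Recognize("Helo", "hello", "goodbye") lets you compare the confidence you get through audio with what EmulateRecognize returns for the same text.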
I have read the documentation but fail to understand what the underlying difference is between using:
Prompt prompt = new Prompt("What are you doing?");
speaker.SpeakAsync(prompt);
VS:
speaker.SpeakAsync("What are you doing?");
I am asking because I am trying to get a response from the user: it's not just a statement, I am expecting a specific answer to a question that the speaker asks.
For example, I want the speaker to say "What are you doing?", and if the user says "I'm trying to read, leave me alone." into the microphone, the voice recognition should stay quiet.
I am trying to determine how best to handle question/answer scenarios. How should I handle this when my app expects a specific type of answer that it can act on?
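One common way to handle this, sketched under assumptions (Windows, System.Speech.dll, a default microphone and an en-US recognizer): load a grammar containing only the answers the app can act on, speak the question, then treat a missing or low-confidence result as "not an answer" and stay quiet. The class name, the 5-second timeout and the 0.6 confidence threshold are hypothetical values to tune.

```csharp
using System;
using System.Globalization;
using System.Speech.Recognition;
using System.Speech.Synthesis;

public static class QuestionAnswer
{
    // Hypothetical threshold: results below it are treated as "not an answer".
    private const float MinConfidence = 0.6f;

    // Asks the question and returns the recognized answer, or null to stay quiet.
    public static string Ask(string question, params string[] expectedAnswers)
    {
        using (var speaker = new SpeechSynthesizer())
        using (var engine = new SpeechRecognitionEngine(new CultureInfo("en-US")))
        {
            // Only the expected answers are in the grammar, so free-form speech
            // such as "I'm trying to read, leave me alone." should not match well.
            engine.LoadGrammar(new Grammar(new GrammarBuilder(new Choices(expectedAnswers))));
            engine.SetInputToDefaultAudioDevice();

            speaker.Speak(question); // synchronous: finish asking before listening

            // Wait up to 5 seconds for the user to start speaking.
            RecognitionResult result = engine.Recognize(TimeSpan.FromSeconds(5));

            if (result == null || result.Confidence < MinConfidence)
                return null; // no usable answer: stay quiet

            return result.Text;
        }
    }
}
```

Usage: string answer = QuestionAnswer.Ask("What are you doing?", "reading", "working", "nothing"); and act only when answer is not null.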
There's no reason to use Prompt if it is just a simple string.
But more elaborate phrases can be built with PromptBuilder: switching voice and volume, inserting pauses and audio snippets, using SSML markup, specifying style and pronunciation, marking paragraphs and sentences. Once you have built such a phrase, you will likely want to preserve it if you repeat it, and you'll need the Prompt class for that.
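A small PromptBuilder example, assuming System.Speech.dll is referenced (Windows). PromptDemo and its methods are hypothetical names; StartSentence, AppendText, AppendBreak and StartStyle with a PromptStyle are System.Speech.Synthesis calls. The builder is kept and wrapped in a fresh Prompt for each playback, so the phrase sounds the same every time without relying on reusing a Prompt that has already completed.

```csharp
using System;
using System.Speech.Synthesis;

public static class PromptDemo
{
    // Builds a question followed by a pause and a slower, emphasized follow-up.
    public static PromptBuilder BuildQuestion()
    {
        var builder = new PromptBuilder();
        builder.StartSentence();
        builder.AppendText("What are you doing?");
        builder.EndSentence();

        builder.AppendBreak(TimeSpan.FromMilliseconds(500)); // half-second pause

        builder.StartStyle(new PromptStyle(PromptRate.Slow)); // slower speaking rate
        builder.AppendText("Please answer in a few words.", PromptEmphasis.Strong);
        builder.EndStyle();
        return builder;
    }

    public static void Run()
    {
        PromptBuilder question = BuildQuestion();
        using (var speaker = new SpeechSynthesizer())
        {
            // Same formatting both times: a new Prompt per playback.
            speaker.Speak(new Prompt(question));
            speaker.Speak(new Prompt(question));
        }
    }
}
```

PromptBuilder.ToXml() returns the generated SSML, which is handy for checking what the synthesizer will receive.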
A Prompt object can contain plain text, text formatted with markup language, or audio files.
SpeakAsync(string) returns a Prompt as well, so passing a plain string simply creates one for you.
I had to check a TTS app I wrote a while back.
Back in the late eighties, I seem to remember using a Unix utility named 'banner' (see http://en.wikipedia.org/wiki/Banner_(Unix)). It basically took a string of text and 'rendered' it as a larger text 'banner', using each character as blocks to form the original character. It was usually used at the start of print runs to create a heading for multi-sheet reports.
Does anyone know of a C# library that reproduces this functionality?
I remember a similar thing on the VAX mainframes at college. It was useful with a roomful of people banging in Pascal programs on VT100 terminals and only one matrix printer, behind closed doors in the hallowed office where the IT staff worked.
Anyway, this is a C# ASCII art creator; if you can find a way of rendering your text to a graphics file, you could then convert it.
You could probably invoke a free-software C program that does the same, perhaps figlet.
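A sketch of the render-then-convert idea, assuming Windows and System.Drawing (System.Drawing.Common on newer .NET, where it is Windows-only). TextBanner.Render is a hypothetical helper, not an existing library: it draws the text into a bitmap with anti-aliasing turned off, then emits one character per pixel, using the block character for dark pixels. The font size controls how tall the banner is.

```csharp
using System.Drawing;
using System.Drawing.Text;
using System.Text;

public static class TextBanner
{
    // Renders `text` into a bitmap and converts each dark pixel to `block`.
    public static string Render(string text, char block = '#', int fontSize = 12)
    {
        using (var font = new Font(FontFamily.GenericMonospace, fontSize, FontStyle.Bold, GraphicsUnit.Pixel))
        {
            // Measure the text first so the bitmap is just large enough.
            Size size;
            using (var probe = new Bitmap(1, 1))
            using (var g = Graphics.FromImage(probe))
            {
                size = Size.Ceiling(g.MeasureString(text, font));
            }

            using (var bitmap = new Bitmap(size.Width, size.Height))
            using (var g = Graphics.FromImage(bitmap))
            {
                g.Clear(Color.White);
                // One bit per pixel: no grey anti-aliased edges to threshold.
                g.TextRenderingHint = TextRenderingHint.SingleBitPerPixelGridFit;
                g.DrawString(text, font, Brushes.Black, 0, 0);

                var banner = new StringBuilder();
                for (int y = 0; y < bitmap.Height; y++)
                {
                    for (int x = 0; x < bitmap.Width; x++)
                        banner.Append(bitmap.GetPixel(x, y).GetBrightness() < 0.5f ? block : ' ');
                    banner.AppendLine();
                }
                return banner.ToString();
            }
        }
    }
}
```

Usage: Console.Write(TextBanner.Render("REPORT", '#', 16)); prints a banner one text line per pixel row.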
I'm new to the CE environment. I'm creating an application for a mobile computer with a barcode scanner that runs Windows CE 5.0 (Motorola MC3000).
I'm using VS 2008 and programming in C#.
I made a little demo project that runs successfully on the device. My application has a scan task, so it needs to use the barcode reader: access the scan hardware, make it run, read the returned result and display it in a TextBox!
The problem is that I don't know how to integrate the scan part into my application.
Any help on this?
Well, the first step would be to look in the documentation for the Motorola MC3000. I don't know if there is one for the MC3000, but I know that the Motorola EMDK for .NET provides quite a few sample C# VS2008 projects, and a couple of them are for barcode reading.
Kobunite has posted the first step. So go to the download page of the Motorola EMDK and look at the examples. After that, reference Symbol.dll and Symbol.Barcode.dll in your project (Copy Local = true). Then you can begin writing your "barcode class" with an event handler for the scan event. When a barcode is scanned via the hardware trigger, the event fires, and you can put the barcode string into your focused TextBox or do something else with it (e.g. filtering a DataGrid). Hope this helps.
If you just need to scan a barcode, there is a much easier solution.
By default, the barcode scanner should also send its output to the keyboard buffer, as if it were typed.
To test this, simply open a text editor and scan a barcode. If the barcode appears, you are good. You can then use a normal TextBox and make sure it has the focus.
The problem, however, is that you need a terminator. The easiest solution is to append a carriage return to every scanned value. Most handheld devices have a utility somewhere that lets you append characters to each scan. Appending '\r' (without quotes) works for most devices.
This means you don't have to add anything extra to your code. Just make sure the TextBox accepts keyboard input and starts processing when Enter is pressed.
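A minimal Windows Forms sketch of that setup, assuming the scanner utility (e.g. DataWedge) is configured to append '\r'. ScanForm and NormalizeScan are hypothetical names, and the ListBox stands in for whatever processing you actually need. The TextBox keeps the focus so the scanner's keystrokes land in it, and the KeyPress handler treats Enter as the end of a scan, which also covers manual entry of a damaged barcode.

```csharp
using System;
using System.Windows.Forms;

public class ScanForm : Form
{
    private readonly TextBox scanBox = new TextBox();
    private readonly ListBox scans = new ListBox(); // placeholder for real processing

    public ScanForm()
    {
        scanBox.Dock = DockStyle.Top;
        scans.Dock = DockStyle.Fill;
        scanBox.KeyPress += OnScanBoxKeyPress;
        Controls.Add(scans);   // added first so the Top-docked TextBox is laid out first
        Controls.Add(scanBox);
    }

    protected override void OnLoad(EventArgs e)
    {
        base.OnLoad(e);
        scanBox.Focus(); // the scanner "types" into whatever control has focus
    }

    private void OnScanBoxKeyPress(object sender, KeyPressEventArgs e)
    {
        if (e.KeyChar != '\r') return; // still receiving characters

        e.Handled = true; // keep the Enter key out of the TextBox
        string barcode = NormalizeScan(scanBox.Text);
        scanBox.Text = string.Empty;
        if (barcode != null)
            scans.Items.Add(barcode);
    }

    // Trims the raw text; returns null for an empty scan or a stray Enter.
    public static string NormalizeScan(string raw)
    {
        string trimmed = (raw ?? string.Empty).Trim();
        return trimmed.Length == 0 ? null : trimmed;
    }
}
```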
Motorola's utility is called DataWedge. Here is a link to its manual (PDF file); look at page 5 for carriage return and line feed. DataWedge Manual (old, but should still help)
The main advantage is that the user can also enter the value manually if the barcode is damaged. The disadvantage is that you lose the barcode metadata (e.g. the barcode encoding type), but you won't need that 99% of the time anyway.
I need to load subtitles for a video from external XML/text files (one per language). However, I'm trying to decide on a schema/format for these external files, and have come across two options:
SAMI: http://msdn.microsoft.com/en-us/library/ms971327.aspx
Timed Text: http://www.w3.org/AudioVideo/TT/
Right now I'm leaning towards SAMI, since Timed Text seems relatively new and is still being drafted. Also, Netflix mentioned that they were working on an implementation of it: http://blog.netflix.com/2009_06_01_archive.html
Does anyone have any suggestions/recommendations?
I have worked with SMIL and another proprietary format that goes with this Silverlight player on CodePlex. SAMI is interesting too. I have seen broader support for SMIL, but for what it sounds like you are doing, I would say either SAMI or SMIL would be fine, since Silverlight has no built-in support for any of these formats (today).
You are going to have to process and render it yourself, so I would just pick the simplest format, unless you expect to receive these files from external sources; in that case, first check which format they use.
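Since you will be processing the captions yourself, here is a rough sketch of reading SAMI. SAMI is HTML-like and often not well-formed XML, so this uses regular expressions rather than an XML parser. SamiReader and Caption are hypothetical names; the sketch ignores the language classes (such as Class=ENUSCC), so it assumes one language per file, and it turns an empty caption (typically &nbsp;) into an empty string meaning "clear the screen".

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public sealed class Caption
{
    public int StartMs { get; set; }   // SYNC Start value, in milliseconds
    public string Text { get; set; }   // plain text; "" means clear the caption
}

public static class SamiReader
{
    // A <SYNC Start=...> tag and everything up to the next SYNC, </BODY> or end of input.
    private static readonly Regex SyncBlock = new Regex(
        @"<SYNC\s+Start\s*=\s*""?(\d+)""?[^>]*>(.*?)(?=<SYNC|</BODY>|$)",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    private static readonly Regex LineBreak = new Regex(@"<br\s*/?>", RegexOptions.IgnoreCase);
    private static readonly Regex AnyTag = new Regex(@"<[^>]+>");

    public static List<Caption> Parse(string sami)
    {
        var captions = new List<Caption>();
        foreach (Match m in SyncBlock.Matches(sami))
        {
            string body = LineBreak.Replace(m.Groups[2].Value, "\n"); // keep line breaks
            string text = AnyTag.Replace(body, "").Replace("&nbsp;", " ").Trim();
            captions.Add(new Caption { StartMs = int.Parse(m.Groups[1].Value), Text = text });
        }
        return captions;
    }
}
```

Each caption stays on screen from its StartMs until the next caption's StartMs, so rendering amounts to picking the last caption whose start time is at or before the current video position.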
I am in the broadcast industry and most of the CC files that I get are SMIL based files.
I have made a project using SAMI subtitles - it works just fine.
Write to me if you need some sample code to get started.
I'm an IT student. I want to write a web browser for blind people. How can I use C# or Java to write an application that pronounces text from an XML file (text-to-speech)?
You can use the SpeechSynthesizer class from the .NET Framework:
Add a reference to System.Speech.dll
Add a using statement for System.Speech.Synthesis
Use this code:
SpeechSynthesizer synthesizer = new SpeechSynthesizer();
synthesizer.Speak("Hello world! How are you?");
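Building on those steps, a sketch that reads the text out of an XML file, assuming references to System.Speech.dll and System.Xml.Linq. XmlSpeaker and its methods are hypothetical names. It collects every non-empty text node in document order and speaks each one; choosing which elements to read, and in what order, is left to your application.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Speech.Synthesis;
using System.Xml.Linq;

public static class XmlSpeaker
{
    // Returns the trimmed, non-empty text nodes of the document in order.
    public static List<string> ExtractText(XDocument document)
    {
        return document
            .DescendantNodes()
            .OfType<XText>()                 // also matches CDATA (XCData derives from XText)
            .Select(t => t.Value.Trim())
            .Where(s => s.Length > 0)
            .ToList();
    }

    // Loads an XML file and speaks each text fragment, one after another.
    public static void SpeakFile(string path)
    {
        using (var synthesizer = new SpeechSynthesizer())
        {
            foreach (string fragment in ExtractText(XDocument.Load(path)))
                synthesizer.Speak(fragment);
        }
    }
}
```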
For .NET, have a look at the Speech APIs. There's a quick introduction to it here.
Hope that's enough to get you started.
It might be worth looking at the web accessibility guidelines; this is a good place to start: http://www.w3.org/TR/WCAG10/
This will at least show what people are doing to support accessibility, which will give you an idea of what your application should do (e.g. reading ALT and TITLE attributes).
C# can certainly be used to parse web pages, and in addition to the text-to-speech built into the .NET Framework, there are third-party libraries you can integrate with.
You can also look at existing screen reader applications for some inspiration, e.g.: http://www.freedomscientific.com/products/fs/jaws-product-page.asp
For Java, this question gives you some text-to-speech options, but writing a whole web browser involves a lot more than just text-to-speech. Unless you need something specifically cross-platform (which I'm guessing you don't, since you include .NET as an option), Windows comes with TTS accessibility options and, of course, a built-in web browser.
With .NET you can use its Speech API and interact with Internet Explorer to get the web browser functionality. That is definitely the shortest route, but it may not offer much over the built-in Windows features.
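A rough sketch of that route, assuming Windows Forms and System.Speech.dll: the WebBrowser control (which hosts Internet Explorer) navigates to a page, and once the document has fully loaded, the body's InnerText is read aloud with SpeechSynthesizer.SpeakAsync. TalkingBrowser is a hypothetical name, and reading the whole body is a crude placeholder for real screen-reader logic.

```csharp
using System.Speech.Synthesis;
using System.Windows.Forms;

public class TalkingBrowser : Form
{
    private readonly WebBrowser browser = new WebBrowser();
    private readonly SpeechSynthesizer synthesizer = new SpeechSynthesizer();

    public TalkingBrowser(string url)
    {
        browser.Dock = DockStyle.Fill;
        browser.DocumentCompleted += OnDocumentCompleted;
        Controls.Add(browser);
        browser.Navigate(url);
    }

    private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // DocumentCompleted can fire once per frame; wait until the whole page is done.
        if (browser.ReadyState != WebBrowserReadyState.Complete) return;
        if (browser.Document == null || browser.Document.Body == null) return;

        synthesizer.SpeakAsyncCancelAll(); // stop reading the previous page
        synthesizer.SpeakAsync(browser.Document.Body.InnerText ?? string.Empty);
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing) synthesizer.Dispose();
        base.Dispose(disposing);
    }
}
```

Run it from an [STAThread] Main, for example Application.Run(new TalkingBrowser("http://www.w3.org/TR/WCAG10/"));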