VoiceXML - Recognize DTMF in Recording - c#

I've been doing IVR work for a while, but we have a case where I'd love some expertise/feedback:
Is it possible to record a message where the user could press a DTMF tone to indicate a pause where we would insert our own sound? In this scenario, the user would record something like: "Good Morning, [DTMF], please call the office at [DTMF] to reconcile your account.".
Not sure whether we would chop the resulting WAV file into pieces to insert our variables, or do some post-processing before sending out our message.
Does anyone have any experience with something like this?
Thanks
Jim Stanley
Blackboard Connect

In VoiceXML you would use a record element to record a message from a user. The record element has an attribute called dtmfterm which, if set to true (the default), causes a DTMF keypress to terminate the recording. If this attribute is set to false, recording ends only when the maxtime limit is reached or when silence lasts for the duration of finalsilence, and any DTMF the caller presses simply becomes part of the recorded audio.
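To illustrate the attributes described above, here is a minimal C# sketch that generates a VoiceXML 2.0 page containing a `<record>` element with `System.Xml.Linq`, roughly what an MVC action might return. The submit URL, the 60 s / 3 s limits, and the variable names `msg` and `termchar` are placeholders; `msg$.termchar` is the standard record shadow variable holding the key that ended the recording, but check how your platform fills it in.

```csharp
using System.Xml.Linq;

// Sketch: builds a VoiceXML 2.0 page with a <record> element.
// submitUrl is a placeholder for whatever action receives the audio.
public static class RecordPage
{
    private static readonly XNamespace Vxml = "http://www.w3.org/2001/vxml";

    public static XDocument Build(string submitUrl, bool dtmfTerm)
    {
        return new XDocument(
            new XElement(Vxml + "vxml",
                new XAttribute("version", "2.0"),
                new XElement(Vxml + "form",
                    // Form-level variable that will hold the terminating key.
                    new XElement(Vxml + "var", new XAttribute("name", "termchar")),
                    new XElement(Vxml + "record",
                        new XAttribute("name", "msg"),
                        new XAttribute("beep", "true"),
                        new XAttribute("maxtime", "60s"),      // hard upper limit
                        new XAttribute("finalsilence", "3s"),  // trailing silence that ends the recording
                        // true: a DTMF key ends the recording; false: keys are recorded as audio.
                        new XAttribute("dtmfterm", dtmfTerm ? "true" : "false"),
                        new XAttribute("type", "audio/x-wav"),
                        new XElement(Vxml + "prompt", "Please record your message after the beep."),
                        new XElement(Vxml + "filled",
                            // msg$.termchar is the key that ended the recording, if any.
                            new XElement(Vxml + "assign",
                                new XAttribute("name", "termchar"),
                                new XAttribute("expr", "msg$.termchar")),
                            new XElement(Vxml + "submit",
                                new XAttribute("next", submitUrl),
                                new XAttribute("method", "post"),
                                new XAttribute("enctype", "multipart/form-data"),
                                new XAttribute("namelist", "msg termchar")))))));
    }
}
```

Posting `termchar` alongside the audio lets the server see whether the caller pressed a key or the recording simply timed out.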
I have created applications that use caller-created recordings, but never one that manipulates the recordings the way your requirements do. What you may be able to do is concatenate recordings together. Here is a Q&A that shows how to concatenate WAV recordings using C#.
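A minimal sketch of the concatenation step in C#, assuming every segment is a plain (uncompressed) PCM WAV file in the same format: sample rate, channel count, and bit depth. It copies the sample data of each segment and writes a block of digital silence between consecutive segments; the class and method names are illustrative and not taken from the linked Q&A.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;
using System.Text;

// Parsed WAV: the raw "fmt " chunk body plus the raw sample bytes.
public sealed class PcmWav
{
    public byte[] Fmt { get; set; } = Array.Empty<byte>();
    public byte[] Data { get; set; } = Array.Empty<byte>();
    public short FormatTag => BinaryPrimitives.ReadInt16LittleEndian(Fmt);
    public short Channels => BinaryPrimitives.ReadInt16LittleEndian(Fmt.AsSpan(2));
    public int SampleRate => BinaryPrimitives.ReadInt32LittleEndian(Fmt.AsSpan(4));
    public short BlockAlign => BinaryPrimitives.ReadInt16LittleEndian(Fmt.AsSpan(12));
    public short BitsPerSample => BinaryPrimitives.ReadInt16LittleEndian(Fmt.AsSpan(14));
}

public static class WavJoiner
{
    // Reads the fmt and data chunks; other chunks (LIST, fact, ...) are skipped.
    public static PcmWav Read(byte[] file)
    {
        if (file.Length < 12 || Encoding.ASCII.GetString(file, 0, 4) != "RIFF"
            || Encoding.ASCII.GetString(file, 8, 4) != "WAVE")
            throw new InvalidDataException("Not a RIFF/WAVE file.");
        var wav = new PcmWav();
        int pos = 12;
        while (pos + 8 <= file.Length)
        {
            string id = Encoding.ASCII.GetString(file, pos, 4);
            int size = Math.Min(BinaryPrimitives.ReadInt32LittleEndian(file.AsSpan(pos + 4)), file.Length - pos - 8);
            byte[] body = file.AsSpan(pos + 8, size).ToArray();
            if (id == "fmt ") wav.Fmt = body;
            else if (id == "data") wav.Data = body;
            pos += 8 + size + (size & 1); // chunks are padded to an even length
        }
        if (wav.Fmt.Length < 16 || wav.FormatTag != 1)
            throw new InvalidDataException("Missing or non-PCM fmt chunk.");
        return wav;
    }

    // Joins the files in order, inserting silenceMs of silence between segments.
    public static byte[] Join(byte[][] files, int silenceMs)
    {
        PcmWav first = Read(files[0]);
        long frames = (long)first.SampleRate * silenceMs / 1000;
        var silence = new byte[frames * first.BlockAlign];
        if (first.BitsPerSample == 8) Array.Fill(silence, (byte)0x80); // 8-bit PCM is unsigned
        using var data = new MemoryStream();
        for (int i = 0; i < files.Length; i++)
        {
            PcmWav wav = i == 0 ? first : Read(files[i]);
            if (wav.Channels != first.Channels || wav.SampleRate != first.SampleRate
                || wav.BitsPerSample != first.BitsPerSample)
                throw new InvalidDataException($"Segment {i} has a different format.");
            if (i > 0) data.Write(silence, 0, silence.Length);
            data.Write(wav.Data, 0, wav.Data.Length);
        }
        return Write(first.Fmt, data.ToArray());
    }

    // Writes a canonical RIFF/WAVE file from a fmt chunk body and sample data.
    public static byte[] Write(byte[] fmt, byte[] data)
    {
        using var ms = new MemoryStream();
        using (var w = new BinaryWriter(ms, Encoding.ASCII, leaveOpen: true))
        {
            int pad = data.Length & 1;
            w.Write(Encoding.ASCII.GetBytes("RIFF"));
            w.Write(4 + 8 + fmt.Length + 8 + data.Length + pad); // size excludes the first 8 bytes
            w.Write(Encoding.ASCII.GetBytes("WAVE"));
            w.Write(Encoding.ASCII.GetBytes("fmt "));
            w.Write(fmt.Length);
            w.Write(fmt);
            w.Write(Encoding.ASCII.GetBytes("data"));
            w.Write(data.Length);
            w.Write(data);
            if (pad == 1) w.Write((byte)0);
        }
        return ms.ToArray();
    }
}
```

Instead of generated silence, you could splice in the data of your own prompt WAV at each gap, as long as it has the same format as the caller's recordings.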
What you will have to experiment with is whether you can detect which DTMF key was pressed by using grammars. The spec alludes to this, but it may be somewhat specific to the VoiceXML IVR platform you are using. If you know which key was pressed, you can instruct the user to press * to insert silence and # to finish recording. Both keys terminate a recording, but your VoiceXML logic goes right back into recording if * was pressed and stops the recording process completely if # was pressed. Then you concatenate the recordings, inserting a WAV file of pre-recorded silence between each pair of the user's recorded snippets.
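The key-driven loop could be tracked on the server roughly as follows. This is a hypothetical sketch: it assumes the page posts each segment together with the key that ended it (for example from `msg$.termchar`), treats * as "insert a pause and keep recording", and treats anything else (# or no key at all) as "done".

```csharp
using System;
using System.Collections.Generic;

// Hypothetical server-side state for one caller's segmented recording.
public enum NextStep { RecordAgain, Finish }

public sealed class SegmentedRecording
{
    private readonly List<byte[]> _segments = new List<byte[]>();

    public IReadOnlyList<byte[]> Segments => _segments;

    // Number of gaps where your own audio (silence or a variable) goes.
    public int InsertPoints => Math.Max(0, _segments.Count - 1);

    // termChar is whatever the page posted for msg$.termchar; it may be empty
    // or "undefined" when maxtime/finalsilence ended the recording instead.
    public NextStep AddSegment(byte[] wav, string termChar)
    {
        if (wav == null) throw new ArgumentNullException(nameof(wav));
        _segments.Add(wav);
        return termChar == "*" ? NextStep.RecordAgain : NextStep.Finish;
    }
}
```

For "Good Morning, [*] please call the office at [*] to reconcile your account. [#]" this yields three segments and two insert points. On `RecordAgain` the action would return the record page again; on `Finish` it would concatenate `Segments`.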
From the tags it looks like you are using C# and MVC for your VoiceXML application. There is an open source project called VoiceModel that makes it easier to develop VoiceXML applications using ASP.NET MVC 4. You can read about how it handles recording in this environment here.

If you want to insert a pause and stay within the UI tag: in the IVR work I have done, the only DTMF key that let us stay within the UI was *. Pressing * returned the grammar result "REPEAT", and in the UI condition tag for REPEAT you would add the silence (pause) WAV file.
For the recording part, we used osdmtype = record, which mapped to an XSLT that handled the recording and recognized the customer's yes/no answer.
Nevertheless, I'm a bit confused about the exact requirement and would need more details.
Sorry, I can't add comments because I don't have enough reputation.
You can mail me, or I can add more answers here.

Related

Watson IBM Speech to Text c# api

I'm using the following example to recognize text from audio: https://gist.github.com/nfriedly/0240e862901474a9447a600e5795d500. I also need timecodes, so at line 40 I added "timestamps": true and removed "interim_results": true, since I only need final results. But now it breaks: after the { "state": "listening" } message it takes some time and then raises an exception like this:
The received message type 'Text' is invalid after calling WebSocket.CloseAsync. WebSocket.CloseAsync should only be used if no more data is expected from the remote endpoint. Use 'WebSocket.CloseOutputAsync' instead to keep being able to receive data but close the output channel.
And if I set "continuous": false, it transcribes only the first utterance (the first few words before a pause), then repeats {"state": "listening"} and freezes.
Can you help me update that example to return timecodes?
continuous: false means "only transcribe until the first pause" - so it isn't "freezing", it's just stopping when you tell it to.
The service then sends the final results followed by the second {"state": "listening"} message to indicate that it's done sending results. The example code closes the connection after that, but it sounds like you're still attempting to send audio after closing the connection.
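To illustrate the point about closing too early, here is a sketch of the receive-side decision in C#. It assumes the text messages have already been read from the socket, in order, as JSON strings, and it stops at the second `{"state": "listening"}`; only after that would you close the socket (with `CloseOutputAsync`, as the exception suggests, if you still want to drain incoming data). `SttMessages` is a hypothetical helper, not part of the gist or any SDK.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper: keeps the result messages and stops at the
// second "listening" state, which signals that the service is done.
public static class SttMessages
{
    public static List<string> CollectResults(IEnumerable<string> messages)
    {
        var results = new List<string>();
        int listening = 0;
        foreach (string message in messages)
        {
            using JsonDocument doc = JsonDocument.Parse(message);
            JsonElement root = doc.RootElement;
            if (root.ValueKind != JsonValueKind.Object) continue;
            if (root.TryGetProperty("state", out JsonElement state) && state.GetString() == "listening")
            {
                // 1st: the service is ready for audio. 2nd: it has sent everything.
                if (++listening == 2) break;
                continue;
            }
            if (root.TryGetProperty("results", out _)) results.Add(message);
        }
        return results;
    }
}
```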
I'm not certain, but I think that timestamps and interim_results will probably work the way you want once you set continuous: false.
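When `timestamps` is enabled, the best alternative of each result carries a `timestamps` array of `[word, start, end]` entries, with times in seconds. A minimal parser using `System.Text.Json` (.NET Core 3.0 or later assumed); the `WordTime` and `SttTimestamps` types are illustrative names, not part of any IBM SDK.

```csharp
using System.Collections.Generic;
using System.Text.Json;

public sealed class WordTime
{
    public WordTime(string word, double start, double end) { Word = word; Start = start; End = end; }
    public string Word { get; }
    public double Start { get; }  // seconds from the beginning of the audio
    public double End { get; }
}

public static class SttTimestamps
{
    // Collects word timings from the best (first) alternative of each final result.
    public static List<WordTime> Parse(string json)
    {
        var words = new List<WordTime>();
        using JsonDocument doc = JsonDocument.Parse(json);
        if (!doc.RootElement.TryGetProperty("results", out JsonElement results)) return words;
        foreach (JsonElement result in results.EnumerateArray())
        {
            if (result.TryGetProperty("final", out JsonElement final) && !final.GetBoolean()) continue;
            JsonElement best = result.GetProperty("alternatives")[0];
            if (!best.TryGetProperty("timestamps", out JsonElement stamps)) continue;
            foreach (JsonElement entry in stamps.EnumerateArray())
                words.Add(new WordTime(entry[0].GetString(), entry[1].GetDouble(), entry[2].GetDouble()));
        }
        return words;
    }
}
```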
Although, if you only need final results, then the HTTP interface might make more sense. It's much simpler than the WebSockets one.
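For the HTTP route, a sketch of building the request with the standard `System.Net.Http` types. The service URL and the `apikey` basic-auth user name are assumptions based on how IBM Cloud credentials are commonly issued; older instances used a separate username/password pair, so adjust this to the credentials you actually have.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;

public static class SttHttp
{
    // serviceUrl: your instance URL from the service credentials (assumption).
    public static HttpRequestMessage BuildRecognizeRequest(string serviceUrl, string apiKey, byte[] wav)
    {
        var request = new HttpRequestMessage(HttpMethod.Post,
            serviceUrl.TrimEnd('/') + "/v1/recognize?timestamps=true");
        // Basic auth with the literal user name "apikey" (assumed credential style).
        string credentials = Convert.ToBase64String(Encoding.ASCII.GetBytes("apikey:" + apiKey));
        request.Headers.Authorization = new AuthenticationHeaderValue("Basic", credentials);
        request.Content = new ByteArrayContent(wav);
        request.Content.Headers.ContentType = new MediaTypeHeaderValue("audio/wav");
        return request;
    }
}
```

Send it with `HttpClient.SendAsync`; the response body uses the same `results` structure as the WebSocket result messages.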
Finally, as I mentioned in email, the official IBM Watson .NET SDK has support for Speech to Text in the development branch right now, and should have it included in a release soon.

Writing to a microphone's output buffer

I'm wanting to create a fun little project to function as a Skype sound-board. That is to say, if you press a hotkey (say, NumPad 1), the sound-board plays a pre-determined WAV file over the call. Really only to be used for stupid in-jokes and other silliness with friends.
The way I envision handling this problem is writing to the microphone's output buffer. However, I cannot find any ideas on how to do this. I found this question regarding general audio handling, but the output examples for NAudio are rather generic and don't handle writing to a specific device.
Ideally, I want to get the default audio input device for the system (so the default microphone) and then write the WAV data to the buffer it's using for transmission.
The first problem appears to be tenable with the XNA framework and its Microphone object. It has a Default static method that should get me what I need. But the Microphone object itself doesn't have an obvious way to write to the buffer, which leaves me a little stuck.
Are there any ideas on how to do this? Am I running down the wrong path? Is the Microphone object even the correct thing to use here?

Run a script when there is no audio

I am trying to download a long tutorial from a website that contains a lot of links, and I would like to do that automatically.
I need to create a script that listens to the audio; if the program does not hear anything for 5 seconds, it should click the Next button (I know how to simulate a click).
I have never worked with audio. Could you please advise me on an API or function that would listen to the sound and return a value (true/false or similar) when it does not hear anything?
Many thanks
For recording, see: http://msdn.microsoft.com/en-us/library/ff827802.aspx
Then you need to know the exact WaveFormat of the recorded sound. If you know it (e.g. 16-bit PCM mono), you can iterate through the samples and check whether each one is within a specific range. If all of the samples are, for example, smaller than 0.1 in magnitude (on a normalized -1.0 to 1.0 scale), it is silence, and you click.
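A minimal C# sketch of that check, assuming 16-bit little-endian mono PCM buffers delivered by whatever capture API you use. It counts consecutive samples below the threshold and reports true once that run covers the requested time (5 seconds in the question); the class name and the 0.1 default are illustrative.

```csharp
using System;

// Tracks how long the input has stayed below a normalized amplitude threshold.
public sealed class SilenceDetector
{
    private readonly float _threshold;     // normalized amplitude, 0.0 to 1.0
    private readonly long _samplesNeeded;  // quiet samples required to report silence
    private long _quietSamples;

    public SilenceDetector(int sampleRate, double seconds, float threshold = 0.1f)
    {
        _threshold = threshold;
        _samplesNeeded = (long)(sampleRate * seconds);
    }

    // Feed each captured buffer; returns true once the trailing quiet run is long enough.
    public bool Process(byte[] buffer, int bytesRecorded)
    {
        for (int i = 0; i + 1 < bytesRecorded; i += 2)
        {
            short sample = (short)(buffer[i] | (buffer[i + 1] << 8)); // little-endian 16-bit
            float normalized = Math.Abs(sample / 32768f);
            if (normalized < _threshold) _quietSamples++;
            else _quietSamples = 0; // any loud sample restarts the count
        }
        return _quietSamples >= _samplesNeeded;
    }

    public void Reset() => _quietSamples = 0;
}
```

After clicking, call `Reset()` so the next page gets a fresh 5-second window.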
You want to download the tutorial and further information from a website? Why write a script for that by hand? Take a look at existing tools, e.g. http://pagenest.com/
I don't know if that fits your requirements, but there are quite a few tools for downloading website content.
BR

How to integrate the scan barcode option on my WinCE application?

I'm new to the CE environment. I'm creating an application for a mobile computer with a barcode scanner that runs Windows CE 5.0 (Motorola MC3000).
I'm using VS 2008 and I'm programming with C#.
I made a little demo project that runs successfully on the device. My application has a scan task, so it needs to use the barcode reader: access the scan hardware, make it run, read the returned result, and display it in a TextBox.
The problem is that I don't know how to integrate the scan part into my application.
Any help on this?
Well, the first step would be to look in the documentation for the Motorola MC3000. I don't know if there is documentation specifically for the MC3000, but I know that the Motorola EMDK for .NET provides quite a few sample C# VS2008 projects, and a couple of them are for barcode reading.
Kobunite has given you the first step. So go to the download page of the Motorola EMDK and look at the examples. After that, reference Symbol.dll and Symbol.Barcode.dll in your project (Copy Local = true). Then you can begin writing your "barcode class" with an event handler for the scan event. When a barcode is scanned via the hardware trigger, the event fires, and you can put the barcode string into your focused TextBox or do something else with it (e.g. filter a DataGrid). Hope this helps.
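A sketch of how such a "barcode class" could be structured, written in older-C# style to suit a VS2008 project. The `IBarcodeSource` interface is hypothetical: on the device you would implement it on top of the EMDK's Symbol.Barcode reader (those calls are not shown), and `ManualBarcodeSource` lets you exercise the handler without hardware. If your scanner raises its event on a background thread, marshal the callback with `Control.Invoke` before touching the TextBox.

```csharp
using System;

// Event data carrying the decoded barcode text.
public sealed class BarcodeEventArgs : EventArgs
{
    public BarcodeEventArgs(string data) { Data = data; }
    public string Data { get; private set; }
}

// Hypothetical abstraction over the scanner hardware (wrap Symbol.Barcode here).
public interface IBarcodeSource
{
    event EventHandler<BarcodeEventArgs> Scanned;
}

// Stand-in source for testing the UI logic without a scanner.
public sealed class ManualBarcodeSource : IBarcodeSource
{
    public event EventHandler<BarcodeEventArgs> Scanned;

    public void Simulate(string data)
    {
        EventHandler<BarcodeEventArgs> handler = Scanned;
        if (handler != null) handler(this, new BarcodeEventArgs(data));
    }
}

// Subscribes to scans and hands clean text to the UI callback,
// e.g. one that fills the focused TextBox or filters a grid.
public sealed class BarcodeHandler : IDisposable
{
    private readonly IBarcodeSource _source;
    private readonly Action<string> _onBarcode;

    public BarcodeHandler(IBarcodeSource source, Action<string> onBarcode)
    {
        _source = source;
        _onBarcode = onBarcode;
        _source.Scanned += OnScanned;
    }

    private void OnScanned(object sender, BarcodeEventArgs e)
    {
        if (e.Data != null && e.Data.Trim().Length > 0) _onBarcode(e.Data.Trim());
    }

    public void Dispose()
    {
        _source.Scanned -= OnScanned; // stop listening, e.g. when the form closes
    }
}
```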
Just to simply scan a barcode there is a much easier solution.
By default the barcode scanner should also output into the keyboard cache.
To test it simply open a text editor and scan a barcode. If the barcode appears then you are good. You can simply use a normal textbox and make sure focus is on it.
The problem, however, is that you need a terminator. The easiest solution is to append a carriage return to every scanned value. Most handheld devices have a utility somewhere that lets you append characters to each scan. Appending '\r' (without quotes) works for most devices.
This means you don't have to do anything extra in your code. Just make sure the TextBox supports keyboard input and starts processing when Enter is pressed.
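The Enter-terminated input can be handled with a small buffer like the one below, a hypothetical helper written in older-C# style. You would call `Feed` from the TextBox's `KeyPress` handler with `e.KeyChar` and set `e.Handled = true` when it returns true. It accepts CR or LF as the terminator and honors backspace, so manual entry of a damaged barcode still works.

```csharp
using System;
using System.Text;

// Collects wedge keystrokes and reports a complete barcode on CR or LF.
public sealed class WedgeInput
{
    private readonly StringBuilder _buffer = new StringBuilder();
    private readonly Action<string> _onComplete;

    public WedgeInput(Action<string> onComplete) { _onComplete = onComplete; }

    // Returns true when the character was a terminator, so the caller can
    // suppress the default handling of that key.
    public bool Feed(char c)
    {
        if (c == '\r' || c == '\n')
        {
            // Empty-buffer guard keeps a CR+LF pair from firing twice.
            if (_buffer.Length > 0) { _onComplete(_buffer.ToString()); _buffer.Length = 0; }
            return true;
        }
        if (c == '\b')
        {
            if (_buffer.Length > 0) _buffer.Length--; // manual correction
            return false;
        }
        _buffer.Append(c);
        return false;
    }
}
```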
Motorola uses a utility called DataWedge. Here is a link to its manual (PDF file). Look at page 5 for carriage return and line feed. DataWedge Manual (old but should still help)
The main advantage is that it also allows the user to enter data manually in case the barcode is damaged. The disadvantage is that you lose the barcode metadata (e.g. the barcode encoding type). But this is not required 99% of the time anyway.

c# DirectShow graphbuilder output filename issue

I'm new to using DirectShow. I'm more than willing to post the pages of code I've written, but I'm hoping someone could explain the problem or point me in the right direction so I can figure it out myself.
Basically I have a WPF program that displays a window with a preview of my webcam; this is done and working. Now I'm trying to get it to record the preview, which I do using graphBuilder.SetOutputFileName.
However, every time I show the window to record another session, it just overwrites the last file it recorded, even though I'm calling graphBuilder.SetOutputFileName again!
So my question is: how can I change the output file name to record a second video? I know I'm missing something but don't know what.
Thanks in advance.
Rich
Filter graphs normally create media files from scratch on your initial Run and close the file on Stop. The next time you repeat the calls, you start fresh with an empty (overwritten) file; there is no appending. If you want to keep the previously recorded content, you need to switch files by providing a new name, or copy/rename the completed file.
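One way to follow the "provide a new name" advice is to generate a fresh path for each session before the graph runs. A minimal sketch; the folder, prefix, and extension are placeholders, and where exactly you pass the path (for example to `SetOutputFileName` when you rebuild the graph for the next session) depends on your existing code.

```csharp
using System;
using System.IO;

public static class CaptureFiles
{
    // Returns a path that does not exist yet, e.g. <folder>\capture_20240101_120000.avi.
    public static string NextPath(string folder, string prefix, string extension)
    {
        string stamp = DateTime.Now.ToString("yyyyMMdd_HHmmss");
        string path = Path.Combine(folder, prefix + "_" + stamp + extension);
        // Two sessions started within the same second get a numeric suffix.
        for (int n = 1; File.Exists(path); n++)
            path = Path.Combine(folder, $"{prefix}_{stamp}_{n}{extension}");
        return path;
    }
}
```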
