AWS - Amazon Polly Text To Speech - c#

I have a question about the "text-to-speech" Amazon Polly service.
I've integrated this service into my chatbot so that it reads aloud what the bot writes to the user in chat.
It works pretty well, but I don't know whether it is possible to stop the voice early, before she (I chose a female voice) finishes speaking. Sometimes I need to move the conversation along and don't want to listen to the end of the sentence.
This is the code used for the integration:
// HTML side
function textToSpeech(text) {
    $.ajax({
        type: 'GET',
        url: '/Chat/TextToSpeech?text=' + text,
        cache: false,
        success: function (result) {
            var audio = document.getElementById('botvoice');
            $("#botvoice").attr("src", "/Audios/" + result);
            audio.load();
            audio.play();
        }
    });
}
Controller side:
public ActionResult TextToSpeech(string text)
{
    string filename = "";
    try
    {
        AWSCredentials credentials = new StoredProfileAWSCredentials("my_credential");
        AmazonPollyClient client = new AmazonPollyClient(credentials, Amazon.RegionEndpoint.EUWest1);

        // Create describe voices request.
        DescribeVoicesRequest describeVoicesRequest = new DescribeVoicesRequest();

        // Synchronously ask Amazon Polly to describe the available TTS voices.
        DescribeVoicesResponse describeVoicesResult = client.DescribeVoices(describeVoicesRequest);
        List<Voice> voices = describeVoicesResult.Voices;

        // Create the speech synthesis request.
        SynthesizeSpeechRequest synthesizeSpeechPresignRequest = new SynthesizeSpeechRequest();

        // Text to synthesize.
        synthesizeSpeechPresignRequest.Text = text;

        // Select a voice for synthesis.
        synthesizeSpeechPresignRequest.VoiceId = voices[18].Id;

        // Set the output format to MP3.
        synthesizeSpeechPresignRequest.OutputFormat = OutputFormat.Mp3;

        // Synthesize the speech and save the audio stream to an MP3 file under ~/Audios.
        string current_dir = AppDomain.CurrentDomain.BaseDirectory;
        filename = CalculateMD5Hash(text) + ".mp3";
        var path_audio = current_dir + @"\Audios\" + filename;
        var presignedSynthesizeSpeechUrl = client.SynthesizeSpeechAsync(synthesizeSpeechPresignRequest).GetAwaiter().GetResult();
        FileStream wFile = new FileStream(path_audio, FileMode.Create);
        presignedSynthesizeSpeechUrl.AudioStream.CopyTo(wFile);
        wFile.Close();
    }
    catch (Exception ex)
    {
        filename = ex.ToString();
    }
    return Json(filename, JsonRequestBehavior.AllowGet);
}
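The controller calls a CalculateMD5Hash helper that is not shown in the post. A minimal sketch of what it might look like, assuming it only needs to turn the text into a stable, filesystem-safe file name (the exact implementation is an assumption, not part of the original code):

// Hypothetical helper: hash the text so the same sentence always maps to the same MP3 file name.
private string CalculateMD5Hash(string input)
{
    using (var md5 = System.Security.Cryptography.MD5.Create())
    {
        byte[] hash = md5.ComputeHash(System.Text.Encoding.UTF8.GetBytes(input));
        var sb = new System.Text.StringBuilder();
        foreach (byte b in hash)
            sb.Append(b.ToString("x2")); // lowercase hex, e.g. "9e107d9d"
        return sb.ToString();
    }
}

A side benefit of naming the file after the hash is that the action could check File.Exists(path_audio) before calling Polly and skip re-synthesizing text the bot has already spoken.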
My chat (obviously) has a text input for writing a question and sending it to the bot by pressing ENTER. I tried putting audio.src = "" in that handler: the voice stops talking, but the chat remains blocked... It seems to wait for the end of the audio stream, and I have to refresh the page to see new messages and responses.
Is there any Amazon function that I can call with a particular parameter set, in order to notify the service that I want to stop and clear the audio stream?

Amazon Polly returns a .mp3 file. It is not responsible for playing the audio file.
Any difficulties you are experiencing playing/stopping the audio would be the result of the code you are using to play an MP3 audio file. It has nothing to do with the Amazon Polly service itself.

Thank you! I found the real problem: when I stopped the audio, I never printed the rest of the messages. I added a call to the function that prints the messages in the chat. To stop the voice I used audio.src = "";

Related

Firebase Cross-platform issue - .setAsync() is setting in 2 path destinations instead of just one

I am using FireSharp for my C# WinForms app. The database is also connected to my ReactJS website, which deals with the same information.
I have noticed that when I make .SetAsync calls from the website and then log in to my WinForms app, the WinForms app automatically repeats the last action I performed on the website against the database: a .SetAsync() call that adds some user information to a list of other users' information. Now it will not stop; any time I log on to my C# app, it runs again.
It makes me think there is a queue of pending operations in FireSharp?
Here is my React code. From what I can tell, it is nothing out of the ordinary:
async function handleSubmit(e) {
    e.preventDefault()
    var date = moment().format("MM/DD/YYYY")
    setError("")
    setLoading(true)
    // grab user info first then use that.
    await firebaseApp.database().ref("Users/" + currentUser.uid + "/UserData").on('value', snapshot => {
        if (snapshot.val() != null) {
            setContactObjects({
                ...snapshot.val()
            })
            firebaseApp.database().ref("Projects/" + projectGUIDRef.current.value + "/queueList/" + userIdRef.current.value).set({
                "EntryTimeStamp": date + " " + moment().format("hh:mm:ss a"),
                "IsSyncing": false,
                "UserId": userIdRef.current.value,
                "UserName": usernameRef.current.value,
            })
        }
    })
    history.push("/Demo")
    setLoading(false)
}
Here is the C# WinForms code that is executing. For some reason, when it runs, it also updates the EntryTimeStamp field written by the React code and re-sets all of the information, even if I delete it. It also happens if I run .Set().
updateLastLogin2(authLink);

private async void updateLastLogin2(FirebaseAuthLink authLink)
{
    IFirebaseConfig config = new FireSharp.Config.FirebaseConfig
    {
        AuthSecret = this.authLink.FirebaseToken,
        BasePath = Command.dbURL,
    };
    IFirebaseClient client = new FireSharp.FirebaseClient(config);
    string newDateTime = DateTime.Now.ToString();
    if (authLink.User.DisplayName.Contains(adUserId) && authLink.User.DisplayName.Contains(adUserId))
    {
        await client.SetAsync("Users/" + this.authLink.User.LocalId + "/UserData/DateLastLogin", newDateTime);
    }
}
Any and all help is appreciated, I've been at this for a day and a half now.
I have never used FireSharp, but this is my guess.
You are calling await firebaseApp.database().ref("Users/" + currentUser.uid + "/UserData").on('value', ...) in your React code, and then in your C# you are calling client.SetAsync("Users/" + this.authLink.User.LocalId + ...).
What happens is that the two ends are listening to and writing the same data, and they keep triggering each other, causing a loop.
In that case it's probably better to use once instead of on if you are only reading the value once:
firebaseApp.database().ref("Users/" + currentUser.uid + "/UserData").once('value')
In cases where you cannot use .once, you should use .off to turn off the listener once you are done.
You also shouldn't be using await here, since ref().on() creates a listener; it doesn't return a promise.
You should also move history.push("/Demo") into your Firebase database callback so that it's called after you have set the data.

How to return a simple timestamp output in Azure?

I'm trying to trim down the data I'm getting from the Azure speech-to-text model I'm using. The line where the output format is specified (speechConfig.OutputFormat) has been changed to "simple", but I still get detailed output. The code I'm using is:
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
namespace NEST
{
class Program
{
static void Main(string[] args)
{
var key = "";
var region = "";
var audioFilePath = #"C:/Users/MichaelSchwartz/source/repos/AI-102-Process-Speech-master/transcribe_speech_to_text/media/narration.wav";
var speechConfig = SpeechConfig.FromSubscription(key, region);
// Generates timestamps
speechConfig.RequestWordLevelTimestamps();
speechConfig.OutputFormat = OutputFormat.Simple;
var stopRecognition = new TaskCompletionSource<int>();
// Calls the audio file
var audioConfig = AudioConfig.FromWavFileInput(audioFilePath);
var recognizer = new SpeechRecognizer(speechConfig, audioConfig);
//Display Recognized
recognizer.Recognized += (s, e) =>
{
if (e.Result.Reason == ResultReason.RecognizedSpeech)
{
Console.WriteLine($"RECOGNIZED :{e.Result.Properties.GetProperty(PropertyId.SpeechServiceResponse_JsonResult)}");
}
else if (e.Result.Reason == ResultReason.NoMatch)
{
Console.WriteLine($"NOMATCH: Speech could not be recognized.");
}
};
recognizer.Canceled += (s, e) =>
{
Console.WriteLine($"CANCELED: Reason={e.Reason}");
if (e.Reason == CancellationReason.Error)
{
Console.WriteLine($"CANCELED: ErrorCode={e.ErrorCode}");
Console.WriteLine($"CANCELED: ErrorDetails={e.ErrorDetails}");
Console.WriteLine($"CANCELED: Did you update the subscription info?");
}
stopRecognition.TrySetResult(0);
};
recognizer.SessionStopped += (s, e) =>
{
Console.WriteLine("\n Session stopped event.");
stopRecognition.TrySetResult(0);
};
recognizer.StartContinuousRecognitionAsync().GetAwaiter().GetResult();
// Waits for completion. Use Task.WaitAny to keep the task rooted.
Task.WaitAny(new[] { stopRecognition.Task });
do
{
Console.WriteLine("Press Enter to stop");
} while (Console.ReadKey().Key != ConsoleKey.Enter);
// Stops recognition.
recognizer.StopContinuousRecognitionAsync().ConfigureAwait(false);
}
}
}
What units of time are being returned? Offset: 173800000? The model runs for a few seconds, not hours. What is meant by "offset" and "duration"?
Is there a way to timestamp at the message level and not word level? Or at least a way to focus on a subset of word level data that indicates when each utterance begins? I'm transcribing longer utterances and it's hours of audio.
Output is:
RECOGNIZED :{"DisplayText":"The speech Translation API transcribes audio streams into text. Your application can display this text to the user or act upon it as command input. You can use this API either with an SDK client library, or a rest a rest API.","Duration":163400000,"Id":"02d2042cadec4ae9bf324c91949620e0","NBest":[{"Confidence":0.85213876,"Display":"The speech Translation API transcribes audio streams into text. Your application can display this text to the user or act upon it as command input. You can use this API either with an SDK client library, or a rest a rest API.","ITN":"the speech translation API transcribes audio streams into text your application can display this text to the user or act upon it as command input you can use this API either with an SDK client library or a rest a rest API","Lexical":"the speech translation API transcribes audio streams into text your application can display this text to the user or act upon it as command input you can use this API either with an SDK client library or a rest a rest API","MaskedITN":"the speech translation api transcribes audio streams into text your application can display this text to the user or act upon it as command input you can use this api either with an sdk client library or a rest a rest api","Words":[{"Duration":1700000,"Offset":21800000,"Word":"the"},{"Duration":4300000,"Offset":23600000,"Word":"speech"},{"Duration":7300000,"Offset":28000000,"Word":"translation"},{"Duration":5600000,"Offset":35400000,"Word":"API"},{"Duration":6600000,"Offset":41300000,"Word":"transcribes"},{"Duration":3400000,"Offset":48000000,"Word":"audio"},{"Duration":4000000,"Offset":51500000,"Word":"streams"},{"Duration":1900000,"Offset":55600000,"Word":"into"},{"Duration":5900000,"Offset":57600000,"Word":"text"},{"Duration":2300000,"Offset":66200000,"Word":"your"},{"Duration":7100000,"Offset":68600000,"Word":"application"},{"Duration":1500000,"Offset":75800000,"Word":"can"},{"Duration":3700000,"Offset":77400000,"Word":"display"},{"Duration":2100000,"Offset":81200000,"Word":"this"},{"Duration":3200000,"Offset":83400000,"Word":"text"},{"Duration":800000,"Offset":86700000,"Word":"to"},{"Duration":1100000,"Offset":87600000,"Word":"the"},{"Duration":4900000,"Offset":88800000,"Word":"user"},{"Duration":2700000,"Offset":94000000,"Word":"or"},{"Duration":1700000,"Offset":96800000,"Word":"act"},{"Duration":2300000,"Offset":98600000,"Word":"upon"},{"Duration":900000,"Offset":101000000,"Word":"it"},{"Duration":1300000,"Offset":102000000,"Word":"as"},{"Duration":3700000,"Offset":103400000,"Word":"command"},{"Duration":5800000,"Offset":107200000,"Word":"input"},{"Duration":2000000,"Offset":116900000,"Word":"you"},{"Duration":1700000,"Offset":119000000,"Word":"can"},{"Duration":2300000,"Offset":120800000,"Word":"use"},{"Duration":2100000,"Offset":123200000,"Word":"this"},{"Duration":6300000,"Offset":125400000,"Word":"API"},{"Duration":2500000,"Offset":131800000,"Word":"either"},{"Duration":1500000,"Offset":134400000,"Word":"with"},{"Duration":1100000,"Offset":136000000,"Word":"an"},{"Duration":6300000,"Offset":137200000,"Word":"SDK"},{"Duration":4100000,"Offset":143600000,"Word":"client"},{"Duration":7900000,"Offset":147800000,"Word":"library"},{"Duration":6200000,"Offset":158100000,"Word":"or"},{"Duration":2000000,"Offset":164900000,"Word":"a"},{"Duration":4700000,"Offset":167000000,"Word":"rest"},{"Duration":1700000,"Offset":172000000,"Word":"a"},{"Duration":4300000,"Offset":173800000,"Word":"rest"},{"Duration":7000000,"Offset":178200000,"Word":"API"}]}
,{"Confidence":0.85089046,"Display":"the speech translation API transcribes audio streams into text your application can display this text to the user or act upon it is command input you can use this API either with an SDK client library or a rest a rest API","ITN":"the speech translation api transcribes audio streams into text your application can display this text to the user or act upon it is command input you can use this api either with an sdk client library or a rest a rest api","Lexical":"the speech translation api transcribes audio streams into text your application can display this text to the user or act upon it is command input you can use this api either with an sdk client library or a rest a rest api","MaskedITN":"the speech translation api transcribes audio streams into text your application can display this text to the user or act upon it is command input you can use this api either with an sdk client library or a rest a rest api","Words":[{"Duration":1700000,"Offset":21800000,"Word":"the"},{"Duration":4300000,"Offset":23600000,"Word":"speech"},{"Duration":7300000,"Offset":28000000,"Word":"translation"},{"Duration":5600000,"Offset":35400000,"Word":"API"},{"Duration":6600000,"Offset":41300000,"Word":"transcribes"},{"Duration":3400000,"Offset":48000000,"Word":"audio"},{"Duration":4000000,"Offset":51500000,"Word":"streams"},{"Duration":1900000,"Offset":55600000,"Word":"into"},{"Duration":5900000,"Offset":57600000,"Word":"text"},{"Duration":2300000,"Offset":66200000,"Word":"your"},{"Duration":7100000,"Offset":68600000,"Word":"application"},{"Duration":1500000,"Offset":75800000,"Word":"can"},{"Duration":3700000,"Offset":77400000,"Word":"display"},{"Duration":2100000,"Offset":81200000,"Word":"this"},{"Duration":3200000,"Offset":83400000,"Word":"text"},{"Duration":800000,"Offset":86700000,"Word":"to"},{"Duration":1100000,"Offset":87600000,"Word":"the"},{"Duration":4900000,"Offset":88800000,"Word":"user"},{"Duration":2700000,"Offset":94000000,"Word":"or"},{"Duration":1700000,"Offset":96800000,"Word":"act"},{"Duration":2300000,"Offset":98600000,"Word":"upon"},{"Duration":900000,"Offset":101000000,"Word":"it"},{"Duration":1300000,"Offset":102000000,"Word":"is"},{"Duration":3700000,"Offset":103400000,"Word":"command"},{"Duration":5800000,"Offset":107200000,"Word":"input"},{"Duration":2000000,"Offset":116900000,"Word":"you"},{"Duration":1700000,"Offset":119000000,"Word":"can"},{"Duration":2300000,"Offset":120800000,"Word":"use"},{"Duration":2100000,"Offset":123200000,"Word":"this"},{"Duration":6300000,"Offset":125400000,"Word":"API"},{"Duration":2500000,"Offset":131800000,"Word":"either"},{"Duration":1500000,"Offset":134400000,"Word":"with"},{"Duration":1100000,"Offset":136000000,"Word":"an"},{"Duration":6300000,"Offset":137200000,"Word":"SDK"},{"Duration":4100000,"Offset":143600000,"Word":"client"},{"Duration":7900000,"Offset":147800000,"Word":"library"},{"Duration":6200000,"Offset":158100000,"Word":"or"},{"Duration":2000000,"Offset":164900000,"Word":"a"},{"Duration":4700000,"Offset":167000000,"Word":"rest"},{"Duration":1700000,"Offset":172000000,"Word":"a"},{"Duration":4300000,"Offset":173800000,"Word":"rest"},{"Duration":7000000,"Offset":178200000,"Word":"API"}]},{"Confidence":0.8548482,"Display":"the speech translation API transcribes audio streams and a text your application can display this text to the user or act upon it as command input you can use this API either with an SDK client library or a rest a rest API","ITN":"the speech translation api transcribes audio streams and a 
text your application can display this text to the user or act upon it as command input you can use this api either with an sdk client library or a rest a rest api","Lexical":"the speech translation api transcribes audio streams and a text your application can display this text to the user or act upon it as command input you can use this api either with an sdk client library or a rest a rest api","MaskedITN":"the speech translation api transcribes audio streams and a text your application can display this text to the user or act upon it as command input you can use this api either with an sdk client library or a rest a rest api","Words":[{"Duration":1700000,"Offset":21800000,"Word":"the"},{"Duration":4300000,"Offset":23600000,"Word":"speech"},{"Duration":7300000,"Offset":28000000,"Word":"translation"},{"Duration":5600000,"Offset":35400000,"Word":"API"},{"Duration":6600000,"Offset":41300000,"Word":"transcribes"},{"Duration":3400000,"Offset":48000000,"Word":"audio"},{"Duration":3900000,"Offset":51500000,"Word":"streams"},{"Duration":1200000,"Offset":55500000,"Word":"and"},{"Duration":700000,"Offset":56800000,"Word":"a"},{"Duration":5900000,"Offset":57600000,"Word":"text"},{"Duration":2300000,"Offset":66200000,"Word":"your"},{"Duration":7100000,"Offset":68600000,"Word":"application"},{"Duration":1500000,"Offset":75800000,"Word":"can"},{"Duration":3700000,"Offset":77400000,"Word":"display"},{"Duration":2100000,"Offset":81200000,"Word":"this"},{"Duration":3200000,"Offset":83400000,"Word":"text"},{"Duration":800000,"Offset":86700000,"Word":"to"},{"Duration":1100000,"Offset":87600000,"Word":"the"},{"Duration":4900000,"Offset":88800000,"Word":"user"},{"Duration":2700000,"Offset":94000000,"Word":"or"},{"Duration":1700000,"Offset":96800000,"Word":"act"},{"Duration":2300000,"Offset":98600000,"Word":"upon"},{"Duration":900000,"Offset":101000000,"Word":"it"},{"Duration":1300000,"Offset":102000000,"Word":"as"},{"Duration":3700000,"Offset":103400000,"Word":"command"},{"Duration":5800000,"Offset":107200000,"Word":"input"},{"Duration":2000000,"Offset":116900000,"Word":"you"},{"Duration":1700000,"Offset":119000000,"Word":"can"},{"Duration":2300000,"Offset":120800000,"Word":"use"},{"Duration":2100000,"Offset":123200000,"Word":"this"},{"Duration":6300000,"Offset":125400000,"Word":"API"},{"Duration":2500000,"Offset":131800000,"Word":"either"},{"Duration":1500000,"Offset":134400000,"Word":"with"},{"Duration":1100000,"Offset":136000000,"Word":"an"},{"Duration":6300000,"Offset":137200000,"Word":"SDK"},{"Duration":4100000,"Offset":143600000,"Word":"client"},{"Duration":7900000,"Offset":147800000,"Word":"library"},{"Duration":6200000,"Offset":158100000,"Word":"or"},{"Duration":2000000,"Offset":164900000,"Word":"a"},{"Duration":4700000,"Offset":167000000,"Word":"rest"},{"Duration":1700000,"Offset":172000000,"Word":"a"},{"Duration":4300000,"Offset":173800000,"Word":"rest"},{"Duration":7000000,"Offset":178200000,"Word":"API"}]},{"Confidence":0.8535998,"Display":"the speech translation API transcribes audio streams and a text your application can display this text to the user or act upon it is command input you can use this API either with an SDK client library or a rest a rest API","ITN":"the speech translation api transcribes audio streams and a text your application can display this text to the user or act upon it is command input you can use this api either with an sdk client library or a rest a rest api","Lexical":"the speech translation api transcribes audio streams and a text your application can display this 
text to the user or act upon it is command input you can use this api either with an sdk client library or a rest a rest api","MaskedITN":"the speech translation api transcribes audio streams and a text your application can display this text to the user or act upon it is command input you can use this api either with an sdk client library or a rest a rest api","Words":[{"Duration":1700000,"Offset":21800000,"Word":"the"},{"Duration":4300000,"Offset":23600000,"Word":"speech"},{"Duration":7300000,"Offset":28000000,"Word":"translation"},{"Duration":5600000,"Offset":35400000,"Word":"API"},{"Duration":6600000,"Offset":41300000,"Word":"transcribes"},{"Duration":3400000,"Offset":48000000,"Word":"audio"},{"Duration":3900000,"Offset":51500000,"Word":"streams"},{"Duration":1200000,"Offset":55500000,"Word":"and"},{"Duration":700000,"Offset":56800000,"Word":"a"},{"Duration":5900000,"Offset":57600000,"Word":"text"},{"Duration":2300000,"Offset":66200000,"Word":"your"},{"Duration":7100000,"Offset":68600000,"Word":"application"},{"Duration":1500000,"Offset":75800000,"Word":"can"},{"Duration":3700000,"Offset":77400000,"Word":"display"},{"Duration":2100000,"Offset":81200000,"Word":"this"},{"Duration":3200000,"Offset":83400000,"Word":"text"},{"Duration":800000,"Offset":86700000,"Word":"to"},{"Duration":1100000,"Offset":87600000,"Word":"the"},{"Duration":4900000,"Offset":88800000,"Word":"user"},{"Duration":2700000,"Offset":94000000,"Word":"or"},{"Duration":1700000,"Offset":96800000,"Word":"act"},{"Duration":2300000,"Offset":98600000,"Word":"upon"},{"Duration":900000,"Offset":101000000,"Word":"it"},{"Duration":1300000,"Offset":102000000,"Word":"is"},{"Duration":3700000,"Offset":103400000,"Word":"command"},{"Duration":5800000,"Offset":107200000,"Word":"input"},{"Duration":2000000,"Offset":116900000,"Word":"you"},{"Duration":1700000,"Offset":119000000,"Word":"can"},{"Duration":2300000,"Offset":120800000,"Word":"use"},{"Duration":2100000,"Offset":123200000,"Word":"this"},{"Duration":6300000,"Offset":125400000,"Word":"API"},{"Duration":2500000,"Offset":131800000,"Word":"either"},{"Duration":1500000,"Offset":134400000,"Word":"with"},{"Duration":1100000,"Offset":136000000,"Word":"an"},{"Duration":6300000,"Offset":137200000,"Word":"SDK"},{"Duration":4100000,"Offset":143600000,"Word":"client"},{"Duration":7900000,"Offset":147800000,"Word":"library"},{"Duration":6200000,"Offset":158100000,"Word":"or"},{"Duration":2000000,"Offset":164900000,"Word":"a"},{"Duration":4700000,"Offset":167000000,"Word":"rest"},{"Duration":1700000,"Offset":172000000,"Word":"a"},{"Duration":4300000,"Offset":173800000,"Word":"rest"},{"Duration":7000000,"Offset":178200000,"Word":"API"}]},{"Confidence":0.8474758,"Display":"the speech translation API transcribes audio streams into text your application can display this text to the user or act upon it as command input you can use this API either within SDK client library or a rest a rest API","ITN":"the speech translation api transcribes audio streams into text your application can display this text to the user or act upon it as command input you can use this api either within sdk client library or a rest a rest api","Lexical":"the speech translation api transcribes audio streams into text your application can display this text to the user or act upon it as command input you can use this api either within sdk client library or a rest a rest api","MaskedITN":"the speech translation api transcribes audio streams into text your application can display this text to the user or act upon it as command 
input you can use this api either within sdk client library or a rest a rest api","Words":[{"Duration":1700000,"Offset":21800000,"Word":"the"},{"Duration":4300000,"Offset":23600000,"Word":"speech"},{"Duration":7300000,"Offset":28000000,"Word":"translation"},{"Duration":5600000,"Offset":35400000,"Word":"API"},{"Duration":6600000,"Offset":41300000,"Word":"transcribes"},{"Duration":3400000,"Offset":48000000,"Word":"audio"},{"Duration":4000000,"Offset":51500000,"Word":"streams"},{"Duration":1900000,"Offset":55600000,"Word":"into"},{"Duration":5900000,"Offset":57600000,"Word":"text"},{"Duration":2300000,"Offset":66200000,"Word":"your"},{"Duration":7100000,"Offset":68600000,"Word":"application"},{"Duration":1500000,"Offset":75800000,"Word":"can"},{"Duration":3700000,"Offset":77400000,"Word":"display"},{"Duration":2100000,"Offset":81200000,"Word":"this"},{"Duration":3200000,"Offset":83400000,"Word":"text"},{"Duration":800000,"Offset":86700000,"Word":"to"},{"Duration":1100000,"Offset":87600000,"Word":"the"},{"Duration":4900000,"Offset":88800000,"Word":"user"},{"Duration":2700000,"Offset":94000000,"Word":"or"},{"Duration":1700000,"Offset":96800000,"Word":"act"},{"Duration":2300000,"Offset":98600000,"Word":"upon"},{"Duration":900000,"Offset":101000000,"Word":"it"},{"Duration":1300000,"Offset":102000000,"Word":"as"},{"Duration":3700000,"Offset":103400000,"Word":"command"},{"Duration":5800000,"Offset":107200000,"Word":"input"},{"Duration":2000000,"Offset":116900000,"Word":"you"},{"Duration":1700000,"Offset":119000000,"Word":"can"},{"Duration":2300000,"Offset":120800000,"Word":"use"},{"Duration":2100000,"Offset":123200000,"Word":"this"},{"Duration":6300000,"Offset":125400000,"Word":"API"},{"Duration":2500000,"Offset":131800000,"Word":"either"},{"Duration":2700000,"Offset":134400000,"Word":"within"},{"Duration":6300000,"Offset":137200000,"Word":"SDK"},{"Duration":4100000,"Offset":143600000,"Word":"client"},{"Duration":7900000,"Offset":147800000,"Word":"library"},{"Duration":6200000,"Offset":158100000,"Word":"or"},{"Duration":2000000,"Offset":164900000,"Word":"a"},{"Duration":4700000,"Offset":167000000,"Word":"rest"},{"Duration":1700000,"Offset":172000000,"Word":"a"},{"Duration":4300000,"Offset":173800000,"Word":"rest"},{"Duration":7000000,"Offset":178200000,"Word":"API"}]}],"Offset":21800000,"RecognitionStatus":"Success"}
CANCELED: Reason=EndOfStream
Press Enter to stop
Session stopped event.
Also, why does the output repeat the results 5 times? I'm looking to trim this down as much as possible for data analysis. Is there a way to change the unit of measurement to something more user friendly, such as seconds?
For Q1:
Why does the output repeat the results 5 times?
Actually, you can find the answer in the Speech-to-Text (STT) FAQ, under the question:
I get several results for each phrase with the detailed output format.
Which one should I use?
It is by design that you get several results in the NBest array of the JSON response, each with a different Confidence score; by default, the system chooses the first one as the display result. You can pick whichever result you need, for instance the one with the highest Confidence score.
For Q2:
Is there a way to change the unit of measurement to something more user-friendly, such as seconds?
In fact, Azure does not provide any further options for shaping the result, but I wrote a simple demo based on your code:
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
using Newtonsoft.Json;
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Linq;
namespace STTwithTime
{
class Program
{
static void Main(string[] args)
{
var key = "";
var region = "";
var audioFilePath = #"";
var speechConfig = SpeechConfig.FromSubscription(key, region);
// Generates timestamps
speechConfig.RequestWordLevelTimestamps();
speechConfig.OutputFormat = OutputFormat.Detailed;
var stopRecognition = new TaskCompletionSource<int>();
var audioConfig = AudioConfig.FromWavFileInput(audioFilePath);
var recognizer = new SpeechRecognizer(speechConfig, audioConfig);
//Display Recognized
recognizer.Recognized += (s, e) =>
{
if (e.Result.Reason == ResultReason.RecognizedSpeech)
{
var result = JsonConvert.DeserializeObject<Result>(e.Result.Properties.GetProperty(PropertyId.SpeechServiceResponse_JsonResult));
var maxConfidenceValue = result.NBest.Max(item => item.Confidence);
var maxConfidence = result.NBest.Find(item => item.Confidence == maxConfidenceValue);
Console.WriteLine("================================");
Console.WriteLine("Confidence:"+maxConfidence.Confidence);
Console.WriteLine("RECOGNIZED :" + maxConfidence.Display);
Console.WriteLine("Duration: :" + Convert.ToDouble(result.Duration) / 10000000);
Console.WriteLine("Words:");
foreach (var word in maxConfidence.Words) {
Console.WriteLine(word.word + "=> offset:" + Convert.ToDouble(word.Offset) / 10000000 + " duraction:" + Convert.ToDouble(word.Duration) / 10000000);
}
}
else if (e.Result.Reason == ResultReason.NoMatch)
{
Console.WriteLine($"NOMATCH: Speech could not be recognized.");
}
};
recognizer.Canceled += (s, e) =>
{
Console.WriteLine($"CANCELED: Reason={e.Reason}");
if (e.Reason == CancellationReason.Error)
{
Console.WriteLine($"CANCELED: ErrorCode={e.ErrorCode}");
Console.WriteLine($"CANCELED: ErrorDetails={e.ErrorDetails}");
Console.WriteLine($"CANCELED: Did you update the subscription info?");
}
stopRecognition.TrySetResult(0);
};
recognizer.SessionStopped += (s, e) =>
{
Console.WriteLine("\n Session stopped event.");
stopRecognition.TrySetResult(0);
};
recognizer.StartContinuousRecognitionAsync().GetAwaiter().GetResult();
// Waits for completion. Use Task.WaitAny to keep the task rooted.
Task.WaitAny(new[] { stopRecognition.Task });
}
}
public class Word
{
public int Duration { get; set; }
public int Offset { get; set; }
public string word { get; set; }
}
public class NBest
{
public double Confidence { get; set; }
public string Display { get; set; }
public string ITN { get; set; }
public string Lexical { get; set; }
public string MaskedITN { get; set; }
public List<Word> Words { get; set; }
}
public class Result
{
public string DisplayText { get; set; }
public int Duration { get; set; }
public string Id { get; set; }
public List<NBest> NBest { get; set; }
public int Offset { get; set; }
public string RecognitionStatus { get; set; }
}
}
Result:
If you specify a long .wav file, the result will be split into multiple parts.
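On the units question: the Offset and Duration values are 100-nanosecond ticks, which is why the demo divides by 10,000,000 to get seconds. If message-level timestamps are enough, a sketch like the following avoids parsing the word list at all; it assumes the standard result properties OffsetInTicks and Duration exposed by the Speech SDK's recognition result (worth double-checking against your SDK version):

// Message-level timing: each Recognized event already carries an offset and a duration
// for the whole recognized phrase, so no word-level detail is needed.
recognizer.Recognized += (s, e) =>
{
    if (e.Result.Reason == ResultReason.RecognizedSpeech)
    {
        // Offsets/durations are 100-nanosecond ticks; TimeSpan converts them for free.
        var start = TimeSpan.FromTicks(e.Result.OffsetInTicks);
        var length = e.Result.Duration;
        Console.WriteLine($"[{start:hh\\:mm\\:ss\\.fff} + {length.TotalSeconds:F2}s] {e.Result.Text}");
    }
};

Word-level data can still be pulled from the detailed JSON when needed, but for hours of audio this keeps the per-utterance output small.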

Android Notification Sound Causes Media Volume to Duck (Lower) & It Never Comes Back

I just converted one of my apps to target Android API 9 (was targeting API 8); now when notifications are sent out, the volume of media is lowered and never comes back to full volume.
The app uses WebView to play media files. This was not happening prior to targeting API 9. I had to convert the app to level 9 so that I could upload it to the Google Play Store. I am running a Samsung S7, which originally shipped with API level 6 (with the OS since upgraded to 8.0); not sure if that has something to do with the issue. Another detail is that I use Xamarin.Android for development; not sure if that matters either.
Additionally, in the same build in which I converted the app to target API 9, I forced the notifications to play a blank sound (a very short, couple-of-milliseconds blank MP3):
var channelSilent = new Android.App.NotificationChannel(CHANNEL_ID, name + " Silent", Android.App.NotificationImportance.High)
{
    Description = description
};
var alarmAttributes = new Android.Media.AudioAttributes.Builder()
    .SetContentType(Android.Media.AudioContentType.Sonification)
    .SetUsage(Android.Media.AudioUsageKind.Notification).Build();
// blank.mp3 is a blank file with nothing in it, a few ms in duration
var uri = Android.Net.Uri.Parse("file:///Assets/blank.mp3");
channelSilent.SetSound(uri, alarmAttributes);
...so it could also be the blank sound that is causing the ducking to malfunction, not the API change. Is there something about notification sound ducking that could be causing the issue? Is there any other way to mute a notification with Xamarin.Android than playing a blank sound? That is one route I think would be worth trying in order to fix this issue.
Here is the code I am using to generate notifications:
private static List<CustomNotification> _sentNotificationList = new List<CustomNotification>();
private static NotificationManagerCompat _notificationManager;
public async void SendNotifications(List<CustomNotification> notificationList)
{
await Task.Run(() =>
{
try
{
var _ctx = Android.App.Application.Context;
if (_notificationManager == null)
{
_notificationManager = Android.Support.V4.App.NotificationManagerCompat.From(_ctx);
}
if (notificationList.Count == 0)
{
return;
}
int notePos = 0;
foreach (var note in notificationList)
{
var resultIntent = new Intent(_ctx, typeof(MainActivity));
var valuesForActivity = new Bundle();
valuesForActivity.PutInt(MainActivity.COUNT_KEY, _count);
valuesForActivity.PutString("URL", note._noteLink);
resultIntent.PutExtras(valuesForActivity);
var resultPendingIntent = PendingIntent.GetActivity(_ctx, MainActivity.NOTIFICATION_ID, resultIntent, PendingIntentFlags.UpdateCurrent);
resultIntent.AddFlags(ActivityFlags.SingleTop);
var alarmAttributes = new Android.Media.AudioAttributes.Builder()
.SetContentType(Android.Media.AudioContentType.Sonification)
.SetUsage(Android.Media.AudioUsageKind.Notification).Build();
//I am playing this blank sound to prevent android from spamming sounds as the notifications get sent out
var uri = Android.Net.Uri.Parse("file:///Assets/blank.mp3");
//if the notification is the first in our batch then use this
//code block to send the notifications with sound
if (!_sentNotificationList.Contains(note) && notePos == 0)
{
var builder = new Android.Support.V4.App.NotificationCompat.Builder(_ctx, MainActivity.CHANNEL_ID + 1)
.SetAutoCancel(true)
.SetContentIntent(resultPendingIntent) // Start up this activity when the user clicks the intent.
.SetContentTitle(note._noteText) // Set the title
.SetNumber(1) // Display the count in the Content Info
.SetSmallIcon(Resource.Drawable.bitchute_notification2)
.SetContentText(note._noteType)
.SetPriority(NotificationCompat.PriorityMin);
MainActivity.NOTIFICATION_ID++;
_notificationManager.Notify(MainActivity.NOTIFICATION_ID, builder.Build());
_sentNotificationList.Add(note);
notePos++;
}
//if the notification isn't the first in our batch, then use this
//code block to send the notifications without sound
else if (!_sentNotificationList.Contains(note))
{
var builder = new Android.Support.V4.App.NotificationCompat.Builder(_ctx, MainActivity.CHANNEL_ID)
.SetAutoCancel(true) // Dismiss the notification from the notification area when the user clicks on it
.SetContentIntent(resultPendingIntent) // Start up this activity when the user clicks the intent.
.SetContentTitle(note._noteText) // Set the title
.SetNumber(1) // Display the count in the Content Info
.SetSmallIcon(Resource.Drawable.bitchute_notification2)
.SetContentText(note._noteType)
.SetPriority(NotificationCompat.PriorityHigh);
MainActivity.NOTIFICATION_ID++;
_notificationManager.Notify(MainActivity.NOTIFICATION_ID, builder.Build());
_sentNotificationList.Add(note);
notePos++;
}
ExtStickyService._notificationsHaveBeenSent = true;
}
}
catch
{
}
});
}
In my MainActivity I've created two different notification channels: one is silent; the other uses the default notification settings for the device:
void CreateNotificationChannel()
{
    var alarmAttributes = new Android.Media.AudioAttributes.Builder()
        .SetContentType(Android.Media.AudioContentType.Sonification)
        .SetUsage(Android.Media.AudioUsageKind.Notification).Build();
    var uri = Android.Net.Uri.Parse("file:///Assets/blank.mp3");
    if (Build.VERSION.SdkInt < BuildVersionCodes.O)
    {
        // Notification channels are new in API 26 (and not a part of the
        // support library). There is no need to create a notification
        // channel on older versions of Android.
        return;
    }
    var name = "BitChute";
    var description = "BitChute for Android";
    var channelSilent = new Android.App.NotificationChannel(CHANNEL_ID, name + " Silent", Android.App.NotificationImportance.High)
    {
        Description = description
    };
    var channel = new Android.App.NotificationChannel(CHANNEL_ID + 1, name, Android.App.NotificationImportance.High)
    {
        Description = description
    };
    channel.LockscreenVisibility = NotificationVisibility.Private;
    // here is where I set the sound for the silent channel... this could be the issue?
    var notificationManager = (Android.App.NotificationManager)GetSystemService(NotificationService);
    channelSilent.SetSound(uri, alarmAttributes);
    notificationManager.CreateNotificationChannel(channel);
    notificationManager.CreateNotificationChannel(channelSilent);
}
Full source: https://github.com/hexag0d/BitChute_Mobile_Android_BottomNav/tree/APILevel9
EDIT: something really interesting is that if I pull down the system UI bar, the volume goes back to normal. Very strange workaround, but it might help diagnose the cause.
DOUBLE EDIT: I used .SetSound(null, null) instead of the blank .mp3 and the ducking works fine now. See comments.
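For reference, a minimal sketch of what the silent channel looks like with that fix applied; the channel keeps its importance but gets no sound at all, so nothing is played and nothing can trigger audio ducking (the CHANNEL_ID, name, description and notificationManager variables are the ones from the snippets above):

// Silent channel: no sound object at all, so the notification never requests audio focus.
var channelSilent = new Android.App.NotificationChannel(CHANNEL_ID, name + " Silent", Android.App.NotificationImportance.High)
{
    Description = description
};
// Passing null for both the sound URI and the AudioAttributes disables the sound entirely,
// instead of playing a blank MP3 that still ducks other media.
channelSilent.SetSound(null, null);
notificationManager.CreateNotificationChannel(channelSilent);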

Sending Email to SpecifiedPickupDirectory with MailKit

I was using SmtpClient until now with ASP.NET MVC 5. For testing the email-send functionality on my local system, I was using client.DeliveryMethod = SmtpDeliveryMethod.SpecifiedPickupDirectory;
Now I want to do the same thing in ASP.NET Core, which does not have the SmtpClient class implemented yet. All my searches for this ended up at MailKit. I have used their send-mail code, which works fine with Gmail.
I do not want to send test emails every time, and there may be a lot of scenarios in my project where I need to send email. How can I use the local (pickup directory) email-sending functionality with MailKit? Any links or a little source code would help. Thanks.
I'm not sure on the finer details of how SmtpDeliveryMethod.SpecifiedPickupDirectory works and what it does exactly, but I suspect it might just save the message in a directory where the local Exchange server periodically checks for mail to send out.
Assuming that's the case, you could do something like this:
public static void SaveToPickupDirectory (MimeMessage message, string pickupDirectory)
{
do {
// Generate a random file name to save the message to.
var path = Path.Combine (pickupDirectory, Guid.NewGuid ().ToString () + ".eml");
Stream stream;
try {
// Attempt to create the new file.
stream = File.Open (path, FileMode.CreateNew);
} catch (IOException) {
// If the file already exists, try again with a new Guid.
if (File.Exists (path))
continue;
// Otherwise, fail immediately since it probably means that there is
// no graceful way to recover from this error.
throw;
}
try {
using (stream) {
// IIS pickup directories expect the message to be "byte-stuffed"
// which means that lines beginning with "." need to be escaped
// by adding an extra "." to the beginning of the line.
//
// Use an SmtpDataFilter to "byte-stuff" the message as it is written
// to the file stream. This is the same process that an SmtpClient
// would use when sending the message in a `DATA` command.
using (var filtered = new FilteredStream (stream)) {
filtered.Add (new SmtpDataFilter ());
// Make sure to write the message in DOS (<CR><LF>) format.
var options = FormatOptions.Default.Clone ();
options.NewLineFormat = NewLineFormat.Dos;
message.WriteTo (options, filtered);
filtered.Flush ();
return;
}
}
} catch {
// An exception here probably means that the disk is full.
//
// Delete the file that was created above so that incomplete files are not
// left behind for IIS to send accidentally.
File.Delete (path);
throw;
}
} while (true);
}
The above code snippet uses Guid.NewGuid () as a way of generating a temporary filename, but you can use whatever method you want (e.g. you could also opt to use message.MessageId + ".eml").
Based on Microsoft's referencesource, when SpecifiedPickupDirectory is used, they actually also use Guid.NewGuid ().ToString () + ".eml", so that's probably the way to go.
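For completeness, a minimal usage sketch: build a message with MimeKit and drop it into the pickup directory instead of sending it over SMTP. The addresses and the pickup path here are placeholders (the path shown is the conventional IIS SMTP pickup folder, so adjust it to whatever directory your local mail server actually polls):

// Build a simple message and save it locally instead of sending it.
var message = new MimeMessage ();
message.From.Add (new MailboxAddress ("Test Sender", "sender@example.com"));
message.To.Add (new MailboxAddress ("Test Recipient", "recipient@example.com"));
message.Subject = "Pickup directory test";
message.Body = new TextPart ("plain") { Text = "This message was saved locally, not sent." };

// Hypothetical pickup path; point it at the directory your local mail server watches.
SaveToPickupDirectory (message, @"C:\inetpub\mailroot\Pickup");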

How to better implement start and stop video recording Web Services using Axis Media Parser API?

I have a C# web service which records data from an IP camera. The service works fine when I record data for a specified amount of time, for example 10 seconds. But my objective is to record for an unspecified amount of time and let the user press a button to stop recording. So I modified my code, creating a new web method (stopRecording) that changes the value of a global variable that acts as a mutex. Obviously this is wrong, because I tested it and it does not work, but I don't know how to proceed. Can anybody help me? I would really appreciate it.
The relevant code is below.
[WebMethod]
public string startRecording(string ipAddress)
{
    // Connection preset for H.264 (HTTP API 3.0)
    string Url = "axrtsp:://" + ipAddress + "/axis-media/media.amp?videocodec=h264";
    string UserName = "username";
    string Password = "pass";
    string Path = "C:/directory/subdirectory/";
    string Filename = "myRecordedFile.bin";
    string FilePath = Path + Filename;
    // Open binary output file to write parsed video frames into
    using (FileStream outFileStream = new FileStream(FilePath, FileMode.Create))
    using (outFile = new BinaryWriter(outFileStream))
    {
        try
        {
            // Register for events like OnVideoSample, OnAudioSample, OnTriggerData and OnError
            ...
            // Set connection and media properties
            ...
            // Get absolute time from Axis device
            ...
            // Connect to the device
            int cookieID;
            int numberOfStreams;
            object buffer;
            parser.Connect(out cookieID, out numberOfStreams, out buffer);
            // Write media type information to the output file (buffer is an array of bytes)
            byte[] mediaTypeBuffer = (byte[])buffer;
            outFile.Write(mediaTypeBuffer.Length);
            outFile.Write(mediaTypeBuffer, 0, mediaTypeBuffer.Length);
            // Start the media stream; the registered event handlers will be called
            parser.Start();
            Debug.WriteLine("Will start recording...");
            do {
                Debug.WriteLine("recording..."); // want to record for an unspecified amount of time
            } while (record); // my mutex variable; it never changes even when I call the stopRecording web service, so the program keeps looping
            System.Diagnostics.Debug.Write("Finish recording... never reached!!!!!");
            // Stop the stream
            parser.Stop();
            // Unregister event handlers
            ...
        }
        catch (COMException e)
        {
            Console.WriteLine("Exception for {0}, {1}", parser.MediaURL, e.Message);
        }
        // Inform the GC that the COM object will no longer be used
        Marshal.FinalReleaseComObject(parser);
        parser = null;
        Console.WriteLine("Stream stopped");
    }
    return "Recording from camera " + Url;
}

[WebMethod]
public string stopRecording()
{
    System.Diagnostics.Debug.Write("I want to stop recording...");
    record = false;
    return "Stop";
}
Your record variable is not a Mutex object but a simple flag... which is beside the point.
The trouble here is that your code in startRecording() never returns control to the caller and may hold the processing thread forever.
If I might suggest, why not create a thread to do your recording? You have a number of possibilities here (new Thread(), Action.BeginInvoke(), ...).
This way, your stopRecording call has a chance to be received, set the record flag to false, and let the recording thread exit.
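A minimal sketch of that idea, reusing the parser calls and the shared record flag from the question. The Axis Media Parser calls are elided the same way they are above; marking the flag volatile (so the recording thread sees the change promptly) and running the loop on a background thread are assumptions of this sketch, not part of the original service:

// Requires System.Threading. startRecording returns immediately;
// the capture loop runs on a background thread instead of blocking the request.
private static volatile bool record;
private static Thread recordingThread;

[WebMethod]
public string startRecording(string ipAddress)
{
    record = true;
    recordingThread = new Thread(() =>
    {
        // ... connect the parser and start the stream, as in the original code ...
        while (record)
        {
            // The parser's event handlers keep writing frames; just sleep instead of spinning.
            Thread.Sleep(100);
        }
        // ... parser.Stop(), unregister handlers and release the COM object here ...
    });
    recordingThread.IsBackground = true;
    recordingThread.Start();
    return "Recording from camera " + ipAddress;
}

[WebMethod]
public string stopRecording()
{
    record = false;              // the recording thread sees this and exits its loop
    recordingThread?.Join(5000); // optionally wait a moment for cleanup
    return "Stop";
}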
