I cannot get the text from the wav file using the SpeechRecognizer class.
When I debug the code below, I see that if I add a delay I get text, but it eventually crashes.
Is the code incorrect?
What am I missing in order to wait on all the results and collect them in totalText, which is a field variable?
// stopRecognition is the TaskCompletionSource the event handlers below complete.
var stopRecognition = new TaskCompletionSource<int>();
using (var audioInput = AudioConfig.FromWavFileInput(wavFile))
{
    using (var recognizer = new SpeechRecognizer(configuration, audioInput))
    {
        recognizer.Recognized += (s, e) =>
        {
            if (e.Result.Reason == ResultReason.RecognizedSpeech)
            {
                System.Diagnostics.Debug.WriteLine($"RECOGNIZED: Text={e.Result.Text}");
                totalText += e.Result.Text;
            }
            else if (e.Result.Reason == ResultReason.NoMatch)
            {
                System.Diagnostics.Debug.WriteLine($"NOMATCH: Speech could not be recognized.");
            }
        };
        recognizer.Canceled += (s, e) =>
        {
            System.Diagnostics.Debug.WriteLine($"CANCELED: Reason={e.Reason}");
            if (e.Reason == CancellationReason.Error)
            {
                System.Diagnostics.Debug.WriteLine($"CANCELED: ErrorCode={e.ErrorCode}");
                System.Diagnostics.Debug.WriteLine($"CANCELED: ErrorDetails={e.ErrorDetails}");
                System.Diagnostics.Debug.WriteLine($"CANCELED: Did you update the subscription info?");
            }
            stopRecognition.TrySetResult(0);
        };
        recognizer.SessionStarted += (s, e) =>
        {
            System.Diagnostics.Debug.WriteLine("\n Session started event.");
        };
        recognizer.SessionStopped += (s, e) =>
        {
            System.Diagnostics.Debug.WriteLine("\n Session stopped event.");
            System.Diagnostics.Debug.WriteLine("\nStop recognition.");
            stopRecognition.TrySetResult(0);
        };
        recognizer.SpeechEndDetected += (s, e) =>
        {
            System.Diagnostics.Debug.WriteLine("SpeechEndDetected event.");
            SaveFile(totalText);
            stopRecognition.TrySetResult(0);
        };
        // Starts continuous recognition. Uses StopContinuousRecognitionAsync() to stop recognition.
        await recognizer.StartContinuousRecognitionAsync().ConfigureAwait(false);
        // Waits for completion.
        // Use Task.WaitAny to keep the task rooted.
        Task.WaitAny(new[] { stopRecognition.Task });
        // Stops recognition.
        await recognizer.StopContinuousRecognitionAsync().ConfigureAwait(false);
        if (totalText != string.Empty)
        {
            SaveFile(totalText);
        }
    }
}
I get this result in the end:
The program '[9312] testhost.exe' has exited with code 0 (0x0).
The call to the above code was made synchronously instead of asynchronously, which caused the erratic behaviour.
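For reference, a minimal sketch of the difference, assuming the snippet above lives in a hypothetical wrapper method named RecognizeWavFileAsync (the name and signature are mine, not from the original code):

// Hypothetical wrapper around the recognition snippet above.
public async Task<string> RecognizeWavFileAsync(string wavFile)
{
    // ... the recognition code shown above, ending with SaveFile(totalText) ...
    return totalText;
}

// Problematic: blocking synchronously on the async method can deadlock,
// or let the process race ahead of the recognizer's events and exit early.
// string text = RecognizeWavFileAsync(wavFile).Result;

// Better: await it, and keep awaiting all the way up the call chain.
string text = await RecognizeWavFileAsync(wavFile);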
I am playing with real-time conversation transcription, and I get this error:
IOException: Unable to write data to the transport connection: An existing connection was forcibly closed by the remote host.
SocketException: An existing connection was forcibly closed by the remote host.
Could somebody help me, please?
public static async Task TranscribeConversationsAsync(string voiceSignatureStringUser1, string voiceSignatureStringUser2)
{
    var filepath = "Tech.wav";
    var config = SpeechConfig.FromSubscription(VoiceGenerator.subscriptionKey, VoiceGenerator.region);
    config.SetProperty("ConversationTranscriptionInRoomAndOnline", "true");
    // en-us by default. Add this code to specify other languages, like zh-cn.
    // config.SpeechRecognitionLanguage = "zh-cn";
    var stopRecognition = new TaskCompletionSource<int>();
    using (var audioInput = AudioConfig.FromWavFileInput(filepath))
    {
        var meetingID = Guid.NewGuid().ToString();
        using (var conversation = await Conversation.CreateConversationAsync(config, meetingID))
        {
            // Create a conversation transcriber using audio stream input.
            using (var conversationTranscriber = new ConversationTranscriber(audioInput))
            {
                conversationTranscriber.Transcribing += (s, e) =>
                {
                    Console.WriteLine($"TRANSCRIBING: Text={e.Result.Text} SpeakerId={e.Result.UserId}");
                };
                conversationTranscriber.Transcribed += (s, e) =>
                {
                    if (e.Result.Reason == ResultReason.RecognizedSpeech)
                    {
                        Console.WriteLine($"TRANSCRIBED: Text={e.Result.Text} SpeakerId={e.Result.UserId}");
                    }
                    else if (e.Result.Reason == ResultReason.NoMatch)
                    {
                        Console.WriteLine($"NOMATCH: Speech could not be recognized.");
                    }
                };
                conversationTranscriber.Canceled += (s, e) =>
                {
                    Console.WriteLine($"CANCELED: Reason={e.Reason}");
                    if (e.Reason == CancellationReason.Error)
                    {
                        Console.WriteLine($"CANCELED: ErrorCode={e.ErrorCode}");
                        Console.WriteLine($"CANCELED: ErrorDetails={e.ErrorDetails}");
                        Console.WriteLine($"CANCELED: Did you set the speech resource key and region values?");
                        stopRecognition.TrySetResult(0);
                    }
                };
                conversationTranscriber.SessionStarted += (s, e) =>
                {
                    Console.WriteLine($"\nSession started event. SessionId={e.SessionId}");
                };
                conversationTranscriber.SessionStopped += (s, e) =>
                {
                    Console.WriteLine($"\nSession stopped event. SessionId={e.SessionId}");
                    Console.WriteLine("\nStop recognition.");
                    stopRecognition.TrySetResult(0);
                };
                // Add participants to the conversation.
                var speaker1 = Participant.From("User1", "en-US", voiceSignatureStringUser1);
                var speaker2 = Participant.From("User2", "en-US", voiceSignatureStringUser2);
                await conversation.AddParticipantAsync(speaker1);
                await conversation.AddParticipantAsync(speaker2);
                // Join the conversation and start transcribing.
                await conversationTranscriber.JoinConversationAsync(conversation);
                await conversationTranscriber.StartTranscribingAsync().ConfigureAwait(false);
                // Waits for completion, then stops transcription.
                Task.WaitAny(new[] { stopRecognition.Task });
                await conversationTranscriber.StopTranscribingAsync().ConfigureAwait(false);
            }
        }
    }
}
It appears that something is blocking my connection, but why? I searched on Google, but I only found references for ASP, not for console apps.
Timestamps are not appearing in my results when I run my Azure speech-to-text code. I'm not getting any errors, but I'm also not getting timestamped results. My code is:
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

namespace SUPRA
{
    internal class NewBaseType
    {
        static async Task Main(string[] args)
        {
            // Creates an instance of a speech config with specified subscription key and region.
            // Replace with your own subscription key and service region (e.g., "westus").
            var config = SpeechConfig.FromSubscription("8ec6730993d54cf9a9cec0f5d08b8e8b", "eastus");
            // Generates timestamps
            config.OutputFormat = OutputFormat.Detailed;
            config.RequestWordLevelTimestamps();
            // Loads the audio file.
            using (var audioInput = AudioConfig.FromWavFileInput("C:/Users/MichaelSchwartz/source/repos/AI-102-Process-Speech-master/transcribe_speech_to_text/media/narration.wav"))
            // Creates a speech recognizer from the wav file input.
            using (var recognizer = new SpeechRecognizer(config, audioInput))
            {
                recognizer.Recognized += (s, e) =>
                {
                    var result = e.Result;
                    if (result.Reason == ResultReason.RecognizedSpeech)
                    {
                        Console.WriteLine(result.Text);
                    }
                };
                recognizer.Recognized += (s, e) =>
                {
                    var j = e.Result.Properties.GetProperty(PropertyId.SpeechServiceResponse_JsonResult);
                };
                recognizer.Canceled += (s, e) =>
                {
                    Console.WriteLine($"\n Canceled. Reason: {e.Reason.ToString()}, CanceledReason: {e.Reason}");
                };
                recognizer.SessionStarted += (s, e) =>
                {
                    Console.WriteLine("\n Session started event.");
                };
                recognizer.SessionStopped += (s, e) =>
                {
                    Console.WriteLine("\n Session stopped event.");
                };
                // Starts continuous recognition.
                // Uses StopContinuousRecognitionAsync() to stop recognition.
                await recognizer.StartContinuousRecognitionAsync().ConfigureAwait(false);
                do
                {
                    Console.WriteLine("Press Enter to stop");
                } while (Console.ReadKey().Key != ConsoleKey.Enter);
                // Stops recognition.
                await recognizer.StopContinuousRecognitionAsync().ConfigureAwait(false);
            }
        }
    }
}
No errors are returned and the results are accurate, but without timestamps. I've included the code to produce timestamps (the OutputFormat.Detailed and RequestWordLevelTimestamps lines above). How do I get timestamps to generate? Thanks.
You configured it correctly, but it seems you haven't printed the result to the console. Just try the code below:
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
using System;
using System.Threading.Tasks;

namespace STTwithTime
{
    class Program
    {
        static void Main(string[] args)
        {
            var key = "";
            var region = "";
            var audioFilePath = @"";
            var speechConfig = SpeechConfig.FromSubscription(key, region);
            // Generates timestamps
            speechConfig.RequestWordLevelTimestamps();
            speechConfig.OutputFormat = OutputFormat.Detailed;
            var stopRecognition = new TaskCompletionSource<int>();
            var audioConfig = AudioConfig.FromWavFileInput(audioFilePath);
            var recognizer = new SpeechRecognizer(speechConfig, audioConfig);
            // Display Recognizing
            recognizer.Recognizing += (s, e) =>
            {
                Console.WriteLine($"RECOGNIZING:{e.Result.Properties.GetProperty(PropertyId.SpeechServiceResponse_JsonResult)}");
            };
            // Display Recognized
            recognizer.Recognized += (s, e) =>
            {
                if (e.Result.Reason == ResultReason.RecognizedSpeech)
                {
                    Console.WriteLine($"RECOGNIZED :{e.Result.Properties.GetProperty(PropertyId.SpeechServiceResponse_JsonResult)}");
                }
                else if (e.Result.Reason == ResultReason.NoMatch)
                {
                    Console.WriteLine($"NOMATCH: Speech could not be recognized.");
                }
            };
            recognizer.Canceled += (s, e) =>
            {
                Console.WriteLine($"CANCELED: Reason={e.Reason}");
                if (e.Reason == CancellationReason.Error)
                {
                    Console.WriteLine($"CANCELED: ErrorCode={e.ErrorCode}");
                    Console.WriteLine($"CANCELED: ErrorDetails={e.ErrorDetails}");
                    Console.WriteLine($"CANCELED: Did you update the subscription info?");
                }
                stopRecognition.TrySetResult(0);
            };
            recognizer.SessionStopped += (s, e) =>
            {
                Console.WriteLine("\n Session stopped event.");
                stopRecognition.TrySetResult(0);
            };
            recognizer.StartContinuousRecognitionAsync().GetAwaiter().GetResult();
            // Waits for completion. Use Task.WaitAny to keep the task rooted.
            Task.WaitAny(new[] { stopRecognition.Task });
        }
    }
}
Result:
Display recognizing: (screenshot of the RECOGNIZING JSON output)
Display recognized: (screenshot of the RECOGNIZED JSON output)
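If you then want the timestamps as numbers rather than raw JSON, you can parse the detailed result. A minimal sketch, assuming the detailed payload carries per-word Offset and Duration values in 100-nanosecond ticks under NBest[0].Words (the helper name is mine, not part of the SDK):

using System.Text.Json;

// Hypothetical helper: prints word-level timestamps from the detailed JSON result.
static void PrintWordTimestamps(string json)
{
    using var doc = JsonDocument.Parse(json);
    // Interim (Recognizing) results have no NBest array, so bail out quietly.
    if (!doc.RootElement.TryGetProperty("NBest", out var nbest)) return;
    foreach (var word in nbest[0].GetProperty("Words").EnumerateArray())
    {
        double start = word.GetProperty("Offset").GetInt64() / 10_000_000.0;    // ticks -> seconds
        double length = word.GetProperty("Duration").GetInt64() / 10_000_000.0; // ticks -> seconds
        Console.WriteLine($"{word.GetProperty("Word").GetString()}: starts at {start:F2}s, lasts {length:F2}s");
    }
}

You could call it from the Recognized handler, e.g. PrintWordTimestamps(e.Result.Properties.GetProperty(PropertyId.SpeechServiceResponse_JsonResult));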
I have set up a timer:
System.Timers.Timer timer = new System.Timers.Timer(1000);
timer.Enabled = true;
timer.Elapsed += (object sender, System.Timers.ElapsedEventArgs e) =>
{
    timer.Enabled = false;
    ui.CldOnPhoneNumberSent(); // <- this method is not fired
};
The second call is not executed.
If I switch the statements, as in:
timer.Elapsed += (object sender, System.Timers.ElapsedEventArgs e) =>
{
    ui.CldOnPhoneNumberSent();
    timer.Enabled = false; // <- this line is not reached and the timer repeats
};
What's wrong?
Edit:
When the method is called from the timer, it doesn't run to completion:
timer.Elapsed += (object sender, System.Timers.ElapsedEventArgs e) =>
{
    ((Timer)sender).Enabled = false;
    ui.method1();
};

void method1()
{
    do something; // <-- called
    do Something; // <-- not called
}
It could be a problem with variable capture in the anonymous method; try using the sender value instead of referencing the timer variable:
timer.Elapsed += (object sender, System.Timers.ElapsedEventArgs e) =>
{
    ((Timer)sender).Enabled = false;
    ui.CldOnPhoneNumberSent();
};
As was said in the comments, the most likely reason is that CldOnPhoneNumberSent() throws an exception, preventing further execution.
You should rewrite it as follows:
var timer = new System.Timers.Timer(1000);
timer.Elapsed += (sender, args) =>
{
    ((Timer)sender).Enabled = false;
    try
    {
        ui.CldOnPhoneNumberSent();
    }
    catch (Exception e)
    {
        // log exception
        // do something with it, eventually rethrow it
    }
};
timer.Enabled = true;
Note that if you are inside a WPF application and want to access objects created on the UI thread, you may need to dispatch the call:
Action callback = ui.CldOnPhoneNumberSent;
var app = Application.Current;
if (app == null)
{
    // This prevents unexpected exceptions being thrown during shutdown (or domain unloading).
    return;
}
if (app.CheckAccess())
{
    // Already on the correct thread, just execute the action.
    callback();
}
else
{
    // Invoke through the dispatcher.
    app.Dispatcher.Invoke(callback);
}
As a final note, if you are using .NET 4.5 (with C# 5) you might consider using the async/await pattern instead of System.Timers.Timer; it is easier to use and more readable:
private async Task YourMethod()
{
    await Task.Delay(1000)
        .ConfigureAwait(continueOnCapturedContext: true); // this makes sure the continuation runs on the same thread (in your case it should be the UI thread)
    try
    {
        ui.CldOnPhoneNumberSent();
    }
    catch (Exception e)
    {
        // log exception
        // do something with it, eventually rethrow it
    }
}
I have a universal app that uses voice synthesis. Running under WP8.1 it works fine, but as soon as I try Win8.1 I start getting strange behaviour. The voice seems to speak once; however, on the second run (within the same app), the following code hangs:
string toSay = "hello";
System.Diagnostics.Debug.WriteLine("{0}: Speak {1}", DateTime.Now, toSay);
using (SpeechSynthesizer synth = new SpeechSynthesizer())
{
    System.Diagnostics.Debug.WriteLine("{0}: After synthesizer instantiated", DateTime.Now);
    var voiceStream = await synth.SynthesizeTextToStreamAsync(toSay);
    System.Diagnostics.Debug.WriteLine("{0}: After voice stream", DateTime.Now);
The reason for the debug statements is that the code seems to have an uncertainty principle to it: when I step through it in the debugger, the code executes and passes the SynthesizeTextToStreamAsync statement. However, when the breakpoints are removed, I only get the debug statement preceding it - never the one after.
The best I can deduce is that during the first run-through something bad happens (it does claim to complete, and it actually speaks the first time); then it gets stuck and can't play any more. The full code looks similar to this:
string toSay = "hello";
System.Diagnostics.Debug.WriteLine("{0}: Speak {1}", DateTime.Now, toSay);
using (SpeechSynthesizer synth = new SpeechSynthesizer())
{
    System.Diagnostics.Debug.WriteLine("{0}: After synthesizer instantiated", DateTime.Now);
    var voiceStream = await synth.SynthesizeTextToStreamAsync(toSay);
    System.Diagnostics.Debug.WriteLine("{0}: After voice stream", DateTime.Now);
    MediaElement mediaElement;
    mediaElement = rootControl.Children.FirstOrDefault(a => a as MediaElement != null) as MediaElement;
    if (mediaElement == null)
    {
        mediaElement = new MediaElement();
        rootControl.Children.Add(mediaElement);
    }
    mediaElement.SetSource(voiceStream, voiceStream.ContentType);
    mediaElement.Volume = 1;
    mediaElement.IsMuted = false;
    var tcs = new TaskCompletionSource<bool>();
    mediaElement.MediaEnded += (o, e) => { tcs.TrySetResult(true); };
    mediaElement.MediaFailed += (o, e) => { tcs.TrySetResult(true); };
    mediaElement.Play();
    await tcs.Task;
}
Okay - I think I managed to get this working... although I'm unsure why.
using (SpeechSynthesizer synth = new SpeechSynthesizer())
{
    var voiceStream = await synth.SynthesizeTextToStreamAsync(toSay);
    MediaElement mediaElement;
    mediaElement = rootControl.Children.FirstOrDefault(a => a as MediaElement != null) as MediaElement;
    if (mediaElement == null)
    {
        mediaElement = new MediaElement();
        rootControl.Children.Add(mediaElement);
    }
    mediaElement.SetSource(voiceStream, voiceStream.ContentType);
    mediaElement.Volume = 1;
    mediaElement.IsMuted = false;
    var tcs = new TaskCompletionSource<bool>();
    mediaElement.MediaEnded += (o, e) => { tcs.TrySetResult(true); };
    mediaElement.Play();
    await tcs.Task;
    // Removing the control seems to free up whatever is locking
    rootControl.Children.Remove(mediaElement);
}
I am not sure what programming language you are using; however, this may help. This is in C#, so it could help lead you in the right direction.
namespace Alexis
{
    public partial class frmMain : Form
    {
        SpeechRecognitionEngine _recognizer = new SpeechRecognitionEngine();
        SpeechSynthesizer Alexis = new SpeechSynthesizer();
        SpeechRecognitionEngine startlistening = new SpeechRecognitionEngine();
        DateTime timenow = DateTime.Now;
        Random rnd = new Random(); // rnd is used below; declared here so the sample compiles

        // other coding such as InitializeComponent and others.

        private void frmMain_Load(object sender, EventArgs e)
        {
            _recognizer.SetInputToDefaultAudioDevice();
            _recognizer.LoadGrammarAsync(new Grammar(new GrammarBuilder(new Choices(File.ReadAllLines(@"Default Commands.txt")))));
            _recognizer.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(Shell_SpeechRecognized);
            _recognizer.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(Social_SpeechRecognized);
            _recognizer.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(Web_SpeechRecognized);
            _recognizer.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(Default_SpeechRecognized);
            _recognizer.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(AlarmClock_SpeechRecognized);
            _recognizer.LoadGrammarAsync(new Grammar(new GrammarBuilder(new Choices(AlarmAM))));
            _recognizer.LoadGrammarAsync(new Grammar(new GrammarBuilder(new Choices(AlarmPM))));
            _recognizer.SpeechDetected += new EventHandler<SpeechDetectedEventArgs>(_recognizer_SpeechDetected);
            _recognizer.RecognizeAsync(RecognizeMode.Multiple);
            startlistening.SetInputToDefaultAudioDevice();
            startlistening.LoadGrammarAsync(new Grammar(new GrammarBuilder(new Choices("alexis"))));
            startlistening.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(startlistening_SpeechRecognized);
        }

        // other stuff here... Then once you have this, you can write a handler method with your code as follows:

        private void Default_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
        {
            int ranNum;
            string speech = e.Result.Text;
            switch (speech)
            {
                #region Greetings
                case "hello":
                case "hello alexis":
                    timenow = DateTime.Now;
                    if (timenow.Hour >= 5 && timenow.Hour < 12)
                    { Alexis.SpeakAsync("Good morning " + Settings.Default.User); }
                    if (timenow.Hour >= 12 && timenow.Hour < 18)
                    { Alexis.SpeakAsync("Good afternoon " + Settings.Default.User); }
                    if (timenow.Hour >= 18 && timenow.Hour < 24)
                    { Alexis.SpeakAsync("Good evening " + Settings.Default.User); }
                    if (timenow.Hour < 5)
                    { Alexis.SpeakAsync("Hello " + Settings.Default.User + ", it's getting late"); }
                    break;
                case "whats my name":
                case "what is my name":
                    Alexis.SpeakAsync(Settings.Default.User);
                    break;
                case "stop talking":
                case "quit talking":
                    Alexis.SpeakAsyncCancelAll();
                    ranNum = rnd.Next(1, 3); // Next(1, 3) returns 1 or 2
                    if (ranNum == 2)
                    { Alexis.Speak("sorry " + Settings.Default.User); }
                    break;
                #endregion
            }
        }
    }
}
Instead of hard-coding the commands, I recommend that you use a text document; once you have that, you can add your own commands to it and load them in code. Also reference System.Speech.
I hope this helps get you on the right track.
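For illustration, a hypothetical Default Commands.txt would just hold one phrase per line, matching the case labels in the switch above:

hello
hello alexis
whats my name
what is my name
stop talking
quit talking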
I am loading huge files into memory, but during this operation my application freezes.
Any idea what the issue with my code is?
public void Drop(DragEventArgs args)
{
    BackgroundWorker worker = new BackgroundWorker();
    string fileName = IsSingleTextFile(args);
    if (fileName == null) return;
    worker.DoWork += (o, ea) =>
    {
        try
        {
            StreamReader fileToLoad = new StreamReader(fileName);
            string filecontent = fileToLoad.ReadToEnd();
            fileToLoad.Close();
            // providing the information to the UI thread
            Application.Current.Dispatcher.BeginInvoke(DispatcherPriority.Background,
                new Action(() => SfmLogFile = filecontent));
        }
        catch (Exception)
        {
            throw;
        }
    };
    worker.RunWorkerCompleted += (o, ea) =>
    {
        args.Handled = true;
        IsBusy = false;
    };
    // Mark the event as handled, so TextBox's native Drop handler is not called.
    IsBusy = true;
    worker.RunWorkerAsync();
}
I'd transform your sample to something like this:
public void Drop(DragEventArgs args)
{
    string fileName = IsSingleTextFile(args);
    if (fileName == null) return;
    // It is better to create the worker after the check for the file name.
    BackgroundWorker worker = new BackgroundWorker();
    worker.DoWork += (o, ea) =>
    {
        string fileContent = File.ReadAllText(fileName);
        ea.Result = fileContent;
    };
    worker.RunWorkerCompleted += (o, ea) =>
    {
        var fileContent = ea.Result as string;
        Application.Current.Dispatcher.BeginInvoke(DispatcherPriority.Background,
            new Action(() => SfmLogFile = fileContent));
        // If the IsBusy property is not set on the UI thread, then you may leave it here;
        // otherwise it should be set using the dispatcher too.
        IsBusy = false;
    };
    IsBusy = true;
    worker.RunWorkerAsync();
    // Mark the event as handled, so TextBox's native Drop handler is not called.
    args.Handled = true;
}
I am not sure if it's the cause of your problem, but you are setting args.Handled in a callback for the background worker, so it will be called after the drop event handler has returned. It won't have the desired effect as it's set too late, and it might mess up something in the event handling.