Kinect speech recognition and skeleton tracking don't work together - c#

I'm writing an application that can take several different external inputs (keyboard presses, motion gestures, speech) and produce similar outputs (for instance, pressing "T" on the keyboard will do the same thing as saying the word "Travel" out loud). Because of that, I don't want any of the input managers to know about each other. Specifically, I don't want the Kinect manager (as much as possible) to know about the Speech manager and vice versa, even though I'm using the Kinect's built-in microphone (the Speech manager should work with ANY microphone). I'm using System.Speech in the Speech manager as opposed to Microsoft.Speech.
I'm having a problem where as soon as the Kinect motion recognition module is enabled, the speech module stops receiving input. I've tried a whole bunch of things like inverting the skeleton stream and audio stream, capturing the audio stream in different ways, etc. I finally narrowed down the problem: something about how I'm initializing my modules does not play nicely with how my application deals with events.
The application works great until motion capture starts. If I completely exclude the Kinect module, this is how my main method looks:
// Main.cs
public static void Main()
{
// Create input managers
KeyboardMouseManager keymanager = new KeyboardMouseManager();
SpeechManager speechmanager = new SpeechManager();
// Start listening for keyboard input
keymanager.start();
// Start listening for speech input
speechmanager.start()
try
{
Application.Run();
}
catch (Exception ex)
{
MessageBox.Show(ex.StackTrace);
}
}
I'm using Application.Run() because my GUI is handled by an outside program. This C# application's only job is to receive input events and run external scripts based on that input.
Both the keyboard and speech modules receive events sporadically. The Kinect, on the other hand, generates events constantly. If my gestures happened just as infrequently, a polling loop might be the answer with a wait time between each poll. However, I'm using the Kinect to control mouse movement... I can't afford to wait between skeleton event captures, because then the mouse would be very laggy; my skeleton capture loop needs to be as constant as possible. This presented a big problem, because now I can't have my Kinect manager on the same thread (or message pump? I'm a little hazy on the difference, hence why I think the problem lies here): from the way I understand it, being on the same thread would not allow keyboard or speech events to consistently get through. Instead, I kind of hacked together a solution where I made my Kinect manager inherit from System.Windows.Forms, so that it would work with Application.Run().
Now, my main method looks like this:
// Main.cs
public static void Main()
{
// Create input managers
KeyboardMouseManager keymanager = new KeyboardMouseManager();
KinectManager kinectManager = new KinectManager();
SpeechManager speechmanager = new SpeechManager();
// Start listening for keyboard input
keymanager.start();
// Attempt to launch the kinect sensor
bool kinectLoaded = kinectManager.start();
// Use the default microphone (if applicable) if kinect isn't hooked up
// Use the kinect microphone array if the kinect is working
if (kinectLoaded)
{
speechmanager.start(kinectManager);
}
else
{
speechmanager.start();
}
try
{
// THIS IS THE PLACE I THINK I'M DOING SOMETHING WRONG
Application.Run(kinectManager);
}
catch (Exception ex)
{
MessageBox.Show(ex.StackTrace);
}
For some reason, the Kinect microphone loses its "default-ness" as soon as the Kinect sensor is started (if this observation is incorrect, or there is a workaround, PLEASE let me know). Because of that, I was required to make a special start() method in the Speech manager, which looks like this:
// SpeechManager.cs
/** For use with the Kinect Microphone **/
public void start(KinectManager kinect)
{
// Get the speech recognizer information
RecognizerInfo recogInfo = SpeechRecognitionEngine.InstalledRecognizers().FirstOrDefault();
if (null == recogInfo)
{
Console.WriteLine("Error: No recognizer information found on Kinect");
return;
}
SpeechRecognitionEngine recognizer = new SpeechRecognitionEngine(recogInfo.Id);
// Loads all of the grammars into the recognizer engine
loadSpeechBindings(recognizer);
// Set speech event handler
recognizer.SpeechRecognized += speechRecognized;
using (var s = kinect.getAudioSource().Start() )
{
// Set the input to the Kinect audio stream
recognizer.SetInputToAudioStream(s, new SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null));
// Recognize asycronous speech events
recognizer.RecognizeAsync(RecognizeMode.Multiple);
}
}
For reference, the start() method in the Kinect manager looks like this:
// KinectManager.cs
public bool start()
{
// Code from Microsoft Sample
kinect = (from sensorToCheck in KinectSensor.KinectSensors where sensorToCheck.Status == KinectStatus.Connected select sensorToCheck).FirstOrDefault();
// Fail elegantly if no kinect is detected
if (kinect == null)
{
connected = false;
Console.WriteLine("Couldn't find a Kinect");
return false;
}
// Start listening
kinect.Start();
// Enable listening for all skeletons
kinect.SkeletonStream.Enable();
// Obtain the KinectAudioSource to do audio capture
source = kinect.AudioSource;
source.EchoCancellationMode = EchoCancellationMode.None; // No AEC for this sample
source.AutomaticGainControlEnabled = false; // Important to turn this off for speech recognition
kinect.AllFramesReady += new EventHandler<AllFramesReadyEventArgs>(allFramesReady);
connected = true;
return true;
}
So when I disable motion capture (by having my main() look similar to the first code segment), speech recognition works fine. When I enable motion capture, motion works great but no speech gets recognized. In both cases, keyboard events always work. There are no errors, and through tracing I found out that all the data in the speech manager is initialized correctly... it seems like the speech recognition events just disappear. How can I reorganize this code so that the input modules can work independently? Do I use threading, or just Application.Run() in a different way?

The Microsoft Kinect SDK have several known issues, one of them being that audio is not processed if you begin tracking the skeleton after starting the audio processor. From the known issues:
Audio is not processed if skeleton stream is enabled after starting audio capture
Due to a bug, enabling or disabling the SkeletonStream will stop the AudioSource
stream returned by the Kinect sensor. The following sequence of instructions will
stop the audio stream:
kinectSensor.Start();
kinectSensor.AudioSource.Start(); // --> this will create an audio stream
kinectSensor.SkeletonStream.Enable(); // --> this will stop the audio stream as an undesired side effect
The workaround is to invert the order of the calls or to restart the AudioSource after changing SkeletonStream status.
Workaround #1 (start audio after skeleton):
kinectSensor.Start();
kinectSensor.SkeletonStream.Enable();
kinectSensor.AudioSource.Start();
Workaround #2 (restart audio after skeleton):
kinectSensor.Start();
kinectSensor.AudioSource.Start(); // --> this will create an audio stream
kinectSensor.SkeletonStream.Enable(); // --> this will stop the audio stream as an undesired side effect
kinectSensor.AudioSource.Start(); // --> this will create another audio stream
Resetting the SkeletonStream engine status is an expensive call. It should be made at application startup only, unless the app has specific needs that require turning Skeleton on and off.
I also hope that when you say you're using "version 1" of the SDK, you mean "version 1.6". If you are using anything but 1.5 or 1.6, you are only hurting yourself due to the many changes that were made in 1.5.

Related

Seeking in a dynamic GStreamer pipeline

I have a dynamic pipeline that switches between sources (File/HTTP/RMTP/SRT) on set intervals, and outputs to an RTMP sink. This is working fine, until I attempt to perform a seek.
The seek itself works, but the next time I try to switch sources, the pipeline hangs.
I've been able to stop it from hanging by recreating the muxer after performing a seek, but I still end up with audio/video synchronization issues - the audio plays, but the video falls behind.
For what it's worth, I'm testing the outgoing RTMP stream in VLC, and if I stop/play the stream is back in sync and continues to play without issue.
Here is my pipeline graph:
The application is written in C#.
I'm performing the seek like this:
var bus = pipeline.Bus;
do
{
var msg = bus.TimedPopFiltered(Constants.SECOND, MessageType.Error | MessageType.Eos | MessageType.StateChanged);
// Parse message
if (msg != null)
{
...
}
else
{
if (pipeline.CurrentState == State.Playing && !seekDone)
{
seekDone = true;
pipeline.SetState(State.Paused);
var seekTo = 50 * Gst.Constant.SECOND;
var seeked = mux.SeekSimple(Format.Time, SeekFlags.Flush | SeekFlags.KeyUnit, seekTo);
if (!seeked)
{
Console.WriteLine("Seek failed");
}
else
{
Console.WriteLine("Performing seek");
}
pipeline.SetState(State.Playing);
}
}
}
while(!terminate)
The source switching is handled by a Timer on a separate thread while the pipeline is in the Paused state.
Any help would be greatly appreciated.
EDIT: I placed pad probes on the video and audio queues and can see that the video timestamps are falling behind the audio by a few seconds.
Considering that this only happens after a seek, I am wondering if there is something in the video processing that is taking too long and for the player to handle.
The VLC logs show that the player is dropping a lot of late video frames.

Speech Recognition Game - Unity

Good Afternoon everyone,
I am currently working on a uni project on accessibility in video games. My game uses eye tracking and speech recognition. It consists of 2 small level : a shooting game and a running level. The game i offline. The Eye tracking part works fine but I encountered an issue with the speech recognition. I am using the phrase recognizer from unity speech: https://learn.microsoft.com/en-us/windows/mixed-reality/develop/unity/voice-input-in-unity .
The problem is that there is a delay of a second to a second and a half from the moment I speak to the recognition. It happens before my onphrase recognizer is called (before my functions are called). The delay is still present when I take down wifi and cortana and I am wondering if there is any way to shorten it as it is pretty bad in a video game...
Here is the code in question:
//Speech recognition Initialization
private KeywordRecognizer keywordRecognizer;
private Dictionary<string, System.Action> actions = new Dictionary<string, System.Action>();
[...]
void Start()
{
//we add the jump function to the dictionnary
actions.Add("jump", () => Up(1.25f));
//we set the speech recognition function and start it
keywordRecognizer = new KeywordRecognizer(actions.Keys.ToArray(), ConfidenceLevel.Low);
keywordRecognizer.OnPhraseRecognized += RecognizedSpeech;
keywordRecognizer.Start();
}
private void RecognizedSpeech(PhraseRecognizedEventArgs speech)
{
Debug.LogWarning("jump");
actions[speech.text].Invoke();
}
public void EndListening()
{
actions.Clear();
//keywordRecognizer.Stop();
}
[...]"
Would anyone have a lead or an advice or is working/ has worked on something similar ?
Thank you for your time.

Screen recording as video, audio on SharpAvi - Audio not recording

Requirement:
I am trying to capture Audio/Video of windows screen with SharpAPI Example with Loopback audio stream of NAudio Example.
I am using C#, wpf to achieve the same.
Couple of nuget packages.
SharpAvi - forVideo capturing
NAudio - for Audio capturing
What has been achieved:
I have successfully integrated that with the sample provided and I'm trying to capture the audio through NAudio with SharpAPI video stream for the video to record along with audio implementation.
Issue:
Whatever I write the audio stream in SharpAvi video. On output, It was recorded only with video and audio is empty.
Checking audio alone to make sure:
But When I try capture the audio as separate file called "output.wav" and It was recorded with audio as expected and can able to hear the recorded audio. So, I'm concluding for now that the issue is only on integration with video via SharpApi
writterx = new WaveFileWriter("Out.wav", audioSource.WaveFormat);
Full code to reproduce the issue:
https://drive.google.com/open?id=1H7Ziy_yrs37hdpYriWRF-nuRmmFbsfe-
Code glimpse from Recorder.cs
NAudio Initialization:
audioSource = new WasapiLoopbackCapture();
audioStream = CreateAudioStream(audioSource.WaveFormat, encodeAudio, audioBitRate);
audioSource.DataAvailable += audioSource_DataAvailable;
Capturing audio bytes and write it on SharpAvi Audio Stream:
private void audioSource_DataAvailable(object sender, WaveInEventArgs e)
{
var signalled = WaitHandle.WaitAny(new WaitHandle[] { videoFrameWritten, stopThread });
if (signalled == 0)
{
audioStream.WriteBlock(e.Buffer, 0, e.BytesRecorded);
audioBlockWritten.Set();
Debug.WriteLine("Bytes: " + e.BytesRecorded);
}
}
Can you please help me out on this. Any other way to reach my requirement also welcome.
Let me know if any further details needed.
Obviously, author doesn't need it, but since I run to the same problem others might need it.
Problem in my case was that I was getting audio every 0.1 seconds and attempted to write both new video and audio at the same time. And getting new video data (taking screenshot) took me too long. Causing each frame was added every 0.3 seconds instead of 0.1. And that caused some problems with audio stream being not sync with video and not being played properly by video players (or whatever it was). And after optimizing code a little bit to be within 0.1 second, the problem is gone

C# how to detect which keyboard is pressed? [duplicate]

I have a barcode scanner (which acts like a keyboard) and of course I have a keyboard too hooked up to a computer. The software is accepting input from both the scanner and the keyboard. I need to accept only the scanner's input. The code is written in C#. Is there a way to "disable" input from the keyboard and only accept input from the scanner?
Note:
Keyboard is part of a laptop...so it cannot be unplugged. Also, I tried putting the following code
protected override Boolean ProcessDialogKey(System.Windows.Forms.Keys keyData)
{
return true;
}
But then along with ignoring the keystrokes from the keyboard, the barcode scanner input is also ignored.
I cannot have the scanner send sentinal characters as, the scanner is being used by other applications and adding a sentinal character stream would mean modifying other code.
Also, I cannot use the timing method of determining if the input came from a barcode scanner (if its a bunch of characters followed by a pause) since the barcodes scanned could potentially be single character barcodes.
Yes, I am reading data from a stream.
I am trying to follow along with the article: Distinguishing Barcode Scanners from the Keyboard in WinForms. However I have the following questions:
I get an error NativeMethods is inaccessible due to its protection level. It seems as though I need to import a dll; is this correct? If so, how do I do it?
Which protected override void WndProc(ref Message m) definition should I use, there are two implementations in the article?
Am getting an error related to [SecurityPermission( SecurityAction.LinkDemand, Flags = SecurityPermissionFlag.UnmanagedCode)] error CS0246: The type or namespace name 'SecurityPermission' could not be found (are you missing a using directive or an assembly reference?). How do I resolve this error?
There is also an error on the line containing: if ((from hardwareId in hardwareIds where deviceName.Contains(hardwareId) select hardwareId).Count() > 0) Error is error CS1026: ) expected.
Should I be placing all the code in the article in one .cs file called BarcodeScannerListener.cs?
Followup questions about C# solution source code posted by Nicholas Piasecki on http://nicholas.piasecki.name/blog/2009/02/distinguishing-barcode-scanners-from-the-keyboard-in-winforms/:
I was not able to open the solution in VS 2005, so I downloaded Visual C# 2008 Express Edition, and the code ran. However, after hooking up my barcode scanner and scanning a barcode, the program did not recognize the scan. I put a break point in OnBarcodeScanned method but it never got hit. I did change the App.config with the id of my Barcode scanner obtained using Device Manager. There seems to be 2 deviceNames with HID#Vid_0536&Pid_01c1 (which is obtained from Device Manager when the scanner is hooked up). I don't know if this is causing the scanning not to work. When iterating over the deviceNames, here is the list of devices I found (using the debugger):
"\??\HID#Vid_0536&Pid_01c1&MI_01#9&25ca5370&0&0000#{4d1e55b2-f16f-11cf-88cb-001111000030}"
"\??\HID#Vid_0536&Pid_01c1&MI_00#9&38e10b9&0&0000#{884b96c3-56ef-11d1-bc8c-00a0c91405dd}"
"\??\HID#Vid_413c&Pid_2101&MI_00#8&1966e83d&0&0000#{884b96c3-56ef-11d1-bc8c-00a0c91405dd}"
"\??\HID#Vid_413c&Pid_3012#7&960fae0&0&0000#{378de44c-56ef-11d1-bc8c-00a0c91405dd}"
"\??\Root#RDP_KBD#0000#{884b96c3-56ef-11d1-bc8c-00a0c91405dd}"
"\??\ACPI#PNP0303#4&2f94427b&0#{884b96c3-56ef-11d1-bc8c-00a0c91405dd}"
"\??\Root#RDP_MOU#0000#{378de44c-56ef-11d1-bc8c-00a0c91405dd}"
"\??\ACPI#PNP0F13#4&2f94427b&0#{378de44c-56ef-11d1-bc8c-00a0c91405dd}"
So there are 2 entries for HID#Vid_0536&Pid_01c1; could that be causing the scanning not to work?
OK so it seems that I had to figure out a way to not depend on the ASCII 0x04 character being sent by the scanner...since my scanner does not send that character. After that, the barcode scanned event is fired and the popup with the barcode is shown. So thanks Nicholas for your help.
You could use the Raw Input API to distinguish between the keyboard and the scanner like I did recently. It doesn't matter how many keyboard or keyboard-like devices you have hooked up; you will see a WM_INPUT before the keystroke is mapped to a device-independent virtual key that you typically see in a KeyDown event.
Far easier is to do what others have recommended and configure the scanner to send sentinel characters before and after the barcode. (You usually do this by scanning special barcodes in the back of the scanner's user manual.) Then, your main form's KeyPreview event can watch those roll end and swallow the key events for any child control if it's in the middle of a barcode read. Or, if you wanted to be fancier, you could use a low-level keyboard hook with SetWindowsHookEx() to watch for those sentinels and swallow them there (advantage of this is you could still get the event even if your app didn't have focus).
I couldn't change the sentinel values on our barcode scanners among other things so I had to go the complicated route. Was definitely painful. Keep it simple if you can!
--
Your update, seven years later: If your use case is reading from a USB barcode scanner, Windows 10 has a nice, friendly API for this built-in in Windows.Devices.PointOfService.BarcodeScanner. It's a UWP/WinRT API, but you can use it from a regular desktop app as well; that's what I'm doing now. Here's some example code for it, straight from my app, to give you the gist:
{
using System;
using System.Linq;
using System.Threading.Tasks;
using System.Windows;
using Windows.Devices.Enumeration;
using Windows.Devices.PointOfService;
using Windows.Storage.Streams;
using PosBarcodeScanner = Windows.Devices.PointOfService.BarcodeScanner;
public class BarcodeScanner : IBarcodeScanner, IDisposable
{
private ClaimedBarcodeScanner scanner;
public event EventHandler<BarcodeScannedEventArgs> BarcodeScanned;
~BarcodeScanner()
{
this.Dispose(false);
}
public bool Exists
{
get
{
return this.scanner != null;
}
}
public void Dispose()
{
this.Dispose(true);
GC.SuppressFinalize(this);
}
public async Task StartAsync()
{
if (this.scanner == null)
{
var collection = await DeviceInformation.FindAllAsync(PosBarcodeScanner.GetDeviceSelector());
if (collection != null && collection.Count > 0)
{
var identity = collection.First().Id;
var device = await PosBarcodeScanner.FromIdAsync(identity);
if (device != null)
{
this.scanner = await device.ClaimScannerAsync();
if (this.scanner != null)
{
this.scanner.IsDecodeDataEnabled = true;
this.scanner.ReleaseDeviceRequested += WhenScannerReleaseDeviceRequested;
this.scanner.DataReceived += WhenScannerDataReceived;
await this.scanner.EnableAsync();
}
}
}
}
}
private void WhenScannerDataReceived(object sender, BarcodeScannerDataReceivedEventArgs args)
{
var data = args.Report.ScanDataLabel;
using (var reader = DataReader.FromBuffer(data))
{
var text = reader.ReadString(data.Length);
var bsea = new BarcodeScannedEventArgs(text);
this.BarcodeScanned?.Invoke(this, bsea);
}
}
private void WhenScannerReleaseDeviceRequested(object sender, ClaimedBarcodeScanner args)
{
args.RetainDevice();
}
private void Dispose(bool disposing)
{
if (disposing)
{
this.scanner = null;
}
}
}
}
Granted, you'll need a barcode scanner that supports the USB HID POS and isn't just a keyboard wedge. If your scanner is just a keyboard wedge, I recommend picking up something like a used Honeywell 4600G off eBay for like $25. Trust me, your sanity will be worth it.
What I did in a similar situation is distinguish between a scan and a user typing by looking at the speed of the input.
Lots of characters very close together then a pause is a scan. Anything else is keyboard input.
I don't know exactly your requirements, so maybe that won't do for you, but it's the best I've got :)
It depends on the way you are interacting with the device. Anyway it wont be a C# solution, it will be some other library. Are you reading data from a stream? If you are just taking keystrokes, there may be nothing you can do about it.
I know this is an old thread, found it by searching barcode scanning in WIN10.
Just a few notes in case someone needs it.
These scanners from Honeywell have several USB interfaces.
One is a keyboard + Hid Point of sales (composite device).
Also there are CDC-ACM (ComPort emulation) and Hid Point of sales (alone) + more.
By default the scanners expose a serial number, so the host can distinguish between many devices (I had once +20 connected). There is a command to disable the serial number though!
The newer models behave the same in this regard.
If you want to see it live, try my terminal program yat3 (free on my site).
It can open all the interfaces mentioned above and is tailored for such devices.
A word to use keyboard interfaces:
Only use them as a last resort. They are slow, less reliable when it comes to exotic characters. The only good use is if you want to enter data into existing applications. If you code anyway, then reading from ComPort/HidPos-Device is easier.
look at this: http://nate.dynalias.net/dev/keyboardredirector.rails (NOT AVAILABLE ANYMORE) works great!
Specify the keyboard and the keys you want to block, and it works like a charm!
Also take a look at this: http://www.oblita.com/interception.html
You can create a C# wrapper for it - it also works like a charm..
I think you might be able to distinguish multiple keyboards through DirectX API, or if that doesn't work, through raw input API.
I have successfully accomplished what you folks are looking for here. I have an application that receives all barcode character data from a Honeywell/Metrologic barcode scanner. No other application on the system receives the data from the scanner, and the keyboard continues to function normally.
My application uses a combination of raw input and the dreaded low-level keyboard hook system. Contrary to what is written here, I found that the wm_input message is received before the keyboard hook function is called. My code to process the wm_input message basically sets a boolean variable to specify whether or not the received character is from the scanner. The keyboard hook function, called immediately after the wm_input is processed, swallows the scanner’s pseudo-keyboard data, preventing the data from being received by other applications.
The keyboard hook function has to be placed in an dll since you want to intercept all system keyboard messages. Also, a memory mapped file has to be used for the wm_input processing code to communicate with the dll.

MC75 Barcode Reader Issue

I am aiding in the development for a custom made application for the Motorola MC75. It is well tuned except for a random bug with the barcode reader. Periodically, the barcode reader will only activate (start a read) if the right shoulder button is pressed. The middle and left shoulder buttons somehow become disabled. This is a unique bug in that it happens randomly and only effects 2 of the three buttons. The EMDK enables all buttons simultaneously so I am clueless as to where this is coming from (Internal or code related). If anyone has any input or advice please let me know and thank you beforehand.
Thanks,
Zach
I've worked with the Motorola EMDK before on the MC55. I'm not sure why the buttons are being disabled, and since you posted this in June you probably don't need the answer anymore, but here's a possible workaround:
Instead of letting the EMDK handle the triggers on its own, you can capture all triggers by setting up an event:
// Create a trigger device to handle all trigger events of stage 2 (pressed) or RELEASED
var device = new TriggerDevice(TriggerID.ALL_TRIGGERS, new[] { TriggerState.RELEASED, TriggerState.STAGE2 });
var trigger = new Trigger(device);
trigger.Stage2Notify += OnTrigger;
Then, in your OnTrigger method, you can handle the trigger and perform the appropriate action. For example, you can activate your barcode reader when any trigger is pressed:
private void OnTrigger(object sender, TriggerEventArgs e)
{
if (e.NewState == e.PreviousState)
return;
// Pseudocode
if (e.NewState == TriggerState.RELEASED)
{
myBarcodeReader.Actions.ToggleSoftTrigger();
myBarcodeReader.Actions.Flush();
myBarcodeReader.Actions.Disable();
}
else if (e.NewState == TriggerState.STAGE2)
{
// Prepare the barcode reader for scanning
// This initializes various objects but does not actually enable the scanner device
// The scanner device would still need to be triggered either via hardware or software
myBarcodeReader.Actions.Enable();
myBarcodeReader.Actions.Read(data);
// Finally, turn on the scanner via software
myBarcodeReader.Actions.ToggleSoftTrigger();
}
}

Categories

Resources