I have been trying to find a method for online speech recognition, very similar to Google voice search, that does not require the user to install any plugin, software, or Flash. The user just has to plug in a microphone and speak for the text to be recognised.
I thought of the following approach but don't know whether it is correct. I built a DLL that takes an input audio stream and outputs the text recognised from it. I referenced this DLL in my ASP.NET project, and I am thinking of uploading an audio file from the user's side to the server, where it will be passed to the 'recognizer' DLL. Is this approach correct, or is there another approach I could follow?
The main constraint is that the user cannot be required to install anything, including dependencies such as Flash or Silverlight.
If you can require your users to run Chrome 11 or later, you could use Google's WebKit speech input to speech-enable your application. Here is a link on how to use WebKit for speech. This leverages the audio input capabilities available in HTML5. This blog explains how it works, because the author reverse-engineered it: the browser takes the audio input from the user and sends it to a service for processing, which returns the results as a JSON message. You could build your own server-side service, as you are suggesting, to imitate what Google is doing, but building a scalable speech recognition service will be no small feat.
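A minimal sketch of the server half of that idea, assuming the browser (or any client) POSTs raw audio bytes and expects a JSON reply. recognize() is a hypothetical placeholder for the recognizer DLL from the question; in ASP.NET this would be a controller or handler instead, and Python's wsgiref is used here only to keep the example self-contained.

```python
import json


def recognize(audio: bytes) -> str:
    """Hypothetical stand-in for the recognizer DLL; replace with a real call."""
    return f"received {len(audio)} bytes of audio"


def app(environ, start_response):
    """WSGI app: accept POSTed audio bytes and reply with a JSON transcript."""
    if environ["REQUEST_METHOD"] != "POST":
        start_response("405 Method Not Allowed", [("Content-Type", "text/plain")])
        return [b"POST raw audio to this endpoint\n"]
    # Read exactly the number of bytes the client said it sent.
    length = int(environ.get("CONTENT_LENGTH") or 0)
    audio = environ["wsgi.input"].read(length)
    body = json.dumps({"transcript": recognize(audio)}).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

For a quick local test, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` serves it on port 8000.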
I'm trying to use Azure Cognitive Services Speech to Text and I am hitting a roadblock in .NET Core.
I have native support for WAV files using AudioConfig.FromWavFileInput(), which is great.
However, I also need to support MP3s.
I have found the documentation on compressed audio support:
https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-use-codec-compressed-audio-input-streams?tabs=debian&pivots=programming-language-csharp
However, it refers to push audio streams, and this is where I'm getting lost.
I have also found this example of streaming codec-compressed audio:
https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/cpp/linux/compressed-audio-input/compressed-audio-input.cpp
However, it is C++ rather than C# .NET Core, and conversion is not really my strong suit.
So I'm at a bit of a loss.
Any assistance would be greatly appreciated.
This sample: https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/csharp/sharedcontent/console/speech_recognition_samples.cs has compressed-audio-specific methods here and here. The latter, a pull stream sample, seems pretty straightforward: just plug in your key, region, and file path.
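A minimal sketch of the pull-stream idea in that sample, written in Python so it runs standalone: the SDK repeatedly asks your callback to fill a buffer, and you copy the next chunk of the MP3 file into it. The class below only mirrors the read/close contract of the SDK's pull-stream callback and does not import the SDK. In C#, the corresponding pieces in the linked sample should be a PullAudioInputStreamCallback subclass, AudioStreamFormat.GetCompressedFormat(AudioStreamContainerFormat.MP3) and AudioInputStream.CreatePullStream(...); please verify the exact names against the SDK reference. The compressed-audio how-to linked in the question also covers installing GStreamer, which the SDK uses to decode compressed formats.

```python
from typing import BinaryIO


class BinaryFileReaderCallback:
    """Feeds a compressed audio file (e.g. MP3) to a pull stream in chunks.

    Mirrors the read/close contract of the Speech SDK's pull-stream callback:
    the SDK hands over a writable buffer, and read() returns how many bytes
    were copied into it (0 signals end of stream). With the Python SDK
    installed, this would subclass speechsdk.audio.PullAudioInputStreamCallback
    and call super().__init__() (assumption based on the SDK samples).
    """

    def __init__(self, source: BinaryIO):
        self._source = source

    def read(self, buffer: memoryview) -> int:
        # Read at most as many bytes as the SDK's buffer can hold.
        chunk = self._source.read(buffer.nbytes)
        buffer[:len(chunk)] = chunk
        return len(chunk)

    def close(self) -> None:
        self._source.close()
```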
If you have files, especially several of them, you can benefit from using batch transcription, which natively supports WAV, MP3, and OGG files.
The documentation links to the API reference, which also covers model customization. There you can select the region you are interested in and export a Swagger file, which you can use to generate a client in the programming language of your choice.
For your scenario you only need four APIs, and you could use the standard HttpClient to execute the requests. You would:
Create a batch transcription.
Get your transcription to check its status. If it has completed, you get the URL you need for the next step; if it has failed, you get a message describing the problem.
Get the results once the batch transcription has succeeded. The object of kind TranscriptionReport lists the files that were processed, whether each transcription succeeded, and, if not, why. The other objects contain the results of the successful transcriptions.
(Here you need to iterate over the contentUrls to download the files.)
Delete the transcription(s) after you have retrieved the results.
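The four steps above can be sketched as plain HTTP calls. This is written in Python with urllib rather than C# HttpClient, and it assumes the v3.1 REST paths and the JSON field names status, links.files, self, values, kind and links.contentUrl; confirm them against the Swagger file for your region. Batch transcription reads audio from URLs (for example blob storage links), so the files have to be uploaded somewhere reachable first. Only the pure helpers run without network access.

```python
import json
import urllib.request
from typing import List, Optional, Tuple

# Assumption: Speech-to-text REST API v3.1 layout; check your region's Swagger file.
BASE_URL = "https://{region}.api.cognitive.microsoft.com/speechtotext/v3.1"


def call(method: str, url: str, key: str, body: Optional[dict] = None):
    """Send one JSON request with the subscription key; return parsed JSON or None."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=method, headers={
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/json",
    })
    with urllib.request.urlopen(req) as resp:
        raw = resp.read()
    return json.loads(raw) if raw else None


def create_transcription(region: str, key: str, audio_urls: List[str],
                         locale: str = "en-US") -> dict:
    """Step 1: create the batch transcription for audio files reachable by URL."""
    body = {"contentUrls": audio_urls, "locale": locale, "displayName": "batch job"}
    return call("POST", BASE_URL.format(region=region) + "/transcriptions", key, body)


def files_url_if_done(transcription: dict) -> Optional[str]:
    """Step 2: None while pending, the files URL on success, an exception on failure."""
    status = transcription["status"]
    if status == "Succeeded":
        return transcription["links"]["files"]
    if status == "Failed":
        error = transcription.get("properties", {}).get("error", {})
        raise RuntimeError(error.get("message", "transcription failed"))
    return None  # "NotStarted" or "Running": GET transcription["self"] again later


def split_files(files_page: dict) -> Tuple[List[str], List[str]]:
    """Step 3: separate TranscriptionReport URLs from result URLs (paging ignored)."""
    reports, results = [], []
    for entry in files_page["values"]:
        url = entry["links"]["contentUrl"]
        (reports if entry["kind"] == "TranscriptionReport" else results).append(url)
    return reports, results


def delete_transcription(transcription: dict, key: str) -> None:
    """Step 4: remove the transcription once its results are downloaded."""
    call("DELETE", transcription["self"], key)
```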
One of the requirements for the web application I'm creating is that users should be able to create and edit documents. I've been searching around and came across the Google Drive REST API; however, I'm a little unsure what it can do.
From what I understand, the API allows my application to access a user's Google Drive account and files, open and edit them, and create documents.
However, I was hoping to use the Google Docs editor itself to create, open, and edit documents, but from what I can gather, building the editor is up to me, and I can use the Realtime API to enable the collaboration features that Google Docs offers.
Is this the case? Is Google leaving the job of actually creating the document editor up to me (sorry if I sound like a whiny child here, it's an honest question), or does the Drive API also provide their editor? I want to use their editor because it fits the application's requirements perfectly, and it would be nearly impossible for me to compete with it.
If I do have to create the editor myself, can anyone recommend any open source/free document editors with features similar to the Google Docs editor that work with C# ASP.NET, or a way I could somehow use the Google Docs editor in my application?
The short answer is no: Google does not allow your app to edit Google Docs directly, nor is there an API for re-creating the Docs editor.
Bear in mind also that realtime data is not actually stored in Google Drive. Google uses Drive as its organisation method for realtime data, but the data itself, being collaborative, is not just a simple file. What is stored in Drive is a shortcut that links to your app's realtime data. In the case of an existing file (text, etc.), a shortcut is attached to the file, but it can also be a pure shortcut file with no non-realtime data at all. Only your app can read or modify that realtime data, in much the same way that only Docs can (directly) work with its realtime data.
You can re-create the capabilities of Google Docs using the Realtime API by exporting from Docs, collaborating on the exported data with the Realtime API, and then re-importing into Docs if necessary. At that point, Google Docs itself may be superfluous.
What's involved will be something like this:
Set up an app in the Google Developers Console.
Write the editor, and incorporate it in your app.
Get the user to authorize your app to access their Drive.
Using the Picker or another method, have the user select a file.
Import that file from Docs.
Collaboratively edit it within your app.
Export it back to Docs.
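A minimal sketch of the "Import that file from Docs" step using the Drive API v3 files.export endpoint, assuming you already hold an OAuth access token and the file ID from the authorization and selection steps. HTML is just one possible export format; pick whatever MIME type your editor consumes. Writing the edited content back would go through a Drive upload (files.update), which is not shown here.

```python
import urllib.parse
import urllib.request

# Drive API v3 files.export endpoint.
EXPORT_URL = "https://www.googleapis.com/drive/v3/files/{file_id}/export"


def export_request(file_id: str, access_token: str,
                   mime_type: str = "text/html") -> urllib.request.Request:
    """Build (but do not send) a request that exports a Google Doc."""
    url = (EXPORT_URL.format(file_id=urllib.parse.quote(file_id, safe=""))
           + "?" + urllib.parse.urlencode({"mimeType": mime_type}))
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {access_token}"})


def export_doc(file_id: str, access_token: str, mime_type: str = "text/html") -> bytes:
    """Download the exported document content."""
    with urllib.request.urlopen(export_request(file_id, access_token, mime_type)) as resp:
        return resp.read()
```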
You can embed the Google Docs editor into your web app and use it to edit, comment on, or read files stored on Google Drive. You need to:
Click the Share button in the file.
Choose the email addresses you want to share the document with (or choose anyone who has the link, or even make it public).
Choose the permissions you want to grant: read, comment, or edit.
Copy that link and paste it into an <iframe src="google_link" width="x" height="y"></iframe> tag in your UI.
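If the share link is stored server-side and rendered into the page, escape it before placing it in the src attribute, because share links often contain &. A minimal sketch; the width and height defaults and the FILE_ID link used below are placeholders.

```python
from html import escape


def embed_iframe(share_link: str, width: int = 800, height: int = 600) -> str:
    """Return iframe markup for a Google Drive share link, with the URL escaped."""
    return (f'<iframe src="{escape(share_link, quote=True)}" '
            f'width="{width}" height="{height}"></iframe>')
```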
I am developing a mobile app using the Unity3D framework, a C#/JavaScript 3D engine that runs on desktop and mobile platforms.
I need to retrieve the content of an XLS file from cloud-based storage (Dropbox or Google Drive), then process it on the mobile platform to transform it into a local SQLite database.
The mobile app will regularly check the remote XLS file for modifications and fetch it locally to process it if needed.
The framework I use (Unity3D) allows me to work with both C# and JavaScript.
What would be the best strategy to implement such functionality?
I'm a total newbie with web APIs, and I've seen in the Dropbox documentation that there are several ways to interact with a cloud folder.
Is it possible to fetch the content of an XLS document into the mobile device's memory (without writing it to local storage)? What would be the easiest or most elegant way to achieve it?
Thanks in advance.
Unity3D supports managed DLLs in your project, so you can look at using one of the following libraries:
http://sergeytihon.wordpress.com/2013/09/29/dropbox-for-net-developers/
https://developers.google.com/drive/web/quickstart/quickstart-cs
and choose one.
There's also code for reading Excel files from Unity3D on the wiki: http://wiki.unity3d.com/index.php?title=Load_Data_from_Excel_2003
And code for using SQLite: http://wiki.unity3d.com/index.php/SQLite.
From there, putting it together is up to you; the documentation at each link should be more than enough to implement what you want. I'd recommend abstracting the platform differences (Google Drive vs. Dropbox) away from your main game (it doesn't sound like you've decided on one yet), in case you want to change later. In my experience, operations on the two are similar enough that switching later is reasonably straightforward, as long as you use proper abstraction techniques.
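A minimal sketch of that abstraction, in Python for brevity (in Unity it would be a C# interface). The game only talks to a hypothetical RemoteSheetSource; each provider implements revision() and download(), and the file is downloaded only when its revision changes (for example Dropbox's file rev or Google Drive's modifiedTime, depending on the provider). XLS parsing is left out: load_rows assumes the rows have already been extracted, and the two-column sheet table is an illustrative schema.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Callable, Iterable, Optional, Sequence


class RemoteSheetSource(ABC):
    """Hypothetical provider-neutral view of one remote spreadsheet file."""

    @abstractmethod
    def revision(self) -> str:
        """Return an identifier that changes whenever the remote file changes."""

    @abstractmethod
    def download(self) -> bytes:
        """Return the current file contents (kept in memory)."""


def sync_if_changed(source: RemoteSheetSource, last_revision: Optional[str],
                    process: Callable[[bytes], None]) -> str:
    """Download and process the file only if its revision changed; return it."""
    current = source.revision()
    if current != last_revision:
        process(source.download())
    return current


def load_rows(db: sqlite3.Connection, rows: Iterable[Sequence[str]]) -> None:
    """Replace the local table with rows already parsed from the spreadsheet."""
    with db:  # commits on success, rolls back on error
        db.execute("CREATE TABLE IF NOT EXISTS sheet (name TEXT, value TEXT)")
        db.execute("DELETE FROM sheet")
        db.executemany("INSERT INTO sheet (name, value) VALUES (?, ?)", rows)
```

Keeping the revision check behind the interface means switching from Dropbox to Google Drive only touches one class.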
I would like to emulate video input from a webcam for testing purposes.
So I need to be able to emulate a software video capture device in Windows and be able to dynamically generate its output.
How can I achieve this?
I would prefer a solution in C# or C++.
You can use Virtual Webcam (old link, but there are others): it takes a video or image file and presents it as a webcam device, so your system will think it's a normal device.
Then you need to create something that generates the video or images; if you need a static image, it's pretty easy to generate a BMP.
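As an illustration of how little is needed, here is a minimal sketch that builds an uncompressed 24-bit BMP in memory from a pixel function, so each frame can be generated dynamically and written to the file the virtual camera reads. The pixel callback is a hypothetical hook for your own test pattern.

```python
import struct
from typing import Callable, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0-255


def make_bmp(width: int, height: int, pixel: Callable[[int, int], Pixel]) -> bytes:
    """Build a 24-bit BMP; pixel(x, y) is called with y=0 as the top row."""
    row_size = (width * 3 + 3) & ~3          # rows are padded to a multiple of 4 bytes
    image_size = row_size * height
    offset = 14 + 40                          # file header + BITMAPINFOHEADER
    file_header = struct.pack("<2sIHHI", b"BM", offset + image_size, 0, 0, offset)
    info_header = struct.pack("<IiiHHIIiiII",
                              40, width, height,   # header size, width, height
                              1, 24,               # planes, bits per pixel
                              0, image_size,       # BI_RGB (no compression), data size
                              2835, 2835,          # ~72 DPI as pixels per metre
                              0, 0)                # palette colours used / important
    rows = []
    for y in range(height - 1, -1, -1):      # positive height means bottom-up rows
        row = bytearray()
        for x in range(width):
            r, g, b = pixel(x, y)
            row += bytes((b, g, r))          # BMP stores blue, green, red
        row += b"\x00" * (row_size - width * 3)
        rows.append(bytes(row))
    return file_header + info_header + b"".join(rows)
```

For example, `open("frame.bmp", "wb").write(make_bmp(320, 240, lambda x, y: (x % 256, y % 256, 128)))` writes a gradient test frame.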
This is an old question with no accepted answer, probably one of the oldest I've ever seen, but I came across it while looking for an answer myself. I remember when "Virtual Webcam" still existed (it's now just a Chinese ad site).
Fear not! There are new sources to solve your decade-long quest:
First, check out OBS; it's open source and does a lot with video streams:
https://obsproject.com/
Second, check out this virtual webcam plugin for it. It does exactly what you're describing, and it uses qbeuek's suggestion of DirectShow:
https://obsproject.com/forum/resources/obs-virtualcam.949/
It is written in C++, so grabbing the bits you need and rewriting them in C# is left as an exercise for the reader, but the capability is there.
As far as I know, there is a set of COM interfaces that govern the recording and playback of audio and video in Windows. It used to be called DirectShow, but the name may have changed since. Those interfaces are used to construct a graph of audio and video filters to encode and decode the data stream.
The way to go:
- read about the Microsoft DirectShow API,
- implement a COM object that implements the video source interface,
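One concrete detail of implementing such a source: DirectShow timestamps each media sample with REFERENCE_TIME values in 100-nanosecond units, set through IMediaSample::SetTime. For a source that generates frames at a fixed rate, the start and stop times of frame n are typically derived from the frame rate as below. Python is used only to show the arithmetic; in the real filter this runs in C++ wherever each frame's buffer is filled.

```python
from typing import Tuple

REFERENCE_TIME_UNITS_PER_SECOND = 10_000_000  # one REFERENCE_TIME tick is 100 ns


def frame_time(frame_number: int, fps: int) -> Tuple[int, int]:
    """Start/stop timestamps, in 100 ns units, for one generated frame."""
    frame_length = REFERENCE_TIME_UNITS_PER_SECOND // fps
    start = frame_number * frame_length
    return start, start + frame_length
```

Deriving each time from the frame counter, rather than accumulating a running total, keeps rounding error from building up.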
I'm looking at options for adding streaming video to a social web site written in ASP.NET/C#. I have a great deal of experience with Flash too, so I'm comfortable using FLV players, but I'd definitely go with Silverlight if the right library is available.
The library would need to be able to encode user-uploaded video into a web format.
I imagine playback will be Flash- or Silverlight-based.
It would need to create thumbnails of the video.
It would need to include server software for streaming the video, or some third-party way of doing so.
I don't mind paying a licensing fee for the software, so it does not have to be open source or free.
The license must allow use on a commercial web site.
The closest thing I have found is MediaSoft's offering. But I had never heard of this company before starting my search and don't know anyone using their software. They seem to use FFMPEG for encoding, which I've heard can cause legal issues for commercial web sites. However, I'm not very familiar with FFMPEG's licensing myself, so please correct me if I heard wrong.
Has anyone used MediaSoft? Any other video libraries that you have used that worked well? Did you just end up writing your own video encoding and serving library?
Not sure about Silverlight, but Flash will render both H.264 and FLV videos. FFMPEG can convert into both: x264 for H.264, and its FLV encoder (with LAME for the MP3 audio track) for FLV. It can also generate thumbnails.
FFMPEG and its modules are licensed under the LGPL/GPL, which means you can use FFMPEG to generate videos and thumbnails without restriction, as long as you have the rights to the original videos you're transcoding. The GPL/LGPL restrictions apply only to the FFMPEG code and binaries themselves, which won't matter unless you decide to distribute those binaries to other people.
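A sketch of driving FFMPEG from server code by building its command lines (in ASP.NET you would launch the same arguments with Process.Start). It assumes an ffmpeg build that includes libx264 and libmp3lame; the codec choices, the 44.1 kHz audio rate for FLV, and the one-second thumbnail offset are illustrative assumptions, not requirements.

```python
import subprocess
from typing import List


def transcode_cmd(src: str, dst: str) -> List[str]:
    """FLV (FLV1 video + MP3 audio) for .flv targets, otherwise H.264 + AAC."""
    if dst.endswith(".flv"):
        codecs = ["-c:v", "flv", "-c:a", "libmp3lame", "-ar", "44100"]
    else:
        codecs = ["-c:v", "libx264", "-c:a", "aac"]
    return ["ffmpeg", "-y", "-i", src, *codecs, dst]


def thumbnail_cmd(src: str, dst: str, at_seconds: float = 1.0) -> List[str]:
    """Grab a single frame at the given offset as an image (format from dst)."""
    return ["ffmpeg", "-y", "-ss", str(at_seconds), "-i", src, "-frames:v", "1", dst]


def run(cmd: List[str]) -> None:
    """Run ffmpeg and raise CalledProcessError if it exits with an error."""
    subprocess.run(cmd, check=True)
```

Building argument lists instead of a single shell string avoids quoting problems with user-supplied file names.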
In addition to the above answer, you can look at Red5 as a streaming solution:
http://osflash.org/red5