Extract word timings from MP3 audio file - C#

I want to capture the timing of each word in an audio file.
For example, suppose one MP3 file (page1.mp3) contains the words "Introduction Hi Bangalore". How can I extract the start time of each word,
i.e., the start times of "Introduction", "Hi", and "Bangalore"?
Is this possible, and if so, how can I do it?
Is there any software available for this?
I have tried to extract the timings with the System.Speech API using the BookmarkReached event, but it is not as accurate as I want.

Related

Search an Audio inside another Audio in C# or C++

Suppose there is a sample audio file that contains up to 10 simple words
"One Two Three .... Ten"
and there is 1 second silence between each number in the audio file.
I want to check whether the audio file contains the keyword "Two", for example.
Please note that I have a voice file of the keyword "Two", recorded in exactly the same voice as the master file, but it may contain some noise.
Is there a way to search for the voice clip "Two" inside the bigger audio file and find the time at which it occurs?

Since no code was provided, I'll just give you an idea of how to proceed; I hope it helps.
First, split your file into 10 separate audio files at the silences (I'm sure there are libraries that will help you do that).
Then you can send each file to the Google speech recognition API and get back a string containing the words spoken in that file.
EDIT: Please refer to:
https://googlespeechtotext.codeplex.com/
How to use google speech recognition api in c#?

Why not convert both audio files to raw sample data and check whether the two signals share a common segment?
Here are some links you should check before going any further, just to get started with audio in .NET:
http://crsouza.com/2009/08/converting-audio-bit-depths-in-c/
https://cscore.codeplex.com/
http://www.codeproject.com/Articles/501521/How-to-convert-between-most-audio-formats-in-NET
Let me know if you can work this out.
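
One way to make "check whether the two signals share a common segment" concrete is normalized cross-correlation: slide the short clip over the long recording and score how well the shapes match at each offset. The sketch below is only an illustration under stated assumptions: both files are assumed to be already decoded to mono float samples at the same sample rate, and the class and method names (ClipSearch, FindBestMatch) are hypothetical, not part of any library mentioned above.

```csharp
using System;

static class ClipSearch
{
    // Returns the sample offset in `haystack` where `needle` matches best,
    // scored by normalized cross-correlation (1.0 means identical shape).
    public static (int Offset, double Score) FindBestMatch(float[] haystack, float[] needle)
    {
        if (needle.Length == 0 || needle.Length > haystack.Length)
            throw new ArgumentException("needle must be non-empty and no longer than haystack");

        double needleEnergy = 0;
        foreach (float s in needle) needleEnergy += (double)s * s;

        int bestOffset = 0;
        double bestScore = double.NegativeInfinity;

        for (int offset = 0; offset + needle.Length <= haystack.Length; offset++)
        {
            double dot = 0, windowEnergy = 0;
            for (int i = 0; i < needle.Length; i++)
            {
                double h = haystack[offset + i];
                dot += h * needle[i];
                windowEnergy += h * h;
            }
            // A silent window has zero energy; score it 0 instead of dividing by zero.
            double denom = Math.Sqrt(needleEnergy * windowEnergy);
            double score = denom > 0 ? dot / denom : 0;
            if (score > bestScore) { bestScore = score; bestOffset = offset; }
        }
        return (bestOffset, bestScore);
    }
}
```

Dividing the returned offset by the sample rate gives the occurrence time in seconds. This brute-force loop costs one pass over the clip per offset, so for long recordings an FFT-based correlation would be the usual replacement; a low best score suggests the clip is not present at all.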

Train voice recognition

I have built a voice recognition system in C# and I’m using the Microsoft Speech Platform 11.0 (Swedish language packs). I use a wav file as an input for the SpeechRecognitionEngine.
The problem is that some of the words (40%) are not recognized at all.
I would like to record some commands (a limited number of Swedish words and/or numbers) to a sound file and import them so that the SpeechRecognitionEngine could be able to understand them.
For example:
Record a user saying the word “Katt” (Swedish for cat) and then tell the recognition engine that this recording means “Katt”.
Is this possible or are there better solutions?

Yes, it's possible. Create a lookup file for your words, implement Soundex in C#, and load the lookup grammar. The recognizer will then report whichever word you pronounced, provided it is in the lookup.
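
For reference, a minimal sketch of the classic (American) Soundex code that this answer refers to. The class name Phonetic is hypothetical. Note that Soundex was designed for English names: this version simply skips non-ASCII letters such as Å, Ä and Ö, so how useful it is for Swedish words is an open assumption, not something the answer demonstrates.

```csharp
using System;
using System.Text;

static class Phonetic
{
    // Digit for each letter A..Z; '0' marks letters that carry no code
    // (vowels, H, W, Y).
    private const string Codes = "01230120022455012623010202";

    public static string Soundex(string word)
    {
        if (string.IsNullOrEmpty(word)) return string.Empty;

        var result = new StringBuilder(4);
        char previousCode = '0';

        foreach (char raw in word.ToUpperInvariant())
        {
            if (raw < 'A' || raw > 'Z') continue;   // skip anything outside A..Z
            char code = Codes[raw - 'A'];

            if (result.Length == 0)
            {
                result.Append(raw);                 // first letter is kept verbatim
                previousCode = code;
                continue;
            }
            if (raw == 'H' || raw == 'W') continue; // H and W do not separate equal codes
            if (code != '0' && code != previousCode) result.Append(code);
            previousCode = code;                    // a vowel resets this to '0'
            if (result.Length == 4) break;
        }
        // Pad short codes with zeros to the fixed length of four.
        return result.Length == 0 ? string.Empty : result.ToString().PadRight(4, '0');
    }
}
```

The resulting codes could serve as keys in the lookup file, so that differently spelled recognition results that sound alike map to the same command.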

Detect DTMF Tones C#

I have a WAV file, and using the NAudio library I have been able to get the raw data out of it.
Does anyone know how to loop through the data in chunks to detect DTMF tones?

The NuGet package DtmfDetection.NAudio provides extension methods and wrappers to detect DTMF tones in live (captured) audio and pre-recorded audio files.
On the GitHub site of the project you can find a sample program.

Well, the top Google result is this:
http://sourceforge.net/projects/dtmf-cs/
But if you want to use the heavy artillery, you can always run an FFT over your samples and check which two frequencies are the strongest.
BTW, do some searching before you post, and you'll come up with:
Detect a specific frequency/tone from raw wave-data
or even
Is it possible to detect DTMF tones using C#

I've gone with http://www.tapiex.com/ToneDecoder.Net.htm
It's cheap and does a good job at detection. All the others I found either don't seem to do the job or have no documentation.

DTMF stands for dual-tone multi-frequency signaling, so you have to detect the two frequencies used to send each signal.
To do that, transform your time-domain audio into the frequency domain, typically with an FFT algorithm.
Here is a very old VB5 program with source code online which I think does exactly what you want: http://www.qsl.net/kb5ryo/dtmf.htm
EDIT: OK, maybe it's better to take a look at the suggested C# lib.
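
As a rough illustration of "detect the two frequencies", here is a sketch that uses the Goertzel algorithm rather than a full FFT, since only the eight standard DTMF frequencies need to be measured. It assumes the chunk is already available as mono float samples; the class name Dtmf and the minPower threshold are hypothetical and would need tuning against real recordings.

```csharp
using System;

static class Dtmf
{
    // Standard DTMF row (low group) and column (high group) frequencies.
    static readonly double[] RowHz = { 697, 770, 852, 941 };
    static readonly double[] ColHz = { 1209, 1336, 1477, 1633 };
    static readonly char[,] Keys =
    {
        { '1', '2', '3', 'A' },
        { '4', '5', '6', 'B' },
        { '7', '8', '9', 'C' },
        { '*', '0', '#', 'D' },
    };

    // Goertzel algorithm: squared magnitude of one frequency in a block.
    public static double Power(float[] block, double hz, int sampleRate)
    {
        double coeff = 2.0 * Math.Cos(2.0 * Math.PI * hz / sampleRate);
        double s1 = 0, s2 = 0;
        foreach (float x in block)
        {
            double s0 = x + coeff * s1 - s2;
            s2 = s1;
            s1 = s0;
        }
        return s1 * s1 + s2 * s2 - coeff * s1 * s2;
    }

    // Returns the key for one block, or null if no tone pair is strong enough.
    public static char? DetectKey(float[] block, int sampleRate, double minPower)
    {
        int row = Strongest(block, RowHz, sampleRate, out double rowPower);
        int col = Strongest(block, ColHz, sampleRate, out double colPower);
        if (rowPower < minPower || colPower < minPower) return null;
        return Keys[row, col];
    }

    static int Strongest(float[] block, double[] freqs, int sampleRate, out double best)
    {
        int bestIndex = 0;
        best = double.NegativeInfinity;
        for (int i = 0; i < freqs.Length; i++)
        {
            double p = Power(block, freqs[i], sampleRate);
            if (p > best) { best = p; bestIndex = i; }
        }
        return bestIndex;
    }
}
```

To scan a file, call DetectKey on consecutive fixed-size chunks (a few hundred samples at 8 kHz is a plausible starting point, stated here as an assumption) and collapse runs of the same key into a single key press.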

Acoustic Audio Comparing Library

I need software or a library that handles audio comparison without relying on the tags inside the MP3. It should report the similarity (or a confidence score) between two audio files, and if I cut a piece out of an audio file, it should point to where in the main file that piece was taken from (I hope I was clear enough).
From what I have heard, this technology is called acoustic audio comparison, and it is based on an audio sample, which we can call a fingerprint. The software should tell me if it finds an equivalent of the input sample or fingerprint somewhere in the file.
Bests.

libfooid is free. It's dual-licensed under the GPL and a BSD-like license.

Check "An Industrial-Strength Audio Search Algorithm" (PDF) at http://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf

How do I split an mp3 file into smaller files?

I want to make a program that takes an MP3 file and breaks it into many smaller MP3 files wherever there are 1-2 seconds of no sound (silence).
What is the easiest way to do this in C#?

Bass library. Bass has everything you need. It can access, record and edit media streams like MP3s, allowing you to sample the volume at different time points. It has a .NET API, so you can use it in C#. Unfortunately it does cost money if you are using it for a commercial application, but they do provide a free non-commercial license.
Sox is a command-line tool which has an option to split an MP3 on n seconds of silence. You could always call sox from C# by launching it as an external process.
Other related links:
Ripping a CD to mp3 in C# - third party component or api out there?
Audio Libraries for MP3 editing
How do I merge/join mp3 files with c#
This code shows a way to make a CD ripper in C#. There are APIs from some vendors that allow reading audio CD tracks but it is also possible to do it using APIs that allow low level access to CD drives such as ASPI from Adaptec or IOCTL control codes. The latter method is used in this case, because there is no need to install any third party software, it is completely covered by Win32 API functions.
http://www.codeproject.com/KB/cs/csharpripper.aspx

Splitting the MP3 stream will be difficult to do with any degree of precision. The compressed MP3 data exists as sequential chunks of audio data, each comprising many samples. The easiest way to perform this would be to decode the stream either progressively or in its entirety, perform your manipulation, then re-encode it (which, as I understand it, is how most jukebox software does it).

Having a solid knowledge of the file's binary format would be a good place to start. That done, you'll know what silence looks like in the file. You may have to define exactly what silence is. Presuming that, like most audio, it started from an analog source, there's almost certainly some noise buried in the "silence". What will your tolerance for ambient/background noise be?
Once you know what you're looking for, just scan through the file, looking for "it".
Simple ...
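
A minimal sketch of the "scan for silence" idea, assuming the MP3 has already been decoded to mono float samples in the range -1 to 1 (the decoding step is not shown). The class name SilenceFinder and the threshold parameter are hypothetical; the threshold is where the tolerance for background noise discussed above gets expressed.

```csharp
using System;
using System.Collections.Generic;

static class SilenceFinder
{
    // Returns (start, length) sample ranges where every sample stays
    // below `threshold` in absolute value for at least `minSamples`.
    public static List<(int Start, int Length)> Find(float[] samples, float threshold, int minSamples)
    {
        var runs = new List<(int Start, int Length)>();
        int runStart = -1;

        // Iterate one step past the end so a trailing quiet run is closed too.
        for (int i = 0; i <= samples.Length; i++)
        {
            bool quiet = i < samples.Length && Math.Abs(samples[i]) < threshold;
            if (quiet)
            {
                if (runStart < 0) runStart = i;
            }
            else if (runStart >= 0)
            {
                if (i - runStart >= minSamples) runs.Add((runStart, i - runStart));
                runStart = -1;
            }
        }
        return runs;
    }
}
```

For the 1-2 second gaps in the question, minSamples would be roughly the sample rate times the minimum silence length in seconds, and the midpoint of each returned run is a natural split point. Comparing a windowed average level against the threshold, instead of individual samples, would likely tolerate noise spikes better.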

A program to do this already exists:
http://mp3splt.sourceforge.net/mp3splt_page/home.php
