MIDI or WAV file to an array of frequencies and duration - c#

Is there any script/software/algorithm that converts a MIDI (or WAV) file to a list of <frequency, duration> pairs, so that we can replay an 'image' of the sound file through, for example, the System.Console.Beep(frequency, duration) function in C#?

You need to convert the MIDI, WAV or other sound file to raw audio samples. Then for successive blocks of samples (typically overlapping each block by 50%), apply a window function (e.g. Hanning), then an FFT, then take the magnitude of the FFT output bins, then for audio you would usually take 20*log10 of this magnitude to get a dB value.
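A minimal sketch of that per-block pipeline, using a naive O(N²) DFT purely for illustration (real code would use an FFT library):

```csharp
using System;

class SpectrumSketch
{
    // One block of raw samples -> dB magnitudes, as described above:
    // Hann window -> DFT -> magnitude -> 20*log10.
    public static double[] BlockToDb(double[] samples)
    {
        int n = samples.Length;
        var windowed = new double[n];
        for (int i = 0; i < n; i++)
        {
            double hann = 0.5 * (1.0 - Math.Cos(2.0 * Math.PI * i / (n - 1)));
            windowed[i] = samples[i] * hann;
        }

        // Magnitude of each frequency bin up to the Nyquist bin.
        var db = new double[n / 2 + 1];
        for (int k = 0; k < db.Length; k++)
        {
            double re = 0, im = 0;
            for (int i = 0; i < n; i++)
            {
                double angle = 2.0 * Math.PI * k * i / n;
                re += windowed[i] * Math.Cos(angle);
                im -= windowed[i] * Math.Sin(angle);
            }
            double mag = Math.Sqrt(re * re + im * im);
            db[k] = 20.0 * Math.Log10(mag + 1e-12); // epsilon avoids log(0)
        }
        return db;
    }
}
```

For the Console.Beep use case, the bin with the highest dB value in each block gives you the dominant frequency; the block hop size (in samples, divided by the sample rate) gives you the duration.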

For MIDI, you must either parse the file yourself (which I have done, and I recommend the following two references: one and two), or get a MIDI toolkit. I don't know of any for .NET but here is a Google search.
Once you get that, it should be fairly easy. Read in the MIDI file using the toolkit, and this will give you a set of tracks. Each track contains a sequence of events, each with a timestamp relative to the previous event. An event can be "note on", "note off", or one of hundreds of other events you probably don't care about and can ignore for this exercise. Just look for the "note on" and "note off" events. Usually, each note is a "note on" (with a certain pitch and velocity, which is volume) followed by a "note off" some time later (with the same pitch, and a velocity of 0).
So armed with this information, you can construct a table of notes with a quadruple (start time, duration, pitch, velocity), where start time is the time of the "note on" event, duration is the time difference between "note on" and "note off", and pitch/velocity is the pitch/velocity of the "note on". You can convert the pitch to frequency using this formula.
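The pitch-to-frequency conversion is the standard equal-temperament formula, with A4 (MIDI note 69) at 440 Hz:

```csharp
using System;

class MidiPitch
{
    // MIDI note number -> frequency in Hz.
    // Equal temperament, A4 (note 69) = 440 Hz; each semitone is a
    // factor of 2^(1/12), so f = 440 * 2^((note - 69) / 12).
    public static double NoteToFrequency(int note) =>
        440.0 * Math.Pow(2.0, (note - 69) / 12.0);
}
```

For example, note 60 (middle C) comes out to about 261.63 Hz, and note 81 (A5) to exactly 880 Hz.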
As for WAV/MP3/AAC/OGG, the same technique applies to all of them, which is what Paul suggests in his answer.

Paul R's explanation is fine for WAV.
For MIDI, you're going to have to pick a track and read in the MIDI data. How you decide which track is up to you, but you can really only pick one, since you only get one "note" at a time out of the PC speaker, using your method.
C# MIDI Tutorial: http://www.codeproject.com/KB/audio-video/MIDIToolkit.aspx
Once you have read up on that, you should know how to read a MIDI file in. From there, you can translate that to frequencies and durations. The duration depends on tempo and the number of ticks that a note lasts, and the pitch will depend on a note number and its corresponding frequency according to equal temperament. (If you wanted to get really crazy, you could even handle alternate tunings, but I wouldn't worry about it for now.)
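As a sketch of that tick-to-duration math: the MIDI header gives you ticks per quarter note (PPQ), and a tempo meta-event gives microseconds per quarter note (500,000 by default, i.e. 120 BPM):

```csharp
class MidiTiming
{
    // Duration in milliseconds of a note lasting `ticks` ticks, given
    // microseconds-per-quarter-note (from the tempo meta-event) and
    // ticks-per-quarter-note (PPQ, from the MIDI file header).
    public static double TicksToMilliseconds(long ticks, int microsecondsPerQuarter, int ppq) =>
        ticks * (microsecondsPerQuarter / 1000.0) / ppq;
}
```

So at the default tempo with a PPQ of 480, a note lasting 480 ticks is one quarter note, i.e. 500 ms. Note that tempo events can occur mid-track, so a real implementation has to track the current tempo as it walks the event list.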
Also, I believe NAudio has some MIDI classes for reading files, but they may not be complete.
While we're getting crazy... if you could thread it effectively (which I'd imagine would be nearly impossible), you could use PWM to drive the PC speaker and emulate PCM audio playback for WAV files. I remember some old DOS games from Necrobones used to do this, and there was a driver for Windows 3.1 that worked great on my 33 MHz laptop for the usual clicks and dings. That said, this method would be very difficult from a managed framework (or even within Windows without realtime priority).

Related

How to stream music in C# with PostgreSQL

I am trying to recreate some features of Spotify in C# using the PostgreSQL database.
The reason is simple, I want to gain more knowledge, and I think this is a good challenge.
But I've run into an obstacle that I've been stuck on for days. Spotify doesn't download the whole track before playing; it streams it, playing the music while it downloads.
However, I can't get this working in C#, using the PostgreSQL database.
I'm quite stuck on this part. I've tried several implementations, but I don't think I'm on the right track, and I must be searching for the wrong thing online, or I would have found it by now.
Do you have any guidance for this streaming process in C#? I've tried to read the large_object bytes from PostgreSQL, but couldn't.
Any suggestions or guides about the process are welcome.
Start by getting either the file itself or its network location into the database, whichever gives you better performance. Then create an implementation of a byte stream: you want to be transmitting raw data to C#.
Next, build a real-time interpreter that consumes your file format a chunk at a time and plays the value associated with that section. Does that make sense? This is simple to do with many libraries, and the brunt of the work is just figuring those out.
You seem like you've PROBABLY got that first part down, and are instead having issues with the database. A lot of what we did at my last company involved saving network locations of files and indexing the files on disk. You might be able to point your streamer at a local file via a server instead, and transmit the data from one point to the other that way.
You seem more than capable of doing this, judging by how you describe the problem. I hope this comment was helpful; if not, I apologize. I would be interested in seeing your finished result.
For clarification, here is what that workflow would look like:
1. A request comes in for a song listed in table dbo.Songs.
2. That song is matched against dbo.SongLocation.
3. The audio is streamed from dbo.SongLocation.Location, where dbo.SongLocation.SongName = dbo.Song.Name and the directory check returns true.
4. Enjoyment of that music.
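The core of the streaming idea above can be sketched like this: read the source in small chunks and hand each chunk to the player as it arrives, rather than waiting for the whole file. The streams here are illustrative stand-ins; the source could equally be a PostgreSQL large object stream or an HTTP response body:

```csharp
using System;
using System.IO;

class ChunkedStreamer
{
    // Copy from `source` to `sink` in small chunks, so playback can begin
    // as soon as the first chunks arrive instead of after a full download.
    public static long StreamInChunks(Stream source, Stream sink, int chunkSize = 8192)
    {
        var buffer = new byte[chunkSize];
        long total = 0;
        int read;
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            sink.Write(buffer, 0, read);   // in a real player, feed the audio buffer here
            total += read;
        }
        return total;
    }
}
```

The chunk size is a trade-off between latency (smaller chunks start playback sooner) and per-read overhead.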

Obtain the current audio playback stream

So, I need to obtain the current audio playback happening on my PC programmatically, in real time. This includes all the audio (you can check this in the playback devices tab of the Sound settings). I need it because I'm doing an analysis of the stream which I then feed into another piece of software. Until now, I've used the 'Stereo Mix' option, which relays the current audio as an input (something like an internal microphone). However, I can't use this method when I connect external speakers to my PC through HDMI (PC/AUX works, though).
Is there some way to obtain the audio stream whether or not external speakers are connected?
The programming language does not matter in the current case, everything is fine with me. However, I prefer if there is a C# / Processing solution.
EDIT:
Here's the technique (and method) I currently use to obtain the audio: http://code.compartmental.net/minim/minim_method_getlinein.html. The library/code is for Processing: https://processing.org/.
Basically, NAudio would be a good place to look for a prospective solution. It's not quite clear what you intend to do with the audio (whether you're recording/dumping data or simply analyzing live data), but NAudio should have something along the lines of what you're looking for, as far as getting your hands on live data goes.
NAudio has an FFT, but it's not quite as robust in the area of analysis as the JS libs you may be accustomed to ;)
http://naudio.codeplex.com/
https://github.com/naudio/NAudio
There are plenty of examples provided to get you started, and many in the wild.
Though it's pretty outdated and the API may or may not look slightly different (in regard to...), the following video may provide a nice, relaxing quick-start to help familiarize you with this lib.
C# Audio Tutorial 6 - Audio Loopback using NAudio
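A minimal loopback sketch along those lines, assuming the NAudio NuGet package and a Windows machine with an active audio device (so it is not runnable as-is in a headless environment): NAudio's WasapiLoopbackCapture records whatever is being rendered to the default output device, regardless of which physical output is connected, which avoids the Stereo Mix limitation described above.

```csharp
using System;
using NAudio.Wave;  // requires the NAudio NuGet package

class LoopbackSketch
{
    static void Main()
    {
        // Captures the audio currently being rendered to the default output device.
        using var capture = new WasapiLoopbackCapture();
        capture.DataAvailable += (sender, e) =>
        {
            // e.Buffer holds e.BytesRecorded bytes of raw audio in
            // capture.WaveFormat (typically 32-bit IEEE float).
            // Analyze or dump the data here.
            Console.WriteLine($"Got {e.BytesRecorded} bytes");
        };
        capture.StartRecording();
        Console.ReadLine();      // record until Enter is pressed
        capture.StopRecording();
    }
}
```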

Writing to a microphone's output buffer

I'm wanting to create a fun little project to function as a Skype sound-board. That is to say, if you press a hotkey (say, NumPad 1), the sound-board plays a pre-determined WAV file over the call. Really only to be used for stupid in-jokes and other silliness with friends.
The way I envision handling this problem is writing to the microphone's output buffer. However, I cannot find any ideas on how to do this. I found this question regarding general audio handling, but the output examples for NAudio are rather generic and don't handle writing to a specific device.
Ideally, I want to get the default audio input device for the system (so the default microphone) and then write the WAV data to the buffer it's using for transmission.
The first problem appears to be solvable with the XNA framework and its Microphone object. It has a Default static property that should get me what I need. But the Microphone object itself doesn't have an obvious way to write to the buffer, which leaves me a little stuck.
Are there any ideas on how to do this? Am I running down the wrong path? Is the Microphone object even the correct thing to use here?

Run a script when there is no audio

I am trying to download a long tutorial, containing a lot of links, from a website, and I would like to do that automatically.
I need to create a script that listens to the audio; if the program does not hear anything for 5 seconds, it should click the next button (I know how to simulate a click).
I have never worked with audio. Could you please advise me on an API/function that would listen to the sound and return a value (true/false, or anything like that) when it does not hear anything?
Many thanks
This is recording: http://msdn.microsoft.com/en-us/library/ff827802.aspx
And then you would have to know the exact WaveFormat of the recorded sound. If you've got the exact WaveFormat (e.g. 16-bit PCM mono), you could iterate through the samples and check whether each is within a specific range. If all of the samples are, for example, smaller than 0.1, it is silence. If not... click.
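A sketch of that silence check for 16-bit little-endian PCM, with the 0.1 threshold suggested above (the class and parameter names are illustrative):

```csharp
using System;

class SilenceDetector
{
    // True if every 16-bit little-endian PCM sample in the buffer is
    // below the threshold (on a -1.0..1.0 scale), i.e. the block is silence.
    public static bool IsSilence(byte[] buffer, int bytesRecorded, float threshold = 0.1f)
    {
        for (int i = 0; i + 1 < bytesRecorded; i += 2)
        {
            short sample = (short)(buffer[i] | (buffer[i + 1] << 8));
            float value = sample / 32768f;
            if (Math.Abs(value) >= threshold) return false;
        }
        return true;
    }
}
```

You would call this on each buffer the recorder hands you, and fire the click only after enough consecutive silent buffers to cover 5 seconds of audio.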
You want to download the tutorial and further information from a website? Why write a script for that by hand? Take a look at existing tools, e.g. http://pagenest.com/
I don't know if that fits your requirements, but there are quite a few tools for downloading website content.
BR

VoiceXML - Recognize DTMF in Recording

I've been doing IVR work for a while, but we have a case where I'd love some expertise/feedback:
Is it possible to record a message where the user could press a DTMF tone to indicate a pause where we would insert our own sound? In this scenario, the user would record something like: "Good Morning, [DTMF], please call the office at [DTMF] to reconcile your account.".
Not sure whether we would chop the resulting WAV file into pieces to insert our variables, or do some post-processing before sending out our message.
Does anyone have any experience with something like this?
Thanks
Jim Stanley
Blackboard Connect
In VoiceXML you would use a record element to record a message from a user. The record element has an attribute called dtmfterm which, if set to true (the default), terminates the recording when a DTMF key is pressed. If this attribute is set to false, recording is terminated when the maxtime setting is reached, or after silence for the duration of finalsilence; the DTMF will just end up as part of the recording.
I have created applications that use caller created recordings but never one that manipulates the recordings like in your requirements. What you may be able to do is concatenate recordings together. Here is a QA that shows how to concatenate wav recordings using C#.
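For same-format PCM WAV files, concatenation amounts to keeping one header, appending the sample data, and patching the size fields. A simplified sketch, assuming the canonical 44-byte header layout (RIFF/fmt/data with no extra chunks; a robust version would walk the RIFF chunks properly):

```csharp
using System;
using System.IO;
using System.Collections.Generic;

class WavConcat
{
    // Concatenate canonical PCM WAV files that share the same format.
    public static byte[] Concatenate(IEnumerable<byte[]> wavs)
    {
        byte[] header = null;
        using var data = new MemoryStream();
        foreach (var wav in wavs)
        {
            if (header == null) header = wav[..44];   // keep the first file's header
            data.Write(wav, 44, wav.Length - 44);     // append just the sample data
        }
        if (header == null) throw new ArgumentException("no input files");

        using var output = new MemoryStream();
        output.Write(header, 0, 44);
        data.Position = 0;
        data.CopyTo(output);
        var result = output.ToArray();

        // Patch the RIFF chunk size (bytes 4-7) and data chunk size (bytes 40-43).
        BitConverter.GetBytes(result.Length - 8).CopyTo(result, 4);
        BitConverter.GetBytes(result.Length - 44).CopyTo(result, 40);
        return result;
    }
}
```

Inserting a pre-recorded silence wav between the user's snippets then just means adding it to the input list at the right positions.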
What you will have to experiment with is whether you can catch which DTMF key was pressed by using grammars. The spec alludes to this, but it may be somewhat specific to the VoiceXML IVR platform that you are using. If you know which DTMF key was used, you can instruct the user to press * to insert silence and # to terminate recording. Both will terminate a recording, but the logic in your VoiceXML will go right back into recording if * is pressed, and stop the recording process completely if # is pressed. Then you would use concatenation to string these recordings together, inserting a wav file of pre-recorded silence between the user's recorded snippets.
From the tags it looks like you are using C# and MVC for your VoiceXML application. There is an open source project called VoiceModel that makes it easier to develop VoiceXML applications using ASP.NET MVC 4. You can read about how it handles recording in this environment here.
If you want to insert a pause while staying within the UI tag: in the IVR work I've done so far, the only DTMF key that let us stay within the UI is *. We would return a grammar "REPEAT" on pressing '*', and in the UI condition tag for REPEAT you would add the silence (pause) wav file.
For the recording part, we used osdmtype = record, which mapped to an XSLT that handled the recording and recognised the customer's yes/no answer.
Nevertheless, I'm a bit confused about the exact requirement and would need more details.
Sorry, I can't add comments as I don't have enough rep.
You can mail me, or I can add more answers here.
