C# extract frames from part of a video file

C# extract frames from part of a video file - c#

Using AForge ffmpeg wrapper you can extract frames from a video using the VideoFileReader class and save it as a bitmap.
See this for the exemple:
Extracting frames of a .avi file
My problem with that is that you cannot specified where to start reading the frames. It always starts from the beginning of the video file.
But what if i wanted to extract frames that are in the middle of a two hours long video file. Using that class you'd have to parse the whole first hour juste to get to those frames.
Does anyone know a way to achieve that?

If you know where in the video you want to start reading, just skip the appropriate number of frames; there's no need to process any of them.
This assumes, of course, that you know the exact frame number you want to start reading from, which you can calculate by multiplying the framerate by the time at which you want to perform the extraction. In your example, if the video is two hours long and you want to extract frames from the middle...
VideoFileReader reader = new VideoFileReader();
reader.Open("file.avi");
// Jump to 1 hour into the video
int framesToSkip = reader.FrameRate * 3600; // 1 hour = 3600 seconds
for (int i = 0; i < framesToSkip; i++)
reader.ReadVideoFrame();
// Now the next time ReadVideoFrame() is called, we will get the frame at the 1 hour mark
This assumes that the .FrameRate property returns the value in frames per second. Unfortunately the documentation doesn't say, so I'm not sure how it handles video files with non-integral framerates (i.e. 29.97 is a common framerate.)

Related

How to display current media player position in frames rather then seconds - UWP MediaPlayerElement

I'm using the MediaPlayerElement in a UWP app. I'm hoping to convert the displayed value from TimeElapsedElement and TimeRemainingElement in frames rather then seconds.
So currently the value I recieve is: hh:mm:ss:ff (ff fractions of a second)
The value I would like to have is: FFFFF (the number of total frames elapsed thus far) or a standard time-code value such as hh:mm:ss:FF (FF being frames elapsed in the current second)
It seems I'm unable to access and modify the TimeElapsed and TimeRemaining elements in particular but wondering if those values would be suitable for some kind of conversion that gets the frame-rate of the media playing and converts the value into frames.
Alternatively, I was wondering if I could use MediaPlayer.PlaybackSession.Position to create my own variable that is in frames rather then seconds and display that value instead. I imagine there is some conversion occurring already that takes the framerate of the loaded media.
Wondering what method you would recommend and if you could point me in the direction of any resources for time to frames conversion in c#? Thanks for any help you can provide!

Currently MediaPlayerElement does not provide information on how many frames have passed, time is a more general expression.
If you need to get how many frames are currently passing, you need to do some calculations:
Frame rate * Play time
How to get frame rate
List<string> videoProperties = new List<string>();
videoProperties.Add("System.Video.FrameRate");
IDictionary<string, object> retrieveProperties = await file.Properties.RetrievePropertiesAsync(videoProperties);
double frameRate = ((uint)retrieveProperties["System.Video.FrameRate"])/1000.0;
How to get past frames
var sec = MyPlayer.MediaPlayer.PlaybackSession.Position.TotalSeconds;
var frameCounts = sec * frameRate;
In most situation it is applicable, but this can only be used as a reference value. Because the frame rate of some videos is not fixed.
Thanks.

C# - Get byte position of MediaElement

Basically, I want to get the current byte of the MediaElement at its current playback position. For example, when it is at 5 seconds, the byte position would be 1024kb. I don't want to multiply the bitrate with the current time as that is not accurate.
All I need is to get the byte position at certain durations.
So is there anyway I could get this? I'm open to other options. (Does FFProbe support this?)

I've tried everything and there is no way to do this directly using MediaElement.
The only way is to get the frame number of the video by multiplying the framerate with the timecode of the byte position you want to get.
Then use a program like BmffViewer which analyzes the moov atom of the video header. Then go to the stco entries of the track you want to analyze and get the chunk offset of the frame you calculated earlier.

How to get sound amplitude of my wav file with respect to time?

I have a wav file and all what i need is to perform a function when a remarkable intensity of sound plays.
For example : if there is a sound of intensity level 10 (supposed) is playing so i want that when ever the intensity level of sound increases from 10 then an event should be triggered to tell me that there is a remarkable sound.
I tried to google it and found that if we read the bytes of wav file and read the data chunk (after 44th byte) we get the user data (sound data). but when i analyse this data i got confused because there is also same data where there is no sound.
I hope my question is quite clear.
so please i need your suggestions/ideas and references.

You don't need an FFT for this - you can just compute the short term RMS power and when this exceeds a predetermined threshold then you have a "loud" sound.
power_RMS = sqrt(sum(x^2) / N)
where x is the sample value and N is the number of samples over which you want to compute RMS power - I would suggest using a period of say 10 ms which gives N = 441 samples at a 44.1 kHz sample rate.

How to combine SoundEffectInstances into a new Sound File /mp3 or wav

I'm working on the new WindowsPhone platform. I have a few intances of a SoundEffectInstance that I would like to combine into a new single Sound file (either SoundEffectInstance, SoundEffect or MediaElement, it does not matter.) I then want to save that file as an mp3 to the phone.
How do I do that? Normally, I would try to send all the files to a bytearray but I'm not sure if that is the correct method here, or how to convert the bytearray into an MP3 format sound.
So for example I have SoundEffectInstance soudBackground, playing from 0 - 5 seconds. I then have SoundEffectInstance chime playing from 3 - 4 seconds, and SoundEffectInstance foreground playing from 3.5 to 7 seconds. I want to combine all these into a single mp3 file that lasts 7 seconds long.

There are two task that you are trying to accomplish here:
Combine several sound files into a single sound file
Save the resulting file as an MP3.
As far as I have found thus far you will have a good bit of challenges with item 2. To date I have not found a pure .Net MP3 encoder. All the ones I find rely on P/Invokes to native code (Which of course won't work on the phone).
As for combining the files, you don't want to treat them as a SoundEffectInstance. That class is only meant for playing and it abstracts most of the details of the sound file away. Instead you will need to treat the sound files as arrays of ints. I'm going to assume that the sample rate on all three sound files is the exact same and that these are 16-bit recordings. I am also going to assume that these wave files are recorded in mono. I'm keeping the scenario simple for now. You can extend upon it with stereo and various sample rates after you've mastered this simpler scenario.
The first 48 bytes of the wave files is nothing but header. Skip past that (for now) and read the contents of the wave files into their own arrays. Once they are all read we can start mixing them together. Ignoring the time differences in which you want to start playing these sounds if we wanted to start producing a sample that is the combined result of all three we could do it by adding the values in the sound file array together and writing that out to an array to hold our result. But there's a problem. 16-bit numbers can only go up to 32,767 (and down to -32,768). If the combined value of all three sounds were to go beyond these limits you'll get really bad distortion. The easiest (though not necessarily the best) way to handle this is to consider the maximum number of simultaneous sounds that will play and scale the values down accordingly. From the 3.5 second to 4 second mark you will have all three sounds playing. So we will scale by dividing by three. Another way is to sum up the sound samples using a data type that can go beyond this range and then normalizing the values back to this range when you are done mixing them together.
Let's define some parameters.
int SamplesPerSecond = 22000;
int ResultRecordingLength = 7;
short[] Sound01;
short[] Sound02;
short[] Sound03;
int[] ResultantSoundBuffer;
short[] ProcessedResultSoundBuffer;
//Insert code to populate sound array's here.
// Sound01.Length will equal 5.0*SamplesPerSecond
// Sound02.Length will equal 1.0*SamplesPerSecond
// Sound03.Length will equal 3.5*SamplesPerSecond
ResultantSound = new int[ResultRecordingLength*SamplesPerSecond];
Once you've got your sound files read and the array prepared for receiving the resulting file you can start rendering. There's several ways we could go about this. Here is one:
void InitResultArray(int[] resultArray)
{
for(int i=0;i<resultArray.Length;++i)
{
resultArray[i]=0;
}
}
void RenderSound(short[] sourceSound, int[] resultArray, double timeOffset)
{
int startIndex = (int)(timeOffset*SamplesPerSecond);
int readIndex = 0;
for(int readIndex=0;((readIndex<sourceSound.Length)&&(readIndex+sourceSound<resultArray.Length;++readIndex)
{
resultArray[readIndex+startIndex] += (int)sourceSound[readIndex];
}
}
RangeAdjust(int[] resultArray)
{
int max = int.MinimumValue;
int min = int.MaximumValue;
for(int i=0;i<resultArray;++i)
{
max = Math.Max(max, resultArray[i]);
min = Math.Min(min, resultArray[i]);
}
//I want the range normalized to [-32,768..32,768]
//you may want to normalize differently.
double scale = 65536d/(double)(max-min);
double offset = 32767-(max*scale);
for(int i=0;i<resultArray.Length;++i)
{
resultArray[i]= (scale*resultArray[i])+offset;
}
}
You would call InitResultAttay to ensure the result array is filled with zeros (I believe it is by default, but I still prefer to explicitly set it to zero) and then call RenderSound() for each sound that you want in your result. After you've rendered your sounds call RangeAdjust to normalize the sound. All that's left is to write it to a file. You'll need to convert from ints back to shorts.
short[] writeBuffer = new short[ResultantSound.Length];
for(int i=0;i<writeBuffer.Length;++i)
writeBuffer[i]=(short)ResultantSound[i];
Now the mixed sound is all ready to write to the file. There is just one thing missing, you need to write the 48 byte wave header before writing the file. I've written code on how to do that here: http://www.codeproject.com/KB/windows-phone-7/WpVoiceMemo.aspx

Timing in C# real time audio analysis

I'm trying to determine the "beats per minute" from real-time audio in C#. It is not music that I'm detecting in though, just a constant tapping sound. My problem is determining the time between those taps so I can determine "taps per minute" I have tried using the WaveIn.cs class out there, but I don't really understand how its sampling. I'm not getting a set number of samples a second to analyze. I guess I really just don't know how to read in an exact number of samples a second to know the time between to samples.
Any help to get me in the right direction would be greatly appreciated.

I'm not sure which WaveIn.cs class you're using, but usually with code that records audio, you either A) tell the code to start recording, and then at some later point you tell the code to stop, and you get back an array (usually of type short[]) that comprises the data recorded during this time period; or B) tell the code to start recording with a given buffer size, and as each buffer is filled, the code makes a callback to a method you've defined with a reference to the filled buffer, and this process continues until you tell it to stop recording.
Let's assume that your recording format is 16 bits (aka 2 bytes) per sample, 44100 samples per second, and mono (1 channel). In the case of (A), let's say you start recording and then stop recording exactly 10 seconds later. You will end up with a short[] array that is 441,000 (44,100 x 10) elements in length. I don't know what algorithm you're using to detect "taps", but let's say that you detect taps in this array at element 0, element 22,050, element 44,100, element 66,150 etc. This means you're finding taps every .5 seconds (because 22,050 is half of 44,100 samples per second), which means you have 2 taps per second and thus 120 BPM.
In the case of (B) let's say you start recording with a fixed buffer size of 44,100 samples (aka 1 second). As each buffer comes in, you find taps at element 0 and at element 22,050. By the same logic as above, you'll calculate 120 BPM.
Hope this helps. With beat detection in general, it's best to record for a relatively long time and count the beats through a large array of data. Trying to estimate the "instantaneous" tempo is more difficult and prone to error, just like estimating the pitch of a recording is more difficult to do in realtime than with a recording of a full note.

I think you might be confusing samples with "taps."
A sample is a number representing the height of the sound wave at a given moment in time. A typical wave file might be sampled 44,100 times a second, so if you have two channels for stereo, you have 88,200 sixteen-bit numbers (samples) per second.
If you take all of these numbers and graph them, you will get something like this:
(source: vbaccelerator.com)
What you are looking for is this peak ------------^
That is the tap.

Assuming we're talking about the same WaveIn.cs, the constructor of WaveLib.WaveInRecorder takes a WaveLib.WaveFormat object as a parameter. This allows you to set the audio format, ie. samples rate, bit depth, etc. Just scan the audio samples for peaks or however you're detecting "taps" and record the average distance in samples between peaks.
Since you know the sample rate of the audio stream (eg. 44100 samples/second), take your average peak distance (in samples), multiply by 1/(samples rate) to get the time (in seconds) between taps, divide by 60 to get the time (in minutes) between taps, and invert to get the taps/minute.
Hope that helps

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

C# extract frames from part of a video file - c#

Related

How to display current media player position in frames rather then seconds - UWP MediaPlayerElement

C# - Get byte position of MediaElement

How to get sound amplitude of my wav file with respect to time?

How to combine SoundEffectInstances into a new Sound File /mp3 or wav

Timing in C# real time audio analysis

Categories

Resources