Does anyone know how I can retrieve the frame dimensions of an MPEG-4 video (non-H.264, i.e. MPEG-4 Part 2) from the raw video bitstream?
I'm currently writing a custom media source for Windows Media Foundation. I have to provide a media type which needs the frame size; it doesn't work without it.
Any ideas?
Thanks
I'm not sure I follow. Are you trying to find the width and height of the video being streamed? If so (and I guess that is the "dimension" you are looking for), here's how:
Parse the stream for the integer 000001B0 (hex); it's always the first thing you get streamed. If not, look at the SDP of the stream (if you have one) and search for the config= field; there it is, only now it is a Base16 string!
Read all the bytes until you get to the integer 000001B6 (hex).
You should get something like this (hex): 000001B0F5000001B5891300000100000001200086C40FA28 A021E0A2
This is the "stream configuration header" or frame or whatever, exact name is Video Object Sequence. It holds all the info a decoder would need to decode the video stream.
Read the last 4 bytes (in my example they are separated by one space -- A021E0A2)
Now observe these bytes as one 32-bit unsigned integer...
To get width read the first 8 bits, and then multiply what you get with 4
Skip next 7 bits
To get height read next 9 bits
In pseudo code:
WIDTH = readBitsUnsigned(array, 8) * 4;
readBitsUnsigned(array, 7);
HEIGHT = readBitsUnsigned(array, 9);
There you go... width and height. (:
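For reference, here is a minimal sketch of that recipe in C#. The BitReader helper is my own illustration (not a library class), and the input bytes are the last four bytes from the example header above:

using System;

// Minimal MSB-first bit reader (illustrative helper, not part of any library).
class BitReader
{
    private readonly byte[] data;
    private int bitPos;

    public BitReader(byte[] data) { this.data = data; }

    public uint ReadBits(int count)
    {
        uint value = 0;
        for (int i = 0; i < count; i++)
        {
            int bit = (data[bitPos / 8] >> (7 - bitPos % 8)) & 1;
            value = (value << 1) | (uint)bit;
            bitPos++;
        }
        return value;
    }
}

class Program
{
    static void Main()
    {
        // Last 4 bytes of the example header: A0 21 E0 A2.
        var reader = new BitReader(new byte[] { 0xA0, 0x21, 0xE0, 0xA2 });

        uint width = reader.ReadBits(8) * 4;  // first 8 bits, times 4
        reader.ReadBits(7);                   // skip 7 bits
        uint height = reader.ReadBits(9);     // next 9 bits

        Console.WriteLine($"{width}x{height}");
    }
}

With the example bytes A021E0A2 this prints 640x480.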
I am trying to convert YUV420 frames to Bitmap or Image. I am reading these frames from an MP4 video in C# using the AVBlocks library. So, after creating an input and output socket using AVBlocks classes, I then pull each frame from the video with a YUV420 color format and UncompressedVideo stream type. I basically do this by calling Transcoder.Pull(int outputIndex, MediaSample outputData) and then the MediaBuffer that's part of the outputData has the data in an array of bytes. So I am trying to convert these bytes to a Bitmap or Image so that I can eventually show each frame into a PictureBox in the Winforms application.
What I've tried:
I have tried using a MemoryStream, as shown below, but I get an unhandled ArgumentException saying that the parameter is not valid. I tried using ImageConverter() as well to convert to an Image, but I get the same exception. Then, I converted the byte array from YUV to RGB format and gave the updated array as a parameter to the MemoryStream, but again no luck. I also tried changing the color format of the output socket from YUV420 to a BGR format, but it resulted in the same issue as above. The code that tries to convert to a bitmap using MemoryStream:
while (transcoder.Pull(out inputIndex, yuvFrame))
{
    buffer = (MediaBuffer)yuvFrame.Buffer.Clone();
    byte[] temp = new byte[buffer.DataSize];
    Array.Copy(buffer.Start, buffer.DataOffset, temp, 0, buffer.DataSize);
    var ms = new MemoryStream(temp);
    Bitmap b = new Bitmap(ms); // throws ArgumentException: "Parameter is not valid."
}
The aforementioned exception is thrown in the last line of the code. I'm not sure if it's the color format or the stream type, or something else that's causing the problem. If someone wants to see more of the code (setting up input & output sockets etc), let me know. For reference, the link to the example I've been following from AVBlocks is this and the link to MediaBuffer class is this.
The Bitmap(MemoryStream ms) constructor expects the bytes of an actual image file, like a PNG, JPEG, BMP or GIF. If I'm reading this correctly, you don't have that; you only have raw RGB triplet data. That isn't enough, because it lacks all information about the image's width, height, colour depth etc.
You will need to actually construct an image object from the RGB data. This isn't really trivial; it means you need to make a new image object with the correct dimensions and colour format, then access its backing bytes array, and write your data into it. The actual code for creating an image out of a byte array can be found in this answer.
Note that you'll have to take into account the actual stride in the resulting data you get; the amount of bytes on each line of the image. Images are saved per line, and those lines are usually padded to a multiple of 4 bytes. This obviously messes up a lot if you don't take it into account.
If your data is completely compact, then the stride to give to the BuildImage function I linked to will just be your image width multiplied by the amount of bytes per pixel (should be 3 for 24bpp RGB), but if not, you'll have to pad it to the next multiple of 4.
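For a rough idea of what that construction looks like, here is a minimal sketch assuming you already have tightly packed 24bpp data and know the width and height. Note that GDI+ stores 24bpp pixels in BGR order, so you may need to swap channels when converting from YUV; the helper name is mine, not part of the AVBlocks API:

using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.Runtime.InteropServices;

static class RawRgb
{
    // Builds a 24bpp Bitmap from tightly packed pixel data (3 bytes per pixel, BGR order for GDI+).
    public static Bitmap BuildBitmap(byte[] data, int width, int height)
    {
        var bmp = new Bitmap(width, height, PixelFormat.Format24bppRgb);
        var rect = new Rectangle(0, 0, width, height);
        BitmapData bd = bmp.LockBits(rect, ImageLockMode.WriteOnly, bmp.PixelFormat);
        try
        {
            int srcStride = width * 3; // tightly packed source rows
            for (int y = 0; y < height; y++)
            {
                // bd.Stride is padded to a multiple of 4, so copy one row at a time.
                Marshal.Copy(data, y * srcStride, IntPtr.Add(bd.Scan0, y * bd.Stride), srcStride);
            }
        }
        finally
        {
            bmp.UnlockBits(bd);
        }
        return bmp;
    }
}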
Basically, I want to get the current byte of the MediaElement at its current playback position. For example, when it is at 5 seconds, the byte position would be 1024kb. I don't want to multiply the bitrate with the current time as that is not accurate.
All I need is to get the byte position at certain durations.
So is there anyway I could get this? I'm open to other options. (Does FFProbe support this?)
I've tried everything and there is no way to do this directly using MediaElement.
The only way is to calculate the frame number by multiplying the frame rate by the timecode at which you want the byte position, as sketched below.
Then use a program like BmffViewer, which analyzes the moov atom of the video header. Go to the stco entries of the track you want to analyze and read off the chunk offset for the frame you calculated earlier.
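For the first step, the arithmetic is just the following sketch (the frame rate and position are placeholder values):

// Frame index for a given playback position, assuming a constant frame rate.
double frameRate = 29.97;                    // taken from the video's track metadata (placeholder)
TimeSpan position = TimeSpan.FromSeconds(5); // e.g. MediaElement.Position
long frameNumber = (long)Math.Round(position.TotalSeconds * frameRate);
// Look this frame up in the stsc/stco tables (e.g. with BmffViewer) to get its chunk's byte offset.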
Recently, I was trying to answer another SO question about loading the frames (Bitmap and duration) of animated GIFs. The code can be found on pastebin.
While doing additional tests on this code before moving it into my dev library, I noticed that there is a problem with this line of code:
//Get the times stored in the gif
//PropertyTagFrameDelay ((PROPID) 0x5100) comes from gdiplusimaging.h
//More info on http://msdn.microsoft.com/en-us/library/windows/desktop/ms534416(v=vs.85).aspx
var times = img.GetPropertyItem(0x5100).Value;
When running this on Windows .NET using this (example GIF), the array contains four bytes per frame of the animated GIF, holding the durations of the frames. In this case it is a byte[20], which converts (via BitConverter.ToInt32()) to 5 durations:
[75,0,0,0,125,0,0,0,125,0,0,0,125,0,0,0,250,0,0,0]
On MonoMac however, this line of code for the same example GIF returns a byte[4] which converts to only one duration (the first):
[75,0,0,0]
I tested this for 10 different GIFs and the result is always the same. On Windows all durations are in the byte[], while MonoMac only lists the first duration:
[x,0,0,0]
[75,0,0,0]
[50,0,0,0]
[125,0,0,0]
Looking at the Mono System.Drawing.Image source code, the length seems to be set in this method, which is a GDI+ wrapper:
status = GDIPlus.GdipGetPropertyItemSize (nativeObject, propid,out propSize);
However, I don't really see any problem, neither with the source nor with my implementation. Am I missing something, or is this a bug?
I don't see anything wrong in the Mono source either. It would have been helpful if you had posted one of the sample images you tried. One quirk of the GIF image format is that the Graphics Control Extension block that contains the frame time is optional and may be omitted before an image descriptor. There are therefore non-zero odds that you have GIF files with just one GCE that applies to all the frames; in that case you are supposed to apply the same frame time to every frame.
Do note that you didn't get 4 values; the frame time is encoded as a 32-bit value and you are seeing the little-endian encoding of it in a byte[]. You should use BitConverter.ToInt32(), as you correctly did in your sample code.
I therefore think you should probably use this instead:
//convert the 4-byte little-endian value to an integer
var duration = BitConverter.ToInt32(times, 4*i % times.Length);
Do note that there's another nasty implementation detail about GIF frames: frames #2 and up do not have to be the same size as frame #1, and each frame has a metadata field that describes what should be done with the previous frame to merge it with the next one. There are no property IDs that I know of to obtain the frame offset, size and undraw method for each frame. I think you need to render each frame into a bitmap yourself to get a proper sequence of images. Very ugly details; GIF needs to die.
If you look into libgdiplus you will see that the properties are always read from the active bitmap:
if (gdip_bitmapdata_property_find_id(image->active_bitmap, propID, &index) != Ok) {
You can set the active bitmap by calling Image.SelectActiveFrame, and then Mono will return the correct durations, one by one. Since this is an incompatibility with Windows, I'd call it a Mono bug. As a simple workaround, you can of course just check the array length and handle both cases. That is better than a check for Mono, because if Mono gets fixed this will continue to work.
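A minimal sketch of that workaround (the file name is a placeholder; 0x5100 is the PropertyTagFrameDelay id used in the question):

using System;
using System.Drawing;
using System.Drawing.Imaging;

using (var img = Image.FromFile("example.gif"))
{
    int frameCount = img.GetFrameCount(FrameDimension.Time);
    var durations = new int[frameCount];
    byte[] times = img.GetPropertyItem(0x5100).Value;

    for (int i = 0; i < frameCount; i++)
    {
        if (times.Length >= 4 * frameCount)
        {
            // Windows: one 32-bit little-endian entry per frame in a single array.
            durations[i] = BitConverter.ToInt32(times, 4 * i);
        }
        else
        {
            // Mono: only the active frame's delay is returned, so select each frame first.
            img.SelectActiveFrame(FrameDimension.Time, i);
            durations[i] = BitConverter.ToInt32(img.GetPropertyItem(0x5100).Value, 0);
        }
    }
}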
I have read the first frame of a DICOM CINE image, and now I want to read the second frame and so on. By how many bytes should I seek the file pointer to get to the next frame (if the frame size is width=640, height=480)?
By DICOM cine image, you mean multi-frame DICOM files, right?
May I know:
which platform you are on and which DICOM library/SDK you are using? And has your DICOM image been decompressed, e.g. to BMP (32-bit/24-bit)?
If your DICOM file is in 24-bit (3-byte) BMP format, then your next frame of pixel data would be 640*480*3 bytes further on.
Assuming you are dealing with uncompressed (native) multi-frame DICOM, you need to extract the following information before calculating the size of each image frame:
Transfer Syntax (0002, 0010): to make sure the dataset is not using an encapsulated/compressed transfer syntax.
Samples per Pixel (0028, 0002): the number of samples (planes) in the image. For example, 24-bit RGB has a value of 3.
Number of Frames (0028, 0008): the total number of frames.
Rows (0028, 0010)
Columns (0028, 0011)
Bits Allocated (0028, 0100): the number of bits allocated for each pixel sample.
Planar Configuration (0028, 0006): a conditional element that indicates whether the pixel data are sent color-by-plane or color-by-pixel. It is required if Samples per Pixel (0028, 0002) has a value greater than 1.
You would calculate the frame size as follows:
Frame size in bytes = Rows * Columns * (Bits Allocated * Samples per Pixel / 8)
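A small sketch of the calculation in C# (the values and the pixel data offset are placeholders you would read from the tags above and from your DICOM toolkit):

// Values read from the data set (placeholders here).
int rows = 480;            // (0028,0010)
int columns = 640;         // (0028,0011)
int bitsAllocated = 8;     // (0028,0100)
int samplesPerPixel = 3;   // (0028,0002), e.g. 24-bit RGB
long pixelDataOffset = 0;  // byte offset of the Pixel Data (7FE0,0010) value, reported by your DICOM toolkit

long frameSize = (long)rows * columns * (bitsAllocated * samplesPerPixel / 8);

// Byte offset of frame n (0-based) from the start of the file:
int n = 1; // the second frame
long frameOffset = pixelDataOffset + n * frameSize;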
I want to destroy all I-frames of a video. By doing this I want to check whether encrypting only the I-frames of a video would be sufficient to make it unwatchable. How can I do this? Simply removing them and recompressing the video would not be the same as actually overwriting the I-frames in the stream without recalculating B-frames etc.
Using libavformat (library from ffmpeg), you can demultiplex the video into packets that represent a single frame. You can then encrypt data in the packets that are marked as key frames. Finally you can remultiplex the video into a new file. There is a good libavformat/libavcodec tutorial here. You will not have to actually decode/encode the frames because I assume you just want to encrypt the compressed data. In this case, once you read the AVPacket, just encrypt its data if it's a key frame (packet->flags & PKT_FLAG_KEY). You would then have to write the packets to a new file.
One thing to note is that you might have to be careful when you just encrypt the I-frame packets returned from libavformat or some other demuxing software since they may include data from other headers that are stored in the bitstream. For instance, I have often seen libavformat return sequence or group of picture headers as part of a video frame packet. Destroying this information may invalidate your test.
A possibly easier way to approach the problem would be to research the bitstream syntax of the codec used to encode the video and use the start codes to determine where frames start and whether or not they are I-frames. One problem is that most video files have a container (AVI, MP4, MPEG-PS/TS) around the actual compressed data and you would not want to encrypt anything in that area. You will most likely find header information belonging to the container format interspersed within the compressed data of a single frame. So you could use ffmpeg from the command line to output just the raw compressed video data:
ffmpeg -i filename -an -vcodec copy -f rawvideo output_filename
This will create a file with only the video data(no audio) with no container. From here you can use the start codes of the specific video format to find the ranges of bytes in the file that correspond to I-frames.
For instance, in MPEG-4, you would be looking for the 32-bit start code 0x000001b6 to indicate the start of a VOP (video object plane). You could determine whether it is an I-frame or not by testing whether two bits immediately following the start code are equal to 00. If it is an I frame, encrypt the data until you reach the next start code (24-bit 0x000001). You'll probably want to leave the start code and frame type code untouched so you can tell later where to start decrypting.
Concerning the outcome of your test as to whether or not encrypting I-frames will make a video unwatchable: it depends on what you mean by unwatchable. I would expect that you may be able to make out a major shape that existed in the original video if it is in motion, since its information would have to be encoded in the B- or P-frames, but the color and detail would still be garbage. I have seen a single bit error in an I-frame make the entire group of pictures (the I-frame and all frames that depend on it) look like garbage. The purpose of the compression is to reduce redundancy to the point that each bit is vital. Destroying the entire I-frame will almost certainly make it unwatchable.
Edit: Response to comment
Start codes are guaranteed to be byte-aligned, so you can read the file a byte at a time into a 4-byte buffer and test whether it equals the start code. In C++, you can do it like this:
#include <fstream>

using namespace std;
//...
//...
ifstream ifs("filename", ios::in | ios::binary);
//initialize buffer to 0xffffffff
unsigned char buffer[4] = {0xff, 0xff, 0xff, 0xff};
char next;
while(ifs.get(next))
{
    //Shift to make space for the byte just read.
    buffer[0] = buffer[1];
    buffer[1] = buffer[2];
    buffer[2] = buffer[3];
    buffer[3] = static_cast<unsigned char>(next);
    //see if the current buffer contains the start code.
    if(buffer[0]==0x00 && buffer[1]==0x00 && buffer[2]==0x01 && buffer[3]==0xb6)
    {
        //VOP start code found; test for an I-frame.
        char ch;
        if(!ifs.get(ch))
            break;
        //mask out the first 2 bits (vop_coding_type) and shift them to the least significant bits
        int vop_coding_type = (static_cast<unsigned char>(ch) & 0xc0) >> 6;
        if(vop_coding_type == 0)
        {
            //It is an I-frame
            //...
        }
    }
}
Finding a 24-bit start code is similar; just use a 3-byte buffer. Remember that you must remove the video container with ffmpeg before doing this, or you may destroy some of the container information.
On Windows you could copy the file without recompressing it using VfW (Video for Windows) and skip the I-frames. To find the I-frames you could use the FindSample function with the FIND_KEY flag.