Save XNA rendered image to disk - c#

I am trying to save an image rendered in XNA to disk. This is actually quite easy, but the kicker is I need the full 32-bit float precision for each channel, not just the 0-255 range.
I have been looking into texture packing (converting the float into a 4-component ARGB value), but I worry I will lose precision this way. I need very high accuracy.
Another approach I looked into is using a shader to multiply the float component by 2147483647 (the maximum positive int), then go through each bit and store a binary 0 or 1 in the rendered image. The images can later be reassembled in regular code to reconstruct the full-precision float. This works, but the problem is that shader model 3.0 does not seem to support 32-bit ints properly. All I get is 24 bits of precision this way.
Is there a way to do this in a more direct and accurate way?

Since you say that you're using 32-bit precision for each of your four channels, I assume that your texture is using SurfaceFormat.Vector4, which is, as far as I'm aware, the only 128-bit texture format supported by XNA. In that case, it's easy enough to retrieve the actual data from a texture that you've rendered:
var tex = new Texture2D(GraphicsDevice, 1024, 1024, false, SurfaceFormat.Vector4);
var data = new Vector4[1024 * 1024];
tex.GetData<Vector4>(data);
Then all you need to do is write some code to save the data array to a file that you can read in later, which is easy enough. Then you can reconstitute the texture:
Vector4[] data = LoadMyData();
tex.SetData(data);
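To make the save/load step concrete, here is a minimal sketch of round-tripping the raw float data through a file with BinaryWriter and BinaryReader. The class and file names are mine, and I use a flat float[] (four floats per texel) in place of Vector4[] so the sketch compiles without referencing XNA; writing the four Vector4 components individually works the same way.

```csharp
using System;
using System.IO;

class Vector4FileDemo
{
    // Write a flat array of floats (e.g. 4 per texel) to disk at full precision.
    public static void Save(string path, float[] data)
    {
        using (var bw = new BinaryWriter(File.Create(path)))
        {
            bw.Write(data.Length);
            foreach (float f in data)
                bw.Write(f); // 4 bytes per float, no precision loss
        }
    }

    // Read the array back; the floats are bit-exact copies of what was written.
    public static float[] Load(string path)
    {
        using (var br = new BinaryReader(File.OpenRead(path)))
        {
            var data = new float[br.ReadInt32()];
            for (int i = 0; i < data.Length; i++)
                data[i] = br.ReadSingle();
            return data;
        }
    }

    static void Main()
    {
        var original = new float[] { 0.123456789f, 1e-30f, 3.1415927f, -2.5f };
        Save("pixels.bin", original);
        var roundTrip = Load("pixels.bin");
        Console.WriteLine(original[1] == roundTrip[1]); // True: bit-exact round trip
    }
}
```

Because the bytes of each float are stored verbatim, this avoids the precision worries of packing floats into ARGB channels entirely.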

Related

NAudio Normalize Audio

I am trying to normalize MP3 files with NAudio, but I don't know how to do so.
The first thing I did was convert the MP3 file to PCM:
using (Mp3FileReader fr = new Mp3FileReader(mp3.getPathWithFilename())) {
using (WaveStream pcm = WaveFormatConversionStream.CreatePcmStream(fr)) {
WaveFileWriter.CreateWaveFile("test.wav", pcm);
}
}
But what is the next step? Unfortunately, I didn't find anything on the net.
Thanks for your help
I'm new to NAudio, so I don't know exactly how to code this, but I do know that normalizing an audio file requires two passes through the data. The first pass determines the maximum and minimum sample values contained in the file, so you have to scan every sample and track the max and min (for both channels if the file is stereo). Then, taking whichever of the max or min has the larger absolute value, you calculate that value as a percentage of full scale (the largest possible value for the bit depth; for 16-bit audio that's 32767 or -32768). On the second pass you increase the volume by the corresponding amount.
So, for example, if on your scanning pass you discovered that the largest value in a 16-bit mono file was 29000, you would scale every sample by a factor of about 1.13 (32767 / 29000, i.e. to roughly 113 percent of the original level) so that the loudest sample rises from 29000 to 32767 and all other samples are increased proportionally.
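A sketch of those two passes on in-memory 16-bit samples (the class and method names are mine, and this deliberately skips the NAudio plumbing; you would feed it the samples decoded from the PCM stream):

```csharp
using System;

class NormalizeDemo
{
    // Pass 1: find the peak and compute the linear gain that brings it to full scale.
    public static float NormalizationGain(short[] samples)
    {
        int peak = 0;
        foreach (short s in samples)
        {
            int abs = Math.Abs((int)s); // cast first: Math.Abs(short.MinValue) would overflow
            if (abs > peak) peak = abs;
        }
        if (peak == 0) return 1f; // silent input, nothing to do
        return 32767f / peak;
    }

    // Pass 2: apply the gain, clamping to the 16-bit range before rounding.
    public static short[] ApplyGain(short[] samples, float gain)
    {
        var result = new short[samples.Length];
        for (int i = 0; i < samples.Length; i++)
        {
            float v = samples[i] * gain;
            v = Math.Max(-32768f, Math.Min(32767f, v));
            result[i] = (short)Math.Round(v);
        }
        return result;
    }

    static void Main()
    {
        var samples = new short[] { 1000, -29000, 14500 };
        float gain = NormalizationGain(samples); // 32767 / 29000 ≈ 1.13
        var normalized = ApplyGain(samples, gain);
        Console.WriteLine(normalized[1]); // -32767: the loudest sample now sits at full scale
    }
}
```

Note that normalization only scales; if the source already clips, raising the gain cannot recover anything, which is why the clamp is there purely as a safety net.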

Why is bmp loading faster than png

I have a sample WPF app here and I am wondering why the BMP loads faster than the PNG. Here is the exact setup:
- Windows 7
- Visual Studio 2013
- landscape.png, 1920x1080, 2.4mb
- landscape.bmp, 1920x1080, 5.6mb
- wpf-app, class MainWindow.xaml.cs constructor, no other code before
Code:
var sw = Stopwatch.StartNew();
var Bitmap1 = new Bitmap("landscape.bmp");
long t1 = sw.ElapsedMilliseconds;
sw.Restart();
var Bitmap2 = new Bitmap("landscape.png");
long t2 = sw.ElapsedMilliseconds;
So the BMP loads in around 6 ms; the PNG needs 40 ms.
Why is that so?
First, we need to understand how digital images are stored and shown. A digital image is represented as a matrix where each element is the color of a pixel. In a grayscale image, each element is a uint8 (unsigned 8-bit integer) between 0 and 255, or in some cases an int8 (signed 8-bit integer) between -128 and 127. If the element is 0 (or -128 in the int8 version) the color is solid black, and if it is 255 (or 127 in the int8 version) the color is solid white.
For RGB images, each element of that matrix takes 24 bits, or 3 bytes, to store (one byte per color). A very common resolution for digital cameras and smartphones is 3264 x 2448 for an 8-megapixel camera. Now imagine saving a matrix with 3264 rows, each row holding 2448 elements of 3 bytes: we need about 24 MB to store that one image, which is not very efficient for posting on the internet, transferring, or most other purposes. That is why we compress the image. We can go for JPEG, which is a lossy compression method, meaning we lose some quality, or we can choose a lossless method like PNG, which gives a lower compression ratio but loses no quality.
Whether or not we choose to compress the image, when we want to see it we can only show the uncompressed version. If the image is not compressed at all, there is no problem: we show exactly what is stored. If it is compressed, we have to decode (uncompress) it first.
With all that said, let's answer the question. BMP is a format for more or less raw images: there is either no compression at all, or far less compression than PNG or JPEG applies, but the file is bigger. When you want to show a BMP image, there is more data to read into memory because the file is bigger, but once it is read you can display it much faster because little or no decoding is required. When you want to show a PNG image, the file is read into memory much faster, but compared to BMP the decoding takes more time.
If you have very slow storage, BMP images will display slowly.
If you have a very slow CPU, or your decoding software is inefficient, PNG images will display slowly.
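The raw-size arithmetic above is easy to sanity-check with a one-liner (the resolution figures are the ones used in this answer):

```csharp
using System;

class RawImageSize
{
    static void Main()
    {
        // Uncompressed size of an 8-megapixel RGB image: width * height * 3 bytes per pixel.
        long bytes = 3264L * 2448 * 3;
        Console.WriteLine(bytes);       // 23970816
        Console.WriteLine(bytes / 1e6); // ≈ 24 million bytes, i.e. about 24 MB
    }
}
```

Compare that with the ~2.4 MB PNG in the question to see how much work the PNG decoder has to undo at load time.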

Kinect Audio PCM Values

I'm using Kinect to extract audio and classify its features, but I have a question. According to http://msdn.microsoft.com/en-us/library/hh855698.aspx, the audio.start method opens an audio data stream (16-bit PCM format, sampled at 16 kHz) and starts capturing audio data streamed out of a sensor. The problem is that I don't know how PCM data is represented, and I don't know whether the method returns true PCM values. Using the SDK examples I get values like 200, 56, and 17, and I thought audio values would look more like -3*10^-5.
So does anyone know how I get the true PCM values? Or am I doing something wrong?
Thanks
I wouldn't expect any particular values. 16-bit PCM means it's a series of 16-bit integers, so -3*10^-5 (-0.00003) isn't representable.
I would guess it's encoded with 16-bit signed integers (like a WAV file) which have a range of -32768 to 32767. If you're being very quiet the values will probably be close to 0. If you make a lot of noise you will see some higher values too.
Check out this diagram (from Wikipedia's article on PCM) which shows a sine wave encoded as PCM using 4-bit unsigned integers, which have a range of 0 to 15.
See how that 4-bit sine wave oscillates around 7? That's its equilibrium. If it were a signed 4-bit integer (which has a range of -8 to 7), it would have the same shape, but its equilibrium would be 0: the values would be shifted by -8 so the wave oscillates around 0.
You can measure the distance from the equilibrium to the highest or lowest point of the sine wave to get its amplitude or, broadly, its volume (which is why, if you're quiet, you will mostly see values near 0 in your signed 16-bit data). This is probably the easiest sort of feature detection you can do. You can find plenty of good explanations on the web, for example http://scienceaid.co.uk/physics/waves/sound.html.
You could save it to a file and play it back with something like Audacity if you're not sure. Fiddle with the input settings and you'll soon figure out the format.
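If the fractional values you were expecting are floats in the -1 to 1 range, you can get them by dividing each 16-bit sample by 32768. A minimal sketch (the class and method names are mine):

```csharp
using System;

class PcmScaleDemo
{
    // Convert raw signed 16-bit PCM samples to floats in [-1, 1).
    public static float[] ToFloat(short[] pcm)
    {
        var result = new float[pcm.Length];
        for (int i = 0; i < pcm.Length; i++)
            result[i] = pcm[i] / 32768f;
        return result;
    }

    static void Main()
    {
        // Quiet-room values like those in the question map to small numbers near zero.
        var raw = new short[] { 200, 56, 17, -32768, 32767 };
        foreach (float f in ToFloat(raw))
            Console.WriteLine(f); // 200 -> 0.006103516, -32768 -> -1, 32767 -> ~0.99997
    }
}
```

So values like 200, 56, and 17 are perfectly normal PCM: they are just near-silence expressed as integers rather than fractions.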

Audio Beat Detection in C#

Using the System.IO BinaryReader object found in the .NET mscorlib assembly, I ran a loop that dumped each byte value from a .wav file into an Excel spreadsheet. For simplicity's sake, I recorded a two-second 4 kHz signal from a signal generator into a software sequencer and saved it as a monaural wave file. The software I sequence music with shows a resolution of 1 ms, which is about 44.1 samples (assuming a 44.1 kHz sample rate). What I find curious is that the data extracted via the ReadInt16() method (starting at position 44 in the .wav file) shows varied numbers with integers switching signs seemingly at random, whereas the visual sine wave within the sequencer is completely uniform with respect to amplitude and frequency. With 16-bit resolution, I determined that for each sample the first byte was frequency resolution and the second amplitude; is that correct?
Question: How can I intelligently interpret the integers pulled from wave file for the ultimate purpose of determining rhythmic beats?
Many thanks...........Mickey
For a WAV file with 16 bits per sample, it is not the case that the first byte of the sample is frequency resolution and the second byte is amplitude. Both bytes together indicate the sample's amplitude at that specific point in time. The two bytes are interpreted as a 2-byte integer, so the values will range from -32768 to +32767.
I do not know how your sequencer works or what it is displaying. From your description, it sounds as if your sequencer is using FFT to convert the audio from time-domain (which is what a WAV file is) to frequency-domain (which is a graph with frequency along the x-axis and frequency amplitude along the y-axis). A WAV file does not contain frequency information.
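For the rhythmic-beat goal, a common first step is to look at short-time energy rather than individual samples: split the 16-bit amplitude values into fixed-size windows, compute the RMS energy of each window, and look for windows whose energy jumps well above their neighbours. A minimal sketch on synthetic data (the window size and the synthetic burst are my assumptions, purely for illustration):

```csharp
using System;

class EnergyDemo
{
    // RMS energy per fixed-size window; loud windows stand out as candidate beats.
    public static double[] WindowRms(short[] samples, int windowSize)
    {
        int windows = samples.Length / windowSize;
        var rms = new double[windows];
        for (int w = 0; w < windows; w++)
        {
            double sum = 0;
            for (int i = 0; i < windowSize; i++)
            {
                double s = samples[w * windowSize + i];
                sum += s * s; // accumulate squared amplitude
            }
            rms[w] = Math.Sqrt(sum / windowSize);
        }
        return rms;
    }

    static void Main()
    {
        // Synthetic signal: silence, a loud burst, silence.
        var samples = new short[300];
        for (int i = 100; i < 200; i++) samples[i] = 20000;

        var rms = WindowRms(samples, 100);
        Console.WriteLine($"{rms[0]} {rms[1]} {rms[2]}"); // 0 20000 0 - the burst is obvious
    }
}
```

On real music you would pick a window of a few tens of milliseconds and compare each window's energy against a local average, but the principle is the same: beats show up as energy peaks in the time domain, no FFT required.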

XNA: Getting a struct as a texture to the GPU

I use XNA as a nice easy basis for some graphics processing I'm doing on the CPU, because it already provides a lot of the things I need. Currently, my "rendertarget" is an array of a custom Color struct I've written that consists of three floating point fields: R, G, B.
When I want to render this on screen, I manually convert this array to the Color struct that XNA provides (only 8 bits of precision per channel) by simply clamping the result within the byte range of 0-255. I then set this new array as the data of a Texture2D (it has a SurfaceFormat of SurfaceFormat.Color) and render the texture with a SpriteBatch.
What I'm looking for is a way to get rid of this translation process on the CPU and simply send my backbuffer directly to the GPU as some sort of texture, where I want to do some basic post-processing. And I really need a bit more precision than 8 bits there (not necessarily 32 bits, but since what I'm doing isn't GPU intensive, it can't hurt, I guess).
How would I go about doing this?
I figured that if I gave my Color struct an explicit size of 16 bytes through StructLayout (so 4 bytes of padding, because my three float channels only fill 12 bytes), set the SurfaceFormat of the texture rendered with the SpriteBatch to SurfaceFormat.Vector4 (16 bytes per texel), and filled the texture with SetData&lt;Color&gt;, it might work. But I get this exception:
The type you are using for T in this method is an invalid size for this resource.
Is it possible to use an arbitrary made-up struct and interpret it as texture data on the GPU, the way you can with vertices through a VertexDeclaration, by specifying how it is laid out?
I think I have what I want: I dumped the Color struct I made and use Vector4 for my color information instead. This works if the SurfaceFormat of the texture is also set to Vector4.
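The expansion step looks roughly like this. I'm using System.Numerics.Vector4 here so the sketch compiles standalone (XNA's Microsoft.Xna.Framework.Vector4 has the same four-float layout), and the RgbColor struct stands in for the custom three-channel Color struct from the question:

```csharp
using System;
using System.Numerics; // stand-in for XNA's Vector4 in this standalone sketch

struct RgbColor
{
    public float R, G, B;
    public RgbColor(float r, float g, float b) { R = r; G = g; B = b; }
}

class PackDemo
{
    // Expand three-channel float colors to Vector4 so the array layout
    // matches a SurfaceFormat.Vector4 texture (four floats per texel).
    public static Vector4[] ToVector4(RgbColor[] pixels)
    {
        var data = new Vector4[pixels.Length];
        for (int i = 0; i < pixels.Length; i++)
            data[i] = new Vector4(pixels[i].R, pixels[i].G, pixels[i].B, 1f); // alpha fixed at 1
        return data;
    }

    static void Main()
    {
        // Values outside 0-1 survive intact: no clamping to the byte range happens.
        var pixels = new[] { new RgbColor(0.25f, 1.5f, -0.1f) };
        var data = ToVector4(pixels);
        Console.WriteLine(data[0]);
    }
}
```

With the data in Vector4 form, SetData&lt;Vector4&gt; on a SurfaceFormat.Vector4 texture matches the element size exactly, which is what the earlier exception was complaining about.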
