I have some JPEG files that I can't seem to load into my C# application. They load fine into other applications, like the GIMP. This is the line of code I'm using to load the image:
System.Drawing.Image img = System.Drawing.Image.FromFile(@"C:\Image.jpg");
The exception I get is: "A generic error occurred in GDI+.", which really isn't very helpful. Has anyone else run into this, or know a way around it?
Note: If you would like to test the problem you can download a test image that doesn't work in C#.
There's an exact answer to this problem. We ran into this at work today, and I was able to prove conclusively what's going on here.
The JPEG standard defines a metadata format: a file consists of a series of "chunks" of data (which the standard calls "segments"). Each chunk starts with an FF marker byte, followed by another marker byte to identify what kind of chunk it is, followed by a pair of bytes that describe the length of the chunk (a 16-bit big-endian value). Some chunks (like FFD8, "Start of Image") are critical to the file's usage, and some (like FFFE, "Comment") are utterly meaningless.
When the JPEG standard was defined, they also included the so-called "APP markers" --- types FFE0 through FFEF --- that were supposed to be used for "application-specific data." These are abused in various ways by various programs, but for the most part, they're meaningless, and can be safely ignored, with the exception of APP0 (FFE0), which is used for JFIF data: JFIF extends the JPEG standard slightly to include additional useful information like the DPI of the image.
The problem with your image is that it contains an FFE1 marker, with a size-zero chunk following that marker. It's otherwise unremarkable image data (a remarkable image, but unremarkable data) save for that weird little useless APP1 chunk. GDI+ is wrongly attempting to interpret that APP1 chunk, probably attempting to decode it as EXIF data, and it's blowing up. (My guess is that GDI+ is dying because it's attempting to actually process a size-zero array.) GDI+, if it was written correctly, would ignore any APPn chunks that it doesn't understand, but instead, it tries to make sense of data that is by definition nonstandard, and it bursts into flames.
So the solution is to write a little routine that will read your file into memory, strip out the unneeded APPn chunks (markers FFE1 through FFEF), and then feed the resulting "clean" image data into GDI+, which it will then process correctly.
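A minimal sketch of such a routine might look like this (the class and method names are my own, error handling is omitted, and it tolerates the nonstandard zero-length chunk described above; per the JPEG standard, the length field is 16-bit big-endian and includes its own two bytes):

using System.IO;

static class JpegCleaner
{
    // Returns a copy of the JPEG with APP1-APP15 segments (markers
    // FFE1-FFEF) stripped out; APP0 (JFIF) and all other segments are kept.
    public static byte[] StripAppSegments(byte[] jpeg)
    {
        using (var output = new MemoryStream())
        {
            output.Write(jpeg, 0, 2); // copy the SOI marker (FFD8)
            int i = 2;
            while (i + 3 < jpeg.Length && jpeg[i] == 0xFF)
            {
                byte marker = jpeg[i + 1];
                // SOS (FFDA) starts the entropy-coded image data;
                // copy everything from here to the end verbatim.
                if (marker == 0xDA)
                {
                    output.Write(jpeg, i, jpeg.Length - i);
                    return output.ToArray();
                }
                // Segment length: 16-bit big-endian, includes its own two bytes.
                int length = (jpeg[i + 2] << 8) | jpeg[i + 3];
                // Tolerate the nonstandard zero-length chunk: treat it as
                // marker + length bytes only (4 bytes total).
                int total = (length < 2) ? 4 : 2 + length;
                bool isAppN = marker >= 0xE1 && marker <= 0xEF;
                if (!isAppN)
                    output.Write(jpeg, i, total);
                i += total;
            }
            return output.ToArray();
        }
    }
}

The cleaned bytes can then be handed to GDI+ via Image.FromStream(new MemoryStream(cleanBytes)).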
We currently have a contest underway here at work to see who can write the JPEG-cleaning routine the fastest, with fun prizes :-)
For the naysayers: That image is not "slightly nonstandard." The image uses APP1 for its own purposes, and GDI+ is very wrong to try to process that data. Other applications have no trouble reading the image because they rightly ignore the APP chunks like they're supposed to.
.NET isn't handling the format of that particular image, potentially because the JPEG data is slightly broken or non-standard. If you load the image into GIMP and save it to a new file, you can then load it with the Image class. Presumably GIMP is a bit more forgiving of file-format problems.
This thread from MSDN Forums may be useful. It suggests: "The error may mean the data is corrupt or there is some underlying stream that has been closed too early."
The error can be a permission problem, especially if your application is an ASP.NET application. Try moving the file to the same directory as your executable (if WinForms) or the root directory of your web application (if ASP.NET).
I have the same problem.
The only difference I've noticed is the compression. It works fine with "JPEG", but when the compression is "Progressive JPEG" I get the exception (A generic error occurred in GDI+).
At first I thought it could be a memory problem, because the images I mentioned were fairly big (about 5MB on disk and maybe ~80MB in memory), but then I noticed the difference in the compression type.
When I open and re-save the image file in another program like IrfanView or GIMP, the result is fine, but that's not a real solution.
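If you need to detect this case up front, a quick (simplified) check is to scan the JPEG's segment markers for a "Start of Frame" marker: baseline images use SOF0 (FFC0), progressive ones use SOF2 (FFC2). A rough C# sketch, ignoring the other, rarer SOF variants:

using System.IO;

static bool IsProgressiveJpeg(string path)
{
    byte[] data = File.ReadAllBytes(path);
    int i = 2; // skip the SOI marker (FFD8)
    while (i + 3 < data.Length && data[i] == 0xFF)
    {
        byte marker = data[i + 1];
        if (marker == 0xC2) return true;   // SOF2: progressive DCT
        if (marker == 0xC0) return false;  // SOF0: baseline DCT
        // Skip this segment: length is 16-bit big-endian and
        // includes its own two bytes.
        int length = (data[i + 2] << 8) | data[i + 3];
        i += 2 + length;
    }
    return false;
}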
For example, I recorded a video using my camera and saved it as my_vacation.mp4, which is 50MB in size. Using Visual Studio, I opened the video file and an encrypted file called secret_message.dat, read both with File.ReadAllBytes() in C#, concatenated the two byte arrays, and then saved the result as my_vacation_2.mp4.
The program I created for testing purposes saves the byte index where the hidden file begins, and I want to use that index as the key to extract the hidden file later.
Now I can play that video file normally, without any error. The total file size is 65MB. Supposing no one could access the original file, no one would know that the last 15MB of that video file is actually another file, right?
What might be the flaw of this technique? Is this also a valid steganography technique?
Is this a valid steganography technique?
Yes, it is. The definition of steganography is hiding information in another medium without someone suspecting its presence or existence. Just because it may be a bad approach doesn't change its intentions at all. If anything, a multitude of papers on steganography mention this technique in their introduction section as an example of how steganography can be applied.
What might be the flaw of this technique?
There are mainly two flaws: it is trivial to detect, and it is absolutely fragile to modification attacks.
Many formats encode their data either with a header which says in advance how many bytes to read before the end of file, or with an end-of-file marker, which means to keep reading data until the marker is encountered. By attaching your data after that point, you ensure it won't be read by the appropriate format decoder. This can fool your 11-year-old cousin who knows nothing about that sort of stuff, but anyone mildly experienced can load the file and count how many bytes were actually read. If there are unaccounted-for bytes in the physical file, that will instantly raise red flags.
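To make that concrete, here's a hypothetical C# sketch that walks the top-level MP4 boxes (each is a 32-bit big-endian size followed by a 4-byte type, with size 1 meaning a 64-bit size follows) and reports any bytes left over after the last box; leftover bytes are exactly the red flag described above:

using System.IO;

static class Mp4TrailerCheck
{
    // Returns the number of bytes in the file that no top-level
    // MP4 box accounts for (0 = nothing suspicious appended).
    public static long TrailingBytes(string path)
    {
        byte[] data = File.ReadAllBytes(path);
        long pos = 0;
        while (pos + 8 <= data.Length)
        {
            long size = ((long)data[pos] << 24) | ((long)data[pos + 1] << 16)
                      | ((long)data[pos + 2] << 8) | data[pos + 3];
            if (size == 1)
            {
                // 64-bit "largesize" follows the 4-byte type field.
                if (pos + 16 > data.Length) break;
                size = 0;
                for (int k = 8; k < 16; k++)
                    size = (size << 8) | data[pos + k];
            }
            else if (size == 0)
            {
                return 0; // box runs to end of file; nothing can follow it
            }
            if (size < 8 || pos + size > data.Length) break; // not a valid box
            pos += size;
        }
        return data.Length - pos; // unaccounted-for bytes
    }
}

Run against my_vacation_2.mp4 from the question, this would report roughly 15MB of trailing data.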
Even worse, it's trivial to fully extract your secret. You may argue it's encrypted, but remember, the aim of steganography is not to raise any suspicion. Most steganalysis approaches put a statistical number on it, e.g., a 60% likelihood that there is a message hidden in medium X. A few others can go a bit further and guess the approximate length of the embedded secret. In comparison, your method leaves you caught red-handed.
Talking about length: a file of a given bitrate/compression and duration results in a file of a roughly predictable size. Even an unsavvy observer will know something is up when the size is 30% larger than expected.
Now, imagine your file is communicated through an insecure channel where a warden inspects its contents; if he suspects foul play, he can modify the file so that the recipient doesn't get the message. In this case, destroying your message is as simple as loading the file and resaving it. In fact, your method is so fragile it can be broken by even the most unintentional of attacks: just uploading your video to a site for playback can cause it to be unwittingly re-encoded for higher compression, simply because that makes sense for the site.
Suppose no one could access the original file, of course no one would know that the last 15MB part of that video file is actually another file, right?
No. Your secret file is encrypted, so that probably rules out any recognizable headers showing up in a hex editor, but there is a problem: the MP4 container format and its structure are well known.
You can extract all the video/audio tracks, and what you are left with is some metadata plus your secret message, so it will be obvious that the extra data is not supposed to be there.
It is a valid technique, just not a very effective one.
I'm writing a service for a project that's going to handle our image processing. One such process is supposed to strip all metadata from the byte[] provided and return the same image as a byte[].
The method I'm currently working on involves always converting the image to a Bitmap, then converting it back to the original format and returning the data from a MemoryStream.
I haven't been able to test it yet, but something tells me I'm going to experience some quality loss.
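For reference, the approach I described looks roughly like this (a sketch, not the actual service code; names are illustrative, and formats without a GDI+ encoder would need special-casing):

using System.Drawing;
using System.IO;

static byte[] StripMetadata(byte[] imageData)
{
    using (var input = new MemoryStream(imageData))
    using (var original = Image.FromStream(input))
    using (var clean = new Bitmap(original.Width, original.Height))
    using (var g = Graphics.FromImage(clean))
    {
        // Redraw the pixels onto a fresh Bitmap, leaving metadata behind.
        g.DrawImage(original, 0, 0, original.Width, original.Height);
        using (var output = new MemoryStream())
        {
            // Re-encode in the source's original format
            // (for JPEG this re-compresses, hence the quality concern).
            clean.Save(output, original.RawFormat);
            return output.ToArray();
        }
    }
}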
How can I remove all metadata from any image with a common format?
(bmp, gif, png, jpg, icon, tiff)
Not sure how I can narrow that down any further. Would be nice if I got some feedback regarding the downvotes.
For the lossless formats, your idea of loading the image as a bitmap and re-saving it is fine (JPEG, being lossy, is the exception; see below). I'm not sure whether .NET natively supports TIFF (I doubt it does).
For JPEGs, as you suggested, there may be quality loss if you're re-compressing the file after decompressing it. For that case, you might try the ExifLibrary and see if it has anything. If not, there are command-line tools (like ImageMagick) that can strip metadata. (If you use ImageMagick, you're all set, since it supports all of your required formats. The command you want is convert -strip.)
For TIFFs, .NET has built-in TiffBitmapDecoder and ...Encoder classes you might be able to use; see here.
In short, using an external tool like ImageMagick is definitely the easiest solution. If you can't use an external tool, you're almost certainly going to need to special-case the formats that .NET doesn't support natively (and the lossy JPEG).
EDIT: I just read that ImageMagick doesn't do lossless stripping with JPEGs, sorry. I guess using the library I linked above, or some other JPEG library, is the best I can think of.
I'm writing a method that needs to save a System.Drawing.Image to a file. Without knowing the original file the Image was created from, is there any way to determine what file extension it should have?
The best solution I've come up with is to use a Switch/Case statement with the value of Image.RawFormat.
Does it even matter that I save the Image in its original format? Is an Image generated from a PNG any different from, say, one generated from a JPEG? Or is the data stored in an Image object completely generic?
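The switch-style mapping I mean looks roughly like this (a sketch; since ImageFormat values aren't compile-time constants, a chain of Equals checks, which compare the underlying Guids, stands in for an actual switch):

using System.Drawing;
using System.Drawing.Imaging;

static string GetExtension(Image image)
{
    if (image.RawFormat.Equals(ImageFormat.Jpeg)) return ".jpg";
    if (image.RawFormat.Equals(ImageFormat.Png))  return ".png";
    if (image.RawFormat.Equals(ImageFormat.Gif))  return ".gif";
    if (image.RawFormat.Equals(ImageFormat.Bmp))  return ".bmp";
    if (image.RawFormat.Equals(ImageFormat.Tiff)) return ".tif";
    if (image.RawFormat.Equals(ImageFormat.Icon)) return ".ico";
    return ".png"; // fall back to a lossless default
}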
While Steve Danner is correct that an image created from a JPG will look different from an image created from a PNG, once it's loaded into memory it's an uncompressed data stream.
This means that you can save it out to any file format you want.
However, if you load a JPG image and then save it as another JPG, you are throwing away more information due to the compression algorithm. If you do this repeatedly, you will eventually degrade the image beyond use.
If you can I'd recommend always saving as PNG.
Image.RawFormat has cooties, stay away from it. I've seen several reports of it having no legal value for no apparent reason. Undiagnosed as yet.
You are quite right, it doesn't matter what format you save it to. Once you have loaded the file, the internal format is the same for any bitmap (not vector) image with the same pixel format. Generally, avoid recompressing JPEG files; they tend to get bigger and acquire more artifacts. Steve mentions multi-frame files; they need to be saved a different way.
Yes, it definitely matters because different fileformats support different features such as compression, multiple frames, etc.
I've always used a switch statement like you have, perhaps baked into an extension method or something.
To answer your question 'Does it even matter that I save the Image in its original format?' explicitly: yes, it does, but in a negative way.
When you load the image, it is uncompressed internally to a bitmap (or, as ChrisF calls it, an uncompressed data stream). So if the original image used lossy compression (for example JPEG), saving it in the same format will again result in loss of information (i.e. more artifacts, less detail, lower quality). Especially if you have repeated read-modify-save cycles, this is something to avoid.
(Note that it is also something to avoid if you are not modifying the picture. Just the repeated decompress - compress cycles will degrade the image quality).
So if disk space is not an issue here (and it usually isn't in the age of hard disks big enough for HD video), always store any intermediate pictures in lossless compression formats, or uncompressed. You may consider saving the final output in a compressed format, depending on what you use it for. (If you want to present the final pictures on the web, JPEG or PNG would be good choices.)
I'm trying to read a file into bytes, convert the bytes to a string, modify some data (think steganography), convert the string back to bytes, and save it as a JPEG. So far, everything I've tried corrupts the file when converting it to a string. I've tried converting it to a Base64 string, but of course that's a bit hard to modify data in :P
Any suggestions on how I can do this properly, without corrupting my file?
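One way to avoid the corruption entirely is to skip the string conversion and edit the byte[] directly; text encodings are not 8-bit clean, and round-tripping JPEG data through one is what destroys the file. A minimal sketch (the offset and the bit flipped here are purely illustrative):

using System.IO;

class ByteEdit
{
    static void Main()
    {
        byte[] data = File.ReadAllBytes(@"C:\input.jpg");

        // Modify bytes in place instead of converting to a string.
        data[1234] ^= 0x01; // e.g., flip the low bit at some chosen offset

        File.WriteAllBytes(@"C:\output.jpg", data);
    }
}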
I don't have this in C#, only in PHP, but you can take a look and adapt it to C#.
http://www.havenard.110mb.com/fotomagica/
This is my site, where there's a tool that modifies the EXIF data of a JPEG and builds "magic pictures" which display something in the thumbnail that is not the real picture.
It opens the JPEG, slices its sectors, and builds it back, ignoring irrelevant sectors and inserting my custom-made EXIF header.
And this is the source of the PHP classes:
http://www.havenard.110mb.com/fotomagica/class.JpegMapper.php.txt (ExifMapper is incomplete)
http://www.havenard.110mb.com/fotomagica/class.DataMapper.php.txt
You can study it and rebuild it in C#; it is really simple to slice a JPEG, as you will see.
Usage of this PHP class (only JpegMapper):
$jpg = new JpegMapper('picture.jpg');
$jpg->save_filtered('filtered picture.jpg'); // save removing irrelevant sectors
It is also great for making any JPEG smaller (sometimes a lot smaller).
I am working on a system that stores many images in a database as byte[]. Each byte[] is already a multi-page TIFF, but I need to retrieve the images and convert them all into one multi-page TIFF. The system previously used the System.Drawing.Image classes, with Save and SaveAdd; this was nice in that it saves the file progressively, so memory usage was minimal, but GDI+ concurrency issues were encountered (this runs on a back end in COM+).
The methods were converted to use the System.Windows.Media.Imaging classes, TiffBitmapDecoder and TiffBitmapEncoder, with a bit of massaging in between. This resolved the concurrency issue, but I am struggling to find a way to save the image progressively (i.e. frame by frame) to limit memory usage, so the size of images that can be manipulated is now much lower (I created a test 1.2GB image using the GDI+ classes, and could have gone on, but could only create a ~600MB file using the other method).
Is there any way to progressively save a multi page tiff image to avoid memory issues? If Save is called on the TiffBitmapEncoder more than once an error is thrown.
I think I would use the standard .NET way to decode the TIFF images and write my own TIFF encoder that can write progressively to disk. The TIFF format specifications are public.
Decoding a TIFF is not that easy, which is why I would use the TiffBitmapDecoder for that part. Encoding is easier, so I think it is doable to write an encoder that you can feed with separate frames and that writes the necessary data progressively to disk. You'll probably have to go back and update the header of the resulting TIFF once you are done, to fix up the IFD (Image File Directory) entries.
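To sketch the key trick: every TIFF IFD ends with a 4-byte offset to the next IFD (0 meaning "last page"), so you can append each frame's data and IFD to the end of the file and then patch the previous IFD's "next" pointer. A bare-bones illustration (assumes a little-endian "II" TIFF written on a little-endian machine; a real encoder must also emit all the required tags):

using System.IO;

static void LinkNewIfd(FileStream tiff, long previousNextIfdOffsetPos, uint newIfdOffset)
{
    // Overwrite the previous IFD's next-IFD pointer so readers
    // can find the page that was just appended.
    tiff.Seek(previousNextIfdOffsetPos, SeekOrigin.Begin);
    byte[] offsetBytes = System.BitConverter.GetBytes(newIfdOffset);
    tiff.Write(offsetBytes, 0, 4);
}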
Good luck!
I've done this via LibTIFF.NET. I can handle multi-gigabyte images this way with no pain. See my question at
Using LibTIFF from c# to access tiled tiff images
Although I use it for tiled access, the memory issues are similar. LibTIFF allows full access to all TIFF functions, so files can be read and written one directory (i.e. one frame) at a time.
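For reference, appending frames with LibTIFF.NET looks roughly like this (a sketch for 8-bit grayscale frames; the frame data and tag values are illustrative). WriteDirectory() commits each frame to disk before the next one starts, which is what keeps memory usage flat:

using BitMiracle.LibTiff.Classic;

static void WriteMultipageTiff(string path, byte[][] frames, int width, int height)
{
    using (Tiff tif = Tiff.Open(path, "w"))
    {
        for (int page = 0; page < frames.Length; page++)
        {
            tif.SetField(TiffTag.IMAGEWIDTH, width);
            tif.SetField(TiffTag.IMAGELENGTH, height);
            tif.SetField(TiffTag.SAMPLESPERPIXEL, 1);
            tif.SetField(TiffTag.BITSPERSAMPLE, 8);
            tif.SetField(TiffTag.PHOTOMETRIC, Photometric.MINISBLACK);
            tif.SetField(TiffTag.COMPRESSION, Compression.LZW);
            tif.SetField(TiffTag.SUBFILETYPE, FileType.PAGE);
            tif.SetField(TiffTag.PAGENUMBER, page, frames.Length);

            // One byte per pixel for 8-bit grayscale.
            for (int row = 0; row < height; row++)
            {
                byte[] scanline = new byte[width];
                System.Array.Copy(frames[page], row * width, scanline, 0, width);
                tif.WriteScanline(scanline, row);
            }
            tif.WriteDirectory(); // flush this frame before starting the next
        }
    }
}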
The other thing worth noting is the differences between GDI on different Windows versions. See GDI .NET exceptions.