As far as I know, JPEG has one of the best compression ratios among common image formats, and if that is correct, a JPEG file shouldn't be compressible much further. So please help me understand the following. I create my JPEGs like this:
ImageCodecInfo[] codecs = ImageCodecInfo.GetImageEncoders();
ImageCodecInfo ici = null;
foreach (ImageCodecInfo codec in codecs)
{
    if (codec.MimeType == "image/jpeg")
        ici = codec;
}
EncoderParameters ep = new EncoderParameters();
ep.Param[0] = new EncoderParameter(System.Drawing.Imaging.Encoder.Quality, _quality);
using (MemoryStream ms = new MemoryStream())
{
    Bitmap capture = GetImage();
    capture.Save(ms, ici, ep);
}
I then zip them with SharpZipLib. On average each JPEG is about 130 KB, and after zipping each file shrinks to about 70 KB. How is that possible? There are only two explanations I can imagine:
1. JPEG files can in fact be compressed further by zip libraries.
2. My JPEGs are not created correctly, and it is possible to create better JPEGs (compressed well enough that zip libraries can't shrink them any further).
Does anyone know which it is? If it is possible to create better JPEGs, please tell me how.
Edit:
This is the code I use to zip the JPEGs:
void addnewentry(MemoryStream stream, string pass,
                 string zipFilePath, string entryName)
{
    ICSharpCode.SharpZipLib.Zip.ZipFile zf = new ZipFile(zipFilePath);
    if (!String.IsNullOrEmpty(pass))
        zf.Password = pass;
    StaticDataSource sds = new StaticDataSource(stream);
    zf.BeginUpdate();
    zf.Add(sds, entryName);
    zf.CommitUpdate();
    zf.IsStreamOwner = true;
    zf.Close();
}
public class StaticDataSource : IStaticDataSource
{
    public Stream Stream { get; set; }

    public StaticDataSource(Stream stream)
    {
        this.Stream = stream;
        this.Stream.Position = 0;
    }

    public Stream GetSource()
    {
        this.Stream.Position = 0;
        return this.Stream;
    }
}
As most people have already stated, you can't easily compress such already-compressed files any further. Some people work hard on JPEG recompression (recompression = partially decoding the already compressed file, then re-encoding that data with a custom, stronger model and entropy coder; recompression usually guarantees bit-identical results). Even with those advanced recompression techniques, I have only seen at most about a 25% improvement. PackJPG is one of them. You can have a look at the other compressors here. As you can see, even the top-ranked compressor doesn't quite reach 25% (even though it's very complex).
Taking these facts into consideration, ZIP (actually deflate) cannot improve compression that much (it's very old and inefficient compared with the top 10 compressors). I believe there are two possible reasons for what you observe:
You're accidentally adding some extra data to the JPEG stream (possibly appending it after the JPEG stream).
.NET writes a lot of redundant data to the JFIF file, perhaps large EXIF data and the like.
To solve the problem, you can use a JFIF dump tool to observe what's inside the JFIF container. Also, you may want to try your JPEG files with PackJPG.
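If you suspect the first cause, a rough sketch like the one below (not a full JFIF parser; the file name is a placeholder) will at least tell you whether anything trails the final EOI marker:
// Report how many bytes, if any, come after the last EOI marker (0xFF 0xD9).
// A large trailing block would explain why ZIP can shrink the file so much.
byte[] jpeg = File.ReadAllBytes(@"capture.jpg"); // placeholder path
int lastEoi = -1;
for (int i = 0; i < jpeg.Length - 1; i++)
{
    if (jpeg[i] == 0xFF && jpeg[i + 1] == 0xD9)
        lastEoi = i + 1; // index of the last byte of the EOI marker
}
if (lastEoi < 0)
    Console.WriteLine("No EOI marker found - not a complete JPEG.");
else
    Console.WriteLine("{0} byte(s) after the last EOI marker.", jpeg.Length - lastEoi - 1);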
No one has mentioned the fact that JPEG is merely a container. There are many compression methods that can be used with that file format (JFIF, JPEG-2000, JPEG-LS, etc.), so compressing it further can yield varying results depending on the content.
Also, some cameras store huge amounts of EXIF data (sometimes resulting in about 20K of data) and that might account for the difference you're seeing.
The JPEG compression algorithm has two stages: a "lossy" stage where visual elements that should be imperceptible to the human eye are removed, and a "lossless" stage where the remaining data is compressed using a technique called Huffman coding. After Huffman coding, further lossless compression techniques (like ZIP) will not reduce the size of the image file by a significant amount.
However, if you were to zip multiple copies of the same small image together, the ZIP ("DEFLATE") algorithm would recognise the repeated data and exploit it to reduce the total file size to less than the sum of the individual files' sizes. This may be what you're seeing in your experiment.
Stated very simply, lossless compression techniques like Huffman coding (part of JPEG) and DEFLATE (used in ZIP) try to discover repeated patterns in your original data and then represent those repeated patterns with shorter codes.
In short, you won't be able to really improve JPEG by adding on another lossless compression stage.
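If you want to see this for yourself, here is a quick sketch (assuming System.IO.Compression; the file name is a placeholder). Deflating a well-formed JPEG should only gain a few percent:
// Deflate an already-encoded JPEG and compare sizes.
byte[] original = File.ReadAllBytes(@"photo.jpg"); // placeholder path
using (var compressed = new MemoryStream())
{
    using (var deflate = new DeflateStream(compressed, CompressionMode.Compress, true))
    {
        deflate.Write(original, 0, original.Length);
    } // disposing the DeflateStream flushes the final block; leaveOpen keeps "compressed" usable
    Console.WriteLine("Original: {0} bytes, deflated: {1} bytes",
                      original.Length, compressed.Length);
}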
You can attempt to compress anything with zlib. You just don't always get a reduction in size.
Usually, compressing a whole JPEG file yields savings of only a handful of bytes, because it compresses the JPEG header (including any plain-text comments or EXIF data).
This may not fully account for the 40 KB of compression you see, unless you have a huge amount of header data or your JPEG data somehow ends up with a lot of repeating values inside.
Zipping JPEGs reduces size because: EXIF data isn't compressed; JPEG is optimized for photos, not GIF-like data; and compressing the files creates a single data stream, which lets patterns be exploited across multiple files and removes the requirement that each file be aligned to a specific block on disk. That last point alone can save around 4 KB per compressed file.
The main problem with zipping pre-compressed images is that it requires extra work (human and CPU) for prep and viewing, which may not be worth the effort (unless you have millions of images that are infrequently accessed, or some kind of automated image service you're developing).
A better approach is to minimize the native file size, forgetting zip. There are many free libraries and apps out there to help with this. For example, ImageOptim combines several libs into one (OptiPNG, PNGCrush, Zopfli, AdvPNG, Gifsicle, PNGOUT), for a barrage of aggressive tricks to minimize size. Works great for PNGs; haven't tried it much with JPEGs.
Though remember that with any compression, there's always a point of diminishing returns. It's up to you to decide whether or not a few extra bytes really matter in the long run.
A little bit of background:
I'm writing a bar code image scanner desktop app using WPF, which can take input either from a file location (a previously scanned image) or directly from a scanner (using NTWAIN). In both cases I create or receive a stream.
Now, when I create a new Bitmap from the stream and save it as a JPEG file using an encoder
using (var bmp = Image.FromStream(rawStream))
{
    EncoderParameter ratio = new EncoderParameter(Encoder.Quality, 100L);
    EncoderParameter depth = new EncoderParameter(Encoder.ColorDepth, 8L);
    EncoderParameters codecParams = new EncoderParameters(2);
    codecParams.Param[0] = ratio;
    codecParams.Param[1] = depth;
    ImageCodecInfo jpegCodecInfo = ImageCodecInfo.GetImageEncoders()
        .FirstOrDefault(x => x.FormatID == ImageFormat.Jpeg.Guid);
    bmp.Save(file.FileFullPath, jpegCodecInfo, codecParams); // Save to JPG
}
or the built in
bmp.Save(file.FileFullPath, ImageFormat.Jpeg);
I tend to end up with much larger file sizes. Of course, this isn't always the case, but it's definitely true when I load a small black-and-white TIFF file into memory and encode it as JPG.
My knowledge of image handling is rudimentary, but I think it's because the JPG files are saved with a color depth of 24 bits while the TIFF images are originally stored at 1 bit (black and white).
No matter what I do, I can't get the jpg files to match the original file's bit depth.
The only workaround I found is simply renaming the file to "filename.jpg" and saving it like so:
using (Bitmap bmp = new Bitmap(rawStream))
{
    bmp.Save(file.FileFullPath);
}
But this feels like a solution that won't keep working indefinitely. (As a side question: can one simply rename any *.bmp or *.tiff file to *.jpg and have it still work?)
Based on my initial research, it seems that bmp.Save() doesn't honor the encoder parameter for bit depth in JPEG images. Understandably, my clients won't be happy having files grow from 16 KB to 200 KB for "no reason".
Is there a known work around for this problem or am I missing something obvious when it comes to working with streams and images?
JPEG works best for photographs with a multitude of colors, shades and gradients. Typical bit-depths: 8 (for greyscale) or 24 (for full color).
If you want monochrome (1-bit), I'd recommend against using JPEG, not least because JPEG will introduce encoding artifacts that may not matter for photographs but will look like added "salt and pepper" noise if your original source is 1-bit. And the more you compress, the more of it there will be.
You should try using PNG instead; it has no such artifacts and is better suited to digital sources with sharp edges.
You could also try making the TIFF 50% or 75% smaller with a smart resize algorithm (using e.g. 8-bit output) that turns micro-dots in the original into small gradients in the output. I did this long ago with 1-bit fax/scanner images, with quite good results, but it was too long ago to still have the sources.
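A minimal sketch of the PNG route, reusing the names from your own snippet (the output path is a placeholder):
// Save the scanned page losslessly as PNG; no JPEG ringing around the sharp 1-bit edges.
using (var bmp = Image.FromStream(rawStream))
{
    bmp.Save(Path.ChangeExtension(file.FileFullPath, ".png"), ImageFormat.Png);
}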
I want to save PDF files to a database. If a PDF file is more than 5 MB, the database becomes heavy once there are many such files, so I want to reduce the size of the PDF files as much as possible.
I tried the following code, but it isn't working. Please help me compress a large PDF to a smaller size; for example, a 2 MB PDF compressed down to about 700 KB. I have tried many examples but didn't get the output I wanted, so please help.
PdfReader reader = new PdfReader("D:/User Guid for Artificial Skin.pdf");
PdfStamper stamper = new PdfStamper(reader,
    new FileStream("d:/pdfdoccompressed.pdf", FileMode.Create), PdfWriter.VERSION_1_5);

// Re-serialize each page's content stream so it is written with the new compression settings.
int pageNum = reader.NumberOfPages;
for (int i = 1; i <= pageNum; i++)
{
    reader.SetPageContent(i, reader.GetPageContent(i));
}

stamper.FormFlattening = true;
stamper.Writer.CompressionLevel = PdfStream.BEST_COMPRESSION;
stamper.SetFullCompression();
stamper.Close();
The point with lossless compression is that there is a limit to how much you can compress data. If you look at the file purely as a container and apply a generic compression algorithm to it, you won't deflate it by much, because PDFs are already stored fairly optimally by default.
Very simplified: PDF files can generally only be made significantly smaller when they contain many unused (embedded) objects, such as fonts and form fields, or unoptimized images. Any optimizer you find will simply drop the unused objects and shrink those images by saving them with fewer dots per inch ("a smaller resolution"), fewer bits per pixel ("a smaller bit depth"), or both.
So by passing PdfStream.BEST_COMPRESSION to PdfStamper, you're already doing everything you can. You simply cannot trivially and significantly compress the PDF much more than PdfStamper already does.
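About the only cheap extra step worth trying is asking iTextSharp to drop objects that are no longer referenced before stamping. A hedged sketch built on your own snippet:
PdfReader reader = new PdfReader("D:/User Guid for Artificial Skin.pdf");
// Drop orphaned streams, duplicate fonts and other unreferenced objects.
reader.RemoveUnusedObjects();
PdfStamper stamper = new PdfStamper(reader,
    new FileStream("d:/pdfdoccompressed.pdf", FileMode.Create), PdfWriter.VERSION_1_5);
stamper.FormFlattening = true;
stamper.Writer.CompressionLevel = PdfStream.BEST_COMPRESSION;
stamper.SetFullCompression();
stamper.Close();
reader.Close();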
However, from your comments and edits it seems you're simply afraid that this will hurt your database in the future (even though a database is designed to hold data, and a lot of it). That concern is not concrete enough for us to help you with.
So see any of the many previous discussions on whether you actually should store your data like that:
BLOB vs FileSystem
Programmers.SE: Is it a bad practice to store large files (10 MB) in a database?
MS Research: To BLOB or Not To BLOB: Large Object Storage in a Database or a Filesystem (PDF)
DBA.SE: Files - in the database or not?
And many, many others.
I'm currently working on a small program that reads PNG files from disk, makes some modifications, and saves them back. Everything runs smoothly except for one small problem: after I save the file back to disk, its size always increases; for example, a 27.1 MB file becomes 33.3 MB.
After some debugging I finally narrow it down to my reading and saving code. This is the code I'm currently using:
Bitmap img = new Bitmap(<path to file>);
//omitted
img.Save(<path to new file>, ImageFormat.Png);
I've verified that whether or not I make any modifications, simply reading and saving the image causes its size to change. Furthermore, if I open the saved file with Paint and save it from there, the file shrinks back to its original size.
How do I read and save the image without changing its size?
Apart from the color depth and how many channels (with or without alpha) are used, the size of a saved PNG file depends mainly on two factors:
How the pre-processing on image lines (called filtering) is done.
The compression level for the deflate algorithm (0-9).
These two factors greatly affect the output file size. Filtering is empirical: you can use one of the four filtering algorithms for all image lines, use different algorithms for different lines, or even adaptively try several algorithms on each line and keep the one that compresses best. The adaptive approach is the most time consuming and impractical for most image writers.
After filtering, the image data is deflate-compressed. The compression level for the deflate algorithm usually ranges from 0 to 9, from lowest to highest compression. The higher the level, the slower the compression process; usually 4 is a good choice for most images.
The filtering step plays a very important, sometimes crucial, role in PNG compression. Different filtering algorithms can make a large difference in the saved image size, while the file size is less sensitive to the compression level.
You can use a tool like TweakPNG to check the color depth and number of channels the image contains. If the original and the re-saved image have the same color depth and channels, then most probably the filtering and the compression level are the culprits behind the increased file size.
The truth is that if the encoder is not well optimized, more often than not the file size will increase. There is, however, plenty of PNG optimization software out there if you don't mind post-processing your resulting images.
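To rule out a color-depth change before blaming filtering, a quick GDI+ check is enough (file paths are placeholders):
// Compare the pixel formats of the original and the re-saved PNG. If they differ
// (e.g. Format8bppIndexed vs Format32bppArgb), the size increase is explained
// before filtering even enters the picture.
using (var before = Image.FromFile(@"original.png"))
using (var after = Image.FromFile(@"resaved.png"))
{
    Console.WriteLine("Original: {0}, re-saved: {1}", before.PixelFormat, after.PixelFormat);
}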
Have you tried playing with the Encoder.ColorDepth field? PNG also supports transparency and might be saving some information your image doesn't need.
ImageCodecInfo pngCodec = ImageCodecInfo.GetImageEncoders()
    .Where(codec => codec.FormatID.Equals(ImageFormat.Png.Guid)).FirstOrDefault();
if (pngCodec != null)
{
    EncoderParameters parameters = new EncoderParameters();
    parameters.Param[0] = new EncoderParameter(Encoder.ColorDepth, 24); // 8, 16, 24 or 32, based on your format
    image.Save(stream, pngCodec, parameters);
}
Additional info here: https://msdn.microsoft.com/en-us/library/system.drawing.imaging.encoder.colordepth(v=vs.110).aspx
I think you are missing the compression part.
Add to your code like this:
Bitmap img = new Bitmap(<path to file>);
Here is what you missed:
ImageCodecInfo myImageCodecInfo = GetEncoderInfo("image/jpeg");
EncoderParameter myEncoderParameter = new EncoderParameter(Encoder.Quality, 25L);
EncoderParameters myEncoderParameters = new EncoderParameters(1);
myEncoderParameters.Param[0] = myEncoderParameter;
and save like this -
img.Save(<path to file>, myImageCodecInfo, myEncoderParameters);
Here is the MSDN link. Hope it helps.
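Note that GetEncoderInfo is not a framework method; it is the small helper from the MSDN sample that looks up a codec by MIME type, roughly:
// Find the encoder registered for a given MIME type, e.g. "image/jpeg".
private static ImageCodecInfo GetEncoderInfo(string mimeType)
{
    foreach (ImageCodecInfo codec in ImageCodecInfo.GetImageEncoders())
    {
        if (codec.MimeType == mimeType)
            return codec;
    }
    return null;
}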
I have a service that takes a PDF document, resizes all the images, and replaces them in the PDF. The problem I'm running into is the compression.
Some documents are scanned and saved with Compression.CCITTFAX3 and some with Compression.CCITTFAX4. I am using iTextSharp and convert the stream bytes to a TIFF; otherwise the image comes out wrong because of stride or something similar.
Below is the code I'm currently using to check for the correct filter and then convert to a TIFF image.
if (filter == "/CCITTFaxDecode")
{
    // Raw, still CCITT-compressed bytes of the image XObject.
    byte[] data = PdfReader.GetStreamBytesRaw((PRStream)stream);
    using (MemoryStream ms = new MemoryStream())
    {
        using (Tiff myTiff = Tiff.ClientOpen("in-memory", "w", ms, new TiffStream()))
        {
            // Rebuild a TIFF header around the raw strip so it can be decoded.
            myTiff.SetField(TiffTag.IMAGEWIDTH, UInt32.Parse(dict.Get(PdfName.WIDTH).ToString()));
            myTiff.SetField(TiffTag.IMAGELENGTH, UInt32.Parse(dict.Get(PdfName.HEIGHT).ToString()));
            myTiff.SetField(TiffTag.COMPRESSION, Compression.CCITTFAX3);
            myTiff.SetField(TiffTag.BITSPERSAMPLE, UInt32.Parse(dict.Get(PdfName.BITSPERCOMPONENT).ToString()));
            myTiff.SetField(TiffTag.SAMPLESPERPIXEL, 1);
            myTiff.WriteRawStrip(0, data, data.Length);
            myTiff.Flush();
            using (System.Drawing.Image img = new Bitmap(ms))
            {
                if (img == null) continue; // this snippet sits inside a loop over the PDF's image objects
                ReduceResolution(stream, img, quality);
            }
            myTiff.Close();
        }
    }
}
Just to make sure that you understand my question...
I want to find out how I know when to use G3 compression and when to use G4 compression.
Keep in mind that I've tried every code sample I could find.
This is quite important, as we interface with banking systems, and the files uploaded are sent to them as FICA documents.
Please help...
You need to go low level and inspect the image dictionary. The /DecodeParms entry is a dictionary that contains several keys related to CCITT compression. The /K key specifies the compression type: -1 is G4, 0 is G3 1-D, and 1 is G3 2-D.
Update: to be more exact, a negative value (usually -1) is G4, 0 is G3 1-D, and a positive value (usually 1) is G3 2-D. To answer the question in your comment: the /K entry is optional, and if it is missing the default value is 0.
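With iTextSharp that check can look roughly like the sketch below, reusing dict and myTiff from your snippet (note that /DecodeParms can also be an array when /Filter is an array, which this sketch does not handle):
// Inspect /DecodeParms and pick the matching CCITT group:
// /K < 0 => Group 4; /K == 0 => Group 3 1-D (also the default when /K is absent); /K > 0 => Group 3 2-D.
int k = 0;
PdfDictionary decodeParms = dict.GetAsDict(PdfName.DECODEPARMS);
if (decodeParms != null)
{
    PdfNumber kValue = decodeParms.GetAsNumber(PdfName.K);
    if (kValue != null)
        k = kValue.IntValue;
}
Compression compression = k < 0 ? Compression.CCITTFAX4 : Compression.CCITTFAX3;
myTiff.SetField(TiffTag.COMPRESSION, compression);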
I would not advise inserting the data directly. I base this assertion on many years of practical experience with PDF and TIFF in products like ABCpdf .NET (on which I work).
While in theory you should be able to move the data over directly, minor differences between the formats of the compressed data are likely to lead to occasional mismatches.
The fact that some Fax TIFFs contain data which will display correctly in a TIFF viewer but not in a PDF one leads me to suspect that the same kind of problem is likely to operate in the other direction too.
I'm not going to say this kind of problem is common but it is the kind of thing I wouldn't rely on if I was in a bank. Unless you are very sure your data source will be uniform I would suggest it is much safer to decompress and recompress.
I would also note that sometimes images are held inline in the content stream rather than in a separate XObject. Again this is something you will need to cope with unless your data source produces a standard format which you can be sure will not contain this kind of structure.
Thank you for the replies above. The solution from Mihai seems viable if you have all the information from the stream. I found that iTextSharp does not handle this properly, so I ended up buying PDF4Net. That was much simpler than trying to figure out the better approach; besides, it ended up cheaper than the time I had already spent on this.
OnceUponATime, thank you for the information given above.
PDF4Net has a built-in method that returns all the images per page. This sorted out my issues, whereas I had struggled to do it myself using iTextSharp and the examples that were given to me.
First, this question is NOT about "how to save a Bitmap as a jpeg on your disk?"
I can't find (or think of) a way to apply JPEG compression to a Bitmap while keeping it as a Bitmap object. MSDN clearly shows how to save a Bitmap as a JPEG, but what I'm looking for is how to apply the encoding/compression to the Bitmap object itself, so I can still pass it around in my code without referencing a file.
One of the reasons behind this is a helper class that handles bitmaps but shouldn't be aware of the persistence method used.
All images are bitmaps once loaded into program memory. Specific compression is typically applied when writing to disk and undone when reading from disk.
If you're worried about the in-memory footprint of an image, you could zip-compress the bytes and pass the byte array around internally. Zipping would give you lossless compression of the image data. Don't forget that many image compression schemes have different levels of lossiness; in other words, the compression throws away data to store the image in the smallest number of bytes possible.
De/compression is also a performance trade-off: you're trading memory footprint for processing time. And in any case, unless you get really fancy, the image does need to be an uncompressed bitmap if you want to manipulate it in any way.
Here is an answer for a somewhat similar question which you might find interesting.
Bitmap does not support encoded in-memory storage; it is always unencoded (see the PixelFormat enum). You'll probably need to write your own wrapper class/abstraction, or give up on the idea.
var stream = new MemoryStream();
bitmap.Save(stream, ImageFormat.Jpeg);
Does that do what you need?
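Expanded a little, a sketch of the round trip (the source path is a placeholder; keep the MemoryStream alive for as long as the Bitmap created from it is in use, since GDI+ reads from it lazily):
// Re-encode an existing Bitmap as JPEG in memory, then decode it back.
// The bytes inside "ms" are the JPEG-compressed representation that can be
// passed around without ever touching the disk.
Bitmap original = new Bitmap(@"input.bmp");   // placeholder source image
MemoryStream ms = new MemoryStream();
original.Save(ms, ImageFormat.Jpeg);          // JPEG-encoded bytes now live in ms
ms.Position = 0;
Bitmap roundTripped = new Bitmap(ms);         // decoded back into an uncompressed Bitmap
// Dispose roundTripped, ms and original when you are done with them.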