imageresing.net community and developers.
Please, clarify me some details about imageresing.net internals.
Does imageresing.net use .NET Drawing library to recompress jpegs? If not - does it use a 3rd party engine or some internal algorithms?
Are there performance benchmarks? I'd like to compare imageresing.net with other libraries: libjpeg, Intel Integrated Performance Primitives, etc.
Thanks in advance,
Anton
ImageResizer offers 3 imaging pipelines.
GDI+ (System.Drawing) The default. 2-pass high-quality bicubic with pre-smoothing. Very fast for the quality provided. (Avg. 200-300ms for resizing.) A recent windows update makes GDI+ parallelize poorly, but this is being actively investigated by MSFT.
WIC (What WPF also uses). 4-8x faster (20-200ms for resizing). Single-pass resizing causes lots of moire artifacts and causes blurriness (in both Fant and Bicubic modes). No really high quality resizing is available within Windows Imaging Components.
FreeImage. If you need to support DSLR formats or access Lanczos resampling (top-tier quality), FreeImage is your library. It's slower than the others (often 800-2400ms), large, monolithic, and hard to audit, so we only recommend using it with trusted image data. We use a custom version of FreeImage built against libjpeg-turbo, which is significantly faster than libjpeg.
You can mix and match encoders, decoders, and (to some extent) resizing algorithms. Some algorithms are implemented internally for quality reasons, while most are implemented in C/C++ in dependencies.
End-to-end comparison benchmarking is kind of absurd if you care about photo quality, since you can never compare apples to apples. Back in 2011, I did some benchmarks between GDI+ and WIC, but photographers and graphics designers tend to find WIC image quality unacceptable, so it's not particularly fair.
We regularly benchmark each pipeline against itself to detect performance improvements or regressions, but comparing pipelines can be deceptive for a numer of reasons:
Do you care about metadata? libjpeg-turbo is (wierdly) 2-3x faster at reading jpegs when you disable metadata parsing. If you want to auto-rotate based on camera exif data, you'll need that info.
Do you care about color correctness? Jpeg isn't an RGB format. If it has an ICC profile, the right thing to do is convert to sRGB before you resize. That's slow.
Do you care about resizing quality? There are a hundred ways to implement a bicubic resizing filter. Some fast, some slow, most ugly, some accurate. Bicubic WIC != Bicubic GDI+. You can get < 20ms end-to-end out of ImageResizer WIC in nearest neighbor mode - if you're cool with the visual results.
Do you care about output file size? If you're willing to spend more clock cycles, you can gain 30-80% reductions in PNG/GIF file sizes, and 5-15% in jpeg. If you want to add 150-600ms per request, ImageResizer can create WebP images that halve your bandwidth costs (WebP is more costly to encode than jpeg).
You can make sense of micro-benchmarks (is libjpeg-turbo 40% faster than libjpeg under the same circumstances, etc). You can even compare certain simple low-quality image resizing filters (nearest-neighbor, box, bilinear) after excluding encoding, decoding, and color transformation.
The problem is that truly high-quality resizing is really complex, and never implemented the same way twice. There are a vanishingly small number of high-quality implementations, and an even tinier number that have sub-second performance. I ordered a dozen textbooks on image processing to see if I could find a reference implementation, but the topic is... expertly avoided by most, and only briefly touched on by others. Edge-pixel handling, pre-filtering, and performance optimization are never mentioned.
I've funded a lot of research into fast high-quality image resizing, but we haven't been able to match GDI+ yet. ImageResizer's default configuration tends to beat Photoshop quality on many types of images.
A fourth pipeline may be added to ImageResizer in the near future, based on our fork of libgd with custom resizing algorithms. No promises yet, but we may have something nearly as high quality as GDI+ with similar single-threaded (but better concurrent) performance.
All our source code is on GitHub, so if you find something fast that you'd like to demo as a plugin or alternate pipeline, we'd love to hear about it.
Related
I developed a Matrix themed application in Java, and was experimenting with porting to over to C#. However, I found that the latter's DrawString method offered much poorer performance when drawing lots of single characters. Thus I'm hoping there exists one of two possibilities:
There is an alternative method of drawing lots of single characters that is much faster.
There is a way to draw a string with fixed spacing to achieve the same effect. This does not seem likely.
Does anyone know of any way to accomplish either 1 or 2?
Additional information:
I need to be able to draw around 20000 characters 30 times per
second.
The characters can have the same font and size, but colour should be able to be
altered.
The set of characters is finite (letters, numbers, and punctuation).
The location of the characters are along a 2D grid, and do not overlap.
I am not aware of some blazing fast alternative, but using GDI(TextRenderer) instead of GDI+(DrawString) will give a better result. Up to 5-6 times faster.
GDI vs. GDI+ Text Rendering Performance
Another useful article - Rendering fast with GDI+ - What to do and what not to do!
Is there an alternative method of drawing lots of single characters that is much faster?
The Graphics.DrawString methods use GDI+ to draw, which tends to be slower than GDI. GDI is usually hardware accelerated, assuming you have a decent set of graphics drivers installed. The only exception to this is Windows Vista with the Aero theme enabled, but that was fixed in Windows 7. You can switch to GDI instead by calling one of the TextRenderer.DrawText methods instead. Not only is that likely to be somewhat faster than GDI+, there are other advantages to using GDI. The only real disadvantage is that WinForms doesn't support using GDI for printing. But it doesn't sound like that's a concern for you.
Assuming you're targeting only modern versions of Windows that support them, you could also look into some of the new graphics technologies like Direct2D and DirectWrite. For C# wrappers, you might look into the Windows API Code Pack (also see this blog article). OpenGL might also be an option. I haven't used it, but speed is among its claims to fame.
Unfortunately, I'm still not sure if this will be fast enough. A million characters 30 times per second is really an unreasonable amount. No video output device that I know of is even capable of displaying this. Surely there is some type of caching that you could be doing instead. Or perhaps an entirely different design for your application?
Also, keep in mind that drawing into a background buffer (e.g., a bitmap) is often considerably faster than drawing directly onto the display buffer (i.e., the monitor). So you can do all of your drawing onto this background buffer, then periodically blit that to the screen in a single pass. This technique is often known as "double-buffering" (because it uses two buffers), and is a well-known tactic for minimizing flicker. It can also produce speed improvements if you need to draw as quickly as possible, because it allows you to do so while taking into account the inherent limitations of the display output.
There is a way to draw a string with fixed spacing to achieve the same effect. This does not seem likely.
Drawing with fixed spacing is not going to increase the speed over using proportional spacing.
There are a few "tricks of the trade" to keep in mind when you're writing particularly critical drawing code, but I seriously doubt that these will be of much use to you, given how high you're expectations are. They're more for tuning an algorithm, not increasing its performance by multiple orders of magnitude.
Our web server needs to process many compositions of large images together before sending the results to web clients. This process is performance critical because the server can receive several thousands of requests per hour.
Right now our solution loads PNG files (around 1MB each) from the HD and sends them to the video card so the composition is done on the GPU. We first tried loading our images using the PNG decoder exposed by the XNA API. We saw the performance was not too good.
To understand if the problem was loading from the HD or the decoding of the PNG, we modified that by loading the file in a memory stream, and then sending that memory stream to the .NET PNG decoder. The difference of performance using XNA or using System.Windows.Media.Imaging.PngBitmapDecoder class is not significant. We roughly get the same levels of performance.
Our benchmarks show the following performance results:
Load images from disk: 37.76ms 1%
Decode PNGs: 2816.97ms 77%
Load images on Video Hardware: 196.67ms 5%
Composition: 87.80ms 2%
Get composition result from Video Hardware: 166.21ms 5%
Encode to PNG: 318.13ms 9%
Store to disk: 3.96ms 0%
Clean up: 53.00ms 1%
Total: 3680.50ms 100%
From these results we see that the slowest parts are when decoding the PNG.
So we are wondering if there wouldn't be a PNG decoder we could use that would allow us to reduce the PNG decoding time. We also considered keeping the images uncompressed on the hard disk, but then each image would be 10MB in size instead of 1MB and since there are several tens of thousands of these images stored on the hard disk, it is not possible to store them all without compression.
EDIT: More useful information:
The benchmark simulates loading 20 PNG images and compositing them together. This will roughly correspond to the kind of requests we will get in the production environment.
Each image used in the composition is 1600x1600 in size.
The solution will involve as many as 10 load balanced servers like the one we are discussing here. So extra software development effort could be worth the savings on the hardware costs.
Caching the decoded source images is something we are considering, but each composition will most likely be done with completely different source images, so cache misses will be high and performance gain, low.
The benchmarks were done with a crappy video card, so we can expect the PNG decoding to be even more of a performance bottleneck using a decent video card.
There is another option. And that is, you write your own GPU-based PNG decoder. You could use OpenCL to perform this operation fairly efficiently (and perform your composition using OpenGL which can share resources with OpenCL). It is also possible to interleave transfer and decoding for maximum throughput. If this is a route you can/want to pursue I can provide more information.
Here are some resources related to GPU-based DEFLATE (and INFLATE).
Accelerating Lossless compression with GPUs
gpu-block-compression using CUDA on Google code.
Floating point data-compression at 75 Gb/s on a GPU - note that this doesn't use INFLATE/DEFLATE but a novel parallel compression/decompression scheme that is more GPU-friendly.
Hope this helps!
Have you tried the following 2 things.
1)
Multi thread it, there is several ways of doing this but one would be a "all in" method. Basicly fully spawn X amount of threads, for the full proccess.
2)
Perhaps consider having XX thread do all the CPU work, and then feed it to the GPU thread.
Your question is very well formulated for being a new user, but some information about the senario might be usefull?
Are we talking about a batch job or service pictures in real time?
Do the 10k pictures change?
Hardware resources
You should also take into account what hardware resources you have at your dispoal.
Normaly the 2 cheapest things are CPU power and diskspace, so if you only have 10k pictures that rarly change, then converting them all into a format that quicker to handle might be the way to go.
Multi thread trivia
Another thing to consider when doing multithreading, is that its normaly smart to make the threads in BellowNormal priority.So you dont make the entire system "lag". You have to experiment a bit with the amount of threads to use, if your luck you can get close to 100% gain in speed pr CORE but this depends alot on the hardware and the code your running.
I normaly use Environment.ProcessorCount to get the current CPU count and work from there :)
I've written a pure C# PNG coder/decoder ( PngCs ) , you might want to give it a look.
But I higly doubt it will have better speed permance [*], it's not highly optimized, it rather tries to minimize the memory usage for dealing with huge images (it encodes/decodes sequentially, line by line). But perhaps it serves you as boilerplate to plug in some better compression/decompression implementantion. As I see it, the speed bottleneck is zlib (inflater/deflater), which (contrarily to Java) is not implemented natively in C# -I used a SharpZipLib library, with pure C# managed code; this cannnot be very efficient.
I'm a little surprised, however, that in your tests decoding was so much slower than encoding. That seems strange to me, because, in most compression algorithms (perhaps in all; and surely in zlib) encoding is much more computer intensive than decoding.
Are you sure about that?
(For example, this speedtest which read and writes 5000x5000 RGB8 images (not very compressible, about 20MB on disk) gives me about 4.5 secs for writing and 1.5 secs for reading). Perhaps there are other factor apart from pure PNG decoding?
[*] Update: new versions (since 1.1.14) that have several optimizations; if you can use .Net 4.5, specially, it should provide better decoding speed.
You have mutliple options
Improve the performance of the decoding process
You could implement another faster png decoder
(libpng is a standard library which might be faster)
You could switch to another picture format that uses simpler/faster decodeable compression
Parallelize
Use the .NET parallel processing capabilities for decoding concurrently. Decoding is likely singlethreaded so this could help if you run on multicore machines
Store the files uncompressed but on a device that compresses
For instance a compressed folder or even a sandforce ssd.
This will still compress but differently and burden other software with the decompression. I am not sure this will really help and would only try this as a last resort.
We're going to do some experiments on perceptual thresholds and want to show an image for a very short time. I'm speaking about less then 10 ms (our screen supports 144 Hz, that is a new image every 6.94 ms).
But all our approaches until now failed. We tried it with C#: WinForms were much too slow, WPF was faster, but we were still able to see the image, and even showing a texture with XNA framework did not work for us.
Do you have any suggestions for us? We're allowed to use C++ too, but we prefer using C#, so if your suggestions works with C# we would really much appreciate.
I would highly recommend using the XNA Framework for this. Some might consider it overkill but the fact is it is first of all designed to handle high throughput of frames and secondly has a relatively small learning curve.
I'm not a game developer but started by reading an article, modifying a simple project and creating one from scratch.
This would probably be your best bet in terms of the technologies you mentioned in your question.
UPDATE: Regarding measuring the actual time an image is shown, XNA would again be your closest guess short of dealing with specialized hardware and resorting to low-level programming.
I want to compress bitmaps to PNG with different compression levels as these levels are available in JPEG compression in C#. I have 20 to 30 images of different sizes to process in 1 sec. Is there any library to achieve this compression in PNG with different compression levels?
A lossless compression method does not have to be single quality level. I mean, some lossless compression methods have some parameters to choose speed/compression ratio trade-off. It's up to author.
As to PNG compression, it's actually bunch of filters and deflate algorithm. Considering deflate has a redundant encoding stage (e.g. two different compressors can produce completely different output, yet still they can be decompressed by any valid decompressor), it's no wonder several programs outputs different PNGs. Note that, I'm still not taking filters into account. There are several filters and there is no "best" filter i.e. it's quality varies by image.
Due to it's redundant feature, some people wrote PNG optimizers which take a regular PNG as input to produce smaller PNG without any perceptual loss. Some of them applies some tricks such as filling completely transparent area with some predictable colors to increase compression. But, in general they tweak parameters in a brute-force fashion.
Now, simplest answer for your question could be that you have two options:
If you're going to run your executable on a desktop environment, you can use one of these tools as an optimizer. But, in shared hosting you can't use them due to possible privileges.
You can write your own PNG writer in .NET that takes into account some brute-force parameter tuning.
According to this answer C#: Seeking PNG Compression algorithm/library PNG in C# is a lossless compression library, e.g. does not support multiple quality levels.
The wiki page on PNG (http://en.wikipedia.org/wiki/Portable_Network_Graphics) seems to confirm the compression format is lossless (e.g. has only a single compression level)
Some googling suggests there is research on lossy versions of PNG
I've become rather familiar with the System.Drawing namespace in terms of knowing the basic steps of how to generate an image, draw on it, write text on it, etc. However, it's so darn slow for anything approaching print-quality. I have seen some suggestions using COM to talk to native Windows GDI to do this quicker but I was wondering if there were any optimizations I could make to allow for high speed, high quality image generation. I have tried playing with the anti-aliasing options and such immediately available to the Graphics, Bitmap and Image objects but are there any other techniques I can employ to do this high speed?
Writing this I just had the thought to use the task library in .Net 4 to do MORE work even though each generation task wouldn't be any quicker.
Anyway, thoughts and comments appreciated.
Thanks!
If you want raw speed, the best option is to use DirectX. The next best is probably to use GDI from C++ and provide a managed interface to call it. Then probably to p/invoke to GDI directly from C#, and last to use GDI+ in C#. But depending on what you're doing you may not see a huge difference. Multithreading may not help you if you're limited by the speed at which the graphics card can be driven by GDI+, but could be beneficial if you're processor bound while working out "what to draw". If you're printing many images in sequence, you may gain by running precalculation, rendering, and printing "phases" on separate threads.
However, there are a number of things you can do to optimise redraw speed, both optimisations and compromises, and these approaches will apply to any rendering system you choose. Indeed, most of these stem from the same principles used when optimising code.
How can you minimise the amount that you draw?
Firstly, eliminate unnecessary work. Think about each element you are drawing - is it really necessary? Often a simpler design can actually look better while saving a lot of rendering effort. Consider whether a grad fill can be replaced by a flat fill, or whether a rounded rectangle will look acceptable as a plain rectangle (and test whether doing this even provides any speed benefit on your hardware before throwing it away!)
Challenge your assumptions - e.g. the "high resolution" requirement - often if you're printing on something like a dye-sub printer (which is a process that introduces a bit of colour bleed) or a CMYK printer that uses any form of dithering to mix colours (which has a much lower practical resolution than the dot pitch the printer can resolve), a relatively low resolution anti-aliased image can often produce just as good a result as a super-high-res one. If you're outputting to a 2400dpi black and white printer, you may still find that 1200dpi or even 600dpi is acceptable (you get ever decreasing returns as you increase the resolution, and most people won't notice the difference between 600dpi and 2400dpi). Just print some typical examples out using different source resolutions to see what the results are like. If you can halve the resolution you could potentially render as much as 4x faster.
Generally try to avoid overdrawing the same area - If you want to draw a black frame around a region, you could draw a white rectangle inside a black rectangle, but this means filling all the pixels in the middle twice. You may improve the performance by drawing 4 black rectangles around the outside to draw the frame exactly. Conversely, if you have a lot of drawing primitives, can you reduce the number of primitives you're drawing? e.g. If you are drawing a lot of stripes, you can draw alternating rectangles of different colours (= 2n rectangles), or you can fill the entire background with one colour and then only draw the rectangles for the second colour (= n+1 rectangles). Reducing the number of individual calls to GDI+ methods can often provide significant gains, especially if you have fast graphics hardware.
If you draw any portion of the image more than once, consider caching it (render it into a bitmap and then blitting it to your final image when needed). The more complex this sub-image is, the more likely it is that caching it will pay off. For example, if you have a repeating pattern like a ruler, don't draw every ruler marking as a separate line - render a repeating section of the ruler (e.g. 10 lines or 50) and then blit this prerendered only a few times to draw the final ruler.
Similarly, avoid doing lots of unnecessary work (like many MeasureString calls for values that could be precalculated once or even approximated. Or if you're stepping through a lot of Y values, try to do it by adding an offset on each iteration rather than recaclualting the absolute position using mutliples every time).
Try to "batch" drawing to minimise the number of state changes and/or drawing method calls that are necessary - e.g. draw all the elements that are in one colour/texture/brush before you move on to the next colour. Use "batch" rendering calls (e.g. Draw a polyline primitive once rather than calling DrawLine 100 times).
If you're doing any per-pixel operations, then it's usually a lot faster to grab the raw image buffer and manipulate it directly than to call GetPixel/SetPixel methods.
And as you've already mentioned, you can turn off expensive operations such as anti-aliasing and font smoothing that won't be of any/much benefit in your specific case.
And of course, look at the code you're rendering with - profile it and apply the usual optimisations to help it flow efficiently.
Lastly, there comes a point where you should consider whether a hardware upgrade might be a cheap and effective solution - if you have a slow PC and a low end graphics card there may be a significant gain to be had by just buying a new PC with a better graphics card in it. Or if the images are huge, you may find a couple of GB more RAM eliminates virtual memory paging overheads. It may sound expensive, but there comes a point where the cost/benefit of new hardware is better than ploughing more money into additional work on optimisations (and their ever decreasing returns).
I have a few ideas:
Look at the code in Paint.net. It is an open source paint program written in C#. It could give you some good ideas. You could certainly do this in conjunction with ideas 2 and 3.
If the jobs need to be done in an "immediate" way, you could use something asynchronous to create the images. Depending on the scope of the overall application, you might even use something like NServiceBus to queue an image processing component with the task. Once the task is completed, the sending component would receive a notification via subscribing to the message published upon completion.
The task based solution is good for delayed processing. You could batch the creation of the images and use either the Task approach or something called Quartz.net (http://quartznet.sourceforge.net). It's an open source job scheduler that I use for all my time based jobs.
You can create a new Bitmap image and do LockBits(...), specifying the pixel format you want. Once you have the bits, if you want to use unmanaged code to draw into it, bring in the library, pin the data in memory, and use that library against it. I think you can use GDI+ against raw pixel data, but I was under the impression that System.Drawing is already a thin layer on top of GDI+. Anyways, whether or not I'm wrong about that, with LockBits, you have direct access to pixel data, which can be as fast or slow as you program it.
Once you're done with drawing, you can UnlockBits and viola you have the new image.