I need to draw a two-dimensional grid of squares with centered text on them onto a (transparent) PNG file.
The tiles need a sufficiently large resolution so that the text does not get pixelated too much.
For testing purposes, I create a 2048x2048 px 32-bit PNG image (with an alpha channel for transparency) with 128x128 px tiles, like this one:
The problem is that I need to do this with reasonable performance. All methods I have tried so far took more than 100 ms to complete, while I would need this to take at most 10 ms. Apart from that, the program generating these images needs to be cross-platform and support WebAssembly (but if you have, for example, an idea of how to do this using POSIX threads, etc., I would gladly take that as a starting point, too).
.NET 5 Implementation
```csharp
using System;
using System.Diagnostics;
using System.Drawing;
using System.Drawing.Imaging;

namespace ImageGeneratorBenchmark
{
    class Program
    {
        static int rowColCount = 16;
        static int tileSize = 128;

        static void Main(string[] args)
        {
            var watch = Stopwatch.StartNew();
            Bitmap bitmap = new Bitmap(rowColCount * tileSize, rowColCount * tileSize);
            Graphics graphics = Graphics.FromImage(bitmap);
            Brush[] usedBrushes = { Brushes.Blue, Brushes.Red, Brushes.Green, Brushes.Orange, Brushes.Yellow };
            int totalCount = rowColCount * rowColCount;
            Random random = new Random();
            StringFormat format = new StringFormat();
            format.LineAlignment = StringAlignment.Center;
            format.Alignment = StringAlignment.Center;

            for (int i = 0; i < totalCount; i++)
            {
                int x = i % rowColCount * tileSize;
                int y = i / rowColCount * tileSize;
                graphics.FillRectangle(usedBrushes[random.Next(0, usedBrushes.Length)], x, y, tileSize, tileSize);
                graphics.DrawString(i.ToString(), SystemFonts.DefaultFont, Brushes.Black, x + tileSize / 2, y + tileSize / 2, format);
            }

            // Save step (its timing is discussed below); the file name is assumed.
            bitmap.Save("test.png", ImageFormat.Png);
            Console.WriteLine($"Output took {watch.ElapsedMilliseconds} ms.");
        }
    }
}
```
This takes around 115 ms on my machine. I am using the System.Drawing.Common NuGet package here.
Saving the bitmap takes roughly 55 ms, and drawing to the Graphics object in the loop takes roughly 60 ms, of which 40 ms can be attributed to drawing the text.
Rust Implementation
```rust
use std::path::Path;
use std::time::Instant;
use image::{Rgba, RgbaImage};
use imageproc::{drawing::{draw_text_mut, draw_filled_rect_mut, text_size}, rect::Rect};
use rusttype::{Font, Scale};
use rand::Rng;

struct TextureAtlas {
    segment_size: u16,    // The side length of the tile
    row_col_count: u8,    // The number of tiles in horizontal and vertical direction
    current_segment: u32, // Points to the next segment that will be used
}

fn main() {
    let before = Instant::now();
    let mut atlas = TextureAtlas {
        segment_size: 128,
        row_col_count: 16,
        current_segment: 0,
    };
    let path = Path::new("test.png");
    let colors = vec![Rgba([132u8, 132u8, 132u8, 255u8]), Rgba([132u8, 255u8, 32u8, 120u8]), Rgba([200u8, 255u8, 132u8, 255u8]), Rgba([255u8, 0u8, 0u8, 255u8])];
    let mut image = RgbaImage::new(2048, 2048);
    let font = Vec::from(include_bytes!("../assets/DejaVuSans.ttf") as &[u8]);
    let font = Font::try_from_vec(font).unwrap();
    let font_size = 40.0;
    let scale = Scale {
        x: font_size,
        y: font_size,
    };

    // Draw random color rects for benchmarking
    for i in 0..256 {
        let rand_num = rand::thread_rng().gen_range(0..colors.len());
        draw_filled_rect_mut(
            &mut image,
            Rect::at((atlas.current_segment as i32 % atlas.row_col_count as i32) * atlas.segment_size as i32, (atlas.current_segment as i32 / atlas.row_col_count as i32) * atlas.segment_size as i32)
                .of_size(atlas.segment_size.into(), atlas.segment_size.into()),
            colors[rand_num],
        );

        let number = i.to_string();
        //let text = &number[..];
        let text = number.as_str(); // Somehow this conversion takes ~15ms here for 256 iterations, whereas it should normally take less than 1us
        let (w, h) = text_size(scale, &font, text);
        draw_text_mut(
            &mut image,
            Rgba([0u8, 0u8, 0u8, 255u8]),
            (atlas.current_segment % atlas.row_col_count as u32) * atlas.segment_size as u32 + atlas.segment_size as u32 / 2 - w as u32 / 2,
            (atlas.current_segment / atlas.row_col_count as u32) * atlas.segment_size as u32 + atlas.segment_size as u32 / 2 - h as u32 / 2,
            scale,
            &font,
            text,
        );
        atlas.current_segment += 1;
    }

    image.save(path).unwrap();
    println!("Output took {:?}", before.elapsed());
}
```
For Rust I used the imageproc crate. Previously I used the piet-common crate, but the output took more than 300 ms. With the imageproc crate I got around 110 ms in release mode, which is on par with the C# version, but I think it will perform better with WebAssembly.
When I used a static string instead of converting the loop counter (see the comment in the code), I got below 100 ms execution time. In Rust, drawing to the image only takes around 30 ms, but saving it takes 80 ms.
C++ Implementation
```cpp
#include <iostream>
#include <cstdlib>
#include <ctime>
#define cimg_display 0
#define cimg_use_png
#include "CImg.h"
#include <chrono>
#include <string>

using namespace cimg_library;
using namespace std;

/* Generate random numbers in an inclusive range. */
int random(int min, int max)
{
    static bool first = true;
    if (first)
    {
        srand(time(NULL)); // seed once, on the first call
        first = false;
    }
    return min + rand() % ((max + 1) - min);
}

int main() {
    auto t1 = std::chrono::high_resolution_clock::now();
    static int tile_size = 128;
    static int row_col_count = 16;

    // Create 2048x2048px image.
    CImg<unsigned char> image(tile_size*row_col_count, tile_size*row_col_count, 1, 3);

    // Make some colours.
    unsigned char cyan[] = { 0, 255, 255 };
    unsigned char black[] = { 0, 0, 0 };
    unsigned char yellow[] = { 255, 255, 0 };
    unsigned char red[] = { 255, 0, 0 };
    unsigned char green[] = { 0, 255, 0 };
    unsigned char orange[] = { 255, 165, 0 };
    unsigned char colors [] = { // This is terrible, but I don't know C++ very well.
        cyan[0], cyan[1], cyan[2],
        yellow[0], yellow[1], yellow[2],
        red[0], red[1], red[2],
        green[0], green[1], green[2],
        orange[0], orange[1], orange[2],
    };

    int total_count = row_col_count * row_col_count;
    for (size_t i = 0; i < total_count; i++)
    {
        int x = i % row_col_count * tile_size;
        int y = i / row_col_count * tile_size;
        int random_color_index = random(0, 4);
        unsigned char current_color [] = { colors[random_color_index * 3], colors[random_color_index * 3 + 1], colors[random_color_index * 3 + 2] };
        // The image has 3 channels (RGB), so it is always saved as a 24-bit PNG;
        // opacity 1.0 only means "fully opaque". Transparency needs a 4-channel image.
        // The corner coordinates are inclusive, hence the - 1.
        image.draw_rectangle(x, y, x + tile_size - 1, y + tile_size - 1, current_color, 1.0);

        auto s = std::to_string(i);
        CImg<unsigned char> imgtext;
        unsigned char color = 1;
        // Measure the text by drawing to an empty instance, so that the bounding box will be set automatically.
        imgtext.draw_text(0, 0, s.c_str(), &color, 0, 1, 40);
        image.draw_text(x + tile_size / 2 - imgtext.width() / 2, y + tile_size / 2 - imgtext.height() / 2, s.c_str(), black, 0, 1, 40);
    }

    // Save result image as PNG (libpng and GraphicsMagick are required).
    image.save_png("test.png"); // file name assumed

    auto t2 = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count();
    std::cout << "Output took " << duration << "ms.";
}
```
I also reimplemented the same program in C++ using CImg. For PNG output, libpng and GraphicsMagick are required, too. I am not very fluent in C++ and did not even bother optimizing, because the save operation took ~200 ms in Release mode, whereas the whole image generation, which is currently very unoptimized, took only 30 ms. So this solution also falls way short of my goal.
Where I am right now
A graph of where I am right now. I will update this when I make some progress.
Why I am trying to do this and why it bothers me so much
I was asked in the comments to give a bit more context. I know this question is getting a bit bloated, but if you are interested, read on...
So basically I need to build a texture atlas for a .gltf file. I need to generate a .gltf file from data, and the primitives in the .gltf file will be assigned a texture based on the input data, too. To keep the number of draw calls small, I put as much geometry as possible into one single primitive and then use texture coordinates to map the texture onto the model. However, GPUs have a maximum texture size. I will use 2048x2048 pixels, because the majority of devices support at least that. That means that if I have more than 256 objects, I need to add a new primitive to the .gltf and generate another texture atlas. In some cases one texture atlas might be sufficient; in other cases I need up to 15-20.
The textures will have a (semi-)transparent background, maybe text, and maybe some lines / hatches or simple symbols that can be drawn with a path.
I have the whole system set up in Rust already, and the .gltf generation is really efficient: I can generate 54000 vertices (= 1500 boxes, for example) in about 10 ms, which is a common case. For this I need to generate 6 texture atlases, which is not really a problem on a multi-core system (7 threads: one for the .gltf, six for the textures). The problem is that generating one atlas takes about 100 ms (or now 55 ms), which makes the whole process more than 5 times slower.
Unfortunately it gets even worse, because another common case is 15000 objects. Generating the vertices (plus a lot of custom attributes, actually) and assembling the .gltf still only takes 96 ms (540000 vertices / 20 MB .gltf), but in that time I need to generate 59 texture atlases. I am working on an 8-core system, so at that point it becomes impossible to run them all in parallel, and I will have to generate ~9 atlases per thread (which means 55 ms * 9 = 495 ms), so again this is 5 times as much and creates quite a noticeable lag. In reality it currently takes more than 2.5 s, because I have not yet updated to the faster code and there seems to be additional slowdown.
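The atlas counts above follow from ceiling division by the 256 tiles that fit into one 2048x2048 atlas. A small Rust sketch of that arithmetic and of the matching texture coordinates, assuming 16x16 tiles numbered row by row from the top-left (as in the benchmark code) and glTF's convention that UV (0, 0) is the upper-left corner of the image; the function names are hypothetical.

```rust
/// Number of atlases needed for `objects` tiles when one atlas holds
/// `tiles_per_atlas` tiles (ceiling division).
fn atlas_count(objects: u32, tiles_per_atlas: u32) -> u32 {
    (objects + tiles_per_atlas - 1) / tiles_per_atlas
}

/// Index of the atlas that object `index` ends up in.
fn atlas_index(index: u32, row_col_count: u32) -> u32 {
    index / (row_col_count * row_col_count)
}

/// UV rectangle (u0, v0, u1, v1) of object `index` inside its atlas, with
/// `row_col_count` tiles per side laid out row by row from the top-left.
/// glTF puts UV (0, 0) at the top-left of the image, so v grows downward.
fn tile_uv(index: u32, row_col_count: u32) -> (f32, f32, f32, f32) {
    let local = index % (row_col_count * row_col_count); // position within its atlas
    let col = local % row_col_count;
    let row = local / row_col_count;
    let step = 1.0 / row_col_count as f32;
    let (u0, v0) = (col as f32 * step, row as f32 * step);
    (u0, v0, u0 + step, v0 + step)
}
```

With 256 tiles per atlas, `atlas_count(1500, 256)` gives 6 and `atlas_count(15000, 256)` gives 59, which matches the numbers in the text.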
What I need to do
I do understand that it will take some time to write out 4194304 32-bit pixels. But as far as I can see, because each tile only writes to its own part of the image, it should be possible to build a program that does this using multiple threads. That is what I would like to try, and I would take any hint on how to make my Rust program run faster.
If it helps, I would also be willing to rewrite this in C or any other language that can be compiled to Wasm and called via Rust's FFI. So if you have suggestions for more performant libraries, I would be very thankful for that too.
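A minimal sketch of that idea using only the Rust standard library (scoped threads, Rust 1.63+): the RGBA buffer is split with `chunks_mut` into disjoint bands of whole tile rows, so each thread writes its own tiles without any locking. It assumes a tightly packed RGBA8 buffer (4 bytes per pixel, no row padding), and `color_of` is a hypothetical stand-in for whatever picks a tile's color. Only the solid fills are shown; per-tile text rasterization could presumably run in the same worker, while PNG encoding of the finished buffer remains a separate step.

```rust
use std::thread;

/// Fill a square RGBA8 atlas with solid tiles in parallel.
/// Each worker gets a disjoint band of whole tile rows, so no locking is needed.
fn fill_tiles_parallel<F>(pixels: &mut [u8], tile_size: usize, row_col_count: usize, threads: usize, color_of: F)
where
    F: Fn(usize) -> [u8; 4] + Sync,
{
    assert!(tile_size > 0 && row_col_count > 0);
    let row_bytes = tile_size * row_col_count * 4; // bytes per image line
    let tile_row_bytes = row_bytes * tile_size;    // bytes per row of tiles
    assert_eq!(pixels.len(), tile_row_bytes * row_col_count, "buffer size mismatch");
    let threads = threads.max(1);
    let tile_rows_per_band = ((row_col_count + threads - 1) / threads).max(1);
    let color_of = &color_of;

    thread::scope(|s| {
        for (band_idx, band) in pixels.chunks_mut(tile_row_bytes * tile_rows_per_band).enumerate() {
            s.spawn(move || {
                for (r, tile_row) in band.chunks_mut(tile_row_bytes).enumerate() {
                    let first_tile = (band_idx * tile_rows_per_band + r) * row_col_count;
                    // Pick each tile's color once, so every line of a tile gets the same value.
                    let colors: Vec<[u8; 4]> = (0..row_col_count).map(|c| color_of(first_tile + c)).collect();
                    for line in tile_row.chunks_mut(row_bytes) {
                        for (tile_line, rgba) in line.chunks_mut(tile_size * 4).zip(&colors) {
                            for px in tile_line.chunks_exact_mut(4) {
                                px.copy_from_slice(rgba);
                            }
                        }
                    }
                }
            });
        }
    });
}
```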
Update 1: I made all the improvements to the C# version suggested in the comments; thanks for all of them. It is now at 115 ms, almost exactly as fast as the Rust version, which makes me believe I am hitting a dead end there and would really need to find a way to parallelize this to make significant further improvements...
Update 2: Thanks to @pinkfloydx33, I was able to run the binary in around 60 ms (including the first run) after publishing it with `dotnet publish -p:PublishReadyToRun=true --runtime win10-x64 --configuration Release`.
In the meantime I also tried other methods myself, namely Python with Pillow (~400 ms), C# and Rust both with Skia (~314 ms and ~260 ms), and I also reimplemented the program in C++ using CImg (with libpng and GraphicsMagick).
I was able to get all of the drawing (creating the grid and the text) down to 4-5ms by:
- Caching values where possible (Random, StringFormat, Math.Pow)
- Using ArrayPool for the scratch buffer
- Using the DrawString overload accepting a StringFormat with the following options:
  - Alignment and LineAlignment for centering (in lieu of calculating it manually)
  - FormatFlags and Trimming options that disable things like overflow/wrapping, since we are just writing small numbers (this had an impact, though negligible)
- Using a custom Font from the GenericMonospace font family instead of SystemFonts.DefaultFont
  - This shaved off ~15 ms
- Fiddling with various Graphics options, such as TextRenderingHint and SmoothingMode
  - I got varying results, so you may want to fiddle some more
- An array of Color and the ToArgb function to create an int representing the 4 bytes of the pixel's color
- Using LockBits, (semi-)unsafe code and Span to:
  - Fill a buffer representing one line, 1 px high and size * count px wide (the entire image width), with the ints representing the ARGB values of the random colors
  - Copy that buffer size times (so it now covers an entire row of squares)
  - unsafe was required to create a Span<> from the locked bits' Scan0 pointer
- Finally, using GDI+ (Graphics.DrawString) to draw the text over the bitmap
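The "fill one 1 px line, then copy it size times" step from the list above is language-independent. A minimal Rust sketch of the same idea on a tightly packed buffer with one `u32` per pixel; the function name and the `colors` parameter (one pixel value per tile in the current row of tiles) are hypothetical.

```rust
/// Fill one row of tiles in a tightly packed buffer of 32-bit pixels
/// (one u32 per pixel, `width` pixels per line, no padding).
/// `colors` holds one pixel value per tile in this row of tiles.
fn fill_tile_row(image: &mut [u32], width: usize, tile_size: usize, tile_row: usize, colors: &[u32]) {
    let start = tile_row * tile_size * width;
    let band = &mut image[start..start + tile_size * width];
    let (first, rest) = band.split_at_mut(width);
    // 1. Build the first 1 px high line: one solid run per tile.
    for (run, &c) in first.chunks_mut(tile_size).zip(colors) {
        run.fill(c);
    }
    // 2. Duplicate that line into the remaining tile_size - 1 lines.
    let first: &[u32] = first;
    for line in rest.chunks_mut(width) {
        line.copy_from_slice(first);
    }
}
```

Each line after the first is a single bulk copy, which is the part that makes this cheaper than filling every pixel individually.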
I was then able to shave a little time off the actual saving process by using the Image.Save(Stream) overload. I used a FileStream with a custom buffer size of 16 KB (instead of the default 4 KB), which seemed to be the sweet spot. This brought the total end-to-end time down to around 40 ms (on my machine).
```csharp
using System;
using System.Buffers;
using System.Drawing;
using System.Drawing.Drawing2D;
using System.Drawing.Imaging;
using System.Drawing.Text;
using System.IO;
using System.Runtime.InteropServices;

static class GridDrawer
{
    private static readonly Random Random = new();
    private static readonly Color[] UsedColors = { Color.Blue, Color.Red, Color.Green, Color.Orange, Color.Yellow };
    private static readonly StringFormat Format = new()
    {
        Alignment = StringAlignment.Center,
        LineAlignment = StringAlignment.Center,
        FormatFlags = StringFormatFlags.NoWrap | StringFormatFlags.FitBlackBox | StringFormatFlags.NoClip,
        Trimming = StringTrimming.None, HotkeyPrefix = HotkeyPrefix.None
    };

    private static unsafe void DrawGrid(int count, int size, bool save)
    {
        var intsPerRow = size * count;
        var sizePerFullRow = intsPerRow * size;
        var colorsLen = UsedColors.Length;
        using var bitmap = new Bitmap(intsPerRow, intsPerRow, PixelFormat.Format32bppArgb);

        var bmpData = bitmap.LockBits(new Rectangle(0, 0, bitmap.Width, bitmap.Height), ImageLockMode.WriteOnly, PixelFormat.Format32bppArgb);
        var byteSpan = new Span<byte>(bmpData.Scan0.ToPointer(), Math.Abs(bmpData.Stride) * bmpData.Height);
        var intSpan = MemoryMarshal.Cast<byte, int>(byteSpan);
        var arr = ArrayPool<int>.Shared.Rent(intsPerRow);
        var buff = arr.AsSpan(0, intsPerRow);

        for (int y = 0, offset = 0; y < count; ++y)
        {
            // fill buffer with an entire 1px row of colors
            for (var bOffset = 0; bOffset < intsPerRow; bOffset += size)
                buff.Slice(bOffset, size).Fill(UsedColors[Random.Next(0, colorsLen)].ToArgb());

            // duplicate the pixel high row until we've created a row of squares in full
            var len = offset + sizePerFullRow;
            for ( ; offset < len; offset += intsPerRow)
                buff.CopyTo(intSpan.Slice(offset, intsPerRow));
        }

        // Return the scratch buffer and unlock the bits before drawing with GDI+.
        ArrayPool<int>.Shared.Return(arr);
        bitmap.UnlockBits(bmpData);

        using var graphics = Graphics.FromImage(bitmap);
        graphics.TextRenderingHint = TextRenderingHint.ClearTypeGridFit;
        // some or all of these may not even matter?
        // you may try removing/modifying the rest
        graphics.CompositingQuality = CompositingQuality.HighSpeed;
        graphics.InterpolationMode = InterpolationMode.Default;
        graphics.SmoothingMode = SmoothingMode.HighSpeed;
        graphics.PixelOffsetMode = PixelOffsetMode.HighSpeed;
        using var font = new Font(FontFamily.GenericMonospace, 14, FontStyle.Regular);

        var lenSquares = count * count;
        for (var i = 0; i < lenSquares; ++i)
        {
            var x = i % count * size;
            var y = i / count * size;
            var rect = new Rectangle(x, y, size, size);
            graphics.DrawString(i.ToString(), font, Brushes.Black, rect, Format);
        }

        if (save)
        {
            using var fs = new FileStream("Test.png", FileMode.Create, FileAccess.Write, FileShare.Write, 16 * 1024);
            bitmap.Save(fs, ImageFormat.Png);
        }
    }
}
```
Here are the timings (in ms) measured with a Stopwatch in Release mode, run outside of Visual Studio. At least the first one or two timings should be ignored, since the methods aren't fully JIT-compiled yet. Your mileage will vary depending on your PC, etc.
Image generation only:
Elapsed: 38
Elapsed: 6
Elapsed: 4
Elapsed: 4
Elapsed: 4
Elapsed: 4
Elapsed: 5
Elapsed: 4
Elapsed: 5
Elapsed: 4
Elapsed: 4
Image Generation and saving:
Elapsed: 95
Elapsed: 48
Elapsed: 41
Elapsed: 40
Elapsed: 37
Elapsed: 42
Elapsed: 42
Elapsed: 39
Elapsed: 38
Elapsed: 40
Elapsed: 41
I don't think anything can be done about the slow save. I reviewed the source code of Image.Save: it calls into native GDI+, passing in a handle to the stream, the native image pointer and the GUID of the PNG encoder (ImageCodecInfo). Any slowness is going to be on that end. Update: I have verified that you get the same slow speed when saving to a MemoryStream, so this has nothing to do with the fact that you are saving to a file and everything to do with what's going on behind the scenes in native GDI+.
I also attempted to get the image drawing down further using direct unsafe code (pointers) and/or tricks with Unsafe and MemoryMarshal (e.g. CopyBlock), as well as unrolling the loops. Those methods either produced identical or worse results and made things a bit harder to follow.
Note: Publishing as a console application with PublishReadyToRun=true seems to help a bit as well.
I realize that the above is just an example, so this may not apply to your end goal. Upon further, extensive review I found that the bulk of the time is actually spent in Image.Save. It doesn't matter what type of Stream we are saving to; even a MemoryStream exhibits the same slowness (obviously disregarding file I/O). I am confident this is related to having GDI objects in the Image/Graphics, in our case the text from DrawString.
As a "simple" test, I updated the above so that the text was drawn on a secondary, all-white image. Without saving that image, I then looped over its individual pixels and, based on the rough color (since we have anti-aliasing to deal with), manually set the corresponding pixel on the primary bitmap. The entire end-to-end process took under 20 ms on my machine. The rendered image wasn't perfect since it was a quick test, but it proves that you can do parts of this manually and still achieve really low times. The problem is the text drawing, but we can leverage GDI without actually using it in our final image. You just need to find the sweet spot. Using an indexed format and populating the palette with colors beforehand also appeared to help somewhat. Anyway, just food for thought.
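A rough sketch of the compositing half of that experiment, written in Rust: given a text mask rendered black-on-white into a separate grayscale buffer (by GDI+ or any other rasterizer), blend the text color onto the tile pixels manually instead of quantizing to a "rough color". Assumptions: one mask byte per destination pixel, 255 = background and 0 = full text coverage, and straight (non-premultiplied) RGBA; the color blend is exact for opaque tiles and an approximation for semi-transparent ones. The function name is hypothetical.

```rust
/// Blend `text_rgb` onto RGBA pixels using a separately rendered grayscale
/// mask (black text on white): mask 255 = background, 0 = full coverage.
fn composite_text_mask(dst: &mut [[u8; 4]], mask: &[u8], text_rgb: [u8; 3]) {
    assert_eq!(dst.len(), mask.len(), "one mask byte per destination pixel");
    for (px, &m) in dst.iter_mut().zip(mask) {
        let a = 255 - m as u32; // text coverage, 0..=255
        if a == 0 {
            continue; // pure background: keep the tile pixel as it is
        }
        for c in 0..3 {
            // Linear blend with rounding: (fg * a + bg * (255 - a)) / 255
            let (fg, bg) = (text_rgb[c] as u32, px[c] as u32);
            px[c] = ((fg * a + bg * (255 - a) + 127) / 255) as u8;
        }
        // Source-over alpha: a_out = a + a_bg * (1 - a)
        px[3] = (a + (px[3] as u32 * (255 - a) + 127) / 255) as u8;
    }
}
```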
I am trying to detect whether an image is clipped at the bottom and, if it is, split it into two images at the last white pixel row. Below are the simple methods I created to check for clipping and to get the last empty white pixel row. As you can see, this is not a very good solution, and it might cause performance issues for larger images, so any suggestions for a better approach would be a great help:
```csharp
private static bool IsImageBottomClipping(Bitmap image)
{
    for (int i = 0; i < image.Width; i++)
    {
        var pixel = image.GetPixel(i, image.Height - 1);
        if (pixel.ToArgb() != Color.White.ToArgb())
            return true;
    }
    return false;
}

private static int GetLastWhiteLine(Bitmap image)
{
    for (int i = image.Height - 1; i >= 0; i--)
    {
        int whitePixels = 0;
        for (int j = 0; j < image.Width; j++)
        {
            var pixel = image.GetPixel(j, i);
            if (pixel.ToArgb() == Color.White.ToArgb())
                whitePixels = j + 1;
        }
        if (whitePixels == image.Width)
            return i;
    }
    return -1;
}
```
IsImageBottomClipping works fine, but the other method does not return the correct white pixel row: it only returns one less than the image height. Example image:
In this case, a row around 180 should be the return value of the GetLastWhiteLine method, but it returns 192.
All right, so... we've got two subjects to tackle here: first the optimising, then your bug. I'll start with the optimising.
The fastest way is to work in memory directly, but, honestly, it's kind of unwieldy. The second-best choice, which is what I generally use, is to copy the raw image data bytes out of the image object. This will make you end up with four vital pieces of data:
- The width, which you can just get from the image.
- The height, which you can just get from the image.
- The byte array, containing the image bytes.
- The stride, which gives you the number of bytes used for each line of the image.
(Technically, there's a fifth one, namely the pixel format, but we'll just force things to 32bpp here so we don't have to take that into account along the way.)
Note that the stride, technically, is not just the amount of bytes used per pixel multiplied by the image width. It is rounded up to the next multiple of 4 bytes. When working with 32-bit ARGB content, this isn't really an issue, since 32-bit is 4 bytes, but in general, it's better to use the stride and not just the multiplied width, and write all code assuming there could be padded bytes behind each line. You'll thank me if you're ever processing 24-bit RGB content with this kind of system.
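The padding rule can be written as a one-line formula: the row size in bits is rounded up to a whole number of 32-bit words. For example, a 3-pixel-wide 24-bit image has 9 bytes of pixel data per row but a stride of 12. A small Rust sketch (the helper names are hypothetical):

```rust
/// Bytes per scan line when every row is padded up to a multiple of 4 bytes.
fn stride(width: usize, bits_per_pixel: usize) -> usize {
    (width * bits_per_pixel + 31) / 32 * 4
}

/// Byte offset of pixel (x, y) in a buffer copied out with that stride.
fn pixel_offset(x: usize, y: usize, stride: usize, bytes_per_pixel: usize) -> usize {
    y * stride + x * bytes_per_pixel
}
```

Indexing with `y * stride` rather than `y * width * bytes_per_pixel` is what keeps the code correct when rows are padded.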
However, when going over the image's content you obviously should only check the exact range that contains pixel data, and not the full stride.
The way to get these things is quite simple: use LockBits on the image, tell it to expose the image as 32 bit per pixel ARGB data (it will actually convert it if needed), get the line stride, and use Marshal.Copy to copy the entire image contents into a byte array.
```csharp
Int32 width = image.Width;
Int32 height = image.Height;
BitmapData sourceData = image.LockBits(new Rectangle(0, 0, width, height), ImageLockMode.ReadOnly, PixelFormat.Format32bppArgb);
Int32 stride = sourceData.Stride;
Byte[] data = new Byte[stride * height];
Marshal.Copy(sourceData.Scan0, data, 0, data.Length);
// Release the lock as soon as the bytes have been copied out.
image.UnlockBits(sourceData);
```
As mentioned, this is forced to 32-bit ARGB format. If you would want to use this system to get the data out in the original format it has inside the image, just change PixelFormat.Format32bppArgb to image.PixelFormat.
Now, you have to realise, LockBits is a rather heavy operation, which copies the data out, in the requested pixel format, to new memory, where it can be read or (if not specified as read-only as I did here) edited. What makes this more optimal than your method is, quite simply, that GetPixel performs a LockBits operation every time you request a single pixel value. So you're cutting down the number of LockBits calls from several thousand to just one.
Anyway, now, as for your functions.
The first method is, in my opinion, completely unnecessary; you should just run the second one on any image you get. Its output gives you the last white line of the image, so if that value equals height-1 you're done, and if it doesn't, you immediately have the value needed for the further processing. The first function does exactly the same as the second, after all; it checks if all pixels on a line are white. The only difference is that it only processes the last line.
So, onto the second method. This is where things go wrong. You set the amount of white pixels to the "current pixel index plus one", rather than incrementing it to check if all pixels matched, meaning the method goes over all pixels but only really checks if the last pixel on the row was white. Since your image indeed has a white pixel at the end of the last row, it aborts after one row.
Also, whenever you find a pixel that does not match, you should just abort the scan of that line immediately, like your first method does; there's no point in continuing on that line after that.
So, let's fix that second function, and rewrite it to work with that set of "byte array", "stride", "width" and "height", rather than an image. I added the "white" colour as parameter too, to make it more reusable, so it's changed from GetLastWhiteLine to GetLastClearLine.
One general usability note: if you are iterating over the height and width, do actually call your loop variables y and x; it makes things a lot more clear in your code.
I explained the used systems in the code comments.
```csharp
private static Int32 GetLastClearLine(Byte[] sourceData, Int32 stride, Int32 width, Int32 height, Color checkColor)
{
    // Get color as UInt32 in advance.
    UInt32 checkColVal = (UInt32)checkColor.ToArgb();
    // Use MemoryStream with BinaryReader since it can read UInt32 from a byte array directly.
    using (MemoryStream ms = new MemoryStream(sourceData))
    using (BinaryReader sr = new BinaryReader(ms))
    {
        for (Int32 y = height - 1; y >= 0; --y)
        {
            // Set position in the memory stream to the start of the current row.
            ms.Position = stride * y;
            Int32 matchingPixels = 0;
            // Read UInt32 pixels for the whole row length.
            for (Int32 x = 0; x < width; ++x)
            {
                // Read a UInt32 for one whole 32bpp ARGB pixel.
                UInt32 colorVal = sr.ReadUInt32();
                // Compare with check value; stop scanning this row at the first mismatch.
                if (colorVal == checkColVal)
                    matchingPixels++;
                else
                    break;
            }
            // Test if full line matched the given color.
            if (matchingPixels == width)
                return y;
        }
    }
    return -1;
}
```
This can be simplified, though; the loop variable x already contains the value you need, so if you simply declare it before the loop, you can check after the loop what value it had when the loop stopped, and there is no need to increment a second variable. And, honestly, the value read from the stream can be compared directly, without the colorVal variable. Making the contents of the y-loop:
```csharp
ms.Position = stride * y;
Int32 x;
for (x = 0; x < width; ++x)
    if (sr.ReadUInt32() != checkColVal)
        break;
if (x == width)
    return y;
```
For your example image, this gets me value 178, which is correct when I check in Gimp.
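For comparison, the same corrected scan as a Rust sketch over a byte buffer with a stride. It returns `Option<usize>` instead of -1, and `all` stops at the first non-matching pixel, just like the `break` above. It assumes 32bpp pixels compared as little-endian `u32` values, which is how a `Color.ToArgb()` value is laid out in a Format32bppArgb buffer on little-endian machines; the function name is hypothetical.

```rust
/// Scan rows bottom-up and return the last row whose pixels all equal `check`.
/// Only `width * 4` bytes per row are examined, so row padding is ignored.
fn last_clear_line(data: &[u8], stride: usize, width: usize, height: usize, check: u32) -> Option<usize> {
    (0..height).rev().find(|&y| {
        let row = &data[y * stride..y * stride + width * 4]; // skip padding bytes
        row.chunks_exact(4)
            .all(|px| u32::from_le_bytes([px[0], px[1], px[2], px[3]]) == check)
    })
}
```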
Friends! I have a set of small images that I need to lay out as a table in a TIFF file.
This is how the images should be arranged in the final file:
I use the LibTIFF library for this.
How can this be implemented? I am implementing my solution in C#, but the language does not matter, since rewriting the solution is not a problem.
```csharp
var Row = 10;
var Column = 10;
var PIXEL_WIDTH = 8810;
var PIXEL_HEIGHT = 11810;
//Each small image has a resolution of 881x1181
using (Tiff tiff = Tiff.Open("big.tif", "w"))
{
    tiff.SetField(TiffTag.COMPRESSION, Compression.LZW);
    tiff.SetField(TiffTag.PHOTOMETRIC, Photometric.RGB);
    tiff.SetField(TiffTag.ORIENTATION, Orientation.TOPLEFT);
    tiff.SetField(TiffTag.XRESOLUTION, 120);
    tiff.SetField(TiffTag.YRESOLUTION, 120);
    tiff.SetField(TiffTag.BITSPERSAMPLE, 8);
    tiff.SetField(TiffTag.SAMPLESPERPIXEL, 3);
    tiff.SetField(TiffTag.PLANARCONFIG, PlanarConfig.CONTIG);
    int tileC = 0;
    for (int row = 0; row < Row; row++)
    {
        for (int col = 0; col < Column; col++)
        {
            Bitmap bitmap = new Bitmap($"{row}x{col}.png");
            byte[] raster = getImageRasterBytes(bitmap, System.Drawing.Imaging.PixelFormat.Format24bppRgb);
            tiff.WriteEncodedStrip(tileC++, raster, raster.Length);
        }
    }
}
```
Thanks in advance!
First let's put aside the TIFF part of your question. The major problem is that you need to figure out how to organize the pixel data in memory before you can save the final image to any image type.
I will provide my own simple example to illustrate how that pixel data should be organized.
Let's say we want to combine 9 images in a 3x3 table.
Each image will be 3x3 pixels and 8-bit mono (1 channel).
That makes this example nice and simple with 9 bytes per image, and each image having a stride of 3 bytes per row.
The combined image will end up being 9x9 pixels, 81 bytes total.
These images are named A, B, C ... I
A0 is byte 0 of the pixel data for A, A1 is byte 1, and so on...
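These images are going to be organized in a 3x3 table like this: A, B, C in the first row; D, E, F in the second row; G, H, I in the third row.
Then the final data layout would need to look like this:
```
byte[] pixelData = [
    A0, A1, A2, B0, B1, B2, C0, C1, C2,
    A3, A4, A5, B3, B4, B5, C3, C4, C5,
    A6, A7, A8, B6, B7, B8, C6, C7, C8,
    D0, D1, D2, E0, E1, E2, F0, F1, F2,
    D3, D4, D5, E3, E4, E5, F3, F4, F5,
    D6, D7, D8, E6, E7, E8, F6, F7, F8,
    G0, G1, G2, H0, H1, H2, I0, I1, I2,
    G3, G4, G5, H3, H4, H5, I3, I4, I5,
    G6, G7, G8, H6, H7, H8, I6, I7, I8
];
```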
The pixel array above could then be written to any image file format you want, including TIFF.
Notice in the array above:
As you iterate through that array from index 0 to 80, you will be jumping back and forth between the three images that are on the same row until you reach the next row entirely, where the next 3 images from that row are visited in the same pattern.
To achieve a memory layout like this, you can use several approaches.
- TIFF files support breaking up a large image into equal-sized tiles. This could be used to achieve what you are asking for by writing each image to its own tile using the LibTIFF library. There is a limitation that each TIFF tile must have dimensions that are multiples of 16.
- The Graphics class in System.Drawing can be used to make one large blank image, and then you can draw each sub-image into the large image at any desired position. (This is the easiest way to get what you want, but it can be slow.)
- Doing it manually with a loop:
```csharp
// For the example I have given above, in pseudo C# code.
// Assumes all images have the same width, height and pixel format.
int composite_stride = image_stride * 3;                  // 3 is the number of images per row
int composite_size = composite_stride * image_height * 3; // 3 is the number of images per column
byte[] composite_pixels = new byte[composite_size];

// Loop over each image you want to combine.
// We need some way to know which row/column the image is from; let that be assigned to table_row and table_col.
foreach (var image in table)
{
    // Top-left pixel of this image inside the composite.
    int comp_x = table_col * image.width;
    int comp_y = table_row * image.height;

    for (int y = 0; y < image.height; y++)
    {
        // Calculate the array index that the current composite row starts at.
        int comp_row_start = (comp_y + y) * composite_stride;

        for (int x = 0; x < image.width; x++)
        {
            // Calculate the array index in the composite image to write to, and the source image index to copy from.
            int comp_index = comp_row_start + ((comp_x + x) * image.bytes_per_pixel);
            int img_index = (y * image.stride) + (x * image.bytes_per_pixel);

            // Copy every byte (channel) of the pixel, not just the first one.
            for (int b = 0; b < image.bytes_per_pixel; b++)
                composite_pixels[comp_index + b] = image.pixels[img_index + b];
        }
    }
}
```
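The same layout can be produced more cheaply by copying one whole source row at a time instead of single pixels, since each row of a source image lands contiguously in the composite. A Rust sketch under the same assumptions as the pseudocode (equal sizes and pixel formats), plus one more: `images` is given in row-major table order (A, B, C, D, ...). The function name is hypothetical.

```rust
/// Copy equally sized images into one composite buffer, one source row per copy.
/// Returns the composite bytes and its stride (rows are tightly packed).
fn compose(images: &[Vec<u8>], img_w: usize, img_h: usize, bpp: usize, src_stride: usize, cols: usize) -> (Vec<u8>, usize) {
    assert!(cols > 0, "need at least one column");
    let rows = (images.len() + cols - 1) / cols;
    let row_bytes = img_w * bpp;       // pixel bytes per source row (padding excluded)
    let dst_stride = row_bytes * cols; // composite rows are tightly packed
    let mut out = vec![0u8; dst_stride * img_h * rows];
    for (i, img) in images.iter().enumerate() {
        let (table_row, table_col) = (i / cols, i % cols);
        for y in 0..img_h {
            let src = &img[y * src_stride..y * src_stride + row_bytes];
            let dst = (table_row * img_h + y) * dst_stride + table_col * row_bytes;
            out[dst..dst + row_bytes].copy_from_slice(src); // one whole row per copy
        }
    }
    (out, dst_stride)
}
```

For the 3x3 example, the first composite row comes out as A0 A1 A2 B0 B1 B2 C0 C1 C2, as in the array above.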
Say I have this 10x5 array:
and this 1x2 array:
Now I want to write the second array into the bigger one at position 1/2 (X-pos/Y-pos), overwriting the old values (my example is zero-based and inclusive). The result would be:
There might be multiple sub-arrays with a known overwrite hierarchy, the arrays might have more than 3 dimensions, and the arrays contain complex objects.
Is there a best practice to do this in C#?
Is there language agnostic solution?
Okay, Buffer.BlockCopy is significantly faster than copying data byte-by-byte, even after you eliminate bounds checking. Interestingly, it seems that Buffer.BlockCopy does in fact work on 2D arrays :)
So you might want to try something like this:
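```csharp
byte[,] source = new byte[1000, 100]; // [y, x]: 1000 rows of 100 bytes
byte[,] dest = new byte[2048, 2048];
int offsetX = 100;
int offsetY = 20;
// C# rectangular arrays are row-major (the last index is contiguous in memory),
// so with [y, x] indexing every row is one contiguous block of bytes.
int height = source.GetLength(0);
int width = source.GetLength(1);
for (int y = 0; y < height; y++)
{
    Buffer.BlockCopy(
        source, y * width,
        dest, (y + offsetY) * dest.GetLength(1) + offsetX,
        width);
}
```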
Basically, since C# rectangular arrays are stored row-major (the last index varies fastest), I index the byte arrays as [y, x], with Y as the first index and X as the second, and go one row at a time, blitting the whole row from source to dest at once.
It seems much, much faster than simply copying one byte at a time, presumably because it performs an optimized bulk memory copy instead of a per-element loop. Note that Buffer.BlockCopy only works on arrays of primitive types; for arrays of objects, Array.Copy can be used in the same way, since it also treats a multidimensional array as one flat, row-major array.
Do note that this will only be faster if the rows are long enough. If you're copying a single column, it will probably be slower than just copying byte by byte. If you find yourself copying columns more often than rows (i.e. width is usually less than height), you might want to think about inverting the coordinates, i.e. swapping X and Y.
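A language-agnostic version of the same row-by-row idea, sketched in Rust with a flat row-major buffer standing in for the 2D array. Because it only requires `T: Clone`, it also works for arrays of complex objects, and applying several sub-arrays in priority order (lowest first) gives the overwrite hierarchy from the question. The function name and parameters are hypothetical; higher dimensions would need one more loop per extra dimension.

```rust
/// Write `src` (src_w wide, row-major) into `dst` (dst_w wide, row-major)
/// with its top-left corner at (x, y), overwriting the old values.
/// For `Copy` types, `copy_from_slice` would avoid the per-element clones.
fn blit<T: Clone>(dst: &mut [T], dst_w: usize, src: &[T], src_w: usize, x: usize, y: usize) {
    assert!(src_w > 0 && src.len() % src_w == 0, "src must consist of whole rows");
    assert!(x + src_w <= dst_w, "sub-array does not fit horizontally");
    for (row, src_row) in src.chunks_exact(src_w).enumerate() {
        let start = (y + row) * dst_w + x;
        dst[start..start + src_w].clone_from_slice(src_row); // one row per call
    }
}
```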
The image is large, and I used the GetPixel and SetPixel methods to access the pixels, but found that this was far too slow, so I tried to use LockBits and UnlockBits instead but could not get my head around them. I also went through Bob Powell's tutorials but could not understand them. So I am asking for some help here to compute the GLCM (gray-level co-occurrence matrix) of the image.
GLCM is generally a very computationally intensive algorithm. It iterates through each pixel, for each neighbor. Even C++ image processing libraries have this issue.
GLCM does however lend itself quite nicely to parallel (multi-threaded) implementations as the calculations for each reference pixel are independent.
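To illustrate that independence, here is a Rust sketch that splits the image rows across threads, lets each thread count into its own private matrix, and sums the partial matrices at the end, so no synchronization is needed while counting. Assumptions: a single non-negative pixel offset (dx, dy), 256 gray levels, and an 8-bit grayscale buffer with a row stride (padding bytes are never read as pixels). The function name is hypothetical.

```rust
use std::thread;

/// GLCM for offset (dx, dy): glcm[i * 256 + j] counts reference pixels with
/// value i whose neighbour at (x + dx, y + dy) has value j.
fn glcm_parallel(pixels: &[u8], width: usize, height: usize, stride: usize,
                 dx: usize, dy: usize, threads: usize) -> Vec<u32> {
    let rows = height.saturating_sub(dy); // reference rows whose neighbour row exists
    let cols = width.saturating_sub(dx);  // reference columns whose neighbour column exists
    let threads = threads.max(1);
    let per_thread = (rows + threads - 1) / threads;

    let partials: Vec<Vec<u32>> = thread::scope(|s| {
        let handles: Vec<_> = (0..threads)
            .map(move |t| {
                s.spawn(move || {
                    // Each thread owns a private matrix for its band of rows.
                    let mut local = vec![0u32; 256 * 256];
                    let end = ((t + 1) * per_thread).min(rows);
                    for y in (t * per_thread)..end {
                        for x in 0..cols {
                            let i = pixels[y * stride + x] as usize;
                            let j = pixels[(y + dy) * stride + x + dx] as usize;
                            local[i * 256 + j] += 1;
                        }
                    }
                    local
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });

    // Merge the per-thread matrices.
    let mut glcm = vec![0u32; 256 * 256];
    for part in &partials {
        for (g, p) in glcm.iter_mut().zip(part) {
            *g += *p;
        }
    }
    glcm
}
```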
With regard to using LockBits and UnlockBits, see the example code below. One thing to keep in mind is that each row of the image can be padded for alignment reasons, so the stride may be larger than the width. Also, if your image has a different bit depth or multiple channels, you will need to adjust the code accordingly.
```csharp
// Requires an unsafe context. System.Drawing has no Gray8 pixel format, so this
// assumes an 8-bit grayscale image stored as Format8bppIndexed.
BitmapData data = image.LockBits(new Rectangle(0, 0, width, height),
    ImageLockMode.ReadOnly, PixelFormat.Format8bppIndexed);
byte* dataPtr = (byte*)data.Scan0;
int rowPadding = data.Stride - (image.Width);
// iterate over height (rows)
for (int i = 0; i < height; i++)
{
    // iterate over width (columns)
    for (int j = 0; j < width; j++)
    {
        // pixel value
        int value = dataPtr[0];
        // advance to next pixel
        dataPtr++;
    }
    // at the end of each row, skip the extra padding
    if (rowPadding > 0)
        dataPtr += rowPadding;
}
image.UnlockBits(data);
```