How to reduce Base64 size?

I am new to web service development. I am working on a web service that needs to accept image uploads from an Android application.
My problem is that when the user selects an image on the Android device and clicks the upload button, the app encodes the image as a string using the Base64 class, and the resulting string is more than 3000 characters long.
So I need to reduce the size of the Base64 string generated for the image.

Base-64 is, by definition, a fixed-size representation: n bytes of binary data will always encode to the same number of base-64 characters (4 characters for every 3 bytes, rounded up). The most you can do is remove the final chunk's padding, which will save you a whopping 2 characters maximum.
If you want your payload to be shorter, you could:
use a higher base (although you'll need to write the encode/decode manually, and you'll still only be able to get to about base-90 or something, so you won't drastically reduce the size)
use a pure binary post body rather than base-anything (this is effectively base-256)
have less data, either by having smaller images or by using compression (note: many image formats are already compressed, so this often won't help)
use multiple requests
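To make the fixed-size point concrete, the encoded length follows directly from the byte count: every 3 input bytes become 4 characters. A minimal sketch of that arithmetic (the class and method names are made up for illustration):

```csharp
using System;

public static class Base64Size
{
    // Length of standard, padded Base64 for byteCount input bytes: 4 * ceil(n / 3).
    public static long EncodedLength(long byteCount)
    {
        return 4 * ((byteCount + 2) / 3);
    }

    // Number of '=' padding characters at the end of the encoded text (0, 1 or 2).
    public static int PaddingLength(long byteCount)
    {
        return (int)((3 - byteCount % 3) % 3);
    }
}
```

By this formula, the 3000-character string from the question corresponds to only about 2250 bytes of image data, so the realistic way to shrink it is to shrink the image itself.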

Generally speaking, to reduce data size without losing any information you'll need lossless compression. A good starting point there might be Android's built-in zip classes, but you'll still need encoding across the wire.
If it's a captured image, however, changing the parameters of the JPEG compression (or the original resolution) will prove far more useful, as the additional compression you'll get on JPEG-like data that is then base-64'd is likely to be very low.
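Since the question is tagged c#, here is a rough sketch of what lowering the JPEG quality looks like with System.Drawing on the .NET side. The class and method names are hypothetical, and on the Android client the equivalent step would be done with the platform's own bitmap API before encoding to Base64.

```csharp
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using System.Linq;

public static class JpegShrinker
{
    // Re-encodes an image as JPEG at the given quality (0 to 100) and returns the bytes.
    // Lower quality means a smaller payload and more visible artifacts.
    public static byte[] EncodeJpeg(Image image, long quality)
    {
        ImageCodecInfo jpegCodec = ImageCodecInfo.GetImageEncoders()
            .First(c => c.FormatID == ImageFormat.Jpeg.Guid);

        using (var parameters = new EncoderParameters(1))
        using (var ms = new MemoryStream())
        {
            parameters.Param[0] = new EncoderParameter(Encoder.Quality, quality);
            image.Save(ms, jpegCodec, parameters);
            return ms.ToArray();
        }
    }
}
```

The right quality value depends on the images, so it is worth trying a few settings and comparing both the byte counts and the visual result.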

You need to reduce your actual image size first to be able to reduce your base64 size.

Related

What is the most effective way of sending images to the client applications?

I have an existing REST API developed using ASP.NET Web API 2, which returns a byte[] containing the image to the client applications. Apart from the byte[] option, we could send the image to the client as a base64 string. But the base64 format has its own limitations and may not be a fit for all kinds of images (i.e. images with different dimensions).
Can anyone tell me whether there is any other option for returning an image to the client with good performance?

First, Base64 is pointless in this scenario. The only reason to use it is if you need to include the image in a text-based return format. For example, if you were returning JSON, and you wanted to include the image data as a member of that JSON object, then you would need to Base64 encode. Other than that it does nothing for you. In fact, Base64 will increase the size by roughly a third (4 characters for every 3 bytes), since it takes more characters to encode the same data. As a result, you're actually harming performance.
When it comes to alternatives, ultimately everything is essentially a byte array. An image is a binary format, so it's always just a collection of bytes. That said, depending on what you're doing in your action with the image data, you might be better served returning a stream. This will allow the server to directly send the image data down to the client without having to load it into memory first. However, it only works if you're streaming all the way. For example, if you're simply returning a file from the server's filesystem to the client, without modifications, then you can read the file into a stream and return that stream directly. However, if you're manipulating the image, then it will likely be loaded fully into memory regardless, meaning streaming it doesn't really buy you anything. At that point, you can either return a stream or a byte array.
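As a rough illustration of the "return a stream" option, a Web API 2 action can return an HttpResponseMessage whose content wraps a file stream. This is only a sketch: the helper name, the file path and the "image/jpeg" content type are assumptions, not part of the original question.

```csharp
using System.IO;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;

public static class ImageResponses
{
    // Builds a response whose body is streamed from disk rather than buffered in a byte[].
    // Disposing the response also disposes the content and the underlying file stream.
    public static HttpResponseMessage FromFile(string path, string mediaType = "image/jpeg")
    {
        var stream = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read);
        var response = new HttpResponseMessage(HttpStatusCode.OK)
        {
            Content = new StreamContent(stream)
        };
        response.Content.Headers.ContentType = new MediaTypeHeaderValue(mediaType);
        return response;
    }
}
```

A controller action would simply return the result of ImageResponses.FromFile(...) for an unmodified file; if the image is resized or otherwise processed first, a byte array is just as reasonable.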

Data concatenation as steganography technique

For example, I recorded a video using my camera and saved it as my_vacation.mp4, whose size is 50MB. Using File.ReadAllBytes() in C#, I read the video file and an encrypted file called secret_message.dat, concatenated both byte arrays, and then saved the result as my_vacation_2.mp4.
The program I created for testing purposes saves the byte index where the hidden file begins, and I want to use it as the key to extract that hidden file later.
Now I can play that video file normally, without any error. Total file size is 65MB. Suppose no one could access the original file, of course no one would know that the last 15MB part of that video file is actually another file, right?
What might be the flaw of this technique? Is this also a valid steganography technique?

Is this a valid steganography technique?
Yes, it is. The definition of steganography is hiding information in another medium without someone suspecting its presence or existence. Just because it may be a bad approach doesn't change its intentions at all. If anything, a multitude of papers on steganography mention this technique in their introduction section as an example of how steganography can be applied.
What might be the flaw of this technique?
There are mainly 2 flaws: it is trivial to detect and is absolutely fragile to modification attacks.
Many formats encode their data either by a header which says in advance how many bytes to read before the end of file, or by putting an end-of-file marker, which means to keep on reading data until the marker is encountered. By attaching your data after that, you ensure they won't be read by the appropriate format decoder. This can fool your 11-year old cousin who knows nothing about that sort of stuff, but anyone mildly experienced can load the file and count how many bytes were read. If there are unaccounted bytes in the physical file, that will instantly raise red flags.
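For MP4 in particular, the "count how many bytes were read" check can be sketched in a few lines, because the file is a sequence of top-level boxes that each start with a 4-byte big-endian size and a 4-byte type. The code below is a simplified illustration with made-up names, and the "type must be printable ASCII" test is a heuristic assumption rather than a rule from the specification.

```csharp
using System;
using System.IO;

public static class Mp4TrailingData
{
    // Walks the top-level boxes and returns the offset at which well-formed boxes stop.
    // A result smaller than file.Length means some bytes are not accounted for by any box.
    public static long EndOfBoxes(Stream file)
    {
        long length = file.Length;
        long offset = 0;
        var header = new byte[16];

        while (offset + 8 <= length)
        {
            file.Position = offset;
            int available = (int)Math.Min(16, length - offset);
            if (!Fill(file, header, available) || !IsPlausibleType(header)) break;

            ulong size = ReadBigEndian(header, 0, 4);
            long headerSize = 8;
            if (size == 0) return length;          // box runs to the end of the file
            if (size == 1)                         // 64-bit size follows the type field
            {
                if (available < 16) break;
                size = ReadBigEndian(header, 8, 8);
                headerSize = 16;
            }

            if (size < (ulong)headerSize || size > (ulong)(length - offset)) break;
            offset += (long)size;
        }
        return offset;
    }

    private static bool Fill(Stream s, byte[] buffer, int count)
    {
        int total = 0;
        while (total < count)
        {
            int read = s.Read(buffer, total, count - total);
            if (read <= 0) return false;
            total += read;
        }
        return true;
    }

    private static ulong ReadBigEndian(byte[] b, int start, int count)
    {
        ulong value = 0;
        for (int i = 0; i < count; i++) value = (value << 8) | b[start + i];
        return value;
    }

    // Heuristic: top-level box types such as "ftyp", "moov" and "mdat" are printable ASCII.
    private static bool IsPlausibleType(byte[] header)
    {
        for (int i = 4; i < 8; i++)
            if (header[i] < 0x20 || header[i] > 0x7E) return false;
        return true;
    }
}
```

If EndOfBoxes returns, say, 50MB for a 65MB file, the remaining 15MB is exactly the kind of unaccounted data described above.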
Even worse, it's trivial to fully extract your secret. You may argue it's encrypted, but remember, the aim of steganography is to not raise any suspicion. Most steganalysis approaches put a statistical number to it, e.g., a 60% probability that there is a message hidden in medium X. A few others can go a bit further and guess the approximate length of the embedded secret. In comparison, you're already caught red-handed.
Talking about length, a file of bitrate/compression X and duration Y results in a file of approximately size Z. Even an unsavvy observer will know what's up when the size is 30% larger than expected.
Now, imagine your file is communicated through an insecure channel where a warden inspects its contents and if he suspects foul play, he can modify the file so that the recipient doesn't get the message. In this case, it's as simple as loading the file and resaving it. In fact, your method is so fragile it can be destroyed by even the most unintentional of attacks. By just uploading your track to a site for playback, it can unwittingly reencode it for higher compression, just because it makes sense.

Suppose no one could access the original file, of course no one would know that the last 15MB part of that video file is actually another file, right?
No. Your secret file is encrypted, so that probably rules out any headers showing up in a hex editor, but there is a problem: the MP4 container format and its structure are well known.
You can extract all video/audio tracks and what you are left with is some metadata and your secret message, so it will be obvious that it's not supposed to be there.
It is a valid technique, just not a very effective one.

is there a way to compress a paged .tiff file using C#?

At the end of my process, I need to upload several multi-page .tiff files to a website. The files need to be very small, 500kb or less, when I upload them.
The problem is that even when I resize them as far as I can while keeping the few lines of text in some of them readable, they are still around 1mb each.
I first resize all images going into the tiff files, but it's not enough. I need a way to change their quality to decrease their size as well.
Can C# do this or would I need a third party software to do it?
The files being uploaded MUST be .tiff.

You don't provide much detail about your data, so I can only make some guesses as to what you might need to look at.
First, can you lose some resolution? Can you make the images smaller?
Second, can you lose some color depth? Are you saving the files in a color format when bilevel or greyscale images would suffice?
Third, how clean are these images? Are they photos, scanned documents, what? If they are scanned documents of text or drawings, then some pre-processing to remove noise can make a significant difference in size.
Lastly, what compression method are you saving the file with? Only a lossy format is going to give you the highest degree of compression in most circumstances.
Based on your follow-up:
1) If you can make them smaller, this of course saves significant storage space. Determine the minimum acceptable resolution and standardize on that.
2) If you need to persist color, then this step might not be as effective, since you would have to algorithmically decrease the dynamic range of colors used in the image to an acceptable level before compressing. If you are not sure what this means, then you would probably be best off skipping this completely unless you can spend time learning more about image processing and/or using an image processing library that will simplify this for you.
3) I don't think you addressed this in your comments. If you want more precise help, you should update your original question and add much more detail about what you are trying to accomplish. Provide some explanations of what/why you need to do in order to help determine what tradeoffs make sense.
4) Yes, JPG is a lossy format, but I think you may be confusing a few different things (or I may not be understanding your intent from your description). If you are first resizing your original images down into a new JPG file (an intermediate image file), and then building a multi-page TIFF file with the resized JPG inserted as a source image, then you need to realize that the way the intermediate files are compressed does not necessarily have any correlation with the compression format used in the TIFF file. Depending on what you are using to build and create the TIFF file, the compression used in the TIFF is applied separately, and you probably need to specify those parameters when you save that file. If this is what you are doing, then the intermediate step of saving the JPG files may be increasing the size a bit.
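To show where the TIFF compression is actually specified, here is a minimal sketch using System.Drawing's multi-frame save pattern. The class name and the choice of LZW as the default are assumptions for illustration. For black-and-white scans, EncoderValue.CompressionCCITT4 usually produces much smaller files, but whether the encoder accepts a given pixel format for it is something to verify on your own images.

```csharp
using System.Collections.Generic;
using System.Drawing;
using System.Drawing.Imaging;
using System.Linq;

public static class TiffWriter
{
    // Saves the pages as one multi-page TIFF using the given compression scheme.
    public static void SaveMultiPage(IList<Image> pages, string path,
        EncoderValue compression = EncoderValue.CompressionLZW)
    {
        ImageCodecInfo tiffCodec = ImageCodecInfo.GetImageEncoders()
            .First(c => c.FormatID == ImageFormat.Tiff.Guid);
        Image first = pages[0];

        using (var firstPage = new EncoderParameters(2))
        using (var nextPage = new EncoderParameters(2))
        using (var flush = new EncoderParameters(1))
        {
            // The first Save call creates the file and declares it multi-frame.
            firstPage.Param[0] = new EncoderParameter(Encoder.Compression, (long)compression);
            firstPage.Param[1] = new EncoderParameter(Encoder.SaveFlag, (long)EncoderValue.MultiFrame);
            first.Save(path, tiffCodec, firstPage);

            // Each further page is appended through the first image.
            nextPage.Param[0] = new EncoderParameter(Encoder.Compression, (long)compression);
            nextPage.Param[1] = new EncoderParameter(Encoder.SaveFlag, (long)EncoderValue.FrameDimensionPage);
            for (int i = 1; i < pages.Count; i++)
                first.SaveAdd(pages[i], nextPage);

            // Flush closes the multi-frame file.
            flush.Param[0] = new EncoderParameter(Encoder.SaveFlag, (long)EncoderValue.Flush);
            first.SaveAdd(flush);
        }
    }
}
```

The compression chosen here is independent of how any intermediate JPG files were saved, which is the point made in 4) above.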

File extension from System.Drawing.Image

I'm writing a method that needs to save a System.Drawing.Image to a file. Without knowing the original file the Image was created from, is there any way to determine what file extension it should have?
The best solution I've come up with is to use a Switch/Case statement with the value of Image.RawFormat.
Does it even matter that I save the Image in its original format? Is an Image generated from a PNG any different from, say, one generated from a JPEG? Or is the data stored in an Image object completely generic?

While Steve Danner is correct in that an image created from a JPG will look different to an image created from a PNG, once it's loaded into memory it's an uncompressed data stream.
This means that you can save it out to any file format you want.
However, if you load a JPG image and then save it as another JPG you are throwing away more information due to the compression algorithm. If you do this repeatedly you will eventually lose the image.
If you can I'd recommend always saving as PNG.

Image.RawFormat has cooties, stay away from it. I've seen several reports of it having no legal value for no apparent reason. Undiagnosed as yet.
You are quite right, it doesn't matter what format you save it to. After you loaded the file, the internal format is the same for any bitmap (not vector) with the same pixel format. Generally avoid recompressing jpeg files, they tend to get bigger and acquire more artifacts. Steve mentions multi-frame files, they need to be saved a different way.

Yes, it definitely matters because different file formats support different features such as compression, multiple frames, etc.
I've always used a switch statement like you have, perhaps baked into an extension method or something.
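One possible shape for such an extension method, sketched under the assumption that RawFormat holds a usable value (another answer here reports cases where it does not, hence the fallback). Instead of a hand-written switch, it looks up the installed codec whose format ID matches; the method name and the ".png" fallback are hypothetical choices.

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.Linq;

public static class ImageExtensions
{
    // Returns an extension such as ".jpg" for the image's RawFormat, or the
    // fallback when no installed codec matches (for example an in-memory bitmap).
    public static string GuessExtension(this Image image, string fallback = ".png")
    {
        ImageCodecInfo codec = ImageCodecInfo.GetImageDecoders()
            .FirstOrDefault(c => c.FormatID == image.RawFormat.Guid);
        if (codec == null || string.IsNullOrEmpty(codec.FilenameExtension))
            return fallback;

        // FilenameExtension is a list like "*.JPG;*.JPEG;*.JPE;*.JFIF"; take the first entry.
        string first = codec.FilenameExtension.Split(';')[0];
        return first.TrimStart('*').ToLowerInvariant();
    }
}
```

Usage would look like image.Save("output" + image.GuessExtension()), keeping in mind that the extension only reflects the format the image was loaded from, not necessarily the best format to save it in.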

To answer your question 'Does it even matter that I save the Image in its original format?' explicitly: Yes, it does, but in a negative way.
When you load the image, it is uncompressed internally to a bitmap (or as ChrisF calls it, an uncompressed data stream). So if the original image used a lossy compression (for example jpeg), saving it in the same format will again result in loss of information (i.e. more artifacts, less detail, lower quality). Especially if you have repeated actions of read - modify - save, this is something to avoid.
(Note that it is also something to avoid if you are not modifying the picture. Just the repeated decompress - compress cycles will degrade the image quality).
So if disk space is not an issue here (and it usually isn't in the age of hard disks that are big enough for HD video), always store any intermediate pictures in lossless compression formats, or uncompressed. You may consider saving the final output in a compressed format, depending on what you use it for. (If you want to present those final pictures on the web, jpeg or png would be good choices).

Efficient way to send images via WCF?

I am learning WCF, LINQ and a few other technologies by writing, from scratch, a custom remote control application like VNC. I am creating it with three main goals in mind:
The server will provide 'remote control' on an application level (i.e. seamless windows) instead of full desktop access.
The client can select any number of applications that are running on the server and receive a stream of images of each of them.
A client can connect to more than one server simultaneously.
Right now I am using WCF to send an array of Bytes that represents the window being sent:
using (var ms = new MemoryStream())
using (var bitmap = window.GetBitmap()) // dispose each captured frame to release its GDI+ handle
{
    bitmap.Save(ms, ImageFormat.Jpeg);
    frame.Snapshot = ms.ToArray();
}
GetBitmap implementation:
var wRectangle = GetRectangle();
var image = new Bitmap(wRectangle.Width, wRectangle.Height);
// Dispose the Graphics object as soon as the screen copy is done.
using (var gfx = Graphics.FromImage(image))
{
    gfx.CopyFromScreen(wRectangle.Left, wRectangle.Top, 0, 0, wRectangle.Size, CopyPixelOperation.SourceCopy);
}
return image;
It is then sent via WCF (TCPBinding and it will always be over LAN) to the client and reconstructed in a blank windows form with no border like this:
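using (var ms = new MemoryStream(_currentFrame.Snapshot))
using (var decoded = Image.FromStream(ms))
{
    // Copy the decoded frame so the form's image does not depend on the disposed stream.
    BackgroundImage = new Bitmap(decoded);
}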
I would like to make this process as efficient as possible in both CPU and memory usage with bandwidth coming in third place. I am aiming to have the client connect to 5+ servers with 10+ applications per server.
Is my existing method the best approach (while continuing to use these technologies) and is there anything I can do to improve it?
Ideas that I am looking into (but I have no experience with):
Using an open source graphics library to capture and save the images instead of .Net solution.
Saving as PNG or another image type rather than JPG.
Send image deltas instead of a full image every time.
Try and 'record' the windows and create a compressed video stream instead of picture snapshots (mpeg?).

You should be aware of these points:
Transport: TCP/binary message encoding will be fastest way to transfer your image data
Image capture: you can rely on P/Invoke to access your screen data, as this can be faster and more memory consuming. Some examples: Capturing the Screen Image in C# [P/Invoke], How to take a screen shot using .NET [Managed] and Capturing screenshots using C# (Managed)
You should reduce your image data before sending it:
choose your image format wisely, as some formats have native compression (such as JPG)
for an example of computing a diff, see Find differences between images C#
when sending only the diff image, you can crop it and send just the non-empty areas
Try to inspect your WCF messages. This will help you understand how the messages are formatted and identify how to make them smaller.
Once you have gone through all these steps and are satisfied with your final code, you can download the VncSharp source code. It implements the RFB Protocol (Wikipedia entry), "a simple protocol for remote access to graphical user interfaces. Because it works at the framebuffer level it is applicable to all windowing systems and applications, including X11, Windows and Macintosh. RFB is the protocol used in VNC (Virtual Network Computing)."

I worked on a similar project a while back. This was my general approach:
Rasterized the captured bitmap to tiles of 32x32
To determine which tiles had changed between frames I used unsafe code to compare them 64 bits at a time
On the set of delta tiles I applied one of the PNG filters to improve compressibility and had the best results with the Paeth filter
Used DeflateStream to compress the filtered deltas
Used a BinaryMessageEncoding custom binding to the service to transmit the data in binary instead of the default Base64-encoded version
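The tile comparison and DeflateStream steps above can be sketched roughly as follows. This is not the original code: it works on raw pixel buffers, compares byte by byte in safe code rather than 64 bits at a time, and leaves out the PNG/Paeth filtering step. All names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

public static class TileDelta
{
    // Returns the indices (row-major) of tiles whose pixels differ between two frames.
    // Both frames are raw buffers of width * height * bytesPerPixel bytes.
    public static List<int> ChangedTiles(byte[] previous, byte[] current,
        int width, int height, int bytesPerPixel, int tileSize = 32)
    {
        var changed = new List<int>();
        int tilesAcross = (width + tileSize - 1) / tileSize;
        int tilesDown = (height + tileSize - 1) / tileSize;
        int stride = width * bytesPerPixel;

        for (int ty = 0; ty < tilesDown; ty++)
        {
            for (int tx = 0; tx < tilesAcross; tx++)
            {
                if (TileDiffers(previous, current, tx * tileSize, ty * tileSize,
                        width, height, stride, bytesPerPixel, tileSize))
                    changed.Add(ty * tilesAcross + tx);
            }
        }
        return changed;
    }

    private static bool TileDiffers(byte[] previous, byte[] current, int x0, int y0,
        int width, int height, int stride, int bytesPerPixel, int tileSize)
    {
        // Tiles on the right and bottom edges may be smaller than tileSize.
        int rows = Math.Min(tileSize, height - y0);
        int rowBytes = Math.Min(tileSize, width - x0) * bytesPerPixel;
        for (int y = 0; y < rows; y++)
        {
            int start = (y0 + y) * stride + x0 * bytesPerPixel;
            for (int i = 0; i < rowBytes; i++)
                if (previous[start + i] != current[start + i]) return true;
        }
        return false;
    }

    // Compresses a buffer (for example the bytes of the changed tiles) with Deflate.
    public static byte[] Deflate(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            using (var deflate = new DeflateStream(output, CompressionMode.Compress, true))
                deflate.Write(data, 0, data.Length);
            return output.ToArray();
        }
    }
}
```

Only the changed tiles, plus their indices, would then need to be sent to the client for each frame.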
Some client-side considerations: when transferring large amounts of data through a WCF service, I found that some parameters of the HttpTransportBindingElement and the XmlDictionaryReaderQuotas were set to pretty conservative values, so you will want to increase them.

Check out this: Large Data and Streaming (WCF)

The fastest way to send data between client/server is to send a byte array, or several byte arrays. That way WCF doesn't have to do any custom serialization on your data.
That said, you should use the new WPF/.Net 3.5 library to compress your images instead of the ones from System.Drawing. The functions in the System.Windows.Media.Imaging namespace are faster than the old ones, and can still be used in winforms.
In order to know if compression is the way to go you will have to benchmark your scenario to know how the compression/decompression time compares to transferring all the bytes uncompressed.
If you transfer the data over internet, then compression will help for sure. Between components on the same machine or on a LAN, the benefit might not be so obvious.
You could also try compressing the image, then chunk the data and send asynchronously with a chunk id which you puzzle together on the client. Tcp connections start slow and increase in bandwidth over time, so starting two or four at the same time should cut the total transfer time (all depending on how much data you are sending). Chunking the compressed images bytes is also easier logic wise compared to doing tiles in the actual images.
Summed up: System.Windows.Media.Imaging should help you both cpu and bandwidth wise compared to your current code. Memory wise I would guess about the same.

Instead of capturing the entire image just send smaller subsections of the image. Meaning: starting in the upper left corner, send a 10x10 pixel image, then 'move' ten pixels and send the next 10px square, and so on. You can then send dozens of small images and then update the painted full image on the client. If you've used RDC to view images on a remote machine you've probably seen it do this sort of screen painting.
Using the smaller image sections you can then split up the deltas as well, so if nothing has changed in the current section you can safely skip it, inform the client that you're skipping it, and then move onto the next section.
You'll definitely want to use compression for sending the images. However you should check to see if you get smaller file sizes from using compression similar to gZip, or if using an image codec gives you better results. I've never run a comparison, so I can't say for certain one way or another.

Your solution looks fine to me, but I suggest (as others did) you use tiles and compress the traffic when possible. In addition, I think you should send the entire image once in a while, just to be sure that the client's deltas have a common "base".
Perhaps you can use an existing solution for streaming, such as RTP-H263 for video streaming. It works great, it uses compression, and it's well documented and widely used. You can then skip the WCF part and go directly to the streaming part (either over TCP or over UDP). If your solution should go to production, perhaps the H263 streaming approach would be better in terms of responsiveness and network usage.

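// Capture the primary screen into a bitmap.
Bitmap scrImg = new Bitmap(Screen.PrimaryScreen.Bounds.Width, Screen.PrimaryScreen.Bounds.Height);
// The Graphics object has to be created from the bitmap before it can draw into it.
using (Graphics scr = Graphics.FromImage(scrImg))
{
    scr.CopyFromScreen(new Point(0, 0), new Point(0, 0), Screen.PrimaryScreen.Bounds.Size);
}
testPictureBox.Image = scrImg;
I use this code to capture my screen.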
