Compressing/decompressing binary files - C#

I need to decompress a *.bin file into a temp file, perform some actions on it in my program, and then compress it back. I already have a function that unpacks the *.bin file, i.e. it produces the temp file the program needs to operate correctly. The function:
public void Unzlib_bin(string initial_location, string target_location)
{
    // Header layout: bytes 8-11 = compressed size, bytes 12-15 = uncompressed size,
    // compressed data starts at offset 16.
    byte[] data = File.ReadAllBytes(initial_location);
    var inflater = new Inflater(false);
    inflater.SetInput(data, 16, BitConverter.ToInt32(data, 8));
    byte[] new_data = new byte[BitConverter.ToInt32(data, 12)];
    inflater.Inflate(new_data);
    File.WriteAllBytes(target_location, new_data);
}
The question is how to pack the temp file back to its original state. I'm trying something like the following, but the result is wrong:
public void Zlib_bin(byte[] data, int length, string target_location)
{
    var compress_data = new List<byte>();
    byte[] new_data = new byte[length];
    var deflater = new Deflater();
    compress_data.AddRange(new byte[] { ... }); // 8 bytes from header, it does not matter
    deflater.SetLevel(Deflater.BEST_COMPRESSION);
    deflater.SetInput(data, 0, data.Length);
    deflater.Finish();
    deflater.Deflate(new_data);
    compress_data.AddRange(BitConverter.GetBytes(new_data.Length));
    compress_data.AddRange(BitConverter.GetBytes(data.Length));
    compress_data.AddRange(new_data);
    File.WriteAllBytes(target_location, compress_data.ToArray());
}
Any ideas?

If by "the result is wrong" you mean that you are not able to compress back to exactly the same bytes, that is entirely to be expected. There is no guarantee that decompressing followed by compressing gives you the same thing, unless you are using exactly the same compression code, version of that code, and settings for that code.
There are many compressed data streams to represent the same uncompressed data, and compressors are free to use any of them, usually due to tradeoffs in execution time, memory used, and compressed ratio. That is why compressors have "levels" and other adjustments.
The only guarantee for a lossless compressor is that when you compress and then decompress that you get exactly what you started with.
If decompressing what you got gives the same uncompressed data, then all is well.
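That said, there is also a concrete bug in the Zlib_bin attempt: Deflater.Deflate returns the number of bytes it actually produced, and the header should record that count rather than the full buffer length (the code also appends the whole buffer, padding the output with trailing zeros). Here is a minimal sketch of a fix, assuming the same SharpZipLib Deflater and the 16-byte header layout implied by the unpack function; first8HeaderBytes stands in for the 8 elided header bytes:
public void Zlib_bin(byte[] data, byte[] first8HeaderBytes, string target_location)
{
    // false => emit a zlib header, matching new Inflater(false) on the read side
    var deflater = new Deflater(Deflater.BEST_COMPRESSION, false);
    deflater.SetInput(data, 0, data.Length);
    deflater.Finish();

    // Deflate can slightly expand incompressible input, so over-allocate a little
    byte[] buffer = new byte[data.Length + (data.Length / 1000) + 64];
    int compressedLength = 0;
    while (!deflater.IsFinished)
        compressedLength += deflater.Deflate(buffer, compressedLength, buffer.Length - compressedLength);

    using (var fs = File.Create(target_location))
    using (var bw = new BinaryWriter(fs))
    {
        bw.Write(first8HeaderBytes);            // bytes 0-7: original header
        bw.Write(compressedLength);             // bytes 8-11: compressed size
        bw.Write(data.Length);                  // bytes 12-15: uncompressed size
        bw.Write(buffer, 0, compressedLength);  // payload: only the bytes actually produced
    }
}
Even with this, the compressed bytes may still differ from the original file for the reasons above; the test that matters is whether your Unzlib_bin can decompress the result back to the same temp file.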

Stuck on second loop through at the reading part of the stream

public static void RemoteDesktopFunction()
{
    Task.Run(async () =>
    {
        while (!ClientSession.noConnection && data != "§Close§")
        {
            byte[] frameBytes = ScreenShotToByteArray();
            byte[] buffer = new byte[900];
            using (MemoryStream byteStream = new MemoryStream())
            {
                await byteStream.WriteAsync(frameBytes, 0, frameBytes.Length);
                byteStream.Seek(0, SeekOrigin.Begin);
                for (int i = 0; i <= frameBytes.Length; i += buffer.Length)
                {
                    await byteStream.ReadAsync(buffer, i, buffer.Length);
                    await ClientSession.SendData(Encoding.UTF8.GetString(buffer).Trim('\0') + "§RemoteDesktop§");
                }
                await ClientSession.SendData("§RemoteDesktopFrameDone§§RemoteDesktop§");
            }
        }
    });
}
I'm trying to add a remote desktop function to my program by sending chunks of bytes read from the byte stream. frameBytes.Length is about 20,000 bytes in the debugger, and the chunk is 900 bytes. I expected it to read through the frameBytes array and send chunks of data to a network stream. But it got stuck on:
await byteStream.ReadAsync(buffer, i, buffer.Length);
on the second loop through...
What could cause the issue?
There is no obvious reason why this code should hang on ReadAsync. But an obvious problem is that you are not using the return value that tells you how many bytes were actually read, so the last chunk will likely contain a bunch of stale data from the previous chunk at the end. Also note that the offset argument of ReadAsync is an offset into the buffer, not into the stream, so passing i there will fail as soon as i reaches the buffer's length.
Note that there is really no reason to use the async variants to read/write 900 bytes from/to a memory stream. Async is mostly meant to hide IO latency, and writing to memory is not an IO operation.
If the goal is to chunk a byte array you can just use the overload of GetString that takes a span:
var chunk = frameBytes.AsSpan().Slice(i, Math.Min(900, frameBytes.Length - i));
at least on any modern C# version; on older versions you can just use Buffer.BlockCopy. Either way, there is no need for a memory stream.
All this assumes your actual idea is sound. I know little about RDP, but it seems odd to convert an array of more or less random data to a string as if it were UTF-8 encoded. Normally when sending binary data over a text protocol you would encode it as a Base64 string, or possibly prefix it with a command that includes the length. I'm also not sure what the purpose of sending it in chunks is; what is the client supposed to do with 900 bytes of screenshot? But again, I know little about RDP.
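For illustration, here is a rough sketch combining both suggestions: chunking the frame directly from the array and Base64-encoding each chunk so arbitrary bytes survive the text protocol. ClientSession.SendData and the §…§ markers are assumed from the question's code, and the loop condition also fixes the <= off-by-one in the original:
static async Task SendFrameAsync(byte[] frameBytes)
{
    const int chunkSize = 900;
    for (int i = 0; i < frameBytes.Length; i += chunkSize)
    {
        // Never read past the end of the array on the final chunk
        int count = Math.Min(chunkSize, frameBytes.Length - i);
        string chunk = Convert.ToBase64String(frameBytes, i, count);
        await ClientSession.SendData(chunk + "§RemoteDesktop§");
    }
    await ClientSession.SendData("§RemoteDesktopFrameDone§§RemoteDesktop§");
}
The receiving side would then Convert.FromBase64String each chunk and reassemble the frame.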

Decompress tar files using C#

I'm searching for a way to add embedded resources to my solution. These resources will be folders with a lot of files in them. On user demand they need to be decompressed.
I'm searching for a way to store such folders in the executable without involving third-party libraries (looks rather stupid, but this is the task).
I have found that I can GZip and un-GZip them using the standard libraries. But GZip handles only a single file; in such cases TAR should come onto the scene. But I haven't found a TAR implementation among the standard classes.
Maybe it is possible to decompress TAR with bare C#?
While looking for a quick answer to the same question, I came across this thread, and was not entirely satisfied with the current answers, as they all point to taking third-party dependencies on much larger libraries, all just to achieve simple extraction of a tar.gz file to disk.
While the gz format could be considered rather complicated, tar on the other hand is quite simple. At its core, it just takes a bunch of files, prepends a 500-byte header (padded to 512 bytes) to each one describing the file, and writes them all to a single archive on 512-byte alignment. There is no compression; that is typically handled by compressing the created archive into a gz file, which .NET conveniently has built in, and which takes care of the hard part.
Having looked at the spec for the tar format, there are really only two values (especially on Windows) we need to pick out from the header in order to extract a file from the stream. The first is the name, and the second is the size. Using those two values, we need only seek to the appropriate position in the stream and copy the bytes to a file.
I made a very rudimentary, down-and-dirty method to extract a tar archive to a directory, and added some helper functions for opening from a stream or filename, and for decompressing the gz file first using the built-in functions.
The primary method is this:
public static void ExtractTar(Stream stream, string outputDir)
{
    var buffer = new byte[100];
    while (true)
    {
        // Name: first 100 bytes of the 512-byte header
        stream.Read(buffer, 0, 100);
        var name = Encoding.ASCII.GetString(buffer).Trim('\0');
        if (String.IsNullOrWhiteSpace(name))
            break;

        // Size: 12-byte octal string at header offset 124
        stream.Seek(24, SeekOrigin.Current);
        stream.Read(buffer, 0, 12);
        var size = Convert.ToInt64(Encoding.ASCII.GetString(buffer, 0, 12).Trim(), 8);

        // Skip the remainder of the 512-byte header
        stream.Seek(376L, SeekOrigin.Current);

        var output = Path.Combine(outputDir, name);
        if (!Directory.Exists(Path.GetDirectoryName(output)))
            Directory.CreateDirectory(Path.GetDirectoryName(output));
        using (var str = File.Open(output, FileMode.OpenOrCreate, FileAccess.Write))
        {
            var buf = new byte[size];
            stream.Read(buf, 0, buf.Length);
            str.Write(buf, 0, buf.Length);
        }

        // Entries are aligned on 512-byte boundaries; skip the padding
        var pos = stream.Position;
        var offset = 512 - (pos % 512);
        if (offset == 512)
            offset = 0;
        stream.Seek(offset, SeekOrigin.Current);
    }
}
And here are a few helper functions for opening from a file, and for automating the decompression of a tar.gz file/stream before extracting.
public static void ExtractTarGz(string filename, string outputDir)
{
    using (var stream = File.OpenRead(filename))
        ExtractTarGz(stream, outputDir);
}

public static void ExtractTarGz(Stream stream, string outputDir)
{
    // A GZipStream is not seekable, so copy it first to a MemoryStream
    using (var gzip = new GZipStream(stream, CompressionMode.Decompress))
    {
        const int chunk = 4096;
        using (var memStr = new MemoryStream())
        {
            var buffer = new byte[chunk];
            int read;
            // Loop until Read returns 0; Read may legitimately return fewer
            // bytes than requested before the end of the stream.
            while ((read = gzip.Read(buffer, 0, chunk)) > 0)
                memStr.Write(buffer, 0, read);
            memStr.Seek(0, SeekOrigin.Begin);
            ExtractTar(memStr, outputDir);
        }
    }
}

public static void ExtractTar(string filename, string outputDir)
{
    using (var stream = File.OpenRead(filename))
        ExtractTar(stream, outputDir);
}
Here is a gist of the full file with some comments.
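Usage is then a one-liner; the paths here are hypothetical:
// Decompress the gz layer and extract the tar entries in one call
ExtractTarGz(@"C:\temp\resources.tar.gz", @"C:\temp\resources");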
Tar-cs will do the job, but it is quite slow. I would recommend using SharpCompress, which is significantly quicker. It also supports other compression types, and it has been updated recently.
using System;
using System.IO;
using SharpCompress.Common;
using SharpCompress.Reader;

private static String directoryPath = @"C:\Temp";

public static void unTAR(String tarFilePath)
{
    using (Stream stream = File.OpenRead(tarFilePath))
    {
        var reader = ReaderFactory.Open(stream);
        while (reader.MoveToNextEntry())
        {
            if (!reader.Entry.IsDirectory)
            {
                ExtractionOptions opt = new ExtractionOptions
                {
                    ExtractFullPath = true,
                    Overwrite = true
                };
                reader.WriteEntryToDirectory(directoryPath, opt);
            }
        }
    }
}
See tar-cs:
using (FileStream unarchFile = File.OpenRead(tarfile))
{
    TarReader reader = new TarReader(unarchFile);
    reader.ReadToEnd("out_dir");
}
Since you are not allowed to use outside libraries, you are not restricted to a specific format of the tar file either. In fact, the files don't even need to be stored in a single archive.
You can write your own tar-like utility in C# that walks a directory tree and produces two files: a "header" file consisting of a serialized dictionary mapping file paths to offset/length pairs, and a big file containing the contents of the individual files concatenated into one giant blob. This is not a trivial task, but it's not overly complicated either. A minimal sketch of the packing side is below.
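Here is what that packer might look like; the tab-separated header format and all names here are purely illustrative, not a standard:
// Walks sourceDir and produces a header file (path, offset, length per line)
// plus a single blob file with all file contents concatenated.
static void Pack(string sourceDir, string headerPath, string blobPath)
{
    using (var blob = File.Create(blobPath))
    using (var header = new StreamWriter(headerPath))
    {
        foreach (var file in Directory.EnumerateFiles(sourceDir, "*", SearchOption.AllDirectories))
        {
            long offset = blob.Position;
            using (var src = File.OpenRead(file))
                src.CopyTo(blob);
            // Store the path relative to the root, plus where its bytes live in the blob
            string relative = file.Substring(sourceDir.Length).TrimStart('\\', '/');
            header.WriteLine($"{relative}\t{offset}\t{blob.Position - offset}");
        }
    }
}
Unpacking is the mirror image: read the header lines, then for each entry seek to the offset in the blob and copy length bytes out to the target path.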
There are two ways to compress/decompress in .NET. First, you can use the GZipStream and DeflateStream classes. GZipStream compresses your files in the .gz format, so a file compressed with GZipStream can be opened with any popular compression application such as WinZip, WinRAR, or 7-Zip, but you can't open a file compressed with DeflateStream that way. These two classes date from .NET 2.0. A sketch of the GZipStream route is below.
The other way is the Package class. It is much like GZipStream and DeflateStream; the only difference is that you can compress multiple files, which can then be opened with WinZip, WinRAR, or 7-Zip. So that's all .NET has. But it doesn't even produce a generic .zip file;
it's the format Microsoft uses to compress its *x-extension Office files. If you decompress any docx file with the Package class you can see everything stored in it. So don't use the .NET libraries for compressing or even decompressing, because you can't create a generic compressed file or decompress a generic zip file. You have to consider a third-party library such as
http://www.icsharpcode.net/OpenSource/SharpZipLib/
or implement everything from the ground up.
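As a reference for the built-in route, a minimal sketch of round-tripping a single file through GZipStream; the method names and paths are illustrative, and Stream.CopyTo requires .NET 4 or later:
using System.IO;
using System.IO.Compression;

static void GZipFile(string inputPath, string gzPath)
{
    using (var input = File.OpenRead(inputPath))
    using (var output = File.Create(gzPath))
    using (var gz = new GZipStream(output, CompressionMode.Compress))
        input.CopyTo(gz); // standard .gz output, readable by WinZip/WinRAR/7-Zip
}

static void UnGZipFile(string gzPath, string outputPath)
{
    using (var input = File.OpenRead(gzPath))
    using (var gz = new GZipStream(input, CompressionMode.Decompress))
    using (var output = File.Create(outputPath))
        gz.CopyTo(output);
}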

Raw Stream Has Data, Deflate Returns Zero Bytes

I'm reading data (an adCenter report, as it happens), which is supposed to be zipped. Reading the contents with an ordinary stream, I get a couple thousand bytes of gibberish, so this seems reasonable. So I feed the stream to DeflateStream.
First, it reports "Block length does not match with its complement." A brief search suggests that there is a two-byte prefix, and indeed if I call ReadByte() twice before opening DeflateStream, the exception goes away.
However, DeflateStream now returns nothing at all. I've spent most of the afternoon chasing leads on this, with no luck. Help me, StackOverflow, you're my only hope! Can anyone tell me what I'm missing?
Here's the code. Naturally I only enabled one of the two commented blocks at a time when testing.
_results = new List<string[]>();
using (Stream compressed = response.GetResponseStream())
{
    // Skip the zlib prefix, which conflicts with the deflate specification
    compressed.ReadByte(); compressed.ReadByte();

    // Reports reading 3,000-odd bytes, followed by random characters
    /*byte[] buffer = new byte[4096];
    int bytesRead = compressed.Read(buffer, 0, 4096);
    Console.WriteLine("Read {0} bytes.", bytesRead.ToString("#,##0"));
    string content = Encoding.ASCII.GetString(buffer, 0, bytesRead);
    Console.WriteLine(content);*/

    using (DeflateStream decompressed = new DeflateStream(compressed, CompressionMode.Decompress))
    {
        // Reports reading 0 bytes, and no output
        /*byte[] buffer = new byte[4096];
        int bytesRead = decompressed.Read(buffer, 0, 4096);
        Console.WriteLine("Read {0} bytes.", bytesRead.ToString("#,##0"));
        string content = Encoding.ASCII.GetString(buffer, 0, bytesRead);
        Console.WriteLine(content);*/

        using (StreamReader reader = new StreamReader(decompressed))
            while (reader.EndOfStream == false)
                _results.Add(reader.ReadLine().Split('\t'));
    }
}
As you can probably guess from the last line, the unzipped content should be tab-delimited text.
Just for fun, I tried decompressing with GZipStream, but it reports that the magic number is not correct. MS' docs just say "The downloaded report is compressed by using zip compression. You must unzip the report before you can use its contents."
Here's the code that finally worked. I had to save the content out to a file and read it back in. This does not seem reasonable, but for the small quantities of data I'm working with, it's acceptable; I'll take it!
WebRequest request = HttpWebRequest.Create(reportURL);
WebResponse response = request.GetResponse();
_results = new List<string[]>();
using (Stream compressed = response.GetResponseStream())
{
    // Save the content to a temporary location
    string zipFilePath = @"\\Server\Folder\adCenter\Temp.zip";
    using (StreamWriter file = new StreamWriter(zipFilePath))
    {
        compressed.CopyTo(file.BaseStream);
        file.Flush();
    }

    // Get the first file from the temporary zip
    ZipFile zipFile = ZipFile.Read(zipFilePath);
    if (zipFile.Entries.Count > 1)
        throw new ApplicationException("Found " + zipFile.Entries.Count.ToString("#,##0") + " entries in the report; expected 1.");
    ZipEntry report = zipFile[0];

    // Extract the data
    using (MemoryStream decompressed = new MemoryStream())
    {
        report.Extract(decompressed);
        decompressed.Position = 0; // Note that the stream does NOT start at the beginning
        using (StreamReader reader = new StreamReader(decompressed))
            while (reader.EndOfStream == false)
                _results.Add(reader.ReadLine().Split('\t'));
    }
}
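For what it's worth, on .NET 4.5 and later the built-in System.IO.Compression.ZipArchive could do the same job without the third-party ZipFile class and without a temp file. A minimal sketch, assuming the same response stream and _results list:
using System.IO.Compression; // reference System.IO.Compression.dll (.NET 4.5+)

using (Stream compressed = response.GetResponseStream())
using (var ms = new MemoryStream())
{
    compressed.CopyTo(ms); // buffer in memory; ZipArchive needs a seekable stream
    ms.Position = 0;
    using (var zip = new ZipArchive(ms, ZipArchiveMode.Read))
    using (var reader = new StreamReader(zip.Entries[0].Open()))
        while (!reader.EndOfStream)
            _results.Add(reader.ReadLine().Split('\t'));
}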
You will find that DeflateStream is hugely limited in what data it will decompress. In fact, if you are expecting entire zip files it will be of no use at all.
There are hundreds of (mostly small) variations of ZIP files, and DeflateStream will get along with only two or three of them.
The best way is likely to use a dedicated library for reading zip files/streams, like DotNetZip or SharpZipLib (somewhat unmaintained).
You could write the stream to a file and try my tool Precomp on it. If you use it like this:
precomp -c- -v [name of input file]
any ZIP/gZip stream(s) inside the file will be detected and some verbose information will be reported (position and length of the stream). Additionally, if they can be decompressed and recompressed bit-to-bit identical, the output file will contain the decompressed stream(s).
Precomp detects ZIP/gZip (and some other) streams anywhere in the file, so you won't have to worry about header bytes or garbage at the beginning of the file.
If it doesn't detect a stream like this, try to add -slow, which detects deflate streams even if they don't have a ZIP/gZip header. If this fails, you can try -brute which even detects deflate streams that lack the two byte header, but this will be extremely slow and can cause false positives.
After that, you'll know if there is a (valid) deflate stream in the file and if so, the additional information should help you to decompress other reports correctly using zLib decompression routines or similar.

C++ zlib inflate failing - translation of C# fixup?

I'm trying to inflate a string that was compressed with zlib's deflate, but it's failing, apparently because it doesn't have the right header. I read elsewhere that the C# solution to this problem is:
public static byte[] FlateDecode(byte[] inp, bool strict)
{
    MemoryStream stream = new MemoryStream(inp);
    InflaterInputStream zip = new InflaterInputStream(stream);
    MemoryStream outp = new MemoryStream();
    byte[] b = new byte[strict ? 4092 : 1];
    try
    {
        int n;
        while ((n = zip.Read(b, 0, b.Length)) > 0)
        {
            outp.Write(b, 0, n);
        }
        zip.Close();
        outp.Close();
        return outp.ToArray();
    }
    catch
    {
        if (strict)
            return null;
        return outp.ToArray();
    }
}
But I know nothing about C#. I can surmise that all it's doing is adding a prefix to the string, but what that prefix is, I have no idea. Would someone be able to phrase this function (or even just the header creation and string concatenation) in C++?
The data which I'm trying to inflate is taken from a PDF using zlib deflation.
Thanks a million,
Wyatt
I've had better luck using SharpZipLib for zlib interop than with the native .Net Framework classes. This correctly handles streams from C++ (zlib native) and from Java's compression classes without any funny business being needed.
I can't see any prefixes, sorry. Here's what the logic appears to be; sorry this isn't in C++:
    MemoryStream stream = new MemoryStream(inp);
    InflaterInputStream zip = new InflaterInputStream(stream);
Create an inflate stream from the data passed.
    MemoryStream outp = new MemoryStream();
Create a memory buffer stream for output.
    byte[] b = new byte[strict ? 4092 : 1];
    try {
        int n;
        while ((n = zip.Read(b, 0, b.Length)) > 0) {
If you're in strict mode, read up to 4092 bytes - or 1 in non-strict mode - into a byte buffer.
            outp.Write(b, 0, n);
Write all the bytes decoded (which may be fewer than 4092) to the output memory buffer stream.
    zip.Close();
    outp.Close();
    return outp.ToArray();
Clean up, and return the output memory buffer stream as an array.
I'm a bit confused, though: why not just cut array b off at n elements and return that rather than go via a MemoryStream? The code also ought really to take care to clean up the memory streams and zip on exception (e.g. using using) since they're all IDisposable but I guess that's not really important since they don't correspond to I/O file handles, only memory structures.

String compression result as a string

I found the following code on the internet for string compression. When I compress a simple string, the return value is very different (much longer than the input).
For example, Compress("abc") returns "AwAAAB+LCAAAAAAABADtvQdgHEmWJSYvbcp7f0r1StfgdKEIgGATJNiQQBDswYjN5pLsHWlHIymrKoHKZVZlXWYWQMztnbz33nvvvffee++997o7nU4n99//P1xmZAFs9s5K2smeIYCqyB8/fnwfPyKyyfT/AcJBJDUDAAAA"
Can I get a simple string result?
Thanks
using System.IO.Compression;
using System.Text;
using System.IO;

public static string Compress(string text)
{
    byte[] buffer = Encoding.UTF8.GetBytes(text);
    MemoryStream ms = new MemoryStream();
    using (GZipStream zip = new GZipStream(ms, CompressionMode.Compress, true))
    {
        zip.Write(buffer, 0, buffer.Length);
    }
    ms.Position = 0;
    byte[] compressed = new byte[ms.Length];
    ms.Read(compressed, 0, compressed.Length);
    byte[] gzBuffer = new byte[compressed.Length + 4];
    System.Buffer.BlockCopy(compressed, 0, gzBuffer, 4, compressed.Length);
    System.Buffer.BlockCopy(BitConverter.GetBytes(buffer.Length), 0, gzBuffer, 0, 4);
    return Convert.ToBase64String(gzBuffer);
}
The code you are using is intended for compressing really large strings. It compresses the source string using the GZip compression algorithm and then makes it readable (or usable / "passable") by using Base64 encoding.
Base64 expands the source by about 1.33x (each Base64 character carries only 6 bits, so every 3 input bytes become 4 output characters). So for the round trip to make sense, the string has to be compressed to roughly 75% of its source length or less.
The result is expected and usual when using that encoding.
To answer your question, please define what you mean by "simple string result".
Sure, because the result is in Base64 (see the last line in your code).
Compression doesn't always result in a smaller output, for a few reasons:
The input might be completely random, in which case most compressors will not compress anything, but still need to save the decompression "instructions". The result of compressing such data is data + instructions, i.e. bigger.
The input has none of the features the compression algorithm looks for. This is a very similar case to the previous one, except it depends on the compression algorithm used (in your case, GZip).
Very small input. The smaller the input, the less chance there is to find compressible segments in it, so there is a big chance you'll get pseudo-random input (not random, but so small it looks random), and we are back to the first case again.
Base64 is a big point here, yes, but don't forget these small facts about compression in general.
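For completeness, here is a minimal sketch of the matching decompressor for the Compress method in the question, assuming its convention of a 4-byte length prefix in front of the GZip data:
public static string Decompress(string compressedText)
{
    byte[] gzBuffer = Convert.FromBase64String(compressedText);
    int length = BitConverter.ToInt32(gzBuffer, 0); // original UTF-8 byte count
    using (var ms = new MemoryStream(gzBuffer, 4, gzBuffer.Length - 4))
    using (var zip = new GZipStream(ms, CompressionMode.Decompress))
    {
        byte[] buffer = new byte[length];
        int read = 0;
        // Read may return fewer bytes than requested, so loop until done
        while (read < length)
        {
            int n = zip.Read(buffer, read, length - read);
            if (n == 0) break;
            read += n;
        }
        return Encoding.UTF8.GetString(buffer, 0, read);
    }
}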
