I can't figure out what's wrong; the code looks correct to me.
On Windows I get:
21008 80373 0.2613813
80372 80372 1
Deflate on .NET isn't good (it shouldn't be 26%), but more importantly the decompressed size is 1 byte off. Looking at the file (it's HTML), it's missing the last >.
Mono (2.10.5) on Ubuntu gives me:
4096 80373 0.05096239
12297 12297 1
-edit- 5% is incorrect. It's 21% when done properly -end edit- 5% of the original seems reasonable, but the decompressed size is way too small. Looking at the file, it looks like correct HTML data; it just isn't decompressing all of it.
What's wrong with the code?
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.IO.Compression;
namespace DeflateTest
{
class Program
{
static void Main(string[] args)
{
string fn1 = args.First();
var fn2 = fn1 + ".out";
var fn3 = fn1 + ".outorig";
{
using (var m = new MemoryStream())
using (var d = new DeflateStream(m, CompressionMode.Compress, true))
using (var f = File.OpenRead(fn1))
{
var filelen = 0;
var b = new Byte[1024 * 4];
while (true)
{
var len = f.Read(b, 0, b.Length);
if (len <= 0)
break;
filelen += len;
d.Write(b, 0, len);
}
d.Flush();
m.Flush();
var m_arr = m.ToArray();
using (var ff = File.Create(fn2))
{
ff.Write(m_arr, 0, m_arr.Length);
}
Console.WriteLine("{0} {1} {2}", m_arr.Length, filelen, (float)m_arr.Length / filelen);
}
}
{
using (var f = File.OpenRead(fn2))
using (var d = new DeflateStream(f, CompressionMode.Decompress, true))
using (var m = new MemoryStream())
{
var filelen = 0;
var b = new Byte[1024 * 4];
while (true)
{
var len = d.Read(b, 0, b.Length);
if (len <= 0)
break;
filelen += len;
m.Write(b, 0, len);
}
m.Flush();
var m_arr = m.ToArray();
using (var ff = File.Create(fn3)) { ff.Write(m_arr, 0, m_arr.Length); }
Console.WriteLine("{0} {1} {2}", m_arr.Length, filelen, (float)m_arr.Length / filelen);
}
}
}
}
}
d.Flush(); is not enough: a DeflateStream only writes its final compressed block when it is closed. Calling d.Close(); (or disposing the stream) before you read the MemoryStream solves it.
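For reference, a minimal sketch of the corrected compression half, reusing the question's variable names (fn1, fn2):
using (var m = new MemoryStream())
{
    using (var f = File.OpenRead(fn1))
    using (var d = new DeflateStream(m, CompressionMode.Compress, true))
    {
        f.CopyTo(d);
    } // disposing d here writes the final deflate block into m
    File.WriteAllBytes(fn2, m.ToArray());
}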
I'm trying to parse a binary file as fast as possible. This is what I first tried to do:
using (FileStream filestream = path.OpenRead()) {
using (var d = new GZipStream(filestream, CompressionMode.Decompress)) {
using (MemoryStream m = new MemoryStream()) {
d.CopyTo(m);
m.Position = 0;
using (BinaryReaderBigEndian b = new BinaryReaderBigEndian(m)) {
while (b.BaseStream.Position != b.BaseStream.Length) {
UInt32 value = b.ReadUInt32();
} } } } }
Where the BinaryReaderBigEndian class is implemented as follows:
public class BinaryReaderBigEndian : BinaryReader {
public BinaryReaderBigEndian(Stream stream) : base(stream) { }
public override UInt32 ReadUInt32() {
var x = base.ReadBytes(4);
Array.Reverse(x);
return BitConverter.ToUInt32(x, 0);
} }
Then, I tried to get a performance improvement using ReadOnlySpan instead of MemoryStream. So, I tried doing:
using (FileStream filestream = path.OpenRead()) {
using (var d = new GZipStream(filestream, CompressionMode.Decompress)) {
using (MemoryStream m = new MemoryStream()) {
d.CopyTo(m);
int position = 0;
ReadOnlySpan<byte> stream = new ReadOnlySpan<byte>(m.ToArray());
while (position != stream.Length) {
UInt32 value = stream.ReadUInt32(position);
position += 4;
} } } }
Where the BinaryReaderBigEndian class changed to:
public static class BinaryReaderBigEndian {
public static UInt32 ReadUInt32(this ReadOnlySpan<byte> stream, int start) {
var data = stream.Slice(start, 4).ToArray();
Array.Reverse(data);
return BitConverter.ToUInt32(data, 0);
} }
But, unfortunately, I didn't notice any improvement. So, where am I going wrong?
I did some measurements of your code on my computer (Intel Q9400, 8 GiB RAM, SSD, Win10 x64 Home, .NET Framework 4.7.2, tested with a 15 MB (when unpacked) file) with these results:
No-Span version: 520 ms
Span version: 720 ms
So the Span version is actually slower! Why? Because new ReadOnlySpan<byte>(m.ToArray()) performs an additional copy of the whole file, and ReadUInt32() also performs many slicings of the Span (slicing is cheap, but not free). Since you performed more work, you can't expect performance to be any better just because you used Span.
So can we do better? Yes. It turns out that the slowest part of your code is actually the garbage collection caused by the 4-byte arrays repeatedly allocated by the .ToArray() calls in the ReadUInt32() method. You can avoid it by implementing ReadUInt32() yourself. It's pretty easy and also eliminates the need for Span slicing. You can also replace new ReadOnlySpan<byte>(m.ToArray()) with new ReadOnlySpan<byte>(m.GetBuffer()).Slice(0, (int)m.Length), which performs a cheap slice instead of copying the whole file. So now the code looks like this:
public static void Read(FileInfo path)
{
using (FileStream filestream = path.OpenRead())
{
using (var d = new GZipStream(filestream, CompressionMode.Decompress))
{
using (MemoryStream m = new MemoryStream())
{
d.CopyTo(m);
int position = 0;
ReadOnlySpan<byte> stream = new ReadOnlySpan<byte>(m.GetBuffer()).Slice(0, (int)m.Length);
while (position != stream.Length)
{
UInt32 value = stream.ReadUInt32(position);
position += 4;
}
}
}
}
}
public static class BinaryReaderBigEndian
{
public static UInt32 ReadUInt32(this ReadOnlySpan<byte> stream, int start)
{
UInt32 res = 0;
for (int i = 0; i < 4; i++)
{
res = (res << 8) | (((UInt32)stream[start + i]) & 0xff);
}
return res;
}
}
With these changes I got from 720 ms down to 165 ms (4x faster). Sounds great, doesn't it? But we can do even better: we can avoid the MemoryStream copy entirely and inline and further optimize ReadUInt32():
public static void Read(FileInfo path)
{
using (FileStream filestream = path.OpenRead())
{
using (var d = new GZipStream(filestream, CompressionMode.Decompress))
{
var buffer = new byte[64 * 1024];
do
{
int bufferDataLength = FillBuffer(d, buffer);
if (bufferDataLength % 4 != 0)
throw new Exception("Stream length not divisible by 4");
if (bufferDataLength == 0)
break;
for (int i = 0; i < bufferDataLength; i += 4)
{
uint value = unchecked(
(((uint)buffer[i]) << 24)
| (((uint)buffer[i + 1]) << 16)
| (((uint)buffer[i + 2]) << 8)
| (((uint)buffer[i + 3]) << 0));
}
} while (true);
}
}
}
private static int FillBuffer(Stream stream, byte[] buffer)
{
int read = 0;
int totalRead = 0;
do
{
read = stream.Read(buffer, totalRead, buffer.Length - totalRead);
totalRead += read;
} while (read > 0 && totalRead < buffer.Length);
return totalRead;
}
And now it takes less than 90 ms (8x faster than the original!). And without Span! Span is great in situations where it lets you slice and avoid an array copy, but it won't improve performance just because you use it blindly. After all, Span is designed to have performance characteristics on par with Array, but not better (and only on runtimes that have special support for it, such as .NET Core 2.1).
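If you want to reproduce these timings, a rough harness along these lines is enough (a sketch: the file path is a placeholder, Read is the method shown above, and the usual using System; / using System.IO; directives are assumed):
// Warm up once (JIT, OS file cache), then average a few timed runs.
var path = new FileInfo(@"C:\temp\sample.bin.gz"); // placeholder test file
Read(path);
var sw = System.Diagnostics.Stopwatch.StartNew();
const int runs = 5;
for (int i = 0; i < runs; i++)
    Read(path);
sw.Stop();
Console.WriteLine("Average: {0} ms", sw.ElapsedMilliseconds / runs);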
Is it possible to convert a byte[] to a Point[]? I have a canvas and the drawing obtained as a Point[]; I need to store it in the database as a byte[] and then retrieve it and load it again as a Point[].
You can serialize your points into a binary stream to get an array of bytes:
byte[] data;
using (var ms = new MemoryStream()) {
using (var bw = new BinaryWriter(ms)) {
bw.Write(points.Length);
foreach (var p in points) {
bw.Write(p.X);
bw.Write(p.Y);
}
}
data = ms.ToArray();
}
To deserialize your bytes back into an array, reverse the process:
Point[] points;
using (var ms = new MemoryStream(data)) {
using (var r = new BinaryReader(ms)) {
int len = r.ReadInt32();
points = new Point[len];
for (int i = 0 ; i != len ; i++) {
points[i] = new Point(r.ReadInt32(), r.ReadInt32());
}
}
}
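Wrapped up as helpers, the round trip might look like this (a sketch assuming System.Drawing.Point, whose X and Y are ints, which matches the ReadInt32 calls above; the method names are my own):
// Hypothetical helpers wrapping the two snippets above.
static byte[] ToBytes(Point[] points)
{
    using (var ms = new MemoryStream())
    using (var bw = new BinaryWriter(ms))
    {
        bw.Write(points.Length);
        foreach (var p in points) { bw.Write(p.X); bw.Write(p.Y); }
        bw.Flush();
        return ms.ToArray();
    }
}
static Point[] FromBytes(byte[] data)
{
    using (var ms = new MemoryStream(data))
    using (var r = new BinaryReader(ms))
    {
        var points = new Point[r.ReadInt32()];
        for (int i = 0; i < points.Length; i++)
            points[i] = new Point(r.ReadInt32(), r.ReadInt32());
        return points;
    }
}
ToBytes(points) gives you the blob to store in the database, and FromBytes(blob) restores the original array.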
Hi, I am trying to read a file one byte at a time in reverse order. So far I have only managed to read the file from beginning to end and write it to another file.
I need to be able to read the file from the end to the beginning and print it to another file.
This is what I have so far:
string fileName = Console.ReadLine();
using (FileStream file = new FileStream(fileName ,FileMode.Open , FileAccess.Read))
{
//file.Seek(endOfFile, SeekOrigin.End);
int bytes;
using (FileStream newFile = new FileStream("newsFile.txt" , FileMode.Create , FileAccess.Write))
{
while ((bytes = file.ReadByte()) >= 0)
{
Console.WriteLine(bytes.ToString());
newFile.WriteByte((byte)bytes);
}
}
}
I know that I have to use the Seek method on the FileStream, and that gets me to the end of the file. I already did that in the commented portion of the code, but I do not know how to read the file in the while loop now.
How can I achieve this?
string fileName = Console.ReadLine();
using (FileStream file = new FileStream(fileName, FileMode.Open, FileAccess.Read))
{
byte[] output = new byte[file.Length]; // reversed file
// read the file backwards using SeekOrigin.Current
//
long offset;
file.Seek(0, SeekOrigin.End);
for (offset = 0; offset < file.Length; offset++)
{
file.Seek(-1, SeekOrigin.Current);
output[offset] = (byte)file.ReadByte();
file.Seek(-1, SeekOrigin.Current);
}
// write entire reversed file array to new file
//
File.WriteAllBytes("newsFile.txt", output);
}
You could do it by reading one byte at a time, or you could read a larger buffer, write it to the output file in reverse, and continue like that until you've reached the beginning of the file. For example:
string inputFilename = "inputFile.txt";
string outputFilename = "outputFile.txt";
using (var ofile = File.OpenWrite(outputFilename))
{
using (var ifile = File.OpenRead(inputFilename))
{
int bufferSize = 4096;
byte[] buffer = new byte[bufferSize];
long filePos = ifile.Length;
do
{
long newPos = Math.Max(0, filePos - bufferSize);
int bytesToRead = (int)(filePos - newPos);
ifile.Seek(newPos, SeekOrigin.Begin);
int bytesRead = ifile.Read(buffer, 0, bytesToRead);
// write the buffer to the output file, in reverse
for (int i = bytesRead-1; i >= 0; --i)
{
ofile.WriteByte(buffer[i]);
}
filePos = newPos;
} while (filePos > 0);
}
}
An obvious optimization would be to reverse the buffer after you've read it, and then write it in one whole chunk to the output file.
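Inside the loop above, that optimization might look like this (a sketch; Array.Reverse with an offset and count keeps only the bytes actually read):
// Replace the byte-by-byte inner loop: reverse the chunk in place,
// then write it out in a single call.
int bytesRead = ifile.Read(buffer, 0, bytesToRead);
Array.Reverse(buffer, 0, bytesRead);
ofile.Write(buffer, 0, bytesRead);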
And if you know that the file will fit into memory, it's really easy:
var buffer = File.ReadAllBytes(inputFilename);
// now, reverse the buffer
int i = 0;
int j = buffer.Length-1;
while (i < j)
{
byte b = buffer[i];
buffer[i] = buffer[j];
buffer[j] = b;
++i;
--j;
}
// and write it
File.WriteAllBytes(outputFilename, buffer);
If the file is small (fits in your RAM) then this would work:
public static IEnumerable<byte> Reverse(string inputFilename)
{
var bytes = File.ReadAllBytes(inputFilename);
Array.Reverse(bytes);
foreach (var b in bytes)
{
yield return b;
}
}
Usage:
foreach (var b in Reverse("smallfile.dat"))
{
}
If the file is large (bigger than your RAM) then this would work:
using (var inputFile = File.OpenRead("bigfile.dat"))
using (var inputFileReversed = new ReverseStream(inputFile))
using (var binaryReader = new BinaryReader(inputFileReversed))
{
while (binaryReader.BaseStream.Position != binaryReader.BaseStream.Length)
{
var b = binaryReader.ReadByte();
}
}
It uses the ReverseStream class which can be found here.
I would like to know how I could read a file into byte arrays that are 4 bytes long.
These arrays will be manipulated and then have to be converted back into a single array ready to be written to a file.
EDIT:
Code snippet.
var arrays = new List<byte[]>();
using (var f = new FileStream("file.cfg.dec", FileMode.Open))
{
for (int i = 0; i < f.Length; i += 4)
{
var b = new byte[4];
var bytesRead = f.Read(b, i, 4);
if (bytesRead < 4)
{
var b2 = new byte[bytesRead];
Array.Copy(b, b2, bytesRead);
arrays.Add(b2);
}
else if (bytesRead > 0)
arrays.Add(b);
}
}
foreach (var b in arrays)
{
BitArray source = new BitArray(b);
BitArray target = new BitArray(source.Length);
target[26] = source[0];
target[31] = source[1];
target[17] = source[2];
target[10] = source[3];
target[30] = source[4];
target[16] = source[5];
target[24] = source[6];
target[2] = source[7];
target[29] = source[8];
target[8] = source[9];
target[20] = source[10];
target[15] = source[11];
target[28] = source[12];
target[11] = source[13];
target[13] = source[14];
target[4] = source[15];
target[19] = source[16];
target[23] = source[17];
target[0] = source[18];
target[12] = source[19];
target[14] = source[20];
target[27] = source[21];
target[6] = source[22];
target[18] = source[23];
target[21] = source[24];
target[3] = source[25];
target[9] = source[26];
target[7] = source[27];
target[22] = source[28];
target[1] = source[29];
target[25] = source[30];
target[5] = source[31];
var back2byte = BitArrayToByteArray(target);
arrays.Clear();
arrays.Add(back2byte);
}
using (var f = new FileStream("file.cfg.enc", FileMode.Open))
{
foreach (var b in arrays)
f.Write(b, 0, b.Length);
}
EDIT 2:
Here is the Ugly Betty-looking code that accomplishes what I wanted. Now I must refine it for performance...
var arrays_ = new List<byte[]>();
var arrays_save = new List<byte[]>();
var arrays = new List<byte[]>();
using (var f = new FileStream("file.cfg.dec", FileMode.Open))
{
for (int i = 0; i < f.Length; i += 4)
{
var b = new byte[4];
var bytesRead = f.Read(b, 0, b.Length);
if (bytesRead < 4)
{
var b2 = new byte[bytesRead];
Array.Copy(b, b2, bytesRead);
arrays.Add(b2);
}
else if (bytesRead > 0)
arrays.Add(b);
}
}
foreach (var b in arrays)
{
arrays_.Add(b);
}
foreach (var b in arrays_)
{
BitArray source = new BitArray(b);
BitArray target = new BitArray(source.Length);
target[26] = source[0];
target[31] = source[1];
target[17] = source[2];
target[10] = source[3];
target[30] = source[4];
target[16] = source[5];
target[24] = source[6];
target[2] = source[7];
target[29] = source[8];
target[8] = source[9];
target[20] = source[10];
target[15] = source[11];
target[28] = source[12];
target[11] = source[13];
target[13] = source[14];
target[4] = source[15];
target[19] = source[16];
target[23] = source[17];
target[0] = source[18];
target[12] = source[19];
target[14] = source[20];
target[27] = source[21];
target[6] = source[22];
target[18] = source[23];
target[21] = source[24];
target[3] = source[25];
target[9] = source[26];
target[7] = source[27];
target[22] = source[28];
target[1] = source[29];
target[25] = source[30];
target[5] = source[31];
var back2byte = BitArrayToByteArray(target);
arrays_save.Add(back2byte);
}
using (var f = new FileStream("file.cfg.enc", FileMode.Open))
{
foreach (var b in arrays_save)
f.Write(b, 0, b.Length);
}
EDIT 3:
Loading a big file into byte arrays of 4 bytes wasn't the smartest idea...
I have over 68 million arrays being processed and manipulated. I really wonder if it's possible to load it into a single array and still have the bit manipulation work. :/
Here's another way, similar to @igofed's solution:
var arrays = new List<byte[]>();
using (var f = new FileStream("test.txt", FileMode.Open))
{
for (int i = 0; i < f.Length; i += 4)
{
var b = new byte[4];
var bytesRead = f.Read(b, 0, 4);
if (bytesRead < 4)
{
var b2 = new byte[bytesRead];
Array.Copy(b, b2, bytesRead);
arrays.Add(b2);
}
else if (bytesRead > 0)
arrays.Add(b);
}
}
//make changes to arrays
using (var f = new FileStream("test-out.txt", FileMode.Create))
{
foreach (var b in arrays)
f.Write(b, 0, b.Length);
}
Regarding your "Edit 3" ... I'll bite, although it's really a diversion from the original question.
There's no reason you need Lists of arrays, since you're just breaking the file up into a continuous sequence of 4-byte chunks, looping through and processing each chunk, and then looping through and writing each chunk out. You can do much better. NOTE: the implementation below does not check for or handle input files whose length is not an exact multiple of 4. I leave that as an exercise to you, if it is important.
To directly address your comment, here is a single-array solution. We'll ditch the List objects, read the whole file into a single byte[] array, and then copy out 4-byte sections of that array to do your bit transforms, then put the result back. At the end we'll just slam the whole thing into the output file.
byte[] data;
using (Stream fs = File.OpenRead("E:\\temp\\test.bmp")) {
data = new byte[fs.Length];
fs.Read(data, 0, data.Length);
}
byte[] element = new byte[4];
for (int i = 0; i < data.Length; i += 4) {
Array.Copy(data, i, element, 0, element.Length);
BitArray source = new BitArray(element);
BitArray target = new BitArray(source.Length);
target[26] = source[0];
target[31] = source[1];
// ...
target[5] = source[31];
target.CopyTo(data, i);
}
using (Stream fs = File.OpenWrite("E:\\temp\\test_out.bmp")) {
fs.Write(data, 0, data.Length);
}
All of the ugly initial read code is gone since we're just using a single byte array. Notice I reserved a single 4-byte array before the processing loop to reuse, so we can save the garbage collector some work. Then we loop through the giant data array 4 bytes at a time, copy them into our working array, use that to initialize the BitArrays for your transforms, and then the last statement in the block converts the BitArray back into a byte array and copies it directly back to its original location within the data array. This replaces your BitArrayToByteArray method, since you did not provide it. At the end, writing is also easy since it's just slamming out the now-transformed data array.
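For reference, since BitArrayToByteArray is never shown in the thread, one plausible implementation (my assumption, not the original poster's code) would be:
// Hypothetical BitArrayToByteArray: BitArray.CopyTo packs the bits into
// the byte[] eight per byte, lowest-numbered bit first.
static byte[] BitArrayToByteArray(BitArray bits)
{
    var bytes = new byte[(bits.Length + 7) / 8];
    bits.CopyTo(bytes, 0);
    return bytes;
}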
When I ran your original solution I got an OutOfMemoryException on my original 100 MB test file, so I used a 44 MB file instead. It consumed 650 MB of memory and ran in 30 seconds. The single-array solution used 54 MB of memory and ran in 10 seconds. Not a bad improvement, and it shows how costly holding onto millions of small array objects is.
Here is what you want:
using (var reader = new StreamReader("inputFileName"))
{
using (var writer = new StreamWriter("outputFileName"))
{
char[] buff = new char[4];
int readCount = 0;
while((readCount = reader.Read(buff, 0, 4)) > 0)
{
//manipulations with buff
writer.Write(buff);
}
}
}
IEnumerable<byte[]> arraysOf4Bytes = File
.ReadAllBytes(path)
.Select((b,i) => new{b, i})
.GroupBy(x => x.i / 4)
.Select(g => g.Select(x => x.b).ToArray());
My task is to decompress a received packet using zlib and then use an algorithm to build a picture from the data.
The good news is that I have the code in C++, but the task is to do it in C#.
C++
//Read the first values of the packet received
DWORD image[200 * 64] = {0}; //used for algorithm (width always == 200 and height always == 64)
int imgIndex = 0; //used for algorithm
unsigned char rawbytes_[131072] = {0}; //read below
unsigned char * rawbytes = rawbytes_; //destination parameter for decompression (ptr)
compressed = r.Read<WORD>(); //the length of the compressed bytes(picture)
uncompressed = r.Read<WORD>(); //the length that should be after decompression
width = r.Read<WORD>(); //the width of the picture
height = r.Read<WORD>(); //the height of the picture
LPBYTE ptr = r.GetCurrentStream(); //the bytes(file that must be decompressed)
outLen = uncompressed; //copy the len into another variable
//Decompress
if(uncompress((Bytef*)rawbytes, &outLen, ptr, compressed) != Z_OK)
{
printf("Could not uncompress the image code.\n");
Disconnect();
return;
}
//Algorithm to build the picture
// Loop through the data
for(int c = 0; c < (int)height; ++c)
{
for(int r = 0; r < (int)width; ++r)
{
imgIndex = (height - 1 - c) * width + r;
image[imgIndex] = 0xFF000000;
if(-((1 << (0xFF & (r & 0x80000007))) & rawbytes[((c * width + r) >> 3)]))
image[imgIndex] = 0xFFFFFFFF;
}
}
I'm trying to do this with zlib.NET, but all the demos have code like this for decompression (C#):
private void decompressFile(string inFile, string outFile)
{
System.IO.FileStream outFileStream = new System.IO.FileStream(outFile, System.IO.FileMode.Create);
zlib.ZOutputStream outZStream = new zlib.ZOutputStream(outFileStream);
System.IO.FileStream inFileStream = new System.IO.FileStream(inFile, System.IO.FileMode.Open);
try
{
CopyStream(inFileStream, outZStream);
}
finally
{
outZStream.Close();
outFileStream.Close();
inFileStream.Close();
}
}
public static void CopyStream(System.IO.Stream input, System.IO.Stream output)
{
byte[] buffer = new byte[2000];
int len;
while ((len = input.Read(buffer, 0, 2000)) > 0)
{
output.Write(buffer, 0, len);
}
output.Flush();
}
My problem: I don't want to save the file after decompression, because I have to use the algorithm shown in the C++ code.
How do I convert the byte[] array into a stream similar to the one in the C# zlib code so I can decompress the data, and then how do I convert the stream back into a byte array?
Also, how do I change the zlib.NET code to NOT save files?
Just use MemoryStreams instead of FileStreams:
// Assuming inputData is a byte[]
MemoryStream input = new MemoryStream(inputData);
MemoryStream output = new MemoryStream();
Then you can use output.ToArray() afterwards to get a byte array out.
Note that it's generally better to use using statements instead of a single try/finally block - as otherwise if the first call to Close fails, the rest won't be made. You can nest them like this:
using (MemoryStream output = new MemoryStream())
using (Stream outZStream = new zlib.ZOutputStream(output))
using (Stream input = new MemoryStream(bytes))
{
CopyStream(input, outZStream);
return output.ToArray();
}
I just ran into this same issue.
For completeness... (since this stumped me for several hours)
In the case of zlib.NET you also have to call finish(), which usually happens during Close(), before you return output.ToArray().
Otherwise you will get an empty/incomplete byte array from your memory stream, because the ZStream hasn't actually written all of the data yet:
public static void CompressData(byte[] inData, out byte[] outData)
{
using (MemoryStream outMemoryStream = new MemoryStream())
using (ZOutputStream outZStream = new ZOutputStream(outMemoryStream, zlibConst.Z_DEFAULT_COMPRESSION))
using (Stream inMemoryStream = new MemoryStream(inData))
{
CopyStream(inMemoryStream, outZStream);
outZStream.finish();
outData = outMemoryStream.ToArray();
}
}
public static void DecompressData(byte[] inData, out byte[] outData)
{
using (MemoryStream outMemoryStream = new MemoryStream())
using (ZOutputStream outZStream = new ZOutputStream(outMemoryStream))
using (Stream inMemoryStream = new MemoryStream(inData))
{
CopyStream(inMemoryStream, outZStream);
outZStream.finish();
outData = outMemoryStream.ToArray();
}
}
In this example I'm also using the zlib namespace:
using zlib;
Originally found in this thread:
ZLib decompression
I don't have enough points to vote up yet, so...
Thanks to Tim Greaves for the tip regarding finish before ToArray
And Jon Skeet for the tip regarding nesting the using statements for streams (which I like much better than try/finally)