checksum calculation using ArraySegment<byte> - c#

I have an issue with the following method: I don't understand why it behaves the way it does.
private static bool chksumCalc(ref byte[] receive_byte_array)
{
Console.WriteLine("receive_byte_array -> " + receive_byte_array.Length); //ok,151 bytes in my case
ArraySegment<byte> segment = new ArraySegment<byte>(receive_byte_array, 0, 149);
Console.WriteLine("segment # -> " + segment.Count); //ok,149 bytes
BitArray resultBits = new BitArray(8); //hold the result
Console.WriteLine("resultBits.Length -> " + resultBits.Length); //ok, 8bits
//now loop through the 149 bytes
for (int i = segment.Offset; i < (segment.Offset + segment.Count); ++i)
{
BitArray curBits = new BitArray(segment.Array[i]);
Console.WriteLine("curBits.Length -> " + curBits.Length); //gives me 229 not 8?
resultBits = resultBits.Xor(curBits);
}
//some more things to do ... return true...
//or else
return false;
}
I need to XOR 149 bytes and I don't understand why segment.Array[i] doesn't give me 1 byte. If I have an array of 149 bytes and use, for example, segment.Array[1], shouldn't that yield the 2nd byte, or am I wrong? Where does the 229 come from? Can someone please clarify? Thank you.

This is the constructor you're calling: BitArray(int length)
Initializes a new instance of the BitArray class that can hold the specified number of bit values, which are initially set to false.
So segment.Array[i] does give you one byte, but that byte is implicitly converted to an int, and new BitArray(229) creates an array of 229 bits that are all false - that is where the 229 comes from. If you look, all of the constructors for BitArray read like that. I don't see why you need to use the BitArray class at all, though. Just use a byte to store your XOR result:
private static bool chksumCalc(ref byte[] receive_byte_array)
{
var segment = new ArraySegment<byte>(receive_byte_array, 0, 149);
byte resultBits = 0;
for (var i = segment.Offset; i < (segment.Offset + segment.Count); ++i)
{
var curBits = segment.Array[i];
resultBits = (byte)(resultBits ^ curBits);
}
//some more things to do ... return true...
//or else
return false;
}
I don't think you need the ArraySegment<T> either (not for the code presented), but I left it as is since it's beside the point of the question.
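If you do want to stick with BitArray, the constructor to use is BitArray(byte[]), which turns each byte into 8 bits. A minimal sketch of the original loop with that change (untested, same 149-byte segment as above):
BitArray resultBits = new BitArray(new byte[1]); // 8 bits, all false
for (int i = segment.Offset; i < segment.Offset + segment.Count; ++i)
{
    // wrap the single byte in an array so it contributes 8 bits, not N bits
    BitArray curBits = new BitArray(new[] { segment.Array[i] });
    resultBits = resultBits.Xor(curBits);
}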

LZW decompression problem after Clear Code (Unix Compress .Z-files)

I am implementing my own decompression code for decompressing Unix COMPRESS'ed .Z files. I have the basic decompression working and tested on smaller example files, but when I test it on "real" files which may include the so-called "Clear Code" (256), I run into trouble.
My code is able to decompress the file well up until that point, but after clearing the table and resetting the code length back to its initial size of 9, I notice that the next code I read is faulty, as it is larger than 255 (somewhere in the 400s). As it is the first entry since "resetting", I obviously don't have this entry in the table.
I have compared the codes read just after the reset code 256 by my code and by SharpZipLib. I noticed that SharpZipLib seems to get a different code after the reset than I do. Our implementations are quite different, so I am having a hard time finding the issue. I suspect the issue is that I start reading the bits in the wrong place somehow, though I have not managed to figure out what I am doing wrong...
Maybe a second pair of eyes would help?
Code:
namespace LzwDecompressor;
public class Decompressor
{
#region LZW Constants
private readonly byte[] _magicBytes = new byte[] { 0x1f, 0x9d };
private const int BlockModeMask = 0x80; //0b1000 0000
private const int MaxCodeBitsMask = 0x1f; //0b0001 1111
private const int InitialCodeLength = 9;
private const int ClearCode = 256;
#endregion
private int _maxBits;
private int _maxCode = (1 << InitialCodeLength) - 1;
private bool _blockMode;
private int _codeLength = InitialCodeLength;
private readonly Stream _inputStream;
public Decompressor(Stream stream) => _inputStream = stream;
public byte[] Decompress()
{
if (_inputStream.Length < 3)
throw new LzwDecompressorException("Input too small to even fit the required header.");
ParseHeader();
var dictionary = InitDict();
using var outStream = new MemoryStream();
var code = ReadCode();
if (code >= 256) throw new LzwDecompressorException("The first code cannot be larger than 255!");
outStream.Write(new[] { (byte)code }); //First code is always uncompressed
var old = code;
var nextIndex = _blockMode ? 257 : 256; //Skip 256 in block mode as it is a "clear code"
while ((code = ReadCode()) != -1)
{
if (_blockMode && code == ClearCode)
{
_codeLength = InitialCodeLength;
_maxCode = (1 << _codeLength) - 1;
dictionary = InitDict();
nextIndex = 257; //Block mode first index
//Logically I should here be able to read the next code and write it instantly as the first code is basically uncompressed. But as the code is wrong, I cannot do that
//code = ReadCode();
//outStream.Write(new [] { (byte)code });
//old = code;
continue;
}
var word = new List<byte>();
if (dictionary.TryGetValue(code, out var entry))
{
word.AddRange(entry);
}
else if (dictionary.Count + 1 == nextIndex)
{
word.AddRange(dictionary[old].ToArray().Concat(new[] { dictionary[old][0] }));
}
if (word.Count > 0)
{
outStream.Write(word.ToArray());
dictionary[nextIndex++] = new List<byte>(dictionary[old].ToArray().Append(word[0]));
old = code;
}
if (_codeLength == _maxBits) continue; //prevent code length growing beyond max
if (nextIndex == (1 << _codeLength))
{
_codeLength++;
_maxCode = (1 << _codeLength) - 1;
_ = dictionary.EnsureCapacity(1 << _codeLength);
}
}
return outStream.ToArray();
}
#region Private methods
private void ParseHeader()
{
if (_inputStream.ReadByte() != _magicBytes[0] || _inputStream.ReadByte() != _magicBytes[1])
{
throw new LzwDecompressorException("The given file does not contain the LZW magic bytes");
}
var descriptorByte = _inputStream.ReadByte();
_maxBits = descriptorByte & MaxCodeBitsMask;
_blockMode = (descriptorByte & BlockModeMask) > 0;
}
private static Dictionary<int, List<byte>> InitDict()
{
var dict = new Dictionary<int, List<byte>>(1 << InitialCodeLength); //2⁹ max entries
for (var i = 0; i < 256; i++) dict[i] = new List<byte> { (byte)i };
return dict;
}
private int ReadCode()
{
var code = 0x0;
for (var i = 0; i < _codeLength; i++)
{
var bit = ReadBit();
if (bit == -1) return -1;
code |= bit << i;
}
return code;
}
#region Bit Reader
private int _currentBitMask = 0x100;
private int _currentByte;
private int ReadBit()
{
if (_currentBitMask == 0x100)
{
_currentBitMask = 0x1;
var newByte = _inputStream.ReadByte();
if (newByte == -1) return -1;
_currentByte = newByte;
}
var bit = (_currentByte & _currentBitMask) > 0 ? 1 : 0;
_currentBitMask <<= 1;
return bit;
}
#endregion
#endregion
}
public class LzwDecompressorException : Exception
{
public LzwDecompressorException() { }
public LzwDecompressorException(string message) : base($"LZW Decompressor: {message}") { }
public LzwDecompressorException(string message, Exception inner) : base($"LZW Decompressor: {message}", inner) { }
}
I recognize that there may be other stuff still missing, and that there are bound to be performance issues and possible improvements. I have not paid too much attention to these yet, as I am first and foremost looking to get it working before I start changing data structures and such for more performant variants.
Update
I managed to get it working. I finally found an example suited to my use case (decompressing a Unix COMPRESS'ed file): this Python code on GH: unlzw
The only thing I was missing was the following byte-position calculation after resetting:
# process clear code (256)
if (code == 256) and flags:
# Flush unused input bits and bytes to next 8*bits bit boundary
rem = (nxt - mark) % bits
if rem:
rem = bits - rem
if rem > inlen - nxt:
break
nxt += rem
I had already refactored my original code to work with a byte array instead of the memory stream I had originally. This simplified my thinking somewhat, as I was now keeping track of the "current byte position" instead of the current byte. This makes it easier, on a clear code, to re-calculate the new byte position from which to continue reading. The Python snippet seems to be based on some old machine instructions, according to this comment from the code:
Flush unused input bits and bytes to next 8*bits bit boundary
(this is a vestigial aspect of the compressed data format
derived from an implementation that made use of a special VAX
machine instruction!)
Having implemented my own version of this calculation, based on my parameters I already had at hand, I managed to get it working! I must say I do not fully understand the logic behind it, but I am happy it works :)
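Translated to C#, that boundary calculation would look something like the sketch below. This is only an illustration of the idea, assuming a reader that tracks bytePos (the current read position in the input byte array) and mark (the byte position where the code length last changed, i.e. right after the header or the previous reset):
// Flush to the next (8 * codeLength)-bit boundary after consuming the clear code.
int rem = (bytePos - mark) % codeLength;
if (rem != 0)
{
    bytePos += codeLength - rem; // skip the unused padding bytes
}
mark = bytePos; // the code length was just reset, so the boundary marker moves too
// also discard any partially-read bits before reading the next code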

Code change vb.net to c#

I have no idea where to ask a question like this, so I should probably say sorry right away.
Private Function RCON_Command(ByVal Command As String, ByVal ServerData As Integer) As Byte()
Dim Packet As Byte() = New Byte(CByte((13 + Command.Length))) {}
Packet(0) = Command.Length + 9 'Packet Size (Integer)
Packet(4) = 0 'Request Id (Integer)
Packet(8) = ServerData 'SERVERDATA_EXECCOMMAND / SERVERDATA_AUTH (Integer)
For X As Integer = 0 To Command.Length - 1
Packet(12 + X) = System.Text.Encoding.Default.GetBytes(Command(X))(0)
Next
Return Packet
End Function
Can someone tell me how this code should look in C#? I tried it myself but always get the error Cannot implicitly convert type 'int' to 'byte'. An explicit conversion exists (are you missing a cast?)
When I tried to cast, I got an error saying the cast was not needed.
My code:
private byte[] RCON_Command(string command, int serverdata)
{
byte[] packet = new byte[command.Length + 13];
packet[0] = command.Length + 9;
packet[4] = 0;
packet[8] = serverdata;
for (int i = 0; i < command.Length; i++)
{
packet[12 + i] = System.Text.Encoding.UTF8.GetBytes(command[i])[0];
}
return packet;
}
The errors are on the packet[0] and packet[8] lines.
You need to cast the two items to byte before assigning them. Another option, which I've done below, is to change the method to accept serverdata as a byte instead of an int - there's no point in taking the extra bytes only to throw them away.
Another problem is in the for loop - the indexer of a string returns a char, which UTF8.GetBytes() can't accept. I think my translation should work, but you'll need to test it.
private byte[] RCON_Command(string command, byte serverdata)
{
byte[] packet = new byte[command.Length + 13];
packet[0] = (byte)(command.Length + 9);
packet[4] = 0;
packet[8] = serverdata;
for (int i = 0; i < command.Length; i++)
{
packet[12 + i] = System.Text.Encoding.UTF8.GetBytes(command)[i];
}
return packet;
}
Here you go. The Terik converter was no use - that code wouldn't compile.
This code runs...
private byte[] RCON_Command(string Command, int ServerData)
{
byte[] commandBytes = System.Text.Encoding.Default.GetBytes(Command);
byte[] Packet = new byte[13 + commandBytes.Length + 1];
for (int i = 0; i < Packet.Length; i++)
{
Packet[i] = (byte)0;
}
int index = 0;
//Packet Size (Integer)
byte[] bytes = BitConverter.GetBytes(Command.Length + 9);
foreach (var byt in bytes)
{
Packet[index++] = byt;
}
//Request Id (Integer)
bytes = BitConverter.GetBytes((int)0);
foreach (var byt in bytes)
{
Packet[index++] = byt;
}
//SERVERDATA_EXECCOMMAND / SERVERDATA_AUTH (Integer)
bytes = BitConverter.GetBytes(ServerData);
foreach (var byt in bytes)
{
Packet[index++] = byt;
}
foreach (var byt in commandBytes)
{
Packet[index++] = byt;
}
return Packet;
}
In addition to the need for casting, you need to be aware that C# uses array sizes when creating the array, not the upper bound that VB uses - so you need "14 + Command.Length":
private byte[] RCON_Command(string Command, int ServerData)
{
byte[] Packet = new byte[14 + Command.Length];
Packet[0] = Convert.ToByte(Command.Length + 9); //Packet Size (Integer)
Packet[4] = 0; //Request Id (Integer)
Packet[8] = Convert.ToByte(ServerData); //SERVERDATA_EXECCOMMAND / SERVERDATA_AUTH (Integer)
for (int X = 0; X < Command.Length; X++)
{
Packet[12 + X] = System.Text.Encoding.Default.GetBytes(new char[] { Command[X] })[0];
}
return Packet;
}
Just add the explicit casts. You might want to make sure that it's safe to down cast from a 32-bit value type to an 8-bit type.
packet[0] = (byte)(command.Length + 9);
...
packet[8] = (byte)serverdata;
EDIT:
TheEvilPenguin is also right that you will have a problem with your call to GetBytes().
This is how I would fix it to make sure I don't change the meaning of the existing VB.NET code:
packet[12 + i] = System.Text.Encoding.UTF8.GetBytes(new char[] {command[i]})[0];
And also, one more detail:
When you declare an array in VB.NET, you define the maximum array index. In C#, the number in the array declaration represents the number of elements in the array. This means that in the translation from VB.NET to C#, to keep equivalent behavior, you need to add + 1 to the number in the array declaration:
byte[] packet = new byte[command.Length + 13 + 1]; // or + 14 if you want
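Putting those pieces together, a complete translation might look like the sketch below. It keeps the byte-for-byte behavior of the VB version (only the low byte of ServerData and of the length is written), so treat it as a starting point rather than a finished RCON packet builder:
private static byte[] RCON_Command(string command, int serverData)
{
    // VB's New Byte(13 + Command.Length) {} allocates 14 + Command.Length elements,
    // because the VB number is the highest index, not the length.
    byte[] packet = new byte[14 + command.Length];
    packet[0] = (byte)(command.Length + 9);   // packet size
    packet[4] = 0;                            // request id
    packet[8] = (byte)serverData;             // SERVERDATA_EXECCOMMAND / SERVERDATA_AUTH
    for (int i = 0; i < command.Length; i++)
    {
        // GetBytes cannot take a single char, so wrap it in a char array
        packet[12 + i] = System.Text.Encoding.Default.GetBytes(new[] { command[i] })[0];
    }
    return packet;
}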

Optimization - Encode a string and get hexadecimal representation of 3 bytes

I am currently working in an environment where performance is critical and this is what I am doing :
var iso_8859_5 = System.Text.Encoding.GetEncoding("iso-8859-5");
var dataToSend = iso_8859_5.GetBytes(message);
Then I need to group the bytes by 3, so I have a for loop that does this (i being the iterator of the loop):
byte[] dataByteArray = { dataToSend[i], dataToSend[i + 1], dataToSend[i + 2], 0 };
I then get an integer out of these 4 bytes
BitConverter.ToUInt32(dataByteArray, 0)
and finally the integer is converted to a hexadecimal string that I can place in a network packet.
The last two lines repeat about 150 times.
I am currently hitting 50 milliseconds of execution time, and ideally I would want to reach 0... Is there a faster way to do this that I am not aware of?
UPDATE
Just tried
string hex = BitConverter.ToString(dataByteArray);
hex = hex.Replace("-", "");
to get the hex string directly, but it is 3 times slower.
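One more thing worth measuring (only a sketch - the exact output format is an assumption here) is composing the integer with shifts and formatting it directly, which skips the temporary byte[4] and the BitConverter/Replace round trips:
// Same little-endian value as BitConverter.ToUInt32 over { b0, b1, b2, 0 }.
uint value = (uint)(dataToSend[i]
                  | (dataToSend[i + 1] << 8)
                  | (dataToSend[i + 2] << 16));
string hex = value.ToString("X8"); // or "X6" if the leading zero byte is not wanted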
Ricardo Silva's answer adapted
public byte[][] GetArrays(byte[] fullMessage, int size)
{
var returnArrays = new byte[(fullMessage.Length / size)+1][];
int i, j;
for (i = 0, j = 0; i < (fullMessage.Length - 2); i += size, j++)
{
returnArrays[j] = new byte[size + 1];
Buffer.BlockCopy(
src: fullMessage,
srcOffset: i,
dst: returnArrays[j],
dstOffset: 0,
count: size);
returnArrays[j][returnArrays[j].Length - 1] = 0x00;
}
switch ((fullMessage.Length % i))
{
case 0: {
returnArrays[j] = new byte[] { 0, 0, EOT, 0 };
} break;
case 1: {
returnArrays[j] = new byte[] { fullMessage[i], 0, EOT, 0 };
} break;
case 2: {
returnArrays[j] = new byte[] { fullMessage[i], fullMessage[i + 1], EOT, 0 };
} break;
}
return returnArrays;
}
After the line below you will get the total byte array.
var dataToSend = iso_8859_5.GetBytes(message);
My suggestion is to work with Buffer.BlockCopy and test whether it is faster than your current method.
Try the code below and tell us if it is faster than your current code:
public byte[][] GetArrays(byte[] fullMessage, int size)
{
var returnArrays = new byte[fullMessage.Length/size][];
for(int i = 0, j = 0; i < fullMessage.Length; i += size, j++)
{
returnArrays[j] = new byte[size + 1];
Buffer.BlockCopy(
src: fullMessage,
srcOffset: i,
dst: returnArrays[j],
dstOffset: 0,
count: size);
returnArrays[j][returnArrays[j].Length - 1] = 0x00;
}
return returnArrays;
}
EDIT1: I ran the test below and the output was 245900ns (or 0.2459ms).
[TestClass()]
public class Form1Tests
{
[TestMethod()]
public void GetArraysTest()
{
var expected = new byte[] { 0x30, 0x31, 0x32, 0x00 };
var size = 3;
var stopWatch = new Stopwatch();
stopWatch.Start();
var iso_8859_5 = System.Text.Encoding.GetEncoding("iso-8859-5");
var target = iso_8859_5.GetBytes("012");
var arrays = Form1.GetArrays(target, size);
BitConverter.ToUInt32(arrays[0], 0);
stopWatch.Stop();
foreach(var array in arrays)
{
for(int i = 0; i < expected.Count(); i++)
{
Assert.AreEqual(expected[i], array[i]);
}
}
Console.WriteLine(string.Format("{0}ns", stopWatch.Elapsed.TotalMilliseconds * 1000000));
}
}
EDIT 2
I looked at your code and I have only one suggestion. I understand that you need to add the EOF message and that the length of the input array will not always be a multiple of the size you want to break it into.
BUT, the code now has TWO responsibilities, which breaks the S of the SOLID concepts.
The S stands for Single Responsibility - each method has ONE, and only ONE, responsibility.
The code you posted has TWO responsibilities (break the input array into N smaller arrays, and add the EOF). Try to think of a way to create two totally independent methods (one to break an array into N other arrays, and another to put the EOF into any array you pass), as in the sketch below. This will allow you to create unit tests for each method (and guarantee that they work and will never be broken by future changes), and to call the two methods from the class that does the system integration.
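A rough sketch of that split (the names and the exact EOT handling here are only illustrative, not part of the original answer):
public static byte[][] SplitIntoChunks(byte[] source, int size)
{
    // Only responsibility: break the input into chunks of at most 'size' bytes.
    var chunks = new byte[(source.Length + size - 1) / size][];
    for (int i = 0, j = 0; i < source.Length; i += size, j++)
    {
        int count = Math.Min(size, source.Length - i);
        chunks[j] = new byte[count];
        Buffer.BlockCopy(source, i, chunks[j], 0, count);
    }
    return chunks;
}

public static byte[] AppendEof(byte[] chunk, byte eot)
{
    // Only responsibility: return a copy of the chunk with the EOT byte appended.
    var result = new byte[chunk.Length + 1];
    Buffer.BlockCopy(chunk, 0, result, 0, chunk.Length);
    result[chunk.Length] = eot;
    return result;
}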

Generate a random mac address

I am looking for a method in C# which generates a random MAC address. Google is pretty thin on that one.
Thx a lot
SOLUTION:
With the help of Yahia I was able to code the following solution. Thx again!
public static string GenerateMACAddress()
{
var sBuilder = new StringBuilder();
var r = new Random();
int number;
byte b;
for (int i = 0; i < 6; i++)
{
number = r.Next(0, 255);
b = Convert.ToByte(number);
if (i == 0)
{
b = setBit(b, 6); //--> set locally administered
b = unsetBit(b, 7); // --> set unicast
}
sBuilder.Append(b.ToString("X2"));
}
return sBuilder.ToString().ToUpper();
}
private static byte setBit(byte b, int BitNumber)
{
if (BitNumber < 8 && BitNumber > -1)
{
return (byte)(b | (byte)(0x01 << BitNumber));
}
else
{
throw new InvalidOperationException(
"Der Wert für BitNumber " + BitNumber.ToString() + " war nicht im zulässigen Bereich! (BitNumber = (min)0 - (max)7)");
}
}
private static byte unsetBit(byte b, int BitNumber)
{
if (BitNumber < 8 && BitNumber > -1)
{
return (byte)(b | (byte)(0x00 << BitNumber));
}
else
{
throw new InvalidOperationException(
"Der Wert für BitNumber " + BitNumber.ToString() + " war nicht im zulässigen Bereich! (BitNumber = (min)0 - (max)7)");
}
}
A slightly less verbose solution (which I think still achieves the same outcome):
public static string GetRandomMacAddress()
{
var random = new Random();
var buffer = new byte[6];
random.NextBytes(buffer);
var result = String.Concat(buffer.Select(x => string.Format("{0}:", x.ToString("X2"))).ToArray());
return result.TrimEnd(':');
}
This gives a formatted MAC; remove the string.Format and Trim if an unformatted one is required.
There is no such method in the .NET framework...
You will need to write one yourself - read the format description, use a random generator to get 6 random numbers between 0 and 255, set up the 2 relevant bits of the first byte (globally unique/locally administered and unicast/multicast) as needed, then transform each number to hex (i.e. X2, 2 digits per number, left-padded with 0) and join them together with : as the delimiter, as sketched below...
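A minimal sketch of those steps (illustrative only; the method name is made up, and the bit handling forces "locally administered" on and "multicast" off in the first byte):
public static string GetRandomLocalMacAddress()
{
    var random = new Random();
    var bytes = new byte[6];
    random.NextBytes(bytes);
    bytes[0] = (byte)((bytes[0] | 0x02) & 0xFE); // set bit 1, clear bit 0
    return string.Join(":", bytes.Select(b => b.ToString("X2")));
}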
MUG4N's solution has a problem. You have to tweak the LEAST significant two bits not the MOST significant two.
So instead of
b = setBit(b, 6); //--> set locally administered
b = unsetBit(b, 7); // --> set unicast
it should be
b = setBit(b, 1); //--> set locally administered
b = unsetBit(b, 0); // --> set unicast
Also the unsetBit() is incorrect. The relevant line should be
return unchecked((byte)(b & (byte)~(0x01 << BitNumber)));
Of course it would probably be simpler to change it to this:
if (i == 0)
{
b = (byte)((b & 0xFE) | 0x02) //-->set locally administered and unicast
}
Small update for those of you who have a problem generating a new MAC address for your Wi-Fi adapter. You just have to set the first octet of the MAC address to "02", instead of what is normally "00". Setting the first octet to "02" actually sets the b2 bit, indicating that the MAC address is locally administered.
You can read more about it here:
http://blog.technitium.com/2011/05/tmac-issue-with-wireless-network.html
Code:
public static string GetRandomWifiMacAddress()
{
var random = new Random();
var buffer = new byte[6];
random.NextBytes(buffer);
buffer[0] = 02;
var result = string.Concat(buffer.Select(x => string.Format("{0}", x.ToString("X2"))).ToArray());
return result;
}
Here is a helper class to generate a random MAC.
public static class MacAddress
{
private static readonly Random Random = new Random();
public static string GetSignatureRandomMac(string generic = "AA")
{
string[] macBytes = new[]
{
generic,
generic,
generic,
Random.Next(1, 256).ToString("X"),
Random.Next(1, 256).ToString("X"),
Random.Next(1, 256).ToString("X")
};
return string.Join("-", macBytes);
}
public static string GetRandomMac()
{
string[] macBytes = new[]
{
Random.Next(1, 256).ToString("X"),
Random.Next(1, 256).ToString("X"),
Random.Next(1, 256).ToString("X"),
Random.Next(1, 256).ToString("X"),
Random.Next(1, 256).ToString("X"),
Random.Next(1, 256).ToString("X")
};
return string.Join("-", macBytes);
}
}
Usage:
Console.WriteLine(MacAddress.GetRandomMac());
Console.WriteLine(MacAddress.GetSignatureRandomMac());
Console.WriteLine(MacAddress.GetSignatureRandomMac("BB"));
We can generate random bytes and modify only the last 2 bits of the first byte using bitwise operators:
public string GetRandomWifiMacAddress()
{
var random = new Random();
var buffer = new byte[6];
random.NextBytes(buffer);
buffer[0] &= 0b11111110;
buffer[0] |= 0b00000010;
var result = string.Concat(buffer.Select(x => string.Format("{0}", x.ToString("X2"))).ToArray());
return result;
}
There isn't a function within .NET to generate MAC addresses; it would have to be written.
MAC addresses are generally meant to be unique and are set by the OEM of the NIC. Different manufacturers have certain allocated prefixes. A sample list can be found here: https://code.wireshark.org/review/gitweb?p=wireshark.git;a=blob_plain;f=manuf
Off the top of my head, I don't know of any libraries that exist to generate MAC addresses (as it's not a common requirement), but it wouldn't be too difficult to create one, as an address is simply 6 hexadecimal values from 00 to FF separated by colons.

C# compress a byte array

I do not know much about compression algorithms. I am looking for a simple compression algorithm (or code snippet) which can reduce the size of a byte[,,] or byte[]. I cannot make use of System.IO.Compression. Also, the data has lots of repetition.
I tried implementing the RLE algorithm (posted below for your inspection). However, it produces arrays 1.2 to 1.8 times larger than the input.
public static class RLE
{
public static byte[] Encode(byte[] source)
{
List<byte> dest = new List<byte>();
byte runLength;
for (int i = 0; i < source.Length; i++)
{
runLength = 1;
while (runLength < byte.MaxValue
&& i + 1 < source.Length
&& source[i] == source[i + 1])
{
runLength++;
i++;
}
dest.Add(runLength);
dest.Add(source[i]);
}
return dest.ToArray();
}
public static byte[] Decode(byte[] source)
{
List<byte> dest = new List<byte>();
byte runLength;
for (int i = 1; i < source.Length; i+=2)
{
runLength = source[i - 1];
while (runLength > 0)
{
dest.Add(source[i]);
runLength--;
}
}
return dest.ToArray();
}
}
I have also found a Java LZW implementation based on strings and integers. I have converted it to C# and the results look good (code posted below). However, I am not sure how it works, nor how to make it work with bytes instead of strings and integers.
public class LZW
{
/* Compress a string to a list of output symbols. */
public static int[] compress(string uncompressed)
{
// Build the dictionary.
int dictSize = 256;
Dictionary<string, int> dictionary = new Dictionary<string, int>();
for (int i = 0; i < dictSize; i++)
dictionary.Add("" + (char)i, i);
string w = "";
List<int> result = new List<int>();
for (int i = 0; i < uncompressed.Length; i++)
{
char c = uncompressed[i];
string wc = w + c;
if (dictionary.ContainsKey(wc))
w = wc;
else
{
result.Add(dictionary[w]);
// Add wc to the dictionary.
dictionary.Add(wc, dictSize++);
w = "" + c;
}
}
// Output the code for w.
if (w != "")
result.Add(dictionary[w]);
return result.ToArray();
}
/* Decompress a list of output ks to a string. */
public static string decompress(int[] compressed)
{
int dictSize = 256;
Dictionary<int, string> dictionary = new Dictionary<int, string>();
for (int i = 0; i < dictSize; i++)
dictionary.Add(i, "" + (char)i);
string w = "" + (char)compressed[0];
string result = w;
for (int i = 1; i < compressed.Length; i++)
{
int k = compressed[i];
string entry = "";
if (dictionary.ContainsKey(k))
entry = dictionary[k];
else if (k == dictSize)
entry = w + w[0];
result += entry;
// Add w+entry[0] to the dictionary.
dictionary.Add(dictSize++, w + entry[0]);
w = entry;
}
return result;
}
}
Have a look here. I used this code as a basis for compression in one of my work projects. I'm not sure how much of the .NET Framework is accessible in the Xbox 360 SDK, so I'm not sure how well this will work for you.
The problem with that RLE algorithm is that it is too simple. It prefixes every byte with how many times it is repeated, but that does mean that in long ranges of non-repeating bytes, each single byte is prefixed with a "1". On data without any repetitions this will double the file size.
This can be avoided by using code-type RLE instead; the 'code' (also called a 'token') is a byte with two possible meanings: either it indicates how many times the single following byte is repeated, or it indicates how many non-repeating bytes follow that should be copied as they are. The two cases are distinguished by the highest bit, which leaves 7 bits available for the value, so the amount to copy or repeat per code can be up to 127.
This means that even in worst-case scenarios, the final size can only be about 1/127th larger than the original file size.
A good explanation of the whole concept, plus full working (and, in fact, heavily optimised) C# code, can be found here:
http://www.shikadi.net/moddingwiki/RLE_Compression
Note that sometimes the data will end up larger than the original anyway, simply because there are not enough repeating bytes in it for RLE to work. A good way to deal with such compression failures is by adding a header to your final data. If you simply add an extra byte at the start that is 0 for uncompressed data and 1 for RLE-compressed data, then, when RLE fails to give a smaller result, you just save the data uncompressed, with the 0 in front, and your final data will be exactly one byte larger than the original. The system at the other side can then read that starting byte and use it to determine whether the following data should be decompressed or just copied.
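A minimal sketch of that fallback wrapper, built on top of the Encode/Decode methods from the question (the 0/1 header convention is just the one described above):
public static byte[] EncodeWithFallback(byte[] source)
{
    byte[] compressed = RLE.Encode(source);
    bool useRle = compressed.Length < source.Length;
    byte[] payload = useRle ? compressed : source;
    byte[] result = new byte[1 + payload.Length];
    result[0] = useRle ? (byte)1 : (byte)0;   // header: 1 = RLE, 0 = stored uncompressed
    Buffer.BlockCopy(payload, 0, result, 1, payload.Length);
    return result;
}

public static byte[] DecodeWithFallback(byte[] data)
{
    byte[] payload = new byte[data.Length - 1];
    Buffer.BlockCopy(data, 1, payload, 0, payload.Length);
    return data[0] == 1 ? RLE.Decode(payload) : payload;
}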
Look into Huffman codes; it's a pretty simple algorithm. Basically, use fewer bits for patterns that show up more often, and keep a table of how they are encoded. Your codewords also have to account for the fact that there are no separators to help you decode.
