quick question. I'm trying to convert an integer to a 7BitEncodedInt through the Binary Writer. Currently it seems the best manner I have to do so is the following:
Stream memoryStream = new MemoryStream();
InternalBinaryWriter ibw = new InternalBinaryWriter(memoryStream);
ibw.Write7BitEncodedInt((int)length);
memoryStream.Position = 0;
memoryStream.Read(object.ByteArray, 0, (int)memoryStream.Length);
Is there a simpler way to do so without requiring the streams? I'm slightly reluctant to reimplement their Write7BitEncodedInt function and just have that output directly to a byte array, but if necessary that's just what I'll do
edit:
What I ended up doing was the following:
the byteArray is resized accordingly, just didn't include that line in the source.
What I ended up doing was the following:
byte[] Write7BitEncodedIntToByteArray(int length) {
List<byte> byteList = new List<byte>();
while (length > 0x80) {
byteList.Add((byte)(length | 0x80));
length >>= 7;
}
byteList.Add((byte)length);
return byteList.ToArray();
}
If you want to write to a byte[] array, using a MemoryStream is the easiest way to do it. In fact, one of the MemoryStream constructors takes a byte array and uses that as the buffer it will write to. So it might be more useful in your case to do something like this:
using( var memoryStream = new MemoryStream( #object.ByteArray ) )
using( var writer = new InternalBinaryWriter( memoryStream ) )
{
writer.Write7BitEncodedInt( (int) length );
}
Keep in mind that this will result in a memory stream that cannot expand its size. Based on the code you supplied I think this will be fine (what you've shown may in fact throw an exception at runtime if #object.ByteArray is smaller than the stream array).
EDIT: Since we're looking at evidently performance-critical code, I tested three different methods:
List
static byte[] MethodA( int length )
{
List<byte> byteList = new List<byte>();
while (length > 0x80) {
byteList.Add((byte)(length | 0x80));
length >>= 7;
}
byteList.Add((byte)length);
return byteList.ToArray();
}
MemoryStream
static byte[] MethodB( int length )
{
using( var stream = new MemoryStream( 5 ) )
{
while( length > 0x80 )
{
stream.WriteByte( (byte) ( length | 0x80 ) );
length >>= 7;
}
stream.WriteByte( (byte) length );
return stream.ToArray();
}
}
Direct Array
static byte[] MethodC( int length )
{
int tmp = length;
int l = 1;
while( ( tmp >>= 7 ) > 0 ) ++l;
var result = new byte[l];
for( int i = 0; i < l; ++i )
{
result[i] = (byte) ( 0x80 | length );
length >>= 7;
}
result[l - 1] &= 0x7f;
return result;
}
By a wide margin, the third method was the fastest (4-5x faster than the previous two). When it comes to performance critical code, creating a List just to turn it into an array is almost always a smell, which is what got my attention here.
It's also possible that another big chunk of time could be getting sucked up elsewhere depending on how this method is used (for example, if this small array is being copied into a larger array every time, you might be able to remove the array allocations altogether and write directly into the larger array).
Related
I want to create a custom data type which is 4 bits (nibble).
One option is this -
byte source = 0xAD;
var hiNybble = (source & 0xF0) >> 4; //Left hand nybble = A
var loNyblle = (source & 0x0F); //Right hand nybble = D
However, I want to store each 4 bits in an array.
So for example, a 2 byte array,
00001111 01010000
would be stored in the custom data type array as 4 nibbles -
0000
1111
0101
0000
Essentially I want to operate on 4 bit types.
Is there any way I can convert the array of bytes into array of nibbles?
Appreciate an example.
Thanks.
You can encapsulate a stream returning 4-bit samples by reading then converting (written from a phone without a compiler to test. Expect typos and off-by-one errors):
public static int ReadNibbles(this Stream s, byte[] data, int offset, int count)
{
if (s == null)
{
throw new ArgumentNullException(nameof(s));
}
if (data == null)
{
throw new ArgumentNullException(nameof(data));
}
if (data.Length < offset + length)
{
throw new ArgumentOutOfRangeException(nameof(length));
}
var readBytes = s.Read(data, offset, length / 2);
for (int n = readBytes * 2 - 1, k = readBytes - 1; k >= 0; k--)
{
data[offset + n--] = data[offset + k] & 0xf;
data[offset + n--] = data[offset + k] >> 4;
}
return readBytes * 2;
}
To do the same for 12-bit integers (assuming MSB nibble ordering):
public static int Read(this Stream stream, ushort[] data, int offset, int length)
{
if (stream == null)
{
throw new ArgumentNullException(nameof(stream));
}
if (data == null)
{
throw new ArgumentNullException(nameof(data));
}
if (data.Length < offset + length)
{
throw new ArgumentOutOfRangeException(nameof(length));
}
if (length < 2)
{
throw new ArgumentOutOfRangeException(nameof(length), "Cannot read fewer than two samples at a time");
}
// we always need a multiple of two
length -= length % 2;
// 3 bytes length samples
// --------- * -------------- = N bytes
// 2 samples 1
int rawCount = (length / 2) * 3;
// This will place GC load. Something like a buffer pool or keeping
// the allocation as a field on the reader would be a good idea.
var rawData = new byte[rawCount];
int readBytes = 0;
// if the underlying stream returns an even number of bytes, we will need to try again
while (readBytes < data.Length)
{
int c = stream.Read(rawData, readBytes, rawCount - readBytes);
if (c <= 0)
{
// End of stream
break;
}
readBytes += c;
}
// unpack
int k = 0;
for (int i = 0; i < readBytes; i += 3)
{
// EOF in second byte is undefined
if (i + 1 >= readBytes)
{
throw new InvalidOperationException("Unexpected EOF");
}
data[(k++) + offset] = (ushort)((rawData[i + 0] << 4) | (rawData[i + 1] >> 4));
// EOF in third byte == only one sample
if (i + 2 < readBytes)
{
data[(k++) + offset] = (ushort)(((rawData[i + 1] & 0xf) << 8) | rawData[i + 2]);
}
}
return k;
}
The best way to do this would be to look at the source for one of the existing integral data types. For example Int16.
If you look a that type, you can see that it implements a handful of interfaces:
[Serializable]
public struct Int16 : IComparable, IFormattable, IConvertible, IComparable<short>, IEquatable<short> { /* ... */ }
The implementation of the type isn't very complicated. It has a MaxValue a MinValue, a couple of CompareTo overloads, a couple of Equals overloads, the System.Object overrides (GetHashCode, GetType, ToString (plus some overloads)), a handful of Parse and ToParse overloads and a range of IConvertible implementations.
In other places, you can find things like arithmetic, comparison and conversion operators.
BUT:
What System.Int16 has that you can't have is this:
internal short m_value;
That's a native type (16-bit integer) member that holds the value. There is no 4-bit native type. The best you are going to be able to do is have a native byte in your implementation that will hold the value. You can write accessors that constrain it to the lower 4 bits, but you can't do much more than that. If someone creates a Nibble array, it will be implemented as an array of those values. As far as I know, there's no way to inject your implementation into that array. Similarly, if someone creates some other collection (e.g., List<Nibble>), then the collection will be of instances of your type, each of which will take up 8 bits.
However
You can create specialized collection classes, NibbleArray, NibbleList, etc. C#'s syntax allows you to provide your own collection initialization implementation for a collection, your own indexing method, etc.
So, if someone does something like this:
var nyblArray = new NibbleArray(32);
nyblArray[4] = 0xd;
Then your code can, under the covers, create a 16-element byte array, set the low nibble of the third byte to 0xd.
Similarly, you can implement code to allow:
var newArray = new NibbleArray { 0x1, 0x3, 0x5, 0x7, 0x9, 0xa};
or
var nyblList = new NibbleList { 0x0, 0x2, 0xe};
A normal array will waste space, but your specialized collection classes will do what you are talking about (with the expense of some bit-twizzling).
The closest you can get to what you want is to use an indexer:
// Indexer declaration
public int this[int index]
{
// get and set accessors
}
Within the body of the indexer you can translate the index to the actual byte that contains your 4 bits.
The next thing you can do is operator overloading. You can redefine +, -, *...
This might be a simple one, but I can't seem to find an easy way to do it. I need to save an array of 84 uint's into an SQL database's BINARY field. So I'm using the following lines in my C# ASP.NET project:
//This is what I have
uint[] uintArray;
//I need to convert from uint[] to byte[]
byte[] byteArray = ???
cmd.Parameters.Add("#myBindaryData", SqlDbType.Binary).Value = byteArray;
So how do you convert from uint[] to byte[]?
How about:
byte[] byteArray = uintArray.SelectMany(BitConverter.GetBytes).ToArray();
This'll do what you want, in little-endian format...
You can use System.Buffer.BlockCopy to do this:
byte[] byteArray = new byte[uintArray.Length * 4];
Buffer.BlockCopy(uintArray, 0, byteArray, 0, uintArray.Length * 4];
http://msdn.microsoft.com/en-us/library/system.buffer.blockcopy.aspx
This will be much more efficient than using a for loop or some similar construct. It directly copies the bytes from the first array to the second.
To convert back just do the same thing in reverse.
There is no built-in conversion function to do this. Because of the way arrays work, a whole new array will need to be allocated and its values filled-in. You will probably just have to write that yourself. You can use the System.BitConverter.GetBytes(uint) function to do some of the work, and then copy the resulting values into the final byte[].
Here's a function that will do the conversion in little-endian format:
private static byte[] ConvertUInt32ArrayToByteArray(uint[] value)
{
const int bytesPerUInt32 = 4;
byte[] result = new byte[value.Length * bytesPerUInt32];
for (int index = 0; index < value.Length; index++)
{
byte[] partialResult = System.BitConverter.GetBytes(value[index]);
for (int indexTwo = 0; indexTwo < partialResult.Length; indexTwo++)
result[index * bytesPerUInt32 + indexTwo] = partialResult[indexTwo];
}
return result;
}
byte[] byteArray = Array.ConvertAll<uint, byte>(
uintArray,
new Converter<uint, byte>(
delegate(uint u) { return (byte)u; }
));
Heed advice from #liho1eye, make sure your uints really fit into bytes, otherwise you're losing data.
If you need all the bits from each uint, you're gonna to have to make an appropriately sized byte[] and copy each uint into the four bytes it represents.
Something like this ought to work:
uint[] uintArray;
//I need to convert from uint[] to byte[]
byte[] byteArray = new byte[uintArray.Length * sizeof(uint)];
for (int i = 0; i < uintArray.Length; i++)
{
byte[] barray = System.BitConverter.GetBytes(uintArray[i]);
for (int j = 0; j < barray.Length; j++)
{
byteArray[i * sizeof(uint) + j] = barray[j];
}
}
cmd.Parameters.Add("#myBindaryData", SqlDbType.Binary).Value = byteArray;
I am trying to achieve the best possible compression for data that consists of just 1s and 0s in a matrix.
To demonstrate what I mean, here's a sample 6 by 6 matrix:
1,0,0,1,1,1
0,1,0,1,1,1
1,0,0,1,0,0
0,1,1,0,1,1
1,0,0,0,0,1
0,1,0,1,0,1
I'd like to compress that into an as small string or byte array as possible. The matrices I will need to compress are bigger though (always 4096 by 4096 1s and 0s).
I suppose it could be compressed quite heavily, but I'm not sure how. I'll mark the best compression as the answer. Performance does not matter.
I assume that you want to compress string into other strings even though your data really is binary. I don't know what the best compression algorithm is (and that will vary depending on your data) but you can convert the input text into bits, compress these and then convert the compressed bytes into a string again using base-64 encoding. This will allow you to go from string to string and still apply a compression algorithm of your choice.
The .NET framework provides the class DeflateStream that will allow you to compress a stream of bytes. The first step is to create a custom Stream that will allow you to read and write your text format. For lack of better name I have named it TextStream. Note that to simplify matters a bit I use \n as the line ending (instead of \r\n).
class TextStream : Stream {
readonly String text;
readonly Int32 bitsPerLine;
readonly StringBuilder buffer;
Int32 textPosition;
// Initialize a readable stream.
public TextStream(String text) {
if (text == null)
throw new ArgumentNullException("text");
this.text = text;
}
// Initialize a writeable stream.
public TextStream(Int32 bitsPerLine) {
if (bitsPerLine <= 0)
throw new ArgumentException();
this.bitsPerLine = bitsPerLine;
this.buffer = new StringBuilder();
}
public override Boolean CanRead { get { return this.text != null; } }
public override Boolean CanWrite { get { return this.buffer != null; } }
public override Boolean CanSeek { get { return false; } }
public override Int64 Length { get { throw new InvalidOperationException(); } }
public override Int64 Position {
get { throw new InvalidOperationException(); }
set { throw new InvalidOperationException(); }
}
public override void Flush() {
}
public override Int32 Read(Byte[] buffer, Int32 offset, Int32 count) {
// TODO: Validate buffer, offset and count.
if (!CanRead)
throw new InvalidOperationException();
var byteCount = 0;
Byte currentByte = 0;
var bitCount = 0;
for (; byteCount < count && this.textPosition < this.text.Length; this.textPosition += 1) {
if (text[this.textPosition] != '0' && text[this.textPosition] != '1')
continue;
currentByte = (Byte) ((currentByte << 1) | (this.text[this.textPosition] == '0' ? 0 : 1));
bitCount += 1;
if (bitCount == 8) {
buffer[offset + byteCount] = currentByte;
byteCount += 1;
currentByte = 0;
bitCount = 0;
}
}
if (bitCount > 0) {
buffer[offset + byteCount] = currentByte;
byteCount += 1;
}
return byteCount;
}
public override void Write(Byte[] buffer, Int32 offset, Int32 count) {
// TODO: Validate buffer, offset and count.
if (!CanWrite)
throw new InvalidOperationException();
for (var i = 0; i < count; ++i) {
var currentByte = buffer[offset + i];
for (var mask = 0x80; mask > 0; mask /= 2) {
if (this.buffer.Length > 0) {
if ((this.buffer.Length + 1)%(2*this.bitsPerLine) == 0)
this.buffer.Append('\n');
else
this.buffer.Append(',');
}
this.buffer.Append((currentByte & mask) == 0 ? '0' : '1');
}
}
}
public override String ToString() {
if (this.text != null)
return this.text;
else
return this.buffer.ToString();
}
public override Int64 Seek(Int64 offset, SeekOrigin origin) {
throw new InvalidOperationException();
}
public override void SetLength(Int64 length) {
throw new InvalidOperationException();
}
}
Then you can write methods for compressing and decompressing using DeflateStream. Note that the the uncompressed input is a string like the one you have provided in your question an the compressed output is a base-64 encoded string.
String Compress(String text) {
using (var inputStream = new TextStream(text))
using (var outputStream = new MemoryStream()) {
using (var compressedStream = new DeflateStream(outputStream, CompressionMode.Compress))
inputStream.CopyTo(compressedStream);
return Convert.ToBase64String(outputStream.ToArray());
}
}
String Decompress(String compressedText, Int32 bitsPerLine) {
var bytes = Convert.FromBase64String(compressedText);
using (var inputStream = new MemoryStream(bytes))
using (var outputStream = new TextStream(bitsPerLine)) {
using (var compressedStream = new DeflateStream(inputStream, CompressionMode.Decompress))
compressedStream.CopyTo(outputStream);
return outputStream.ToString();
}
}
To test it I used a method to create a random string (using a fixed seed to always create the same string):
String CreateRandomString(Int32 width, Int32 height) {
var random = new Random(0);
var stringBuilder = new StringBuilder();
for (var i = 0; i < width; ++i) {
for (var j = 0; j < height; ++j) {
if (i > 0 && j == 0)
stringBuilder.Append('\n');
else if (j > 0)
stringBuilder.Append(',');
stringBuilder.Append(random.Next(2) == 0 ? '0' : '1');
}
}
return stringBuilder.ToString();
}
Creating a random 4,096 x 4,096 string has an uncompressed size of 33,554,431 characters. This is compressed to 2,797,056 characters which is a reduction to about 8% of the original size.
Skipping the base-64 encoding would increase the compression ratio even more but the output would be binary and not a string. If you also consider the input as binary you actually get the following result for random data with equal probability of 0 and 1:
Input bytes: 4,096 x 4,096 / 8 = 2,097,152
Output bytes: 2,097,792
Size after compression: 100%
Simply converting to bytes is a better than doing that following by a deflate. However, using random input but with 25% 0 and 75% 1 you get this result:
Input bytes: 4,096 x 4,096 / 8 = 2,097,152
Output bytes: 1,757,846
Size after compression: 84%
How much deflate will compress your data really depends of the nature of the data. If it is completely random you wont be able to get much compression after converting from text to bytes.
Hmm... as small as possible is not really possible without knowing the problem domain.
Here's the general approach:
Represent the ones and zeros in the array using bits not bytes or characters or whatever.
Compress using a general purpose loss-less compression algorithm. The two most common are:
Huffman encoding and some type of LZW.
Huffman can be mathematically proven to provide the best possible compression of data, the catch is in order to decompress the data you also need the Huffman tree which may be as big as the original data. LZW gives you compression equivalent to Huffman (within a few percent) for most inputs, but performs best on data with repeating segments such as text.
Implementations for the compression algorithms should be easy to come by (GZIP uses LZ77 which is an earlier slightly less optimal version of LZW.)
A good implementation of compression algorithms using modern algorithms go to 7zip.org. It's open source and they have a C API with a DLL, but you'll have to create the .Net interface (unless someone already made one.)
The non general approach:
This relays on a known characteristic of the data. For example: if you know most of the data is zeroes you can encode only the coordinates of the ones.
If the data contains patches of ones and zeros they can be encoded with RLE or two dimensional variants of the algorithm.
Trying to create your own algorithm for specifically compressing this data will most likely not yield much.
Create a GZipStream with Max CompressionLevel
Run a 4096x4096 loop
- set all 64 bits of a ulong to bits of the array
- when 64 bits are done write the ulong to the compressionstream and start at the first bit again
This will very easily add your cube into a pretty compressed block of memory
Using Huffman Coding you can compress it quite much:
0 => 111
1 => 10
, => 0
\r => 1100
\n => 1101
Yields for you example matrix (in bits):
10011101 11010010 01011001 10111101 00111010 01001011 00110110 01110111
01001110 11111001 10111101 00100111 01001011 00110110 01110111 01110111
01011001 10111101 00111010 0111010
If the commas, line feed and carriage return can be excluded then you only need a BitArray to store each value. Although now you need to know the dimension of the matrix when decoding. If you don't then you could store it as an int and then the data itself if you're planning on serializing the data.
Something like:
var input = #"1,0,0,1,1,1
0,1,0,1,1,1
1,0,0,1,0,0
0,1,1,0,1,1
1,0,0,0,0,1
0,1,0,1,0,1";
var values = new List<bool>();
foreach(var c in input)
{
if (c == '0')
values.Add(false);
else if (c == '1')
values.Add(true);
}
var ba = new BitArray(values.ToArray());
then serialize the BitArray. You'd probably need to add the number of padding bits to properly decode the data. (4096 * 4096 is divisible by 8).
The BitArray approach should get you the most compression unless there is a significant amount of repeating patterns in the matrix (yes I'm assuming the data is mostly random).
i got the following code:
byte[] myBytes = new byte[10 * 10000];
for (long i = 0; i < 10000; i++)
{
byte[] a1 = BitConverter.GetBytes(i);
byte[] a2 = BitConverter.GetBytes(true);
byte[] a3 = BitConverter.GetBytes(false);
byte[] rv = new byte[10];
System.Buffer.BlockCopy(a1, 0, rv, 0, a1.Length);
System.Buffer.BlockCopy(a2, 0, rv, a1.Length, a2.Length);
System.Buffer.BlockCopy(a3, 0, rv, a1.Length + a2.Length, a3.Length);
}
everything works as it should. i was trying to convert this code so everything will be written into myBytes but then i realised, that i use a long and if its value will be higher then int.MaxValue casting will fail.
how could one solve this?
another question would be, since i dont want to create a very large bytearray in memory, how could i send it directry to my .WriteBytes(path, myBytes); function ?
If the final destination for this is, as suggested, a file: then write to a file more directly, rather than buffering in memory:
using (var file = File.Create(path)) // or append file FileStream etc
using (var writer = new BinaryWriter(file))
{
for (long i = 0; i < 10000; i++)
{
writer.Write(i);
writer.Write(true);
writer.Write(false);
}
}
Perhaps the ideal way of doing this in your case would be to pass a single BinaryWriter instance to each object in turn as you serialize them (don't open and close the file per-object).
Why don't you just Write() the bytes out as you process them rather than converting to a massive buffer, or use a smaller buffer at least?
I'm trying to convert an int value to a byte array, but I'm using the byte for MIDI information (meaning that the 0x00 byte which is returned when using GetBytes acts as a separator) which renders my MIDI information useless.
I would like to convert the int to an array which leaves out the 0x00 bytes and just contains the bytes which contain actual values. How can I do this?
You've completely misunderstood what you need, but luckily you mentioned MIDI. You need to use the multi-byte encoding that MIDI defines, which is somewhat similar to UTF-8 in that less than 8 bits of data are placed into each octet, with the remaining providing information about the number of bits used.
See the description on wikipedia. Pay close attention to the fact that protobuf uses this encoding, you can probably reuse some of Google's code.
Based on the info Ben added, this should do what you require:
static byte[] VlqEncode(int value)
{
uint uvalue = (uint)value;
if (uvalue < 128) return new byte[] { (byte)uvalue }; // simplest case
// calculate length of buffer required
int len = 0;
do {
len++;
uvalue >>= 7;
} while (uvalue != 0);
// encode (this is untested, following the VQL/Midi/protobuf confusion)
uvalue = (uint)value;
byte[] buffer = new byte[len];
for (int offset = len - 1; offset >= 0; offset--)
{
buffer[offset] = (byte)(128 | (uvalue & 127)); // only the last 7 bits
uvalue >>= 7;
}
buffer[len - 1] &= 127;
return buffer;
}