Most efficient way to save binary code to file - c#

I have a string that contains only 1s and 0s, and I need to save it to a .txt file.
I also want it to be as small as possible. Since the data is binary, I can turn it into pretty much anything. Saving the string as-is is not an option, since apparently every character takes up a whole byte, even if it's just a 1 or a 0.
I thought about turning my string into a byte array, but trying to convert "11111111" to a Byte gave me a System.OverflowException.
My next thought was using an ASCII code page or something, but I don't know how reliable that would be. Alternatively, I could turn each 8-bit piece of my string into the corresponding number: 8 characters would shrink to at most 3 ("255"), which seems pretty nice to me. And since I know the highest individual number will be 255, I don't even need a delimiter for decoding.
But I'm sure there's a better way.
So:
What exactly is the best/most efficient way to store a string that only contains 1 and 0?

You could represent all your data as 64 bit integers and then write them to a binary file:
// The string we are working with.
string str = "1010101010010100010101101";
// The number of bits in a 64-bit integer.
int size = 64;
// Pad the end of the string with zeros so its length is divisible by 64.
if (str.Length % size != 0)
    str += new string('0', size - str.Length % size);
// Convert each 64-character segment into a 64-bit integer.
// (With base 2, Convert.ToInt64 treats the topmost of 64 digits as the
// sign bit, so a segment of 64 ones parses without overflowing.)
long[] binary = new long[str.Length / size]
    .Select((x, idx) => Convert.ToInt64(str.Substring(idx * size, size), 2)).ToArray();
// Copy the result to a byte array.
byte[] bytes = new byte[binary.Length * sizeof(long)];
Buffer.BlockCopy(binary, 0, bytes, 0, bytes.Length);
// Write the result to file.
File.WriteAllBytes("MyFile.bin", bytes);
EDIT:
If you're only writing 64 bits then it's a one-liner:
File.WriteAllBytes("MyFile.bin", BitConverter.GetBytes(Convert.ToUInt64(str, 2)));

I would suggest using BinaryWriter, wrapped in a using statement so the stream is flushed and closed. Like this:
using (var writer = new BinaryWriter(File.Open(fileName, FileMode.Create)))
{
    // writer.Write(...) your packed bytes here
}
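To address the packing itself, here is a minimal sketch (reusing `str` and `fileName` from the snippets above) that packs each group of 8 '0'/'1' characters into a single byte before writing:
// Pack the bit string into bytes: 8 characters -> 1 byte (last byte zero-padded).
byte[] packed = new byte[(str.Length + 7) / 8];
for (int i = 0; i < str.Length; i++)
{
    if (str[i] == '1')
        packed[i / 8] |= (byte)(0x80 >> (i % 8)); // set this bit within its byte
}
using (var writer = new BinaryWriter(File.Open(fileName, FileMode.Create)))
{
    writer.Write(packed);
}
Note that a reader needs to know the original bit count, since the padding bits in the last byte are indistinguishable from data.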

Related

How to get a unique ID for a string and the string from this ID with C#?

I have this name:
string name = "Centos 64 bit";
I want to generate a 168-bit (or whatever is feasible) UID from this name, and I need to be able to get the name back from this ID, and vice versa.
I tried GetHashCode(), without success.
Result would be something like:
Centos 64 bit (=) 91C47A57-E605-4902-894B-74E791F37C1F
One solution I would recommend is to use a hash function combined with something like a dictionary. So, get a hash - say SHA-256 - of your input string and truncate it to 168 bits (21 bytes).
Now, to go back from a UID to the original string, you need a dictionary which stores pairs like (string_uid, input_string), where input_string is the original string and string_uid is the UID generated for it using the method from the first paragraph.
Using this dictionary you can easily get back to the original input string from string_uid.
This is one way - assuming, of course, that you are allowed to store mappings between strings and UIDs.
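A minimal sketch of that idea (the class and method names here are my own, and this assumes collisions on the truncated 21-byte hash are acceptable for your data set):
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

static class NameUid
{
    // Maps the printable form of each UID back to the original string.
    static readonly Dictionary<string, string> Lookup = new Dictionary<string, string>();

    public static byte[] GetUid(string name)
    {
        using (var sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(name));
            byte[] uid = new byte[21];                 // truncate 256 bits to 168 bits
            Array.Copy(hash, uid, 21);
            Lookup[BitConverter.ToString(uid)] = name; // remember the mapping
            return uid;
        }
    }

    public static string GetName(byte[] uid)
    {
        return Lookup[BitConverter.ToString(uid)];     // throws if never registered
    }
}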
The hash normally gives you the result as a byte array; converting that byte array to a string is a separate step.
For example, if you have 10 bytes (integers in the range [0, 255]) and you encode the byte array as a hex string, the resulting string takes 20 bytes.
So the next question is: do you want the length of the UID as a string to be 21 bytes?
Because that would mean the hash output could only be around 10 bytes, which would seriously hurt the collision resistance of the output.
What you want is not achievable. You need to store a lookup table of hash to name. Since you don't give more details of your system, it's hard to say whether that has to be persistent or in memory. If in memory, just use a Dictionary<string, string>.
Here you go sir:
public byte[] GetUID(string name)
{
    var bytes = Encoding.ASCII.GetBytes(name);
    if (bytes.Length > 21)
        throw new ArgumentException("Value is too long to be used as an ID");
    // Zero-pad the name into a fixed 21-byte (168-bit) buffer.
    var uid = new byte[21];
    Buffer.BlockCopy(bytes, 0, uid, 0, bytes.Length);
    return uid;
}
public string GetName(byte[] UID)
{
    // The name ends at the first zero (padding) byte.
    int length = UID.Length;
    for (int i = 0; i < UID.Length; i++)
    {
        if (UID[i] == 0)
        {
            length = i;
            break;
        }
    }
    return Encoding.ASCII.GetString(UID, 0, length);
}
Caveats: it works for strings up to 21 characters in length that only use ASCII characters (no Unicode support) and it doesn't encrypt the string in any way, but I believe it meets your requirements.
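For example, a round trip with the corrected methods above:
byte[] uid = GetUID("Centos 64 bit"); // 21-byte, zero-padded ASCII
string name = GetName(uid);           // "Centos 64 bit" again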

Read and write more than 8 bit symbols

I am trying to write an encoded file. The file has 9- to 12-bit symbols. When the file contains only 8-bit symbols, everything works fine; but with 9-bit symbols I guess the file is not written correctly, because I am unable to decode it. This is the way I am writing the file:
File.AppendAllText(outputFileName, WriteBackContent, ASCIIEncoding.Default);
The same goes for reading, with the ReadAllText call.
What is the way to go here?
I am using the ZXing library to encode my file with its Reed-Solomon encoder.
ReedSolomonEncoder enc = new ReedSolomonEncoder(GenericGF.AZTEC_DATA_12); // with AZTEC_DATA_8 it works fine because the symbol size is 8 bits
int[] bytesAsInts = Array.ConvertAll(toBytes.ToArray(), c => (int)c);
enc.encode(bytesAsInts, parity);
byte[] bytes = bytesAsInts.Select(x => (byte)x).ToArray();
string contentWithParity = (ASCIIEncoding.Default.GetString(bytes.ToArray()));
WriteBackContent += contentWithParity;
File.AppendAllText(outputFileName, WriteBackContent, ASCIIEncoding.Default);
As in the code, I am initializing my encoder with AZTEC_DATA_12, which means 12-bit symbols. Because the RS encoder requires an int array, I convert my data to one and write it to the file as shown here. It works well with AZTEC_DATA_8 because of the 8-bit symbol size, but not with AZTEC_DATA_12.
Main problem is here:
byte[] bytes = bytesAsInts.Select(x => (byte)x).ToArray();
You are basically throwing away part of the result when converting the single integers to single bytes.
If you look at the array after the call to encode(), you can see that some of the array elements have a value higher than 255, so they cannot be represented as bytes. However, in your code quoted above, you cast every single element in the integer array to byte, changing the element when it has a value greater than 255.
So to store the result of encode(), you have to convert the integer array to a byte array in a way that the values are not lost or modified.
In order to make this kind of conversion between byte arrays and integer arrays, you can use the function Buffer.BlockCopy(). An example on how to use this function is in this answer.
Use the samples from the answer and the one from the comment to the answer for both conversions: Turning a byte array to an integer array to pass to the encode() function and to turn the integer array returned from the encode() function back into a byte array.
Here are the sample codes from the linked answer:
// Convert integer array to byte array
byte[] result = new byte[intArray.Length * sizeof(int)];
Buffer.BlockCopy(intArray, 0, result, 0, result.Length);
// Convert byte array to integer array (with bugs fixed)
int bytesCount = byteArray.Length;
int intsCount = bytesCount / sizeof(int);
if (bytesCount % sizeof(int) != 0) intsCount++;
int[] result = new int[intsCount];
Buffer.BlockCopy(byteArray, 0, result, 0, byteArray.Length);
Now, about storing the data in files: do not turn the data into a string directly via Encoding.GetString(). Not every bit sequence is a valid representation of characters in a given character set, so converting a sequence of arbitrary bytes into a string will sometimes fail.
Instead, either store/read the byte array directly into a file via File.WriteAllBytes() / File.ReadAllBytes(), or use Convert.ToBase64String() and Convert.FromBase64String() to work with a base64-encoded string representation of the byte array.
Combined here is some sample code:
ReedSolomonEncoder enc = new ReedSolomonEncoder(GenericGF.AZTEC_DATA_12); // with AZTEC_DATA_8 it works fine because the symbol size is 8 bits
int[] bytesAsInts = Array.ConvertAll(toBytes.ToArray(), c => (int)c);
enc.encode(bytesAsInts, parity);
// Turn the int array into a byte array without losing values
byte[] bytes = new byte[bytesAsInts.Length * sizeof(int)];
Buffer.BlockCopy(bytesAsInts, 0, bytes, 0, bytes.Length);
// Write to file
File.WriteAllBytes(outputFileName, bytes);
// Read from file
bytes = File.ReadAllBytes(outputFileName);
// Turn byte array to int array
int bytesCount = bytes.Length;
int intsCount = bytesCount / sizeof(int);
if (bytesCount % sizeof(int) != 0) intsCount++;
int[] dataAsInts = new int[intsCount];
Buffer.BlockCopy(bytes, 0, dataAsInts, 0, bytes.Length);
// Decoding
ReedSolomonDecoder dec = new ReedSolomonDecoder(GenericGF.AZTEC_DATA_12);
dec.decode(dataAsInts, parity);
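If you need a string representation instead of a raw binary file, a base64 round trip might look like this (a sketch reusing the `bytes` array from above):
// Encode the byte array as a base64 string and back again.
string base64 = Convert.ToBase64String(bytes);
File.WriteAllText(outputFileName, base64);
string readBack = File.ReadAllText(outputFileName);
byte[] restored = Convert.FromBase64String(readBack);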

Shift a 128-bit signed BigInteger to always be positive

I'm converting a Guid to a BigInteger so I can base62-encode it. This works well; however, I can get negative numbers from the BigInteger. How do I shift the BigInteger so the number is always positive? I'll also need to be able to shift it back so I can convert back to a Guid.
// GUID is a 128-bit signed integer
Guid original = new Guid("{35db5c21-2d98-4456-88a0-af263ed87bc2}");
BigInteger b = new BigInteger(original.ToByteArray());
// shift so it's a positive number?
Note: For a URL-safe version of Base64, consider using a modified character set for Base64 (http://en.wikipedia.org/wiki/Base64#URL_applications) instead of a custom Base62.
I believe you can append a 0 byte to the array first (this guarantees the highest bit of the highest byte is not set) and then convert to BigInteger, if you really need a positive BigInteger.
do you mean base64 encode?
Convert.ToBase64String(Guid.NewGuid().ToByteArray());
If you sometimes get negative numbers, it means that your GUID value is large enough to fill all 128 bits of the BigInteger or else the BigInteger byte[] ctor is interpreting the data as such. To make sure your bytes are actually positive, check that you are getting <= 16 bytes (128 bits) and that the most-significant bit of the last byte (because it's little endian) is zero. If you have <16 bytes, you can simply append a zero byte to your array (again, append because it is little endian) to make sure the BigInteger ctor treats it as a positive number.
I think this article can give you the solution. In summary: append one more byte, set to 0, if the most significant bit of the last byte is a 1:
Guid original = Guid.NewGuid();
byte[] bytes = original.ToByteArray();
// If the most significant bit of the last (most significant) byte is set,
// append a zero byte so the value is interpreted as positive.
if ((bytes[bytes.Length - 1] & 0x80) > 0)
{
    Array.Resize(ref bytes, bytes.Length + 1); // the new slot is zero-initialized
}
BigInteger guidPositive = new BigInteger(bytes);
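Going back to the Guid works in reverse; a sketch (assuming the value was made positive as above):
byte[] back = guidPositive.ToByteArray();
// ToByteArray() omits leading zero bytes, so force the length to exactly 16:
// this drops the 17th padding byte, or re-appends high zero bytes if needed.
Array.Resize(ref back, 16);
Guid restored = new Guid(back);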

Bits into char array

I need to transform bits into a char array or string. Help me find the best way to store the bits - and what should I do if, for example, I have 18 bits? Do I make 2 chars plus 2 leftover bits?
The best way to store bits in C# is in the BitArray class, if you just need them as bits. If you need the integer value of the 18 bits, then you have to convert them to int or double or whatever.
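For instance, a small sketch of getting the integer value of up to 32 bits held in a BitArray (the example bit values are made up):
// 18 bits, all false initially.
var bits = new BitArray(18);
bits[0] = true;        // example: set bit 0 ...
bits[17] = true;       // ... and bit 17
// BitArray.CopyTo packs the bits little-endian into the int array.
int[] buffer = new int[1];
bits.CopyTo(buffer, 0);
int value = buffer[0]; // 131073 (bit 0 + bit 17)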
The first step would be to convert your bit array into bytes; once you have an array of bytes, you will need to choose a proper encoding and convert to a string, which is an array of chars:
BitArray bitArray = new BitArray(new[] { true, false, true, false });
// One byte holds 8 bits, so round the byte count up.
byte[] bytes = new byte[(bitArray.Length + 7) / 8];
bitArray.CopyTo(bytes, 0);
char[] result = Encoding.UTF8.GetString(bytes).ToCharArray();
Obviously you need to know the encoding of those bits in order to be able to convert to characters. If you don't know the encoding you should reconsider what you are trying to do.

Convert 2 bytes to a number

I have a control that has a byte array in it.
Every now and then there are two bytes that tell me some info about the number of subsequent items in the array.
So as an example I could have:
...
...
Item [4] = 7
Item [5] = 0
...
...
The value of this is clearly 7.
But what about this?
...
...
Item [4] = 0
Item [5] = 7
...
...
Any idea what that equates to (as a normal int)?
I converted to binary and thought it might be 11100000000, which equals 1792. But I don't know if that is how it really works (i.e. whether the whole 8 bits of each byte are used).
Is there any way to know this without testing?
Note: I am using C# 3.0 and visual studio 2008
BitConverter can easily convert the two bytes in a two-byte integer value:
// assumes byte[] Item = someObject.GetBytes();
short num = BitConverter.ToInt16(Item, 4); // makes a short out of Item[4] and Item[5]
A two-byte number has a low and a high byte. The high byte is worth 256 times as much as the low byte:
value = 256 * high + low;
So, for high=0 and low=7, the value is 7. But for high=7 and low=0, the value becomes 1792.
This of course assumes that the number is a simple 16-bit integer. If it's anything fancier, the above won't be enough. Then you need more knowledge about how the number is encoded, in order to decode it.
The order in which the high and low bytes appear is determined by the endianness of the byte stream. In big-endian, you will see high before low (at a lower address), in little-endian it's the other way around.
You say "this value is clearly 7", but it depends entirely on the encoding. If we assume full-width bytes, then in little-endian, yes; 7, 0 is 7. But in big endian it isn't.
For little-endian, what you want is
int value = buffer[i] | (buffer[i + 1] << 8);
and for big-endian:
int value = (buffer[i] << 8) | buffer[i + 1];
(where buffer is the byte array and i is the offset of the first of the two bytes).
But other encoding schemes are available; for example, some schemes use 7-bit arithmetic, with the 8th bit as a continuation bit. Other schemes (such as UTF-8) encode the sequence length in the first byte (so it has only limited room for data bits) and mark each remaining byte in the sequence as a continuation byte.
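As an illustration of the 7-bit scheme mentioned above, here is a sketch of a little-endian base-128 "varint" decoder (the style used by formats such as Protocol Buffers; not something from the question itself):
// Each byte carries 7 data bits; the 8th (high) bit says another byte follows.
static int ReadVarint(byte[] buffer, ref int pos)
{
    int value = 0, shift = 0;
    byte b;
    do
    {
        b = buffer[pos++];
        value |= (b & 0x7F) << shift; // low 7 bits are data
        shift += 7;
    } while ((b & 0x80) != 0);        // high bit set: keep reading
    return value;
}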
If you simply want to put those two bytes next to each other in binary format, and see what that big number is in decimal, then you need to use this code:
if (BitConverter.IsLittleEndian)
{
    byte[] tempByteArray = new byte[2] { Item[5], Item[4] };
    ushort num = BitConverter.ToUInt16(tempByteArray, 0);
}
else
{
    ushort num = BitConverter.ToUInt16(Item, 4);
}
If you use short num = BitConverter.ToInt16(Item, 4); as in the accepted answer, you are assuming that the first bit of those two bytes is a sign bit (1 = negative, 0 = positive). That answer also assumes that the byte order of your data matches the endianness of the machine it runs on. See this for more info on the sign bit.
If those bytes are the "parts" of an integer it works like that. But beware, that the order of bytes is platform specific and that it also depends on the length of the integer (16 bit=2 bytes, 32 bit=4bytes, ...)
In case Item[5] is the MSB (on a little-endian machine):
ushort result = BitConverter.ToUInt16(new byte[2] { Item[4], Item[5] }, 0);
or, independent of endianness:
int result = 256 * Item[5] + Item[4];
