I am using a C# struct as a pseudo-union (via the LayoutKind.Explicit attribute) to pass network messages around my program. I understand how to lay out the primitive types, as they are of a known size.
However, how would I do this with one of the fields being a char array? I know a char is 2 bytes of data (when in unicode format), but how big is char[]? Am I correct in believing that this is a reference type, so its size is not just number of items * 2?
How would I layout the struct for this? Is it even possible?
The size of the field is the width of a reference: 4 bytes on x86 or 8 bytes on x64. The length of the array is irrelevant, as the array is stored separately on the heap. If you want to serialize that data to a byte stream, then the size depends on which encoding you use for the char data. UTF-16 would indeed be 2 * number of characters, but UTF-8 or UTF-32 will be different.
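To make the difference concrete, here is a minimal sketch that reports the reference width of the current process and the serialized size of a char array under a chosen encoding. The class and method names (CharArraySizes, SerializedSize) are made up for illustration.

```csharp
using System;
using System.Text;

public static class CharArraySizes
{
    // Width of any object reference (including a char[] field) in this process:
    // 4 in a 32-bit process, 8 in a 64-bit process.
    public static int ReferenceSize => IntPtr.Size;

    // Number of bytes the characters occupy once serialized with the given encoding.
    public static int SerializedSize(char[] chars, Encoding encoding) =>
        encoding.GetByteCount(chars);
}
```

For the five characters of "h\u00e9llo", SerializedSize gives 10 bytes with Encoding.Unicode (UTF-16), 6 with Encoding.UTF8 and 20 with Encoding.UTF32.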
That is strange; shouldn't it be equal to the length times the number of bytes per character?
I'm reading up on the ProtectedMemory class in C# (which uses the Data Protection API in Windows (DPAPI)) and I see that in order to use the Protect() method of the class, the data to be encrypted must be stored in a byte array whose length is a multiple of 16.
I know how to convert many different data types to byte array form and back again, but how can I guarantee that the size of a byte array is a multiple of 16? Do I literally need to create an array whose size is a multiple of 16 and keep track of the original data's length using another variable, or am I missing something? With traditional block ciphers all of these details are handled for you automatically with padding settings. Likewise, when I convert data back to its original form from a byte array, how do I ensure that any additional bytes are ignored, assuming of course that the original data's length wasn't a multiple of 16?
In the code sample provided in the .NET Framework documentation, the byte array used just so happens to be 16 bytes long, so I'm not sure what best practice is here, hence the question.
Yes. Just to enumerate the possibilities given in the comments (and give an answer to this nice question), you can:
Use a padding method that is also used for block cipher modes; see all the options on the Wikipedia page on the subject.
Prefix the data with its length in some form or other. A fixed size of 32 bits / 4 bytes is probably easiest. Do write down the encoding of the length (unsigned, little endian is probably best for C#).
Both of these already operate on bytes, so you may need to define a character encoding such as UTF-8 if you use a string.
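The length-prefix option could be sketched as follows, assuming a 4-byte unsigned little-endian prefix followed by zero padding up to the next multiple of 16. BlockPadding, Pad and Unpad are hypothetical names, and the call to ProtectedMemory.Protect itself is left out.

```csharp
using System;

public static class BlockPadding
{
    private const int BlockSize = 16;  // required multiple
    private const int PrefixSize = 4;  // unsigned 32-bit little-endian length

    // Returns [length prefix][data][zero padding], sized to a multiple of 16.
    public static byte[] Pad(byte[] data)
    {
        int total = PrefixSize + data.Length;
        int paddedLength = (total + BlockSize - 1) / BlockSize * BlockSize;
        byte[] result = new byte[paddedLength]; // new arrays are zero-filled
        uint length = (uint)data.Length;
        result[0] = (byte)length;
        result[1] = (byte)(length >> 8);
        result[2] = (byte)(length >> 16);
        result[3] = (byte)(length >> 24);
        Buffer.BlockCopy(data, 0, result, PrefixSize, data.Length);
        return result;
    }

    // Reads the prefix and returns only the original bytes, ignoring the padding.
    public static byte[] Unpad(byte[] padded)
    {
        if (padded.Length < PrefixSize)
            throw new ArgumentException("Input is too short to hold a length prefix.");
        uint length = (uint)padded[0] | (uint)padded[1] << 8
                    | (uint)padded[2] << 16 | (uint)padded[3] << 24;
        if (length > padded.Length - PrefixSize)
            throw new ArgumentException("Length prefix exceeds the available data.");
        byte[] data = new byte[length];
        Buffer.BlockCopy(padded, PrefixSize, data, 0, (int)length);
        return data;
    }
}
```

With this layout, 5 bytes of data become a 16-byte array and 13 bytes become a 32-byte array; Unpad recovers the original bytes in both cases.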
You could also use a specific encoding of the string, e.g. one defined by ASN.1 / DER and then perform zero padding. That way you can even indicate the type of the data that has been encoded in a platform independent way. You may want to read up on masochism before taking this route.
I really hope someone can help me.
I have a single byte that has to show the number of bytes in the byte[] to follow. Now my value is above 255. Is there a way to display/enter a larger number?
A byte holds a value from 0 to 255. To represent 299, you either have to use 2 bytes, or use a scheme (which the receiver will have to use as well) where the value in the byte is interpreted as more than its nominal value in order to expand the possible range of values. For instance, the value could be the length / 2. This would allow lengths of 0 - 510, but would allow only even lengths (odd length arrays would need a pad byte).
You can use two (or more) bytes to represent a number larger than 255. Is that what you want?
short value = 2451;
byte[] data = BitConverter.GetBytes(value);
If this is needed in order to exchange data with some external system, remember to read about Endianness.
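Since BitConverter uses the byte order of the machine it runs on, one option is to write the two bytes by hand so the order is fixed. This is a sketch assuming the other system expects big-endian (most significant byte first); LengthPrefix and its methods are illustrative names.

```csharp
using System;

public static class LengthPrefix
{
    // Encodes a length (0..65535) as two bytes, most significant byte first.
    public static byte[] ToBigEndian(ushort length) =>
        new[] { (byte)(length >> 8), (byte)(length & 0xFF) };

    // Decodes two big-endian bytes back into the length.
    public static ushort FromBigEndian(byte[] bytes) =>
        (ushort)((bytes[0] << 8) | bytes[1]);
}
```

For example, 299 (0x012B) is written as the bytes 0x01, 0x2B regardless of the platform.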
That depends on what you consider a good approach. You can use some form of encoding that lets you store values larger than a single byte can hold. For example, a byte of 0xFF could mean that the next byte is added to it as part of the same value.
[0x01,0x0A,0xFF,0x0A]
would be interpreted as the three values [1,10,265], since 0xFF + 0x0A = 255 + 10 = 265.
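A decoder for such an escape scheme might look like the sketch below. It assumes that every 0xFF byte adds 255 and signals that another byte follows, so the value 255 itself has to be sent as 0xFF, 0x00. EscapedBytes.Decode is a hypothetical name, not an existing API.

```csharp
using System;
using System.Collections.Generic;

public static class EscapedBytes
{
    // Decodes a stream where 0xFF means "add 255 and keep reading this value".
    public static List<int> Decode(byte[] input)
    {
        var values = new List<int>();
        int i = 0;
        while (i < input.Length)
        {
            int value = 0;
            // Consume escape bytes; each one contributes 255 to the value.
            while (i < input.Length && input[i] == 0xFF)
            {
                value += 0xFF;
                i++;
            }
            if (i == input.Length)
                throw new FormatException("Input ends with a dangling 0xFF escape byte.");
            value += input[i++]; // the terminating byte (0..254) completes the value
            values.Add(value);
        }
        return values;
    }
}
```

Decoding [0x01, 0x0A, 0xFF, 0x0A] yields 1, 10 and 265. The receiver must of course use the same convention.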
I am trying to generate an unlock key like XXXX-XXXX-XXXX, or simply a short string or hex string. I am using the RSA algorithm to encrypt and decrypt the key. I got a long string like
Q65g2+uiytyEUW5SFsiI/c5z9NSxyuU2CM1SEly6cAVv9PdTpH81XaWS8lITcaTZ4IjdmINwhHBosvt5kdg==
when I convert the byte array (array size is 64 byte) using the below convert method.
Convert.ToBase64String(bytes);
My requirement is to generate a key of minimal length. Is there any way to convert the byte array (64 bytes) to a shorter string that I can convert back to the byte array? Any other suggestions for minimizing the string length would be helpful.
I have tried converting the output to hexadecimal, but the result is longer than the Base64 string.
You may want to take a look at "What is the most efficient way to encode an arbitrary GUID into readable ASCII (33-127)?". Base85 encoding is discussed there; it is used in PDF files to encode binary data as text.
Though, the difference between Base64 and Base85 in your case is 8 characters.
You can safely remove trailing '==' in Base64 string because it is used for alignment and will always be there for 64-byte values (Of course you will have to add these characters back to decode the string).
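Stripping and restoring the padding could look like this minimal sketch; UnpaddedBase64 is an illustrative name, and only Convert.ToBase64String and Convert.FromBase64String are framework calls.

```csharp
using System;

public static class UnpaddedBase64
{
    // Encodes to Base64 and drops the trailing '=' alignment characters.
    public static string Encode(byte[] bytes) =>
        Convert.ToBase64String(bytes).TrimEnd('=');

    // Re-adds the '=' characters needed to reach a multiple of 4, then decodes.
    public static byte[] Decode(string text)
    {
        int missing = (4 - text.Length % 4) % 4;
        return Convert.FromBase64String(text + new string('=', missing));
    }
}
```

For a 64-byte value this shortens the string from 88 to 86 characters.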
Since you mention you want users to be able to type in the string, there is a trade-off between ease of use for the users and the length of the string.
Even typing a Base64 string is prone to a lot of errors. Base32 strings are much easier to type, but the length increases correspondingly.
If the users can Copy-Paste the key, then the above is moot and there should not be any valid reason why the length of the string should be as small as possible.
Obviously, you can only fit a certain amount of data into a fixed number of characters. You have pretty much reached the limit with Base64 already, which gives you 6 bits per character.
Therefore you need to reduce the amount of data that needs to be stored. Can you reduce the key length? You could use a 96 bit key (by always leaving all other bytes zero). That would require 16 base64 characters which is much better.
It seems you don't need much security against brute forcing. So you can reduce the key size even further.
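As a rough illustration of the 96-bit suggestion, the sketch below formats 12 bytes as 16 Base64 characters in dash-separated groups of four. ShortKey is a hypothetical helper; whether a 96-bit value is acceptable for the licensing scheme is a separate decision.

```csharp
using System;
using System.Text;

public static class ShortKey
{
    // Formats exactly 12 bytes (96 bits) as XXXX-XXXX-XXXX-XXXX.
    public static string Format(byte[] key)
    {
        if (key.Length != 12)
            throw new ArgumentException("Expected 12 bytes (96 bits).", nameof(key));
        string base64 = Convert.ToBase64String(key); // 12 bytes -> 16 chars, no '=' padding
        var sb = new StringBuilder();
        for (int i = 0; i < base64.Length; i += 4)
        {
            if (i > 0) sb.Append('-');
            sb.Append(base64, i, 4);
        }
        return sb.ToString();
    }

    // Removes the dashes and decodes back to the 12 bytes.
    public static byte[] Parse(string text) =>
        Convert.FromBase64String(text.Replace("-", ""));
}
```

Note that Base64 output can contain '+' and '/', which may be awkward for users to type.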
Can anyone tell me how many bytes the below string will take up?
string abc = "a";
From my article on strings:
In the current implementation at least, strings take up 20+(n/2)*4 bytes (rounding the value of n/2 down), where n is the number of characters in the string. The string type is unusual in that the size of the object itself varies. The only other classes which do this (as far as I know) are arrays. Essentially, a string is a character array in memory, plus the length of the array and the length of the string (in characters). The length of the array isn't always the same as the length in characters, as strings can be "over-allocated" within mscorlib.dll, to make building them up easier. (StringBuilder does this, for instance.) While strings are immutable to the outside world, code within mscorlib can change the contents, so StringBuilder creates a string with a larger internal character array than the current contents requires, then appends to that string until the character array is no longer big enough to cope, at which point it creates a new string with a larger array. The string length member also contains a flag in its top bit to say whether or not the string contains any non-ASCII characters. This allows for extra optimisation in some cases.
I suspect that was written before I had a chance to work with a 64-bit CLR; I suspect in 64-bit land each string takes up either 4 or 8 more bytes.
EDIT: I wrote up a blog post more recently which includes 64-bit information (and contradicts the above slightly for x86...)
Basically, each string object requires a constant 20 bytes for the object data.
The buffer requires 2 bytes per character.
The estimated memory usage for a string in bytes: 20 + (2 * Length).
So normally the memory used in the CLR for this string is 22 bytes.
However, when we pass or send this string elsewhere, we do not need that much memory (we never need the 20 bytes of object data). The size then depends on the encoding you select.
With Encoding.Default, a character such as "a" takes 1 byte.
So the answer is 1 byte for the default encoding.
You can check with this code:
Encoding.Default.GetBytes("a"); //It will give you a byte array of size 1.
Encoding.Default.GetBytes("ABC"); //It will give you a byte array of size 3.
If you are asking about the size of the string object, the question has no simple answer: without a debugger it is impossible to say exactly what it is, and I am not sure it is possible with a debugger either. string uses pointers internally.
If you are asking about the size of the sequence of chars it contains, then it is 2 bytes for "a" (4 if you count the terminating null character), because strings are stored in UTF-16. All chars in the Basic Multilingual Plane are coded with two bytes.
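The size of the character data alone can be checked with a small sketch like this one. Utf16Size.ContentBytes is an illustrative name; it counts only the UTF-16 code units, not the object header or any terminator.

```csharp
using System.Text;

public static class Utf16Size
{
    // Bytes needed for the characters alone when stored as UTF-16.
    public static int ContentBytes(string s) => Encoding.Unicode.GetByteCount(s);
}
```

ContentBytes("a") is 2. A character outside the Basic Multilingual Plane, such as "\U0001F600", is stored as a surrogate pair, so its Length is 2 and ContentBytes returns 4.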
Is it possible to get strings, ints, etc. in binary format? What I mean is: assume I have the string "Hello" and I want to store it in binary format, so assume "Hello" is
11110000110011001111111100000000 in binary (I know it's not, I just typed something quickly).
Can I store the above binary not as a string, but in the actual format with the bits.
In addition to this, is it actually possible to store less than 8 bits. What I am getting at is if the letter A is the most frequent letter used in a text, can I use 1 bit to store it with regards to compression instead of building a binary tree.
Is it possible to get strings, ints, etc in binary format?
Yes. There are several different methods for doing so. One common method is to make a MemoryStream out of an array of bytes, and then make a BinaryWriter on top of that memory stream, and then write ints, bools, chars, strings, whatever, to the BinaryWriter. That will fill the array with the bytes that represent the data you wrote. There are other ways to do this too.
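The MemoryStream plus BinaryWriter approach could be sketched like this. BinaryDemo and its method names are made up for illustration, and UTF-8 is assumed for the string data.

```csharp
using System.IO;
using System.Text;

public static class BinaryDemo
{
    // Writes an int followed by a length-prefixed string into a byte array.
    public static byte[] Write(int number, string text)
    {
        using (var stream = new MemoryStream())
        {
            using (var writer = new BinaryWriter(stream, Encoding.UTF8))
            {
                writer.Write(number); // 4 bytes, little-endian
                writer.Write(text);   // length prefix, then the encoded characters
            }
            return stream.ToArray(); // still valid after the stream is closed
        }
    }

    // Reads the values back in the same order they were written.
    public static void Read(byte[] data, out int number, out string text)
    {
        using (var reader = new BinaryReader(new MemoryStream(data), Encoding.UTF8))
        {
            number = reader.ReadInt32();
            text = reader.ReadString();
        }
    }
}
```

Writing 42 and "Hello" produces 10 bytes: 4 for the int, 1 for the string length and 5 for the characters.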
Can I store the above binary not as a string, but in the actual format with the bits.
Sure, you can store an array of bytes.
is it actually possible to store less than 8 bits.
No. The smallest unit of storage in C# is a byte. However, there are classes that will let you treat an array of bytes as an array of bits. You should read about the BitArray class.
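As a small sketch of the BitArray idea, the helper below packs individual bits into bytes and reads a single bit back. BitArrayDemo is a hypothetical wrapper; BitArray places bit 0 in the least significant bit of the first byte.

```csharp
using System.Collections;

public static class BitArrayDemo
{
    // Packs the given bits into as few bytes as possible (8 bits per byte).
    public static byte[] PackBits(bool[] bits)
    {
        var bitArray = new BitArray(bits);
        var bytes = new byte[(bits.Length + 7) / 8];
        bitArray.CopyTo(bytes, 0);
        return bytes;
    }

    // Treats the byte array as an array of bits and returns the bit at index.
    public static bool GetBit(byte[] bytes, int index) =>
        new BitArray(bytes)[index];
}
```

Packing the three bits true, false, true gives a single byte with the value 5; the storage is still a whole byte, only the view is bit-level.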
What encoding would you be assuming?
What you are looking for is something like Huffman coding, it's used to represent more common values with a shorter bit pattern.
How you store the bit codes is still limited to whole bytes. There is no data type that uses less than a byte. The way that you store variable width bit values is to pack them end to end in a byte array. That way you have a stream of bit values, but that also means that you can only read the stream from start to end, there is no random access to the values like you have with the byte values in a byte array.
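Packing variable-width codes end to end might be sketched as below. BitWriter is a made-up class; it writes the most significant bit of each code first and leaves unused bits in the last byte as zero.

```csharp
using System.Collections.Generic;

public sealed class BitWriter
{
    private readonly List<byte> _bytes = new List<byte>();
    private int _bitCount; // total number of bits written so far

    public int BitCount => _bitCount;

    // Appends the lowest `width` bits of `code`, most significant bit first.
    public void Write(uint code, int width)
    {
        for (int i = width - 1; i >= 0; i--)
        {
            if (_bitCount % 8 == 0) _bytes.Add(0); // start a new byte when needed
            if (((code >> i) & 1) != 0)
                _bytes[_bytes.Count - 1] |= (byte)(0x80 >> (_bitCount % 8));
            _bitCount++;
        }
    }

    // Returns the packed bytes; the final byte may contain trailing zero bits.
    public byte[] ToArray() => _bytes.ToArray();
}
```

Writing the codes 1, 01 and 1 in that order gives the bit stream 1011, stored as the single byte 0xB0. A reader has to know BitCount (or use an end marker) to tell real bits from the trailing zeros.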
What I am getting at is if the letter A is the most frequent letter used in a text, can I use 1 bit to store it with regards to compression instead of building a binary tree.
The algorithm you're describing is known as Huffman coding. To relate to your example, if 'A' appears frequently in the data, then the algorithm will represent 'A' as simply 1. If 'B' also appears frequently (but less frequently than A), the algorithm usually would represent 'B' as 01. Then, the rest of the characters would be 00xxxxx... etc.
In essence, the algorithm performs statistical analysis on the data and generates a code that will give you the most compression.
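A compact, unoptimized sketch of building such a code table is shown below. It re-sorts a list instead of using a priority queue, and the names (Huffman, BuildCodes, Node) are illustrative; the exact bit patterns depend on how ties between equal weights are broken.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class Huffman
{
    private sealed class Node
    {
        public char? Symbol; // set only on leaves
        public int Weight;
        public Node Left, Right;
    }

    // Returns a string of '0'/'1' characters for every distinct symbol in text.
    public static Dictionary<char, string> BuildCodes(string text)
    {
        // 1. Count how often each symbol occurs.
        var nodes = text.GroupBy(c => c)
            .Select(g => new Node { Symbol = g.Key, Weight = g.Count() })
            .ToList();

        // 2. Repeatedly merge the two lightest nodes until one tree remains.
        while (nodes.Count > 1)
        {
            nodes = nodes.OrderBy(n => n.Weight).ToList();
            var merged = new Node
            {
                Left = nodes[0],
                Right = nodes[1],
                Weight = nodes[0].Weight + nodes[1].Weight
            };
            nodes.RemoveRange(0, 2);
            nodes.Add(merged);
        }

        // 3. Walk the tree: a left edge appends '0', a right edge appends '1'.
        var codes = new Dictionary<char, string>();
        if (nodes.Count == 1) Assign(nodes[0], "", codes);
        return codes;
    }

    private static void Assign(Node node, string prefix, Dictionary<char, string> codes)
    {
        if (node.Symbol.HasValue)
        {
            // A text with a single distinct symbol still needs a one-bit code.
            codes[node.Symbol.Value] = prefix.Length == 0 ? "0" : prefix;
            return;
        }
        Assign(node.Left, prefix + "0", codes);
        Assign(node.Right, prefix + "1", codes);
    }
}
```

For the text "AAAABBC" the most frequent symbol A gets a one-bit code, while B and C get two-bit codes.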
You can use things like:
BitConverter.GetBytes(1);
Encoding.ASCII.GetBytes("text");
Encoding.Unicode.GetBytes("text");
Once you have the bytes, you can do all the bit twiddling you want. You would need an algorithm of some sort before we can give you much more useful information.
The string is actually stored in binary format, as are all strings.
The difference between a string and another data type is that when your program displays the string, it retrieves the binary data and shows the corresponding characters (UTF-16 in .NET, rather than ASCII).
If you were to store data in a compressed format, you would need to assign more than 1 bit per character. How else would you identify which character is the most frequent?
If 1 represents an 'A', what does 0 mean? All the other characters?