I have an application in C# that encrypts part of my files (because they are big files) using RijndaelManaged. So I convert my file to a byte array and encrypt only a part of it.
Then I want to decrypt the file using Java, so I have to decrypt only the part of the file (that is, those bytes) that was encrypted in C#.
Here the problem comes: in C# we have unsigned bytes and in Java we have signed bytes, so my encryption and decryption are not working the way I want.
In C# I have joined the encrypted bytes and the normal bytes together and saved them with File.WriteAllBytes, so I can't use sbyte here (or I don't know how to):
byte[] myEncryptedFile = new byte[myFile.Length];
// Copy the encrypted block to the start of the output buffer...
for (long i = 0; i < encryptedBlockBytes.Length; i++)
{
    myEncryptedFile[i] = encryptedBlockBytes[i];
}
// ...then append the rest of the original file unchanged.
for (long i = encryptedBlockBytes.Length; i < myFile.Length; i++)
{
    myEncryptedFile[i] = myFileBytes[i];
}
File.WriteAllBytes(@"C:\enc_file.big", myEncryptedFile);
(And there is exactly the same code for decryption in Java.)
So my questions are:
Is there any WriteAllBytes for sbyte[] in C#?
Or can I use unsigned bytes in Java?
Or any other solutions to my problem?
Although you cannot use unsigned bytes in Java, you may simply ignore the issue.
AES - and all modern symmetric ciphers - operates on bytes, and the input and output have been defined to be bytes (or octets). Input and output have been standardized by NIST and test vectors are available.
If you look at the individual bits of the bytes then {200, 201, 202} in C# and {(byte)200, (byte)201, (byte)202} in Java are identical. This is because Java uses a two's-complement representation of bytes.
Take the number 200 as an integer: as a byte this is 11001000 in binary, which represents the number -56 in Java when interpreted as a (signed) two's-complement byte. Now a symmetric cipher simply transforms these bits into other bits (normally operating on a full block of bits).
Once you have retrieved the answer you will see that they are identical both in C# and Java when you look at the separate bits. C# will however interpret those as unsigned values and Java as signed values.
If you want to print out or use these values as unsigned numbers in Java then you have to convert them to non-negative integers first. The way to do this is to use int p = b & 0xFF.
This does the following (I'll use the number 200 again):
The (negative) byte value is expanded to a signed integer, preserving the sign bit:
11001000 becomes 11111111 11111111 11111111 11001000
This value is then "masked" with 0xFF, or 00000000 00000000 00000000 11111111, by applying the binary AND operator:
11111111 11111111 11111111 11001000 & 00000000 00000000 00000000 11111111 = 00000000 00000000 00000000 11001000
This result is the value 200 as a (non-negative) signed integer.
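A minimal C# sketch of the same mechanics, using sbyte to stand in for Java's signed byte (my own illustration; in Java the masking line is the int p = b & 0xFF shown above):

using System;

sbyte b = unchecked((sbyte)200);  // bit pattern 11001000, which a signed byte reads as -56
int widened = b;                  // sign-extended to 11111111 11111111 11111111 11001000 (-56)
int p = b & 0xFF;                 // masked back to  00000000 00000000 00000000 11001000 (200)
Console.WriteLine($"{b} {widened} {p}");  // -56 -56 200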
Related
I implemented this checksum algorithm I found, and it works fine, but I can't figure out what this "&= 0xFF" line is actually doing.
I looked up the bitwise & operator, and Wikipedia says it's a logical AND of all the bits in A with B. I also read that 0xFF is equivalent to 255, which should mean that all of its bits are 1. If you take any number & 0xFF, wouldn't that be the identity of the number? So A & 0xFF produces A, right?
So then I thought, wait a minute: checksum in the code below is a 32-bit int, but 0xFF is only 8 bits. Does that mean that the result of checksum &= 0xFF is that 24 bits end up as zeros and only the remaining 8 bits are kept? In which case, checksum is truncated to 8 bits. Is that what's going on here?
private int CalculateChecksum(byte[] dataToCalculate)
{
    int checksum = 0;
    for (int i = 0; i < dataToCalculate.Length; i++)
    {
        checksum += dataToCalculate[i];
    }
    // What does this line actually do?
    checksum &= 0xff;
    return checksum;
}
Also, if the result is getting truncated to 8 bits, is that because 32 bits are pointless in a checksum? Is it possible to have a situation where a 32-bit checksum catches corrupt data when an 8-bit checksum doesn't?
It is masking off the higher bytes, leaving only the lower byte.
checksum &= 0xFF;
Is syntactically short for:
checksum = checksum & 0xFF;
Since the operation is performed on ints, the 0xFF gets expanded into an int:
checksum = checksum & 0x000000FF;
Which masks off the upper 3 bytes and returns the lower byte as an integer (not a byte).
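For example (the input value here is just an illustration):

using System;

int checksum = 0x1A2B3C4D;
checksum &= 0xFF;              // equivalent to checksum = checksum & 0x000000FF
Console.WriteLine(checksum);   // 77 (0x4D): only the low byte survives, still typed as int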
To answer your other question: Since a 32-bit checksum is much wider than an 8-bit checksum, it can catch errors that an 8-bit checksum would not, but both sides need to use the same checksum calculations for that to work.
Seems like you have a good understanding of the situation.
Does that mean that the result of checksum &= 0xFF is that 24 bits end up as zeros and only the remaining 8 bits are kept?
Yes.
Is it possible to have a situation where a 32 bit checksum catches corrupt data when 8 bit checksum doesn't?
Yes.
This is performing a simple checksum on the bytes (8-bit values) by adding them and ignoring any overflow out into higher-order bits. The final &= 0xFF, as you suspected, just truncates the value to the 8 least-significant bits of the 32-bit int, resulting in a value between 0 and 255.
The truncation to 8 bits, throwing away the higher-order bits, is simply how this checksum implementation is defined. Historically this sort of check value was used to provide some confidence that a block of bytes had been transferred correctly over a simple serial interface.
To answer your last question: yes, a 32-bit check value will be able to detect some errors that would not be detected with an 8-bit check value.
Yes, the checksum is truncated to 8 bits by the &= 0xFF: the lowest 8 bits are kept and all higher bits are set to 0.
Narrowing the checksum to 8 bits does decrease its reliability. Just think of two 32-bit checksums that are different but whose lowest 8 bits are equal: after truncating to 8 bits both would be equal, while as 32-bit values they are not.
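For instance (made-up sums), two different 32-bit checksums can collide once truncated to 8 bits:

using System;

int sumA = 0x000001FF;   // 511
int sumB = 0x000002FF;   // 767
Console.WriteLine(sumA == sumB);                    // False: the 32-bit checksums differ
Console.WriteLine((sumA & 0xFF) == (sumB & 0xFF));  // True:  both truncate to 0xFF, so the corruption is missed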
I am trying to convert hex data to signed int/decimal and can't figure out what I'm doing wrong.
I need FE to turn into -2.
I'm using Convert.ToInt32(fields[10], 16) but am getting 254 instead of -2.
Any assistance would be greatly appreciated.
int is 32 bits wide, so 0xFE is really interpreted as 0x000000FE by Convert.ToInt32(string, int), which is equal to 254 in the space of int.
Since you want to work with the signed byte range of values, use Convert.ToSByte(string, int) instead (byte is unsigned in C#, so you need the sbyte type):
Convert.ToSByte("FE",16)
Interpret the value as a signed byte:
sbyte value = Convert.ToSByte("FE", 16); //-2
Well, the bounds of Int32 are -2,147,483,648 to 2,147,483,647, so FE maps to 254.
If you want values of 128 and above to wrap around to negative numbers, the most elegant solution is probably to use a signed byte (sbyte):
csharp> Convert.ToSByte("FE",16);
-2
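Putting both suggestions together, a small sketch (the hex string is just the example value from the question):

using System;

string hex = "FE";
sbyte direct  = Convert.ToSByte(hex, 16);                   // -2: parsed directly as a signed byte
sbyte viaCast = unchecked((sbyte)Convert.ToByte(hex, 16));  // -2: parse as unsigned, then reinterpret the bits
Console.WriteLine($"{direct} {viaCast}");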
I'm working on a parser to receive UDP information, parse it, and store it. To do so I'm using a BinaryReader since it will mostly be binary information. Some of it will be strings though. MSDN says for the ReadString() function:
Reads a string from the current stream. The string is prefixed with the length, encoded as an integer seven bits at a time.
And I completely understand it up until "seven bits at a time", which I tried to simply ignore until I started testing. I'm creating my own byte array, putting it into a MemoryStream, and attempting to read it with a BinaryReader. Here's what I first thought would work:
byte[] data = new byte[] { 3, 0, 0, 0, (byte)'C', (byte)'a', (byte)'t' };
BinaryReader reader = new BinaryReader(new MemoryStream(data));
String str = reader.ReadString();
Knowing that an int is 4 bytes (and having toyed around long enough to find out that BinaryReader is little endian), I pass it the length of 3 followed by the corresponding letters. However, str ends up holding "\0\0\0". If I remove the three zeros and just have
byte[] data = new byte[] { 3, (byte)'C', (byte)'a', (byte)'t' };
Then it reads and stores Cat properly. To me this conflicts with the documentation saying that the length is supposed to be an integer. Now I'm beginning to think they simply mean a number with no decimal place and not the data type int. Does this mean that a BinaryReader can never read a string larger than 127 characters (since that would be 01111111 corresponding to the 7 bits part of the documentation)?
I'm writing up a protocol and need to completely understand what I'm getting into before I pass our documentation along to our clients.
I found the source code for BinaryReader. It uses a function called Read7BitEncodedInt() and after looking up that documentation and the documentation for Write7BitEncodedInt() I found this:
The integer of the value parameter is written out seven bits at a time, starting with the seven least-significant bits. The high bit of a byte indicates whether there are more bytes to be written after this one. If value will fit in seven bits, it takes only one byte of space. If value will not fit in seven bits, the high bit is set on the first byte and written out. value is then shifted by seven bits and the next byte is written. This process is repeated until the entire integer has been written.
Also, Ralf found this link that better displays what's going on.
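For what it's worth, here is a rough sketch of the loop that description implies (my own illustration, not the actual BCL source):

using System;
using System.Collections.Generic;

static byte[] Encode7Bit(uint value)
{
    var bytes = new List<byte>();
    while (value >= 0x80)
    {
        bytes.Add((byte)(value | 0x80));  // low 7 bits, high bit set: more bytes follow
        value >>= 7;
    }
    bytes.Add((byte)value);               // final byte: high bit clear
    return bytes.ToArray();
}

Console.WriteLine(string.Join(" ", Encode7Bit(3)));    // 3       -> one prefix byte
Console.WriteLine(string.Join(" ", Encode7Bit(300)));  // 172 2   -> lengths of 128 or more need two bytes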
Unless they specifically say 'int' or 'Int32', they just mean an integer as in a whole number.
By "7 bits at a time", they mean that it implements 7-bit length encoding, which seems a bit confusing at first but is actually rather straightforward. Here are some example values and how they are written out using 7-bit length encoding:
/*
decimal value    binary value                   ->   enc byte 1   enc byte 2   enc byte 3
           85    00000000 00000000 01010101     ->   01010101     n/a          n/a
        1,365    00000000 00000101 01010101     ->   11010101     00001010     n/a
      349,525    00000101 01010101 01010101     ->   11010101     10101010     00010101
*/
The table above uses big endian for no other reason than that I had to pick one and it's what I'm most familiar with. The way 7-bit length encoding works, it is little endian by its very nature.
Note that 85 writes out to 1 byte, 1,365 writes out to 2 bytes, and 349,525 writes out to 3 bytes.
Here's the same table using letters to show how each value's bits were used in the written output (dashes are zero-value bits, and the 0s and 1s are what's added by the encoding mechanism to indicate if a subsequent byte is to be written/read)...
/*
decimal value    binary value                   ->   enc byte 1   enc byte 2   enc byte 3
           85    -------- -------- -AAAAAAA     ->   0AAAAAAA     n/a          n/a
        1,365    -------- -----BBB AAAAAAAA     ->   1AAAAAAA     0---BBBA     n/a
      349,525    -----CCC BBBBBBBB AAAAAAAA     ->   1AAAAAAA     1BBBBBBA     0--CCCBB
*/
So values in the range of 0 to 2^7-1 (127) will write out as 1 byte, values of 2^7 (128) to 2^14-1 (16,383) will use 2 bytes, 2^14 (16,384) to 2^21-1 (2,097,151) will take 3 bytes, and so on and so forth.
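To see those prefixes on the wire, a quick experiment (the byte dump in the comment is what I'd expect BinaryWriter to produce):

using System;
using System.IO;

var ms = new MemoryStream();
using (var writer = new BinaryWriter(ms))
{
    writer.Write("Cat");                 // BinaryWriter adds the 7-bit-encoded length prefix itself
    writer.Write(new string('x', 200));  // 200 >= 128, so this string gets a two-byte prefix
}
Console.WriteLine(BitConverter.ToString(ms.ToArray(), 0, 7));
// 03-43-61-74-C8-01-78 : prefix 3, then "Cat", then prefix C8 01 (= 200) and the first 'x'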
My question is: how does this conversion happen in C#? I mean, how does it arrive at the answer 1 (with 257), and how does it arrive at 0 (with 256)?
The code is:
int intnumber = 257;
byte bytenumber = (byte)intnumber; // the output of this cast is 1

int intnumber = 256;
byte bytenumber = (byte)intnumber; // the output of this cast is 0

My question is: what happens so that the output of the first snippet is 1 and of the second one is 0?
A byte only occupies one byte in memory. An int occupies 4 bytes in memory. Here is the binary representation of some int values you've mentioned:
       most significant    least significant
 255:  00000000 00000000 00000000 11111111
 256:  00000000 00000000 00000001 00000000
 257:  00000000 00000000 00000001 00000001
You can also see how this works when casting negative int values to a byte. An int value of -255, when cast to a byte, is 1.
-255: 11111111 11111111 11111111 00000001
When you cast an int to a byte, only the least significant byte is assigned to the byte value. The three higher significance bytes are ignored.
A single byte only goes up to 255. The code wraps around to 0 for 256 and 1 for 257, etc...
The most significant bits are discarded and you're left with the rest.
255 is the maximum value that can be represented in a single byte:
Hex code: FF
256 does not fit in 1 byte. It takes 2 bytes to represent it:
01 00
Since you're trying to put that value into a variable of type byte (which of course can only hold 1 byte), the high byte (01) is "cropped" away, leaving only:
00
The same happens for 257 and, in fact, for any value.
1 is assigned because 257 exceeds the maximum byte value (255) by 2 units, wrapping around to 1.
0 is assigned because 256 exceeds it by 1 unit, wrapping around to 0.
The byte data type holds a number between 0 and 255. When converting a non-negative int to a byte, the result is the number modulo 256:
byte = int % 256
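A quick check of the answers above (unchecked is used so the casts wrap instead of throwing if overflow checking is enabled; note that C#'s % operator gives -255 for the negative case, so the modulo rule only holds for non-negative values):

using System;

int a = 257, b = 256, c = -255;
Console.WriteLine(unchecked((byte)a));  // 1: low byte 00000001
Console.WriteLine(unchecked((byte)b));  // 0: low byte 00000000
Console.WriteLine(a % 256);             // 1: matches the cast for non-negative values
Console.WriteLine(unchecked((byte)c));  // 1: low byte kept, even though c % 256 is -255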
I was looking at F# doc on bitwise ops:
Bitwise right-shift operator. The result is the first operand with bits shifted right by the number of bits in the second operand. Bits shifted off the least significant position are not rotated into the most significant position. For unsigned types, the most significant bits are padded with zeros. For signed types, the most significant bits are padded with ones. The type of the second argument is int32.
What was the motivation behind this design choice, compared to the C++ language (and probably C too), where the MSBs are padded with zeros? E.g.:
int mask = -2147483648 >> 1; // C++ code
where -2147483648 =
10000000 00000000 00000000 00000000
and mask is equal to 1073741824
where 1073741824 =
01000000 00000000 00000000 00000000
Now if you write the same code in F# (or C#), it will indeed pad the MSBs with ones and you'll get -1073741824.
where -1073741824 =
11000000 00000000 00000000 00000000
The signed shift has the nice property that shifting x right by n corresponds to floor(x / 2^n).
On .NET, there are CIL opcodes for both types of operations (shr to do a signed shift and shr.un to do an unsigned shift). F# and C# choose which opcode to use based on the signedness of the type which is being shifted. This means that if you want the other behavior, you just need to perform a numeric conversion before and after shifting (which actually has no runtime impact due to how numbers are stored on the CLR - an int32 on the stack is indistinguishable from a uint32).
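For example, in C# (the same trick works in F# with int/uint conversions):

using System;

int x = int.MinValue;                        // 10000000 00000000 00000000 00000000
Console.WriteLine(x >> 1);                   // -1073741824: signed shift (shr), sign bit copied in
Console.WriteLine(unchecked((uint)x) >> 1);  //  1073741824: unsigned shift (shr.un), zero padded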
To answer the reformed question (in the comments):
The C and C++ standards do not define the result of right-shifting a negative value (it's either implementation-defined, or undefined, I can't remember which).
This is because the standard was defined to reflect the lowest common denominator of underlying instruction sets. Enforcing a true arithmetic shift, for instance, takes several instructions if the instruction set doesn't contain an asr primitive. This is further complicated by the fact that the standard permits either a one's- or two's-complement representation.