C# binary to string - c#

I am converting integer number to binary and put it in a header of a data message.
For instance the first meesage that arrived, I would convert the counter to binary that take 4 bytes and had the data message, which is a reguler message containning a, b, c etc'.
Here is how I convert the counter :
//This is preparing the counter as binaryint
nCtrIn = ...;
int nCtrInNetwork = System.Net.IPAddress.HostToNetworkOrder(nCtrIn);
byte[] ArraybyteFormat = BitConverter.GetBytes(nCtrInNetwork);
Now the problem is that now in to take a nother string copy the byteFormat to the beginning of a string and in addition to add the string data.
I do that because I want only in the end to write to the file using binary writer
m_brWriter.Write(ArraybyteFormat);
m_brWriter.Flush();

You can simplify by letting the BinaryWriter directly write the int - no need to conver to byte[] first.
The other problem is writing the message, it can be done like:
m_brWriter.Write(nCounterIn);
string msg = ....; // get it as 1 big string
byte[] textData = Encoding.UTF8.GetBytes(msg);
m_brWriter.Write(textData);
Or, even easier and also easier to read back:
m_brWriter.Write(nCounterIn);
m_brWriter.Write(msg);
But note that the BinaryWriter will now put it's own lentgh-prefix in front of the string.

If it's important to write to the stream in single call you can concatenate the arrays:
var intArray = new byte[4]; // In real code assign
var stringArray = new byte[12]; // actual arrays
var concatenated = new byte[16];
intArray.CopyTo(concatenated, 0);
stringArray.CopyTo(concatenated, 4);
m_brWriter.Write(concatenated);
m_brWriter.Flush();
Did you consider writing the arrays in two calls to Write?

Related

How to compare effectively two SHA512Managed hash values

I use SHA512Managed class for coding user password string. I initually create etalon string coded in the folowing way:
Convert password string (for example "Johnson_#1") to byte array;
Get hash value of this byte array using SHA512Managed.ComputeHash
method. As you know, hash value gotten from SHA512Managed.ComputeHash(byte[])
method is byte array too.
Then (in program loop) I convert this hash byte array to string in the following way:
System.Text.StringBuilder sBuilder = new System.Text.StringBuilder();
for (int i = 0; i < passwordСache.Length; i++)
{
sBuilder.Append(passwordСache[i].ToString("x2"));
}
string passwordCacheString = sBuilder.ToString();
where the passwordСache is hash byte array and passwordCacheString is result string.
Finally, I store result string in MS SQL Server database table as etalon string.
The matter is in the folowing: If I periodically call SHA512Managed.ComputeHash(byte[]) method and each time pass to it the same byte array as input parameter (for example obtained from "Johnson_#1" string), then the content of returned hash byte array will differs from time to time.
So, if I convert such hash byte array to string (as I showed above) and compare this string to etalon string that is in database table, then the content of this string will differ from content of etalon string though the same string ("Johnson_#1") underlies.
Better defined the question
My question is: Is there a way of determining that two compared SHA512Managed hash byte arrays with different content were created on the base of the same string? Yuor help will be appreciated highly.
As xanatos mentioned in his comments, hash functions must be deterministic.
That is for the same input, you'll get the same hash output.
Try it for yourself:
SHA512Managed sha512Managed = new SHA512Managed();
for (int i = 0; i < 1000; i++) {
var input = Guid.NewGuid().ToString();
byte[] data = sha512Managed.ComputeHash(Encoding.UTF8.GetBytes(input));
byte[] data2 = sha512Managed.ComputeHash(Encoding.UTF8.GetBytes(input));
if (Encoding.UTF8.GetString(data) != Encoding.UTF8.GetString(data2)) {
throw new InvalidOperationException("Hash functions as we know them are useless");
}
}

Clean alternatives to passing index by reference

In my code I am parsing an array of bytes. To sequentially parse the bytes I am currently passing around the index like so:
headerData = ParseHeader(bytes, ref index)
middleData = ParseMiddle(bytes, ref index)
tailData = ParseTail (bytes, ref index)
Without hardcoding the amount to increment the header, is there a way to achieve similar functionality without having to pass the index by reference? Is this one of the rare cases that using the ref keyword is the best solution?
Like SLaks said, a possible solution is to use a stream.
To create a stream from your current bytes array you can do the following:
MemoryStream stream = new MemoryStream(bytes);
To read the current byte of the stream you can use the ReadByte method, shown below:
byte b = stream.ReadByte();
The stream will keep track of the current index in the array, so your new code would look like:
MemoryStream stream = new MemoryStream(bytes);
headerData = ParseHeader(stream)
middleData = ParseMiddle(stream)
tailData = ParseTail (stream)
Check out the documentation to see other methods that are available for MemoryStream.

How to write numbers to a file and make them readable between Java and C#

I'm into a "compatibility" issue between two versions of the same program, the first one written in Java, the second it's a port in C#.
My goal is to write some data to a file (for example, in Java), like a sequence of numbers, then to have the ability to read it in C#. Obviously, the operation should work in the reversed order.
For example, I want to write 3 numbers in sequence, represented with the following schema:
first number as one 'byte' (4 bit)
second number as one 'integer' (32 bit)
third number as one 'integer' (32 bit)
So, I can put on a new file the following sequence: 2 (as byte), 120 (as int32), 180 (as int32)
In Java, the writing procedure is more or less this one:
FileOutputStream outputStream;
byte[] byteToWrite;
// ... initialization....
// first byte
outputStream.write(first_byte);
// integers
byteToWrite = ByteBuffer.allocate(4).putInt(first_integer).array();
outputStream.write(byteToWrite);
byteToWrite = ByteBuffer.allocate(4).putInt(second_integer).array();
outputStream.write(byteToWrite);
outputStream.close();
While the reading part it's the following:
FileInputStream inputStream;
ByteBuffer byteToRead;
// ... initialization....
// first byte
first_byte = inputStream.read();
// integers
byteToRead = ByteBuffer.allocate(4);
inputStream.read(byteToRead.array());
first_integer = byteToRead.getInt();
byteToRead = ByteBuffer.allocate(4);
inputStream.read(byteToRead.array());
second_integer = byteToRead.getInt();
inputStream.close();
C# code is the following. Writing:
FileStream fs;
byte[] byteToWrite;
// ... initialization....
// first byte
byteToWrite = new byte[1];
byteToWrite[0] = first_byte;
fs.Write(byteToWrite, 0, byteToWrite.Length);
// integers
byteToWrite = BitConverter.GetBytes(first_integer);
fs.Write(byteToWrite, 0, byteToWrite.Length);
byteToWrite = BitConverter.GetBytes(second_integer);
fs.Write(byteToWrite, 0, byteToWrite.Length);
Reading:
FileStream fs;
byte[] byteToWrite;
// ... initialization....
// first byte
byte[] firstByteBuff = new byte[1];
fs.Read(firstByteBuff, 0, firstByteBuff.Length);
first_byte = firstByteBuff[0];
// integers
byteToRead = new byte[4 * 2];
fs.Read(byteToRead, 0, byteToRead.Length);
first_integer = BitConverter.ToInt32(byteToRead, 0);
second_integer = BitConverter.ToInt32(byteToRead, 4);
Please note that both the procedures works when the same Java/C# version of the program writes and reads the file. The problem is when I try to read a file written by the Java program from the C# version and viceversa. Readed integers are always "strange" numbers (like -1451020...).
There's surely a compatibility issue regarding the way Java stores and reads 32bit integer values (always signed, right?), in contrast to C#. How to handle this?
It's just an endian-ness issue. You can use my MiscUtil library to read big-endian data from .NET.
However, I would strongly advise a simpler approach to both your Java and your .NET:
In Java, use DataInputStream and DataOutputStream. There's no need to get complicated with ByteBuffer etc.
In .NET, use EndianBinaryReader from MiscUtil, which extends BinaryReader (and likewise EndianBinaryWriter for BinaryWriter)
Alternatively, consider just using text instead.
I'd consider using a standard format like XML or JSON to store your data. Then you can use standard serializers in both Java and C# to read/write the file. This sort of approach lets you easily name the data fields, read it from many languages, be easily understandable if someone opens the file in a text editor, and more easily add data to be serialized.
E.g. you can read/write JSON with Gson in Java and Json.NET in C#. The class might look like this in C#:
public class MyData
{
public byte FirstValue { get; set; }
public int SecondValue { get; set; }
public int ThirdValue { get; set; }
}
// serialize to string example
var myData = new MyData { FirstValue = 2, SecondValue = 5, ThirdValue = -1 };
string serialized = JsonConvert.SerializeObject(myData);
It would serialize to
{"FirstValue":2,"SecondValue":5,"ThirdValue":-1}
The Java would, similarly, be quite simple. You can find examples of how to read/write files in each library.
Or if an array would be a better model for your data:
string serialized = JsonConvert.SerializeObject(new[] { 2, 5, -1 }); // [2,5,-1]

How can I convert a byte array to a string array?

Just to clarify something first. I am not trying to convert a byte array to a single string. I am trying to convert a byte-array to a string-array.
I am fetching some data from the clipboard using the GetClipboardData API, and then I'm copying the data from the memory as a byte array. When you're copying multiple files (hence a CF_HDROP clipboard format), I want to convert this byte array into a string array of the files copied.
Here's my code so far.
//Get pointer to clipboard data in the selected format
var clipboardDataPointer = GetClipboardData(format);
//Do a bunch of crap necessary to copy the data from the memory
//the above pointer points at to a place we can access it.
var length = GlobalSize(clipboardDataPointer);
var #lock = GlobalLock(clipboardDataPointer);
//Init a buffer which will contain the clipboard data
var buffer = new byte[(int)length];
//Copy clipboard data to buffer
Marshal.Copy(#lock, buffer, 0, (int)length);
GlobalUnlock(clipboardDataPointer);
snapshot.InsertData(format, buffer);
Now, here's my code for reading the buffer data afterwards.
var formatter = new BinaryFormatter();
using (var serializedData = new MemoryStream(buffer))
{
paths = (string[]) formatter.Deserialize(serializedData);
}
This won't work, and it'll crash with an exception saying that the stream doesn't contain a binary header. I suppose this is because it doesn't know which type to deserialize into.
I've tried looking the Marshal class through. Nothing seems of any relevance.
If the data came through the Win32 API then a string array will just be a sequence of null-terminated strings with a double-null-terminator at the end. (Note that the strings will be UTF-16, so two bytes per character). You'll basically need to pull the strings out one at a time into an array.
The method you're looking for here is Marshal.PtrToStringUni, which you should use instead of Marshal.Copy since it works on an IntPtr. It will extract a string, up to the first null character, from your IntPtr and copy it to a string.
The idea would be to continually extract a single string, then advance the IntPtr past the null byte to the start of the next string, until you run out of buffer. I have not tested this, and it could probably be improved (in particular I think there's a smarter way to detect the end of the buffer) but the basic idea would be:
var myptr = GetClipboardData(format);
var length = GlobalSize(myptr);
var result = new List<string>();
var pos = 0;
while ( pos < length )
{
var str = Marshal.PtrToStringUni(myptr);
var count = Encoding.Unicode.GetByteCount(str);
myptr = IntPtr.Add(myptr, count + 1);
pos += count + 1;
result.Add(str);
}
return result.ToArray();
(By the way: the reason your deserialization doesn't work is because serializing a string[] doesn't just write out the characters as bytes; it writes out the structure of a string array, including additional internal bits that .NET uses like the lengths, and a binary header with type information. What you're getting back from the clipboard has none of that present, so it cannot be deserialized.)
How about this:
var strings = Encoding.Unicode
.GetString(buffer)
.Split(new[] { '\0' }, StringSplitOptions.RemoveEmptyEntries);

C# perform string operation on UTF-16 byte array

I'm reading a file into byte[] buffer. The file contains a lot of UTF-16 strings (millions) in the following format:
The first byte contain and string length in chars (range 0 .. 255)
The following bytes contains the string characters in UTF-16 encoding (each char represented by 2 bytes, means byteCount = charCount * 2).
I need to perform standard string operations for all strings in the file, for example: IndexOf, EndsWith and StartsWith, with StringComparison.OrdinalIgnoreCase and StringComparison.Ordinal.
For now my code first converting each string from byte array to System.String type. I found the following code to be the most efficient to do so:
// position/length validation removed to minimize the code
string result;
byte charLength = _buffer[_bufferI++];
int byteLength = charLength * 2;
fixed (byte* pBuffer = &_buffer[_bufferI])
{
result = new string((char*)pBuffer, 0, charLength);
}
_bufferI += byteLength;
return result;
Still, new string(char*, int, int) it's very slow because it performing unnecessary copying for each string.
Profiler says its System.String.wstrcpy(char*,char*,int32) performing slow.
I need a way to perform string operations without copying bytes for each string.
Is there a way to perform string operations on byte array directly?
Is there a way to create new string without copying its bytes?
No, you can't create a string without copying the character data.
The String object stores the meta data for the string (Length, et.c.) in the same memory area as the character data, so you can't keep the character data in the byte array and pretend that it's a String object.
You could try other ways of constructing the string from the byte data, and see if any of them has less overhead, like Encoding.UTF16.GetString.
If you are using a pointer, you could try to get multiple strings at a time, so that you don't have to fix the buffer for each string.
You could read the File using a StreamReader using Encoding.UTF16 so you do not have the "byte overhead" in between:
using (StreamReader sr = new StreamReader(filename, Encoding.UTF16))
{
string line;
while ((line = sr.ReadLine()) != null)
{
//Your Code
}
}
You could create extension methods on byte arrays to handle most of those string operations directly on the byte array and avoid the cost of converting. Not sure what all string operations you perform, so not sure if all of them could be accomplished this way.

Categories

Resources