Python to C#: how to format data for sockets? - c#

I am translating a Python communication library into C#, and am having trouble interpreting how the string gets formatted before being sent over TCP.
The relevant portion of the code is as follows:
struct.pack(
    '!HHBH' + str(var_name_len) + 's',
    self.msg_id,
    req_len,
    flag,
    var_name_len,
    self.varname
)
Then it gets sent with: sendall()
I have looked at the Python documentation (https://docs.python.org/2/library/struct.html) but am still drawing a blank regarding the first line: '!HHBH'+str(var_name_len)+'s'. I understand this is where the formatting is set, but what it is being formatted to is beyond me.
The python code that I am translating can be found at the following link:
https://github.com/linuxsand/py_openshowvar/blob/master/py_openshowvar.py
Any python and C# vigilantes out there that can help me build this bridge?
Edit: Based on jas' answer, I have written the following C# struct:
public struct messageFormat
{
    ushort messageId;
    ushort reqLength;
    char functionType;
    ushort varLengthHex;
    string varname;
    ...
Once I populate it, I will need to send it over TCP. I have an open socket, but need to convert it to a byte[], I assume, so I can use socket.Send(byte[])?
Thanks

What's being formatted are the five arguments following the format string. Each argument has a corresponding element in the format string.
For the sake of the explanation, let's assume that var_name_len has the value 12 (presumably because var_name is a string of length 12 in this hypothetical case).
So the format string will be
!HHBH12s
Breaking that down according to the docs:
! Big-endian byte ordering will be used
H self.msg_id will be packed as a two-byte unsigned short
H req_len will be packed as above
B flag will be packed as a one-byte unsigned char
H var_name_len will be packed as a two-byte unsigned short
12s self.varname will be packed as a 12-byte string
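Not part of the original answer, but here is a minimal C# sketch of the same packing, under the assumption that varname is ASCII (the class, method, and parameter names are mine). BitConverter follows the machine's byte order, which is usually little-endian, so the sketch writes the big-endian bytes by hand:

using System;
using System.Text;

static class OpenShowVarPacker
{
    // Pack the same fields as struct.pack('!HHBH' + str(var_name_len) + 's', ...):
    // two ushorts, one byte, one ushort, then the raw string bytes, all big-endian.
    public static byte[] PackMessage(ushort messageId, ushort reqLength, byte flag, string varname)
    {
        byte[] nameBytes = Encoding.ASCII.GetBytes(varname); // 's' packs raw bytes, no terminator
        ushort varNameLen = (ushort)nameBytes.Length;

        byte[] buffer = new byte[2 + 2 + 1 + 2 + nameBytes.Length];
        int i = 0;

        // '!' selects network (big-endian) byte order, so write the high byte first.
        buffer[i++] = (byte)(messageId >> 8);
        buffer[i++] = (byte)messageId;
        buffer[i++] = (byte)(reqLength >> 8);
        buffer[i++] = (byte)reqLength;
        buffer[i++] = flag;                                   // 'B': a single unsigned byte
        buffer[i++] = (byte)(varNameLen >> 8);
        buffer[i++] = (byte)varNameLen;
        Array.Copy(nameBytes, 0, buffer, i, nameBytes.Length); // e.g. '12s': 12 raw bytes

        return buffer;
    }
}

You can then hand the result straight to your open socket, e.g. socket.Send(OpenShowVarPacker.PackMessage(...)), which plays the role of Python's sendall(). Strictly speaking, Socket.Send may send fewer bytes than requested; looping until everything is sent (or writing through a NetworkStream) is the exact equivalent, though for a small message like this it rarely matters.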

Related

C++ equivalent to C# Encoding.ASCII.GetString()

I have a UDP socket in C# that sends a message using the following code
newsock.Send(sendBuffer, sendBuffer.Length, sender);
where sendBuffer is a byte[] data type.
How can I convert this message when I receive it in a UDP socket written in C++?
The C++ code is the following:
recvfrom(socket_desc, client_message, sizeof(client_message), 0, (struct sockaddr*)&client_addr, &client_struct_length);
You already have the raw bytes in client_message, and recvfrom() will return the message's byte length. If the data is truly ASCII, then just use the data as-is as a char string; there is no need to convert it.
But if you need a C++ std::string, it has a constructor that takes a char* and a length as input parameters.
On the other hand, if you really need a Unicode string (i.e. what Encoding.ASCII.GetString() returns), you can use the Win32 MultiByteToWideChar() function, or equivalent, specifying ASCII as the charset to convert from.
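For context, here is a minimal sketch of the kind of C# sender that produces such a buffer (the endpoint and message are made up, and I'm assuming newsock is a UdpClient, which is what the Send overload in the question suggests). Note that Encoding.ASCII.GetBytes adds no terminating '\0', which is why the length returned by recvfrom() matters on the C++ side:

using System.Net;
using System.Net.Sockets;
using System.Text;

// Hypothetical sender: if sendBuffer is built like this, the bytes arriving in
// the C++ recvfrom() call are already a plain ASCII char sequence.
var sender = new IPEndPoint(IPAddress.Parse("192.168.1.10"), 5000); // made-up endpoint
var newsock = new UdpClient();
byte[] sendBuffer = Encoding.ASCII.GetBytes("hello from C#");
newsock.Send(sendBuffer, sendBuffer.Length, sender);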

python astype and reshape to c# conversion

I'm creating a program in C# that involves reshaping and converting byte arrays, but I can't find any solution for the following line of code, which is written in Python:
mybytearray.astype('uint16').reshape(h, l, w)
Can someone help me with this conversion? It is a crucial piece of code for my program to work.
Kind regards,
QuickScoP3s
astype in Python does type conversion. To convert your byte array into an unsigned 16-bit integer in C#, try this:
ushort myVal = BitConverter.ToUInt16(mybytearray, 0);
This reads the first two bytes from your array and converts them into an unsigned 16-bit integer. myVal will contain the value.
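If you need the whole buffer as 16-bit values (what astype('uint16') does) laid out in three dimensions (what reshape(h, l, w) does), a minimal sketch might look like the following. The class and method names are mine, and it assumes the byte order in the array matches the machine's endianness, which is what BitConverter.ToUInt16 uses:

using System;

static class ReshapeSketch
{
    // Convert a byte[] to 16-bit values and lay them out as an h x l x w block.
    // Assumes mybytearray holds at least h * l * w * 2 bytes.
    public static ushort[,,] ToUInt16Volume(byte[] mybytearray, int h, int l, int w)
    {
        var result = new ushort[h, l, w];
        int offset = 0;
        for (int i = 0; i < h; i++)
            for (int j = 0; j < l; j++)
                for (int k = 0; k < w; k++)
                {
                    result[i, j, k] = BitConverter.ToUInt16(mybytearray, offset);
                    offset += 2; // each uint16 consumes two bytes
                }
        return result;
    }
}

After that, result[i, j, k] corresponds to the reshaped Python array indexed the same way.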

byte[] of Android after conversion from Base64 string differs from C# Base64 equivalent

I am using the following code in C#:
var Bytes = Convert.FromBase64String(mystring);
and in the byte[] I get the following values: [3,221,235,121,20,212]
but when I run the same conversion in Android using:
byte[] secretKeyByteArray = Base64.decode(mystring.getBytes("UTF-8"), Base64.DEFAULT);
I get these values: [3,-35,-21,121,20,-44].
From this, what I understand is that Android converts byte values greater than 200 to negative ones. Any suggestions on how I can get the same byte array in Android as I get in C#, and vice versa? Also, why does this happen? Thanks.
Those values are the same; the difference is signed versus unsigned interpretation. You could print the hex values to see. You should interpret the bytes as unsigned integers on Android too.
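For example, a quick hex dump on the C# side might look like this (a small sketch; the Base64 literal is mine, chosen so that it decodes to the bytes quoted in the question):

using System;

// "A93reRTU" decodes to the bytes from the question: 03-DD-EB-79-14-D4.
string mystring = "A93reRTU";
byte[] bytes = Convert.FromBase64String(mystring);

// 0xDD is 221 as an unsigned byte and -35 as a signed byte, so the C# and
// Android arrays hold exactly the same bits.
Console.WriteLine(BitConverter.ToString(bytes)); // prints 03-DD-EB-79-14-D4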
After doing research I found a better alternative not mentioned in the context of this problem. The problem is that a byte in Android has a range of -128 to 127, while a byte in C# has a range of 0 to 255, so the two types are not equivalent in range. In short, Android interprets any value above 127 as its equivalent signed value, hence the disparity. Since for my problem I wanted both sides to produce the same output, I used the code below in C#:
var Bytes= Convert.FromBase64String(mystring);
sbyte[] secSbytes = Array.ConvertAll(Bytes, b => (sbyte)b);
The sbyte in C# is equivalent to Java's byte, as its range is -128 to 127, and this solved my problem. Using this code, I get [3,-35,-21,121,20,-44] in both Android and C#.

Read string from binary file, different encodings

I'm trying to read a binary file in Java (Android) that was created by a C# program; however, I have stumbled into a problem. C# by default encodes strings in a binary file as UTF-7, while Java uses UTF-8. This of course means that the strings don't get loaded in properly.
I was wondering how to read the string as UTF-7 instead of UTF-8. I also noticed that I have a similar problem with floats. Do C# and Java handle them differently, and if so, how do I read them correctly in Java?
Edit: I'm using the BinaryWriter class in the C# program and the DataInputStream class in Java.
C# uses UTF-8 encoding unless otherwise specified.
EDIT: The documentation here is incorrect.
Looking at the source, BinaryWriter writes the string length as a 7-bit encoded integer, using the following code:
protected void Write7BitEncodedInt(int value) {
    // Write out an int 7 bits at a time. The high bit of the byte,
    // when on, tells reader to continue reading more bytes.
    uint v = (uint) value;   // support negative numbers
    while (v >= 0x80) {
        Write((byte) (v | 0x80));
        v >>= 7;
    }
    Write((byte)v);
}
You will need to port this code to Java in order to find out how many bytes to read.
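For reference, here is a sketch of the decoding side, written in C# (the class and method names are mine); the same loop ports directly to Java using DataInputStream.readUnsignedByte() in place of ReadByte(). Once you have the length, read that many bytes and decode them as UTF-8, which is BinaryWriter's default string encoding:

using System.IO;

static class LengthPrefix
{
    // Mirror of Write7BitEncodedInt: read bytes until one has the high bit clear,
    // accumulating 7 value bits per byte, least-significant group first.
    public static int Read7BitEncodedInt(BinaryReader reader)
    {
        int result = 0;
        int shift = 0;
        while (true)
        {
            byte b = reader.ReadByte();
            result |= (b & 0x7F) << shift;   // low 7 bits carry the value
            if ((b & 0x80) == 0)             // high bit clear: this was the last byte
                return result;
            shift += 7;
        }
    }
}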

C++ read void* data from a binary file as utf8 in Visual Studio 2010

I'm not very fluent in C++, to tell the truth.
I have some binary data in memory under a type void* (which means, I think, a pointer to some unrepresentable something/nothing). The data is first read from the file by fread.
int readfile(FILE *file, void **data_return) {
//some code...
fread((void *)data, length, 1, file);
//some code...
}
There's a complex algorithm behind reading the binary data, but I think I don't need to understand it for this task.
char *t = ((char *)loc->mo_data) + string_offset;
return t;
This code reads the void* data (loc->mo_data) as a string. Still understandable for me, I guess.
The problem is that this data contains russian, spanish, czech and all sort of international characters representable in UTF8.
I'm not even sure what encoding the "char" data represents (probably win1250), because the strings returned are just bad. The function returns the raw UTF-8 bytes of Организация rendered as individual single-byte characters (mojibake) instead of Организация itself; that is, the first string is the UTF-8 data, but displayed as if it were a single-byte encoding.
The bigger picture: I'm playing with a C++ library which has already been written by someone else - the library exposes just two functions: open a file (returns a pointer) and get a string from that file by a string key (returns a string). This library is being used in a C# project.
At first, I thought that there might be something wrong with passing UTF-8 strings between a C# app and the DLL library:
[DllImport("MoReader.dll", CallingConvention = CallingConvention.Cdecl)]
public static extern IntPtr OpenFile(string path);
[DllImport("MoReader.dll", CallingConvention = CallingConvention.Cdecl)]
public static extern string FindString(IntPtr filePointer, string key);
C++ code:
extern "C" __declspec(dllexport) BinaryFileType* OpenFile(char *filePath);
extern "C" __declspec(dllexport) char *FindString(BinaryFileType *locText, char *key);
FindString returns the string, but in the wrong encoding. And I don't know how one could re-read, as UTF-8, data that ends up in C# strings (which are Unicode)... I tried some conversion methods, but to no avail.
Though I think the problem is in the C++ code (I'd love the char data to be in the UTF-8 encoding), I've noticed there's something called a code page, and there are some conversion functions and UTF-8 stream readers, but because of my weak C++ knowledge I don't really know the solution.
=== UPDATE ===
I've found a property in the Encoding class...
When I read the output string like this:
Encoding.UTF8.GetString(Encoding.Default.GetBytes(e))
...the result is right. I'm just getting the bytes from the string via the "Default" encoding and then reading those bytes again as UTF-8. The Default encoding here on my computer is ISO-8859-2, but it would be just plain stupid to rely on this property.
So... the question remains. I still need to know how to read the void* data with a particular encoding. But at least I now know that the string is being returned in the default encoding used by Windows.
=== ANSWER ===
Thanks everyone for answers.
As James pointed out, char* data is just numbers. So I avoided all encoding troubles by just getting the numbers and not trying to interpret them as a string at all.
There was another problem: how to get an array of bytes in C# out of a char* from the C++ library? There is a Marshal.Copy method, but I would need to know the size of the string.
Every C string ends with a null character '\0'.
So in the end, I just read byte after byte until I find this null character. The code then looks like this:
IntPtr charPointer = ExternDll.FindString(fileIntPtr, "someString");
List<byte> bytes = new List<byte>();
for (int i = 0; ; i++)
{
    byte b = Marshal.ReadByte(charPointer, i);
    if (b == '\0')
        break;
    bytes.Add(b);
}
string theResultStringInTheUTF8 = Encoding.UTF8.GetString(bytes.ToArray());
C++ is agnostic about character encoding. For that matter, if you're getting the characters through some sort of hacky type conversions, any language will be; there's no way for the language to know what the encoding is.
In C++, char is really just a small integer; it's only by convention that it contains some character encoding. But which encoding depends on you. If your input is really UTF-8, then the chars pointed to by a char* will contain UTF-8; if it's something else, then they'll contain something else.
When you output the chars to the screen, C++ just passes them on (at least by default). It's up to the terminal window to decide how to interpret them; i.e. break the sequence into code points, then map each code point to a graphic image. Under Unix (xterm), this is defined by the display font; under Windows, formally at least, by the code page (but you can certainly install miscoded fonts which will screw it up). C++ has nothing to do with this. The code page for UTF-8 is 65001; if you set the terminal to use this code page (chcp 65001 on the command line), then output UTF-8, it should work.
.NET can automatically marshal only OEM/ANSI and Unicode/UTF-16 strings. It can't do it for UTF-8, which is where you are going wrong.
You have to manually convert strings from/to UTF-8 with System.Text.Encoding.UTF8:
string decodedString = Encoding.UTF8.GetString(encodedBytes);
and pass them to C++ as binary data. Don't forget to append the terminating '\0'.
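Putting that together with the byte-by-byte reading shown in the ANSWER section above, one possible shape of the C# side is the following sketch. It is not the library's official interface: redeclaring FindString to take a byte[] key (so the marshaller passes raw UTF-8 bytes instead of an ANSI string) is an assumption on my part, and the wrapper name is mine.

using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;
using System.Text;

static class Utf8Interop
{
    // Re-declared so the key is passed as raw bytes rather than marshalled as ANSI;
    // the DLL name and export match the question, the rest is illustrative.
    [DllImport("MoReader.dll", CallingConvention = CallingConvention.Cdecl)]
    private static extern IntPtr FindString(IntPtr filePointer, byte[] key);

    public static string FindStringUtf8(IntPtr filePointer, string key)
    {
        // Encode the key as UTF-8 and append the terminating '\0' the C side expects.
        byte[] keyBytes = Encoding.UTF8.GetBytes(key + "\0");
        IntPtr result = FindString(filePointer, keyBytes);

        // Read the returned char* byte by byte up to its '\0', then decode as UTF-8.
        var bytes = new List<byte>();
        for (int i = 0; ; i++)
        {
            byte b = Marshal.ReadByte(result, i);
            if (b == 0)
                break;
            bytes.Add(b);
        }
        return Encoding.UTF8.GetString(bytes.ToArray());
    }
}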
