Binary to ASCII and back again - C#

I'm trying to interface with a hardware device via the serial port. When I use software like Portmon to see the messages they look like this:
42 21 21 21 21 41 45 21 26 21 29 21 26 59 5F 41 30 21 2B 21 27
42 21 21 21 21 41 47 21 27 21 28 21 27 59 5D 41 32 21 2A 21 28
When I run them through a hex-to-ASCII converter the commands don't make sense. Are these messages in fact something other than hex? My hope was to see the messages the device is passing and emulate them using C#. What can I do to find out exactly what the messages are?

Does the hardware device specify a protocol? Just because it's a serial port connection doesn't mean the traffic has to be ASCII/readable English text. It could just as well be a sequence of bytes where, for example, 42 is a command and 21 21 21 21 is the data for that command. It could be an initialization sequence or anything else.
At the end of the day, all you are working with is a series of bytes. Their meaning can be found in a protocol specification; if you don't have one, you need to look at each command manually. Issue a command to the device, capture the response, issue another command.
Look for patterns. Common Initialization? What could be the commands? What data gets passed?
Yes, it's tedious, but reverse engineering is rarely easy.
The ASCII for the Hex is this:
B!!!!AE!&!)!&Y_A0!+!'
B!!!!AG!'!(!'Y]A2!*!(
That does look like some sort of protocol to me, with some Initialization Sequence (B!!!!) and commands (AE and AG), but that's just guessing.
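If you want to reproduce that conversion in C#, here is a minimal sketch, assuming the capture is a space-separated hex string like the ones above:

using System;
using System.Linq;
using System.Text;

// Decode the first captured line from the question back to ASCII.
string dump = "42 21 21 21 21 41 45 21 26 21 29 21 26 59 5F 41 30 21 2B 21 27";
byte[] bytes = dump.Split(' ').Select(h => Convert.ToByte(h, 16)).ToArray();
Console.WriteLine(Encoding.ASCII.GetString(bytes)); // prints B!!!!AE!&!)!&Y_A0!+!'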

The device is sending data to the computer. All digital data has the form of ones and zeroes, such as 10101001010110010... Most often one combines groups of eight such bits (binary digits) into bytes, so all data consists of bytes. One byte can thus represent any of the 2^8 = 256 values from 0 to 255, or, in hexadecimal notation, any of the numbers 0x00 to 0xFF.
Sometimes the bytes represent a string of alphanumerical (and other) characters, often ASCII encoded. This data format assigns a character to each value from 0 to 127. But not all data is ASCII-encoded characters.
For instance, if the device is a light-intensity sensor, then each byte could give the light intensity as a number between 0 (pitch-black) and 255 (as bright as it gets). Or the data could be a bitmap image. Then the data would start with a couple of well-defined structures (the bitmap file header and the bitmap info header) specifying the colour depth (the number of bits per pixel, i.e. more or less the number of colours), the width, the height, and the compression of the bitmap. Then the pixel data would begin. Typically the bytes would go BBGGRRBBGGRRBBGGRR, where the first BB is the blue intensity of the first pixel, the first GG is the green intensity of the first pixel, the first RR is the red intensity of the first pixel, the second BB is the blue intensity of the second pixel, and so on.
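Purely as an illustration of that layout, a sketch that walks 24-bit BGR pixel data (the byte values here are made up):

using System;

byte[] data = { 255, 0, 0, 0, 255, 0 }; // two made-up pixels: pure blue, pure green
for (int i = 0; i + 2 < data.Length; i += 3)
{
    byte blue  = data[i];     // BB
    byte green = data[i + 1]; // GG
    byte red   = data[i + 2]; // RR
    Console.WriteLine($"pixel: R={red} G={green} B={blue}");
}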
In fact the data could mean anything. What kind of device is it? Does it have an open specification?

How to determine UUDecoding method needed?

I'm communicating with a device that returns uuencoded data:
ASCII: EZQAEgETAhMQIBwIAUkAAABj
HEX: 45-5A-51-41-45-67-45-54-41-68-4D-51-49-42-77-49-41-55-6B-41-41-41-42-6A
The documentation for this device states the above is uuencoded but I can't figure out how to decode it. The final result won't be a human readable string but the first byte reveals the number of bytes for the following product data. (Which would be 23 or 24?)
I've tried using Crypt2 to decode it; it doesn't seem to match 644, 666, 744 modes.
I've tried writing it out by hand following the Wikipedia article: https://en.wikipedia.org/wiki/Uuencoding#Formatting_mechanism
It doesn't make sense! How do I decode this uuencoded data?
I agree with @canton7 that it looks like it's base64 encoded. You can decode it like this:
byte[] decoded = Convert.FromBase64String("EZQAEgETAhMQIBwIAUkAAABj");
and if you want, you can print the hex values like this:
Console.WriteLine(BitConverter.ToString(decoded));
which prints
11-94-00-12-01-13-02-13-10-20-1C-08-01-49-00-00-00-63
As @HansKilian says in the comments, this is not uuencoded.
If you base64-decode it you get (in hex):
11 94 00 12 01 13 02 13 10 20 1c 08 01 49 00 00 00 63
The first number, 0x11 (17 in decimal), is the same as the number of bytes following it, which matches:
The final result won't be a human readable string but the first byte reveals the number of bytes for the following product data.
(@HansKilian made the original call that it was base64-encoded. This answer provides confirmation of that by looking at the first decoded byte, but please accept his answer.)
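For completeness, a quick sketch of that length check in C#:

using System;

byte[] decoded = Convert.FromBase64String("EZQAEgETAhMQIBwIAUkAAABj");
Console.WriteLine(decoded[0]);         // 17 - the length prefix
Console.WriteLine(decoded.Length - 1); // 17 bytes of product data follow it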

C# How to Encode a string to minimize bytes

Is it possible to encode a string in a certain way to minimize the number of bytes? Basically I need to get 29 characters down to 11 bytes of data.
var myString = "usmiaanzaklaacn40879005900133";
byte[] bytes = Encoding.UTF8.GetBytes(myString);
Console.WriteLine(bytes.Length); //Output = 29, 1 byte per character
Console.ReadKey();
This shows that encoding with UTF-8 turns a 29-character string into 29 bytes... I need the 29-character string to result in 11 bytes or less. Is this possible? I was thinking I could possibly have some sort of lookup or binary mapping algorithm, but I am a little unsure how to go about this in C#.
EDIT:
So I have a chip that has a custom data payload of 11 bytes. I want to be able to compress a 29-character string (that is unique) into bytes, assign it to the "custom data", and then receive the custom data bytes and decompress them back to the 29-character string... I don't know if this is possible, but any help would be greatly appreciated. Thanks :)
The string itself: [usmia]-[anzakl]-[aacn40879005900]-[133] = [origin]-[dest]-[random/unique]-[weight]
OK, the last 14 characters are digits.
I have access to all the origins and destinations... would it be feasible to create a key-value store with the key as the origin (e.g. usmia) and the value a particular byte? I guess that would mean I could only have 256 different origins and destinations, and then just make the last 14 characters an integer?
15 lg(26) + 14 lg(10) ~= 117 bits ~= 14.6 bytes. (lg = log base 2)
So even if I was optimistic and assumed that your strings were always 15 lower-case letters followed by 14 digits, it would still take a minimum of 15 bytes to represent.
Unless there are more restrictions, like only the lower case letters a, c, i, k, l, m, n, s, u, and z are allowed, then no, you can't code that into 11 bytes. Whoops, wait, not even then. Even that would take a little over 12 bytes.
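The lookup-table idea from the edit does change the math, though: if the 5-letter origin and 6-letter destination each map to a single byte, only 4 letters (26^4 < 2^24) and 14 digits (10^14 < 2^48) remain, about 65 bits, which does fit. A sketch under those assumptions (the fixed origin(5)+dest(6)+letters(4)+digits(14) layout and the dictionary contents are taken from the question, not guaranteed):

using System;
using System.Collections.Generic;

static class PayloadCodec
{
    // Pack origin(5 letters) + dest(6 letters) + 4 letters + 14 digits
    // into exactly 11 bytes: 1 + 1 + 3 + 6.
    public static byte[] Pack(string s,
                              Dictionary<string, byte> origins,
                              Dictionary<string, byte> dests)
    {
        var result = new byte[11];
        result[0] = origins[s.Substring(0, 5)];  // e.g. "usmia"
        result[1] = dests[s.Substring(5, 6)];    // e.g. "anzakl"

        uint letters = 0;                        // base-26: 26^4 = 456,976 < 2^24
        foreach (char c in s.Substring(11, 4))   // e.g. "aacn"
            letters = letters * 26 + (uint)(c - 'a');
        result[2] = (byte)(letters >> 16);
        result[3] = (byte)(letters >> 8);
        result[4] = (byte)letters;

        ulong digits = ulong.Parse(s.Substring(15, 14)); // 10^14 < 2^48, 6 bytes
        for (int i = 0; i < 6; i++)                      // big-endian
            result[10 - i] = (byte)(digits >> (8 * i));
        return result;
    }
}

Unpacking is the same steps in reverse: map the two index bytes back to their strings, decode the 3 bytes as a base-26 number, and the 6 bytes as a decimal number. Note this only works because the origin and destination sets are capped at 256 entries each; the general entropy argument above still holds.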

Read Fortran binary file into C# without knowledge of Fortran source code?

Part one of my question: is this even possible? I will briefly describe my situation first.
My work has a licence for a piece of software that performs a very specific task, but most of our time is spent exporting data from the results into Excel etc. to perform further analysis. I was wondering if it is possible to dump all of the data into a C# object so that I can then write my own analysis code, which would save us a lot of time.
The software we licence was written in Fortran, but we have no access to the source code. The file looks like it is written out in binary, but I do not know if it is unformatted/sequential etc. (is there any way to discern this?).
I have used some of the other answers on this site to successfully read the data into a byte[], but this is as far as I have got. I have tried converting portions to doubles (which I assume most of the data is), but the numbers do not strike me as meaningful (most appear too large or too small).
I have the documentation for the software and I can see that most of the internal variable names are 8 character strings, would this be saved with the data? If not I think it would be almost impossible to match all the data to its corresponding variable. I imagine most of the data will be double arrays of the same length (the number of time points), however there will also be some arrays with a longer length as some data would have been interpolated where shorter time steps were needed for convergence.
Any tips or hints would be appreciated, or even if someone tells me its just not possible so I don't waste any more time trying to solve this.
Thank you.
If it was formatted, you should be able to read it with a text editor: The numbers are written in plain text.
So yes, it's probably unformatted.
There are different methods still. The file can have a fixed record length, or it might have a variable one.
But it seems to me that the first 4 bytes represent an integer containing the length of that record in bytes. For example, here I've written the numbers 1 to 10, and then 11 to 30 into an unformatted file, and the file looks like this:
40 1 2 3 4 5 6 7 8 9 10 40
80 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 80
(I added the new line) In here, the first 4 bytes represent the number 40, followed by 10 4-byte blocks representing the numbers 1-10, followed by another 40. The next record starts with an 80, and 20 4-byte blocks containing the numbers 11 through 30, followed by another 80.
So that might be a pattern you could try to see. Read the first 4 bytes and convert them to integer, then read that many bytes and convert them to whatever you think it should be (4 byte float, 8 byte float, et cetera), and then check whether the next 4 bytes again represents the number that you read first.
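A minimal C# sketch of that marker check, assuming little-endian 4-byte record markers and that each record holds 8-byte floats (both assumptions you would need to verify against your file):

using System;
using System.Collections.Generic;
using System.IO;

static class FortranReader
{
    public static List<double[]> ReadRecords(string path)
    {
        var records = new List<double[]>();
        using (var reader = new BinaryReader(File.OpenRead(path)))
        {
            while (reader.BaseStream.Position < reader.BaseStream.Length)
            {
                int length = reader.ReadInt32();            // leading record marker
                byte[] payload = reader.ReadBytes(length);  // record body
                int trailer = reader.ReadInt32();           // trailing record marker
                if (trailer != length)
                    throw new InvalidDataException("Markers disagree - probably not sequential unformatted.");
                var values = new double[length / 8];        // guess: 8-byte reals
                Buffer.BlockCopy(payload, 0, values, 0, values.Length * 8);
                records.Add(values);
            }
        }
        return records;
    }
}

If the trailing marker never matches, the file probably isn't sequential unformatted, or the marker size or endianness differs from this guess.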
But there are other methods of writing data in Fortran that don't have this behaviour, for example direct access and stream I/O. So no guarantees.

Reading data over serial port from voltmeter

I'm sort of new at this and I'm writing a small application to read data from a voltmeter. It's a RadioShack Digital Multimeter 46-range. The purpose of my program is to perform something automatically when it detects a certain voltage. I'm using C# and I'm already familiar with the SerialPort class.
My program runs and reads the data in from the voltmeter. However, the data is all unformatted/gibberish. The device does come with its own software that displays the voltage on the PC, however this doesn't help me since I need to grab the voltage from my own program. I just can't figure out how to translate this data into something useful.
For reference, I'm using the SerialPort.Read() method:
byte[] voltage = new byte[100];
_serialPort.Read(voltage, 0, 99);
It grabs the data and displays it as so:
16 0 30 0 6 198 30 6 126 254 30 0 30 16 0 30 0 6 198 30 6 126 254 30 0 30 16 0 30 0 6 198 30 6 126 254 30 0 30 16 0 30 0 6 198 30 6 126 254 30 0 30 16 0 30 0 6 198 30 6 126 254 30 0 30 24 0 30 0 6 198 30 6 126 254 30 0 30 16 0 30 0 254 30 6 126 252 30 0 6 0 30 0 254 30 6 126 254 30 0
The space separates each element of the array. If I use a char[] array instead of byte[], I get complete gibberish:
▲ ? ? ▲ ♠ ~ ? ▲ ♠ ▲ ? ? ▲ ♠ ~ ? ▲ ♠ ▲ ? ? ▲ ♠ ~ ? ▲ ♠
Using the .ReadExisting() method gives me:
▲ ?~?♠~?▲ ▲? ▲ ?~♠~?▲ ?↑ ▲ ??~♠~?▲ F? ▲ ??~♠~?▲ D? ▲ ??~♠~?▲ f?
.ReadLine() times out, so doesn't work. ReadByte() and ReadChar() just give me numbers similar to the Read() into array function.
I'm in way over my head as I've never done something like this, not really sure where else to turn.
It sounds like you're close, but you need to figure out the correct Encoding to use.
To get a string from an array of bytes, you need to know the Code Page being used. If it's not covered in the manual, and you can't find it via a google/bing/other search, then you will need to use trial and error.
To see how to use GetChars() to get a string from a byte array, see Decoder.GetChars Method
In the code sample, look at this line:
Decoder uniDecoder = Encoding.Unicode.GetDecoder();
That line specifically states that the Unicode encoding should be used to decode the bytes.
From there, you can use the Encoding class to specify different code pages. This is documented here: Encoding Class
If the encoding being used isn't one of the standard ones, you can pass a numeric code page ID to Encoding.GetEncoding(Int32). A list of valid code page IDs can be found at Code Pages Supported by Windows
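A sketch of that trial-and-error decoding; code page 437 is just an arbitrary first guess to vary, and the bytes are the start of the dump from the question:

using System;
using System.Text;

// On .NET Core/5+, code pages beyond the defaults first need:
// Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
byte[] voltage = { 16, 0, 30, 0, 6, 198, 30, 6, 126, 254, 30, 0 };
string text = Encoding.GetEncoding(437).GetString(voltage);
Console.WriteLine(text); // readable output = right code page; gibberish = try another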
There are two distinct strategies for solving your communications problem.
Locate and refer to appropriate documentation, and design/modify a program to implement the specification.
The following may be appropriate, but are not guaranteed to describe the particular model DVM that you have. Nonetheless, they MAY serve as a starting point.
Note that the authors of these documents comment that the respective models may be 'visually identical', but also that 'open-source packages that reportedly worked on Linux with earlier RS-232 models do not work with the 2200039'.
http://forums.parallax.com/attachment.php?attachmentid=88160&d=1325568007
http://sigrok.org/wiki/RadioShack_22-812
http://code.google.com/p/rs22812/
Try to reverse engineer the protocol. If you can read the data in a loop and collect the results, a good approach to reverse engineering a protocol is to apply various representative signals to the DVM. You can use short-circuit resistance measurements, various stable voltage measurements, etc.
The technique I find most valuable is to use an automated variable signal generator. That way, by analyzing the patterns in the data, you should more readily be able to identify which points represent the raw data and which represent stable descriptive data, like the unit of measurement, mode of operation, etc.
Some digital multimeters use 7-bit data transfer. You should set the serial communication port to 7 data bits instead of the standard 8 data bits.
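A sketch of that setting, where "COM3" and the 9600 baud rate are placeholders for whatever your device actually uses:

using System.IO.Ports;

// Port name and baud rate are placeholders; the 7 data bits is the point here.
var port = new SerialPort("COM3", 9600, Parity.None, 7, StopBits.One);
port.Open();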
I modified and merged a couple of older open-source C programs on Linux in order to read the data values from the RadioShack meter whose part number is 2200039. This is over USB. I really only added a C or an F on one range. My program is here, and it has the links to where I got the other two programs.
I know this example is not in C#, but it does provide the format info you need. Think of it as the API documentation written in C; you just have to translate it into C# yourself.
The protocol runs at 4800 baud, and 8N1 appears to work.

C# Int vs Byte performance & SQL Int vs Binary performance

In a C# Windows app I handle HEX strings. A single HEX string will have 5-30 HEX parts.
07 82 51 2A F1 C9 63 69 17 C1 1B BA C7 7A 18 20 20 8A 95 7A 54 5A E0 2E D4 3D 29
Currently I take this string and parse it into N integers using Convert.ToInt32(string, 16). I then add these int values to a database. When I extract these values from the database, I extract them as ints and then convert them back into HEX strings.
Would it be better performance wise to convert these string to bytes and then add them as binary data types within the database?
EDIT:
The 5-30 HEX parts correspond to specific tables where all the parts make up 1 record with individual parts. For instance, if I had 5 HEX values, they correspond to 5 separate columns of 1 record.
EDIT:
To clarify (sorry):
I have 9 tables. Each table has a set number of columns.
table1:30
table2:18
table3:18
table4:18
table5:18
table6:13
table7:27
table8:5
table9:11
Each of these columns in every table corresponds to a specific HEX value.
For example, my app will receive a "payload" of 13 HEX components in a single string format: 07 82 51 2A F1 C9 63 69 17 C1 1B BA C7. Currently I take this string and parse the individual HEX components and convert them to ints, storing them in an int array. I then take these int values and store them in the corresponding table and columns in the database. When I read these values I get them as ints and then convert them to HEX strings.
What I am wondering is if I should convert the HEX string into a byte array and store the bytes as SQL binary variable types.
Well in terms of performance, you should of course test both ways.
However, in terms of readability, if this is just arbitrary data, I'd certainly suggest using a byte array. If it's actually meant to represent a sequence of integers, that's fine - but why would you represent an arbitrary byte array using a collection of 4-byte integers? It doesn't fit in well with anything else:
You have to consider padding if your input data isn't a multiple of 4 bytes
It's a pain to work with in terms of reading and writing the data with streams
It's not clear how you're storing the integers in the database, but I'd expect a blob to be more efficient if you're just trying to store the whole thing
I would suggest writing the code the more natural way, keeping your data close to the kind of thing it's really trying to represent, and then measuring the performance. If it's good enough, then you don't need to look any further. If it's not, you'll have a good basis for tweaking.
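If you do go the byte-array route, a sketch of the conversion, using the sample payload from the question:

using System;
using System.Linq;

string payload = "07 82 51 2A F1 C9 63 69 17 C1 1B BA C7";
byte[] bytes = payload.Split(' ')
                      .Select(part => Convert.ToByte(part, 16))
                      .ToArray(); // 13 bytes, ready for a binary/varbinary column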
Yes, by far. Inserting many rows is far worse than inserting a few bigger rows.
A data model often depends not on just how you want to write, but also how you want to find and read the data.
Some considerations:
If you ever have a need to find a particular "HEX part", even when not at the start of the "HEX string", then each "HEX part" will need to be in a separate row so a database index can pick it up.
Depending on your DBMS/API, it may not be easy to seek through a BLOB or byte array. This may be important for loading non-prefix "HEX parts" or performing modifications in the middle of the "HEX string".
If the "HEX string" needs to be a PRIMARY, UNIQUE or FOREIGN KEY, or needs to be searchable by prefix, then you'll typically need a database type that is actually indexable (BLOBs typically aren't, but most DBMSes have alternate types for smaller byte arrays that are).
All in all, a byte array is probably what you need, but beware of the considerations above.
