I've modified an example to send & receive from serial, and that works fine.
The device I'm connecting to has three commands I need to work with.
My experience is with C.
MAP - returns a list of field_names, (decimal) values & (hex) addresses
I can keep track of which values are returned as decimal or hex.
Each line is terminated with CR
:: Example:
MEMBERS:10 - number of (decimal) member names
NAME_LENGTH:15 - (decimal) length of each name string
NAME_BASE:0A34 - 10 c-strings of (15) characters each starting at address (0x0A34) (may have junk following each null terminator)
etc.
GET hexaddr hexbytecount - returns a list of 2-char hex values starting from (hexaddr).
The returned bytes are a mix of bytes/ints/longs and null-terminated C-strings; each reply line is terminated with CR
:: Example:
get 0a34 10 -- will return
0A34< 54 65 73 74 20 4D 65 20 4F 75 74 00 40 D3 23 0B
This happens to be 'Test Me Out'(00) followed by junk
etc.
PUT hexaddr hexbytevalue {{value...} {value...}} sends multiple hex byte values separated by spaces starting at hex address, terminated by CR/LF
These bytes are a mix of bytes/ints/longs, and null terminated c-strings :: Example:
put 0a34 50 75 73 68 - (ascii Push)
Will replace the first 4-chars at 0x0A34 to become 'Push Me Out'
SAVED OK
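For reference, a minimal sketch of how such an exchange might look with System.IO.Ports.SerialPort (the port name, baud rate and terminators are illustrative assumptions only):
using System.IO.Ports;

var port = new SerialPort("COM3", 9600);   // hypothetical port name and baud rate
port.NewLine = "\r";                       // device replies are CR-terminated
port.Open();

port.Write("get 0a34 10\r\n");             // command terminator assumed CR/LF, per the PUT description
string reply = port.ReadLine();            // e.g. "0A34< 54 65 73 74 20 4D 65 20 4F 75 74 00 40 D3 23 0B"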
See my previous answer about serial handling, which might be useful: Serial Port Polling and Data handling
To convert your response to actual text:
var s = "0A34 < 54 65 73 74 20 4D 65 20 4F 75 74 00 40 D3 23 0B";
var hex = s.Substring(s.IndexOf("<") + 1).Trim().Split(new char[] {' '});
var numbers = hex.Select(h => Convert.ToInt32(h, 16)).ToList();
var toText = String.Join("", numbers.TakeWhile(n => n != 0)
                                     .Select(n => Char.ConvertFromUtf32(n)).ToArray());
Console.WriteLine(toText);
which:
skips through the string until after the < character, then splits the rest into hex strings
then converts each hex string into an int (base 16)
then takes each number until it finds a 0 and converts each number to text (using UTF-32 encoding)
then joins all the converted strings together to recreate the original text
Alternatively, more condensed:
var hex = s.Substring(s.IndexOf("<") + 1).Trim().Split(new char[] {' '});
var bytes = hex.Select(h => (byte) Convert.ToInt32(h, 16)).TakeWhile(n => n != 0);
var toText = Encoding.ASCII.GetString(bytes.ToArray());
For converting a number to hex:
Console.WriteLine(123.ToString("X"));
Console.WriteLine(123.ToString("X4"));
Console.WriteLine(123.ToString("X8"));
Console.WriteLine(123.ToString("x4"));
You will also find that working with hex data is well documented at https://msdn.microsoft.com/en-us/library/bb311038.aspx
Related
I have a file I am reading from to acquire a database of music files, like this:
00 6F 74 72 6B 00 00 02 57 74 74 79 70 00 00 00 .otrk...Wttyp...
06 00 6D 00 70 00 33 70 66 69 6C 00 00 00 98 00 ..m.p.3pfil...~.
44 00 69............. D.i.....
Etc. There could be hundreds to thousands of records in this file, all split at "otrk" into strings; "otrk" marks the start of a new track.
The problem lies in the above: all the tracks start with otrk, and each then has field identifiers, their length, and their value. For example, above:
ttyp = type, and the 06 following it is the length of the value, which is .m.p.3 or 00 6D 00 70 00 33.
Then the next field identifier is pfil = filename, and this is where the issue lies, specifically with the length: its value is 98, but when read into a string it becomes unrecognizable, defaults to a diamond with a question mark, and yields a value of 239, which is wrong. How can I avoid this and get the correct value so I can display it correctly?
My code to read the file:
db_file = File.ReadAllText(filePath, Encoding.UTF8);
and the code to split and sort through the file:
string[] entries = content.Split(new string[] { "otrk" }, StringSplitOptions.None);
public List<Song> Songs { get; } = new List<Song>();
foreach(string entry in entries)
{
Songs.Add(Song.Create(entry));
}
Song.Create looks like;
public static Song Create(string dbString)
{
Song toRet = new Song();
for (int farthestReached = 0; farthestReached < dbString.Length;)
{
int startOfString = -1;
int iLength = -1;
byte[] b = Encoding.UTF8.GetBytes("0");
//Gets the start index
foreach(var l in labels)
{
startOfString = dbString.IndexOf(l, farthestReached);
if (startOfString >= 0)
{
// get identifer index plus its length
iLength = startOfString + 3;
var valueIndex = iLength + 5;
// get length of value
string temp = dbString.Substring(iLength + 4, 1);
b = Encoding.UTF8.GetBytes(temp);
int xLen = b[0];
// populate the label
string fieldLabel = dbString.Substring(startOfString, l.Length);
// populate the value
string fieldValue = dbString.Substring(valueIndex, xLen);
// set new
farthestReached = xLen + valueIndex;
switch (fieldLabel[0])
{
case 'p':
case 't':
string stringValue = "";
foreach (char c in fieldValue)
{
if (c == 0)
continue;
stringValue += c;
}
assignStringField(toRet, fieldLabel, stringValue);
break;
}
break;
}
}
//If a field was not found, there are no more fields
if (startOfString == -1)
break;
}
return toRet;
}
The file is not a UTF-8 file. The hex dump shown in the question makes that clear; it is not a proper text file in any other text encoding either. It rather looks like some binary (serialized) format, with data fields of different types.
You cannot reliably read a binary file naively like a text file, especially considering that certain UTF-8 characters are represented by two or more bytes. It is quite likely that the UTF-8 decoder will get confused by the binary data and miss the first character(s) of a text field: a preceding (binary) byte that does not belong to the text field can coincidentally look like the start byte of a multi-byte UTF-8 sequence, so the decoder tries to decode a sequence that does not align with the text field and fails to identify its first character(s) correctly.
Not only that, but certain byte values or byte sequences are not valid UTF-8 encodings for a character, and you would "lose" such bytes when trying to read them as UTF-8 text.
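This is very likely where the 239 in the question comes from: 0x98 on its own is only a UTF-8 continuation byte, so the decoder substitutes the replacement character U+FFFD (the "diamond with a question mark"), and re-encoding that character back to UTF-8 gives EF BF BD, whose first byte is 239. A small demonstration (an illustrative snippet, not part of your parser):
// 0x98 is not a valid stand-alone UTF-8 sequence, so it decodes to U+FFFD.
string s = Encoding.UTF8.GetString(new byte[] { 0x98 });
Console.WriteLine((int)s[0]);          // 65533 (U+FFFD, the replacement character)

// Re-encoding the replacement character produces EF BF BD.
byte[] roundTripped = Encoding.UTF8.GetBytes(s);
Console.WriteLine(roundTripped[0]);    // 239 -- the "wrong" length value from the question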
Also, since multiple bytes can form a single UTF-8 character, you cannot rely on every individual byte being turned into a character with the same ordinal value (even if the byte value happens to be a valid ASCII value): such a byte could be decoded merely as part of a multi-byte UTF-8 sequence into a single character whose ordinal value is entirely different from the values of the bytes in that sequence.
That said, as far as I can tell, the text fields in your little data snippet above do not look like UTF-8 at all. 00 6D 00 70 00 33 (*m*p*3) and 00 44 00 69 (*D*i) are definitely not UTF-8 -- note the zero bytes.
Thus, first consult the file format specification to figure out the actual text encoding used for the text fields in this file format. Don't guess. Don't assume. Don't believe. Look up, verify and confirm.
Secondly, since the file is not a proper text file (as already mentioned), you cannot read it like a text file with File.ReadAllText. Instead, read the raw byte data, for example with File.ReadAllBytes.
Find the otrk marker in the byte data of the file not as text, but by the 4 byte values this marker is made of.
Then, parse the byte data following the otrk marker according to the file format specification, and only decode the bytes that are actual text data into strings, using the correct text encoding as denoted by the file format specification.
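A minimal sketch of that approach (the file path is hypothetical, and the commented-out decode line assumes big-endian UTF-16 purely because of the zero-prefixed bytes shown above; only the format specification can confirm the real encoding):
byte[] data = File.ReadAllBytes("database.dat");   // hypothetical path
byte[] marker = { 0x6F, 0x74, 0x72, 0x6B };        // "otrk" as raw bytes

// Walk the file and visit every otrk record.
for (int i = IndexOfMarker(data, marker, 0); i >= 0; i = IndexOfMarker(data, marker, i + marker.Length))
{
    // Parse the fields that follow, per the specification, and decode only the
    // text fields, e.g.:
    // string name = Encoding.BigEndianUnicode.GetString(data, fieldStart, fieldLength);
    Console.WriteLine($"otrk record at offset {i}");
}

// Returns the next offset of the marker bytes, or -1 if none remain.
static int IndexOfMarker(byte[] data, byte[] marker, int start)
{
    for (int i = start; i <= data.Length - marker.Length; i++)
    {
        bool match = true;
        for (int j = 0; j < marker.Length; j++)
            if (data[i + j] != marker[j]) { match = false; break; }
        if (match) return i;
    }
    return -1;
}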
I am new to C# and trying to learn how to filter data that I read from a file. I have a file that I read from that has data similar to the following:
3 286 858 95.333 0.406 0.427 87.00 348 366 4 b
9 23 207 2.556 0.300 1.00 1.51 62 207 41 a
9 37 333 4.111 0.390 0.811 2.03 130 270 64 a
10 21 210 2.100 0.348 0.757 3.17 73 159 23 a
9 79 711 8.778 0.343 0.899 2.20 244 639 111 a
10 66 660 6.600 0.324 0.780 2.25 214 515 95 a
When I read this data, some of the values have carriage return or line feed characters hidden in them. Can you please tell me if there is a way to remove them? For example, one of my variables may hold the following value due to a newline character in it:
mystringval = "9
"
I want this mystringval variable to be converted back to
mystringval = "9"
If you want to get rid of all special characters, you can learn regular expressions and use Regex.Replace.
var value = "&*^)#abcd.";
var filtered = System.Text.RegularExpressions.Regex.Replace(value, @"[^\w]", "");
REGEXPLANATION
the @ before the string means that you're using a verbatim string literal, so C# escape sequences don't apply, leaving only the regex escape sequences
[^abc] matches all characters that are not a, b, or c (so they can be replaced with an empty string)
\w is a special regex code that means a letter, number, or underscore
you can also use @"[^A-Za-z0-9\.]", which will keep only letters, numbers, and the decimal point. See http://rubular.com/ for more details.
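If you only want to strip the carriage returns and line feeds (rather than every special character), a narrower pattern does it; a small sketch using the same Regex.Replace:
var value = "9\r\n";
var filtered = System.Text.RegularExpressions.Regex.Replace(value, @"[\r\n]+", "");
// filtered is now "9"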
As well as using RegEx, you can use LINQ to do something like
var goodCharacters = input
.Replace("\r", " ")
.Replace("\n", " ")
.Where(c => char.IsLetterOrDigit(c) || c == ' ' || c == '.')
.ToArray();
var result = new string(goodCharacters).Trim();
The first two Replace calls will guard against having a number at the end of one line and a number at the start of the next, e.g. "123\r\n987" would otherwise be "123987", whereas I assume you want "123 987".
Try my sample here on ideone.com.
My input is this: 12 13 13 AF 3F 5f.
I need the output to be the same.
I pass the input from the client to the server:
byte[] bytes = System.Text.Encoding.ASCII.GetBytes(input);
and receive it at the server side:
string some = System.Text.Encoding.ASCII.GetString(output);
but I get excess 0's at the end of the string, almost a thousand of them.
How do I trim these 0's without changing my byte array size?
Various options here:
some = some.Substring(0, some.IndexOf('\0'));
or
some = some.Remove(some.IndexOf('\0'));
or
some = some.TrimEnd('\0');
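If the stray zeros come from decoding a fixed-size receive buffer (the receive code isn't shown, so this is an assumption), an alternative is to decode only the bytes actually read, so the zeros never appear in the first place:
byte[] buffer = new byte[4096];                     // hypothetical receive buffer
int read = stream.Read(buffer, 0, buffer.Length);   // 'stream' assumed to be your NetworkStream
string some = System.Text.Encoding.ASCII.GetString(buffer, 0, read);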
ASCII is 7-bit. Your value AF exceeds 7 bits; try using UTF8 encoding.
I've noticed this strange issue. Check out this Vietnamese (according to Google Translate) string:
string line = "Mìng-dĕ̤ng-ngṳ̄";
string sub = "Mìng-dĕ̤ng-ngṳ";
line.Length
15
sub.Length
14
line.StartsWith(sub)
false
Which seems to me like a wrong result. So, I've implemented my custom StartWith function, which compares the string char-by-char.
public bool CustomStartWith(string parent, string child)
{
for (int i = 0; i < child.Length; i++)
{
if (parent[i] != child[i])
return false;
}
return true;
}
And as I assumed, the results of running this function
CustomStartWith("Mìng-dĕ̤ng-ngṳ̄", "Mìng-dĕ̤ng-ngṳ")
true
What's going on here?! How's this possible?
The result returned by StartsWith is correct. By default, most string comparison methods perform culture-sensitive comparisons using the current culture, not plain byte sequences. Although your line starts with a byte sequence identical to sub, the substring it represents is not equivalent under most (or all) cultures.
If you really want a comparison that treats strings as plain byte sequences, use the overload:
line.StartsWith(sub, StringComparison.Ordinal); // true
If you want the comparison to be case-insensitive:
line.StartsWith(sub, StringComparison.OrdinalIgnoreCase); // true
Here's a more familiar example:
var line1 = "café"; // 63 61 66 E9 – precomposed character 'é' (U+00E9)
var line2 = "café"; // 63 61 66 65 301 – base letter e (U+0065) and
// combining acute accent (U+0301)
var sub = "cafe"; // 63 61 66 65
Console.WriteLine(line1.StartsWith(sub)); // false
Console.WriteLine(line2.StartsWith(sub)); // false
Console.WriteLine(line1.StartsWith(sub, StringComparison.Ordinal)); // false
Console.WriteLine(line2.StartsWith(sub, StringComparison.Ordinal)); // true
In the above examples, line2 starts with the same byte sequence as sub, followed by a combining acute accent (U+0301) to be applied to the final e. line1 uses the precomposed character for é (U+00E9), so its byte sequence does not match that of sub.
In real-world semantics, one would typically not consider cafe to be a substring of café; the e and é are treated as distinct characters. That é happens to be represented as a pair of characters starting with e is an internal implementation detail of the encoding scheme (Unicode) that should not affect results. This is demonstrated by the above example contrasting café and café; one would not expect different results unless specifically intending an ordinal (byte-by-byte) comparison.
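One quick way to confirm that line1 and line2 above really are the same text in two different representations is Unicode normalization; String.Normalize (NFC by default) composes the base letter and combining accent into the single precomposed character:
Console.WriteLine(line1 == line2);               // false: different code units
Console.WriteLine(line1 == line2.Normalize());   // true:  NFC turns U+0065 U+0301 into U+00E9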
Adapting this explanation to your example:
string line = "Mìng-dĕ̤ng-ngṳ̄"; // 4D EC 6E 67 2D 64 115 324 6E 67 2D 6E 67 1E73 304
string sub = "Mìng-dĕ̤ng-ngṳ"; // 4D EC 6E 67 2D 64 115 324 6E 67 2D 6E 67 1E73
Each .NET character represents a UTF-16 code unit, whose values are shown in the comments above. The first 14 code units are identical, which is why your char-by-char comparison evaluates to true (just like StringComparison.Ordinal). However, the 15th code unit in line is the combining macron, ◌̄ (U+0304), which combines with its preceding ṳ (U+1E73) to give ṳ̄.
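A quick way to inspect those code units for yourself:
foreach (char c in line)
    Console.Write(((int)c).ToString("X") + " ");
// prints: 4D EC 6E 67 2D 64 115 324 6E 67 2D 6E 67 1E73 304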
This is not a bug. String.StartsWith is in fact much smarter than a character-by-character check of your two strings. It takes into account your current culture (language settings, etc.), and it takes into account contractions and special characters. (It does not care that you need two characters to end up with ṳ̄; it compares it as one.)
So if you don't want to take all those culture-specific settings into account and just want an ordinal comparison, you have to tell the comparer that.
This is the correct way to do that (not ignoring the case, like Douglas did!):
line.StartsWith(sub, StringComparison.Ordinal);
I have a string in C# like this:
string only_number;
I assigned it a value = 40
When I check only_number[0], I get 52
When I check only_number[1], I get 48
Why is it adding 48 to the character at the current position? Please suggest.
A string is basically a char[]. So what you are seeing is the ASCII value of the chars '4' and '0'.
Proof: the difference between 4 and 0 equals the difference between 52 and 48.
Since it is a string, you didn't assign it 40; you assigned it "40".
What you see are the ASCII codes of '4' and '0'.
It's not adding 48 to the character. What you see is the character code, and the characters for digits start at 48 in Unicode:
'0' = 48
'1' = 49
'2' = 50
'3' = 51
'4' = 52
'5' = 53
'6' = 54
'7' = 55
'8' = 56
'9' = 57
A string is a sequence of char values, and each char value is a 16-bit integer basically representing a code point in the Unicode character set.
When you read from only_number[0] you get a char value that is '4', and the character code for that is 52. So, what you have done is reading a character from the string, and then converted that to an integer before you display it.
So:
char c = only_number[0];
Console.WriteLine(c); // displays 4
int n = (int)only_number[0]; // cast to integer
Console.WriteLine(n); // displays 52
int m = only_number[0]; // the cast is not needed, but the value is cast anyway
Console.WriteLine(m); // displays 52
You are accessing this string and it is outputting the ASCII character codes for each of your two characters, '4' and '0' - please see here:
http://www.theasciicode.com.ar/ascii-control-characters/null-character-ascii-code-0.html
A string is an array of chars; that's why you received these results. It basically displays the ASCII codes of '4' and '0'.
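If what you actually want is the numeric digit rather than its character code, convert the char explicitly; a small sketch:
string only_number = "40";
int firstDigit = only_number[0] - '0';      // 4  (subtract the character code of '0')
int wholeNumber = int.Parse(only_number);   // 40 (parse the whole string)
Console.WriteLine(firstDigit);              // prints 4
Console.WriteLine(wholeNumber);             // prints 40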