Best way to parse ASCII(?) from a hex string in C#

The string I get in the application includes ASCII(?) characters like !, dP, \b, ( and s#.
The two values below are supposed to be equivalent.
Value in the database:
\x01\x01\x03!\xea\x01\x00\x00dP\x00\x00\x1f\x8b\b\x00\x00\x00\x00\x00\x04\x00\xe3\xe6\x10\x11\x98\xc3(\xc1\xa2\xc0\xa8\xc0\xa0 \x02\xc4\x0c\x1a\x8c\x1a\x0c\x1as#\x04\x18\xf2\b\x1de\xe6\xe6\xe2\xe2b604\x14`\x94\x98\xc3\ba\x9b\"\xb1M\x80\xec\xc9\x10\xb6\x81\x05\x90=\t\xca6Ab[\x02\xd9\x13\xa1\xea\x8d\x80\xec.\xa8\xb8)\x12\xdb\x0c\xc8n\x81\xaa1\x06\xb2\x1b\x19\xb98A\xe2 \xf5\xb5\x10\xa6\x01\x90Y\rf\x1a\x9a#\x98\x16\b&\xc8\x8cJ\x88Z\x90\x11\xa5\x10Q\x90\xb6\x12\x88(H[1\x84\t\xf2O\xb6\xc0&v\tF\x1e\xa1\a\x8c\xc3\xd9\x8f\x8f\x8d%\x18\x01\xa1\x98\x8d\x97\xea\x01\x00\x00
Value I get in my app, which includes characters I don't want:
01010321ea010000645000001f8b0800000000000400e3e6101198c328c1a2c0a8c0a02002c40c1a8c1a0c1a73400418f2081d65e6e6e2e26236303414609498c308619b22b14d80ecc910b68105903d09ca3641625b02d913a1ea8d80ec2ea8b82912db0cc86e81aa3106b21b19b93841e220f5b510a60190590d661a9a2398160826c88c4a885a9011a5105190b6128828485b318409f24fb6c0267609461ea1078cc3d98f8f8d251801a1988d97ea0100000a\n\n"3a1ea8d80ec2ea8b82912db0cc86e81aa3106b21b19b93841e220f5b510a60190590d661a9a2398160826c88c4a885a9011a5105190b6128828485b318409f24fb6c0267609461ea1078cc3d98f8f8d251801a1988d97ea0100000a\n\n"3a1ea8d80ec2ea8b82912db0cc86e81aa3106b21b19b93841e220f5b510a60190590d661a9a2398160826c88c4a885a9011a5105190b6128828485b318409f24fb6c0267609461ea1078cc3d98f8f8d251801a1988d97ea0100000a\n\n
You can see that \x01 is 01, then \x03 is 03, then ! is 21. I want to take out all the non-hex values in the second string.
1. What are characters like ! and dP? Are they ASCII?
2. I can remove characters such as newlines with hexString = hexString.Replace("\n", ""); but I'm not sure that's the best way to handle all of them.
3. Comparing the two strings, I see that ( = 28 and s# = 7340. Is there a conversion table for this?

My guess, given the quotes around the output, is that the database tool displays non-printable characters as hex escapes (e.g. \x03) and that the actual value contains a single character for each escape shown. In that case there is no difference to pick out: the character d is also the hex value \x64, and the tool simply chooses to output visible characters as their normal letters. The same goes for \t, which could be output as \x09, but the tool uses the standard C control-character abbreviations instead.
Found this:
When it is displayed on screen, redis-cli escapes non-printable characters using the \xHH encoding format, where HH is hexadecimal notation.
In other words,
The cli is just using 3 different methods to display the values in the database field:
The character is printable, output the character (e.g. d, P, !, ").
The character is not printable, but has a C language standard escape sequence, output the escape sequence (e.g. \b, \t, \n).
The character is not printable and has no escape sequence, output the hex for the value of the character (e.g. \x03, \x01, \x00).
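To check that the two values really are the same bytes, here is a minimal sketch that goes the other way: it parses the hex string into bytes and then prints them using the three rules above. The class and method names (HexDump, ParseHex, Escape) are made up for illustration, and the escaping only approximates what the CLI does. It assumes the only non-hex characters in the input are whitespace such as trailing newlines; anything else makes Convert.ToByte throw.

```csharp
using System;
using System.Linq;
using System.Text;

public static class HexDump
{
    // Drop whitespace (e.g. trailing newlines), then turn each pair of hex digits into one byte.
    public static byte[] ParseHex(string text)
    {
        string digits = new string(text.Where(c => !char.IsWhiteSpace(c)).ToArray());
        if (digits.Length % 2 != 0)
            throw new FormatException("Odd number of hex digits.");

        byte[] bytes = new byte[digits.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
            bytes[i] = Convert.ToByte(digits.Substring(2 * i, 2), 16);
        return bytes;
    }

    // Render bytes with the three display rules: printable, C escape, or \xHH.
    public static string Escape(byte[] bytes)
    {
        var sb = new StringBuilder();
        foreach (byte b in bytes)
        {
            switch (b)
            {
                case (byte)'\a': sb.Append("\\a"); break;
                case (byte)'\b': sb.Append("\\b"); break;
                case (byte)'\t': sb.Append("\\t"); break;
                case (byte)'\n': sb.Append("\\n"); break;
                case (byte)'\r': sb.Append("\\r"); break;
                case (byte)'"': sb.Append("\\\""); break;
                case (byte)'\\': sb.Append("\\\\"); break;
                default:
                    if (b >= 0x20 && b <= 0x7E)
                        sb.Append((char)b);                             // printable ASCII
                    else
                        sb.Append("\\x").Append(b.ToString("x2"));      // everything else
                    break;
            }
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // First bytes of the value from the question, with a trailing newline.
        byte[] bytes = ParseHex("01010321ea010000645000001f8b08\n");
        Console.WriteLine(Escape(bytes));
    }
}
```

So there is no conversion table beyond the ASCII table itself: 21 is !, 64 is d, 50 is P, and 08 is shown as \b.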

Related

How to convert ASCII symbol such as STX to a BitArray using C#?

I'm new to C#. In a project, I'm dealing with serial communication where I'm trying to send/receive a checksum for verifying data transfer integrity. The device I'm sending messages to accepts ascii symbols and characters, for instance an example message might look like this:
String[STX, 1, X, 5, ETX]
To calculate the checksum I need to first convert each symbol to a BitArray. For instance,
STX -> [0,1,0,0,0,0,0,0]
1 -> [1,0,0,0,1,1,0,0]
etc.. and perform some math operations. And, of course I also need to go the other way, for example:
[0,1,0,0,0,0,0,0] -> STX
I'm fine with doing this on a single character using:
byte[] res = System.Text.ASCIIEncoding.ASCII.GetBytes("X");
BitArray bit_arr = new BitArray(res);
But if I try this on a multi-character ascii symbol such as STX or SYN I get three byte arrays (one for each character). For the time being I'm using a dictionary to go from ascii symbol to unicode which works fine, but since I don't know the checksum ahead of time I'll end up having to put all possible multi-character ascii symbols into that dictionary. Is there a way to get the desired results in C#? To be crystal clear what I want is:
SomeFunc(STX) = [0,1,0,0,0,0,0,0]
Thanks for any help.

"The device ... accepts ascii symbols and characters ..."
ASCII typically refers to a set of codes (i.e. numeric values) for representing characters and symbols.
Printable/displayable characters (such as the upper-case and lower-case letters of the English alphabet, punctuation marks, and decimal digits) and control codes are each assigned a unique value.
The basic set of ASCII codes range in value from zero to 127, so each code is representable in 7 bits.
ASCII has been the most widely-used basic code set for representing text in computer systems.
Programming languages typically represent text strings using ASCII or a superset of it (C# strings are UTF-16, whose first 128 code values match ASCII).
The expression "ASCII character ..." would typically be used to describe or refer to the numeric value for that character (i.e. its ASCII code value), rather than some symbol or glyph.
IOW "ASCII character" is essentially shorthand for the "ASCII code value for the character ...".
The quoted line above should be interpreted to mean that the device uses ASCII codes.
"To calculate the checksum I need to first convert each symbol to a BitArray."
ASCII is not a set of "symbols and characters".
ASCII is a set of numbers. Each number represents (i.e. is mapped to) a character or symbol. Such a set of numbers is called a code set.
ASCII is a code set.
Each ASCII code can be represented in 7 bits, or less than a byte (i.e. eight bits).
Since each ASCII code is typically stored in a byte, the units of byte and character are often used interchangeably, especially when referring to storage of text.
Each character in the digital computer is represented by a number, e.g. its ASCII code.
The idea that this character is actually a "symbol" and needs to be converted to some form of numeric representation is incorrect.
A digital computer can only process numbers. If you want to represent a symbol (e.g. for text), then you have to encode that symbol as a number.
"But if I try this on a multi-character ascii symbol such as STX or SYN ..."
A subset of the ASCII codes represent unprintable characters, which are called the control codes.
Most of the functionality of these control codes relates to electro-mechanical teletypewriters, and paper-tape punches and readers. Some of the control codes have been repurposed for CRT-based terminals, aka VDTs.
Since ASCII control-codes are not printable/displayable, there is no need to represent them as a new single-column-wide symbol or glyph.
Hence the ASCII control-codes are represented by a two- or three-letter name, such as STX or CR.
A few control codes are frequently used in text strings, such as line feed, carriage return, and tab. Those control codes are provided printable representations (even though they are actually unprintable) when specifying text strings in some programming languages (such as C escape sequences, e.g. \n, \r, \t).
Otherwise other control codes have to be specified by their ASCII code as a numeric escape sequence (e.g. \x02 for STX).
IOW STX or SYN are not a "multi-character ascii symbol".
There is no such thing as a "multi-character ascii symbol".
"For the time being I'm using a dictionary to go from ascii symbol to unicode"
"Is there a way to get the desired results in C#?"
Each character/symbol is already represented by a number stored in a byte.
There is no need for a "dictionary" or conversion from a "symbol".
You could demonstrate this to yourself by writing a simple program.
Assign a text string to a byte array.
Print out the byte array as a text string.
Print out the byte array, but each byte in decimal representation.
Print out the byte array, but each byte in binary representation.
Print out the byte array, but each byte in hexadecimal representation.
So the checksum could/should be calculated from the characters/bytes of the message (without any dictionary translation).
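The experiment suggested above could look something like the following sketch. STX and ETX are written as the single characters \u0002 and \u0003. The class name and the XOR checksum are assumptions made for illustration; the device's real checksum algorithm is not given in the question.

```csharp
using System;
using System.Collections;
using System.Linq;
using System.Text;

public static class ControlCodeDemo
{
    // Bits of one byte, least-significant bit first (the order BitArray uses).
    public static int[] Bits(byte value)
    {
        var bits = new BitArray(new[] { value });
        return bits.Cast<bool>().Select(bit => bit ? 1 : 0).ToArray();
    }

    // Hypothetical checksum: XOR of all bytes. Substitute the device's actual rule.
    public static byte XorChecksum(byte[] data)
    {
        byte sum = 0;
        foreach (byte b in data)
            sum ^= b;
        return sum;
    }

    public static void Main()
    {
        // STX (code 2) and ETX (code 3) are each ONE character, not three letters.
        string message = "\u0002" + "1X5" + "\u0003";
        byte[] bytes = Encoding.ASCII.GetBytes(message);

        // Each byte in decimal, hexadecimal, binary, and as a BitArray-style list.
        foreach (byte b in bytes)
        {
            Console.WriteLine("{0,3}  0x{0:X2}  {1}  [{2}]",
                b,
                Convert.ToString(b, 2).PadLeft(8, '0'),
                string.Join(",", Bits(b)));
        }

        Console.WriteLine("XOR checksum: 0x{0:X2}", XorChecksum(bytes));
    }
}
```

Going the other way is just a cast: (char)2 is STX, so no dictionary of symbol names is needed.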

asp.net c# Regular Expression Validator for formatted numbers

I have a textbox with a numeric value, for example 23542.56. The number is stored as a double in a MySQL database.
When I load the value from the database, I convert it to decimal and format it as a string with thousands separators using ...ToString("N").
TextBox.Text = Convert.ToDecimal(mdr["Value"].ToString()).ToString("N");
My Regular Expression Validator accepts only digits and commas:
<asp:RegularExpressionValidator ID="RegularExpressionValidator1" runat="server" ControlToValidate="TextBox" ValidationExpression="^\d+(\,\d+$)?$" ValidationGroup="NumericValidate">Allowed Chars are: 0-9 und ,</asp:RegularExpressionValidator>
The problem is that the validator does not accept the formatted number, for example 23.542,56. What is a proper way to make it accept only "0-9 and ," but also accept the thousands separator?
Thanks in advance...
Info: Showing "23542.56" as "23.542,56" is the common notation in Germany, which is why I format the number this way.

You don't need to escape the comma , but you have to escape the dot ., otherwise it will match every character, as in .*, which matches everything.
The [ square brackets ] match any character in the given class, which contains dot ., comma , and digits.
If you want to be more strict and have triplets and such you must do:
^\d{1,3}(?:\.\d{3})*(?:,\d*)?$
Here the first group, (?:\.\d{3})*, matches the thousands, the second group, (?:,\d*)?, starts with a literal comma, and both are non-capturing groups.
This will still match ill-formed numbers with leading zeroes 0 such as:
00
01
003.999,99
There's a very easy way to exclude those, too. I leave it to you as homework :)
Hint: [1-9]
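For anyone who wants to try the strict pattern outside the validator, here is a small sketch using Regex.IsMatch. The class name and the sample inputs are only illustrative; the pattern is the one given above, unchanged, so it still accepts the leading-zero cases.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GermanNumberCheck
{
    // Strict pattern from above: dot-separated groups of three digits, optional ",decimals".
    public static readonly Regex Strict =
        new Regex(@"^\d{1,3}(?:\.\d{3})*(?:,\d*)?$");

    public static bool IsValid(string text) => Strict.IsMatch(text);

    public static void Main()
    {
        string[] samples = { "23.542,56", "1", "1.111", "23542,56", "12.34", "003.999,99" };
        foreach (string s in samples)
            Console.WriteLine($"{s} -> {IsValid(s)}");
    }
}
```

Note that "23542,56" (no thousands separator) is rejected by this pattern, while "003.999,99" is still accepted until the homework is done.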
EDIT: accept only 1 1,11 1.111 1.111,11 1111,11
The regex for this should be:
^1+(?:\.111)*(?:,11)?$
It may be a little different for some corner cases, but that's basically it.

How to present a character Unicode with 5 digit (Hex) with c# language

I want to print a Unicode character with 5 hexadecimal digits on the screen (for example to write it on a Windows Forms button).
For example, the Unicode code point of the Ace of Hearts character is 1F0B1. I tried it with \x, but that only supports up to 4 hex digits.

You can use the \U escape sequence:
string text = "Ace of hearts: \U0001f0b1";
Of course, you'll have to be using a font which supports that character...
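If the code point is only known at run time, char.ConvertFromUtf32 builds the same string. A small sketch (the class and method names are just for illustration) showing that both spellings agree and that the result is a surrogate pair, i.e. two chars:

```csharp
using System;

public static class AceOfHearts
{
    // Build a string from any Unicode code point, including those above U+FFFF.
    public static string FromCodePoint(int codePoint) => char.ConvertFromUtf32(codePoint);

    public static void Main()
    {
        string viaEscape = "\U0001f0b1";
        string viaApi = FromCodePoint(0x1F0B1);

        Console.WriteLine(viaEscape == viaApi);                            // do both spellings agree?
        Console.WriteLine(viaApi.Length);                                  // number of UTF-16 code units
        Console.WriteLine(char.ConvertToUtf32(viaApi, 0).ToString("X"));   // back to the code point
    }
}
```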
As an aside, I'd strongly recommend avoiding the \x escape sequence, as it's hard to read. For example:
string good = "Bell: \x7Good compiler";
string bad = "Bell: \x7Bad compiler";
When presented together, at first glance it would seem that these are both "Bell: " followed by U+0007 followed by either "Good compiler" or "Bad compiler"... but because "Bad" is entirely composed of valid hex characters, the second string is actually "Bell: " followed by U+7BAD followed by " compiler".
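A quick way to see this for yourself is to print the code of the character right after "Bell: " (index 6) in each string. The class and helper names below are illustrative only; the last string shows the fixed-width \u escape, which avoids the ambiguity.

```csharp
using System;

public static class HexEscapePitfall
{
    public const string Good = "Bell: \x7Good compiler";
    public const string Bad = "Bell: \x7Bad compiler";

    // Format the character at the given index as U+XXXX.
    public static string CodeAt(string s, int index) => "U+" + ((int)s[index]).ToString("X4");

    public static void Main()
    {
        Console.WriteLine(CodeAt(Good, 6));
        Console.WriteLine(CodeAt(Bad, 6));

        // \u always takes exactly four hex digits, so "Bad" stays text.
        string safe = "Bell: \u0007Bad compiler";
        Console.WriteLine(CodeAt(safe, 6));
    }
}
```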

Java char literal to C# char literal

I am maintaining some Java code that I am currently converting to C#.
The Java code is doing this:
sendString(somedata + '\000');
And in C# I am trying to do the same:
sendString(somedata + '\000');
But on the '\000' VS2010 tells me that "Too many characters in character literal". How can I use '\000' in C#? I have tried to find out what the character is, but it seems to be " " or some kind of newline-character.
Do you know anything about the issue?
Thanks!

'\0' will be just fine in C#.
What's happening is that C# sees \0 and converts that to a nul-character with an ASCII value of 0; then it sees two more 0s, which is illegal inside a character (since you used single quotes, not double quotes). The nul-character is typically not printable, which is why it looked like an empty string when you tried to print it.
What you've typed in Java is a character literal supporting an octal number. C# does not support octal literals in characters or numbers, in an effort to reduce programming mistakes.*
C# does support Unicode escapes of the form '\u0000', where 0000 is exactly four hexadecimal digits (the '\x' escape accepts one to four).
* In PHP, for example, if you type in a number with a leading zero that is a valid octal number, it gets translated. If it's not a legal octal number, it doesn't get translated correctly. <? echo 017; echo ", "; echo 018; ?> outputs 15, 1 on my machine.

That's a null character, also known as NUL. You can write it as '\0' in C#.
In C# the string "\000" represents three characters: the null character, followed by two zero digits. Since a character literal can only contain one character, this is why you get the error "Too many characters in character literal".
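A short sketch of the difference, assuming sendString simply takes a string (the Terminate helper is made up for illustration):

```csharp
using System;

public static class NulLiteral
{
    // C# equivalent of the Java expression somedata + '\000'.
    public static string Terminate(string data) => data + '\0';

    public static void Main()
    {
        char nul = '\0';              // one character with code 0
        string threeChars = "\000";   // NUL followed by the digits '0' and '0'

        Console.WriteLine((int)nul);
        Console.WriteLine(threeChars.Length);
        Console.WriteLine(Terminate("somedata").Length);
    }
}
```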

Convert &#char(w); to \uxxxx C#

I am working on a Korean document, and the HTML source code contains special symbols starting with &#char(w), e.g. 껰. I would like to convert such a symbol to its Unicode representation.
Is there a way to do so?

First, get the codepoint by converting it to int. Then, use String.Format to obtain the Unicode code string:
string result = string.Format("\\u{0:x4}", (int) chr);
or:
string result = "\\u" + ((int) chr).ToString("x4");

HTML uses the &# and &#x notation to encode Unicode characters. So your document already contains the characters in one possible Unicode notation.
If the sequence starts with &#x the following characters are the hex code of the character. If the sequence starts with &# the following numbers are the decimal code of the character.
Convert these codes to hex using ToString("x4") as in Konrad's answer.
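Putting the two answers together, a rough sketch that rewrites every numeric character reference in a string as \uxxxx might look like this. The class name, method name and regex are my own assumptions, not part of either answer. It handles only numeric references (not named ones like &amp;amp;), and out-of-range values will throw.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

public static class EntityConverter
{
    // Matches &#NNNN; (decimal) and &#xHHHH; (hexadecimal) references.
    private static readonly Regex NumericEntity =
        new Regex(@"&#(?:[xX](?<hex>[0-9A-Fa-f]+)|(?<dec>[0-9]+));");

    public static string ToUnicodeEscapes(string html)
    {
        return NumericEntity.Replace(html, m =>
        {
            int codePoint = m.Groups["hex"].Success
                ? int.Parse(m.Groups["hex"].Value, NumberStyles.HexNumber, CultureInfo.InvariantCulture)
                : int.Parse(m.Groups["dec"].Value, CultureInfo.InvariantCulture);

            // One char for the BMP, a surrogate pair (two chars) above U+FFFF.
            string text = char.ConvertFromUtf32(codePoint);
            return string.Concat(text.Select(c => "\\u" + ((int)c).ToString("x4")));
        });
    }

    public static void Main()
    {
        Console.WriteLine(ToUnicodeEscapes("&#44032; and &#xAC00; are the same character"));
    }
}
```

If you want the actual characters rather than the \uxxxx text, System.Net.WebUtility.HtmlDecode decodes the references directly.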
