Java char literal to C# char literal - c#

I am maintaining some Java code that I am currently converting to C#.
The Java code is doing this:
sendString(somedata + '\000');
And in C# I am trying to do the same:
sendString(somedata + '\000');
But on the '\000' VS2010 tells me "Too many characters in character literal". How can I use '\000' in C#? I have tried to find out what the character is, but it seems to be " " or some kind of newline character.
Do you know anything about the issue?
Thanks!

'\0' will be just fine in C#.
What's happening is that C# sees \0 and converts that to a nul-character with an ASCII value of 0; then it sees two more 0s, which is illegal inside a character (since you used single quotes, not double quotes). The nul-character is typically not printable, which is why it looked like an empty string when you tried to print it.
What you've typed in Java is a character literal supporting an octal number. C# does not support octal literals in characters or numbers, in an effort to reduce programming mistakes.*
C# does support Unicode escapes of the form '\u0000', where 0000 is exactly four hexadecimal digits.
* In PHP, for example, if you type in a number with a leading zero that is a valid octal number, it gets translated. If it's not a legal octal number, it doesn't get translated correctly. <? echo 017; echo ", "; echo 018; ?> outputs 15, 1 on my machine.

That's a null character, also known as NUL. You can write it as '\0' in C#.
In C#, \000 inside a character literal is parsed as three characters: the null escape \0 followed by two zero digits. Since a character literal can only contain one character, you get the error "Too many characters in character literal".
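To mirror the Java line from the question, here is a minimal sketch in C# (sendString and somedata are stand-ins for the question's own code):
class SendDemo
{
    // Stand-in for the real sendString from the question.
    static void sendString(string s) { /* transmit s somewhere */ }

    static void Main()
    {
        string somedata = "payload";     // assumed sample data
        sendString(somedata + '\0');     // null character, the equivalent of Java's '\000'
        sendString(somedata + '\u0000'); // the same character via a Unicode escape (exactly four hex digits)
    }
}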

Related

Best way to parse ASCII(?) from a hex string in C#

The string I get in the application includes ASCII(?) characters like !, dP, \b, (, s#.
These are supposed to be equivalent.
Value in the database:
\x01\x01\x03!\xea\x01\x00\x00dP\x00\x00\x1f\x8b\b\x00\x00\x00\x00\x00\x04\x00\xe3\xe6\x10\x11\x98\xc3(\xc1\xa2\xc0\xa8\xc0\xa0 \x02\xc4\x0c\x1a\x8c\x1a\x0c\x1as#\x04\x18\xf2\b\x1de\xe6\xe6\xe2\xe2b604\x14`\x94\x98\xc3\ba\x9b\"\xb1M\x80\xec\xc9\x10\xb6\x81\x05\x90=\t\xca6Ab[\x02\xd9\x13\xa1\xea\x8d\x80\xec.\xa8\xb8)\x12\xdb\x0c\xc8n\x81\xaa1\x06\xb2\x1b\x19\xb98A\xe2 \xf5\xb5\x10\xa6\x01\x90Y\rf\x1a\x9a#\x98\x16\b&\xc8\x8cJ\x88Z\x90\x11\xa5\x10Q\x90\xb6\x12\x88(H[1\x84\t\xf2O\xb6\xc0&v\tF\x1e\xa1\a\x8c\xc3\xd9\x8f\x8f\x8d%\x18\x01\xa1\x98\x8d\x97\xea\x01\x00\x00
Value I get in my app, which includes characters I don't want:
01010321ea010000645000001f8b0800000000000400e3e6101198c328c1a2c0a8c0a02002c40c1a8c1a0c1a73400418f2081d65e6e6e2e26236303414609498c308619b22b14d80ecc910b68105903d09ca3641625b02d913a1ea8d80ec2ea8b82912db0cc86e81aa3106b21b19b93841e220f5b510a60190590d661a9a2398160826c88c4a885a9011a5105190b6128828485b318409f24fb6c0267609461ea1078cc3d98f8f8d251801a1988d97ea0100000a\n\n"3a1ea8d80ec2ea8b82912db0cc86e81aa3106b21b19b93841e220f5b510a60190590d661a9a2398160826c88c4a885a9011a5105190b6128828485b318409f24fb6c0267609461ea1078cc3d98f8f8d251801a1988d97ea0100000a\n\n"3a1ea8d80ec2ea8b82912db0cc86e81aa3106b21b19b93841e220f5b510a60190590d661a9a2398160826c88c4a885a9011a5105190b6128828485b318409f24fb6c0267609461ea1078cc3d98f8f8d251801a1988d97ea0100000a\n\n
1. You can see that \x01 is 01, then \x03 is 03, then ! is 21. I want to take out all the non-hex values in the second string.
2. What are characters like ! and dP? Are they ASCII?
I can remove characters like the newline with hexString = hexString.Replace("\n", ""); but I'm not sure that's the best approach for all of them.
3. Comparing the two strings, I see that ( = 28 and s# = 7340. Is there a conversion table for this?
My guess, given the quotes around the output, is that the database is displaying non-printable (non-ASCII?) characters as hex (e.g. \x03), and that the actual string contains a single character for each hex-formatted escape. In that case there is no difference to pick out: the character d is also the hex value \x64; the database just chooses to output visible characters as their normal letters. The same goes for \t, which could be output as \x09, but they chose the C standard control-character abbreviation instead.
Found this:
When it is displayed on screen, redis-cli escapes non-printable characters using the \xHH encoding format, where HH is hexadecimal notation.
In other words,
The CLI is just using three different methods to display the values in the database field:
* The character is printable: output the character itself (e.g. d, P, !, ").
* The character is not printable but has a C-language standard escape sequence: output the escape sequence (e.g. \b, \t, \n).
* The character is not printable and has no escape sequence: output the hex for the value of the character (e.g. \x03, \x01, \x00).
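If what you actually want is the clean hex dump shown in the second string, one option (a sketch, assuming you can get at the raw bytes rather than redis-cli's escaped display text) is to format every byte as two hex digits yourself:
static class HexDump
{
    // Formats every byte as two lowercase hex digits, printable or not.
    public static string ToHex(byte[] data)
    {
        var sb = new System.Text.StringBuilder(data.Length * 2);
        foreach (var b in data)
            sb.Append(b.ToString("x2"));
        return sb.ToString();
    }
}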

Insufficient Hexadecimal Digits Regex Exception?

I am formulating a regex that should match all letters (including Chinese) and some chosen punctuation (also including Chinese).
Here's my regex
"^[\p{L}\x{FF01}-\x{FF1E}\x{3008}-\x{30A9}0-9\s##$^&*()+=,.?`~_:;|""-{}[]+$"
It throws an "Insufficient hexadecimal digits" exception. Can anybody please tell me what is wrong with it? I tried some regex testers online and it works there.
I'm using the Regex class of C# to parse it.
From the docs:
\x nn Uses hexadecimal representation to specify a character (nn consists of exactly two digits).
I think what you want is \u:
\u nnnn Matches a Unicode character by using hexadecimal representation (exactly four digits, as represented by nnnn).
Try this:
#"^[\p{L}\uFF01-\uFF1E\u3008-\u30A90-9\s##$^&*()+=,.?`~_:;|""-{}[]+$"

Why replace invalid chars in a file name with '\0'?

I stumbled upon this bit here in a project from a colleague:
foreach (var invalidChar in Path.GetInvalidFileNameChars())
fileName = fileName.Replace(invalidChar, '\0');
The general idea is obvious enough, but I wonder why he chose to replace the invalid chars with the literal for the null char instead of a 'regular' char or just an empty string.
I guess there's a good reason for this choice (the guy who wrote this is a senior on our team); I'd just like to know what that reason is.
After commenting on the question, I went looking for proof that \0 is actually not allowed in file names. I found it:
Use any character in the current code page for a name, including Unicode characters and characters in the extended character set (128–255), except for the following:
* The following reserved characters: < (less than), > (greater than), : (colon), " (double quote), / (forward slash), \ (backslash), | (vertical bar or pipe), ? (question mark), * (asterisk)
* Integer value zero, sometimes referred to as the ASCII NUL character.
* Characters whose integer representations are in the range from 1 through 31, except for alternate data streams where these characters are allowed.
It depends on the operating system where your code runs, but on Windows the char \0 (0 as an int) is on the list of invalid characters for a file name.
LinqPad (run on Windows 10):
Path.GetInvalidFileNameChars().Contains('\0').Dump(); //true
I think this code was ported from another language to .NET.
It would be better to throw an exception (if a user specified the name) when the file name contains invalid chars, instead of replacing them with anything.
If you do need to replace them, you should pick a char such as _ to make it clear that something may have been replaced.
As some wise people have pointed out, there is no such thing as an empty char. Using a visible character also avoids confusion between a space (" ") and an empty string ("").
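A sketch of that alternative, using the same framework call as the original snippet but substituting a visible character (the Sanitize helper name is made up):
using System.IO;

static class FileNameSanitizer
{
    // Hypothetical helper: replaces every invalid file-name character with '_',
    // so the substitution stays visible instead of hiding behind '\0'.
    public static string Sanitize(string fileName)
    {
        foreach (var invalidChar in Path.GetInvalidFileNameChars())
            fileName = fileName.Replace(invalidChar, '_');
        return fileName;
    }
}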

What is the difference between using \u and \x while representing character literal

I have seen \u and \x used interchangeably in some places while representing a character literal.
For example '\u00A9' == '\x00A9' evaluates to true
Aren't we supposed to use only \u to represent a Unicode character? What is the use of having two ways to represent a character?
I would strongly recommend only using \u, as it's much less error-prone.
\x consumes 1-4 characters, so long as they're hex digits - whereas \u must always be followed by 4 hex digits. From the C# 5 specification, section 2.4.4.4, the grammar for \x:
hexadecimal-escape-sequence:
\x hex-digit hex-digit(opt) hex-digit(opt) hex-digit(opt)
So for example:
string good = "Tab\x9Good compiler";
string bad = "Tab\x9Bad compiler";
... look similar but are very different strings, as the latter is effectively "Tab" followed by U+9BAD followed by " compiler".
Personally I wish the C# language had never included \x, but there we go.
Note that there's also \U, which is always followed by 8 hex digits, primarily used for non-BMP characters.
There's one other big difference between \u and \x: the latter is only used in character and string literals, whereas \u can also be used in identifiers:
string x = "just a normal string";
Console.WriteLine(\u0078); // Still refers to the identifier x
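A compilable sketch of the two strings above, showing how different they really are (the lengths are the easiest tell):
class EscapeDemo
{
    static void Main()
    {
        string good = "Tab\x9Good compiler"; // \x stops at 'G' (not a hex digit): tab + "Good compiler"
        string bad  = "Tab\x9Bad compiler";  // \x greedily consumes "9Bad": U+9BAD + " compiler"

        System.Console.WriteLine(good.Length); // 17
        System.Console.WriteLine(bad.Length);  // 13
    }
}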

How to represent a Unicode character with 5 hexadecimal digits in C#

I want to print a Unicode character with 5 hexadecimal digits on the screen (for example to write it on a Windows Forms button).
For example, the Unicode code point of the Ace of Hearts character is U+1F0B1. I tried it with \x, but that can only take up to 4 hex digits.
You can use the \U escape sequence:
string text = "Ace of hearts: \U0001f0b1";
Of course, you'll have to be using a font which supports that character...
As an aside, I'd strongly recommend avoiding \x escape sequences, as they're hard to read. For example:
string good = "Bell: \x7Good compiler";
string bad = "Bell: \x7Bad compiler";
When presented together, at first glance it would seem that these are both "Bell: " followed by U+0007 followed by either "Good compiler" or "Bad compiler"... but because "Bad" is entirely composed of valid hex characters, the second string is actually "Bell: " followed by U+7BAD followed by " compiler".
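A short sketch building the same text two ways (the console output is just for illustration; on a Windows Forms button you would assign the string to its Text property):
class AceOfHeartsDemo
{
    static void Main()
    {
        // \U takes exactly eight hex digits, so pad U+1F0B1 with leading zeros.
        string viaEscape = "Ace of hearts: \U0001F0B1";

        // Equivalent at runtime when you only have the code point as a number.
        string viaCodePoint = "Ace of hearts: " + char.ConvertFromUtf32(0x1F0B1);

        System.Console.WriteLine(viaEscape == viaCodePoint); // True
    }
}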
