string and 4-byte Unicode characters

I have a question about strings and chars in C#. I found that a string in C# is a Unicode string and that a char takes 2 bytes, so every char is a UTF-16 code unit. That's great, but I also read on Wikipedia that some characters take 4 bytes in UTF-16.
I'm writing a program that lets you draw characters for alphanumeric displays. The program also has a tester, where you can type a string and it draws it so you can see how it looks.
So how should I work with strings in which the user types a character that takes 4 bytes, i.e. 2 chars? I need to go through the string character by character, find each character in the list, and draw it on the panel.

You could do:
for( int i = 0; i < str.Length; ++i ) {
    int codePoint = Char.ConvertToUtf32( str, i );
    if( codePoint > 0xffff ) {
        i++; // the code point used two chars (a surrogate pair), so skip the second one
    }
}
Then the codePoint represents any possible code point as a 32 bit integer.
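As a minimal sketch of the same loop packaged for reuse, the helper below collects every code point of a string into a list. The names CodePointDemo and GetCodePoints are made up for illustration; Char.ConvertToUtf32 and Char.IsSurrogatePair are the framework methods doing the work.

```csharp
using System;
using System.Collections.Generic;

public static class CodePointDemo
{
    // Hypothetical helper: returns each Unicode code point of the string as a 32-bit integer.
    public static List<int> GetCodePoints(string text)
    {
        var codePoints = new List<int>();
        for (int i = 0; i < text.Length; i++)
        {
            // Reads one code point starting at index i.
            // Throws ArgumentException if the string contains an unpaired surrogate.
            codePoints.Add(char.ConvertToUtf32(text, i));

            // A surrogate pair occupies two chars, so skip the low surrogate.
            if (char.IsSurrogatePair(text, i))
            {
                i++;
            }
        }
        return codePoints;
    }
}
```

Each returned integer can then be used as the lookup key for the glyph to draw, instead of a char.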

Work entirely with String objects; don't use Char at all. Example using IndexOf:
var needle = "𝒜"; // U+1D49C MATHEMATICAL SCRIPT CAPITAL A, outside the BMP
var hayStack = "a code point outside the Basic Multilingual Plane: 𝒜";
var index = hayStack.IndexOf(needle);
Most methods on the String class have overloads which accept Char or String. Most methods on Char have overloads which accept a String as well. Just don't use Char.
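One way to go through a string "character by character" while staying with String values is the StringInfo class in System.Globalization. The sketch below is only an illustration (TextElementDemo and GetTextElements are hypothetical names); it returns each text element as its own string, so a surrogate pair arrives as one two-char string.

```csharp
using System.Collections.Generic;
using System.Globalization;

public static class TextElementDemo
{
    // Hypothetical helper: splits a string into text elements, each returned as a string.
    public static List<string> GetTextElements(string text)
    {
        var elements = new List<string>();
        TextElementEnumerator enumerator = StringInfo.GetTextElementEnumerator(text);
        while (enumerator.MoveNext())
        {
            // A text element is a single char, a surrogate pair,
            // or a base character together with its combining marks.
            elements.Add(enumerator.GetTextElement());
        }
        return elements;
    }
}
```

Note that a text element can be longer than one code point when combining marks are involved, which may or may not be what a display drawing program wants.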

Related

Moving the first char in a string to the end of the string using a method. C#

I know there are a lot of similar questions asked, and I've looked over those, but I still can't figure out my solution.
I'm trying to write a method that takes the first character of an inputted string and moves it to the back, then I can add additional characters if needed.
Basically if the input is Hello the output would be elloH + "whatever." I hope that makes sense.
As proof that I'm just not being lazy, here is the rest of the source code for the other parts of what I am working on. It all works, I just don't know where to begin with the last part.
Thanks for looking and thanks for the help!
private string CaseSwap(string str)//method for swaping cases
{
string result = ""; //create blank var
foreach (var c in str)
if (char.IsUpper(c)) //find uppers
result += char.ToLower(c); //change to lower
else
result += char.ToUpper(c); //all other lowers changed to upper
str = result; //assign var to str
return str; //return string to method
}
private string Reverse(string str)//method for reversing string
{
char[] revArray = str.ToCharArray(); //copy into an array
Array.Reverse(revArray); //reverse the array
return new string(revArray); //return the new string
}
private string Latin(string str)//method for latin
{
    // not written yet: this is the part I need help with
}

If you want to move the first character to the end of the string, you can try the following:
public string MoveFirstCharToEnd(string str, string whateverStr="")
{
if(string.IsNullOrEmpty(str))
return str;
string result = str.Substring(1) + str[0] + whateverStr;
return result;
}
Note: I added whateverStr as an optional parameter, so that it can support only moving first character to the end and also it supports concatenating extra string to the result.
String.Substring(Int32):
Retrieves a substring from this instance. The substring starts at a
specified character position and continues to the end of the string.

Why not just take the 1st char and combine it with the rest of the string? E.g.
Hello
^^ ^
|| |
|Substring(1) - rest of the string (substring starting from 1)
|
value[0] - first character
Code:
public static string Rotate(string value) => string.IsNullOrEmpty(value)
? value
: $"{value.Substring(1)}{value[0]}";
Generalized implementation for arbitrary rotation (either positive or negative):
public static string Rotate(string value, int count = 1) {
if (string.IsNullOrWhiteSpace(value))
return value;
return string.Concat(Enumerable
.Range(0, value.Length)
.Select(i => value[(i + count % value.Length + value.Length) % value.Length]));
}
You can simplify your current implementation with the help of LINQ:
using System.Linq;
...
private static string CaseSwap(string value) =>
string.Concat(value.Select(c => char.IsUpper(c)
? char.ToLower(c)
: char.ToUpper(c)));
private static string Reverse(string value) =>
string.Concat(value.Reverse());

You can get the first character of a string with the String.Substring(int startPosition, int length) method. With Substring you can also get the rest of your text starting from position 1 (skipping the first character). When you have these 2 pieces, you can concatenate them.
Don't forget to check for empty strings; this can be done with the String.IsNullOrEmpty(string text) method.
public static string RemoveAndConcatFirstChar(string text){
if (string.IsNullOrEmpty(text)) return "";
return text.Substring(1) + text.Substring(0,1);
}

Appending multiple characters to a string is inefficient due to the number of string objects allocated, which is not just memory intensive but also slow. There's a reason we have StringBuilder and other such options available to us, like working with char[]s.
Here's a fairly quick method for rotating a string left by one character (moving the first character to the end):
string RotateLeft(string source)
{
    // nothing to rotate for null or empty input (chars[0] would throw on an empty string)
    if (string.IsNullOrEmpty(source))
        return source;
    var chars = source.ToCharArray();
    var initial = chars[0];
    Array.Copy(chars, 1, chars, 0, chars.Length - 1);
    chars[^1] = initial;
    return new String(chars);
}
Sadly we can't do that in-place in the string itself since they're immutable, so there's no avoiding the temporary array and string construction at the end.
Based on the fact that you called the method Latin(...) and the bit of the question where you said: "Basically if the input is Hello the output would be elloH + "whatever."... I'm assuming that you're writing a Pig Latin translation. If that's the case, you're going to need a bit more.
Pig Latin is a slightly tricky problem because it's based on the sound of the word, not the letters. For example, onto becomes ontohay (or variants thereof) while one becomes unway because the word is pronounced the same as won (with a u to capture the vowel pronunciation correctly). Phonetic operations on English are quite annoying because of all the variations with silent and implied initial letters. And don't even get me started on pseudo-vowels like y.
Special cases aside, the most common rules of Pig Latin translation code appear to be as follows:
Words starting with a single consonant followed by a vowel: move the consonant to the end and append ay.
Words starting with a pair of consonants followed by a vowel: move the consonant pair to the end and append ay.
Words that start with a vowel: append hay, yay, tay, etc.
That third one is a bit difficult since choosing the right suffix is a matter of what makes the result easiest to say... which code can't really decide all that easily. Just pick one and go with that.
Of course there are plenty of words that don't fit those rules. Anything starting with a consonant triplet for example (Christmas being the first that came to mind, followed shortly by strip... and others). Pseudo-vowels like y mess things up (cry for instance). And of course the ever-present problem of correctly representing the initial vowel sounds when you've stripped context: won is converted to un-way vocally, so rendering it as on-way in text is a little bit wrong. Same with word, whose Pig Latin version is pronounced erd-way.
For a simple first pass though... just follow the rules, treating y as a consonant if it's the first letter and as a vowel in the second or third spots.
And since this is so often a homework problem, I'm going to stop here and let you play with it for a bit. Just in case :P
(Oh, and don't forget to preserve the case of your first character just in case you're working on a capitalized word. Latin should become Atinlay, not atinLay. Just saying.)

convert a hex string to corresponding emoji string

I'm trying to create a string with the emoji "👱" starting from the string "D83DDC71". To do that, I'm trying to convert the string above into the string "\uD83D\uDC71".
If I use this code it works (the textbox shows 👱 as expected):
textbox.Text += "\uD83D\uDC71";
but if I use this it doesn't work (the textbox shows the exact text "\uD83D\uDC71" instead of a single character):
textbox.Text += sender.Code.ToString("X").Insert(4, @"\u").Insert(0, @"\u");
What is the right way to convert the hex representation of an emoji to the corresponding C# string (UTF-16)?

Okay. It seems you have a string which gives the hexadecimal of each of the UTF-16 code units of the character U+1F471 (👱).
Since char represents a UTF-16 code unit, split the string into two 4-character chunks, parse that into an int as hexadecimal, cast each to char and then combine them into a string:
var personWithBlondHair = ""
+ (char)int.Parse("D83DDC71".Substring(0, 4), NumberStyles.HexNumber)
+ (char)int.Parse("D83DDC71".Substring(4, 4), NumberStyles.HexNumber);
As per https://dotnetfiddle.net/oTgXfG

You have a string containing two 16-bit values in hexadecimal form, so you need to parse them first. My example uses an overload of Convert.ToUInt16 which also accepts an integer specifying the base of the number in the string which, in our case, is 16 (hexadecimal). The unsigned variant is used because a char is an unsigned 16-bit value.
string ParseUnicodeHex(string hex)
{
    var sb = new StringBuilder();
    for (int i = 0; i < hex.Length; i += 4)
    {
        string temp = hex.Substring(i, 4); // one UTF-16 code unit as 4 hex digits
        char character = (char)Convert.ToUInt16(temp, 16);
        sb.Append(character);
    }
    return sb.ToString();
}
Please note that this method will fail if the string's length isn't divisible by 4.
The reason this works:
textbox.Text += "\uD83D\uDC71";
is because you've got a string literal containing Unicode character escape sequences. When you compile your program, the compiler replaces these escape sequences with the corresponding UTF-16 code units. This is why you cannot just add \u in front of the characters during execution to make it work.
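For comparison, here is a small sketch of doing the conversion at run time when the input is the code point itself (1F471) rather than its two UTF-16 code units. That input format is an assumption, not what the question has; EmojiDemo and FromCodePointHex are made-up names, while Char.ConvertFromUtf32 is the framework method that produces the surrogate pair.

```csharp
using System;
using System.Globalization;

public static class EmojiDemo
{
    // Hypothetical helper: parses a hex code point such as "1F471"
    // and returns the matching UTF-16 string (one or two chars).
    public static string FromCodePointHex(string hex)
    {
        int codePoint = int.Parse(hex, NumberStyles.HexNumber, CultureInfo.InvariantCulture);
        // ConvertFromUtf32 throws ArgumentOutOfRangeException for values
        // that are not valid code points (for example a lone surrogate).
        return char.ConvertFromUtf32(codePoint);
    }
}
```

With this sketch, FromCodePointHex("1F471") yields the same two chars as the literal "\uD83D\uDC71".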

Try this one
string str = "D83DDC71";
string emoji = string.Join("", (from Match m in Regex.Matches(str, @"\S{4}")
    select (char) int.Parse(m.Value, NumberStyles.HexNumber)).ToArray());
This splits your string into groups of 4 characters, converts each group into a char, and finally joins all the chars into one string as the emoji, all in one statement.

C# subscript string including variable

I know that to write a subscript in C# I should use Unicode.
For example, to write H2O I should write
String str = "H" + "\x2082" + "O";
But I want to put an int variable in the formula instead of the literal 2.
How can I create a string with a variable written in subscript?

In Unicode, the subscript digits are assigned consecutive codepoints, ranging from U+2080 to U+2089. Thus, if you take the Unicode character for subscript 0 – namely, '₀' – then you can obtain the subscript character for any other digit by adding its numeric value to the former's codepoint.
If your integer will only consist of a single digit:
int num = 3;
char subscript = (char)('₀' + num); // '₃'
If your integer may consist of any number of digits, then you can apply the same addition to each of its digits individually. The easiest way of enumerating an integer's digits is by converting it to a string, and using the LINQ Select operator to get the individual characters. Then, you subtract '0' from each character to get the digit's numeric value, and add it to '₀' as described above. The code below assumes that num is non-negative.
int num = 351;
var chars = num.ToString().Select(c => (char)('₀' + c - '0'));
string subscript = new string(chars.ToArray()); // "₃₅₁"

This wikipedia article shows you all the unicode codes for super and subscript symbols.
You can simply create a method which maps these:
public string GetSubScriptNumber(int i)
{
// get the code as needed
}
I will give a few hints to help you along:
Unfortunately you can't just do return "\x208" + i, so you'll need to do a switch for the numbers 0-9 or add the integer to the character '\x2080'.
If you only need 0-9 then do some error checking that the input is in that range and throw an ArgumentOutOfRangeException
If you need all ints then you may find it easier splitting it up into each digit and getting a char for each of those - watch out for negative numbers but there is a character for subscript minus sign too!
To include the number in your string, you can use something like String.Format:
String.Format("H{0}O", GetSubScriptNumber(i))
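A minimal sketch of how the hints above could fit together, including the subscript minus sign (U+208B) for negative numbers. SubscriptDemo and ToSubscript are hypothetical names, and this is just one possible way to fill in the method body.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class SubscriptDemo
{
    // Hypothetical helper: converts an integer to its subscript form, digit by digit.
    public static string ToSubscript(int value)
    {
        var builder = new StringBuilder();
        // The invariant culture guarantees ASCII digits and '-' as the negative sign.
        foreach (char c in value.ToString(CultureInfo.InvariantCulture))
        {
            if (c == '-')
            {
                builder.Append('\u208B'); // SUBSCRIPT MINUS
            }
            else
            {
                // Subscript digits are consecutive, starting at U+2080 (subscript zero).
                builder.Append((char)('\u2080' + (c - '0')));
            }
        }
        return builder.ToString();
    }
}
```

It can then be used as String.Format("H{0}O", SubscriptDemo.ToSubscript(2)).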

Try to escape it
String str = "H" + "\\x2082" + "O";
or use verbatim strings
String str = "H" + @"\x2082" + "O";

Maybe we don't understand your question. Several of the answers above seem correct. Does this work?
static string GetSubscript(int value)
{
StringBuilder returnValue = new StringBuilder();
foreach (char digit in value.ToString())
returnValue.Append((char)(digit - '0' + '₀'));
return returnValue.ToString();
}
string waterFormula = "H" + GetSubscript(2) + "O"; // H₂O
string methaneFormula = "CH" + GetSubscript(4); // CH₄

In C#, how can I detect if a character is a non-ASCII character?

I would like to check, in C#, if a char contains a non-ASCII character. What is the best way to check for special characters such as 志 or Ω?

ASCII ranges from 0 - 127, so just check for that range:
char c = 'a';//or whatever char you have
bool isAscii = c < 128;

bool HasNonASCIIChars(string str)
{
return (System.Text.Encoding.UTF8.GetByteCount(str) != str.Length);
}

Just in case anybody comes across this: .NET 6 added a new method to check whether a character is an ASCII character or not
public static bool IsAscii (char c);
To solve the issue, you can just write
var containsOnlyAscii = str.All(char.IsAscii);
using the LINQ All method.
In general, you can use this new method to check individual characters
var isAscii = char.IsAscii(c);

.NET Regex - Replace multiple characters at once without overwriting?

I'm implementing a C# program that should automate a monoalphabetic substitution cipher.
The functionality I'm working on at the moment is the simplest one: the user provides a plain text and a cipher alphabet, for example:
Plain text(input): THIS IS A TEST
Cipher alphabet: A -> Y, H -> Z, I -> K, S -> L, E -> J, T -> Q
Cipher Text(output): QZKL KL Y QJLQ
I thought of using regular expressions since I've been programming in perl for a while, but I'm encountering some problems on c#.
First I would like to know if someone would have a suggestion for a regular expression that would replace all occurrence of each letter by its corresponding cipher letter (provided by user) at once and without overwriting anything.
Example:
In this case, user provides plaintext "TEST", and on his cipher alphabet, he wishes to have all his T's replaced with E's, E's replaced with Y and S replaced with J. My first thought was to substitute each occurrence of a letter with an individual character and then replace that character by the cipherletter corresponding to the plaintext letter provided.
Using the same example word "TEST", the steps taken by the program to provide an answer would be:
replace T's with (let's say) @
replace E's with #
replace S's with &
Replace @ with E, # with Y, & with J
Output = EYJE
This solution doesn't seem to work for large texts.
I would like to know if anyone can think of a single regular expression that would allow me to replace each letter in a given text by its corresponding letter in a 26-letter cipher alphabet without the need of splitting the task in an intermediate step as I mentioned.
If it helps visualize the process, this is a screenshot of my GUI for the program: http://img43.imageshack.us/img43/2118/11618743.jpg

You could also make a map of source to destination characters, then simply loop through the string and replace as you go. Because each character is looked up exactly once, no intermediate placeholder characters are needed:
Dictionary<char, char> replacements = new Dictionary<char, char>();
// set up replacement chars, like this
replacements['T'] = 'E';
replacements['E'] = 'Y';
replacements['S'] = 'J';
// actually perform the replacements
char[] result = new char[source.Length];
for (int i = 0; i < result.Length; i++) {
    // characters without a mapping (spaces, punctuation) are copied unchanged
    result[i] = replacements.TryGetValue(source[i], out char mapped) ? mapped : source[i];
}
return new string(result);

I don't think regular expressions are the right tool here. In Perl you would use the transliteration feature tr/TES/EYJ/. C# doesn't have this but you can do it by using a StringBuilder and looking at each character individually.
private static string Translate(string input, string from, string to)
{
StringBuilder sb = new StringBuilder();
foreach (char ch in input)
{
int i = from.IndexOf(ch);
if (i < 0)
{
sb.Append(ch);
}
else if (i < to.Length)
{
sb.Append(to[i]);
}
}
return sb.ToString();
}
The source code is a modified version of this answer from this similar question. The answers there show some other ways of doing this.
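As a rough sketch of the same single-pass idea applied to the examples from the question, the variant below builds a lookup table once from the two alphabets and then substitutes each character. SubstitutionDemo and Substitute are hypothetical names; unlike Translate above, this version assumes both alphabets have the same length and never drops characters.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class SubstitutionDemo
{
    // Hypothetical helper: replaces every character found in 'from'
    // with the character at the same position in 'to'.
    public static string Substitute(string input, string from, string to)
    {
        if (from.Length != to.Length)
            throw new ArgumentException("Both alphabets must have the same length.");

        // Build the lookup table once so each input character costs one lookup.
        var map = new Dictionary<char, char>();
        for (int i = 0; i < from.Length; i++)
        {
            map[from[i]] = to[i];
        }

        var sb = new StringBuilder(input.Length);
        foreach (char ch in input)
        {
            // Unmapped characters (spaces, punctuation) are copied unchanged.
            sb.Append(map.TryGetValue(ch, out char replacement) ? replacement : ch);
        }
        return sb.ToString();
    }
}
```

Since every character is read from the original input and written once to the output, a replacement can never be overwritten by a later one: Substitute("TEST", "TES", "EYJ") gives EYJE.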
