C# Unicode (Japanese Characters)

I have a Japanese final coming up soon, so I wrote a program to help me study. However, I can't get VS2008 to display any Unicode characters in the console. This is the sample I used to test whether Unicode output works:
string diancai = new string(new char[]{ '\u70B9','\u83DC' });
Console.Write(diancai[0] + " " + diancai[1]);
Output is:
? ?
Please help! Thank you!

Open a command prompt and run the command "chcp".
On a Japanese system the output looks like this:
C:\> chcp
現在のコード ページ: 932
(That is, "Active code page: 932".) Code page 932 is Japanese. If the active code page cannot represent Japanese characters, or your Windows installation does not support them, the console cannot display them.
Your code runs correctly on my machine, which is a Japanese edition of Windows, and displays the following characters:
点 菜
So in your case, I recommend trying a GUI program instead of a console program.

There are two conditions that must be satisfied in order for this to work:
1. The console's output encoding must be able to represent Japanese characters.
2. The console's font must be able to render them.
Condition 1 should be fairly simple to deal with; just set System.Console.OutputEncoding to an appropriate Encoding, such as a UTF8Encoding. (Of course, this won't work on Windows 9x, since that doesn't really support encodings or Unicode. But you aren't using that, now, are you?)
Satisfying condition 2 is a bit more involved:
First, an appropriate font must be installed on the user's system. If there aren't any installed yet, the user will have to install some, perhaps by:
1. Opening intl.cpl ("Regional and Language Options" in the Control Panel on Windows XP in English)
2. Going to the "Languages" tab
3. Enabling "Install files for East Asian languages"
4. Clicking "OK"
Actually getting the console to use such a font seems to be fairly hairy; see the question: How to display japanese Kanji inside a cmd window under windows? for more about that.
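As a minimal sketch of condition 1, the snippet below sets Console.OutputEncoding to UTF-8 before writing the two characters from the question. The class and method names (ConsoleUnicodeDemo, BuildSample, WriteSample) are made up for illustration, and the code does nothing about condition 2: if the console font lacks the glyphs, the output will still not render correctly.

```csharp
using System;
using System.Text;

// Hypothetical helper class; the names are illustrative only.
static class ConsoleUnicodeDemo
{
    // Builds the sample text from the question: U+70B9, a space, U+83DC.
    public static string BuildSample()
    {
        string diancai = new string(new char[] { '\u70B9', '\u83DC' });
        return diancai[0] + " " + diancai[1];
    }

    // Condition 1: ask the console to encode output as UTF-8 before writing.
    // Condition 2 (a font that contains the glyphs) cannot be fixed here.
    public static void WriteSample()
    {
        Console.OutputEncoding = Encoding.UTF8;
        Console.WriteLine(BuildSample());
    }
}
```

Calling ConsoleUnicodeDemo.WriteSample() from Main is enough to try it; whether the characters actually appear still depends on the console font.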

I use the English version of Windows XP, but I configured it so that it can show Japanese characters.
On Windows XP, the steps are:
1. Control Panel -> Regional and Language Options -> Advanced
2. Choose Japanese.
3. Select the code page conversion tables for the language you use.
4. Click the OK button.
5. Restart your computer.
Afterwards I ran the "chcp" command at the command prompt.
It displays: Active code page: 932

Related

How to show emoji in c# console output?

I have a problem outputting emoji to the console.
A string written with a Unicode escape sequence ("\u"), such as "\u263A", works well.
However, if I simply copy and paste an emoji into a string, like "🎁", it does not work.
Test code below:
using System;
using System.Text;

namespace Test
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.OutputEncoding = Encoding.UTF8;
            string s1 = "🎁";
            string s1_uni = "\ud83c\udf81"; // unicode code for s1
            string s2 = "☺";
            string s2_uni = "\u263A"; // unicode code for s2
            Console.WriteLine(s1);
            Console.WriteLine(s1_uni);
            Console.WriteLine(s2);
            Console.WriteLine(s2_uni);
            Console.ReadLine();
        }
    }
}
s2 and s2_uni are output successfully, while s1 and s1_uni fail.
I want to know how to fix this problem.
By the way, the font applied is 'Consolas', which works perfectly in Visual Studio.
Update:
Please note that I searched Stack Overflow before posting this question. The most common suggestion is to set the console encoding to UTF-8, which is done in the first line of Main.
That approach (Console.OutputEncoding = Encoding.UTF8) does not fully solve the situation I described.
Also, I mentioned the console font in the question to make clear that the Consolas font shows emoji perfectly in VS but fails in the console: the first emoji fails to show.
Please do not close this question. Thanks.
Update2:
This emoji can be shown in the VS terminal.
Update3:
Thanks to Peter Duniho for the help; you are right.
While we were discussing this, I looked through the Microsoft document "Unicode Support for the Console", which says:
Display of characters outside the Basic Multilingual Plane (that is, of surrogate pairs) is not supported, even if they are defined in a linked font file.
The code point of the emoji that cannot be shown lies outside the BMP, and the console does not support displaying code points outside the BMP. Therefore, this emoji is not shown.
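To check whether a given string falls under this limitation, one option is to look for surrogate pairs, since in a .NET string every code point outside the BMP is stored as one. The sketch below uses the standard char.IsSurrogatePair and char.ConvertToUtf32 methods; the class and method names (BmpCheck, ContainsNonBmp, FirstCodePoint) are hypothetical.

```csharp
using System;

// Hypothetical helper class; the names are illustrative only.
static class BmpCheck
{
    // Returns true if the text contains at least one code point outside
    // the Basic Multilingual Plane (stored as a surrogate pair in UTF-16).
    public static bool ContainsNonBmp(string text)
    {
        for (int i = 0; i < text.Length; i++)
        {
            if (char.IsSurrogatePair(text, i))
            {
                return true;
            }
        }
        return false;
    }

    // Returns the Unicode code point that starts at index 0 of a non-empty string.
    public static int FirstCodePoint(string text)
    {
        return char.ConvertToUtf32(text, 0);
    }
}
```

For the strings in the question, ContainsNonBmp("\ud83c\udf81") is true (the pair encodes U+1F381), while ContainsNonBmp("\u263A") is false, which matches which of the two symbols the classic console can show.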
To find a running context that supports this emoji, I ran some experiments in CMD, PowerShell, and Windows Terminal (screenshots not preserved).
As the results showed, Windows Terminal supports it.
Strictly speaking, my problem is not a duplicate of another Stack Overflow question, because my code already does everything that can be done in code. The problem is the running context, not the code.
Thanks to Peter Duniho for the help.

The current Windows command line console, cmd.exe, still uses GDI+ to render text. And the GDI+ API it uses does not correctly handle combining/surrogate pair characters like the emoji you want to display.
This is true even when using a font that includes the glyph for the character you want, and even when you have correctly set the output encoding for the Console class to a Unicode encoding (both of which you've done in your example).
Microsoft appears to be working on improvements to the command prompt code, to upgrade it to use the DirectWrite API instead of GDI+. If and when these improvements are released, the console window should be able to display your emoji correctly. See GitHub issue "UTF-8 rendering woes" #75.
In the meantime, you can run your program in a context that is able to render these characters correctly, such as Windows Terminal or PowerShell.
Additional details regarding the limitations of the GDI+ font rendering can be found in GitHub issues "Add emoji support to Windows Console" #190 and "emoji/unicode support mostly broken in windows" #2693 (the latter isn't about a Windows component per se, but still relates to this problem).

What does "Beta: Use Unicode UTF-8 for worldwide language support" actually do?

In some Windows 10 builds (insiders starting April 2018 and also "normal" 1903) there is a new option called "Beta: Use Unicode UTF-8 for worldwide language support".
You can see this option by going to Settings and then:
All Settings -> Time & Language -> Language -> "Administrative Language Settings"
This is what it looks like: [screenshot not preserved]
When this checkbox is checked I observe some irregularities (below) and I would like to know what exactly this checkbox does and why the below happens.
Create a brand new Windows Forms application in Visual Studio 2019. On the main form, specify the Paint event handler as follows:
private void Form1_Paint(object sender, PaintEventArgs e)
{
    Font buttonFont = new Font("Webdings", 9.25f);
    TextRenderer.DrawText(e.Graphics, "0r", buttonFont, new Point(), Color.Black);
}
Run the program. Here is what you will see if the checkbox is NOT checked: [screenshot not preserved]
However, if you check the checkbox (and reboot as asked), this changes to: [screenshot not preserved]
You can look up the Webdings font on Wikipedia. According to the character table given there, the codes for these two characters are "\U0001F5D5\U0001F5D9". If I use them instead of "0r", it works with the checkbox checked, but with the checkbox unchecked it now looks like this: [screenshot not preserved]
I would like to find a solution that always works, regardless of whether the box is checked or unchecked.
Can this be done?

You can see it in ProcMon.
It seems to set the REG_SZ values ACP, MACCP, and OEMCP in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage
to 65001.
I'm not entirely sure but it might be related to the variable gAnsiCodePage in KernelBase.dll, which GetACP reads. If you really want to, you might be able to change it dynamically for your program regardless of the system setting by dynamically disassembling GetACP to find the instruction sequence that reads gAnsiCodePage and obtaining a pointer to it, then updating the variable directly.
(Actually, I see references to an undocumented function named SetCPGlobal that would've done the job, but I can't find that function on my system. Not sure if it still exists.)

Most Windows C APIs come in two different variants:
An "A" variant that uses 8-bit strings in whatever encoding the system is configured to use. This varies depending on the configured country/language.
(Microsoft calls the configured encoding the "ANSI Code Page", but it doesn't really have anything to do with ANSI.)
A "W" variant that uses 16-bit strings in a fixed almost-UTF-16 encoding. (The "almost" is because "unpaired surrogates" are allowed; if you don't know what those are then don't worry about them.)
The official Microsoft advice is not to use the "A" versions, but to ensure your code always uses the "W" variants. That way you're supposed to get consistent behaviour no matter what the user's country/language is configured as.
However, it looks like that checkbox is doing more than one thing. It's clear it's supposed to change the "ANSI Code Page" to 65001, which means UTF-8. It looks like it's also changing font rendering to be more Unicody.
I suggest you detect whether GetACP() == 65001, then draw the Unicode version of your strings, otherwise draw the old "0r" version. I'm not sure how you do that from .NET...
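One possible way to do that from .NET is to call GetACP through P/Invoke. GetACP is a documented Win32 function exported by kernel32.dll, so the sketch below only works on Windows. The class and method names (AnsiCodePage, ChooseButtonText, ChooseButtonTextForCurrentSystem) are hypothetical, and whether switching strings this way gives the right rendering in every configuration is an assumption that has not been verified here.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper class; the names are illustrative only.
static class AnsiCodePage
{
    private const uint Utf8CodePage = 65001;

    // Win32 GetACP from kernel32.dll: returns the current ANSI code page (Windows only).
    [DllImport("kernel32.dll")]
    private static extern uint GetACP();

    // Pure helper: picks the text to draw for a given ANSI code page.
    public static string ChooseButtonText(uint ansiCodePage)
    {
        return ansiCodePage == Utf8CodePage
            ? "\U0001F5D5\U0001F5D9" // Unicode code points from the question
            : "0r";                  // legacy Webdings character codes
    }

    // Windows-only wrapper that queries the system's current ANSI code page.
    public static string ChooseButtonTextForCurrentSystem()
    {
        return ChooseButtonText(GetACP());
    }
}
```

In the Paint handler from the question, the literal "0r" passed to TextRenderer.DrawText could then be replaced by AnsiCodePage.ChooseButtonTextForCurrentSystem(). Keeping the decision in a separate method that takes the code page as a parameter makes it easy to exercise both branches without changing the system setting.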

Please look at this question to see what it solves when it is enabled: How to save to file non-ascii output of program in Powershell?
I also found the explanation written by Ghisler helpful (source):
If you check this option, Windows will use codepage 65001 (Unicode
UTF-8) instead of the local codepage like 1252 (Western Latin1) for
all plain text files. The advantage is that text files created in e.g.
Russian locale can also be read in other locale like Western or
Central Europe. The downside is that ANSI-Only programs (most older
programs) will show garbage instead of accented characters.
Here are two ways to enable it, which I think will be helpful for many users:
Win+R -> intl.cpl
Administrative tab
Click the Change system locale button.
Enable Beta: Use Unicode UTF-8 for worldwide language support
Reboot
or alternatively via reg file:
Windows Registry Editor Version 5.00
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage]
"ACP"="65001"
"OEMCP"="65001"
"MACCP"="65001"

On my Windows machine, when I checked "Beta: Use Unicode UTF-8 for worldwide language support", the following registry values in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage changed:
ACP: 936 -> 65001
MACCP: 10008 -> 65001
OEMCP: 936 -> 65001
If I leave the box unchecked, the Visual Studio compilation fails with "Exception: Bad UTF-8 encoding (U+FFFD; REPLACEMENT CHARACTER) found while decoding string: ...". If I check it, the compilation succeeds, but the OS is full of unreadable text.

C# datagridview unicode character encoding

I cannot sort out the following problem:
I use a DataGridView column to tell the user whether the item in that row has already been processed. A little Unicode icon should suffice, I thought, so I went for U+2714 (check mark) and U+2715 (cross) to achieve what I wanted. For the DataTable:
row["Done"] = (listProcessed.Contains(file.FullName)) ? "\u2714" : "\u2715";
It works well in debug and release mode on my development machine, but it fails on a Windows XP virtual machine. On that one, only narrow squares are shown, just as if it didn't know the characters.
I read somewhere that it might be due to line endings, so I tried to apply TrimEnd(null) to the strings, but that did not help.
Is there a way to make this work on Windows XP? What exactly is going wrong?
Thanks in advance.

That means that the Windows XP machine is using a font that does not contain those characters.
Use charmap to see if you can find a font which does. (try Arial Unicode MS)

How to type a grave accent/ back tick in a Visual Studio Console?

I need to type a back tick character in a Visual Studio 2010 console but I can't seem to make it happen. I know it is the Unicode character U+0060, and I tried the Alt+ method but that didn't work. After some research I added this line to my C# application, but it still doesn't let me type it: Console.OutputEncoding = System.Text.Encoding.UTF8
Is there a simple way to make it appear? I am using the Lucida Console font.
Thanks!

Alt+096 (96 decimal = 0060 in hex) should normally work. I have a vague memory from the DOS days that it had to be the Alt Gr key and that the numbers had to be typed on the number pad, but that is certainly not the case on my setup here, unless that is something keyboard-specific.
An alternative technique is to run the charmap.exe Windows accessory (you may have to install it from Add Programs/Programs and Features if it wasn't selected at the original install, but it is available in every Windows installation that I've come across since the Win 3.x days). From that you can easily copy characters to the clipboard and paste them into whatever you need.
Charmap.exe is especially useful for dealing with symbol/Wingdings-type fonts.
The final approach I know of is to simply cast 96 to a char. Have you tried:
Console.Write((char) 96);

Printing a line instead of "--------"

For a Windows CE project in which we print slips, we have a new request asking whether it is possible to print a solid line instead of printing "-----------" all the way across.
Is this possible without printing an image?
c# / .net 3.5
Thank you

On your desktop run charmap.exe. Tick "Advanced view" and type "box" in the Search box. You'll get the Unicode codepoints that you can use to draw lines and boxes. Copy and paste them into your code. Whether they actually show up properly on your device depends on the font support. Odds are decent since they've been around since the first IBM PC. You'll have to try.

There are extended ASCII values that do this (196), but it really depends on the printer.
Or, as quppa comments, use _, but that will not be adequate if you want to box in a title or similar.

Wikipedia has an article on box-drawing characters.
Since ─ (U+2500) didn't work for you, it's unlikely ━ (U+2501) will work either, but it's perhaps worth a shot. There is also no guarantee that there won't be spaces between these characters, given that spaces appear between underscores.
The issue is not Windows CE supporting Unicode but finding a font that you can use that has the box-drawing characters. Given the likely size limitations (fonts with lots of characters are tens of megabytes big), this might be a challenge.
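If a suitable font turns out to be available, building the line itself is simple. The sketch below repeats a box-drawing character with the string(char, int) constructor, which should also be usable from C# on .NET 3.5. The class and method names (SlipLines, HorizontalRule) are hypothetical, and the code cannot tell whether the printer's font actually contains the glyph.

```csharp
using System;

// Hypothetical helper class; the names are illustrative only.
static class SlipLines
{
    // Builds a horizontal rule by repeating one glyph, for example
    // '\u2500' (light horizontal) or '\u2501' (heavy horizontal).
    public static string HorizontalRule(char glyph, int width)
    {
        return new string(glyph, width);
    }
}
```

For example, SlipLines.HorizontalRule('\u2500', 32) returns a 32-character string that can be sent to the printer in place of the run of hyphens. Passing the glyph as a parameter makes it easy to try the alternatives mentioned above.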
