I came across this simple code to output ASCII to the console:
Console.Write((char)1); //Output ☺
The thing is, it only works when I change the console font to Raster Fonts, and it's ugly. I mean, look at those old text-based games: how did they draw ASCII art like this?
The Amazing Adventures of ANSI Dude, Snipes
How can I draw nice ASCII art on that console?
Unless for some reason you are restricted to ASCII characters, you should use proper Unicode characters. That avoids potential conflicts with mapping control characters (0-31) to printable characters, and it lets you use lines and borders directly with the .NET String type without going through encodings (line and border characters belong to "extended ASCII" code pages and, unlike the regular 7-bit ASCII codes 0-127, are not mapped directly to the same Unicode code points).
Unicode "\u263a" would produce face you are looking for. For the borders and lines drawing use characters from Unicode box drawing range, for more characters see overall table http://unicode.org/charts/.
I am working on a foreign file format that was apparently developed in Japan. Most of their strings are stored with UTF-8 encoding in the 3-byte format (i.e. the capital A is represented as 0xEF,0xBC,0xA1). While it is no problem to decode such strings in .NET, I could not find a way to force the framework to output in the same format, as it will default to the abbreviated form (makes sense, but really I do need the 3-byte form).
Is there any standard functionality that will take care of this? Me being lazy I do not want to implement it myself :)
That's not the letter 'A'. It's a different character: FULLWIDTH LATIN CAPITAL LETTER A (U+FF21). Notice the extra spacing in 'Ａ'.
This isn't a different UTF-8 format; it's a different character. Whoever produced this kind of file either made a mistake or intentionally used those glyphs for layout purposes.
If you want to produce similar text, you'll have to find out how those characters are used in the first place, e.g. for some words, every word, or specific sections. Then you'll have to modify your own text to match, e.g. by replacing normal letters with their fullwidth equivalents.
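If you do need to emit that 3-byte form yourself, here is a minimal sketch of such a conversion; the helper name ToFullwidth is my own, as the framework has no built-in for this direction (printable ASCII U+0021..U+007E shifts by 0xFEE0 into the fullwidth block):

using System;
using System.Text;

static string ToFullwidth(string s)
{
    var sb = new StringBuilder(s.Length);
    foreach (char c in s)
    {
        if (c == ' ')
            sb.Append('\u3000');           // fullwidth (ideographic) space
        else if (c >= '\u0021' && c <= '\u007E')
            sb.Append((char)(c + 0xFEE0)); // ASCII -> U+FF01..U+FF5E
        else
            sb.Append(c);                  // leave everything else alone
    }
    return sb.ToString();
}

// Capital A comes out as the 3-byte UTF-8 sequence from the question:
byte[] bytes = Encoding.UTF8.GetBytes(ToFullwidth("A"));
Console.WriteLine(BitConverter.ToString(bytes)); // EF-BC-A1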
You can convert such strings with String.Normalize, using the KC or KD normalisation forms. For example, the following expression:
"ＡA".Normalize(System.Text.NormalizationForm.FormKC)
Returns:
"AA"
I want to apply superscript to a string for display.
It works fine with numbers in superscript, but doesn't work for letter characters.
Suggestions needed.
Works fine for:
var o2 = "O₂"; // or "O\x2082"
var unit2 = "unit²"; // or "unit\xB2"
Does not work for:
var xyz = "ABC365\xBTM"
Cannot get TM superscripted after the string ABC365.
Suggestions appreciated.
You seem to have completely misunderstood what is going on here, so I'll try a very basic explanation.
Unicode defines a large number of characters (1,114,112 possible code points, U+0000 through U+10FFFF, though far fewer are assigned). These came from a large number of historic sources, and there's no great rhyme or reason about which characters made it in and which didn't. The available characters include some subscript and superscript digits: for example, x2082 is subscript 2, and x00B2 is superscript 2. It also includes some special symbols, such as the trademark symbol x2122, which are traditionally rendered with a superscript appearance.
But there's no general mechanism in Unicode to render an arbitrary character in superscript or subscript rendition. If you want to write X with a subscript italic n, Unicode won't help you: to achieve that, you have to resort to mechanisms outside Unicode, specifically HTML tagging. HTML allows you to render anything as subscript or superscript; Unicode only handles a few select cases.
C# recognizes the escape sequence \x followed by one to four hex digits (and \u followed by exactly four) to represent special characters by their Unicode code point value. So since there's a code point x2082 meaning subscript 2, you can write it as \u2082 (or \x2082) in a string literal. But there's no code point for subscript-lowercase-italic n, so there's no way of representing that.
Now when you write \xBTM it should be clear that's nonsense: \x consumes as many hex digits as it finds (here just the B, giving U+000B, a vertical tab), and the letters TM are then taken literally. If you want the trademark symbol, you can use \u2122. If you want the two characters "T" and "M" in superscript rendition, you're out of luck; if you need to pass that sort of thing around in your application, you will need to pass strings containing HTML markup rather than just plain Unicode.
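A minimal sketch of the difference (the string contents are illustrative):

var trademark = "ABC365\u2122"; // "ABC365™" -- the one-character trademark symbol

// What "\xBTM" actually parses as: \x grabs the hex digit B and stops at T,
// so you get U+000B (vertical tab) followed by the plain letters "TM".
var surprise = "ABC365\xBTM";

// Superscript "TM" as two ordinary letters needs markup, not Unicode:
var markup = "ABC365<sup>TM</sup>"; // rendering is up to the consumer (HTML, XSLT, ...)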
You indicate that you're trying to create strings that will be used as input to an XSLT transformation. My suggestion would be to pass XML documents rather than plain strings, but I would need to understand the requirement in better detail before saying that's definitively the right solution.
Is there an encoding where an accented character like á or ä is treated as a single character?
And if not, what would be the most commonly used encoding today? I'm using UTF-7 currently; how compatible is that with other encodings?
Thanks
You might think about what you're asking for. You're asking for an encoding that will recognize 'á' and turn it into 'a'. That's a converter, not an encoding. It would have to know what encoding the source is in so that it can convert to whatever encoding you're using.
Wait, maybe that's not what you're asking. There are encodings that treat those as single bytes. For example, the ISO-8859-1 encoding (also called Latin-1) treats many accented characters as a single byte.
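A minimal sketch of that difference (this assumes the composed form of "á", i.e. U+00E1):

using System;
using System.Text;

Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");
Console.WriteLine(latin1.GetBytes("á").Length);        // 1 -- single byte 0xE1
Console.WriteLine(Encoding.UTF8.GetBytes("á").Length); // 2 -- UTF-8 needs two bytes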
(The following is struck out because I was talking about ASCII, not UTF-7 ... long day.)
UTF-7 isn't particularly compatible with many other encodings. It has 128 possible values: just enough space for the 52 letters (upper and lower case, combined) used in the Latin alphabet, the 10 numerals, 32 control characters, and various punctuation marks. But it's not sufficient for Spanish, for example, which has upside-down questions marks and exclamation points as well as other things.
UTF-7 is "compatible" with other encodings in that it can represent the entire Unicode character set. But only some characters (known as the "direct characters") and a few control characters can be directly encoded as single ASCII bytes. Those characters will be the same as in UTF-8 and in many single-byte character sets. All other characters are represented by sequences, and will be different from any other encoding.
The most commonly used encoding today? On the Web, UTF-8 is used a lot. It's also the default encoding used when you create a StreamWriter. For the work I do (mostly English, and Western European character sets), it works better than anything else.
Now, it's possible that what you're looking for is something that will treat 'á' and 'a' as the same in comparisons. That's a different question. See Performing Culture-Insensitive String Comparisons for information on that.
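A minimal sketch of that kind of comparison, where CompareOptions.IgnoreNonSpace makes combining accents irrelevant:

using System;
using System.Globalization;

int result = string.Compare("á", "a",
    CultureInfo.InvariantCulture, CompareOptions.IgnoreNonSpace);
Console.WriteLine(result == 0); // True: "á" and "a" compare as equal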
This doesn't seem to have anything to do with encodings. In C# it doesn't matter what encoding you use for storage and transmission, the strings of characters are always internally in UTF-16 and ä is always 1 char long in composed form.
If "ä".Length is giving 2 to you, your string is in decomposed form and all you need to do is
string str = "ä"; //a + U+0308, .Length == 2
str = str.Normalize(NormalizationForm.FormC); //just ä now, with Length == 1
Sorry for the confusion over this issue. I finally found what I was looking for: my text needed to use the Windows-1250 (Central European (Windows)) code page, because that is what a lot of other programs that correctly support characters like €đłŁ¤ etc. use.
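For anyone with the same need, a minimal sketch of using that code page; on .NET Core / .NET 5+ this assumes a reference to the System.Text.Encoding.CodePages package, while on .NET Framework the GetEncoding call alone suffices:

using System;
using System.Text;

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance); // .NET Core/5+ only
Encoding cp1250 = Encoding.GetEncoding(1250); // Central European (Windows)

byte[] bytes = cp1250.GetBytes("€đłŁ¤");         // one byte per character
Console.WriteLine(BitConverter.ToString(bytes)); // 80-F0-B3-A3-A4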
Thanks for all the help I got; it was a useful learning experience.
I'd like to iterate over a character list. The character list should be a subset of a Unicode character set. Unicode has a lot of code points, and AFAIK Unicode includes all characters for all cultures.
What I want is to select a certain subset of Unicode characters depending on a specific culture, since a specific culture doesn't use all Unicode characters.
Is this possible?
I'm trying to draw a set of characters for OpenGL texture generation, so I can render font glyphs with OpenGL using a texture (very simple, fast enough). I'm already supporting the ASCII character set, since it has fewer than 256 displayable characters, but with Unicode I need to select a subset of characters, otherwise the resulting OpenGL texture would be unmanageable.
What I'm trying to do is select a subset of Unicode characters depending on the requested culture. I cannot think of another top-level filter besides culture.
I think it would be fair to say that a specific culture does not use most Unicode characters.
Check out the current standard. I don't think there is a direct correlation between cultures and scripts; this previous question touches on the problem.
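One workable compromise is to pick Unicode blocks per culture by hand, since the framework has no culture-to-script table. A minimal sketch under that assumption (the block choices below are my own guesses, and UnicodeRanges requires .NET Core 3.0+ or the System.Text.Encodings.Web package):

using System;
using System.Collections.Generic;
using System.Text.Unicode;

static IEnumerable<UnicodeRange> RangesFor(string cultureName) =>
    cultureName switch
    {
        "el-GR" => new[] { UnicodeRanges.BasicLatin, UnicodeRanges.GreekandCoptic },
        "ru-RU" => new[] { UnicodeRanges.BasicLatin, UnicodeRanges.Cyrillic },
        _       => new[] { UnicodeRanges.BasicLatin, UnicodeRanges.Latin1Supplement },
    };

foreach (UnicodeRange range in RangesFor("el-GR"))
    for (int cp = range.FirstCodePoint; cp < range.FirstCodePoint + range.Length; cp++)
        if (!char.IsControl((char)cp))       // skip controls; unassigned code points
            Console.Write((char)cp);         // would also need filtering in practice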
Looking for some black magic that will match any string with "weird" characters in it. Standard ASCII characters are fine. Everything else isn't.
This is for sanitizing various web forms.
This gets anything out of the ASCII range
[^\x00-\x7F]
There are still some "weird" characters like \x00 (NUL), but they are valid ASCII.
For reference, see the ASCII table
Use [^\p{IsBasicLatin}] for exactly what is asked for, [^\x00-\x7F] for concision over self-documentation, or \p{C} for clearing out format and control characters without hurting other non-ASCII characters (and with greater concision yet).
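In C#, a minimal sketch of both detecting and stripping such characters (the input string is illustrative):

using System;
using System.Text.RegularExpressions;

string input = "héllo\u2122 world";

Console.WriteLine(Regex.IsMatch(input, @"[^\x00-\x7F]"));     // True: non-ASCII present
Console.WriteLine(Regex.Replace(input, @"[^\x00-\x7F]", "")); // "hllo world"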