Convert string with Emoji to Unicode in C#

I'm getting a string from the client side like this:
This is a face :grin:
And I need to convert :grin: to Unicode in order to send it to another service.
Any clue how to do that?

Here is a link to a quite good JSON file with the relevant information. It contains a huge array (about 1500 entries) of emojis, and we are interested in two properties: "short_name", which holds a name like "grin", and "unified", which holds the Unicode code point(s) in hexadecimal, like "1F601".
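As an illustration of what the "unified" value means, here is a minimal sketch that turns it into the actual emoji string. It assumes the value is one or more hexadecimal code points separated by "-", as in "1F601" or the keycap sequence "0023-FE0F-20E3". The class and method names are hypothetical. char.ConvertFromUtf32 is a standard .NET method, and it returns a surrogate pair for code points above U+FFFF.

```csharp
using System;
using System.Linq;

public static class EmojiHex
{
    // Hypothetical helper: converts a "unified" value into the string it represents.
    // Each "-"-separated part is one hexadecimal Unicode code point.
    public static string UnifiedToString(string unified)
    {
        return string.Concat(
            unified.Split('-')
                   .Select(hex => char.ConvertFromUtf32(Convert.ToInt32(hex, 16))));
    }
}
```

For example, EmojiHex.UnifiedToString("1F601") returns a two-char string (one surrogate pair). This is the same value as the C# literal "\U0001F601".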
I built a helper class to replace short names like ":grin:" with their Unicode equivalent:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
using Newtonsoft.Json.Linq; // JArray, JObject, JValue (Json.NET)
public static class EmojiParser {
    static readonly Dictionary<string, string> _colonedEmojis;
    static readonly Regex _colonedRegex;
    static EmojiParser() {
        // load mentioned json from somewhere
        var data = JArray.Parse(File.ReadAllText(@"C:\path\to\emoji.json"));
        _colonedEmojis = data.OfType<JObject>().ToDictionary(
            // key dictionary by coloned short names
            c => ":" + ((JValue)c["short_name"]).Value.ToString() + ":",
            c => {
                var unicodeRaw = ((JValue)c["unified"]).Value.ToString();
                var chars = new List<char>();
                // some emojis are sequences of several code points ("0023-FE0F-20E3"), split them
                foreach (var point in unicodeRaw.Split('-')) {
                    // parse hex to 32-bit unsigned integer (UTF32)
                    uint unicodeInt = uint.Parse(point, System.Globalization.NumberStyles.HexNumber);
                    // convert the code point to UTF-16 chars (a surrogate pair above U+FFFF)
                    chars.AddRange(char.ConvertFromUtf32((int)unicodeInt));
                }
                // this is resulting emoji
                return new string(chars.ToArray());
            });
        // build huge regex (all 1500 emojis combined) by joining all names with OR ("|")
        _colonedRegex = new Regex(String.Join("|", _colonedEmojis.Keys.Select(Regex.Escape)));
    }
    public static string ReplaceColonNames(string input) {
        // replace each match using the dictionary
        return _colonedRegex.Replace(input, match => _colonedEmojis[match.Value]);
    }
}
Usage is obvious:
var target = "This is a face :grin: :hash:";
target = EmojiParser.ReplaceColonNames(target);
It's quite fast (except for the first run, because of the static constructor initialization). On your string it takes less than 1 ms (I was not able to measure it with Stopwatch; it always shows 0 ms). On a huge string that you will never meet in practice (1 MB of text), it takes 300 ms on my machine.


Are there methods to convert between any string and a valid variable name in C#?

I need such methods to save some information (for example, formulas) in a variable name.
Of course, it is easy to convert any string to a valid name. But I have 2 unique requirements:
1. The conversion must work in both directions: after converting there and back, we should get the original string.
That is, convert2OriginalString(Convert2Variable(originalstring)) should always equal originalstring.
2. The generated variable name should be readable, not just ugly numbers.
Thank you in advance,
Just about the only "special" character that is allowed in variable names is the underscore "_".
You could create a custom Dictionary with all of the characters you want to escape, and then iterate through it replacing "special" characters in your string with escaped characters:
// requires: using System.Collections.Generic;
private static string ConvertToSafeName(string input)
{
    var output = input;
    foreach (var lookup in GetLookups())
        output = output.Replace(lookup.Key, lookup.Value);
    return output;
}
private static string RevertToSpecialName(string input)
{
    var output = input;
    foreach (var lookup in GetLookups())
        output = output.Replace(lookup.Value, lookup.Key);
    return output;
}
private static Dictionary<string, string> GetLookups()
{
    Dictionary<string, string> lookups = new Dictionary<string, string>();
    lookups.Add("=", "_eq_");
    lookups.Add(">", "_gt_");
    lookups.Add("-", "_mn_");
    lookups.Add(" ", "__"); // double underscore for space
    return lookups;
}
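It's not 100% foolproof, because a string that already contains one of the replacement tokens (such as "_eq_" or "__") will not come back unchanged. Still, "x=y-z" translates to "x_eq_y_mn_z", converts back again, and is fairly human-readable.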
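If the round trip must hold for every input, one option is an escaping scheme where the underscore is the only escape character. This sketch is not from the answer above. The class name, the "v_" prefix and the four-digit hex format are all assumptions. The prefix keeps the result from being empty, from starting with a digit, and from colliding with a C# keyword. The scheme gives up some readability in exchange for unambiguous decoding: "x=y-z" becomes "v_x_003Dy_002Dz".

```csharp
using System;
using System.Globalization;
using System.Text;

public static class NameEscaper
{
    // Hypothetical prefix: no C# keyword starts with "v_", and the name never starts with a digit.
    const string Prefix = "v_";

    public static string Encode(string input)
    {
        var sb = new StringBuilder(Prefix);
        foreach (char c in input)
        {
            if (c < 128 && char.IsLetterOrDigit(c)) sb.Append(c);  // ASCII letters/digits pass through
            else if (c == '_') sb.Append("__");                     // escape the escape character itself
            else sb.Append('_').Append(((int)c).ToString("X4"));    // anything else: '_' + 4 hex digits
        }
        return sb.ToString();
    }

    // Assumes a name produced by Encode; malformed input will throw.
    public static string Decode(string name)
    {
        var sb = new StringBuilder();
        for (int i = Prefix.Length; i < name.Length; i++)
        {
            char c = name[i];
            if (c != '_') { sb.Append(c); continue; }
            if (name[i + 1] == '_') { sb.Append('_'); i++; continue; }  // "__" -> "_"
            sb.Append((char)int.Parse(name.Substring(i + 1, 4), NumberStyles.HexNumber));
            i += 4;                                                      // skip the 4 hex digits
        }
        return sb.ToString();
    }
}
```

Decoding is unambiguous because an escape underscore is always followed by either a second underscore or a hex digit, never by anything else.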

Casting HexNumber as character to string

I need to process a numeral as a string.
My value is 0x28 and this is the ASCII code for '('.
I need to assign this to a string.
The following lines do this.
char c = (char)0x28;
string s = c.ToString();
string s2 = ((char)0x28).ToString();
My use case is a function that only accepts strings.
My call ends up looking cluttered:
someCall( ((char)0x28).ToString() );
Is there a way of simplifying this and make it more readable without writing '(' ?
The hex number in the code is always paired with a variable that contains that hex value in its name, so "translating" it would destroy that visible connection.
A list of tuples is initialised with this, where the first item has the character in its name and the second item results from a call with that character.
One of the answers below is exactly what I am looking for, so I have incorporated it here.
{ existingStaticVar0x28, someCall("\u0028") }
The reader can now instinctively see the connection between item1 and item2 and is less likely to run into a trap when this gets refactored.
You can use a Unicode character escape sequence in place of the hex cast:
string s2 = "\u0028"; // same as '\u0028'.ToString(); \u needs exactly four hex digits, so '\u28' does not compile
Well, supposing that your input is not fixed, you could write an extension method:
using System;
namespace MyExtensions {
public static class MyStringExtensions {
    public static string ConvertFromHex(this string hexData) {
        int c = Convert.ToInt32(hexData, 16); // base 16 also accepts an optional "0x" prefix
        return new string(new char[] { (char)c });
    }
}
}
Now you could call it in your code with
string hexNumber = "0x28"; // or whatever hexcode you need to convert
string result = hexNumber.ConvertFromHex();
A bit of error handling should be added to the above conversion.
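Here is a small sketch that puts the options from this thread side by side (the class name is hypothetical). Each of the first three methods returns the one-character string "(". char.ConvertFromUtf32 is a standard .NET method. Unlike a (char) cast, it also handles code points above U+FFFF.

```csharp
using System;

public static class HexCharDemo
{
    // Three ways to get the one-character string "(" (U+0028).
    public static string ViaCast() => ((char)0x28).ToString();
    public static string ViaEscape() => "\u0028";                 // \u takes exactly four hex digits
    public static string ViaCodePoint() => char.ConvertFromUtf32(0x28);

    // A (char) cast would truncate 0x1F601; ConvertFromUtf32 returns a surrogate pair instead.
    public static string AboveBmp() => char.ConvertFromUtf32(0x1F601);
}
```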

TextElement Enumerator Class Bug or (Tamil) Unicode Bug

Why is the TextElementEnumerator not properly parsing the Tamil Unicode characters?
using System;
using System.Collections.Generic;
using System.Globalization;
namespace Glyphtest
{
    internal class Program
    {
        private static void Main()
        {
            const string unicodetxt1 = "ஊரவர் கெளவை";
            List<string> output = Syllabify(unicodetxt1);
            const string unicodetxt2 = "கௌவை";
            output = Syllabify(unicodetxt2);
        }
        public static List<string> Syllabify(string unicodetext)
        {
            if (string.IsNullOrEmpty(unicodetext)) return null;
            TextElementEnumerator enumerator = StringInfo.GetTextElementEnumerator(unicodetext);
            var data = new List<string>();
            while (enumerator.MoveNext())
                data.Add(enumerator.GetTextElement()); // one grapheme cluster per element
            return data;
        }
    }
}
The code sample above deals with the Unicode characters
'கௌ' -> 0x0B95 (க) + 0x0BCC (ௌ) (correct form)
'கௌ' -> 0x0B95 (க) + 0x0BC6 (ெ) + 0x0BB3 (ள) (incorrect form)
Is this a bug in the TextElementEnumerator class?
Why does it not enumerate the string properly?
கெளவை => 'கெள' + 'வை' should be enumerated (correct form),
not கெளவை => 'கெ' + 'ள' + 'வை' (incorrect form).
If so, how can I overcome this issue?
It is not a bug in the Unicode characters or in the TextElementEnumerator class.
It is specific to the language (Tamil):
a letter made of a Tamil consonant followed by a vowel sign,
for example
க -\u0b95
ெ -\u0bc6
ள -\u0bb3
forms the Tamil character 'கெள', which looks similar to the visual glyph formed by
க -\u0b95
ௌ -\u0bcc
and that is the right form for the solution.
Hence, before enumerating Tamil characters, we have to replace the irregular formation of the character.
As per the rule of Tamil grammar (ஔகாரக் குறுக்கம்),
the visual glyph (ௌ) comes as the starting letter of a word.
So the code above should be processed as follows:
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;
internal class Program
{
    private static void Main()
    {
        const string unicodetxt1 = "ஊரவர் கெளவை";
        List<string> output = Syllabify(unicodetxt1);
        const string unicodetxt2 = "கௌவை";
        output = Syllabify(unicodetxt2);
    }
    public static string CheckVisualGlyphPattern(string txt)
    {
        string[] data = txt.Split(new[] { ' ', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        string list = string.Empty;
        // word-initial <consonant, U+0BC6, U+0BB3> is rewritten as <consonant, U+0BCC>
        var rx = new Regex("^(.*?){1}(\u0bc6){1}(\u0bb3){1}");
        foreach (string s in data)
        {
            string outputs = rx.Replace(s, match =>
                string.Format("{0}\u0bcc", match.Groups[1].Value));
            list += string.Format("{0} ", outputs);
        }
        return list.Trim();
    }
    public static List<string> Syllabify(string unicodetext)
    {
        var processdata = CheckVisualGlyphPattern(unicodetext);
        if (string.IsNullOrEmpty(processdata)) return null;
        TextElementEnumerator enumerator = StringInfo.GetTextElementEnumerator(processdata);
        var data = new List<string>();
        while (enumerator.MoveNext())
            data.Add(enumerator.GetTextElement());
        return data;
    }
}
It produces the appropriate visual glyph while enumerating.
U+0BB3 ᴛᴀᴍɪʟ ʟᴇᴛᴛᴇʀ ʟʟᴀ has Grapheme_Cluster_Break=XX (Other). This makes the grapheme clusters <U+0B95 U+0BC6><U+0BB3> the correct ones, since there is always a grapheme cluster break before characters with Grapheme_Cluster_Break equal to Other.
<U+0B95 U+0BCC> has no internal grapheme cluster breaks because U+0BCC has Grapheme_Cluster_Break=SpacingMark, and there are usually no breaks before such characters (the exceptions are at the start of text or when preceded by a control character).
Well, at least this is what the Unicode standard has to say (http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries).
Now, I have no idea of how Tamil works, so take what follows with a pinch of salt.
U+0BCC decomposes into <U+0BC6 U+0BD7>, which means the two sequences (<U+0B95 U+0BC6 U+0BB3> and <U+0B95 U+0BCC>) are not canonically equivalent, so there is no requirement for grapheme cluster segmentation to yield the same results.
When I look at it with my Tamil-ignorant eyes, it seems U+0BD7 ᴛᴀᴍɪʟ ᴀᴜ ʟᴇɴɢᴛʜ ᴍᴀʀᴋ and U+0BB3 ᴛᴀᴍɪʟ ʟᴇᴛᴛᴇʀ ʟʟᴀ look exactly the same. However, U+0BD7 is a spacing mark, but U+0BB3 isn't. If you use U+0BD7 in the input instead of U+0BB3, the result is what you expected.
Going out on a limb, I will say that you are using the wrong character, but again, I don't know Tamil at all, so I can't be sure.
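To check this reasoning, here is a small sketch (hypothetical class name). It enumerates text elements with the same BCL enumerator and applies NFC normalization. It assumes a .NET runtime with full Unicode normalization support (ICU on Linux), since globalization-invariant mode may not provide it.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class TamilGraphemes
{
    // Splits a string into text elements (grapheme clusters) with the BCL enumerator.
    public static List<string> TextElements(string s)
    {
        var result = new List<string>();
        TextElementEnumerator e = StringInfo.GetTextElementEnumerator(s);
        while (e.MoveNext())
            result.Add(e.GetTextElement());
        return result;
    }

    // <U+0BC6 U+0BD7> is the canonical decomposition of U+0BCC,
    // so NFC composition turns it into the single vowel sign AU.
    public static string Compose(string s) => s.Normalize(NormalizationForm.FormC);
}
```

With the typed sequence <U+0B95 U+0BC6 U+0BB3 U+0BB5 U+0BC8>, this gives three text elements. The sequence <U+0B95 U+0BCC U+0BB5 U+0BC8> gives two. Normalizing <U+0B95 U+0BC6 U+0BD7> to NFC gives <U+0B95 U+0BCC>.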

How can I make nested string splits?

I have what seemed at first to be a trivial problem, but it turned out to be something I can't figure out how to solve easily. I need to be able to store lists of items in a string. Those items in turn can be a list, or some other value that may contain my separator character. I have two different methods that unpack the two different cases, but I realized I need to escape any separator characters in the contained values before using string.Split.
To illustrate the problem:
string[] nested = { "mary;john;carl", "dog;cat;fish", "plainValue" };
string list = string.Join(";", nested);
string[] unnested = list.Split(';'); // EEK! returns 7 items, expected 3!
This would produce a list "mary;john;carl;dog;cat;fish;plainValue", a value I can't split to get the three original nested strings from. Indeed, instead of the three original strings, I'd get 7 strings on split and this approach thus doesn't work at all.
What I want is to encode the values in my string so I can unpack/split the contents exactly as they were before I packed/joined them. I assume I might need to move away from string.Split and string.Join, and that is perfectly fine. I might just have overlooked some useful class or method.
How can I allow any string values to be packed / unpacked into lists?
I prefer neat, simple solutions over bulky ones, if possible.
For the curious mind, I am making extensions for PlayerPrefs in Unity3D, and I can only work with ints, floats and strings. Thus I chose strings to be my data carrier. This is why I am making this nested list of strings.
const char joinChar = '╗'; // make char const
string[] nested = { "mary;john;carl", "dog;cat;fish", "plainValue" };
string list = string.Join(Convert.ToString(joinChar), nested);
string[] unnested = list.Split(joinChar); // eureka returns 3!
Using a character outside the normal 'set' (╗ is a box-drawing character, not ASCII) allows you to join and split without breaking your logic that splits on the ; char.
Encode your strings with base64 encoding before joining.
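Here is a sketch of the Base64 suggestion (the class and method names are hypothetical). Base64 output only uses A-Z, a-z, 0-9, '+', '/' and '=', so a ';' can never occur inside an encoded item, and the items can themselves be packed lists. One caveat: an empty array and an array holding a single empty string both pack to "", so this sketch cannot tell them apart.

```csharp
using System;
using System.Linq;
using System.Text;

public static class Base64List
{
    // Encode every item as Base64 (of its UTF-8 bytes), then join with ';'.
    public static string Join(string[] items) =>
        string.Join(";", items.Select(s => Convert.ToBase64String(Encoding.UTF8.GetBytes(s))));

    // Split on ';' (safe, since Base64 never contains it) and decode each item.
    public static string[] Split(string packed) =>
        packed.Split(';')
              .Select(b => Encoding.UTF8.GetString(Convert.FromBase64String(b)))
              .ToArray();
}
```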
You get 7 items because you're splitting on the ; char, which also appears inside the values. I would suggest changing your code to:
string[] nested = { "mary;john;carl", "dog;cat;fish", "plainValue" };
string list = string.Join("#", nested);
string[] unnested = list.Split('#'); // 3 strings again
Have you considered using a different separator, e.g. "|"?
This way the joined string will be "mary;john;carl|dog;cat;fish|plainValue", and when you call list.Split('|') it will return the three original strings.
Use some other value than semicolon (;) for joining. For example - you can use comma (,) and you will get "mary;john;carl,dog;cat;fish,plainValue". When you again split it based on (,) as a separator, you should get back your original string value.
I came up with a solution of my own as well.
I could encode the length of an item, followed with the contents of an item. It would not use string.Split and string.Join at all, but it would solve my problem. The content would be untouched, and any content that need encoding could in turn use this encoding in its content space.
To illustrate the format (constant length header):
< content length > < raw content >
To illustrate the format (variable length header):
< content length > < header stop character > < raw content >
In the former, a fixed number of characters is used to describe the length of the contents. This could be plain text, hexadecimal, base64 or some other encoding.
Example with 4 hexadecimals (ffff/65535 max length):
In the latter example, we can reduce this to:
Then I could look for the first occurrence of : and parse the length first, to extract the substring that follows. After that comes the next item of the list.
A nested example could look like:
(List - 14 characters including headers)
Hello (5 characters)
World (5 characters)
(List - 10 characters including headers)
Hi (2 characters)
John (4 characters)
A drawback is that it explicitly requires the length of every item, even when no shared separator character is present (with a fixed-length header, this solution uses no separators at all).
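Here is a minimal sketch of the variable-length header format described above, using ':' as the header stop character. The names are hypothetical. Lengths are counted in UTF-16 chars (string.Length), which is an assumption, because the format does not say which unit to count. The content is read by its length rather than scanned for a separator, so an item may contain ':' or another packed list.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class LengthPrefixedList
{
    // Each item is written as "<length>:<raw content>", e.g. "5:Hello".
    public static string Pack(IEnumerable<string> items)
    {
        var sb = new StringBuilder();
        foreach (string item in items)
            sb.Append(item.Length.ToString(CultureInfo.InvariantCulture)).Append(':').Append(item);
        return sb.ToString();
    }

    public static List<string> Unpack(string packed)
    {
        var items = new List<string>();
        int pos = 0;
        while (pos < packed.Length)
        {
            int colon = packed.IndexOf(':', pos); // header stop character
            if (colon < 0) throw new FormatException("Missing ':' after length header.");
            int length = int.Parse(packed.Substring(pos, colon - pos), CultureInfo.InvariantCulture);
            items.Add(packed.Substring(colon + 1, length)); // content is taken verbatim
            pos = colon + 1 + length;                       // next header starts right after
        }
        return items;
    }
}
```

With this sketch, the nested Hello/World and Hi/John example packs to "14:5:Hello5:World10:2:Hi4:John". This matches the 14- and 10-character list sizes given above.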
Maybe not as nice as you wanted. But here goes :)
using System;
using System.IO;
using System.Text;
class Program
{
    static void Main(string[] args)
    {
        string[] str = new string[] { "From;niklas;to;lasse", "another;day;at;work;", "Bobo;wants;candy" };
        string compiledString = GetAsString(str);
        string[] backAgain = BackToStringArray(compiledString);
    }
    public static string GetAsString(string[] strings)
    {
        using (MemoryStream ms = new MemoryStream())
        using (BinaryWriter writer = new BinaryWriter(ms, Encoding.UTF8))
        {
            // item count first, then every item as a length-prefixed string
            writer.Write(strings.Length);
            for (int i = 0; i < strings.Length; ++i)
                writer.Write(strings[i]);
            writer.Flush();
            // Base64 rather than Encoding.UTF8.GetString: the length prefixes are raw bytes
            // that are not always valid UTF-8 and would otherwise be corrupted
            return Convert.ToBase64String(ms.ToArray());
        }
    }
    public static string[] BackToStringArray(string encodedString)
    {
        byte[] toBytes = Convert.FromBase64String(encodedString);
        using (MemoryStream stream = new MemoryStream(toBytes))
        using (BinaryReader reader = new BinaryReader(stream, Encoding.UTF8))
        {
            int numStrings = reader.ReadInt32();
            string[] returnStrings = new string[numStrings];
            for (int i = 0; i < numStrings; ++i)
                returnStrings[i] = reader.ReadString();
            return returnStrings;
        }
    }
}

Can .NET load and parse a properties file equivalent to Java Properties class?

Is there an easy way in C# to read a properties file that has each property on a separate line followed by an equals sign and the value, such as the following:
CustomProperty=Any value
In Java, the Properties class handles this parsing easily:
Properties myProperties=new Properties();
FileInputStream fis = new FileInputStream (new File("CustomProps.properties"));
myProperties.load(fis);
System.out.println(myProperties.getProperty("CustomProperty"));
I can easily load the file in C# and parse each line, but is there a built in way to easily get a property without having to parse out the key name and equals sign myself? The C# information I have found seems to always favor XML, but this is an existing file that I don't control and I would prefer to keep it in the existing format as it will require more time to get another team to change it to XML than parsing the existing file.
No, there is no built-in support for this.
You have to make your own "INIFileReader".
Maybe something like this?
var data = new Dictionary<string, string>();
foreach (var row in File.ReadAllLines(PATH_TO_FILE))
data.Add(row.Split('=')[0], string.Join("=",row.Split('=').Skip(1).ToArray()));
Edit: Updated to reflect Paul's comment.
Final class. Thanks @eXXL.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
public class Properties
{
    private Dictionary<String, String> list;
    private String filename;
    public Properties(String file)
    {
        reload(file);
    }
    public String get(String field, String defValue)
    {
        return (get(field) == null) ? (defValue) : (get(field));
    }
    public String get(String field)
    {
        return (list.ContainsKey(field)) ? (list[field]) : (null);
    }
    public void set(String field, Object value)
    {
        if (!list.ContainsKey(field))
            list.Add(field, value.ToString());
        else
            list[field] = value.ToString();
    }
    public void Save()
    {
        Save(this.filename);
    }
    public void Save(String filename)
    {
        this.filename = filename;
        // StreamWriter creates the file if it does not exist; "using" closes it afterwards
        using (StreamWriter file = new StreamWriter(filename))
        {
            foreach (String prop in list.Keys.ToArray())
                if (!String.IsNullOrWhiteSpace(list[prop]))
                    file.WriteLine(prop + "=" + list[prop]);
        }
    }
    public void reload()
    {
        reload(this.filename);
    }
    public void reload(String filename)
    {
        this.filename = filename;
        list = new Dictionary<String, String>();
        if (File.Exists(filename))
            loadFromFile(filename);
    }
    private void loadFromFile(String file)
    {
        foreach (String line in File.ReadAllLines(file))
        {
            if ((!String.IsNullOrEmpty(line)) &&
                (!line.StartsWith(";")) &&
                (!line.StartsWith("#")) &&
                (!line.StartsWith("'")) &&
                (line.Contains("=")))
            {
                int index = line.IndexOf('=');
                String key = line.Substring(0, index).Trim();
                String value = line.Substring(index + 1).Trim();
                if (value.Length >= 2 &&
                    ((value.StartsWith("\"") && value.EndsWith("\"")) ||
                     (value.StartsWith("'") && value.EndsWith("'"))))
                    value = value.Substring(1, value.Length - 2);
                // ignore duplicates: keep the first occurrence of a key
                if (!list.ContainsKey(key))
                    list.Add(key, value);
            }
        }
    }
}
Sample use:
Properties config = new Properties(fileConfig);
//get value with default value
com_port.Text = config.get("com_port", "1");
//set value
config.set("com_port", com_port.Text);
Most Java ".properties" files can be split by assuming the "=" is the separator - but the format is significantly more complicated than that and allows for embedding spaces, equals, newlines and any Unicode characters in either the property name or value.
I needed to load some Java properties for a C# application so I have implemented JavaProperties.cs to correctly read and write ".properties" formatted files using the same approach as the Java version - you can find it at http://www.kajabity.com/index.php/2009/06/loading-java-properties-files-in-csharp/.
There, you will find a zip file containing the C# source for the class and some sample properties files I tested it with.
Yet another answer (in January 2018) to the old question (in January 2009).
The specification of the Java properties file format is described in the JavaDoc of java.util.Properties.load(java.io.Reader). One problem is that the specification is a bit more complicated than it first appears. Another problem is that some answers here arbitrarily added extra rules. For example, they treat ; and ' as starting comment lines, but they should not. They also remove double/single quotation marks around property values, but they should not.
The following are points to be considered.
There are two kinds of line, natural lines and logical lines.
A natural line is terminated by \n, \r, \r\n or the end of the stream.
A logical line may be spread out across several adjacent natural lines by escaping the line terminator sequence with a backslash character \.
Any white space at the start of the second and following natural lines in a logical line are discarded.
White space characters are space (' ', \u0020), tab (\t, \u0009) and form feed (\f, \u000C).
As stated explicitly in the specification, "it is not sufficient to only examine the character preceding a line terminator sequence to decide if the line terminator is escaped; there must be an odd number of contiguous backslashes for the line terminator to be escaped. Since the input is processed from left to right, a non-zero even number of 2n contiguous backslashes before a line terminator (or elsewhere) encodes n backslashes after escape processing."
= is used as the separator between a key and a value.
: is used as the separator between a key and a value, too.
The separator between a key and a value can be omitted.
A comment line has # or ! as its first non-white space characters, meaning leading white spaces before # or ! are allowed.
A comment line cannot be extended to the next natural lines, even if its line terminator is preceded by \.
As stated explicitly in the specification, =, : and white spaces can be embedded in a key if they are escaped by backslashes.
Even line terminator characters can be included using \r and \n escape sequences.
If a value is omitted, an empty string is used as a value.
\uxxxx is used to represent a Unicode character.
A backslash character before a non-valid escape character is not treated as an error; it is silently dropped.
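The line-joining rules above are what trip up naive split-on-"=" parsers. As an illustration only, here is a minimal sketch that turns file text into logical lines. It counts contiguous trailing backslashes (an odd count means the terminator is escaped) and drops leading white space on every natural line. It also skips blank and comment lines, but only at the start of a logical line. It deliberately stops there: it does not split keys from values or process escapes such as \uxxxx, and the class name is hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class JavaPropertiesLines
{
    // White space as defined above: space, tab, form feed.
    static bool IsWhite(char c) => c == ' ' || c == '\t' || c == '\f';

    // An odd number of contiguous trailing backslashes escapes the line terminator.
    static bool EndsWithEscapedTerminator(string s)
    {
        int n = 0;
        for (int i = s.Length - 1; i >= 0 && s[i] == '\\'; i--) n++;
        return n % 2 == 1;
    }

    public static List<string> LogicalLines(string text)
    {
        // Natural lines end with \r\n, \r or \n.
        string[] natural = text.Replace("\r\n", "\n").Replace('\r', '\n').Split('\n');
        var result = new List<string>();
        string current = null; // non-null while a logical line is being continued
        foreach (string raw in natural)
        {
            int start = 0;
            while (start < raw.Length && IsWhite(raw[start])) start++;
            string line = raw.Substring(start); // leading white space is discarded
            if (current == null)
            {
                if (line.Length == 0) continue;                 // blank line
                if (line[0] == '#' || line[0] == '!') continue; // comment lines are never continued
            }
            bool continued = EndsWithEscapedTerminator(line);
            if (continued) line = line.Substring(0, line.Length - 1); // drop the escaping backslash
            current = (current ?? "") + line;
            if (!continued)
            {
                result.Add(current);
                current = null;
            }
        }
        if (current != null) result.Add(current); // input ended right after a continuation
        return result;
    }
}
```

For example, the three natural lines `key\`, `  4=value\` and `    4` join into the single logical line `key4=value4`. An even run of backslashes before a line end, such as `a=b\\`, is left for the later escape-processing step.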
So, for example, if test.properties has the following content:
# A comment line that starts with '#'.
# This is a comment line having leading white spaces.
! A comment line that starts with '!'.
key2 : value2
key3 value3
\:\ \= = \\colon\\space\\equal
it should be interpreted as the following key-value pairs.
| key1 | value1 |
| key2 | value2 |
| key3 | value3 |
| key4 | value4 |
| key5 | value5 |
| key6 | value6 |
| : = | \colon\space\equal |
PropertiesLoader class in Authlete.Authlete NuGet package can interpret the format of the specification. The example code below:
using System;
using System.IO;
using System.Collections.Generic;
using Authlete.Util;
namespace MyApp
{
    class Program
    {
        public static void Main(string[] args)
        {
            string file = "test.properties";
            IDictionary<string, string> properties;
            using (TextReader reader = new StreamReader(file))
            {
                properties = PropertiesLoader.Load(reader);
            }
            foreach (var entry in properties)
            {
                Console.WriteLine($"{entry.Key} = {entry.Value}");
            }
        }
    }
}
will generate this output:
key1 = value1
key2 = value2
key3 = value3
key4 = value4
key5 = value5
key6 = value6
: = = \colon\space\equal
An equivalent example in Java is as follows:
import java.util.*;
import java.io.*;
public class Program {
    public static void main(String[] args) throws IOException {
        String file = "test.properties";
        Properties properties = new Properties();
        try (Reader reader = new FileReader(file)) {
            properties.load(reader);
        }
        for (Map.Entry<Object, Object> entry : properties.entrySet()) {
            System.out.format("%s = %s\n", entry.getKey(), entry.getValue());
        }
    }
}
The source code, PropertiesLoader.cs, can be found in authlete-csharp. xUnit tests for PropertiesLoader are written in PropertiesLoaderTest.cs.
I've written a method that allows empty lines, commenting out and quoting within the file.
;var4=outcommented, too
Here's the method:
// requires: using System.Collections.Generic; using System.IO;
public static IDictionary<string, string> ReadDictionaryFile(string fileName)
{
    Dictionary<string, string> dictionary = new Dictionary<string, string>();
    foreach (string line in File.ReadAllLines(fileName))
    {
        if ((!string.IsNullOrEmpty(line)) &&
            (!line.StartsWith(";")) &&
            (!line.StartsWith("#")) &&
            (!line.StartsWith("'")) &&
            (line.Contains("=")))
        {
            int index = line.IndexOf('=');
            string key = line.Substring(0, index).Trim();
            string value = line.Substring(index + 1).Trim();
            if (value.Length >= 2 &&
                ((value.StartsWith("\"") && value.EndsWith("\"")) ||
                 (value.StartsWith("'") && value.EndsWith("'"))))
                value = value.Substring(1, value.Length - 2);
            dictionary.Add(key, value); // throws on duplicate keys
        }
    }
    return dictionary;
}
Yeah there's no built in classes to do this that I'm aware of.
But that shouldn't really be an issue, should it? It looks easy enough to parse just by storing the result of StreamReader.ReadToEnd() in a string, splitting it on new lines, and then splitting each record on the = character. What you'd be left with is a bunch of key-value pairs that you can easily toss into a dictionary.
Here's an example that might work for you:
public static Dictionary<string, string> GetProperties(string path)
{
    string fileData = "";
    using (StreamReader sr = new StreamReader(path))
    {
        fileData = sr.ReadToEnd().Replace("\r", "");
    }
    Dictionary<string, string> Properties = new Dictionary<string, string>();
    string[] kvp;
    string[] records = fileData.Split("\n".ToCharArray());
    foreach (string record in records)
    {
        // split on the first '=' only, so values may contain '='; skip lines without one
        kvp = record.Split(new[] { '=' }, 2);
        if (kvp.Length == 2)
            Properties.Add(kvp[0], kvp[1]);
    }
    return Properties;
}
Here's an example of how to use it:
Dictionary<string,string> Properties = GetProperties("data.txt");
Console.WriteLine("Hello: " + Properties["Hello"]);
The real answer is no (at least not by itself). You can still write your own code to do it.
C# generally uses XML-based config files rather than the *.ini-style file you describe, so there's nothing built-in to handle this. However, Google returns a number of promising results.
I don't know of any built-in way to do this. However, it would seem easy enough to do, since the only delimiters you have to worry about are the newline character and the equals sign.
It would be very easy to write a routine that will return a NameValueCollection, or an IDictionary given the contents of the file.
You can also use C# automatic property syntax with default values and a restrictive set. The advantage here is that you can then have any kind of data type in your properties "file" (now actually a class). The other advantage is that you can use C# property syntax to invoke the properties. However, you just need a couple of lines for each property (one in the property declaration and one in the constructor) to make this work.
using System;
namespace ReportTester {
    class TestProperties {
        internal String ReportServerUrl { get; private set; }
        internal TestProperties() {
            ReportServerUrl = "http://myhost/ReportServer/ReportExecution2005.asmx?wsdl";
        }
    }
}
There are several NuGet packages for this, but all are currently in pre-release version.
Capgemini.Cauldron.Core.JavaProperties 2.0.39-beta
Kajabity.Tools.Java 0.2.6638.28124
As of June 2018, Capgemini.Cauldron.Core.JavaProperties is now in a stable version (version 2.1.0 and 3.0.20).
I realize that this isn't exactly what you're asking, but just in case:
When you want to load an actual Java properties file, you'll need to accommodate its encoding. The Java docs indicate that the encoding is ISO 8859-1, which contains some escape sequences that you might not interpret correctly. For instance, look at this SO answer to see what's necessary to turn UTF-8 into ISO 8859-1 (and vice versa).
When we needed to do this, we found an open-source PropertyFile.cs and made a few changes to support the escape sequences. This class is a good one for read/write scenarios. You'll need the supporting PropertyFileIterator.cs class as well.
Even if you're not loading true Java properties, make sure that your prop file can express all the characters you need to save (UTF-8 at least)
No, there is not, but I have created a simple class to help:
using System;
using System.Collections;
using System.Text.RegularExpressions;
public class PropertiesUtility
{
    private Hashtable ht = new Hashtable();
    public void loadProperties(string path)
    {
        string[] lines = System.IO.File.ReadAllLines(path);
        bool readFlag = false;
        foreach (string line in lines)
        {
            // note: this removes all white space, including white space inside values
            string text = Regex.Replace(line, @"\s+", "");
            readFlag = checkSyntax(text);
            if (readFlag)
            {
                string[] splitText = text.Split('=');
                ht.Add(splitText[0].ToLower(), splitText[1]);
            }
        }
    }
    private bool checkSyntax(string line)
    {
        // skip empty lines and [section] headers
        if (String.IsNullOrEmpty(line) || line[0].Equals('['))
            return false;
        if (line.Contains("=") && !String.IsNullOrEmpty(line.Split('=')[0]) && !String.IsNullOrEmpty(line.Split('=')[1]))
            return true;
        else
            throw new Exception("Can not Parse Properties file please verify the syntax");
    }
    public string getProperty(string key)
    {
        // keys were stored lower-cased, so look them up the same way
        if (ht.Contains(key.ToLower()))
            return ht[key.ToLower()].ToString();
        else
            throw new Exception("Property: " + key + " does not exist");
    }
}
Hope this helps.

