convert unicode escape sequences to string - c#

Hi, I have this problem. From the server I get a JSON string containing Unicode escape sequences, and I need to convert these sequences to a Unicode string. I found some solutions, but none of them works for every JSON response.
For example from server I get this string.
string encodedText="{\"DATA\":{\"idUser\":18167521,\"nick\":\"KecMessanger2\",\"photo\":\"1\",\"sex\":1,\"photoAlbums\":0,\"videoAlbums\":0,\"sefNick\":\"kecmessanger2\",\"profilPercent\":0,\"emphasis\":false,\"age\":25,\"isBlocked\":false,\"PHOTO\":{\"normal\":\"http://213.215.107.125/fotky/1816/75/n_18167521.jpg?v=1\",\"medium\":\"http://213.215.107.125/fotky/1816/75/m_18167521.jpg?v=1\",\"24x24\":\"http://213.215.107.125/fotky/1816/75/s_18167521.jpg?v=1\"},\"PLUS\":{\"active\":false,\"activeTo\":\"0000-00-00\"},\"LOCATION\":{\"idRegion\":\"1\",\"regionName\":\"Banskobystricku00fd kraj\",\"idCity\":\"109\",\"cityName\":\"Rimavsku00e1 Sobota\"},\"STATUS\":{\"isLoged\":true,\"isChating\":false,\"idChat\":0,\"roomName\":\"\",\"lastLogin\":1291898043},\"PROJECT_STATUS\":{\"photoAlbums\":0,\"photoAlbumsFavs\":0,\"videoAlbums\":0,\"videoAlbumsFavs\":0,\"videoAlbumsExts\":0,\"blogPosts\":0,\"emailNew\":0,\"postaNew\":0,\"clubInvitations\":0,\"dashboardItems\":26},\"STATUS_MESSAGE\":{\"statusMessage\":\"Nepru00edtomnu00fd.\",\"addTime\":\"1291887539\"},\"isFriend\":false,\"isIamFriend\":false}}";
statusMessage in the JSON string contains Nepru00edtomnu00fd; as a .NET Unicode string it should be Neprítomný.
regionName in the JSON string contains Banskobystricku00fd; as a .NET Unicode string it should be Banskobystrický.
Other examples:
Nepru00edtomnu00fd -> Neprítomný
Banskobystricku00fd -> Banskobystrický
Trenu010du00edn -> Trenčín
I need to convert the Unicode escape sequences to a .NET string in the Slovak language.
For the conversion I used this function:
private static string UnicodeStringToNET(string input)
{
var regex = new Regex(@"\\[uU]([0-9A-F]{4})", RegexOptions.IgnoreCase);
return input = regex.Replace(input, match => ((char)int.Parse(match.Groups[1].Value,
NumberStyles.HexNumber)).ToString());
}
Where could the problem be?

Here's a method (based on previous answers) that I wrote to do the job. It handles both \uhhhh and \Uhhhhhhhh, and it will preserve escaped unicode escapes (so if your string needs to contain a literal \uffff, you can do that). The temporary placeholder character \uf00b is in a private use area, so it shouldn't typically occur in Unicode strings.
public static string ParseUnicodeEscapes(string escapedString)
{
const string literalBackslashPlaceholder = "\uf00b";
const string unicodeEscapeRegexString = @"(?:\\u([0-9a-fA-F]{4}))|(?:\\U([0-9a-fA-F]{8}))";
// Replace escaped backslashes with something else so we don't
// accidentally expand escaped unicode escapes.
string workingString = escapedString.Replace("\\\\", literalBackslashPlaceholder);
// Replace unicode escapes with actual unicode characters.
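workingString = new Regex(unicodeEscapeRegexString).Replace(workingString,
// \uhhhh is a single UTF-16 code unit; \Uhhhhhhhh may lie outside the BMP and need a surrogate pair.
match => match.Groups[1].Success
? ((char) Int32.Parse(match.Groups[1].Value, NumberStyles.HexNumber)).ToString()
: char.ConvertFromUtf32(Int32.Parse(match.Groups[2].Value, NumberStyles.HexNumber)));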
// Replace the escaped backslash placeholders with non-escaped literal backslashes.
workingString = workingString.Replace(literalBackslashPlaceholder, "\\");
return workingString;
}

Your escape sequences do not start with a \ (as "\u00fd" would), so your Regex should be only
"[uU]([0-9A-F]{4})"
...
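The suggestion above can be sketched roughly as follows. This is only an illustration, not the answerer's code: the class name BareUnicodeEscapes and the method Decode are hypothetical, and the optional leading backslash in the pattern is an added assumption so that both "u00fd" and "\u00fd" are accepted. Note that a pattern this loose will also rewrite any ordinary text where the letter u happens to be followed by four hex digits.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class BareUnicodeEscapes
{
    // 'u' or 'U' followed by exactly four hex digits; the leading backslash is optional (assumption).
    private static readonly Regex Escape = new Regex(@"\\?[uU]([0-9A-Fa-f]{4})");

    public static string Decode(string input)
    {
        // Turn each four-digit hex group into the UTF-16 code unit it names.
        return Escape.Replace(input, m =>
            ((char)int.Parse(m.Groups[1].Value, NumberStyles.HexNumber)).ToString());
    }
}
```

Calling BareUnicodeEscapes.Decode on the statusMessage value from the question should turn the u00ed and u00fd groups into í and ý.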

Related

How to convert a string to JSON friendly string?

I have a string (text) that I would like to convert using a JSON parser so that it is javascript friendly.
In my view page I have some javascript that looks like:
var site = {
strings: {
addToCart: @someValue,
So @someValue should be JavaScript-safe: wrapped in double quotes, with characters escaped where needed, etc.
That value @someValue is a string, but it has to be JavaScript-friendly, so I want to parse it using JSON.
Does the new System.Text.Json have something?
I tried this:
return System.Text.Json.JsonDocument.Parse(input).ToString();
But this doesn't work because my text is just a string, not a JSON string.
Is there another way to parse something?

The rules for escaping strings to make them JSON safe are as follows:
Backspace is replaced with \b
Form feed is replaced with \f
Newline is replaced with \n
Carriage return is replaced with \r
Tab is replaced with \t
Double quote is replaced with \"
Backslash is replaced with \\
And while it's not strictly necessary, any non-web-safe character (i.e. any non-ASCII character) can be converted to its escaped Unicode equivalent to avoid potential encoding issues.
From this, it's pretty straightforward to create your own conversion method:
public static string MakeJsonSafe(String s)
{
var jsonEscaped = s.Replace("\\", "\\\\")
.Replace("\"", "\\\"")
.Replace("\b", "\\b")
.Replace("\f", "\\f")
.Replace("\n", "\\n")
.Replace("\r", "\\r")
.Replace("\t", "\\t");
var nonAsciiEscaped = jsonEscaped.Select((c) => c >= 127 ? "\\u" + ((int)c).ToString("X").PadLeft(4, '0') : c.ToString());
return string.Join("", nonAsciiEscaped);
}
(Like I said, the nonAsciiEscaped stage can be omitted as it's not strictly necessary.)
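Since the question asks about System.Text.Json specifically, here is a minimal sketch of letting the serializer do the escaping instead of a hand-written method. It assumes System.Text.Json is available in the project; JsonStringHelper and its two methods are hypothetical names. The serializer returns the text as a complete JSON string literal, surrounding double quotes included, and its default encoder may pick different (but equally valid) escapes than the rules listed above, for example \u0022 rather than \" for a double quote.

```csharp
using System.Text.Json;

// Hypothetical helper, for illustration only.
public static class JsonStringHelper
{
    // Returns the input as a JSON string literal, including the surrounding double quotes.
    public static string ToJsonLiteral(string text)
    {
        return JsonSerializer.Serialize(text);
    }

    // Reverses ToJsonLiteral: parses a JSON string literal back into plain text.
    public static string FromJsonLiteral(string json)
    {
        return JsonSerializer.Deserialize<string>(json);
    }
}
```

Because the result already carries its own quotes, it can be written into the script without adding any around it.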

How to replace the single slash to double slash in C#?

Replacing a single slash with a double slash is not working; it always returns the single slash.
string input;
input = "\r\t";
string mat1= input.Replace("\\\\","\\\\\\\\");
string inputt= mat1;
When I run the above code, the output is \r\t only,
but I need output like this:
\\r\\t

"\r\t" is in fact just two characters, carriage return and tab. This is because the \ escape character is used to specify special characters.
If you want to have a string that is actually "\r\t" you need to escape the \ characters by using \\.
So your string should be:
input = "\\r\\t";
Or
input = @"\r\t";
And then to replace the backslashes with double backslashes:
string mat1= input.Replace("\\","\\\\");
Or
string mat1= input.Replace(@"\", @"\\");
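The difference between the literals is easy to check by looking at string lengths. The sketch below is only an illustration; BackslashDemo and DoubleBackslashes are hypothetical names.

```csharp
// Hypothetical helper, for illustration only.
public static class BackslashDemo
{
    // Replaces every single backslash with two backslashes.
    public static string DoubleBackslashes(string input)
    {
        return input.Replace(@"\", @"\\");
    }
}
```

With this helper, "\r\t" has length 2 (carriage return and tab) and comes back unchanged because it contains no backslash at all, while @"\r\t" has length 4 and comes back with each backslash doubled.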

input = "\r\t";
is a so-called escaped string: \ introduces an escape sequence. If you need a literal \r\t, you need to write
input = "\\r\\t";

This is 2 already escaped characters, not 4.
input="\r\t";

\r and \t are escape sequences for special characters:
\r - Carriage return
\t - Horizontal tab
What you wanted, I guess, is to change these special characters like this:
string input;
input = "\r\t";
input = input.Replace("\r", "\\r");
input = input.Replace("\t", "\\t");
Console.WriteLine(input);

c# .NET isn't rendering russian characters

I'm using regex to match a string of unicode and store it in a string. For example:
NOTE: The following content must be read from an outside text file or else visual studio will automagically render it into russian.
"Name": "\u0412\u0438\u043d\u043d\u0438\u0446\u0430, \u0443\u043b. \u041a\u0438\u0435\u0432\u0441\u043a\u0430\u044f, 14-\u0431",
I'm using the pattern:
"\"Name\":\\s*\"(?<match>[^\"]+)\""
However, when I store the match in a string, the string is saved as:
match = "\\u0412\\u0438\\u043d\\u043d\\u0438\\u0446\\u0430, \\u0443\\u043b. \\u041a\\u0438\\u0435\\u0432\\u0441\\u043a\\u0430\\u044f, 14-\\u0431"
.NET is storing the string with an extra "\"
I tried using:
match = match.replace(@"\\", @"\")
but .NET doesn't recognize @"\\" as existing because it is looking at the 'visualizer version'.
How can I store my unicode without c# adding an extra '\'?
EDIT:
Another point:
// this works!
string russianCharacters = "\u041b\u044c\u0432\u043e\u0432, \u0414\u043e\u043b\u0438\u043d\u0430, \u0432\u0443\u043b. \u0427\u043e\u0440\u043d\u043e\u0432\u043e\u043b\u0430, 18";
This renders correctly in the visualizer as Russian characters. But when I store characters from a regex match FROM AN OUTSIDE TEXT FILE, it is stored as an escaped sequence.
How can I render my string as russian characters instead of an escaped sequence of unicode?

It seems you read the string from a text file that actually contains literal Unicode escape sequences (a backslash, a u and four hex digits) rather than the actual Unicode characters. That is, your C# variable looks like:
var match = "\\u0412\\u0438\\u043d\\u043d\\u0438\\u0446\\u0430, \\u0443\\u043b. \\u041a\\u0438\\u0435\\u0432\\u0441\\u043a\\u0430\\u044f, 14-\\u0431"
or
var match = @"\u0412\u0438\u043d\u043d\u0438\u0446\u0430, \u0443\u043b. \u041a\u0438\u0435\u0432\u0441\u043a\u0430\u044f, 14-\u0431"
In this case, to get the actual Unicode string, you need to use Regex.Unescape:
Converts any escaped characters in the input string.
C# demo:
var s = "\\u0412\\u0438\\u043d\\u043d\\u0438\\u0446\\u0430, \\u0443\\u043b. \\u041a\\u0438\\u0435\\u0432\\u0441\\u043a\\u0430\\u044f, 14-\\u0431";
Console.WriteLine(s);
// \u0412\u0438\u043d\u043d\u0438\u0446\u0430, \u0443\u043b. \u041a\u0438\u0435\u0432\u0441\u043a\u0430\u044f, 14-\u0431
Console.WriteLine(Regex.Unescape(s));
// Винница, ул. Киевская, 14-б
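If the outside file really holds JSON, as the "Name": "..." fragment in the question suggests, another option is to let a JSON parser decode the escapes instead of matching with a regex first. A minimal sketch, assuming System.Text.Json is available and that the fragment sits inside a JSON object; NameReader and ReadName are hypothetical names.

```csharp
using System.Text.Json;

// Hypothetical helper, for illustration only.
public static class NameReader
{
    // Parses a JSON object and returns its "Name" property with \uXXXX escapes decoded.
    public static string ReadName(string json)
    {
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            return doc.RootElement.GetProperty("Name").GetString();
        }
    }
}
```

The parser handles every JSON escape, not only \uXXXX, so no separate unescaping step is needed afterwards.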

The extra '\' is just an escape character. I'm guessing you are viewing the value in the debugger window in which case it is showing the extra '\' but the underlying value will not have the extra '\'. Try using the actual value and you will see this.
This code works as expected:
var myString = "\"Name\": \"\u0412\u0438\u043d\u043d\u0438\u0446\u0430, \u0443\u043b.\u041a\u0438\u0435\u0432\u0441\u043a\u0430\u044f, 14 - \u0431\",";
var pattern = "\"Name\":\\s*\"(?<match>[^\"]+)\"";
Regex rgx = new Regex(pattern, RegexOptions.IgnoreCase);
MatchCollection matches = rgx.Matches(myString);
if (matches.Count > 0)
{
foreach (Match match in matches)
{
var ma = System.Web.HttpUtility.HtmlDecode(match.ToString());
}
}

How have I screwed up my regex?

I am really confused here. I have written a snippet of code in C# that is passed a possible file pathway. If it contains a character specified in a regex string, it should return false. However, the regex function Match refuses to find anything matching (I even set it to a singular character I knew was in the string), resulting in severe irritation from me.
The code is:
static bool letterTest(string pathway)
{
bool validPath = false;
char[] c = Path.GetInvalidPathChars();
string test = new string(c);
string regex = "["+test+"]";
string spTest = "^[~#%&*\\{}+<>/\"|]";
Match match = Regex.Match(pathway, spTest);
if (!match.Success)
{
validPath = true;
}
return validPath;
}
The string I pass to it is: @"C:/testing/invalid#symbol"
What am I doing wrong/misunderstanding with the regex, or is it something other than the regex that I have messed up?

Remove the initial caret from your regex:
[~#%&*\\{}+<>/\"|]
You are requiring that the path begin with one of those characters. By removing that constraint, it will search the whole string for any of those characters.
But why not use the framework to do the work for you?
Check this out: Check if a string is a valid Windows directory (folder) path

Instead of a regular expression you can just do the following.
static bool letterTest(string pathway)
{
char[] badChars = Path.GetInvalidPathChars();
return pathway.All(c => !badChars.Contains(c));
// or
// return !pathway.Any(c => badChars.Contains(c));
// or
// return badChars.All(bc => !pathway.Contains(bc));
// or
// return !badChars.Any(bc => pathway.Contains(bc));
}

Someone has already pointed out the caret that was anchoring your match to the first character. But there's another error you may not be aware of yet. This one has to do with your use of string literals. What you have now is a traditional, C-style string literal:
"[~#%&*\\{}+<>/\"|]"
...which becomes this regex:
[~#%&*\{}+<>/"|]
The double backslash has become a single backslash, which is treated as an escape for the following brace (\{). The brace doesn't need escaping inside a character class, but it's not considered a syntax error.
However, the regex will not detect a backslash as you intended. To do that, you need two backslashes in the regex, so there should be four backslashes in the string literal:
"[~#%&*\\\\{}+<>/\"|]"
Alternatively, you can use a C# verbatim string literal. Backslashes have no special meaning in a verbatim string. The only thing that needs special handling is the quotation mark, which you escape by adding another quotation mark:
@"[~#%&*\\{}+<>/""|]"
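The effect of the two literals can be compared directly. This is a small sketch, not the original poster's code; PathCharCheck and its members are hypothetical names, and the test paths are made up for illustration.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class PathCharCheck
{
    // Two backslashes in the literal: the regex sees \{ and the class does NOT contain a backslash.
    public const string WithoutBackslash = "[~#%&*\\{}+<>/\"|]";

    // Four backslashes in the literal: the regex sees \\ and the class contains a backslash.
    public const string WithBackslash = "[~#%&*\\\\{}+<>/\"|]";

    // Verbatim literal that produces the same pattern as WithBackslash.
    public const string Verbatim = @"[~#%&*\\{}+<>/""|]";

    public static bool HasSpecialChar(string path, string pattern)
    {
        return Regex.IsMatch(path, pattern);
    }
}
```

For a path such as @"C:\dir", only the patterns built from WithBackslash or Verbatim report a match. Prefixing any of them with ^ restricts the test to the first character, which is why the original code never matched.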

you have to escape the / literal
"^[~#%&*\\{}+<>\/\"|]"

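A caret placed before the character group anchors the match to the start of the string (it negates the group only when it is the first character inside the brackets). Removing it from spTest solves this issue.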
string spTest = "[~#%&*\\{}+<>/\"|]";

Why doesn't my code compile?

I am using regular expression in code behind file and defining string as
string ValEmail = "\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*";
if (Regex.IsMatch(email, "\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*"))
{ }
else
{ }
It gives me warning and does not compile. How can I define such string combination?.

In C# the backslash is a special character inside string literals. If it is meant to represent a literal backslash, we need to tell the compiler so.
This can be achieved by escaping it with a backslash:
string ValEmail = "\\w+([-+.']\\w+)*@\\w+([-.]\\w+)*\\.\\w+([-.]\\w+)*";
Or using an @ prefix when constructing the string:
string ValEmail = @"\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*";
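Both forms produce exactly the same string, which a short sketch can confirm. EmailPattern and its members are hypothetical names used only for illustration, and the pattern is the one from the question, not a recommendation for validating email addresses.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class EmailPattern
{
    // Regular literal: every backslash is doubled.
    public const string Escaped = "\\w+([-+.']\\w+)*@\\w+([-.]\\w+)*\\.\\w+([-.]\\w+)*";

    // Verbatim literal: backslashes are written once.
    public const string Verbatim = @"\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*";

    public static bool LooksLikeEmail(string email)
    {
        return Regex.IsMatch(email, Verbatim);
    }
}
```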

The backslash is the escape character in C# strings. You either have to escape each backslash with another backslash ("\\") or just add an @ before your string:
string ValEmail = @"\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*";

Use @"\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*" so the backslashes are treated literally and need no escaping.
