I have a string (text) that I would like to convert using a JSON parser so that it is javascript friendly.
In my view page I have some javascript that looks like:
var site = {
strings: {
addToCart: @someValue,
So @someValue should be JavaScript-safe: wrapped in double quotes, with characters escaped where needed, and so on.
That value @someValue is a string, but it has to be JavaScript-friendly, so I want to encode it using JSON.
Does the new System.Text.Json have something?
I tried this:
return System.Text.Json.JsonDocument.Parse(input).ToString();
But this doesn't work because my text is just a plain string, not a JSON string.
Is there another way to parse something?
The rules for escaping strings to make them JSON safe are as follows:
Backspace is replaced with \b
Form feed is replaced with \f
Newline is replaced with \n
Carriage return is replaced with \r
Tab is replaced with \t
Double quote is replaced with \"
Backslash is replaced with \\
And while it's not strictly necessary, any non-web-safe character (i.e. any non-ASCII character) can be converted to its escaped Unicode equivalent to avoid potential encoding issues.
From this, it's pretty straightforward to create your own conversion method:
public static string MakeJsonSafe(String s)
{
    var jsonEscaped = s.Replace("\\", "\\\\")
                       .Replace("\"", "\\\"")
                       .Replace("\b", "\\b")
                       .Replace("\f", "\\f")
                       .Replace("\n", "\\n")
                       .Replace("\r", "\\r")
                       .Replace("\t", "\\t");
    var nonAsciiEscaped = jsonEscaped.Select((c) => c >= 127 ? "\\u" + ((int)c).ToString("X").PadLeft(4, '0') : c.ToString());
    return string.Join("", nonAsciiEscaped);
}
(Like I said, the nonAsciiEscaped stage can be omitted as it's not strictly necessary.)
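As for the original question about System.Text.Json: one option, as far as I know, is to serialize the string itself rather than parse it. Below is a minimal sketch; the class name JsStringLiteral is made up for illustration, and it assumes the default encoder settings are acceptable for your page.

```csharp
using System.Text.Json;

// Hypothetical helper (the name is not from the question).
public static class JsStringLiteral
{
    // Serializing a plain .NET string produces a JSON string literal:
    // it is wrapped in double quotes and quotes, backslashes and control
    // characters are escaped. With the default encoder, non-ASCII and
    // HTML-sensitive characters are written as \uXXXX escapes as well.
    public static string Encode(string text)
    {
        return JsonSerializer.Serialize(text);
    }
}
```

Assuming the view is Razor, the result would then be emitted without HTML encoding, for example addToCart: @Html.Raw(JsStringLiteral.Encode(someValue)), where someValue stands in for whatever value the view has.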
How can I concatenate the string "\u" with "a string" to get "\u0000"?
My code creates two backslashes:
string a = @"\u" + "0000"; // ends up being "\\u0000"
The escape sequence \uXXXX is part of the language's syntax and represents a single Unicode character. By contrast, @"\u" and "0000" are two different strings, with a total of six characters. Concatenating them won't magically turn them into a single Unicode escape.
If you're trying to convert a Unicode code point into a single-character string, do this:
char.ConvertFromUtf32(strUnicodeOfMiddleChar).ToString()
BTW, don't use == true; it's redundant.
If I understand you correctly, I think you want to build a single-char string from an arbitrary Unicode value (4 hex digits). So given the string "0000", you want to convert that into the string "\u0000", i.e., a string containing a single character.
I think this is what you want:
string f = "0000"; // Or whatever
int n = int.Parse(f, NumberStyles.AllowHexSpecifier);
string s = ((char) n).ToString();
The resulting string s is "\u0000", which you can then use for your search.
(With corrections suggested by Thomas Levesque.)
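To tie the two suggestions above together, here is a small sketch that also copes with code points above U+FFFF. The class and method names (CodePointText, FromHex) are invented for this example and are not part of any answer above.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration only.
public static class CodePointText
{
    // Turns hex digits such as "00e1" into the string holding that character.
    public static string FromHex(string hex)
    {
        int codePoint = int.Parse(hex, NumberStyles.AllowHexSpecifier, CultureInfo.InvariantCulture);

        // Values up to U+FFFF fit in one UTF-16 code unit, so a cast is enough
        // (and it also accepts lone surrogates, which ConvertFromUtf32 rejects).
        // Larger values need a surrogate pair, which ConvertFromUtf32 builds.
        return codePoint <= 0xFFFF
            ? ((char)codePoint).ToString()
            : char.ConvertFromUtf32(codePoint);
    }
}
```

For instance, CodePointText.FromHex("0041") gives "A", while a value such as "1F600" comes back as a two-char string (a surrogate pair).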
the line below creates two backslashes:
string a = @"\u" + "0000"; // a ends up being "\\u0000"
No, it doesn't; the debugger shows "\" as "\\", because that's how you write a backslash in a C# string literal (when you don't prefix the string with @). If you print that string, you will see \u0000, not \\u0000.
Nope, that string really has a single backslash in it. Print it out to the console and you'll see that.
Escape your characters correctly!!
Both:
// I am an escaped '\'.
string a = "\\u" + "0000";
And:
// I am a verbatim string literal.
string a = @"\u" + "0000";
Will work just fine. But, and I am going out on a limb here, I am guessing that you are trying to escape a Unicode Character and Hex value so, to do that, you need:
// I am an escaped Unicode Sequence with a Hex value.
char a = '\uxxxx';
I am trying to compare two strings, but one of them contains a white space at the end. I used Trim() before comparing, but it didn't work, because that white space is getting converted to %20 and I think Trim does not remove that. It is something like "abc" and "abc%20". What can I do in such a situation to compare the strings while ignoring the case too?
%20 is the url-encoded version of space.
You can't strip it off directly with Trim(), but you can use HttpUtility.UrlDecode() to decode the %20 back to a space, then trim and compare exactly as you would otherwise:
using System.Web;
//...
var test1 = "HELLO%20";
var test2 = "hello";
Console.WriteLine(HttpUtility.UrlDecode(test1).Trim().
    Equals(HttpUtility.UrlDecode(test2).Trim(),
        StringComparison.InvariantCultureIgnoreCase));
// prints: True
Use HttpUtility.UrlDecode to decode the strings:
string s1 = "abc ";
string s2 = "abc%20";
if (System.Web.HttpUtility.UrlDecode(s1).Equals(System.Web.HttpUtility.UrlDecode(s2)))
{
//equals...
}
In a WinForms or console (or any other non-ASP.NET) project you will have to add a reference to the System.Web assembly.
Something like:
if (System.Uri.UnescapeDataString("abc%20").ToLower() == myString.ToLower()) {}
The "%20" is the url encoded version of the ' ' (space) character. Are you comparing an encoded URL parameter? If so, you can use:
string str = "abc%20";
string decoded = HttpUtility.UrlDecode(str); // decoded becomes "abc "
If you need to trim any white spaces, you should do this for the decoded string. The Trim method does not understand or recognize the encoded whitespace characters.
decoded = decoded.Trim();
Now you can compare with the decoded variable using:
decoded.Equals(otherValue, StringComparison.OrdinalIgnoreCase);
The StringComparison.OrdinalIgnoreCase is probably the fastest way for case-insensitive comparison between strings.
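Putting the decode, trim and compare steps together, a rough sketch could look like the one below. It uses Uri.UnescapeDataString so that no System.Web reference is needed; the class and method names are hypothetical. One difference to keep in mind: unlike HttpUtility.UrlDecode, Uri.UnescapeDataString leaves a "+" as it is instead of turning it into a space.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class UrlEncodedComparer
{
    // Decodes percent-escapes (so "%20" becomes a space), trims the result,
    // then compares the two values ignoring case.
    public static bool EqualsDecoded(string left, string right)
    {
        string a = Uri.UnescapeDataString(left).Trim();
        string b = Uri.UnescapeDataString(right).Trim();
        return string.Equals(a, b, StringComparison.OrdinalIgnoreCase);
    }
}
```

With this, UrlEncodedComparer.EqualsDecoded("abc%20", "ABC") is true, because the decoded trailing space is trimmed before the case-insensitive comparison.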
Did you try this?
string before = "abc%20";
string after = before.Replace("%20", "").ToLower();
You can use String.Replace and, since you mentioned case insensitivity, String.ToLower like this:
var str1 = "abc";
var str2 = "Abc%20";
bool areEqual = str1.Replace("%20", "").ToLower() == str2.Replace("%20", "").ToLower();
// areEqual will be true
The root problem seems to be how the URL is being encoded. If you encode it with an explicit character encoding, you should never end up with %20. The default encoding used by HttpUtility.UrlEncode is UTF-8. Here is the usage with an explicit encoding:
System.Web.HttpUtility.UrlEncode("ãName Marcos", System.Text.Encoding.GetEncoding("iso-8859-1"))
You can read more about character encoding on the Microsoft website.
If you encode properly, you can avoid the rest of the work.
As for what you asked:
If you have to compare the two strings as they are, decode them first with HttpUtility.UrlDecode:
bool result = HttpUtility.UrlDecode(stringOne).Equals(HttpUtility.UrlDecode(stringTwo));
The bool result then tells you whether they are equal or not:
Console.WriteLine("Result is {0}", result ? "equal." : "not equal.");
Hope it helps.
For example, if I want to remove whitespace and trailing commas from a string, I can do this:
String x = "abc,\n";
x.Trim().Trim(new char[] { ',' });
which outputs abc correctly. I could easily wrap this in an extension method, but I'm wondering if there is an in-built way of doing this with a single call to Trim() that I'm missing. I'm used to Python, where I could do this:
import string
x = "abc,\n"
x.strip(string.whitespace + ",")
The documentation states that all Unicode whitespace characters, with a few exceptions, are stripped (see Notes to Callers section), but I'm wondering if there is a way to do this without manually defining a character array in an extension method.
Is there an in-built way to do this? The number of non-whitespace characters I want to strip may vary and won't necessarily include commas, and I want to remove all whitespace, not just \n.
Yes, you can do this:
x.Trim(new char[] { '\n', '\t', ' ', ',' });
Because newline is technically a character, you can add it to the array and avoid two calls to Trim.
EDIT
.NET 4.0 uses Char.IsWhiteSpace to determine whether a character is considered whitespace. Earlier versions maintain an internal list of whitespace characters.
If you really want to only use one Trim call, then your application could do the following:
On startup, scan the range of Unicode whitespace characters, calling Char.IsWhiteSpace on each character.
If the method call returns true, then push the character onto an array.
Add your custom characters to the array as well
Now you can use a single Trim call, by passing the array you constructed.
I'm guessing that Char.IsWhiteSpace depends on the current locale, so you'll have to pay careful attention to locale.
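The startup scan described above can be sketched roughly as follows. The names TrimHelper and TrimWhiteSpaceAnd are invented for this example, and the scan simply walks every UTF-16 code unit rather than a hand-picked range.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration only.
public static class TrimHelper
{
    // Built once, on first use: every char for which Char.IsWhiteSpace returns true.
    private static readonly char[] WhiteSpaceChars =
        Enumerable.Range(char.MinValue, char.MaxValue - char.MinValue + 1)
                  .Select(i => (char)i)
                  .Where(c => char.IsWhiteSpace(c))
                  .ToArray();

    // Trims whitespace plus any extra characters with a single Trim call.
    public static string TrimWhiteSpaceAnd(string value, params char[] extra)
    {
        return value.Trim(WhiteSpaceChars.Concat(extra).ToArray());
    }
}
```

Called as TrimHelper.TrimWhiteSpaceAnd("abc,\n", ','), this returns "abc", and the set of extra characters can vary from call to call.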
Using regex makes this simple:
text = Regex.Replace(text, @"^[\s,]+|[\s,]+$", "");
This will match Unicode whitespace characters as well.
You can use the following Strip extension method:
public static class ExtensionMethod
{
    public static string Strip(this string str, char[] otherCharactersToRemove)
    {
        // Collect the whitespace characters that actually occur in the string.
        List<char> charactersToRemove = (from s in str
                                         where char.IsWhiteSpace(s)
                                         select s).ToList();
        charactersToRemove.AddRange(otherCharactersToRemove);
        string str2 = str.Trim(charactersToRemove.ToArray());
        return str2;
    }
}
And then you can call it like this:
static void Main(string[] args)
{
    string str = "abc\n\t\r\n , asdfadf , \n \r \t";
    string str2 = str.Strip(new char[]{','});
}
The output would be:
str2 = "abc\n\t\r\n , asdfadf"
The Strip extension method first collects all the whitespace characters found in the string into a list, adds the other characters to remove to that list, and then calls Trim with it.
Hi, I have this problem. From the server I get a JSON string containing Unicode escape sequences, and I need to convert these sequences to a Unicode string. I found some solutions, but none of them works for every JSON response.
For example, I get this string from the server:
string encodedText="{\"DATA\":{\"idUser\":18167521,\"nick\":\"KecMessanger2\",\"photo\":\"1\",\"sex\":1,\"photoAlbums\":0,\"videoAlbums\":0,\"sefNick\":\"kecmessanger2\",\"profilPercent\":0,\"emphasis\":false,\"age\":25,\"isBlocked\":false,\"PHOTO\":{\"normal\":\"http://213.215.107.125/fotky/1816/75/n_18167521.jpg?v=1\",\"medium\":\"http://213.215.107.125/fotky/1816/75/m_18167521.jpg?v=1\",\"24x24\":\"http://213.215.107.125/fotky/1816/75/s_18167521.jpg?v=1\"},\"PLUS\":{\"active\":false,\"activeTo\":\"0000-00-00\"},\"LOCATION\":{\"idRegion\":\"1\",\"regionName\":\"Banskobystricku00fd kraj\",\"idCity\":\"109\",\"cityName\":\"Rimavsku00e1 Sobota\"},\"STATUS\":{\"isLoged\":true,\"isChating\":false,\"idChat\":0,\"roomName\":\"\",\"lastLogin\":1291898043},\"PROJECT_STATUS\":{\"photoAlbums\":0,\"photoAlbumsFavs\":0,\"videoAlbums\":0,\"videoAlbumsFavs\":0,\"videoAlbumsExts\":0,\"blogPosts\":0,\"emailNew\":0,\"postaNew\":0,\"clubInvitations\":0,\"dashboardItems\":26},\"STATUS_MESSAGE\":{\"statusMessage\":\"Nepru00edtomnu00fd.\",\"addTime\":\"1291887539\"},\"isFriend\":false,\"isIamFriend\":false}}";
statusMessage in the JSON string contains Nepru00edtomnu00fd; as a .NET Unicode string it should be Neprítomný.
regionName in the JSON string contains Banskobystricku00fd; as a .NET Unicode string it should be Banskobystrický.
Other examples:
Nepru00edtomnu00fd -> Neprítomný
Banskobystricku00fd -> Banskobystrický
Trenu010du00edn -> Trenčín
I need to convert the Unicode escape sequences to a .NET string in the Slovak language.
For the conversion I used this function:
private static string UnicodeStringToNET(string input)
{
    var regex = new Regex(@"\\[uU]([0-9A-F]{4})", RegexOptions.IgnoreCase);
    return input = regex.Replace(input, match => ((char)int.Parse(match.Groups[1].Value,
        NumberStyles.HexNumber)).ToString());
}
Where could the problem be?
Here's a method (based on previous answers) that I wrote to do the job. It handles both \uhhhh and \Uhhhhhhhh, and it will preserve escaped unicode escapes (so if your string needs to contain a literal \uffff, you can do that). The temporary placeholder character \uf00b is in a private use area, so it shouldn't typically occur in Unicode strings.
public static string ParseUnicodeEscapes(string escapedString)
{
    const string literalBackslashPlaceholder = "\uf00b";
    const string unicodeEscapeRegexString = @"(?:\\u([0-9a-fA-F]{4}))|(?:\\U([0-9a-fA-F]{8}))";
    // Replace escaped backslashes with something else so we don't
    // accidentally expand escaped unicode escapes.
    string workingString = escapedString.Replace("\\\\", literalBackslashPlaceholder);
    // Replace unicode escapes with actual unicode characters.
    // A \uhhhh escape is a single UTF-16 code unit; a \Uhhhhhhhh escape may need a
    // surrogate pair, so it goes through char.ConvertFromUtf32 instead of a (char) cast.
    workingString = new Regex(unicodeEscapeRegexString).Replace(workingString,
        match => match.Groups[1].Success
            ? ((char) Int32.Parse(match.Groups[1].Value, NumberStyles.HexNumber)).ToString()
            : char.ConvertFromUtf32(Int32.Parse(match.Groups[2].Value, NumberStyles.HexNumber)));
    // Replace the escaped backslash placeholders with non-escaped literal backslashes.
    workingString = workingString.Replace(literalBackslashPlaceholder, "\\");
    return workingString;
}
Your escape sequences do not start with a \ (as in "\u00fd"), so your regex should be just:
"[uU]([0-9A-F]{4})"
...
buildLetter.Append("</head>").AppendLine();
buildLetter.Append("").AppendLine();
buildLetter.Append("<style type="text/css">").AppendLine();
Assume the above contents reside in a file. I want to write a snippet that
removes any line that contains the empty string "" and puts an escape character
before the inner quotation marks. The final output would be:
buildLetter.Append("</head>").AppendLine();
buildLetter.Append("<style type=\"text/css\">").AppendLine();
The outer " .... " pair is not treated as special characters. The special characters
may be single or double quotation marks.
I could do this with the find and replace feature of Visual Studio. However, in my
case I want it to be written in C# or VB.NET.
Any help will be appreciated.
Perhaps this does what you want:
string s = File.ReadAllText("input.txt");
string empty = "buildLetter.Append(\"\").AppendLine();" + Environment.NewLine;
s = s.Replace(empty, "");
s = Regex.Replace(s, @"(?<="").*(?="")",
match => { return match.Value.Replace("\"", "\\\""); }
);
Result:
buildLetter.Append("</head>").AppendLine();
buildLetter.Append("<style type=\"text/css\">").AppendLine();