How to unescape unicode in JSON.NET - c#

I have JSON with parts in Unicode like { "val1": "\u003c=AA+ \u003e=AA-"}
How can I convert this to JSON which does not have Unicode formatting?
{"val1": "<=AA+ >=AA-"}

Json.NET unescapes Unicode sequences inside JsonTextReader, so you can reformat your JSON without unnecessary escaping by streaming directly from a JsonTextReader to a JsonTextWriter using JsonWriter.WriteToken(JsonReader). This is the same approach used in Duncan Smart's answer to "How do I get formatted JSON in .NET using C#?":
public static partial class JsonExtensions
{
// Adapted from this answer https://stackoverflow.com/a/30329731
// To https://stackoverflow.com/q/2661063
// By Duncan Smart https://stackoverflow.com/users/1278/duncan-smart
public static string JsonPrettify(string json, Formatting formatting = Formatting.Indented)
{
using (var stringReader = new StringReader(json))
using (var stringWriter = new StringWriter())
{
return JsonPrettify(stringReader, stringWriter, formatting).ToString();
}
}
public static TextWriter JsonPrettify(TextReader textReader, TextWriter textWriter, Formatting formatting = Formatting.Indented)
{
// Let the caller who allocated the incoming readers and writers dispose them as well
// Disable date recognition since we're just reformatting
using (var jsonReader = new JsonTextReader(textReader) { DateParseHandling = DateParseHandling.None, CloseInput = false })
using (var jsonWriter = new JsonTextWriter(textWriter) { Formatting = formatting, CloseOutput = false })
{
jsonWriter.WriteToken(jsonReader);
}
return textWriter;
}
}
Using this method, the following code:
var json = @"{ ""val1"": ""\u003c=AA+ \u003e=AA-""}";
var unescapedJson = JsonExtensions.JsonPrettify(json, Formatting.None);
Console.WriteLine("Unescaped JSON: {0}", unescapedJson);
Outputs
Unescaped JSON: {"val1":"<=AA+ >=AA-"}

I tried the following in Linqpad and it worked.
var s = @"{ ""val1"": ""\u003c=AA+ \u003e=AA-""}";
System.Text.RegularExpressions.Regex.Unescape(s).Dump();

Related

Trying to convert this .NET JsonConvert.DeserializeObject call that uses a custom JsonConverter to a JsonSerializer.Deserialize call

I'm trying to deserialize a large json file. As such, I wish to stream the file content to the Deserialize method to reduce the number of allocations/GC, etc.
My current deserialization method uses a custom JsonConverter (which works great). I'm not sure how to do the same thing using the streaming method of a JsonSerializer instance with a custom JsonConverter.
Current code:
JsonConvert.DeserializeObject<IList<T>>(content, new[] { jsonConverter });
New code (incomplete):
using (var streamReader = new StreamReader(fileName))
{
using (var jsonReader = new JsonTextReader(streamReader))
{
var serializer = new JsonSerializer();
return serializer.Deserialize<IList<T>>(jsonReader); // jsonConverter is not used here yet
}
}
My new code doesn't take any CustomConverter instances in. How can I do this, please?
You could try using the JsonSerializer's static Create function with settings to pass your converter.
using (var streamReader = new StreamReader(fileName))
{
using (var jsonReader = new JsonTextReader(streamReader))
{
var serializer = JsonSerializer.Create(new JsonSerializerSettings() { Converters = new List<JsonConverter> { jsonConverter }});
return serializer.Deserialize<IList<T>>(jsonReader);
}
}
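As far as I know, an equivalent option is to add the converter to the serializer's Converters collection instead of going through settings. Below is a minimal sketch of that; UpperCaseStringConverter, StreamingDemo and ReadList are hypothetical names made up for the illustration, and the converter itself is just a toy stand-in for your own.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

// Hypothetical converter, for illustration only: upper-cases every string it reads.
public class UpperCaseStringConverter : JsonConverter
{
    public override bool CanConvert(Type objectType) => objectType == typeof(string);

    public override object ReadJson(JsonReader reader, Type objectType, object existingValue, JsonSerializer serializer)
        => reader.Value?.ToString().ToUpperInvariant();

    public override void WriteJson(JsonWriter writer, object value, JsonSerializer serializer)
        => writer.WriteValue((string)value);
}

public static class StreamingDemo
{
    // Streams JSON from a TextReader and applies the optional converter.
    public static IList<T> ReadList<T>(TextReader textReader, JsonConverter jsonConverter)
    {
        var serializer = new JsonSerializer();
        if (jsonConverter != null)
        {
            serializer.Converters.Add(jsonConverter);
        }

        using (var jsonReader = new JsonTextReader(textReader))
        {
            return serializer.Deserialize<IList<T>>(jsonReader);
        }
    }

    public static void Main()
    {
        // A StringReader stands in for new StreamReader(fileName) here.
        var items = ReadList<string>(new StringReader("[\"a\", \"b\"]"), new UpperCaseStringConverter());
        Console.WriteLine(string.Join(",", items));
    }
}
```

Either way the converter is registered on the serializer itself, so the streaming Deserialize call does not need an extra argument.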

Generating newlines instead of CRLFs in Json.Net

For my unix/java friends I would like to send newlines ('\n') instead of a CRLF ('\r\n') in Json.Net. I tried setting a StreamWriter to use a newline without any success.
I think Json.Net code is using Environment.NewLine instead of calling TextWriter.WriteNewLine(). Changing Environment.NewLine is not an option because I'm running as a server and the newline encoding is based on the request.
Is there some other way to force a plain newline instead of CRLF?
Here's my code -
using (var streamWriter = new StreamWriter(writeStream, new UTF8Encoding(false))
{
NewLine = "\n"
})
using (var jsonWriter = new JsonTextWriter(streamWriter)
{
CloseOutput = true,
Indentation = 2,
Formatting = Formatting.Indented
})
{
// serialise object to JSON
}
After delving into the Json.Net code, I see the issue is with JsonTextWriter.WriteIndent, thanks Athari.
Instead of _writer.Write(Environment.NewLine); it should be _writer.WriteLine();.
I've posted a pull request to github. https://github.com/JamesNK/Newtonsoft.Json/pull/271
If you want to customize indentation whitespace, just override JsonTextWriter.WriteIndent:
public class JsonTextWriterEx : JsonTextWriter
{
public string NewLine { get; set; }
public JsonTextWriterEx (TextWriter textWriter) : base(textWriter)
{
NewLine = Environment.NewLine;
}
protected override void WriteIndent ()
{
if (Formatting == Formatting.Indented) {
WriteWhitespace(NewLine);
int currentIndentCount = Top * Indentation;
for (int i = 0; i < currentIndentCount; i++)
WriteIndentSpace();
}
}
}
I see several solutions here that are configuring the serializer. But if you want quick and dirty, just replace the characters with what you want. After all, JSON is just a string.
string json = JsonConvert.SerializeObject(myObject);
json = json.Replace("\r\n", "\n");
I know this is an old question but I had a hard time getting the accepted answer to work and maintain proper indented formatting. As a plus, I am also not creating an override to make this work.
Here is how I got this to work correctly:
using (var writer = new StringWriter())
{
writer.NewLine = "\r\n";
var serializer = JsonSerializer.Create(
new JsonSerializerSettings
{
Formatting = Formatting.Indented
});
serializer.Serialize(writer, data);
// do something with the written string
}
I am assuming that the code changes referenced in this question enabled the setting of NewLine on the StringWriter to be respected by the serializer.
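To check that assumption for the case the question asks about ('\n' only), something like the sketch below could be used. It assumes a Json.NET version in which the indenting writer honors TextWriter.NewLine; NewLineDemo and SerializeWithLf are hypothetical names for the illustration.

```csharp
using System;
using System.IO;
using Newtonsoft.Json;

public static class NewLineDemo
{
    // Serializes with indentation, asking the underlying writer to use LF only.
    // Assumption: the installed Json.NET version respects TextWriter.NewLine.
    public static string SerializeWithLf(object data)
    {
        using (var writer = new StringWriter())
        {
            writer.NewLine = "\n";
            var serializer = JsonSerializer.Create(
                new JsonSerializerSettings { Formatting = Formatting.Indented });
            serializer.Serialize(writer, data);
            return writer.ToString();
        }
    }

    public static void Main()
    {
        var json = SerializeWithLf(new { name = "test", values = new[] { 1, 2 } });
        // Report whether any carriage return slipped through.
        Console.WriteLine(json.Contains("\r") ? "contains CR" : "LF only");
    }
}
```

If this reports a carriage return on your version, overriding WriteIndent or post-processing the string are the fallbacks.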
The on-the-wire convention for just about everything is that a line ends with CRLF. If you feel that you really must do this, post-processing is the way to go: write it out to a string and then change the string before returning it.
Note that this is a lot of extra processing for what I would consider an actual negative. Not recommended.
result = result.Replace("\r\n", "\n");

Serializing an object with a string property containing double quotes

I have an object with a string property whose value contains double quotes. I need to serialize this object and then use that XML. I won't be deserializing this XML.
I am having trouble getting the right content in the XML file. Let me explain with a code sample:
[Serializable]
public class Test {
[XmlElement]
public string obj { get; set; }
}
class Program {
static void Main(string[] args) {
var st ="Priority == \"1\"";
Test test = new Test();
test.obj = st;
//Serialize this object
XmlSerializer xsSubmit = new XmlSerializer(typeof(Test));
StringWriter sww = new StringWriter();
XmlWriter writer = XmlWriter.Create(sww, new XmlWriterSettings {
OmitXmlDeclaration = true
});
var ns = new XmlSerializerNamespaces();//just to make things simpler here
ns.Add(string.Empty, string.Empty);
xsSubmit.Serialize(writer, test, ns);
//My XML
var xml = sww.ToString();
}
}
I need my xml to be:
<Test><obj>Priority=="1"</obj></Test>
I now get:
<Test><obj>Priority==\"1\"</obj></Test>
I even tried to encode the string into HTML using var html = HttpUtility.HtmlEncode(st);
In this case, the variable html is in the right format, but on serializing I get:
<Test><obj>Priority==&quot;1&quot;</obj></Test>
Need some help please.
There was no issue with the code.
I actually get <Test><obj>Priority=="1"</obj></Test> and this is fine. My mistake was reading the value in the debugger, which shows the string with escaped quotes. When I write it out somewhere, the content is in the correct format.
The " didn't get converted to &quot; because double quotes are accepted as-is in the element content of an XML document. I can work with that in this case!
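A quick way to see this outside the debugger is to print the serialized string. The sketch below reuses the Test class from the question; QuoteDemo and ToXml are hypothetical names for the illustration.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

public class Test
{
    [XmlElement]
    public string obj { get; set; }
}

public static class QuoteDemo
{
    // Serializes a Test instance without the XML declaration or default namespaces.
    public static string ToXml(Test test)
    {
        var serializer = new XmlSerializer(typeof(Test));
        var ns = new XmlSerializerNamespaces();
        ns.Add(string.Empty, string.Empty);

        using (var sw = new StringWriter())
        {
            using (var writer = XmlWriter.Create(sw, new XmlWriterSettings { OmitXmlDeclaration = true }))
            {
                serializer.Serialize(writer, test, ns);
            }
            return sw.ToString();
        }
    }

    public static void Main()
    {
        var xml = ToXml(new Test { obj = "Priority == \"1\"" });
        // The console shows the real characters; the debugger shows them C#-escaped.
        Console.WriteLine(xml);
    }
}
```

The backslashes only ever exist in the debugger's display of the string, not in the string itself.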

C# or javascript code formatter

I'm currently using Syntax Highlighter to show XML or SOAP messages on a page. That works fine for messages that are already formatted correctly (line breaks, indents, etc). But if I had an XML string like:
string xml = "<doc><object><first>Joe</first><last>Smith</last></object></doc>";
I would write the string to the page and the javascript highlighter would correctly syntax highlight the string, but it would be all on a single line.
Is there a C# string formatter or some syntax highlighting library that has a "smart" indent feature that would insert line breaks, indents, etc... ?
Since this is a string, adding line breaks and indents would be changing the actual value of variable xml, which is not what you want your code formatter to do!
Note that you can format the XML in C# before writing to the page, like this:
using System;
using System.IO;
using System.Text;
using System.Xml;
namespace XmlIndent
{
class Program
{
static void Main(string[] args)
{
string xml = "<doc><object><first>Joe</first><last>Smith</last></object></doc>";
var xd = new XmlDocument();
xd.LoadXml(xml);
Console.WriteLine(FormatXml(xd));
Console.ReadKey();
}
static string FormatXml(XmlDocument doc)
{
var sb = new StringBuilder();
var sw = new StringWriter(sb);
XmlTextWriter xtw = null;
using(xtw = new XmlTextWriter(sw) { Formatting = Formatting.Indented })
{
doc.WriteTo(xtw);
}
return sb.ToString();
}
}
}
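A shorter alternative, if LINQ to XML is available to you, is to swap XmlDocument and XmlTextWriter for XDocument, whose ToString() indents by default. This is a different technique from the one above, shown only as a sketch; XmlIndentDemo and Indent are hypothetical names.

```csharp
using System;
using System.Xml.Linq;

public static class XmlIndentDemo
{
    // Parses the XML and re-emits it with XDocument's default indentation.
    public static string Indent(string xml)
    {
        return XDocument.Parse(xml).ToString();
    }

    public static void Main()
    {
        string xml = "<doc><object><first>Joe</first><last>Smith</last></object></doc>";
        Console.WriteLine(Indent(xml));
    }
}
```

Note that XDocument.ToString() leaves out any XML declaration, which is usually what you want when the result is only displayed on a page.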

XmlWriter to Write to a String Instead of to a File

I have a WCF service that needs to return a string of XML. But it seems like the writer only wants to build up a file, not a string. I tried:
string nextXMLstring = "";
using (XmlWriter writer = XmlWriter.Create(nextXMLstring))
This generates an error saying nextXMLstring doesn't have a file path. It wants something like:
using (XmlWriter writer = XmlWriter.Create("nextXMLstring.xml"))
How can I build up my XML and then return it as a string??
Thanks!!
You need to create a StringWriter, and pass that to the XmlWriter.
The string overload of the XmlWriter.Create is for a filename.
E.g.
using (var sw = new StringWriter()) {
using (var xw = XmlWriter.Create(sw)) {
// Build Xml with xw.
}
return sw.ToString();
}
As Richard said, StringWriter is the way forward. There's one snag, however: by default, StringWriter will advertise itself as being in UTF-16. Usually XML is in UTF-8. You can fix this by subclassing StringWriter:
public class Utf8StringWriter : StringWriter
{
public override Encoding Encoding
{
get { return Encoding.UTF8; }
}
}
This will affect the declaration written by XmlWriter. Of course, if you then write the string out elsewhere in binary form, make sure you use an encoding which matches whichever encoding you fix for the StringWriter. (The above code always assumes UTF-8; it's trivial to make a more general version which accepts an encoding in the constructor.)
You'd then use:
using (TextWriter writer = new Utf8StringWriter())
{
using (XmlWriter xmlWriter = XmlWriter.Create(writer))
{
...
}
return writer.ToString();
}
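The more general version mentioned above, one that accepts an encoding in its constructor, might look roughly like this. EncodingStringWriter and EncodingDemo are hypothetical names; this is a sketch rather than anything from the framework.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// StringWriter that reports whatever encoding it was constructed with.
public sealed class EncodingStringWriter : StringWriter
{
    private readonly Encoding _encoding;

    public EncodingStringWriter(Encoding encoding)
    {
        _encoding = encoding ?? throw new ArgumentNullException(nameof(encoding));
    }

    public override Encoding Encoding => _encoding;
}

public static class EncodingDemo
{
    // Builds a tiny document whose declaration names the given encoding.
    public static string BuildXml(Encoding encoding)
    {
        using (var writer = new EncodingStringWriter(encoding))
        {
            using (var xmlWriter = XmlWriter.Create(writer))
            {
                xmlWriter.WriteStartElement("root");
                xmlWriter.WriteEndElement();
            }
            return writer.ToString();
        }
    }

    public static void Main()
    {
        Console.WriteLine(BuildXml(Encoding.UTF8));
    }
}
```

As with Utf8StringWriter, this only changes what the declaration says; the string in memory is still a .NET string, so whatever writes it out as bytes has to use the same encoding.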
I know this is old and answered, but here is another way to do it. Particularly if you don't want the UTF8 BOM at the start of your string and you want the text indented:
using (var ms = new MemoryStream())
using (var x = new XmlTextWriter(ms, new UTF8Encoding(false))
   { Formatting = Formatting.Indented })
{
  // ...
  x.Flush(); // make sure buffered output has reached the MemoryStream
  return Encoding.UTF8.GetString(ms.ToArray());
}
Use StringBuilder:
var sb = new StringBuilder();
using (XmlWriter xmlWriter = XmlWriter.Create(sb))
{
...
}
return sb.ToString();
Don't forget to close or dispose the xmlWriter (or at least flush it) before reading the result, or else your string won't be finished. It may just be an empty string.
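For what it's worth, disposing an XmlWriter also closes it, so a using block is enough as long as the string is read after the block ends. A minimal sketch, with FlushDemo and BuildXml as hypothetical names:

```csharp
using System;
using System.Text;
using System.Xml;

public static class FlushDemo
{
    public static string BuildXml()
    {
        var sb = new StringBuilder();
        using (XmlWriter xmlWriter = XmlWriter.Create(sb))
        {
            xmlWriter.WriteStartElement("root");
            xmlWriter.WriteElementString("item", "value");
            xmlWriter.WriteEndElement();
        } // Dispose runs here and flushes any buffered output into sb.

        // Read the StringBuilder only after the writer has been disposed.
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(BuildXml());
    }
}
```

If the string has to be read while the writer is still open, calling xmlWriter.Flush() first should have the same effect.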
Well I think the simplest and fastest solution here would be just to:
StringBuilder sb = new StringBuilder();
using (var writer = XmlWriter.Create(sb, settings))
{
... // Whatever code you have/need :)
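}
// Replace the declared encoding only after the writer has been disposed (and so flushed):
sb = sb.Replace("encoding=\"utf-16\"", "encoding=\"utf-8\""); // Or whatever UTF encoding you want/use.
// Before you finally save it:
File.WriteAllText("path\\dataName.xml", sb.ToString());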
