How to deserialize JSONP in C#? [duplicate] - c#

I am trying to consume a web service that claims to return JSON, but actually always returns JSONP. I don't see a way to change that service's behavior.
I would like to use NewtonSoft Json.Net to parse the result. I have declared a class, let's call it MyType that I want to deserialize the inner JSON result into.
JSONP:
parseResponse({
"total" : "13,769",
"lower" : "1",
"upper" : "20"})
As you can see this is not correct JSON as it has parseResponse( prefix and ) suffix. While this example is very simple, the actual response can be quite long, on the order of 100Ks.
MyType:
public class MyType
{
public Decimal total;
public int lower;
public int upper;
}
After I get my web service response into a stream and JsonTextReader I try to deserialize like this:
(MyType)serializer.Deserialize(jsonTextReader, typeof(MyType));
Of course I get null for a result because there is that pesky parseResponse with round brackets.
I've taken a look at this question which unfortunately does not help. I'm actually using a JsonTextReader to feed in the JSON, rather than a string (and prefer so to avoid the performance hit of creating huge a string). Even if I'd use the suggestion from that question, it looks dangerous as it uses a global replace. If there is no good way to use a stream, an answer with safe parsing of strings would be okay.

If I interpret your question as follows:
I am trying to deserialize some JSON from a Stream. The "JSON" is actually in JSONP format and so contains some prefix and postfix text I would like to ignore. How can I skip the prefix and postfix text while still reading and deserializing directly from stream rather than loading the entire stream into a string?
Then you can deserialize your JSON from a JSONP stream using the following extension method:
public static class JsonExtensions
{
public static T DeserializeEmbeddedJsonP<T>(Stream stream)
{
using (var textReader = new StreamReader(stream))
return DeserializeEmbeddedJsonP<T>(textReader);
}
public static T DeserializeEmbeddedJsonP<T>(TextReader textReader)
{
using (var jsonReader = new JsonTextReader(textReader.SkipPast('(')))
{
var settings = new JsonSerializerSettings
{
CheckAdditionalContent = false,
};
return JsonSerializer.CreateDefault(settings).Deserialize<T>(jsonReader);
}
}
}
public static class TextReaderExtensions
{
public static TTextReader SkipPast<TTextReader>(this TTextReader reader, char ch) where TTextReader : TextReader
{
while (true)
{
var c = reader.Read();
if (c == -1 || c == ch)
return reader;
}
}
}
Notes:
Prior to constructing the JsonTextReader I construct a StreamReader and skip past the first '(' character in the stream. This positions the StreamReader at the beginning of the actual JSON.
Before deserialization I set JsonSerializerSettings.CheckAdditionalContent = false to tell the serializer to ignore any characters after the end of the JSON content. Oddly enough it is necessary to do this explicitly despite the fact that the default value seems to be false already, since the underlying field is nullable.
The same code can be used to deserialize embedded JSONP from a string by passing a StringReader to DeserializeEmbeddedJsonP<T>(TextReader reader);. Doing so avoids the need to create a new string by trimming the prefix and postfix text and so may improve performance and memory use even for smaller strings.
Sample working .Net fiddle.

It looks like it's returning JSONP. Kind of weird that a webservice would do that by default, without you including "?callback". In any case, if that's just the way it is, you can easily use a RegEx to just strip off the method call:
var x = WebServiceCall();
x = Regex.Replace(x, #"^.+?\(|\)$", "");

Related

How to deserialize a JSONP response (preferably with JsonTextReader and not a string)?

I am trying to consume a web service that claims to return JSON, but actually always returns JSONP. I don't see a way to change that service's behavior.
I would like to use NewtonSoft Json.Net to parse the result. I have declared a class, let's call it MyType that I want to deserialize the inner JSON result into.
JSONP:
parseResponse({
"total" : "13,769",
"lower" : "1",
"upper" : "20"})
As you can see this is not correct JSON as it has parseResponse( prefix and ) suffix. While this example is very simple, the actual response can be quite long, on the order of 100Ks.
MyType:
public class MyType
{
public Decimal total;
public int lower;
public int upper;
}
After I get my web service response into a stream and JsonTextReader I try to deserialize like this:
(MyType)serializer.Deserialize(jsonTextReader, typeof(MyType));
Of course I get null for a result because there is that pesky parseResponse with round brackets.
I've taken a look at this question which unfortunately does not help. I'm actually using a JsonTextReader to feed in the JSON, rather than a string (and prefer so to avoid the performance hit of creating huge a string). Even if I'd use the suggestion from that question, it looks dangerous as it uses a global replace. If there is no good way to use a stream, an answer with safe parsing of strings would be okay.
If I interpret your question as follows:
I am trying to deserialize some JSON from a Stream. The "JSON" is actually in JSONP format and so contains some prefix and postfix text I would like to ignore. How can I skip the prefix and postfix text while still reading and deserializing directly from stream rather than loading the entire stream into a string?
Then you can deserialize your JSON from a JSONP stream using the following extension method:
public static class JsonExtensions
{
public static T DeserializeEmbeddedJsonP<T>(Stream stream)
{
using (var textReader = new StreamReader(stream))
return DeserializeEmbeddedJsonP<T>(textReader);
}
public static T DeserializeEmbeddedJsonP<T>(TextReader textReader)
{
using (var jsonReader = new JsonTextReader(textReader.SkipPast('(')))
{
var settings = new JsonSerializerSettings
{
CheckAdditionalContent = false,
};
return JsonSerializer.CreateDefault(settings).Deserialize<T>(jsonReader);
}
}
}
public static class TextReaderExtensions
{
public static TTextReader SkipPast<TTextReader>(this TTextReader reader, char ch) where TTextReader : TextReader
{
while (true)
{
var c = reader.Read();
if (c == -1 || c == ch)
return reader;
}
}
}
Notes:
Prior to constructing the JsonTextReader I construct a StreamReader and skip past the first '(' character in the stream. This positions the StreamReader at the beginning of the actual JSON.
Before deserialization I set JsonSerializerSettings.CheckAdditionalContent = false to tell the serializer to ignore any characters after the end of the JSON content. Oddly enough it is necessary to do this explicitly despite the fact that the default value seems to be false already, since the underlying field is nullable.
The same code can be used to deserialize embedded JSONP from a string by passing a StringReader to DeserializeEmbeddedJsonP<T>(TextReader reader);. Doing so avoids the need to create a new string by trimming the prefix and postfix text and so may improve performance and memory use even for smaller strings.
Sample working .Net fiddle.
It looks like it's returning JSONP. Kind of weird that a webservice would do that by default, without you including "?callback". In any case, if that's just the way it is, you can easily use a RegEx to just strip off the method call:
var x = WebServiceCall();
x = Regex.Replace(x, #"^.+?\(|\)$", "");

Serialize as NDJSON using Json.NET

Is it possible to serialize to NDJSON (Newline Delimited JSON) using Json.NET? The Elasticsearch API uses NDJSON for bulk operations, and I can find nothing suggesting that this format is supported by any .NET libraries.
This answer provides guidance for deserializing NDJSON, and it was noted that one could serialize each row independently and join with newline, but I would not necessarily call that supported.
As Json.NET does not currently have a built-in method to serialize a collection to NDJSON, the simplest answer would be to write to a single TextWriter using a separate JsonTextWriter for each line, setting CloseOutput = false for each:
public static partial class JsonExtensions
{
public static void ToNewlineDelimitedJson<T>(Stream stream, IEnumerable<T> items)
{
// Let caller dispose the underlying stream
using (var textWriter = new StreamWriter(stream, new UTF8Encoding(false, true), 1024, true))
{
ToNewlineDelimitedJson(textWriter, items);
}
}
public static void ToNewlineDelimitedJson<T>(TextWriter textWriter, IEnumerable<T> items)
{
var serializer = JsonSerializer.CreateDefault();
foreach (var item in items)
{
// Formatting.None is the default; I set it here for clarity.
using (var writer = new JsonTextWriter(textWriter) { Formatting = Formatting.None, CloseOutput = false })
{
serializer.Serialize(writer, item);
}
// https://web.archive.org/web/20180513150745/http://specs.okfnlabs.org/ndjson/
// Each JSON text MUST conform to the [RFC7159] standard and MUST be written to the stream followed by the newline character \n (0x0A).
// The newline charater MAY be preceeded by a carriage return \r (0x0D). The JSON texts MUST NOT contain newlines or carriage returns.
textWriter.Write("\n");
}
}
}
Sample fiddle.
Since the individual NDJSON lines are likely to be short but the number of lines might be large, this answer suggests a streaming solution to avoid the necessity of allocating a single string larger than 85kb. As explained in Newtonsoft Json.NET Performance Tips, such large strings end up on the large object heap and may subsequently degrade application performance.
You could try this:
string ndJson = JsonConvert.SerializeObject(value, Formatting.Indented);
but now I see that you are not just wanting the serialized object to be pretty printed. If the object you are serializing is some kind of collection or enumeration, could you not just do this yourself by serializing each element?
StringBuilder sb = new StringBuilder();
foreach (var element in collection)
{
sb.AppendLine(JsonConvert.SerializeObject(element, Formatting.None));
}
// use the NDJSON output
Console.WriteLine(sb.ToString());

Parse a string containing several json objects into something more convenient

The crux of the problem here is that I don't know any C#, yet find myself adding a feature to some test infrastructure which happens to be written in C#. I suspect this question is entirely trivial and beg your patience in answering. My colleagues who originally wrote this stuff are all out of the office.
I am parsing a string representing one or more json objects. So far I can get the first object, but can't work out how to access the remainder.
public class demo
{
public void minimal()
{
// Note - the input is not quite json! I.e. I don't have
// [{"Name" : "foo"}, {"Name" : "bar"}]
// Each individual object is well formed, they just aren't in
// a convenient array for easy parsing.
// Each string representation of an object are literally concatenated.
string data = #"{""Name"": ""foo""} {""Name"" : ""bar""}";
System.Xml.XmlDictionaryReader jsonReader =
JsonReaderWriterFactory.CreateJsonReader(Encoding.UTF8.GetBytes(data),
new System.Xml.XmlDictionaryReaderQuotas());
System.Xml.Linq.XElement root = XElement.Load(jsonReader);
Assert.AreEqual(root.XPathSelectElement("//Name").Value, "foo");
// The following clearly doesn't work
Assert.AreEqual(root.XPathSelectElement("//Name").Value, "bar");
}
}
I'm roughly at the point of rolling enough of a parser to work out where to split the string by counting braces but am hoping that the library support will do this for me.
The ideal end result is a sequential datastructure of your choice (list, vector? don't care) containing one System.Xml.Linq.XElement for each json object embedded in the string.
Thanks!
edit: Roughly viable example, mostly due to George Richardson - I'm playing fast and loose with the type system (not sure dynamic is available in C#3.0), but the end result seems to be predictable.
public class demo
{
private IEnumerable<Newtonsoft.Json.Linq.JObject>
DeserializeObjects(string input)
{
var serializer = new JsonSerializer();
using (var strreader = new StringReader(input))
{
using (var jsonreader = new JsonTextReader(strreader))
{
jsonreader.SupportMultipleContent = true;
while (jsonreader.Read())
{
yield return (Newtonsoft.Json.Linq.JObject)
serializer.Deserialize(jsonreader);
}
}
}
}
public void example()
{
string json = #"{""Name"": ""foo""} {""Name"" : ""bar""} {""Name"" : ""baz""}";
var objects = DeserializeObjects(json);
var array = objects.ToArray();
Assert.AreEqual(3, array.Length);
Assert.AreEqual(array[0]["Name"].ToString(), "foo");
Assert.AreEqual(array[1]["Name"].ToString(), "bar");
Assert.AreEqual(array[2]["Name"].ToString(), "baz");
}
}
You are going to want to use JSON.net for your actual deserialization needs. The big problem I see here is that your json data is just being concatenated together which means you are going to have to extract each object from the string. Luckily json.net's JsonReader has a SupportMultipleContent property which does just this
public void Main()
{
string json = #"{""Name"": ""foo""} {""Name"" : ""bar""} {""Name"" : ""baz""}";
IEnumerable<dynamic> deserialized = DeserializeObjects(json);
string name = deserialized.First().Name; //name is "foo"
}
IEnumerable<object> DeserializeObjects(string input)
{
JsonSerializer serializer = new JsonSerializer();
using (var strreader = new StringReader(input)) {
using (var jsonreader = new JsonTextReader(strreader)) {
jsonreader.SupportMultipleContent = true;
while (jsonreader.Read()) {
yield return serializer.Deserialize(jsonreader);
}
}
}
}

Creating a double serialized JSON object in REST SOE C#

I'm not sure if I'm double-serializing my JSON object, but the output results in an unwanted format. I'm exposing the REST endpoint via an ArcGIS Server Object Extension (REST SOE). I've also recently implemented JSON.Net which essentially allowed me to remove several lines of code.
So here is the handler, which is the core piece creating the data for the service (for you non GIS peeps).
private byte[] SearchOptionsResHandler(NameValueCollection boundVariables, string outputFormat, string requestProperties, out string responseProperties)
{
responseProperties = null;
JsonObject result = new JsonObject();
// Towns
DataTable dataTableTowns = GetDataTableTowns();
String jsonStringTowns = JsonConvert.SerializeObject(dataTableTowns);
result.AddString("Towns", jsonStringTowns);
// GL_Description
DataTable dataTableGLDesc = GetDataTableGLDesc();
String jsonStringGLDesc = JsonConvert.SerializeObject(dataTableGLDesc);
result.AddString("GLDesc", jsonStringGLDesc);
return Encoding.UTF8.GetBytes(result.ToJson());
}
The result is ugly scaped JSON:
{
"Towns": "[{\"Column1\":\"ANSONIA\"},{\"Column1\":\"BETHANY\"},{\"Column1\":\"BLOOMFIELD\"}]",
"GLDesc": "[{\"Column1\":\"Commercial\"},{\"Column1\":\"Industrial\"},{\"Column1\":\"Public\"}]"
}
Is it because I'm double serializing it somehow? Thanks for looking this over.
Yes, you are double serializing it. It's right there in your code.
For each of your data tables, dataTableTowns and dataTableGLDesc, you are calling JsonConvert.SerializeObject() to convert them to JSON strings, which you then add to a result JsonObject. You then call ToJson() on the result, which presumably serializes the whole thing to JSON again.
JsonObject is not part of Json.Net, while JsonConvert is, so I would recommend using one or the other. Generally, you just want to accumulate everything into a single result object and then serialize the whole thing once at the end. Here is how I would do it with Json.Net using an anonymous object to hold the result:
var result = new
{
Towns = GetDataTableTowns(),
GLDesc = GetDataTableGLDesc()
};
string json = JsonConvert.SerializeObject(result);
return Encoding.UTF8.GetBytes(json);
Here is the output:
{
"Towns":[{"Column1":"ANSONIA"},{"Column1":"BETHANY"},{"Column1":"BLOOMFIELD"}],
"GLDesc":[{"Column1":"Commercial"},{"Column1":"Industrial"},{"Column1":"Public"}]
}

Using Protobuf-net, I suddenly got an exception about an unknown wire-type

(this is a re-post of a question that I saw in my RSS, but which was deleted by the OP. I've re-added it because I've seen this question asked several times in different places; wiki for "good form")
Suddenly, I receive a ProtoException when deserializing and the message is: unknown wire-type 6
What is a wire-type?
What are the different wire-type values and their description?
I suspect a field is causing the problem, how to debug this?
First thing to check:
IS THE INPUT DATA PROTOBUF DATA? If you try and parse another format (json, xml, csv, binary-formatter), or simply broken data (an "internal server error" html placeholder text page, for example), then it won't work.
What is a wire-type?
It is a 3-bit flag that tells it (in broad terms; it is only 3 bits after all) what the next data looks like.
Each field in protocol buffers is prefixed by a header that tells it which field (number) it represents,
and what type of data is coming next; this "what type of data" is essential to support the case where
unanticipated data is in the stream (for example, you've added fields to the data-type at one end), as
it lets the serializer know how to read past that data (or store it for round-trip if required).
What are the different wire-type values and their description?
0: variant-length integer (up to 64 bits) - base-128 encoded with the MSB indicating continuation (used as the default for integer types, including enums)
1: 64-bit - 8 bytes of data (used for double, or electively for long/ulong)
2: length-prefixed - first read an integer using variant-length encoding; this tells you how many bytes of data follow (used for strings, byte[], "packed" arrays, and as the default for child objects properties / lists)
3: "start group" - an alternative mechanism for encoding child objects that uses start/end tags - largely deprecated by Google, it is more expensive to skip an entire child-object field since you can't just "seek" past an unexpected object
4: "end group" - twinned with 3
5: 32-bit - 4 bytes of data (used for float, or electively for int/uint and other small integer types)
I suspect a field is causing the problem, how to debug this?
Are you serializing to a file? The most likely cause (in my experience) is that you have overwritten an existing file, but have not truncated it; i.e. it was 200 bytes; you've re-written it, but with only 182 bytes. There are now 18 bytes of garbage on the end of your stream that is tripping it up. Files must be truncated when re-writing protocol buffers. You can do this with FileMode:
using(var file = new FileStream(path, FileMode.Truncate)) {
// write
}
or alternatively by SetLength after writing your data:
file.SetLength(file.Position);
Other possible cause
You are (accidentally) deserializing a stream into a different type than what was serialized. It's worth double-checking both sides of the conversation to ensure this is not happening.
Since the stack trace references this StackOverflow question, I thought I'd point out that you can also receive this exception if you (accidentally) deserialize a stream into a different type than what was serialized. So it's worth double-checking both sides of the conversation to ensure this is not happening.
This can also be caused by an attempt to write more than one protobuf message to a single stream. The solution is to use SerializeWithLengthPrefix and DeserializeWithLengthPrefix.
Why this happens:
The protobuf specification supports a fairly small number of wire-types (the binary storage formats) and data-types (the .NET etc data-types). Additionally, this is not 1:1, nor is is 1:many or many:1 - a single wire-type can be used for multiple data-types, and a single data-type can be encoded via any of multiple wire-types. As a consequence, you cannot fully understand a protobuf fragment unless you already know the scema, so you know how to interpret each value. When you are, say, reading an Int32 data-type, the supported wire-types might be "varint", "fixed32" and "fixed64", where-as when reading a String data-type, the only supported wire-type is "string".
If there is no compatible map between the data-type and wire-type, then the data cannot be read, and this error is raised.
Now let's look at why this occurs in the scenario here:
[ProtoContract]
public class Data1
{
[ProtoMember(1, IsRequired=true)]
public int A { get; set; }
}
[ProtoContract]
public class Data2
{
[ProtoMember(1, IsRequired = true)]
public string B { get; set; }
}
class Program
{
static void Main(string[] args)
{
var d1 = new Data1 { A = 1};
var d2 = new Data2 { B = "Hello" };
var ms = new MemoryStream();
Serializer.Serialize(ms, d1);
Serializer.Serialize(ms, d2);
ms.Position = 0;
var d3 = Serializer.Deserialize<Data1>(ms); // This will fail
var d4 = Serializer.Deserialize<Data2>(ms);
Console.WriteLine("{0} {1}", d3, d4);
}
}
In the above, two messages are written directly after each-other. The complication is: protobuf is an appendable format, with append meaning "merge". A protobuf message does not know its own length, so the default way of reading a message is: read until EOF. However, here we have appended two different types. If we read this back, it does not know when we have finished reading the first message, so it keeps reading. When it gets to data from the second message, we find ourselves reading a "string" wire-type, but we are still trying to populate a Data1 instance, for which member 1 is an Int32. There is no map between "string" and Int32, so it explodes.
The *WithLengthPrefix methods allow the serializer to know where each message finishes; so, if we serialize a Data1 and Data2 using the *WithLengthPrefix, then deserialize a Data1 and a Data2 using the *WithLengthPrefix methods, then it correctly splits the incoming data between the two instances, only reading the right value into the right object.
Additionally, when storing heterogeneous data like this, you might want to additionally assign (via *WithLengthPrefix) a different field-number to each class; this provides greater visibility of which type is being deserialized. There is also a method in Serializer.NonGeneric which can then be used to deserialize the data without needing to know in advance what we are deserializing:
// Data1 is "1", Data2 is "2"
Serializer.SerializeWithLengthPrefix(ms, d1, PrefixStyle.Base128, 1);
Serializer.SerializeWithLengthPrefix(ms, d2, PrefixStyle.Base128, 2);
ms.Position = 0;
var lookup = new Dictionary<int,Type> { {1, typeof(Data1)}, {2,typeof(Data2)}};
object obj;
while (Serializer.NonGeneric.TryDeserializeWithLengthPrefix(ms,
PrefixStyle.Base128, fieldNum => lookup[fieldNum], out obj))
{
Console.WriteLine(obj); // writes Data1 on the first iteration,
// and Data2 on the second iteration
}
Previous answers already explain the problem better than I can. I just want to add an even simpler way to reproduce the exception.
This error will also occur simply if the type of a serialized ProtoMember is different from the expected type during deserialization.
For instance if the client sends the following message:
public class DummyRequest
{
[ProtoMember(1)]
public int Foo{ get; set; }
}
But what the server deserializes the message into is the following class:
public class DummyRequest
{
[ProtoMember(1)]
public string Foo{ get; set; }
}
Then this will result in the for this case slightly misleading error message
ProtoBuf.ProtoException: Invalid wire-type; this usually means you have over-written a file without truncating or setting the length
It will even occur if the property name changed. Let's say the client sent the following instead:
public class DummyRequest
{
[ProtoMember(1)]
public int Bar{ get; set; }
}
This will still cause the server to deserialize the int Bar to string Foo which causes the same ProtoBuf.ProtoException.
I hope this helps somebody debugging their application.
Also check the obvious that all your subclasses have [ProtoContract] attribute. Sometimes you can miss it when you have rich DTO.
I've seen this issue when using the improper Encoding type to convert the bytes in and out of strings.
Need to use Encoding.Default and not Encoding.UTF8.
using (var ms = new MemoryStream())
{
Serializer.Serialize(ms, obj);
var bytes = ms.ToArray();
str = Encoding.Default.GetString(bytes);
}
If you are using SerializeWithLengthPrefix, please mind that casting instance to object type breaks the deserialization code and causes ProtoBuf.ProtoException : Invalid wire-type.
using (var ms = new MemoryStream())
{
var msg = new Message();
Serializer.SerializeWithLengthPrefix(ms, (object)msg, PrefixStyle.Base128); // Casting msg to object breaks the deserialization code.
ms.Position = 0;
Serializer.DeserializeWithLengthPrefix<Message>(ms, PrefixStyle.Base128)
}
This happened in my case because I had something like this:
var ms = new MemoryStream();
Serializer.Serialize(ms, batch);
_queue.Add(Convert.ToBase64String(ms.ToArray()));
So basically I was putting a base64 into a queue and then, on the consumer side I had:
var stream = new MemoryStream(Encoding.UTF8.GetBytes(myQueueItem));
var batch = Serializer.Deserialize<List<EventData>>(stream);
So though the type of each myQueueItem was correct, I forgot that I converted a string. The solution was to convert it once more:
var bytes = Convert.FromBase64String(myQueueItem);
var stream = new MemoryStream(bytes);
var batch = Serializer.Deserialize<List<EventData>>(stream);

Categories

Resources