Parse a string containing several json objects into something more convenient - c#

The crux of the problem here is that I don't know any C#, yet find myself adding a feature to some test infrastructure which happens to be written in C#. I suspect this question is entirely trivial and beg your patience in answering. My colleagues who originally wrote this stuff are all out of the office.
I am parsing a string representing one or more json objects. So far I can get the first object, but can't work out how to access the remainder.
using System.Runtime.Serialization.Json; // JsonReaderWriterFactory
using System.Text;                       // Encoding
using System.Xml.Linq;                   // XElement
using System.Xml.XPath;                  // XPathSelectElement
// Assert comes from whichever unit-test framework the project uses.
public class demo
{
    public void minimal()
    {
        // Note - the input is not quite json! I.e. I don't have
        // [{"Name" : "foo"}, {"Name" : "bar"}]
        // Each individual object is well formed, they just aren't in
        // a convenient array for easy parsing.
        // The string representations of the objects are literally concatenated.
        string data = @"{""Name"": ""foo""} {""Name"" : ""bar""}";
        System.Xml.XmlDictionaryReader jsonReader =
            JsonReaderWriterFactory.CreateJsonReader(Encoding.UTF8.GetBytes(data),
                new System.Xml.XmlDictionaryReaderQuotas());
        System.Xml.Linq.XElement root = XElement.Load(jsonReader);
        Assert.AreEqual(root.XPathSelectElement("//Name").Value, "foo");
        // The following clearly doesn't work
        Assert.AreEqual(root.XPathSelectElement("//Name").Value, "bar");
    }
}
I'm roughly at the point of rolling enough of a parser to work out where to split the string by counting braces, but I'm hoping that library support will do this for me.
The ideal end result is a sequential data structure of your choice (list, vector? don't care) containing one System.Xml.Linq.XElement for each JSON object embedded in the string.
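If you do end up rolling your own splitter, a minimal sketch using only the BCL might look like the following. It assumes every top-level value is a JSON object separated by whitespace, and it tracks string literals and backslash escapes so that braces inside strings are not counted. ConcatenatedJson, SplitObjects and ToXElements are hypothetical names for illustration; each chunk is then loaded through the same JsonReaderWriterFactory reader used in the question, giving one XElement per object.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.Serialization.Json;
using System.Text;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper: splits concatenated top-level JSON objects by counting
// braces, ignoring braces that appear inside string literals.
public static class ConcatenatedJson
{
    public static List<string> SplitObjects(string input)
    {
        var result = new List<string>();
        int depth = 0;      // current brace nesting level
        int start = -1;     // index of the '{' that opened the current object
        bool inString = false;
        bool escaped = false;
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (inString)
            {
                if (escaped) escaped = false;        // this char was escaped; skip it
                else if (c == '\\') escaped = true;  // next char is escaped
                else if (c == '"') inString = false; // end of string literal
                continue;
            }
            if (c == '"') inString = true;
            else if (c == '{')
            {
                if (depth == 0) start = i;
                depth++;
            }
            else if (c == '}')
            {
                depth--;
                if (depth == 0)
                    result.Add(input.Substring(start, i - start + 1));
                else if (depth < 0)
                    throw new FormatException("Unbalanced '}' at index " + i);
            }
        }
        if (depth != 0 || inString)
            throw new FormatException("Input ended inside an object or string.");
        return result;
    }

    // Loads each object through the DataContract JSON reader, one XElement per object.
    public static List<XElement> ToXElements(string input)
    {
        var elements = new List<XElement>();
        foreach (var chunk in SplitObjects(input))
        {
            using (XmlDictionaryReader reader = JsonReaderWriterFactory.CreateJsonReader(
                Encoding.UTF8.GetBytes(chunk), new XmlDictionaryReaderQuotas()))
            {
                elements.Add(XElement.Load(reader));
            }
        }
        return elements;
    }
}
```

For the question's data, ToXElements returns two elements, and the second one's Element("Name").Value is "bar". Counting braces is fragile compared to a real JSON reader, so this is mainly useful when no third-party library is allowed.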
Thanks!
Edit: a roughly viable example, mostly due to George Richardson. I'm playing fast and loose with the type system (dynamic is not available in C# 3.0; it was introduced in C# 4.0), but the end result seems to be predictable.
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Newtonsoft.Json;
public class demo
{
    private IEnumerable<Newtonsoft.Json.Linq.JObject>
        DeserializeObjects(string input)
    {
        var serializer = new JsonSerializer();
        using (var strreader = new StringReader(input))
        {
            using (var jsonreader = new JsonTextReader(strreader))
            {
                jsonreader.SupportMultipleContent = true;
                while (jsonreader.Read())
                {
                    yield return (Newtonsoft.Json.Linq.JObject)
                        serializer.Deserialize(jsonreader);
                }
            }
        }
    }
    public void example()
    {
        string json = @"{""Name"": ""foo""} {""Name"" : ""bar""} {""Name"" : ""baz""}";
        var objects = DeserializeObjects(json);
        var array = objects.ToArray();
        Assert.AreEqual(3, array.Length);
        Assert.AreEqual(array[0]["Name"].ToString(), "foo");
        Assert.AreEqual(array[1]["Name"].ToString(), "bar");
        Assert.AreEqual(array[2]["Name"].ToString(), "baz");
    }
}

You are going to want to use Json.NET for your actual deserialization needs. The big problem I see here is that your JSON data is simply concatenated, which means you have to extract each object from the string. Luckily, Json.NET's JsonReader has a SupportMultipleContent property which does just this:
public void Main()
{
    string json = @"{""Name"": ""foo""} {""Name"" : ""bar""} {""Name"" : ""baz""}";
    IEnumerable<dynamic> deserialized = DeserializeObjects(json);
    string name = deserialized.First().Name; //name is "foo"
}
IEnumerable<object> DeserializeObjects(string input)
{
    JsonSerializer serializer = new JsonSerializer();
    using (var strreader = new StringReader(input)) {
        using (var jsonreader = new JsonTextReader(strreader)) {
            jsonreader.SupportMultipleContent = true;
            while (jsonreader.Read()) {
                yield return serializer.Deserialize(jsonreader);
            }
        }
    }
}
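The same loop also works with a concrete type instead of object or dynamic, which sidesteps the C# 3.0 question entirely. A sketch, assuming Json.NET (Newtonsoft.Json) is referenced; NameHolder and MultiContentJson are hypothetical names, while JsonReader.SupportMultipleContent and JsonSerializer.Deserialize<T>(JsonReader) are existing Json.NET members.

```csharp
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

// Hypothetical POCO matching {"Name": "..."}
public class NameHolder
{
    public string Name { get; set; }
}

public static class MultiContentJson
{
    // Lazily deserializes each concatenated top-level JSON value as a T.
    public static IEnumerable<T> DeserializeObjects<T>(string input)
    {
        var serializer = new JsonSerializer();
        using (var stringReader = new StringReader(input))
        using (var jsonReader = new JsonTextReader(stringReader))
        {
            // Allow more than one top-level value in the same text.
            jsonReader.SupportMultipleContent = true;
            // Read() moves to the start of the next value (StartObject here);
            // Deserialize<T> then consumes that whole value.
            while (jsonReader.Read())
            {
                yield return serializer.Deserialize<T>(jsonReader);
            }
        }
    }
}
```

Iterating MultiContentJson.DeserializeObjects<NameHolder>(json) over the three-object string above yields items whose Name values are "foo", "bar" and "baz", in that order.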

Related

How to deserialize a JSONP response (preferably with JsonTextReader and not a string)?

I am trying to consume a web service that claims to return JSON, but actually always returns JSONP. I don't see a way to change that service's behavior.
I would like to use NewtonSoft Json.Net to parse the result. I have declared a class, let's call it MyType that I want to deserialize the inner JSON result into.
JSONP:
parseResponse({
"total" : "13,769",
"lower" : "1",
"upper" : "20"})
As you can see, this is not valid JSON, as it has a parseResponse( prefix and a ) suffix. While this example is very simple, the actual response can be quite long, on the order of hundreds of kilobytes.
MyType:
public class MyType
{
public Decimal total;
public int lower;
public int upper;
}
After I get my web service response into a stream and a JsonTextReader, I try to deserialize it like this:
(MyType)serializer.Deserialize(jsonTextReader, typeof(MyType));
Of course I get null for a result because there is that pesky parseResponse with round brackets.
I've taken a look at this question, which unfortunately does not help. I'm actually using a JsonTextReader to feed in the JSON rather than a string (and prefer to, to avoid the performance hit of creating a huge string). Even if I used the suggestion from that question, it looks dangerous as it uses a global replace. If there is no good way to use a stream, an answer with safe parsing of strings would be okay.
If I interpret your question as follows:
I am trying to deserialize some JSON from a Stream. The "JSON" is actually in JSONP format and so contains some prefix and postfix text I would like to ignore. How can I skip the prefix and postfix text while still reading and deserializing directly from stream rather than loading the entire stream into a string?
Then you can deserialize your JSON from a JSONP stream using the following helper methods (SkipPast is an extension method on TextReader):
using System.IO;
using Newtonsoft.Json;
public static class JsonExtensions
{
    public static T DeserializeEmbeddedJsonP<T>(Stream stream)
    {
        using (var textReader = new StreamReader(stream))
            return DeserializeEmbeddedJsonP<T>(textReader);
    }
    public static T DeserializeEmbeddedJsonP<T>(TextReader textReader)
    {
        using (var jsonReader = new JsonTextReader(textReader.SkipPast('(')))
        {
            var settings = new JsonSerializerSettings
            {
                CheckAdditionalContent = false,
            };
            return JsonSerializer.CreateDefault(settings).Deserialize<T>(jsonReader);
        }
    }
}
public static class TextReaderExtensions
{
    public static TTextReader SkipPast<TTextReader>(this TTextReader reader, char ch) where TTextReader : TextReader
    {
        while (true)
        {
            var c = reader.Read();
            if (c == -1 || c == ch)
                return reader;
        }
    }
}
Notes:
Prior to constructing the JsonTextReader I construct a StreamReader and skip past the first '(' character in the stream. This positions the StreamReader at the beginning of the actual JSON.
Before deserialization I set JsonSerializerSettings.CheckAdditionalContent = false to tell the serializer to ignore any characters after the end of the JSON content. Oddly enough, it is necessary to set this explicitly even though the default value appears to be false already, because the underlying field is nullable.
The same code can be used to deserialize embedded JSONP from a string by passing a StringReader to DeserializeEmbeddedJsonP<T>(TextReader textReader). Doing so avoids the need to create a new string by trimming the prefix and postfix text, and so may improve performance and memory use even for smaller strings.
Sample working .Net fiddle.
It looks like it's returning JSONP. It's kind of weird that a web service would do that by default, without you including "?callback". In any case, if that's just the way it is, you can easily use a regex to strip off the method call:
var x = WebServiceCall();
x = Regex.Replace(x, @"^.+?\(|\)$", "");
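If you would rather not rely on a regex, a minimal string-based sketch is to take everything between the first ( and the last ). It assumes the payload has the shape callback(...), optionally followed by ; and whitespace; JsonpText.Unwrap is a hypothetical helper. Because it only looks at the outermost parentheses, a ) inside a JSON string value is left alone. Unlike the stream-based DeserializeEmbeddedJsonP approach, it does allocate a new string for the inner JSON.

```csharp
using System;

public static class JsonpText
{
    // Hypothetical helper: returns the text between the first '(' and the last ')'.
    public static string Unwrap(string jsonp)
    {
        if (jsonp == null) throw new ArgumentNullException("jsonp");
        int open = jsonp.IndexOf('(');
        int close = jsonp.LastIndexOf(')');
        if (open < 0 || close <= open)
            throw new FormatException("Input does not look like callback(...).");
        return jsonp.Substring(open + 1, close - open - 1);
    }
}
```

For example, JsonpText.Unwrap("parseResponse({\"lower\" : \"1\"});") returns {"lower" : "1"}, which can then be passed to JsonConvert.DeserializeObject<MyType>.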

What is the best way to convert Newtonsoft JSON's JToken to JArray?

Given an array in the JSON, I am trying to find the best way to convert it to JArray.
For example - consider this below C# code:
var json = @"{
    ""cities"": [""London"", ""Paris"", ""New York""]
}";
I can read this JSON into JObject as -
var jsonObject = JObject.Parse(json);
Now I will get the "cities" field.
var jsonCities = jsonObject["cities"];
Here I get jsonCities as type JToken.
I know jsonCities is an array, so I would like to get it converted to JArray.
The way I currently do it is like this:
var cities = JArray.FromObject(jsonCities);
I am trying to find out whether there is a better way to convert it to a JArray.
How are other folks using it?
The accepted answer should really be the comment by dbc.
After the proposed cast to JArray, we can validate the result by checking for a null value:
var jsonCities = jsonObject["cities"] as JArray;
if (jsonCities == null) return;
// ...do your thing with the JArray...
Edit:
As stated by dbc, a JToken that represents a JArray is already a JArray.
That is, if JToken.Type equals JTokenType.Array, the token can be accessed as a JArray using the as operator. When as is used, a failed cast yields null instead of throwing an exception, as explained here. That makes it convenient for validating that you actually got a JArray you can use.
JArray.FromObject(x) takes an object, so it can be used with anything that can be represented as an object, including a JToken.
In this case we know that we can simply cast from JToken to JArray, which gives us another option. I would expect it to be faster, but I leave that as an exercise for someone else to figure out.
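A short sketch of the cast-based options discussed above, assuming Json.NET is referenced and C# 7 or later for the is pattern; CitiesExample.ReadCities is a hypothetical name. A plain (JArray)token cast would throw InvalidCastException for a non-array token, whereas the is pattern simply fails, and ToObject<T>() converts the array into an ordinary .NET collection.

```csharp
using System.Collections.Generic;
using Newtonsoft.Json.Linq;

public static class CitiesExample
{
    public static List<string> ReadCities(string json)
    {
        JObject jsonObject = JObject.Parse(json);
        JToken token = jsonObject["cities"];

        // Succeeds only if the token really is a JArray (token.Type == JTokenType.Array).
        // (JArray)token would throw InvalidCastException here instead of failing quietly.
        if (token is JArray cities)
        {
            // Convert the JArray into a plain List<string>.
            return cities.ToObject<List<string>>();
        }

        // Missing property (token == null) or not an array.
        return new List<string>();
    }
}
```

With the question's JSON, ReadCities returns a list containing London, Paris and New York; if "cities" held a string instead, it would return an empty list.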
Here we use C# classes to store the contents of our JSON string by deserializing it.
Below is a basic example.
For further reading, see the Newtonsoft.Json documentation:
https://www.newtonsoft.com/json/help/html/DeserializeObject.htm
using System;
using System.Collections.Generic;
using Newtonsoft.Json;
class Program
{
    static void Main(string[] args)
    {
        var json = @"{""cities"": [""London"", ""Paris"", ""New York""]}";
        MyObject result = JsonConvert.DeserializeObject<MyObject>(json);
        foreach (var city in result.Cities)
        {
            Console.WriteLine(city);
        }
        Console.ReadKey();
    }
    public class MyObject
    {
        [JsonProperty("cities")]
        public List<string> Cities { get; set; }
    }
}

Serialize as NDJSON using Json.NET

Is it possible to serialize to NDJSON (Newline Delimited JSON) using Json.NET? The Elasticsearch API uses NDJSON for bulk operations, and I can find nothing suggesting that this format is supported by any .NET libraries.
This answer provides guidance for deserializing NDJSON, and it was noted that one could serialize each row independently and join the rows with newlines, but I would not necessarily call that supported.
As Json.NET does not currently have a built-in method to serialize a collection to NDJSON, the simplest answer would be to write to a single TextWriter using a separate JsonTextWriter for each line, setting CloseOutput = false for each:
using System.Collections.Generic;
using System.IO;
using System.Text;
using Newtonsoft.Json;
public static partial class JsonExtensions
{
    public static void ToNewlineDelimitedJson<T>(Stream stream, IEnumerable<T> items)
    {
        // Let caller dispose the underlying stream
        using (var textWriter = new StreamWriter(stream, new UTF8Encoding(false, true), 1024, true))
        {
            ToNewlineDelimitedJson(textWriter, items);
        }
    }
    public static void ToNewlineDelimitedJson<T>(TextWriter textWriter, IEnumerable<T> items)
    {
        var serializer = JsonSerializer.CreateDefault();
        foreach (var item in items)
        {
            // Formatting.None is the default; I set it here for clarity.
            using (var writer = new JsonTextWriter(textWriter) { Formatting = Formatting.None, CloseOutput = false })
            {
                serializer.Serialize(writer, item);
            }
            // https://web.archive.org/web/20180513150745/http://specs.okfnlabs.org/ndjson/
            // Each JSON text MUST conform to the [RFC7159] standard and MUST be written to the stream followed by the newline character \n (0x0A).
            // The newline character MAY be preceded by a carriage return \r (0x0D). The JSON texts MUST NOT contain newlines or carriage returns.
            textWriter.Write("\n");
        }
    }
}
Sample fiddle.
Since the individual NDJSON lines are likely to be short but the number of lines might be large, this answer suggests a streaming solution to avoid allocating a single string larger than 85 KB. As explained in Newtonsoft Json.NET Performance Tips, such large strings end up on the large object heap and may subsequently degrade application performance.
You could try this:
string ndJson = JsonConvert.SerializeObject(value, Formatting.Indented);
but now I see that you don't just want the serialized object to be pretty-printed (indented output spans multiple lines, so it isn't NDJSON anyway). If the object you are serializing is some kind of collection or enumeration, could you not just do this yourself by serializing each element?
StringBuilder sb = new StringBuilder();
foreach (var element in collection)
{
    sb.AppendLine(JsonConvert.SerializeObject(element, Formatting.None));
}
// use the NDJSON output
Console.WriteLine(sb.ToString());
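StringBuilder.AppendLine terminates each line with Environment.NewLine, which is \r\n on Windows. The NDJSON spec quoted in the other answer permits a carriage return before the newline, but if a consumer expects a bare \n, a variant that appends '\n' explicitly might look like this. It is a sketch assuming Json.NET; NdjsonText.Build is a hypothetical name. Like any single-string approach, it builds the whole payload in memory, which the streaming ToNewlineDelimitedJson answer avoids.

```csharp
using System.Collections.Generic;
using System.Text;
using Newtonsoft.Json;

public static class NdjsonText
{
    // Serializes each item on its own line, always terminated by '\n'.
    public static string Build<T>(IEnumerable<T> items)
    {
        var sb = new StringBuilder();
        foreach (var item in items)
        {
            // Formatting.None keeps each JSON text on one line; embedded
            // newlines inside strings are escaped as \n by the serializer.
            sb.Append(JsonConvert.SerializeObject(item, Formatting.None));
            sb.Append('\n');
        }
        return sb.ToString();
    }
}
```

For example, NdjsonText.Build(new[] { new { id = 1 }, new { id = 2 } }) produces {"id":1} and {"id":2} on separate lines, each followed by a single \n.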

How to deserialize class with array of serializable object

I have a class with an array of objects of some serializable classes (named elements). I want to serialize this class and then deserialize it, but deserialization is difficult.
Because I don't know the type of each element of the array in advance, before serialization I create two arrays that match the elements array: one (TypeOfElements) keeps the type of each element and the other (SerilizedElements) keeps each element's serialized string. But after deserialization I don't know how to rebuild my main elements array. How can I use a Type that is only known at runtime to deserialize each element and recreate my main array?
[ProtoContract]
class MyClass
{
    .
    .
    public MyClass()
    {
    }
    object[] elements;
    [ProtoMember(1)]
    string[] SerilizedElements;
    [ProtoMember(2)]
    string[] TypeOfElements;
    [ProtoBeforeSerialization]
    void initBeforeSerilize()
    {
        TypeOfElements = new string[elements.Length];
        SerilizedElements = new string[elements.Length];
        for (int i = 0; i < elements.Length; i++)
        {
            TypeOfElements[i] = elements[i].GetType().ToString();
            using (MemoryStream ms = new MemoryStream())
            {
                Serializer.Serialize(ms, elements[i]);
                SerilizedElements[i] = Encoding.UTF8.GetString(ms.ToArray());
            }
        }
    }
    [ProtoAfterDeserialization]
    void initAfterSerilize()
    {
        for (int i = 0; i < SerilizedElements.Length; i++)
        {
            Type t = Type.GetType(TypeOfElements[i]);
            using (MemoryStream ms = new MemoryStream(Encoding.ASCII.GetBytes(SerilizedElements[i])))
            {
                //I don't know how to write this line
                elements[i] = Serializer.Deserialize<t>(ms);
            }
        }
    }
}
For the "I only know the type at runtime" issue, look at Serializer.NonGeneric, which has all the methods you would want for working with a Type. The non-generic API is also the primary API in v2, aka TypeModel. The string encoding issue has already been noted: protobuf output is binary, so round-tripping it through Encoding.UTF8 or Encoding.ASCII text will corrupt it. If you need strings, base-64 should be used, but personally I'd use a byte[]. I would also suggest thinking about whether inheritance can be used instead of unknown types; this is certainly possible if the number of candidate types is finite and known.
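A minimal sketch of the question's two callbacks rewritten along those lines, assuming protobuf-net 2.x, where Serializer.NonGeneric.Serialize(Stream, object) and Serializer.NonGeneric.Deserialize(Type, Stream) accept the type at runtime. It keeps the question's field names, switches the string encoding to base-64, stores AssemblyQualifiedName so Type.GetType can find types from other assemblies, and allocates elements after deserialization (the array is not a ProtoMember, so it does not exist yet). Point and the public Elements accessor are hypothetical additions for illustration; each element type still needs its own [ProtoContract].

```csharp
using System;
using System.IO;
using ProtoBuf;

// Hypothetical element type used for illustration.
[ProtoContract]
class Point
{
    [ProtoMember(1)]
    public int X;
}

[ProtoContract]
class MyClass
{
    object[] elements;

    // Hypothetical accessor so the example can be exercised from outside.
    public object[] Elements
    {
        get { return elements; }
        set { elements = value; }
    }

    [ProtoMember(1)]
    string[] SerilizedElements;

    [ProtoMember(2)]
    string[] TypeOfElements;

    [ProtoBeforeSerialization]
    void initBeforeSerilize()
    {
        var items = elements ?? new object[0];
        TypeOfElements = new string[items.Length];
        SerilizedElements = new string[items.Length];
        for (int i = 0; i < items.Length; i++)
        {
            // AssemblyQualifiedName lets Type.GetType resolve types from other assemblies.
            TypeOfElements[i] = items[i].GetType().AssemblyQualifiedName;
            using (var ms = new MemoryStream())
            {
                Serializer.NonGeneric.Serialize(ms, items[i]);
                // Protobuf output is binary; base-64 stores it as a string without loss.
                SerilizedElements[i] = Convert.ToBase64String(ms.ToArray());
            }
        }
    }

    [ProtoAfterDeserialization]
    void initAfterSerilize()
    {
        // An empty array is not written, so it comes back as null.
        int count = SerilizedElements == null ? 0 : SerilizedElements.Length;
        elements = new object[count];
        for (int i = 0; i < count; i++)
        {
            Type t = Type.GetType(TypeOfElements[i], true);
            using (var ms = new MemoryStream(Convert.FromBase64String(SerilizedElements[i])))
            {
                // Non-generic overload: the type is only known at runtime.
                elements[i] = Serializer.NonGeneric.Deserialize(t, ms);
            }
        }
    }
}
```

Round-tripping with Serializer.DeepClone(new MyClass { Elements = new object[] { new Point { X = 3 } } }) should give back an Elements array whose first item is a Point with X equal to 3. Switching SerilizedElements to byte arrays, as suggested above, would remove the base-64 step.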
