I have a human edited JSON (a config file) and need to programmatically change a value, but keep the comments and, optionally, keep the formatting, too. Is it possible with Json.NET? I have the:
JToken jobject = JToken.Parse(json);
jobject["name"] = name;
json = jobject.ToString();
But it removes all the comments and reformats the JSON string.
Keeping comments is possible but formatting is a different story and I'm not aware of a proper way of doing it using Json.Net, however JsonTextReader has LineNumber and LinePosition and it should be possible to use them to preserve formatting to a degree but it feels hacky and fragile, hence if it is not really important I suggest to use Json.Net internal formatting.
Here is a sample for updating properties and keeping comments but not formatting.
private static string Update(string json, object update)
{
var updateObj = JObject.Parse(JsonConvert.SerializeObject(update));
var result = new StringWriter();
var writer = new JsonTextWriter(result);
writer.Formatting = Formatting.Indented;
var reader = new JsonTextReader(new StringReader(json));
while (reader.Read())
{
if (reader.Value == null)
{
writer.WriteToken(reader.TokenType);
continue;
}
var token=
reader.TokenType == JsonToken.Comment ||
reader.TokenType == JsonToken.PropertyName ||
string.IsNullOrEmpty(reader.Path)
? null
: updateObj.SelectToken(reader.Path);
if (token == null)
writer.WriteToken(reader.TokenType, reader.Value);
else
writer.WriteToken(reader.TokenType, token.ToObject(reader.ValueType));
}
return result.ToString();
}
static void Main(string[] args)
{
string json = #"{
//broken
'CPU': 'Intel',
'PSU': '500W',
'Drives': [
'DVD read/writer'
/*broken*/,
'500 gigabyte hard drive',
'200 gigabype hard drive'
]
}";
var update=Update(json, new { CPU = "AMD", Drives = new[] { "120 gigabytes ssd" } });
}
Related
I am using .NET JSON parser and would like to serialize my config file so it is readable. So instead of:
{"blah":"v", "blah2":"v2"}
I would like something nicer like:
{
"blah":"v",
"blah2":"v2"
}
My code is something like this:
using System.Web.Script.Serialization;
var ser = new JavaScriptSerializer();
configSz = ser.Serialize(config);
using (var f = (TextWriter)File.CreateText(configFn))
{
f.WriteLine(configSz);
f.Close();
}
You are going to have a hard time accomplishing this with JavaScriptSerializer.
Try JSON.Net.
With minor modifications from JSON.Net example
using System;
using Newtonsoft.Json;
namespace JsonPrettyPrint
{
internal class Program
{
private static void Main(string[] args)
{
Product product = new Product
{
Name = "Apple",
Expiry = new DateTime(2008, 12, 28),
Price = 3.99M,
Sizes = new[] { "Small", "Medium", "Large" }
};
string json = JsonConvert.SerializeObject(product, Formatting.Indented);
Console.WriteLine(json);
Product deserializedProduct = JsonConvert.DeserializeObject<Product>(json);
}
}
internal class Product
{
public String[] Sizes { get; set; }
public decimal Price { get; set; }
public DateTime Expiry { get; set; }
public string Name { get; set; }
}
}
Results
{
"Sizes": [
"Small",
"Medium",
"Large"
],
"Price": 3.99,
"Expiry": "\/Date(1230447600000-0700)\/",
"Name": "Apple"
}
Documentation: Serialize an Object
A shorter sample code for Json.Net library
private static string FormatJson(string json)
{
dynamic parsedJson = JsonConvert.DeserializeObject(json);
return JsonConvert.SerializeObject(parsedJson, Formatting.Indented);
}
If you have a JSON string and want to "prettify" it, but don't want to serialise it to and from a known C# type then the following does the trick (using JSON.NET):
using System;
using System.IO;
using Newtonsoft.Json;
class JsonUtil
{
public static string JsonPrettify(string json)
{
using (var stringReader = new StringReader(json))
using (var stringWriter = new StringWriter())
{
var jsonReader = new JsonTextReader(stringReader);
var jsonWriter = new JsonTextWriter(stringWriter) { Formatting = Formatting.Indented };
jsonWriter.WriteToken(jsonReader);
return stringWriter.ToString();
}
}
}
Shortest version to prettify existing JSON: (edit: using JSON.net)
JToken.Parse("mystring").ToString()
Input:
{"menu": { "id": "file", "value": "File", "popup": { "menuitem": [ {"value": "New", "onclick": "CreateNewDoc()"}, {"value": "Open", "onclick": "OpenDoc()"}, {"value": "Close", "onclick": "CloseDoc()"} ] } }}
Output:
{
"menu": {
"id": "file",
"value": "File",
"popup": {
"menuitem": [
{
"value": "New",
"onclick": "CreateNewDoc()"
},
{
"value": "Open",
"onclick": "OpenDoc()"
},
{
"value": "Close",
"onclick": "CloseDoc()"
}
]
}
}
}
To pretty-print an object:
JToken.FromObject(myObject).ToString()
Oneliner using Newtonsoft.Json.Linq:
string prettyJson = JToken.Parse(uglyJsonString).ToString(Formatting.Indented);
Net Core App
var js = JsonSerializer.Serialize(obj, new JsonSerializerOptions {
WriteIndented = true
});
All this can be done in one simple line:
string jsonString = JsonConvert.SerializeObject(yourObject, Formatting.Indented);
Here is a solution using Microsoft's System.Text.Json library:
static string FormatJsonText(string jsonString)
{
using var doc = JsonDocument.Parse(
jsonString,
new JsonDocumentOptions
{
AllowTrailingCommas = true
}
);
MemoryStream memoryStream = new MemoryStream();
using (
var utf8JsonWriter = new Utf8JsonWriter(
memoryStream,
new JsonWriterOptions
{
Indented = true
}
)
)
{
doc.WriteTo(utf8JsonWriter);
}
return new System.Text.UTF8Encoding()
.GetString(memoryStream.ToArray());
}
You may use following standard method for getting formatted Json
JsonReaderWriterFactory.CreateJsonWriter(Stream stream, Encoding encoding, bool ownsStream, bool indent, string indentChars)
Only set "indent==true"
Try something like this
public readonly DataContractJsonSerializerSettings Settings =
new DataContractJsonSerializerSettings
{ UseSimpleDictionaryFormat = true };
public void Keep<TValue>(TValue item, string path)
{
try
{
using (var stream = File.Open(path, FileMode.Create))
{
//var currentCulture = Thread.CurrentThread.CurrentCulture;
//Thread.CurrentThread.CurrentCulture = CultureInfo.InvariantCulture;
try
{
using (var writer = JsonReaderWriterFactory.CreateJsonWriter(
stream, Encoding.UTF8, true, true, " "))
{
var serializer = new DataContractJsonSerializer(type, Settings);
serializer.WriteObject(writer, item);
writer.Flush();
}
}
catch (Exception exception)
{
Debug.WriteLine(exception.ToString());
}
finally
{
//Thread.CurrentThread.CurrentCulture = currentCulture;
}
}
}
catch (Exception exception)
{
Debug.WriteLine(exception.ToString());
}
}
Pay your attention to lines
var currentCulture = Thread.CurrentThread.CurrentCulture;
Thread.CurrentThread.CurrentCulture = CultureInfo.InvariantCulture;
....
Thread.CurrentThread.CurrentCulture = currentCulture;
For some kinds of xml-serializers you should use InvariantCulture to avoid exception during deserialization on the computers with different Regional settings. For example, invalid format of double or DateTime sometimes cause them.
For deserializing
public TValue Revive<TValue>(string path, params object[] constructorArgs)
{
try
{
using (var stream = File.OpenRead(path))
{
//var currentCulture = Thread.CurrentThread.CurrentCulture;
//Thread.CurrentThread.CurrentCulture = CultureInfo.InvariantCulture;
try
{
var serializer = new DataContractJsonSerializer(type, Settings);
var item = (TValue) serializer.ReadObject(stream);
if (Equals(item, null)) throw new Exception();
return item;
}
catch (Exception exception)
{
Debug.WriteLine(exception.ToString());
return (TValue) Activator.CreateInstance(type, constructorArgs);
}
finally
{
//Thread.CurrentThread.CurrentCulture = currentCulture;
}
}
}
catch
{
return (TValue) Activator.CreateInstance(typeof (TValue), constructorArgs);
}
}
Thanks!
Using System.Text.Json set JsonSerializerOptions.WriteIndented = true:
JsonSerializerOptions options = new JsonSerializerOptions { WriteIndented = true };
string json = JsonSerializer.Serialize<Type>(object, options);
2023 Update
For those who ask how I get formatted JSON in .NET using C# and want to see how to use it right away and one-line lovers. Here are the indented JSON string one-line codes:
There are 2 well-known JSON formatter or parsers to serialize:
Newtonsoft Json.Net version:
using Newtonsoft.Json;
var jsonString = JsonConvert.SerializeObject(yourObj, Formatting.Indented);
.Net 7 version:
using System.Text.Json;
var jsonString = JsonSerializer.Serialize(yourObj, new JsonSerializerOptions { WriteIndented = true });
using System.Text.Json;
...
var parsedJson = JsonSerializer.Deserialize<ExpandoObject>(json);
var options = new JsonSerializerOptions() { WriteIndented = true };
return JsonSerializer.Serialize(parsedJson, options);
First I wanted to add comment under Duncan Smart post, but unfortunately I have not got enough reputation yet to leave comments. So I will try it here.
I just want to warn about side effects.
JsonTextReader internally parses json into typed JTokens and then serialises them back.
For example if your original JSON was
{ "double":0.00002, "date":"\/Date(1198908717056)\/"}
After prettify you get
{
"double":2E-05,
"date": "2007-12-29T06:11:57.056Z"
}
Of course both json string are equivalent and will deserialize to structurally equal objects, but if you need to preserve original string values, you need to take this into concideration
I have something very simple for this. You can put as input really any object to be converted into json with a format:
private static string GetJson<T> (T json)
{
return JsonConvert.SerializeObject(json, Formatting.Indented);
}
This worked for me. In case someone is looking for a VB.NET version.
#imports System
#imports System.IO
#imports Newtonsoft.Json
Public Shared Function JsonPrettify(ByVal json As String) As String
Using stringReader = New StringReader(json)
Using stringWriter = New StringWriter()
Dim jsonReader = New JsonTextReader(stringReader)
Dim jsonWriter = New JsonTextWriter(stringWriter) With {
.Formatting = Formatting.Indented
}
jsonWriter.WriteToken(jsonReader)
Return stringWriter.ToString()
End Using
End Using
End Function
.NET 5 has built in classes for handling JSON parsing, serialization, deserialization under System.Text.Json namespace. Below is an example of a serializer which converts a .NET object to a JSON string,
using System.Text.Json;
using System.Text.Json.Serialization;
private string ConvertJsonString(object obj)
{
JsonSerializerOptions options = new JsonSerializerOptions();
options.WriteIndented = true; //Pretty print using indent, white space, new line, etc.
options.NumberHandling = JsonNumberHandling.AllowNamedFloatingPointLiterals; //Allow NANs
string jsonString = JsonSerializer.Serialize(obj, options);
return jsonString;
}
Below code works for me:
JsonConvert.SerializeObject(JToken.Parse(yourobj.ToString()))
For UTF8 encoded JSON file using .NET Core 3.1, I was finally able to use JsonDocument based upon this information from Microsoft: https://learn.microsoft.com/en-us/dotnet/standard/serialization/system-text-json-how-to#utf8jsonreader-utf8jsonwriter-and-jsondocument
string allLinesAsOneString = string.Empty;
string [] lines = File.ReadAllLines(filename, Encoding.UTF8);
foreach(var line in lines)
allLinesAsOneString += line;
JsonDocument jd = JsonDocument.Parse(Encoding.UTF8.GetBytes(allLinesAsOneString));
var writer = new Utf8JsonWriter(Console.OpenStandardOutput(), new JsonWriterOptions
{
Indented = true
});
JsonElement root = jd.RootElement;
if( root.ValueKind == JsonValueKind.Object )
{
writer.WriteStartObject();
}
foreach (var jp in root.EnumerateObject())
jp.WriteTo(writer);
writer.WriteEndObject();
writer.Flush();
The Newtonsoft.Json libraries' JsonConvert.DeserializeXmlNode gives inconsistent datetime results when elements have attributes on them.
Here is a small example that demonstrates the issue
public void Main(string[] args)
{
var now = DateTime.Now.ToString("yyyy-MM-ddTHH:mm:ss");
var xml = $"<timestamp>{now}</timestamp>";
Debug.WriteLine(xml);
// <timestamp>2016-11-14T14:51:32</timestamp>
var json = XmlToJson(xml);
Debug.WriteLine(json);
// {"timestamp":"2016-11-14T14:51:32"}
var good = JsonToXml(json);
Debug.WriteLine(good);
// <?xml version="1.0" encoding="utf-8"?><timestamp>2016-11-14T14:51:32</timestamp>
var xml_with_attr = $"<timestamp id=\"1\">{now}</timestamp>";
Debug.WriteLine(xml_with_attr);
// <timestamp id="1">2016-11-14T14:51:32</timestamp>
var json_with_attr = XmlToJson(xml_with_attr);
Debug.WriteLine(json_with_attr);
// {"timestamp":{"#id":"1","#text":"2016-11-14T14:51:32"}}
var bad = JsonToXml(json_with_attr);
Debug.WriteLine(bad);
// <?xml version="1.0" encoding="utf-8"?><timestamp id="1">2016-11-14 2:51:32 PM</timestamp>
}
private string XmlToJson(string xml)
{
var doc = new XmlDocument();
doc.LoadXml(xml);
var json = JsonConvert.SerializeXmlNode(doc);
return json;
}
private string JsonToXml(string json)
{
var doc = JsonConvert.DeserializeXmlNode(json);
var xml = string.Empty;
var settings = new XmlWriterSettings
{
CloseOutput = true,
Encoding = Encoding.UTF8,
};
using (var ms = new MemoryStream())
using (var xw = XmlWriter.Create(ms, settings))
{
doc.WriteTo(xw);
xw.Flush();
xml = settings.Encoding.GetString(ms.ToArray());
}
return xml;
}
As you can see, the bad date is not in the same format as all the previous results. This is unfortunately causing the xml to fail schema validation once it gets verified against the schema.
I know about the DateTimeConverter stuff, but converting to and from a XmlDocument does not give me that option.
I can also - unfortunately - not do the JsonConvert on the schema generated class because I have no idea what it might be at the time of execution.
Does anyone know how I can get the same format back when the element has an attribute?
Thanks
Update
Fixed in Json.NET 10.0.1:
Fix - Fixed deserializing non-string values in some XML nodes
See this issue and this commit.
Original answer
This seems to be a bug in Json.NET's XmlNodeConverter. You might want to report an issue.
The workaround is to disable date parsing when converting from JSON to XML. Please note that this works reliably only as long as all dates and times in the JSON are already in ISO 8601 format. Since that seems to be true in your test case, you should be OK:
private static string JsonToXml(string json)
{
var settings = new JsonSerializerSettings
{
Converters = { new Newtonsoft.Json.Converters.XmlNodeConverter() },
DateParseHandling = DateParseHandling.None,
};
var doc = JsonConvert.DeserializeObject<XmlDocument>(json, settings);
var xmlSettings = new XmlWriterSettings
{
CloseOutput = true,
Encoding = Encoding.UTF8,
};
string xml;
using (var ms = new MemoryStream())
using (var xw = XmlWriter.Create(ms, xmlSettings))
{
doc.WriteTo(xw);
xw.Flush();
xml = xmlSettings.Encoding.GetString(ms.ToArray());
}
return xml;
}
The cause of the bug is as follows, in case you decide to report an issue. As you have noticed, Json.NET represents the XML text value for an element without attributes differently from the text value for an element with attributes:
No Attributes: {"timestamp":"2016-11-15T01:07:14"}.
In this case, the JSON token value for your date string is added to the XML DOM via the method XmlNodeConverter.CreateElement():
if (reader.TokenType == JsonToken.String
|| reader.TokenType == JsonToken.Integer
|| reader.TokenType == JsonToken.Float
|| reader.TokenType == JsonToken.Boolean
|| reader.TokenType == JsonToken.Date)
{
string text = ConvertTokenToXmlValue(reader);
if (text != null)
{
element.AppendChild(document.CreateTextNode(text));
}
}
It calls ConvertTokenToXmlValue():
private string ConvertTokenToXmlValue(JsonReader reader)
{
if (reader.TokenType == JsonToken.String)
{
return (reader.Value != null) ? reader.Value.ToString() : null;
}
else if (reader.TokenType == JsonToken.Integer)
{
#if !(NET20 || NET35 || PORTABLE || PORTABLE40)
if (reader.Value is BigInteger)
{
return ((BigInteger)reader.Value).ToString(CultureInfo.InvariantCulture);
}
#endif
return XmlConvert.ToString(Convert.ToInt64(reader.Value, CultureInfo.InvariantCulture));
}
else if (reader.TokenType == JsonToken.Float)
{
if (reader.Value is decimal)
{
return XmlConvert.ToString((decimal)reader.Value);
}
if (reader.Value is float)
{
return XmlConvert.ToString((float)reader.Value);
}
return XmlConvert.ToString(Convert.ToDouble(reader.Value, CultureInfo.InvariantCulture));
}
else if (reader.TokenType == JsonToken.Boolean)
{
return XmlConvert.ToString(Convert.ToBoolean(reader.Value, CultureInfo.InvariantCulture));
}
else if (reader.TokenType == JsonToken.Date)
{
#if !NET20
if (reader.Value is DateTimeOffset)
{
return XmlConvert.ToString((DateTimeOffset)reader.Value);
}
#endif
DateTime d = Convert.ToDateTime(reader.Value, CultureInfo.InvariantCulture);
#if !PORTABLE
return XmlConvert.ToString(d, DateTimeUtils.ToSerializationMode(d.Kind));
#else
return XmlConvert.ToString(d);
#endif
}
else if (reader.TokenType == JsonToken.Null)
{
return null;
}
else
{
throw JsonSerializationException.Create(reader, "Cannot get an XML string value from token type '{0}'.".FormatWith(CultureInfo.InvariantCulture, reader.TokenType));
}
}
Which has a lot of logic for converting a JSON token value to an XML value and is doing the right thing when converting your dates and times to XML.
Attributes: {"timestamp":{"#id":"1","#text":"2016-11-15T01:07:14"}}.
But in this case the current JSON token value gets appended as-is to the XML DOM in the method DeserializeValue():
private void DeserializeValue(JsonReader reader, IXmlDocument document, XmlNamespaceManager manager, string propertyName, IXmlNode currentNode)
{
switch (propertyName)
{
case TextName:
currentNode.AppendChild(document.CreateTextNode(reader.Value.ToString()));
break;
As you can see, the conversion logic is missing and ToString() is used instead. That's the bug.
Replacing that line with the following fixes your problem:
currentNode.AppendChild(document.CreateTextNode(ConvertTokenToXmlValue(reader)));
I need to remove the outer node of a JSON. So an example would be :
{
app: {
...
}
}
Any ideas on how to remove the outer node, so we get only
{
...
}
WITHOUT using JSON.NET, only tools in the .NET Framework (C#).
In Json.NET I used:
JObject.Parse(json).SelectToken("app").ToString();
Alternatively, any configuration of the DataContractJsonSerializer, so that it ignores the root when deserializing, would also work. The way I do the desrialization now is:
protected T DeserializeJsonString<T>(string jsonString)
{
T tempObject = default(T);
using (var memoryStream = new MemoryStream(Encoding.Unicode.GetBytes(jsonString)))
{
var serializer = new DataContractJsonSerializer(typeof(T));
tempObject = (T)serializer.ReadObject(memoryStream);
}
return tempObject;
}
Note that the root object's property name can differ from case to case. For example it can be "transaction".
Thanks for any suggestion.
There is no equivalent to SelectToken built into .Net. But if you simply want to unwrap an outer root node and do not know the node name in advance, you have the following options.
If you are using .Net 4.5 or later, you can deserialize to a Dictionary<string, T> with DataContractJsonSerializer.UseSimpleDictionaryFormat = true:
protected T DeserializeNestedJsonString<T>(string jsonString)
{
using (var memoryStream = new MemoryStream(Encoding.Unicode.GetBytes(jsonString)))
{
var serializer = new DataContractJsonSerializer(typeof(Dictionary<string, T>));
serializer.UseSimpleDictionaryFormat = true;
var dictionary = (Dictionary<string, T>)serializer.ReadObject(memoryStream);
if (dictionary == null || dictionary.Count == 0)
return default(T);
else if (dictionary.Count == 1)
return dictionary.Values.Single();
else
{
throw new InvalidOperationException("Root object has too many properties");
}
}
}
Note that if your root object contains more than one property, you cannot deserialize to a Dictionary<TKey, TValue> to get the first property since the order of the items in this class is undefined.
On any version of .Net that supports the data contract serializers, you can take advantage of the fact that DataContractJsonSerializer inherits from XmlObjectSerializer to call JsonReaderWriterFactory.CreateJsonReader() to create an XmlReader that actually reads JSON, then skip forward to the first nested "element":
protected T DeserializeNestedJsonStringWithReader<T>(string jsonString)
{
var reader = JsonReaderWriterFactory.CreateJsonReader(Encoding.Unicode.GetBytes(jsonString), System.Xml.XmlDictionaryReaderQuotas.Max);
int elementCount = 0;
while (reader.Read())
{
if (reader.NodeType == System.Xml.XmlNodeType.Element)
elementCount++;
if (elementCount == 2) // At elementCount == 1 there is a synthetic "root" element
{
var serializer = new DataContractJsonSerializer(typeof(T));
return (T)serializer.ReadObject(reader, false);
}
}
return default(T);
}
This technique looks odd (parsing JSON with an XmlReader?), but with some extra work it should be possible to extend this idea to create SAX-like parsing functionality for JSON that is similar to SelectToken(), skipping forward in the JSON until a desired property is found, then deserializing its value.
For instance, to select and deserialize specific named properties, rather than just the first root property, the following can be used:
public static class DataContractJsonSerializerExtensions
{
public static T DeserializeNestedJsonProperty<T>(string jsonString, string rootPropertyName)
{
// Check for count == 2 because there is a synthetic <root> element at the top.
Predicate<Stack<string>> match = s => s.Count == 2 && s.Peek() == rootPropertyName;
return DeserializeNestedJsonProperties<T>(jsonString, match).FirstOrDefault();
}
public static IEnumerable<T> DeserializeNestedJsonProperties<T>(string jsonString, Predicate<Stack<string>> match)
{
DataContractJsonSerializer serializer = null;
using (var reader = JsonReaderWriterFactory.CreateJsonReader(Encoding.UTF8.GetBytes(jsonString), XmlDictionaryReaderQuotas.Max))
{
var stack = new Stack<string>();
while (reader.Read())
{
if (reader.NodeType == System.Xml.XmlNodeType.Element)
{
stack.Push(reader.Name);
if (match(stack))
{
serializer = serializer ?? new DataContractJsonSerializer(typeof(T));
yield return (T)serializer.ReadObject(reader, false);
}
if (reader.IsEmptyElement)
stack.Pop();
}
else if (reader.NodeType == XmlNodeType.EndElement)
{
stack.Pop();
}
}
}
}
}
See Mapping Between JSON and XML for details on how JsonReaderWriterFactory maps JSON to XML.
I'm trying to split very large JSON files into smaller files for a given array. For example:
{
"headerName1": "headerVal1",
"headerName2": "headerVal2",
"headerName3": [{
"element1Name1": "element1Value1"
},
{
"element2Name1": "element2Value1"
},
{
"element3Name1": "element3Value1"
},
{
"element4Name1": "element4Value1"
},
{
"element5Name1": "element5Value1"
},
{
"element6Name1": "element6Value1"
}]
}
...down to { "elementNName1": "elementNValue1" } where N is a large number
The user provides the name which represents the array to be split (in this example "headerName3") and the number of array objects per file, e.g. 1,000,000
This would result in N files each containing the top name:value pairs (headerName1, headerName3) and up to 1,000,000 of the headerName3 objects in each file.
I'm using the excellent Newtonsof JSON.net and understand that I need to do this using a stream.
So far I have looked a reading in JToken objects to establish where the PropertyName == "headerName3" occurs when reading in the tokens but what I would like to do is then read in the entire JSON object for each object in the array and not have to continue parsing JSON into JTokens;
Here's a snippet of the code I am building so far:
using (StreamReader oSR = File.OpenText(strInput))
{
using (var reader = new JsonTextReader(oSR))
{
while (reader.Read())
{
if (reader.TokenType == JsonToken.StartObject)
{
intObjectCount++;
}
else if (reader.TokenType == JsonToken.EndObject)
{
intObjectCount--;
if (intObjectCount == 1)
{
intArrayRecordCount++;
// Here I want to read the entire object for this record into an untyped JSON object
if( intArrayRecordCount % 1000000 == 0)
{
//write these to the split file
}
}
}
}
}
}
I don't know - and in fact, and am not concerned with - the structure of the JSON itself, and the objects can be of varying structures within the array. I am therefore not serializing to classes.
Is this the right approach? Is there a set of methods in the JSON.net library I can easily use to perform such operation?
Any help appreciated.
You can use JsonWriter.WriteToken(JsonReader reader, true) to stream individual array entries and their descendants from a JsonReader to a JsonWriter. You can also use JProperty.Load(JsonReader reader) and JProperty.WriteTo(JsonWriter writer) to read and write entire properties and their descendants.
Using these methods, you can create a state machine that parses the JSON file, iterates through the root object, loads "prefix" and "postfix" properties, splits the array property, and writes the prefix, array slice, and postfix properties out to new file(s).
Here's a prototype implementation that takes a TextReader and a callback function to create sequential output TextWriter objects for the split file:
enum SplitState
{
InPrefix,
InSplitProperty,
InSplitArray,
InPostfix,
}
public static void SplitJson(TextReader textReader, string tokenName, long maxItems, Func<int, TextWriter> createStream, Formatting formatting)
{
List<JProperty> prefixProperties = new List<JProperty>();
List<JProperty> postFixProperties = new List<JProperty>();
List<JsonWriter> writers = new List<JsonWriter>();
SplitState state = SplitState.InPrefix;
long count = 0;
try
{
using (var reader = new JsonTextReader(textReader))
{
bool doRead = true;
while (doRead ? reader.Read() : true)
{
doRead = true;
if (reader.TokenType == JsonToken.Comment || reader.TokenType == JsonToken.None)
continue;
if (reader.Depth == 0)
{
if (reader.TokenType != JsonToken.StartObject && reader.TokenType != JsonToken.EndObject)
throw new JsonException("JSON root container is not an Object");
}
else if (reader.Depth == 1 && reader.TokenType == JsonToken.PropertyName)
{
if ((string)reader.Value == tokenName)
{
state = SplitState.InSplitProperty;
}
else
{
if (state == SplitState.InSplitProperty)
state = SplitState.InPostfix;
var property = JProperty.Load(reader);
doRead = false; // JProperty.Load() will have already advanced the reader.
if (state == SplitState.InPrefix)
{
prefixProperties.Add(property);
}
else
{
postFixProperties.Add(property);
}
}
}
else if (reader.Depth == 1 && reader.TokenType == JsonToken.StartArray && state == SplitState.InSplitProperty)
{
state = SplitState.InSplitArray;
}
else if (reader.Depth == 1 && reader.TokenType == JsonToken.EndArray && state == SplitState.InSplitArray)
{
state = SplitState.InSplitProperty;
}
else if (state == SplitState.InSplitArray && reader.Depth == 2)
{
if (count % maxItems == 0)
{
var writer = new JsonTextWriter(createStream(writers.Count)) { Formatting = formatting };
writers.Add(writer);
writer.WriteStartObject();
foreach (var property in prefixProperties)
property.WriteTo(writer);
writer.WritePropertyName(tokenName);
writer.WriteStartArray();
}
count++;
writers.Last().WriteToken(reader, true);
}
else
{
throw new JsonException("Internal error");
}
}
}
foreach (var writer in writers)
using (writer)
{
writer.WriteEndArray();
foreach (var property in postFixProperties)
property.WriteTo(writer);
writer.WriteEndObject();
}
}
finally
{
// Make sure files are closed in the event of an exception.
foreach (var writer in writers)
using (writer)
{
}
}
}
This method leaves all the files open until the end in case "postfix" properties, appearing after the array property, need to be appended. Be aware that there is a limit of 16384 open files at one time, so if you need to create more split files, this won't work. If postfix properties are never encountered in practice, you can just close each file before opening the next and throw an exception in case any postfix properties are found. Otherwise you may need to parse the large file in two passes or close and reopen the split files to append them.
Here is an example of how to use the method with an in-memory JSON string:
private static void TestSplitJson(string json, string tokenName)
{
var builders = new List<StringBuilder>();
using (var reader = new StringReader(json))
{
SplitJson(reader, tokenName, 2, i => { builders.Add(new StringBuilder()); return new StringWriter(builders.Last()); }, Formatting.Indented);
}
foreach (var s in builders.Select(b => b.ToString()))
{
Console.WriteLine(s);
}
}
Prototype fiddle.
I have a JSON string that I expect to contain duplicate keys that I am unable to make JSON.NET happy with.
I was wondering if anybody knows the best way (maybe using JsonConverter? ) to get JSON.NET to change a JObject's child JObjects into to JArrays when it sees duplicate key names ?
// For example: This gives me a JObject with a single "JProperty\JObject" child.
var obj = JsonConvert.DeserializeObject<object>("{ \"HiThere\":1}");
// This throws:
// System.ArgumentException : Can not add Newtonsoft.Json.Linq.JValue to Newtonsoft.Json.Linq.JObject.
obj = JsonConvert.DeserializeObject<object>("{ \"HiThere\":1, \"HiThere\":2, \"HiThere\":3 }");
The actual JSON I am trying to deserialize is much more complicated and the duplicates are nested at multiple levels. But the code above demonstrates why it fails for me.
I understand that the JSON is not correct which is why I am asking if JSON.NET has a way to work around this. For argument's sake let's say I do not have control over the JSON. I actually do use a specific type for the parent object but the particular property that is having trouble will either be a string or another nested JSON object. The failing property type is "object" for this reason.
Interesting question. I played around with this for a while and discovered that while a JObject cannot contain properties with duplicate names, the JsonTextReader used to populate it during deserialization does not have such a restriction. (This makes sense if you think about it: it's a forward-only reader; it is not concerned with what it has read in the past). Armed with this knowledge, I took a shot at writing some code that will populate a hierarchy of JTokens, converting property values to JArrays as necessary if a duplicate property name is encountered in a particular JObject. Since I don't know your actual JSON and requirements, you may need to make some adjustments to it, but it's something to start with at least.
Here's the code:
public static JToken DeserializeAndCombineDuplicates(JsonTextReader reader)
{
if (reader.TokenType == JsonToken.None)
{
reader.Read();
}
if (reader.TokenType == JsonToken.StartObject)
{
reader.Read();
JObject obj = new JObject();
while (reader.TokenType != JsonToken.EndObject)
{
string propName = (string)reader.Value;
reader.Read();
JToken newValue = DeserializeAndCombineDuplicates(reader);
JToken existingValue = obj[propName];
if (existingValue == null)
{
obj.Add(new JProperty(propName, newValue));
}
else if (existingValue.Type == JTokenType.Array)
{
CombineWithArray((JArray)existingValue, newValue);
}
else // Convert existing non-array property value to an array
{
JProperty prop = (JProperty)existingValue.Parent;
JArray array = new JArray();
prop.Value = array;
array.Add(existingValue);
CombineWithArray(array, newValue);
}
reader.Read();
}
return obj;
}
if (reader.TokenType == JsonToken.StartArray)
{
reader.Read();
JArray array = new JArray();
while (reader.TokenType != JsonToken.EndArray)
{
array.Add(DeserializeAndCombineDuplicates(reader));
reader.Read();
}
return array;
}
return new JValue(reader.Value);
}
private static void CombineWithArray(JArray array, JToken value)
{
if (value.Type == JTokenType.Array)
{
foreach (JToken child in value.Children())
array.Add(child);
}
else
{
array.Add(value);
}
}
And here's a demo:
class Program
{
static void Main(string[] args)
{
string json = #"
{
""Foo"" : 1,
""Foo"" : [2],
""Foo"" : [3, 4],
""Bar"" : { ""X"" : [ ""A"", ""B"" ] },
""Bar"" : { ""X"" : ""C"", ""X"" : ""D"" },
}";
using (StringReader sr = new StringReader(json))
using (JsonTextReader reader = new JsonTextReader(sr))
{
JToken token = DeserializeAndCombineDuplicates(reader);
Dump(token, "");
}
}
private static void Dump(JToken token, string indent)
{
Console.Write(indent);
if (token == null)
{
Console.WriteLine("null");
return;
}
Console.Write(token.Type);
if (token is JProperty)
Console.Write(" (name=" + ((JProperty)token).Name + ")");
else if (token is JValue)
Console.Write(" (value=" + token.ToString() + ")");
Console.WriteLine();
if (token.HasValues)
foreach (JToken child in token.Children())
Dump(child, indent + " ");
}
}
Output:
Object
Property (name=Foo)
Array
Integer (value=1)
Integer (value=2)
Integer (value=3)
Integer (value=4)
Property (name=Bar)
Array
Object
Property (name=X)
Array
String (value=A)
String (value=B)
Object
Property (name=X)
Array
String (value=C)
String (value=D)
Brian Rogers - Here is the helper function of the JsonConverter that I wrote. I modified it based on your comments about how a JsonTextReader is just a forward-reader doesn't care about duplicate values.
private static object GetObject(JsonReader reader)
{
switch (reader.TokenType)
{
case JsonToken.StartObject:
{
var dictionary = new Dictionary<string, object>();
while (reader.Read() && (reader.TokenType != JsonToken.EndObject))
{
if (reader.TokenType != JsonToken.PropertyName)
throw new InvalidOperationException("Unknown JObject conversion state");
string propertyName = (string) reader.Value;
reader.Read();
object propertyValue = GetObject(reader);
object existingValue;
if (dictionary.TryGetValue(propertyName, out existingValue))
{
if (existingValue is List<object>)
{
var list = existingValue as List<object>;
list.Add(propertyValue);
}
else
{
var list = new List<object> {existingValue, propertyValue};
dictionary[propertyName] = list;
}
}
else
{
dictionary.Add(propertyName, propertyValue);
}
}
return dictionary;
}
case JsonToken.StartArray:
{
var list = new List<object>();
while (reader.Read() && (reader.TokenType != JsonToken.EndArray))
{
object propertyValue = GetObject(reader);
list.Add(propertyValue);
}
return list;
}
default:
{
return reader.Value;
}
}
}
You should not be using a generic type of object, it should be a more specific type.
However you json is malformed which is you rmain problem
You have :
"{ \"HiThere\":1, \"HiThere\":2, \"HiThere\":3 }"
But it should be:
"{"HiTheres": [{\"HiThere\":1}, {\"HiThere\":2}, {\"HiThere\":3} ]}"
Or
"{ \"HiThereOne\":1, \"HiThereTwo\":2, \"HiThereThree\":3 }"
You json is one object with 3 fields with all the same name ("HiThere").
Which wont work.
The json I have shown gives:
An array (HiTheres) of three objects each with a field of HiThere
Or
One object with three field with different names. (HiThereOne, HiThereTwo, "HiThereThree)
Have a look at http://jsoneditoronline.org/index.html
And http://json.org/