Json.NET, can SerializeXmlNode be extended to detect numbers? - c#

I am converting from XML to JSON using SerializeXmlNode. It looks like the expected behavior is to convert all XML values to strings, but I'd like to emit true numeric values where appropriate.
// Input: <Type>1</Type>
string json = JsonConvert.SerializeXmlNode(node, Newtonsoft.Json.Formatting.Indented, true);
// Output: "Type": "1"
// Desired: "Type": 1
Is there a way to hook into the serialization process at the appropriate points, through delegates perhaps, or must I write my own custom JsonConverter class to manage the transition?
Regex Hack
Given the complexity of a proper solution, here is another (which I'm not entirely proud of, but it works...).
// Convert to JSON, and remove quotes around numbers
string json = JsonConvert.SerializeXmlNode(node, Newtonsoft.Json.Formatting.Indented, true);
// HACK to force integers as numbers, not strings.
Regex rgx = new Regex("\"(\\d+)\"");
json = rgx.Replace(json, "$1");

XML does not have a way to differentiate primitive types like JSON does. Therefore, when converting XML directly to JSON, Json.Net does not know what types the values should be, short of guessing. If it always assumed that values consisting only of digits were numbers, then things like postal codes and phone numbers with leading zeros would get mangled in the conversion. It is not surprising, then, that Json.Net takes the safe road and treats all values as strings.
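To make the leading-zero concern concrete, here is a minimal sketch of a narrower variant of the regex hack from the question. It only unquotes property values that are integers without leading zeros, so a value such as "02345" stays a string. The class and method names are made up for illustration, and it is still a text-level hack that guesses at types, not something Json.Net provides.

```csharp
using System.Text.RegularExpressions;

public static class NumericUnquoter
{
    // Matches a colon followed by a quoted integer that has no leading zeros,
    // e.g.  : "1"  or  : "-42"  but not  : "02345"  and not  : "1.5".
    private static readonly Regex QuotedInteger =
        new Regex(@":\s*""(-?(?:0|[1-9]\d*))""", RegexOptions.Compiled);

    // Rewrites every match as an unquoted number, keeping one space after the colon.
    public static string Unquote(string json)
    {
        return QuotedInteger.Replace(json, ": $1");
    }
}
```

Called on {"Type": "1", "postal": "02345"}, this turns "1" into 1 and leaves the postal code quoted. Quoted integers inside arrays are not touched, and a field that merely happens to contain digits (an ID, for instance) would still be converted.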
One way to work around this issue is to deserialize your XML to an intermediate object, then serialize that to JSON. Since the intermediate object has strongly typed properties, Json.Net knows what to output. Here is an example:
using System;
using System.IO;
using System.Xml.Serialization;
using Newtonsoft.Json;

class Program
{
    static void Main(string[] args)
    {
        string xml = @"<root><ordinal>1</ordinal><postal>02345</postal></root>";
        XmlSerializer xs = new XmlSerializer(typeof(Intermediary));
        using (TextReader reader = new StringReader(xml))
        {
            // Deserialize into the typed class, then let Json.Net serialize it.
            Intermediary obj = (Intermediary)xs.Deserialize(reader);
            string json = JsonConvert.SerializeObject(obj, Formatting.Indented);
            Console.WriteLine(json);
        }
    }
}

[XmlRoot("root")]
public class Intermediary
{
    public int ordinal { get; set; }      // emitted as a JSON number
    public string postal { get; set; }    // stays a string, leading zero intact
}
Output of the above:
{
"ordinal": 1,
"postal": "02345"
}
To make a more generic solution, yes, you'd have to write your own converter. In fact, the XML-to-JSON conversion that takes place when calling SerializeXmlNode is done using an XmlNodeConverter that ships with Json.Net. This converter itself does not appear to be very extensible, but you could always use its source code as a starting point for creating your own.
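Short of writing a full converter, one option is to post-process the JSON tree that SerializeXmlNode produces. The sketch below is not a custom XmlNodeConverter; it swaps in a different technique: parse the output into a JToken, then replace string values that look like integers without leading zeros. The XmlToTypedJson class and its rule for what counts as a number are assumptions for illustration.

```csharp
using System.Globalization;
using System.Linq;
using System.Xml;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public static class XmlToTypedJson
{
    public static string Convert(XmlNode node, bool omitRootObject)
    {
        // Step 1: the stock conversion, where every value is a string.
        string raw = JsonConvert.SerializeXmlNode(node, Newtonsoft.Json.Formatting.None, omitRootObject);
        JToken root = JToken.Parse(raw);

        // Step 2: walk all leaf values and swap integer-looking strings for numbers.
        JContainer container = root as JContainer;
        if (container != null)
        {
            // ToList() takes a snapshot so the tree can be modified while looping.
            foreach (JValue value in container.Descendants().OfType<JValue>().ToList())
            {
                if (value.Type != JTokenType.String) continue;
                string text = (string)value.Value;
                long number;
                if (LooksLikeInteger(text) &&
                    long.TryParse(text, NumberStyles.AllowLeadingSign, CultureInfo.InvariantCulture, out number))
                {
                    value.Replace(new JValue(number));
                }
            }
        }
        return root.ToString(Newtonsoft.Json.Formatting.Indented);
    }

    // Assumed rule: optional minus sign, ASCII digits only, no leading zeros.
    private static bool LooksLikeInteger(string text)
    {
        string digits = text.StartsWith("-") ? text.Substring(1) : text;
        if (digits.Length == 0 || !digits.All(c => c >= '0' && c <= '9')) return false;
        return digits == "0" || digits[0] != '0';
    }
}
```

This keeps "02345" as a string while turning "1" into 1, but it is still guessing: a digits-only field that should remain text would need an exclusion list or a schema.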

Related

c# ignoring { in strings

I'm trying to learn C# and Json.NET, and I want to create a multiline JSON string that interpolates string variables, but the first curly bracket, which is meant to be part of the JSON object, is being treated as the start of an interpolation. The compiler reports an error at 'Name'. If anyone can give me a solution, that would be great.
Here is the code:
string Name = "'Name'";
string Is_Airing = "True";
string Genre_One = "'Yes'";
string Genre_Two = "'No'";
string Json_String = $@"{
'Name': '{Name}',
'Is_Airing': {Is_Airing},
'Genres': [
'{Genre_One}',
'{Genre_Two}'
]
}";
If your heart is set on DIY, double up the braces to escape them ({{ and }}), but you're really [doing a poor job of] reinventing the wheel compared to using a serializer and chucking something like an anonymous or proper type into it:
//newtonsoft
JsonConvert.SerializeObject(new {
Name, //gets name of property from name of variable
Is_Airing = MyIsAiringVariableName, //specifies name of property in anonymous type
Genres = new []{
Genre_One,
Genre_Two
}
});
Regarding 'Name': '{Name}', note that JSON uses double quotes, by the way, and typically uses camelCase names. Using a serializer will ensure better compliance with standard JSON ("be strict in what you send and liberal in what you accept").
For more control over how the output JSON appears, you set options on the serializer (such as passing Formatting.Indented as the second argument to SerializeObject), or decorate your properties with attributes.
The Newtonsoft documentation is quite comprehensive and includes useful samples to get you going: https://www.newtonsoft.com/json/help/html/SerializeObject.htm
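For completeness, the brace-doubling route mentioned at the start of this answer could look roughly like the sketch below. The JsonBraces class and its parameters are made up for illustration. Inside a $@"..." string, {{ and }} produce literal braces and "" produces a literal double quote.

```csharp
public static class JsonBraces
{
    public static string Build(string name, bool isAiring, string genreOne, string genreTwo)
    {
        // JSON booleans are lowercase, so convert explicitly rather than using bool.ToString().
        string airing = isAiring ? "true" : "false";

        // {{ and }} are literal braces; single { } pairs are interpolation holes.
        return $@"{{
  ""Name"": ""{name}"",
  ""Is_Airing"": {airing},
  ""Genres"": [ ""{genreOne}"", ""{genreTwo}"" ]
}}";
    }
}
```

Note that nothing here escapes the values themselves: a name containing a double quote or a backslash would produce invalid JSON, which is one more reason to prefer the serializer.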

Parse byte as byte, not string

I receive some JSON from a Java third-party system that contains Avro schemas in JSON format. An example looks like this:
{"type":"record", "name":"AvroRecord", "namespace":"Parent.Namespace", "fields": [{"name":"AvroField", "type":"bytes", "default":"\u00FF"}]}
I parse this JSON to do some C# code generation. The result would look like this:
public partial class AvroRecord
{
[AvroField(Name = "AvroField", Type = "bytes", DefaultValueText = "ÿ")]
public byte[] AvroField { get; set; }
public AvroRecord() { this.AvroField = new byte[] { 255 }; }
}
Eventually, from the C# representation of the schema, I need to infer back the original schema. Once I get that inferred schema, it will be sent over to the original system for comparison. That is why I want to keep the original string value for the default value, since I don't know if:
{"type":"record", "name":"AvroRecord", "namespace":"Parent.Namespace", "fields": [{"name":"AvroField", "type":"bytes", "default":"\u00FF"}]}
and
{"type":"record", "name":"AvroRecord", "namespace":"Parent.Namespace", "fields": [{"name":"AvroField", "type":"bytes", "default":"ÿ"}]}
will result in an exact match or it will have a problem.
I use JSON.NET to convert from the raw schema as a string to something more useful that I can work with:
JToken token = JToken.Parse(schema);
Is there a way in JSON.NET or any other JSON parsing library to control the parsing and copy a value without it being unescaped? Basically, a way to avoid "\u00FF" becoming "ÿ".
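As a partial illustration only, the sketch below shows two things that can be checked with Json.NET: the escaped and unescaped spellings parse to the same token, and non-ASCII characters can be re-escaped on output through StringEscapeHandling.EscapeNonAscii. It does not preserve the original spelling during parsing; every non-ASCII character is escaped on the way out, and the hex digits may differ in case from the input. The EscapeDemo class name is an assumption.

```csharp
using System.IO;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public static class EscapeDemo
{
    // True when both JSON texts represent the same value once parsed.
    public static bool SameAfterParsing(string a, string b)
    {
        return JToken.DeepEquals(JToken.Parse(a), JToken.Parse(b));
    }

    // Writes a token back out with every non-ASCII character as a \uXXXX escape.
    public static string WriteEscaped(JToken token)
    {
        using (var text = new StringWriter())
        using (var writer = new JsonTextWriter(text))
        {
            writer.StringEscapeHandling = StringEscapeHandling.EscapeNonAscii;
            token.WriteTo(writer);
            writer.Flush();
            return text.ToString();
        }
    }
}
```

If both systems compare parsed values rather than raw text, the first method suggests the two schema spellings should be treated as equal; whether the remote system does so is something only a test against it can confirm.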

Smartly replace strings

I am working with a JSON API. As C# identifiers cannot contain characters like - (minus) or . (dot), I replace each such character with _ (underscore). The replacement happens when the JSON response is received as a string, so every attribute name containing a - or a . has it replaced by a _, and every attribute name then matches the property names in the class it will be deserialized into.
To make it clearer, here are some examples:
I receive the following JSON: { "id": 1, "result": [ { "data": [ { "adm-pass": "" } ] } ] }
In the class I want to deserialize into I have this attribute : public String adm_pass {get; set;}
So I replace the minus with an underscore so that the NewtonSoft parser can deserialize it accordingly.
My problem is that I sometimes get negative integers in my JSON. So if I do the string replacement in {"beta" : -1}, the -1 (an integer here) becomes _1, which cannot be deserialized properly and raises a parsing exception.
Is there a way to replace the string smartly so I can avoid this error?
For example, if - is followed by a digit, it is not replaced.
If not, is there another solution for this kind of problem?
Newtonsoft allows you to specify the exact name of the JSON property, which it will use to serialize/deserialize.
So you should be able to do this
[JsonProperty("adm-pass")]
public String adm_pass { get; set; }
This way you are not restricted to name your properties exactly as the JSON property names. And in your case, you won't need to do a string replace.
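A minimal sketch of this approach, using a flattened version of the JSON from the question (the Entry class and the beta property are assumptions for illustration):

```csharp
using Newtonsoft.Json;

public class Entry
{
    // Maps the JSON name "adm-pass" onto a valid C# identifier.
    [JsonProperty("adm-pass")]
    public string adm_pass { get; set; }

    // Negative numbers are left alone because the JSON text is never modified.
    [JsonProperty("beta")]
    public int beta { get; set; }
}

public static class JsonPropertyDemo
{
    public static Entry Parse(string json)
    {
        return JsonConvert.DeserializeObject<Entry>(json);
    }
}
```

For example, JsonPropertyDemo.Parse("{\"adm-pass\":\"secret\",\"beta\":-1}") yields an Entry whose adm_pass is "secret" and whose beta is -1.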
Hope this helps.
You'll have to check that you are replacing the key and not the value, maybe by using a regex like http://regexr.com/3d471
Regex could work as wlalele suggests.
But I would create a new object like this:
Create a new object:
var sharpObj = {};
Loop through the object's properties, as described in "Iterate through object properties":
for (var property in object) {
if (object.hasOwnProperty(property)) {
// do stuff
}
}
In the // do stuff section, create a property on sharpObj with the desired string replacements and set the property to the same value.
var cleanProperty = cleanPropertyName(property);
sharpObj[cleanProperty] = object[property];
Note: I assume you can figure out the cleanPropertyName() method or similar.
Stringify the object
var string = JSON.stringify(sharpObj);
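Since the question is about C#, the same idea can be sketched with Json.NET's JToken tree: rebuild the document while renaming only the keys, so values such as -1 are never touched. The KeyCleaner class and its replacement rule are assumptions for illustration.

```csharp
using System.Linq;
using Newtonsoft.Json.Linq;

public static class KeyCleaner
{
    // Assumed rule: '-' and '.' in property names become '_'.
    private static string Clean(string name)
    {
        return name.Replace('-', '_').Replace('.', '_');
    }

    // Returns a copy of the token in which every property name has been cleaned.
    public static JToken RenameKeys(JToken token)
    {
        JObject obj = token as JObject;
        if (obj != null)
        {
            var renamed = new JObject();
            foreach (JProperty property in obj.Properties())
            {
                renamed[Clean(property.Name)] = RenameKeys(property.Value);
            }
            return renamed;
        }

        JArray array = token as JArray;
        if (array != null)
        {
            return new JArray(array.Select(item => RenameKeys(item)));
        }

        // Leaf values (strings, numbers, booleans, null) are copied unchanged.
        return token.DeepClone();
    }
}
```

Usage would be along the lines of KeyCleaner.RenameKeys(JToken.Parse(json)).ToObject<MyClass>(), where MyClass is whatever type you deserialize into. If two different keys clean to the same name, the later one overwrites the earlier.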
You can use Substring to check whether the character after the hyphen is a digit. This adapts easily to your code, since you already locate the character, so you could do:
int a;
if(int.TryParse(adm_pass.Substring(adm_pass.IndexOf("-") + 1,1),out a))
{
//Code if next character is an int
}
else
{
adm_pass = adm_pass.Replace("-","_");
}
This kind of code can be looped until there are no remaining hyphens/minuses.

Lightest serialization method for simple data structure

I have a simple data structure which I want to serialize without adding too much overhead.
Which approach do you consider the best in terms of data size?
Custom serialization/deserialization using separators such as "#" or another character I am 100% sure is not present in my data
XmlSerialization
JSON
Other
I'm using custom serialization with # as separator because I'm 100% sure I don't have that character in my data.
Data structure example:
string Title
int ChapterIndex
List<String> Paragraphs
I have a list of the objects above.
No optimization (tabs and spaces)
JSON:
[
{
"title": "some title 0",
"chapterIndex": 0,
"paragraphs": ["p1", "p2", "p3", "p4"]
},
{
"title": "some title 1",
"chapterIndex": 1,
"paragraphs": ["p1chap1", "p2chap1", "p3chap1", "p4chap1"]
}
]
XML:
<RootTag>
<item title="some title 0" chapterIndex="0">
<paragraph>p1</paragraph>
<paragraph>p2</paragraph>
<paragraph>p3</paragraph>
<paragraph>p4</paragraph>
</item>
<item title="some title 1" chapterIndex="1">
<paragraph>p1chap1</paragraph>
<paragraph>p2chap1</paragraph>
<paragraph>p3chap1</paragraph>
<paragraph>p4chap1</paragraph>
</item>
</RootTag>
Optimized (no unnecessary characters)
JSON:
[{"title":"some title 0","chapterIndex":0,"paragraphs":["p1","p2","p3","p4"]},{"title":"some title 1","chapterIndex":1,"paragraphs":["p1chap1","p2chap1","p3chap1","p4chap1"]}]
XML:
<RootTag><item title="some title 0" chapterIndex="0"><paragraph>p1</paragraph><paragraph>p2</paragraph><paragraph>p3</paragraph><paragraph>p4</paragraph></item><item title="some title 1" chapterIndex="1"><paragraph>p1chap1</paragraph><paragraph>p2chap1</paragraph><paragraph>p3chap1</paragraph><paragraph>p4chap1</paragraph></item></RootTag>
Custom:
some title 0##0##p1#p2#p3#p4###some title 1##1##p1chap1#p2chap1#p3chap1#p4chap1###and_so_on
Custom optimized:
some title 0§0§p1#p2#p3#p4¤some title 1§1§p1chap1#p2chap1#p3chap1#p4chap1¤and_so_on
having
¤ as list item separator
§ as properties inside item separator
# as paragraph content separator
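A minimal sketch of how the "custom optimized" format above could be written and read back. The Chapter and SeparatorFormat names are assumptions, and the code relies on the stated guarantee that none of the three separator characters occur in the data.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public class Chapter
{
    public string Title { get; set; }
    public int ChapterIndex { get; set; }
    public List<string> Paragraphs { get; set; }
}

public static class SeparatorFormat
{
    private const string ItemSeparator = "¤";      // between list items
    private const string FieldSeparator = "§";     // between properties of one item
    private const string ParagraphSeparator = "#"; // between paragraphs

    public static string Serialize(IEnumerable<Chapter> chapters)
    {
        return string.Join(ItemSeparator, chapters.Select(c =>
            string.Join(FieldSeparator,
                c.Title,
                c.ChapterIndex.ToString(CultureInfo.InvariantCulture),
                string.Join(ParagraphSeparator, c.Paragraphs))));
    }

    public static List<Chapter> Deserialize(string data)
    {
        var result = new List<Chapter>();
        if (string.IsNullOrEmpty(data)) return result;

        foreach (string item in data.Split(ItemSeparator[0]))
        {
            string[] fields = item.Split(FieldSeparator[0]);
            result.Add(new Chapter
            {
                Title = fields[0],
                ChapterIndex = int.Parse(fields[1], CultureInfo.InvariantCulture),
                // An empty third field means "no paragraphs", not one empty paragraph.
                Paragraphs = fields[2].Length == 0
                    ? new List<string>()
                    : fields[2].Split(ParagraphSeparator[0]).ToList()
            });
        }
        return result;
    }
}
```

The format cannot distinguish an empty paragraph list from a list holding one empty string, and any separator appearing in a title or paragraph would corrupt the output, which is the trade-off for its small size.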
UPDATE:
In my case I have strings more than integers since it's kind of a book/lyrics application which only needs title chapternumber/lyricId and all the paragraphs of the lyrics.
After trying all serialization types provided in answers I've come to an answer to my question.
When and only when the requirements are these:
Data size priority
Simple data structure
Custom Serialization and Deserialization required
Know the content of your data to choose your separators properly
only under these conditions is it a win to use custom serialization as shown in my question.
About performance?
It all depends on how you write your de/serialization methods.
It is complex to decide. If your classes are composed primarily of strings, then your approach is the better one. The only "more better" approach would be to compress the resulting stream (something that you can still do after creating the serialized data).
If your data is primarily numeric/non-string, then BinaryFormatter/protobuf are binary serializers, and their output should be smaller than your serializer, because you use 5 bytes to serialize 10000, while a binary serializer will probably use only 2-4 bytes.
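The byte counts in the previous paragraph can be checked with a small sketch. It compares the text form of a number with a fixed-width binary form and with a base-128 varint, the variable-length integer encoding used by Protocol Buffers. The SizeDemo class is an assumption for illustration, and the varint count covers the value only, not the field tag that precedes it on the wire.

```csharp
using System.Globalization;
using System.IO;
using System.Text;

public static class SizeDemo
{
    // Bytes needed when the number is written as decimal text.
    public static int TextSize(int value)
    {
        return Encoding.UTF8.GetByteCount(value.ToString(CultureInfo.InvariantCulture));
    }

    // Bytes needed when the number is written as a fixed-width 32-bit integer.
    public static int FixedSize(int value)
    {
        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write(value);
            writer.Flush();
            return (int)stream.Length;
        }
    }

    // Bytes needed for a base-128 varint: 7 payload bits per byte.
    public static int VarintSize(uint value)
    {
        int count = 1;
        while (value >= 0x80)
        {
            value >>= 7;
            count++;
        }
        return count;
    }
}
```

For 10000 this gives 5 bytes as text, 4 bytes as a fixed 32-bit integer, and 2 bytes as a varint.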
Json and xml serializer will surely produce bigger serialized data, because they are both "textual" (so they serialize the number 10000 as 10000) (as is your serializer) and they include additional markup that, being non-empty, is by definition non-smaller than a single character.
Now... is it better to write a custom serializer or to use protobuf? I would trust a serializer written by Marc Gravell (protobuf) and based on a "standard" created by Google more than one written by me :-) Right now you are serializing integers and strings, but perhaps tomorrow you'll need to serialize DateTime, float, or other complex types. Are 100 fewer bytes worth the hours you'll need to implement the serialization correctly? That is for you to decide.
An example with Protobuf:
using System;
using System.Collections.Generic;
using System.IO;
using ProtoBuf;

[ProtoContract]
public class MyObject
{
[ProtoMember(1)]
public string title { get; set; }
[ProtoMember(2)]
public int chapterIndex { get; set; }
[ProtoMember(3)]
public List<String> paragraphs { get; set; }
}
var myo = new[]
{
new MyObject
{
title = "some title 0",
chapterIndex = 0,
paragraphs = new List<string> { "p1", "p2", "p3", "p4" }
},
new MyObject
{
title = "some title 1",
chapterIndex = 1,
paragraphs = new List<string> { "p1chap1", "p2chap1", "p3chap1", "p4chap1" }
},
};
byte[] bytes;
using (var ms = new MemoryStream())
{
Serializer.Serialize(ms, myo);
bytes = ms.ToArray();
}
using (var ms = new MemoryStream(bytes))
{
MyObject[] myo2 = Serializer.Deserialize<MyObject[]>(ms);
}
The length of the byte[] is 86, so just a little longer than your custom formatter (81). But note that this is with a single numeric field, and you used single-digit numbers. The point is that protobuf is still probably better, because it has been written by a professional and doesn't have the limitations of your serializer.
In my opinion JSON is the simplest method and also generates no overhead.
Here: http://json.org/example you can see the difference between JSON and XML.
And the JSON Parser will do everything automatically for you.
I've been using Google's protobuf serialization in my project and it's one of the lightest so far.

Best way to deserialize a long string (response of an external web service)

I am querying a web service that was built by another developer. It returns a result set in a JSON-like format. I get three column values (I already know what the ordinal position of each column means):
[["Boston","142","JJK"],["Miami","111","QLA"],["Sacramento","042","PPT"]]
In reality, this result set can be thousands of records long.
What's the best way to parse this string?
I guess a JSON deserializer would be nice, but what is a good one to use in C#/.NET? I'm pretty sure the System.Runtime.Serialization.Json serializer won't work.
Using the built-in libraries for ASP.NET (System.Runtime.Serialization and System.ServiceModel.Web) you can get what you want pretty easily:
string[][] parsed = null;
var jsonStr = @"[[""Boston"",""142"",""JJK""],[""Miami"",""111"",""QLA""],[""Sacramento"",""042"",""PPT""]]";
// UTF-8 is used so that non-ASCII characters survive the round trip through the stream.
using (var ms = new System.IO.MemoryStream(System.Text.Encoding.UTF8.GetBytes(jsonStr)))
{
var serializer = new System.Runtime.Serialization.Json.DataContractJsonSerializer(typeof(string[][]));
parsed = serializer.ReadObject(ms) as string[][];
}
A little more complex example (which was my original answer)
First make a dummy class to use for serialization. It just needs one member to hold the result which should be of type string[][].
[DataContract]
public class Result
{
[DataMember(Name="d")]
public string[][] d { get; set; }
}
Then it's as simple as wrapping your result up like so: { "d": /your results/ }. See below for an example:
Result parsed = null;
var jsonStr = @"[[""Boston"",""142"",""JJK""],[""Miami"",""111"",""QLA""],[""Sacramento"",""042"",""PPT""]]";
// {{ and }} are escaped braces for string.Format; {0} is replaced by the raw result.
using (var ms = new MemoryStream(Encoding.UTF8.GetBytes(string.Format(@"{{ ""d"": {0} }}", jsonStr))))
{
var serializer = new System.Runtime.Serialization.Json.DataContractJsonSerializer(typeof(Result));
parsed = serializer.ReadObject(ms) as Result;
}
How about this?
It sounds like you have a pretty simple format that you could write a custom parser for, since you may not want to wait for the entire response to be parsed and returned before you start using it.
I would just write a recursive parser that looks for the tokens "[", ",", "\"", and "]" and does the appropriate thing.
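As an alternative to hand-writing the tokenizer, the same row-at-a-time behaviour can be sketched with Json.NET's JsonTextReader, which already recognizes those tokens. This swaps in a library reader for the custom parser suggested above; the RowReader class is an assumption for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

public static class RowReader
{
    // Yields one inner array at a time, so callers can start work before the input ends.
    public static IEnumerable<string[]> ReadRows(TextReader input)
    {
        using (var reader = new JsonTextReader(input))
        {
            // Keep date-like strings as plain strings.
            reader.DateParseHandling = DateParseHandling.None;

            var row = new List<string>();
            int depth = 0;
            while (reader.Read())
            {
                switch (reader.TokenType)
                {
                    case JsonToken.StartArray:
                        depth++;
                        if (depth == 2) row = new List<string>();
                        break;
                    case JsonToken.String:
                        if (depth == 2) row.Add((string)reader.Value);
                        break;
                    case JsonToken.EndArray:
                        if (depth == 2) yield return row.ToArray();
                        depth--;
                        break;
                }
            }
        }
    }
}
```

Passing the response stream wrapped in a StreamReader lets each row be handled as soon as it has been read, which matters when the result set is thousands of records long.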
