Lightest serialization method for simple data structure - c#

I have a simple data structure that I want to serialize without adding too much overhead.
Which approach do you consider best in terms of data size?
Custom serialization/deserialization using separators such as "#" or another character that I am 100% sure is not present in my data
XmlSerialization
JSON
Other
I'm using custom serialization with # as the separator because I'm 100% sure that character never appears in my data.
Data structure example:
string Title
int ChapterIndex
List<String> Paragraphs
I have a list of these objects.
No optimization (tabs and spaces)
JSON:
[
{
"title": "some title 0",
"chapterIndex": 0,
"paragraphs": ["p1", "p2", "p3", "p4"]
},
{
"title": "some title 1",
"chapterIndex": 1,
"paragraphs": ["p1chap1", "p2chap1", "p3chap1", "p4chap1"]
}
]
XML:
<RootTag>
<item title="some title 0" chapterIndex="0">
<paragraph>p1</paragraph>
<paragraph>p2</paragraph>
<paragraph>p3</paragraph>
<paragraph>p4</paragraph>
</item>
<item title="some title 1" chapterIndex="1">
<paragraph>p1chap1</paragraph>
<paragraph>p2chap1</paragraph>
<paragraph>p3chap1</paragraph>
<paragraph>p4chap1</paragraph>
</item>
</RootTag>
Optimized (no unnecessary characters)
JSON:
[{"title":"some title 0","chapterIndex":0,"paragraphs":["p1","p2","p3","p4"]},{"title":"some title 1","chapterIndex":1,"paragraphs":["p1chap1","p2chap1","p3chap1","p4chap1"]}]
XML:
<RootTag><item title="some title 0" chapterIndex="0"><paragraph>p1</paragraph><paragraph>p2</paragraph><paragraph>p3</paragraph><paragraph>p4</paragraph></item><item title="some title 1" chapterIndex="1"><paragraph>p1chap1</paragraph><paragraph>p2chap1</paragraph><paragraph>p3chap1</paragraph><paragraph>p4chap1</paragraph></item></RootTag>
Custom:
some title 0##0##p1#p2#p3#p4###some title 1##1##p1chap1#p2chap1#p3chap1#p4chap1###and_so_on
Custom optimized:
some title 0§0§p1#p2#p3#p4¤some title 1§1§p1chap1#p2chap1#p3chap1#p4chap1¤and_so_on
where
¤ separates list items
§ separates the properties inside an item
# separates paragraphs
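A minimal sketch of the "custom optimized" format above, in C#. The ChapterRecord type and the SeparatorCodec helper are hypothetical names; like the question, it assumes '¤', '§' and '#' never occur inside titles or paragraphs, and it does no escaping.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical type mirroring the fields listed in the question.
public class ChapterRecord
{
    public string Title { get; set; } = "";
    public int ChapterIndex { get; set; }
    public List<string> Paragraphs { get; set; } = new List<string>();
}

public static class SeparatorCodec
{
    private const string ItemSep = "¤";      // between list items
    private const string PropertySep = "§";  // between properties of one item
    private const string ParagraphSep = "#"; // between paragraphs

    public static string Encode(IEnumerable<ChapterRecord> items) =>
        string.Join(ItemSep, items.Select(c =>
            c.Title + PropertySep +
            c.ChapterIndex.ToString(CultureInfo.InvariantCulture) + PropertySep +
            string.Join(ParagraphSep, c.Paragraphs)));

    // No escaping: a separator character inside the data would corrupt the result.
    // An item with zero paragraphs comes back with a single empty paragraph.
    public static List<ChapterRecord> Decode(string text)
    {
        var result = new List<ChapterRecord>();
        if (text.Length == 0) return result;
        foreach (string item in text.Split(new[] { ItemSep }, StringSplitOptions.None))
        {
            string[] props = item.Split(new[] { PropertySep }, StringSplitOptions.None);
            if (props.Length != 3)
                throw new FormatException("Expected 3 properties in: " + item);
            result.Add(new ChapterRecord
            {
                Title = props[0],
                ChapterIndex = int.Parse(props[1], CultureInfo.InvariantCulture),
                Paragraphs = props[2].Split(new[] { ParagraphSep }, StringSplitOptions.None).ToList()
            });
        }
        return result;
    }
}
```

One thing to keep in mind when measuring size: written as UTF-8, '§' (U+00A7) and '¤' (U+00A4) each take two bytes, so one '§' costs as many bytes as "##". The two sample items encode to 73 characters but 78 UTF-8 bytes.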
UPDATE:
In my case the data is mostly strings rather than integers, since it is a kind of book/lyrics application that only needs the title, the chapter number/lyric ID, and all the paragraphs of the lyrics.

After trying all the serialization types suggested in the answers, I have arrived at an answer to my question.
When, and only when, the requirements are these:
Data size is the priority
The data structure is simple
You need custom serialization and deserialization
You know the content of your data well enough to choose your separators properly
Only under these conditions is it a win to use custom serialization as shown in my question.
What about performance?
It all depends on how you write your serialization and deserialization methods.

It is hard to decide. If your classes consist primarily of strings, then your approach is the better one. The only "more better" approach would be to compress the resulting stream (something you can still do after creating the serialized data).
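A minimal sketch of compressing the serialized text afterwards with GZipStream from System.IO.Compression; TextCompression is a hypothetical name. For very small payloads the gzip header and trailer (18 bytes together) can make the output larger than the input, so this mainly pays off for longer text.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

public static class TextCompression
{
    public static byte[] Compress(string text)
    {
        byte[] raw = Encoding.UTF8.GetBytes(text);
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionLevel.Optimal))
            {
                gzip.Write(raw, 0, raw.Length);
            } // disposing the GZipStream flushes the final compressed block
            return output.ToArray(); // ToArray still works after the MemoryStream is closed
        }
    }

    public static string Decompress(byte[] data)
    {
        using (var input = new MemoryStream(data))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```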
If your data is primarily numeric/non-string, then BinaryFormatter and protobuf, being binary serializers, should produce smaller output than your serializer: you use 5 bytes to serialize 10000, while a binary serializer will probably use only 2-4 bytes. (Note that BinaryFormatter is now obsolete in .NET for security reasons.)
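To make the "2-4 bytes" figure concrete, here is a sketch of the base-128 "varint" encoding that Protocol Buffers uses for integer fields: 7 payload bits per byte, with the high bit set on every byte except the last. VarInt is a hypothetical name, the sketch only handles unsigned values, and real protobuf output also includes a field tag (one byte for small field numbers) before each value.

```csharp
using System.Collections.Generic;

public static class VarInt
{
    // Emits the low 7 bits first; the 0x80 flag means "more bytes follow".
    public static byte[] Encode(uint value)
    {
        var bytes = new List<byte>();
        while (value >= 0x80)
        {
            bytes.Add((byte)(value | 0x80)); // low 7 bits plus continuation flag
            value >>= 7;
        }
        bytes.Add((byte)value); // last byte, high bit clear
        return bytes.ToArray();
    }
}
```

With this scheme 10000 takes 2 bytes (0x90 0x4E) instead of 5 characters of text. Negative int32 values are the exception: protobuf writes them as 10 bytes unless a zigzag type such as sint32 is used.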
JSON and XML serializers will certainly produce larger output, because they are both textual, as is your serializer (so they serialize the number 10000 as the five characters 10000), and they add markup that, being non-empty, is by definition at least as long as a single separator character.
Now... is it better to write a custom serializer or to use protobuf? I would trust a serializer written by Marc Gravell (protobuf-net), based on a "standard" created by Google, more than a serializer written by me :-) Right now you are serializing integers and strings... but perhaps tomorrow you'll need to serialize DateTime, float or other complex types. Are 100 fewer bytes worth the hours you'll need to implement the serialization correctly? Only you can decide that.
An example with protobuf-net:
using System;
using System.Collections.Generic;
using System.IO;
using ProtoBuf;
[ProtoContract]
public class MyObject
{
    [ProtoMember(1)]
    public string title { get; set; }
    [ProtoMember(2)]
    public int chapterIndex { get; set; }
    [ProtoMember(3)]
    public List<String> paragraphs { get; set; }
}
public static class Program
{
    public static void Main()
    {
        var myo = new[]
        {
            new MyObject
            {
                title = "some title 0",
                chapterIndex = 0,
                paragraphs = new List<string> { "p1", "p2", "p3", "p4" }
            },
            new MyObject
            {
                title = "some title 1",
                chapterIndex = 1,
                paragraphs = new List<string> { "p1chap1", "p2chap1", "p3chap1", "p4chap1" }
            },
        };
        // Serialize the whole array to a byte[]
        byte[] bytes;
        using (var ms = new MemoryStream())
        {
            Serializer.Serialize(ms, myo);
            bytes = ms.ToArray();
        }
        Console.WriteLine(bytes.Length);
        // Deserialize it back into a new array
        using (var ms = new MemoryStream(bytes))
        {
            MyObject[] myo2 = Serializer.Deserialize<MyObject[]>(ms);
        }
    }
}
The length of the byte[] is 86, so just a little longer than your custom format (81). But note that this is with a single numeric field, and you used single-digit numbers. The point is that protobuf is still probably better, because it was written by a professional and doesn't have the limitations of your serializer.
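One plausible accounting for the 86 bytes, assuming protobuf-net writes the top-level array as a repeated length-delimited field 1, uses a one-byte tag and a one-byte length for each short field, and skips chapterIndex when it equals its default of 0 (these encoder details are assumptions, not stated in the answer):

```latex
\begin{aligned}
\text{item}_0 &= \underbrace{(1+1+12)}_{\text{title}} + \underbrace{0}_{\text{chapterIndex}=0} + \underbrace{4\,(1+1+2)}_{\text{paragraphs}} = 30, & 30 + 2 &= 32 \\
\text{item}_1 &= (1+1+12) + (1+1) + 4\,(1+1+7) = 52, & 52 + 2 &= 54 \\
\text{total} &= 32 + 54 = 86 \text{ bytes}
\end{aligned}
```

The "+2" on each line is the tag and length byte of the array element itself. Under these assumptions, every extra digit in chapterIndex or paragraph text adds exactly one byte, as it does in the custom format.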

In my opinion JSON is the simplest method, and it generates relatively little overhead.
Here: http://json.org/example you can see the difference between JSON and XML.
And a JSON parser will do all the work for you automatically.
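A minimal sketch of the JSON route with System.Text.Json (built into .NET Core 3.0 and later), used here instead of a third-party parser; the Chapter type and the ChapterJson helper are hypothetical names. With camelCase naming and no indentation, it produces the compact JSON shown in the question.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical type mirroring the question's fields.
public class Chapter
{
    public string Title { get; set; } = "";
    public int ChapterIndex { get; set; }
    public List<string> Paragraphs { get; set; } = new List<string>();
}

public static class ChapterJson
{
    // camelCase property names; WriteIndented defaults to false, so no whitespace is added
    private static readonly JsonSerializerOptions Options =
        new JsonSerializerOptions { PropertyNamingPolicy = JsonNamingPolicy.CamelCase };

    public static string Serialize(List<Chapter> chapters) =>
        JsonSerializer.Serialize(chapters, Options);

    public static List<Chapter> Deserialize(string json) =>
        JsonSerializer.Deserialize<List<Chapter>>(json, Options) ?? new List<Chapter>();
}
```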

I've been using Google's protobuf (Protocol Buffers) serialization in my project, and it's one of the lightest I've used so far.

Related

c# ignoring { in strings

I'm trying to learn C# and Json.NET, and I want to create a multiline JSON string that includes interpolated string variables. However, the first curly brace, which is meant to be part of the JSON object, is being treated as part of an interpolation expression, and I get an error on 'Name'. If anyone can give me a solution, that would be great.
Here is the code:
string Name = "'Name'";
string Is_Airing = "True";
string Genre_One = "'Yes'";
string Genre_Two = "'No'";
string Json_String = $@"{
'Name': '{Name}',
'Is_Airing': {Is_Airing},
'Genres': [
'{Genre_One}',
'{Genre_Two}'
]
}";
If your heart is set on DIY, double up the braces to escape them ({{ and }}), but you're really [doing a poor job of] reinventing the wheel compared to using a serializer and chucking something like an anonymous or proper type into it:
//newtonsoft (requires using Newtonsoft.Json;)
string json = JsonConvert.SerializeObject(new {
    Name, //gets name of property from name of variable
    Is_Airing = MyIsAiringVariableName, //specifies name of property in anonymous type
    Genres = new []{
        Genre_One,
        Genre_Two
    }
});
'Name': '{Name}',
JSON uses double quotes, by the way, and typically uses camelCase names. Using a serializer will ensure better compliance with standard JSON ("be strict in what you send and liberal in what you accept").
For more control over how the output JSON looks, you can set options on the serializer (such as passing Formatting.Indented as the second argument to SerializeObject) or decorate your properties with attributes.
The Newtonsoft documentation is quite comprehensive and includes useful samples to get you going: https://www.newtonsoft.com/json/help/html/SerializeObject.htm
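For comparison with the serializer approach, here is a minimal sketch of the "double up the braces" route. JsonStringDemo and its parameters are hypothetical names; it switches to valid double-quoted JSON with a lowercase boolean, and it does no escaping, so a value containing a quote or backslash would still produce broken JSON.

```csharp
public static class JsonStringDemo
{
    // In an interpolated verbatim string ($@"..."), {{ and }} emit literal braces,
    // single braces start an interpolation hole, and "" emits one double quote.
    public static string Build(string name, bool isAiring, string genreOne, string genreTwo)
    {
        string airing = isAiring ? "true" : "false"; // JSON booleans are lowercase
        return $@"{{
  ""Name"": ""{name}"",
  ""Is_Airing"": {airing},
  ""Genres"": [ ""{genreOne}"", ""{genreTwo}"" ]
}}";
    }
}
```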

C# : How to parse EDIFACT message using Xml Serializer

I have this kind of EDIFACT message:
UNB+IATB:1+NGI+OOS+180918:2003+Export_Dump++TR2+X'
UNH+1+IFLIRR:15:2:1A'
FDR+OM+135+160918'
FDD++INT'
REF'
STX+ACT'
IFD+++C+USD++N'
APD+:::::::ULN:SVO'
DAT+708:160918:0915+707:160918:1055'
STX+FD'
EQP+J+76W::EIFGN+OM'
EQI+++++++:::FGN'
EQD++++++A01'
SSQ+AVIH:5:5::::0:SSR'
SSQ+BIKE:5:5::::0:SSR'
SSQ+BSCT:2:2::::0:SSR+J'
SSQ+BSCT:5:3::::2:SSR+Y'
SSQ+INFT:15:10::::5:SSR'
SSQ+PETC:1:1::::0:SSR+J'
SSQ+PETC:3:3::::0:SSR+Y'
SSQ+POXY:1:1::::0:SSR'
SSQ+SPEQ:5:5::::0:SSR'
SSQ+STCR:0:0::::0:SSR+J'
SSQ+STCR:1:1::::0:SSR+Y'
SSQ+SVAN:1:1::::0:SSR+J'
SSQ+SVAN:3:3::::0:SSR+Y'
SSQ+TVLG:5:5::::0:SSR'
SSQ+TVSM:10:10::::0:SSR'
SSQ+UMNR:5:5::::0:SSR'
SSQ+WCOB:0:0::::0:SSR'
LEG+A01+NXC'
EQI+J:24:S+J:21:A+J:24:O+J:21:E'
This message continues for more than about 1 million lines.
I have used the C# XmlSerializer and successfully converted this message into an XML file, but the structure is not correct.
Here's my code:
switch (keyword)
{
case "UNB":
parts = specificLine.Split(new char[] { '+', ':' }, StringSplitOptions.RemoveEmptyEntries);
serialization = new XmlSerializer(typeof(UNB));
UNB HeaderText = new UNB(parts[1], parts[2], parts[3], parts[4], parts[5], parts[6]);
writer = XmlWriter.Create(TxtWriter, settings);
serialization.Serialize(writer, HeaderText, EmptyNS);
break;
case "UNH":
parts = specificLine.Split(new char[] { '+', ':' }, StringSplitOptions.RemoveEmptyEntries);
serialization = new XmlSerializer(typeof(UNH));
UNH BodyText = new UNH(parts[1],parts[2],parts[3],parts[4],parts[5]);
writer = XmlWriter.Create(TxtWriter, settings);
serialization.Serialize(writer, BodyText, EmptyNS);
break;
case "FDR":
flightDateInformation Gr0 = new flightDateInformation();
parts = specificLine.Split(new char[] { '+'}, StringSplitOptions.RemoveEmptyEntries);
serialization = new XmlSerializer(typeof(flightDateInformation));
flightDateDesignator fdrbody = new flightDateDesignator(parts[1], parts[2], parts[3]);
Gr0.flightDateDesignator = fdrbody;
writer = XmlWriter.Create(TxtWriter, settings);
serialization.Serialize(writer, Gr0, EmptyNS);
break;
}
And this is an example of my structure classes:
[XmlRoot(ElementName = "UNB", IsNullable = false), Serializable]
public class UNB
{
[XmlAttribute]
public string identifier;
[XmlAttribute]
public string version;
[XmlAttribute]
public string sender;
[XmlAttribute]
public string recipient;
[XmlAttribute]
public string dateofpreparation;
[XmlAttribute]
public string timeofpreparation;
public UNB(string identifier, string version,string sender, string recipient, string dateofpreparation, string timeofpreparation)
{
this.identifier = identifier;
this.version = version;
this.sender = sender;
this.recipient = recipient;
this.dateofpreparation = dateofpreparation;
this.timeofpreparation = timeofpreparation;
}
public UNB()
{
}
}
My output XML file looks like this:
<UNB identifier="IATB" version="1" sender="NGI" recipient="OOS" dateofpreparation="180918" timeofpreparation="2003" /><UNH identifier="1" type="IFLIRR" version="15" release="2" agency="1A" /><flightDateInformation>
<flightDateDesignator airlineCode="OM" flightNumber="135" departureDate="160918" />
</flightDateInformation><flightLevelInfo flightCharacteristics="INT" /><referenceInfomation /><flightFlags statusIndicator="ACT" /><inventoryParametersFD controlType="C" currencyCode="USD" isUnderActiveRevControl="N" /><additionalproductdetails>
<departureLocation>ULN</departureLocation>
<arrivalLocation>SVO</arrivalLocation>
</additionalproductdetails><scheduledTiming>
<qualifier>708</qualifier>
<date>160918</date>
<time>0915</time>
</scheduledTiming><scheduledTiming>
<qualifier>707</qualifier>
<date>160918</date>
<time>1055</time>
</scheduledTiming><dcsInformation statusIndicator="FD" /><aircraftInformation serviceType="J" aircraftType="76W">
<eqtRegistrationNumber>EIFGN</eqtRegistrationNumber>
<aircraftOwner>OM</aircraftOwner>
</aircraftInformation><acvInformation acvCode="FGN" /><saleableConfiguration configurationCode="A01" />
<newSSR quotaCounterName="AVIH">
<maxQuantity>5</maxQuantity>
<availability>5</availability>
<counter>0</counter>
<quotaType>SSR</quotaType>
</newSSR><newSSR quotaCounterName="BIKE">
<maxQuantity>5</maxQuantity>
<availability>5</availability>
<counter>0</counter>
<quotaType>SSR</quotaType>
</newSSR>
<newSSR quotaCounterName="BSCT" cabinCode="J">
<maxQuantity>2</maxQuantity>
<availability>2</availability>
<counter>0</counter>
<quotaType>SSR</quotaType>
</newSSR>
My problem is this: my code works and converts the message into an XML file, but not the way I want. Each node is written as a flat, standalone element.
This is the structure I want:
Each node is contained in a parent node, and some nodes expand into other nodes. My output XML doesn't have any parent nodes.
Can I solve this by improving my code, or should I try a different approach?
If you need more details, please ask and I will provide them.
UPDATE: I have resolved this problem.
This question is very broad. Basically, you have to understand the format, then write software to extract the data and convert it to your desired format. Luckily, you are not the first one with this problem, and there are open-source solutions available:
Is there any good open source EDIFACT parser in Java?
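To illustrate the "understand the format" step, here is a minimal sketch that splits an EDIFACT interchange into segments, data elements and components. It assumes the default service characters (' ends a segment, + separates data elements, : separates components, ? is the release character) and no UNA segment overriding them; EdifactTokenizer is a hypothetical name. It keeps empty elements: splitting with StringSplitOptions.RemoveEmptyEntries drops empty elements such as the one in FDD++INT, which shifts the position of everything after it.

```csharp
using System.Collections.Generic;
using System.Text;

public static class EdifactTokenizer
{
    // Returns segments -> data elements -> components; element 0 holds the segment tag.
    // Line breaks between segments are ignored; text after the last ' is discarded.
    public static List<List<List<string>>> Parse(string message)
    {
        var segments = new List<List<List<string>>>();
        var segment = new List<List<string>>();
        var element = new List<string>();
        var current = new StringBuilder();
        for (int i = 0; i < message.Length; i++)
        {
            char c = message[i];
            if (c == '?' && i + 1 < message.Length)
            {
                current.Append(message[++i]); // release character: take the next char literally
            }
            else if (c == ':')
            {
                element.Add(current.ToString()); // end of a component
                current.Clear();
            }
            else if (c == '+' || c == '\'')
            {
                element.Add(current.ToString()); // end of a data element
                current.Clear();
                segment.Add(element);
                element = new List<string>();
                if (c == '\'')
                {
                    segments.Add(segment); // end of the segment
                    segment = new List<List<string>>();
                }
            }
            else if (c != '\r' && c != '\n')
            {
                current.Append(c);
            }
        }
        return segments;
    }
}
```

Once the segments are in this shape, building the nested XML becomes a matter of deciding, per segment tag, which parent element it opens or belongs to.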
I would want to see a specification of the input format, not just an example, before tackling this task, especially as the quantity of data to be converted is too large to check the correctness of the result by visual inspection.
I think you are on the right lines, however: first do a crude parse of the input that produces some kind of XML representation. Then use XML tools (specifically, XSLT) to transform this crude XML into the target XML that you actually want.
I can't tell from your "actual output" and the diagram of your "desired output" what the detailed transformation rules are, but it's likely to be some kind of grouping transformation to create a hierarchic structure from a flat structure. That's a common task in XSLT and is best tackled by getting hold of an XSLT 2.0 (or 3.0) processor and using the <xsl:for-each-group> instruction. For example, if your task is to put wrapper elements around adjacent elements having the same name, you could do:
<xsl:for-each-group select="*" group-adjacent="name()">
<xsl:choose>
<xsl:when test="name()='SSR'">
<SSR-LIST><xsl:copy-of select="current-group()"/></SSR-LIST>
</xsl:when>
....
<xsl:otherwise>
<xsl:copy-of select="current-group()"/>
</xsl:otherwise>
</xsl:choose>
</xsl:for-each-group>
If you want more specific advice on this transformation, I suggest posting a new question with a concrete (and short!) example of the input and output, expressed as XML documents, with a clear relationship between the two.
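For readers who would rather stay in C#, here is a sketch of the same adjacent-grouping idea with LINQ to XML instead of XSLT: each run of adjacent elements with a given name is wrapped in a new parent element. AdjacentGrouper and WrapAdjacent are hypothetical names. It assumes the flat output has first been given a single root element (the question's output has several top-level elements, so it is not a well-formed document), and it copies only child elements, not text or comments.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class AdjacentGrouper
{
    // Rough C# analogue of <xsl:for-each-group select="*" group-adjacent="name()">
    // for one element name: each run of adjacent <name> children is wrapped in <wrapperName>.
    public static XElement WrapAdjacent(XElement root, string name, string wrapperName)
    {
        var result = new XElement(root.Name, root.Attributes());
        var run = new List<XElement>();
        foreach (XElement child in root.Elements())
        {
            if (child.Name.LocalName == name)
            {
                run.Add(new XElement(child)); // deep copy into the current run
                continue;
            }
            Flush(result, run, wrapperName);
            result.Add(new XElement(child));
        }
        Flush(result, run, wrapperName);
        return result;
    }

    // Emits the pending run (if any) as one wrapper element, then empties the list.
    private static void Flush(XElement target, List<XElement> run, string wrapperName)
    {
        if (run.Count == 0) return;
        target.Add(new XElement(wrapperName, run));
        run.Clear();
    }
}
```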

Parse byte as byte, not string

I receive some JSON from a Java third-party system that contains Avro schemas in JSON format. An example looks like this:
{"type":"record", "name":"AvroRecord", "namespace":"Parent.Namespace", "fields": [{"name":"AvroField", "type":"bytes", "default":"\u00FF"}]}
I parse this JSON to do some C# code generation. The result would look like this:
public partial class AvroRecord
{
[AvroField(Name = "AvroField", Type = "bytes", DefaultValueText = "ÿ")]
public byte[] AvroField { get; set; }
public AvroRecord() { this.AvroField = new byte[] { 255 }; }
}
Eventually, from the C# representation of the schema, I need to infer back the original schema. Once I get that inferred schema, it will be sent over to the original system for comparison. That is why I want to keep the original string value for the default value, since I don't know if:
{"type":"record", "name":"AvroRecord", "namespace":"Parent.Namespace", "fields": [{"name":"AvroField", "type":"bytes", "default":"\u00FF"}]}
and
{"type":"record", "name":"AvroRecord", "namespace":"Parent.Namespace", "fields": [{"name":"AvroField", "type":"bytes", "default":"ÿ"}]}
will result in an exact match or cause a problem.
I use JSON.NET to convert from the raw schema as a string to something more useful that I can work with:
JToken token = JToken.Parse(schema);
Is there a way in JSON.NET, or any other JSON parsing library, to control the parsing and copy a value without it being parsed? Basically, a way to prevent "\u00FF" from becoming "ÿ".
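One possible approach, sketched with System.Text.Json's low-level Utf8JsonReader rather than Json.NET: the reader's ValueSpan holds the raw bytes of the current token as they appear in the input, before escape sequences are decoded, whereas GetString() returns the decoded text. This assumes you only need the raw text of properties named "default"; RawDefaultExtractor is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Text;
using System.Text.Json;

public static class RawDefaultExtractor
{
    // Returns the raw text of every string value whose property name is "default",
    // with escape sequences such as \u00FF left exactly as they appear in the input.
    public static List<string> GetRawDefaults(string json)
    {
        var results = new List<string>();
        byte[] utf8 = Encoding.UTF8.GetBytes(json);
        var reader = new Utf8JsonReader(utf8, new JsonReaderOptions());
        bool expectDefault = false;
        while (reader.Read())
        {
            if (reader.TokenType == JsonTokenType.PropertyName)
            {
                // ValueTextEquals compares against the unescaped property name
                expectDefault = reader.ValueTextEquals("default");
                continue;
            }
            if (expectDefault && reader.TokenType == JsonTokenType.String)
            {
                // ValueSpan is the undecoded slice between the quotes
                results.Add(Encoding.UTF8.GetString(reader.ValueSpan));
            }
            expectDefault = false;
        }
        return results;
    }
}
```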

Json.NET, can SerializeXmlNode be extended to detect numbers?

I am converting XML to JSON using SerializeXmlNode. It looks like the expected behavior is to convert all XML values to strings, but I'd like to emit true numeric values where appropriate.
// Input: <Type>1</Type>
string json = JsonConvert.SerializeXmlNode(node, Newtonsoft.Json.Formatting.Indented, true);
// Output: "Type": "1"
// Desired: "Type": 1
Do I need to write my own custom JsonConverter class to manage this, or is there a way to hook into the serialization process at the appropriate points, perhaps through delegates?
Regex Hack
Given the complexity of a proper solution, here is another (which I'm not entirely proud of, but it works...).
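using System.Text.RegularExpressions;
// Convert to JSON, and remove quotes around numbers
string json = JsonConvert.SerializeXmlNode(node, Newtonsoft.Json.Formatting.Indented, true);
// HACK to force integers as numbers, not strings.
// Values with a leading zero (e.g. "02345") are skipped: unquoted, they would be invalid JSON.
// Caveat: property names made only of digits are unquoted too.
Regex rgx = new Regex("\"(0|[1-9]\\d*)\"");
json = rgx.Replace(json, "$1");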
XML does not have a way to differentiate primitive types like JSON does. Therefore, when converting XML directly to JSON, Json.Net does not know what types the values should be, short of guessing. If it always assumed that values consisting only of digits were ordinal numbers, then things like postal codes and phone numbers with leading zeros would get mangled in the conversion. It is not surprising, then, that Json.Net takes the safe road and treats all values as string.
One way to work around this issue is to deserialize your XML to an intermediate object, then serialize that to JSON. Since the intermediate object has strongly typed properties, Json.Net knows what to output. Here is an example:
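using System;
using System.IO;
using System.Xml.Serialization;
using Newtonsoft.Json;
class Program
{
    static void Main(string[] args)
    {
        string xml = @"<root><ordinal>1</ordinal><postal>02345</postal></root>";
        XmlSerializer xs = new XmlSerializer(typeof(Intermediary));
        using (TextReader reader = new StringReader(xml))
        {
            // XML -> strongly typed object: "1" becomes int 1, "02345" stays a string
            Intermediary obj = (Intermediary)xs.Deserialize(reader);
            // Object -> JSON: Json.Net writes each property according to its CLR type
            string json = JsonConvert.SerializeObject(obj, Formatting.Indented);
            Console.WriteLine(json);
        }
    }
}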
[XmlRoot("root")]
public class Intermediary
{
public int ordinal { get; set; }
public string postal { get; set; }
}
Output of the above:
{
"ordinal": 1,
"postal": "02345"
}
To make a more generic solution, yes, you'd have to write your own converter. In fact, the XML-to-JSON conversion that takes place when calling SerializeXmlNode is done by an XmlNodeConverter that ships with Json.Net. This converter does not appear to be very extensible, but you could always use its source code as a starting point for your own.
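A middle ground between the regex hack and a full custom converter, sketched with System.Text.Json.Nodes (.NET 6 and later) rather than Json.NET: parse the JSON that SerializeXmlNode produced and convert only string values that look like canonical integers, leaving values with leading zeros, such as postal codes, as strings. NumericStringFixer is a hypothetical name, and which strings count as numbers is an assumption you would adjust to your data.

```csharp
using System.Globalization;
using System.Linq;
using System.Text.Json.Nodes;
using System.Text.RegularExpressions;

public static class NumericStringFixer
{
    // "0" or an optionally negative integer without a leading zero; "02345" does not match.
    private static readonly Regex CanonicalInteger = new Regex(@"^(0|-?[1-9][0-9]*)\z");

    public static string Fix(string json)
    {
        JsonNode? root = JsonNode.Parse(json);
        if (root == null) return json; // the input was the literal null
        Walk(root);
        return root.ToJsonString();
    }

    // Recurses through objects and arrays, replacing matching string values in place.
    private static void Walk(JsonNode? node)
    {
        if (node is JsonObject obj)
        {
            foreach (string key in obj.Select(p => p.Key).ToList()) // snapshot the keys before mutating
            {
                JsonNode? replacement = TryConvert(obj[key]);
                if (replacement != null) obj[key] = replacement;
                else Walk(obj[key]);
            }
        }
        else if (node is JsonArray arr)
        {
            for (int i = 0; i < arr.Count; i++)
            {
                JsonNode? replacement = TryConvert(arr[i]);
                if (replacement != null) arr[i] = replacement;
                else Walk(arr[i]);
            }
        }
    }

    // Returns a numeric node for a string holding a canonical integer, otherwise null.
    private static JsonNode? TryConvert(JsonNode? node)
    {
        if (node is JsonValue value
            && value.TryGetValue<string>(out string? text)
            && CanonicalInteger.IsMatch(text)
            && long.TryParse(text, NumberStyles.Integer, CultureInfo.InvariantCulture, out long number))
        {
            return JsonValue.Create(number);
        }
        return null;
    }
}
```

Unlike the regex, this only ever touches values, never property names.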

Best way to deserialize a long string (response of an external web service)

I am querying a web service that was built by another developer. It returns a result set in a JSON-like format. I get three column values (I already know what the ordinal position of each column means):
[["Boston","142","JJK"],["Miami","111","QLA"],["Sacramento","042","PPT"]]
In reality, this result set can be thousands of records long.
What's the best way to parse this string?
I guess a JSON deserializer would be nice, but what is a good one to use in C#/.NET? I'm pretty sure the System.Runtime.Serialization.Json serializer won't work.
Using the built-in libraries for ASP.NET (System.Runtime.Serialization and System.ServiceModel.Web), you can get what you want pretty easily:
string[][] parsed = null;
var jsonStr = @"[[""Boston"",""142"",""JJK""],[""Miami"",""111"",""QLA""],[""Sacramento"",""042"",""PPT""]]";
// UTF-8 rather than Encoding.Default, which is the ANSI code page on .NET Framework
using (var ms = new System.IO.MemoryStream(System.Text.Encoding.UTF8.GetBytes(jsonStr)))
{
    var serializer = new System.Runtime.Serialization.Json.DataContractJsonSerializer(typeof(string[][]));
    parsed = serializer.ReadObject(ms) as string[][];
}
A little more complex example (which was my original answer)
First make a dummy class to use for serialization. It just needs one member to hold the result which should be of type string[][].
[DataContract]
public class Result
{
[DataMember(Name="d")]
public string[][] d { get; set; }
}
Then it's as simple as wrapping your result up like so: { "d": /your results/ }. See below for an example:
Result parsed = null;
var jsonStr = @"[[""Boston"",""142"",""JJK""],[""Miami"",""111"",""QLA""],[""Sacramento"",""042"",""PPT""]]";
// Wrap the array as { "d": [...] }; {{ and }} are literal braces in string.Format
using (var ms = new System.IO.MemoryStream(System.Text.Encoding.UTF8.GetBytes(string.Format(@"{{ ""d"": {0} }}", jsonStr))))
{
    var serializer = new System.Runtime.Serialization.Json.DataContractJsonSerializer(typeof(Result));
    parsed = serializer.ReadObject(ms) as Result;
}
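On .NET Core 3.0 and later, the built-in System.Text.Json serializer can read this shape directly into a string[][] without a wrapper class. A minimal sketch; ResultSetParser is a hypothetical name.

```csharp
using System;
using System.Text.Json;

public static class ResultSetParser
{
    // Each inner array is one record; the meaning of each column is by position.
    public static string[][] Parse(string json) =>
        JsonSerializer.Deserialize<string[][]>(json) ?? Array.Empty<string[]>();
}
```

Because every value stays a string, "042" keeps its leading zero.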
How about this?
It sounds like you have a pretty simple format that you could write a custom parser for, especially if you don't always want to wait for the whole result to be parsed before you start using it.
I would just write a recursive parser that looks for the tokens "[", ",", "\"", and "]" and does the appropriate thing.
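A minimal sketch of such a hand-written parser, written as a single loop over the characters rather than a recursive descent, since the shape is only two levels deep. It assumes every value is a double-quoted string and treats backslash escapes in the simplest possible way; RowParser is a hypothetical name. Because it is an iterator, each row is yielded as soon as its closing ] is read.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class RowParser
{
    // Parses text shaped like [["a","b"],["c","d"]] and yields one string[] per inner array.
    public static IEnumerable<string[]> ParseRows(string text)
    {
        int depth = 0;
        var row = new List<string>();
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c == '[')
            {
                depth++;
                if (depth == 2) row.Clear(); // start of a new record
            }
            else if (c == ']')
            {
                if (depth == 2) yield return row.ToArray(); // record complete
                depth--;
            }
            else if (c == '"')
            {
                var value = new StringBuilder();
                i++;
                while (i < text.Length && text[i] != '"')
                {
                    // Drop a backslash and keep the next char as-is (\" becomes ", but \n becomes n).
                    if (text[i] == '\\' && i + 1 < text.Length) i++;
                    value.Append(text[i]);
                    i++;
                }
                if (i >= text.Length) throw new FormatException("Unterminated string");
                row.Add(value.ToString());
            }
            // ',' and whitespace need no handling: each value is delimited by its quotes.
        }
    }
}
```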
