I need to binary serialize an object that contains a collection of lots of instances of some base class. Each instance could be any of the derived types, and there are really lots of possible derived types (around one hundred). Therefore I don't really want to modify each of these types so that this object could be serialized.
I would even like to avoid adding a default public constructor to all of these types, as this would damage the architecture a little and would also be really annoying to do (and not really DRY), not to mention adding a Serializable attribute to every public property in these types. For the same reason, writing a custom serializer/surrogate for each of these types is hardly an option.
What I have tried so far:
BinaryFormatter - requires additional attributes for serialized properties
sharpSerializer - requires public default constructor
protobuf-net - requires additional attributes for serialized properties
Net serializer - requires additional attributes for serialized properties
I have also tried serializing with Json.NET and then saving the result as a byte array (I know, I know), but I ran into an OutOfMemoryException while serializing. This likely means my object is too heavy for text serialization (it takes around 200 MB in memory, and there are lots of elements in the array, lots of properties in each element, etc.).
If it makes things any easier: all the objects in the collection I need to save have only public properties (I don't need to serialize fields or private properties). There is also no particular logic in these objects' constructors; they only fill the properties.
Is there any way to achieve the serialization/deserialization without modifying the serialized classes?
I don't even know whether it is possible to deserialize an object that does not have a parameterless constructor (a restriction imposed by all of the serializers I've come across), but it should be, since reflection allows creating instances without calling constructors.
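To illustrate the last point, here is a minimal sketch of creating an instance without running any constructor and then filling its public properties by reflection. The Point type and the ConstructorlessDemo helper are hypothetical names used only for this example. It relies on FormatterServices.GetUninitializedObject, which recent .NET versions mark as obsolete (a compiler warning) in favour of RuntimeHelpers.GetUninitializedObject.

```csharp
using System;
using System.Runtime.Serialization;

// Hypothetical type: no parameterless constructor, only public properties.
public class Point
{
    public int X { get; set; }
    public int Y { get; set; }

    public Point(int x, int y) { X = x; Y = y; }
}

public static class ConstructorlessDemo
{
    public static Point Create()
    {
        // Allocates the object with zeroed memory; no constructor is invoked.
        var p = (Point)FormatterServices.GetUninitializedObject(typeof(Point));

        // Fill the public properties via reflection, as a serializer would.
        typeof(Point).GetProperty("X").SetValue(p, 3);
        typeof(Point).GetProperty("Y").SetValue(p, 4);
        return p;
    }
}
```

Because no constructor or initializer runs, any invariants the constructor normally establishes are skipped, which is only safe for types like the ones described here, whose constructors merely fill properties.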
Related
I'm looking for a Serializer to persist my classes in text format (not binary). But...
I'm already using protobuf for binary serialization. It works pretty well. As a side note, I would prefer not to deal with field IDs (indexes) as in protobuf.
Before closing or voting to close this question, please consider these points:
The specificity of this question
Whether other questions really apply to my requirements and are not too old
I'm looking for a serializer with the following properties:
Easy to use
Serialize in text (readable) either Json or XML would be fine
Free
Is documented
Support versioning easily (obsolete field, type change, property name change, ...)
Uses Attribute to define items to serialize (or not serialize)
Does not use an index (ID, as in Protobuf)
Is able, like Protobuf, to deserialize an object directly without any constructor, i.e. to instantiate an object even if it has no public constructor and no constructor without arguments.
Does not require me to change my class or member accessibility, ie:
Does not need default constructor
Can serialize fields
Can skip public property (when marked to do so)
Others points not essential:
The speed is not important
Open source is a nice bonus
Has some examples is a nice bonus
Some examples of what I prefer to not use:
Microsoft XmlSerializer and JsonSerializer require a default constructor.
I have a hard time using Microsoft's DataContractSerializer; an easier solution would be welcome.
What I'm doing has already been implemented in Json.net by setting TypeNameHandling to TypeNameHandling.Objects. That way the object type will be serialized as well and the deserialized object will have the exact original type.
However using TypeNameHandling exposes some security issues and requires us to use a custom SerializationBinder to limit which types will be supported to avoid possible code injection. That's not the main reason for me trying to find another solution. Actually I find that by using TypeNameHandling.Objects, an object will be serialized to a complex JSON including not just the object data itself and the object type but also some other properties which look redundant to me.
I think we just need one more property containing info about object type (such as the assembly qualified name of the object type) so I would like to create a custom JsonConverter which will serialize any object to some JSON like this:
{
"Item" : "normal JSON string of object",
"ItemType" : "assembly qualified name of object type"
}
Isn't that enough? As I said before, besides two properties similar to those (with different names), the Json.net lib includes some other properties (signature ...) which look redundant to me.
I'm not asking for how to implement the custom JsonConverter I mentioned above. I just wonder if that converter (with a simplified JSON structure) is fine or I should use the standard solution provided by Json.net with TypeNameHandling (which involves a more complex JSON structure)? My main concern is with possible performance issues with TypeNameHandling set to Objects due to more data to convert/serialize/transfer.
One more concern with the standard solution is performance: I just need to apply the custom converting logic to objects of the exact type object, not to all other strongly typed objects (to which TypeNameHandling may still be applied unnecessarily?).
I have a few reactions to your proposed design for a polymorphic custom JsonConverter (let's call it PolymorphicConverter<T> where T is the base type):
Regarding security, you wrote,
... using TypeNameHandling exposes some security issues and requires us to use a custom SerializationBinder to limit which types will be supported to avoid possible code injection.
The same security risks that can arise with TypeNameHandling will also arise with PolymorphicConverter<T>.
The risk here is that an attacker tricks some polymorphic deserialization code into instantiating an attack gadget. See TypeNameHandling caution in Newtonsoft Json and External json vulnerable because of Json.Net TypeNameHandling auto? for examples and discussion. If an attacker crafts JSON with an attack gadget type specified in the "ItemType" property supported by your converter, then it may end up instantiating the attack gadget and effecting the attack.
You can reduce your attack surface by only enabling support for polymorphic deserialization for known polymorphic properties or array items, by applying PolymorphicConverter<T> (or [JsonProperty(TypeNameHandling = TypeNameHandling.All)] for that matter) just to those properties that are actually polymorphic in practice -- but if the polymorphic base type of those properties just happens to be compatible with an attack gadget, you're going to be vulnerable to attack.
Thus, no matter what mechanism is used, you're still going to need something like a custom SerializationBinder to filter out naughty types, no matter the details of how you encode type information in your JSON.
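As a rough sketch of such a filter, the binder below only resolves type names from an explicit allow-list and rejects everything else. AllowListBinder, Animal, Cat and BinderDemo are hypothetical names made up for this example; ISerializationBinder and the SerializationBinder setting are part of Json.NET (version 10 and later).

```csharp
using System;
using System.Collections.Generic;
using Newtonsoft.Json;
using Newtonsoft.Json.Serialization;

// Hypothetical model types used only for this sketch.
public class Animal { public string Name { get; set; } }
public class Cat : Animal { }

// Resolves "$type" values only for types that were explicitly allowed.
public sealed class AllowListBinder : ISerializationBinder
{
    private readonly Dictionary<string, Type> _byName = new Dictionary<string, Type>();

    public AllowListBinder(params Type[] allowed)
    {
        foreach (var t in allowed) _byName[t.Name] = t;
    }

    public Type BindToType(string assemblyName, string typeName)
    {
        if (_byName.TryGetValue(typeName, out var type)) return type;
        throw new JsonSerializationException("Type not allowed: " + typeName);
    }

    public void BindToName(Type serializedType, out string assemblyName, out string typeName)
    {
        // Emit the short type name only, so no assembly information leaks into the JSON.
        assemblyName = null;
        typeName = serializedType.Name;
    }
}

public static class BinderDemo
{
    public static readonly JsonSerializerSettings Settings = new JsonSerializerSettings
    {
        TypeNameHandling = TypeNameHandling.Objects,
        SerializationBinder = new AllowListBinder(typeof(Cat))
    };
}
```

The same allow-list check would have to be performed inside a hand-written PolymorphicConverter<T> before it instantiates whatever "ItemType" names.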
JSON file size. Json.NET encodes type information by adding a single property to the beginning of objects:
"$type" : "assembly qualified name of object type"
Your plan is to instead add:
"ItemType" : "assembly qualified name of object type"
It is unclear why there would be an advantage, unless your type names are somehow more compact.
Performance. You wrote,
My main concern is with possible performance issues with TypeNameHandling set to Objects due to more data to convert/serialize/transfer.
Firstly, why not just measure and find out? See https://ericlippert.com/2012/12/17/performance-rant/
Secondly, Newtonsoft has a setting MetadataPropertyHandling, that, when set to Default, assumes that the polymorphic property "$type" comes first in each object, and thus is able to stream them in without pre-loading the entire JSON into a JToken hierarchy.
If your converter unconditionally preloads into a JToken hierarchy to fetch the value of the "ItemType" property, it may have worse performance.
Regarding restricting polymorphic deserialization to only required properties, you wrote:
One more concern with the standard solution is performance issue, actually I just need to apply the custom converting logic to all objects of the exact type object, not to all other strongly typed objects
Either way, this is possible with a custom ContractResolver. Override DefaultContractResolver.CreateProperty and, when JsonProperty.PropertyType == typeof(object), set TypeNameHandling or Converter as required, depending on your chosen solution.
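A minimal sketch of the TypeNameHandling variant of that resolver follows. ObjectPropertyResolver, Envelope, Note and ResolverDemo are hypothetical names for this example; the overridden CreateProperty method and the JsonProperty.TypeNameHandling property are Json.NET's own.

```csharp
using System.Reflection;
using Newtonsoft.Json;
using Newtonsoft.Json.Serialization;

// Hypothetical model: only Payload is declared as System.Object.
public class Note { public string Text { get; set; } }
public class Envelope
{
    public object Payload { get; set; }
    public Note Typed { get; set; }
}

public class ObjectPropertyResolver : DefaultContractResolver
{
    protected override JsonProperty CreateProperty(MemberInfo member, MemberSerialization memberSerialization)
    {
        JsonProperty property = base.CreateProperty(member, memberSerialization);

        // Only members declared as exactly System.Object get type information.
        if (property.PropertyType == typeof(object))
        {
            property.TypeNameHandling = TypeNameHandling.Objects;
        }
        return property;
    }
}

public static class ResolverDemo
{
    // Reuse one resolver instance so its contract cache is shared across calls.
    public static readonly JsonSerializerSettings Settings = new JsonSerializerSettings
    {
        ContractResolver = new ObjectPropertyResolver()
    };
}
```

This sketch leaves the serializer-wide TypeNameHandling at its default of None, so strongly typed members such as Typed are written without type metadata. A SerializationBinder that filters types is still needed for untrusted input.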
I'm wondering how to exclude/strip certain properties of given type(s) (or collections of those) from being serialized using Json.NET library?
I tried to write my own contract resolver (inheriting from DefaultContractResolver) with no luck.
I know that it could be done using DataAnnotations, decorating the excluded properties with ScriptIgnoreAttribute, but that's not applicable in my scenario. The objects being serialized can be virtually anything, so I don't know which properties to exclude at design time. I know only the types of the properties that should not be serialized.
It looks like a rather simple task, but unfortunately I couldn't find a decent solution anywhere...
BTW - I'm not bound to Json.NET library - if it can easily be done with default/other .NET JSON serializers it'd be an equally good solution for me.
UPDATE
The properties have to be excluded before the serializer tries to serialize them. Why?
Basically, the objects I'm receiving and serializing can have dynamic properties whose types implement IDynamicMetaObjectProvider. I'm not going to describe all the details, but the DynamicMetaObject returned from the GetMetaObject method of these objects doesn't implement DynamicMetaObject.GetDynamicMemberNames (it throws NotImplementedException...). To summarize: the objects I need to exclude don't allow their properties to be enumerated, which is what the Json.NET serializer tries to do behind the scenes. I always end up with a NotImplementedException being thrown.
I have tried both the WCF JSON serialization as well as the System.Web.Script.Serialization.JavaScriptSerializer. I have found if you want solid control of the serialization process and do not want to be bound by attributes and hacks to make things work, the JavaScriptSerializer is the way to go. It is included in the .NET stack and allows you to create and register JavaScriptConverter subclasses to perform custom serialization of types.
The only restriction I have found that may cause you a problem is that you cannot easily register a converter to convert all subclasses of Object (aka, one converter to rule them all). You really need to have knowledge of common base classes or preregister the set of types up front by scanning an assembly. However, property serialization is entirely left up to you, so you can decide using simple reflection which properties to serialize and how.
Plus, the default serialization is much much much better for JSON than the WCF approach. By default, all types are serializable without attributes, enums serialize by name, string-key dictionaries serialize as JSON objects, lists serialize as arrays, etc. But for obvious reasons, such as circular trees, even the default behavior needs assistance from time to time.
In my case, I was supporting a client-API that did not exactly match the server class structure, and we wanted a much simpler JSON syntax that was easy on the eyes, and the JavaScriptSerializer did the trick every time. Just let me know if you need some code samples to get started.
Create your own contract resolver, override the method that creates the properties for an object and then filter the results to only include those that you want.
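A minimal sketch of that idea, assuming the filtering should be based on the declared property type: ExcludeTypesResolver, Secret and Account are hypothetical names for this example, while CreateProperties is the DefaultContractResolver method being overridden.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Newtonsoft.Json;
using Newtonsoft.Json.Serialization;

// Hypothetical types used only for this sketch.
public class Secret { public string Token { get; set; } }
public class Account
{
    public string Name { get; set; }
    public Secret Credentials { get; set; }
}

public class ExcludeTypesResolver : DefaultContractResolver
{
    private readonly Type[] _excluded;

    public ExcludeTypesResolver(params Type[] excluded)
    {
        _excluded = excluded;
    }

    protected override IList<JsonProperty> CreateProperties(Type type, MemberSerialization memberSerialization)
    {
        // Drop every property whose declared type is, or derives from, an excluded type.
        return base.CreateProperties(type, memberSerialization)
            .Where(p => !_excluded.Any(t => t.IsAssignableFrom(p.PropertyType)))
            .ToList();
    }
}
```

Since the filtering happens when the contract is built, the serializer never reads the value of an excluded property. Properties declared as collections of an excluded type are not covered by this sketch and would need an additional check on the element type.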
Have you considered using a ShouldSerialize-prefixed method to exclude the property of your specific type at runtime?
public class Employee
{
    public string Name { get; set; }
    public Employee Manager { get; set; }

    public bool ShouldSerializeManager()
    {
        return (Manager != this);
    }
}
Is there a way in .Net 4 to easily deserialize a stream of mixed objects one by one? I can read to the start of the element for an object I want to deserialize using XmlTextReader.Read(), but have tried many ways to deserialize that specific object unsuccessfully.
The types I want to deserialize can be read as a list of those types without a problem using XmlSerializer, however I want to be able to mix them rather than having input files containing just lists of one object type.
e.g.
<Objects>
<TypeA>...</TypeA>
<TypeB>...</TypeB>
<TypeA>...</TypeA>
<TypeC>...</TypeC>
...
</Objects>
The ordering of the objects in the file would be random.
Many thanks for any pointers.
I've looked at XmlSerializer, DataContractSerializer and XElement, but could not get them to work for this (although I possibly didn't set those up correctly as I'm not very familiar with them).
You can do that with the XmlSerializer.
However, be careful with the following:
The array you are serializing/deserializing must be declared as an array of "object" (or the base object if all other types inherit from it)
Each type will have "xsi:type" attached to it
You must use [XmlInclude] to include all the type(s) you are ever going to need with the "root" object.
The need to [XmlInclude] all the object types mean that you're not going to be able to dynamically add types to the serialization. You'll need to add [XmlInclude]'s and recompile to include the new type(s).
Your XML, however, will become:
<Objects>
<TypeObj xsi:type="TypeA">...</TypeObj>
<TypeObj xsi:type="TypeB">...</TypeObj>
<TypeObj xsi:type="TypeA">...</TypeObj>
<TypeObj xsi:type="TypeC">...</TypeObj>
:
</Objects>
This is the most flexible and "normal" way to approach XML serialization of multiple types. However, if you need to keep your exact format, you can declare your class this way:
[XmlRoot("Objects")]
public class Objects
{
    [XmlElement("TypeA")] public TypeA[] TypeAObjects;
    [XmlElement("TypeB")] public TypeB[] TypeBObjects;
    [XmlElement("TypeC")] public TypeC[] TypeCObjects;
    :
}
[XmlElement] means that all the objects are jumbled up on the same level (different from XmlArray). They do not even have to be in order.
The pitfall of doing this, however, is that if you want to add a new type, you'll have to modify the "Objects" class.
Not sure if this is helpful, but it might be an idea to take a look at how RestSharp does their deserialization. https://github.com/johnsheehan/RestSharp
Specifically, take a look at RestSharp/Deserializers/XmlDeserializer.cs
You would need to create an XmlSerializer for each type. How many types you have and how many times each would get used would determine if it's better to create a new XmlSerializer for each object as you process it, or store them in a Dictionary<string, XmlSerializer> to be reused. The XmlSerializer takes the type in its constructor, and then you can call the Deserialize method, passing it a StringReader that contains the XML string you read from your file. Hopefully that's enough to get you started, but if you need more help I can throw together some sample code ;)
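Here is a rough sketch of the cached-serializer approach. It differs from the description above in one respect: instead of passing a StringReader for each fragment, it hands the positioned XmlReader straight to XmlSerializer.Deserialize, which consumes exactly one element. TypeA, TypeB and MixedReader are placeholder names, and the sketch assumes each element name matches its class name.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Placeholder types standing in for the real ones.
public class TypeA { public int Value { get; set; } }
public class TypeB { public string Text { get; set; } }

public static class MixedReader
{
    // One cached serializer per element name, created once and reused.
    private static readonly Dictionary<string, XmlSerializer> Serializers =
        new Dictionary<string, XmlSerializer>
        {
            { "TypeA", new XmlSerializer(typeof(TypeA)) },
            { "TypeB", new XmlSerializer(typeof(TypeB)) },
        };

    public static List<object> ReadAll(TextReader input)
    {
        var result = new List<object>();
        using (var reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            reader.ReadStartElement("Objects");

            // Deserialize leaves the reader just past the element it consumed,
            // so the loop only needs to skip whitespace between siblings.
            while (reader.MoveToContent() == XmlNodeType.Element)
            {
                if (Serializers.TryGetValue(reader.LocalName, out var serializer))
                    result.Add(serializer.Deserialize(reader));
                else
                    reader.Skip(); // unknown element: ignore its whole subtree
            }
        }
        return result;
    }
}
```

Objects come back one at a time in document order, so the method could equally yield them lazily instead of collecting them into a list.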
Are there any closed or open source projects for an XML serializer for C# that can serialize, for the most part, any object without the need to pollute my domain objects with tons of attributes? Ideally it would also handle serialization of collections built with the internal generics classes. A bonus would be that it can handle serializing an interface-type property. Another bonus would be that it can serialize objects that have read-only properties (or at least properties with the get accessor marked internal).
Well, first define "advanced", i.e. what specifically do you need that XmlSerializer doesn't have. In terms of POCO, XmlSerializer has an overloaded ctor that accepts all the attributes you could ever want to add, to avoid having to add them to your object model - but it does still require a public parameterless constructor, and only works on public read/write fields/properties. And you should cache/re-use the serializer if you use this approach.
I'm not aware of any comparable alternatives, simply because in most cases this is "good enough" - and it is often a mistake to try to brute-force your existing domain object into a DTO. It may be simpler and more maintainable to map your domain entities onto new DTO(s) that are attributed (and have appropriate ctor/properties/etc).
Note that for the ctor/properties issue DataContractSerializer has some answers, but this doesn't have as much fine-grained control over what the xml looks like.
You can allow System.Xml.dll to access your internals by using the InternalsVisibleToAttribute, thus serializing internal types and/or internal members, including internal .ctors.
You can also implement IXmlSerializable on classes to customize their serialization (like the container containing interface references).
You do not have to provide the XML serialization attributes on your classes, but provide them as XmlAttributeOverrides instead.
XmlSerializer is almost always exactly what people want, they just don't know that it is as flexible as it really is.
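As a small illustration of the XmlAttributeOverrides route, the sketch below maps one property to an XML attribute and ignores another without touching the class itself. Customer and OverridesDemo are hypothetical names for this example, and the type still needs a public parameterless constructor and public read/write properties.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical domain class with no serialization attributes on it.
public class Customer
{
    public string Name { get; set; }
    public string InternalNote { get; set; }
}

public static class OverridesDemo
{
    // Build once and reuse: serializers created with overrides should be cached by the caller.
    private static readonly XmlSerializer Serializer = Build();

    private static XmlSerializer Build()
    {
        var overrides = new XmlAttributeOverrides();

        // Equivalent to putting [XmlAttribute("name")] on Customer.Name.
        overrides.Add(typeof(Customer), "Name",
            new XmlAttributes { XmlAttribute = new XmlAttributeAttribute("name") });

        // Equivalent to putting [XmlIgnore] on Customer.InternalNote.
        overrides.Add(typeof(Customer), "InternalNote",
            new XmlAttributes { XmlIgnore = true });

        return new XmlSerializer(typeof(Customer), overrides);
    }

    public static string ToXml(Customer customer)
    {
        using (var writer = new StringWriter())
        {
            Serializer.Serialize(writer, customer);
            return writer.ToString();
        }
    }
}
```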