What are the drawbacks of marking a class as serializable?
I need to save my asp.net session in a db and it requires that the objects in the session are serializable.
Makes sense.
But turns out that all I had to do was decorate that class with the [Serializable] attribute and it worked, so that means .NET already has the underlying infrastructure to make classes serializable. So why can't it just do it by default?
What's the need to mark it as such?
So why can't it just do it by default?
Automatic serialization/deserialization might not suffice for the object. For example, the object might contain a field that holds the name of a local file, a pointer to memory, an index into a shared array, etc. While the system could typically serialize these raw values without trouble, deserialization could easily result in something that is not usable. In general, it is impossible for the system to figure this out on its own. By requiring you to mark the class with Serializable, you indicate that you have taken these considerations into account.
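For example (a sketch; the type and member names are invented), a class holding a process-local handle can exclude the raw field with [NonSerialized] and rebuild it after deserialization, a decision the runtime could not make on its own:

using System;
using System.IO;
using System.Runtime.Serialization;

[Serializable]
public class LogWriter : IDeserializationCallback
{
    public string LogPath { get; set; }   // a plain string: safe to round-trip

    [NonSerialized]
    private FileStream _handle;           // process-local: meaningless after deserialization

    void IDeserializationCallback.OnDeserialization(object sender)
    {
        // re-acquire the handle instead of trusting a serialized raw value
        _handle = File.Open(LogPath, FileMode.Append);
    }
}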
In terms of drawbacks, the primary disadvantage of serialization is the performance overhead (both CPU and disk) and the potential latency when sending objects over the wire. There are also mild security concerns: XML serialization in general works only on public classes and properties, which can force you to expose properties you would not have otherwise. Of course, if security is a real concern, you probably shouldn't be storing very sensitive data in session anyway.
If you are using Silverlight, one potential drawback is that Silverlight does not support the [Serializable] attribute, so any classes decorated with it would be unusable for your Silverlight assemblies.
That said, for session management, small objects stored in the ASPState database typically perform just fine, with no noticeable difference from in-memory session. On the opposite end of the spectrum, I have had large objects with lists of other objects as properties, and when they are big enough, the performance hit can be noticeable at times.
Related
I find it a recurring inconvenience that a lot of simple types in the .Net framework are not marked as serializable. For example: System.Drawing.Point or Rectangle.
Both those structs only consist of primitive data and should be serializable in any format easily. However, because of the missing [System.Serializable] attribute, I can't use them with a BinaryFormatter.
Is there any reason for this, which I'm not seeing?
It is simply a question of efficiency. When a field is tagged as serializable, the compiler must map it onto a table of aliases. If everything were marked serializable, every object containing or inheriting those types would also need to be mapped onto the table of aliases so its serialization could be processed, even though you will probably never serialize most of them. That has a cost in memory and processing, and it is less safe. Test it with millions of elements and you will see.
Personally, I believe it has less to do with the need to pass the buck, and more to do with usefulness and actual use, coupled with the fact that the .NET Framework is simply that: a framework. It is designed to be a stepping stone that provides you the basics to complete tasks that would otherwise be daunting in other languages, rather than doing everything for you.
There really isn't anything stopping you from creating your own serialization mechanisms and extensions that provide the functionality you're seeking, or relying on one of the many other products out there, FOSS or paid, which achieve this for you out of the box.
Granted, @Hans Passant's answer is, I think, very close to the truth, but there are a lot of other facets to this which go beyond simply "It's not my problem." You can take it whatever way you want, but the ultimate thing you need to get out of it is: "where can I go from here?"
Based on my understanding, SerializableAttribute provides no compile time checks, as it's all done at runtime. If that's the case, then why is it required for classes to be marked as serializable?
Couldn't the serializer just try to serialize an object and then fail? Isn't that what it does right now? When something is marked, it tries and fails. Wouldn't it be better if you had to mark things as unserializable rather than serializable? That way you wouldn't have the problem of libraries not marking things as serializable?
As I understand it, the idea behind the SerializableAttribute is to create an opt-in system for binary serialization.
Keep in mind that, unlike XML serialization, which uses public properties, binary serialization grabs all the private fields by default.
Not only could this include operating system structures and private data that is not supposed to be exposed, but deserializing it could result in corrupt state that crashes an application (a silly example: a handle for a file open on a different computer).
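As a sketch (type and values invented), note how a private field rides along with binary serialization even though no public member exposes it:

using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
class Connection
{
    public string Server = "db01";             // public data
    private IntPtr _osHandle = (IntPtr)0x1234; // private field: still captured
}

class Demo
{
    static void Main()
    {
        using (var ms = new MemoryStream())
        {
            // BinaryFormatter walks the private fields, so _osHandle is written out;
            // deserializing it on another machine yields a meaningless handle.
            new BinaryFormatter().Serialize(ms, new Connection());
        }
    }
}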
This is only a requirement for BinaryFormatter (and the SOAP equivalent, but nobody uses that). Diego is right; there are good reasons for this in terms of what it does, but it is far from the only option - indeed, personally I only recommend BinaryFormatter for talking between AppDomains - it is not (IMO) a good way to persist data (to disk, in cache, to a database BLOB, etc).
If this behaviour causes you trouble, consider using any of the alternatives:
XmlSerializer, which works on public members (not just the fields), but demands a public parameterless constructor and public type
DataContractSerializer, which can work fully opt-in (using [DataContract]/[DataMember]), but which can also (in 3.5 and above) work against the fields instead
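For example, the opt-in style from the second bullet might look like this (a sketch; the type and members are invented):

using System.IO;
using System.Runtime.Serialization;

[DataContract]
public class CartItem
{
    [DataMember] public string Sku { get; set; }    // opted in
    [DataMember] public int Quantity { get; set; }  // opted in
    public decimal CachedTotal { get; set; }        // not marked: skipped entirely
}

// usage:
// var ser = new DataContractSerializer(typeof(CartItem));
// using (var ms = new MemoryStream()) { ser.WriteObject(ms, item); }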
Also - for a 3rd-party option (me being the 3rd party); protobuf-net may have options here; "v2" (not fully released yet, but available as source) allows the model (which members to serialize, etc) to be described independently of the type, so that it can be applied to types that you don't control. And unlike BinaryFormatter the output is version-tolerant, known public format, etc.
I'm currently able to store an object I've created into HttpContext.Current.Session, and I've come across protobuf-net. Is there a way to store my object by serializing it with protobuf?
It looks like protobuf wants to write the information to a Stream, so should I (can I?) store a Stream object in the user's session? Or should I first convert it from a Stream into another object type? If so, will converting the serialized object defeat the original purpose of using protobuf (CPU usage, memory usage)? Has anyone done this before?
My goal is to use protobuf as a compression layer for storing information in the user's session. Is there a better way (smaller sizes, faster compression, easier maintenance, smaller implementation overhead) of doing this, or is protobuf the right tool for this task?
Update
I'm using this class:
[Serializable]
public class DynamicMenuCache
{
    public System.DateTime lastUpdated { get; set; }
    public MenuList menu { get; set; }
}
This class is a wrapper for my MenuList class, which is (basically) a List of Lists containing built-in types (strings, ints). I've created the wrapper to associate a timestamp with my object.
If I have a session cache miss (the session key is null, or session.lastUpdated is greater than a globally stored time), I do my normal db lookup (MSSQL), create the MenuList object, and store it in the session, like so:
HttpContext.Current.Session.Add("DynamicMenu" + MenuType, new DynamicMenuCache()
{
    lastUpdated = System.DateTime.Now,
    menu = Menu
});
Currently our session is stored in memory, but we might move to a DB session store in the future.
Our session usage is pretty heavy, as we store lots of large objects in it (although I hope to clean up what we store in the session at some point in the future).
For example, we store each user's permission set in their session to avoid the database hit. There are lots of permissions and permission-storing structs that currently end up in the session.
At this point I'm just viewing the options available, as I'd like to make more intelligent and rigorous use of the session cache in the future.
Please let me know if there is anything else you need.
Note that using protobuf-net here mainly only makes sense if you are looking at moving to a persisted state provider at some point.
Firstly, since you are using in-memory at the moment (so the types are not serialized, AFAIK), some notes on changing session to use any kind of serialization-based provider:
the types must be serializable by the provider (sounds obvious, but this has particular impact if you have circular graphs, etc)
because data is serialized, the semantic is different; you get a copy each time, meaning that any changes you make during a request are lost - this is fine as long as you make sure you explicitly re-store the data again, and can avoid some threading issues - double-edged
the inbuilt state mechanisms typically retrieve session as a single operation - which can be a problem if (as you mention) you have some big objects in there; nothing to do with protobuf-net, but I once got called in to investigate a dying server, which turned out to be a multi-MB object in state killing the system, as every request (even those not using that piece of data) caused this huge object to be transported (in both directions) over the network
In many ways, I'm actually simply not a fan of the standard session-state model - and this is before I even touch on how it relates to protobuf-net!
protobuf-net is, ultimately, a serialization layer. Another feature of the standard session-state implementation is that, because it was originally written with BinaryFormatter in mind, it assumes the objects can be deserialized without any extra context. protobuf-net, however, is (just like XmlSerializer, DataContractSerializer and JavaScriptSerializer) not tied to any particular type system - it takes the approach "you tell me what type you want me to populate, I'll worry about the data". This is actually a hugely good thing, as I've seen web-servers killed by BinaryFormatter when releasing new versions, because somebody had the audacity to touch, even slightly, one of the types that happened to relate to an object stored in persisted session. BinaryFormatter does not like that; especially if you (gasp) rename a type, or (shock) change something from a field+property to an automatically-implemented property. Hint: these are the kinds of problems that google designed protobuf to avoid.
However! That does mean that it isn't hugely convenient to use with the standard session-state model. I have implemented systems to encode the type name into the stream before (for example, I wrote an enyim/memcached transcoder for protobuf-net), but... it isn't pretty. IMO, the better way to do this is to transfer the burden of knowing what the data is to the caller. I mean, really... the caller should know what type of data they are expecting in any given key, right?
One way to do this is to store a byte[]. Pretty much any state implementation can handle a BLOB. If it can't handle that, just use Convert.ToBase64String / Convert.FromBase64String to store a string - any implementation not handling string needs shooting! To use with a stream, you could do something like (a sketch, using session as the backing store):
public static T GetFromState<T>(string key) {
    // standard state provider get-by-key (session shown here)
    byte[] blob = (byte[])HttpContext.Current.Session[key];
    using (var ms = new MemoryStream(blob)) {
        return Serializer.Deserialize<T>(ms);
    }
}
(and similar for adding)
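For completeness, the "adding" half might look like this (again a sketch, assuming session is the backing store):

public static void SetState<T>(string key, T value) {
    using (var ms = new MemoryStream()) {
        Serializer.Serialize(ms, value);          // protobuf-net serialization
        byte[] blob = ms.ToArray();
        HttpContext.Current.Session[key] = blob;  // or Convert.ToBase64String(blob)
    }
}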
Note that protobuf-net is not the same as BinaryFormatter - they have different expectations of what is reasonable, for example by default protobuf-net expects to know in advance what the data looks like (i.e. public object Value {get;set;} would be a pain), and doesn't handle circular graphs (although there are provisions in place to support both of these scenarios). As a general rule of thumb: if you can serialize your data with something like XmlSerializer or DataContractSerializer it will serialize easily with protobuf-net; protobuf-net supports additional scenarios too, but doesn't make an open guarantee to serialize every arbitrary data model. Thinking in terms of DTOs will make life easier. In most cases this isn't a problem at all, since most people have reasonable data. Some people do not have reasonable data, and I just want to set expectation appropriately!
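For instance, the DynamicMenuCache from the question needs only numbered members to be protobuf-net friendly (a sketch; MenuList would need the same treatment):

using ProtoBuf;

[ProtoContract]
public class DynamicMenuCache
{
    [ProtoMember(1)] public System.DateTime lastUpdated { get; set; }
    [ProtoMember(2)] public MenuList menu { get; set; } // MenuList needs [ProtoContract] too
}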
Personally, though, as I say - especially when large objects can get involved - I'm simply not a fan of the inbuilt session-state pattern. What I might suggest instead is using a separate per-key data store (meaning: one record per user per key, rather than just one record per user) - maybe just for the larger objects, maybe for everything. This could be SQL Server, or something like redis/memcached. This is obviously a bit of a pain if you are using 3rd-party controls (webforms etc) that expect to use session-state, but if you are using state manually in your code, it is pretty simple to implement. FWIW, BookSleeve coupled to redis works well for things like this, and provides decent access to byte[] based storage. From a byte[] you can deserialize the object as shown above.
Anyway - I'm going to stop there, in case I'm going too far off-topic; feel free to ping back with any questions, but executive summary:
protobuf-net can stop a lot of the versioning issues you might see with BinaryFormatter
but it isn't necessarily a direct 1:1 swap, since protobuf-net doesn't encode "type" information (which the inbuilt session mechanism expects)
it can be made to work, most commonly with byte[]
but if you are storing large objects, you may have other issues (unrelated to protobuf-net) related to the way session-state wants to work
for larger objects in particular, I recommend using your own mechanism (i.e. not session-state); the key-value-store systems (redis, memcached, AppFabric cache) work well for this
Here I need to cache some entities, for example a page tree in a content management system (CMS). The system allows developers to write plugins in which they can access the cached page tree. Is it good or bad to make the cached page tree mutable (i.e., the tree node objects have setters, and/or the ChildPages collection exposes Add and Remove methods, so client code can set properties of the page tree nodes and add/remove tree nodes freely)?
Here are my opinions:
(1) If the page tree is immutable, plugin developers have no way to modify the tree unexpectedly. That way we can avoid some subtle bugs.
(2) But sometimes we need to change the name of a page. If the page tree is immutable, we have to invoke some method like Refresh() to refresh the cache. This causes another database hit (two database hits in total, when we could have avoided one of them). In this case, if the page tree is mutable, we can directly change the name in the page tree to bring it up to date (so only one database hit is needed).
What do you think about it? And what will you do if you encounter such a situation?
Thanks in advance! :)
UPDATE: The page tree is something like:
public class PageCacheItem {
    public string Name { get; set; }
    public string PageTitle { get; set; }
    public PageCacheItemCollection Children { get; private set; }
}
My problem here is not about the hashcode, because PageCacheItem won't be put in a HashSet or used as a dictionary key.
My problem is:
If the PageCacheItem (the tree node) is mutable, that is, if there are setters for its properties (e.g., for Name and PageTitle), then a plugin author who changes the properties of a PageCacheItem by mistake puts the system into an inconsistent state (the cached data no longer matches the data in the database), and this bug is hard to debug, because it's caused by some plugin, not by the system itself.
But if the PageCacheItem is read-only, it might be hard to implement efficient "cache refresh" functionality: because there are no setters on the properties, we can't simply update them to the latest values.
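For illustration, a read-only version of the node might look like this, where "refreshing" means building a replacement node rather than setting properties:

public class PageCacheItem {
    public PageCacheItem(string name, string pageTitle) {
        Name = name;
        PageTitle = pageTitle;
    }
    public string Name { get; private set; }      // no public setter
    public string PageTitle { get; private set; }
    // renaming a page means constructing a new node and
    // swapping it into the parent's collection
}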
UPDATE2
Thanks, guys. But there is one thing to note: I'm not developing a generic caching framework, but rather some APIs on top of an existing caching framework. My API is a middle layer between the underlying caching framework and the plugin authors. A plugin author doesn't need to know anything about the underlying caching framework; he only needs to know that this page tree is retrieved from cache, and he gets a strongly-typed PageCacheItem API to use, not the weakly-typed "object" retrieved from the underlying caching framework.
So my question is about designing APIs for plugin authors: is it good or bad to make the API class PageCacheItem mutable (here mutable == properties can be set outside the PageCacheItem class)?
First, I assume you mean that the cached values may or may not be mutable, rather than the identifier each value is stored under. If you mean the identifier too, then I would be quite emphatic about immutability in that regard (emphatic enough to have my post flagged for obscene language).
As for mutable values, there is no one right answer here. You've hit on the primary pro and con either way, and there are multiple variants within each of the two options you describe. Cache invalidation is in general a notoriously difficult problem (as in the well known quote from Phil Karlton, "There are only two hard problems in Computer Science: cache invalidation and naming things."*)
Some things to consider:
How often will changes be made? If changes are rare, refreshes become easy: dump the existing cache and let it rebuild.
Will the CMS be on multiple servers, or could it be in the future? If so, any invalidation information has to be shared.
How bad is stale data, and how soon is it bad (could you happily serve out-of-date values for the next hour or so, or would that conflict disastrously with fresh values)?
Does a revalidation approach make sense for you, where after a certain time a cached value is checked to be sure it is still valid and the time-to-next-check is updated (alternatively, periodically dump old values and let them be retrieved from the fresh source again)?
How easy is detecting staleness in the first place? If it's hard, that can rule out some approaches.
How much does the cache actually save? Could you just get rid of it?
I haven't mentioned threading issues, because threading is difficult with any sort of cache unless you're single-threaded (and since it's a CMS I'm guessing it's web, and hence inherently multi-threaded). One thing I will say on the matter is that a cache failure is generally not critical (by definition, a cache failure has a fallback: get the fresh value). For this reason it can be fruitful, rather than blocking indefinitely on the monitor (which is what lock does internally), to use Monitor.TryEnter with a timeout, and have the cache operation fail if the timeout is hit. Using a ReaderWriterLockSlim with a slightly longer timeout for writing can be a good approach. This way, if you hit a point of heavy lock contention, the cache will stop working for some threads, but those threads still get usable data. That hurts performance for those threads, but not as much as lock contention would hurt all affected threads, and caches are a place where it is very easy to introduce lock contention into a web project that only bites once you've gone live.
*(and of course the well known variant, "there are only two hard problems in Computer Science: cache invalidation, naming things, and off-by-one errors").
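A minimal sketch of the TryEnter-with-timeout idea described above (Lookup is a placeholder, not a real API; tune the timeout to taste):

using System.Threading;

static class PageTreeCache {
    private static readonly object _lock = new object();

    public static bool TryGet(string key, out PageCacheItem value, int timeoutMs) {
        value = null;
        if (!Monitor.TryEnter(_lock, timeoutMs))
            return false;        // contention: fail fast, caller fetches fresh data
        try {
            value = Lookup(key); // placeholder for the real cache read
            return value != null;
        } finally {
            Monitor.Exit(_lock);
        }
    }

    private static PageCacheItem Lookup(string key) { return null; } // stub
}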
Look at it this way: if the entry is mutable, then it is likely that its hashcode will change when the object is mutated.
Depending on the dictionary implementation of the cache, a mutated entry could either:
be 'lost', or
in the worst case, force the entire cache to be rehashed.
There may be valid reasons why you would want 'mutable hashcodes', but I cannot see a justification here. (I have only ever needed to do that once in the last 9 years.)
It would be a lot easier just to remove and replace the entry you wish to be 'mutated'.
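In code, assuming the cached items live in a hash-based collection (staleItem and rebuiltItem are hypothetical):

// rather than mutating an object that has already been hashed into the
// collection, remove the stale instance and add a rebuilt replacement:
cache.Remove(staleItem);
cache.Add(rebuiltItem); // hashed afresh on insert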
It's OK if the answer to this is "it's impossible"; I won't be upset. But I'm wondering, in making a game using C#, whether there's any way to mimic the functionality of the "save state" feature of console emulators. From what I understand, emulators have it somewhat easy: they just dump the entire contents of the virtualized memory, instruction pointers and all, so they can resume in exactly the same spot in the game code as before. I know I won't be able to resume from the same line of code, but is there any way I can maintain the entire state of the game without manually saving every single variable? I'd like a way that doesn't need to be extended or modified every time I add something to my game.
I'm guessing that if there is any possible way to do this, it would involve P/Invoke...
Well, in C# you can do the same, in principle. It's called serialization. Agreed, it's not the exact same thing as a memory dump, but it comes close enough.
To mark a class as serializable, just add the Serializable attribute to it:
[Serializable]
class GameState
Additional information regarding classes that might change:
If new members are added to a serializable class, they can be tagged with the OptionalField attribute to allow previous versions of the object to be deserialized without error. This attribute affects only deserialization, and prevents the runtime from throwing an exception if a member is missing from the serialized stream. A member can also be marked with the NonSerialized attribute to indicate that it should not be serialized. This will allow the details of those members to be kept secret.
To modify the default deserialization (for example, to automatically initialize a member marked NonSerialized), the class must implement the IDeserializationCallback interface and define the IDeserializationCallback.OnDeserialization method.
Objects may be serialized in binary format for deserialization by other .NET applications. The framework also provides the SoapFormatter and XmlSerializer objects to support serialization in human-readable, cross-platform XML.
—Wikipedia: Serialization, .NET Framework
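A sketch of those versioning attributes applied to the GameState class above (the members are invented for the example):

using System;
using System.Runtime.Serialization;

[Serializable]
class GameState
{
    public int Level;

    [OptionalField]               // added in a later version; old saves still deserialize
    public int Score;

    [NonSerialized]
    public IntPtr RendererHandle; // never written to the save file
}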
If you make every single one of your "state" classes Serializable then you can literally serialize the objects to a file. You can then load them all up again from this file when you need to resume.
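A minimal save/load round-trip might then look like this (a sketch; BinaryFormatter is shown because it is what the [Serializable] attribute feeds, and the names are invented):

using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

static class SaveSystem
{
    public static void SaveGame(GameState state, string path)
    {
        var formatter = new BinaryFormatter();
        using (var fs = File.Create(path))
            formatter.Serialize(fs, state);
    }

    public static GameState LoadGame(string path)
    {
        var formatter = new BinaryFormatter();
        using (var fs = File.OpenRead(path))
            return (GameState)formatter.Deserialize(fs);
    }
}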
See ISerializable
I agree with the other posters that making your game-state classes Serializable is probably the way you want to go. Others have covered basic serialization; for a high-end alternative, you could look into NHibernate, which will persist objects to a database. You can find some good info on NHibernate at these links:
http://www.codeproject.com/KB/database/Nhibernate_Made_Simple.aspx
http://nhibernate.info/doc/burrow/faq