I'm currently able to store an object I've created into HttpContext.Current.Session, and I've come across protobuf-net. Is there a way to store my object by serializing it with protobuf?
It looks like protobuf wants to write the information to a Stream, so should I (can I?) store a Stream object in the user's session? Or should I first convert it from a Stream into another type? If so, will converting the serialized object defeat the original purpose of using protobuf (CPU usage, memory usage)? Has anyone done this before?
My goal is to use protobuf as a compression layer for storing information in the user's session. Is there a better way (smaller sizes, faster compression, easier to maintain, smaller implementation overhead) of doing this, or is protobuf the right tool for this task?
Update
I'm using this class:
[Serializable]
public class DynamicMenuCache
{
    public System.DateTime lastUpdated { get; set; }
    public MenuList menu { get; set; }
}
This class is a wrapper for my MenuList class, which is (basically) a List of Lists containing built-in types (strings, ints). I've created the wrapper to associate a timestamp with my object.
If I have a session cache miss (the session key is null or session.lastUpdated is greater than a globally stored time), I do my normal db lookup (MSSQL), create the MenuList object, and store it in the session, like so:
HttpContext.Current.Session.Add("DynamicMenu" + MenuType, new DynamicMenuCache()
{
    lastUpdated = System.DateTime.Now,
    menu = Menu
});
Currently our session is stored in memory, but we might move to a DB session store in the future.
Our session usage is pretty heavy, as we store lots of large objects in it (although I hope to clean up what we store in the session at some future point).
For example, we store each user's permission set in their session to avoid the database hit. There are lots of permissions, and lots of permission-storing structs that currently get stored in the session.
At this point I'm just viewing the options available, as I'd like to make more intelligent and rigorous use of the session cache in the future.
Please let me know if there is anything else you need.
Note that using protobuf-net here really only makes sense if you are looking at moving to a persisted state provider at some point.
Firstly, since you are using in-memory at the moment (so the types are not serialized, AFAIK), some notes on changing session to use any kind of serialization-based provider:
the types must be serializable by the provider (sounds obvious, but this has particular impact if you have circular graphs, etc)
because data is serialized, the semantic is different; you get a copy each time, meaning that any changes you make during a request are lost - this is fine as long as you make sure you explicitly re-store the data again, and can avoid some threading issues - double-edged
the inbuilt state mechanisms typically retrieve session state as a single operation - which can be a problem if (as you mention) you have some big objects in there; nothing to do with protobuf-net, but I once got called in to investigate a dying server, which turned out to be a multi-MB object in state killing the system, as every request (even those not using that piece of data) caused this huge object to be transported (both directions) over the network
In many ways, I'm actually simply not a fan of the standard session-state model - and this is before I even touch on how it relates to protobuf-net!
protobuf-net is, ultimately, a serialization layer. Another feature of the standard session-state implementation is that, because it was originally written with BinaryFormatter in mind, it assumes the objects can be deserialized without any extra context. protobuf-net, however, is (just like XmlSerializer, DataContractSerializer and JavaScriptSerializer) not tied to any particular type system - it takes the approach "you tell me what type you want me to populate, I'll worry about the data". This is actually a hugely good thing, as I've seen web-servers killed by BinaryFormatter when releasing new versions, because somebody had the audacity to touch even slightly one of the types that happened to relate to an object stored in persisted session. BinaryFormatter does not like that; especially if you (gasp) rename a type, or (shock) change something from a field+property to an automatically implemented property. Hint: these are the kinds of problems that Google designed protobuf to avoid.
However! That does mean that it isn't hugely convenient to use with the standard session-state model. I have implemented systems to encode the type name into the stream before (for example, I wrote an enyim/memcached transcoder for protobuf-net), but... it isn't pretty. IMO, the better way to do this is to transfer the burden of knowing what the data is to the caller. I mean, really... the caller should know what type of data they are expecting in any given key, right?
One way to do this is to store a byte[]. Pretty much any state implementation can handle a BLOB. If it can't handle that, just use Convert.ToBase64String / Convert.FromBase64String to store a string - any implementation not handling string needs shooting! To use with a stream, you could do something like this (illustrative code; the state-provider access is whatever your store exposes - plain ASP.NET session shown here):
public static T GetFromState<T>(string key) {
    // fetch the blob however your state provider exposes it - here, ASP.NET session
    byte[] blob = (byte[])HttpContext.Current.Session[key];
    if (blob == null) return default(T);
    using (var ms = new MemoryStream(blob)) {
        return Serializer.Deserialize<T>(ms);
    }
}
(and similar for adding)
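The storing side might look like this (a sketch under the same assumptions - the object is serialized to a byte[] and dropped into session under a key the caller knows):

public static void AddToState<T>(string key, T value) {
    using (var ms = new MemoryStream()) {
        // protobuf-net writes the object to the stream
        Serializer.Serialize(ms, value);
        HttpContext.Current.Session[key] = ms.ToArray();
    }
}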
Note that protobuf-net is not the same as BinaryFormatter - they have different expectations of what is reasonable, for example by default protobuf-net expects to know in advance what the data looks like (i.e. public object Value {get;set;} would be a pain), and doesn't handle circular graphs (although there are provisions in place to support both of these scenarios). As a general rule of thumb: if you can serialize your data with something like XmlSerializer or DataContractSerializer it will serialize easily with protobuf-net; protobuf-net supports additional scenarios too, but doesn't make an open guarantee to serialize every arbitrary data model. Thinking in terms of DTOs will make life easier. In most cases this isn't a problem at all, since most people have reasonable data. Some people do not have reasonable data, and I just want to set expectation appropriately!
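To make the "tell protobuf-net what the data looks like" point concrete, a protobuf-net-friendly version of the wrapper from the question might look like this (a sketch only; the member numbers are arbitrary choices, and MenuList and its item types would need the same [ProtoContract]/[ProtoMember] treatment):

using ProtoBuf;

[ProtoContract]
public class DynamicMenuCache
{
    [ProtoMember(1)]
    public System.DateTime lastUpdated { get; set; }

    [ProtoMember(2)]
    public MenuList menu { get; set; }
}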
Personally, though, as I say - especially when large objects can get involved, I'm simply not a fan of the inbuilt session-state pattern. What I might suggest instead is using a separate per-key data store (meaning: one record per user per key, rather than just one record per user) - maybe just for the larger objects, maybe for everything. This could be SQL Server, or something like redis/memcached. This is obviously a bit of a pain if you are using 3rd-party controls (webforms etc) that expect to use session-state, but if you are using state manually in your code, it is pretty simple to implement. FWIW, BookSleeve coupled to redis works well for things like this, and provides decent access to byte[]-based storage. From a byte[] you can deserialize the object as shown above.
Anyway - I'm going to stop there, in case I'm going too far off-topic; feel free to ping back with any questions, but executive summary:
protobuf-net can stop a lot of the versioning issues you might see with BinaryFormatter
but it isn't necessarily a direct 1:1 swap, since protobuf-net doesn't encode "type" information (which the inbuilt session mechanism expects)
it can be made to work, most commonly with byte[]
but if you are storing large objects, you may have other issues (unrelated to protobuf-net) related to the way session-state wants to work
for larger objects in particular, I recommend using your own mechanism (i.e. not session-state); the key-value-store systems (redis, memcached, AppFabric cache) work well for this
Related
Based on my understanding, SerializableAttribute provides no compile-time checks, as it's all done at runtime. If that's the case, then why is it required for classes to be marked as serializable?
Couldn't the serializer just try to serialize an object and then fail? Isn't that what it does right now? When something is marked, it tries and fails. Wouldn't it be better if you had to mark things as unserializable rather than serializable? That way you wouldn't have the problem of libraries not marking things as serializable?
As I understand it, the idea behind the SerializableAttribute is to create an opt-in system for binary serialization.
Keep in mind that, unlike XML serialization, which uses public properties, binary serialization grabs all the private fields by default.
Not only could this include operating system structures and private data that is not supposed to be exposed, but deserializing it could result in corrupt state that can crash an application (silly example: a handle for a file open on a different computer).
This is only a requirement for BinaryFormatter (and the SOAP equivalent, but nobody uses that). Diego is right; there are good reasons for this in terms of what it does, but it is far from the only option - indeed, personally I only recommend BinaryFormatter for talking between AppDomains - it is not (IMO) a good way to persist data (to disk, in cache, to a database BLOB, etc).
If this behaviour causes you trouble, consider using any of the alternatives:
XmlSerializer, which works on public members (not just the fields), but demands a public parameterless constructor and a public type
DataContractSerializer, which can work fully opt-in (using [DataContract]/[DataMember]), but which can also (in 3.5 and above) work against the fields instead
Also - for a 3rd-party option (me being the 3rd party): protobuf-net may have options here; "v2" (not fully released yet, but available as source) allows the model (which members to serialize, etc) to be described independently of the type, so that it can be applied to types that you don't control. And unlike BinaryFormatter, the output is version-tolerant and in a known, public format.
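To illustrate the opt-in style mentioned above for DataContractSerializer, a minimal sketch (the type and members here are just examples; this needs a reference to System.Runtime.Serialization):

using System.Runtime.Serialization;

[DataContract]
public class Customer
{
    [DataMember]
    public string Name { get; set; }

    // not marked with [DataMember], so DataContractSerializer ignores it
    public string ScratchBuffer { get; set; }
}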
I've seen a couple of examples with the serializable attribute, like this:
[Serializable]
public class SampleClass
{
    public int Property1 { get; set; }
    public string Property2 { get; set; }

    public SampleClass() { }

    public SampleClass(int pr1, int pr2)
    {
        Property1 = pr1;
        Property2 = pr2.ToString();
    }
}
I never had a good grasp on how this works, but from MSDN:
Serialization allows the developer to save the state of an object
and recreate it as needed, providing storage of objects as well as
data exchange. Through serialization, a developer can perform actions
like sending the object to a remote application by means of a Web
Service, passing an object from one domain to another, passing an
object through a firewall as an XML string, or maintaining security or
user-specific information across applications.
But the problem is that in my code example I see no use for it. It's just an object that is used to retrieve data from the database, nothing special. What are some other examples of when to use, and when not to use, serialization?
For example, should I always use serialization because it is more secure? Is it going to be slower this way?
Update: Thanks for all the nice answers.
Serialization is useful any time you want to move a representation of your data into or out of your process boundary.
Saving an object to disk is a trivial example you'll see in many tutorials.
More commonly, serialization is used to transfer data to and from a web service, or to persist data to or from a database.
Several answers have covered the reasons of why you might want to use serialization in general. You seem to also want to know why a specific class has attribute [Serializable] and you are wondering why that may have been done.
With ASP.NET the default session state storage is InProc, which allows you to store any object as a reference and leave it on the heap. This is the best-performing way to store session state; however, it only works if you are using a single worker process, or if all your session state could be rebuilt automatically if the worker process were to recycle (unlikely). For the other state storage modes (StateServer and SQL Server) all the session state objects must be serializable, as the ASP.NET engine will first serialize these objects using binary serialization before sending them to the storage medium.
In your case, you may be using InProc. One reason, though, to still mark all classes that are used in session state as Serializable (and test them that way) is that you may need to change this in the future (for example, to use a web farm). If you do not design your session state classes with this in mind, it will be quite difficult to do the migration later.
Also, just because you can remove the Serializable attribute and the program "works" in one environment does not mean that it will work in another environment. For example, it may work fine under the Visual Studio test web server (which always uses the InProc session state mode) and even in a development IIS instance, but a production IIS instance may be set up to use a different storage mode.
These environmental/configuration differences are not necessarily limited to ASP.NET applications. There are other application engines that may do this or even standalone applications that do (it is not difficult to build this kind of configurable environment).
Finally, you may be working with a library which may be consumed by different applications. Some may need to store state in a serializable manner and others may not.
Because of these factors it is often a very good idea, at least when building a library, to consider marking simple value classes or state management classes with [Serializable]. Keep in mind that this increases the work for testing these classes, and there are limits to what can be serialized (e.g. a class that contains a socket reference or an open file reference may not be a good candidate for serialization, as open external resources cannot be serialized), so do not overuse this.
You asked if using [Serializable] will be slower. No, it will not be. This attribute has no direct effect on performance. However, if the application environment is changed to serialize the object, then yes, performance will be affected. It is the act of serializing and deserializing that is slower than just storing the object on the heap. [Note that some routines could be written to look for the Serializable attribute and then choose to serialize, but this is rare; usually it is like ASP.NET and left up to an administrator or user to decide whether they want to change the storage medium.]
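For what it's worth, a check like the one described in that last note is trivial to write (a sketch; MyStateObject is a stand-in for whatever you store):

// true if the type is marked [Serializable] (or is otherwise considered serializable by the runtime)
bool canSerialize = typeof(MyStateObject).IsSerializable;

// a hypothetical store could then decide between a serializing backing store and in-memory only
if (canSerialize) { /* serialize to the backing store */ } else { /* keep in memory */ }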
The MSDN quote you provide explains when serialization is useful: for transporting or storing data. Writing to a file is serialization, and serialization is required to send an object over a network.
If you are just populating the object in a single application, perhaps from a database, then indeed: serialization is not a concern at all. Marking a class for serialization has no impact on security or performance: if you don't need it, don't worry about it.
Note also that [Serializable] mainly relates to BinaryFormatter, but there are actually many more serializers than that. For example: you might want to expose your object via JSON or XML - both of those require serialization, but neither requires [Serializable].
Simple example: Imagine you have a custom shape to store application settings.
namespace My.Namespace
{
    [Serializable]
    public class Settings
    {
        public string Setting1 { get; set; }
        public string Setting2 { get; set; }
    }
}
You could then have an XML file such as:
<?xml version="1.0" encoding="utf-8" ?>
<Settings>
  <Setting1>Foo</Setting1>
  <Setting2>Bar</Setting2>
</Settings>
Using XmlSerializer you could simply serialize and deserialize your settings.
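For example (a minimal sketch; the file name is just an assumption):

using System.IO;
using System.Xml.Serialization;
using My.Namespace;

var serializer = new XmlSerializer(typeof(Settings));

// write the settings out
using (var stream = File.Create("settings.xml"))
{
    serializer.Serialize(stream, new Settings { Setting1 = "Foo", Setting2 = "Bar" });
}

// read them back
using (var stream = File.OpenRead("settings.xml"))
{
    var settings = (Settings)serializer.Deserialize(stream);
}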
It is also necessary for your shape to be Serializable if you wish to stuff it into ASP.NET ViewState
These are very basic examples, but they demonstrate its usefulness.
What are some other examples of when to use and when not to use serialization?
Let me give you one practical example. In one of my applications, I was given XML schemas (XSD files) for request and response XML files. I needed to parse the request XML file, process it, and save the information into several tables. Later I needed to prepare the response XML file accordingly and send it back to our client.
I used Xsd2Code to generate C# classes based on the schema. So parsing the request XML file is simply a matter of deserializing it into the generated request class object. Then I can access properties on the object the way they appear in the request XML file. Generating the response XML file is simply serializing the generated response class object, which I populate in my code. This way I can work with C# objects rather than XML files. I hope that makes sense.
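In code, that looks roughly like this (a sketch; Request and Response are stand-ins for whatever classes the tool generated, and the file paths are assumptions):

using System.IO;
using System.Xml.Serialization;

// deserialize the incoming XML into the generated request class
var requestSerializer = new XmlSerializer(typeof(Request));
Request request;
using (var stream = File.OpenRead("request.xml"))
{
    request = (Request)requestSerializer.Deserialize(stream);
}

// ...process the request, save to the database, populate a response object...
var response = new Response();

// serialize the populated response back to XML
var responseSerializer = new XmlSerializer(typeof(Response));
using (var stream = File.Create("response.xml"))
{
    responseSerializer.Serialize(stream, response);
}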
For example, should I always use serialization because it is more secure
I don't think this is related to security in any way.
I'm building a small app that needs to save an object to a file in order to save user data.
I have two questions about serializing to this file:
The object I'm creating has some public properties and an event. I added the [Serializable] attribute to my object, and then realized I can't serialize an object with an event in it.
I then discovered that I can just add an attribute above my event, [field: NonSerialized], and it will work. Is this the best way to do this, or should I try to build my serializable objects without any events inside?
The object I'm serializing saves some user settings for the app. These settings aren't sensitive enough to go about encrypting them in the file, but I still don't want them to be tampered with manually without opening my application. When I serialize my object to a file using a plain BinaryFormatter object, via the Serialize() method, I see readable names of .NET object types in the file I'm saving to. Is there a way for someone to reverse engineer this and see what's being saved without using my program? Is there a way for someone to build a small application and find out how to deserialize the information in this file? If so, how would I go about hiding the information in this file?
Are there any other tips/suggestions/best practices I should stick to when going about serializing an object to a file in this kind of scenario?
Thanks in advance!
If your object implements the ISerializable interface, you can control all the data that is stored/serialized yourself, and you can control the deserialization.
This is important if your project evolves over time, because you might drop some properties, add others, or change the behaviour.
I always add a version to the serialization bag. That way I know what the version of the object was when it was stored, and I therefore know how to deserialize it.
using System;
using System.Runtime.Serialization;

[Serializable]
class Example : ISerializable {
    private const int VERSION = 3;

    public Example(SerializationInfo info, StreamingContext context) {
        // note: payloads written before the version entry existed would need a
        // try/catch around this call (treating a missing entry as version 0)
        var version = info.GetInt32("Example_Version");
        if (version == 0) {
            // Restore properties for version 0
        }
        if (version == 1) {
            // ....
        }
    }

    void ISerializable.GetObjectData(SerializationInfo info, StreamingContext context) {
        info.AddValue("Example_Version", VERSION);
        // Your data here
    }
}
And if you do not encrypt, it will be very easy to "read" your data. Very easy meaning someone might have to invest a couple of hours. If the data you store is worth a couple of days of effort, that means it is easy; if it is only worth a couple of minutes, it is hard. If you get the point.
A very easy way to encrypt your data is using the Windows DPAPI through the ProtectedData class.
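For example (a minimal sketch; SerializeSettings is a stand-in for however you produce the bytes, and this needs a reference to System.Security):

using System.Security.Cryptography;

// protect the serialized bytes for the current Windows user before writing them to disk
byte[] plain = SerializeSettings();
byte[] secured = ProtectedData.Protect(plain, null, DataProtectionScope.CurrentUser);

// later: read the protected bytes back and undo the protection before deserializing
byte[] roundTripped = ProtectedData.Unprotect(secured, null, DataProtectionScope.CurrentUser);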
1: with BinaryFormatter, yes - you need NonSerialized for events (unless you implement ISerializable, but that adds lots of work); however, I'm pretty much on record as saying that I simply wouldn't use BinaryFormatter here. It is not very forgiving of a range of changes to your type. I would use something less tied to the internals of your code: XmlSerializer, DataContractSerializer, JavaScriptSerializer. I can suggest binary alternatives too: NetDataContractSerializer, protobuf-net (my own), etc.
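For reference, the attribute usage from the question looks like this (a minimal sketch; the class is just an example):

[Serializable]
public class UserSettings
{
    public string Name { get; set; }

    // the compiler-generated backing field for the event is excluded from serialization
    [field: NonSerialized]
    public event EventHandler Changed;
}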
2: yes, with almost any implementation that doesn't involve proper encryption, if anyone cares they can reverse engineer it and obtain the strings. So it depends how hidden it needs to be. Simply running your existing serialization through GZipStream may be enough obfuscation for your needs, BUT this is just a mask against casual inspection. It will not deter anyone with a reason to look for the data.
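As an illustration of the GZipStream-as-obfuscation idea (a sketch only; the file name and the UserSettings type are assumptions, and again, this hides nothing from a determined reader):

using System.IO;
using System.IO.Compression;
using System.Runtime.Serialization.Formatters.Binary;

var formatter = new BinaryFormatter();   // or whichever serializer you settle on
var data = new UserSettings { Name = "example" };

// write: serialize through a GZipStream so the file isn't trivially readable
using (var file = File.Create("settings.dat"))
using (var gzip = new GZipStream(file, CompressionMode.Compress))
{
    formatter.Serialize(gzip, data);
}

// read: decompress, then deserialize
using (var file = File.OpenRead("settings.dat"))
using (var gzip = new GZipStream(file, CompressionMode.Decompress))
{
    var roundTripped = (UserSettings)formatter.Deserialize(gzip);
}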
If the data needs to be secure, you'll need proper encryption, using either a key the user enters at app startup, or something like a certificate securely stored against their user profile.
I would remove the events from the objects. It's a little cleaner that way.
Anything can be reverse engineered. Just encrypt it when saving the file. It's pretty easy to do. Of course, the encryption key is going to have to be stored in the app somewhere, so unless you're obfuscating your code a determined hacker will be able to get to it.
Here I need to cache some entities, for example, a page tree in a content management system (CMS). The system allows developers to write plugins, in which they can access the cached page tree. Is it good or bad to make the cached page tree mutable (i.e., there are setters on the tree node objects, and/or we expose Add/Remove methods on the ChildPages collection, so client code can set properties of the page tree nodes and add/remove tree nodes freely)?
Here are my opinions:
(1) If the page tree is immutable, the plugin developers have no way to modify the tree unexpectedly. That way we can avoid some subtle bugs.
(2) But sometimes we need to change the name of a page. If the page tree is immutable, we have to invoke some method like "Refresh()" to refresh the cache. This causes a database hit (so two database hits in total, where one of them could have been avoided). In this case, if the page tree is mutable, we can directly change the name in the page tree to bring the tree up to date (so only one database hit is needed).
What do you think about it? And what will you do if you encounter such a situation?
Thanks in advance! :)
UPDATE: The page tree is something like:
public class PageCacheItem {
    public string Name { get; set; }
    public string PageTitle { get; set; }
    public PageCacheItemCollection Children { get; private set; }
}
My problem here is not about the hashcode, because the PageCacheItem won't be put in a hashset or used as a dictionary key.
My problem is:
If the PageCacheItem (the tree node) is mutable - that is, there are setters for its properties (e.g., setters for the Name and PageTitle properties) - then if a plugin author changes the properties of a PageCacheItem by mistake, the system will be in an incorrect state (the cached data is not consistent with the data in the database), and this bug is hard to debug, because it's caused by some plugin, not the system itself.
But if the PageCacheItem is read-only, it might be hard to implement efficient "cache refresh" functionality, because there are no setters for the properties, so we can't simply update them to the latest values.
UPDATE2
Thanks guys. But I have one thing to note: I'm not going to develop a generic caching framework, but rather some APIs on top of an existing caching framework. So my API is a middle layer between the underlying caching framework and the plugin authors. A plugin author doesn't need to know anything about the underlying caching framework. He only needs to know that this page tree is retrieved from a cache, and he gets strongly typed PageCacheItem APIs to use, not the weakly typed "object" retrieved from the underlying caching framework.
So my question is about designing APIs for plugin authors, that is: is it good or bad to make the API class PageCacheItem mutable (here mutable == properties can be set outside the PageCacheItem class)?
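For concreteness, the read-only alternative being weighed would look something like this (a sketch; refreshing would then mean building a new tree and swapping it into the cache, rather than mutating nodes in place):

using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Linq;

public class PageCacheItem
{
    public PageCacheItem(string name, string pageTitle, IEnumerable<PageCacheItem> children)
    {
        Name = name;
        PageTitle = pageTitle;
        Children = new ReadOnlyCollection<PageCacheItem>(children.ToList());
    }

    // no setters, so plugins cannot put the cached tree out of sync with the database
    public string Name { get; private set; }
    public string PageTitle { get; private set; }
    public ReadOnlyCollection<PageCacheItem> Children { get; private set; }
}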
First, I assume you mean the cached values may or may not be mutable, rather than the key they are identified by. If you mean the key too, then I would be quite emphatic about being immutable in this regard (emphatic enough to have my post flagged for obscene language).
As for mutable values, there is no one right answer here. You've hit on the primary pro and con either way, and there are multiple variants within each of the two options you describe. Cache invalidation is in general a notoriously difficult problem (as in the well known quote from Phil Karlton, "There are only two hard problems in Computer Science: cache invalidation and naming things."*)
Some things to consider:
How often will changes be made? If changes are rare, refreshes become easy - dump the existing cache and let it rebuild.
Will the CMS be on multiple servers, or could it be in the future? This means that any invalidation information has to be shared.
How bad is stale data, and how soon is it bad? (Could you happily serve out-of-date values for the next hour or so, or would this conflict disastrously with fresh values?)
Does a revalidation approach make sense for you, where after a certain time a cached value is checked to be sure it is still valid, and the time-to-next-check is updated? (Alternatively, periodically dump old values and let them be retrieved from the fresh source again.)
How easy is detecting staleness in the first place? If it's hard, this can rule out some approaches.
How much does the cache actually save? Could you just get rid of it?
I haven't mentioned threading issues, because the threading issues are difficult with any sort of cache unless you're single-threaded (and if it's a CMS I'm guessing it's web, and hence inherently multi-threaded). One thing I will say on the matter is that it's generally the case that a cache failure isn't critical (by definition, a cache failure has a fallback - get the fresh value). For this reason it can be fruitful to take an approach where, rather than blocking indefinitely on the monitor (which is what lock does internally), you use Monitor.TryEnter with a timeout and have the cache operation fail if the timeout is hit. Using a ReaderWriterLockSlim and allowing a slightly longer timeout for writing can be a good approach. This way, if you get a point of heavy lock contention then the cache will stop working for some threads, but those threads still get usable data. This will suck for performance for those threads, but not as much as lock contention would for all affected threads, and caches are a place where it is very easy to introduce lock contention into a web project that only hits once you've gone live.
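A sketch of that timeout-based approach (the timeouts and the dictionary-backed store are just illustrative assumptions):

using System;
using System.Collections.Generic;
using System.Threading;

public static class PageCache
{
    static readonly ReaderWriterLockSlim cacheLock = new ReaderWriterLockSlim();
    static readonly Dictionary<string, PageCacheItem> cache = new Dictionary<string, PageCacheItem>();

    public static PageCacheItem GetPage(string key, Func<PageCacheItem> loadFresh)
    {
        // try the cache, but give up quickly rather than queueing behind heavy contention
        if (cacheLock.TryEnterReadLock(TimeSpan.FromMilliseconds(20)))
        {
            try
            {
                PageCacheItem cached;
                if (cache.TryGetValue(key, out cached)) return cached;
            }
            finally
            {
                cacheLock.ExitReadLock();
            }
        }

        // cache miss, or the read lock timed out: fall back to the fresh value
        var fresh = loadFresh();

        // allow a slightly longer timeout for writing; if it still times out, just skip caching
        if (cacheLock.TryEnterWriteLock(TimeSpan.FromMilliseconds(100)))
        {
            try { cache[key] = fresh; }
            finally { cacheLock.ExitWriteLock(); }
        }
        return fresh;
    }
}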
*(and of course the well known variant, "there are only two hard problems in Computer Science: cache invalidation, naming things, and off-by-one errors").
Look at it this way: if the entry is mutable, then it is likely that the hashcode will change when the object is mutated.
Depending on the dictionary implementation of the cache, the entry could either:
be 'lost'
or, in the worst case, the entire cache will need to be rehashed
There may be valid reasons why you want 'mutable hashcodes' but I cannot see a justification here. (I have only ever needed to do this once in the last 9 years).
It would be a lot easier just to remove and replace the entry you wish to be 'mutated'.
What are the drawbacks of marking a class as serializable?
I need to save my ASP.NET session in a db, and that requires that the objects in the session are serializable.
Makes sense.
But it turns out that all I had to do was decorate the class with the [Serializable] attribute and it worked, so that means .NET already has the underlying infrastructure to make classes serializable. So why can't it just do it by default?
What's the need to mark it as such?
So why can't it just do it by default?
Automatic serialization/deserialization might not suffice for the object. For example, the object might contain a field that holds the name of a local file, a pointer to memory, an index into a shared array, etc. While the system could typically serialize these raw values without trouble, deserialization could easily result in something that is not usable. In general, it is impossible for the system to figure this out on its own. By requiring you to mark the class with Serializable, you indicate that you have taken these considerations into account.
In terms of drawbacks, the primary disadvantage of serialization is the performance overhead (both CPU and disk) and the potential latency when sending the data over the wire. There may also be slight security concerns because, in general, XML serialization is insecure since it works only on public properties and classes, forcing you in some cases to expose properties you would not have otherwise. Of course, if security is really a concern, you probably wouldn't be storing overly sensitive data in session anyway.
If you are using Silverlight, one potential drawback is that Silverlight does not support the [Serializable] attribute, so any classes decorated with it would be unusable for your Silverlight assemblies.
That said, for session management, small objects stored in the ASPState database typically perform just fine, without any noticeable difference from in-memory session. On the opposite end of the spectrum, I have had large objects with lists of other objects as properties, etc., and if they are big enough, the performance hit can be noticeable at times.