I am involved in an effort to replace System.Web.Caching.Cache with Redis. The issue I am running into is that while System.Web.Caching.Cache seems to be able to cache just about whatever object I pass to it, Redis is string-based. This means that I have to worry about serialization myself.
The two approaches I've tried are 1) to use JSON.NET to serialize my objects to a string, and 2) to use BinaryFormatter. The JSON.NET approach can probably be made to work, but of course there are many configuration points (around what to serialize/ignore, how to handle reference loops, etc.) and making this approach work has turned into a fair amount of work.
The BinaryFormatter approach, I had suspected, was probably closer to what System.Web.Caching.Cache was doing internally. Having gone down that path a bit, though, I am not so sure of that anymore. Many types I'm trying to cache are not marked [Serializable], which seems to rule out BinaryFormatter.
I am wondering if anyone else has faced similar issues, or knows what System.Web.Caching.Cache is doing internally so that I can emulate it. Thanks.
System.Web.Caching.Cache seems to be able to cache just about whatever object I pass to it. This means that I have to worry about serialization myself.
That is exactly because System.Web.Caching.Cache stores object references while the application is running.
So that I can emulate it
Unfortunately, you cannot. Redis is a remote service. When you take advantage of what Redis offers (e.g. distributed caching, a reduced memory footprint on your local machine), you have to pay the price: handling serialization yourself.
The good news is that there are several serialization libraries to choose from (see the sketch after this list):
Newtonsoft.Json
protobuf-net
MessagePack-CSharp
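For illustration, here is a minimal cache-aside sketch using Newtonsoft.Json together with the StackExchange.Redis client. The RedisCache class name, the key format, and the serializer settings are my own assumptions, not a prescribed approach:

```csharp
using System;
using Newtonsoft.Json;
using StackExchange.Redis;

public class RedisCache
{
    private readonly IDatabase _db;

    // ReferenceLoopHandling.Ignore avoids exceptions on cyclic object graphs,
    // one of the configuration points mentioned in the question.
    private static readonly JsonSerializerSettings Settings = new JsonSerializerSettings
    {
        ReferenceLoopHandling = ReferenceLoopHandling.Ignore,
        NullValueHandling = NullValueHandling.Ignore
    };

    public RedisCache(IConnectionMultiplexer connection)
    {
        _db = connection.GetDatabase();
    }

    // Serialize the object to a JSON string and store it under the given key.
    public void Set<T>(string key, T value, TimeSpan? expiry = null)
    {
        string json = JsonConvert.SerializeObject(value, Settings);
        _db.StringSet(key, json, expiry);
    }

    // Fetch the JSON string and deserialize it back to T; default(T) on a cache miss.
    public T Get<T>(string key)
    {
        RedisValue json = _db.StringGet(key);
        return json.IsNullOrEmpty ? default(T) : JsonConvert.DeserializeObject<T>(json, Settings);
    }
}
```

Usage would look something like `cache.Set("product:42", product, TimeSpan.FromMinutes(10))` followed by `cache.Get<Product>("product:42")`; the key and the Product type are, again, just placeholders.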
Related
Microsoft warns against using BinaryFormatter (they write that there is no way to make the deserialization safe).
Applications should stop using BinaryFormatter as soon as possible, even if they believe the data they're processing to be trustworthy.
I don't want to use the XML- or JSON-based solutions they recommend instead. I am concerned about file size and about preserving the object graph.
If I were to write my own methods to traverse my object graph and convert the objects to binary, could that be done safely, or is there something specific about deserializing binary data that makes it inherently more dangerous than text?
Are there binary (non-XML and non-JSON) alternatives to BinaryFormatter?
This question feels like it leads to answers that will be more opinion-based.
I'm sure there are a lot of libraries out there, but perhaps the best known alternative is Protocol Buffers (protobuf). It's a Google library, so it gets plenty of development and attention. However, not everyone agrees that using protobuf for generic binary serialization is the best thing to do.
Follow this discussion about BinaryFormatter on the dotnet GitHub repository if you want more info; it covers the general problem with BinaryFormatter, as well as using protobuf as an alternative.
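As a concrete illustration (not taken from the linked discussion), here is a minimal protobuf-net sketch. The Person type and its member numbers are hypothetical; the attributes and Serializer calls are protobuf-net's standard contract API:

```csharp
using System.IO;
using ProtoBuf;

// protobuf-net is contract-based: members must be given explicit field numbers,
// unlike BinaryFormatter, which walks private fields via reflection.
[ProtoContract]
public class Person
{
    [ProtoMember(1)]
    public string Name { get; set; }

    [ProtoMember(2)]
    public int Age { get; set; }
}

public static class BinaryRoundTrip
{
    public static byte[] Serialize(Person person)
    {
        using (var ms = new MemoryStream())
        {
            Serializer.Serialize(ms, person);
            return ms.ToArray();
        }
    }

    public static Person Deserialize(byte[] payload)
    {
        using (var ms = new MemoryStream(payload))
        {
            // Only the declared, numbered members are populated; the payload
            // cannot instruct the deserializer to instantiate arbitrary types.
            return Serializer.Deserialize<Person>(ms);
        }
    }
}
```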
Can I create my own secure binary serialization system?
Yes. That said, the real question should be: 'is it worth my time to do so?'
See this link for the wind-down plan for BinaryFormatter:
https://github.com/dotnet/designs/pull/141/commits/bd0a0661f9d248ed31a354d27ad026efd6719690
At the very bottom you will find:
Why not make BinaryFormatter safe for untrusted payloads?
The BinaryFormatter protocol works by specifying the values of an object's raw instance fields. In other words, the entire point of BinaryFormatter is to bypass an object's typical constructor and to use private reflection to set the instance fields to the contents that came in over the wire. Bypassing the constructor in this fashion means that the object cannot perform any validation or otherwise guarantee that its internal invariants are satisfied. One consequence of this is that BinaryFormatter is unsafe even for seemingly innocuous types such as Exception or List<T> or Dictionary<TKey, TValue>, regardless of the actual types of T, TKey, or TValue. Restricting deserialization to a list of allowed types will not resolve this issue.
The security issue isn't with binary serialization as a concept; the issue is with how BinaryFormatter was implemented.
You could design a secure binary deserialization system, if you wanted. If you have very few messages being sent, and you can tightly control which types are deserialized, perhaps it's not too much effort to make a secure system.
However, for a system flexible enough to handle many different use cases (e.g. many different types that can be deserialized), you may find that it takes a lot of effort to build in enough safety checks.
FWIW, you likely will never reach the performance levels of BinaryFormatter with a secure system that offers the same widespread utility (use cases), since BinaryFormatter's speed comes (in part) from having very few safety features. You might approach such performance levels with a targeted, small system with a narrow set of use cases.
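To make the distinction concrete, here is a minimal sketch of what a hand-rolled, type-specific binary format might look like. The Measurement type and its invariant are invented for illustration; the point is that deserialization goes through the public constructor, so validation still runs, and the payload never names a type to instantiate:

```csharp
using System;
using System.IO;

// A hypothetical message with an invariant that the constructor enforces.
public sealed class Measurement
{
    public string Sensor { get; }
    public double Value { get; }

    public Measurement(string sensor, double value)
    {
        if (string.IsNullOrWhiteSpace(sensor))
            throw new ArgumentException("Sensor is required.", nameof(sensor));
        Sensor = sensor;
        Value = value;
    }

    // Writes only known fields in a fixed order; no type names go on the wire.
    public void Write(BinaryWriter writer)
    {
        writer.Write(Sensor);
        writer.Write(Value);
    }

    // Reads the same fields back and goes through the constructor,
    // so the invariants are re-validated on every deserialization.
    public static Measurement Read(BinaryReader reader)
    {
        string sensor = reader.ReadString();
        double value = reader.ReadDouble();
        return new Measurement(sensor, value);
    }
}
```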
When returning many results from a query, the code takes a really long time to convert the data into .NET objects. These are basic objects with a few string fields. I'm not sure, but I think it's using reflection to create the instances, which is slow. Is there a way to speed this up?
The 10gen driver doesn't use reflection on a per-object basis. It uses reflection once per type to generate a serializer using Reflection.Emit, so serialization or deserialization of the first object might be slow, but any objects afterward are (relatively) fast.
Your question - is there any way to speed this up?
If your objects are simple (no nested documents, just a few public fields, etc.), there probably isn't much you can do. You could implement a custom serializer for the class to eke out a little performance, but I doubt it would be more than a few percent.
I haven't looked into it, and Robert Stam (who answered this question as well) would be the authority on it, but there may be some performance to be gained on multicore or multiprocessor systems by parallelizing deserialization in the driver. I haven't looked at the driver code from that perspective yet, so it may be something Robert has already pursued.
On a general note, I think 30,000 objects in 10 seconds is pretty standard for just about any platform (SQL, Mongo, XML, etc.) that isn't storing objects directly as memory blobs, as you could in a language like C++.
EDIT:
It looks like the 10gen driver performs deserialization before it returns a cursor for you to enumerate. So if your query returns 30,000 results, all 30,000 objects have to be deserialized before the driver makes a cursor available for enumeration. I haven't looked at the jmongo driver, but I expect that it does the opposite, and defers deserialization until after an object is enumerated in the cursor.
The net result is that while both probably take the same total time to enumerate and deserialize 30,000 objects, deserialization in the jmongo driver is spread across the entire enumeration, whereas in the C# driver it is front-loaded.
The difference is subtle, but likely to explain what you are seeing.
The bad news is that the "fix" is a driver change. One thing you could do is break your query up into chunks, querying for 10 or 100 objects at a time (a sketch follows).
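As an illustration of that chunking idea, here is a rough sketch using the current MongoDB .NET driver's Find/Skip/Limit API (the answer above predates it). The Product type, the chunk size, and the absence of an explicit sort are assumptions; in practice you would add a stable sort (e.g. by _id) so pages don't shift between requests:

```csharp
using System.Collections.Generic;
using MongoDB.Driver;

public static class ChunkedQuery
{
    // Enumerate a large result set in fixed-size pages so that only one
    // page is deserialized and held in memory at a time.
    public static IEnumerable<Product> ReadInChunks(
        IMongoCollection<Product> collection,
        FilterDefinition<Product> filter,
        int chunkSize = 100)
    {
        int skip = 0;
        while (true)
        {
            // NOTE: add a .Sort(...) in real code so that paging is stable.
            var chunk = collection.Find(filter)
                                  .Skip(skip)
                                  .Limit(chunkSize)
                                  .ToList();
            if (chunk.Count == 0)
                yield break;

            foreach (var product in chunk)
                yield return product;

            skip += chunkSize;
        }
    }
}

// Placeholder document class, purely for illustration.
public class Product
{
    public string Id { get; set; }
    public string Name { get; set; }
}
```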
Not sure how you are measuring. When the C# driver gets back a batch of documents from the server it deserializes them all at once, so there might be a lag on the first document but then the rest of the documents are really fast. What really matters is the total throughput in terms of documents per second and whether it is fast enough to saturate the network link, which it should be.
While there are hardcoded serializers for many of the standard .NET classes, serialization of POCOs is typically handled through class maps. Reflection is used to build the class maps, but reflection is no longer needed while doing the serialization/deserialization.
You could speed up serialization/deserialization a little by writing your own hand-coded serializers for your classes (or by making your classes implement IBsonSerializable), but since the bottleneck is probably the network anyway, it usually isn't worth it.
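For reference, registering a class map explicitly at application startup looks roughly like this (reusing the hypothetical Product class from the previous sketch). The calls are standard MongoDB.Bson API, but the particular options chosen here are just an example:

```csharp
using MongoDB.Bson.Serialization;

public static class MongoMappings
{
    // Call once at startup, before the first query, so the driver never has
    // to build the class map lazily via reflection on the request path.
    public static void Register()
    {
        if (!BsonClassMap.IsClassMapRegistered(typeof(Product)))
        {
            BsonClassMap.RegisterClassMap<Product>(cm =>
            {
                cm.AutoMap();                       // map public members by convention
                cm.SetIgnoreExtraElements(true);    // tolerate extra fields added later
            });
        }
    }
}
```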
Here is what I am using:
Read only the fields you need.
Cache objects in memory that are needed often but rarely change.
When I need to read many objects by some rule (e.g. products matching filter criteria), I store all of the products in a single filter object and read them all at once. The drawback is that this cache has to be recalculated when something changes (see the sketch below).
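A minimal sketch of that kind of in-memory caching, assuming System.Runtime.Caching.MemoryCache, a generic item type, and a caller-supplied delegate that loads from the database (all of which are my own choices for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.Caching;

// Caches the full result of an expensive "read by filter" query in memory.
// T is whatever object type the filter returns (products, in the example above).
public class FilterCache<T>
{
    private static readonly MemoryCache Cache = MemoryCache.Default;
    private readonly Func<string, List<T>> _loadFromDatabase;

    public FilterCache(Func<string, List<T>> loadFromDatabase)
    {
        _loadFromDatabase = loadFromDatabase;
    }

    // Cache-aside: return the cached list for this filter key, or load and cache it.
    public List<T> GetByFilter(string filterKey)
    {
        if (Cache.Get(filterKey) is List<T> cached)
            return cached;

        List<T> items = _loadFromDatabase(filterKey);
        Cache.Set(filterKey, items, DateTimeOffset.Now.AddMinutes(30));
        return items;
    }

    // Call this whenever the underlying data changes so the next read recalculates.
    public void Invalidate(string filterKey)
    {
        Cache.Remove(filterKey);
    }
}
```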
I'm on a project that processes and reports on large sets of aggregatable row-based data. There is a primary aggregation service, and many clients can subscribe to different views of the data from that server. The objects are passed back and forth between the Java server and the C# clients encoded as JSON. We're noticing that parsing the objects takes a lot of time and is somewhat memory-intensive. Have others used JSON for this purpose or seen similar behavior?
We used to use straight XML across the wire and had to use custom (i.e. manual) serialization for a lot of the objects. While that wasn't JSON, we did take performance hits because of this constraint. Once we migrated all our tech to a similar architecture, we were able to switch to binary serialization, which worked much better.
However, on the objects where we had performance issues due to size, we made some modifications. Since we had access to the code on both ends (and both were C#), we were able to binary-serialize the payload and then Base64-encode it, since it had to be text across the wire. It helped a good bit in terms of payload size, and the serialization ran a bit faster.
Since you are going from Java to C#, you won't really have that luxury. So the only thing I can think of in your case is to try to optimize your parsing of the JSON response. You may be able to use a code profiler to identify the portions that are causing the performance problems and then optimize those. Also, when building JSON strings, make sure you use a StringBuilder rather than standard string concatenation, which will kill performance as well.
Also, you might want to look around: I have seen several JSON serializers written for C# on the web, and some may be faster than what you are currently using, who knows.
Not sure if that helps you all that much, but that is some of what we have seen with string-based message passing.
UPDATE: Just saw this on DotNetKicks: JSON.NET. It's an update from James to the JSON.NET serializers; it may help.
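One concrete thing that sometimes helps with large responses is deserializing straight from the response stream with Json.NET instead of building the full string first. This is only a sketch; the Row type is a placeholder for whatever the aggregated row objects actually look like:

```csharp
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

public static class JsonStreaming
{
    // Deserialize directly from the stream instead of first building one large
    // string, which avoids keeping a second full copy of the payload in memory.
    public static List<Row> ReadRows(Stream responseStream)
    {
        var serializer = new JsonSerializer();
        using (var streamReader = new StreamReader(responseStream))
        using (var jsonReader = new JsonTextReader(streamReader))
        {
            return serializer.Deserialize<List<Row>>(jsonReader);
        }
    }
}

// Placeholder for the row objects returned by the aggregation service.
public class Row
{
    public string Key { get; set; }
    public double Value { get; set; }
}
```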
I know that for Java there are any number of open-source JSON serializers and deserializers. We use FlexJSON.
JSON can be expensive to decode. If performance is an issue, try something like Hessian.
Which serialization should I use?
I need to store a large Dictionary with 100,000+ elements, and I just need to save and load this data directly, without caring whether it's binary or how it's formatted.
Right now I am using the BinarySerializer, but I'm not sure whether it's the most efficient option.
Please suggest better alternatives in the .NET standard libraries or an external library, preferably free.
EDIT: This is for serializing to and from disk. The app is single-threaded, too.
Well, it will depend on what's in the dictionary - but if Protocol Buffers is flexible enough for you (you have to define your own types to serialize - it doesn't do all .NET types or anything like that), it's pretty darned fast.
For example, in protocol buffers I'd represent the dictionary as a message with a repeated key/value pair field. For ultimate speed you could use the CodedOutputStream and CodedInputStream to serialize/deserialize the dictionary directly rather than reading it all into memory separately first. Again, it'll depend on what the key/value types are though.
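For what it's worth, protobuf-net will handle a dictionary along those lines for you. Here is a rough sketch assuming string keys and values; the DictionaryStore name and the file-based API are just for illustration:

```csharp
using System.Collections.Generic;
using System.IO;
using ProtoBuf;

public static class DictionaryStore
{
    // Write the whole dictionary to disk as repeated key/value pairs.
    public static void Save(string path, Dictionary<string, string> data)
    {
        using (var file = File.Create(path))
        {
            Serializer.Serialize(file, data);
        }
    }

    // Read it back; protobuf-net treats the dictionary as a repeated
    // key/value message, much like the hand-built representation described above.
    public static Dictionary<string, string> Load(string path)
    {
        using (var file = File.OpenRead(path))
        {
            return Serializer.Deserialize<Dictionary<string, string>>(file);
        }
    }
}
```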
This is entirely a guess, since I haven't profiled it (which is what you should do to truly get your answer).
But my guess is that the binary serializer would give you the best performance, both in size and in speed.
This is a bit of an open-ended question. Are you storing this in memory or writing it to disk? Does this execute in a multi-threaded (and perhaps multi-concurrent-access) environment? Context is important.
BinarySerializer is generally going to be pretty fast, and there are external libraries, such as Protocol Buffers, that provide better compression. I've personally had good success with DataContractSerializer.
The great thing about all these options is that you can try each of them relatively painlessly to learn for yourself what works in your environment and workload.
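As one concrete option, here is a sketch using DataContractSerializer with the binary XML writer, again assuming a Dictionary<string, string>; the class name is a placeholder and the settings are not tuned:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization;
using System.Xml;

public static class ContractDictionaryStore
{
    private static readonly DataContractSerializer Serializer =
        new DataContractSerializer(typeof(Dictionary<string, string>));

    // The binary XML writer keeps files smaller and faster than plain XML text.
    public static void Save(string path, Dictionary<string, string> data)
    {
        using (var file = File.Create(path))
        using (var writer = XmlDictionaryWriter.CreateBinaryWriter(file))
        {
            Serializer.WriteObject(writer, data);
        }
    }

    public static Dictionary<string, string> Load(string path)
    {
        using (var file = File.OpenRead(path))
        using (var reader = XmlDictionaryReader.CreateBinaryReader(file, XmlDictionaryReaderQuotas.Max))
        {
            return (Dictionary<string, string>)Serializer.ReadObject(reader);
        }
    }
}
```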
Possible Duplicate:
How costly is .NET reflection?
The "elegant" solution to a problem I am having is to use attributes to associate a class and its properties with another's. The problem is, to convert it to the other, I'd have to use reflection. I am considering it for a server-side app that will be hosted on the cloud.
I've heard many rumblings of "reflection is slow, don't use it," but how slow is slow? Is it so CPU-intensive that it will multiply my CPU time to the point that I'll literally be paying for my decision to use reflection at the bottom of my architecture in the cloud?
Just in case you don't see the update on the original question: when you are reflecting to find all the types that support a certain attribute, you have a perfect opportunity to use caching. That means you don't have to use reflection more than once at runtime.
To answer the general question, reflection is slower than raw compiled method calls, but it's much, much faster than accessing a database or the file system, and practically all web servers do those things all the time.
It's many times faster than filesystem access.
It's many many times faster than database access across the network.
It's many many many times faster than sending an HTTP response to the browser.
Probably you won't even notice it. Always profile first before thinking about optimizations.
I've wondered the same thing, but it turns out that reflection isn't all that bad. I can't find the resources (I'll try to list them when I find them), but I think I remember reading that it was maybe 2x to 3x slower. 50% or 33% of fast is still fast.
Also, under the hood, ASP.NET Web Forms and MVC do a bunch of reflection, so how slow can it really be?
EDIT
Here is one resource I remember reading: .Net Reflection and Performance
Hmm, I try to avoid reflection if I can, but if reflection gives me an elegant way to solve the problem at hand, I'll happily use it.
But it must be said that I think reflection should not be used for 'dirty tricks'. At this very moment, I'm working on a solution where I use custom attributes to decorate some classes, and yes, I'll have to use reflection in order to know whether a class/property/whatever has been decorated with my custom attribute.
I also think it is a matter of how often you make reflection calls.
If I can, I try to cache my results.
For instance, in the solution I'm working on: at application startup, I inspect certain types in a certain assembly to see whether they have been decorated with my attribute, and I keep the results in a dictionary (a sketch of that approach follows).
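A rough sketch of that startup scan, with a hypothetical MappedTo attribute standing in for whatever the real custom attribute is:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical marker attribute, purely for illustration.
[AttributeUsage(AttributeTargets.Class)]
public sealed class MappedToAttribute : Attribute
{
    public Type Target { get; }
    public MappedToAttribute(Type target) => Target = target;
}

public static class AttributeCache
{
    // Built once at startup; lookups afterwards are plain dictionary reads,
    // so reflection is paid for only one time.
    private static readonly Dictionary<Type, MappedToAttribute> Map =
        BuildMap(Assembly.GetExecutingAssembly());

    private static Dictionary<Type, MappedToAttribute> BuildMap(Assembly assembly)
    {
        return assembly.GetTypes()
                       .Select(t => new { Type = t, Attr = t.GetCustomAttribute<MappedToAttribute>() })
                       .Where(x => x.Attr != null)
                       .ToDictionary(x => x.Type, x => x.Attr);
    }

    public static bool TryGetMapping(Type type, out MappedToAttribute attribute) =>
        Map.TryGetValue(type, out attribute);
}
```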