Is Protocol Buffers for .NET going to be more lightweight/faster than Remoting (SerializationFormat.Binary)? Will there be first-class support for it in language/framework terms, i.e. is it handled transparently, as with Remoting/WebServices?
I very much doubt that it will ever have direct language support or even framework support - it's the kind of thing which is handled perfectly well with 3rd party libraries.
My own port of the Java code is explicit - you have to call methods to serialize/deserialize. (There are RPC stubs which will automatically serialize/deserialize, but no RPC implementation yet.)
Marc Gravell's project fits in very nicely with WCF though - as far as I'm aware, you just need to tell it (once) to use protocol buffers for serialization, and the rest is transparent.
In terms of speed, you should look at Marc Gravell's benchmark page. My code tends to be slightly faster than his, but both are much, much faster than the other serialization/deserialization options in the framework. It should be pointed out that protocol buffers are much more limited as well - they don't try to serialize arbitrary types, only the supported ones. We're going to try to support more of the common data types (decimal, DateTime etc) in a portable way (as their own protocol buffer messages) in future.
Some performance and size metrics are on this page. I haven't got Jon's stats on there at the moment, just because the page is a little old (Jon: we must fix that!).
Re being transparent: protobuf-net can hook into WCF via the contract; note that it plays nicely with MTOM over basic-http too. This doesn't work with Silverlight, though, since Silverlight lacks the injection point. If you use svcutil, you also need to add an attribute to the class (via a partial class).
Re BinaryFormatter (remoting): yes, this has full support; you can do this via a trivial ISerializable implementation (i.e. one that just calls the Serializer method with the same args). If you use protogen to create your classes, then it can do it for you: you can enable this at the command line via arguments (it isn't enabled by default, as BinaryFormatter doesn't work on all frameworks [CF, etc]).
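For reference, the hookup looks something like this (a minimal sketch with a hypothetical Order type; protogen can emit the equivalent for you):

using System;
using System.Runtime.Serialization;
using ProtoBuf;

[Serializable, ProtoContract]
public sealed class Order : ISerializable
{
    [ProtoMember(1)] public int Id { get; set; }
    [ProtoMember(2)] public string Customer { get; set; }

    public Order() { }

    // BinaryFormatter routes deserialization through this constructor;
    // we hand the SerializationInfo straight to protobuf-net
    private Order(SerializationInfo info, StreamingContext context)
    {
        Serializer.Merge(info, this);
    }

    // ...and serialization through this method
    void ISerializable.GetObjectData(SerializationInfo info, StreamingContext context)
    {
        Serializer.Serialize(info, this);
    }
}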
Note that for very small objects (single instances, etc) on local remoting (IPC), the raw BinaryFormatter performance is actually better - but for non-trivial graphs or remote links (network remoting) protobuf-net can out-perform it pretty well.
I should also note that the protocol buffers wire format doesn't directly support inheritance; protobuf-net can spoof this (while retaining wire-compatibility), but like with XmlSerializer, you need to declare the sub-classes up-front.
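Declaring the sub-classes looks like this (a sketch with hypothetical types; the [ProtoInclude] field numbers just need to be unique within the base type):

using ProtoBuf;

[ProtoContract]
[ProtoInclude(10, typeof(Customer))]
[ProtoInclude(11, typeof(Supplier))]
public class Party
{
    [ProtoMember(1)] public string Name { get; set; }
}

[ProtoContract]
public class Customer : Party
{
    // member numbers are per-type (each sub-type is encapsulated on the
    // wire), so 1 here doesn't clash with Party's 1
    [ProtoMember(1)] public int CreditLimit { get; set; }
}

[ProtoContract]
public class Supplier : Party
{
    [ProtoMember(1)] public string Region { get; set; }
}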
Why are there two versions?
The joys of open source, I guess ;-p Jon and I have worked on joint projects before, and have discussed merging these two, but the fact is that they target two different scenarios:
dotnet-protobufs (Jon's) is a port of the existing java version. This means it has a very familiar API for anybody already using the java version, and it is built on typical java constructs (builder classes, immutable data classes, etc) - with a few C# twists.
protobuf-net (Marc's) is a ground-up re-implementation following the same binary format (indeed, a critical requirement is that you can interchange data between different formats), but using typical .NET idioms:
mutable data classes (no builders)
the serialization member specifics are expressed in attributes (comparable to XmlSerializer, DataContractSerializer, etc)
If you are working on java and .NET clients, Jon's is probably a good choice for the familiar API on both sides. If you are pure .NET, protobuf-net has advantages - the familiar .NET style API, but also:
you aren't forced to be contract-first (although you can, and a code-generator is supplied)
you can re-use your existing objects (in fact, [DataContract] and [XmlType] classes can often be used without any changes at all; see the sketch just after this list)
it has full support for inheritance (which it achieves on the wire by spoofing encapsulation) (possibly unique for a protocol buffers implementation? note that sub-classes have to be declared in advance)
it goes out of its way to plug into and exploit core .NET tools (BinaryFormatter, XmlSerializer, WCF, DataContractSerializer) - allowing it to work directly as a remoting engine. This would presumably be quite a big split from the main java trunk for Jon's port.
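To illustrate the re-use point above: an existing [DataContract] type like the following can (as far as I know) be fed to protobuf-net unchanged, because the Order values double as field numbers (the Account type here is made up):

using System.Runtime.Serialization;

[DataContract]
public class Account
{
    // protobuf-net uses the Order values as protocol buffer field numbers,
    // so no protobuf-specific attributes are needed
    [DataMember(Order = 1)] public int Id { get; set; }
    [DataMember(Order = 2)] public string Owner { get; set; }
}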
Re merging them; I think we'd both be open to it, but it seems unlikely you'd want both feature sets, since they target such different requirements.
Related
Microsoft warns against using BinaryFormatter (they write that there is no way to make the de-serialization safe).
Applications should stop using BinaryFormatter as soon as possible, even if they believe the data they're processing to be trustworthy.
I don't want to use XML or Json-based solutions (which are what they refer to). I am concerned about file size and preserving the object graph.
If I were to write my own methods to traverse my object graph and convert the objects to binary, could that be made safe, or is there something specific about converting from binary that makes it inherently more dangerous than text?
Are there binary (non-XML and non-JSON) alternatives to BinaryFormatter?
This question feels like it leads to answers that will be more opinion-based.
I'm sure there are a lot of libraries out there, but perhaps the best known alternative is Protocol Buffers (protobuf). It's a Google library, so it gets plenty of development and attention. However, not everyone agrees that using protobuf for generic binary serialization is the best thing to do.
Follow this discussion about BinaryFormatter on the dotnet GitHub repository if you want more info; it discusses the general problem with BinaryFormatter, as well as using protobuf as an alternative.
Can I create my own secure binary serialization system?
Yes. That said, the real question should be: 'is it worth my time to do so?'
See this link for the wind-down plan for BinaryFormatter:
https://github.com/dotnet/designs/pull/141/commits/bd0a0661f9d248ed31a354d27ad026efd6719690
At the very bottom you will find:
Why not make BinaryFormatter safe for untrusted payloads?
The BinaryFormatter protocol works by specifying the values of an object's raw instance fields. In other words, the entire point of BinaryFormatter is to bypass an object's typical constructor and to use private reflection to set the instance fields to the contents that came in over the wire. Bypassing the constructor in this fashion means that the object cannot perform any validation or otherwise guarantee that its internal invariants are satisfied. One consequence of this is that BinaryFormatter is unsafe even for seemingly innocuous types such as Exception or List<T> or Dictionary<TKey, TValue>, regardless of the actual types of T, TKey, or TValue. Restricting deserialization to a list of allowed types will not resolve this issue.
The security issue isn't with binary serialization as a concept; the issue is with how BinaryFormatter was implemented.
You could design a secure binary deserialization system, if you wanted. If you have very few messages being sent, and you can tightly control which types are deserialized, perhaps it's not too much effort to make a secure system.
However, for a system flexible enough to handle many different use cases (e.g. many different types that can be deserialized), you may find that it takes a lot of effort to build in enough safety checks.
FWIW, you likely will never reach the performance levels of BinaryFormatter with a secure system that offers the same widespread utility (use cases), since BinaryFormatter's speed comes (in part) from having very few safety features. You might approach such performance levels with a targeted, small system with a narrow set of use cases.
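To make the earlier point about "tightly controlling which types are deserialized" concrete, here is a toy sketch of the allow-list approach (all names and type tokens are hypothetical; a real system would also want length caps, versioning, and so on):

using System;
using System.Collections.Generic;
using System.IO;

public static class SafeReader
{
    // every shape the wire may carry is registered explicitly;
    // anything outside this table is rejected before any object is built
    private static readonly Dictionary<byte, Func<BinaryReader, object>> AllowList =
        new Dictionary<byte, Func<BinaryReader, object>>
        {
            { 1, r => r.ReadInt32() },
            { 2, r => r.ReadString() },
        };

    public static object Read(Stream stream)
    {
        using (var reader = new BinaryReader(stream))
        {
            byte token = reader.ReadByte();
            if (!AllowList.TryGetValue(token, out var factory))
                throw new InvalidDataException("Type token not in allow-list: " + token);
            return factory(reader);
        }
    }
}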
Server side - C# or Java
Client side - Objective-C
I need a way to serialize an object in C#/Java and deserialize it in Objective-C.
I'm new to Objective C and I was wondering where I can get information about this issue.
Thanks.
Apart from the obvious JSON/XML solutions, protobuf may also be interesting. There are Java/C++/Python backends for it, and third parties have created backends for C# and Objective-C (never used that one, though) as well.
The main advantages are it being much, much faster to parse[1], much smaller[2] since it's a binary format and the fact that versioning was an important factor from the beginning.
[1] Google claims 20-100 times, compared to XML
[2] 3-10 times, according to the same source
Another technology similar to protobufs is Apache Thrift.
Apache Thrift is a software framework for scalable cross-language services development. Apache Thrift allows you to define data types and service interfaces in a simple definition file. Taking that file as input, the compiler generates code to be used to easily build RPC clients and servers that communicate seamlessly across programming languages.
JSON for relatively straightforward object graphs
XML/REST for more complex object graphs (distinction between Arrays / Collections / nested arrays etc)
Sudzc. I am using it. It is pretty easy to invoke a web service from an iOS app.
You don't have to write code to serialize objects.
JSON is probably the best choice, because:
It is simple to use
It is human-readable
It is data-based rather than being tied to any more complex object model
You will be able to find decent libraries for import/export in most languages.
Serialisation of more complex objects is IMHO not a good idea from the perspective of portability, since one language/platform often has no effective way of expressing a concept from another language/platform - e.g. as soon as you start declaring "types" or "classes" of serialised objects, you run into the thorny issue of differing object models between languages.
On iOS there are couple of JSON frameworks and libraries with an Objective-C API:
JSONKit
SBJson
TouchJson
are probably the most prominent.
JSONKit is fast and simple, but can only parse a contiguous portion of JSON text. This means you need to save downloaded data into a temporary file, or save all downloaded JSON text into an NSMutableData object (kept in memory). Only after the JSON text has been downloaded completely can you start parsing.
SBJson is more flexible to use. It provides an additional "SAX style" interface, can parse partial input, and can parse more than one JSON document per "input" (for example, several JSON documents per network connection). This is very handy when you want to connect to a "streaming API" (e.g. the Twitter Streaming API), where many JSON documents can arrive per connection. The drawback is that it is much slower than JSONKit.
TouchJson is even somewhat slower than SBJson.
My personal preference is yet another library, though. It is faster than JSONKit (20% faster on ARM), has an additional SAX-style API, can handle "streaming APIs", can simultaneously download and parse, and can handle very large JSON strings without severely impacting memory footprint, while being especially easy to use with NSURLConnection. (Well, I'm probably biased, since I'm the author.)
You can take a look at JPJson (Apache License v2) - it's still in beta, though.
I am about to embark on a project to connect two programs, one in c#, and one in c++. I already have a working c# program, which is able to talk to other versions of itself. Before I start with the c++ version, I've thought of some issues:
1) I'm using protobuf-net v1. I take it the .proto files from the serializer are exactly what are required as templates for the C++ version? A Google search mentioned something about Pascal casing, but I have no idea if that's important.
2) What do I do if one of the .NET types does not have a direct counterpart in C++? What if I have a decimal or a Dictionary? Do I have to modify the .proto files somehow and squish the data into a different shape? (I shall examine the files and see if I can figure it out.)
3) Are there any other gotchas that people can think of? Binary formats and things like that?
EDIT
I've had a look at one of the proto files now. It seems .NET-specific stuff is tagged, e.g. bcl.DateTime or bcl.Decimal. Subtypes are included in the proto definitions. I'm not sure what to do about the bcl types, though. If my C++ program sees a decimal, what will it do?
Yes, the proto files should be compatible. The casing is about conventions, which shouldn't affect actual functionality - just the generated code etc.
It's not whether there's a directly comparable type in .NET that matters - it's whether protocol buffers support the type. Protocol buffers are mostly pretty primitive - if you want to build up anything bigger, you'll need to create your own messages.
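For example, a .NET decimal has no native protocol buffer type, so it ends up as its own message; protobuf-net's bcl.proto takes roughly this shape (the field layout here is illustrative - check the real bcl.proto for the authoritative definition):

using ProtoBuf;

[ProtoContract]
public class ProtoDecimal
{
    [ProtoMember(1)] public ulong Lo { get; set; }        // low 64 bits of the 96-bit integer
    [ProtoMember(2)] public uint Hi { get; set; }         // high 32 bits
    [ProtoMember(3)] public uint SignScale { get; set; }  // sign flag and decimal scale packed together
}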
The point of protocol buffers is to make it all binary compatible on the wire, so there really shouldn't be gotchas... read the documentation to find out about versioning policies etc. The only thing I can think of is that in the Java version at least, it's a good idea to make enum fields optional, and give the enum type itself a zero value of "unknown" which will be used if you try to deserialize a new value which isn't supported in deserializing code yet.
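In C# terms, that enum advice looks something like this (a sketch; the names are made up):

using ProtoBuf;

public enum OrderStatus
{
    Unknown = 0,  // safe landing spot for values this build doesn't recognise yet
    Pending = 1,
    Shipped = 2
}

[ProtoContract]
public class OrderHeader
{
    [ProtoMember(1)] public OrderStatus Status { get; set; }
}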
Some minor additions to Jon's points:
protobuf-net v1 does have a GetProto function which may help with a starting point; however, for interop purposes I would recommend starting from a .proto; protobuf-net can work this way around too, either via "protogen" or via the VS add-in
other than that, you shouldn't have many issues as long as you remember to treat all files as binary; opening files in text mode will cause grief
I have a library (written in C#) for which I need to read/write representations of my objects to disk (or to any Stream) in a particular binary format (to ensure compatibility with C/Java library implementations). The format requires a fair amount of bit-packing and some DEFLATE'd bytestreams. I would like my library to be as idiomatic .NET as possible, however, and so would like to provide an API as close as possible to the normal binary serialization process. I'm aware of the ability to implement the IFormatter interface, but since I really am unable to reuse any part of the built-in serialization stack, is it worth doing this, or will it just bring unnecessary overhead? In other words:
Implement IFormatter and co.
OR
Just provide "Serialize"/"Deserialize" methods that act on a Stream?
A good point was brought up below about needing the serialization semantics for any case involving Remoting. In cases where using MarshalByRef objects is feasible, I'm pretty sure this won't be an issue, so leaving that aside: are there any benefits or drawbacks to using ISerializable/IFormatter versus a custom stack (or am I misunderstanding remoting)?
I have always gone with the latter. There isn't much use in reusing the serialization framework if all you're doing is writing a file in a specific format. The only place I've run into any issues with a custom serialization framework is remoting: you have to make your objects serializable.
This may not help you since you have to write to a specific format, but protobuf and sqlite are good tools for doing custom serialization.
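Going back to the latter option: the custom API can be as small as a pair of static methods over a Stream. A minimal sketch (MyRecord and its field layout are hypothetical):

using System.IO;
using System.Text;

public class MyRecord
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class MyRecordFormat
{
    public static void Serialize(Stream output, MyRecord value)
    {
        // leaveOpen: true so the caller keeps control of the stream's lifetime
        using (var writer = new BinaryWriter(output, Encoding.UTF8, leaveOpen: true))
        {
            writer.Write(value.Id);
            writer.Write(value.Name ?? string.Empty);
        }
    }

    public static MyRecord Deserialize(Stream input)
    {
        using (var reader = new BinaryReader(input, Encoding.UTF8, leaveOpen: true))
        {
            return new MyRecord { Id = reader.ReadInt32(), Name = reader.ReadString() };
        }
    }
}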
I'd do the former. There's not much to the interface, so if you're mimicking the structure anyway, adding ": IFormatter" and the other code necessary to get full compatibility won't take much.
Writing your own serialization code is error prone and time consuming.
As a thought - have you considered existing open-source portable formats, for example "protocol buffers"? This is a high density binary serialization format that underpins much of Google's data transfer etc. Versions are available in a wide range of languages - including Java/C++ etc (in the core Google distribution), and a vast range of others.
In particular, for .NET-idiomatic usage, protobuf-net looks a lot like XmlSerializer or DataContractSerializer (indeed, it can even work purely with XML/WCF attributes, as long as they include an order on each element) - or can use the specific protobuf-net attributes:
[ProtoContract]
class Person {
[ProtoMember(1)]
public string Name {get;set;}
}
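Usage then looks something like this (sketch):

using System.IO;
using ProtoBuf;

class Demo
{
    static void Main()
    {
        var person = new Person { Name = "Fred" };
        using (var file = File.Create("person.bin"))
        {
            Serializer.Serialize(file, person);   // writes the binary wire format
        }
        Person clone;
        using (var file = File.OpenRead("person.bin"))
        {
            clone = Serializer.Deserialize<Person>(file);
        }
    }
}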
If you want to guarantee portability to other implementations, the recommendation is to start "contract first", with a ".proto" file - in this case, something like:
message person {
required string name = 1;
}
This .proto file can then be used to generate any language-specific variant; so with protobuf-net you'd run it through "protogen" (included in protobuf-net; a VS2008 add-in is in progress); or for Java/C++ etc you'd run it through "protoc" (included in Google's protobuf). "protogen" in protobuf-net can currently emit C# and VB, but it is pretty easy to add another language if you want to use F# etc - it just involves writing (or migrating) an XSLT.
There is also another .NET version that is a more direct port of the Java version; as such it is less .NET idiomatic. This is dotnet-protobufs.
Is there an easy way to serialize data in C++ (either to XML or binary), and then deserialize the data in C#?
I'm working with some remote Windows NT machines that won't run .NET. My server app is written entirely in C#, so I want an easy way to share simple data (key-value pairs mostly, and maybe some representation of a SQL result set). I figure the best way is going to be to write the data to XML in some predefined format on the client, transfer the XML file to my server, and have a C# wrapper read the XML into a usable C# object.
The client and server are communicating over a TCP connection, and what I really want is to serialize the data in memory on the client, transfer the binary data over the socket to a C# memory stream that I can deserialize into a C# object (eliminating file creation, transfer, etc), but I don't think anything like that exists. Feel free to enlighten me.
Edit
I know I can create a struct in the C++ app and define it in C#, and transfer data that way, but in my head that feels like I'm limiting what can be sent. I'd have to set predefined sizes for objects, etc.
Protocol Buffers might be useful to you.
Protocol buffers are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages – Java, C++, or Python.
.NET ports are available from Marc Gravell and Jon Skeet.
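To sketch the "binary over the socket, no temp files" flow from the question using Marc's port (KeyValue is a hypothetical message type; the same .proto would generate its C++ counterpart):

using System.IO;
using ProtoBuf;

[ProtoContract]
public class KeyValue
{
    [ProtoMember(1)] public string Key { get; set; }
    [ProtoMember(2)] public string Value { get; set; }
}

public static class Wire
{
    // length-prefixed so the reader knows where each message ends,
    // letting many messages share one connection
    public static void Send(Stream stream, KeyValue message)
    {
        Serializer.SerializeWithLengthPrefix(stream, message, PrefixStyle.Base128);
    }

    public static KeyValue Receive(Stream stream)
    {
        return Serializer.DeserializeWithLengthPrefix<KeyValue>(stream, PrefixStyle.Base128);
    }
}

The Stream here can be a NetworkStream straight off the socket, or a MemoryStream if you want to buffer first.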
I checked out all the mentioned projects, like protocol buffers, JSON, XML, etc., but after I found BSON I now use it, for the following reasons:
Easy to use API
Available in many languages (C, C++, Haskell, Go, Erlang, Perl, PHP, Python, Ruby, C#, ...)
Binary, therefore very space-efficient and fast (fewer bytes -> less time)
Consistent across platforms (no problems with endianness, etc)
Hierarchical. The data model is comparable to JSON (as the name suggests), so most data-modelling tasks should be solvable
No precompiler necessary
Widely used (MongoDB, many languages)
C++ doesn't have structural introspection (you can't find out the fields of a class at runtime), so there aren't general mechanisms to write a C++ object. You either have to adopt a convention and use code generation, or (more typically) write the serialisation yourself.
There are some libraries for standard formats such as ASN.1, HDF5, and so on which are implementation-language neutral. There are proprietary libraries which serve the same purpose (e.g. protocol buffers).
If you're targeting a particular architecture and compiler, then you can also just dump the C++ object as raw bytes, and create a parser on the C# side.
Which is better depends on how tightly coupled you want your endpoints to be, and whether the data is mainly numerical (HDF5), tree and sequence structures (ASN.1), or simple plain data objects (directly writing the values in memory).
Other options would be:
creating a binary file that contains the data in the way you need it (not an easy & portable solution)
XML
YAML
plain text files
There are a lot of options you can choose from. Named pipes, shared memory, DDE, remoting... Depends on your particular need.
Quick googling gave the following:
Named pipes
Named Shared Memory
DDE
As mentioned already, Protocol Buffers are a good option.
If that option doesn't suit your needs, then I would look at sending the XML over to the client (you would have to prefix the message with the length so you know how much to read) and then using an implementation of IXmlSerializable, or the DataContract/DataMember attributes in conjunction with the DataContractSerializer, to get your representation in .NET.
I would recommend against using the marshaling attributes, as they aren't supported on things like List<T> and a number of other standard .NET classes which you would use normally.
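A rough sketch of that length-prefixed DataContractSerializer approach (the 4-byte length header written with BitConverter is my own framing choice, not a standard):

using System;
using System.IO;
using System.Runtime.Serialization;

public static class XmlWire
{
    public static void Write<T>(Stream stream, T value)
    {
        var serializer = new DataContractSerializer(typeof(T));
        using (var buffer = new MemoryStream())
        {
            serializer.WriteObject(buffer, value);
            byte[] payload = buffer.ToArray();
            stream.Write(BitConverter.GetBytes(payload.Length), 0, 4);  // length header
            stream.Write(payload, 0, payload.Length);
        }
    }

    public static T Read<T>(Stream stream)
    {
        var header = new byte[4];
        ReadExactly(stream, header);
        var payload = new byte[BitConverter.ToInt32(header, 0)];
        ReadExactly(stream, payload);
        var serializer = new DataContractSerializer(typeof(T));
        using (var buffer = new MemoryStream(payload))
        {
            return (T)serializer.ReadObject(buffer);
        }
    }

    // Stream.Read may return fewer bytes than requested, so loop until full
    private static void ReadExactly(Stream stream, byte[] target)
    {
        int offset = 0;
        while (offset < target.Length)
        {
            int read = stream.Read(target, offset, target.Length - offset);
            if (read <= 0) throw new EndOfStreamException();
            offset += read;
        }
    }
}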