I am updating my protobuf-net library reference, specifically from 2.4.4 to 3.0.101. Previously, we used nulls in lists because they carry meaningful information for the business (e.g., new[] { "one", "two", null, null, "five" }). However, as far as I understand, nulls in collections are not yet supported in 3.x (https://protobuf-net.github.io/protobuf-net/releasenotes#).
Is there a suggested migration strategy for collections with nulls?
I can mitigate the change going forward with additional fields (e.g., transposing the collection to a dictionary and back again on serialization/deserialization, roughly as sketched below); however, backwards compatibility seems broken for data serialized with the 2.x libraries. Are there any migration guides?
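To make that idea concrete, here is roughly what I have in mind - a minimal sketch only, with invented names, using protobuf-net's serialization callback attributes to pack the list into an index-keyed dictionary of the non-null entries:

using System.Collections.Generic;
using ProtoBuf;

[ProtoContract]
public class Record
{
    // The real list, in which null entries are meaningful; not serialized directly.
    public List<string> Values { get; set; } = new List<string>();

    // Surrogate storage: non-null entries keyed by index, plus the original length.
    [ProtoMember(1)]
    private Dictionary<int, string> valuesByIndex = new Dictionary<int, string>();

    [ProtoMember(2)]
    private int valuesLength;

    [ProtoBeforeSerialization]
    private void Pack()
    {
        valuesLength = Values.Count;
        valuesByIndex.Clear();
        for (int i = 0; i < Values.Count; i++)
        {
            if (Values[i] != null) valuesByIndex[i] = Values[i];
        }
    }

    [ProtoAfterDeserialization]
    private void Unpack()
    {
        Values = new List<string>(new string[valuesLength]);   // start as all nulls
        foreach (var pair in valuesByIndex) Values[pair.Key] = pair.Value;
    }
}

This handles new data, but it changes the wire layout, which is exactly where the backwards-compatibility question comes in.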
Given that 3.x doesn't yet support null retention, your options are somewhat limited:
Submit a PR to add the missing feature. A quick glance through the protobuf-net source code makes me think it would be fairly trivial to implement. On line 161 of the linked source file there seems to be a throw for nulls, which is where I'd start. I could be very wrong about how complicated this would be, though.
See if you can use both libraries concurrently. You would need to know (or detect) whether the data you are deserializing is in v2 or v3 format (I have not checked, but I would be surprised if there wasn't a way to detect this by looking at the first few bytes). You may need to compile a custom version to give it a different namespace, in order for the two to co-exist.
Migrate data to v3. You can do this as a one-off operation (smaller amounts of data that you control) or on demand (large amounts of data, or externally received data). You'll need to redesign the types used in lists with nulls so that you no longer have nulls, e.g. by having a custom value that logically represents null (see the sketch after this list).
Stay with v2. It's stable and works exceptionally well, so unless you have a specific need to upgrade it may not be worth the effort.
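For the migration option above (replacing null with a custom value that logically represents it), here is a minimal sketch of one way to make "no value" explicit; the wrapper type and names are invented for illustration, not something protobuf-net provides:

using System.Collections.Generic;
using ProtoBuf;

// A wrapper that turns "no value" into an explicit, serializable state,
// so the list itself never contains null references.
[ProtoContract]
public class OptionalString
{
    [ProtoMember(1)]
    public bool HasValue { get; set; }

    [ProtoMember(2)]
    public string Value { get; set; }

    public static OptionalString Null() => new OptionalString { HasValue = false };
    public static OptionalString Of(string value) => new OptionalString { HasValue = true, Value = value };
}

[ProtoContract]
public class Payload
{
    [ProtoMember(1)]
    public List<OptionalString> Values { get; set; } = new List<OptionalString>();
}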
I'm currently a researcher at an AI company.
I need a serialization solution to store objects that are structurally very similar, but use vastly different types, interfaces/base classes, and internal generic lists and arrays.
I'm working in C#; due to the unique requirements of my work, porting to Java, for example, isn't an option.
Suffice it to say XML doesn't quite cut it - some NuGet-packaged upgrades of the Microsoft default appear to be a bit too static, or their patterns seem 'clumsy'.
My next line of research led to JSON.NET.
However, I'm unsure if this is the best option - especially considering the complexity of the classes to be saved - and the potential for a REST-based distribution architecture soon.
Thanks for your time and suggestions. Links to examples of your recommendations handling similarly complex class structures would be appreciated.
You should check out serialization and deserialization of dynamic objects with JSON.NET; your JSON can be as complex as you need. This post should give you some idea:
https://thewayofcode.wordpress.com/2012/09/18/c-dynamic-object-and-json-serialization-with-json-net/
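As a quick illustration of the idea (the object shape here is invented for the example, not taken from the linked post):

using System;
using Newtonsoft.Json;

class Program
{
    static void Main()
    {
        // Serialize an arbitrary object graph...
        var data = new { Name = "experiment-42", Layers = new[] { 3, 5, 8 }, Meta = new { Seed = 1234 } };
        string json = JsonConvert.SerializeObject(data);

        // ...and read it back as a dynamic object - no concrete class required.
        dynamic restored = JsonConvert.DeserializeObject<dynamic>(json);
        Console.WriteLine((string)restored.Name);     // "experiment-42"
        Console.WriteLine((int)restored.Layers[1]);   // 5
    }
}

For polymorphic graphs (interfaces/base classes), JSON.NET's TypeNameHandling setting is the usual starting point.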
Can anybody summarize the differences and usage scope between ShouldBeEquivalentTo(), Should().BeEquivalentTo() and Should().Be()?
From the SO articles I have read:
ShouldBeEquivalentTo(): intended to be used for comparing complex object graphs rather than the primitive types that are part of the .NET framework.
Should().BeEquivalentTo(): uses the individual items' Equals() implementation to verify equivalence, and has been around since version 1. The newer ShouldBeEquivalentTo(), introduced in FA 2.0, does an in-depth structural comparison and also reports on any differences.
Should().Be(): I cannot find a description of this one.
In my humble understanding, ShouldBeEquivalentTo() and Should().BeEquivalentTo() work similarly, if Should().BeEquivalentTo() also does an in-depth comparison.
I agree this is confusing. Should().BeEquivalentTo() should actually be called Should().EqualInAnyOrder() or something like that. As you said, it uses the Equals implementation of the involved objects to see if all of the ones in the expected collection appear in the actual collection, regardless of order. I'll need to fix that for the next major version.
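To make the distinction concrete, a rough sketch (the Person class is invented; ShouldBeEquivalentTo() here refers to the FA 2.x-era API discussed above):

using FluentAssertions;

public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public class Examples
{
    public void Demo()
    {
        var expected = new Person { Name = "Ada", Age = 36 };
        var actual = new Person { Name = "Ada", Age = 36 };

        // Should().Be(): relies on Equals(); this would fail because Person
        // does not override it, so reference equality is used.
        // actual.Should().Be(expected);

        // ShouldBeEquivalentTo(): in-depth structural comparison, property by property.
        actual.ShouldBeEquivalentTo(expected);

        // Should().BeEquivalentTo() on collections: items matched via Equals(), order ignored.
        var numbers = new[] { 1, 2, 3 };
        numbers.Should().BeEquivalentTo(new[] { 3, 2, 1 });
    }
}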
I am about to embark on a project to connect two programs, one in C# and one in C++. I already have a working C# program, which is able to talk to other versions of itself. Before I start on the C++ version, I've thought of some issues:
1) I'm using protobuf-net v1. I take it the .proto files from the serializer are exactly what is required as templates for the C++ version? A Google search mentioned something about Pascal casing, but I have no idea if that's important.
2) What do I do if one of the .NET types does not have a direct counterpart in C++? What if I have a decimal or a Dictionary? Do I have to modify the .proto files somehow and squish the data into a different shape? (I shall examine the files and see if I can figure it out.)
3) Are there any other gotchas that people can think of? Binary formats and things like that?
EDIT
I've had a look at one of the proto files now. It seems .NET-specific stuff is tagged, e.g. bcl.DateTime or bcl.Decimal. Subtypes are included in the proto definitions. I'm not sure what to do about the bcl types, though. If my C++ program sees a decimal, what will it do?
Yes, the proto files should be compatible. The casing is about conventions, which shouldn't affect actual functionality - just the generated code etc.
It's not whether there's a directly comparable type in .NET that matters - it's whether protocol buffers support the type. Protocol buffers are mostly pretty primitive - if you want to build up anything bigger, you'll need to create your own messages.
The point of protocol buffers is to make it all binary compatible on the wire, so there really shouldn't be gotchas... read the documentation to find out about versioning policies etc. The only thing I can think of is that, in the Java version at least, it's a good idea to make enum fields optional and to give the enum type itself a zero value of "unknown", which will be used if you try to deserialize a new value that isn't supported by the deserializing code yet.
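As a rough C# illustration of that last point (the type names are invented; the key idea is reserving a zero "Unknown" member as the safe default):

using ProtoBuf;

public enum OrderStatus
{
    Unknown = 0,   // fallback / default when nothing meaningful is set
    Pending = 1,
    Shipped = 2
}

[ProtoContract]
public class Order
{
    [ProtoMember(1)]
    public OrderStatus Status { get; set; }   // an absent field leaves this at Unknown
}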
Some minor additions to Jon's points:
protobuf-net v1 does have a GetProto() method which may help as a starting point; however, for interop purposes I would recommend starting from a .proto. protobuf-net can work this way around too, either via "protogen" or via the VS add-in
other than that, you shouldn't have many issues as long as you remember to treat all files as binary; opening files in text mode will cause grief
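For example, the schema-export route looks roughly like this (a sketch; MyMessage stands in for whatever contract type you already have):

using ProtoBuf;

[ProtoContract]
class MyMessage
{
    [ProtoMember(1)]
    public string Text { get; set; }
}

class SchemaExport
{
    static void Main()
    {
        // Emit a .proto schema for an existing contract type; the output can
        // then be fed to protoc to generate the C++ side.
        string proto = Serializer.GetProto<MyMessage>();
        System.IO.File.WriteAllText("MyMessage.proto", proto);
    }
}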
Here's an interesting question that I don't know much about in terms of existing solutions or research, though I would imagine it relates to the field of compression.
Given two potentially large strings of text, where one represents a later version of the other, is it possible (well, I know it's possible - what I'm really asking is whether there are existing solutions) to compare those two strings and reduce them to a set of differences that could later be used to deterministically reconstruct the original strings?
In my case, I'm interested in storing the latest version of the string, but keeping "compressed" (diffed) historical backups that can be restored as needed, without actually having to store all of the duplicated information.
I don't know what to tag this, please help me out.
There are no built-in classes in the CLR that support diffing.
Related questions seem to have useful information (e.g. Creating Delta Diff Patches of large Binary Files in C#). You can also search for "delta encoding" to start with (e.g. http://en.wikipedia.org/wiki/Delta_encoding).
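As a toy illustration of the delta-encoding idea (not a production diff algorithm - real tools such as bsdiff, xdelta or VCDIFF are far more sophisticated): keep only the changed middle section of the new text plus how much of the old text's prefix and suffix is reused, then replay that against the old version to reconstruct the new one.

using System;

public record Delta(int PrefixLength, int SuffixLength, string Middle);

public static class TextDelta
{
    // Find the shared prefix and suffix, and store only the differing middle.
    public static Delta Create(string oldText, string newText)
    {
        int max = Math.Min(oldText.Length, newText.Length);

        int prefix = 0;
        while (prefix < max && oldText[prefix] == newText[prefix]) prefix++;

        int suffix = 0;
        while (suffix < max - prefix &&
               oldText[oldText.Length - 1 - suffix] == newText[newText.Length - 1 - suffix])
            suffix++;

        return new Delta(prefix, suffix, newText.Substring(prefix, newText.Length - prefix - suffix));
    }

    // Rebuild the new text from the old text plus the stored delta.
    public static string Apply(string oldText, Delta delta) =>
        oldText.Substring(0, delta.PrefixLength) +
        delta.Middle +
        oldText.Substring(oldText.Length - delta.SuffixLength);
}

Storing the latest version in full and, for each older revision, a Delta that reconstructs it from the next newer one gives the kind of "compressed" backup chain described in the question.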
I have a library (written in C#) for which I need to read/write representations of my objects to disk (or to any Stream) in a particular binary format (to ensure compatibility with C/Java library implementations). The format requires a fair amount of bit-packing and some DEFLATE'd bytestreams. However, I would like my library to be as idiomatic .NET as possible, and so would like to provide an API as close as possible to the normal binary serialization process. I'm aware of the ability to implement the IFormatter interface, but given that I really am unable to reuse any part of the built-in serialization stack, is it worth doing this, or will it just bring unnecessary overhead? In other words:
Implement IFormatter and co.
OR
Just provide "Serialize"/"Deserialize" methods that act on a Stream?
A good point is brought up below about needing the serialization semantics for any case involving Remoting. In a case where using MarshalByRef objects is feasible, I'm pretty sure this won't be an issue, so leaving that aside: are there any benefits or drawbacks to using ISerializable/IFormatter versus a custom stack (or is my understanding of Remoting incorrect)?
I have always gone with the latter. There isn't much use in reusing the serialization framework if all you're doing is writing a file in a specific format. The only place I've run into any issues with a custom serialization approach is with Remoting, where you have to make your objects serializable.
This may not help you since you have to write to a specific format, but protobuf and SQLite are good tools for doing custom serialization.
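If it helps, a minimal sketch of what the "latter" shape can look like (all names are placeholders; the point is simply a pair of Stream-based methods with no IFormatter plumbing):

using System.IO;
using System.IO.Compression;

public class Widget
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class WidgetSerializer
{
    // Write the object in the custom binary format; the real bit-packing
    // rules would live here.
    public void Serialize(Stream output, Widget value)
    {
        using var deflate = new DeflateStream(output, CompressionMode.Compress, leaveOpen: true);
        using var writer = new BinaryWriter(deflate);
        writer.Write(value.Id);
        writer.Write(value.Name ?? string.Empty);
    }

    public Widget Deserialize(Stream input)
    {
        using var deflate = new DeflateStream(input, CompressionMode.Decompress, leaveOpen: true);
        using var reader = new BinaryReader(deflate);
        return new Widget { Id = reader.ReadInt32(), Name = reader.ReadString() };
    }
}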
I'd do the former. There's not much to the interface, so if you're mimicking the structure anyway, adding an ": IFormatter" and the other code necessary to get full compatibility won't take much.
Writing your own serialization code is error prone and time consuming.
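For completeness, the interface really is small - a skeleton of the "former" option, reusing the hypothetical WidgetSerializer from the sketch above, looks roughly like this (the untyped object signature and the unused Binder/SurrogateSelector properties are the overhead being traded for compatibility):

using System.IO;
using System.Runtime.Serialization;

public class WidgetFormatter : IFormatter
{
    public SerializationBinder Binder { get; set; }
    public StreamingContext Context { get; set; }
    public ISurrogateSelector SurrogateSelector { get; set; }

    public void Serialize(Stream serializationStream, object graph)
    {
        // Delegate to the custom binary writer for the supported type.
        new WidgetSerializer().Serialize(serializationStream, (Widget)graph);
    }

    public object Deserialize(Stream serializationStream)
    {
        return new WidgetSerializer().Deserialize(serializationStream);
    }
}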
As a thought - have you considered existing open-source portable formats, for example "protocol buffers"? This is a high density binary serialization format that underpins much of Google's data transfer etc. Versions are available in a wide range of languages - including Java/C++ etc (in the core Google distribution), and a vast range of others.
In particular, for .NET-idiomatic usage, protobuf-net looks a lot like XmlSerializer or DataContractSerializer (indeed, it can even work purely with XML/WCF attributes if an order is specified on each element) - or it can use the specific protobuf-net attributes:
[ProtoContract]
class Person {
    [ProtoMember(1)]
    public string Name { get; set; }
}
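Usage then follows the familiar serializer pattern (a quick sketch using the Person type above):

using (var file = File.Create("person.bin"))
{
    Serializer.Serialize(file, new Person { Name = "Fred" });
}

using (var file = File.OpenRead("person.bin"))
{
    Person person = Serializer.Deserialize<Person>(file);
}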
If you want to guarantee portability to other implementations, the recommendation is to start "contract first", with a ".proto" file - in this case, something like:
message person {
    required string name = 1;
}
This .proto file can then be used to generate any language-specific variant; so with protobuf-net you'd run it through "protogen" (included in protobuf-net; and a VS2008 add-on is in progress); or for Java/C++ etc you'd run it through "protoc" (included in Google's protobuf). "protogen" in protobuf-net can currently emit C# and VB, but it is pretty easy to add another language if you want to use F# etc - it just involves writing (or migrating) an xslt.
There is also another .NET version that is a more direct port of the Java version; as such it is less .NET idiomatic. This is dotnet-protobufs.