I have a weird situation happening that I'm not quite understanding.
I have a 'dataset' class that holds various metadata about a monitoring buoy including a list of 'sensors'.
Each 'sensor' has a current 'sensorstate'.
Each 'sensorstate' has a bit of metadata about it (timestamp, reason for change etc) but most importantly it has a Dictionary<DateTime,float> of values.
These sensors generally have upwards of 50k data points (years' worth of 15-minute readings), so I wanted something faster at serialising than the default .NET BinaryFormatter, and set up protobuf-net, which serializes fantastically fast.
Unfortunately my problem occurs on deserialization, when my dictionary of values throws an exception because an item with the same key has already been added. The only way I can get it to deserialise is to enable 'OverwriteList', but I'm a little unsure why: there aren't any duplicate keys when serializing (it's a dictionary), so why are there duplicate keys when I deserialize? This also raises data integrity concerns.
Any help in explaining this would be highly appreciated.
(On a side note, when giving ProtoMember attribute ids, do they need to be unique to the class or to the whole project? I'm also looking for lossless compression recommendations to use in conjunction with protobuf-net, as the files are getting pretty large.)
Edit:
I've just put my source up on GitHub and here is the class in question
SensorState (Note: it currently has OverwriteList = true in order to have it working for other development)
Here is an example raw data file
I had already tried using the SkipConstructor flag, but even with it set to true it gets an exception unless OverwriteList is also true for the values dictionary.
If OverwriteList fixes it, then it suggests to me that the dictionary has some data in it by default, perhaps via a constructor or similar. If it is indeed coming from the constructor, you can disable that with [ProtoContract(SkipConstructor=true)].
If I have misunderstood the above, it may help to illustrate with a reproducible example, if possible.
With regard to the ids, they only need to be unique inside each type, and it is recommended to keep them small (due to "varint" encoding of tags, small keys are "cheaper" than large keys).
If you want to really minimise size, I would actually suggest looking at the content of the data, too. For example, you say that this is 15 minute readings... well, I'm guessing there are occasional gaps, but could you do, for example:
Block (class)
Start Time (DateTime)
Values (float[])
and have a Block for every contiguous bunch of 15-minute values (the assumption here is that every value is 15 minutes after the last, else a new block is started). So you are storing multiple Block instances in place of a single dictionary (a sketch follows after the list below). This has the following advantages:
far fewer DateTime values to store
you can use "packed" encoding on the floats, which means it doesn't need to add all the intermediate tags; you do this by marking an array/list with [ProtoMember({key}, IsPacked = true)] - noting that it only works on a few basic data-types (not sub-objects)
combined, these two tweaks could yield significant savings
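As a rough sketch of that idea (the class and member names here are illustrative, not taken from your code):
[ProtoContract]
class Block
{
    [ProtoMember(1)]
    public DateTime StartTime { get; set; }

    // IsPacked = true writes the floats as one length-prefixed run,
    // with no per-value field header
    [ProtoMember(2, IsPacked = true)]
    public float[] Values { get; set; }
}

[ProtoContract]
class SensorHistory
{
    [ProtoMember(1)]
    public List<Block> Blocks { get; set; }
}
The serializer then only stores one DateTime per block plus a tightly packed run of floats.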
If the data has a lot of strings, you could try GZIP/DEFLATE. You can of course try these either way, but without large amounts of string data I would be cautious of expecting too much extra from compression.
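If you do want to try compression, a minimal sketch of layering GZip over protobuf-net could look like the following (using the Data contract from the sample below; the path parameter is just a placeholder):
// requires System.IO, System.IO.Compression and ProtoBuf
static void SaveCompressed(Data data, string path)
{
    using (var file = File.Create(path))
    using (var gzip = new GZipStream(file, CompressionMode.Compress))
    {
        Serializer.Serialize(gzip, data);
    }
}

static Data LoadCompressed(string path)
{
    using (var file = File.OpenRead(path))
    using (var gzip = new GZipStream(file, CompressionMode.Decompress))
    {
        return Serializer.Deserialize<Data>(gzip);
    }
}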
As an update based on the supplied (CSV) data file, there is no inherent problem here handling the dictionary - as shown:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using ProtoBuf;
class Program
{
static void Main()
{
var data = new Data
{
Points =
{
{new DateTime(2009,09,1,0,0,0), 11.04F},
{new DateTime(2009,09,1,0,15,0), 11.04F},
{new DateTime(2009,09,1,0,30,0), 11.01F},
{new DateTime(2009,09,1,0,45,0), 11.01F},
{new DateTime(2009,09,1,1,0,0), 11F},
{new DateTime(2009,09,1,1,15,0), 10.98F},
{new DateTime(2009,09,1,1,30,0), 10.98F},
{new DateTime(2009,09,1,1,45,0), 10.92F},
{new DateTime(2009,09,1,2,00,0), 10.09F},
}
};
var ms = new MemoryStream();
Serializer.Serialize(ms, data);
ms.Position = 0;
var clone = Serializer.Deserialize<Data>(ms);
Console.WriteLine("{0} points:", clone.Points.Count);
foreach(var pair in clone.Points.OrderBy(x => x.Key))
{
float orig;
data.Points.TryGetValue(pair.Key, out orig);
Console.WriteLine("{0}: {1}", pair.Key, pair.Value == orig ? "correct" : "FAIL");
}
}
}
[ProtoContract]
class Data
{
private readonly Dictionary<DateTime, float> points = new Dictionary<DateTime, float>();
[ProtoMember(1)]
public Dictionary<DateTime, float> Points { get { return points; } }
}
This is where I apologize for ever suggesting it had anything to do with code that wasn't my own. And while I'm here, mad props to the team behind protobuf and to Marc Gravell for protobuf-net; it's seriously fast.
What was happening was that in the Sensor class I had some logic to ensure a couple of properties were never null.
[ProtoMember(12)]
public SensorState CurrentState
{
get { return (_currentState == null) ? RawData : _currentState; }
set { _currentState = value; }
}
[ProtoMember(16)]
public SensorState RawData
{
get { return _rawData ?? (_rawData = new SensorState(this, DateTime.Now, new Dictionary<DateTime, float>(), "", true, null)); }
private set { _rawData = value; }
}
While this works fantastically when I'm using the properties, it messes up the serialization process.
The simple fix was to mark the underlying fields for serialization instead.
[ProtoMember(16)]
private SensorState _rawData;
[ProtoMember(12)]
private SensorState _currentState;
Right now I'm working on a game engine. To be more efficient and to keep the data away from the end user, I'm trying to use serialization on a modified form of the Wavefront *.OBJ format. I have multiple structs set up to represent the data, and serialization of the objects works fine, except it takes up a significant amount of file space (at least 5x that of the original OBJ file).
To be specific, here's a quick example of what the final object would be (in a JSON-esque format):
{
[{float 5.0, float 2.0, float 1.0}, {float 7.0, float 2.0, float 1.0}, ...]
// ^^^ vertex positions
// other similar structures for colors, normals, texture coordinates
// ...
[[{int 1, int 1, int 1}, {int 2, int 2, int 1}, {int 3, int 3, int 2}], ...]
//represents one face; represents the following
//face[vertex{position index, text coords index, normal index}, vertex{}...]
}
Basically, my main issue with this method of serializing data (binary format) is that it saves the names of the structs, not just the values. I'd love to keep the data in the format I have already, just without saving the struct names in my data. I want to save something similar to the above, while still being able to recompile with a different struct name later.
Here's the main object I'm serializing and saving to a file:
[Serializable()] //the included structs have this applied
public struct InstantGameworksObjectData
{
public Position[] Positions;
public TextureCoordinates[] TextureCoordinates;
public Position[] Normals;
public Face[] Faces;
}
Here's the method in which I serialize and save the data:
IFormatter formatter = new BinaryFormatter();
long Beginning = DateTime.Now.Ticks / 10000000;
foreach (string file in fileNames)
{
Console.WriteLine("Begin " + Path.GetFileName(file));
var output = InstantGameworksObject.ConvertOBJToIGWO(File.ReadAllLines(file));
Console.WriteLine("Writing file");
Stream fileOutputStream = new FileStream(outputPath + @"\" + Path.GetFileNameWithoutExtension(file) + ".igwo", FileMode.Create, FileAccess.Write, FileShare.None);
formatter.Serialize(fileOutputStream, output);
Console.WriteLine(outputPath + @"\" + Path.GetFileNameWithoutExtension(file) + ".igwo");
}
The output, of course, is in binary/hex (based on what program you use to view the file), and that's great:
But putting it into a hex-to-text converter online yields specific name data:
In the long run, this could mean gigabytes worth of useless data. How can I save my C# object with the data in the correct format, just without the extra meta-clutter?
As you correctly note, the standard framework binary formatters include a host of metadata about the structure of the data. This is to try to keep the serialised data self-describing. If they were to separate the data from all that metadata, then the smallest change to the structure of classes would render the previously serialised data useless. By that token, I doubt you'd find any standard framework method of serialising binary data that didn't include all the metadata.
Even ProtoBuf includes the semantics of the data in the file data, albeit with less overhead.
Given that the structure of your data follows the reasonably common and well established form of 3D object data, you could roll your own format for your assets which strips the semantics and only stores the raw data. You can implement read and write methods easily using the BinaryReader/BinaryWriter classes (which would be my preference). If you're looking to obfuscate data from the end user, there are a variety of different ways that you could achieve that with this approach.
For example:
public static InstantGameworksObjectData ReadIgwoObject(BinaryReader pReader)
{
var lOutput = new InstantGameworksObjectData();
int lVersion = pReader.ReadInt32(); // Useful in case you ever want to change the format
int lPositionCount = pReader.ReadInt32(); // Store the length of the Position array before the data so you can pre-allocate the array.
lOutput.Positions = new Position[lPositionCount];
for ( int lPositionIndex = 0 ; lPositionIndex < lPositionCount ; ++ lPositionIndex )
{
lOutput.Positions[lPositionIndex] = new Position();
lOutput.Positions[lPositionIndex].X = pReader.ReadSingle();
lOutput.Positions[lPositionIndex].Y = pReader.ReadSingle();
lOutput.Positions[lPositionIndex].Z = pReader.ReadSingle();
// or if you prefer... lOutput.Positions[lPositionIndex] = Position.ReadPosition(pReader);
}
int lTextureCoordinateCount = pReader.ReadInt32();
lOutput.TextureCoordinates = new TextureCoordinate[lTextureCoordinateCount];
for ( int lTextureCoordinateIndex = 0 ; lTextureCoordinateIndex < lTextureCoordinateCount ; ++ lTextureCoordinateIndex )
{
lOutput.TextureCoordinates[lTextureCoordinateIndex] = new TextureCoordinate();
lOutput.TextureCoordinates[lTextureCoordinateIndex].X = pReader.ReadSingle();
lOutput.TextureCoordinates[lTextureCoordinateIndex].Y = pReader.ReadSingle();
lOutput.TextureCoordinates[lTextureCoordinateIndex].Z = pReader.ReadSingle();
// or if you prefer... lOutput.TextureCoordinates[lTextureCoordinateIndex] = TextureCoordinate.ReadTextureCoordinate(pReader);
}
// ...
return lOutput;
}
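For symmetry, the corresponding write method might look something like the sketch below; the member names (X, Y, Z) match the read example above, while the version value is just an assumption:
public static void WriteIgwoObject(BinaryWriter pWriter, InstantGameworksObjectData pData)
{
    pWriter.Write(1); // format version, read back by ReadIgwoObject
    pWriter.Write(pData.Positions.Length); // length prefix so the reader can pre-allocate
    foreach (var lPosition in pData.Positions)
    {
        pWriter.Write(lPosition.X);
        pWriter.Write(lPosition.Y);
        pWriter.Write(lPosition.Z);
    }
    pWriter.Write(pData.TextureCoordinates.Length);
    foreach (var lTextureCoordinate in pData.TextureCoordinates)
    {
        pWriter.Write(lTextureCoordinate.X);
        pWriter.Write(lTextureCoordinate.Y);
        pWriter.Write(lTextureCoordinate.Z);
    }
    // ... Normals and Faces follow the same pattern
}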
As far as space efficiency and speed go, this approach is hard to beat. It works well for 3D objects, as they're fairly well defined and the format is not likely to change, but it may not extend well to the other assets that you want to store.
If you find you are needing to change class structures frequently, you may find you have to write lots of if-blocks based on version to correctly read a file, and have to regularly debug issues where the data in the file is not quite in the format you expect. A happy medium might be to use something such as ProtoBuf for the bulk of your development until you're happy with the structure of your data object classes, and then writing raw binary Read/Write methods for each of them before you release.
I'd also recommend some Unit Tests to ensure that your Read and Write methods are correctly persisting the object to avoid pulling your hair out later.
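A round-trip test can be as small as the following sketch (BuildSampleObject is a hypothetical helper producing known data, and the assertions are NUnit-style, purely for illustration):
[Test]
public void IgwoReadWriteRoundTrip()
{
    InstantGameworksObjectData original = BuildSampleObject(); // hypothetical helper

    using (var ms = new MemoryStream())
    {
        using (var writer = new BinaryWriter(ms, Encoding.UTF8, leaveOpen: true))
        {
            WriteIgwoObject(writer, original);
        }

        ms.Position = 0;
        using (var reader = new BinaryReader(ms))
        {
            var clone = ReadIgwoObject(reader);
            Assert.AreEqual(original.Positions.Length, clone.Positions.Length);
            Assert.AreEqual(original.Positions[0].X, clone.Positions[0].X);
            // ... and so on for the remaining members
        }
    }
}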
Hope this helps
I want to save the rate of exchange of all currencies against different base currencies. What is the best and most efficient type or structure to store this? Currently I am using:
Dictionary<string, Dictionary<string, decimal>>();
Thanks in advance. Please suggest
Typically I use a variation of the following class.
public class ForexSpotContainer : IEnumerable<FxSpot>
{
[DataMember] private readonly Dictionary<string, FxSpot> _fxSpots;
public FxSpot this[string baseCurrency, string quoteCurrency]
{
get
{
var baseCurrencySpot = _fxSpots[baseCurrency];
var quoteCurrencySpot = _fxSpots[quoteCurrency];
return baseCurrencySpot.Invert()*quoteCurrencySpot;
}
}
public IEnumerator<FxSpot> GetEnumerator()
{
return _fxSpots.Values.GetEnumerator();
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
I tend to then create a Money class and a FxSpot class then create +-*/ operators for the Money and FxSpot classes so that I can do financial calculations in a safe way.
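Purely as an illustration (the member names and the single operator shown are assumptions, not code from my actual library), the Money/FxSpot idea looks roughly like this:
public class Money
{
    public decimal Amount { get; private set; }
    public string Currency { get; private set; }

    public Money(decimal amount, string currency)
    {
        Amount = amount;
        Currency = currency;
    }
}

public class FxSpot
{
    public string BaseCurrency { get; private set; }
    public string QuoteCurrency { get; private set; }
    public decimal Rate { get; private set; }

    public FxSpot(string baseCurrency, string quoteCurrency, decimal rate)
    {
        BaseCurrency = baseCurrency;
        QuoteCurrency = quoteCurrency;
        Rate = rate;
    }

    public FxSpot Invert()
    {
        return new FxSpot(QuoteCurrency, BaseCurrency, 1m / Rate);
    }

    // The conversion only succeeds when the currencies line up, which is
    // exactly the safety that raw decimals don't give you.
    public static Money operator *(Money money, FxSpot spot)
    {
        if (money.Currency != spot.BaseCurrency)
            throw new InvalidOperationException("Currency mismatch: " + money.Currency);
        return new Money(money.Amount * spot.Rate, spot.QuoteCurrency);
    }
}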
EDIT: In my experience working on financial systems, I have always had issues with code like this:
decimal sharePriceOfMsft = 40.30m;
decimal usdEur = 0.75m;
decimal msftInEur = sharePriceOfMsft * usdEur;
It always takes me a few seconds to check whether I should multiply by the spot or divide by it.
The problem is compounded when I have to use forex crosses, such as JPYEUR or EURJPY, and hours have been lost to subtle bugs from closely related forex spots.
Also of consequence is the dimensional analysis of the equations. When you multiply lots of numbers together and expect a price, are you sure you didn't mix up a multiply/divide somewhere? By creating a new class for each unit, you get a little more compile-time error checking, which can ultimately save you seconds for each line of code you read (and in a many-thousand-line library that adds up to hours very quickly).
I don't see a big problem with your approach; if anything, consider:
1) Using a custom 3rd-party in-memory DB (Redis-like), though that may prove too cumbersome for your case.
2) Deriving from your type:
public class MyCustomHolder : Dictionary<string, Dictionary<string, decimal>> {
}
This avoids long and confusing definitions in the code, and brings more semantics to your code's readers and to yourself.
I'm having some issues with System.Reflection in C#. I'm pulling data from a database and retrieving it as a JSON string. I've made my own implementation that maps the JSON data into my self-declared objects using reflection. However, since I usually get a JSON string with an array of around 50-100 objects, my program runs really slowly because of the loops I'm using with reflection.
I've heard that reflection is slow, but it shouldn't be this slow. I feel something is not right in my implementation: I have a different project where I use the JSON.NET serializer and instantiate my objects a bit differently with reflection, and it runs just fine on the same input (less than a second), while my slow program takes about 10 seconds for 50 objects.
Below are the classes that I'm using to store the data:
class DC_Host
{
public string name;
public void printProperties()
{
//Prints all properties of a class using reflection
//Doesn't really matter, since I'm not using this for processing
}
}
class Host : DC_Host
{
public string asset_tag;
public string assigned;
public string assigned_to;
public string attributes;
public bool? can_print;
public string category;
public bool? cd_rom;
public int? cd_speed;
public string change_control;
public string chassis_type;
//And some more properties (around 70 - 80 fields in total)
}
Below you'll find my methods for processing the information into objects that are stored inside a List. The JSON data is stored inside a dictionary that contains another dictionary for every array object defined in the JSON input. Deserialising the JSON happens in a matter of milliseconds, so there shouldn't be a problem there.
public List<DC_Host> readJSONTtoHost(ref Dictionary<string, dynamic> json)
{
bool array = isContainer();
List<DC_Host> hosts = new List<DC_Host>();
//Do different processing on objects depending on table type (array/single)
if (array)
{
foreach (Dictionary<string, dynamic> obj in json[json.First().Key])
{
hosts.Add(reflectToObject(obj));
}
}
else
{
hosts.Add(reflectToObject(json[json.First().Key]));
}
return hosts;
}
private DC_Host reflectToObject(Dictionary<string,dynamic> obj)
{
Host h = new Host();
FieldInfo[] fields = h.GetType().GetFields();
foreach (FieldInfo f in fields)
{
Object value = null;
/* If there are values that are not in the dictionary, or where the wrong conversion is
 * used, the values will not be processed and therefore not inserted into the
 * host object, or just ignored. At a later stage I might post specific error messages
 * in the catch block. */
/* TODO : Optimize and find out why this is soo slow */
try
{
value = obj[convTable[f.Name]];
}
catch { }
if (value == null)
{
f.SetValue(h, null);
continue;
}
// The system works with list containers, BUT then there must be no loose values, so this
// depends very strongly on the Service Now implementation.
if (f.FieldType == typeof(List<int?>)) // Arrays for strings, ints and bools still need to be defined
{
int count = obj[convTable[f.Name]].Count;
List<int?> temp = new List<int?>();
for (int i = 0; i < count; i++)
{
temp.Add(obj[convTable[f.Name]][i]);
f.SetValue(h, temp);
}
}
else if (f.FieldType == typeof(int?))
f.SetValue(h, int.Parse((string)value));
else if (f.FieldType == typeof(bool?))
f.SetValue(h, bool.Parse((string)value));
else
f.SetValue(h, (string)value);
}
Console.WriteLine("Processed " + h.name);
return h;
}
I'm not sure what JSON.NET's implementation does in the background with reflection, but I'm assuming they use something I'm missing to optimise their reflection.
Basically, high-performance code like this tends to use meta-programming extensively; lots of ILGenerator etc (or Expression / CodeDom if you find that scary). PetaPoco showed a similar example earlier today: prevent DynamicMethod VerificationException - operation could destabilize the runtime
You could also look at the code of other serialization engines, such as protobuf-net, which has crazy amounts of meta-programming.
If you don't want to go quite that far, you could look at FastMember, which handles the crazy stuff for you, so you just have to worry about object/member-name/value.
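To give a feel for the FastMember route, here is only a sketch reusing the Host and convTable names from your code (convTable is assumed here to be a Dictionary<string, string>; conversion of the int?/bool? values is omitted for brevity):
using FastMember;

// Create the accessor once and cache it; the code-generation cost is paid a single time.
static readonly TypeAccessor HostAccessor = TypeAccessor.Create(typeof(Host));

static Host MapToHost(Dictionary<string, dynamic> obj, Dictionary<string, string> convTable)
{
    var h = new Host();
    foreach (var member in HostAccessor.GetMembers())
    {
        string jsonKey;
        if (!convTable.TryGetValue(member.Name, out jsonKey))
            continue; // no mapping for this member

        dynamic value;
        if (obj.TryGetValue(jsonKey, out value) && value != null)
        {
            // assigns via generated IL rather than FieldInfo.SetValue
            HostAccessor[h, member.Name] = value;
        }
    }
    return h;
}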
For people running into this question, I'll post the solution to my problem here.
The issue wasn't really related to reflection. There are ways to improve the speed of reflection, as CodesInChaos and Marc Gravell mentioned, and Marc even created a very useful library (FastMember) for people without too much experience in low-level reflection.
The solution, however, was not related to reflection itself. I had a try/catch statement to evaluate whether values existed in my dictionary. Using try/catch statements to handle program flow is not a good idea: handling exceptions is heavy on performance, and especially when you're running under the debugger, try/catch statements can drastically kill your performance.
//New implementation: use TryGetValue from Dictionary to check for existing values.
dynamic value = null;
obj.TryGetValue(convTable[f.Name], out value);
My program runs perfectly fine now that I've removed the try/catch statement.
I have a scenario in which memory conservation is paramount. I am trying to read > 1 GB of peptide sequences into memory and group Peptide instances together that share the same sequence. I am storing the Peptide objects in a HashSet so I can quickly check for duplicates, but found out that you cannot retrieve the objects contained in the set, even after knowing that the set contains that object.
Memory is really important and I don't want to duplicate data if at all possible. (Otherwise I would have designed my data structure as peptides = Dictionary<string, Peptide>, but that would duplicate the string in both the dictionary and the Peptide class.) Below is the code to show you what I would like to accomplish:
public class SomeClass {
// Main Storage of all the Peptide instances, class provided below
private HashSet<Peptide> peptides = new HashSet<Peptide>();
public void SomeMethod(IEnumerable<string> files) {
foreach(string file in files) {
using(PeptideReader reader = new PeptideReader(file)) {
foreach(DataLine line in reader.ReadNextLine()) {
Peptide testPep = new Peptide(line.Sequence);
if(peptides.Contains(testPep)) {
// ** Problem Is Here **
// I want to get the Peptide object that is in HashSet
// so I can add the DataLine to it, I don't want use the
// testPep object (even though they are considered "equal")
peptides[testPep].Add(line); // I know this doesn't work
testPep.Add(line) // THIS IS NO GOOD, since it won't be saved in the HashSet which i use in other methods.
} else {
// The HashSet doesn't contain this peptide, so we can just add it
testPep.Add(line);
peptides.Add(testPep);
}
}
}
}
}
}
public class Peptide : IEquatable<Peptide> {
public string Sequence {get;private set;}
private int hCode = 0;
public PsmList PSMs {get;set;}
public Peptide(string sequence) {
Sequence = sequence.Replace('I', 'L');
hCode = Sequence.GetHashCode();
}
public void Add(DataLine data) {
if(PSMs == null) {
PSMs = new PsmList();
}
PSMs.Add(data);
}
public override int GetHashCode() {
return hCode;
}
public bool Equals(Peptide other) {
return Sequence.Equals(other.Sequence);
}
}
public class PsmList : List<DataLine> { /* and some other stuff that is not important */ }
Why does HashSet not let me get the object reference that is contained in the HashSet? I know people will try to say that if HashSet.Contains() returns true, your objects are equivalent. They may be equivalent in terms of values, but I need the references to be the same since I am storing additional information in the Peptide class.
The only solution I came up with is Dictionary<Peptide, Peptide> in which both the key and value point to the same reference. But this seems tacky. Is there another data structure to accomplish this?
Basically you could reimplement HashSet<T> yourself, but that's about the only solution I'm aware of. The Dictionary<Peptide, Peptide> or Dictionary<string, Peptide> solution is probably not that inefficient though - if you're only wasting a single reference per entry, I would imagine that would be relatively insignificant.
In fact, if you remove the hCode member from Peptide, that will save you 4 bytes per object, which is the same size as a reference in x86 anyway... there's no point in caching the hash as far as I can tell, as you'll only compute the hash of each object once, at least in the code you've shown.
If you're really desperate for memory, I suspect you could store the sequence considerably more efficiently than as a string. If you give us more information about what the sequence contains, we may be able to make some suggestions there.
I don't know that there's any particularly strong reason why HashSet doesn't permit this, other than that it's a relatively rare requirement - but it's something I've seen requested in Java as well...
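To illustrate the Dictionary<Peptide, Peptide> idea with the names from the question (just a sketch of the lookup-or-add pattern):
private Dictionary<Peptide, Peptide> peptides = new Dictionary<Peptide, Peptide>();

// inside the reading loop:
Peptide testPep = new Peptide(line.Sequence);
Peptide existing;
if (peptides.TryGetValue(testPep, out existing))
{
    // we get back the instance that is already stored, so the DataLine
    // ends up on the object the dictionary will keep returning
    existing.Add(line);
}
else
{
    testPep.Add(line);
    peptides.Add(testPep, testPep); // key and value are the same reference
}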
Use a Dictionary<string, Peptide>.
(this is a re-post of a question that I saw in my RSS, but which was deleted by the OP. I've re-added it because I've seen this question asked several times in different places; wiki for "good form")
Suddenly, I receive a ProtoException when deserializing and the message is: unknown wire-type 6
What is a wire-type?
What are the different wire-type values and their description?
I suspect a field is causing the problem, how to debug this?
First thing to check:
IS THE INPUT DATA PROTOBUF DATA? If you try and parse another format (json, xml, csv, binary-formatter), or simply broken data (an "internal server error" html placeholder text page, for example), then it won't work.
What is a wire-type?
It is a 3-bit flag that tells it (in broad terms; it is only 3 bits after all) what the next data looks like.
Each field in protocol buffers is prefixed by a header that tells it which field (number) it represents, and what type of data is coming next; this "what type of data" is essential to support the case where unanticipated data is in the stream (for example, you've added fields to the data-type at one end), as it lets the serializer know how to read past that data (or store it for round-trip if required).
What are the different wire-type values and their description?
0: variable-length integer (up to 64 bits) - base-128 encoded with the MSB indicating continuation (used as the default for integer types, including enums)
1: 64-bit - 8 bytes of data (used for double, or electively for long/ulong)
2: length-prefixed - first read an integer using variable-length encoding; this tells you how many bytes of data follow (used for strings, byte[], "packed" arrays, and as the default for child object properties / lists)
3: "start group" - an alternative mechanism for encoding child objects that uses start/end tags - largely deprecated by Google, it is more expensive to skip an entire child-object field since you can't just "seek" past an unexpected object
4: "end group" - twinned with 3
5: 32-bit - 4 bytes of data (used for float, or electively for int/uint and other small integer types)
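As a tiny worked example of the header described above: the byte 0x08 at the start of a typical message packs the field number and wire-type together, and can be unpicked like this:
int tag = 0x08;              // header byte read from the stream
int fieldNumber = tag >> 3;  // = 1, the field number
int wireType = tag & 0x07;   // = 0, i.e. the varint wire-type from the list above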
I suspect a field is causing the problem, how to debug this?
Are you serializing to a file? The most likely cause (in my experience) is that you have overwritten an existing file, but have not truncated it; i.e. it was 200 bytes; you've re-written it, but with only 182 bytes. There are now 18 bytes of garbage on the end of your stream that is tripping it up. Files must be truncated when re-writing protocol buffers. You can do this with FileMode:
using(var file = new FileStream(path, FileMode.Truncate)) {
// write
}
or alternatively by SetLength after writing your data:
file.SetLength(file.Position);
Other possible cause
You are (accidentally) deserializing a stream into a different type than what was serialized. It's worth double-checking both sides of the conversation to ensure this is not happening.
Since the stack trace references this StackOverflow question, I thought I'd point out that you can also receive this exception if you (accidentally) deserialize a stream into a different type than what was serialized. So it's worth double-checking both sides of the conversation to ensure this is not happening.
This can also be caused by an attempt to write more than one protobuf message to a single stream. The solution is to use SerializeWithLengthPrefix and DeserializeWithLengthPrefix.
Why this happens:
The protobuf specification supports a fairly small number of wire-types (the binary storage formats) and data-types (the .NET etc data-types). Additionally, this is not 1:1, nor is it 1:many or many:1 - a single wire-type can be used for multiple data-types, and a single data-type can be encoded via any of multiple wire-types. As a consequence, you cannot fully understand a protobuf fragment unless you already know the schema, so you know how to interpret each value. When you are, say, reading an Int32 data-type, the supported wire-types might be "varint", "fixed32" and "fixed64", whereas when reading a String data-type, the only supported wire-type is "string".
If there is no compatible map between the data-type and wire-type, then the data cannot be read, and this error is raised.
Now let's look at why this occurs in the scenario here:
[ProtoContract]
public class Data1
{
[ProtoMember(1, IsRequired=true)]
public int A { get; set; }
}
[ProtoContract]
public class Data2
{
[ProtoMember(1, IsRequired = true)]
public string B { get; set; }
}
class Program
{
static void Main(string[] args)
{
var d1 = new Data1 { A = 1};
var d2 = new Data2 { B = "Hello" };
var ms = new MemoryStream();
Serializer.Serialize(ms, d1);
Serializer.Serialize(ms, d2);
ms.Position = 0;
var d3 = Serializer.Deserialize<Data1>(ms); // This will fail
var d4 = Serializer.Deserialize<Data2>(ms);
Console.WriteLine("{0} {1}", d3, d4);
}
}
In the above, two messages are written directly after each-other. The complication is: protobuf is an appendable format, with append meaning "merge". A protobuf message does not know its own length, so the default way of reading a message is: read until EOF. However, here we have appended two different types. If we read this back, it does not know when we have finished reading the first message, so it keeps reading. When it gets to data from the second message, we find ourselves reading a "string" wire-type, but we are still trying to populate a Data1 instance, for which member 1 is an Int32. There is no map between "string" and Int32, so it explodes.
The *WithLengthPrefix methods allow the serializer to know where each message finishes; so, if we serialize a Data1 and Data2 using the *WithLengthPrefix, then deserialize a Data1 and a Data2 using the *WithLengthPrefix methods, then it correctly splits the incoming data between the two instances, only reading the right value into the right object.
Additionally, when storing heterogeneous data like this, you might want to additionally assign (via *WithLengthPrefix) a different field-number to each class; this provides greater visibility of which type is being deserialized. There is also a method in Serializer.NonGeneric which can then be used to deserialize the data without needing to know in advance what we are deserializing:
// Data1 is "1", Data2 is "2"
Serializer.SerializeWithLengthPrefix(ms, d1, PrefixStyle.Base128, 1);
Serializer.SerializeWithLengthPrefix(ms, d2, PrefixStyle.Base128, 2);
ms.Position = 0;
var lookup = new Dictionary<int,Type> { {1, typeof(Data1)}, {2,typeof(Data2)}};
object obj;
while (Serializer.NonGeneric.TryDeserializeWithLengthPrefix(ms,
PrefixStyle.Base128, fieldNum => lookup[fieldNum], out obj))
{
Console.WriteLine(obj); // writes Data1 on the first iteration,
// and Data2 on the second iteration
}
Previous answers already explain the problem better than I can. I just want to add an even simpler way to reproduce the exception.
This error will also occur simply if the type of a serialized ProtoMember is different from the expected type during deserialization.
For instance if the client sends the following message:
public class DummyRequest
{
[ProtoMember(1)]
public int Foo{ get; set; }
}
But what the server deserializes the message into is the following class:
public class DummyRequest
{
[ProtoMember(1)]
public string Foo{ get; set; }
}
Then this will result in the following error message, which is slightly misleading for this case:
ProtoBuf.ProtoException: Invalid wire-type; this usually means you have over-written a file without truncating or setting the length
It will even occur if the property name changed. Let's say the client sent the following instead:
public class DummyRequest
{
[ProtoMember(1)]
public int Bar{ get; set; }
}
This will still cause the server to deserialize the int Bar into the string Foo, which causes the same ProtoBuf.ProtoException.
I hope this helps somebody debugging their application.
Also check the obvious: that all your subclasses have the [ProtoContract] attribute. Sometimes you can miss it when you have a rich DTO.
I've seen this issue when using the wrong Encoding type to convert the bytes to and from strings.
You need to use Encoding.Default and not Encoding.UTF8.
using (var ms = new MemoryStream())
{
Serializer.Serialize(ms, obj);
var bytes = ms.ToArray();
str = Encoding.Default.GetString(bytes);
}
If you are using SerializeWithLengthPrefix, please be aware that casting the instance to object breaks the deserialization code and causes ProtoBuf.ProtoException: Invalid wire-type.
using (var ms = new MemoryStream())
{
var msg = new Message();
Serializer.SerializeWithLengthPrefix(ms, (object)msg, PrefixStyle.Base128); // Casting msg to object breaks the deserialization code.
ms.Position = 0;
Serializer.DeserializeWithLengthPrefix<Message>(ms, PrefixStyle.Base128);
}
This happened in my case because I had something like this:
var ms = new MemoryStream();
Serializer.Serialize(ms, batch);
_queue.Add(Convert.ToBase64String(ms.ToArray()));
So basically I was putting a Base64 string into a queue, and then on the consumer side I had:
var stream = new MemoryStream(Encoding.UTF8.GetBytes(myQueueItem));
var batch = Serializer.Deserialize<List<EventData>>(stream);
So although the type of each myQueueItem was correct, I forgot that I had converted it to a Base64 string. The solution was to convert it back:
var bytes = Convert.FromBase64String(myQueueItem);
var stream = new MemoryStream(bytes);
var batch = Serializer.Deserialize<List<EventData>>(stream);