How to implement VARIANT in Protobuf - c#

As part of my protobuf protocol I require the ability to send data of a dynamic type, a little bit like VARIANT. Roughly I require the data to be an integer, string, boolean or "other" where "other" (e.g. DateTime) is serialized as a string. I need to be able to use these as a single field and in lists in a number of different locations in the protocol.
How can this best be implemented while keeping message size minimal and performance optimal?
I'm using protobuf-net with C#.
EDIT:
I've posted a proposed answer below which uses what I think is the minimum of memory required.
EDIT2:
Created a github.com project at http://github.com/pvginkel/ProtoVariant with a complete implementation.

Jon's multiple optionals covers the simplest setup, especially if you need cross-platform support. On the .NET side (to ensure you don't serialize unnecessary values), simply return null from any property that isn't a match, for example:
public object Value { get;set;}
[ProtoMember(1)]
public int? ValueInt32 {
get { return (Value is int) ? (int)Value : (int?)null; }
set { Value = value; }
}
[ProtoMember(2)]
public string ValueString {
get { return (Value is string) ? (string)Value : null; }
set { Value = value; }
}
// etc
You can also do the same using the bool ShouldSerialize*() pattern if you don't like the nulls.
Wrap that up in a class and you should be fine to use it at either the field level or the list level. You mention optimal performance; the only additional thing I can suggest there is to consider treating it as a "group" rather than a "submessage", as this is easier to encode (and just as easy to decode, as long as you expect the data). To do that, use the Group data format, via [ProtoMember], i.e.
[ProtoMember(12, DataFormat = DataFormat.Group)]
public MyVariant Foo {get;set;}
However, the difference here can be minimal - but it avoids some back-tracking in the output stream to fix the lengths. Either way, in terms of overheads a "submessage" will take at least 2 bytes; "at least one" for the field-header (perhaps taking more if the 12 is actually 1234567) - and "at least one" for the length, which gets bigger for longer messages. A group takes 2 x the field-header, so if you use low field-numbers this will be 2 bytes regardless of the length of the encapsulated data (it could be 5MB of binary).
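The byte counts above can be checked with a little arithmetic. The following is a minimal sketch, assuming the standard protobuf wire format (base-128 varints, field header = field number shifted left by three bits plus the wire type). `WireOverhead` and its method names are hypothetical helpers for illustration, not part of protobuf-net.

```csharp
using System;

// Hypothetical helper (not a protobuf-net API): estimates framing overhead only.
public static class WireOverhead
{
    // Number of bytes a base-128 varint needs to encode the value.
    public static int VarintSize(ulong value)
    {
        int size = 1;
        while (value >= 0x80)
        {
            value >>= 7;
            size++;
        }
        return size;
    }

    // A field header is the varint of (fieldNumber << 3) | wireType.
    // The wire type only fills the low three bits, so it does not change the size.
    public static int HeaderSize(int fieldNumber)
    {
        return VarintSize((ulong)fieldNumber << 3);
    }

    // Length-prefixed sub-message: one header plus a varint-encoded length.
    public static int SubMessageOverhead(int fieldNumber, long payloadLength)
    {
        return HeaderSize(fieldNumber) + VarintSize((ulong)payloadLength);
    }

    // Group: a start-group header and an end-group header, with no length prefix.
    public static int GroupOverhead(int fieldNumber)
    {
        return 2 * HeaderSize(fieldNumber);
    }
}
```

With field number 12 and a 5 MB payload, `SubMessageOverhead(12, 5 * 1024 * 1024)` gives 5 bytes while `GroupOverhead(12)` stays at 2; for a 10-byte payload both come out at 2.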
A separate trick, useful for more complex scenarios but not as interoperable, is generic inheritance, i.e. an abstract base class that has ConcreteType<int>, ConcreteType<string> etc listed as subtypes - this, however, takes an extra 2 bytes (typically), so is not as frugal.
Taking another step further away from the core spec, if you genuinely can't tell what types you need to support, and don't need interoperability - there is some support for including (optimized) type information in the data; see the DynamicType option on ProtoMember - this takes more space than the other two options.

You could have a message like this:
message Variant {
optional string string_value = 1;
optional int32 int32_value = 2;
optional int64 int64_value = 3;
optional string other_value = 4;
// etc
}
Then write a helper class - and possibly extension methods - to ensure that you only ever set one field in the variant.
You could optionally include a separate enum value to specify which field is set (to make it more like a tagged union) but the ability to check the optional fields just means the data is already there. It depends on whether you want the speed of finding the right field (in which case add the discriminator) or the space efficiency of only including the data itself (in which case don't add the discriminator).
That's a general Protocol Buffer approach. There may be something more protobuf-net specific, of course.
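One possible shape for such a helper is sketched below. It is a plain C# class rather than generated protobuf code, and `VariantValue`, `Kind` and the `From` factories are hypothetical names chosen for illustration. The `Which` property plays the role of the optional discriminator; drop it if you prefer to check the fields directly.

```csharp
using System;
using System.Globalization;

// Hypothetical hand-written helper; type and member names are illustrative only.
public sealed class VariantValue
{
    // Optional discriminator, as in the tagged-union variation.
    public enum Kind { None, String, Int32, Int64, Other }

    public Kind Which { get; private set; }
    public string StringValue { get; private set; }
    public int? Int32Value { get; private set; }
    public long? Int64Value { get; private set; }
    public string OtherValue { get; private set; }

    // Private constructor: instances come only from the factories below,
    // each of which sets exactly one value field.
    private VariantValue() { }

    public static VariantValue From(string value)
    {
        return new VariantValue { Which = Kind.String, StringValue = value };
    }

    public static VariantValue From(int value)
    {
        return new VariantValue { Which = Kind.Int32, Int32Value = value };
    }

    public static VariantValue From(long value)
    {
        return new VariantValue { Which = Kind.Int64, Int64Value = value };
    }

    // "Other" types travel as strings; DateTime uses the round-trip format.
    public static VariantValue From(DateTime value)
    {
        return new VariantValue
        {
            Which = Kind.Other,
            OtherValue = value.ToString("o", CultureInfo.InvariantCulture)
        };
    }
}
```

Because the setters are private, callers cannot end up with two value fields populated at once.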

Asking questions always helps me think. I found a way to get the number of bytes used for transfer to a bare minimum.
What I've done here is make use of optional properties. Say I want to send an int32. When the value isn't zero, I can just check a property on the message for whether it has a value. Otherwise, I set a type to INT32_ZERO. This way I can correctly store and reconstruct the value. The example below has this implementation for a number of types.
The .proto file:
message Variant {
optional VariantType type = 1 [default = AUTO];
optional int32 value_int32 = 2;
optional int64 value_int64 = 3;
optional float value_float = 4;
optional double value_double = 5;
optional string value_string = 6;
optional bytes value_bytes = 7;
optional string value_decimal = 8;
optional string value_datetime = 9;
}
enum VariantType {
AUTO = 0;
BOOL_FALSE = 1;
BOOL_TRUE = 2;
INT32_ZERO = 3;
INT64_ZERO = 4;
FLOAT_ZERO = 5;
DOUBLE_ZERO = 6;
NULL = 7;
}
And accompanying partial .cs file:
using System;
using System.Collections.Generic;
using System.Text;
using System.Globalization;
namespace ConsoleApplication6
{
partial class Variant
{
public static Variant Create(object value)
{
var result = new Variant();
if (value == null)
result.Type = VariantType.NULL;
else if (value is string)
result.ValueString = (string)value;
else if (value is byte[])
result.ValueBytes = (byte[])value;
else if (value is bool)
result.Type = (bool)value ? VariantType.BOOLTRUE : VariantType.BOOLFALSE;
else if (value is int)
{
if ((int)value == 0)
result.Type = VariantType.INT32ZERO;
else
result.ValueInt32 = (int)value;
}
else if (value is long)
{
if ((long)value == 0L)
result.Type = VariantType.INT64ZERO;
else
result.ValueInt64 = (long)value;
}
else if (value is float)
{
if ((float)value == 0f)
result.Type = VariantType.FLOATZERO;
else
result.ValueFloat = (float)value;
}
else if (value is double)
{
if ((double)value == 0d)
result.Type = VariantType.DOUBLEZERO;
else
result.ValueDouble = (double)value;
}
else if (value is decimal)
result.ValueDecimal = ((decimal)value).ToString("r", CultureInfo.InvariantCulture);
else if (value is DateTime)
result.ValueDatetime = ((DateTime)value).ToString("o", CultureInfo.InvariantCulture);
else
throw new ArgumentException(String.Format("Cannot store data type {0} in Variant", value.GetType().FullName), "value");
return result;
}
public object Value
{
get
{
switch (Type)
{
case VariantType.BOOLFALSE:
return false;
case VariantType.BOOLTRUE:
return true;
case VariantType.NULL:
return null;
case VariantType.DOUBLEZERO:
return 0d;
case VariantType.FLOATZERO:
return 0f;
case VariantType.INT32ZERO:
return 0;
case VariantType.INT64ZERO:
return (long)0;
default:
if (ValueInt32 != 0)
return ValueInt32;
if (ValueInt64 != 0)
return ValueInt64;
if (ValueFloat != 0f)
return ValueFloat;
if (ValueDouble != 0d)
return ValueDouble;
if (ValueString != null)
return ValueString;
if (ValueBytes != null)
return ValueBytes;
if (ValueDecimal != null)
return Decimal.Parse(ValueDecimal, CultureInfo.InvariantCulture);
if (ValueDatetime != null)
return DateTime.Parse(ValueDatetime, CultureInfo.InvariantCulture, DateTimeStyles.RoundtripKind); // RoundtripKind keeps the Kind written by the "o" format
return null;
}
}
}
}
}
EDIT:
Further comments from @Marc Gravell have improved the implementation significantly. See the Git repository for a complete implementation of this concept.

Protobuf itself doesn't support any kind of VARIANT type.
You can emulate one with the union technique; see more details here.
The main idea is to define a wrapper message that has every existing message type as an optional field, and to use the union to indicate which type a concrete message holds.
See the example by following the link above.

I use ProtoInclude with an abstract base type and subclasses to get the type and single value statically set. Here's the start of what that could look like for Variant:
[ProtoContract]
[ProtoInclude(1, typeof(Variant.Integer))]
[ProtoInclude(2, typeof(Variant.String))]
public abstract class Variant
{
[ProtoContract]
public sealed class Integer : Variant
{
[ProtoMember(1)]
public int Value;
}
[ProtoContract]
public sealed class String : Variant
{
[ProtoMember(1)]
public string Value;
}
}
Usage:
var foo = new Variant.String { Value = "Bar" };
var baz = new Variant.Integer { Value = 10 };
This approach takes a bit more space, as it encodes the length of the ProtoInclude'd class instance (e.g. 1 byte for an int, or for strings shorter than 125 bytes). I am willing to live with this for the benefit of controlling the type statically.

Related

Multiple condition for one if value

public enum Waypointtype { Start, Point, End };
Waypoint currentPoint = m_ListPoints[i];
if(currentPoint.Type == (Waypointtype.Start || Waypointtype.End))
Hello, is there a way to do this "if" like above in C#? I am a bit lazy and always looking for a way to write shorter code. Or is the only way like below?
if (currentPoint.Type == Waypointtype.Start || currentPoint.Type == Waypointtype.End)
I don't think there's a shorter way than what you already have. There are two other approaches that I can think of. For example, you could use a switch statement:
switch (currentPoint.Type)
{
case Waypointtype.Start:
case Waypointtype.End:
// do stuff
break;
default:
// default case
break;
}
Or you could use an array with contains:
if (new [] { Waypointtype.Start, Waypointtype.End }.Contains(currentPoint.Type))
In my opinion, the switch conveys intent better here.
The Flags attribute can be the right tool for the job:
[Flags]
public enum Waypointtype
{
Start = 1,
Point = 2,
End = 4
};
Notice that enumeration values should be in powers of two: 1, 2, 4, 8, and so on.
Usage
const Waypointtype StartOrEnd = Waypointtype.Start | Waypointtype.End;
var current = Waypointtype.Start;
if ((StartOrEnd & current) == current)
{
// current type is one of values from test type.
}
The right answer should be @Fabio's answer of using the enum Flags attribute.
But, because we are using an object-oriented programming language, we should benefit from it.
The condition uses the class Waypoint and its property Type, of enum type Waypointtype.
So only the class should know "am I of start or end type?".
By encapsulating the condition within the class, we can provide a readable name and protect class consumers from knowing implementation details.
// Use FlagsAttribute
[Flags]
public enum WaypointType { Start = 1, Point = 2, End = 4 };
public class Waypoint
{
private const WaypointType START_OR_END = WaypointType.Start | WaypointType.End;
public WaypointType Type { get; set; }
public bool IsStartOrEnd => (START_OR_END & Type) == Type;
}
Usage becomes short, readable and reusable.
Waypoint currentPoint = m_ListPoints[i];
if (currentPoint.IsStartOrEnd)
{
// do stuff
}
Notice that we (developers) read code much more than we write it (80% vs 20% maybe).
So instead of writing short code, write it in a way that can be read and understood quickly.
Sometimes that can be done by writing short code, and sometimes by encapsulating short code in a comprehensible structure.
You could add an extension method for Waypoint:
public static class WaypointExtensions
{
public static bool IsStartOrEnd(this Waypoint waypoint)
{
if (waypoint == null)
{
return false;
}
return (waypoint.Type == Waypointtype.Start || waypoint.Type == Waypointtype.End);
}
}
And then use it like:
Waypoint currentPoint = m_ListPoints[i];
if(currentPoint.IsStartOrEnd())
{
...
}
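If the project can target C# 9 or later, the `or` pattern combinator gets close to the syntax the question asks for. This is a minimal sketch; `WaypointChecks` is a hypothetical helper class, and the enum is repeated only to keep the block self-contained.

```csharp
public enum Waypointtype { Start, Point, End }

// Hypothetical helper; requires C# 9 or later for the 'or' pattern.
public static class WaypointChecks
{
    public static bool IsStartOrEnd(Waypointtype type)
    {
        // Reads as "type is Start, or type is End".
        return type is Waypointtype.Start or Waypointtype.End;
    }
}
```

The same expression can be written inline, for example `if (currentPoint.Type is Waypointtype.Start or Waypointtype.End)`.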

Implement CompareTo() - compare by various functions

I implemented CompareTo() like so:
public override int CompareTo(object obj)
{
//quick failsafe
MyClass other;
if (obj is MyClass)
{
other = obj as MyClass;
}
else
{
return 1;
}
//now we should have another comparable object.
/*
* 1: this is greater.
* 0: equals.
* -1: this is less.
*/
if (other.GetValue() < this.GetValue())
{
// this is bigger
return 1;
}
else if (other.GetValue() > this.GetValue())
{
//this is smaller
return -1;
}
else
{
return 0;
}
}
However, things get interesting when I want to choose the function GetValue(). I have a couple of them set up for that: namely Average(), Best(), CorrectedAverage(), Median(). I compare by an array of floats, by the way. The thing is, I don't want to use a switch-case on an enum I defined in this class to tell what to order by. Is there a nice, clean way to decide which function to order by?
Given that your class has a whole bunch of different ways of comparing it, it almost certainly shouldn't implement IComparable at all.
Instead, create IComparer<T> instances for each different way of comparing your object. Someone who wants to compare instances of the type can then pick the comparer that uses the comparison most appropriate for their situation.
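A minimal sketch of that idea follows, using `Comparer<T>.Create` (available since .NET 4.5). The `MyClass` shown here is a hypothetical stand-in for the asker's class: it assumes a non-empty float array and guesses that Best() means the maximum, neither of which is stated in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the asker's class; assumes a non-empty array.
public sealed class MyClass
{
    private readonly float[] _values;

    public MyClass(params float[] values) { _values = values; }

    public float Average() { return _values.Average(); }

    // Assumption: "best" means the largest value.
    public float Best() { return _values.Max(); }

    public float Median()
    {
        float[] sorted = _values.OrderBy(v => v).ToArray();
        int mid = sorted.Length / 2;
        return sorted.Length % 2 == 1
            ? sorted[mid]
            : (sorted[mid - 1] + sorted[mid]) / 2f;
    }
}

// One comparer per ordering; callers pick the one they need.
public static class MyClassComparers
{
    public static readonly IComparer<MyClass> ByAverage =
        Comparer<MyClass>.Create((a, b) => a.Average().CompareTo(b.Average()));

    public static readonly IComparer<MyClass> ByBest =
        Comparer<MyClass>.Create((a, b) => a.Best().CompareTo(b.Best()));

    public static readonly IComparer<MyClass> ByMedian =
        Comparer<MyClass>.Create((a, b) => a.Median().CompareTo(b.Median()));
}
```

A caller would then write something like `items.Sort(MyClassComparers.ByMedian)` or `items.OrderBy(x => x, MyClassComparers.ByAverage)`, and the class itself needs no switch on an enum.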

Is there a cleaner way to represent this idiom in C#? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
I am using a struct in a project, like so:
struct Position
{
public int X { get; private set; }
public int Y { get; private set; }
// etc
}
I would like to add a method that allows me to create a modified copy of the struct with arbitrarily changed properties. For example, it would be convenient to use this:
var position = new Position(5, 7);
var newPos = position.With(X: position.X + 1);
Is this idiom hacky? Are there better ways to support this?
public Position With(int? X = null, int? Y = null)
{
return new Position(X ?? this.X, Y ?? this.Y);
}
Edit: in case it was unclear, the struct is immutable, I simply want to create a new value with some values modified. Incidentally, this is very similar to Haskell's syntactic sugar for records, where one would write newPos = oldPos { x = x oldPos + 1 }. This is just a bit experimental as to whether such an idiom is helpful in C#.
Personally, I consider the idiom of a plain-old-data-struct to be vastly underrated. Mutable structs which encapsulate state in anything other than public fields are problematic, but sometimes it's useful to bind together a fixed collection of variables stuck together with duct tape so they can be passed around as a unit. A plain-old-data-struct is a perfect fit for that usage; it behaves like a fixed collection of variables stuck together with duct tape, since that's what it is. One can with some work come up with an immutable class which requires slow and hard-to-read code to do anything with, or with some more work come up with something that's still slow but not quite so unaesthetic; one can also code structures in such fashion as to mimic such classes. In many cases, however, the only effect of going through all that effort is that one's code will be slower and less clear than it would have been if one had simply used a PODS.
The key thing that needs to be understood is that a PODS like struct PersonInfo { public string Name, SSN; public Date Birthdate; } does not represent a person. It represents a space that can hold two strings and a date. If one says var fredSmithInfo = myDatabase.GetPersonInfo("Fred Smith");, then fredSmithInfo.Birthdate doesn't represent Fred Smith's birthdate; it represents a variable of type Date which is initially loaded with the value returned by a call to GetPersonInfo--but which, like any other variable of type Date, could be changed to hold any other date.
That's about as neat a way as you're going to get. Doesn't seem particularly hacky to me.
Although in cases where you're just doing position.X + 1 it'd be neater to have something that was like:
var position = new Position(5,7);
var newPos = position.Add(new Position(1,0));
Which would give you a modified X value but not a modified Y value.
One could consider this approach as a variant of the prototype pattern where the focus is on having a template struct rather than avoiding the cost of new instances. Whether the design is good or bad depends on your context. If you can make the message behind the syntax clear (I think the name With you're using is a bit unspecific; maybe something like CreateVariant or CreateMutant would make the intention clearer), I would consider it an appropriate approach.
I'm adding an expression based form as well. Do note the horrendous boxing/unboxing which needs to be done due to the fact that it is a struct.
But as one can see the format is quite nice:
var p2 = p.With(t => t.X, 4);
var p3 = p.With(t => t.Y, 7).With(t => t.X, 5); // Yeah, replace all the values :)
And the method is really applicable to all kinds of types.
public void Test()
{
var p = new Position(8, 3);
var p2 = p.With(t => t.X, 4);
var p3 = p.With(t => t.Y, 7).With(t => t.X, 5);
Console.WriteLine(p);
Console.WriteLine(p2);
Console.WriteLine(p3);
}
public struct Position
{
public Position(int X, int Y)
{
this._X = X; this._Y = Y;
}
private int _X; private int _Y;
public int X { get { return _X; } private set { _X = value; } }
public int Y { get { return _Y; } private set { _Y = value; } }
public Position With<T, P>(Expression<Func<Position, P>> propertyExpression, T value)
{
// Copy this
var copy = (Position)this.MemberwiseClone();
// Get the expression, might be both MemberExpression and UnaryExpression
var memExpr = propertyExpression.Body as MemberExpression ?? ((UnaryExpression)propertyExpression.Body).Operand as MemberExpression;
if (memExpr == null)
throw new Exception("Empty expression!");
// Get the propertyinfo, we need this one to set the value
var propInfo = memExpr.Member as PropertyInfo;
if (propInfo == null)
throw new Exception("Not a valid expression!");
// Set the value via boxing and unboxing (mutable structs are evil :) )
object copyObj = copy;
propInfo.SetValue(copyObj, value); // Since struct are passed by value we must box it
copy = (Position)copyObj;
// Return the copy
return copy;
}
public override string ToString()
{
return string.Format("X:{0,4} Y:{1,4}", this.X, this.Y);
}
}
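Newer language versions have this idiom built in. As a sketch, assuming C# 10 or later, a record struct supports `with` expressions directly, which covers the non-destructive update the question describes without hand-written With methods or reflection. `PositionDemo` and `MoveRight` are hypothetical names for illustration.

```csharp
// Requires C# 10 or later (record structs).
public readonly record struct Position(int X, int Y);

// Hypothetical helper showing the 'with' expression in use.
public static class PositionDemo
{
    public static Position MoveRight(Position position)
    {
        // 'with' copies the value and replaces only the named properties.
        return position with { X = position.X + 1 };
    }
}
```

Here `PositionDemo.MoveRight(new Position(5, 7))` yields a position with X = 6 and Y = 7, and the original value is left unchanged.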

C# Assigning a variable from a different object

I'm not quite sure how to ask my question in C# terms, so please bear with the long-winded explanation.
I'm writing a stock trading algorithm. When the algo starts, it checks to see what kind of instrument it is applied to (in this case, either stock or futures), and then depending on the instrument, assigns a value to "double x".
If it's a futures instrument, then the assignment is a simple, flat value (in this case, "double x = 5;"). However, if it's a stock, I'd like "x" to be assigned a value from another object - let's call the object "Algo2" and the value "y". So, in my script the assignment is as follows: "double x = Algo2.y" (note: that's the convention in the editor I'm using). This block of code is run only once when the algorithm begins.
What I'm trying to achieve here is to tell my algorithm to get the latest value of "Algo2.y" whenever "x" is used in a formula such as "EntryValue = Price + x". However, what's happening is that "x" is permanently assigned the value of "Algo2.y" at the start of the program, and since that block is never run again, it remains that constant value throughout.
Can anyone help with the syntax so that instead of assigning a value to "x", it simply points to the latest value of "Algo2.y" whenever it's called?
Thanks!
Make 'x' a property, so that it fetches the value each time you ask for x.
class StockInstrument
{
public double Value //x isn't a good name, I'll use "Value"
{
get
{
if(...) return 5.0;
else return Algo2.y;
}
}
}
Write a function for it:
double getAlgo2YValue()
{
return Algo2.y; // or Algo2.getY(), another function if you can't access it
}
In your main algorithm, now call:
x = getAlgo2YValue();
To update X.
I would use a method to return your latest value
public double GetXValue()
{
if (AlgoType == Algos.Futures)
{
return 5.0;
}
else if (AlgoType == Algos.Stock)
{
return Algo2.y;
}
//else
throw new Exception("unknown algo type");
}
This is quite hard-coded; it could be cleaned up using delegates and encapsulation of the algorithms, but at a low level this is the idea. Also, some people prefer to use properties for this. Just don't use a property when the getter has side effects:
public double X
{
get
{
if (AlgoType == Algos.Futures)
{
return 5.0;
}
else if (AlgoType == Algos.Stock)
{
return Algo2.y;
}
//else
throw new Exception("unknown algo type");
}
}
May use something like:
double X {
get {
if(isStock())
return Algo2.y;
else
return 5;
}
}
Or, with a delegate:
Func<double> getX;
if(isFuture)
getX = () => 5.0;
else
getX = () => Algo2.y;
// calling getX() will always return the current value of Algo2.y,
// in case it's a stock.
double xval = getX();
Give Algo a reference to Algo2 so that no 'double X' copy is needed. Algo can then read the actual value in Algo2 at any time (thread-safety may be an issue).
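As a rough sketch of the reference-holding idea, one algorithm object keeps a reference to the other and reads y on demand instead of copying it. The class shapes below are hypothetical; the real Algo2 lives in the trading platform and may expose y differently.

```csharp
// Hypothetical shapes for the two algorithm objects.
public class Algo2
{
    public double y { get; set; }
}

public class Algo
{
    // Holds the object, not a copy of its value.
    private readonly Algo2 _source;

    public Algo(Algo2 source) { _source = source; }

    // Reads the current y every time it is called.
    public double EntryValue(double price)
    {
        return price + _source.y;
    }
}
```

If y later changes on the Algo2 instance, the next call to EntryValue picks up the new value. This sketch does no synchronization, so it does not address the thread-safety concern.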
Value data types, such as int are always going to be copied by value, not as a reference. However, what you can do is architect your solution a little differently, and then it will provide the right value. For example:
public class ValueContainer
{
protected Algo2 _reference = null;
protected double _staticValue = 0;
public double CurrentValue
{
get
{
if(_reference == null)
return _staticValue;
return _reference.y;
}
}
public ValueContainer(Algo2 reference)
{
_reference = reference;
}
public ValueContainer(double value)
{
_staticValue = value;
}
}
Then, you replace your x with the ValueContainer instance wherever needed and use the CurrentValue property to get the value. You create each version with a different constructor then:
ValueContainer container = null;
if(stock)
container = new ValueContainer(Algo2);
else
container = new ValueContainer(5);
What you need is a property wrapper for x to control the value that's returned, based on the instrument type. Here's an example, which will require some significant adaptation for your app.
public class Instrument
{
// an example enum holding types
public InstrumentType Type {get; set;}
// x is not a great name, but following your question's convention...
public double X
{
get
{
if(Type == InstrumentType.Stock)
return Algo2.y();
// note that I changed this to be a method rather than a property
// Algo2.y() should be static so it can be called without an instance
else if(Type == InstrumentType.Future)
return 5.0;
else
return 0.0; // placeholder: return some default value here
}
}
}

Why do 2 delegate instances return the same hashcode?

Take the following:
var x = new Action(() => { Console.Write("") ; });
var y = new Action(() => { });
var a = x.GetHashCode();
var b = y.GetHashCode();
Console.WriteLine(a == b);
Console.WriteLine(x == y);
This will print:
True
False
Why is the hashcode the same?
It is kinda surprising, and will make using delegates in a Dictionary as slow as a List (aka O(n) for lookups).
Update:
The question is why. IOW who made such a (silly) decision?
A better hashcode implementation would have been:
return Method.GetHashCode() ^ (Target == null ? 0 : Target.GetHashCode());
// where Method is IntPtr
Easy! Since here is the implementation of the GetHashCode (sitting on the base class Delegate):
public override int GetHashCode()
{
return base.GetType().GetHashCode();
}
(sitting on the base class MulticastDelegate which will call above):
public sealed override int GetHashCode()
{
if (this.IsUnmanagedFunctionPtr())
{
return ValueType.GetHashCodeOfPtr(base._methodPtr);
}
object[] objArray = this._invocationList as object[];
if (objArray == null)
{
return base.GetHashCode();
}
int num = 0;
for (int i = 0; i < ((int) this._invocationCount); i++)
{
num = (num * 0x21) + objArray[i].GetHashCode();
}
return num;
}
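Using tools such as Reflector, we can see the code, and the default implementation really is as strange as it looks above.
For a delegate with a single target, _invocationList is not an object[], so the hash falls back to the hash code of the delegate's type, which here is Action for both x and y. Hence the result above is correct.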
UPDATE
My first attempt of a better implementation:
public class DelegateEqualityComparer:IEqualityComparer<Delegate>
{
public bool Equals(Delegate del1,Delegate del2)
{
return (del1 != null) && del1.Equals(del2);
}
public int GetHashCode(Delegate obj)
{
if(obj==null)
return 0;
int result = obj.Method.GetHashCode() ^ obj.GetType().GetHashCode();
if(obj.Target != null)
result ^= RuntimeHelpers.GetHashCode(obj.Target); // hash the target, not the delegate instance, so equal delegates hash alike
return result;
}
}
The quality of this should be good for single cast delegates, but not so much for multicast delegates (If I recall correctly Target/Method return the values of the last element delegate).
But I'm not really sure if it fulfills the contract in all corner cases.
Hmm, it looks like equality requires referential equality of the targets.
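To show how such a comparer is plugged in, here is a self-contained sketch that passes one to a Dictionary. `DelegateComparer` is a compact variant of the class above that hashes the target rather than the delegate instance, and `DelegateLookupDemo` is a hypothetical name; like the original, it is only intended for single-cast delegates.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Compact variant of the comparer above; intended for single-cast delegates.
public sealed class DelegateComparer : IEqualityComparer<Delegate>
{
    public bool Equals(Delegate a, Delegate b)
    {
        // Handles nulls, then defers to Delegate.Equals.
        return object.Equals(a, b);
    }

    public int GetHashCode(Delegate d)
    {
        if (d == null)
            return 0;
        int hash = d.Method.GetHashCode() ^ d.GetType().GetHashCode();
        if (d.Target != null)
            hash ^= RuntimeHelpers.GetHashCode(d.Target);
        return hash;
    }
}

public static class DelegateLookupDemo
{
    public static string Lookup()
    {
        Action first = () => Console.Write("");
        Action second = () => { };

        // The comparer supplies the hash; Dictionary itself is unchanged.
        var names = new Dictionary<Delegate, string>(new DelegateComparer());
        names[first] = "first";
        names[second] = "second";

        return names[first] + "," + names[second];
    }
}
```

The lookups return the right entries whether or not the hash codes collide; the comparer only affects how evenly the keys spread across buckets.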
This smells like some of the cases mentioned in this thread, maybe it will give you some pointers on this behaviour. else, you could log it there :-)
What's the strangest corner case you've seen in C# or .NET?
Rgds GJ
From MSDN:
The default implementation of GetHashCode does not guarantee uniqueness or consistency; therefore, it must not be used as a unique object identifier for hashing purposes. Derived classes must override GetHashCode with an implementation that returns a unique hash code. For best results, the hash code must be based on the value of an instance field or property, instead of a static field or property.
So if you have not overridden the GetHashCode method, it may return the same value. I suspect this is because it generates it from the definition, not the instance.
