Problems with Hashing (with BinaryFormatter) with class object containing negative number - c#

Basically, we have a number of POCOs which we convert to hash values. The purpose is to use the hash string as the unique identifier for that particular object. If we find another object with the same values, the hash string should be the same, etc.
However, we have come across a problem where the hash results appear to be the same if the integer field contains a negative number.
Below is our extension method to Serialize and hash the given object: -
public static string Serialize<T>(this T classObject) where T : class
{
var formatter = new BinaryFormatter();
using (var stream = new MemoryStream())
{
formatter.Serialize(stream, classObject);
stream.Position = 0;
var sr = new StreamReader(stream);
var text = sr.ReadToEnd();
return text;
}
}
public static string ToHash(this string str)
{
var bytes = Encoding.UTF8.GetBytes(str);
var md5 = new SHA256CryptoServiceProvider();
byte[] result = md5.ComputeHash(bytes);
return Convert.ToBase64String(result);
}
In order to demonstrate this problem, I have created a sample class: -
[Serializable]
public class TestClass
{
public string StringA;
public string StringB;
public int? Created;
}
Here is my test code...
var testZero = new TestClass
{
StringA = "String A",
StringB = "String B",
Created = 0,
};
var testNull = new TestClass
{
StringA = "String A",
StringB = "String B",
Created = null,
};
var testMinusOne = new TestClass
{
StringA = "String A",
StringB = "String B",
Created = -1
};
var testMinusTwo = new TestClass
{
StringA = "String A",
StringB = "String B",
Created = -2
};
var testMinusThree = new TestClass
{
StringA = "String A",
StringB = "String B",
Created = -3
};
var testMinusOneHundred = new TestClass
{
StringA = "String A",
StringB = "String B",
Created = -100
};
var testOneHundred = new TestClass
{
StringA = "String A",
StringB = "String B",
Created = 100
};
var rHashZero = testZero.Serialize().ToHash();
var rHashNull = testNull.Serialize().ToHash();
var rHashMinusOne = testMinusOne.Serialize().ToHash();
var rHashMinusTwo = testMinusTwo.Serialize().ToHash();
var rHashMinusThree = testMinusThree.Serialize().ToHash();
var rHashMinusHundred = testMinusOneHundred.Serialize().ToHash();
var rHashHundred = testOneHundred.Serialize().ToHash();
The variables (at the end) contain the following values :-
rHashZero = "aFJROVaqEbWneZJkDnB00qkxPf4TF/w+22VhgR+4nHU=";
rHashNull = "0/tsIhQzZK+Jirnee1o8QTjU8G1hOB/ODdnr2UipBPU=";
rHashMinusOne = "Q5xsfYpm/Em2vw19N9283Gq9fUoI7WxN+ip61S/m3h0=";
rHashMinusTwo = "Q5xsfYpm/Em2vw19N9283Gq9fUoI7WxN+ip61S/m3h0=";
rHashMinusThree = "Q5xsfYpm/Em2vw19N9283Gq9fUoI7WxN+ip61S/m3h0=";
rHashMinusHundred = "Q5xsfYpm/Em2vw19N9283Gq9fUoI7WxN+ip61S/m3h0=";
rHashHundred = "3q6S9vZPujnSc5b2YAbtD61Dj+4B5ZzoILnL1lH291M=";
My main question is why are the objects with the negative integer value all return the same hash string? Despite StringA and StringB being the same, the Created field is not the same.
If anyone can explain this to me - that would be great. Also, Is there a solution?
I have also tested this by removing the nullable (?) from the int, but the results are the same.
PS -- I am convinced I came across a site which mentioned something about negative numbers, but was convinced it was 'fixed' in a later .net release. This is going back a while now so that site may no longer exist.
I tried to find info about this on the internet but no luck. Maybe I am not using correct words on a search engine?
Any help is appreciated.

The problem is that you're reading the output of the BinaryFormatter as if it were properly formed text (StreamReader decodes it as UTF-8 by default). It is not.
Unlike ASCII, Unicode encodings are not a simple 1:1 mapping between bytes and characters, so arbitrary bytes do not survive a trip through a string. This means that you managed to mangle the data. It's obvious when you print out the string that results from the Serialize method:
For the 100 case, I get
□□□□□����□□□□□□□□□□□□□Cquery_rtzxks, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null□□□□□□UserQuery+TestClass□□□□□StringA□StringB□Created□□□□System.Int32□□□□□□□□□□String A□□□□□□String B□□d□□□□
While for -100, I get
□□□□□����□□□□□□□□□□□□□Cquery_rtzxks, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null□□□□□□UserQuery+TestClass□□□□□StringA□StringB□Created□□□□System.Int32□□□□□□□□□□String A□□□□□□String B□□����□
(The namespaces etc. are from LINQPad. The important part is the values like □����□ right there at the end.)
It should be rather obvious that your "conversion" is throwing away tons of data. This makes your code appear to work sometimes, but that is the exception: by chance, some bytes of the serialized integer happen to form valid characters, which then results in a different string. Bytes that do not form valid characters are all replaced by the same � character, so different values end up as the same string.
The solution is simple - don't pretend random byte sequences are valid text. Just pass the byte[] you get from stream.ToArray() to the hash function and be done with it. If you absolutely want a string for some reason, use Convert.ToBase64String.
Also, since this isn't clear in your question, do not treat hashes as unique - they are not. The relation is "if the values are the same, the hashes must be the same", but not "if the hashes are the same, the values must be the same". So in a way, your hashing function is just fine, it doesn't violate this relation. It's not all that useful either, though.
So, why is this giving you trouble for negative numbers specifically? The short answer is "it doesn't". This has to do with how numbers are stored by BinaryFormatter: an int is written as four raw bytes, least significant byte first, and small negative values have almost all of their bits set; for example, -1 is 0xFFFFFFFF. Those bytes are turned into �, of course, because they do not form valid characters. On the other hand, the positive test values you've used are relatively small, and have a good chance of hitting ASCII-like code points. For example, the value 100 is stored as the bytes 64 00 00 00, and 0x64 is d, which is fine. However, 65535 and 65532, for example, will have the same "string" representation, because their low bytes (FF FF and FC FF) are not valid characters and are all resolved into �. When you then feed this to your hashing function, the two input strings will be exactly the same. Conversely, -3 and -65532 will produce different hashes, for example, even though both are negative.
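The effect described above can be reproduced without BinaryFormatter. The following is a minimal sketch, not the asker's code: it assumes the four raw bytes of an int (via BitConverter) stand in for the serialized field, and the class and method names are made up for illustration.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper, for illustration only.
public static class LossyDecodeDemo
{
    // Decodes the raw bytes of an int as UTF-8 text, which is what
    // StreamReader does by default. Invalid bytes become U+FFFD.
    public static string DecodeAsText(int value)
    {
        return Encoding.UTF8.GetString(BitConverter.GetBytes(value));
    }

    // Hashes the raw bytes directly, with no round trip through a string.
    public static string HashBytes(int value)
    {
        using (var sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(BitConverter.GetBytes(value));
            return Convert.ToBase64String(hash);
        }
    }
}
```

With this sketch, DecodeAsText(-1) and DecodeAsText(-2) are the same string of replacement characters, while DecodeAsText(100) and DecodeAsText(101) differ. HashBytes(-1) and HashBytes(-2) differ, because no bytes are lost before hashing.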

Thanks for everyone's answers. I have pretty much gone in the direction of using stream.ToArray() and Convert.ToBase64String to return the string. The results look promising at this time.
I apologise that this question is causing lots of "wtf", and I understand the downvote, with more likely to follow! I am not a hardcore C# developer and I am working on a large project at this time. I was not supposed to be on this, either! Trying to piece this project together was a bit of a challenge, especially when a half-finished change involved negative numbers.
Thanks again.

Related

Using Dictionaries to assign values to variables

I have the following code which is intended, as an example, to take a fruitName and assign it to the next available unused "name#" string variable (one that does not already have a fruit name assigned to it).
Wanting to avoid nested IF statements, I am trying to use a Dictionary as follows.
public static void AssignFruitToNextAvailableSlot(string fruitName)
{
string NextEmptyNameSlot = "";
string Name1 = "Apple";
string Name2 = "Orange";
string Name3 = "Tomato";
string Name4 = "";
string Name5 = "";
string Name6 = "";
string Name7 = "";
string Name8 = "";
string Name9 = "";
Dictionary<string, string> nameSlots = new Dictionary<string, string>()
{
{"Slot1", Name1},
{"Slot2", Name2},
{"Slot3", Name3},
{"Slot4", Name4},
{"Slot5", Name5},
{"Slot6", Name6},
{"Slot7", Name7},
{"Slot8", Name8},
{"Slot9", Name9}
};
foreach (KeyValuePair<string, string> nameSlot in nameSlots)
{
NextEmptyNameSlot = nameSlot.Key;
if (nameSlot.Value == "")
{
break;
}
}
Console.WriteLine($"Available name slot at {NextEmptyNameSlot}");
Console.WriteLine($"Content of empty name slot \"{nameSlots[NextEmptyNameSlot]}\"");
nameSlots[NextEmptyNameSlot] = fruitName;
Console.WriteLine($"Empty image slot has been assigned the value {nameSlots[NextEmptyNameSlot]}");
Console.WriteLine($"Empty image slot has been assigned the value {Name4}");
Console.ReadLine();
}
Sample output for AssignFruitToNextAvailableSlot("Strawberry") :
Available name slot at Slot4
Content of empty name slot ""
Empty image slot has been assigned the value Strawberry
Empty image slot has been assigned the value
As you can see the code works fine to identify the empty name slot, in this case Slot4. However when using the syntax...
nameSlots[NextEmptyNameSlot] = fruitName
... "strawberry" is assigned to nameSlots[NextEmptyNameSlot], but not the variable Name4. I tried using "ref" to assign by reference but that yielded various error messages.
What is the right syntax to assign the fruitName "Strawberry" to the Name4 string variable using the dictionary? Sorry if this is a very basic question. I am new to C#.
I think you are making this more complex than it needs to be. If you have a fixed number of elements and you want the order to remain the same then an array is the simplest way to handle the data.
Declare the array with 9 elements.
string[] fruitArray = new string[9];
Then keep a counter of how many slots you have used:
int NextSlot = 0;
Increment it after you add a fruit
NextSlot++;
Then you can iterate it to display the data
for(int loop = 0; loop < fruitArray.Length; loop++)
{
Console.WriteLine($"Slot {loop + 1} contains the value {fruitArray[loop]}");
}
If you don't have a fixed number of elements, or you don't know the size during design time then you can use a List<string> instead of an array.
A List will keep the insertion order of the data, until you sort it.
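Put together, the array approach might look like the sketch below. The FruitSlots class and its member names are hypothetical, chosen only for illustration; it assumes slots are filled strictly in order and never freed.

```csharp
using System;

// Hypothetical wrapper around a fixed-size array of slots.
public class FruitSlots
{
    private readonly string[] _slots;
    private int _nextSlot; // index of the first unused slot

    public FruitSlots(int capacity)
    {
        _slots = new string[capacity];
    }

    // Number of slots that have been filled so far.
    public int Count
    {
        get { return _nextSlot; }
    }

    // Read access to a slot by zero-based index.
    public string this[int index]
    {
        get { return _slots[index]; }
    }

    // Stores the fruit in the next unused slot and returns that slot's index.
    public int Assign(string fruitName)
    {
        if (_nextSlot >= _slots.Length)
        {
            throw new InvalidOperationException("All slots are in use.");
        }
        _slots[_nextSlot] = fruitName;
        return _nextSlot++;
    }
}
```

After assigning "Apple", "Orange" and "Tomato", a call to Assign("Strawberry") returns index 3, which corresponds to "Slot4" in the question.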
While it's not the primary problem here, a confusing factor is that strings are immutable, which means if you "change it" then it's thrown away and a new one is created. I'll come back to this point in a moment
The major problem here is that you can not establish a reference to something:
string s = "Hello";
Then establish another reference to it (like you're doing with dictionary)
string t = s;
Then "change the original thing" by changing this new reference out for something else:
t = "Good bye";
And hope that the original s has changed. s still points to a string saying "Hello". t used to point to "Hello" as well (it never pointed to s, which in turn pointed to "Hello", in some sort of chain) but now points to a different string, "Good bye". We never used the new keyword when we said "Good bye", but it is a separate string object all the same, and the assignment changed t to refer to it. Because references aren't chained, you cannot change what a downstream variable points to and hope that an upstream variable pointing to the same thing will also change.
//we had this
s ===> "Hello" <=== t
//not this
t ==> s ==> "Hello"
//then we had this
s ===> "Hello" t ===> "Good bye"
Because we have two independent references that point to the same thing, the only way you can operate on one reference and have the other see it is by modifying the contents of what they point to, which means you will have to use something mutable, and not throw it away. This is where string confuses things, because strings cannot be modified once made. They MUST be thrown away and a new one made. You don't see the new - it is done for you.
So instead of using strings, we have to use something like a string but can have its contents altered:
StringBuilder sb1 = new StringBuilder("Hello");
StringBuilder sb2 = null;
var d = new Dictionary<string, StringBuilder>();
d["sb1"] = sb1;
d["sb2"] = sb2;
Now you can change your string in the mutable sb1, by accessing the stringbuilder via the dictionary:
d["sb1"].Clear();
d["sb1"].Append("Good bye");
Console.Write(sb1.Length); //prints 8, because it contains: Good bye
But you still cannot assign new references to your sb variables using the dictionary:
d["sb2"] = new StringBuilder("Oh");
//sb2 is still and will always be null
Console.Write(sb2.Length); //null reference exception
Using new here stops the dictionary pointing to sb2 and points it to something else, sb2 is not changed and is still null. There is no practical way to set sb2 to a stringbuilder instance, by using the dictionary d
This is also why the original string approach didn't work out - you can't change the content of a string. C# throws the old string away and makes a new one, and only the variable or dictionary entry you assigned to points to the new one; every other reference still points to the old thing.
As a result you'll have to init all your references to something filled or empty:
var sb1 = new StringBuilder("Hello");
var sb2 = new StringBuilder("Goodbye");
var sb3 = new StringBuilder("");
var sb4 = new StringBuilder("");
You'll have to link your dictionary to all of them:
d["sb1"] = sb1;
d["sb2"] = sb2;
d["sb3"] = sb3;
d["sb4"] = sb4;
And you'll have to skip through your dictionary looking for an empty one:
//find empty one
for(int i = 1; i <= 4; i++){
if(d["sb"+i].Length ==0)
return "sb"+i;
}
And then change its contents
This is all massively complex, and I wholeheartedly agree with the answer given by jason that tells you to use arrays (because it's what they were invented for), but I hope I've answered your question as to why C# didn't work the way you expected.
If such a strange thing is absolutely necessary, you can use reflection. Note that this only works if the names are properties of a class, not local variables as in the question.
var fruitSlots = this.GetType().GetProperties()
.Where(p => p.Name.StartsWith("Name")).OrderBy(p => p.Name).ToList();
You'll get a List of PropertyInfo objects, ordered by property name, which you can iterate over or query with LINQ.
fruitSlots.First(fs => (string)fs.GetValue(this) == "").SetValue(this, "somefruit");
But mind that reflection is not fast, so it is not good for performance.
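For completeness, here is a self-contained sketch of the reflection idea. It assumes the slots are public string properties named Name1, Name2 and so on; the FruitBasket class and the AssignToNextEmpty method are hypothetical and not from the question.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical class whose "slots" are properties, so reflection can see them.
public class FruitBasket
{
    public string Name1 { get; set; } = "Apple";
    public string Name2 { get; set; } = "Orange";
    public string Name3 { get; set; } = "Tomato";
    public string Name4 { get; set; } = "";
    public string Name5 { get; set; } = "";

    // Writes fruitName into the first empty NameN property and returns the
    // property's name, or null when every slot is already filled.
    public string AssignToNextEmpty(string fruitName)
    {
        PropertyInfo slot = GetType().GetProperties()
            .Where(p => p.PropertyType == typeof(string)
                        && p.Name.StartsWith("Name", StringComparison.Ordinal))
            // Ordinal name order is only correct for single-digit suffixes.
            .OrderBy(p => p.Name, StringComparer.Ordinal)
            .FirstOrDefault(p => string.IsNullOrEmpty((string)p.GetValue(this)));

        if (slot == null)
        {
            return null;
        }
        slot.SetValue(this, fruitName);
        return slot.Name;
    }
}
```

Calling AssignToNextEmpty("Strawberry") on a fresh FruitBasket returns "Name4" and sets the Name4 property. An array or list remains the simpler design.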

Turning the decimal values of characters in binary into character

I am working on a project that turns a message into ASCII decimal values... that side is not important; the problem is that it needs to read the message back. The translation is basically like this:
if (textBox1.Text.Contains("a"))
{
textBox3.Text = textBox3.Text.Replace("a", "97");
}
if (textBox1.Text.Contains("b"))
{
textBox3.Text = textBox3.Text.Replace("b", "98");
}
.
.
.
if (textBox1.Text.Contains("Ğ"))
{
textBox3.Text = textBox3.Text.Replace("Ğ", "286");
}
if (textBox1.Text.Contains("ş"))
{
textBox3.Text = textBox3.Text.Replace("ş", "351");
}
This translation works perfectly,
but translating the output back is the problem.
My method for translating back, in a nutshell:
if (sonmesajBinary.Text.Contains("97"))
{
okunanMesaj.Text = okunanMesaj.Text.Replace("97", "a");
}
if (sonmesajBinary.Text.Contains("98"))
{
okunanMesaj.Text = okunanMesaj.Text.Replace("98", "b");
}
if (sonmesajBinary.Text.Contains("99"))
{
okunanMesaj.Text = okunanMesaj.Text.Replace("99", "c");
}
The problem is, let's say the output is 140,
but it also includes "40",
so the PC gets it wrong. That's my problem, and I require your kind help :).
I am kind of a noob, so sorry for my mistakes; I am 17, and English is not my native language.
Note: the ASCII values might not be the real ones, these are just examples.
There are many problems with your code there. Checking Contains will return true for any number of occurrences of a character at any location. You're checking in textBox1 and replacing in textBox3. You're checking each character known to you but it is possible there are more! There are easier ways of getting the byte/int/number equivalent of your character based on the encoding of your input.
Here's a rudimentary solution based on comments following the question. You however need to read more about code pages and then encodings. This is only part of the Encrypt operation. I'm sure you can figure out how to replace the contents and later also Decrypt to usable format. Cheers! Happy coding.
static void Main(string[] args)
{
string fileContents = "";
int encryptKey = 3; // Consider getting this from args[0], etc.
using (FileStream fs = File.OpenRead(@"C:\Users\My\Desktop\testfile.txt"))
using (TextReader tr = new StreamReader(fs))
{
fileContents = tr.ReadToEnd();
}
byte[] asciiBytesOfFile = Encoding.ASCII.GetBytes(fileContents);
int[] encryptedContents = Encrypt(encryptKey, asciiBytesOfFile);
}
private static int[] Encrypt(int encryptKey, byte[] asciiBytesOfFile)
{
int[] encryptedChars = new int[asciiBytesOfFile.Length];
for (int i = 0; i < asciiBytesOfFile.Length; i++)
{
encryptedChars[i] = encryptKey ^ asciiBytesOfFile[i];
}
return encryptedChars;
}
It was fixed thanks to Tom Blodget; all I needed to do was delimit the values. So I added a 0 to the beginning of every 2-digit value, so that every code is 3 digits long :D
if (textBox1.Text.Contains("a"))
{
textBox3.Text = textBox3.Text.Replace("a", "097");
}
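The fixed-width idea can be written without one if statement per character. This is only a sketch under the assumption that every character code fits in three decimal digits (true for the values mentioned above, such as 97, 286 and 351); the FixedWidthCodec class is a made-up name.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical codec: every character becomes exactly three decimal digits.
public static class FixedWidthCodec
{
    private const int Width = 3;

    public static string Encode(string message)
    {
        var digits = new StringBuilder(message.Length * Width);
        foreach (char c in message)
        {
            if (c > 999)
            {
                throw new ArgumentException("Character code does not fit in three digits.");
            }
            // "D3" pads with leading zeros, so 'a' (97) becomes "097".
            digits.Append(((int)c).ToString("D3", CultureInfo.InvariantCulture));
        }
        return digits.ToString();
    }

    public static string Decode(string digits)
    {
        if (digits.Length % Width != 0)
        {
            throw new ArgumentException("Input length must be a multiple of three.");
        }
        var message = new StringBuilder(digits.Length / Width);
        // Read the input in chunks of three digits, so "140" is never
        // mistaken for a "40" hiding inside it.
        for (int i = 0; i < digits.Length; i += Width)
        {
            int code = int.Parse(digits.Substring(i, Width), CultureInfo.InvariantCulture);
            message.Append((char)code);
        }
        return message.ToString();
    }
}
```

Here Encode("ab") gives "097098", and Decode("097098") gives back "ab". Because decoding always consumes three digits at a time, overlapping matches cannot occur.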

How to compare text in an if statement without checking upper- and lowercase

So here is my code:
if (txtboxAntwoord.Text == lblProvincie.Text)
{
}
What I want to achieve: make the if statement check whether the text is the same, but ignore whether the letters are upper- or lowercase.
Let's say lblProvincie's text is "Some Text" and I want to check whether the text in txtboxAntwoord is the same, but it shouldn't matter whether it is typed in upper- or lowercase.
You can use the .Equals method on string and pass in a string comparison option that ignores case.
if (string.Equals(txtboxAntwoord.Text, lblProvincie.Text,
StringComparison.OrdinalIgnoreCase))
for pure speed where culture-based comparison is unimportant
OR
if (string.Equals(txtboxAntwoord.Text, lblProvincie.Text,
StringComparison.CurrentCultureIgnoreCase))
if you need to take culture-based comparisons into account.
While this approach may be slightly more complicated, it is more efficient than the ToUpper() approach since new strings do not need to be allocated. It also has the advantage of being able to specify different comparison options such as CurrentCultureIgnoreCase.
While this may not be much of an impact on application performance in an isolated context, this will certainly make a difference when doing large amounts of string comparisons.
const string test1 = "Test1";
const string test2 = "test1";
var s1 = new Stopwatch();
s1.Start();
for (int i = 0; i < 1000000; i++)
{
if (!(test1.ToUpper() == test2.ToUpper()))
{
var x = "1";
}
}
s1.Stop();
s1.ElapsedMilliseconds.Dump();
var s2 = new Stopwatch();
s2.Start();
for (int i = 0; i < 1000000; i++)
{
if(!string.Equals(test1, test2,
StringComparison.OrdinalIgnoreCase))
{
var x = "1";
}
}
s2.Stop();
s2.ElapsedMilliseconds.Dump();
The first contrived example takes 265 milliseconds on my machine for 1 million iterations. The second only takes 25. In addition, there was additional string creation for each of those iterations.
Per Mike's suggestion in the comments, it is only fair to also profile CurrentCultureIgnoreCase. This is still more efficient than ToUpper, taking 114 milliseconds which is still over twice as fast as ToUpper and does not allocate additional strings.
You can use ToUpper() or ToLower() on both values so that both have the same case, upper or lower. You can do it like this:
if (txtboxAntwoord.Text.ToUpper() == lblProvincie.Text.ToUpper())
What you are looking for is called "case insensitive string comparison".
You can achieve it with Ehsan Sajjad's suggestion, but it would be inefficient, because for each comparison you would be generating at least one (in his example two, but that can be optimized) new string to contain the uppercase version of the string to compare to, and then immediately letting that string be garbage-collected.
David L's suggestion is bound to perform a lot better, though I would advise against StringComparison.OrdinalIgnoreCase, because it ignores the current culture.
Instead, use the following:
string.Equals( text1, text2, StringComparison.CurrentCultureIgnoreCase )
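To keep the choice between the two comparison modes in one place, the call can be wrapped in a small helper. This is a sketch only; the TextMatch class, the SameIgnoringCase method and the useCurrentCulture parameter are hypothetical names.

```csharp
using System;

// Hypothetical helper that centralizes the case-insensitive comparison.
public static class TextMatch
{
    // Returns true when both strings are equal apart from letter case.
    // Ordinal comparison is the default; pass true for culture-aware rules.
    public static bool SameIgnoringCase(string left, string right, bool useCurrentCulture = false)
    {
        StringComparison comparison = useCurrentCulture
            ? StringComparison.CurrentCultureIgnoreCase
            : StringComparison.OrdinalIgnoreCase;

        // The static string.Equals also copes with null arguments.
        return string.Equals(left, right, comparison);
    }
}
```

The if statement from the question would then read if (TextMatch.SameIgnoringCase(txtboxAntwoord.Text, lblProvincie.Text)).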

match multiple terms in a string to determine topic

I am trying to search a user-entered string against a list of known terms to determine a topic. That is, I maintain my own list of topics and related keywords, and want to match against the user-entered string to determine the topic(s) it relates to. However, I want to make sure multiple terms are "hit" to avoid false-positives.
e.g. based on the code:
//create a list of topic keywords
List<string> CivilWar = new List<string>()
{
"Confederacy", "Union", "Civil War", "Lincoln", "Stonewall Jackson"
};
//does the user agent string exist in the list?
bool isTopic = CivilWar.Exists(x => source.Contains(x));
return isTopic;
the string "Stonewall Jackson fought for the Confederacy" returns a correct positive / true result, but the string "John Kennedy Toole wrote A Confederacy of Dunces" returns a false positive / true result.
How can I make sure multiple terms are required to score a positive?
bool isTopic = CivilWar.Where(x => source.Contains(x)).Count() > 1;
Use Count instead of Exists, and make sure it is greater than 1 (multi-term):
//create a list of topic keywords
List<string> CivilWar = new List<string>()
{
"Confederacy", "Union", "Civil War", "Lincoln", "Stonewall Jackson"
};
//does the user agent string exist in the list?
return CivilWar.Count(x => source.Contains(x)) > 1; //must be greater than 1
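A slightly more general sketch makes the required number of hits a parameter and ignores case while matching. The TopicMatcher class, the MatchesTopic method and the minimumHits parameter are illustrative names, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for keyword-based topic detection.
public static class TopicMatcher
{
    // Returns true when at least minimumHits distinct keywords occur in source.
    public static bool MatchesTopic(string source, IEnumerable<string> keywords, int minimumHits = 2)
    {
        int hits = keywords.Count(
            keyword => source.IndexOf(keyword, StringComparison.OrdinalIgnoreCase) >= 0);
        return hits >= minimumHits;
    }
}
```

With the CivilWar list above, "Stonewall Jackson fought for the Confederacy" hits two keywords and matches, while "John Kennedy Toole wrote A Confederacy of Dunces" hits only one and does not.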

Using Protobuf-net, I suddenly got an exception about an unknown wire-type

(this is a re-post of a question that I saw in my RSS, but which was deleted by the OP. I've re-added it because I've seen this question asked several times in different places; wiki for "good form")
Suddenly, I receive a ProtoException when deserializing and the message is: unknown wire-type 6
What is a wire-type?
What are the different wire-type values and their description?
I suspect a field is causing the problem, how to debug this?
First thing to check:
IS THE INPUT DATA PROTOBUF DATA? If you try and parse another format (json, xml, csv, binary-formatter), or simply broken data (an "internal server error" html placeholder text page, for example), then it won't work.
What is a wire-type?
It is a 3-bit flag that tells it (in broad terms; it is only 3 bits after all) what the next data looks like.
Each field in protocol buffers is prefixed by a header that tells it which field (number) it represents,
and what type of data is coming next; this "what type of data" is essential to support the case where
unanticipated data is in the stream (for example, you've added fields to the data-type at one end), as
it lets the serializer know how to read past that data (or store it for round-trip if required).
What are the different wire-type values and their description?
0: variable-length integer, or "varint" (up to 64 bits) - base-128 encoded with the MSB indicating continuation (used as the default for integer types, including enums)
1: 64-bit - 8 bytes of data (used for double, or electively for long/ulong)
2: length-prefixed - first read an integer using varint encoding; this tells you how many bytes of data follow (used for strings, byte[], "packed" arrays, and as the default for child objects properties / lists)
3: "start group" - an alternative mechanism for encoding child objects that uses start/end tags - largely deprecated by Google, it is more expensive to skip an entire child-object field since you can't just "seek" past an unexpected object
4: "end group" - twinned with 3
5: 32-bit - 4 bytes of data (used for float, or electively for int/uint and other small integer types)
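To make the header layout concrete, here is a sketch of decoding a field header by hand. It is not protobuf-net's implementation; WireFormat, ReadVarint and DecodeFieldHeader are hypothetical names. It relies on the protocol buffers encoding rule that the header is a varint whose low 3 bits are the wire-type and whose remaining bits are the field number.

```csharp
using System;

// Hypothetical helpers for inspecting raw protobuf bytes while debugging.
public static class WireFormat
{
    // Reads one base-128 varint starting at offset and advances offset past it.
    public static ulong ReadVarint(byte[] data, ref int offset)
    {
        ulong value = 0;
        int shift = 0;
        while (true)
        {
            if (offset >= data.Length)
            {
                throw new InvalidOperationException("Truncated varint.");
            }
            if (shift > 63)
            {
                throw new InvalidOperationException("Varint is longer than 64 bits.");
            }
            byte b = data[offset++];
            value |= (ulong)(b & 0x7F) << shift; // the low 7 bits carry data
            if ((b & 0x80) == 0)
            {
                return value; // MSB clear: this was the last byte
            }
            shift += 7;
        }
    }

    // Splits a header value into its field number and 3-bit wire-type.
    public static void DecodeFieldHeader(ulong header, out int fieldNumber, out int wireType)
    {
        wireType = (int)(header & 0x7);
        fieldNumber = (int)(header >> 3);
    }
}
```

For the bytes 08 96 01, the header 0x08 decodes to field 1 with wire-type 0, and the varint that follows is 150. A stray text byte such as 'n' (0x6E) decodes to field 13 with wire-type 6, which is one way non-protobuf input ends up reported as an unknown wire-type.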
I suspect a field is causing the problem, how to debug this?
Are you serializing to a file? The most likely cause (in my experience) is that you have overwritten an existing file, but have not truncated it; i.e. it was 200 bytes; you've re-written it, but with only 182 bytes. There are now 18 bytes of garbage on the end of your stream that is tripping it up. Files must be truncated when re-writing protocol buffers. You can do this with FileMode:
using(var file = new FileStream(path, FileMode.Truncate)) {
// write
}
or alternatively by SetLength after writing your data:
file.SetLength(file.Position);
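The stale-tail effect is easy to see with plain byte arrays, no serializer involved. In this sketch the TruncateDemo class and its Rewrite method are hypothetical; it compares FileMode.OpenOrCreate, which keeps the old contents, with FileMode.Create, which truncates an existing file.

```csharp
using System;
using System.IO;

// Hypothetical demo of what happens to a file's length when it is rewritten.
public static class TruncateDemo
{
    // Writes byteCount zero bytes from the start of the file using the given
    // mode, then returns the resulting file length.
    public static long Rewrite(string path, FileMode mode, int byteCount)
    {
        using (var file = new FileStream(path, mode, FileAccess.Write))
        {
            file.Write(new byte[byteCount], 0, byteCount);
        }
        return new FileInfo(path).Length;
    }
}
```

Starting from a 200-byte file, Rewrite(path, FileMode.OpenOrCreate, 182) leaves the length at 200, so 18 stale bytes remain at the end. Rewrite(path, FileMode.Create, 182) leaves exactly 182 bytes.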
Other possible cause
You are (accidentally) deserializing a stream into a different type than what was serialized. It's worth double-checking both sides of the conversation to ensure this is not happening.
Since the stack trace references this StackOverflow question, I thought I'd point out that you can also receive this exception if you (accidentally) deserialize a stream into a different type than what was serialized. So it's worth double-checking both sides of the conversation to ensure this is not happening.
This can also be caused by an attempt to write more than one protobuf message to a single stream. The solution is to use SerializeWithLengthPrefix and DeserializeWithLengthPrefix.
Why this happens:
The protobuf specification supports a fairly small number of wire-types (the binary storage formats) and data-types (the .NET etc. data-types). Additionally, this is not 1:1, nor is it 1:many or many:1 - a single wire-type can be used for multiple data-types, and a single data-type can be encoded via any of multiple wire-types. As a consequence, you cannot fully understand a protobuf fragment unless you already know the schema, so you know how to interpret each value. When you are, say, reading an Int32 data-type, the supported wire-types might be "varint", "fixed32" and "fixed64", whereas when reading a String data-type, the only supported wire-type is "string".
If there is no compatible map between the data-type and wire-type, then the data cannot be read, and this error is raised.
Now let's look at why this occurs in the scenario here:
[ProtoContract]
public class Data1
{
[ProtoMember(1, IsRequired=true)]
public int A { get; set; }
}
[ProtoContract]
public class Data2
{
[ProtoMember(1, IsRequired = true)]
public string B { get; set; }
}
class Program
{
static void Main(string[] args)
{
var d1 = new Data1 { A = 1};
var d2 = new Data2 { B = "Hello" };
var ms = new MemoryStream();
Serializer.Serialize(ms, d1);
Serializer.Serialize(ms, d2);
ms.Position = 0;
var d3 = Serializer.Deserialize<Data1>(ms); // This will fail
var d4 = Serializer.Deserialize<Data2>(ms);
Console.WriteLine("{0} {1}", d3, d4);
}
}
In the above, two messages are written directly after each-other. The complication is: protobuf is an appendable format, with append meaning "merge". A protobuf message does not know its own length, so the default way of reading a message is: read until EOF. However, here we have appended two different types. If we read this back, it does not know when we have finished reading the first message, so it keeps reading. When it gets to data from the second message, we find ourselves reading a "string" wire-type, but we are still trying to populate a Data1 instance, for which member 1 is an Int32. There is no map between "string" and Int32, so it explodes.
The *WithLengthPrefix methods allow the serializer to know where each message finishes; so, if we serialize a Data1 and Data2 using the *WithLengthPrefix, then deserialize a Data1 and a Data2 using the *WithLengthPrefix methods, then it correctly splits the incoming data between the two instances, only reading the right value into the right object.
Additionally, when storing heterogeneous data like this, you might want to additionally assign (via *WithLengthPrefix) a different field-number to each class; this provides greater visibility of which type is being deserialized. There is also a method in Serializer.NonGeneric which can then be used to deserialize the data without needing to know in advance what we are deserializing:
// Data1 is "1", Data2 is "2"
Serializer.SerializeWithLengthPrefix(ms, d1, PrefixStyle.Base128, 1);
Serializer.SerializeWithLengthPrefix(ms, d2, PrefixStyle.Base128, 2);
ms.Position = 0;
var lookup = new Dictionary<int,Type> { {1, typeof(Data1)}, {2,typeof(Data2)}};
object obj;
while (Serializer.NonGeneric.TryDeserializeWithLengthPrefix(ms,
PrefixStyle.Base128, fieldNum => lookup[fieldNum], out obj))
{
Console.WriteLine(obj); // writes Data1 on the first iteration,
// and Data2 on the second iteration
}
Previous answers already explain the problem better than I can. I just want to add an even simpler way to reproduce the exception.
This error will also occur simply if the type of a serialized ProtoMember is different from the expected type during deserialization.
For instance if the client sends the following message:
public class DummyRequest
{
[ProtoMember(1)]
public int Foo{ get; set; }
}
But what the server deserializes the message into is the following class:
public class DummyRequest
{
[ProtoMember(1)]
public string Foo{ get; set; }
}
This results in an error message that is slightly misleading for this case:
ProtoBuf.ProtoException: Invalid wire-type; this usually means you have over-written a file without truncating or setting the length
It will even occur if the property name changed. Let's say the client sent the following instead:
public class DummyRequest
{
[ProtoMember(1)]
public int Bar{ get; set; }
}
This will still cause the server to deserialize the int Bar to string Foo which causes the same ProtoBuf.ProtoException.
I hope this helps somebody debugging their application.
Also check the obvious that all your subclasses have [ProtoContract] attribute. Sometimes you can miss it when you have rich DTO.
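I've seen this issue when using the wrong Encoding to convert the bytes to and from strings.
In my case I needed to use Encoding.Default and not Encoding.UTF8. Be aware that this can only work where Encoding.Default is a single-byte code page; on .NET Core and later Encoding.Default is UTF-8, so Base64 (Convert.ToBase64String) is the safer way to carry the bytes as a string.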
using (var ms = new MemoryStream())
{
Serializer.Serialize(ms, obj);
var bytes = ms.ToArray();
str = Encoding.Default.GetString(bytes);
}
If you are using SerializeWithLengthPrefix, please mind that casting instance to object type breaks the deserialization code and causes ProtoBuf.ProtoException : Invalid wire-type.
using (var ms = new MemoryStream())
{
var msg = new Message();
Serializer.SerializeWithLengthPrefix(ms, (object)msg, PrefixStyle.Base128); // Casting msg to object breaks the deserialization code.
ms.Position = 0;
Serializer.DeserializeWithLengthPrefix<Message>(ms, PrefixStyle.Base128);
}
This happened in my case because I had something like this:
var ms = new MemoryStream();
Serializer.Serialize(ms, batch);
_queue.Add(Convert.ToBase64String(ms.ToArray()));
So basically I was putting a base64 into a queue and then, on the consumer side I had:
var stream = new MemoryStream(Encoding.UTF8.GetBytes(myQueueItem));
var batch = Serializer.Deserialize<List<EventData>>(stream);
So although the type of each myQueueItem was correct, I forgot that it was a Base64 string rather than the raw bytes. The solution was to decode it first:
var bytes = Convert.FromBase64String(myQueueItem);
var stream = new MemoryStream(bytes);
var batch = Serializer.Deserialize<List<EventData>>(stream);
