Better ways of improving code serialization speed - c#

I have the following code that serializes a List to a byte array for transport via Web Services. The code runs reasonably fast on smaller entities, but this is a list of 60,000 or so items, and the formatter.Serialize call takes several seconds to execute. Is there any way to speed this up?
public static byte[] ToBinary(Object objToBinary)
{
    using (MemoryStream memStream = new MemoryStream())
    {
        BinaryFormatter formatter = new BinaryFormatter(null, new StreamingContext(StreamingContextStates.Clone));
        formatter.Serialize(memStream, objToBinary);
        memStream.Seek(0, SeekOrigin.Begin);
        return memStream.ToArray();
    }
}

The inefficiency you're experiencing comes from several sources:
The default serialization routine uses reflection to enumerate object fields and get their values.
The binary serialization format stores things in associative lists keyed by the string names of the fields.
You've got a spurious ToArray in there (as Danny mentioned).
You can get a pretty big improvement off the bat by implementing ISerializable on the object type that is contained in your List. That will cut out the default serialization behavior that uses reflection.
You can get a little more speed if you cut down the number of elements in the associative array that holds the serialized data. Make sure the elements you do store in that associative array are primitive types.
Finally, you can eliminate the ToArray but I doubt you'll even notice the bump that gives you.
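As a rough sketch of the ISerializable suggestion: the Item type, its two fields, and the short key names "i" and "n" below are assumptions made for illustration, since the question does not show the element type. It also assumes a runtime where BinaryFormatter is still available, as in the question.

```csharp
using System;
using System.Runtime.Serialization;

// Hypothetical list element type; the real one is not shown in the question.
[Serializable]
public sealed class Item : ISerializable
{
    public int Id { get; set; }
    public string Name { get; set; }

    public Item() { }

    // Deserialization constructor: the formatter calls this with the stored values.
    private Item(SerializationInfo info, StreamingContext context)
    {
        Id = info.GetInt32("i");
        Name = info.GetString("n");
    }

    // Called by the formatter instead of the reflection-based default path.
    // Only two entries are stored, both simple values, under short keys.
    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        info.AddValue("i", Id);
        info.AddValue("n", Name);
    }
}
```

Keeping the keys short and the stored values primitive follows the advice above about shrinking the associative list that the formatter writes for each object.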

If you want some real serialization speed, consider using protobuf-net, which is the C# version of Google's Protocol Buffers. It's supposed to be an order of magnitude faster than BinaryFormatter.

It would probably be much faster to serialize the entire array (or collection) of 60,000 items in one shot, into a single large byte[] array, instead of in separate chunks. Is having each of the individual objects represented by its own byte[] array a requirement of other parts of the system you're working within? Also, are the actual types of the objects known? If you were using a specific type (maybe some common base class of all these 60,000 objects), the framework would not have to do as much casting and searching for your prebuilt serialization assemblies. Right now you're only giving it Object.

.ToArray() creates a new array; it may be more efficient to copy the data to an existing array using unsafe methods (such as accessing the stream's memory using fixed, then copying the memory using MemCopy() via DllImport).
Also consider using a faster custom formatter.
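One possible shape for such a custom formatter is a hand-written writer based on BinaryWriter. This is only a sketch: Entry, its two fields, and the EntryCodec name are hypothetical stand-ins for the question's list items, and the wire layout (a count prefix followed by each entry's fields in a fixed order) is an assumption, not an established format.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical element type standing in for the question's list items.
public sealed class Entry
{
    public int Id;
    public string Name;
}

public static class EntryCodec
{
    // Writes a 4-byte count, then each entry's fields in a fixed order.
    // No reflection and no field names are involved.
    public static byte[] ToBinary(IReadOnlyList<Entry> items)
    {
        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write(items.Count);
            for (int i = 0; i < items.Count; i++)
            {
                writer.Write(items[i].Id);
                // BinaryWriter cannot write null strings, so null becomes "".
                writer.Write(items[i].Name ?? string.Empty);
            }
            writer.Flush();
            return stream.ToArray();
        }
    }

    // Reads the entries back in exactly the order they were written.
    public static List<Entry> FromBinary(byte[] data)
    {
        using (var reader = new BinaryReader(new MemoryStream(data)))
        {
            int count = reader.ReadInt32();
            var items = new List<Entry>(count);
            for (int i = 0; i < count; i++)
            {
                items.Add(new Entry { Id = reader.ReadInt32(), Name = reader.ReadString() });
            }
            return items;
        }
    }
}
```

The trade-off is that the reader and writer must agree on the field order by hand, so any change to Entry means changing both methods together.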

I started a code-generator project that includes a binary DataContract serializer that beats at least Json.NET by a factor of 30. All you need are the generator NuGet package and an additional lib that comes with faster replacements of BitConverter.
Then you create a partial class, decorate it with DataContract, and decorate each serializable property with DataMember. The generator will then create a ToBytes method, and together with the additional lib you can serialize collections as well. Look at my example from this post:
var objects = new List<Td>();
for (int i = 0; i < 1000; i++)
{
var obj = new Td
{
Message = "Hello my friend",
Code = "Some code that can be put here",
StartDate = DateTime.Now.AddDays(-7),
EndDate = DateTime.Now.AddDays(2),
Cts = new List<Ct>(),
Tes = new List<Te>()
};
for (int j = 0; j < 10; j++)
{
obj.Cts.Add(new Ct { Foo = i * j });
obj.Tes.Add(new Te { Bar = i + j });
}
objects.Add(obj);
}
With this generated ToBytes() method:
public int Size
{
get
{
var size = 24;
// Add size for collections and strings
size += Cts == null ? 0 : Cts.Count * 4;
size += Tes == null ? 0 : Tes.Count * 4;
size += Code == null ? 0 : Code.Length;
size += Message == null ? 0 : Message.Length;
return size;
}
}
public byte[] ToBytes(byte[] bytes, ref int index)
{
if (index + Size > bytes.Length)
throw new ArgumentOutOfRangeException("index", "Object does not fit in array");
// Convert Cts
// Two bytes length information for each dimension
GeneratorByteConverter.Include((ushort)(Cts == null ? 0 : Cts.Count), bytes, ref index);
if (Cts != null)
{
for(var i = 0; i < Cts.Count; i++)
{
var value = Cts[i];
value.ToBytes(bytes, ref index);
}
}
// Convert Tes
// Two bytes length information for each dimension
GeneratorByteConverter.Include((ushort)(Tes == null ? 0 : Tes.Count), bytes, ref index);
if (Tes != null)
{
for(var i = 0; i < Tes.Count; i++)
{
var value = Tes[i];
value.ToBytes(bytes, ref index);
}
}
// Convert Code
GeneratorByteConverter.Include(Code, bytes, ref index);
// Convert Message
GeneratorByteConverter.Include(Message, bytes, ref index);
// Convert StartDate
GeneratorByteConverter.Include(StartDate.ToBinary(), bytes, ref index);
// Convert EndDate
GeneratorByteConverter.Include(EndDate.ToBinary(), bytes, ref index);
return bytes;
}
It serializes each object in ~1.5 microseconds, i.e. 1000 objects in 1.7 ms.

Related

How can I marshal this character array parameter from C to a string in C#?

I have the following function, written as part of another class in C:
int example(char *remoteServerName)
{
if (doSomething(job))
return getError(job);
if (job->server != NULL) {
int length = strlen(jobPtr->server->name); // name is a char * of length 1025
remoteServerName = malloc (length * sizeof(char));
strncpy(remoteServerName, jobPtr->server->name, length);
}
return 0;
}
How can I get the remoteServerName back from it? I have tried the following:
[DllImport("example.dll")]
public static extern int example(StringBuilder remoteServerName);
var x = new StringBuilder();
example(x);
Console.WriteLine(x.ToString());
But the string is always empty.
You need to allocate some space for the string to be returned in. Instead of:
var x = new StringBuilder();
provide a capacity value:
var x = new StringBuilder(1024);
You should also remove your call to malloc. The caller allocates the memory. That is the purpose of marshalling with StringBuilder.
You are not using strncpy correctly, and so fail to write a null terminator. You could pass the buffer length like this:
int example(char *remoteServerName)
{
if (doSomething(job))
return getError(job);
if (job->server != NULL) {
// note that new StringBuilder(N) means a buffer of length N+1 is marshaled
strncpy(remoteServerName, jobPtr->server->name, 1025);
}
return 0;
}
But that would be a bit wasteful, with all the zero padding that is implied. Really, strncpy is next to useless and you should use a different function to copy, as has been discussed many times before here. I don't really want to get drawn into that because it's a little off to the side of the question.
It would be prudent to design your API to allow the caller to also pass the length of the character array so that the callee can make sure not to overrun the buffer, and so that you don't need to use magic constants as the code here does.

Converting byte[] to an object

I'm trying to convert an object which I have in a byte[] to an object.
I've tried using this code I found online:
object byteArrayToObject(byte[] bytes)
{
try
{
MemoryStream ms = new MemoryStream(bytes);
BinaryFormatter bf = new BinaryFormatter();
//ms.Position = 0;
return bf.Deserialize(ms,null);
}
catch
{
return null;
}
}
It throws a SerializationException: "End of Stream encountered before parsing was completed."
I've tried it with the ms.Position = 0 line uncommented of course too...
bytes[] is only 8 bytes long, each byte isn't null.
Suggestions?
[edit]
The byte[] was written to a binary file from a c++ program using something along the lines of
template <typename T>
void WriteToFile(std::ostream& file, T* value)
{
    file.write(reinterpret_cast<char*>(value), sizeof(T));
}
Where value may be a number of different types.
I can cast to some objects okay from the file using BitConverter, but anything BitConverter doesn't cover I can't do..
As was stated by cdhowie, you will need to manually deserialize the encoded data. Based on the limited information available, you may either want an array of objects or an object containing an array. It looks like you have a single long but there is no way to know from your code. You will need to recreate your object in its true form so take the below myLong as a simple example for a single long array. Since it was unspecified I'll assume you want a struct containing an array like:
public struct myLong {
public long[] value;
}
You could do the same thing with an array of structs, or classes with minor changes to the code posted below.
Your method will be something like this: (written in the editor)
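private myLong byteArrayToObject(byte[] bytes) {
    try
    {
        int len = sizeof(long);
        myLong data = new myLong();
        // One long per 8 bytes; a trailing partial chunk is ignored.
        data.value = new long[bytes.Length / len];
        int byteindex = 0;
        for (int i = 0; i < data.value.Length; i++) {
            data.value[i] = BitConverter.ToInt64(bytes, byteindex);
            byteindex += len;
        }
        return data;
    }
    catch
    {
        // myLong is a struct, so null is not a valid return value;
        // return the default instance (value == null) instead.
        return default(myLong);
    }
}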
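For records that mix field types, the same manual approach can be sketched with BinaryReader. The Sample layout below (a 32-bit integer followed by a 32-bit float, 8 bytes in total, little-endian, no padding) is purely an assumption for illustration; the actual C++ types behind the 8 bytes are not given in the question.

```csharp
using System.IO;

// Hypothetical 8-byte record, assumed to mirror a C++ struct
// { int32_t id; float value; } written with ostream::write.
public struct Sample
{
    public int Id;
    public float Value;
}

public static class SampleReader
{
    // Reads the fields in the same order the C++ side is assumed to have written them.
    // BinaryReader always reads little-endian, which matches x86/x64 writers.
    public static Sample FromBytes(byte[] bytes)
    {
        using (var reader = new BinaryReader(new MemoryStream(bytes)))
        {
            var sample = new Sample();
            sample.Id = reader.ReadInt32();
            sample.Value = reader.ReadSingle();
            return sample;
        }
    }
}
```

If the C++ struct has padding between members, or was written on a big-endian machine, the reads would need matching skips or byte swaps.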

Fast way to get the length of serialized data using ProtoBuf-net?

Let's say I have a Person class. It has an editable Notes property.
I want to serialize the Person instance to a fixed size buffer, thus the Notes cannot be infinite long.
In the UI, I use a TextBox to let user edit the notes. I want to dynamically update a label saying how many more characters you can write.
This is my current implementation, is there any faster method? (I'm using rs282)
public Int32 GetSerializedLength()
{
Byte[] data;
using (MemoryStream ms = new MemoryStream())
{
Serializer.SerializeWithLengthPrefix<Person>(ms, this, PrefixStyle.Base128);
data = ms.ToArray();
}
using (MemoryStream ms = new MemoryStream(data))
{
Int32 length = 0;
if (Serializer.TryReadLengthPrefix(ms, PrefixStyle.Base128, out length))
return length;
else
return -1;
}
}
EDIT: I was confused about the difference between the internal length of the serialized data and its total length.
Here is my final version:
private static MemoryStream _ms = new MemoryStream();
public static Int64 GetSerializedLength(Person person)
{
if(null == person) return 0;
_ms.SetLength(0);
Serializer.Serialize<Person>(_ms, person);
return _ms.Length;
}
With the edit, it sounds like you want the length without serializing it (since if you want the length with serializing it, you'd just serialize it and check the .Length).
Basically, no - this isn't available. I know that some of the other implementations build this data eagerly; that is in part because they are constructing the buffered data all the time, whereas protobuf-net works from an object graph.
protobuf-net does not do this - it builds the data by discovery during a single pass over the object graph. Is there a specific purpose you have in mind? Things can always be added (with effort, though).
Re the issue of a notes (string) field that you don't want to be over-sized: as a sanity check, note that protobuf uses UTF-8 for string data, so personally I would just do:
if(theText.Length > MAX || Encoding.UTF8.GetByteCount(theText) > MAX
|| GetSerializedLength(obj) > MAX)
{
//
}
Note that the first two conditions reject the obvious cases cheaply, so the full serialization only runs when they pass.
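For the "characters remaining" label, a cheap pre-check along these lines might be enough before falling back to a full serialization. NotesBudget and both method names are hypothetical, and the sketch only counts the UTF-8 bytes of the text itself; it does not include the field header or length prefix that protobuf adds around a string.

```csharp
using System.Text;

public static class NotesBudget
{
    // True when the text's UTF-8 encoding fits within maxBytes.
    public static bool Fits(string notes, int maxBytes)
    {
        if (string.IsNullOrEmpty(notes)) return true;
        // Every UTF-16 char needs at least one UTF-8 byte,
        // so Length is a cheap lower bound on the byte count.
        if (notes.Length > maxBytes) return false;
        return Encoding.UTF8.GetByteCount(notes) <= maxBytes;
    }

    // UTF-8 bytes still available; negative when the text is already too long.
    public static int RemainingBytes(string notes, int maxBytes)
    {
        return maxBytes - Encoding.UTF8.GetByteCount(notes ?? string.Empty);
    }
}
```

Because non-ASCII characters take more than one byte, the label is more accurate if it shows remaining bytes rather than remaining characters.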

Why does the same algorithm work in Scala much slower than in C#? And how to make it faster?

The algorithm creates all possible variants of the sequence from variants for each member of the sequence.
C# code :
static void Main(string[] args)
{
var arg = new List<List<int>>();
int i = 0;
for (int j = 0; j < 5; j++)
{
arg.Add(new List<int>());
for (int j1 = i; j1 < i + 3; j1++)
{
//if (j1 != 5)
arg[j].Add(j1);
}
i += 3;
}
List<Utils<int>.Variant<int>> b2 = new List<Utils<int>.Variant<int>>();
//int[][] bN;
var s = System.Diagnostics.Stopwatch.StartNew();
//for(int j = 0; j < 10;j++)
b2 = Utils<int>.Produce2(arg);
s.Stop();
Console.WriteLine(s.ElapsedMilliseconds);
}
public class Variant<T>
{
public T element;
public Variant<T> previous;
}
public static List<Variant<T>> Produce2(List<List<T>> input)
{
var ret = new List<Variant<T>>();
foreach (var form in input)
{
var newRet = new List<Variant<T>>(ret.Count * form.Count);
foreach (var el in form)
{
if (ret.Count == 0)
{
newRet.Add(new Variant<T>{ element = el, previous = null });
}
else
{
foreach (var variant in ret)
{
var buf = new Variant<T> { previous = variant, element = el };
newRet.Add(buf);
}
}
}
ret = newRet;
}
return ret;
}
Scala code :
object test {
def main() {
var arg = new Array[Array[Int]](5)
var i = 0
var init = 0
while (i<5)
{
var buf = new Array[Int](3)
var j = 0
while (j<3)
{
buf(j) = init
init = init+1
j = j + 1
}
arg(i)=buf
i = i + 1
}
println("Hello, world!")
val start = System.currentTimeMillis
var res = Produce(arg)
val stop = System.currentTimeMillis
println(stop-start)
/*for(list <- res)
{
for(el <- list)
print(el+" ")
println
}*/
println(res.length)
}
def Produce[T](input:Array[Array[T]]):Array[Variant[T]]=
{
var ret = new Array[Variant[T]](1)
for(val forms <- input)
{
if(forms!=null)
{
var newRet = new Array[Variant[T]](forms.length*ret.length)
if(ret.length>0)
{
for(val prev <-ret)
if(prev!=null)
for(val el <-forms)
{
newRet = newRet:+new Variant[T](el,prev)
}
}
else
{
for(val el <- forms)
{
newRet = newRet:+new Variant[T](el,null)
}
}
ret = newRet
}
}
return ret
}
}
class Variant[T](var element:T, previous:Variant[T])
{
}
As others have said, the difference is in how you're using the collections. Array in Scala is the same thing as Java's primitive array, [], which is the same as C#'s primitive array []. Scala is clever enough to do what you ask (namely, copy the entire array with a new element on the end), but not so clever as to tell you that you'd be better off using a different collection. For example, if you just change Array to ArrayBuffer it should be much faster (comparable to C#).
Actually, though, you'd be better off not using for loops at all. One of the strengths of Scala's collections library is that you have a wide variety of powerful operations at your disposal. In this case, you want to take every item from forms and convert it into a Variant. That's what map does.
Also, your Scala code doesn't seem to actually work.
If you want all possible variants from each member, you really want to use recursion. This implementation does what you say you want:
object test {
def produce[T](input: Array[Array[T]], index: Int = 0): Array[List[T]] = {
if (index >= input.length) Array()
else if (index == input.length-1) input(index).map(elem => List(elem))
else {
produce(input, index+1).flatMap(variant => {
input(index).map(elem => elem :: variant)
})
}
}
def main() {
val arg = Array.tabulate(5,3)((i,j) => i*3+j)
println("Hello, world!")
val start = System.nanoTime
var res = produce(arg)
val stop = System.nanoTime
println("Time elapsed (ms): " + (stop-start)/1000000L)
println("Result length: " + res.length)
println(res.deep)
}
}
Let's unpack this a little. First, we've replaced your entire construction of the initial variants with a single tabulate instruction. tabulate takes a target size (5x3, here), and then a function that maps from the indices into that rectangle into the final value.
We've also made produce a recursive function. (Normally we'd make it tail-recursive, but let's keep things as simple as we can for now.) How do you generate all variants? Well, all variants is clearly (every possibility at this position) + (all variants from later positions). So we write that down recursively.
Note that if we build variants recursively like this, all the tails of the variants end up the same, which makes List a perfect data structure: it's a singly-linked immutable list, so instead of having to copy all those tails over and over again, we just point to them.
Now, how do we actually do the recursion? Well, if there's no data at all, we had better return an empty array (i.e. if index is past the end of the array). If we're on the last element of the array of variations, we basically want each element to turn into a list of length 1, so we use map to do exactly that (elem => List(elem)). Finally, if we are not at the end, we get the results from the rest (which is produce(input, index+1)) and make variants with each element.
Let's take the inner loop first: input(index).map(elem => elem :: variant). This takes each element from variants in position index and sticks them onto an existing variant. So this will give us a new batch of variants. Fair enough, but where do we get the new variant from? We produce it from the rest of the list: produce(input, index+1), and then the only trick is that we need to use flatMap--this takes each element, produces a collection out of it, and glues all those collections together.
I encourage you to throw printlns in various places to see what's going on.
Finally, note that with your test size, it's actually an insignificant amount of work; you can't accurately measure that, even if you switch to using the more accurate System.nanoTime as I did. You'd need something like tabulate(12,3) before it gets significant (500,000 variants produced).
The :+ method of the Array (more precisely of ArrayOps) will always create a copy of the array. So instead of a constant time operation you have one that is more or less O(n).
You do this inside nested loops, so the whole computation ends up an order of magnitude slower.
This way you more or less emulate an immutable data structure with a mutable one (which was not designed for it).
To fix it you can either use Array as a mutable data structure (but then try to avoid endless copying), or you can switch to an immutable one. I did not check your code very carefully, but the first bet is usually List; check the scaladoc of the various methods to see their performance behaviour.
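The cost difference can be made concrete on the C# side of the comparison. The sketch below is an illustration under assumptions rather than a benchmark: AppendCost and its methods are hypothetical names, AppendByCopy imitates what :+ does on an array by allocating a new array for every append, and it counts how many elements get copied along the way.

```csharp
using System;
using System.Collections.Generic;

public static class AppendCost
{
    // Appends n items by allocating a new array for each append,
    // which is the behaviour described for :+ on a Scala Array.
    // Returns the total number of element copies performed.
    public static long AppendByCopy(int n)
    {
        int[] data = new int[0];
        long copies = 0;
        for (int i = 0; i < n; i++)
        {
            var next = new int[data.Length + 1];
            Array.Copy(data, next, data.Length);
            copies += data.Length;
            next[data.Length] = i;
            data = next;
        }
        return copies;
    }

    // Appends n items to a growable buffer (List<T> here, ArrayBuffer in Scala),
    // where each Add is amortized constant time. Returns the final count.
    public static int AppendToBuffer(int n)
    {
        var data = new List<int>();
        for (int i = 0; i < n; i++) data.Add(i);
        return data.Count;
    }
}
```

Appending n items by copy moves n(n-1)/2 elements in total, so the work grows quadratically with the number of variants instead of linearly.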
ret.length is not 0 all the time, right before return it is 243. The size of array should not be changed, and List in .net is an abstraction on top of array. BUT thank you for the point - problem was that I used :+ operator with array which as I understand caused implicit use of type LinkedList

IEnumerable<T> ToArray usage - Is it a copy or a pointer?

I am parsing an arbitrary length byte array that is going to be passed around to a few different layers of parsing. Each parser creates a Header and a Packet payload just like any ordinary encapsulation.
My problem lies in how the encapsulation holds its packet byte array payload. Say I have a 100 byte array with three levels of encapsulation. Three packet objects will be created and I want to set the payload of these packets to the corresponding position in the byte array of the packet.
For example, let's say the payload size is 20 for all levels, then imagine it has a public byte[] Payload on each object. However, the problem is that this byte[] Payload is a copy of the original 100 bytes, so I'm going to end up with 160 bytes in memory instead of 100.
If it were in C++, I could just easily use a pointer - however, I'm writing this in C#.
So I created the following class:
public class PayloadSegment<T> : IEnumerable<T>
{
public readonly T[] Array;
public readonly int Offset;
public readonly int Count;
public PayloadSegment(T[] array, int offset, int count)
{
this.Array = array;
this.Offset = offset;
this.Count = count;
}
public T this[int index]
{
get
{
if (index < 0 || index >= this.Count)
throw new IndexOutOfRangeException();
else
return Array[Offset + index];
}
set
{
if (index < 0 || index >= this.Count)
throw new IndexOutOfRangeException();
else
Array[Offset + index] = value;
}
}
public IEnumerator<T> GetEnumerator()
{
for (int i = Offset; i < Offset + Count; i++)
yield return Array[i];
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
IEnumerator<T> enumerator = this.GetEnumerator();
while (enumerator.MoveNext())
{
yield return enumerator.Current;
}
}
}
This way I can simply reference a position inside the original byte array but use positional indexing. However, if I do something like:
PayloadSegment<byte> something = new PayloadSegment<byte>(someArray, 5, 10);
byte[] somethingArray = something.ToArray();
Will the somethingArray be a copy of the bytes, or a reference to the original PayloadSegment (which in turn is a reference to the original byte array)?
EDIT: Actually after rethinking this, can't I simply use a new MemoryStream(array, offset, length)?
The documentation for the Enumerable.ToArray extension method doesn't specifically mention what it does when it's passed a sequence that happens to already be an array. But a simple check with .NET Reflector reveals that it does indeed create a copy of the array.
It is worth noting however that when given a sequence that implements ICollection<T> (which Array does) the copy can be done much faster because the number of elements is known up front so it does not have to do dynamic resizing of the buffer such as List<T> does.
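The copy behaviour is easy to check directly. The sketch below also uses the framework's own ArraySegment<T>, which serves much the same purpose as the PayloadSegment class in the question; SegmentDemo and its method names are made up for the illustration.

```csharp
using System;
using System.Linq;

public static class SegmentDemo
{
    // True when a write to the ToArray() result leaves the source untouched,
    // i.e. when ToArray() handed back a separate array.
    public static bool ToArrayIsCopy()
    {
        byte[] original = { 1, 2, 3, 4, 5 };
        byte[] copy = original.ToArray();
        copy[0] = 99;
        return original[0] == 1 && !ReferenceEquals(original, copy);
    }

    // True when a write through an ArraySegment<byte> reaches the underlying
    // array, i.e. when the segment is a view and not a copy.
    public static bool SegmentIsView()
    {
        byte[] original = { 1, 2, 3, 4, 5 };
        var segment = new ArraySegment<byte>(original, 1, 3);
        segment.Array[segment.Offset] = 42; // first element of the view
        return original[1] == 42 && segment.Count == 3;
    }
}
```

So a segment (or the MemoryStream constructor mentioned in the edit) shares the original bytes, while any ToArray() call on it produces a detached copy.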
There is a very strong practice which suggests that calling "ToArray" on an object should return a new array which is detached from anything else. Nothing that is done to the original object should affect the array, and nothing which is done to the array should affect the original object. My personal preference would have been to call the routine "ToNewArray", to make explicit that each call will return a different new array.
A few of my classes have an "AsReadableArray", which returns an array which may or may not be attached to anything else. The array won't change in response to manipulations to the original object, but it's possible that multiple reads yielding the same data (which they often will) will return the same array. I really wish .net had an ImmutableArray type, supporting the same sorts of operations as String [a String, in essence, being an ImmutableArray(Of Char)], and a ReadableArray abstract type (from which both Array and ImmutableArray would inherit). I doubt such a thing could be squeezed into .Net 5.0, but it would allow a lot of things to be done much more cleanly.
It is a copy. When you call a To<Type> method, it creates a copy of the source element with the target Type
Because byte is a value type, the array will hold copies of the values, not pointers to them.
If you need the same behavior as a reference type, it is best to create a class that holds the byte as a property, and which may group other data and functionality.
It's a copy. It would be very unintuitive if I passed something.ToArray() to some method, and the method changed the value of something by changing the array!
