Binary Formatter: Set Position to Deserialize a Particular Object - C#

I want to ask about serializing/deserializing objects with BinaryFormatter. I'm trying to deserialize an object from a FileStream that contains many objects which have been serialized one by one. A single object is too big to keep many of in process memory, which is why I don't pack all the objects into one container such as a List: together they would use too much memory. So I serialize them separately, one at a time, as needed. This way it doesn't take much process memory, because I process one object at a time rather than all of them. Take a look at this sketch of what I mean:
<FileStream>
----Object 1-----Size = 100 Mb------index = 0
----Object 2-----Size = 100 Mb------index = 1
----Object 3-----Size = 100 Mb------index = 2
----Object 4-----Size = 100 Mb------index = 3
----Object 5-----Size = 100 Mb------index = 4
----Object 6-----Size = 100 Mb------index = 5
</FileStream>
Serialization works fine; now I have a problem deserializing a particular object.
Here is the problem:
In a List we can fetch an item by index, so if we want the item at index five we can call it like this:
List<object> list = new List<object>();
list.Add("object1");
list.Add("object2");
list.Add("object3");
list.Add("object4");
list.Add("object5");
list.Add("object6");
object fifthIndex = list[5]; // here we can get an item by its index
Well, now the problem is: how can I get the object at index five, just like with a List, out of the six serialized objects in a FileStream using BinaryFormatter? I know FileStream has a property named FileStream.Position, but it is not like an index; after I serialize/deserialize an object it looks like an arbitrary number that just keeps growing.
I have actually got this working, but I bet it is not the best way. Take a look at the code I have tried:
object GetObjectStream(FileStream fs, int index)
{
    if (fs != null)
    {
        BinaryFormatter binaryFormatter = new BinaryFormatter();
        int count = 0;
        while (true)
        {
            try
            {
                object objectDeserialized = binaryFormatter.Deserialize(fs);
                if (count == index) return objectDeserialized;
                count++;
            }
            catch
            {
                // reached the end of the stream without finding the index
                return null;
            }
        }
    }
    return null;
}
This code effectively does a "foreach" over every object that has been serialized, deserializing each one in turn. I bet this is not the best way, because deserializing a 100 MB object just to skip past it takes a long time, and I don't even know whether the objects deserialized along the way (other than the one at the requested index) ever get collected. I want something like a "serialization leap": seeking straight to the object I need.
Your answer would be very helpful and useful to me if we can solve this problem.
Thanks in advance.

Each object will most likely take a different amount of space to serialize - data packs differently, especially for things like strings and arrays. Basically, to do this efficiently (i.e. without reading every object in full each time), you would want to either:
1. prefix each object with the amount of data it takes, by serializing it to a MemoryStream, storing the .Length (any way that is convenient to you; a 4-byte little-endian chunk would suffice), and then copying the data you wrote to the MemoryStream to the output; then you can skip to the n'th item by n times (read 4 bytes as an int, skip that many bytes)
2. in a separate index, store the .Position of the base stream just before you serialize each object; then to read the nth object, you use the index to find the position you need, and seek to there
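A minimal sketch of option 1, assuming [Serializable] payloads; the helper names (WriteWithLengthPrefix, ReadObjectAt) are ours, not a library API:
using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

static void WriteWithLengthPrefix(FileStream fs, object obj)
{
    var formatter = new BinaryFormatter();
    using (var ms = new MemoryStream())
    {
        formatter.Serialize(ms, obj);
        // 4-byte length prefix (little-endian on common platforms, e.g. x86)
        fs.Write(BitConverter.GetBytes((int)ms.Length), 0, 4);
        ms.Position = 0;
        ms.CopyTo(fs); // then the payload itself
    }
}

static object ReadObjectAt(FileStream fs, int index)
{
    var formatter = new BinaryFormatter();
    var lengthBytes = new byte[4];
    fs.Position = 0;
    for (int i = 0; i < index; i++) // skip the first 'index' objects without deserializing them
    {
        if (fs.Read(lengthBytes, 0, 4) != 4) return null;
        fs.Seek(BitConverter.ToInt32(lengthBytes, 0), SeekOrigin.Current);
    }
    if (fs.Read(lengthBytes, 0, 4) != 4) return null; // consume the target's prefix
    return formatter.Deserialize(fs);
}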
Actually, you were quite lucky here: BinaryFormatter isn't actually documented as being safe to append, but as it happens it does kinda work out OK if you do that - but this isn't true for all serialization formats.
Personally, though, I'd question whether there is simply a different design that could be used here.

----Write objects to file----
1. Keep a Dictionary<KeyType, long> dict. KeyType is probably int if you key on an object number.
2. Open file FileX for output; FileX.Position = 0 at this point.
3. For each object:
   update the dictionary: dict[object.Key] = fileX.Position
   serialize the object to FileX (note: FileX.Position is advanced by BinaryFormatter)
4. Close FileX. Save the dictionary (serialize it to another file).
----Read back an object----
Use the dictionary to get the offset based on the key of the object you want, then:
FileX.Seek(offset, SeekOrigin.Begin); formatter.Deserialize(FileX);
to get back the object you wish.
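A minimal sketch of this recipe, assuming [Serializable] payloads keyed by int; "data.bin", "index.bin" and objectsByKey are placeholder names:
var index = new Dictionary<int, long>();
var formatter = new BinaryFormatter();

using (var fileX = new FileStream("data.bin", FileMode.Create))
{
    // objectsByKey: however you enumerate your key -> object pairs
    foreach (KeyValuePair<int, object> pair in objectsByKey)
    {
        index[pair.Key] = fileX.Position;       // record the offset before writing
        formatter.Serialize(fileX, pair.Value); // Position advances past the object
    }
}
// Persist the index itself (here with the same formatter, to a second file).
using (var indexFile = new FileStream("index.bin", FileMode.Create))
    formatter.Serialize(indexFile, index);

// Later: read back the object with key 3 without touching the others.
using (var fileX = new FileStream("data.bin", FileMode.Open))
{
    fileX.Seek(index[3], SeekOrigin.Begin);
    object obj = formatter.Deserialize(fileX);
}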

Related

C# serialization without struct metadata itself

Right now I'm working on a game engine. To be more efficient and to keep the data away from the end user, I'm trying to use serialization on a modified form of Wavefront's *.OBJ format. I have multiple structs set up to represent the data, and serialization of the objects works fine, except that it takes up a significant amount of file space (at least 5x that of the original OBJ file).
To be specific, here's a quick example of what the final object would be (in a JSON-esque format):
{
[{float 5.0, float 2.0, float 1.0}, {float 7.0, float 2.0, float 1.0}, ...]
// ^^^ vertex positions
// other similar structures for colors, normals, texture coordinates
// ...
[[{int 1, int 1, int 1}, {int 2, int 2, int 1}, {int 3, int 3, int 2}], ...]
//represents one face; represents the following
//face[vertex{position index, text coords index, normal index}, vertex{}...]
}
Basically, my main issue with this method of serializing data (binary format) is that it saves the names of the structs, not just the values. I'd love to keep the data in the format I already have, just without saving the struct metadata itself in my data. I want to save something similar to the above, yet still be able to recompile with a different struct name later.
Here's the main object I'm serializing and saving to a file:
[Serializable()] //the included structs have this applied
public struct InstantGameworksObjectData
{
    public Position[] Positions;
    public TextureCoordinates[] TextureCoordinates;
    public Position[] Normals;
    public Face[] Faces;
}
Here's the method in which I serialize and save the data:
IFormatter formatter = new BinaryFormatter();
long beginning = DateTime.Now.Ticks / 10000000;
foreach (string file in fileNames)
{
    Console.WriteLine("Begin " + Path.GetFileName(file));
    var output = InstantGameworksObject.ConvertOBJToIGWO(File.ReadAllLines(file));
    Console.WriteLine("Writing file");
    using (Stream fileOutputStream = new FileStream(outputPath + @"\" + Path.GetFileNameWithoutExtension(file) + ".igwo", FileMode.Create, FileAccess.Write, FileShare.None))
    {
        formatter.Serialize(fileOutputStream, output);
    }
    Console.WriteLine(outputPath + @"\" + Path.GetFileNameWithoutExtension(file) + ".igwo");
}
The output, of course, is binary/hex (depending on what program you use to view the file), and that's great. But running it through a hex-to-text converter reveals the struct and field names embedded in the data.
In the long run, this could mean gigabytes' worth of useless data. How can I save my C# object with the data in the correct format, just without the extra meta-clutter?
As you correctly note, the standard framework binary formatters include a host of metadata about the structure of the data. This is to try to keep the serialised data self-describing. If they were to separate the data from all that metadata, then the smallest change to the structure of classes would render the previously serialised data useless. By that token, I doubt you'd find any standard framework method of serialising binary data that didn't include all the metadata.
Even ProtoBuf includes the semantics of the data in the file data, albeit with less overhead.
Given that the structure of your data follows the reasonably common and well established form of 3D object data, you could roll your own format for your assets which strips the semantics and only stores the raw data. You can implement read and write methods easily using the BinaryReader/BinaryWriter classes (which would be my preference). If you're looking to obfuscate data from the end user, there are a variety of different ways that you could achieve that with this approach.
For example:
public static InstantGameworksObjectData ReadIgwoObject(BinaryReader pReader)
{
    var lOutput = new InstantGameworksObjectData();
    int lVersion = pReader.ReadInt32(); // Useful in case you ever want to change the format
    int lPositionCount = pReader.ReadInt32(); // Store the length of the Position array before the data so you can pre-allocate the array.
    lOutput.Positions = new Position[lPositionCount];
    for (int lPositionIndex = 0; lPositionIndex < lPositionCount; ++lPositionIndex)
    {
        lOutput.Positions[lPositionIndex] = new Position();
        lOutput.Positions[lPositionIndex].X = pReader.ReadSingle();
        lOutput.Positions[lPositionIndex].Y = pReader.ReadSingle();
        lOutput.Positions[lPositionIndex].Z = pReader.ReadSingle();
        // or if you prefer... lOutput.Positions[lPositionIndex] = Position.ReadPosition(pReader);
    }
    int lTextureCoordinateCount = pReader.ReadInt32();
    lOutput.TextureCoordinates = new TextureCoordinate[lTextureCoordinateCount];
    for (int lTextureCoordinateIndex = 0; lTextureCoordinateIndex < lTextureCoordinateCount; ++lTextureCoordinateIndex)
    {
        lOutput.TextureCoordinates[lTextureCoordinateIndex] = new TextureCoordinate();
        lOutput.TextureCoordinates[lTextureCoordinateIndex].X = pReader.ReadSingle();
        lOutput.TextureCoordinates[lTextureCoordinateIndex].Y = pReader.ReadSingle();
        lOutput.TextureCoordinates[lTextureCoordinateIndex].Z = pReader.ReadSingle();
        // or if you prefer... lOutput.TextureCoordinates[lTextureCoordinateIndex] = TextureCoordinate.ReadTextureCoordinate(pReader);
    }
    // ... Normals and Faces follow the same pattern
    return lOutput;
}
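For completeness, a sketch of the matching write method, assuming the same field layout the reader expects; WriteIgwoObject is our name, and Position/TextureCoordinate are assumed to expose float X, Y, Z:
public static void WriteIgwoObject(BinaryWriter pWriter, InstantGameworksObjectData pData)
{
    pWriter.Write(1); // format version, matching the reader's ReadInt32
    pWriter.Write(pData.Positions.Length); // length before data, so the reader can pre-allocate
    foreach (var lPosition in pData.Positions)
    {
        pWriter.Write(lPosition.X);
        pWriter.Write(lPosition.Y);
        pWriter.Write(lPosition.Z);
    }
    pWriter.Write(pData.TextureCoordinates.Length);
    foreach (var lCoordinate in pData.TextureCoordinates)
    {
        pWriter.Write(lCoordinate.X);
        pWriter.Write(lCoordinate.Y);
        pWriter.Write(lCoordinate.Z);
    }
    // ... Normals and Faces follow the same pattern
}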
As far as space efficiency and speed goes, this approach is hard to beat. However, this works well for the 3D objects as they're fairly well-defined and the format is not likely to change, but this approach may not extend well to the other assets that you want to store.
If you find you are needing to change class structures frequently, you may find you have to write lots of if-blocks based on version to correctly read a file, and have to regularly debug issues where the data in the file is not quite in the format you expect. A happy medium might be to use something such as ProtoBuf for the bulk of your development until you're happy with the structure of your data object classes, and then writing raw binary Read/Write methods for each of them before you release.
I'd also recommend some Unit Tests to ensure that your Read and Write methods are correctly persisting the object to avoid pulling your hair out later.
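For example, a bare-bones round-trip test (plain Debug.Assert here; substitute your test framework of choice), using the WriteIgwoObject/ReadIgwoObject sketches above:
var original = new InstantGameworksObjectData();
// ... populate original with known values ...
using (var ms = new MemoryStream())
{
    var writer = new BinaryWriter(ms);
    WriteIgwoObject(writer, original);
    writer.Flush();
    ms.Position = 0;
    var roundTripped = ReadIgwoObject(new BinaryReader(ms));
    System.Diagnostics.Debug.Assert(
        roundTripped.Positions.Length == original.Positions.Length);
    // ... compare individual values too ...
}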
Hope this helps

Store string data using an array (C#)

A task that I can't seem to solve, even after hours and hours of trying.
Basically, I have a phonebook that takes input from the user: a name and a number (both string type), which together become a Contact.
I'm supposed to store the Contacts in an array, and the user shall be able both to add and to delete data (a Contact) from the array, via the methods Create and Delete.
I made my own Repository class to handle the data (Contact also has its own little class), but I used a List to store the data so I could simply use Add and Remove, so my code looks like this:
public class Repository
{
    List<Contact> storagelist;

    public Repository()
    {
        storagelist = new List<Contact>();
    }

    public void Create(Contact item) // Adds the item to the list
    {
        storagelist.Add(item);
    }

    public bool Delete(Contact item) // Removes the item
    {
        if (!storagelist.Contains(item))
            return false;
        storagelist.Remove(item);
        return true;
    }
}
What I am looking for is how to do exactly this, with these two features of adding and removing a Contact, but storing the data in an array instead.
Since arrays (to my knowledge) have a fixed, pre-defined size, I have no idea how one could be used in exactly the same way as the List. The array size shall always equal the number of Contacts stored, but how can this be done when an array's size is fixed?
So, how do I create an array that always has the same size as the number of Contacts stored, and how do I add and remove to/from this array?
Help is very much appreciated!
EDIT: Thanks for all responses! Every answer was helpful in the process (Omar and person66 in particular!).
I solved the removal by moving every element after the deleted one down one index, and finally resizing the array to be one element smaller. Like so:
int deleteIndex = Array.IndexOf(storagelist, item);
for (int index = deleteIndex + 1; index < storagelist.Length; index++)
{
    storagelist[index - 1] = storagelist[index];
}
Array.Resize(ref storagelist, storagelist.Length - 1);
You are right in that array sizes are fixed. You can, however, use Array.Resize() to create a new array of the specified size with all the current array data. So for adding you would resize to 1 larger and add the new contact at the end. For removing you will have to use a loop to shift all the elements in the array past the one being removed back one spot, then resize it to be 1 smaller.
EDIT: A simpler option for removing would be to use Array.Copy():
Array.Copy(a, deleteIndex + 1, a, deleteIndex, a.Length - (deleteIndex + 1));
Array.Resize(ref a, a.Length - 1);
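For the Create side, a sketch of the same Array.Resize trick, written against the storagelist field from the question's edit:
public void Create(Contact item)
{
    Array.Resize(ref storagelist, storagelist.Length + 1); // grow by one
    storagelist[storagelist.Length - 1] = item;            // append at the end
}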
A list is a much better solution to this problem; I don't know why you would ever want to use an array for this.
A List just ends up using an array for its storage anyway. The way a list works is that it initializes an array with a certain capacity, and then if that capacity is exceeded it creates a larger array and copies the elements across. You could try this approach, except you'd just be recreating what a list does.
The other option is to declare an arbitrarily large array of 100,000 elements or so: a number which you know will not be exceeded.
For the size, you can write your own function which keeps track of the number of contacts in the array, as sketched below.
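A sketch of that capacity-plus-count idea (our naming; the doubling mirrors what List<T> does internally, and Contact is assumed to be a class):
public class ContactArray
{
    private Contact[] _items = new Contact[4]; // initial capacity
    private int _count;                        // number of contacts actually stored

    public int Count { get { return _count; } }

    public void Add(Contact item)
    {
        if (_count == _items.Length)
            Array.Resize(ref _items, _items.Length * 2); // double when full
        _items[_count++] = item;
    }

    public bool Remove(Contact item)
    {
        int index = Array.IndexOf(_items, item, 0, _count);
        if (index < 0) return false;
        Array.Copy(_items, index + 1, _items, index, _count - index - 1); // shift left
        _items[--_count] = null; // drop the dangling reference
        return true;
    }
}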
You can use a generic list. Under the hood the List class uses an array for storage, but does so in a fashion that allows it to grow efficiently.
Take a look at this link for more details; it can be helpful.
var contacts = new[]
{
    new { Name = "Foo", Phone = "9999999999" },
    new { Name = "Bar", Phone = "0000000000" }
};
You can create an array of anonymous objects and then use LINQ to remove objects from the array.
You can also create a new object and assign it to the anonymous-typed array variable.

Loop of intensive processing and serializing - how to completely clean memory after each serialization in C#?

In my C# console application, I instantiate an object MyObject of type MyType. This object contains 6 very large arrays, some containing elements of primitive types, others elements of other reference types, which can in turn contain big arrays. Instantiating all these arrays takes some intensive processing, which lasts about 2 minutes.
The machine I'm working on has 4 GB RAM and runs 32-bit Windows. Before running my console app, the available memory is at about 2413 MB, and right before it finishes, the available memory drops to about 300-400 MB.
After I assign values to all the arrays in MyObject, I serialize it. My objective is to instantiate and serialize 50 objects like this one. So after serializing, I set all the arrays to null. (This is not reflected immediately in Task Manager, where the available memory stays at 300-400 MB, so I assume the GC does not collect immediately.) Right after this, I re-execute the method that instantiates the arrays in MyObject, and I get an OutOfMemoryException almost immediately. I'm thinking this is not the right approach to effectively manage memory in .NET.
So my question is this: knowing that processing one object of type MyType, like MyObject, "fits" in the available memory, how can I instantiate one object of type MyType, serialize it, then completely release ALL the memory that was used for this purpose? And then either re-instantiate the same object or a new object of the same type, so that in the end I have 50 different serialized objects of type MyType?
Thanks.
Updating question with code. Here's a simplified version of my code:
class MyType
{
    int[] intArray; int[] intArray2;
    double[] doubleArray;
    RefType1[] refType1Array;
    RefType2[] refType2Array;
    RefType3[] refType3Array;

    public MyType(/* parameters */)
    {
        for (int i = 0; i < 50; i++)
        {
            instantiateArrays();
            serializeObject();
            releaseMemory();
        }
    }

    private void instantiateArrays()
    {
        // instantiate all the primitive arrays with 50,000 elements per array
        // instantiate refType1Array with 300 elements, refType2Array with 3000 elements and
        // refType3Array with 150 elements
        // lasts about 2 minutes
    }

    private void serializeObject()
    {
        Stream fileStream = File.Create(filePath);
        BinaryFormatter serializer = new BinaryFormatter();
        serializer.Serialize(fileStream, this);
        fileStream.Close();
    }

    private void releaseMemory()
    {
        intArray = null;
        intArray2 = null;
        doubleArray = null;
        refType1Array = null;
        refType2Array = null;
        refType3Array = null;
    }
}
RefType1 contains integer and double fields, and another array of integers with, on average, 50 elements. RefType2 and RefType3 contain integer and double fields, and another array of reference type RefType4. On average, this array contains 500 objects. Each RefType4 object contains an array of 15 integers, on average.
You can force the memory to be reclaimed with these calls:
GC.Collect();
GC.WaitForPendingFinalizers();
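A sketch of where those calls might go in the question's loop; the second Collect, which reclaims objects kept alive by finalizers, is a common addition and not part of the answer above:
for (int i = 0; i < 50; i++)
{
    instantiateArrays();
    serializeObject();
    releaseMemory();               // null out the big array fields first
    GC.Collect();                  // request a full collection
    GC.WaitForPendingFinalizers(); // let finalizers run
    GC.Collect();                  // reclaim anything the finalizers released
}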

Counting Facebook "Likes" using Graph API

Facebook fixed /likes in the Graph API. /likes now returns the complete list of users that liked a particular object in the graph (Photos, Albums, etc.). Previously, it returned only 3-5 users.
My question is: how do you count the total number of "likes" without parsing the entire JSON and counting the elements? I'm only interested in the "likes" count; I'm not interested in the users who gave the likes.
It seems a little expensive to fetch the entire JSON dataset just to count it.
E.g.: https://graph.facebook.com/161820597180936/likes
This photo has 1,000+ likes.
Seeing as the string is JSON, why not convert it into a standard .NET object and use .Count on the array that it creates? Then cache this information for 15 or more minutes (depending on how stale you want your info).
The string-searching method (shown in another answer below) is quite heavy-handed: you are essentially going to search a string an unknown number of times to return an index, compare it to an int, add to another index and so on. And that C# doesn't work as written (assuming it is C# that is being demoed).
Use something like this instead:
public static T FromJson<T>(this string s)
{
    var ser = new System.Web.Script.Serialization.JavaScriptSerializer();
    return ser.Deserialize<T>(s);
}
where this method is an extension method that takes a properly formatted JSON string and converts it to the object T, e.g.:
var result = // call facebook here and get your response string
List<FacebookLikes> likes = result.FromJson<List<FacebookLikes>>();
Response.Write(likes.Count.ToString());
// now cache the likes somewhere, and get from cache next time.
I'm not sure about the performance of this, as I've not done any testing, but to me it looks a lot tidier and a lot more readable. And seeing as you are caching the data, I'd go with the readable option over the string-searching method.
Why is it expensive to parse the entire dataset? This should take milliseconds:
public static int CountLikes(string dataSet)
{
    int i = 0;
    int j = 0;
    while ((i = dataSet.IndexOf("\"id\":", i)) != -1)
    {
        i += 5;
        j++;
    }
    return j;
}
You can also append the parameter limit=# such as:
https://graph.facebook.com/161820597180936/likes?limit=1000

Using Protobuf-net, I suddenly got an exception about an unknown wire-type

(this is a re-post of a question that I saw in my RSS, but which was deleted by the OP. I've re-added it because I've seen this question asked several times in different places; wiki for "good form")
Suddenly, I receive a ProtoException when deserializing and the message is: unknown wire-type 6
What is a wire-type?
What are the different wire-type values and their description?
I suspect a field is causing the problem, how to debug this?
First thing to check:
IS THE INPUT DATA PROTOBUF DATA? If you try and parse another format (json, xml, csv, binary-formatter), or simply broken data (an "internal server error" html placeholder text page, for example), then it won't work.
What is a wire-type?
It is a 3-bit flag that tells it (in broad terms; it is only 3 bits after all) what the next data looks like.
Each field in protocol buffers is prefixed by a header that tells it which field (number) it represents, and what type of data is coming next; this "what type of data" is essential to support the case where unanticipated data is in the stream (for example, you've added fields to the data-type at one end), as it lets the serializer know how to read past that data (or store it for round-trip if required).
What are the different wire-type values and their description?
0: variable-length integer (varint, up to 64 bits) - base-128 encoded with the MSB indicating continuation (used as the default for integer types, including enums)
1: 64-bit - 8 bytes of data (used for double, or electively for long/ulong)
2: length-prefixed - first read an integer using variable-length encoding; this tells you how many bytes of data follow (used for strings, byte[], "packed" arrays, and as the default for child object properties / lists)
3: "start group" - an alternative mechanism for encoding child objects that uses start/end tags - largely deprecated by Google, it is more expensive to skip an entire child-object field since you can't just "seek" past an unexpected object
4: "end group" - twinned with 3
5: 32-bit - 4 bytes of data (used for float, or electively for int/uint and other small integer types)
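To make the header concrete: each field's header (tag) is itself a varint that packs the field number and the wire-type together. A small sketch per the protobuf encoding rules:
// The low 3 bits of the tag are the wire-type; the remaining bits are the field number.
static int MakeTag(int fieldNumber, int wireType)
{
    return (fieldNumber << 3) | wireType;
}
static int GetWireType(int tag) { return tag & 0x7; }
static int GetFieldNumber(int tag) { return tag >> 3; }
// e.g. MakeTag(1, 2) == 0x0A: field 1, wire-type 2 (length-prefixed string)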
I suspect a field is causing the problem, how to debug this?
Are you serializing to a file? The most likely cause (in my experience) is that you have overwritten an existing file, but have not truncated it; i.e. it was 200 bytes; you've re-written it, but with only 182 bytes. There are now 18 bytes of garbage on the end of your stream that is tripping it up. Files must be truncated when re-writing protocol buffers. You can do this with FileMode:
using (var file = new FileStream(path, FileMode.Truncate))
{
    // write
}
or alternatively by SetLength after writing your data:
file.SetLength(file.Position);
Other possible cause
You are (accidentally) deserializing a stream into a different type than what was serialized. It's worth double-checking both sides of the conversation to ensure this is not happening.
This can also be caused by an attempt to write more than one protobuf message to a single stream. The solution is to use SerializeWithLengthPrefix and DeserializeWithLengthPrefix.
Why this happens:
The protobuf specification supports a fairly small number of wire-types (the binary storage formats) and data-types (the .NET etc. data-types). Additionally, this is not 1:1, nor is it 1:many or many:1 - a single wire-type can be used for multiple data-types, and a single data-type can be encoded via any of multiple wire-types. As a consequence, you cannot fully understand a protobuf fragment unless you already know the schema, so you know how to interpret each value. When you are, say, reading an Int32 data-type, the supported wire-types might be "varint", "fixed32" and "fixed64", whereas when reading a String data-type, the only supported wire-type is "string".
If there is no compatible map between the data-type and wire-type, then the data cannot be read, and this error is raised.
Now let's look at why this occurs in the scenario here:
[ProtoContract]
public class Data1
{
    [ProtoMember(1, IsRequired = true)]
    public int A { get; set; }
}

[ProtoContract]
public class Data2
{
    [ProtoMember(1, IsRequired = true)]
    public string B { get; set; }
}

class Program
{
    static void Main(string[] args)
    {
        var d1 = new Data1 { A = 1 };
        var d2 = new Data2 { B = "Hello" };
        var ms = new MemoryStream();
        Serializer.Serialize(ms, d1);
        Serializer.Serialize(ms, d2);
        ms.Position = 0;
        var d3 = Serializer.Deserialize<Data1>(ms); // This will fail
        var d4 = Serializer.Deserialize<Data2>(ms);
        Console.WriteLine("{0} {1}", d3, d4);
    }
}
In the above, two messages are written directly after each-other. The complication is: protobuf is an appendable format, with append meaning "merge". A protobuf message does not know its own length, so the default way of reading a message is: read until EOF. However, here we have appended two different types. If we read this back, it does not know when we have finished reading the first message, so it keeps reading. When it gets to data from the second message, we find ourselves reading a "string" wire-type, but we are still trying to populate a Data1 instance, for which member 1 is an Int32. There is no map between "string" and Int32, so it explodes.
The *WithLengthPrefix methods allow the serializer to know where each message finishes; so, if we serialize a Data1 and a Data2 using the *WithLengthPrefix methods, then deserialize a Data1 and a Data2 likewise, it correctly splits the incoming data between the two instances, only reading the right value into the right object.
Additionally, when storing heterogeneous data like this, you might want to additionally assign (via *WithLengthPrefix) a different field-number to each class; this provides greater visibility of which type is being deserialized. There is also a method in Serializer.NonGeneric which can then be used to deserialize the data without needing to know in advance what we are deserializing:
// Data1 is "1", Data2 is "2"
Serializer.SerializeWithLengthPrefix(ms, d1, PrefixStyle.Base128, 1);
Serializer.SerializeWithLengthPrefix(ms, d2, PrefixStyle.Base128, 2);
ms.Position = 0;
var lookup = new Dictionary<int, Type> { { 1, typeof(Data1) }, { 2, typeof(Data2) } };
object obj;
while (Serializer.NonGeneric.TryDeserializeWithLengthPrefix(ms,
    PrefixStyle.Base128, fieldNum => lookup[fieldNum], out obj))
{
    Console.WriteLine(obj); // writes Data1 on the first iteration,
                            // and Data2 on the second iteration
}
Previous answers already explain the problem better than I can. I just want to add an even simpler way to reproduce the exception.
This error will also occur simply if the type of a serialized ProtoMember is different from the expected type during deserialization.
For instance if the client sends the following message:
public class DummyRequest
{
    [ProtoMember(1)]
    public int Foo { get; set; }
}
But the server deserializes the message into the following class:
public class DummyRequest
{
    [ProtoMember(1)]
    public string Foo { get; set; }
}
Then this will result in the following (for this case slightly misleading) error message:
ProtoBuf.ProtoException: Invalid wire-type; this usually means you have over-written a file without truncating or setting the length
It will even occur if the property name changed. Let's say the client sent the following instead:
public class DummyRequest
{
    [ProtoMember(1)]
    public int Bar { get; set; }
}
This will still cause the server to deserialize the int Bar to string Foo which causes the same ProtoBuf.ProtoException.
I hope this helps somebody debugging their application.
Also check the obvious: that all your subclasses have the [ProtoContract] attribute. It's easy to miss one when you have a rich DTO hierarchy.
I've seen this issue when using the improper Encoding type to convert the bytes to and from strings.
I needed to use Encoding.Default and not Encoding.UTF8:
string str;
using (var ms = new MemoryStream())
{
    Serializer.Serialize(ms, obj);
    var bytes = ms.ToArray();
    str = Encoding.Default.GetString(bytes);
}
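As an aside (our addition, not part of the original answer): Base64 round-trips arbitrary bytes losslessly regardless of text encoding, which is the approach a later answer here also lands on. A sketch, with obj and MyType as placeholder names:
string str;
using (var ms = new MemoryStream())
{
    Serializer.Serialize(ms, obj);
    str = Convert.ToBase64String(ms.ToArray()); // bytes -> string, losslessly
}
var roundTripped = Serializer.Deserialize<MyType>(
    new MemoryStream(Convert.FromBase64String(str)));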
If you are using SerializeWithLengthPrefix, please mind that casting the instance to the object type breaks the deserialization code and causes ProtoBuf.ProtoException: Invalid wire-type.
using (var ms = new MemoryStream())
{
    var msg = new Message();
    Serializer.SerializeWithLengthPrefix(ms, (object)msg, PrefixStyle.Base128); // Casting msg to object breaks the deserialization code.
    ms.Position = 0;
    Serializer.DeserializeWithLengthPrefix<Message>(ms, PrefixStyle.Base128);
}
This happened in my case because I had something like this:
var ms = new MemoryStream();
Serializer.Serialize(ms, batch);
_queue.Add(Convert.ToBase64String(ms.ToArray()));
So basically I was putting a base64 into a queue and then, on the consumer side I had:
var stream = new MemoryStream(Encoding.UTF8.GetBytes(myQueueItem));
var batch = Serializer.Deserialize<List<EventData>>(stream);
So although the type of each myQueueItem was correct, I forgot that I had converted the bytes to a Base64 string. The solution was to convert it back:
var bytes = Convert.FromBase64String(myQueueItem);
var stream = new MemoryStream(bytes);
var batch = Serializer.Deserialize<List<EventData>>(stream);
