I have a MemoryStream which I write into as I receive data off the network. Since the data can be broken up, the stream may hold a partial message or multiple messages. When deserializing, I place the pointer back at the beginning of the stream and try to deserialize a class of mine. I have the deserialize call wrapped in a try/catch block, but when I get to the deserialize line, the application just quits (no exception, no more lines run in the function, etc).
I have multiple questions:
What is the best way to receive a stream of XML data from the network that may or may not be complete, and if so may or may not have more than one message?
Does the deserializer need to know about the encoding to decode the XML within the MemoryStream?
Does deserialization place the stream pointer after the deserialized object?
Can you deserialize multiple objects within a single stream?
1) You can leverage the XmlReader class, which "provides forward-only, read-only access to a stream of XML data". That may help you process XML data that may not be complete. http://msdn.microsoft.com/en-us/library/vstudio/system.xml.xmlreader
2) If you are referring to mixing ASCII, UTF-8, etc., then yes; otherwise I am not sure what the question is.
3) That depends on the deserializer you are using.
4) Yes, with the XmlReader class you can extract attributes and XML fragments for later consumption (although the solution is rather ugly).
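As a rough illustration of answers 1 and 4, here is a minimal sketch of pulling every complete top-level element out of a buffer with XmlReader. It assumes the messages arrive as back-to-back XML elements with no per-message XML declaration; the class and method names are hypothetical, and it collects XElement objects rather than the asker's own type (XmlSerializer.Deserialize also has an overload that accepts an XmlReader, so the subtree reader could be handed to it instead). When given a Stream, XmlReader detects the encoding from a byte-order mark or XML declaration if one is present.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class FragmentReader
{
    // Returns every complete top-level element found in the buffer.
    // A truncated trailing message surfaces as an XmlException; the sketch
    // stops there and returns what it has read so far.
    public static List<XElement> ReadCompleteMessages(Stream buffer)
    {
        var messages = new List<XElement>();
        // Fragment conformance allows more than one root element.
        var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
        buffer.Position = 0;
        using (XmlReader reader = XmlReader.Create(buffer, settings))
        {
            try
            {
                while (reader.MoveToContent() == XmlNodeType.Element)
                {
                    // The subtree reader exposes exactly one element.
                    using (XmlReader subtree = reader.ReadSubtree())
                    {
                        messages.Add(XElement.Load(subtree));
                    }
                    reader.Read(); // step past the element just consumed
                }
            }
            catch (XmlException)
            {
                // Incomplete data at the end of the buffer: wait for more bytes.
            }
        }
        return messages;
    }
}
```

The caller would keep appending network data to the buffer and retry; this sketch does not track how many bytes the complete messages occupied, so trimming consumed data from the buffer is left out.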
Related
I was using XmlSerializer when I came across someone using XmlTextWriter.
What is the difference between those two?
To me, they serve the same function which is to create XML files. Microsoft website said that XmlTextWriter provides a fast, non-cached, forward-only way of generating streams but I don't really know what that means.
The XmlTextWriter class is an object that knows XML. You can use it to generate arbitrary XML documents. It doesn't matter where the data's coming from; you can pull data for XML elements, attributes, and contents along with the actual structure of the XML document from whatever source you see fit, and it doesn't need to match any particular object's structure or data.
On the other hand XmlSerializer is an object that knows types. It has the features necessary to analyze a type, extract the important information, and write that information out. It happens to be able to use an XmlTextWriter object to perform the actual I/O; you can provide your own, or at some level it will always create a similar object to handle the actual I/O. In other words, the serializer object doesn't really know XML per se, nor does it need to. It delegates that work to another object.
Microsoft website said that XmlTextWriter provides a fast, non-cached, forward-only way of generating streams but I don't really know what that means.
"fast": not slow
"non-cached": important pieces of information are not stored in memory longer than absolutely necessary
"forward-only": you cannot revisit parts of the XML document you've already created
That is in contrast to other methods for generating XML documents, in which the entire document structure is held in memory as it's constructed and written to a file only once the entire document has been built. This is often described as the "document object model", or DOM.
The writer approach tends to be more efficient in performance because the XML data is being generated on the fly, as needed, directly from other in-memory data structures you already have. Because the DOM approach requires the entire file's data and structure to be represented in memory at once, it will usually use more memory, which in some cases can reduce performance (though, frankly, on modern computers and for typical XML documents, this is usually a complete non-issue).
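To make the distinction concrete, here is a small sketch that produces XML both ways. It uses XmlWriter.Create rather than constructing an XmlTextWriter directly; the Person type, the element names, and the helper names are all made up for illustration.

```csharp
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical type used only for this example.
public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class XmlOutputDemo
{
    // The writer knows XML: we decide the structure call by call,
    // and it does not have to mirror any type.
    public static string WriteByHand()
    {
        var sb = new StringBuilder();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            writer.WriteStartElement("note");
            writer.WriteAttributeString("priority", "high");
            writer.WriteString("hello");
            writer.WriteEndElement();
        }
        return sb.ToString();
    }

    // The serializer knows types: it inspects Person and drives
    // a writer on our behalf.
    public static string WriteFromType(Person person)
    {
        var serializer = new XmlSerializer(typeof(Person));
        var sb = new StringBuilder();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            serializer.Serialize(writer, person);
        }
        return sb.ToString();
    }
}
```

In both cases the output is emitted forward-only as the calls are made; nothing like a DOM tree is built first.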
I am forced to work with a crappy 3rd party API where there is no consistency with the return type. So I submit a programmatic web request, grab the Stream back and the underlying content might be an error message (worse still because it can be either raw text, or xml they return) or it returns a binary file. I have no means of knowing what format to expect with any given request so I need a way to introspect this at runtime.
How should I go about tackling this? The stream is non-seekable so I can't do anything other than read it. I usually try not to use exception handling for flow control, but it seems like that might be the best way to handle it: always treat the response as the expected binary file type, and if anything blows up, catch the exception and try to extract what should be an error message.
One thing that comes to mind is to examine the first x number of bytes in the stream. If the first bit is well formed xml, then it's probably xml. The problem is trying to determine the difference between raw text or binary.
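A minimal sketch of that byte-sniffing idea follows. It copies the non-seekable stream into a MemoryStream so the content can be re-read, then inspects up to the first 512 bytes. The heuristic (control bytes mean binary, a leading '<' means XML) and every name here are assumptions for illustration, not a reliable detector: UTF-16 text, for instance, would be reported as binary, and a very large download would be held in memory.

```csharp
using System;
using System.IO;

public enum PayloadKind { Xml, Text, Binary }

public static class PayloadSniffer
{
    // Heuristic classification of the first `count` bytes of `prefix`.
    public static PayloadKind Classify(byte[] prefix, int count)
    {
        int start = 0;
        // Skip a UTF-8 byte-order mark if present.
        if (count >= 3 && prefix[0] == 0xEF && prefix[1] == 0xBB && prefix[2] == 0xBF)
        {
            start = 3;
        }
        // Any control byte other than tab, LF, or CR suggests binary content.
        for (int i = start; i < count; i++)
        {
            byte b = prefix[i];
            bool allowedControl = b == (byte)'\t' || b == (byte)'\n' || b == (byte)'\r';
            if (b < 0x20 && !allowedControl) return PayloadKind.Binary;
        }
        // The first non-whitespace byte decides between XML and plain text.
        for (int i = start; i < count; i++)
        {
            byte b = prefix[i];
            if (b == (byte)' ' || b == (byte)'\t' || b == (byte)'\n' || b == (byte)'\r') continue;
            return b == (byte)'<' ? PayloadKind.Xml : PayloadKind.Text;
        }
        return PayloadKind.Text;
    }

    // Buffers the whole source so it becomes seekable, then classifies it.
    public static PayloadKind Sniff(Stream source, out MemoryStream buffered)
    {
        buffered = new MemoryStream();
        source.CopyTo(buffered);
        buffered.Position = 0;
        int count = (int)Math.Min(buffered.Length, 512);
        return Classify(buffered.GetBuffer(), count);
    }
}
```

After Sniff returns, the buffered stream is positioned at the start, so it can be handed to an XML parser, a text reader, or the binary file handler depending on the result.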
How can I show all elements in a protocol buffer message?
Do I need to use reflection or convert the message into an XML message and then show it?
Ideally some generic code that will work for any message.
Lars
A protobuf message is internally ambiguous unless you have the .proto schema (or can infer a schema) available, as (for example) a "string" wire-type could represent:
a utf-8 string
a BLOB
a sub-message
a packed array
Similar ambiguity exists for all wire-types (except perhaps "groups").
My recommendation would be to run it through your existing deserialization process (against the type-model that you presumably have available in the project) to get an object model suitable for inspection. From the object-model you have all the usual options - reflection, serialization via XmlSerializer / JavaScriptSerializer, etc.
If all you have is the raw data, there is a Wireshark plugin that might help, and protobuf-net has a ProtoReader class that might be useful for parsing such a stream; but the emphasis here is that the stream is tricky to decipher in isolation.
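To show what can and cannot be recovered without a schema, here is a hand-rolled sketch that walks the top-level fields of a raw protobuf payload and reports only the field number and wire type. It does not use protobuf-net's ProtoReader; the class and method names are hypothetical. Note that a length-delimited field is reported only by its size, because, as described above, the bytes alone do not say whether it is a string, a BLOB, a sub-message, or a packed array.

```csharp
using System;
using System.Collections.Generic;

public static class ProtoWireDump
{
    // Lists the top-level fields of a raw protobuf payload.
    public static List<string> Describe(byte[] data)
    {
        var lines = new List<string>();
        int pos = 0;
        while (pos < data.Length)
        {
            ulong tag = ReadVarint(data, ref pos);
            ulong field = tag >> 3;        // upper bits: field number
            int wireType = (int)(tag & 7); // lower three bits: wire type
            switch (wireType)
            {
                case 0: // varint
                    ulong value = ReadVarint(data, ref pos);
                    lines.Add("field " + field + ": varint " + value);
                    break;
                case 1: // fixed 64-bit
                    Skip(data, ref pos, 8);
                    lines.Add("field " + field + ": 64-bit");
                    break;
                case 2: // length-delimited: string, bytes, sub-message, or packed array
                    ulong length = ReadVarint(data, ref pos);
                    Skip(data, ref pos, length);
                    lines.Add("field " + field + ": length-delimited, " + length + " bytes");
                    break;
                case 5: // fixed 32-bit
                    Skip(data, ref pos, 4);
                    lines.Add("field " + field + ": 32-bit");
                    break;
                default: // groups (3 and 4) are not handled in this sketch
                    throw new NotSupportedException("wire type " + wireType);
            }
        }
        return lines;
    }

    private static void Skip(byte[] data, ref int pos, ulong count)
    {
        if (count > (ulong)(data.Length - pos)) throw new FormatException("truncated field");
        pos += (int)count;
    }

    // Decodes a base-128 varint, least significant group first.
    private static ulong ReadVarint(byte[] data, ref int pos)
    {
        ulong value = 0;
        int shift = 0;
        while (true)
        {
            if (pos >= data.Length) throw new FormatException("truncated varint");
            if (shift > 63) throw new FormatException("varint too long");
            byte b = data[pos++];
            value |= (ulong)(b & 0x7F) << shift;
            if ((b & 0x80) == 0) return value;
            shift += 7;
        }
    }
}
```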
I am faced with the following problem. I need to (de)serialize (binary) a stream of objects to a single file on disk. Serialization part is not an issue, just open a stream in append mode and use .Net's BinaryFormatter Serialize method and you are done. The problem with this approach is that I can't just give this stream to BinaryFormatter's deserialize function, what it contains is not a single instance of the object I've serialized.
Does a common solution to this problem exist? All objects serialized to a given stream are of the same type, so at least we don't need to figure out what is to be deserialized, but that alone doesn't seem to suggest a way out of this to me.
Clarification based on replies: The number of objects sent in is expected to be large and it is therefore infeasible to hold them all in a wrapper collection (as flushing to disk would require to load them all into memory -> add the new ones -> flush to disk).
Normally when you serialize a single object you get a file that contains:
[Object]
What I am creating is a file that contains:
[Object][Object][Object][Object]...[Object]
And I need to deserialize the individual Object instances.
Thanks in advance!
Answer: Since the answer is alluded to in this thread (with sufficient clarity), but never explicitly stated, I thought I'd state it here:
while (fileStream.Position < fileStream.Length)
    messages.Add((Message)formatter.Deserialize(fileStream)); // reads exactly one object per call
The BinaryFormatter will deserialize one object at a time as desired :) You might want to cache the fileStream.Length property, since the length appears to be re-computed every time you call the property, slowing things down. I've got no clue why that didn't work the first time I tried it before posting this question, but it does work flawlessly now.
Try putting your objects into a serializable collection (I believe List<T> is serializable), then (de)serializing that object.
EDIT in response to clarification:
I just realized that this question has the same answer as this question. Rather than try to reinvent an answer, I'd just take a look at Marc Gravell's answer, or at this one.
A file is a serialization, so I would think sending your stream directly to a file would do what you seem to want. A more specific explanation of your situation would help, to provide a more useful answer. (I wish I could enter this as a 'comment', but somehow the comment button is not available to me.)
Why can't we serialize objects into a random-access file, when we can serialize them into a sequential-access file?
"C# does not provide a means to obtain an object’s size at runtime. This means that, if we serialize the class, we cannot guarantee a fixed-length record size" (from the book I am reading).
So we cannot read the random-access file, because we don't know the size of each object in the file. How could we do seeking?
Any object marked with the SerializableAttribute attribute can be serialized (in most scenarios). The result from serialization is always directed to a stream, which may very well be a file output stream.
Are you asking why an object graph cannot be deserialized partially? .NET serialization only [de]serializes complete object graphs. Otherwise you'll have to turn to other serialization formatters, or write your own.
For direct random access to a file, you must open the file with a stream that supports seeking.
EDIT:
Seeking in the resulting stream from a serialization has no practical purpose: only the serialization formatter knows what's in there anyway, and it should always be fed the very start of the stream.
For persisting the data into other structures, do it in a two-stage process: first, direct the serialized bytes to a stream (e.g. a memory-backed one) whose size you can read afterwards; then write the data to the actual backing store, using that knowledge of the size.
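The two-stage process described above might look roughly like this. The sketch is serializer-agnostic: the caller passes a delegate that writes one object into a scratch stream, and the store prefixes each record with its length and returns the offset at which it was written. The class name, method names, and the 4-byte length prefix are choices made for this example.

```csharp
using System;
using System.IO;
using System.Text;

public static class LengthPrefixedStore
{
    // Stage 1: serialize into a scratch MemoryStream to learn the size.
    // Stage 2: append [int length][payload] to the store.
    // Returns the offset of the record so it can be located again later.
    public static long Append(Stream store, Action<Stream> serialize)
    {
        using (var scratch = new MemoryStream())
        {
            serialize(scratch);
            store.Seek(0, SeekOrigin.End);
            long offset = store.Position;
            var writer = new BinaryWriter(store, Encoding.UTF8, true); // leave the store open
            writer.Write((int)scratch.Length);
            writer.Write(scratch.GetBuffer(), 0, (int)scratch.Length);
            writer.Flush();
            return offset;
        }
    }

    // Seeks to a known record offset and returns that record's payload bytes.
    public static byte[] ReadAt(Stream store, long offset)
    {
        store.Seek(offset, SeekOrigin.Begin);
        var reader = new BinaryReader(store, Encoding.UTF8, true); // leave the store open
        int length = reader.ReadInt32();
        byte[] payload = reader.ReadBytes(length);
        if (payload.Length != length) throw new EndOfStreamException("truncated record");
        return payload;
    }
}
```

Keeping the returned offsets in a separate index would give random access to variable-length records; reading the records sequentially only needs the length prefixes.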
You can't predict the size of a serialized object, because the serialized representation might differ a lot from the runtime representation.
It is still possible to achieve exact control over output size if you use only primitive types and write them with a BinaryWriter, but that is not serialization per se.
The default binary serialization in .NET serializes a whole object graph, which, by its nature of being a graph, doesn't have a constant size, which means each serialization object (record) won't have a constant size, preventing random access.
To be able to randomly access any record in a file, write your own implementation of the binary serialization of your class, or use a database. If you need a simple, no-install single-threaded database engine, have a look at SQL Server Compact.
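As a sketch of the hand-written approach, the following writes fixed-length records with BinaryWriter so that record i always starts at byte i * RecordSize. The record layout (an int, a double, and a 16-byte ASCII name field padded with zero bytes) and all names are invented for this example; names longer than 16 bytes are silently truncated.

```csharp
using System;
using System.IO;
using System.Text;

public static class FixedRecordFile
{
    public const int NameBytes = 16;
    // int (4) + double (8) + fixed-width name (16) = 28 bytes per record.
    public const int RecordSize = sizeof(int) + sizeof(double) + NameBytes;

    public static void Write(Stream file, int index, int id, double balance, string name)
    {
        // Pad or truncate the name to exactly NameBytes bytes.
        byte[] nameField = new byte[NameBytes];
        byte[] raw = Encoding.ASCII.GetBytes(name);
        Array.Copy(raw, nameField, Math.Min(raw.Length, NameBytes));

        file.Seek((long)index * RecordSize, SeekOrigin.Begin);
        var writer = new BinaryWriter(file, Encoding.ASCII, true); // leave the file open
        writer.Write(id);
        writer.Write(balance);
        writer.Write(nameField); // raw bytes, no length prefix
        writer.Flush();
    }

    public static (int Id, double Balance, string Name) Read(Stream file, int index)
    {
        file.Seek((long)index * RecordSize, SeekOrigin.Begin);
        var reader = new BinaryReader(file, Encoding.ASCII, true); // leave the file open
        int id = reader.ReadInt32();
        double balance = reader.ReadDouble();
        string name = Encoding.ASCII.GetString(reader.ReadBytes(NameBytes)).TrimEnd('\0');
        return (id, balance, name);
    }
}
```

Any seekable stream works here, for example a FileStream opened with FileMode.OpenOrCreate.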