Determining content from StreamReader? - c#

I am forced to work with a crappy 3rd party API where there is no consistency in the return type. I submit a programmatic web request and grab the Stream back, and the underlying content might be an error message (worse still, it can be either raw text or XML) or a binary file. I have no means of knowing what format to expect for any given request, so I need a way to introspect this at runtime.
How should I go about tackling this? The stream is non-seekable, so I can't do anything other than read it. I usually try not to use exception handling for flow control, but it seems like that might be the best way to handle it: always treat the response as the expected binary file type, and if anything blows up, catch the exception and try to extract what should be an error message.

One thing that comes to mind is to examine the first x bytes of the stream. If the first chunk is well-formed XML, then it's probably XML. The problem is trying to tell the difference between raw text and binary.
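A minimal sketch of that "peek at the first bytes" idea, in case it helps. The names (PayloadSniffer, ReadPrefix, Classify, CopyAll) are made up for illustration, and the classification is only a heuristic: it assumes text responses are ASCII or UTF-8 (UTF-16 text contains zero bytes and would be reported as binary), and a binary format whose header happens to be printable could be reported as text. If the expected file type has a known signature, checking for that signature directly is likely to be more reliable.

```csharp
using System;
using System.IO;
using System.Text;

public enum PayloadKind { Empty, Xml, Text, Binary }

public static class PayloadSniffer
{
    // Reads up to maxBytes from a forward-only stream. Stream.Read may return
    // fewer bytes than requested, so loop until the buffer is full or the stream ends.
    public static byte[] ReadPrefix(Stream source, int maxBytes)
    {
        var buffer = new byte[maxBytes];
        int total = 0;
        while (total < maxBytes)
        {
            int n = source.Read(buffer, total, maxBytes - total);
            if (n == 0) break;
            total += n;
        }
        Array.Resize(ref buffer, total);
        return buffer;
    }

    // Heuristic only: a control byte other than tab/LF/CR suggests binary;
    // otherwise a leading '<' (after an optional UTF-8 BOM and whitespace) suggests XML.
    public static PayloadKind Classify(byte[] prefix)
    {
        if (prefix.Length == 0) return PayloadKind.Empty;

        int start = 0;
        if (prefix.Length >= 3 && prefix[0] == 0xEF && prefix[1] == 0xBB && prefix[2] == 0xBF)
            start = 3; // skip the UTF-8 byte order mark

        for (int i = start; i < prefix.Length; i++)
        {
            byte b = prefix[i];
            if (b < 0x20 && b != 0x09 && b != 0x0A && b != 0x0D)
                return PayloadKind.Binary;
        }

        string text = Encoding.UTF8.GetString(prefix, start, prefix.Length - start).TrimStart();
        return text.StartsWith("<", StringComparison.Ordinal) ? PayloadKind.Xml : PayloadKind.Text;
    }

    // The stream cannot be rewound, so the consumed prefix has to be
    // written out again ahead of the unread remainder.
    public static void CopyAll(byte[] prefix, Stream rest, Stream destination)
    {
        destination.Write(prefix, 0, prefix.Length);
        rest.CopyTo(destination);
    }
}
```

The idea is to call ReadPrefix once on the response stream, branch on Classify, and then hand both the prefix and the stream to whatever consumes the full payload (CopyAll shows one way to stitch them back together).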

Related

Intercepting a filestream...Impossible?

No doubt this isn't possible, but I would like to see if anyone has an ingenious suggestion. We have a third-party assembly which can output an image stored internally within a bespoke database to a file using an internal method 'SaveToFile', for example:
3rdParty.Scripting.ImageManager man = new 3rdParty.Scripting.ImageManager("ref");
3rdParty.Scripting.Image itemImg = man.GetImage(orderNumber);
itemImg.SaveToFile(@"c:\file.jpg");
itemImg.SaveToFile returns nothing; it just creates a bitmap internally and writes that to a filestream. We have absolutely no access to the compiled method.
What I need to do is somehow intercept the filestream and read the bitmap. I know this probably isn't possible, but I'm no expert, so I wanted to see if there is a magical way to do this.
If all else fails I'll save the file and then read it back. I just want to avoid saving to disk if I can obtain the data directly and eventually convert it to a base64 string value.
Unfortunately unless the 3rd party library provides a SaveToStream method where you could provide the stream from the outside there's no way to achieve what you are after. You will have to save the contents to a temporary file and then read the contents back.
That's why it's usually best practice when designing a library to provide methods taking Streams as I/O parameters as this would give the consumer the control of whether he wants to save it to a file, memory or network stream.
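For what it's worth, the temporary-file fallback can be kept fairly tidy. This is only a sketch: TempFileCapture and CaptureAsBase64 are hypothetical names, and the Action<string> parameter stands in for the third-party SaveToFile call. It also assumes the library does not care about the file extension, since Path.GetTempFileName produces a .tmp file.

```csharp
using System;
using System.IO;

public static class TempFileCapture
{
    // saveToFile stands in for a call such as itemImg.SaveToFile(path).
    public static string CaptureAsBase64(Action<string> saveToFile)
    {
        // Creates an empty, uniquely named file in the temp directory.
        string path = Path.GetTempFileName();
        try
        {
            saveToFile(path);
            byte[] bytes = File.ReadAllBytes(path);
            return Convert.ToBase64String(bytes);
        }
        finally
        {
            // Remove the temp file whether or not the save succeeded.
            File.Delete(path);
        }
    }
}
```

Usage would look something like: string base64 = TempFileCapture.CaptureAsBase64(path => itemImg.SaveToFile(path));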

Deserializing .NET stream with multiple objects

I have a MemoryStream which I write into as I receive data off the network. Since the data can be broken up, the stream may hold a partial message or multiple messages. When deserializing, I place the pointer back at the beginning of the stream and try to deserialize a class of mine. I have the deserialize call wrapped in a try/catch block, but when I get to the deserialize line, the application just quits (no exception, no more lines run in the function, etc.).
I have multiple questions:
1) What is the best way to receive a stream of XML data from the network that may or may not be complete, and if so may or may not have more than one message?
2) Does the deserializer need to know about the encoding to decode the XML within the MemoryStream?
3) Does deserialization place the stream pointer after the deserialized object?
4) Can you deserialize multiple objects within a single stream?
1) You can leverage the XmlReader class which "provides forward-only, read-only access to a stream of XML data". That may help you translate xml data that may not be complete. http://msdn.microsoft.com/en-us/library/vstudio/system.xml.xmlreader
2) If you are referring to mixing ASCII, UTF-8, etc., then yes; otherwise I am not sure what the question is.
3) That depends on the deserializer you are using.
4) Yes, with the XmlReader class you can extract attributes and XML fragments for later consumption (although the solution is rather ugly).
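A rough sketch of the XmlReader approach for several messages in one stream, assuming the messages are bare top-level elements with no XML declaration in between (a second declaration would make the reader throw). FragmentReader and ReadMessages are illustrative names only. A partial message at the end of the buffer will surface as an XmlException, so the caller would need to keep the unread tail and retry once more data arrives.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class FragmentReader
{
    // Returns the outer XML of each top-level element found in the stream.
    public static List<string> ReadMessages(Stream stream)
    {
        // Fragment conformance allows more than one root-level element.
        var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
        var messages = new List<string>();

        using (XmlReader reader = XmlReader.Create(stream, settings))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element)
                    messages.Add(reader.ReadOuterXml()); // leaves the reader just past the element
                else
                    reader.Read();                       // skip whitespace, comments, etc.
            }
        }
        return messages;
    }
}
```

Each returned string can then be handed to whatever deserializer is in use, for example via a StringReader.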

Show all elements in a protocol buffer message

How can I show all elements in a protocol buffer message?
Do I need to use reflection or convert the message into an XML message and then show it?
Ideally some generic code that will work for any message.
Lars
A protobuf message is internally ambiguous unless you have the .proto schema (or can infer a schema) available, as (for example) a "string" wire-type could represent:
a utf-8 string
a BLOB
a sub-message
a packed array
Similar ambiguity exists for all wire-types (except perhaps "groups").
My recommendation would be to run it through your existing deserialization process (against the type-model that you presumably have available in the project) to get an object model suitable for inspection. From the object-model you have all the usual options - reflection, serialization via XmlSerializer / JavaScriptSerializer, etc.
If all you have is the raw data, there is a Wireshark plugin that might help, or protobuf-net has a ProtoReader class that might be useful for parsing such a stream; but the emphasis here is that the stream is tricky to decipher in isolation.
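To illustrate how little the raw data says without a schema, here is a hand-rolled sketch that walks the top-level fields of a message and reports only the field number and wire type. It does not use protobuf-net; WireDump and its methods are hypothetical names, and groups (wire types 3 and 4) are deliberately not handled. Note that a length-delimited field is reported only as a byte count, because the bytes alone cannot tell a string from a BLOB, sub-message or packed array.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class WireDump
{
    // Reads a base-128 varint; returns false on a clean end of stream.
    static bool TryReadVarint(Stream s, out ulong value)
    {
        value = 0;
        int shift = 0;
        while (true)
        {
            int b = s.ReadByte();
            if (b < 0)
            {
                if (shift == 0) return false;
                throw new EndOfStreamException("Truncated varint.");
            }
            value |= (ulong)(b & 0x7F) << shift;
            if ((b & 0x80) == 0) return true;
            shift += 7;
            if (shift > 63) throw new InvalidDataException("Varint too long.");
        }
    }

    // Consumes exactly count bytes or throws if the stream ends first.
    static void SkipBytes(Stream s, int count)
    {
        var buffer = new byte[count];
        int total = 0;
        while (total < count)
        {
            int n = s.Read(buffer, total, count - total);
            if (n == 0) throw new EndOfStreamException("Truncated field.");
            total += n;
        }
    }

    // One line per top-level field: field number, wire type, and what little
    // can be said about the payload without a schema.
    public static List<string> Describe(Stream s)
    {
        var lines = new List<string>();
        while (TryReadVarint(s, out ulong tag))
        {
            int field = (int)(tag >> 3);   // upper bits: field number
            int wireType = (int)(tag & 7); // lower three bits: wire type
            switch (wireType)
            {
                case 0:
                    if (!TryReadVarint(s, out ulong v)) throw new EndOfStreamException();
                    lines.Add($"field {field}: varint {v}");
                    break;
                case 1:
                    SkipBytes(s, 8);
                    lines.Add($"field {field}: 64-bit");
                    break;
                case 2:
                    if (!TryReadVarint(s, out ulong len)) throw new EndOfStreamException();
                    if (len > int.MaxValue) throw new InvalidDataException("Length too large.");
                    SkipBytes(s, (int)len);
                    lines.Add($"field {field}: length-delimited, {len} bytes");
                    break;
                case 5:
                    SkipBytes(s, 4);
                    lines.Add($"field {field}: 32-bit");
                    break;
                default:
                    throw new NotSupportedException($"Wire type {wireType} is not handled in this sketch.");
            }
        }
        return lines;
    }
}
```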

Where can I find a list of all possible messages that an XmlException can contain?

I'm writing an XML code editor and I want to display syntax errors in the user interface. Because my code editor is strongly constrained to a particular problem domain and audience, I want to rewrite certain XmlException messages to be more meaningful for users. For instance, an exception message like this:
'"' is an unexpected token. The expected token is '='. Line 30, position 35.
... is very technical and not very informative to my audience. Instead, I'd like to rewrite it and other messages to something else. For completeness' sake, that means I need to build up a dictionary of existing messages mapped to the new messages I would like to display instead. To accomplish that, I'm going to need a list of all possible messages XmlException can contain.
Is there such a list somewhere? Or can I find out the possible messages through inspection of objects in C#?
Edit: specifically, I am using XmlDocument.LoadXml to parse a string into an XmlDocument, and that method throws an XmlException when there are syntax errors. So specifically, my question is where I can find a list of messages applied to XmlException by XmlDocument.LoadXml. The discussion about there potentially being a limitless variation of actual strings in the Message property of XmlException is moot.
Edit 2: More specifically, I'm not looking for advice as to whether I should be attempting this; I'm just looking for any clues to a way to obtain the various messages. Ben's answer is a step in the right direction. Does anyone know of another way?
Technically there is no such thing: any class that throws an XmlException can set the message to any string. Really it depends on which classes you are using and how they handle exceptions. It is perfectly possible you are using a class that includes context-specific information in the message, e.g. info about some XML node or attribute that is malformed. In that case the number of unique message strings could be infinite, depending on the XML being processed. It is equally possible that a particular class does not work in this way and has a finite number of messages that occur under specific circumstances.
Perhaps a better approach would be to use try/catch blocks in specific parts of your code, where you understand the processing that is taking place, and provide more generic error messages based on what is happening. E.g. in your example you could simply look at the line and character number and produce an error along the lines of "Error processing xml file LineX CharacterY", or even something as general as "error processing file".
Edit:
Further to your edit, I think you will have trouble doing what you require. Essentially you are trying to change one text string into another based on certain keywords that may be in the string. This is likely to be messy and inconsistent. If you really want to do it, I would advise using something like Redgate .NET Reflector to reflect out the LoadXml method and dig through the code to see how it handles different kinds of syntax errors in the XML and what messages it generates for each. This is likely to be time-consuming and difficult. If you want to hide the technical errors but still provide useful info to the user, then I would still recommend ignoring the error message and simply pointing the user to the location of the problem in the file.
Just my opinion, but ... spelunking the error messages and altering them before displaying them to the user seems like a really misguided idea.
First, the messages are different for each language. Even if you could collect them for English, and you're willing to pay the cost, they'll be different for other languages.
Second, even if you are dealing with a single language, there's no way to be sure that an external package hasn't injected a novel XmlException into the scope of LoadXml.
Last, the list of messages is not stable. It may change from release to release.
A better idea is to just emit an appropriate message from your own app, and optionally display -- maybe upon demand -- the original error message contained in the XmlException.
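Something along these lines, as a sketch only. FriendlyXml and Check are made-up names; the point is that the user-facing text is built from XmlException.LineNumber and XmlException.LinePosition rather than from the Message string, and the original message is kept aside for an optional "details" view.

```csharp
using System;
using System.Xml;

public static class FriendlyXml
{
    // Returns null when the XML parses; otherwise an app-defined message
    // built from the error position rather than from the exception's Message text.
    public static string Check(string xml, out string technicalDetail)
    {
        technicalDetail = null;
        try
        {
            new XmlDocument().LoadXml(xml);
            return null;
        }
        catch (XmlException ex)
        {
            technicalDetail = ex.Message; // original text, shown only on demand
            return $"There is a syntax problem near line {ex.LineNumber}, column {ex.LinePosition}.";
        }
    }
}
```

Because the message does not depend on the framework's wording, it stays the same across languages and releases.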

Validating and repairing xml

Is there a way to get more useful information on a validation error? XmlSchemaException provides the line number and position of the error, which makes little sense to me. An XML document, after all, is not about its transient textual representation. I'd like to get an enumerated error (or an error code) specifying what went wrong, and a node name (or an XPath) to locate the source of the problem, so that perhaps I can try to fix it.
Edit: I'm talking about valid xml documents - just not valid against a particular schema!
In my experience, you are lucky to get a line number and parse position.
You might consider validating via a DTD which can sometimes give slightly more interesting errors, however, on a project I currently work on, we validate using XSLTs. The transform checks the syntax and reports errors as outputted transform text. I would consider that route if you want more friendly error checking. For us, an empty output means no errors, otherwise we get some nice detail from the XSLT processing on what the error was and where.
You can accomplish this, sort of, by setting up an XmlReader whose XmlReaderSettings contain the schema and then using it to read through the input stream node by node. You can keep track of the last node read and have a pretty good idea of where you are in the document when a validation error happens.
I think that if you try this exercise, you'll discover that there are a lot of validation errors (e.g. required element missing) where the concept of the error node doesn't make much sense. Yes, the parent element is clearly what's in error in that case, but what really triggered the error was the reader encountering the end tag without ever seeing the required element, which is why the error line and position point at the end tag.
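Roughly what I have in mind, as a sketch rather than a finished solution. SchemaTracker and Validate are illustrative names. The handler runs while the reader is advancing, so the path it records is the stack of elements that were open after the previous node, i.e. the enclosing element, which matches the caveat above about "missing element" errors.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class SchemaTracker
{
    // Validates while reading node by node and records, for each validation
    // error, the path of elements that were open when the error was raised.
    public static List<string> Validate(TextReader xml, XmlSchemaSet schemas)
    {
        var errors = new List<string>();
        var open = new Stack<string>(); // names of the currently open elements

        var settings = new XmlReaderSettings
        {
            ValidationType = ValidationType.Schema,
            Schemas = schemas
        };
        settings.ValidationEventHandler += (sender, e) =>
        {
            string[] names = open.ToArray(); // innermost element first
            Array.Reverse(names);            // root first
            errors.Add("/" + string.Join("/", names) + ": " + e.Message);
        };

        using (XmlReader reader = XmlReader.Create(xml, settings))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && !reader.IsEmptyElement)
                    open.Push(reader.Name);
                else if (reader.NodeType == XmlNodeType.EndElement)
                    open.Pop();
            }
        }
        return errors;
    }
}
```

The path is approximate (it has no positional predicates, so repeated siblings are not distinguished), but it is usually enough to tell which container the validator was unhappy with.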
Personally, I'm not sure how to get a more detailed error. Typically, if you open the document and go to the location mentioned, you can easily find the error.
If the code isn't able to parse the file as valid XML, it is pretty hard for it to give an XPATH or other named XML detail.
It seems this is no easy task. Robert Rossney's answer comes closest to programmatically solving my problem, so I'll accept that for now. I'll continue using the XSL solution. Anyone finding a better way to resolve validation errors can respond to this thread.
