Parsing concatenated, non-delimited XML messages from TCP-stream using C# - c#

I am trying to parse XML messages which are send to my C# application over TCP. Unfortunately, the protocol can not be changed and the XML messages are not delimited and no length prefix is used. Moreover the character encoding is not fixed but each message starts with an XML declaration <?xml>. The question is, how can i read one XML message at a time, using C#.
Up to now, I tried to read the data from the TCP stream into a byte array and use it through a MemoryStream. The problem is, the buffer might contain more than one XML messages or the first message may be incomplete. In these cases, I get an exception when trying to parse it with XmlReader.Read or XmlDocument.Load, but unfortunately the XmlException does not really allow me to distinguish the problem (except parsing the localized error string).
I tried using XmlReader.Read and count the number of Element and EndElement nodes. That way I know when I am finished reading the first, entire XML message.
However, there are several problems. If the buffer does not yet contain the entire message, how can I distinguish the XmlException from an actually invalid, non-well-formed message? In other words, if an exception is thrown before reading the first root EndElement, how can I decide whether to abort the connection with error, or to collect more bytes from the TCP stream?
If no exception occurs, the XmlReader is positioned at the start of the root EndElement. Casting the XmlReader to IXmlLineInfo gives me the current LineNumber and LinePosition, however it is not straight forward to get the byte position where the EndElement really ends. In order to do that, I would have to convert the byte array into a string (with the encoding specified in the XML declaration), seek to LineNumber,LinePosition and convert that back to the byte offset. I try to do that with StreamReader.ReadLine, but the stream reader gives no public access to the current byte position.
All this seams very inelegant and non robust. I wonder if you have ideas for a better solution. Thank you.

After locking around for some time I think I can answer my own question as following (I might be wrong, corrections are welcome):
I found no method so that the XmlReader can continue parsing a second XML message (at least not, if the second message has an XmlDeclaration). XmlTextReader.ResetState could do something similar, but for that I would have to assume the same encoding for all messages. Therefor I could not connect the XmlReader directly to the TcpStream.
After closing the XmlReader, the buffer is not positioned at the readers last position. So it is not possible to close the reader and use a new one to continue with the next message. I guess the reason for this is, that the reader could not successfully seek on every possible input stream.
When XmlReader throws an exception it can not be determined whether it happened because of an premature EOF or because of a non-wellformed XML. XmlReader.EOF is not set in case of an exception. As workaround I derived my own MemoryBuffer, which returns the very last byte as a single byte. This way I know that the XmlReader was really interested in the last byte and the following exception is likely due to a truncated message (this is kinda sloppy, in that it might not detect every non-wellformed message. However, after appending more bytes to the buffer, sooner or later the error will be detected.
I could cast my XmlReader to the IXmlLineInfo interface, which gives access to the LineNumber and the LinePosition of the current node. So after reading the first message I remember these positions and use it to truncate the buffer. Here comes the really sloppy part, because I have to use the character encoding to get the byte position. I am sure you could find test cases for the code below where it breaks (e.g. internal elements with mixed encoding). But up to now it worked for all my tests.
Here is the parser class I came up with -- may it be useful (I know, its very far from perfect...)
class XmlParser {
private byte[] buffer = new byte[0];
public int Length {
get {
return buffer.Length;
}
}
// Append new binary data to the internal data buffer...
public XmlParser Append(byte[] buffer2) {
if (buffer2 != null && buffer2.Length > 0) {
// I know, its not an efficient way to do this.
// The EofMemoryStream should handle a List<byte[]> ...
byte[] new_buffer = new byte[buffer.Length + buffer2.Length];
buffer.CopyTo(new_buffer, 0);
buffer2.CopyTo(new_buffer, buffer.Length);
buffer = new_buffer;
}
return this;
}
// MemoryStream which returns the last byte of the buffer individually,
// so that we know that the buffering XmlReader really locked at the last
// byte of the stream.
// Moreover there is an EOF marker.
private class EofMemoryStream: Stream {
public bool EOF { get; private set; }
private MemoryStream mem_;
public override bool CanSeek {
get {
return false;
}
}
public override bool CanWrite {
get {
return false;
}
}
public override bool CanRead {
get {
return true;
}
}
public override long Length {
get {
return mem_.Length;
}
}
public override long Position {
get {
return mem_.Position;
}
set {
throw new NotSupportedException();
}
}
public override void Flush() {
mem_.Flush();
}
public override long Seek(long offset, SeekOrigin origin) {
throw new NotSupportedException();
}
public override void SetLength(long value) {
throw new NotSupportedException();
}
public override void Write(byte[] buffer, int offset, int count) {
throw new NotSupportedException();
}
public override int Read(byte[] buffer, int offset, int count) {
count = Math.Min(count, Math.Max(1, (int)(Length - Position - 1)));
int nread = mem_.Read(buffer, offset, count);
if (nread == 0) {
EOF = true;
}
return nread;
}
public EofMemoryStream(byte[] buffer) {
mem_ = new MemoryStream(buffer, false);
EOF = false;
}
protected override void Dispose(bool disposing) {
mem_.Dispose();
}
}
// Parses the first xml message from the stream.
// If the first message is not yet complete, it returns null.
// If the buffer contains non-wellformed xml, it ~should~ throw an exception.
// After reading an xml message, it pops the data from the byte array.
public Message deserialize() {
if (buffer.Length == 0) {
return null;
}
Message message = null;
Encoding encoding = Message.default_encoding;
//string xml = encoding.GetString(buffer);
using (EofMemoryStream sbuffer = new EofMemoryStream (buffer)) {
XmlDocument xmlDocument = null;
XmlReaderSettings settings = new XmlReaderSettings();
int LineNumber = -1;
int LinePosition = -1;
bool truncate_buffer = false;
using (XmlReader xmlReader = XmlReader.Create(sbuffer, settings)) {
try {
// Read to the first node (skipping over some element-types.
// Don't use MoveToContent here, because it would skip the
// XmlDeclaration too...
while (xmlReader.Read() &&
(xmlReader.NodeType==XmlNodeType.Whitespace ||
xmlReader.NodeType==XmlNodeType.Comment)) {
};
// Check for XML declaration.
// If the message has an XmlDeclaration, extract the encoding.
switch (xmlReader.NodeType) {
case XmlNodeType.XmlDeclaration:
while (xmlReader.MoveToNextAttribute()) {
if (xmlReader.Name == "encoding") {
encoding = Encoding.GetEncoding(xmlReader.Value);
}
}
xmlReader.MoveToContent();
xmlReader.Read();
break;
}
// Move to the first element.
xmlReader.MoveToContent();
if (xmlReader.EOF) {
return null;
}
// Read the entire document.
xmlDocument = new XmlDocument();
xmlDocument.Load(xmlReader.ReadSubtree());
} catch (XmlException e) {
// The parsing of the xml failed. If the XmlReader did
// not yet look at the last byte, it is assumed that the
// XML is invalid and the exception is re-thrown.
if (sbuffer.EOF) {
return null;
}
throw e;
}
{
// Try to serialize an internal data structure using XmlSerializer.
Type type = null;
try {
type = Type.GetType("my.namespace." + xmlDocument.DocumentElement.Name);
} catch (Exception e) {
// No specialized data container for this class found...
}
if (type == null) {
message = new Message();
} else {
// TODO: reuse the serializer...
System.Xml.Serialization.XmlSerializer ser = new System.Xml.Serialization.XmlSerializer(type);
message = (Message)ser.Deserialize(new XmlNodeReader(xmlDocument));
}
message.doc = xmlDocument;
}
// At this point, the first XML message was sucessfully parsed.
// Remember the lineposition of the current end element.
IXmlLineInfo xmlLineInfo = xmlReader as IXmlLineInfo;
if (xmlLineInfo != null && xmlLineInfo.HasLineInfo()) {
LineNumber = xmlLineInfo.LineNumber;
LinePosition = xmlLineInfo.LinePosition;
}
// Try to read the rest of the buffer.
// If an exception is thrown, another xml message appears.
// This way the xml parser could tell us that the message is finished here.
// This would be prefered as truncating the buffer using the line info is sloppy.
try {
while (xmlReader.Read()) {
}
} catch {
// There comes a second message. Needs workaround for trunkating.
truncate_buffer = true;
}
}
if (truncate_buffer) {
if (LineNumber < 0) {
throw new Exception("LineNumber not given. Cannot truncate xml buffer");
}
// Convert the buffer to a string using the encoding found before
// (or the default encoding).
string s = encoding.GetString(buffer);
// Seek to the line.
int char_index = 0;
while (--LineNumber > 0) {
// Recognize \r , \n , \r\n as newlines...
char_index = s.IndexOfAny(new char[] {'\r', '\n'}, char_index);
// char_index should not be -1 because LineNumber>0, otherwise an RangeException is
// thrown, which is appropriate.
char_index++;
if (s[char_index-1]=='\r' && s.Length>char_index && s[char_index]=='\n') {
char_index++;
}
}
char_index += LinePosition - 1;
var rgx = new System.Text.RegularExpressions.Regex(xmlDocument.DocumentElement.Name + "[ \r\n\t]*\\>");
System.Text.RegularExpressions.Match match = rgx.Match(s, char_index);
if (!match.Success || match.Index != char_index) {
throw new Exception("could not find EndElement to truncate the xml buffer.");
}
char_index += match.Value.Length;
// Convert the character offset back to the byte offset (for the given encoding).
int line1_boffset = encoding.GetByteCount(s.Substring(0, char_index));
// remove the bytes from the buffer.
buffer = buffer.Skip(line1_boffset).ToArray();
} else {
buffer = new byte[0];
}
}
return message;
}
}

Reading into a MemoryStream is not necessary to use an XmlReader. You can attach the reader more directly to the stream to read as much as you require to reach the end of the XML document. A BufferedStream can be utilized to improve the efficiency of reading from the socket directly.
string server = "tcp://myserver"
string message = "GetMyXml"
int port = 13000;
int bufferSize = 1024;
using(var client = new TcpClient(server, port))
using(var clientStream = client.GetStream())
using(var bufferedStream = new BufferedStream(clientStream, bufferSize))
using(var xmlReader = XmlReader.Create(bufferedStream))
{
xmlReader.MoveToContent();
try
{
while(xmlReader.Read())
{
// Check for XML declaration.
if(xmlReader.NodeType != XmlNodeType.XmlDeclaration)
{
throw new Exception("Expected XML declaration.");
}
// Move to the first element.
xmlReader.Read();
xmlReader.MoveToContent();
// Read the root element.
// Hand this document to another method to process further.
var xmlDocument = XmlDocument.Load(xmlReader.ReadSubtree());
}
}
catch(XmlException ex)
{
// Record exception reading stream.
// Move reader to start of next document or rethrow exception to exit.
}
}
The key to making this work is the call to XmlReader.ReadSubtree() which creates a child reader on top of the parent reader, one that will treat the current element (in this case the root element) as the entire XML tree. This should allow you to parse document elements separately.
My code's a little sloppy around reading the document, especially as I ignore all the information in the XML declaration. I'm sure there's room for improvement, but hopefully this gets you on the right track.

Assuming that you can change the protocol, I'd suggest adding start and stop markers to the messages, so that when you read it all in as a text stream you can split it up in separate messages (leaving incomplete messages in an "incoming buffer" of some kind), clean up the markers and then you know that you've got exactly one message at the time.

The 2 issues that I found were:
XmlReader will only permit an XML declaration at the very beginning. Since it can't be reset it needs to be recreated.
Once XmlReader has done its work it will usually have consumed additional characters after the end of the document because it uses the Read(char[], int, int) method.
My (brittle) workaround is to create a wrapper that only fills the array until a '>' is encountered. This keeps the XmlReader from consuming characters past the ending > of the document it was parsing:
public class SegmentingReader : TextReader {
private TextReader reader;
private char trigger;
public SegmentingReader(TextReader reader, char trigger) {
this.reader = reader;
this.trigger = trigger;
}
// Dispose omitted for brevity
public override int Peek() { return reader.Peek(); }
public override int Read() { return reader.Read(); }
public override int Read(char[] buffer, int index, int count) {
int n = 0;
while (n < count) {
char ch = (char)reader.Read();
buffer[index + n] = ch;
n++;
if (ch == trigger) break;
}
return n;
}
}
Then it can be used as simply as:
using(var inputReader = new SegmentingReader(/*TextReader from somewhere */))
using(var serializer = new XmlSerializer(typeof(SerializedClass)))
while (inputReader.Peek() != -1)
{
using (var xmlReader = XmlReader.Create(inputReader)) {
xmlReader.MoveToContent();
var obj = serializer.Deserialize(xmlReader.ReadSubtree());
DoStuff(obj);
}
}

Related

Split one xml file to multiple using c# [duplicate]

[Case]
I have reveived a bunch of 'xml files' with metadata about a big number of documents in them. At least, that was what I requested. What I received where 'xml files' without a root element, they are structured something like this (i left out a bunch of elements):
<folder name = "abc"></folder>
<folder name = "abc/def">
<document name = "ghi1">
</document>
<document name = "ghi2">
</document>
</folder>
[Problem]
When I try to read the file in an XmlTextReader object it fails telling me that there is no root element.
[Current workaround]
Of course I can read the file as a stream, append < xmlroot> and < /xmlroot> and write the stream to a new file and read that one in XmlTextReader. Which is exactly what I am doing now, but I prefer not to 'tamper' with the original data.
[Requested solution]
I understand that I should use XmlTextReader for this, with the DocumentFragment option. However, this gives the compiletime error:
An unhandled exception of type 'System.Xml.XmlException' occurred in
System.Xml.dll
Additional information: XmlNodeType DocumentFragment is not supported
for partial content parsing. Line 1, position 1.
[Faulty code]
using System.Diagnostics;
using System.Xml;
namespace XmlExample
{
class Program
{
static void Main(string[] args)
{
string file = #"C:\test.txt";
XmlTextReader tr = new XmlTextReader(file, XmlNodeType.DocumentFragment, null);
while(tr.Read())
Debug.WriteLine("NodeType: {0} NodeName: {1}", tr.NodeType, tr.Name);
}
}
}
This works:
using System.Diagnostics;
using System.Xml;
namespace XmlExample
{
class Program
{
static void Main(string[] args)
{
string file = #"C:\test.txt";
XmlReaderSettings settings = new XmlReaderSettings();
settings.ConformanceLevel = ConformanceLevel.Fragment;
using (XmlReader reader = XmlReader.Create(file, settings))
{
while (reader.Read())
Debug.WriteLine("NodeType: {0} NodeName: {1}", reader.NodeType, reader.Name);
}
}
}
}
Even though the XmlReader can be made to read the data using the ConformanceLevel.Fragment option as demonstrated by Martijn, it seems that XmlDataDocument does not like the idea of having multiple root elements.
I thought I'd try a different approach, much like the one you're currently using, but without the intermediate file. Most XML libraries (XmlDocument, XDocument, XmlDataDocument) can take a TextReader as an input, so I've implemented one of my own. It's used like so:
var dataDocument = new XmlDataDocument();
dataDocument.Load(new FakeRootStreamReader(File.OpenRead("test.xml")));
The code of the actual class:
public class FakeRootStreamReader : TextReader
{
private static readonly char[] _rootStart;
private static readonly char[] _rootEnd;
private readonly TextReader _innerReader;
private int _charsRead;
private bool _eof;
static FakeRootStreamReader()
{
_rootStart = "<root>".ToCharArray();
_rootEnd = "</root>".ToCharArray();
}
public FakeRootStreamReader(Stream stream)
{
_innerReader = new StreamReader(stream);
}
public FakeRootStreamReader(TextReader innerReader)
{
_innerReader = innerReader;
}
public override int Read(char[] buffer, int index, int count)
{
if (!_eof && _charsRead < _rootStart.Length)
{
// Prepend root element
return ReadFake(_rootStart, buffer, index, count);
}
if (!_eof)
{
// Normal reading operation
int charsRead = _innerReader.Read(buffer, index, count);
if (charsRead > 0) return charsRead;
// We've reached the end of the Stream
_eof = true;
_charsRead = 0;
}
// Append root element end tag at the end of the Stream
return ReadFake(_rootEnd, buffer, index, count);
}
private int ReadFake(char[] source, char[] buffer, int offset, int count)
{
int length = Math.Min(source.Length - _charsRead, count);
Array.Copy(source, _charsRead, buffer, offset, length);
_charsRead += length;
return length;
}
}
The first call to Read(...) will return only the <root> element. Subsequent calls read the stream as normal, until the end of the stream is reached, then the end tag is outputted.
The code is a bit... meh... mostly because I wanted to handle some never-gonna-happen cases where someone tries to read the stream less than 6 characters at a time.

Xml over tcp without message frame

I have to implement a tcp connection where raw xml data is passed.
Unfortunately there is no message framing, I now this is realy bad, but I have to deal with this...
The Message would look like this:
<?xml version="1.0" encoding="utf-8"?>
<DATA></DATA>
or this
<?xml version="1.0" encoding="utf-8"?>
<DATA />
Now I have to receive messages that could have self closed tags. The message is always the same, it is always like xml description and a data tag with inner xml that is the message content.
So if it would be without self closed tags, this would be easy, but how can I read both?
By the way I am using the TcpListener.
Edit :
Everything is fine if there is no self closed tag.
if (_clientSocket != null)
{
NetworkStream networkStream = _clientSocket.GetStream();
_clientSocket.ReceiveTimeout = 100; // 1000 miliseconds
while (_continueProcess)
{
if (networkStream.DataAvailable)
{
bool isMessageComplete = false;
String messageString = String.Empty;
while (!isMessageComplete)
{
var bytes = new byte[_clientSocket.ReceiveBufferSize];
try
{
int bytesReaded = networkStream.Read(bytes, 0, (int) _clientSocket.ReceiveBufferSize);
if (bytesReaded > 0)
{
var data = Encoding.UTF8.GetString(bytes, 0, bytesReaded);
messageString += data;
if (messageString.IndexOf("<DATA", StringComparison.OrdinalIgnoreCase) > 0 &&
messageString.IndexOf("</DATA", StringComparison.OrdinalIgnoreCase) > 0)
{
isMessageComplete = true;
}
}
}
catch (IOException)
{
// Timeout
}
catch (SocketException)
{
Console.WriteLine("Conection is broken!");
break;
}
}
}
Thread.Sleep(200);
} // while ( _continueProcess )
networkStream.Close();
_clientSocket.Close();
}
Edit 2 (30.03.2015 12:00)
Unfortunately it is not possible to use some kind of message frame.
So I ended up to use this part of code (DATA is my root node):
if (_clientSocket != null)
{
NetworkStream networkStream = _clientSocket.GetStream();
_clientSocket.ReceiveTimeout = 100;
string data = string.Empty;
while (_continueProcess)
{
try
{
if (networkStream.DataAvailable)
{
Stopwatch sw = new Stopwatch();
sw.Start();
var bytes = new byte[_clientSocket.ReceiveBufferSize];
int completeXmlLength = 0;
int bytesReaded = networkStream.Read(bytes, 0, (int) _clientSocket.ReceiveBufferSize);
if (bytesReaded > 0)
{
message.AddRange(bytes);
data += Encoding.UTF8.GetString(bytes, 0, bytesReaded);
if (data.IndexOf("<?", StringComparison.Ordinal) == 0)
{
if (data.IndexOf("<DATA", StringComparison.Ordinal) > 0)
{
Int32 rootStartPos = data.IndexOf("<DATA", StringComparison.Ordinal);
completeXmlLength += rootStartPos;
var root = data.Substring(rootStartPos);
int rootCloseTagPos = root.IndexOf(">", StringComparison.Ordinal);
Int32 rootSelfClosedTagPos = root.IndexOf("/>", StringComparison.Ordinal);
// If there is an empty tag that is self closed.
if (rootSelfClosedTagPos > 0)
{
string rootTag = root.Substring(0, rootSelfClosedTagPos +1);
// If there is no '>' between the self closed tag and the start of '<DATA'
// the root element is empty.
if (rootTag.IndexOf(">", StringComparison.Ordinal) <= 0)
{
completeXmlLength += rootSelfClosedTagPos;
string messageXmlString = data.Substring(0, completeXmlLength + 1);
data = data.Substring(messageXmlString.Length);
try
{
// parse complete xml.
XDocument xmlDocument = XDocument.Parse(messageXmlString);
}
catch(Exception)
{
// Invalid Xml.
}
continue;
}
}
if (rootCloseTagPos > 0)
{
Int32 rootEndTagStartPos = root.IndexOf("</DATA", StringComparison.Ordinal);
if (rootEndTagStartPos > 0)
{
var endTagString = root.Substring(rootEndTagStartPos);
completeXmlLength += rootEndTagStartPos;
Int32 completeEndPos = endTagString.IndexOf(">", StringComparison.Ordinal);
if (completeEndPos > 0)
{
completeXmlLength += completeEndPos;
string messageXmlString = data.Substring(0, completeXmlLength + 1);
data = data.Substring(messageXmlString.Length);
try
{
// parse complete xml.
XDocument xmlDocument = XDocument.Parse(messageXmlString);
}
catch(Exception)
{
// Invalid Xml.
}
}
}
}
}
}
}
sw.Stop();
string timeElapsed = sw.Elapsed.ToString();
}
}
catch (IOException)
{
data = String.Empty;
}
catch (SocketException)
{
Console.WriteLine("Conection is broken!");
break;
}
}
This code I had use if ther were some kind of message framing, in this case 4 bytes of message length:
if (_clientSocket != null)
{
NetworkStream networkStream = _clientSocket.GetStream();
_clientSocket.ReceiveTimeout = 100;
string data = string.Empty;
while (_continueProcess)
{
try
{
if (networkStream.DataAvailable)
{
Stopwatch sw = new Stopwatch();
sw.Start();
var lengthBytes = new byte[sizeof (Int32)];
int bytesReaded = networkStream.Read(lengthBytes, 0, sizeof (Int32) - offset);
if (bytesReaded > 0)
{
offset += bytesReaded;
message.AddRange(lengthBytes.Take(bytesReaded));
}
if (offset < sizeof (Int32))
{
continue;
}
Int32 length = BitConverter.ToInt32(message.Take(sizeof(Int32)).ToArray(), 0);
message.Clear();
while (length > 0)
{
Int32 bytesToRead = length < _clientSocket.ReceiveBufferSize ? length : _clientSocket.ReceiveBufferSize;
byte[] messageBytes = new byte[bytesToRead];
bytesReaded = networkStream.Read(messageBytes, 0, bytesToRead);
length = length - bytesReaded;
message.AddRange(messageBytes);
}
try
{
string xml = Encoding.UTF8.GetString(message.ToArray());
XDocument xDocument = XDocument.Parse(xml);
}
catch (Exception ex)
{
// Invalid Xml.
}
sw.Stop();
string timeElapsed = sw.Elapsed.ToString();
}
}
catch (IOException)
{
data = String.Empty;
}
catch (SocketException)
{
Console.WriteLine("Conection is broken!");
break;
}
}
Like you can see I wanted to measure the elapsed time, to see witch methode has a better performance. The strange thing is that the methode whith no message framing has an average time of 0,2290 ms, the other methode has an average time of 1,2253 ms.
Can someone explain me why? I thought the one without message framing would be slower...
Hand the NetworkStream to the .NET XML infrastructure. For example create an XmlReader from the NetworkStream.
Unfortunately I did not find a built-in way to easily create an XmlDocument from an XmlReader that has multiple documents in it. It complains about multiple root elements (which is correct). You would need to wrap the XmlReader and make it stop returning nodes when the first document is done. You can do that by keeping track of some state and by looking at the nesting level. When the nesting level is zero again the first document is done.
This is just a raw sketch. I'm pretty sure this will work and it handles all possible XML documents.
No need for this horrible string processing code that you have there. The existing code looks quite slow as well but since this approach is much better it serves no purpose to comment on the perf issues. You need to throw this away.
I had the same problem - 3rd party system sends messages in XML format via TCP but my TCP client application may receive message partially or several messages at once. One of my colleagues proposed very simple and quite generic solution.
The idea is to have a string buffer which should be populated char by char from TCP stream, after each char try to parse buffer content with regular .Net XML parser. If parser throws an exception - continue adding chars to the buffer. Otherwise - message is ready and can be processed by application.
Here is the code:
private object _dataReceiverLock = new object();
private string _messageBuffer;
private Stopwatch _timeSinceLastMessage = new Stopwatch();
private List<string> NormalizeMessage(string rawMsg)
{
lock (_dataReceiverLock)
{
List<string> result = new List<string>();
//following code prevents buffer to store too old information
if (_timeSinceLastMessage.ElapsedMilliseconds > _settings.ResponseTimeout)
{
_messageBuffer = string.Empty;
}
_timeSinceLastMessage.Restart();
foreach (var ch in rawMsg)
{
_messageBuffer += ch;
if (ch == '>')//to avoid extra checks
{
if (IsValidXml(_messageBuffer))
{
result.Add(_messageBuffer);
_messageBuffer = string.Empty;
}
}
}
return result;
}
}
private bool IsValidXml(string xml)
{
try
{
//fastest way to validate XML format correctness
using (XmlTextReader reader = new XmlTextReader(new StringReader(xml)))
{
while (reader.Read()) { }
}
return true;
}
catch
{
return false;
}
}
Few comments:
Need to control lifetime of string buffer, otherwise in case network disconnection old information may stay in the buffer forever
There major problem here is the performance - parsing after every new character is quite slow. So need to add some optimizations, such as parse only after '>' character.
Make sure this method is thread safe, otherwise several threads may flood string buffer with different XML pieces.
The usage is simple:
private void _tcpClient_DataReceived(byte[] data)
{
var rawMsg = Encoding.Unicode.GetString(data);
var normalizedMessages = NormalizeMessage(rawMsg);
foreach (var normalizedMessage in normalizedMessages)
{
//TODO: your logic
}
}

Read a 'fake' xml document (xml fragment) in a XmlTextReader object

[Case]
I have reveived a bunch of 'xml files' with metadata about a big number of documents in them. At least, that was what I requested. What I received where 'xml files' without a root element, they are structured something like this (i left out a bunch of elements):
<folder name = "abc"></folder>
<folder name = "abc/def">
<document name = "ghi1">
</document>
<document name = "ghi2">
</document>
</folder>
[Problem]
When I try to read the file in an XmlTextReader object it fails telling me that there is no root element.
[Current workaround]
Of course I can read the file as a stream, append < xmlroot> and < /xmlroot> and write the stream to a new file and read that one in XmlTextReader. Which is exactly what I am doing now, but I prefer not to 'tamper' with the original data.
[Requested solution]
I understand that I should use XmlTextReader for this, with the DocumentFragment option. However, this gives the compiletime error:
An unhandled exception of type 'System.Xml.XmlException' occurred in
System.Xml.dll
Additional information: XmlNodeType DocumentFragment is not supported
for partial content parsing. Line 1, position 1.
[Faulty code]
using System.Diagnostics;
using System.Xml;
namespace XmlExample
{
class Program
{
static void Main(string[] args)
{
string file = #"C:\test.txt";
XmlTextReader tr = new XmlTextReader(file, XmlNodeType.DocumentFragment, null);
while(tr.Read())
Debug.WriteLine("NodeType: {0} NodeName: {1}", tr.NodeType, tr.Name);
}
}
}
This works:
using System.Diagnostics;
using System.Xml;
namespace XmlExample
{
class Program
{
static void Main(string[] args)
{
string file = #"C:\test.txt";
XmlReaderSettings settings = new XmlReaderSettings();
settings.ConformanceLevel = ConformanceLevel.Fragment;
using (XmlReader reader = XmlReader.Create(file, settings))
{
while (reader.Read())
Debug.WriteLine("NodeType: {0} NodeName: {1}", reader.NodeType, reader.Name);
}
}
}
}
Even though the XmlReader can be made to read the data using the ConformanceLevel.Fragment option as demonstrated by Martijn, it seems that XmlDataDocument does not like the idea of having multiple root elements.
I thought I'd try a different approach, much like the one you're currently using, but without the intermediate file. Most XML libraries (XmlDocument, XDocument, XmlDataDocument) can take a TextReader as an input, so I've implemented one of my own. It's used like so:
var dataDocument = new XmlDataDocument();
dataDocument.Load(new FakeRootStreamReader(File.OpenRead("test.xml")));
The code of the actual class:
public class FakeRootStreamReader : TextReader
{
private static readonly char[] _rootStart;
private static readonly char[] _rootEnd;
private readonly TextReader _innerReader;
private int _charsRead;
private bool _eof;
static FakeRootStreamReader()
{
_rootStart = "<root>".ToCharArray();
_rootEnd = "</root>".ToCharArray();
}
public FakeRootStreamReader(Stream stream)
{
_innerReader = new StreamReader(stream);
}
public FakeRootStreamReader(TextReader innerReader)
{
_innerReader = innerReader;
}
public override int Read(char[] buffer, int index, int count)
{
if (!_eof && _charsRead < _rootStart.Length)
{
// Prepend root element
return ReadFake(_rootStart, buffer, index, count);
}
if (!_eof)
{
// Normal reading operation
int charsRead = _innerReader.Read(buffer, index, count);
if (charsRead > 0) return charsRead;
// We've reached the end of the Stream
_eof = true;
_charsRead = 0;
}
// Append root element end tag at the end of the Stream
return ReadFake(_rootEnd, buffer, index, count);
}
private int ReadFake(char[] source, char[] buffer, int offset, int count)
{
int length = Math.Min(source.Length - _charsRead, count);
Array.Copy(source, _charsRead, buffer, offset, length);
_charsRead += length;
return length;
}
}
The first call to Read(...) will return only the <root> element. Subsequent calls read the stream as normal, until the end of the stream is reached, then the end tag is outputted.
The code is a bit... meh... mostly because I wanted to handle some never-gonna-happen cases where someone tries to read the stream less than 6 characters at a time.

Trying to deserialize more than 1 object at the same time

Im trying to send some object from a server to the client.
My problem is that when im sending only 1 object, everything works correctly. But at the moment i add another object an exception is thrown - "binary stream does not contain a valid binaryheader" or "No map for object (random number)".
My thoughts are that the deserialization does not understand where the stream starts / ends and i hoped that you guys can help me out here.
heres my deserialization code:
public void Listen()
{
try
{
bool offline = true;
Dispatcher.Invoke(System.Windows.Threading.DispatcherPriority.Normal,
new Action(() => offline = Offline));
while (!offline)
{
TcpObject tcpObject = new TcpObject();
IFormatter formatter = new BinaryFormatter();
tcpObject = (TcpObject)formatter.Deserialize(serverStream);
if (tcpObject.Command == Command.Transfer)
{
SentAntenna sentAntenna = (SentAntenna)tcpObject.Object;
int idx = 0;
foreach (string name in SharedProperties.AntennaNames)
{
if (name == sentAntenna.Name)
break;
idx++;
}
if (idx < 9)
{
PointCollection pointCollection = new PointCollection();
foreach (Frequency f in sentAntenna.Frequencies)
pointCollection.Add(new Point(f.Channel, f.Intensity));
SharedProperties.AntennaPoints[idx] = pointCollection;
}
}
}
}
catch (Exception ex)
{
MessageBox.Show(ex.Message); // raise an event
}
}
serialization code:
case Command.Transfer:
Console.WriteLine("Transfering");
Thread transfer = new Thread(new ThreadStart(delegate
{
try
{
string aName = tcpObject.Object.ToString();
int indx = 0;
foreach (string name in names)
{
if (name == aName)
break;
indx++;
}
if (indx < 9)
{
while (true) // need to kill when the father thread terminates
{
if (antennas[indx].Frequencies != null)
{
lock (antennas[indx].Frequencies)
{
TcpObject sendTcpObject = new TcpObject();
sendTcpObject.Command = Command.Transfer;
SentAntenna sa = new SentAntenna(antennas[indx].Frequencies, aName);
sendTcpObject.Object = sa;
formatter.Serialize(networkStream, sendTcpObject);
}
}
}
}
}
catch (Exception ex) { Console.WriteLine(ex); }
}));
transfer.Start();
break;
Interesting. There's nothing particularly odd in your serialization code, and I've seen people use vanilla concatenation for multiple objects in the past, although I've actually always advised against it as BinaryFormatter does not explicitly claim this scenario is OK. But: if it isn't, the only thing I can suggest is to implement your own framing; so your write code becomes:
serialize to an empty MemoryStream
note the length and write the length to the NetworkStream, for example as a simple fixed-width 32-bit network-byte-order integer
write the payload from the MemoryStream to the NetworkStream
rinse, repeat
And the read code becomes:
read exactly 4 bytes and compute the length
buffer that many bytes into a MemoryStream
deserialize from the NetworkStream
(Noting in both cases to set the MemoryStream's position back to 0 between write and read)
You can also implement a Stream-subclass that caps the length if you want to avoid a buffer when reading, bit that is more complex.
apperantly i came up with a really simple solution. I just made sure only 1 thread is allowed to transfer data at the same time so i changed this line of code:
formatter.Serialize(networkStream, sendTcpObject);
to these lines of code:
if (!transfering) // making sure only 1 thread is transfering data
{
transfering = true;
formatter.Serialize(networkStream, sendTcpObject);
transfering = false;
}

Finding elements from a memory mapped file in C#

I need to find certain elements within a memory mapped file. I have managed to map the file, however I get some problems finding the elements. My idea was to save all file elements into a list, and then search on that list.
How do I create a function that returns a list with all elements of the mapped file?
// Index indicates the line to read from
public List<string> GetElement(int index) {
}
The way I am mapping the file:
public void MapFile(string path)
{
string mapName = Path.GetFileName(path);
try
{
// Opening existing mmf
if (mapName != null)
{
_mmf = MemoryMappedFile.OpenExisting(mapName);
}
// Setting the pointer at the start of the file
_pointer = 0;
// We create the accessor to read the file
_accessor = _mmf.CreateViewAccessor();
// We mark the file as open
_open = true;
}
catch (Exception ex) {....}
try
{
// Trying to create the mmf
_mmf = MemoryMappedFile.CreateFromFile(path);
// Setting the pointer at the start of the file
_pointer = 0;
// We create the accessor to read the file
_accessor = _mmf.CreateViewAccessor();
// We mark the file as open
_open = true;
}
catch (Exception exInner){..}
}
The file that I am mapping is a UTF-8 ASCII file. Nothing weird.
What I have done:
var list = new List<string>();
// String to store what we read
string trace = string.Empty;
// We read the byte of the pointer
b = _accessor.ReadByte(_pointer);
int tracei = 0;
var traceb = new byte[2048];
// If b is different from 0 we have some data to read
if (b != 0)
{
while (b != 0)
{
// Check if it's an endline
if (b == '\n')
{
trace = Encoding.UTF8.GetString(traceb, 0, tracei - 1);
list.Add(trace);
trace = string.Empty;
tracei = 0;
_lastIndex++;
}
else
{
traceb[tracei++] = b;
}
// Advance and read
b = _accessor.ReadByte(++_pointer);
}
}
The code is difficult to read for humans and is not very efficient. How can I improve it?
You are re-inventing StreamReader, it does exactly what you do. The odds that you really want a memory-mapped file are quite low, they take a lot of virtual memory which you only can make pay off if you repeatedly read the same file at different offsets. Which is very unlikely, text files must be read sequentially since you don't know how long the lines are.
Which makes this one line of code the probable best replacement for what you posted:
string[] trace = System.IO.File.ReadAllLines(path);

Categories

Resources