WCF streaming a large number of objects - C#

I have a WCF service that queries a database and returns a large number of records. There are so many records that the server runs out of memory and fails before it can return them.
So I want to send the records back as I fetch them from the database, or a set number of records at a time.
For additional clarity: I cannot collect all the fetched records into a collection on the server, as the server runs out of memory before the collection is complete. I want to find a way to send them back one by one, or in chunks, in a single call.
For example, in chunks:
1. Fetch the first 1000 records
2. Add them to a collection
3. Send the collection to the client
4. Clear the collection
5. Fetch the next 1000 records, and repeat from step 2
So my idea is that the web service code will look something like this:
public IEnumerable<Customer> GetAllCustomers()
{
    // Setup query
    string query = PrepareQuery();
    // Create connection
    var connection = new SqlConnection(ConnectionString);
    connection.Open();
    var sqlcommand = connection.CreateCommand();
    sqlcommand.CommandText = query;
    // Read results
    var reader = sqlcommand.ExecuteReader();
    while (reader.Read())
    {
        Customer customer = new Customer();
        foreach (var column in Columns)
        {
            int fieldIndex = reader.GetOrdinal(column.Name);
            object value = reader.GetValue(fieldIndex);
            customer[column.Name] = value;
        }
        yield return customer;
    }
}
I don't want to consider paging, as the ORDER BY on the SQL server is slow.
I'm looking for a way to do this in WCF.

I think you answered your own question. There are 2 ways to do it: stream or chunk.
You can do streaming in wcf - see https://learn.microsoft.com/en-us/dotnet/framework/wcf/feature-details/large-data-and-streaming
You get a Stream to write to, so you need to handle yourself how you encode your data on that stream, and how you decode it at the client.
The alternative is you do chunking/paging. You just modify your service so it accepts e.g. a page number or some other way to indicate which page is needed.
Which one you choose depends on the application, e.g. how much data? What is the nature of the client? Is it possible to use some field to page on? Etc.
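For the chunking approach, a minimal sketch of what a paged contract might look like (the interface name, operation name, and parameters are illustrative, not from the original service):
[ServiceContract]
public interface ICustomerService
{
    // The client asks for one page at a time; an empty result signals the end.
    [OperationContract]
    List<Customer> GetCustomersPage(int pageNumber, int pageSize);
}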
Here is some pseudo code for making a stream that can do this on the server side. It is based on the example here: https://learn.microsoft.com/en-us/dotnet/framework/wcf/feature-details/how-to-enable-streaming
I'm not writing the full compilable code for you, but this is the gist of it.
In the server:
public Stream GetBigData()
{
return new BigDataStream();
}
BigDataStream (the non-implemented methods are not shown):
class BigDataStream : Stream
{
    SqlDataReader _reader; // obtained in the constructor

    public BigDataStream()
    {
        // open DB connection
        // run your query
        // get a DataReader
    }

    // you need a buffer to encode your data between calls to Read
    List<byte> _encodeBuffer = new List<byte>();

    public override int Read(byte[] buffer, int offset, int count)
    {
        // read from the DataReader and populate the _encodeBuffer
        // until the _encodeBuffer contains at least count bytes
        // (or until there are no more records)
        // for example:
        while (_encodeBuffer.Count < count && _reader.Read())
        {
            // (1)
            // encode the record into a byte array. How to do this?
            // you can read into a class and then use the data
            // contract serialization for example. If you do this, you
            // will probably find it easier to prepend an integer which
            // specifies the length of the following encoded message.
            // This will make it easier for the client to deserialize it.
            // (2)
            // append the encoded record bytes (plus any length prefix
            // etc) to _encodeBuffer
        }
        // remove up to the first count bytes from _encodeBuffer
        // and copy them into buffer at the offset requested
        // return the number of bytes added
    }

    public override void Close()
    {
        // close the reader + db connection
        base.Close();
    }
}
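To make the gist concrete, here is one possible sketch of the Read method, assuming records are serialized with DataContractSerializer and length-prefixed as the comments above suggest (MapCustomer is a hypothetical row-to-object mapper, not part of the original answer):
public override int Read(byte[] buffer, int offset, int count)
{
    // Top up the buffer until we have enough bytes or run out of records.
    while (_encodeBuffer.Count < count && _reader.Read())
    {
        Customer customer = MapCustomer(_reader); // hypothetical mapper

        // Serialize one record to a byte array.
        byte[] payload;
        using (var ms = new MemoryStream())
        {
            new DataContractSerializer(typeof(Customer)).WriteObject(ms, customer);
            payload = ms.ToArray();
        }

        // Length prefix so the client knows where each record ends.
        _encodeBuffer.AddRange(BitConverter.GetBytes(payload.Length));
        _encodeBuffer.AddRange(payload);
    }

    // Hand back up to count bytes; returning 0 signals the end of the stream.
    int n = Math.Min(count, _encodeBuffer.Count);
    _encodeBuffer.CopyTo(0, buffer, offset, n);
    _encodeBuffer.RemoveRange(0, n);
    return n;
}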

Thanks to mikelegg & Reniuz for helping come to a solution. I wish I could give them the tick for the right answer, but I am afraid the next developer to read this question would not fully benefit. So here is what I ended up with.
Setup the config files for the Server and Client (Follow link: Large Data and Streaming)
Followed this solution, can download source code from here
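For reference, the key setting in those config files is the transfer mode. A minimal programmatic sketch of the binding (the values are illustrative; the linked article shows the full config-file version):
var binding = new BasicHttpBinding
{
    TransferMode = TransferMode.Streamed,     // this is what enables WCF streaming
    MaxReceivedMessageSize = int.MaxValue,    // allow very large responses
    ReceiveTimeout = TimeSpan.FromMinutes(10)
};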
I had to change the DBRowStream.DBThreadProc method a bit to get it to work, so I post the source code:
DBRowStream Class:
void DBThreadProc(object o)
{
SqlConnection con = null;
SqlCommand com = null;
try
{
con = new System.Data.SqlClient.SqlConnection(/*ConnectionString*/);
com = new SqlCommand();
com.Connection = con;
com.CommandText = PrepareQuery();
con.Open();
SqlDataReader reader = com.ExecuteReader();
int count = 0;
MemoryStream memStream = memStream1;
memStreamWriteStatus = 1;
readyToWriteToMemStream1.WaitOne();
while (reader.Read())
{
// Populate
Customer customer = new Customer();
foreach (var column in Columns)
{
int fieldIndex = reader.GetOrdinal(column);
object value = reader.GetValue(fieldIndex);
customer[column.Name] = value;
}
// Serialize: I used a custom Serializer
// but BinaryFormatter should be fine
DBDataFormatter.Serialize(memStream, customer);
count++;
if (count == PAGESIZE) // const int PAGESIZE = 10000
{
switch (memStreamWriteStatus)
{
case 1: // done writing to stream 1
{
memStream1.Position = 0;
readyToSendFromMemStream1.Set();
// write stream 1 is done...waiting for stream 2
readyToWriteToMemStream2.WaitOne();
memStream = memStream2;
memStream.Position = 0;
memStream.SetLength(0); // Added: reset the stream, else I was getting garbage data back
memStreamWriteStatus = 2;
break;
}
case 2: // done writing to stream 2
{
memStream2.Position = 0;
readyToSendFromMemStream2.Set();
// Write on stream 2 is done...waiting for stream 1
readyToWriteToMemStream1.WaitOne();
// done waiting for stream 1
memStream = memStream1;
memStreamWriteStatus = 1;
memStream.Position = 0;
memStream.SetLength(0); // Added: reset the stream, else I was getting garbage data back
break;
}
}
count = 0;
}
}
if (count > 0)
{
switch (memStreamWriteStatus)
{
case 1: // done writing to stream 1
{
memStream1.Position = 0;
readyToSendFromMemStream1.Set();
// END write stream 1 is done...waiting for stream 2
break;
}
case 2: // done writing to stream 2
{
memStream2.Position = 0;
readyToSendFromMemStream2.Set();
// END write stream 2 is done...waiting for stream 1
break;
}
}
}
bDoneWriting = true;
bCanRead = false;
}
finally
{
if (com != null)
{
com.Dispose();
com = null;
}
if (con != null)
{
con.Close();
con.Dispose();
con = null;
}
}
}
And then the Client side:
private static void TestGetRecordsAndDump()
{
    const string FILE_NAME = "Records.CSV";
    File.Delete(FILE_NAME);
    var file = File.AppendText(FILE_NAME);
    long count = 0;
    try
    {
        var service = new ServiceReference1.DataServiceClient();
        var stream = service.GetDBRowStream();
        Console.WriteLine("Records Retrieved : ");
        Console.WriteLine("File Size (MB) : ");
        var canDoLastRead = true;
        while (stream.CanRead && canDoLastRead)
        {
            try
            {
                // Used a custom Deserializer, but BinaryFormatter should be fine
                Customer customer = DBDataFormatter.Deserialize(stream);
                file.Write(customer.ToString());
                count++;
            }
            catch
            {
                // Bug: stream.CanRead is not set to false at the end of the
                // stream, so I use this trick to know when all records have
                // been returned.
                canDoLastRead = false;
            }
            finally
            {
                Console.SetCursorPosition("Records Retrieved : ".Length, 0);
                Console.Write(string.Format("{0} ", count));
                Console.SetCursorPosition("File Size (MB) : ".Length, 1);
                Console.Write(string.Format("{0:G} ", file.BaseStream.Length / 1024f / 1024f));
            }
        }
    }
    finally
    {
        file.Close();
    }
}
There is a bug I cannot seem to solve: stream.CanRead is not set to false when all the records have been returned. I have not been able to work out why, but at least now I can query large data sets and return all records, without the server or client running out of memory.
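For what it's worth, Stream.CanRead only reports whether a stream supports reading at all, not whether data remains; the end of a streamed response is normally signalled by Read returning 0. A sketch of a helper that detects a clean end of stream at a record boundary (TryReadExactly is my name, not part of the solution above):
// Reads exactly buffer.Length bytes; returns false on a clean end of stream,
// throws if the stream ends in the middle of a record.
static bool TryReadExactly(Stream stream, byte[] buffer)
{
    int total = 0;
    while (total < buffer.Length)
    {
        int n = stream.Read(buffer, total, buffer.Length - total);
        if (n == 0)
        {
            if (total == 0) return false; // clean EOF between records
            throw new EndOfStreamException("Stream ended mid-record.");
        }
        total += n;
    }
    return true;
}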


How to store paging state in Cassandra c# driver?

I have read a couple of links about implementing manual paging using the Cassandra C# driver.
Links referred:
Backward paging in cassandra c# driver
https://datastax.github.io/csharp-driver/features/paging/
My requirement:
I am trying to get a list of all distinct partition keys from a table which is very large.
Because of the size, the Cassandra db throws an error partway through retrieving them, or on the first execution of the query. Now suppose it failed after fetching 100000 distinct partition keys: I will use the paging state provided by the Cassandra C# driver.
I am saving the last available paging state to a log file before failing, and using it again to continue from where it failed.
I am saving the paging state into log file using:
Encoding.ASCII.GetString(pagingState);
And retrieving it from the log file using:
Encoding.ASCII.GetBytes(pagingState);
But when I pass it to .SetPagingState(pagingState) and execute the query it throws exception like:
java.lang.IllegalStateException: Cannot call hasNext() until the
previous iterator has been fully consumed
I compared the byte arrays byte by byte, before saving them to the file and after retrieving them. A few values in the byte array are different.
I tried with UTF-8 encoding, but no luck.
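That is expected: a paging state is opaque binary data, and text encodings are lossy for arbitrary bytes. For example, Encoding.ASCII maps every byte above 0x7F to '?', so the round trip cannot be reversed (a small illustration, not from the original post):
byte[] original = { 0x01, 0x80, 0xFF };           // arbitrary binary data
string text = Encoding.ASCII.GetString(original); // bytes above 0x7F become '?'
byte[] roundTrip = Encoding.ASCII.GetBytes(text); // { 0x01, 0x3F, 0x3F } -- data lost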
NOTE: It works perfectly when I pass the byte array without converting it. I mean, the if-condition code below works perfectly.
if (pagingState != null)
{
GenerateInitialLogs(pagingState);
}
Full functions:
private void BtnGetPrimaryKeys_Click(object sender, EventArgs e)
{
string fileContent = File.ReadAllText("D:/Logs/log.txt");
if(fileContent.Length > 0)
{
GenerateInitialLogs(Encoding.ASCII.GetBytes(fileContent));
}
else
{
GenerateInitialLogs(null);
}
}
private void Log(byte[] pagingState)
{
File.WriteAllText("D:/Logs/log.txt", Encoding.ASCII.GetString(pagingState));
}
private void GenerateInitialLogs(byte[] pagingState)
{
    RowSet rowSet = BLL.SelectDistinctPrimaryKeys(pagingState);
    List<PrimaryKey> distinctPrimaryKeys = new List<PrimaryKey>();
    foreach (Row row in rowSet)
    {
        // consume the page; the driver updates PagingState as rows are read
        pagingState = rowSet.PagingState;
    }
    Log(pagingState);
    if (pagingState != null)
    {
        GenerateInitialLogs(pagingState);
    }
}
public static RowSet SelectDistinctPrimaryKeysFromTagReadings(byte[] pagingState)
{
    PreparedStatement preparedStatement = BLL.currentSession.Prepare("SELECT DISTINCT \"Url\",\"Id\" FROM \"Readings\" ");
    BoundStatement boundStatement = preparedStatement.Bind();
    IStatement istatement = boundStatement.SetAutoPage(false).SetPageSize(1000);
    // will execute on continuing after failing in between
    if (pagingState != null)
    {
        istatement = istatement.SetPagingState(pagingState);
    }
    return BLL.currentSession.Execute(istatement);
}
This solution was not figured out by me; it's by Jorge Bay Gondra (an employee of DataStax).
Original answer:
https://groups.google.com/a/lists.datastax.com/forum/#!topic/csharp-driver-user/4XWTXZC-hyI
Solution:
You can't convert them to ASCII or UTF-8 or any other text encoding, because they don't represent text.
Use these functions to convert a byte array into a hexadecimal string and vice versa.
public static string ByteArrayToHexaDecimalString(byte[] bytes)
{
StringBuilder stringBuilder = new StringBuilder(bytes.Length * 2);
foreach (byte b in bytes) { stringBuilder.AppendFormat("{0:x2}", b); }
return stringBuilder.ToString();
}
public static byte[] HexaDecimalStringToByteArray(String hexaDecimalString)
{
int NumberChars = hexaDecimalString.Length;
byte[] bytes = new byte[NumberChars / 2];
for (int i = 0; i < NumberChars; i += 2)
{
bytes[i / 2] = Convert.ToByte(hexaDecimalString.Substring(i, 2), 16);
}
return bytes;
}
I also found that Encoding.UTF8.GetString and GetBytes do not work in all cases (they do for some), but Convert.ToBase64String and its reverse work fine:
public static string ConvertPagingStateToString(byte[] pagingState)
=> Convert.ToBase64String(pagingState);
public static byte[] ConvertStringToPagingState(string pagingStateString)
=> Convert.FromBase64String(pagingStateString);
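A small usage sketch of those helpers against the log file from the question:
// Save the paging state after fetching a page
File.WriteAllText("D:/Logs/log.txt", ConvertPagingStateToString(rowSet.PagingState));

// ...and restore it on the next run
byte[] pagingState = ConvertStringToPagingState(File.ReadAllText("D:/Logs/log.txt"));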

Multithreading and TPL do not speed up execution C#

I have a program that uses SQL Server to pull information from a database, then performs a series of insertions into other tables and sends an email with the data that was retrieved.
The program takes around three and a half minutes to execute, and there are only 5 rows of data in the database. I am trying to reduce this time in any way I can. I have tried multithreading, which seems to slow it down further, and TPL, which neither increases nor reduces the time. Does anyone know why I am not seeing performance improvements?
I am using an Intel Core i5, which I know has 2 cores, so I understand that using more than 2 cores will reduce performance. Here is how I am incorporating the use of tasks:
private static void Main(string[] args)
{
Util util = new Util(); //Util object
List<Data> dataList = new List<Data>(); //List of Data Objects
//Reads each row of data and creates Data obj for each
//Then adds each object to the list
dataList = util.getData();
var stopwatch = Stopwatch.StartNew();
var tasks = new Task[dataList.Count];
int i = 0; //Count
foreach (Data data in dataList)
{
//Perform insertions and send email with data
tasks[i++] = Task.Factory.StartNew(() => util.processData(data));
}
Task.WaitAll(tasks); //Wait for completion
Console.WriteLine("DONE: {0}", stopwatch.ElapsedMilliseconds);
}
Util Class:
class Util
{
// create and open a connection object
SqlConnection conn = new SqlConnection("**Connection String**");
//Gets all results from table, and adds object to list
public List<Data> getData()
{
conn.Open();
SqlCommand cmd = new SqlCommand("REF.GET_DATA", conn);
cmd.CommandType = CommandType.StoredProcedure;
SqlDataReader reader = cmd.ExecuteReader();
List<Data> dataList = new List<Data>();
while (reader.Read())
{
//** Take data from table and assign to variables
//** Removed for simplicity
Data data = new Data(/* pass variables here */);
dataList.Add(data); //Add object to datalist
}
return dataList;
}
public void processData(Data data)
{
//** Perform range of trivial operations on data
//** Removed for simplicity
byte[] results = data.RenderData(); //THIS IS WHAT TAKES A LONG TIME TO COMPLETE
data.EmailFile(results);
return;
} //END handleReport()
}
Am I using tasks in the wrong place? Should I instead be making use of parallelism in the util.processData() method? I also tried using await and async around the util.processData(data) call in the main method, with no improvements.
EDIT:
Here is the renderData function:
//returns byte data of report results which will be attached to email.
public byte[] RenderData(string format, string mimeType, ReportExecution.ParameterValue[] parameters)
{
ReportExecutionService res = new ReportExecutionService();
res.Credentials = System.Net.CredentialCache.DefaultCredentials;
res.Timeout = 600000;
//Prepare Render arguments
string historyID = null;
string deviceInfo = String.Empty;
string extension = String.Empty;
string encoding = String.Empty;
ReportExecution.Warning[] warnings = null;
string[] streamIDs = null;
byte[] results = null;
try
{
res.LoadReport(reportPath, historyID);
res.SetExecutionParameters(parameters, "en-gb"); //"/LSG Reporting/Repossession Sales (SAL)/SAL004 - Conveyancing Matter Listing"
results = res.Render(format, deviceInfo, out extension, out mimeType, out encoding, out warnings, out streamIDs);
}
catch (Exception ex)
{
Console.WriteLine(ex.StackTrace);
}
return results;
}

Trying to deserialize more than 1 object at the same time

I'm trying to send some objects from a server to the client.
My problem is that when I'm sending only 1 object, everything works correctly. But the moment I add another object, an exception is thrown: "binary stream does not contain a valid binaryheader" or "No map for object (random number)".
My thoughts are that the deserialization does not understand where the stream starts/ends, and I hoped you could help me out here.
Here's my deserialization code:
public void Listen()
{
try
{
bool offline = true;
Dispatcher.Invoke(System.Windows.Threading.DispatcherPriority.Normal,
new Action(() => offline = Offline));
while (!offline)
{
TcpObject tcpObject = new TcpObject();
IFormatter formatter = new BinaryFormatter();
tcpObject = (TcpObject)formatter.Deserialize(serverStream);
if (tcpObject.Command == Command.Transfer)
{
SentAntenna sentAntenna = (SentAntenna)tcpObject.Object;
int idx = 0;
foreach (string name in SharedProperties.AntennaNames)
{
if (name == sentAntenna.Name)
break;
idx++;
}
if (idx < 9)
{
PointCollection pointCollection = new PointCollection();
foreach (Frequency f in sentAntenna.Frequencies)
pointCollection.Add(new Point(f.Channel, f.Intensity));
SharedProperties.AntennaPoints[idx] = pointCollection;
}
}
}
}
catch (Exception ex)
{
MessageBox.Show(ex.Message); // raise an event
}
}
serialization code:
case Command.Transfer:
Console.WriteLine("Transfering");
Thread transfer = new Thread(new ThreadStart(delegate
{
try
{
string aName = tcpObject.Object.ToString();
int indx = 0;
foreach (string name in names)
{
if (name == aName)
break;
indx++;
}
if (indx < 9)
{
while (true) // need to kill when the father thread terminates
{
if (antennas[indx].Frequencies != null)
{
lock (antennas[indx].Frequencies)
{
TcpObject sendTcpObject = new TcpObject();
sendTcpObject.Command = Command.Transfer;
SentAntenna sa = new SentAntenna(antennas[indx].Frequencies, aName);
sendTcpObject.Object = sa;
formatter.Serialize(networkStream, sendTcpObject);
}
}
}
}
}
catch (Exception ex) { Console.WriteLine(ex); }
}));
transfer.Start();
break;
Interesting. There's nothing particularly odd in your serialization code, and I've seen people use vanilla concatenation for multiple objects in the past, although I've actually always advised against it as BinaryFormatter does not explicitly claim this scenario is OK. But: if it isn't, the only thing I can suggest is to implement your own framing; so your write code becomes:
serialize to an empty MemoryStream
note the length and write the length to the NetworkStream, for example as a simple fixed-width 32-bit network-byte-order integer
write the payload from the MemoryStream to the NetworkStream
rinse, repeat
And the read code becomes:
read exactly 4 bytes and compute the length
buffer that many bytes into a MemoryStream
deserialize from the MemoryStream
(Noting in both cases to set the MemoryStream's position back to 0 between write and read)
You can also implement a Stream subclass that caps the length if you want to avoid a buffer when reading, but that is more complex.
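A minimal sketch of that framing approach (the method names are mine; it assumes BinaryFormatter, as in the question):
static void WriteFramed(Stream net, object obj)
{
    using (var ms = new MemoryStream())
    {
        new BinaryFormatter().Serialize(ms, obj);
        // fixed-width 32-bit length prefix in network byte order
        byte[] prefix = BitConverter.GetBytes(IPAddress.HostToNetworkOrder((int)ms.Length));
        net.Write(prefix, 0, 4);
        ms.Position = 0;
        ms.CopyTo(net);
    }
}

static object ReadFramed(Stream net)
{
    int length = IPAddress.NetworkToHostOrder(BitConverter.ToInt32(ReadExactly(net, 4), 0));
    using (var ms = new MemoryStream(ReadExactly(net, length)))
    {
        return new BinaryFormatter().Deserialize(ms);
    }
}

static byte[] ReadExactly(Stream s, int count)
{
    byte[] buf = new byte[count];
    for (int total = 0; total < count; )
    {
        int n = s.Read(buf, total, count - total);
        if (n == 0) throw new EndOfStreamException();
        total += n;
    }
    return buf;
}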
Apparently I came up with a really simple solution. I just made sure only 1 thread is allowed to transfer data at a time, so I changed this line of code:
formatter.Serialize(networkStream, sendTcpObject);
to these lines of code:
if (!transfering) // making sure only 1 thread is transfering data
{
// NOTE: a check-then-set bool is not actually thread-safe; two threads
// can both observe 'false' here. A lock statement around the Serialize
// call is the reliable way to enforce this.
transfering = true;
formatter.Serialize(networkStream, sendTcpObject);
transfering = false;
}

Finding elements from a memory mapped file in C#

I need to find certain elements within a memory-mapped file. I have managed to map the file; however, I have some problems finding the elements. My idea was to save all of the file's elements into a list and then search that list.
How do I create a function that returns a list with all elements of the mapped file?
// Index indicates the line to read from
public List<string> GetElement(int index) {
}
The way I am mapping the file:
public void MapFile(string path)
{
string mapName = Path.GetFileName(path);
try
{
// Opening existing mmf
if (mapName != null)
{
_mmf = MemoryMappedFile.OpenExisting(mapName);
}
// Setting the pointer at the start of the file
_pointer = 0;
// We create the accessor to read the file
_accessor = _mmf.CreateViewAccessor();
// We mark the file as open
_open = true;
}
catch (Exception ex) {....}
// If opening an existing mapping failed, create it from the file.
// (NOTE: as written this block also runs when the open above succeeded;
// it should probably live inside the catch or be guarded by _open.)
try
{
// Trying to create the mmf
_mmf = MemoryMappedFile.CreateFromFile(path);
// Setting the pointer at the start of the file
_pointer = 0;
// We create the accessor to read the file
_accessor = _mmf.CreateViewAccessor();
// We mark the file as open
_open = true;
}
catch (Exception exInner){..}
}
The file that I am mapping is a UTF-8 ASCII file. Nothing weird.
What I have done:
var list = new List<string>();
// String to store what we read
string trace = string.Empty;
// We read the byte at the pointer
byte b = _accessor.ReadByte(_pointer);
int tracei = 0;
var traceb = new byte[2048];
// If b is different from 0 we have some data to read
if (b != 0)
{
while (b != 0)
{
// Check if it's an endline
if (b == '\n')
{
// NOTE: "tracei - 1" drops the byte before '\n', which assumes CRLF
// line endings; with bare '\n' endings this eats the last character
trace = Encoding.UTF8.GetString(traceb, 0, tracei - 1);
list.Add(trace);
trace = string.Empty;
tracei = 0;
_lastIndex++;
}
else
{
traceb[tracei++] = b;
}
// Advance and read
b = _accessor.ReadByte(++_pointer);
}
}
The code is difficult for humans to read and is not very efficient. How can I improve it?
You are re-inventing StreamReader; it does exactly what you do. The odds that you really want a memory-mapped file are quite low: they take a lot of virtual memory, which you can only make pay off if you repeatedly read the same file at different offsets. That is very unlikely here; text files must be read sequentially, since you don't know how long the lines are.
Which makes this one line of code the probable best replacement for what you posted:
string[] trace = System.IO.File.ReadAllLines(path);
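If the file fits in memory, the GetElement method from the question then reduces to an index lookup. A sketch (keeping the question's signature, which wraps the line in a list):
private string[] _lines;

public void MapFile(string path)
{
    _lines = File.ReadAllLines(path);
}

// Index indicates the line to read from
public List<string> GetElement(int index)
{
    return new List<string> { _lines[index] };
}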

Parsing concatenated, non-delimited XML messages from TCP-stream using C#

I am trying to parse XML messages which are sent to my C# application over TCP. Unfortunately, the protocol cannot be changed: the XML messages are not delimited and no length prefix is used. Moreover, the character encoding is not fixed, but each message starts with an XML declaration <?xml>. The question is: how can I read one XML message at a time, using C#?
Up to now, I have tried to read the data from the TCP stream into a byte array and use it through a MemoryStream. The problem is that the buffer might contain more than one XML message, or the first message may be incomplete. In these cases, I get an exception when trying to parse it with XmlReader.Read or XmlDocument.Load, but unfortunately the XmlException does not really allow me to distinguish the problem (except by parsing the localized error string).
I tried using XmlReader.Read and counting the number of Element and EndElement nodes. That way I know when I have finished reading the first, entire XML message.
However, there are several problems. If the buffer does not yet contain the entire message, how can I distinguish that XmlException from an actually invalid, non-well-formed message? In other words, if an exception is thrown before reading the root EndElement, how can I decide whether to abort the connection with an error, or to collect more bytes from the TCP stream?
If no exception occurs, the XmlReader is positioned at the start of the root EndElement. Casting the XmlReader to IXmlLineInfo gives me the current LineNumber and LinePosition; however, it is not straightforward to get the byte position where the EndElement really ends. To do that, I would have to convert the byte array into a string (with the encoding specified in the XML declaration), seek to LineNumber/LinePosition, and convert that back to a byte offset. I tried to do that with StreamReader.ReadLine, but the stream reader gives no public access to the current byte position.
All this seems very inelegant and not robust. I wonder if you have ideas for a better solution. Thank you.
After looking around for some time, I think I can answer my own question as follows (I might be wrong; corrections are welcome):
I found no way for the XmlReader to continue parsing a second XML message (at least not if the second message has an XmlDeclaration). XmlTextReader.ResetState could do something similar, but for that I would have to assume the same encoding for all messages. Therefore I could not connect the XmlReader directly to the TcpStream.
After closing the XmlReader, the buffer is not positioned at the reader's last position. So it is not possible to close the reader and use a new one to continue with the next message. I guess the reason for this is that the reader could not successfully seek on every possible input stream.
When XmlReader throws an exception, it cannot be determined whether it happened because of a premature EOF or because of non-well-formed XML. XmlReader.EOF is not set in case of an exception. As a workaround, I derived my own MemoryBuffer, which returns the very last byte as a single byte. This way I know that the XmlReader was really interested in the last byte, and the following exception is likely due to a truncated message. (This is kinda sloppy, in that it might not detect every non-well-formed message. However, after appending more bytes to the buffer, sooner or later the error will be detected.)
I can cast my XmlReader to the IXmlLineInfo interface, which gives access to the LineNumber and LinePosition of the current node. So after reading the first message, I remember these positions and use them to truncate the buffer. Here comes the really sloppy part, because I have to use the character encoding to get the byte position. I am sure you could find test cases for the code below where it breaks (e.g. internal elements with mixed encoding). But up to now it has worked for all my tests.
Here is the parser class I came up with -- may it be useful (I know, it's very far from perfect...):
class XmlParser {
private byte[] buffer = new byte[0];
public int Length {
get {
return buffer.Length;
}
}
// Append new binary data to the internal data buffer...
public XmlParser Append(byte[] buffer2) {
if (buffer2 != null && buffer2.Length > 0) {
// I know, its not an efficient way to do this.
// The EofMemoryStream should handle a List<byte[]> ...
byte[] new_buffer = new byte[buffer.Length + buffer2.Length];
buffer.CopyTo(new_buffer, 0);
buffer2.CopyTo(new_buffer, buffer.Length);
buffer = new_buffer;
}
return this;
}
// MemoryStream which returns the last byte of the buffer individually,
// so that we know that the buffering XmlReader really looked at the last
// byte of the stream.
// Moreover there is an EOF marker.
private class EofMemoryStream: Stream {
public bool EOF { get; private set; }
private MemoryStream mem_;
public override bool CanSeek {
get {
return false;
}
}
public override bool CanWrite {
get {
return false;
}
}
public override bool CanRead {
get {
return true;
}
}
public override long Length {
get {
return mem_.Length;
}
}
public override long Position {
get {
return mem_.Position;
}
set {
throw new NotSupportedException();
}
}
public override void Flush() {
mem_.Flush();
}
public override long Seek(long offset, SeekOrigin origin) {
throw new NotSupportedException();
}
public override void SetLength(long value) {
throw new NotSupportedException();
}
public override void Write(byte[] buffer, int offset, int count) {
throw new NotSupportedException();
}
public override int Read(byte[] buffer, int offset, int count) {
count = Math.Min(count, Math.Max(1, (int)(Length - Position - 1)));
int nread = mem_.Read(buffer, offset, count);
if (nread == 0) {
EOF = true;
}
return nread;
}
public EofMemoryStream(byte[] buffer) {
mem_ = new MemoryStream(buffer, false);
EOF = false;
}
protected override void Dispose(bool disposing) {
mem_.Dispose();
base.Dispose(disposing);
}
}
// Parses the first xml message from the stream.
// If the first message is not yet complete, it returns null.
// If the buffer contains non-wellformed xml, it ~should~ throw an exception.
// After reading an xml message, it pops the data from the byte array.
public Message deserialize() {
if (buffer.Length == 0) {
return null;
}
Message message = null;
Encoding encoding = Message.default_encoding;
//string xml = encoding.GetString(buffer);
using (EofMemoryStream sbuffer = new EofMemoryStream (buffer)) {
XmlDocument xmlDocument = null;
XmlReaderSettings settings = new XmlReaderSettings();
int LineNumber = -1;
int LinePosition = -1;
bool truncate_buffer = false;
using (XmlReader xmlReader = XmlReader.Create(sbuffer, settings)) {
try {
// Read to the first node (skipping over some element-types.
// Don't use MoveToContent here, because it would skip the
// XmlDeclaration too...
while (xmlReader.Read() &&
(xmlReader.NodeType==XmlNodeType.Whitespace ||
xmlReader.NodeType==XmlNodeType.Comment)) {
};
// Check for XML declaration.
// If the message has an XmlDeclaration, extract the encoding.
switch (xmlReader.NodeType) {
case XmlNodeType.XmlDeclaration:
while (xmlReader.MoveToNextAttribute()) {
if (xmlReader.Name == "encoding") {
encoding = Encoding.GetEncoding(xmlReader.Value);
}
}
xmlReader.MoveToContent();
xmlReader.Read();
break;
}
// Move to the first element.
xmlReader.MoveToContent();
if (xmlReader.EOF) {
return null;
}
// Read the entire document.
xmlDocument = new XmlDocument();
xmlDocument.Load(xmlReader.ReadSubtree());
} catch (XmlException e) {
// The parsing of the xml failed. If the XmlReader did
// not yet look at the last byte, it is assumed that the
// XML is invalid and the exception is re-thrown.
if (sbuffer.EOF) {
return null;
}
throw; // rethrow, preserving the stack trace
}
{
// Try to serialize an internal data structure using XmlSerializer.
Type type = null;
try {
type = Type.GetType("my.namespace." + xmlDocument.DocumentElement.Name);
} catch (Exception e) {
// No specialized data container for this class found...
}
if (type == null) {
message = new Message();
} else {
// TODO: reuse the serializer...
System.Xml.Serialization.XmlSerializer ser = new System.Xml.Serialization.XmlSerializer(type);
message = (Message)ser.Deserialize(new XmlNodeReader(xmlDocument));
}
message.doc = xmlDocument;
}
// At this point, the first XML message was successfully parsed.
// Remember the lineposition of the current end element.
IXmlLineInfo xmlLineInfo = xmlReader as IXmlLineInfo;
if (xmlLineInfo != null && xmlLineInfo.HasLineInfo()) {
LineNumber = xmlLineInfo.LineNumber;
LinePosition = xmlLineInfo.LinePosition;
}
// Try to read the rest of the buffer.
// If an exception is thrown, another xml message follows.
// This way the xml parser can tell us that the message finished here.
// This would be preferred, as truncating the buffer using the line info is sloppy.
try {
while (xmlReader.Read()) {
}
} catch {
// There comes a second message. Needs workaround for truncating.
truncate_buffer = true;
}
}
if (truncate_buffer) {
if (LineNumber < 0) {
throw new Exception("LineNumber not given. Cannot truncate xml buffer");
}
// Convert the buffer to a string using the encoding found before
// (or the default encoding).
string s = encoding.GetString(buffer);
// Seek to the line.
int char_index = 0;
while (--LineNumber > 0) {
// Recognize \r , \n , \r\n as newlines...
char_index = s.IndexOfAny(new char[] {'\r', '\n'}, char_index);
// char_index should not be -1 because LineNumber>0, otherwise an RangeException is
// thrown, which is appropriate.
char_index++;
if (s[char_index-1]=='\r' && s.Length>char_index && s[char_index]=='\n') {
char_index++;
}
}
char_index += LinePosition - 1;
var rgx = new System.Text.RegularExpressions.Regex(xmlDocument.DocumentElement.Name + "[ \r\n\t]*\\>");
System.Text.RegularExpressions.Match match = rgx.Match(s, char_index);
if (!match.Success || match.Index != char_index) {
throw new Exception("could not find EndElement to truncate the xml buffer.");
}
char_index += match.Value.Length;
// Convert the character offset back to the byte offset (for the given encoding).
int line1_boffset = encoding.GetByteCount(s.Substring(0, char_index));
// remove the bytes from the buffer.
buffer = buffer.Skip(line1_boffset).ToArray();
} else {
buffer = new byte[0];
}
}
return message;
}
}
Reading into a MemoryStream is not necessary to use an XmlReader. You can attach the reader directly to the stream and read as much as you require to reach the end of the XML document. A BufferedStream can be used to improve the efficiency of reading from the socket directly.
string server = "myserver"; // TcpClient takes a host name, not a "tcp://" URI
string message = "GetMyXml";
int port = 13000;
int bufferSize = 1024;
using(var client = new TcpClient(server, port))
using(var clientStream = client.GetStream())
using(var bufferedStream = new BufferedStream(clientStream, bufferSize))
using(var xmlReader = XmlReader.Create(bufferedStream))
{
xmlReader.MoveToContent();
try
{
while(xmlReader.Read())
{
// Check for XML declaration.
if(xmlReader.NodeType != XmlNodeType.XmlDeclaration)
{
throw new Exception("Expected XML declaration.");
}
// Move to the first element.
xmlReader.Read();
xmlReader.MoveToContent();
// Read the root element.
// Hand this document to another method to process further.
var xmlDocument = new XmlDocument();
xmlDocument.Load(xmlReader.ReadSubtree());
}
}
catch(XmlException ex)
{
// Record exception reading stream.
// Move reader to start of next document or rethrow exception to exit.
}
}
The key to making this work is the call to XmlReader.ReadSubtree() which creates a child reader on top of the parent reader, one that will treat the current element (in this case the root element) as the entire XML tree. This should allow you to parse document elements separately.
My code's a little sloppy around reading the document, especially as I ignore all the information in the XML declaration. I'm sure there's room for improvement, but hopefully this gets you on the right track.
Assuming that you can change the protocol, I'd suggest adding start and stop markers to the messages, so that when you read it all in as a text stream you can split it into separate messages (leaving incomplete messages in an "incoming buffer" of some kind), clean up the markers, and then you know that you've got exactly one message at a time.
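A sketch of that idea, assuming hypothetical <!--BEGIN--> / <!--END--> markers could be added around each message (the marker strings are illustrative):
// Accumulates incoming text and yields complete messages between markers.
class MarkerSplitter
{
    const string Start = "<!--BEGIN-->";
    const string End = "<!--END-->";
    readonly StringBuilder _incoming = new StringBuilder();

    public IEnumerable<string> Append(string chunk)
    {
        _incoming.Append(chunk);
        while (true)
        {
            string text = _incoming.ToString();
            int start = text.IndexOf(Start);
            int end = text.IndexOf(End);
            if (start < 0 || end < start)
                yield break; // incomplete message stays buffered
            // hand back one message with the markers cleaned up
            yield return text.Substring(start + Start.Length, end - start - Start.Length);
            _incoming.Clear();
            _incoming.Append(text.Substring(end + End.Length));
        }
    }
}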
The 2 issues that I found were:
XmlReader will only permit an XML declaration at the very beginning. Since it can't be reset, it needs to be recreated.
Once the XmlReader has done its work, it will usually have consumed additional characters after the end of the document, because it uses the Read(char[], int, int) method.
My (brittle) workaround is to create a wrapper that only fills the array until a '>' is encountered. This keeps the XmlReader from consuming characters past the ending '>' of the document it was parsing:
public class SegmentingReader : TextReader {
private TextReader reader;
private char trigger;
public SegmentingReader(TextReader reader, char trigger) {
this.reader = reader;
this.trigger = trigger;
}
// Dispose omitted for brevity
public override int Peek() { return reader.Peek(); }
public override int Read() { return reader.Read(); }
public override int Read(char[] buffer, int index, int count) {
int n = 0;
while (n < count) {
int c = reader.Read();
if (c == -1) break; // stop at the end of the underlying reader
char ch = (char)c;
buffer[index + n] = ch;
n++;
if (ch == trigger) break;
}
return n;
}
}
Then it can be used as simply as:
var serializer = new XmlSerializer(typeof(SerializedClass)); // XmlSerializer is not IDisposable
using (var inputReader = new SegmentingReader(/* TextReader from somewhere */, '>'))
{
    while (inputReader.Peek() != -1)
    {
        using (var xmlReader = XmlReader.Create(inputReader))
        {
            xmlReader.MoveToContent();
            var obj = serializer.Deserialize(xmlReader.ReadSubtree());
            DoStuff(obj);
        }
    }
}
