I have read couple of links to implement manual paging using Cassandra c# driver.
Links referred:
Backward paging in cassandra c# driver
https://datastax.github.io/csharp-driver/features/paging/
My requirement:
I am trying to get list of all distinct partition keys form table which is too big in size.
Because of size Cassandra db is throwing error in between retrieving them or on the first execution of query. Now suppose it failed after fetching 100000 distinct partition keys I will use the Paging state provided by Cassandra c# driver.
Now I am saving the last available page state before failing to log file and use it again to continue from where it failed.
I am saving the paging state into log file using:
Encoding.ASCII.GetString(pagingState);
And retrieving form log file using:
Encoding.ASCII.GetBytes(pagingState);
But when I pass it to .SetPagingState(pagingState) and execute the query it throws exception like:
java.lang.IllegalStateException: Cannot call hasNext() until the
previous iterator has been fully consumed
I compared byte by byte array bytes before saving into file and after retrieving them from file. Few values in byte array are different.
I tried with UIF8 encoding but no use.
NOTE: It works perfectly when I pass byte array without converting. I mean the below if condition code works perfectly.
if (pagingState != null)
{
GenerateInitialLogs(pagingState);
}
Full functions:
private void BtnGetPrimaryKeys_Click(object sender, EventArgs e)
{
string fileContent = File.ReadAllText("D:/Logs/log.txt");
if(fileContent.Length > 0)
{
GenerateInitialLogs(Encoding.ASCII.GetBytes(fileContent));
}
else
{
GenerateInitialLogs(null);
}
}
private void Log(byte[] pagingState)
{
File.WriteAllText("D:/Logs/log.txt", Encoding.ASCII.GetString(pagingState));
}
private int GenerateInitialLogs(byte[] pagingState)
{
try
{
RowSet rowSet = BLL.SelectDistinctPrimaryKeys(pagingState);
List<PrimaryKey> distinctPrimaryKeys = new List<PrimaryKey>();
foreach (Row row in rowSet)
{
if (rowSet.PagingState != null) { pagingState = new byte[rowSet.PagingState.Length]; }
pagingState = rowSet.PagingState;
}
Log(pagingState)
if (pagingState != null)
{
GenerateInitialLogs(pagingState);
}
}
catch(Exception ex)
{
throw ex;
}
}
public static RowSet SelectDistinctPrimaryKeysFromTagReadings(byte[] pagingState)
{
try
{
// will execute on continuing after failing in between.
if (pagingState != null)
{
PreparedStatement preparedStatement = BLL.currentSession.Prepare("SELECT DISTINCT \"Url\",\"Id\" FROM \"Readings\" ");
BoundStatement boundStatement = preparedStatement.Bind();
IStatement istatement = boundStatement.SetAutoPage(false).SetPageSize(1000).SetPagingState(pagingState);
return BLL.currentSession.Execute(istatement);
}
else
{
PreparedStatement preparedStatement = BLL.currentSession.Prepare("SELECT DISTINCT \"Url\",\"Id\" FROM \"Readings\" ");
BoundStatement boundStatement = preparedStatement.Bind();
IStatement istatement = boundStatement.SetAutoPage(false).SetPageSize(1000);
return BLL.currentSession.Execute(istatement);
}
}
catch (Exception ex)
{
throw ex;
}
}
This solution is not figured out by me. It's done by Jorge Bay Gondra (employee of datastax).
Original answer:
https://groups.google.com/a/lists.datastax.com/forum/#!topic/csharp-driver-user/4XWTXZC-hyI
Solution:
Can't convert them into ASCII or UIF8 or any encoding because they don't represent text.
Use these functions to convert byte array into hexadecimal and vice versa.
public static string ByteArrayToHexaDecimalString(byte[] bytes)
{
StringBuilder stringBuilder = new StringBuilder(bytes.Length * 2);
foreach (byte b in bytes) { stringBuilder.AppendFormat("{0:x2}", b); }
return stringBuilder.ToString();
}
public static byte[] HexaDecimalStringToByteArray(String hexaDecimalString)
{
int NumberChars = hexaDecimalString.Length;
byte[] bytes = new byte[NumberChars / 2];
for (int i = 0; i < NumberChars; i += 2)
{
bytes[i / 2] = Convert.ToByte(hexaDecimalString.Substring(i, 2), 16);
}
return bytes;
}
I also found Encoding.UTF8.GetString and GetBytes to not work in all cases, though does for some, but found Convert.ToBase64String and the reverse to work fine.
public static string ConvertPagingStateToString(byte[] pagingState)
=> Convert.ToBase64String(pagingState);
public static byte[] ConvertStringToPagingState(string pagingStateString)
=> Convert.FromBase64String(pagingStateString);
Related
Long story short. ;
I have a class named Scope. And this class contains all logic for scope operations etc. It also starts backround thread that constantly read serial port data (in my case events was unreliable):
Thread BackgroundReader = new Thread(ReadBuffer);
BackgroundReader.IsBackground = true;
BackgroundReader.Start();
private void ReadBuffer()
{
SerialPort.DiscardInBuffer();
while (!_stopCapture)
{
int bufferSize = SerialPort.BytesToRead;
byte[] buffer = new byte[bufferSize];
if(bufferSize > 5)
{
SerialPort.Read(buffer, 0, bufferSize);
Port_DataReceivedEvent(buffer, null);
}
Thread.Sleep(_readDelay);
}
CurrentBuffer = null;
}
In Scope class there is a public field named Buffer
public byte[] Buffer
{
get
{
return CurrentBuffer;
}
}
And here is event fired while there is new data readed
private void Port_DataReceivedEvent(object sender, EventArgs e)
{
//populate buffer
Info(sender, null);
CurrentBuffer = ((byte[])sender);
foreach(byte data in CurrentBuffer)
{
DataBuffer.Enqueue(data);
}
if (DataBuffer.Count() > _recordLength)
{
GenerateFrame(DataBuffer.ToArray());
DataBuffer.Clear(); ;
}
}
To make code more manageable, I splitted it in several classes. One of this classes is for searching specific data pattern in current stream and create specific object from this data. This code works in way that send to serial port specific command and expect return frame. If reponse is not received or not ok, send is performed again and again until correct response arrives or there will be timeout. Response is expected to be in current buffer. Those strange string manipulation is for debug purposes.
public class GetAcknowledgedFrame
{
byte[] WritedData;
string lastEx;
string stringData;
public DataFrame WriteAcknowledged(Type SendType, Type ReturnType, JyeScope scope)
{
var stopwatch = new Stopwatch();
stopwatch.Restart();
while (stopwatch.ElapsedMilliseconds < scope.TimeoutTime)
{
try
{
if (SendType == typeof(GetParameters))
{
WriteFrame(new ScopeControlFrames.GetParameters(), scope.SerialPort);
}
else if(SendType == typeof(GetConfig))
{
WriteFrame(new ScopeControlFrames.GetConfig(), scope.SerialPort);
}
else if (SendType == typeof(EnterUSBScopeMode))
{
WriteFrame(new ScopeControlFrames.EnterUSBScopeMode(), scope.SerialPort);
}
return ReturnFrame(ReturnType, scope.Buffer, scope.TimeoutTime);
}
catch (InvalidDataFrameException ex)
{
lastEx = ex.Message;
System.Threading.Thread.Sleep(10);
}
}
stringData = "";
foreach (var data in scope.Buffer)
{
stringData += data + ",";
}
stringData.Remove(stringData.Length - 1);
throw new TimeoutException($"Timeout while waiting for frame acknowledge: " + SendType.ToString() + ", " + ReturnType.ToString() + Environment.NewLine+ "Add. err: "+lastEx);
}
private DataFrame ReturnFrame(Type FrameType, byte[] buffer, int timeoutTime)
{
if (FrameType == typeof(DataFrames.DSO068.CurrConfigDataFrame))
{
DataFrames.DSO068.CurrConfigDataFrame CurrConfig = new DataFrames.DSO068.CurrConfigDataFrame(buffer);
return CurrConfig;
}
else if (FrameType == typeof(DataFrames.DSO112.CurrConfigDataFrame))
{
DataFrames.DSO112.CurrConfigDataFrame CurrParam = new DataFrames.DSO112.CurrConfigDataFrame(buffer);
return CurrParam;
}
else if (FrameType == typeof(CurrParamDataFrame))
{
CurrParamDataFrame CurrParam = new CurrParamDataFrame(buffer);
return CurrParam;
}
else if (FrameType == typeof(DataBlockDataFrame))
{
DataBlockDataFrame CurrData = new DataBlockDataFrame(buffer);
return CurrData;
}
else if (FrameType == typeof(DataSampleDataFrame))
{
DataSampleDataFrame CurrData = new DataSampleDataFrame(buffer);
return CurrData;
}
else if (FrameType == typeof(ScopeControlFrames.ScopeReady))
{
ScopeControlFrames.ScopeReady ready = new ScopeControlFrames.ScopeReady(buffer);
return ready;
}
else
{
throw new InvalidOperationException("Wrong object type");
}
}
private bool WriteFrame(DataFrame frame, IStreamResource port)
{
WritedData = frame.Data;
port.Write(frame.Data, 0, frame.Data.Count());
return true;
}
}
From main class (and main thread) I call method in this class, for example:
var Ready = (ScopeControlFrames.ScopeReady)new GetAcknowledgedFrame().WriteAcknowledged
(typeof(ScopeControlFrames.EnterUSBScopeMode), typeof(ScopeControlFrames.ScopeReady), this);
The problem is when I pass "this" object (that has thread working in background) to my helper class. It seems like helper class not see changing data in this object. The problem started when I separate code of my helper class from main class.
My questions:
- I know that object are passed by reference, that means I think that when object is dynamically changing its state (in this case data buffer should changing while new data is received) all classes that has reference to this object are also seeing this changes. Maybe I'm missing something?
- I tried passing array (by ref), arrays are also reference types. But this not help me at all. Maybe I'm missing something?
I tried changing this class to static, it not helped.
Many thanks for help.
The code below;
Info(sender, null);
CurrentBuffer = ((byte[])sender);
is creating a new reference variable called CurrentBuffer. Any other code holding a reference 'pointer' to the CurrentBuffer value prior to this line of code will not get the new value of CurrentBuffer when its reset.
I have a WCF service that query a database and returns a large number of records. There is so many records, that the server runs out of memory and fails before it can return.
So I want to send the records back as I fetch them from the database, or a set number back at a time.
For additional clarity, I cannot collect call records fetched into a collection on the server, as the server runs out of memory before I have collected all the records. I want to try and find away to send them back one by one or in chunks, in one call.
For example, in chunks:
Fetch first 1000 records
Add to collection
Send collection to client
Clear collection
Fetch next 1000 records, and repeat from step 2
So the idea I have how the web service code will look something like this:
Public IEnumerable<Customer> GetAllCustomers()
{
// Setup Query
string query = PrepareQuery();
// Create Connection
connection = new SqlConnection(ConnectionString);
connection.Open();
var sqlcommand = connection.CreateCommand();
sqlcommand.CommandText = query.ToString();
// Read Results
var reader = sqlcommand.ExecuteReader();
while (reader.Read())
{
Customer customer = new Customer();
foreach (var column in Columns)
{
int fieldIndex = reader.GetOrdinal(column);
object value = reader.GetValue(fieldIndex);
customer[column.Name] = value;
}
yield return customer;
}
}
I don't want to consider paging as the Order By on the SQL server is slow.
Looking for way to do this in WCF
I think you answer your own question. There are 2 ways to do it, stream or chunk.
You can do streaming in wcf - see https://learn.microsoft.com/en-us/dotnet/framework/wcf/feature-details/large-data-and-streaming
You get a Stream to write to, so you need to handle yourself how you are going to encode your data on that stream, and how you are going decode it at the client.
The alternative is you do chunking/paging. You just modify your service so it accepts e.g. a page number or some other way to indicate which page is needed.
Which one you do depends on the application, eg how much data? what is the nature of the client? is it possible to use some field to page on? etc etc
Here is some psudo code for making a stream that can do this on the server side. It is based on the example here: https://learn.microsoft.com/en-us/dotnet/framework/wcf/feature-details/how-to-enable-streaming
I'm not writing the full compilable code for you, but this is the gist of it.
In the server:
public Stream GetBigData()
{
return new BigDataStream();
}
BigDataStream (the non-implimented methods are not shown):
class BigDataStream : Stream
{
public BigDataStream()
{
// open DB connection
// run your query
// get a DataReader
}
// you need a buffer to encode your data between calls to Read
List<byte> _encodeBuffer = new List<byte>();
public override int Read(byte[] buffer, int offset, int count)
{
// read from the DataReader and populate the _encodeBuffer
// until the _encodeBuffer contains at least count bytes
// (or until there are no more records)
// for example:
while (_encodeBuffer.Count < count && _reader.Read())
{
// (1)
// encode the record into a byte array. How to do this?
// you can read into a class and then use the data
// contract serialization for example. If you do this, you
// will probably find it easier to prepend an integer which
// specifies the length of the following encoded message.
// This will make it easier for the client to deserialize it.
// (2)
// append the encoded record bytes (plus any length prefix
// etc) to _encodeBuffer
}
// remove up to the first count bytes from _encodeBuffer
// and copy them into buffer at the offset requested
// return the number of bytes added
}
public override void Close()
{
// close the reader + db connection
base.Close();
}
}
Thank to mikelegg & Reniuz for helping come to a solution. I wish I could give them the tick for the right answer, but I am a afraid the next developer to read this question would not fully benefit. So where is what I ended up with.
Setup the config files for the Server and Client (Follow link: Large Data and Streaming)
Followed this solution, can download source code from here
I had to change the DBRowStream.DBThreadProc method a bit to work so I post the source code:
DBRowStream Class:
void DBThreadProc(object o)
{
SqlConnection con = null;
SqlCommand com = null;
try
{
con = new System.Data.SqlClient.SqlConnection(/*ConnectionString*/);
com = new SqlCommand();
com.Connection = con;
com.CommandText = PrepareQuery();
con.Open();
SqlDataReader reader = com.ExecuteReader();
int count = 0;
MemoryStream memStream = memStream1;
memStreamWriteStatus = 1;
readyToWriteToMemStream1.WaitOne();
while (reader.Read())
{
// Populate
Customer customer = new Customer();
foreach (var column in Columns)
{
int fieldIndex = reader.GetOrdinal(column);
object value = reader.GetValue(fieldIndex);
customer[column.Name] = value;
}
// Serialize: I used a custom Serializer
// but BinaryFormatter should be fine
DBDataFormatter.Serialize(memStream, customer);
count++;
if (count == PAGESIZE) // const int PAGESIZE = 10000
{
switch (memStreamWriteStatus)
{
case 1: // done writing to stream 1
{
memStream1.Position = 0;
readyToSendFromMemStream1.Set();
// write stream 1 is done...waiting for stream 2
readyToWriteToMemStream2.WaitOne();
memStream = memStream2;
memStream.Position = 0;
memStream.SetLength(0); // Added:To Reset the stream. Else was getting garbage data back
memStreamWriteStatus = 2;
break;
}
case 2: // done writing to stream 2
{
memStream2.Position = 0;
readyToSendFromMemStream2.Set();
// Write on stream 2 is done...waiting for stream 1
readyToWriteToMemStream1.WaitOne();
// done waiting for stream 1
memStream = memStream1;
memStreamWriteStatus = 1;
memStream.Position = 0;
memStream.SetLength(0); // Added: Reset the stream. Else was getting garbage data back
break;
}
}
count = 0;
}
}
if (count > 0)
{
switch (memStreamWriteStatus)
{
case 1: // done writing to stream 1
{
memStream1.Position = 0;
readyToSendFromMemStream1.Set();
// END write stream 1 is done...waiting for stream 2
break;
}
case 2: // done writing to stream 2
{
memStream2.Position = 0;
readyToSendFromMemStream2.Set();
// END write stream 2 is done...waiting for stream 1
break;
}
}
}
bDoneWriting = true;
bCanRead = false;
}
catch
{
throw;
}
finally
{
if (com != null)
{
com.Dispose();
com = null;
}
if (con != null)
{
con.Close();
con.Dispose();
con = null;
}
}
}
And then the Client side:
private static void TestGetRecordsAndDump()
{
const string FILE_NAME = "Records.CSV";
File.Delete(FILE_NAME);
var file = File.AppendText(FILE_NAME);
long count = 0;
try
{
ServiceReference1.ServiceClient service = new ServiceReference1.DataServiceClient();
var stream = service.GetDBRowStream();
Console.WriteLine("Records Retrieved : ");
Console.WriteLine("File Size (MB) : ");
var canDoLastRead = true;
while (stream.CanRead && canDoLastRead)
{
try
{
Customer customer = DBDataFormatter.Deserialize(stream); // Used custom Deserializer, but BinaryFormatter should be fine
file.Write(customer.ToString());
count++;
}
catch
{
canDoLastRead = false; // Bug: stream.CanRead is not set to false at the end of stream, so I do this trick to know if I finished retruning all records.
}
finally
{
Console.SetCursorPosition("Records Retrieved : ".Length, 0);
Console.Write(string.Format("{0} ", count));
Console.SetCursorPosition("File Size (MB) : ".Length, 1);
Console.Write(string.Format("{0:G} ", file.BaseStream.Length / 1024f / 1024f));
}
}
finally
{
file.Close();
}
}
}
There is a bug I cannot seem to solve, that stream.CanRead is not set to false, then all the records have been returned, have not been able to work out why, but at least now, I can query large data sets, and return all records, with out the server or client running out of memory.
I do not want to read the whole file at any point, I know there are answers on that question, I want t
o read the First or Last line.
I know that my code locks the file that it's reading for two reasons 1) The application that writes to the file crashes intermittently when I run my little app with this code but it never crashes when I am not running this code! 2) There are a few articles that will tell you that File.ReadLines locks the file.
There are some similar questions but that answer seems to involve reading the whole file which is slow for large files and therefore not what I want to do. My requirement to only read the last line most of the time is also unique from what I have read about.
I nead to know how to read the first line (Header row) and the last line (latest row). I do not want to read all lines at any point in my code because this file can become huge and reading the entire file will become slow.
I know that
line = File.ReadLines(fullFilename).First().Replace("\"", "");
... is the same as ...
FileStream fs = new FileStream(#fullFilename, FileMode.Open, FileAccess.Read, FileShare.Read);
My question is, how can I repeatedly read the first and last lines of a file which may be being written to by another application without locking it in any way. I have no control over the application that is writting to the file. It is a data log which can be appended to at any time. The reason I am listening in this way is that this log can be appended to for days on end. I want to see the latest data in this log in my own c# programme without waiting for the log to finish being written to.
My code to call the reading / listening function ...
//Start Listening to the "data log"
private void btnDeconstructCSVFile_Click(object sender, EventArgs e)
{
MySandbox.CopyCSVDataFromLogFile copyCSVDataFromLogFile = new MySandbox.CopyCSVDataFromLogFile();
copyCSVDataFromLogFile.checkForLogData();
}
My class which does the listening. For now it simply adds the data to 2 generics lists ...
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using MySandbox.Classes;
using System.IO;
namespace MySandbox
{
public class CopyCSVDataFromLogFile
{
static private List<LogRowData> listMSDataRows = new List<LogRowData>();
static String fullFilename = string.Empty;
static LogRowData previousLineLogRowList = new LogRowData();
static LogRowData logRowList = new LogRowData();
static LogRowData logHeaderRowList = new LogRowData();
static Boolean checking = false;
public void checkForLogData()
{
//Initialise
string[] logHeaderArray = new string[] { };
string[] badDataRowsArray = new string[] { };
//Get the latest full filename (file with new data)
//Assumption: only 1 file is written to at a time in this directory.
String directory = "C:\\TestDir\\";
string pattern = "*.csv";
var dirInfo = new DirectoryInfo(directory);
var file = (from f in dirInfo.GetFiles(pattern) orderby f.LastWriteTime descending select f).First();
fullFilename = directory + file.ToString(); //This is the full filepath and name of the latest file in the directory!
if (logHeaderArray.Length == 0)
{
//Populate the Header Row
logHeaderRowList = getRow(fullFilename, true);
}
LogRowData tempLogRowList = new LogRowData();
if (!checking)
{
//Read the latest data in an asynchronous loop
callDataProcess();
}
}
private async void callDataProcess()
{
checking = true; //Begin checking
await checkForNewDataAndSaveIfFound();
}
private static Task checkForNewDataAndSaveIfFound()
{
return Task.Run(() => //Call the async "Task"
{
while (checking) //Loop (asynchronously)
{
LogRowData tempLogRowList = new LogRowData();
if (logHeaderRowList.ValueList.Count == 0)
{
//Populate the Header row
logHeaderRowList = getRow(fullFilename, true);
}
else
{
//Populate Data row
tempLogRowList = getRow(fullFilename, false);
if ((!Enumerable.SequenceEqual(tempLogRowList.ValueList, previousLineLogRowList.ValueList)) &&
(!Enumerable.SequenceEqual(tempLogRowList.ValueList, logHeaderRowList.ValueList)))
{
logRowList = getRow(fullFilename, false);
listMSDataRows.Add(logRowList);
previousLineLogRowList = logRowList;
}
}
//System.Threading.Thread.Sleep(10); //Wait for next row.
}
});
}
private static LogRowData getRow(string fullFilename, bool isHeader)
{
string line;
string[] logDataArray = new string[] { };
LogRowData logRowListResult = new LogRowData();
try
{
if (isHeader)
{
//Asign first (header) row data.
//Works but seems to block writting to the file!!!!!!!!!!!!!!!!!!!!!!!!!!!
line = File.ReadLines(fullFilename).First().Replace("\"", "");
}
else
{
//Assign data as last row (default behaviour).
line = File.ReadLines(fullFilename).Last().Replace("\"", "");
}
logDataArray = line.Split(',');
//Copy Array to Generics List and remove last value if it's empty.
for (int i = 0; i < logDataArray.Length; i++)
{
if (i < logDataArray.Length)
{
if (i < logDataArray.Length - 1)
{
//Value is not at the end, from observation, these always have a value (even if it's zero) and so we'll store the value.
logRowListResult.ValueList.Add(logDataArray[i]);
}
else
{
//This is the last value
if (logDataArray[i].Replace("\"", "").Trim().Length > 0)
{
//In this case, the last value is not empty, store it as normal.
logRowListResult.ValueList.Add(logDataArray[i]);
}
else { /*The last value is empty, e.g. "123,456,"; the final comma denotes another field but this field is empty so we will ignore it now. */ }
}
}
}
}
catch (Exception ex)
{
if (ex.Message == "Sequence contains no elements")
{ /*Empty file, no problem. The code will safely loop and then will pick up the header when it appears.*/ }
else
{
//TODO: catch this error properly
Int32 problemID = 10; //Unknown ERROR.
}
}
return logRowListResult;
}
}
}
I found the answer in a combination of other questions. One answer explaining how to read from the end of a file, which I adapted so that it would read only 1 line from the end of the file. And another explaining how to read the entire file without locking it (I did not want to read the entire file but the not locking part was useful). So now you can read the last line of the file (if it contains end of line characters) without locking it. For other end of line delimeters, just replace my 10 and 13 with your end of line character bytes...
Add the method below to public class CopyCSVDataFromLogFile
private static string Reverse(string str)
{
char[] arr = new char[str.Length];
for (int i = 0; i < str.Length; i++)
arr[i] = str[str.Length - 1 - i];
return new string(arr);
}
and replace this line ...
line = File.ReadLines(fullFilename).Last().Replace("\"", "");
with this code block ...
Int32 endOfLineCharacterCount = 0;
Int32 previousCharByte = 0;
Int32 currentCharByte = 0;
//Read the file, from the end, for 1 line, allowing other programmes to access it for read and write!
using (FileStream reader = new FileStream(fullFilename, FileMode.Open, FileAccess.Read, FileShare.ReadWrite, 0x1000, FileOptions.SequentialScan))
{
int i = 0;
StringBuilder lineBuffer = new StringBuilder();
int byteRead;
while ((-i < reader.Length) /*Belt and braces: if there were no end of line characters, reading beyond the file would give a catastrophic error here (to be avoided thus).*/
&& (endOfLineCharacterCount < 2)/*Exit Condition*/)
{
reader.Seek(--i, SeekOrigin.End);
byteRead = reader.ReadByte();
currentCharByte = byteRead;
//Exit condition: the first 2 characters we read (reading backwards remember) were end of line ().
//So when we read the second end of line, we have read 1 whole line (the last line in the file)
//and we must exit now.
if (currentCharByte == 13 && previousCharByte == 10)
{
endOfLineCharacterCount++;
}
if (byteRead == 10 && lineBuffer.Length > 0)
{
line += Reverse(lineBuffer.ToString());
lineBuffer.Remove(0, lineBuffer.Length);
}
lineBuffer.Append((char)byteRead);
previousCharByte = byteRead;
}
reader.Close();
}
I have to implement a tcp connection where raw xml data is passed.
Unfortunately there is no message framing, I now this is realy bad, but I have to deal with this...
The Message would look like this:
<?xml version="1.0" encoding="utf-8"?>
<DATA></DATA>
or this
<?xml version="1.0" encoding="utf-8"?>
<DATA />
Now I have to receive messages that could have self closed tags. The message is always the same, it is always like xml description and a data tag with inner xml that is the message content.
So if it would be without self closed tags, this would be easy, but how can I read both?
By the way I am using the TcpListener.
Edit :
Everything is fine if there is no self closed tag.
if (_clientSocket != null)
{
NetworkStream networkStream = _clientSocket.GetStream();
_clientSocket.ReceiveTimeout = 100; // 1000 miliseconds
while (_continueProcess)
{
if (networkStream.DataAvailable)
{
bool isMessageComplete = false;
String messageString = String.Empty;
while (!isMessageComplete)
{
var bytes = new byte[_clientSocket.ReceiveBufferSize];
try
{
int bytesReaded = networkStream.Read(bytes, 0, (int) _clientSocket.ReceiveBufferSize);
if (bytesReaded > 0)
{
var data = Encoding.UTF8.GetString(bytes, 0, bytesReaded);
messageString += data;
if (messageString.IndexOf("<DATA", StringComparison.OrdinalIgnoreCase) > 0 &&
messageString.IndexOf("</DATA", StringComparison.OrdinalIgnoreCase) > 0)
{
isMessageComplete = true;
}
}
}
catch (IOException)
{
// Timeout
}
catch (SocketException)
{
Console.WriteLine("Conection is broken!");
break;
}
}
}
Thread.Sleep(200);
} // while ( _continueProcess )
networkStream.Close();
_clientSocket.Close();
}
Edit 2 (30.03.2015 12:00)
Unfortunately it is not possible to use some kind of message frame.
So I ended up to use this part of code (DATA is my root node):
if (_clientSocket != null)
{
NetworkStream networkStream = _clientSocket.GetStream();
_clientSocket.ReceiveTimeout = 100;
string data = string.Empty;
while (_continueProcess)
{
try
{
if (networkStream.DataAvailable)
{
Stopwatch sw = new Stopwatch();
sw.Start();
var bytes = new byte[_clientSocket.ReceiveBufferSize];
int completeXmlLength = 0;
int bytesReaded = networkStream.Read(bytes, 0, (int) _clientSocket.ReceiveBufferSize);
if (bytesReaded > 0)
{
message.AddRange(bytes);
data += Encoding.UTF8.GetString(bytes, 0, bytesReaded);
if (data.IndexOf("<?", StringComparison.Ordinal) == 0)
{
if (data.IndexOf("<DATA", StringComparison.Ordinal) > 0)
{
Int32 rootStartPos = data.IndexOf("<DATA", StringComparison.Ordinal);
completeXmlLength += rootStartPos;
var root = data.Substring(rootStartPos);
int rootCloseTagPos = root.IndexOf(">", StringComparison.Ordinal);
Int32 rootSelfClosedTagPos = root.IndexOf("/>", StringComparison.Ordinal);
// If there is an empty tag that is self closed.
if (rootSelfClosedTagPos > 0)
{
string rootTag = root.Substring(0, rootSelfClosedTagPos +1);
// If there is no '>' between the self closed tag and the start of '<DATA'
// the root element is empty.
if (rootTag.IndexOf(">", StringComparison.Ordinal) <= 0)
{
completeXmlLength += rootSelfClosedTagPos;
string messageXmlString = data.Substring(0, completeXmlLength + 1);
data = data.Substring(messageXmlString.Length);
try
{
// parse complete xml.
XDocument xmlDocument = XDocument.Parse(messageXmlString);
}
catch(Exception)
{
// Invalid Xml.
}
continue;
}
}
if (rootCloseTagPos > 0)
{
Int32 rootEndTagStartPos = root.IndexOf("</DATA", StringComparison.Ordinal);
if (rootEndTagStartPos > 0)
{
var endTagString = root.Substring(rootEndTagStartPos);
completeXmlLength += rootEndTagStartPos;
Int32 completeEndPos = endTagString.IndexOf(">", StringComparison.Ordinal);
if (completeEndPos > 0)
{
completeXmlLength += completeEndPos;
string messageXmlString = data.Substring(0, completeXmlLength + 1);
data = data.Substring(messageXmlString.Length);
try
{
// parse complete xml.
XDocument xmlDocument = XDocument.Parse(messageXmlString);
}
catch(Exception)
{
// Invalid Xml.
}
}
}
}
}
}
}
sw.Stop();
string timeElapsed = sw.Elapsed.ToString();
}
}
catch (IOException)
{
data = String.Empty;
}
catch (SocketException)
{
Console.WriteLine("Conection is broken!");
break;
}
}
This code I had use if ther were some kind of message framing, in this case 4 bytes of message length:
if (_clientSocket != null)
{
NetworkStream networkStream = _clientSocket.GetStream();
_clientSocket.ReceiveTimeout = 100;
string data = string.Empty;
while (_continueProcess)
{
try
{
if (networkStream.DataAvailable)
{
Stopwatch sw = new Stopwatch();
sw.Start();
var lengthBytes = new byte[sizeof (Int32)];
int bytesReaded = networkStream.Read(lengthBytes, 0, sizeof (Int32) - offset);
if (bytesReaded > 0)
{
offset += bytesReaded;
message.AddRange(lengthBytes.Take(bytesReaded));
}
if (offset < sizeof (Int32))
{
continue;
}
Int32 length = BitConverter.ToInt32(message.Take(sizeof(Int32)).ToArray(), 0);
message.Clear();
while (length > 0)
{
Int32 bytesToRead = length < _clientSocket.ReceiveBufferSize ? length : _clientSocket.ReceiveBufferSize;
byte[] messageBytes = new byte[bytesToRead];
bytesReaded = networkStream.Read(messageBytes, 0, bytesToRead);
length = length - bytesReaded;
message.AddRange(messageBytes);
}
try
{
string xml = Encoding.UTF8.GetString(message.ToArray());
XDocument xDocument = XDocument.Parse(xml);
}
catch (Exception ex)
{
// Invalid Xml.
}
sw.Stop();
string timeElapsed = sw.Elapsed.ToString();
}
}
catch (IOException)
{
data = String.Empty;
}
catch (SocketException)
{
Console.WriteLine("Conection is broken!");
break;
}
}
Like you can see I wanted to measure the elapsed time, to see witch methode has a better performance. The strange thing is that the methode whith no message framing has an average time of 0,2290 ms, the other methode has an average time of 1,2253 ms.
Can someone explain me why? I thought the one without message framing would be slower...
Hand the NetworkStream to the .NET XML infrastructure. For example create an XmlReader from the NetworkStream.
Unfortunately I did not find a built-in way to easily create an XmlDocument from an XmlReader that has multiple documents in it. It complains about multiple root elements (which is correct). You would need to wrap the XmlReader and make it stop returning nodes when the first document is done. You can do that by keeping track of some state and by looking at the nesting level. When the nesting level is zero again the first document is done.
This is just a raw sketch. I'm pretty sure this will work and it handles all possible XML documents.
No need for this horrible string processing code that you have there. The existing code looks quite slow as well but since this approach is much better it serves no purpose to comment on the perf issues. You need to throw this away.
I had the same problem - 3rd party system sends messages in XML format via TCP but my TCP client application may receive message partially or several messages at once. One of my colleagues proposed very simple and quite generic solution.
The idea is to have a string buffer which should be populated char by char from TCP stream, after each char try to parse buffer content with regular .Net XML parser. If parser throws an exception - continue adding chars to the buffer. Otherwise - message is ready and can be processed by application.
Here is the code:
private object _dataReceiverLock = new object();
private string _messageBuffer;
private Stopwatch _timeSinceLastMessage = new Stopwatch();
private List<string> NormalizeMessage(string rawMsg)
{
lock (_dataReceiverLock)
{
List<string> result = new List<string>();
//following code prevents buffer to store too old information
if (_timeSinceLastMessage.ElapsedMilliseconds > _settings.ResponseTimeout)
{
_messageBuffer = string.Empty;
}
_timeSinceLastMessage.Restart();
foreach (var ch in rawMsg)
{
_messageBuffer += ch;
if (ch == '>')//to avoid extra checks
{
if (IsValidXml(_messageBuffer))
{
result.Add(_messageBuffer);
_messageBuffer = string.Empty;
}
}
}
return result;
}
}
private bool IsValidXml(string xml)
{
try
{
//fastest way to validate XML format correctness
using (XmlTextReader reader = new XmlTextReader(new StringReader(xml)))
{
while (reader.Read()) { }
}
return true;
}
catch
{
return false;
}
}
Few comments:
Need to control lifetime of string buffer, otherwise in case network disconnection old information may stay in the buffer forever
There major problem here is the performance - parsing after every new character is quite slow. So need to add some optimizations, such as parse only after '>' character.
Make sure this method is thread safe, otherwise several threads may flood string buffer with different XML pieces.
The usage is simple:
private void _tcpClient_DataReceived(byte[] data)
{
var rawMsg = Encoding.Unicode.GetString(data);
var normalizedMessages = NormalizeMessage(rawMsg);
foreach (var normalizedMessage in normalizedMessages)
{
//TODO: your logic
}
}
Im trying to send some object from a server to the client.
My problem is that when im sending only 1 object, everything works correctly. But at the moment i add another object an exception is thrown - "binary stream does not contain a valid binaryheader" or "No map for object (random number)".
My thoughts are that the deserialization does not understand where the stream starts / ends and i hoped that you guys can help me out here.
heres my deserialization code:
public void Listen()
{
try
{
bool offline = true;
Dispatcher.Invoke(System.Windows.Threading.DispatcherPriority.Normal,
new Action(() => offline = Offline));
while (!offline)
{
TcpObject tcpObject = new TcpObject();
IFormatter formatter = new BinaryFormatter();
tcpObject = (TcpObject)formatter.Deserialize(serverStream);
if (tcpObject.Command == Command.Transfer)
{
SentAntenna sentAntenna = (SentAntenna)tcpObject.Object;
int idx = 0;
foreach (string name in SharedProperties.AntennaNames)
{
if (name == sentAntenna.Name)
break;
idx++;
}
if (idx < 9)
{
PointCollection pointCollection = new PointCollection();
foreach (Frequency f in sentAntenna.Frequencies)
pointCollection.Add(new Point(f.Channel, f.Intensity));
SharedProperties.AntennaPoints[idx] = pointCollection;
}
}
}
}
catch (Exception ex)
{
MessageBox.Show(ex.Message); // raise an event
}
}
serialization code:
case Command.Transfer:
Console.WriteLine("Transfering");
Thread transfer = new Thread(new ThreadStart(delegate
{
try
{
string aName = tcpObject.Object.ToString();
int indx = 0;
foreach (string name in names)
{
if (name == aName)
break;
indx++;
}
if (indx < 9)
{
while (true) // need to kill when the father thread terminates
{
if (antennas[indx].Frequencies != null)
{
lock (antennas[indx].Frequencies)
{
TcpObject sendTcpObject = new TcpObject();
sendTcpObject.Command = Command.Transfer;
SentAntenna sa = new SentAntenna(antennas[indx].Frequencies, aName);
sendTcpObject.Object = sa;
formatter.Serialize(networkStream, sendTcpObject);
}
}
}
}
}
catch (Exception ex) { Console.WriteLine(ex); }
}));
transfer.Start();
break;
Interesting. There's nothing particularly odd in your serialization code, and I've seen people use vanilla concatenation for multiple objects in the past, although I've actually always advised against it as BinaryFormatter does not explicitly claim this scenario is OK. But: if it isn't, the only thing I can suggest is to implement your own framing; so your write code becomes:
serialize to an empty MemoryStream
note the length and write the length to the NetworkStream, for example as a simple fixed-width 32-bit network-byte-order integer
write the payload from the MemoryStream to the NetworkStream
rinse, repeat
And the read code becomes:
read exactly 4 bytes and compute the length
buffer that many bytes into a MemoryStream
deserialize from the NetworkStream
(Noting in both cases to set the MemoryStream's position back to 0 between write and read)
You can also implement a Stream-subclass that caps the length if you want to avoid a buffer when reading, bit that is more complex.
apperantly i came up with a really simple solution. I just made sure only 1 thread is allowed to transfer data at the same time so i changed this line of code:
formatter.Serialize(networkStream, sendTcpObject);
to these lines of code:
if (!transfering) // making sure only 1 thread is transfering data
{
transfering = true;
formatter.Serialize(networkStream, sendTcpObject);
transfering = false;
}