Trying to deserialize more than 1 object at the same time - c#

Im trying to send some object from a server to the client.
My problem is that when im sending only 1 object, everything works correctly. But at the moment i add another object an exception is thrown - "binary stream does not contain a valid binaryheader" or "No map for object (random number)".
My thoughts are that the deserialization does not understand where the stream starts / ends and i hoped that you guys can help me out here.
heres my deserialization code:
public void Listen()
{
try
{
bool offline = true;
Dispatcher.Invoke(System.Windows.Threading.DispatcherPriority.Normal,
new Action(() => offline = Offline));
while (!offline)
{
TcpObject tcpObject = new TcpObject();
IFormatter formatter = new BinaryFormatter();
tcpObject = (TcpObject)formatter.Deserialize(serverStream);
if (tcpObject.Command == Command.Transfer)
{
SentAntenna sentAntenna = (SentAntenna)tcpObject.Object;
int idx = 0;
foreach (string name in SharedProperties.AntennaNames)
{
if (name == sentAntenna.Name)
break;
idx++;
}
if (idx < 9)
{
PointCollection pointCollection = new PointCollection();
foreach (Frequency f in sentAntenna.Frequencies)
pointCollection.Add(new Point(f.Channel, f.Intensity));
SharedProperties.AntennaPoints[idx] = pointCollection;
}
}
}
}
catch (Exception ex)
{
MessageBox.Show(ex.Message); // raise an event
}
}
serialization code:
case Command.Transfer:
Console.WriteLine("Transfering");
Thread transfer = new Thread(new ThreadStart(delegate
{
try
{
string aName = tcpObject.Object.ToString();
int indx = 0;
foreach (string name in names)
{
if (name == aName)
break;
indx++;
}
if (indx < 9)
{
while (true) // need to kill when the father thread terminates
{
if (antennas[indx].Frequencies != null)
{
lock (antennas[indx].Frequencies)
{
TcpObject sendTcpObject = new TcpObject();
sendTcpObject.Command = Command.Transfer;
SentAntenna sa = new SentAntenna(antennas[indx].Frequencies, aName);
sendTcpObject.Object = sa;
formatter.Serialize(networkStream, sendTcpObject);
}
}
}
}
}
catch (Exception ex) { Console.WriteLine(ex); }
}));
transfer.Start();
break;

Interesting. There's nothing particularly odd in your serialization code, and I've seen people use vanilla concatenation for multiple objects in the past, although I've actually always advised against it as BinaryFormatter does not explicitly claim this scenario is OK. But: if it isn't, the only thing I can suggest is to implement your own framing; so your write code becomes:
serialize to an empty MemoryStream
note the length and write the length to the NetworkStream, for example as a simple fixed-width 32-bit network-byte-order integer
write the payload from the MemoryStream to the NetworkStream
rinse, repeat
And the read code becomes:
read exactly 4 bytes and compute the length
buffer that many bytes into a MemoryStream
deserialize from the NetworkStream
(Noting in both cases to set the MemoryStream's position back to 0 between write and read)
You can also implement a Stream-subclass that caps the length if you want to avoid a buffer when reading, bit that is more complex.

apperantly i came up with a really simple solution. I just made sure only 1 thread is allowed to transfer data at the same time so i changed this line of code:
formatter.Serialize(networkStream, sendTcpObject);
to these lines of code:
if (!transfering) // making sure only 1 thread is transfering data
{
transfering = true;
formatter.Serialize(networkStream, sendTcpObject);
transfering = false;
}

Related

Problem with passing constantly reading serial buffer data to another class

Long story short. ;
I have a class named Scope. And this class contains all logic for scope operations etc. It also starts backround thread that constantly read serial port data (in my case events was unreliable):
Thread BackgroundReader = new Thread(ReadBuffer);
BackgroundReader.IsBackground = true;
BackgroundReader.Start();
private void ReadBuffer()
{
SerialPort.DiscardInBuffer();
while (!_stopCapture)
{
int bufferSize = SerialPort.BytesToRead;
byte[] buffer = new byte[bufferSize];
if(bufferSize > 5)
{
SerialPort.Read(buffer, 0, bufferSize);
Port_DataReceivedEvent(buffer, null);
}
Thread.Sleep(_readDelay);
}
CurrentBuffer = null;
}
In Scope class there is a public field named Buffer
public byte[] Buffer
{
get
{
return CurrentBuffer;
}
}
And here is event fired while there is new data readed
private void Port_DataReceivedEvent(object sender, EventArgs e)
{
//populate buffer
Info(sender, null);
CurrentBuffer = ((byte[])sender);
foreach(byte data in CurrentBuffer)
{
DataBuffer.Enqueue(data);
}
if (DataBuffer.Count() > _recordLength)
{
GenerateFrame(DataBuffer.ToArray());
DataBuffer.Clear(); ;
}
}
To make code more manageable, I splitted it in several classes. One of this classes is for searching specific data pattern in current stream and create specific object from this data. This code works in way that send to serial port specific command and expect return frame. If reponse is not received or not ok, send is performed again and again until correct response arrives or there will be timeout. Response is expected to be in current buffer. Those strange string manipulation is for debug purposes.
public class GetAcknowledgedFrame
{
byte[] WritedData;
string lastEx;
string stringData;
public DataFrame WriteAcknowledged(Type SendType, Type ReturnType, JyeScope scope)
{
var stopwatch = new Stopwatch();
stopwatch.Restart();
while (stopwatch.ElapsedMilliseconds < scope.TimeoutTime)
{
try
{
if (SendType == typeof(GetParameters))
{
WriteFrame(new ScopeControlFrames.GetParameters(), scope.SerialPort);
}
else if(SendType == typeof(GetConfig))
{
WriteFrame(new ScopeControlFrames.GetConfig(), scope.SerialPort);
}
else if (SendType == typeof(EnterUSBScopeMode))
{
WriteFrame(new ScopeControlFrames.EnterUSBScopeMode(), scope.SerialPort);
}
return ReturnFrame(ReturnType, scope.Buffer, scope.TimeoutTime);
}
catch (InvalidDataFrameException ex)
{
lastEx = ex.Message;
System.Threading.Thread.Sleep(10);
}
}
stringData = "";
foreach (var data in scope.Buffer)
{
stringData += data + ",";
}
stringData.Remove(stringData.Length - 1);
throw new TimeoutException($"Timeout while waiting for frame acknowledge: " + SendType.ToString() + ", " + ReturnType.ToString() + Environment.NewLine+ "Add. err: "+lastEx);
}
private DataFrame ReturnFrame(Type FrameType, byte[] buffer, int timeoutTime)
{
if (FrameType == typeof(DataFrames.DSO068.CurrConfigDataFrame))
{
DataFrames.DSO068.CurrConfigDataFrame CurrConfig = new DataFrames.DSO068.CurrConfigDataFrame(buffer);
return CurrConfig;
}
else if (FrameType == typeof(DataFrames.DSO112.CurrConfigDataFrame))
{
DataFrames.DSO112.CurrConfigDataFrame CurrParam = new DataFrames.DSO112.CurrConfigDataFrame(buffer);
return CurrParam;
}
else if (FrameType == typeof(CurrParamDataFrame))
{
CurrParamDataFrame CurrParam = new CurrParamDataFrame(buffer);
return CurrParam;
}
else if (FrameType == typeof(DataBlockDataFrame))
{
DataBlockDataFrame CurrData = new DataBlockDataFrame(buffer);
return CurrData;
}
else if (FrameType == typeof(DataSampleDataFrame))
{
DataSampleDataFrame CurrData = new DataSampleDataFrame(buffer);
return CurrData;
}
else if (FrameType == typeof(ScopeControlFrames.ScopeReady))
{
ScopeControlFrames.ScopeReady ready = new ScopeControlFrames.ScopeReady(buffer);
return ready;
}
else
{
throw new InvalidOperationException("Wrong object type");
}
}
private bool WriteFrame(DataFrame frame, IStreamResource port)
{
WritedData = frame.Data;
port.Write(frame.Data, 0, frame.Data.Count());
return true;
}
}
From main class (and main thread) I call method in this class, for example:
var Ready = (ScopeControlFrames.ScopeReady)new GetAcknowledgedFrame().WriteAcknowledged
(typeof(ScopeControlFrames.EnterUSBScopeMode), typeof(ScopeControlFrames.ScopeReady), this);
The problem is when I pass "this" object (that has thread working in background) to my helper class. It seems like helper class not see changing data in this object. The problem started when I separate code of my helper class from main class.
My questions:
- I know that object are passed by reference, that means I think that when object is dynamically changing its state (in this case data buffer should changing while new data is received) all classes that has reference to this object are also seeing this changes. Maybe I'm missing something?
- I tried passing array (by ref), arrays are also reference types. But this not help me at all. Maybe I'm missing something?
I tried changing this class to static, it not helped.
Many thanks for help.
The code below;
Info(sender, null);
CurrentBuffer = ((byte[])sender);
is creating a new reference variable called CurrentBuffer. Any other code holding a reference 'pointer' to the CurrentBuffer value prior to this line of code will not get the new value of CurrentBuffer when its reset.

WCF streaming large number objects

I have a WCF service that query a database and returns a large number of records. There is so many records, that the server runs out of memory and fails before it can return.
So I want to send the records back as I fetch them from the database, or a set number back at a time.
For additional clarity, I cannot collect call records fetched into a collection on the server, as the server runs out of memory before I have collected all the records. I want to try and find away to send them back one by one or in chunks, in one call.
For example, in chunks:
Fetch first 1000 records
Add to collection
Send collection to client
Clear collection
Fetch next 1000 records, and repeat from step 2
So the idea I have how the web service code will look something like this:
Public IEnumerable<Customer> GetAllCustomers()
{
// Setup Query
string query = PrepareQuery();
// Create Connection
connection = new SqlConnection(ConnectionString);
connection.Open();
var sqlcommand = connection.CreateCommand();
sqlcommand.CommandText = query.ToString();
// Read Results
var reader = sqlcommand.ExecuteReader();
while (reader.Read())
{
Customer customer = new Customer();
foreach (var column in Columns)
{
int fieldIndex = reader.GetOrdinal(column);
object value = reader.GetValue(fieldIndex);
customer[column.Name] = value;
}
yield return customer;
}
}
I don't want to consider paging as the Order By on the SQL server is slow.
Looking for way to do this in WCF
I think you answer your own question. There are 2 ways to do it, stream or chunk.
You can do streaming in wcf - see https://learn.microsoft.com/en-us/dotnet/framework/wcf/feature-details/large-data-and-streaming
You get a Stream to write to, so you need to handle yourself how you are going to encode your data on that stream, and how you are going decode it at the client.
The alternative is you do chunking/paging. You just modify your service so it accepts e.g. a page number or some other way to indicate which page is needed.
Which one you do depends on the application, eg how much data? what is the nature of the client? is it possible to use some field to page on? etc etc
Here is some psudo code for making a stream that can do this on the server side. It is based on the example here: https://learn.microsoft.com/en-us/dotnet/framework/wcf/feature-details/how-to-enable-streaming
I'm not writing the full compilable code for you, but this is the gist of it.
In the server:
public Stream GetBigData()
{
return new BigDataStream();
}
BigDataStream (the non-implimented methods are not shown):
class BigDataStream : Stream
{
public BigDataStream()
{
// open DB connection
// run your query
// get a DataReader
}
// you need a buffer to encode your data between calls to Read
List<byte> _encodeBuffer = new List<byte>();
public override int Read(byte[] buffer, int offset, int count)
{
// read from the DataReader and populate the _encodeBuffer
// until the _encodeBuffer contains at least count bytes
// (or until there are no more records)
// for example:
while (_encodeBuffer.Count < count && _reader.Read())
{
// (1)
// encode the record into a byte array. How to do this?
// you can read into a class and then use the data
// contract serialization for example. If you do this, you
// will probably find it easier to prepend an integer which
// specifies the length of the following encoded message.
// This will make it easier for the client to deserialize it.
// (2)
// append the encoded record bytes (plus any length prefix
// etc) to _encodeBuffer
}
// remove up to the first count bytes from _encodeBuffer
// and copy them into buffer at the offset requested
// return the number of bytes added
}
public override void Close()
{
// close the reader + db connection
base.Close();
}
}
Thank to mikelegg & Reniuz for helping come to a solution. I wish I could give them the tick for the right answer, but I am a afraid the next developer to read this question would not fully benefit. So where is what I ended up with.
Setup the config files for the Server and Client (Follow link: Large Data and Streaming)
Followed this solution, can download source code from here
I had to change the DBRowStream.DBThreadProc method a bit to work so I post the source code:
DBRowStream Class:
void DBThreadProc(object o)
{
SqlConnection con = null;
SqlCommand com = null;
try
{
con = new System.Data.SqlClient.SqlConnection(/*ConnectionString*/);
com = new SqlCommand();
com.Connection = con;
com.CommandText = PrepareQuery();
con.Open();
SqlDataReader reader = com.ExecuteReader();
int count = 0;
MemoryStream memStream = memStream1;
memStreamWriteStatus = 1;
readyToWriteToMemStream1.WaitOne();
while (reader.Read())
{
// Populate
Customer customer = new Customer();
foreach (var column in Columns)
{
int fieldIndex = reader.GetOrdinal(column);
object value = reader.GetValue(fieldIndex);
customer[column.Name] = value;
}
// Serialize: I used a custom Serializer
// but BinaryFormatter should be fine
DBDataFormatter.Serialize(memStream, customer);
count++;
if (count == PAGESIZE) // const int PAGESIZE = 10000
{
switch (memStreamWriteStatus)
{
case 1: // done writing to stream 1
{
memStream1.Position = 0;
readyToSendFromMemStream1.Set();
// write stream 1 is done...waiting for stream 2
readyToWriteToMemStream2.WaitOne();
memStream = memStream2;
memStream.Position = 0;
memStream.SetLength(0); // Added:To Reset the stream. Else was getting garbage data back
memStreamWriteStatus = 2;
break;
}
case 2: // done writing to stream 2
{
memStream2.Position = 0;
readyToSendFromMemStream2.Set();
// Write on stream 2 is done...waiting for stream 1
readyToWriteToMemStream1.WaitOne();
// done waiting for stream 1
memStream = memStream1;
memStreamWriteStatus = 1;
memStream.Position = 0;
memStream.SetLength(0); // Added: Reset the stream. Else was getting garbage data back
break;
}
}
count = 0;
}
}
if (count > 0)
{
switch (memStreamWriteStatus)
{
case 1: // done writing to stream 1
{
memStream1.Position = 0;
readyToSendFromMemStream1.Set();
// END write stream 1 is done...waiting for stream 2
break;
}
case 2: // done writing to stream 2
{
memStream2.Position = 0;
readyToSendFromMemStream2.Set();
// END write stream 2 is done...waiting for stream 1
break;
}
}
}
bDoneWriting = true;
bCanRead = false;
}
catch
{
throw;
}
finally
{
if (com != null)
{
com.Dispose();
com = null;
}
if (con != null)
{
con.Close();
con.Dispose();
con = null;
}
}
}
And then the Client side:
private static void TestGetRecordsAndDump()
{
const string FILE_NAME = "Records.CSV";
File.Delete(FILE_NAME);
var file = File.AppendText(FILE_NAME);
long count = 0;
try
{
ServiceReference1.ServiceClient service = new ServiceReference1.DataServiceClient();
var stream = service.GetDBRowStream();
Console.WriteLine("Records Retrieved : ");
Console.WriteLine("File Size (MB) : ");
var canDoLastRead = true;
while (stream.CanRead && canDoLastRead)
{
try
{
Customer customer = DBDataFormatter.Deserialize(stream); // Used custom Deserializer, but BinaryFormatter should be fine
file.Write(customer.ToString());
count++;
}
catch
{
canDoLastRead = false; // Bug: stream.CanRead is not set to false at the end of stream, so I do this trick to know if I finished retruning all records.
}
finally
{
Console.SetCursorPosition("Records Retrieved : ".Length, 0);
Console.Write(string.Format("{0} ", count));
Console.SetCursorPosition("File Size (MB) : ".Length, 1);
Console.Write(string.Format("{0:G} ", file.BaseStream.Length / 1024f / 1024f));
}
}
finally
{
file.Close();
}
}
}
There is a bug I cannot seem to solve, that stream.CanRead is not set to false, then all the records have been returned, have not been able to work out why, but at least now, I can query large data sets, and return all records, with out the server or client running out of memory.

Xml over tcp without message frame

I have to implement a tcp connection where raw xml data is passed.
Unfortunately there is no message framing, I now this is realy bad, but I have to deal with this...
The Message would look like this:
<?xml version="1.0" encoding="utf-8"?>
<DATA></DATA>
or this
<?xml version="1.0" encoding="utf-8"?>
<DATA />
Now I have to receive messages that could have self closed tags. The message is always the same, it is always like xml description and a data tag with inner xml that is the message content.
So if it would be without self closed tags, this would be easy, but how can I read both?
By the way I am using the TcpListener.
Edit :
Everything is fine if there is no self closed tag.
if (_clientSocket != null)
{
NetworkStream networkStream = _clientSocket.GetStream();
_clientSocket.ReceiveTimeout = 100; // 1000 miliseconds
while (_continueProcess)
{
if (networkStream.DataAvailable)
{
bool isMessageComplete = false;
String messageString = String.Empty;
while (!isMessageComplete)
{
var bytes = new byte[_clientSocket.ReceiveBufferSize];
try
{
int bytesReaded = networkStream.Read(bytes, 0, (int) _clientSocket.ReceiveBufferSize);
if (bytesReaded > 0)
{
var data = Encoding.UTF8.GetString(bytes, 0, bytesReaded);
messageString += data;
if (messageString.IndexOf("<DATA", StringComparison.OrdinalIgnoreCase) > 0 &&
messageString.IndexOf("</DATA", StringComparison.OrdinalIgnoreCase) > 0)
{
isMessageComplete = true;
}
}
}
catch (IOException)
{
// Timeout
}
catch (SocketException)
{
Console.WriteLine("Conection is broken!");
break;
}
}
}
Thread.Sleep(200);
} // while ( _continueProcess )
networkStream.Close();
_clientSocket.Close();
}
Edit 2 (30.03.2015 12:00)
Unfortunately it is not possible to use some kind of message frame.
So I ended up to use this part of code (DATA is my root node):
if (_clientSocket != null)
{
NetworkStream networkStream = _clientSocket.GetStream();
_clientSocket.ReceiveTimeout = 100;
string data = string.Empty;
while (_continueProcess)
{
try
{
if (networkStream.DataAvailable)
{
Stopwatch sw = new Stopwatch();
sw.Start();
var bytes = new byte[_clientSocket.ReceiveBufferSize];
int completeXmlLength = 0;
int bytesReaded = networkStream.Read(bytes, 0, (int) _clientSocket.ReceiveBufferSize);
if (bytesReaded > 0)
{
message.AddRange(bytes);
data += Encoding.UTF8.GetString(bytes, 0, bytesReaded);
if (data.IndexOf("<?", StringComparison.Ordinal) == 0)
{
if (data.IndexOf("<DATA", StringComparison.Ordinal) > 0)
{
Int32 rootStartPos = data.IndexOf("<DATA", StringComparison.Ordinal);
completeXmlLength += rootStartPos;
var root = data.Substring(rootStartPos);
int rootCloseTagPos = root.IndexOf(">", StringComparison.Ordinal);
Int32 rootSelfClosedTagPos = root.IndexOf("/>", StringComparison.Ordinal);
// If there is an empty tag that is self closed.
if (rootSelfClosedTagPos > 0)
{
string rootTag = root.Substring(0, rootSelfClosedTagPos +1);
// If there is no '>' between the self closed tag and the start of '<DATA'
// the root element is empty.
if (rootTag.IndexOf(">", StringComparison.Ordinal) <= 0)
{
completeXmlLength += rootSelfClosedTagPos;
string messageXmlString = data.Substring(0, completeXmlLength + 1);
data = data.Substring(messageXmlString.Length);
try
{
// parse complete xml.
XDocument xmlDocument = XDocument.Parse(messageXmlString);
}
catch(Exception)
{
// Invalid Xml.
}
continue;
}
}
if (rootCloseTagPos > 0)
{
Int32 rootEndTagStartPos = root.IndexOf("</DATA", StringComparison.Ordinal);
if (rootEndTagStartPos > 0)
{
var endTagString = root.Substring(rootEndTagStartPos);
completeXmlLength += rootEndTagStartPos;
Int32 completeEndPos = endTagString.IndexOf(">", StringComparison.Ordinal);
if (completeEndPos > 0)
{
completeXmlLength += completeEndPos;
string messageXmlString = data.Substring(0, completeXmlLength + 1);
data = data.Substring(messageXmlString.Length);
try
{
// parse complete xml.
XDocument xmlDocument = XDocument.Parse(messageXmlString);
}
catch(Exception)
{
// Invalid Xml.
}
}
}
}
}
}
}
sw.Stop();
string timeElapsed = sw.Elapsed.ToString();
}
}
catch (IOException)
{
data = String.Empty;
}
catch (SocketException)
{
Console.WriteLine("Conection is broken!");
break;
}
}
This code I had use if ther were some kind of message framing, in this case 4 bytes of message length:
if (_clientSocket != null)
{
NetworkStream networkStream = _clientSocket.GetStream();
_clientSocket.ReceiveTimeout = 100;
string data = string.Empty;
while (_continueProcess)
{
try
{
if (networkStream.DataAvailable)
{
Stopwatch sw = new Stopwatch();
sw.Start();
var lengthBytes = new byte[sizeof (Int32)];
int bytesReaded = networkStream.Read(lengthBytes, 0, sizeof (Int32) - offset);
if (bytesReaded > 0)
{
offset += bytesReaded;
message.AddRange(lengthBytes.Take(bytesReaded));
}
if (offset < sizeof (Int32))
{
continue;
}
Int32 length = BitConverter.ToInt32(message.Take(sizeof(Int32)).ToArray(), 0);
message.Clear();
while (length > 0)
{
Int32 bytesToRead = length < _clientSocket.ReceiveBufferSize ? length : _clientSocket.ReceiveBufferSize;
byte[] messageBytes = new byte[bytesToRead];
bytesReaded = networkStream.Read(messageBytes, 0, bytesToRead);
length = length - bytesReaded;
message.AddRange(messageBytes);
}
try
{
string xml = Encoding.UTF8.GetString(message.ToArray());
XDocument xDocument = XDocument.Parse(xml);
}
catch (Exception ex)
{
// Invalid Xml.
}
sw.Stop();
string timeElapsed = sw.Elapsed.ToString();
}
}
catch (IOException)
{
data = String.Empty;
}
catch (SocketException)
{
Console.WriteLine("Conection is broken!");
break;
}
}
Like you can see I wanted to measure the elapsed time, to see witch methode has a better performance. The strange thing is that the methode whith no message framing has an average time of 0,2290 ms, the other methode has an average time of 1,2253 ms.
Can someone explain me why? I thought the one without message framing would be slower...
Hand the NetworkStream to the .NET XML infrastructure. For example create an XmlReader from the NetworkStream.
Unfortunately I did not find a built-in way to easily create an XmlDocument from an XmlReader that has multiple documents in it. It complains about multiple root elements (which is correct). You would need to wrap the XmlReader and make it stop returning nodes when the first document is done. You can do that by keeping track of some state and by looking at the nesting level. When the nesting level is zero again the first document is done.
This is just a raw sketch. I'm pretty sure this will work and it handles all possible XML documents.
No need for this horrible string processing code that you have there. The existing code looks quite slow as well but since this approach is much better it serves no purpose to comment on the perf issues. You need to throw this away.
I had the same problem - 3rd party system sends messages in XML format via TCP but my TCP client application may receive message partially or several messages at once. One of my colleagues proposed very simple and quite generic solution.
The idea is to have a string buffer which should be populated char by char from TCP stream, after each char try to parse buffer content with regular .Net XML parser. If parser throws an exception - continue adding chars to the buffer. Otherwise - message is ready and can be processed by application.
Here is the code:
private object _dataReceiverLock = new object();
private string _messageBuffer;
private Stopwatch _timeSinceLastMessage = new Stopwatch();
private List<string> NormalizeMessage(string rawMsg)
{
lock (_dataReceiverLock)
{
List<string> result = new List<string>();
//following code prevents buffer to store too old information
if (_timeSinceLastMessage.ElapsedMilliseconds > _settings.ResponseTimeout)
{
_messageBuffer = string.Empty;
}
_timeSinceLastMessage.Restart();
foreach (var ch in rawMsg)
{
_messageBuffer += ch;
if (ch == '>')//to avoid extra checks
{
if (IsValidXml(_messageBuffer))
{
result.Add(_messageBuffer);
_messageBuffer = string.Empty;
}
}
}
return result;
}
}
private bool IsValidXml(string xml)
{
try
{
//fastest way to validate XML format correctness
using (XmlTextReader reader = new XmlTextReader(new StringReader(xml)))
{
while (reader.Read()) { }
}
return true;
}
catch
{
return false;
}
}
Few comments:
Need to control lifetime of string buffer, otherwise in case network disconnection old information may stay in the buffer forever
There major problem here is the performance - parsing after every new character is quite slow. So need to add some optimizations, such as parse only after '>' character.
Make sure this method is thread safe, otherwise several threads may flood string buffer with different XML pieces.
The usage is simple:
private void _tcpClient_DataReceived(byte[] data)
{
var rawMsg = Encoding.Unicode.GetString(data);
var normalizedMessages = NormalizeMessage(rawMsg);
foreach (var normalizedMessage in normalizedMessages)
{
//TODO: your logic
}
}

how to parse emails faster than OpenPop.dll

it is possible using OpenPop.dll.
Pop3Client objPOP3Client = new Pop3Client();
int intTotalEmail = 0;
DataTable dtEmail = new DataTable();
object[] objMessageParts;
try
{
dtEmail = GetAllEmailStructure();
if (objPOP3Client.Connected)
objPOP3Client.Disconnect();
objPOP3Client.Connect(strHostName, intPort, bulUseSSL);
try
{
objPOP3Client.Authenticate(strUserName, new Common()._Decode(strPassword));
intTotalEmail = objPOP3Client.GetMessageCount();
AddMapping();
for (int i = 1; i <= intTotalEmail; i++)
{
objMessageParts = GetMessageContent(i, ref objPOP3Client, dtExistMailList);
if (objMessageParts != null && objMessageParts[0].ToString() == "0")
{
AddToDtEmail(objMessageParts, i, dtEmail, dtUserList, dtTicketIDList, dtBlacklistEmails, dtBlacklistSubject, dtBlacklistDomains);
}
}
}
catch (Exception ex)
{
}
}
catch (Exception ex)
{
ParserLogError(ex, "GetAllEmail()");
}
finally
{
if (objPOP3Client.Connected)
objPOP3Client.Disconnect();
}
// function
public object[] GetMessageContent(int intMessageNumber, ref Pop3Client objPOP3Client, DataTable dtExistingMails)
{
object[] strArrMessage = new object[10];
Message objMessage;
MessagePart plainTextPart = null, HTMLTextPart = null;
string strMessageId = "";
try
{
strArrMessage[0] = "";
strArrMessage[1] = "";
strArrMessage[2] = "";
strArrMessage[3] = "";
strArrMessage[4] = "";
strArrMessage[5] = "";
strArrMessage[6] = "";
strArrMessage[7] = null;
strArrMessage[8] = null;
strArrMessage[7] = "";
strArrMessage[8] = "";
objMessage = objPOP3Client.GetMessage(intMessageNumber);
strMessageId = (objMessage.Headers.MessageId == null ? "" : objMessage.Headers.MessageId.Trim());
if (!IsExistMessageID(dtExistingMails, strMessageId)) //check in data base message id is exists or not
{
strArrMessage[0] = "0";
strArrMessage[1] = objMessage.Headers.From.Address.Trim(); // From EMail Address
strArrMessage[2] = objMessage.Headers.From.DisplayName.Trim(); // From EMail Name
strArrMessage[3] = objMessage.Headers.Subject.Trim();// Mail Subject
plainTextPart = objMessage.FindFirstPlainTextVersion();
strArrMessage[4] = (plainTextPart == null ? "" : plainTextPart.GetBodyAsText().Trim());
HTMLTextPart = objMessage.FindFirstHtmlVersion();
strArrMessage[5] = (HTMLTextPart == null ? "" : HTMLTextPart.GetBodyAsText().Trim());
strArrMessage[6] = strMessageId;
List<MessagePart> attachment = objMessage.FindAllAttachments();
strArrMessage[7] = null;
strArrMessage[8] = null;
if (attachment.Count > 0)
{
if (attachment[0] != null && attachment[0].IsAttachment)
{
strArrMessage[7] = attachment[0].FileName.Trim();
strArrMessage[8] = attachment[0];
}
}
}
else
{
strArrMessage[0] = "1";
}
}
catch (Exception ex)
{
ParserLogError(ex, "GetMessageContent()");
}
return strArrMessage;
}
but, i want to make it faster than above OpenPop.dll. so please let me know if any other technique are there for parsing mails.
please check code and then tell me.
Thanks in advance
but, i want to make it faster than above OpenPop.dll. so please let me
know if any other technique are there for parsing mails.
In your GetMessageContent() method, the 1 place that consumes the vast amount of time is:
objMessage = objPOP3Client.GetMessage(intMessageNumber);
The network I/O part of downloading a message cannot really be optimized, but OpenPop.NET's parser is slow (based on my own performance tests).
MimeKit is 25x faster than OpenPop.NET at parsing email messages.
One of the main performance problems in OpenPop.NET's MIME parser is the fact that it uses a StreamReader for parsing (which is slow due to unnecessary charset conversion, reading 1 line at a time, etc - I have an analysis of another email library that uses StreamReader for parsing here: https://stackoverflow.com/a/18787176/87117).
Then there's the problem that OpenPop.NET's parser also uses Regex to remove CFWS (Comments and Folding White Space) from a header string before parsing/decoding it. This is expensive. It's far better to write a good tokenizer that can deal with CFWS.
If you are interested in some of the other techniques I used to optimize MimeKit to be so fast (as fast or faster than highly optimized C implementations), I wrote some blog posts about this:
Optimization Tricks used by MimeKit: Part 1
The summary of the optimization I talk about in part 1 is replacing loops like this that scan for the end of a line:
while (*inptr != (byte) '\n')
inptr++;
with a faster loop, like this:
int* dword = (int*) inptr;
do {
mask = *dword++ ^ 0x0A0A0A0A;
mask = ((mask - 0x01010101) & (~mask & 0x80808080));
} while (mask == 0);
inptr = (byte*) (dword - 1);
while (*inptr != (byte) '\n')
inptr++;
which improved performance by 20% (although on non-x86 architectures, it requires 'dword' to be 4-byte aligned).
Optimization Tricks used by MimeKit: Part 2
In part 2, I talk about writing a more optimized version of System.IO.MemoryStream. The problem with MemoryStream is that it has to keep 1 contiguous block of memory with the content, which means that as you write more data to it and it has to resize its internal byte array, it has to copy the content to the new array (which is expensive, especially once the amount of data in the stream is large).
To work around this performance bottleneck, I wrote a MemoryBlockStream which does not need to use a contiguous block of memory - it uses a linked list of byte arrays. Instead of having to resize the byte array when you overflow the current buffer, it simply allocates another 2048-byte array that the data will overflow into and appends it to the linked list.
Note: MimeKit itself only does email parsing, it doesn't do POP3 or SMTP or IMAP. If you want that kind of functionality, I've also written a library built on MimeKit that does that as well: MailKit
Update:
Sample code using MailKit (as requested) to download/parse all messages:
using System;
using System.Net;
using MailKit.Net.Pop3;
using MailKit;
using MimeKit;
namespace TestClient {
class Program
{
public static void Main (string[] args)
{
using (var client = new Pop3Client ()) {
client.Connect ("pop.gmail.com", 995, true);
// Note: since we don't have an OAuth2 token, disable
// the XOAUTH2 authentication mechanism.
client.AuthenticationMechanisms.Remove ("XOAUTH2");
client.Authenticate ("joey#gmail.com", "password");
int count = client.GetMessageCount ();
for (int i = 0; i < count; i++) {
var message = client.GetMessage (i);
Console.WriteLine ("Subject: {0}", message.Subject);
}
client.Disconnect (true);
}
}
}
}

Finding elements from a memory mapped file in C#

I need to find certain elements within a memory mapped file. I have managed to map the file, however I get some problems finding the elements. My idea was to save all file elements into a list, and then search on that list.
How do I create a function that returns a list with all elements of the mapped file?
// Index indicates the line to read from
public List<string> GetElement(int index) {
}
The way I am mapping the file:
public void MapFile(string path)
{
string mapName = Path.GetFileName(path);
try
{
// Opening existing mmf
if (mapName != null)
{
_mmf = MemoryMappedFile.OpenExisting(mapName);
}
// Setting the pointer at the start of the file
_pointer = 0;
// We create the accessor to read the file
_accessor = _mmf.CreateViewAccessor();
// We mark the file as open
_open = true;
}
catch (Exception ex) {....}
try
{
// Trying to create the mmf
_mmf = MemoryMappedFile.CreateFromFile(path);
// Setting the pointer at the start of the file
_pointer = 0;
// We create the accessor to read the file
_accessor = _mmf.CreateViewAccessor();
// We mark the file as open
_open = true;
}
catch (Exception exInner){..}
}
The file that I am mapping is a UTF-8 ASCII file. Nothing weird.
What I have done:
var list = new List<string>();
// String to store what we read
string trace = string.Empty;
// We read the byte of the pointer
b = _accessor.ReadByte(_pointer);
int tracei = 0;
var traceb = new byte[2048];
// If b is different from 0 we have some data to read
if (b != 0)
{
while (b != 0)
{
// Check if it's an endline
if (b == '\n')
{
trace = Encoding.UTF8.GetString(traceb, 0, tracei - 1);
list.Add(trace);
trace = string.Empty;
tracei = 0;
_lastIndex++;
}
else
{
traceb[tracei++] = b;
}
// Advance and read
b = _accessor.ReadByte(++_pointer);
}
}
The code is difficult to read for humans and is not very efficient. How can I improve it?
You are re-inventing StreamReader, it does exactly what you do. The odds that you really want a memory-mapped file are quite low, they take a lot of virtual memory which you only can make pay off if you repeatedly read the same file at different offsets. Which is very unlikely, text files must be read sequentially since you don't know how long the lines are.
Which makes this one line of code the probable best replacement for what you posted:
string[] trace = System.IO.File.ReadAllLines(path);

Categories

Resources