XDocument Text Node New Line - c#

I'm trying to get a newline into a text node using XText from the Linq XML namespace.
I have a string that contains newline characters, but I need to work out how to convert these to character entities (i.e. &#xA;) rather than having them appear in the XML as literal new lines.
XElement element = new XElement( "NodeName" );
...
string example = "This is a string\nWith new lines in it\n";
element.Add( new XText( example ) );
The XElement is then written out using an XmlTextWriter which results in the file containing the newline rather than an entity replacement.
Has anyone come across this problem and found a solution?
EDIT:
The problem manifests itself when I load the XML into Excel, which doesn't seem to like the newline character but accepts the entity replacement. The result is that newlines don't show in Excel unless I replace them with &#xA;.
Nick.

Cheating:
XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;
settings.CheckCharacters = false;
settings.NewLineChars = "&#xA;";
XmlWriter writer = XmlWriter.Create(..., settings);
element.WriteTo(writer);
writer.Flush();
UPDATE:
Complete program
using System;
using System.Xml;
using System.Xml.Linq;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
XElement element = new XElement( "NodeName" );
string example = "This is a string\nWith new lines in it\n";
element.Add( new XText( example ) );
XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;
settings.CheckCharacters = false;
settings.NewLineChars = "&#xA;";
XmlWriter writer = XmlWriter.Create(Console.Out, settings);
element.WriteTo(writer);
writer.Flush();
}
}
}
OUTPUT:
C:\Users\...\\ConsoleApplication1\bin\Release>ConsoleApplication1.exe
<?xml version="1.0" encoding="ibm850"?>
<NodeName>This is a string&#xA;With new lines in it&#xA;</NodeName>

To any standard XML parser there is no difference between the entity &#xA; and a new line character, as they are one and the same thing.
The following code illustrates this:
string s1 = "<root>Test&#xA;Test2</root>";
string s2 = "<root>Test\nTest2</root>";
XDocument doc1 = XDocument.Parse(s1);
XDocument doc2 = XDocument.Parse(s2);
Console.WriteLine(doc1.ToString());
Console.WriteLine(doc2.ToString());

It's the XmlTextWriter which is responsible for outputting escaped entities. So if you do this, for example:
using (XmlTextWriter w = new XmlTextWriter("test.xml", Encoding.UTF8))
{
w.WriteString("&#10;");
}
You will get an escaped ampersand in test.xml (&amp;#10;), which you don't want. You would like to keep the &#10; sequence raw, as is.
The solution I propose is to create a new StreamWriter implementation capable of detecting an escaped string like "&amp;#10;":
// A StreamWriter that does not escape &#10; characters
public class NonXmlEscapingStreamWriter : StreamWriter
{
private const string AmpToken = "amp";
private int _bufferState = 0; // used to keep state
// add other ctors overloads if needed
public NonXmlEscapingStreamWriter(string path)
: base(path)
{
}
// NOTE: this code is based on the assumption that StreamWriter
// only overrides these 4 Write functions, which is true today but could change in the future,
// and also on the assumption that XmlTextWriter writes escaped values in a specific sequence of WriteXX calls
public override void Write(char value)
{
if (value == '&')
{
if (_bufferState == 0)
{
_bufferState++;
return; // hold it
}
else
{
_bufferState = 0;
}
}
else if (value == ';')
{
if (_bufferState > 1)
{
_bufferState++;
return;
}
else
{
Write('&'); // release what's been held
Write(AmpToken);
_bufferState = 0;
}
}
else if (value == '\n') // detect non escaped \n
{
base.Write("&#10;");
return;
}
base.Write(value);
}
public override void Write(string value)
{
if (_bufferState > 0)
{
if (value == AmpToken)
{
_bufferState++;
return; // hold it
}
else
{
Write('&'); // release what's been held
_bufferState = 0;
}
}
base.Write(value);
}
public override void Write(char[] buffer, int index, int count)
{
if (_bufferState > 2)
{
_bufferState = 0;
base.Write('&'); // release this anyway
string replace;
if ((buffer != null) && ((replace = GetReplaceLength(buffer, index, count)) != null))
{
base.Write(replace);
base.Write(buffer, index + replace.Length, count - replace.Length);
return;
}
else
{
base.Write(AmpToken); // release this
base.Write(';'); // release this
}
}
base.Write(buffer, index, count);
}
public override void Write(char[] buffer)
{
Write(buffer, 0, buffer != null ? buffer.Length : 0);
}
private string GetReplaceLength(char[] buffer, int index, int count)
{
// this is specific to the 10 character but could be adapted
const string token = "#10;";
if (count < token.Length) // not enough characters left to hold the token
return null;
// we test the char array to avoid string allocations
for(int i = 0; i < token.Length; i++)
{
if (buffer[index + i] != token[i])
return null;
}
return token;
}
}
And you can use it like this:
using (XmlTextWriter w = new XmlTextWriter(new NonXmlEscapingStreamWriter("test.xml")))
{
element.WriteTo(w);
}
NOTE: Although it is capable of detecting lone \n characters, I suggest you ensure all \n are actually escaped in your original text; that is, replace \n with &#10; before you output the XML, like this:
string example = "This is a string&#10;With new lines in it&#10;";

Related

How To Go Back To Previous Line In .csv? [duplicate]

I'm trying to figure out how to either record which line I'm on (for example, line = 32), so that I can just do line-- in the "Previous Record" button event, or find a better alternative.
I currently have my form set up and working: if I click the "Next Record" button, the file advances to the next line and displays the cells correctly in their associated textboxes. But how do I create a button that goes to the previous line in the .csv file?
StreamReader csvFile;
public GP_Appointment_Manager()
{
InitializeComponent();
}
private void buttonOpenFile_Click(object sender, EventArgs e)
{
try
{
csvFile = new StreamReader("patients_100.csv");
// Read First line and do nothing
string line;
if (ReadPatientLineFromCSV(out line))
{
// Read second line, first patient line and populate form
ReadPatientLineFromCSV(out line);
PopulateForm(line);
}
}
catch (Exception ex)
{
MessageBox.Show(ex.Message);
}
}
private bool ReadPatientLineFromCSV(out string line)
{
bool result = false;
line = "";
if ((csvFile != null) && (!csvFile.EndOfStream))
{
line = csvFile.ReadLine();
result = true;
}
else
{
MessageBox.Show("File has not been opened. Please open file before reading.");
}
return result;
}
private void PopulateForm(string patientDetails)
{
string[] patient = patientDetails.Split(',');
//Populates ID
textBoxID.Text = patient[0];
//Populates Personal
comboBoxSex.SelectedIndex = (patient[1] == "M") ? 0 : 1;
dateTimePickerDOB.Value = DateTime.Parse(patient[2]);
textBoxFirstName.Text = patient[3];
textBoxLastName.Text = patient[4];
//Populates Address
textboxAddress.Text = patient[5];
textboxCity.Text = patient[6];
textboxCounty.Text = patient[7];
textboxTelephone.Text = patient[8];
//Populates Kin
textboxNextOfKin.Text = patient[9];
textboxKinTelephone.Text = patient[10];
}
Here's the code for the "Next Record" Button
private void buttonNextRecord_Click(object sender, EventArgs e)
{
string patientInfo;
if (ReadPatientLineFromCSV(out patientInfo))
{
PopulateForm(patientInfo);
}
}
Now, this is something of an exercise. This class uses the standard StreamReader, with a couple of modifications, to implement simple move-forward/step-back functionality.
It also allows you to associate an array/list of Controls with the data read from a CSV-like file. Note that this is not a general-purpose CSV reader; it just splits a string into parts, using a separator that can be specified when calling its AssociateControls() method.
The class has 3 constructors:
(1) public LineReader(string filePath)
The source file has no Header in the first line and the text Encoding should be auto-detected.
(2) public LineReader(string filePath, bool hasHeader)
Same, but the first line of the file contains the Header if hasHeader = true.
(3) public LineReader(string filePath, bool hasHeader, Encoding encoding)
Used to specify an Encoding, if the automatic discovery cannot identify it correctly.
The positions of the lines of text are stored in a Dictionary<long, long>, where the Key is the line number and Value is the starting position of the line.
This has some advantages: no strings are stored anywhere, the file is indexed while reading it but you could use a background task to complete the indexing (this feature is not implemented here, maybe later...).
The disadvantage is that the Dictionary takes space in memory. If the file is very large (only the number of lines counts, though), it may become a problem. This remains to be tested.
A note about the Encoding:
The text encoding auto-detection is reliable enough only if the Encoding is not set to the default one (UTF-8). The code here, if you don't specify an Encoding, sets it to Encoding.ASCII. When the first line is read, the automatic feature tries to determine the actual encoding. It usually gets it right.
In the default StreamReader implementation, if we specify Encoding.UTF8 (or none, which is the same) and the text encoding is ASCII, the encoder will use the default (Encoding.UTF8) encoding, since UTF-8 maps to ASCII gracefully.
However, when this is the case, [Encoding].GetPreamble() will return the UTF-8 BOM (3 bytes), compromising the calculation of the current position in the underlying stream.
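The preamble behaviour described above can be checked directly. This is a minimal sketch; the class and method names are hypothetical and not part of the LineReader class. It only shows which encodings report a byte-order mark through GetPreamble(), regardless of whether the file actually starts with one.

```csharp
using System.Text;

public static class PreambleSketch
{
    // Number of BOM bytes the encoding object reports.
    // This says nothing about whether a given file really contains a BOM.
    public static int PreambleLength(Encoding encoding)
    {
        return encoding.GetPreamble().Length;
    }
}
```

PreambleLength(Encoding.UTF8) is 3 and PreambleLength(Encoding.ASCII) is 0, which is why a plain ASCII file read as UTF-8 would shift the computed positions by three bytes. new UTF8Encoding(false) also reports 0.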
To associate controls with the data read, you just need to pass a collection of controls to the LineReader.AssociateControls() method.
This will map each control to the data field in the same position.
To skip a data field, specify null instead of a control reference.
The visual example is built using a CSV file with this structure:
(Note: this data is generated using an automated on-line tool)
seq;firstname;lastname;age;street;city;state;zip;deposit;color;date
---------------------------------------------------------------------------
1;Harriett;Gibbs;62;Segmi Center;Ebanavi;ID;57854;$4444.78;WHITE;05/15/1914
2;Oscar;McDaniel;49;Kulak Drive;Jetagoz;IL;57631;$5813.94;RED;02/11/1918
3;Winifred;Olson;29;Wahab Mill;Ucocivo;NC;46073;$2002.70;RED;08/11/2008
I skipped the seq and color fields, passing this array of Controls:
LineReader lineReader = null;
private void btnOpenFile_Click(object sender, EventArgs e)
{
string filePath = Path.Combine(Application.StartupPath, @"sample.csv");
lineReader = new LineReader(filePath, true);
string header = lineReader.HeaderLine;
Control[] controls = new[] {
null, textBox1, textBox2, textBox3, textBox4, textBox5,
textBox6, textBox9, textBox7, null, textBox8 };
lineReader.AssociateControls(controls, ";");
}
The null entries correspond to the data fields that are not considered.
The LineReader class:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Windows.Forms;
class LineReader : IDisposable
{
private StreamReader reader = null;
private Dictionary<long, long> positions;
private string m_filePath = string.Empty;
private Encoding m_encoding = null;
private IEnumerable<Control> m_controls = null;
private string m_separator = string.Empty;
private bool m_associate = false;
private long m_currentPosition = 0;
private bool m_hasHeader = false;
public LineReader(string filePath) : this(filePath, false) { }
public LineReader(string filePath, bool hasHeader) : this(filePath, hasHeader, Encoding.ASCII) { }
public LineReader(string filePath, bool hasHeader, Encoding encoding)
{
if (!File.Exists(filePath)) {
throw new FileNotFoundException($"The file specified: {filePath} was not found");
}
this.m_filePath = filePath;
m_hasHeader = hasHeader;
CurrentLineNumber = 0;
reader = new StreamReader(this.m_filePath, encoding, true);
CurrentLine = reader.ReadLine();
m_encoding = reader.CurrentEncoding;
m_currentPosition = m_encoding.GetPreamble().Length;
positions = new Dictionary<long, long>() { [0]= m_currentPosition };
if (hasHeader) { this.HeaderLine = CurrentLine = this.MoveNext(); }
}
public string HeaderLine { get; private set; }
public string CurrentLine { get; private set; }
public long CurrentLineNumber { get; private set; }
public string MoveNext()
{
string read = reader.ReadLine();
if (string.IsNullOrEmpty(read)) return this.CurrentLine;
CurrentLineNumber += 1;
if ((positions.Count - 1) < CurrentLineNumber) {
AdjustPositionToLineFeed();
positions.Add(CurrentLineNumber, m_currentPosition);
}
else {
m_currentPosition = positions[CurrentLineNumber];
}
this.CurrentLine = read;
if (m_associate) this.Associate();
return read;
}
public string MovePrevious()
{
if (CurrentLineNumber == 0 || (CurrentLineNumber == 1 && m_hasHeader)) return this.CurrentLine;
CurrentLineNumber -= 1;
m_currentPosition = positions[CurrentLineNumber];
reader.BaseStream.Position = m_currentPosition;
reader.DiscardBufferedData();
this.CurrentLine = reader.ReadLine();
if (m_associate) this.Associate();
return this.CurrentLine;
}
private void AdjustPositionToLineFeed()
{
long linePos = m_currentPosition + m_encoding.GetByteCount(this.CurrentLine);
long prevPos = reader.BaseStream.Position;
reader.BaseStream.Position = linePos;
byte[] buffer = new byte[4];
reader.BaseStream.Read(buffer, 0, buffer.Length);
char[] chars = m_encoding.GetChars(buffer).Where(c => c.Equals((char)10) || c.Equals((char)13)).ToArray();
m_currentPosition = linePos + m_encoding.GetByteCount(chars);
reader.BaseStream.Position = prevPos;
}
public void AssociateControls(IEnumerable<Control> controls, string separator)
{
m_controls = controls;
m_separator = separator;
m_associate = true;
if (!string.IsNullOrEmpty(this.CurrentLine)) Associate();
}
private void Associate()
{
string[] values = this.CurrentLine.Split(new[] { m_separator }, StringSplitOptions.None);
int associate = 0;
m_controls.ToList().ForEach(c => {
if (c != null) c.Text = values[associate];
associate += 1;
});
}
public override string ToString() =>
$"File Path: {m_filePath} Encoding: {m_encoding.BodyName} CodePage: {m_encoding.CodePage}";
public void Dispose()
{
this.Dispose(true);
GC.SuppressFinalize(this);
}
protected virtual void Dispose(bool disposing)
{
if (disposing) { reader?.Dispose(); }
}
}
The general approach is as follows.
Add a text file input.txt like this:
line 1
line 2
line 3
and set its Copy to Output Directory property to Copy if newer.
Create extension methods for StreamReader:
public static class StreamReaderExtensions
{
public static bool TryReadNextLine(this StreamReader reader, out string line)
{
var isAvailable = reader != null &&
!reader.EndOfStream;
line = isAvailable ? reader.ReadLine() : null;
return isAvailable;
}
public static bool TryReadPrevLine(this StreamReader reader, out string line)
{
// Check for null before touching the reader.
if (reader == null)
{
line = null;
return false;
}
var stream = reader.BaseStream;
var encoding = reader.CurrentEncoding;
var bom = GetBOM(encoding);
// Nothing precedes the start of the stream.
if (stream.Position <= 0)
{
line = null;
return false;
}
var buffer = new List<byte>();
var str = string.Empty;
stream.Position++;
while (!str.StartsWith(Environment.NewLine))
{
stream.Position -= 2;
buffer.Insert(0, (byte)stream.ReadByte());
var reachedBOM = buffer.Take(bom.Length).SequenceEqual(bom);
if (reachedBOM)
buffer = buffer.Skip(bom.Length).ToList();
str = encoding.GetString(buffer.ToArray());
if (reachedBOM)
break;
}
stream.Position--;
line = str.Trim(Environment.NewLine.ToArray());
return true;
}
private static byte[] GetBOM(Encoding encoding)
{
if (encoding.Equals(Encoding.UTF7))
return new byte[] { 0x2b, 0x2f, 0x76 };
if (encoding.Equals(Encoding.UTF8))
return new byte[] { 0xef, 0xbb, 0xbf };
if (encoding.Equals(Encoding.Unicode))
return new byte[] { 0xff, 0xfe };
if (encoding.Equals(Encoding.BigEndianUnicode))
return new byte[] { 0xfe, 0xff };
if (encoding.Equals(Encoding.UTF32))
return new byte[] { 0xff, 0xfe, 0, 0 }; // Encoding.UTF32 is little-endian
return new byte[0];
}
}
And use it like this:
using (var reader = new StreamReader("input.txt"))
{
string na = "N/A";
string line;
for (var i = 0; i < 4; i++)
{
var isAvailable = reader.TryReadNextLine(out line);
Console.WriteLine($"Next line available: {isAvailable}. Line: {(isAvailable ? line : na)}");
}
for (var i = 0; i < 4; i++)
{
var isAvailable = reader.TryReadPrevLine(out line);
Console.WriteLine($"Prev line available: {isAvailable}. Line: {(isAvailable ? line : na)}");
}
}
The result is:
Next line available: True. Line: line 1
Next line available: True. Line: line 2
Next line available: True. Line: line 3
Next line available: False. Line: N/A
Prev line available: True. Line: line 3
Prev line available: True. Line: line 2
Prev line available: True. Line: line 1
Prev line available: False. Line: N/A
GetBOM is based on this.

error in XML document. Unexpected XML declaration. XML declaration must be the first node in the document

There is an error in XML document (8, 20). Inner 1: Unexpected XML declaration. The XML declaration must be the first node in the document, and no white space characters are allowed to appear before it.
OK, I understand this error.
How I get it, however, is what perplexes me.
I create the document with Microsoft's Serialize tool. Then, I turn around and attempt to read it back, again, using Microsoft's Deserialize tool.
I am not in control of writing the XML file in the correct format - that I can see.
Here is the single routine I use to read and write.
private string xmlPath = System.Web.Hosting.HostingEnvironment.MapPath(WebConfigurationManager.AppSettings["DATA_XML"]);
private object objLock = new Object();
public string ErrorMessage { get; set; }
public StoredMsgs Operation(string from, string message, FileAccess access) {
StoredMsgs list = null;
lock (objLock) {
ErrorMessage = null;
try {
if (!File.Exists(xmlPath)) {
var root = new XmlRootAttribute(rootName);
var serializer = new XmlSerializer(typeof(StoredMsgs), root);
if (String.IsNullOrEmpty(message)) {
from = "Code Window";
message = "Created File";
}
var item = new StoredMsg() {
From = from,
Date = DateTime.Now.ToString("s"),
Message = message
};
using (var stream = File.Create(xmlPath)) {
list = new StoredMsgs();
list.Add(item);
serializer.Serialize(stream, list);
}
} else {
var root = new XmlRootAttribute("MessageHistory");
var serializer = new XmlSerializer(typeof(StoredMsgs), root);
var item = new StoredMsg() {
From = from,
Date = DateTime.Now.ToString("s"),
Message = message
};
using (var stream = File.Open(xmlPath, FileMode.Open, FileAccess.ReadWrite)) {
list = (StoredMsgs)serializer.Deserialize(stream);
if ((access == FileAccess.ReadWrite) || (access == FileAccess.Write)) {
list.Add(item);
serializer.Serialize(stream, list);
}
}
}
} catch (Exception error) {
var sb = new StringBuilder();
int index = 0;
sb.AppendLine(String.Format("Top Level Error: <b>{0}</b>", error.Message));
var err = error.InnerException;
while (err != null) {
index++;
sb.AppendLine(String.Format("\tInner {0}: {1}", index, err.Message));
err = err.InnerException;
}
ErrorMessage = sb.ToString();
}
}
return list;
}
Is something wrong with my routine? If Microsoft writes the file, it seems to me that it should be able to read it back.
It should be generic enough for anyone to use.
Here is my StoredMsg class:
[Serializable()]
[XmlType("StoredMessage")]
public class StoredMessage {
public StoredMessage() {
}
[XmlElement("From")]
public string From { get; set; }
[XmlElement("Date")]
public string Date { get; set; }
[XmlElement("Message")]
public string Message { get; set; }
}
[Serializable()]
[XmlRoot("MessageHistory")]
public class MessageHistory : List<StoredMessage> {
}
The file it generates doesn't look to me like it has any issues.
I saw the solution here:
Error: The XML declaration must be the first node in the document
But, in that case, it seems someone already had an XML document they wanted to read. They just had to fix it.
I have an XML document created by Microsoft, so it should be read back in by Microsoft.
The problem is that you are adding to the file. You deserialize, then re-serialize to the same stream without rewinding and resizing to zero. This gives you multiple root elements:
<?xml version="1.0"?>
<StoredMessage>
</StoredMessage>
<?xml version="1.0"?>
<StoredMessage>
</StoredMessage>
Multiple root elements, and multiple XML declarations, are invalid according to the XML standard, thus the .NET XML parser throws an exception in this situation by default.
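The default behaviour is easy to reproduce. This is a small sketch with a hypothetical helper name, which reports whether a string parses as a single well-formed document.

```csharp
using System.Xml;
using System.Xml.Linq;

public static class MultipleRootsSketch
{
    // Returns true if the text parses as one well-formed XML document.
    public static bool IsSingleDocument(string xml)
    {
        try
        {
            XDocument.Parse(xml);
            return true;
        }
        catch (XmlException)
        {
            // Thrown for a second root element or a misplaced XML declaration.
            return false;
        }
    }
}
```

IsSingleDocument("<a />") returns true, while IsSingleDocument("<a /><a />") returns false, as does a string in which a second XML declaration follows the first root element.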
For possible solutions, see XML Error: There are multiple root elements, which suggests you either:
1. Enclose your list of StoredMessage elements in some synthetic outer element, e.g. StoredMessageList.
This would require you to load the list of messages from the file, add the new message, and then truncate the file and re-serialize the entire list when adding a single item. Thus the performance may be worse than in your current approach, but the XML will be valid.
2. When deserializing a file containing concatenated root elements, create an XML reader using XmlReaderSettings.ConformanceLevel = ConformanceLevel.Fragment and iteratively walk through the concatenated root node(s) and deserialize each one individually as shown, e.g., here. Using ConformanceLevel.Fragment allows the reader to parse streams with multiple root elements (although multiple XML declarations will still cause an error to be thrown).
Later, when adding a new element to the end of the file using XmlSerializer, seek to the end of the file and serialize using an XML writer returned from XmlWriter.Create(TextWriter, XmlWriterSettings) with XmlWriterSettings.OmitXmlDeclaration = true. This prevents output of multiple XML declarations as explained here.
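The first option (load, append, truncate, re-serialize) could be sketched roughly as follows. This is an illustration under assumptions: List<string> stands in for the message list, and the class and method names are hypothetical. The essential step is rewinding and truncating the stream before writing the whole list back.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

public static class RewriteInPlaceSketch
{
    // Loads the list stored in the stream (if any), appends one item,
    // then rewrites the stream so that it holds exactly one XML document.
    public static List<string> Append(Stream stream, string item)
    {
        var serializer = new XmlSerializer(typeof(List<string>));
        var list = stream.Length == 0
            ? new List<string>()
            : (List<string>)serializer.Deserialize(stream);
        list.Add(item);

        stream.Position = 0;   // rewind to the start
        stream.SetLength(0);   // discard the old document
        serializer.Serialize(stream, list);
        stream.Position = 0;   // leave the stream ready for the next read
        return list;
    }
}
```

The same method should work on a FileStream opened with FileAccess.ReadWrite; the cost is that every append rewrites the entire file.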
For option #2, your Operation would look something like the following:
private string xmlPath = System.Web.Hosting.HostingEnvironment.MapPath(WebConfigurationManager.AppSettings["DATA_XML"]);
private object objLock = new Object();
public string ErrorMessage { get; set; }
const string rootName = "MessageHistory";
static readonly XmlSerializer serializer = new XmlSerializer(typeof(StoredMessage), new XmlRootAttribute(rootName));
public MessageHistory Operation(string from, string message, FileAccess access)
{
var list = new MessageHistory();
lock (objLock)
{
ErrorMessage = null;
try
{
using (var file = File.Open(xmlPath, FileMode.OpenOrCreate))
{
list.AddRange(XmlSerializerHelper.ReadObjects<StoredMessage>(file, false, serializer));
if (list.Count == 0 && String.IsNullOrEmpty(message))
{
from = "Code Window";
message = "Created File";
}
var item = new StoredMessage()
{
From = from,
Date = DateTime.Now.ToString("s"),
Message = message
};
if ((access == FileAccess.ReadWrite) || (access == FileAccess.Write))
{
file.Seek(0, SeekOrigin.End);
var writerSettings = new XmlWriterSettings
{
OmitXmlDeclaration = true,
Indent = true, // Optional; remove if compact XML is desired.
};
using (var textWriter = new StreamWriter(file))
{
if (list.Count > 0)
textWriter.WriteLine();
using (var xmlWriter = XmlWriter.Create(textWriter, writerSettings))
{
serializer.Serialize(xmlWriter, item);
}
}
}
list.Add(item);
}
}
catch (Exception error)
{
var sb = new StringBuilder();
int index = 0;
sb.AppendLine(String.Format("Top Level Error: <b>{0}</b>", error.Message));
var err = error.InnerException;
while (err != null)
{
index++;
sb.AppendLine(String.Format("\tInner {0}: {1}", index, err.Message));
err = err.InnerException;
}
ErrorMessage = sb.ToString();
}
}
return list;
}
Using the following helper method adapted from Read nodes of a xml file in C#:
public partial class XmlSerializerHelper
{
public static List<T> ReadObjects<T>(Stream stream, bool closeInput = true, XmlSerializer serializer = null)
{
var list = new List<T>();
serializer = serializer ?? new XmlSerializer(typeof(T));
var settings = new XmlReaderSettings
{
ConformanceLevel = ConformanceLevel.Fragment,
CloseInput = closeInput,
};
using (var xmlTextReader = XmlReader.Create(stream, settings))
{
while (xmlTextReader.Read())
{ // Skip whitespace
if (xmlTextReader.NodeType == XmlNodeType.Element)
{
using (var subReader = xmlTextReader.ReadSubtree())
{
var logEvent = (T)serializer.Deserialize(subReader);
list.Add(logEvent);
}
}
}
}
return list;
}
}
Note that if you are going to create an XmlSerializer using a custom XmlRootAttribute, you must cache the serializer to avoid a memory leak.
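The static readonly field in the code above is one way to do that. When the type or root name varies at run time, a small keyed cache is another. This is a sketch with hypothetical names, not an API from the framework.

```csharp
using System;
using System.Collections.Concurrent;
using System.Xml.Serialization;

public static class XmlSerializerCache
{
    private static readonly ConcurrentDictionary<Tuple<Type, string>, XmlSerializer> Cache =
        new ConcurrentDictionary<Tuple<Type, string>, XmlSerializer>();

    // Returns one shared serializer per (type, root name) pair.
    // Under a race the factory may run more than once, but only one instance is kept.
    public static XmlSerializer Get(Type type, string rootName)
    {
        return Cache.GetOrAdd(
            Tuple.Create(type, rootName),
            key => new XmlSerializer(key.Item1, new XmlRootAttribute(key.Item2)));
    }
}
```

Repeated calls such as XmlSerializerCache.Get(typeof(StoredMessage), "MessageHistory") then return the same instance instead of building a new serializer each time.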

Xml over tcp without message frame

I have to implement a TCP connection over which raw XML data is passed.
Unfortunately there is no message framing. I know this is really bad, but I have to deal with it...
The Message would look like this:
<?xml version="1.0" encoding="utf-8"?>
<DATA></DATA>
or this
<?xml version="1.0" encoding="utf-8"?>
<DATA />
Now I have to receive messages that could have self-closing tags. The message always has the same shape: an XML declaration and a DATA tag whose inner XML is the message content.
If there were no self-closing tags, this would be easy, but how can I read both?
By the way, I am using the TcpListener.
Edit :
Everything is fine if there is no self-closing tag.
if (_clientSocket != null)
{
NetworkStream networkStream = _clientSocket.GetStream();
_clientSocket.ReceiveTimeout = 100; // 100 milliseconds
while (_continueProcess)
{
if (networkStream.DataAvailable)
{
bool isMessageComplete = false;
String messageString = String.Empty;
while (!isMessageComplete)
{
var bytes = new byte[_clientSocket.ReceiveBufferSize];
try
{
int bytesReaded = networkStream.Read(bytes, 0, (int) _clientSocket.ReceiveBufferSize);
if (bytesReaded > 0)
{
var data = Encoding.UTF8.GetString(bytes, 0, bytesReaded);
messageString += data;
if (messageString.IndexOf("<DATA", StringComparison.OrdinalIgnoreCase) > 0 &&
messageString.IndexOf("</DATA", StringComparison.OrdinalIgnoreCase) > 0)
{
isMessageComplete = true;
}
}
}
catch (IOException)
{
// Timeout
}
catch (SocketException)
{
Console.WriteLine("Connection is broken!");
break;
}
}
}
Thread.Sleep(200);
} // while ( _continueProcess )
networkStream.Close();
_clientSocket.Close();
}
Edit 2 (30.03.2015 12:00)
Unfortunately it is not possible to use some kind of message frame.
So I ended up using this code (DATA is my root node):
if (_clientSocket != null)
{
NetworkStream networkStream = _clientSocket.GetStream();
_clientSocket.ReceiveTimeout = 100;
string data = string.Empty;
while (_continueProcess)
{
try
{
if (networkStream.DataAvailable)
{
Stopwatch sw = new Stopwatch();
sw.Start();
var bytes = new byte[_clientSocket.ReceiveBufferSize];
int completeXmlLength = 0;
int bytesReaded = networkStream.Read(bytes, 0, (int) _clientSocket.ReceiveBufferSize);
if (bytesReaded > 0)
{
message.AddRange(bytes);
data += Encoding.UTF8.GetString(bytes, 0, bytesReaded);
if (data.IndexOf("<?", StringComparison.Ordinal) == 0)
{
if (data.IndexOf("<DATA", StringComparison.Ordinal) > 0)
{
Int32 rootStartPos = data.IndexOf("<DATA", StringComparison.Ordinal);
completeXmlLength += rootStartPos;
var root = data.Substring(rootStartPos);
int rootCloseTagPos = root.IndexOf(">", StringComparison.Ordinal);
Int32 rootSelfClosedTagPos = root.IndexOf("/>", StringComparison.Ordinal);
// If there is an empty tag that is self closed.
if (rootSelfClosedTagPos > 0)
{
string rootTag = root.Substring(0, rootSelfClosedTagPos +1);
// If there is no '>' between the self closed tag and the start of '<DATA'
// the root element is empty.
if (rootTag.IndexOf(">", StringComparison.Ordinal) <= 0)
{
completeXmlLength += rootSelfClosedTagPos;
string messageXmlString = data.Substring(0, completeXmlLength + 1);
data = data.Substring(messageXmlString.Length);
try
{
// parse complete xml.
XDocument xmlDocument = XDocument.Parse(messageXmlString);
}
catch(Exception)
{
// Invalid Xml.
}
continue;
}
}
if (rootCloseTagPos > 0)
{
Int32 rootEndTagStartPos = root.IndexOf("</DATA", StringComparison.Ordinal);
if (rootEndTagStartPos > 0)
{
var endTagString = root.Substring(rootEndTagStartPos);
completeXmlLength += rootEndTagStartPos;
Int32 completeEndPos = endTagString.IndexOf(">", StringComparison.Ordinal);
if (completeEndPos > 0)
{
completeXmlLength += completeEndPos;
string messageXmlString = data.Substring(0, completeXmlLength + 1);
data = data.Substring(messageXmlString.Length);
try
{
// parse complete xml.
XDocument xmlDocument = XDocument.Parse(messageXmlString);
}
catch(Exception)
{
// Invalid Xml.
}
}
}
}
}
}
}
sw.Stop();
string timeElapsed = sw.Elapsed.ToString();
}
}
catch (IOException)
{
data = String.Empty;
}
catch (SocketException)
{
Console.WriteLine("Connection is broken!");
break;
}
}
This is the code I would use if there were some kind of message framing, in this case a 4-byte message length:
if (_clientSocket != null)
{
NetworkStream networkStream = _clientSocket.GetStream();
_clientSocket.ReceiveTimeout = 100;
string data = string.Empty;
while (_continueProcess)
{
try
{
if (networkStream.DataAvailable)
{
Stopwatch sw = new Stopwatch();
sw.Start();
var lengthBytes = new byte[sizeof (Int32)];
int bytesReaded = networkStream.Read(lengthBytes, 0, sizeof (Int32) - offset);
if (bytesReaded > 0)
{
offset += bytesReaded;
message.AddRange(lengthBytes.Take(bytesReaded));
}
if (offset < sizeof (Int32))
{
continue;
}
Int32 length = BitConverter.ToInt32(message.Take(sizeof(Int32)).ToArray(), 0);
message.Clear();
while (length > 0)
{
Int32 bytesToRead = length < _clientSocket.ReceiveBufferSize ? length : _clientSocket.ReceiveBufferSize;
byte[] messageBytes = new byte[bytesToRead];
bytesReaded = networkStream.Read(messageBytes, 0, bytesToRead);
length = length - bytesReaded;
message.AddRange(messageBytes.Take(bytesReaded)); // keep only the bytes actually read
}
try
{
string xml = Encoding.UTF8.GetString(message.ToArray());
XDocument xDocument = XDocument.Parse(xml);
}
catch (Exception ex)
{
// Invalid Xml.
}
sw.Stop();
string timeElapsed = sw.Elapsed.ToString();
}
}
catch (IOException)
{
data = String.Empty;
}
catch (SocketException)
{
Console.WriteLine("Connection is broken!");
break;
}
}
As you can see, I wanted to measure the elapsed time to see which method performs better. The strange thing is that the method without message framing has an average time of 0.2290 ms, while the other method has an average time of 1.2253 ms.
Can someone explain why? I thought the one without message framing would be slower...
Hand the NetworkStream to the .NET XML infrastructure. For example create an XmlReader from the NetworkStream.
Unfortunately I did not find a built-in way to easily create an XmlDocument from an XmlReader that has multiple documents in it. It complains about multiple root elements (which is correct). You would need to wrap the XmlReader and make it stop returning nodes when the first document is done. You can do that by keeping track of some state and by looking at the nesting level. When the nesting level is zero again the first document is done.
This is just a raw sketch. I'm pretty sure this will work and it handles all possible XML documents.
No need for this horrible string processing code that you have there. The existing code looks quite slow as well but since this approach is much better it serves no purpose to comment on the perf issues. You need to throw this away.
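The reader-based idea in this answer could be sketched as follows. This is only an illustration under assumptions: the names are hypothetical, a TextReader stands in for the network stream, and it relies on ConformanceLevel.Fragment plus ReadSubtree() to stop at the end of each top-level element rather than on a hand-written wrapper. It also assumes the messages arrive without repeated XML declarations, which is not true of the messages in the question as posted; a repeated declaration would still make the reader throw.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class XmlStreamSplitter
{
    // Yields every top-level element in the input as its own XElement.
    public static IEnumerable<XElement> ReadTopLevelElements(TextReader input)
    {
        var settings = new XmlReaderSettings
        {
            ConformanceLevel = ConformanceLevel.Fragment, // allow several root elements
        };
        using (var reader = XmlReader.Create(input, settings))
        {
            while (reader.Read())
            {
                if (reader.NodeType != XmlNodeType.Element)
                    continue; // skip whitespace and comments between messages

                // The subtree reader ends when the nesting level returns to zero.
                using (var subtree = reader.ReadSubtree())
                {
                    yield return XElement.Load(subtree);
                }
            }
        }
    }
}
```

Because the element boundaries come from the parser, `<DATA></DATA>` and `<DATA />` are handled the same way. One reader would have to be kept for the lifetime of the connection, since it buffers ahead of the element it is currently returning.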
I had the same problem: a third-party system sends messages in XML format via TCP, but my TCP client application may receive a message partially, or several messages at once. One of my colleagues proposed a very simple and quite generic solution.
The idea is to have a string buffer that is populated character by character from the TCP stream; after each character, try to parse the buffer content with the regular .NET XML parser. If the parser throws an exception, continue adding characters to the buffer. Otherwise, the message is ready and can be processed by the application.
Here is the code:
private object _dataReceiverLock = new object();
private string _messageBuffer;
private Stopwatch _timeSinceLastMessage = new Stopwatch();
private List<string> NormalizeMessage(string rawMsg)
{
lock (_dataReceiverLock)
{
List<string> result = new List<string>();
//following code prevents buffer to store too old information
if (_timeSinceLastMessage.ElapsedMilliseconds > _settings.ResponseTimeout)
{
_messageBuffer = string.Empty;
}
_timeSinceLastMessage.Restart();
foreach (var ch in rawMsg)
{
_messageBuffer += ch;
if (ch == '>')//to avoid extra checks
{
if (IsValidXml(_messageBuffer))
{
result.Add(_messageBuffer);
_messageBuffer = string.Empty;
}
}
}
return result;
}
}
private bool IsValidXml(string xml)
{
try
{
//fastest way to validate XML format correctness
using (XmlTextReader reader = new XmlTextReader(new StringReader(xml)))
{
while (reader.Read()) { }
}
return true;
}
catch
{
return false;
}
}
A few comments:
The lifetime of the string buffer needs to be controlled; otherwise, after a network disconnection, old information may stay in the buffer forever.
The major problem here is performance: parsing after every new character is quite slow, so some optimizations are needed, such as parsing only after a '>' character.
Make sure this method is thread safe; otherwise several threads may flood the string buffer with different XML pieces.
The usage is simple:
private void _tcpClient_DataReceived(byte[] data)
{
var rawMsg = Encoding.Unicode.GetString(data);
var normalizedMessages = NormalizeMessage(rawMsg);
foreach (var normalizedMessage in normalizedMessages)
{
//TODO: your logic
}
}

trouble comparing strings in wp7 application

Here is a sample of my code.
Here I receive a string variable from another page.
protected override void OnNavigatedTo(System.Windows.Navigation.NavigationEventArgs e)
{
base.OnNavigatedTo(e);
string newparameter = this.NavigationContext.QueryString["search"];
weareusingxml();
displayResults(newparameter);
}
private void displayResults(string search)
{
bool flag = false;
try
{
using (IsolatedStorageFile myIsolatedStorage = IsolatedStorageFile.GetUserStoreForApplication())
{
using (IsolatedStorageFileStream stream = myIsolatedStorage.OpenFile("People.xml", FileMode.Open))
{
XmlSerializer serializer = new XmlSerializer(typeof(List<Person>));
List<Person> data = (List<Person>)serializer.Deserialize(stream);
List<Result> results = new List<Result>();
for (int i = 0; i < data.Count; i++)
{
string temp1 = data[i].name.ToUpper();
string temp2 = "*" + search.ToUpper() + "*";
if (temp1 == temp2)
{
results.Add(new Result() {name = data[i].name, gender = data[i].gender, pronouciation = data[i].pronouciation, definition = data[i].definition, audio = data[i].audio });
flag = true;
}
}
this.listBox.ItemsSource = results;
}
}
}
catch
{
textBlock1.Text = "error loading page";
}
if(!flag)
{
textBlock1.Text = "no matching results";
}
}
Nothing is loaded into the list when the code is run; I just get the message "no matching results".
It looks like you are trying to do a contains search (my guess, based on the '*' you added around the search string). You can remove the '*' and do a string.Contains match.
Try this.
string temp1 = data[i].name.ToUpper();
string temp2 = search.ToUpper();
if (temp1.Contains(temp2))
{
// add the match to the results list, as before
}
It looks like you are trying to check whether one string contains another (i.e. a substring match), not whether they are equal.
In C#, you do it like this:
string haystack = "Applejuice box";
string needle = "juice";
if (haystack.Contains(needle))
{
// Match
}
Or, in your case (and skip the * you added to the string temp2):
if (temp1.Contains(temp2))
{
// add them to the list
}
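As a variation on the Contains suggestion, the two ToUpper() calls can be avoided by asking for a case-insensitive comparison directly. This is only a sketch; the NameSearch class and ContainsIgnoreCase helper are hypothetical names, not part of the original code.

```csharp
using System;

public static class NameSearch
{
    // True when `name` contains `search`, ignoring case.
    // Uses IndexOf with an explicit comparison instead of upper-casing both strings.
    public static bool ContainsIgnoreCase(string name, string search)
    {
        if (name == null || search == null)
        {
            return false;
        }
        return name.IndexOf(search, StringComparison.OrdinalIgnoreCase) >= 0;
    }

    public static void Main()
    {
        // Substring match regardless of case.
        Console.WriteLine(ContainsIgnoreCase("Applejuice box", "JUICE"));
        // The literal '*' characters are not wildcards, so this does not match.
        Console.WriteLine(ContainsIgnoreCase("Applejuice box", "*JUICE*"));
    }
}
```

In the loop from the question, the condition would then read something like `if (NameSearch.ContainsIgnoreCase(data[i].name, search))`.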
Have you checked to make sure data.Count > 0?

Reading a line from a streamreader without consuming?

Is there a way to read ahead one line to test if the next line contains specific tag data?
I'm dealing with a format that has a start tag but no end tag.
I would like to read a line, add it to a structure, then test the line below to make sure it is not a new "node". If it isn't, keep adding; if it is, close off that struct and make a new one.
The only solution I can think of is to have two stream readers going at the same time, shuffling their way along in lock step, but that seems wasteful (if it will even work).
I need something like Peek, but PeekLine.
The problem is the underlying stream may not even be seekable. If you take a look at the stream reader implementation it uses a buffer so it can implement TextReader.Peek() even if the stream is not seekable.
You could write a simple adapter that reads the next line and buffers it internally, something like this:
public class PeekableStreamReaderAdapter
{
private StreamReader Underlying;
private Queue<string> BufferedLines;
public PeekableStreamReaderAdapter(StreamReader underlying)
{
Underlying = underlying;
BufferedLines = new Queue<string>();
}
public string PeekLine()
{
string line = Underlying.ReadLine();
if (line == null)
return null;
BufferedLines.Enqueue(line);
return line;
}
public string ReadLine()
{
if (BufferedLines.Count > 0)
return BufferedLines.Dequeue();
return Underlying.ReadLine();
}
}
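For illustration, here is a sketch of how a peeking reader could drive the grouping the question describes. It is a variant rather than the adapter above: in the adapter each PeekLine call reads one line further ahead, whereas this version keeps a single look-ahead line, so repeated PeekLine calls return the same line until ReadLine consumes it. The names PeekableLineReader, RecordGrouper and Group are hypothetical, and the "0 " start tag in the example is just sample data.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical wrapper: single-line look-ahead over any TextReader.
public sealed class PeekableLineReader
{
    private readonly TextReader _inner;
    private string _peeked;
    private bool _hasPeeked;

    public PeekableLineReader(TextReader inner)
    {
        _inner = inner;
    }

    // Returns the next line without consuming it (null at end of input).
    public string PeekLine()
    {
        if (!_hasPeeked)
        {
            _peeked = _inner.ReadLine();
            _hasPeeked = true;
        }
        return _peeked;
    }

    // Returns the peeked line if there is one, otherwise reads a fresh line.
    public string ReadLine()
    {
        if (_hasPeeked)
        {
            _hasPeeked = false;
            return _peeked;
        }
        return _inner.ReadLine();
    }
}

public static class RecordGrouper
{
    // Splits the input into records; a new record starts at each line that
    // begins with startTag. Lines before the first start tag form a record too.
    public static List<List<string>> Group(TextReader input, string startTag)
    {
        var reader = new PeekableLineReader(input);
        var records = new List<List<string>>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            var record = new List<string> { line };
            string next;
            // Keep adding lines until the next one opens a new record.
            while ((next = reader.PeekLine()) != null &&
                   !next.StartsWith(startTag, StringComparison.Ordinal))
            {
                record.Add(reader.ReadLine());
            }
            records.Add(record);
        }
        return records;
    }

    public static void Main()
    {
        string sample = "0 A\n1 x\n1 y\n0 B\n1 z";
        foreach (List<string> record in Group(new StringReader(sample), "0 "))
        {
            Console.WriteLine(string.Join(" | ", record.ToArray()));
        }
    }
}
```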
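You could store the position via StreamReader.BaseStream.Position, then read the next line, do your test, and seek back to the position you saved before reading the line:
// Peek at the next line
// Caveat: StreamReader reads ahead into its own buffer, so BaseStream.Position
// may already be past the line just read; verify this against your input.
long peekPos = reader.BaseStream.Position;
string line = reader.ReadLine();
if (line.StartsWith("<tag start>"))
{
// This is a new tag, so we reset the position
reader.BaseStream.Seek(peekPos, SeekOrigin.Begin);
// Drop the reader's buffered data so it re-reads from the new position
reader.DiscardBufferedData();
}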
else
{
// This is part of the same node.
}
This is a lot of seeking and re-reading the same lines. Using some logic, you may be able to avoid this altogether - for instance, when you see a new tag start, close out the existing structure and start a new one - here's a basic algorithm:
SomeStructure myStructure = null;
while (!reader.EndOfStream)
{
string currentLine = reader.ReadLine();
if (currentLine.StartsWith("<tag start>"))
{
// Close out existing structure.
if (myStructure != null)
{
// Close out the existing structure.
}
// Create a new structure and add this line.
myStructure = new SomeStructure();
// Append to myStructure.
}
else
{
// Add to the existing structure.
if (myStructure != null)
{
// Append to existing myStructure
}
else
{
// This means the first line was not part of a structure.
// Either handle this case, or throw an exception.
}
}
}
Why the difficulty? Return the next line, regardless. Check if it is a new node, if not, add it to the struct. If it is, create a new struct.
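// Not exactly C# but close enough
// Assumes `reader` is a TextReader, IsNode(line) recognises a start tag,
// and Struct is your own class with an AddLine method.
List<Struct> structs = new List<Struct>();
Struct current = null; // `struct` itself is a reserved word in C#
string line;
while ((line = reader.ReadLine()) != null)
{
if (IsNode(line))
{
if (current != null) structs.Add(current);
current = new Struct();
continue;
}
// Whatever processing you need to do
if (current != null) current.AddLine(line);
}
if (current != null) structs.Add(current); // Add the last one to the collection
// Use your structures here
foreach (Struct s in structs)
{
}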
Here is what I've got so far. I went more of the split route than the StreamReader line-by-line route.
I'm sure there are a few places that are dying to be more elegant, but for right now it seems to be working.
Please let me know what you think.
struct INDI
{
public string ID;
public string Name;
public string Sex;
public string BirthDay;
public bool Dead;
}
struct FAM
{
public string FamID;
public string type;
public string IndiID;
}
List<INDI> Individuals = new List<INDI>();
List<FAM> Family = new List<FAM>();
private void button1_Click(object sender, EventArgs e)
{
string path = @"C:\mostrecent.ged";
ParseGedcom(path);
}
private void ParseGedcom(string path)
{
//Open path to GED file
StreamReader SR = new StreamReader(path);
//Read the entire block and then split on "0 @" for individuals and families (no other info is needed for this instance)
string[] Holder = SR.ReadToEnd().Replace("0 @", "\u0646").Split('\u0646');
//For each new cell in the holder array look for individuals and families
foreach (string Node in Holder)
{
//Sub Split the string on the returns to get a true block of info
string[] SubNode = Node.Replace("\r\n", "\r").Split('\r');
//If a individual is found
if (SubNode[0].Contains("INDI"))
{
//Create new Structure
INDI I = new INDI();
//Add the ID number and remove extra formatting
I.ID = SubNode[0].Replace("@", "").Replace(" INDI", "").Trim();
//Find the name and remove extra formatting for last name
I.Name = SubNode[FindIndexinArray(SubNode, "NAME")].Replace("1 NAME", "").Replace("/", "").Trim();
//Find Sex and remove extra formatting
I.Sex = SubNode[FindIndexinArray(SubNode, "SEX")].Replace("1 SEX ", "").Trim();
//Determine if there is a birthday; -1 means no
if (FindIndexinArray(SubNode, "1 BIRT ") != -1)
{
// add birthday to Struct
I.BirthDay = SubNode[FindIndexinArray(SubNode, "1 BIRT ") + 1].Replace("2 DATE ", "").Trim();
}
// Determine if there is a death tag; returns -1 if not found
if (FindIndexinArray(SubNode, "1 DEAT ") != -1)
{
//Convert Y or N to true or false (defaults to false, so no need to change unless Y is found)
if (SubNode[FindIndexinArray(SubNode, "1 DEAT ")].Replace("1 DEAT ", "").Trim() == "Y")
{
//set death
I.Dead = true;
}
}
//add the Struct to the list for later use
Individuals.Add(I);
}
// Start Family section
else if (SubNode[0].Contains("FAM"))
{
//grab Fam id from node early on to keep from doing it over and over
string FamID = SubNode[0].Replace("@ FAM", "");
// Multiple children can exist for each family so this section had to be a bit more dynamic
// Look at each line of node
foreach (string Line in SubNode)
{
// If node is HUSB
if (Line.Contains("1 HUSB "))
{
FAM F = new FAM();
F.FamID = FamID;
F.type = "PAR";
F.IndiID = Line.Replace("1 HUSB ", "").Replace("@", "").Trim();
Family.Add(F);
}
//If node for Wife
else if (Line.Contains("1 WIFE "))
{
FAM F = new FAM();
F.FamID = FamID;
F.type = "PAR";
F.IndiID = Line.Replace("1 WIFE ", "").Replace("@", "").Trim();
Family.Add(F);
}
//if node for multi children
else if (Line.Contains("1 CHIL "))
{
FAM F = new FAM();
F.FamID = FamID;
F.type = "CHIL";
F.IndiID = Line.Replace("1 CHIL ", "").Replace("@", "");
Family.Add(F);
}
}
}
}
}
private int FindIndexinArray(string[] Arr, string search)
{
int Val = -1;
for (int i = 0; i < Arr.Length; i++)
{
if (Arr[i].Contains(search))
{
Val = i;
}
}
return Val;
}
