This question already has answers here:
How to read a text file reversely with iterator in C#
(11 answers)
Closed 1 year ago.
I'm trying to figure out how to either Record which line I'm in, for example, line = 32, allowing me to just add line-- in the previous record button event or find a better alternative.
I currently have my form setup and working where if I click on "Next Record" button, the file increments to the next line and displays the cells correctly within their associated textboxes, but how do I create a button that goes to the previous line in the .csv file?
StreamReader csvFile;
public GP_Appointment_Manager()
{
InitializeComponent();
}
private void buttonOpenFile_Click(object sender, EventArgs e)
{
try
{
csvFile = new StreamReader("patients_100.csv");
// Read First line and do nothing
string line;
if (ReadPatientLineFromCSV(out line))
{
// Read second line, first patient line and populate form
ReadPatientLineFromCSV(out line);
PopulateForm(line);
}
}
catch (Exception ex)
{
MessageBox.Show(ex.Message);
}
}
private bool ReadPatientLineFromCSV(out string line)
{
bool result = false;
line = "";
if ((csvFile != null) && (!csvFile.EndOfStream))
{
line = csvFile.ReadLine();
result = true;
}
else
{
MessageBox.Show("File has not been opened. Please open file before reading.");
}
return result;
}
private void PopulateForm(string patientDetails)
{
string[] patient = patientDetails.Split(',');
//Populates ID
textBoxID.Text = patient[0];
//Populates Personal
comboBoxSex.SelectedIndex = (patient[1] == "M") ? 0 : 1;
dateTimePickerDOB.Value = DateTime.Parse(patient[2]);
textBoxFirstName.Text = patient[3];
textBoxLastName.Text = patient[4];
//Populates Address
textboxAddress.Text = patient[5];
textboxCity.Text = patient[6];
textboxCounty.Text = patient[7];
textboxTelephone.Text = patient[8];
//Populates Kin
textboxNextOfKin.Text = patient[9];
textboxKinTelephone.Text = patient[10];
}
Here's the code for the "Next Record" Button
private void buttonNextRecord_Click(object sender, EventArgs e)
{
string patientInfo;
if (ReadPatientLineFromCSV(out patientInfo))
{
PopulateForm(patientInfo);
}
}
Now, this is some sort of exercise. This class uses the standard StreamReader with a couple of modification, to implement simple move-forward/step-back functionalities.
It also allows to associate an array/list of Controls with the data read from a CSV-like file format. Note that this is not a general-purpose CSV reader; it just splits a string in parts, using a separator that can be specified calling its AssociateControls() method.
The class has 3 constructors:
(1) public LineReader(string filePath)
(2) public LineReader(string filePath, bool hasHeader)
(3) public LineReader(string filePath, bool hasHeader, Encoding encoding)
The source file has no Header in the first line and the text Encoding should be auto-detected
Same, but the first line of the file contain the Header if hasHeader = true
Used to specify an Encoding, if the automatic discovery cannot identify it correctly.
The positions of the lines of text are stored in a Dictionary<long, long>, where the Key is the line number and Value is the starting position of the line.
This has some advantages: no strings are stored anywhere, the file is indexed while reading it but you could use a background task to complete the indexing (this feature is not implemented here, maybe later...).
The disadvantage is that the Dictionary takes space in memory. If the file is very large (just the number of lines counts, though), it may become a problem. To test.
A note about the Encoding:
The text encoding auto-detection is reliable enough only if the Encoding is not set to the default one (UTF-8). The code here, if you don't specify an Encoding, sets it to Encoding.ASCII. When the first line is read, the automatic feature tries to determine the actual encoding. It usually gets it right.
In the default StreamReader implementation, if we specify Encoding.UTF8 (or none, which is the same) and the text encoding is ASCII, the encoder will use the default (Encoding.UTF8) encoding, since UTF-8 maps to ASCII gracefully.
However, when this is the case, [Encoding].GetPreamble() will return the UTF-8 BOM (3 bytes), compromising the calculation of the current position in the underlying stream.
To associate controls with the data read, you just need to pass a collection of controls to the LineReader.AssociateControls() method.
This will map each control to the data field in the same position.
To skip a data field, specify null instead of a control reference.
The visual example is built using a CSV file with this structure:
(Note: this data is generated using an automated on-line tool)
seq;firstname;lastname;age;street;city;state;zip;deposit;color;date
---------------------------------------------------------------------------
1;Harriett;Gibbs;62;Segmi Center;Ebanavi;ID;57854;$4444.78;WHITE;05/15/1914
2;Oscar;McDaniel;49;Kulak Drive;Jetagoz;IL;57631;$5813.94;RED;02/11/1918
3;Winifred;Olson;29;Wahab Mill;Ucocivo;NC;46073;$2002.70;RED;08/11/2008
I skipped the seq and color fields, passing this array of Controls:
LineReader lineReader = null;
private void btnOpenFile_Click(object sender, EventArgs e)
{
string filePath = Path.Combine(Application.StartupPath, #"sample.csv");
lineReader = new LineReader(filePath, true);
string header = lineReader.HeaderLine;
Control[] controls = new[] {
null, textBox1, textBox2, textBox3, textBox4, textBox5,
textBox6, textBox9, textBox7, null, textBox8 };
lineReader.AssociateControls(controls, ";");
}
The null entries correspond to the data fields that are not considered.
Visual sample of the functionality:
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Windows.Forms;
class LineReader : IDisposable
{
private StreamReader reader = null;
private Dictionary<long, long> positions;
private string m_filePath = string.Empty;
private Encoding m_encoding = null;
private IEnumerable<Control> m_controls = null;
private string m_separator = string.Empty;
private bool m_associate = false;
private long m_currentPosition = 0;
private bool m_hasHeader = false;
public LineReader(string filePath) : this(filePath, false) { }
public LineReader(string filePath, bool hasHeader) : this(filePath, hasHeader, Encoding.ASCII) { }
public LineReader(string filePath, bool hasHeader, Encoding encoding)
{
if (!File.Exists(filePath)) {
throw new FileNotFoundException($"The file specified: {filePath} was not found");
}
this.m_filePath = filePath;
m_hasHeader = hasHeader;
CurrentLineNumber = 0;
reader = new StreamReader(this.m_filePath, encoding, true);
CurrentLine = reader.ReadLine();
m_encoding = reader.CurrentEncoding;
m_currentPosition = m_encoding.GetPreamble().Length;
positions = new Dictionary<long, long>() { [0]= m_currentPosition };
if (hasHeader) { this.HeaderLine = CurrentLine = this.MoveNext(); }
}
public string HeaderLine { get; private set; }
public string CurrentLine { get; private set; }
public long CurrentLineNumber { get; private set; }
public string MoveNext()
{
string read = reader.ReadLine();
if (string.IsNullOrEmpty(read)) return this.CurrentLine;
CurrentLineNumber += 1;
if ((positions.Count - 1) < CurrentLineNumber) {
AdjustPositionToLineFeed();
positions.Add(CurrentLineNumber, m_currentPosition);
}
else {
m_currentPosition = positions[CurrentLineNumber];
}
this.CurrentLine = read;
if (m_associate) this.Associate();
return read;
}
public string MovePrevious()
{
if (CurrentLineNumber == 0 || (CurrentLineNumber == 1 && m_hasHeader)) return this.CurrentLine;
CurrentLineNumber -= 1;
m_currentPosition = positions[CurrentLineNumber];
reader.BaseStream.Position = m_currentPosition;
reader.DiscardBufferedData();
this.CurrentLine = reader.ReadLine();
if (m_associate) this.Associate();
return this.CurrentLine;
}
private void AdjustPositionToLineFeed()
{
long linePos = m_currentPosition + m_encoding.GetByteCount(this.CurrentLine);
long prevPos = reader.BaseStream.Position;
reader.BaseStream.Position = linePos;
byte[] buffer = new byte[4];
reader.BaseStream.Read(buffer, 0, buffer.Length);
char[] chars = m_encoding.GetChars(buffer).Where(c => c.Equals((char)10) || c.Equals((char)13)).ToArray();
m_currentPosition = linePos + m_encoding.GetByteCount(chars);
reader.BaseStream.Position = prevPos;
}
public void AssociateControls(IEnumerable<Control> controls, string separator)
{
m_controls = controls;
m_separator = separator;
m_associate = true;
if (!string.IsNullOrEmpty(this.CurrentLine)) Associate();
}
private void Associate()
{
string[] values = this.CurrentLine.Split(new[] { m_separator }, StringSplitOptions.None);
int associate = 0;
m_controls.ToList().ForEach(c => {
if (c != null) c.Text = values[associate];
associate += 1;
});
}
public override string ToString() =>
$"File Path: {m_filePath} Encoding: {m_encoding.BodyName} CodePage: {m_encoding.CodePage}";
public void Dispose()
{
this.Dispose(true);
GC.SuppressFinalize(this);
}
protected virtual void Dispose(bool disposing)
{
if (disposing) { reader?.Dispose(); }
}
}
General approach is the following:
Add a text file input.txt like this
line 1
line 2
line 3
and set Copy to Output Directory property to Copy if newer
Create extension methods for StreamReader
public static class StreamReaderExtensions
{
public static bool TryReadNextLine(this StreamReader reader, out string line)
{
var isAvailable = reader != null &&
!reader.EndOfStream;
line = isAvailable ? reader.ReadLine() : null;
return isAvailable;
}
public static bool TryReadPrevLine(this StreamReader reader, out string line)
{
var stream = reader.BaseStream;
var encoding = reader.CurrentEncoding;
var bom = GetBOM(encoding);
var isAvailable = reader != null &&
stream.Position > 0;
if(!isAvailable)
{
line = null;
return false;
}
var buffer = new List<byte>();
var str = string.Empty;
stream.Position++;
while (!str.StartsWith(Environment.NewLine))
{
stream.Position -= 2;
buffer.Insert(0, (byte)stream.ReadByte());
var reachedBOM = buffer.Take(bom.Length).SequenceEqual(bom);
if (reachedBOM)
buffer = buffer.Skip(bom.Length).ToList();
str = encoding.GetString(buffer.ToArray());
if (reachedBOM)
break;
}
stream.Position--;
line = str.Trim(Environment.NewLine.ToArray());
return true;
}
private static byte[] GetBOM(Encoding encoding)
{
if (encoding.Equals(Encoding.UTF7))
return new byte[] { 0x2b, 0x2f, 0x76 };
if (encoding.Equals(Encoding.UTF8))
return new byte[] { 0xef, 0xbb, 0xbf };
if (encoding.Equals(Encoding.Unicode))
return new byte[] { 0xff, 0xfe };
if (encoding.Equals(Encoding.BigEndianUnicode))
return new byte[] { 0xfe, 0xff };
if (encoding.Equals(Encoding.UTF32))
return new byte[] { 0, 0, 0xfe, 0xff };
return new byte[0];
}
}
And use it like this:
using (var reader = new StreamReader("input.txt"))
{
string na = "N/A";
string line;
for (var i = 0; i < 4; i++)
{
var isAvailable = reader.TryReadNextLine(out line);
Console.WriteLine($"Next line available: {isAvailable}. Line: {(isAvailable ? line : na)}");
}
for (var i = 0; i < 4; i++)
{
var isAvailable = reader.TryReadPrevLine(out line);
Console.WriteLine($"Prev line available: {isAvailable}. Line: {(isAvailable ? line : na)}");
}
}
The result is:
Next line available: True. Line: line 1
Next line available: True. Line: line 2
Next line available: True. Line: line 3
Next line available: False. Line: N/A
Prev line available: True. Line: line 3
Prev line available: True. Line: line 2
Prev line available: True. Line: line 1
Prev line available: False. Line: N/A
GetBOM is based on this.
Related
i have a lass like this
public class Params
{
public string FirstName;
public string SecondName;
public string Path;
public long Count;
public double TotalSize;
public long Time;
public bool HasError;
public Params()
{
}
public Params(string firstName, string secondName, string path, long count, double totalSize, long time, bool hasError)
{
FirstName = firstName;
SecondName = secondName;
Path = path;
Count = count;
TotalSize = totalSize;
Time = time;
HasError = hasError;
}
}
I have the json class like this:
public static class FileWriterJson
{
public static void WriteToJsonFile<T>(string filePath, T objectToWrite, bool append = true) where T : new()
{
TextWriter writer = null;
try
{
var contentsToWriteToFile = JsonConvert.SerializeObject(objectToWrite);
writer = new StreamWriter(filePath, append);
writer.Write(contentsToWriteToFile);
}
finally
{
if (writer != null)
writer.Close();
}
}
public static T ReadFromJsonFile<T>(string filePath) where T : new()
{
TextReader reader = null;
try
{
reader = new StreamReader(filePath);
var fileContents = reader.ReadToEnd();
return JsonConvert.DeserializeObject<T>(fileContents);
}
finally
{
if (reader != null)
reader.Close();
}
}
}
The main program is like this
var Params1 = new Params("Test", "TestSecondName", "Mypath",7, 65.0, 0, false);
FileWriterJson.WriteToJsonFile<Params>("C:\\Users\\myuser\\bin\\Debug\\test1.json", Params1);
FileWriterJson.WriteToJsonFile<Params>("C:\\Users\\myuser\\bin\\Debug\\test1.json", Params1);
This is mine test1.json:
{"FirstName":"Test","SecondName":"TestSecondName","Path":"Mypath","Count":7,"TotalSize":65.0,"Time":0,"HasError":false}{"FirstName":"Test","SecondName":"TestSecondName","Path":"Mypath","Count":7,"TotalSize":65.0,"Time":0,"HasError":false}
As you can see i have two json objects written in the file.
What i need to do is:
void ReadAllObjects(){
//read the json object from the file
// count the json objects - suppose there are two objects
for (int i=0;i<2;i++){
//do some processing with the first object
// if processing is successfull delete the object (i don't know how to delete the particular json object from file)
} }
but when i read like this
var abc =FileWriterJson.ReadFromJsonFile<Params>(
"C:\\Users\\myuser\\bin\\Debug\\test1.json");
i get the following error:
"Additional text encountered after finished reading JSON content: {.
Path '', line 1, position 155."
Then i used the following code to read the JSON file
public static IEnumerable<T> FromDelimitedJson<T>(TextReader reader, JsonSerializerSettings settings = null)
{
using (var jsonReader = new JsonTextReader(reader) { CloseInput = false, SupportMultipleContent = true })
{
var serializer = JsonSerializer.CreateDefault(settings);
while (jsonReader.Read())
{
if (jsonReader.TokenType == JsonToken.Comment)
continue;
yield return serializer.Deserialize<T>(jsonReader);
}
}
}
}
Which worked fine for me.
Now i need following suggestion:
1> when i put my test1.json data in https://jsonlint.com/ it says Error:
Parse error on line 9:
..."HasError": false} { "FirstName": "Tes
----------------------^
Expecting 'EOF', '}', ',', ']', got '{'
should i write into file in some other way.
2>Is there any better of doing this.
You are writing each object out individually to the file.
But what you are creating is not a valid JSON file, just a text file with individual JSON objects.
To make it valid JSON, then you need to put the objects into an array or list and then save this to the file.
var Params1 = new Params("Test", "TestFirstName", "Mypath",7, 65.0, 0, false);
var Params2 = new Params("Test 2", "TestSecondName", "Mypath",17, 165.0, 10, false);
List<Params> paramsList = new List<Params>();
paramsList .Add(Params1);
paramsList .Add(Params2);
FileWriterJson.WriteToJsonFile<List<Params>>("C:\\Users\\myuser\\bin\\Debug\\test1.json", paramsList);
FileWriterJson.WriteToJsonFile("C:\Users\myuser\bin\Debug\test1.json", Params1);
Then you should be able to read it in OK. Don't forget to read in a List<Params>
Here is a sample of my code.
Here I recieve a string variable from another page.
protected override void OnNavigatedTo(System.Windows.Navigation.NavigationEventArgs e)
{
base.OnNavigatedTo(e);
string newparameter = this.NavigationContext.QueryString["search"];
weareusingxml();
displayResults(newparameter);
}
private void displayResults(string search)
{
bool flag = false;
try
{
using (IsolatedStorageFile myIsolatedStorage = IsolatedStorageFile.GetUserStoreForApplication())
{
using (IsolatedStorageFileStream stream = myIsolatedStorage.OpenFile("People.xml", FileMode.Open))
{
XmlSerializer serializer = new XmlSerializer(typeof(List<Person>));
List<Person> data = (List<Person>)serializer.Deserialize(stream);
List<Result> results = new List<Result>();
for (int i = 0; i < data.Count; i++)
{
string temp1 = data[i].name.ToUpper();
string temp2 = "*" + search.ToUpper() + "*";
if (temp1 == temp2)
{
results.Add(new Result() {name = data[i].name, gender = data[i].gender, pronouciation = data[i].pronouciation, definition = data[i].definition, audio = data[i].audio });
flag = true;
}
}
this.listBox.ItemsSource = results;
}
catch
{
textBlock1.Text = "error loading page";
}
if(!flag)
{
textBlock1.Text = "no matching results";
}
}
Nothing is loaded into the list when the code is run, I just get the message "no matching results".
Looks like you are trying to do a contains search (my guess based on your addition of the * around the search string. You can remove the '*' and do a string.Contains match.
Try this.
string temp1 = data[i].name.ToUpper();
string temp2 = search.ToUpper()
if (temp1.Contains(temp2))
{
It looks like you are trying to check if one string contains another (ie substring match) and not if they are equal.
In C#, you do this like this:
haystack = "Applejuice box";
needle = "juice";
if (haystack.Contains(needle))
{
// Match
}
Or, in your case (and skip the * you added to the string temp2)
if (temp1.Contains(temp2))
{
// add them to the list
}
Have you checked to make sure data.Count > 0?
I'm using a DataGrid bound to an ObservableCollection source to display two columns, a file name and a number that I get from analysing the file.
ObservableCollection<SearchFile> fileso;
//...
private void worker_DoWork(object sender, DoWorkEventArgs e)
{
/* Get all the files to check. */
int dirCount = searchFoldersListView.Items.Count;
List<string> allFiles = new List<string>();
for(int i = 0; i < dirCount; i++)
{
try
{
allFiles.AddRange(Directory.GetFiles(searchFoldersListView.Items[i].ToString(), "*.txt").ToList());
allFiles.AddRange(Directory.GetFiles(searchFoldersListView.Items[i].ToString(), "*.pdf").ToList());
}
catch
{ /* stuff */ }
}
/* Clear the collection and populate it with unchecked files again, refreshing the grid. */
this.Dispatcher.Invoke(new Action(delegate
{
fileso.Clear();
foreach(var file in allFiles)
{
SearchFile sf = new SearchFile() { path=file, occurrences=0 };
fileso.Add(sf);
}
}));
/* Check the files. */
foreach(var file in allFiles)
{
this.Dispatcher.Invoke(new Action(delegate
{
int occurences;
bool result = FileSearcher.searchFile(file, searchTermTextBox.Text, out occurences);
fileso.AddOccurrences(file, occurences); // This is an extension method that alters the collection by finding the relevant item and changing it.
}));
}
}
//...
public static void AddOccurrences(this ObservableCollection<SearchFile> collection, string path, int occurrences)
{
for(int i = 0; i < collection.Count; i++)
{
if(collection[i].path == path)
{
collection[i].occurrences = occurrences;
break;
}
}
}
//...
public static bool searchTxtFile(string path, string term, out int occurences)
{
string contents = File.ReadAllText(path);
occurences = Regex.Matches(contents, term, RegexOptions.IgnoreCase).Count;
if(occurences>0)
return true;
return false;
}
public static bool searchDocxFile(string path, string term, out int occurences)
{
occurences = 0;
string tempPath = Path.GetTempPath();
string rawName = Path.GetFileNameWithoutExtension(path);
string destFile = System.IO.Path.Combine(tempPath, rawName + ".zip");
System.IO.File.Copy(path, destFile, true);
using(ZipFile zf = new ZipFile(destFile))
{
ZipEntry ze = zf.GetEntry("word/document.xml");
if(ze != null)
{
using(Stream zipstream = zf.GetInputStream(ze))
{
using(StreamReader sr = new StreamReader(zipstream))
{
string docContents = sr.ReadToEnd();
string rawText = Extensions.StripTagsRegexCompiled(docContents);
occurences = Regex.Matches(rawText, term, RegexOptions.IgnoreCase).Count;
if(occurences>0)
return true;
return false;
}
}
}
}
return false;
}
public static bool searchFile(string path, string term, out int occurences)
{
occurences = 0;
string ext = System.IO.Path.GetExtension(path);
switch(ext)
{
case ".txt":
return searchTxtFile(path, term, out occurences);
//case ".doc":
// return searchDocFile(path, term, out occurences);
case ".docx":
return searchDocxFile(path, term, out occurences);
}
return false;
}
But the problem is that sometimes, when I hit the update button (which starts the worker with the the do_work method above), some of the time, I'm getting random zeroes in the number column instead of the correct number. Why is that? I'm assuming it's because there's some problem with updating the number column twice, with sometimes the first zeroing getting applied after the actual update, but I'm not sure about the details.
I think this is a case of an access to a modified closure
/* Check the files. */
foreach(var file in allFiles)
{
var fileTmp = file; // avoid access to modified closure
this.Dispatcher.Invoke(new Action(delegate
{
int occurences;
bool result = FileSearcher.searchFile(fileTmp, searchTermTextBox.Text, out occurences);
fileso.AddOccurrences(fileTmp, occurences); // This is an extension method that alters the collection by finding the relevant item and changing it.
}));
}
Basically what's happening is that you are passing the file variable to a lambda expression, but file will be modified by the foreach loop before the action is actually invoked, using a temp variable to hold file should solve this.
I'm trying to get a newline into a text node using XText from the Linq XML namespace.
I have a string which contains newline characters however I need to work out how to convert these to entity characters (i.e.
) rather than just having them appear in the XML as new lines.
XElement element = new XElement( "NodeName" );
...
string example = "This is a string\nWith new lines in it\n";
element.Add( new XText( example ) );
The XElement is then written out using an XmlTextWriter which results in the file containing the newline rather than an entity replacement.
Has anyone come across this problem and found a solution?
EDIT:
The problem manifests itself when I load the XML into EXCEL which doesn't seem to like the newline character but which accepts the entity replacement. The result is that newlines aren't showing in EXCEL unless I replace them with
Nick.
Cheating:
XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;
settings.CheckCharacters = false;
settings.NewLineChars = "
";
XmlWriter writer = XmlWriter.Create(..., settings);
element.WriteTo(writer);
writer.Flush();
UPDATE:
Complete program
using System;
using System.Xml;
using System.Xml.Linq;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
XElement element = new XElement( "NodeName" );
string example = "This is a string\nWith new lines in it\n";
element.Add( new XText( example ) );
XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;
settings.CheckCharacters = false;
settings.NewLineChars = "
";
XmlWriter writer = XmlWriter.Create(Console.Out, settings);
element.WriteTo(writer);
writer.Flush();
}
}
}
OUTPUT:
C:\Users\...\\ConsoleApplication1\bin\Release>ConsoleApplication1.exe
<?xml version="1.0" encoding="ibm850"?>
<NodeName>This is a string
With new lines in it
</NodeName>
To any standard XML parser there is no difference between the entity
and a new line character, as they are one and the same thing.
To illustrate this the following code shows that they are the same thing:
string s1 = "<root>Test
Test2</root>";
string s2 = "<root>Test\nTest2</root>";
XDocument doc1 = XDocument.Parse(s1);
XDocument doc2 = XDocument.Parse(s2);
Console.WriteLine(doc1.ToString());
Console.WriteLine(doc2.ToString());
It's the XmlTextWriter which is responsible for outputting escaped entities. So if you do this, for example:
using (XmlTextWriter w = new XmlTextWriter("test.xml", Encoding.UTf8))
{
w.WriteString("");
}
You will also get an escaped ampersand output in text.xml , which you don't want. You would like to keep the sequence raw, as is.
The solution I propose is to create a new StreamWriter implementation capable of detecting an escaped string like "":
// A StreamWriter that does not escape
characters
public class NonXmlEscapingStreamWriter : StreamWriter
{
private const string AmpToken = "amp";
private int _bufferState = 0; // used to keep state
// add other ctors overloads if needed
public NonXmlEscapingStreamWriter(string path)
: base(path)
{
}
// NOTE this code is based on the assumption that StreamWriter
// only overrides these 4 Write functions, which is true today but could change in the future
// and also on the assumption that the XmlTextWrite writes escaped values in a specific WriteXX calls sequence
public override void Write(char value)
{
if (value == '&')
{
if (_bufferState == 0)
{
_bufferState++;
return; // hold it
}
else
{
_bufferState = 0;
}
}
else if (value == ';')
{
if (_bufferState > 1)
{
_bufferState++;
return;
}
else
{
Write('&'); // release what's been held
Write(AmpToken);
_bufferState = 0;
}
}
else if (value == '\n') // detect non escaped \n
{
base.Write("
");
return;
}
base.Write(value);
}
public override void Write(string value)
{
if (_bufferState > 0)
{
if (value == AmpToken)
{
_bufferState++;
return; // hold it
}
else
{
Write('&'); // release what's been held
_bufferState = 0;
}
}
base.Write(value);
}
public override void Write(char[] buffer, int index, int count)
{
if (_bufferState > 2)
{
_bufferState = 0;
base.Write('&'); // release this anyway
string replace;
if ((buffer != null) && ((replace = GetReplaceLength(buffer, index, count)) != null))
{
base.Write(replace);
base.Write(buffer, index + replace.Length, count - replace.Length);
return;
}
else
{
base.Write(AmpToken); // release this
base.Write(';'); // release this
}
}
base.Write(buffer, index, count);
}
public override void Write(char[] buffer)
{
Write(buffer, 0, buffer != null ? buffer.Length : 0);
}
private string GetReplaceLength(char[] buffer, int index, int count)
{
// this is specific to the 10 character but could be adapted
const string token = "#10;";
if ((index + count) < token.Length)
return null;
// we test the char array to avoid string allocations
for(int i = 0; i < token.Length; i++)
{
if (buffer[index + i] != token[i])
return null;
}
return token;
}
}
And you can use it like this:
using (XmlTextWriter w = new XmlTextWriter(new NonXmlEscapingStreamWriter("test.xml")))
{
element.WriteTo(w);
}
NOTE: Although it is capable of detecting lonely \n sequences, I suggest you ensure all \n are actually escaped in your original text, so, you need to replace \n by before you actually output xml, like this:
string example = "This is a stringWith new lines in it";
Is there a way to read ahead one line to test if the next line contains specific tag data?
I'm dealing with a format that has a start tag but no end tag.
I would like to read a line add it to a structure then test the line below to make sure it not a new "node" and if it isn't keep adding if it is close off that struct and make a new one
the only solution i can think of is to have two stream readers going at the same time kinda suffling there way along lock step but that seems wastefull (if it will even work)
i need something like peek but peekline
The problem is the underlying stream may not even be seekable. If you take a look at the stream reader implementation it uses a buffer so it can implement TextReader.Peek() even if the stream is not seekable.
You could write a simple adapter that reads the next line and buffers it internally, something like this:
public class PeekableStreamReaderAdapter
{
private StreamReader Underlying;
private Queue<string> BufferedLines;
public PeekableStreamReaderAdapter(StreamReader underlying)
{
Underlying = underlying;
BufferedLines = new Queue<string>();
}
public string PeekLine()
{
string line = Underlying.ReadLine();
if (line == null)
return null;
BufferedLines.Enqueue(line);
return line;
}
public string ReadLine()
{
if (BufferedLines.Count > 0)
return BufferedLines.Dequeue();
return Underlying.ReadLine();
}
}
You could store the position accessing StreamReader.BaseStream.Position, then read the line next line, do your test, then seek to the position before you read the line:
// Peek at the next line
long peekPos = reader.BaseStream.Position;
string line = reader.ReadLine();
if (line.StartsWith("<tag start>"))
{
// This is a new tag, so we reset the position
reader.BaseStream.Seek(pos);
}
else
{
// This is part of the same node.
}
This is a lot of seeking and re-reading the same lines. Using some logic, you may be able to avoid this altogether - for instance, when you see a new tag start, close out the existing structure and start a new one - here's a basic algorithm:
SomeStructure myStructure = null;
while (!reader.EndOfStream)
{
string currentLine = reader.ReadLine();
if (currentLine.StartsWith("<tag start>"))
{
// Close out existing structure.
if (myStructure != null)
{
// Close out the existing structure.
}
// Create a new structure and add this line.
myStructure = new Structure();
// Append to myStructure.
}
else
{
// Add to the existing structure.
if (myStructure != null)
{
// Append to existing myStructure
}
else
{
// This means the first line was not part of a structure.
// Either handle this case, or throw an exception.
}
}
}
Why the difficulty? Return the next line, regardless. Check if it is a new node, if not, add it to the struct. If it is, create a new struct.
// Not exactly C# but close enough
Collection structs = new Collection();
Struct struct;
while ((line = readline()) != null)) {
if (IsNode(line)) {
if (struct != null) structs.add(struct);
struct = new Struct();
continue;
}
// Whatever processing you need to do
struct.addLine(line);
}
structs.add(struct); // Add the last one to the collection
// Use your structures here
foreach s in structs {
}
Here is what i go so far. I went more of the split route than the streamreader line by line route.
I'm sure there are a few places that are dieing to be more elegant but for right now it seems to be working.
Please let me know what you think
struct INDI
{
public string ID;
public string Name;
public string Sex;
public string BirthDay;
public bool Dead;
}
struct FAM
{
public string FamID;
public string type;
public string IndiID;
}
List<INDI> Individuals = new List<INDI>();
List<FAM> Family = new List<FAM>();
private void button1_Click(object sender, EventArgs e)
{
string path = #"C:\mostrecent.ged";
ParseGedcom(path);
}
private void ParseGedcom(string path)
{
//Open path to GED file
StreamReader SR = new StreamReader(path);
//Read entire block and then plit on 0 # for individuals and familys (no other info is needed for this instance)
string[] Holder = SR.ReadToEnd().Replace("0 #", "\u0646").Split('\u0646');
//For each new cell in the holder array look for Individuals and familys
foreach (string Node in Holder)
{
//Sub Split the string on the returns to get a true block of info
string[] SubNode = Node.Replace("\r\n", "\r").Split('\r');
//If a individual is found
if (SubNode[0].Contains("INDI"))
{
//Create new Structure
INDI I = new INDI();
//Add the ID number and remove extra formating
I.ID = SubNode[0].Replace("#", "").Replace(" INDI", "").Trim();
//Find the name remove extra formating for last name
I.Name = SubNode[FindIndexinArray(SubNode, "NAME")].Replace("1 NAME", "").Replace("/", "").Trim();
//Find Sex and remove extra formating
I.Sex = SubNode[FindIndexinArray(SubNode, "SEX")].Replace("1 SEX ", "").Trim();
//Deterine if there is a brithday -1 means no
if (FindIndexinArray(SubNode, "1 BIRT ") != -1)
{
// add birthday to Struct
I.BirthDay = SubNode[FindIndexinArray(SubNode, "1 BIRT ") + 1].Replace("2 DATE ", "").Trim();
}
// deterimin if there is a death tag will return -1 if not found
if (FindIndexinArray(SubNode, "1 DEAT ") != -1)
{
//convert Y or N to true or false ( defaults to False so no need to change unless Y is found.
if (SubNode[FindIndexinArray(SubNode, "1 DEAT ")].Replace("1 DEAT ", "").Trim() == "Y")
{
//set death
I.Dead = true;
}
}
//add the Struct to the list for later use
Individuals.Add(I);
}
// Start Family section
else if (SubNode[0].Contains("FAM"))
{
//grab Fam id from node early on to keep from doing it over and over
string FamID = SubNode[0].Replace("# FAM", "");
// Multiple children can exist for each family so this section had to be a bit more dynaimic
// Look at each line of node
foreach (string Line in SubNode)
{
// If node is HUSB
if (Line.Contains("1 HUSB "))
{
FAM F = new FAM();
F.FamID = FamID;
F.type = "PAR";
F.IndiID = Line.Replace("1 HUSB ", "").Replace("#","").Trim();
Family.Add(F);
}
//If node for Wife
else if (Line.Contains("1 WIFE "))
{
FAM F = new FAM();
F.FamID = FamID;
F.type = "PAR";
F.IndiID = Line.Replace("1 WIFE ", "").Replace("#", "").Trim();
Family.Add(F);
}
//if node for multi children
else if (Line.Contains("1 CHIL "))
{
FAM F = new FAM();
F.FamID = FamID;
F.type = "CHIL";
F.IndiID = Line.Replace("1 CHIL ", "").Replace("#", "");
Family.Add(F);
}
}
}
}
}
private int FindIndexinArray(string[] Arr, string search)
{
int Val = -1;
for (int i = 0; i < Arr.Length; i++)
{
if (Arr[i].Contains(search))
{
Val = i;
}
}
return Val;
}