Baffling IndexOutOfBoundsArray Exception - c#

Can any one of you fine folks tell me what would possibly be causing this C# method to throw an IndexOutOfBounds exception? It would be much appreciated.
public bool PopulateStudents(string path) //decided to return bool if successful reading file.
{
theStudentList = new List<Student>(); //create instance..
string text = null;
FileInfo source = new FileInfo(path);
bool success = true;
try
{
StreamReader r = source.OpenText();
text = r.ReadLine();
string[] splitText = new string[23];
Student currentStudent = new Student();
while (text != null)
{
splitText = text.Split(',');
currentStudent = new Student(splitText[0], splitText[1], splitText[2]);
for (int i = 0; i < 20; i += 2)
{
currentStudent.EnterGrade(int.Parse(splitText[i + 3]), int.Parse(splitText[i + 4]));
}
currentStudent.CalGrade();
theStudentList.Add(currentStudent);
text = r.ReadLine();
}
r.Close();
}
catch (Exception exc)
{
success = false;
Console.WriteLine(exc.Message);
}
return success;
}
Sample input file:
0199911,Bill,Gates,27,30,56,60,0,30,83,100,57,60,0,30,59,60,0,30,59,60,88,100
0199912,Steve,Jobs,30,30,55,60,25,30,70,100,55,60,25,30,50,60,0,30,58,60,80,100
0199913,Marc,Andresen,30,30,55,60,25,30,70,100,55,60,25,30,50,60,0,30,58,60,80,100
0199914,Larry,Ellisen,30,30,55,60,25,30,70,100,55,60,25,30,50,60,0,30,58,60,80,100
EDIT: All of your answers are great and much appreciated, but as it turns out I just had some empty blank space at the end of my text file. I would like to point out that the responses you provided would fix this problem if I were to keep the blank space at the end. :)

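This happens whenever you read a line that splits into fewer than 23 fields. Most likely the culprit is an empty line at the end of the file.
Immediately after
splitText = text.Split(',');
you should add a guard such as
if (splitText.Length < 23)
{
WarnLogOrDoSomethingElse(text);
text = r.ReadLine(); // advance first, otherwise continue would loop on the same line forever
continue;
}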

The problem is that when you assign the return of text.Split(',') to splitText you are replacing your array of length 23 with an array that has a length equal to the number of tokens that result from splitting the text. You need to check how many items are now in your array before accessing specific items and the loop should probably use splitText.Length as an upper bound.
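As a minimal sketch of that advice, the helper below checks the token count before indexing and uses the array's own length as the loop bound. The names `GradeLineParser` and `TryParseGradePairs` are hypothetical, and the layout (three identity fields followed by earned/possible pairs) is assumed from the sample file in the question.

```csharp
using System;
using System.Collections.Generic;

static class GradeLineParser
{
    // Hypothetical helper: returns false instead of throwing when the line is malformed.
    public static bool TryParseGradePairs(string line, out List<(int Earned, int Possible)> pairs)
    {
        pairs = new List<(int Earned, int Possible)>();
        if (string.IsNullOrWhiteSpace(line)) return false; // e.g. a blank line at the end of the file

        string[] parts = line.Split(',');
        // Assumed layout: id, first name, last name, then an even number of grade values.
        if (parts.Length < 5 || (parts.Length - 3) % 2 != 0) return false;

        // The bound comes from the actual array, not from a hard-coded count.
        for (int i = 3; i + 1 < parts.Length; i += 2)
        {
            if (!int.TryParse(parts[i], out int earned) || !int.TryParse(parts[i + 1], out int possible))
                return false;
            pairs.Add((earned, possible));
        }
        return true;
    }
}
```

A caller could skip or log any line for which the method returns false rather than letting the exception escape.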

Well, when you say:
splitText = text.Split(',');
you presume further down in your loop that you're always going to get 23 elements. I suspect this might not always be the case.

Related

How to count strings from text file in C#

in this button click event I am trying to count strings from text file that are the same as in textboxes, then display number of them in label. My problem is that I have no idea how to count them-I'm talking about code inside if-statement. I would really appreciate any help.
private void btnCalculate_Click(object sender, EventArgs e)
{
string openFileName;
using (OpenFileDialog ofd = new OpenFileDialog())
{
if (ofd.ShowDialog() != DialogResult.OK)
{
MessageBox.Show("You did not select OK");
return;
}
openFileName = ofd.FileName;
}
FileStream fs = null;
StreamReader sr = null;
try
{
fs = new FileStream("x", FileMode.Open, FileAccess.Read);
fs.Seek(0, SeekOrigin.Begin);
sr = new StreamReader(fs);
string s = sr.ReadLine();
while (s != null)
{
s = sr.ReadLine();
}
if(s.Contains(tbFirstClub.Text))
{
s.Count = lblResult1.Text; //problem is here
}
else if(s.Contains(tbSecondClub.Text))
{
s.Count = lblResult2.Text; //problem is here
}
}
catch (IOException)
{
MessageBox.Show("Error reading file");
}
catch (Exception)
{
MessageBox.Show("Something went wrong");
}
finally
{
if (sr != null)
{
sr.Close();
}
}
}
Thanks in advance.
s.Count = lblResult1.Text; //problem is here
wait...you are saying here..
you have a variable (s)
and you access its property (Count)
and then set it to the label text(lblResult1.Text)
is that what you're trying to do? because the reverse seems more likely
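Using LINQ you can get the number of occurrences, for example by counting the lines of the file that contain the text:
int numOfOccurrences = File.ReadLines(openFileName).Count(line => line.Contains(tbFirstClub.Text)); // needs using System.IO and System.Linq
lblResult1.Text = numOfOccurrences.ToString();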
welcome to Stack Overflow.
I want to point out something you said.
else if(s.Contains(tbSecondClub.Text))
{
s.Count = lblResult2.Text; //problem is here
}
s is the string that we just read from the file.
You're assigning the label's text to s.Count. A string has no settable Count property, and with LINQ s.Count() would only give you the number of characters.
I don't think this is what you want. We want to return the number of times the specified strings show up in a specified file.
Let's refactor this (and add some tricks along the way).
// Let's create a dictionary to store all of our desired texts, and the counts.
var textAndCounts = new Dictionary<string, int>();
textAndCounts.Add(tbFirstClub.Text, 0); // Assuming the type of Text is string, change accordingly
textAndCounts.Add(tbSecondClub.Text, 0);
// We added both of our text fields to our dictionary with a value of 0
// Read all the lines from the file.
var allLines = File.ReadAllLines(openFileName); /* using System.IO */
foreach(var line in allLines)
{
if(line.Contains(tbFirstClub.Text))
{
textAndCounts[tbFirstClub.Text] += 1; // Go to where we stored our count for our text and increment
}
if(line.Contains(tbSecondClub.Text))
{
textAndCounts[tbSecondClub.Text] += 1;
}
}
This should solve your problem, but it's still pretty brittle. Optimally, we want to design a system that works for any number of strings and counts them.
So how would I do it?
public Dictionary<string, int> GetCountsPerStringInFile(IEnumerable<string> textsToSearch, string filePath)
{
// Let's use LINQ to create a dictionary, assuming all strings are unique.
// This means: create a dictionary from this list, where each key is a value in the list and each value is 0 <Text, 0>
var textsAndCounts = textsToSearch.ToDictionary(text => text, count => 0);
var allLines = File.ReadAllLines(filePath);
foreach (var line in allLines)
{
// You didn't specify if a line could contain multiple values, so let's handle that here.
var keysContained = textsAndCounts.Keys.Where(c => line.Contains(c)).ToList(); // take all the keys where the line has that key (copied so the dictionary can be updated while looping).
foreach (var key in keysContained)
{
textsAndCounts[key] += 1; // increment the count associated with that string.
}
}
return textsAndCounts;
}
The above code allows us to return a data structure with any amount of strings with a count.
I think this is a good example for you to save you some headaches going forward, and it's probably a good first toe-dip into design patterns. I'd suggest looking up some material on Data structures and their use cases.

JSON Array to Entity Framework Core VERY Slow?

I'm working on a utility to read through a JSON file I've been given and to transform it into SQL Server. My weapon of choice is a .NET Core Console App (I'm trying to do all of my new work with .NET Core unless there is a compelling reason not to). I have the whole thing "working" but there is clearly a problem somewhere because the performance is truly horrifying almost to the point of being unusable.
The JSON file is approximately 27MB and contains a main array of 214 elements and each of those contains a couple of fields along with an array of from 150-350 records (that array has several fields and potentially a small <5 record array or two). Total records are approximately 35,000.
In the code below I've changed some names and stripped out a few of the fields to keep it more readable but all of the logic and code that does actual work is unchanged.
Keep in mind, I've done a lot of testing with the placement and number of calls to SaveChanges(), thinking initially that the number of trips to the Db was the problem. Although the version below calls SaveChanges() once for each iteration of the 214-record loop, I've tried moving it outside of the entire looping structure and there is no discernible change in performance. In other words, with zero trips to the Db, this is still SLOW. How slow, you ask? How does > 24 hours to run hit you? I'm willing to try anything at this point and am even considering moving the whole process into SQL Server, but I would much rather work in C# than T-SQL.
static void Main(string[] args)
{
string statusMsg = String.Empty;
JArray sets = JArray.Parse(File.ReadAllText(@"C:\Users\Public\Downloads\ImportFile.json"));
try
{
using (var _db = new WidgetDb())
{
for (int s = 0; s < sets.Count; s++)
{
Console.WriteLine($"{s.ToString()}: {sets[s]["name"]}");
// First we create the Set
Set eSet = new Set()
{
SetCode = (string)sets[s]["code"],
SetName = (string)sets[s]["name"],
Type = (string)sets[s]["type"],
Block = (string)sets[s]["block"] ?? ""
};
_db.Entry(eSet).State = Microsoft.EntityFrameworkCore.EntityState.Added;
JArray widgets = sets[s]["widgets"].ToObject<JArray>();
for (int c = 0; c < widgets.Count; c++)
{
Widget eWidget = new Widget()
{
WidgetId = (string)widgets[c]["id"],
Layout = (string)widgets[c]["layout"] ?? "",
WidgetName = (string)widgets[c]["name"],
WidgetNames = "",
ReleaseDate = releaseDate,
SetCode = (string)sets[s]["code"]
};
// WidgetColors
if (widgets[c]["colors"] != null)
{
JArray widgetColors = widgets[c]["colors"].ToObject<JArray>();
for (int cc = 0; cc < widgetColors.Count; cc++)
{
WidgetColor eWidgetColor = new WidgetColor()
{
WidgetId = eWidget.WidgetId,
Color = (string)widgets[c]["colors"][cc]
};
_db.Entry(eWidgetColor).State = Microsoft.EntityFrameworkCore.EntityState.Added;
}
}
// WidgetTypes
if (widgets[c]["types"] != null)
{
JArray widgetTypes = widgets[c]["types"].ToObject<JArray>();
for (int ct = 0; ct < widgetTypes.Count; ct++)
{
WidgetType eWidgetType = new WidgetType()
{
WidgetId = eWidget.WidgetId,
Type = (string)widgets[c]["types"][ct]
};
_db.Entry(eWidgetType).State = Microsoft.EntityFrameworkCore.EntityState.Added;
}
}
// WidgetVariations
if (widgets[c]["variations"] != null)
{
JArray widgetVariations = widgets[c]["variations"].ToObject<JArray>();
for (int cv = 0; cv < widgetVariations.Count; cv++)
{
WidgetVariation eWidgetVariation = new WidgetVariation()
{
WidgetId = eWidget.WidgetId,
Variation = (string)widgets[c]["variations"][cv]
};
_db.Entry(eWidgetVariation).State = Microsoft.EntityFrameworkCore.EntityState.Added;
}
}
}
_db.SaveChanges();
}
}
statusMsg = "Import Complete";
}
catch (Exception ex)
{
statusMsg = ex.Message + " (" + ex.InnerException + ")";
}
Console.WriteLine(statusMsg);
Console.ReadKey();
}
I had an issue with that kind of code: lots of loops and tons of state changes.
Any change or manipulation you make through the _db context is tracked by it, and that tracking makes the context slower with each added entity.
The fix for me was to create a new EF context (_db) at some key points. It saved me a few hours per run!
You could try creating a new instance of _db on each iteration of the outer loop, the one over the "main array of 214 elements".
If that makes no difference, try adding some stopwatches to get a better idea of what is taking so long and where.
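To make the stopwatch suggestion concrete, here is a small sketch built on System.Diagnostics.Stopwatch. The `PhaseTimer` class and its `Time` method are hypothetical names introduced for illustration; the idea is simply to wrap each suspect phase (JSON parsing, entity creation, SaveChanges) and compare the timings.

```csharp
using System;
using System.Diagnostics;

static class PhaseTimer
{
    // Runs the action, prints how long it took, and returns the elapsed time.
    public static TimeSpan Time(string label, Action action)
    {
        var sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        Console.WriteLine($"{label}: {sw.ElapsedMilliseconds} ms");
        return sw.Elapsed;
    }
}
```

For example, wrapping the body of the outer loop in `PhaseTimer.Time($"set {s}", () => { ... })` would show whether each iteration gets slower as the run progresses.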
If you're making thousands of updates, then EF is not really the way to go. Something like SqlBulkCopy will do the trick.
You could try the bulkwriter library.
IEnumerable<string> ReadFile(string path)
{
using (var stream = File.OpenRead(path))
using (var reader = new StreamReader(stream))
{
while (reader.Peek() >= 0)
{
yield return reader.ReadLine();
}
}
}
var items =
from line in ReadFile(@"C:\products.csv")
let values = line.Split(',')
select new Product {Sku = values[0], Name = values[1]};
then
using (var bulkWriter = new BulkWriter<Product>(connectionString)) {
bulkWriter.WriteToDatabase(items);
}

How to avoid c# File.ReadLines First() locking file

I do not want to read the whole file at any point. I know there are answers to that question; I want to read the first or last line.
I know that my code locks the file that it's reading for two reasons 1) The application that writes to the file crashes intermittently when I run my little app with this code but it never crashes when I am not running this code! 2) There are a few articles that will tell you that File.ReadLines locks the file.
There are some similar questions, but their answers seem to involve reading the whole file, which is slow for large files and therefore not what I want to do. My requirement to read only the last line most of the time also differs from what I have read about.
I need to know how to read the first line (header row) and the last line (latest row). I do not want to read all lines at any point in my code because this file can become huge and reading the entire file will become slow.
I know that
line = File.ReadLines(fullFilename).First().Replace("\"", "");
... is the same as ...
FileStream fs = new FileStream(fullFilename, FileMode.Open, FileAccess.Read, FileShare.Read);
My question is, how can I repeatedly read the first and last lines of a file which may be being written to by another application without locking it in any way? I have no control over the application that is writing to the file. It is a data log which can be appended to at any time. The reason I am listening in this way is that this log can be appended to for days on end. I want to see the latest data in this log in my own C# programme without waiting for the log to finish being written to.
My code to call the reading / listening function ...
//Start Listening to the "data log"
private void btnDeconstructCSVFile_Click(object sender, EventArgs e)
{
MySandbox.CopyCSVDataFromLogFile copyCSVDataFromLogFile = new MySandbox.CopyCSVDataFromLogFile();
copyCSVDataFromLogFile.checkForLogData();
}
My class which does the listening. For now it simply adds the data to 2 generics lists ...
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using MySandbox.Classes;
using System.IO;
namespace MySandbox
{
public class CopyCSVDataFromLogFile
{
static private List<LogRowData> listMSDataRows = new List<LogRowData>();
static String fullFilename = string.Empty;
static LogRowData previousLineLogRowList = new LogRowData();
static LogRowData logRowList = new LogRowData();
static LogRowData logHeaderRowList = new LogRowData();
static Boolean checking = false;
public void checkForLogData()
{
//Initialise
string[] logHeaderArray = new string[] { };
string[] badDataRowsArray = new string[] { };
//Get the latest full filename (file with new data)
//Assumption: only 1 file is written to at a time in this directory.
String directory = "C:\\TestDir\\";
string pattern = "*.csv";
var dirInfo = new DirectoryInfo(directory);
var file = (from f in dirInfo.GetFiles(pattern) orderby f.LastWriteTime descending select f).First();
fullFilename = directory + file.ToString(); //This is the full filepath and name of the latest file in the directory!
if (logHeaderArray.Length == 0)
{
//Populate the Header Row
logHeaderRowList = getRow(fullFilename, true);
}
LogRowData tempLogRowList = new LogRowData();
if (!checking)
{
//Read the latest data in an asynchronous loop
callDataProcess();
}
}
private async void callDataProcess()
{
checking = true; //Begin checking
await checkForNewDataAndSaveIfFound();
}
private static Task checkForNewDataAndSaveIfFound()
{
return Task.Run(() => //Call the async "Task"
{
while (checking) //Loop (asynchronously)
{
LogRowData tempLogRowList = new LogRowData();
if (logHeaderRowList.ValueList.Count == 0)
{
//Populate the Header row
logHeaderRowList = getRow(fullFilename, true);
}
else
{
//Populate Data row
tempLogRowList = getRow(fullFilename, false);
if ((!Enumerable.SequenceEqual(tempLogRowList.ValueList, previousLineLogRowList.ValueList)) &&
(!Enumerable.SequenceEqual(tempLogRowList.ValueList, logHeaderRowList.ValueList)))
{
logRowList = getRow(fullFilename, false);
listMSDataRows.Add(logRowList);
previousLineLogRowList = logRowList;
}
}
//System.Threading.Thread.Sleep(10); //Wait for next row.
}
});
}
private static LogRowData getRow(string fullFilename, bool isHeader)
{
string line;
string[] logDataArray = new string[] { };
LogRowData logRowListResult = new LogRowData();
try
{
if (isHeader)
{
//Asign first (header) row data.
//Works but seems to block writting to the file!!!!!!!!!!!!!!!!!!!!!!!!!!!
line = File.ReadLines(fullFilename).First().Replace("\"", "");
}
else
{
//Assign data as last row (default behaviour).
line = File.ReadLines(fullFilename).Last().Replace("\"", "");
}
logDataArray = line.Split(',');
//Copy Array to Generics List and remove last value if it's empty.
for (int i = 0; i < logDataArray.Length; i++)
{
if (i < logDataArray.Length)
{
if (i < logDataArray.Length - 1)
{
//Value is not at the end, from observation, these always have a value (even if it's zero) and so we'll store the value.
logRowListResult.ValueList.Add(logDataArray[i]);
}
else
{
//This is the last value
if (logDataArray[i].Replace("\"", "").Trim().Length > 0)
{
//In this case, the last value is not empty, store it as normal.
logRowListResult.ValueList.Add(logDataArray[i]);
}
else { /*The last value is empty, e.g. "123,456,"; the final comma denotes another field but this field is empty so we will ignore it now. */ }
}
}
}
}
catch (Exception ex)
{
if (ex.Message == "Sequence contains no elements")
{ /*Empty file, no problem. The code will safely loop and then will pick up the header when it appears.*/ }
else
{
//TODO: catch this error properly
Int32 problemID = 10; //Unknown ERROR.
}
}
return logRowListResult;
}
}
}
I found the answer in a combination of other questions. One answer explained how to read from the end of a file, which I adapted so that it would read only 1 line from the end of the file. Another explained how to read the entire file without locking it (I did not want to read the entire file, but the not-locking part was useful). So now you can read the last line of the file (if it contains end of line characters) without locking it. For other end of line delimiters, just replace my 10 and 13 with your end of line character bytes...
Add the method below to public class CopyCSVDataFromLogFile
private static string Reverse(string str)
{
char[] arr = new char[str.Length];
for (int i = 0; i < str.Length; i++)
arr[i] = str[str.Length - 1 - i];
return new string(arr);
}
and replace this line ...
line = File.ReadLines(fullFilename).Last().Replace("\"", "");
with this code block ...
Int32 endOfLineCharacterCount = 0;
Int32 previousCharByte = 0;
Int32 currentCharByte = 0;
//Read the file, from the end, for 1 line, allowing other programmes to access it for read and write!
using (FileStream reader = new FileStream(fullFilename, FileMode.Open, FileAccess.Read, FileShare.ReadWrite, 0x1000, FileOptions.SequentialScan))
{
int i = 0;
StringBuilder lineBuffer = new StringBuilder();
int byteRead;
while ((-i < reader.Length) /*Belt and braces: if there were no end of line characters, reading beyond the file would give a catastrophic error here (to be avoided thus).*/
&& (endOfLineCharacterCount < 2)/*Exit Condition*/)
{
reader.Seek(--i, SeekOrigin.End);
byteRead = reader.ReadByte();
currentCharByte = byteRead;
//Exit condition: the first 2 characters we read (reading backwards remember) were an end of line pair (CR LF).
//So when we read the second end of line, we have read 1 whole line (the last line in the file)
//and we must exit now.
if (currentCharByte == 13 && previousCharByte == 10)
{
endOfLineCharacterCount++;
}
if (byteRead == 10 && lineBuffer.Length > 0)
{
line += Reverse(lineBuffer.ToString());
lineBuffer.Remove(0, lineBuffer.Length);
}
lineBuffer.Append((char)byteRead);
previousCharByte = byteRead;
}
reader.Close();
}
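For comparison, here is a more compact sketch of the same idea, not the code above: open the file with FileShare.ReadWrite so the writer is not blocked, then scan backwards from the end for the last line feed. The class and method names (`LogTail`, `ReadFirstLine`, `ReadLastLine`) are hypothetical, and the sketch assumes the log is UTF-8 (or ASCII) text with LF or CR LF line endings.

```csharp
using System;
using System.IO;
using System.Text;

static class LogTail
{
    // Reads only the first line, while still allowing other processes to read and write the file.
    public static string ReadFirstLine(string path)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        using (var reader = new StreamReader(fs))
        {
            return reader.ReadLine();
        }
    }

    // Returns the last non-empty line, or null if the file has no content.
    public static string ReadLastLine(string path)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            long end = fs.Length;
            // Skip trailing CR/LF bytes so a final newline does not produce an empty result.
            while (end > 0)
            {
                fs.Seek(end - 1, SeekOrigin.Begin);
                int b = fs.ReadByte();
                if (b != '\n' && b != '\r') break;
                end--;
            }
            if (end == 0) return null;

            // Walk backwards to the previous LF, or to the start of the file.
            long start = end;
            while (start > 0)
            {
                fs.Seek(start - 1, SeekOrigin.Begin);
                if (fs.ReadByte() == '\n') break;
                start--;
            }

            // Read just the bytes of the last line and decode them.
            var buffer = new byte[end - start];
            fs.Seek(start, SeekOrigin.Begin);
            int read = 0;
            while (read < buffer.Length)
            {
                int n = fs.Read(buffer, read, buffer.Length - read);
                if (n == 0) break;
                read += n;
            }
            return Encoding.UTF8.GetString(buffer, 0, read);
        }
    }
}
```

Reading byte by byte is slow for very long lines; reading the tail in larger chunks would be the obvious refinement, at the cost of a little more bookkeeping.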

IndexOutOfRangeException when trying to create an Object from a text file in C#

I'm currently in the middle of trying to take a '|' delimited text file and create objects from the data contained within. Example:
Name|Address|City|State|Zip|Birthday|ID|Etc.
Name2|Address2|City2|State2|Zip2|Birthday2|ID2|Etc.
The newly created object, is then added to a list of said objects and the program moves to the next line of the file by way of a while loop using .Peek() (to make sure I don't go past the end of the file).
However, when it gets to creating the second object (more specifically, the second field of the second object), it throws an Index Out Of Range Exception, and I can't for the life of me figure out why. Thank you whomever might read this!
StreamReader textIn = new StreamReader(new FileStream(path, FileMode.OpenOrCreate, FileAccess.Read));
List<Student> students = new List<Student>();
while (textIn.Peek() != -1)
{
string row = textIn.ReadLine();
MessageBox.Show(row);
string [] fields = row.Split('|');
Student temp = new Student();
try
{
temp.name = fields[0];
temp.address = fields[1];
temp.city = fields[2];
temp.state = fields[3];
temp.zipCode = Convert.ToInt32(fields[4]);
temp.birthdate = fields[5];
temp.studentID = Convert.ToInt32(fields[6]);
temp.sGPA = Convert.ToDouble(fields[7]);
}
catch
{
MessageBox.Show("IndexOutOfRangeException caught");
}
students.Add(temp);
}
textIn.Close();
First, you can't be sure that it's an IndexOutOfRangeException with your current catch block.
catch
{
MessageBox.Show("IndexOutOfRangeException caught");
}
It can be anything, maybe an exception while parsing to double. You may modify your catch block to:
catch(IndexOutOfRangeException ex)
{
MessageBox.Show(ex.Message);
}
Also, if you are going to access fields[7], it's better to check the length of the array to ensure that you got at least 8 elements.
if (fields.Length >= 8)
{
temp.name = fields[0];
....
To catch FormatException which can occur during double parsing you may add an extra catch block for:
catch (FormatException ex)
{
MessageBox.Show(ex.Message);
}
1. Check that you have all 8 fields in a line.
2. Show a message if there aren't.
3. Get the actual exception and show its message to see the real problem description.
4. Use the Double.TryParse and Int32.TryParse methods to be sure all numeric values are valid.
Also use while (!textIn.EndOfStream) instead.
try
{
int tempInt;
double tempDouble;
if (fields.Length == 8)//#1
{
temp.name = fields[0];
temp.address = fields[1];
temp.city = fields[2];
temp.state = fields[3];
if (int.TryParse(fields[4], out tempInt)) //#4
temp.zipCode = tempInt;
else
{
//..invalid value in field
}
temp.birthdate = fields[5];
if (int.TryParse(fields[6], out tempInt)) //#4
temp.studentID = tempInt;
else
{
//..invalid value in field
}
if (double.TryParse(fields[7], out tempDouble)) //#4
temp.sGPA = tempDouble;
else
{
//..invalid value in field
}
}
else //#2
{
MessageBox.Show("Invalid number of fields");
}
}
catch (Exception ex) //#3
{
MessageBox.Show(ex.Message);
}
Maybe ReadAllLines will work a bit better if the data is on each line:
List<Student> students = new List<Student>();
using (FileStream textIn = new FileStream(path, FileMode.Open, FileAccess.Read))
{
foreach (string line in File.ReadAllLines(path))
{
MessageBox.Show(line);
string[] fields = line.Split('|');
Student temp = new Student();
try
{
temp.name = fields[0];
temp.address = fields[1];
temp.city = fields[2];
temp.state = fields[3];
temp.zipCode = Convert.ToInt32(fields[4]);
temp.birthdate = fields[5];
temp.studentID = Convert.ToInt32(fields[6]);
temp.sGPA = Convert.ToDouble(fields[7]);
}
catch
{
MessageBox.Show(string.Format("Exception caught, Split Result: {0}", string.Join(", ", fields)));
}
students.Add(temp);
}
}
In the given data, if you have at least eight columns in every row, you won't get an index out of range exception, but parsing the items at index 4, 6, and 7 would fail because they are not numbers, and converting non-numeric values to int and double raises an exception.
temp.zipCode = Convert.ToInt32(fields[4]);
temp.studentID = Convert.ToInt32(fields[6]);
temp.sGPA = Convert.ToDouble(fields[7]);
You need to change the catch block to know the reason for exception
}
catch(Exception ex)
{
MessageBox.Show(ex.Message);
}

Reading a line from a streamreader without consuming?

Is there a way to read ahead one line to test if the next line contains specific tag data?
I'm dealing with a format that has a start tag but no end tag.
I would like to read a line, add it to a structure, then test the line below to make sure it is not a new "node". If it isn't, I keep adding; if it is, I close off that struct and make a new one.
The only solution I can think of is to have two stream readers going at the same time, shuffling their way along in lock step, but that seems wasteful (if it will even work).
I need something like Peek, but PeekLine.
The problem is the underlying stream may not even be seekable. If you take a look at the stream reader implementation it uses a buffer so it can implement TextReader.Peek() even if the stream is not seekable.
You could write a simple adapter that reads the next line and buffers it internally, something like this:
public class PeekableStreamReaderAdapter
{
private StreamReader Underlying;
private Queue<string> BufferedLines;
public PeekableStreamReaderAdapter(StreamReader underlying)
{
Underlying = underlying;
BufferedLines = new Queue<string>();
}
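public string PeekLine()
{
// Return the already-buffered line so repeated peeks do not advance the reader.
if (BufferedLines.Count > 0)
return BufferedLines.Peek();
string line = Underlying.ReadLine();
if (line == null)
return null;
BufferedLines.Enqueue(line);
return line;
}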
public string ReadLine()
{
if (BufferedLines.Count > 0)
return BufferedLines.Dequeue();
return Underlying.ReadLine();
}
}
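You could store the position by reading StreamReader.BaseStream.Position, then read the next line, do your test, and seek back to the position from before you read the line. Note that StreamReader reads ahead into its own buffer, so BaseStream.Position can be further along than the text you have consumed so far; this approach is only reliable when the saved position matches the reader's logical position (for example, right after calling DiscardBufferedData), and it requires a seekable stream:
// Peek at the next line
long peekPos = reader.BaseStream.Position;
string line = reader.ReadLine();
if (line != null && line.StartsWith("<tag start>"))
{
// This is a new tag, so we reset the position
reader.BaseStream.Seek(peekPos, SeekOrigin.Begin);
reader.DiscardBufferedData(); // drop the reader's buffer so it re-reads from peekPos
}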
else
{
// This is part of the same node.
}
This is a lot of seeking and re-reading the same lines. Using some logic, you may be able to avoid this altogether - for instance, when you see a new tag start, close out the existing structure and start a new one - here's a basic algorithm:
SomeStructure myStructure = null;
while (!reader.EndOfStream)
{
string currentLine = reader.ReadLine();
if (currentLine.StartsWith("<tag start>"))
{
// Close out existing structure.
if (myStructure != null)
{
// Close out the existing structure.
}
// Create a new structure and add this line.
myStructure = new SomeStructure();
// Append to myStructure.
}
else
{
// Add to the existing structure.
if (myStructure != null)
{
// Append to existing myStructure
}
else
{
// This means the first line was not part of a structure.
// Either handle this case, or throw an exception.
}
}
}
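As a runnable illustration of that algorithm, the sketch below groups lines into nodes without any look-ahead. `NodeGrouper` and `Group` are hypothetical names, a node is represented simply as a list of its lines, and lines that appear before the first start tag are skipped (one possible way to handle that case).

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class NodeGrouper
{
    // Groups lines into nodes; a new node begins at every line that starts with startTag.
    public static List<List<string>> Group(TextReader reader, string startTag)
    {
        var nodes = new List<List<string>>();
        List<string> current = null;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.StartsWith(startTag, StringComparison.Ordinal))
            {
                // A start tag implicitly closes the previous node and opens a new one.
                current = new List<string>();
                nodes.Add(current);
            }
            else if (current == null)
            {
                // Lines before the first start tag belong to no node; this sketch skips them.
                continue;
            }
            current.Add(line);
        }
        return nodes;
    }
}
```

Because it accepts any TextReader, the same method works with a StreamReader over a file or a StringReader in a quick test.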
Why the difficulty? Return the next line, regardless. Check if it is a new node, if not, add it to the struct. If it is, create a new struct.
// Not exactly C# but close enough
Collection structs = new Collection();
Struct current = null; // "struct" itself is a reserved word in C#
while ((line = readline()) != null) {
if (IsNode(line)) {
if (current != null) structs.add(current);
current = new Struct();
continue;
}
// Whatever processing you need to do
if (current != null) current.addLine(line);
}
if (current != null) structs.add(current); // Add the last one to the collection
// Use your structures here
foreach (var s in structs) {
}
Here is what I've got so far. I went more of the split route than the StreamReader line-by-line route.
I'm sure there are a few places that are dying to be more elegant, but for right now it seems to be working.
Please let me know what you think.
struct INDI
{
public string ID;
public string Name;
public string Sex;
public string BirthDay;
public bool Dead;
}
struct FAM
{
public string FamID;
public string type;
public string IndiID;
}
List<INDI> Individuals = new List<INDI>();
List<FAM> Family = new List<FAM>();
private void button1_Click(object sender, EventArgs e)
{
string path = @"C:\mostrecent.ged";
ParseGedcom(path);
}
private void ParseGedcom(string path)
{
//Open path to GED file
StreamReader SR = new StreamReader(path);
//Read entire block and then split on 0 @ for individuals and families (no other info is needed for this instance)
string[] Holder = SR.ReadToEnd().Replace("0 @", "\u0646").Split('\u0646');
//For each new cell in the holder array look for Individuals and familys
foreach (string Node in Holder)
{
//Sub Split the string on the returns to get a true block of info
string[] SubNode = Node.Replace("\r\n", "\r").Split('\r');
//If a individual is found
if (SubNode[0].Contains("INDI"))
{
//Create new Structure
INDI I = new INDI();
//Add the ID number and remove extra formating
I.ID = SubNode[0].Replace("@", "").Replace(" INDI", "").Trim();
//Find the name remove extra formating for last name
I.Name = SubNode[FindIndexinArray(SubNode, "NAME")].Replace("1 NAME", "").Replace("/", "").Trim();
//Find Sex and remove extra formating
I.Sex = SubNode[FindIndexinArray(SubNode, "SEX")].Replace("1 SEX ", "").Trim();
//Determine if there is a birthday; -1 means no
if (FindIndexinArray(SubNode, "1 BIRT ") != -1)
{
// add birthday to Struct
I.BirthDay = SubNode[FindIndexinArray(SubNode, "1 BIRT ") + 1].Replace("2 DATE ", "").Trim();
}
// Determine if there is a death tag; the search returns -1 if not found
if (FindIndexinArray(SubNode, "1 DEAT ") != -1)
{
//convert Y or N to true or false ( defaults to False so no need to change unless Y is found.
if (SubNode[FindIndexinArray(SubNode, "1 DEAT ")].Replace("1 DEAT ", "").Trim() == "Y")
{
//set death
I.Dead = true;
}
}
//add the Struct to the list for later use
Individuals.Add(I);
}
// Start Family section
else if (SubNode[0].Contains("FAM"))
{
//grab Fam id from node early on to keep from doing it over and over
string FamID = SubNode[0].Replace("@ FAM", "");
// Multiple children can exist for each family so this section had to be a bit more dynaimic
// Look at each line of node
foreach (string Line in SubNode)
{
// If node is HUSB
if (Line.Contains("1 HUSB "))
{
FAM F = new FAM();
F.FamID = FamID;
F.type = "PAR";
F.IndiID = Line.Replace("1 HUSB ", "").Replace("@","").Trim();
Family.Add(F);
}
//If node for Wife
else if (Line.Contains("1 WIFE "))
{
FAM F = new FAM();
F.FamID = FamID;
F.type = "PAR";
F.IndiID = Line.Replace("1 WIFE ", "").Replace("@", "").Trim();
Family.Add(F);
}
//if node for multi children
else if (Line.Contains("1 CHIL "))
{
FAM F = new FAM();
F.FamID = FamID;
F.type = "CHIL";
F.IndiID = Line.Replace("1 CHIL ", "").Replace("@", "");
Family.Add(F);
}
}
}
}
}
private int FindIndexinArray(string[] Arr, string search)
{
int Val = -1;
for (int i = 0; i < Arr.Length; i++)
{
if (Arr[i].Contains(search))
{
Val = i;
}
}
return Val;
}
