First, let me start by thanking you all for being part of this site; I have already gained so much helpful information from it, including some basic parsing of text files into arrays, but I now want to go a step further.
I have a text file that looks something like this:
Start Section 1 - foods
apple
bannana
pear
pineapple
orange
end section 1
Start section 2 - animals
dog
cat
horse
cow
end section 2
What I want to do is, using a single read of the file, copy the data from section 1 into an array called "foods" and section 2 into an array called "animals".
I can get it to work by using a new loop for each section, closing and reopening the file each time, looping until I find the section I want and creating the array.
But I was thinking there must be a way to read each section into a separate array in one go, saving time.
So my current code is:
List<string> typel = new List<string>();
using (StreamReader reader = new StreamReader("types.txt")) // opens the file with a StreamReader
{
    string line; // holds the current line
    while ((line = reader.ReadLine()) != null) // loops until the end of the file is reached
    {
        typel.Add(line); // adds the line to the list variable "typel"
    }
}
Console.WriteLine(typel[1]); // test to see if the list is being populated
string[] type = typel.ToArray(); // converts the list to a true array
Console.WriteLine(type.Length); // prints the number of elements in the array
This works for a simple text file with no sections, just a list of values; using a List seemed a good way to deal with unknown array lengths.
I was also wondering how to deal with the first value.
For example, if I do
while ((line = reader.ReadLine()) != "Start Section 1 - foods")
{
}
while ((line = reader.ReadLine()) != "end section 1")
{
    foods.Add(line);
}
...
....
I end up with "Start Section 1 - foods" as one of the array elements. I can remove it with code, but is there an easy way to avoid this so that only the list items get populated?
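Just to show what I mean by removing it with code, my current workaround is something like this (simplified):

// strip the header line out after the fact
if (foods.Count > 0 && foods[0].StartsWith("Start Section"))
{
    foods.RemoveAt(0);
}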
Cheers, and once again thanks for all the help. It's great to be getting back into programming after many, many years.
Aaron
Reading the lines is not the issue; see System.IO.File.ReadAllLines(fileName) and its siblings.
What you need is a (very simple) interpreter:
// totally untested
Dictionary<string, List<string>> sections = new Dictionary<string, List<string>>();
List<string> section = null;
foreach (string line in GetLines())
{
    if (IsSectionStart(line))
    {
        string name = GetSectionName(line);
        section = new List<string>();
        sections.Add(name, section);
    }
    else if (IsSectionEnd(line))
    {
        section = null; // invite exception when we're lost
    }
    else
    {
        section.Add(line);
    }
}
...
List<string> foods = sections["foods"];
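The helper methods are left to the reader; as a rough, untested sketch, they could look something like this for the format shown in the question (the bodies are only a guess at that format):

// rough sketches of the helpers used above, assuming the exact file layout from the question
static IEnumerable<string> GetLines()
{
    return File.ReadLines("types.txt");
}

static bool IsSectionStart(string line)
{
    return line.StartsWith("Start section", StringComparison.OrdinalIgnoreCase);
}

static bool IsSectionEnd(string line)
{
    return line.StartsWith("end section", StringComparison.OrdinalIgnoreCase);
}

static string GetSectionName(string line)
{
    // take whatever follows the dash, e.g. "foods" from "Start Section 1 - foods"
    return line.Split('-')[1].Trim();
}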
Look for the markers for the start and end of each section. That is where you start putting things into arrays, lists, etc.
Here is a stab at making it very flexible:
class Program
{
    private static Dictionary<string, List<string>> _arrayLists = new Dictionary<string, List<string>>();

    static void Main(string[] args)
    {
        string filePath = "c:\\logs\\arrays.txt";
        using (StreamReader reader = new StreamReader(filePath))
        {
            string line;
            string category = "";
            while (null != (line = reader.ReadLine()))
            {
                if (line.ToLower().Contains("start"))
                {
                    string[] splitHeader = line.Split("-".ToCharArray());
                    category = splitHeader[1].Trim();
                }
                else
                {
                    if (!_arrayLists.ContainsKey(category))
                    {
                        List<string> stringList = new List<string>();
                        _arrayLists.Add(category, stringList);
                    }
                    if (!line.ToLower().Contains("end") && line.Trim().Length > 0)
                    {
                        _arrayLists[category].Add(line.Trim());
                    }
                }
            }
        }

        // testing
        foreach (var keyValue in _arrayLists)
        {
            Console.WriteLine("Category: {0}", keyValue.Key);
            foreach (var value in keyValue.Value)
            {
                Console.WriteLine("{0}".PadLeft(5, ' '), value);
            }
        }
        Console.Read();
    }
}
To add to the other answers, if you don't want to parse the text file yourself, you could always use a quick and dirty regular expression if you're comfortable with them:
var regex = new Regex(@"Start Section \d+ - (?<section>\w+)\r\n(?<list>[\w\s]+)End Section", RegexOptions.IgnoreCase);
var data = new Dictionary<string, List<string>>();
foreach (Match match in regex.Matches(File.ReadAllText("types.txt")))
{
    string section = match.Groups["section"].Value;
    string[] items = match.Groups["list"].Value.Split(new string[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries);
    data.Add(section, new List<string>(items));
}
// data["animals"] now contains a list of "dog", "cat", "horse", and "cow"
In response to the comment:
but "list" sounds so simple and basic (like I am going shopping), array has a much nicer ring to it ;) But I will look into them maybe a bit more, I got the impression from my research that arrays are more efficient code?
It's not about whether a list vs. an array is "basic" or "has a nicer ring"; it's about the purpose of the code. In your case, you're iterating a file line by line and adding items to a collection whose size is unknown beforehand, which is exactly the problem a list was designed to solve. Of course you could peek through the file and determine the exact size, but is doing that worth the extra "efficiency" you get from using an array, and is iterating the file twice going to take longer than using a list in the first place? You don't know unless you profile your code and conclude that this specific portion is a bottleneck... which, I'll say, will almost never be the case.
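To make the trade-off concrete, here is a rough sketch of what the "size the array exactly" version looks like next to the List version (file name taken from the question; the two-pass version needs System.Linq for Count()):

// Two passes over the file: count the lines first, then fill a fixed-size array.
int count = File.ReadLines("types.txt").Count();
string[] type = new string[count];
int i = 0;
foreach (string line in File.ReadLines("types.txt"))
{
    type[i++] = line;
}

// One pass: let List<string> grow as needed, converting only if an array is really required.
List<string> typeList = new List<string>(File.ReadLines("types.txt"));
string[] typeArray = typeList.ToArray();

Unless profiling shows the extra copy is actually a bottleneck, the second form is the one to reach for.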
Uhmmm, like this?
// read all lines into an array called allLines, one element per line
string[] allLines = File.ReadAllLines("types.txt");
// get the indexes of the lines that contain "Start Section 1" and "end section 1"
int[] getIndexes = new int[] { Array.FindIndex(allLines, start => start.Contains("Start Section 1")), Array.FindIndex(allLines, start => start.Contains("end section 1")) };
// create a list to hold the indexes of the list items (apple, banana, pear, etc.)
List<int> indexOfList = new List<int>();
// collect the indexes of the list items
for (int i = getIndexes[0]; i < getIndexes[1]; i++)
{
    indexOfList.Add(i);
}
// remove the index of the "Start Section 1" line itself
indexOfList.RemoveAt(0);
// build the final list
string[] foodList = new string[indexOfList.Count];
for (int i = 0; i < indexOfList.Count; i++)
{
    foodList[i] = allLines[indexOfList[i]];
}
Then you can print them, or edit them and save.
// print them
Console.WriteLine(foodList[0] + "\n" + foodList[1]);
// edit the list
allLines[indexOfList[0]] = "chicken"; // from apple to chicken
allLines[indexOfList[1]] = "egg";     // from banana to egg
// save the lines
File.WriteAllLines("types.txt", allLines);
Related
In my C# program, I'm trying to read data from a text file into an array. I've looked at many answers on here and can't figure out what's wrong with my code. Whenever I run it, I get an unhandled exception of type 'System.IndexOutOfRangeException', but I don't understand why the index would be out of range.
Here's what I have:
string[] namesArray = new string[] { };
using (var sr = new StreamReader(mydocpath + @"\nameIDLines.txt"))
{
    for (int i = 1; i < numLines; i++)
    {
        nameIDLine = sr.ReadLine();
        nameIDLine = nameIDLine.Split(new string[] { "ID" }, StringSplitOptions.None)[0];
        namesArray[i] = nameIDLine;
    }
}
When you do this:
string[] namesArray = new string[] { };
You're initializing an array with length 0. Therefore, when you do namesArray[i], it will always be out of range regardless of i.
Since you know the number of lines, and therefore the number of items, you can initialize the array with that number:
string[] namesArray = new string[numLines];
As @stybl pointed out, the problem is that you are initializing a zero-length array to store your lines... and that will always lead to an "index out of range" exception. On top of that, I wonder how you are managing to get a consistent numLines value while using a StreamReader to parse the file... this is risky, since you must be sure that the file always contains enough lines.
I suggest using the following approach instead:
String[] lines = File.ReadAllLines(mydocpath + @"\nameIDLines.txt");
If you want to take a specific number of lines from the file, you can proceed as follows:
String[] lines = File.ReadAllLines(mydocpath + @"\nameIDLines.txt").Take(numLines).ToArray();
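(Note that Take needs a using System.Linq; directive.) If you do want to keep the StreamReader and a known line count, a more defensive version of the original loop might look roughly like this (untested, reusing mydocpath and numLines from the question):

// Defensive sketch: size the array up front and stop early if the file is shorter than expected.
string[] namesArray = new string[numLines];
using (var sr = new StreamReader(mydocpath + @"\nameIDLines.txt"))
{
    for (int i = 0; i < numLines; i++)
    {
        string nameIDLine = sr.ReadLine();
        if (nameIDLine == null)
        {
            break; // the file had fewer lines than numLines
        }
        namesArray[i] = nameIDLine.Split(new string[] { "ID" }, StringSplitOptions.None)[0];
    }
}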
I'm sorry in advance if it's bad to ask for this sort of help... but I don't know who else to ask.
I have an assignment to read two text files and find the 10 longest words in the first file (and the number of times they're repeated) which don't exist in the second file.
I currently read both files with File.ReadAllText, then split them into arrays where every element is a single word (with punctuation marks removed) and remove empty entries.
The idea I had for picking out the words that fit the requirements was: make a dictionary containing a string Word and an int Count, then loop over the first file's words. First compare each element with the entire dictionary; if there's a match, increase Count by 1. If it doesn't match any dictionary element, compare it with every element of the 2nd file in another loop: if there's a match, just move on to the next element of the first file; if there's no match, add the word to the dictionary and set Count to 1.
So my first question is: is this actually the most efficient way to do this? (Don't forget I've only recently started studying C# and am not allowed to use LINQ.)
Second question: how do I work with the dictionary? Most of the results I could find were very confusing, and we have not yet covered dictionaries at university.
My code so far:
// Reading and making all the words lowercase for comparisons
string punctuation = " ,.?!;:\"\r\n";
string Read1 = File.ReadAllText("#\\..\\Book1.txt");
Read1 = Read1.ToLower();
string Read2 = File.ReadAllText("#\\..\\Book2.txt");
Read2 = Read2.ToLower();

// Working with the 1st file
string[] FirstFileWords = Read1.Split(punctuation.ToCharArray());
var temp1 = new List<string>();
foreach (var word in FirstFileWords)
{
    if (!string.IsNullOrEmpty(word))
        temp1.Add(word);
}
FirstFileWords = temp1.ToArray();
Array.Sort(FirstFileWords, (x, y) => y.Length.CompareTo(x.Length));

// Working with the 2nd file
string[] SecondFileWords = Read2.Split(punctuation.ToCharArray());
var temp2 = new List<string>();
foreach (var word in SecondFileWords)
{
    if (!string.IsNullOrEmpty(word))
        temp2.Add(word);
}
SecondFileWords = temp2.ToArray();
Well, I think you've done very well so far. Not being able to use LINQ here is torture ;)
As for performance, you should consider making your SecondFileWords a HashSet<string>, as this would speed up the check for whether a word exists in the 2nd file tremendously, without much effort. I wouldn't go much further in terms of performance optimization for an exercise like this if performance is not a key requirement.
Of course, you would have to check that you don't add duplicates to your 2nd list, so change your current implementation to something like:
HashSet<string> temp2 = new HashSet<string>();
foreach (var word in SecondFileWords)
{
    if (!string.IsNullOrEmpty(word) && !temp2.Contains(word))
    {
        temp2.Add(word);
    }
}
Don't convert this back to an array again; it isn't necessary.
This brings me back to your FirstFileWords, which would contain duplicates too. That will cause issues later on, when the top words might contain the same word multiple times. So let's get rid of them. Here it's more complicated, as you need to retain the information about how often each word appeared in your first list.
So let's bring a Dictionary<string, int> into play now. A Dictionary stores a lookup key, like the HashSet, but in addition also a value. We will use the key for the word, and the value for a number holding how often the word appeared in the first list.
Dictionary<string, int> temp1 = new Dictionary<string, int>();
foreach (var word in FirstFileWords)
{
    if (string.IsNullOrEmpty(word))
    {
        continue;
    }
    if (temp1.ContainsKey(word))
    {
        temp1[word]++;
    }
    else
    {
        temp1.Add(word, 1);
    }
}
Now, a dictionary cannot be sorted, which complicates things at this point, as you still need to get your sorting by word length done. So let's get back to your Array.Sort method, which I think is a good choice when you are not allowed to use LINQ:
KeyValuePair<string, int>[] firstFileWordsWithCount = temp1.ToArray();
Array.Sort(firstFileWordsWithCount, (x, y) => y.Key.Length.CompareTo(x.Key.Length));
Note: You are using .ToArray() in your example, so I think it's OK to use it here. But strictly speaking, this would also fall under using LINQ, IMHO.
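If you want to stay strictly LINQ-free, a hand-written copy works just as well; a small sketch:

// Copy the dictionary entries into an array manually (no LINQ involved).
KeyValuePair<string, int>[] firstFileWordsWithCount = new KeyValuePair<string, int>[temp1.Count];
int index = 0;
foreach (KeyValuePair<string, int> pair in temp1)
{
    firstFileWordsWithCount[index] = pair;
    index++;
}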
Now all that's left is working through your firstFileWordsWithCount array until you have 10 words that do not exist in the HashSet temp2. Something like:
int foundWords = 0;
foreach (KeyValuePair<string, int> candidate in firstFileWordsWithCount)
{
    if (!temp2.Contains(candidate.Key))
    {
        Console.WriteLine($"{candidate.Key}: {candidate.Value}");
        foundWords++;
    }
    if (foundWords >= 10)
    {
        break;
    }
}
If anything is unclear, just ask.
This is what you'll get when using dictionaries:
string File1 = "AMD Intel Skylake Processors Graphics Cards Nvidia Architecture Microprocessor Skylake SandyBridge KabyLake";
string File2 = "Graphics Nvidia";
Dictionary<string, int> Dic = new Dictionary<string, int>();

string[] File1Array = File1.Split(" ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
Array.Sort(File1Array, (s1, s2) => s2.Length.CompareTo(s1.Length));
foreach (string s in File1Array)
{
    if (Dic.ContainsKey(s))
    {
        Dic[s]++;
    }
    else
    {
        Dic.Add(s, 1);
    }
}

string[] File2Array = File2.Split(" ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
foreach (string s in File2Array)
{
    if (Dic.ContainsKey(s))
    {
        Dic.Remove(s);
    }
}

int i = 0;
foreach (KeyValuePair<string, int> kvp in Dic)
{
    i++;
    Console.WriteLine(kvp.Key + " " + kvp.Value);
    if (i == 10) // stop after the 10 longest words
    {
        break;
    }
}
My earlier attempt used LINQ, which is apparently not allowed, but I missed that.
string[] Results = File1.Split(" ".ToCharArray()).Except(File2.Split(" ".ToCharArray())).OrderByDescending(s => s.Length).Take(10).ToArray();
for (int i = 0; i < Results.Length; i++)
{
    Console.WriteLine(Results[i] + " " + Regex.Matches(File1, Results[i]).Count);
}
So I have a dictionary whose key is an int and whose value is a class that contains a list of doubles. The class is built like this:
public class MyClass
{
    public List<double> MyList = new List<double>();
}
and the dictionary is built like this:
public static Dictionary<int, MyClass> MyDictionary = new Dictionary<int, MyClass>();
I populate the dictionary by reading a file line by line, splitting each line into a string array (of which there is a known number of parts, 100), adding the pieces into the list, and finally adding the list into the dictionary. Here's what that looks like:
public void DictionaryFiller(string filePath) // path of the file to read
{
    string LineFromFile;
    string[] splitstring;
    int LineNumber = 0;
    StreamReader sr = new StreamReader(filePath);
    while (!sr.EndOfStream)
    {
        LineFromFile = sr.ReadLine();
        splitstring = LineFromFile.Split(',');

        MyClass newClass = new MyClass();
        for (int i = 1; i < 100; i++)
        {
            newClass.MyList.Add(Convert.ToDouble(splitstring[i]));
        }
        MyDictionary.Add(LineNumber, newClass);
        LineNumber++;
    }
}
My question is this: if I were to then read another file and run the DictionaryFiller method again, could I add the new values to each item in the list for each value in the dictionary? What I mean is: say the first file's 1st line started with 10,23,15,... Now, when I read in a second file, let's say its first line begins with 10,13,18,... What I'd like to happen is for the dictionary's first value-list (indexed at 0) to then hold 20,36,33,...
I'd like to be able to add terms for any number of files read in, and ultimately take their average by going through the dictionary again (in a separate method) and dividing each term in the value-list by the number of files read in. Is this possible to do? Thanks for any advice you have; I'm a novice programmer and any help is appreciated.
Just replace
newClass.MyList.Add(Convert.ToDouble(splitstring[i]));
with
newClass.MyList.Add(Convert.ToDouble(splitstring[i]) + MyDictionary[LineNumber].MyList[i - 1]);
and then replace
MyDictionary.Add(LineNumber, newClass);
with
MyDictionary[LineNumber] = newClass;
Just make sure that MyDictionary already contains LineNumber (and that position in its list) before reading from it :)
Something like this would work:
if (!MyDictionary.ContainsKey(LineNumber))
{
    MyDictionary.Add(LineNumber, new MyClass()); // nothing to add onto yet for this line
}
if (MyDictionary[LineNumber].MyList.Count < i)
{
    MyDictionary[LineNumber].MyList.Add(0); // treat a missing element as 0
}
My solution doesn't care about list size, and it is done at reading time rather than afterward, which should be more efficient than traversing your Dictionary twice.
var current = MyDictionary[key];
for (int i = 0; i < current.MyList.Count; i++)
{
    current.MyList[i] = current.MyList[i] + newData[i];
}
This assumes both lists have the same length and type of data.
You can get the custom object by its key in the dictionary and then use its list for any operation. You need to keep track of how many files have been read separately.
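As a rough sketch of that idea (the fileCount field and the two method names here are purely illustrative, not part of your existing code):

// Hypothetical sketch: accumulate per-position sums while reading, then average at the end.
private static int fileCount = 0; // increment this once per file you read

private static void AddOrAccumulate(int lineNumber, List<double> values)
{
    if (MyDictionary.ContainsKey(lineNumber))
    {
        MyClass existing = MyDictionary[lineNumber];
        for (int i = 0; i < values.Count && i < existing.MyList.Count; i++)
        {
            existing.MyList[i] += values[i]; // add the new file's values onto the running totals
        }
    }
    else
    {
        MyClass fresh = new MyClass();
        fresh.MyList.AddRange(values);
        MyDictionary.Add(lineNumber, fresh);
    }
}

private static void AverageAll()
{
    foreach (MyClass entry in MyDictionary.Values)
    {
        for (int i = 0; i < entry.MyList.Count; i++)
        {
            entry.MyList[i] /= fileCount; // divide each running total by the number of files read
        }
    }
}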
I have a text file into which I am trying to insert a new line of data. Using my linked lists I believe I can avoid having to take all the data out, sort it, and then write it into a new text file.
What I did was come up with the code below. I set my bools, but it is still not working. I went through the debugger, and what seems to be going on is that it goes through the entire list (which is about 10,000 lines) without finding anything to be true, so it never inserts my record.
What is wrong with this code?
List<string> lines = new List<string>(File.ReadAllLines("Students.txt"));
using (StreamReader inFile = new StreamReader("Students.txt", true))
{
    string newLastName = "'Constant";
    string newRecord = "(LIST (LIST 'Constant 'Malachi 'D ) '1234567890 'mdcant@mail.usi.edu 4.000000 )";
    string line;
    string lastName;
    bool insertionPointFound = false;
    for (int i = 0; i < lines.Count && !insertionPointFound; i++)
    {
        line = lines[i];
        if (line.StartsWith("(LIST (LIST "))
        {
            string[] values = line.Split(" ".ToCharArray());
            lastName = values[2];
            if (newLastName.CompareTo(lastName) < 0)
            {
                lines.Insert(i, newRecord);
                insertionPointFound = true;
            }
        }
    }
    if (!insertionPointFound)
    {
        lines.Add(newRecord);
    }
}
You're just reading the file into memory and not committing it anywhere.
I'm afraid that you're going to have to load and completely re-write the entire file. Files support appending, but they don't support insertions.
You can write to a file the same way that you read from it:
string[] lines;
// instantiate and build `lines`
File.WriteAllLines("path", lines);
WriteAllLines also takes an IEnumerable<string>, so you can pass a List<string> in there if you want.
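For example (assuming the List<string> from your question is called lines):

List<string> lines = new List<string>(File.ReadAllLines("Students.txt"));
// ... insert the new record into the list here ...
File.WriteAllLines("Students.txt", lines); // WriteAllLines accepts any IEnumerable<string>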
One more issue: it appears as though you're reading your file twice, once with ReadAllLines and again with your StreamReader.
There are at least four possible errors:
1. The opening of the StreamReader is not required; you have already read all the lines. (Well, not really an error, but...)
2. The check for StartsWith can be fooled if your lines start with blank space, and you will miss the insertion point. (Adding a Trim will remove any problem here.)
3. In the CompareTo line you check for < 0, but you should check for == 0. CompareTo returns 0 if the strings are equivalent; however...
4. To check whether two strings are equal, you should avoid using CompareTo, as explained in the MSDN documentation, and use string.Equals instead.
List<string> lines = new List<string>(File.ReadAllLines("Students.txt"));
string newLastName = "'Constant";
string newRecord = "(LIST (LIST 'Constant 'Malachi 'D ) '1234567890 'mdcant@mail.usi.edu 4.000000 )";
string line;
string lastName;
bool insertionPointFound = false;
for (int i = 0; i < lines.Count && !insertionPointFound; i++)
{
    line = lines[i].Trim();
    if (line.StartsWith("(LIST (LIST "))
    {
        string[] values = line.Split(" ".ToCharArray());
        lastName = values[2];
        if (newLastName.Equals(lastName))
        {
            lines.Insert(i, newRecord);
            insertionPointFound = true;
        }
    }
}
if (!insertionPointFound)
    lines.Add(newRecord);
I don't list the missing write back to the file as an error. I hope you have just omitted that part of the code; otherwise it is a very simple problem.
(However, I think that the way in which CompareTo is used is probably the main reason for your problem.)
EDIT: Looking at your comment below, it seems that the answer from Sam I Am is the right one for you. Of course you need to write back the modified array of lines. All the changes are made to an in-memory array of lines, and nothing is written back to a file if you don't have code that writes a file. However, you don't need a new file:
File.WriteAllLines("Students.txt", lines);
I've dug around a lot on this one and not found quite what I was looking for.
INPUT: Multiple lines of ASCII text (in the hundreds, occasionally thousands), ranging from 97 characters long to over 500. The criterion for whether I want to keep a line or not is purely contained in its first 3 characters (always numbers; the arbitrary values 100, 200 and 300 are the ones I'm interested in).
The output required is only those lines that start with 100, 200 or 300; the rest I can ignore.
This is what I have as my StreamReader, which currently outputs to the console:
using System;
using System.Collections.Generic;
using System.IO;

class Program
{
    public void Do()
    {
        // Read in a file line-by-line, and store it in a List.
        List<string> list = new List<string>();
        using (StreamReader reader = new StreamReader("File.dat"))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                list.Add(line);          // Add to list.
                Console.WriteLine(line); // Write to console.
                // Console.ReadLine();
            }
        }
    }
}
I was hoping to put in a line that says something like

IF { FIRST3CHAR != (100, 200, 300) } then skip

but I'm unsure how to define the FIRST3CHAR check. This is the only filter that will be applied to the raw data.
I will afterwards, be analysing this filtered data set based on other criteria contained within, but I'll give that a shot myself before asking for any assistance.
This code is more readable and does what you want:
var allowedNumbers = new[] { "100", "200", "300" };
IEnumerable<String> lines = File
    .ReadLines("File.dat")
    .Where(l => allowedNumbers.Any(num => l.StartsWith(num)));
Now you can enumerate the lines, for example with a foreach:
foreach (string line in lines)
{
    Console.WriteLine(line); // Write to console.
}
Since you want to add those lines to a List<string> anyway, you can use Enumerable.ToList instead of the foreach:
List<string> list = lines.ToList();
At the simplest level:
if (line.StartsWith("100") || line.StartsWith("200") || line.StartsWith("300"))
{
    list.Add(line);          // Add to list.
    Console.WriteLine(line); // Write to console.
}
If the file is huge (as in, hundreds of thousands of lines), it might also be worth looking at implementing it as an iterator block. But the "starts" test is pretty simple.
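A minimal sketch of that iterator-block idea (the method name is just illustrative):

// Streams matching lines one at a time instead of buffering the whole file in a List.
static IEnumerable<string> FilteredLines(string path)
{
    using (StreamReader reader = new StreamReader(path))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.StartsWith("100") || line.StartsWith("200") || line.StartsWith("300"))
            {
                yield return line;
            }
        }
    }
}

You would then foreach over FilteredLines("File.dat") exactly as you do with the list today.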
If you need more flexibility, I would consider a regex; for example:
static readonly Regex re = new Regex("^[123]00", RegexOptions.Compiled);
...
while (...)
{
    if (re.IsMatch(line))
    {
        list.Add(line);          // Add to list.
        Console.WriteLine(line); // Write to console.
    }
}
Is there a reason why you don't just add this condition to your loop?
while ((line = reader.ReadLine()) != null)
{
    var beginning = line.Substring(0, 3);
    if (beginning != "100" && beginning != "200" && beginning != "300")
        continue;

    list.Add(line);          // Add to list.
    Console.WriteLine(line); // Write to console.
}