How to skip txt file chunks - c#

How do I skip reading the file at the red boxes only to continue reading the file at the blue boxes? What adjustments would I need to make to 'fileReader'?
So far, with the help of SO users, I've been able to successfully skip the first 8 lines (first red box) and read the rest of the file. But now I want to read ONLY the parts indicated in blue.
I'm thinking of making a method for each chunk in blue: start by skipping the first 8 lines of the file for the first blue box, about 23 for the next blue box, and so on. Ending the file reader at the right place is where I'm having problems; I simply don't know what to use.
private void button1_Click(object sender, EventArgs e)
{
// Reading/Inputting column values
OpenFileDialog ofd = new OpenFileDialog();
if (ofd.ShowDialog() == System.Windows.Forms.DialogResult.OK)
{
string[] lines = File.ReadAllLines(ofd.FileName).Skip(8).ToArray();
textBox1.Lines = lines;
int[] pos = new int[3] {0, 6, 18}; // start positions of the columns to read
int[] len = new int[3] {6, 12, 28}; // column lengths; only doing 3 columns right now
foreach (string line in textBox1.Lines)
{
for (int j = 0; j < 3; j++) // 3 columns
{
val[j] = line.Substring(pos[j], len[j]).Trim();
list.Add(val[j]); // column values stored in list
}
}
}
}

Try something like this:
using System.Text.RegularExpressions; //add this using
foreach (string line in lines)
{
string[] tokens = Regex.Split(line.Trim(), " +");
int seq = 0;
DateTime dt;
if(tokens.Length > 0 && int.TryParse(tokens[0], out seq))
{
// parse this line - 1st type
}
else if (tokens.Length > 0 && DateTime.TryParse(tokens[0], out dt))
{
// parse this line - 2nd type
}
// else - don't parse the line
}
The Regex split is handy for breaking the line on any run of spaces up to the next token. The Regex " +" means match one or more spaces, so each run of spaces acts as a single separator between tokens. Based on your example, you only want to parse lines that begin with a number or a date, which this should accomplish. Note that I trimmed the line of leading and trailing spaces so that you don't split on any of those and get empty string tokens.

As far as I can see, you want to read everything:
(1) from a line ending with Numerics (possibly one line after it)
(2) until a line starting with 0Total (that is a zero, right?);
(3) from a line ending with CURREN
(4) until a line with 1 as the first symbol in the row.
That shouldn't be hard. Read the file line by line. When (1) or (3) occurs, start collecting lines until (2) or (4) occurs, correspondingly.
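The start/stop idea above can be sketched as a small state machine. This is only a sketch: the marker strings (Numerics, 0Total, CURREN, a leading 1) are taken from the description above and may need adjusting to the real file, and ChunkReader.ReadChunks is a hypothetical helper name, not something from the original post.

```csharp
using System;
using System.Collections.Generic;

public static class ChunkReader
{
    // Hypothetical helper: yields only the lines that fall between a start
    // marker and its matching end marker. The marker lines themselves are skipped.
    public static IEnumerable<string> ReadChunks(IEnumerable<string> lines)
    {
        string endMarker = null; // null means "currently outside a chunk"

        foreach (string raw in lines)
        {
            string line = raw.TrimEnd();

            if (endMarker == null)
            {
                // Assumed start conditions (1) and (3).
                if (line.EndsWith("Numerics", StringComparison.Ordinal))
                    endMarker = "0Total";
                else if (line.EndsWith("CURREN", StringComparison.Ordinal))
                    endMarker = "1";
                continue;
            }

            // Assumed end conditions (2) and (4).
            if (line.StartsWith(endMarker, StringComparison.Ordinal))
            {
                endMarker = null;
                continue;
            }

            yield return line;
        }
    }
}
```

In the click handler this could replace the Skip(8) call, for example textBox1.Lines = ChunkReader.ReadChunks(File.ReadLines(ofd.FileName)).ToArray();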

Related

C# - Reading from CSV file producing two entries at once?

I am trying to read in a list of several hundred thousand values from what once was a spreadsheet, but for the sake of simplicity, I have turned into a CSV file.
My problem is that while testing it to make sure it reads properly, the code is for some reason ignoring the comma after the second position, and combining the value in that spot with the value next to it, despite being, you know, separated by a comma. It also begins to combine the final value with the first value from the next set.
For Example:
CSV File:
0,0,0,104672
0,1,6,51971
0,1,36,80212
0,2,5,51972
0,2,13,51973
...
Program Output:
0
00
1046720
00
16
519710
136
...
I think the example probably does a better job describing what's going on than I did in words. It continues like that, displaying the wrong information until it reaches the end of the file.
My code is as follows:
static void Main()
{
using(var fs = File.OpenRead(@"C:\path\to\file.csv"))
using(var read = new StreamReader(fs))
{
while (!read.EndOfStream)
{
int i = 0;
var line = read.ReadLine();
while (i < 4)
{
var values = line.Split(',');
Console.Write(values[i]);
Console.Read();
i++;
}
}
}
}
EDIT: Sorry, I got lost in my understanding of what the code should do and forgot to explain the goal here.
This program is made to take these values and rename a file from the 4th value (for example, 104672) to the first three values, separated by dashes (ex. 0-0-0). What I want from my output right now is to be able to see the program give me the values back, one at a time, so that I know when I go to rename the files, I'm not getting improper results.
EDIT 2: I also realize, a day later, that the answer I got was one of significance to making my program work, rather than actually discovering why I was getting the output I got. For those curious in the future, the answer is essentially that Console.Read(); is not a true pause, and causes more writes to happen upon key press than expected.
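To illustrate why Console.Read() is not a true pause, here is a small sketch that feeds the console a simulated Enter key press. Console.SetIn with a StringReader stands in for the keyboard (an assumption made purely for the demonstration), and ConsoleReadDemo is a hypothetical name. On Windows, Enter puts two characters, \r and \n, into the input, so two consecutive Console.Read() calls return immediately.

```csharp
using System;
using System.IO;

public static class ConsoleReadDemo
{
    // Calls Console.Read() three times against simulated input and returns
    // the character codes that came back (-1 means no more input).
    public static int[] ReadThree(string simulatedInput)
    {
        TextReader original = Console.In;
        try
        {
            Console.SetIn(new StringReader(simulatedInput));
            return new[] { Console.Read(), Console.Read(), Console.Read() };
        }
        finally
        {
            Console.SetIn(original); // restore the real console input
        }
    }
}
```

ConsoleReadDemo.ReadThree("\r\n") returns 13, 10 and -1: one key press satisfies two reads. Console.ReadLine() consumes the whole line, which is why it works better as a pause.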

A clearer and easier-to-understand approach would be:
using (StreamReader sr = new StreamReader(@"C:\path\to\file.csv"))
{
string currentLine;
while((currentLine = sr.ReadLine()) != null)
{
string[] lineArr = currentLine.Split(',');
foreach(string subLine in lineArr)
{
Console.WriteLine(subLine);
}
Console.ReadLine(); // Waits for Enter before proceeding
}
}

If you need those values for later use, why not put them into a List, like this:
List<string[]> listOfValues = new List<string[]>();
using (var fs = File.OpenRead(@"C:\temp\csv.txt"))
using (var read = new StreamReader(fs))
{
while (!read.EndOfStream)
{
var line = read.ReadLine();
listOfValues.Add(line.Split(','));
}
}
Later, you can use the data from the list:
for (int i = 0; i<listOfValues.Count; i++)
{
Console.WriteLine("line number: {0}, contents: {1}", i + 1, string.Join(" ", listOfValues[i]));
}
which gives you
line number: 1, contents: 0 0 0 104672
line number: 2, contents: 0 1 6 51971
line number: 3, contents: 0 1 36 80212
line number: 4, contents: 0 2 5 51972
line number: 5, contents: 0 2 13 51973

It's hard to tell from your code what you think it's supposed to do. Here is a version that will read each line, split it on the commas, and iterate through the values, printing each value. After printing all the values for a line, it prints a new line. Hopefully that's something like what you were trying to achieve.
static void Main()
{
using(var fs = File.OpenRead(@"C:\path\to\file.csv"))
using(var read = new StreamReader(fs))
{
while (!read.EndOfStream)
{
int i = 0;
var line = read.ReadLine();
var values = line.Split(',');
while (i < values.Length)
{
Console.Write(values[i]);
//Console.Read();
i++;
}
Console.WriteLine();
}
}
}

As @rory.ap said, there are plenty of libraries that read CSV right out of the box. But even if you still want to do it on your own, you seem to be putting a lot of effort into a simple task. Try this:
using (StreamReader reader = new StreamReader("C:/yourpath/yourfile.csv"))
{
while (reader.Peek() >= 0)
{
string line = reader.ReadLine();
string[] yourData = line.Split(',');
}
}

Why not just
foreach (var line in File.ReadLines(@"C:\path\to\file.csv"))
foreach (var segment in line.Split(','))
Console.WriteLine(segment);

parsing text file to data table with irregular rows

I am trying to parse tabular data in a text file into a data table.
The text file contains this text:
PID USERNAME THR PRI NICE SIZE RES STATE TIME WCPU COMMAND
11 root 1 171 52 0K 12K RUN 23:46 80.42% idle
12 root 1 -20 -139 0K 12K RUN AS 0:56 7.96% swi7:
The code I have is like this:
public class Program
{
static void Main(string[] args)
{
var lines = File.ReadLines("bb.txt").ToArray();
var headerLine = lines[0];
var dt = new DataTable();
var columnsArray = headerLine.Split(" ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
var dataColumns = columnsArray.Select(item => new DataColumn { ColumnName = item });
dt.Columns.AddRange(dataColumns.ToArray());
for (int i = 1; i < lines.Length; i++)
{
var rowLine = lines[i];
var rowArray = rowLine.Split(" ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
var x = dt.NewRow();
x.ItemArray = rowArray;
dt.Rows.Add(x);
}
}
}
I get the error "Input array is longer than the number of columns in this table" on the second row, at
x.ItemArray = rowArray;
Of course, this is because the second row has "RUN AS" as the value of the 8th column. That value contains a space, which is also the split character for the entire row, hence the mismatch between the array's length and the number of columns.
What is a possible solution for this kind of situation?

Assuming that "RUN AS" is the only string that causes this condition, you could just run var sanitizedLine = rowLine.Replace("RUN AS", "RUNAS") before your split and then separate the words back out afterwards. If this happens more often, however, you may need to check whether the array generated by the split matches the length of the header and, if it does not, combine the offending indexes in a new array of the correct length before attempting to add it.
Ideally, however, you would instead have whatever is generating your input file wrap strings in quotes to make your life easier.
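A minimal sketch of the "combine the offending indexes" idea, under the assumption that you know which single column may contain spaces (here the STATE column, index 7). RowSplitter.SplitRow and its parameter names are hypothetical, not part of the original code.

```csharp
using System;

public static class RowSplitter
{
    // Splits a whitespace-separated row into exactly columnCount fields.
    // Any extra tokens are assumed to belong to the column at flexibleIndex
    // (for example STATE = "RUN AS") and are joined back together.
    public static string[] SplitRow(string rowLine, int columnCount, int flexibleIndex)
    {
        string[] tokens = rowLine.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
        int extra = tokens.Length - columnCount;

        if (extra == 0)
            return tokens;
        if (extra < 0)
            throw new FormatException("Row has fewer fields than the header.");

        string[] fields = new string[columnCount];
        // Fields before the flexible column are copied unchanged.
        Array.Copy(tokens, 0, fields, 0, flexibleIndex);
        // The flexible column absorbs its own token plus all extra tokens.
        fields[flexibleIndex] = string.Join(" ", tokens, flexibleIndex, extra + 1);
        // The remaining fields follow the merged tokens.
        Array.Copy(tokens, flexibleIndex + extra + 1, fields, flexibleIndex + 1,
                   columnCount - flexibleIndex - 1);
        return fields;
    }
}
```

In the loop from the question this would be used as x.ItemArray = RowSplitter.SplitRow(rowLine, dt.Columns.Count, 7);. If more than one column can contain spaces, this approach is ambiguous and fixed-width parsing or a quoted format is the safer route.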

Code does not execute

I know I've been a bit of pain, the last couple of days, that is, with all my questions, but I've been developing this project and I'm (figuratively) inches away from finishing it.
That being said, I would like your help on one more matter. It kind of relates to my previous questions, but you do not need the code for those. The problem lies exactly on this bit of code. What I want from you is to help me identify it and, consequently, solve it.
Before I show you the code I'd been working on, I'd like to say a few extra things:
My application has a file merging feature, merging two files together and handling duplicate entries.
In any given file, each line can have one of these four formats (the last three are optional): Card Name|Amount, .Card Name|Amount, ..Card Name|Amount, _Card Name|Amount.
If a line is not appropriately formatted, the program will skip it (ignore it altogether).
So, basically, a sample file could be as follows:
Blue-Eyes White Dragon|3
..Blue-Eyes Ultimate Dragon|1
.Dragon Master Knight|1
_Kaibaman|1
Now, when it comes to using the file merger, if a line starts with one of the special characters . .. _, it should act accordingly. For ., it operates normally. For lines starting with .., it moves the index to the second dot and, finally, it ignores _ lines completely (they have another use not related to this discussion).
Here is my code for the merge function (for some odd reason, the code inside the second loop won't execute at all):
if (openFileDialog1.ShowDialog() == DialogResult.OK)
{
// Save file names to array.
string[] fileNames = openFileDialog1.FileNames;
// Loop through the files.
foreach (string fileName in fileNames)
{
// Save all lines of the current file to an array.
string[] lines = File.ReadAllLines(fileName);
// Loop through the lines of the file.
for (int i = 0; i < lines.Length; i++)
{
// Split current line.
string[] split = lines[i].Split('|');
// If the current line is badly formatted, skip to the next one.
if (split.Length != 2)
continue;
string title = split[0];
string times = split[1];
if (lines[i].StartsWith("_"))
continue;
// If newFile (list used to store contents of the card resource file) contains the current line of the file that we're currently looping through...
for (int k = 0; k < newFile.Count; k++)
{
if (lines[i].StartsWith(".."))
{
string newTitle = lines[i].Substring(
lines[i].IndexOf("..") + 1);
if (newFile[k].Contains(newTitle))
{
// Split the line once again.
string[] secondSplit = newFile.ElementAt(
newFile.IndexOf(newFile[k])).Split('|');
string secondTimes = secondSplit[1];
// Replace the newFile element at the specified index.
newFile[newFile.IndexOf(newFile[k])] =
string.Format("{0}|{1}", newTitle, int.Parse(times) + int.Parse(secondTimes));
}
// If newFile does not contain the current line of the file we're looping through, just add it to newFile.
else
newFile.Add(string.Format(
"{0}|{1}",
newTitle, times));
continue;
}
if (newFile[k].Contains(title))
{
string[] secondSplit = newFile.ElementAt(
newFile.IndexOf(newFile[k])).Split('|');
string secondTimes = secondSplit[1];
newFile[newFile.IndexOf(newFile[k])] =
string.Format("{0}|{1}", title, int.Parse(times) + int.Parse(secondTimes));
}
else
{
newFile.Add(string.Format("{0}|{1}", title, times));
}
}
}
}
// Overwrite resources file with newFile.
using (StreamWriter sw = new StreamWriter("CardResources.ygodc"))
{
foreach (string line in newFile)
sw.WriteLine(line);
}
I know this is quite a long piece of code, but I believe all of it is relevant to a point. I skipped some unimportant bits (after all of this is executed) as they are completely irrelevant.
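One possible explanation, assuming newFile is empty when the merge starts: the loop for (int k = 0; k < newFile.Count; k++) then has zero iterations, so neither the Contains branch nor the Add branch is ever reached and nothing gets added. A sketch of an alternative that does not depend on the list already having entries is to accumulate totals in a dictionary keyed by card name. CardMerger.Merge is a hypothetical helper, and the prefix handling follows the rules described above as I read them.

```csharp
using System;
using System.Collections.Generic;

public static class CardMerger
{
    // Adds the amounts from lines of the form "Card Name|Amount" into totals.
    // Badly formatted lines and lines starting with "_" are skipped;
    // a leading ".." is reduced to a single ".".
    public static void Merge(IEnumerable<string> lines, IDictionary<string, int> totals)
    {
        foreach (string line in lines)
        {
            string[] split = line.Split('|');
            if (split.Length != 2 || line.StartsWith("_", StringComparison.Ordinal))
                continue;

            int amount;
            if (!int.TryParse(split[1], out amount))
                continue;

            string title = split[0];
            if (title.StartsWith("..", StringComparison.Ordinal))
                title = title.Substring(1); // move to the second dot

            int existing;
            totals.TryGetValue(title, out existing); // existing stays 0 if the card is new
            totals[title] = existing + amount;
        }
    }
}
```

After calling Merge once per file, the resource file can be written with foreach (var pair in totals) sw.WriteLine("{0}|{1}", pair.Key, pair.Value);.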

Splitting a string seems not to work

I have problems with reading a file (textasset) line by line and getting the results!
Here is the file I am trying to read:
AUTHOR
COMMENT
INFO 1 X ARG 0001 0.581 2.180 1.470
INFO 2 X ARG 0001 1.400 0.974 1.724
INFO 3 X ARG 0001 2.553 0.934 0.751
INFO 4 X ARG 0001 3.650 0.494 1.053
INFO 5 X ARG 0001 1.188 3.073 1.532
INFO 6 X ARG 0001 2.312 1.415 -0.466
INFO 7 X ARG 0001 -0.232 2.249 2.180
END
Here is the code I am using:
//read file
string[] line = file.text.Split("\n"[0]);
for(int i = 0 ; i < line.Length ; i++)
{
if(line[i].Contains("INFO"))
{
//To replace all spaces with single underscore "_" (it works fine)
string l = Regex.Replace(line[i]," {2,}","_");
//In this Debug.Log i get correct results
//For example "INFO_1_X_ARG_0001_0.581_2.180_1.470"
Debug.Log(l);
string[] subline = Regex.Split(l,"_");
//Only for first "INFO" line i get correct results (INFO,1,X,ARG,0001,0.581,2.180,1.470)
//For all other "INFO" lines i get incomplete results (first, fourth and fifth element are not put into substrings,
//as if they disappeared!)
foreach(string s in subline){Debug.Log(s);}
}
}
Explanation:
I first split the text into lines (works fine), then I read only lines that contain INFO
I loop over all lines that contain INFO and replace all spaces with an underscore _ (this works fine)
I split lines that contain INFO into substrings based on the underscore _
When I print out the lines, only the first line with INFO seems to have all substrings
Every following line is not split correctly (the first part, INFO, is omitted, as well as the third string)
It seems very unreliable. Is this the way to go with these things? Any help is appreciated! This should be simple; what am I doing wrong?
EDIT:
Something is wrong with this code (it should be simple, but it does not work)
Here is the updated code (I just made a List<string> list = new List<string>() and copied all substrings into it. I use Unity3D, so the list contents show in the inspector). I was shocked when I saw all the substrings properly extracted there, yet a simple
foreach(string s in list)
Debug.Log(s);
was indeed missing some values. so I was trying different things and this code:
for(int x = 0; x < list.Count ; x++)
{
Debug.Log("List: " + x.ToString() + " " + list[x].ToString());
}
shows contents of the list properly, but this code (note that i just removed x.ToString()) is missing some elements in the list. It does not want to read them!
for(int x = 0; x < list.Count ; x++)
Debug.Log("List: " + list[x].ToString());
So i am not sure what is going on here?!

There are some problems:
1> The Contains method you are using is case sensitive, i.e. INFO != info
You should use
line[i].ToLower().Contains("info")
2> Is the text always separated by spaces? It may also be separated by tabs. You are better off with
Regex.Replace(line[i]," {2,}|\t+","_");
//this would replace one or more tabs, or two or more spaces

The following seems to be working for me:
using (var fs = new FileStream(filePath, FileMode.Open))
using (var reader = new StreamReader(fs))
{
string line;
while ((line = reader.ReadLine()) != null)
{
if (line.StartsWith("INFO"))
{
line = Regex.Replace(line, "[ ]+", "_");
var subline = line.Split('_');
foreach (var str in subline)
{
Console.Write("{0} ",str);
}
Console.WriteLine();
}
}
}

You may want to try something like this:
for (int i = 0; i < line.Length; i++)
{
if (line[i].Contains("INFO"))
{
string l = Regex.Replace(line[i], @"\p{Zs}{2,}|\t+", "_");
string[] sublines = l.Split('_');
// If you want to see the debug....
Array.ForEach(sublines, s => Debug.Log(s));
}
}
The \p{Zs} will match all Unicode separator/space characters (e.g. space, non-breaking spaces, etc.). The following reference may be of some help to you: Character Classes in Regular Expressions.

Try string.Split("\t"[0]). You probably have tabs between the columns.
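Pulling these suggestions together, here is a sketch that avoids the intermediate underscore step altogether: split the text into lines on both \r\n and \n, then split each INFO line on any run of whitespace. The stray \r characters are an assumption on my part (they appear when a file with Windows line endings is split on \n only), and InfoParser.Parse is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

public static class InfoParser
{
    // Returns the whitespace-separated fields of every line that starts with "INFO".
    public static List<string[]> Parse(string text)
    {
        var rows = new List<string[]>();
        // Listing "\r\n" as a separator keeps '\r' from being left at the end of a line.
        string[] lines = text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string line in lines)
        {
            if (!line.TrimStart().StartsWith("INFO", StringComparison.OrdinalIgnoreCase))
                continue;

            // A null separator array means "split on any whitespace" (spaces, tabs, ...).
            rows.Add(line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries));
        }
        return rows;
    }
}
```

In Unity this could be called as foreach (var fields in InfoParser.Parse(file.text)) Debug.Log(string.Join(",", fields));.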

is there any way to ignore reading in certain lines in a text file?

I'm trying to read in a text file in a C# application, but I don't want to read the first two lines or the last line. There are 8 lines in the file, so effectively I just want to read in lines 3, 4, 5, 6 and 7.
Is there any way to do this?
example file
USE [Shelley's Other Database]
CREATE TABLE db.exmpcustomers(
fName varchar(100) NULL,
lName varchar(100) NULL,
dateOfBirth date NULL,
houseNumber int NULL,
streetName varchar(100) NULL
) ON [PRIMARY]
EDIT
Okay, so, I've implemented Callum Rogers answer into my code and for some reason it works with my edited text file (I created a text file with the lines I didn't want to use omitted) and it does exactly what it should, but whenever I try it with the original text file (above) it throws an exception. I display this information in a DataGrid and I think that's where the exception is being thrown.
Any ideas?

The answer by Rogers is good; I am just providing another way of doing this.
Try this:
List<string> list = new List<string>();
using (StreamReader reader = new StreamReader(FilePath))
{
string text = "";
while ((text = reader.ReadLine()) != null)
{
list.Add(text);
}
}
// Assumes the file has at least three lines.
list.RemoveAt(0); // drop the 1st line
list.RemoveAt(0); // drop the 2nd line
list.RemoveAt(list.Count - 1); // drop the last line
Hope this helps.

Why do you want to ignore exactly the first two and the last line?
Depending on what your file looks like you might want to analyze the line, e.g. look at the first character whether it is a comment sign, or ignore everything until you find the first empty line, etc.
Sometimes, hardcoding "magic" numbers isn't such a good idea. What if the file format needs to be changed to contain 3 header lines?
As the other answers demonstrate, nothing keeps you from doing whatever you want with a line you have read, so of course you can ignore it, too.
Edit, now that you've provided an example of your file: For your case I'd definitely not use the hardcoded numbers approach. What if some day the SQL statement should contain another field, or if it appears on one instead of 8 lines?
My suggestion: Read in the whole string at once, then analyze it. Safest way would be to use a grammar, but if you presume the SQL statement is never going to be more complicated, you can use a regular expression (still much better than using line numbers etc.):
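string content = File.ReadAllText(filename);
// Singleline lets '.' match line breaks, since the statement spans several lines.
Regex r = new Regex(@"CREATE TABLE [^\(]+\((.*)\) ON", RegexOptions.Singleline);
// Group 1 is the text captured by the parentheses; group 0 would be the whole match.
string whatYouWant = r.Match(content).Groups[1].Value;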
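As a quick check of the regular-expression approach, here is a sketch that runs the pattern against the example file from the question and returns the column definitions one by one. Note that RegexOptions.Singleline is needed for . to match line breaks, and that the captured text is in group 1 (group 0 is the whole match). SqlColumns.Extract is a hypothetical name.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SqlColumns
{
    // Returns the column definitions between "CREATE TABLE name(" and ") ON",
    // one per array element, or an empty array if the pattern does not match.
    public static string[] Extract(string content)
    {
        Match m = Regex.Match(content, @"CREATE TABLE [^\(]+\((.*)\) ON", RegexOptions.Singleline);
        if (!m.Success)
            return new string[0];

        string[] lines = m.Groups[1].Value.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        for (int i = 0; i < lines.Length; i++)
            lines[i] = lines[i].Trim().TrimEnd(','); // drop indentation and trailing commas
        return lines;
    }
}
```

Calling SqlColumns.Extract(File.ReadAllText(filename)) on the example file yields the five column definitions, regardless of how many header lines precede the CREATE TABLE statement.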

Why not just use File.ReadAllLines() and then remove the first 2 lines and the last line? With such a small file speed differences will not be noticeable.
string[] allLines = File.ReadAllLines("file.ext");
string[] linesWanted = new string[allLines.Length-3];
Array.Copy(allLines, 2, linesWanted, 0, allLines.Length-3);

If you have a TextReader object wrapping the filestream you could just call ReadLine() two times.
StreamReader inherits from TextReader, which is abstract.
Non-foolproof example:
using (var fs = new FileStream("blah", FileMode.Open))
using (var reader = new StreamReader(fs))
{
reader.ReadLine();
reader.ReadLine();
// Do stuff.
}

string filepath = @"C:\whatever.txt";
using (StreamReader rdr = new StreamReader(filepath))
{
rdr.ReadLine(); // ignore 1st line
rdr.ReadLine(); // ignore 2nd line
string fileContents = "";
while (true)
{
string line = rdr.ReadLine();
if (rdr.EndOfStream)
break; // finish without processing last line
fileContents += line + "\r\n";
}
Console.WriteLine(fileContents);
}

How about a general solution?
To me, the first step is to enumerate over the lines of a file (already provided by ReadAllLines, but that has a performance cost due to populating an entire string[] array; there's also ReadLines, but that's only available as of .NET 4.0).
Implementing this is pretty trivial:
public static IEnumerable<string> EnumerateLines(this FileInfo file)
{
using (var reader = file.OpenText())
{
while (!reader.EndOfStream)
{
yield return reader.ReadLine();
}
}
}
The next step is to simply skip the first two lines of this enumerable sequence. This is straightforward using the Skip extension method.
The last step is to ignore the last line of the enumerable sequence. Here's one way you could implement this:
public static IEnumerable<T> IgnoreLast<T>(this IEnumerable<T> source, int ignoreCount)
{
if (ignoreCount < 0)
{
throw new ArgumentOutOfRangeException("ignoreCount");
}
var buffer = new Queue<T>();
foreach (T value in source)
{
if (buffer.Count < ignoreCount)
{
buffer.Enqueue(value);
continue;
}
T buffered = buffer.Dequeue();
buffer.Enqueue(value);
yield return buffered;
}
}
OK, then. Putting it all together, we have:
var file = new FileInfo(@"path\to\file.txt");
var lines = file.EnumerateLines().Skip(2).IgnoreLast(1);
Test input (contents of file):
This is line number 1.
This is line number 2.
This is line number 3.
This is line number 4.
This is line number 5.
This is line number 6.
This is line number 7.
This is line number 8.
This is line number 9.
This is line number 10.
Output (of Skip(2).IgnoreLast(1)):
This is line number 3.
This is line number 4.
This is line number 5.
This is line number 6.
This is line number 7.
This is line number 8.
This is line number 9.

You can do this:
var valid = new int[] { 3, 4, 5, 6, 7 };
var lines = File.ReadAllLines("file.txt").
Where((line, index) => valid.Contains(index + 1));
Or the opposite:
var invalid = new int[] { 1, 2, 8 };
var lines = File.ReadAllLines("file.txt").
Where((line, index) => !invalid.Contains(index + 1));
If you're looking for a general way to remove the last and the first 2, you can use this:
var allLines = File.ReadAllLines("file.txt");
var lines = allLines
.Take(allLines.Length - 1)
.Skip(2);
But from your example it seems that you're better off looking for the string pattern that you want to read from the file. Try using regexes.
