C# - Reading from CSV file producing two entries at once?

I am trying to read in a list of several hundred thousand values from what was once a spreadsheet, which I have converted to a CSV file for the sake of simplicity.
My problem is that while testing it to make sure it reads properly, the code is for some reason ignoring the comma after the second position, and combining the value in that spot with the value next to it, despite being, you know, separated by a comma. It also begins to combine the final value with the first value from the next set.
For Example:
CSV File:
0,0,0,104672
0,1,6,51971
0,1,36,80212
0,2,5,51972
0,2,13,51973
...
Program Output:
0
00
1046720
00
16
519710
136
...
I think the example probably does a better job describing what's going on than I did in words. It continues like that, displaying the wrong information until it reaches the end of the file.
My code is as follows:
static void Main()
{
using(var fs = File.OpenRead(@"C:\path\to\file.csv"))
using(var read = new StreamReader(fs))
{
while (!read.EndOfStream)
{
int i = 0;
var line = read.ReadLine();
while (i < 4)
{
var values = line.Split(',');
Console.Write(values[i]);
Console.Read();
i++;
}
}
}
}
EDIT: Sorry, I got lost in my understanding of what the code should do and forgot to explain the goal here.
This program is made to take these values and rename a file from the 4th value (for example, 104672) to the first three values, separated by dashes (ex. 0-0-0). What I want from my output right now is to be able to see the program give me the values back, one at a time, so that I know when I go to rename the files, I'm not getting improper results.
EDIT 2: I also realize, a day later, that the answer I got was one of significance to making my program work, rather than actually discovering why I was getting the output I got. For those curious in the future, the answer is essentially that Console.Read(); is not a true pause, and causes more writes to happen upon key press than expected.
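To make the explanation in EDIT 2 concrete: Console.Read() returns one character per call, and on Windows a single press of Enter puts two characters ('\r' and '\n') into the input buffer. The next two Console.Read() calls therefore return immediately, which is why extra values get written per key press. The sketch below simulates that with a StringReader in place of the keyboard; the class and method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ConsoleReadDemo
{
    // Calls Console.Read() 'calls' times against simulated keyboard input
    // and returns the character codes it hands back, one per call.
    public static List<int> ReadCodes(string simulatedInput, int calls)
    {
        TextReader original = Console.In;
        Console.SetIn(new StringReader(simulatedInput));
        try
        {
            var codes = new List<int>();
            for (int i = 0; i < calls; i++)
            {
                codes.Add(Console.Read()); // -1 once the input is exhausted
            }
            return codes;
        }
        finally
        {
            Console.SetIn(original); // restore the real console input
        }
    }
}
```

Calling ReadCodes("\r\n", 3) yields 13, 10 and -1: one Enter press satisfies two reads. If a real pause per value is wanted, Console.ReadLine() (consumes the whole line) or Console.ReadKey(true) (waits for a single key) are the usual substitutes.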

A clearer and easier-to-understand approach would be:
using (StreamReader sr = new StreamReader(@"C:\path\to\file.csv"))
{
string currentLine;
while((currentLine = sr.ReadLine()) != null)
{
string[] lineArr = currentLine.Split(',');
foreach(string subLine in lineArr)
{
Console.WriteLine(subLine);
}
Console.Read(); // Awaits user input in order to proceed
}
}

If you need those values for later use, why not put them into a List, like this:
List<string[]> listOfValues = new List<string[]>();
using (var fs = File.OpenRead(@"C:\temp\csv.txt"))
using (var read = new StreamReader(fs))
{
while (!read.EndOfStream)
{
var line = read.ReadLine();
listOfValues.Add(line.Split(','));
}
}
Later, you can use the data from the list:
for (int i = 0; i<listOfValues.Count; i++)
{
Console.WriteLine("line number: {0}, contents: {1}", i + 1, string.Join(" ", listOfValues[i]));
}
which gives you
line number: 1, contents: 0 0 0 104672
line number: 2, contents: 0 1 6 51971
line number: 3, contents: 0 1 36 80212
line number: 4, contents: 0 2 5 51972
line number: 5, contents: 0 2 13 51973

It's hard to tell from your code what you think it's supposed to do. Here is a version that will read each line, split it on the commas, and iterate through the values, printing each value. After printing all the values for a line, it prints a new line. Hopefully that's something like what you were trying to achieve.
static void Main()
{
using(var fs = File.OpenRead(@"C:\path\to\file.csv"))
using(var read = new StreamReader(fs))
{
while (!read.EndOfStream)
{
int i = 0;
var line = read.ReadLine();
var values = line.Split(',');
while (i < values.Length)
{
Console.Write(values[i]);
//Console.Read();
i++;
}
Console.WriteLine();
}
}
}

As @rory.ap said, there are plenty of libraries that read CSV right out of the box. But even if you still want to do it on your own, you seem to be putting a great deal of effort into a simple task. Try this:
using (StreamReader reader = new StreamReader("C:/yourpath/yourfile.csv"))
{
while (reader.Peek() >= 0)
{
string line = reader.ReadLine();
string[] yourData = line.Split(',');
}
}

Why not just
foreach (var line in File.ReadLines(@"C:\path\to\file.csv"))
foreach (var segment in line.Split(','))
Console.WriteLine(segment);
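Since the stated goal (see the first EDIT in the question) is to rename a file from the 4th value to the first three values joined by dashes, a small sketch of that mapping may help to check the results before touching any files. The class and method names are hypothetical, and the directory and file extension are left out because the question does not state them.

```csharp
using System;
using System.Collections.Generic;

public static class RenamePlan
{
    // Turns a line such as "0,1,6,51971" into the pair ("51971", "0-1-6"):
    // Key is the current name, Value is the intended new name.
    // Lines that do not have exactly four fields are skipped.
    public static List<KeyValuePair<string, string>> Build(IEnumerable<string> lines)
    {
        var plan = new List<KeyValuePair<string, string>>();
        foreach (var line in lines)
        {
            string[] values = line.Split(',');
            if (values.Length != 4) continue;

            string newName = string.Join("-", values[0], values[1], values[2]);
            plan.Add(new KeyValuePair<string, string>(values[3], newName));
        }
        return plan;
    }
}
```

The input could be File.ReadLines(path); printing each pair with Console.WriteLine shows one value set per line without needing any pause, and File.Move can be applied to each pair once the output looks right.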

Related

Can't read a file line-by-line in C#

I've been trying to read a file line by line for my UNI project.
I am getting an error that I am not sure I understand. I would need your experience to help me solve it, please.
Some clarification for the code:
datas is a List of a custom class type, which has 3 properties: text1, text2, int1.
v is a single object of the same custom class type as the items in the datas List.
The data in the text file is stored line by line; each line contains one value for each of the 3 properties, like this: text1value;text2value;int1value.
if (File.Exists("example.txt"))
{
StreamReader sr = new StreamReader("example.txt");
while(!sr.EndOfStream)
{
string[] data = sr.ReadLine().Split(';');
v.text1 = data[0];
v.text2 = data[1];
v.int1 = Convert.ToInt32(data[2]);
datas.Add(v);
}
sr.Close();
Thanks to you guys I have made improvements on my code and made it work!
Now I only have 1 functionality error which I do not understand on the code which is after the read in is completed. (so the code runs without error, crash, etc. - but gives the wrong result SOMETIMES!).
int i = 0;
int cnt = datas.Count;
while (i < cnt)
{
if (datas[i].Text1 == tb_Text1.Text && datas[i].Text2 == tb_Text2.Text)
{
// I do stuff here with the correct combination
DialogResult = DialogResult.OK;
break;
}
else
{
i++;
}
}
if(i==cnt)
{
MessageBox.Show("The following combination is not in the txt file!");
}
}
So in the second part of the code, on the Windows Form, there are 2 textboxes: one is for the text1 property, the other is for the text2 property.
I would like it to work like it would in a username-password scenario.
If the user types a text1 and text2 value in the textboxes, and clicks on the button which is on the Form, and that specific text1 and text2 values are stored in the same line of the txt file which was read in in the first half of the code, it should ACCEPT that combination.
Now, my problem is, I have 2 lines of records in my txt file right now.
So that should mean that in my datas named List, there should be 2 "items".
The first line for example is this in the txt file: Example1;example123;1
And the second line is this: Example2;example234;1
Every time I write Example2 and example234 in the textboxes, it WORKS.
Every time I write Example1 and example123 in the textboxes, it DOESNT WORK and I get the MessageBox message.
Does anyone have any idea where I went wrong?
Remove your loop:
for(int j=0; j<x; j++)
{
sr.ReadLine();
}
I am assuming you are attempting to position to the correct line, but StreamReader.ReadLine() already advances the read position. You don't need the loop.
What is happening is that your loop is reading past the end of the file, so then the ReadLine in
string[] data = sr.ReadLine().Split(';');
returns null, and so the Split() throws a null reference exception.
I think that you are trying to do something along these lines? The ReadLine() will automatically move to the next row in the file.
if (File.Exists("example.txt"))
{
StreamReader sr = new StreamReader("example.txt");
while(!sr.EndOfStream)
{
string[] data = sr.ReadLine().Split(';');
v.text1 = data[0];
v.text2 = data[1];
v.int1 = Convert.ToInt32(data[2]);
datas.Add(v);
}
sr.Close();
}
To propose an additional improvement, use using to create the StreamReader and it will take care of the file handling for you:
if (File.Exists("example.txt"))
{
using(StreamReader sr = new StreamReader("example.txt"))
{
while(!sr.EndOfStream)
{
string[] data = sr.ReadLine().Split(';');
v.text1 = data[0];
v.text2 = data[1];
v.int1 = Convert.ToInt32(data[2]);
datas.Add(v);
}
}
}
(And maybe include the case that the file does not exist as an error and catch it.)
Your loop is the while. The for() loop will just disrupt the flow. My guess is you think you have to read from the start every time you want to do a ReadLine(). But the stream will remember where you left off after the last ReadLine().
if (File.Exists("example.txt"))
{
StreamReader sr = new StreamReader("example.txt");
while(!sr.EndOfStream)
{
string[] data = sr.ReadLine().Split(';');
v.text1 = data[0];
v.text2 = data[1];
v.int1 = Convert.ToInt32(data[2]);
datas.Add(v);
}
sr.Close();
}
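A likely cause of the follow-up symptom in the question (only the last record, Example2, ever matches) is that the loop fills in the same object v on every pass and adds that one reference to the list each time, so every entry ends up holding the values of the last line. A minimal sketch of the fix, assuming the custom type is a class: create a new object per line. Record and RecordParser are hypothetical names standing in for the question's own types.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the question's custom class.
public class Record
{
    public string Text1 { get; set; }
    public string Text2 { get; set; }
    public int Int1 { get; set; }
}

public static class RecordParser
{
    // Parses lines of the form "text1value;text2value;int1value".
    public static List<Record> Parse(IEnumerable<string> lines)
    {
        var datas = new List<Record>();
        foreach (string line in lines)
        {
            if (string.IsNullOrWhiteSpace(line)) continue;

            string[] data = line.Split(';');
            // A new object per line: reusing one shared instance would make
            // every list entry refer to the same object.
            var v = new Record
            {
                Text1 = data[0],
                Text2 = data[1],
                Int1 = Convert.ToInt32(data[2])
            };
            datas.Add(v);
        }
        return datas;
    }
}
```

With the two sample lines from the question, the list then contains two distinct records, so both Example1/example123 and Example2/example234 can be found.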

Reading CSV file, match column 3 to column 1 on separate lines

I am beating my head against the wall, and I'm hoping that someone can point me in the right direction. This can't be as complicated as I'm making it.
I'm working on a project where the program reads one file (that's approx 50 lines or so) and I need to have it match data on the third column to data on the first column of a separate line. I opened up a new project because it was getting too complex for such an easy task.
Here's an example file that is closely relevant to the actual file I'm working with:
a1,b1,c1,d1
**c4**,b2,c2,d2
a3,b3,c3,d3
a4,b4,**c4**,d4
a5,b5,c4,d5
I promise this isn't for a school project, this is something that I need to figure out for work purposes.
Here is what I have, and I know it's just not going to work because it's only reading line by line for comparison. How do I get the program to read the current array value in the foreach command against the entire file that I caught in streamreader?
static void Main(string[] args)
{
StreamReader sr = new StreamReader("directories.txt");
string sline = sr.ReadLine();
string[] sarray = sline.Split(',');
string col3 = sarray[2];
string col1 = sarray[0];
foreach(string a in sarray)
{
// ?!?!?!!!
// I know this won't work because I'm comparing the same line being read.
// How in the world can I make this program read col3 of the current line being read against the entire file that was read earlier?
if (col3 == col1)
{
Directory.CreateDirectory("DRIVE:\\Location\\" + a.ToString());
}
}
}
Thank you ahead of time!
Since your file is small you can go with the simplest path...
var lines = File.ReadLines(filename)
.Select(line => line.Split(','))
.ToList();
var result = from a in lines
from b in lines
where a[0] == b[2]
select new { a, b };
foreach(var x in result)
{
Console.WriteLine(string.Join(",", x.a) + " - " + string.Join(",", x.b));
}
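The nested query above compares every line with every other line, which is fine for about 50 lines. If the file were to grow, one possible variation is to index the rows by their third column first and then look up each first column in that index. This is only a sketch; ColumnMatcher and Match are illustrative names, not part of the original answer.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ColumnMatcher
{
    // Returns every pair (a, b) where a's first column equals b's third column.
    // Key is row a, Value is row b. Rows with fewer than three columns are ignored.
    public static List<KeyValuePair<string[], string[]>> Match(IEnumerable<string> lines)
    {
        List<string[]> rows = lines
            .Select(line => line.Split(','))
            .Where(row => row.Length >= 3)
            .ToList();

        // Group the rows by their third column so each lookup is a hash access.
        ILookup<string, string[]> byThirdColumn = rows.ToLookup(row => row[2]);

        var result = new List<KeyValuePair<string[], string[]>>();
        foreach (string[] a in rows)
        {
            foreach (string[] b in byThirdColumn[a[0]])
            {
                result.Add(new KeyValuePair<string[], string[]>(a, b));
            }
        }
        return result;
    }
}
```

For the sample file in the question, the row starting with c4 is paired with both rows whose third column is c4.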

Remove a specific column from a delimited file

I've been working with some big delimited text files (~1GB) these days. They look something like this:
COlumn1 #COlumn2#COlumn3#COlumn4
COlumn1#COlumn2#COlumn3 #COlumn4
where # is the delimiter.
In case a column is invalid I might have to remove it from the whole text file. The output file when Column 3 is invalid should look like this.
COlumn1 #COlumn2#COlumn4
COlumn1#COlumn2#COlumn4
string line = "COlumn1# COlumn2 #COlumn3# COlumn4";
int junk =3;
int columncount = line.Split(new char[] { '#' }, StringSplitOptions.None).Count();
//remove the [junk-1]th '#' and the value till [junk]th '#'
//"COlumn1# COlumn2 # COlumn4"
I'm not able to find a C# version of this on SO. Is there a way I can do that? Please help.
EDIT:
The solution which I found myself is like below which does the job. Is there a way I could modify this to a better way so that it narrows down the performance impact it might have in case of large text files?
int junk = 3;
string line = "COlumn1#COlumn2#COlumn3#COlumn4";
int counter = 0;
int colcount = line.Split(new char[] { '#' }, StringSplitOptions.None).Length - 1;
string[] linearray = line.Split(new char[] { '#' }, StringSplitOptions.None);
List<string> linelist = linearray.ToList();
linelist.RemoveAt(junk - 1);
string finalline = string.Empty;
foreach (string s in linelist)
{
counter++;
finalline += s;
if (counter < colcount)
finalline += "#";
}
Console.WriteLine(finalline);
EDITED
This method can be very memory-expensive. As you can read in this post, the suggestion is:
If you need to run complex queries against the data in the file, the right thing to do is to load the data to database and let DBMS to take care of data retrieval and memory management.
To avoid memory consumption you should use a StreamReader to read file line by line
This could be a start for your task, missing your invalid match logic
using System.Collections.Generic;
using System.IO;
using System.Text;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
const string fileName = "temp.txt";
var results = FindInvalidColumns(fileName);
using (var reader = File.OpenText(fileName))
{
while (!reader.EndOfStream)
{
var builder = new StringBuilder();
var line = reader.ReadLine();
if (line == null) continue;
var split = line.Split(new[] { "#" }, 0);
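var first = true;
for (var i = 0; i < split.Length; i++)
{
if (results.Contains(i)) continue; // skip invalid columns
if (!first) builder.Append('#'); // restore the delimiter between the kept columns
builder.Append(split[i]);
first = false;
}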
using (var fs = new FileStream("new.txt", FileMode.Append, FileAccess.Write))
using (var sw = new StreamWriter(fs))
{
sw.WriteLine(builder.ToString());
}
}
}
}
private static List<int> FindInvalidColumns(string fileName)
{
var invalidColumnIndexes = new List<int>();
using (var reader = File.OpenText(fileName))
{
while (!reader.EndOfStream)
{
var line = reader.ReadLine();
if (line == null) continue;
var split = line.Split(new[] { "#" }, 0);
for (var i = 0; i < split.Length; i++)
{
if (IsInvalid(split[i]) && !invalidColumnIndexes.Contains(i))
invalidColumnIndexes.Add(i);
}
}
}
return invalidColumnIndexes;
}
private static bool IsInvalid(string s)
{
return false;
}
}
}
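For the narrower task in the question (dropping one known column, such as column 3, from every line), a shorter streaming sketch is possible. It reads one line at a time, so memory use stays flat for a ~1GB file, and it keeps a single writer open instead of reopening the output file for each line. ColumnRemover and its members are illustrative names; the column indexes are zero-based.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class ColumnRemover
{
    // Removes the columns whose zero-based index is in junkIndexes
    // and rejoins the remaining columns with the same delimiter.
    public static string RemoveColumns(string line, ISet<int> junkIndexes, char delimiter = '#')
    {
        IEnumerable<string> kept = line
            .Split(delimiter)
            .Where((value, index) => !junkIndexes.Contains(index));
        return string.Join(delimiter.ToString(), kept);
    }

    // Streams inputPath to outputPath line by line.
    public static void RewriteFile(string inputPath, string outputPath, ISet<int> junkIndexes)
    {
        using (var writer = new StreamWriter(outputPath))
        {
            foreach (string line in File.ReadLines(inputPath))
            {
                writer.WriteLine(RemoveColumns(line, junkIndexes));
            }
        }
    }
}
```

For example, RemoveColumns("COlumn1#COlumn2#COlumn3#COlumn4", new HashSet<int> { 2 }) gives "COlumn1#COlumn2#COlumn4". Whitespace around the values is preserved as-is.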
First, what you will do is re-write the line to a text file using a 0-length string for COlumn3. Therefore the line after being written correctly would look like this:
COlumn1#COlumn2##COlumn4
As you can see, there are two delimiters between COlumn2 and COlumn4. This is a cell with no data in it. (By "cell" I mean one column of a certain, single row.) Later, when some other process reads this using the Split function, it will still create a new value for Column 3, but in the array generated by Split, the 3rd position would be an empty string:
String[] columns = stream_reader.ReadLine().Split('#');
int lengthOfThirdItem = columns[2].Length; // for proof
// lengthOfThirdItem = 0
This reduces invalid values to empty strings and persists them back in the text file.
For more on String.Split see C# StreamReader save to Array with separator.
It is not possible to write to lines internal to a text file while it is also open for read. This article discusses it some (simultaneous read-write a file in C#), but it looks like that question-asker just wants to be able to write lines to the end. You want to be able to write lines at any point in the interior. I think this is not possible without buffering the data in some way.
The simplest way to buffer the data is to rename the file to a temp file first (using File.Move(): http://msdn.microsoft.com/en-us/library/system.io.file.move(v=vs.110).aspx). Then use the temp file as the data source: open it to read in the data, which may have corrupt entries, and write the data afresh to the original file name using the approach I describe above to represent empty columns. After this is complete, you can delete the temp file.
Important
Deleting the temp file may leave you vulnerable to power and data transients (or software 'transients'). (I.e., a power drop that interrupts part of the process could leave the data in an unusable state.) So you may also want to leave the temp file on the drive as an emergency backup in case of some problem.

Read and extract from file

I have a huge file with ~3 mill rows. Every line contains record like this:
1|2|3|4|5|6|7|8|9
Exactly 8 separators like '|' on every line. I am looking for a way to read this file then extract last '9' number only from every line and store it into another file.
edit:
OK, here is what I have done already.
using (StreamReader sr = new StreamReader(filepath))
using (StreamWriter sw = new StreamWriter(filepath1))
{
string line = null;
while ((line = sr.ReadLine()) != null)
sw.WriteLine(line.Split('|')[8]);
}
File.WriteAllLines("filepath", File.ReadAllLines(filepath).Where(l => !string.IsNullOrWhiteSpace(l)));
Read file, extract last digits then write in new file and clear blank lines. Last digit is 10-15 symbols and I want to extract first 6. I continue to read and try some and when I'm done or have some question I'll edit again.
Thanks
Edit 2:
Ok, here I take first 8 digits from the number:
sw.WriteLine(line.Substring(0, Math.Min(line.Length, 8)));
Edit 3:
I have no idea how to match all the numbers that are left in the file. I want to match them and see how many times each number occurs in the file.
Any help?
I am looking for a way to read this file then extract last [..] number only from every line and store it into another file.
What part exactly are you having trouble with? In pseudocode, this is what you want:
fileReader = OpenFile("input")
fileWriter = OpenFile("output")
while !fileReader.EndOfFile
line = fileReader.ReadLine
records[] = line.Split('|')
value = records[8]
fileWriter.WriteLine(value)
end while
So start implementing it and feel free to ask a question on any specific line you're having trouble with. Each line of code I posted contains enough pointers to figure out the C# code or the terms to do a web search for it.
You don't say where you are stuck. Break the problem down:
Write and run minimal C# program
Read lines from file
Break up one line
Write result line to a file
Are you stuck on any one of those? Then ask a specific question about that. This decomposition technique is key to many programming tasks, and indeed complex tasks in general.
You might find the string split capability useful.
Because it's a huge file you must read it line by line!
public IEnumerable<string> ReadFileIterator(String filePath)
{
using (StreamReader streamReader = new StreamReader(filePath, Encoding.Default))
{
String line;
while ((line = streamReader.ReadLine()) != null)
{
yield return line;
}
yield break;
}
}
public void WriteToFile(String inputFilePath, String outputFilePath)
{
using (StreamWriter streamWriter = new StreamWriter(outputFilePath, true, Encoding.Default))
{
foreach (String line in ReadFileIterator(inputFilePath))
{
String[] subStrings = line.Split('|');
streamWriter.WriteLine(subStrings[8]);
}
streamWriter.Flush();
streamWriter.Close();
}
}
using (StreamReader sr = new StreamReader("input"))
using (StreamWriter sw = new StreamWriter("output"))
{
string line = null;
while ((line=sr.ReadLine())!=null)
sw.WriteLine(line.Split('|')[8]);
}
Some pointers to start from: StreamReader.ReadLine() and String.Split(). There are examples on both pages.
With LINQ you could do a thing like the following to filter the numbers:
var numbers = from l in File.ReadLines(fileName)
let p = l.Split('|')
select p[8];
and then write them into a new file like that:
File.WriteAllText(newFileName, String.Join("\r\n", numbers));
Use String.Split() to get the line inside an array and get the last element and store it into another file. Repeat the process for each line.
Try this...
// Read the file and display the ninth field of each line.
string line;
System.IO.StreamReader file =
new System.IO.StreamReader("c:\\test.txt");
while((line = file.ReadLine()) != null)
{
string[] words = line.Split('|');
string value = words[8];
Console.WriteLine(value);
}
file.Close();
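None of the answers above addresses Edit 3 of the question (how many times each extracted number occurs). One way to sketch it, assuming the whole last field is the value to count, is to tally the ninth field in a dictionary while reading. ValueCounter and CountLastField are hypothetical names.

```csharp
using System.Collections.Generic;

public static class ValueCounter
{
    // Counts how often each value appears in the ninth '|'-separated field.
    // Blank lines and lines with fewer than nine fields are skipped.
    public static Dictionary<string, int> CountLastField(IEnumerable<string> lines)
    {
        var counts = new Dictionary<string, int>();
        foreach (string line in lines)
        {
            if (string.IsNullOrWhiteSpace(line)) continue;

            string[] fields = line.Split('|');
            if (fields.Length < 9) continue;

            string value = fields[8];
            int seen;
            counts.TryGetValue(value, out seen); // seen stays 0 for a new value
            counts[value] = seen + 1;
        }
        return counts;
    }
}
```

Passing File.ReadLines(path) as the argument keeps the roughly 3 million rows streaming through rather than loading them all at once; only the distinct values are held in memory.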

is there any way to ignore reading in certain lines in a text file?

I'm trying to read in a text file in a C# application, but I don't want to read the first two lines or the last line. There are 8 lines in the file, so effectively I just want to read in lines 3, 4, 5, 6 and 7.
Is there any way to do this?
example file
_USE [Shelley's Other Database]
CREATE TABLE db.exmpcustomers(
fName varchar(100) NULL,
lName varchar(100) NULL,
dateOfBirth date NULL,
houseNumber int NULL,
streetName varchar(100) NULL
) ON [PRIMARY]_
EDIT
Okay, so, I've implemented Callum Rogers answer into my code and for some reason it works with my edited text file (I created a text file with the lines I didn't want to use omitted) and it does exactly what it should, but whenever I try it with the original text file (above) it throws an exception. I display this information in a DataGrid and I think that's where the exception is being thrown.
Any ideas?
The Answer by Rogers is good, I am just providing another way of doing this.
Try this,
List<string> list = new List<string>();
using (StreamReader reader = new StreamReader(FilePath))
{
string text = "";
while ((text = reader.ReadLine()) != null)
{
list.Add(text);
}
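list.RemoveAt(0); // drop the first line
list.RemoveAt(0); // drop the second line
list.RemoveAt(list.Count - 1); // drop the last line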
}
Hope this helps
Why do you want to ignore exactly the first two and the last line?
Depending on what your file looks like you might want to analyze the line, e.g. look at the first character whether it is a comment sign, or ignore everything until you find the first empty line, etc.
Sometimes, hardcoding "magic" numbers isn't such a good idea. What if the file format needs to be changed to contain 3 header lines?
As the other answers demonstrate: nothing keeps you from doing whatever you want with a line you have read, so of course you can ignore it, too.
Edit, now that you've provided an example of your file: For your case I'd definitely not use the hardcoded numbers approach. What if some day the SQL statement should contain another field, or if it appears on one instead of 8 lines?
My suggestion: Read in the whole string at once, then analyze it. Safest way would be to use a grammar, but if you presume the SQL statement is never going to be more complicated, you can use a regular expression (still much better than using line numbers etc.):
string content = File.ReadAllText(filename);
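Regex r = new Regex(@"CREATE TABLE [^\(]+\((.*)\) ON", RegexOptions.Singleline); // Singleline lets . match the line breaks
string whatYouWant = r.Match(content).Groups[1].Value; // group 1 is the text between the parentheses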
Why not just use File.ReadAllLines() and then remove the first 2 lines and the last line? With such a small file speed differences will not be noticeable.
string[] allLines = File.ReadAllLines("file.ext");
string[] linesWanted = new string[allLines.Length-3];
Array.Copy(allLines, 2, linesWanted, 0, allLines.Length-3);
If you have a TextReader object wrapping the filestream you could just call ReadLine() two times.
StreamReader inherits from TextReader, which is abstract.
Non-fool proof example:
using (var fs = new FileStream("blah", FileMode.Open))
using (var reader = new StreamReader(fs))
{
reader.ReadLine();
reader.ReadLine();
// Do stuff.
}
string filepath = @"C:\whatever.txt";
using (StreamReader rdr = new StreamReader(filepath))
{
rdr.ReadLine(); // ignore 1st line
rdr.ReadLine(); // ignore 2nd line
string fileContents = "";
while (true)
{
string line = rdr.ReadLine();
if (rdr.EndOfStream)
break; // finish without processing last line
fileContents += line + "\r\n";
}
Console.WriteLine(fileContents);
}
How about a general solution?
To me, the first step is to enumerate over the lines of a file (already provided by ReadAllLines, but that has a performance cost due to populating an entire string[] array; there's also ReadLines, but that's only available as of .NET 4.0).
Implementing this is pretty trivial:
public static IEnumerable<string> EnumerateLines(this FileInfo file)
{
using (var reader = file.OpenText())
{
while (!reader.EndOfStream)
{
yield return reader.ReadLine();
}
}
}
The next step is to simply skip the first two lines of this enumerable sequence. This is straightforward using the Skip extension method.
The last step is to ignore the last line of the enumerable sequence. Here's one way you could implement this:
public static IEnumerable<T> IgnoreLast<T>(this IEnumerable<T> source, int ignoreCount)
{
if (ignoreCount < 0)
{
throw new ArgumentOutOfRangeException("ignoreCount");
}
var buffer = new Queue<T>();
foreach (T value in source)
{
if (buffer.Count < ignoreCount)
{
buffer.Enqueue(value);
continue;
}
T buffered = buffer.Dequeue();
buffer.Enqueue(value);
yield return buffered;
}
}
OK, then. Putting it all together, we have:
var file = new FileInfo(@"path\to\file.txt");
var lines = file.EnumerateLines().Skip(2).IgnoreLast(1);
Test input (contents of file):
This is line number 1.
This is line number 2.
This is line number 3.
This is line number 4.
This is line number 5.
This is line number 6.
This is line number 7.
This is line number 8.
This is line number 9.
This is line number 10.
Output (of Skip(2).IgnoreLast(1)):
This is line number 3.
This is line number 4.
This is line number 5.
This is line number 6.
This is line number 7.
This is line number 8.
This is line number 9.
You can do this:
var valid = new int[] { 3, 4, 5, 6, 7 };
var lines = File.ReadAllLines("file.txt").
Where((line, index) => valid.Contains(index + 1));
Or the opposite:
var invalid = new int[] { 1, 2, 8 };
var lines = File.ReadAllLines("file.txt").
Where((line, index) => !invalid.Contains(index + 1));
If you're looking for a general way to remove the last and the first 2, you can use this:
var allLines = File.ReadAllLines("file.txt");
var lines = allLines
.Take(allLines.Length - 1)
.Skip(2);
But from your example it seems that you're better off looking for the string pattern that you want to read from the file. Try using regexes.
