Reading CSV file, match column 3 to column 1 on separate lines

I am beating my head against the wall, and I'm hoping that someone can point me in the right direction. This can't be as complicated as I'm making it.
I'm working on a project where the program reads one file (that's approx 50 lines or so) and I need to have it match data on the third column to data on the first column of a separate line. I opened up a new project because it was getting too complex for such an easy task.
Here's an example file that is closely relevant to the actual file I'm working with:
a1,b1,c1,d1
**c4**,b2,c2,d2
a3,b3,c3,d3
a4,b4,**c4**,d4
a5,b5,c4,d5
I promise this isn't for a school project, this is something that I need to figure out for work purposes.
Here is what I have, and I know it's just not going to work because it's only reading line by line for comparison. How do I get the program to read the current array value in the foreach command against the entire file that I caught in streamreader?
static void Main(string[] args)
{
StreamReader sr = new StreamReader("directories.txt");
string sline = sr.ReadLine();
string[] sarray = sline.Split(',');
string col3 = sarray[2];
string col1 = sarray[0];
foreach(string a in sarray)
{
// ?!?!?!!!
// I know this won't work because I'm comparing the same line being read.
// How in the world can I make this program read col3 of the current line being read against the entire file that was read earlier?
if (col3 == col1)
{
Directory.CreateDirectory("DRIVE:\\Location\\" + a.ToString());
}
}
}
Thank you ahead of time!

Since your file is small you can go with the simplest path...
var lines = File.ReadLines(filename)
.Select(line => line.Split(','))
.ToList();
var result = from a in lines
from b in lines
where a[0] == b[2]
select new { a, b };
foreach(var x in result)
{
Console.WriteLine(string.Join(",", x.a) + " - " + string.Join(",", x.b));
}
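If the file ever grows, the nested `from` above compares every line with every other line. Here is a minimal sketch of the same match using a lookup keyed on column 1. The names `ColumnMatcher` and `MatchThirdToFirst` and the inline sample data are made up for illustration, and value tuples assume C# 7 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ColumnMatcher
{
    // Returns pairs (A, B) where column 1 of row A equals column 3 of row B.
    public static List<(string[] A, string[] B)> MatchThirdToFirst(IEnumerable<string> lines)
    {
        var rows = lines.Select(l => l.Split(','))
                        .Where(r => r.Length >= 3)   // skip rows too short to compare
                        .ToList();

        // Group rows by their first column so each match is a hash lookup.
        var byFirst = rows.ToLookup(r => r[0]);

        var result = new List<(string[] A, string[] B)>();
        foreach (var b in rows)
            foreach (var a in byFirst[b[2]])
                result.Add((a, b));
        return result;
    }

    public static void Main()
    {
        // Sample data from the question; a real program would use File.ReadLines(path).
        var sample = new[] { "a1,b1,c1,d1", "c4,b2,c2,d2", "a3,b3,c3,d3", "a4,b4,c4,d4", "a5,b5,c4,d5" };
        foreach (var (a, b) in MatchThirdToFirst(sample))
            Console.WriteLine(string.Join(",", a) + " - " + string.Join(",", b));
    }
}
```

With the sample file from the question, this pairs the `c4,b2,c2,d2` line with the `a4` and `a5` lines, the same pairs the LINQ query produces.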

Related

C# - Reading from CSV file producing two entries at once?

I am trying to read in a list of several hundred thousand values from what once was a spreadsheet, but for the sake of simplicity, I have turned into a CSV file.
My problem is that while testing it to make sure it reads properly, the code is for some reason ignoring the comma after the second position, and combining the value in that spot with the value next to it, despite being, you know, separated by a comma. It also begins to combine the final value with the first value from the next set.
For Example:
CSV File:
0,0,0,104672
0,1,6,51971
0,1,36,80212
0,2,5,51972
0,2,13,51973
...
Program Output:
0
00
1046720
00
16
519710
136
...
I think the example probably does a better job describing what's going on than I did in words. It continues like that, displaying the wrong information until it reaches the end of the file.
My code is as follows:
static void Main()
{
using(var fs = File.OpenRead(@"C:\path\to\file.csv"))
using(var read = new StreamReader(fs))
{
while (!read.EndOfStream)
{
int i = 0;
var line = read.ReadLine();
while (i < 4)
{
var values = line.Split(',');
Console.Write(values[i]);
Console.Read();
i++;
}
}
}
}
EDIT: Sorry, I got lost in my understanding of what the code should do and forgot to explain the goal here.
This program is made to take these values and rename a file from the 4th value (for example, 104672) to the first three values, separated by dashes (ex. 0-0-0). What I want from my output right now is to be able to see the program give me the values back, one at a time, so that I know when I go to rename the files, I'm not getting improper results.
EDIT 2: I also realize, a day later, that the answer I got was one of significance to making my program work, rather than actually discovering why I was getting the output I got. For those curious in the future, the answer is essentially that Console.Read(); is not a true pause, and causes more writes to happen upon key press than expected.

A clearer, easier-to-understand approach would be:
using (StreamReader sr = new StreamReader(@"C:\path\to\file.csv"))
{
string currentLine;
while((currentLine = sr.ReadLine()) != null)
{
string[] lineArr = currentLine.Split(',');
foreach(string subLine in lineArr)
{
Console.WriteLine(subLine);
}
Console.ReadLine(); // Waits for Enter and consumes the whole input line before proceeding
}
}

If you need those values for later use, why not put them into a List, like this:
List<string[]> listOfValues = new List<string[]>();
using (var fs = File.OpenRead(@"C:\temp\csv.txt"))
using (var read = new StreamReader(fs))
{
while (!read.EndOfStream)
{
var line = read.ReadLine();
listOfValues.Add(line.Split(','));
}
}
later, you can use data from list:
for (int i = 0; i<listOfValues.Count; i++)
{
Console.WriteLine("line number: {0}, contents: {1}", i + 1, string.Join(" ", listOfValues[i]));
}
which gives you
line number: 1, contents: 0 0 0 104672
line number: 2, contents: 0 1 6 51971
line number: 3, contents: 0 1 36 80212
line number: 4, contents: 0 2 5 51972
line number: 5, contents: 0 2 13 51973

It's hard to tell from your code what you think it's supposed to do. Here is a version that will read each line, split it on the commas, and iterate through the values, printing each value. After printing all the values for a line, it prints a new line. Hopefully that's something like what you were trying to achieve.
static void Main()
{
using(var fs = File.OpenRead(@"C:\path\to\file.csv"))
using(var read = new StreamReader(fs))
{
while (!read.EndOfStream)
{
int i = 0;
var line = read.ReadLine();
var values = line.Split(',');
while (i < values.Length)
{
Console.Write(values[i]);
//Console.Read();
i++;
}
Console.WriteLine();
}
}
}

As @rory.ap said, there are plenty of libraries that read CSV right out of the box. But even if you still want to do it on your own, it seems you are putting a great deal of effort into a simple task. Try this:
using (StreamReader reader = new StreamReader("C:/yourpath/yourfile.csv"))
{
while (reader.Peek() >= 0)
{
string line = reader.ReadLine();
string[] yourData = line.Split(',');
}
}

Why not just
foreach (var line in File.ReadLines(@"C:\path\to\file.csv"))
foreach (var segment in line.Split(','))
Console.WriteLine(segment);
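For the renaming goal described in the question's edit (4th value is the current file name, first three values joined by dashes are the new one), it may help to check the parsed names before touching any files. A rough sketch follows; `RenamePlan` and `TryParse` are hypothetical names, and the assumption that every valid line has exactly four fields comes from the sample data only.

```csharp
using System;

public static class RenamePlan
{
    // Parses "0,1,6,51971" into oldName "51971" and newName "0-1-6".
    // Returns false when the line does not have exactly four fields.
    public static bool TryParse(string line, out string oldName, out string newName)
    {
        oldName = null;
        newName = null;
        if (string.IsNullOrWhiteSpace(line)) return false;

        string[] values = line.Split(',');
        if (values.Length != 4) return false;

        oldName = values[3].Trim();
        newName = string.Join("-", values[0].Trim(), values[1].Trim(), values[2].Trim());
        return true;
    }

    public static void Main()
    {
        // Inline sample lines; a real program would use File.ReadLines(path).
        foreach (var line in new[] { "0,0,0,104672", "0,1,6,51971", "bad line" })
        {
            if (TryParse(line, out var oldName, out var newName))
                Console.WriteLine(oldName + " -> " + newName);
            else
                Console.WriteLine("skipped: " + line);
        }
    }
}
```

Printing the plan one line at a time like this avoids the need for a `Console.Read()` pause inside the loop.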

How to add text to the line that starts with "hello" in a file

I have a file that contains many lines. There is a line here looking like below:
hello jim jack nina richi sam
I need to add the specific text salmon to this line and change it to the line below (it can be added anywhere in the line: at the end, the beginning, or the middle, it doesn't matter):
hello jim jack nina richi sam salmon
I tried:
string path = @"C:\testFolder\newTestLog.txt";
StreamReader myReader = new StreamReader(path);
string[] allLines = File.ReadAllLines(path);
foreach (string element in allLines) {
if (element.StartsWith("hello"))
{
Console.WriteLine(element);
}
}
myReader.Close();
}
Using this I'm able to read the file line by line and add each line to an array and print that line if that starts with "hello", but I'm not sure how to add text to this line

You should use what Joel answered, it's nicer, but if you're having trouble implementing it, try this. After adding salmon to the lines that start with hello, you can overwrite the text file by using File.WriteAllLines:
string filePath = @"C:\testFolder\newTestLog.txt";
string[] allLines = File.ReadAllLines(filePath);
for(int i = 0; i < allLines.Length; i++)
{
if (allLines[i].StartsWith("hello"))
{
allLines[i] += " salmon";
}
}
File.WriteAllLines(filePath, allLines);

Try this:
string path = @"C:\testFolder\newTestLog.txt";
var lines = File.ReadLines(path).Select(l => l + (l.StartsWith("hello") ? " salmon" : "")); // parentheses needed: + binds tighter than ?:
foreach (string line in lines)
Console.WriteLine(line);
Note that this still only writes the results to the Console, as your sample does. It's not clear what you really want to happen with the output.
If you want this saved to the original file, you've opened up a small can of worms. Think of all of the data in your file as if it's stored in one contiguous block [1]. If you append text to any line in the file, that text has nowhere to go but to overwrite the beginning of the next line. As a practical matter, if you need to modify a file, this often means either writing out a whole new file and then deleting/renaming when done, or alternatively keeping the whole file in memory and writing it all from start to finish.
Using the second approach, where we keep everything in memory, you can do this:
string path = @"C:\testFolder\newTestLog.txt";
var lines = File.ReadAllLines(path).Select(l => l + (l.StartsWith("hello") ? " salmon" : ""));
File.WriteAllLines(path, lines);
[1] In fact, a file may be split into several fragments on the disk, but even so, each fragment is presented to your program as part of a single whole.
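The first approach mentioned above (write a new file, then swap it in) could look roughly like this. It is only a sketch: `LineAppender`, `AppendToMatchingLines`, and the `.tmp` suffix are made-up names, and copying the temporary file over the original is just one of several ways to do the swap.

```csharp
using System;
using System.IO;
using System.Linq;

public static class LineAppender
{
    // Streams the file, appends suffix to every line starting with prefix,
    // writes the result to a temporary file, then copies it over the original.
    public static void AppendToMatchingLines(string path, string prefix, string suffix)
    {
        string tempPath = path + ".tmp";   // hypothetical temp-file name
        var updated = File.ReadLines(path)
                          .Select(l => l.StartsWith(prefix, StringComparison.Ordinal) ? l + suffix : l);

        // WriteAllLines enumerates the whole source, so the original is closed again afterwards.
        File.WriteAllLines(tempPath, updated);
        File.Copy(tempPath, path, overwrite: true);
        File.Delete(tempPath);
    }

    public static void Main()
    {
        // Demo on a throwaway file rather than a real log.
        string path = Path.GetTempFileName();
        File.WriteAllLines(path, new[] { "first line", "hello jim jack nina richi sam", "last line" });
        AppendToMatchingLines(path, "hello", " salmon");
        Console.WriteLine(File.ReadAllText(path));
        File.Delete(path);
    }
}
```

Because the source is streamed with `File.ReadLines`, only one line at a time needs to be held in memory, which matters more for large files than for a small log.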

Code does not execute

I know I've been a bit of pain, the last couple of days, that is, with all my questions, but I've been developing this project and I'm (figuratively) inches away from finishing it.
That being said, I would like your help on one more matter. It kind of relates to my previous questions, but you do not need the code for those. The problem lies exactly on this bit of code. What I want from you is to help me identify it and, consequently, solve it.
Before I show you the code I'd been working on, I'd like to say a few extra things:
My application has a file merging feature, merging two files together and handling duplicate entries.
In any given file, each line can have one of these four formats (the last three are optional): Card Name|Amount, .Card Name|Amount, ..Card Name|Amount, _Card Name|Amount.
If a line is not appropriately formatted, the program will skip it (ignore it altogether).
So, basically, a sample file could be as follows:
Blue-Eyes White Dragon|3
..Blue-Eyes Ultimate Dragon|1
.Dragon Master Knight|1
_Kaibaman|1
Now, when it comes to using the file merger, if a line starts with one of the special characters . .. _, it should act accordingly. For ., it operates normally. For lines starting with .., it moves the index to the second dot and, finally, it ignores _ lines completely (they have another use not related to this discussion).
Here is my code for the merge function (for some odd reason, the code inside the second loop won't execute at all):
if (openFileDialog1.ShowDialog() == DialogResult.OK)
{
// Save file names to array.
string[] fileNames = openFileDialog1.FileNames;
// Loop through the files.
foreach (string fileName in fileNames)
{
// Save all lines of the current file to an array.
string[] lines = File.ReadAllLines(fileName);
// Loop through the lines of the file.
for (int i = 0; i < lines.Length; i++)
{
// Split current line.
string[] split = lines[i].Split('|');
// If the current line is badly formatted, skip to the next one.
if (split.Length != 2)
continue;
string title = split[0];
string times = split[1];
if (lines[i].StartsWith("_"))
continue;
// If newFile (list used to store contents of the card resource file) contains the current line of the file that we're currently looping through...
for (int k = 0; k < newFile.Count; k++)
{
if (lines[i].StartsWith(".."))
{
string newTitle = lines[i].Substring(
lines[i].IndexOf("..") + 1);
if (newFile[k].Contains(newTitle))
{
// Split the line once again.
string[] secondSplit = newFile.ElementAt(
newFile.IndexOf(newFile[k])).Split('|');
string secondTimes = secondSplit[1];
// Replace the newFile element at the specified index.
newFile[newFile.IndexOf(newFile[k])] =
string.Format("{0}|{1}", newTitle, int.Parse(times) + int.Parse(secondTimes));
}
// If newFile does not contain the current line of the file we're looping through, just add it to newFile.
else
newFile.Add(string.Format(
"{0}|{1}",
newTitle, times));
continue;
}
if (newFile[k].Contains(title))
{
string[] secondSplit = newFile.ElementAt(
newFile.IndexOf(newFile[k])).Split('|');
string secondTimes = secondSplit[1];
newFile[newFile.IndexOf(newFile[k])] =
string.Format("{0}|{1}", title, int.Parse(times) + int.Parse(secondTimes));
}
else
{
newFile.Add(string.Format("{0}|{1}", title, times));
}
}
}
}
// Overwrite resources file with newFile.
using (StreamWriter sw = new StreamWriter("CardResources.ygodc"))
{
foreach (string line in newFile)
sw.WriteLine(line);
}
I know this is quite a long piece of code, but I believe all of it is relevant to a point. I skipped some unimportant bits (after all of this is executed) as they are completely irrelevant.
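One thing worth checking, assuming `newFile` is empty when the merge starts: the condition `k < newFile.Count` is then false on the very first pass, so the inner loop body (including both `Add` calls) never runs and the list stays empty. Below is a sketch of the merge rules described above using a dictionary keyed by card name, which needs no inner loop. `CardMerger` and `Merge` are hypothetical names, it matches names exactly rather than with `Contains`, and treating the `..` rule as "drop the first dot" is only my reading of the description.

```csharp
using System;
using System.Collections.Generic;

public static class CardMerger
{
    // Adds "Name|Amount" lines to totals, summing the amounts of repeated names.
    public static void Merge(IDictionary<string, int> totals, IEnumerable<string> lines)
    {
        foreach (string line in lines)
        {
            string[] split = line.Split('|');
            if (split.Length != 2) continue;                         // badly formatted line
            if (line.StartsWith("_")) continue;                      // "_" lines are ignored by the merger
            if (!int.TryParse(split[1], out int amount)) continue;   // amount is not a number

            string title = split[0];
            if (title.StartsWith(".."))
                title = title.Substring(1);                          // start at the second dot

            totals.TryGetValue(title, out int existing);             // existing is 0 when the name is new
            totals[title] = existing + amount;
        }
    }

    public static void Main()
    {
        var totals = new Dictionary<string, int>();
        Merge(totals, new[] { "Blue-Eyes White Dragon|3", "..Blue-Eyes Ultimate Dragon|1", "_Kaibaman|1" });
        Merge(totals, new[] { "Blue-Eyes White Dragon|2", "not a card line" });
        foreach (var pair in totals)
            Console.WriteLine("{0}|{1}", pair.Key, pair.Value);
    }
}
```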

Searching for line of one text file in another text file, faster

Is there a faster way to search each line of one text file for occurrence in another text file, than by going line by line in both files?
I have two text files - one has ~2500 lines (let's call it TxtA), the other has ~86000 lines(TxtB). I want to search TxtB for each line in TxtA, and return the line in TxtB for each match found.
I currently have this setup as: for each line in TxtA, search TxtB line by line for a match. However this is taking a really long time to process. It seems like it would take 1-3 hours to find all the matches.
Here is my code...
private static void getGUIDAndType()
{
try
{
Console.WriteLine("Begin.");
System.Threading.Thread.Sleep(4000);
String dbFilePath = @"C:\WindowsApps\CRM\crm_interface\data\";
StreamReader dbsr = new StreamReader(dbFilePath + "newdbcontents.txt");
List<string> dblines = new List<string>();
String newDataPath = @"C:\WindowsApps\CRM\crm_interface\data\";
StreamReader nsr = new StreamReader(newDataPath + "HolidayList1.txt");
List<string> new1 = new List<string>();
string dbline;
string newline;
List<string> results = new List<string>();
while ((newline = nsr.ReadLine()) != null)
{
//Reset
dbsr.BaseStream.Position = 0;
dbsr.DiscardBufferedData();
while ((dbline = dbsr.ReadLine()) != null)
{
newline = newline.Trim();
if (dbline.IndexOf(newline) != -1)
{//if found... get all info for now
Console.WriteLine("FOUND: " + newline);
System.Threading.Thread.Sleep(1000);
new1.Add(newline);
break;
}
else
{//the first line of db does not contain this line...
//go to next dbline.
Console.WriteLine("Lines do not match - continuing");
continue;
}
}
Console.WriteLine("Going to next new Line");
System.Threading.Thread.Sleep(1000);
//continue;
}
nsr.Close();
Console.WriteLine("Writing to dbc3.txt");
System.IO.File.WriteAllLines(@"C:\WindowsApps\CRM\crm_interface\data\dbc3.txt", results.ToArray());
Console.WriteLine("Finished. Press ENTER to continue.");
Console.WriteLine("End.");
Console.ReadLine();
}
catch (Exception ex)
{
Console.WriteLine("Error: " + ex);
Console.ReadLine();
}
}
Please let me know if there is a faster way. Preferably something that would take 5-10 minutes... I've heard of indexing but didn't find much on this for txt files. I've tested regex and it's no faster than indexof. Contains won't work because the lines will never be exactly the same.
Thanks.

There might be a faster way, but this LINQ approach should be faster than 3 hours and is a sight better to read and maintain:
var f1Lines = File.ReadAllLines(f1Path);
var f2LineInf1 = File.ReadLines(f2Path)
.Where( line => f1Lines.Contains(line))
.Select(line => line).ToList();
Edit: tested, and it required less than 1 second for 400000 lines in file2 and 17000 lines in file1. I can use File.ReadLines for the big file, which does not load everything into memory at once. For the smaller file I need to use File.ReadAllLines, since Contains needs the complete list of lines of file 1.
If you want to log the result in a third file:
File.WriteAllLines(logPath, f2LineInf1);
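For exact matches only, the `Contains` call on an array scans the whole array for every line of the big file. A sketch of the same idea with a `HashSet<string>` is shown below; `ExactLineMatcher` and `FindCommon` are made-up names, trimming before comparing is my assumption, and this does not cover the substring case the question ultimately needs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExactLineMatcher
{
    // Returns the lines of bigFile that also occur (exactly, after trimming) in smallFile.
    public static List<string> FindCommon(IEnumerable<string> smallFile, IEnumerable<string> bigFile)
    {
        // Hash the small file once; each later lookup is then roughly constant time.
        var wanted = new HashSet<string>(smallFile.Select(l => l.Trim()), StringComparer.Ordinal);
        return bigFile.Where(l => wanted.Contains(l.Trim())).ToList();
    }

    public static void Main()
    {
        // Inline sample data; a real program would pass File.ReadLines(path) for each file.
        var small = new[] { "alpha", "gamma " };
        var big = new[] { "alpha", "beta", " gamma", "delta" };
        foreach (string line in FindCommon(small, big))
            Console.WriteLine("[" + line + "]");
    }
}
```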

EDIT: Note that I'm assuming it's reasonable to read at least one file into memory. You may want to swap the queries below around to avoid loading the "big" file into memory, but even 86,000 lines at (say) 1K per line is going to be less than 2G of memory - which is relatively little to do something significant.
You're reading the "inner" file each time. There's no need for that. Load both files into memory and go from there. Heck, for exact matches you can do the whole thing in LINQ easily:
var query = from line1 in File.ReadLines(newDataPath + "HolidayList1.txt")
join line2 in File.ReadLines(dbFilePath + "newdbcontents.txt")
on line1 equals line2
select line1;
var commonLines = query.ToList();
But for non-joins it's still simple; just read one file completely first (explicitly) and then stream the other:
// Eagerly read the "inner" file
var lines2 = File.ReadAllLines(dbFilePath + "newdbcontents.txt");
var query = from line1 in File.ReadLines(newDataPath + "HolidayList1.txt")
from line2 in lines2
where line2.Contains(line1)
select line1;
var commonLines = query.ToList();
There's nothing clever here - it's just a really simple way of writing code to read all the lines in one file, then iterate over the lines in the other file and for each line check against all the lines in the first file. But even without anything clever, I strongly suspect it would perform well enough for you. Concentrate on simplicity, eliminate unnecessary IO, and see whether that's good enough before trying to do anything fancier.
Note that in your original code, you should be using using statements for your StreamReader variables, to ensure they get disposed properly. Using the above code makes it simple to not even need that though...

Quick and dirty because I've got to go... If you can do it in memory, try working with this snippet:
//string[] searchIn = File.ReadAllLines("File1.txt");
//string[] searchFor = File.ReadAllLines("File2.txt");
string[] searchIn = new string[] {"A","AB","ABC","ABCD", null, "", " "};
string[] searchFor = new string[] {"A","BC","BCD", null, "", " "};
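foreach(string item in searchFor)
{
// Entries of searchIn that equal item, contain it, or are contained in it.
string[] matchingItems = Array.FindAll(searchIn, x => (x == item) || (!string.IsNullOrEmpty(x) && !string.IsNullOrEmpty(item) && (x.Contains(item) || item.Contains(x))));
Console.WriteLine("{0}: {1}", item ?? "(null)", string.Join(", ", matchingItems));
}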

is there any way to ignore reading in certain lines in a text file?

I'm trying to read in a text file in a C# application, but I don't want to read the first two lines or the last line. There are 8 lines in the file, so effectively I just want to read in lines 3, 4, 5, 6 and 7.
Is there any way to do this?
example file
USE [Shelley's Other Database]
CREATE TABLE db.exmpcustomers(
fName varchar(100) NULL,
lName varchar(100) NULL,
dateOfBirth date NULL,
houseNumber int NULL,
streetName varchar(100) NULL
) ON [PRIMARY]
EDIT
Okay, so, I've implemented Callum Rogers' answer in my code. For some reason it works with my edited text file (a copy I created with the unwanted lines omitted) and does exactly what it should, but whenever I try it with the original text file (above) it throws an exception. I display this information in a DataGrid and I think that's where the exception is being thrown.
Any ideas?

The Answer by Rogers is good, I am just providing another way of doing this.
Try this,
List<string> list = new List<string>();
using (StreamReader reader = new StreamReader(FilePath))
{
string text = "";
while ((text = reader.ReadLine()) != null)
{
list.Add(text);
}
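list.RemoveAt(0); // drop the first line
list.RemoveAt(0); // drop the second line (now at index 0)
list.RemoveAt(list.Count - 1); // drop the last line (assumes at least three lines were read)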
}
Hope this helps

Why do you want to ignore exactly the first two and the last line?
Depending on what your file looks like you might want to analyze the line, e.g. look at the first character whether it is a comment sign, or ignore everything until you find the first empty line, etc.
Sometimes, hardcoding "magic" numbers isn't such a good idea. What if the file format needs to be changed to contain 3 header lines?
As the other answers demonstrate: nothing keeps you from doing whatever you want with a line you have read, so of course you can ignore it, too.
Edit, now that you've provided an example of your file: For your case I'd definitely not use the hardcoded numbers approach. What if some day the SQL statement should contain another field, or if it appears on one instead of 8 lines?
My suggestion: Read in the whole string at once, then analyze it. Safest way would be to use a grammar, but if you presume the SQL statement is never going to be more complicated, you can use a regular expression (still much better than using line numbers etc.):
string content = File.ReadAllText(filename);
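// Requires: using System.Text.RegularExpressions;
Regex r = new Regex(@"CREATE TABLE [^\(]+\((.*)\) ON", RegexOptions.Singleline); // Singleline lets . match the line breaks
string whatYouWant = r.Match(content).Groups[1].Value; // group 1 is the text between the parentheses; group 0 is the whole match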

Why not just use File.ReadAllLines() and then remove the first 2 lines and the last line? With such a small file speed differences will not be noticeable.
string[] allLines = File.ReadAllLines("file.ext");
string[] linesWanted = new string[allLines.Length-3];
Array.Copy(allLines, 2, linesWanted, 0, allLines.Length-3);

If you have a TextReader object wrapping the filestream you could just call ReadLine() two times.
StreamReader inherits from TextReader, which is abstract.
Non-fool proof example:
using (var fs = new FileStream("blah", FileMode.Open))
using (var reader = new StreamReader(fs))
{
reader.ReadLine();
reader.ReadLine();
// Do stuff.
}

string filepath = @"C:\whatever.txt";
using (StreamReader rdr = new StreamReader(filepath))
{
rdr.ReadLine(); // ignore 1st line
rdr.ReadLine(); // ignore 2nd line
string fileContents = "";
while (true)
{
string line = rdr.ReadLine();
if (rdr.EndOfStream)
break; // finish without processing last line
fileContents += line + Environment.NewLine; // a verbatim "\r\n" would append the literal backslash characters
}
Console.WriteLine(fileContents);
}

How about a general solution?
To me, the first step is to enumerate over the lines of a file (already provided by ReadAllLines, but that has a performance cost due to populating an entire string[] array; there's also ReadLines, but that's only available as of .NET 4.0).
Implementing this is pretty trivial:
public static IEnumerable<string> EnumerateLines(this FileInfo file)
{
using (var reader = file.OpenText())
{
while (!reader.EndOfStream)
{
yield return reader.ReadLine();
}
}
}
The next step is to simply skip the first two lines of this enumerable sequence. This is straightforward using the Skip extension method.
The last step is to ignore the last line of the enumerable sequence. Here's one way you could implement this:
public static IEnumerable<T> IgnoreLast<T>(this IEnumerable<T> source, int ignoreCount)
{
if (ignoreCount < 0)
{
throw new ArgumentOutOfRangeException("ignoreCount");
}
var buffer = new Queue<T>();
foreach (T value in source)
{
if (buffer.Count < ignoreCount)
{
buffer.Enqueue(value);
continue;
}
T buffered = buffer.Dequeue();
buffer.Enqueue(value);
yield return buffered;
}
}
OK, then. Putting it all together, we have:
var file = new FileInfo(@"path\to\file.txt");
var lines = file.EnumerateLines().Skip(2).IgnoreLast(1);
Test input (contents of file):
This is line number 1.
This is line number 2.
This is line number 3.
This is line number 4.
This is line number 5.
This is line number 6.
This is line number 7.
This is line number 8.
This is line number 9.
This is line number 10.
Output (of Skip(2).IgnoreLast(1)):
This is line number 3.
This is line number 4.
This is line number 5.
This is line number 6.
This is line number 7.
This is line number 8.
This is line number 9.

You can do this:
var valid = new int[] { 3, 4, 5, 6, 7 };
var lines = File.ReadAllLines("file.txt").
Where((line, index) => valid.Contains(index + 1));
Or the opposite:
var invalid = new int[] { 1, 2, 8 };
var lines = File.ReadAllLines("file.txt").
Where((line, index) => !invalid.Contains(index + 1));
If you're looking for a general way to remove the last and the first 2, you can use this:
var allLines = File.ReadAllLines("file.txt");
var lines = allLines
.Take(allLines.Length - 1)
.Skip(2);
But from your example it seems that you're better off looking for the string pattern that you want to read from the file. Try using regexes.
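As a rough illustration of the pattern-based route for the sample file in the question, the column definitions can be pulled out by matching the text between `CREATE TABLE name(` and `) ON`. This is a sketch under the assumption that the statement always has that shape; `CreateTableParser` and `ExtractColumns` are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CreateTableParser
{
    // Returns the column definition lines between "CREATE TABLE name(" and ") ON".
    public static string[] ExtractColumns(string sql)
    {
        // Singleline lets '.' match newlines, so the group can span several lines.
        Match m = Regex.Match(sql, @"CREATE TABLE [^\(]+\((.*)\)\s*ON", RegexOptions.Singleline);
        if (!m.Success) return new string[0];

        return m.Groups[1].Value
                .Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
                .Select(l => l.Trim().TrimEnd(','))   // drop indentation and trailing commas
                .Where(l => l.Length > 0)
                .ToArray();
    }

    public static void Main()
    {
        // Inline copy of the sample file; a real program would use File.ReadAllText(path).
        string sql = string.Join("\n", new[]
        {
            "USE [Shelley's Other Database]",
            "CREATE TABLE db.exmpcustomers(",
            "fName varchar(100) NULL,",
            "lName varchar(100) NULL,",
            "dateOfBirth date NULL,",
            "houseNumber int NULL,",
            "streetName varchar(100) NULL",
            ") ON [PRIMARY]"
        });
        foreach (string column in ExtractColumns(sql))
            Console.WriteLine(column);
    }
}
```

Unlike the line-number approaches, this keeps working if a header line is added or a column is inserted, as long as the statement keeps the same overall shape.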
