I know I've been a bit of pain, the last couple of days, that is, with all my questions, but I've been developing this project and I'm (figuratively) inches away from finishing it.
That being said, I would like your help on one more matter. It kind of relates to my previous questions, but you do not need the code for those. The problem lies exactly on this bit of code. What I want from you is to help me identify it and, consequently, solve it.
Before I show you the code I'd been working on, I'd like to say a few extra things:
My application has a file merging feature, merging two files together and handling duplicate entries.
In any given file, each line can have one of these four formats (the last three are optional): Card Name|Amount, .Card Name|Amount, ..Card Name|Amount, _Card Name|Amount.
If a line is not appropriately formatted, the program will skip it (ignore it altogether).
So, basically, a sample file could be as follows:
Blue-Eyes White Dragon|3
..Blue-Eyes Ultimate Dragon|1
.Dragon Master Knight|1
_Kaibaman|1
Now, when it comes to using the file merger, if a line starts with one of the special characters . .. _, it should act accordingly. For ., it operates normally. For lines starting with .., it moves the index to the second dot and, finally, it ignores _ lines completely (they have another use not related to this discussion).
Here is my code for the merge function (for some odd reason, the code inside the second loop won't execute at all):
if (openFileDialog1.ShowDialog() == DialogResult.OK)
{
// Save file names to array.
string[] fileNames = openFileDialog1.FileNames;
// Loop through the files.
foreach (string fileName in fileNames)
{
// Save all lines of the current file to an array.
string[] lines = File.ReadAllLines(fileName);
// Loop through the lines of the file.
for (int i = 0; i < lines.Length; i++)
{
// Split current line.
string[] split = lines[i].Split('|');
// If the current line is badly formatted, skip to the next one.
if (split.Length != 2)
continue;
string title = split[0];
string times = split[1];
if (lines[i].StartsWith("_"))
continue;
// If newFile (list used to store contents of the card resource file) contains the current line of the file that we're currently looping through...
for (int k = 0; k < newFile.Count; k++)
{
if (lines[i].StartsWith(".."))
{
string newTitle = lines[i].Substring(
lines[i].IndexOf("..") + 1);
if (newFile[k].Contains(newTitle))
{
// Split the line once again.
string[] secondSplit = newFile.ElementAt(
newFile.IndexOf(newFile[k])).Split('|');
string secondTimes = secondSplit[1];
// Replace the newFile element at the specified index.
newFile[newFile.IndexOf(newFile[k])] =
string.Format("{0}|{1}", newTitle, int.Parse(times) + int.Parse(secondTimes));
}
// If newFile does not contain the current line of the file we're looping through, just add it to newFile.
else
newFile.Add(string.Format(
"{0}|{1}",
newTitle, times));
continue;
}
if (newFile[k].Contains(title))
{
string[] secondSplit = newFile.ElementAt(
newFile.IndexOf(newFile[k])).Split('|');
string secondTimes = secondSplit[1];
newFile[newFile.IndexOf(newFile[k])] =
string.Format("{0}|{1}", title, int.Parse(times) + int.Parse(secondTimes));
}
else
{
newFile.Add(string.Format("{0}|{1}", title, times));
}
}
}
}
// Overwrite resources file with newFile.
using (StreamWriter sw = new StreamWriter("CardResources.ygodc"))
{
foreach (string line in newFile)
sw.WriteLine(line);
}
I know this is quite a long piece of code, but I believe all of it is relevant to a point. I skipped some unimportant bits (after all of this is executed) as they are completely irrelevant.
Related
I have a CSV that looks like this. My goal is to extract each entry (notice I said entry, not line), where an entry starts from the first column and stretches to the last column, and may span multiple lines. I'd like to extract an entry without ruining the formatting. For example, I do not want the following to be considered four seperate lines,
Eg. 1, One Column Multiple Lines
...,"1. copy ctor
2. copy ctor
3. declares function
4. default ctor",... // Where ... represents the columns before and after
but rather a column in one entry that can be represented as such
Eg. 2, One Column Single Line
"1. copy ctor\n2.copy ctor\ndeclares function\n4.default ctor"
When I iterate over the CSV, as such, I get Eg. 1. I'm not sure why splitting on a comma is treating a new line as a comma.
using (var streamReader = new StreamReader("results-survey111101.csv"))
{
string line;
while ((line = streamReader.ReadLine()) != null)
{
string[] splitLine = line.Split(',');
foreach (var column in splitLine)
Console.WriteLine(column);
}
}
If someone can show me what I need to do to get these multi line CSV columns into one line that maintains the formatting (e.g. adds \t or \n where necessary) that would be great. Thanks!
Assuming your source file is valid CSV, variability in the data is really hard to account for. That's all I'll say, but I'll link you to another SO answer if you need convincing that writing your own CSV parser is a horrible task. Reading CSV files using C#
Let's assume you are going to take advantage of an existing CSV reader library. I'll use TextFieldParser from the Microsoft.VisualBasic library as is used in the example answer I linked.
Your task is to read your source file line by line, and validate whether the line is a complete CSV entry on it's own, or if it forms part of a broken line.
If it forms part of a broken line, we need to remember the line and add the next line to it before attempting validation again.
For this we need to know one thing:
What is the expected number of fields each data entry row should have?
int expectedFieldCount = 7;
string brokenLine = "";
using (var streamReader = new StreamReader("results-survey111101.csv"))
{
string line;
while ((line = streamReader.ReadLine()) != null) // read the next line
{
// if the previous line was incomplete, add it to the current line,
// otherwise use the current line
string csvLineData = (brokenLine.Length > 0) ? brokenLine + line : line;
try
{
using (StringReader stringReader = new StringReader(csvLineData ))
using (TextFieldParser parser = new TextFieldParser(stringReader))
{
parser.SetDelimiters(",");
while (!parser.EndOfData)
{
string[] fields = parser.ReadFields(); // tests if the line is valid csv
if (expectedFieldCount == fields.Length)
{
// do whatever you want with the fields now.
foreach (var field in fields)
{
Console.WriteLine(field);
}
brokenLine = ""; // reset the brokenLine
}
else // it was valid csv, but we don't have the required number of fields yet
{
brokenLine += line + #"\r\n";
break;
}
}
}
}
catch (Exception ex) // the current line is NOT valid csv, update brokenLine
{
brokenLine += (line + #"\r\n");
}
}
}
I am replacing the line breaks that broken lines contain with \r\n literals. You can display these in your resulting one-liner field however you want. But you shouldn't expect to be able to copy paste the result into notepad and see line breaks.
One assumes you have the same number of columns in each record. Therefore in your code where you do your Split you can merely sum the length of splitLine into a running columnsReadCount until they equal the desired columnsPerRecordCount. At that point you have read all the record and can reset the running columnsReadCount back to zero ready for the next record to read.
I have a file that contains many lines. There is a line here looking like below:
hello jim jack nina richi sam
I need to add a specific text salmon in this line and change it to below (it could be added anywhere in this line -end -begining - in the middle -doesnt matter ):
hello jim jack nina richi sam salmon
I tried:
string path = #"C:\testFolder\newTestLog.txt";
StreamReader myReader = new StreamReader(path);
string[] allLines = File.ReadAllLines(path);
foreach (string element in allLines) {
if (element.StartsWith("hello"))
{
Console.WriteLine(element);
}
}
myReader.Close();
}
Using this I'm able to read the file line by line and add each line to an array and print that line if that starts with "hello", but I'm not sure how to add text to this line
You should use what Joel answered it's nicer but if you're having trouble implementing it try this. After adding the salmon to the lines that start with hello you can overwrite the txt file by using File.WriteAllLines
string filePath = #"C:\testFolder\newTestLog.txt";
string[] allLines = File.ReadAllLines(filePath);
for(int i = 0; i < allLines.Length; i++)
{
if (allLines[i].StartsWith("hello"))
{
allLines[i] += " salmon";
}
}
File.WriteAllLines(filePath, allLines);
Try this:
string path = #"C:\testFolder\newTestLog.txt";
var lines = File.ReadLines(path).Select(l => l + l.StartsWith("hello")?" salmon":"");
foreach (string line in lines)
Console.WriteLine(line);
Note that this still only writes the results to the Console, as your sample does. It's not clear what you really want to happen with the output.
If you want this saved to the original file, you've opened up a small can of worms. Think of all of the data in your file as if it's stored in one contiguous block1. If you append text to any line in the file, that text has nowhere to go but to overwrite the beginning of the next. As a practical matter, if you need to modify file, this often means either writing out a whole new file, and then deleting/renaming when done, or alternatively keeping the whole file in memory and writing it all from start to finish.
Using the 2nd approach, where we keep everything in memory, you can do this:
string path = #"C:\testFolder\newTestLog.txt";
var lines = File.ReadAllLines(path).Select(l => l + l.StartsWith("hello")?" salmon":"");
File.WriteAllLines(path, lines);
1 In fact, a file may be split into several fragments on the disk, but even so, each fragment is presented to your program as part of a single whole.
I've been at this all day and I'm struggling to do it, I've searched far and wide and though there are similar questions, there are no answers.
What I'm trying to do:
I'm trying to search a file for multiple strings and receive the line numbers so that when users wish to change said strings inside my application, my application knows what lines to replace with their string.
What I've tried so far:
I've tried the below code, my attempt was to read the file then receive the line number, however it doesn't seem to be working at all.
var lines = File.ReadAllLines(filepath);
foreach (var match in lines // File.ReadLines(filePath)
.Select((text, index) => new { text, lineNumber = index + 1 })
.Where(x => x.text.Contains("motd=")))
{
lines[match.lineNumber] = "motd=" + textBox1.Text;
File.WriteAllLines(filepath, lines);
}
}
What I expect the above code to do is find the string "motd=" in the file, get the line number and attempt to re-write it with what the user has inputted.
However, I receive this error: "Index was outside the bounds of the array".
I think a for loop makes more sense here
var lines = File.ReadAllLines(filepath);
for (int i = 0; i < lines.Length; i++)
{
if(lines[i].Contains("motd="))
{
lines[i] = "motd=" + textBox1.Text;
}
}
File.WriteAllLines(filepath, lines);
A couple of issues with your code was that you were writing out the file in side the loop and you incremented the index by 1 which would then point to the wrong line in the array and result in the exception you are getting if the last line contains the text you are searching for.
It should be noted that this could be a memory hog if you are working with a really large file. In that case I'd read the lines in one at a time and write them out to a temp file, then delete the original file and rename the temp at the end.
var tempFile = Path.GetTempFileName();
using (var file = File.OpenWrite(tempFile))
using (var writer = new StreamWriter(file))
{
foreach (var line in File.ReadLines(filepath))
{
if (line.Contains("motd="))
{
writer.WriteLine("motd=" + textBox1.Text);
}
else
{
writer.WriteLine(line);
}
}
}
File.Delete(filepath);
File.Move(tempFile, filepath);
This will be much faster, and use less memory, and work with very large files, but you need to write to a new file:
File.WriteAllLines(filepath2,
File
.ReadLines(filepath)
.Select(x=>x.Contains("motd=")?"motd="+TextBox1.Text:x));
It uses ReadLines which reads each line as it can instead of reading the entire file and breaking it into lines in one step. So it processes it as quickly as your file system can read, processes the line, then writes the result.
A complete replacement would look like this:
var filepath2= Path.GetTempFileName();
File.WriteAllLines(filepath2,
File
.ReadLines(filepath)
.Select(x=>x.Contains("motd=")?"motd="+TextBox1.Text:x));
File.Delete(filepath);
File.Move(filepath2, filepath);
To do multiple values:
var filepath2= Path.GetTempFileName();
File.WriteAllLines(filepath2,
File
.ReadLines(filepath)
.Select(x=> {
if (x.Contains("motd=")) return "motd="+TextBox1.Text;
if (x.Contains("author=")) return "author="+TextBox2.Text;
return x;
}));
File.Delete(filepath);
File.Move(filepath2, filepath);
or move the replace logic to it's own function:
string ReplaceTokens(string src)
{
if (src.Contains("motd=")) return "motd="+TextBox1.Text;
if (src.Contains("author=")) return "author="+TextBox2.Text;
return src;
}
var filepath2= Path.GetTempFileName();
File.WriteAllLines(filepath2,
File
.ReadLines(filepath)
.Select(ReplaceTokens));
File.Delete(filepath);
File.Move(filepath2, filepath);
You're writing the file on every single iteration which makes no sense, and doing a select/where/index thing is just weird.
Since you're doing a Where you have to iterate over every single line, so why not save the trouble and just iterate over everything explicitly?
var lines = File.ReadAllLines(filepath);
for(int i = 0; i < lines.Length; i++)
{
if(lines[i].Contains("motd="))
{
lines[i] = "motd=" + textBox1.Text;
}
}
File.WriteAllLines(filepath, lines);
I am trying to read characters from a file and then append them in another file after removing the comments (which are followed by semicolon).
sample data from parent file:
Name- Harly Brown ;Name is Harley Brown
Age- 20 ;Age is 20 years
Desired result:
Name- Harley Brown
Age- 20
I am trying the following code-
StreamReader infile = new StreamReader(floc + "G" + line + ".NC0");
while (infile.Peek() != -1)
{
letter = Convert.ToChar(infile.Read());
if (letter == ';')
{
infile.ReadLine();
}
else
{
System.IO.File.AppendAllText(path, Convert.ToString(letter));
}
}
But the output i am getting is-
Name- Harley Brown Age-20
Its because AppendAllText is not working for the newline. Is there any alternative?
Sure, why not use File.AppendAllLines. See documentation here.
Appends lines to a file, and then closes the file. If the specified file does not exist, this method creates a file, writes the specified lines to the file, and then closes the file.
It takes in any IEnumerable<string> and adds every line to the specified file. So it always adds the line on a new line.
Small example:
const string originalFile = #"D:\Temp\file.txt";
const string newFile = #"D:\Temp\newFile.txt";
// Retrieve all lines from the file.
string[] linesFromFile = File.ReadAllLines(originalFile);
List<string> linesToAppend = new List<string>();
foreach (string line in linesFromFile)
{
// 1. Split the line at the semicolon.
// 2. Take the first index, because the first part is your required result.
// 3. Trim the trailing and leading spaces.
string appendAbleLine = line.Split(';').FirstOrDefault().Trim();
// Add the line to the list of lines to append.
linesToAppend.Add(appendAbleLine);
}
// Append all lines to the file.
File.AppendAllLines(newFile, linesToAppend);
Output:
Name- Harley Brown
Age- 20
You could even change the foreach-loop into a LINQ-expression, if you prefer LINQ:
List<string> linesToAppend = linesFromFile.Select(line => line.Split(';').FirstOrDefault().Trim()).ToList();
Why use char by char comparison when .NET Framework is full of useful string manipulation functions?
Also, don't use a file write function multiple times when you can use it only one time, it's time and resources consuming!
StreamReader stream = new StreamReader("file1.txt");
string str = "";
while ((string line = infile.ReadLine()) != null) { // Get every line of the file.
line = line.Split(';')[0].Trim(); // Remove comment (right part of ;) and useless white characters.
str += line + "\n"; // Add it to our final file contents.
}
File.WriteAllText("file2.txt", str); // Write it to the new file.
You could do this with LINQ, System.File.ReadLines(string), and System.File.WriteAllLines(string, IEnumerable<string>). You could also use System.File.AppendAllLines(string, IEnumerable<string>) in a find-and-replace fashion if that was, in fact, the functionality you were going for. The difference, as the names suggest, is whether it writes everything out as a new file or if it just appends to an existing one.
System.IO.File.WriteAllLines(newPath, System.IO.File.ReadLines(oldPath).Select(c =>
{
int semicolon = c.IndexOf(';');
if (semicolon > -1)
return c.Remove(semicolon);
else
return c;
}));
In case you aren't super familiar with LINQ syntax, the idea here is to loop through each line in the file, and if it contains a semicolon (that is, IndexOf returns something that is over -1) we cut that off, and otherwise, we just return the string. Then we write all of those to the file. The StreamReader equivalent to this would be:
using (StreamReader reader = new StreamReader(oldPath))
using (StreamWriter writer = new StreamWriter(newPath))
{
string line;
while ((line = reader.ReadLine()) != null)
{
int semicolon = line.IndexOf(';');
if (semicolon > -1)
line = c.Remove(semicolon);
writer.WriteLine(line);
}
}
Although, of course, this would feed an extra empty line at the end and the LINQ version wouldn't (as far as I know, it occurs to me that I'm not one hundred percent sure on that, but if someone reading this does know I would appreciate a comment).
Another important thing to note, just looking at your original file, you might want to add in some Trim calls, since it looks like you can have spaces before your semicolons, and I don't imagine you want those copied through.
Is there a faster way to search each line of one text file for occurrence in another text file, than by going line by line in both files?
I have two text files - one has ~2500 lines (let's call it TxtA), the other has ~86000 lines(TxtB). I want to search TxtB for each line in TxtA, and return the line in TxtB for each match found.
I currently have this setup as: for each line in TxtA, search TxtB line by line for a match. However this is taking a really long time to process. It seems like it would take 1-3 hours to find all the matches.
Here is my code...
private static void getGUIDAndType()
{
try
{
Console.WriteLine("Begin.");
System.Threading.Thread.Sleep(4000);
String dbFilePath = #"C:\WindowsApps\CRM\crm_interface\data\";
StreamReader dbsr = new StreamReader(dbFilePath + "newdbcontents.txt");
List<string> dblines = new List<string>();
String newDataPath = #"C:\WindowsApps\CRM\crm_interface\data\";
StreamReader nsr = new StreamReader(newDataPath + "HolidayList1.txt");
List<string> new1 = new List<string>();
string dbline;
string newline;
List<string> results = new List<string>();
while ((newline = nsr.ReadLine()) != null)
{
//Reset
dbsr.BaseStream.Position = 0;
dbsr.DiscardBufferedData();
while ((dbline = dbsr.ReadLine()) != null)
{
newline = newline.Trim();
if (dbline.IndexOf(newline) != -1)
{//if found... get all info for now
Console.WriteLine("FOUND: " + newline);
System.Threading.Thread.Sleep(1000);
new1.Add(newline);
break;
}
else
{//the first line of db does not contain this line...
//go to next dbline.
Console.WriteLine("Lines do not match - continuing");
continue;
}
}
Console.WriteLine("Going to next new Line");
System.Threading.Thread.Sleep(1000);
//continue;
}
nsr.Close();
Console.WriteLine("Writing to dbc3.txt");
System.IO.File.WriteAllLines(#"C:\WindowsApps\CRM\crm_interface\data\dbc3.txt", results.ToArray());
Console.WriteLine("Finished. Press ENTER to continue.");
Console.WriteLine("End.");
Console.ReadLine();
}
catch (Exception ex)
{
Console.WriteLine("Error: " + ex);
Console.ReadLine();
}
}
Please let me know if there is a faster way. Preferably something that would take 5-10 minutes... I've heard of indexing but didn't find much on this for txt files. I've tested regex and it's no faster than indexof. Contains won't work because the lines will never be exactly the same.
Thanks.
There might be a faster way, but this LINQ apporoach should be faster than 3 hours and is a sight better to read and maintain:
var f1Lines = File.ReadAllLines(f1Path);
var f2LineInf1 = File.ReadLines(f2Path)
.Where( line => f1Lines.Contains(line))
.Select(line => line).ToList();
Edit: tested and required less than 1 second for 400000 lines in file2 and 17000 lines in file1. I can use File.ReadLines for the big file which does not load all into memory at once. For the smaller file i need to use File.ReadAllLines since Contains needs the complete list of lines of file 1.
If you want to log the result in a third file:
File.WriteAllLines(logPath, f2LineInf1);
EDIT: Note that I'm assuming it's reasonable to read at least one file into memory. You may want to swap the queries below around to avoid loading the "big" file into memory, but even 86,000 lines at (say) 1K per line is going to be less than 2G of memory - which is relatively little to do something significant.
You're reading the "inner" file each time. There's no need for that. Load both files into memory and go from there. Heck, for exact matches you can do the whole thing in LINQ easily:
var query = from line1 in File.ReadLines("newDataPath + "HolidayList1.txt")
join line2 in File.ReadLines(dbFilePath + "newdbcontents.txt")
on line1 equals line2
select line1;
var commonLines = query.ToList();
But for non-joins it's still simple; just read one file completely first (explicitly) and then stream the other:
// Eagerly read the "inner" file
var lines2 = File.ReadAllLines(dbFilePath + "newdbcontents.txt");
var query = from line1 in File.ReadLines("newDataPath + "HolidayList1.txt")
from line2 in lines2
where line2.Contains(line1)
select line1;
var commonLines = query.ToList();
There's nothing clever here - it's just a really simple way of writing code to read all the lines in one file, then iterate over the lines in the other file and for each line check against all the lines in the first file. But even without anything clever, I strongly suspect it would perform well enough for you. Concentrate on simplicity, eliminate unnecessary IO, and see whether that's good enough before trying to do anything fancier.
Note that in your original code, you should be using using statements for your StreamReader variables, to ensure they get disposed properly. Using the above code makes it simple to not even need that though...
Quick and dirty because I've got to go... If you can do it in memory, try working with this snippet:
//string[] searchIn = File.ReadAllLines("File1.txt");
//string[] searchFor = File.ReadAllLines("File2.txt");
string[] searchIn = new string[] {"A","AB","ABC","ABCD", null, "", " "};
string[] searchFor = new string[] {"A","BC","BCD", null, "", " "};
matchDictionary;
foreach(string item in file2Content)
{
string[] matchingItems = Array.FindAll(searchIn, x => (x == item) || (!string.IsNullOrEmpty(x) && !string.IsNullOrEmpty(item) ? (x.Contains(item) || item.Contains(x)) : false));
}