I am just learning and have a problem working with files.
I have a method with two inputs: the line number where the range starts (lineStart) and the line number where it ends (lineEnd).
I need a method that extracts the lines between these two numbers and writes them to a file.
For example, with lineStart = 20 and lineEnd = 90, the output must be lines 21-89 of the text file.
string[] lines = File.ReadAllLines(@"");
int lineStart = 0;
foreach (string line0 in lines)
{
lineStart++;
if (line0.IndexOf("target1") > -1)
{
Console.Write(lineStart + "\n");
}
}
int lineEnd = 0;
foreach (string line1 in lines)
{
lineEnd++;
if (line1.IndexOf("target2") > -1)
{
Console.Write(lineEnd);
}
}
// method grabText(lineStart,lineEnd){}
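It is just one line of code:
IEnumerable<string> lines = File.ReadLines(@"").Skip(lineStart).Take(lineEnd - lineStart);
Notice also that I use ReadLines and not ReadAllLines. ReadLines enumerates the file lazily instead of loading everything into memory.
It is not entirely clear which boundary lines should be included, but the calculation is easy to adapt: for example, Take(lineEnd - lineStart - 1) excludes the line at lineEnd, which gives lines 21-89 for lineStart = 20 and lineEnd = 90.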
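As a rough sketch, the one-liner could be wrapped into the grabText method the question asks for. The class name LineRange, the method names, and the choice to exclude both boundary lines are assumptions made for illustration, not part of the original answer:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class LineRange
{
    // Returns the lines strictly between lineStart and lineEnd
    // (1-based line numbers, both boundaries excluded).
    public static IEnumerable<string> Between(IEnumerable<string> lines, int lineStart, int lineEnd)
    {
        int count = lineEnd - lineStart - 1;
        if (count <= 0)
            return Enumerable.Empty<string>();
        return lines.Skip(lineStart).Take(count);
    }

    // Streams the selected lines from inputPath into outputPath.
    // The two paths must differ because the input is read lazily.
    public static void GrabText(string inputPath, string outputPath, int lineStart, int lineEnd)
    {
        File.WriteAllLines(outputPath, Between(File.ReadLines(inputPath), lineStart, lineEnd));
    }
}
```

Keeping the range logic in Between, separate from the file access, makes it possible to check the boundaries on an in-memory list before touching any file.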
If your text file is huge, don't read it into memory. Don't look for indexes either, just process it line by line:
bool writing = false;
using var sw = File.CreateText(@"C:\some\path\to.txt");
foreach(var line in File.ReadLines(...)){ //don't use ReadAllLines, use ReadLines - it's incremental and burns little memory
if(!writing && line.Contains("target1")){
writing = true; //start writing
continue; //don't write this line
}
if(writing){
if(line.Contains("target2"))
break; //exit loop without writing this line
sw.WriteLine(line);
}
}
I am trying to edit only one column of my CSV file, but the code does not seem to affect the file. The change I am trying to make is to split the data in the 4th column with commas.
class Program
{
static void Main(string[] args)
{
var filePath = Path.Combine(Directory.GetCurrentDirectory(), "kaviaReport 02_08_2016.csv");
var fileContents = ReadFile(filePath);
foreach (var line in fileContents)
{
Console.WriteLine(line);
}
Console.WriteLine("Press any key to exit...");
Console.ReadKey();
}
public static IList<string> ReadFile(string fileName)
{
var results = new List<string>();
int lineCounter = 0;
string currentLine = string.Empty;
var target = File
.ReadAllLines(fileName);
while ((currentLine = fileName) != null)//while there are lines to read
{
List<string> fielded = new List<string>(currentLine.Split(','));
if (lineCounter != 0)
{
//If it's not the first line
var lineElements = currentLine.Split(',');//split your fields into an array
var replace = target[4].Replace(' ', ',');//replace the space in position 4(field 5) of your array
results.Add(replace);
//target.WriteAllLines(string.Join(",", fielded));//write the line in the new file
}
lineCounter++;
File.WriteAllLines(fileName, target);
}
return results;
}
}
The current code has some errors.
The biggest one is the assignment of fileName to currentLine. This is meaningless if you want to loop over the lines, so you need a foreach over the lines you have read.
Then, inside the loop, you should use the variable lineElements to get the fifth column produced by splitting currentLine.
Finally, the rewrite of the file goes outside the loop and should use the results list.
// Loop, but skip the first line....
foreach(string currentLine in target.Skip(1))
{
// split your line into an array of strings
var lineElements = currentLine.Split(',');
// Replace spaces with commas on the fifth column of lineElements
var replace = lineElements[4].Replace(' ', ',');
// Add the changed line to the result list
results.Add(replace);
}
// move outside the foreach loop the write of your changes
File.WriteAllLines(fileName, results.ToArray());
Something occurred to me while writing this code. It is not clear whether you want to rewrite the CSV file with only the data of the fifth column expanded with commas, or whether you want to rewrite the entire line (columns 0, 1, 2, 3, 4, etc.). In the latter case you need different code:
// Replace spaces with commas on the fifth column of lineElements
// And reassign the result to the same fifth column
lineElements[4] = lineElements[4].Replace(' ', ',');
// Add the changed line to the result list putting the comma
// between the array of strings lineElements
results.Add(string.Join(",", lineElements));
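Putting the pieces together, a minimal sketch of the "rewrite the entire line" variant might look like the following. The class and method names are hypothetical, the header line is kept unchanged (an assumption, since the question does not say what should happen to it), and the naive Split(',') does not handle quoted fields that contain commas:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class CsvColumnFix
{
    // Replaces spaces with commas in the column at columnIndex (0-based)
    // and rebuilds the line. Lines with too few columns are returned unchanged.
    public static string FixLine(string line, int columnIndex)
    {
        var fields = line.Split(',');
        if (columnIndex >= fields.Length)
            return line;
        fields[columnIndex] = fields[columnIndex].Replace(' ', ',');
        return string.Join(",", fields);
    }

    // Keeps the first (header) line as-is, fixes every other line,
    // then overwrites the file once, outside of any loop.
    public static void RewriteFile(string fileName, int columnIndex)
    {
        string[] lines = File.ReadAllLines(fileName);
        List<string> result = lines.Take(1)
            .Concat(lines.Skip(1).Select(l => FixLine(l, columnIndex)))
            .ToList();
        File.WriteAllLines(fileName, result);
    }
}
```

For the file in the question this would be called as CsvColumnFix.RewriteFile(filePath, 4).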
while ((currentLine = fileName) != null) assigns fileName to currentLine, so the condition is always true and the loop never ends.
I would write it as a for loop instead of a while:
public static IList<string> ReadFile(string fileName)
{
var target = File.ReadAllLines(fileName).ToList();
// i = 1 (skip first line)
for (int i = 1; i < target.Count; i++)
{
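var fields = target[i].Split(',');
fields[4] = fields[4].Replace(' ', ','); //replace the space in position 4(field 5)
target[i] = string.Join(",", fields); //put the changed line back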
}
File.WriteAllLines(fileName, target);
// Uncomment the RemoveAt(0) to remove first line
// target.RemoveAt(0);
return target;
}
I am looking for a bit of advice on ways I can make this function quicker.
The function is designed to run through a delimited text file (with CRLF row ends) and remove any carriage returns or line breaks in between data rows.
E.g. A file of -
A|B|C|D
A|B|C|D
A|B|
C|D
A|B|C|D
Would become -
A|B|C|D
A|B|C|D
A|B|C|D
A|B|C|D
The function seems to work well, however when we start processing large files, the performance is too slow. An example is - for 800k rows it takes 3 seconds, for 130 million rows it takes over an hour....
The code is -
private void CleanDelimitedFile(string readFilePath, string writeFilePath, string delimiter, string problemFilePath, string rejectsFilePath, int estimateNumberOfRows)
{
ArrayList rejects = new ArrayList();
ArrayList problems = new ArrayList();
int safeSameLengthBreak = 0;
int numberOfLinesSameLength = 0;
int lineCount = 0;
int maxCount = 0;
string previousLine = string.Empty;
string currentLine = string.Empty;
// determine after how many rows with the same number of delimiter chars that we can safely
// say that we have found the expected length of a row (to save reading the full file twice)
if (estimateNumberOfRows > 100000000)
safeSameLengthBreak = estimateNumberOfRows / 200; // set the safe check limit as 0.5% of the file (minimum of 500,000)
else if (estimateNumberOfRows > 10000000)
safeSameLengthBreak = estimateNumberOfRows / 50; // set the safe check limit as 2% of the file (minimum of 200,000)
else
safeSameLengthBreak = 50000; // set the safe check limit as 50,000 (if there are less than 50,000 this wont be required anyway)
// open a reader
using (var reader = new StreamReader(readFilePath))
{
// check the file is still being read
while (!reader.EndOfStream)
{
// append the line count (for debugging)
lineCount += 1;
// get the current line
currentLine = reader.ReadLine();
// get the number of chars in the new line
int chars = (currentLine.Length - currentLine.Replace(delimiter, "").Length);
// if the number is higher than the previous maximum set the new maximum
if (maxCount < chars)
{
maxCount = chars;
// the maximum has changed, reset the number of lines in a row with the same delimiter
numberOfLinesSameLength = 0;
}
else
{
// the maximum has not changed, add to the number of lines in a row with the same delimiter
numberOfLinesSameLength += 1;
}
// is the number of lines parsed in a row with the same number of delimiter chars above the safe limit? If so break the loop
if (numberOfLinesSameLength > safeSameLengthBreak)
{
break;
}
}
}
// reset the line count
lineCount = 0;
// open a writer for the duration of the next read
using (var writer = new StreamWriter(writeFilePath))
{
using (var reader = new StreamReader(readFilePath))
{
// check the file is still being read
while (!reader.EndOfStream)
{
// append the line count (for debugging)
lineCount += 1;
// get the current line
currentLine = reader.ReadLine();
// get the number of chars in the new line
int chars = (currentLine.Length - currentLine.Replace(delimiter, "").Length);
// check the number of chars in the line matches the required number
if (chars == maxCount)
{
// write line
writer.WriteLine(currentLine);
// clear the previous line variable as this was a valid write
previousLine = string.Empty;
}
else
{
// add the line to problems
problems.Add(currentLine);
// append the new line to the previous line
previousLine += currentLine;
// get the number of chars in the new appended previous line
int newPreviousChars = (previousLine.Length - previousLine.Replace(delimiter, "").Length);
// check the number of chars in the previous appended line matches the required number
if (newPreviousChars == maxCount)
{
// write line
writer.WriteLine(previousLine);
// clear the previous line as this was a valid write
previousLine = string.Empty;
}
else if (newPreviousChars > maxCount)
{
// the number of delimiter chars in the new line is higher than the file maximum, add to rejects
rejects.Add(previousLine);
// clear the previous line and move on
previousLine = string.Empty;
}
}
}
}
}
// rename the original file as _original
System.IO.File.Move(readFilePath, readFilePath.Replace(".txt", "") + "_Original.txt");
// rename the new file as the original file name
System.IO.File.Move(writeFilePath, readFilePath);
// Write rejects
using (var rejectWriter = new StreamWriter(rejectsFilePath))
{
// loop through the reject array list and write the reject row to the rejects file
foreach (string reject in rejects)
{
rejectWriter.WriteLine(reject);
}
}
// Write problems
using (var problemWriter = new StreamWriter(problemFilePath))
{
// loop through the problem array list and write the problem row to the problem file
foreach (string problem in problems)
{
problemWriter.WriteLine(problem);
}
}
}
Any pointers would be greatly appreciated.
Thanks in advance.
A few ideas
List<String>
Use it for rejects and problems instead of ArrayList, and allocate an initial capacity close to what you expect they will need
Don't process over the network
Get an SSD, copy to it, process, write lines to it, and then copy the file back
This does not look like an efficient way to count delimiters
int chars = (currentLine.Length - currentLine.Replace(delimiter, "").Length);
This is wastefully expensive: currentLine.Replace(delimiter, "")
int chars = 0;
char d = delimiter[0]; // assumes a single-character delimiter such as "|"
foreach(char c in currentLine) if (c == d) chars++;
This is not efficient
previousLine += currentLine;
Use StringBuilder
And allocate StringBuilder once outside the loop
In the loop call .Clear()
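The last few suggestions can be sketched together as follows. This is a simplified illustration rather than a drop-in replacement for the question's method: the class and method names are made up, the delimiter is assumed to be a single character, the expected delimiter count is passed in instead of being detected, and a partial row followed by a complete row is rejected as a whole (the original code discards the partial row instead):

```csharp
using System.Collections.Generic;
using System.Text;

public static class DelimitedCleaner
{
    // Counts occurrences of a single-character delimiter without allocating a new string.
    public static int CountDelimiters(string line, char delimiter)
    {
        int count = 0;
        foreach (char c in line)
            if (c == delimiter) count++;
        return count;
    }

    // Lazily joins broken rows until each one has expectedDelimiters delimiters.
    // Rows that overshoot are added to rejects. An incomplete row left over
    // at the end of the input is dropped.
    public static IEnumerable<string> Repair(IEnumerable<string> lines, char delimiter,
                                             int expectedDelimiters, List<string> rejects)
    {
        var pending = new StringBuilder(); // allocated once, reused via Clear()
        int pendingCount = 0;

        foreach (string line in lines)
        {
            pending.Append(line);
            pendingCount += CountDelimiters(line, delimiter);

            if (pendingCount < expectedDelimiters)
                continue; // row is still incomplete, keep appending

            if (pendingCount == expectedDelimiters)
                yield return pending.ToString();
            else
                rejects.Add(pending.ToString());

            pending.Clear();
            pendingCount = 0;
        }
    }
}
```

Because Repair is an iterator, it can be fed from File.ReadLines and written out with a StreamWriter one row at a time, so memory use stays flat even for very large files.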
Could anyone tell me how to edit a specific line in a text document?
For instance, let's say that my document contains two phone numbers:
"0889367882
0887343160"
I want to delete the second number and write a new phone number, how can I do that?
I am printing the text in the document, but I don't know how to choose which line to edit
or how to edit it.
string path = @"C:\Users\...\text1.txt";
string[] lines = File.ReadAllLines(path);
int i = 0;
foreach (var line in lines)
{
i++;
Console.WriteLine("{0}. {1}", i, line);
}
Thanks!
Simply use string.Replace.
Like this:
if(line.Contains("0887343160"))
line = line.Replace("0887343160", "0889367882");
and after replacing, write all lines back to the file.
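If the line should be chosen by its number rather than by its content, a small sketch along the same lines could look like this. The class name, the method name, and the new phone number in the usage note are made up for illustration:

```csharp
using System;

public static class LineEditor
{
    // Returns a copy of lines in which the entry at lineNumber (1-based)
    // has been replaced by newText. The input array is left untouched.
    public static string[] ReplaceLine(string[] lines, int lineNumber, string newText)
    {
        if (lineNumber < 1 || lineNumber > lines.Length)
            throw new ArgumentOutOfRangeException(nameof(lineNumber));

        var copy = (string[])lines.Clone();
        copy[lineNumber - 1] = newText;
        return copy;
    }
}
```

It would be combined with the file calls from the question, for example File.WriteAllLines(path, LineEditor.ReplaceLine(File.ReadAllLines(path), 2, "0881112222")).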
A better version would be to iterate over the lines of the file rather than loading all of them into memory, so an iterator works best here.
We call MoveNext() on the iterator and, after applying the necessary replace logic, write the current line to the output file.
StreamWriter wtr = new StreamWriter("out.txt");
var e = File.ReadLines(path).GetEnumerator();
int lineno = 12; //arbitrary
int counter = 0;
string line = string.Empty;
while(e.MoveNext())
{
counter++;
if(counter == lineno)
line = replaceLogic(e.Current);
else
line = e.Current;
wtr.WriteLine(line);
}
wtr.Close();
Solution 1: if you want to remove a line based on a user-supplied string (matching one of the lines in the file), you can try this.
string path = @"C:\Data.txt";
string[] lines = File.ReadAllLines(path);
String strRemove = "8971820518";
List<String> lst = new List<String>();
for(int i=0;i<lines.Length;i++)
{
if (!lines[i].Equals(strRemove)) //if string is part of line use Contains()
{
lst.Add(lines[i]);
}
}
File.WriteAllLines(path,lst.ToArray());
Solution 2: if you want to remove a line based on a user-supplied line number (matching the exact line number in the file), you can try this.
string path = @"C:\Data.txt";
string[] lines = File.ReadAllLines(path);
int iRemoveLineNo = 6;
List<String> lst = new List<String>();
for(int i=0;i<lines.Length;i++)
{
if (iRemoveLineNo-1!=i)
{
lst.Add(lines[i]);
}
}
File.WriteAllLines(path,lst.ToArray());
I have a large text file (20MB), and I'm trying to change every 4th and 5th line to 0,0.
I've tried the following code, but I would be interested to know if there's a better way of doing it.
EDIT:
Power = new List<float>();
Time = new List<float>();
string line;
float _i =0.0f;
float _q =0.0f;
int counter = 0;
StreamReader file = new StreamReader(iqFile2Open);
while ((line = file.ReadLine()) != null)
{
if (Regex.Matches(line, @"[a-zA-Z]").Count == 0)
{
string[] IQ = line.Split(',');
if (IQ.Length == 2)
{
_i = float.Parse(IQ[0]);
_q = float.Parse(IQ[1]);
double _p = 10 * (Math.Log10((_i * _i) + (_q * _q)));
if((counter%4)==0 || (counter%5)==0)
sw.WriteLine("0,0");
else
sw.WriteLine(string.Format("{0},{1}", _i, _q));
counter++;
}
}
}
Thanks in advance.!
You can read in all of the lines, map each line to what it should be based on its position, and then write them all out:
var lines = File.ReadLines(inputFile)
.Select((line, i) => ComputeLine(line, i + 1));
File.WriteAllLines(outputFile, lines);
As for the actual mapping, you can mod the line number by 5 to get an "every 5th item" result, and then just compare the result to the two mod values you care about. Note that since you don't want the first item wiped out it's important that the index is 1-indexed, not zero indexed.
private static string ComputeLine(string line, int i)
{
if (i % 5 == 4 || i % 5 == 0)
return "0,0";
else
return line;
}
This streams through each line in the file, rather than loading the entire file into memory. Because of this it's important that the input and output files be different. You can copy the output file to the input file if needed, or you could instead use ReadAllLines to bring the entire file into memory (assuming the file stays suitably small) thus allowing you to write to the same file you read from.
What exactly are you trying to replace? Are you replacing by specific LINE or specific TEXT?
If you are looking to replace specific text you can easily do a string.Replace() method...
StreamReader fileIn = new StreamReader("somefile");
string fileText = fileIn.ReadToEnd();
fileText = fileText.Replace("old", "new");
//Repeat last line for all old strings.
//write file...
I have a text file that I am opening up and it is in a similar format to this:
10 SOME TEXT
20 T A40
B B5, C45, D48
30 B E25
40 B F17, G18
60 T H20, I23,
B J6, K7, L8, M9, N10, O11, P12,
Q31, R32, S33, T34, U35, V36,
W37, X38, Y39
100 T Z65
360 B A1, B4, C5, D6, E7, F10
2000 T SOME TEXT
423 TEXT
With this text I need to be able to read it and replace values accordingly. If a line begins with a number (i.e., 10, 20, 30, 40, 60, 100, 360, 2000, 423), I need to check whether a T, a B, or other text follows it. That is the only case in which I need to reformat the lines as they come in and output them differently.
Example: 10 is fine except for I would like to add zeros in front of every number to make them 4 digits long (ie, 10 turns to 0010, 360 turns to 0360, 2000 stays the same). When the string "B B5, C45, D48" is read (this is the third line in the text) I need to change it to say "20A B5, C45, D48". I need to grab the number above the "B" and concat it to the "B" and replace the "B" with an "A". If instead of a "B" there is a "T" I simply need to remove the "T". Also, if a line does not start with a number or a "B" (ie, Q31 or W37) I need to concat that line with the previous line.
So after the changes take place it should look like this:
0010 SOME TEXT
0020 A40
0020A B5, C45, D48
0030A E25
0040A F17, G18
0060 H20, I23,
0060A J6, K7, L8, M9, N10, O11, P12, Q31, R32, S33, T34, U35, V36, W37, X38, Y39
0100 Z65
0360A A1, B4, C5, D6, E7, F10
2000 SOME TEXT
0423 TEXT
I am currently trying to use Regex to do this but I have been told that there is an easier way to do this and I am not sure how. So far I have been able to add the zeros in front of the numbers. Also, my code is adding an "A" to the end of everything as well as keeping the original number on the next line and I am not grabbing the lines that begin with anything but a digit.
This is what my current output is turning out to look like:
0010A
0010
0020A
0020
0030A
0030
0060A
0060
0100A
0100
0360A
0360
2000
2000
0423A
0423
I am obviously doing something wrong using Regex.
Here is my current code:
private void openRefsButton_Click(object sender, EventArgs e)
{
// Initialize the OpenFileDialog to specify the .txt extension as well as
// its initial directory for the file.
openRefs.DefaultExt = "*.txt";
openRefs.Filter = ".txt Files|*.txt";
openRefs.InitialDirectory = "C:\\";
openRefs.RestoreDirectory = true;
try
{
// Open the contents of the file into the originalTextRichTextBox.
if (openRefs.ShowDialog() == DialogResult.OK && openRefs.FileName.Length > 0)
refsTextRichTextBox.LoadFile(openRefs.FileName, RichTextBoxStreamType.PlainText);
// Throws a FileNotFoundException otherwise.
else
throw new FileNotFoundException();
StreamReader refsInput = File.OpenText(openRefs.FileName);
string regExpression = @"^[\d]+";
string findNewBottomRegex = @"^B\s";
StringBuilder buildNumberText = new StringBuilder();
StringBuilder formatMatchText = new StringBuilder();
foreach (string allLines in File.ReadAllLines(openRefs.FileName))
{
Match newBottomMatch = Regex.Match(allLines, findNewBottomRegex);
Match numberStartMatch = Regex.Match(allLines, regExpression);
int counter = 0;
if (counter < numberStartMatch.Length)
{
if (numberStartMatch.Value.Length == 2)
{
if (refsTextRichTextBox.Text.Contains(newBottomMatch.ToString()))
{
finalTextRichTextBox.AppendText("00" + numberStartMatch + "A\n");
}
finalTextRichTextBox.AppendText("00" + numberStartMatch + "\n");
}
else if (numberStartMatch.Value.Length == 3)
{
if (refsTextRichTextBox.Text.Contains(newBottomMatch.ToString()))
{
finalTextRichTextBox.AppendText("0" + numberStartMatch + "A\n");
}
finalTextRichTextBox.AppendText("0" + numberStartMatch + "\n");
}
else
{
if (refsTextRichTextBox.Text.Contains(newBottomMatch.ToString()))
{
finalTextRichTextBox.AppendText(numberStartMatch + "A\n");
}
finalTextRichTextBox.AppendText(numberStartMatch + "\n");
}
counter++;
}
}
}
// Catches an exception if the file was not opened.
catch (Exception)
{
MessageBox.Show("There was not a specified file path.", "Path Not Found Error",
MessageBoxButtons.OK, MessageBoxIcon.Warning);
}
}
}
}
QUESTION(S):
What is a better way to go about doing this task?
Are there any recommendations on changing my code to be more efficient and cleaner?
How do I properly split each line into number, T/B, A40 when every line is not the same?
After the lines are properly split, how do I copy the line before if the current line begins with a "B"?
If the line begins with "Q31" or similar, how do I add that current line to the end of the previous one?
Once this happens, is there a way to concat everything to create the specified format above?
WORK FLOW @jaywayco
1. Open Text File
2. Read file line by line
   a. Save each line in a list of strings
3. Split each string by ' '
4. Find each line that starts with a digit
   a. Replace that digit to make it 4 digits in length
   b. Check the following text after the digit to see if it is a "B ", "T ", or "SOME TEXT"
      i. if "B " copy the line above
         1. Add an "A" to the end of the digit
      ii. if "T " remove the "T "
      iii. if "SOME TEXT" do nothing
5. Find each line that starts with a "B "
   a. Copy the digits on the line above and concat to the front of the "B "
   b. Follow step 4.b.i
6. Find each line that starts with (or similar to) "Q31"
   a. Concat this line to the end of the previous line
7. ...?
Here's a really lame, procedural solution:
using System.IO;
using System.Collections.Generic;
namespace ConsoleApplication
{
class Program
{
static void Main(string[] args)
{
var list = new List<string>();
using (var reader = File.OpenText(@"c:\input.txt"))
{
while (true)
{
var line = reader.ReadLine();
if (string.IsNullOrEmpty(line)) break;
list.Add(line);
}
}
list = HandleRemoveTRequirement(list);
list = HandleFourDigitRequirement(list);
list = HandleConcatRequirement(list);
list = HandleStartsWithBRequirement(list);
list = HandleSecondElementIsBRequirement(list);
using (var output = new StreamWriter(@"c:\output.txt"))
{
foreach (var line in list)
{
output.WriteLine(line);
}
}
}
static List<string> HandleSecondElementIsBRequirement(List<string> list)
{
var result = new List<string>();
foreach (var line in list)
{
var parts = line.Split(' ');
if (parts[1].Equals("B"))
{
parts[0] += "A";
parts[1] = string.Empty;
result.Add(string.Join(" ", parts).Replace("  ", " ")); // collapse the double space left by the emptied element
}
else
{
result.Add(line);
}
}
return result;
}
static List<string> HandleStartsWithBRequirement(List<string> list)
{
var result = new List<string>();
var i = 0;
foreach (var line in list)
{
var parts = line.Split(' ');
if (parts[0].Equals("B"))
{
parts[0] = string.Empty;
result.Add(list[i - 1].Split(' ')[0] + "A" + string.Join(" ", parts));
}
else
{
result.Add(line);
}
i++;
}
return result;
}
static List<string> HandleConcatRequirement(List<string> list)
{
var result = new List<string>();
foreach (var line in list)
{
var parts = line.Split(' ');
int test;
if (int.TryParse(parts[0], out test) || parts[0].Equals("B"))
{
result.Add(line);
}
else
{
result[result.Count -1] += line;
}
}
return result;
}
static List<string> HandleRemoveTRequirement(List<string> list)
{
var result = new List<string>();
foreach (var line in list)
{
var parts = line.Split(' ');
if (parts[1].Equals("T"))
{
parts[1] = string.Empty;
}
result.Add(string.Join(" ", parts).Replace("  ", " ")); // collapse the double space left by the emptied element
}
return result;
}
static List<string> HandleFourDigitRequirement(List<string> list)
{
var result = new List<string>();
foreach (var line in list)
{
var parts = line.Split(' ');
int test;
if (int.TryParse(parts[0], out test))
{
parts[0] = parts[0].PadLeft(4, '0');
result.Add(string.Join(" ", parts));
}
else
{
result.Add(line);
}
}
return result;
}
}
}
These are pretty complicated requirements and I would be tempted to implement this as a workflow. This way you can separate out each of the logical steps and this will increase maintainability.
I would be tempted to represent the text file as an array of string arrays or even a data table. Then you can write general functions that concatenate/transform specific values
One possible approach is similar to jaywayco's.
I'd start by splitting each line on spaces into its own array and placing that array into an array of arrays. From there you can consider your workflow. With each line split on spaces, you can decide how to print it based on the first value: a number, the letter B, and so on. If it's a B, you know the output should start with the first value of array[i-1], which would be the number. You'd have to think through the logic a bit, but I think you can see where I am coming from. I'm not sure whether this is the best approach, but it is the way I would tackle it. Good luck!
Edit: Here is some mock code...
// textFile is assumed to be a List<string> holding the lines already read from the file.
var mainArray = new string[textFile.Count][];
//one entry per line of the file
for(int i=0; i < mainArray.Length; i++)
{
string line = textFile[i];
string[] lineArray = line.Split(' ');
mainArray[i] = lineArray;
}
//Once you have everything loaded into your arrays, apply your workflow logic.
Hope this helps!
The way I would go about this task is to write a set of unit tests based on your requirements, then make them pass one at a time (having one test per requirement).
As jaywayco suggested, I would read the file into an array of lines, then implement each of your rules as a line transformation method which can be tested in isolation. I would probably separate out the method which can select which transformation(s) to apply. Then loop over the lines and apply the transformations.
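To make that concrete, here is one possible single-pass sketch of the rules, small enough to be exercised by unit tests. The names RefsFormatter, Transform and SplitFirst are hypothetical, and the sketch assumes that a continuation line never starts with a bare number or a lone "B":

```csharp
using System.Collections.Generic;
using System.Linq;

public static class RefsFormatter
{
    // Splits s into its first space-separated token and the remaining text.
    private static string[] SplitFirst(string s)
    {
        int i = s.IndexOf(' ');
        return i < 0
            ? new[] { s, "" }
            : new[] { s.Substring(0, i), s.Substring(i + 1).TrimStart() };
    }

    public static List<string> Transform(IEnumerable<string> lines)
    {
        var result = new List<string>();
        string lastNumber = null; // most recent zero-padded line number

        foreach (string raw in lines)
        {
            string line = raw.Trim();
            if (line.Length == 0) continue;

            string[] first = SplitFirst(line);
            string head = first[0], tail = first[1];

            if (head.All(char.IsDigit))
            {
                // Numbered line: pad to four digits, then look at the next token.
                lastNumber = head.PadLeft(4, '0');
                string[] second = SplitFirst(tail);
                if (second[0] == "T")
                    result.Add((lastNumber + " " + second[1]).TrimEnd());   // drop the "T"
                else if (second[0] == "B")
                    result.Add((lastNumber + "A " + second[1]).TrimEnd());  // "B" becomes an "A" suffix
                else
                    result.Add((lastNumber + " " + tail).TrimEnd());
            }
            else if (head == "B" && lastNumber != null)
            {
                // Line starting with "B": reuse the number from the line above.
                result.Add((lastNumber + "A " + tail).TrimEnd());
            }
            else if (result.Count > 0)
            {
                // Continuation line (e.g. "Q31, ..."): append to the previous output line.
                result[result.Count - 1] += " " + line;
            }
            else
            {
                result.Add(line);
            }
        }
        return result;
    }
}
```

Each rule from the question then maps to one small test: pass a two- or three-line input to Transform and compare the result with the expected lines from the desired output shown in the question.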