Get all lines from a huge text file after a string (C#)

I have a lot of huge text files, and I need to retrieve all lines that come after a certain string using C#.
FYI, the string will appear within the last few lines, but I don't know exactly how many lines from the end.
Sample text would be:
someline
someline
someline
someline
etc
etc
"uniqueString"
line 1
line 2
line 3
I need to get these lines:
line 1
line 2
line 3

bool found = false;
List<string> lines = new List<string>();
foreach (var line in File.ReadLines(@"C:\MyFile.txt"))
{
    if (found)
    {
        lines.Add(line);
    }
    // The search is case-sensitive, so the literal must match the marker exactly.
    if (!found && line.Contains("uniqueString"))
    {
        found = true;
    }
}

Try this code:
public string[] GetLines()
{
    List<string> lines = new List<string>();
    bool startRead = false;
    string uniqueString = "uniqueString";
    using (StreamReader st = new StreamReader("File.txt"))
    {
        string line;
        // Read each line exactly once; ReadLine returns null at end of file.
        while ((line = st.ReadLine()) != null)
        {
            if (startRead)
                lines.Add(line);
            else if (line.Equals(uniqueString))
                startRead = true;
        }
    }
    return lines.ToArray();
}
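Both answers above can also be expressed with LINQ. The following is only a sketch, assuming the marker appears on a single line and that a substring match is acceptable; the class and method names (MarkerReader, LinesAfter, LinesAfterInFile) are made up for illustration. Because File.ReadLines is lazy, the lines before the marker are never kept in memory.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class MarkerReader
{
    // Returns every line that follows the first line containing `marker`.
    // Returns an empty list when the marker is never found.
    public static List<string> LinesAfter(IEnumerable<string> lines, string marker)
    {
        return lines
            .SkipWhile(line => !line.Contains(marker)) // drop everything before the marker
            .Skip(1)                                   // drop the marker line itself
            .ToList();
    }

    // Convenience wrapper for a file path (hypothetical helper).
    public static List<string> LinesAfterInFile(string path, string marker)
    {
        return LinesAfter(File.ReadLines(path), marker);
    }
}
```

Usage would look like MarkerReader.LinesAfterInFile(@"C:\MyFile.txt", "uniqueString"). Note that this still scans the file from the start; it does not seek from the end.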

Related

How to read and separate segments of a txt file?

I have a txt file that has headers followed by 3 columns of values, e.g.:
Description=null
area = 100
1,2,3
1,2,4
2,1,5 ...
... 1,2,1//(these are the values that I need in one list)
Then another segment
Description=null
area = 10
1,2,3
1,2,4
2,1,5 ...
... 1,2,1//(these are the values that I need in one list).
In fact, I just need one list per "table" of values. The values are always in 3 columns, but there are n segments. Any ideas?
Thanks!
List<double> VMM40xyz = new List<double>();
foreach (var item in VMM40blocklines)
{
if (item.Contains(','))
{
VMM40xyz.AddRange(item.Split(',').Select(double.Parse).ToList());
}
}
I tried this, but it puts all the values into one big list.
It looks like you want your data to end up in a format like this:
public class SetOfData //Feel free to name these parts better.
{
public string Description = "";
public string Area = "";
public List<double> Data = new List<double>();
}
...stored somewhere in...
List<SetOfData> finalData = new List<SetOfData>();
So, here's how I'd read that in:
public static List<SetOfData> ReadCustomFile(string Filename)
{
if (!File.Exists(Filename))
{
throw new FileNotFoundException($"{Filename} does not exist.");
}
List<SetOfData> returnData = new List<SetOfData>();
SetOfData currentDataSet = null;
using (FileStream fs = new FileStream(Filename, FileMode.Open))
{
using (StreamReader reader = new StreamReader(fs))
{
while (!reader.EndOfStream)
{
string line = reader.ReadLine();
//This will start a new object on every 'Description' line.
if (line.Contains("Description="))
{
//Save off the old data set if there is one.
if (currentDataSet != null)
returnData.Add(currentDataSet);
currentDataSet = new SetOfData();
//Now, to make sure there is something after "Description=" and to set the Description if there is.
//Your example data used "null" here, which this will take literally to be a string containing the letters "null". You can check the contents of parts[1] inside the if block to change this.
string[] parts = line.Split('=');
if (parts.Length > 1)
currentDataSet.Description = parts[1].Trim();
}
else if (line.Contains("area = "))
{
//Just in case your file didn't start with a "Description" line for some reason.
if (currentDataSet == null)
currentDataSet = new SetOfData();
//And then we do some string splitting like we did for Description.
string[] parts = line.Split('=');
if (parts.Length > 1)
currentDataSet.Area = parts[1].Trim();
}
else
{
//Just in case your file didn't start with a "Description" line for some reason.
if (currentDataSet == null)
currentDataSet = new SetOfData();
string[] parts = line.Split(',');
foreach (string part in parts)
{
if (double.TryParse(part, out double number))
{
currentDataSet.Data.Add(number);
}
}
}
}
//Make sure to add the last set, if the file contained any data at all.
if (currentDataSet != null)
returnData.Add(currentDataSet);
}
}
return returnData;
}
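One caveat with the answer above: double.TryParse without a format provider uses the current culture, so on a machine whose culture uses ',' as the decimal separator a value like "2.5" may fail to parse or parse differently. A minimal sketch of a culture-independent row parser follows; RowParser and ParseRow are hypothetical names, and the sketch assumes the values use '.' as the decimal separator.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class RowParser
{
    // Parses a line such as "1,2,3" into a list of doubles.
    // Fields that are not valid numbers are skipped rather than throwing.
    public static List<double> ParseRow(string line)
    {
        var values = new List<double>();
        foreach (string part in line.Split(','))
        {
            // InvariantCulture always treats '.' as the decimal separator.
            if (double.TryParse(part.Trim(), NumberStyles.Float,
                                CultureInfo.InvariantCulture, out double number))
            {
                values.Add(number);
            }
        }
        return values;
    }
}
```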

Removing specified text from CSV file

It's my first attempt at doing this and I have no idea if I'm on the right track.
Basically, I want to remove lines that contain a specific keyword from a CSV file, but I can't figure out how to remove the line.
static void Main(string[] args)
{
var searchItem = "running";
var lines = File.ReadLines("C://Users//Pete//Desktop//testdata.csv");
foreach (string line in lines)
{
if (line.Contains(searchItem))
{
//Remove line here?
}
}
}
Try this to remove one or several words:
static void sd(string[] args)
{
string contents = File.ReadAllText("C://Users//Pete//Desktop//testdata.csv");
string output = contents.Replace("running", string.Empty).Replace("replaceThis", string.Empty).Replace("replaceThisToo", string.Empty);
//string output = contents.Replace("a", "b").Replace("b", "c").Replace("c", "d");
}
To remove multiple strings, you can use this:
static void Main(string[] args)
{
    string[] removeTheseWords = { "aaa", "bbb", "ccc" };
    string contents = File.ReadAllText("C://Users//Pete//Desktop//testdata.csv");
    // Start from the full text and strip one word per iteration,
    // so each removal builds on the previous one.
    string output = contents;
    foreach (string value in removeTheseWords)
    {
        output = output.Replace(value, string.Empty);
    }
}
More info: https://learn.microsoft.com/en-us/dotnet/api/system.string.replace
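The chained-Replace idea can be written compactly with Enumerable.Aggregate. This is just a sketch; WordRemover and RemoveAll are illustrative names, and it assumes none of the words is an empty string (string.Replace throws an ArgumentException for an empty search value).

```csharp
using System;
using System.Linq;

public static class WordRemover
{
    // Removes every occurrence of each word from the text.
    // Each step feeds the previous result into the next Replace call.
    public static string RemoveAll(string text, params string[] words)
    {
        return words.Aggregate(text, (current, word) => current.Replace(word, string.Empty));
    }
}
```

The result still has to be written back, for example with File.WriteAllText, for the file on disk to change.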
The simple way, if you'd like to remove a whole line:
var searchItem = "running";
var pathToYourFile = @"C://Users//Pete//Desktop//testdata.csv";
var lines = File.ReadAllLines(pathToYourFile);
lines = lines.Where(line => !line.Contains(searchItem)).ToArray();
File.WriteAllLines(pathToYourFile, lines);
For multiple search items:
var searchItems = "running;walking;waiting;any";
var pathToYourFile = @"..\..\items.csv";
var lines = File.ReadAllLines(pathToYourFile);
// Split on your separator, here the ';' character
foreach(var searchItem in searchItems.Split(';'))
lines = lines.Where(line =>!line.Contains(searchItem)).ToArray();
File.WriteAllLines(pathToYourFile, lines);
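If you remove items from a collection while iterating over it with foreach, a "collection was modified" exception (InvalidOperationException) is thrown, so iterate backwards with a for loop instead (lines must be a List<string> here):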
for(int i=lines.Count - 1; i > -1; i--)
{
if (lines[i].Contains(searchItem))
{
lines.RemoveAt(i);
}
}
You don't need to remove the line; just skip the lines that contain your search term:
foreach (string line in lines)
{
if (!line.Contains(searchItem)) //<= Notice here I added exclamation mark (!)
{
//Do your work when the line does not contain the search term
}
else
{
//Do something if line contains search term
}
}
Alternatively, filter out the lines that contain your search term before the loop, like this:
lines = lines.Where(line => !line.Contains(searchItem));
foreach (string line in lines)
{
//Only lines that do not contain the search term reach this point
}
If your search term contains multiple words separated by commas (,), you can skip the matching lines with:
lines = lines.Where(line => searchItem.Split(',').All(term => !line.Contains(term)));
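The one-liner above calls Split for every line. A small variation, shown here as a sketch with made-up names (LineFilter, WithoutTerms), materializes the terms once and then filters lazily, so it also works with File.ReadLines on large files.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LineFilter
{
    // Yields only the lines that contain none of the given terms.
    public static IEnumerable<string> WithoutTerms(IEnumerable<string> lines, IEnumerable<string> terms)
    {
        List<string> termList = terms.ToList(); // enumerate the terms only once
        return lines.Where(line => !termList.Any(term => line.Contains(term)));
    }
}
```

If the result is written back to the same file that is being read lazily, write to a temporary file first; reading and overwriting one file at the same time will fail.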

Delete rows in a csv file

I have two files: Example1.csv and Example2.csv, note they are not comma-separated, but are saved with the 'csv' extension.
Example1.csv has one column, which contains email addresses only.
Example2.csv has many columns, including the email column that also appears in Example1.csv.
Example1.csv file
emails
abc@gmail.com
jhg@yahoo.com
...
...
Example 2.csv
Column1 column2 Column3 column4 emails
1 45 456 123 abc@gmail.com
2 89 898 254 jhg@yahoo.com
3 85 365 789 ...
Now I need to delete the rows in Example2.csv that match the data in Example1.csv. For example, rows 1 and 2 should be removed because their emails match.
string[] lines = File.ReadAllLines(@"C:\example2.csv");
var emails = File.ReadAllLines(@"C:\example1.csv");
List<string> linesToWrite = new List<string>();
foreach (string s in lines)
{
String[] split = s.Split(' ');
if (s.Contains(emails))
linesToWrite.Remove(s);
}
File.WriteAllLines("file3.csv", linesToWrite);
This should work:
var emails = new HashSet<string>(File.ReadAllLines(@"C:\example1.csv").Skip(1));
// The sample data is separated by spaces rather than commas, so split on spaces.
File.WriteAllLines("file3.csv", File.ReadAllLines(@"C:\example2.csv").Where(line => !emails.Contains(line.Split(' ')[4])));
It reads all of file one, puts all emails into a format where lookup is easy, then goes through all lines in the second file and writes only those to disk that don't match any of the existing emails in their 5th column. You may want to expand on many parts, for example there is little to no error handling. It also compares emails case-sensitive, although emails are normally not.
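To address the case-sensitivity caveat, the HashSet can be built with a case-insensitive comparer. The sketch below uses hypothetical names (EmailFilter, RemoveMatching) and assumes, as in the sample data, that the email is the last whitespace-separated field of each row.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EmailFilter
{
    // Keeps the rows whose last field is not one of `emailsToRemove`,
    // comparing emails without regard to case.
    public static List<string> RemoveMatching(IEnumerable<string> rows, IEnumerable<string> emailsToRemove)
    {
        var blocked = new HashSet<string>(
            emailsToRemove.Select(e => e.Trim()),
            StringComparer.OrdinalIgnoreCase);

        return rows.Where(row =>
        {
            string[] fields = row.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
            // Blank rows have no email to compare, so they are kept as-is.
            return fields.Length == 0 || !blocked.Contains(fields[fields.Length - 1]);
        }).ToList();
    }
}
```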
The variable holding the contents of example1.csv (called line in this answer) is not a string but a string array, just like lines, because both are read with File.ReadAllLines.
Also, this line
if (s.Contains(line))
is not correct: it tries to check whether a string contains an array. If you need to check whether a line contains an email from the list, this is better:
if (split.Intersect(line).Any())
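So, here is the final code.
var lines = File.ReadAllLines(@"C:\example2.csv");
// Skip the "emails" header so it does not match the header row of example2.csv.
var line = File.ReadAllLines(@"C:\example1.csv").Skip(1).ToArray();
var linesToWrite = new List<string>();
foreach (var s in lines)
{
    // The sample data is space-separated, so split on spaces.
    var split = s.Split(' ');
    // Keep only the rows that share no value with the email list.
    if (!split.Intersect(line).Any())
    {
        linesToWrite.Add(s);
    }
}
File.WriteAllLines("file3.csv", linesToWrite);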
static void Main(string[] args)
{
var Example1CsvPath = #"C:\Inetpub\Poligon\Poligon\Resources\Example1.csv";
var Example2CsvPath = #"C:\Inetpub\Poligon\Poligon\Resources\Example2.csv";
var Example3CsvPath = #"C:\Inetpub\Poligon\Poligon\Resources\Example3.csv";
var EmailsToDelete = new List<string>();
var Result = new List<string>();
foreach(var Line in System.IO.File.ReadAllLines(Example1CsvPath))
{
if (!string.IsNullOrWhiteSpace(Line) && Line.IndexOf('@') > -1)
{
EmailsToDelete.Add(Line.Trim());
}
}
foreach (var Line in System.IO.File.ReadAllLines(Example2CsvPath))
{
if (!string.IsNullOrWhiteSpace(Line))
{
var Values = Line.Split(' ');
// Rows with fewer than 5 columns have no email to compare, so keep them.
if (Values.Length < 5 || !EmailsToDelete.Contains(Values[4]))
{
Result.Add(Line);
}
}
}
System.IO.File.WriteAllLines(Example3CsvPath, Result);
}
I know this is 4 years old, but I got some ideas from it and would like to share my solution.
The code targets a simple CSV with a maximum of about 20 lines (really the maximum), so I decided to make something basic and not use a DB for this.
My solution rescans the CSV, saving every row except the one I want to delete into a list. After the scan, it writes the list back to the CSV, which removes the entry I passed in (textBox1).
List<string> _ = new();
try {
using (var reader = new StreamReader($"{Main.directory}\\bin\\ip.csv")) {
while (!reader.EndOfStream) {
var line = reader.ReadLine();
var values = line.Split(',');
if (values[0] == textBox1.Text || values[1] == textBox2.Text)
continue;
_.Add($"{values[0]},{values[1]},{values[2]},");
}
}
File.WriteAllLines($"{Main.directory}\\bin\\ip.csv", _);
} catch (Exception f) {
MessageBox.Show(f.Message);
}

I encountered System.IndexOutOfRangeException when loading a txt file

I am trying to load a txt file that is written in a certain format, and I have encountered a System.IndexOutOfRangeException. Do you have any idea what's wrong with my code? Thank you!
The txt file:
P§Thomas§40899§2§§§
P§Damian§40726§1§§§
P=Person; Thomas=first name; 40899=ID; 2=status
Here is my code:
using (StreamReader file = new StreamReader(fileName))
{
while (file.Peek() >= 0)
{
string line = file.ReadLine();
char[] charSeparators = new char[] { '§' };
string[] parts = line.Split(charSeparators, StringSplitOptions.RemoveEmptyEntries);
foreach (PersonId personids in PersonIdDetails)
{
personids.ChildrenVisualisation.Clear();
foreach (PersonId personidchildren in personids.Children)
{
personidchildren.FirstName = parts[1];
personidchildren.ID = parts[2];
personidchildren.Status = parts[3];
personids.ChildrenVisualisation.Add(personidchildren);
}
}
}
}
The exception was thrown at parts[1].
You should check whether parts has enough items:
...
string[] parts = line.Split(charSeparators, StringSplitOptions.RemoveEmptyEntries);
foreach (PersonId personids in PersonIdDetails) {
personids.ChildrenVisualisation.Clear();
// Check that parts has enough information: at least 4 items (indices 0 to 3)
if (parts.Length > 3) // <- Note the "> 3"
foreach (PersonId personidchildren in personids.Children) {
//TODO: Check, do you really start with 1, not with 0?
personidchildren.FirstName = parts[1];
personidchildren.ID = parts[2];
personidchildren.Status = parts[3];
personids.ChildrenVisualisation.Add(personidchildren);
}
else {
// parts does not have enough data
//TODO: clear personidchildren or throw an exception
}
}
...
I get the impression that the actual file is not that big, so it might be useful to use File.ReadAllLines instead, which gives you all the lines at once (the downside is that the entire file has to be held in memory).
Also, it might be necessary to filter out lines that are empty or contain only whitespace.
foreach (var line in File.ReadAllLines(fileName).Where(l => !string.IsNullOrWhiteSpace(l)))
{
char[] charSeparators = new char[] { '§' };
string[] parts = line.Split(charSeparators, StringSplitOptions.RemoveEmptyEntries);
foreach (PersonId personids in PersonIdDetails)
{
personids.ChildrenVisualisation.Clear();
foreach (PersonId personidchildren in personids.Children)
{
personidchildren.FirstName = parts[1];
personidchildren.ID = parts[2];
personidchildren.Status = parts[3];
personids.ChildrenVisualisation.Add(personidchildren);
}
}
}
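If the file is not saved as UTF-8, the '§' separator may be decoded incorrectly, in which case Split finds no separators and parts has only one element. Assuming the file is encoded as ISO-8859-1, change the first line to
using (StreamReader file = new StreamReader(fileName, Encoding.GetEncoding("iso-8859-1")))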
Here is one possible way to do it (one solution among several):
using (StreamReader file = new StreamReader(fileName))
{
while (file.Peek() >= 0)
{
string line = file.ReadLine();
char[] charSeparators = new char[] { '§' };
string[] parts = line.Split(charSeparators, StringSplitOptions.RemoveEmptyEntries);
foreach (PersonId personids in PersonIdDetails)
{
personids.ChildrenVisualisation.Clear();
foreach (PersonId personidchildren in personids.Children)
{
if(parts.Length > 3)//Only if you want to save lines with all parts but you can create an else clause for other lines with 1 or 2 parts depending on the length
{
personidchildren.FirstName = parts[1];
personidchildren.ID = parts[2];
personidchildren.Status = parts[3];
personids.ChildrenVisualisation.Add(personidchildren);
}
}
}
}
}
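The length check used in the answers above can be pulled into a small TryParse-style helper, so the reading loop never indexes past the end of the array. This is only a sketch: PersonLine and TryParse are hypothetical names, and the field order (record type, first name, ID, status) is taken from the question's description of the format.

```csharp
using System;

public static class PersonLine
{
    // Splits a line such as "P§Thomas§40899§2§§§" into its fields.
    // Returns false instead of throwing when the line has fewer than 4 fields.
    public static bool TryParse(string line, out string firstName, out string id, out string status)
    {
        firstName = id = status = null;
        if (string.IsNullOrWhiteSpace(line))
            return false;

        string[] parts = line.Split(new[] { '§' }, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length < 4)
            return false;

        firstName = parts[1];
        id = parts[2];
        status = parts[3];
        return true;
    }
}
```

A line whose separator was decoded incorrectly yields a single part, so the helper simply returns false for it, which makes encoding problems easy to spot by logging the rejected lines.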

How to read tab delimited lines by skipping alternate lines

I am currently able to parse and extract data from a large tab-delimited file. I read, parse, and extract line by line and add the split items to my DataTable (with a row limit, adding 3 rows at a time). I need to skip the even lines, i.e. read the first tab-delimited line with the maximum number of columns, skip the second one, and go straight to the third.
My tab-delimited source file format:
001Mean 26.975 1.1403 910.45
001Stdev 26.975 1.1403 910.45
002Mean 26.975 1.1403 910.45
002Stdev 26.975 1.1403 910.45
I need to skip the Stdev tab-delimited lines.
C# code:
Getting the maximum number of items in a tab-delimited line of the file by splitting each line:
using (var reader = new StreamReader(sourceFileFullName))
{
string line = null;
line = reader.ReadToEnd();
if (!string.IsNullOrEmpty(line))
{
var list_with_max_cols = line.Split('\n').OrderByDescending(y => y.Split('\t').Count()).Take(1);
foreach (var value in list_with_max_cols)
{
var values = value.ToString().Split(new[] { '\t', '\n' }).ToArray();
MAX_NO_OF_COLUMNS = values.Length;
}
}
}
Reading the file line by line, treating the first line that has the maximum number of columns as the first line to parse and extract:
using (var reader = new StreamReader(sourceFileFullName))
{
string new_read_line = null;
//Read and display lines from the file until the end of the file is reached.
while ((new_read_line = reader.ReadLine()) != null)
{
var items = new_read_line.Split(new[] { '\t', '\n' }).ToArray();
if (items.Length != MAX_NO_OF_COLUMNS)
continue;
//The first matching line is the column list; the DataTable is created from it.
if (firstLineOfFile)
{
columnData = new_read_line;
firstLineOfFile = false;
continue;
}
if (firstLineOfChunk)
{
firstLineOfChunk = false;
chunkDataTable = CreateEmptyDataTable(columnData);
}
AddRow(chunkDataTable, new_read_line);
chunkRowCount++;
if (chunkRowCount == _chunkRowLimit)
{
firstLineOfChunk = true;
chunkRowCount = 0;
yield return chunkDataTable;
chunkDataTable = null;
}
}
}
Creating the DataTable:
private DataTable CreateEmptyDataTable(string firstLine)
{
IList<string> columnList = Split(firstLine);
var dataTable = new DataTable("TableName");
for (int columnIndex = 0; columnIndex < columnList.Count; columnIndex++)
{
string c_string = columnList[columnIndex];
if (Regex.Match(c_string, "\\s").Success)
{
string tmp = Regex.Replace(c_string, "\\s", "");
string finaltmp = Regex.Replace(tmp, @" ?\[.*?\]", ""); // Strip the square brackets and everything inside them
columnList[columnIndex] = finaltmp;
}
}
dataTable.Columns.AddRange(columnList.Select(v => new DataColumn(v)).ToArray());
dataTable.Columns.Add("ID");
return dataTable;
}
How can I skip alternate lines while reading, then split the remaining lines and add them to my DataTable?
AddRow function: I managed to meet my requirement with the following change.
private void AddRow(DataTable dataTable, string line)
{
if (line.Contains("Stdev"))
{
return;
}
else
{
//Rest of Code
}
}
Since you have tab-separated values in each line, how about reading only the odd lines and splitting them into arrays? This is just a sample; you can expand upon it.
Test data (file.txt)
luck is when opportunity meets preparation
this line needs to be skipped
microsoft visual studio
another line to be skipped
let us all code
Code
var oddLines = File.ReadLines(#"C:\projects\file.txt").Where((item, index) => index%2 == 0);
foreach (var line in oddLines)
{
var words = line.Split('\t');
}
EDIT
To get the lines that don't contain 'Stdev':
var filteredLines = System.IO.File.ReadLines(#"C:\projects\file.txt").Where(item => !item.Contains("Stdev"));
Change
using (var reader = new StreamReader(sourceFileFullName))
{
string new_read_line = null;
//Read and display lines from the file until the end of the file is reached.
while ((new_read_line = reader.ReadLine()) != null)
{
var items = new_read_line.Split(new[] { '\t', '\n' }).ToArray();
if (items.Length != MAX_NO_OF_COLUMNS)
continue;
To
using (var reader = new StreamReader(sourceFileFullName))
{
int cnt = 0;
string new_read_line = null;
//Read and display lines from the file until the end of the file is reached.
while ((new_read_line = reader.ReadLine()) != null)
{
cnt++;
if(cnt % 2 == 0)
continue;
var items = new_read_line.Split(new[] { '\t', '\n' }).ToArray();
if (items.Length != MAX_NO_OF_COLUMNS)
continue;
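
The question's first step loads the whole file with ReadToEnd just to find the widest line. A lazy alternative is sketched below, assuming plain tab-delimited lines; TabFile, MaxColumns, and EveryOtherRow are names invented for this example. Both methods take any line sequence, so they can be fed from File.ReadLines without holding the file in memory.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TabFile
{
    // Largest number of tab-separated fields found on any line (0 for no lines).
    public static int MaxColumns(IEnumerable<string> lines)
    {
        return lines.Select(line => line.Split('\t').Length).DefaultIfEmpty(0).Max();
    }

    // Yields the split fields of lines 1, 3, 5, ... (zero-based indexes 0, 2, 4, ...),
    // skipping every second line.
    public static IEnumerable<string[]> EveryOtherRow(IEnumerable<string> lines)
    {
        return lines
            .Where((line, index) => index % 2 == 0)
            .Select(line => line.Split('\t'));
    }
}
```

Skipping by position only works if Mean and Stdev lines strictly alternate; if that is not guaranteed, filtering on the "Stdev" text is the safer choice.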
