How to delete the first and last line from a text file in C#?

I found this code on Stack Overflow to delete the first and the last line from a text file,
but I can't work out how to combine the two snippets so that both the first and the
last line are deleted from a single file.
I tried reading the file with a StreamReader, skipping the first and last line, and then
writing the rest to a new file with a StreamWriter, but I couldn't get the structure right.
To delete the first line:
var lines = System.IO.File.ReadAllLines("test.txt");
System.IO.File.WriteAllLines("test.txt", lines.Skip(1).ToArray());
To delete the last line:
var lines = System.IO.File.ReadAllLines("...");
System.IO.File.WriteAllLines("...", lines.Take(lines.Length - 1).ToArray());

You can chain the Skip and Take methods. Remember to subtract the appropriate number of lines in the Take call: the more lines you skip at the beginning, the fewer remain to take.
var filename = "test.txt";
var lines = System.IO.File.ReadAllLines(filename);
System.IO.File.WriteAllLines(
filename,
lines.Skip(1).Take(lines.Length - 2)
);
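One edge case worth spelling out: when the file has fewer than two lines, `lines.Length - 2` is zero or negative, and `Take` then yields nothing, so the file ends up empty. A minimal sketch that makes this explicit; the names `LineTrimmer`, `TrimFirstAndLast` and `TrimFile` are hypothetical and not part of the answer above:

```csharp
using System;
using System.IO;
using System.Linq;

public static class LineTrimmer
{
    // Hypothetical helper: returns the lines without the first and the last one.
    // Inputs with fewer than three lines produce an empty result.
    public static string[] TrimFirstAndLast(string[] lines)
    {
        if (lines == null) throw new ArgumentNullException(nameof(lines));
        // Math.Max keeps the count non-negative for very short inputs.
        int keep = Math.Max(0, lines.Length - 2);
        return lines.Skip(1).Take(keep).ToArray();
    }

    // Rewrites the file in place; the whole file is still held in memory.
    public static void TrimFile(string path)
    {
        File.WriteAllLines(path, TrimFirstAndLast(File.ReadAllLines(path)));
    }
}
```

Calling `LineTrimmer.TrimFile("test.txt")` would then behave like the chained one-liner, with the short-file behaviour stated in one place.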

Whilst probably not a major issue in this case, the existing answers all rely on reading the entire contents of the file into memory first. For small files, that's probably fine, but if you're working with very large files, this could prove prohibitive.
It is reasonably trivial to create a SkipLast equivalent of the existing Skip Linq method:
public static class SkipLastExtension
{
public static IEnumerable<T> SkipLast<T>(this IEnumerable<T> source, int count)
{
var queue = new Queue<T>();
foreach (var item in source)
{
queue.Enqueue(item);
if (queue.Count > count)
{
yield return queue.Dequeue();
}
}
}
}
If we also define a method that allows us to enumerate over each line of a file without pre-buffering the whole file (per: https://stackoverflow.com/a/1271236/381588):
static IEnumerable<string> ReadFrom(string filename)
{
using (var reader = File.OpenText(filename))
{
string line;
while ((line = reader.ReadLine()) != null)
{
yield return line;
}
}
}
Then we can use the following one-liner to write a new file that contains all the lines from the original file, except the first and last:
File.WriteAllLines("output.txt", ReadFrom("input.txt").Skip(1).SkipLast(1));
This is undoubtedly (considerably) more code than the other answers that have already been posted here, but it should work on files of essentially any size (as well as providing code for a potentially useful SkipLast extension method).
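Note that the streaming version writes to a different file than it reads. Writing to the path that is still being read will generally fail, because the reader holds the file open. If the original file has to be replaced, one option is to stream into a temporary file first. The sketch below assumes that approach; `StreamingTrim`, `WithoutFirstAndLast` and `TrimInPlace` are hypothetical names, and the built-in `File.ReadLines` stands in for the `ReadFrom` method above:

```csharp
using System.Collections.Generic;
using System.IO;

public static class StreamingTrim
{
    // Lazily yields every line except the first and the last,
    // holding at most one pending line in memory.
    public static IEnumerable<string> WithoutFirstAndLast(IEnumerable<string> lines)
    {
        bool skippedFirst = false;
        bool hasPending = false;
        string pending = null;
        foreach (var line in lines)
        {
            if (!skippedFirst) { skippedFirst = true; continue; }
            // Emit the previous line only once we know another line follows it.
            if (hasPending) yield return pending;
            pending = line;
            hasPending = true;
        }
        // Whatever is still pending here was the last line, so it is dropped.
    }

    // Streams the trimmed content into a temporary file, then copies it over the original.
    public static void TrimInPlace(string path)
    {
        string temp = Path.GetTempFileName();
        try
        {
            // WriteAllLines enumerates to the end, which closes the reader on `path`.
            File.WriteAllLines(temp, WithoutFirstAndLast(File.ReadLines(path)));
            File.Copy(temp, path, overwrite: true);
        }
        finally
        {
            File.Delete(temp);
        }
    }
}
```

The trade-off is extra disk space for the temporary copy in exchange for memory use that is independent of the file size.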

Here's a different approach that uses ArraySegment<string> instead:
var lines = File.ReadAllLines("test.txt");
File.WriteAllLines("test.txt", new ArraySegment<string>(lines, 1, lines.Length-2));

Related

search in big files performance difference

I compared two ways of searching in big files.
I tested on a 500 MB file.
The first way took 9500 ms and the second way took 11500 ms.
How can that happen?
I expected buffering to be faster than accessing the resource on each iteration,
and LINQ to be more powerful than a foreach search.
Is it a problem with memory allocation?
1:
var __file = new System.IO.StreamReader(file);
var line = "";
while ((line = __file.ReadLine()) != null)
{
var firstOccurrence = line.Contains(contains);
}
__file.Close();
2:
var lines = File.ReadAllLines(_file);
var firstOccurrence = lines.FirstOrDefault(l => l.Contains(contains));
In your first code snippet, you don't stop looping when you find a match. Try something like this:
while ((line = __file.ReadLine()) != null)
{
var firstOccurrence = line.Contains(contains);
if (firstOccurrence)
{
break;
}
}
In your second code snippet, you read the entire file into memory, and then start looking through it line-by-line. This is different to your first code snippet, where you read the file off disk one line at a time.
The equivalent method is File.ReadLines -- this reads the file line-by-line:
var firstOccurrence = File.ReadLines(_file).FirstOrDefault(l => l.Contains(contains));

C# list.remove throwing enumeration operation error in foreach loop

I have looked at multiple SO questions related to mine, as well as Googling, and I have been unable to find a solution that works for me.
Given a list of files, I am trying to cull all the non-cs ones, for reasons. I take the string array, convert to list, and iterate over the list, removing all the files I don't want by using list.Remove().
After the first removal, it errors out with
Collection was modified; enumeration operation might not execute
The code is:
string[] files = null;
try
{
files = System.IO.Directory.GetFiles(currentDir);
//Converting to list, to use list.Remove rather than rewriting entire array at each delete
var list = new List<string>(files);
foreach (string readFile in list)
{
if (Path.GetExtension(readFile) != ".cs") //|| Path.GetExtension(readFile) != ".dll")
{
//remove, as we don't currently care about non cs files.
list.Remove(readFile);
}
}
//Converting back to string array for use in the rest of the program
files = list.ToArray();
}
I have also tried RemoveAt(), which produces the same error.
string[] files = null;
try
{
files = System.IO.Directory.GetFiles(currentDir);
//Converting to list, to use list.Remove rather than rewriting entire array at each delete
var list = new List<string>(files);
int i = 0;
for(int i=0; i<list.Count();i++)
{
if (Path.GetExtension(readFile) != ".cs") //|| Path.GetExtension(readFile) != ".dll")
{
//remove, as we don't currently care about non cs files.
list.RemoveAt(i);
}
i++;
}
//Converting back to string array for use in the rest of the program
files = list.ToArray();
}
Any recommendations for overcoming this error, as some of the directories will have over 500 files in them, and I want to avoid rewriting the string array as much as possible.
I have read the following SO questions:
How to remove item from list in C#?
C# error Collection was modified; enumeration operation might not execute
Enumerations Foreach Loop C#
"List.Remove" in C# does not remove item?
https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.list-1.remove?view=netframework-4.8
https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.list-1.removeat?view=netframework-4.8
You can also use a for loop in reverse order so that removed entries do not affect the loop.
for (int x = list.Count-1; x >= 0; x--)
{
string readFile = list[x];
// ...
// RemoveAt removes exactly this entry without searching the list again
list.RemoveAt(x);
}
If I change
foreach (string readFile in list)
to
foreach (string readFile in list.ToList())
it works.
My guess is that the list.Remove alters the original list, and invalidates the enumeration.
You can use the RemoveAll method on List:
list.RemoveAll(readFile => Path.GetExtension(readFile) != ".cs");
This will remove everything that matches your predicate, and will return the number of items it removed.
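Since the question converts the list back to an array anyway, the filtering can also be done without mutating any collection. A minimal sketch of both variants follows; `CsFileFilter` and its method names are hypothetical, and the case-insensitive comparison is an assumption (the question compares with `!=`, which is case-sensitive):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class CsFileFilter
{
    // True when the path ends in ".cs" (assumption: case should be ignored).
    static bool IsCsFile(string path)
    {
        return string.Equals(Path.GetExtension(path), ".cs", StringComparison.OrdinalIgnoreCase);
    }

    // Builds a new array containing only the .cs files; the input is not modified.
    public static string[] KeepCsFiles(IEnumerable<string> paths)
    {
        return paths.Where(IsCsFile).ToArray();
    }

    // Same filter applied in place with RemoveAll; returns how many entries were removed.
    public static int RemoveNonCsFiles(List<string> paths)
    {
        return paths.RemoveAll(p => !IsCsFile(p));
    }
}
```

With this, `files = CsFileFilter.KeepCsFiles(System.IO.Directory.GetFiles(currentDir));` would replace the whole list round trip from the question.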

IEnumerable.Take(0) on File.ReadLines seems not to dispose/close the File handle

I have a function which Skips n lines of code and Takes y lines from a given file using File.ReadLines with Skip and Take combination. When I try to open the file given by filePath the next time:
string[] Lines = File.ReadLines(filePath).Skip(0).Take(0).ToArray();
using (StreamWriter streamWriter = new StreamWriter(filePath))
{
// ...
}
I get a File in use by another process exception on the "using" line.
It looks like IEnumerable.Take(0) is the culprit, since it returns an empty IEnumerable without enumerating on the object returned by File.ReadLines(), which I believe is not disposing the file.
Am I right? Shouldn't they enumerate to avoid this kind of error? How do I do this properly?
This is basically a bug in File.ReadLines, not Take. ReadLines returns an IEnumerable<T>, which should logically be lazy, but it eagerly opens the file. Unless you actually iterate over the return value, you have nothing to dispose.
It's also broken in terms of only iterating once. For example, you should be able to write:
var lines = File.ReadLines("text.txt");
var query = from line1 in lines
from line2 in lines
select line1 + line2;
... that should give a cross-product of lines in the file. It doesn't, due to the brokenness.
File.ReadLines should be implemented something like this:
public static IEnumerable<string> ReadLines(string filename)
{
return ReadLines(() => File.OpenText(filename));
}
private static IEnumerable<string> ReadLines(Func<TextReader> readerProvider)
{
using (var reader = readerProvider())
{
string line;
while ((line = reader.ReadLine()) != null)
{
yield return line;
}
}
}
Unfortunately it's not :(
Options:
Use the above instead of File.ReadLines
Write your own implementation of Take which always starts iterating, e.g.
public static IEnumerable<T> Take<T>(this IEnumerable<T> source, int count)
{
// TODO: Argument validation
using (var iterator = source.GetEnumerator())
{
while (count > 0 && iterator.MoveNext())
{
count--;
yield return iterator.Current;
}
}
}
From the comment above File.ReadLines() in the Reference Source, it becomes obvious that the team responsible knew about this "bug":
Known issues which cannot be changed to remain compatible with 4.0:
The underlying StreamReader is allocated upfront for the IEnumerable<T> before
GetEnumerator has even been called. While this is good in that exceptions such as
DirectoryNotFoundException and FileNotFoundException are thrown directly by
File.ReadLines (which the user probably expects), it also means that the reader
will be leaked if the user never actually foreach's over the enumerable (and hence
calls Dispose on at least one IEnumerator<T> instance).
So they wanted File.ReadLines() to throw immediately when passed an invalid or unreadable path, as opposed to throwing when enumerating.
The alternative is simple: not calling Take(0), or instead not reading the file altogether if you aren't actually interested in its contents.
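The second alternative can be sketched as a small guard that never touches the file when no lines are requested. `LineRange` and `ReadRange` are hypothetical names; note that with this guard an invalid path no longer throws when `take` is zero, which may or may not be what you want:

```csharp
using System.IO;
using System.Linq;

public static class LineRange
{
    // Hypothetical helper: skips `skip` lines and returns up to `take` lines.
    // When take is zero (or negative) File.ReadLines is never called,
    // so no reader is created that could be left open.
    public static string[] ReadRange(string path, int skip, int take)
    {
        if (take <= 0) return new string[0];
        // ToArray runs the Take iterator, which disposes the underlying reader when it stops.
        return File.ReadLines(path).Skip(skip).Take(take).ToArray();
    }
}
```

After `LineRange.ReadRange(filePath, 0, 0)` the file can be opened for writing, because it was never opened for reading in the first place.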
In my opinion, the root cause is that the Enumerable.Take iterator doesn't dispose the underlying iterator when the count is zero, because the code never enters the foreach loop (see the Reference Source).
If the code is modified in the following way, the issue is resolved:
static IEnumerable<TSource> TakeIterator<TSource>(IEnumerable<TSource> source, int count)
{
foreach (TSource element in source)
{
if (--count < 0) break;
yield return element;
}
}

Can't find string in input file

I have a text file, which I am trying to insert a line of code into. Using my linked-lists I believe I can avoid having to take all the data out, sort it, and then make it into a new text file.
What I did was come up with the code below. I set my bools, but still it is not working. I went through debugger and what it seems to be going on is that it is going through the entire list (which is about 10,000 lines) and it is not finding anything to be true, so it does not insert my code.
Why or what is wrong with this code?
List<string> lines = new List<string>(File.ReadAllLines("Students.txt"));
using (StreamReader inFile = new StreamReader("Students.txt", true))
{
string newLastName = "'Constant";
string newRecord = "(LIST (LIST 'Constant 'Malachi 'D ) '1234567890 'mdcant@mail.usi.edu 4.000000 )";
string line;
string lastName;
bool insertionPointFound = false;
for (int i = 0; i < lines.Count && !insertionPointFound; i++)
{
line = lines[i];
if (line.StartsWith("(LIST (LIST "))
{
values = line.Split(" ".ToCharArray());
lastName = values[2];
if (newLastName.CompareTo(lastName) < 0)
{
lines.Insert(i, newRecord);
insertionPointFound = true;
}
}
}
if (!insertionPointFound)
{
lines.Add(newRecord);
}
You're just reading the file into memory and not committing it anywhere.
I'm afraid that you're going to have to load and completely re-write the entire file. Files support appending, but they don't support insertions.
You can write to a file the same way that you read from it:
string[] lines;
/// instantiate and build `lines`
File.WriteAllLines("path", lines);
WriteAllLines also takes an IEnumerable, so you can pass a List of strings to it if you want.
One more issue: it appears as though you're reading your file twice, once with ReadAllLines and again with your StreamReader.
There are at least four possible errors.
Opening the StreamReader is not required; you have already read
all the lines. (Well, not really an error, but...)
The check for StartsWith can be fooled if your lines start with blank
space, and you will miss the insertion point. (Adding a Trim removes this problem.)
In the CompareTo line you check for < 0, but you should check for == 0. CompareTo returns 0 if the strings are equivalent; however...
To check whether two strings are equal you should avoid using CompareTo, as
explained in the MSDN link above, and use string.Equals instead.
List<string> lines = new List<string>(File.ReadAllLines("Students.txt"));
string newLastName = "'Constant";
string newRecord = "(LIST (LIST 'Constant 'Malachi 'D ) '1234567890 'mdcant@mail.usi.edu 4.000000 )";
string line;
string lastName;
bool insertionPointFound = false;
for (int i = 0; i < lines.Count && !insertionPointFound; i++)
{
line = lines[i].Trim();
if (line.StartsWith("(LIST (LIST "))
{
string[] values = line.Split(" ".ToCharArray());
lastName = values[2];
if (newLastName.Equals(lastName))
{
lines.Insert(i, newRecord);
insertionPointFound = true;
}
}
}
if (!insertionPointFound)
lines.Add(newRecord);
I don't list the missing write back to the file as an error. I hope you have just omitted that part of the code; otherwise it is a very simple problem.
(However, I think that the way CompareTo is used is probably the main cause of your problem.)
EDIT: Looking at your comment below, it seems that the answer from Sam I Am is the right one for you. Of course you need to write back the modified list of lines. All the changes are made to an in-memory list of lines, and nothing is written back to the file unless you have code that writes it. However, you don't need a new file:
File.WriteAllLines("Students.txt", lines);
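Putting the pieces together, the whole read, insert and write-back cycle can be sketched as one method. `RecordInserter` and `InsertRecord` are hypothetical names, and the decision of where to insert is deliberately left to a caller-supplied predicate, since the answers above disagree on the right comparison:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class RecordInserter
{
    // Loads all lines, inserts newRecord before the first line for which
    // isInsertionPoint returns true (or appends it if there is none),
    // then writes everything back to the same file.
    public static void InsertRecord(string path, string newRecord, Func<string, bool> isInsertionPoint)
    {
        var lines = new List<string>(File.ReadAllLines(path));
        int index = lines.FindIndex(line => isInsertionPoint(line));
        if (index >= 0)
            lines.Insert(index, newRecord);
        else
            lines.Add(newRecord);
        // Without this call the change only ever exists in memory.
        File.WriteAllLines(path, lines);
    }
}
```

The file is read exactly once and no StreamReader is needed, which also addresses the double read mentioned above.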

Most memory efficient way to merge two files

I need to merge two files while also applying a sort. It is important that I keep the task light on memory usage. I need to create a console app in C# for this.
Input File 1:
Some Header
A12345334
A00123445
A44566555
B55677
B55683
B66489
record count: 6
Input File 2:
Some Header
A00123465
B99423445
record count: 2
So, I need to make sure that the third file should have all the "A" records coming first and then the "B" records followed by the Total record count.
Output File:
Some header
A12345334
A00123445
A44566555
A00123465
B99423445
B55677
B55683
B66489
record count: 8
Record sorting within "A" and "B" is not relevant.
Since your source files appear to be sorted, you can do this with very low memory usage.
Just open both input files as well as a new file for writing. Then compare the next available line from each input file and write the line that comes first to your output file. Each time you write a line to the output file, get the next line from the input file it came from.
Continue until both input files are finished.
If memory is an issue the easiest way to do this is probably going to be to read the records from both files, store them in a SQLite or SQL Server Compact database, and execute a SELECT query that returns a sorted record set. Make sure you have an index on the field you want to sort on.
That way, you don't have to store the records in memory, and you don't need any sorting algorithms; the database will store the records on disk and do your sorting for you.
Quick idea, assuming the records are already sorted in the original files:
Start looping through file 2, collecting all A-records
Once you reach the first B-record, start collecting those in a separate collection.
Read all of File 1.
Write out the content of the A-records collection from file 2, then append the contents read from file 1, followed by the B-records from file 2.
Visualized:
<A-data from file 2>
<A-data, followed by B-data from file 1>
<B-data from file 2>
If you are concerned about memory, this is a perfect case for an insertion sort that reads one line at a time from each file. If memory is not an issue, read the whole thing into a list, call Sort, then write it out.
If you can't even keep the whole sorted list in memory, then a database or a memory-mapped file is your best bet.
Assuming your input files are already ordered:
Open Input files 1 and 2 and create the Output file.
Read the first record from file 1. If it starts with A, write it to the output file. Continue reading from input file 1 until you reach a record that starts with B.
Read the first record from file 2. If it starts with A, write it to the output file. Continue reading from input file 2 until you reach a record that starts with B.
Go back to file 1, and write the 'B' record to the output file. Continue reading from input file 1 until you reach the end of the stream.
Go back to file 2, and write the 'B' record to the output file. Continue reading from input file 2 until you reach the end of the stream.
This method will prevent you from ever having to hold more than 2 rows of data in memory at a time.
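The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: each file starts with exactly one header line (file 1's header is kept), records begin with 'A' or 'B' with all A records first, the trailing "record count" line is discarded and recomputed, and `PrefixMerge`, `CopyWhile` and `Merge` are hypothetical names:

```csharp
using System.IO;

public static class PrefixMerge
{
    // Writes lines while they start with `prefix`, beginning with `pending` if one
    // is already buffered. Returns the first line that did not match (null at end).
    static string CopyWhile(TextReader reader, TextWriter writer, string pending, char prefix, ref int count)
    {
        string line = pending ?? reader.ReadLine();
        while (line != null && line.Length > 0 && line[0] == prefix)
        {
            writer.WriteLine(line);
            count++;
            line = reader.ReadLine();
        }
        return line;
    }

    public static void Merge(TextReader file1, TextReader file2, TextWriter output)
    {
        // Assumption: one header line per file; only the first file's header is written.
        output.WriteLine(file1.ReadLine());
        file2.ReadLine();

        int count = 0;
        // A records of file 1, then A records of file 2.
        string pending1 = CopyWhile(file1, output, null, 'A', ref count);
        string pending2 = CopyWhile(file2, output, null, 'A', ref count);
        // pending1 and pending2 now hold each file's first B record (or its trailer line).
        CopyWhile(file1, output, pending1, 'B', ref count);
        CopyWhile(file2, output, pending2, 'B', ref count);

        // The input trailers are ignored; the total is recomputed from what was written.
        output.WriteLine("record count: " + count);
    }
}
```

Passing `File.OpenText(...)` readers and a `StreamWriter` (each in a `using` block) would run this against real files; only the two pending lines are ever buffered.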
I would recommend using StreamReader and StreamWriter for this application. You can open the output file using a StreamWriter, then copy all lines using a StreamReader for file #1, then for file #2. These operations are very fast, have integrated buffers and are very lightweight.
If the input files are already sorted by A and B, you can switch between the source readers to make the output sorted.
Since you have two sorted sequences you just need to merge the two sequences into a single sequence, in much the same way the second half of the MergeSort algorithm works.
Unfortunately, given the interface that IEnumerable provides, it ends up a bit messy and copy-pasty, but it should perform quite well and use a very small memory footprint:
public class Wrapper<T>
{
public T Value { get; set; }
}
public static IEnumerable<T> Merge<T>(IEnumerable<T> first, IEnumerable<T> second, IComparer<T> comparer = null)
{
comparer = comparer ?? Comparer<T>.Default;
using (var secondIterator = second.GetEnumerator())
{
Wrapper<T> secondItem = null; //when the wrapper is null there are no more items in the second sequence
if (secondIterator.MoveNext())
secondItem = new Wrapper<T>() { Value = secondIterator.Current };
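foreach (var firstItem in first)
{
// emit every remaining item of the second sequence that sorts before firstItem;
// the null check stops the loop once the second sequence runs out
while (secondItem != null && comparer.Compare(firstItem, secondItem.Value) > 0)
{
yield return secondItem.Value;
if (secondIterator.MoveNext())
secondItem.Value = secondIterator.Current;
else
secondItem = null;
}
yield return firstItem;
}
// the first sequence is exhausted; flush whatever is left of the second
if (secondItem != null)
{
yield return secondItem.Value;
while (secondIterator.MoveNext())
yield return secondIterator.Current;
}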
}
}
Once you have a Merge function it's pretty trivial:
File.WriteAllLines("output.txt",
Merge(File.ReadLines("File1.txt"), File.ReadLines("File2.txt")));
The File ReadLines and WriteAllLines here each utilize IEnumerable and will stream the lines accordingly.
Here's the source code for a more generic, boilerplate solution for merging two sorted files.
public static void Merge(string inFile1, string inFile2, string outFile)
{
string line1 = null;
string line2 = null;
using (StreamReader sr1 = new StreamReader(inFile1))
{
using (StreamReader sr2 = new StreamReader(inFile2))
{
using (StreamWriter sw = new StreamWriter(outFile))
{
line1 = sr1.ReadLine();
line2 = sr2.ReadLine();
while(line1 != null && line2 != null)
{
// your comparison function here
// ex: (line1[0] < line2[0])
if(string.CompareOrdinal(line1, line2) < 0)
{
sw.WriteLine(line1);
line1 = sr1.ReadLine();
}
else
{
sw.WriteLine(line2);
line2 = sr2.ReadLine();
}
}
while(line1 != null)
{
sw.WriteLine(line1);
line1 = sr1.ReadLine();
}
while(line2 != null)
{
sw.WriteLine(line2);
line2 = sr2.ReadLine();
}
}
}
}
}
public void merge_click(Object sender, EventArgs e)
{
DataTable dt = new DataTable();
dt.Clear();
dt.Columns.Add("Name");
dt.Columns.Add("designation");
dt.Columns.Add("age");
dt.Columns.Add("year");
string[] lines = File.ReadAllLines(@"C:\Users\user1\Desktop\text1.txt", Encoding.UTF8);
string[] lines1 = File.ReadAllLines(@"C:\Users\user2\Desktop\text1.txt", Encoding.UTF8);
foreach (string line in lines)
{
string[] values = line.Split(',');
DataRow dr = dt.NewRow();
dr["Name"] = values[0].ToString();
dr["designation"] = values[1].ToString();
dr["age"] = values[2].ToString();
dr["year"] = values[3].ToString();
dt.Rows.Add(dr);
}
foreach (string line in lines1)
{
string[] values = line.Split(',');
DataRow dr = dt.NewRow();
dr["Name"] = values[0].ToString();
dr["designation"] = values[1].ToString();
dr["age"] = values[2].ToString();
dr["year"] = values[3].ToString();
dt.Rows.Add(dr);
}
grdstudents.DataSource = dt;
grdstudents.DataBind();
}
