C# line-by-line check of a file

I am reading a file line by line and I want to edit some of the lines I read.
The lines I want to edit must have certain specific lines above or below them, but I don't know how to express that in C#.
For example:
http://www.youtube.com
You Tube | Music | something
Sports
Music
Radio
Clips
http://www.youtube.com/EDIT ME
Sports
Music
Radio
Clips
and I want to edit a line
only if the next line is
Sports
and the previous line is
Clips
So the only line I want to edit in the example above is
http://www.youtube.com/EDIT ME
Any ideas?

You can't really "edit" a file line by line. It would be best to take one of two approaches:
Read the whole file into memory, e.g. using File.ReadAllLines, then make appropriate changes in memory and rewrite the whole thing (e.g. using File.WriteAllLines)
Open both the input file and output file at the same time. Read a line at a time, work out what it should correspond to in the output file, then write the line.
The first version is simpler but obviously requires more memory. The second is trickier to get right, particularly when you need to look at the next and previous lines, as you mentioned.
Simple example of the first approach:
string[] lines = File.ReadAllLines("foo.txt");   // requires using System.IO;
for (int i = 1; i < lines.Length - 1; i++)
{
    // Only edit a line whose neighbours match the required values.
    if (lines[i - 1] == "Clips" && lines[i + 1] == "Sports")
    {
        lines[i] = "Changed";
    }
}
File.WriteAllLines("foo.txt", lines);

Approach #1
While you're looping through the file, keep a variable for the previous line (always update it to the current line at the end of each iteration). Once you know the current line and the previous line, you can decide whether you need to edit the current line.
Approach #2
While you're looping through the file, set a flag when you meet some condition, e.g. "I've just found Sports". If you later hit a condition that should clear the flag, e.g. "I've just found Radio", unset it. When you find Clips you can check whether the Sports flag is set to see if you need to edit this Clips line.
The second approach is more flexible (it allows you to set and unset multiple flags depending on the current line) and works well if there could be multiple lines of crud between Sports and Clips. It's effectively a poor man's state machine.
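For completeness, here is a rough streaming sketch of the previous-line idea, extended with a one-line delay so the next line is also known before a line is written out. The file names and the replacement text "Changed" are placeholders, and C# 9 top-level statements are assumed:
using System.IO;

// Stream from the input file to a new output file, buffering one line so
// that both the previous and next lines are known before it is written.
using var reader = new StreamReader("foo.txt");
using var writer = new StreamWriter("foo_edited.txt");

string previous = null;  // the line before the buffered one
string pending = null;   // the line waiting to be written
string current;

while ((current = reader.ReadLine()) != null)
{
    if (pending != null)
    {
        // We now know the buffered line's neighbours, so we can decide.
        bool edit = previous == "Clips" && current == "Sports";
        writer.WriteLine(edit ? "Changed" : pending);
        previous = pending;
    }
    pending = current;
}

// The last line has no successor, so it is written unchanged.
if (pending != null)
    writer.WriteLine(pending);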

If the file isn't that large, I would read the entire file in as a single string. Then you can freely manipulate it using methods like IndexOf and Substring.
Once you have the string how you need it, write it back over the file you had.
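A minimal sketch of that idea, assuming the file is small, using the URL from the question as the search marker, and a placeholder replacement value:
using System;
using System.IO;

string marker = "http://www.youtube.com/EDIT ME";   // the line to find
string text = File.ReadAllText("foo.txt");          // whole file as one string

int index = text.IndexOf(marker, StringComparison.Ordinal);
if (index >= 0)
{
    // Splice the replacement in with Substring, then overwrite the file.
    text = text.Substring(0, index)
         + "http://www.youtube.com/changed"
         + text.Substring(index + marker.Length);
    File.WriteAllText("foo.txt", text);
}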


Call Length Property on Returned Array in Chained String/LINQ Methods of C#

I found this post on selecting a range from an array, and have to use the LINQ option:
Selecting a range of items inside an array in C#
Ultimately, I'm trying to get the last four lines from a text file. After I've read in and cleaned the lines of unwanted characters and empty lines, I have an array with all of the lines. I'm using the following to do so:
string[] allLines = GetEachLine(results);
string[] lastFourLines = allLines.Skip(allLines.Length - 4).Take(4).ToArray();
This works fine, but I'm wondering if I could somehow skip assigning to the allLines variable altogether. Such as:
string[] lastFourLines = GetEachLine(results).Skip(returnedArrayLength - 4).Take(4).ToArray();
It would be better to change GetEachLine, and the code preceding it (however results is computed), to use IEnumerable<T> and avoid reading the entire file into an array just to get the last four lines (unless you use all of results for something else) - consider using File.ReadLines.
However, if you are using .NET Core 2.0 or greater, you can use Enumerable.TakeLast to efficiently return the last four lines:
var lastFourLines = GetEachLine(results).TakeLast(4);
If GetEachLine() returns string[] then that should work fine, though null checking may be needed.
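If the lines can come straight from a file, a minimal sketch of the lazy approach ("results.txt" and the whitespace filter are placeholders for however results is produced and cleaned; .NET Core 2.0+ assumed for TakeLast):
using System.IO;
using System.Linq;

// Lazily enumerate the file and keep only the last four (cleaned) lines.
string[] lastFourLines = File.ReadLines("results.txt")
    .Where(line => !string.IsNullOrWhiteSpace(line))  // stand-in for the cleaning step
    .TakeLast(4)
    .ToArray();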
As you chain more calls you may want to use line breaks to increase readability:
string[] lastFourLines = GetEachLine(results)
    .Skip(allLines.Length - 4)
    .Take(4)
    .ToArray();
allLines.Length won't exist unless you still have line 1 from your question; you can avoid calling GetEachLine() twice by using TakeLast():
string[] lastFourLines = GetEachLine(results)
    .TakeLast(4)
    .ToArray();
If you are looking to efficiently retrieve the last N (filtered) lines of a large file, you really need to start at the point where you are reading the file contents.
Consider a 1 GB log file containing 10M records, where you only want the last few lines. Ideally, you would start by reading the last couple of KB, then extract lines by searching for line breaks from the end, yielding each one from an iterator. If you run out of data, read the preceding block. Continue only as long as the consumer requests more values from the iterator.
Offhand, I don't know of a built-in way to do this, and coding it from scratch could get pretty involved. Luckily, a search turned up this similar question with a highly rated answer.
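For a rough idea of the shape of such code (this is not the linked answer), here is a simplified sketch that reads a block from the end of the file and retries with a larger block until it has enough lines. It assumes UTF-8/ASCII text, so a '\n' byte is always a real line break:
using System;
using System.IO;
using System.Linq;
using System.Text;

static class TailReader
{
    // Read the last `count` non-empty lines without loading the whole file.
    public static string[] ReadLastLines(string path, int count)
    {
        using var stream = File.OpenRead(path);
        for (int blockSize = 4096; ; blockSize *= 2)
        {
            long start = Math.Max(0, stream.Length - blockSize);
            stream.Position = start;
            var buffer = new byte[stream.Length - start];
            int read = 0;
            while (read < buffer.Length)
                read += stream.Read(buffer, read, buffer.Length - read);

            string[] lines = Encoding.UTF8.GetString(buffer)
                .Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries)
                .Select(l => l.TrimEnd('\r'))
                .ToArray();

            // The first line in the block may be cut off unless the whole
            // file was read, so require one spare line before trusting it.
            if (start == 0 || lines.Length > count)
                return lines.Skip(Math.Max(0, lines.Length - count)).ToArray();
        }
    }
}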

Convert text of a C# project into 1 text file

So I'm doing Google Code Jam, and for their new format I have to upload my code as a single text file.
I like writing my code as properly constructed classes across multiple files even when under time pressure (I find that I save more time in clarity and my own debugging speed than I lose in wasted time), and I want to re-use the common code.
Once I've got my code finished I have to convert from a series of classes in multiple files, to a single file.
Currently I'm just manually copying and pasting all the files' text into a single file, and then manually massaging the usings and namespaces to make it all work.
Is there a better option?
Ideally a tool that will JustDoIt for me?
Alternatively, is there some predictable algorithm I could implement that wouldn't require any manual tweaks?
Write your classes so that all "using"s are inside "namespace"
Write a script which collects all *.cs files and concatenates them
This is probably not the most optimal way to do this, but here is an algorithm that can do what you need (a runnable sketch follows these steps):
loop through every file and grab every line starting with "using" -> write them to a temp file/buffer
check for duplicates and remove them
get the position of the first '{' after the character sequence "namespace"
get the position of the last '}' in the file
append the text between these two positions to a second temp file/buffer
append the second file/buffer to the first one
write out the merged buffer
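Here is a simplified, line-based sketch of those steps: it hoists and de-duplicates the using lines, then keeps each file from its first "namespace" line to the end of the file rather than splicing between braces. The search pattern "*.cs" in the current directory and the "merged.cs" output name are placeholders:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

class MergeCsFiles
{
    static void Main()
    {
        var usings = new SortedSet<string>();   // de-duplicated "using ..." lines
        var bodies = new List<string>();        // everything from "namespace" onwards

        foreach (var file in Directory.GetFiles(".", "*.cs", SearchOption.AllDirectories))
        {
            string[] lines = File.ReadAllLines(file);
            usings.UnionWith(lines.Where(l => l.TrimStart().StartsWith("using ")));

            int nsIndex = Array.FindIndex(lines, l => l.TrimStart().StartsWith("namespace"));
            if (nsIndex >= 0)
                bodies.Add(string.Join(Environment.NewLine, lines.Skip(nsIndex)));
        }

        File.WriteAllText("merged.cs",
            string.Join(Environment.NewLine, usings)
            + Environment.NewLine + Environment.NewLine
            + string.Join(Environment.NewLine + Environment.NewLine, bodies));
    }
}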
It is very subjective. I see the algorithm as the following in pseudocode:
usingsLines = new HashSet<string>();
newFile = new StringBuilder();
foreach (file in listOfFiles)
{
    var textFromFile = file.ReadToEnd();
    usingsLines.UnionWith(textFromFile.GetUsings());   // collect and de-duplicate usings
    var fileBody = textFromFile.GetBody();
    newFile.Append(fileBody);
}
result = string.Join(newLine, usingsLines) + newFile;
// As a result it will have something like this
// using usingsfromFirstFile;
// using usingsfromSecondFile;
//
// namespace FirstFileNamespace
// {
// ...
// }
//
// namespace SecondFileNamespace
// {
// ...
// }
But keep in mind this approach can lead to conflicts if two different namespaces contain classes with the same names, etc. To solve that you need to fix it manually, or get rid of the using directives and use fully qualified names.
Also, these links may be useful:
Merge files,
Merge file in Java

How to write the last line of a file

I have a file, data.txt, which contains text line by line:
one
two
three
six
Here I need the file to contain:
one
two
three
four
five
six
I don't know how to write the file like this!
Generally, you have to re-write the file when inserting - because text files have variable length rows.
There are optimizations you could employ, like extending the file and buffering while writing, but you may have to buffer an arbitrary amount - e.g. when inserting a row at the top.
If we knew more about your complete scenario, we would be more able to help usefully.
Loop through your text file and read the lines into an array. Modify the array and save it back to the file. That's not a good idea for a very large file, but for this particular example it works fine.
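A minimal sketch of that idea for this exact example (data.txt and the inserted values come from the question):
using System;
using System.Collections.Generic;
using System.IO;

var lines = new List<string>(File.ReadAllLines("data.txt"));

// Insert the missing values just before "six", then rewrite the whole file.
int index = lines.IndexOf("six");
if (index >= 0)
    lines.InsertRange(index, new[] { "four", "five" });

File.WriteAllLines("data.txt", lines);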

Big strings: System.OutOfMemoryException

var fileList = Directory.GetFiles("./", "split*.dat");
int fileCount = fileList.Length;
int i = 0;
foreach (string path in fileList)
{
    string[] contents = File.ReadAllLines(path); // OutOfMemoryException
    Array.Sort(contents);
    string newpath = path.Replace("split", "sorted");
    File.WriteAllLines(newpath, contents);
    File.Delete(path);
    contents = null;
    GC.Collect();
    SortChunksProgressChanged(this, (double)i / fileCount);
    i++;
}
And for a file that consists of ~20-30 big lines (every line ~20 MB) I get an OutOfMemoryException when I call ReadAllLines. Why is this exception raised, and how do I fix it?
P.S. I use Mono on macOS.
You should always be very careful about performing operations with potentially unbounded results. In your case reading a file. As you mention, the file size and or line length is unbounded.
The answer lies in reading 'enough' of a line to sort, then skipping characters until the next line and reading the next 'enough'. You probably want to build a line-index lookup so that when you reach an ambiguous sort order you can go back for more data from the line (seek to a file position). When you go back, you only need to read the next sortable chunk to disambiguate the conflicting lines.
You may need to think about the file encoding, don't go straight to bytes unless you know it is one byte per char.
The built-in sort is not as fast as you'd like.
Side Note:
If you call GC.* you've probably done it wrong
setting contents = null does not help you
If you are using a foreach and maintaining the index then you may be better with a for(int i...) for readability
Okay, let me give you a hint to help you with your homework. Loading the complete file into memory will, as you know, not work, because it is ruled out by a precondition of the assignment. You need to find a way to lazily load the data from disk as you go and throw data away as soon as possible. Because single lines could be too big, you will have to do this one char at a time.
Try creating a class that represents an abstraction over a line, for instance by wrapping the starting index and ending index of that line. When you let this class implement IComparable<T> it allows you to sort that line with other lines. Again, the trick is to be able to read characters from the file one at a time. You will need to work with Streams (File.Open) directly.
When you do this, you will be able to write your application code like this:
List<FileLine> lines = GetLines("fileToSort.dat");
lines.Sort();
foreach (var line in lines)
{
    line.AppendToFile("sortedFile.dat");
}
Your task will be to implement GetLines(string path) and create the FileLine class.
Note that I assume that the actual number of lines will be small enough that the List<FileLine> will fit into memory (which means an approximate maximum of 40,000,000 lines). If the number of lines can be higher, you would need an even more flexible approach, but since you are talking about 20 to 30 lines, this should not be a problem.
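To make the shape of that concrete, here is a rough, deliberately naive skeleton of what FileLine and GetLines could look like, not a full solution to the assignment. It stores only byte offsets, streams bytes from disk for comparisons, and assumes a single-byte encoding with '\n' line endings; in this layout the caller would use FileLine.GetLines(...):
using System;
using System.Collections.Generic;
using System.IO;

class FileLine : IComparable<FileLine>
{
    private readonly string path;
    private readonly long start;
    private readonly long length;

    public FileLine(string path, long start, long length)
    {
        this.path = path;
        this.start = start;
        this.length = length;
    }

    public int CompareTo(FileLine other)
    {
        // Compare the two lines byte by byte, reading straight from disk.
        using var a = File.OpenRead(path);
        using var b = File.OpenRead(other.path);
        a.Position = start;
        b.Position = other.start;
        long shared = Math.Min(length, other.length);
        for (long i = 0; i < shared; i++)
        {
            int diff = a.ReadByte() - b.ReadByte();
            if (diff != 0) return diff;
        }
        return length.CompareTo(other.length);
    }

    public void AppendToFile(string destination)
    {
        // Copy this line's bytes to the output file, followed by a newline.
        using var src = File.OpenRead(path);
        using var dst = new FileStream(destination, FileMode.Append);
        src.Position = start;
        for (long i = 0; i < length; i++)
            dst.WriteByte((byte)src.ReadByte());
        dst.WriteByte((byte)'\n');
    }

    public static List<FileLine> GetLines(string path)
    {
        // Scan the file once, recording the offset and length of each line.
        var lines = new List<FileLine>();
        using var stream = File.OpenRead(path);
        long lineStart = 0, position = 0;
        int b;
        while ((b = stream.ReadByte()) != -1)
        {
            position++;
            if (b == '\n')
            {
                lines.Add(new FileLine(path, lineStart, position - lineStart - 1));
                lineStart = position;
            }
        }
        if (position > lineStart)   // last line without a trailing '\n'
            lines.Add(new FileLine(path, lineStart, position - lineStart));
        return lines;
    }
}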
Basically your approach is bull. You are violating a constraint of the homework you were given, and this constraint was put there to make you think more.
As you said:
I must implement external sort and show my teacher that it works for files bigger than my RAM
OK, so how do you think you will ever read the file in? ;) This is there on purpose. ReadAllLines does NOT implement an incremental external sort. As a result, it blows up.

C# Saving "X" times into one .txt file without overwriting last string

Well, now I have a new problem.
I'm writing code in C#.
I want to save the text from textBoxName into a group.txt file each time I enter a string into the textbox and click the save button. It should save in this order (if it's possible to sort it A-Z, that would be great):
1. Petar Milutinovic
2. Ljiljana Milutinovic
3. Stefan Milutinovic
4. ... etc
I can't get it to work; I tried to use techniques from my first question, and there's no solution yet :(
This is an easy one I guess, but I'm still a beginner and I need this badly...
Try to tackle this from a top-down approach. Write out what should happen, because it's not obvious from your question.
Example:
User enters a value in a (single-line?) textbox
User clicks Save
One new line is appended to the end of a file, with the contents of the textbox in step 1
Note: each line is prefixed with a line number, in the form "X. Sample" where X is the line number and Sample is the text from the textbox.
Is the above accurate?
(If you just want to add a line to a text file, see http://msdn.microsoft.com/en-us/library/ms143356.aspx - File.AppendAllText(filename, myTextBox.Text + Environment.NewLine); may be what you want)
Here's a simple little routine you can use to read, sort, and write the file. There are loads of ways this can be done, mine probably isn't even the best. Even now I'm thinking "I could have written that using a FileStream and done the iteration for counting then", but they're micro-optimizations that can be done later if you have performance issues with multi-megabyte files.
public static void AddUserToGroup(string userName)
{
    // Read the users from the file
    List<string> users = File.ReadAllLines("group.txt").ToList();
    // Strip out the index number
    users = users.Select(u => u.Substring(u.IndexOf(". ") + 2)).ToList();
    users.Add(userName); // Add the new user
    users.Sort((x, y) => x.CompareTo(y)); // Sort
    // Reallocate the number
    for (int i = 0; i < users.Count; i++)
    {
        users[i] = (i + 1).ToString() + ". " + users[i];
    }
    // Write to the file again
    File.WriteAllLines("group.txt", users);
}
If you need the file to be sorted every time a new line is added, you'll either have to load the file into a list, add the line, and sort it, or use some sort of search (I'd recommend a binary search) to determine where the new line belongs and insert it accordingly. The second approach doesn't have many advantages, though, as you basically have to rewrite the entire file in order to insert a line - it only saves you time in the best case scenario, which occurs when the line to be inserted falls at the end of the file. Additionally, the second method is a bit lighter on the processor, as you aren't attempting to sort every line - for small files however, the difference will be unnoticeable.
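As a sketch of the binary-search variant, reusing the group.txt layout from the routine above (it assumes the file already exists and is sorted with the same comparer; the sample name in the usage line comes from the question, and C# 9 top-level statements are assumed):
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

InsertSorted("Petar Milutinovic");   // example usage with a name from the question

static void InsertSorted(string userName)
{
    // Strip the "X. " prefixes so only the names are compared.
    List<string> users = File.ReadAllLines("group.txt")
        .Select(u => u.Substring(u.IndexOf(". ") + 2))
        .ToList();

    int index = users.BinarySearch(userName, StringComparer.Ordinal);
    if (index < 0) index = ~index;   // a negative result encodes the insertion point
    users.Insert(index, userName);

    // Re-apply the line numbers and rewrite the whole file.
    File.WriteAllLines("group.txt",
        users.Select((u, i) => $"{i + 1}. {u}"));
}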
