I know that normally you would use File.ReadAllLines, but I'm trying to do it with an uploaded file. Can I somehow put it into a temporary location, or read it from memory?
Is this a string, a Stream, or what? Either way, you want a TextReader - the question is simply StringReader vs StreamReader. Once you have that, I would do something like:
public static IEnumerable<string> ReadLines(TextReader reader) {
    string line;
    while ((line = reader.ReadLine()) != null) yield return line;
}
Then, with whichever reader, I can either use:
foreach (var line in ReadLines(reader)) {
    // note: non-buffered - i.e. more memory-efficient
}
or:
string[] lines = ReadLines(reader).ToArray();
// note: buffered - all read into memory at once (less memory efficient)
i.e. if it is a Stream you are reading from:
using (var reader = new StreamReader(inputStream)) {
    foreach (var line in ReadLines(reader)) {
        // do something fun and interesting
    }
}
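And if the upload has already been read into a string in memory, a StringReader works the same way with the ReadLines method above. A minimal sketch, assuming a hypothetical uploadedText variable holding the file contents:

using (var reader = new StringReader(uploadedText)) { // uploadedText: hypothetical string with the uploaded file's contents
    foreach (var line in ReadLines(reader)) {
        // process each line without ever touching disk
    }
}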
We can read a file either by using StreamReader or by using File.ReadAllLines. For example, I want to load each line into a List or string[] for further manipulation of each line.
string[] lines = File.ReadAllLines(@"C:\file.txt");
foreach (string line in lines)
{
    //DoSomething(line);
}
or
using (StreamReader reader = new StreamReader("file.txt"))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        //DoSomething(line); or //save line into List<string>
    }
}
//if list is created loop through list here
The application comes across text files of different sizes, which could occasionally grow from a few KBs to several MBs. My question is: which one is the preferred way, and why should one be preferred over the other?
If you want to process each line of a text file without loading the entire file into memory, the best approach is something like this:
foreach (var line in File.ReadLines("Filename"))
{
    // ...process line.
}
This avoids loading the entire file, and uses an existing .NET function to do so.
However, if for some reason you need to store all the strings in an array, you're best off just using File.ReadAllLines(). But if you are only going to use foreach to access the data, then use File.ReadLines().
Microsoft uses a StreamReader in File.ReadAllLines:
private static String[] InternalReadAllLines(String path, Encoding encoding)
{
    Contract.Requires(path != null);
    Contract.Requires(encoding != null);
    Contract.Requires(path.Length != 0);

    String line;
    List<String> lines = new List<String>();

    using (StreamReader sr = new StreamReader(path, encoding))
        while ((line = sr.ReadLine()) != null)
            lines.Add(line);

    return lines.ToArray();
}
StreamReader reads the file line by line, so it consumes less memory. File.ReadAllLines, by contrast, reads all lines at once and stores them in a string[], so it consumes more memory; with a large enough file that can produce an out-of-memory error (a particular risk given the address-space limit of a 32-bit OS).
So, for bigger files StreamReader will be more efficient.
So, let's say I have a text file with 20 lines, each line containing different text. I want to have a string that holds the first line, but when I call NextLine(); I want it to hold the next line. I tried this, but it doesn't seem to work:
string CurrentLine;
int LastLineNumber;

void NextLine()
{
    System.IO.StreamReader file = new System.IO.StreamReader("c:\\test.txt");
    CurrentLine = file.ReadLine(LastLineNumber + 1);
    LastLineNumber++;
}
How would I be able to do this? Thanks in advance.
In general, it would be better if you could design this in a way that leaves your file open, rather than reopening the file each time.
If that is not practical, you'll need to call ReadLine multiple times:
string CurrentLine;
int LastLineNumber;

void NextLine()
{
    // using will make sure the file is closed
    using (System.IO.StreamReader file = new System.IO.StreamReader("c:\\test.txt"))
    {
        // Skip lines
        for (int i = 0; i < LastLineNumber; ++i)
            file.ReadLine();

        // Store your line
        CurrentLine = file.ReadLine();
        LastLineNumber++;
    }
}
Note that this can be simplified via File.ReadLines:
void NextLine()
{
    var lines = File.ReadLines("C:\\test.txt");
    CurrentLine = lines.Skip(LastLineNumber).First();
    LastLineNumber++;
}
One simple call should do it:
var fileLines = System.IO.File.ReadAllLines(fileName);
You will want to validate that the file exists, and of course you still need to watch for blank lines or invalid values, but that should give you the basics. To loop over the file you can use the following:
foreach (var singleLine in fileLines) {
    // process "singleLine" here
}
One more note - you won't want to do this with large files, since it reads everything into memory.
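For larger files, a streaming variant of the same loop (a sketch reusing the fileName variable from above) avoids holding the whole file in memory:

foreach (var singleLine in System.IO.File.ReadLines(fileName))
{
    // lines are yielded one at a time, so the whole file is never in memory at once
}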
Well, if you really don't mind re-opening the file each time, you can use:
CurrentLine = File.ReadLines("c:\\test.txt").Skip(LastLineNumber).First();
LastLineNumber++;
However, I'd advise you to just read the whole thing in one go using File.ReadAllLines, or perhaps File.ReadLines(...).ToList().
The ReadLine method already reads the next line in the StreamReader, you don't need the counter, or your custom function for that matter. Just keep reading until you reach your 20 lines or until the file ends.
You can't pass a line number to ReadLine and expect it to find that particular line. If you look at the ReadLine documentation, you'll see it doesn't accept any parameters.
public override string ReadLine()
When working with files, you must treat them as streams of data. Every time you open the file, you start at the very first byte/character of the file.
var reader = new StreamReader("c:\\test.txt"); // Starts at byte/character 0
You have to keep the stream open if you want to read more lines.
using (var reader = new StreamReader("c:\\test.txt"))
{
    string line1 = reader.ReadLine();
    string line2 = reader.ReadLine();
    string line3 = reader.ReadLine();
    // etc..
}
If you really want to write a method NextLine, then you need to store the created StreamReader object somewhere and use that every time. Something like this:
public class MyClass : IDisposable
{
    StreamReader reader;

    public MyClass(string path)
    {
        this.reader = new StreamReader(path);
    }

    public string NextLine()
    {
        return this.reader.ReadLine();
    }

    public void Dispose()
    {
        reader.Dispose();
    }
}
But I suggest you either loop through the stream:
using (var reader = new StreamReader("c:\\test.txt"))
{
    while (some_condition)
    {
        string line = reader.ReadLine();
        // Do something
    }
}
Or get all the lines at once using the File class's ReadAllLines method:
string[] lines = System.IO.File.ReadAllLines("c:\\test.txt");

for (int i = 0; i < lines.Length; i++)
{
    string line = lines[i];
    // Do something
}
I recognize that I cannot hold my complete data in memory, so I want to stream parts into memory, work with them, and then write them back.
yield is a very useful keyword; it saves you the whole business of writing an enumerator and tracking an index yourself. But when I want to pass an IEnumerable around via yield and write it back to a collection/file, do I need to use the enumerator concept directly, or is there something like the opposite of yield? I heard about Rx, but I'm not clear on whether it solves my problem.
public static IEnumerable<string> ReadFile()
{
    string line;
    var reader = new System.IO.StreamReader(@"c:\temp\test.txt");
    while ((line = reader.ReadLine()) != null)
    {
        yield return line;
    }
    reader.Close();
}
public static void StreamFile()
{
    foreach (string line in ReadFile())
    {
        WriteFile(line);
    }
}

public static void WriteFile(string line)
{
    // how to save the state, or observe a collection/stream???
    var writer = new System.IO.StreamWriter("c:\\temp\\test.txt");
    writer.WriteLine(line);
    writer.Close();
}
In your case, you can pass the IEnumerable<string> directly to WriteFile:
public static void WriteFile(IEnumerable<string> lines)
{
    using (var writer = new System.IO.StreamWriter("c:\\temp\\test.txt"))
    {
        foreach (var line in lines)
            writer.WriteLine(line);
    }
}
Since the input is streamed through an IEnumerable<T>, the data is never held in memory all at once.
Note that, in this case, you could just use File.ReadLines to perform the read, since it already streams the results back via an IEnumerable<string>. With File.WriteAllLines, your code could be reduced to the following (though you could also just use File.Copy):
File.WriteAllLines(outputFile, File.ReadLines(inputFile));
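For a straight copy with no per-line modification, the File.Copy alternative mentioned above would simply be:

File.Copy(inputFile, outputFile, true); // third argument: overwrite the destination if it exists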
I can currently remove the last line of a text file using:
var lines = System.IO.File.ReadAllLines("test.txt");
System.IO.File.WriteAllLines("test.txt", lines.Take(lines.Length - 1).ToArray());
However, how is it possible to instead remove the beginning of the text file?
Instead of lines.Take, you can use lines.Skip to truncate at the beginning, like:

var lines = File.ReadAllLines("test.txt");
File.WriteAllLines("test.txt", lines.Skip(1).ToArray());

Note, though, that the technique used (read all text and write everything back) is very inefficient.
About the efficient way: the inefficiency comes from the necessity to read the whole file into memory. The alternative would be to seek within a stream, copy the remainder to another output file, delete the original, and rename the new one. That would be equally fast and yet consume much less memory. Truncating a file at the end is much easier: you can just find the truncation position and call FileStream.SetLength().
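A minimal sketch of that copy-and-rename approach, assuming the part to remove is exactly the first line and using a hypothetical temp-file path derived from the original:

static void RemoveFirstLine(string path)
{
    string tempPath = path + ".tmp"; // hypothetical temp file next to the original

    using (var reader = new StreamReader(path))
    using (var writer = new StreamWriter(tempPath))
    {
        reader.ReadLine(); // skip the first line
        string line;
        while ((line = reader.ReadLine()) != null)
            writer.WriteLine(line); // copy the remainder, one line at a time
    }

    File.Delete(path);         // delete the original
    File.Move(tempPath, path); // rename the new file into place
}

Only one line is held in memory at a time, at the cost of writing a second file.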
Here is an alternative:
using (var stream = File.OpenRead("C:\\yourfile"))
{
    var items = new LinkedList<string>();

    using (var reader = new StreamReader(stream))
    {
        reader.ReadLine(); // skip one line

        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // it's far better to do the actual processing here
            items.AddLast(line);
        }
    }
}
Update
If you need an IEnumerable<string> and don't want to waste memory you could do something like this:
public static IEnumerable<string> GetFileLines(string filename)
{
    using (var stream = File.OpenRead(filename))
    {
        using (var reader = new StreamReader(stream))
        {
            reader.ReadLine(); // skip one line

            string line;
            while ((line = reader.ReadLine()) != null)
            {
                yield return line;
            }
        }
    }
}

static void Main(string[] args)
{
    foreach (var line in GetFileLines("C:\\yourfile.txt"))
    {
        // do something with the line here.
    }
}
var lines = System.IO.File.ReadAllLines("test.txt");
System.IO.File.WriteAllLines("test.txt", lines.Skip(1).ToArray());
Skip eliminates the given number of elements from the beginning of the sequence; Take keeps the given number of elements from the beginning and eliminates the rest, so Take(lines.Length - 1) drops only the last line.
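To make the difference concrete, a quick illustrative sketch (a hypothetical four-line file, using System.Linq):

var lines = new[] { "line1", "line2", "line3", "line4" };

var withoutFirst = lines.Skip(1);                 // "line2", "line3", "line4"
var withoutLast  = lines.Take(lines.Length - 1);  // "line1", "line2", "line3"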
To remove the first line from a text file:
// requires: using System.Text.RegularExpressions;
System.IO.StreamReader reader = new System.IO.StreamReader(filePath);
string data = reader.ReadToEnd();
reader.Close();

// as written, this strips from a '<' through the end of that line
data = Regex.Replace(data, "<.*\n", "");

System.IO.StreamWriter writer = new System.IO.StreamWriter(filePath, false);
writer.Write(data);
writer.Close();
You can also do it in one line:

File.WriteAllLines(originalFilePath, File.ReadAllLines(originalFilePath).Skip(1));

This assumes you are passing your file path as a parameter to the function.
In C#, I'm reading a file of moderate size (100 KB ~ 1 MB), modifying some parts of the content, and finally writing to a different file. All contents are text. Modification is done on string objects with string operations. My current approach is:

1. Read each line from the original file using a StreamReader.
2. Open a StringBuilder for the contents of the new file.
3. Modify each string object and call AppendLine on the StringBuilder (until the end of the file).
4. Open a new StreamWriter, and write the StringBuilder to the write stream.
However, I've found that StreamWriter.Write seemed to truncate at 32768 bytes (2^15), but the length of the StringBuilder is greater than that. I could write a simple loop to guarantee the entire string is written to the file. But I'm wondering what would be the most efficient way in C# to do this task?
To summarize, I'd like to modify only some parts of a text file and write the result to a different file. But the text file size could be larger than 32768 bytes.
== Answer == I'm sorry for the confusion! It was just that I didn't call Flush. StreamWriter.Write does not have any such size (e.g., 2^15) limitation.
StreamWriter.Write does not truncate the string and has no such limitation.
Internally it uses String.CopyTo, which in turn uses unsafe code (via fixed) to copy chars, so it is very efficient.
The problem is most likely related to not closing the writer. See http://msdn.microsoft.com/en-us/library/system.io.streamwriter.flush.aspx.
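For instance, a minimal sketch where the using block guarantees the writer is flushed and closed (sb and outputPath are hypothetical placeholders for the question's StringBuilder and destination file):

var sb = new StringBuilder();
// ... append modified lines to sb ...

using (var writer = new StreamWriter(outputPath)) // outputPath: hypothetical destination path
{
    writer.Write(sb.ToString());
} // disposing the writer flushes any buffered data to disk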
But I would suggest not loading the whole file in memory if that can be avoided.
Can you try this:
void Test()
{
    using (var inputFile = File.OpenText(@"c:\in.txt"))
    {
        using (var outputFile = File.CreateText(@"c:\out.txt"))
        {
            string current;
            while ((current = inputFile.ReadLine()) != null)
            {
                outputFile.WriteLine(Process(current));
            }
        }
    }
}

string Process(string current)
{
    return current.ToLower();
}
It avoids having the full file loaded in memory by processing it line by line and writing each line directly.
Well, that entirely depends on what you want to modify. If your modifications of one part of the text file depend on another part, you obviously need to have both parts in memory. If, however, you only need to modify the text file line by line, then use something like this:
using (StreamReader sr = new StreamReader(@"test.txt"))
{
    using (StreamWriter sw = new StreamWriter(@"modifiedtest.txt"))
    {
        while (!sr.EndOfStream)
        {
            string line = sr.ReadLine();
            // do some modifications
            sw.WriteLine(line);
            sw.Flush(); // force line to be written to disk
        }
    }
}
Instead of running through the whole document, I would use a regex to find what you are looking for. Sample:
public List<string> GetAllProfiles()
{
    List<string> profileNames = new List<string>();

    using (StreamReader reader = new StreamReader(_folderLocation + "profiles.pg"))
    {
        string profiles = reader.ReadToEnd();
        var regex = new Regex("\nname=([^\r]{0,})", RegexOptions.IgnoreCase);
        var regexMatchs = regex.Matches(profiles);
        profileNames.AddRange(from Match regexMatch in regexMatchs select regexMatch.Groups[1].Value);
    }

    return profileNames;
}