This question already has answers here:
Read line by line a large text file and search for a string
(2 answers)
Closed 7 years ago.
I have huge files in C# (more than 300 MB). I need an efficient way to read the file line by line, because reading it currently takes more than 30 minutes while the target time is around 3 minutes. I tried File.ReadAllBytes, which reads the file successfully and very fast and loads it into a string, but after that it takes a very long time to process the string line by line. Is there a better or faster way to do this?
Thanks in advance.
You can use File.ReadLines; it will enumerate through the lines of the file:
var lines = File.ReadLines(path);
foreach(var line in lines)
{
// do your logic here
}
It will not load the whole file up front; it reads lines lazily as you loop over them, so it's a better way to read bigger files than loading everything at once.
MSDN says in the description of File.ReadLines:
Remarks: The ReadLines and ReadAllLines methods differ as follows: When you use ReadLines, you can start enumerating the collection of strings before the whole collection is returned; when you use ReadAllLines, you must wait for the whole array of strings to be returned before you can access the array. Therefore, when you are working with very large files, ReadLines can be more efficient.
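Since the linked duplicate is about searching a large file for a string, here is a minimal sketch of that pattern (the path and search term are hypothetical):
using System;
using System.IO;

class Program
{
    static void Main()
    {
        string path = "huge-log.txt"; // hypothetical file
        string needle = "ERROR";      // hypothetical search term

        int lineNumber = 0;
        // File.ReadLines streams one line at a time, so memory use stays
        // flat even for files of several hundred MB.
        foreach (var line in File.ReadLines(path))
        {
            lineNumber++;
            if (line.Contains(needle))
                Console.WriteLine($"Match at line {lineNumber}: {line}");
        }
    }
}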
Related
I found this post on selecting a range from an array, and have to use the LINQ option:
Selecting a range of items inside an array in C#
Ultimately, I'm trying to get the last four lines from some text file. After I've read in and cleaned the lines of unwanted characters and empty lines, I have an array with all of the lines. I'm using the following to do so:
string[] allLines = GetEachLine(results);
string[] lastFourLines = allLines.Skip(allLines.Length - 4).Take(4).ToArray();
This works fine, but I'm wondering if I could somehow skip assigning to the allLines variable altogether. Such as:
string[] lastFourLines = GetEachLine(results).Skip(returnedArrayLength - 4).Take(4).ToArray();
It would be better to change GetEachLine and the code preceding it (however results is computed) to use IEnumerable<T> and avoid reading the entire file into an in-memory array just for the last four lines (unless you use all of results for something else) - consider using File.ReadLines.
However, if you are using .NET Core 2.0 or greater, you can use Enumerable.TakeLast to efficiently return the last four lines:
var lastFourLines = GetEachLine(results).TakeLast(4);
If GetEachLine() returns string[], then that should work fine, though null checking may be needed.
As you chain more operators, you may want to use line breaks to increase readability:
string[] lastFourLines = GetEachLine(results)
.Skip(allLines.Length - 4)
.Take(4)
.ToArray();
allLines.Length won't exist unless you still have line 1 from your question; you can avoid calling GetEachLine() twice by using TakeLast():
string[] lastFourLines = GetEachLine(results)
.TakeLast(4)
.ToArray();
If you are looking to efficiently retrieve the last N (filtered) lines of a large file, you really need to start at the point where you are reading the file contents.
Consider a 1GB log file containing 10M records, where you only want the last few lines. Ideally, you would read the last couple of KB, then extract lines by searching for line breaks from the end, yielding each one from an iterator. If you run out of data, read the preceding block. Continue only as long as the consumer requests more values from the iterator.
Offhand, I don't know a built-in way to do this, and coding this from scratch could get pretty involved. Luckily, a search turned up this similar question having a highly rated answer.
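As a rough illustration of the idea - a sketch only, which assumes the last N lines fit inside a single fixed-size tail block and the file is UTF-8:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

static class TailReader
{
    // Simplified sketch: read one block from the end of the file and return
    // the last `count` non-empty lines found in it. A real implementation
    // would keep reading earlier blocks until enough lines were found.
    public static IEnumerable<string> TailLines(string path, int count, int blockSize = 64 * 1024)
    {
        using var fs = new FileStream(path, FileMode.Open, FileAccess.Read);
        long start = Math.Max(0, fs.Length - blockSize);
        fs.Seek(start, SeekOrigin.Begin);

        var buffer = new byte[fs.Length - start];
        int read = 0;
        while (read < buffer.Length)
        {
            int n = fs.Read(buffer, read, buffer.Length - read);
            if (n == 0) break; // unexpected end of file
            read += n;
        }

        var lines = Encoding.UTF8.GetString(buffer, 0, read)
            .Split('\n')
            .Select(l => l.TrimEnd('\r'))
            .Where(l => l.Length > 0)
            .ToList();

        return lines.Skip(Math.Max(0, lines.Count - count));
    }
}
Usage would be something like var lastFour = TailReader.TailLines("big.log", 4); (the file name is hypothetical).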
This question already has answers here:
How best to read a File into List<string>
(10 answers)
Closed 5 years ago.
I'm trying to initialize a List<string> with some data from a file. The file is a list of words separated by carriage returns, so currently I am doing:
var wordList = new List<string>(textFromFile.Split( new[] {"\r\n", "\r", "\n"}, StringSplitOptions.None ) )
but for the size of text files I'm dealing with (172,888 lines in one of the files!) this is very slow. Is there a better way to do this? The text file doesn't have to be formatted the way it currently is; I could parse it and write it out in a different format if there is a better method of storing the data. In C++ I would be thinking of binary data and a memcpy, but I don't think there is a similar solution in C#?
If it's relevant, the code is in a Unity app, so it is limited to the early .NET capabilities of their Mono version.
You might want to use File.ReadAllLines to read the file; it does exactly what you're looking for, and it should be well optimized.
var wordList = File.ReadAllLines("yourFileSrc");
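Note that File.ReadAllLines returns a string[]; since the question asks for a List<string>, you can append .ToList() (this requires using System.Linq;):
var wordList = File.ReadAllLines("yourFileSrc").ToList();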
To improve performance even more, you may want to split your file into N files and process them in parallel using the TPL (Task Parallel Library), or use the .AsParallel method (as kindly suggested by Evk).
You can find more info about PLINQ here.
Update
For parsing a large string, you might want to split the string first (without parsing it) into a number of smaller strings and then process them in parallel, as in the sketch below.
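A minimal sketch of that idea using PLINQ; the per-line work shown here (a Trim) is just a hypothetical stand-in for real parsing:
using System;
using System.Linq;

class Program
{
    static void Main()
    {
        // Hypothetical input: one big string of newline-separated words.
        string textFromFile = "alpha\nbeta\ngamma\ndelta";

        // Split cheaply first, then do the heavier per-line work in parallel.
        var words = textFromFile
            .Split(new[] { "\r\n", "\r", "\n" }, StringSplitOptions.RemoveEmptyEntries)
            .AsParallel()
            .AsOrdered()                 // keep the original line order
            .Select(line => line.Trim()) // stand-in for the real parsing step
            .ToList();

        Console.WriteLine(string.Join(", ", words));
    }
}
Be aware that older Unity/Mono runtimes (mentioned in the question) may not support PLINQ, so measure before committing to this.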
I have the problem of reading a single line from a large file encoded in UTF-8. Lines in the file have a constant length (in characters).
The file has on average 300k lines. Time is the main constraint, so I want to do it the fastest way possible.
I've tried LINQ:
File.ReadLines("file.txt").Skip(noOfLines).Take(1).First();
But the time is not satisfactory enough.
My biggest hope was using a stream and setting its position to the desired line start, but the problem is that the lines' sizes in bytes differ.
Any ideas, how to do it?
Now this is where you don't want to use LINQ (-:
You actually want to find the nth occurrence of a newline in the file and read everything up to the next newline, for example:
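Here is a minimal sketch of that approach using a plain FileStream (the path and line index are hypothetical). It scans raw bytes, which is safe because in UTF-8 the byte 0x0A only ever encodes '\n':
using System;
using System.IO;
using System.Text;

class Program
{
    // Returns the zero-based line `lineIndex` by scanning the bytes for '\n'.
    static string ReadLineAt(string path, int lineIndex)
    {
        using var fs = new FileStream(path, FileMode.Open, FileAccess.Read);
        int newlines = 0;
        int b;

        // Skip past `lineIndex` newlines.
        while (newlines < lineIndex && (b = fs.ReadByte()) != -1)
        {
            if (b == '\n') newlines++;
        }

        // Collect bytes until the next newline (or end of file).
        using var ms = new MemoryStream();
        while ((b = fs.ReadByte()) != -1 && b != '\n')
        {
            ms.WriteByte((byte)b);
        }
        return Encoding.UTF8.GetString(ms.ToArray()).TrimEnd('\r');
    }

    static void Main()
    {
        Console.WriteLine(ReadLineAt("file.txt", 150000)); // hypothetical
    }
}
FileStream buffers reads internally, so the byte-by-byte loop is not as slow as it looks; for maximum speed you could scan larger blocks yourself.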
You probably want to check out this documentation on memory mapped files as well:
https://msdn.microsoft.com/en-us/library/system.io.memorymappedfiles.memorymappedfile(v=vs.110).aspx
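Opening the file as a memory-mapped view is only a few lines - a sketch, assuming the same byte-scanning is then done on the view stream:
using System.IO.MemoryMappedFiles;

using var mmf = MemoryMappedFile.CreateFromFile("file.txt", System.IO.FileMode.Open);
using var stream = mmf.CreateViewStream();
// `stream` is a Stream over the mapped file; scan it for '\n' bytes
// exactly as in the FileStream sketch above.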
There is also a post comparing different access methods
http://cc.davelozinski.com/c-sharp/fastest-way-to-read-text-files
This question already has answers here:
Read and Write to File at the same time
(5 answers)
Closed 7 years ago.
I want to know how to read a file and write to it at the same time. For example:
file content:
Johny
tony
jack
Ahmad
Johny
string line;
line = file.ReadLine();
if (line == "Johny")
{
    line = "Sam";
}
You have to bear in mind that "Sam" and "Johny" are not the same length. What will the file do with the leftover bytes? Worse, what if you replaced "Sam" with a longer name like "Johnny"? You would overwrite letters in the next record.
Fixed-width records can address this, but for a small file, I would just read everything into a list, then rewrite the whole list to the file again, as sketched below. A different approach would be to set it all up in a database and let the database handle the reads and writes while you handle the business logic.
But just writing new data to a file on the fly as you are reading it is probably going to be more trouble than it's worth.
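A minimal sketch of the read-everything-then-rewrite approach (the file name is hypothetical):
using System.IO;
using System.Linq;

class Program
{
    static void Main()
    {
        // Read all lines, replace matching records, and write the file back out.
        var lines = File.ReadAllLines("names.txt"); // hypothetical path
        var updated = lines
            .Select(line => line == "Johny" ? "Sam" : line)
            .ToArray();
        File.WriteAllLines("names.txt", updated);
    }
}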
Can anyone let me know the fastest way of showing a range of lines in a file of 5 GB size? For example, the file has line numbers as one of its columns, and say the number of lines in the file is 1 million; I have a start line number and an end line number. Say I want to read the 25th line to the 89th line of the large file: rather than reading each and every line, is there any fast way of reading the specific lines from the 25th to the 89th without reading the whole file from the beginning in C#?
In short, no. How can you possibly know where the carriage returns/line numbers are before you actually read them?
To avoid memory issues you could:
File.ReadLines(path)
    .SkipWhile(line => someCondition)
    .TakeWhile(line => someOtherCondition);
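For the concrete range in the question (lines 25 through 89), Skip and Take express the same idea directly:
var lines = File.ReadLines(path)
    .Skip(24)   // skip the first 24 lines
    .Take(65);  // take lines 25..89 inclusive (89 - 25 + 1 = 65)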
5GB is a huge amount of data to sift through without building some sort of index. I think you've stumbled upon a case where loading your data into a database and adding the appropriate indexes might serve you best.