Reading a specific line from a large data .txt file - C#

I have a .txt file in which X,Y coordinates are saved. Each line contains different coordinates, so each line may have a different size from the previous/next line.
The file is too large to open with the ReadAllLines function (it can be larger than 10-50 GB). I have seen many answers suggesting File.ReadLines(filename).Skip(n).Take(n).ToList();
My questions are: will this method load the whole file into RAM, or will it load only the lines selected by Take(n)?
If there is no way to access a specific line in a txt file directly, would it be a good idea to transfer the data into a database table where access is easier?
Thanks in advance,

The Microsoft documentation says: "The ReadLines and ReadAllLines methods differ as follows: When you use ReadLines, you can start enumerating the collection of strings before the whole collection is returned."
I think your suggestion will load only the requested lines into RAM.
Test it and use the memory profiler to verify (see the Microsoft documentation).
Maybe you can also remove some overhead from this big file ...
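A minimal sketch of that streaming behavior, assuming a hypothetical coordinates.txt; File.ReadLines is lazy, so Skip(n) reads and discards the first n lines one at a time but never holds them, and Take stops reading once it has enough:

using System;
using System.IO;
using System.Linq;

class LineWindowDemo
{
    static void Main()
    {
        // Hypothetical file name; the file could be tens of GB.
        const string path = "coordinates.txt";

        // Skip(n) must still read past the first n lines (text files have no
        // line index), but each skipped line is discarded immediately.
        // Take(100) stops the enumeration, so the rest of the file is untouched.
        var window = File.ReadLines(path).Skip(1000000).Take(100).ToList();

        foreach (var line in window)
            Console.WriteLine(line);
    }
}

Only the 100 lines kept by Take end up in the list; peak memory is one line at a time plus that small list. If you need repeated random access by line number, the database idea from the question (or a one-time index of byte offsets per line) avoids re-scanning from the start of the file on every lookup.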

Related

C# How to iterate through a list of files without using Directory.GetFiles() method

I have several security cameras that upload pictures to my ftp server. Some of these cams conveniently create subfolders at the start of a new day in the format "yyyymmdd". This is great and makes it easy to maintain/delete older pictures by a particular day. Other cams aren't so nice and just dump pictures in a giant folder, making deletion difficult.
So I am writing a C# Windows Forms program that goes to a specific folder (the source folder) chosen with FolderBrowserDialog, and I pick a target folder the same way. I was using the standard process of iterating through a file list via a string array filled by the Directory.GetFiles() method. I use each file's creation date to create a subfolder if it doesn't exist, and in either case I move the file to that date-based subfolder. It works great while testing with small numbers of files.
Now I'm ready to test against real data, and I'm concerned that with some folders having thousands of files I'm going to run into memory and other problems. How well can a string array handle such huge volumes of data? Note that one folder has over 28,000 pictures. Can a string array handle such a large number of file names?
My question, then, is how can I iterate through a list of files in a given folder without having to use a string array and the Directory.GetFiles() method? I'm open to any thoughts, though I do want to use C# in a Windows Forms environment. I have an added feature that lets me delete pictures older than a given date instead of moving them.
Many thanks!
You'll be just fine with thousands of file names. You might have a problem with millions, but thousands isn't a big deal for C#. You may have a performance issue just because of how NTFS works, but if so there's nothing you can do about that in C#; it's a problem inherent in the file system.
However, if you really want to pick at this, you can do a little better by using DirectoryInfo.EnumerateFileSystemInfos(). This method has two benefits over GetFiles():
It loads the file name and creation date in one disk access, instead of two.
It allows you to work with an IEnumerable instead of an array, so you only need memory for one file record at a time.
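A minimal sketch of that approach; the yyyymmdd format comes from the question, while the folder paths and class name are hypothetical:

using System.IO;

class SortByCreationDate
{
    static void Main()
    {
        var source = new DirectoryInfo(@"C:\ftp\cam1"); // hypothetical source folder
        var targetRoot = @"C:\ftp\sorted";              // hypothetical target folder

        // EnumerateFileSystemInfos yields one entry at a time instead of
        // materializing a full array, so memory stays flat at 28,000+ files.
        foreach (var info in source.EnumerateFileSystemInfos())
        {
            var file = info as FileInfo;
            if (file == null) continue; // skip subdirectories

            var sub = Path.Combine(targetRoot, file.CreationTime.ToString("yyyyMMdd"));
            Directory.CreateDirectory(sub); // no-op if it already exists
            file.MoveTo(Path.Combine(sub, file.Name));
        }
    }
}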

Write Text to a file at a specific point

I am creating a program that sorts through 500k+ lines of text, and pulls certain strings out to be written in a clean version.
When I get my finished array of new, clean lines to be written to file, I am curious whether there is a way to use threads and tell the code exactly what line number or index to start writing at in the text file.
Effectively, multiple threads would simultaneously write different sections of my text while maintaining the original compiled order.
A simple example: say I wanted to start writing text at the 125923rd line of the text file, regardless of what already exists there, if anything.
Thank you
You cannot write a single line without rewriting the whole file, unless the new line is the same length as the original line. By the way, if I understood you correctly, you want to use multiple threads to write to a single file, but unfortunately that is not possible in your case.
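To illustrate the length constraint, here is a minimal sketch (hypothetical file name and offset) of overwriting bytes in place; this only works when the replacement occupies exactly the same number of bytes as the region it overwrites, and the position is a byte offset, not a line number:

using System.IO;
using System.Text;

class OverwriteInPlace
{
    static void Main()
    {
        // Must be byte-for-byte the same length as the region being replaced.
        var bytes = Encoding.ASCII.GetBytes("NEW TEXT");

        using (var fs = new FileStream("data.txt", FileMode.Open, FileAccess.Write))
        {
            fs.Seek(4096, SeekOrigin.Begin); // hypothetical byte offset
            fs.Write(bytes, 0, bytes.Length);
        }
    }
}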

Handling large text files in C#

Hey there! I need to read large text files up to 100 MB in size. I need to read each line, search for a string, and write the results to a log. What would be the best way of doing this? Should I read each line individually, search it, then move on to the next one?
Allocating a string of up to 200 MB (a 100 MB file roughly doubles in memory as UTF-16) isn't that much at all these days. Just read it all at once and process it.
One option is to use memory mapped files.
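The question also asks about reading each line individually; a minimal sketch of that streaming approach, with a hypothetical file name and search term, where only the current line is held in memory:

using System.IO;

class SearchAndLog
{
    static void Main()
    {
        const string term = "ERROR"; // hypothetical search string

        using (var log = File.AppendText("matches.log"))
        {
            int lineNumber = 0;
            // File.ReadLines streams lazily: one line at a time.
            foreach (var line in File.ReadLines("big.txt"))
            {
                lineNumber++;
                if (line.Contains(term))
                    log.WriteLine("line " + lineNumber + ": " + line);
            }
        }
    }
}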

Reading large text files into a DataGridView and filtering

I need to read line by line from text files (log files from a server), and they are big (about 150-200 MB). I am using StreamReader, and it's great for "little" files like 12 MB, but not for files this big. After some time the file is loaded and shown in my DataGridView, but it's huge in memory. I am using bindingSource.Filter on this DataGridView (a textbox where, as the user types, it filters one column by comparing strings and hides rows that don't match), and with big files that is useless too. So I want to ask you: what is the best solution for me?
I have been looking and found some solutions, but I need help deciding what's best for me and with implementing it (or whether there is something else):
Load the data in the background and show it in real time. I am not really sure how to do that, and I don't know how filtering would work in this solution.
Maybe somehow upgrade StreamReader? Or write my own method for reading lines from the file with binary readers?
I found something about memory-mapped files in .NET 4.0, but I can't use 4.0. Could this feature help?
Thanks for the help
Okay, so I am implementing paging: I read 5k lines of the text file, then after clicking a button the next lines, and so on. I am using BaseStream.Position to save where reading starts, but I would like some function that saves the number of lines instead; mainly, I want a method for starting to read from an exact line, but I can't find anything like that for StreamReader. Is there something like that?
Load the data in the background and show it in real time. I am not really sure how to do that, and I don't know how filtering would work in this solution.
This is no help. It will still consume just as much memory, only on a background thread.
Maybe somehow upgrade StreamReader? Or write my own method for reading lines from the file with binary readers?
Still no help: once you read the whole file into memory, it will, well, consume memory.
I think you get the point. Don't load the whole file into memory. Load only chunks of it. Use paging. You cannot show 200 MB worth of data on a single screen anyway, so only load the portion you need to show on the screen. So basically you need to implement the following function:
public IEnumerable<string> ReadFile(string path, int page, int linesPerPage, out int totalLines)
{
    totalLines = File.ReadLines(path).Count();  // first pass: count lines only
    return File.ReadLines(path)                 // second pass: stream just this page
        .Skip(page * linesPerPage)
        .Take(linesPerPage);
}
The Skip and Take extension methods do the paging work here.
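A hypothetical usage, binding one page at a time to the grid (the log path and dataGridView1 are illustrative):

int totalLines;
// Show page 3 with 5000 lines per page; only this page is materialized.
foreach (var line in ReadFile(@"C:\logs\server.log", 3, 5000, out totalLines))
    dataGridView1.Rows.Add(line);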

Text editing LinkedList vs List of lines

I'm writing a text editor using C#. I know I'm just reinventing the wheel, but it's a learning experience and something I want to do.
Right now I have a basic Text Document using something that resembles a Gap Buffer; however, I have to update my line buffer to hold the start of each line every time an edit is made to the buffer.
I am looking at creating another Text Document for testing, using a list of lines and editing each line instead.
Now the question I have is: what would be the benefits of using a LinkedList vs a standard List?
A linked list of lines will be fast to insert new lines or delete lines, and fast to move down (and up if you have a doubly-linked list) from a specific line by a small number of lines, and fast to move to the start of the document. To move quickly to the end of the document you will also need to store the end of the linked list. It is relatively slow to go to a specific line number though, as you will have to start from the beginning and iterate over the lines, although this shouldn't be a problem unless your documents have very many lines.
An ordinary list is fast to move to a specific line number but slow to add or remove any line except at the end, as all subsequent elements must be shifted (and the whole backing array reallocated when capacity grows) each time a line is inserted or deleted.
I would prefer a linked list over an array based list for the purposes of editing large documents. In either case you may have problems if the document contains any extremely long lines as strings are immutable and changing each character will be slow.
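A small sketch of the difference (illustrative names only): with a node reference in hand, LinkedList<T> inserts in O(1), while List<T>.Insert must shift every element after the index:

using System.Collections.Generic;

class InsertCostDemo
{
    static void Main()
    {
        var linked = new LinkedList<string>(new[] { "line 1", "line 3" });
        var node = linked.First;             // a reference to "line 1"
        linked.AddAfter(node, "line 2");     // O(1): just relinks two nodes

        var list = new List<string> { "line 1", "line 3" };
        list.Insert(1, "line 2");            // O(n): shifts "line 3" and everything after it
    }
}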
I'm using a one-string-per-line array. Arrays are often faster than (or equal to) linked lists for updates because they have much better cache locality, and in the time of a single first-level cache miss you can already move a few dozen pointer items in an array. I would say for everything under 10,000 items, just use an array of pointers.
If your edited texts are small (like hand-written source code files), it is the way to go. A lot of the meta information you need for state-of-the-art text editing splits very well into "beginning of a line" points. Most importantly syntax highlighting.
Also error lines, breakpoints, profiling info, code coverage markers: all of them work on a line. It's the native structure of source code text, and in some cases also of literary text (if you write a word processor rather than a source code editor), but in that case you need to take a paragraph as the unit.
If I ever find the time to redesign my editor, I will add different buffer implementations, because on larger texts the overhead of all the line info data (about 80 bytes per line on a 32-bit system) is significant. Then a gap buffer model is better; it is also much better if you don't have lines at all, for example when displaying binary files in hex mode.
And finally a third buffer model is required when you allow a user to open large files. It's funny to see marketing bullshit (free open source is surprisingly worse here) about unlimited file size editing, and then once you open a 400 MB log file, the whole system becomes unresponsive. You need a buffer model here that does not load the whole file into the buffer first.
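Since gap buffers come up in both the question and this answer, here is a minimal, illustrative character gap buffer sketch (not the poster's implementation): insertion at the cursor is O(1) amortized, and moving the cursor shifts the gap:

using System;

class GapBuffer
{
    char[] buf = new char[16];
    int gapStart = 0, gapEnd = 16;   // [gapStart, gapEnd) is the unused gap

    public int Length { get { return buf.Length - (gapEnd - gapStart); } }

    public void Insert(char c)
    {
        if (gapStart == gapEnd) Grow();  // gap exhausted: reallocate
        buf[gapStart++] = c;             // O(1): fill the gap from the left
    }

    // Move the cursor (gap) so the next Insert lands at logical position pos.
    public void MoveTo(int pos)
    {
        while (pos < gapStart) buf[--gapEnd] = buf[--gapStart]; // shift gap left
        while (pos > gapStart) buf[gapStart++] = buf[gapEnd++]; // shift gap right
    }

    void Grow()
    {
        var bigger = new char[buf.Length * 2];
        Array.Copy(buf, 0, bigger, 0, gapStart);                     // text before the gap
        int tail = buf.Length - gapEnd;
        Array.Copy(buf, gapEnd, bigger, bigger.Length - tail, tail); // text after the gap
        gapEnd = bigger.Length - tail;
        buf = bigger;
    }

    public override string ToString()
    {
        return new string(buf, 0, gapStart) + new string(buf, gapEnd, buf.Length - gapEnd);
    }
}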
