Handling large text files in C#

Hey there! I need to read large text files up to 100 MB in size. I need to read each line, search for a string, and write the results to a log. What would be the best way of doing this? Should I read each line individually, search it, then move on to the next one?

Allocating a string of up to 200 MB isn't that much at all these days. Just read it all at once and process it.
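If you go that route, a rough sketch of the read-it-all-then-scan approach might look like this (the file names and the search term are placeholders, not from the question):

using System;
using System.IO;

class Program
{
    static void Main()
    {
        // Placeholder paths and search term -- substitute your own.
        string inputPath = "big.log";
        string logPath = "matches.log";
        string needle = "ERROR";

        // Read the whole file into memory at once, then scan it line by line.
        string[] lines = File.ReadAllLines(inputPath);

        using (var log = new StreamWriter(logPath))
        {
            for (int i = 0; i < lines.Length; i++)
            {
                if (lines[i].Contains(needle))
                    log.WriteLine($"Line {i + 1}: {lines[i]}");
            }
        }
    }
}

If memory ever does become a concern, File.ReadLines with a foreach gives you the same scan while streaming the file one line at a time.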

One option is to use memory mapped files.
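A rough sketch of that approach, wrapping the mapped view in a StreamReader (the file name and search term are made up for illustration):

using System;
using System.IO;
using System.IO.MemoryMappedFiles;

class MmfSearch
{
    static void Main()
    {
        string path = "big.log";   // placeholder
        string needle = "ERROR";   // placeholder

        using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
        using (var view = mmf.CreateViewStream())
        using (var reader = new StreamReader(view))
        {
            string line;
            int lineNumber = 0;
            while ((line = reader.ReadLine()) != null)
            {
                lineNumber++;
                // The mapped view can be rounded up to a page boundary, so the
                // last line may carry trailing '\0' padding; trim it before matching.
                line = line.TrimEnd('\0');
                if (line.Contains(needle))
                    Console.WriteLine($"Line {lineNumber}: {line}");
            }
        }
    }
}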

Related

C# Reading a specific line from large data .txt file

I have a txt file in which X,Y coordinates are saved. Each line contains different coordinates, so each line might have a different size from the previous/next line.
The file is too large to open with the ReadAllLines function (the file can be larger than 10-50 GB). I have seen many answers talking about File.ReadLines(filename).Skip(n).Take(n).ToList();
My questions are: will this method load the whole file into RAM, or will it load only the lines selected by Take(n)?
If there is no way to access a specific line in a txt file directly, would it be a good idea to transfer the data into a database table where access is easier?
Thanks in advance,
The Microsoft documentation says: "The ReadLines and ReadAllLines methods differ as follows: When you use ReadLines, you can start enumerating the collection of strings before the whole collection is returned."
I think your suggestion will load only those lines into RAM.
Test it and use the memory debugger to check it (see the Microsoft documentation).
Maybe you can remove some of the overhead of this big file...
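A small sketch of what the lazy approach looks like in practice (the file name and line numbers are placeholders for illustration):

using System;
using System.IO;
using System.Linq;

class ReadSpecificLines
{
    static void Main()
    {
        string path = "coordinates.txt"; // placeholder
        int startLine = 1000000;         // index of the first wanted line (0-based)
        int count = 100;                 // how many lines to keep

        // File.ReadLines is lazy: lines are pulled from disk one at a time as the
        // enumerator advances, so only 'count' lines end up materialized in the list.
        // The catch: Skip still has to walk (and discard) the first 'startLine'
        // lines, because a plain text file has no index of line positions.
        var wanted = File.ReadLines(path)
                         .Skip(startLine)
                         .Take(count)
                         .ToList();

        foreach (var line in wanted)
            Console.WriteLine(line);
    }
}

If you need repeated random access to arbitrary lines, importing the data into a database table (or building a one-off index of byte offsets per line) is probably the more practical route.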

Most efficient and quickest way to search numerous 1 GB text files for a certain string or instances of that string?

For work I am tasked with finding out how many times a certain set of sequential characters is used in a string. These files are all 1 GB+ in size and there are anywhere from 4 to 15 files like these. For example: find "cat" in "catastrophe", counting every instance where "cat" is part of the word, in every file.
In theory (at least to me) I would load one text file into memory and then look for the match line by line. At the end of the text file I would remove it from memory and load the next text file... until all files have been searched.
I have been doing mostly script files to automate tasks these last few years, and I have been out of the coding game for so long that I don't remember, or maybe never knew, the most efficient and fastest way to do this.
When I say speed I mean the elapsed time of the program, not how long it will take me to write it.
I would like to do this in C# because I am trying to get more comfortable with the language, but really I could do it in any language. Preferably not assembly... that was a joke...
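One way to sketch that streaming count in C# (the folder, file pattern, and search term are assumptions for illustration; overlapping matches within a line are counted):

using System;
using System.IO;

class CountOccurrences
{
    static void Main()
    {
        string folder = @"C:\logs"; // placeholder
        string needle = "cat";      // placeholder

        long total = 0;

        foreach (var file in Directory.EnumerateFiles(folder, "*.txt"))
        {
            long countInFile = 0;

            // File.ReadLines streams the file, so only one line is held in memory
            // at a time -- no need to load a whole 1 GB file up front.
            foreach (var line in File.ReadLines(file))
            {
                int index = 0;
                while ((index = line.IndexOf(needle, index, StringComparison.Ordinal)) >= 0)
                {
                    countInFile++;
                    index++; // advance by one so overlapping matches are counted too
                }
            }

            Console.WriteLine($"{file}: {countInFile}");
            total += countInFile;
        }

        Console.WriteLine($"Total: {total}");
    }
}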

Why does changing a few bytes in a file not corrupt the file?

In C#, I have a ZIP file that I want to corrupt by XORing or nulling its bytes.
(By nulling I mean making all the bytes in the file zeros.)
XORing its bytes requires me to first read the bytes into a byte array, XOR the bytes in the array with some value, then write the bytes back to the file.
Now, if I XOR/null all (or half) of the file's bytes, it gets corrupted, but if I just XOR/null some of the bytes, say the first few bytes (or a few bytes at any position in the file), it doesn't get corrupted, and by that I mean that I can still access the file as if nothing really happened.
The same thing happened with mp3 files.
Why isn't the file getting corrupted?
And is there a "FAST" way that I could corrupt a file with?
The problem is that the zip file that I'm dealing with is big, so XORing/nulling even half of its bytes will take a couple of seconds.
Thank you so much in advance.. :)
Just read the files completely and you will probably get reading errors.
But of course, if you want to keep something 'secret', use encryption.
A zip contains a small header, a directory structure (at the end) and, in between, the individual files. See Wikipedia for details.
Corrupting the first bytes is sure to corrupt the file, but it is also very easily repaired. The reader won't be able to find the directory block at the end.
Damaging the last block has the same effect: the reader will give up immediately, but it is repairable.
Changing a byte in the middle will corrupt one file: its CRC check will fail.
It depends on the file format you are trying to "corrupt". It also depends on what portion of the file you are trying to modify. Lastly, it depends how you are verifying if it is corrupted. Most file formats have some type of error detection.
The other thing working against you is that the zip file format uses a CRC algorithm for corruption detection. In addition, there are two copies of the directory structure, so you need to corrupt both.
I would suggest you corrupt the directory structure at the end and then modify some of the bytes in the front.
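A minimal sketch of that suggestion: overwrite a small block at each end of the file instead of rewriting all of it, which takes milliseconds rather than seconds (the path and block size are arbitrary placeholders, and as the other answers note, encryption is the right tool if the real goal is secrecy):

using System;
using System.IO;

class QuickCorrupt
{
    static void Main()
    {
        string path = "archive.zip"; // placeholder
        int blockSize = 4096;        // bytes to zero at each end of the file

        using (var fs = new FileStream(path, FileMode.Open, FileAccess.ReadWrite))
        {
            int len = (int)Math.Min(blockSize, fs.Length);
            var zeros = new byte[len];

            // Zero the local file header(s) at the start of the archive...
            fs.Seek(0, SeekOrigin.Begin);
            fs.Write(zeros, 0, zeros.Length);

            // ...and the central directory / end-of-central-directory at the tail.
            fs.Seek(-zeros.Length, SeekOrigin.End);
            fs.Write(zeros, 0, zeros.Length);
        }
    }
}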
I could just lock the zip entries with a password, but I don't want anybody to even open it up and see what's in it.
That makes it sound as if you're looking for a method of secure deletion. If you simply didn't want someone to read the file, delete it. Otherwise, unless you do something extreme like go over it a dozen times with different values or apply some complex algorithm over it a hundred times, there are still going to be ways to read the data, even if the format is 'corrupt'.
On the other hand, breaking a file simply to stop someone else accessing it conventionally just seems overkill. If it's a zip, you can read it in (there are plenty of questions here for handling archive files), encrypt it with a password and then write it back out. If it's a different type of file, there are literally a million different questions and solutions for encrypting, hiding or otherwise preventing access to data. Breaking a file isn't something you should be going out of your way to do, unless this is to help test some sort of un-zip-corrupting program or something similar, but your comments imply this is to prevent access. Perhaps a bit more background on why you want to do this could help us provide a better answer?

Reading large text files into a DataGridView and filtering

I need to read a text file line by line (log files from a server) and they are big (about 150-200 MB). I am using StreamReader and it's great for "little" files like 12 MB, but not for files this big. After some time the file is loaded and shown in my DataGridView, but it is huge in memory. I am using bindingSource.Filter on this DataGridView (a textbox where the user types letters and it filters one column by comparing strings, hiding rows that don't contain the typed letters), and with big files it's useless too. So I want to ask you: what is the best solution for me?
I was looking around and found some solutions, but I need help deciding what is best for me and with implementing it (or if there is something else):
Load data in the background and show it in real time. I am not really sure how to do that and I don't know what to do about filtering in this solution.
Maybe somehow upgrade the StreamReader? Or write my own method for reading lines from the file with binary readers?
I found something about memory-mapped files in C# 4.0 but I can't use 4.0. Could this feature help?
Thanks for the help
Okay, so I am implementing paging: I read 5k lines of the text file, then after clicking a button the next lines, and so on. I am using BaseStream.Position to save where reading should resume, but I would like to use some function that saves the number of lines instead; mainly, I want a method that starts reading from an exact line, but I can't find anything like that for StreamReader. Is there something like that?
Load data in the background and show it in real time. I am not really sure how to do that and I don't know what to do about filtering in this solution.
This is no help. It will still consume a lot of memory in the background thread.
Maybe somehow upgrade the StreamReader? Or write my own method for reading lines from the file with binary readers?
Still no help: once you read the whole file into memory it will, well, consume memory.
I think you get the point. Don't load the whole file into memory. Load only chunks of it. Use paging. You cannot show 200 MB worth of data on a single screen anyway, so only load the portion you need to show on the screen. So basically you need to implement the following function:
public IEnumerable<string> ReadFile(int page, int linesPerPage, out int totalLines)
{
...
}
The Skip and Take extension methods could be helpful here.
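One possible way to flesh that signature out (the path parameter and the two-pass line count are my own assumptions, not part of the suggestion above):

using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class LogPager
{
    // Returns only the requested page; File.ReadLines streams the file, so the
    // whole 200 MB log is never held in memory at once.
    public static IEnumerable<string> ReadFile(string path, int page, int linesPerPage, out int totalLines)
    {
        // First pass only counts lines (cheap on memory, costs one read of the file).
        totalLines = File.ReadLines(path).Count();

        // Second pass skips ahead to the requested page and takes one page's worth.
        return File.ReadLines(path)
                   .Skip(page * linesPerPage)
                   .Take(linesPerPage)
                   .ToList();
    }
}

The bindingSource.Filter can then be applied to just the page that is currently loaded.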

Fastest way to store an ASCII file's text in a RichTextBox?

I have some ASCII files that are 60-100 MB in size. I want to load one of these into a control in Visual C# as quickly as possible. I've been googling for answers and I've found a few solutions, such as putting the file into a StringBuilder, converting that to a string, and storing it in the RTB. The fastest implementation I have found thus far uses a FileStream and does txt_log.LoadFile(fi, RichTextBoxStreamType.PlainText), but a better implementation must exist. Is there something else I'm overlooking? Is there a way to make the RTB dynamically page the file directly?
If it helps, I plan to read through the file to perform searches after I load it.
I imagine a simple way would be to do:
myRtb.Text = File.ReadAllText("bigFile.txt", Encoding.ASCII);
but it's doubtful you'd get good performance out of it with such a huge file.
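For comparison, the LoadFile route the question already uses can be wrapped up like this (a sketch; the helper name is made up):

using System.IO;
using System.Windows.Forms;

static class RtbLoader
{
    // Streams a plain-text file straight into the control, avoiding an
    // intermediate 100 MB string on the managed heap.
    public static void LoadPlainText(RichTextBox box, string path)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            box.LoadFile(fs, RichTextBoxStreamType.PlainText);
        }
    }
}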
