Best way to write huge string into a file - c#

In C#, I'm reading a moderate size of file (100 KB ~ 1 MB), modifying some parts of the content, and finally writing to a different file. All contents are text. Modification is done as string objects and string operations. My current approach is:
Read each line from the original file by using StreamReader.
Open a StringBuilder for the contents of the new file.
Modify the string object and call AppendLine of the StringBuilder (until the end of the file)
Open a new StreamWriter, and write the StringBuilder to the write stream.
However, I've found that StreamWriter.Write truncates the output at 32768 bytes (2^15), even though the length of the StringBuilder is greater than that. I could write a simple loop to guarantee the entire string is written to the file. But I'm wondering what would be the most efficient way in C# to do this task?
To summarize, I'd like to modify only some parts of a text file and write to a different file. But, the text file size could be larger than 32768 bytes.
== Answer == I'm sorry for the confusion! I simply didn't call Flush. StreamWriter.Write does not have a short (e.g., 2^15) limitation.

StreamWriter.Write does not truncate the string and has no such limitation.
Internally it uses String.CopyTo, which in turn uses unsafe code (using fixed) to copy chars, so it is already efficient.

The problem is most likely related to not closing the writer. See http://msdn.microsoft.com/en-us/library/system.io.streamwriter.flush.aspx.
But I would suggest not loading the whole file in memory if that can be avoided.
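To illustrate the point about closing the writer, here is a minimal sketch. The class name, method name, and payload size are made up for the demonstration. Wrapping the StreamWriter in a using block disposes it, which flushes its internal buffer, so the whole string reaches the file regardless of its length.

```csharp
using System;
using System.IO;

static class FlushDemo
{
    // Writes 'text' to 'path' and returns the resulting file size in bytes.
    public static long WriteAll(string path, string text)
    {
        // Leaving the using block disposes the writer, which flushes
        // any buffered characters and closes the underlying file.
        using (var writer = new StreamWriter(path))
        {
            writer.Write(text);
        }
        return new FileInfo(path).Length;
    }
}
```

Calling FlushDemo.WriteAll with a string of 100,000 ASCII characters should produce a file of 100,000 bytes, well past the 32768 mark from the question.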

Can you try this:
void Test()
{
    using (var inputFile = File.OpenText(@"c:\in.txt"))
    {
        using (var outputFile = File.CreateText(@"c:\out.txt"))
        {
            string current;
            while ((current = inputFile.ReadLine()) != null)
            {
                outputFile.WriteLine(Process(current));
            }
        }
    }
}
string Process(string current)
{
    return current.ToLower();
}
It avoids having the full file loaded in memory by processing it line by line and writing each line out directly.

Well, that entirely depends on what you want to modify. If your modifications of one part of the text file are dependent on another part of the text file, you obviously need to have both of those parts in memory. If however, you only need to modify the text file on a line-by-line basis then use something like this :
using (StreamReader sr = new StreamReader(@"test.txt"))
{
    using (StreamWriter sw = new StreamWriter(@"modifiedtest.txt"))
    {
        while (!sr.EndOfStream)
        {
            string line = sr.ReadLine();
            //do some modifications
            sw.WriteLine(line);
            sw.Flush(); //force line to be written to disk
        }
    }
}

Instead of running through the whole document, I would use a regex to find what you are looking for. Sample:
public List<string> GetAllProfiles()
{
List<string> profileNames = new List<string>();
using (StreamReader reader = new StreamReader(_folderLocation + "profiles.pg"))
{
string profiles = reader.ReadToEnd();
var regex = new Regex("\nname=([^\r]{0,})", RegexOptions.IgnoreCase);
var regexMatchs = regex.Matches(profiles);
profileNames.AddRange(from Match regexMatch in regexMatchs select regexMatch.Groups[1].Value);
}
return profileNames;
}

Related

async reading and writing lines of text

I've found plenty of examples of how to read/write text to a file asynchronously, but I'm having a hard time finding how to do it with a List.
For the reading I've got this, which seems to work:
public async Task<List<string>> GetTextFromFile(string file)
{
using (var reader = File.OpenText(file))
{
var fileText = await reader.ReadToEndAsync();
return fileText.Split(new[] { Environment.NewLine }, StringSplitOptions.None).ToList();
}
}
The writing is a bit trickier though ...
public async Task WriteTextToFile(string file, List<string> lines, bool append)
{
if (!append && File.Exists(file)) File.Delete(file);
using (var writer = File.OpenWrite(file))
{
StringBuilder builder = new StringBuilder();
foreach (string value in lines)
{
builder.Append(value);
builder.Append(Environment.NewLine);
}
Byte[] info = new UTF8Encoding(true).GetBytes(builder.ToString());
await writer.WriteAsync(info, 0, info.Length);
}
}
My problem with this is that for a moment it appears my data is triple in memory.
The original List of my lines, then the StringBuilder makes it a single string with the newlines, then in info I have the byte representation of the string.
That seems excessive that I have to have three copies of essentially the same data in memory.
I am concerned with this because at times I'll be reading and writing large text files.
Following up on that, let me be clear - I know that for extremely large text files I can do this all line by line. What I am looking for are two methods of reading/writing data. The first is to read in the whole thing and process it, and the second is to do it line by line. Right now I am working on the first approach for my small and moderate sized text files. But I am still concerned with the data replication issue.
The following might suit your needs: it does not store the data a second time, and it writes the file line by line:
public async Task WriteTextToFile(string file, List<string> lines, bool append)
{
if (!append && File.Exists(file))
File.Delete(file);
using (var writer = File.OpenWrite(file))
{
using (var streamWriter = new StreamWriter(writer))
foreach (var line in lines)
await streamWriter.WriteLineAsync(line);
}
}
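One caveat with the code above: File.OpenWrite opens the file positioned at the beginning, so when append is true the new lines would overwrite the start of the existing file rather than being added at the end. A possible variant, sketched under the assumption that the StreamWriter(path, append) constructor is available on your target framework, lets the writer handle the flag itself. The class name and the Async suffix are my own naming, not from the original answer.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

static class AsyncLineWriter
{
    public static async Task WriteTextToFileAsync(string file, IEnumerable<string> lines, bool append)
    {
        // StreamWriter(path, append) overwrites the file when append is false
        // and positions at the end of the file when append is true,
        // so no explicit File.Delete is needed.
        using (var writer = new StreamWriter(file, append))
        {
            foreach (var line in lines)
            {
                // Each line goes straight to the writer; no second copy
                // of the whole content is built in memory.
                await writer.WriteLineAsync(line);
            }
        }
    }
}
```

Taking IEnumerable<string> instead of List<string> is a design choice that also allows lazily produced sequences; a List<string> can still be passed unchanged.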

How to remove all lines in a file, then rewrite the file in Compact Framework 3.5 c#

In the .NET Framework, using a Windows Forms app, I can purge a file and then write the data that I want back into that file.
Here is the code that I use in Windows Forms:
var openFile = File.OpenText(fullFileName);
var fileEmpty = openFile.ReadLine();
if (fileEmpty != null)
{
var lines = File.ReadAllLines(fullFileName).Skip(4); //Will skip the first 4 then rewrite the file
openFile.Close();//Close the reading of the file
File.WriteAllLines(fullFileName, lines); //Reopen the file to write the lines
openFile.Close();//Close the rewriting of the file
}
openFile.Close();
openFile.Dispose();
I am trying to do the same thing in the Compact Framework. I can keep the lines that I want and then delete all the lines in the file. However, I am not able to rewrite the file.
Here is my compact framework code:
var sb = new StringBuilder();
using (var sr = new StreamReader(fullFileName))
{
// read the first 4 lines but do nothing with them; basically, skip them
for (int i = 0; i < 4; i++)
sr.ReadLine();
string line1;
while ((line1 = sr.ReadLine()) != null)
{
sb.AppendLine(line1);
}
}
string allines = sb.ToString();
openFile.Close();//Close the reading of the file
openFile.Dispose();
//Reopen the file to write the lines
var writer = new StreamWriter(fullFileName, false); //Don't append!
foreach (char line2 in allines)
{
writer.WriteLine(line2);
}
openFile.Close();//Close the rewriting of the file
}
openFile.Close();
openFile.Dispose();
Your code
foreach (char line2 in allines)
{
writer.WriteLine(line2);
}
is writing out the characters of the original file, each on a separate line.
Remember, allines is a single string that happens to have Environment.NewLine between the original strings of the file.
What you probably intend to do is simply
writer.WriteLine(allines);
UPDATE
You are closing openFile a number of times (you should only do this once), but you are not flushing or closing your writer.
Try
using (var writer = new StreamWriter(fullFileName, false)) //Don't append!
{
writer.WriteLine(allines);
}
to ensure the writer is disposed and therefore flushed.
If you plan to do this to have something like a "rotating" buffer for a log file, consider that most Windows CE devices use flash as storage media, and your approach will generate a full rewrite of the whole file (whole minus 4 lines) every time. If this happens quite often (every few seconds), it may wear out the flash, reaching its maximum number of erase cycles quickly (quickly may mean a few weeks or months).
An alternative approach would be to rename the old log file when it has reached the maximum size (deleting any existing file with the same name) and create a new one.
In this way your logging info would be split across two files, but you'll always append to the existing files, limiting the number of writes you perform. Also, renaming or deleting a file is not a heavy operation from the point of view of a flash file system.
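The rename-based rotation described above could look roughly like the following sketch. The class name, the ".old" suffix, and the maxBytes parameter are hypothetical choices for illustration, and I have not verified this against the Compact Framework specifically; it uses only FileInfo, File.Exists, File.Delete, File.Move, and the appending StreamWriter constructor.

```csharp
using System.IO;

static class RotatingLog
{
    // Appends one line to logPath. If the file has already reached maxBytes,
    // it is first renamed to logPath + ".old", replacing any previous backup.
    public static void Append(string logPath, string message, long maxBytes)
    {
        var info = new FileInfo(logPath);
        if (info.Exists && info.Length >= maxBytes)
        {
            string backup = logPath + ".old"; // hypothetical naming scheme
            if (File.Exists(backup))
            {
                File.Delete(backup);
            }
            File.Move(logPath, backup);
        }

        // true = append; the file is created if it does not exist yet.
        using (var writer = new StreamWriter(logPath, true))
        {
            writer.WriteLine(message);
        }
    }
}
```

Each call only appends a single line or renames a file, so the existing log content is never rewritten.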

How to read an entire file to a string using C#?

What is the quickest way to read a text file into a string variable?
I understand it can be done in several ways, such as read individual bytes and then convert those to string. I was looking for a method with minimal coding.
How about File.ReadAllText:
string contents = File.ReadAllText(@"C:\temp\test.txt");
A benchmark comparison of File.ReadAllLines vs StreamReader ReadLine from C# file handling
Results. StreamReader is much faster for large files with 10,000+
lines, but the difference for smaller files is negligible. As always,
plan for varying sizes of files, and use File.ReadAllLines only when
performance isn't critical.
StreamReader approach
The File.ReadAllText approach has already been suggested by others; you can also try StreamReader.ReadToEnd, which appears to be quicker (I have not measured the performance impact quantitatively; see the comparison below). The difference in performance will be visible only for larger files, though.
string readContents;
using (StreamReader streamReader = new StreamReader(path, Encoding.UTF8))
{
readContents = streamReader.ReadToEnd();
}
Comparison of File.Readxxx() vs StreamReader.Readxxx()
Viewing the indicative code through ILSpy I have found the following about File.ReadAllLines, File.ReadAllText.
File.ReadAllText - Uses StreamReader.ReadToEnd internally
File.ReadAllLines - Also uses StreamReader.ReadLine internally, with the additional overhead of creating the List<string> to return as the read lines and looping until the end of the file.
So both the methods are an additional layer of convenience built on top of StreamReader. This is evident by the indicative body of the method.
File.ReadAllText() implementation as decompiled by ILSpy
public static string ReadAllText(string path)
{
if (path == null)
{
throw new ArgumentNullException("path");
}
if (path.Length == 0)
{
throw new ArgumentException(Environment.GetResourceString("Argument_EmptyPath"));
}
return File.InternalReadAllText(path, Encoding.UTF8);
}
private static string InternalReadAllText(string path, Encoding encoding)
{
string result;
using (StreamReader streamReader = new StreamReader(path, encoding))
{
result = streamReader.ReadToEnd();
}
return result;
}
string contents = System.IO.File.ReadAllText(path);
Here's the MSDN documentation
For the noobs out there who find this stuff fun and interesting, the fastest way to read an entire file into a string in most cases (according to these benchmarks) is by the following:
using (StreamReader sr = File.OpenText(fileName))
{
string s = sr.ReadToEnd();
}
//you then have to process the string
However, the absolute fastest to read a text file overall appears to be the following:
using (StreamReader sr = File.OpenText(fileName))
{
string s = String.Empty;
while ((s = sr.ReadLine()) != null)
{
//do what you have to here
}
}
Put up against several other techniques, it won out most of the time, including against the BufferedReader.
Take a look at the File.ReadAllText() method
Some important remarks:
This method opens a file, reads each line of the file, and then adds
each line as an element of a string. It then closes the file. A line
is defined as a sequence of characters followed by a carriage return
('\r'), a line feed ('\n'), or a carriage return immediately followed
by a line feed. The resulting string does not contain the terminating
carriage return and/or line feed.
This method attempts to automatically detect the encoding of a file
based on the presence of byte order marks. Encoding formats UTF-8 and
UTF-32 (both big-endian and little-endian) can be detected.
Use the ReadAllText(String, Encoding) method overload when reading
files that might contain imported text, because unrecognized
characters may not be read correctly.
The file handle is guaranteed to be closed by this method, even if
exceptions are raised
string text = File.ReadAllText("Path"); you have all text in one string variable. If you need each line individually you can use this:
string[] lines = File.ReadAllLines("Path");
System.IO.StreamReader myFile =
new System.IO.StreamReader("c:\\test.txt");
string myString = myFile.ReadToEnd();
If you want to pick a file from the Bin folder of the application, you can try the following. Don't forget to do exception handling.
string content = File.ReadAllText(Path.Combine(System.IO.Directory.GetCurrentDirectory(), @"FilesFolder\Sample.txt"));
@Cris sorry. This is a quote from MSDN (Microsoft):
Methodology
In this experiment, two classes will be compared. The StreamReader and the FileStream class will be directed to read two files of 10K and 200K in their entirety from the application directory.
StreamReader (VB.NET)
sr = New StreamReader(strFileName)
Do
line = sr.ReadLine()
Loop Until line Is Nothing
sr.Close()
FileStream (VB.NET)
Dim fs As FileStream
Dim temp As UTF8Encoding = New UTF8Encoding(True)
Dim b(1024) As Byte
fs = File.OpenRead(strFileName)
Do While fs.Read(b, 0, b.Length) > 0
temp.GetString(b, 0, b.Length)
Loop
fs.Close()
Result
FileStream is obviously faster in this test. It takes an additional 50% more time for StreamReader to read the small file. For the large file, it took an additional 27% of the time.
StreamReader is specifically looking for line breaks while FileStream does not. This will account for some of the extra time.
Recommendations
Depending on what the application needs to do with a section of data, there may be additional parsing that will require additional processing time. Consider a scenario where a file has columns of data and the rows are CR/LF delimited. The StreamReader would work down the line of text looking for the CR/LF, and then the application would do additional parsing looking for a specific location of data. (Did you think String.SubString comes without a price?)
On the other hand, the FileStream reads the data in chunks and a proactive developer could write a little more logic to use the stream to his benefit. If the needed data is in specific positions in the file, this is certainly the way to go as it keeps the memory usage down.
FileStream is the better mechanism for speed but will take more logic.
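For readers working in C#, the chunked FileStream approach from the quote could be sketched as follows. This is an illustration under stated assumptions, not the code used in the quoted experiment: the class name, method name, and default buffer size are made up, the file is assumed to be UTF-8, and a byte order mark (if present) is not stripped. It decodes only the bytes that Read actually returned and uses a stateful Decoder so that a multi-byte character split across two chunks is still decoded correctly.

```csharp
using System.IO;
using System.Text;

static class ChunkReader
{
    public static string ReadInChunks(string path, int bufferSize = 4096)
    {
        var result = new StringBuilder();
        var bytes = new byte[bufferSize];

        // A Decoder keeps state between calls, unlike Encoding.GetString.
        Decoder decoder = Encoding.UTF8.GetDecoder();
        var chars = new char[Encoding.UTF8.GetMaxCharCount(bufferSize)];

        using (FileStream fs = File.OpenRead(path))
        {
            int bytesRead;
            while ((bytesRead = fs.Read(bytes, 0, bytes.Length)) > 0)
            {
                // Decode only the bytes actually read; an incomplete
                // multi-byte sequence is held back for the next chunk.
                int charCount = decoder.GetChars(bytes, 0, bytesRead, chars, 0);
                result.Append(chars, 0, charCount);
            }
        }
        return result.ToString();
    }
}
```

This is the "little more logic" the quote mentions: the caller controls the chunk size and can process each decoded chunk as it arrives instead of appending everything to one string.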
well the quickest way meaning with the least possible C# code is probably this one:
string readText = System.IO.File.ReadAllText(path);
you can use :
public static void ReadFileToEnd()
{
try
{
//provide to reader your complete text file
using (StreamReader sr = new StreamReader("TestFile.txt"))
{
String line = sr.ReadToEnd();
Console.WriteLine(line);
}
}
catch (Exception e)
{
Console.WriteLine("The file could not be read:");
Console.WriteLine(e.Message);
}
}
string content = System.IO.File.ReadAllText(@"C:\file.txt");
You can use like this
public static string ReadFileAndFetchStringInSingleLine(string file)
{
StringBuilder sb;
try
{
sb = new StringBuilder();
using (FileStream fs = File.Open(file, FileMode.Open))
{
using (BufferedStream bs = new BufferedStream(fs))
{
using (StreamReader sr = new StreamReader(bs))
{
string str;
while ((str = sr.ReadLine()) != null)
{
sb.Append(str);
}
}
}
}
return sb.ToString();
}
catch (Exception ex)
{
return "";
}
}
Hope this will help you.
you can read a text from a text file in to string as follows also
string str = "";
StreamReader sr = new StreamReader(Application.StartupPath + "\\Sample.txt");
while(sr.Peek() != -1)
{
str = str + sr.ReadLine();
}
I made a comparison between a ReadAllText and StreamBuffer for a 2Mb csv and it seemed that the difference was quite small but ReadAllText seemed to take the upper hand from the times taken to complete functions.
I'd highly recommend using File.ReadLines(path) compared to StreamReader or any other file reading method. Please find below the detailed performance benchmark for both a small file and a large file.
I hope this would help.
File operations read result (the benchmark images are not preserved here): one for a small file (just 8 lines) and one for a larger file (128465 lines).
Readlines Example:
public void ReadFileUsingReadLines()
{
var contents = File.ReadLines(path);
}
Note : Benchmark is done in .NET 6.
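It may help to see why File.ReadLines behaves differently from File.ReadAllLines: it returns a lazy IEnumerable<string>, so lines are read from disk only as they are consumed. The sketch below (class and method names are my own, and the upper-casing step is just a placeholder modification) combines it with File.WriteAllLines to transform one file into another without holding all lines in memory at once. The input and output paths must be different files.

```csharp
using System.IO;
using System.Linq;

static class ReadLinesDemo
{
    public static void TransformFile(string inputPath, string outputPath)
    {
        // File.ReadLines enumerates lazily: each line is read when the
        // consumer asks for it, not when ReadLines is called.
        var transformed = File.ReadLines(inputPath)
                              .Select(line => line.ToUpperInvariant());

        // WriteAllLines pulls the lines one at a time and writes them out.
        File.WriteAllLines(outputPath, transformed);
    }
}
```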
This comment is for those who are trying to read a complete text file in WinForms using C++ (C++/CLI) with the help of the ReadAllText function:
using namespace System;
using namespace System::IO;
String ^ filename = gcnew String(charfilename);
if (System::IO::File::Exists(filename))
{
    String ^ data = System::IO::File::ReadAllText(filename)->Replace("\0", Environment::NewLine);
    textBox1->Text = data;
}

C# Streamreader writer (memory issues)

I have a few multimillion lined text files located in a directory, I want to read line by line and replace “|” with “\” and then write out the line to a new file. This code might work just fine but I’m not seeing any resulting text file, or it might be I’m just be impatient.
{
string startingdir = @"K:\qload";
string dest = @"K:\D\ho\jlg\load\dest";
string[] files = Directory.GetFiles(startingdir, "*.txt");
foreach (string file in files)
{
StringBuilder sb = new StringBuilder();
using (FileStream fs = new FileStream(file, FileMode.Open))
using (StreamReader rdr = new StreamReader(fs))
{
while (!rdr.EndOfStream)
{
string begdocfile = rdr.ReadLine();
string replacementwork = docfile.Replace("|", "\\");
sb.AppendLine(replacementwork);
FileInfo file_info = new FileInfo(file);
string outputfilename = file_info.Name;
using (FileStream fs2 = new FileStream(dest + outputfilename, FileMode.Append))
using (StreamWriter writer = new StreamWriter(fs2))
{
writer.WriteLine(replacementwork);
}
}
}
}
}
DUHHHHH Thanks to everyone.
Id10t error.
Get rid of the StringBuilder, and do not reopen the output file for each line:
string startingdir = @"K:\qload";
string dest = @"K:\D\ho\jlg\load\dest";
string[] files = Directory.GetFiles(startingdir, "*.txt");
foreach (string file in files)
{
var outfile = Path.Combine(dest, Path.GetFileName(file));
using (StreamReader reader = new StreamReader(file))
using (StreamWriter writer = new StreamWriter(outfile))
{
string line = reader.ReadLine();
while (line != null)
{
writer.WriteLine(line.Replace("|", "\\"));
line = reader.ReadLine();
}
}
}
Why are you using a StringBuilder - you are just filling up your memory without doing anything with it.
You should also move the FileStream and StreamWriter using statements to outside of your loop - you are re-creating your output streams for every line, causing unneeded IO in the form of opening and closing the file.
Use Path.Combine(dest, outputfilename), from your code it looks like you're writing to the file K:\D\ho\jlg\load\destouputfilename.txt
This code might work just fine but I’m not seeing any resulting text file, or it might be I’m just be impatient.
Have you considered having a Console.WriteLine in there to check the progress. Sure, it's going to slow down performance a tiny tiny bit - but you'll know what's going on.
It looks like you might want to do a Path.Combine, so that instead of new FileStream(dest + outputfilename), you have new FileStream(Path.Combine(dest, outputfilename)), which will create the files in the directory that you expect, rather than creating them in K:\D\ho\jlg\load.
However, I'm not sure why you're writing to a StringBuilder that you're not using, or why you're opening and closing the file stream and stream writer on each line that you're writing. Is that to force the writer to flush its output? If so, it might be easier to just flush the writer/stream on each write.
You're opening and closing the output stream for each line in the output, so you'll have to be very patient!
Open it once outside the loop.
I guess the problem is here:
string begdocfile = rdr.ReadLine();
string replacementwork = docfile.Replace("|", "\\");
you're reading into begdocfile variable but replacing chars in docfile which I guess is empty
string replacementwork = docfile.Replace("|", "\\");
I believe the above line in your code is incorrect : it should be "begdocfile.Replace ..." ?
I suggest you focus on getting as much of the declaration and "name manufacture" out of the inner loop as possible : right now you are creating new FileInfo objects, and path names for every single line you read in every file : that's got to be hugely expensive.
make a single pass over the list of target files first, and create, at one time, the destination files, perhaps store them in a List for easy access, later. Or a Dictionary where "string" will be the new file path associated with that FileInfo ? Another strategy : just copy the whole directory once, and then operate to directly change the copied files : then rename them, rename the directory, whatever.
move every variable declaration out of that inner loop, and within the using code blocks you can.
I suspect you are going to hear from someone here at more of a "guru level" shortly who might suggest a different strategy based on a more profound knowledge of streams than I have, but that's a guess.
Good luck !

Reading from file not fast enough, how would I speed it up?

This is the way I read file:
public static string readFile(string path)
{
StringBuilder stringFromFile = new StringBuilder();
StreamReader SR;
string S;
SR = File.OpenText(path);
S = SR.ReadLine();
while (S != null)
{
stringFromFile.Append(SR.ReadLine());
}
SR.Close();
return stringFromFile.ToString();
}
The problem is that it takes so long (the .txt file is about 2.5 megs). It took over 5 minutes. Is there a better way?
Solution taken
public static string readFile(string path)
{
return File.ReadAllText(path);
}
Took less than 1 second... :)
S = SR.ReadLine();
while (S != null)
{
stringFromFile.Append(SR.ReadLine());
}
Of note here, S is never set after that initial ReadLine(), so the S != null condition never triggers if you enter the while loop. Try:
S = SR.ReadLine();
while (S != null)
{
stringFromFile.Append(S = SR.ReadLine());
}
or use one of the other comments.
If you need to remove newlines, use string.Replace(Environment.NewLine, "")
Leaving aside the horrible variable names and the lack of a using statement (you won't close the file if there are any exceptions) that should be okay, and certainly shouldn't take 5 minutes to read 2.5 megs.
Where does the file live? Is it on a flaky network share?
By the way, the only difference between what you're doing and using File.ReadAllText is that you're losing line breaks. Is this deliberate? How long does ReadAllText take?
return System.IO.File.ReadAllText(path);
Marcus Griep has it right. It's taking so long because YOU HAVE AN INFINITE LOOP. I copied your code, made his changes, and it read a 2.4 MB text file in less than a second.
But I think you might miss the first line of the file. Try this.
S = SR.ReadLine();
while (S != null){
stringFromFile.Append(S);
S = SR.ReadLine();
}
Do you need the entire 2.5 Mb in memory at once?
If not, I would try to work with what you need.
Use System.IO.File.ReadAllLines instead.
http://msdn.microsoft.com/en-us/library/system.io.file.readalllines.aspx
Alternatively, estimating the character count and passing that to StringBuilder's constructor as the capacity should speed it up.
Try this, should be much faster:
var str = System.IO.File.ReadAllText(path);
return str.Replace(Environment.NewLine, "");
By the way: Next time you're in a similar situation, try pre-allocating memory. This improves runtime drastically, regardless of the exact data structures you use. Most containers (StringBuilder as well) have a constructor that allow you to reserve memory. This way, less time-consuming reallocations are necessary during the read process.
For example, you could write the following if you want to read data from a file into a StringBuilder:
var info = new FileInfo(path);
var sb = new StringBuilder((int)info.Length);
(Cast necessary because System.IO.FileInfo.Length is long.)
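Putting the pre-allocation idea together with a corrected read loop might look like the sketch below. The class and method names are hypothetical. It assumes the file is smaller than 2 GB (so the cast to int is safe) and treats the byte length only as a rough capacity estimate: for typical encodings the character count will not exceed the byte count, so the builder should not need to grow.

```csharp
using System.IO;
using System.Text;

static class PreallocatedRead
{
    // Reads the whole file and concatenates its lines without line breaks,
    // mirroring the behaviour of the loop in the question.
    public static string ReadWithoutLineBreaks(string path)
    {
        var info = new FileInfo(path);

        // Reserve capacity up front using the file size in bytes as an estimate.
        var sb = new StringBuilder((int)info.Length);

        using (StreamReader reader = File.OpenText(path))
        {
            string line;
            // Assigning inside the condition keeps the first line and
            // terminates when ReadLine returns null at end of file.
            while ((line = reader.ReadLine()) != null)
            {
                sb.Append(line);
            }
        }
        return sb.ToString();
    }
}
```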
ReadAllText was a very good solution for me. I used following code for 3.000.000 row text file and it took 4-5 seconds to read all rows.
string fileContent = System.IO.File.ReadAllText(txtFilePath.Text);
string[] arr = fileContent.Split('\n');
The loop and StringBuilder may be redundant; Try using
ReadToEnd.
To read a text file fastest you can use something like this
public static string ReadFileAndFetchStringInSingleLine(string file)
{
StringBuilder sb;
try
{
sb = new StringBuilder();
using (FileStream fs = File.Open(file, FileMode.Open))
{
using (BufferedStream bs = new BufferedStream(fs))
{
using (StreamReader sr = new StreamReader(bs))
{
string str;
while ((str = sr.ReadLine()) != null)
{
sb.Append(str);
}
}
}
}
return sb.ToString();
}
catch (Exception ex)
{
return "";
}
}
Hope this will help you. For more info, see the linked article "Fastest Way to Read Text Files".
