replacing \n with \r\n in a large text file - c#

I have a large text file with a lot of \n that I need to replace with \r\n. With small text files, I was using the ReadToEnd method to get the file as a string and then use the Replace method and then write the string to a file. With a big file, however, I get an OutOfMemory exception because the string is too big. Any help would be greatly appreciated. Thanks.

private void foo() {
StreamReader reader = new StreamReader(@"D:\InputFile.txt");
StreamWriter writer = new StreamWriter(@"D:\OutputFile.txt");
string currentLine;
while (!reader.EndOfStream) {
currentLine = reader.ReadLine();
writer.Write(currentLine + "\r\n");
}
reader.Close();
writer.Close();
}
This should resolve your problem. Note that reader.ReadLine() cuts off the trailing "\n".

DiableNoir's solution is the right idea, but the implementation is buggy and it needs some explanation. Here's an improved one:
using (var reader = new StreamReader(@"D:\InputFile.txt"))
using (var writer = new StreamWriter(@"D:\OutputFile.txt")) // or any other TextWriter
{
while (!reader.EndOfStream) {
var currentLine = reader.ReadLine();
writer.Write(currentLine + "\r\n");
}
}
You use a TextReader for input and a TextWriter for output (this one might direct to a file or to an in-memory string). reader.ReadLine will not return the line ending as part of the line, so you need to write it explicitly (instead of using string.Replace, which will not accomplish anything at all).
Also, exactly because you will never see \n or \r as part of currentLine, this program is safe to run again on the output it has produced (in this case its output will be exactly identical to its input). This would not be the case if currentLine included the line ending, because it would change \n to \r\n the first time, and then make it \r\r\n the second time, etc.

You could use Read and specify how many bytes to read each time, for example reading the file in 10 MB chunks.

Or, if you need a larger buffer, you can use StreamReader.ReadBlock().
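A minimal sketch of the chunked approach, assuming the goal from the question (turn every bare \n into \r\n). The class name NewlineConverter, the method name ConvertToCrLf, and the default buffer size are made up for illustration. Note that TextReader.Read counts characters rather than bytes, and that a \r\n pair can be split across two chunks, so the loop remembers whether the previous character was \r.

```csharp
using System;
using System.IO;

public static class NewlineConverter
{
    // Copies text from reader to writer in fixed-size chunks, turning every
    // bare "\n" into "\r\n" and leaving existing "\r\n" pairs untouched.
    public static void ConvertToCrLf(TextReader reader, TextWriter writer, int bufferSize = 64 * 1024)
    {
        var buffer = new char[bufferSize];
        bool previousWasCr = false; // true if the last character seen was '\r'
        int read;
        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
            {
                char c = buffer[i];
                if (c == '\n' && !previousWasCr)
                    writer.Write('\r'); // bare LF: supply the missing CR
                writer.Write(c);
                previousWasCr = (c == '\r');
            }
        }
    }
}
```

With files, you would pass a StreamReader and a StreamWriter, each wrapped in a using block so the output is flushed when the writer is disposed.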

Related

How to convert carriage returns into actual line breaks

I have a text file that I downloaded from this (it's just the English dictionary) which displays fine in a browser, but when I open it in Notepad it doesn't recognize the line breaks. I thought a simple C# application could detect the flavor of carriage returns they use and turn them into actual line breaks and spit out a more nicely formatted txt file but I've failed with techniques like String.Replace("\r", "\n"); that I thought would be easy tricks. How are these carriage returns encoded and how can I reformat the file to make it readable in something like Notepad? C# is preferred because that's what I'm used to, but if it's easier in some other method I'll be happy to consider alternatives.
If you really want to do this in c# all you need to do is this...
File.WriteAllLines("outfile.txt", File.ReadAllLines("infile.txt"));
... If you want something slightly more complex that is faster and uses less memory, do it this way ...
using (var reader = new StreamReader("infile.txt"))
using (var writer = new StreamWriter("outfile.txt"))
while (!reader.EndOfStream)
writer.WriteLine(reader.ReadLine());
... if you really want to overkill it as an excuse to use extension methods and LINQ then do this ...
//Sample use
//"infile.txt".ReadFileAsLines()
// .WriteAsLinesTo("outfile.txt");
public static class ToolKit
{
public static IEnumerable<string> ReadFileAsLines(this string infile)
{
if (string.IsNullOrEmpty(infile))
throw new ArgumentNullException("infile");
if (!File.Exists(infile))
throw new FileNotFoundException("File Not Found", infile);
using (var reader = new StreamReader(infile))
while (!reader.EndOfStream)
yield return reader.ReadLine();
}
public static void WriteAsLinesTo(this IEnumerable<string> lines, string outfile)
{
if (lines == null)
throw new ArgumentNullException("lines");
if (string.IsNullOrEmpty(outfile))
throw new ArgumentNullException("outfile");
using (var writer = new StreamWriter(outfile))
foreach (var line in lines)
writer.WriteLine(line);
}
}
Notepad is the only Windows text editor I know of that doesn't recognize Unix-style newlines (\n) and requires Windows-style newlines (\r\n) to format the text properly. If you convert \n to \r\n, it will be displayed as intended. Any other (modern) text editor should display the text properly as-is.
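If you want to check which kind of line ending a file actually uses before converting it, something along these lines could work. This is only a sketch; the names LineEndingInspector and Count are hypothetical, and it simply tallies \r\n pairs, bare \n, and bare \r.

```csharp
using System;
using System.IO;

public static class LineEndingInspector
{
    // Counts "\r\n" pairs, bare "\n", and bare "\r" in the text supplied by reader.
    public static (int CrLf, int Lf, int Cr) Count(TextReader reader)
    {
        int crlf = 0, lf = 0, cr = 0;
        bool pendingCr = false; // a '\r' was seen and not yet classified
        int next;
        while ((next = reader.Read()) != -1)
        {
            char c = (char)next;
            if (c == '\n')
            {
                if (pendingCr) crlf++; else lf++;
                pendingCr = false;
            }
            else
            {
                if (pendingCr) cr++; // the previous '\r' was not followed by '\n'
                pendingCr = (c == '\r');
            }
        }
        if (pendingCr) cr++; // the text ended on a bare '\r'
        return (crlf, lf, cr);
    }
}
```

For a file, pass a StreamReader opened on it. A file where only the Lf count is non-zero is the Unix-style case described above.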

Unwanted second line in stream writer

I am coding a program, and I use a StreamWriter to write text to some files. The problem is that when it writes to the text file, it leaves an unwanted extra line in the file, which confuses my program when it tries to read the file later. An example of the StreamWriter code that I use is this:
string enbl = "Enabled = false;";
string path = Directory.GetCurrentDirectory();
System.IO.StreamWriter file = new System.IO.StreamWriter("path");
file.WriteLine(enbl);
file.Close();
Is it possible to fix that?
If you don't want the output to get an implicit CR+LF (0x0D + 0x0A) at the end, you have to use file.Write(enbl); instead of file.WriteLine(enbl);
Just use the Write method instead of WriteLine.
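To make the difference visible without touching the disk, here is a small sketch that uses StringWriter (another TextWriter) in place of the file. The class and method names are invented for the example.

```csharp
using System;
using System.IO;

public static class WriteVersusWriteLine
{
    // Returns what a TextWriter produces for the same text via Write and via WriteLine.
    public static (string ViaWrite, string ViaWriteLine) Render(string text)
    {
        var plain = new StringWriter();
        plain.Write(text);          // writes the text only

        var terminated = new StringWriter();
        terminated.WriteLine(text); // writes the text plus the writer's NewLine

        return (plain.ToString(), terminated.ToString());
    }
}
```

The second string is longer by exactly the line terminator, which is the "extra line" a reader sees if it treats the terminator as the start of a new, empty line.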

Cut and paste line of text from text file c#

Hi everyone beginner here looking for some advice with a program I'm writing in C#. I need to be able to open a text document, read the first line of text (that is not blank), save this line of text to another text document and finally overwrite the read line with an empty line.
This is what I have so far, everything works fine until the last part where I need to write a blank line to the original text document, I just get a full blank document. Like I mentioned above I'm new to C# so I'm sure there is an easy solution to this but I can't figure it out, any help appreciated:
try
{
StreamReader sr = new StreamReader(@"C:\Users\Stephen\Desktop\Sample.txt");
line = sr.ReadLine();
while (line == "")
{
line = sr.ReadLine();
}
sr.Close();
string path = (@"C:\Users\Stephen\Desktop\new.txt");
if (!File.Exists(path))
{
File.Create(path).Dispose();
TextWriter tw = new StreamWriter(path);
tw.WriteLine(line);
tw.Close();
}
else if (File.Exists(path))
{
TextWriter tw = new StreamWriter(path, true);
tw.WriteLine(line);
tw.Close();
}
StreamWriter sw = new StreamWriter(@"C:\Users\Stephen\Desktop\Sample.txt");
int cnt1 = 0;
while (cnt1 < 1)
{
sw.WriteLine("");
cnt1 = 1;
}
sw.Close();
}
catch (Exception e)
{
Console.WriteLine("Exception: " + e.Message);
}
finally
{
Console.WriteLine("Executing finally block.");
}
else
Console.WriteLine("Program Not Installed");
Console.ReadLine();
Unfortunately, you do have to go through the painstaking process of rewriting the file. In most cases, you could get away with loading it into memory and just doing something like:
string contents = File.ReadAllText(oldFile);
contents = contents.Replace("bad line!", "good line!");
File.WriteAllText(newFile, contents);
Remember that you'll have to deal with the idea of line breaks here, since string.Replace doesn't innately pay attention only to whole lines. But that's certainly doable. You could also use a regex with that approach. You can also use File.ReadLines(string), which returns a lazily evaluated IEnumerable<string>, and test each line while you write them back to the new file. (File.ReadAllLines would also work, but it loads every line into a string[] first.) It just depends on what exactly you want to do and how precise you want to be about it.
using (var writer = new StreamWriter(newFile))
{
foreach (var line in File.ReadLines(oldFile))
{
if (shouldInsert(line))
writer.WriteLine(line);
}
}
That, of course, depends on the predicate shouldInsert, but you can modify that as you see fit. Because File.ReadLines streams the file one line at a time, this should be relatively light on resources. You could also use a StreamReader for slightly lower-level control.
using (var writer = new StreamWriter(newFile))
using (var reader = new StreamReader(oldFile))
{
string line;
while ((line = reader.ReadLine()) != null)
{
if (shouldInsert(line))
writer.WriteLine(line);
}
}
Recall, of course, that this could leave you with an extra, empty line at the end of the file. I'm too tired to say that with the certainty I should be able to, but I'm pretty sure that's the case. Just keep an eye out for that, if it really matters. Of course, it normally won't.
That all said, the best way to do it would be to have a bit of fun and do it without wasting the memory, by writing a function to read the FileStream in and write out the appropriate bytes to your new file. That's, of course, the most complicated and likely over-kill way, but it'd be a fun undertaking.
See: Append lines to a file using a StreamWriter
Add true to the StreamWriter constructor to set it to "Append" mode. Note that this adds a line at the bottom of the document, so you may have to fiddle a bit to insert or overwrite it at the top instead.
And see: Edit a specific Line of a Text File in C#
Apparently, it's not that easy to just insert or overwrite a single line and the usual method is just to copy all lines while replacing the one you want and writing every line back to the file.
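Applied to the question, the copy-and-replace idea might look roughly like this. It is a sketch under assumptions: the names FirstLineCutter and CutFirstNonEmptyLine and the ".tmp" suffix are made up, and error handling is left out. It copies the source to a temporary file, writes an empty line in place of the first non-empty one, appends that line to the target file, and then swaps the temporary file in.

```csharp
using System;
using System.IO;

public static class FirstLineCutter
{
    // Blanks the first non-empty line of sourcePath, appends that line to
    // targetPath, and returns it (or null if every line was empty).
    public static string CutFirstNonEmptyLine(string sourcePath, string targetPath)
    {
        string tempPath = sourcePath + ".tmp"; // hypothetical temp-file name
        string cutLine = null;

        using (var reader = new StreamReader(sourcePath))
        using (var writer = new StreamWriter(tempPath))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (cutLine == null && line.Length > 0)
                {
                    cutLine = line;
                    writer.WriteLine(); // leave an empty line in its place
                }
                else
                {
                    writer.WriteLine(line);
                }
            }
        }

        if (cutLine != null)
            File.AppendAllText(targetPath, cutLine + Environment.NewLine);

        // Replace the original only after the copy has been written completely.
        File.Delete(sourcePath);
        File.Move(tempPath, sourcePath);
        return cutLine;
    }
}
```

The reason the original code produced an empty document is that opening a StreamWriter on an existing file without the append flag truncates it; writing to a separate file first avoids that.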

How to read an entire file to a string using C#?

What is the quickest way to read a text file into a string variable?
I understand it can be done in several ways, such as read individual bytes and then convert those to string. I was looking for a method with minimal coding.
How about File.ReadAllText:
string contents = File.ReadAllText(@"C:\temp\test.txt");
A benchmark comparison of File.ReadAllLines vs StreamReader ReadLine from C# file handling
Results. StreamReader is much faster for large files with 10,000+
lines, but the difference for smaller files is negligible. As always,
plan for varying sizes of files, and use File.ReadAllLines only when
performance isn't critical.
StreamReader approach
As the File.ReadAllText approach has been suggested by others, you can also try the StreamReader approach below, which appears to be quicker (I have not measured the performance impact quantitatively; see the comparison below). The difference in performance will be visible only for larger files, though.
string readContents;
using (StreamReader streamReader = new StreamReader(path, Encoding.UTF8))
{
readContents = streamReader.ReadToEnd();
}
Comparison of File.Readxxx() vs StreamReader.Readxxx()
Viewing the indicative code through ILSpy I have found the following about File.ReadAllLines, File.ReadAllText.
File.ReadAllText - Uses StreamReader.ReadToEnd internally
File.ReadAllLines - Also uses StreamReader.ReadLine internally, with the additional overhead of creating the List<string> to return as the read lines and looping till the end of file.
So both the methods are an additional layer of convenience built on top of StreamReader. This is evident by the indicative body of the method.
File.ReadAllText() implementation as decompiled by ILSpy
public static string ReadAllText(string path)
{
if (path == null)
{
throw new ArgumentNullException("path");
}
if (path.Length == 0)
{
throw new ArgumentException(Environment.GetResourceString("Argument_EmptyPath"));
}
return File.InternalReadAllText(path, Encoding.UTF8);
}
private static string InternalReadAllText(string path, Encoding encoding)
{
string result;
using (StreamReader streamReader = new StreamReader(path, encoding))
{
result = streamReader.ReadToEnd();
}
return result;
}
string contents = System.IO.File.ReadAllText(path);
Here's the MSDN documentation
For the noobs out there who find this stuff fun and interesting, the fastest way to read an entire file into a string in most cases (according to these benchmarks) is by the following:
using (StreamReader sr = File.OpenText(fileName))
{
string s = sr.ReadToEnd();
}
//you then have to process the string
However, the absolute fastest to read a text file overall appears to be the following:
using (StreamReader sr = File.OpenText(fileName))
{
string s = String.Empty;
while ((s = sr.ReadLine()) != null)
{
//do what you have to here
}
}
Put up against several other techniques, it won out most of the time, including against the BufferedReader.
Take a look at the File.ReadAllText() method
Some important remarks:
This method opens a file, reads each line of the file, and then adds
each line as an element of a string. It then closes the file. A line
is defined as a sequence of characters followed by a carriage return
('\r'), a line feed ('\n'), or a carriage return immediately followed
by a line feed. The resulting string does not contain the terminating
carriage return and/or line feed.
This method attempts to automatically detect the encoding of a file
based on the presence of byte order marks. Encoding formats UTF-8 and
UTF-32 (both big-endian and little-endian) can be detected.
Use the ReadAllText(String, Encoding) method overload when reading
files that might contain imported text, because unrecognized
characters may not be read correctly.
The file handle is guaranteed to be closed by this method, even if
exceptions are raised
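The remark about the ReadAllText(String, Encoding) overload can be illustrated with a short sketch. The class name EncodingAwareReader and the choice of ISO-8859-1 are assumptions made for the example; substitute whatever encoding your file really uses.

```csharp
using System.IO;
using System.Text;

public static class EncodingAwareReader
{
    // Reads the whole file, decoding its bytes as ISO-8859-1 (Latin-1)
    // instead of relying on byte-order-mark detection.
    public static string ReadAllLatin1(string path)
    {
        return File.ReadAllText(path, Encoding.GetEncoding("iso-8859-1"));
    }
}
```

A file saved in a legacy single-byte encoding has no byte order mark, so the default overload would decode it as UTF-8 and bytes above 0x7F would likely come out as replacement characters.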
string text = File.ReadAllText("Path"); you have all text in one string variable. If you need each line individually you can use this:
string[] lines = File.ReadAllLines("Path");
System.IO.StreamReader myFile =
new System.IO.StreamReader("c:\\test.txt");
string myString = myFile.ReadToEnd();
if you want to pick file from Bin folder of the application then you can try following and don't forget to do exception handling.
string content = File.ReadAllText(Path.Combine(System.IO.Directory.GetCurrentDirectory(), @"FilesFolder\Sample.txt"));
@Cris sorry. This is a quote from MSDN (Microsoft):
Methodology
In this experiment, two classes will be compared. The StreamReader and the FileStream class will be directed to read two files of 10K and 200K in their entirety from the application directory.
StreamReader (VB.NET)
sr = New StreamReader(strFileName)
Do
line = sr.ReadLine()
Loop Until line Is Nothing
sr.Close()
FileStream (VB.NET)
Dim fs As FileStream
Dim temp As UTF8Encoding = New UTF8Encoding(True)
Dim b(1024) As Byte
fs = File.OpenRead(strFileName)
Do While fs.Read(b, 0, b.Length) > 0
temp.GetString(b, 0, b.Length)
Loop
fs.Close()
Result
FileStream is obviously faster in this test. It takes an additional 50% more time for StreamReader to read the small file. For the large file, it took an additional 27% of the time.
StreamReader is specifically looking for line breaks while FileStream does not. This will account for some of the extra time.
Recommendations
Depending on what the application needs to do with a section of data, there may be additional parsing that will require additional processing time. Consider a scenario where a file has columns of data and the rows are CR/LF delimited. The StreamReader would work down the line of text looking for the CR/LF, and then the application would do additional parsing looking for a specific location of data. (Did you think String.Substring comes without a price?)
On the other hand, the FileStream reads the data in chunks and a proactive developer could write a little more logic to use the stream to his benefit. If the needed data is in specific positions in the file, this is certainly the way to go as it keeps the memory usage down.
FileStream is the better mechanism for speed but will take more logic.
well the quickest way meaning with the least possible C# code is probably this one:
string readText = System.IO.File.ReadAllText(path);
you can use :
public static void ReadFileToEnd()
{
try
{
//provide to reader your complete text file
using (StreamReader sr = new StreamReader("TestFile.txt"))
{
String line = sr.ReadToEnd();
Console.WriteLine(line);
}
}
catch (Exception e)
{
Console.WriteLine("The file could not be read:");
Console.WriteLine(e.Message);
}
}
string content = System.IO.File.ReadAllText(@"C:\file.txt");
You can use it like this:
public static string ReadFileAndFetchStringInSingleLine(string file)
{
StringBuilder sb;
try
{
sb = new StringBuilder();
using (FileStream fs = File.Open(file, FileMode.Open))
{
using (BufferedStream bs = new BufferedStream(fs))
{
using (StreamReader sr = new StreamReader(bs))
{
string str;
while ((str = sr.ReadLine()) != null)
{
sb.Append(str);
}
}
}
}
return sb.ToString();
}
catch (Exception ex)
{
return "";
}
}
Hope this will help you.
you can read a text from a text file in to string as follows also
string str = "";
StreamReader sr = new StreamReader(Application.StartupPath + "\\Sample.txt");
while(sr.Peek() != -1)
{
str = str + sr.ReadLine();
}
I made a comparison between a ReadAllText and StreamBuffer for a 2Mb csv and it seemed that the difference was quite small but ReadAllText seemed to take the upper hand from the times taken to complete functions.
I'd highly recommend using File.ReadLines(path) compared to StreamReader or any other file reading method. Please find below the detailed performance benchmark for both a small file and a large file.
I hope this would help.
File operations read result: [benchmark result images for a small file (just 8 lines) and a larger file (128465 lines) are not available]
Readlines Example:
public void ReadFileUsingReadLines()
{
var contents = File.ReadLines(path);
}
Note : Benchmark is done in .NET 6.
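One thing to keep in mind when reading such numbers: File.ReadLines is lazy, so calling it does not read the file until the result is enumerated. A small sketch of actually consuming the lines follows; the names LazyLineReader and CountNonEmptyLines are hypothetical.

```csharp
using System;
using System.IO;
using System.Linq;

public static class LazyLineReader
{
    // Counts the non-empty lines of a file. File.ReadLines yields one line at a
    // time while Count enumerates it, so the whole file is never held in memory.
    public static int CountNonEmptyLines(string path)
    {
        return File.ReadLines(path).Count(line => line.Length > 0);
    }
}
```

If you need the entire content as one string, File.ReadAllText remains the more direct call; ReadLines fits when each line can be processed on its own.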
This comment is for those who are trying to read a complete text file in a WinForms application using C++/CLI with the help of the .NET File.ReadAllText function:
using namespace System;
using namespace System::IO;
String^ filename = gcnew String(charfilename);
if (System::IO::File::Exists(filename))
{
String^ data = System::IO::File::ReadAllText(filename)->Replace("\0", Environment::NewLine);
textBox1->Text = data;
}

Best way to write huge string into a file

In C#, I'm reading a moderately sized file (100 KB ~ 1 MB), modifying some parts of the content, and finally writing to a different file. All contents are text. Modification is done with string objects and string operations. My current approach is:
Read each line from the original file by using StreamReader.
Open a StringBuilder for the contents of the new file.
Modify the string object and call AppendLine of the StringBuilder (until the end of the file)
Open a new StreamWriter, and write the StringBuilder to the write stream.
However, I've found that StreamWriter.Write truncates at 32768 bytes (2^15), but the length of the StringBuilder is greater than that. I could write a simple loop to guarantee the entire string is written to a file. But I'm wondering what would be the most efficient way in C# for doing this task?
To summarize, I'd like to modify only some parts of a text file and write to a different file. But, the text file size could be larger than 32768 bytes.
== Answer == Sorry for the confusion! I just hadn't called Flush. StreamWriter.Write does not have a short (e.g., 2^16) limitation.
StreamWriter.Write does not truncate the string and has no limitation.
Internally it uses String.CopyTo which on the other hand uses unsafe code (using fixed) to copy chars so it is the most efficient.
The problem is most likely related to not closing the writer. See http://msdn.microsoft.com/en-us/library/system.io.streamwriter.flush.aspx.
But I would suggest not loading the whole file in memory if that can be avoided.
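The effect of a missing flush or close can be observed with a small experiment. This is a sketch only; FlushDemo and Measure are invented names, and a MemoryStream stands in for the file so the byte count can be inspected directly.

```csharp
using System;
using System.IO;
using System.Text;

public static class FlushDemo
{
    // Writes text through a StreamWriter and reports how many bytes had reached
    // the underlying stream before and after the writer was disposed.
    public static (long BeforeDispose, long AfterDispose) Measure(string text)
    {
        var stream = new MemoryStream();
        long before;
        // UTF8Encoding(false) writes no byte order mark; leaveOpen keeps the
        // MemoryStream usable after the writer is disposed.
        using (var writer = new StreamWriter(stream, new UTF8Encoding(false), 1024, leaveOpen: true))
        {
            writer.Write(text);
            before = stream.Length; // may fall short: the tail can still sit in the writer's buffer
        }
        return (before, stream.Length); // disposing the writer flushed the rest
    }
}
```

Wrapping the writer in a using block, as here, is usually the simplest way to make sure everything is flushed, even when an exception is thrown.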
can you try this :
void Test()
{
using (var inputFile = File.OpenText(@"c:\in.txt"))
{
using (var outputFile = File.CreateText(@"c:\out.txt"))
{
string current;
while ((current = inputFile.ReadLine()) != null)
{
outputFile.WriteLine(Process(current));
}
}
}
}
string Process(string current)
{
return current.ToLower();
}
It avoids loading the full file into memory by processing it line by line and writing each line out directly.
Well, that entirely depends on what you want to modify. If your modifications of one part of the text file are dependent on another part of the text file, you obviously need to have both of those parts in memory. If however, you only need to modify the text file on a line-by-line basis then use something like this :
using (StreamReader sr = new StreamReader(@"test.txt"))
{
using (StreamWriter sw = new StreamWriter(@"modifiedtest.txt"))
{
while (!sr.EndOfStream)
{
string line = sr.ReadLine();
//do some modifications
sw.WriteLine(line);
sw.Flush(); //force line to be written to disk
}
}
}
Instead of running through the whole document, I would use a regex to find what you are looking for. Sample:
public List<string> GetAllProfiles()
{
List<string> profileNames = new List<string>();
using (StreamReader reader = new StreamReader(_folderLocation + "profiles.pg"))
{
string profiles = reader.ReadToEnd();
var regex = new Regex("\nname=([^\r]{0,})", RegexOptions.IgnoreCase);
var regexMatchs = regex.Matches(profiles);
profileNames.AddRange(from Match regexMatch in regexMatchs select regexMatch.Groups[1].Value);
}
return profileNames;
}
