Clearing the memory associated with a large List - C#

I followed the advice in this SO question, but it did not work for me. Here is my situation and the code associated with it.
I have a very large list with 2.04M items in it. I read it into memory to sort it, and then write it to a .csv file. I have 11 .csv files that I need to read and subsequently sort. The first iteration leaves me with a memory usage of just over 1 GB. I tried setting the list to null. I tried calling List.Clear(). I also tried List.TrimExcess(). I have also waited for the GC to do its thing, hoping it would notice that there are no reads or writes going to that list.
Here is my code that I am using. Any advice is always greatly appreciated.
foreach (var zone in zones)
{
    var filePath = string.Format("outputs/zone-{0}.csv", zone);
    var lines = new List<string>();
    using (StreamReader reader = new StreamReader(filePath))
    {
        var headers = reader.ReadLine();
        while (!reader.EndOfStream)
        {
            var line = reader.ReadLine();
            lines.Add(line);
        }
        // sort the file, then rewrite the file into inputs
        lines = lines.OrderByDescending(l => l.ElementAt(0)).ThenByDescending(l => l.ElementAt(1)).ToList();
        using (StreamWriter writer = new StreamWriter(string.Format("inputs/zone-{0}-sorted.csv", zone)))
        {
            writer.WriteLine(headers);
            writer.Flush();
            foreach (var line in lines)
            {
                writer.WriteLine(line);
                writer.Flush();
            }
        }
        lines.Clear();
        lines.TrimExcess();
    }
}

Putting the whole thing in a using won't work here:
using (var lines = new List<string>())
{ ... }
doesn't compile, because List<string> doesn't implement IDisposable. Instead, where you have lines.Clear();, add lines = null;. Dropping the last reference should encourage the garbage collector.
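For what it's worth, a minimal sketch of that suggestion applied to the loop above (ReadAndSortZone and WriteZone are hypothetical helpers standing in for the question's code; the GC.Collect() call is only there to make the drop visible while profiling and is rarely appropriate in production):
foreach (var zone in zones)
{
    var lines = ReadAndSortZone(zone);   // hypothetical: read and sort one zone file
    WriteZone(zone, lines);              // hypothetical: write the sorted output
    lines = null;                        // drop the last live reference
    GC.Collect();                        // optional: force a collection to observe the drop
}
Keep in mind the GC may simply not have run yet; a large working set after the loop doesn't necessarily mean the memory has leaked.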

Related

C# how to read lines then delete line when found specified word

I'm writing this in C#. I need my program to read a (large) XML file, search for a specific word ('Implements'), and then delete every line where this word is present, but I'm a bit confused as to how to go about it. This is what I have so far. Does anyone know what to use instead of what I'm hoping 'Deleteline' would do? I'm very new to coding. Thanks!
using System;
using System.IO;
using System.Xml;

namespace XMLReadAndParse
{
    class XMLReadandParse
    {
        static void Main(string[] args)
        {
            string result = string.Empty;
            var lines = File.ReadAllLines("CaxtonFx.FirebirdApi.xml");
            foreach (var line in lines)
            {
                if (line.Contains("Implements"))
                {
                    var text = line.Contains("Implements");
                    result = text.Deleteline(); // I know this doesn't work but not sure how to proceed
                }
            }
            Console.WriteLine(result);
            Console.ReadLine();
        }
    }
}
As far as I know, this is not possible: you cannot delete lines from the file you are reading.
You can, however, create a new file, write the lines you need to the new file, delete the original file, and rename the new file to the old name.
I wrote a small and simple example using your code above, and I have tested it; it all passes on my side :)
var originalFile = "CaxtonFx.FirebirdApi.xml";
var tempFile = "TempXmlFile.xml";

var file = File.Create(tempFile);
file.Close();

var lines = File.ReadAllLines(originalFile);
var newLines = new List<string>();
foreach (var line in lines)
{
    if (!line.Contains("Implements"))
    {
        newLines.Add(line);
    }
}

File.WriteAllText(tempFile, String.Join(Environment.NewLine, newLines));
File.Delete(originalFile);
File.Move(tempFile, originalFile);
Let me know if you need me to clarify anything.
Happy coding!
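If the file is too large to hold comfortably in memory, a streaming variant of the same idea (a sketch, not part of the tested answer above) uses File.ReadLines, which enumerates lazily, so only one line is in memory at a time:
using System.IO;
using System.Linq;

var originalFile = "CaxtonFx.FirebirdApi.xml";
var tempFile = "TempXmlFile.xml";

// The filter runs as the file is read; nothing is buffered in a list.
File.WriteAllLines(tempFile, File.ReadLines(originalFile)
                                 .Where(line => !line.Contains("Implements")));
File.Delete(originalFile);
File.Move(tempFile, originalFile);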

C# Error: OutOfMemoryException - Reading a large text file and replacing from dictionary

I'm new to C# and object-oriented programming in general. I have an application which parses a text file.
The objective of the application is to read the contents of the provided text file and replace the matching values.
When a file of about 800 MB to 1.2 GB is provided as the input, the application crashes with System.OutOfMemoryException.
On researching, I came across a couple of answers which recommend changing the target platform to x64.
The same issue exists after changing the target platform.
Following is the code:
// Reading the text file
var _data = string.Empty;
using (StreamReader sr = new StreamReader(logF))
{
    _data = sr.ReadToEnd();
    sr.Dispose();
    sr.Close();
}

foreach (var replacement in replacements)
{
    _data = _data.Replace(replacement.Key, replacement.Value);
}

// Writing the text file
using (StreamWriter sw = new StreamWriter(logF))
{
    sw.WriteLine(_data);
    sw.Dispose();
    sw.Close();
}
The error points to
_data = sr.ReadToEnd();
replacements is a dictionary. The Key contains the original word and the Value contains the word that replaces it.
The Key elements are replaced with the Value elements of the KeyValuePair.
The approach being followed is reading the file, replacing, and writing.
I tried using a StringBuilder instead of a string, yet the application still crashed.
Can this be overcome by reading the file one line at a time, then replacing and writing? What would be the most efficient and fastest way of doing this?
Update: the system memory is 8 GB, and on monitoring the performance it spikes up to 100% memory usage.
@Tim Schmelter's answer works well.
However, memory utilization spikes over 90%. It could be due to the following code:
String[] arrayofLine = File.ReadAllLines(logF);

// Generating replacement information
Dictionary<int, string> _replacementInfo = new Dictionary<int, string>();
for (int i = 0; i < arrayofLine.Length; i++)
{
    foreach (var replacement in replacements.Keys)
    {
        if (arrayofLine[i].Contains(replacement))
        {
            arrayofLine[i] = arrayofLine[i].Replace(replacement, masking[replacement]);
            if (_replacementInfo.ContainsKey(i + 1))
            {
                _replacementInfo[i + 1] = _replacementInfo[i + 1] + "|" + replacement;
            }
            else
            {
                _replacementInfo.Add(i + 1, replacement);
            }
        }
    }
}

// Creating replacement information
StringBuilder sb = new StringBuilder();
foreach (var Replacement in _replacementInfo)
{
    foreach (var replacement in Replacement.Value.Split('|'))
    {
        sb.AppendLine(string.Format("Line {0}: {1} ---> \t\t{2}", Replacement.Key, replacement, masking[replacement]));
    }
}

// Writing the replacement information
if (sb.Length != 0)
{
    using (StreamWriter swh = new StreamWriter("logF_Rep.txt"))
    {
        swh.WriteLine(sb.ToString());
        swh.Dispose();
        swh.Close();
    }
}
sb.Clear();
It finds the line number at which each replacement was made. Can this be captured using Tim's code, in order to avoid loading the data into memory multiple times?
If you have very large files you should try MemoryMappedFile, which is designed for this purpose (files > 1 GB) and lets you read "windows" of a file into memory. But it's not easy to use.
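For reference, a minimal sketch of what that looks like (the 4 KB window size is an arbitrary choice for illustration):
using System.IO;
using System.IO.MemoryMappedFiles;

// Map the file and read a 4 KB window from the start without
// pulling the rest of the file into memory.
using (var mmf = MemoryMappedFile.CreateFromFile(logF, FileMode.Open))
using (var view = mmf.CreateViewStream(0, 4096))
using (var reader = new StreamReader(view))
{
    string window = reader.ReadToEnd();
    // process the window here
}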
A simple optimization would be to read and replace line by line:
int lineNumber = 0;
var _replacementInfo = new Dictionary<int, List<string>>();
using (StreamReader sr = new StreamReader(logF))
{
    using (StreamWriter sw = new StreamWriter(logF_Temp))
    {
        while (!sr.EndOfStream)
        {
            string line = sr.ReadLine();
            lineNumber++;
            foreach (var kv in replacements)
            {
                bool contains = line.Contains(kv.Key);
                if (contains)
                {
                    List<string> lineReplaceList;
                    if (!_replacementInfo.TryGetValue(lineNumber, out lineReplaceList))
                        lineReplaceList = new List<string>();
                    lineReplaceList.Add(kv.Key);
                    _replacementInfo[lineNumber] = lineReplaceList;
                    line = line.Replace(kv.Key, kv.Value);
                }
            }
            sw.WriteLine(line);
        }
    }
}
At the end you can use File.Copy(logF_Temp, logF, true); if you want to overwrite the old file.
Read the file line by line and append each changed line to another file. At the end, replace the source file with the new one (with or without creating a backup).
var tmpFile = Path.GetTempFileName();
using (StreamReader sr = new StreamReader(logF))
{
    using (StreamWriter sw = new StreamWriter(tmpFile))
    {
        string line;
        while ((line = sr.ReadLine()) != null)
        {
            foreach (var replacement in replacements)
                line = line.Replace(replacement.Key, replacement.Value);
            sw.WriteLine(line);
        }
    }
}
File.Replace(tmpFile, logF, null); // you can pass a backup file name instead of null if you want a backup of logF
An OutOfMemoryException is thrown whenever the application tries and fails to allocate memory to perform an operation. According to Microsoft's documentation, the following operations can potentially throw an OutOfMemoryException:
Boxing (i.e., wrapping a value type in an Object)
Creating an array
Creating an object
If you try to create an infinite number of objects, then it's pretty reasonable to assume that you're going to run out of memory sooner or later.
(Note: don't forget about the garbage collector. Depending on the lifetimes of the objects being created, it will delete some of them if it determines they're no longer in use.)
What I suspect is this line:
foreach (var replacement in replacements)
{
    _data = _data.Replace(replacement.Key, replacement.Value);
}
Each call to Replace allocates a fresh copy of the entire string, so sooner or later you will run out of memory. Have you ever counted how many times that loop runs?
Try the following:
Increase the available memory.
Reduce the amount of data you are retrieving.

async reading and writing lines of text

I've found plenty of examples of how to read/write text to a file asynchronously, but I'm having a hard time finding how to do it with a List.
For the reading I've got this, which seems to work:
public async Task<List<string>> GetTextFromFile(string file)
{
    using (var reader = File.OpenText(file))
    {
        var fileText = await reader.ReadToEndAsync();
        return fileText.Split(new[] { Environment.NewLine }, StringSplitOptions.None).ToList();
    }
}
The writing is a bit trickier though ...
public async Task WriteTextToFile(string file, List<string> lines, bool append)
{
    if (!append && File.Exists(file)) File.Delete(file);

    using (var writer = File.OpenWrite(file))
    {
        StringBuilder builder = new StringBuilder();
        foreach (string value in lines)
        {
            builder.Append(value);
            builder.Append(Environment.NewLine);
        }
        Byte[] info = new UTF8Encoding(true).GetBytes(builder.ToString());
        await writer.WriteAsync(info, 0, info.Length);
    }
}
My problem with this is that for a moment my data is held in memory three times over: the original List of lines, the StringBuilder's single string with the newlines, and then the byte representation in info.
It seems excessive to keep three copies of essentially the same data in memory.
I am concerned about this because at times I'll be reading and writing large text files.
Following up on that, let me be clear: I know that for extremely large text files I can do this all line by line. What I am looking for are two methods of reading/writing data. The first is to read in the whole thing and process it, and the second is to do it line by line. Right now I am working on the first approach for my small and moderately sized text files. But I am still concerned about the data replication issue.
The following might suit your needs, as it does not store the data a second time and writes it line by line:
public async Task WriteTextToFile(string file, List<string> lines, bool append)
{
    if (!append && File.Exists(file))
        File.Delete(file);

    using (var writer = File.OpenWrite(file))
    {
        using (var streamWriter = new StreamWriter(writer))
            foreach (var line in lines)
                await streamWriter.WriteLineAsync(line);
    }
}
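If you're targeting .NET Core 2.0 or later (or .NET 5+), there is also a built-in one-call alternative; as far as I know it writes the lines one at a time rather than concatenating them first:
// Not available on classic .NET Framework.
// file and lines are the same parameters as in the method above.
await File.WriteAllLinesAsync(file, lines);
There is a matching File.AppendAllLinesAsync for the append case.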

Removing the first line of a text file in C#

I can currently remove the last line of a text file using:
var lines = System.IO.File.ReadAllLines("test.txt");
System.IO.File.WriteAllLines("test.txt", lines.Take(lines.Length - 1).ToArray());
How is it possible, though, to instead remove the beginning of the text file?
Instead of lines.Take, you can use lines.Skip, like:
var lines = File.ReadAllLines("test.txt");
File.WriteAllLines("test.txt", lines.Skip(1).ToArray());
to truncate at the beginning, even though the technique used (reading all the text and writing everything back) is very inefficient.
About the efficient way: the inefficiency comes from the need to read the whole file into memory. The alternative is to seek past the first line in a stream, copy the rest of the stream to another output file, delete the original, and rename the new file. That is equally fast and consumes far less memory.
Truncating a file at the end is much easier: just find the truncation position and call FileStream.SetLength().
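A minimal sketch of that stream-copy approach (file names assumed for illustration):
using System.IO;

var tmp = Path.GetTempFileName();
using (var reader = new StreamReader("test.txt"))
using (var writer = new StreamWriter(tmp))
{
    reader.ReadLine();                         // discard the first line
    string line;
    while ((line = reader.ReadLine()) != null)
        writer.WriteLine(line);                // copy the remainder
}
File.Replace(tmp, "test.txt", null);           // swap in the new file (pass a path instead of null for a backup)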
Here is an alternative:
using (var stream = File.OpenRead("C:\\yourfile"))
{
    var items = new LinkedList<string>();
    using (var reader = new StreamReader(stream))
    {
        reader.ReadLine(); // skip one line
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // it's far better to do the actual processing here
            items.AddLast(line);
        }
    }
}
Update
If you need an IEnumerable<string> and don't want to waste memory you could do something like this:
public static IEnumerable<string> GetFileLines(string filename)
{
    using (var stream = File.OpenRead(filename))
    {
        using (var reader = new StreamReader(stream))
        {
            reader.ReadLine(); // skip one line
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                yield return line;
            }
        }
    }
}

static void Main(string[] args)
{
    foreach (var line in GetFileLines("C:\\yourfile.txt"))
    {
        // do something with the line here.
    }
}
var lines = System.IO.File.ReadAllLines("test.txt");
System.IO.File.WriteAllLines("test.txt", lines.Skip(1).ToArray());
Skip eliminates the given number of elements from the beginning of the sequence; Take keeps only the given number of elements from the beginning and eliminates the rest.
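For illustration, with some hypothetical lines (both calls require using System.Linq;):
var lines = new[] { "header", "row1", "row2" };
var withoutFirst = lines.Skip(1).ToArray();                // "row1", "row2"
var withoutLast = lines.Take(lines.Length - 1).ToArray();  // "header", "row1"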
To remove the first line from a text file:
using System.Text.RegularExpressions;

System.IO.StreamReader reader = new System.IO.StreamReader(filePath);
string data = reader.ReadToEnd();
reader.Close();
data = Regex.Replace(data, "<.*\n", "");
System.IO.StreamWriter writer = new System.IO.StreamWriter(filePath, false);
writer.Write(data);
writer.Close();
You can also do it in one line:
File.WriteAllLines(originalFilePath, File.ReadAllLines(originalFilePath).Skip(1));
assuming you are passing your filePath as a parameter to the function.

Reading from file not fast enough, how would I speed it up?

This is the way I read a file:
public static string readFile(string path)
{
    StringBuilder stringFromFile = new StringBuilder();
    StreamReader SR;
    string S;
    SR = File.OpenText(path);
    S = SR.ReadLine();
    while (S != null)
    {
        stringFromFile.Append(SR.ReadLine());
    }
    SR.Close();
    return stringFromFile.ToString();
}
The problem is that it takes so long (the .txt file is about 2.5 MB): over 5 minutes. Is there a better way?
Solution taken:
public static string readFile(string path)
{
    return File.ReadAllText(path);
}
Took less than 1 second... :)
S = SR.ReadLine();
while (S != null)
{
    stringFromFile.Append(SR.ReadLine());
}
Of note here: S is never reassigned after that initial ReadLine(), so once you enter the while loop, the S != null condition never becomes false and the loop never exits. Try:
S = SR.ReadLine();
while (S != null)
{
    stringFromFile.Append(S = SR.ReadLine());
}
or use one of the other comments.
If you need to remove newlines, use string.Replace(Environment.NewLine, "")
Leaving aside the horrible variable names and the lack of a using statement (you won't close the file if there are any exceptions), that should be okay, and it certainly shouldn't take 5 minutes to read 2.5 MB.
Where does the file live? Is it on a flaky network share?
By the way, the only difference between what you're doing and using File.ReadAllText is that you're losing line breaks. Is this deliberate? How long does ReadAllText take?
return System.IO.File.ReadAllText(path);
Marcus Griep has it right: it's taking so long because you have an infinite loop. I copied your code, made his changes, and it read a 2.4 MB text file in less than a second.
But I think you might miss the first line of the file. Try this:
S = SR.ReadLine();
while (S != null)
{
    stringFromFile.Append(S);
    S = SR.ReadLine();
}
Do you need the entire 2.5 MB in memory at once?
If not, I would try to work with only what you need.
Use System.IO.File.ReadAllLines instead.
http://msdn.microsoft.com/en-us/library/system.io.file.readalllines.aspx
Alternatively, estimating the character count and passing that to StringBuilder's constructor as the capacity should speed it up.
Try this; it should be much faster:
var str = System.IO.File.ReadAllText(path);
return str.Replace(Environment.NewLine, "");
By the way: Next time you're in a similar situation, try pre-allocating memory. This improves runtime drastically, regardless of the exact data structures you use. Most containers (StringBuilder as well) have a constructor that allow you to reserve memory. This way, less time-consuming reallocations are necessary during the read process.
For example, you could write the following if you want to read data from a file into a StringBuilder:
var info = new FileInfo(path);
var sb = new StringBuilder((int)info.Length);
(Cast necessary because System.IO.FileInfo.Length is long.)
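For instance, a sketch combining that pre-sized StringBuilder with the read loop from the question (same behavior of dropping the line breaks):
var info = new FileInfo(path);
var sb = new StringBuilder((int)info.Length);  // reserve capacity up front
using (var reader = File.OpenText(path))
{
    string line;
    while ((line = reader.ReadLine()) != null)
        sb.Append(line);                       // appends without repeated reallocation
}
string result = sb.ToString();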
ReadAllText was a very good solution for me. I used the following code for a 3,000,000-row text file and it took 4-5 seconds to read all rows.
string fileContent = System.IO.File.ReadAllText(txtFilePath.Text);
string[] arr = fileContent.Split('\n');
The loop and StringBuilder may be redundant; try using ReadToEnd.
To read a text file as fast as possible, you can use something like this:
public static string ReadFileAndFetchStringInSingleLine(string file)
{
    StringBuilder sb;
    try
    {
        sb = new StringBuilder();
        using (FileStream fs = File.Open(file, FileMode.Open))
        {
            using (BufferedStream bs = new BufferedStream(fs))
            {
                using (StreamReader sr = new StreamReader(bs))
                {
                    string str;
                    while ((str = sr.ReadLine()) != null)
                    {
                        sb.Append(str);
                    }
                }
            }
        }
        return sb.ToString();
    }
    catch (Exception ex)
    {
        return "";
    }
}
Hope this will help you. For more info, please see the following link:
Fastest Way to Read Text Files
