I'm trying to read through a bunch of zipped files without opening them in a file viewer, because that takes too much time, so I'm reading them in through a stream. However, for large files (10 GB+) it can't read them and kills the thread. There has to be a way to fix this so that it reads them regardless of file size. Please help.
The following code is throwing an OutOfMemoryException around the StreamReader portion.
using (FileStream zipToOpen = new FileStream(fileLocation + "\\" + zipfile + ".zip", FileMode.Open))
{
    using (ZipArchive archive = new ZipArchive(zipToOpen, ZipArchiveMode.Read))
    {
        int index = 0;
        List<Tuple<string, int, int>> reportFiles = new List<Tuple<string, int, int>>();
        int tranTypeTranSet = 0;
        int tranTypeTrans = 0;
        while (index < archive.Entries.Count)
        {
            if (archive.Entries[index].FullName.StartsWith("asdf"))
            {
                backgroundWorker.ReportProgress(index, archive.Entries.Count);
                ZipArchiveEntry readmeEntry = archive.Entries[index];
                using (StreamReader reader = new StreamReader(readmeEntry.Open()))
                {
                    while (!reader.EndOfStream)
                    {
                        string contents = reader.ReadToEnd();
                        int fileTranSet = Regex.Matches(contents, transsetString).Count;
                        int fileTran = Regex.Matches(contents, transstring).Count;
                        tranTypeTranSet += fileTranSet;
                        tranTypeTrans += fileTran;
                        reportFiles.Add(new Tuple<string, int, int>(readmeEntry.FullName, fileTranSet, fileTran));
                        totalTypeTrans = tranTypeTrans;
                        totalTypeTranSet = tranTypeTranSet;
                    }
                }
            }
            index++;
        }
        Directory.CreateDirectory(baseReportDirectoryLocation);
        createReports(ReportType, reportFiles, totalTypeTranSet, totalTypeTrans);
    }
}
Here's the stack trace:
System.OutOfMemoryException
  HResult=0x8007000E
  Message=Insufficient memory to continue the execution of the program.
  Source=mscorlib
  StackTrace:
   at System.Text.StringBuilder.ExpandByABlock(Int32 minBlockCharCount)
   at System.Text.StringBuilder.Append(Char* value, Int32 valueCount)
   at System.Text.StringBuilder.Append(Char[] value, Int32 startIndex, Int32 charCount)
   at System.IO.StreamReader.ReadToEnd()
   at TransactionCounts.Form1.OnGenerate() in ______.cs:line 162
   at TransactionCounts.Form1.BackgroundWorker1_DoWork(Object sender, DoWorkEventArgs e) in Form1.cs:line 285
   at System.ComponentModel.BackgroundWorker.OnDoWork(DoWorkEventArgs e)
   at System.ComponentModel.BackgroundWorker.WorkerThreadStart(Object argument)
As @EtiennedeMartel and @GrumpyCrouton have already mentioned, you should not try to read huge files into memory; it never ends well. It looks like your ultimate goal is to search the files with regular expressions, so you need a way to partition each file into smaller chunks, run the regex over the chunks, and then join the results back together. You can use the Gigantor class library for this by replacing your inner while loop with something like the following...
using Imagibee.Gigantor;
...
// Create a synchronization event
AutoResetEvent progress = new(false);

// Create the Regex instances
Regex transSetRegex = new(transsetString, RegexOptions.Compiled);
Regex transRegex = new(transstring, RegexOptions.Compiled);

// Create the first searcher and start it
RegexSearcher transSetSearcher = new(progress);
transSetSearcher.Start(pathToTextFile, transSetRegex, 80);

// Create the second searcher and start it
RegexSearcher transSearcher = new(progress);
transSearcher.Start(pathToTextFile, transRegex, 80);

// Put both searchers in a list
var mySearchers = new List<RegexSearcher>() { transSetSearcher, transSearcher };

// Wait for both searchers to complete
RegexSearcher.Wait(
    mySearchers,
    progress,
    (_) => { Console.Write('.'); },
    1000);
Console.Write('\n');

// Now it's finished, check for errors and do something with the matches
foreach (var searcher in mySearchers) {
    if (searcher.LastError.Length != 0) {
        throw new Exception(searcher.LastError);
    }
    Console.WriteLine($"found {searcher.MatchCount} matches in {searcher.Path}");
    foreach (var matchData in searcher.GetMatchData()) {
        foreach (var match in matchData.Matches) {
            Console.WriteLine($"{match}");
        }
    }
}
Gigantor does not currently have a way to work with compressed files, but I can see how that is a need and have filed this issue. For now you would need to decompress the file yourself, or look at how Gigantor makes the sausage and roll your own. I hope this helps.
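In the meantime, here is a minimal sketch (not part of Gigantor's API) of decompressing a matching entry to a temporary file so it can be searched on disk instead of in memory; the temp-file naming is arbitrary:
using System;
using System.IO;
using System.IO.Compression;

static string ExtractEntryToTempFile(ZipArchiveEntry entry)
{
    // Stream the entry out in small buffers so memory use stays flat
    // even for multi-gigabyte entries.
    string tempPath = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
    using (Stream entryStream = entry.Open())
    using (FileStream fileStream = File.Create(tempPath))
    {
        entryStream.CopyTo(fileStream);
    }
    return tempPath; // pass this path to the searcher, then delete the temp file
}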
I'm new to C# and object-oriented programming in general. I have an application that parses a text file.
The objective of the application is to read the contents of the provided text file and replace the matching values.
When a file of about 800 MB to 1.2 GB is provided as input, the application crashes with a System.OutOfMemoryException.
While researching, I came across a couple of answers that recommend changing the target platform to x64, but the same issue persists after changing the target platform.
Following is the code:
// Reading the text file
var _data = string.Empty;
using (StreamReader sr = new StreamReader(logF))
{
_data = sr.ReadToEnd();
sr.Dispose();
sr.Close();
}
foreach (var replacement in replacements)
{
_data = _data.Replace(replacement.Key, replacement.Value);
}
//Writing The text File
using (StreamWriter sw = new StreamWriter(logF))
{
sw.WriteLine(_data);
sw.Dispose();
sw.Close();
}
The error points to
_data = sr.ReadToEnd();
replacements is a dictionary. The Key contains the original word and the Value contains the word to replace it with.
Each Key element is replaced with the Value element of its KeyValuePair.
The approach being followed is: read the file, replace, and write.
I tried using a StringBuilder instead of a string, yet the application still crashed.
Can this be overcome by reading the file one line at a time, replacing, and writing? What would be the most efficient and fastest way of doing this?
Update: The system memory is 8 GB, and monitoring the performance shows memory usage spiking up to 100%.
@Tim Schmelter's answer works well. However, the memory utilization still spikes over 90%. It could be due to the following code:
String[] arrayofLine = File.ReadAllLines(logF);
// Generating Replacement Information
Dictionary<int, string> _replacementInfo = new Dictionary<int, string>();
for (int i = 0; i < arrayofLine.Length; i++)
{
foreach (var replacement in replacements.Keys)
{
if (arrayofLine[i].Contains(replacement))
{
arrayofLine[i] = arrayofLine[i].Replace(replacement, masking[replacement]);
if (_replacementInfo.ContainsKey(i + 1))
{
_replacementInfo[i + 1] = _replacementInfo[i + 1] + "|" + replacement;
}
else
{
_replacementInfo.Add(i + 1, replacement);
}
}
}
}
//Creating Replacement Information
StringBuilder sb = new StringBuilder();
foreach (var Replacement in _replacementInfo)
{
foreach (var replacement in Replacement.Value.Split('|'))
{
sb.AppendLine(string.Format("Line {0}: {1} ---> \t\t{2}", Replacement.Key, replacement, masking[replacement]));
}
}
// Writing the replacement information
if (sb.Length!=0)
{
using (StreamWriter swh = new StreamWriter("logF_Rep.txt"))
{
swh.WriteLine(sb.ToString());
swh.Dispose();
swh.Close();
}
}
sb.Clear();
It finds the line number on which the replacement was made. Can this be captured with Tim's code, in order to avoid loading the data into memory multiple times?
If you have very large files you should try MemoryMappedFile, which is designed for this purpose (files > 1 GB) and lets you read "windows" of a file into memory. But it's not easy to use.
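To give a rough idea, here is a sketch of reading a large file in fixed-size windows with MemoryMappedFile; the 64 MB window size is arbitrary, and matches or replacements that span a window boundary would still need special handling, which is part of why it is not easy:
using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Text;

const long windowSize = 64 * 1024 * 1024; // 64 MB per view, arbitrary
long fileLength = new FileInfo(logF).Length;

using (var mmf = MemoryMappedFile.CreateFromFile(logF, FileMode.Open))
{
    for (long offset = 0; offset < fileLength; offset += windowSize)
    {
        long size = Math.Min(windowSize, fileLength - offset);
        using (var view = mmf.CreateViewStream(offset, size))
        using (var reader = new StreamReader(view, Encoding.UTF8))
        {
            string chunk = reader.ReadToEnd();
            // process the chunk here; note that a multi-byte character or a
            // search key that straddles two windows needs extra care
        }
    }
}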
A simple optimization would be to read and replace line by line
int lineNumber = 0;
var _replacementInfo = new Dictionary<int, List<string>>();
using (StreamReader sr = new StreamReader(logF))
{
using (StreamWriter sw = new StreamWriter(logF_Temp))
{
while (!sr.EndOfStream)
{
string line = sr.ReadLine();
lineNumber++;
foreach (var kv in replacements)
{
bool contains = line.Contains(kv.Key);
if (contains)
{
List<string> lineReplaceList;
if (!_replacementInfo.TryGetValue(lineNumber, out lineReplaceList))
lineReplaceList = new List<string>();
lineReplaceList.Add(kv.Key);
_replacementInfo[lineNumber] = lineReplaceList;
line = line.Replace(kv.Key, kv.Value);
}
}
sw.WriteLine(line);
}
}
}
At the end you can use File.Copy(logF_Temp, logF, true); if you want to overwrite the old file.
Read the file line by line and append each changed line to another file. At the end, replace the source file with the new one (with or without creating a backup).
var tmpFile = Path.GetTempFileName();
using (StreamReader sr = new StreamReader(logF))
{
using (StreamWriter sw = new StreamWriter(tmpFile))
{
string line;
while ((line = sr.ReadLine()) != null)
{
foreach (var replacement in replacements)
line = line.Replace(replacement.Key, replacement.Value);
sw.WriteLine(line);
}
}
}
File.Replace(tmpFile, logF, null); // you can pass a backup file name instead of null if you want a backup of the logF file
An OutOfMemoryException is thrown whenever the application tries and fails to allocate memory to perform an operation. According to Microsoft's documentation, the following operations can potentially throw an OutOfMemoryException:
Boxing (i.e., wrapping a value type in an Object)
Creating an array
Creating an object
If you try to create an infinite number of objects, then it's pretty reasonable to assume that you're going to run out of memory sooner or later.
(Note: don't forget about the garbage collector. Depending on the lifetimes of the objects being created, it will delete some of them if it determines they're no longer in use.)
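A contrived sketch that forces the exception by allocating until a request can no longer be satisfied (the 100 MB block size is arbitrary):
using System;
using System.Collections.Generic;

var blocks = new List<byte[]>();
try
{
    while (true)
    {
        blocks.Add(new byte[100_000_000]); // keep ~100 MB blocks alive so the GC cannot reclaim them
    }
}
catch (OutOfMemoryException ex)
{
    Console.WriteLine($"Allocation failed after {blocks.Count} blocks: {ex.Message}");
}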
What I suspect is this line:
foreach (var replacement in replacements)
{
_data = _data.Replace(replacement.Key, replacement.Value);
}
Sooner or later you will run out of memory. Have you ever counted how many times it loops?
Try one of the following:
Increase the available memory.
Reduce the amount of data you are retrieving.
I followed the advice in this SO question, but it did not work for me. Here is my situation and the code associated with it.
I have a very large list with 2.04 million items in it. I read each file into memory to sort it, and then write it out to a .csv file. I have 11 .csv files that I need to read and subsequently sort. The first iteration gives me a memory usage of just over 1 GB. I tried setting the list to null, calling List.Clear(), and calling List.TrimExcess(). I have also waited for the GC to do its thing, hoping it would notice that there are no more reads or writes to that list.
Here is the code I am using. Any advice is always greatly appreciated.
foreach (var zone in zones)
{
var filePath = string.Format("outputs/zone-{0}.csv", zone);
var lines = new List<string>();
using (StreamReader reader = new StreamReader(filePath))
{
var headers = reader.ReadLine();
while(! reader.EndOfStream)
{
var line = reader.ReadLine();
lines.Add(line);
}
//sort the file, then rewrite the file into inputs
lines = lines.OrderByDescending(l => l.ElementAt(0)).ThenByDescending(l => l.ElementAt(1)).ToList();
using (StreamWriter writer = new StreamWriter(string.Format("inputs/zone-{0}-sorted.csv", zone)))
{
writer.WriteLine(headers);
writer.Flush();
foreach (var line in lines)
{
writer.WriteLine(line);
writer.Flush();
}
}
lines.Clear();
lines.TrimExcess();
}
}
Try putting the whole thing in a using:
using (var lines = new List<string>())
{ ... }
Although I'm not sure about the nested usings.
Instead, where you have lines.Clear();, add lines = null;. That should encourage the garbage collector.
I get an out of memory exception a few seconds after I execute the following code. It doesn't write anything before the exception is thrown. The text file I'm reading is about half a gigabyte in size, and the text file I'm writing will end up being about three quarters of a gigabyte. Are there any tricks to get around this exception? I assume it's thrown because the text file is too large.
public static void ToCSV(string fileWRITE, string fileREAD)
{
StreamWriter commas = new StreamWriter(fileWRITE);
var readfile = File.ReadAllLines(fileREAD);
foreach (string y in readfile)
{
string q = (y.Substring(0,15)+","+y.Substring(15,1)+","+y.Substring(16,6)+","+y.Substring(22,6)+ ",NULL,NULL,NULL,NULL");
commas.WriteLine(q);
}
commas.Close();
}
I have changed my code to the following, yet I still get the same exception:
public static void ToCSV(string fileWRITE, string fileREAD)
{
StreamWriter commas = new StreamWriter(fileWRITE);
using (FileStream fs = File.Open(fileREAD, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
using (BufferedStream bs = new BufferedStream(fs))
using (StreamReader sr = new StreamReader(bs))
{
string y;
while ((y = sr.ReadLine()) != null)
{
string q = (y.Substring(0, 15) + "," + y.Substring(15, 1) + "," + y.Substring(16, 6) + "," + y.Substring(22, 6) + ",NULL,NULL,NULL,NULL");
commas.WriteLine(q);
}
}
commas.Close();
}
Read the file line by line; that will help you avoid the OutOfMemoryException. I also personally prefer using blocks to handle streams, since they make sure the file is closed even if an exception occurs.
public static void ToCSV(string fileWRITE, string fileREAD)
{
using(var commas = new StreamWriter(fileWRITE))
using(var file = new StreamReader(fileREAD))
{
var line = file.ReadLine();
while( line != null )
{
string q = (line.Substring(0,15)+","+line.Substring(15,1)+","+line.Substring(16,6)+","+line.Substring(22,6)+",NULL,NULL,NULL,NULL");
commas.WriteLine(q);
line = file.ReadLine();
}
}
}
In the following post you can find numerous methods for reading and writing large files: Reading large text files with streams in C#.
Basically you just need to read the bytes into a buffer that is reused, so only a very small amount of the file is in memory at a time.
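A minimal sketch of that reusable-buffer idea, assuming fileREAD from the question and an arbitrary buffer size:
using System.IO;

const int bufferSize = 81920; // ~80 KB, reused for every read
byte[] buffer = new byte[bufferSize];

using (var input = File.OpenRead(fileREAD))
{
    int bytesRead;
    while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
    {
        // only buffer[0..bytesRead) is ever in memory at once; for
        // line-oriented text it is usually simpler to let StreamReader
        // do this buffering and call ReadLine instead
    }
}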
Instead of reading the whole file, read and process it line by line.
That way you do not risk running into an out of memory exception: even if you manage to give the program more memory, a day will come when a file is again too large.
The program may lose speed if it uses less memory, so basically you have to balance memory use against execution time. One workaround is to use buffered output, read more than one line at a time, or transform the strings in multiple threads.
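As a rough sketch of the buffered-output variant (Transform is a hypothetical placeholder for the substring formatting above, and the 1 MB writer buffer is arbitrary):
using System.IO;
using System.Text;

using (var reader = new StreamReader(fileREAD))
using (var writer = new StreamWriter(fileWRITE, false, Encoding.UTF8, 1 << 20)) // 1 MB output buffer
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        writer.WriteLine(Transform(line)); // Transform(string) stands in for the substring logic
    }
}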
I put a lock around the few lines that are supposed to open and read the cookie files, but I sometimes still see an error saying the file is already in use, so I'm not sure what's going wrong.
Code:
private Object cookieLock = new Object();

// inside Main (thread start-up):
for (int j = 0; j < maxThreads; j++)
{
    // Thread thread = new Thread(new ThreadStart(startPosting2(worker)));
    Thread thread = new Thread(() => postingFunction2());
    thread.IsBackground = true;
    thread.Start();
}

public void postingFunction2()
{
    string tmpUsername = string.Empty;
    string[] array2 = null;
    try
    {
        lock (cookieLock)
        {
            array2 = File.ReadAllLines(tmpUsername + ".txt");
        }
    }
    catch (Exception ex)
    {
        TextWriter sUrl = new StreamWriter("readingSameCookieFile.txt", true);
        sUrl.WriteLine(ex.ToString());
        sUrl.Close();
    }
}
Am I doing anything wrong? These lines are executed by 20-100 threads simultaneously. I don't see the error often, but I do see it from time to time, so I'm wondering why.
TXT FILE ERROR:
System.IO.IOException: The process cannot access the file 'C:\Users\Administrator\My Projects\Bot Files\2 Message Poster Bot\MessagePoster - NoLog - Copy\cookies\LaureneZimmerebner57936.txt' because it is being used by another process.
I would suggest reading the file only once and sharing array2 among the 20-100 threads, because reading it multiple times will cause performance degradation. In a multithreaded environment it is also recommended to keep all I/O operations in a single thread.
Sharing array2 won't require locks if all the threads only read it.
Debug.Write(fileName); // to make sure each thread opens a different file
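A minimal sketch of that read-once idea, assuming postingFunction2 is changed to accept the shared array as a parameter:
using System.IO;
using System.Threading;

// Read once on the main thread; afterwards the workers only read sharedLines,
// so no lock is needed.
string[] sharedLines = File.ReadAllLines(tmpUsername + ".txt");

for (int j = 0; j < maxThreads; j++)
{
    Thread thread = new Thread(() => postingFunction2(sharedLines));
    thread.IsBackground = true;
    thread.Start();
}
If any thread needs to modify the data, give it a private copy instead of mutating the shared array.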
You are trying to read cookies; maybe the browser or some other application outside your code is accessing or writing to the cookie file, hence the exception.
You have not posted the entire code, so make sure the lock object is not instantiated multiple times, or make it static to be sure.
Also try adding Thread.Sleep(0) after reading and see if that helps.
If you are writing the contents of array2 to another file, make sure that writer is disposed/closed properly after writing.
Try putting the entire method inside the lock block:
public void postingFunction2()
{
lock (cookieLock)
{
string tmpUsername = string.Empty;
string[] array2 = null;
try
{
array2 = File.ReadAllLines(tmpUsername + ".txt");
}
catch(Exception ex)
{
TextWriter sUrl = new StreamWriter("readingSameCookieFile.txt", true);
sUrl.WriteLine(ex.ToString());
sUrl.Close();
}
}
}
If you just want to work around the problem you are having, and you are sure the file is not being written to, you can use this function instead of File.ReadAllLines. The key is the share option it passes: FileShare.ReadWrite.
private static string[] ReadAllLines(string fileName)
{
using (var fs = new FileStream(fileName, FileMode.Open,
FileAccess.Read,
FileShare.ReadWrite))
{
var reader = new StreamReader(fs);
//read all at once and split, or can read line by line into a list if you prefer
var allLines = reader.ReadToEnd().Split(new string[] { "\r\n", "\r", "\n" },
StringSplitOptions.None);
return allLines;
}
}
I can currently remove the last line of a text file using:
var lines = System.IO.File.ReadAllLines("test.txt");
System.IO.File.WriteAllLines("test.txt", lines.Take(lines.Length - 1).ToArray());
However, how is it possible to instead remove the beginning of the text file?
Instead of lines.Take, you can use lines.Skip, like:
var lines = File.ReadAllLines("test.txt");
File.WriteAllLines("test.txt", lines.Skip(1).ToArray());
This truncates at the beginning, although the technique used (read all the text and write everything back) is very inefficient.
About the efficient way: the inefficiency comes from the need to read the whole file into memory. The alternative is to seek in a stream, copy the remainder to another output file, delete the original, and rename the new file. That is equally fast and consumes much less memory.
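A minimal sketch of that copy-and-rename approach, reusing the test.txt name from the question and an arbitrary temporary path:
using System.IO;

string source = "test.txt";
string temp = Path.GetTempFileName();

using (var reader = new StreamReader(source))
using (var writer = new StreamWriter(temp))
{
    reader.ReadLine(); // consume and discard the first line
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        writer.WriteLine(line); // stream the rest through, one line at a time
    }
}

File.Delete(source);
File.Move(temp, source);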
Truncating a file at the end is much easier: just find the truncation position and call FileStream.SetLength().
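For the truncate-at-the-end case, a rough sketch using FileStream.SetLength; it scans backwards for the last line break and assumes a single-byte-compatible encoding such as UTF-8 or ASCII:
using System.IO;

using (var fs = new FileStream("test.txt", FileMode.Open, FileAccess.ReadWrite))
{
    long pos = fs.Length;

    // step back over a trailing "\r\n" or "\n", if any
    if (pos > 0) { fs.Seek(pos - 1, SeekOrigin.Begin); if (fs.ReadByte() == '\n') pos--; }
    if (pos > 0) { fs.Seek(pos - 1, SeekOrigin.Begin); if (fs.ReadByte() == '\r') pos--; }

    // walk back to the newline that ends the second-to-last line
    while (pos > 0)
    {
        fs.Seek(pos - 1, SeekOrigin.Begin);
        if (fs.ReadByte() == '\n') break;
        pos--;
    }

    fs.SetLength(pos); // everything after that newline (the last line) is discarded
}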
Here is an alternative:
using (var stream = File.OpenRead("C:\\yourfile"))
{
var items = new LinkedList<string>();
using (var reader = new StreamReader(stream))
{
reader.ReadLine(); // skip one line
string line;
while ((line = reader.ReadLine()) != null)
{
//it's far better to do the actual processing here
items.AddLast(line);
}
}
}
Update
If you need an IEnumerable<string> and don't want to waste memory you could do something like this:
public static IEnumerable<string> GetFileLines(string filename)
{
using (var stream = File.OpenRead(filename))
{
using (var reader = new StreamReader(stream))
{
reader.ReadLine(); // skip one line
string line;
while ((line = reader.ReadLine()) != null)
{
yield return line;
}
}
}
}
static void Main(string[] args)
{
foreach (var line in GetFileLines("C:\\yourfile.txt"))
{
// do something with the line here.
}
}
var lines = System.IO.File.ReadAllLines("test.txt");
System.IO.File.WriteAllLines("test.txt", lines.Skip(1).ToArray());
Skip eliminates the given number of elements from the beginning of the sequence, while Take keeps only the given number of elements from the beginning and drops the rest.
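A tiny illustration of the difference with an in-memory array instead of a file:
using System;
using System.Linq;

var lines = new[] { "line1", "line2", "line3" };

var withoutFirst = lines.Skip(1).ToArray();                // "line2", "line3"
var withoutLast  = lines.Take(lines.Length - 1).ToArray(); // "line1", "line2"

Console.WriteLine(string.Join(", ", withoutFirst));
Console.WriteLine(string.Join(", ", withoutLast));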
To remove the first line from a text file:
System.IO.StreamReader reader = new System.IO.StreamReader(filePath);
string data = reader.ReadToEnd();
reader.Close();

data = Regex.Replace(data, "<.*\n", "");

System.IO.StreamWriter writer = new System.IO.StreamWriter(filePath, false);
writer.Write(data);
writer.Close();
You can also do it in one line:
File.WriteAllLines(originalFilePath, File.ReadAllLines(originalFilePath).Skip(1));
This assumes you are passing your file path as a parameter to the function.