I'm using log4net with a MemoryAppender.
When I try to read all the lines into a variable (here: a StringBuilder) I get an OutOfMemoryException when the number of lines is too high. I've tested it with one million lines:
public class RenderingMemoryAppender : MemoryAppender
{
    public IEnumerable<string> GetRenderedEvents(List<Level> levelList = null)
    {
        foreach (var loggingEvent in GetEvents())
        {
            yield return RenderLoggingEvent(loggingEvent);
        }
    }

    public byte[] GetEventsAsByteArray(List<Level> levelList = null)
    {
        var events = GetRenderedEvents(levelList);
        var s = new StringBuilder();
        foreach (var e in events)
        {
            s.Append(e);
        }
        // The exception is thrown below, when calling s.ToString().
        return Encoding.UTF8.GetBytes(s.ToString());
    }
}
When I simply append one million lines to a StringBuilder without the log4net component, everything works fine.
I've also tried this:
var stringBuilder = new StringBuilder();
var stringWriter = new StringWriter(stringBuilder);
foreach (var loggingEvent in GetEvents())
{
    stringBuilder.Clear();
    loggingEvent.WriteRenderedMessage(stringWriter);
    list.Add(stringBuilder.ToString());
}
but this also didn't work.
If you want that many lines in memory as a single string, the runtime has to allocate one contiguous piece of memory that can hold the whole string. If that isn't possible at that moment, an OutOfMemoryException is thrown. If you need the bytes from a MemoryStream, it is more efficient to call ToArray() on the stream, but that can fail for the same reason that ToString() fails on the StringBuilder. Check whether you are running as a 64-bit process; if not, switching to 64-bit gives you more address space. I would advise rethinking your logging approach: the way you are doing it now is unreliable and can even break your program.
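If the goal is just the UTF-8 bytes of all rendered events, one way to avoid the intermediate giant string is to write each rendered event straight into a MemoryStream through a StreamWriter. This is only a minimal sketch against the RenderingMemoryAppender shown above (and ToArray() still needs one contiguous allocation at the end, so it reduces copies rather than removing the fundamental limit):

public byte[] GetEventsAsByteArray(List<Level> levelList = null)
{
    using (var ms = new MemoryStream())
    using (var writer = new StreamWriter(ms, Encoding.UTF8))
    {
        // Stream each rendered event into the MemoryStream instead of
        // accumulating everything in a StringBuilder first.
        foreach (var e in GetRenderedEvents(levelList))
        {
            writer.Write(e);
        }
        writer.Flush();

        // ToArray() can still fail for the same reason StringBuilder.ToString() does:
        // it needs one contiguous byte array for the whole content.
        return ms.ToArray();
    }
}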
I need to find a way to read information out of a very big CSV file in Unity. The file has approximately 15000*4000 entries, is almost 200 MB, and could grow even larger.
Just using ReadAllLines on the file kind of works, but as soon as I try to do any operation on it, it crashes. Here is the code I am using, which just counts all non-zero values and already crashes. It's okay if the code needs some loading time, but it shouldn't crash. I assume it's because I keep everything in memory and therefore flood my RAM? Any ideas how to fix this so it won't crash?
private void readCSV()
{
    string[] lines = File.ReadAllLines("Assets/Datasets/testCsv.csv");

    foreach (string line in lines)
    {
        List<string> values = line.Split(',').ToList();

        int i = 0;
        foreach (string val in values)
        {
            if (val != "0")
            {
                i++;
            }
        }
    }
}
As I already stated in your other question, you should rather go with a streamed solution so you don't load the entire thing into memory at all.
Also, both the file IO and string.Split are slow, especially for so many entries! Rather use a background thread / async Task for this.
The next possible future issue in your case: 15000*4000 entries means a total of 60,000,000 cells, which is still fine. However, the maximum value of int is 2,147,483,647, so if your file grows further it might break or behave unexpectedly; rather use e.g. uint or directly ulong to avoid that issue.
private async Task<ulong> CountNonZeroEntries()
{
    ulong count = 0;

    // Using a stream reader you can load the content into memory one line at a time
    using (var sr = new StreamReader("Assets/Datasets/testCsv.csv"))
    {
        while (true)
        {
            var line = await sr.ReadLineAsync();
            if (line == null) break;

            var values = line.Split(',');
            foreach (var v in values)
            {
                if (v != "0") count++;
            }
        }
    }

    return count;
}
And then of course you would need to wait for the result e.g. using
// If you declare Start as async, Unity automatically calls it asynchronously
private async void Start()
{
    var count = await CountNonZeroEntries();
    Debug.Log($"{count} cells are != \"0\".");
}
The same can be done using Linq, which is a bit easier to write in my eyes:
using System.Linq;
using System.Threading.Tasks;

...

private Task<ulong> CountNonZeroEntries()
{
    // Flatten every line into its cells and count the non-zero ones on a
    // background thread so the Unity main thread isn't blocked.
    return Task.Run(() => (ulong)File.ReadLines("Assets/Datasets/testCsv.csv")
        .SelectMany(line => line.Split(','))
        .LongCount(v => v != "0"));
}
Also, File.ReadLines doesn't load the entire content at once; it returns a lazy enumerable, so you can run Linq queries over the lines one by one.
I have a simple host app that searches for assemblies exposing a special interface and imports a list of delegates from them; currently they are Func<string,string>.
It can then execute any of these Func<T,T> without problems.
The problems start when one of these Funcs tries to access a file that doesn't exist.
Neither a try-catch block nor a File.Exists check helps: when the function tries to access a file in any way (read, get a stream, check, etc.) the whole app just fails with a FileNotFoundException in mscorlib.
How can this be fixed? The app is really critical, and I can't perform the file check in the host app, only in the assemblies.
UPD: Yes, those delegates contain async logic.
UPD2: Parts of the code:
try
{
    if (!File.Exists(filePath)) return null;

    using (StreamWriter writer = new StreamWriter(destinationFilePath))
    {
        using (StreamReader reader = new StreamReader(filePath))
        {
            // some logic there
        }
    }
}
catch
{
}
The exception is thrown at File.Exists().
This is the code used to import the assemblies:
Commands = new Dictionary<string, Func<string, string>>();
foreach (string f in fileNames)
{
    Assembly asm = Assembly.LoadFrom(f);
    var types = asm.GetTypes();
    foreach (Type t in types)
    {
        if (t.GetInterface("IMountPoint") != null)
        {
            var obj = Activator.CreateInstance(t);
            var cmds = ((IMountPoint)obj).Init(EntryPoint);
            foreach (var cmd in cmds)
            {
                if (!Commands.ContainsKey(cmd.Key.Trim().ToUpper()))
                {
                    Commands.Add(cmd.Key.Trim().ToUpper(), cmd.Value);
                }
            }
        }
    }
}
And this is the code that runs the delegates:
string input = Console.ReadLine();
string res = Commands[command_key](input);
That's embarrassing.
I'm using late binding and forgot to copy the assemblies manually, so the assemblies with the file-existence check were never loaded by the app and it kept using the old ones.
Sorry, guys.
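For anyone running into the same symptom: a quick way to spot a stale plugin copy is to log where each assembly was actually loaded from, right after Assembly.LoadFrom. A minimal sketch using only standard reflection members:

Assembly asm = Assembly.LoadFrom(f);
// Log name, version and physical location so an outdated copy is obvious.
Console.WriteLine("Loaded {0} v{1} from {2}",
    asm.GetName().Name,
    asm.GetName().Version,
    asm.Location);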
I am curious why the following throws a "text reader closed" exception on the "last" assignment:
IEnumerable<string> textRows = File.ReadLines(sourceTextFileName);
IEnumerator<string> textEnumerator = textRows.GetEnumerator();
string first = textRows.First();
string last = textRows.Last();
However the following executes fine:
IEnumerable<string> textRows = File.ReadLines(sourceTextFileName);
string first = textRows.First();
string last = textRows.Last();
IEnumerator<string> textEnumerator = textRows.GetEnumerator();
What is the reason for the different behavior?
You've discovered a bug in the framework, as far as I can tell. It's reasonably subtle, because of the interaction of a few things:
When you call ReadLines(), the file is actually opened. Personally, I think of this as a bug in itself; I'd expect and hope that it would be lazy, only opening the file when you try to start iterating over it.
When you call GetEnumerator() the first time on the return value of ReadLines(), it will actually return the same reference.
When First() calls GetEnumerator(), it will create a clone. This will share the same StreamReader as textEnumerator.
When First() disposes its clone, it will dispose of the StreamReader and set its variable to null. This doesn't affect the variable within the original, which now refers to a disposed StreamReader.
When Last() calls GetEnumerator(), it will create a clone of the original object, complete with the disposed StreamReader. It then tries to read from that reader and throws an exception.
Now compare this with your second version:
When First() calls GetEnumerator(), the original reference is returned, complete with an open reader.
When First() then calls Dispose(), the reader is disposed and the variable set to null.
When Last() calls GetEnumerator(), a clone is created - but because the value it's cloning has a null reference, a new StreamReader is created, so it's able to read the file with no problems. It then disposes of the clone, which closes the reader.
When GetEnumerator() is called, a second clone of the original object is created, opening yet another StreamReader - again, no problems there.
So basically, the problem in the first snippet is that you're calling GetEnumerator() a second time (in First()) without having disposed of the first object.
Here's another example of the same problem:
using System;
using System.IO;
using System.Linq;

class Test
{
    static void Main()
    {
        var lines = File.ReadLines("test.txt");
        var query = from x in lines
                    from y in lines
                    select x + "/" + y;

        foreach (var line in query)
        {
            Console.WriteLine(line);
        }
    }
}
You could fix this by calling File.ReadLines twice - or by using a genuinely lazy implementation of ReadLines, like this:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

class Test
{
    static void Main()
    {
        var lines = ReadLines("test.txt");
        var query = from x in lines
                    from y in lines
                    select x + "/" + y;

        foreach (var line in query)
        {
            Console.WriteLine(line);
        }
    }

    static IEnumerable<string> ReadLines(string file)
    {
        using (var reader = File.OpenText(file))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                yield return line;
            }
        }
    }
}
In the latter code, a new StreamReader is opened each time GetEnumerator() is called - so the result is each pair of lines in test.txt.
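For the original first/last snippet, the workaround of calling File.ReadLines twice (mentioned above) would look like this; each call gets its own reader, so neither iteration disturbs the other:

string first = File.ReadLines(sourceTextFileName).First();
string last = File.ReadLines(sourceTextFileName).Last();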
I'm trying to run this method. It works fine, but every time, after some hundreds of iterations, I get an Out of Memory exception:
...
MNDBEntities db = new MNDBEntities();

var regs = new List<DOCUMENTS>();

var query = from reg in db.DOCUMENTS
            where reg.TAG_KEYS.Any(p => p.TAG_DATE_VALUES.FirstOrDefault().TAG_DATE_VALUE.HasValue
                                     && p.TAG_DATE_VALUES.FirstOrDefault().TAG_DATE_VALUE.Value.Year == 2012)
            select reg;

var pages = new List<string>();
foreach (var item in query)
{
    Document cert = new Document();

    var tags = item.TAG_KEYS;
    foreach (var tag in tags)
    {
        // Basic stuff...
    }

    var pagesS = item.PAGES;
    foreach (var page in pagesS)
    {
        var path = @"C:\Kumquat\" + (int)page.NUMBER + ".vpimg";
        File.WriteAllBytes(path, page.IMAGE);
        pages.Add(path);
        Console.WriteLine(path);
    }

    //cms.Save(cert, pages.ToArray()).Wait();

    foreach (var pageFile in pages)
        File.Delete(pageFile);

    pagesS = null;
    pages.Clear();
}
...
I'm pretty sure the problem is related to File.WriteAllBytes or File.Delete, because if I comment out those lines the method runs without an exception. What I'm doing is basically getting some tags from a DB plus a document image; the image is saved to disk, then stored into a CMS, and then deleted from disk. I honestly can't figure out what I'm doing wrong with those File calls. Any ideas?
This is what PerfView shows:
This is what the Visual Studio 2012 profiler shows as the hot spot. The thing is, this is all generated code (within the entity model). Am I doing something wrong, maybe with the properties of the model?
Try to use http://www.microsoft.com/en-us/download/details.aspx?id=28567 to profile your code, focusing on GC events, and CLR managed allocation tick events.
page.IMAGE could be the problem. Most likely it allocates a byte array that is never released. It would be best to change the code to something like:
page.WriteTo(path);
The rest of the code shown looks fine. The only other possible problem is large object allocation, which could lead to fragmentation problems in the LOH (large object heap).
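If the byte arrays are being kept alive because the context tracks every materialized entity, a common mitigation is to run the query without change tracking. This is a hedged sketch that assumes MNDBEntities is an Entity Framework DbContext (EF 4.1 or later); the post doesn't say which EF flavor is used, so treat it as an illustration rather than a confirmed fix:

// Assumption: db.DOCUMENTS is a DbSet. AsNoTracking() stops the context from
// holding a reference to every DOCUMENTS/PAGES entity (and its IMAGE byte[])
// for the whole lifetime of the loop, so they can be garbage collected.
var query = from reg in db.DOCUMENTS.AsNoTracking()
            where reg.TAG_KEYS.Any(p => p.TAG_DATE_VALUES.FirstOrDefault().TAG_DATE_VALUE.HasValue
                                     && p.TAG_DATE_VALUES.FirstOrDefault().TAG_DATE_VALUE.Value.Year == 2012)
            select reg;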
I'm building a console application that has to process a bunch of documents.
To keep it simple, the process is:
for each year between X and Y, query the DB to get a list of document references to process
for each of these references, process a local file
The process method is, I think, independent and should be parallelizable as soon as the input args are different:
private static bool ProcessDocument(
    DocumentsDataset.DocumentsRow d,
    string langCode
    )
{
    try
    {
        var htmFileName = d.UniqueDocRef.Trim() + langCode + ".htm";
        var htmFullPath = Path.Combine(@"x:\path", htmFileName);

        var missingHtmlFile = !File.Exists(htmFullPath);
        if (!missingHtmlFile)
        {
            var html = File.ReadAllText(htmFullPath);

            // ProcessHtml is quite long: it uses a regex to search for a list of references
            // which are other documents, then sends the result to a custom WS
            ProcessHtml(ref html);

            File.WriteAllText(htmFullPath, html);
        }

        return true;
    }
    catch (Exception exc)
    {
        Trace.TraceError("{0,8}Fail processing {1} : {2}", "[FATAL]", d.UniqueDocRef, exc.ToString());
        return false;
    }
}
In order to enumerate my documents, I have this method:
private static IEnumerable<DocumentsDataset.DocumentsRow> EnumerateDocuments()
{
    return Enumerable.Range(1990, 2020 - 1990).AsParallel().SelectMany(year =>
    {
        return Document.FindAll((short)year).Documents;
    });
}
Document is a business class that wraps the retrieval of documents. The output of this method is a typed dataset (I'm returning the Documents table). The method takes a year, and I'm sure a document can't be returned for more than one year (the year is part of the key, actually).
Note the use of AsParallel() here, but I never had an issue with it.
Now, my main method is:
var documents = EnumerateDocuments();

var result = documents.Select(d =>
{
    bool success = true;
    foreach (var langCode in new string[] { "-e", "-f" })
    {
        success &= ProcessDocument(d, langCode);
    }

    return new
    {
        d.UniqueDocRef,
        success
    };
});
using (var sw = File.CreateText("summary.csv"))
{
    sw.WriteLine("Level;UniqueDocRef");
    foreach (var item in result)
    {
        string level;
        if (!item.success) level = "[ERROR]";
        else level = "[OK]";

        sw.WriteLine(
            "{0};{1}",
            level,
            item.UniqueDocRef
        );
        //sw.WriteLine(item);
    }
}
This method works as expected in this form. However, if I replace
var documents = EnumerateDocuments();
with
var documents = EnumerateDocuments().AsParallel();
it stops working, and I don't understand why.
The error appears exactly here (in my process method):
File.WriteAllText(htmFullPath, html);
It tells me that the file is already opened by another program.
I don't understand what can cause my program not to work as expected. Since my documents variable is an IEnumerable returning unique values, why is my process method breaking?
Thanks for any advice.
[Edit] Code for retrieving documents:
/// <summary>
/// Get all documents in data store
/// </summary>
public static DocumentsDS FindAll(short? year)
{
    Database db = DatabaseFactory.CreateDatabase(connStringName); // MS Entlib
    DbCommand cm = db.GetStoredProcCommand("Document_Select");
    if (year.HasValue) db.AddInParameter(cm, "Year", DbType.Int16, year.Value);

    string[] tableNames = { "Documents", "Years" };

    DocumentsDS ds = new DocumentsDS();
    db.LoadDataSet(cm, ds, tableNames);

    return ds;
}
[Edit2] Possible source of my issue, thanks to mquander. If I write:
var test = EnumerateDocuments().AsParallel().Select(d => d.UniqueDocRef);
var testGr = test.GroupBy(d => d).Select(d => new { d.Key, Count = d.Count() }).Where(c => c.Count > 1);
var testLst = testGr.ToList();

Console.WriteLine(testLst.Where(x => x.Count == 1).Count());
Console.WriteLine(testLst.Where(x => x.Count > 1).Count());
I get this result:
0
1758
Removing the AsParallel returns the same output.
Conclusion: something is wrong in my EnumerateDocuments and it returns each document twice.
I'll have to dig in there, I think.
The source enumeration is probably the cause.
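Until the underlying retrieval is fixed, one possible stopgap (a sketch only, not a confirmed fix) is to de-duplicate the enumeration on UniqueDocRef before processing, so two threads can never pick up the same document and fight over the same file:

private static IEnumerable<DocumentsDataset.DocumentsRow> EnumerateDocuments()
{
    return Enumerable.Range(1990, 2020 - 1990)
        .AsParallel()
        .SelectMany(year => Document.FindAll((short)year).Documents)
        // Stopgap: keep only the first row seen for each UniqueDocRef.
        .GroupBy(d => d.UniqueDocRef.Trim())
        .Select(g => g.First());
}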
I suggest you have each task put the file data into a global queue and have a dedicated thread take write requests from the queue and do the actual writing.
Anyway, the performance of writing in parallel to a single disk is much worse than writing sequentially, because the disk has to seek to each new write location, so you are just bouncing the disk around between seeks. It's better to do the writes sequentially.
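A minimal sketch of that queue idea, using a BlockingCollection drained by a single writer task (the class and member names here are made up for illustration, not from the original code):

using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

static class SequentialFileWriter
{
    // Worker threads enqueue (path, content) pairs; one dedicated task drains
    // the queue so all disk writes happen sequentially.
    private static readonly BlockingCollection<Tuple<string, string>> Queue =
        new BlockingCollection<Tuple<string, string>>(boundedCapacity: 100);

    private static readonly Task Writer = Task.Run(() =>
    {
        foreach (var item in Queue.GetConsumingEnumerable())
            File.WriteAllText(item.Item1, item.Item2);
    });

    // Call this from ProcessDocument instead of File.WriteAllText(htmFullPath, html).
    public static void Enqueue(string path, string content)
    {
        Queue.Add(Tuple.Create(path, content));
    }

    // Call this once after all documents have been processed.
    public static void Shutdown()
    {
        Queue.CompleteAdding();
        Writer.Wait();
    }
}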
Is Document.FindAll((short)year).Documents thread-safe? The difference between the first and the second version is that in the second (broken) version, this call runs multiple times concurrently. That could plausibly be the cause of the issue.
It sounds like you're trying to write to the same file from multiple threads. Only one thread/program can write to a file at a given time, so you can't simply parallelize this.
If you're reading from the same file, you need to open it with read-only access so as not to put a write lock on it.
The simplest way to fix the issue is to place a lock around your File.WriteAllText, assuming the writing is fast and it's worth parallelizing the rest of the code.
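A minimal sketch of that lock, assuming it's applied inside ProcessDocument (the _writeLock field is an added name for illustration):

private static readonly object _writeLock = new object();

// Inside ProcessDocument, serialize only the actual file write:
lock (_writeLock)
{
    File.WriteAllText(htmFullPath, html);
}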