I am trying to export SQL table data to a text file with a '~' delimiter in C# code.
When the data is small it's fine, but when it's huge it throws an Out of memory exception.
My Code:
public static void DataTableToTextFile(DataTable dtToText, string filePath)
{
    int i = 0;
    StreamWriter sw = null;
    try
    {
        sw = new StreamWriter(filePath, false); /* Column names */
        for (i = 0; i < dtToText.Columns.Count - 1; i++)
        {
            sw.Write(dtToText.Columns[i].ColumnName + '~');
        }
        sw.Write(dtToText.Columns[i].ColumnName + '~');
        sw.WriteLine();

        /* Data in the rows */
        foreach (DataRow row in dtToText.Rows)
        {
            object[] array = row.ItemArray;
            for (i = 0; i < array.Length - 1; i++)
            {
                sw.Write(array[i].ToString() + '~');
            }
            sw.Write(array[i].ToString() + '~');
            sw.WriteLine();
        }
        sw.Close();
    }
    catch (Exception ex)
    {
        throw new Exception("");
    }
}
Is there a better way to do this in a stored procedure or BCP command?
If there's no specific reason for using the ~ delimiter format, you might try using the DataTable WriteXml function (http://msdn.microsoft.com/en-us/library/system.data.datatable.writexml.aspx)
For example:
dtToText.WriteXml("c:\data.xml")
If you need to convert this text back to a DataTable later you can use ReadXml (http://msdn.microsoft.com/en-us/library/system.data.datatable.readxml.aspx)
If you really need to make the existing code work, I'd probably try closing and calling Dispose on the StreamWriter at a set interval, then reopen and append to the existing text.
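For example, here is a minimal sketch of that idea, assuming the same DataTable input; the batch size and the helper name are arbitrary choices, not part of the original code:

using System.Data;
using System.IO;
using System.Linq;

public static void DataTableToTextFileBatched(DataTable dtToText, string filePath)
{
    const int batchSize = 10000; // arbitrary flush interval; tune for your data

    // Write the header line, then dispose the writer straight away.
    using (var sw = new StreamWriter(filePath, false))
    {
        var columnNames = dtToText.Columns.Cast<DataColumn>().Select(c => c.ColumnName);
        sw.WriteLine(string.Join("~", columnNames));
    }

    int written = 0;
    var writer = new StreamWriter(filePath, true); // reopen in append mode
    try
    {
        foreach (DataRow row in dtToText.Rows)
        {
            writer.WriteLine(string.Join("~", row.ItemArray));
            if (++written % batchSize == 0)
            {
                // Close and dispose at a set interval, then reopen and append.
                writer.Dispose();
                writer = new StreamWriter(filePath, true);
            }
        }
    }
    finally
    {
        writer.Dispose();
    }
}

Note that this only writes the delimiter between fields; the original code also appended a trailing '~' after the last one, so adjust if downstream code expects that.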
I realize that this question is years old, but I recently ran into a similar problem. Briefly, I think you're running into problems with the .NET Large Object Heap. A relevant link:
https://www.simple-talk.com/dotnet/.net-framework/the-dangers-of-the-large-object-heap/
To summarize the above article: When you allocate chunks of memory more than 85K long (which seems likely to happen behind the scenes in your StreamWriter object if the values in your DataTable are large enough), they go onto a separate heap, the Large Object Heap (LOH). Memory chunks in the LOH are deallocated normally when their lifetime expires, but the heap is not compacted. The net result is that a System.OutOfMemoryException is thrown, not because there isn't actually enough memory, but because there isn't enough contiguous memory in the heap at some point.
If you're using .NET framework 4.5.1 or later (which won't work on Visual Studio 2010 or before; it might work on VS2012), you can use this command:
System.Runtime.GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
This command forces LOH compaction to happen at the next garbage collection. Just put that command as the first line in your function; it will be set to CompactOnce every time this function is called, which will cause LOH compaction at some indeterminate point after the function is called.
If you don't have .NET 4.5.1, it gets uglier. The problem is that the memory allocation isn't explicit; it's happening behind the scenes in your StreamWriter, most likely. Try calling GC.Collect(), forcing garbage collection, from time to time--perhaps every 3rd time this function is called.
A warning: Lots of people will advise you that calling GC.Collect() directly is a bad idea and will slow down your application--and they're right. I just don't know a better way to handle this problem.
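If you go that route, a rough sketch might look like this (the static counter field and the every-3rd-call interval are invented for illustration):

private static int _exportCallCount; // hypothetical counter, not in the original code

public static void DataTableToTextFile(DataTable dtToText, string filePath)
{
    // Force a full collection every 3rd call, as suggested above, to give the
    // runtime a chance to reclaim memory between large exports.
    if (++_exportCallCount % 3 == 0)
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
    }

    // ... existing export code ...
}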
I'm using a DataTable to hold a running log of the last 1000 messages, FIFO style. I add items to the DataTable and remove the first row once the size grows past 1000 items. However, even though the DataTable never exceeds 1000 items, the memory usage keeps growing over time.
Sample:
DataTable dtLog = new DataTable();
for (int nLoop = 0; nLoop < 10000; nLoop++)
{
    LogType oLog = new LogType();
    oLog.Name = "Message number " + nLoop;
    dtLog.Rows.Add(oLog);
    if (dtLog.Rows.Count > 1000)
        dtLog.Rows.RemoveAt(0);
}
So the messages are removed from the datatable, but the memory doesn't seem to get released. I would expect the memory to be released...?
Or perhaps there's a better way to do a running log using something other than datatables?
I can't speak to the memory leak part of your question, as .NET's memory management and garbage collection make that a hard thing to investigate.
But, what I can do is suggest that unless you have to, you should never use DataTables in .Net.
Now, "never" is a pretty strong claim! That sort of thing needs backing up with good reasons.
So, what are those reasons? ... memory usage.
I created this .net fiddle: https://dotnetfiddle.net/wOtjw1
using System;
using System.Collections.Generic;
using System.Xml;
using System.Data;

public class DataObject
{
    public string Name { get; set; }
}

public class Program
{
    public static void Main()
    {
        // Swap this for DataTable() on the second run.
        Queue();
    }

    public static void DataTable()
    {
        var dataTable = new DataTable();
        dataTable.Columns.Add("Name", typeof(string));

        for (int nLoop = 0; nLoop < 10000; nLoop++)
        {
            var dataObject = new DataObject();
            dataObject.Name = "Message number " + nLoop;
            dataTable.Rows.Add(dataObject.Name);
            if (dataTable.Rows.Count > 1000)
                dataTable.Rows.RemoveAt(0);
        }
    }

    public static void Queue()
    {
        var queue = new Queue<DataObject>();

        for (int nLoop = 0; nLoop < 10000; nLoop++)
        {
            var dataObject = new DataObject();
            dataObject.Name = "Message number " + nLoop;
            queue.Enqueue(dataObject);
            if (queue.Count > 1000)
                queue.Dequeue();
        }
    }
}
Run it twice, once with the DataTable method, once with the Queue method.
Look at the memory usage .net fiddle reports each time:
DataTable Memory: 2.74Mb
Queue Memory: 1.46Mb
It's almost half the memory usage! And all we did was stop using DataTables.
.NET DataTables are notoriously memory hungry. There are fairly good reasons for that: they can store lots of complex schema information, they can track changes, and so on.
That's great, but ... do you need those features?
No? Dump the DT, use something under System.Collections(.Generic).
Whenever you modify or delete a row in a DataTable, the old/deleted data is still kept by the DataTable until you call DataTable.AcceptChanges:
When AcceptChanges is called, any DataRow object still in edit mode successfully ends its edits. The DataRowState also changes: all Added and Modified rows become Unchanged, and Deleted rows are removed.
There is no memory leak; that is by design.
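If that retained data is what you are seeing, a minimal sketch of the fix this answer implies is to call AcceptChanges after trimming the table:

dtLog.Rows.Add(oLog);
if (dtLog.Rows.Count > 1000)
{
    dtLog.Rows.RemoveAt(0);
    dtLog.AcceptChanges(); // commit the removal so old row data is no longer retained
}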
As an alternative you can use a circular buffer which would fit better than a queue.
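A minimal sketch of such a circular buffer (the class name and fields are invented for illustration):

public class CircularLogBuffer<T>
{
    private readonly T[] _items;
    private int _next;   // slot that will be overwritten next
    private int _count;

    public CircularLogBuffer(int capacity)
    {
        _items = new T[capacity];
    }

    public void Add(T item)
    {
        _items[_next] = item;                 // silently overwrites the oldest entry once full
        _next = (_next + 1) % _items.Length;
        if (_count < _items.Length) _count++;
    }

    public int Count { get { return _count; } }
}

// Usage, assuming the LogType from the question:
// var log = new CircularLogBuffer<LogType>(1000);
// log.Add(oLog);

Since entries are overwritten in place, nothing is ever removed and there is nothing extra for the GC to clean up per message.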
Your memory is released, but it is not so easy to see. There is a lack of tools (except WinDbg with SOS) to show the currently allocated memory minus dead objects. For this, WinDbg has the !DumpHeap -live option to display only live objects.
I have tried the fiddle from AndyJ https://dotnetfiddle.net/wOtjw1
First I needed to create a memory dump with DataTable to have a stable baseline. MemAnalyzer https://github.com/Alois-xx/MemAnalyzer is the right tool for that.
MemAnalyzer.exe -procdump -ma DataTableMemoryLeak.exe DataTable.dmp
This expects procdump from SysInternals in your path.
Now you can run the program with the queue implementation and compare the allocation metrics on the managed heap:
C>MemAnalyzer.exe -f DataTable.dmp -pid2 20792 -dtn 3
Delta(Bytes) Delta(Instances) Instances Instances2 Allocated(Bytes) Allocated2(Bytes) AvgSize(Bytes) AvgSize2(Bytes) Type
-176,624 -10,008 10,014 6 194,232 17,608 19 2934 System.Object[]
-680,000 -10,000 10,000 0 680,000 0 68 System.Data.DataRow
-7,514 -88 20,273 20,185 749,040 741,526 36 36 System.String
-918,294 -20,392 60,734 40,342 1,932,650 1,014,356 Managed Heap(Allocated)!
-917,472 0 0 0 1,954,980 1,037,508 Managed Heap(TotalSize)
This shows that we have 917KB more memory allocated with the DataTable approach and that 10K DataRow instances are still floating around on the managed heap. But are these numbers correct?
No.
Because most objects are already dead, but no full GC happened before the memory dump was taken, these objects are still reported as alive. The fix is to tell MemAnalyzer to consider only rooted (live) objects, just as WinDbg does with the -live option:
C>MemAnalyzer.exe -f DataTable.dmp -pid2 20792 -dts 5 -live
Delta(Bytes) Delta(Instances) Instances Instances2 Allocated(Bytes) Allocated2(Bytes) AvgSize(Bytes) AvgSize2(Bytes) Type
-68,000 -1,000 1,000 0 68,000 0 68 System.Data.DataRow
-36,960 -8 8 0 36,960 0 4620 System.Data.RBTree+Node<System.Data.DataRow>[]
-16,564 -5 10 5 34,140 17,576 3414 3515 System.Object[]
-4,120 -2 2 0 4,120 0 2060 System.Data.DataRow[]
-4,104 -1 19 18 4,716 612 248 34 System.String[]
-141,056 -1,285 1,576 291 169,898 28,842 Managed Heap(Allocated)!
-917,472 0 0 0 1,954,980 1,037,508 Managed Heap(TotalSize)
The DataTable approach still needs 141,056 bytes more memory because of the extra DataRow, object[] and System.Data.RBTree+Node[] instances. Measuring only the Working set is not enough because the managed heap is lazy deallocated. The GC can keep large amounts of memory if it thinks that the next memory spike is not far away. Measuring committed memory is therefore a nearly meaningless metric except if your (very low hanging) goal is to fix only memory leaks of GB in size.
The correct way to measure things is to measure the sum of
Unmanaged Heap
Allocated Managed Heap
Memory Mapped Files
Page File backed Memory Mapped Files (Shareable Memory)
Private Bytes
This is actually exactly what MemAnalyzer does with the -vmmap switch, which expects vmmap from Sysinternals to be in its path.
MemAnalyzer -pid ddd -vmmap
This way you can also track unmanaged memory leaks or file mapping leaks as well. The return value of MemAnalyzer is the total allocated memory in KB.
If -vmmap is used it will report the sum of the above points.
If vmmap is not present it will only report the allocated managed heap.
If -live is added then only rooted managed objects are reported.
I wrote the tool because, to my knowledge, there are no tools out there that make it easy to look at memory leaks in a holistic way. I always want to know whether I am leaking memory, regardless of whether it is managed, unmanaged or something else.
By writing the diff output to a CSV file you can easily create pivot diff charts:
MemAnalyzer.exe -f DataTable.dmp -pid2 20792 -live -o ExcelDiff.csv
That should give you some ideas how to track allocation metrics in a much more accurate way.
I want to create a hex editor to open large binary files.
This is my code. It works well for small files, but when I open large files the hex editor runs into problems.
// data is a byte[] containing the file contents
string str = "";
byte[] temp = null;
int i;
for (i = 0; i < (data.Length - 16); i += 16)
{
temp = _sub_array(data, i, 16);
str += BitConverter.ToString(temp).Replace("-", "\t");
str += "\n";
}
temp = _sub_array(data, i, (data.Length - i));
str += BitConverter.ToString(temp).Replace("-", "\t");
richTextBox.Text = str;
As has been said in the comments, you should try to avoid reading in the entire file at once. However, if you need the entire file in memory at once, I think your main problem might be the "stickiness" that the program will experience while reading and converting data. You are wiser to use a separate thread for the hex work and let the main thread focus on keeping your UI operating smoothly. You could also use tasks instead of threads, either way. So using your code snippet, make it look more like this:
// data is a byte[] containing the file contents
private void button1_Click(object sender, EventArgs e)
{
    Thread t = new Thread(readHexFile);
    t.Start();
}

private void readHexFile()
{
    string str = "";
    byte[] temp = null;
    int i;
    for (i = 0; i < (data.Length - 16); i += 16)
    {
        temp = _sub_array(data, i, 16);
        str += BitConverter.ToString(temp).Replace("-", "\t");
        str += "\n";
    }
    temp = _sub_array(data, i, (data.Length - i));
    str += BitConverter.ToString(temp).Replace("-", "\t");
    BeginInvoke(new Action(() => richTextBox.Text = str));
}
You'll need to add "using System.Threading" to get access to threads. Also note the BeginInvoke with the richTextBox.Text work in a lambda expression. This is necessary when you run the data processing on a separate thread because if you try to access the textbox directly with that thread, Windows will complain about a cross-thread call. Only the thread that made the control is allowed to access it directly. BeginInvoke doesn't access the control directly, so you can use it from the data processing thread to get text written to the control. This will stop the data processing from "gumming up" the UI responsiveness.
This may seem intimidating at first if you have never done it, but trust me. Once you get the hang of Threads and Tasks (which are different inside the machine but can be manipulated with similar developer tools) you will never want to do heavy processing on the UI thread again.
EDIT: I left the string from your code as it was, but I agree with the comment suggesting StringBuilder instead. Strings are immutable, so each time you concatenate to the string, internally what's happening is that the whole string is being scrapped and a new one is being made with the additional text. So yeah, do switch to a StringBuilder object as well.
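For reference, a rough sketch of the same worker method using StringBuilder, still assuming the data field and _sub_array helper from the original code:

private void readHexFile()
{
    var sb = new StringBuilder();
    int i;
    for (i = 0; i < (data.Length - 16); i += 16)
    {
        byte[] temp = _sub_array(data, i, 16);
        sb.Append(BitConverter.ToString(temp).Replace("-", "\t"));
        sb.Append('\n');
    }
    byte[] tail = _sub_array(data, i, data.Length - i);
    sb.Append(BitConverter.ToString(tail).Replace("-", "\t"));

    // Marshal the finished string back to the UI thread.
    BeginInvoke(new Action(() => richTextBox.Text = sb.ToString()));
}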
So you've got working code for small files, but you face problems with large files. You don't mention what those problems are so here are a few guesses:
If you're loading the entire file into a byte[], then you could have a memory issue and possibly throw an OutOfMemoryException
You're concatenating a string repeatedly. This is not only a memory issue, but a performance one too (Reference Jon Skeet's article http://www.yoda.arachsys.com/csharp/stringbuilder.html)
Your _sub_array() is called repeatedly and returns a 16-byte array each time, yet another memory and performance issue.
You call String.Replace() repeatedly (See bullet 2).
I consider these to be memory problems because we don't know when the Garbage Collector will clean up the memory.
So let's address these potential problems:
Read your file 16 bytes at a time (#EZI comment), this also eliminates the need for your _sub_array(). Look into the FileStream class to read 16 bytes at a time.
BitConverter.ToString() these 16 bytes into a StringBuilder with StringBuilder.AppendLine() (My comment), but don't do the String.Replace() until you're done reading the file.
Once you're done reading the file, you can assign the StringBuilder to your RichTextBox like so (sb is a variable name used for StringBuilder): richTextBox.Text = sb.ToString();
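Putting those three points together, a rough sketch might look like this (filePath and richTextBox are placeholders for whatever your form actually uses):

var sb = new StringBuilder();
byte[] buffer = new byte[16];

using (var fs = new FileStream(filePath, FileMode.Open, FileAccess.Read))
{
    int bytesRead;
    while ((bytesRead = fs.Read(buffer, 0, buffer.Length)) > 0)
    {
        // Convert only the bytes actually read; the last chunk may be shorter than 16.
        sb.AppendLine(BitConverter.ToString(buffer, 0, bytesRead));
    }
}

// Defer the separator replacement until the whole file has been read.
sb.Replace("-", "\t");
richTextBox.Text = sb.ToString();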
Hope this helps...
I've found a memory leak in my parser. I don't know how to fix that problem.
Let's look at this basic routine.
private void parsePage() {
    String[] tmp = null;
    foreach (String row in rows) {
        tmp = row.Split(new[] { " " }, StringSplitOptions.None);
        PrivateRow t = new PrivateRow();
        t.Field1 = tmp[1];
        t.Field2 = tmp[2];
        t.Field3 = tmp[3];
        t.Field4 = String.Join(" ", tmp);
        myBigCollection.Add(t);
    }
}

private void parseFromFile() {
    String[] tmp = null;
    foreach (String row in rows) {
        PrivateRow t = new PrivateRow();
        t.Field1 = "mystring1";
        t.Field2 = "mystring2222";
        t.Field3 = "mystring3333";
        t.Field4 = "mystring1 xxx yy zzz";
        myBigCollection.Add(t);
    }
}
Launching parsePage() on a collection (rows is a List of 100,000 elements) makes my app grow from 20 MB to 70 MB.
Launching parseFromFile(), which reads the SAME collection from a file but avoids the split/join, takes about 1 MB.
Using a memory profiler, I see that the "t" fields and PrivateRow keep references to the String.Split() array and the String.Join() result.
I suppose that's because I assign a reference, not a copy, that can be garbage collected.
OK, using 70 MB isn't a big deal, but when I launch in production, with a lot of sites, it can reach 2.5-3 GB...
Cheers
This isn't a memory leak per se. It's actually behaving properly. The reason your second function uses so much less memory is simply that you only have four strings in use. Each of these four strings is allocated only once, and subsequent uses of the strings for new t.Fieldx values refer to the same string instances. Strings are immutable, so if you refer to the same string value more than once, it can be served by the same string instance. See the paragraph labelled "Interning" in this article on String in .NET for more detail on this.
In your first function, you have what are probably mostly different strings for each field, and each time through the loop. That simply is much more varied data. The fact that those strings are held on to is what you want to have happen for as long as your PrivateRow objects exist.
You don't have a memory leak at all; it just takes the garbage collector some time to process it.
I suppose that's because I assign a reference, not a copy, that can
be garbage collected.
That is not a correct assumption. A string is a reference type, but it is immutable, which makes it a special, somewhat unique type inside the BCL; assigning it only copies the reference, and that sharing is not what is costing you memory here.
Now, what about a possible solution in case you are under heavy memory pressure? If you have a massive amount of strings to process from a file, you may look at two options.
1) Process them in sequence by reading a stream (not loading everything at once), keeping as little data in memory as possible/required/makes sense.
2) Use MemoryMappedFile to, again, load only chunks of data and process them in sequence.
The 2nd can be combined with the 1st.
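As a rough sketch of option 1, assuming the rows actually come from a file and each parsed row can be handled on its own (filePath and the Process call are placeholders for whatever you do with PrivateRow):

using (var reader = new StreamReader(filePath)) // filePath is a placeholder
{
    string row;
    while ((row = reader.ReadLine()) != null)
    {
        string[] tmp = row.Split(new[] { " " }, StringSplitOptions.None);
        PrivateRow t = new PrivateRow();
        t.Field1 = tmp[1];
        t.Field2 = tmp[2];
        t.Field3 = tmp[3];
        t.Field4 = row;   // row is already the joined string, no need for String.Join
        Process(t);       // hypothetical: handle each row instead of keeping myBigCollection
    }
}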
Like others have said, there is no evidence of a memory leak here, just delayed garbage collection. All memory should be cleaned up eventually.
That being said, there are a couple things you can do to help keep memory usage lower or recover it more quickly:
1) You should be able to replace
t.Field4 = String.Join(" ", tmp);
with
t.Field4 = row;
You created tmp by splitting row, then you're joining it back together. Avoid creating a new string by just using row.
2) Call GC.Collect(); at the end of the method to request immediate garbage collection. This won't reduce the memory used within the method, but it should free up memory more quickly.
If your application is memory-usage critical and there is a lot of repeating data you should replace string values with Enums.
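For example, if one of the parsed fields can only take a handful of known values, an enum stores a small integer per row instead of a reference to a string (the type, its values and the Status property are invented for illustration):

public enum SiteStatus { Unknown, Active, Disabled } // hypothetical values

// instead of t.Field3 = tmp[3];
t.Status = (SiteStatus)Enum.Parse(typeof(SiteStatus), tmp[3], true);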
I have a desktop application developed in C#. The VM size used by the application is very high. I want to add a watermark to a PDF file that has more than 10,000 pages (10,776 pages to be exact); the VM size increases and sometimes the application freezes or throws an out of memory exception.
Is there a solution to release / decrease the VM size programmatically in C#?
Environment.FailFast :)
In all seriousness though, a large VM size is not necessarily an indication of a memory problem. I always get confused when it comes to the various memory metrics but I believe that VM size is a measurement of the amount of used address space, not necessarily used physical memory.
Here's another post on the topic: What does "VM Size" mean in the Windows Task Manager?
If you suspect that you have a problem with memory usage in your application, you probably need to consider using a memory profiler to find the root cause (pun intended.) It's a little tricky to get used to at first but it's a valuable skill. You'd be surprised what kind of performance issues surface when you're profiling.
This depends strongly on your source code. With the information given all I can say is that it would be best to get a memory profiler and check if there is space for optimizations.
Just to demonstrate you how memory usage might be optimized I would like to show you the following example. Instead of using string concatenation like this
string x = "";
for (int i=0; i < 100000; i++)
{
x += "!";
}
using a StringBuilder is far more memory- (and time-)efficient as it doesn't allocate a new string for each concatenation:
StringBuilder builder = new StringBuilder();
for (int i = 0; i < 100000; i++)
{
    builder.Append("!");
}
string x = builder.ToString();
The concatenation in the first sample creates a new string object on each iteration that occupies additional memory that will only be cleaned up when the garbage collector is running.
Edit 2: I just want to make sure my question is clear: why, on each iteration of AppendToLog(), does the application use 15 MB more (the size of the original log file)?
I've got a function called AppendToLog() which receives the file path of an HTML document, does some parsing and appends it to a file. It gets called this way:
this.user_email = uemail;
string wanted_user = wemail;
string[] logPaths;
logPaths = this.getLogPaths(wanted_user);

foreach (string path in logPaths)
{
    this.AppendToLog(path);
}
On every iteration, the RAM usage increases by 15 MB or so. This is the function (it looks long but it's simple):
public void AppendToLog(string path)
{
    Encoding enc = Encoding.GetEncoding("ISO-8859-2");
    StringBuilder fb = new StringBuilder();
    FileStream sourcef;
    string[] messages;
    try
    {
        sourcef = new FileStream(path, FileMode.Open);
    }
    catch (IOException)
    {
        throw new IOException("The chat log is in use by another process.");
    }
    using (StreamReader sreader = new StreamReader(sourcef, enc))
    {
        string file_buffer;
        while ((file_buffer = sreader.ReadLine()) != null)
        {
            fb.Append(file_buffer);
        }
    }

    // Array of each line's content
    messages = parseMessages(fb.ToString());
    fb = null;

    string destFileName = String.Format("{0}_log.txt", System.IO.Path.GetFileNameWithoutExtension(path));
    FileStream destf = new FileStream(destFileName, FileMode.Append);
    using (StreamWriter swriter = new StreamWriter(destf, enc))
    {
        foreach (string message in messages)
        {
            if (message != null)
            {
                swriter.WriteLine(message);
            }
        }
    }
    messages = null;

    sourcef.Dispose();
    destf.Dispose();
    sourcef = null;
    destf = null;
}
I've been at this for days and I don't know what to do :(
Edit: This is ParseMessages, a function that uses HtmlAgilityPack to strip parts of an HTML log.
public string[] parseMessages(string what)
{
    StringBuilder sb = new StringBuilder();
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(what);
    HtmlNodeCollection messageGroups = doc.DocumentNode.SelectNodes("//body/div[@class='mplsession']");
    int messageCount = doc.DocumentNode.SelectNodes("//tbody/tr").Count;
    doc = null;
    string[] buffer = new string[messageCount];
    int i = 0;

    foreach (HtmlNode sessiongroup in messageGroups)
    {
        HtmlNode tablegroup = sessiongroup.SelectSingleNode("table/tbody");
        string sessiontime = sessiongroup.Attributes["id"].Value;
        HtmlNodeCollection messages = tablegroup.SelectNodes("tr");
        if (messages != null)
        {
            foreach (HtmlNode htmlNode in messages)
            {
                sb.Append(
                    ParseMessageDate(
                        sessiontime,
                        htmlNode.ChildNodes[0].ChildNodes[0].InnerText
                    )
                ); // Date
                sb.Append(" ");
                try
                {
                    foreach (HtmlTextNode node in htmlNode.ChildNodes[0].SelectNodes("text()"))
                    {
                        sb.Append(node.Text.Trim()); // Name
                    }
                }
                catch (NullReferenceException)
                {
                    /*
                     * We ignore this exception, it just means there's extra text
                     * and that means that it's not a normal message
                     * but a system message instead
                     * (i.e. "John logged off")
                     * Therefore we add the "::" mark for future organizing
                     */
                    sb.Append("::");
                }
                sb.Append(" ");

                string message = htmlNode.ChildNodes[1].InnerHtml;
                message = message.Replace("&quot;", "'");
                message = message.Replace("&nbsp;", " ");
                message = RemoveMedia(message);

                sb.Append(message); // Message
                buffer[i] = sb.ToString();
                sb = new StringBuilder();
                i++;
            }
        }
    }
    messageGroups = null;
    what = null;
    return buffer;
}
As many have mentioned, this is probably just an artifact of the GC not cleaning up the memory as fast as you are expecting it to. This is normal for managed languages like C#, Java, etc. You really need to find out whether the memory allocated to your program is free or not, if you are interested in that usage. The questions to ask related to this are:
How long is your program running? Is it a service type program that runs continuously?
Over the span of execution does it continue to allocate memory from the OS or does it reach a steady-state? (Have you run it long enough to find out?)
Your code does not look like it will have a "memory-leak". In managed languages you really don't get memory leaks like you would in C/C++ (unless you are using unsafe or external libraries that are C/C++). What happens though is that you do need to watch out for references that stay around or are hidden (like a Collection class that has been told to remove an item but does not set the element of the internal array to null). Generally, objects with references on the stack (locals and parameters) cannot 'leak' unless you store the reference of the object(s) into an object/class variables.
Some comments on your code:
You can reduce the allocation/deallocation of memory by pre-allocating the StringBuilder to at least the proper size. Since you know you will need to hold the entire file in memory, allocate it to the file size (this will actually give you a buffer that is just a little bigger than required since you are not storing new-line character sequences but the file probably has them):
FileInfo fi = new FileInfo(path);
StringBuilder fb = new StringBuilder((int) fi.Length);
You may want to ensure the file exists before getting its length, using fi to check for that. Note that I just down-cast the length to an int without error checking as your files are less than 2GB based on your question text. If that is not the case then you should verify the length before casting it, perhaps throwing an exception if the file is too big.
I would recommend removing all the variable = null statements in your code. These are not necessary since these are stack allocated variables. As well, in this context, it will not help the GC since the method will not live for a long time. So, by having them you create additional clutter in the code and it is more difficult to understand.
In your ParseMessages method, you catch a NullReferenceException and assume that is just a non-text node. This could lead to confusing problems in the future. Since this is something you expect to normally happen as a result of something that may exist in the data you should check for the condition in the code, such as:
if (node.Text != null)
sb.Append(node.Text.Trim()); //Name
Exceptions are for exceptional/unexpected conditions in the code. Assigning more meaning to a NullReferenceException than "a null reference occurred" can (and likely will) hide errors in other parts of that same try block, now or after future changes.
There is no memory leak. If you are using Windows Task Manager to measure the memory used by your .NET application you are not getting a clear picture of what is going on, because the GC manages memory in a complex way that Task Manager doesn't reflect.
A MS engineer wrote a great article about why .NET applications that seem to be leaking memory probably aren't, and it has links to very in depth explanations of how the GC actually works. Every .NET programmer should read them.
I would look carefully at why you need to pass a string to parseMessages, ie fb.ToString().
Your code comment says that this returns an array of each line's content. However, you are actually reading all lines from the log file into fb and then converting it to a string.
If you are parsing large files in parseMessages() you could do this much more efficiently by passing the StringBuilder itself or the StreamReader into parseMessages(). This would enable only loading a portion of the file into memory at any time, as opposed to using ToString() which currently forces the entire logfile into memory.
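If the HtmlAgilityPack version you are using has the HtmlDocument.Load(TextReader) overload (recent versions do, but verify against your copy), a sketch of that change might look like this, with the rest of the parsing logic left as it already is:

// Caller: hand the reader to the parser instead of building one big string.
using (StreamReader sreader = new StreamReader(sourcef, enc))
{
    messages = parseMessages(sreader);
}

public string[] parseMessages(TextReader reader)
{
    HtmlDocument doc = new HtmlDocument();
    doc.Load(reader);   // parse straight from the stream; no intermediate string is built
    // ... the existing node-walking code stays unchanged and still returns buffer ...
    return buffer;
}

Note that HtmlAgilityPack still builds the full DOM in memory, so this mainly saves the extra copy held by the StringBuilder and by fb.ToString().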
You are less likely to have a true memory leak in a .NET application thanks to garbage collection. You do not look to be using any large resources such as files, so it seems even less likely that you have an actual memory leak.
It looks like you have disposed of resources ok, however the GC is probably struggling to allocate and then deallocate the large memory chunks in time before the next iteration starts, and so you see the increasing memory usage.
While GC.Collect() may allow you to force memory deallocation, I would strongly advise looking into the suggestions above before resorting to trying to manually manage memory via GC.
[Update] Seeing your parseMessages() and the use of HtmlAgilityPack (a very useful library, by the way), it looks likely there are some large and possibly numerous memory allocations being performed for every logfile.
HtmlAgility allocates memory for various nodes internally, when combined with your buffer array and the allocations in the main function I'm even more confident that the GC is being put under a lot of pressure to keep up.
To stop guessing and get some real metrics, I would run ProcessExplorer and add the columns to show the GC Gen 0,1,2 collections columns. Then run your application and observe the number of collections. If you're seeing large numbers in these columns then the GC is struggling and you should redesign to use less memory allocations.
Alternatively, the free CLR Profiler 2.0 from Microsoft provides nice visual representation of .NET memory allocations within your application.
One thing you may want to try is temporarily forcing a GC.Collect after each run. The GC is very intelligent and will not reclaim memory until it feels the expense of a collection is worth the value of any recovered memory.
Edit: I just wanted to add that it's important to understand that calling GC.Collect manually is a bad practice (for any normal use case; abnormal == perhaps a load function for a game or some such). You should let the garbage collector decide what's best, as it will generally have more information than is available to you about system resources and the like on which to base its collection behaviour.
The try-catch block could use a finally (cleanup). If you look at what the using statement does, it is equivalent to a try/finally. Yes, running the GC is a good idea too. Without compiling this code and giving it a try it is hard to say for sure...
Also, dispose this guy properly using a using:
FileStream destf = new FileStream(destFileName, FileMode.Append);
Look up Effective C# 2nd edition
I would manually clear the message array and the StringBuilder before setting them to null.
edit
Looking at what the process seems to do, I have a suggestion: if it's not too late, instead of parsing an HTML file, create a DataSet schema and use that to write and read an XML log file, then use an XSL file to convert it into an HTML file.
I don't see any obvious memory leaks; my first guess would be that it's something in the library.
A good tool to figure this kind of thing out is the .NET Memory Profiler, by SciTech. They have a free two-week trial.
Short of that, you could try commenting out some of the library functions, and see if the problem goes away if you just read the files and do nothing with the data.
Also, where are you looking for memory use stats? Keep in mind that the stats reported by Task Manager aren't always very useful or reflective of actual memory use.
The HtmlDocument class (as far as I can determine) has a serious memory leak when used from managed code. I recommend using the XMLDOM parser instead (though it does require well-formed documents, but that's another plus).