I have a StringBuilder that appends all the pixels in an image, and this amount of data is extremely large. The first time I run my program everything goes well, but once I change a pixel color (ARGB) and render again, I get an OutOfMemoryException at the spot where I clear the StringBuilder. The problem is that I need to create an instance of StreamWriter, add my text to it, and THEN set the file path. My current code is:
StringBuilder PixelFile = new StringBuilder("", 5000);

private void Render()
{
    // On the second run, I get an OutOfMemoryException here
    PixelFile.Clear();

    // This is inside a for loop, but I cut the loop out here for brevity.
    PixelFile.Append(ArGBFormat);
}
I do not know what is causing this. I have tried PixelFile.Length = 0; and PixelFile.Capacity = 0;
An OutOfMemoryException probably means you're building a string too big for StringBuilder, which is designed for a very different type of operation.
While I'm at a loss for how to make StringBuilder work, let me point you at a more intuitive implementation that will be less likely to fail.
You can read and write from a file using direct binary through the BinaryReader and BinaryWriter classes. This can also save you a lot of effort since you can make sure you're serializing bytes instead of character strings or entire words.
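A minimal sketch of that binary approach, assuming the pixel data comes from a System.Drawing.Bitmap (adjust to whatever your pixel source actually is):

// Write each pixel's ARGB value as 4 raw bytes instead of formatting it as text.
using (var output = new BinaryWriter(File.Open("pixels.bin", FileMode.Create)))
{
    for (int y = 0; y < bitmap.Height; y++)
        for (int x = 0; x < bitmap.Width; x++)
            output.Write(bitmap.GetPixel(x, y).ToArgb()); // 4 bytes per pixel
}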
If you absolutely must use plaintext, consider the StreamReader and StreamWriter classes directly: they write out to the file as they go rather than holding everything in memory, so size is much less of a problem. Remember, streams are intended for this sort of operation and StringBuilder is not, so streams are far more likely to work with far less effort on your part.
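And a plaintext sketch with StreamWriter, writing each value out as it is produced instead of accumulating everything in a StringBuilder first (again, the Bitmap source is an assumption):

using (var writer = new StreamWriter("pixels.txt"))
{
    for (int y = 0; y < bitmap.Height; y++)
        for (int x = 0; x < bitmap.Width; x++)
            writer.WriteLine(bitmap.GetPixel(x, y).ToArgb());
}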
EDIT:
When the maximum capacity is reached, no further memory can be allocated for the StringBuilder object, and trying to add characters or expand it beyond its maximum capacity throws either an ArgumentOutOfRangeException or an OutOfMemoryException exception.
Therefore, this is a limitation of the StringBuilder class and cannot be overcome with your current implementation.
EDIT: Additional implementation
In addition to StreamWriters which can write directly to files, you can also use the MemoryStream class to pipe information to memory instead of disk. Be aware this could lead to slow performance of the program, and I recommend instead trying to refactor the process to only need to perform a stream once.
That being said, it is still possible.
var mem = new MemoryStream();
var memWriter = new StreamWriter(mem);
// TODO: use memWriter.Write as per StreamWriter
memWriter.Flush();  // push any text still buffered in the writer into the MemoryStream
mem.Position = 0;   // this ensures you are copying your stream from the beginning
// TODO: Show your file save dialog
using (var fileStream = new FileStream(fileNameFromDialog, FileMode.Create))
{
    mem.CopyTo(fileStream); // perform the copy
}
Related
I'm creating an application that will take an image in a certain format from one of a video game's files and convert it to a DDS. This requires me to build the DDS in a buffer and then write it out to a DDS file. This buffer is of type List<byte>.
I first write the magic number, which is just the text "DDS ", with this code:
ddsFile.AddRange(Encoding.ASCII.GetBytes("DDS "));
I then need to write the header size, which is always 0x7C000000 (124), and this is where I've hit a wall. I used this code to write it to the buffer:
ddsFile.AddRange(BitConverter.GetBytes(0x0000007C));
This made sense to me because BitConverter.GetBytes() says itself that it returns a byte[], and it does accept an int as a parameter, no problem. Additionally, this was what I saw recommended when looking for a method for adding multi-byte values to a byte list. But for whatever reason, when the program tries to execute that line, this exception is thrown:
Unable to cast object of type 'System.Byte[]' to type 'System.IConvertible'.
But what's even more strange to the point of being ridiculous is that, upon seeing what did make it into the buffer, I see that the int actually was being written to the buffer, but the exception was still occurring for who knows what reason.
Bizarrely, even writing a single byte to the list after writing the magic number, e.g. ddsFile.Add((byte)0x00);, results in the same thing.
Any help in figuring out why this exception occurs and/or a solution would be greatly appreciated.
This is not an answer to the question but a suggestion to do it differently.
Instead of using a List<byte> and manually doing all the conversions (while certainly possible, it's cumbersome), use a stream and a BinaryWriter - the stream can be a memory stream if you want to buffer the image in memory or a file stream if you want to write it to disk right away.
Using a BinaryWriter against the stream makes the conversions a lot simpler (and you can still manually convert parts of the data easily, if you need to do so).
Here's a short example:
var ms = new MemoryStream();
var bw = new BinaryWriter(ms, Encoding.ASCII);
bw.Write("DDS ");
bw.Write(124); // writes 4 bytes
bw.Write((byte) 124); // writes 1 byte
...
Use whichever overload of Write() you need to output the right bytes. (This short example omits cleaning up things but if you use a file stream, you'll need to make sure that you properly close it.)
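For instance, the file-stream version of the same sketch, with using blocks taking care of closing the file (the file name is just a placeholder):

using (var fs = new FileStream("output.dds", FileMode.Create))
using (var bw = new BinaryWriter(fs, Encoding.ASCII))
{
    bw.Write(Encoding.ASCII.GetBytes("DDS "));
    bw.Write(124);
    // ... remaining header fields and pixel data ...
}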
I am developing an application that reads lines from enormous text files (~2.5 GB), manipulates each line to a specific format, and then writes each line to a text file. Once the output text file has been closed, the program "Bulk Inserts" (SQL Server) the data into my database. It works, it's just slow.
I am using StreamReader and StreamWriter.
I'm pretty much stuck with reading one line at a time due to how I have to manipulate the text; however, I think that if I made a collection of lines and wrote out the collection every 1000 lines or so, it would speed things up at least a bit. The problem is (and this could be purely from my ignorance) that I cannot write a string[] using StreamWriter. After exploring StackOverflow and the rest of the internet, I came across File.WriteAllLines, which allows me to write string[]s to file, but I don't think my computer's memory can handle 2.5 GB of data being stored at one time. Also, the file is created, populated, and closed, so I would have to make a ton of smaller files to break down the 2.5 GB text files only to insert them into the database. So I would prefer to stay away from that option.
One hack job that I can think of is making a StringBuilder and using the AppendLine method to add each line to make a gigantic string. Then I could convert that StringBuilder to a string and write it to file.
But enough of my conjecturing. The method I have already implemented works, but I am wondering if anyone can suggest a better way to write chunks of data to a file?
Two things will increase the speed of output using StreamWriter.
First, make sure that the output file is on a different physical disk than the input file. If the input and output are on the same drive, then very often reads have to wait for writes and writes have to wait for reads. The disk can do only one thing at a time. Obviously not every read or write waits, because the StreamReader reads into a buffer and parses lines out of it, and the StreamWriter writes to a buffer and then pushes that to disk when the buffer is full. With the input and output files on separate drives, your reads and writes overlap.
What do I mean they overlap? The operating system will typically read ahead for you, so it can be buffering your file while you're processing. And when you do a write, the OS typically buffers that and writes it to the disk lazily. So there is some limited amount of asynchronous processing going on.
Second thing is to increase your buffer size. The default buffer size for StreamReader and StreamWriter is 4 kilobytes. So every 4K read or written incurs an operating system call. And, quite likely, a disk operation.
If you increase the buffer size to 64K, then you make 16 times fewer OS calls and 16 times fewer disk operations (not strictly true, but close). Going to a 64K buffer can cut more than 25% off your I/O time, and it's dead simple to do:
const int BufferSize = 64 * 1024;  // 64 KB
var reader = new StreamReader(inputFilename, Encoding.UTF8, true, BufferSize);
var writer = new StreamWriter(outputFilename, false, Encoding.UTF8, BufferSize);  // false = overwrite rather than append
Those two things will speed your I/O more than anything else you can do. Trying to build buffers in memory using StringBuilder is just unnecessary work that does a bad job of duplicating what you can achieve by increasing the buffer size, and done incorrectly can easily make your program slower.
I would caution against buffer sizes larger than 64 KB. On some systems, you get marginally better results with buffers up to 256 KB, but on others you get dramatically worse performance--to the tune of 50% slower! I've never seen a system perform better with buffers larger than 256 KB than they do with buffers of 64 KB. In my experience, 64 KB is the sweet spot.
One other thing you can do is use three threads: a reader, a processor, and a writer. They communicate with queues. This can reduce your total time from (input-time + process-time + output-time) to something very close to max(input-time, process-time, output-time). And with .NET, it's really easy to set up. See my blog posts: Simple multithreading, Part 1 and Simple multithreading, Part 2.
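Here is a minimal sketch of that pipeline using BlockingCollection as the queues; the file names, queue capacities, and ProcessLine are placeholders for your own code:

using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

var readQueue = new BlockingCollection<string>(boundedCapacity: 10000);
var writeQueue = new BlockingCollection<string>(boundedCapacity: 10000);

var readerTask = Task.Run(() =>
{
    foreach (var line in File.ReadLines("input.txt"))
        readQueue.Add(line);
    readQueue.CompleteAdding(); // tell the processor nothing more is coming
});

var processorTask = Task.Run(() =>
{
    foreach (var line in readQueue.GetConsumingEnumerable())
        writeQueue.Add(ProcessLine(line)); // your per-line transformation
    writeQueue.CompleteAdding();
});

var writerTask = Task.Run(() =>
{
    using (var output = new StreamWriter("output.txt"))
        foreach (var line in writeQueue.GetConsumingEnumerable())
            output.WriteLine(line);
});

Task.WaitAll(readerTask, processorTask, writerTask);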
According to the docs, StreamWriter does not automatically flush after every write by default, so it is buffered.
You could also use some of the lazy methods on the File class like so:
File.WriteAllLines("output.txt",
File.ReadLines("filename.txt").Select(ProcessLine));
where ProcessLine is declared like so:
private string ProcessLine(string input) {
    // do some calculation on input
    string result = input; // placeholder for the real transformation
    return result;
}
Since ReadLines is lazy and WriteAllLines has a lazy overload, it will stream the file rather than attempting to read the whole thing.
What about building strings to write?
Something like
int cnt = 0;
StringBuilder s = new StringBuilder();
string line;
while ((line = reader.ReadLine()) != null)
{
    cnt++;
    string x = ManipulateLine(line); // your per-line transformation
    s.Append(x + "\n");
    if (cnt % 10000 == 0)
    {
        writer.Write(s);
        s = new StringBuilder();
    }
}
writer.Write(s); // write out whatever is left after the loop
Edited because the comment below is right: should have used StringBuilder.
I want to read a file into a RichTextBox without using LoadFile (I might want to display the progress). The file contains only ASCII characters.
I was thinking of reading the file in chunks.
I have done the following (which is working):
const int READ_BUFFER_SIZE = 4 * 1024;
using (BinaryReader reader = new BinaryReader(File.Open("file.txt", FileMode.Open)))
{
    byte[] buf = new byte[READ_BUFFER_SIZE];
    do {
        int ret = reader.Read(buf, 0, READ_BUFFER_SIZE);
        if (ret <= 0) {
            break;
        }
        string text = Encoding.ASCII.GetString(buf, 0, ret); // only convert the bytes actually read
        richTextBox.AppendText(text);
    } while (true);
}
My concern is:
string text = Encoding.ASCII.GetString(buf, 0, ret);
I have seen that it is not possible to add a byte[] to a RichTextBox.
My questions are:
Will a new string object be allocated for every chunk which is read?
Isn't there a better way not to have to create a string object just for appending the text to the RichTextBox?
Or, is it more efficient to read lines from the file (StreamReader.ReadLine) and just add to the RichTextBox the string returned?
Will a new string object be allocated for every chunk which is read?
Yes.
Isn't there a better way not to have to create a string object just for appending the text to the RichTextBox?
No, AppendText requires a string
Or, is it more efficient to read lines from the file (StreamReader.ReadLine) and just add to the RichTextBox the string returned?
No, that's considerably less efficient. You'll now create a new string object much more frequently. Which is okay from the garbage collected heap perspective, you don't create more garbage. But it is absolute murder on the RichTextBox, it constantly needs to re-allocate its own buffer. Which includes moving all the text previously read. What you have is already good, you should just use a much larger READ_BUFFER_SIZE.
Unfortunately there are conflicting goals here. You don't want to make the buffer larger than 39,999 bytes or the strings end up in the Large Object Heap and clog it up until a gen# 2 garbage collection happens. But the RTB will be much happier if you go considerably past that size, like a megabyte if the file is so large that you need a progress bar.
If you want to make it really efficient then you need to replace RichTextBox.LoadFile(). The underlying Windows message is EM_STREAMIN, it uses a callback mechanism to stream in the text. You can technically replace the callback to do what the default one does in RichTextBox, plus update a progress bar. It does permit getting rid of the strings btw. The pinvoke is pretty unfriendly, use the Reference Source for guidance.
Take the easy route first, increase the buffer size. Only consider using the pinvoke route when your code is considerably slower than using File.ReadAllText().
Try this:
richTextBox.AppendText(File.ReadAllText("file.txt"));
or
richTextBox.AppendText(File.ReadAllText("file.txt", Encoding.ASCII));
You can use a StreamReader. Then you can read each line of the file and display the progress while reading.
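A rough sketch of that, assuming WinForms richTextBox and progressBar controls (the progress is approximate because the underlying stream position moves in buffer-sized jumps):

using (var reader = new StreamReader("file.txt", Encoding.ASCII))
{
    long totalBytes = Math.Max(1, reader.BaseStream.Length);
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        richTextBox.AppendText(line + Environment.NewLine);
        progressBar.Value = (int)(100 * reader.BaseStream.Position / totalBytes);
    }
}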
My application needs to parse some large string data, which means I am heavily using the Split, IndexOf and Substring methods of the string class. I am trying to use the StringBuilder class wherever I have to do any concatenation. However, when the application is doing this parsing, CPU usage goes high (60-70%). I am guessing that calling these string APIs is what's causing the CPU usage to go high, especially since the data is big (a typical string is 400K characters long). Any ideas on how I can verify what is causing the CPU usage to be that high, and any suggestions on how to bring it down?
One thing to check is that you are passing the StringBuilder around as much as possible, rather than creating a new one and then needlessly returning its ToString().
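A tiny illustration of what I mean; the method names here are made up (requires using System.Text;):

// Wasteful: every helper builds and returns its own string.
static string BuildPartWasteful(string name, int value)
{
    var sb = new StringBuilder();
    sb.Append(name).Append(',').Append(value);
    return sb.ToString(); // allocates a new string on every call
}

// Better: pass the caller's StringBuilder in and append to it directly,
// so one builder is reused across the whole parse.
static void BuildPart(StringBuilder sb, string name, int value)
{
    sb.Append(name).Append(',').Append(value);
}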
A much bigger gain though can be made if you process the data as smaller strings, read from a stream. Of course, this depends on just what sort of manipulation you are doing, but if at all possible, read your data from a StreamReader (or similar depending on the source) in small chunks, and then write it to a StreamWriter.
Often changes are only applicable within a given line of text, which makes the following pattern immediately useful:
using(StreamReader sr = new StreamReader(sourceInfo))
using(StreamWriter sw = new StreamWriter(destInfo))
for(string line = sr.ReadLine(); line != null; line = sr.ReadLine())
sw.WriteLine(ManipulateString(line));
In other cases where this doesn't apply, there are still ways to chunk the string to be processed up.
To find out where the CPU usage is coming from: see What Are Some Good .NET Profilers?
To reduce CPU usage: it depends, of course, on what's actually taking the time. You might, for instance, consider working not with actual substrings but with little objects encoding where they are within the big strings they came from. (There is no guarantee that this will actually be an improvement.) Very likely, when you profile your code there will be a few things that jump out at you as problems; they may well be things you'd never have guessed, and they may be very easy to fix as soon as you know they need fixing.
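A sketch of those "little objects" (the type name is made up): record where a piece lives inside the big string instead of allocating a substring for it.

struct StringSegment
{
    public readonly string Source;
    public readonly int Start;
    public readonly int Length;

    public StringSegment(string source, int start, int length)
    {
        Source = source;
        Start = start;
        Length = length;
    }

    // Only materialize an actual substring when you really need one.
    public override string ToString()
    {
        return Source.Substring(Start, Length);
    }
}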
Further to Jon's answer: if your parser does not need to do back-tracking (i.e. it always reads through the string in a forward direction) and the source of the string is not a file/network stream that you could use a StreamReader with, just wrap your string in a StringReader instead, e.g.
//Create a StringReader using the String variable data which has your String in it
//A StringReader is just a TextReader implementation for Strings
StringReader reader = new StringReader(data);
//Now do whatever manipulation on the string you want...
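For example, the forward-only pass could then look like this (what you do with each line is up to you):

string line;
while ((line = reader.ReadLine()) != null)
{
    // forward-only, line-by-line processing of the in-memory string
}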
In your case you are typically working with very large strings (around 400K characters). For operations on large strings you can use the "rope" data structure, which is very efficient for this kind of workload.
Please refer to the links below for more information:
https://iq.opengenus.org/rope-data-structure/
https://www.geeksforgeeks.org/ropes-data-structure-fast-string-concatenation/
STL ropes in C++: https://www.geeksforgeeks.org/stl-ropes-in-c/
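To give a flavour of the idea, here is a very small sketch of a rope: a binary tree whose leaves hold string fragments, so concatenation creates one node instead of copying characters. (A real implementation also balances the tree and supports splitting, substring, and insertion.)

abstract class Rope
{
    public abstract int Length { get; }
    public abstract char CharAt(int index);

    public static Rope FromString(string s) { return new Leaf(s); }
    public Rope Concat(Rope other) { return new Node(this, other); }

    sealed class Leaf : Rope
    {
        private readonly string _text;
        public Leaf(string text) { _text = text; }
        public override int Length { get { return _text.Length; } }
        public override char CharAt(int index) { return _text[index]; }
    }

    sealed class Node : Rope
    {
        private readonly Rope _left, _right;
        public Node(Rope left, Rope right) { _left = left; _right = right; }
        public override int Length { get { return _left.Length + _right.Length; } }
        public override char CharAt(int index)
        {
            return index < _left.Length ? _left.CharAt(index) : _right.CharAt(index - _left.Length);
        }
    }
}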
Edit 2: I just want to make sure my question is clear: why, on each iteration of AppendToLog(), does the application use 15 MB more (the size of the original log file)?
I've got a function called AppendToLog() which receives the file path of an HTML document, does some parsing and appends it to a file. It gets called this way:
this.user_email = uemail;
string wanted_user = wemail;
string[] logPaths;
logPaths = this.getLogPaths(wanted_user);
foreach (string path in logPaths)
{
this.AppendToLog(path);
}
On every iteration, the RAM usage increases by 15 MB or so. This is the function (it looks long but it's simple):
public void AppendToLog(string path)
{
Encoding enc = Encoding.GetEncoding("ISO-8859-2");
StringBuilder fb = new StringBuilder();
FileStream sourcef;
string[] messages;
try
{
sourcef = new FileStream(path, FileMode.Open);
}
catch (IOException)
{
throw new IOException("The chat log is in use by another process.");
}
using (StreamReader sreader = new StreamReader(sourcef, enc))
{
string file_buffer;
while ((file_buffer = sreader.ReadLine()) != null)
{
fb.Append(file_buffer);
}
}
//Array of each line's content
messages = parseMessages(fb.ToString());
fb = null;
string destFileName = String.Format("{0}_log.txt",System.IO.Path.GetFileNameWithoutExtension(path));
FileStream destf = new FileStream(destFileName, FileMode.Append);
using (StreamWriter swriter = new StreamWriter(destf, enc))
{
foreach (string message in messages)
{
if (message != null)
{
swriter.WriteLine(message);
}
}
}
messages = null;
sourcef.Dispose();
destf.Dispose();
sourcef = null;
destf = null;
}
I've been days with this and I don't know what to do :(
Edit: This is ParseMessages, a function that uses HtmlAgilityPack to strip parts of an HTML log.
public string[] parseMessages(string what)
{
StringBuilder sb = new StringBuilder();
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(what);
HtmlNodeCollection messageGroups = doc.DocumentNode.SelectNodes("//body/div[@class='mplsession']");
int messageCount = doc.DocumentNode.SelectNodes("//tbody/tr").Count;
doc = null;
string[] buffer = new string[messageCount];
int i = 0;
foreach (HtmlNode sessiongroup in messageGroups)
{
HtmlNode tablegroup = sessiongroup.SelectSingleNode("table/tbody");
string sessiontime = sessiongroup.Attributes["id"].Value;
HtmlNodeCollection messages = tablegroup.SelectNodes("tr");
if (messages != null)
{
foreach (HtmlNode htmlNode in messages)
{
sb.Append(
ParseMessageDate(
sessiontime,
htmlNode.ChildNodes[0].ChildNodes[0].InnerText
)
); //Date
sb.Append(" ");
try
{
foreach (HtmlTextNode node in htmlNode.ChildNodes[0].SelectNodes("text()"))
{
sb.Append(node.Text.Trim()); //Name
}
}
catch (NullReferenceException)
{
/*
* We ignore this exception, it just means there's extra text
* and that means that it's not a normal message
* but a system message instead
* (i.e. "John logged off")
* Therefore we add the "::" mark for future organizing
*/
sb.Append("::");
}
sb.Append(" ");
string message = htmlNode.ChildNodes[1].InnerHtml;
message = message.Replace("&quot;", "'");
message = message.Replace("&nbsp;", " ");
message = RemoveMedia(message);
sb.Append(message); //Message
buffer[i] = sb.ToString();
sb = new StringBuilder();
i++;
}
}
}
messageGroups = null;
what = null;
return buffer;
}
As many have mentioned, this is probably just an artifact of the GC not cleaning up the memory storage as fast as you are expecting it to. This is normal for managed languages, like C#, Java, etc. You really need to find out if the memory allocated to your program is free or not if you are interested in that usage. The questions to ask related to this are:
How long is your program running? Is it a service type program that runs continuously?
Over the span of execution does it continue to allocate memory from the OS or does it reach a steady-state? (Have you run it long enough to find out?)
Your code does not look like it will have a "memory-leak". In managed languages you really don't get memory leaks like you would in C/C++ (unless you are using unsafe or external libraries that are C/C++). What happens though is that you do need to watch out for references that stay around or are hidden (like a Collection class that has been told to remove an item but does not set the element of the internal array to null). Generally, objects with references on the stack (locals and parameters) cannot 'leak' unless you store the reference of the object(s) into an object/class variables.
Some comments on your code:
You can reduce the allocation/deallocation of memory by pre-allocating the StringBuilder to at least the proper size. Since you know you will need to hold the entire file in memory, allocate it to the file size (this will actually give you a buffer that is just a little bigger than required since you are not storing new-line character sequences but the file probably has them):
FileInfo fi = new FileInfo(path);
StringBuilder fb = new StringBuilder((int) fi.Length);
You may want to ensure the file exists before getting its length, using fi to check for that. Note that I just down-cast the length to an int without error checking as your files are less than 2GB based on your question text. If that is not the case then you should verify the length before casting it, perhaps throwing an exception if the file is too big.
I would recommend removing all the variable = null statements in your code. These are not necessary since these are stack allocated variables. As well, in this context, it will not help the GC since the method will not live for a long time. So, by having them you create additional clutter in the code and it is more difficult to understand.
In your ParseMessages method, you catch a NullReferenceException and assume that is just a non-text node. This could lead to confusing problems in the future. Since this is something you expect to normally happen as a result of something that may exist in the data you should check for the condition in the code, such as:
if (node.Text != null)
sb.Append(node.Text.Trim()); //Name
Exceptions are for exceptional/unexpected conditions in the code. Assigning significant meaning to NullReferenceException more than that there was a null reference can (likely will) hide errors in other parts of that same try block now or with future changes.
There is no memory leak. If you are using Windows Task Manager to measure the memory used by your .NET application you are not getting a clear picture of what is going on, because the GC manages memory in a complex way that Task Manager doesn't reflect.
A MS engineer wrote a great article about why .NET applications that seem to be leaking memory probably aren't, and it has links to very in depth explanations of how the GC actually works. Every .NET programmer should read them.
I would look carefully at why you need to pass a string to parseMessages, ie fb.ToString().
Your code comment says that this returns an array of each line's content. However you are actually reading all lines from the log file into fb and then converting to a string.
If you are parsing large files in parseMessages() you could do this much more efficiently by passing the StringBuilder itself or the StreamReader into parseMessages(). This would enable only loading a portion of the file into memory at any time, as opposed to using ToString() which currently forces the entire logfile into memory.
You are less likely to have a true memory leak in a .NET application thanks to garbage collection. You do not look to be using any large resources such as files, so it seems even less likely that you have an actual memory leak.
It looks like you have disposed of resources ok, however the GC is probably struggling to allocate and then deallocate the large memory chunks in time before the next iteration starts, and so you see the increasing memory usage.
While GC.Collect() may allow you to force memory deallocation, I would strongly advise looking into the suggestions above before resorting to trying to manually manage memory via GC.
[Update] Seeing your parseMessages() and the use of HtmlAgilityPack (a very useful library, by the way), it looks likely there are some large and possibly numerous allocations of memory being performed for every logfile.
HtmlAgility allocates memory for various nodes internally, when combined with your buffer array and the allocations in the main function I'm even more confident that the GC is being put under a lot of pressure to keep up.
To stop guessing and get some real metrics, I would run Process Explorer and add the columns that show the GC Gen 0, 1, and 2 collection counts. Then run your application and observe the numbers. If you're seeing large numbers in these columns then the GC is struggling and you should redesign to use fewer memory allocations.
Alternatively, the free CLR Profiler 2.0 from Microsoft provides nice visual representation of .NET memory allocations within your application.
One thing you may want to try is temporarily forcing a GC.Collect after each run. The GC is very intelligent, and will not reclaim memory until it feels the expense of a collection is worth the value of any recovered memory.
Edit: I just wanted to add that it's important to understand that calling GC.Collect manually is a bad practice (for any normal use case; abnormal == perhaps a load function for a game or somesuch). You should let the garbage collector decide what's best, as it will generally have more information than is available to you about system resources and the like on which to base its collection behaviour.
The try-catch block could use a finally (cleanup). If you look at what the using statement does, it is equivalent to try/catch/finally. Yes, running the GC is a good idea too. Without compiling this code and giving it a try, it is hard to say for sure...
Also, dispose this guy properly using a using:
FileStream destf = new FileStream(destFileName, FileMode.Append);
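Wrapped in using statements (reusing the names from the original code), that part might look like:

using (FileStream destf = new FileStream(destFileName, FileMode.Append))
using (StreamWriter swriter = new StreamWriter(destf, enc))
{
    foreach (string message in messages)
    {
        if (message != null)
        {
            swriter.WriteLine(message);
        }
    }
}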
Look up Effective C# 2nd edition
I would manually clear the array of messages and the StringBuilder before setting them to null.
Edit
Looking at what the process seems to do, I have a suggestion: if it's not too late, instead of parsing an HTML file,
create a DataSet schema and use that to write and read an XML log file, then use an XSL file to convert it into an HTML file.
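A rough sketch of that idea (table and column names are invented; requires using System.Data;):

var log = new DataSet("ChatLog");
var table = log.Tables.Add("Message");
table.Columns.Add("Date", typeof(DateTime));
table.Columns.Add("Sender", typeof(string));
table.Columns.Add("Text", typeof(string));

table.Rows.Add(DateTime.Now, "John", "Hello");

// Persist as XML (with schema) and read it back later with log.ReadXml(...).
log.WriteXml("log.xml", XmlWriteMode.WriteSchema);
// If you still need an HTML view, run the XML through an XslCompiledTransform.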
I don't see any obvious memory leaks; my first guess would be that it's something in the library.
A good tool to figure this kind of thing out is the .NET Memory Profiler, by SciTech. They have a free two-week trial.
Short of that, you could try commenting out some of the library functions, and see if the problem goes away if you just read the files and do nothing with the data.
Also, where are you looking for memory use stats? Keep in mind that the stats reported by Task Manager aren't always very useful or reflective of actual memory use.
The HtmlDocument class (as far as I can determine) has a serious memory leak when used from managed code. I recommend using the XMLDOM parser instead (though this does require well-formed documents, but that's another plus).