Why doesn't the StringBuilder class inherit from Stream? - c#

I'm just curious about this. It strikes me that the behavior of a StringBuilder is functionally (if not technically) the same as a Stream: it's a bin of data to which other data can be added.
Again, just curious.

A Stream is an input and output of binary data.
A StringBuilder is a means of building up text data.
Beyond that, there's the issue of state - a StringBuilder just has the current value, with no idea of "position". It allows you to access and mutate data anywhere within it. A stream, on the other hand, is logically a potentially infinite stream of data, with a cursor somewhere in the middle to say where you've got to. You generally just read/write forwards, with Seek/Position to skip to a specific part of the data stream.
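To make the difference concrete, here is a small sketch (the class and method names are illustrative only). StringBuilder lets you read or edit any index at any time, while a MemoryStream reads and writes at its Position cursor, which advances as you go.

```csharp
using System.IO;
using System.Text;

public static class BuilderVsStream
{
    // StringBuilder: no cursor; any index can be read or changed at any time.
    public static string EditAnywhere()
    {
        var sb = new StringBuilder("world");
        sb.Insert(0, "hello ");           // insert at the front
        sb[0] = 'H';                      // overwrite an arbitrary character
        return sb.Append('!').ToString(); // "Hello world!"
    }

    // Stream: reads and writes happen at Position, which then advances.
    public static long WriteThenRead(out int firstByte)
    {
        using (var ms = new MemoryStream())
        {
            ms.Write(new byte[] { 10, 20, 30 }, 0, 3); // Position is now 3
            ms.Position = 0;                           // seek back to the start
            firstByte = ms.ReadByte();                 // reads 10; Position becomes 1
            return ms.Position;
        }
    }
}
```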
Try to imagine implementing the Stream API with StringBuilder... it just doesn't fit. You could sort of do it, but you'd end up with StringReader and StringWriter, basically.

StringBuilder has more than just Append methods. It also has Insert methods, which are unnatural for a stream. Use the StringWriter class if you want a stream-like writer that wraps a StringBuilder.

A stream normally refers to an external input/output source (file, network). StringBuilder has no such characteristic.

Because it's not really a stream. It's more of a buffer that grows.

While both can have data added to them, the functionality as a whole is different.
A Stream is for inputting or outputting data from/to some source, not for building something. StringBuilder does not need the functionality that Stream provides, such as buffering, to build a resource.

On the other hand, you will find the classes StringReader and StringWriter in System.IO. StringWriter, for example, implements TextWriter on top of an underlying StringBuilder.
Personally, I've never used it, but if you have a routine that writes a text file, you could make it work against a TextWriter. Then, in your test, instead of instantiating a StreamWriter you instantiate a StringWriter, and you can check what was written by looking at the underlying StringBuilder.
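A minimal sketch of that testing approach; ReportWriter and WriteReport are hypothetical names standing in for your own file-writing routine. The routine depends only on TextWriter, so production code can pass a StreamWriter while a test passes a StringWriter.

```csharp
using System.Collections.Generic;
using System.IO;

public static class ReportWriter
{
    // Hypothetical text-file routine written against TextWriter, so it
    // does not care whether the output goes to a file or to memory.
    public static void WriteReport(TextWriter output, IEnumerable<string> lines)
    {
        foreach (string line in lines)
            output.WriteLine("* " + line);
    }
}
```

In production you would call it with something like new StreamWriter("report.txt"); in a test, passing a StringWriter and calling GetStringBuilder() (or ToString()) on it exposes exactly what was written.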
Now I'm dizzy...

Related

How to check name of element with WriteEndElement

I'm writing xml with XmlWriter. My code has lots of sections like this:
xml.WriteStartElement("payload");
ThirdPartyLibrary.Serialise(results, xml);
xml.WriteEndElement(); // </payload>
The problem is that the ThirdPartyLibrary.Serialise method is unreliable. Depending on the value of results, it can fail to close all the tags it opens. As a consequence, my WriteEndElement call is hijacked: it closes one of the library's dangling tags instead of writing </payload>.
Thus I'd like to make a checked call to WriteEndElement that checks the element name, and throws an exception unless the cursor is at the expected element.
xml.WriteEndElement("payload");
You can think of this like XmlReader.ReadStartElement(name) which throws unless the cursor is at the expected place in the document.
How can I achieve this?
Edit: A second use case for this extension method would be to make my own code more readable and reliable.
XmlWriter just writes the given XML to the stream without this kind of validation. If it validated element names while writing the tags, performance would suffer when creating big XML files.
When creating XML with XmlWriter, getting the structure right is the developer's responsibility. If you want this kind of validation, you can use XmlDocument.
If you really want to do this validation with XmlWriter, you have to create the writer over a string or StringBuilder, because if you write to a Stream or TextWriter you can't read back what has been written in the middle of writing. After every update to the XML, you would have to read the string and validate the written content with your own method.
I suggest using XmlDocument to create this type of XML.
In the end, I wrote an extension method, WriteSubtree, that gives this usable API:
using (var resultsXml = xml.WriteSubtree("Results"))
{
ThirdPartyLibrary.Serialise(results, resultsXml);
}
The extension method XmlWriter.WriteSubtree is analogous to .NET's XmlReader.ReadSubtree. It returns a special XmlWriter that checks against funny business. Its dispose method closes any tags left open.
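The WriteSubtree implementation isn't shown, so the following is only a sketch of a different, simpler technique with the same goal (WriteIsolatedElement is a hypothetical name). The untrusted code writes into its own fragment XmlWriter over a StringBuilder; XmlWriterSettings.WriteEndDocumentOnClose (true by default) makes that writer close any elements left open when it is disposed; the result is then copied into the parent element with WriteRaw. Because each subtree is buffered as a string, this suits moderate payload sizes.

```csharp
using System;
using System.Text;
using System.Xml;

public static class XmlWriterSubtreeExtensions
{
    // Hypothetical helper: isolates writeContent so dangling tags can't
    // "steal" the parent's WriteEndElement call.
    public static void WriteIsolatedElement(this XmlWriter parent, string name, Action<XmlWriter> writeContent)
    {
        var buffer = new StringBuilder();
        var settings = new XmlWriterSettings
        {
            ConformanceLevel = ConformanceLevel.Fragment, // content need not have a single root
            OmitXmlDeclaration = true,
            WriteEndDocumentOnClose = true                // auto-close open elements on Dispose
        };

        using (XmlWriter inner = XmlWriter.Create(buffer, settings))
        {
            writeContent(inner);
        } // disposing here closes anything writeContent left open

        parent.WriteStartElement(name);
        parent.WriteRaw(buffer.ToString()); // already well-formed, produced by XmlWriter
        parent.WriteEndElement();           // always closes <name>
    }
}
```

With this helper, the call from the question would become xml.WriteIsolatedElement("payload", w => ThirdPartyLibrary.Serialise(results, w));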

Strings appear to be sticking around too long

In short, I've got an application that converts a flat data file into an XML file. It does this by populating objects and then serializing them to XML.
The problem I'm running into is that the Garbage Collector does not seem to be taking care of the serialized strings. Files with 3,500 records run into OutOfMemoryExceptions before they finish. Something is fishy, indeed.
When I take the serialization out of the mix and simply pass an empty string, the memory consumption remains as expected, so I've ruled out the possibility that my intermediate objects (between flat file and xml) are the problem here. They seem to be collected as expected.
Can anyone help? How do I make sure these strings are disposed of properly?
Update: Some sample code
// myObj.Serialize invokes an XmlSerializer instance to handle its work
string serialized = myObj.Serialize();
myXmlWriter.WriteRaw(serialized);
This is basically where the problem is occurring: if I take the string serialized out of play, the memory problems go away too, even though I'm still transforming the flat file into objects, one at a time.
Update 2: Serialize method
public virtual string Serialize()
{
System.IO.StreamReader streamReader = null;
System.IO.MemoryStream memoryStream = null;
using (memoryStream = new MemoryStream())
{
memoryStream = new System.IO.MemoryStream();
Serializer.Serialize(memoryStream, this);
memoryStream.Seek(0, System.IO.SeekOrigin.Begin);
using (streamReader = new System.IO.StreamReader(memoryStream))
{
return streamReader.ReadToEnd();
}
}
}
You need to make sure they aren't referenced anywhere. Before an OutOfMemoryException is thrown, the GC is run. If it isn't recovering that memory, that means something is still holding on to it. Like others said, if you post some code, we might be able to help. Otherwise you can use a profiler or WinDbg/SOS to help figure out what is holding onto your strings.
Very curious indeed. I added the following dandy after each serialized record is written to the XmlWriter:
if (GC.GetTotalMemory(false) > 104857600) // 104857600 bytes = 100 MB
{
GC.WaitForPendingFinalizers();
}
and wouldn't you know it, it's keeping it in check and it's processing without incident, never getting too far above the threshold I set. I feel like there should be a better way, but it almost seems like the code was executing too fast for the garbage collector to reclaim the strings in time.
Do you have an example of your code showing how you're creating these strings? Are you breaking out into unmanaged code anywhere (which would require you to clean up after yourself)?
Another thought is how you are converting flat data file into XML. XML can be somewhat heavy depending on how you are building the file. If you are trying to hold the entire object in memory, it is very likely (easy to do, in fact) that you are running out of memory.
It sure looks like your method could be cleaned up to be just:
public virtual string Serialize()
{
StringBuilder sb = new StringBuilder();
using (StringWriter writer = new StringWriter(sb))
{
this.serializer.Serialize(writer, this);
}
return sb.ToString();
}
You are creating an extra MemoryStream for no reason.
But if you are writing the string to a file, then why don't you just send a FileStream to the Serialize() method?
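Along the same lines, and assuming the objects are serialized with XmlSerializer as in the question, you can skip the intermediate string entirely: XmlSerializer.Serialize has an overload that takes an XmlWriter, so each record can be written straight into the output document. FlatRecord and StreamingExport are hypothetical names used for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical record type standing in for the asker's serializable objects.
public class FlatRecord
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class StreamingExport
{
    // Create the serializer once and reuse it for every record.
    private static readonly XmlSerializer RecordSerializer = new XmlSerializer(typeof(FlatRecord));

    public static void WriteRecords(TextWriter output, IEnumerable<FlatRecord> records)
    {
        var noNamespaces = new XmlSerializerNamespaces();
        noNamespaces.Add("", ""); // omit xmlns:xsi / xmlns:xsd on each record

        using (XmlWriter xml = XmlWriter.Create(output))
        {
            xml.WriteStartElement("Records");
            foreach (FlatRecord record in records)
            {
                // Serializes directly into the open writer: no per-record string.
                RecordSerializer.Serialize(xml, record, noNamespaces);
            }
            xml.WriteEndElement();
        }
    }
}
```

For a file, output would be a StreamWriter over the target path, so records flow to disk as they are serialized.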

EndOfStream for BinaryReader

BinaryReader does not have an EndOfStream property. Is it safe to use the following code to check whether the end of the stream has been reached?
reader.BaseStream.Length>reader.BaseStream.Position
The easiest way I found is to check the return value of BinaryReader's PeekChar() method. If it returns -1, you have reached the end of the stream.
It depends. Various stream types do not implement the Length or Position properties; on those you'd get a NotSupportedException. NetworkStream, for example. Of course, if you use such a stream, you really do have to know up front how many times to call the BinaryReader.Read() method. So, yes, it's fine.
This won't work as a general solution because it assumes that the BaseStream supports the Length property. Many Stream implementations do not and instead throw a NotSupportedException, in particular any networking base stream such as HttpRequestStream and NetworkStream.
I've noticed that comparing Position to Length DOES NOT work on StreamReader even if the underlying BaseStream supports seeking. It seems that StreamReader buffers read-ahead from the BaseStream. This must be why StreamReader supplies an EndOfStream property, which is a good thing, and I wish BinaryReader did the same.
Checking these values (Length and Position) on the underlying base stream counts on BinaryReader to not behave as StreamReader does, i.e. relies on BinaryReader only grabbing the exact number of bytes from BaseStream needed to fulfill a user method call. Presumably if BinaryReader in fact operates this way internally it is why it does not need to supply an EndOfStream, but I sure wish it did supply one so that I knew that end of file was being correctly handled for clients in an implementation independent way.
Of course, Readers are not Streams, but with respect to end-of-file behavior it would be nice if there were a common interface that let clients of input/output classes know (A) whether end of file is a sensible concept for the underlying source of data, and (B) if so, when end of file occurs.
Check the stream's CanSeek property. If it returns true, you can compare the stream's Length to its Position to tell whether you are at the end of the stream. If it returns false, this won't work.
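A sketch of that check as a small helper (BinaryReaderEnd, IsAtEnd and ReadAllInt32 are hypothetical names). When CanSeek is false it falls back to reading until BinaryReader throws EndOfStreamException, which it does when the stream runs out mid-value. Comparing Position with Length assumes BinaryReader does not read ahead for primitive reads such as ReadInt32, the assumption discussed above.

```csharp
using System.Collections.Generic;
using System.IO;

public static class BinaryReaderEnd
{
    // true/false when the base stream can seek; null when "end" can't be
    // determined without attempting a read.
    public static bool? IsAtEnd(this BinaryReader reader)
    {
        Stream s = reader.BaseStream;
        if (!s.CanSeek) return null; // Length/Position may throw NotSupportedException
        return s.Position >= s.Length;
    }

    public static List<int> ReadAllInt32(BinaryReader reader)
    {
        var values = new List<int>();
        if (reader.IsAtEnd().HasValue)
        {
            // Seekable: compare Position with Length before each read.
            while (reader.IsAtEnd() == false)
                values.Add(reader.ReadInt32());
        }
        else
        {
            // Not seekable: read until the reader reports end of stream.
            try
            {
                while (true) values.Add(reader.ReadInt32());
            }
            catch (EndOfStreamException)
            {
                // End reached; note a trailing partial value is silently dropped here.
            }
        }
        return values;
    }
}
```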
For network streams you may need to distinguish between the end of the available bytes (the client on the other end still has more to write but hasn't yet) and the stream being closed. The Connected property of the underlying TCP connection isn't reliable for knowing when the stream has closed. It is possible to enumerate the computer's active connections and see whether the one you are using is among them. This is more reliable, but more complex. It may be better to just handle IOExceptions when you can't read any more.

EndianBinaryReader - Continuous update of the input stream?

I am trying to use the EndianBinaryReader and EndianBinaryWriter that Jon Skeet wrote as part of his MiscUtil library. It works great for the two uses I have made of it.
The first is reading from a NetworkStream (TcpClient), where I sit in a loop reading the data as it comes in. I can create a single EndianBinaryReader and just dispose of it when the application shuts down. I construct the EndianBinaryReader by passing in TcpClient.GetStream().
I am now trying to do the same thing when reading from a UdpClient, but this has no stream because UDP is connectionless, so I get the data like so:
byte[] data = udpClientSnapShot.Receive(ref endpoint);
I could put this data into a memory stream
var memoryStream = new MemoryStream(data);
and then create the EndianBinaryReader
var endianbinaryReader = new EndianBinaryReader(
new BigEndianBitConverter(), memoryStream,Encoding.ASCII);
but this means I have to create a new EndianBinaryReader every time I do a read. Is there a way to create a single stream whose input I can keep updating with the data from the UDP client?
I can't remember whether EndianBinaryReader buffers; you could overwrite a single MemoryStream? But to be honest, there is very little overhead from an extra object here. How big are the packets? (Putting the array into a MemoryStream will not clone the byte[]; the stream wraps the existing array.)
I'd be tempted to use the simplest thing that works and see if there is a real problem. Probably the one change I would make is to introduce using (since they are IDisposable):
using (var memoryStream = new MemoryStream(data))
using (var endianbinaryReader = new EndianBinaryReader(
    new BigEndianBitConverter(), memoryStream, Encoding.ASCII))
{
    // use it
}
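If profiling does show the per-packet objects matter, the "overwrite a single MemoryStream" idea might look like the sketch below. It uses the framework's BinaryReader as a stand-in, since whether EndianBinaryReader buffers is the open question; ReusablePacketReader is a hypothetical name. Load copies each packet into the stream's own buffer, so it trades an allocation for a copy.

```csharp
using System.IO;

public sealed class ReusablePacketReader
{
    private readonly MemoryStream stream = new MemoryStream();

    // One reader for the lifetime of the object. This is only safe if the
    // reader does not read ahead past the current packet.
    public BinaryReader Reader { get; }

    public ReusablePacketReader()
    {
        Reader = new BinaryReader(stream);
    }

    // Replace the stream's contents with the next packet and rewind.
    public void Load(byte[] packet)
    {
        stream.SetLength(0);
        stream.Write(packet, 0, packet.Length);
        stream.Position = 0;
    }
}
```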
Your best option is probably an override of the .NET Stream class to provide your custom functionality. The class is designed to be overridable with custom behavior.
It may look daunting because of the number of members, but it is easier than it looks. There are a number of boolean properties like "CanWrite", etc. Override them and have them all return "false" except for the functionality that your reader needs (probably CanRead is the only one you need to be true.)
Then, override all of the members whose description in the Stream help starts with "When overridden in a derived class", and have the unsupported ones throw a NotSupportedException (instead of the NotImplementedException that Visual Studio's generated stubs throw).
Implement the Read method to return data from your buffered UDP packets using perhaps a linked list of buffers, setting used buffers to "null" as you read past them so that the memory footprint doesn't grow unbounded.
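A minimal sketch of that approach, using a Queue<byte[]> in place of the suggested linked list (PacketStream and AddPacket are hypothetical names). It is not thread-safe, and Read returns 0 when no buffered data remains, which readers interpret as end of stream; a version fed from a receive loop would probably need locking and might block until the next packet arrives.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Read-only, non-seekable stream fed with whole UDP payloads.
public sealed class PacketStream : Stream
{
    private readonly Queue<byte[]> packets = new Queue<byte[]>();
    private int offsetInHead; // bytes already consumed from packets.Peek()

    public void AddPacket(byte[] data)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));
        if (data.Length > 0) packets.Enqueue(data);
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;

    public override int Read(byte[] buffer, int offset, int count)
    {
        int total = 0;
        while (count > 0 && packets.Count > 0)
        {
            byte[] head = packets.Peek();
            int n = Math.Min(count, head.Length - offsetInHead);
            Buffer.BlockCopy(head, offsetInHead, buffer, offset, n);
            offsetInHead += n; offset += n; count -= n; total += n;
            if (offsetInHead == head.Length)
            {
                packets.Dequeue(); // drop the reference so consumed packets can be collected
                offsetInHead = 0;
            }
        }
        return total; // 0 = nothing buffered, which readers treat as end of stream
    }

    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}
```

You would then wrap a single PacketStream in one EndianBinaryReader and call AddPacket with each byte[] returned by UdpClient.Receive.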

How to save the output of a console application

I need advice on how to have my C# console application display text to the user through standard output while still being able to access it later on. The actual feature I would like to implement is to dump the entire output buffer to a text file at the end of program execution.
The workaround I use until I find a cleaner approach is to subclass TextWriter, overriding the writing methods so they both write to a file and call the original stdout writer. Something like this:
using System;
using System.IO;
using System.Text;

public class DirtyWorkaround {
    private class DirtyWriter : TextWriter {
        private TextWriter stdoutWriter;
        private StreamWriter fileWriter;
        public DirtyWriter(string path, TextWriter stdoutWriter) {
            this.stdoutWriter = stdoutWriter;
            this.fileWriter = new StreamWriter(path);
        }
        // TextWriter.Encoding is abstract, so it has to be overridden.
        public override Encoding Encoding => stdoutWriter.Encoding;
        public override void Write(string s) {
            stdoutWriter.Write(s);
            fileWriter.Write(s);
            fileWriter.Flush();
        }
        // Same as above for Write(char), WriteLine() and WriteLine(string).
    }
    public static void Main(string[] args) {
        using (DirtyWriter dw = new DirtyWriter("path", Console.Out)) {
            Console.SetOut(dw);
            // Teh codez
        }
    }
}
Note that it writes to and flushes the file all the time. I'd love to do it only at the end of execution, but I couldn't find any way to access the output buffer.
Also, excuse inaccuracies with the above code (had to write it ad hoc, sorry ;).
The perfect solution for this is to use log4net with a console appender and a file appender. There are many other appenders available as well. It also allows you to turn the different appenders off and on at runtime.
I don't think there's anything wrong with your approach.
If you wanted reusable code, consider implementing a class called MultiWriter or some such that takes as input two (or N?) TextWriter streams and distributes all writes, flushes, etc. to those streams. Then you can do this file/console thing, but just as easily you can split any output stream. Useful!
Probably not what you want, but just in case... Apparently, PowerShell implements a version of the venerable tee command, which is intended for exactly this purpose. So... smoke 'em if you got 'em.
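A sketch of such a MultiWriter (the name comes from the answer; the implementation is an assumption). Overriding Write(char), Write(char[], int, int) and Write(string) covers most other TextWriter overloads, which funnel into these. It does not dispose the wrapped writers, leaving that to the caller.

```csharp
using System;
using System.IO;
using System.Text;

// Forwards every write and flush to all wrapped writers.
public sealed class MultiWriter : TextWriter
{
    private readonly TextWriter[] targets;

    public MultiWriter(params TextWriter[] targets)
    {
        if (targets == null || targets.Length == 0)
            throw new ArgumentException("At least one target writer is required.", nameof(targets));
        this.targets = targets;
    }

    // Reports the first target's encoding; the targets may differ.
    public override Encoding Encoding => targets[0].Encoding;

    public override void Write(char value)
    {
        foreach (var t in targets) t.Write(value);
    }

    public override void Write(char[] buffer, int index, int count)
    {
        foreach (var t in targets) t.Write(buffer, index, count);
    }

    public override void Write(string value)
    {
        foreach (var t in targets) t.Write(value);
    }

    public override void Flush()
    {
        foreach (var t in targets) t.Flush();
    }
}
```

For the console case, something like Console.SetOut(new MultiWriter(Console.Out, new StreamWriter("output.log"))) would send all console output to both places.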
I would say mimic the diagnostics that .NET itself uses (Trace and Debug).
Create an "output" class that can hold different classes adhering to a text output interface. You report to the output class, and it automatically sends the output to the classes you have added (ConsoleOutput, TextFileOutput, WhateverOutput), and so on. This also leaves you open to add other "output" types (such as XML/XSLT to get a nicely formatted report).
Check out the Trace Listeners Collection to see what I mean.
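A minimal sketch of leaning on the built-in Trace machinery directly instead of writing your own output class (OutputSetup is a hypothetical name). ConsoleTraceListener echoes messages to standard output and TextWriterTraceListener copies them to a log; code then reports through Trace.WriteLine rather than Console.WriteLine. Trace calls are compiled only when the TRACE symbol is defined, which is the default for SDK-style projects.

```csharp
using System.Diagnostics;
using System.IO;

public static class OutputSetup
{
    // Every Trace.WriteLine now goes to the console and to the given log writer.
    public static void Configure(TextWriter log)
    {
        Trace.Listeners.Add(new ConsoleTraceListener());       // echo to stdout
        Trace.Listeners.Add(new TextWriterTraceListener(log)); // copy to the log
        Trace.AutoFlush = true;                                // flush listeners after every write
    }
}
```

In an application you might call OutputSetup.Configure(new StreamWriter("output.log")) at startup and then use Trace.WriteLine("...") wherever you would have written to the console.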
Consider refactoring your application to separate the user-interaction portions from the business logic. In my experience, such a separation is quite beneficial to the structure of your program.
For the particular problem you're trying to solve here, it becomes straightforward for the user-interaction part to change its behavior from Console.WriteLine to file I/O.
I'm working on implementing a similar feature to capture output sent to the Console and save it to a log while still passing the output in real time to the normal Console so it doesn't break the application (e.g. if it's a console application!).
If you're still trying to do this in your own code by saving the console output (as opposed to using a logging system to save just the information you really care about), I think you can avoid the flush after each write, as long as you also override Flush() and make sure it flushes the original stdoutWriter you saved as well as your fileWriter. You want to do this in case the application is trying to flush a partial line to the console for immediate display (such as an input prompt, a progress indicator, etc), to override the normal line-buffering.
If that approach has problems with your console output being buffered too long, you might need to make sure that WriteLine() flushes stdoutWriter (but probably doesn't need to flush fileWriter except when your Flush() override is called). But I would think that the original Console.Out (actually going to the console) would automatically flush its buffer upon a newline, so you shouldn't have to force it.
You might also want to override Close() to (flush and) close your fileWriter (and probably stdoutWriter as well), but I'm not sure if that's really needed or if a Close() in the base TextWriter would issue a Flush() (which you would already override) and you might rely on application exit to close your file. You should probably test that it gets flushed on exit, to be sure. And be aware that an abnormal exit (crash) likely won't flush buffered output. If that's an issue, flushing fileWriter on newline may be desirable, but that's another tricky can of worms to work out.
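Putting those suggestions together, here is a sketch (TeeConsoleWriter is a hypothetical name, not an established class). Writes are not flushed individually; Flush() flushes both targets; and Dispose, which TextWriter.Close() calls, flushes and closes the log file while leaving the original console writer open. As noted above, a crash may still lose whatever is buffered in the log.

```csharp
using System;
using System.IO;
using System.Text;

public sealed class TeeConsoleWriter : TextWriter
{
    private readonly TextWriter stdoutWriter;
    private readonly TextWriter fileWriter;

    public TeeConsoleWriter(TextWriter stdoutWriter, TextWriter fileWriter)
    {
        this.stdoutWriter = stdoutWriter;
        this.fileWriter = fileWriter;
    }

    public override Encoding Encoding => stdoutWriter.Encoding;

    public override void Write(char value)
    {
        stdoutWriter.Write(value);
        fileWriter.Write(value);
    }

    public override void Write(string value)
    {
        stdoutWriter.Write(value);
        fileWriter.Write(value);
    }

    public override void Flush()
    {
        stdoutWriter.Flush(); // e.g. a prompt or progress indicator without a newline
        fileWriter.Flush();
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            Flush();
            fileWriter.Dispose(); // the original console writer is left open
        }
        base.Dispose(disposing);
    }
}
```

You would install it with something like var tee = new TeeConsoleWriter(Console.Out, new StreamWriter("output.log")); Console.SetOut(tee); and dispose tee before the program exits, for example with a using block around the body of Main.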
