I have a wrapper class with a method that downloads a file from a web server and needs to return said file. The HttpWebResponse object exposes the response body as a Stream.
Should I return a stream? Or should I convert it to a byte array and return that instead?
This wrapper class may be used in several places so I need a solid way to return the file. In every case, the file will be saved somewhere after receiving it from the adapter class.
Short answer: Yes, it's fine.
Long answer: Yes, it's completely safe to return a Stream. The .NET garbage collector won't dispose the stream out from under you; it only gets disposed if something explicitly calls Dispose() on it, which you should not do if you are planning on reusing it.
Returning a Stream object is totally valid; it is the responsibility of the calling code to dispose of the stream it receives. The question is: does disposing the response object in your wrapper method also dispose the stream? If so, copy its contents into a MemoryStream, or write it to a temp file and return an open stream to that file instead.
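For the case where disposing the wrapper's HttpWebResponse would also dispose the body stream, a minimal sketch of the copy-to-MemoryStream approach might look like this (the method name and url parameter are placeholders, not part of the original wrapper):

using System.IO;
using System.Net;

public Stream DownloadFile(string url)
{
    var request = (HttpWebRequest)WebRequest.Create(url);
    using (var response = (HttpWebResponse)request.GetResponse())
    using (var body = response.GetResponseStream())
    {
        // Copy the response body so the returned stream outlives the response object.
        var buffer = new MemoryStream();
        body.CopyTo(buffer);
        buffer.Position = 0;
        return buffer;
    }
}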
Related
I have found the following construct in some open-source code:
var mstream = new MemoryStream();
// ... write some data to mstream
mstream.Close();
byte[] b = mstream.GetBuffer();
I thought this code would have "unexpected" behavior and maybe throw an exception, since the call to Close should effectively be a call to Dispose according to the MSDN documentation.
However, as far as I have been able to tell from experimenting, the call to GetBuffer() always succeeds and returns a valid result, even if I Thread.Sleep for 20 seconds or force garbage collection via GC.Collect().
Should the call to GetBuffer() succeed even after Close/Dispose? If so, why isn't the underlying buffer released when the MemoryStream is disposed?
It doesn't need to. The buffer is managed memory, so normal garbage collection will deal with it, without needing to be included in the disposal.
It's useful to be able to get the bytes of the memory stream even after the stream has been closed (which may have happened automatically after the stream was passed to a method that writes something to a stream and then closes said stream). For that to work, the object needs to hold onto the buffer, along with a record of how much has been written to it.
Considering the second point, it actually makes more sense in a lot of cases to call ToArray() (which, as said, requires the in-memory store that GetBuffer() returns to still be alive) after you've closed the stream, because having closed the stream guarantees that any further attempt to write to it will fail. Hence, if you have a bug where you obtain the array too early, it will throw an exception rather than just give you incorrect data. (Obviously, if you explicitly want to get the current array part-way through the stream operations, that's another matter.) It also guarantees that everything is fully flushed rather than having part of the data sitting in a temporary buffer (MemoryStream itself isn't buffered, because a MemoryStream essentially is a buffer, but you may have been using it with chained streams or writers that have their own separate buffers).
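As a small illustration of that ordering, here is a hedged sketch assuming the MemoryStream is filled through a StreamWriter: the array is read out only after the writer (and with it the stream) has been closed, so everything is guaranteed to be flushed.

using System;
using System.IO;

class Program
{
    static void Main()
    {
        var ms = new MemoryStream();
        using (var writer = new StreamWriter(ms))
        {
            writer.Write("hello");
        } // disposing the writer flushes it and closes the MemoryStream

        byte[] data = ms.ToArray();      // still works after the stream is closed
        Console.WriteLine(data.Length);  // 5 bytes for "hello" in UTF-8
        // ms.WriteByte(0x00);           // would now throw ObjectDisposedException
    }
}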
Technically there is nothing to dispose in MemoryStream. Literally nothing: it has no operating system handles and no unmanaged resources; it is just a wrapper around a byte[]. All Dispose could do is set the buffer (the internal array) to null, which the BCL team chose not to do.
As @mike noted in the comments, the BCL team wanted GetBuffer and ToArray to keep working even after disposal, though we're not sure why. Reference source.
Here's how Dispose is implemented:
protected override void Dispose(bool disposing)
{
    try {
        if (disposing) {
            _isOpen = false;
            _writable = false;
            _expandable = false;
            // Don't set buffer to null - allow GetBuffer & ToArray to work.
#if FEATURE_ASYNC_IO
            _lastReadTask = null;
#endif
        }
    }
    finally {
        // Call base.Close() to cleanup async IO resources
        base.Dispose(disposing);
    }
}
and GetBuffer is below
public virtual byte[] GetBuffer()
{
    if (!this._exposable)
    {
        throw new UnauthorizedAccessException(Environment.GetResourceString("UnauthorizedAccess_MemStreamBuffer"));
    }
    return this._buffer;
}
As you can see, Dispose leaves _buffer untouched, and GetBuffer performs no disposed check.
Disposing a MemoryStream does not release its buffer: the buffer is ordinary managed memory, and it is only reclaimed by the garbage collector once nothing references it anymore. Because the MemoryStream keeps a strong reference to its buffer even after it has been closed, you can still get at it. Here is what the GetBuffer method looks like:
public virtual byte[] GetBuffer()
{
    if (!this._exposable)
    {
        throw new UnauthorizedAccessException(Environment.GetResourceString("UnauthorizedAccess_MemStreamBuffer"));
    }
    return this._buffer;
}
Unlike most interface methods, the IDisposable.Dispose does not promise to do anything. Instead, it provides a standard means by which the owner of an object can let that object know that its services are no longer required, in case the object might need to make use of that information. If an object has asked outside entities to do something on its behalf, and has promised those outside entities that it will let them know when their services are no longer required, its Dispose method can relay the notification to those entities.
If an object has a method which can only be performed while the object has outside entities acting on its behalf, an attempt to call that method after those entities have been dismissed should throw an ObjectDisposedException rather than failing in some other way. Further, if there is a method which cannot possibly be useful after the entities are dismissed, it should often throw ObjectDisposedException even if a particular call didn't actually need to use them. On the other hand, if a particular call would have a sensible meaning after an object has dismissed all entities that were acting on its behalf, there's no particular reason why such a call shouldn't be allowed to succeed.
I would view the ObjectDisposedException much like I view the collection-modified InvalidOperationException of IEnumerator<T>.MoveNext(): if some condition (either Dispose, or modification of a collection, respectively) would prevent a method from behaving "normally", the method is allowed to throw the indicated exception, and is not allowed to behave in some other erroneous fashion. On the other hand, if the method is capable of achieving its objectives without difficulty, and if doing so would make sense, such behavior should be considered just as acceptable as would be throwing an exception. In general, objects are not required to operate under such adverse conditions, but sometimes it may be helpful for them to do so [e.g. enumeration of a ConcurrentDictionary will not be invalidated by changes to the collection, since such invalidation would make concurrent enumeration useless].
So, after discovering that the Bitmap class expects the original stream to stay open for the life of the image or bitmap, I decided to find out if the Bitmap class actually closes the stream when it is disposed.
Looking at the source code, the Bitmap and Image classes create a GPStream instance to wrap the stream, but do not store a reference to either the GPStream or the Stream instance.
num = SafeNativeMethods.Gdip.GdipLoadImageFromStreamICM(new GPStream(stream), out zero);
Now, the GPStream class (internal) does not implement a Release or Dispose method, nothing that would allow GDI+ to close or dispose of the stream. And since the Image/Bitmap class doesn't keep a reference to the GPStream instance, it seems that there is absolutely no way for GDI+, Drawing.Bitmap, or the GPStream wrapper to close the stream properly.
I could subclass Bitmap to fix this, but, oh wait, it's sealed.
Please tell me I'm wrong, and that MS didn't just make it impossible to write code that doesn't leak resources with their API.
Keep in mind that (a) Bitmap has no managed reference to the stream, meaning the GC may collect it while it is still in use, and (b) .NET APIs take Bitmap/Image references and aren't deterministic about when they're done with them.
Since you supply the stream in this example, I'd imagine you are responsible for disposing it.
It is good practice to have the method that opens a stream close it as well; that way it is easier to keep track of leaks. It would be quite strange to have another object close a stream that you opened.
Because Bitmap can't guarantee the order in which finalizers are called, it will not close the stream: the stream may already have been finalized during garbage collection. Jeffrey Richter's CLR via C# has a chapter on memory management that explains this with much more clarity than I can.
An easy workaround to the problem is:
var image = new Bitmap(stream);
image.Tag = stream;
Now the stream is referenced by the image and won't be garbage collected before the image is. If your stream happens to be a MemoryStream, it doesn't need to be disposed (its Dispose is a no-op). If not, you can dispose it when you dispose the image, or just let the GC do it when it gets around to it.
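If you prefer deterministic cleanup over waiting for the GC, another option is a small owning wrapper. This is only a sketch, and the OwnedBitmap name is mine, not a framework type:

using System;
using System.Drawing;
using System.IO;

// Keeps the stream alive for the bitmap's lifetime and disposes both together.
sealed class OwnedBitmap : IDisposable
{
    public Bitmap Bitmap { get; }
    private readonly Stream _stream;

    public OwnedBitmap(Stream stream)
    {
        _stream = stream;
        Bitmap = new Bitmap(stream);
    }

    public void Dispose()
    {
        Bitmap.Dispose();
        _stream.Dispose();
    }
}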
I know this might seem silly, but why does the following code only work if I Close() the file? If I don't close the file, the entire stream is not written.
Steps:
Run this code on form load.
Close form using mouse once it is displayed.
Program terminates.
Shouldn't the file object be flushed or closed automatically when it goes out of scope? I'm new to C#, but I'm used to adding calls to Close() in C++ destructors.
// Notes: complete output is about 87KB. Without Close(), it's missing about 2KB at the end.
// Convert to png and then convert that into a base64 encoded string.
string b64img = ImageToBase64(img, ImageFormat.Png);
// Save the base64 image to a text file for more testing and external validation.
StreamWriter outfile = new StreamWriter("../../file.txt");
outfile.Write(b64img);
// If we don't close the file, windows will not write it all to disk. No idea why
// that would be.
outfile.Close();
C# doesn't have automatic deterministic cleanup. You have to be sure to call the cleanup function if you want to control when it runs. The using block is the most common way of doing this.
If you don't put in the cleanup call yourself, then cleanup will happen when the garbage collector decides the memory is needed for something else, which could be a very long time later.
using (StreamWriter outfile = new StreamWriter("../../file.txt")) {
    outfile.Write(b64img);
} // everything is ok, the using block calls Dispose which closes the file
EDIT: As Harvey points out, while the cleanup will be attempted when the object gets collected, this isn't any guarantee of success. To avoid issues with circular references, the runtime makes no attempt to finalize objects in the "right" order, so the FileStream can actually already be dead by the time the StreamWriter finalizer runs and tries to flush buffered output.
If you deal in objects that need cleanup, do it explicitly, either with using (for locally-scoped usage) or by calling IDisposable.Dispose (for long-lived objects such as referents of class members).
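For the long-lived case, the usual pattern is for the owning class to implement IDisposable itself and forward the call to its members. A rough sketch (the ReportWriter name and its file path are made up for illustration):

using System;
using System.IO;

sealed class ReportWriter : IDisposable
{
    private readonly StreamWriter _writer;

    public ReportWriter(string path)
    {
        _writer = new StreamWriter(path);
    }

    public void WriteLine(string line)
    {
        _writer.WriteLine(line);
    }

    // Disposing the owner disposes (and thereby flushes and closes) the writer.
    public void Dispose()
    {
        _writer.Dispose();
    }
}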
Because Write() is buffered and the buffer is explicitly flushed by Close().
Streams are objects that "manage" or "handle" non-garbage-collected resources. They therefore implement the IDisposable interface which, when combined with 'using', will make sure those resources are cleaned up. Try this:
using (StreamWriter outfile = new StreamWriter("../../file.txt"))
{
    outfile.Write(b64img);
}
Without the Close, you cannot be sure when the underlying file handle will be properly closed. Sometimes this only happens at app shutdown.
Because you are using a StreamWriter, and it doesn't flush its buffer until you Close() the writer. You can make the writer flush on every Write by setting its AutoFlush property to true.
Check out the docs. http://msdn.microsoft.com/en-us/library/system.io.streamwriter.aspx
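For example, a minimal variation of the snippet from the question with AutoFlush enabled (the path and the b64img variable are taken from the question):

StreamWriter outfile = new StreamWriter("../../file.txt");
outfile.AutoFlush = true;   // flush the writer's buffer after every Write
outfile.Write(b64img);
outfile.Close();            // still close it to release the file handle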
If you want to write to a file without "closing", I would use:
System.IO.File
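Its static helpers open, write, and close the file in a single call; for instance, something like this should cover the snippet in the question:

// Writes the whole string and closes the file in one call.
System.IO.File.WriteAllText("../../file.txt", b64img);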
Operating systems cache writes to block devices to get better performance. You force a write by flushing the buffer after a write, or by setting the StreamWriter to AutoFlush.
Because the C# designers were cloning Java and not C++ despite the name.
In my opinion they really missed the boat. C++ style destruction on scope exit would have been so much better.
It wouldn't even have to release the memory to be better, just automatically run the finalizer or the IDisposable method.
Does anyone know of a lazy stream implementation in .NET? IOW, I want to create a method like this:
public Stream MyMethod() {
    return new LazyStream(...whatever parameters..., delegate() {
        ... some callback code.
    });
}
and when my other code calls MyMethod() to retrieve the stream, it will not actually perform any work until someone tries to read from the stream. The usual way would be to make MyMethod take the output stream as a parameter, but that won't work in my case (I want to hand the returned stream to an MVC FileStreamResult).
To further explain, what I'm looking for is to create a layered series of transformations, so
Database result set =(transformed to)=> byte stream =(chained to)=> GZipStream =(passed to)=> FileStreamResult constructor.
The result set can be huge (GB), so I don't want to cache the result in a MemoryStream, which I can pass to the GZipStream constructor. Rather, I want to fetch from the result set as the GZipStream requests data.
Most stream implementations are, by nature, lazy streams. Typically, any stream will not read information from its source until it is requested by the user of the stream (other than some extra "over-reading" to allow for buffering to occur, which makes stream usage much faster).
If you need a fully lazy stream, it would be fairly easy to make a Stream implementation that does no reading until necessary, by overriding Read to open the underlying resource on first use and then read from it. Besides Read, you have to override the other abstract members of Stream (CanRead, CanSeek, CanWrite, Length, Position, Flush, Seek, SetLength, and Write), most of which can simply throw NotSupportedException for a forward-only read stream; see the sketch below.
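A minimal sketch of that idea; the LazyReadStream name and its Func&lt;Stream&gt; factory parameter are mine, not an existing framework type. The real stream is only created on the first Read, so nothing is fetched until the FileStreamResult starts copying:

using System;
using System.IO;

public sealed class LazyReadStream : Stream
{
    private readonly Func<Stream> _factory;
    private Stream _inner;

    public LazyReadStream(Func<Stream> factory)
    {
        _factory = factory;
    }

    // The underlying stream is not opened until the first Read call.
    private Stream Inner => _inner ?? (_inner = _factory());

    public override int Read(byte[] buffer, int offset, int count)
    {
        return Inner.Read(buffer, offset, count);
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }

    protected override void Dispose(bool disposing)
    {
        if (disposing && _inner != null) _inner.Dispose();
        base.Dispose(disposing);
    }
}

Used from the question's MyMethod, that might look like return new LazyReadStream(() => OpenResultSetAsStream());, where OpenResultSetAsStream stands in for whatever actually produces the data from the result set.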
In your Stream class you have to implement several methods of System.IO.Stream, including the Read method.
What you do in that method is up to you. If you choose to call a delegate, that is up to you as well; you can, of course, pass the delegate as one of your constructor's parameters. At least this is how I would do it.
Unfortunately it will take more than implementing the Read method, and your delegate will not cover the other required members.
This answer (https://stackoverflow.com/a/22048857/1037948) links to this article about how to write your own stream class.
To quote the answer:
The producer writes data to the stream and the consumer reads. There's a buffer in the middle so that the producer can "write ahead" a little bit. You can define the size of the buffer.
To quote the original source:
You can think of the ProducerConsumerStream as a queue that has a Stream interface. Internally, it's implemented as a circular buffer. Two indexes keep track of the insertion and removal points within the buffer. Bytes are written at the Head index, and removed from the Tail index.
If Head wraps around to Tail, then the buffer is full and the producer has to wait for some bytes to be read before it can continue writing. Similarly, if Tail catches up with Head, the consumer has to wait for bytes to be written before it can proceed.
The article goes on to describe some weird cases when the pointers wrap around, with full code samples.
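For illustration, here is a stripped-down sketch of that Head/Tail wrap-around logic, byte-at-a-time and without the locking, blocking, or bulk copies that the article's ProducerConsumerStream adds; it only shows the index arithmetic.

// Simplified ring buffer: Head is the write index, Tail is the read index.
class RingBuffer
{
    private readonly byte[] _buffer;
    private int _head;   // next write position
    private int _tail;   // next read position
    private int _count;  // bytes currently stored

    public RingBuffer(int size)
    {
        _buffer = new byte[size];
    }

    // Returns false when the buffer is full (the producer would have to wait).
    public bool TryWrite(byte value)
    {
        if (_count == _buffer.Length) return false;
        _buffer[_head] = value;
        _head = (_head + 1) % _buffer.Length;   // wrap around at the end
        _count++;
        return true;
    }

    // Returns false when the buffer is empty (the consumer would have to wait).
    public bool TryRead(out byte value)
    {
        value = 0;
        if (_count == 0) return false;
        value = _buffer[_tail];
        _tail = (_tail + 1) % _buffer.Length;   // wrap around at the end
        _count--;
        return true;
    }
}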
Does anyone know where I can find a Stream splitter implementation?
I'm looking to take a Stream, and obtain two separate streams that can be independently read and closed without impacting each other. These streams should each return the same binary data that the original stream would. No need to implement Position or Seek and such... Forward only.
I'd prefer if it didn't just copy the whole stream into memory and serve it up multiple times, which would be fairly simple enough to implement myself.
Is there anything out there that could do this?
I have made a SplitStream available on github and NuGet.
It goes like this.
using (var inputSplitStream = new ReadableSplitStream(inputSourceStream))
using (var inputFileStream = inputSplitStream.GetForwardReadOnlyStream())
using (var outputFileStream = File.OpenWrite("MyFileOnAnyFilestore.bin"))
using (var inputSha1Stream = inputSplitStream.GetForwardReadOnlyStream())
using (var outputSha1Stream = SHA1.Create())
{
    inputSplitStream.StartReadAhead();

    Parallel.Invoke(
        () => {
            var bytes = outputSha1Stream.ComputeHash(inputSha1Stream);
            var checksumSha1 = string.Join("", bytes.Select(x => x.ToString("x")));
        },
        () => {
            inputFileStream.CopyTo(outputFileStream);
        }
    );
}
I have not tested it on very large streams, but give it a try.
github: https://github.com/microknights/SplitStream
Not out of the box.
You'll need to buffer the data from the original stream in a FIFO manner, discarding only data which has been read by all "reader" streams.
I'd use:
A "management" object holding some sort of queue of byte[] chunks to be buffered, which reads additional data from the source stream when required
Some "reader" instances which know where and in which buffer they are reading, and which request the next chunk from the "management" object and notify it when they no longer use a chunk, so that it may be removed from the queue
This could be tricky to do without risking keeping everything buffered in memory (if one reader stream is still at BOF while the other has reached EOF).
I wonder whether it isn't easier to write the stream to disk, copy it, and have two streams reading from disk, with self-deletion built into the Close() (i.e. write your own Stream wrapper around FileStream).
The EchoStream implementation below seems to fit:
http://www.codeproject.com/Articles/3922/EchoStream-An-Echo-Tee-Stream-for-NET
It's a very old implementation (2003) but should provide some context.
Found via "Redirect writes to a file to a stream C#".
You can't really do this without duplicating at least part of the source stream, mostly because it doesn't sound like you can control the rate at which the two copies are consumed (multiple threads?). You could do something clever where one reads ahead of the other (and thereby copy only at that point), but the complexity of this sounds like it's not worth the trouble.
I do not think you will be able to find a generic implementation that does just that. A Stream is rather abstract: you don't know where the bytes are coming from. For instance, you don't know whether it will support seeking, and you don't know the relative cost of operations. (The Stream might be an abstraction over reading data from a remote server, or even off a backup tape!)
If you are able to have a MemoryStream and store the contents once, you can create two separate streams using the same buffer; and they will behave as independent Streams but only use the memory once.
Otherwise, I think you are best off creating a wrapper class that stores the bytes read from one stream until they are also read by the second stream. That would give you the desired forward-only behaviour, but in the worst case you might end up storing all of the bytes in memory, if the second Stream is not read until the first Stream has finished reading all content.
With the introduction of async / await, so long as all but one of your reading tasks are async, you should be able to process the same data twice using only a single OS thread.
What I think you want is a linked list of the data blocks you have seen so far. Then you can have multiple custom Stream instances that hold a pointer into this list. As blocks fall off the end of the list, they will be garbage collected. Reusing the memory immediately would require some other kind of circular list and reference counting. Doable, but more complicated.
When your custom Stream can answer a ReadAsync call from the cache, copy the data, advance the pointer down the list and return.
When your Stream has caught up to the end of the cache list, you want to issue a single ReadAsync to the underlying stream, without awaiting it, and cache the returned Task with the data block. So if any other Stream reader also catches up and tries to read more before this read completes, you can return the same Task object.
This way, both readers will hook their await continuation to the result of the same ReadAsync call. When the single read returns, both reading tasks will sequentially execute the next step of their process.
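A rough sketch of that caching idea, simplified so the cache is a plain list that keeps every block (the linked-list variant described above would let old blocks be collected), and with hypothetical names like BlockCache that are not from any library:

using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

// Readers each track their own block index; whichever reader reaches the end of
// the cache first starts a ReadAsync on the source, and any reader that catches
// up before it completes awaits the same Task.
class BlockCache
{
    private readonly Stream _source;
    private readonly List<byte[]> _blocks = new List<byte[]>();
    private Task<byte[]> _pendingRead;
    private readonly object _gate = new object();

    public BlockCache(Stream source)
    {
        _source = source;
    }

    // Returns the block at 'index', or an empty array at end of stream.
    // Callers are expected to request blocks sequentially.
    public Task<byte[]> GetBlockAsync(int index)
    {
        lock (_gate)
        {
            if (index < _blocks.Count)
                return Task.FromResult(_blocks[index]);

            // All callers that have caught up share the same in-flight read.
            if (_pendingRead == null)
                _pendingRead = ReadNextBlockAsync();
            return _pendingRead;
        }
    }

    private async Task<byte[]> ReadNextBlockAsync()
    {
        var buffer = new byte[81920];
        int n = await _source.ReadAsync(buffer, 0, buffer.Length).ConfigureAwait(false);
        var block = new byte[n];
        Array.Copy(buffer, block, n);
        lock (_gate)
        {
            if (n > 0) _blocks.Add(block);
            _pendingRead = null;   // the next caller past the cache starts a new read
        }
        return block;
    }
}

Each reader stream would keep its own block index and byte offset, call GetBlockAsync whenever it runs out of cached data, and treat an empty block as end of stream.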