MemoryStream instance timing help

Is it OK to instantiate a MemoryStream at the top of my method, do a bunch of stuff with it, and then use it?
For instance:
public static byte[] TestCode()
{
    MemoryStream m = new MemoryStream();
    ...
    ...
    whole bunch of stuff in between
    ...
    ...
    //finally
    using(m)
    {
        return m.ToArray();
    }
}
Updated code
public static byte[] GetSamplePDF()
{
    using (MemoryStream m = new MemoryStream())
    {
        Document document = new Document();
        PdfWriter.GetInstance(document, m);
        document.Open();
        PopulateTheDocument(document);
        document.Close();
        return m.ToArray();
    }
}
private static void PopulateTheDocument(Document document)
{
    Table aTable = new Table(2, 2);
    aTable.AddCell("0.0");
    aTable.AddCell("0.1");
    aTable.AddCell("1.0");
    aTable.AddCell("1.1");
    document.Add(aTable);
    for (int i = 0; i < 20; i++)
    {
        document.Add(new Phrase("Hello World, Hello Sun, Hello Moon, Hello Stars, Hello Sea, Hello Land, Hello People. "));
    }
}
My point was to reuse the code that builds the byte array. In other words, build up any kind of document and then send it to the TestCode() method.

Technically, this is possible, but it's pointless. If you really want to avoid using the "using" statement around that code, just call Dispose() directly.
You should put the entire work that's using the MemoryStream into the using statement. This guarantees that the MemoryStream's Dispose method will be called, even if you receive an exception during your "whole bunch of stuff in between" code. The way you have it now, exceptions will prevent your MemoryStream from having Dispose() called on it.
The proper way to handle this would be:
public static byte[] TestCode()
{
    MemoryStream m = new MemoryStream();
    using(m)
    {
        // ...
        // ...
        // whole bunch of stuff in between
        // ...
        // ...
        return m.ToArray();
    }
}
Or, in the more common form:
public static byte[] TestCode()
{
    using(MemoryStream m = new MemoryStream())
    {
        // ...
        // ...
        // whole bunch of stuff in between
        // ...
        // ...
        return m.ToArray();
    }
}
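A minimal sketch of why the whole body belongs inside the using block. TrackingStream and UsingDemo are hypothetical names introduced only for this illustration, and the thrown exception stands in for a failure somewhere in the "whole bunch of stuff in between".

```csharp
using System;
using System.IO;

// Hypothetical MemoryStream subclass that records whether Dispose ran.
public sealed class TrackingStream : MemoryStream
{
    public bool WasDisposed { get; private set; }

    protected override void Dispose(bool disposing)
    {
        WasDisposed = true;
        base.Dispose(disposing);
    }
}

public static class UsingDemo
{
    // Throws from inside the using block and reports whether the stream was still disposed.
    public static bool DisposedDespiteException()
    {
        var stream = new TrackingStream();
        try
        {
            using (stream)
            {
                stream.WriteByte(42);
                throw new InvalidOperationException("simulated failure mid-method");
            }
        }
        catch (InvalidOperationException)
        {
            // The using block has already run Dispose by the time control gets here.
        }
        return stream.WasDisposed;
    }
}
```

Calling UsingDemo.DisposedDespiteException() should report that the stream was disposed even though the body threw, which is the guarantee described above.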

When seeing questions like these, I often wonder if we wouldn't have been better off using the Java model. There's an extraordinary amount of agony that .NET programmers suffer over that doggone IDisposable. After thousands of questions on SO (et al), it still remains poorly understood.
It is a memory stream. There's nothing that needs to be disposed when you use memory, the garbage collector already takes care of it. It is not some kind of special memory just because the class has a Dispose() method, there's only one kind. The other kind is wrapped by UnmanagedMemoryStream. The fact that MemoryStream inherits a do-nothing Dispose() method from Stream is a sad OOP liability.
It is up to you to decide to slovenly call a do-nothing method because it is there. Or you could take charge of your code and refuse to call methods that you know don't do anything useful now, nor will ever do anything useful for the rest of your career. Clearly I'm in the second camp, our life-expectancy must be better. I hope anyway. Then again, this post might have knocked a day off.

Related

Decompressing gzipped ReadOnlyMemory<byte> before I do JsonDocument.Parse

The websocket client is returning a ReadOnlyMemory<byte>.
The issue is that JsonDocument.Parse fails because the buffer has been compressed. I have to decompress it somehow before I parse it. How do I do that? I cannot really change the websocket library code.
What I want is something like public Func<ReadOnlyMemory<byte>> DataInterpreterBytes = () => that is supplied from outside this class and optionally decompresses these bytes. How do I do that? Is it possible to decompress a ReadOnlyMemory<byte>, and to do nothing at all if the handler is not set?
private static string DecompressData(byte[] byteData)
{
    using var decompressedStream = new MemoryStream();
    using var compressedStream = new MemoryStream(byteData);
    using var deflateStream = new GZipStream(compressedStream, CompressionMode.Decompress);
    deflateStream.CopyTo(decompressedStream);
    decompressedStream.Position = 0;
    using var streamReader = new StreamReader(decompressedStream);
    return streamReader.ReadToEnd();
}
Snippet
private void OnMessageReceived(object? sender, MessageReceivedEventArgs e)
{
    var timestamp = DateTime.UtcNow;
    _logger.LogTrace("Message was received. {Message}", Encoding.UTF8.GetString(e.Message.Buffer.Span));
    // We dispose that object later on
    using var document = JsonDocument.Parse(e.Message.Buffer);
    var tokenData = document.RootElement;

So, if you had a byte array, you'd do this:
private static JsonDocument DecompressData(byte[] byteData)
{
    using var compressedStream = new MemoryStream(byteData);
    using var deflateStream = new GZipStream(compressedStream, CompressionMode.Decompress);
    return JsonDocument.Parse(deflateStream);
}
This is similar to your snippet above, but there is no need for the intermediate copy: just read straight from the GZipStream. JsonDocument.Parse also has an overload that takes a stream, so you can use that and avoid yet another useless copy.
Unfortunately, you don't have a byte array, you have a ReadOnlyMemory<byte>. There is no way out of the box to create a memory stream out of a ReadOnlyMemory<byte>. Honestly, it feels like an oversight, like they forgot to put that feature into .NET.
So here are your options instead.
The first option is to just convert the ReadOnlyMemory<byte> object to an array with ToArray():
// assuming e.Message.Buffer is a ReadOnlyMemory<byte>
using var document = DecompressData(e.Message.Buffer.ToArray());
This is really straightforward, but remember it actually copies the data, so for large documents it might not be a good idea if you want to avoid using too much memory.
The second is to try and extract the underlying array from the memory. This can be achieved with MemoryMarshal.TryGetArray, which gives you an ArraySegment (but might fail if the memory isn't actually a managed array).
private static JsonDocument DecompressData(ReadOnlyMemory<byte> byteData)
{
    if(MemoryMarshal.TryGetArray(byteData, out var segment))
    {
        using var compressedStream = new MemoryStream(segment.Array, segment.Offset, segment.Count);
        // rest of the code goes here
    }
    else
    {
        // Welp, this memory isn't actually an array, so... tough luck?
    }
}
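One way to fill in the two gaps in the sketch above, offered as an assumption rather than the answerer's exact intent: when TryGetArray fails, fall back to the ToArray() copy from the first option. GzipJson and the Compress helper are hypothetical names added so the method can be tried end to end.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Runtime.InteropServices;
using System.Text;
using System.Text.Json;

public static class GzipJson
{
    public static JsonDocument DecompressData(ReadOnlyMemory<byte> byteData)
    {
        // Reuse the backing array when there is one; otherwise copy (assumed fallback).
        if (!MemoryMarshal.TryGetArray(byteData, out ArraySegment<byte> segment))
        {
            segment = new ArraySegment<byte>(byteData.ToArray());
        }

        // Read-only view over exactly the slice the memory covers.
        using var compressedStream = new MemoryStream(
            segment.Array!, segment.Offset, segment.Count, writable: false);
        using var gzipStream = new GZipStream(compressedStream, CompressionMode.Decompress);
        return JsonDocument.Parse(gzipStream);
    }

    // Hypothetical helper for trying the method out: gzips a UTF-8 JSON string.
    public static byte[] Compress(string json)
    {
        using var output = new MemoryStream();
        using (var gzip = new GZipStream(output, CompressionMode.Compress))
        {
            byte[] raw = Encoding.UTF8.GetBytes(json);
            gzip.Write(raw, 0, raw.Length);
        }
        // ToArray still works after the GZipStream has closed the MemoryStream.
        return output.ToArray();
    }
}
```

With this shape the caller never needs to know whether the memory was array-backed; the copy only happens in the uncommon case where it is not.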
The third way might feel dirty, but if you're okay with using unsafe code, you can just pin the memory's span and then use UnmanagedMemoryStream:
private static unsafe JsonDocument DecompressData(ReadOnlyMemory<byte> byteData)
{
    fixed (byte* ptr = byteData.Span)
    {
        using var compressedStream = new UnmanagedMemoryStream(ptr, byteData.Length);
        using var deflateStream = new GZipStream(compressedStream, CompressionMode.Decompress);
        return JsonDocument.Parse(deflateStream);
    }
}
The last option is to write your own Stream class that supports this. The Windows Community Toolkit has an extension method that returns a Stream wrapper around the memory object. If you don't want to pull in an entire third-party library just for that, you can probably roll your own; it's not that much code.
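A rough sketch of what "rolling your own" could look like. ReadOnlyMemoryStream is a hypothetical name, not a framework or toolkit type, and this version only covers synchronous reading and seeking.

```csharp
using System;
using System.IO;

// Hypothetical helper: a minimal read-only, seekable Stream over ReadOnlyMemory<byte>.
public sealed class ReadOnlyMemoryStream : Stream
{
    private readonly ReadOnlyMemory<byte> _memory;
    private int _position;

    public ReadOnlyMemoryStream(ReadOnlyMemory<byte> memory) => _memory = memory;

    public override bool CanRead => true;
    public override bool CanSeek => true;
    public override bool CanWrite => false;
    public override long Length => _memory.Length;

    public override long Position
    {
        get => _position;
        set
        {
            if (value < 0 || value > _memory.Length)
                throw new ArgumentOutOfRangeException(nameof(value));
            _position = (int)value;
        }
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        if (buffer is null) throw new ArgumentNullException(nameof(buffer));
        // Copy at most the bytes that remain; 0 signals end of stream.
        int n = Math.Min(count, _memory.Length - _position);
        _memory.Span.Slice(_position, n).CopyTo(buffer.AsSpan(offset, n));
        _position += n;
        return n;
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        long target = origin switch
        {
            SeekOrigin.Begin => offset,
            SeekOrigin.Current => _position + offset,
            SeekOrigin.End => _memory.Length + offset,
            _ => throw new ArgumentOutOfRangeException(nameof(origin))
        };
        Position = target; // the setter validates the range
        return target;
    }

    public override void Flush() { }
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}
```

An instance can then be passed to GZipStream in place of the MemoryStream used in the earlier options, without copying or pinning the buffer.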

How should I wrap a MemoryStream to have access to the data that was written to it even after it is closed

I am attempting to use a third-party library to write some data to a MemoryStream so that I can compare the output as part of some unit tests. Unfortunately the third-party library closes the MemoryStream during execution of its Save() method.
Thus I have the following code:
byte[] expected = LoadExpectedResult("Test1");
using (var memoryStream = new MemoryStream()) {
    ThirdPartyLibrary.Save(memoryStream);
    var result = memoryStream.ToArray();
    ConfirmBinaryBlobsAreSufficentlyEqual(expected, result);
}
Unfortunately, memoryStream.ToArray() appears to return only the last 3398 bytes that were loaded into the buffer, because the stream has already been disposed as part of the save process.
Is there anything I can wrap the MemoryStream in so that, as data is written to it, the data gets read out or written to another memory stream, and I still have access to it after the stream is disposed?
Update
For clarity: the Save() method also does the writing, so before it is called the MemoryStream is empty. I think the writers of the library expected you to only pass in FileStreams.

You can try with:
public class MyMemoryStream : MemoryStream
{
    public bool CanDispose { get; set; }
    public override void Close()
    {
        if (!CanDispose)
        {
            return;
        }
        base.Close();
    }
}
In the Stream class, Dispose() calls Close(), which in turn calls Dispose(bool disposing). Close() is virtual, so I overrode it.
After using the stream, set CanDispose = true and then let it be disposed normally.
byte[] expected = LoadExpectedResult("Test1");
using (var memoryStream = new MyMemoryStream()) {
    // implicitly memoryStream.CanDispose == false;
    ThirdPartyLibrary.Save(memoryStream);
    var result = memoryStream.ToArray();
    ConfirmBinaryBlobsAreSufficentlyEqual(expected, result);
    memoryStream.CanDispose = true;
}
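A self-contained way to try the idea without the real library. FakeLibrary.Save is a hypothetical stand-in for ThirdPartyLibrary.Save that writes a few bytes and then closes the stream it was given; WrapperDemo is likewise only an illustration.

```csharp
using System;
using System.IO;

public class MyMemoryStream : MemoryStream
{
    public bool CanDispose { get; set; }

    public override void Close()
    {
        // Ignore Close()/Dispose() until the owner explicitly allows it.
        if (CanDispose) base.Close();
    }
}

public static class FakeLibrary
{
    // Hypothetical stand-in for ThirdPartyLibrary.Save: writes, then closes the caller's stream.
    public static void Save(Stream target)
    {
        target.Write(new byte[] { 1, 2, 3 }, 0, 3);
        target.Close();
    }
}

public static class WrapperDemo
{
    public static (bool StillOpen, int Length) Run()
    {
        using var stream = new MyMemoryStream();
        FakeLibrary.Save(stream);           // Close() is ignored while CanDispose is false
        bool stillOpen = stream.CanRead;    // a closed MemoryStream reports false here
        int length = stream.ToArray().Length;
        stream.CanDispose = true;           // let the using declaration really close it
        return (stillOpen, length);
    }
}
```

WrapperDemo.Run() should report that the stream is still open after Save and that all three bytes are available.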

To circumvent the bug/behaviour: you could either copy the memorystream to a second instance or write it to some temp-file before calling ThirdPartyLibrary.Save.

Close a filestream without Flush()

Can I close a file stream without calling Flush (in C#)? My understanding is that Close and Dispose call the Flush method first.

MSDN is not 100% clear, but Jon Skeet is saying "Flush", so do it before close/dispose. It won't hurt, right?
From FileStream.Close Method:
Any data previously written to the buffer is copied to the file before
the file stream is closed, so it is not necessary to call Flush before
invoking Close. Following a call to Close, any operations on the file
stream might raise exceptions. After Close has been called once, it
does nothing if called again.
Dispose is not as clear:
This method disposes the stream, by writing any changes to the backing
store and closing the stream to release resources.
Remark: the commentators might be right, it's not 100% clear from the Flush:
Override Flush on streams that implement a buffer. Use this method to
move any information from an underlying buffer to its destination,
clear the buffer, or both. Depending upon the state of the object, you
might have to modify the current position within the stream (for
example, if the underlying stream supports seeking). For additional
information see CanSeek.
When using the StreamWriter or BinaryWriter class, do not flush the
base Stream object. Instead, use the class's Flush or Close method,
which makes sure that the data is flushed to the underlying stream
first and then written to the file.
TESTS:
var textBytes = Encoding.ASCII.GetBytes("Test123");
using (var fileTest = System.IO.File.Open(@"c:\temp\fileNoCloseNoFlush.txt", FileMode.CreateNew))
{
    fileTest.Write(textBytes,0,textBytes.Length);
}
using (var fileTest = System.IO.File.Open(@"c:\temp\fileCloseNoFlush.txt", FileMode.CreateNew))
{
    fileTest.Write(textBytes, 0, textBytes.Length);
    fileTest.Close();
}
using (var fileTest = System.IO.File.Open(@"c:\temp\fileFlushNoClose.txt", FileMode.CreateNew))
{
    fileTest.Write(textBytes, 0, textBytes.Length);
    fileTest.Flush();
}
using (var fileTest = System.IO.File.Open(@"c:\temp\fileCloseAndFlush.txt", FileMode.CreateNew))
{
    fileTest.Write(textBytes, 0, textBytes.Length);
    fileTest.Flush();
    fileTest.Close();
}
What can I say ... all files got the text - maybe this is just too little data?
Test2
var rnd = new Random();
var size = 1024*1024*10;
var randomBytes = new byte[size];
rnd.NextBytes(randomBytes);
using (var fileTest = System.IO.File.Open(@"c:\temp\fileNoCloseNoFlush.bin", FileMode.CreateNew))
{
    fileTest.Write(randomBytes, 0, randomBytes.Length);
}
using (var fileTest = System.IO.File.Open(@"c:\temp\fileCloseNoFlush.bin", FileMode.CreateNew))
{
    fileTest.Write(randomBytes, 0, randomBytes.Length);
    fileTest.Close();
}
using (var fileTest = System.IO.File.Open(@"c:\temp\fileFlushNoClose.bin", FileMode.CreateNew))
{
    fileTest.Write(randomBytes, 0, randomBytes.Length);
    fileTest.Flush();
}
using (var fileTest = System.IO.File.Open(@"c:\temp\fileCloseAndFlush.bin", FileMode.CreateNew))
{
    fileTest.Write(randomBytes, 0, randomBytes.Length);
    fileTest.Flush();
    fileTest.Close();
}
And again - every file got its bytes ... to me it looks like it's doing what I read from MSDN: it doesn't matter if you call Flush or Close before dispose ... any thoughts on that?

You don't have to call Flush() on Close()/Dispose(), FileStream will do it for you as you can see from its source code:
http://referencesource.microsoft.com/#mscorlib/system/io/filestream.cs,e23a38af5d11ddd3
[System.Security.SecuritySafeCritical] // auto-generated
protected override void Dispose(bool disposing)
{
    // Nothing will be done differently based on whether we are
    // disposing vs. finalizing. This is taking advantage of the
    // weak ordering between normal finalizable objects & critical
    // finalizable objects, which I included in the SafeHandle
    // design for FileStream, which would often "just work" when
    // finalized.
    try {
        if (_handle != null && !_handle.IsClosed) {
            // Flush data to disk iff we were writing. After
            // thinking about this, we also don't need to flush
            // our read position, regardless of whether the handle
            // was exposed to the user. They probably would NOT
            // want us to do this.
            if (_writePos > 0) {
                FlushWrite(!disposing); // <- Note this
            }
        }
    }
    finally {
        if (_handle != null && !_handle.IsClosed)
            _handle.Dispose();
        _canRead = false;
        _canWrite = false;
        _canSeek = false;
        // Don't set the buffer to null, to avoid a NullReferenceException
        // when users have a race condition in their code (ie, they call
        // Close when calling another method on Stream like Read).
        //_buffer = null;
        base.Dispose(disposing);
    }
}
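The same behaviour can be checked from the outside with a small experiment. This is only a sketch: DisposeFlushDemo is a hypothetical name, and the file is created in the system temp directory instead of c:\temp.

```csharp
using System;
using System.IO;

public static class DisposeFlushDemo
{
    // Writes a payload without calling Flush or Close, then returns the size found on disk.
    public static long WriteWithoutFlush(byte[] payload)
    {
        string path = Path.GetTempFileName(); // creates an empty temp file
        try
        {
            using (var file = new FileStream(path, FileMode.Create, FileAccess.Write))
            {
                file.Write(payload, 0, payload.Length);
                // No Flush(), no Close(): leaving the using block calls Dispose.
            }
            return new FileInfo(path).Length;
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

If Dispose flushes pending writes as the source above indicates, the returned length matches the payload length.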

I've been tracking a newly introduced bug that seems to indicate .NET 4 does not reliably flush changes to disk when the stream is disposed (unlike .NET 2.0 and 3.5, which always did so reliably).
The FileStream class was heavily modified in .NET 4, and while the Flush*() methods were rewritten, .Dispose() does not seem to have received the same attention.
This is resulting in incomplete files.

Since you've stated that you understood that close & dispose called the flush method if it was not called explicitly by user code, I believe that (by close without flush) you actually want to have a possibility to discard changes made to a FileStream, if necessary.
If that is correct, using a FileStream alone won't help. You will need to load this file into a MemoryStream (or an array, depending on how you modify its contents), and then decide whether you want to save changes or not after you're done.
A problem with this is file size, obviously. FileStream uses limited-size write buffers to speed up operations, but once they fill up, changes need to be flushed. Due to .NET memory limits, you can only expect to load smaller files into memory if you need to hold them entirely.
An easier alternative would be to make a disk copy of your file, and work on it using a plain FileStream. When finished, if you need to discard changes, simply delete the temporary file, otherwise replace the original with a modified copy.

Wrap the FileStream in a BufferedStream and close the filestream before the buffered stream.
var fs = new FileStream(...);
var bs = new BufferedStream(fs, buffersize);
bs.Write(datatosend, 0, length);
fs.Close();
try {
    bs.Close();
}
catch (IOException) {
}

Calling Flush() is worthwhile inside big loops, for example when you have to read and write a big file inside one loop. In other cases the buffer (or the computer) is big enough, and it doesn't matter if you Close() without calling Flush() first.
Example: you have to read a big file (in one format) and write it out as .txt
StreamWriter sw = .... // using StreamWriter
// you read the file ...
// and now you want to write each line of this big file using WriteLine()
for ( .....) // this is a big loop because the file is big and has many lines
{
    sw.WriteLine ( *whatever i read* ); // write the line somewhere, e.g. to a .txt file
    sw.Flush(); // each call pushes the lines buffered so far out to the file
}
sw.Close();
Here it is important to use Flush(), because otherwise each WriteLine is only saved in the buffer and is not written out until the buffer is full or the program reaches sw.Close().
I hope this helps a little in understanding what Flush does.
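The buffering described above can be observed directly. This is a small sketch with a hypothetical FlushDemo class; a MemoryStream stands in for the output file so the amount of data that has reached the underlying stream can be inspected.

```csharp
using System.IO;

public static class FlushDemo
{
    public static (long BeforeFlush, long AfterFlush) Run()
    {
        var target = new MemoryStream();
        var writer = new StreamWriter(target); // AutoFlush is false by default
        writer.WriteLine("one line of text");
        long before = target.Length;  // the text is still in the writer's buffer
        writer.Flush();
        long after = target.Length;   // now it has reached the underlying stream
        writer.Dispose();
        return (before, after);
    }
}
```

Setting the writer's AutoFlush property to true has a similar effect to calling Flush() after every write, at the cost of more frequent writes to the underlying stream.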

I think it is safe to use a simple using statement, which closes the streams when GetBytes() returns:
public static byte[] GetBytes(string fileName)
{
    using (FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read))
    using (MemoryStream ms = new MemoryStream())
    {
        fs.CopyTo(ms, 4096); // built-in Stream.CopyTo with a 4096-byte buffer (.NET 4 and later)
        return ms.ToArray(); // both streams are closed by the using statements
    }
}

Storing MemoryStream in Cache

I've come across this code in one of my projects, which has a static function to return a MemoryStream from a file, which is then stored in Cache. Now the same class has a constructor which allows to store a MemoryStream in a private variable and later use it. So it looks like this:
private MemoryStream memoryStream;
public CountryLookup(MemoryStream ms)
{
    memoryStream = ms;
}
public static MemoryStream FileToMemory(string filePath)
{
    MemoryStream memoryStream = new MemoryStream();
    ReadFileToMemoryStream(filePath, memoryStream);
    return memoryStream;
}
Usage:
Context.Cache.Insert("test",
    CountryLookup.FileToMemory(
        ConfigurationSettings.AppSettings["test"]),
    new CacheDependency(someFileName)
);
And then:
CountryLookup cl = new CountryLookup(
    ((MemoryStream)Context.Cache.Get("test"))
);
So I was wondering who should dispose the memoryStream and when? Ideally CountryLookup should implement IDisposable.
Should I even care about it?

It's slightly ugly - in particular, the MemoryStream is stateful, because it has the concept of the "current position".
Why not just store a byte array instead? You can easily build multiple MemoryStreams which wrap the same byte array when you need to, and you don't need to worry about the statefulness.
MemoryStreams don't usually require disposal, but I personally tend to dispose them out of habit. If you perform asynchronous operations on them or use them in remoting, I believe disposal does make a difference at that point. Byte arrays are just simpler :)
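A short sketch of the byte-array suggestion, assuming the cached value is a plain byte[] (SharedBytesDemo is a hypothetical name). Each consumer wraps the same array in its own read-only MemoryStream, so no "current position" is shared between them.

```csharp
using System.IO;

public static class SharedBytesDemo
{
    public static (long First, long Second) Run()
    {
        byte[] cached = { 10, 20, 30, 40 }; // what would live in the cache

        // Two independent read-only views over the same array; no data is copied.
        using var first = new MemoryStream(cached, writable: false);
        using var second = new MemoryStream(cached, writable: false);

        first.ReadByte();
        first.ReadByte();

        // Reading from one stream does not move the other stream's position.
        return (first.Position, second.Position);
    }
}
```

Here the first stream has advanced by two bytes while the second is still at the start, which is the statefulness problem a single cached MemoryStream would have.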

Does my code properly clean up its List<MemoryStream>?

I've got a third-party component that does PDF file manipulation. Whenever I need to perform operations I retrieve the PDF documents from a document store (database, SharePoint, filesystem, etc.). To make things a little consistent I pass the PDF documents around as a byte[].
This 3rd party component expects a MemoryStream[] (MemoryStream array) as a parameter to one of the main methods I need to use.
I am trying to wrap this functionality in my own component so that I can use this functionality for a number of areas within my application. I have come up with essentially the following:
public class PdfDocumentManipulator : IDisposable
{
    List<MemoryStream> pdfDocumentStreams = new List<MemoryStream>();
    public void AddFileToManipulate(byte[] pdfDocument)
    {
        using (MemoryStream stream = new MemoryStream(pdfDocument))
        {
            pdfDocumentStreams.Add(stream);
        }
    }
    public byte[] ManipulatePdfDocuments()
    {
        byte[] outputBytes = null;
        using (MemoryStream outputStream = new MemoryStream())
        {
            ThirdPartyComponent component = new ThirdPartyComponent();
            component.Manipuate(this.pdfDocumentStreams.ToArray(), outputStream);
            //move to beginning
            outputStream.Seek(0, SeekOrigin.Begin);
            //convert the memory stream to a byte array
            outputBytes = outputStream.ToArray();
        }
        return outputBytes;
    }
    #region IDisposable Members
    public void Dispose()
    {
        for (int i = this.pdfDocumentStreams.Count - 1; i >= 0; i--)
        {
            MemoryStream stream = this.pdfDocumentStreams[i];
            this.pdfDocumentStreams.RemoveAt(i);
            stream.Dispose();
        }
    }
    #endregion
}
The calling code to my "wrapper" looks like this:
byte[] manipulatedResult = null;
using (PdfDocumentManipulator manipulator = new PdfDocumentManipulator())
{
    manipulator.AddFileToManipulate(file1bytes);
    manipulator.AddFileToManipulate(file2bytes);
    manipulatedResult = manipulator.ManipulatePdfDocuments();
}
A few questions about the above:
Is the using clause in the AddFileToManipulate() method redundant and unnecessary?
Am I cleaning up things OK in my object's Dispose() method?
Is this an "acceptable" usage of MemoryStream? I am not anticipating very many files in memory at once...Likely 1-10 total PDF pages, each page about 200KB. App designed to run on server supporting an ASP.NET site.
Any comments/suggestions?
Thanks for the code review :)

AddFileToManipulate scares me.
public void AddFileToManipulate(byte[] pdfDocument)
{
    using (MemoryStream stream = new MemoryStream(pdfDocument))
    {
        pdfDocumentStreams.Add(stream);
    }
}
This code is adding a disposed stream to your pdfDocumentStreams list. Instead you should simply add the stream using:
pdfDocumentStreams.Add(new MemoryStream(pdfDocument));
And dispose of it in the Dispose method.
Also you should look at implementing a finalizer to ensure stuff gets disposed in case someone forgets to dispose the top level object.

Is the using clause in the AddFileToManipulate() method redundant and unnecessary?
Worse, it's destructive. You're basically closing your memory stream before it's added in. See the other answers for details, but basically, dispose at the end, but not any other time. Every using with an object causes a Dispose to happen at the end of the block, even if the object is "passed off" to other objects via methods.
Am I cleaning up things OK in my object's Dispose() method?
Yes, but you're making life more difficult than it needs to be. Try this:
foreach (var stream in this.pdfDocumentStreams)
{
    stream.Dispose();
}
this.pdfDocumentStreams.Clear();
This works just as well, and is much simpler. Disposing an object does not delete it; it just tells it to free its internal, unmanaged resources. Calling Dispose on an object in this way is fine: the object stays uncollected, in the collection. You can do this and then clear the list in one shot.
Is this an "acceptable" usage of MemoryStream? I am not anticipating very many files in memory at once...Likely 1-10 total PDF pages, each page about 200KB. App designed to run on server supporting an ASP.NET site.
This depends on your situation. Only you can determine whether the overhead of having these files in memory is going to cause you problems. This is going to be a fairly heavy-weight object, though, so I'd use it carefully.
Any comments/suggestions?
Implement a finalizer. It's a good idea whenever you implement IDisposable. Also, you should rework your Dispose implementation to the standard one, or mark your class as sealed. For details on how this should be done, see this article. In particular, you should have a method declared as protected virtual void Dispose(bool disposing) that your Dispose method and your finalizer both call.
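The pattern described above could be sketched roughly as follows. StreamOwner is a hypothetical stand-in for PdfDocumentManipulator, and the finalizer is included only because the answer recommends one; a class that holds nothing but managed objects such as MemoryStream gains little from it.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public class StreamOwner : IDisposable
{
    private readonly List<MemoryStream> _streams = new List<MemoryStream>();
    private bool _disposed;

    public bool IsDisposed => _disposed;

    public void Add(byte[] data) => _streams.Add(new MemoryStream(data));

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this); // cleanup is done, so the finalizer can be skipped
    }

    ~StreamOwner()
    {
        Dispose(false);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (_disposed) return; // Dispose may safely be called more than once

        if (disposing)
        {
            // Only touch other managed objects when called from Dispose(),
            // never from the finalizer.
            foreach (var stream in _streams)
            {
                stream.Dispose();
            }
            _streams.Clear();
        }

        _disposed = true;
    }
}
```

Derived classes would override Dispose(bool) and call the base implementation, which is why the answer suggests either following this shape or marking the class as sealed.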

It looks to me like you misunderstand what using does.
It's just syntactic sugar for
MemoryStream ms = new MemoryStream();
try
{
    // body of the using block
}
finally
{
    if (ms != null)
    {
        ms.Dispose();
    }
}
So the using in AddFileToManipulate disposes each stream as soon as the block ends, which is not what you want. I'd set up the list of MemoryStreams in the constructor of PdfDocumentManipulator, then have PdfDocumentManipulator's Dispose method call Dispose on all the MemoryStreams.

Side note. This really seems like it calls for an extension method.
public static void DisposeAll<T>(this IEnumerable<T> enumerable)
    where T : IDisposable {
    foreach ( var cur in enumerable ) {
        cur.Dispose();
    }
}
Now your Dispose method becomes
public void Dispose() {
    // List<T>.Reverse() reverses in place and returns void, so it cannot be chained
    pdfDocumentStreams.Reverse();
    pdfDocumentStreams.DisposeAll();
    pdfDocumentStreams.Clear();
}
EDIT
You don't need the 3.5 framework in order to have extension methods. They work happily with the C# 3.0 compiler when targeting the 2.0 framework.
http://blogs.msdn.com/jaredpar/archive/2007/11/16/extension-methods-without-3-5-framework.aspx
