I was trying to compress a byte array using DeflateStream. After wring the data, I was looking for a way to close the compression (mark as done). At first, I tried Dispose() then Close(), but those made the result MemoryStream unreadable. Then I thought I may need to Flush(), but the description said "The current implementation of this method has no functionality"
But it seems that without Flush(), the result seems empty. What does it mean that "The current implementation of this method has no functionality"?
static void Main(string[] args)
{
byte[] result = Encoding.UTF8.GetBytes("わたしのこいはみなみのかぜにのってはしるわ");
var output = new MemoryStream();
var dstream = new DeflateStream(output, CompressionLevel.Optimal);
dstream.Write(result, 0, result.Length);
var compressedSize1 = output.Position;
dstream.Flush();
var compressedSize2 = output.Position;
What does it mean that "The current implementation of this method has no functionality" ?
it means: it doesn't do anything, so calling Flush() by itself isn't going to help. You need to close the deflate stream for it to finish writing what it needs, but without closing the underlying stream (blocks will be written as you write to the stream, as the internal buffer fills - but the final partial block cannot be written until the data is ready to be terminated; not all compression sequences support being able to flush at an arbitrary point and then resume compressed data in additional blocks)
What you probably want is:
using (var dstream = new DeflateStream(output, CompressionLevel.Optimal, true))
{
// Your deflate code
}
// Now check the memory stream
Note the extra bool leaveOpen parameter usage.
Related
Taken from here:
private static string SerializeToString<T>(T value)
{
using (var stream = new MemoryStream()) {
var formatter = new BinaryFormatter();
formatter.Serialize(stream, value);
stream.Flush();
stream.Position = 0;
return Convert.ToBase64String(stream.ToArray());
}
}
private static T DeserializeFromString<T>(string data)
{
byte[] b = Convert.FromBase64String(data);
using (var stream = new MemoryStream(b)) {
var formatter = new BinaryFormatter();
stream.Seek(0, SeekOrigin.Begin);
return (T)formatter.Deserialize(stream);
}
}
Why do I need to flush and set the position to 0 in the serialize method, and seek in the deserialize method?
I removed them, they didn't affect anything.
I know that flushing means write whatever that's in the stream immediately.
But I don't know if it's necessary here... also not sure about the position and seek.
These samples contain unecessary code. The documentation for MemoryStream.ToArray (here) explicitly states that:
Writes the stream contents to a byte array, regardless of the Position
property.
Thus, we clearly don't need to set position. The flush is more debatable. It's very, very unlikely that memory stream would buffer under the hood, since it's just writing to a memory buffer anyway. However, I'm not sure that it's documented anywhere that memory stream won't buffer, so Flush() might be reasonable since we're calling ToArray() before disposing the stream. Another approach would be to call ToArray() outside the using block (we'd have to move the declaration of the variable out as well). This will work because ToArray() states that:
This method works when the MemoryStream is closed.
On the read side, you are creating a new stream, which starts at position 0 by default. Thus, there's no need for the Seek call.
In an utility method, which accepts a Stream parameter, I rely on some StreamReader to analyse data.
I don't want to close the incoming stream in my method. I want to let the caller method to take the decision to dispose the stream.
Is it safe to not dispose the opened StreamReader? I mean, will it eventually be automatically disposed? Will it lead to memory leaks?
Here is my utility method. Its goal is to read a Stream, and return its content as a string, regardless of how the data is encoded:
public static string GetStringAutoDetectEncoding(Stream data, out Encoding actualEncoding)
{
// 1. Is there a Bye Order Mask ?
var candidateEncoding = DetectEncodingWithByteOrderMask(data);
// 2a. No BOM, the data is either UTF8 no BOM or ANSI
if (candidateEncoding == Encoding.Default)
{
var utf8NoBomEncoding = Encoding.GetEncoding("utf-8",new EncoderExceptionFallback(), new DecoderExceptionFallback());
var positionBackup = data.Position;
var sr = new StreamReader(data, utf8NoBomEncoding);
try
{
// 3. Try as UTF8 With no BOM
var result = sr.ReadToEnd(); // will throw error if not UTF8
actualEncoding = utf8NoBomEncoding; // Probably an UTF8 no bom string
return result;
}
catch (DecoderFallbackException)
{
// 4. Rewind the stream and fallback to ASNI
data.Position = positionBackup;
var srFallback = new StreamReader(data, candidateEncoding);
actualEncoding = candidateEncoding;
return srFallback.ReadToEnd(); ;
}
}
// 2b. There is a BOM. Use the detected encoding
else
{
var sr = new StreamReader(data, candidateEncoding);
actualEncoding = candidateEncoding;
return sr.ReadToEnd(); ;
}
}
Then, I can have some methods in the like this:
void Foo(){
using(var stream = File.OpenRead(#"c:\somefile")) {
Encoding detected;
var fileContent = MyUtilityClass.GetStringAutoDetectEncoding(stream, detected);
Console.WriteLine("Detected encoding: {0}", encoding);
Console.WriteLine("File content: {0}", fileContent);
}
}
You could invert control using a closure. That is, create a method like so:
// This method will open the stream, execute the streamClosure, and then close the stream.
public static String StreamWork(Func<Stream, String> streamClosure) {
// Set up the stream here.
using (Stream stream = new MemoryStream()) { // Pretend the MemoryStream is your actual stream.
// Execute the closure. Return it's results.
return streamClosure(stream);
}
}
which is responsible for opening / closing the stream within the method.
Then you simply wrap up all the code that needs the stream into a Func<Stream, String> closure, and pass it in. The StreamWork method will open the stream, execute your code, then close the stream.
public static void Main()
{
// Wrap all of the work that needs to be done in a closure.
// This represents all the work that needs to be done while the stream is open.
Func<Stream, String> streamClosure = delegate(Stream stream) {
using (StreamReader streamReader = new StreamReader(stream)) {
return streamReader.ReadToEnd();
}
};
// Call StreamWork. This method handles creating/closing the stream.
String result = StreamWork(streamClosure);
Console.WriteLine(result);
Console.ReadLine();
}
UPDATE
Of course, this method of inversion is a matter of preference as mentioned in the comments below. The key point is to ensure that the stream is closed rather than allowing it to float around until the GC cleans it up (since the whole point of having stuff implement IDisposable is to avoid that sort of situation to begin with). Since this is a library function that accepts a Stream as input, the assumption is that the method-consumer will be creating the stream, and therefore as you point out, has the responsibility of ultimately closing the stream as well. But for sensitive resources where you are concerned about ensuring clean up occurs absolutely, inversion is sometimes a useful technique.
StreamReader close/dispose their underlying streams only when you call Dispose on them. They don't dispose of the stream if the reader/writer is just garbage collected.
Can I close a file stream without calling Flush (in C#)? I understood that Close and Dispose calls the Flush method first.
MSDN is not 100% clear, but Jon Skeet is saying "Flush", so do it before close/dispose. It won't hurt, right?
From FileStream.Close Method:
Any data previously written to the buffer is copied to the file before
the file stream is closed, so it is not necessary to call Flush before
invoking Close. Following a call to Close, any operations on the file
stream might raise exceptions. After Close has been called once, it
does nothing if called again.
Dispose is not as clear:
This method disposes the stream, by writing any changes to the backing
store and closing the stream to release resources.
Remark: the commentators might be right, it's not 100% clear from the Flush:
Override Flush on streams that implement a buffer. Use this method to
move any information from an underlying buffer to its destination,
clear the buffer, or both. Depending upon the state of the object, you
might have to modify the current position within the stream (for
example, if the underlying stream supports seeking). For additional
information see CanSeek.
When using the StreamWriter or BinaryWriter class, do not flush the
base Stream object. Instead, use the class's Flush or Close method,
which makes sure that the data is flushed to the underlying stream
first and then written to the file.
TESTS:
var textBytes = Encoding.ASCII.GetBytes("Test123");
using (var fileTest = System.IO.File.Open(#"c:\temp\fileNoCloseNoFlush.txt", FileMode.CreateNew))
{
fileTest.Write(textBytes,0,textBytes.Length);
}
using (var fileTest = System.IO.File.Open(#"c:\temp\fileCloseNoFlush.txt", FileMode.CreateNew))
{
fileTest.Write(textBytes, 0, textBytes.Length);
fileTest.Close();
}
using (var fileTest = System.IO.File.Open(#"c:\temp\fileFlushNoClose.txt", FileMode.CreateNew))
{
fileTest.Write(textBytes, 0, textBytes.Length);
fileTest.Flush();
}
using (var fileTest = System.IO.File.Open(#"c:\temp\fileCloseAndFlush.txt", FileMode.CreateNew))
{
fileTest.Write(textBytes, 0, textBytes.Length);
fileTest.Flush();
fileTest.Close();
}
What can I say ... all files got the text - maybe this is just too little data?
Test2
var rnd = new Random();
var size = 1024*1024*10;
var randomBytes = new byte[size];
rnd.NextBytes(randomBytes);
using (var fileTest = System.IO.File.Open(#"c:\temp\fileNoCloseNoFlush.bin", FileMode.CreateNew))
{
fileTest.Write(randomBytes, 0, randomBytes.Length);
}
using (var fileTest = System.IO.File.Open(#"c:\temp\fileCloseNoFlush.bin", FileMode.CreateNew))
{
fileTest.Write(randomBytes, 0, randomBytes.Length);
fileTest.Close();
}
using (var fileTest = System.IO.File.Open(#"c:\temp\fileFlushNoClose.bin", FileMode.CreateNew))
{
fileTest.Write(randomBytes, 0, randomBytes.Length);
fileTest.Flush();
}
using (var fileTest = System.IO.File.Open(#"c:\temp\fileCloseAndFlush.bin", FileMode.CreateNew))
{
fileTest.Write(randomBytes, 0, randomBytes.Length);
fileTest.Flush();
fileTest.Close();
}
And again - every file got its bytes ... to me it looks like it's doing what I read from MSDN: it doesn't matter if you call Flush or Close before dispose ... any thoughts on that?
You don't have to call Flush() on Close()/Dispose(), FileStream will do it for you as you can see from its source code:
http://referencesource.microsoft.com/#mscorlib/system/io/filestream.cs,e23a38af5d11ddd3
[System.Security.SecuritySafeCritical] // auto-generated
protected override void Dispose(bool disposing)
{
// Nothing will be done differently based on whether we are
// disposing vs. finalizing. This is taking advantage of the
// weak ordering between normal finalizable objects & critical
// finalizable objects, which I included in the SafeHandle
// design for FileStream, which would often "just work" when
// finalized.
try {
if (_handle != null && !_handle.IsClosed) {
// Flush data to disk iff we were writing. After
// thinking about this, we also don't need to flush
// our read position, regardless of whether the handle
// was exposed to the user. They probably would NOT
// want us to do this.
if (_writePos > 0) {
FlushWrite(!disposing); // <- Note this
}
}
}
finally {
if (_handle != null && !_handle.IsClosed)
_handle.Dispose();
_canRead = false;
_canWrite = false;
_canSeek = false;
// Don't set the buffer to null, to avoid a NullReferenceException
// when users have a race condition in their code (ie, they call
// Close when calling another method on Stream like Read).
//_buffer = null;
base.Dispose(disposing);
}
}
I've been tracking a newly introduced bug that seems to indicate .NET 4 does not reliably flush changes to disk when the stream is disposed (unlike .NET 2.0 and 3.5, which always did so reliably).
The .NET 4 FileStream class has been heavily modified in .NET 4, and while the Flush*() methods have been rewritten, similar attention seems to have been forgotten for .Dispose().
This is resulting in incomplete files.
Since you've stated that you understood that close & dispose called the flush method if it was not called explicitly by user code, I believe that (by close without flush) you actually want to have a possibility to discard changes made to a FileStream, if necessary.
If that is correct, using a FileStream alone won't help. You will need to load this file into a MemoryStream (or an array, depending on how you modify its contents), and then decide whether you want to save changes or not after you're done.
A problem with this is file size, obviously. FileStream uses limited size write buffers to speed up operations, but once they are depleted, changes need to be flushed. Due to .NET memory limits, you can only expect to load smaller files in memory, if you need to hold them entirely.
An easier alternative would be to make a disk copy of your file, and work on it using a plain FileStream. When finished, if you need to discard changes, simply delete the temporary file, otherwise replace the original with a modified copy.
Wrap the FileStream in a BufferedStream and close the filestream before the buffered stream.
var fs = new FileStream(...);
var bs = new BufferedStream(fs, buffersize);
bs.Write(datatosend, 0, length);
fs.Close();
try {
bs.Close();
}
catch (IOException) {
}
Using Flush() is worthy inside big Loops.
when you have to read and write a big File inside one Loop. In other case the buffer or the computer is big enough, and doesn´t matter to close() without making one Flush() before.
Example: YOU HAVE TO READ A BIG FILE (in one format) AND WRITE IT IN .txt
StreamWriter sw = .... // using StreamWriter
// you read the File ...
// and now you want to write each line for this big File using WriteLine ();
for ( .....) // this is a big Loop because the File is big and has many Lines
{
sw.WriteLine ( *whatever i read* ); //we write here somrewhere ex. one .txt anywhere
sw.Flush(); // each time the sw.flush() is called, the sw.WriteLine is executed
}
sw.Close();
Here it is very important to use Flush(); beacause otherwise each writeLine is save in the buffer and does not write it until the buffer is frull or until the program reaches sw.close();
I hope this helps a little to understand the function of Flush
I think it is safe to use simple using statement, which closes the stream after the call to GetBytes();
public static byte[] GetBytes(string fileName)
{
byte[] buffer = new byte[4096];
using (FileStream fs = new FileStream(fileName))
using (MemoryStream ms = new MemoryStream())
{
fs.BlockCopy(ms, buffer, 4096); // extension method for the Stream class
fs.Close();
return ms.ToByteArray();
}
}
I have a textReader that in a specific instance I want to be able to advance to the end of file quickly so other classes that might hold a reference to this object will not be able to call tr.ReadLine() without getting a null.
This is a large file. I cannot use TextReader.ReadToEnd() as it will often lead to an OutOfMemoryException
I thought I would ask the community if there was a way SEEK the stream without using TextReader.ReadToEnd() which returns a string of all data in the file.
Current method, inefficient.
The following example code is a mock up. Obviously I am not opening a file with an if statement directly following it asking if I want to read to the end.
TextReader tr = new StreamReader("Largefile");
if(needToAdvanceToEndOfFile)
{
while(tr.ReadLine() != null) { }
}
Desired solution (Note this code block contains fake 'concept' methods or methods that cannot be used due to risk of outofmemoryexception)
TextReader tr = new StreamReader("Largefile");
if(needToAdvanceToEndOfFile)
{
tr.SeekToEnd(); // A method that does not return anything. This method does not exist.
// tr.ReadToEnd() not acceptable as it can lead to OutOfMemoryException error as it is very large file.
}
A possible alternative is to read through the file in bigger chunks using tr.ReadBlock(args).
I poked around ((StreamReader)tr).BaseStream but could not find anything that worked.
As I am new to the community I figured I would see if someone knew the answer off the top of their head.
You have to discard any buffered data if you have read any file content - since data is buffered you might get content even if you seek the underlying stream to the end - working example:
StreamReader sr = new StreamReader(fileName);
string sampleLine = sr.ReadLine();
//discard all buffered data and seek to end
sr.DiscardBufferedData();
sr.BaseStream.Seek(0, SeekOrigin.End);
The problem as mentioned in the documentation is
The StreamReader class buffers input from the underlying stream when
you call one of the Read methods. If you manipulate the position of
the underlying stream after reading data into the buffer, the position
of the underlying stream might not match the position of the internal
buffer. To reset the internal buffer, call the DiscardBufferedData
method
Use
reader.BaseStream.Seek(0, SeekOrigin.End);
Test:
using (StreamReader reader = new StreamReader(#"Your Large File"))
{
reader.BaseStream.Seek(0, SeekOrigin.End);
int read = reader.Read();//read will be -1 since you are at the end of the stream
}
Edit: Test it with your code:
using (TextReader tr = new StreamReader("C:\\test.txt"))//test.txt is a file that has data and lines
{
((StreamReader)tr).BaseStream.Seek(0, SeekOrigin.End);
string foo = tr.ReadLine();
Debug.WriteLine(foo ?? "foo is null");//foo is null
int read = tr.Read();
Debug.WriteLine(read);//-1
}
What's the most efficient way to read a stream into another stream? In this case, I'm trying to read data in a Filestream into a generic stream. I know I could do the following:
1. read line by line and write the data to the stream
2. read chunks of bytes and write to the stream
3. etc
I'm just trying to find the most efficient way.
Thanks
Stephen Toub discusses a stream pipeline in his MSDN .NET matters column here. In the article he describes a CopyStream() method that copies from one input stream to another stream. This sounds quite similar to what you're trying to do.
I rolled together a quick extension method (so VS 2008 w/ 3.5 only):
public static class StreamCopier
{
private const long DefaultStreamChunkSize = 0x1000;
public static void CopyTo(this Stream from, Stream to)
{
if (!from.CanRead || !to.CanWrite)
{
return;
}
var buffer = from.CanSeek
? new byte[from.Length]
: new byte[DefaultStreamChunkSize];
int read;
while ((read = from.Read(buffer, 0, buffer.Length)) > 0)
{
to.Write(buffer, 0, read);
}
}
}
It can be used thus:
using (var input = File.OpenRead(#"C:\wrnpc12.txt"))
using (var output = File.OpenWrite(#"C:\wrnpc12.bak"))
{
input.CopyTo(output);
}
You can also swap the logic around slightly and write a CopyFrom() method as well.
Reading a buffer of bytes and then writing it is fastest. Methods like ReadLine() need to look for line delimiters, which takes more time than just filling a buffer.
I assume by generic stream, you mean any other kind of stream, like a Memory Stream, etc.
If so, the most efficient way is to read chunks of bytes and write them to the recipient stream. The chunk size can be something like 512 bytes.