GZipStream only decompresses first line

My GZipStream will only decompress the first line of the file. Extracting the contents via 7-zip works as expected and gives me the entire file contents. It also extracts as expected using gunzip on cygwin and linux, so I expect this is O/S specific (Windows 7).
I'm not certain how to go about troubleshooting this, so any tips on that would help me a great deal. It sounds very similar to this, but using SharpZLib results in the same thing.
Here's what I'm doing:
var inputFile = String.Format(@"{0}\{1}", inputDir, fileName);
var outputFile = String.Format(@"{0}\{1}.gz", inputDir, fileName);
var dcmpFile = String.Format(@"{0}\{1}", outputDir, fileName);
using (var input = File.OpenRead(inputFile))
using (var fileOutput = File.Open(outputFile, FileMode.Append))
using (GZipStream gzOutput = new GZipStream(fileOutput, CompressionMode.Compress, true))
{
    input.CopyTo(gzOutput);
}
// Now, decompress
using (FileStream of = new FileStream(outputFile, FileMode.Open, FileAccess.Read))
using (GZipStream ogz = new GZipStream(of, CompressionMode.Decompress, false))
using (FileStream wf = new FileStream(dcmpFile, FileMode.Append, FileAccess.Write))
{
    ogz.CopyTo(wf);
}

Your output file only contains a single line (gzipped) - but it contains all of the text data other than the line breaks.
You're repeatedly calling ReadLine() which returns a line of text without the line break and converting that text to bytes. So if you had an input file which had:
abc
def
ghi
You'd end up with an output file which was the compressed version of
abcdefghi
If you don't want that behaviour, why even go through a StreamReader in the first place? Just copy from the input FileStream straight to the GZipStream a block at a time, or use Stream.CopyTo if you're using .NET 4:
// Note how much simpler the code is using File.*
using (var input = File.OpenRead(inputFile))
using (var fileOutput = File.Open(outputFile, FileMode.Append))
using (GZipStream gzOutput = new GZipStream(fileOutput, CompressionMode.Compress, true))
{
    input.CopyTo(gzOutput);
}
Also note that appending to a compressed file is rarely a good idea, unless you've got some sort of special handling for multiple "chunks" within a single file.
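A minimal round-trip sketch of the CopyTo approach, assuming the output file is created fresh on every run (File.Create instead of FileMode.Append). The temporary directory, the file names, and the class name are made up for illustration and are not from the question.

```csharp
using System;
using System.IO;
using System.IO.Compression;

class GzipRoundTrip
{
    static void Main()
    {
        // Hypothetical scratch directory and file names, used only for this demo.
        string dir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        Directory.CreateDirectory(dir);
        string inputFile = Path.Combine(dir, "sample.txt");
        string gzFile = inputFile + ".gz";
        string restoredFile = Path.Combine(dir, "restored.txt");

        File.WriteAllText(inputFile, "abc\ndef\nghi\n");

        // Compress: copy the raw bytes, so line breaks are preserved.
        using (var input = File.OpenRead(inputFile))
        using (var fileOutput = File.Create(gzFile))
        using (var gzOutput = new GZipStream(fileOutput, CompressionMode.Compress))
        {
            input.CopyTo(gzOutput);
        }

        // Decompress into a new file, overwriting rather than appending.
        using (var compressed = File.OpenRead(gzFile))
        using (var gzInput = new GZipStream(compressed, CompressionMode.Decompress))
        using (var restored = File.Create(restoredFile))
        {
            gzInput.CopyTo(restored);
        }

        // Prints True when the restored text matches the original, line breaks included.
        Console.WriteLine(File.ReadAllText(restoredFile) == File.ReadAllText(inputFile));

        Directory.Delete(dir, true);
    }
}
```

Creating the files instead of appending means a second run cannot leave stale data from the first run in either the .gz file or the decompressed file.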

Related

Saving MemoryStream to File produces 0 bytes

I'm using this code to write an MP3 MemoryStream to file:
using (var nSpeakStreamAsMp3 = new MemoryStream())
using (var nWavFileReader = new WaveFileReader(nSpeakStream))
using (var nMp3Writer = new LameMP3FileWriter(nSpeakStreamAsMp3, nWavFileReader.WaveFormat, LAMEPreset.STANDARD_FAST))
{
    nWavFileReader.CopyTo(nMp3Writer);
    string sPath = "C:\\inetpub\\wwwroot\\server\\bin\\mymp3.mp3";
    using (FileStream nFile = new FileStream(sPath, FileMode.Create, System.IO.FileAccess.Write))
    {
        nSpeakStreamAsMp3.CopyTo(nFile);
    }
    sRet = (String.Concat("data:audio/mpeg;base64,", Convert.ToBase64String(nSpeakStreamAsMp3.ToArray())));
}
return sRet;
For some reason which I don't see, this produces a file of 0 bytes.
However, the MP3 stream is valid and does work. I'm passing it as a Base64String to a website, and I do hear it.
Where might be the error here?

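nSpeakStreamAsMp3 is currently positioned at the end of the stream, so CopyTo has nothing left to copy. You need to think like a VCR: be kind, rewind (nSpeakStreamAsMp3.Position = 0;) before you copy the data out again. ToArray() works because it ignores the current position.
Also make sure you flush nMp3Writer; if possible, close nMp3Writer completely before reading from the memory stream.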
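The effect of the stream position can be seen in a small sketch that uses only MemoryStream. The class name and the payload bytes are invented for the demo; no MP3 library is involved.

```csharp
using System;
using System.IO;

class RewindDemo
{
    static void Main()
    {
        using (var source = new MemoryStream())
        {
            byte[] payload = { 1, 2, 3, 4 };
            source.Write(payload, 0, payload.Length); // position is now at the end

            using (var first = new MemoryStream())
            {
                // CopyTo starts at the current position, so nothing is copied.
                source.CopyTo(first);
                Console.WriteLine(first.Length); // prints 0
            }

            source.Position = 0; // rewind

            using (var second = new MemoryStream())
            {
                source.CopyTo(second);
                Console.WriteLine(second.Length); // prints 4
            }
        }
    }
}
```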

Does CopyTo store the whole thing in memory?

I have the following code snippet, which is designed to add files to a .zip file, while at the same time calculating their sha1 checksum.
However, it's running out of memory on large files.
Which part of it is causing the whole file to be in memory? Surely this should all be just streamed?
using (ZipArchive archive = ZipFile.Open(buildFile, ZipArchiveMode.Update))
{
    foreach (var fileName in nameList)
    {
        ZipArchiveEntry entry = archive.CreateEntry(source.filename);
        using (Stream zipData = entry.Open())
        using (SHA1Managed shaForFile = new SHA1Managed())
        using (Stream sourceFileStream = File.OpenRead(fileName))
        using (Stream sourceData = new CryptoStream(sourceFileStream, shaForFile, CryptoStreamMode.Read))
        {
            sourceData.CopyTo(zipData);
            // print the file name and its SHA-1 hash as hex
            Console.WriteLine(fileName + ":" + BitConverter.ToString(shaForFile.Hash));
        }
    }
}

(Copied from a comment - as this answers the question)
The problem is ZipArchiveMode.Update. Updating can require significant alterations to the file on disk, so in that mode the archive contents are held in memory until the archive is disposed. ZipArchive can only stream directly to disk when you use ZipArchiveMode.Create.
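A minimal sketch of the same hash-while-zipping idea using ZipArchiveMode.Create, assuming the archive is written from scratch rather than updated. The scratch directory, the file names, and the 1 MB dummy payload are invented for the example.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Security.Cryptography;

class ZipWithHash
{
    static void Main()
    {
        // Hypothetical scratch files, used only for this demo.
        string dir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        Directory.CreateDirectory(dir);
        string sourcePath = Path.Combine(dir, "data.bin");
        string zipPath = Path.Combine(dir, "build.zip");
        File.WriteAllBytes(sourcePath, new byte[1024 * 1024]);

        using (FileStream zipFile = File.Create(zipPath))
        using (var archive = new ZipArchive(zipFile, ZipArchiveMode.Create))
        {
            ZipArchiveEntry entry = archive.CreateEntry(Path.GetFileName(sourcePath));
            using (Stream zipData = entry.Open())
            using (SHA1 sha = SHA1.Create())
            using (Stream sourceFile = File.OpenRead(sourcePath))
            using (var hashing = new CryptoStream(sourceFile, sha, CryptoStreamMode.Read))
            {
                // Bytes pass through the hash on their way into the zip entry.
                hashing.CopyTo(zipData);
                // The hash is finalized once the CryptoStream has read to the end.
                Console.WriteLine(BitConverter.ToString(sha.Hash));
            }
        }

        Directory.Delete(dir, true);
    }
}
```

In Create mode each entry has to be written completely before the next one is opened, which matches the one-file-at-a-time loop in the question.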

Modify File Stream in memory

I am reading a file using StreamReader fileReader = File.OpenText(filePath). I would like to modify one line in the file in memory and push the modified stream to another method.
What I would like to avoid is reading the whole file into a string and modifying the string (doesn't scale). I would also like to avoid modifying the actual file.
Is there a straightforward way of doing this?

There is no built-in way to do that in the .NET Framework.
The Stream and StreamReader/StreamWriter classes are designed to be chained where necessary (the way GZipStream wraps a stream to compress it). So you can create a wrapper reader that calls the wrapped reader for every operation and modifies the returned data as needed.
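One way such a wrapper could look, as a sketch only. LineReplacingReader is a made-up class, not part of the framework, and it overrides only ReadLine, so it assumes the consuming method reads line by line (Read, Peek, and ReadToEnd are not forwarded).

```csharp
using System;
using System.IO;

// Hypothetical wrapper: passes every line from the inner reader through a transform.
class LineReplacingReader : TextReader
{
    private readonly TextReader _inner;
    private readonly Func<string, string> _transform;

    public LineReplacingReader(TextReader inner, Func<string, string> transform)
    {
        _inner = inner;
        _transform = transform;
    }

    public override string ReadLine()
    {
        // Only one line is held in memory at a time.
        string line = _inner.ReadLine();
        return line == null ? null : _transform(line);
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing) _inner.Dispose();
        base.Dispose(disposing);
    }
}

class Program
{
    static void Main()
    {
        // A StringReader stands in for File.OpenText(filePath).
        var source = new StringReader("1.foo\n2.bar\n3.fake");
        using (var reader = new LineReplacingReader(source,
            line => line.StartsWith("2.") ? "2.changed" : line))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                Console.WriteLine(line); // prints 1.foo, 2.changed, 3.fake
        }
    }
}
```

The file on disk is never touched, and the wrapper can be handed to any method that accepts a TextReader and calls ReadLine.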

You can open two streams, one for reading and one for writing, at the same time. I tested this simple code and it works, but I'm not sure it's what you want:
// "2.bar\r\n" will be replaced by "!!!!!\r\n"
File.WriteAllText("test.txt",
@"1.foo
2.bar
3.fake");
// open inputStream for StreamReader, and open outputStream for StreamWriter
using (var inputStream = File.Open("test.txt", FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
using (var reader = new StreamReader(inputStream))
using (var outputStream = File.Open("test.txt", FileMode.Open, FileAccess.Write, FileShare.Read))
using (var writer = new StreamWriter(outputStream))
{
    var position = 0L; // track the reading position (assumes one byte per character)
    var newLineLength = Environment.NewLine.Length;
    while (!reader.EndOfStream)
    {
        var line = reader.ReadLine();
        // your particular conditions here.
        if (line.StartsWith("2."))
        {
            // seek line start position
            outputStream.Seek(position, SeekOrigin.Begin);
            // replace by something,
            // but the length should be equal to original in this case.
            writer.WriteLine(new String('!', line.Length));
            // flush so the buffered text is written at the position just sought
            writer.Flush();
        }
        position += line.Length + newLineLength;
    }
}
/* as a result, test.txt will be:
1.foo
!!!!!
3.fake
*/
As you can see, the same file can be accessed by a StreamReader and a StreamWriter at the same time, and you can manipulate the read and write positions independently.

Read a PDF into a string or byte[] and write that string/byte[] back to disk

I am having a problem in my app where it reads a PDF from disk, and then has to write it back to a different location later.
The emitted file is not a valid PDF anymore.
In very simplified form, I have tried reading/writing it using
var bytes = File.ReadAllBytes(@"c:\myfile.pdf");
File.WriteAllBytes(@"c:\output.pdf", bytes);
and
var input = new StreamReader(@"c:\myfile.pdf").ReadToEnd();
File.WriteAllText(@"c:\output.pdf", input);
... and about 100 permutations of the above with various encodings being specified. None of the output files were valid PDFs.
Can someone please lend a hand? Many thanks!!

In C#/.NET 4.0:
using (var i = new FileStream(@"input.pdf", FileMode.Open, FileAccess.Read))
using (var o = File.Create(@"output.pdf"))
    i.CopyTo(o);
If you insist on having the byte[] first:
using (var i = new FileStream(@"input.pdf", FileMode.Open, FileAccess.Read))
using (var ms = new MemoryStream())
{
    i.CopyTo(ms);
    // ToArray() returns exactly the bytes written; GetBuffer() may include unused capacity
    byte[] rawdata = ms.ToArray();
    // rewind, otherwise CopyTo starts at the end and writes nothing
    ms.Seek(0, SeekOrigin.Begin);
    using (var o = File.Create(@"output.pdf"))
        ms.CopyTo(o);
}
The memory stream has to be rewound with ms.Seek(0, SeekOrigin.Begin) before the second CopyTo, because the first CopyTo leaves its position at the end of the data.

You're using File.WriteAllText to write your file out.
Try File.WriteAllBytes.
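A short sketch of why a round trip through text damages binary data. The byte values are an illustrative stand-in for the start of a PDF (an ASCII marker followed by non-ASCII bytes), not taken from a real file.

```csharp
using System;
using System.Linq;
using System.Text;

class TextRoundTrip
{
    static void Main()
    {
        // "%PDF", a newline, then four bytes that are not valid UTF-8.
        byte[] original = { 0x25, 0x50, 0x44, 0x46, 0x0A, 0xE2, 0xE3, 0xCF, 0xD3 };

        // Decoding replaces each invalid byte with U+FFFD, so information is lost.
        string asText = Encoding.UTF8.GetString(original);
        byte[] roundTripped = Encoding.UTF8.GetBytes(asText);

        Console.WriteLine(original.SequenceEqual(roundTripped)); // prints False
    }
}
```

The same loss happens when a StreamReader decodes the file and File.WriteAllText re-encodes it, which is why the byte-based APIs are the safer choice for PDFs.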

GZipStream works but extension is lost

I am using the following code to compress a file, and it works fine. But when I decompress the result with WinRAR, I get the original file name without the extension. If the file name is myReport.xls, why do I get only myReport after decompressing?
using (var fs = new FileStream(fileName, FileMode.Open))
{
    byte[] input = new byte[fs.Length];
    fs.Read(input, 0, input.Length);
    fs.Close();
    using (var fsOutput = new FileStream(zipName, FileMode.Create, FileAccess.Write))
    using (var zip = new GZipStream(fsOutput, CompressionMode.Compress))
    {
        zip.Write(input, 0, input.Length);
        zip.Close();
        fsOutput.Close();
    }
}

GZip compresses only one file, without knowing its name. Therefore, if you compress the file myReport.xls, you should name the result myReport.xls.gz. On decompression the last file extension is removed, so you end up with the original file name.
That is how it has been used on Unix/Linux for ages.
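A minimal sketch of that naming convention, assuming the compressed name is built by appending ".gz" to the full original name. The scratch directory and the dummy file contents are invented for the demo.

```csharp
using System;
using System.IO;
using System.IO.Compression;

class GzipNaming
{
    static void Main()
    {
        // Hypothetical scratch directory and dummy file, used only for this demo.
        string dir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        Directory.CreateDirectory(dir);
        string fileName = Path.Combine(dir, "myReport.xls");
        File.WriteAllBytes(fileName, new byte[] { 1, 2, 3 });

        // Keep the original extension and add .gz on top of it.
        string zipName = fileName + ".gz";

        using (var input = File.OpenRead(fileName))
        using (var output = File.Create(zipName))
        using (var zip = new GZipStream(output, CompressionMode.Compress))
        {
            input.CopyTo(zip);
        }

        Console.WriteLine(Path.GetFileName(zipName));                 // prints myReport.xls.gz
        // Stripping the last extension gives back the original name.
        Console.WriteLine(Path.GetFileNameWithoutExtension(zipName)); // prints myReport.xls

        Directory.Delete(dir, true);
    }
}
```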

Very weird indeed. A brief search came up with the following:
http://dotnetzip.codeplex.com/discussions/268293
Which says that GZipStream has no way of knowing the name of the stream that is being written, and suggests you set the FileName property directly.
Hope that helps.
