SharpZipLib Deflater creates bad data - c#

The original compressed data can be inflated back correctly. However, if I inflate the data, deflate it, and then inflate it again, the result is incorrect. (The real use case is extracting the data, modifying it, and compressing it again; for this test no modification takes place, so the round trip should be lossless.)
The resulting data is somehow "damaged": the first ~40 bytes are OK, then a block of incorrect data follows (remnants of the original data are still there, but many bytes are missing).
Changing the compression level doesn't help (except that NO_COMPRESSION produces what looks like an incomplete stream).
The question is simple: why is this happening?
using ICSharpCode.SharpZipLib.Zip.Compression;
public byte[] Inflate(byte[] inputData)
{
Inflater inflater = new Inflater(false);
using (var inputStream = new MemoryStream(inputData))
using (var ms = new MemoryStream())
{
var inputBuffer = new byte[4096];
var outputBuffer = new byte[4096];
while (inputStream.Position < inputData.Length)
{
var read = inputStream.Read(inputBuffer, 0, inputBuffer.Length);
inflater.SetInput(inputBuffer, 0, read);
while (inflater.IsNeedingInput == false)
{
var written = inflater.Inflate(outputBuffer, 0, outputBuffer.Length);
if (written == 0)
break;
ms.Write(outputBuffer, 0, written);
}
if (inflater.IsFinished == true)
break;
}
inflater.Reset();
return ms.ToArray();
}
}
public byte[] Deflate(byte[] inputData)
{
Deflater deflater = new Deflater(Deflater.BEST_SPEED, false);
deflater.SetInput(inputData);
deflater.Finish();
using (var ms = new MemoryStream())
{
var outputBuffer = new byte[65536 * 4];
while (deflater.IsNeedingInput == false)
{
var read = deflater.Deflate(outputBuffer);
ms.Write(outputBuffer, 0, read);
if (deflater.IsFinished == true)
break;
}
deflater.Reset();
return ms.ToArray();
}
}
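For reference, here is a minimal sketch of the same round trip using SharpZipLib's stream wrappers (InflaterInputStream / DeflaterOutputStream) instead of driving Inflater and Deflater by hand. This is an illustrative alternative under the same assumptions (zlib-wrapped deflate data), not the code from the question:

using System.IO;
using ICSharpCode.SharpZipLib.Zip.Compression.Streams;

public static byte[] InflateWithStream(byte[] inputData)
{
    // InflaterInputStream decompresses as you read from it.
    using (var output = new MemoryStream())
    using (var inflaterStream = new InflaterInputStream(new MemoryStream(inputData)))
    {
        var buffer = new byte[4096];
        int read;
        while ((read = inflaterStream.Read(buffer, 0, buffer.Length)) > 0)
            output.Write(buffer, 0, read);
        return output.ToArray();
    }
}

public static byte[] DeflateWithStream(byte[] inputData)
{
    // DeflaterOutputStream compresses as you write to it; disposing it flushes the final block.
    using (var output = new MemoryStream())
    {
        using (var deflaterStream = new DeflaterOutputStream(output))
        {
            deflaterStream.Write(inputData, 0, inputData.Length);
        }
        return output.ToArray();
    }
}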
Edit: My bad - by mistake I overwrote the first several bytes of the original compressed data. This isn't SharpZipLib's fault, but mine.

I know this is a tangential answer, but the exact same thing happened to me. I abandoned SharpZipLib and went to DotNetZip:
http://dotnetzip.codeplex.com/
Easier API, no corrupt or strange byte order files.
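For what it's worth, a minimal DotNetZip sketch of the same raw deflate round trip (assuming the Ionic.Zlib namespace from that package; DeflateStream here writes raw deflate data, use ZlibStream instead if you need the zlib header the question's Inflater(false) expects):

using System.IO;
using Ionic.Zlib;

public static byte[] Deflate(byte[] data)
{
    using (var ms = new MemoryStream())
    {
        using (var deflate = new DeflateStream(ms, CompressionMode.Compress))
            deflate.Write(data, 0, data.Length);
        return ms.ToArray();
    }
}

public static byte[] Inflate(byte[] data)
{
    using (var output = new MemoryStream())
    using (var deflate = new DeflateStream(new MemoryStream(data), CompressionMode.Decompress))
    {
        var buffer = new byte[4096];
        int read;
        while ((read = deflate.Read(buffer, 0, buffer.Length)) > 0)
            output.Write(buffer, 0, read);
        return output.ToArray();
    }
}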

Verify that c# Embedded Resource matches file

I know that this may be a question without one 'right' answer.
I have a C# windows application that has an embedded resource included in the assembly. I've been trying to come up with a way to compare the contents of my resource stream to determine if the contents of that stream matches a particular file on the file system.
e.g.
using(var resourceStream = Assembly.GetExecutingAssembly().GetManifestResourceStream(@"Manifest/Resource/Path/thing.exe"))
using(var fileStream = new FileStream(@"File/System/Path/thing.exe", FileMode.Open))
// Compare Contents (thing.exe may be an older version)
if(CompareStreamContents(resourceStream, fileStream))
{
/* Do a thing */
}
else
{
/* Do another thing*/
}
Is there a better way than simply doing a byte-by-byte comparison? Thoughts? (and thanks in advance!)
Per my comment:
private bool CompareStreamContents(Stream resourceStream, Stream fileStream)
{
var sha = new SHA256CryptoServiceProvider();
var hash1 = Convert.ToBase64String(sha.ComputeHash(ReadToEnd(resourceStream)));
var hash2 = Convert.ToBase64String(sha.ComputeHash(ReadToEnd(fileStream)));
return hash1 == hash2;
}
private byte[] ReadToEnd(Stream stream)
{
var continueRead = true;
var buffer = new byte[0x10000];
var ms = new MemoryStream();
while (continueRead)
{
var size = stream.Read(buffer, 0, buffer.Length);
if (size > 0)
{
ms.Write(buffer, 0, size);
}
else
{
continueRead = false;
}
}
return ms.ToArray();
}
If you plan on doing something else with the streams after the compare, you may want to set the stream position back to origin before returning from the compare method.
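As a side note, the hash classes also have a ComputeHash(Stream) overload, so a sketch along the same lines that avoids buffering each stream into a byte array first could be (using System.Linq and System.Security.Cryptography):

private static bool CompareStreamContents(Stream resourceStream, Stream fileStream)
{
    using (var sha = SHA256.Create())
    {
        // ComputeHash(Stream) reads each stream to its end internally.
        byte[] hash1 = sha.ComputeHash(resourceStream);
        byte[] hash2 = sha.ComputeHash(fileStream);
        return hash1.SequenceEqual(hash2);
    }
}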

Correct way to use GZipStream in dotNET C#

I'm working with GZipStream at the moment using .NET 3.5.
I have the two methods listed below. As the input file I use a text file consisting of the character 's'; the size of the file is 2MB. This code works fine if I use .NET 4.5, but with .NET 3.5, after compressing and decompressing, I get a file of 435KB, which of course isn't the same as the source file.
If I decompress the file with WinRAR, it also looks fine (identical to the source file).
If I decompress the file using GZipStream from .NET 4.5 (a file compressed via GZipStream from .NET 3.5), the result is bad.
UPD:
In general I really need to read the file as several separate gzip chunks; in this case all the bytes of the compressed file are read in a single call to Read(), so I still don't understand why decompression doesn't work.
public void CompressFile()
{
string fileIn = @"D:\sin2.txt";
string fileOut = @"D:\sin2.txt.pgz";
using (var fout = File.Create(fileOut))
{
using (var fin = File.OpenRead(fileIn))
{
using (var zip = new GZipStream(fout, CompressionMode.Compress))
{
var buffer = new byte[1024 * 1024 * 10];
int n = fin.Read(buffer, 0, buffer.Length);
zip.Write(buffer, 0, n);
}
}
}
}
public void DecompressFile()
{
string fileIn = @"D:\sin2.txt.pgz";
string fileOut = @"D:\sin2.1.txt";
using (var fsout = File.Create(fileOut))
{
using (var fsIn = File.OpenRead(fileIn))
{
var buffer = new byte[1024 * 1024 * 10];
int n;
while ((n = fsIn.Read(buffer, 0, buffer.Length)) > 0)
{
using (var ms = new MemoryStream(buffer, 0, n))
{
using (var zip = new GZipStream(ms, CompressionMode.Decompress))
{
int nRead = zip.Read(buffer, 0, buffer.Length);
fsout.Write(buffer, 0, nRead);
}
}
}
}
}
}
You're trying to decompress each "chunk" as if it's a separate gzip file. Don't do that - just read from the GZipStream in a loop:
using (var fsout = File.Create(fileOut))
{
using (var fsIn = File.OpenRead(fileIn))
{
using (var zip = new GZipStream(fsIn, CompressionMode.Decompress))
{
var buffer = new byte[1024 * 32];
int bytesRead;
while ((bytesRead = zip.Read(buffer, 0, buffer.Length)) > 0)
{
fsout.Write(buffer, 0, bytesRead);
}
}
}
}
Note that your compression code should look similar, reading in a loop rather than assuming a single call to Read will read all the data.
(Personally I'd skip fsIn, and just use new GZipStream(File.OpenRead(fileIn)) but that's just a personal preference.)
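The compression side would follow the same pattern; a quick sketch (not the poster's exact code):

using (var fin = File.OpenRead(fileIn))
using (var zip = new GZipStream(File.Create(fileOut), CompressionMode.Compress))
{
    var buffer = new byte[1024 * 32];
    int bytesRead;
    while ((bytesRead = fin.Read(buffer, 0, buffer.Length)) > 0)
    {
        zip.Write(buffer, 0, bytesRead);
    }
}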
First, as @Jon Skeet mentioned, you are not using the Stream.Read method correctly. It doesn't matter whether your buffer is big enough or not: the stream is allowed to return fewer bytes than requested, with zero indicating the end, so reading from a stream should always be done in a loop.
However, the main problem in your decompression code is the way you share the buffer. You read the input into a buffer, then wrap it in a MemoryStream (note that the constructor used does not copy the passed array but uses it directly as the stream's internal buffer), and then you try to read from and write to that same buffer at the same time. Taking into account that decompression produces data "faster" than it consumes it, it's surprising that your code works at all.
The correct implementation is quite simple:
static void CompressFile()
{
string fileIn = @"D:\sin2.txt";
string fileOut = @"D:\sin2.txt.pgz";
using (var input = File.OpenRead(fileIn))
using (var output = new GZipStream(File.Create(fileOut), CompressionMode.Compress))
Write(input, output);
}
static void DecompressFile()
{
string fileIn = @"D:\sin2.txt.pgz";
string fileOut = @"D:\sin2.1.txt";
using (var input = new GZipStream(File.OpenRead(fileIn), CompressionMode.Decompress))
using (var output = File.Create(fileOut))
Write(input, output);
}
static void Write(Stream input, Stream output, int bufferSize = 10 * 1024 * 1024)
{
var buffer = new byte[bufferSize];
for (int readCount; (readCount = input.Read(buffer, 0, buffer.Length)) > 0;)
output.Write(buffer, 0, readCount);
}
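A quick way to sanity-check the round trip (assuming the file names above, and System.Linq for SequenceEqual):

CompressFile();
DecompressFile();

// Compare the original file and the round-tripped file byte for byte.
byte[] original = File.ReadAllBytes(@"D:\sin2.txt");
byte[] roundTrip = File.ReadAllBytes(@"D:\sin2.1.txt");
Console.WriteLine(original.SequenceEqual(roundTrip) ? "Round trip OK" : "Round trip FAILED");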

Sending and receiving compressed data over a TCP socket

I need help with sending and receiving compressed data over a TCP socket.
The code works perfectly fine if I don't use compression, but something very strange happens when I do use compression. Basically, the problem is that the stream.Read() operation gets skipped and I don't know why.
My code:
using (var client = new TcpClient())
{
client.Connect("xxx.xxx.xx.xx", 6100);
using (var stream = client.GetStream())
{
// SEND REQUEST
byte[] bytesSent = Encoding.UTF8.GetBytes(xml);
// send compressed bytes (if this is used, then stream.Read() below doesn't work)
//var compressedBytes = bytesSent.ToStream().GZipCompress();
//stream.Write(compressedBytes, 0, compressedBytes.Length);
// send normal bytes (uncompressed)
stream.Write(bytesSent, 0, bytesSent.Length);
// GET RESPONSE
byte[] bytesReceived = new byte[client.ReceiveBufferSize];
// PROBLEM HERE: when using compression, this line just gets skipped over very quickly
stream.Read(bytesReceived, 0, client.ReceiveBufferSize);
//var decompressedBytes = bytesReceived.ToStream().GZipDecompress();
//string response = Encoding.UTF8.GetString(decompressedBytes);
string response = Encoding.UTF8.GetString(bytesReceived);
Console.WriteLine(response);
}
}
You will notice some extension methods above. Here is the code in case you are wondering if something is wrong there.
public static MemoryStream ToStream(this byte[] bytes)
{
return new MemoryStream(bytes);
}
public static byte[] GZipCompress(this Stream stream)
{
using (var memoryStream = new MemoryStream())
{
using (var gZipStream = new GZipStream(memoryStream, CompressionMode.Compress))
{
stream.CopyTo(gZipStream);
}
return memoryStream.ToArray();
}
}
public static byte[] GZipDecompress(this Stream stream)
{
using (var memoryStream = new MemoryStream())
{
using (var gZipStream = new GZipStream(stream, CompressionMode.Decompress))
{
gZipStream.CopyTo(memoryStream);
}
return memoryStream.ToArray();
}
}
The extensions work quite well in the following, so I'm sure they're not the problem:
string original = "the quick brown fox jumped over the lazy dog";
byte[] compressedBytes = Encoding.UTF8.GetBytes(original).ToStream().GZipCompress();
byte[] decompressedBytes = compressedBytes.ToStream().GZipDecompress();
string result = Encoding.UTF8.GetString(decompressedBytes);
Console.WriteLine(result);
Does anyone have any idea why the Read() operation is being skipped when the bytes being sent are compressed?
EDIT
I received a message from the API provider after showing them the above sample code. They had this to say:
at a first glance I guess the header is missing. The input must start
with a 'c' followed by the length of the input
(sprintf(cLength,"c%09d",hres) in our example). We need this because
we can't read until we find a binary 0 to recognize the end.
They previously provided some sample code in C, which I don't fully understand 100%, as follows:
example in C:
#include <zlib.h>
uLongf hres;
char cLength[COMPRESS_HEADER_LEN + 1] = {'\0'};
n = read(socket,buffer,10);
// check if input is compressed
if(msg[0]=='c') {
compressed = 1;
}
n = atoi(msg+1);
read.....
hres = 64000;
res = uncompress((Bytef *)msg, &hres, (const Bytef*)
buffer/*compressed*/, n);
if(res == Z_OK && hres > 0 ){
msg[hres]=0; //original
}
else // errorhandling
hres = 64000;
if (compressed){
res = compress((Bytef *)buffer, &hres, (const Bytef *)msg, strlen(msg));
if(res == Z_OK && hres > 0 ) {
sprintf(cLength,"c%09d",hres);
write(socket,cLength,10);
write(socket, buffer, hres);
}
else // errorhandling
makefile: add "-lz" to the libs
They're using zlib. I don't expect that to make any difference, but I did try using zlib.net and I still get no response anyway.
Can someone give me an example of how exactly I'm supposed to send this input length in C#?
EDIT 2
In response to @quantdev, here is what I am trying now for the length prefix:
using (var client = new TcpClient())
{
client.Connect("xxx.xxx.xx.xx", 6100);
using (var stream = client.GetStream())
{
// SEND REQUEST
byte[] bytes = Encoding.UTF8.GetBytes(xml);
byte[] compressedBytes = ZLibCompressor.Compress(bytes);
byte[] prefix = Encoding.UTF8.GetBytes("c" + compressedBytes.Length);
byte[] bytesToSend = new byte[prefix.Length + compressedBytes.Length];
Array.Copy(prefix, bytesToSend, prefix.Length);
Array.Copy(compressedBytes, 0, bytesToSend, prefix.Length, compressedBytes.Length);
stream.Write(bytesToSend, 0, bytesToSend.Length);
// WAIT
while (client.Available == 0)
{
Thread.Sleep(1000);
}
// GET RESPONSE
byte[] bytesReceived = new byte[client.ReceiveBufferSize];
stream.Read(bytesReceived, 0, client.ReceiveBufferSize);
byte[] decompressedBytes = ZLibCompressor.DeCompress(bytesReceived);
string response = Encoding.UTF8.GetString(decompressedBytes);
Console.WriteLine(response);
}
}
You need to check the return value of the Read() calls you are making on the TCP stream: it is the number of bytes actually read.
MSDN says :
Return Value
The total number of bytes read into the buffer. This can be less than the number of bytes requested if that many bytes are not
currently available, or zero (0) if the end of the stream has been
reached.
If the socket is closed, the call will return immediately 0 (which is what might be happening here).
If it is not 0, then you must check how many bytes you actually received; if it is less than client.ReceiveBufferSize, you will need additional calls to Read to retrieve the remaining bytes.
Prior to your call to Read, check that some data is actually available on the socket:
while(client.Available == 0)
// wait ...
http://msdn.microsoft.com/en-us/library/system.net.sockets.tcpclient.available%28v=vs.110%29.aspx
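In practice that means looping until you have all the bytes the length header announced. A hedged sketch of such a helper, based on the "c%09d" header the provider described (stream is the NetworkStream from the question; uses System.IO and System.Text):

// Reads exactly 'count' bytes, or throws if the stream ends early.
static byte[] ReadExact(Stream stream, int count)
{
    var buffer = new byte[count];
    int offset = 0;
    while (offset < count)
    {
        int read = stream.Read(buffer, offset, count - offset);
        if (read == 0)
            throw new EndOfStreamException("Connection closed before the full message arrived.");
        offset += read;
    }
    return buffer;
}

// Usage: a 10-byte ASCII header ("c" + 9-digit length), then that many compressed bytes.
byte[] header = ReadExact(stream, 10);
int length = int.Parse(Encoding.ASCII.GetString(header, 1, 9));
byte[] compressed = ReadExact(stream, length);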
I think you may have reached the end of the stream. Can you try setting the stream position before reading it?
stream.Position = 0;
http://msdn.microsoft.com/en-us/library/vstudio/system.io.stream.read
Encoding.UTF8.GetString shouldn't be used on an arbitrary byte array.
For example, the compressed bytes may contain a NUL character, which is not allowed in UTF-8 encoded text except when used as a terminator.
If you want to print the received bytes for debugging, maybe you should just print them as integers.
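For example, a hex dump is a safer way to inspect the raw bytes than decoding them as text:

// Show the first bytes received as hex rather than as a UTF-8 string.
Console.WriteLine(BitConverter.ToString(bytesReceived, 0, Math.Min(64, bytesReceived.Length)));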

RavenDB Attachments - how to use this functionality?

I have a file input control.
<input type="file" name="file" id="SaveFileToDB"/>
Let's say I browse to the C:/Instruction.pdf document and click submit. On submit, I want to save the document in RavenDB and later retrieve it for download purposes. I saw this link, http://ravendb.net/docs/client-api/attachments,
which says to do this:
Stream data = new MemoryStream(new byte[] { 1, 2, 3 });
documentStore.DatabaseCommands.PutAttachment("videos/2", null, data,
new RavenJObject {{"Description", "Kids play in the garden"}});
I am not following what 1, 2, 3 mean here, what videos/2 means in the command, or how I can use these two lines in my case to save Word documents and PDFs in RavenDB. If anyone has done such a thing before, please advise.
I am not clear on one thing: how is the attachment stored? If I want to store the attachment itself (say, a PDF), is it stored independently in RavenDB, and do I just store the key of the attachment in the main document it is associated with? If so, where is the PDF physically stored in RavenDB? Can I see it?
The 1, 2, 3 is just example data. What it is trying to get across is that you create a memory stream of whatever you want and then use that memory stream in the PutAttachment method. The code below is ad hoc and not tested, but should work:
using (var mem = new MemoryStream(file.InputStream))
{
_documentStore.DatabaseCommands.PutAttachment("upload/" + YourUID, null, mem,
new RavenJObject
{
{ "OtherData", "Can Go here" },
{ "MoreData", "Here" }
});
}
Edited for the rest of the questions
How is the attachment stored? I believe it is a JSON document with one property holding the byte array of the attachment.
Is the "document" stored independently? Yes. An attachment is a special document that is not indexed, but it is part of the database so that tasks like replication work.
"Should I" store the key of the attachment in the main document that it is associated with? Yes, you would reference the key, and any time you want the attachment you would just ask Raven for it by that id.
Is the PDF stored physically in RavenDB? Yes.
Can you see it? No. It doesn't even show up in the studio (at least as far as I know).
Edit Corrected and Updated Sample
[AcceptVerbs(HttpVerbs.Post)]
public ActionResult Upload(HttpPostedFileBase file)
{
byte[] bytes = ReadToEnd(file.InputStream);
var id = "upload/" + DateTime.Now.Second.ToString(CultureInfo.InvariantCulture);
using (var mem = new MemoryStream(bytes))
{
DocumentStore.DatabaseCommands.PutAttachment(id, null, mem,
new RavenJObject
{
{"OtherData", "Can Go here"},
{"MoreData", "Here"},
{"ContentType", file.ContentType}
});
}
return Content(id);
}
public FileContentResult GetFile(string id)
{
var attachment = DocumentStore.DatabaseCommands.GetAttachment("upload/" + id);
return new FileContentResult(ReadFully(attachment.Data()), attachment.Metadata["ContentType"].ToString());
}
public static byte[] ReadToEnd(Stream stream)
{
long originalPosition = 0;
if (stream.CanSeek)
{
originalPosition = stream.Position;
stream.Position = 0;
}
try
{
var readBuffer = new byte[4096];
int totalBytesRead = 0;
int bytesRead;
while ((bytesRead = stream.Read(readBuffer, totalBytesRead, readBuffer.Length - totalBytesRead)) > 0)
{
totalBytesRead += bytesRead;
if (totalBytesRead == readBuffer.Length)
{
int nextByte = stream.ReadByte();
if (nextByte != -1)
{
var temp = new byte[readBuffer.Length*2];
Buffer.BlockCopy(readBuffer, 0, temp, 0, readBuffer.Length);
Buffer.SetByte(temp, totalBytesRead, (byte) nextByte);
readBuffer = temp;
totalBytesRead++;
}
}
}
byte[] buffer = readBuffer;
if (readBuffer.Length != totalBytesRead)
{
buffer = new byte[totalBytesRead];
Buffer.BlockCopy(readBuffer, 0, buffer, 0, totalBytesRead);
}
return buffer;
}
finally
{
if (stream.CanSeek)
{
stream.Position = originalPosition;
}
}
}
public static byte[] ReadFully(Stream input)
{
byte[] buffer = new byte[16 * 1024];
using (MemoryStream ms = new MemoryStream())
{
int read;
while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
{
ms.Write(buffer, 0, read);
}
return ms.ToArray();
}
}
How is attachment stored?
It is stored as binary data inside RavenDB. It is NOT stored as json.
Is the "document" stored independently?
There isn't a document here; you have some metadata that is associated with the attachment, but it isn't a separate document.
"Should I" store the key of the attachment in the main document that it is associated with?
Yes, there is no way to query for that.
Is the pdf stored physically in ravendb?
Yes
Can you see it?
Only if you go to the attachment directly, such as http://localhost:8080/static/ATTACHMENT_KEY
It won't show in the UI
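If you just want to pull an attachment down to look at it, a minimal sketch over that /static/ endpoint (the port and the upload/1 key are placeholders for whatever you used in PutAttachment):

using (var client = new System.Net.WebClient())
{
    byte[] data = client.DownloadData("http://localhost:8080/static/upload/1");
    File.WriteAllBytes(@"C:\temp\attachment.pdf", data);   // hypothetical output path
}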

If I check the stream for a valid image, I can't write the bytes to the server

I am trying to check if a file is an image before I upload it to the image server.
I am doing it with the following function, which works exceptionally well:
static bool IsValidImage(Stream imageStream)
{
bool isValid = false;
try
{
// Read the image without validating image data
using (Image img = Image.FromStream(imageStream, false, false))
{
isValid = true;
}
}
catch
{
;
}
return isValid;
}
The problem is that when the code below is called immediately afterwards, the line:
while ((bytesRead = request.FileByteStream.Read(buffer, 0, bufferSize)) > 0)
evaluates to zero and no bytes are read. I notice that when I remove the
IsValidImage call, bytes are read and the file is written. It seems
that the bytes can only be read once? Any idea how to fix this?
using (FileStream outfile = new FileStream(filePath, FileMode.Create))
{
const int bufferSize = 65536; // 64K
int bytesRead = 0;
Byte[] buffer = new Byte[bufferSize];
while ((bytesRead = request.FileByteStream.Read(buffer, 0, bufferSize)) > 0)
{
outfile.Write(buffer, 0, bytesRead);
}
outfile.Close(); //necessary?
}
UPDATE: Thanks for your help Marc. I am new to stream manipulation and could use a little
more help here. I took a shot but may be mixing up the use of FileStream and MemoryStream.
Would you mind taking a look? Thanks again.
using (FileStream outfile = new FileStream(filePath, FileMode.Create))
using (MemoryStream ms = new MemoryStream())
{
byte[] buffer = new byte[1024];
int bytesRead;
while ((bytesRead = request.FileByteStream.Read(buffer, 0, buffer.Length)) > 0)
{
ms.Write(buffer, 0, bytesRead);
}
// ms now has a seekable/rewindable copy of the data
// TODO: read ms the first time
// I replaced request.FileByteStream with ms but am unsure about
// the using statement in the IsValidImage function.
if (!IsValidImage(ms))
{
ms.Close();
request.FileByteStream.Close();
return;
}
ms.Position = 0;
// TODO: read ms the second time
byte[] m_buffer = new byte[ms.Length];
while ((bytesRead = ms.Read(m_buffer, 0, (int)ms.Length)) > 0)
{
outfile.Write(m_buffer, 0, bytesRead);
}
}
static bool IsValidImage(MemoryStream imageStream)
{
bool isValid = false;
try
{
// Read the image without validating image data
using (Image img = Image.FromStream(imageStream, false, false))
{
isValid = true;
}
}
catch
{
;
}
return isValid;
}
As you read from any stream, the position increases. If you read a stream to the end (as is typical), and then try to read again, then it will return EOF.
For some streams, you can seek - set the Position to 0, for example. However, you should try to avoid relying on this, as it is not available for many streams (especially when network IO is involved). You can query this ability via CanSeek, but it would be simpler to avoid it - partly because if you branch based on this, you suddenly have twice as much code to maintain.
If you need the data twice, then the options depend on the size of the data. For small streams, buffer it in-memory, as either a byte[] or a MemoryStream. For larger streams (or if you don't know the size), writing to a scratch file (and deleting it afterwards) is a reasonable approach. You can open and read the file as many times (in series, not in parallel) as you like.
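A hedged sketch of the scratch-file route (request.FileByteStream is the upload stream from the question; Stream.CopyTo needs .NET 4, use a read/write loop on older frameworks):

string tempPath = Path.GetTempFileName();
try
{
    using (var temp = File.Create(tempPath))
    {
        request.FileByteStream.CopyTo(temp);
    }
    using (var first = File.OpenRead(tempPath))
    {
        // first pass: e.g. validate the image
    }
    using (var second = File.OpenRead(tempPath))
    {
        // second pass: e.g. copy to the final destination
    }
}
finally
{
    File.Delete(tempPath);
}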
If you are happy the stream isn't too large (although maybe add a cap to prevent people uploading swap-files, etc):
using (MemoryStream ms = new MemoryStream()) {
byte[] buffer = new byte[1024];
int bytesRead;
while ((bytesRead = inputStream.Read(buffer, 0, buffer.Length)) > 0) {
ms.Write(buffer, 0, bytesRead);
}
// ms now has a seekable/rewindable copy of the data
// TODO: read ms the first time
ms.Position = 0;
// TODO: read ms the second time
}
Indeed, Stream instances remember where the current "cursor" is. Some streams support "rewinding"; the CanSeek property will then return true. In the case of an HTTP request this won't work (CanSeek = false).
Isn't a MIME type sent from the browser as well?
If you really want to keep your way of checking, you'll have to go with Marc's suggestion.
In your update, you have a problem reading the stream a second time.
byte[] m_buffer = new byte[ms.Length];
while ((bytesRead = ms.Read(m_buffer, 0, (int)ms.Length)) > 0)
{
outfile.Write(m_buffer, 0, bytesRead);
}
The solution is simple:
byte[] m_buffer = ms.ToArray();
outfile.Write(m_buffer, 0, m_buffer.Length);
See also MemoryStream.ToArray
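Equivalently, MemoryStream.WriteTo copies the whole buffer to another stream in one call, regardless of the current Position:

ms.WriteTo(outfile);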
public static bool IsImagen(System.IO.Stream stream, String fileName)
{
try
{
using (Image img = Image.FromStream(stream, false, false))
{
if (fileName.ToLower().IndexOf(".jpg") > 0)
return true;
if (fileName.ToLower().IndexOf(".gif") > 0)
return true;
if (fileName.ToLower().IndexOf(".png") > 0)
return true;
}
}
catch (ArgumentException){}
return false;
}
