GZipStream - write not writing all compressed data even with flush? - c#

I've got a pesky problem with GZipStream targeting .NET 3.5. This is my first time working with GZipStream; I have modeled my code after a number of tutorials, including here, but I'm still stuck.
My app serializes a DataTable to XML and inserts it into a database, storing the compressed data in a varbinary(max) field along with the original length of the uncompressed buffer. Then, when I need it, I retrieve this data, decompress it, and recreate the DataTable. The decompression is what seems to fail.
EDIT: Sadly, after changing GetBuffer to ToArray as suggested, my issue remains. Code updated below.
Compress code:
DataTable dt = new DataTable("MyUnit");
//do stuff with dt
//okay... now compress the table
using (MemoryStream xmlstream = new MemoryStream())
{
    //instead of stream, use xmlwriter?
    System.Xml.XmlWriterSettings settings = new System.Xml.XmlWriterSettings();
    settings.Encoding = Encoding.GetEncoding(1252);
    settings.Indent = false;
    System.Xml.XmlWriter writer = System.Xml.XmlWriter.Create(xmlstream, settings);
    try
    {
        dt.WriteXml(writer);
        writer.Flush();
    }
    catch (ArgumentException)
    {
        //likely an encoding issue... okay, base64 encode it
        var base64 = Convert.ToBase64String(xmlstream.ToArray());
        xmlstream.Write(Encoding.GetEncoding(1252).GetBytes(base64), 0, Encoding.GetEncoding(1252).GetBytes(base64).Length);
    }
    using (MemoryStream zipstream = new MemoryStream())
    {
        GZipStream zip = new GZipStream(zipstream, CompressionMode.Compress);
        log.DebugFormat("Compressing commands...");
        zip.Write(xmlstream.GetBuffer(), 0, xmlstream.ToArray().Length);
        zip.Flush();
        float ratio = (float)zipstream.ToArray().Length / (float)xmlstream.ToArray().Length;
        log.InfoFormat("Resulting compressed size is {0:P2} of original", ratio);
        using (SqlCommand cmd = new SqlCommand())
        {
            cmd.CommandText = "INSERT INTO tinydup (lastid, command, compressedlength) VALUES (@lastid, @compressed, @length)";
            cmd.Connection = db;
            cmd.Parameters.Add("@lastid", SqlDbType.Int).Value = lastid;
            cmd.Parameters.Add("@compressed", SqlDbType.VarBinary).Value = zipstream.ToArray();
            cmd.Parameters.Add("@length", SqlDbType.Int).Value = xmlstream.ToArray().Length;
            cmd.ExecuteNonQuery();
        }
    }
}
Decompress Code:
/* This is an encapsulation of what I get from the database
public class DupUnit
{
    public uint lastid;
    public uint complength;
    public byte[] compressed;
}*/
//I have already retrieved my list of work to do from the database in a List<DupUnit> dupunits
foreach (DupUnit unit in dupunits)
{
    DataSet ds = new DataSet();
    //DataTable dt = new DataTable();
    //uncompress and extract to original datatable
    try
    {
        using (MemoryStream zipstream = new MemoryStream(unit.compressed))
        {
            GZipStream zip = new GZipStream(zipstream, CompressionMode.Decompress);
            byte[] xmlbits = new byte[unit.complength];
            //WHY ARE YOU ALWAYS 0!!!!!!!!
            int bytesdecompressed = zip.Read(xmlbits, 0, unit.compressed.Length);
            MemoryStream xmlstream = new MemoryStream(xmlbits);
            log.DebugFormat("Uncompressed XML against {0} is: {1}", m_source.DSN, Encoding.GetEncoding(1252).GetString(xmlstream.ToArray()));
            try
            {
                ds.ReadXml(xmlstream);
            }
            catch (Exception)
            {
                //it may have been base64 encoded... decode first.
                ds.ReadXml(Encoding.GetEncoding(1254).GetString(
                    Convert.FromBase64String(
                        Encoding.GetEncoding(1254).GetString(xmlstream.ToArray())))
                );
            }
            xmlstream.Dispose();
        }
    }
    catch (Exception e)
    {
        log.Error(e);
        Thread.Sleep(1000);//sleep a sec!
        continue;
    }
}
Note the comment above... bytesdecompressed is always 0. Any ideas? Am I doing it wrong?
EDIT 2:
So this is weird. I added the following debug code to the decompression routine:
GZipStream zip = new GZipStream(zipstream, CompressionMode.Decompress);
byte[] xmlbits = new byte[unit.complength];
int offset = 0;
while (zip.CanRead && offset < xmlbits.Length)
{
    while (zip.Read(xmlbits, offset, 1) == 0) ;
    offset++;
}
When debugging, sometimes that loop would complete, but other times it would hang. When I'd stop the debugging, it would be at byte 1600 out of 1616. I'd continue, but it wouldn't move at all.
EDIT 3: The bug appears to be in the compress code. For whatever reason, it is not saving all of the data. When I try to decompress the data using a third party gzip mechanism, I only get part of the original data.
I'd start a bounty, but I really don't have much reputation to give as of now :-(

Finally found the answer. The compressed data wasn't complete because GZipStream.Flush() does absolutely nothing to ensure that all of the data is flushed out of its internal buffer - you need to call GZipStream.Close(), as pointed out here. Of course, once you have a bad compress, it all goes downhill: if you try to decompress the truncated data, Read() will always return 0.
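For reference, here is a minimal sketch of the pattern that works (a sketch with assumed names, not the original code): the GZipStream is wrapped in a using block so that it is closed - and its remaining buffered data plus the gzip footer are written - before the compressed bytes are read out of the MemoryStream.
static byte[] Compress(byte[] raw)
{
    using (MemoryStream zipstream = new MemoryStream())
    {
        // Disposing the GZipStream flushes its internal buffer and writes the gzip footer.
        using (GZipStream zip = new GZipStream(zipstream, CompressionMode.Compress))
        {
            zip.Write(raw, 0, raw.Length);
        }
        // Only read the compressed bytes after the GZipStream has been closed.
        // (MemoryStream.ToArray still works after the stream has been closed.)
        return zipstream.ToArray();
    }
}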

I'd say this line, at least, is the most wrong:
cmd.Parameters.Add("@compressed", SqlDbType.VarBinary).Value = zipstream.GetBuffer();
MemoryStream.GetBuffer:
Note that the buffer contains allocated bytes which might be unused. For example, if the string "test" is written into the MemoryStream object, the length of the buffer returned from GetBuffer is 256, not 4, with 252 bytes unused. To obtain only the data in the buffer, use the ToArray method.
It should also be noted that the zip format works by first locating data stored at the end of the file - so if you've stored more data than was required, the entries it expects at the "end" of the file aren't where it looks for them.
As an aside, I'd also recommend a different name for your compressedlength column - I'd initially taken it (despite your narrative) as being intended to store, well, the length of the compressed data (and written part of my answer to address that). Maybe originalLength would be a better name?
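To make the GetBuffer/ToArray difference concrete, here is a small illustrative snippet (not part of the original answer):
using (MemoryStream ms = new MemoryStream())
{
    byte[] data = Encoding.ASCII.GetBytes("test");
    ms.Write(data, 0, data.Length);
    // GetBuffer returns the whole internal buffer, including unused capacity.
    Console.WriteLine(ms.GetBuffer().Length); // e.g. 256
    // ToArray copies out only the bytes that were actually written.
    Console.WriteLine(ms.ToArray().Length);   // 4
}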

Related

Wrong file extension added in image byte array

I have a method that takes a web-based image, converts it into a byte array and then passes that to a CDN as an image.
I have used this successfully hundreds of times; however, while migrating a particular set of images, I've noticed that these JPEG files are being identified as .png files, and when they arrive at the CDN they are blank.
Using some code that I copied from the web, I am able to identify the file extension from the image byte array after it is built.
So, it's the conversion from the original image to byte array that is mysteriously updating the file type.
This is my method:-
public byte[] Get(string fullPath)
{
    byte[] imageBytes = { };
    var imageRequest = (HttpWebRequest)WebRequest.Create(fullPath);
    var imageResponse = imageRequest.GetResponse();
    var responseStream = imageResponse.GetResponseStream();
    if (responseStream != null)
    {
        using (var br = new BinaryReader(responseStream))
        {
            imageBytes = br.ReadBytes(500000);
            br.Close();
        }
        responseStream.Close();
    }
    imageResponse.Close();
    return imageBytes;
}
I have also tried converting this to use MemoryStream instead.
I'm not sure what else I can do to ensure that this identifies the correct file type.
Edit
I have now updated the number of allowed bytes, which has resulted in viable images.
However, the issue with the JPEG files being reported as PNG is still ongoing.
It's only this selection of images that is affected.
These images were saved in an old CMS system, so I do wonder if the way they were saved is the cause.
Up to now, the code only reads 500,000 bytes for each file. If the file is larger than that, the end is truncated and the content is not valid anymore. In order to read all bytes, you can use the following code:
public byte[] Get(string fullPath)
{
    List<byte> imageBytes = new List<byte>(500000);
    var imageRequest = (HttpWebRequest)WebRequest.Create(fullPath);
    using (var imageResponse = imageRequest.GetResponse())
    {
        using (var responseStream = imageResponse.GetResponseStream())
        {
            using (var br = new BinaryReader(responseStream))
            {
                var buffer = new byte[500000];
                int bytesRead;
                while ((bytesRead = br.Read(buffer, 0, buffer.Length)) > 0)
                {
                    // Only append the bytes actually read, not the whole buffer.
                    for (int i = 0; i < bytesRead; i++)
                        imageBytes.Add(buffer[i]);
                }
            }
        }
    }
    return imageBytes.ToArray();
}
The above sample reads the data in chunks of 500,000 bytes - for most of your files, a single chunk should be sufficient. If a file is larger, the code keeps reading chunks until there are no more bytes to read, assembling all of the chunks in a list.
This ensures that all the bytes are read, even if the content is larger than 500,000 bytes.
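Separately, since the question mentions identifying the file type from the byte array: one common way to sanity-check what the downloaded bytes actually contain is to look at the file's magic number. This helper is purely illustrative (an assumption, not part of the original answer):
static string SniffImageType(byte[] bytes)
{
    // JPEG files start with FF D8 FF.
    if (bytes.Length >= 3 && bytes[0] == 0xFF && bytes[1] == 0xD8 && bytes[2] == 0xFF)
        return "jpeg";
    // PNG files start with 89 50 4E 47 ("\x89PNG").
    if (bytes.Length >= 4 && bytes[0] == 0x89 && bytes[1] == 0x50 && bytes[2] == 0x4E && bytes[3] == 0x47)
        return "png";
    // GIF files start with the ASCII bytes "GIF".
    if (bytes.Length >= 3 && bytes[0] == 0x47 && bytes[1] == 0x49 && bytes[2] == 0x46)
        return "gif";
    return "unknown";
}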

How to read large file from SQL Server?

I tried to read a file (650 megabytes) from SQL Server:
using (var reader = command.ExecuteReader(CommandBehavior.SequentialAccess))
{
    if (reader.Read())
    {
        using (var dbStream = reader.GetStream(0))
        {
            if (!reader.IsDBNull(0))
            {
                stream.Position = 0;
                dbStream.CopyTo(stream, 256);
            }
            dbStream.Close();
        }
    }
    reader.Close();
}
But I got OutOfMemoryException on CopyTo().
With small files, this code snippet works fine. How can I handle large files?
You can read and write the data to a temp file in small chunks. You can see an example on MSDN - Retrieving Binary Data.
// Column index in the result set.
const int colIdx = 0;
// Writes the BLOB to a file (*.bmp).
FileStream stream;
// Streams the BLOB to the FileStream object.
BinaryWriter writer;
// Size of the BLOB buffer.
int bufferSize = 100;
// The BLOB byte[] buffer to be filled by GetBytes.
byte[] outByte = new byte[bufferSize];
// The bytes returned from GetBytes.
long retval;
// The starting position in the BLOB output.
long startIndex = 0;

// Open the connection and read data into the DataReader.
connection.Open();
SqlDataReader reader = command.ExecuteReader(CommandBehavior.SequentialAccess);
while (reader.Read())
{
    // Create a file to hold the output.
    stream = new FileStream(
        "some-physical-file-name-to-dump-data.bmp", FileMode.OpenOrCreate, FileAccess.Write);
    writer = new BinaryWriter(stream);
    // Reset the starting byte for the new BLOB.
    startIndex = 0;
    // Read bytes into outByte[] and retain the number of bytes returned.
    retval = reader.GetBytes(colIdx, startIndex, outByte, 0, bufferSize);
    // Continue while there are bytes beyond the size of the buffer.
    while (retval == bufferSize)
    {
        writer.Write(outByte);
        writer.Flush();
        // Reposition start index to end of last buffer and fill buffer.
        startIndex += bufferSize;
        retval = reader.GetBytes(colIdx, startIndex, outByte, 0, bufferSize);
    }
    // Write the remaining buffer.
    writer.Write(outByte, 0, (int)retval);
    writer.Flush();
    // Close the output file.
    writer.Close();
    stream.Close();
}
// Close the reader and the connection.
reader.Close();
connection.Close();
Make sure you are using SqlDataReader with CommandBehavior.SequentialAccess; note this line in the code snippet above.
SqlDataReader reader = command.ExecuteReader(CommandBehavior.SequentialAccess);
More information on CommandBehavior enum can be found here.
EDIT:
Let me clarify. I agree with @MickyD: the cause of the problem is not whether you are using CommandBehavior.SequentialAccess or not, but reading the large file all at once.
I emphasized this because it is commonly missed by developers: they tend to read files in chunks, but without setting CommandBehavior.SequentialAccess they will run into other problems. Although this is already present in the original question, I highlighted it in my answer to give a pointer to any newcomers.
@MatthewWatson yeah var stream = new MemoryStream(); What is not right with it? – Kliver Max
Your problem is not whether or not you are using:
`command.ExecuteReader(CommandBehavior.SequentialAccess)`
...which you are, as we can see; nor that your stream copy buffer size is too big (it's actually tiny), but rather that you are using a MemoryStream, as you indicated in the comments above. More than likely you are loading the 650 MB file twice: once from SQL, and again to store it in the MemoryStream, thus leading to your OutOfMemoryException.
The solution is to write to a FileStream instead, but the cause of the problem wasn't highlighted in the accepted answer. Unless you know the cause of a problem, you won't learn to avoid such issues in the future.
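A minimal sketch of that idea, assuming the same command as in the question and a hypothetical output path, copying straight into a FileStream so the 650 MB BLOB never has to fit in memory:
using (var reader = command.ExecuteReader(CommandBehavior.SequentialAccess))
{
    if (reader.Read() && !reader.IsDBNull(0))
    {
        using (var dbStream = reader.GetStream(0))
        using (var fileStream = new FileStream("blob.bin", FileMode.Create, FileAccess.Write))
        {
            // Only the small copy buffer is held in memory at any time.
            dbStream.CopyTo(fileStream, 8192);
        }
    }
}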

Convert List of bytes to memorystream without using ToArray()

How can I save my List<byte> to a MemoryStream without using ToArray() or creating a new array?
This is my current method:
public Packet(List<byte> data)
{
    // Create new stream from data buffer
    using (Stream stream = new MemoryStream(data.ToArray()))
    {
        using (BinaryReader reader = new BinaryReader(stream))
        {
            Length = reader.ReadInt16();
            pID = reader.ReadByte();
            Result = reader.ReadByte();
            Message = reader.ReadString();
            ID = reader.ReadInt32();
        }
    }
}
The ToArray solution is the most efficient solution possible using documented APIs. MemoryStream will not copy the array. It will just store it. So the only copy is in List<T>.ToArray().
If you want to avoid that copy you need to pry List<T> open using reflection and access the backing array. I advise against that.
Instead, use a collection that allows you to obtain the backing array using legal means. Write your own, or use a MemoryStream in the first place.
A List<T> is not the most efficient way to move around bytes anyway. Storing them is fine, moving them usually has more overhead. For example, adding items bytewise will be far slower than a memcpy.
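As an illustration of the "use a MemoryStream in the first place" suggestion, a hypothetical sketch of the producing side (the variable names are assumptions; nothing like this appears in the question):
MemoryStream buffer = new MemoryStream();
using (BinaryWriter writer = new BinaryWriter(buffer))
{
    // Build the packet directly in the stream instead of a List<byte>.
    writer.Write((short)length); // Length
    writer.Write(pID);           // pID (byte)
    writer.Write(result);        // Result (byte)
    writer.Write(message);       // Message (length-prefixed string)
    writer.Write(id);            // ID (int)
}
byte[] packetBytes = buffer.ToArray();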
What about something like:
public Packet(List<byte> data)
{
    using (Stream stream = new MemoryStream())
    {
        // Loop list and write out bytes
        foreach (byte b in data)
            stream.WriteByte(b);
        // Reset stream position ready for read
        stream.Seek(0, SeekOrigin.Begin);
        using (BinaryReader reader = new BinaryReader(stream))
        {
            Length = reader.ReadInt16();
            pID = reader.ReadByte();
            Result = reader.ReadByte();
            Message = reader.ReadString();
            ID = reader.ReadInt32();
        }
    }
}
But why do you have a list in the first place? Can't you pass it into the method as a byte[] to start with? It'd be interesting to see how you populate that list.
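For illustration, a sketch of what that could look like if the caller can supply a byte[] up front (field names copied from the question's constructor):
public Packet(byte[] data)
{
    // MemoryStream wraps the supplied array directly - no copy is made.
    using (var stream = new MemoryStream(data))
    using (var reader = new BinaryReader(stream))
    {
        Length = reader.ReadInt16();
        pID = reader.ReadByte();
        Result = reader.ReadByte();
        Message = reader.ReadString();
        ID = reader.ReadInt32();
    }
}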

C++ zlib inflate failing - translation of c# fixup?

I'm trying to inflate a string that was compressed with zlib's deflate, but it's failing, apparently because it doesn't have the right header. I read elsewhere that the C# solution to this problem is:
public static byte[] FlateDecode(byte[] inp, bool strict)
{
    MemoryStream stream = new MemoryStream(inp);
    InflaterInputStream zip = new InflaterInputStream(stream);
    MemoryStream outp = new MemoryStream();
    byte[] b = new byte[strict ? 4092 : 1];
    try
    {
        int n;
        while ((n = zip.Read(b, 0, b.Length)) > 0)
        {
            outp.Write(b, 0, n);
        }
        zip.Close();
        outp.Close();
        return outp.ToArray();
    }
    catch
    {
        if (strict)
            return null;
        return outp.ToArray();
    }
}
But I know nothing about C#. I can surmise that all it's doing is adding a prefix to the string, but what that prefix is, I have no idea. Would someone be able to phrase this function (or even just the header creation and string concatenation) in C++?
The data which I'm trying to inflate is taken from a PDF using zlib deflation.
Thanks a million,
Wyatt
I've had better luck using SharpZipLib for zlib interop than with the native .Net Framework classes. This correctly handles streams from C++ (zlib native) and from Java's compression classes without any funny business being needed.
I can't see any prefixes, sorry. Here's what the logic appears to be; sorry this isn't in C++:
MemoryStream stream = new MemoryStream(inp);
InflaterInputStream zip = new InflaterInputStream(stream);
Create an inflate stream from the data passed
MemoryStream outp = new MemoryStream();
Create a memory buffer stream for output
byte[] b = new byte[strict ? 4092 : 1];
try {
int n;
while ((n = zip.Read(b, 0, b.Length)) > 0) {
If you're in strict mode, read up to 4092 bytes - or 1 in non-strict mode - into a byte buffer
outp.Write(b, 0, n);
Write all the bytes decoded (may be less than the 4092) to the output memory buffer stream
zip.Close();
outp.Close();
return outp.ToArray();
Clean up, and return the output memory buffer stream as an array.
I'm a bit confused, though: why not just cut array b off at n elements and return that rather than go via a MemoryStream? The code also ought really to take care to clean up the memory streams and zip on exception (e.g. using using) since they're all IDisposable but I guess that's not really important since they don't correspond to I/O file handles, only memory structures.

C# Error reading two dates from a binary file

When reading two dates from a binary file I'm seeing the error below:
"The output char buffer is too small to contain the decoded characters, encoding 'Unicode (UTF-8)' fallback 'System.Text.DecoderReplacementFallback'. Parameter name: chars"
My code is below:
static DateTime[] ReadDates()
{
    System.IO.FileStream appData = new System.IO.FileStream(
        appDataFile, System.IO.FileMode.Open, System.IO.FileAccess.Read);
    List<DateTime> result = new List<DateTime>();
    using (System.IO.BinaryReader br = new System.IO.BinaryReader(appData))
    {
        while (br.PeekChar() > 0)
        {
            result.Add(new DateTime(br.ReadInt64()));
        }
        br.Close();
    }
    return result.ToArray();
}

static void WriteDates(IEnumerable<DateTime> dates)
{
    System.IO.FileStream appData = new System.IO.FileStream(
        appDataFile, System.IO.FileMode.Create, System.IO.FileAccess.Write);
    List<DateTime> result = new List<DateTime>();
    using (System.IO.BinaryWriter bw = new System.IO.BinaryWriter(appData))
    {
        foreach (DateTime date in dates)
            bw.Write(date.Ticks);
        bw.Close();
    }
}
What could be the cause? Thanks
The problem is that you're using PeekChar - that's trying to decode binary data as if it were a UTF-8 character. Unfortunately, I can't see anything else in BinaryReader which allows you to detect the end of the stream.
You could just keep calling ReadInt64 until it throws an EndOfStreamException, but that's pretty horrible. Hmm. You could call ReadBytes(8) and then BitConverter.ToInt64 - that would allow you to stop when ReadBytes returns a byte array with anything less than 8 bytes... it's not great though.
By the way, you don't need to call Close explicitly as you're already using a using statement. (That goes for both the reader and the writer.)
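A minimal sketch of that ReadBytes/BitConverter idea (my own wiring, not code from the answer; it assumes the same appDataFile field and the usual System.IO and System.Collections.Generic using directives):
static DateTime[] ReadDates()
{
    List<DateTime> result = new List<DateTime>();
    using (FileStream appData = new FileStream(appDataFile, FileMode.Open, FileAccess.Read))
    using (BinaryReader br = new BinaryReader(appData))
    {
        while (true)
        {
            // ReadBytes returns fewer than 8 bytes at the end of the stream,
            // so we can detect the end without decoding any characters.
            byte[] chunk = br.ReadBytes(8);
            if (chunk.Length < 8)
                break;
            result.Add(new DateTime(BitConverter.ToInt64(chunk, 0)));
        }
    }
    return result.ToArray();
}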
I think Jon is correct that it's PeekChar that chokes on the binary data.
Instead of streaming the data, you could get it all as an array and get the values from that:
static DateTime[] ReadDates()
{
    List<DateTime> result = new List<DateTime>();
    byte[] data = File.ReadAllBytes(appDataFile);
    for (int i = 0; i < data.Length; i += 8)
    {
        result.Add(new DateTime(BitConverter.ToInt64(data, i)));
    }
    return result.ToArray();
}
A simple solution to your problem would be to explicitly specify ASCII encoding for the BinaryReader, that way PeekChar() uses only a single byte and this kind of exception (a .NET bug, actually) doesn't happen:
using (System.IO.BinaryReader br = new System.IO.BinaryReader(appData, Encoding.ASCII))
