Is it possible to compress a List<T> using SharpZipLib?
Serializing the List to a byte array gives me around 60,000 bytes (uncompressed).
Compressing this with System.IO.Compression.DeflateStream gives me roughly a 1/3 compression ratio, but that is far from enough.
The purpose is to store the collections in the (MS SQL) database as a byte[], because saving them as individual rows uses too much space (1 million rows/day).
Thanks
Edit:
List<ItemLog> itemLogs = new List<ItemLog>();
//populate with 1000 ItemLogs
byte[] array = null; //original byte array
BinaryFormatter bf = new BinaryFormatter();
MemoryStream ms = new MemoryStream();
bf.Serialize(ms, itemLogs);
array = ms.ToArray();
The array size is now around 60,000 bytes.
Then I zip the collection using a ZipOutputStream:
MemoryStream outputMemoryStream = new MemoryStream();
ZipOutputStream zipStream = new ZipOutputStream(outputMemoryStream);
zipStream.SetLevel(3);
ZipEntry entry = new ZipEntry("logs");
entry.DateTime = DateTime.Now;
zipStream.PutNextEntry(entry);
StreamUtils.Copy(ms, zipStream, new byte[4096]);
zipStream.CloseEntry();
zipStream.IsStreamOwner = false;
zipStream.Close();
outputMemoryStream.Position = 0;
byte[] compressed = outputMemoryStream.ToArray();
The compressed array is now 164 bytes in size. <- that length doesn't seem valid/possible?
Uncompressing gives me an empty array. But as the compression is not right, I will skip the decompression code for now.
I do not see any real problem in your code. The only place the problem could be is in the copying of the data: is the input stream positioned at the start of the data that should be stored? Try adding the following line:
ms.Seek(0, SeekOrigin.Begin); // added line
StreamUtils.Copy(ms, zipStream, new byte[4096]);
Based on your code I wrote a simple compress function and it works as expected.
private static byte[] Compress(byte[] source)
{
byte[] compressed;
using (var memory = new MemoryStream())
using (var zipped = new ZipOutputStream(memory))
{
zipped.IsStreamOwner = false; // keep the MemoryStream open after the zip stream closes
zipped.SetLevel(9); // maximum compression
var entry = new ZipEntry("data")
{
DateTime = DateTime.Now
};
zipped.PutNextEntry(entry);
#if true
zipped.Write(source, 0, source.Length); // write the array directly
#else
using (var src = new MemoryStream(source)) // alternative: copy from a stream
{
StreamUtils.Copy(src, zipped, new byte[4096]);
}
#endif
zipped.Close(); // finishes the entry and writes the archive's central directory
compressed = memory.ToArray();
}
#if false
// Optionally dump the archive to disk to inspect it with an external tool.
using (var file = new FileStream("test.zip", FileMode.Create, FileAccess.Write, FileShare.Read))
{
file.Write(compressed, 0, compressed.Length);
}
#endif
return compressed;
}
There are two alternatives there for writing the input (array or stream), and there is disabled code that saves the compressed data to a file (so you can check the real content of the compressed data in an external application).
My test data was 256 bytes long (data with a low compression ratio), and the resulting file was 407 bytes.
Try using the array overload, or check the content of the stream that is being saved.
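Since the question also covers reading the data back, here is a matching decompress sketch (my addition, not part of the original answer; it assumes the archive holds exactly one entry, as written by Compress above):
private static byte[] Decompress(byte[] compressed)
{
using (var input = new MemoryStream(compressed))
using (var zipped = new ZipInputStream(input))
using (var output = new MemoryStream())
{
// Position the stream on the first (and only) entry before reading its data.
if (zipped.GetNextEntry() == null)
throw new InvalidDataException("No entry found in the archive.");
zipped.CopyTo(output);
return output.ToArray();
}
}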
I have 27,000 images stored in a folder that need to be added to the database using Entity Framework. I have this code:
var files = Directory.GetFiles(path, "*", SearchOption.AllDirectories);
foreach(var file in files)
{
using (ApplicationContext db = new ApplicationContext())
{
Image img = Image.FromFile(file);
var imgRes = ResizeImage(img, ImageSettings.Width, ImageSettings.Height);
MemoryStream memoryStream = new MemoryStream();
img.Save(memoryStream, ImageFormat.Png);
var label = Directory.GetParent(file).Name;
var bytes = memoryStream.ToArray();
memoryStream.Close();
db.Add(new ImageData { Image = bytes, Label = label });
img.Dispose();
memoryStream.Dispose();
imgRes.Dispose();
}
}
It only works when there are fewer than 10,000 images; otherwise I get an OutOfMemoryException.
How can I upload my 27,000 images to the database?
First of all, this code doesn't deal with entities or objects, so using an ORM doesn't help at all. That doesn't cause the OOM, though; it only makes the code a lot slower.
The real problem is that MemoryStream is actually a wrapper around a buffer. Once the buffer is full, a new one is allocated with double the size, the original data is copied over, and the old buffer is discarded. Growing a byte buffer to 50 MB this way takes on the order of log2(50M) ≈ 26 reallocations. This fragments the free memory to the point where the runtime can no longer allocate a large enough contiguous buffer. This causes OOMs with List<T> objects too, not just MemoryStreams.
The quick fix would be to pass the expected size as the stream's capacity through the MemoryStream(Int32) constructor. This cuts down on reallocations and saves a lot of CPU cycles. The number doesn't have to be exact, just large enough to avoid too much garbage:
using(Image img = Image.FromFile(file))
using(var imgRes = ResizeImage(img, ImageSettings.Width, ImageSettings.Height))
using(var memoryStream = new MemoryStream(10_000_000))
{
img.Save(memoryStream, ImageFormat.Png);
var label = Directory.GetParent(file).Name;
var bytes = memoryStream.ToArray();
db.Add(new ImageData { Image = bytes, Label = label });
}
There's no need to close MemoryStream, it's just a wrapper over an array. That still allocates a big buffer for each file though.
If we know the maximum file size, we can allocate a single buffer and reuse it in all iterations. In this case the size matters, because it's no longer possible to resize the buffer:
var buffer = new byte[100_000_000]; // allocated once, outside the loop
using (Image img = Image.FromFile(file))
using (var imgRes = ResizeImage(img, ImageSettings.Width, ImageSettings.Height))
using (var memoryStream = new MemoryStream(buffer))
{
memoryStream.SetLength(0); // reset, so ToArray() returns only the bytes written, not the full buffer
img.Save(memoryStream, ImageFormat.Png);
var label = Directory.GetParent(file).Name;
var bytes = memoryStream.ToArray();
db.Add(new ImageData { Image = bytes, Label = label });
}
Try to avoid creating the ApplicationContext inside the foreach loop:
using (ApplicationContext db = new ApplicationContext())
{
foreach (var file in files)
{
using (Image img = Image.FromFile(file))
using (var imgRes = ResizeImage(img, ImageSettings.Width, ImageSettings.Height))
using (var memoryStream = new MemoryStream())
{
img.Save(memoryStream, ImageFormat.Png);
var label = Directory.GetParent(file).Name;
var bytes = memoryStream.ToArray();
db.Add(new ImageData { Image = bytes, Label = label });
}
}
}
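If this is EF Core (db.Add directly on the context suggests it is), one more hedged sketch: keeping a single context alive for 27,000 inserts makes the change tracker grow without bound, so it helps to save and clear in batches. The batch size of 500 is an arbitrary assumption, and ChangeTracker.Clear() needs EF Core 5 or later:
using (var db = new ApplicationContext())
{
int pending = 0;
foreach (var file in files)
{
using (var img = Image.FromFile(file))
using (var imgRes = ResizeImage(img, ImageSettings.Width, ImageSettings.Height))
using (var memoryStream = new MemoryStream(10_000_000))
{
img.Save(memoryStream, ImageFormat.Png);
db.Add(new ImageData { Image = memoryStream.ToArray(), Label = Directory.GetParent(file).Name });
}
if (++pending % 500 == 0) // batch size of 500 is an assumption, tune as needed
{
db.SaveChanges();
db.ChangeTracker.Clear(); // EF Core 5+; drops tracked entities so memory stays bounded
}
}
db.SaveChanges(); // persist the final partial batch
}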
I am trying to compress images (usually around 5-30) for quality/size with the Magick.NET library, and I can't really figure out how to use the ImageOptimizer class and call its LosslessCompress() method with a stream.
Do I need to use a FileStream or a MemoryStream?
Do I need to save/create a temp file on the server for each image and then proceed with the compression flow? (Performance?)
Anything else?
Simple code example:
private byte[] ConvertImageToByteArray(IFormFile image)
{
byte[] result = null;
// filestream
using (var fileStream = image.OpenReadStream())
// memory stream
using (var memoryStream = new MemoryStream())
{
var before = fileStream.Length;
ImageOptimizer optimizer = new ImageOptimizer();
optimizer.LosslessCompress(fileStream); // what stream can I pass here, and how?
var after = fileStream.Length;
// convert to byte[]
fileStream.CopyTo(memoryStream);
result = memoryStream.ToArray();
}
return result;
}
You cannot use the fileStream, because the stream needs to be both readable and writable. If you first copy the data to a MemoryStream, you can then compress the image in that stream. Your code should be changed to this:
private byte[] ConvertImageToByteArray(IFormFile image)
{
byte[] result = null;
// filestream
using (var fileStream = image.OpenReadStream())
// memory stream
using (var memoryStream = new MemoryStream())
{
fileStream.CopyTo(memoryStream);
memoryStream.Position = 0; // The position needs to be reset.
var before = memoryStream.Length;
ImageOptimizer optimizer = new ImageOptimizer();
optimizer.LosslessCompress(memoryStream);
var after = memoryStream.Length;
// convert to byte[]
result = memoryStream.ToArray();
}
return result;
}
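A small follow-up worth knowing (hedged against recent Magick.NET releases, so check your package version): LosslessCompress returns a bool telling you whether the data was actually made smaller, which is handy for logging the before/after sizes. Inside the method above:
bool shrunk = optimizer.LosslessCompress(memoryStream);
// shrunk == false means the stream contents were left unchanged;
// the optimizer only overwrites the stream when the result is smaller.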
I am testing some code and I am stuck with the following: whatever I write as text, the length of the zipped stream is always 10. What am I doing wrong?
var inStream = new MemoryStream();
var inWriter = new StreamWriter(inStream);
string text = "HelloWorldsasdfghj123455667880fgsjfhdfasdferrbvbyjun hbwecwcxqsz edcrgvebrjnuj5juerqwetsrgfggshurhtnbvzkfjhguhgrgal;kjhao;rhl;zkfhg;aorihghg;oahrgarhguhh';aaeaeiaijeihjrhfidfhfidfidhh953453453";
inWriter.WriteLine(text);
inWriter.Flush();
inStream.Position = 0;
var outStream = new MemoryStream();
var compressStream = new GZipStream(outStream, CompressionMode.Compress);
inStream.CopyTo(compressStream);
compressStream.Flush();
outStream.Flush();
compressStream.Flush();
outStream.Position = 0;
Console.WriteLine(outStream.Position);
Console.WriteLine(outStream.Length);
Until you Close it, the compression stream doesn't know you've finished writing to it, so it cannot complete its compression algorithm. Flushing flushes the parts it can flush, but until it's been told you have finished adding new bytes, it cannot flush its last package of compressed data.
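A minimal sketch of the fix, reusing the identifiers from the question (the leaveOpen argument keeps outStream usable after the compressor is disposed):
var outStream = new MemoryStream();
using (var compressStream = new GZipStream(outStream, CompressionMode.Compress, leaveOpen: true))
{
inStream.CopyTo(compressStream);
} // disposing writes the final compressed block and the gzip footer
outStream.Position = 0;
Console.WriteLine(outStream.Length); // now the full compressed length, no longer 10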
I'm about to lose my freaking mind. I've been trying to get GzipStream to compress a string for the past hour, but for whatever reason, it refuses to write the entire byte array to the memory stream. At first I thought it had something to do with the using statements, but even after removing them it didn't seem to make a difference.
Initial config:
var str = "Here is a relatively simple string to compress";
byte[] compressedBytes;
string returnedData;
var bytes = Encoding.UTF8.GetBytes(str);
Works correctly (writes a 64-byte array):
using (var msi = new MemoryStream(bytes))
using (var mso = new MemoryStream()) {
using (var gs = new GZipStream(mso, CompressionMode.Compress)) {
msi.CopyTo(gs);
}
compressedBytes = mso.ToArray();
}
Fails (writes a 10-byte array):
using(var mso = new MemoryStream())
using(var msi = new MemoryStream(bytes))
using(var zip = new GZipStream(mso, CompressionMode.Compress))
{
msi.CopyTo(zip);
compressedBytes = mso.ToArray();
}
Also fails (writes a 10-byte array):
var mso = new MemoryStream();
var msi = new MemoryStream(bytes);
var zip = new GZipStream(mso, CompressionMode.Compress);
msi.CopyTo(zip);
compressedBytes = mso.ToArray();
Can somebody explain why the first one works, but in the other two I'm getting these incomplete arrays? Is something getting disposed out from under me? For that matter, is there a way for me to avoid using two MemoryStreams?
Thanks,
Zoombini
System.IO.Compression.GZipStream has to be closed (disposed) before you can use the underlying stream, because:
- It works block-oriented
- It has to write the footer, including the checksum (see the file format description on Wikipedia)
You're trying to get the compressed data before GZipStream is closed. This doesn't return the full data, as you've seen. The reason the first one works is that you're calling compressedBytes = mso.ToArray(); after the GZipStream has been disposed. So, untested but in theory, you should be able to modify your second example slightly, like this, to get it to work:
using(var mso = new MemoryStream())
{
using(var msi = new MemoryStream(bytes))
using(var zip = new GZipStream(mso, CompressionMode.Compress))
{
msi.CopyTo(zip);
}
compressedBytes = mso.ToArray();
}
As others have said, you need to close the GZipStream before you can get the full data. A using statement will cause the Dispose method to be called on the stream at the end of the block, which will close the stream if it is not already closed. All of your examples above will work as expected if you place zip.Close(); after msi.CopyTo(zip);.
You can eliminate one of the MemoryStreams if you write it this way:
using (MemoryStream mso = new MemoryStream())
{
using (GZipStream zip = new GZipStream(mso, CompressionMode.Compress))
{
zip.Write(bytes, 0, bytes.Length);
}
compressedBytes = mso.ToArray();
}
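And for completeness, a matching sketch of the reverse direction, which fills the returnedData variable declared in the initial config (UTF-8, matching the encode step):
using (var msi = new MemoryStream(compressedBytes))
using (var mso = new MemoryStream())
{
using (var zip = new GZipStream(msi, CompressionMode.Decompress))
{
zip.CopyTo(mso); // reading can be done while the stream is open; only the compressor needs closing first
}
returnedData = Encoding.UTF8.GetString(mso.ToArray());
}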
I am attempting to create a new FileStream object from a byte array. I'm sure that made no sense at all so I will try to explain in further detail below.
Tasks I am completing:
1) Reading the source file which was previously compressed
2) Decompressing the data using GZipStream
3) Copying the decompressed data into a byte array.
What I would like to change:
1) I would like to be able to use File.ReadAllBytes to read the decompressed data.
2) I would then like to create a new FileStream object using this byte array.
In short, I want to do this entire operation using byte arrays. One of the parameters for GZipStream is a stream of some sort, so I figured I was stuck using a FileStream. But if some method exists where I can create a new instance of a FileStream from a byte array, then I should be fine.
Here is what I have so far:
FolderBrowserDialog fbd = new FolderBrowserDialog(); // Shows a browser dialog
fbd.ShowDialog();
// Path to directory of files to compress and decompress.
string dirpath = fbd.SelectedPath;
DirectoryInfo di = new DirectoryInfo(dirpath);
foreach (FileInfo fi in di.GetFiles())
{
zip.Program.Decompress(fi);
}
// Get the stream of the source file.
using (FileStream inFile = fi.OpenRead())
{
//Create the decompressed file.
string outfile = @"C:\Decompressed.exe";
{
using (GZipStream Decompress = new GZipStream(inFile,
CompressionMode.Decompress))
{
byte[] b = new byte[blen.Length];
Decompress.Read(b,0,b.Length);
File.WriteAllBytes(outfile, b);
}
}
}
Thanks for any help!
Regards,
Evan
It sounds like you need to use a MemoryStream.
Since you don't know how many bytes you'll be reading from the GZipStream, you can't really allocate an array for it. You need to read it all into a byte array and then use a MemoryStream to decompress.
const int BufferSize = 65536;
byte[] compressedBytes = File.ReadAllBytes("compressedFilename");
// create memory stream
using (var mstrm = new MemoryStream(compressedBytes))
{
using (var inStream = new GZipStream(mstrm, CompressionMode.Decompress))
{
using (var outStream = File.Create("outputfilename"))
{
var buffer = new byte[BufferSize];
int bytesRead;
while ((bytesRead = inStream.Read(buffer, 0, BufferSize)) != 0)
{
outStream.Write(buffer, 0, bytesRead);
}
}
}
}
Here is what I ended up doing. I realize that I did not give sufficient information in my question - and I apologize for that - but I do know the size of the file I need to decompress as I am using it earlier in my program. This buffer is referred to as "blen".
string fi = @"C:\Path To Compressed File";
// Get the stream of the source file.
// using (FileStream inFile = fi.OpenRead())
using (MemoryStream infile1 = new MemoryStream(File.ReadAllBytes(fi)))
{
//Create the decompressed file.
string outfile = @"C:\Decompressed.exe";
{
using (GZipStream Decompress = new GZipStream(infile1,
CompressionMode.Decompress))
{
byte[] b = new byte[blen.Length];
Decompress.Read(b,0,b.Length);
File.WriteAllBytes(outfile, b);
}
}
}
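One caveat with this final version: a single Decompress.Read call is not guaranteed to fill the buffer, so larger files may come back truncated. A sketch that sidesteps both that and the need to know blen up front (same fi and outfile paths as above):
using (var infile1 = new MemoryStream(File.ReadAllBytes(fi)))
using (var decompress = new GZipStream(infile1, CompressionMode.Decompress))
using (var output = new MemoryStream())
{
decompress.CopyTo(output); // CopyTo loops internally until the stream is exhausted
File.WriteAllBytes(outfile, output.ToArray());
}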