I have files stored in one container within a blob storage account. I need to create a zip file in a second container containing the files from the first container.
I have a working solution that uses a worker role and DotNetZip, but because the zip file could end up being 1 GB in size, I am concerned that doing all the work in-process using MemoryStream objects is not the best way of doing this. My biggest concern is memory usage and freeing up resources, given that this process could happen several times a day.
Below is some stripped-down code showing the basic process in the worker role:
using (ZipFile zipFile = new ZipFile())
{
    foreach (var uri in uriCollection)
    {
        var blob = new CloudBlob(uri);
        byte[] fileBytes = blob.DownloadByteArray();
        using (var fileStream = new MemoryStream(fileBytes))
        {
            fileStream.Seek(0, SeekOrigin.Begin);
            byte[] bytes = CryptoHelp.EncryptAsBytes(fileStream, "password", null);
            zipFile.AddEntry("entry name", bytes);
        }
    }
    using (var zipStream = new MemoryStream())
    {
        zipFile.Save(zipStream);
        zipStream.Seek(0, SeekOrigin.Begin);
        var blobRef = ContainerDirectory.GetBlobReference("output uri");
        blobRef.UploadFromStream(zipStream);
    }
}
Can someone suggest a better approach please?
At the time of writing this question, I was unaware of the LocalStorage options available in Azure. I was able to write files individually to LocalStorage, work with them there, and then write them back to blob storage.
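For reference, here is a minimal sketch of that LocalStorage approach. It assumes a local resource named "ZipScratch" declared in the service definition, plus the classic worker role runtime (RoleEnvironment) and storage client SDK; the blob and zip calls mirror the question's code.

// Minimal sketch, assuming a local resource "ZipScratch" is declared
// in the service definition (Microsoft.WindowsAzure.ServiceRuntime).
LocalResource scratch = RoleEnvironment.GetLocalResource("ZipScratch");
string zipPath = Path.Combine(scratch.RootPath, "output.zip");

using (ZipFile zipFile = new ZipFile())
{
    foreach (var uri in uriCollection)
    {
        string localFile = Path.Combine(scratch.RootPath, Path.GetRandomFileName());
        new CloudBlob(uri).DownloadToFile(localFile); // download to local disk
        zipFile.AddFile(localFile, "");               // add from disk, not memory
    }
    zipFile.Save(zipPath); // the zip is built on local disk
}

var blobRef = ContainerDirectory.GetBlobReference("output uri");
blobRef.UploadFile(zipPath); // then uploaded back to blob storage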
If all you are worried about is your MemoryStream taking up too much memory, then you can implement your own stream: as your stream is being read, you add your zip files to the stream and remove already-read files from it. This will keep your memory usage down to roughly the size of one file.
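In the same spirit, here is a minimal sketch that keeps only one file in memory at a time, using DotNetZip's WriteDelegate overload of AddEntry (the delegate runs lazily while Save() streams the archive out). The CloudBlob, CryptoHelp, and ContainerDirectory names mirror the question's code; OpenWrite() on the destination blob is assumed to be available in the classic storage SDK.

using (ZipFile zipFile = new ZipFile())
{
    foreach (var uri in uriCollection)
    {
        var entryUri = uri; // capture for the lambda
        zipFile.AddEntry("entry name", (name, entryStream) =>
        {
            // Runs during Save(), so only this one file is buffered.
            var blob = new CloudBlob(entryUri);
            using (var blobStream = new MemoryStream())
            {
                blob.DownloadToStream(blobStream);
                blobStream.Seek(0, SeekOrigin.Begin);
                byte[] bytes = CryptoHelp.EncryptAsBytes(blobStream, "password", null);
                entryStream.Write(bytes, 0, bytes.Length);
            }
        });
    }

    // Stream the archive straight into the destination blob; no
    // MemoryStream ever holds the whole zip.
    var blobRef = ContainerDirectory.GetBlobReference("output uri");
    using (var blobStream = blobRef.OpenWrite())
    {
        zipFile.Save(blobStream);
    }
}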
I have a very interesting problem that I hope I can solve using .NET. Simply put: I have a zip file in Google Cloud Storage which I want to decompress and move to a different bucket, but I don't have enough memory or storage to save the whole file and decompress it. To solve this, I have to read the central directory part at the end of the zip file and then do streaming decompression. Has anyone worked on a similar issue?
So far I have figured out how to get the last 1024 bytes of the file using the following code:
// Fetch the object metadata to learn the total size
var fileInfo = _storage.GetObject(BucketName, fileName, new GetObjectOptions { Projection = Projection.Full });
var stream = new MemoryStream();
// Range-download only the last 1024 bytes of the object
_storage.DownloadObject(BucketName, fileName, stream, new DownloadObjectOptions { Range = new System.Net.Http.Headers.RangeHeaderValue((long)(fileInfo.Size - 1024), (long)(fileInfo.Size)) });
The problem is I can't read the central directory from this stream:
ZipArchive z = new ZipArchive(stream);
You can try to adapt sunzip to your needs. It reads a zip file as a stream and decompresses it.
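For what it's worth, ZipArchive cannot parse a 1024-byte fragment: the offsets stored in the archive refer to positions in the full file (and the downloaded stream's position is also left at its end after the range download). If you take the manual route the question describes, a minimal sketch of locating the End of Central Directory (EOCD) record in those tail bytes might look like this; it assumes a non-zip64 archive whose comment is short enough that the EOCD falls within the last 1024 bytes.

// Scan backwards for the EOCD signature 0x06054b50 (record is >= 22 bytes).
byte[] tail = stream.ToArray();
int eocd = -1;
for (int i = tail.Length - 22; i >= 0; i--)
{
    if (BitConverter.ToUInt32(tail, i) == 0x06054b50) { eocd = i; break; }
}
if (eocd < 0) throw new InvalidDataException("EOCD not found in tail bytes.");

ushort entryCount = BitConverter.ToUInt16(tail, eocd + 10); // total entries
uint cdSize = BitConverter.ToUInt32(tail, eocd + 12);       // central directory size
uint cdOffset = BitConverter.ToUInt32(tail, eocd + 16);     // central directory offset

// Next: range-request [cdOffset, cdOffset + cdSize) to read the central
// directory, then range-request each entry's local data and decompress
// it with a streaming inflater.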
I have a ZipArchive object which contains an XML file that I am modifying. I then want to return the modified ZipArchive.
Here's the code I have:
var package = File.ReadAllBytes(/* location of existing .zip */);
using (var packageStream = new MemoryStream(package, true))
using (var zipPackage = new ZipArchive(packageStream, ZipArchiveMode.Update))
{
    // obtain the specific entry
    var myEntry = zipPackage.Entries.FirstOrDefault(entry => /* code elided */);
    XElement xContents;
    using (var reader = new StreamReader(myEntry.Open()))
    {
        // read the contents of the myEntry XML file
        // then modify the contents into xContents
    }
    using (var writer = new StreamWriter(myEntry.Open()))
    {
        writer.Write(xContents.ToString());
    }
    return packageStream.ToArray();
}
This code throws a "Memory stream is not expandable" exception on the packageStream.ToArray() call.
Can anyone explain what I've done wrongly, and what is the correct way of updating an existing file inside a ZipArchive?
Clearly, ZipArchive wants to expand or resize the ZIP archive stream. However, you have provided a MemoryStream with a fixed stream length (due to using the constructor MemoryStream(byte[], bool), which creates a memory stream with a fixed length that is equal to the length of the array provided to the constructor).
Since ZipArchive wants to expand (or resize) the stream, provide a resizable MemoryStream (using its parameterless constructor). Then copy the original file data into this MemoryStream and proceed with the ZIP archive manipulations.
And don't forget to reset the MemoryStream read/write position back to 0 after copying the original file data into it, otherwise ZipArchive will only see "End of Stream" when trying to read the ZIP archive data from this stream.
using (var packageStream = new MemoryStream())
{
    using (var fs = File.OpenRead(/* location of existing .zip */))
    {
        fs.CopyTo(packageStream);
    }
    packageStream.Position = 0;
    using (var zipPackage = new ZipArchive(packageStream, ZipArchiveMode.Update))
    {
        // ... do your thing ...
    }
    return packageStream.ToArray();
}
This code here contains one more correction. In the original code in the question, return packageStream.ToArray(); has been placed within the using block of the ZipArchive. At the time this line will be executed, the ZipArchive instance might not yet have written all data to the MemoryStream, perhaps keeping some data still in some internal buffers and/or perhaps having deferred writing some ZIP data structures.
To ensure that the ZipArchive has actually written all necessary data completely to the MemoryStream, it is sufficient here to move return packageStream.ToArray(); outside, after the ZipArchive using block. At the end of its using block the ZipArchive is disposed, which also ensures that it has flushed all so-far-unwritten data to the stream. Thus, accessing the MemoryStream after the ZipArchive has been disposed of will yield the complete data of the fully updated ZIP archive.
Side note: Do this only with small-ish ZIP files. The MemoryStream will obviously use internal data buffers (arrays) to hold the data in the MemoryStream. However, packageStream.ToArray(); will create a copy of the data in the MemoryStream, so for a period of time the memory requirements of this routine will be a little more than twice the size of the ZIP archive.
I am calling a REST API that accepts a Stream to upload a file from the local device. Right now I am using the following code to get a Stream from a file, and then closing that stream after the upload completes:
var stream = new FileStream(file, FileMode.Open, FileAccess.ReadWrite);
The problem with this approach is that until the entire file has been uploaded to the server, the user has no chance to delete that file, because its stream is open. What would be a solution to this issue?
If your typical file is reasonably sized (and I'm hoping you won't be uploading 2GB+ files to a REST API), you could always just read the stream into memory before feeding it to your API, like so:
using (MemoryStream memoryStream = new MemoryStream())
{
    using (FileStream fileStream = new FileStream(file, FileMode.Open, FileAccess.ReadWrite))
    {
        fileStream.CopyTo(memoryStream);
    }
    memoryStream.Position = 0; // Reset to origin.
    // Now use the MemoryStream as you would a FileStream:
    api.Upload(memoryStream);
}
Another alternative is to create a temp copy of the file on your hard drive and feed that to the API - but then dealing with cleanup can become a bit cumbersome. FileOptions.DeleteOnClose is your friend and may very well suffice for your purposes, but it still offers no bulletproof guarantees.
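For illustration, a sketch of that temp-copy approach with FileOptions.DeleteOnClose; api.Upload stands in for whatever your REST client exposes, as above.

string tempPath = Path.GetTempFileName();
using (var tempStream = new FileStream(
    tempPath, FileMode.Create, FileAccess.ReadWrite,
    FileShare.None, 4096, FileOptions.DeleteOnClose))
{
    // Copy quickly, then release the original so the user can delete it.
    using (var sourceStream = File.OpenRead(file))
    {
        sourceStream.CopyTo(tempStream);
    }
    tempStream.Position = 0;
    api.Upload(tempStream);
} // the temp file is removed by the OS when tempStream is disposed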
I am developing a web application with ASP.NET, and I'm using SharpZipLib to work with odt files (from Open Office) and, in the future, docx files (for MS Office). I need to open an odt file (like a zip file), change an XML file inside it, zip it again, and have the browser send it to my client.
I can do this on the file system, but it temporarily takes up disk space, which we don't want. I would like to do this in memory (with the MemoryStream class), but I don't know how to unzip folders/files into a memory stream with SharpZipLib, change them, and zip them again. Is there any sample showing how to do this?
Thank you
You can use something like
Stream inputStream = File.OpenRead(/* ... path to the .odt ... */);

// for reading the existing archive
ZipInputStream zipInputStream = new ZipInputStream(inputStream);

// for output
MemoryStream memoryStream = new MemoryStream();
using (ZipOutputStream zipStream = new ZipOutputStream(memoryStream))
{
    ZipEntry entry = new ZipEntry("...");
    //...
    zipStream.PutNextEntry(entry);
    zipStream.Write(data, 0, data.Length); // data: the entry's bytes
    //...
    zipStream.Finish();
    zipStream.Close();
}
Edit:
In general you need to unzip your file, get each ZipEntry, change it, and write it to a ZipOutputStream backed by a MemoryStream, as sketched below.
Use this article http://www.codeproject.com/KB/cs/Zip_UnZip.aspx
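Here is a rough sketch of that in-memory round trip with SharpZipLib; odtPath, "content.xml", and ModifyXml are placeholders for your input file, the entry you want to change, and your own edit logic. (Note that for .odt files the mimetype entry is conventionally stored first and uncompressed, so you may need to handle that entry specially.)

MemoryStream outputStream = new MemoryStream();
using (var zipIn = new ZipInputStream(File.OpenRead(odtPath)))
using (var zipOut = new ZipOutputStream(outputStream))
{
    zipOut.IsStreamOwner = false; // keep outputStream open after dispose
    ZipEntry entry;
    while ((entry = zipIn.GetNextEntry()) != null)
    {
        // Read the current entry's uncompressed bytes.
        var buffer = new MemoryStream();
        zipIn.CopyTo(buffer);
        byte[] data = buffer.ToArray();

        if (entry.Name == "content.xml")
            data = ModifyXml(data); // placeholder for your XML edit

        zipOut.PutNextEntry(new ZipEntry(entry.Name));
        zipOut.Write(data, 0, data.Length);
        zipOut.CloseEntry();
    }
    zipOut.Finish();
}
outputStream.Position = 0; // ready to send to the browser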
Is it possible to open a file directly from a MemoryStream, as opposed to writing it to disk and doing Process.Start()? Specifically, a PDF file? If not, I guess I need to write the MemoryStream to disk (which is kind of annoying). Could someone then point me to a resource about how to write a MemoryStream to disk?
It depends on the client :) If the client will accept input from stdin, you could push the data to the client. Another possibility might be to write a named-pipes server or a socket server; not trivial, but it may work.
However, the simplest option is to just grab a temp file and write to that (and delete afterwards).
var file = Path.GetTempFileName();
using (var fileStream = File.OpenWrite(file))
{
    var buffer = memStream.GetBuffer();
    fileStream.Write(buffer, 0, (int)memStream.Length);
}
Remember to clean up the file when you are done.
Path.GetTempFileName() returns a file name with a '.tmp' extension, so you can't use Process.Start() on it directly: Windows needs a file association, which it resolves via the extension.
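A common workaround, sketched here, is to give the temp file the extension the shell association expects before launching it (a .pdf here, per the question):

var tempFile = Path.GetTempFileName();
var pdfFile = Path.ChangeExtension(tempFile, ".pdf");
File.Move(tempFile, pdfFile); // rename so the extension maps to a viewer

using (var fileStream = File.OpenWrite(pdfFile))
{
    memStream.WriteTo(fileStream);
}
Process.Start(pdfFile); // opens in the associated PDF viewer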
If by opening a file you mean something like starting Adobe Reader for PDF files, then yes, you have to write it to a file. That is, unless the application provides you with some API to do that.
One way to write a stream to file would be:
using (var memoryStream = /* create the memory stream */)
using (var fileStream = File.OpenWrite(fileName))
{
    memoryStream.WriteTo(fileStream);
}