Decompress files on Google Storage on the fly using C#

I have a very interesting problem that I hope I can solve using .NET. Simply put, I have a zip file in Google Storage that I want to decompress and move to a different bucket, but I don't have enough memory or disk to save the whole file and decompress it. To solve this I have to read the central directory at the end of the zip file and then do a streaming decompress. Has anyone worked on a similar issue?
So far I have figured out how to get the last 1024 bytes of the file using the following code:
var fileInfo = _storage.GetObject(BucketName, fileName, new GetObjectOptions { Projection = Projection.Full });
var stream = new MemoryStream();
_storage.DownloadObject(BucketName, fileName, stream, new DownloadObjectOptions
{
    Range = new System.Net.Http.Headers.RangeHeaderValue((long)(fileInfo.Size - 1024), (long)fileInfo.Size)
});
The problem is that I can't read the central directory from this stream:
ZipArchive z = new ZipArchive(stream);

You can try to adapt sunzip to your needs. It reads a zip file as a stream and decompresses it.
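Alternatively, if you want to stay with System.IO.Compression: ZipArchive can locate and read the central directory itself, provided you hand it a seekable stream. Below is a minimal sketch, assuming the Google.Cloud.Storage.V1 client from the question; GcsRandomAccessStream and DestinationBucket are made-up names, and buffering and error handling are omitted. The idea is a read-only Stream that translates every read into a ranged download:

// requires: using System; using System.IO; using Google.Cloud.Storage.V1;
class GcsRandomAccessStream : Stream
{
    private readonly StorageClient _storage;
    private readonly string _bucket;
    private readonly string _name;
    private readonly long _size;
    private long _position;

    public GcsRandomAccessStream(StorageClient storage, string bucket, string name, long size)
    {
        _storage = storage;
        _bucket = bucket;
        _name = name;
        _size = size;
    }

    public override bool CanRead => true;
    public override bool CanSeek => true;
    public override bool CanWrite => false;
    public override long Length => _size;
    public override long Position { get => _position; set => _position = value; }

    public override int Read(byte[] buffer, int offset, int count)
    {
        if (count <= 0 || _position >= _size)
            return 0;

        // HTTP ranges are inclusive, so the last byte is position + count - 1.
        long last = Math.Min(_position + count, _size) - 1;
        using (var ms = new MemoryStream())
        {
            _storage.DownloadObject(_bucket, _name, ms, new DownloadObjectOptions
            {
                Range = new System.Net.Http.Headers.RangeHeaderValue(_position, last)
            });
            byte[] bytes = ms.ToArray();
            Array.Copy(bytes, 0, buffer, offset, bytes.Length);
            _position += bytes.Length;
            return bytes.Length;
        }
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        switch (origin)
        {
            case SeekOrigin.Begin: _position = offset; break;
            case SeekOrigin.Current: _position += offset; break;
            default: _position = _size + offset; break;
        }
        return _position;
    }

    public override void Flush() { }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
}

Once ZipArchive sits on top of such a stream, each entry can be streamed to the destination bucket without ever materializing the whole archive:

var source = new GcsRandomAccessStream(_storage, BucketName, fileName, (long)fileInfo.Size);
using (var zip = new ZipArchive(source, ZipArchiveMode.Read))
{
    foreach (var entry in zip.Entries)
    {
        using (var entryStream = entry.Open())
        {
            // DestinationBucket is a placeholder for your target bucket name.
            _storage.UploadObject(DestinationBucket, entry.FullName, null, entryStream);
        }
    }
}

Be aware that every Read becomes an HTTP range request, so without read-ahead buffering this will be slow for many small reads, but memory use stays bounded.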

Related

DotNetZip doesn't read all entries from zip stored at azure blob storage

I have a zip stored in Azure blob storage which I'm streaming locally and iterating over its entries.
I'm getting the stream like that:
BlobClient blob = _blobServiceClientProp.GetBlobContainerClient(blobExtractionSource.ContainerName)
.GetBlobClient(blobExtractionSource.BlobName);
Stream zipStream = await blob.OpenReadAsync().ConfigureAwait(false);
The stream length is valid (8890655642 bytes).
Using DotNetZip 1.16, I'm reading from the zip stream:
ZipFile zipFile = ZipFile.Read(zipStream);
The problem is that I'm getting the wrong number of entries. According to DotNetZip, I have 41,082 entries in the zip, which is wrong. I checked the number of entries both through the Entries property (zipFile.Entries) and by iterating and counting them manually.
If I switch to System.IO.Compression.ZipArchive and iterate over the zip entries, ZipArchive tells me I have 85,413 entries in the zip, which is the right number.
Any suggestions for how I can still work with DotNetZip and make it report the right number of entries?
Note that when reading the same zip locally (after I manually download it) with the same version of DotNetZip, I successfully get all the entries.
Through ZipArchive we were able to pull the exact number. Below is the code that worked for us:
var stream = await blobClient.OpenReadAsync();
using ZipArchive zip = new ZipArchive(stream, ZipArchiveMode.Read);
Console.WriteLine(zip.Entries.Count);
Edit
The code that worked for us using DotNetZip:
BlobClient blob = blobServiceClient.GetBlobContainerClient("container-1")
.GetBlobClient("samplezip1.zip");
Stream zipStream = await blob.OpenReadAsync().ConfigureAwait(false);
ZipFile zipFile = ZipFile.Read(zipStream);
Console.WriteLine(zipFile.Entries.Count);
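If you need to keep reading straight from blob storage, one thing worth trying is opening the blob stream with a larger read buffer, so that a library seeking back and forth over an 8.8 GB archive issues fewer, larger range requests. A sketch, assuming the same Azure.Storage.Blobs package; the 4 MB buffer size is illustrative:

Stream zipStream = await blob.OpenReadAsync(new BlobOpenReadOptions(allowModifications: false)
{
    // Each range request now fetches up to 4 MB instead of the default chunk size.
    BufferSize = 4 * 1024 * 1024
}).ConfigureAwait(false);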

Issues with saving XML file (C#)

I'm trying to make a config file from an XML file, but I can't figure out how to save the file after I add to it. I can read from the file fine, so I know it's not an issue with where it's located, but I still don't know how to save it.
I've looked around for about 2 hours and can't figure out the problem. I know my way around C# but am completely new to XML.
public async Task CreateReaction(string name, DiscordMessage message, DiscordEmoji emoji, DiscordRole role)
{
    string path = @"E:\Visual Studio\repos\JustHangoutBot\bin\Debug\netcoreapp1.1\configs\reactions.xml";
    XDocument doc = XDocument.Load(path);
    await message.CreateReactionAsync(emoji);
    XElement root = new XElement(name);
    root.Add(new XElement("MessageID", message.Id));
    root.Add(new XElement("ReactionID", emoji.Id));
    root.Add(new XElement("RoleID", role.Id));
    doc.Element("Reactions").Add(root);
    byte[] byteArray = Encoding.UTF8.GetBytes(path);
    MemoryStream stream = new MemoryStream(byteArray);
    doc.Save(stream);
}
I think the problem is somewhere in the last three lines. I've seen tutorials where people save the file just by using doc.Save("reactions.xml"), for example, but I get an error about not being able to convert from string to Stream.
Any help would be appreciated. Thank you in advance!
This will do it:
using (var fileStream = System.IO.File.OpenWrite("path to the file you want to write"))
{
    doc.Save(fileStream);
}
When you do this:
byte[] byteArray = Encoding.UTF8.GetBytes(path);
MemoryStream stream = new MemoryStream(byteArray);
doc.Save(stream);
What's happening is:
You're encoding the path string itself (not the file's contents) into a byte array.
You're creating a MemoryStream that has those bytes as its content.
You're saving the document to that MemoryStream.
Under the hood a MemoryStream is just an array of bytes in memory, so the document is being written to memory, never to a file.
File.OpenWrite(path) opens a FileStream on the specified path, creating the file if it doesn't exist. Note that it does not truncate an existing file: writing starts at the beginning, so if the new content is shorter than the old, leftover bytes from the old content remain at the end. File.Create(path), which creates or truncates, is safer when you want to replace the file outright.
So when you call doc.Save(fileStream) you're writing to the file.
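Putting it together, a minimal corrected version of the method from the question might look like this (a sketch: the Discord* types come from the question's bot framework, and File.Create is used so the old content is fully replaced). One caveat: on older targets such as netcoreapp1.1, the doc.Save(string) overload may simply not exist, which would explain the "cannot convert from string to Stream" error; saving through a FileStream works everywhere:

public async Task CreateReaction(string name, DiscordMessage message, DiscordEmoji emoji, DiscordRole role)
{
    string path = @"E:\Visual Studio\repos\JustHangoutBot\bin\Debug\netcoreapp1.1\configs\reactions.xml";
    XDocument doc = XDocument.Load(path);
    await message.CreateReactionAsync(emoji);

    XElement root = new XElement(name);
    root.Add(new XElement("MessageID", message.Id));
    root.Add(new XElement("ReactionID", emoji.Id));
    root.Add(new XElement("RoleID", role.Id));
    doc.Element("Reactions").Add(root);

    // Write back to the same file; File.Create truncates any existing content.
    using (var fileStream = System.IO.File.Create(path))
    {
        doc.Save(fileStream);
    }
}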

Compress large log file before reading

We have a large number of logs (117 log files totaling about 17 GB of data). It's straight text, so I know it will compress well. I'm not looking for great compression or speed (though either would be a nice bonus). What I currently do is get the list of log files to read (they have a date stamp in the file name, so I filter on that first). After I get the list I read each file using File.ReadAllLines(), filtering the lines as I go...
private void GetBulkUpdateItems(List<string> allLines, Regex updatedRowsRegEx)
{
    foreach (var file in this)
    {
        allLines.AddRange(File.ReadAllLines(file).Where(x => updatedRowsRegEx.IsMatch(x)));
    }
    allLines.Sort();
}
Reading 5 files from the network takes about 22 seconds. What I'd like to do is compress the list of files into a single zip file, copy the zip file locally, then unzip it and do the rest. The problem is I can't figure out how to start. Since I'm using .NET 4.5, I first tried System.IO.Compression.ZipFile, but it wants a directory and I don't want all 117 files. I saw someone use a network stream and 7-Zip, which sounded promising, and I'm fairly certain 7-Zip is installed on the server I need the logs from (probably not important, because we use the UNC path). So I'm stuck. Any suggestions?
ZipArchive is the underlying class for ZipFile and allows more granular manipulation.
Sample from the article, adding hardcoded text:
using (FileStream zipToOpen = new FileStream(@"c:\users\exampleuser\release.zip", FileMode.Open))
{
    using (ZipArchive archive = new ZipArchive(zipToOpen, ZipArchiveMode.Update))
    {
        ZipArchiveEntry readmeEntry = archive.CreateEntry("Readme.txt");
        using (StreamWriter writer = new StreamWriter(readmeEntry.Open()))
        {
            writer.WriteLine("Information about this package.");
            writer.WriteLine("========================");
        }
    }
}
As Praveen Paulose suggested, you can use ZipFileExtensions.CreateEntryFromFile to add an entry from a file to the archive.
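A sketch of that approach, building the archive from an explicit list of files rather than a directory (the paths and the date-stamp filter are illustrative):

// requires: System.IO, System.IO.Compression, System.Linq
// Gather only the log files you care about.
var logFiles = Directory.EnumerateFiles(@"\\server\share\logs", "*.log")
                        .Where(f => f.Contains("20130917")); // your date-stamp filter here

using (FileStream zipToCreate = new FileStream(@"c:\temp\logs.zip", FileMode.Create))
using (ZipArchive archive = new ZipArchive(zipToCreate, ZipArchiveMode.Create))
{
    foreach (var file in logFiles)
    {
        // Fastest trades compression ratio for speed, which suits plain log text.
        archive.CreateEntryFromFile(file, Path.GetFileName(file), CompressionLevel.Fastest);
    }
}

One caveat: if this code runs on your machine against the UNC path, the uncompressed bytes still cross the network before compression happens, so to actually save bandwidth the zipping has to run on the server that holds the logs.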

Creating a zip file in situ within azure blob storage

I have files stored in one container within a blob storage account. I need to create a zip file in a second container containing the files from the first container.
I have a solution that works using a worker role and DotNetZip but because the zip file could end up being 1GB in size I am concerned that doing all the work in-process, using MemoryStream objects etc. is not the best way of doing this. My biggest concern is that of memory usage and freeing up resources given that this process could happen several times a day.
Below is some very stripped down code showing the basic process in the worker role:
using (ZipFile zipFile = new ZipFile())
{
    foreach (var uri in uriCollection)
    {
        var blob = new CloudBlob(uri);
        byte[] fileBytes = blob.DownloadByteArray();
        using (var fileStream = new MemoryStream(fileBytes))
        {
            fileStream.Seek(0, SeekOrigin.Begin);
            byte[] bytes = CryptoHelp.EncryptAsBytes(fileStream, "password", null);
            zipFile.AddEntry("entry name", bytes);
        }
    }
    using (var zipStream = new MemoryStream())
    {
        zipFile.Save(zipStream);
        zipStream.Seek(0, SeekOrigin.Begin);
        var blobRef = ContainerDirectory.GetBlobReference("output uri");
        blobRef.UploadFromStream(zipStream);
    }
}
Can someone suggest a better approach please?
At the time of writing this question, I was unaware of the LocalStorage options available in Azure. I was able to write the files individually to LocalStorage, work with them there, and then write them back to blob storage.
If all you are worried about is your MemoryStream taking up too much memory, you can implement your own stream: as your stream is being read, you add zip data to it and remove the parts that have already been read. This keeps the memory footprint to roughly the size of one file.
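Another way to avoid the big MemoryStream entirely is a sketch like the one below, against the same SDK generation as the question and assuming the output is a block blob so OpenWrite() is available; the per-entry encryption step from the question is omitted. DotNetZip's write-delegate overload of AddEntry pulls each source blob at save time, and the archive is saved straight into a writable blob stream (DotNetZip should handle the non-seekable output by emitting data descriptors):

using (ZipFile zipFile = new ZipFile())
{
    foreach (var uri in uriCollection)
    {
        var blob = new CloudBlob(uri);
        string entryName = "entry name"; // derive a unique name from the uri
        // The delegate runs at Save time, so each blob is streamed into its
        // entry one at a time instead of being buffered up front.
        zipFile.AddEntry(entryName, (name, entryStream) =>
        {
            using (var blobStream = blob.OpenRead())
            {
                blobStream.CopyTo(entryStream);
            }
        });
    }

    var blobRef = ContainerDirectory.GetBlobReference("output uri");
    using (var outputStream = blobRef.OpenWrite())
    {
        zipFile.Save(outputStream);
    }
}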

Unzip file (odt or docx) in a memory stream

I have been developing a web application with ASP.NET, and I'm using SharpZipLib to work with odt files (from OpenOffice) and, in the future, docx files (for MS Office). I need to open an odt file (which is really a zip file), change an XML file inside it, zip it up again, and have the browser send it to my client.
I can do this on the file system, but it temporarily takes up disk space, which we want to avoid. I would like to do this in memory (with the MemoryStream class), but I don't know how to unzip folders/files into a memory stream with SharpZipLib, change them, and zip them up again. Is there a sample of how to do this?
Thank you
You can use something like:
Stream inputStream = //... File.OpenRead(...);

// for reading the file
ZipInputStream zipInputStream = new ZipInputStream(inputStream);

// for output
MemoryStream memoryStream = new MemoryStream();
using (ZipOutputStream zipStream = new ZipOutputStream(memoryStream))
{
    ZipEntry entry = new ZipEntry("...");
    //...
    zipStream.PutNextEntry(entry);
    zipStream.Write(data, 0, data.Length);
    //...
    zipStream.Finish();
    zipStream.Close();
}
Edit:
In general you need to unzip your file, get each ZipEntry, change it, and write it to a ZipOutputStream backed by a MemoryStream.
This article may help: http://www.codeproject.com/KB/cs/Zip_UnZip.aspx
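A fuller round-trip in memory might look like this (a sketch using SharpZipLib's ICSharpCode.SharpZipLib.Zip types; the rewriting of "content.xml", the file usually edited in an odt, is illustrative): copy every entry from the input stream into a new zip in memory, swapping in new bytes for the one entry you changed.

// requires: System.IO, ICSharpCode.SharpZipLib.Zip
static MemoryStream RewriteEntry(Stream input, string entryName, byte[] newContent)
{
    var output = new MemoryStream();
    using (var zipIn = new ZipInputStream(input))
    using (var zipOut = new ZipOutputStream(output))
    {
        zipOut.IsStreamOwner = false; // keep the MemoryStream open after disposal

        ZipEntry entry;
        while ((entry = zipIn.GetNextEntry()) != null)
        {
            zipOut.PutNextEntry(new ZipEntry(entry.Name));
            if (entry.Name == entryName)
            {
                // Replace this entry's body with the edited XML.
                zipOut.Write(newContent, 0, newContent.Length);
            }
            else
            {
                // ZipInputStream reads exactly the current entry's data.
                zipIn.CopyTo(zipOut);
            }
        }
        zipOut.Finish();
    }
    output.Position = 0;
    return output; // hand this to the response stream
}

One detail this sketch glosses over: strict odt readers expect the first entry, mimetype, to be stored uncompressed, so a production version would special-case it.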
