Unzip AWS object - c#

I am working on a project where I need to get zip files from an S3 bucket.
I was able to copy these files one at a time to my EC2 instance using:
foreach (S3Object o in response.S3Objects)
{
GetObjectRequest requests = new GetObjectRequest();
requests.BucketName = "mybucket";
requests.Key = o.Key;
GetObjectResponse responses = client.GetObject(requests);
responses.WriteResponseStreamToFile(@"D:\myfile.zip");
Console.WriteLine("{0}\t{1}\t{2}", o.Key, o.Size, o.LastModified);
}
but I would like to unzip these files on the fly to a specific location instead of copying them locally.
I tried the following, but it did not work:
using (ZipArchive archive = ZipFile.OpenRead(responses.ResponseStream.ToString())) //unzip file
{
foreach (ZipArchiveEntry entry in archive.Entries)
{
archive.ExtractToDirectory(myPath);
}
}
}
Thanks

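ZipFile.OpenRead() takes the path of a local file, so passing it responses.ResponseStream.ToString() (which typically yields the stream's type name, not a path) won't work. You can, however, unzip an archive directly from a stream, for example a MemoryStream, by passing the stream to the ZipArchive constructor.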
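As a rough sketch of the stream-based approach: the helper below copies any source stream into a MemoryStream and extracts the whole archive with a single ExtractToDirectory call. The names StreamUnzip, ExtractStream and BuildSampleZip are made up for illustration, and the sample archive is built in memory so the code runs without S3 access; in the question's code the source stream would presumably be responses.ResponseStream.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class StreamUnzip
{
    // Extracts every entry of the zip archive read from zipStream into targetDir.
    public static void ExtractStream(Stream zipStream, string targetDir)
    {
        // Copying into a MemoryStream gives ZipArchive a seekable stream to read from.
        using (var buffer = new MemoryStream())
        {
            zipStream.CopyTo(buffer);
            buffer.Position = 0;
            using (var archive = new ZipArchive(buffer, ZipArchiveMode.Read))
            {
                // One call extracts all entries; no per-entry loop is needed.
                archive.ExtractToDirectory(targetDir);
            }
        }
    }

    // Builds a small zip in memory (hypothetical content, stands in for the S3 object).
    public static byte[] BuildSampleZip()
    {
        using (var ms = new MemoryStream())
        {
            using (var zip = new ZipArchive(ms, ZipArchiveMode.Create, leaveOpen: true))
            {
                ZipArchiveEntry entry = zip.CreateEntry("hello.txt");
                using (var writer = new StreamWriter(entry.Open()))
                    writer.Write("hello");
            }
            return ms.ToArray();
        }
    }

    public static void Main()
    {
        string targetDir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        using (var source = new MemoryStream(BuildSampleZip()))
            ExtractStream(source, targetDir);
        Console.WriteLine(File.ReadAllText(Path.Combine(targetDir, "hello.txt")));
    }
}
```

Note that this buffers the whole archive in memory, which may not suit very large zip files.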

Related

how to find the file with max date/time inside a zip file having more than one sub directories

In a downstream system, a data folder is created every day inside the folder Data, and files are generated over time within a subfolder TS. Finally, the folder is zipped as Data.zip and uploaded to Azure Blob Storage by the customer.
Now I am downloading the zip file and trying to find the one file that has the max date/time. Using the code below I am able to print the names of all files inside the zip file, but how do I get (print) only the file with the max date/time?
var blobClient = new BlobClient("conn-string", "upload", "Data.zip");
await DownloadFromStream(blobClient);
public static async Task DownloadFromStream(BlobClient blobClient)
{
int i = 0;
var stream = await blobClient.OpenReadAsync(new BlobOpenReadOptions(false));
using ZipArchive archive = new ZipArchive(stream);
foreach (ZipArchiveEntry entry in archive.Entries.OrderBy(x => x.LastWriteTime))
{
if (entry.Name.StartsWith("XXX_TS_"))
{
i++;
Console.WriteLine(i);
Console.WriteLine(entry.Name);
}
}
}
I tried the approach below, and it worked for me.
I uploaded the files to an Azure container as a zip archive (Data.zip) and ran the following code:
var blobClient = new BlobClient("ConnectionString", "ContainerName", "Data.zip");
await DownloadFromStream(blobClient);
public static async Task DownloadFromStream(BlobClient blobClient)
{
var stream = await blobClient.OpenReadAsync(new BlobOpenReadOptions(false));
ZipArchive archive = new ZipArchive(stream);
List<string> sFileslist = new List<string>();
foreach (ZipArchiveEntry entry in archive.Entries.OrderBy(x => x.LastWriteTime))
{
if (entry.Name.Contains("_TS_"))
{
string[] strFileTokens = entry.Name.Split('_');
sFileslist.Add(strFileTokens[2]);
}
}
string maxValue = sFileslist.Max();
Console.WriteLine(maxValue);
}
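The output is the largest value of the third underscore-separated token in the matching names, which identifies the latest file as long as that token is a sortable timestamp.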
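If the date/time that matters is the one stored in the archive rather than one embedded in the file name, a possible alternative is to sort by ZipArchiveEntry.LastWriteTime and take the first match. This is only a sketch: LatestEntry, FindLatest and BuildSampleZip are illustrative names, the sample entries and dates are invented, and the XXX_TS_ prefix is taken from the question.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;

public static class LatestEntry
{
    // Returns the entry with the newest LastWriteTime among those whose
    // file name starts with prefix, or null when nothing matches.
    public static ZipArchiveEntry FindLatest(ZipArchive archive, string prefix)
    {
        return archive.Entries
            .Where(e => e.Name.StartsWith(prefix, StringComparison.Ordinal))
            .OrderByDescending(e => e.LastWriteTime)
            .FirstOrDefault();
    }

    // Builds a sample archive in memory (hypothetical names and dates).
    public static byte[] BuildSampleZip()
    {
        using (var ms = new MemoryStream())
        {
            using (var zip = new ZipArchive(ms, ZipArchiveMode.Create, leaveOpen: true))
            {
                zip.CreateEntry("Data/day1/TS/XXX_TS_a.csv").LastWriteTime =
                    new DateTimeOffset(2023, 1, 1, 12, 0, 0, TimeSpan.Zero);
                zip.CreateEntry("Data/day2/TS/XXX_TS_b.csv").LastWriteTime =
                    new DateTimeOffset(2023, 1, 5, 12, 0, 0, TimeSpan.Zero);
                // Newest overall, but it does not match the prefix.
                zip.CreateEntry("Data/day3/TS/other.csv").LastWriteTime =
                    new DateTimeOffset(2023, 1, 9, 12, 0, 0, TimeSpan.Zero);
            }
            return ms.ToArray();
        }
    }

    public static void Main()
    {
        using (var archive = new ZipArchive(new MemoryStream(BuildSampleZip()), ZipArchiveMode.Read))
        {
            ZipArchiveEntry latest = FindLatest(archive, "XXX_TS_");
            Console.WriteLine(latest == null ? "no match" : latest.FullName);
        }
    }
}
```

Entry.Name contains only the file name, so the prefix test works regardless of how many subdirectories the entry sits in; FullName gives the path inside the archive.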

Extract files from stream containing zip

I am making a GET request using HttpClient to download a zip file from the internet.
I want to extract all the files contained in the zip file without saving the zip file to disk.
Currently, I am able to download and save the zip file to disk, extract its contents and then delete the zip file from disk. This works perfectly fine. However, I want to optimize the process.
I found a way to extract the contents directly from the downloaded zip stream but I have to specify the filenames and extensions.
I am not sure how to extract the contents while preserving their original filenames and extensions without me specifying them.
Current Approach:
string requestUri = "https://www.nuget.org/api/v2/package/" + PackageName + "/" + PackageVersion;
HttpResponseMessage response = await client.GetAsync(requestUri);
response.EnsureSuccessStatusCode();
using Stream PackageStream = await response.Content.ReadAsStreamAsync();
SaveStream($"{DownloadPath}.zip", PackageStream);
ZipFile.ExtractToDirectory($"{DownloadPath}.zip", ExtractPath);
File.Delete($"{DownloadPath}.zip");
// Directly extract Zip contents without saving file and without losing filename and extension
using (ZipArchive archive = new ZipArchive(await response.Content.ReadAsStreamAsync()))
{
foreach (ZipArchiveEntry entry in archive.Entries)
{
using (Stream stream = entry.Open())
{
using (FileStream file = new FileStream("file.txt", FileMode.Create, FileAccess.Write))
{
stream.CopyTo(file);
}
}
}
}
.NET 4.8
.NET Core 3.1
C# 8.0
Any help in this regards would be appreciated.
Please feel free to comment on alternative approaches or suggestions.
Thank you in advance.
ZipArchiveEntry has Name and FullName properties that can be used to get the names of the files within the archive while preserving their original filenames and extensions.
The FullName property contains the relative path, including the subdirectory hierarchy, of an entry in a zip archive. (In contrast, the Name property contains only the name of the entry and does not include the subdirectory hierarchy.)
For example
using (ZipArchive archive = new ZipArchive(await response.Content.ReadAsStreamAsync())) {
foreach (ZipArchiveEntry entry in archive.Entries) {
using (Stream stream = entry.Open()) {
string destination = Path.GetFullPath(Path.Combine(downloadPath, entry.FullName));
var directory = Path.GetDirectoryName(destination);
if (!Directory.Exists(directory))
Directory.CreateDirectory(directory);
using (FileStream file = new FileStream(destination, FileMode.Create, FileAccess.Write)) {
await stream.CopyToAsync(file);
}
}
}
}
This extracts the files into the same subdirectory hierarchy they had in the archive. If entry.Name were used instead, all the files would be extracted into a single directory.
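Two details may be worth handling when extracting by FullName: directory entries (their Name is empty and FullName ends with a slash) cannot be opened as files, and an entry name containing ".." could resolve outside the target directory. The following is a sketch of one way to guard against both; SafeExtract, ExtractAll and BuildZip are hypothetical names, not part of the original answer.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class SafeExtract
{
    // Extracts all entries under targetDir, preserving the archive's folder layout.
    public static void ExtractAll(ZipArchive archive, string targetDir)
    {
        string root = Path.GetFullPath(targetDir);
        if (!root.EndsWith(Path.DirectorySeparatorChar.ToString(), StringComparison.Ordinal))
            root += Path.DirectorySeparatorChar;

        foreach (ZipArchiveEntry entry in archive.Entries)
        {
            string destination = Path.GetFullPath(Path.Combine(root, entry.FullName));

            // Reject entries whose resolved path falls outside the target directory.
            if (!destination.StartsWith(root, StringComparison.Ordinal))
                throw new IOException("Entry escapes the target directory: " + entry.FullName);

            // Directory entries have an empty Name; create the folder and move on.
            if (entry.Name.Length == 0)
            {
                Directory.CreateDirectory(destination);
                continue;
            }

            Directory.CreateDirectory(Path.GetDirectoryName(destination));
            entry.ExtractToFile(destination, overwrite: true);
        }
    }

    // Builds an in-memory zip whose entries have the given names (sample data only).
    public static byte[] BuildZip(params string[] entryNames)
    {
        using (var ms = new MemoryStream())
        {
            using (var zip = new ZipArchive(ms, ZipArchiveMode.Create, leaveOpen: true))
            {
                foreach (string name in entryNames)
                {
                    using (var writer = new StreamWriter(zip.CreateEntry(name).Open()))
                        writer.Write(name);
                }
            }
            return ms.ToArray();
        }
    }

    public static void Main()
    {
        string target = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        byte[] bytes = BuildZip("docs/readme.txt", "root.txt");
        using (var archive = new ZipArchive(new MemoryStream(bytes), ZipArchiveMode.Read))
            ExtractAll(archive, target);
        Console.WriteLine(File.Exists(Path.Combine(target, "docs", "readme.txt")));
    }
}
```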

Zip up collection of files and move to target location

I'm collecting all my files in a target directory and adding them to a zip folder. Once this zip is made and no more files need adding to it, I want to move this zip folder to another location.
Here is my code for doing all of the above:
var targetFolder = Path.Combine(ConfigurationManager.AppSettings["targetFolder"], "Inbound");
var archiveFolder = ConfigurationManager.AppSettings["ArchiveFolder"];
// get files
var files = Directory.GetFiles(targetFolder)
.Select(f => new FileInfo(f))
.ToList();
// places files into zip
using (var zip = ZipFile.Open("file.zip", ZipArchiveMode.Create))
{
foreach (var file in files)
{
var entry = zip.CreateEntry(file.Name);
entry.LastWriteTime = DateTimeOffset.Now;
using (var stream = File.OpenRead(file.ToString()))
using (var entryStream = entry.Open())
stream.CopyTo(entryStream);
}
}
// move the zip file
File.Move("file.zip", archiveFolder );
Where I'm falling down is the moving of the zip folder. When my code gets to File.Move I get an error telling me it cannot create something that already exists. This happens even when I hard-code my archive folder location instead of getting it from my config.
What am I doing wrong with this?
You need to specify the destination file name as well as directory:
File.Move("file.zip", Path.Combine(archiveFolder, "file.zip"));
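The reason is that File.Move expects the full destination file path; when it is given an existing directory, it reports that the target already exists. A small helper along these lines makes the intent explicit. ArchiveMover and MoveInto are made-up names, and deleting an existing destination file first is an assumption about the desired behaviour, so drop that line if overwriting is not wanted.

```csharp
using System;
using System.IO;

public static class ArchiveMover
{
    // Moves sourceFile into destinationDir, keeping its file name.
    // Returns the full path of the moved file.
    public static string MoveInto(string sourceFile, string destinationDir)
    {
        Directory.CreateDirectory(destinationDir); // no-op if it already exists
        string destination = Path.Combine(destinationDir, Path.GetFileName(sourceFile));

        // Assumption: an older archive with the same name may be replaced.
        if (File.Exists(destination))
            File.Delete(destination);

        File.Move(sourceFile, destination);
        return destination;
    }

    public static void Main()
    {
        string workDir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        Directory.CreateDirectory(workDir);
        string zipPath = Path.Combine(workDir, "file.zip");
        File.WriteAllBytes(zipPath, new byte[0]); // placeholder for the real archive

        string moved = MoveInto(zipPath, Path.Combine(workDir, "Archive"));
        Console.WriteLine(moved);
    }
}
```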

How to download Azure Blobs by referencing the file?

I want to download files from Azure using C#, stream them into a MemoryStream, and then return them to the user in the front end with a link (the Azure URI, which points to the blob), so that the user can view those PDF files in the browser or download them. There are multiple blobs/files in Azure, so I want to loop through each file and download it to a stream, for example with a foreach.
I'm not sure how to reference those blobs in CloudBlockBlob blockBlob = container.GetBlockBlobReference(fileName); because there I could pass the name of one specific file, but I have multiple files, so I'm not sure what should go in place of fileName.
Code:
var files = container.ListBlobs();
foreach (var file in files)
{
using (var memoryStream = new MemoryStream())
{
CloudBlockBlob blockBlob = container.GetBlockBlobReference(fileName);
blockBlob.DownloadToStream(memoryStream);
}
}
I'm not sure whether I'm looping correctly in this code and downloading every blob.
Also, I tried replacing fileName with file.Uri.Segments.Last(), which I guess gets the name of each blob.
The problem I'm having is that this foreach gets me just one PDF file whenever I try to use the links in the front end. So, how can I properly loop through each file and download them?
Regarding "how can I properly loop through each file and download them?":
Multiple files cannot be returned from memory directly in a single download. If a zip file is acceptable, you could bundle the files into a compressed archive and transfer that instead. The following is my demo code; it works correctly on my side.
using (var ms = new MemoryStream())
{
using (var zipArchive = new ZipArchive(ms, ZipArchiveMode.Create, true))
{
foreach (var file in files)
{
if (file.GetType() != typeof(CloudBlockBlob)) continue;
var blob = (CloudBlockBlob) file;
var entry = zipArchive.CreateEntry(blob.Name, CompressionLevel.Fastest);
using (var entryStream = entry.Open())
{
CloudBlockBlob blockBlob = container.GetBlockBlobReference(blob.Name);
blockBlob.DownloadToStream(entryStream);
}
}
}
}
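One point that is easy to miss with this pattern: the archive has to be disposed before the MemoryStream is read, because the zip central directory is only written on dispose, and leaveOpen must be true so the stream survives that. Here is a self-contained sketch using plain byte arrays in place of blobs; InMemoryZip and ZipToBytes are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class InMemoryZip
{
    // Packs the given name -> content pairs into a zip and returns its bytes.
    public static byte[] ZipToBytes(Dictionary<string, byte[]> files)
    {
        using (var ms = new MemoryStream())
        {
            using (var zip = new ZipArchive(ms, ZipArchiveMode.Create, leaveOpen: true))
            {
                foreach (KeyValuePair<string, byte[]> file in files)
                {
                    ZipArchiveEntry entry = zip.CreateEntry(file.Key, CompressionLevel.Fastest);
                    using (Stream entryStream = entry.Open())
                        entryStream.Write(file.Value, 0, file.Value.Length);
                }
            } // disposing the archive writes the central directory into ms

            return ms.ToArray(); // complete archive, ready to send to the client
        }
    }

    public static void Main()
    {
        var files = new Dictionary<string, byte[]>
        {
            { "a.pdf", Encoding.ASCII.GetBytes("first") },   // stand-ins for blob contents
            { "b.pdf", Encoding.ASCII.GetBytes("second") },
        };
        byte[] zipBytes = ZipToBytes(files);
        Console.WriteLine(zipBytes.Length > 0);
    }
}
```

In a web application the returned bytes would typically be sent with the content type application/zip.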

How to read data from a zip file without having to unzip the entire file

Is there any way in .NET (C#) to extract data from a zip file without decompressing the complete file?
I may want to extract data (a file) from the start of a zip file, if the compression algorithm compressed the files in a deterministic order.
With .Net Framework 4.5 (using ZipArchive):
using (ZipArchive zip = ZipFile.Open(zipfile, ZipArchiveMode.Read))
foreach (ZipArchiveEntry entry in zip.Entries)
if(entry.Name == "myfile")
entry.ExtractToFile("myfile");
Find "myfile" in zipfile and extract it.
DotNetZip is your friend here.
As easy as:
using (ZipFile zip = ZipFile.Read(ExistingZipFile))
{
ZipEntry e = zip["MyReport.doc"];
e.Extract(OutputStream);
}
(you can also extract to a file or other destinations).
Reading the zip file's table of contents is as easy as:
bool header = true; // print the column headings only once, before the first entry
using (ZipFile zip = ZipFile.Read(ExistingZipFile))
{
foreach (ZipEntry e in zip)
{
if (header)
{
System.Console.WriteLine("Zipfile: {0}", zip.Name);
if ((zip.Comment != null) && (zip.Comment != ""))
System.Console.WriteLine("Comment: {0}", zip.Comment);
System.Console.WriteLine("\n{1,-22} {2,8} {3,5} {4,8} {5,3} {0}",
"Filename", "Modified", "Size", "Ratio", "Packed", "pw?");
System.Console.WriteLine(new System.String('-', 72));
header = false;
}
System.Console.WriteLine("{1,-22} {2,8} {3,5:F0}% {4,8} {5,3} {0}",
e.FileName,
e.LastModified.ToString("yyyy-MM-dd HH:mm:ss"),
e.UncompressedSize,
e.CompressionRatio,
e.CompressedSize,
(e.UsesEncryption) ? "Y" : "N");
}
}
Edited to note: DotNetZip used to live at CodePlex. CodePlex has been shut down, but the old archive is still available there. It looks like the code has migrated to GitHub:
https://github.com/DinoChiesa/DotNetZip. This looks to be the original author's repo.
https://github.com/haf/DotNetZip.Semverd. This looks to be the currently maintained version. It's also packaged up and available via NuGet at https://www.nuget.org/packages/DotNetZip/
Something like this will list and extract the files one by one, if you want to use SharpZipLib:
var zip = new ZipInputStream(File.OpenRead(@"C:\Users\Javi\Desktop\myzip.zip"));
var filestream = new FileStream(@"C:\Users\Javi\Desktop\myzip.zip", FileMode.Open, FileAccess.Read);
ZipFile zipfile = new ZipFile(filestream);
ZipEntry item;
while ((item = zip.GetNextEntry()) != null)
{
Console.WriteLine(item.Name);
using (StreamReader s = new StreamReader(zipfile.GetInputStream(item)))
{
// stream with the file
Console.WriteLine(s.ReadToEnd());
}
}
Based on this example: content inside zip file
Here is how a UTF8 text file can be read from a zip archive into a string variable (.NET Framework 4.5 and up):
string zipFileFullPath = "{{TypeYourZipFileFullPathHere}}";
string targetFileName = "{{TypeYourTargetFileNameHere}}";
string text;
using (var archive = System.IO.Compression.ZipFile.OpenRead(zipFileFullPath))
{
// First() throws InvalidOperationException if no entry has that name
var entry = archive.Entries.First(x => x.Name.Equals(targetFileName,
StringComparison.InvariantCulture));
using (var reader = new System.IO.StreamReader(entry.Open(), Encoding.UTF8))
text = reader.ReadToEnd();
}
The following code reads a specific file as a byte array:
using ZipArchive zipArchive = ZipFile.OpenRead(zipFilePath);
foreach(ZipArchiveEntry zipArchiveEntry in zipArchive.Entries)
{
if(zipArchiveEntry.Name.Equals(fileName,StringComparison.OrdinalIgnoreCase))
{
using Stream stream = zipArchiveEntry.Open(); // dispose the entry stream when done
using MemoryStream memoryStream = new MemoryStream();
await stream.CopyToAsync(memoryStream);
return memoryStream.ToArray();
}
}
Zip files have a table of contents. Every zip utility should have the ability to query just the TOC. Or you can use a command-line program such as 7-Zip (for example, 7z l archive.zip) to print the table of contents and redirect it to a text file.
In that case you will need to parse the zip local file headers. Each file stored in a zip archive is preceded by a Local File Header entry, which (normally) contains enough information for decompression. In general, you can do a simple parse of these entries in the stream, select the file you need, copy its header plus compressed data to another file, and run unzip on that part (if you don't want to deal with the full zip decompression code or a library).
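To give an idea of what such parsing involves, here is a sketch that reads only the first Local File Header from a stream: a 30-byte little-endian fixed part starting with the signature 0x04034b50, followed by the file name and an extra field. LocalHeader, Info and ReadFirst are illustrative names. The sketch assumes the archive starts directly with a local header and decodes the name as UTF-8, which is only strictly correct for ASCII names or when the UTF-8 flag is set. Also note that when bit 3 of the flags is set, the sizes in this header may be zero and the real values follow the data.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class LocalHeader
{
    public sealed class Info
    {
        public string FileName;
        public ushort Flags;
        public ushort Method;        // 0 = stored, 8 = deflate
        public uint CompressedSize;  // may be 0 when flag bit 3 is set
        public long DataOffset;      // where the compressed bytes begin
    }

    // Reads the local file header located at the current stream position.
    public static Info ReadFirst(Stream zip)
    {
        using (var reader = new BinaryReader(zip, Encoding.UTF8, leaveOpen: true))
        {
            if (reader.ReadUInt32() != 0x04034b50)
                throw new InvalidDataException("Not a zip local file header.");

            var info = new Info();
            reader.ReadUInt16();                 // version needed to extract
            info.Flags = reader.ReadUInt16();    // general purpose bit flags
            info.Method = reader.ReadUInt16();   // compression method
            reader.ReadUInt16();                 // last modified time (DOS format)
            reader.ReadUInt16();                 // last modified date (DOS format)
            reader.ReadUInt32();                 // CRC-32
            info.CompressedSize = reader.ReadUInt32();
            reader.ReadUInt32();                 // uncompressed size
            ushort nameLength = reader.ReadUInt16();
            ushort extraLength = reader.ReadUInt16();
            info.FileName = Encoding.UTF8.GetString(reader.ReadBytes(nameLength));
            reader.ReadBytes(extraLength);       // skip the extra field
            info.DataOffset = zip.Position;
            return info;
        }
    }

    // Builds a one-entry sample archive in memory (hypothetical content).
    public static byte[] BuildSampleZip()
    {
        using (var ms = new MemoryStream())
        {
            using (var zip = new ZipArchive(ms, ZipArchiveMode.Create, leaveOpen: true))
            using (var writer = new StreamWriter(zip.CreateEntry("first.txt").Open()))
                writer.Write("some text at the start of the archive");
            return ms.ToArray();
        }
    }

    public static void Main()
    {
        Info info = ReadFirst(new MemoryStream(BuildSampleZip()));
        Console.WriteLine(info.FileName);
    }
}
```

For most purposes ZipArchive or a library is the simpler route, since it reads the central directory and decompresses only the entry that is opened.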
