Compress large files using .NET Framework ZipArchive class on ASP.NET

I have code that gets all the files in a directory, compresses each one, and creates a .zip file. I'm using the .NET Framework ZipArchive class in the System.IO.Compression namespace and the extension method CreateEntryFromFile. This works well except when processing large files (approximately 1 GB and up), where it throws a System.IO stream exception, "Stream too large".
The MSDN reference for the extension method states:
When ZipArchiveMode.Update is present, the size limit of an entry is limited to Int32.MaxValue. This limit is because update mode uses a MemoryStream internally to allow the seeking required when updating an archive, and MemoryStream has a maximum equal to the size of an int.
So this explains the exception I get, but it offers no way to overcome the limitation. How can I process large files?
Here is my code; it is part of a class. In case it matters, the GetDatabaseBackupFiles() and GetDatabaseCompressedBackupFiles() functions each return a list of FileInfo objects that I iterate over:
public void CompressBackupFiles()
{
    var originalFiles = GetDatabaseBackupFiles();
    var compressedFiles = GetDatabaseCompressedBackupFiles();
    var pendingFiles = originalFiles.Where(c => compressedFiles.All(d => Path.GetFileName(d.Name) != Path.GetFileName(c.Name)));
    foreach (var file in pendingFiles)
    {
        var zipPath = Path.Combine(_options.ZippedBackupFilesBasePath, Path.GetFileNameWithoutExtension(file.Name) + ".zip");
        using (ZipArchive archive = ZipFile.Open(zipPath, ZipArchiveMode.Update))
        {
            archive.CreateEntryFromFile(file.FullName, Path.GetFileName(file.Name));
        }
    }
    DeleteFiles(originalFiles);
}

When you are only creating a zip file, replace ZipArchiveMode.Update with ZipArchiveMode.Create.
Update mode is meant for cases where you need to delete files from an existing archive or add new files to one.
In update mode the whole zip file is loaded into memory, which consumes a lot of memory for big files, so this mode should be avoided when possible.
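As a minimal sketch of that change, the helper below creates a new archive in Create mode and adds a single file to it. The class and method names (BackupZipper, CompressToNewZip) are made up for illustration and are not part of the question's code. On .NET Framework this assumes references to both System.IO.Compression and System.IO.Compression.FileSystem.

```csharp
using System.IO;
using System.IO.Compression;

// Hypothetical helper, named here only for illustration.
public static class BackupZipper
{
    // Compresses one file into a brand-new archive at zipPath.
    // ZipArchiveMode.Create writes entries sequentially to the output file,
    // so the in-memory buffering used by Update mode is not involved.
    public static void CompressToNewZip(string sourcePath, string zipPath)
    {
        using (ZipArchive archive = ZipFile.Open(zipPath, ZipArchiveMode.Create))
        {
            archive.CreateEntryFromFile(sourcePath, Path.GetFileName(sourcePath));
        }
    }
}
```

Note that ZipFile.Open in Create mode throws an IOException if zipPath already exists, so a caller that may re-run should delete or skip existing archives first.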

Related

C#- Renci.Ssh.Net- Which one gives optimized performance- WriteAllText Vs. UploadFile

I need to generate multiple XML files at an SFTP location from C# code. For SFTP connectivity I am using Renci.Ssh.Net. I found that there are different methods for creating files, including WriteAllText() and UploadFile(). I produce the XML string at runtime, and currently I use WriteAllText() (to avoid creating the XML file locally and the extra I/O that goes with it).
using (SftpClient client = new SftpClient(host, port, sftpUser, sftpPassword))
{
    client.Connect();
    if (client.IsConnected)
    {
        client.BufferSize = 1024;
        var filePath = sftpDir + fileName;
        client.WriteAllText(filePath, contents);
        client.Disconnect();
    }
    // No explicit client.Dispose() is needed; the using block disposes the client.
}
Will using UploadFile(), with either a FileStream or a MemoryStream, give me better performance in the long run?
The resulting documents will be small, around 60 KB each.
Thanks!

SftpClient.UploadFile is optimized for uploading large amounts of data.
But for 60 KB, I'm pretty sure it makes no difference whatsoever, so you can continue using the more convenient SftpClient.WriteAllText.
That said, I believe most XML generators (like the .NET XmlWriter) can write XML to a Stream (it's usually the preferred output API, rather than a string), so SftpClient.UploadFile may be more convenient in the end.
See also: What is the difference between SftpClient.UploadFile and SftpClient.WriteAllBytes?
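A rough sketch of the stream-based approach: XmlWriter writes into a MemoryStream, which is then handed to an upload callback. The class name, the element names, and the Action<Stream, string> delegate are assumptions made for illustration so the example does not depend on a live SFTP server.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper; the XML content is a placeholder.
public static class XmlUploadSketch
{
    // Builds a small XML document in memory and passes the stream,
    // rewound to the start, to the supplied upload delegate.
    public static void WriteAndUpload(string remotePath, Action<Stream, string> upload)
    {
        using (var buffer = new MemoryStream())
        {
            // UTF-8 without a byte order mark.
            var settings = new XmlWriterSettings { Encoding = new UTF8Encoding(false) };
            using (XmlWriter writer = XmlWriter.Create(buffer, settings))
            {
                writer.WriteStartDocument();
                writer.WriteStartElement("order");
                writer.WriteElementString("id", "42");
                writer.WriteEndElement();
                writer.WriteEndDocument();
            }
            // Disposing the writer flushes it but leaves the stream open
            // (XmlWriterSettings.CloseOutput is false by default).
            buffer.Position = 0;
            upload(buffer, remotePath);
        }
    }
}
```

With a connected SftpClient named client, the call would presumably look like WriteAndUpload(filePath, (stream, path) => client.UploadFile(stream, path)).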

Compress large log file before reading

We have a large number of logs (117 logs totaling about 17 GB of data). It's plain text, so I know it will compress well. I'm not looking for great compression or speed (though that would be a good bonus). What I currently do is get a list of log files to read (they have a date stamp in the file name, so I filter on that first). After I get the list, I read each file using File.ReadAllLines(), but we also filter on that...
private void GetBulkUpdateItems(List<string> allLines, Regex updatedRowsRegEx)
{
    foreach (var file in this)
        allLines.AddRange(File.ReadAllLines(file).Where(x => updatedRowsRegEx.IsMatch(x)));
    allLines.Sort();
}
Reading 5 files from the network takes about 22 seconds. What I'd like to do is compress the list of files into a single zip file, copy the zip file locally, then unzip it and do the rest. The problem is I can't figure out how to start. Since I'm using .NET 4.5, I first tried System.IO.Compression.ZipFile, but it wants a directory and I don't want all 117 files. I saw someone use a network stream and 7-Zip, which sounded promising, and I'm fairly certain that 7-Zip is installed on the server I need the logs from (probably not important, because we use the UNC path). So I'm stuck. Any suggestions?

ZipArchive is the underlying class for ZipFile and allows more granular manipulation.
Sample from the article adding hardcoded text:
using (FileStream zipToOpen = new FileStream(
    @"c:\users\exampleuser\release.zip", FileMode.Open))
{
    using (ZipArchive archive = new ZipArchive(zipToOpen, ZipArchiveMode.Update))
    {
        ZipArchiveEntry readmeEntry = archive.CreateEntry("Readme.txt");
        using (StreamWriter writer = new StreamWriter(readmeEntry.Open()))
        {
            writer.WriteLine("Information about this package.");
            writer.WriteLine("========================");
        }
    }
}
As Praveen Paulose suggested, you can use ZipFileExtensions.CreateEntryFromFile to add an entry created from an existing file to the archive.
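Putting the two pieces together, a minimal sketch that zips an arbitrary list of files (rather than a whole directory) might look like the following. LogZipper and ZipFiles are hypothetical names, and CompressionLevel.Fastest is just one possible choice given that speed matters more than ratio here.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical helper, named here only for illustration.
public static class LogZipper
{
    // Writes every file in filePaths into a single new archive at zipPath.
    // Entries are stored under their file names only, so two source files
    // with the same name in different folders would produce duplicate entries.
    public static void ZipFiles(IEnumerable<string> filePaths, string zipPath)
    {
        using (var zipStream = new FileStream(zipPath, FileMode.Create))
        using (var archive = new ZipArchive(zipStream, ZipArchiveMode.Create))
        {
            foreach (string path in filePaths)
            {
                archive.CreateEntryFromFile(path, Path.GetFileName(path), CompressionLevel.Fastest);
            }
        }
    }
}
```

The filtered list of log file paths can be passed in directly, and the resulting archive copied locally and unpacked with ZipFile.ExtractToDirectory.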

DeflateStream CopyTo writes nothing and throws no exceptions

I've basically copied this code sample directly from MSDN with some minimal changes. The CopyTo method is silently failing and I have no idea why. What would cause this behavior? It is being passed a 78 KB zipped folder with a single text file inside it. The returned FileInfo object points to a 0 KB file. No exceptions are thrown.
public static FileInfo DecompressFile(FileInfo fi)
{
    // Get the stream of the source file.
    using (FileStream inFile = fi.OpenRead())
    {
        // Strip the final extension to get the original file name,
        // for example "report.doc" from report.doc.cmp.
        string curFile = fi.FullName;
        string origName = curFile.Remove(curFile.Length
            - fi.Extension.Length);
        // Create the decompressed file.
        using (FileStream outFile = File.Create(origName))
        {
            // work around for incompatible compression formats found
            // here http://george.chiramattel.com/blog/2007/09/deflatestream-block-length-does-not-match.html
            inFile.ReadByte();
            inFile.ReadByte();
            using (DeflateStream Decompress = new DeflateStream(inFile,
                CompressionMode.Decompress))
            {
                // Copy the decompression stream
                // into the output file.
                Decompress.CopyTo(outFile);
                return new FileInfo(origName);
            }
        }
    }
}

In a comment you say that you are trying to decompress a zip file. The DeflateStream class cannot be used like this on a zip file. The MSDN example you mentioned uses DeflateStream to create individual compressed files and then decompresses them.
Although zip entries are usually compressed with the same Deflate algorithm, a zip file is not just a compressed version of a single file. It is a container, with its own headers and directory, that can hold many files and/or folders.
If you can use .NET Framework 4.5, I would suggest using the new ZipFile or ZipArchive class. If you must use an earlier framework version, there are free libraries you can use (like DotNetZip or SharpZipLib).
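As an illustration of the ZipArchive route on .NET 4.5 or later, the sketch below extracts the first file entry of a zip archive into the archive's own folder. ZipReader and ExtractFirstFile are hypothetical names, and taking only the first file is an assumption based on the question's archive containing a single text file.

```csharp
using System.IO;
using System.IO.Compression;

// Hypothetical helper, named here only for illustration.
public static class ZipReader
{
    // Extracts the first file entry next to the archive and returns it,
    // or returns null when the archive contains no files.
    public static FileInfo ExtractFirstFile(FileInfo zipFile)
    {
        using (ZipArchive archive = ZipFile.OpenRead(zipFile.FullName))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                // Directory entries have an empty Name; skip them.
                if (string.IsNullOrEmpty(entry.Name))
                {
                    continue;
                }
                // entry.Name is the file name without any folder part.
                string target = Path.Combine(zipFile.DirectoryName, entry.Name);
                entry.ExtractToFile(target, true); // true = overwrite an existing file
                return new FileInfo(target);
            }
        }
        return null;
    }
}
```

To unpack everything instead, ZipFile.ExtractToDirectory(zipPath, targetDirectory) does the whole archive in one call.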

7zip compress network stream

I would like to compress a file before sending it over the network. I think the best approach is 7-Zip because it is free and open source.
How do I use 7-Zip with .NET?
I know that 7-Zip is free and that the source code is available in C#, but for some reason it is very slow in C#, so for performance reasons I would rather call 7z.dll, which is installed along with 7-Zip. The way I am able to easily marshal and call the methods in 7z.dll is with the help of a library called SevenZipSharp. For example, adding that DLL to my project enables me to do:
// If you installed the 64-bit version of 7-Zip, make sure you change
// the platform target of the project to match.
SevenZip.SevenZipCompressor.SetLibraryPath(@"C:\Program Files\7-Zip\7z.dll");
var stream = System.IO.File.OpenRead(@"SomeFileToCompress.txt");
var outputStream = System.IO.File.Create("Output.7z");
SevenZip.SevenZipCompressor compressor = new SevenZip.SevenZipCompressor();
compressor.CompressionMethod = SevenZip.CompressionMethod.Lzma2;
compressor.CompressionLevel = SevenZip.CompressionLevel.Ultra;
compressor.CompressStream(stream, outputStream);
That's how I use 7-Zip within C#.
Now my question is:
I would like to send a compressed file over the network. I know I could compress it first and then send it, but the file is 4 GB, so I would have to wait a long time for it to compress, waste a lot of hard drive space, and only then finally be able to send it. I think that is too complicated. I was wondering how to send the file while it is being compressed.
It seems to be a problem with SevenZipSharp:

Have you considered an alternate library - one that doesn't even require 7-Zip to be installed / available?
From the description posted at http://dotnetzip.codeplex.com/ :
creating zip files from stream content, saving to a stream, extracting to a stream, reading from a stream
Unlike 7-Zip, DotNetZip is designed to work with C# / .NET.
Plenty of examples - including streaming, are available at http://dotnetzip.codeplex.com/wikipage?title=CS-Examples&referringTitle=Examples .
Another option is to use the 7-Zip Command Line Version (7z.exe), and write to/read from standard in/out. This would allow you to use the 7-Zip file format, while also keeping all of the core work in native code (though there likely won't be much of a significant difference).
Looking back at SevenZipSharp:
Since the 0.29 release, streaming is supported.
Looking at http://sevenzipsharp.codeplex.com/SourceControl/changeset/view/59007#364711 :
it seems you'd want this method:
public void CompressStream(Stream inStream, Stream outStream)
Thank you for considering performance here! I think way too many people would do exactly what you're trying to avoid: compress to a temp file, then do something with the temp file.
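To illustrate the compress-while-sending idea without a temp file, here is a minimal sketch that uses the built-in GZipStream rather than 7-Zip or SevenZipSharp, so it produces gzip output, not a .7z archive. StreamingCompressor and CompressTo are hypothetical names, and the destination is assumed to be any writable stream, such as a NetworkStream obtained from a connected TcpClient.

```csharp
using System.IO;
using System.IO.Compression;

// Hypothetical helper, named here only for illustration.
public static class StreamingCompressor
{
    // Reads source in chunks and writes compressed bytes to destination as it goes,
    // so neither the whole input nor the whole output is held in memory or on disk.
    public static void CompressTo(Stream source, Stream destination)
    {
        // leaveOpen: true lets the caller decide when to close the destination.
        using (var gzip = new GZipStream(destination, CompressionMode.Compress, true))
        {
            source.CopyTo(gzip);
        }
        // Disposing the GZipStream flushes the final compressed block and trailer.
    }
}
```

The receiving side would wrap its incoming stream in a GZipStream with CompressionMode.Decompress and copy it to a file in the same way.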

CompressStream threw an exception. My code is as follows:
public void TestCompress()
{
    string fileToCompress = @"C:\Users\gary\Downloads\BD01.DAT";
    byte[] inputBytes = File.ReadAllBytes(fileToCompress);
    var inputStream = new MemoryStream(inputBytes);
    byte[] zipBytes = new byte[38000000]; // this memory size is large enough.
    MemoryStream outputStream = new MemoryStream(zipBytes);
    string compressorEnginePath = @"C:\Engine\7z.dll";
    SevenZipCompressor.SetLibraryPath(compressorEnginePath);
    var compressor = new SevenZip.SevenZipCompressor();
    compressor.CompressionLevel = CompressionLevel.Fast;
    compressor.CompressionMethod = CompressionMethod.Lzma2;
    compressor.CompressStream(inputStream, outputStream);
    inputStream.Close();
    outputStream.Close();
}
The exception messages:
Message: Test method Test7zip.UnitTest1.TestCompress threw exception:
SevenZip.SevenZipException: The execution has failed due to the bug in the SevenZipSharp.
Please report about it to http://sevenzipsharp.codeplex.com/WorkItem/List.aspx, post the release number and attach the archive

ICSharpCode.SharpZipLib.Zip example with crc variable details

I am using the ICSharpCode.SharpZipLib DLL to zip SharePoint files using C# in ASP.NET.
When I open the output.zip file, it shows "zip file is either corrupted or damaged".
The CRC value for the files in output.zip is also shown as 000000.
How do we calculate or configure the CRC value using the ICSharpCode.SharpZipLib DLL?
Does anyone have a good example of how to do zipping using memory streams?

It seems you're not creating each ZipEntry.
Here is some code that I adapted to my needs:
http://wiki.sharpdevelop.net/SharpZipLib-Zip-Samples.ashx#Create_a_Zip_fromto_a_memory_stream_or_byte_array_1
Anyway, with SharpZipLib there are several ways to work with zip files: the ZipFile class, ZipOutputStream, and FastZip.
I'm using ZipOutputStream to create an in-memory zip file, adding in-memory streams to it and finally flushing it to disk, and it's working quite well. Why ZipOutputStream? Because it's the only choice available if you want to specify a compression level and use Streams.
Good luck :)

1:
You could do it manually but the ICSharpCode library will take care of it for you. Also something I've discovered: 'zip file is either corrupted or damaged' can also be a result of not adding your zip entry name correctly (such as an entry that sits in a chain of subfolders).
2:
I solved this problem by creating a compressionHelper utility. I had to dynamically compose and return zip files. Temp files were not an option, as the process was to be run by a web service.
The trick was a set of BeginZipUpdate(), AddEntry() and EndZipUpdate() methods (I made it into a utility to be invoked; you could just use the code directly if need be).
Something I've excluded from the example are checks for initialization (like calling EndZipUpdate() first by mistake) and proper disposal code (best to implement IDisposable and close your zipOutputStream and your memoryStream if applicable).
using System;
using System.Collections.Generic;
using System.IO;
using ICSharpCode.SharpZipLib.Zip;

// Fields used by the methods below. Their declarations were not shown in the
// original post, so the types here are assumptions.
private MemoryStream _memoryStream;
private ZipOutputStream _zipOutputStream;
private readonly List<string> _zipOutputStreamEntries = new List<string>();

public void BeginZipUpdate()
{
    _memoryStream = new MemoryStream(200);
    _zipOutputStream = new ZipOutputStream(_memoryStream);
}

public void EndZipUpdate()
{
    _zipOutputStream.Finish();
    _zipOutputStream.Close();
    _zipOutputStream = null;
}

// Entry name could be 'somefile.txt' or 'Assemblies\MyAssembly.dll' to indicate a folder.
// Unsure where you'd be getting your file, I'm reading the data from the database.
public void AddEntry(string entryName, byte[] bytes)
{
    ZipEntry entry = new ZipEntry(entryName);
    entry.DateTime = DateTime.Now;
    entry.Size = bytes.Length;
    _zipOutputStream.PutNextEntry(entry);
    _zipOutputStream.Write(bytes, 0, bytes.Length);
    _zipOutputStreamEntries.Add(entryName);
}
So you're actually having the zipOutputStream write to a memoryStream. Then once _zipOutputStream is closed, you can return the contents of the memoryStream.
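public byte[] GetResultingZipFile()
{
    // Finish and close the zip stream unless EndZipUpdate() has already done so.
    if (_zipOutputStream != null)
    {
        _zipOutputStream.Finish();
        _zipOutputStream.Close();
        _zipOutputStream = null;
    }
    return _memoryStream.ToArray();
}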
Just be aware of how much you want to add to a zipfile (delay in process/IO/timeouts etc).
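For comparison, the same in-memory pattern can be sketched with the built-in System.IO.Compression.ZipArchive (.NET 4.5 and later) instead of SharpZipLib. This is a different library from the one the question asks about, and InMemoryZip and Build are hypothetical names. With ZipArchive, the CRC of each entry is computed by the library as the data is written.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical helper, named here only for illustration.
public static class InMemoryZip
{
    // Builds a zip archive in memory from entry-name/content pairs and returns its bytes.
    // Use forward slashes in entry names to place files in folders, e.g. "Assemblies/MyAssembly.dll".
    public static byte[] Build(IDictionary<string, byte[]> entries)
    {
        using (var memory = new MemoryStream())
        {
            // leaveOpen: true keeps the MemoryStream usable after the archive is disposed.
            // Disposing the archive is what writes the central directory.
            using (var archive = new ZipArchive(memory, ZipArchiveMode.Create, true))
            {
                foreach (KeyValuePair<string, byte[]> item in entries)
                {
                    ZipArchiveEntry entry = archive.CreateEntry(item.Key);
                    using (Stream entryStream = entry.Open())
                    {
                        entryStream.Write(item.Value, 0, item.Value.Length);
                    }
                }
            }
            return memory.ToArray();
        }
    }
}
```

The returned byte array can be written to a web response or to disk without any temp file.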
