How to read data from inner archives without extracting zip file?

How to read data from inner archives without extracting zip file? - c#

I have a zip file which contains inner zip file (Ex:ZipFile1.zip->ZipFile2.zip->file.txt). I want to read the data of inner archive file content (file.txt) using ICSharpCode.SharpZipLib library without extracting to disk. Is it possible? If it is possible, Let me know how to get this.

Based on this answer, you can open a file within the zip as a Stream. You can also open a ZipFile from a Stream. I'm sure you can see where this is heading.
using (var zip = new ZipFile("ZipFile1.zip"))
{
var nestedZipEntry = zip.GetEntry("ZipFile2.zip");
using (var nestedZipStream = zip.GetInputStream(nestedZipEntry))
using (var nestedZip = new ZipFile(nestedZipStream))
{
var fileEntry = nestedZip.GetEntry("file.txt");
using (var fileStream = nestedZip.GetInputStream(fileEntry))
using (var reader = new StreamReader(fileStream))
{
Console.WriteLine(reader.ReadToEnd());
}
}
}
What we're doing here:
Open ZipFile1.zip
Find the entry for ZipFile2.zip
Open ZipFile2.zip as a Stream
Create a new ZipFile object around nestedZipStream.
Find the entry for file.txt
Create a StreamReader around fileStream to read the text file.
Read the contents of file.txt and output it to the console.
Try it online - in this sample, the base64 data is the binary data of a zip file which contains "test.zip", which in turn contains "file.txt". The contents of that text file is "hello".
P.S. If an entry isn't found then GetEntry will return null. You'll want to check for that in any code you write. It works here because I'm sure that these entries exist in their respective archives.

Related

How to find a text in the uploaded PDF file in ASP.NET c#

I want to find whether a text is present in the uploaded PDF file in ASP.NET c#.
using (MemoryStream str = new MemoryStream(this.docUploadField.FileBytes))
{
using (StreamReader sr = new StreamReader(str, Encoding.UTF8))
{
string line = sr.ReadToEnd();
}
}
I am getting the below as the file content when I read the contents of file.
Please help me with this

You surely need some PDF reading library.
Most famous being
IText (ITextSharp for who remembers it): https://github.com/itext/itext7-dotnet
PdfSharp: https://github.com/empira/PDFsharp
and many other free options.
With those you open pdf file and read it and take the text you need.
Usually they give you a collection of the PDF elements (paragraphs, images, etc etc, and you loop through them or use a search function to look for what you need)

Compress large log file before reading

We have a large amount of logs (117 logs with total of about 17gb of data). It's straight text so I know it will compress well. I'm not looking for great compression, or speed (but that would be a good bonus). What I currently do is get a list of log files to read (they have a date stamp in the file name, so I filter on that first). After I get the list I then read each file using File.ReadAllLines() but we also filter on that...
private void GetBulkUpdateItems(List<string> allLines, Regex updatedRowsRegEx)
{
foreach (var file in this)
allLines.AddRange(File.ReadAllLines(file).Where(x => updatedRowsRegEx.IsMatch(x)));
allLines.Sort();
}
reading 5 files from the network takes about 22 seconds. What I'd like to do is compress the list of files into a single zip file. copy the zip file locally, then unzip them and do the rest. Problem is I can't figure out how to start. Since I'm using .net 4.5 I first tried System.IO.Compression.ZipFile but it wants a Directory and I don't want all 117 files. I saw someone use a network stream and 7zip which sounded promising, and I'm fairly certain that 7zip is installed on the server I need the logs from (Probably not important because we use the UNC path). So I'm stuck. Any suggestions?

ZipArchive is the underlying class for ZipFile and allows more granular manipulation.
Sample from the article adding hardcoded text:
using (FileStream zipToOpen = new FileStream(
#"c:\users\exampleuser\release.zip", FileMode.Open))
{
using (ZipArchive archive = new ZipArchive(zipToOpen, ZipArchiveMode.Update))
{
ZipArchiveEntry readmeEntry = archive.CreateEntry("Readme.txt");
using (StreamWriter writer = new StreamWriter(readmeEntry.Open()))
{
writer.WriteLine("Information about this package.");
writer.WriteLine("========================");
}
}
}
As Praveen Paulose suggested you can use ZipFileExtensions.CreateEntryFromFile to create entry from file to add to archive.

Create ZipArchive from XML with base64 encoded content

I am creating an XML file on the fly.
One of it's nodes contains a ZIP file encoded as a BASE64 string.
I then create another ZIP file.
I add this XML file and a few other JPEG files.
I output the file to the browser.
I am unable to open the FINAL ZIP file.
I get: "Windows cannot open the folder. The Compressed(zipped) Folder'c:\path\file.zip' is invalid."
I am able to save my original XML file to the file system.
I can open that XML file, decode the ZIP node and save to the file system.
I am then able to open that Zip file with no problems.
I can create the final ZIP file, OMIT my XML file, and the ZIP file opens no problem.
I seem to only have an issue with I attempt to ZIP an XML file that has a node with ZIP content encoded as a BASE64 string.
Any ideas? Code snipets are below. Heavily edited.
XDocument xDoc = new XDocument();
XDocument xDocReport = new XDocument();
XElement xNodeReport;
using (FileStream fsData = new FileStream(strFullFilePath, FileMode.Open, FileAccess.Read)) {
xDoc = XDocument.Load(fsData);
xNodeReport = xDoc.Element("Data").Element("Reports").Element("Report");
//SNIP
//create XDocument xDocReport
//SNIO
using (MemoryStream zipInMemoryReport = new MemoryStream()) {
using (ZipArchive zipFile = new ZipArchive(zipInMemoryReport, ZipArchiveMode.Update)) {
//Add REPORT to ZIP file
ZipArchiveEntry entryReport = zipFile.CreateEntry("data.xml");
using (StreamWriter writer = new StreamWriter(entryReport.Open())) {
writer.Write(xDocReport.ToString());
} //END USING report entry
}
xNodeReport.Value = System.Convert.ToBase64String(zipInMemoryReport.GetBuffer());
//I am able to write this file to disk and manipulate it no problem.
//File.WriteAllText("c:\\users\\snip\\desktop\\Report.xml",xDoc.ToString());
}
//create ZIP for response
using (MemoryStream zipInMemory = new MemoryStream()) {
using (ZipArchive zipFile = new ZipArchive(zipInMemory, ZipArchiveMode.Update)) {
//Add REPORT to ZIP file
ZipArchiveEntry entryReportWrapper = zipFile.CreateEntry("Report.xml");
//THIS IS THE STEP THAT makes the Zip "invalid". Although i can open and manipulate this source file no problem.
//********
using (StreamWriter writer = new StreamWriter(entryReportWrapper.Open())) {
xDoc.Save(writer);
}
//Add JPEG(s) to report
//Create Charts
if (chkDLSalesPrice.Checked) {chartDownloadSP.SaveImage(entryChartSP.Open(), ChartImageFormat.Jpeg);}
if (chkDLSalesDOM.Checked) {chartDownloadDOM.SaveImage(entryChartDOM.Open(), ChartImageFormat.Jpeg);}
if (chkDLSPLP.Checked) {chartDownloadSPLP.SaveImage(entryChartSPLP.Open(), ChartImageFormat.Jpeg);}
if (chkDLSPLP.Checked) {chartDownloadLP.SaveImage(entryChartLP.Open(), ChartImageFormat.Jpeg);}
} // END USING ziparchive
Response.Clear();
Response.AppendHeader("content-disposition", "attachment; filename=file.zip");
Response.ContentType = "application/zip";
Response.BinaryWrite(zipInMemory.GetBuffer());
Response.End();

Without a good, minimal, complete code example, it's impossible to know for sure what bugs are in the code. But there are at least two apparent errors in the code snippet you posted, one of which could easily be responsible for the "invalid .zip" error:
In the statement writer.Write(xDocReport.ToString());, the variable xDocReport has not been initialized to anything useful, at least not in the code you posted. So you'll get an empty XML document in the archive.
Since the code example is incomplete, it's possible you just omitted from the code example in your question the initialization of that variable to something else. In any case, even if you didn't that would just lead to an empty XML document in the archive, not an invalid archive.
More problematic though…
You are calling GetBuffer() on your MemoryStream objects, instead of ToArray(). You want the latter. The former gets the entire backing buffer for the MemoryStream object, including the uninitialized bytes past the end of the valid stream. Since a valid .zip file includes a CRC value at the end of the file, adding extra data beyond that causes anything trying to read the file as a .zip archive to miss the correct CRC, reading the uninitialized data instead.
Replace your calls to GetBuffer() with calls to ToArray() instead.
If the above does not lead to a solution for your problem, you should edit your post, to provide a better code example.
One last comment: there is no point in initializing a variable like xDoc to an empty XDocument object when you're going to just replace that object with a different one (e.g. by calling XDocument.Load()).

Creating a zipped Document that containing files from memory stream / byte array in C#

I'm needing to create a zipped document containing files that exist on the server. I am using the Ionic.Zip to do so, and to create a new Package (which is the zip file) I have to have either a path to a physical file or a stream. I am trying to not create an actual file that would be the zip file, instead just create a stream that would exist in memory or something. how can i do this?

Create the package using a MemoryStream then.

You can try the save method in the ZipFile Class. It can save to a stream
try this.
using (MemoryStream ms = new MemoryStream())
{
using (Ionic.Zip.ZipFile zipFile = new Ionic.Zip.ZipFile())
{
zipFile.AddFiles(filesToBeZipped, false, "NewFolder");//filesTobeZipped is a List<string>
zipFile.Save(ms);
}
}

You'll want to use the .AddEntry method on the ZipFile you've created specifying a name and the byte[] containing the actual file data.
ex.
ZipFile zipFile = new ZipFile();
zipFile.AddEntry(file.FileName, file.FileData);
where file.FileName will be the entry name (in the zip file) and file.FileData is the byte array.

Unzip file (odt or docx) in a memory stream

I have been developing an web application with Asp.Net and I'm using SharpZipLib to work with odt files (from Open Office) and in the future docx files (for ms office). I need to open an odt file (like a zip file) change a xml file inside it, zip again and give it to the browser send to my client.
I can do this in file system but it will get a space in my disk temporarily and we don't want it. I would like to do this in memory (with a MemoryStream class), but I don't know how to unzip folders/files in a memory stream with SharpZipLib, change and use it to zip again. Is there any sample about how to do this?
Thank you

You can use something like
Stream inputStream = //... File.OpenRead(...);
//for read file
ZipInputStream zipInputStream = new ZipInputStream(inputStream));
//for output
MemoryStream memoryStream = new MemoryStream();
using ( ZipOutputStream zipStream = new ZipOutputStream(memoryStream))
{
ZipEntry entry = new ZipEntry("...");
//...
zipStream.PutNextEntry(entry);
zipStream.Write(data, 0, data.Length);
//...
zipStream.Finish();
zipStream.Close();
}
Edit ::
You need in general unZip your file, get ZipEntry , change , and write in ZipOutputStream with MemoryStream.
Use this article http://www.codeproject.com/KB/cs/Zip_UnZip.aspx

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

How to read data from inner archives without extracting zip file? - c#

I have a zip file which contains inner zip file (Ex:ZipFile1.zip->ZipFile2.zip->file.txt). I want to read the data of inner archive file content (file.txt) using ICSharpCode.SharpZipLib library without extracting to disk. Is it possible? If it is possible, Let me know how to get this.

Related

How to find a text in the uploaded PDF file in ASP.NET c#

Compress large log file before reading

Create ZipArchive from XML with base64 encoded content

Creating a zipped Document that containing files from memory stream / byte array in C#

Unzip file (odt or docx) in a memory stream

Categories

Resources